Uber Price Prediction with Neural Network

Uber prices vary based on supply and demand, and I wanted to predict what they would be to save time and money when using it.

Sometimes I can wait 10 minutes and the price goes down by half, other times I end up waiting an hour with no change.

So this project is meant to help with that by predicting the prices so I know how long I should wait to get one.

Data-driven decisions would be the buzzword here.

GitHub: https://github.com/tylerhaun/uber-price-tracker

Overview

This project has a few different components: an ETL job which needs to run all the time to collect data on Uber prices and store it in a database, a frontend client to visualize the data, and a prediction program which can learn from the collected data and output prices at future times.

The ETL job is a simple Node program which uses node-cron to schedule a function that requests Uber's fare-estimate endpoint, which I found on their main marketing site https://www.uber.com/fare-estimate/.

The endpoint takes two locations, from and to, then returns the prices for each different type of Uber, like Pool, X, Black, etc.

Then the ETL job transforms the data and stores it in a simple time series SQLite table.
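
The table is essentially a time series of fare samples. A minimal sketch of what that looks like (the schema and column names are my assumptions for illustration, and the actual job is Node rather than Python):

```python
import sqlite3

# Hypothetical fare_estimate table: one row per Uber product type per sample.
conn = sqlite3.connect("uber_prices.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS fare_estimate (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        created_at TEXT NOT NULL,  -- UTC timestamp of the sample
        type       TEXT NOT NULL,  -- 'pool', 'uberx', 'black', ...
        price      REAL NOT NULL   -- estimated fare in dollars
    )
""")
conn.execute(
    "INSERT INTO fare_estimate (created_at, type, price) VALUES (?, ?, ?)",
    ("2019-01-30T08:15:00Z", "uberx", 21.5),
)
conn.commit()
```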

A big downside is that you need a different job running for each pair of locations.

In my example, I had it collecting data for Santa Monica to Hollywood, but I want to experiment with others.

The frontend graph is a simple D3 chart; I found some code online and improved it for this project.

I added a hover inspector similar to Google's graphs, where hovering over a point on the graph shows the x and y values at that intersection.

It is served using handlebars.js, which injects the data from the database.

Since the data was getting so large, I had to add in date and type filters, and dynamic downsampling of the data using the modulus operator.

(SQL building always looks so ugly to me.)

But basically, add a ROW_NUMBER() column, then filter by `row_num % downsampling_factor = 0`.

I needed to upgrade SQLite to the most recent version for this, since ROW_NUMBER() is a newer feature (window functions arrived in SQLite 3.25).

The downsampling factor is calculated to return a number of points close to numPointsTarget, so it can easily be configured to return the desired number of points.
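
The query shape is roughly this (the app itself is Node, so this is just a quick sqlite3 sketch in Python, reusing the assumed table and column names from earlier):

```python
import sqlite3

def fetch_downsampled(db_path, num_points_target, ride_type, start, end):
    """Return roughly num_points_target rows by keeping every Nth row."""
    conn = sqlite3.connect(db_path)

    # Work out the factor so the result size is close to num_points_target.
    total = conn.execute(
        "SELECT COUNT(*) FROM fare_estimate WHERE type = ? AND created_at BETWEEN ? AND ?",
        (ride_type, start, end),
    ).fetchone()[0]
    factor = max(1, total // num_points_target)

    # ROW_NUMBER() is a window function, so this needs SQLite >= 3.25.
    query = """
        SELECT created_at, price FROM (
            SELECT created_at, price,
                   ROW_NUMBER() OVER (ORDER BY created_at) AS row_num
            FROM fare_estimate
            WHERE type = ? AND created_at BETWEEN ? AND ?
        ) AS numbered
        WHERE row_num % ? = 0
    """
    return conn.execute(query, (ride_type, start, end, factor)).fetchall()
```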

For the prediction program, I tried starting out with the Node.js library synaptic, hoping it would suffice.

After playing around with it for a while, I got it partially working, but it was ridiculously slow and lacked some features, so I figured I'd try Python since I know it is supposed to be good for data science.

I settled on Keras, which worked out nicely.

It was fast and had all the features I wanted, like batching, and way more than I knew what to do with.

Machine Learning Prediction

In machine learning, picking the right features is an extremely important part of getting useful results.

Features are just properties to be measured.

Some features that might be useful in this time series analysis:

- UNIX timestamp
- difference between the point's time and now
- year
- month (1–12)
- date (1–31)
- day of week (1–7)
- hour (0–23)
- minute (0–59)
- second (0–59)
- holidays (Christmas, Easter, Thanksgiving)
- sports events / concerts

I ended up using month, date, weekday, hour, and minute since they seemed like the most important, as seen in the snippet below. It outputs an array of length 134 with all binary values.
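
A minimal sketch of that transform (a plain one-hot encoding of each field; names here are illustrative):

```python
from datetime import datetime

def one_hot(index, size):
    """Return a list of `size` zeros with a 1 at `index`."""
    encoded = [0] * size
    encoded[index] = 1
    return encoded

def time_to_inputs(dt: datetime):
    """Encode a datetime as a binary feature vector of length 134
    (12 months + 31 dates + 7 weekdays + 24 hours + 60 minutes)."""
    return (
        one_hot(dt.month - 1, 12)
        + one_hot(dt.day - 1, 31)
        + one_hot(dt.weekday(), 7)
        + one_hot(dt.hour, 24)
        + one_hot(dt.minute, 60)
    )

assert len(time_to_inputs(datetime(2019, 1, 30, 8, 15))) == 134
```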

For example, the month part of it gets transformed from 1–12 to a list of booleans.

An example of that: January (month #1) -> [1,0,0,0,0,0,0,0,0,0,0,0], February (month #2) -> [0,1,0,0,0,0,0,0,0,0,0,0], etc.

I believed this method would be best for the network since it is the least ambiguous, compared to using the value of the month (1–12) as one input.

Then to train the network, you just need to transform the time into this array and train it on the value associated with it.

For neural networks (at least the ones I’ve seen and used), all inputs and outputs are values between 0 and 1.

So I had to map the range of output values to 0–1.

I made a linear scaler class to do so, which also has an inverse that is needed to turn the network's output back into a price after it is activated on the inputs.

I chose 18–30 since that encompassed all of the possible prices I have seen so far. So 18 -> 0, 30 -> 1, 24 -> 0.5, etc.
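
A scaler like that is only a few lines; a sketch with the same 18–30 range (the class and method names are just illustrative):

```python
class LinearScaler:
    """Map values from [low, high] to [0, 1] and back."""

    def __init__(self, low=18.0, high=30.0):
        self.low = low
        self.high = high

    def scale(self, price):
        # e.g. 18 -> 0.0, 24 -> 0.5, 30 -> 1.0
        return (price - self.low) / (self.high - self.low)

    def inverse(self, value):
        # e.g. 0.5 -> 24.0; used on the network's output to recover a price
        return value * (self.high - self.low) + self.low
```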

Training the Model

Training the model took a couple iterations of playing with the values until it started looking right.

The model has 3 layers.

One input layer with 134 nodes, one hidden layer with 2 * 134 = 268 nodes, and one output layer with 1 node, representing the normalized price.

The 134 comes from the transform time to inputs function from before and should be changed if that function changes.
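
In Keras that architecture is only a few lines. A rough sketch (the activations, loss, and training settings here are my assumptions, not necessarily what the project uses):

```python
from keras.models import Sequential
from keras.layers import Dense

INPUT_SIZE = 134  # length of the binary time-feature vector

model = Sequential([
    Dense(2 * INPUT_SIZE, input_dim=INPUT_SIZE, activation="relu"),  # hidden layer, 268 nodes
    Dense(1, activation="sigmoid"),  # normalized price in [0, 1]
])
model.compile(loss="mean_squared_error", optimizer="adam")

# inputs: (num_samples, 134) binary features; targets: scaled prices in [0, 1]
# model.fit(inputs, targets, epochs=50, batch_size=32)
```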

Creating the prediction data

Creating the prediction data just requires a list of dates whose values you want, running those dates through the trained model, then scaling the value back to a price.

To create the prediction times, I found the pandas date_range function which was useful.

It gives a big list of dates to run against the trained model.

Then map those dates to the boolean input lists using the same transformation function used to train the model, run the inputs through the neural network, map the data to fit the fare_estimate table columns, and save it in the database.
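
Put together, that step looks roughly like this (a sketch reusing the hypothetical time_to_inputs, LinearScaler, and model from above; the 15-minute frequency and the column names are assumptions):

```python
import numpy as np
import pandas as pd

# Prediction times: every 15 minutes over a couple of days.
future_times = pd.date_range(start="2019-02-01", end="2019-02-03", freq="15min")

# Same transformation used for training: datetime -> length-134 binary vector.
inputs = np.array([time_to_inputs(t.to_pydatetime()) for t in future_times])

# Run the trained model, then scale the normalized outputs back to prices.
scaler = LinearScaler(18, 30)
predicted = [scaler.inverse(v) for v in model.predict(inputs).flatten()]

# Shape the rows to fit the fare_estimate table columns (names assumed as before).
rows = [
    {"created_at": t.isoformat(), "type": "uberx_predicted", "price": float(p)}
    for t, p in zip(future_times, predicted)
]
```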

Result:

When playing around with the model training config, it gave some pretty believable-looking results.

Everything after the dotted line was predicted using the model, which I only ran for a couple minutes.

Prediction Result

Takeaways / Lessons:

Flaws in the visualization of data: As the data grows, visualizing it becomes more and more difficult.

Using downsampling is simple but potentially removes important features from data.

Averaging would have the same problem.

Other types of graphs may improve it, such as a candlestick chart or better interactivity.

Downsides of SQLite: While SQLite is a simple way to store data and works great for a number of projects I have done, I started to run into some issues with it here.

I had two different programs trying to access it at once, which was causing lock errors.

When the prediction job was running and saving data into the database, which took a long time (over a minute in some cases), the frontend application broke because the database was completely locked up during that time.

I’ve never run into that issue before with a normal database (MySQL, Postgres, MongoDB) so I’m assuming it is a SQLite thing.

Also, I hit some issues with it using SQLAlchemy when trying to do a bulk insert.

Apparently, it doesn’t support bulk inserting, which required me to insert one row at a time with thousands of records, which is why it was taking so long.

I read that you can do transactional inserts which speed things up but I didn’t have the time to actually do it.
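
For reference, one way to push the rows through in a single transaction with SQLAlchemy Core looks roughly like this (a sketch, not what the project currently does, using the assumed table from earlier):

```python
from sqlalchemy import create_engine, MetaData, Table

engine = create_engine("sqlite:///uber_prices.db")
metadata = MetaData()

# Reflect the existing table instead of redefining it (SQLAlchemy 1.4+).
fare_estimate = Table("fare_estimate", metadata, autoload_with=engine)

rows = [
    {"created_at": "2019-02-01T08:00:00Z", "type": "uberx_predicted", "price": 22.4},
    {"created_at": "2019-02-01T08:15:00Z", "type": "uberx_predicted", "price": 21.9},
    # ...thousands more
]

# engine.begin() wraps everything in one transaction; passing a list of dicts
# lets the driver use executemany instead of one round trip per row.
with engine.begin() as conn:
    conn.execute(fare_estimate.insert(), rows)
```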

Timing: I always like to measure and examine the time it takes to complete projects in order to optimize it in the future.

As the cliche goes, time is money, so the faster I can complete projects, the more money I can help my company make.

- ETL job ~ 2 hours
- D3 graph ~ 5 hours
- Node.js neural network ~ 15 hours
- Python neural network ~ 18 hours

The tasks I already knew how to do well, like databases, models, endpoints, and APIs, went very quickly, as seen with the ETL job.

I don’t have much experience with D3, so a big portion of that time was spent learning and experimenting until I got things working.

The Node.js neural network was a big portion learning neural networks, and another big portion waiting for it to run, since Node.js is apparently very slow at it (probably why you don’t see people using Node.js for machine learning much).

Then Python was giving me many issues and making me very agitated.

I spent a lot of time trying to insert data into the SQLite database.

I started by using the normal sqlite3 module and constructing the queries manually, but I wanted something more robust so I started using ORMs like peewee and SQLAlchemy.

I already had the table schemas created and wanted something that could load the models from the table.

Peewee wanted me to run a script against the database to create files with the models in it, which is stupid (actually kinda cool, but not what I wanted).

But SQLAlchemy allowed me to autoload the schemas from the table which worked fairly nicely, so I went with that.

And I spent another big part of the time trying to get dates to work.

I’ve only ever had problems when using Python’s datetime module.

It comes nowhere close to moment.js when it comes to usability, ease, and functionality.

Datetime doesn’t have timezones built into it, so you have to use pytz to set them.

And since sqlite3 uses all UTC time, I had to create a datetime, add the timezone, then change to UTC.
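
That dance looks something like this (a sketch; the timezone is just an example):

```python
from datetime import datetime
import pytz

# Build a naive local datetime, attach the local timezone, then convert to UTC
# before handing it to sqlite3.
local_tz = pytz.timezone("America/Los_Angeles")
naive = datetime(2019, 1, 30, 8, 15)
local_dt = local_tz.localize(naive)      # timezone-aware local time
utc_dt = local_dt.astimezone(pytz.utc)   # what actually gets stored
```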

And all of this took me hours to figure out.

Maybe python is too difficult, or maybe using javascript has made me soft.

And another issue that took me a while to figure out was SQLAlchemy not saving the timezone offset when I inserted something.

It was a simple fix, by using SQLAlchemy to construct the statement, then passing the data in myself.

So in the future, I will start off with SQLAlchemy since it seems like the most flexible option.

I can build queries, then edit and execute them myself.

And it can grab the table structure from the database which was super useful.

Python vs JavaScript

JavaScript is still #1 to me, but Python is by far superior when it comes to big data.

Node.js took a few orders of magnitude longer to train the models than Python and got worse results.

But getting python to work properly is a headache.

It might be from my inexperience with it, but JavaScript seems so much simpler and more intuitive, plus it has really nice tools like the Chrome DevTools inspector.

I long for the day when JavaScript can do data science, but until then, Python it is.

Conclusion:

There is a lot of hype around machine learning and AI in the media at the moment, and I feel most of it is just marketing, as seen with many of the big companies like Google and their AI that can detect videos of cats.

But there are definitely some uses for it.

This project used it to predict future Uber prices, which can save me money and allows me to make more informed decisions.

I can see this concept being extended to many different uses and industries, helping businesses grow and giving them an advantage over their competitors.
