Peaceful traffic near Angkor temples, Cambodia

AI For SEA Traffic Management: Modeling (Part 2/2)

Kilian Tep · Jun 17

Also read: AI For SEA Traffic Management: Feature Engineering (Part 1/2)

In the previous post, I detailed how I engineered the features and handled data gaps.
In this final part, I will describe my LSTM model architecture and the considerations I had to take into account during modeling.
All my code can be found on GitHub.
You can find a brief recap of the dataset at this link.
1. Why deep learning rather than traditional time series modeling?

As shown in the previous post, each geohash6 code has a corresponding set of timestamps.
When isolating the demand time series of each geohash6 code, we can visually observe stationarity, which may make it suitable for traditional time series modeling (see example below).
Scatter plot of demand over time for selected geohash6 code ‘qp03wz’ over 15 days

While traditional time series modeling such as ARIMA looks promising, it is not scalable.
Since stationarity only appears when treating each geohash6 code separately, we would need to train and maintain more than 1,300 time series models.
Furthermore, ARIMA models require constant retraining in order to sustain good performance, which makes them hard to maintain.
For the aforementioned reasons, I’ve decided to train a Long Short-Term Memory (LSTM) model, which has proven to be quite effective for dealing with time series.
The advantage of the LSTM is that I can include all geohash6 codes into my training set without having to differentiate among them.
Given the large number of estimated parameters, I simply hope that the model will be able to effectively differentiate the patterns of each geohash6 code.
2. Demand dataset challenge: imbalanced distribution

One of the big challenges of this dataset is that it is extremely imbalanced overall.
Demand is heavily skewed toward low values (see distribution below) regardless of the hour, meaning that demand is low most of the time.
Yet traffic management is about being ready for high demand on the roads, so it is important that our model learns to predict peak times well.
There are two ways to deal with such a problem.
The first solution would be to resample in order to change the distribution of the training set.
This solution does not seem viable since we would risk losing a lot of potentially useful data points for the model.
Furthermore, since we are dealing with a time series problem, samples are not independent.
Sampling becomes tricky because we cannot do it at random as we may miss precious time information.
The second solution, which I’ve chosen, consists of giving greater weight to these underrepresented values.
3. Solving the imbalanced dataset: exponential weighting

In order to assign weights to my training examples, I’ve applied an exponential weighting function to my training set using pd.Series.apply().
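Here is a minimal sketch of the idea: the weight grows exponentially with the demand value, so rare, high-demand rows carry more importance. The constant alpha shown here is illustrative, not necessarily the exact value used in the repo:

```python
import numpy as np
import pandas as pd

def exponential_weight(demand: float, alpha: float = 5.0) -> float:
    """Weight a training example exponentially by its demand value.

    Demand is normalized between 0 and 1, so low-demand rows (the vast
    majority) get a weight near 1, while rare high-demand rows get a
    weight of up to e^alpha. The constant alpha is illustrative.
    """
    return float(np.exp(alpha * demand))

# Apply to the demand column of the training set with pd.Series.apply().
train_df = pd.DataFrame({"demand": [0.02, 0.10, 0.85]})  # toy example
train_df["sample_weight"] = train_df["demand"].apply(exponential_weight)
print(train_df)
```

These weights can then be passed to Keras’s fit() via its sample_weight argument, so that high-demand examples contribute more to the loss.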
4. Keras LSTM architecture

LSTM Model architecture built with Keras. The model has two input layers: one consisting of a demand vector from T down to T-5, and the other consisting of a time and space vector with scaled latitude and longitude as well as scaled timestamps from T down to T-5. This LSTM architecture aims to predict the demand at T+1.

The first input layer takes in a vector of size 6, consisting of Demand T, …, Demand T-5.
In order to pick an appropriate lag, I randomly sampled a complete geohash6 code and ran Auto-ARIMA on it.
Auto-ARIMA determined that the best lag was T-5, which seems reasonable.
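For reference, such a lag check can be done with, e.g., pmdarima’s auto_arima; roughly like this, where df is the preprocessed DataFrame and the sampled geohash6 code is illustrative:

```python
import pmdarima as pm

# Demand series for one fully observed geohash6 code
# (assumes `df` is the preprocessed DataFrame from Part 1).
series = df.loc[df["geohash6"] == "qp03wz", "demand"].values

# Search over AR orders: the fitted model's AR order `p` indicates
# how many past steps carry signal (here it settled on p = 5).
model = pm.auto_arima(series, seasonal=False, max_p=10, suppress_warnings=True)
print(model.order)  # (p, d, q)
```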
The model then passes on this input layer into a few LSTM layers.
The second input layer takes in a vector of size 8 consisting of the following normalized values (Min-Max Scaling): latitude, longitude, timestamp at time T, …, timestamp at time T-5.
As mentioned in the previous post, we have to apply Min-Max Scaling in order to avoid the problem of exploding gradients.
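As a quick reminder from Part 1, this scaling is a one-liner with, e.g., scikit-learn’s MinMaxScaler (the column names here are illustrative):

```python
from sklearn.preprocessing import MinMaxScaler

# Scale latitude, longitude and the six timestamp lags into [0, 1]
# (assumes `df` is the preprocessed DataFrame; column names illustrative).
space_time_cols = ["latitude", "longitude"] + [f"timestamp_t_minus_{i}" for i in range(6)]
df[space_time_cols] = MinMaxScaler().fit_transform(df[space_time_cols])
```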
My intuition was that the location as well as the time of the day would have a strong impact on the demand.
I basically sought to combine the pure demand input with the time and space input so that the model learns their relationship with respect to demand at T+1.
The loss function is Mean Squared Error, a standard choice for a regression problem.
The optimizer is Adam with a learning rate of 0.001. In case the model loss plateaus for two consecutive epochs, I’ve chosen to multiply the learning rate by a factor of 0.2.
I’ve also added a kernel constraint on the final Dense layer as it needs to be between 0 and 1.
Training-wise, I’ve trained the model for 30 epochs with a batch size of 128. See the Keras code below, and see how I’ve prepared the model input from my preprocessed DataFrame in the function prepare_model_inputs on the GitHub repo.
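The sketch below mirrors the architecture described above; the layer sizes are illustrative and the exact values live in the repo:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.constraints import MinMaxNorm
from tensorflow.keras.layers import Concatenate, Dense, Input, LSTM, Reshape
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Input 1: demand at T, ..., T-5 (vector of size 6), reshaped into a
# (timesteps, features) sequence for the LSTM layers.
demand_in = Input(shape=(6,), name="demand_input")
x = Reshape((6, 1))(demand_in)
x = LSTM(64, return_sequences=True)(x)
x = LSTM(32)(x)

# Input 2: scaled latitude, longitude and timestamps at T, ..., T-5 (size 8).
space_time_in = Input(shape=(8,), name="space_time_input")

# Combine the demand representation with the space/time vector so the
# model can learn their relationship with respect to demand at T+1.
merged = Concatenate()([x, space_time_in])
merged = Dense(32, activation="relu")(merged)

# Kernel constraint on the final Dense layer, bounding its weights
# between 0 and 1 as described above.
out = Dense(1, kernel_constraint=MinMaxNorm(min_value=0.0, max_value=1.0))(merged)

model = Model(inputs=[demand_in, space_time_in], outputs=out)
model.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=0.001))

# Multiply the learning rate by 0.2 if the loss plateaus for two epochs.
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=2)

# model.fit([demand_X, space_time_X], y, sample_weight=weights,
#           epochs=30, batch_size=128, callbacks=[reduce_lr])
```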
5. Evaluation methodology and results

The required evaluation metric is Root Mean Squared Error (RMSE):

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$

I trained the model on the first 47 days and evaluated its performance on the remaining 14. Roughly, the training set has about 3.2 million samples and the test set has nearly 1 million samples.
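In code, this chronological split is a simple filter on the dataset’s day column (assuming days numbered 1 through 61, consistent with the 47 + 14 split):

```python
# Chronological split: first 47 days for training, last 14 for testing.
train_df = df[df["day"] <= 47]
test_df = df[df["day"] > 47]
```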
T+1 performance: achieved RMSE of 0.04 on the last 14 days.

T+1, …, T+5 performance: achieved RMSE of 0.07 on the last 14 days.
In order to evaluate the T+1 and T+1, …, T+5 performances, I’ve built the functions evaluate_t_plus_1_performance and evaluate_t_plus_5_performance. The latter can be quite slow since it iterates over a transformed test set and constantly updates the inputs to predict the next step; it needs to be optimized further.
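Conceptually, the T+1, …, T+5 evaluation is a rolling forecast. Here is a simplified sketch of the loop, not the repo’s exact implementation (the real evaluation also shifts the scaled timestamps forward at each step, which this sketch omits):

```python
import numpy as np

def rolling_forecast(model, demand_window, space_time_window, steps=5):
    """Predict the next `steps` demands by feeding each prediction back in.

    demand_window: array of shape (6,) with demand at T, ..., T-5.
    space_time_window: array of shape (8,) with the scaled space/time vector
    (kept fixed here for simplicity).
    """
    demand_window = demand_window.copy()
    preds = []
    for _ in range(steps):
        pred = float(model.predict([demand_window[None, :],
                                    space_time_window[None, :]],
                                   verbose=0)[0, 0])
        preds.append(pred)
        # Slide the window: drop the oldest demand, prepend the prediction.
        demand_window = np.concatenate(([pred], demand_window[:-1]))
    return preds
```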
6. Further exploration

Due to time restrictions and the lack of computing resources, I haven’t had time to tune hyperparameters or attempt different model architectures that could potentially yield better results.
The function evaluate_t_plus_5_performance is also very slow when evaluating the model’s performance. If such a model were deployed to production, the window function that predicts the next 5 steps would have to be much faster.
Furthermore, with un-anonymized geohash6 codes, the model could perhaps integrate additional features, such as weather, into the modeling. Such a factor is bound to have an impact on traffic.
7. Wrap up

In short, I’ve built a custom LSTM model that aims to predict the next demand value.
The particularity of this model is that I’ve added a space and time input layer after the LSTM layers in order to capture the interaction between demand, location, and hour of the day.
I’ve achieved an RMSE of 0.
04 at T+1 and 0.
07 at T+1, …, T+5 on my test set — the last 14 days of data.
What I’d like to explore more on this project is different model architectures with un-anonymized data, which will enable to incorporate external features related to the actual location of the demand.
Hope you’ve enjoyed!Kilian.. More details