DeepTraffic – DQN Tuning for Traffic Navigation (75.01 MPH)

(Part 1: Reinforcement Learning) Demystifying Double Deep Q-Learning, towardsdatascience.com

Now that we have refreshed our RL skills, let’s proceed to the competition!

DeepTraffic Solution

Let’s begin by framing our DeepTraffic problem as a Markov decision process.

State

We can define our environment as a grid, where each cell can be occupied by a vehicle.

We’ll represent our state as such a grid, where each cell holds the speed of the vehicle inside it (an empty cell holds the maximum speed of 80 mph).

We are going to feed the model this array of values.

Our learning input is defined with the following (customizable) variables.

lanesSide = 2;
patchesAhead = 17;
patchesBehind = 5;

And this is how it looks.

The side/ahead/behind variables are measured relative to the front bumper of the controlled vehicle (the one with the black tire footprint).
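As a rough illustration of how these three variables determine the size of the model input, here is a minimal sketch. The input-length formula (lanes times patches) is an assumption consistent with the variable names, and `buildState` together with its vehicle objects is hypothetical, not part of the DeepTraffic code.

```javascript
const lanesSide = 2;
const patchesAhead = 17;
const patchesBehind = 5;
const MAX_SPEED = 80; // value stored in empty cells, as described in the text

// Assumed layout: the agent's lane plus lanesSide lanes on each side,
// and patchesAhead + patchesBehind cells along each lane.
const numLanes = lanesSide * 2 + 1;
const numRows = patchesAhead + patchesBehind;
const numInputs = numLanes * numRows;

// Hypothetical helper: builds a flat state array from a list of vehicles.
// lane is relative to the agent (-lanesSide..lanesSide), row is 0..numRows-1.
function buildState(vehicles) {
  const state = new Array(numInputs).fill(MAX_SPEED);
  for (const v of vehicles) {
    const col = v.lane + lanesSide;
    const outside = col < 0 || col >= numLanes || v.row < 0 || v.row >= numRows;
    if (outside) continue; // ignore vehicles the agent cannot see
    state[v.row * numLanes + col] = v.speed;
  }
  return state;
}

const state = buildState([
  { lane: 0, row: 3, speed: 55 },
  { lane: -2, row: 10, speed: 62 },
]);
console.log("input length:", numInputs, "state length:", state.length);
```

Under these assumptions, widening the view by one lane per side or adding patches ahead grows the input linearly, which matters later when sizing the network.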

Action

Keep in mind that we are not going to modify or directly handle the available actions in any way.

Our goal is to create an agent that can learn on its own when to use a specific action without explicit instructions.

Reward

After each action taken, we receive a reward that reflects whether the action was good or bad.

Similarly to the actions, we don’t have to worry about the rewards directly.
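To make this division of labor concrete, here is a minimal sketch of the agent loop: the simulator hands over a state plus the reward for the previous action, and the agent returns the index of the next action. `StubBrain` is a hypothetical stand-in that picks random actions; its `forward`/`backward` method names mirror the interface of ConvNetJS's `deepqlearn.Brain`, and the count of five actions is an assumption.

```javascript
// Hypothetical stand-in for a learning agent. It only records rewards and
// returns random action indices so that the loop below can run.
class StubBrain {
  constructor(numActions, rng = Math.random) {
    this.numActions = numActions;
    this.rng = rng;
    this.totalReward = 0;
    this.steps = 0;
  }

  // Receive the reward earned by the previously chosen action.
  backward(reward) {
    this.totalReward += reward;
    this.steps += 1;
  }

  // Choose the next action for the given state (random here).
  forward(state) {
    return Math.floor(this.rng() * this.numActions);
  }
}

const NUM_ACTIONS = 5; // assumption about the size of the action set
const brain = new StubBrain(NUM_ACTIONS);

// Called once per simulation step: learn from the last reward, then act.
function learn(state, lastReward) {
  brain.backward(lastReward);
  return brain.forward(state);
}

const dummyState = new Array(110).fill(80); // placeholder state
const action = learn(dummyState, 0.5);
console.log("chosen action index:", action);
```

The point of the sketch is that our code never computes a reward or executes an action itself; it only passes them between the simulator and the agent.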

Finally, with our states, actions, and rewards defined, we can proceed to Q-Learning.

DQN

While the DQN algorithm looks very simple (and indeed it is), we need to pick good parameters that define the DQN’s behavior (brain) and allow it to learn successfully.
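For reference, the core of the one-step Q-learning update that a DQN approximates can be sketched in a few lines. This is the textbook target, not code taken from DeepTraffic; the function names `tdTarget` and `tdLoss` are illustrative.

```javascript
// Standard one-step Q-learning target: r + gamma * max over a' of Q(s', a').
// For a terminal transition there is no future value to add.
function tdTarget(reward, nextQValues, gamma, terminal = false) {
  if (terminal) return reward;
  return reward + gamma * Math.max(...nextQValues);
}

// Squared TD error for the action that was actually taken.
function tdLoss(qValues, action, target) {
  const diff = qValues[action] - target;
  return 0.5 * diff * diff;
}

// Example: reward 1.0, predicted Q-values for the next state, gamma 0.9.
const target = tdTarget(1.0, [0.2, 0.8, 0.5], 0.9);
const loss = tdLoss([0.4, 1.1, 0.3], 1, target);
console.log("target:", target.toFixed(2), "loss:", loss.toFixed(4));
```

The network is trained to move its prediction for the taken action toward this target, which is where the discount factor `gamma` discussed later enters the picture.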

Hyperparameters

Here is the list of all parameters along with their explanations.

(source: https://github.com/lexfridman/deeptraffic)

While the DQN algorithm is already prepared for us, our goal as competitors is to search the hyperparameter space for the values that yield the most successful solution.
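To give a feel for what such a search space looks like, here is a sketch of a configuration object. The option names are assumed to follow the ConvNetJS `deepqlearn` conventions used by the DeepTraffic starter code, so check them against the repository before relying on them. All values are illustrative placeholders, not the configuration behind the 75.01 mph score, and the count of five actions is an assumption.

```javascript
// Illustrative values only; not the configuration behind the 75.01 mph score.
const lanesSide = 2;
const patchesAhead = 17;
const patchesBehind = 5;
const numInputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
const numActions = 5;     // assumption
const temporalWindow = 0; // number of past frames fed back into the network

// Input width once past states and actions are concatenated to the current state.
const networkSize =
  numInputs * temporalWindow + numActions * temporalWindow + numInputs;

// Network architecture: one hidden layer here, purely as a placeholder.
const layerDefs = [
  { type: "input", out_sx: 1, out_sy: 1, out_depth: networkSize },
  { type: "fc", num_neurons: 64, activation: "relu" },
  { type: "regression", num_neurons: numActions },
];

const opt = {
  temporal_window: temporalWindow,
  experience_size: 10000,       // replay memory capacity
  start_learn_threshold: 500,   // experiences collected before learning starts
  gamma: 0.9,                   // discount factor
  learning_steps_total: 100000, // training iterations
  learning_steps_burnin: 1000,  // initial steps of purely random actions
  epsilon_min: 0.0,             // exploration floor during training
  epsilon_test_time: 0.0,       // exploration during evaluation
  layer_defs: layerDefs,
  tdtrainer_options: {
    learning_rate: 0.001,
    momentum: 0.0,
    batch_size: 64,
    l2_decay: 0.01,
  },
};

console.log(JSON.stringify(opt, null, 2));
```

Every field in `opt`, along with the three input variables and the layer list, is a knob that can be tuned.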

75.01 MPH Solution

Here is my solution, which scores 75.01 mph (top 2%).

Feel free to use it as starter code, improve it, and beat my score.

github.com/gsurma/deep_traffic (MIT DeepTraffic 75.01 mps solution)

Visualization

Insights

I encourage you to read the DeepTraffic paper for valuable insights.

Here are a couple of them.

Increase the learning input.

It makes sense because, in order to achieve a better score, we need to allow our RL agent to learn how to avoid traffic pockets that can significantly slow it down.

The further we look, the more information about the future we are going to feed our model with, thus allowing us to optimize for such long-term situations.

(source: https://arxiv.org/pdf/1801.02805.pdf)

Increase the network size and training time.

This is directly connected with the previous insight.

In order to handle bigger learning inputs, we are going to need bigger networks that require more training iterations.
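A quick back-of-the-envelope calculation shows why. The sketch below counts the trainable parameters of a fully connected network for two input configurations. The input-size formula, the single hidden layer of 32 neurons, and the five output actions are all assumptions chosen for illustration.

```javascript
// Assumed input size: visible lanes times visible patches per lane.
function inputSize(lanesSide, patchesAhead, patchesBehind) {
  return (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
}

// Number of weights and biases in a fully connected network.
function countParameters(numInputs, hiddenSizes, numActions) {
  const sizes = [numInputs, ...hiddenSizes, numActions];
  let total = 0;
  for (let i = 0; i < sizes.length - 1; i++) {
    total += sizes[i] * sizes[i + 1]; // weights between layer i and i + 1
    total += sizes[i + 1];            // one bias per neuron in layer i + 1
  }
  return total;
}

const smallInput = inputSize(1, 5, 1);  // 3 lanes x 6 patches
const largeInput = inputSize(2, 17, 5); // 5 lanes x 22 patches

console.log("small view:", smallInput, "inputs,",
  countParameters(smallInput, [32], 5), "parameters");
console.log("large view:", largeInput, "inputs,",
  countParameters(largeInput, [32], 5), "parameters");
```

Even with the hidden layer held fixed, the larger view multiplies the parameter count several times over, and more parameters generally need more training iterations to fit.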

(source: https://arxiv.org/pdf/1801.02805.pdf)

Increase the discount factor (gamma).

The higher the discount factor, the more we value future events.

It means that we are going to pick actions that maximize long-term benefits instead of the short-term ones.

It may lead to learning behaviors that are unfavorable in the short term but beneficial in the long term, such as decelerating and changing lanes to avoid big clusters of cars.
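A small numeric sketch makes this trade-off visible. The two reward sequences below are hypothetical and only meant to contrast a greedy strategy with a patient one; `discountedReturn` is the standard sum of rewards weighted by powers of gamma.

```javascript
// Discounted return: sum over t of gamma^t * r_t.
function discountedReturn(rewards, gamma) {
  return rewards.reduce((sum, r, t) => sum + Math.pow(gamma, t) * r, 0);
}

// Hypothetical per-step rewards for two strategies.
const stayInLane = [1, 1, 0, 0, 0, 0];    // fast now, then stuck behind a cluster
const slowAndSwitch = [0, 0, 1, 1, 1, 1]; // slow down to change lanes, then cruise

for (const gamma of [0.5, 0.95]) {
  const stay = discountedReturn(stayInLane, gamma);
  const change = discountedReturn(slowAndSwitch, gamma);
  const better = stay > change ? "stay in lane" : "slow down and switch";
  console.log(`gamma=${gamma}: stay=${stay.toFixed(3)}, ` +
    `switch=${change.toFixed(3)}, preferred: ${better}`);
}
```

With gamma at 0.5 the short-sighted strategy has the higher return, while at 0.95 the patient strategy wins, which is the behavior shift described above.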

(source: https://arxiv.org/pdf/1801.02805.pdf)

What’s Next?

By now you should be able to create an RL agent that successfully navigates through dense traffic.

Don’t hesitate to submit your scores and participate in the competition.

I encourage you to play with the hyperparameters to improve your scores and to share your ideas, so we can collectively reach higher scores and create better agents.

On another note, I recommend checking out the Artificial Intelligence podcast by Lex Fridman (the MIT Deep Learning instructor, recently called the ‘Joe Rogan of AI’), where he speaks with some of the most interesting people in the world about AI-related topics.

Don’t forget to check the project’s GitHub page, gsurma/deep_traffic.
