Atari – Solving Games with AI???? (Part 2: Neuroevolution)

If you missed Part 1, or…towardsdatascience.comSimilarly to the above example of the evolutionary algorithm used in the game of Snake, we are also going to map specific model weights(chromosomes) with evaluation scores returned by the game environment.Afterward, we are going to perform an evolution of the population of the chromosomes (model weights) using the ‘survival of the fittest’ principles, which will ultimately lead us to selecting the top chromosomes..While our neuroevolution agent will optimize to successfully play Atlantis, such a system would be able to solve any black box problem with a similar structure (i.e reward function, ease of emulation, etc.).In other words, we can describe our approach as follows.We have a Convolutional Neural Network model that takes game state in the form of pixels as an input and returns an action to perform as an output.Our goal is to optimize the model’s weights so that the network’s outputs give us the best actions/results.We are going to optimize our network with an Evolutionary Algorithm.With that being said, let’s get straight to the code and create our Neuroevolution agent!Atari – Genetic EvolutionYou can find the full codebase of the Atari project heregsurma/atariAI research environment for the Atari 2600 games ????..- gsurma/atarigithub.comand I encourage you to check it right now and follow along.This is how our GE algorithm looks like.While it may look overwhelming at the first sight, we are going to go through it step by step.SelectionWe are selecting the top performers of the population (10%) based on their fitness scores..Only the selected chromosomes will be allowed to procreate and breed a new generation.For each chromosome, we are performing a gameplay and storing a final score which will be used for an evaluation.In order to perform a ‘gameplay for chromosome’, we need to set the weights of the model with a specified chromosome and then perform every move based on the model’s predictions..This is a key for mapping chromosomes with the evaluation scores.You may ask now if we can set the weights of the model with a vector of any shape.No, we cannot..Our chromosomes need to have the exact shape as our model’s parameters.To give you a better understanding of this concept, let’s look at the following piece of code that sets the initial population with the random values.In line 2 we are getting an array of weights from the CNN model..Specific values are not important there because we are going to overwrite them anyway but with such an array, we can easily derive its shape.Afterward, from line 7 to line 22 we are disentangling the network array with a series of the for-loops that can access every value in the array..Ultimately, we are going to access and set each of the 1 686 180 params.To wrap things up, in the selection phase each chromosome in the population gets evaluated in the game environment and only the top performers get qualified for the next step.CrossoverIn the crossover stage, we are going to mix the top chromosomes from the selection phase and get their offsprings to populate a new generation.We are going to match our parents based on the roulette selection..Without going into the further details (which I already covered), its underlying concept states that the better the score of a single chromosome, the more likely it is to reproduce.After we have picked our pairs of chromosomes, we actually need to make them perform a crossover.There are a couple of ways to perform a crossover and in this project, we are going to perform the one in which we randomly change the corresponding genes across the two parents.MutationAfter we’ve breed our new population, there is one more step to go that would keep our population genetically diverse – mutation.The concept is very simple, to maintain a genetic diversity we are going to set random values to stochastically selected (with the probability of 0.01) genes.Keep in mind that the above code with the for-loop array unwrapping can (and should) be refactored and optimized..I just left it like this to give you a better understanding of how each network’s layer/parameter gets accessed.TrainingNow as we’ve covered how a single generation is being created and evaluated, we can run our algorithm and watch how our population gets better with every generation.I’ve picked the following hyperparams for our GE algorithmbut you should definitely play with them and hopefully get even better results.~220 training generations (normalized scores)Above training phase took about a week on the 2.9 GHz Intel i7 Quad-Core CPU but there are a couple of ways that can speed up the process:reduce the network’s size, it’s definitely possible to get decent scores with way less than 1 686 180 paramsoptimize the chromosome un-wrapping codespeed up the game emulation phase (the biggest bottleneck)engage GPUsTestingFinally, after ~220 generations of training we can evaluate our final model which is an outcome of the two top performers from the last generation.Human average: ~29,000GE average: ~31,000 (106%)What’s next?We’ve proved that it’s possible to optimize Neural Network’s weights with a Genetic Evolution algorithm in order to successfully play Atlantis on an above-human level..The very important thing about such algorithm is its general nature..We didn’t have to hardcode any domain-specifics or use heuristics to make it successfully play Atlantis.. More details

Leave a Reply