Top dog: AlphaZero / AlphaStar

AlphaZero and AlphaStar are two AIs built by Google’s DeepMind to play games at a professional level.

Building AIs to beat humans (or now, previous AIs) at games has always been a way to evaluate AI systems, as it is low-risk to test but still captures headlines and interest.

AlphaZero came out in late 2017 as an AI that plays chess, shogi and Go, and it was able to beat the previous top AI for each game.

Go was a hard game to crack, and AlphaGo, the first AI to beat a professional human player at Go, came out in October 2015.

By contrast, Deep Blue, the first AI to beat a world champion at chess, came out in 1996.

That’s almost a 20-year gap, which gives a good picture of how much more complex Go is than chess.

AlphaGo was also made by DeepMind and is a precursor to AlphaZero.

AlphaZero was unique because of how it learned to play the games.

AlphaZero used deep neural networks that were given only the rules of the games.

It didn’t train on ‘real’ games that had been played by humans, or on any human strategy for how to win.

Instead, it trained only on games played against itself.

The process of having AlphaZero learn by playing against itself is called self-play reinforcement learning.

It’s where an untrained neural-network system plays games against itself millions of times over, and is programmed with ‘rewards’ for winning and ‘punishments’ for losing, just like operant conditioning in psychology.

The games start out completely random, but as the system learns from its wins and losses it starts making adjustments to its gameplay to try to win.
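As a rough illustration of that idea, here is a minimal self-play loop on the toy game of Nim, where players alternately take 1 to 3 sticks and whoever takes the last stick wins. This is not DeepMind’s actual algorithm (AlphaZero pairs deep neural networks with Monte Carlo tree search); the game, the lookup table, and the learning and exploration rates below are all assumed stand-ins:

```python
import random
from collections import defaultdict

# A minimal sketch of self-play reinforcement learning, standing in for
# chess/shogi/Go. The tabular learner below only illustrates the idea of
# rewards and punishments shaping initially random play.

Q = defaultdict(float)       # learned value of (pile, take) pairs, starts empty
ALPHA, EPSILON = 0.5, 0.1    # assumed learning rate and exploration rate

def choose(pile):
    """Pick how many sticks to take (1-3): mostly greedy, sometimes random."""
    moves = [t for t in (1, 2, 3) if t <= pile]
    if random.random() < EPSILON:
        return random.choice(moves)                    # explore a random move
    return max(moves, key=lambda t: Q[(pile, t)])      # exploit learned values

for game in range(50_000):   # the real systems play millions of games
    pile, history = 15, []   # 15 sticks; taking the last stick wins
    while pile > 0:
        take = choose(pile)
        history.append((pile, take))
        pile -= take
    reward = 1.0             # 'reward' for the winner's moves...
    for seen, take in reversed(history):
        Q[(seen, take)] += ALPHA * (reward - Q[(seen, take)])
        reward = -reward     # ...'punishment' for the loser's, alternating back

# The learned policy should leave the opponent a multiple of 4 sticks.
print({p: max((1, 2, 3), key=lambda t: Q[(p, t)]) for p in range(4, 16)})
```

After enough self-play, the table converges on Nim’s known winning strategy (leave the opponent a multiple of four sticks) without ever being shown a human game.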

The amount of time needed to train for this varied per game depending on complexity.

Chess took 9 hours, shogi took 12 hours and Go took 13 days.

Past board game AIs used different kinds of tree searches combined with human-created heuristics to prune the search space.

AlphaZero was able to beat the previous top AIs for the different games after only hours (or, for Go, days) of training, building really good machine-created heuristics of its own along the way.
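For contrast, here is a minimal, runnable sketch of that classical recipe, minimax search with alpha-beta pruning cut off by a hand-written heuristic, using the same toy Nim game as above. Real engines like Stockfish follow this skeleton with far deeper search and far richer hand-tuned evaluations:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Nim:
    pile: int                              # sticks left; taking the last one wins
    def moves(self):
        return [t for t in (1, 2, 3) if t <= self.pile]
    def apply(self, take):
        return Nim(self.pile - take)

def evaluate(state, maximizing):
    # Human-created heuristic: a pile that is a multiple of 4 is lost for
    # the player to move. A chess engine would count material, mobility, etc.
    return 1 if (state.pile % 4 != 0) == maximizing else -1

def alphabeta(state, depth, alpha, beta, maximizing):
    if state.pile == 0:                    # previous player took the last stick
        return -1 if maximizing else 1
    if depth == 0:
        return evaluate(state, maximizing) # cut off the search with the heuristic
    best = -2 if maximizing else 2
    for take in state.moves():
        value = alphabeta(state.apply(take), depth - 1, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:                  # prune: this line can't change the result
            break
    return best

# Score each opening move from a pile of 10 with a shallow 4-ply search.
start = Nim(10)
print({t: alphabeta(start.apply(t), 3, -2, 2, False) for t in start.moves()})
# Taking 2 leaves 8 sticks (a multiple of 4), the only winning move.
```

Here both the pruning and the evaluation function are hand-crafted; AlphaZero’s departure was to learn the evaluation itself from self-play.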

The image below shows how AlphaZero performed against the top AI for each game once training was over:

AlphaStar came out in January 2019 and is the first AI to beat a professional player at StarCraft II.

StarCraft is a game that has been around since 1998 and is the 5th best-selling PC game of all time.

It is considered one of the most complex Real-Time Strategy (RTS) games, and it is also one of the longest-played RTS games.

This makes it a great game for testing AI against more real-world challenges.

StarCraft II is a very complicated game.

There are 3 different races that a player can choose to play, and they happen to be so different from each other that most professional players focus on playing only one particular race.

Players also have to build things early to be able to use them later in the game, managing a large economic system and balancing short-term goals against long-term ones.

Competitors also have to actively scout to find out what the opposing player is doing; unlike chess and Go, there is imperfect information between players.

AlphaStar was trained in a similar manner to AlphaZero, using neural networks and reinforcement learning, but it also used supervised learning.

It used game data from real human tournament games for the initial supervised learning phase of the training.

This allowed AlphaStar to imitate the basic strategies used by players, and got it to the point where it could defeat most decent hobby players and the game’s built-in “Elite”-level AI, but not professional players.
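In code, that initial phase is ordinary supervised learning: predict the human’s action from the game state. Here is a minimal sketch in PyTorch, with a toy network and randomly generated stand-in data in place of real replay features; the dimensions are hypothetical, and AlphaStar’s actual architecture is vastly larger:

```python
import torch
import torch.nn as nn

# A minimal sketch of the supervised "imitation" phase: learn to match the
# action a human took in a given game state, using replay data.

STATE_DIM, NUM_ACTIONS = 128, 10           # hypothetical sizes
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS)
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for real replay data: random states and random "human" actions.
states = torch.randn(1024, STATE_DIM)
human_actions = torch.randint(0, NUM_ACTIONS, (1024,))

for epoch in range(10):
    logits = policy(states)                # predicted action scores per state
    loss = loss_fn(logits, human_actions)  # penalize disagreeing with the human
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: imitation loss {loss.item():.3f}")
```

Imitation alone can only get as good as the humans it copies, which is why the reinforcement learning phase follows.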

In only 14 days of training, they were able to simulate 200 years of play for each agent, with each agent ending up with a slightly different gameplay strategy.

The chart below shows the increase in the agents’ strength once training switched from supervised learning to multi-agent reinforcement learning (called the AlphaStar League below).

TLO and MaNa are the two pros that AlphaStar was able to defeat in competition.
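The league idea can be sketched very loosely: keep a growing population of agents, have them play one another, and reinforce winners while retaining old agents so newcomers must stay robust against earlier strategies. Everything below is a hypothetical stand-in; in particular, the numeric “strength” update takes the place of a real gradient update to a neural network:

```python
import random

random.seed(0)

# A loose sketch of multi-agent league training. Each "agent" is just a
# number here; in AlphaStar each is a full neural network trained by
# reinforcement learning on its match outcomes.
league = [{"name": f"agent_{i}", "strength": 1.0} for i in range(8)]

def play(a, b):
    """Hypothetical match: the stronger agent wins more often."""
    p_a = a["strength"] / (a["strength"] + b["strength"])
    return (a, b) if random.random() < p_a else (b, a)

for step in range(10_000):
    a, b = random.sample(league, 2)        # match two league members
    winner, loser = play(a, b)
    winner["strength"] += 0.01             # stand-in for a learning update
    # Losers stay in the league: new agents must keep beating old
    # strategies, not just the current strongest one.
    if step % 2_000 == 0:                  # periodically freeze a snapshot
        league.append({"name": f"snapshot_{step}", "strength": winner["strength"]})

print(sorted(((round(m["strength"], 2), m["name"]) for m in league), reverse=True))
```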

AlphaStar was able to overcome many challenges that have plagued AI for years:

Game theory: There is no single best strategy for StarCraft.

So the AI needs to constantly try new strategies to find what works, which helps expand its strategic knowledge.

Imperfect information: Unlike AlphaZero, which dealt in games with perfect information, where both players can see the entire game board at all times, StarCraft has imperfect information, where opposing players need to actively scout to discover what the other player is doing.

Long-term planning: StarCraft games take approximately an hour to play.

And some of the actions taken early in the game may not yield results until much later in the gameplay.

Real time: Again unlike AlphaZero, whose games are turn-based (where it’s not uncommon for humans to take time to think about each move before making it), StarCraft must be played in real time, with both players taking actions (hundreds of actions a minute) simultaneously and continuously for the entire game.

Large action space: Whereas Go on a 19×19 board was a big step up from the 8×8 chessboard, StarCraft gameplay uses an even larger space, with hundreds of different units that can be controlled and modified.

At every time-step there are, on average, about 10²⁶ possible actions.

By overcoming these challenges in a more complex game environment, this work pushes AI toward many more possibilities in real-world environments.

AlphaZero and AlphaStar were able to show how improving their algorithms and using reinforcement learning can have a large impact on an AI’s effectiveness.

Testing these ideas in gameplay is a great proof of concept that others can hopefully take and build AIs that help solve real-world issues.

Sources:
https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
