The Science Behind AlphaStar

In classic chess, players can safely take up to an hour to evaluate a single move, but in StarCraft II actions need to be taken in real time.

From the AI perspective, this means that agents need to evaluate thousands of options in real time and pick the one that best fits the long-term strategy.

· Large Action Space: If you think that a 19×19 Go board is a large AI environment, think again.

The StarCraft II environment requires players to control hundreds of units at any given time, and the number of possible action combinations grows rapidly with the complexity of the environment.
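To make the scale concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every unit independently picks one command from a fixed menu; the function name `joint_action_count` and the figure of 10 commands per unit are hypothetical and are not taken from DeepMind's work.

```python
# Illustrative arithmetic only: the per-unit command count is an assumption,
# not a figure from the StarCraft II environment.

GO_BOARD_POINTS = 19 * 19  # upper bound on stone placements for one Go move


def joint_action_count(num_units: int, commands_per_unit: int) -> int:
    """Number of joint commands if every unit picks one command independently."""
    if num_units < 0 or commands_per_unit < 1:
        raise ValueError("need num_units >= 0 and commands_per_unit >= 1")
    return commands_per_unit ** num_units


if __name__ == "__main__":
    for units in (1, 10, 100):
        count = joint_action_count(units, commands_per_unit=10)
        print(f"{units:>3} units -> {count:.3e} joint commands "
              f"(Go board: {GO_BOARD_POINTS} points)")
```

Under this simplified model the count grows exponentially with the number of units, which is why even a modest army dwarfs the choices available on a Go board.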

Many of these challenges are present in other strategy games but none at the magnitude of StarCraft II.

In order to master the game, DeepMind needed a different strategy.

The AlphaStar Approach

In the StarCraft II challenge, DeepMind resorted to a strategy that has been widely successful for them in the past: let the AI master the game by playing against itself.

The core of the AlphaStar architecture is a deep neural network that receives input data from a game interface and outputs a series of actions.

The neural network was initially trained with traditional supervised learning on a dataset of anonymized human games released by Blizzard.
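The idea of learning from human games can be sketched with a toy imitation policy. This is not AlphaStar's network: it is a tabular stand-in that simply copies the action humans chose most often in each situation. The observation and action names, and the function `fit_imitation_policy`, are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (observation, action) pairs standing in for human replays.
# Real replays and AlphaStar's neural network are far richer than this table.


def fit_imitation_policy(demonstrations):
    """Map each observation to the action that appears most often for it."""
    counts = defaultdict(Counter)
    for observation, action in demonstrations:
        counts[observation][action] += 1
    return {obs: actions.most_common(1)[0][0] for obs, actions in counts.items()}


if __name__ == "__main__":
    replays = [
        ("enemy_scout_seen", "build_defense"),
        ("enemy_scout_seen", "build_defense"),
        ("enemy_scout_seen", "expand"),
        ("minerals_high", "train_workers"),
    ]
    policy = fit_imitation_policy(replays)
    print(policy["enemy_scout_seen"])
```

A neural network plays the same role as the lookup table here, with the advantage that it can generalize to observations it has never seen.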

This initial training allowed AlphaStar to learn the basic strategies of the game at a decent level, but it was still far from beating a professional player.

Once AlphaStar was able to play StarCraft II successfully, the DeepMind team created a multi-agent reinforcement learning environment in which multiple variations of the agent play against one another.

Named the AlphaStar League, the system allows the agent to improve on specific strategies by playing against a version specialized in that strategy.
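A heavily simplified sketch of league-style training follows. It assumes each agent can be summarized by a single skill number, that match outcomes follow a logistic (Elo-like) model, and that every game nudges the learner's skill upward; all of these are placeholders for real policies and real reinforcement learning updates, and none of the names come from DeepMind's system.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    skill: float  # hypothetical scalar standing in for a trained policy


def play_match(a, b, rng):
    """Return the winner; the stronger agent wins more often (logistic model)."""
    p_a_wins = 1.0 / (1.0 + 10 ** ((b.skill - a.skill) / 400.0))
    return a if rng.random() < p_a_wins else b


def run_league(generations, matches_per_generation, seed=0):
    """Train one learner against a growing pool of frozen past versions."""
    rng = random.Random(seed)
    league = [Agent("gen0", 1000.0)]
    learner = Agent("learner", 1000.0)
    for gen in range(1, generations + 1):
        for _ in range(matches_per_generation):
            opponent = rng.choice(league)  # earlier versions stay in the pool
            winner = play_match(learner, opponent, rng)
            # Placeholder for a reinforcement learning update (assumption):
            # losses are treated as slightly more informative than wins.
            learner.skill += 4.0 if winner is learner else 8.0
        league.append(Agent(f"gen{gen}", learner.skill))  # freeze a snapshot
    return league


if __name__ == "__main__":
    for agent in run_league(generations=5, matches_per_generation=20):
        print(agent.name, round(agent.skill))
```

Keeping older snapshots in the pool is the point of the design: the learner has to keep beating earlier strategies rather than only its most recent self.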

The following figure illustrates the general architecture of AlphaStar.

The impact that the AlphaStar League had on AlphaStar was remarkable.

The AI agent was able to master an incredibly large number of strategies that most human players never get exposed to.

The AlphaStar League constantly added new competitors that brought more refined strategies consisting of new build orders, unit compositions, and micro-management plans.

The progress of AlphaStar as it went through the League training is shown in the following chart.

AlphaStar in Action

The DeepMind team evaluated AlphaStar against two top professional players: TLO and MaNa.

The match against TLO focused on the Protoss race, and AlphaStar dominated with an astonishing 5–0 score.

After that match, AlphaStar was ready to face MaNa, who is considered one of the strongest StarCraft II players in the world.

AlphaStar won that match 5–0 as well.

The following animation captures the game views from AlphaStar and MaNa showing how the observations translate into actions by the neural network.

Compared to human players, AlphaStar had both unique advantages and tangible weaknesses.

Obviously, the League-based training allowed AlphaStar to be exposed to strategies never seen by human players.

Similarly, AlphaStar was able to see the whole map of the game, while human players need to constantly decide which area to focus their attention on.

However, human players are able to issue hundreds of actions per minute while AlphaStar was notably limited in this area.

As you can see in the following graphic, both MaNa and TLO issued a considerably larger number of actions per minute than AlphaStar.
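For readers unfamiliar with the metric, actions per minute is simple arithmetic over a game's action log. The sketch below assumes a hypothetical log of action timestamps in seconds; the function name and the sample numbers are made up for illustration and are not measurements from these matches.

```python
def actions_per_minute(action_times_s, game_length_s):
    """Average actions per minute from action timestamps (s) and game length (s)."""
    if game_length_s <= 0:
        raise ValueError("game_length_s must be positive")
    return len(action_times_s) * 60.0 / game_length_s


if __name__ == "__main__":
    # Hypothetical log: one action every 0.25 seconds for two minutes.
    log = [i * 0.25 for i in range(480)]
    print(actions_per_minute(log, game_length_s=120))
```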

AlphaStar represents a major breakthrough in AI, and one that opens the door to all sorts of new challenges.

The principles of AlphaStar can be applied to many problems that require long-term strategic planning with imperfect information.

From economic policy planning to weather predictions, we are likely to see AI agents inspired by AlphaStar tackle these challenges in the near future.

