The Brains in Games: Video Game AI

Our boy can zoom around the screen, chasing the mouse pointer, and even swerve to avoid obstacles that get in his way.

The simple steering algorithms that guide his movement produce a complex pattern of behaviours that gives the illusion of “smart” behaviour.

But there’s one area where our boy’s simple brain lets him down.

A straight wall or another boy in his path doesn’t faze him — he simply veers away until he has cleared the obstacle.

But more complex shapes — a dead-end corridor or even a deep corner — present an insurmountable barrier.

Any time he has to briefly move away from his target in order to clear an obstacle, the boy is completely stumped.

He stays trapped in the corner, sadly bumping his head against the wall, with no idea how to proceed.

To help him, we need to give him some way of thinking ahead — of understanding that in order to move forward, he must first head backwards.

The way we’re going to achieve this is called “pathfinding”.

We’ll give the boy an imaginary map which lets him plot a course to his goal, avoiding intervening obstacles, and reversing direction when he needs to escape from dead ends.

Imagine that there’s an invisible grid overlaying the world our boy is exploring.

The grid is composed of dozens of nodes, each connected to its neighbours by hair-thin lines.

Where such a line crosses only empty space, we call it an “edge” in our network.

But when the line is impeded by an obstacle — a wall or one of the boy’s friends — then we say that the nodes are “unconnected”.

The result is a kind of negative-space version of the world.

Where there is open space, the grid is a regular mesh of nodes and edges connecting them.

But walls or other obstacles puncture the mesh, creating gaps in the network.
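To make that concrete, here is a minimal sketch of how such a grid might be built in Python. Treat it as an illustration rather than this essay’s actual code: it assumes the world’s obstacles are axis-aligned rectangles, and it lays the nodes out at regular intervals.

```python
def segment_hits_rect(a, b, rect):
    # Grid edges are horizontal or vertical, so a bounding-box overlap
    # test against an axis-aligned rectangle (x0, y0, x1, y1) suffices.
    (x0, y0), (x1, y1) = a, b
    rx0, ry0, rx1, ry1 = rect
    return (min(x0, x1) <= rx1 and max(x0, x1) >= rx0 and
            min(y0, y1) <= ry1 and max(y0, y1) >= ry0)

def build_grid(cols, rows, walls, spacing=20.0):
    # Nodes sit at regular intervals across the world.
    nodes = [(c * spacing, r * spacing)
             for c in range(cols) for r in range(rows)]
    edges = {n: set() for n in nodes}
    # Connect each node to its right and lower neighbours, but only
    # where the hair-thin line between them crosses empty space.
    # Blocked pairs stay unconnected, leaving gaps in the mesh.
    for c in range(cols):
        for r in range(rows):
            a = (c * spacing, r * spacing)
            for dc, dr in ((1, 0), (0, 1)):
                if c + dc < cols and r + dr < rows:
                    b = ((c + dc) * spacing, (r + dr) * spacing)
                    if not any(segment_hits_rect(a, b, w) for w in walls):
                        edges[a].add(b)
                        edges[b].add(a)
    return edges
```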

This grid is the map our boy uses to plot a course.

Instead of accelerating directly towards his goal, the boy is more considered.

He finds the node in the grid which is closest to himself, and the node which is closest to his goal.

Then he calculates a path along the grid’s edges, through all of the intervening nodes.

Where there are no walls, the path is a straight line through those nodes which are directly between him and his goal.

But where there are obstacles in his path, the grid’s edges do not connect across these.

The boy’s path must route around this gap in the grid.

With this path in mind, the boy can confidently find his goal.

Instead of shooting off directly towards his target, he heads towards each successive node in his path, assured that, since an edge connects the nodes, there are no walls in his way.

How the boy constructs the path through the nodes and edges is one of those pure-maths problems that is intensely interesting to a very specific set of people, and incredibly dull to everyone else, so I will skip that here.

Suffice it to say that it is a well-explored area of mathematics that can reliably find more-or-less optimal paths through an arbitrary grid at extremely impressive speeds.

This impressive speed means that our boy can recalculate his path as his goal moves, seamlessly reorienting to duck around walls or change direction completely to find his way unerringly to his target, wherever it hides.
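For the curious, though, here is a minimal sketch of one standard algorithm from that well-explored family, A* search, written against the `edges` mapping from the grid sketch above. The helper names are mine, for illustration, not this essay’s actual code.

```python
import heapq
import math

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def find_path(edges, start, goal):
    # A* search: always expand the cheapest-looking node first, where
    # "cheapest-looking" means cost travelled so far plus straight-line
    # distance remaining to the goal.
    frontier = [(distance(start, goal), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path  # list of nodes from start to goal
        for neighbour in edges[node]:
            new_cost = cost + distance(node, neighbour)
            if new_cost < best_cost.get(neighbour, float("inf")):
                best_cost[neighbour] = new_cost
                estimate = new_cost + distance(neighbour, goal)
                heapq.heappush(frontier,
                               (estimate, new_cost, neighbour, path + [neighbour]))
    return None  # every route to the goal is blocked
```

The boy then simply steers towards each node of the returned path in turn.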

Confronted with even the most fiendish of mazes, the boy sets off immediately, never taking a wrong turn, and zooming through the twists and turns without hesitation.

In fact, the boy’s unerring skill at solving mazes has an interesting, counterintuitive effect.

Previously, the boy’s eager seeking of his goal, and his thoughtless blundering into corners and dead-ends had a kind of naive charm to it.

He was relatable in his overeagerness.

Now, his implacable tracking towards his target and unhesitating avoidance of anything in his path eliminates that charming fallibility.

The illusion of the boy having thoughts and desires is shattered.

Instead of an eager boy, we have an unthinking automaton.

By making the boy smarter, we have, ironically, made him seem entirely mindless.

The problem is that the boy has been given perfect knowledge of the maze, and can recall that information instantly and without error.

We can no longer so easily identify with his situation.

To bring his behaviour back into the familiar realm of the human, we have to reintroduce elements of discovery and error.

This also requires us to add a great deal of complexity.

Previously the boy was given a map of the maze — a network of nodes and edges showing where walls blocked his path.

Now, we give him a blank slate.

His network of nodes and edges starts off complete, with each node having an edge to each of its neighbours.

The boy believes that all areas can be reached from anywhere, until he learns otherwise.

This learning process is governed by a list of known walls he keeps in his memory.

Every wall he knows about is stored there.

When the boy sees a new wall, he pops its dimensions into his memory-bank, and he updates his understanding of the world.

He finds all the edges in his map of the world which intersect with this new wall, and prunes them from the grid.

Then he recalculates his path to his goal.

Where before a path existed, now the edges between the nodes have been removed, and the path is gone.

The boy must find a new route.
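Sketched in the same style as before, that learning step might look like this. The `boy` object and its attributes are hypothetical stand-ins for whatever the real simulation uses, and the sketch reuses the `segment_hits_rect` and `find_path` helpers from above.

```python
import math

def nearest_node(edges, point):
    # The grid node closest to an arbitrary point in the world.
    return min(edges, key=lambda n: math.hypot(n[0] - point[0],
                                               n[1] - point[1]))

def observe_wall(boy, wall):
    # Remember the newly seen wall...
    boy.known_walls.append(wall)
    # ...prune every edge it blocks from the boy's mental map
    # (which started out with all of its edges intact)...
    for a in boy.edges:
        for b in list(boy.edges[a]):
            if segment_hits_rect(a, b, wall):
                boy.edges[a].discard(b)
                boy.edges[b].discard(a)
    # ...and replan, since the old path may now cross a gap.
    boy.path = find_path(boy.edges,
                         nearest_node(boy.edges, boy.position),
                         nearest_node(boy.edges, boy.goal))
```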

As the boy moves around the maze, he encounters new walls and updates his map accordingly.

When he discovers a dead end at the end of his route, he turns around and tries another way.

Clever but a bit clumsy

When we watch the boy traverse the maze with this new algorithm, he is relatable again.

He stops, reconsiders, turns around and tries new approaches.

We watch him take a wrong turn, discover his error, and then return to try another way.

He takes far longer to complete the maze than previously, due to his many missteps.

But, oddly, he seems more intelligent, not less.

He’s slower at completing the maze, but now he’s genuinely solving a puzzle, and watching him, we can empathise with him.

He learns from his mistakes, and slowly becomes an expert at traversing the maze he’s learnt.

Our boy is now quite sophisticated.

He has a memory, he can see things in the world around him, and he can make plans and update them as his understanding of the world changes.

But we’ve also found something interesting: What makes the boy seem intelligent isn’t how sophisticated his processes are, or how good he is at completing tasks.

“Intelligence” is something less tangible.

It’s about how much the boy’s behaviour resembles human behaviour.

It’s about whether, looking at him, we can see ourselves.

Goal-Oriented Action Planning

We’re nearly done with our boy.

We’ve taught him steering, so he can seek out or flee a goal, and avoid obstacles along the way.

We’ve taught him pathfinding, so that he can find his way to a goal even if that requires him to navigate a complex path.

We’ve even given him a rudimentary memory — he can remember the walls he’s encountered, and use that memory to update his plans for getting to his goal.

But all this goal-seeking feels a little short-sighted.

Our boy is a world-class talent at seeking a spot on a screen, but he has little idea what he’s going to do once he gets there.

He’s got short-term goals, but no longer-term ambitions.

Let’s fix that.

Let’s give our boy some facility for making plans: “first I’ll do this, then I’ll do that”.

We’ll give him some proficiency not just at goal-seeking, but actual problem solving.

This is what allows, for example, the guards in an action video game to patrol an area, and attack any intruders they discover.

It’s the difference between the ghosts in Pac-Man, which each follow their own simple routine to chase the player around the maze, and the characters in The Sims, who lead complex lives and follow sophisticated routines, only needing the barest guidance from a human player.

There are lots of ways to do this, but the way we’re going to use is one that’s been used in some form in many popular video games.

It’s called “goal-oriented action planning”, and it draws on some of the same pathfinding algorithms we’ve already learned about.

Imagine you’re in a locked room, and you want to get out.

In the room, you can see a key.

In your mind, you form a plan: go to the key, pick up the key, go to the door, unlock the door, open the door, leave the room.

It’s not a complex plan, but it’s quite important.

You can’t open the door until you’ve unlocked it.

There’s no point in going to the door until you’ve got the key.

The sequence of actions needs to be performed in a particular order to succeed.

Imagine a more complex scenario: getting ready to go to work in the morning.

There’s a whole host of actions you need to take in order to get to work, dressed, clean, and on time.

Some of them can happen in any order, some of them have complex requirements.

To eat your breakfast, you need to go to the fridge, take out the milk, go to the counter, take out a bowl, and so on.

To take a shower, there’s a whole other sequence of actions you must take.

But whether you take a shower first, or eat your breakfast first has no bearing on how successfully you eventually get to work.

The key insight of goal-oriented action planning is that these sequences of actions can be expressed using the same “nodes and edges” concepts that we’ve used for pathfinding.

The simple locked room scenario is a straight path.

To get to “pick up key”, you must pass through “go to key”.

To get to “unlock door”, you must have passed through both “pick up key” and “go to door”.

There’s only one way of ordering the sequence to achieve the final goal.

The “getting ready for work” graph is much more complex.

There are straight paths within — the “fridge — counter — bowl — breakfast” path, for example.

But linking these is a complex web of connections — things that can happen in any order — you can brush your teeth in the shower, if you choose (though it feels wrong to me, somehow).

You can put on both socks, then both shoes, or complete one foot’s dressing completely before moving on to the next (which also feels wrong to me — like one foot is briefly overdressed).

That over-long digression into my opinions on morning routines is a roundabout way of describing something that, in the abstract language of computers, can be represented quite simply, by two linked concepts.

“States” are things which are true about the world: have you showered? Have you got your toothbrush?

“Actions” are things you can do: “get in the shower”, “pick up your toothbrush”.

Actions have consequences — they change the state of the world — and they also have requirements — states that must be true for the action to be taken.

In the language of algorithms, states are nodes, and actions are edges — they form a “graph”, exactly like the grid that helps our boy navigate the maze, but this time it’s a map of abstract concepts, not physical space.
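In code, that pair of concepts can be surprisingly small. A minimal sketch, assuming states are plain strings and each action is a record of its requirements and consequences:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    requires: frozenset   # states that must be true before acting
    adds: frozenset       # states this action makes true
```

One simplification worth flagging: in this sketch, actions only ever add states. A fuller planner would also let actions remove states (eating your breakfast should make “hungry” false).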

To make a plan, the boy calculates a path through this network of states and actions, until he reaches our “goal” state.

This is something our boy already knows how to do!

We’ve already learned about pathfinding, and about building a grid of nodes and edges representing physical paths.

The only change is to build this grid in the realm of more abstract states.
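With that representation in place, the planner really is just another graph search, a close cousin of the pathfinding above. Here is a minimal sketch using breadth-first search over the `Action` records from the previous sketch, with the locked-room scenario from earlier as the worked example; all of the state and action names are my own inventions for illustration.

```python
from collections import deque

def plan(start_states, goal, actions):
    # Each set of true states is a node; each applicable action is an
    # edge to a new node. Breadth-first search finds a shortest plan.
    start = frozenset(start_states)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        states, steps = frontier.popleft()
        if goal in states:
            return steps
        for action in actions:
            if action.requires <= states:       # all requirements met?
                nxt = states | action.adds      # apply the consequences
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [action.name]))
    return None  # no sequence of actions reaches the goal

room = [
    Action("go to key",   frozenset(),             frozenset({"at key"})),
    Action("pick up key", frozenset({"at key"}),   frozenset({"have key"})),
    Action("go to door",  frozenset(),             frozenset({"at door"})),
    Action("unlock door", frozenset({"have key", "at door"}),
                          frozenset({"door unlocked"})),
    Action("open door",   frozenset({"door unlocked"}),
                          frozenset({"door open"})),
    Action("leave room",  frozenset({"door open"}), frozenset({"outside"})),
]

print(plan(set(), "outside", room))
# ['go to key', 'pick up key', 'go to door',
#  'unlock door', 'open door', 'leave room']
```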

To help us do that, we’re going to give our boy a job.

And like many young people getting their start in the world, his first job is going to be in retail.

First we’ll set up his shop.

We’ll add some walls representing aisles, a little room in one corner representing the storeroom, and a small box representing the counter.

All of this is created using the same concepts as we’ve used previously to create a maze which the boy can navigate.

The boy can explore his shop in exactly the same way.

The boy has one job for now: cleaning up after customers.

Like many retail employees, the boy doesn’t have a lot of options about how to do his job.

His understanding of the world can be represented by a very simple network.

Each circle is a “state”, and the lines between them are “actions”.

To achieve his goal, the “Mess cleaned” state, he has to take the “clean mess” action.

But he can only take that action when he is in the “close to mess” state.

To get there, he has to take the “go to mess” action, which is only possible when he’s in the “can see mess” state.

What “goal-oriented action planning” achieves is allowing the boy to navigate a path through these states and actions.

He understands the order in which he must take the appropriate actions.

What’s more, if he’s interrupted in these actions, he can pick up where he left off.
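Translated into the earlier sketch’s terms, the boy’s whole job fits in a few lines (again, the state and action names are invented for illustration):

```python
cleaning = [
    Action("go to mess", frozenset({"can see mess"}),
                         frozenset({"close to mess"})),
    Action("clean mess", frozenset({"close to mess"}),
                         frozenset({"mess cleaned"})),
]

print(plan({"can see mess"}, "mess cleaned", cleaning))
# ['go to mess', 'clean mess']
```

The interruption handling falls out for free: replanning from whichever states are currently true simply picks up wherever he left off.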

In practice, it looks like this: he wanders around aimlessly until he spots a mess, then kicks off a simple routine of going to the mess and cleaning it up.

Going to the mess, the boy must engage his physical pathfinding and steering skills to navigate to the correct location.

To clean it up, the boy might pause for a set period.

In a more sophisticated simulation, we might play an animation of the boy furiously mopping.

What drudgery!

At this stage, the boy’s behaviour is hardly more sophisticated than it has been before.

Our boy is wasting his potential! It’s time for him to take on greater responsibilities.

Having carried out his mess cleaning duties faithfully for a time, let’s give him an additional task.

No shop functions without customers, so let’s introduce some.

The customers, just like the boy, have a series of states and actions they can take — looking for an item in the shop, going to it, picking it up, taking it to the counter, waiting for assistance, paying for the item, and leaving the shop.

The boy gets a complementary set of states and actions: if a customer is at the counter, go to the counter and sell them the item.

Now he has two priorities.

If there is a customer waiting at the counter, he heads over there to make the sale.

If there’s a mess on the shop floor, he runs to clean it up.

In between, he hovers anxiously, looking for an opportunity to help out.

We’re able to add new responsibilities like this very easily, because of the flexibility of the underlying decision-making architecture.

Each state and action is simply a node and edge in a grid.

Giving the boy new goals is as simple as identifying which states are required to fulfil a goal.

Adding new actions is as easy as describing their requirements and consequences.

As new goals, actions, and states are added to the world, the boy can seamlessly plot new paths through this grid of actions, figuring out how to achieve his new goal by assembling a series of actions which will result in the desired states being achieved.
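In the sketch’s terms, the new duty really is just two more records appended to the `cleaning` list from earlier; the planner itself doesn’t change (the state names, as before, are mine):

```python
serving = [
    Action("go to counter", frozenset({"customer at counter"}),
                            frozenset({"at counter"})),
    Action("sell item",     frozenset({"at counter"}),
                            frozenset({"customer served"})),
]

print(plan({"customer at counter"}, "customer served", cleaning + serving))
# ['go to counter', 'sell item']
```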

Our boy, hard at work

Our boy (the faithful red dot) zooms about the shop, hunting for messes (green dots), which he cleans by running back and forth over them.

Customers (orange dots) come into the shop from the top left, and seek out items (grey dots).

When they find them, they pick them up and take them to the counter (bottom right), where they wait for our boy.

When the boy sees them, he rushes over to sell them the item, at which point they head back to the top left of the screen to leave the shop.

The outcome of all of this work is a complex dance of interactions on our screen, in the boy’s shop.

Customers come and go, busily zipping around hunting down the items they need.

The boy eagerly chases after messes on the shop floor, and when he notices customers waiting, he dashes over to take their payments.

Despite the rudimentary graphics and the slightly clunky movements, the whole thing takes on a convincing semblance of life.

Each customer is acting on its own simple impulses, and the boy is scarcely more complex, but together, interacting, we observe a fascinating phenomenon: emergence.

“Emergence” is what happens when simple systems, interacting in high volumes, create complex behaviours.

Two customers, each driven by their own goal-seeking behaviour, race each other for the same item.

The “avoidance” steering that the boy and the customers share causes them to dance awkwardly around each other when their paths meet.

Most interestingly, we find that messes accumulate in the far corners of the shop, away from the counter — the boy is drawn back to the counter to sell things to the customers before he can wander into the far corners to clean.

The boy’s behaviour becomes hard to predict.

His internal state is governed by the positions of several ever-changing elements, and it becomes hard to understand what is governing his decisions.

Our simulation, the shop, is simplistic enough that, given time, we could break down each action and reaction in terms of the simple algorithms that govern the boy’s behaviour. But as the complexity increases, it becomes less and less useful to do so.

Almost inevitably, it begins to feel natural to explain the boy’s behaviour in terms of his desires and preferences, in the language we use to describe real-life, human interactions.

“He pushed that customer out of the way so he could get to the mess”, “He gets confused when there’s a mess behind a wall”.

“He didn’t see those customers until they’d been waiting a while”.

“He’s lazy about cleaning the whole shop”.

The last few essays have focused on a branch of artificial intelligence that’s only tangentially related to the world of data science and machine learning, where the most advanced research in this field takes place.

In many ways, the algorithms which drive video game “bots” are trivial, and not hugely relevant to the emerging field of computer intelligence.

But in another sense, these algorithms are some of the most relevant to humans’ experiences of artificial intelligence.

More than almost any other kind of artificial intelligence, video game bots are on the front lines of human/computer relations — they’re the most “human-like” of their kind, despite often being substantially less mathematically sophisticated.

That is the crucial insight from these algorithms: it is relatability, not complexity, and empathy, not processing power, that define what we consider to be intelligent, what we consider to be “like us”.

The previous article in this series, “Movie Maths: How Computers Understand Text”, is available here.

The code for this essay is available from my GitHub, here.

The next essay in this series will be published in June.
