Reinforcement Learning

Should an agent learn from its limited experience, or wait until it has explored further before taking action? This is one of the main challenges of reinforcement learning.

In order to get a higher reward, an agent must favour the actions that have been tried and tested.

But in order to discover such actions, it also has to keep trying new actions that it has not selected before.

Researchers have studied this trade-off between exploration and exploitation extensively over the years, and it is still an active area of research.
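To see what this trade-off looks like in practice, here is a minimal epsilon-greedy sketch; the function name, the value estimates, and the value of epsilon are assumptions for illustration, not part of any particular library:

import random

def epsilon_greedy(value_estimates, epsilon=0.1):
    # Explore: with probability epsilon, pick a random action
    if random.random() < epsilon:
        return random.randrange(len(value_estimates))
    # Exploit: otherwise, pick the action with the highest estimated value
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])

# Hypothetical value estimates for three actions
print(epsilon_greedy([0.2, 0.5, 0.1]))

With a small epsilon, the agent mostly exploits what it already knows while still occasionally exploring actions it has rarely tried.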

Real-world examples of reinforcement learning

Let's see where reinforcement learning occurs in the real world.

This will help us understand how it works and what possible applications can be built using this concept:

Game playing: Let's consider a board game like Go or Chess.

In order to determine the best move, the players need to think about various factors.

The number of possibilities is so large that it is not possible to perform a brute-force search.

If we were to build a machine to play such a game using traditional techniques, we would need to specify a large number of rules to cover all these possibilities.

Reinforcement learning completely bypasses this problem.

We do not need to manually specify any rules.

The learning agent simply learns by actually playing the game.

Robotics: Let’s consider a robot whose job is to explore a new building.

It has to make sure it has enough power left to come back to the base station.

The robot has to weigh the trade-off between the amount of information it collects and its ability to get back to the base station safely.

Industrial controllers: Consider the case of scheduling elevators.

A good scheduler will use the least amount of power while servicing the largest number of people.

For problems like these, reinforcement learning agents can learn how to do this in a simulated environment.

They can then take that knowledge to come up with optimal scheduling.

Babies: Newborns struggle to walk in the first few months.

They learn by trying it over and over again until they learn how to balance.

If you observe these examples closely, you will see there are some common traits.

All of them involve interacting with the environment.

The learning agent aims to achieve a certain goal even though there’s uncertainty about the environment.

The actions of an agent will change the future state of that environment.

This impacts the opportunities available at later times as the agent continues to interact with the environment.

Building blocks of reinforcement learning

Now that we have seen a few examples, let's dig into the building blocks of a reinforcement learning system.

Apart from the interaction between the agent and the environment, there are other factors at play here. A typical reinforcement learning agent goes through the following steps:

1. There is a set of states related to the agent and the environment. At a given point in time, the agent observes an input state to sense the environment.
2. There are policies that govern what action needs to be taken. These policies act as decision-making functions. The action is determined based on the input state using these policies.
3. The agent takes the action based on the previous step.
4. The environment reacts in a particular way in response to that action, and the agent receives reinforcement, also known as a reward, from the environment.
5. The agent records the information about this reward. It's important to note that this reward is for this particular pair of state and action.

Reinforcement learning systems can do multiple things simultaneously: learn by performing a trial-and-error search, learn a model of the environment they are in, and then use that model to plan the next steps.
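These steps can be expressed directly in code. The following is a minimal sketch of the agent-environment loop; the ToyEnvironment class and random_policy function are hypothetical stand-ins meant to show the flow of states, actions, and rewards, not a real library:

import random

class ToyEnvironment:
    # A toy two-state environment, for illustration only
    def __init__(self):
        self.state = 0

    def step(self, action):
        # The reward is tied to this particular (state, action) pair
        reward = 1.0 if action == self.state else 0.0
        # The action changes the future state of the environment
        self.state = (self.state + action) % 2
        return self.state, reward

def random_policy(state):
    # A policy maps the observed input state to an action
    return random.choice([0, 1])

env = ToyEnvironment()
state = env.state
for t in range(10):
    action = random_policy(state)     # decide on an action using the policy
    state, reward = env.step(action)  # the environment reacts
    print(t, state, action, reward)   # record the reward for this step

Each pass through the loop is one cycle of observing the state, deciding on an action, acting, and receiving a reward, exactly as in the steps above.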

Creating an environment

We will be using a package called OpenAI Gym to build reinforcement learning agents. You can learn more about it here: https://gym.openai.com. We can install it using pip by running the following command on the Terminal:

$ pip3 install gym

You can find various tips and tricks related to its installation here: https://github.com/openai/gym#installation. Now that you have installed it, let's go ahead and write some code.

Create a new Python file and import the following packages:

import argparse
import gym

Define a function to parse the input arguments. We will be able to specify the type of environment we want to run:

def build_arg_parser():
    parser = argparse.ArgumentParser(description='Run an environment')
    parser.add_argument('--input-env', dest='input_env', required=True,
            choices=['cartpole', 'mountaincar', 'pendulum', 'taxi', 'lake'],
            help='Specify the name of the environment')
    return parser

Define the main function and parse the input arguments:

if __name__=='__main__':
    args = build_arg_parser().parse_args()
    input_env = args.input_env

Create a mapping from the input argument string to the names of the environments as specified in the OpenAI Gym package:

    name_map = {'cartpole': 'CartPole-v0',
                'mountaincar': 'MountainCar-v0',
                'pendulum': 'Pendulum-v0',
                'taxi': 'Taxi-v1',
                'lake': 'FrozenLake-v0'}

Create an environment based on the input argument and reset it:

    # Create the environment and reset it
    env = gym.make(name_map[input_env])
    env.reset()

Iterate 1000 times, rendering the environment and taking a random action at each step:

    # Iterate 1000 times
    for _ in range(1000):
        # Render the environment
        env.render()

        # Take a random action
        env.step(env.action_space.sample())
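Incidentally, once an environment has been created, you can inspect its action and observation spaces, which Gym exposes as attributes. Here is a quick sketch, using CartPole as an example with the same classic Gym API as the rest of this section:

import gym

# Create the environment and print its spaces
env = gym.make('CartPole-v0')
print(env.action_space)       # a discrete space with two actions
print(env.observation_space)  # a four-dimensional continuous (Box) space

This is handy when you switch environments: env.action_space.sample() works the same way whether the space is discrete (as in CartPole) or continuous (as in Pendulum).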

If you want to know how to run the code, run it with the help argument:

$ python3 run_environment.py --help

Let's run it with the cartpole environment. Run the following command on your Terminal:

$ python3 run_environment.py --input-env cartpole

If you run it, you will see a window showing a cartpole moving to your right. It starts in an upright position, begins to move within the next second or so, and towards the end it goes out of the window.

Let's run it with the mountain car argument. Run the following command on your Terminal:

$ python3 run_environment.py --input-env mountaincar

If you run the code, you will initially see the car sitting at the bottom of the valley. If you let it run for a few seconds, you will see that the car oscillates more and more in order to reach the flag, taking longer and longer strides.

Building a learning agent

Let's see how to build a learning agent that can achieve a goal.

At every step, the agent will observe the environment's feedback and use it to determine when an episode has finished.

Create a new Python file and import the following packages:

import argparse
import gym

Define a function to parse the input arguments:

def build_arg_parser():
    parser = argparse.ArgumentParser(description='Run an environment')
    parser.add_argument('--input-env', dest='input_env', required=True,
            choices=['cartpole', 'mountaincar', 'pendulum'],
            help='Specify the name of the environment')
    return parser

Parse the input arguments:

if __name__=='__main__':
    args = build_arg_parser().parse_args()
    input_env = args.input_env

Build a mapping from the input arguments to the names of the environments in the OpenAI Gym package:

    name_map = {'cartpole': 'CartPole-v0',
                'mountaincar': 'MountainCar-v0',
                'pendulum': 'Pendulum-v0'}

Create an environment based on the input argument:

    # Create the environment
    env = gym.make(name_map[input_env])

Start iterating by resetting the environment:

    # Start iterating
    for _ in range(20):
        # Reset the environment
        observation = env.reset()

For each reset, iterate 100 times. Start by rendering the environment:

        # Iterate 100 times
        for i in range(100):
            # Render the environment
            env.render()

Print the current observation and take an action based on the available action space:

            # Print the current observation
            print(observation)

            # Take action
            action = env.action_space.sample()

Extract the consequences of taking the current action:

            # Extract the observation, reward, status and
            # other info based on the action taken
            observation, reward, done, info = env.step(action)

Check if we have achieved our goal:

            # Check if it's done
            if done:
                print('Episode finished after {} timesteps'.format(i+1))
                break

The full code is given in the file balancer.py.

If you want to know how to run the code, run it with the help argument:

$ python3 balancer.py --help

Let's run the code with the cartpole environment. Run the following command on your Terminal:

$ python3 balancer.py --input-env cartpole

If you run the code, you will see that the cartpole balances itself; if you let it run for a few seconds, it will still be standing. You will also see a lot of information printed on your Terminal. If you look at one of the episodes, you will see the observation printed at each timestep, followed by a line stating how many timesteps the episode took to finish. Different episodes take a different number of steps to finish; if you scroll through the information printed on your Terminal, you will be able to see that.
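As a small extension, you could keep a running total of the reward in each episode to quantify how well the random agent is doing. The bookkeeping below is an illustration and is not part of balancer.py:

import gym

# Run a few episodes of CartPole and record the total reward per episode
env = gym.make('CartPole-v0')
for episode in range(5):
    observation = env.reset()
    total_reward = 0
    for i in range(100):
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    print('Episode {}: total reward = {}'.format(episode, total_reward))

A purely random agent racks up only a small total reward before the pole falls; a learning agent's job is to push this number up over time.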
