# Data Science Case Study: Optimizing Product Placement in Retail (Part 2)

If we wanted to strengthen our position within a market, we could make such a move and, moreover, design our next outlet based on what we have learned about our customers over time.

## Using NumPy’s Argmax Function

The first approach I thought of was creating a grid of predictions over the possible values of the ‘tractable’ features of the dataset, then using NumPy’s argmax function to determine which value would yield the biggest reward or payout.

Put simply, argmax is the set of values of x that maximises some function f.

This approach was influenced by a concept in Decision Analysis called the Maximax Criterion, in which the strategy selected is the one that has the highest return or payout.

The image below illustrates how the argmax function works.

Based on the image above, the value 0 maximises both of the functions shown, so in each case the argmax is 0.
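In code, the grid-of-predictions idea looks something like the sketch below. Note that `predict_sales` is a hypothetical stand-in for the trained forecasting model, and shelf height is just an assumed example of a tractable feature:

```python
import numpy as np

# Hypothetical stand-in for the trained sales-forecasting model.
def predict_sales(shelf_height):
    # Toy reward curve: sales peak at eye level (~1.5 m).
    return -(shelf_height - 1.5) ** 2 + 10.0

# Grid of candidate values for one tractable feature.
heights = np.linspace(0.5, 2.0, 16)
predictions = predict_sales(heights)

# np.argmax returns the *index* of the largest prediction;
# indexing back into the grid gives the value that maximises f.
best_idx = int(np.argmax(predictions))
best_height = heights[best_idx]
```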

The problem with this approach is that it assumes that there is only one action or decision that could be taken to improve the sales forecast for the selected product.

A better approach would be to discover which combination of actions could produce the best forecast, and to determine whether the current placement is already optimal.
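To see why, here is a minimal sketch of scoring every combination of actions by brute force. The feature names, candidate values, and `forecast` function are all hypothetical stand-ins:

```python
import itertools
import numpy as np

# Hypothetical candidate settings for several tractable features.
candidate_actions = {
    "shelf_height": [0.5, 1.0, 1.5, 2.0],
    "aisle":        [1, 2, 3],
    "facing_count": [2, 4, 6],
}

def forecast(shelf_height, aisle, facing_count):
    # Stand-in for the real sales model.
    return -(shelf_height - 1.5) ** 2 + 0.5 * facing_count - 0.1 * aisle

# Enumerate every combination of actions and score each one.
combos = list(itertools.product(*candidate_actions.values()))
scores = [forecast(*c) for c in combos]
best_combo = combos[int(np.argmax(scores))]
```

This works for a handful of features, but the number of combinations grows exponentially with the number of features, which is what makes a guided search attractive.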

To do this, I will examine the Monte Carlo Tree Search algorithm.

## Using Monte Carlo Tree Search

The Multi-Armed Bandit problem is an optimization problem in which an agent seeks to find the optimal strategy for maximizing the payout of a system within a state space.

I thought that this approach was a perfect fit for the case, given that there are a number of actions (and combinations of actions) that can be taken to find the optimal product placement.
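As a toy illustration of the Multi-Armed Bandit setting, the sketch below uses the UCB1 selection rule to balance exploring placement options against exploiting the best one found so far. The payout rates here are invented purely for the example:

```python
import math
import random

random.seed(0)

# Invented payout rates for three placement "arms".
true_rates = [0.2, 0.8, 0.4]

counts = [0] * len(true_rates)    # times each arm was tried
totals = [0.0] * len(true_rates)  # accumulated reward per arm

for t in range(1, 2001):
    if 0 in counts:
        # Play every arm once before applying UCB1.
        arm = counts.index(0)
    else:
        # UCB1: mean reward plus an exploration bonus that shrinks
        # as an arm is tried more often.
        arm = max(range(len(true_rates)),
                  key=lambda a: totals[a] / counts[a]
                             + math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    totals[arm] += reward

best_arm = counts.index(max(counts))  # the arm the agent settled on
```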

Monte Carlo Tree Search is an approach to discovering the optimal policy (sequence of actions) and thus solving the Multi-Armed Bandit problem.

The image below illustrates the main steps involved in Monte Carlo Tree Search: selection, expansion, simulation, and backpropagation.

Using Monte Carlo Tree Search, it is possible to find the combination of actions (state configuration) that will yield the best result.

It should be noted, however, that in this particular case the combinations of actions presented are not meant to be performed in any particular sequence, but simultaneously.
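The four steps can be sketched on a toy placement problem as follows. This is a minimal illustration rather than the system described above: the two-feature state space and the reward function are invented for the example:

```python
import math
import random

random.seed(1)

# Toy problem: choose a value for each of two features;
# the terminal reward peaks at the combination (1, 2).
CHOICES = [0, 1, 2]

def reward(state):
    # state is a tuple of chosen values, one per feature
    return 1.0 - 0.3 * abs(state[0] - 1) - 0.2 * abs(state[1] - 2)

class Node:
    def __init__(self, state, parent=None):
        self.state = state    # actions chosen so far
        self.parent = parent
        self.children = {}    # action -> Node
        self.visits = 0
        self.value = 0.0

    def is_terminal(self):
        return len(self.state) == 2

    def ucb_child(self, c=1.4):
        # Pick the child maximising the UCT score.
        return max(self.children.values(),
                   key=lambda n: n.value / n.visits
                              + c * math.sqrt(math.log(self.visits) / n.visits))

def mcts(iterations=500):
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.is_terminal() and len(node.children) == len(CHOICES):
            node = node.ucb_child()
        # 2. Expansion: add one untried child.
        if not node.is_terminal():
            action = random.choice([a for a in CHOICES
                                    if a not in node.children])
            node.children[action] = Node(node.state + (action,), parent=node)
            node = node.children[action]
        # 3. Simulation: random rollout to a terminal state.
        state = node.state
        while len(state) < 2:
            state = state + (random.choice(CHOICES),)
        r = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Read off the best combination by following most-visited children.
    state, node = (), root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        state = state + (action,)
    return state

best = mcts()
```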

If this sounds familiar, you might remember that this technique was also used in AlphaGo.

If you would like a thorough and in-depth explanation of Monte Carlo Tree Search, Jeff Bradberry has an excellent article on his website that explains its inner workings.

## Conclusion

It is important to note that this algorithm is commonly found in ‘perfect system’ use cases such as game-playing and simulation, in which all the rules of the system are known and there are no unknown variables.

Unfortunately, this is not how life works.

Machine Learning techniques applied to “what if” scenarios serve only to provide a guide on what may yield the best results.

Even though we were provided with sales data, we are still not sure of the seasonality of the shopping habits observed, which can certainly have an impact on the quality of the recommendation produced.

A better version of this system would be able to find the best placement options for multiple products while allowing users to prioritize one product over another.

I hope that this post gave you a clear and practical approach to creating value with your Data Science projects, and that you learned something new.

This is just one of the practical examples illustrating how reinforcement learning can be used to deliver value to a business.

As usual I welcome your feedback and look forward to producing more content.
