Marketing Channel Attribution with Markov Chains in Python

Morten Hegewald, Mar 10

Any business that’s actively running marketing campaigns should be interested in identifying which marketing channels drive the actual conversions.

It is no secret that the return on investment (ROI) on your marketing efforts is a crucial KPI.

In this article we’re going to cover:

Why is channel attribution important?
The 3 standard attribution models
An advanced attribution model: Markov Chains
How to build the 4 attribution models in Python

Why is attribution important?

As the array of platforms on which businesses can market to their customers is increasing, and most customers are engaging with your content on multiple channels, it’s now more important than ever to decide how you’re going to attribute conversions to channels.

A 2017 study showed that 92% of consumers visiting a retailer’s website for the first time aren’t there to buy (link).

To illustrate the importance of attribution, let’s consider a simple example of a user journey leading to conversion.

In this example, our user is named John.

DAY 1: John’s awareness of your product is sparked by a YouTube ad, and he subsequently visits your website to browse your product catalog.

After a bit of browsing, John knows about your product, yet he does not intend to complete a purchase.

DAY 2: The next day, when John is scrolling through his Facebook feed, he receives another ad for your product, which pushes him to return to your website, and this time John completes the purchasing process.

In this case, when you look to calculate your ROI by marketing channel, how would you attribute the $ generated by John to a marketing channel?

Traditionally, channel attribution has been tackled by a handful of simple but powerful approaches such as First Touch, Last Touch, and Linear.

[Figure: Standard Attribution Models]

3 standard attribution models

Last Touch Attribution

As the name suggests, Last Touch is the attribution approach where any revenue generated is attributed to the marketing channel that a user last engaged with.

While this approach has its advantage in its simplicity, you run the risk of oversimplifying your attribution, as the last touch isn’t necessarily the marketing activity that generates the purchase.

In the above example of John, the last touch channel (Facebook) likely didn’t create 100% of the intent to purchase.

The awareness stems from the initial spark of watching the YouTube ad.

First Touch Attribution

The revenue generated by the purchase is attributed to the first marketing channel the user engaged with on the journey towards the purchase.

Just as with the Last Touch approach, First Touch Attribution has its advantages in simplicity, but again you risk oversimplifying your attribution approach.

Linear Attribution

In this approach, the attribution is divided evenly among all the marketing channels touched by the user on the journey leading to a purchase.

This approach is better suited to capture the trend of the multi-channel touch behavior we’re seeing in consumer behavior.

However, it does not distinguish between the different channels, and since not all consumer engagements with marketing efforts are equal, this is a clear drawback of this model.

Other standard attribution approaches worth mentioning are Time Decay Attribution and Position Based Attribution.
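To make the three standard models concrete, here is a minimal sketch that applies them to John’s journey from the example above. The function names and the use of a plain Python list for the journey are our own assumptions for illustration, not part of any library.

```python
from collections import defaultdict


def first_touch(path, value=1.0):
    """Give all credit to the first channel in the journey."""
    return {path[0]: value}


def last_touch(path, value=1.0):
    """Give all credit to the last channel before the conversion."""
    return {path[-1]: value}


def linear(path, value=1.0):
    """Split the credit evenly across every touch in the journey."""
    credit = defaultdict(float)
    share = value / len(path)
    for channel in path:
        credit[channel] += share
    return dict(credit)


# John's journey: YouTube ad on day 1, Facebook ad on day 2, then a purchase
john = ["YouTube", "Facebook"]

print("First Touch:", first_touch(john))
print("Last Touch: ", last_touch(john))
print("Linear:     ", linear(john))
```

Note that Linear credits a channel once per touch, so a channel that appears twice in a journey receives two shares.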

An advanced attribution model: Markov Chains

With the 3 standard attribution approaches above, we have easy-to-implement models to identify the ROI of our marketing channels.

However, the caveat with those 3 approaches is that they are oversimplified.

This may lead to overconfidence in the results attributed to the marketing channels.

This oversight can be detrimental, misguiding future business and marketing decisions.

To overcome this oversight, we may consider employing a more advanced approach: Markov chains.

If you have taken a statistics course, you may have come across this theory.

Markov chains are named after the Russian mathematician Andrey Markov, and describe a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

Markov chains, in the context of channel attribution, give us a framework to model user journeys and how each channel factors into users traveling from one channel to another to eventually purchase (or not).

We won’t go too deep into Markov chain theory in this article. ([…].io has a good read if you’re interested in knowing more about the math/statistics that take place behind the scenes.)

[Figure: Example of a simple Markov chain with 2 events A and E]

The core concept of Markov chains is that we can use the generated data to identify the probabilities of moving from one event to another in our network of potential marketing channel events and conversion events.
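As a rough illustration of that idea, the sketch below counts how often one event follows another in a handful of made-up journeys and turns the counts into first-order transition probabilities. The journeys, the function name, and the "(null)" label for non-converting journeys are assumptions for this example; "(start)" and "(conversion)" mirror the state labels that show up in the transition matrix later in the article.

```python
from collections import Counter, defaultdict


def transition_probabilities(journeys):
    """Estimate first-order transition probabilities.

    journeys: list of (channels, converted) tuples, where channels is a
    list of channel names in chronological order.
    """
    counts = defaultdict(Counter)
    for channels, converted in journeys:
        end_state = "(conversion)" if converted else "(null)"
        states = ["(start)"] + list(channels) + [end_state]
        # Count every observed step from one state to the next
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    # Normalize each row of counts so it sums to 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


# Made-up journeys, for illustration only
journeys = [
    (["YouTube", "Facebook"], True),
    (["YouTube"], False),
    (["Facebook"], True),
    (["YouTube", "Facebook"], False),
]

probs = transition_probabilities(journeys)
for src, row in probs.items():
    print(src, row)
```

Each row is a probability distribution over the next event, which is exactly the information a transition matrix holds.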

In the next section, we’ll go through the Python code for implementing any of these attribution frameworks.

How to build the 4 attribution models in Python

If you wish to follow along, the dataset we’ll be using for this example can be downloaded here.

Our dataset is structured with engagement activities as columns, and each row lists the channels a user engaged with, in chronological order.

In this case, each marketing channel is assigned a fixed numbered value which is then displayed in a column n if the n’th engagement from a given user was with that marketing channel.

Channel 21 is a conversion and our dataset only contains records of converting user journeys.
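To picture that layout, here is a small hypothetical frame in the same wide format, together with one compact way of turning each row into a path string. The column names, the toy values, and the helper `row_to_path` are assumptions for illustration; the real dataset may name its columns differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column n holds the channel of the n'th engagement,
# 21 marks the conversion, and unused trailing cells are empty (NaN).
toy = pd.DataFrame({
    "1": [10, 3, 10],
    "2": [3.0, 21.0, 21.0],
    "3": [21.0, np.nan, np.nan],
})

CONVERSION = 21


def row_to_path(row):
    """Join the channels of one journey, stopping at the conversion event."""
    channels = []
    for value in row.dropna():
        if int(value) == CONVERSION:
            break
        channels.append(str(int(value)))
    return " > ".join(channels)


toy["Path"] = toy.apply(row_to_path, axis=1)
print(toy["Path"].tolist())  # ['10 > 3', '3', '10']
```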

[Figure: Sample of our dataset]

The first thing we want to do is to import the necessary libraries:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import subprocess

Next, let’s load in our dataset and clean up the data points:

    # Load in our data
    df = pd.read_csv('[…].csv')

    # Grab list of columns to iterate through
    cols = df.columns

    # Iterate through columns to change all ints to str and remove any trailing '.0'
    for col in cols:
        df[col] = df[col].astype(str)
        df[col] = df[col].map(lambda x: str(x)[:-2] if '.' in x else str(x))

The Markov chain framework wants the user journeys in a single variable and in the form Channel 1 > Channel 2 > Channel 3 > …, so the next loop creates exactly that:

    # Create a total path variable
    df['Path'] = ''
    for i in df.index:
        #df.at[i, 'Path'] = 'Start'
        for x in cols:
            df.at[i, 'Path'] = df.at[i, 'Path'] + df.at[i, x] + ' > '

Since channel 21 in our dataset is a conversion event, we will separate that channel from the path and create a separate conversion variable holding the number of conversions happening (still only 1 in our user journey level data):

    # Split path on conversion (channel 21)
    df['Path'] = df['Path'].map(lambda x: x.split(' > 21')[0])

    # Create conversion value we can sum to get total conversions for each path
    df['Conversion'] = 1

We’re now almost done with the initial data manipulation work.

Our data still contains all the original columns, so we grab the subset of columns that we need going forward.

Since some users may have taken the same journey we will group our data by unique user journeys and our conversion variable will hold the number of conversions for each respective journey.

    # Select relevant columns
    df = df[['Path', 'Conversion']]

    # Sum conversions by Path
    df = df.groupby('Path').sum().reset_index()

    # Write DF to CSV to be executed in R
    df.to_csv('[…].csv', index=False)

The last line in the above piece of code will output our data to a CSV file now that we’re done with the data manipulations.

It might be handy to have this data available for transparency purposes, and, in our case, we will also use this CSV file to run the Markov chain attribution approach.

There are a few ways to do this.

Since Python doesn’t at this time have a library put together for this, one way would be to build out the actual Markov chains/networks in Python yourself.

While this would allow you to have a complete overview of your model it would also be the most time-consuming approach.
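For a feel of what building it yourself could involve, below is a minimal sketch of the commonly described "removal effect" calculation on a first-order chain: estimate the probability of reaching a conversion from the start, remove one channel at a time, and credit each channel in proportion to how much the conversion probability drops. This is an assumption-laden illustration, not the ChannelAttribution implementation, whose internals may differ. The function names and toy data are ours, and the input is assumed to have the same 'Path' and 'Conversion' columns we created above.

```python
import numpy as np
import pandas as pd

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_counts(paths_df, sep=" > "):
    """Count observed state-to-state moves, weighted by conversions per path."""
    counts = {}
    for path, n in zip(paths_df["Path"], paths_df["Conversion"]):
        states = [START] + path.split(sep) + [CONV]
        for src, dst in zip(states, states[1:]):
            counts[(src, dst)] = counts.get((src, dst), 0) + n
    return counts


def conversion_probability(counts, removed=None):
    """Probability of reaching CONV from START, optionally with one channel removed."""
    states = sorted({s for pair in counts for s in pair} | {NULL})
    idx = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for (src, dst), n in counts.items():
        if src == removed:
            continue  # the removed channel can no longer pass traffic on
        target = NULL if dst == removed else dst  # traffic into it is lost
        P[idx[src], idx[target]] += n
    row_sums = P.sum(axis=1, keepdims=True)
    P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
    # Absorbing-chain result: solve (I - Q) b = r for the transient states
    transient = [s for s in states if s not in (CONV, NULL)]
    t_idx = [idx[s] for s in transient]
    Q = P[np.ix_(t_idx, t_idx)]
    r = P[t_idx, idx[CONV]]
    absorb = np.linalg.solve(np.eye(len(transient)) - Q, r)
    return absorb[transient.index(START)]


def markov_attribution(paths_df):
    """Distribute total conversions across channels by normalized removal effect."""
    counts = transition_counts(paths_df)
    base = conversion_probability(counts)
    channels = sorted({s for pair in counts for s in pair} - {START, CONV})
    removal = {c: 1 - conversion_probability(counts, removed=c) / base
               for c in channels}
    total_effect = sum(removal.values())
    total_conv = paths_df["Conversion"].sum()
    return pd.DataFrame({
        "Channel": channels,
        "Conversion": [total_conv * removal[c] / total_effect for c in channels],
    })


# Made-up example: 3 conversions via A then B, 1 conversion via B alone
toy_paths = pd.DataFrame({"Path": ["A > B", "B"], "Conversion": [3, 1]})
result = markov_attribution(toy_paths)
print(result)
```

In this toy example removing B kills every conversion while removing A only loses the journeys that started at A, so B ends up with the larger share of the 4 conversions.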

To be more efficient, we’ll make use of the ChannelAttribution R library which has the theory behind Markov chains centered in a single application.

We will use the standard Python library subprocess to run the following piece of R code that calculates our Markov network for us.

    # Read in the necessary libraries
    if(!require(ChannelAttribution)){
      install.packages("ChannelAttribution")
      library(ChannelAttribution)
    }

    # Set Working Directory
    setwd('C:/Users/Morten/PycharmProjects/Markov Chain Attribution Modeling')

    # Read in our CSV file outputted by the python script
    df <- read.csv('[…].csv')

    # Select only the necessary columns
    df <- df[c(1,2)]

    # Run the Markov Model function
    M <- markov_model(df, 'Path', var_value = 'Conversion', var_conv = 'Conversion', sep = '>', order=1, out_more = TRUE)

    # Output the model output as a csv file, to be read back into Python
    write.csv(M$result, file = "Markov – Output – Conversion values.csv", row.names=FALSE)

    # Output the transition matrix as well, for visualization purposes
    write.csv(M$transition_matrix, file = "Markov – Output – Transition matrix.csv", row.names=FALSE)

The next piece of Python code will execute our R script and load in the resulting CSV file.

    # Define the path to the R script that will run the Markov Model
    path2script = 'C:/Users/Morten/PycharmProjects/Markov Chain Attribution Modeling/Markov.r'

    # Call the R script
    subprocess.call(['Rscript', '--vanilla', path2script], shell=True)

    # Load in the CSV file with the model output from R
    # (same file name as the one written by the R script above)
    markov = pd.read_csv('Markov – Output – Conversion values.csv')

    # Select only the necessary columns and rename them
    markov = markov[['channel_name', 'total_conversion']]
    markov.columns = ['Channel', 'Conversion']

If you want to get around having to create a separate R script to run the Markov calculations, then a Python library that you could use is rpy2.

rpy2 allows you to import R libraries and call them directly in Python.

This approach, however, did not prove very stable during my process, and therefore I opted for the separate R script approach.

The channel attribution produced by the Markov chain approach can be seen in the chart below.

This chart should tell you that channel 20 is driving a large portion of conversions while channels 18 and 19 are attributed very low total conversion values.

[Figure: Channel contributions for Markov chain approach]

While this output may be what you’re looking for, there’s a great deal of value in seeing what the outputs of the traditional approaches look like compared to our Markov chains approach.

To calculate attributions for Last Touch, First Touch and Linear, we run the following piece of code:

    # First Touch Attribution
    df['First Touch'] = df['Path'].map(lambda x: x.split(' > ')[0])
    df_ft = pd.DataFrame()
    df_ft['Channel'] = df['First Touch']
    df_ft['Attribution'] = 'First Touch'
    # Each row is a unique path, so weight it by its number of conversions
    df_ft['Conversion'] = df['Conversion']
    df_ft = df_ft.groupby(['Channel', 'Attribution']).sum().reset_index()

    # Last Touch Attribution
    df['Last Touch'] = df['Path'].map(lambda x: x.split(' > ')[-1])
    df_lt = pd.DataFrame()
    df_lt['Channel'] = df['Last Touch']
    df_lt['Attribution'] = 'Last Touch'
    df_lt['Conversion'] = df['Conversion']
    df_lt = df_lt.groupby(['Channel', 'Attribution']).sum().reset_index()

    # Linear Attribution
    channel = []
    conversion = []
    for i in df.index:
        touches = df.at[i, 'Path'].split(' > ')
        for j in touches:
            channel.append(j)
            # Split the path's conversions evenly across its touches
            conversion.append(df.at[i, 'Conversion'] / len(touches))
    lin_att_df = pd.DataFrame()
    lin_att_df['Channel'] = channel
    lin_att_df['Attribution'] = 'Linear'
    lin_att_df['Conversion'] = conversion
    lin_att_df = lin_att_df.groupby(['Channel', 'Attribution']).sum().reset_index()

Let’s merge all 4 approaches together and evaluate the differences in outputs.

    # Label the Markov results so they get their own hue in the chart
    markov['Attribution'] = 'Markov'

    # Concatenate the four data frames to a single data frame
    df_total_attr = pd.concat([df_ft, df_lt, lin_att_df, markov])
    df_total_attr['Channel'] = df_total_attr['Channel'].astype(int)
    df_total_attr.sort_values(by='Channel', ascending=True, inplace=True)

    # Visualize the attributions
    sns.set_style("whitegrid")
    plt.rc('legend', fontsize=15)
    fig, ax = plt.subplots(figsize=(16, 10))
    sns.barplot(x='Channel', y='Conversion', hue='Attribution', data=df_total_attr)
    plt.show()

[Figure: Channel contributions across all attribution approaches]

From looking at the above chart we can quickly conclude that most user journeys start with Channel 10 and end with Channel 20, while no user journeys start at Channel 20.

To get an idea of how the different channels affect the potential user journeys we can look at the total transition matrix, which can be visualized in a heatmap.

[Figure: Transition Probability Heatmap for Markov chain approach]

The heatmap is produced by running the following piece of code:

    # Read in transition matrix CSV
    trans_prob = pd.read_csv('Markov – Output – Transition matrix.csv')

    # Convert data to floats
    trans_prob['transition_probability'] = trans_prob['transition_probability'].astype(float)

    # Convert start and conversion event to numeric values so we can sort and iterate through
    trans_prob.replace('(start)', '0', inplace=True)
    trans_prob.replace('(conversion)', '21', inplace=True)

    # Get unique origin channels
    channel_from_unique = trans_prob['channel_from'].unique().tolist()
    channel_from_unique.sort(key=float)

    # Get unique destination channels
    channel_to_unique = trans_prob['channel_to'].unique().tolist()
    channel_to_unique.sort(key=float)

    # Create new matrix with origin and destination channels as columns and index
    trans_matrix = pd.DataFrame(columns=channel_to_unique, index=channel_from_unique)

    # Assign the probabilities to the corresponding cells in our transition matrix
    for f in channel_from_unique:
        for t in channel_to_unique:
            x = trans_prob[(trans_prob['channel_from'] == f) & (trans_prob['channel_to'] == t)]
            prob = x['transition_probability'].values
            if prob.size > 0:
                trans_matrix.loc[f, t] = prob[0]
            else:
                trans_matrix.loc[f, t] = 0

    # Convert all probabilities to floats
    trans_matrix = trans_matrix.apply(pd.to_numeric)

    # Rename our start and conversion events
    trans_matrix.rename(index={'0': 'Start'}, inplace=True)
    trans_matrix.rename(columns={'21': 'Conversion'}, inplace=True)

    # Visualize this transition matrix in a heatmap
    sns.set_style("whitegrid")
    fig, ax = plt.subplots(figsize=(22, 12))
    sns.heatmap(trans_matrix, cmap="RdBu_r")
    plt.show()

Conclusion

Different marketing channel attribution approaches will fit different businesses.

In this article, we’ve outlined 4 possible ways to evaluate the effectiveness of your marketing spend.

We’ve explored 3 approaches that are fixed in the sense that they are not dependent on the structure of your data, which may lead to overconfidence.

On the other hand, a Markov chain approach will look to model channel attribution by accounting for how your user journey data is structured; though this approach is more complex.

Analyzing the output of the Markov chain model will give you a “snapshot” of marketing channel effectiveness, at a specific point in time.

You might be able to gain extra insights by looking at the model output for data just before and after a new marketing campaign launch, giving you essential information on how the campaign affected the performance of each channel.

By adding even more granularity and running daily attribution models, you could evaluate the relationship between PPC or marketing dollar spent and channel contribution using correlation models.
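As a rough sketch of that last idea, suppose we had already run the attribution model once per day and collected, for a single channel, the daily spend next to the conversions attributed to it. The numbers and column names below are made up purely for illustration and are not results from the dataset used in this article.

```python
import pandas as pd

# Made-up daily figures for one channel (illustrative only)
daily = pd.DataFrame({
    "spend": [100, 150, 200, 250, 300],
    "attributed_conversions": [10, 14, 21, 24, 31],
})

# Pearson correlation between spend and attributed conversions
corr = daily["spend"].corr(daily["attributed_conversions"])
print(f"Correlation between spend and attributed conversions: {corr:.3f}")
```

A high correlation on its own does not prove that spend causes the conversions, so results like this are best read together with what you know about the campaigns running at the time.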

While adding more complexity to the approach presented in this article could increase the value of the model outputs, the real business value will come from being able to interpret these quantitative model results and combine these with domain knowledge on your business and the strategic business initiatives that have produced your data.

Combining these model results with the knowledge of your business will allow you to best incorporate the model findings into future initiatives.

Marketing channel attribution can be a complex task, with consumers being reached by more marketing than ever.

As technology advances and more channels become available to marketers, it becomes more important to identify precisely the channels that are driving the most ROI.

How do you dig out the valuable attribution information from your data?
