Capstone Project — Starbucks, An Offer a Customer Cannot Refuse!

Ambeek Sharma · Apr 24

Starbucks — It's an experience!

The Big Question — What is the Problem?

In today's context, the real challenge is how to make a customer an offer he cannot refuse.

Putting that at the core of the problem statement, I chose to solve it by building a model that predicts whether a customer will respond to an offer.

I prepared a four-step strategy to solve this problem.

First, I combined the offer portfolio, customer profile, and transaction data.

Each row of this combined dataset describes an offer’s attributes, customer demographic data, and whether the offer was successful.

Second, I assessed the accuracy and F1-score of a naive model that assumes all offers were successful.

This provided a baseline for evaluating the performance of the models that I constructed.

Accuracy measures how well a model correctly predicts whether an offer is successful.

However, if the percentage of successful or unsuccessful offers is very low, accuracy is not a good measure of model performance.

For this situation, evaluating a model's precision and recall provides better insight into its performance.

I chose the F1-score metric because it is “a weighted average of the precision and recall metrics”.
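To make the relationship between these metrics concrete, here is a minimal sketch using scikit-learn's implementations on made-up labels (not the project's data); it also checks that the F1-score equals the harmonic mean of precision and recall.

```python
# Illustrative labels only: 1 = successful offer, 0 = unsuccessful offer.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision)
print("recall   :", recall)
print("f1-score :", f1_score(y_true, y_pred))
# The F1-score is the harmonic mean of precision and recall.
print("check    :", 2 * precision * recall / (precision + recall))
```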

Third, I compared the performance of logistic regression, random forest, and gradient boosting models.

Fourth, I refined the hyperparameters of the model that had the highest accuracy and F1-score.

Data Exploration

The goal of my exploration and visualisation of the Starbucks Capstone Challenge data files is to identify the preprocessing steps that I need to apply prior to combining them.

Offer Portfolio Data

The Starbucks Capstone Challenge offer portfolio data summarizes customer offers.

I drew three conclusions based on my exploratory analysis of this data.

First, I should split the multi-label channels variable using the scikit-learn MultiLabelBinarizer.

Second, I need to rename the id variable to offerid to distinguish it from customer identifiers.

Third, I should one hot encode the offer_type variable.

[Image: Starbucks Capstone Challenge Offer Portfolio Data]

The algorithm that I implemented to clean the offer portfolio data has six steps; a sketch of these steps follows the list below.

First, I changed the name of the id column to offerid.

Second, I renamed the duration column to durationdays.

Third, I removed underscores from column names.

Fourth, I one hot encoded the offertype column.

Fifth, I one hot encoded the channels column.

Sixth, I replaced the offertype and channels columns with their respective one hot encoded values.
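Below is a minimal sketch of these six steps, assuming the offer portfolio is loaded from the challenge's portfolio.json file; the function name and the read_json call are illustrative, not the project's actual code.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

def clean_portfolio(portfolio: pd.DataFrame) -> pd.DataFrame:
    portfolio = portfolio.copy()

    # Steps 1-3: rename id -> offerid and duration -> durationdays,
    # then drop underscores from the remaining column names.
    portfolio = portfolio.rename(columns={"id": "offerid",
                                          "duration": "durationdays"})
    portfolio.columns = [c.replace("_", "") for c in portfolio.columns]

    # Step 4: one hot encode the offer type.
    offertype = pd.get_dummies(portfolio["offertype"])

    # Step 5: one hot encode the multi-label channels variable.
    mlb = MultiLabelBinarizer()
    channels = pd.DataFrame(mlb.fit_transform(portfolio["channels"]),
                            columns=mlb.classes_, index=portfolio.index)

    # Step 6: replace the original columns with their encoded values.
    portfolio = portfolio.drop(columns=["offertype", "channels"])
    return pd.concat([portfolio, offertype, channels], axis=1)

# Example (assumed file location):
# portfolio = pd.read_json("data/portfolio.json", orient="records", lines=True)
# portfolio = clean_portfolio(portfolio)
```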

Customer Profile Data

The Starbucks Capstone Challenge customer profile data describes customer demographics.

There are five characteristics of this data that I observed during my exploratory data analysis.

First, gender and income have approximately 13% missing data.

Second, customer age is 118 when customer income is missing (i.e. NaN).

Third, customer gender is not specified for ~1.5% of the data.

Fourth, the year that a customer became a rewards member is not uniformly distributed.

This suggests that this feature may be a useful customer differentiator.

Fifth, the month that a customer became a rewards member is approximately uniform.

Therefore, this feature is probably not useful for predicting whether a customer offer was successful.

[Image: Starbucks Capstone Challenge Customer Profile Data]

The algorithm that I implemented to clean the customer profile data has seven steps; a sketch of these steps follows the list below.

First, I removed customers with missing income data.

Second, I removed customer profiles where the gender attribute was missing.

Third, I changed the name of the id column to customerid.

Fourth, I transformed the became_member_on column to a datetime object.

Fifth, I one hot encoded a customer’s membership start year.

Sixth, I one hot encoded a customer’s age range.

Seventh, I transformed a customer profile’s gender attribute from a character to a number.
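Below is a sketch of these seven steps; the clean_profile name, the age bins, and the gender mapping are illustrative assumptions rather than the project's exact choices.

```python
import pandas as pd

def clean_profile(profile: pd.DataFrame) -> pd.DataFrame:
    profile = profile.copy()

    # Steps 1-2: drop customers with missing income or gender.
    profile = profile.dropna(subset=["income", "gender"])

    # Step 3: rename id -> customerid.
    profile = profile.rename(columns={"id": "customerid"})

    # Step 4: parse the membership start date (stored as e.g. 20170823).
    profile["became_member_on"] = pd.to_datetime(
        profile["became_member_on"].astype(str), format="%Y%m%d")

    # Step 5: one hot encode the membership start year.
    year = pd.get_dummies(profile["became_member_on"].dt.year,
                          prefix="memberyear")

    # Step 6: one hot encode an (assumed) set of age ranges.
    agerange = pd.get_dummies(
        pd.cut(profile["age"], bins=[10, 20, 30, 40, 50, 60, 70, 80, 90, 120]),
        prefix="agerange")

    # Step 7: encode gender as a number (assumed mapping).
    profile["gender"] = profile["gender"].map({"M": 0, "F": 1, "O": 2})

    return pd.concat([profile, year, agerange], axis=1)
```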

There are three plots that I generated to explore simulated Starbucks customer demographics; a short plotting sketch follows this discussion.

First, I plotted the distribution of customer income.

These results suggest that the minimum and maximum income for male and female customers are approximately the same.

However, male customer income is slightly biased towards lower values compared to female customer income.

[Image: Starbucks Customer Income Distribution]

Second, I generated a Starbucks rewards membership start year distribution visualization.

These results suggest that most customers recently joined the Starbucks rewards program.

They also suggest that there are more male customers than female customers.

[Image: Starbucks Customers Rewards Membership Start Year]

Third, I plotted the customer age range distribution.

These results suggest that the average customer age is between 50 and 60 years old.

[Image: Starbucks Customers Age Range Distribution]
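As an illustration of how these demographic plots can be produced, here is a minimal matplotlib sketch for the income distribution, assuming a profile DataFrame that still carries the text gender labels (i.e. before the numeric gender encoding above); the function name is illustrative.

```python
import matplotlib.pyplot as plt

def plot_income_distribution(profile):
    # Overlay normalized income histograms for male and female customers.
    for gender in ["M", "F"]:
        income = profile.loc[profile["gender"] == gender, "income"]
        plt.hist(income, bins=30, alpha=0.5, density=True, label=gender)
    plt.xlabel("Income (USD)")
    plt.ylabel("Density")
    plt.title("Starbucks Customer Income Distribution")
    plt.legend()
    plt.show()
```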

Customer Transaction Data

The Starbucks Capstone Challenge customer transaction data describes customer purchases and when customers received, viewed, and completed offers.

There are two conclusions that I drew from my exploratory analysis of this data.

First, I need to separate customer offer and purchase data.

Second, ~45% of the events are customer purchases and ~55% of events describe customer offers.

[Image: Starbucks Capstone Challenge Customer Transaction Data]

The algorithm that I implemented to clean the customer transaction data has six steps; a sketch of these steps follows the list below.

First, I changed the name of the person column to customerid.

Second, I removed customer ids that are not in the customer profile DataFrame.

Third, I converted the time variable units from hours to days.

Fourth, I changed the name of the time column to timedays.

Fifth, I created a DataFrame that describes customer offers.

This included creating an offerid column, parsing the offer event type, and one hot encoding customer offer event types.

Sixth, I created a DataFrame that describes customer transaction events.
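Below is a sketch of these six steps, assuming the raw transcript has the challenge's person / event / value / time layout with value stored as a dictionary; the helper name and the dictionary-key handling are illustrative.

```python
import pandas as pd

def clean_transcript(transcript: pd.DataFrame, profile: pd.DataFrame):
    transcript = transcript.copy()

    # Steps 1-2: rename person -> customerid and keep only customers that
    # survived the profile cleaning.
    transcript = transcript.rename(columns={"person": "customerid"})
    transcript = transcript[transcript["customerid"].isin(profile["customerid"])]

    # Steps 3-4: convert time from hours to days and rename the column.
    transcript["timedays"] = transcript["time"] / 24.0
    transcript = transcript.drop(columns=["time"])

    # Step 5: customer offer events (received / viewed / completed).
    is_offer = transcript["event"].str.contains("offer")
    offers = transcript[is_offer].copy()
    offers["offerid"] = offers["value"].apply(
        lambda v: v.get("offer id", v.get("offer_id")))
    offers["event"] = offers["event"].str.replace("offer ", "", regex=False)
    offers = pd.concat([offers, pd.get_dummies(offers["event"])], axis=1)

    # Step 6: plain purchase transactions with their dollar amount.
    transactions = transcript[~is_offer].copy()
    transactions["amount"] = transactions["value"].apply(lambda v: v.get("amount"))

    return offers.drop(columns=["value"]), transactions.drop(columns=["value", "event"])
```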

Combine Customer Transaction, Demographic and Offer Data

Data cleaning refers to a set of transformations that are applied to a dataset prior to predictive modeling.

The algorithm that I implemented to combine customer transaction, demographic, and offer data has five steps.

First, I select a customer’s profile.

Second, I select offer data for a specific customer.

Third, I select transactions for a specific customer.

Fourth, I initialize DataFrames that describe when a customer receives, views, and completes an offer.

Fifth, I apply the following procedure for each offer that a customer receives (a sketch of this procedure follows the list below):

a. Initialize the current offer id.

b. Look up a description of the current offer.

c. Initialize the time period when the offer is valid.

d. Initialize a Boolean array that selects customer transactions that fall within the valid offer time window.

e. Initialize a Boolean array that selects a description of when a customer completes the offer (this array may not contain any True values).

f. Initialize a Boolean array that selects a description of when a customer views the offer (this array may not contain any True values).

g. Determine whether the current offer was successful (for an offer to be successful, a customer must both view and complete it).

h. Select customer transactions that occurred within the current offer's valid time window.

i. Initialize a dictionary that describes the current customer offer.

j. Update a list of dictionaries that describes the effectiveness of offers to a specific customer.

Once I evaluated all customer transactions, I converted the resulting list of dictionaries into a pandas DataFrame.

Predict Customer Offer Success

The problem that I chose to solve was to build a model that predicts whether a customer will respond to an offer.

My strategy for solving this problem has four steps.

First, I combined the offer portfolio, customer profile, and transaction data.

Second, I split the combined customer offer effectiveness data into training and test sets prior to assessing the accuracy and F1-score of a naive model that assumes all offers were successful.

My analysis suggests that the naive model accuracy was 0.471 and its F1-score was 0.640.
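These baseline numbers are internally consistent: if roughly 47% of training offers were successful, a predictor that labels every offer successful has precision 0.471 and recall 1.0, giving an F1-score of about 2 × 0.471 / 1.471 ≈ 0.64. A minimal sketch of that baseline calculation, assuming the combined data carries a binary success column (the split parameters are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

def naive_baseline(data):
    X = data.drop(columns=["success"])
    y = data["success"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # The naive model predicts that every offer is successful.
    y_naive = np.ones(len(y_train), dtype=int)
    return accuracy_score(y_train, y_naive), f1_score(y_train, y_naive)
```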

I also evaluated training data customer offer statistics.

These results suggest that the distribution of offers in the simulated Starbucks mobile application data is approximately uniform.

They also imply that a customer offer’s success rate ranges from ~ 6% to 75%, with the two least successful offers being informational.

[Image: Starbucks Capstone Challenge Data Customer Offer Statistics]

Since the combined customer offer effectiveness data contains both numeric and one hot encoded categorical variables, I applied minimum/maximum scaling to the numeric variables to avoid model bias.
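Below is a sketch of that scaling step, assuming the data has already been split into training and test sets; the list of numeric columns is an illustrative assumption.

```python
from sklearn.preprocessing import MinMaxScaler

NUMERIC_COLUMNS = ["difficulty", "durationdays", "reward", "income", "age"]  # assumed

def scale_numeric(train_df, test_df, columns=NUMERIC_COLUMNS):
    train_df, test_df = train_df.copy(), test_df.copy()
    scaler = MinMaxScaler()
    # Fit on the training data only, then apply the same scaling to the test set.
    train_df[columns] = scaler.fit_transform(train_df[columns])
    test_df[columns] = scaler.transform(test_df[columns])
    return train_df, test_df
```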

I performed a random search of logistic regression, random forest, and gradient boosting models to select the one that had the highest training data accuracy and F1-score.

These results suggest that a random forest model had the best accuracy compared to gradient boosting, logistic regression, and a naive predictor that assumed all customer offers were successful.

I then refined the random forest model hyperparameters using a grid search.
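Below is a sketch of this search, assuming scaled training features X_train and labels y_train; the hyperparameter ranges are illustrative, not the values used in the project.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

def search_models(X_train, y_train):
    # Random search over three candidate model families.
    candidates = [
        (LogisticRegression(max_iter=1000),
         {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}),
        (RandomForestClassifier(random_state=42),
         {"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20]}),
        (GradientBoostingClassifier(random_state=42),
         {"n_estimators": [100, 200, 300], "learning_rate": [0.01, 0.05, 0.1]}),
    ]
    results = []
    for model, params in candidates:
        search = RandomizedSearchCV(model, params, n_iter=5, scoring="f1",
                                    cv=5, random_state=42)
        search.fit(X_train, y_train)
        results.append((type(model).__name__, search.best_score_, search.best_params_))
    return results

def refine_random_forest(X_train, y_train):
    # Grid search around the best random forest configuration.
    grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        {"n_estimators": [300, 500],
                         "max_depth": [10, 20, 30],
                         "min_samples_leaf": [1, 5, 10]},
                        scoring="f1", cv=5)
    grid.fit(X_train, y_train)
    return grid.best_estimator_
```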

This analysis suggests that the random forest model's training data accuracy improved from 0.742 to 0.753 and that its training data F1-score increased from 0.735 to 0.746.

[Image: Estimated Training Data Model Performance]

Bias and variance are two characteristics of a machine learning model.

Bias refers to inherent model assumptions regarding the decision boundary between different classes.

On the other hand, variance refers to a model's sensitivity to changes in its inputs.

A logistic regression model constructs a linear decision boundary to separate successful and unsuccessful offers.

However, my exploratory analysis of customer demographics for each offer suggests that this decision boundary will be non-linear.

Therefore, an ensemble method like random forest or gradient boosting should perform better.

Both random forest and gradient boosting models are a combination of multiple decision trees.

A random forest classifier randomly samples the training data with replacement to construct a set of decision trees that are combined using majority voting.

In contrast, gradient boosting iteratively constructs a set of decision trees with the goal of reducing the number of misclassified training data samples from the previous iteration.

A consequence of these construction strategies is that the decision trees generated during random forest training are typically deeper than the weak learners used in gradient boosting, which helps to minimize model variance.

Typically, gradient boosting performs better than a random forest classifier.

However, gradient boosting may overfit the training data and requires additional effort to tune.

A random forest classifier is less prone to overfitting because it constructs decision trees from random training data samples.

Also, a random forest classifier’s hyperparameters are easier to optimize [1].

Conclusion

The problem that I chose to solve was to build a model that predicts whether a customer will respond to an offer.

My strategy for solving this problem has four steps.

First, I combined offer portfolio, customer profile, and transaction data.

Second, I assessed the accuracy and F1-score of a naive model that assumed all customer offers were successful.

Third, I compared the performance of logistic regression, random forest, and gradient boosting models.

This analysis suggests that a random forest model has the best training data accuracy and F1-score.

Fourth, I refined random forest model hyperparameters using a grid search.

My analysis suggests that the resulting random forest model has a training data accuracy of 0.753 and an F1-score of 0.746.

The test data accuracy of 0.736 and F1-score of 0.727 suggest that the random forest model I constructed did not overfit the training data.

“Feature importance” refers to a numerical value that describes a feature’s contribution to building a model that maximizes its evaluation metric.

A random forest classifier is an example of a model that estimates feature importance during training.
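As a sketch of how those importances can be read off a fitted random forest (the estimator and feature names here are assumed to come from the earlier steps):

```python
import pandas as pd

def top_features(model, feature_names, n=5):
    # feature_importances_ is populated when a tree-based model is fitted.
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(n)

# Example (hypothetical names):
# top_features(best_random_forest, X_train.columns)
```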

My analysis of the Starbucks Capstone Challenge customer offer effectiveness training data suggests that the top five features based on their importance are:

1. Offer difficulty (how much money a customer must spend to complete an offer)

2. Offer duration

3. Offer reward

4. Customer income

5. Whether a customer created an account on the Starbucks rewards mobile application in 2018

Since the top three features are associated with a customer offer, it may be possible to improve the performance of a random forest model by creating features that describe an offer's success rate as a function of offer difficulty, duration, and reward.

These additional features should provide a random forest classifier the opportunity to construct a better decision boundary that separates successful and unsuccessful customer offers.
