A Starbucks Case Study: Optimizing Customer-to-Promotion Match

Keigo Ito, May 19

Photo Credit: Adrianna Calvo

Introduction

Sales campaigns are a marketing tool used to boost short-term sales.

Success depends on targeting the right type of promotion to the right customer.

A poorly designed sales campaign can cost the company; for example, revenue can be lost in the form of the rewards they give away.

This means that, every time a company initiates a sales campaign, the form of reward must be carefully considered — and the critical information needed to make that decision is: 1) the predicted sales boost for different types of promotions, and 2) knowing how specific customers respond to different promotion ads.

With nearly 14,000 stores in the United States, a mega-retailer like Starbucks serves millions of customers, and the price of a poor sales campaign can be significant.

In this post, I will use data collected from an experiment (see below) done by Starbucks to analyze customer response to different promotions.

At Starbucks, most sales campaigns are executed through the Internet, and these promotions can be classified into three categories:

Discount

Buy-one-get-one free (BOGO)

Informational (e.g., new or seasonal products)

A sales campaign is successful if a regular Starbucks customer spends more during the promotion period than she normally does.

A sales campaign is in danger of costing the company if a regular Starbucks customer does not change his spending behavior during this period.

Starbucks has an estimated 23 million “Starbucks mobile app” members, and an enormous collection of data on the demographics, promotions received, and purchase records of these customers.

By combining several data analysis techniques, I will answer the following questions (a brief approach for each aim is outlined):

1. Segmenting Customer Group.

Can we segment customers into responsive and non-responsive customers for different categories of sales campaign?

I will engineer features and manually divide the customers into two groups: those that do not respond to a promotion, and those that respond to a promotion.

A responsive customer is one that shows increased spending at Starbucks during the promotion period.


2. Predicting New Customer Behavior.

Given the basic profile (age, income, gender, date of membership) of a customer with no prior purchase record at Starbucks, can we predict how he or she will respond to a certain promotion?

Based on existing customer demographic data (gender, age, income, and tenure of membership), I will use machine learning to build a predictive model to help predict whether or not a new Starbucks member will be a responsive customer to a specific promotion category.


3. Identifying Features of a Successful Promotion.

What are some features that correlate with a successful sales campaign?

Using exploratory data analysis (EDA), I will identify features that are common among successful promotions.

Terminology

A promotion sub-type can fall into one of the three categories: 1) discount, 2) BOGO, or 3) informational.

For example, the promotion sub-types “spend $10 get $2 off” and “spend $20 get $5 off” both belong to the discount category.

In the discount category, customers receive a small reward after spending a certain amount (e.g., spend $10 and get $2 off).

In the BOGO category, customers pay full price for an item and receive a second item for free.

In the informational category, no monetary reward is offered, but a customer is notified about new or seasonal products.

A single offer is an advertisement sent to a customer announcing a promotion sub-type.

A sales campaign is the act of sending multiple single offers to a customer pool at a given time.

The Starbucks Experiment

In this experiment performed by Starbucks, data were collected for six sales campaigns.

Sales campaigns were announced on days 1, 8, 15, 18, 21, and 24.

During each sales campaign, roughly 12,700 single offers (approx. 1,270 single offers per sub-type) were sent to randomly chosen customers.

A total of ten promotion sub-types were available: four in the discount category, four in the BOGO category, and two in the informational category.

Data Set

Three data sets were available: Portfolio, Transcript, and Profile.

Portfolio contains a reference list of all promotion categories and sub-types offered by Starbucks.

Transcript records a customer’s transaction history, the numbers and sub-types of single offers sent to a customer, and provides data needed to analyze individual customer behavior.

Transcript data was collected through the course of one month.

Profile contains demographic information about a customer (e.g., gender, age, income, and tenure of membership), and when combined with information derived from Transcript data, can be used to build predictive models for customer behavior.

Data Cleaning

Data cleaning was performed for some features in the Portfolio, Transcript, and Profile data sets.

In Portfolio, “channels” and “offer_type” were converted to dummy features.

In Transcript, “event” was converted into a dummy feature.

In “value,” the entry format differed depending on what type of event it described.

I separated these entries into distinct features.

If the value reflected a single offer event (e.g., offer received, viewed, or completed), an ID corresponding to the sub-type of that single offer was entered into a new feature column called “id”.

If the value reflected a transaction event, the transaction amount was entered into a new feature column called “transaction_amount”.

In Profile, several features were processed.

In the “age” feature, 2175 customers were listed as 118 years old (possibly the default entry for customers who did not disclose their age).

I converted all 118 values in the age column into NaN values.

In the “became_member_on” feature, the date a customer became a Starbucks member was recorded in an eight-digit format (e.g., 20180726 for July 26th, 2018).

I converted this into the total number of days a customer has been a member, counted from that eight-digit join date, and renamed the feature “tenure_length”.

“Gender” was converted to a dummy feature.
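
The cleaning steps above can be sketched in plain Python. This is only an illustration, not the code used for the post: the reference date for tenure, the gender levels ("F", "M", "O"), and the keys "offer_id" and "amount" inside the raw “value” entry are all assumptions, since the post does not state them.

```python
import math
from datetime import date, datetime

# Hypothetical reference date for measuring tenure (not given in the post).
REFERENCE_DATE = date(2018, 12, 31)

def clean_age(age):
    """Treat the placeholder age 118 as missing (NaN)."""
    return math.nan if age == 118 else float(age)

def tenure_length(became_member_on, reference=REFERENCE_DATE):
    """Days between the eight-digit join date (e.g. 20180726) and the reference date."""
    joined = datetime.strptime(str(became_member_on), "%Y%m%d").date()
    return (reference - joined).days

def gender_dummies(gender, levels=("F", "M", "O")):
    """One-hot encode gender; the levels are assumed, and unknown values give all zeros."""
    return {f"gender_{level}": int(gender == level) for level in levels}

def split_value(value):
    """Split a raw 'value' entry into 'id' and 'transaction_amount'.

    The keys 'offer_id' and 'amount' are hypothetical names for the two entry formats.
    """
    return {
        "id": value.get("offer_id"),                # offer received / viewed / completed
        "transaction_amount": value.get("amount"),  # transaction events
    }
```

Each helper handles one column, so the same functions could be applied row by row or mapped over a data frame column.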

Data Analysis

1. Segmenting Customer Group.

Here I will perform data processing and feature engineering to segment responsive customers from non-responsive customers for each category of promotion (discount, BOGO, informational).

A responsive customer would be someone who receives a single offer, and responds by spending more dollars than he regularly does at Starbucks during a promotion period.

Based on transaction and single offer (e.g., offer received, viewed, or completed) data, I determined the spending-per-hour (SPH) (see below) of a customer during the promotion and non-promotion times, and a set ratio (promo-SPH/non-promo-SPH) was used to identify a responsive customer.

Several metrics used to calculate the ratio are defined below: 1) outliers, 2) aware hours (the time length that a customer was aware of a promotion) and 3) SPH for each promotion category.


1) Outliers

At Starbucks, the main products are beverages and pastries, and most transaction amounts are less than $50 (see Figure X).

However, a small fraction of transaction amounts were surprisingly large (e.g., exceeding $1000 in one single payment).

These high-amount transactions may have been catering orders, which are unlikely to be influenced by promotions, and were therefore excluded from the analysis here.

To be considered an outlier transaction, the transaction amount must be higher than a threshold determined by the truncated inter-quartile range (IQR) method.

If the IQR-derived threshold is lower than $50, then I used $50 as the threshold.

The truncated IQR was derived by first ordering the customer's transactions from high to low amount and then removing the records in the top 20%.

This helps to avoid an artificially high IQR.

The threshold was set at two times the IQR above the first quartile (Q1): threshold = Q1 + 2 * IQR.
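
A minimal sketch of this threshold rule follows, assuming the quartiles are taken with the "inclusive" interpolation of Python's `statistics.quantiles`; the post does not say which quartile definition was used. The function name and parameters are illustrative.

```python
import statistics

def outlier_threshold(amounts, trim_fraction=0.20, floor=50.0):
    """Truncated-IQR threshold: drop the top 20% of amounts, then Q1 + 2 * IQR.

    The result is never lower than the $50 floor described in the text.
    The quartile interpolation method is an assumption.
    """
    ordered = sorted(amounts)
    keep = len(ordered) - int(len(ordered) * trim_fraction)
    trimmed = ordered[:keep]  # remaining records after removing the largest ones
    q1, _, q3 = statistics.quantiles(trimmed, n=4, method="inclusive")
    iqr = q3 - q1
    return max(floor, q1 + 2 * iqr)

def is_outlier(amount, threshold):
    """A transaction is an outlier if it is higher than the threshold."""
    return amount > threshold
```

Trimming before computing the quartiles keeps a handful of very large payments from stretching the IQR, which is the motivation given above.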

2) Aware Hours

It is important to determine the length of time a customer knew about a single offer, since this is the period in which a customer’s purchasing behavior may be influenced by a promotion.

For the discount and BOGO categories, aware hours run from the time a customer viewed the offer until either 1) the customer completed the offer (i.e., made a purchase) or 2) the offer expired (i.e., no purchase made).

For the informational category, aware hours are defined by the time a customer viewed the offer to when the offer expired.
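
The two definitions of aware hours can be written as one small function. This is a sketch under assumptions: times are hours since the start of the experiment, and the handling of offers that were never viewed, or that were completed before being viewed, is my own choice because the post does not cover those cases.

```python
def aware_window(viewed_at, expires_at, completed_at=None, category="discount"):
    """Return the (start, end) aware window in hours, or None if there is none.

    Discount and BOGO: from viewing until completion or expiry, whichever is first.
    Informational: from viewing until expiry.
    """
    if viewed_at is None or viewed_at >= expires_at:
        return None  # never viewed, or viewed after expiry (assumed: no aware time)
    end = expires_at
    if category in ("discount", "bogo") and completed_at is not None:
        end = min(completed_at, expires_at)
    if end <= viewed_at:
        return None  # completed before viewing (assumed: no aware time)
    return (viewed_at, end)

def aware_hours(window):
    """Length of an aware window in hours; zero when there is no window."""
    return 0 if window is None else window[1] - window[0]
```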

3) Spending-Per-Hour (SPH)

Promo-SPH was calculated by category (discount, BOGO, or informational), and defined by:

Promo-SPH = Sum of all normal transactions during aware hours / Sum of aware hours of all single offers that fall into a category

For example, to calculate the discount-SPH for a customer, the sum of all normal transactions that took place within aware hours is divided by the sum of aware hours of single offers that fall in the discount category.

Non-promo-SPH is defined by:

Non-promo-SPH = Sum of all normal transactions during non-aware hours / Sum of non-aware hours

Outliers were excluded from all SPH calculations.
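
As a rough sketch, the two SPH definitions could be computed as below. The function names and the sample numbers are hypothetical, and the code assumes that aware windows do not overlap and that non-aware hours are the total observation hours minus all aware hours; the post does not spell out either point.

```python
def in_any_window(time, windows):
    """True if the time stamp falls inside any (start, end) aware window."""
    return any(start <= time <= end for start, end in windows)

def promo_sph(transactions, category_windows, threshold):
    """Normal (non-outlier) spending during aware hours / total aware hours."""
    spent = sum(amount for time, amount in transactions
                if amount <= threshold and in_any_window(time, category_windows))
    hours = sum(end - start for start, end in category_windows)
    return spent / hours if hours > 0 else None

def non_promo_sph(transactions, all_windows, total_hours, threshold):
    """Normal spending outside every aware window / non-aware hours."""
    spent = sum(amount for time, amount in transactions
                if amount <= threshold and not in_any_window(time, all_windows))
    hours = total_hours - sum(end - start for start, end in all_windows)
    return spent / hours if hours > 0 else None

# Hypothetical customer: (hour, amount) transactions over a 100-hour period.
transactions = [(10, 5.00), (30, 4.00), (40, 8.00), (45, 900.00)]
discount_windows = [(24, 48)]  # one discount offer, aware from hour 24 to hour 48

promo = promo_sph(transactions, discount_windows, threshold=50.0)
baseline = non_promo_sph(transactions, discount_windows, total_hours=100, threshold=50.0)
```

Here the $900 payment is dropped as an outlier, the $4 and $8 purchases count toward the promo-SPH, and the $5 purchase counts toward the non-promo-SPH.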

Identifying Responsive Customers.

Whether or not a customer can be classified as a responsive customer for a given promotion category was determined by the (promo-SPH/non-promo-SPH) ratio.

A customer is identified as responsive to a particular promotion category if the (promo-SPH/non-promo-SPH) ratio is higher than 1.04 (for discount), 1.12 (for BOGO), or 1.00 (for informational).

Here is an example.

Table 1. Hypothetical transcript data of a customer.

Let’s say a customer received a single offer in the discount category (event A, Table 1), viewed the single offer (event C), made four transactions (events B, D, E, and F), and completed the offer (event G).

Transaction B took place outside of the aware hours period and will be excluded from promo-SPH calculation.

Transaction F was identified as an outlier and will be excluded from promo-SPH calculation.

Transactions D and E took place during the aware hours and will be considered.

In this example:

discount-SPH = (event D transaction amount + event E transaction amount) / (event G time stamp - event C time stamp)

= ($3.00 + $6.00) / (100 - 50)

= $0.18 per hour

If this customer’s non-promo-SPH was $0.10 per hour, then the (discount-SPH/non-promo-SPH) ratio would be:

$0.18 / $0.10 = 1.80

Since this exceeds the “passing” (discount-SPH/non-promo-SPH) ratio of 1.04 for a responsive customer, I will identify this customer as a responsive customer.
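
The worked example can be checked with a few lines of Python. The passing ratios come from the text; the function name and the choice to return `None` when the ratio is undefined (for example, a zero non-promo-SPH) are assumptions, since the post does not say how such customers were handled.

```python
# Passing ratios per promotion category, as stated in the text.
PASSING_RATIO = {"discount": 1.04, "bogo": 1.12, "informational": 1.00}

def is_responsive(promo_sph, non_promo_sph, category):
    """True if promo-SPH / non-promo-SPH is higher than the category's passing ratio."""
    if promo_sph is None or not non_promo_sph:
        return None  # ratio undefined (assumed handling)
    return promo_sph / non_promo_sph > PASSING_RATIO[category]

# Events D ($3.00) and E ($6.00) fall between viewing (hour 50) and completion (hour 100).
discount_sph = (3.00 + 6.00) / (100 - 50)  # $0.18 per hour
ratio = discount_sph / 0.10                # about 1.80
responsive = is_responsive(discount_sph, 0.10, "discount")
```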

Exploratory Data Analysis (EDA).

A quick look at possible relationships between responsive customers and available features in Profile (“gender,” “age,” and “income”) showed no correlation to whether or not a customer is a responsive customer (see example using “age” feature in Figure 1).

“Tenure length” appeared to show minor differences between responsive and non-responsive customers (Figure 2).

Figure 1. Histograms showing similar “age” distribution for responsive vs. non-responsive customers. Histograms for “gender” and “income” also show no notable differences between responsive and non-responsive customers.

Figure 2. Histogram of “tenure_length” for responsive vs. non-responsive customers.


No sophisticated predictive algorithms were used here, but basic data processing and analysis can effectively segment customers into responsive and non-responsive groups.


2. Predicting New Customer Behavior.

Here I will use machine learning to build a model for predicting how a new customer will respond to a particular promotion category (discount, BOGO, informational).

A new customer is a new “Starbucks mobile app” member with basic profile data, but no prior transaction record with Starbucks.

Based on profile data for these customers, I built three classifier models (one for each promotion category) to predict if a new member will be a responsive customer for a certain promotion category.

Initial Algorithm Scan.

An initial scan of seven algorithms, including: logistic regression, linear discriminant analysis, K-nearest neighbors, support vector machine (SVM), AdaBoost (Ada), random forest (RF), and XGBoost (XGB), showed that RF has the best performance.

F1 score was used to evaluate performance, since it reflects both false negatives (type II errors) and false positives (type I errors).

If a false negative occurs, meaning that a responsive customer is predicted to be a non-responsive customer, then the number of responsive customers will be underestimated.

If a sales campaign is designed based on a predictive model with this error, then the rewards of a Starbucks promotion may be set at a level that does not maximize potential sales boost.

If a false positive occurs, meaning that a non-responsive customer is predicted to be a responsive customer, then the number of responsive customers will be overestimated.

If a sales campaign is designed based on a predictive model with this error, more customers will be enjoying the rewards of a promotion without contributing to sales boost, and Starbucks could lose revenue.
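
To make the role of the two error types concrete, here is a small sketch of the F1 score computed from predicted and true labels, with 1 meaning "responsive" and 0 meaning "non-responsive". The labels below are made up for illustration and are not results from the post.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # responsive, predicted responsive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-responsive, predicted responsive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # responsive, predicted non-responsive
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # falls when non-responsive customers are flagged responsive
    recall = tp / (tp + fn)     # falls when responsive customers are missed
    return 2 * precision * recall / (precision + recall)

# Made-up labels for six customers.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

Because both kinds of misclassification lower the score, a model cannot reach a high F1 by simply predicting every customer to be responsive or every customer to be non-responsive.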

Optimizing Model Hyperparameters.

The best performing models from Ada, SVM, RF, and XGB were optimized.

Optimized models for discount and BOGO showed no improvement, and that for informational showed modest improvement.


F1 scores on the test sets were 0.65 (discount), 0.67 (BOGO), and 0.53 (informational).

Important Features.

“Tenure length” was identified to be the most important feature for differentiating responsive customers from non-responsive customers for all promotion categories.

But since “tenure length” would always be zero days for any new customer, this feature would not be a useful differentiating metric.

“Age,” “gender,” and “income” were not differentiating features.


Given the basic profile information of a new customer, a customer-to-promotion match cannot be predicted using machine-learning models.

Additional features such as customers’ occupation, education level, or location, might be collected to help develop a more effective predictive model (although such data collection might make the customers’ user experience less enjoyable).


3. Identifying Features of a Successful Promotion.

Here I use data processing to determine the success rate of a promotion sub-type.

A successful single offer has to fulfill the following criteria: 1) a transaction happens after the single offer was viewed, and 2) the customer spends a higher amount than that required by the promotion offer.

Success rate for a promotion sub-type was calculated by:

Success rate = number of successful single offers / total number of single offers

Exploratory Data Analysis (EDA).

Success rates for all ten promotion sub-types were calculated and ranged from 17.5% to 61.0%, showing no notable dependence on promotion category (see Table 2).
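
The success-rate calculation for one sub-type might look like the sketch below. The record layout (the keys "viewed_at", "required_spend", and "transactions") is hypothetical, and summing all spending after the view is an assumption; the post only states the two criteria, not how multiple transactions were combined.

```python
def is_successful(offer):
    """Apply the two criteria: spending happens after the view and exceeds the requirement."""
    if offer["viewed_at"] is None:
        return False  # an offer that was never viewed cannot be successful
    spent_after_view = sum(amount for time, amount in offer["transactions"]
                           if time > offer["viewed_at"])
    return spent_after_view > offer["required_spend"]

def success_rate(offers):
    """Number of successful single offers / total number of single offers."""
    return sum(is_successful(offer) for offer in offers) / len(offers)

# Four made-up single offers of a "spend $10" sub-type.
offers = [
    {"viewed_at": 5, "required_spend": 10, "transactions": [(8, 12.50)]},
    {"viewed_at": 5, "required_spend": 10, "transactions": [(3, 20.00)]},   # bought before viewing
    {"viewed_at": None, "required_spend": 10, "transactions": [(9, 15.00)]},  # never viewed
    {"viewed_at": 2, "required_spend": 10, "transactions": [(4, 6.00), (7, 7.00)]},
]
rate = success_rate(offers)
```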

Additionally, the point biserial correlations were computed between the offer sub-type features and the success rate.
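
For reference, the point biserial correlation between a binary feature (for example, whether a sub-type was sent through a given channel) and a continuous value (the success rate) can be computed as follows. This is the textbook formula, equivalent to a Pearson correlation with a 0/1 variable; the numbers are invented and are not taken from Table 2.

```python
import math
import statistics

def point_biserial(binary, values):
    """r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), with s_n the population std. dev."""
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    spread = statistics.pstdev(values)
    mean_gap = statistics.mean(group1) - statistics.mean(group0)
    return mean_gap / spread * math.sqrt(n1 * n0 / n ** 2)

# Invented example: 1 = sub-type used the channel, 0 = it did not.
uses_channel = [1, 1, 0, 0]
success_rates = [0.6, 0.5, 0.3, 0.2]
r_pb = point_biserial(uses_channel, success_rates)
```

A value near +1 means sub-types with the feature tend to have higher success rates, a value near -1 means the opposite, and a value near 0 means little linear association.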

Figure 3. Selected point biserial correlations of offer sub-type features and the success rate.

Table 2. Ten sub-types and their IDs, selected features (“purchase requirement,” “channel”), and projected success rate.

Choice of channel (i.e., social media, mobile app, webpage, and email) for delivering single offers is a determining factor for a successful promotion (see Figure 3).

For each sub-type, a fixed set of channels was used to send out single offers, and those that were distributed through social media and the Starbucks mobile app had the highest success rates.

Success rate also correlates with a lower purchase requirement; i.e., the smaller the amount a customer has to pay, or the fewer items a customer has to purchase, before the rewards kick in, the more successful the promotion (see Figure 3).

For example, on the same basis of giving a “20% reward,” a sub-type offer that requires a customer to “spend $5 get $1 off” will be more effective than a “spend $20 get $4 off” promotion.


A successful promotion can fall into any category, but the determining factors for a high success rate are choice of channel for sending out single offers and the reward barrier.

Recommended Actions for Starbucks

A few recommended actions to boost short-term sales during promotion periods are listed below.

To maximize customer response:

Collect data from existing members for a certain time period, determine the customer-to-promotion category match for individual customers, and send single offers accordingly.

To predict new customer behavior:

Request additional data (e.g., occupation, education level, location) in the profile section when a new customer signs up to be a Starbucks member, to test for possible features that might contribute to a customer-to-promotion match.

To maximize success rate of single offers:

Send single offers via social networks and the mobile app.

Offer discount and BOGO promotions with a low purchase requirement.

What’s Next?

One interesting experiment may be to send out single offers that invite a customer to “choose your own promotion.”

For example, a single offer may contain the option to use a discount sub-type, a BOGO sub-type, or get a first view on new or seasonal products.

Because customers are given the option to “choose,” a promotion like this will engage customers and reveal insights into what each customer likes — contributing to the quest of improved customer-to-promotion match.

Possibly an interesting idea to test?

Thank you for reading my post.

Please see my GitHub repository for details of the work presented here.

