Using Starbucks app user data to predict effective offers

Using my best_model function, I get the dataframe of results below:

#get best model overall for bogo,discount and info offers
best_model('bogo').append([best_model('discount'),best_model('info')]).transpose()

Model performance for best models

Overall, we can see that the top performing model for predicting the effectiveness of both BOGO and discount offers is the 3rd model (GridSearch to find the optimal model parameters, plus removing the amount_invalid column), whereas the best performing model for informational offers is the one obtained just after performing GridSearch to find the optimal parameters.

In order to find the most influential drivers of an effective offer, we can check the feature importances of our best models above.

For BOGO model: [feature importance chart]
For discount model: [feature importance chart]
For informational model: [feature importance chart]

Checking the feature importances to analyse the main drivers of an effective offer, we can see that the most important driver across all three models is the tenure of membership. However, the ranking of the remaining features differs between the three models.

For a BOGO offer, membership tenure is the most important feature, and the importances of the other variables are much smaller. Income, age and offer_received_cnt are the 2nd, 3rd and 4th most important features, but their importances are very small.

For a discount offer, after membership tenure, age and income are the next most important variables, but their importances are still very small.

The feature importances for the informational offer model are more evenly distributed than for the BOGO and discount models, with income being the 2nd most important feature. Age is the 3rd and, interestingly, the mobile channel is the 4th.

Part 4: Side exploration

a. Exploration on users in Groups 3 and 4: people who purchase regardless of viewing any offers

We had earlier delineated those in groups 3 and 4 as people who would purchase regardless of viewing any offers. Now we can do some exploratory analyses to see what kind of demographic this group of users consists of.

a.i. Data Preparation

It would be interesting to see how people in groups 3 and 4 contrast with people in groups 1 and 2, so I decided to compare all 3 segments: group 1, group 2, and groups 3 and 4 combined.

I appended the data for all groups from the three offer types together, then compared the characteristics of each group via visualizations.

I also cleaned the dataset of null values, similar to the preparation of the datasets above for modeling.

Size of each group of customers in overall dataset

Comparing the sizes of the 3 groups, we can see that group 1 is the largest while group 2 is the smallest. This is unsurprising, as we had seen that the classes in our datasets were imbalanced in favour of the positive class (i.e. effective_offers=1). Meanwhile, groups 3 and 4 also contain a significant number of people, more than group 2.

a.ii. Exploration of demographic characteristics

In order to compare the groups effectively, I visualized them together.

First, we can explore the income distribution of the 3 groups. Across the 3 segments, most people fall within the middle range of income (50K to 100K). The income distributions of the 3 segments are relatively similar.

The age distribution looks relatively similar between the 3 groups as well, with most people between 40 and 80 years old.

Group 2 consists of people who did not spend at all, as the offers were ineffective on them, hence they are not in the graph of amount spent. For group 1 and groups 3+4, the amount spent is relatively similar, except that people in group 1 spent slightly more.
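
As a rough illustration of the group comparison described above, here is a minimal sketch of dropping rows with null values and then summarising size, income, age and amount spent per segment. The rows, the segment labels and the summarize_segments helper are all made up for illustration; they are not the Starbucks data or the code used in this analysis, which works on dataframes and charts rather than plain tuples.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy rows: (segment, income, age, amount spent).
# These values are invented for illustration only.
rows = [
    ("group 1", 52000, 45, 23.5),
    ("group 1", 98000, 67, 40.0),
    ("group 1", 120000, 80, 55.0),
    ("group 2", 61000, 38, 0.0),
    ("groups 3+4", 75000, 59, 18.2),
    ("groups 3+4", None, 71, 30.1),  # null income, dropped during cleaning
]


def summarize_segments(rows):
    """Drop rows containing nulls, then summarise each segment."""
    by_segment = defaultdict(list)
    for segment, income, age, amount in rows:
        # Mirror the null-value cleaning step: skip incomplete rows.
        if None in (income, age, amount):
            continue
        by_segment[segment].append((income, age, amount))

    summary = {}
    for segment, members in by_segment.items():
        incomes, ages, amounts = zip(*members)
        summary[segment] = {
            "size": len(members),
            "median_income": median(incomes),
            "median_age": median(ages),
            "mean_amount": mean(amounts),
        }
    return summary


summary = summarize_segments(rows)
for segment, stats in sorted(summary.items()):
    print(segment, stats)
```

With a dataframe, the same idea would typically be a dropna followed by a groupby on the segment column; the plain-Python version is used here only to keep the sketch dependency-free.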
