Data Science Projects Employers Want To See: How To Show A Business Impact

By John Sullivan, DataOptimal

Getting a job isn’t easy; you need to set yourself apart. One of my favorite strategies is building portfolio projects that show a business impact. Predicting flower types can be good if you’re just starting out, but in the real world you’re going to work, directly or indirectly, on something that’s geared towards the business side of things.

I’ll walk through, step by step, how to build a customer churn predictive model in R that shows a significant business impact. Here’s a quick outline of the process:

At the beginning of any real-world data science project, you need to start by asking a series of questions. Here are a few good ones to start with:

Let’s assume that you work in the telecommunications industry and you have access to customer data. To keep things simple, we’ll just look at using a logistic regression model.

How will you evaluate your model? We’ll use a series of machine learning evaluation metrics (ROC, AUC, sensitivity, specificity) as well as business-oriented metrics (cost savings).

The next step is to prepare the data. This workflow will vary from project to project, but for our example I’m going to use the following workflow:

I’ve left out a full exploratory phase because I want to dedicate a full future post to it.

To specify that we want to perform a binary logistic regression, we’ll use the argument “family=binomial”.

Now that we’ve fit our model, it’s time to see how it performs. To do this, we’ll make predictions using the “test” dataset.

We previously stated that our data shows it’s five times more expensive to acquire new customers than to retain existing ones, so our retention cost will be $60. Here’s a quick summary of how those costs are associated with the four types of predictions: a false negative (FN) costs $300, a true positive (TP) costs $60, a false positive (FP) costs $60, and a true negative (TN) costs $0.

If we multiply the number of each prediction type by the associated cost and sum them, we get the following equation for cost:

Cost = FN($300) + TP($60) + FP($60) + TN($0)

Let’s calculate the cost per customer using various thresholds (0.1, 0.2, 0.3, …, 0.9, 1.0).

The plot shows that the minimum cost per customer is about $40 at a threshold of 0.2. Let’s assume that our company is currently using the “simple” model, which costs about $48 per customer at a threshold of 0.5. If we have a customer base of approximately 500,000, then switching from the simple model to the optimized model saves about $8 per customer, which produces a cost savings of $4MM annually!

This cost savings is the type of significant business impact that employers would love to see.

Conclusion

One of the best ways to set yourself apart during the job hunt is to build portfolio projects that show real-world business impact. If you can ask smart business questions and work through a project like a real-world data scientist, you’ll immediately be more valuable to employers.

For the full step-by-step tutorial and R code, check out the original post.

As always, make sure to document your work on your GitHub page and LinkedIn profile. Stay positive, continue to build projects, and you’ll be on your way to landing a job.
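To make the cost calculation described above concrete, here is a minimal sketch of the cost equation, the threshold sweep, and the savings arithmetic. It is written in Python rather than the R used in the original tutorial, and it is not the author's code. The function names are hypothetical, and the labels and probabilities in the example are toy values invented for illustration; only the dollar costs, the thresholds, and the $48, $40, and 500,000 figures come from the article.

```python
# Costs per prediction type, taken from the article's cost equation:
# Cost = FN($300) + TP($60) + FP($60) + TN($0)
COST_FN = 300
COST_TP = 60
COST_FP = 60
COST_TN = 0


def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN, predicting churn when probability >= threshold."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def cost_per_customer(y_true, y_prob, threshold):
    """Apply the cost equation and divide by the number of customers."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    total = fn * COST_FN + tp * COST_TP + fp * COST_FP + tn * COST_TN
    return total / len(y_true)


def best_threshold(y_true, y_prob, thresholds):
    """Return the threshold with the lowest cost per customer."""
    return min(thresholds, key=lambda t: cost_per_customer(y_true, y_prob, t))


def annual_savings(current_cost, optimized_cost, customers):
    """Per-customer saving multiplied by the size of the customer base."""
    return (current_cost - optimized_cost) * customers


if __name__ == "__main__":
    # Toy data, invented for illustration only (1 = churned, 0 = stayed).
    y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
    y_prob = [0.9, 0.1, 0.3, 0.4, 0.2, 0.05, 0.25, 0.6, 0.15, 0.1]
    thresholds = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0

    for t in thresholds:
        print(t, cost_per_customer(y_true, y_prob, t))
    print("best threshold:", best_threshold(y_true, y_prob, thresholds))

    # Figures quoted in the article: $48 vs. $40 per customer, 500,000 customers.
    print("annual savings:", annual_savings(48, 40, 500_000))
```

In a real project the labels and probabilities would come from the held-out test set and the fitted model, and the cost constants would be replaced with figures from your own business.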
