Fighting Churn With Data

Definitely not! So to reduce churn you really need to increase the value that users receive, but this is going to be harder than getting people to sign up for a service in the first place.

That is because these are people who already know exactly what the service is really like, so promises made via marketing or sales representatives are unlikely to get much traction.

As will be discussed in detail later, one of the best options available is trying to make sure subscribers take advantage of all the best aspects of a service that are already available, but this has limits.

As the data person, you may be asked for “silver bullets” to reduce churn, and here is the bad news:

TAKEAWAY There are no silver bullets to reduce churn.

The alternative of course is to reduce the cost of the service.

But reducing the monetary cost is the nuclear option for a paid service — revenue churn or downsells may be better than complete and total churn, but it’s still churn.

WARNING Price reduction is a “diamond bullet” against churn: it always works, but you can’t afford it.


3. How to Fight Churn

3.1 Preventing Churn is the Job of the Business

Preventing churn is not really the job of a data person, or at least not something a data person can do alone.

There have been remarkable advances in AI and data science in recent years, but for the most part preventing churn requires a human touch with the customers or users.

Consider the following examples of roles that can really reduce churn:

Product Managers and Engineers (for software) and Producers, Talent and other Content Creators (for media) reduce churn by making changes to product features or content offerings to improve the value or enjoyment the subscribers derive from the service.

This is the most direct method of reducing churn.

A related approach in software is to try to increase “stickiness”, which roughly means modifying the product to increase the switching cost for a customer to change to an alternative by providing valuable features that are hard to reproduce or are designed to be difficult to transfer.

Marketers reduce churn by crafting effective mass communication that directs subscribers to the most popular content and features.

This is actually more of an educational function for marketing than a sales function: Remember, the subscribers already have access and know what the service is like, so promises won’t help.

Still, this function is often undertaken by marketing because these are the people with skill in crafting effective mass communication.

Customer Success and Support Representatives prevent churn by making sure customers adopt a product and helping them if they can’t.

Customer Support is the department which traditionally has helped customers, but Customer Success is a new and separate function in many organizations that is explicitly designed to be more proactive than Customer Support.

Whereas Customer Support helps customers when they ask for help, Customer Success tries to monitor customers, detect when a customer needs help, and reach out before the customer asks for it.

Customer success is also responsible for “onboarding” a customer or making sure they do everything necessary to start taking advantage of the subscription.

For this reason, if there is a Customer Success team, it is typically the group that a data person will work most closely with on a day-to-day basis.

The data person will often help devise the metrics or models that the Customer Success representatives will use to detect struggling customers.

Also, marketing education campaigns will often be undertaken in close coordination with Customer Success: the Customer Success department may design the content, and marketing’s role would then be only to make it look nice and manage the email campaigns (and the like) that distribute the content.

Account managers in the sales department (if there is one) may be the last resort in stopping churn, assuming the service has a monetary cost.

Account managers are the people who can actually reduce the price or change the subscription terms, managing the process through which a customer may downsell to a less expensive version.

In paid consumer services without a sales department this role is usually taken on by senior Customer Support representatives who have similar authority.

While we describe the role of sales as a “last resort” in preventing churn, a more proactive approach in many organizations is to “right size” the sales in the first place, meaning do a better job of selling the product version that is optimal for the customer up front, rather than selling the most expensive version possible.

This may hurt the short term gain to the service from each sale, but if done correctly it will reduce churn and ultimately improve the average Lifetime Value from each customer.

It may also hurt the sales commissions for the account managers, so it might require adjustment of the sales compensation system (a complex subject which is beyond the scope of this book).


These roles vary by the type of offering and organization and my descriptions are really just generalizations.

But the point is that ultimately reducing churn depends on taking actions to influence customers, and these actions are usually taken by the specialists in different parts of business, not by a programmer who is wrangling the data.

These roles are obviously very diverse, and I will refer to all of these functions as “the business”, for lack of a better term.

This is not to imply that the data person is not a part of the company, but data people usually have no direct responsibility for concrete business outcomes (like revenue) while the people in these other roles usually do.

From the point of view of the data person “the business” is the end user of the churn analysis results.


3.2 The Role of Data in Supporting the Business

As a result of all the factors mentioned above, what the business people need to help them fight churn is a concise set of facts or rules for understanding customer engagement.

These rules need to be actionable for the business, meaning they understand how to reduce churn based on the findings.

That in turn requires that the facts you discover are really causes or drivers of churn and engagement.

These facts usually take the form of relationships between customer behavior and churn or retention.

To make a simple example, you may discover business rules like “customers who use (view) the feature X more than five times a month churn at half the rate of customers who use it less than twice a month”.

Something like using or viewing more of a particular feature may not seem very complicated, but as long as it is a correct reading of the data, it’s really useful.

Each part of the business can use a fact like that differently: Product creators will know that feature X is providing value, and can improve it or make more like it.

The marketing department can design a campaign to drive users to the feature.

And when Customer Success/Support people are talking to a customer they can ask if the customer is using that feature and encourage the customer to try it if not.

This type of analysis is not an AI algorithm of the kind that gets the most attention in the media and university education nowadays.

This may disappoint some readers: To reduce churn it will not be sufficient to deploy an “AI” system that will win a data science forecasting competition.

If you try to deliver an analysis that predicts churn without explaining it in terms of concrete rules, the business will not be able to use it easily and so they will probably not use it at all.

While there are techniques to make black box machine learning models more interpretable, I will show that you can reach better conclusions through more straightforward methods.

By using simple methods you can really make the business people part of the investigative process, capture more of their domain expertise, and give them more confidence in the result and better ability to interpret it.

Predictive models for churn can be useful, but they are much more useful when the prediction is the natural extension of a program of investigation and knowledge transfer from the data team to the business.


4. Case Studies: Metrics That Matter (Weapons in the War Against Churn)

In this section I introduce

4.1 Klipfolio’s Active Users and License Utilization

Klipfolio is a data analytics cloud app for building and sharing real-time business dashboards.

These dashboards can be created by multiple users, and a common metric for any product that allows multiple users on one subscription is the number of users that are active.

The figure below is a demonstration of how the number of active users per month at a Klipfolio customer is related to churn.

Behavioral Cohort Churn vs. Klipfolio’s Active Users

The figure above uses a technique called Behavioral Cohorts to show the relationship between a behavior and churn.

You will see a lot of these plots below and in the book, and learn how to create them; for now I will give a brief explanation of how it works: Given a pool of customers and a behavioral metric like the number of active users per month, the customers are organized into cohorts by their measurements on that metric.

Typically ten cohorts are used, so the first cohort contains the bottom 10% of customers in terms of the metric, the second cohort contains the next 10%, on up to the final cohort which contains the top 10% of customers on that metric.

Once the cohorts are formed, you calculate what percent of customers in each cohort churned.

The result is displayed in a plot like the figure above: each point in the plot corresponds to one cohort, with the x-value of the point given by the average value of the metric for all customers in the cohort and the y-value of the point given by the percentage of churns (i.e. the churn rate) in the cohort.
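The cohort procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's implementation: the function name `behavioral_cohorts`, the `(metric_value, churned)` tuple layout, and the sample data are all assumptions made up for the example.

```python
from statistics import mean

def behavioral_cohorts(customers, n_cohorts=10):
    """Return one (average_metric, churn_rate) point per cohort.

    `customers` is a list of (metric_value, churned) pairs; this layout is
    an assumption for the sketch, not a format taken from the text.
    """
    ranked = sorted(customers, key=lambda c: c[0])
    n = len(ranked)
    points = []
    for i in range(n_cohorts):
        # Integer boundaries put every customer in exactly one cohort.
        cohort = ranked[i * n // n_cohorts:(i + 1) * n // n_cohorts]
        if not cohort:
            continue  # fewer customers than cohorts
        avg_metric = mean(value for value, _ in cohort)
        churn_rate = sum(1 for _, churned in cohort if churned) / len(cohort)
        points.append((avg_metric, churn_rate))
    return points

# Made-up data: 20 customers with metric values 1..20; the three lowest churn.
customers = [(value, value <= 3) for value in range(1, 21)]
points = behavioral_cohorts(customers, n_cohorts=4)
for avg_metric, churn_rate in points:
    print(f"avg metric {avg_metric:5.1f}  churn rate {churn_rate:.0%}")
```

Each returned pair is one point on the cohort plot: the first element is the x-value and the second is the y-value. With real data you would typically leave `n_cohorts` at ten, as described above.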

One other thing I should mention about the cohort plot is that it doesn’t show the actual churn rates, only the relative difference between the cohorts.

This is because the churn rate is a very important operational metric and it would be inappropriate to reveal the true churn rate for the companies in the case studies.

Similarly, none of the behavioral cohort churn plots in the book show the actual churn rates with labels on the y-axis.

However, the bottom of the behavioral cohort plots is always set to zero churn rate, so the distance of the points from the bottom of the cohort plot can be used for comparing relative churn rates.

For example, if one point is half as far from the bottom of the plot as another, that means the churn rate in that cohort is one half that of the other.

Turning to the details of the churn cohort plot, it shows that the lowest cohort has under one active user per month (an average over multiple months) and the highest cohort has more than 25 active users per month.

The churn rate on the cohort with the lowest active users per month is around 8 times greater than churn in the cohort with the highest number of active users.

At the same time, most of the difference in churn rates occurs between around one and five active users per month.

While measuring the number of active users is a good metric for fighting churn, an even better one is shown in the next figure: this is the license utilization metric, calculated by dividing the number of active users by the number of seats the customer has purchased.

Many SaaS products are sold by the “seat” meaning the maximum number of users allowed (this is called the licensed number of seats).

If the number of active users is divided by the number of seats licensed, the resulting metric measures the percentage utilization of the seat license by the customer.

Behavioral Cohort Churn vs. Klipfolio’s License Utilization (active users per seat)

The result in the figure above shows that license utilization is a very effective metric for fighting churn: the lowest cohort in license utilization has average utilization just above zero, and the highest cohort has license utilization around 1.

The lowest cohort has around 7x the churn rate of the highest cohort, and the churn rate varies more or less continuously across the cohorts. In contrast to churn in the active user cohorts, there is not really a level at which having higher utilization no longer makes a difference.

This makes it more effective for distinguishing churn risks than active users alone.

As will be explained further in later posts and the book, the reason is that active users per month actually conflates two different underlying factors related to churn: how many seats were sold to the customer, and how often a typical user is active.

Utilization is a measure of how active the users are that is independent of the number of seats sold and is usually very useful for segmenting customers with respect to their engagement and churn risk.


4.2 Broadly’s Promoter and Detractor Counts and Rates

Broadly is an online service which helps small and medium businesses (SMBs) manage their online presence, including reviews.

A very important event for Broadly’s customers is the number of times the business is reviewed positively, or promoted.

The figure below shows the relationship between the number of promoters per month that a Broadly customer has and churn.

(For a detailed explanation of the cohort plot see the start of the last section.) In the figure below the cohort with the fewest promoters per month (just above zero promoters on average) has a churn rate that is around 4x higher than the cohorts with the most promoters; the reduction in churn mostly happens between 0 and 20 promoters per month.

This is a clear relationship for an important event, and it is easy to see why customers that have more online promoters are more likely to stay with the Broadly service: receiving positive reviews is one of the main goals for a customer using Broadly.

Behavioral Cohort Churn vs. Broadly’s Customer Promoters (per Month)

Another important event for Broadly’s customers related to the number of promoters is the number of detractors, or the number of times the business is reviewed negatively.

The figure below shows the relationship between the number of detractors per month that a Broadly customer has and churn: The cohort with the fewest detractors per month (just above zero) has a churn rate that is around 2x higher than the cohorts with the most detractors (average of just under 5 detractors per month); the reduction in churn mostly happens between 0 and 1 detractors per month.

While this relationship looks a lot like the one for customer promoters shown above, doesn’t it seem like there is something wrong here? Getting negative reviews is a bad thing, and presumably not the result that Broadly’s customers were looking for, so why is having negative reviews associated with reduced churn?

Behavioral Cohort Churn vs. Broadly’s Customer Detractors (per Month)

To understand why more of a bad thing like detractors may be associated with less churn it helps to look at another, better metric for Broadly’s customers: If you take the number of detractors and divide it by the total number of reviews (promoters plus detractors) then the result is the percentage of detractors, which I call the Detractor Rate.

The next figure shows the relationship between churn and the detractor rate — this is probably more the kind of relationship you were expecting for a product event that is negative for the customer: The higher the detractor rate, the higher the churn, and in a very significant way.

Behavioral Cohort Churn vs. Broadly’s Detractor Rate

So why does the relationship to churn show that more detractors is good when you look at detractor count alone, and that more detractors is bad when you look at the detractor rate? The answer is that the total number of detractors is actually related to the total number of promoters, because Broadly customers who receive a lot of reviews overall are likely to receive more of both good and bad reviews, just by virtue of having lots of reviews.

So when you look at the relationship between the number of detractors and churn in the simple way, it conflates two underlying factors driving the metric: having a lot of reviews (which is good), and having a high proportion of bad reviews (which is bad).

When the proportion of bad reviews is analyzed alone you get the more useful result.
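The conflation can be made concrete with two hypothetical customers. The function name `detractor_rate` and the review counts below are invented for illustration; they are not Broadly's data.

```python
def detractor_rate(promoters, detractors):
    """Share of all reviews that are negative; None if there are no reviews."""
    total = promoters + detractors
    if total == 0:
        return None
    return detractors / total

# Invented numbers: a heavily reviewed business and a rarely reviewed one.
busy = {"promoters": 95, "detractors": 5}
quiet = {"promoters": 1, "detractors": 1}

busy_rate = detractor_rate(**busy)    # 5 of 100 reviews are negative
quiet_rate = detractor_rate(**quiet)  # 1 of 2 reviews is negative

# The busy customer has more detractors by count but a far lower rate.
print(busy["detractors"], busy_rate)
print(quiet["detractors"], quiet_rate)
```

Ranked by detractor count, the busy customer looks worse; ranked by detractor rate, it looks much better. The rate removes the effect of simply having many reviews.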


4.3 Versature’s Call and Cost per Call Metrics

Versature provides telecommunication services for businesses.

Because Versature is a unified communications provider, many of its most important events are voice calls, which have a duration stored in a field attached to each event.

The figure below shows the relationship between the number of voice calls a Versature customer makes and churn.

The lowest cohort in terms of local calls has practically zero calls and a churn rate that is around three times higher than that of the cohorts of customers whose local calls per month are in the thousands.

Behavioral Cohort Churn vs. Versature’s Local Calls

When trying to understand churn it is important to consider not only the amount of a service customers use, but also how much they pay.

Monthly Recurring Revenue (MRR) is a standard metric for calculating the amount a customer pays to use a subscription service: it is the total amount a customer pays each month to use a service, not including any setup fees or irregular charges.

(I say more about MRR and how to calculate it in the book.) The amount customers pay can also be analyzed with a behavioral cohort approach to look for a relationship with churn, which is shown below.

Behavioral Cohort Churn vs. Versature’s Customers Monthly Recurring Revenue (Score)

The behavioral cohort plot for MRR does something new: rather than displaying the average MRR of the cohorts directly it shows the average MRR after every MRR measurement has been converted to a score.

If you are familiar with the concept of “grading on a curve”, then metric scores are the same idea: the measurements are converted from one scale to another, but the ordering remains the same.

So a given cohort like the bottom 10% on the metric is still the same set of customers if the metric is converted into a score, and the cohort has the exact same churn rate.

That means that converting a metric into a score only affects how the cohorts are spread out along the horizontal axis of the cohort plot, but not the vertical position of the points, which is the churn rate.

Metrics are converted to scores when the re-scaling on the horizontal axis makes the result easier to understand.

I say more about metric scores and how to calculate them in the book.
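As a rough sketch of the idea, the snippet below converts a skewed metric to a score by taking a logarithm and then standardizing. This particular transformation is an assumption chosen for illustration, not necessarily the method used in the book; the only property relied on here is that it preserves the ordering of customers. The MRR values are made up.

```python
import math
from statistics import mean, pstdev

def to_scores(values):
    """Rescale a non-negative metric: log(1 + x), then standardize.

    Both steps are strictly increasing, so customer order is unchanged.
    """
    logged = [math.log1p(v) for v in values]
    mu = mean(logged)
    sigma = pstdev(logged)
    if sigma == 0:
        return [0.0 for _ in logged]  # every customer has the same value
    return [(x - mu) / sigma for x in logged]

def ranking(values):
    """Customer indices ordered from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

mrr = [1000, 20, 200, 50, 80]  # made-up monthly recurring revenue
scores = to_scores(mrr)

# Same ordering before and after, so cohorts contain the same customers.
print(ranking(mrr) == ranking(scores))
```

Because the ranking is identical, each cohort keeps the same members and the same churn rate; only the horizontal positions of the points change.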

The cohort churn rates show that MRR is also related to churn, though not as strongly as making calls: the churn rates in the different cohorts do not vary in a totally consistent way, and the lowest cohort churn rates are only about one half or one third less than the highest churn cohorts.

But this is another case that should make you stop and think about what the data is saying: people who pay more churn less.

Is that what you expected? This may be surprising, but it’s actually quite common, especially in business products.

That’s because business products are sold with higher prices for bigger customers, and bigger customers churn less for many reasons: They have more employees, so when it comes to product use like making calls or using software, customers who pay more for a product are usually using a lot more too.

So the lower churn for customers paying higher MRR is actually related to the lower churn for customers with more calls shown above.

But a different way to look at how the amount customers pay relates to churn is shown below: The MRR metric is divided by the metric for the number of calls per month.

This results in a metric which is the cost per call the customer makes.

I call this a “value metric”, because it explains how much of the service the customer receives for their money.
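To illustrate how a value metric can reverse the picture, here is a small sketch with two hypothetical customers. The function name `cost_per_call`, the dollar amounts, and the call counts are invented for the example; returning `None` for customers with no calls is one possible convention, not something prescribed by the text.

```python
def cost_per_call(mrr, calls_per_month):
    """MRR divided by monthly calls; None when the customer made no calls."""
    if calls_per_month <= 0:
        return None
    return mrr / calls_per_month

# Invented customers: the large one pays ten times more in total...
large = cost_per_call(mrr=500.0, calls_per_month=5000)
small = cost_per_call(mrr=50.0, calls_per_month=25)

# ...but pays far less for each call it makes.
print(large, small)
```

Ranked by MRR the large customer pays more, but ranked by cost per call it pays less, which is why the two metrics can show opposite relationships with churn.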

Like in the MRR figure, the cost per call figure shows the cohort average as a score rather than in dollars.

The behavioral cohort churn plot for cost per call shows that customers that pay more really do churn more, when the payment is measured in relation to the amount of the service consumed.

The highest cohort in cost per call has a churn rate that is around 6 times higher than the cohorts with the lowest cost per call.

Value metrics like this are a key weapon for understanding why customers churn and an important subject that will be explored fully in later chapters.

Behavioral Cohort Churn vs. Versature’s Customers Cost per Call

5. The Battle Ahead

This series is going to be different from a lot of writing about data science, because I’m not going to tell you that you need some shiny new technology or a complicated sounding algorithm.

Instead I emphasize using your brain (via the scientific method) and the following areas:

Full Stack Analytics: Going from raw data, through feature engineering and analysis, and all the way to explaining the results to the business. Metric Design aka “Feature Engineering” take the spotlight, as these are the most important activities for success.

Parsimony and Agility: Do it right, do it fast, because you are going to have to do it again, and again, and again.

Interpretability and Communication: Churn is an area where you fail if you can’t explain it to humans (I explain why below this post…).
