# Understanding Customer Lifetime Value In Retail

Harminder Puri, Feb 26

If you’re building a lifetime annual P&L for a company, a good starting point would be answering the following questions:

- Expected year-on-year sales
- Cost of acquiring new customers and retaining existing customers

To answer these questions, we need to calculate the lifetime worth of each customer.

To put it succinctly, customer lifetime value (CLTV) is the net present value of the cash flows expected over the lifetime of a customer’s relationship with the brand.
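
As a minimal sketch of the net-present-value idea, the snippet below discounts a stream of expected per-period cash flows back to today. The function name, the three-year horizon, the cash-flow figures and the 10% discount rate are all hypothetical and chosen only for illustration; they do not come from the article's data.

```python
def customer_lifetime_value(net_cash_flows, discount_rate):
    """Net present value of a customer's expected per-period net cash flows.

    net_cash_flows[t] is the expected cash flow received at the end of
    period t + 1; discount_rate is the per-period discount rate.
    """
    return sum(
        cash_flow / (1.0 + discount_rate) ** period
        for period, cash_flow in enumerate(net_cash_flows, start=1)
    )


# Hypothetical customer: 100 of margin per year for three years, 10% discount rate.
clv = customer_lifetime_value([100.0, 100.0, 100.0], 0.10)
print(round(clv, 2))
```

The later a cash flow arrives, the less it contributes, which is why the expected duration of the relationship matters as much as the spend per period.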

## What does this mean for businesses?

As with any modelling technique, understanding how customers engage with a business is extremely important.

A customer’s engagement can either be based on periodic subscription or non-subscription.

The number of engagements may vary depending on whether purchase opportunities are discrete or continuous.

Table: Examples of different business contexts

For example, a customer purchasing tickets to an Olympic event is a discrete, non-subscription engagement.

In the case of a subscription service, there is a clear demarcation of whether the customer is engaged with a brand or not.

In a non-contractual setting, such as retail, e-commerce or hospitality, this becomes trickier because the duration of the customer's engagement with the brand can only be inferred.

Most business cases are non-contractual and continuous: grocery purchases, e-commerce website purchases, airline bookings, hotel bookings, rental services, doctor visits, and so on.

Before we dive into how the customer lifetime value model is implemented, let’s look at one of the most commonly used marketing techniques for customer evaluation.

## The RFM (Recency, Frequency, Monetary) approach

In the RFM approach, we build segments based on three features for each customer: the recency of the last purchase, the frequency of purchases, and the monetary value of purchases.

Customers are then split into quintiles along each metric and scored on an ordinal scale.

Table: RFM Approach

For example, if a customer’s frequency lies in the top 20% bracket, the customer is scored 5 (on a scale of 1–5) for frequency.

We can thus assign an RFM score to each customer. This is one way of grouping customers on the three variables, and it gives an insight into a customer’s spend potential based on the training window.
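
The quintile scoring described above can be sketched in a few lines. This is an illustrative sketch only: the customer IDs and values are made up, the helper `quintile_scores` is a hypothetical name, and ties are broken by input order, which a production implementation would probably want to handle more carefully.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value 1-5 by the quintile of its rank (5 = best 20%)."""
    n = len(values)
    # Order indices from worst to best; ties keep their input order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // n + 1
    return scores


# Hypothetical training-window data: (recency in days, frequency, monetary value).
customers = {
    "c01": (3, 14, 900.0), "c02": (10, 9, 450.0), "c03": (45, 3, 120.0),
    "c04": (200, 1, 30.0), "c05": (7, 11, 700.0), "c06": (90, 2, 60.0),
    "c07": (30, 5, 300.0), "c08": (150, 4, 25.0), "c09": (20, 7, 500.0),
    "c10": (60, 6, 200.0),
}

ids = list(customers)
# A recent purchase is better, so fewer days since the last purchase scores higher.
r = quintile_scores([customers[c][0] for c in ids], higher_is_better=False)
f = quintile_scores([customers[c][1] for c in ids])
m = quintile_scores([customers[c][2] for c in ids])
rfm = {c: (r[i], f[i], m[i]) for i, c in enumerate(ids)}

for c in ids:
    print(c, rfm[c])
```

Because the scores are relative ranks within one training window, rerunning the same code on a different window can move a customer to a different segment.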

## What happens if the training window is changed?

Imagine looking at two years’ worth of RFM data for each customer.

Consider each year as a single training window.

A customer’s RFM scores may well differ from one window to the next.

RFM lets you peek into the window immediately following the training window, but it is not well suited to evaluating the lifetime worth of a customer.

This poses a challenge, and to overcome it we need to explore alternative approaches for calculating lifetime value.

## Probability Models: Pareto/NBD

This is where a probability model such as Pareto/NBD (NBD: Negative Binomial Distribution)*, which characterizes the observed attributes of individual customers (recency, frequency), can help demystify customer lifetime value in a non-contractual, continuous-purchase setting.

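The model describes each customer's behaviour through two processes:

- Transaction process ( f(x|λ) ): While active, customer i's number of purchases in the given window follows a Poisson distribution with an average purchase rate of λ(i). (The NBD in the model's name arises when these Poisson counts are mixed over gamma-distributed purchase rates across customers.)
- Dropout process ( f(τ|μ) ): Customer i's lifetime τ follows an exponential distribution with a dropout rate of μ(i). (Mixed over gamma-distributed dropout rates across customers, this gives the Pareto distribution in the model's name.)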

Fig: Summary of the Pareto/NBD model for CLTV estimates

Now that we have individual-level attributes, we can estimate the population distribution by generalizing over the observed attributes.

Heterogeneity in the purchase rate λ across the population is modelled using a gamma distribution with parameters (r, α), and similarly, heterogeneity in the dropout rate μ is modelled using a gamma distribution with parameters (s, β).
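
To make the generative story concrete, here is a small simulation sketch of the assumptions described above, using only Python's standard library. It is not a fitting routine. It assumes, following the usual convention in Schmittlein et al., that r and s are gamma shape parameters and α and β are gamma rate parameters; the parameter values, the 52-week window, and the function name `simulate_customer` are hypothetical.

```python
import random


def simulate_customer(r, alpha, s, beta, window, rng):
    """Simulate one customer's repeat purchases in (0, window].

    Returns (frequency, recency, alive_at_end_of_window).
    """
    # Individual rates drawn from the population gamma distributions.
    # random.gammavariate takes (shape, scale), so a rate alpha means scale 1/alpha.
    lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate lambda(i)
    mu = rng.gammavariate(s, 1.0 / beta)    # dropout rate mu(i)

    # Exponentially distributed lifetime; purchases stop once it is over.
    lifetime = rng.expovariate(mu)
    active_until = min(lifetime, window)

    # Poisson purchasing while active: exponential gaps between purchases.
    purchase_times = []
    t = rng.expovariate(lam)
    while t <= active_until:
        purchase_times.append(t)
        t += rng.expovariate(lam)

    frequency = len(purchase_times)
    recency = purchase_times[-1] if purchase_times else 0.0
    return frequency, recency, lifetime > window


rng = random.Random(7)
# Hypothetical population parameters, with time measured in weeks.
cohort = [simulate_customer(2.0, 20.0, 2.0, 40.0, 52.0, rng) for _ in range(1000)]
mean_frequency = sum(freq for freq, _, _ in cohort) / len(cohort)
share_alive = sum(1 for _, _, alive in cohort if alive) / len(cohort)
print(mean_frequency, share_alive)
```

Fitting the model runs this logic in reverse: given each customer's observed frequency and recency, it estimates the population parameters (r, α, s, β) that make those observations most likely.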

## Key results from a Pareto/NBD model

A number of outputs are derived for each customer from the above distributions:

1. DERT (discounted expected residual transactions over the lifetime): the forecast of the residual number of transactions by each customer over the customer's lifetime.

   Fig: Lifetime purchase frequency forecast

2. DERTI (discounted expected residual transactions over a predefined time period): the forecast of the residual number of transactions by each customer over the period i (this can be a pre-decided period).

3. DERLV (discounted expected residual lifetime value): the expected additional lifetime revenue (individual customer’s lifetime value = average revenue per transaction × DERT).

   Fig: Lifetime revenue (sample) for different purchase frequency buckets

4. Probability(alive): the probability of a customer being active at the end of the training window.

## Evaluating model performance via test period purchase frequency

Based on the outputs derived from the Pareto/NBD approach, we then need to evaluate how well the model performs during the test period.

A good way to measure how well the model fits the test window is the multiclass confusion matrix for purchase frequency (actuals vs forecast).

Fig: Repeat purchase frequency actuals vs forecast count for a test period of 3 months; the numbers in each bucket show the number of customers categorized in that bucket

Here again, a good starting point is evaluating the cost of acquiring a customer vs the returns per customer.

For a retail customer, we would be more focused on getting a high recall for each frequency bucket (given the low cost of acquisition).
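
The evaluation described above, a cross-tab of actual vs forecast frequency buckets with recall per bucket, might look like the sketch below. The eight customers and their bucket labels are invented for illustration, and the function name is hypothetical; the article's own figures are not reproduced here.

```python
from collections import Counter


def confusion_and_recall(actual, forecast):
    """Cross-tabulate actual vs forecast buckets and compute recall per actual bucket."""
    counts = Counter(zip(actual, forecast))
    recall = {}
    for bucket in sorted(set(actual)):
        # All customers whose actual bucket is `bucket`, whatever was forecast.
        total = sum(n for (a, _), n in counts.items() if a == bucket)
        recall[bucket] = counts[(bucket, bucket)] / total
    return counts, recall


# Hypothetical test-period purchase frequency buckets ("0" = non-repeat customer).
actual = ["0", "0", "0", "0", "1", "1", "2+", "2+"]
forecast = ["0", "0", "0", "1", "1", "0", "2+", "1"]

counts, recall = confusion_and_recall(actual, forecast)
for bucket, value in recall.items():
    print(bucket, value)
```

Recall answers the question "of the customers who really landed in this bucket, how many did the model place there?", which is the quantity the retail use case above cares about most.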

Another interesting split to look at is repeat vs non-repeat customers.

For instance, as seen in the cross-tab, most non-repeat customers (~88%) are captured in the correct bucket.

However, in the frequency 1 bucket, many non-repeats are also captured as repeats.

This does not matter in the given context due to the low cost of customer acquisition.

Fig: Model accuracy on purchase frequency categorization into correct buckets across customers

## Media Activation: targeting high-value customers

Once we have an expected revenue value assigned to each customer, these values can be used as normalized weights for media activation.
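
As a rough sketch of this step, the snippet below computes each customer's expected residual revenue as average revenue per transaction times DERT, then normalizes the values. Normalizing so that the weights sum to 1 is an assumption, since the article does not say which normalization is used; the customer IDs, figures and function name are hypothetical.

```python
def activation_weights(avg_revenue_per_txn, dert):
    """Return (DERLV per customer, weights that sum to 1).

    Assumes every customer in `dert` also appears in `avg_revenue_per_txn`
    and that total DERLV is positive.
    """
    derlv = {c: avg_revenue_per_txn[c] * dert[c] for c in dert}
    total = sum(derlv.values())
    weights = {c: value / total for c, value in derlv.items()}
    return derlv, weights


# Hypothetical model outputs for three customers.
avg_revenue = {"c01": 50.0, "c02": 20.0, "c03": 100.0}
dert = {"c01": 4.0, "c02": 5.0, "c03": 1.0}

derlv, weights = activation_weights(avg_revenue, dert)
print(derlv)
print(weights)
```

Note that a customer with a high basket value but few expected transactions (c03) can end up with the same weight as a frequent, low-spend customer (c02).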

This entire process is automated via MiQ’s AMAP workflow, which ingests a customer's website data along with customer IDs in real time and uses them for retargeting.

AMAP is MiQ’s proprietary Analytics, Modelling and Action platform, a self-service platform that automates the process of transforming disparate data sets into actionable insights and acting on them.

## Challenges with Pareto/NBD implementation

As with any temporal data-based model, the model is only as good as the time range of the data supplied.

Hence, it is essential that the data captured cover a time frame considerably longer than the typical customer repeat-purchase window.

Another limitation of this approach is that each customer’s transaction value is taken to be constant for future transactions.

In practice, however, it may vary over a lifetime owing to intrinsic and extrinsic factors.

If the model is periodically rebuilt, the average transaction value keeps updating for customers.

However, if it is a one-time activity, then this value may drift away from actuals in the long term.

A solution, in that case, is to use a gamma distribution of exponential parameters to forecast the monetary component.
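
The article does not name a specific monetary model. One widely used model of this general kind is the gamma-gamma spend model of Fader and Hardie; treating it as the approach meant here is an assumption. Under that model, a customer's expected average transaction value is a weighted blend of the population mean and the customer's own observed mean, as the sketch below shows. The parameter values (p, q, γ) are hypothetical rather than fitted, and the formula requires q > 1.

```python
def expected_avg_transaction_value(p, q, gamma, mean_spend, n_transactions):
    """Expected average spend per transaction under the gamma-gamma model.

    mean_spend is the customer's observed average spend over n_transactions
    purchases; p, q and gamma are population parameters (q must exceed 1).
    """
    return (gamma + mean_spend * n_transactions) * p / (p * n_transactions + q - 1.0)


# Hypothetical population parameters; the implied population mean is gamma * p / (q - 1).
p, q, gamma = 6.0, 4.0, 15.0

no_history = expected_avg_transaction_value(p, q, gamma, 0.0, 0)
few_purchases = expected_avg_transaction_value(p, q, gamma, 60.0, 2)
many_purchases = expected_avg_transaction_value(p, q, gamma, 60.0, 20)
print(no_history, few_purchases, many_purchases)
```

With no purchase history the estimate falls back to the population mean, and as the number of observed transactions grows it moves towards the customer's own average. That makes the monetary forecast less sensitive to a handful of unusual baskets.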

Finally, irrespective of the approach adopted for calculating CLTV, building an “always on” or “periodically refreshed” model ensures that any feature change across customers is captured to accurately forecast the lifetime value.

*Source: Schmittlein, David C., et al. “Counting Your Customers: Who Are They and What Will They Do Next?” Management Science, vol. 33, no. 1, 1987, pp. 1–24.