How Bayesian Probability Models Can Make CLV Predictions 12x More Accurate

This is part three of a three-part series exploring ways to calculate CLV in a retail setting.  Part one discussed the shortcomings of using ARPU to calculate lifetime value, and part two discussed the shortcomings of historical cohort analysis to calculate CLV.  

Imagine you’re the marketing manager of an online retail company.  You’re wondering how much you should spend to acquire new customers.  You know you should look beyond conversion rates and set your budget based on customer lifetime value, so you ask three analysts to calculate CLV.  Each uses a different approach, and the results vary wildly:

  • Analyst A uses an ARPU-based approach, and tells you the CLV is $240.
  • Analyst B uses a historical, cohort-based approach, and tells you the CLV is $150.
  • Analyst C uses probabilistic modeling and tells you the CLV is $108.

You listen to Analyst C, and it’s a good thing you did.  It turns out the actual CLV was $100.  Had you based your acquisition spend on the other analysts’ numbers, you could have lost as much as $140 per customer.

Getting predictive with CLV

CLV is a prediction – if I pick up a new customer today, what will he spend over his customer lifetime?  As we pointed out in our previous posts in this series, if you base CLV numbers entirely off the past, you can end up with very inaccurate CLV numbers.

When it comes to predicting CLV, as with any predictive science, there are many approaches you can take.  In this post, we’ll describe how probabilistic modeling can be used to predict lifetime value.

We’ll start with the story behind these models, then describe a mathematical approach.

Probabilistic CLV models: the story behind the approach

“Probabilistic models” might sound like a mouthful, but they begin with a clean, simple story about the customer.  Each customer is unique.  Each one orders at his own pace and frequency.  Each one has a chance to be a loyal customer, but each one might also turn out to be a one-time buyer.

The two variables we described – order frequency and loyalty – can form the basis of our probabilistic model.  We can think of each customer as having a pair of dice that he rolls every month to determine how often he orders, and a coin he flips every month to determine whether or not he will remain a customer.  Because we know the danger of using average rates, we assume that each customer has his own weighted dice and weighted coin.

To make accurate CLV predictions, the goal of these models is to understand the distribution of those dice and coins.  What percentage of the customers are weekly buyers?  Annual shoppers?  What percentage are loyal?  One-and-done?  Why is this valuable?  There are a few benefits to such modeling techniques:

  1. As we just mentioned, staying away from average retention rates can lead to a massive improvement in CLV accuracy.  By understanding the distribution of loyal and non-loyal customers, we avoid the common “average” problem.
  2. We also gain the ability to make CLV projections for specific customers (more details on this below).
  3. The distribution itself tells us about the customer base.  Do we have a lot of great customers, and a lot of poor ones – a love/hate relationship with our customers?  Or is there an even distribution of customer quality?

The results

Before digging into some details about the math, we can take a look at how accurate these models are in the “real world.”  We are constantly testing the accuracy of different customer lifetime value techniques.  On average, probabilistic models significantly outperform historical techniques.

If you’re going to make a business decision based on CLV numbers, you absolutely must ask your team how they’re generating their projections – and how accurate those numbers are.

The math behind probabilistic modeling

For the modelers who are reading, there are many forms of probabilistic modeling one can use to project CLV.  We’ll dig into one common approach here.

We follow a two-step process: first, we set a framework to model the individual customer, then we account for customer heterogeneity.

To model the individual, we return to the customer story above.  We can think of each customer as having two variables: λ (lambda), which represents his order frequency, and μ (mu), which represents his dropout rate (i.e. lambda is our “dice,” and mu is our “coin”).  We can go one step further with our frequency variable and acknowledge that customers rarely order on a strict pattern.  To handle this reality, we can think of λ as the mean number of orders a customer makes in a period – the mean of a Poisson distribution.
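Here’s a minimal sketch of that individual-level story in Python.  The function and the parameter values are invented purely for illustration: we hand one hypothetical customer a λ of 2 orders per month and a monthly dropout rate μ of 0.1, and simulate what his order history might look like.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_customer(lam, mu, n_months=24):
    """Toy simulation of one customer's monthly order counts.

    lam -- mean orders per active month (the "dice", a Poisson rate)
    mu  -- dropout rate per month (the "coin"); the customer's lifetime
           is drawn from an exponential distribution with rate mu
    """
    lifetime = rng.exponential(1.0 / mu)  # months until the customer silently churns
    return [rng.poisson(lam) if month < lifetime else 0
            for month in range(n_months)]

# A fairly engaged hypothetical customer: ~2 orders/month, ~10% monthly dropout risk.
print(simulate_customer(lam=2.0, mu=0.1))
```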

Next we shift gears and focus on the distribution of λ and μ across our customer population.  We need a distribution that is flexible enough to describe very different customer bases, yet has only a few parameters so the model stays tractable.  The gamma distribution is a natural candidate: with just two parameters it can take on a wide range of shapes, and it characterizes most customer bases very well.
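To make that heterogeneity concrete, here’s a small sketch that draws a λ and a μ for each of 10,000 hypothetical customers from two gamma distributions.  The shape and rate parameters below are made up for illustration only; in practice they are exactly what the fitting step described below estimates.

```python
import numpy as np

rng = np.random.default_rng(7)
n_customers = 10_000

# Hypothetical gamma parameters (shape r, rate alpha), chosen only for illustration.
r_lam, alpha_lam = 0.8, 1.6   # heterogeneity in order frequency λ (orders per month)
r_mu,  alpha_mu  = 1.2, 6.0   # heterogeneity in dropout rate μ (per month)

lam = rng.gamma(shape=r_lam, scale=1.0 / alpha_lam, size=n_customers)
mu  = rng.gamma(shape=r_mu,  scale=1.0 / alpha_mu,  size=n_customers)

# How uneven is this simulated customer base?
print("share ordering less than once a quarter:", np.mean(lam < 1 / 3))
print("share ordering more than once a week:   ", np.mean(lam > 4))
print("share expected to stick around > 1 year:", np.mean(1.0 / mu > 12))
```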

Now we have a way to model individuals, and we have a mathematical way to describe how people differ.  We are ready to ask our model, “what distribution of dice and coins would have produced the behavior we have seen in the past?”  We use maximum likelihood estimation to answer it: numeric optimization searches for the parameters of the two gamma distributions, one for λ and one for μ, that best explain the ordering patterns we have observed.  The optimizer does the work of testing different candidate distributions and eventually converges on the shapes that best describe what’s going on in our user base.
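To show the mechanics of that estimation step, here’s a deliberately simplified sketch.  It fits only the frequency side of the model: we pretend we’ve observed each customer’s order count over a single period, integrate λ out of the Poisson (which yields a negative binomial), and let a numeric optimizer recover the gamma parameters by maximum likelihood.  The full models (e.g. Pareto/NBD) work from each customer’s complete order history and fit the λ and μ distributions jointly, so treat this purely as an illustration of the idea, not as the production fitting code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom

rng = np.random.default_rng(0)

# Simulated "observed" data: each customer's order count over one period,
# generated from λ ~ Gamma(r, α) and orders ~ Poisson(λ).
true_r, true_alpha = 0.8, 1.6
lam = rng.gamma(shape=true_r, scale=1.0 / true_alpha, size=5_000)
counts = rng.poisson(lam)

def neg_log_likelihood(params):
    # Optimize on the log scale so r and α stay positive.
    r, alpha = np.exp(params)
    # Marginalising λ out of Poisson(λ), with λ ~ Gamma(r, α), gives a
    # negative binomial with n = r and p = α / (α + 1).
    return -nbinom.logpmf(counts, r, alpha / (alpha + 1.0)).sum()

result = minimize(neg_log_likelihood, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
r_hat, alpha_hat = np.exp(result.x)
print(f"estimated r = {r_hat:.2f}, alpha = {alpha_hat:.2f}")  # should land near 0.8, 1.6
```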

Once we have obtained these distributions, we can use the outputs to derive a more accurate, precise projection for the expected number of orders a new customer will make.  We now understand the probability that a customer will be loyal or not, and the probability a customer will be a frequent shopper or a once-a-year buyer.  By avoiding the dangerous average retention rate in both these cases, we’ll derive much more accurate numbers.
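As a back-of-the-envelope example of turning the fitted distributions into an expected order count, the snippet below reuses the same hypothetical parameters as above and assumes λ and μ are independent.  The expected order rate is the mean of the λ gamma, and the expected lifetime is the mean of 1/μ (which only exists when the μ shape parameter is above 1).  The published models derive exact, finite-horizon expectations instead, so this is only a rough illustration of the kind of number that falls out.

```python
# Back-of-the-envelope expected orders for a brand-new customer.
# The parameters are the same hypothetical values used above; a fitted
# model would supply its own estimates here.
r_lam, alpha_lam = 0.8, 1.6   # gamma over order rate λ (orders per month)
r_mu,  alpha_mu  = 1.2, 6.0   # gamma over dropout rate μ (per month)

mean_orders_per_month = r_lam / alpha_lam          # E[λ]
mean_lifetime_months  = alpha_mu / (r_mu - 1.0)    # E[1/μ], valid only when r_mu > 1

print("expected orders per active month:", mean_orders_per_month)   # 0.5
print("expected lifetime in months:     ", mean_lifetime_months)    # 30.0
print("expected lifetime orders:        ",
      mean_orders_per_month * mean_lifetime_months)                 # 15.0
```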

Moreover, we can use Bayes’ theorem to make projections for specific customers.  Given what we have seen from a specific customer, and given what we know of the whole customer base, we can make informed probabilistic projections about individual customers.  CLV is no longer a game of the population at large – it’s a figure you can project for each and every customer.
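For the frequency side, this update has a particularly clean form because the gamma is the conjugate prior of the Poisson: if the population’s λ follows a Gamma(r, α) and a customer places k orders over T months, his posterior order rate follows a Gamma(r + k, α + T).  The sketch below shows that update with the same hypothetical parameters as before; the full models update the loyalty side (μ) as well, using recency in addition to frequency.

```python
# Bayesian update of one customer's order rate, using gamma–Poisson conjugacy.
# Population parameters are the hypothetical values used above.
r, alpha = 0.8, 1.6            # population gamma over λ (shape, rate)

k, T = 9, 6                    # this customer placed 9 orders in his first 6 months

prior_mean     = r / alpha                 # best guess for a brand-new customer
posterior_mean = (r + k) / (alpha + T)     # Gamma(r + k, alpha + T) posterior

print(f"population mean order rate:     {prior_mean:.2f} orders/month")
print(f"this customer's posterior rate: {posterior_mean:.2f} orders/month")
```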

Taking things further

There are some obvious limitations to this model.  Like most models, it assumes the world is static.  Covariates can be added to handle things like seasonality and gradual changes to the business, but adding these to the model estimation is no small feat.  The base model, often referred to as a “buy ’til you die” or “latent attrition” model, assumes that customers, once they leave, are gone for good.  This isn’t always the case.  Hidden Markov models can be used instead, where we assume customers move through latent active/inactive states before actually leaving for good, and simulations can be run to see whether the HMM describes the customers better, and makes better predictions, than the latent attrition model.

Things get even trickier when you add other marketing actions into the picture.  If a firm induces a purchase with a 50% discount, should we treat that order like any other order and update our user-level expectations accordingly?  These are the issues that keep our team up at night.

Finally, we’ve been focusing this entire piece on projecting orders.  Separate models are required to estimate expected revenue and profit at the user level.

How are you modeling CLV and handling these challenges?  We’d love to chat here in the comments or over in the discussion on Hacker News.

  • Mr. Typeface

    This will sound nitpicky, but can you guys please use a different typeface for the body copy? It looks really, really bad.

    • Corey

      Thanks for the feedback. We’re going to make a handful of changes to the blog design in a little while, including the font.

      • Plar

        No!! As I was reading this post, I thought to myself, “Wow, what a good font. Big and clean. I should use that for my own blog”. So please don’t change it. I think you could reduce the brightness around the font (or make the font as dark as possible). Otherwise, it’s a good font to read. Thank you for an insightful post, too!

        • Anonymous

          Thanks for the kind words! It seems to be a great font on OS X/Linux, but not so good on Windows. We may end up with a different font just for Windows users.


  • Mr. Lynx

    Your links have these strange spaces at the ends of them…what’s that about?

    • binarysolo

      I don’t seem to get them FWIW. OSX Lion/Chrome here.

    • Anonymous

      Thanks for the tip. We’d love to fix that, but can’t recreate — what OS/browser are you using?

        • Christopher Lloyd (http://www.facebook.com/chrislloyd515)

        I’m getting the same thing with Mountain Lion and Chrome Dev. Seems to be a bug with my setup… right clicking and going to “inspect element” fixes it across the page.

        • Matt

          Yeah Chrome has a few issues like this from time to time, I think it might be a race condition in the rendering flow.

  • Tom

    Sounds interesting but it’s hard to follow “The Math” when it’s just a wall of text. How about some actual math? e.g. for a given customer with feature vector x, CLV(x) = what?

  • Chinmay Kulkarni

    So, just to be clear, is your gamma distribution the conjugate prior of the Poisson? Or is this a completely independent random variable?

    • Aaron Goodman

      That’s correct. Check out the discussion on HN. We posted a bunch of links and details of the model there.

  • Ed

    Great post! Thanks! How would you apply the probabilistic approach to a game developer startup when you don’t have historical data? Use data from similar games?