Bayesian Cluster Analysis For Retail Using LDA

Probabilistic models help us predict how people will behave in the future. At Custora, we’re most interested in purchasing. We’ve recently posted several entries on purchasing behavior, and in the past I have talked about how we model customer purchase behavior. In this post, I am going to talk a little bit about a model used in marketing to predict which products customers will buy.

Here I will discuss how we model which brands and categories customers will buy. These models help us understand brand dynamics and can be used to market to customers more effectively. In particular, they show which brands customers have an affinity for and which brands group together. Understanding both is necessary for tailoring the shopping experience and email messaging to existing customers.

The basic Dirichlet model was developed by Goodhardt, Ehrenberg and Chatfield to explain how customers choose between different brands. They wanted a way to explain differences in repeat buying behavior, penetration, and purchase frequency across brands.

Model Intuition

The model they developed says that each individual’s purchases follow a multinomial distribution, and that the weights of these multinomial distributions are Dirichlet distributed across the population. To start, you can think of each person as having a weighted die that he throws before making each purchase. Let’s look at toothpaste as an example. I usually buy Colgate, but I sometimes go with other brands if they are on sale or if the store is out of stock. So my die would be weighted something like this:

Toothpaste Percentage
Colgate 70%
Crest 5%
Aquafresh 9%
Arm & Hammer 3%
Tom’s of Maine 1%
Other 12%

I don’t actually roll a die before purchasing my toothpaste, and there are legitimate reasons for buying the brands I buy, but it’s convenient to model my behavior as a random process. My sister, however, really likes to buy organic, and will go to great lengths to get her favorite brand.

Toothpaste Percentage
Colgate 4%
Crest 5%
Aquafresh 4%
Arm & Hammer 3%
Tom’s of Maine 80%
Other 4%
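
To make the weighted-die picture concrete, here is a minimal Python sketch that rolls each of the two dice above many times. The weights come straight from the two tables; the function name, the seed, and the number of rolls are illustrative choices of mine, not part of the original model.

```python
import random
from collections import Counter

# Brand weights copied from the two tables above (percent of purchases).
BRANDS = ["Colgate", "Crest", "Aquafresh", "Arm & Hammer", "Tom's of Maine", "Other"]
MY_DIE = [70, 5, 9, 3, 1, 12]
SISTER_DIE = [4, 5, 4, 3, 80, 4]


def simulate_purchases(weights, n_purchases, rng):
    """Roll one shopper's weighted die n_purchases times and count the brands."""
    draws = rng.choices(BRANDS, weights=weights, k=n_purchases)
    return Counter(draws)


rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
mine = simulate_purchases(MY_DIE, 1000, rng)
sisters = simulate_purchases(SISTER_DIE, 1000, rng)

print("Me:    ", mine.most_common(3))
print("Sister:", sisters.most_common(3))
```

Each purchase is an independent roll, so over many purchases the observed brand counts drift toward the weights on the die.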

So, we’ve made it clear that everyone has his own brand preferences, represented by his own weighted die. Now imagine every consumer in a population tosses his or her individual die into a bucket. We can take the average of all the dice in that bucket and find the population average. For toothpaste in the United States, that looks something like this:

Toothpaste Percentage
Colgate 32%
Crest 27%
Aquafresh 10%
Arm & Hammer 8%
Tom’s of Maine 1%
Other 12%
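
The bucket of dice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population shares are taken from the table above and renormalized inside the code so they sum to one, and the concentration value (which controls how lopsided individual dice are) is a number I made up for the example.

```python
import random

BRANDS = ["Colgate", "Crest", "Aquafresh", "Arm & Hammer", "Tom's of Maine", "Other"]
POPULATION_SHARE = [32, 27, 10, 8, 1, 12]  # from the table above
CONCENTRATION = 2.0  # assumed; smaller values give more lopsided individual dice


def dirichlet_alphas(shares, concentration):
    """Turn average shares into Dirichlet parameters (shares are renormalized)."""
    total = sum(shares)
    return [concentration * s / total for s in shares]


def draw_die(alphas, rng):
    """Sample one shopper's die from a Dirichlet via normalized gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(1)  # arbitrary fixed seed
alphas = dirichlet_alphas(POPULATION_SHARE, CONCENTRATION)
bucket = [draw_die(alphas, rng) for _ in range(20000)]

# Averaging every die in the bucket recovers the (renormalized) population shares.
average_die = [sum(die[j] for die in bucket) / len(bucket) for j in range(len(BRANDS))]
for brand, share in zip(BRANDS, average_die):
    print(f"{brand}: {share:.1%}")
```

Individual dice in the bucket can look very different from one another, yet their average still lands on the population-level shares.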

Segmented Markets

It turns out that this model works very well for “unsegmented markets,” meaning that the proportion of purchases devoted to a given brand is independent of the purchases of the other brands. Our toothpaste example is “unsegmented” if people have preferences for brands that don’t depend on their other preferences. However, let’s consider a segmented market, say automobile purchases. Suppose nationwide the data breaks down something like this:

Brand Market Share
GM 19%
Ford 16%
Toyota 13%
Chrysler 11%
Honda 10%
Other 31%

However, this breakdown may not quite tell the whole story. In a market like automobiles, there are preferences that transcend individual brands. For instance, some consumers only buy cars that are made in the US. The standard, unsegmented Dirichlet model does not do a good job of capturing these customers’ preferences. In this case, the market is better described as two segments: people who buy domestic cars and people who buy foreign cars.

Brand Segment 1 Segment 2
GM 39% 6%
Ford 31% 6%
Toyota 3% 20%
Chrysler 3% 16%
Honda 4% 14%
Other 15% 38%

Jain et al. introduced a clever model to handle segmented markets. Rather than just having one bucket of dice for the population, they use two or more, depending on the number of segments. Before each customer takes a die from a bucket, he first rolls a different weighted die to determine which bucket he should pull from. From there, he pulls a die from that bucket and makes purchases according to his die, as in the basic Dirichlet model.

With our automobile example, we would have two buckets representing different distributions: one for the domestic car customers and one for the foreign car customers.
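
The two-bucket process can be sketched as a small simulation. The segment shares below are copied from the table above and renormalized in the code; the segment weights and the concentration are values I assumed purely for illustration, and the function names are hypothetical.

```python
import random

BRANDS = ["GM", "Ford", "Toyota", "Chrysler", "Honda", "Other"]
SEGMENT_SHARES = {
    "segment_1": [39, 31, 3, 3, 4, 15],   # from the table above
    "segment_2": [6, 6, 20, 16, 14, 38],  # from the table above
}
SEGMENT_WEIGHTS = {"segment_1": 0.4, "segment_2": 0.6}  # assumed for illustration
CONCENTRATION = 5.0  # assumed; controls how much dice vary within a bucket


def draw_die(shares, concentration, rng):
    """Pull one die from a bucket: a Dirichlet draw centered on the segment shares."""
    total = sum(shares)
    gammas = [rng.gammavariate(concentration * s / total, 1.0) for s in shares]
    norm = sum(gammas)
    return [g / norm for g in gammas]


def simulate_customer(n_purchases, rng):
    """Step 1: pick a bucket. Step 2: pull a die. Step 3: roll it per purchase."""
    segments = list(SEGMENT_WEIGHTS)
    segment = rng.choices(segments, weights=[SEGMENT_WEIGHTS[s] for s in segments])[0]
    die = draw_die(SEGMENT_SHARES[segment], CONCENTRATION, rng)
    purchases = rng.choices(BRANDS, weights=die, k=n_purchases)
    return segment, purchases


rng = random.Random(2)  # arbitrary fixed seed
for _ in range(3):
    print(simulate_customer(4, rng))
```

In real data we only see the purchases, not the bucket each customer drew from, which is why segment membership has to be inferred.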

This type of analysis gives retailers valuable insights into their customers’ preferences. For example, retailers might be able to find there are certain common attributes of some products that make them appealing to certain customers, and they could redesign their website accordingly. Or they could use the clusters to customize the message they send to each customer.

In order to create an effective customized message to send to each customer, a retailer needs to be able to figure out which segment the customer belongs to. It is important to acknowledge that there is a degree of uncertainty here. Many of us have made the mistake of ordering socks from Amazon only to find our recommendations list flooded with more socks. It is important for retailers to avoid this pitfall and appreciate the amount of uncertainty that goes into a customer’s brand preferences. Just because a customer once bought a Honda does not automatically make him a member of the foreign car segment. There is some possibility that he was a domestic car customer who bought a Honda once because of a dealership special that was running.

This is where the probability model comes in. We look at the items that a customer has bought, and then we can come up with a probability that he belongs to each of the segments. A customer who has bought one Honda might have a 70% chance of belonging to the Foreign car segment and a 30% chance of belonging to the Domestic segment. But if he goes on to buy three more Hondas, he would have a 10% chance of being a domestic customer and a 90% chance of being a foreign car customer.
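
Here is a rough sketch of that calculation using Bayes’ rule with a Dirichlet-multinomial likelihood for each segment. The segment shares come from the two-segment table earlier in the post (renormalized in the code); the 50/50 prior, the concentration value, and the "domestic"/"foreign" labels for Segment 1 and Segment 2 are my assumptions, so the numbers will not match the 70%/90% figures quoted above exactly.

```python
import math

BRANDS = ["GM", "Ford", "Toyota", "Chrysler", "Honda", "Other"]
SEGMENT_SHARES = {
    "domestic": [39, 31, 3, 3, 4, 15],   # Segment 1 in the earlier table
    "foreign": [6, 6, 20, 16, 14, 38],   # Segment 2 in the earlier table
}
PRIOR = {"domestic": 0.5, "foreign": 0.5}  # assumed prior segment weights
CONCENTRATION = 5.0  # assumed Dirichlet concentration for both segments


def log_marginal(counts, shares, concentration):
    """Log probability of a purchase sequence under a Dirichlet-multinomial."""
    total = sum(shares)
    alphas = [concentration * s / total for s in shares]
    a_sum, n_sum = sum(alphas), sum(counts)
    result = math.lgamma(a_sum) - math.lgamma(a_sum + n_sum)
    for a, n in zip(alphas, counts):
        result += math.lgamma(a + n) - math.lgamma(a)
    return result


def segment_posterior(purchases):
    """Posterior probability of each segment given a list of purchased brands."""
    counts = [purchases.count(b) for b in BRANDS]
    log_post = {
        seg: math.log(PRIOR[seg]) + log_marginal(counts, shares, CONCENTRATION)
        for seg, shares in SEGMENT_SHARES.items()
    }
    peak = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {seg: math.exp(v - peak) for seg, v in log_post.items()}
    norm = sum(unnorm.values())
    return {seg: u / norm for seg, u in unnorm.items()}


one_honda = segment_posterior(["Honda"])
four_hondas = segment_posterior(["Honda"] * 4)
print(one_honda)
print(four_hondas)
```

With these assumed settings the foreign-segment probability comes out at about 77% after one Honda and about 87% after four: each additional Honda adds evidence, but never certainty.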

It’s also important to realize that there is heterogeneity within segments. People within the Domestic segment, for instance, may prefer one domestic brand over another. For example, a customer who has bought three Hondas, one Toyota, and one Ford might have underlying propensities that look like this:

Brand Propensity
GM 2%
Ford 8%
Toyota 15%
Chrysler 5%
Honda 60%
Other 15%
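
One simple way to estimate propensities like these is the posterior mean of a Dirichlet prior updated with the customer’s purchase counts. The sketch below is a simplification: it conditions on the customer being in the foreign segment (Segment 2 in the earlier table) rather than averaging over both segments, and the concentration value is assumed, so its output is illustrative and will differ from the table above.

```python
BRANDS = ["GM", "Ford", "Toyota", "Chrysler", "Honda", "Other"]
FOREIGN_SHARES = [6, 6, 20, 16, 14, 38]  # Segment 2 in the earlier table
CONCENTRATION = 5.0  # assumed; larger values pull estimates toward the segment average


def posterior_propensities(purchases, shares, concentration):
    """Posterior mean of one customer's die: (alpha_j + n_j) / (sum(alpha) + N)."""
    total = sum(shares)
    alphas = [concentration * s / total for s in shares]
    counts = [purchases.count(b) for b in BRANDS]
    denom = sum(alphas) + sum(counts)
    return {b: (a + n) / denom for b, a, n in zip(BRANDS, alphas, counts)}


history = ["Honda", "Honda", "Honda", "Toyota", "Ford"]
estimate = posterior_propensities(history, FOREIGN_SHARES, CONCENTRATION)
for brand, p in estimate.items():
    print(f"{brand}: {p:.0%}")
```

The estimate blends the segment’s average die with the customer’s own history, so a few Honda purchases shift weight toward Honda without driving the other brands to zero.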

Heterogeneity Within Segments

Even though we can say that this customer probably belongs to the Foreign segment, we can see that he exhibits a strong preference within that segment for Hondas. We can also see multiple segments in terms of brand choice. One retailer that we looked at sold products in a number of categories, and it had two segments that looked roughly like this:

Category Segment 1 Segment 2
Art 30% 35%
Linens 25% 20%
Kitchenwares 15% 20%
Lighting 15% 10%
Organization 15% 10%

We found that the clusters themselves were very similar; the difference was in how customers in each cluster explored the categories. Customers in Segment 1 tended to explore much more than customers in Segment 2. That is, a customer in Segment 1 was likely to purchase from many categories, whereas a customer in Segment 2 tended to keep purchasing from the same categories.

Number of Purchases             1    2    3    4    5    6
Categories Tried (Segment 1)    1    1.6  2.1  2.4  2.8  3
Categories Tried (Segment 2)    1    1.1  1.3  1.4  1.4  1.5

So even though the segments have roughly the same overall proportions, Segment 1 is made up of similar customers purchasing things in roughly even proportions, and Segment 2 is made up of dissimilar customers who have mostly bought one thing.
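
This effect falls out of the Dirichlet concentration: two segments can share similar average proportions while differing in how much individual dice vary. The sketch below computes the expected number of distinct categories tried after n purchases under a Dirichlet-multinomial. The category shares are from the table above (renormalized in the code); the two concentration values are assumptions I picked to show the contrast, not fitted estimates.

```python
SEGMENT_1_SHARES = [30, 25, 15, 15, 15]  # Art, Linens, Kitchenwares, Lighting, Organization
SEGMENT_2_SHARES = [35, 20, 20, 10, 10]
HIGH_CONCENTRATION = 10.0  # assumed: customers resemble the segment average
LOW_CONCENTRATION = 0.5    # assumed: each customer is dominated by one category


def expected_categories_tried(shares, concentration, n_purchases):
    """Expected number of distinct categories bought in n_purchases purchases."""
    total = sum(shares)
    alphas = [concentration * s / total for s in shares]
    a_sum = sum(alphas)
    expected = 0.0
    for a in alphas:
        # Probability that this category is never chosen in n_purchases draws.
        p_never = 1.0
        for i in range(n_purchases):
            p_never *= (a_sum - a + i) / (a_sum + i)
        expected += 1.0 - p_never
    return expected


for n in range(1, 7):
    seg1 = expected_categories_tried(SEGMENT_1_SHARES, HIGH_CONCENTRATION, n)
    seg2 = expected_categories_tried(SEGMENT_2_SHARES, LOW_CONCENTRATION, n)
    print(n, round(seg1, 2), round(seg2, 2))
```

A high concentration produces explorers whose category counts keep growing with each purchase; a low concentration produces customers who stick to one or two categories.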

I hope this post has given you some insight into understanding your customers’ preferences. Using preferences and segments effectively is critical to designing your marketing materials and reaching your customers. Feel free to try applying these principles on your own, or check out Custora for our ready-built implementation and lifecycle-marketing.

If you are interested in this work, Custora is hiring data scientists, software engineers, client services and business development roles. Reach out to us at

Goodhardt, G. J., Ehrenberg, A. S. C., & Chatfield, C. (1984). The Dirichlet: A Comprehensive Model of Buying Behaviour. Journal of the Royal Statistical Society. Series A (General), 147(5), 621-655.

Jain, D., Bass, F., & Chen, Y.-M. (1990). Estimation of Latent Class Models with Heterogeneous Choice Probabilities. Journal of Marketing Research, 27(1), 94-101.
