About Aaron Goodman

Lead Data Scientist at Custora. Follow him on Twitter.

E-Commerce Customer Acquisition Snapshot


Custora is excited to announce the publication of our first ever E-Commerce Customer Acquisition Snapshot.


The last few years have spawned massive changes in the world of online marketing. With U.S. e-commerce sales now topping $200 billion annually, digital marketers are getting savvier than ever.

The sharpest marketers in the new era of e-commerce will be looking beyond just where customers are coming from. They’ll be looking at the value of new customers acquired across channels, platforms, and geographies. And it turns out that not all customers are created equal.

The study is an effort to reflect on the rapidly changing landscape of customer acquisition and shed additional light on what’s shaping the future of e-commerce growth.



Why average revenue per user is a useless metric

Revenue per user is a metric retailers commonly use when acquiring customers. This number is useless for any business that has repeat customers. Here’s why:

If you are trying to evaluate your customer base and want to figure out the value of a customer, a naive approach is to take the total revenue your business has made and divide it by the number of customers.

Suppose one business sees that the average customer has spent $64. This is not the lifetime value of a customer; it is just the average value observed so far. We know that we can spend at least this much to acquire a new customer, but the figure understates the lifetime value of a customer. This is especially true if the business is growing rapidly and has been acquiring many new customers. In that case, the customer database is full of young customers who are far from realizing their full potential.

A better approach accounts for how long customers have been around: instead of dividing by the total number of customers, divide by the total number of years that customers have been active. This gives a rate, such as an average spend of $53 per customer per year. If we are interested in the two-year value of a customer, it then looks as though we can spend up to $106 per customer.

However, for many businesses (especially young ones) this is actually an overestimate of customer value, since customers are most engaged right after their first purchase and then slow down their purchasing over time or take their business elsewhere. To predict customer lifetime value accurately, you need either to look only at customers who have been active for the full period in question or to use a customer lifetime value model to make the predictions. In our example, the actual two-year value of a customer turned out to be $81, in between the two estimates.
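To make the three calculations concrete, here is a minimal Python sketch. The order log, the observation date, and the function names are all made up for illustration, and "years active" is assumed to mean the time from a customer's first order to the observation date. It is not the model Custora uses, only the arithmetic described above.

```python
from datetime import date

# Hypothetical order log: (customer_id, order_date, amount). All values are made up.
ORDERS = [
    ("a", date(2010, 1, 15), 40.0),
    ("a", date(2010, 6, 1), 30.0),
    ("a", date(2011, 9, 10), 20.0),
    ("b", date(2011, 3, 5), 60.0),
    ("b", date(2011, 4, 20), 25.0),
    ("c", date(2012, 11, 1), 50.0),
    ("d", date(2012, 12, 10), 45.0),
]
AS_OF = date(2013, 4, 1)  # assumed date on which we look at the data


def first_order_dates(orders):
    """Map each customer to the date of their first order."""
    first = {}
    for customer, day, _amount in orders:
        if customer not in first or day < first[customer]:
            first[customer] = day
    return first


def revenue_per_customer(orders):
    """Naive estimate: total revenue divided by the number of customers."""
    total = sum(amount for _c, _d, amount in orders)
    return total / len(first_order_dates(orders))


def revenue_per_customer_year(orders, as_of):
    """Total revenue divided by the total years customers have been active."""
    first = first_order_dates(orders)
    years = sum((as_of - day).days / 365.25 for day in first.values())
    return sum(amount for _c, _d, amount in orders) / years


def cohort_value(orders, as_of, horizon_days=730):
    """Average spend in the first `horizon_days`, using only customers
    who have been observed for at least that long."""
    first = first_order_dates(orders)
    eligible = {c for c, day in first.items() if (as_of - day).days >= horizon_days}
    if not eligible:
        return None
    total = sum(
        amount
        for customer, day, amount in orders
        if customer in eligible and (day - first[customer]).days < horizon_days
    )
    return total / len(eligible)


naive = revenue_per_customer(ORDERS)
annualized_two_year = 2 * revenue_per_customer_year(ORDERS, AS_OF)
observed_two_year = cohort_value(ORDERS, AS_OF)
print(naive, annualized_two_year, observed_two_year)
```

With this toy data the three estimates fall in the same order as in the example: the naive average is lowest, the annualized rate times two is highest, and the two-year value of customers who have actually been around for two years sits in between.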

Using Big Data to Craft the Well-timed Email

Check out our article in Multichannel Merchant on how to craft the perfectly timed email.

We analyzed the sales data for an online clothing retailer: roughly 1.5 million purchases over a seven-year period. From this we built a picture of the temporal patterns of purchases, which can be used in timing your emails:

The average customer is more likely to make purchases after they have settled in at work, answered their emails, and had a productive morning. This level of activity remains constant throughout the workday, before dropping off during dinner, between 6 p.m. and 8 p.m. Activity picks up again and reaches an absolute peak in the hours after dinner and before going to bed. Based on this information, you should send marketing content around mid-morning or sometime in the evening in order to reach the maximum number of likely purchasers.
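If you want to look at the same kind of pattern in your own data, a rough starting point is to count purchases by hour of day. The sketch below uses a handful of made-up timestamps and hypothetical function names, and it assumes the timestamps are already in each customer's local time.

```python
from collections import Counter
from datetime import datetime

# Hypothetical purchase timestamps, assumed to be in the customer's local time.
TIMESTAMPS = [
    "2013-03-04 10:15", "2013-03-04 21:05", "2013-03-05 10:40",
    "2013-03-05 14:20", "2013-03-05 21:30", "2013-03-06 19:10",
    "2013-03-06 21:45", "2013-03-07 10:05", "2013-03-07 14:55",
    "2013-03-07 21:15",
]


def purchases_by_hour(timestamps):
    """Return a list of 24 counts, one per hour of the day."""
    counts = Counter(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour for stamp in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]


def busiest_hours(hourly_counts, k=2):
    """Return the k hours with the most purchases, busiest first."""
    return sorted(range(24), key=lambda hour: hourly_counts[hour], reverse=True)[:k]


hourly = purchases_by_hour(TIMESTAMPS)
print(busiest_hours(hourly))
```

On real data you would want far more purchases than this, and probably a separate count for weekdays and weekends.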

Check back in on our blog tomorrow, to see a deeper dive into this data.

You can check out the full article on Multichannel Merchant. Or you can learn the sales patterns for your business by checking out Custora’s Customer Segmentation platform.

Bayesian Cluster Analysis For Retail Using LDA

Probabilistic models help us predict how people will behave in the future. At Custora, we’re most interested in purchasing. Recently, we’ve posted several entries related to purchasing behaviors. In the past, I have talked about how we model customer purchase behavior; in this post, I am going to talk a little bit about a model used in marketing to predict which products customers will buy.

Here I will discuss how we model which brands and categories customers will buy. These models are helpful in understanding brand dynamics and can be used to market to customers more effectively. In particular, they help us understand which brands customers have an affinity for and which brands group together. Understanding these things is necessary for tailoring the shopping experience and email messaging to existing customers.

The basic Dirichlet model was developed by Goodhardt, Ehrenberg, and Chatfield to explain how customers choose between different brands. They wanted a way to explain differences in repeat buying behavior, penetration, and purchase frequency across brands.

Model Intuition

The model they developed says that each individual’s purchases follow a multinomial distribution, and that the weights of these multinomial distributions are Dirichlet distributed across the population. To start, you can think of each person as having a weighted die that he throws before making each purchase. Let’s look at toothpaste as an example. I usually buy Colgate, but I sometimes go with other brands if they are on sale or if the store is out of stock. So my die would be weighted something like this:

Toothpaste        Percentage
Colgate           70%
Crest             5%
Aquafresh         9%
Arm & Hammer      3%
Tom’s of Maine    1%
Other             12%

I don’t actually roll a die before purchasing my toothpaste, and there are legitimate reasons for buying the brands I buy, but it’s convenient to model my behavior as a random process. My sister, however, really likes to buy organic, and will go to great lengths to get her favorite brand.

Toothpaste        Percentage
Colgate           4%
Crest             5%
Aquafresh         4%
Arm & Hammer      3%
Tom’s of Maine    80%
Other             4%

So, we’ve made it clear that everyone has his own brand preferences, represented by his own weighted die. Now imagine every consumer in a population tosses his or her individual die into a bucket. We can take the average of all the dice in that bucket and find the population average. For toothpaste in the United States, that looks something like this:

Toothpaste        Percentage
Colgate           32%
Crest             27%
Aquafresh         10%
Arm & Hammer      8%
Tom’s of Maine    1%
Other             12%
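The bucket-of-dice picture translates fairly directly into a simulation. The sketch below draws each customer's die from a Dirichlet distribution centered on the population table and then rolls that die for each purchase. The concentration value and the function names are my own assumptions for illustration, and the population shares are normalized because the listed figures do not add up to exactly 100%.

```python
import random

BRANDS = ["Colgate", "Crest", "Aquafresh", "Arm & Hammer", "Tom's of Maine", "Other"]
# Population shares from the table above, normalized inside draw_die().
POPULATION_SHARES = [32, 27, 10, 8, 1, 12]
CONCENTRATION = 5.0  # assumed value: larger means customers are more alike


def draw_die(rng, shares, concentration):
    """Draw one customer's brand weights from a Dirichlet distribution
    whose mean is proportional to `shares` (built from gamma draws)."""
    total = sum(shares)
    gammas = [rng.gammavariate(concentration * s / total, 1.0) for s in shares]
    norm = sum(gammas)
    return [g / norm for g in gammas]


def simulate_purchases(rng, die, n_purchases):
    """Roll one customer's die once per purchase (multinomial choices)."""
    return rng.choices(BRANDS, weights=die, k=n_purchases)


rng = random.Random(0)
dice = [draw_die(rng, POPULATION_SHARES, CONCENTRATION) for _ in range(2000)]

# Averaging all the dice in the bucket should land near the population shares.
average_die = [sum(d[i] for d in dice) / len(dice) for i in range(len(BRANDS))]
basket = simulate_purchases(rng, dice[0], 10)
print([round(w, 2) for w in average_die])
print(basket)
```

Individual dice can look very different from the average, just as my die and my sister's do, while the bucket as a whole still averages out to the population shares.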

Segmented Markets

It turns out that this model works very well for “unsegmented markets,” meaning that the proportion of purchases devoted to a given brand is independent of the purchases of the other brands. Our toothpaste example is “unsegmented” if people have preferences for brands that don’t depend on their other preferences. However, let’s consider a segmented market, say automobile purchases. Suppose nationwide the data breaks down something like this:

Brand       Market Share
GM          19%
Ford        16%
Toyota      13%
Chrysler    11%
Honda       10%
Other       31%

However, this breakdown may not quite tell the whole story. In a market like automobiles, there are preferences that transcend individual brands. For instance, some consumers only buy cars that are made in the US. The standard, unsegmented Dirichlet model does not do a good job of capturing these customers’ preferences. In this case, the market is better described as two segments: people who buy domestic cars and people who buy foreign cars.

Brand       Segment 1    Segment 2
GM          39%          6%
Ford        31%          6%
Toyota      3%           20%
Chrysler    3%           16%
Honda       4%           14%
Other       15%          38%

Jain et al. introduced a clever model to handle segmented markets. Rather than just having one bucket of dice for the population, they use two or more, depending on the number of segments. Before each customer takes a die from a bucket, he first rolls a different weighted die to determine which bucket he should pull from. From there, he pulls a die from that bucket and makes purchases according to his die, as in the basic Dirichlet model.

With our automobile example, we would have two buckets representing different distributions: one for the domestic car customers and one for the foreign car customers.
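Here is a rough sketch of that two-bucket process in Python. The brand shares come from the two-segment table above (normalized, since the columns are only approximate). The 40/60 split between buckets and the concentration value are assumptions I picked for illustration, and the function names are hypothetical.

```python
import random

BRANDS = ["GM", "Ford", "Toyota", "Chrysler", "Honda", "Other"]
# Segment columns from the table above; normalized inside draw_die().
SEGMENT_SHARES = {
    "domestic": [39, 31, 3, 3, 4, 15],
    "foreign": [6, 6, 20, 16, 14, 38],
}
SEGMENT_WEIGHTS = {"domestic": 0.4, "foreign": 0.6}  # assumed bucket weights
CONCENTRATION = 10.0  # assumed: how alike customers within a bucket are


def draw_die(rng, shares, concentration):
    """Draw one customer's brand weights from a Dirichlet distribution."""
    total = sum(shares)
    gammas = [rng.gammavariate(concentration * s / total, 1.0) for s in shares]
    norm = sum(gammas)
    return [g / norm for g in gammas]


def simulate_customer(rng, n_purchases):
    """Pick a bucket, pull a die from it, then roll that die for each purchase."""
    segments = list(SEGMENT_WEIGHTS)
    weights = [SEGMENT_WEIGHTS[s] for s in segments]
    segment = rng.choices(segments, weights=weights)[0]
    die = draw_die(rng, SEGMENT_SHARES[segment], CONCENTRATION)
    return segment, rng.choices(BRANDS, weights=die, k=n_purchases)


rng = random.Random(1)
customers = [simulate_customer(rng, 3) for _ in range(5000)]
print(customers[:3])
```

In real data we never observe which bucket a customer drew from; we only see the purchases, which is why segment membership has to be inferred.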

This type of analysis gives retailers valuable insights into their customers’ preferences. For example, retailers might be able to find there are certain common attributes of some products that make them appealing to certain customers, and they could redesign their website accordingly. Or they could use the clusters to customize the message they send to each customer.

In order to create an effective customized message to send to each customer, a retailer needs to be able to figure out which segment the customer belongs to. It is important to acknowledge that there is a degree of uncertainty here. Many of us have made the mistake of ordering socks from Amazon only to find our recommendations list flooded with more socks. It is important for retailers to avoid this pitfall and appreciate the amount of uncertainty that goes into a customer’s brand preferences. Just because a customer once bought a Honda does not automatically make him a member of the foreign car segment. There is some possibility that he was a domestic car customer who bought a Honda once because of a dealership special that was running.

This is where the probability model comes in. We look at the items that a customer has bought, and then we can come up with a probability that he belongs to each of the segments. A customer who has bought one Honda might have a 70% chance of belonging to the foreign car segment and a 30% chance of belonging to the domestic segment. But if he goes on to buy three more Hondas, he would have a 10% chance of being a domestic customer and a 90% chance of being a foreign car customer.
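The updating step is an application of Bayes' rule. The sketch below is a simplified version: it treats every customer in a segment as having exactly the segment's brand shares (taken from the two-segment table and normalized) and assumes a 40/60 prior between the segments. Both simplifications are mine, so the probabilities it prints will not match the illustrative 70% and 90% figures, and a model that allows for differences between customers within a segment would move more cautiously.

```python
BRANDS = ["GM", "Ford", "Toyota", "Chrysler", "Honda", "Other"]
# Segment columns from the two-segment table, normalized below.
SEGMENT_SHARES = {
    "domestic": [39, 31, 3, 3, 4, 15],
    "foreign": [6, 6, 20, 16, 14, 38],
}
PRIOR = {"domestic": 0.4, "foreign": 0.6}  # assumed prior segment sizes

# Probability of buying each brand, given the segment.
BRAND_PROBS = {
    segment: {brand: share / sum(shares) for brand, share in zip(BRANDS, shares)}
    for segment, shares in SEGMENT_SHARES.items()
}


def segment_posterior(purchases, prior=PRIOR, brand_probs=BRAND_PROBS):
    """Return P(segment | purchases) using Bayes' rule, treating
    purchases as independent draws given the segment."""
    unnormalized = {}
    for segment, prior_prob in prior.items():
        likelihood = prior_prob
        for brand in purchases:
            likelihood *= brand_probs[segment][brand]
        unnormalized[segment] = likelihood
    total = sum(unnormalized.values())
    return {segment: value / total for segment, value in unnormalized.items()}


one_honda = segment_posterior(["Honda"])
four_hondas = segment_posterior(["Honda"] * 4)
print(one_honda)
print(four_hondas)
```

The point to notice is the direction of travel: one Honda shifts the odds toward the foreign segment without settling the question, and each additional Honda shifts them further.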

It’s also important to realize that there is heterogeneity within segments. People within the same segment may have a preference for one brand or another. For example, a customer who has bought three Hondas, one Toyota, and one Ford might have underlying propensities that look like this:

Brand       Propensity
GM          2%
Ford        8%
Toyota      15%
Chrysler    5%
Honda       60%
Other       15%

Heterogeneity Within Segments

Even though we can say that this customer probably belongs to the Foreign segment, we can see that he exhibits a strong preference within that segment for Hondas. We can also see multiple segments in terms of brand choice. One retailer that we looked at sold products in a number of categories, and it had two segments that looked roughly like this:

Category        Segment 1    Segment 2
Art             30%          35%
Linens          25%          20%
Kitchenwares    15%          20%
Lighting        15%          10%
Organization    15%          10%

We found that the clusters themselves were very similar; the difference was in how customers in each cluster explored the categories. Customers in Segment 1 tended to explore much more than customers in Segment 2. That is, a customer in Segment 1 was likely to purchase from many categories, whereas a customer in Segment 2 was likely to keep purchasing from the same categories.

Number of Purchases            1    2      3      4      5      6
Categories Tried, Segment 1    1    1.6    2.1    2.4    2.8    3
Categories Tried, Segment 2    1    1.1    1.3    1.4    1.4    1.5

So even though the segments have the same overall proportions, Segment 1 is made up of similar customers purchasing things in roughly even proportions, and Segment 2 is made up of dissimilar customers who have mostly bought one thing.
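One way to see how two segments can share the same overall proportions and still behave so differently is to vary how tightly the individual dice cluster around the segment average. In the sketch below, both groups use the Segment 1 category shares as their average, and only the Dirichlet concentration differs. The concentration values (50 and 0.5) and the function names are assumptions chosen for illustration, not fitted to the retailer's data.

```python
import random

# Segment 1 category shares from the table above, used as the mean for both groups.
CATEGORY_SHARES = [30, 25, 15, 15, 15]


def draw_die(rng, shares, concentration):
    """Draw one customer's category weights from a Dirichlet distribution."""
    total = sum(shares)
    gammas = [rng.gammavariate(concentration * s / total, 1.0) for s in shares]
    norm = sum(gammas)
    return [g / norm for g in gammas]


def mean_categories_tried(rng, concentration, n_purchases, n_customers=2000):
    """Average number of distinct categories bought after n_purchases."""
    categories = list(range(len(CATEGORY_SHARES)))
    total = 0
    for _ in range(n_customers):
        die = draw_die(rng, CATEGORY_SHARES, concentration)
        bought = rng.choices(categories, weights=die, k=n_purchases)
        total += len(set(bought))
    return total / n_customers


rng = random.Random(2)
# High concentration: customers resemble the average, so they explore.
explorers = [mean_categories_tried(rng, 50.0, n) for n in range(1, 7)]
# Low concentration: each customer is dominated by one or two categories.
loyalists = [mean_categories_tried(rng, 0.5, n) for n in range(1, 7)]
print([round(x, 1) for x in explorers])
print([round(x, 1) for x in loyalists])
```

The high-concentration group keeps trying new categories as purchases accumulate, while the low-concentration group mostly stays put, which is the same qualitative pattern as in the table of categories tried.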

I hope this post has given you some insight into understanding your customers’ preferences. Using preferences and segments effectively is critical to designing your marketing materials and reaching your customers. Feel free to try applying these principles on your own, or check out Custora for our ready-built implementation and lifecycle-marketing.

If you are interested in this work, Custora is hiring for data science, software engineering, client services, and business development roles. Reach out to us at careers@custora.com.

Goodhardt, G. J., Ehrenberg, A. S. C., & Chatfield, C. (1984). The Dirichlet: A Comprehensive Model of Buying Behaviour. Journal of the Royal Statistical Society. Series A (General), 147(5), 621-655. http://www.jstor.org/stable/2981696

Jain, D., Bass, F., & Chen, Y.-M. (1990). Estimation of Latent Class Models with Heterogeneous Choice Probabilities. Journal of Marketing Research, 27(1), 94-101.