Mobile e-commerce has exploded over the past four years, and the growth isn’t stopping. Smartphone use is rising, fast. Tablet use is rising faster. Apple devices are everywhere. Android and Amazon are making moves.
Here at Custora we wanted to explore the quantitative side of these trends; you can read about the findings and download the full Custora E-Commerce Pulse Mobile Report here. For more data-oriented readers, we also wanted to share the methodology behind the report, along with a spreadsheet of some of the raw data (available for download below).
The analysis combined data from our customers with the obtain-scrub-explore approach common in data science. Many of our hunches were confirmed along the way, but we also found several surprises, such as the sharp increase in multi-device use.
This report is a product of the Custora E-commerce Pulse, which publishes monthly updates on the state of e-commerce based on data from over 70 million anonymized shoppers and $10B in e-commerce revenue from over 100 online retailers.
Obtaining the Data: Combining Transaction, Customer, and Web Data
The data for the mobile report came from Custora customers, who provide us with several data sources:
- Transaction data – including anonymized customer data and detailed transaction data
- Web analytics data – including mobile device data, conversion data, and marketing channel data
Some of the web analytics data was augmented with retailers’ data pulled from third-party web analytics providers. The data was then combined and aggregated at the customer, device, and order levels.
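To make the combine-and-aggregate step concrete, here is a minimal sketch in plain Python. It assumes a hypothetical schema in which each order record carries an anonymized customer ID and each web analytics session record names the order it produced and the device used; Custora’s actual schema and tooling are not described in the report. The join attaches a device to each order, and the roll-up produces device-level revenue and order counts plus a count of customers who ordered from more than one device.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record shapes; the report does not publish Custora's schema.
@dataclass
class Order:
    order_id: str
    customer_id: str  # anonymized customer identifier
    revenue: float


@dataclass
class Session:
    order_id: str  # order produced by this web analytics session
    device: str    # e.g. "iPhone", "Desktop"


def combine(orders, sessions):
    """Inner-join orders to sessions on order_id; unmatched orders are dropped."""
    device_by_order = {s.order_id: s.device for s in sessions}
    return [
        (o.customer_id, device_by_order[o.order_id], o.order_id, o.revenue)
        for o in orders
        if o.order_id in device_by_order
    ]


def aggregate(rows):
    """Each combined row is one order; roll rows up to device and customer level."""
    revenue_by_device = defaultdict(float)
    orders_by_device = defaultdict(int)
    devices_by_customer = defaultdict(set)
    for customer_id, device, _order_id, revenue in rows:
        revenue_by_device[device] += revenue
        orders_by_device[device] += 1
        devices_by_customer[customer_id].add(device)
    return {
        "revenue_by_device": dict(revenue_by_device),
        "orders_by_device": dict(orders_by_device),
        # Customers who placed orders from more than one device.
        "multi_device_customers": sum(
            1 for devices in devices_by_customer.values() if len(devices) > 1
        ),
    }


orders = [
    Order("o1", "c1", 50.0),
    Order("o2", "c1", 30.0),
    Order("o3", "c2", 20.0),
    Order("o4", "c3", 10.0),  # no matching session, so the join drops it
]
sessions = [
    Session("o1", "iPhone"),
    Session("o2", "Desktop"),
    Session("o3", "iPhone"),
]
summary = aggregate(combine(orders, sessions))
```

An inner join is a conservative choice here: orders that cannot be matched to a device are excluded from device-level figures rather than assigned a guessed device.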
Scrubbing the Data
The next step was scrubbing the data. Several retailers consistently reported values that made no sense: hundreds of millions of visits a month, monthly revenue of negative $1 billion, and other wacky numbers. In those cases, we excluded the retailer entirely from any analysis that used the affected metric. Other errors were more limited, such as individual rows with unusual values (for example, “time on site” figures of many hours). Since these were sporadic, we kept the vast majority of that retailer’s data and removed only the troublesome rows. The cleaning step also involved some light tidying of the categorical variables. For example, we combined the various iDevices into a single iOS category and consolidated dozens of “medium” labels into the eight marketing channels we present in the report.
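The two kinds of cleanup described above (dropping a retailer for a metric when its values are consistently implausible, versus dropping only sporadic bad rows) can be sketched as follows, together with the category consolidation. Everything specific here is a hypothetical assumption rather than Custora’s actual rules: the plausibility bounds, the "more than half of rows" cutoff for "consistently", and the partial medium-to-channel mapping (the report’s eight channels are not listed in this post, so unmapped labels fall into a placeholder "Other").

```python
from collections import defaultdict

# Hypothetical plausibility bounds per metric (low, high), inclusive.
BOUNDS = {
    "monthly_revenue": (0.0, float("inf")),  # negative revenue is impossible
    "time_on_site_min": (0.0, 180.0),        # sessions of many hours are suspect
}
# Hypothetical cutoff: a retailer is "consistently" bad on a metric when
# more than half of its rows for that metric fall outside the bounds.
RETAILER_BAD_SHARE = 0.5


def is_plausible(metric, value):
    low, high = BOUNDS[metric]
    return low <= value <= high


def scrub(rows, metric):
    """Drop retailers that are consistently implausible on `metric`;
    for all other retailers, drop only the individual implausible rows."""
    by_retailer = defaultdict(list)
    for row in rows:
        by_retailer[row["retailer"]].append(row)
    kept = []
    for retailer_rows in by_retailer.values():
        bad = [r for r in retailer_rows if not is_plausible(metric, r[metric])]
        if len(bad) / len(retailer_rows) > RETAILER_BAD_SHARE:
            continue  # exclude this retailer from analyses using this metric
        kept.extend(r for r in retailer_rows if is_plausible(metric, r[metric]))
    return kept


# Collapse individual Apple devices into a single platform label.
IOS_DEVICES = {"iPhone", "iPad", "iPod"}


def platform(device):
    return "iOS" if device in IOS_DEVICES else device


# Hypothetical fragment of a medium-to-channel mapping.
MEDIUM_TO_CHANNEL = {
    "cpc": "Paid Search",
    "ppc": "Paid Search",
    "organic": "Organic Search",
    "email": "Email",
    "newsletter": "Email",
}


def channel(medium):
    """Normalize a raw medium label and map it to a channel (placeholder "Other")."""
    return MEDIUM_TO_CHANNEL.get(medium.strip().lower(), "Other")


rows = [
    {"retailer": "A", "time_on_site_min": 5.0},
    {"retailer": "A", "time_on_site_min": 600.0},  # sporadic outlier: row dropped
    {"retailer": "A", "time_on_site_min": 7.0},
    {"retailer": "B", "time_on_site_min": -3.0},
    {"retailer": "B", "time_on_site_min": 900.0},  # consistently bad: retailer dropped
]
clean = scrub(rows, "time_on_site_min")
```

Because the exclusion is decided per metric, a retailer dropped from time-on-site analyses could still contribute to, say, revenue analyses if its revenue figures pass the bounds.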
Exploring the Data
We explored the data using a standard pipeline for each of the report’s statistics. Using the dplyr package, we first restricted each analysis to its date range, since some metric and dimension combinations were not tracked two or three years ago. We then converted the data from long format to wide format, so that each date had one column per device or platform holding the value of the metric of interest. Finally, we summarized the metric by month or by quarter. Figures were generated with ggplot2.
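The report’s pipeline was written in R with dplyr and ggplot2. As a language-neutral illustration only, the same three steps (filter to a date range, pivot long to wide, summarize by month or quarter) can be sketched in plain Python. The row layout and function names are hypothetical, and summing within a period assumes an additive metric such as revenue or orders; a rate such as conversion would need to be averaged or recomputed instead.

```python
from collections import defaultdict
from datetime import date


def filter_dates(rows, start, end):
    """Keep long-format rows whose date falls within [start, end]."""
    return [r for r in rows if start <= r["date"] <= end]


def long_to_wide(rows, key="platform", value="value"):
    """One entry per date, mapping each platform to its metric value."""
    wide = defaultdict(dict)
    for r in rows:
        wide[r["date"]][r[key]] = r[value]
    return dict(sorted(wide.items()))  # chronological order


def period(d, by):
    if by == "month":
        return f"{d.year}-{d.month:02d}"
    if by == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    raise ValueError(f"unsupported period: {by}")


def summarize(wide, by="quarter"):
    """Sum each platform's (additive) metric within each month or quarter."""
    out = defaultdict(lambda: defaultdict(float))
    for d, values in wide.items():
        for plat, v in values.items():
            out[period(d, by)][plat] += v
    return {p: dict(v) for p, v in out.items()}


rows = [
    {"date": date(2012, 12, 31), "platform": "iOS", "value": 9.0},  # outside range
    {"date": date(2013, 1, 15), "platform": "iOS", "value": 10.0},
    {"date": date(2013, 1, 15), "platform": "Android", "value": 4.0},
    {"date": date(2013, 2, 15), "platform": "iOS", "value": 12.0},
    {"date": date(2013, 4, 1), "platform": "Android", "value": 6.0},
]
recent = filter_dates(rows, date(2013, 1, 1), date(2013, 12, 31))
wide = long_to_wide(recent)
quarterly = summarize(wide, by="quarter")
monthly = summarize(wide, by="month")
```

Filtering before pivoting keeps periods in which a metric was not yet tracked from showing up as misleading gaps or zeros in the wide table.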
Using an external data source required several iterations of scrubbing and validation. Once that was complete, however, we were able to test common assumptions and intuitions about mobile e-commerce through a quantitative lens and to find new avenues for exploration suggested by the data itself. The final and most interesting step was deciding which of the many tidbits we found were most relevant, and how to present them to our audience. This is where the analysis moved beyond pure data science: Custora’s marketing and design teams joined us to shape the report together.
Download the companion spreadsheet (XLS format) for the Custora Pulse E-commerce Mobile Report.