Advanced Customer LTV prediction in DataVault

Overview

Why we build prediction use cases

Prediction use cases help Growth FullStack customers get more out of their ETL pipelines and BI stacks by adding predictive analytics that answer questions such as “What’s next?”, “How soon?”, and “How profitable?”

[Figure: Why predictive analytics]

Our Boosted Tree Regression models are part of these prediction use cases. They predict Day 7 to Day 90 lifetime value (LTV) with high accuracy and low execution time, so the predictions can be used to make informed business decisions.

Model 2.1: Boosted Tree Regression

The Boosted Tree Regression model is an advanced version of our basic LTV prediction model (Model 1.1). While Model 1.1 uses a simple framework based on Day 0 to Day 3 LTV to predict Day 7 to Day 90 LTV, Model 2.1 makes the same predictions with a more advanced decision-tree framework.

How does the model work?

The Boosted Tree Regression model builds many small decision trees, each using a subset of the features listed below. The feature combinations can look like the following:

Combination 1: spend, site_id, tracked_installs, country
Combination 2: country, days_since_install, d0_ltv, d1_ltv
…and so on

The trees are built on different subsets of the data and features and are then combined through boosting: they are trained one after another, each new tree corrects the errors left by the trees before it, and their combined output gives the final predicted value. The model can predict several LTV values at once or a single LTV value, as needed. It can also use other features, such as country, platform, and site_id, to improve prediction accuracy.
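To make the boosting step concrete, here is a minimal Python sketch of gradient-boosted regression trees for squared error, where each tree sees only a random subset of features, much like the feature combinations listed above. It is an illustration under stated assumptions, not DataVault's implementation: the function names, the one-split trees (stumps), the learning rate, and the toy data are all hypothetical, and categorical features such as country are assumed to be numerically encoded beforehand.

```python
import random
from statistics import mean


def fit_stump(rows, residuals, features):
    """Fit a one-split regression tree (stump) to the residuals.

    Tries every threshold on every candidate feature and keeps the split
    with the lowest squared error. Returns a function row -> prediction.
    """
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for threshold in values[:-1]:  # the largest value would leave the right side empty
            left = [res for r, res in zip(rows, residuals) if r[f] <= threshold]
            right = [res for r, res in zip(rows, residuals) if r[f] > threshold]
            left_mean, right_mean = mean(left), mean(right)
            sse = sum((x - left_mean) ** 2 for x in left) + sum(
                (x - right_mean) ** 2 for x in right
            )
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:  # no candidate feature varies: fall back to a constant
        constant = mean(residuals)
        return lambda r: constant
    _, f, threshold, left_mean, right_mean = best
    return lambda r: left_mean if r[f] <= threshold else right_mean


def fit_boosted_trees(rows, targets, n_trees=50, learning_rate=0.1,
                      feature_fraction=0.5, seed=0):
    """Gradient boosting: each new stump is fitted to the residuals of the
    current ensemble, using a random subset of the features."""
    rng = random.Random(seed)
    features = list(rows[0])
    n_sampled = max(1, int(len(features) * feature_fraction))
    base = mean(targets)  # start from the average target value
    preds = [base] * len(rows)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(targets, preds)]
        subset = rng.sample(features, n_sampled)  # one "combination" of features
        tree = fit_stump(rows, residuals, subset)
        trees.append(tree)
        # Add a shrunken correction to the running prediction (a sum, not an average).
        preds = [p + learning_rate * tree(r) for p, r in zip(preds, rows)]

    def predict(row):
        return base + learning_rate * sum(tree(row) for tree in trees)

    return predict


# Toy data (hypothetical): D7 LTV grows with early LTV; spend is mostly noise here.
rows = [{"d0_ltv": 0.1 * x, "d1_ltv": 0.2 * x, "spend": float(x % 3)} for x in range(20)]
d7_ltv = [0.6 * x for x in range(20)]
model = fit_boosted_trees(rows, d7_ltv)
print(round(model({"d0_ltv": 1.0, "d1_ltv": 2.0, "spend": 1.0}), 2))
```

Production libraries grow deeper trees and add regularization, but the core loop is the same: fit to the current errors, add a small correction, repeat.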

The graphic below explains the prediction process.

[Figure: Diagram explaining the prediction process]

Which features does the model use for the prediction?

app_id
platform
ad_network_id
campaign_id
country
site_id
spend
d0_ltv
d1_ltv
d2_ltv
d3_ltv
tracked_installs
days_since_install
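The feature list can be pictured as one input row per cohort. The Python sketch below is a hypothetical schema, not DataVault's actual table: the class name and the field types are assumptions, and every field defaults to None because no feature is mandatory.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class CampaignFeatures:
    """One hypothetical model-input row; every field is optional."""
    app_id: Optional[str] = None
    platform: Optional[str] = None
    ad_network_id: Optional[str] = None
    campaign_id: Optional[str] = None
    country: Optional[str] = None
    site_id: Optional[str] = None
    spend: Optional[float] = None
    d0_ltv: Optional[float] = None
    d1_ltv: Optional[float] = None
    d2_ltv: Optional[float] = None
    d3_ltv: Optional[float] = None
    tracked_installs: Optional[int] = None
    days_since_install: Optional[int] = None

    def available(self) -> Dict[str, object]:
        """Return only the features that are populated."""
        return {k: v for k, v in asdict(self).items() if v is not None}

    def missing(self) -> List[str]:
        """Return the names of features that are not yet known."""
        return [k for k, v in asdict(self).items() if v is None]


# A cohort one day after install has no d2_ltv or d3_ltv yet.
day_one = CampaignFeatures(country="US", spend=100.0, d0_ltv=0.5, d1_ltv=0.8,
                           days_since_install=1)
print(day_one.missing())
```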

What can I predict with this model?

D7_LTV
D14_LTV
D30_LTV
D90_LTV
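When several horizons are predicted, a D{h} LTV value can only serve as a training label once a cohort is old enough for that day to have been observed. The Python sketch below shows that bookkeeping; the cohort dictionaries, the `split_by_horizon` helper, and the rule that day h counts as observed when `days_since_install >= h` are assumptions for illustration, not documented DataVault behavior.

```python
from typing import Dict, List, Tuple

# The prediction targets listed above, expressed as days after install.
HORIZONS = (7, 14, 30, 90)


def split_by_horizon(cohorts: List[dict]) -> Dict[int, Tuple[List[dict], List[dict]]]:
    """For each horizon h, split cohorts into (observed, to_predict).

    Assumption: D{h} LTV counts as observed once days_since_install >= h,
    so those cohorts can supply training labels for the D{h} model.
    """
    splits = {}
    for h in HORIZONS:
        observed = [c for c in cohorts if c["days_since_install"] >= h]
        to_predict = [c for c in cohorts if c["days_since_install"] < h]
        splits[h] = (observed, to_predict)
    return splits


cohorts = [
    {"campaign_id": "a", "days_since_install": 1},
    {"campaign_id": "b", "days_since_install": 10},
    {"campaign_id": "c", "days_since_install": 45},
]
for h, (observed, to_predict) in split_by_horizon(cohorts).items():
    print(f"D{h}_LTV: {len(observed)} labelled, {len(to_predict)} to predict")
```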

What are the benefits of this model compared with Model 1.1?

Model 2.1 is statistically more advanced and more accurate, because it combines many small trees, each correcting the errors of the ones before it, instead of relying on a single model that uses all features at once.
With this model, we don’t need to wait for a campaign to accumulate 3 days of data (as Model 1.1 requires); predictions are available after the first day.
This model requires less computing power and has a shorter execution time.

[Figure: Execution time of Model 1.1 vs Model 2.1]

This model uses all available features and can also make predictions at the site_id level.
No feature is mandatory; the model adapts to missing features.
This model can help you understand whether a campaign succeeded in one country but not in another. Because it uses multiple features and signals, its predictions are more accurate and remain useful over a longer time frame.
We can also explore options for incrementally improving this model’s accuracy.
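To compare campaign performance across countries or site_id values, per-install LTV predictions can be rolled up into a predicted return on ad spend (ROAS) per group. This Python sketch assumes a hypothetical `predicted_d30_ltv` field holding predicted revenue per tracked install; DataVault's actual output columns may differ, and the rows are toy data.

```python
from collections import defaultdict
from typing import Dict, List


def predicted_roas_by(rows: List[dict], key: str = "country") -> Dict[str, float]:
    """Predicted ROAS per group: sum(predicted LTV x installs) / sum(spend).

    Assumes 'predicted_d30_ltv' is predicted revenue per tracked install
    (a hypothetical column name). Groups with zero spend are skipped.
    """
    revenue: Dict[str, float] = defaultdict(float)
    spend: Dict[str, float] = defaultdict(float)
    for r in rows:
        revenue[r[key]] += r["predicted_d30_ltv"] * r["tracked_installs"]
        spend[r[key]] += r["spend"]
    return {g: revenue[g] / spend[g] for g in spend if spend[g] > 0}


rows = [
    {"country": "US", "site_id": "s1", "spend": 100.0, "tracked_installs": 50, "predicted_d30_ltv": 3.0},
    {"country": "US", "site_id": "s2", "spend": 100.0, "tracked_installs": 20, "predicted_d30_ltv": 2.5},
    {"country": "DE", "site_id": "s3", "spend": 80.0, "tracked_installs": 40, "predicted_d30_ltv": 1.0},
]
print(predicted_roas_by(rows))             # {'US': 1.0, 'DE': 0.5}
print(predicted_roas_by(rows, "site_id"))  # {'s1': 1.5, 's2': 0.5, 's3': 0.5}
```

Grouping by site_id instead of country uses the same function with a different key.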

How can you get started with this model?

If you would like to get started with this model, reach out to us and we will get back to you within 12 hours.

Tabarak
