
Business Data Analytics

Lecture 4

MTAT.03.319

The slides are available under creative common license.

The original owner of these slides is the University of Tartu.

Marketing and Sales

Customer Lifecycle

Management:

Regression Problems

Customer lifecycle


Moving companies grow not because they force people to move more often, but by attracting new customers.


Relationships based on

commitment

• Packers and Movers

• Wedding Planners

Event-based Subscription-based

• Telco

• Banks

• Retail (Walmart, Konsume, etc.)

• Hairdressers

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Third Edition

Customer lifecycle

http://proquestcombo.safaribooksonline.com.ezproxy.utlib.ut.ee/book/databases/business-intelligence/9780470650936


Customer lifecycle

(Returns)

https://www.sas.com/content/dam/SAS/en_us/doc/analystreport/forrester-analytics-drives-customer-life-cycle-management-108033.pdf

Customer Lifecycle (Techniques/Approaches)

Problems

• Start with understanding of your existing customers (segment the customers)
• Acquire the more profitable customers
• Understand future behavior with a propensity model
• Convince/influence your customers to spend more

Solutions

• Clustering (Lecture 3)
• Regression techniques (Present, Lecture 4)
• Classification (Lecture 5)
• Cross-selling/Up-selling (Lecture ?)

Customer lifetime value (CLV)

Also called CLTV, lifetime customer value (LCV), or lifetime value (LTV).

Describes the amount of revenue or profit a customer generates over his or her entire lifetime (10-20 years).

We attempt to minimize the “cost per acquisition” (CAC) and the cost of keeping any given customer.

CLV often refers to two forms of lifetime value analysis:

• Historical lifetime value: simply sums up revenue or profit per customer.
• Predictive lifetime value: projects what new customers will spend over their entire lifetime.
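As a minimal sketch of the historical form, the lines below sum revenue per customer in R. The `transactions` table, its column names, and all numbers are made up for illustration; they are not course data.

```r
# Hypothetical transaction log: one row per purchase (made-up numbers).
transactions <- data.frame(
  customer_id = c("A", "A", "B", "C", "C", "C"),
  revenue     = c(20, 35, 50, 10, 15, 5),
  stringsAsFactors = FALSE
)

# Historical lifetime value: total revenue per customer to date.
historical_clv <- aggregate(revenue ~ customer_id, data = transactions, FUN = sum)
print(historical_clv)
```

Predictive lifetime value would replace the plain sum with a model of future spending, which is where the regression techniques in this lecture come in.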

Predicting purchase journeys

“Algorithms predict purchase frequency, average order value, and propensity to churn to create an estimate of the value of the customer to the business. Predictive CLV is extremely useful for evaluating acquisition channel performance, using modeling to target high value customers, and identifying and cultivating VIP customers early in their brand journey.”

custora.com

Supervised vs. Unsupervised Learning

The goal of the supervised approach is to learn a function that maps input x to output y, given a labeled set of input-output pairs.

The goal of the unsupervised approach is to learn “interesting patterns” given only the input.

• Unsupervised learning: Categorize different types of customers?
• Supervised learning: How many will leave? (regression)
• Supervised learning: Will a particular customer leave or not? (classification)

Regression vs. Classification

• Regression: How many will leave?
• Classification: Will a particular customer leave or not?

Example: sleeping habits (4 hours of sleep vs. 8 hours of sleep) and exam performance

Linear regression


Simple linear regression

Task: given a list of observations (x, y), find a line that approximates the correspondence in the data:

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

• $y$: output (dependent variable, response)
• $x$: input (independent variable, feature, explanatory variable, etc.)
• $\beta_0$: intercept (bias); the mean of y when x = 0
• $\beta_1$: coefficient (slope, or weight w); shows how much the output increases if the input increases by one unit
• $\varepsilon$: noise (error term, residual); shows what we are not able to predict with x
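To make the intercept and slope concrete, here is a minimal sketch that computes the usual least-squares estimates by hand and compares them with `lm()`. It uses R's built-in `faithful` dataset (Old Faithful eruption durations and waiting times); the variable names `x`, `y`, `slope`, `intercept` and `fit` are chosen here for illustration.

```r
data(faithful)
x <- faithful$waiting    # input
y <- faithful$eruptions  # output

# Least-squares estimates for a single feature:
# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() fits the same line, so the two results should agree.
fit <- lm(eruptions ~ waiting, data = faithful)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

The residuals of `fit` play the role of the noise term: the part of y that the line does not explain.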


Simple linear regression: example

Built-in R dataset: a collection of observations of the Old Faithful geyser in Yellowstone National Park, USA.

> data(faithful)
> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
> dim(faithful)
[1] 272   2

• eruptions: the duration of the geyser eruption (in minutes)
• waiting: the length of the waiting period until the next eruption (in minutes)

What do we want to model here, i.e. what is the input and what is the output?

> model <- lm(data=faithful, eruptions ~ waiting)

Simple linear regression: example in R

The fitted model is: eruptions = -1.87 + 0.08 × waiting

What is the eruption time if waiting was 70?

> -1.874016 + 70*0.075628
[1] 3.419944
> coef(model)[[1]] + coef(model)[[2]]*70
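The same calculation can be left to `predict()`, which avoids copying coefficients by hand. This is a small sketch on the built-in `faithful` data; the data frame `new_obs` and its column `eruptions_hat` are names introduced here for illustration.

```r
data(faithful)
model <- lm(eruptions ~ waiting, data = faithful)

# New waiting times to predict for; the column name must match the feature name.
new_obs <- data.frame(waiting = c(60, 70, 80))
new_obs$eruptions_hat <- predict(model, newdata = new_obs)
print(new_obs)
```

The row for waiting = 70 should match the manual calculation, up to rounding of the coefficients.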

Machine learning “secret sauce”: split the Data into Train and Test

Prediction Problem

Training Data (with labeled information)

X1 -> Y1
X2 -> Y2
:
X100 -> Y100

Test Data (no labeled information)

X101 -> ?
X102 -> ?
:
X110 -> ?




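# randomly pick 172 of the 272 rows for training; the remaining 100 are held out
train_idx <- sample(nrow(faithful), 172)
train <- faithful[train_idx,]
test <- faithful[-train_idx,]
# fit on the training rows only
model <- lm(data=train, eruptions ~ waiting)
# predict eruption durations for the held-out rows
test$predictions <- predict(model, newdata=test)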

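Multiple linear regression

All the same, but instead of one feature, x is a k-dimensional vector $(x_1, \dots, x_k)$.

The model is a linear combination of all features:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon $$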
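The "linear combination" can be checked numerically. Because the customer data from the slides is not available here, this sketch swaps in R's built-in `mtcars` dataset with two features; the choice of `mpg`, `wt` and `hp` is purely illustrative.

```r
data(mtcars)

# Two features instead of one: weight and horsepower predict fuel economy.
fit  <- lm(mpg ~ wt + hp, data = mtcars)
beta <- coef(fit)

# Design matrix: a leading column of 1s for the intercept, then one column per feature.
X <- model.matrix(fit)

# Fitted values are the linear combination X %*% beta.
manual <- as.vector(X %*% beta)
print(head(cbind(manual, fitted = unname(fitted(fit)))))
```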


model_1 <- lm(data=train[,-1],
              AmountPerCust_2 ~ AmountPerCust_1 + TransPerCustomer_1 + AmountPerTr_1 +
                gender + age + discount_proposed + clicks_in_eshop)


Coefficients:
                     Estimate Std. Error t value    Pr(>|t|)
(Intercept)        -121.71416    5.57368 -21.837     < 2e-16 ***
AmountPerCust_1      -0.11450    0.07461  -1.535      0.1259
TransPerCustomer_1    1.10197    0.68570   1.607      0.1090
AmountPerTr_1         0.18432    0.11310   1.630      0.1041
gender1              -1.24804    0.58533  -2.132      0.0337 *
age                   0.20375    0.03775   5.397 0.000000132 ***
discount_proposed1   59.90655    2.02840  29.534     < 2e-16 ***
clicks_in_eshop      23.98903    1.04957  22.856     < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.948 on 322 degrees of freedom
Multiple R-squared:  0.8529, Adjusted R-squared:  0.8497
F-statistic: 266.7 on 7 and 322 DF,  p-value: < 2.2e-16

Interpret
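One way to read the coefficient table is sketched below. The numbers are retyped from the summary output above; `2e-16` stands in for the reported `< 2e-16`, and the 0.05 cut-off is the conventional choice rather than something fixed by the slides.

```r
# Coefficient table retyped from the summary output shown above.
coefs <- data.frame(
  term      = c("(Intercept)", "AmountPerCust_1", "TransPerCustomer_1", "AmountPerTr_1",
                "gender1", "age", "discount_proposed1", "clicks_in_eshop"),
  estimate  = c(-121.71416, -0.11450, 1.10197, 0.18432, -1.24804, 0.20375, 59.90655, 23.98903),
  std_error = c(5.57368, 0.07461, 0.68570, 0.11310, 0.58533, 0.03775, 2.02840, 1.04957),
  p_value   = c(2e-16, 0.1259, 0.1090, 0.1041, 0.0337, 0.000000132, 2e-16, 2e-16),
  stringsAsFactors = FALSE
)

# The t value is the estimate divided by its standard error.
coefs$t_value <- coefs$estimate / coefs$std_error

# Terms whose p-value falls below the conventional 0.05 threshold.
significant <- coefs$term[coefs$p_value < 0.05]
print(coefs)
print(significant)
```

Read this way, the three previous-period spending variables are not significant at the 5% level, while gender, age, the proposed discount and e-shop clicks are.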

Quality Assessment

• Mean Absolute Error (MAE)
• Mean Square Error (MSE)
• Root Mean Square Error (RMSE)
• R²

MAE and (R)MSE

Reference: https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d

Loss function/distance:

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert $$

$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}} $$

• MAE is more robust to outliers since it does not square the errors.
• MSE is more useful if we are concerned about large errors.
• For a fixed average error magnitude, MAE stays steady, while RMSE increases as the variance of the frequency distribution of error magnitudes increases.

RMSE vs. MAE

https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d

• RMSE should be more useful when large errors are particularly undesirable.
• RMSE does not necessarily increase with the variance of the errors. RMSE increases with the variance of the frequency distribution of error magnitudes.
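A small numeric sketch of this point: the three hypothetical error vectors below all have the same MAE, but RMSE grows as the error magnitudes become more unequal. The helper names `mae` and `rmse` and the numbers are made up for illustration.

```r
mae  <- function(e) mean(abs(e))
rmse <- function(e) sqrt(mean(e^2))

# Three hypothetical sets of prediction errors with the same mean absolute error.
equal_errors  <- c(2, 2, 2, 2)
mixed_errors  <- c(1, 1, 1, 5)
one_big_error <- c(0, 0, 0, 8)

results <- data.frame(
  case = c("equal", "mixed", "one big"),
  MAE  = c(mae(equal_errors), mae(mixed_errors), mae(one_big_error)),
  RMSE = c(rmse(equal_errors), rmse(mixed_errors), rmse(one_big_error))
)
print(results)
```

MAE stays at 2 in every case, while RMSE rises from 2 to 4, which is why RMSE is the more sensitive choice when a few large errors matter most.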


train_idx <- sample(nrow(faithful), 172)
train <- faithful[train_idx,]
test <- faithful[-train_idx,]
model <- lm(data=train, eruptions ~ waiting)
test$predictions <- predict(model, newdata=test)
# mean squared error on the held-out rows
MSE <- mean((test$eruptions - test$predictions)^2)

> MSE
[1] 0.2742695

(The exact value depends on the random train/test split.)

R²: Goodness of fit

Reference: https://www.youtube.com/watch?v=w2FKXOa0HGA

$$ R^2 = \frac{\mathrm{Var}(\text{mean}) - \mathrm{Var}(\text{model})}{\mathrm{Var}(\text{mean})} $$

*Var = variation

The closer R² is to 1, the better the fit; the closer it is to 0, the worse.
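The formula can be reproduced by hand, taking "variation" to mean a sum of squared deviations (an interpretation assumed here). The sketch uses the built-in `faithful` data; `var_mean`, `var_model` and `r2` are illustrative names.

```r
data(faithful)
model <- lm(eruptions ~ waiting, data = faithful)

# Variation around the mean of y (total sum of squares).
var_mean  <- sum((faithful$eruptions - mean(faithful$eruptions))^2)
# Variation around the fitted line (residual sum of squares).
var_model <- sum(residuals(model)^2)

r2 <- (var_mean - var_model) / var_mean
print(r2)
print(summary(model)$r.squared)  # should print the same value
```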

R² and more

• R-squared cannot determine whether the coefficient estimates and predictions are biased, which is why you must assess the residual plots.
• “Adjusted R-squared” penalizes you for adding variables which do not improve your existing model.
• Typically, the more non-significant variables you add to the model, the larger the gap between R-squared and Adjusted R-squared becomes.
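A rough sketch of the penalty: add a pure-noise feature to the `faithful` model and compare both measures. The `noise` column, the seed and the helper `adj_r2` are introduced here for illustration; the helper uses the standard adjusted R-squared formula with n observations and k features.

```r
set.seed(1)  # fixed seed so the noise column is reproducible
data(faithful)
d <- faithful
d$noise <- rnorm(nrow(d))  # unrelated to eruptions by construction

base  <- summary(lm(eruptions ~ waiting, data = d))
noisy <- summary(lm(eruptions ~ waiting + noise, data = d))

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
manual_adj <- adj_r2(noisy$r.squared, n = nrow(d), k = 2)

print(c(r2 = base$r.squared,  adj = base$adj.r.squared))
print(c(r2 = noisy$r.squared, adj = noisy$adj.r.squared))
print(manual_adj)
```

Plain R-squared cannot go down when a feature is added, so only the adjusted version can signal that the extra variable is not pulling its weight.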

Demo time!

https://courses.cs.ut.ee/2018/bda/fall/Main/Practice