Optimal Personalized Treatment Learning Models with Insurance Applications

PhD Student: Leo Guelman
Advisor: Dr. Montserrat Guillen

PhD in Economics
University of Barcelona

March 2, 2015

Outline

1 Motivation

2 Problem formulation

3 Objectives

4 Challenges

5 Contributions

6 Limitations and future work

Motivation

Predictive learning has been established as a core strategic capability in many scientific disciplines and industries.

Common goal: Predict the value of a response variable using a collection of covariates or “predictors” under static conditions, i.e., assuming “business as usual” conditions.

[Figure: From Predictive to Causal Modeling. Predictive modeling maps covariates X to a response Y. Causal modeling maps covariates X to potential responses, labelled Y(1) and Y(2) in the figure, under alternative actions A = 0 and A = 1.]

Causal learning goes one step further: the interest is in estimating/predicting the response under changing conditions, e.g., induced by alternative actions or “treatments”.

Motivation

In the context of causal learning, the main interest has been in identifying the Average Treatment Effect (ATE):

$\mathrm{ATE} = E[Y \mid A = 1] - E[Y \mid A = 0]$.

In many important settings, subjects can show significant heterogeneity in response to actions/treatments, i.e., what works for one subject may not work for another. Here the ATE is less relevant.

The objective becomes to select the optimal action or “treatment” for each subject based on individual characteristics.

Personalized Medicine

Goal: “Providing meaningful improved health outcomes for patients by delivering the right drug at the right dose at the right time.”

How do we apply personalized medicine? Learn individualized treatment rules: tailor treatments based on patient characteristics (symptoms, demographics, disease history, biomarkers, imaging, bioinformatics, pharmacogenomics).

When do we apply personalized medicine? In single-decision and multi-decision setups.

“Optimal” is defined as the treatment that maximizes the probability of a desirable outcome.

We call the task of learning the optimal personalized treatment personalized treatment learning (PTL).

Problem Formulation

Assume a randomized experiment, i.e., subjects are randomly assigned to two treatments, denoted by $A \in \{0, 1\}$.

Let $Y(a) \in \{0, 1\}$ denote a binary potential response of a subject if assigned to treatment $A = a$, $a = 0, 1$.

So the observed response is $Y = A\,Y(1) + (1 - A)\,Y(0)$.

Subjects are characterized by a $p$-dimensional vector of baseline predictors $X = (X_1, \dots, X_p)^\top$.

Data consist of $L$ i.i.d. realizations of $(Y, A, X)$: $\{(Y_\ell, A_\ell, X_\ell),\ \ell = 1, \dots, L\}$.

Problem Formulation

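At the most granular level, the personalized treatment effect (PTE) is a comparison between $Y(1)$ and $Y(0)$ on the same subject. Usually,

$Y_\ell(1) - Y_\ell(0) \quad \forall\, \ell = 1, \dots, L$.

But this quantity is unobserved: each subject receives only one treatment, so only one of the two potential responses is ever seen.

In practice, the best we can do is to estimate the PTE by conditioning on subjects with profile $X = x$.

Thus, we define the PTE by

$\tau(x) = E[Y_\ell(1) - Y_\ell(0) \mid X_\ell = x] = E[Y_\ell \mid X_\ell = x, A_\ell = 1] - E[Y_\ell \mid X_\ell = x, A_\ell = 0]$,

where the second equality holds because treatment is randomly assigned.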
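As a rough illustration of estimating the PTE by conditioning on a profile, the sketch below simulates a toy randomized experiment and compares mean responses within each profile. The profiles ("young", "old"), the response probabilities and the sample size are all invented for illustration and do not come from the thesis.

```python
import random

random.seed(0)

# Hypothetical response probabilities P(Y(a) = 1 | X = x); illustrative only.
P = {("young", 0): 0.20, ("young", 1): 0.50,
     ("old", 0): 0.40, ("old", 1): 0.30}

def simulate_subject():
    """Draw one subject from a toy randomized experiment."""
    x = random.choice(["young", "old"])        # a single categorical profile X
    a = random.randint(0, 1)                   # randomized treatment A
    y0 = int(random.random() < P[(x, 0)])      # potential response Y(0)
    y1 = int(random.random() < P[(x, 1)])      # potential response Y(1)
    y = a * y1 + (1 - a) * y0                  # observed: Y = A Y(1) + (1 - A) Y(0)
    return x, a, y

data = [simulate_subject() for _ in range(20000)]

def mean_response(x, a):
    """Sample analogue of E[Y | X = x, A = a]."""
    ys = [y for (xi, ai, y) in data if xi == x and ai == a]
    return sum(ys) / len(ys)

def tau_hat(x):
    """Estimated PTE for profile x: treated mean minus control mean."""
    return mean_response(x, 1) - mean_response(x, 0)

for profile in ("young", "old"):
    print(profile, round(tau_hat(profile), 3))
```

Only the observed response `y` enters the estimate. The individual difference `y1 - y0` is never available in real data, which is why the estimate has to be built from group means.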

Problem Formulation

A personalized treatment rule $H$ is a map from the space of baseline covariates $X$ to the space of treatments $A$, $H(X): \mathbb{R}^p \to \{0, 1\}$.

An optimal treatment rule is one that maximizes the expected outcome, $E[Y(H(X))]$, if the personalized treatment rule is implemented for the whole population.

Since $Y$ is binary, this expectation has a probabilistic interpretation. That is, $E[Y(H(X))] = P(Y(H(X)) = 1)$, and thus $\tau(x) \in [-1, 1]$.

Assuming all treatments cost the same, the optimal personalized treatment rule $H^* = \arg\max_H E[Y(H(X))]$ for a subject with covariates $X_\ell = x$ is given by

$H^* = \begin{cases} 1 & \text{if } \tau(x) > 0 \\ 0 & \text{otherwise.} \end{cases}$
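The rule can be made concrete with a small worked example. The sketch below assumes two hypothetical profiles with known response probabilities and equal population shares (all invented numbers), applies the rule, and compares the expected outcome $E[Y(H(X))]$ against treating everyone or no one.

```python
def optimal_rule(tau_x):
    """H*: treat (1) only when the PTE is strictly positive, else do not treat (0)."""
    return 1 if tau_x > 0 else 0

# Hypothetical P(Y(a) = 1 | X = x) for two profiles; illustrative only.
p = {"young": {0: 0.20, 1: 0.50}, "old": {0: 0.40, 1: 0.30}}
share = {"young": 0.5, "old": 0.5}   # assumed population share of each profile

tau = {x: p[x][1] - p[x][0] for x in p}
rule = {x: optimal_rule(tau[x]) for x in p}

def value(policy):
    """E[Y(H(X))]: expected outcome if `policy` is applied to the whole population."""
    return sum(share[x] * p[x][policy[x]] for x in p)

treat_all = {x: 1 for x in p}
treat_none = {x: 0 for x in p}

print(rule)
print(round(value(rule), 2), round(value(treat_all), 2), round(value(treat_none), 2))
```

In this toy setting the personalized rule treats only the "young" profile and attains a higher expected outcome than either uniform policy, which is the sense in which the ATE alone is less relevant under heterogeneity.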

The Simplest Approach to PTE Estimation

1 Estimate $E[Y \mid X, A = 1]$ using the treated subjects only.
2 Estimate $E[Y \mid X, A = 0]$ using the control subjects only.
3 An estimate of the PTE for a subject with predictors $X_\ell = x$ is

$\hat{\tau}(x) = \hat{E}[Y_\ell \mid X_\ell = x, A_\ell = 1] - \hat{E}[Y_\ell \mid X_\ell = x, A_\ell = 0]$.

Pros:

• Any conventional statistical or algorithmic binary classification method may serve to fit the models.

Cons:

• The method aims to predict the wrong target: it emphasizes the prediction accuracy on the response under each treatment, not the accuracy in estimating the change in the response caused by the treatment.
• Relevant predictors for Y under each treatment are usually different from relevant PTE predictors.
• As a result, it tends to perform poorly in practice.
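The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: a single continuous predictor, simulated data, and a hand-written logistic regression standing in for "any conventional binary classification method". None of the names or numbers come from the thesis or from the uplift package.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, steps=300, lr=0.5):
    """Fit P(Y = 1 | x) = sigmoid(b0 + b1 * x) by plain gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Toy randomized data: the control response is flat at 0.5 and the treated
# response rises with x, so the true PTE is positive only for x > 0.
rows = []
for _ in range(2000):
    x = random.uniform(-2, 2)
    a = random.randint(0, 1)
    prob = sigmoid(2.0 * x) if a == 1 else 0.5
    rows.append((x, a, int(random.random() < prob)))

treated = [(x, y) for x, a, y in rows if a == 1]
control = [(x, y) for x, a, y in rows if a == 0]

model_1 = fit_logistic(*zip(*treated))   # step 1: treated subjects only
model_0 = fit_logistic(*zip(*control))   # step 2: control subjects only

def tau_hat(x):
    """Step 3: difference between the two fitted response probabilities."""
    return sigmoid(model_1[0] + model_1[1] * x) - sigmoid(model_0[0] + model_0[1] * x)

print(round(tau_hat(1.5), 2), round(tau_hat(-1.5), 2))
```

Each model is tuned to predict the response within its own arm. Nothing in the fitting step targets the difference between arms, which is the weakness listed under "Cons".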

Objectives

Formalize personalized treatment learning (PTL) as a new branch of statistical learning.

Create the first comprehensive systematic review of the existing PTL methods (Tian and Tibshirani, 2014; Radcliffe and Surry, 2011; Jaskowski and Jaroszewicz, 2012; Su et al., 2009; Qian and Murphy, 2011; Zhao et al., 2012; Rubin and Waterman, 2006; Larsen, 2009; Imai and Ratkovic, 2013, and others).

Build improved statistical/algorithmic methods for estimating, selecting and assessing PTL models.

Introduce PTL models to insurance applications.

Build open source software implementing our proposed methods for fitting PTL models, as well as the existing ones, and make it freely available for academia/industry.

Key Challenges

The fundamental problem of PTL models: The quantity we are trying to predict (i.e., the optimal personalized treatment) is unknown on a given training data set.

Size of main effects relative to treatment heterogeneity effects: The magnitude of the variability in the response due to the treatment heterogeneity effects is usually much smaller than the variability due to the main effects.

Overfitting: The risk of overfitting increases markedly in PTL models compared to conventional predictive learning problems.

Model selection and assessment: Methods for variable selection and model selection/assessment used in conventional predictive learning problems need to be redefined in the context of PTL models.

Contributions

Introduced and formalized the concept of personalized treatment learning (PTL) within a causal inference framework, and described its relevance to a wide variety of fields ranging from economics to medicine (Ch. 1-2).

Provided the first comprehensive description of the existing PTL methods (Ch. 3) and proposed two novel methods, namely uplift random forests (Ch. 4) and causal conditional inference trees (Ch. 5). Our proposals outperform the existing methods in an extensive numerical simulation study (Ch. 6).

PTL models require not only developing new estimation methods, but also new methods for assessing model performance. We formalized the concept of the Qini curve and the Qini coefficient, and discussed general useful methods for model assessment and selection for PTL models (Ch. 7).

Contributions

We described the relevance of PTL models to insurance marketing, and illustrated two applications to optimize client retention and cross-selling using experimental data from a large Canadian insurer (Ch. 8).

We presented a novel approach to measuring price-elasticity and economic price optimization in non-life insurance based on PTL modeling principles in the context of observational data (Ch. 9).

Selecting the optimal personalized treatment in insurance also requires consideration of the expected losses under treatment alternatives. We described an unprecedented application of gradient boosting models to estimate loss cost in non-life insurance, with key advantages over the conventional generalized linear model approach (Ch. 10).

We implemented most of the statistical methods and algorithms described in this thesis in a package named uplift (Guelman, 2014), which is now freely available from the CRAN (Comprehensive R Archive Network) repository under the R statistical computing environment (Ch. 11).

http://www.r-project.org

PTL with Experimental Data
An Application to Insurance Cross-Sell Optimization

Cross-sell rates by group

                              Treatment   Control
Purchased home policy = N        30,184     3,322
Purchased home policy = Y           789        75
Cross-sell rate                   2.55%     2.21%

The average treatment effect (ATE) is 0.34% (2.55% − 2.21%), which is NOT statistically significant (P value = 0.23).
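The figures in the table can be checked with a few lines of arithmetic. The slide does not say which significance test was used. The sketch below assumes a pooled two-proportion z-test, which is one common choice for comparing two rates.

```python
import math

# Counts taken from the cross-sell table
treat_yes, treat_no = 789, 30184
ctrl_yes, ctrl_no = 75, 3322

n_t = treat_yes + treat_no
n_c = ctrl_yes + ctrl_no
rate_t = treat_yes / n_t
rate_c = ctrl_yes / n_c
ate = rate_t - rate_c

# Pooled two-proportion z-test (an assumed choice of test, not stated in the source)
pooled = (treat_yes + ctrl_yes) / (n_t + n_c)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
z = ate / se
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value

print(f"treatment {rate_t:.2%}, control {rate_c:.2%}, ATE {ate:.2%}, p = {p_value:.2f}")
```

With these counts the script reports rates of 2.55% and 2.21%, an ATE of 0.34% and a p-value of about 0.23, in line with the slide.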

Cross-sell rate by PTL model decile

[Figure: two panels. (1) Density of the estimated PTE (about −0.03 to 0.06) by decile, marking profitable and not profitable targets. (2) Cross-sell rate (%) by decile (1 to 10) for the treatment and control groups.]

Prototype causal conditional inference tree

[Figure: tree diagram with splits on prodCnt ≤ 1, adjBranch, age ≤ 45, xdate ≤ 2 and age ≤ 42. Node PTE values shown: 0.34%, 0.05%, 0.65%, 1.22%, 0.38%, 0.67%, 2.52%, −1.2%, 0.32%, −1.6%, −0.2%.]

The tree-based procedure identifies a subgroup of clients with significant positive impact from the cross-sell activity (PTE = 2.52%).

Causal Conditional Inference Tree - Pseudocode

Algorithm 1 Causal conditional inference tree

for each terminal node do
    Test the global null hypothesis H0 of no interaction effect between the treatment A and any of the p predictors at a level of significance α based on a permutation test (Strasser and Weber, 1999)
    if the null hypothesis H0 cannot be rejected then
        Stop
    else
        Select the j*th predictor X_{j*} with the strongest interaction effect (i.e., the one with the smallest adjusted P value)
        Choose a partition Ω* of the covariate X_{j*} in two disjoint sets M ⊂ X_{j*} and X_{j*} \ M based on the G²(Ω) split criterion
    end if
end for

$G^2(\Omega) = \dfrac{(L - 4)\,\big[\,(\bar{Y}_{n_L}(1) - \bar{Y}_{n_L}(0)) - (\bar{Y}_{n_R}(1) - \bar{Y}_{n_R}(0))\,\big]^2}{\hat{\sigma}^2\,\big[\,1/L_{n_L}(1) + 1/L_{n_L}(0) + 1/L_{n_R}(1) + 1/L_{n_R}(0)\,\big]}$

Here the first difference in the numerator is the treatment effect in the left node and the second is the treatment effect in the right node.
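To make the split criterion concrete, the sketch below evaluates it for one candidate split from the four cell counts (left/right node by treated/control). The slide does not define the variance term, so the code assumes it is the pooled within-cell sum of squares, which for a binary response is the cell size times the rate times one minus the rate. That definition, the function name and the example counts are all illustrative assumptions.

```python
def g2(cells):
    """Interaction split statistic for a candidate split into a left and a right node.

    `cells` maps (node, treatment) to (number of subjects, number with Y = 1),
    with node in {"L", "R"} and treatment in {0, 1}.
    """
    n = {k: v[0] for k, v in cells.items()}
    ybar = {k: v[1] / v[0] for k, v in cells.items()}
    total = sum(n.values())

    effect_left = ybar[("L", 1)] - ybar[("L", 0)]
    effect_right = ybar[("R", 1)] - ybar[("R", 0)]

    # Assumption: sigma^2 is the pooled within-cell sum of squares, which for a
    # binary response equals n * ybar * (1 - ybar) in each cell.
    sigma2 = sum(n[k] * ybar[k] * (1 - ybar[k]) for k in cells)
    inv_sizes = sum(1 / n[k] for k in cells)

    return (total - 4) * (effect_left - effect_right) ** 2 / (sigma2 * inv_sizes)

# Treatment effect of 10 points on the left, none on the right
split = {("L", 1): (100, 30), ("L", 0): (100, 20),
         ("R", 1): (100, 20), ("R", 0): (100, 20)}
# Same 10-point effect in both nodes: no interaction with the split
no_interaction = {("L", 1): (100, 30), ("L", 0): (100, 20),
                  ("R", 1): (100, 40), ("R", 0): (100, 30)}

print(round(g2(split), 4), round(g2(no_interaction), 4))
```

The statistic grows with the gap between the two child-node treatment effects and is essentially zero when both children show the same effect, so a split is rewarded for separating subjects by how they respond to treatment rather than by their response level.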

Numerical simulations: Performance comparison
“Strong” treatment heterogeneity effects

Simulation scenarios are obtained by varying: (i) the strength of the main effects relative to the treatment heterogeneity effects, (ii) the correlation among the covariates, (iii) the magnitude of the random noise.

[Figure: Spearman's rank correlation coefficient (y-axis) by method (x-axis), one panel per scenario (Scenario 1, Scenario 2, ...). Methods compared: ccif, upliftRF, l2svm, dsm, dsm-RF, mom, mom-RF, mcm, mcm-RF, cknn, int, int-RF.]

Numerical simulations: Performance comparison
“Weak” treatment heterogeneity effects

[Figure: Spearman's rank correlation coefficient (y-axis) by method (x-axis), one panel per scenario. Methods compared: ccif, upliftRF, l2svm, dsm, dsm-RF, mom, mom-RF, mcm, mcm-RF, cknn, int, int-RF.]

PTL with Observational Data
An Application to Auto Insurance Economic Price Optimization

Background

Objective: Determine the policyholder-level premium (playing the role of the treatment) that maximizes the expected profitability of an existing insurance portfolio, subject to a fixed overall retention rate.

Requires estimating the expected client retention outcome under alternative insurance rates (price elasticity), and loss cost.

Clients were historically exposed to rating actions based on a non-random assignment mechanism, which requires designing an observational study.

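Maximize an expected profit function

$\max_{Z_{\ell a}\ \forall \ell\, \forall a} \ \sum_{\forall \ell} \sum_{\forall a} Z_{\ell a} \left[ P_\ell\, (1 + RC_a)(1 - LR_{\ell a})(1 - \hat{r}_{\ell a}) \right]$

subject to a retention constraint

$\sum_{\forall a} Z_{\ell a} = 1 \quad \forall \ell$

$Z_{\ell a} \in \{0, 1\}$

$\sum_{\forall \ell} \sum_{a} Z_{\ell a}\, \hat{r}_{\ell a} / L \le \alpha$.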
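A tiny instance of this program can be solved by brute force, which may help in reading the notation. The slide does not define the symbols, so the sketch assumes the following reading: P is the current premium, RC a candidate rate change, LR an expected loss ratio, r-hat an estimated lapse probability, and α a cap on the average lapse rate. All numbers are invented, and exhaustive enumeration only works for toy sizes. It is not the solution method used in the thesis.

```python
from itertools import product

# Illustrative inputs; none of these numbers come from the source.
premium = [1000.0, 800.0, 1200.0]        # P_l: premium of policyholder l
rate_change = [-0.05, 0.0, 0.05]         # RC_a: candidate rate changes (the "treatments")
loss_ratio = [[0.70, 0.68, 0.66],        # LR_la: assumed loss ratio under each action
              [0.80, 0.78, 0.76],
              [0.60, 0.58, 0.56]]
lapse = [[0.04, 0.08, 0.15],             # r_la: assumed lapse probability under each action
         [0.03, 0.05, 0.09],
         [0.05, 0.10, 0.20]]
alpha = 0.08                             # cap on the average lapse rate

n_policies = len(premium)
actions = range(len(rate_change))

def profit(i, a):
    """Bracketed term of the objective for policyholder i under action a."""
    return premium[i] * (1 + rate_change[a]) * (1 - loss_ratio[i][a]) * (1 - lapse[i][a])

def average_lapse(plan):
    return sum(lapse[i][a] for i, a in enumerate(plan)) / n_policies

best_plan, best_profit = None, float("-inf")
# Picking exactly one action per policyholder enforces sum_a Z_la = 1 with Z_la in {0, 1}.
for plan in product(actions, repeat=n_policies):
    if average_lapse(plan) > alpha:
        continue                          # violates the retention constraint
    total = sum(profit(i, a) for i, a in enumerate(plan))
    if total > best_profit:
        best_plan, best_profit = plan, total

print(best_plan, round(best_profit, 2))
```

Re-running the search over a grid of `alpha` values would trace out an expected-profit versus retention trade-off of the kind shown as the efficient frontier.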

[Figure: Expected profit (%) against retention rate (1 − α), from 0.90 to 0.96, showing the current state, the efficient frontier, and points labelled A, B and C.]

uplift Package Highlights

First R package implementing PTL models

Exploratory Data Analysis (EDA) tools customized for PTL models
• Check balance of covariates (checkBalance)
• Univariate uplift analysis (explore)
• Preliminary variable screening (niv)

Estimating personalized treatment effects
• Causal conditional inference forests (ccif)
• Uplift random forests (upliftRF)
• Modified covariate method (tian_transf)
• Modified outcome method (rvtu)
• Uplift k-nearest neighbor (upliftKNN)

Performance assessment for PTL models
• Uplift by decile (performance)
• Qini curve and Qini-coefficient (qini)

Other functionality
• Profiling PTL models (modelProfile)
• Monte-Carlo simulations (sim_pte)
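The idea behind an "uplift by decile" assessment can be sketched independently of the package. The function below is a hypothetical illustration in Python, not the package's `performance` function. It ranks subjects by predicted PTE, cuts them into equal-sized groups, and reports the treated rate, control rate and their difference in each group. It assumes every group contains both treated and control subjects.

```python
def uplift_by_group(records, n_groups=10):
    """Rank subjects by predicted PTE and compare response rates within each group.

    `records` is a list of (predicted_pte, treatment, response) tuples.
    Group 1 holds the highest predicted PTE values.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    size = len(ranked) / n_groups
    table = []
    for g in range(n_groups):
        chunk = ranked[round(g * size):round((g + 1) * size)]
        treated = [y for _, a, y in chunk if a == 1]
        control = [y for _, a, y in chunk if a == 0]
        rate_t = sum(treated) / len(treated)
        rate_c = sum(control) / len(control)
        table.append((g + 1, rate_t, rate_c, rate_t - rate_c))
    return table

# Deterministic toy data: treatment only helps subjects with a score of 0.5 or more.
records = []
for i in range(40):
    score = i / 40
    for a in (0, 1):
        y = 1 if (a == 1 and score >= 0.5) else 0
        records.append((score, a, y))

table = uplift_by_group(records, n_groups=4)
for row in table:
    print(row)
```

A useful model concentrates the observed uplift in the top-ranked groups, as in this toy example where only the two highest groups show any difference between treatment and control.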

Fitting a CCIF using uplift

ccif implements recursive partitioning in a causal conditional inference framework.

    library(uplift)  # provides ccif(); approximate() comes from the coin package

    fit <- ccif(formula = Y ~ trt(A) + X1 + X2 + X3,
                data = mydata,
                ntree = 1000,
                split_method = "Int",
                distribution = approximate(B = 999),
                pvalue = 0.05,
                verbose = TRUE)

Table: Some ccif options

ccif argument       Description
mtry                Number of variables to be tested in each node
ntree               Number of trees in the forest
split_method        Split criteria: "KL", "ED", "Int" or "L1"
interaction.depth   The maximum depth of variable interactions
pvalue              Maximum acceptable p-value required to make a split
bonferroni          Apply Bonferroni adjustment to pvalue
minsplit            Minimum number of obs. for a split to be attempted
...                 Additional args. passed to coin::independence_test

Manuscripts Linked to Thesis

[1] Guelman, L. (2014). uplift: Uplift modeling. R package version 0.3.5.

[2] Guelman, L. and Guillen, M. (2014). A causal inference approach to measure price elasticity in automobile insurance. Expert Systems with Applications, 41(2):387–396.

[3] Guelman, L., Guillen, M. and Pérez-Marín, A. M. (2014). A survey of personalized treatment models for pricing strategies in insurance. Insurance: Mathematics and Economics, 58:68–76.

[4] Guelman, L., Guillen, M. and Pérez-Marín, A. M. (2014). Uplift random forests. Cybernetics & Systems. Accepted.

[5] Guelman, L., Guillen, M. and Pérez-Marín, A. M. (2014). A decision support framework to implement optimal personalized marketing interventions. Decision Support Systems, 72:24–32.

[6] Guelman, L. (2012). Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, 39(3):3659–3667.

[7] Guelman, L., Guillen, M. and Pérez-Marín, A. M. (2012). Random forests for uplift modeling: An insurance customer retention case. In Engemann, K. J., Lafuente, A. M. G., and Merigo, J. M., editors, Modeling and Simulation in Engineering, Economics and Management, pages 123–133. Springer Berlin Heidelberg, New York, NY.

Talks Linked to Thesis

[1] Guillen, M. and Guelman, L. (2014). “New trends in predictive modelling - the uplift models success story”. R in Insurance Conference, London, UK (July 14, 2014).

[2] Guelman, L. and Guillen, M. (2014). “Actionable predictive learning for insurance profit maximization”. Casualty Actuarial Society, Ratemaking and Product Management Seminar, Washington, D.C., USA (April 1, 2014).

[3] Guelman, L. (2013). “An introduction to causal learning with applications to price elasticity modeling in Casualty insurance”. University of Barcelona, UB Economics Seminars, Barcelona, Spain (November 28, 2013).

[4] Guelman, L., Guillen, M. and Pérez-Marín, A.M. (2013). “Evaluating customer loyalty with advanced uplift models”. APRIA, Annual Conference, New York, USA (July 28-31, 2013).

[5] Guillen, M., Guelman, L. and Pérez-Marín, A.M. (2013). “Customer retention and price elasticity. Are motor insurance policies homogeneous with respect to loyalty?”. 2013 Astin colloquium, The Hague, Netherlands (May 21-24, 2013).

[6] Guelman, L. and Lee, S. (2013). “Balancing robust statistics - gradient boosting”. Casualty Actuarial Society, Ratemaking and Product Management Seminar, Los Angeles, USA (March 12-13, 2013).

[7] Guelman, L., Guillen, M. and Pérez-Marín, A.M. (2012). “Random forests for uplift modeling: an insurance customer retention case”. Association of Modeling and Simulation in Enterprise (AMSE) - International Conference on Modeling and Simulation, New York, USA (May 30-June 1, 2012). Outstanding Scholarly Research Contribution Award, AMSE.

[8] Guelman, L. and Lee, S. (2012). “Balancing robust statistics and data mining in ratemaking: gradient boosting modeling”. Casualty Actuarial Society, Ratemaking and Product Management Seminar, Philadelphia, USA (March 20-21, 2012).

Limitations and Future Work

1 Extensions to multi-category and continuous treatment settings.

2 Extensions to continuous uncensored and survival responses.

3 Extensions to dynamic treatment regimes: treatment type may be repeatedly adjusted according to an ongoing individual response.

4 Absolute vs. relative treatment effects: in some settings, defining the treatment effect in terms of the ratio of the expected responses under alternative treatment conditions, instead of the difference between the expected responses, may be more appropriate.
