Optimal Personalized Treatment Learning Models with Insurance Applications
PhD Student: Leo Guelman
Advisor: Dr. Montserrat Guillen
PhD in Economics, University of Barcelona
March 2, 2015
Outline
1 Motivation
2 Problem formulation
3 Objectives
4 Challenges
5 Contributions
6 Limitations and future work
Motivation
Predictive learning has been established as a core strategic capability in many scientific disciplines and industries.
Common goal: Predict the value of a response variable using a collection of covariates or “predictors” under static conditions.
Causal learning goes one step further: the interest is in estimating/predicting the response under changing conditions – e.g., induced by alternative actions or “treatments”.
[Figure: From Predictive to Causal Modeling. Predictive modeling maps covariates X to a response Y under “business as usual” conditions; causal modeling maps covariates X and actions (A = 0, A = 1) to potential responses.]
Motivation
In the context of causal learning, the main interest has been in identifying the Average Treatment Effect (ATE):
ATE = E[Y | A = 1] − E[Y | A = 0].
In many important settings, subjects can show significant heterogeneity in response to actions/treatments – i.e., what works for one subject may not work for another. Here the ATE is less relevant.
The objective becomes to select the optimal action or “treatment” for each subject based on individual characteristics.
[Figure: personalized medicine example. Goal: “Providing meaningful improved health outcomes for patients by delivering the right drug at the right dose at the right time.” Individualized treatment rules tailor treatments to patient characteristics (symptoms, demographics, disease history, biomarkers, imaging, bioinformatics, pharmacogenomics), in single-decision or multi-decision setups.]
“Optimal” is defined as the treatment that maximizes the probability of a desirable outcome.
We call the task of learning the optimal personalized treatment personalized treatment learning (PTL).
Problem Formulation
Assume a randomized experiment – i.e., subjects are randomly assigned to two treatments, denoted by A ∈ {0, 1}.
Let Y(a) ∈ {0, 1} denote a binary potential response of a subject if assigned to treatment A = a, a = 0, 1.
So the observed response is Y = A Y(1) + (1 − A) Y(0).
Subjects are characterized by a p-dimensional vector of baseline predictors X = (X_1, ..., X_p)ᵀ.
Data consist of L i.i.d. realizations of (Y, A, X): (Y_ℓ, A_ℓ, X_ℓ), ℓ = 1, ..., L.
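To make the notation concrete, the following is a minimal base R sketch that simulates data of this form. The sample size, the number of predictors and the response probabilities are made-up values chosen only for illustration; none of them come from the thesis.

```r
set.seed(1)                       # reproducible illustration
L <- 1000                         # number of subjects (hypothetical)
p <- 3                            # number of baseline predictors (hypothetical)

# Baseline predictors X = (X1, ..., Xp)
X <- matrix(rnorm(L * p), nrow = L, ncol = p,
            dimnames = list(NULL, paste0("X", 1:p)))

# Binary potential responses; the success probabilities are made up
p0 <- plogis(-1 + 0.5 * X[, "X1"])                     # P(Y(0) = 1 | X)
p1 <- plogis(-1 + 0.5 * X[, "X1"] + 0.8 * X[, "X2"])   # P(Y(1) = 1 | X)
Y0 <- rbinom(L, size = 1, prob = p0)
Y1 <- rbinom(L, size = 1, prob = p1)

# Randomized treatment assignment, independent of X and of (Y(0), Y(1))
A <- rbinom(L, size = 1, prob = 0.5)

# Observed response: Y = A * Y(1) + (1 - A) * Y(0)
Y <- A * Y1 + (1 - A) * Y0

dat <- data.frame(Y = Y, A = A, X)
head(dat)
```

Only one of Y0 and Y1 is ever visible through Y for a given subject; a real data set would contain just the columns of dat.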
Problem Formulation
At the most granular level, the personalized treatment effect (PTE) is a comparison between Y(1) and Y(0) on the same subject, usually the difference
Y_ℓ(1) − Y_ℓ(0) ∀ ℓ = 1, ..., L.
But this quantity is unobserved, because each subject receives only one of the two treatments.
In practice, the best we can do is to estimate the PTE by conditioning on subjects with profile X = x.
Thus, we define the PTE by
τ(x) = E[Y_ℓ(1) − Y_ℓ(0) | X_ℓ = x]
     = E[Y_ℓ | X_ℓ = x, A_ℓ = 1] − E[Y_ℓ | X_ℓ = x, A_ℓ = 0].
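The step from potential responses to observed responses in the definition of τ(x) can be spelled out as follows. This is a sketch of the standard argument, assuming that randomization makes A_ℓ independent of the potential responses given X_ℓ and that both treatments occur with positive probability at X_ℓ = x; the slide states the result without these intermediate steps.

```latex
\begin{align*}
E[Y_\ell(1) \mid X_\ell = x]
  &= E[Y_\ell(1) \mid X_\ell = x, A_\ell = 1]
     && \text{(randomization: } A_\ell \perp Y_\ell(1) \mid X_\ell\text{)} \\
  &= E[Y_\ell \mid X_\ell = x, A_\ell = 1]
     && \text{(since } Y_\ell = Y_\ell(1) \text{ when } A_\ell = 1\text{)}, \\
E[Y_\ell(0) \mid X_\ell = x]
  &= E[Y_\ell \mid X_\ell = x, A_\ell = 0]
     && \text{(same argument with } A_\ell = 0\text{)}.
\end{align*}
```

Subtracting the second line from the first gives the expression for τ(x) in terms of observed quantities only.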
Problem Formulation
A personalized treatment rule H is a map from the space of baseline covariates X to the space of treatments A, H(X): ℝ^p → {0, 1}.
An optimal treatment rule is one that maximizes the expected outcome, E[Y(H(X))], if the personalized treatment rule is implemented for the whole population.
Since Y is binary, this expectation has a probabilistic interpretation.
That is, E[Y(H(X))] = P(Y(H(X)) = 1), and thus τ(x) ∈ [−1, 1].
Assuming all treatments cost the same, the optimal personalized treatment rule H∗ = argmax_H E[Y(H(X))] for a subject with covariates X_ℓ = x is given by
H∗ = 1 if τ(x) > 0, and 0 otherwise.
The Simplest Approach to PTE Estimation
1 Estimate E[Y | X, A = 1] using the treated subjects only.
2 Estimate E[Y | X, A = 0] using the control subjects only.
3 An estimate of the PTE for a subject with predictors X_ℓ = x is
τ̂(x) = Ê[Y_ℓ | X_ℓ = x, A_ℓ = 1] − Ê[Y_ℓ | X_ℓ = x, A_ℓ = 0].
Pros:
Any conventional statistical or algorithmic binary classification method may serve to fit the models.
Cons:
The method aims to predict the wrong target: it emphasizes the prediction accuracy on the response under each treatment, not the accuracy in estimating the change in the response caused by the treatment.
Relevant predictors for Y under each treatment are usually different from relevant PTE predictors.
As a result, it tends to perform poorly in practice.
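The three steps of this two-model approach can be sketched in base R as follows. Logistic regression via glm is used here only as one example of a "conventional" classifier. The simulated data, the variable names and the effect sizes are all assumptions made for illustration and do not come from the thesis.

```r
set.seed(42)
n  <- 2000
X1 <- rnorm(n)
X2 <- rnorm(n)
A  <- rbinom(n, size = 1, prob = 0.5)           # randomized treatment

# Made-up data-generating process: the treatment helps only when X2 > 0
tau_true <- 0.2 * (X2 > 0)
p_y      <- 0.3 + 0.1 * (X1 > 0) + A * tau_true
Y        <- rbinom(n, size = 1, prob = p_y)
dat      <- data.frame(Y, A, X1, X2)

# Step 1: model E[Y | X, A = 1] using the treated subjects only
fit_trt <- glm(Y ~ X1 + X2, family = binomial, data = dat[dat$A == 1, ])
# Step 2: model E[Y | X, A = 0] using the control subjects only
fit_ctl <- glm(Y ~ X1 + X2, family = binomial, data = dat[dat$A == 0, ])

# Step 3: PTE estimate = difference between the two predicted probabilities
tau_hat <- predict(fit_trt, newdata = dat, type = "response") -
           predict(fit_ctl, newdata = dat, type = "response")

# Treatment rule implied by the estimates: treat when tau_hat > 0
H_hat <- as.integer(tau_hat > 0)

summary(tau_hat)
table(H_hat)
```

Each model is tuned to predict Y within its own group, so nothing in the fitting step targets the difference itself, which is the weakness listed under Cons.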
Objectives
Formalize personalized treatment learning (PTL) as a new branch of statistical learning.
Create the first comprehensive systematic review of the existing PTL methods (Tian and Tibshirani, 2014; Radcliffe and Surry, 2011; Jaskowski and Jaroszewicz, 2012; Su et al., 2009; Qian and Murphy, 2011; Zhao et al., 2012; Rubin and Waterman, 2006; Larsen, 2009; Imai and Ratkovic, 2013, and others).
Build improved statistical/algorithmic methods for estimating, selecting and assessing PTL models.
Introduce PTL models to insurance applications.
Build open source software implementing our proposed methods for fitting PTL models, as well as the existing ones, and make it freely available for academia/industry.
Key Challenges
The fundamental problem of PTL models: The quantity we are trying to predict (i.e., the optimal personalized treatment) is unknown on a given training data set.
Size of main effects relative to treatment heterogeneity effects: The magnitude of the variability in the response due to the treatment heterogeneity effects is usually much smaller than the variability due to the main effects.
Overfitting: The risk of overfitting increases markedly in PTL models compared to conventional predictive learning problems.
Model selection and assessment: Methods for variable selection and model selection/assessment used in conventional predictive learning problems need to be redefined in the context of PTL models.
Contributions
Introduced and formalized the concept of personalized treatment learning (PTL) within a causal inference framework, and described its relevance to a wide variety of fields ranging from economics to medicine (Ch. 1-2).
Provided the first comprehensive description of the existing PTL methods (Ch. 3) and proposed two novel methods – namely, uplift random forests (Ch. 4) and causal conditional inference trees (Ch. 5). Our proposal outperforms the existing methods in an extensive numerical simulation study (Ch. 6).
PTL models require not only developing new estimation methods, but also new methods for assessing model performance. We formalized the concept of the Qini curve and the Qini coefficient, and discussed general useful methods for model assessment and selection for PTL models (Ch. 7).
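For readers unfamiliar with the Qini curve, here is a rough base R sketch of one common formulation: subjects are ranked by predicted uplift, and for each targeted fraction the incremental number of responders (treated responders minus control responders rescaled to the treated count) is compared with what random targeting would give. This formulation, the function name qini_curve and the simulated data are assumptions for illustration; the exact definition in Ch. 7 and the computation in the uplift package may differ in detail.

```r
# Qini curve on a grid of targeted fractions (a sketch, not the uplift package code)
qini_curve <- function(score, y, a, groups = 10) {
  ord <- order(score, decreasing = TRUE)        # highest predicted uplift first
  y <- y[ord]; a <- a[ord]
  n <- length(y)
  cuts <- round(seq_len(groups) / groups * n)   # size of each targeted set
  gain <- sapply(cuts, function(k) {
    yk <- y[seq_len(k)]; ak <- a[seq_len(k)]
    n_t <- sum(ak == 1); n_c <- sum(ak == 0)
    if (n_t == 0 || n_c == 0) return(0)
    # treated responders minus control responders rescaled to the treated count
    sum(yk[ak == 1]) - sum(yk[ak == 0]) * n_t / n_c
  })
  frac   <- c(0, cuts / n)
  gain   <- c(0, gain) / sum(a == 1)            # incremental gain per treated subject
  random <- frac * gain[length(gain)]           # random-targeting baseline
  # Qini coefficient: trapezoidal area between the curve and the baseline
  d    <- gain - random
  qini <- sum(diff(frac) * (head(d, -1) + tail(d, -1)) / 2)
  list(frac = frac, gain = gain, random = random, qini = qini)
}

# Made-up example: the treatment only works when x > 0.5
set.seed(7)
n   <- 4000
x   <- runif(n)
a   <- rbinom(n, size = 1, prob = 0.5)
tau <- 0.3 * (x > 0.5)
y   <- rbinom(n, size = 1, prob = 0.2 + a * tau)

res <- qini_curve(score = tau, y = y, a = a)
res$qini
```

A score that ranks subjects well by their true uplift should give a curve above the random baseline and hence a positive coefficient, while a score unrelated to the uplift should give a value near zero.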
Contributions
We described the relevance of PTL models to insurance marketing, and illustrated two applications to optimize client retention and cross-selling using experimental data from a large Canadian insurer (Ch. 8).
We presented a novel approach to measuring price-elasticity and economic price optimization in non-life insurance based on PTL modeling principles in the context of observational data (Ch. 9).
Selecting the optimal personalized treatment in insurance also requires consideration of the expected losses under treatment alternatives. We described an unprecedented application of gradient boosting models to estimate loss cost in non-life insurance, with key advantages over the conventional generalized linear model approach (Ch. 10).
We implemented most of the statistical methods and algorithms described in this thesis in a package named uplift (Guelman, 2014), which is now freely available from the CRAN (Comprehensive R Archive Network) repository under the R statistical computing environment (Ch. 11).
PTL with Experimental Data
An Application to Insurance Cross-Sell Optimization
Cross-sell rates by group
                             Treatment   Control
Purchased home policy = N       30,184     3,322
Purchased home policy = Y          789        75
Cross-sell rate                  2.55%     2.21%

The average treatment effect (ATE) is 0.34% (2.55% − 2.21%), which is NOT statistically significant (P value = 0.23).
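The figures in the table can be reproduced with a few lines of base R. The slide does not say which test produced the P value; the sketch below assumes a two-sample test of equal proportions without continuity correction, which gives a value that rounds to the reported 0.23.

```r
# Counts taken from the cross-sell table
purchased <- c(treatment = 789, control = 75)
n_group   <- c(treatment = 30184 + 789, control = 3322 + 75)

rates <- purchased / n_group
ate   <- rates[["treatment"]] - rates[["control"]]

# Two-sample test of equal proportions (no continuity correction: an assumption)
test <- prop.test(x = purchased, n = n_group, correct = FALSE)

round(100 * rates, 2)      # cross-sell rates in %: 2.55 and 2.21
round(100 * ate, 2)        # ATE in percentage points: 0.34
round(test$p.value, 2)     # two-sided p-value: 0.23
```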
Cross-sell rate by PTL model decile
[Figure: density of the estimated PTE, distinguishing profitable from not profitable targets.]
[Figure: cross-sell rate (%) by decile (1 to 10) for the treatment and control groups.]
Prototype causal conditional inference tree
[Figure: tree with splits on prodCnt ≤ 1, adjBranch, age ≤ 45, xdate ≤ 2 and age ≤ 42; node-level PTE estimates range from -1.6% to 2.52%.]
The tree-based procedure identifies a subgroup of clients with significant positive impact from the cross-sell activity (PTE = 2.52%).
Causal Conditional Inference Tree - Pseudocode
Algorithm 1 Causal conditional inference tree

for each terminal node do
    Test the global null hypothesis H0 of no interaction effect between the treatment A and any of the p predictors at a level of significance α based on a permutation test (Strasser and Weber, 1999)
    if the null hypothesis H0 cannot be rejected then
        Stop
    else
        Select the j∗th predictor X_j∗ with the strongest interaction effect (i.e., the one with the smallest adjusted P value)
        Choose a partition Ω∗ of the covariate X_j∗ in two disjoint sets M ⊂ X_j∗ and X_j∗ \ M based on the G²(Ω) split criterion
    end if
end for

\[
G^2(\Omega) =
\frac{(L-4)\,\Bigl[\overbrace{\bigl(Y_{n_L}(1) - Y_{n_L}(0)\bigr)}^{\text{Left Node}}
      - \overbrace{\bigl(Y_{n_R}(1) - Y_{n_R}(0)\bigr)}^{\text{Right Node}}\Bigr]^2}
     {\sigma^2\,\bigl[1/L_{n_L}(1) + 1/L_{n_L}(0) + 1/L_{n_R}(1) + 1/L_{n_R}(0)\bigr]}
\]
Numerical simulations – Performance comparison
“Strong” treatment heterogeneity effects
Simulation scenarios are obtained by varying: (i) the strength of the main effects relative to the treatment heterogeneity effects, (ii) the correlation among the covariates, (iii) the magnitude of the random noise.
[Figure: Spearman's rank correlation coefficient by method, one panel per scenario (Scenarios 1 to 4). Methods compared: ccif, upliftRF, l2svm, dsm, dsm-RF, mom, mom-RF, mcm, mcm-RF, cknn, int, int-RF.]
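The plots on this slide report Spearman's rank correlation coefficient for each method. A minimal base R sketch of that kind of score is shown below, assuming it is computed between the estimated and the true PTE of the simulated subjects; the slide itself does not state the two quantities being correlated, and the values used here are made up.

```r
set.seed(3)
n <- 500
tau_true <- rnorm(n, mean = 0.02, sd = 0.05)   # made-up true PTE values
tau_hat  <- tau_true + rnorm(n, sd = 0.05)     # made-up noisy estimates

# Spearman's rank correlation between estimated and true PTE
rho <- cor(tau_hat, tau_true, method = "spearman")
rho
```

A rank-based measure rewards a method for ordering subjects correctly by treatment effect, which is what matters when selecting whom to treat, even if the estimated magnitudes are off.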
Numerical simulations – Performance comparison
“Weak” treatment heterogeneity effects
[Figure: Spearman's rank correlation coefficient by method, one panel per scenario (Scenarios 5 to 8). Methods compared: ccif, upliftRF, l2svm, dsm, dsm-RF, mom, mom-RF, mcm, mcm-RF, cknn, int, int-RF.]
PTL with Observational Data
An Application to Auto Insurance Economic Price Optimization
Background
Objective: Determine the policyholder-level premium (playing the role of the treatment) that maximizes the expected profitability of an existing insurance portfolio, subject to a fixed overall retention rate.
Requires estimating the expected client retention outcome under alternative insurance rates (price elasticity), and loss cost.
Clients were historically exposed to rating actions based on a non-random assignment mechanism, which requires designing an observational study.
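Maximize an expected profit function
\[
\max_{Z_{\ell a}\;\forall \ell\,\forall a}\;
\sum_{\forall \ell}\sum_{\forall a}
Z_{\ell a}\,\bigl[P_\ell\,(1 + RC_a)\,(1 - LR_{\ell a})\,(1 - \hat{r}_{\ell a})\bigr]
\]
subject to a retention constraint
\[
\sum_{\forall a} Z_{\ell a} = 1 \;\;\forall \ell, \qquad
Z_{\ell a} \in \{0, 1\}, \qquad
\sum_{\forall \ell}\sum_{a} Z_{\ell a}\,\hat{r}_{\ell a} / L \le \alpha.
\]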
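To illustrate the structure of this optimization problem, the sketch below solves a toy instance by brute-force enumeration in base R. It reads P_ℓ as the current premium, RC_a as the rate change of action a, LR_ℓa as the loss ratio and r̂_ℓa as the predicted lapse probability; these readings, and every number in the example, are assumptions for illustration. Enumeration is only feasible for tiny portfolios, and a real portfolio would need an integer programming solver.

```r
# Toy portfolio: 3 policyholders, 3 candidate rate changes (all values made up)
P     <- c(1000, 1200, 800)                   # current premium P_l
RC    <- c(-0.05, 0, 0.05)                    # rate change RC_a
LR    <- matrix(c(0.70, 0.66, 0.63,
                  0.60, 0.57, 0.54,
                  0.80, 0.76, 0.72), nrow = 3, byrow = TRUE)   # loss ratio LR_la
r_hat <- matrix(c(0.03, 0.06, 0.12,
                  0.02, 0.04, 0.07,
                  0.05, 0.10, 0.20), nrow = 3, byrow = TRUE)   # lapse prob. r_la
alpha <- 0.075                                # maximum average lapse rate

# Expected profit of offering rate change a to policyholder l
profit <- (P %o% (1 + RC)) * (1 - LR) * (1 - r_hat)

# Enumerate every assignment with exactly one rate change per policyholder
grid <- expand.grid(a1 = 1:3, a2 = 1:3, a3 = 1:3)
idx  <- function(row) cbind(1:3, as.integer(row))   # (l, a) index pairs
grid$profit <- apply(grid[, 1:3], 1, function(row) sum(profit[idx(row)]))
grid$lapse  <- apply(grid[, 1:3], 1, function(row) mean(r_hat[idx(row)]))

# Keep assignments that satisfy the retention constraint, then pick the best
feasible <- grid[grid$lapse <= alpha, ]
best     <- feasible[which.max(feasible$profit), ]
best
```

Repeating the search over a range of alpha values traces out the trade-off between expected profit and retention rate.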
[Figure: efficient frontier of expected profit (%) against retention rate (1 − α), with the current state and points A, B and C marked.]
uplift Package Highlights
First R package implementing PTL models
Exploratory Data Analysis (EDA) tools customized for PTL models
    Check balance of covariates (checkBalance)
    Univariate uplift analysis (explore)
    Preliminary variable screening (niv)
Estimating personalized treatment effects
    Causal conditional inference forests (ccif)
    Uplift random forests (upliftRF)
    Modified covariate method (tian_transf)
    Modified outcome method (rvtu)
    Uplift k-nearest neighbor (upliftKNN)
Performance assessment for PTL models
    Uplift by decile (performance)
    Qini curve and Qini-coefficient (qini)
Other functionality
    Profiling PTL models (modelProfile)
    Monte-Carlo simulations (sim_pte)
Fitting a CCIF using uplift
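ccif implements recursive partitioning in a causal conditional inference framework.

library(uplift)  # provides ccif()

# mydata: data frame with response Y, treatment indicator A and predictors X1-X3
fit <- ccif(formula = Y ~ trt(A) + X1 + X2 + X3,
            data = mydata,
            ntree = 1000,
            split_method = "Int",
            distribution = approximate(B = 999),  # Monte Carlo permutation distribution (coin)
            pvalue = 0.05,
            verbose = TRUE)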
Table: Some ccif options

ccif argument       Description
mtry                Number of variables to be tested in each node
ntree               Number of trees in the forest
split_method        Split criteria: "KL", "ED", "Int" or "L1"
interaction.depth   The maximum depth of variable interactions
pvalue              Maximum acceptable p-value required to make a split
bonferroni          Apply Bonferroni adjustment to pvalue
minsplit            Minimum number of obs. for a split to be attempted
...                 Additional args. passed to coin::independence_test.
Manuscripts Linked to Thesis
[1] Guelman, L. (2014). uplift: Uplift modeling. R package version 0.3.5.
[2] Guelman, L. and Guillen, M. (2014). A causal inference approach to measure price elasticity in automobile insurance. Expert Systems with Applications, 41(2):387–396.
[3] Guelman, L., Guillen, M. and Perez-Marín, A. M. (2014). A survey of personalized treatment models for pricing strategies in insurance. Insurance: Mathematics and Economics, 58:68–76.
[4] Guelman, L., Guillen, M. and Perez-Marín, A. M. (2014). Uplift random forests. Cybernetics & Systems. Accepted.
[5] Guelman, L., Guillen, M. and Perez-Marín, A. M. (2014). A decision support framework to implement optimal personalized marketing interventions. Decision Support Systems, 72:24–32.
[6] Guelman, L. (2012). Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, 39(3):3659–3667.
[7] Guelman, L., Guillen, M. and Perez-Marín, A. M. (2012). Random forests for uplift modeling: An insurance customer retention case. In Engemann, K. J., Lafuente, A. M. G., and Merigo, J. M., editors, Modeling and Simulation in Engineering, Economics and Management, pages 123–133. Springer Berlin Heidelberg, New York, NY.
Talks Linked to Thesis
[1] Guillen, M. and Guelman, L. (2014). “New trends in predictive modelling - the uplift models success story”. R in Insurance Conference, London, UK (July 14, 2014).
[2] Guelman, L. and Guillen, M. (2014). “Actionable predictive learning for insurance profit maximization”. Casualty Actuarial Society, Ratemaking and Product Management Seminar, Washington, D.C., USA (April 1, 2014).
[3] Guelman, L. (2013). “An introduction to causal learning with applications to price elasticity modeling in Casualty insurance”. University of Barcelona, UB Economics Seminars, Barcelona, Spain (November 28, 2013).
[4] Guelman, L., Guillen, M. and Perez-Marín, A.M. (2013). “Evaluating customer loyalty with advanced uplift models”. APRIA, Annual Conference, New York, USA (July 28-31, 2013).
[5] Guillen, M., Guelman, L. and Perez-Marín, A.M. (2013). “Customer retention and price elasticity. Are motor insurance policies homogeneous with respect to loyalty?”. 2013 Astin colloquium, The Hague, Netherlands (May 21-24, 2013).
[6] Guelman, L. and Lee, S. (2013). “Balancing robust statistics - gradient boosting”. Casualty Actuarial Society, Ratemaking and Product Management Seminar, Los Angeles, USA (March 12-13, 2013).
[7] Guelman, L., Guillen, M. and Perez-Marín, A.M. (2012). “Random forests for uplift modeling: an insurance customer retention case”. Association of Modeling and Simulation in Enterprise (AMSE) - International Conference on Modeling and Simulation, New York, USA (May 30-June 1, 2012). – Outstanding Scholarly Research Contribution Award, AMSE.
[8] Guelman, L. and Lee, S. (2012). “Balancing robust statistics and data mining in ratemaking: gradient boosting modeling”. Casualty Actuarial Society, Ratemaking and Product Management Seminar, Philadelphia, USA (March 20-21, 2012).
Limitations and Future Work
1 Extensions to multi-category and continuous treatment settings.
2 Extensions to continuous uncensored and survival responses.
3 Extensions to dynamic treatment regimes: treatment type may be repeatedly adjusted according to an ongoing individual response.
4 Absolute vs. relative treatment effects – in some settings, defining the treatment effect in terms of the ratio of the expected responses under alternative treatment conditions, instead of the difference between the expected responses may be more appropriate.
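The distinction in the last item can be written out as follows. The labels "abs" and "rel" are introduced here only for illustration and are not notation from the thesis.

```latex
\[
\tau_{\text{abs}}(x) = E[Y \mid X = x, A = 1] - E[Y \mid X = x, A = 0],
\qquad
\tau_{\text{rel}}(x) = \frac{E[Y \mid X = x, A = 1]}{E[Y \mid X = x, A = 0]}.
\]
```

For example, response rates of 2% versus 1% and of 51% versus 50% both give an absolute effect of 0.01, but relative effects of 2 and 1.02 respectively, so the two definitions can rank subjects very differently.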