
Representation Learning for Treatment Effect Estimation from Observational Data

Liuyi Yao¹, Sheng Li², Yaliang Li³, Mengdi Huai¹, Jing Gao¹, Aidong Zhang¹

¹ University at Buffalo, ² University of Georgia, ³ Tencent Medical AI Lab

Problem Setting

➢ Estimating ITE can help decision making.

➢ Why from the observational data?
• Randomized Controlled Trials (RCTs): need to control the treatment assignment.
• Observational data: natural observations.

Challenges

➢ Missing counterfactuals: $\mathrm{ITE}_i = Y_{t=1}(x_i) - Y_{t=0}(x_i)$
• One outcome is always missing in the data; the missing one is the counterfactual.

➢ Selection bias
• Distributions of control & treated groups: $p(X_C) \neq p(X_T)$
• Counterfactual inference is more difficult.
• Provided data: $(x_C, y_{t=0})$, $(x_T, y_{t=1})$
• From $(x_T, y_{t=1})$, infer $y_{t=1} \mid x_C$
• From $(x_C, y_{t=0})$, infer $y_{t=0} \mid x_T$
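The two challenges can be seen on a small simulated example. This is a minimal sketch on synthetic data: the outcome functions, the treatment-assignment rule, and all names (`simulate_unit`, `naive_ate`, and so on) are assumptions made for illustration and are not taken from the poster.

```python
import random

random.seed(0)

def simulate_unit():
    """Draw one synthetic unit: covariate, both potential outcomes, treatment."""
    x = random.random()                  # single covariate in [0, 1]
    y0 = 1.0 + x                         # control outcome Y_{t=0}(x)
    y1 = y0 + 2.0 * x                    # treated outcome Y_{t=1}(x); true ITE is 2x
    t = 1 if random.random() < x else 0  # selection bias: larger x is treated more often
    return x, y0, y1, t

units = [simulate_unit() for _ in range(20000)]

# Observational data keep only the factual outcome of each unit;
# the other potential outcome (the counterfactual) is never recorded.
observed = [(x, t, y1 if t == 1 else y0) for x, y0, y1, t in units]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Only computable here because the simulation knows both outcomes.
true_ate = mean(y1 - y0 for _, y0, y1, _ in units)

# Naive estimate: difference of observed group means.
naive_ate = (mean(y for _, t, y in observed if t == 1)
             - mean(y for _, t, y in observed if t == 0))

# The covariate distributions of the two groups differ.
mean_x_treated = mean(x for x, t, _ in observed if t == 1)
mean_x_control = mean(x for x, t, _ in observed if t == 0)

print("true ATE:", round(true_ate, 3))
print("naive ATE:", round(naive_ate, 3))
print("mean x (treated, control):", round(mean_x_treated, 3), round(mean_x_control, 3))
```

In this setup the treated group has systematically larger covariate values than the control group, so the naive difference of group means overestimates the true average effect.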

Our Framework

➢ The latent space Z should encode:
• The balanced distributions;
• The similarity information in X. Existing works ignore this.

➢ SITE takes input units in a mini-batch fashion, which improves efficiency.

Triplet Pair Selection

➢ $s$: propensity score (the probability that a unit is in the treated group). It can reflect the relative location of units in the original space.
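The poster text does not spell out how units are picked from a mini-batch, so the following is only a sketch under an assumed rule: take the treated and control units whose propensity scores are closest to 0.5, then, for each group, the unit farthest from the other group's middle unit together with its nearest same-group neighbour. The function name `select_triplet_pairs` and the key names are hypothetical.

```python
def select_triplet_pairs(scores, treatments):
    """Pick six unit indices from a mini-batch using propensity scores.

    Assumed rule (not confirmed by the poster):
      i_hat, j_hat: treated / control units closest to score 0.5
      k_hat: control unit farthest from i_hat; l_hat: its nearest control
      m_hat: treated unit farthest from j_hat; n_hat: its nearest treated
    """
    treated = [idx for idx, t in enumerate(treatments) if t == 1]
    control = [idx for idx, t in enumerate(treatments) if t == 0]
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("need at least two treated and two control units")

    def gap(a, b):
        return abs(scores[a] - scores[b])

    i_hat = min(treated, key=lambda i: abs(scores[i] - 0.5))
    j_hat = min(control, key=lambda j: abs(scores[j] - 0.5))
    k_hat = max(control, key=lambda k: gap(k, i_hat))
    l_hat = min((l for l in control if l != k_hat), key=lambda l: gap(l, k_hat))
    m_hat = max(treated, key=lambda m: gap(m, j_hat))
    n_hat = min((n for n in treated if n != m_hat), key=lambda n: gap(n, m_hat))
    return {"i": i_hat, "j": j_hat, "k": k_hat, "l": l_hat, "m": m_hat, "n": n_hat}


# Toy mini-batch: three treated units followed by three control units.
batch_scores = [0.9, 0.55, 0.8, 0.45, 0.1, 0.2]
batch_treatments = [1, 1, 1, 0, 0, 0]
selection = select_triplet_pairs(batch_scores, batch_treatments)
print(selection)
```

Because the selection depends only on scalar scores, it needs no pairwise distances in the original covariate space, which fits the poster's point that the propensity score reflects the relative location of units.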

PDDM & MPDM

➢ Position-Dependent Deep Metric (PDDM):
• Preserves the similarity.

➢ Middle Point Distance Minimization (MPDM):
• MPDM makes the middle point of $(z_{\hat{i}}, z_{\hat{m}})$ close to the middle point of $(z_{\hat{j}}, z_{\hat{k}})$.
• Balances the distribution in the latent space.
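The middle-point idea can be sketched in a few lines. This assumes the penalty is the squared Euclidean distance between the two middle points, which the poster does not state explicitly; `midpoint` and `mpdm_loss` are hypothetical names, and latent vectors are plain Python lists.

```python
def midpoint(a, b):
    """Coordinate-wise middle point of two latent vectors."""
    return [(u + v) / 2.0 for u, v in zip(a, b)]

def mpdm_loss(first_pair, second_pair):
    """Squared distance between the middle points of two pairs of latent vectors.

    Assumption: a squared Euclidean penalty; the poster only says the
    two middle points should be close.
    """
    mid_first = midpoint(*first_pair)
    mid_second = midpoint(*second_pair)
    return sum((u - v) ** 2 for u, v in zip(mid_first, mid_second))


# Both pairs share the middle point [1, 1], so the penalty vanishes.
balanced = mpdm_loss(([0.0, 0.0], [2.0, 2.0]), ([0.0, 2.0], [2.0, 0.0]))

# Moving one point shifts the second middle point to [2, 1].
shifted = mpdm_loss(([0.0, 0.0], [2.0, 2.0]), ([0.0, 2.0], [4.0, 0.0]))

print(balanced, shifted)
```

Minimizing such a term pulls the two middle points together, which is one way to read the poster's claim that MPDM balances the distributions in the latent space.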

Experiment Setting

IHDP Dataset:
• Treatment: specialist home visits;
• Outcome: infants' cognitive test scores;
• Pre-treatment covariates: 25 covariates measuring aspects of children and their mothers;
• Performance measurement: Precision in Estimation of Heterogeneous Effect ($\epsilon_{PEHE}$).

Jobs Dataset:
• Treatment: job training;
• Outcome: employment status after training;
• Pre-treatment covariates: 8 covariates, such as age, education, ethnicity, previous earnings;
• Performance measurement: policy risk ($R_{pol}$).

Twins Dataset:
• Treatment: being the heavier one of the twins;
• Outcome: one-year mortality;
• Pre-treatment covariates: 40 covariates measuring aspects of pregnancy;
• Performance measurement: AUC.
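For reference, the first two measurements can be sketched as follows. The definitions are the ones commonly used in the treatment-effect literature and are assumed here, since the poster only names the metrics; `sqrt_pehe`, `policy_risk`, and the `threshold` parameter are hypothetical names.

```python
import math

def sqrt_pehe(ite_true, ite_pred):
    """Root of the mean squared error between true and predicted ITEs.

    Requires true ITEs, so it only applies to (semi-)synthetic data.
    """
    squared = [(p - t) ** 2 for t, p in zip(ite_true, ite_pred)]
    return math.sqrt(sum(squared) / len(squared))

def policy_risk(ite_pred, treatments, outcomes, threshold=0.0):
    """One minus the estimated value of the policy 'treat if predicted ITE > threshold'.

    Assumes the evaluation units come from a randomized subset and that
    larger outcomes are better. Each arm's value is averaged over units
    whose actual treatment agrees with the policy's recommendation.
    """
    policy = [1 if e > threshold else 0 for e in ite_pred]
    p_treat = sum(policy) / len(policy)
    agree_treated = [y for pi, t, y in zip(policy, treatments, outcomes)
                     if pi == 1 and t == 1]
    agree_control = [y for pi, t, y in zip(policy, treatments, outcomes)
                     if pi == 0 and t == 0]
    value_treated = sum(agree_treated) / len(agree_treated) if agree_treated else 0.0
    value_control = sum(agree_control) / len(agree_control) if agree_control else 0.0
    return 1.0 - (p_treat * value_treated + (1.0 - p_treat) * value_control)


perfect = sqrt_pehe([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
off = sqrt_pehe([0.0, 0.0], [3.0, 4.0])
risk = policy_risk([1.0, 1.0, -1.0, -1.0], [1, 0, 0, 1], [1, 0, 1, 0])
print(perfect, off, risk)
```

Lower is better for both quantities; AUC for the Twins dataset is the standard area under the ROC curve and is not sketched here.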

Experimental Results

➢ Results on three datasets: [results table not recoverable from the transcript]

➢ Treated group $X_T$ (with treatment $t = 1$); control group $X_C$ (with treatment $t = 0$).

➢ Individual treatment effect (ITE) of unit $i$: $\mathrm{ITE}_i = Y_{t=1}(x_i) - Y_{t=0}(x_i)$, where $Y_{t=1}(x_i)$ is the treated outcome and $Y_{t=0}(x_i)$ is the control outcome.

• RCTs: expensive, time consuming, ethical issues…
• Observational data: easy to access, large amount of data.

Decision-making examples:
• Policy decision: Who will benefit from job training?
• Healthcare: Which medicine?

Figure labels: absolute location; relative location.

Local similarity preserved individual treatment effect estimation (SITE)

➢ Code of SITE: https://github.com/Osier-Yi/SITE
