Representation Learning for Treatment Effect Estimation from Observational Data
Liuyi Yao (1), Sheng Li (2), Yaliang Li (3), Mengdi Huai (1), Jing Gao (1), Aidong Zhang (1)
(1) University at Buffalo, (2) University of Georgia, (3) Tencent Medical AI Lab
Problem Setting | Challenges | Triplet Pair Selection | PDDM & MPDM | Experiment Setting | Experimental Results
Observational data: natural observations.
Randomized controlled trials (RCTs): need to control the treatment assignment.
➢ Estimating the individual treatment effect (ITE) can help decision making.
➢ Why estimate it from observational data?
➢ Missing counterfactuals: $\mathrm{ITE}_i = Y_{t=1}(x_i) - Y_{t=0}(x_i)$
➢ Selection bias
• The distributions of the control and treated groups differ: $p(X_C) \neq p(X_T)$.
• Counterfactual inference is more difficult.
• Provided data: $(X_T, Y_F(1))$ and $(X_C, Y_F(0))$.
• From $(X_C, Y_F(0))$, infer $Y_{CF}(0 \mid X_T)$.
• From $(X_T, Y_F(1))$, infer $Y_{CF}(1 \mid X_C)$.
One of the two outcomes is always missing in the data; the missing one is the counterfactual.
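The effect of selection bias on a naive comparison can be illustrated with a small simulation. This is only a sketch under assumed conditions: the outcome model, the assignment rule, and the names `simulate` and `naive_difference` are hypothetical and do not come from the poster.

```python
import random

def simulate(n=2000, seed=0):
    """Simulate units with one covariate, a biased treatment assignment,
    and only the factual outcome recorded (hypothetical data model)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.random()                    # covariate in [0, 1)
        y0 = x                              # control outcome (assumed model)
        y1 = x + 1.0                        # treated outcome; true ITE is 1.0
        t = 1 if rng.random() < x else 0    # larger x => more likely treated
        y_factual = y1 if t == 1 else y0    # the counterfactual is never stored
        units.append((x, t, y_factual))
    return units

def naive_difference(units):
    """Difference of mean factual outcomes between treated and control groups."""
    treated = [y for _, t, y in units if t == 1]
    control = [y for _, t, y in units if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

units = simulate()
print("naive estimate:", naive_difference(units))
```

In this toy setting the true effect is 1.0 for every unit, yet the naive difference is inflated by about 1/3 in expectation, because treated units tend to have larger covariate values than control units.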
Our Framework
➢ The latent space Z should encode:
• The balanced distributions;
• The similarity information in X. Existing works ignore this.
➢ SITE takes input units in a mini-batch fashion, which improves efficiency.
➢ Propensity score (the probability that a unit is in the treated group): can reflect the relative location of units in the original space.
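A propensity score can be estimated in many ways; the poster does not say which estimator is used. The sketch below assumes a one-dimensional covariate and a plain logistic regression fitted by gradient descent. The names `fit_propensity` and `propensity` and the synthetic data are illustrative assumptions.

```python
import math
import random

def sigmoid(a):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def fit_propensity(xs, ts, lr=0.5, epochs=300):
    """Fit s(x) = sigmoid(w * x + b) by gradient descent on the logistic loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [sigmoid(w * x + b) - t for x, t in zip(xs, ts)]
        w -= lr * sum(r * x for r, x in zip(residuals, xs)) / n
        b -= lr * sum(residuals) / n
    return w, b

# Synthetic observational data (assumed): larger x => more likely treated.
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(300)]
ts = [1 if rng.random() < sigmoid(2.0 * x) else 0 for x in xs]

w, b = fit_propensity(xs, ts)

def propensity(x):
    """Estimated probability that a unit with covariate x is treated."""
    return sigmoid(w * x + b)

print(propensity(-1.5), propensity(0.0), propensity(1.5))
```

Units whose score is near 0.5 sit where the treated and control groups overlap, while scores near 0 or 1 mark units deep inside one group, which is the sense in which the score reflects relative location.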
➢ Position-Dependent Deep Metric (PDDM):
• Preserves the similarity.
➢ Middle Point Distance Minimization (MPDM):
• Makes the middle point of $(z_{\hat{i}}, z_{\hat{m}})$ close to the middle point of $(z_{\hat{j}}, z_{\hat{k}})$.
• Balances the distributions in the latent space.
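The middle-point idea behind MPDM can be sketched as a distance between two midpoints in the latent space. The squared Euclidean distance and the argument names `z_i`, `z_m`, `z_j`, `z_k` for the two pairs are assumptions made for illustration; the poster does not give the exact loss.

```python
def midpoint(a, b):
    """Coordinate-wise middle point of two latent vectors."""
    return [(ai + bi) / 2.0 for ai, bi in zip(a, b)]

def mpdm_loss(z_i, z_m, z_j, z_k):
    """Squared Euclidean distance between the middle point of (z_i, z_m)
    and the middle point of (z_j, z_k). Assumed form, for illustration."""
    mid_a = midpoint(z_i, z_m)
    mid_b = midpoint(z_j, z_k)
    return sum((a - b) ** 2 for a, b in zip(mid_a, mid_b))

# Both pairs share the middle point [1, 1], so the loss is zero.
aligned = mpdm_loss([0.0, 0.0], [2.0, 2.0], [1.0, 0.0], [1.0, 2.0])

# The second pair's middle point is [3, 1], two units away along the first axis.
shifted = mpdm_loss([0.0, 0.0], [2.0, 2.0], [3.0, 0.0], [3.0, 2.0])
print(aligned, shifted)
```

Driving this quantity toward zero pulls the two groups' representations over each other, which is how the term encourages balance in the latent space.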
IHDP Dataset:
• Treatment: specialist home visits;
• Outcome: infants' cognitive test scores;
• Pre-treatment covariates: 25 covariates measuring aspects of children and their mothers;
• Performance measurement: Precision in Estimation of Heterogeneous Effect ($\epsilon_{PEHE}$).
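The Precision in Estimation of Heterogeneous Effect is commonly defined as the mean squared difference between true and predicted individual effects, often reported as a square root. The sketch below follows that common definition; it needs both potential outcomes for every unit, so it applies only where those are known, and the function names are illustrative.

```python
import math

def pehe(y1_true, y0_true, y1_pred, y0_pred):
    """Mean squared error between true and predicted individual treatment effects."""
    n = len(y1_true)
    total = 0.0
    for a1, a0, p1, p0 in zip(y1_true, y0_true, y1_pred, y0_pred):
        total += ((a1 - a0) - (p1 - p0)) ** 2
    return total / n

def root_pehe(y1_true, y0_true, y1_pred, y0_pred):
    """Square root of the PEHE, on the same scale as the outcome."""
    return math.sqrt(pehe(y1_true, y0_true, y1_pred, y0_pred))

# True effects are [1, 2]; predicted effects are [1, 4]; squared errors are [0, 4].
score = pehe([1.0, 3.0], [0.0, 1.0], [2.0, 5.0], [1.0, 1.0])
print(score, root_pehe([1.0, 3.0], [0.0, 1.0], [2.0, 5.0], [1.0, 1.0]))
```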
Jobs Dataset:
• Treatment: job training;
• Outcome: employment status after training;
• Pre-treatment covariates: 8 covariates, such as age, education, ethnicity, previous earnings;
• Performance measurement: policy risk ($R_{pol}$).
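The poster does not spell out how policy risk is computed. A minimal sketch, assuming a common formulation: treat a unit when its predicted effect exceeds a threshold, estimate the value of that policy on a randomized sample, and report one minus the value. The function name, the threshold, and the handling of empty groups are assumptions.

```python
def policy_risk(ite_pred, t, y, threshold=0.0):
    """Estimate 1 - (policy value) on a randomized sample with outcomes y in [0, 1].

    A unit is assigned to treatment when its predicted effect exceeds
    `threshold`. Only units whose actual treatment matches the policy
    contribute to the outcome averages.
    """
    n = len(y)
    policy = [1 if e > threshold else 0 for e in ite_pred]
    treated_match = [yi for p, ti, yi in zip(policy, t, y) if p == 1 and ti == 1]
    control_match = [yi for p, ti, yi in zip(policy, t, y) if p == 0 and ti == 0]
    p_treat = sum(policy) / n
    # Assumption: an empty matching group contributes an average of 0.0.
    mean_treated = sum(treated_match) / len(treated_match) if treated_match else 0.0
    mean_control = sum(control_match) / len(control_match) if control_match else 0.0
    value = p_treat * mean_treated + (1.0 - p_treat) * mean_control
    return 1.0 - value

# The policy treats the first two units and leaves the last two untreated.
good = policy_risk([1.0, 1.0, -1.0, -1.0], [1, 0, 1, 0], [1, 0, 0, 1])
bad = policy_risk([1.0, 1.0, -1.0, -1.0], [1, 0, 1, 0], [0, 0, 0, 0])
print(good, bad)
```

Lower values are better: a policy whose matched units all have a positive outcome reaches a risk of zero in this sketch.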
Twins Dataset:
• Treatment: being the heavier twin;
• Outcome: one-year mortality;
• Pre-treatment covariates: 40 covariates measuring aspects of pregnancy;
• Performance measurement: AUC.
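AUC for a binary outcome such as mortality can be computed directly from its ranking interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that pairwise definition with ties counted as one half; the function name and the example scores are illustrative.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ranked correctly.
example = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example)
```

The pairwise loop is quadratic in the number of units, which is fine for a sketch; a sort-based computation is the usual choice for large datasets.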
➢ Results on three datasets:
Treated group $X_T$ (with treatment $t = 1$)
Control group $X_C$ (with treatment $t = 0$)
➢ Individual treatment effect (ITE) of unit $i$: $\mathrm{ITE}_i = Y_{t=1}(x_i) - Y_{t=0}(x_i)$ (treated outcome minus control outcome)
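As a worked illustration of the definition, the sketch below computes the ITE for two made-up units whose treated and control outcomes are both known, and then builds the record an observational dataset would actually hold. All values and names are hypothetical.

```python
def individual_treatment_effect(y_treated, y_control):
    """ITE of one unit: treated outcome minus control outcome."""
    return y_treated - y_control

def observed_record(unit):
    """Keep only what observational data contains: covariate, treatment,
    and the factual outcome for the treatment actually received."""
    y_factual = unit["y1"] if unit["t"] == 1 else unit["y0"]
    return {"x": unit["x"], "t": unit["t"], "y_factual": y_factual}

# Hypothetical units with both potential outcomes known (never true in practice).
units = [
    {"x": 0.2, "t": 1, "y1": 5.0, "y0": 3.0},
    {"x": 0.7, "t": 0, "y1": 4.0, "y0": 4.5},
]

ites = [individual_treatment_effect(u["y1"], u["y0"]) for u in units]
records = [observed_record(u) for u in units]
print(ites)
print(records)
```

The first unit benefits from treatment and the second does not, but neither ITE can be read off the observed records, since each record holds only one of the two outcomes.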
Drawbacks of RCTs: expensive, time-consuming, ethical issues…
Advantages of observational data: easy to access, large amount of data.
Policy decision: Who will benefit from job training?
Healthcare: Which medicine?
Figure labels: absolute location; relative location
Local similarity preserved individual treatment effect estimation (SITE):
➢ Code of SITE: https://github.com/Osier-Yi/SITE