Reasoning about Uncertainty in High-dimensional Data Analysis
Adel Javanmard
Stanford University
What is high-dimensional data?
• Modern data sets are both massive and fine-grained.
• # Features (variables) > # Observations (Samples): a trend in modern data analysis.
High-Dimensional Data: an example
• Allergies: type, reaction, severity, start year, stop year, …
• Diagnosis info: ICD9 codes, description, start year, stop year, …
• Medications: name, strength, schedule, …
• Transcript records: age, gender, BMI, heart rate, billing information, …
• Lab results: HL7 text, value, abnormality, observation year, …
• Medical images
What can we do with such data?
• Extract useful, actionable information.
• Predictive models for: clinical outcomes, patient evolution, readmission rates, …
• Design (or advise on) treatments, clinical interventions, and trials.
Health Care Reform
HITECH
Heritage Health Prize
• More than 71 million persons are admitted to hospitals each year.
• Over $30 billion was spent on unnecessary hospital readmissions (2006).
Diabetes Example
• n = 500 (patients), p = 805 (variables): medical information such as medications, lab results, diagnoses, …
[Data from Practice Fusion, posted on Kaggle]
• Goal: find significant variables in predicting type-2 diabetes.
• "People with higher bilirubin are more susceptible to diabetes."
How certain are we about this claim?
Problem of Uncertainty Assessment

[Plot: estimated parameters vs. variable indices.]

How stable are these estimates? What can we say about the true parameters?

Confidence intervals

[Plot: confidence intervals for the parameters vs. variable index; the "Blood pressure" coefficient is highlighted.]
Why is it hard?
• Low-dimensional regime (p fixed, n → ∞): Large Sample Theory applies.
• The situation in the high-dimensional regime is very different!
• Much progress has been achieved for:
  high-dimensional parameter estimation
  high-dimensional variable/feature selection (recovering Supp(θ₀))
  high-dimensional prediction

[Tibshirani, Donoho, Cai, Zhou, Candès, Tao, Bickel, van de Geer, Ritov, Bühlmann, Meinshausen, Zhao, Yu, Wainwright, …]
How to assign measures of uncertainty to each single parameter?
Other examples
Targeted online advertising Personalized medicine
Social Networks Collaborative filtering
Genomics
Overview of Regularized Estimators
Regularized Estimators

• Investigate low-dimensional structures in data:

  minimize  Loss + λ · Model Complexity

• Mitigates: spurious correlations, noise accumulation, instability (to noise and sampling).
• This comes at a price: the estimates are biased (towards small complexity), nonlinear, and non-explicit.
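To make the "Loss + λ · Model Complexity" template concrete, here is a small sketch (my own illustration, not from the talk) using ridge regression, where the complexity term is ‖θ‖₂²; the shrinkage bias shows up as a smaller coefficient norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
theta0 = np.array([2.0, -1.0, 0.0, 0.0, 3.0])
y = X @ theta0 + 0.5 * rng.standard_normal(n)

def ridge(X, y, lam):
    # Regularized estimator: minimize ||y - X theta||^2 + lam * ||theta||^2
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

theta_ls = ridge(X, y, 0.0)    # lam = 0: ordinary least squares
theta_r = ridge(X, y, 10.0)    # lam > 0: shrunk towards zero, i.e. biased
print(np.linalg.norm(theta_r), "<", np.linalg.norm(theta_ls))
```

Setting λ = 0 recovers least squares; any λ > 0 shrinks the estimate towards zero, which is exactly the bias the slide warns about.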
Diabetes Example
• y_i ∈ {+1, −1}: whether patient i gets type-2 diabetes.
• x_i ∈ ℝ^p: variables of patient i; θ_j is the contribution of feature j.

θ̂ = argmin_θ { Σ_i log(1 + exp(−y_i ⟨x_i, θ⟩)) + λ ‖θ‖₁ }   (logistic loss + ℓ₁ penalty)

• Convex optimization.
• Variable selection (some of the θ̂_j are exactly zero).

Selects 62 interesting features.
We want to construct confidence intervals for each parameter.
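A minimal sketch of the ℓ₁-penalized logistic regression above, solved by proximal gradient descent (the data, λ, and step size are illustrative assumptions, not the talk's actual setup):

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_logistic(X, y, lam, step=0.1, iters=2000):
    """l1-penalized logistic regression via proximal gradient (ISTA)."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(iters):
        z = y * (X @ theta)
        # Gradient of the mean logistic loss (1/n) sum log(1 + exp(-y <x, theta>))
        grad = -(X.T @ (y / (1.0 + np.exp(z)))) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:3] = [3.0, -3.0, 3.0]          # only 3 truly significant features
y = np.where(rng.random(n) < 1.0 / (1.0 + np.exp(-X @ theta0)), 1.0, -1.0)

theta_hat = sparse_logistic(X, y, lam=0.1)
print("selected features:", np.flatnonzero(np.abs(theta_hat) > 1e-8))
```

The ℓ₁ penalty makes many coordinates of θ̂ exactly zero, which is the variable-selection behavior the slide describes.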
What is a confidence interval?
• We observe data y generated by a model with true parameter vector θ₀.
• Confidence intervals: for each parameter θ₀,i, a random interval J_i such that P(θ₀,i ∈ J_i) ≥ 1 − α.
• Confidence intervals are random objects.
• 1 − α is the confidence level.
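The "random object" point can be checked by simulation. A sketch for the simplest case, a Gaussian mean with known variance (my own example, not the talk's): the interval changes with every sample, but it covers the fixed true parameter about 95% of the time.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 2.0, 50, 2000
z = 1.96                                   # N(0,1) quantile for alpha = 0.05
covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    half = z * sigma / np.sqrt(n)          # known-variance interval half-width
    lo, hi = x.mean() - half, x.mean() + half
    covered += (lo <= mu <= hi)            # the interval is random, mu is fixed
coverage = covered / reps
print(coverage)                            # close to the nominal 0.95
```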
Why uncertainty assessment?
Scientific discoveries
99% confidence
70% confidence
• Curry increases the cognitive capacity of the brain. [Tze-Pin Ng, 2006]
• Beautiful parents have more daughters than ugly parents. [Kanazawa 2006]
• Left-handedness in males has a significant effect on wage level. [Ruebeck 2006]

"Why Most Published Research Findings Are False" [John P. A. Ioannidis]
Why uncertainty assessment?
Decision making
We take measurements of the system state.

[Diagram: state space partitioned into a normal zone and an abnormal zone; deciding which zone the state lies in must account for measurement uncertainty.]
Why uncertainty assessment?
Optimization/ Stopping rules
First-order methods for large-scale data (coordinate descent, mirror descent, Nesterov's method, …)

[Plot: error vs. iteration, with a stopping point marked.]

Optimization is a tool, not the goal: we can stop iterating once the optimization error falls below the statistical uncertainty.
Reasoning about Uncertainty
Setup

• Linear model: y = X θ₀ + w, with y ∈ ℝⁿ, X ∈ ℝ^{n×p}, θ₀ ∈ ℝ^p.
• w: Gaussian noise with mean zero and covariance σ² I.
• X is deterministic; w is random.

Lasso
[Tibshirani 1996; Chen, Donoho 1996]

θ̂(λ) = argmin_θ { (1/2n) ‖y − X θ‖₂² + λ ‖θ‖₁ }

What is the distribution of θ̂?
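The Lasso is nonlinear and has no explicit form in general, but under an orthogonal design (XᵀX = n I, an illustrative assumption) it reduces to coordinate-wise soft-thresholding, which makes the sparsity and the shrinkage bias easy to see:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = np.sqrt(n) * Q[:, :p]                  # orthogonal design: X^T X = n * I_p
theta0 = np.array([1.5, -0.3, 0.0, 2.0])
y = X @ theta0 + 0.1 * rng.standard_normal(n)

# Closed form under this design: soft-threshold each coordinate of X^T y / n.
lam = 0.5
v = X.T @ y / n
theta_lasso = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
print(theta_lasso)                         # small coefficients are exactly zero
```

Coordinates below the threshold λ are set exactly to zero (selection), and surviving coordinates are shrunk by λ (bias).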
Approach 1: Sample splitting
[Wasserman, Roeder 2009; Bühlmann, Meier, Meinshausen 2009]

• Split the data into two halves.
• Run the Lasso on the first half to select a subset S of variables.
• Run least squares on the second half, restricted to S; the distribution of the resulting estimate is known explicitly.

Problems with sample splitting:

• Have to cut half of the data.
• Assumes the Lasso on the first half selects all relevant features (plus possibly some irrelevant ones).
• The result depends on the splitting.
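A sketch of the two-stage procedure (my own toy implementation; the Lasso solver, data, and λ are illustrative assumptions):

```python
import numpy as np

def lasso_ista(X, y, lam, iters=3000):
    """Lasso via proximal gradient: minimize (1/2n)||y - X theta||^2 + lam*||theta||_1."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)   # 1/L for the smooth part
    theta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / n
        v = theta - step * grad
        theta = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return theta

rng = np.random.default_rng(2)
n, p, s = 200, 50, 3
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:s] = [4.0, -4.0, 4.0]
y = X @ theta0 + rng.standard_normal(n)

# Stage 1: select variables with the Lasso on the first half.
X1, y1, X2, y2 = X[:100], y[:100], X[100:], y[100:]
S = np.flatnonzero(np.abs(lasso_ista(X1, y1, lam=0.3)) > 1e-8)

# Stage 2: ordinary least squares on the second half, restricted to S.
theta_S, *_ = np.linalg.lstsq(X2[:, S], y2, rcond=None)
print("selected:", S)
print("refitted estimates:", np.round(theta_S, 2))
```

Note the first weakness from the slide: only half of the samples reach each stage, and a different split can produce a different S.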
Approach 2: Bootstrap
[Diagram: resample the data with replacement and re-fit the estimator on each bootstrapped data set.]

The bootstrap fails here because of the bias!
Our approach: de-biasing the Lasso

• Classical setting (n > p): least squares θ̂^LS = (XᵀX)⁻¹ Xᵀ y is an unbiased estimator with a precise distributional characterization (Gaussian error).

• Problem in high dimension (n < p): XᵀX is not invertible!

• Idea: use your favorite matrix M and (try to) subtract the bias:

  θ̂^d = θ̂ + (1/n) M Xᵀ (y − X θ̂)

  which decomposes as θ̂^d − θ₀ = (1/n) M Xᵀ w + (M Σ̂ − I)(θ₀ − θ̂), with Σ̂ = XᵀX/n: a Gaussian error term plus a bias term.
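A quick sanity check on the de-biasing formula (my own sketch, not code from the talk): in the classical n > p regime, choosing M = Σ̂⁻¹ makes the bias term vanish and recovers least squares exactly, whatever (biased) initial estimate we start from.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 5                               # classical regime for the sanity check
X = rng.standard_normal((n, p))
theta0 = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ theta0 + rng.standard_normal(n)

Sigma_hat = X.T @ X / n
theta_biased = np.zeros(p)                 # any (biased) pilot estimate works here
M = np.linalg.inv(Sigma_hat)

# De-biasing step: theta_d = theta + (1/n) M X^T (y - X theta)
theta_d = theta_biased + M @ X.T @ (y - X @ theta_biased) / n

theta_ls = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta_d, theta_ls))      # True: the correction cancels the
                                           # pilot estimate and leaves least squares
```

In high dimension Σ̂ is singular, so M can only approximate its inverse; that is what the next slides optimize.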
Geometric interpretation

[Figure: a ball constraint and the subgradient of the ℓ₁ norm at the Lasso solution.]
How should we choose M?

θ̂^d − θ₀ = Bias + Error. We want small bias and small error.

For each row m_i of M (with e_i = (0, …, 0, 1, 0, …, 0)):

  minimize  Var(Error_i) = m_iᵀ Σ̂ m_i
  subject to  |Bias_i| ≤ ξ, i.e. ‖Σ̂ m_i − e_i‖_∞ ≤ ξ

[Plot: bias vs. variance trade-off; the feasible set shrinks as ξ decreases and becomes empty below a critical value ξ = ξ*.]
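The row-wise program can be handed to any QP solver. A sketch using SciPy's SLSQP (the solver choice, data, and ξ are my assumptions; the talk does not specify an implementation):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, p = 30, 10
X = rng.standard_normal((n, p))
Sigma = X.T @ X / n

def row_of_M(i, xi):
    """Row m_i: minimize m^T Sigma m  s.t.  ||Sigma m - e_i||_inf <= xi."""
    e = np.zeros(p)
    e[i] = 1.0
    cons = [
        {"type": "ineq", "fun": lambda m: xi - (Sigma @ m - e)},  # each component <= xi
        {"type": "ineq", "fun": lambda m: xi + (Sigma @ m - e)},  # each component >= -xi
    ]
    res = minimize(lambda m: m @ Sigma @ m, x0=e, constraints=cons,
                   method="SLSQP", options={"maxiter": 500})
    return res.x

m0 = row_of_M(0, xi=0.1)
e0 = np.zeros(p)
e0[0] = 1.0
print("constraint value:", np.max(np.abs(Sigma @ m0 - e0)))  # <= xi up to tolerance
```

Each row is an independent small QP, so computing M parallelizes trivially over the p coordinates.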
What does it look like?

[Plot: the matrix M; it is not sparse!]

Distribution of our estimator?

Neglecting the bias, √n (θ̂^d − θ₀) is approximately Gaussian.

[Histogram and Q-Q plot; ‘ground truth’ obtained from n_tot = 10000 records.]
Confidence intervals

[Plot: confidence intervals for the coefficients vs. variable indices; the "Blood pressure" coefficient is highlighted.]

Coverage: 93.6%
Main Theorem

Theorem (Javanmard, Montanari 2013). Assume X has i.i.d. subgaussian rows with covariance Σ, and that the eigenvalues of Σ remain bounded as the sample size grows. Then, asymptotically as n, p → ∞ with s = o(√n / log p), √n (θ̂^d − θ₀) is asymptotically Gaussian with mean zero.

What is s? The number of truly significant variables (the number of nonzero parameters of θ₀).
Consequences
• A confidence interval for each individual parameter.
• The length of the confidence intervals does not depend on p.
• This is optimal: the length scales as in the classical, low-dimensional setting.
Summary (so far)
• High dimensionality and regularized estimators
• Uncertainty assessment for parameter estimations
• Optimality
R-package will be available soon!
Further insights and related work
Two questions
• How general?
• What about smaller sample size?
Question 1: How to generalize it?

Regularized estimators:

  θ̂ = argmin_θ { loss(θ) + λ · regularizer(θ) }

Suppose the loss decomposes over samples: loss(θ) = (1/n) Σ_i ℓ(θ; y_i, x_i).
Question 1: How to generalize it?

• De-bias the regularized estimator: θ̂^d = θ̂ − M ∇loss(θ̂).
• Find M by solving the same optimization problem, with Σ̂ replaced by the (empirical) Fisher information Î:

  minimize  m_iᵀ Î m_i
  subject to  ‖Î m_i − e_i‖_∞ ≤ ξ
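As a consistency check (my own sketch): for the squared loss, the generalized step θ̂ − M ∇loss(θ̂) reduces algebraically to the linear-model de-biasing formula used earlier in the talk.

```python
import numpy as np

# For loss(theta) = ||y - X theta||^2 / (2n), the generalized de-biasing step
# theta - M * grad loss(theta) coincides with theta + (1/n) M X^T (y - X theta).
rng = np.random.default_rng(5)
n, p = 20, 6
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
theta = rng.standard_normal(p)
M = rng.standard_normal((p, p))            # any M: the identity is algebraic

grad = -X.T @ (y - X @ theta) / n          # gradient of the squared loss
lhs = theta - M @ grad                     # generalized de-biasing step
rhs = theta + M @ X.T @ (y - X @ theta) / n  # linear-model de-biasing formula
print(np.allclose(lhs, rhs))               # True
```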
Question 2: How about smaller sample size?

• Estimation, prediction: achievable with sample size n of order s log p. [Candès, Tao 2007; Bickel et al. 2009]
• Uncertainty assessment, confidence intervals: the theorem requires s = o(√n / log p). [This talk]

Can we match the optimal sample size, n ≍ s log p?
• Javanmard, Montanari, 2013: Gaussian designs; exact asymptotic characterization.
• Javanmard, Montanari, 2013: confidence intervals with (nearly) optimal average length.
Related work

• Lockhart, Taylor, Tibshirani, Tibshirani, 2012: test significance along the Lasso path.
• Zhang, Zhang, 2012; van de Geer, Bühlmann, Ritov, 2013: assume structure on X (for random designs, the precision matrix Σ⁻¹ is assumed to be sparse). Optimality in terms of semiparametric efficiency.
• Bühlmann, 2012: tests are overly conservative.
Future directions

• Uncertainty assessment for predictions
• Other applications
Thank you!