

Reasoning about Uncertainty in High-dimensional Data Analysis

Adel Javanmard

Stanford University


What is high-dimensional data?

• Modern data sets are both massive and fine-grained.

# Features (variables) > # Observations (samples)

• A trend in modern data analysis.

High-Dimensional Data: an example

• Allergies: type, reaction, severity, start year, stop year, …

• Diagnosis info: ICD9 codes, description, start year, stop year, …

• Medications: name, strength, schedule, …

• Transcript records: age, gender, BMI, heart rate, billing information, …

• Lab results: HL7 text, value, abnormality, observation year, …

• Medical images

What can we do with such data?

• Extract useful, actionable information.

• Predictive models for:
  - clinical outcomes
  - patient evolution
  - readmission rate
  - …

• Design (or advise on) treatments, clinical interventions, and trials


Health Care Reform

HITECH

Heritage Health Prize

• More than 71 million persons are admitted to hospitals each year.

• Over $30 billion was spent on unnecessary hospital readmissions (2006).

Diabetes Example

• n = 500 (patients)
• p = 805 (variables): medical information such as medications, lab results, diagnoses, …

[Data from Practice Fusion, posted on Kaggle]

• Goal: find the significant variables for predicting type-2 diabetes.

• “People with higher bilirubin are more susceptible to diabetes.”

How certain are we about this claim?



Problem of Uncertainty Assessment


How stable are these estimates? What can we say about the true parameters?

[Plot: estimated parameters vs. index, with confidence intervals around each estimate; the “Blood pressure” coefficient is highlighted.]

Why is it hard?

• Low-dimensional regime (p fixed, n → ∞): large-sample theory characterizes the distribution of the estimates.

• The situation in the high-dimensional regime is very different!

• Much progress has been achieved for:
  - high-dimensional parameter estimation
  - high-dimensional variable/feature selection (recovering Supp(θ₀))
  - high-dimensional prediction

[Tibshirani, Donoho, Cai, Zhou, Candès, Tao, Bickel, van de Geer, Ritov, Bühlmann, Meinshausen, Zhao, Yu, Wainwright, …]

How do we assign a measure of uncertainty to each individual parameter?

Other examples

• Targeted online advertising
• Personalized medicine
• Social networks
• Collaborative filtering
• Genomics

Overview of Regularized Estimators


Regularized Estimators

• Investigate low-dimensional structures in the data:

  minimize  Loss + λ · (Model Complexity)

• Mitigates:
  - spurious correlations
  - noise accumulation
  - instability (to noise and sampling)

• This comes at a price (illustrated in the sketch below):
  - biased (towards small complexity)
  - nonlinear and non-explicit
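To make this template concrete, here is a minimal sketch of one such regularized estimator, the Lasso (squared loss + λ × ℓ₁ norm), on synthetic data. The dimensions, the sparsity level, and alpha (scikit-learn's name for λ) are illustrative choices, not values from the talk.

```python
# Minimal sketch: regularized estimation via the Lasso
# (minimize squared loss + lambda * ||theta||_1) on synthetic data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 300, 5                 # more features than observations
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:s] = 2.0                      # only a few truly nonzero parameters
y = X @ theta0 + rng.standard_normal(n)

# alpha plays the role of lambda: larger alpha = smaller model complexity
lasso = Lasso(alpha=0.1).fit(X, y)
print("selected features:", np.sum(lasso.coef_ != 0))
```

Both costs named above show up here: the nonzero coefficients come out shrunk towards zero (bias), and the map from data to θ̂ has no closed form.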

Diabetes Example

• For each patient i: yᵢ = 1 if the patient gets type-2 diabetes, and xᵢ collects the patient's variables.

• Fit an ℓ₁-regularized logistic regression, a convex optimization problem:

  θ̂ = argmin_θ { Σᵢ ℓ_logistic(yᵢ, ⟨xᵢ, θ⟩) + λ ‖θ‖₁ }   (logistic loss + regularizer)

  where θⱼ is the contribution of feature j.

• Variable selection: keep the features j with θ̂ⱼ ≠ 0 (some of the p variables).

This selects 62 interesting features. We want to construct confidence intervals for each parameter.
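A hedged sketch of this kind of fit, using scikit-learn's ℓ₁-penalized logistic regression on synthetic data; the real Practice Fusion data set is not reproduced here, and C (the inverse of λ) is an arbitrary illustrative value.

```python
# Sketch of L1-regularized logistic regression for variable selection,
# on synthetic data with the same shape as the diabetes example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 500, 805
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:10] = 1.5                                 # a few relevant features
y = (X @ theta0 + rng.standard_normal(n) > 0).astype(int)

# penalty="l1" gives sparse coefficients; C is the inverse of lambda
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print(f"selected {selected.size} features")
```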

What is a confidence interval?

• We observe data generated by a model with an unknown parameter vector θ₀.

• Confidence intervals: data-dependent intervals designed to contain each true parameter with a prescribed probability, the confidence level.

• Confidence intervals are random objects; the true parameter is fixed.
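In symbols (notation of our choosing): a level 1 − α confidence interval J_i(α) for the i-th parameter satisfies

```latex
% Level (1 - alpha) confidence interval for coordinate i (our notation).
\[
  \mathbb{P}\bigl(\theta_{0,i} \in J_i(\alpha)\bigr) \;\ge\; 1 - \alpha,
  \qquad
  J_i(\alpha) = \bigl[\hat\theta_i - \delta_i,\; \hat\theta_i + \delta_i\bigr].
\]
% The endpoints depend on the data, hence are random; the true
% parameter \theta_{0,i} is fixed. (1 - alpha) is the confidence level.
```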

Why uncertainty assessment?

Scientific discoveries

[Figure: claims annotated at 99% vs. 70% confidence.]

• Curry increases the cognitive capacity of the brain. [Tze-Pin Ng, 2006]

• Beautiful parents have more daughters than ugly parents. [Kanazawa 2006]

• Left-handedness in males has a significant effect on wage level. [Ruebeck 2006]

“Why Most Published Research Findings Are False” [John P. A. Ioannidis, 2005]

Why uncertainty assessment?

Decision making

The system state lives in a state space, and we take measurements of it.

[Figure: state space partitioned into a normal zone and an abnormal zone.]

Why uncertainty assessment?

Optimization / stopping rules

First-order methods for large-scale data (coordinate descent, mirror descent, Nesterov's method, …)

[Plot: progress of the iterates vs. iteration, with a marked stopping point.]

Optimization is a tool, not the goal! Once the optimization error falls below the statistical uncertainty of the estimate, further iterations add nothing, so uncertainty estimates yield a principled stopping rule.

Reasoning about Uncertainty


Setup

Linear model:  y = X θ₀ + w,  with y ∈ ℝⁿ, X ∈ ℝ^{n×p}, θ₀ ∈ ℝᵖ (p > n), and w Gaussian noise with mean zero and covariance σ² I.

Lasso

[Tibshirani 1996, Chen, Donoho 1996]

θ̂ = argmin_θ { (1/2n) ‖y − X θ‖₂² + λ ‖θ‖₁ }

What is the distribution of θ̂? The penalty makes θ̂ a nonlinear, non-explicit function of the data, so classical theory gives no closed-form answer.

Approach 1: Sample splitting

• Step 1: run the Lasso on one half of the data to select a subset S of variables.

• Step 2: run least squares on the other half, using only the variables in S. Since |S| is small, the distribution of this estimator is known explicitly (a code sketch follows after the list of problems below).

[Wasserman, Röeder 2009; Bühlmann, Meier, Meinshausen 2009]

Problems with sample splitting

• Half of the data is sacrificed for selection.

• Assumes the Lasso on the first half selects all relevant features (plus possibly some irrelevant ones).

• The result depends on the arbitrary choice of split.
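A minimal sketch of the two-step procedure on synthetic data, assuming statsmodels for the classical OLS intervals; the regularization level and dimensions are illustrative.

```python
# Sketch of sample splitting: Lasso selects variables on the first
# half; OLS on the second half gives classical intervals for S.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 200, 400
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:5] = 3.0
y = X @ theta0 + rng.standard_normal(n)

half = n // 2
S = np.flatnonzero(Lasso(alpha=0.2).fit(X[:half], y[:half]).coef_)

if S.size:                                   # OLS needs a nonempty S
    ols = sm.OLS(y[half:], X[half:, S]).fit()
    print(ols.conf_int(alpha=0.05))          # 95% intervals, one row per selected variable
```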

Approach 2: Bootstrap

Resample the data with replacement, re-fit the estimator on each resampled data set, and read off the variability of the estimates.

This fails because of the bias! The Lasso shrinks estimates towards zero, so the bootstrap distribution is centered at the biased estimate rather than at the true parameter.

Our approach: de-biasing Lasso

Classical setting (n > p): least squares,

θ̂^LS = (XᵀX)⁻¹ Xᵀ y = θ₀ + (XᵀX)⁻¹ Xᵀ w,

is an unbiased estimator with a precise distributional characterization: the error term (XᵀX)⁻¹ Xᵀ w is Gaussian.

Problem in high dimension (n < p): XᵀX is not invertible!

Let's (try to) subtract the bias. Use your favorite matrix M ∈ ℝ^{p×p} in place of (XᵀX)⁻¹ and define

θ̂ᵘ = θ̂ + (1/n) M Xᵀ (y − X θ̂).

With Σ̂ = XᵀX/n, the error decomposes as

θ̂ᵘ − θ₀ = (1/n) M Xᵀ w + (I − M Σ̂)(θ̂ − θ₀),

a Gaussian error term plus a bias term.
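A numeric sketch of this de-biasing step on synthetic data. For illustration only, M is taken to be a crude regularized inverse of Σ̂; the principled choice of M is the optimization problem discussed on the next slides.

```python
# Sketch of the de-biasing step:
#   theta_u = theta_hat + (1/n) M X^T (y - X theta_hat).
# M here is a stand-in regularized inverse, not the talk's optimized M.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 200, 400
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:5] = 3.0
y = X @ theta0 + rng.standard_normal(n)

theta_hat = Lasso(alpha=0.2).fit(X, y).coef_
Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat + 0.5 * np.eye(p))   # illustrative M only

theta_u = theta_hat + M @ X.T @ (y - X @ theta_hat) / n   # de-biased estimator
```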

Geometric interpretation

[Figure: geometric picture of the bias correction, involving a ball and the subgradient of the ℓ₁ norm at θ̂.]

How should we choose M?

There is a trade-off between the bias term and the (Gaussian) error term: we want both to be small.

Choosing M?

Notation: m_i is the i-th row of M, and e_i = (0, 0, …, 1, 0, …, 0) is the i-th standard basis vector. For each coordinate i, solve:

minimize   Var(Error_i) = m_iᵀ Σ̂ m_i
subject to |Bias_i| ≤ ξ,  i.e.  ‖Σ̂ m_i − e_i‖∞ ≤ ξ

[Figure: bias vs. variance trade-off; the feasible set is empty for ξ below a threshold ξ*, so take ξ = ξ*.]
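A sketch of this per-coordinate program written with cvxpy; the helper choose_row and the DCP-friendly rewriting m_iᵀ Σ̂ m_i = ‖X m_i‖² / n are our transcription, not code from the talk.

```python
# Per-coordinate choice of M: minimize the variance proxy m^T Sigma_hat m
# subject to the coordinate-wise bias bound ||Sigma_hat m - e_i||_inf <= xi.
import cvxpy as cp
import numpy as np

def choose_row(X: np.ndarray, i: int, xi: float) -> np.ndarray:
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    e_i = np.zeros(p)
    e_i[i] = 1.0
    m = cp.Variable(p)
    # m^T Sigma_hat m == ||X m||^2 / n, written in a DCP-friendly form
    objective = cp.Minimize(cp.sum_squares(X @ m) / n)
    constraints = [cp.norm(Sigma_hat @ m - e_i, "inf") <= xi]
    cp.Problem(objective, constraints).solve()
    return m.value

# Stacking choose_row(X, i, xi) over i = 1..p gives the rows of M.
```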

What does it look like?

The optimal M is not sparse!

Distribution of our estimator?

Neglecting the bias term, θ̂ᵘ − θ₀ ≈ (1/n) M Xᵀ w is Gaussian with covariance (σ²/n) M Σ̂ Mᵀ.

[Figure: histogram and Q-Q plot of the estimates against the Gaussian law; “ground truth” computed from n_tot = 10000 records.]

Confidence intervals

[Plot: estimated coefficients vs. indices, with the resulting confidence intervals; the “Blood pressure” coefficient is highlighted.]

Coverage: 93.6%
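Continuing the de-biasing sketch above (theta_u, M, Sigma_hat, and n are assumed to be in scope), confidence intervals follow from the Gaussian approximation. The noise level sigma is taken as known here for simplicity; in practice it must be estimated (e.g., with the scaled Lasso).

```python
# Coordinate-wise confidence intervals from the de-biased estimator:
#   theta_u[i] +/- z_{1-alpha/2} * sigma * sqrt([M Sigma_hat M^T]_{ii} / n).
from scipy.stats import norm
import numpy as np

sigma, alpha = 1.0, 0.05                      # sigma assumed known here
se = sigma * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
z = norm.ppf(1 - alpha / 2)
lower, upper = theta_u - z * se, theta_u + z * se
print("95% CI for the first coordinate:", (lower[0], upper[0]))
```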

Main Theorem

Theorem (Javanmard, Montanari 2013). Assume X has i.i.d. subgaussian rows with covariance Σ, whose eigenvalues remain bounded (away from zero and infinity) as the sample size grows. Then, asymptotically as n → ∞, with s = o(√n / log p),

√n (θ̂ᵘ − θ₀) = Z + Δ,   Z | X ~ N(0, σ² M Σ̂ Mᵀ),   ‖Δ‖∞ → 0 in probability.

What is s? The number of truly significant variables (the number of nonzero parameters).

Consequences

• Confidence intervals for each individual parameter: θ̂ᵘᵢ ± z_{1−α/2} σ √([M Σ̂ Mᵀ]ᵢᵢ / n).

• The length of the confidence intervals does not depend on p.

• This is optimal: the intervals shrink at the same 1/√n rate as in the classical low-dimensional setting.

Summary (so far)

• High dimensionality and regularized estimators

• Uncertainty assessment for parameter estimates

• Optimality


R-package will be available soon!

Further insights and related work


Two questions

• How general?

• What about smaller sample size?


Question 1: How to generalize it?

Regularized estimators:  θ̂ = argmin_θ { loss(θ) + λ · regularizer(θ) }

Suppose that the loss decomposes over samples.

• De-bias the regularized estimator by subtracting M times the gradient of the loss at θ̂ (a display follows below).

• Find M by solving the same optimization problem:

minimize the variance, subject to |Bias| ≤ ξ,

with Σ̂ replaced by the Fisher information.
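In display form, a reconstruction consistent with the linear case above, where ∇L(θ̂) = −(1/n) Xᵀ(y − Xθ̂):

```latex
% Generalized de-biasing for a decomposable loss (our reconstruction;
% it reduces to the linear case when the loss is squared error).
\[
  L(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta;\, y_i, x_i),
  \qquad
  \hat\theta^{\,u} = \hat\theta - M\,\nabla L(\hat\theta),
\]
% with M chosen by the same variance-minimization program, where
% \hat\Sigma is replaced by the (empirical) Fisher information
% \nabla^2 L(\hat\theta).
```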

Question 2: How about smaller sample size?

• Estimation, prediction: achievable with sample size n = O(s log p). [Candès, Tao 2007; Bickel et al. 2009]

• Uncertainty assessment, confidence intervals: the theorem above requires s = o(√n / log p). [This talk]

Can we match the optimal sample size, n = O(s log p)?

• Javanmard, Montanari 2013: sample size n = O(s log p); Gaussian designs; exact asymptotic characterization.

• Javanmard, Montanari 2013: sample size n = O(s log p); confidence intervals have (nearly) optimal average length.

Related work

• Lockhart, Taylor, Tibshirani, Tibshirani 2012: test significance along the Lasso path.

• Zhang, Zhang 2012; van de Geer, Bühlmann, Ritov 2013: assume structure on X; for random designs, the precision matrix Σ⁻¹ is assumed to be sparse; optimality in terms of semiparametric efficiency.

• Bühlmann 2012: tests are overly conservative.

Future directions


Two directions

• Uncertainty assessment for predictions

• Other applications


Thank you!
