

Reasoning about Uncertainty in High-dimensional Data Analysis

Adel Javanmard

Stanford University


What is high-dimensional data?

• Modern data sets are both massive and fine-grained.

# Features (variables) > # Observations (samples)

• A trend in modern data analysis.

High-Dimensional Data: an example

• Allergies: type, reaction, severity, start year, stop year, …

• Diagnosis info: ICD9 codes, description, start year, stop year, …

• Medications: name, strength, schedule, …

• Transcript records: age, gender, BMI, heart rate, billing information, …

• Lab results: HL7 text, value, abnormality, observation year, …

• Medical images

What can we do with such data?

• Extract useful, actionable information.

• Predictive models for:
  - clinical outcomes
  - patient evolution
  - readmission rate
  - …

• Design (or advise on) treatments, clinical interventions, and trials


Health Care Reform

HITECH

Heritage Health Prize

• More than 71 million persons are admitted to hospitals each year.

• Over $30 billion was spent on unnecessary hospital readmissions (2006).

Diabetes Example

• n = 500 (patients)
• p = 805 (variables): medical information such as medications, lab results, diagnoses, …

[Data from Practice Fusion, posted on Kaggle]

• Goal: find the significant variables for predicting type-2 diabetes.

• “People with higher bilirubin are more susceptible to diabetes.”

How certain are we about this claim?



Problem of Uncertainty Assessment


How stable are these estimates? What can we say about the true parameters?

[Plot: estimated parameters vs. index, with confidence intervals around each estimate; the “Blood pressure” coefficient is highlighted.]

Why is it hard?

• Low-dimensional regime (p fixed, n → ∞): large-sample theory characterizes the distribution of the estimates.

• The situation in the high-dimensional regime is very different!

• Much progress has been achieved for:
  - high-dimensional parameter estimation
  - high-dimensional variable/feature selection (recovering Supp(θ₀))
  - high-dimensional prediction

[Tibshirani, Donoho, Cai, Zhou, Candès, Tao, Bickel, van de Geer, Ritov, Bühlmann, Meinshausen, Zhao, Yu, Wainwright, …]

How do we assign a measure of uncertainty to each individual parameter?

Other examples

• Targeted online advertising
• Personalized medicine
• Social networks
• Collaborative filtering
• Genomics

Overview of Regularized Estimators


Regularized Estimators

• Investigate low-dimensional structures in the data:

  minimize  Loss + λ · (Model Complexity)

• Mitigates:
  - spurious correlations
  - noise accumulation
  - instability (to noise and sampling)

• This comes at a price (illustrated in the sketch below):
  - biased (towards small complexity)
  - nonlinear and non-explicit
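To make this template concrete, here is a minimal sketch of one such regularized estimator, the Lasso (squared loss + λ × ℓ₁ norm), on synthetic data. The dimensions, the sparsity level, and alpha (scikit-learn's name for λ) are illustrative choices, not values from the talk.

```python
# Minimal sketch: regularized estimation via the Lasso
# (minimize squared loss + lambda * ||theta||_1) on synthetic data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 300, 5                 # more features than observations
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:s] = 2.0                      # only a few truly nonzero parameters
y = X @ theta0 + rng.standard_normal(n)

# alpha plays the role of lambda: larger alpha = smaller model complexity
lasso = Lasso(alpha=0.1).fit(X, y)
print("selected features:", np.sum(lasso.coef_ != 0))
```

Both costs named above show up here: the nonzero coefficients come out shrunk towards zero (bias), and the map from data to θ̂ has no closed form.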

Diabetes Example

• For each patient i: yᵢ = 1 if the patient gets type-2 diabetes, and xᵢ collects the patient's variables.

• Fit an ℓ₁-regularized logistic regression, a convex optimization problem:

  θ̂ = argmin_θ { Σᵢ ℓ_logistic(yᵢ, ⟨xᵢ, θ⟩) + λ ‖θ‖₁ }   (logistic loss + regularizer)

  where θⱼ is the contribution of feature j.

• Variable selection: keep the features j with θ̂ⱼ ≠ 0 (some of the p variables).

This selects 62 interesting features. We want to construct confidence intervals for each parameter.
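A hedged sketch of this kind of fit, using scikit-learn's ℓ₁-penalized logistic regression on synthetic data; the real Practice Fusion data set is not reproduced here, and C (the inverse of λ) is an arbitrary illustrative value.

```python
# Sketch of L1-regularized logistic regression for variable selection,
# on synthetic data with the same shape as the diabetes example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 500, 805
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:10] = 1.5                                 # a few relevant features
y = (X @ theta0 + rng.standard_normal(n) > 0).astype(int)

# penalty="l1" gives sparse coefficients; C is the inverse of lambda
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print(f"selected {selected.size} features")
```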

What is a confidence interval?

• We observe data generated by a model with an unknown parameter vector θ₀.

• Confidence intervals: data-dependent intervals designed to contain each true parameter with a prescribed probability, the confidence level.

• Confidence intervals are random objects; the true parameter is fixed.
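In symbols (notation of our choosing): a level 1 − α confidence interval J_i(α) for the i-th parameter satisfies

```latex
% Level (1 - alpha) confidence interval for coordinate i (our notation).
\[
  \mathbb{P}\bigl(\theta_{0,i} \in J_i(\alpha)\bigr) \;\ge\; 1 - \alpha,
  \qquad
  J_i(\alpha) = \bigl[\hat\theta_i - \delta_i,\; \hat\theta_i + \delta_i\bigr].
\]
% The endpoints depend on the data, hence are random; the true
% parameter \theta_{0,i} is fixed. (1 - alpha) is the confidence level.
```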

Why uncertainty assessment?

Scientific discoveries

[Figure: claims annotated at 99% vs. 70% confidence.]

• Curry increases the cognitive capacity of the brain. [Tze-Pin Ng, 2006]

• Beautiful parents have more daughters than ugly parents. [Kanazawa 2006]

• Left-handedness in males has a significant effect on wage level. [Ruebeck 2006]

“Why Most Published Research Findings Are False” [John P. A. Ioannidis, 2005]

Why uncertainty assessment?

Decision making

The system state lives in a state space, and we take measurements of it.

[Figure: state space partitioned into a normal zone and an abnormal zone.]

Why uncertainty assessment?

Optimization / stopping rules

First-order methods for large-scale data (coordinate descent, mirror descent, Nesterov's method, …)

[Plot: progress of the iterates vs. iteration, with a marked stopping point.]

Optimization is a tool, not the goal! Once the optimization error falls below the statistical uncertainty of the estimate, further iterations add nothing, so uncertainty estimates yield a principled stopping rule.

Reasoning about Uncertainty


Setup

Linear model:  y = X θ₀ + w,  with y ∈ ℝⁿ, X ∈ ℝ^{n×p}, θ₀ ∈ ℝᵖ (p > n), and w Gaussian noise with mean zero and covariance σ² I.

Lasso

[Tibshirani 1996, Chen, Donoho 1996]

θ̂ = argmin_θ { (1/2n) ‖y − X θ‖₂² + λ ‖θ‖₁ }

What is the distribution of θ̂? The penalty makes θ̂ a nonlinear, non-explicit function of the data, so classical theory gives no closed-form answer.

Approach 1: Sample splitting

• Step 1: run the Lasso on one half of the data to select a subset S of variables.

• Step 2: run least squares on the other half, using only the variables in S. Since |S| is small, the distribution of this estimator is known explicitly (a code sketch follows after the list of problems below).

[Wasserman, Röeder 2009; Bühlmann, Meier, Meinshausen 2009]

Problems with sample splitting

• Half of the data is sacrificed for selection.

• Assumes the Lasso on the first half selects all relevant features (plus possibly some irrelevant ones).

• The result depends on the arbitrary choice of split.
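A minimal sketch of the two-step procedure on synthetic data, assuming statsmodels for the classical OLS intervals; the regularization level and dimensions are illustrative.

```python
# Sketch of sample splitting: Lasso selects variables on the first
# half; OLS on the second half gives classical intervals for S.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 200, 400
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:5] = 3.0
y = X @ theta0 + rng.standard_normal(n)

half = n // 2
S = np.flatnonzero(Lasso(alpha=0.2).fit(X[:half], y[:half]).coef_)

if S.size:                                   # OLS needs a nonempty S
    ols = sm.OLS(y[half:], X[half:, S]).fit()
    print(ols.conf_int(alpha=0.05))          # 95% intervals, one row per selected variable
```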

Approach 2: Bootstrap

Resample the data with replacement, re-fit the estimator on each resampled data set, and read off the variability of the estimates.

This fails because of the bias! The Lasso shrinks estimates towards zero, so the bootstrap distribution is centered at the biased estimate rather than at the true parameter.

Our approach: de-biasing Lasso

Classical setting (n > p): least squares,

θ̂^LS = (XᵀX)⁻¹ Xᵀ y = θ₀ + (XᵀX)⁻¹ Xᵀ w,

is an unbiased estimator with a precise distributional characterization: the error term (XᵀX)⁻¹ Xᵀ w is Gaussian.

Problem in high dimension (n < p): XᵀX is not invertible!

Let's (try to) subtract the bias. Use your favorite matrix M ∈ ℝ^{p×p} in place of (XᵀX)⁻¹ and define

θ̂ᵘ = θ̂ + (1/n) M Xᵀ (y − X θ̂).

With Σ̂ = XᵀX/n, the error decomposes as

θ̂ᵘ − θ₀ = (1/n) M Xᵀ w + (I − M Σ̂)(θ̂ − θ₀),

a Gaussian error term plus a bias term.
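A numeric sketch of this de-biasing step on synthetic data. For illustration only, M is taken to be a crude regularized inverse of Σ̂; the principled choice of M is the optimization problem discussed on the next slides.

```python
# Sketch of the de-biasing step:
#   theta_u = theta_hat + (1/n) M X^T (y - X theta_hat).
# M here is a stand-in regularized inverse, not the talk's optimized M.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 200, 400
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:5] = 3.0
y = X @ theta0 + rng.standard_normal(n)

theta_hat = Lasso(alpha=0.2).fit(X, y).coef_
Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat + 0.5 * np.eye(p))   # illustrative M only

theta_u = theta_hat + M @ X.T @ (y - X @ theta_hat) / n   # de-biased estimator
```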

Geometric interpretation

[Figure: geometric picture of the bias correction, involving a ball and the subgradient of the ℓ₁ norm at θ̂.]

How should we choose M?

There is a trade-off between the bias term and the (Gaussian) error term: we want both to be small.

Choosing M?

Notation: m_i is the i-th row of M, and e_i = (0, 0, …, 1, 0, …, 0) is the i-th standard basis vector. For each coordinate i, solve:

minimize   Var(Error_i) = m_iᵀ Σ̂ m_i
subject to |Bias_i| ≤ ξ,  i.e.  ‖Σ̂ m_i − e_i‖∞ ≤ ξ

[Figure: bias vs. variance trade-off; the feasible set is empty for ξ below a threshold ξ*, so take ξ = ξ*.]
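A sketch of this per-coordinate program written with cvxpy; the helper choose_row and the DCP-friendly rewriting m_iᵀ Σ̂ m_i = ‖X m_i‖² / n are our transcription, not code from the talk.

```python
# Per-coordinate choice of M: minimize the variance proxy m^T Sigma_hat m
# subject to the coordinate-wise bias bound ||Sigma_hat m - e_i||_inf <= xi.
import cvxpy as cp
import numpy as np

def choose_row(X: np.ndarray, i: int, xi: float) -> np.ndarray:
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    e_i = np.zeros(p)
    e_i[i] = 1.0
    m = cp.Variable(p)
    # m^T Sigma_hat m == ||X m||^2 / n, written in a DCP-friendly form
    objective = cp.Minimize(cp.sum_squares(X @ m) / n)
    constraints = [cp.norm(Sigma_hat @ m - e_i, "inf") <= xi]
    cp.Problem(objective, constraints).solve()
    return m.value

# Stacking choose_row(X, i, xi) over i = 1..p gives the rows of M.
```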

What does it look like?

The optimal M is not sparse!

Distribution of our estimator?

Neglecting the bias term, θ̂ᵘ − θ₀ ≈ (1/n) M Xᵀ w is Gaussian with covariance (σ²/n) M Σ̂ Mᵀ.

[Figure: histogram and Q-Q plot of the estimates against the Gaussian law; “ground truth” computed from n_tot = 10000 records.]

Confidence intervals

[Plot: estimated coefficients vs. indices, with the resulting confidence intervals; the “Blood pressure” coefficient is highlighted.]

Coverage: 93.6%
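Continuing the de-biasing sketch above (theta_u, M, Sigma_hat, and n are assumed to be in scope), confidence intervals follow from the Gaussian approximation. The noise level sigma is taken as known here for simplicity; in practice it must be estimated (e.g., with the scaled Lasso).

```python
# Coordinate-wise confidence intervals from the de-biased estimator:
#   theta_u[i] +/- z_{1-alpha/2} * sigma * sqrt([M Sigma_hat M^T]_{ii} / n).
from scipy.stats import norm
import numpy as np

sigma, alpha = 1.0, 0.05                      # sigma assumed known here
se = sigma * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
z = norm.ppf(1 - alpha / 2)
lower, upper = theta_u - z * se, theta_u + z * se
print("95% CI for the first coordinate:", (lower[0], upper[0]))
```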

Main Theorem

Theorem (Javanmard, Montanari 2013). Assume X has i.i.d. subgaussian rows with covariance Σ, whose eigenvalues remain bounded (away from zero and infinity) as the sample size grows. Then, asymptotically as n → ∞, with s = o(√n / log p),

√n (θ̂ᵘ − θ₀) = Z + Δ,   Z | X ~ N(0, σ² M Σ̂ Mᵀ),   ‖Δ‖∞ → 0 in probability.

What is s? The number of truly significant variables (the number of nonzero parameters).

Consequences

• Confidence intervals for each individual parameter: θ̂ᵘᵢ ± z_{1−α/2} σ √([M Σ̂ Mᵀ]ᵢᵢ / n).

• The length of the confidence intervals does not depend on p.

• This is optimal: the intervals shrink at the same 1/√n rate as in the classical low-dimensional setting.

Summary (so far)

• High dimensionality and regularized estimators

• Uncertainty assessment for parameter estimates

• Optimality


R-package will be available soon!

Further insights and related work


Two questions

• How general?

• What about smaller sample size?


Question 1: How to generalize it?

Regularized estimators:  θ̂ = argmin_θ { loss(θ) + λ · regularizer(θ) }

Suppose that the loss decomposes over samples.

• De-bias the regularized estimator by subtracting M times the gradient of the loss at θ̂ (a display follows below).

• Find M by solving the same optimization problem:

minimize the variance, subject to |Bias| ≤ ξ,

with Σ̂ replaced by the Fisher information.
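In display form, a reconstruction consistent with the linear case above, where ∇L(θ̂) = −(1/n) Xᵀ(y − Xθ̂):

```latex
% Generalized de-biasing for a decomposable loss (our reconstruction;
% it reduces to the linear case when the loss is squared error).
\[
  L(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta;\, y_i, x_i),
  \qquad
  \hat\theta^{\,u} = \hat\theta - M\,\nabla L(\hat\theta),
\]
% with M chosen by the same variance-minimization program, where
% \hat\Sigma is replaced by the (empirical) Fisher information
% \nabla^2 L(\hat\theta).
```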

Question 2: How about smaller sample size?

• Estimation, prediction: achievable with sample size n = O(s log p). [Candès, Tao 2007; Bickel et al. 2009]

• Uncertainty assessment, confidence intervals: the theorem above requires s = o(√n / log p). [This talk]

Can we match the optimal sample size, n = O(s log p)?

• Javanmard, Montanari 2013: sample size n = O(s log p); Gaussian designs; exact asymptotic characterization.

• Javanmard, Montanari 2013: sample size n = O(s log p); confidence intervals have (nearly) optimal average length.

Related work

• Lockhart, Taylor, Tibshirani, Tibshirani 2012: test significance along the Lasso path.

• Zhang, Zhang 2012; van de Geer, Bühlmann, Ritov 2013: assume structure on X; for random designs, the precision matrix Σ⁻¹ is assumed to be sparse; optimality in terms of semiparametric efficiency.

• Bühlmann 2012: tests are overly conservative.

Future directions


Two directions

• Uncertainty assessment for predictions

• Other applications


Thank you!
