Missing Data Problems in Machine Learning
Senate Thesis Defense

Benjamin Marlin
Machine Learning Group
Department of Computer Science
University of Toronto

April 8, 2008

Contents:

Overview
Notation

Theory of Missing Data
• Factorization
• Classification
• What does MAR Mean?
• MAR and Inference
• MAR and Model Misspecification

Unsupervised Learning – MAR
• Finite Multinomial Mixtures
• DP Multinomial Mixtures
• Factor Analysis and Mixtures
• Restricted Boltzmann Machines

Unsupervised Learning – NMAR
• Problem, Data Sets, Protocols
• Finite Multinomial Mixture/CPT-v
• DP Multinomial Mixture/CPT-v
• Finite Multinomial Mixture/Logit-vd
• RBM/E-v
• Results

Classification with Missing Data
• Generative Framework and LDA
• Discriminative Frameworks
• Linear Logistic Regression
• Kernel Logistic Regression
• Neural Networks
• Results

Overview:

Theory of Missing Data

Unsupervised Learning - MAR

Classification with Missing Features

Classification with Complete Data

Unsupervised Learning - NMAR

Collaborative Prediction

Basic Notation

Number of data cases.

Number of clusters or hidden units.

Number of multinomial values.

Number of classes.

Number of data dimensions.

Basic Notation for Missing Data

[Figure: a data vector split into observed and missing parts according to its response vector.]
• Data Vector: 0.3, 0.7, 0.2, 0.9, 0.1
• Response Vector: 1, 1, 0, 0, 1
• Observed Dimensions: 1, 2, 5
• Missing Dimensions: 3, 4
• Observed Data: 0.3, 0.7, 0.1
• Missing Data: 0.2, 0.9
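The notation above can be sketched in a few lines of NumPy. This is only an illustration of how a response vector partitions a data vector; the variable names are assumptions and do not come from the slides.

```python
import numpy as np

x = np.array([0.3, 0.7, 0.2, 0.9, 0.1])   # data vector from the slide
r = np.array([1, 1, 0, 0, 1])              # response vector: 1 = observed, 0 = missing

observed_dims = np.flatnonzero(r == 1) + 1  # 1-based indices of observed dimensions
missing_dims = np.flatnonzero(r == 0) + 1   # 1-based indices of missing dimensions
x_obs = x[r == 1]                           # observed data
x_mis = x[r == 0]                           # missing data (unknown in practice)
```

In a real data set the entries of `x_mis` are not available; they are shown here only because the slide lists them.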

Theory of Missing Data: Overview

Background on the Theory of Missing Data (Little/Rubin):
• Factorizations of the generative process
• Three classes of missing data
  – Missing Completely at Random (MCAR)
  – Missing at Random (MAR)
  – Not Missing at Random (NMAR)

• The effect of each class of missing data on inference

Extensions and Elaborations:

• MAR assumption, multivariate data, and symmetry

• MAR assumption and model misspecification

Theory of Missing Data: Factorizations

Data/Selection Model Factorization:
• The probability of selection depends on the true values of the data variables and latent variables.

Pattern Mixture Model Factorization:
• Each response vector defines a different pattern, and each pattern has a different distribution over the data.
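The slide's equations are not in the transcript. As a sketch, the two factorizations are usually written as follows in the Little/Rubin framework. The parameter names θ, µ, ϑ, π are assumptions, and the latent variables mentioned on the slide are omitted for brevity.

```latex
% Data/selection model: data model times selection model
P(X, R \mid \theta, \mu) = P(R \mid X, \mu)\, P(X \mid \theta)

% Pattern mixture model: one data distribution per response pattern
P(X, R \mid \vartheta, \pi) = P(X \mid R, \vartheta)\, P(R \mid \pi)
```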

Theory of Missing Data: Classification

[Figure: graphical models labeled MAR, MCAR, and NMAR.]

MAR:

MCAR:

NMAR: No simplification in general.
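The MAR and MCAR equations are not in the transcript. The standard Little/Rubin conditions on the selection model, written with assumed superscripts o and m for the observed and missing parts of x, are roughly:

```latex
% MCAR: the response pattern does not depend on the data at all
P(R = r \mid X = x, \mu) = P(R = r \mid \mu)

% MAR: the response pattern depends only on the observed values
P(R = r \mid X = x, \mu) = P(R = r \mid X^{o} = x^{o}, \mu)

% NMAR: P(R = r \mid X = x, \mu) may depend on x^{m}; no simplification in general
```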

Theory of Missing Data: Classification

What does it mean to be Missing at Random?
• MAR is not a statement of independence between random variables. MAR requires that particular symmetries hold so that P(R=r|X=x) can be determined from observed data only.

Theory of Missing Data: Inference

• When data is NMAR, the selection model cannot be ignored. Doing so will “bias” inference, learning, and prediction.

MCAR/MAR Posterior:

NMAR Posterior:
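The two posteriors are not in the transcript. A sketch of their usual form follows, assuming independent priors on the data parameters θ and the selection parameters µ; this notation is an assumption and may differ from the slide.

```latex
% MCAR/MAR: the selection model factors out and can be ignored
P(\theta \mid x^{o}, r) \propto P(\theta) \int P(x^{o}, x^{m} \mid \theta)\, dx^{m}

% NMAR: the selection model stays inside the integral
P(\theta, \mu \mid x^{o}, r) \propto P(\theta)\, P(\mu)
  \int P(r \mid x^{o}, x^{m}, \mu)\, P(x^{o}, x^{m} \mid \theta)\, dx^{m}
```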

Theory of Missing Data: Misspecification

Misspecified Missing Data Model, NMAR Missing Data:
• If missing data is NMAR, it is not sufficient to use any missing data model. Inference is still biased if the wrong missing data model is used.

Misspecified Data Model, MAR Missing Data:
• Even if missing data is MAR with respect to the underlying generative process, inference for the parameters of a simpler data model can still be biased.

Theory of Missing Data: Misspecification

Misspecified Data Model, MAR Missing Data:
• Consider a 2D binary example where the true data model is the full four element CPT, and we approximate it using a product of the two marginal distributions.
• Suppose the missing data model is MAR, and our goal is to estimate the marginal P(X1=1).

Theory of Missing Data: Misspecification

Misspecified Data Model, MAR Missing Data:
• Suppose we estimate P(X1=1) under the marginal model, and under the true model.
• We can show that computing P(X1=1) under the marginal model is equal to computing P(X1=1|R1=1).
• We can further prove that P(X1=1) is only equal to P(X1=1|R1=1) if β = δ. This corresponds to the MCAR condition.
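The claim can be checked numerically with a small sketch. It assumes that X2 is always observed and that X1 is observed with a probability depending only on X2, which is a MAR mechanism. The function name and the two selection probabilities are hypothetical, and they are only a guess at the role played by β and δ on the slide.

```python
import numpy as np

def marginal_vs_observed(joint, p_obs_given_x2):
    """Compare P(X1=1) with P(X1=1 | R1=1) in a 2D binary example.

    joint[a, b] = P(X1=a, X2=b); p_obs_given_x2[b] = P(R1=1 | X2=b) (assumed MAR form).
    """
    joint = np.asarray(joint, dtype=float)
    p_obs = np.asarray(p_obs_given_x2, dtype=float)
    true_marginal = joint[1, :].sum()            # P(X1=1)
    num = (joint[1, :] * p_obs).sum()            # P(X1=1, R1=1)
    den = (joint * p_obs[None, :]).sum()         # P(R1=1)
    return true_marginal, num / den

joint = [[0.4, 0.1],
         [0.1, 0.4]]                             # X1 and X2 are dependent
equal_rates = marginal_vs_observed(joint, [0.5, 0.5])    # selection ignores X2 (MCAR)
unequal_rates = marginal_vs_observed(joint, [0.9, 0.3])  # selection depends on X2 (MAR)
```

With equal selection rates the two quantities agree at 0.5. With unequal rates the observed-data estimate moves to about 0.35 while the true marginal stays at 0.5.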

Theory of Missing Data: Misspecification

Misspecified Data Model, MAR Missing Data:

Unsupervised Learning – MAR: Overview

Background on Unsupervised Learning With Random Missing Data:
• Finite Bayesian Multinomial Mixture (MAP EM)
• Dirichlet Process Multinomial Mixture (Gibbs)
• Finite Factor Analysis/PPCA Mixture (ML EM)

• Restricted Boltzmann Machines (Contrastive Divergence)

Extensions and Elaborations:

• Collapsed Gibbs sampler for DPMM with missing data

• Derivation of factor analysis mixture with missing data

• New view of RBM models for missing data

Finite Bayesian Mixture: Model

Probability Model:

Properties:
• Allows for a fixed, finite number of clusters.
• In the multinomial mixture, P(xn|βk) is a product of discrete distributions. The prior on β and θ is Dirichlet.

Finite Bayesian Mixture: Model

Dirichlet Distribution:
Bayesian mixture modeling becomes much easier when conjugate priors are used for the model parameters. The conjugate prior for the mixture proportions θ is the Dirichlet distribution.

[Figure: two density plots; horizontal axis from 0 to 1, vertical axis from 0 to 4.]

Finite Bayesian Mixture: Learning

MAP EM Algorithm:
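The update equations are not in the transcript. The following is a minimal sketch of MAP EM for a multinomial mixture under the MAR assumption, where only observed entries contribute to the likelihood. The function name, the symmetric hyperparameters `alpha` and `phi`, and the array layout are assumptions and not the author's implementation.

```python
import numpy as np

def map_em_multinomial_mixture(X, R, K, V, alpha=2.0, phi=2.0, iters=50, seed=0):
    """X: (N, D) int array with values in 0..V-1 (entries where R == 0 are ignored).
    R: (N, D) 0/1 response indicators. Returns (theta, beta, resp)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    theta = np.full(K, 1.0 / K)                      # mixture proportions
    beta = rng.dirichlet(np.ones(V), size=(K, D))    # beta[k, d, v] = P(x_d = v | z = k)
    # One-hot encode observed entries only: obs[n, d, v] = 1 iff r_nd = 1 and x_nd = v
    obs = np.zeros((N, D, V))
    n_idx, d_idx = np.nonzero(R)
    obs[n_idx, d_idx, X[n_idx, d_idx]] = 1.0
    for _ in range(iters):
        # E-step: log P(z = k) plus log-likelihood of the observed dimensions only
        log_post = np.log(theta)[None, :] + np.einsum('ndv,kdv->nk', obs, np.log(beta))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: posterior mode under symmetric Dirichlet priors (hyperparameters > 1)
        theta = resp.sum(axis=0) + alpha - 1.0
        theta /= theta.sum()
        counts = np.einsum('nk,ndv->kdv', resp, obs) + phi - 1.0
        beta = counts / counts.sum(axis=2, keepdims=True)
    return theta, beta, resp
```

Missing entries never enter `obs`, so they drop out of both steps; this is the sense in which the missing data mechanism is ignored under MAR.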

Finite Bayesian Mixture: Prediction

Predictive Distribution:

Dirichlet Process Mixture: Model

Probability Model:

Properties:
• Since φ is discrete, the DPM can be viewed as a countably infinite mixture model.
• Another way to arrive at a DPM is to consider the limit of a Bayesian mixture model with symmetric Dirichlet prior as the number of components K goes to infinity.

Dirichlet Process Mixture: DP

The Dirichlet Process:
Let α be a scalar and φ0 be a distribution on a random variable β. A random distribution φ is a draw from DP(α,φ0) if and only if for any K and any K-partition A1,…,AK of the space of β, the distribution over elements of the partition induced by φ is given by the Dirichlet distribution:

[Figure: base distribution φ0(β) and a draw φ(β) over β, with a partition A1, A2, A3.]
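The equation the definition refers to is not in the transcript. In the standard statement of the Dirichlet process definition it reads as follows, using the slide's symbols:

```latex
\big(\phi(A_1), \ldots, \phi(A_K)\big) \sim
  \mathrm{Dirichlet}\big(\alpha\,\phi_0(A_1), \ldots, \alpha\,\phi_0(A_K)\big)
```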

Dirichlet Process Mixture: DP

Understanding φ:
• The only φ that satisfy the definition of the DP are discrete with probability 1.
• Drawing samples βn from the DP is easy if φ0 can be sampled easily (Blackwell/MacQueen).
• The “stick breaking” view of the DP prior lets us incrementally construct a draw φ from DP(α,φ0).

[Figure: stick-breaking construction. The unit interval from 0 to 1 is broken into weights θ1, θ2, θ3, θ4, θ5, ..., each paired with an atom β1, β2, β3, β4, β5, ... drawn from φ0(β), giving the discrete distribution φ(β).]
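The stick-breaking view can be sketched as a truncated sampler. This assumes the standard construction, with stick fractions drawn from Beta(1, α) and atoms drawn from φ0. The function name, the truncation level, and the Gaussian base distribution used in the example are illustrative choices only.

```python
import numpy as np

def stick_breaking_draw(alpha, base_sampler, truncation=50, seed=0):
    """Truncated stick-breaking draw from DP(alpha, phi_0).

    base_sampler(rng, n) must return n samples from the base distribution phi_0.
    Returns (weights, atoms); the weights sum to slightly less than 1 because of truncation.
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=truncation)                    # fraction broken off each time
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick left before each break
    weights = v * remaining                                      # theta_k = v_k * prod_{j<k} (1 - v_j)
    atoms = base_sampler(rng, truncation)                        # beta_k ~ phi_0
    return weights, atoms

weights, atoms = stick_breaking_draw(
    alpha=2.0, base_sampler=lambda rng, n: rng.normal(size=n))
```

Larger α leaves more of the stick for later atoms, so the draw spreads its mass over more components.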

Dirichlet Process Mixture: Inference

Collapsed Gibbs Sampler With Missing Data:

Dirichlet Process Mixture: Prediction

Predictive Distribution (Training Cases):

Dirichlet Process Mixture: Prediction

Predictive Distribution (Test Cases):

FA/PPCA Mixtures: Model

Probability Model:

Properties:
• Tn is a length Q real-valued vector ~ N(0,I).
• Factor loading matrix Λ is DxQ.
• Covariance matrix Ψ is DxD diagonal.

FA/PPCA Mixtures: Model

Joint Distribution of X and T given Z:

Marginal Distribution of X given Z:

FA/PPCA Mixtures: Learning

M-Step Inference with Missing Data:

FA/PPCA Mixtures: Learning

E-Step Updates with Missing Data:

FA/PPCA Mixtures: Prediction

Predictive Distribution:
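The predictive equations are not in the transcript. As a sketch of the idea for a single factor analyzer (K=1), the marginal covariance of x is ΛΛᵀ + Ψ, and the missing dimensions can be predicted with the usual Gaussian conditional mean. The function name and argument layout are assumptions; a mixture would combine such predictions across components weighted by their posterior probabilities.

```python
import numpy as np

def fa_conditional_mean(x, r, mu, Lam, psi_diag):
    """E[x_m | x_o] under x ~ N(mu, Lam Lam^T + diag(psi_diag)).

    x: length-D vector (values at missing positions are ignored), r: 0/1 response vector.
    """
    r = np.asarray(r, dtype=bool)
    Sigma = Lam @ Lam.T + np.diag(psi_diag)      # marginal covariance of x
    o, m = r, ~r
    S_oo = Sigma[np.ix_(o, o)]                   # observed-observed block
    S_mo = Sigma[np.ix_(m, o)]                   # missing-observed block
    return mu[m] + S_mo @ np.linalg.solve(S_oo, x[o] - mu[o])

pred = fa_conditional_mean(
    x=np.array([2.0, np.nan]), r=[1, 0],
    mu=np.zeros(2), Lam=np.array([[1.0], [1.0]]), psi_diag=np.array([1.0, 1.0]))
```

In this example the covariance is [[2, 1], [1, 2]], so the prediction for the missing second dimension is half of the observed value.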

Conditional RBM’s: Model

[Figure: model diagrams, two slides.]

Conditional RBM’s: Energy Function

Probability Model:

Conditional RBM’s: Inference

Gibbs Sampler:

Conditional RBM’s: Learning

Contrastive Divergence Gradients:

Unsupervised Learning – NMAR:
Overview

Data Sets and Experimental Protocols:
• Conducted first-ever survey of user rating behavior in a recommender system.
• Collected first collaborative filtering data set that includes both ratings for user-selected items and ratings for randomly selected items.
• Designed new experimental protocols for collaborative prediction to test methods that assume MAR vs methods that model NMAR effects.

Unsupervised Learning – NMAR:
Collaborative Prediction Problem

[Figure: collaborative prediction problem, with unknown entries marked “?”.]

Unsupervised Learning – NMAR:
Data Sets: Yahoo!

Data was collected through an online survey of Yahoo! Music LaunchCast radio users.
• 1000 songs selected at random.
• Users rate 10 songs selected at random from 1000 songs.
• Answer 16 questions.
• Collected data from 35,000+ users.

Image copyright Yahoo! Inc. 2006. Used with permission.

Unsupervised Learning – NMAR:
Data Sets: Yahoo!

[Figure: two panels labeled User Selected and Randomly Selected.]

Unsupervised Learning – NMAR:
Data Sets: Jester

Jester gauge set of 10 jokes used as complete data. Synthetic missing data was added.
• 15,000 users randomly selected
• Missing data model: µv(s) = s(v-3)+0.5
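The synthetic missing data model can be sketched as follows. This assumes ratings v take values 1 to 5 and that µv(s) is the probability that a rating with value v is observed; both are readings of the slide rather than stated facts. Under that reading, |s| must be at most 0.25 for µv(s) to stay in [0, 1]. The function names are hypothetical.

```python
import numpy as np

def observation_prob(v, s):
    """mu_v(s) = s * (v - 3) + 0.5 for rating value v in 1..5."""
    return s * (np.asarray(v, dtype=float) - 3.0) + 0.5

def apply_synthetic_missingness(X, s, seed=0):
    """Sample response indicators R for a complete rating matrix X with values in 1..5."""
    rng = np.random.default_rng(seed)
    mu = observation_prob(X, s)                    # per-entry observation probability
    return (rng.random(X.shape) < mu).astype(int)  # 1 = observed, 0 = missing

probs = observation_prob([1, 2, 3, 4, 5], s=0.125)
R = apply_synthetic_missingness(np.array([[1, 5], [3, 3]]), s=0.25)
```

Setting s = 0 gives a flat 0.5 observation rate for every rating value. Positive s makes high ratings more likely to be observed, which is the NMAR effect being simulated.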

Unsupervised Learning – NMAR:
Data Sets: User Splits

[Figure: All Users are divided into Train Users, Test Users, and Holdout Users, shown for both User Selected Ratings and Randomly Selected Ratings, with labels 1 to 5.]

Unsupervised Learning – NMAR:
Data Sets: Rating Splits

[Figure: User Selected Ratings and Randomly Selected Ratings are divided into Observed Ratings, Test Ratings for User Selected Items, and Test Ratings for Randomly Selected Items.]

Unsupervised Learning – NMAR:
Data Sets: Connection to Notation

[Figure: User Selected Ratings and Observed Ratings related to the matrices R, A, and X.]

Unsupervised Learning – NMAR:
Experimental Protocols

Weak Generalization:
• Learn on training user observed ratings.
• Evaluate on training user test ratings for user selected items, and training user test ratings for randomly selected items.

Strong Generalization:
• Learn on training user observed ratings.
• Evaluate on test user test ratings for user selected items, and test user test ratings for randomly selected items.

Unsupervised Learning – NMAR: Models

• Follow a modeling strategy based on combining probabilistic models for complete data with simple models of the missing data process.
• Consider complete data models including finite Bayesian mixtures, Dirichlet Process mixtures, and RBM’s.
• Consider two basic missing data models: CPT-v and LOGIT-vd.

Finite Mixture/CPT-v: Model

Probability Model:
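The model equations are not in the transcript. A sketch of the core idea follows, assuming that CPT-v means the probability of observing a rating depends only on its value v, written here as `mu[v] = P(r = 1 | x = v)`. The function name and the example numbers are hypothetical.

```python
import numpy as np

def observed_rating_distribution(p_x, mu):
    """P(x = v | r = 1) under a CPT-v style selection model.

    p_x[v] = P(x = v) is the complete-data rating distribution,
    mu[v] = P(r = 1 | x = v) is the per-value observation probability.
    """
    p_x = np.asarray(p_x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    joint = p_x * mu              # P(x = v, r = 1)
    return joint / joint.sum()    # normalize by P(r = 1)

# Uniform true ratings, but high ratings are three times more likely to be observed
skewed = observed_rating_distribution([0.2] * 5, [0.1, 0.1, 0.2, 0.3, 0.3])
```

Even with uniform underlying ratings, the observed ratings are skewed toward high values, which is why a method that assumes MAR would be misled on such data.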

Finite Mixture/CPT-v: Identifiability

Identifiability:

2D Binary Example:

Finite Mixture/CPT-v: Identifiability

2D Binary Example:
• This system will have a unique solution for µ1 and µ2 if both are greater than 0, and the matrix Φ of sums of φw coefficients is non-singular.
• This result is easily extended to the general case of D dimensions and V multinomial values.

Finite Mixture/CPT-v: Learning

MAP EM Algorithm (E-Step):

Finite Mixture/CPT-v: Learning

MAP EM Algorithm (M-Step):

Finite Mixture/CPT-v: Prediction

Predictive Distribution

Finite Mixture/CPT-v: Results

Yahoo! Weak Generalization Results: MM vs MM/CPT-v

Finite Mixture/CPT-v: Results

Yahoo! Weak Generalization Results: MM vs MM/CPT-v+

Finite Mixture/CPT-v: Results

Jester Results: MM vs MM/CPT-v

DP Mixture/CPT-v: Model

Probability Model:

DP Mixture/CPT-v: Inference

Auxiliary Variable Gibbs: Mixture Indicator Update

DP Mixture/CPT-v: Inference

Auxiliary Variable Gibbs: Auxiliary Count Update

DP Mixture/CPT-v: Inference

Auxiliary Variable Gibbs: Parameter Updates

DP Mixture/CPT-v: Prediction

Predictive Distribution

DP Mixture/CPT-v: Results

Yahoo! Results: DP vs DP/CPT-v and DP/CPT-v+

DP Mixture/CPT-v: Results

Jester Results: DP vs DP/CPT-v

DP Mixture/CPT-v: MCMC Diagnostics

Example Parameter Traces on Jester

Finite Mixture/LOGIT-vd: Model

Probability Model:
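The model equations are not in the transcript. One plausible reading of the name, assumed here, is that the observation probability is a logistic function of an additive value effect and dimension effect. The parameter names `sigma` and `omega` and the additive form are assumptions and should be checked against the thesis.

```python
import numpy as np

def logit_vd_probs(sigma, omega):
    """mu[v, d] = 1 / (1 + exp(-(sigma[v] + omega[d]))) (assumed additive logit form).

    sigma: length-V value effects, omega: length-D dimension (item) effects.
    """
    z = np.asarray(sigma, dtype=float)[:, None] + np.asarray(omega, dtype=float)[None, :]
    return 1.0 / (1.0 + np.exp(-z))

mu_vd = logit_vd_probs(sigma=np.zeros(5), omega=np.zeros(3))
```

Compared with a per-value table, this form lets the observation probability also vary by item while adding only D extra parameters.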

Finite Mixture/LOGIT-vd: Learning

MAP GEM Algorithm (E-Step):

Finite Mixture/LOGIT-vd: Learning

MAP GEM Algorithm (M-Step):

Finite Mixture/LOGIT-vd: Prediction

Predictive Distribution

Finite Mixture/LOGIT-vd: Results

Yahoo! Weak Generalization Results: MM vs MM/LOGIT-vd

Finite Mixture/LOGIT-vd: Results

Jester Results: MM vs MM/LOGIT-vd

Conditional RBM: Model

[Figure: model diagram.]

Conditional RBM: Model

Probability Model:

Conditional RBM: Inference

Gibbs Sampler:

Conditional RBM: Learning

Contrastive Divergence Gradients:

Conditional RBM: Results

Yahoo! Weak Generalization Results: cRBM vs cRBM-v

Conditional RBM: Results

Jester Results: cRBM vs cRBM-v

Unsupervised Learning – NMAR: Results

• Methods that model NMAR effects perform significantly better than methods that don’t on synthetic and real data.
• Differences between methods that model NMAR effects are small by comparison, but still significant.
• Results show a big win for rating prediction when a small number of ratings for randomly selected items is available at training time.

Unsupervised Learning – NMAR:
Comparison of Results on Yahoo! Data

Unsupervised Learning – NMAR:
NEW: Ranking Results

• : mean of posterior predictive distribution for test item i.

• : rank of test item i according to .

• : rank of test item i according to .

Unsupervised Learning – NMAR:
NEW: Comparison of Yahoo! Ranking Results

Weak Generalization:

Strong Generalization:

Classification with Missing Data:

Background on Classification with Complete Data:
• Linear/Regularized Discriminant Analysis
• Logistic Regression
• Perceptrons and SVMs
• Kernel Methods and Kernel Logistic Regression
• Multi-Layer Neural Networks

Frameworks for Classification with Missing Features:
• Generative Classifiers
• Single and Multiple Imputation
• Reduced Models/Classification in Subspaces
• Response Indicator Augmentation

Missing Data Problems in Machine Learning

Benjamin Marlin

Department of Computer Science, University of Toronto 81

Factor Analysis Covariance Model:

Generative Framework

Linear Discriminant Analysis

Missing Data Problems in Machine Learning

Benjamin Marlin

Department of Computer Science, University of Toronto 82

Generative Framework

Linear Discriminant Analysis

Generative Training:

• Estimate class means from incomplete data

• Run EM for Factor analysis with missing data to estimate pooled covariance parameters

Discriminative Training:

• Directly maximize the conditional likelihood of the labels

given incomplete features.

• Non-linear gradient descent in the negative log conditional likelihood.

Missing Data Problems in Machine Learning

Benjamin Marlin

Department of Computer Science, University of Toronto 83

Generative Framework

Linear Discriminant Analysis

[Figure panels: Training Data; Generative Learning; Discriminative Learning]

Imputation Framework

Zero Imputation: Replace missing feature values with zeros.

Imputation Framework

Mean Imputation: Replace missing feature values with mean of observed values for each feature.
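
Zero and mean imputation can both be sketched in a few lines. The example assumes missing values are encoded as NaN; the function names and the option to reuse training-set means at test time are illustrative choices, not details from the slides.

```python
import numpy as np

def zero_impute(X):
    """Replace missing (NaN) feature values with zeros."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), 0.0, X)

def mean_impute(X, feature_means=None):
    """Replace missing values with the mean of the observed values per feature.

    If feature_means is given (e.g. computed on training data), it is reused.
    """
    X = np.asarray(X, dtype=float)
    if feature_means is None:
        feature_means = np.nanmean(X, axis=0)  # ignores NaN entries
    return np.where(np.isnan(X), feature_means, X), feature_means
```

Passing the training-set means back in for test data keeps the two sets consistent.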

Imputation Framework

Multiple Imputation: Replace missing feature values with samples of the missing values xm given the observed values xo, drawn from several imputation models.

[Figure: Mixture of Factor Analyzers, K=1, Q=1]

Imputation Framework

[Figure: Mixture of Factor Analyzers, K=3, Q=1]
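
The sampling step of multiple imputation can be illustrated with a simpler stand-in model. The sketch below swaps the mixture of factor analyzers for a single full-covariance Gaussian, which is an assumption made to keep the example short. It draws several completed copies of one data case by sampling the missing dimensions from the Gaussian conditional of xm given xo. The function name and the NaN encoding are hypothetical.

```python
import numpy as np

def sample_missing_given_observed(x, mu, Sigma, n_samples, rng):
    """Draw completed copies of x by sampling x_m | x_o under N(mu, Sigma)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    m = np.isnan(x)                    # missing dimensions
    o = ~m                             # observed dimensions
    completed = np.tile(x, (n_samples, 1))
    if not m.any():
        return completed               # nothing to impute
    if o.any():
        S_oo = Sigma[np.ix_(o, o)]
        S_mo = Sigma[np.ix_(m, o)]
        gain = np.linalg.solve(S_oo, S_mo.T).T          # S_mo S_oo^{-1}
        cond_mean = mu[m] + gain @ (x[o] - mu[o])
        cond_cov = Sigma[np.ix_(m, m)] - gain @ S_mo.T
        cond_cov = 0.5 * (cond_cov + cond_cov.T)        # enforce symmetry
    else:
        cond_mean, cond_cov = mu[m], Sigma[np.ix_(m, m)]
    completed[:, m] = rng.multivariate_normal(cond_mean, cond_cov, size=n_samples)
    return completed
```

Each completed copy can then be passed to a classifier trained on complete data, with the predictions averaged over the copies.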

Reduced Models Framework

Reduced Models: Each observed data subspace defined by a pattern of missing data gives a separate classification problem.

[Figure panels: R=[0,1]; R=[1,1]; R=[1,0]]
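
A minimal sketch of how the reduced models framework partitions a data set: cases are grouped by their response pattern R, and each group keeps only its observed dimensions, giving one classification problem per pattern. The NaN encoding and the function name are assumptions for illustration. Training a classifier on each group is left out.

```python
import numpy as np

def split_by_response_pattern(X, y):
    """Group cases by missing-data pattern, keeping only observed dimensions.

    Returns {pattern tuple: (X_sub, y_sub)} with one entry per pattern.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    R = (~np.isnan(X)).astype(int)            # response indicators, 1 = observed
    problems = {}
    for pattern in np.unique(R, axis=0):
        rows = np.all(R == pattern, axis=1)   # cases sharing this pattern
        cols = pattern.astype(bool)           # dimensions observed under it
        key = tuple(int(v) for v in pattern)
        problems[key] = (X[rows][:, cols], y[rows])
    return problems
```

With D features there can be up to 2^D patterns, so rare patterns may leave very few cases per classifier.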

Response Augmentation Framework

Response Augmentation: Set missing features to zero and augment feature representation with response indicators.

X̃ = [0.0, 1.1, 0, 1]

X̃ = [2.0, 1.1, 1, 1]

X̃ = [-2.8, 0, 1, 0]
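
The augmented representation in the three examples above can be reproduced with a short sketch. It assumes missing entries are encoded as NaN in the input; the function name is illustrative.

```python
import numpy as np

def response_augment(X):
    """Zero-fill missing features and append the response indicators."""
    X = np.asarray(X, dtype=float)
    R = ~np.isnan(X)                   # True where the feature is observed
    X_zero = np.where(R, X, 0.0)       # missing features set to zero
    return np.hstack([X_zero, R.astype(float)])

X = np.array([[np.nan, 1.1],
              [2.0, 1.1],
              [-2.8, np.nan]])
X_aug = response_augment(X)            # rows match the three vectors above
```

The indicators let a linear classifier learn a separate offset for each missing feature, which zero imputation alone cannot do.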

Logistic Regression: Model

Linear logistic regression optimizes the conditional likelihood of the class labels given the features using gradient methods.

• Can exactly represent the class posterior of exponential family class conditional models with shared dispersion.
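
As a rough sketch of the optimization described above, binary linear logistic regression can be fit by plain gradient descent on the mean negative log conditional likelihood. The step size, iteration count, L2 penalty, and function names are illustrative assumptions. The slides do not specify which gradient method was used.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic_regression(X, y, lr=0.5, n_iters=2000, l2=0.0):
    """Minimize the mean negative log conditional likelihood of labels y in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)         # P(y = 1 | x) under the current model
        err = p - y                    # gradient of the loss w.r.t. the logits
        w -= lr * (X.T @ err / N + l2 * w)
        b -= lr * err.mean()
    return w, b

def predict_proba(X, w, b):
    return sigmoid(np.asarray(X, dtype=float) @ w + b)
```

Any of the missing-data frameworks can be applied first, with the imputed, reduced, or augmented features passed in as `X`.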

Logistic Regression: Synthetic Data

[Figure panels: Training Data; Zero Imputation; Mean Imputation]

Logistic Regression: Synthetic Data

[Figure panels: Mix FA Imputation (Q=1, K=3); Mix FA Imputation (Q=1, K=2); Mix FA Imputation (Q=1, K=1)]

Logistic Regression: Synthetic Data

[Figure panels: Response Augmented; Reduced]

Kernel Logistic Regression: Model

Kernel logistic regression optimizes the conditional likelihood of the class labels given training data and a kernel function.
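
A minimal sketch of binary kernel logistic regression, assuming the decision function f(x) = sum_i alpha_i k(x_i, x) + b and a penalty (l2/2) alpha^T K alpha. The trainer takes a precomputed kernel matrix, so any kernel can be substituted. The standard Gaussian kernel on complete data shown here is only a placeholder, and the step size suits small problems only. All names and hyperparameters are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Standard Gaussian kernel matrix between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_klr(K, y, l2=1e-2, lr=0.5, n_iters=2000):
    """Gradient descent on the mean negative log conditional likelihood."""
    y = np.asarray(y, dtype=float)
    N = K.shape[0]
    alpha, b = np.zeros(N), 0.0
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha + b)))
        err = p - y
        # gradient of the mean NLL plus the (l2/2) alpha^T K alpha penalty
        alpha -= lr * (K @ err / N + l2 * (K @ alpha))
        b -= lr * err.mean()
    return alpha, b

def klr_predict_proba(K_test_train, alpha, b):
    """K_test_train[i, j] = k(test case i, training case j)."""
    return 1.0 / (1.0 + np.exp(-(K_test_train @ alpha + b)))
```

Because only kernel values enter the model, missing data can be handled entirely inside the kernel function.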

Kernel Logistic Regression:

Basic Kernels for Missing Data

Linear:

Polynomial:

Gaussian:

Kernel Logistic Regression:

Response Augmented Kernels for Missing Data

Linear:

Polynomial:

Gaussian:

Polynomial Kernel Logistic Regression

[Figure panels: Training Data; Zero Imputation; Mean Imputation; Augmented; Reduced; Mix FA Imputation]

Gaussian Kernel Logistic Regression

[Figure panels: Training Data; Zero Imputation; Mean Imputation; Augmented; Reduced; Mix FA Imputation]

Neural Networks: Model

A multi-layer sigmoid neural network with cross-entropy loss optimizes the conditional likelihood of the class labels given the features using backpropagation.
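
A small sketch of such a network for the binary case: one sigmoid hidden layer, a sigmoid output, cross-entropy loss, and gradients computed by backpropagation. The layer size, learning rate, initialization, and full-batch gradient descent are illustrative assumptions rather than the settings used in the thesis.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_sigmoid_net(X, y, n_hidden=4, lr=0.5, n_iters=2000, seed=0):
    """One-hidden-layer sigmoid network trained with cross-entropy loss."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    N, D = X.shape
    W1 = rng.normal(scale=0.5, size=(D, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)
    losses = []
    for _ in range(n_iters):
        # forward pass
        H = sigmoid(X @ W1 + b1)
        P = sigmoid(H @ W2 + b2)
        losses.append(-np.mean(y * np.log(P + 1e-12)
                               + (1 - y) * np.log(1 - P + 1e-12)))
        # backward pass: gradients of the mean cross-entropy
        dZ2 = (P - y) / N
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        # gradient descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def net_predict_proba(X, params):
    W1, b1, W2, b2 = params
    H = sigmoid(np.asarray(X, dtype=float) @ W1 + b1)
    return sigmoid(H @ W2 + b2).ravel()
```

Minimizing the mean cross-entropy is the same as maximizing the conditional likelihood of the labels, which is why the loss appears in that form.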

Neural Networks:

[Figure panels: Training Data; Zero Imputation; Mean Imputation; Augmented; Reduced; Mix FA Imputation]

Classification with Missing Data:

UCI Hepatitis

Classification with Missing Data:

UCI Thyroid-AllHypo

Classification with Missing Data:

UCI Thyroid-Sick

Classification with Missing Data:

MNIST Digit Classification with Missing Data

Classification:

MNIST Digit Classification with Missing Data

Classification:

gKLR Augmented Kernel Details

Classification:

gKLR Augmented Kernel Details

[Figure panels: Raw Kernel Matrix; Adjusted Kernel Matrix]

The End

