Sample Selection Bias – Covariate Shift: Problems, Solutions, and Applications. Wei Fan, IBM T.J. Watson Research; Masashi Sugiyama, Tokyo Institute of Technology. Updated PPT is available: http://www.weifan.info/tutorial.htm
Page 1:

Sample Selection Bias – Covariate Shift: Problems, Solutions, and Applications

Wei Fan, IBM T.J.Watson Research

Masashi Sugiyama, Tokyo Institute of Technology

Updated PPT is available:

http://www.weifan.info/tutorial.htm

Page 2:

Overview of Sample Selection Bias Problem

Page 3:

A Toy Example

Two classes: red and green

red: f2 > f1; green: f2 <= f1
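The toy example can be simulated in a few lines. This is our own minimal sketch (sample sizes and the selection-probability function are invented for illustration): points in the unit square are labeled by the rule above, and a feature-biased sample is drawn by making selection depend only on f1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points in the unit square, labeled by the slide's rule.
X = rng.uniform(0.0, 1.0, size=(5000, 2))
y = np.where(X[:, 1] > X[:, 0], "red", "green")

# "Not-so-biased" sample: every point equally likely.
unbiased = rng.choice(len(X), size=500, replace=False)

# Feature-biased sample: selection probability depends only on f1
# (points with small f1 are much more likely to be kept).
p_select = np.exp(-5.0 * X[:, 0])
p_select /= p_select.sum()
biased = rng.choice(len(X), size=500, replace=False, p=p_select)

# P(y|x) is unchanged, but P(x) differs between the two samples.
unbiased_mean_f1 = X[unbiased, 0].mean()
biased_mean_f1 = X[biased, 0].mean()
```

The class rule is identical in both samples; only the input distribution shifts, which is exactly the covariate-shift setting discussed next.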

Page 4:

Unbiased and Biased Samples

Not-so-biased sampling vs. biased sampling

Page 5:

Effect on Learning

Unbiased 97.1% vs. biased 92.1%; unbiased 96.9% vs. biased 95.9%; unbiased 96.405% vs. biased 92.7%.

• Some techniques are more sensitive to bias than others.

• One important question: how to reduce the effect of sample selection bias?

Page 6:

Ubiquitous

• Loan approval
• Drug screening
• Weather forecasting
• Ad campaigns
• Fraud detection
• User profiling
• Biomedical informatics
• Intrusion detection
• Insurance
• etc.

1. Normally, banks only have data on their own customers.
2. "Late payment, default" models are computed using their own data.
3. New customers may not completely follow the same distribution.

Page 7:

Face Recognition

• Sample selection bias:
– Training samples are taken inside a research lab, where there are few women.
– Test samples come from the real world, where the men-women ratio is almost 50:50.

The Yale Face Database B

Page 8:

Brain-Computer Interface (BCI)

• Control computers by EEG signals:
– Input: EEG signals
– Output: Left or Right

Figure provided by Fraunhofer FIRST, Berlin, Germany

Page 9:

Training

• Imagine left/right-hand movement following the letter on the screen.

Movie provided by Fraunhofer FIRST, Berlin, Germany

Page 10:

Testing: Playing Games

• "Brain-Pong"

Movie provided by Fraunhofer FIRST, Berlin, Germany

Page 11:

Non-Stationarity in EEG Features

Bandpower differences between training and test phases

• Different mental conditions (attention, sleepiness etc.) between training and test phases may change the EEG signals.

Features extracted from brain activity during training and test phases

Figures provided by Fraunhofer FIRST, Berlin, Germany

Page 12:

Robot Control by Reinforcement Learning

• Let the robot learn how to autonomously move without explicit supervision.

Khepera Robot

Page 13:

Rewards

• Give the robot rewards:
– Go forward: positive reward
– Hit wall: negative reward

• Goal: Learn the control policy that maximizes future rewards

Robot moves autonomously = goes forward without hitting the wall.

Page 14:

Example

• After learning:

Page 15:

Policy Iteration and Covariate Shift

• Policy iteration:

• Updating the policy corresponds to changing the input distribution!

Evaluate control policy ⇄ Improve control policy

Page 16:

Different Types of Sample Selection Bias

Page 17:

Bias as Distribution

• Think of "sampling an example (x,y) into the training data" as an event denoted by a random variable s:
– s=1: example (x,y) is sampled into the training data
– s=0: example (x,y) is not sampled

• Think of bias as a conditional probability of “s=1” dependent on x and y

• P(s=1|x,y) : the probability for (x,y) to be sampled into the training data, conditional on the example’s feature vector x and class label y.

Page 18:

Categorization (Zadrozny'04, Fan et al.'05, Fan and Davidson'07)

– No Sample Selection Bias: P(s=1|x,y) = P(s=1)

– Feature Bias / Covariate Shift: P(s=1|x,y) = P(s=1|x)

– Class Bias: P(s=1|x,y) = P(s=1|y)

– Complete Bias: no further reduction
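One of these categories is easy to demonstrate numerically. The sketch below (our own toy setup; the 0.9/0.1 keep-probabilities are invented) simulates class bias, P(s=1|x,y) = P(s=1|y): selection depends only on the label, so the class prior in the sampled data shifts even though P(y|x) is untouched.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20000
x = rng.normal(size=n)
# Labels drawn from a logistic P(y=1|x); overall prior is about 0.5.
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * x))).astype(int)

# Class bias: P(s=1|x,y) = P(s=1|y) -- keep positives with prob 0.9,
# negatives with prob 0.1, independent of x.
keep = rng.random(n) < np.where(y == 1, 0.9, 0.1)

prior_full = y.mean()          # close to 0.5 by symmetry
prior_biased = y[keep].mean()  # close to 0.9 after class-biased selection
```

The same template with `keep` depending on `x` instead of `y` gives feature bias, and a constant keep-probability gives the no-bias case.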

Page 19:

Bias for a Training Set

• How P(s=1|x,y) is computed:

• Practically, for a given training set D:
– P(s=1|x,y) = 1 if (x,y) is sampled into D
– P(s=1|x,y) = 0 otherwise

• Alternatively, consider that a dataset D of this size could be sampled "exhaustively" from the universe of examples.

Page 20:

Are Realistic Datasets Biased?

• Most datasets are biased.

• Unlikely to sample each and every feature vector.

• For most problems, it is at least feature bias: P(s=1|x,y) = P(s=1|x).

Page 21:

Effect on Learning

• Learning algorithms estimate the "true conditional probability":
– True probability P(y|x), such as P(fraud|x).
– Estimated probability P(y|x,M), where M is the model built.

• Conditional probability in the biased data: P(y|x,s=1).

• Key issue: is P(y|x,s=1) = P(y|x)?

Page 22:

Bias Resolutions

Page 23:

Heckman’s Two-Step Approach

• Goal: estimate one's donation amount if one does donate.
• An accurate estimate cannot be obtained by a regression using only data from donors.
• First step: a probit model to estimate the probability to donate.
• Second step: a regression model to estimate the donation amount, correcting for the expected error under the Gaussian assumption.
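The two steps can be sketched on synthetic data. This is our own illustrative implementation (variable names, coefficients, and the 0.7 error correlation are invented, and the probit is fitted by a hand-rolled maximum likelihood rather than an econometrics package): the correlated errors are what bias a naive donors-only regression, and the inverse Mills ratio from step 1 corrects step 2.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 5000

# Synthetic donation data: z drives the decision to donate, x drives the
# amount; the two Gaussian errors are correlated.
z = rng.normal(size=n)
x = rng.normal(size=n)
e_sel, e_out = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=n).T
donate = 0.5 + 1.0 * z + e_sel > 0
amount = 2.0 + 1.5 * x + e_out          # observed only when donate is True

# Step 1: probit model for P(donate | z), fitted by maximum likelihood.
Z = np.column_stack([np.ones(n), z])
def neg_loglik(g):
    p = np.clip(norm.cdf(Z @ g), 1e-10, 1 - 1e-10)
    return -(donate * np.log(p) + (~donate) * np.log(1 - p)).sum()
gamma = minimize(neg_loglik, np.zeros(2)).x

# Step 2: donors-only regression augmented with the inverse Mills ratio,
# which absorbs E[error | donated] under the Gaussian assumption.
idx = Z[donate] @ gamma
mills = norm.pdf(idx) / norm.cdf(idx)
Xd = np.column_stack([np.ones(donate.sum()), x[donate], mills])
beta, *_ = np.linalg.lstsq(Xd, amount[donate], rcond=None)
```

Here `beta[1]` recovers the true amount coefficient (1.5) despite the sample of donors being selected non-randomly.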

Page 24:

Covariate Shift or Feature Bias

• Covariate shift:
– Input distribution changes
– Functional relation remains unchanged

• However, there is no chance for generalization if training and test samples have nothing in common.

Page 25:

Example of Covariate Shift

• (Weak) extrapolation: predict output values outside the training region.

(Figure: training samples vs. test samples.)

Page 26:

Covariate Shift Adaptation

• To illustrate the effect of covariate shift, let's focus on linear extrapolation.

(Figure: training and test samples; true function vs. learned function.)

Page 27:

Generalization Error = Bias + Variance

(Expectation is taken over the noise.)

Page 28:

Model Specification

• A model is said to be correctly specified if some parameter setting reproduces the true function exactly.

• In practice, our model may not be correct.

• Therefore, we need a theory for misspecified models!

Page 29:

Ordinary Least-Squares (OLS)

• If the model is correct: OLS minimizes the bias asymptotically.

• If the model is misspecified: OLS does not minimize the bias even asymptotically.

We want to reduce the bias!

Page 30:

Law of Large Numbers

• The sample average converges to the population mean.

• We want to estimate the expectation over test input points using only training input points.

Page 31:

Key Trick: Importance-Weighted Average

• Importance: the ratio of test and training input densities, w(x) = p_test(x) / p_train(x).

• Importance-weighted average: average w(x_i) f(x_i) over the training points (cf. importance sampling).
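The trick is easy to verify numerically. A minimal sketch, assuming the two densities (and hence the ratio) are known exactly; here f(x) = x², whose expectation is 1 under the training distribution N(0,1) but 2 under the test distribution N(1,1).

```python
import numpy as np

rng = np.random.default_rng(3)

# Training inputs ~ N(0,1), test inputs ~ N(1,1); we want E_test[f(x)]
# using only training points. The density ratio is assumed known here.
def p_train(x): return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
def p_test(x):  return np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)

f = lambda x: x**2
x_tr = rng.normal(0.0, 1.0, size=200000)

naive = f(x_tr).mean()                     # estimates E_train[f] = 1
w = p_test(x_tr) / p_train(x_tr)           # importance weights
weighted = (w * f(x_tr)).mean()            # estimates E_test[f] = 2
```

The plain average stays near 1, while the importance-weighted average over the same training points recovers the test-distribution expectation.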

Page 32:

Importance-Weighted LS

• Even for misspecified models, IWLS minimizes the bias asymptotically.

• We need to estimate the importance in practice.

(The training input density is assumed strictly positive.)

(Shimodaira, JSPI2000)

Page 33:

Use of Unlabeled Samples: Importance Estimation

• Assumption: we have (unlabeled) training inputs and test inputs.

• Naïve approach: estimate the training and test input densities separately, and take the ratio of the density estimates.

• This does not work well, since density estimation is hard in high dimensions.

Page 34:

Vapnik's Principle

"When solving a problem, more difficult problems shouldn't be solved."

• Directly estimating the ratio is easier than estimating the densities (e.g., support vector machines)!

Knowing the densities gives the ratio, but not vice versa.

Page 35:

Modeling Importance Function

• Use a linear importance model: a weighted sum of basis functions, ŵ(x) = Σ_l α_l φ_l(x).

• The test density is then approximated by p̂_test(x) = ŵ(x) p_train(x).

• Idea: learn the parameters α so that p̂_test(x) well approximates p_test(x).

Page 36:

Kullback-Leibler Divergence

• The KL divergence from the test density to its approximation splits into a constant term (independent of α) and a relevant term; only the relevant term matters for learning.

Page 37:

Learning Importance Function

• Thus the objective function is the relevant term: the average log-importance over the test inputs.

• Since p̂_test is a density, it must integrate to one; this gives the constraint that the importance-weighted average over the training inputs equals one.

Page 38:

KLIEP (Kullback-Leibler Importance Estimation Procedure)

• Convexity: unique global solution is available

• Sparse solution: prediction is fast!

(Sugiyama et al., NIPS2007)
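The idea above can be sketched compactly. This is our own simplified implementation (fixed Gaussian kernel width, centers on a subset of test points, plain projected gradient ascent, no cross-validation), not the authors' reference code: maximize the mean log-importance at the test points subject to non-negativity and the normalization constraint.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 1-D covariate shift: training ~ N(0,1), test ~ N(1,1).
x_tr = rng.normal(0.0, 1.0, size=(500, 1))
x_te = rng.normal(1.0, 1.0, size=(500, 1))

# Gaussian kernel basis centered on a subset of the test points.
centers = x_te[:100]
sigma = 1.0
def K(a, b):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

Phi_te = K(x_te, centers)    # basis values at test points
Phi_tr = K(x_tr, centers)    # basis values at training points
b = Phi_tr.mean(axis=0)      # constraint vector: mean training importance

# Maximize mean log w(x_test) s.t. alpha >= 0 and mean_train(w) = 1.
alpha = np.ones(len(centers))
alpha /= alpha @ b
for _ in range(1000):
    w_te = np.maximum(Phi_te @ alpha, 1e-12)
    grad = (Phi_te / w_te[:, None]).mean(axis=0)  # gradient of mean log w
    alpha = np.maximum(alpha + 1e-3 * grad, 0.0)  # non-negativity
    alpha /= alpha @ b                            # normalization constraint

w_tr = Phi_tr @ alpha        # estimated importance at the training points
```

The estimated importance is larger where the test density exceeds the training density (here, for larger x), and it averages to one over the training sample by construction.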

Page 39:

Examples

Page 40:

Experiments: Setup

• Input distributions: standard Gaussians with
– Training: mean (0,0,…,0)
– Test: mean (1,0,…,0)

• Kernel density estimation (KDE):
– Separately estimate the training and test input densities.
– The Gaussian kernel width is chosen by likelihood cross-validation.

• KLIEP:
– The Gaussian kernel width is chosen by likelihood cross-validation.

Page 41:

Experimental Results

• KDE: the error increases as the input dimension grows.

• KLIEP: the error remains small even for large dimension.

(Figure: normalized MSE vs. dim for KDE and KLIEP.)

Page 42:

Ensemble Methods (Fan and Davidson'07)

• Posterior weighting: class probability obtained by integration over model space.
• Averaging of estimated class probabilities, weighted by the posterior.
• Removes model uncertainty by averaging.

Page 43:

How to Use Them

• Estimate the "joint probability" P(x,y) instead of just the conditional probability:
– P(x,y) = P(y|x)P(x)
– Makes no difference with one model, but it does with multiple models.

Page 44:

Examples of How This Works

• P1(+|x) = 0.8 and P2(+|x) = 0.4

• P1(-|x) = 0.2 and P2(-|x) = 0.6

• Model averaging:
– P(+|x) = (0.8 + 0.4) / 2 = 0.6
– P(-|x) = (0.2 + 0.6) / 2 = 0.4
– The prediction will be +

Page 45:

• But suppose the two P(x) models assign probabilities 0.05 and 0.4. Then:
– P(+,x) = 0.05 × 0.8 + 0.4 × 0.4 = 0.2
– P(-,x) = 0.05 × 0.2 + 0.4 × 0.6 = 0.25

• Recall that with model averaging, P(+|x) = 0.6 and P(-|x) = 0.4, so the prediction is +.
• But now the prediction will be – instead of +.

• Key idea: unlabeled examples can be used as "weights" to re-weight the models.
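The slides' arithmetic can be verified directly; the snippet below reproduces the numbers and shows how weighting by each model's P(x) flips the decision.

```python
# The slide's numbers: two classifiers' conditional estimates and the
# two models' P(x) estimates (obtainable from unlabeled data).
p1_pos, p2_pos = 0.8, 0.4          # P1(+|x), P2(+|x)
px1, px2 = 0.05, 0.4               # P(x) under model 1 and model 2

# Plain averaging of conditional probabilities predicts +.
avg_pos = (p1_pos + p2_pos) / 2                      # 0.6
avg_neg = ((1 - p1_pos) + (1 - p2_pos)) / 2          # 0.4

# Reweighting by each model's P(x), i.e. joint probabilities, flips it.
joint_pos = px1 * p1_pos + px2 * p2_pos              # 0.2
joint_neg = px1 * (1 - p1_pos) + px2 * (1 - p2_pos)  # 0.25
```

With joint probabilities, the model that better fits the input distribution (P(x) = 0.4) dominates, so the prediction becomes –.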

Page 46:

Structure Discovery (Ren et al'08)

Original dataset → structural discovery → structural re-balancing → corrected dataset.

Page 47:

Active Learning

• The quality of the learned function depends on the training input location.

• Goal: optimize the training input location.

(Figure: good vs. poor input location; target vs. learned function.)

Page 48:

Challenges

• The generalization error is unknown and needs to be estimated.

• In experiment design, we do not have the training output values yet.

• Thus we cannot use, e.g., cross-validation, which requires output values.

• Only the training input positions can be used in generalization error estimation!

Page 49:

Agnostic Setup

• The model is not correct in practice.

• Then OLS is not consistent.

• Standard “experiment design” method does not work!

(Fedorov 1972; Cohn et al., JAIR1996)

Page 50:

Bias Reduction by Importance-Weighted LS (IWLS)

• The use of IWLS mitigates the inconsistency problem under the agnostic setup.

• The importance is known in the active learning setup, since the training input distribution is designed by us!

(Wiens JSPI2001; Kanamori & Shimodaira JSPI2003; Sugiyama JMLR2006)

Page 51:

Model Selection and Testing

Page 52:

Model Selection

• The choice of model is crucial.

• We want to determine the model so that the generalization error is minimized.

(Figure: polynomials of order 1, 2, and 3.)

Page 53:

Generalization Error Estimation

• The generalization error is not accessible, since the target function is unknown.

• Instead, we use a generalization error estimate.

(Figure: error vs. model complexity.)

Page 54:

Cross-Validation

• Divide the training samples into k groups.
• Train a learning machine with k−1 groups.
• Validate the trained machine on the remaining group.
• Repeat this for all combinations and output the mean validation error.

• CV is almost unbiased without covariate shift.
• But it is heavily biased under covariate shift!

(Group 1, Group 2, …, Group k−1, Group k: training vs. validation.)

Page 55:

Importance-Weighted CV (IWCV)

• When testing the classifier in the CV process, we also importance-weight the validation error.

• IWCV gives an almost unbiased estimate of the generalization error even under covariate shift.

(Set 1, Set 2, …, Set k−1, Set k: training vs. testing.)

(Zadrozny ICML2004; Sugiyama et al., JMLR2007)
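The procedure can be sketched generically. A minimal version of IWCV, assuming the importance values for the training points are given (here the exact ratio for a known Gaussian shift); `fit` and `predict` are placeholder hooks, not a specific library API.

```python
import numpy as np

rng = np.random.default_rng(6)

def iwcv_error(x, y, w, fit, predict, k=5):
    """k-fold CV where each held-out loss is importance-weighted."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(x[trn], y[trn])
        loss = (predict(model, x[val]) - y[val]) ** 2
        errs.append(np.average(loss, weights=w[val]))  # importance weighting
    return float(np.mean(errs))

# Usage: training x ~ N(0,1); the test distribution is N(1,1), so the
# density ratio is exp(x - 0.5), known here for illustration.
x = rng.normal(0.0, 1.0, size=1000)
y = np.sin(x) + 0.1 * rng.normal(size=1000)
w = np.exp(x - 0.5)

fit = lambda xs, ys: np.polyfit(xs, ys, 1)
predict = lambda m, xs: np.polyval(m, xs)
err = iwcv_error(x, y, w, fit, predict)
```

Setting all weights to one recovers ordinary CV; the weighted version targets the test-distribution error instead of the training-distribution error.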

Page 56:

Example of IWCV

• IWCV gives better estimates of generalization error.

• Model selection by IWCV outperforms CV!

Page 57:

Reserve Testing (Fan and Davidson'06)

• Train algorithms A and B on the training data, obtaining models MA and MB.
• Use MA and MB to label the test data, producing labeled sets DA and DB.
• Train both algorithms on DA and on DB, and evaluate each resulting model on the labeled training data; this yields the four scores MAA, MAB, MBA, and MBB.
• Estimate the performance of MA and MB based on the order of MAA, MAB, MBA, and MBB.

Page 58:

Rule

• If "A's labeled test data" can construct "more accurate models" for both algorithms A and B evaluated on the labeled training data, then A is expected to be more accurate:
– If MAA > MAB and MBA > MBB, then choose A.
• Similarly:
– If MAA < MAB and MBA < MBB, then choose B.
• Otherwise, undecided.
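The decision rule maps directly onto a small function; this sketch simply mirrors the comparisons on the slide, taking the four scores as inputs.

```python
def reserve_test(maa, mab, mba, mbb):
    """Apply the reserve-testing decision rule to the four cross scores."""
    if maa > mab and mba > mbb:
        return "A"
    if maa < mab and mba < mbb:
        return "B"
    return "undecided"
```

Because the rule requires agreement in both comparisons, conflicting orderings deliberately produce "undecided" rather than a forced choice.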

Page 59:

Why Won't CV Work?

(Figure: a sparse region in the input space.)

Page 60:

Examples

Page 61:

Ozone Day Prediction (Zhang et al’06)

– Daily summary maps of two datasets from Texas Commission on Environmental Quality (TCEQ)

Page 62:

Challenges as a Data Mining Problem

1. Rather skewed and relatively sparse distribution:
– 2500+ examples over 7 years (1998-2004).
– 72 continuous features with missing values.
– Large instance space: if the features were binary and uncorrelated, 2^72 is an astronomical number.
– 2% and 5% true positive ozone days for the 1-hour and 8-hour peaks, respectively.

Page 63:

3. A large number of irrelevant features:
– Only about 10 out of 72 features are verified to be relevant.
– There is no information on the relevancy of the other 62 features.
– For a stochastic problem with irrelevant features Xir, where X = (Xr, Xir), P(Y|X) = P(Y|Xr) holds only if the data is exhaustive.
– Otherwise this may introduce overfitting and change the probability distribution represented in the data:
• P(Y = "ozone day" | Xr, Xir) → 1
• P(Y = "normal day" | Xr, Xir) → 0

Page 64:

4. "Feature sample selection bias":
– Given 7 years of data and 72 continuous features, it is hard to find many days in the training data that are very similar to a day in the future.
– This raises two closely related challenges:
1. How to train an accurate model.
2. How to effectively use the model to predict the future under a different and yet unknown distribution.

(Figure: training vs. testing distributions.)

Page 65:

Reliable probability estimation under irrelevant features:

– Recall that, due to irrelevant features:
• P(Y = "ozone day" | Xr, Xir) → 1
• P(Y = "normal day" | Xr, Xir) → 0
– Construct multiple models and average their predictions.

• P("ozone" | xr): true probability.
• P("ozone" | Xr, Xir, θ): probability estimated by model θ.
• MSE_SingleModel: difference between "true" and "estimated".
• MSE_Average: difference between "true" and the "average of many models".
• One can formally show that MSE_Average ≤ MSE_SingleModel.
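A toy numerical illustration of that inequality (not the slides' formal proof; the true probability 0.7 and noise level are invented): averaging many noisy probability estimates cannot do worse than the expected single-model squared error.

```python
import numpy as np

rng = np.random.default_rng(8)

p_true = 0.7                      # true P("ozone day" | x_r), illustrative
# Probability estimates from many models, each noisy (e.g., due to
# irrelevant features), clipped to the valid [0, 1] range.
estimates = np.clip(p_true + 0.2 * rng.normal(size=1000), 0.0, 1.0)

mse_single = np.mean((estimates - p_true) ** 2)   # expected single-model MSE
mse_average = (estimates.mean() - p_true) ** 2    # MSE of the averaged estimate
```

The averaged estimate cancels the independent per-model noise, so its squared error is far below the average per-model error, consistent with MSE_Average ≤ MSE_SingleModel.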

Page 66:

• Prediction with feature sample selection bias: a CV-based procedure for decision threshold selection.
– Run 10-fold CV of the learning algorithm on the training set, producing estimated probability values for each fold.
– Concatenate the folds into a "probability - true label" file, e.g.:

P(y="ozone day"|x,θ)  Label
7/1/98  0.1316  Normal
7/2/98  0.6245  Ozone
7/3/98  0.5944  Ozone
…

– From a precision-recall plot over this file, select the decision threshold VE.

(Figure: the 10-fold CV flow, the precision-recall curves for the candidate models, and the training vs. testing distributions.)

Page 67:

Addressing Data Mining Challenges

• Prediction with feature sample selection bias: future prediction is based on the selected decision threshold.
– Train a model θ on the whole training set.
– Classification on future days: if P(Y = "ozone day" | X, θ) ≥ VE, predict "ozone day".

Page 68:

Results

Page 69:

KDD/Netflix Cup'07 Task 1 (Liu and Kou '07)

Page 70:

Task 1: Who rated what in 2006

• Given a list of 100,000 pairs of users and movies, predict for each pair the probability that the user rated the movie in 2006.
• Result: a close runner-up, No. 3 out of 39 teams.

• Challenges:
– Huge amount of data: how to sample the data so that any learning algorithm can be applied is critical.
– Complex affecting factors: decreasing interest in old movies, and a growing tendency of Netflix users to watch (review) more movies.

Page 71:

NETFLIX data generation process

(Figure: a rating matrix of roughly 17K movies by users over time; movie and user arrivals occur through 1998-2005, the Task 1/Task 2 training data end in 2005, and the 2006 qualifier dataset of about 3M pairs has no new user or movie arrivals.)

Page 72:

Task 1: Effective Sampling Strategies

• Sample movie-user pairs for "existing" users and "existing" movies, using 2004-2005 data as the training set and Q4 2005 as the development set.
– The probability of picking a movie was proportional to the number of ratings that movie received; the same strategy was used for users.

(Figure: per-movie and per-user sampling probabilities, sampled history pairs, and raw rating records.)
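The proportional sampling strategy is straightforward to sketch. The counts below are invented for illustration (not real Netflix numbers): each movie and user is drawn with probability proportional to how many ratings it received.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative rating counts per movie and per user.
movie_ratings = np.array([5000, 1200, 800, 300, 50])
user_ratings = np.array([900, 400, 100, 20])

# Selection probability proportional to the number of ratings received.
p_movie = movie_ratings / movie_ratings.sum()
p_user = user_ratings / user_ratings.sum()

# Draw candidate (movie, user) training pairs.
movies = rng.choice(len(movie_ratings), size=10000, p=p_movie)
users = rng.choice(len(user_ratings), size=10000, p=p_user)
pairs = np.column_stack([movies, users])
```

Popular movies and active users dominate the sampled pairs, matching the activity pattern the task's test pairs were expected to follow.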

Page 73:

• Learning algorithms:
– Single classifiers: logistic regression, ridge regression, decision tree, support vector machines.
– Naïve ensemble: combining sub-classifiers built on different types of features with preset weights.
– Ensemble classifiers: combining sub-classifiers with weights learned from the development set.

Page 74:

Brain-Computer Interface (BCI)

• Control computers by brain signals:
– Input: EEG signals
– Output: Left or Right

Page 75:

BCI Results

• When the KL divergence is large, covariate shift adaptation tends to improve accuracy.

• When the KL divergence is small, there is no difference.

Subject  Trial  No adaptation  With adaptation  KL
1        1      9.3%           10.0%            0.76
1        2      8.8%           8.8%             1.11
1        3      4.3%           4.3%             0.69
2        1      40.0%          40.0%            0.97
2        2      39.3%          38.7%            1.05
2        3      25.5%          25.5%            0.43
3        1      36.9%          34.4%            2.63
3        2      21.3%          19.3%            2.88
3        3      22.5%          17.5%            1.25
4        1      21.3%          21.3%            9.23
4        2      2.4%           2.4%             5.58
4        3      6.4%           6.4%             1.83
5        1      21.3%          21.3%            0.79
5        2      15.3%          14.0%            2.01

(KL: KL divergence from the training to the test input distribution.)

Page 76:

Robot Control by Reinforcement Learning

• Swing-up inverted pendulum:
– Swing up the pole by controlling the cart.
– Reward:

Page 77:

Results

(Figure: covariate shift adaptation vs. existing methods (a) and (b).)

Page 78:

Demo: Proposed Method

Page 79:

Wafer Alignment in Semiconductor Exposure Apparatus

• Recent silicon wafers have a layer structure.

• Circuit patterns are exposed multiple times.

• Exact alignment of wafers is very important.

Page 80:

Markers on Wafer

• Wafer alignment process:
– Measure the marker locations printed on the wafers.
– Shift and rotate the wafer to minimize the gap.

• For speed, reducing the number of markers to measure is very important.

Active learning problem!

Page 81:

Non-linear Alignment Model

• When the gap is only shift and rotation, a linear model is exact.

• However, non-linear factors exist, e.g.:
– Warp
– Biased characteristics of the measurement apparatus
– Different temperature conditions

• Exactly modeling the non-linear factors is very difficult in practice!

Agnostic setup!

Page 82:

Experimental Results

• IWLS-based active learning works very well!

• 20 markers (out of 38) are chosen by the experiment design methods; the gaps of all markers are predicted; this is repeated for 220 different wafers.
• Entries are the mean (standard deviation) of the gap prediction error. Red: significantly better by the 5% Wilcoxon test. Blue: worse than the baseline passive method.

Mean squared error of wafer position estimation:
IWLS-based: 2.27 (1.08) | OLS-based: 2.37 (1.15) | "Outer" heuristic: 2.36 (1.15) | Passive: 2.32 (1.11)

(Sugiyama & Nakajima ECML-PKDD2008)

Page 83:

Conclusions

Page 84:

Book on Dataset Shift

• Quiñonero-Candela, Sugiyama, Schwaighofer & Lawrence (Eds.), Dataset Shift in Machine Learning, MIT Press, Cambridge, 2008.

