Actively Transfer Domain Knowledge


Xiaoxiao Shi† Wei Fan‡ Jiangtao Ren†

†Sun Yat-sen University ‡IBM T. J. Watson Research Center

Transfer when you can, otherwise ask and don’t stretch it

Standard Supervised Learning

New York Times

training (labeled)

test (unlabeled)

Classifier

New York Times

In Reality……

New York Times

training (labeled)

test (unlabeled)

New York Times

Labeled data are insufficient!

How to improve the performance?

Solution I : Active Learning

New York Times

training (labeled)

test (unlabeled)

Classifier

New York Times

Label

Domain Expert

Labeling Cost

Solution II : Transfer Learning

Reuters

Out-of-domain training (labeled)

In-domain test (unlabeled)

Transfer Classifier

New York Times

No guarantee transfer learning could help!

Accuracy drops

Significant Differences

82.6% → 43.5%

Motivation

• Active Learning:
– Labeling cost

• Transfer Learning:
– Domain difference risk

Both have disadvantages; which one to choose?

Active Learner choose

Proposed Solution (AcTraK)

Reuters

Transfer Classifier

Domain Expert

Unreliable

Decision Function

Reliable, label by the classifier

Classification Result

Labeled

Training

Unlabeled in-domain Training Data

Out-of-domain training (labeled)

Transfer Classifier

– X: In-domain unlabeled

1. Classify X by out-of-domain Mo: P(L+|X, Mo) and P(L-|X, Mo).

2. Classify X by mapping classifiers ML+ and ML-: P(+|X, ML+) and P(+|X, ML-).

3. Then the probability for X to be “+” is:

T(X) = P(+|X) = P(L+|X, Mo) × P(+|X, ML+) + P(L-|X, Mo) × P(+|X, ML-)
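The three steps above can be sketched in a few lines of Python. This is only an illustration of the formula for T(X), not the authors' implementation: the function name `transfer_probability` and its arguments are hypothetical, and it assumes a binary problem in which P(L-|X, Mo) = 1 - P(L+|X, Mo).

```python
def transfer_probability(p_lplus_mo, p_plus_mlplus, p_plus_mlminus):
    """Return T(x) = P(L+|x,Mo)*P(+|x,ML+) + P(L-|x,Mo)*P(+|x,ML-).

    p_lplus_mo     -- P(L+|x, Mo), from the out-of-domain classifier Mo
    p_plus_mlplus  -- P(+|x, ML+), from the mapping classifier ML+
    p_plus_mlminus -- P(+|x, ML-), from the mapping classifier ML-
    """
    for p in (p_lplus_mo, p_plus_mlplus, p_plus_mlminus):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Assumption: two out-of-domain labels, so P(L-|x, Mo) is the complement.
    p_lminus_mo = 1.0 - p_lplus_mo
    return p_lplus_mo * p_plus_mlplus + p_lminus_mo * p_plus_mlminus


# Mo says "L+" with probability 0.8; ML+ maps L+ to "+" with 0.9,
# ML- maps L- to "+" with 0.2.
t_x = transfer_probability(0.8, 0.9, 0.2)
print(round(t_x, 2))
```

If Mo is certain the instance is "L+", T(X) reduces to the output of ML+ alone, which is the behaviour the formula implies.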

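[Figure: the transfer classifier. The out-of-domain dataset (labeled) trains Mo, which gives P(L+|X, Mo) and P(L-|X, Mo). The in-domain labeled data (very few) train the mapping classifiers, which give P(+|X, ML+) and P(+|X, ML-). Each in-domain labeled example falls into one of four cases (in-domain label / Mo label): -/L-, -/L+, +/L-, +/L+.]

L+ = { (x, y=+/-) | Mo(x) = ‘L+’ }: the true in-domain label may be either ‘-’ or ‘+’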

Our Solution (AcTraK)


When the transfer classifier's prediction is unreliable, ask the domain experts.

Decision Function

Transfer Classifier

• In the following cases, ask the domain expert, not the transfer classifier, to label the instance:

a) Conflict
b) Low confidence
c) Few labeled in-domain examples

Decision Function

a) Conflict? b) Confidence? c) Size?

Decision Function:

Label by Transfer Classifier

Label by Domain Expert

R: random number in [0, 1]

AcTraK asks the domain expert to label the instance with probability of

T(x): prediction by the transfer classifier

ML(x): prediction given by the in-domain classifier
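The query probability itself did not survive in this transcript, so the sketch below does not reproduce it. It only illustrates how the three criteria (conflict, confidence, size) and the random number R could be combined. The formula inside `query_probability`, the `scale` parameter, and both function names are assumptions made for illustration, not AcTraK's actual decision function.

```python
import math
import random


def query_probability(t_x, ml_label, n_labeled, scale=10.0):
    """Hypothetical probability of asking the domain expert.

    t_x       -- T(x), the transfer classifier's probability that x is "+"
    ml_label  -- ML(x), the in-domain classifier's label, "+" or "-"
    n_labeled -- number of labeled in-domain examples collected so far
    scale     -- assumed constant controlling how fast the size term grows
    """
    t_label = "+" if t_x >= 0.5 else "-"
    agree = 1.0 if t_label == ml_label else 0.0       # a) conflict -> 0
    confidence = abs(2.0 * t_x - 1.0)                 # b) 0 near 0.5, 1 near 0 or 1
    size = 1.0 - math.exp(-n_labeled / scale)         # c) 0 with no labels
    # Any failing criterion pushes the query probability towards 1.
    return 1.0 - agree * confidence * size


def ask_expert(t_x, ml_label, n_labeled, rng=random):
    """Draw R in [0, 1) and query the expert when R falls below the probability."""
    r = rng.random()
    return r < query_probability(t_x, ml_label, n_labeled)


# A conflict always triggers a query under this sketch.
print(ask_expert(0.9, "-", 100))
```

Under these assumptions, a conflict or an empty labeled set gives a query probability of 1, while a confident, agreeing prediction backed by many labeled examples is almost always accepted from the transfer classifier.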

• It can reduce domain difference risk.
– According to Theorem 2, the expected error is bounded.

• It can reduce labeling cost.
– According to Theorem 3, the query probability is bounded.

Properties

Theorems

expected error of the transfer classifier

Maximum size

• Data Sets
– Synthetic data sets
– Remote Sensing: data collected from regions with a specific ground surface condition, applied to data collected from a new region
– Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup)

• Comparable Models
– Inductive Learning model: AdaBoost, SVM
– Transfer Learning model: TrAdaBoost (ICML’07)
– Active Learning model: ERS (ICML’01)

Experiment setup

Experiments on Synthetic Datasets

In-domain: 2 labeled training & testing

Out-of-domain: 4 labeled training

Experiments on Real World Dataset

Evaluation metric:
• Compared with transfer learning on accuracy.
• Compared with active learning on IEA (Integral Evaluation on Accuracy).
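The slides name the IEA metric but this transcript does not define it. A minimal sketch follows, assuming IEA(M1, M2, ν) is the accumulated accuracy gap between two learners over the first ν expert queries, approximated with a unit-step sum. The function name `iea` and the sample accuracy curves are hypothetical.

```python
def iea(acc_a, acc_b, nu):
    """Assumed IEA: sum of (acc_a[t] - acc_b[t]) over the first nu queries.

    acc_a, acc_b -- accuracy of each learner after 1, 2, ... queries
    nu           -- number of queries to integrate over
    """
    if nu < 0 or nu > min(len(acc_a), len(acc_b)):
        raise ValueError("nu must not exceed the length of either curve")
    # Unit-step approximation of the area between the two accuracy curves.
    return sum(a - b for a, b in zip(acc_a[:nu], acc_b[:nu]))


# Hypothetical curves: learner A stays 0.1 ahead of learner B for 3 queries.
curve_a = [0.6, 0.7, 0.8]
curve_b = [0.5, 0.6, 0.7]
print(round(iea(curve_a, curve_b, 3), 2))
```

Under this reading, a positive value means the first learner reaches higher accuracy for the same labeling budget, and a value of zero means the two curves coincide on average.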

1. Comparison with Transfer Learner

2. Comparison with Active Learner

20 Newsgroup

[Figure: Accuracy Comparison of SVM, TrAdaBoost, and AcTraK. x-axis: Datasets 1 to 6; y-axis: Accuracy.]

[Figure: IEA(AcTraK, ERS, 250). x-axis: Datasets 1 to 6.]

• comparison with active learner ERS

• Actively Transfer Domain Knowledge

– Reduce domain difference risk: transfer useful knowledge (Theorem 2)

– Reduce labeling cost: query domain experts only when necessary (Theorem 3)

Conclusions
