Post on 02-Jan-2016
Problem Identification
Dataset: A (Year: 2000, Features: 48)
[Diagram: Training, Model ‘M’, Testing: 98.6%]
[Diagram: Training, Model ‘M’, Testing: 97%]
Dataset: B (Year: 2006, Features: 96)
[Diagram: Model ‘M’, Training, Testing: 60.9% ??]
Transfer Learning
Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.
Traditional Machine Learning vs. Transfer Learning
[Figure, two panels. Traditional Machine Learning: different tasks, each with its own learning system. Transfer Learning: knowledge from the source task's learning system is passed to the learning system for the target task.]
Transfer Learning Definition
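Given a source domain $\mathcal{D}_S$ and source learning task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$ and a target learning task $\mathcal{T}_T$, transfer learning aims to help improve the learning of the target predictive function $f_T(\cdot)$ using the source knowledge, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.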
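The two conditions in the definition can be unpacked with the notation that is common in the transfer-learning literature. The symbols below are an assumption, since the slide's own formulas did not survive transcription: a domain is a feature space with a marginal distribution, and a task is a label space with a predictive function.

```latex
\mathcal{D} = \{\mathcal{X},\, P(X)\}, \qquad
\mathcal{T} = \{\mathcal{Y},\, f(\cdot)\}

\mathcal{D}_S \neq \mathcal{D}_T
  \;\Longleftrightarrow\;
  \mathcal{X}_S \neq \mathcal{X}_T \ \text{ or } \ P(X_S) \neq P(X_T)

\mathcal{T}_S \neq \mathcal{T}_T
  \;\Longleftrightarrow\;
  \mathcal{Y}_S \neq \mathcal{Y}_T \ \text{ or } \ f_S(\cdot) \neq f_T(\cdot)
```

Read this way, two datasets with different numbers of features differ in $\mathcal{X}$, and two classification problems with different label sets differ in $\mathcal{Y}$.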
Examples: Cancer Data
Source task: classify into cancer or no cancer
Target task: classify into cancer level one, cancer level two, cancer level three
Settings of Transfer Learning
Transfer learning setting | Labelled data in a source domain | Labelled data in a target domain | Tasks
Inductive Transfer Learning | × or √ | √ | Classification, Regression, …
Transductive Transfer Learning | √ | × | Classification, Regression, …
Unsupervised Transfer Learning | × | × | Clustering, …
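The settings table can be read as a small decision rule on label availability. The sketch below is only an illustration of that reading; the function name `transfer_setting` and its boolean arguments are hypothetical, not part of any library.

```python
def transfer_setting(source_labelled: bool, target_labelled: bool) -> str:
    """Map label availability to the transfer learning setting in the table."""
    if target_labelled:
        # Labelled target data: inductive, whether or not the source is labelled.
        return "inductive"
    if source_labelled:
        # Labels only in the source domain: transductive.
        return "transductive"
    # No labels on either side: unsupervised.
    return "unsupervised"


settings = {
    (s, t): transfer_setting(s, t)
    for s in (False, True)
    for t in (False, True)
}
```

The `settings` dictionary enumerates all four combinations of labelled and unlabelled source and target data.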
Questions to answer when transferring
• What to Transfer?
  • Instances?
  • Model?
  • Features?
• How to Transfer?
  • Map Model?
  • Unify Features?
  • Weight Instances?
• When to Transfer?
  • In which Situations?
What to Transfer ??
Transfer learning approach | Description
Instance-transfer | Re-weight some labeled data in a source domain for use in the target domain.
Feature-representation-transfer | Find a “good” feature representation that reduces the difference between a source and a target domain or minimizes the error of models.
Model-transfer | Discover shared parameters or priors of models between a source domain and a target domain.
Relational-knowledge-transfer | Build a mapping of relational knowledge between a source domain and a target domain.
Inductive Transfer Learning (Instance-transfer)
• Assumption: the source domain and target domain data use exactly the same features and labels.
• Motivation: Although the source domain data cannot be reused directly, some parts of the data can still be reused by re-weighting.
• Main Idea: Discriminatively adjust the weights of data in the source domain for use in the target domain.
Instance-transfer
• Assumptions:
  • Source and target tasks have the same feature space: $\mathcal{X}_S = \mathcal{X}_T$
  • Marginal distributions are different: $P(X_S) \neq P(X_T)$
Not all source data might be helpful!
Algorithm: TrAdaBoost
• Idea:
  • Iteratively reweight source samples so as to:
    • reduce the effect of “bad” source instances
    • encourage the effect of “good” source instances
• Requires:
  • Source task labeled data set
  • Very small Target task labeled data set
  • Unlabeled Target data set
  • Base Learner
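The reweighting loop can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the base learner is a hand-written decision stump, labels are binary (0/1), the target error is clamped to keep the update well defined, and the toy data and all function names (`fit_stump`, `tradaboost`, `tradaboost_predict`) are made up for the example. The unlabeled target set is simply whatever points are later passed to `tradaboost_predict`.

```python
import math

def predict_stump(stump, x):
    """Return 1 if x falls on the stump's positive side, else 0."""
    j, thr, pol = stump
    return int((x[j] >= thr) == bool(pol))

def fit_stump(X, y, w):
    """Base learner: pick the (feature, threshold, polarity) with least weighted error."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (0, 1):
                stump = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if predict_stump(stump, xi) != yi)
                if err < best_err:
                    best_err, best = err, stump
    return best

def tradaboost(Xs, ys, Xt, yt, n_rounds=10):
    """Xs, ys: labeled source data; Xt, yt: very small labeled target set."""
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0] * (n + m)
    # Fixed shrink factor applied to misclassified source instances.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    stumps, betas = [], []
    for _ in range(n_rounds):
        total = sum(w)
        stump = fit_stump(X, y, [wi / total for wi in w])
        miss = [abs(predict_stump(stump, xi) - yi) for xi, yi in zip(X, y)]
        # The weighted error is measured on the target instances only.
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # assumption: clamp so beta_t stays in (0, 1)
        beta_t = eps / (1.0 - eps)
        for i in range(n):                 # "bad" source instances lose weight
            w[i] *= beta_src ** miss[i]
        for i in range(n, n + m):          # misclassified target instances gain weight
            w[i] *= beta_t ** (-miss[i])
        stumps.append(stump)
        betas.append(beta_t)
    return stumps, betas, w

def tradaboost_predict(stumps, betas, x):
    """Weighted vote over the later half of the boosting rounds."""
    start = math.ceil(len(stumps) / 2) - 1
    votes = sum(math.log(1.0 / b) * predict_stump(s, x)
                for s, b in zip(stumps[start:], betas[start:]))
    return int(votes >= 0.5 * sum(math.log(1.0 / b) for b in betas[start:]))

# Toy data (hypothetical): the source concept switches at x = 3, the target at x = 5,
# so the source points x = 3 and x = 4 are "bad" for the target task.
Xs = [[float(v)] for v in range(10)]
ys = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
Xt = [[3.5], [4.5], [5.5], [6.5]]
yt = [0, 0, 1, 1]
stumps, betas, weights = tradaboost(Xs, ys, Xt, yt)
source_only = fit_stump(Xs, ys, [1.0] * len(Xs))
```

On this toy data a stump trained on the source alone follows the source concept, while the boosted ensemble follows the target concept and the weights of the two conflicting source points shrink round by round. Any classifier that accepts per-instance weights could stand in for the stump.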
Self-taught clustering
• Unsupervised transfer learning
  • Co-clustering, no labelled data
• Feature-based transfer learning
  • Features are not the same
  • Tasks may not be the same
• First applied to image clustering
• Key idea: find high-level shared features and use them as a new feature representation
Latent Dirichlet Allocation (LDA)
• LDA is a generative probabilistic model of a corpus. The basic idea is that the documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words.
• Typically used for topic modeling
  • Forums, Twitter messages, text corpora
• Does not consider word order
• Can be viewed as a dimension reduction technique.
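The generative story behind LDA can be sketched with the standard library alone. This is an illustration of sampling one document from the model, not of inference: the two topics, the vocabulary, the Dirichlet parameter `alpha` and the fixed document length are all made-up assumptions, and the helper names are hypothetical.

```python
import random
from collections import Counter

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, n_words, rng):
    """Sample one document: a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # per-document mixture over latent topics
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # latent topic
        words.append(rng.choices(vocab, weights=topics[z])[0])  # word from topic z
    return theta, words

# Hypothetical toy setup: each topic is a distribution over the vocabulary.
vocab = ["gene", "tumor", "cell", "goal", "match", "team"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology" topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],  # a "sports" topic
]
rng = random.Random(0)
theta, words = generate_document(topics, vocab, alpha=[0.5, 0.5], n_words=20, rng=rng)

# Word order is ignored: the model only sees the bag-of-words counts.
bag_of_words = Counter(words)
```

Here `theta` is the low-dimensional description of the document (two numbers instead of six word counts), which is the sense in which LDA acts as dimension reduction.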