Transfer Learning Task. Problem Identification Dataset : A Year: 2000 Features: 48 Training Model...


Transfer Learning Task

Problem Identification

Dataset: A, Year: 2000, Features: 48

[Diagram: Training, Model ‘M’, Testing]

Dataset: B, Year: 2006, Features: 96

[Diagram: Model ‘M’, Training, Testing]

60.9% ??

Transfer Learning

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.

Traditional Machine Learning vs. Transfer

Source Task

Knowledge

Target Task

Learning System

Different Tasks

Learning System

Traditional Machine Learning

Transfer Learning

Transfer Learning Definition

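Given a source domain $\mathcal{D}_S$ and source learning task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$ and a target learning task $\mathcal{T}_T$, transfer learning aims to help improve the learning of the target predictive function using the source knowledge, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.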

Transfer Definition

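• Therefore, the transfer setting applies if either of the following holds:

• Domain differences: the source and target feature spaces, or their marginal distributions, differ.

• Task differences: the source and target label spaces, or their conditional distributions, differ.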

Examples: Cancer Data

Features: Age, Smoking

Features: Age, Height, Smoking


Settings of Transfer Learning

| Transfer learning setting | Labelled data in a source domain | Labelled data in a target domain | Tasks |
|---|---|---|---|
| Inductive Transfer Learning | × | √ | Classification, Regression… |
| | √ | √ | |
| Transductive Transfer Learning | √ | × | Classification, Regression… |
| Unsupervised Transfer Learning | × | × | Clustering |
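The label-availability table above can be read as a simple lookup. The following is a minimal sketch of that reading; the function name `transfer_setting` and its boolean parameters are hypothetical, and it encodes only the ×/√ patterns listed in the table, not a full taxonomy.

```python
def transfer_setting(source_labelled: bool, target_labelled: bool) -> str:
    """Map the table's label-availability pattern to a transfer learning setting."""
    if target_labelled:
        # Rows "× √" and "√ √": labelled target data is available.
        return "inductive"
    if source_labelled:
        # Row "√ ×": labels exist only in the source domain.
        return "transductive"
    # Row "× ×": no labels in either domain.
    return "unsupervised"


print(transfer_setting(source_labelled=True, target_labelled=False))
```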

Questions to answer when transferring

What to transfer?

| Transfer learning approach | Description |
|---|---|
| Instance-transfer | Re-weight some labelled data in a source domain for use in the target domain. |
| Feature-representation-transfer | Find a “good” feature representation that reduces the difference between a source and a target domain, or minimizes the error of models. |
| Model-transfer | Discover shared parameters or priors of models between a source domain and a target domain. |
| Relational-knowledge-transfer | Build a mapping of relational knowledge between a source domain and a target domain. |

Inductive Transfer Learning (Instance-transfer)

• Assumption: the source domain and target domain data use exactly the same features and labels.

• Motivation: Although the source domain data cannot be reused directly, some parts of the data can still be reused by re-weighting.

• Main Idea: Discriminatively adjust the weights of data in the source domain for use in the target domain.

Instance-transfer

• Assumptions:

• Source and target tasks have the same feature space: $\mathcal{X}_S = \mathcal{X}_T$

• Marginal distributions are different: $P(X_S) \neq P(X_T)$

Not all source data might be helpful!

Algorithm: TrAdaBoost

• Idea:

• Iteratively reweight source samples so as to:
  • reduce the effect of “bad” source instances
  • encourage the effect of “good” source instances

• Requires:

• Source task labeled data set
• Very small Target task labeled data set
• Unlabeled Target data set
• Base Learner
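The reweighting idea can be sketched in a few lines of Python. This is a minimal illustration that follows the commonly published TrAdaBoost update rules (a fixed down-weighting factor for misclassified source instances, an AdaBoost-style up-weighting for misclassified target instances, and a vote over the later half of the rounds). The decision-stump base learner, the error clipping, the helper names, and the toy data are all assumptions made for this sketch, not details from the slides.

```python
import math


def stump_predict(stump, row):
    """Predict 0/1 with a one-feature threshold rule."""
    j, thr, pol = stump
    above = row[j] >= thr
    return int(above) if pol == 1 else int(not above)


def fit_stump(X, y, w):
    """Base learner (assumed): the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (0, 1):
                stump = (j, thr, pol)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best


def tradaboost(Xs, ys, Xt, yt, n_rounds=10):
    """Xs, ys: labeled source data; Xt, yt: small labeled target set."""
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0] * (n + m)
    # Fixed factor (< 1) that shrinks the weight of misclassified source points.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    learners = []
    for _ in range(n_rounds):
        total = sum(w)
        p = [wi / total for wi in w]
        h = fit_stump(X, y, p)
        miss = [abs(stump_predict(h, row) - yi) for row, yi in zip(X, y)]
        # Weighted error measured on the target portion only.
        eps = sum(w[n + i] * miss[n + i] for i in range(m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # clipping is an assumption of this sketch
        beta_t = eps / (1.0 - eps)
        for i in range(n):            # "bad" source instances lose weight
            w[i] *= beta_src ** miss[i]
        for i in range(n, n + m):     # misclassified target instances gain weight
            w[i] *= beta_t ** (-miss[i])
        learners.append((h, beta_t))
    late = learners[math.ceil(n_rounds / 2) - 1:]  # rounds ceil(N/2)..N

    def predict(row):
        lhs = sum(-math.log(b) * stump_predict(h, row) for h, b in late)
        rhs = sum(-0.5 * math.log(b) for _, b in late)
        return int(lhs >= rhs)

    return predict


# Toy data: two source labels (x=4 and x=6) disagree with the target concept.
Xs = [[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
Xt = [[2.5], [4.2], [5.8], [7.5]]
yt = [0, 0, 1, 1]
predict = tradaboost(Xs, ys, Xt, yt)
```

The returned `predict` function is what would then be applied to the unlabeled target data set.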

Self taught clustering

• Unsupervised transfer learning
  • Co-clustering, no labelled data

• Feature-based transfer learning
  • Features are not the same
  • Tasks may not be the same

• First applied on image clustering

• Key idea: find high-level shared features that form a new feature representation

Self Taught Learning

Self taught learning

Latent Dirichlet Allocation (LDA)

• LDA is a generative probabilistic model of a corpus. The basic idea is that the documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words.

• Typically used for topic modeling
  • Forums, Twitter messages, text corpora

• Does not consider word order

• Can be viewed as a dimension reduction technique.
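The generative story behind LDA can be sketched as follows. This is a minimal illustration that assumes the topic-word distributions are already given and only samples one document; the function names, the toy topics, and the `alpha` values are hypothetical. It is not an inference procedure.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """topics: one dict per latent topic, mapping word -> probability."""
    theta = sample_dirichlet(alpha, rng)  # this document's mixture over topics
    words = []
    for _ in range(length):
        # Pick a latent topic for this word position, then a word from that topic.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        vocab = list(topics[z])
        word = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]
        words.append(word)
    return theta, words


rng = random.Random(0)
topics = [{"gene": 0.6, "dna": 0.4}, {"ball": 0.5, "goal": 0.5}]  # toy topics
theta, doc = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, doc)
```

Each word is drawn independently given the document's topic mixture, which is why word order carries no information in this model, and the mixture `theta` serves as the low-dimensional representation of the document.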

