Date post: 11-Aug-2020
Multi-Task Machine Learning Wasi Ahmad
Page 1: Multi-Task Machine Learning - Yunsheng Baiyunshengb.com/wp-content/uploads/2017/11/Multi-Task...Dirty Approach g(U) and h(V) defined in different ways in the following works Learning

Multi-Task Machine Learning

Wasi Ahmad

Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning

What is Multi-Task Learning?

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks.

- Wikipedia

What is Multi-Task Learning?

Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias.

- Rich Caruana, 1997

Motivation

● Learning multiple tasks jointly with the aim of mutual benefit
● Inductive transfer helps to improve a model by introducing inductive bias
    ○ Common form of inductive bias: L1 regularization
    ○ L1 regularization leads to a preference for sparse solutions
● Improves generalization on other tasks
    ○ Caused by the inductive bias provided by the auxiliary task
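
To illustrate why L1 regularization acts as a bias toward sparse solutions, here is a minimal sketch. It assumes a proximal (soft-thresholding) step, which is one standard way to handle an L1 penalty; the weight vector and the value of `lam` are made up for illustration.

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal step for lam * ||w||_1: shrink each weight toward zero by lam
    and set weights whose magnitude is at most lam exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Hypothetical weights: two are small and get zeroed out.
w = np.array([0.8, -0.05, 0.3, 0.02, -1.2])
w_sparse = soft_threshold(w, lam=0.1)
n_nonzero = int(np.count_nonzero(w_sparse))
```

Small weights are removed entirely rather than merely shrunk, which is the preference for sparse solutions mentioned above.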

Web Pages Categorization

● Classify documents into categories
● The classification of each category is a task
● The tasks of predicting different categories may be latently related

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Collaborative Ordinal Regression

● The preference prediction of each user can be modeled using ordinal regression

● Some users have similar tastes and their predictions may also have similarities

● Perform multiple predictions simultaneously to exploit this similarity information

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

MTL for HIV Therapy Screening

● Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms

● The sample available for each combination is limited
● For a patient, the prediction for one combination is a task
● Use the similarity information by inferring multiple tasks simultaneously

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Single Task Learning vs. Multi-Task Learning

Image courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Learning Methods

Source: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Key Question

What to share? How to share?

MTL Methods (based on what to share?)

● Feature-based MTL
    ○ Aims to learn common features among different tasks
● Parameter-based MTL
    ○ Uses the model parameters of one task to help learn the parameters of other tasks
● Instance-based MTL
    ○ Identifies data instances in one task that are useful for other tasks

MTL Methods (based on how to share?)

● Feature-based MTL
    ○ Feature learning approach
    ○ Deep learning approach
● Parameter-based MTL
    ○ Low-rank approach
    ○ Task clustering approach
    ○ Task relation learning approach
    ○ Dirty approach
    ○ Multi-level approach

Feature Learning Approach

● Why do we need to learn common feature representations?
    ○ Original features may not have enough expressive power
● Two sub-categories of the feature learning approach
    ○ Feature transformation approach
    ○ Feature selection approach

Feature Learning Approach

● Feature transformation approach
    ○ The learned features are a linear or nonlinear transformation of the original feature representations.
● Feature selection approach
    ○ Selects a subset of the original features as the learned representations
    ○ Eliminates useless features based on different criteria

Feature Transformation Approach

● Multi-task feedforward NN

A Survey on Multi-Task Learning, 2017

Feature Transformation Approach

● Context-sensitive multi-task feedforward NN

Inductive transfer with context-sensitive neural networks, 2008

Feature Transformation Approach

● Regularization Framework

● First term measures the empirical loss on the training sets of all the tasks
● Second term enforces the parameter matrix to be row-sparse
    ○ Equivalent to selecting features after transformation
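
One formulation consistent with the two terms described above is the multi-task feature learning objective discussed in the cited survey. The notation here is an assumption rather than something taken from the slide: m tasks, n_i training examples for task i, a loss l, an orthogonal transformation U, a parameter matrix A with columns a^i, offsets b_i, and a weight λ.

```latex
\min_{A,\,U,\,b}\ \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i}
  l\bigl(y^i_j,\ (a^i)^{\top} U^{\top} x^i_j + b_i\bigr)
  \;+\; \lambda \,\lVert A \rVert_{2,1}^{2}
\quad \text{s.t.}\quad U U^{\top} = I
```

Under this reading, the 2,1-norm sums the Euclidean norms of the rows of A, so entire rows (transformed features) are pushed to zero for all tasks at once.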

A Survey on Multi-Task Learning, 2017

Feature Transformation Approach

● Regularization Framework

A Survey on Multi-Task Learning, 2017

Feature Selection Approach

● Regularization Framework

● The regularizer on W enforces W to be row-sparse, which helps to select important features
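
A minimal sketch of how a row-sparsity regularizer selects features, assuming the regularizer is the ℓ2,1 norm (a common choice; the slide's own formula is not shown here). The matrix `W`, with rows as features and columns as tasks, and the threshold value are hypothetical.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (rows = features, columns = tasks)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def prox_l21(W, threshold):
    """Row-wise shrinkage: rows whose norm is below `threshold` become exactly zero,
    which removes that feature for every task at once."""
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(row_norms, 1e-12))
    return W * scale

W = np.array([[3.0, 4.0],   # feature 0: row norm 5
              [0.1, 0.0],   # feature 1: row norm 0.1
              [0.0, 2.0]])  # feature 2: row norm 2
W_shrunk = prox_l21(W, threshold=0.5)
# Indices of features that survive the shrinkage step.
selected = np.flatnonzero(np.linalg.norm(W_shrunk, axis=1) > 0)
```

Feature 1 is dropped for both tasks, while features 0 and 2 are kept with slightly smaller weights.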

A Survey on Multi-Task Learning, 2017

Feature Transformation vs. Selection

● Feature transformation fits data better than the selection approach
● Feature transformation can generalize well
    ○ If there is no overfitting
● Feature selection has better interpretability
● Feature transformation is preferred
    ○ If an application needs better performance
● Feature selection is preferred
    ○ If the application needs decision support

A Survey on Multi-Task Learning, 2017

Two MTL Methods for Deep Learning

● Hard Parameter Sharing
    ○ Generally applied by sharing the hidden layers between all tasks.
    ○ Keeps several task-specific output layers.
● Soft Parameter Sharing
    ○ Each task has its own model with its own parameters.
    ○ The distance between the parameters of the models is regularized in order to encourage the parameters to be similar.
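
The two schemes can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not a reference implementation: a single shared `tanh` hidden layer, linear task heads, a squared L2 distance as the soft-sharing penalty, and all sizes and the `strength` value are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_hard_sharing(n_in, n_hidden, task_out_dims):
    """One shared hidden layer plus one output head per task."""
    shared = {"W": rng.normal(0, 0.1, (n_in, n_hidden)), "b": np.zeros(n_hidden)}
    heads = [{"W": rng.normal(0, 0.1, (n_hidden, d)), "b": np.zeros(d)}
             for d in task_out_dims]
    return shared, heads

def forward_hard(x, shared, heads):
    """Hard sharing: every task reads the same hidden representation."""
    h = np.tanh(x @ shared["W"] + shared["b"])
    return [h @ head["W"] + head["b"] for head in heads]

def soft_sharing_penalty(task_weights, strength=0.01):
    """Soft sharing: each task keeps its own weights, and the squared L2
    distance between every pair of task weight arrays is penalized."""
    penalty = 0.0
    for i in range(len(task_weights)):
        for j in range(i + 1, len(task_weights)):
            penalty += np.sum((task_weights[i] - task_weights[j]) ** 2)
    return strength * penalty

x = rng.normal(size=(4, 5))                      # 4 examples, 5 input features
shared, heads = init_hard_sharing(5, 8, [3, 1])  # two tasks: 3 outputs and 1 output
outputs = forward_hard(x, shared, heads)

same = np.ones((5, 8))
penalty_identical = soft_sharing_penalty([same, same.copy()])
penalty_different = soft_sharing_penalty([same, np.zeros((5, 8))])
```

In training, the soft-sharing penalty would be added to the sum of the per-task losses, so identical task models incur no penalty and divergent ones are pulled together.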

Two MTL Methods for Deep Learning

Figure panels: Hard parameter sharing; Soft parameter sharing

Cross-stitch Networks

Courtesy: http://ruder.io/multi-task/

Low-Rank Approach

● Assumes the model parameters of different tasks share a low-rank subspace*.

● The objective function can be formulated as:
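
A minimal sketch of the low-rank assumption only (not the objective itself). It assumes the decomposition used in the cited framework, where each task's parameter vector is w_i = u_i + Θᵀv_i with a shared h × d matrix Θ that has orthonormal rows; the dimensions and random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, h = 20, 8, 3  # features, tasks, dimension of the shared subspace

# Shared subspace: h x d matrix with orthonormal rows (Theta @ Theta.T = I).
Theta = np.linalg.qr(rng.normal(size=(d, h)))[0].T
V = rng.normal(size=(h, m))   # per-task coordinates inside the shared subspace
U = np.zeros((d, m))          # task-specific parts, set to zero in this sketch

# Column i of W is the parameter vector of task i.
W = U + Theta.T @ V
rank_W = int(np.linalg.matrix_rank(W))
```

With the task-specific parts at zero, all 8 task vectors lie in a 3-dimensional subspace, so W has rank 3 even though it is 20 × 8.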

* A framework for learning predictive structures from multiple tasks and unlabeled data, 2005

Task Clustering Approach

● First, cluster the tasks into groups
    ○ Learn a task transfer matrix
    ○ Minimizing pairwise within-class distances
    ○ Maximizing pairwise between-class distances
● Second, learn a classifier on the training data of the tasks in a cluster
    ○ A weighted nearest neighbor classifier is proposed*

* Discovering Structure in Multiple Learning Tasks: The TC Algorithm, ICML 1996

Task Clustering Approach

● Learn task clusters under a regularization framework
    ○ Considering three orthogonal aspects
● Aspect 1: A global penalty to measure, on average, how large the parameters are:

* Clustered Multi-task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Learn task clusters under a regularization framework
    ○ Considering three orthogonal aspects

● Aspect 2: A measure of between-cluster variance to quantify the distance among different clusters.

* Clustered Multi-task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Learn task clusters under a regularization framework
    ○ Considering three orthogonal aspects

● Aspect 3: A measure of within-cluster variance to quantify the compactness of task clusters.

● Final regularizer:
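
The three aspects can be sketched in code as follows. This is a minimal illustration under assumptions: the cluster assignment is taken as known (the cited work learns it), tasks are the columns of `W`, each aspect is measured with squared Euclidean distances, and the weights `eps_mean`, `eps_between`, `eps_within` are hypothetical names and values.

```python
import numpy as np

def clustered_regularizer_terms(W, clusters):
    """W: d x m matrix (columns = tasks); clusters: length-m array of cluster ids.
    Returns the global, between-cluster, and within-cluster terms."""
    clusters = np.asarray(clusters)
    m = W.shape[1]
    w_bar = W.mean(axis=1)                       # mean parameter vector over all tasks
    omega_mean = m * np.sum(w_bar ** 2)          # aspect 1: overall parameter size
    omega_between = 0.0
    omega_within = 0.0
    for c in np.unique(clusters):
        members = W[:, clusters == c]
        w_bar_c = members.mean(axis=1)           # mean of cluster c
        # aspect 2: distance of each cluster mean from the global mean
        omega_between += members.shape[1] * np.sum((w_bar_c - w_bar) ** 2)
        # aspect 3: spread of the tasks around their own cluster mean
        omega_within += np.sum((members - w_bar_c[:, None]) ** 2)
    return omega_mean, omega_between, omega_within

W = np.array([[1.0, 1.0, 5.0],
              [2.0, 2.0, 0.0]])
clusters = [0, 0, 1]            # tasks 0 and 1 share a cluster
omega_mean, omega_between, omega_within = clustered_regularizer_terms(W, clusters)

# Final regularizer as a weighted sum of the three terms (weights are made up).
eps_mean, eps_between, eps_within = 0.1, 1.0, 2.0
omega = eps_mean * omega_mean + eps_between * omega_between + eps_within * omega_within
```

With all three weights equal to 1, the sum reduces to the squared Frobenius norm of W, so unequal weights are what make the penalty sensitive to cluster structure.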

* Clustered Multi-task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Cluster tasks by identifying representative tasks
    ○ A subset of the given tasks

* Flexible clustered multi-task learning by learning representative tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016

Task Relation Learning Approach

● Two types of studies
    ○ Task relations are assumed to be known a priori
    ○ Learn task relations automatically from data
● Type 1: Task relations are given
    ○ Similar task parameters are expected to be close
    ○ Utilize task similarities to design regularizers
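
A minimal sketch of a regularizer built from given task similarities. It assumes a pairwise form, the sum over task pairs of s_ij · ‖w_i − w_j‖², which is one common way to encode "similar tasks should have close parameters"; the matrices `W` and `S` are hypothetical.

```python
import numpy as np

def similarity_regularizer(W, S):
    """Sum over task pairs (i < j) of S[i, j] * ||w_i - w_j||^2,
    where column i of W holds the parameters of task i."""
    m = W.shape[1]
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            total += S[i, j] * np.sum((W[:, i] - W[:, j]) ** 2)
    return total

W = np.array([[1.0, 1.0, 5.0],
              [2.0, 2.0, 0.0]])

# Only tasks 0 and 1 are declared similar; they already agree, so no penalty.
S_pair = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
penalty_pair = similarity_regularizer(W, S_pair)

# Declaring all tasks similar penalizes the outlying task 2.
S_all = np.ones((3, 3)) - np.eye(3)
penalty_all = similarity_regularizer(W, S_all)
```

The penalty depends on which tasks are declared similar, so the prior knowledge in `S` decides which parameter vectors are pulled toward each other.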

A Survey on Multi-Task Learning, 2017

Task Relation Learning Approach

● Two types of studies
    ○ Task relations are assumed to be known a priori
    ○ Learn task relations automatically from data
● Type 2: Learn task relations from data
    ○ Global learning model
        ■ Multi-task Gaussian process (defined as a prior on functional values for training data)
        ■ Keep the task covariance matrix positive definite
    ○ Local learning model
        ■ E.g., kNN classifier (learning function as a weighted voting of neighbors)

A Survey on Multi-Task Learning, 2017

Dirty Approach

● Assumption: the parameter matrix W can be decomposed into two component matrices U and V (W = U + V)

● Objective function can be defined as:

● g(U) and h(V) can be defined as*:
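
A minimal sketch of the two penalties, assuming the choice made in the cited dirty model: g is a block-sparse norm (for each feature row, the largest absolute entry across tasks, summed over rows) and h is the elementwise ℓ1 norm. The matrices and the weights `lam_g` and `lam_h` are hypothetical.

```python
import numpy as np

def block_sparse_norm(U):
    """g(U): sum over feature rows of the largest absolute entry in the row.
    Encourages whole rows to be zero, i.e. features shared or dropped by all tasks."""
    return float(np.sum(np.max(np.abs(U), axis=1)))

def elementwise_l1(V):
    """h(V): sum of absolute values. Encourages a few isolated task-specific entries."""
    return float(np.sum(np.abs(V)))

# Rows = features, columns = tasks.
U = np.array([[2.0, -1.5, 1.0],   # feature 0 is used by every task
              [0.0,  0.0, 0.0],
              [0.0,  0.0, 0.0]])
V = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0],    # feature 1 matters only for task 1
              [0.0, 0.0, 0.0]])
W = U + V                          # full parameter matrix

lam_g, lam_h = 1.0, 1.0            # made-up regularization weights
penalty = lam_g * block_sparse_norm(U) + lam_h * elementwise_l1(V)
```

The shared structure goes into U and the task-specific exceptions into V, so W itself need not be cleanly row-sparse.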

* A dirty model for multi-task learning, NIPS 2010

Dirty Approach

● g(U) and h(V) are defined in different ways in the following works
    ○ Learning incoherent sparse and low-rank patterns from multiple tasks, SIGKDD 2010
    ○ Integrating low-rank and group-sparse structures for robust multi-task learning, SIGKDD 2011
    ○ Robust multi-task feature learning, SIGKDD 2012
    ○ Convex multi-task learning with flexible task clusters, ICML 2012

A Survey on Multi-Task Learning, 2017

Multi-Level Approach

● An extension of the dirty approach
● Assumption: the parameter matrix W can be decomposed into h component matrices
● Multi-level approach has more expressive power than the dirty approach
● Represent task clusters as a tree - learn relations from structure

A Survey on Multi-Task Learning, 2017

Deep Relationship Networks

Courtesy: http://ruder.io/multi-task/

Fully Adaptive Feature Sharing

Courtesy: http://ruder.io/multi-task/

Cross-stitch Networks

Courtesy: http://ruder.io/multi-task/

A Joint Many Task Model

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, EMNLP 2017

Sluice Networks

Courtesy: http://ruder.io/multi-task/

Multi-Task Sequence to Sequence Learning

Multi-Task Sequence to Sequence Learning, ICLR 2016

Multi-Task Learning for IR tasks

Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval, NAACL 2015

Multi-Task Domain Adaptation

Multi-Task Domain Adaptation for Sequence Tagging, Rep4NLP, 2017

Adversarial Multi-Task Learning

Adversarial Multi-task Learning for Text Classification, ACL 2017

One Model to Learn Them All

Reference

● A Survey on Multi-Task Learning
● An Overview of Multi-Task Learning in Deep Neural Networks

Thank You
