Active Learning NG

7/26/2019 Active Learning NG

1/70

ACTIVE LEARNING

Navneet Goyal

Slides developed using material from:

1. Simon Tong. ACTIVE LEARNING: THEORY AND APPLICATIONS. Ph.D. dissertation, Stanford University, August 2001.

2. Burr Settles. ACTIVE LEARNING LITERATURE SURVEY. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009.


2/70

Introduction

If I tell you that you can achieve better accuracy with less training, would you believe me?

NO!! It is possible when the learning algorithm is:

Allowed to be "curious"

Allowed to choose the data from which it learns

It is possible with ACTIVE LEARNING!


3/70

Introduction

Majority of ML tasks fall under:

Supervised Learning

Unsupervised Learning

For all supervised & unsupervised learning tasks, we first need to gather a significant amount of data randomly sampled from the underlying population distribution

This is PASSIVE learning!!

So what is ACTIVE learning?


4/70

Passive Learning

Figure taken from Simon Tong's PhD Thesis


5/70

Introduction

One of the most resource-intensive tasks is gathering of data!

In most cases, we have limited resources for collecting data

Try to make the best use of these resources

Randomly collected data instances are independent & identically distributed

Can we guide the sampling process?


6/70

Introduction

In most cases, data is abundantly available

Mails, images, videos, songs, speeches, documents, ratings, tweets, etc.

Which of these are different from others?

Mails & ratings: labeled data is freely available

Others? Labeled instances are very difficult, time consuming, & expensive to obtain


7/70

Introduction

Some examples where labeled data is hard to come by:

Speech Recognition

Document Classification

Image & Video annotation


8/70

Introduction

Speech Recognition

Accurate labeling of speech utterances is extremely time consuming and requires trained linguists

Annotation at the word level can take ten times longer than the actual audio


9/70

Labeling bottleneck

Active learning systems attempt to overcome the labeling bottleneck by asking queries in the form of unlabeled instances to be labeled by an oracle


10/70

Introduction

Document classification

Large pool of unlabelled documents available

Randomly pick documents to be labeled manually

OR

Carefully choose from the pool the documents that are to be labeled


11/70

Introduction

Parameter estimation and structure discovery tasks

Studying lung cancer in a medical setting

Preliminary list of the ages and smoking habits of possible candidates that we have the option of further examining

Ability/resources to give only a few people a thorough examination

Instead of randomly choosing a subset of the candidate population to examine, we may query for candidates that fit certain profiles.


12/70

Active Learning

We need not fix our desired queries in advance

Instead, we can choose our next query based upon the answers to our previous queries

The process of guiding the sampling process by querying for certain types of instances based upon the data that we have seen so far is called active learning


13/70

Active Learning

Figure taken from Simon Tong's PhD Thesis

An active learner differs from a passive learner, which simply receives a random data set from the world and then outputs a classifier or model


14/70

Active Learning

An interesting analogy!

A passive learner is a student that gathers information by sitting and listening to a teacher, while an active learner is a student that asks the teacher questions, listens to the answers and asks further questions based upon the teacher's response

This extra ability to adaptively query the world based upon past responses would allow an active learner to perform better than a passive learner


15/70

Active Learning

The core difference between an active learner and a passive learner is the ability to ask queries about the world based upon the past queries and responses

The notion of what exactly a query is and what response it receives will depend upon the exact task at hand

The possibility of using active learning can arise naturally in a variety of domains & in several variants


16/70

Active Learning

The key hypothesis is that if the learning algorithm is allowed to choose the data from which it learns (to be "curious", if you will), it will perform better with less training

A "curious" student generally performs well!! Do you agree??

You better agree and become a "curious" student


17/70

Active Learning

ML algorithms choose the training tuples from a large pool

What do they gain by doing so?

Improved Accuracy?

If YES, How?


18/70

Active Learning

Also called "Query Learning" in ML

"Optimal Experiment Design" in Statistics

By querying unlabelled data

What kind of queries?

How are queries formulated?

Query strategy frameworks

Active Learning can provide more efficient and more accurate solutions than Passive Learning for the same number of labeled instances


19/70

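Some Motivating Examples*

Learning Threshold Functions

Consider first the task of learning a threshold function of a single variable.

A single-variable threshold function $f_\theta : \mathbb{R} \to \{\pm 1\}$, parametrized by the real number $\theta \in \mathbb{R}$ (the threshold value), is defined by

$$f_\theta(x) = \begin{cases} +1 & \text{if } x > \theta \\ -1 & \text{if } x \le \theta \end{cases}$$

*Algorithms for Active Learning, Daniel Joseph Hsu, Columbia Univ. Dissertation, 2010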


20/70


Learning Threshold Functions

Used for classifying univariate data

A passive learner will be presented with $n$ labeled examples $(x_i, y_i)$ and will produce a predictor that minimizes the number of disagreements

That is, the learner could choose $\hat{\theta} \in \mathbb{R}$ such that

$$|\{1 \le i \le n : f_{\hat{\theta}}(x_i) \ne y_i\}|$$ is minimum



21/70


Learning Threshold Functions

For now, we assume that all of the labels actually correspond to some threshold function $f_{\theta^*}$, so $y_i = f_{\theta^*}(x_i)$ for all $1 \le i \le n$.

Therefore, the learner can easily find some threshold value $\hat{\theta} \in \mathbb{R}$ that has no disagreements with the given examples, i.e., $|\{1 \le i \le n : f_{\hat{\theta}}(x_i) \ne y_i\}| = 0$
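The passive learner described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classifier outputs +1 when x > theta and -1 otherwise, and the function names (`f`, `disagreements`, `passive_fit`), the data and the target threshold 2.7 are all hypothetical.

```python
def f(theta, x):
    """Threshold classifier (assumed convention): +1 if x > theta, else -1."""
    return 1 if x > theta else -1

def disagreements(theta, xs, ys):
    """Number of labeled examples that the threshold theta gets wrong."""
    return sum(1 for x, y in zip(xs, ys) if f(theta, x) != y)

def passive_fit(xs, ys):
    """Pick a threshold with the fewest disagreements on fully labeled data."""
    # Only a threshold below every point, or one sitting exactly on a data
    # point, can produce a distinct labeling, so these candidates suffice.
    candidates = [min(xs) - 1.0] + sorted(xs)
    return min(candidates, key=lambda t: disagreements(t, xs, ys))

# Hypothetical data labeled by a target threshold of 2.7.
xs = [0.5, 1.0, 2.0, 3.5, 4.0, 6.0]
ys = [f(2.7, x) for x in xs]
theta_hat = passive_fit(xs, ys)
```

Note that the passive learner needs all n labels before it can pick a threshold.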



22/70

Some Motivating Examples*

Learning Threshold Functions

An active learner can also find a threshold value $\hat{\theta} \in \mathbb{R}$ such that $f_{\hat{\theta}}$ has no disagreements with the examples $(x_i, y_i)$, and it can do so after requesting just $\log_2 n$ of the labels!

Compare with binary search!!

For the target threshold $\theta^*$:

if a requested label $y_i$ is $+1$, then we can infer that $\theta^* < x_i$, and therefore $y_j = +1$ for all $x_j > x_i$;

if $y_i$ is $-1$, then $\theta^* \ge x_i$, and therefore $y_j = -1$ for all $x_j \le x_i$.

Thus, one can simply choose to request the label of a point $x_i$ at the median of the unlabeled points; this is guaranteed to result in an outcome that lets the learner label at least half of the other unlabeled points.
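The median-query strategy can be sketched as an ordinary binary search over the sorted points. This is a minimal sketch, not the dissertation's own code: it assumes distinct points, noiseless labels and the convention that the label is +1 exactly when x exceeds the threshold. The `oracle` callback and the target value 300.5 are hypothetical.

```python
def active_fit(xs, oracle):
    """Binary search for a consistent threshold.

    Returns (theta_hat, inferred_labels, number_of_label_requests).
    """
    xs = sorted(xs)
    lo, hi = 0, len(xs)        # invariant: xs[:lo] are -1, xs[hi:] are +1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2   # median of the points whose label is unknown
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid           # target < xs[mid]: xs[mid:] are all +1
        else:
            lo = mid + 1       # target >= xs[mid]: xs[:mid + 1] are all -1
    theta_hat = xs[lo - 1] if lo > 0 else xs[0] - 1.0
    labels = [-1] * lo + [1] * (len(xs) - lo)
    return theta_hat, labels, queries

# Hypothetical pool of 1024 points and a target threshold of 300.5.
xs = [float(i) for i in range(1024)]
target = 300.5
theta_hat, labels, n_queries = active_fit(xs, lambda x: 1 if x > target else -1)
```

All 1024 labels are recovered from at most 11 label requests, in line with the roughly log2(n) bound stated above.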



23/70


Learning Threshold Functions

The strategy for learning single-variable threshold functions represents a best-case scenario for active learning: just $\log_2 n$ label requests are needed to deduce all of the $n$ labels

What aspects of the learning problem made this possible?

At any point in the interactive process, the active learner could always make a query that results in labeling at least half of the other unlabeled points. Viewed another way, the query eliminates at least half of the potential classifiers still in contention.

We crucially made an assumption that the labels $y_i = f_{\theta^*}(x_i)$ correspond to some threshold function $f_{\theta^*}$

Unfortunately, these aspects do not always carry over to other learning problems: there need not always be queries that provide the information needed for a binary search-like process, even when the labels perfectly correspond to a simple function.



24/70


Learning Interval Functions

Even in the case where the labels correspond exactly to some interval function $f_{a,b}$, the active learner may need to request all labels in order to distinguish between intervals that include any particular $x_i$, and an interval that includes none of the $x_i$ [Das05].

*Algorithms for Active Learning, Daniel Joseph Hsu, Columbia Univ. Dissertation,

Das05: S. Dasgupta. Coarse sample complexity bounds for active learning. In Advances in Neural Information


25/70


Learning Interval Functions

Consider the following two-phase strategy for learning a single-variable interval function $f_{a,b}$, also described in [Das05].

1. Request the label of randomly chosen $x_i$ until some $y_i$ is found such that $y_i = +1$. If no $y_i = +1$, then return the empty interval function.

2. Use the binary search-like procedure for learning single-variable threshold functions to determine the interval boundaries $a$ and $b$, and return $f_{a,b}$.

The crucial observation behind this algorithm is that an interval function can be described by two single-variable threshold functions




26/70


Learning Interval Functions

The crucial observation behind this algorithm is that an interval function can be described by two single-variable threshold functions.

The binary search for $b$ pretends that all points to the left of the positive point $x_i$ have a negative label; the binary search for $a$ is similar.

The first phase of the algorithm is certainly not like binary search, but it serves the useful purpose of identifying a starting point for binary search in the second phase.

In the worst case, the algorithm may end up querying every label before transitioning into this second phase.

But if a significant fraction of the points are labeled $+1$ by $f_{a,b}$, then the first phase ends quickly.
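The two-phase strategy can be sketched as follows. This is an illustrative sketch rather than the algorithm exactly as given in [Das05]: it works on indices into the sorted points, assumes noiseless labels whose positives form one contiguous run, and uses hypothetical names (`learn_interval`, `ask`) and a hypothetical interval.

```python
import random

def learn_interval(xs, oracle, rng=None):
    """Return (first_positive_index, last_positive_index, n_label_requests)
    as indices into sorted(xs), or (None, None, n) for the empty interval."""
    rng = rng or random.Random(0)
    xs = sorted(xs)
    cache = {}                       # every distinct label request is counted once

    def ask(i):
        if i not in cache:
            cache[i] = oracle(xs[i])
        return cache[i]

    # Phase 1: query points in random order until a positive one is found.
    order = list(range(len(xs)))
    rng.shuffle(order)
    seed = next((i for i in order if ask(i) == 1), None)
    if seed is None:
        return None, None, len(cache)

    # Phase 2a: binary search for the leftmost positive point in xs[:seed + 1].
    lo, hi = 0, seed                 # xs[hi] is known to be positive
    while lo < hi:
        mid = (lo + hi) // 2
        if ask(mid) == 1:
            hi = mid
        else:
            lo = mid + 1
    first = lo

    # Phase 2b: binary search for the rightmost positive point in xs[seed:].
    lo, hi = seed, len(xs) - 1       # xs[lo] is known to be positive
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ask(mid) == 1:
            lo = mid
        else:
            hi = mid - 1
    return first, lo, len(cache)

# Hypothetical pool: points 400..599 out of 0..999 are positive.
xs = [float(i) for i in range(1000)]
first, last, n_queries = learn_interval(xs, lambda x: 1 if 399.5 < x <= 599.5 else -1)
empty = learn_interval(xs, lambda x: -1)
```

When no point is positive, phase 1 exhausts the pool, which is the worst case mentioned above.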



27/70

Types of Active Learning

Largely falls into one of these three types:

Membership Query Synthesis: the learner constructs examples for labeling

Stream-Based Active Learning: consider one unlabeled example at a time; decide whether to query its label or ignore it

Pool-Based Active Learning: given a large unlabeled pool of examples, rank the examples in order of informativeness and query the labels for the most informative example(s)


28/70

Active Learning Scenarios

Figure taken from Burr Settles' article


29/70

Membership Query Synthesis

One of the earliest AL scenarios

The learner may request labels for any unlabeled instance in the input space, including queries that the learner generates de novo, rather than those sampled from some underlying natural distribution

D. Angluin. Queries and concept learning. Machine Learning, 2:319-342, 1988.


30/70


Query synthesis is reasonable for many problems

But labeling such arbitrary instances can be awkward if the oracle is a human annotator

For example: human oracles used to train an ANN to classify handwritten characters

Many of the query images generated by the learner contained no recognizable symbols, only artificial hybrid characters with no semantic meaning


31/70


Membership queries for NLP tasks might create streams of text or speech that amount to gibberish

Proposed solutions:

Stream-based scenario

Pool-based scenario


32/70


Innovative Application

Robot Scientist executes a series of autonomous biological experiments to discover metabolic pathways in yeast

An instance is a mixture of chemical solutions that constitutes a growth medium, as well as a particular yeast mutant

Label: whether or not the mutant thrived in the growth medium

All experiments were autonomously synthesized and physically performed using a lab robot.

3-fold decrease in cost


33/70


In domains where labels come not from human annotators, but from experiments such as this, query synthesis may be a promising direction for automated scientific discovery


34/70


Stream-Based Active Learning

Figure: Slides of Piyush Rai, CS5350/6350: Machine Learning


35/70

Stream-based selective sampling

Alternative to synthesizing queries

Obtaining an unlabeled instance is free or inexpensive

An instance is first sampled from the actual distribution, and the learner decides whether or not to request its label


36/70

Stream-based selective sampling

How to decide whether to query or not to query an instance?

Informativeness measure or query strategy

Region of uncertainty: the part of the instance space that is still ambiguous to the learner

Query only those instances that fall in the region

Applications: part-of-speech tagging, learning ranking functions for IR, word sense disambiguation
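For intuition, the region-of-uncertainty rule can be sketched on the one-dimensional threshold problem. This is a toy illustration under stated assumptions (noiseless labels, label +1 exactly when x exceeds the target threshold); the function name, the stream and the target 4.2 are hypothetical.

```python
import random

def stream_selective_sampling(stream, oracle):
    """Process instances one at a time; request a label only inside the
    current region of uncertainty, otherwise infer the label for free."""
    neg_max = float("-inf")    # largest point known to be -1
    pos_min = float("inf")     # smallest point known to be +1
    labels, queried = [], 0
    for x in stream:
        if neg_max < x < pos_min:          # still ambiguous: ask the oracle
            y = oracle(x)
            queried += 1
            if y == 1:
                pos_min = x
            else:
                neg_max = x
        else:                              # label follows from earlier answers
            y = 1 if x >= pos_min else -1
        labels.append(y)
    return labels, queried

# Hypothetical stream of 2000 instances and a target threshold of 4.2.
rng = random.Random(0)
stream = [rng.uniform(0.0, 10.0) for _ in range(2000)]
target = 4.2
labels, queried = stream_selective_sampling(stream, lambda x: 1 if x > target else -1)
```

Each query shrinks the region, so later instances are increasingly likely to be labeled without asking.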


37/70


Pool-Based Active Learning

Figure: Slides of Piyush Rai, CS5350/6350: Machine Learning


38/70

Pool-based Active Learning

Starts with a small labeled training set L

Request labels for 1 or more carefully selected instances

Focus on difficult-to-label tuples (analogy with Boosting?)

Focus on the most informative instance

Greedy approach?

Uses new knowledge to choose which instances to query next

Newly labeled instances are added to the labeled set L


39/70

Pool-based sampling

In many real-world problems, large collections of unlabelled data, U, can be gathered at once

Small set of labeled data, L

U is assumed to be closed

Instances are queried in a greedy manner according to an informativeness measure

Text classification, image/video classification and retrieval, speech recognition and cancer diagnosis are examples of pool-based sampling


40/70

Pool-based sampling

Main difference with stream-based:

Stream-based: scans through the data sequentially and makes query decisions individually

Pool-based: evaluates and ranks the entire collection before selecting the best query

Pool-based scenarios are more common!

Settings where stream-based is more appropriate??

When memory or processing power is limited, as with mobile and embedded devices


41/70

Potential of Active Learning

An illustrative example of pool-based active learning

A toy data set of 400 instances, evenly sampled from two class Gaussians centered at (-2, 0) & (2, 0), standard deviation = 1

A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain

A logistic regression model trained with 30 actively queried instances using uncertainty sampling

In random selection, the 30 unlabeled instances are drawn i.i.d.

Figure taken from Burr Settles' article


42/70

Potential of Active Learning

Figure taken from Burr Settles' article

Active learners use "uncertainty sampling" to focus on instances closest to the decision boundary

Do we do something similar in SVMs?


43/70

Document Classification

Learner has to distinguish between BASEBALL & HOCKEY documents

20 Newsgroups corpus

2000 Usenet documents, equally divided among the two classes


44/70

Document Classification

Learning curves: baseball vs. hockey. Curves plot classification accuracy as a function of the number of documents queried for two selection strategies: uncertainty sampling and random sampling.

Figure taken from Burr Settles


45/70

Learning Curves

Active learning algorithms are evaluated by constructing learning curves

Evaluation metric as a function of the number of new instance queries that are labeled and added to L

Uncertainty sampling query strategy vs. random sampling


46/70

How Active Learning Works?

Active Learning proceeds in rounds

Each round has a current model

The current model selects the most informative unlabeled example(s), whose label(s) are requested from the oracle

The labeled example(s) is/are included in the training data

The model is re-trained using the new training data

The process repeats until the labeling budget is exhausted or the desired accuracy is attained!
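The rounds described above can be sketched as a small pool-based loop. Everything below is an assumption made for illustration: a hand-written two-feature logistic regression stands in for "the current model", uncertainty sampling is the query strategy, the only stopping rule is a label budget, and the toy pool (two Gaussian classes) and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))          # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train(points, labels, steps=300, lr=0.1):
    """Logistic regression on 2-D points by batch gradient descent (labels 0/1)."""
    w = [0.0, 0.0, 0.0]                   # two weights and a bias
    for _ in range(steps):
        g = [0.0, 0.0, 0.0]
        for (x0, x1), y in zip(points, labels):
            err = sigmoid(w[0] * x0 + w[1] * x1 + w[2]) - y
            g[0] += err * x0
            g[1] += err * x1
            g[2] += err
        w = [wi - lr * gi / len(points) for wi, gi in zip(w, g)]
    return w

def prob(w, p):
    """Predicted probability of class 1 for point p."""
    return sigmoid(w[0] * p[0] + w[1] * p[1] + w[2])

def active_learning(pool, oracle, seed_idx, budget):
    """Train, query the most uncertain instance, add its label, repeat."""
    labeled = {i: oracle(i) for i in seed_idx}
    while len(labeled) < budget:          # stop once the label budget is spent
        idx = list(labeled)
        w = train([pool[i] for i in idx], [labeled[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        query = min(unlabeled, key=lambda i: abs(prob(w, pool[i]) - 0.5))
        labeled[query] = oracle(query)    # the oracle supplies the label
    idx = list(labeled)
    return train([pool[i] for i in idx], [labeled[i] for i in idx]), labeled

# Hypothetical toy pool: 200 points per class, one seed label from each class.
rng = random.Random(0)
pool = [(rng.gauss(-2, 1), rng.gauss(0, 1)) for _ in range(200)] + \
       [(rng.gauss(2, 1), rng.gauss(0, 1)) for _ in range(200)]
truth = [0] * 200 + [1] * 200
w, labeled = active_learning(pool, lambda i: truth[i], seed_idx=[0, 200], budget=30)
accuracy = sum((prob(w, p) > 0.5) == bool(y) for p, y in zip(pool, truth)) / len(pool)
```

A desired-accuracy stopping rule could be added by evaluating the model on held-out labeled data inside the loop.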


47/70

Query Selection Strategies

Any Active Learning algorithm requires a query selection strategy. Some examples:

Uncertainty Sampling

Query By Committee

Expected Model Change

Expected Error Reduction

Variance Reduction

Density Weighted Methods


48/70

Query Strategy Frameworks

All AL scenarios involve evaluating the informativeness of unlabeled instances

Many proposed solutions for formulating such query strategies

x*_A - most informative instance


49/70

Uncertainty Sampling

Query the event that the current classifier is most uncertain about

Used trivially in SVMs, graphical models, etc.

If uncertainty is measured in Euclidean distance: [figure: a row of points marked x]

[Lewis & Gale, 1994]

Figure courtesy: Irina Rish, IBM T.J. Watson Research Center


50/70

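Uncertainty sampling

Active learner queries the instances about which it is least certain how to label

Probabilistic model, binary classification: uncertainty sampling queries the instance whose posterior probability is closest to 0.5

3 or more class labels: query the instance whose most probable label has the lowest posterior probability (least confident)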


51/70


Least confident strategy only considers information about the most probable label

"Throws away" information about the remaining label distribution

Enter Margin Sampling

Still not a good strategy for problems with large label sets
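The formulas for these two strategies did not survive on the slides. As a hedged reconstruction in the notation commonly used in Settles' survey (the symbols $P_\theta$, $\hat{y}$, $\hat{y}_1$, $\hat{y}_2$ are assumptions here, with $\hat{y}_1$ and $\hat{y}_2$ the most and second most probable labels under the current model), they can be written as:

```latex
% Least confident: query the instance whose best label is least certain
x^*_{LC} = \arg\max_x \; 1 - P_\theta(\hat{y} \mid x),
\qquad \hat{y} = \arg\max_y P_\theta(y \mid x)

% Margin sampling: query the instance with the smallest gap
% between the two most probable labels
x^*_{M} = \arg\min_x \; P_\theta(\hat{y}_1 \mid x) - P_\theta(\hat{y}_2 \mid x)
```

Margin sampling uses the second most probable label as well, but it still ignores the rest of the label distribution, which is why it struggles with large label sets.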


52/70


Entropy as an uncertainty measure:

Reduces to Least confident and Margin sampling for binary classification problems

All 3 strategies are then equivalent to querying the instance with a class posterior closest to 0.5
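The three uncertainty measures can be compared with a short sketch. The entropy score used here is the usual Shannon entropy of the predicted label distribution, -sum_i P(y_i|x) log P(y_i|x); the function names and the example posteriors are hypothetical.

```python
import math

def least_confident(p):
    """1 minus the probability of the most likely label (higher = more uncertain)."""
    return 1.0 - max(p)

def margin(p):
    """Gap between the two most likely labels (lower = more uncertain)."""
    top = sorted(p, reverse=True)
    return top[0] - top[1]

def entropy(p):
    """Shannon entropy of the label distribution (higher = more uncertain)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def pick(posteriors, strategy):
    """Index of the instance that the given strategy would query."""
    idx = range(len(posteriors))
    if strategy == "margin":
        return min(idx, key=lambda i: margin(posteriors[i]))
    score = least_confident if strategy == "least_confident" else entropy
    return max(idx, key=lambda i: score(posteriors[i]))

strategies = ["least_confident", "margin", "entropy"]

# Binary case: every strategy picks the posterior closest to 0.5 (index 1).
binary = [(0.9, 0.1), (0.55, 0.45), (0.2, 0.8), (0.35, 0.65)]
binary_choices = [pick(binary, s) for s in strategies]

# Three classes: the strategies can disagree.
multi = [(0.5, 0.45, 0.05), (0.4, 0.3, 0.3)]
multi_choices = [pick(multi, s) for s in strategies]
```

In the three-class example, least confident and entropy prefer the second instance, while margin sampling prefers the first, whose top two labels are nearly tied.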


53/70



54/70

Query by Committee (QBC)

QBC approach involves maintaining a committee of models which are all trained on the current labeled data L, but represent competing hypotheses

Each committee member is allowed to vote on the labelings of query candidates

Most informative query is the one about which they most disagree



55/70


Minimize the version space

Version space is the region that is still unknown to the overall model class, i.e.,

Version space is the set of hypotheses that are consistent with the current labeled training data

In other words, if any two models of the same model class agree on all the labeled data, but disagree on some unlabeled instance, then that instance lies within the region of uncertainty
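For one-dimensional threshold classifiers the version space and the region of uncertainty can be written down explicitly, which makes the definitions above concrete. This is a minimal sketch assuming noiseless labels and the convention that the label is +1 exactly when x exceeds the threshold; the function names and data are hypothetical.

```python
def version_space(labeled):
    """Bounds (lo, hi) such that every threshold theta with lo <= theta < hi
    is consistent with all labeled (x, y) pairs."""
    lo = max((x for x, y in labeled if y == -1), default=float("-inf"))
    hi = min((x for x, y in labeled if y == 1), default=float("inf"))
    return lo, hi

def region_of_uncertainty(labeled, unlabeled):
    """Unlabeled points on which two consistent thresholds can disagree."""
    lo, hi = version_space(labeled)
    return [x for x in unlabeled if lo < x < hi]

labeled = [(1.0, -1), (2.0, -1), (7.0, 1), (9.0, 1)]
unlabeled = [0.5, 1.5, 3.0, 4.5, 6.9, 8.0]
before = region_of_uncertainty(labeled, unlabeled)

# Suppose the oracle labels the queried point 4.5 as +1: the version space shrinks.
after = region_of_uncertainty(labeled + [(4.5, 1)], unlabeled)
```

Points outside the region get the same label from every consistent threshold, so querying them cannot shrink the version space.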



56/70


In ML, we search for the best model in the version space

In AL, we try to constrain the size of the version space as much as possible

Why?

So that the search can be more precise with as few labeled instances as possible



57/70

Query by Committee (QBC) & Version Space



58/70


To implement the QBC algorithm, we must:

Be able to construct a committee of models that represent different regions of the version space

Have some measure of disagreement among committee members



59/70


Construction of the committee of models: Boosting & Bagging



60/70


Measure of disagreement: Vote Entropy

QBC generalization of entropy-based uncertainty sampling
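Vote entropy can be sketched as the entropy of the committee's vote shares: with V(y) votes for label y out of C committee members, the score is -sum_y (V(y)/C) log(V(y)/C). The function name and the example votes below are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's votes for one candidate instance."""
    c = len(votes)
    return -sum((v / c) * math.log(v / c) for v in Counter(votes).values())

# Hypothetical votes of a 4-member committee on three query candidates.
committee_votes = {
    "a": [1, 1, 1, 1],       # unanimous: no disagreement
    "b": [1, -1, 1, -1],     # evenly split: maximal disagreement
    "c": [1, 1, 1, -1],
}
query = max(committee_votes, key=lambda k: vote_entropy(committee_votes[k]))
```

The candidate with the most evenly split vote is the one the committee queries.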


61/70

Query by Committee

Prior distribution over hypotheses

Samples a set of classifiers from the distribution

Queries an example based on the degree of disagreement between the committee of classifiers

[Seung et al. 1992, Freund et al. 1997]

[figure: a row of points marked x]

Figure courtesy: Irina Rish, IBM T.J. Watson Research Center


62/70


Which unlabelled point should you choose?

Slides by Barbara Engelhardt and Alex Shyr


63/70


Yellow = valid hypotheses



64/70


Point on max-margin hyperplane does not reduce the number of valid hypotheses by much



65/70


Queries an example based on the degree of disagreement between the committee of classifiers



66/70


Prior distribution over classifiers/hypotheses

Sample a set of classifiers from the distribution

Natural for ensemble methods, which are already samples: random forests, bagged classifiers, etc.

Measures of disagreement: entropy of predicted responses
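One way to obtain such a committee is bagging: each member is fit on a bootstrap resample of the labeled data, and the pool instance with the highest vote entropy is queried. The sketch below assumes one-dimensional threshold classifiers (label +1 when x exceeds the threshold) purely for brevity; the committee size, names and data are hypothetical.

```python
import math
import random
from collections import Counter

def fit_threshold(sample):
    """Midpoint between the largest -1 point and the smallest +1 point."""
    neg = [x for x, y in sample if y == -1]
    pos = [x for x, y in sample if y == 1]
    if not neg:
        return min(pos) - 1.0      # resample held only +1 points
    if not pos:
        return max(neg)            # resample held only -1 points
    return (max(neg) + min(pos)) / 2.0

def vote_entropy(votes):
    c = len(votes)
    return -sum((v / c) * math.log(v / c) for v in Counter(votes).values())

def qbc_query(labeled, pool, committee_size=51, rng=None):
    """Bagged committee of thresholds; return the most disputed pool point."""
    rng = rng or random.Random(0)
    committee = [fit_threshold([rng.choice(labeled) for _ in labeled])
                 for _ in range(committee_size)]

    def disagreement(x):
        return vote_entropy([1 if x > theta else -1 for theta in committee])

    return max(pool, key=disagreement)

labeled = [(0.0, -1), (1.0, -1), (2.0, -1), (8.0, 1), (9.0, 1), (10.0, 1)]
pool = [0.5, 3.0, 5.0, 7.0, 9.5]
query = qbc_query(labeled, pool)
```

The bootstrap thresholds scatter around the gap between the two classes, so the committee disagrees most on the pool point in the middle of that gap.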



67/70

Web Searching

A Web-based company wishes to gather particular types of pages. It employs a number of people to hand-label some web pages so as to create a training set for an automatic classifier that will eventually be used to classify and extract pages from the rest of the web.

Since human expertise is a limited resource, the company wishes to reduce the number of pages the employees have to label. Rather than labeling pages randomly drawn from the web, the computer uses active learning to request targeted pages that it believes will be most informative to label.



68/70

Personalized Email Filter

The user wishes to create a personalized automatic junk email filter

In the learning phase the automatic learner has access to the user's past email files.

Using active learning, it interactively brings up a past email and asks the user whether the displayed email is junk mail or not. Based on the user's answer it brings up another email and queries the user

The process is repeated some number of times and the result is an email filter tailored to that specific person.



69/70

Relevance feedback

The user wishes to sort through a database/website for items that are of personal interest; an "I'll know it when I see it" type of search

The computer displays an item and the user tells the learner whether the item is interesting or not

Based on the user's answer the learner brings up another item from the database. After some number of queries the learner then returns a number of items in the database that it believes will be of interest to the user



70/70

Active Learning

Happy ACTIVE LEARNING from now on!!

Date post:	02-Mar-2018