Guiding Semi-Supervision with Constraint-Driven Learning. Ming-Wei Chang, Lev Ratinov, Dan Roth.


• Semi-supervised learning?
• Scarcity of training data
• What are constraints?
• How/why do they help?

Supervised learning

Labelled data: (X1, Y1), (X2, Y2), (X3, Y3), …, (Xn, Yn).

What if n is small? Obtaining training data is costly and can be inefficient. Example: fraud detection / anomaly detection.

Domain expertise helps.

Definitions
• X = (X1, X2, X3, X4, …, Xn)
• Y = (Y1, Y2, Y3, Y4, …, Yn)

• h : X → Y is a classifier.

• f : X × Y → R, where X × Y is the cross product of X and Y and R is the set of real numbers.

• The output of the classifier is the y that maximizes the value of the function f: h(x) = argmax_y f(x, y).

• Classification function: f is a linear sum of feature functions, f(x, y) = Σ_i λ_i f_i(x, y).
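A minimal sketch of the definitions above: the score is a weighted sum of feature functions and the classifier returns the label with the highest score. The feature functions, weights, and label set are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical feature functions f_i(x, y) for a toy token-labelling task.
def capitalized_name(x: str, y: str) -> float:
    return 1.0 if x[:1].isupper() and y == "NAME" else 0.0

def digits_date(x: str, y: str) -> float:
    return 1.0 if x.isdigit() and y == "DATE" else 0.0

def default_other(x: str, y: str) -> float:
    return 1.0 if y == "OTHER" else 0.0

FEATURES = [capitalized_name, digits_date, default_other]
WEIGHTS = [2.0, 3.0, 0.5]          # illustrative weights, not learned
LABELS = ["NAME", "DATE", "OTHER"]  # illustrative label set

def score(x: str, y: str) -> float:
    """f(x, y): linear sum of weighted feature functions."""
    return sum(w * f(x, y) for w, f in zip(WEIGHTS, FEATURES))

def classify(x: str) -> str:
    """h(x): the label y that maximizes f(x, y)."""
    return max(LABELS, key=lambda y: score(x, y))
```

In this toy setup, a capitalized token is pushed towards NAME and an all-digit token towards DATE, purely because of the hand-picked weights.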

Motivational Interviewing

Labels: Support, Reflection, Confrontation, Facilitate, Question

Can we exploit knowledge of constraints in the inference phase?
• Let's assume n items (observations) in a sequence and p labels, e.g., n tokens and p parts of speech, or n tokens and p tags in an NER task.

Brute force: O(p^n), since each of the n positions can take any of the p labels.

Viterbi: O(n · p^2) for a first-order model.
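To make the gap concrete, here is a sketch that scores label sequences with per-position and transition scores, then finds the best sequence both ways. Brute force enumerates all p^n sequences; Viterbi does O(n · p^2) work. The additive first-order scoring model and the names `emit` and `trans` are assumptions made for this illustration.

```python
from itertools import product

def sequence_score(labels, emit, trans):
    """Sum of per-position scores and first-order transition scores."""
    total = emit[0][labels[0]]
    for t in range(1, len(labels)):
        total += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return total

def brute_force(emit, trans):
    """Enumerate all p^n label sequences and keep the best one."""
    n, p = len(emit), len(trans)
    return max(product(range(p), repeat=n),
               key=lambda ys: sequence_score(ys, emit, trans))

def viterbi(emit, trans):
    """Dynamic programming over positions: O(n * p^2) operations."""
    n, p = len(emit), len(trans)
    best = list(emit[0])   # best[y] = best score of a prefix ending in label y
    back = []              # back[t - 1][y] = best previous label for y at position t
    for t in range(1, n):
        new_best, pointers = [], []
        for cur in range(p):
            prev = max(range(p), key=lambda q: best[q] + trans[q][cur])
            pointers.append(prev)
            new_best.append(best[prev] + trans[prev][cur] + emit[t][cur])
        best = new_best
        back.append(pointers)
    path = [max(range(p), key=lambda q: best[q])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return tuple(reversed(path))
```

Both functions should return a sequence with the same score; only the amount of work differs.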

Can we go down further? Can we reduce our search space even more?

Introducing constraints into the model
• Let C1, C2, …, CK be the constraints.

• Each constraint is a function C : X × Y → {0, 1}, where X × Y is the cross product of X and Y.

• Constraints are of two types:
• Hard (MUST be satisfied)
• Soft (can be relaxed)

• 1_C(x) is the set of label sequences that DON'T violate the constraints.

Constraints come to the rescue
• Let's say x out of X possible tag sequences violate the constraints.

• The search space shrinks from X to X − x.
• How do we infer?
• Does Viterbi help us?

Example

      A    B    C    D    E    F    G
S1    X1   X1   X1   X1   X1   X1   X1
S2    X10  X10  X10  X10  X10  X10  X10
S3    X11  X11  X11  X11  X11  X11  X11

Motivational Interviewing:

At least ONE Reflection
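A small sketch of how a hard constraint such as "at least one Reflection" shrinks the search space: enumerate the label sequences, discard the ones that violate the constraint, and take the best of what remains. The scoring function `toy_score` is hypothetical, and plain enumeration is used only because the example is tiny.

```python
from itertools import product

LABELS = ["Support", "Reflection", "Confrontation", "Facilitate", "Question"]

def at_least_one_reflection(labels):
    """Hard constraint: returns True when the sequence is allowed."""
    return "Reflection" in labels

def feasible_sequences(n, constraints):
    """All length-n label sequences that violate none of the constraints."""
    return [ys for ys in product(LABELS, repeat=n)
            if all(c(ys) for c in constraints)]

def constrained_argmax(n, score, constraints):
    """Best-scoring sequence among the feasible ones only."""
    return max(feasible_sequences(n, constraints), key=score)

# Hypothetical model score that prefers Question over everything else.
PREFERENCE = {"Question": 1.0, "Reflection": 0.5}

def toy_score(labels):
    return sum(PREFERENCE.get(y, 0.0) for y in labels)
```

For n = 3 there are 5^3 = 125 sequences in total and 4^3 = 64 of them contain no Reflection, so the constraint leaves 61 candidates. Without the constraint the toy model would pick Question everywhere; with it, the best sequence must give up one position to Reflection.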

Soft constraints

How do we calculate distance here?
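One possible answer, sketched under assumptions: measure how far a label sequence is from the nearest sequence that satisfies the constraint (Hamming distance), and subtract a weighted penalty from the model score. The penalty weights and the brute-force search over the feasible set are illustrative choices for a tiny label set, not necessarily what the authors implemented.

```python
from itertools import product

LABELS = ["Support", "Reflection", "Question"]  # reduced label set for the example

def at_least_one_reflection(labels):
    return "Reflection" in labels

def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(1 for u, v in zip(a, b) if u != v)

def distance_to_feasible(labels, constraint):
    """Smallest Hamming distance from `labels` to any sequence satisfying
    the constraint. Assumes at least one feasible sequence exists."""
    feasible = (ys for ys in product(LABELS, repeat=len(labels)) if constraint(ys))
    return min(hamming(labels, ys) for ys in feasible)

def soft_score(labels, model_score, constraints, penalties):
    """Model score minus a weighted penalty for each violated soft constraint."""
    penalty = sum(rho * distance_to_feasible(labels, c)
                  for c, rho in zip(constraints, penalties))
    return model_score(labels) - penalty
```

A sequence that already satisfies the constraint has distance 0 and pays no penalty; a larger penalty weight makes the soft constraint behave more like a hard one.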

How do we learn the parameters?

Ground truth: Lars Ole Andersen. Program Analysis and Specialization for the C Programming Language. PhD Thesis, DIKU, University of Copenhagen, May 1994.

But the HMM gives this: Lars Ole Andersen. Program Analysis and Specialization for the C Programming Language. PhD Thesis, DIKU, University of Copenhagen, May 1994.

Top-k inference

We keep only the top few candidate label sequences (the top k) and add ALL of them to the training data.

The authors used beam search decoding, but this can be done with any inference procedure.
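A minimal beam-search sketch that returns the k best label sequences instead of a single one. It assumes an additive first-order scoring model; the names `emit` and `trans` are hypothetical. Beam search is approximate, so with a small k the true best sequence can be pruned along the way.

```python
import heapq

def beam_search_top_k(emit, trans, k):
    """Return up to k (score, labels) pairs, best first.

    emit[t][y]  : score of label y at position t
    trans[a][b] : score of moving from label a to label b
    """
    p = len(trans)
    # Start with every single-label hypothesis, then keep the k best.
    beam = heapq.nlargest(k, [(emit[0][y], (y,)) for y in range(p)])
    for t in range(1, len(emit)):
        # Extend every surviving hypothesis by every possible next label.
        candidates = [
            (s + trans[ys[-1]][y] + emit[t][y], ys + (y,))
            for s, ys in beam
            for y in range(p)
        ]
        beam = heapq.nlargest(k, candidates)
    return beam
```

Every sequence in the returned list can then be paired with its input and added to the training data.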

We label the unlabeled samples and include them in the training data.

Choice: we may include only the high-confidence samples.

Pitfall: then we don't really learn properly and miss out on some characteristics of the data.

Algorithm:
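A minimal sketch of the loop described above: train on the labelled data, label the unlabeled samples with constrained top-k inference, add all of the resulting pairs to the training data, and retrain. The toy learner, the word-shape feature, and the "a DATE must contain a digit" constraint are hypothetical stand-ins; the exact update rule from the paper is not reproduced here.

```python
from collections import Counter

LABELS = ["NAME", "DATE", "OTHER"]

def shape(word):
    """Crude word-shape feature used by the toy model."""
    if word.isdigit():
        return "digit"
    return "cap" if word[:1].isupper() else "lower"

def learn(examples):
    """Toy learner: count how often each (shape, label) pair occurs."""
    return Counter((shape(x), y) for x, y in examples)

def satisfies_constraints(word, label):
    """Hypothetical hard constraint: a DATE token must contain a digit."""
    return label != "DATE" or any(ch.isdigit() for ch in word)

def top_k_inference(model, word, k):
    """The k best labels for `word` among those allowed by the constraints."""
    allowed = [y for y in LABELS if satisfies_constraints(word, y)]
    ranked = sorted(allowed, key=lambda y: model[(shape(word), y)], reverse=True)
    return ranked[:k]

def constraint_driven_self_training(labeled, unlabeled, k, rounds):
    model = learn(labeled)
    for _ in range(rounds):
        pseudo_labeled = []
        for x in unlabeled:
            # Add ALL top-k labellings, not only the most confident one.
            for y in top_k_inference(model, x, k):
                pseudo_labeled.append((x, y))
        model = learn(labeled + pseudo_labeled)
    return model
```

For example, starting from three labelled tokens and three unlabeled ones, `constraint_driven_self_training([("Andersen", "NAME"), ("1994", "DATE"), ("thesis", "OTHER")], ["Copenhagen", "1993", "program"], k=1, rounds=2)` retrains the counts on both the original and the self-labelled pairs.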

