Decision trees and their use in NLP

Jan Hajič

Additional Lecture to NPFL067, Fall 2018/19


Decision Trees

• Goal: categorical or numerical predictions
  – i.e., classification (prevalent in NLP) / regression
• Use in NLP (examples)
  – Standard classification
    • POS/morphological tagging: CPOS: W → T
    • Named entity recognition: NE: W → {0,1}^|W|
  – Modeling conditional distributions
    • Language modeling: LM: <w1,…,wi> → Pi+1(.)
    • In general: D: H → P(.)
      – H is the “history” (context)
      – P(.) is a probability distribution over the variable of interest (e.g., the “next word”)


Decision Trees – the Idea

• Queries organized in a rooted tree
• Queries evaluated against the (input) data
  – more precisely, against the context of the item of interest to be classified
• The value returned → the edge to be followed down the tree
• The (global) answer (class, distribution) is found at a leaf
  – Example classifier (customer helpdesk; data: the customer’s answers):

    Starts when switched on?
    ├─ yes → Login screen reached after less than 2 minutes?
    │        ├─ yes → …
    │        └─ no → Hard drive lamp? off → …, blinking → …, on → …
    └─ no → Battery inserted?
             ├─ yes → C: empty battery
             └─ no → C: missing battery


Goal: A Distribution

• Generalization (from a categorical answer)
  – Leaves contain a probability distribution over the ‘classes’:

    Fever (body temp. > 39 C)?
    ├─ yes → Pain while swallowing?
    │        ├─ yes → Healthy 0.02, Migraine 0.07, Flu 0.17, Tonsilitis 0.69, Fracture 0.01, …
    │        └─ no → Healthy 0.02, Migraine 0.18, Flu 0.53, Tonsilitis 0.09, Fracture 0.01, …
    └─ no → Headache?
             ├─ yes → Pain over all body? → …
             └─ no → Healthy 0.38, Migraine 0.12, Flu 0.07, Tonsilitis 0.03, Fracture 0.08, …


Using Decision Trees

• Collect the data needed by the query at the current node, evaluate the query
• Follow the selected edge, evaluate the next query
• Repeat until a leaf is reached

  Example (the tree above): data: Temp = 39.7 C → “Fever?” = yes; data: no swallowing pain → “Pain while swallowing?” = no; the leaf reached returns Healthy 0.02, Migraine 0.18, Flu 0.53, Tonsilitis 0.09, Fracture 0.01, …
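
A minimal sketch of this walk in code (Python; the dict-based node layout and the collapsed one-level tree with the slide’s illustrative numbers are my assumptions, not the lecture’s):

    def classify(node, context):
        while "query" in node:                  # internal node: evaluate its query
            answer = node["query"](context)
            node = node["edges"][answer]        # follow the edge for that answer
        return node["dist"]                     # leaf: distribution over classes

    # the fever question from the slide, collapsed to a single level
    tree = {
        "query": lambda c: c["temp"] > 39,      # "Fever (body temp. > 39 C)?"
        "edges": {
            True: {"dist": {"Flu": 0.53, "Tonsilitis": 0.09, "Healthy": 0.02}},
            False: {"dist": {"Healthy": 0.38, "Migraine": 0.12, "Flu": 0.07}},
        },
    }
    print(classify(tree, {"temp": 39.7}))       # the "yes" leaf distribution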


Constructing (Training) Decision Trees

• Set-of-queries acquisition
  – Manual (given)
  – (Semi-)automatic ([manual] templates → instances)
• Tree building
  – Machine learning (supervised)
  – Objective function:
    • probability of the data given the (trained) tree (~ model): maximize
    • MERT (minimum error rate training)
      – used if it is hard to define the probability of the data (esp. for categorical classifiers)
  – NP-complete problem
    • heuristics needed (e.g., greedy search)
    • approximations
  – Technique: top-down node-splitting


Queries

• Binary (yes/no)
  – Two edges outgoing from a query node
    • Is the previous word “to”?
    • Is the word “car” present within the same sentence?
    • Is the relative unigram frequency of the previous word > 0.05?
    • Is the entropy of the trigram distribution P(.|wi-2,wi-1) below 0.1 (at the position ‘i’ in the data)?
• General (discrete)
  – N (> 2) edges outgoing from a query node
    • Number of children (0, 1, 2, >2) → 4 edges down
    • POS of the previous word (N, V, A, …) → 10 (11, 12, …) edges
    • Discretization intervals (0..0.05, ..0.10, …, ..1.00) → 20 edges
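
One way to realize both kinds is as functions mapping a context to an edge label (a sketch; the context keys are illustrative assumptions):

    binary_q = lambda ctx: ctx["prev_word"] == "to"          # 2 outgoing edges

    def n_children_q(ctx):                                   # 4 outgoing edges: 0, 1, 2, ">2"
        n = ctx["n_children"]
        return n if n <= 2 else ">2"

    def discretized_q(ctx):                                  # 20 outgoing edges (0.05-wide bins)
        return min(int(ctx["rel_unigram_freq"] / 0.05), 19)

    print(binary_q({"prev_word": "to"}),                     # True
          n_children_q({"n_children": 5}),                   # >2
          discretized_q({"rel_unigram_freq": 0.07}))         # bin 1, i.e. 0.05..0.10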


Queries: Acquisition

• Defined ahead of time
  – Diagnosis (problems, medical), financials, profiling, …
  – NLP: POS tagging, word sense disambiguation
• Template-based
  – Analogy:
    • Brill’s TBEDL (transformation-based error-driven learning)
    • Feature templates in MaxEnt models, perceptrons, …
  – Used for most tasks in NLP
    • language modeling, other models with conditional distributions
  – Two steps (sketched below):
    • Template definition (manual)
    • Query instance generation (template “expansion”): data-based
    • [Selection (cf. feature selection in MaxEnt): part of tree building]
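
A sketch of the two steps for one simple template, “is the word at offset d from position i equal to V?” (the template itself and all names are my illustration, not from the lecture):

    corpus = "John can bring the can to the table".split()
    offsets = [-2, -1, 0]                         # step 1: manual template definition

    def make_query(d, v):                         # step 2: expansion against the data
        test = lambda i, ws: 0 <= i + d < len(ws) and ws[i + d] == v
        return f"w_i{d:+d}={v}?", test

    pool = [make_query(d, v) for d in offsets for v in sorted(set(corpus))]
    print(len(pool), pool[0][0])                  # 18 query instances; e.g. "w_i-2=John?"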


Tree Building: Data

• (Large) vector of pairs (y,x): D = (yi,xi), i = 1..|D|
  • yi – value of interest (the one being predicted)
  • xi – context (‘history’)
• Examples, modeling p(y|x):
  – Language modeling
    • n-gram LM: yi – current word, xi – previous (n-1) words
  – POS tagging
    • yi – POS tag, xi – words in the sentence & tags to the left
  – Word sense disambiguation
    • yi – sense of the word at position ‘i’ (from a fixed set)
    • xi – words up to ±50 positions away from position ‘i’
  – (Non-NLP:) disease diagnostics
    • yi – disease, xi – vector of symptoms (numerical, categorical)
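
For instance, the (yi, xi) pairs for a trigram LM can be read off a token stream like this (a small sketch; the "<s>" padding is my assumption):

    tokens = "John can bring the can to the table".split()
    padded = ["<s>", "<s>"] + tokens              # pad so every position has 2 predecessors
    D = [(padded[i], tuple(padded[i - 2:i])) for i in range(2, len(padded))]
    print(D[0], D[3])   # ('John', ('<s>', '<s>')) ('the', ('can', 'bring'))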


Tree Building: the Objective Function (Φ)

• Form
  – A function to maximize/minimize: Φ: T × D → ℝ
    • T – the decision tree being built, D – the training data
• Distribution-based
  – As usual: (max.) probability of the data ~ (min.) entropy

    argmaxT Φ(T,D) = PT(D)  ~  argminT Φ(T,D) = −Σ y,x pT(y,x) log pT(y|x)
                                              = −1/|D| Σ i=1..|D| log pT(yi|xi)

• Error-based (~ minimum error rate training, MERT)

    argminT Φ(T,D) = ER = 1/|D| Σ i=1..|D| δ(ClassifyT(xi), yi)

  where δ(a,b) = 1 if a ≠ b and 0 otherwise (i.e., it counts misclassifications), and ClassifyT: X → Y chooses y ∈ Y given context x ∈ X using T.
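
To make the two forms concrete, here is a small numeric check (Python; the toy two-leaf tree, its leaves "L"/"R", and their distributions are invented for illustration):

    import math

    # two leaves with their p_T(y|x) distributions; each item = (gold y, leaf its x falls into)
    leaf_dist = {"L": {"A": 0.75, "B": 0.25}, "R": {"A": 0.25, "B": 0.75}}
    items = [("A", "L"), ("A", "L"), ("B", "L"), ("A", "R"), ("B", "R"), ("B", "R")]

    # distribution-based: cross-entropy -1/|D| sum log p_T(y_i|x_i)
    H = -sum(math.log2(leaf_dist[l][y]) for y, l in items) / len(items)

    # error-based: classify with the leaf argmax, count mismatches
    classify = {l: max(d, key=d.get) for l, d in leaf_dist.items()}
    ER = sum(classify[l] != y for y, l in items) / len(items)
    print(f"H = {H:.3f} bits/item, ER = {ER:.3f}")   # H = 0.943, ER = 0.333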


Tree Building: the Algorithm (~ID3)

• Using the training data and the objective function:
    Tfinal = argminT Φ(T,D)
• Too (exponentially) many possible trees (vs. the number of queries)
• Greedy search (Q: pool of possible queries); a runnable sketch follows below
  1. Start with an ‘empty’ T (a single leaf node); set Φ0 = Φ(T,D)
  2. Iterate (iteration index k = 1..kmax); set Φmin,k = Φk−1
     • For all leaves li in T
       • For all qj ∈ Q:
         – split li into a query node qj and nj (≥ 2) leaves → call the result Ti,j
         – if Φ(Ti,j,D) < Φmin,k: set Φmin,k = Φ(Ti,j,D) and remember i,j
           (i.e., Φmin,k holds the minimal value of Φ(Ti,j,D) found so far)
  3. Set Φk = Φmin,k and T = Ti,j; repeat (2) until termination
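
A minimal runnable sketch of this loop (Python; the name grow_greedy, the leaves-as-lists representation, and the per-leaf loss signature are my assumptions, not the lecture’s). It assumes Φ is additive over leaves, which holds for both error counts and the entropy sum, and it already includes the termination test from the next slide:

    def grow_greedy(items, queries, loss, eps=0.0, k_max=50):
        """Greedy node-splitting; a tree is kept only as its list of leaves."""
        leaves = [list(items)]                       # step 1: single-leaf tree
        history = [sum(map(loss, leaves))]           # Phi_0
        for k in range(1, k_max + 1):                # step 2: iterate
            best = None                              # (Phi(T_ij,D), leaf index, yes, no)
            for i, leaf in enumerate(leaves):
                rest = history[-1] - loss(leaf)      # Phi contribution of the other leaves
                for q in queries:                    # try every (leaf, query) split
                    yes = [d for d in leaf if q(d)]
                    no = [d for d in leaf if not q(d)]
                    cand = rest + loss(yes) + loss(no)
                    if best is None or cand < best[0]:
                        best = (cand, i, yes, no)
            if best is None or best[0] >= history[-1] - eps:
                break                                # step 3: no (sufficient) improvement
            leaves[best[1]:best[1] + 1] = [best[2], best[3]]   # replace leaf by the split
            history.append(best[0])
        return leaves, history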


Tree Building: Termination

• Terminating condition(s):
  – For any new query splitting any leaf node of T:
    • No improvement in the objective function Φ(T,D)
      – i.e., Φmin,k = Φk−1 at the end of the iteration; for example:
        » entropy does not go down
        » error rate does not go down
    • Only a small improvement (Φk−1 − Φmin,k < ε)
      – to avoid overtraining
  – Tree too big (or too deep)
    • to avoid overtraining, long running times, space constraints, etc.
    – set kmax to the desired maximum size of the tree, or watch T’s depth


Example: POS tagging

• Data:

    Positions: 1    2   3     4   5   6  7   8
    X:         John can bring the can to the table
    Y:         NN   MOD VBF   DET NN  PRE DET NN

• Objective function: error rate

    Φ(T,D) = 1/|D| Σ i=1..|D| δ(CT(xi), yi)

  CT is the “Classify” function: words (in/with context) → tags
• Query pool:
    q1..q6: current word (wi = John, can, bring, the, to, table)
    q7..q11: previous tag (ti-1 = NN, MOD, VBF, DET, PRE)
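
Under the stated assumptions, the whole walkthrough below can be reproduced by feeding this toy problem to the grow_greedy sketch above (the query order matches q1..q11; "<S>" is my sentinel for the missing tag before position 1; the gold previous tag serves as context, as on the slides):

    # requires grow_greedy from the sketch under "Tree Building: the Algorithm"
    from collections import Counter
    from fractions import Fraction

    words = "John can bring the can to the table".split()
    tags = "NN MOD VBF DET NN PRE DET NN".split()
    prev = ["<S>"] + tags[:-1]                    # gold t_{i-1} used as context

    queries = [lambda i, w=w: words[i] == w for w in dict.fromkeys(words)] \
            + [lambda i, t=t: prev[i] == t for t in dict.fromkeys(tags)]

    def loss(leaf):                               # errors left if the leaf takes its majority tag
        counts = Counter(tags[i] for i in leaf)
        return len(leaf) - max(counts.values()) if leaf else 0

    leaves, history = grow_greedy(range(len(words)), queries, loss)
    print([str(Fraction(h, len(words))) for h in history])
    # ['5/8', '3/8', '1/4', '1/8', '0'] -- matching Phi_0..Phi_4 below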

Example: POS tagging – Iteration 0

Start with an ‘empty’ tree: a single leaf l1 classifying everything as the most frequent tag, C: NN.

    Φ0 = Φ(T,D) = 5/8   (five of the eight positions are mistagged)

Example: POS tagging – Iteration 1 (Φ0 = 5/8; trying every query on leaf l1)

    q1 (wi=John?):   Φ(T1,1,D) = 5/8   (yes → C: NN,  no → C: NN)
    q2 (wi=can?):    Φ(T1,2,D) = 5/8   (yes → C: MOD, no → C: NN)
    q3 (wi=bring?):  Φ(T1,3,D) = 1/2   (yes → C: VBF, no → C: NN)   → new minimum: Φmin,1 = 1/2; i,j = 1,3
    q4 (wi=the?):    Φ(T1,4,D) = 3/8   (yes → C: DET, no → C: NN)   → new minimum: Φmin,1 = 3/8; i,j = 1,4
    q5 (wi=to?):     Φ(T1,5,D) = 1/2   (yes → C: PRE, no → C: NN)
    q6 (wi=table?):  Φ(T1,6,D) = 5/8   (yes → C: NN,  no → C: NN)
    q7 (ti-1=NN?):   Φ(T1,7,D) = 1/2   (yes → C: MOD, no → C: NN)
    q8 (ti-1=MOD?):  Φ(T1,8,D) = 1/2   (yes → C: VBF, no → C: NN)
    q9 (ti-1=VBF?):  Φ(T1,9,D) = 1/2   (yes → C: DET, no → C: NN)
    q10 (ti-1=DET?): Φ(T1,10,D) = 1/2  (yes → C: NN,  no → C: DET)
    q11 (ti-1=PRE?): Φ(T1,11,D) = 1/2  (yes → C: DET, no → C: NN)

End of iteration 1.

The best split found is leaf l1 with query q4, Φmin,1 = 3/8; i,j = 1,4. Set Φ1 = 3/8 and split l1. The tree is now:

    wi=the? (q4)
    ├─ yes → l1 (C: DET)
    └─ no → l2 (C: NN)

Φ is decreasing … go on to the next iteration.

Example: POS tagging – Iteration 2 (Φ1 = 3/8)

Splitting leaf l1 (only the two wi = the positions reach it):

    q1 (wi=John?):   Φ(T1,1,D) = 3/8   (the yes-branch gets no data here)
      Can we ignore the queries with the other words? … yes (same wi).
    q7 (ti-1=NN?):   Φ(T1,7,D) = 3/8   (here too the yes-branch happens to get no data)
      Can we ignore the queries with tags at i−1? … no (but see below for a more powerful heuristic).
    q11 (ti-1=PRE?): Φ(T1,11,D) = 3/8  (both halves would classify DET)
      OK, so what’s the heuristic? Leaves with absolute classification certainty need not be split.
      This leaf is final. (If wi = the, DET is the only possible tag.)

Splitting leaf l2:

    q1 (wi=John?):   Φ(T2,1,D) = 3/8   (yes → C: NN,  no → C: NN)
    q2 (wi=can?):    Φ(T2,2,D) = 3/8   (yes → C: MOD, no → C: NN)
    q3 (wi=bring?):  Φ(T2,3,D) = 1/4   (yes → C: VBF, no → C: NN)   → new minimum: Φmin,2 = 1/4; i,j = 2,3
    q4 (wi=the?):    Φ(T2,4,D) = 3/8   (no data with wi = the reaches l2)
      Should I look at the same question again? Queries once used (on the path) may be excluded … unlike in TBEDL!
    q5 (wi=to?):     Φ(T2,5,D) = 1/4   (yes → C: PRE, no → C: NN)
    q6 (wi=table?):  Φ(T2,6,D) = 3/8   (yes → C: NN,  no → C: NN)
    q7 (ti-1=NN?):   Φ(T2,7,D) = 1/4   (yes → C: MOD, no → C: NN)
    q8 (ti-1=MOD?):  Φ(T2,8,D) = 1/4   (yes → C: VBF, no → C: NN)
    q9 (ti-1=VBF?):  Φ(T2,9,D) = 3/8   (yes → C: DET, no → C: NN)
    q10 (ti-1=DET?): Φ(T2,10,D) = 3/8  (yes → C: NN,  no → C: NN)
    q11 (ti-1=PRE?): Φ(T2,11,D) = 3/8  (yes → C: DET, no → C: NN)

End of iteration 2: the best split is leaf l2 with q3, Φ2 = 1/4. The tree is now:

    wi=the? (q4)
    ├─ yes → l1 (C: DET)
    └─ no → wi=bring? (q3)
            ├─ yes → l2 (C: VBF)
            └─ no → l3 (C: NN)

Example: POS tagging – Iteration 3 (Φ2 = 1/4; the used queries q4 and q3 are deleted from the pool; l2 is final: no ambiguity)

Splitting leaf l3:

    q1 (wi=John?):   Φ(T3,1,D) = 1/4   (yes → C: NN,  no → C: NN)
    q2 (wi=can?):    Φ(T3,2,D) = 1/4   (yes → C: MOD, no → C: NN)
    q5 (wi=to?):     Φ(T3,5,D) = 1/8   (yes → C: PRE, no → C: NN)   → new minimum: Φmin,3 = 1/8; i,j = 3,5
    q6 (wi=table?):  Φ(T3,6,D) = 1/4   (yes → C: NN,  no → C: NN)
    q7 (ti-1=NN?):   Φ(T3,7,D) = 1/8   (yes → C: MOD, no → C: NN)
    q8 (ti-1=MOD?):  Φ(T3,8,D) = 1/4   (the yes-branch gets no data)
    q9 (ti-1=VBF?):  Φ(T3,9,D) = 1/4   (the yes-branch gets no data)
    q10 (ti-1=DET?): Φ(T3,10,D) = 1/4  (yes → C: NN,  no → C: NN)
    q11 (ti-1=PRE?): Φ(T3,11,D) = 1/4  (the yes-branch gets no data)

End of iteration 3: split l3 with q5, Φ3 = 1/8; the yes-leaf l3 (C: PRE) is final. Start of iteration 4.

Example: POS tagging – Iteration 4 (Φ3 = 1/8; q5 deleted from the pool)

Splitting leaf l4:

    q1 (wi=John?):  Φ(T4,1,D) = 1/8   (yes → C: NN,  no → C: NN)
    q2 (wi=can?):   Φ(T4,2,D) = 1/8   (yes → C: MOD, no → C: NN)
    q6 (wi=table?): Φ(T4,6,D) = 1/8   (yes → C: NN,  no → C: NN)
    q7 (ti-1=NN?):  Φ(T4,7,D) = 0     (yes → C: MOD, no → C: NN)   → new minimum: Φmin,4 = 0; i,j = 4,7

(Φ = 0 cannot be improved, so the remaining queries need not be tried.)

End of iteration 4: Φ4 = 0, no training errors remain. End of training.

The final decision tree:

    wi=the? (q4)
    ├─ yes → C: DET (l1)
    └─ no → wi=bring? (q3)
            ├─ yes → C: VBF (l2)
            └─ no → wi=to? (q5)
                    ├─ yes → C: PRE (l3)
                    └─ no → ti-1=NN? (q7)
                            ├─ yes → C: MOD (l4)
                            └─ no → C: NN (l5)
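
Written out as code, the final tree is just nested conditionals (a transcription of the diagram, not from the slides; tagging left to right, the predicted ti-1 happens to equal the gold one here):

    def tag(w_i, t_prev):
        if w_i == "the":
            return "DET"                          # leaf l1
        if w_i == "bring":
            return "VBF"                          # leaf l2
        if w_i == "to":
            return "PRE"                          # leaf l3
        return "MOD" if t_prev == "NN" else "NN"  # leaves l4 / l5

    out, t = [], "<S>"
    for w in "John can bring the can to the table".split():
        t = tag(w, t)
        out.append(t)
    print(out)   # ['NN', 'MOD', 'VBF', 'DET', 'NN', 'PRE', 'DET', 'NN'] -- zero errors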


Generalized Case

• Same algorithm
• Objective function – entropy:

    argminT Φ(T,D) = −Σ y,x pT(y,x) log pT(y|x)
                   = −1/|D| Σ i=1..|D| log pT(yi|xi)

• Effective computation
  – Compute only the change of entropy at the split node
    • savings are (much) greater towards the end of the computation
  – “Final”-leaf heuristic:
    • zero entropy of that leaf’s distribution
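
A sketch of the local computation (Python; the function names are mine): the effect of splitting one leaf on the total entropy depends only on that leaf’s own counts, so each candidate split can be scored without touching the rest of the tree:

    import math
    from collections import Counter

    def H(labels):                                # empirical entropy of one leaf
        counts, n = Counter(labels), len(labels)
        return -sum(c / n * math.log2(c / n) for c in counts.values()) if labels else 0.0

    def split_gain(labels, mask):                 # entropy change caused by this split alone
        yes = [y for y, m in zip(labels, mask) if m]
        no = [y for y, m in zip(labels, mask) if not m]
        w = len(yes) / len(labels)
        return H(labels) - (w * H(yes) + (1 - w) * H(no))

    leaf = ["NN", "MOD", "VBF", "NN", "PRE", "NN"]
    print(split_gain(leaf, [y == "NN" for y in leaf]))   # 1.0 bit gained by this split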


Avoiding Overtraining (Overfitting)

• Stop growing the tree early
  – Set a threshold for the entropy gain at ε > 0
• Smoothing (for the generalized case)
  – Keep a distribution at every node (not just at the leaves), plus a weight (the “lambda”)
  – Smooth along the path taken from the root to the leaf
  – Train the weights by EM on heldout data
• Tree pruning
  – Use heldout data H for the Φ(T,H) computation
  – Remove a node if Φ(T,H) decreases (or stays the same ← minimal complexity principle)
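
A sketch of the smoothing step (Python; the fixed lambdas and the distributions are placeholders of my own; the slide trains the lambdas by EM on heldout data):

    def smooth_path(path_dists, lambdas, classes):
        """Interpolate the root-to-leaf distributions; lambdas should sum to 1."""
        return {c: sum(lam * d.get(c, 0.0) for lam, d in zip(lambdas, path_dists))
                for c in classes}

    root = {"NN": 0.4, "DET": 0.2, "MOD": 0.2, "VBF": 0.2}
    mid = {"NN": 0.7, "MOD": 0.3}
    leaf = {"NN": 0.95, "MOD": 0.05}
    smoothed = smooth_path([root, mid, leaf], [0.1, 0.3, 0.6],
                           ["NN", "DET", "MOD", "VBF"])
    print(smoothed)   # leaf-dominated, but DET and VBF keep a little mass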


Generalizations / Variants

• Historically: C4.5, C5.0
  – Conversion to rules at the end, pruning the rules at any “node” (not just at leaves)
  – Ability to use incomplete data for training (unknown values of some attributes)
  – Use of continuous-valued attributes
    • Key: finding (effectively) the threshold t for converting to the query “is valueA < t?” (sketched below)
  – Effective runtime software for many programming languages
• Decision lists
  – “Narrow” graphs; allow multiple parents, special features
    • in certain cases, like WSD, avoid partitioning the data too finely
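
A sketch of the threshold search (Python; using the error count as the score for brevity, whereas C4.5 itself uses an entropy-based criterion): only the midpoints between adjacent sorted values with different labels need to be tried:

    def best_threshold(values, labels):
        pairs = sorted(zip(values, labels))
        majority_err = len(labels) - max(labels.count(a) for a in set(labels))
        best = (majority_err, None)                    # score of "no split at all"
        for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
            if l1 == l2:
                continue                               # no label change: skip this midpoint
            t = (v1 + v2) / 2
            sides = ([l for v, l in pairs if v < t], [l for v, l in pairs if v >= t])
            err = sum(len(s) - max(s.count(a) for a in set(s)) for s in sides)
            if err < best[0]:
                best = (err, t)
        return best                                    # (errors after split, threshold t)

    print(best_threshold([36.6, 37.0, 38.2, 39.5, 40.1],
                         ["ok", "ok", "ok", "flu", "flu"]))   # (0, 38.85)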


Further Reading

Mitchell, T. M. (1997): Machine Learning. WCB/McGraw-Hill, ISBN 0-07-042807-7, Chapter 3.

Quinlan, J. R. (1986): Induction of Decision Trees. Machine Learning 1(1), 81–106.

Quinlan, J. R. (1993): C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
– http://rulequest.com (Quinlan’s company)
– C5.0/2.09 (latest version)

Rokach, L., Maimon, O. (2005): Top-down Induction of Decision Trees Classifiers – a Survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35(4), 476–487.


