The Greedy Prepend Algorithm for Decision List Induction

Deniz Yuret

Michael de la Maza


Overview

• Decision Lists

• Greedy Prepend Algorithm

• Opus search and UCI problems

• Version space search and secondary structure prediction

• Limited look-ahead search and Turkish morphology disambiguation


Introduction to Decision Lists

• Prototypical machine learning problem:
  – Decide democrat or republican for 435 representatives based on 16 votes.

Class Name: 2 (democrat, republican)
1. handicapped-infants: 2 (y,n)
2. water-project-cost-sharing: 2 (y,n)
3. adoption-of-the-budget-resolution: 2 (y,n)
4. physician-fee-freeze: 2 (y,n)
5. el-salvador-aid: 2 (y,n)
6. religious-groups-in-schools: 2 (y,n)
…
16. export-administration-act-south-africa: 2 (y,n)


Introduction to Decision Lists

• Prototypical machine learning problem:
  – Decide democrat or republican for 435 representatives based on 16 votes.

1. If adoption-of-the-budget-resolution = y and anti-satellite-test-ban = n and water-project-cost-sharing = y then democrat
2. If physician-fee-freeze = y then republican
3. If TRUE then democrat
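
A decision list like the one above is read top to bottom, and the first rule whose condition holds decides the class. The following is a minimal sketch of that evaluation, assuming rules are stored as (conditions, label) pairs; the names `VOTE_DLIST` and `classify` and the sample voting record are hypothetical and not taken from the slides.

```python
# A decision list as an ordered list of (conditions, label) rules.
# Conditions map an attribute to its required value; an empty dict always matches.
VOTE_DLIST = [
    ({"adoption-of-the-budget-resolution": "y",
      "anti-satellite-test-ban": "n",
      "water-project-cost-sharing": "y"}, "democrat"),
    ({"physician-fee-freeze": "y"}, "republican"),
    ({}, "democrat"),  # "If TRUE then democrat"
]

def classify(dlist, instance):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in dlist:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    raise ValueError("decision list has no default rule")

# Hypothetical voting record, made up for illustration only.
record = {"physician-fee-freeze": "y", "adoption-of-the-budget-resolution": "n"}
print(classify(VOTE_DLIST, record))  # rule 1 fails, rule 2 fires: republican
```

Because evaluation stops at the first match, rule order matters: moving the TRUE rule to the top would make every later rule unreachable.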


Alternative Representations

• Decision trees:


Alternative Representations

• CNF:

• DNF:


Alternative Representations

• For 0 < k < n and n > 2, k-CNF(n) ∪ k-DNF(n) is a subset of k-DL(n)

• For 0 < k < n and n > 2, k-DT(n) is a subset of k-CNF(n) ∩ k-DNF(n)

• k-DT(n) is a subset of k-DL(n)

(Rivest 1987)
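
One direction of the first inclusion is easy to see by construction: each term of a k-DNF formula becomes a rule "if term then true", followed by a default rule "if TRUE then false". The sketch below illustrates only that DNF case on a small made-up formula; the helper names are hypothetical, and it is an illustration, not a proof.

```python
from itertools import product

def dnf_to_dlist(terms):
    """Turn a DNF formula (a list of terms) into a decision list.
    Each term is a dict mapping a variable to its required boolean value."""
    return [(term, True) for term in terms] + [({}, False)]

def eval_dnf(terms, x):
    """True if at least one term is satisfied by assignment x."""
    return any(all(x[v] == val for v, val in term.items()) for term in terms)

def eval_dlist(dlist, x):
    """Label of the first rule whose conditions hold under assignment x."""
    for conditions, label in dlist:
        if all(x[v] == val for v, val in conditions.items()):
            return label

# (x1 AND NOT x2) OR (x2 AND x3): a 2-DNF over n = 3 variables.
variables = ["x1", "x2", "x3"]
terms = [{"x1": True, "x2": False}, {"x2": True, "x3": True}]
dlist = dnf_to_dlist(terms)

# Check agreement on all 2^3 assignments.
agree = all(
    eval_dnf(terms, dict(zip(variables, bits))) == eval_dlist(dlist, dict(zip(variables, bits)))
    for bits in product([False, True], repeat=len(variables))
)
print(agree)
```

Every rule in the resulting list tests at most k literals, so the list is a k-DL.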


Overview

• Decision Lists

• Greedy Prepend Algorithm

• Opus search and UCI problems

• Version space search and secondary structure prediction

• Limited look-ahead search and Turkish morphology disambiguation


Decision List Induction

• Start with an empty decision list or a default rule.

• Keep adding the best rule that covers the unclassified and misclassified cases.

Design Decisions:

• Where to add the new rules (front, back)

• Criteria for best rule

• Search algorithm for best rule


The Greedy Prepend Algorithm

GPA(data)
    dlist = NIL
    default-class = most-common-class(data)
    rule = [ if true then default-class ]
    while gain(rule, dlist, data) > 0
        do dlist = prepend(rule, dlist)
           rule = max-gain-rule(dlist, data)
    return dlist
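
The pseudocode can be sketched in Python as follows. The loop mirrors the pseudocode, with gain taken as the increase in the number of correctly classified training instances. The `max_gain_rule` search here is an assumption made for brevity: it tries only single attribute=value tests, which is much weaker than the search procedures the slides describe. The data layout (a list of (attribute dict, class) pairs) and the toy data are also hypothetical.

```python
from collections import Counter

def classify(dlist, x):
    """Label of the first matching rule, or None for an empty list."""
    for conditions, label in dlist:
        if all(x.get(a) == v for a, v in conditions.items()):
            return label
    return None

def num_correct(dlist, data):
    return sum(classify(dlist, x) == y for x, y in data)

def gain(rule, dlist, data):
    """Increase in correctly classified instances if rule is prepended."""
    return num_correct([rule] + dlist, data) - num_correct(dlist, data)

def max_gain_rule(dlist, data):
    """Assumed search: best rule with a single attribute=value condition."""
    classes = sorted({y for _, y in data})
    tests = sorted({(a, v) for x, _ in data for a, v in x.items()})
    candidates = [({a: v}, c) for a, v in tests for c in classes]
    return max(candidates, key=lambda r: gain(r, dlist, data))

def gpa(data):
    dlist = []
    default_class = Counter(y for _, y in data).most_common(1)[0][0]
    rule = ({}, default_class)            # [ if true then default-class ]
    while gain(rule, dlist, data) > 0:
        dlist = [rule] + dlist            # prepend
        rule = max_gain_rule(dlist, data)
    return dlist

# Toy data, made up for illustration.
data = [
    ({"a": "y", "b": "y"}, "D"),
    ({"a": "y", "b": "n"}, "D"),
    ({"a": "n", "b": "y"}, "R"),
    ({"a": "n", "b": "n"}, "D"),
    ({"a": "n", "b": "y"}, "R"),
]
for conditions, label in gpa(data):
    print(conditions, "->", label)
```

On this toy data the default rule (class D) is added first, then a rule for a = n, then a rule for b = n that repairs the one instance the second rule broke. Later rules sit in front of earlier ones, so each new rule acts as an exception to the list built so far.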


The Greedy Prepend Algorithm

• Starts with a default rule that picks the most common class

• Prepends subsequent rules to the front of the decision list

• The best rule is the one with maximum gain (increase in number of correctly classified instances)

• Several search algorithms implemented


Rule Search

• The default rule predicts all instances to belong to the most common category

[Figure: the training set partitioned with respect to the base rule into correct assignments (+) and false assignments (-)]


Rule Search

• At each step add the maximum gain rule

[Figure: the partition with respect to the decision list (+ / -) crossed with the partition with respect to the next rule (+ / -)]


Overview

• Decision Lists

• Greedy Prepend Algorithm

• Opus search and UCI problems

• Version space search and secondary structure prediction

• Limited look-ahead search and Turkish morphology disambiguation


Opus Search: Simple tree


Opus Search: Fixed order tree


Opus Search: Optimal pruning


GPA-Opus on UCI Problems


Overview

• Decision Lists

• Greedy Prepend Algorithm

• Opus search and UCI problems

• Version space search and secondary structure prediction

• Limited look-ahead search and Turkish morphology disambiguation


A Generic Prediction Algorithm: Sequence to Structure

The structure string is filled in progressively from left to right over successive slides (? = not yet predicted, - = loop, H = helix, E = strand):

Sequence:  MRRWFHPNITGVEAENLLLTRGVDGSFLARPSKSNPGD
Structure: ??????????????????????????????????????
Structure: -?????????????????????????????????????
Structure: --????????????????????????????????????
Structure: ---???????????????????????????????????
Structure: ----H-----????????????????????????????
Structure: ----H-----H???????????????????????????
Structure: ----H-----HH??????????????????????????
Structure: ----H-----HHHHHHHHHH------EEEEE------?
Structure: ----H-----HHHHHHHHHH------EEEEE-------


GPA Rules

• The first three rules of the sequence-to-structure decision list reach 58.86% accuracy (the full list reaches 66.36%)


GPA Rule 1

• Everything => Loop


GPA Rule 2

                        HELIX
L4    L3    L2    L1    0     R1    R2    R3    R4
*     *     !GLY  !GLY  !ASN  !GLY  !PRO  !PRO  !PRO
                  !PRO  !GLY  !PRO
                        !PRO
                        !SER
                        (Non-polar or large)


GPA Rule 3

        STRAND        

L4 L3 L2 L1 0 R1 R2 R3 R4

!LEU !ALA !ASP !ALA CYS !PRO !ARG !LEU !LEU

!LEU

!GLN 

!ASP ILE !GLN !MET  !MET

   

!GLU 

!GLY LEU !GLU  

      !PRO PHE !LYS  

      TRP !PRO  

        TYR

 (Non-Polar and Not

Charged)

   

        VAL      

        (Non-polar)      


GPA-Opus not feasible for secondary structure prediction

• 9 positions

• 20 possible amino-acids per position

• Size of rule space:
  – With only pos=val type attributes: 21^9
  – If we include disjunctions: 2^180
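
One reading of these two figures, stated here as an assumption rather than something the slides spell out: with pos=val tests, each of the 9 positions is either unconstrained or fixed to one of 20 amino acids, giving 21 options per position; with disjunctions, each position can allow any subset of the 20 amino acids, giving 2^20 options per position. The arithmetic can be checked directly:

```python
POSITIONS = 9
AMINO_ACIDS = 20

# Each position: one of 20 values, or "don't care" (assumed reading of 21^9).
single_value_rules = (AMINO_ACIDS + 1) ** POSITIONS

# Each position: any subset of the 20 values (assumed reading of 2^180).
disjunctive_rules = (2 ** AMINO_ACIDS) ** POSITIONS

print(f"{single_value_rules:,}")             # 794,280,046,581
print(disjunctive_rules == 2 ** 180)         # True
```

Even the smaller space has nearly 8 × 10^11 rules, which is consistent with the slide's point that an exhaustive search is not practical here.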


GPA Version Space Search

Searching for a candidate rule:

• Pick a random instance

• If the instance is currently misclassified and candidate rule corrects it: generalize candidate rule to include instance

• If the instance is currently correct and candidate rule changes classification: specialize candidate rule to exclude instance
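
The loop above can be sketched as follows. Several details are assumptions, since the slides do not specify them: a candidate rule is represented as one set of allowed residues per window position, generalizing adds the instance's residue at every position, and specializing removes the instance's residue at one randomly chosen position. The function names, the fixed step count, and the two-position toy windows are hypothetical.

```python
import random

def covers(allowed, window):
    """True if every residue in the window is allowed at its position."""
    return all(res in ok for res, ok in zip(window, allowed))

def search_candidate(data, current_pred, label, steps=1000, seed=0):
    """data: list of (window, true_class) pairs.
    current_pred: classes assigned by the current decision list.
    label: the class the candidate rule would assign."""
    rng = random.Random(seed)
    width = len(data[0][0])
    allowed = [set() for _ in range(width)]   # starts out covering nothing
    for _ in range(steps):
        i = rng.randrange(len(data))          # pick a random instance
        window, truth = data[i]
        if current_pred[i] != truth and truth == label:
            # Misclassified, and the candidate would correct it: generalize.
            for pos, res in enumerate(window):
                allowed[pos].add(res)
        elif current_pred[i] == truth and truth != label and covers(allowed, window):
            # Correct now, but the candidate would change it: specialize.
            pos = rng.randrange(width)
            allowed[pos].discard(window[pos])
    return allowed

# Toy example: the current list calls everything "L" (loop).
data = [("AB", "H"), ("CD", "L")]
current_pred = ["L", "L"]
allowed = search_candidate(data, current_pred, label="H")
print(covers(allowed, "AB"), covers(allowed, "CD"))
```

Specializing can also exclude instances that were included earlier, so a fuller version would presumably track the gain of the candidate as it evolves and keep the best one seen.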


GPA Secondary Structure Prediction Results

• PhD 72.3

• NNSSP 71.7

• GPA 69.2

• DSC 69.1

• Predator 69.0


Overview

• Decision Lists

• Greedy Prepend Algorithm

• Opus search and UCI problems

• Version space search and secondary structure prediction

• Limited look-ahead search and Turkish morphology disambiguation


Morphological Analyzer for Turkish

masalı
• masal+Noun+A3sg+Pnon+Acc (= the story)
• masal+Noun+A3sg+P3sg+Nom (= his story)
• masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (= with tables)

• Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing.

• Oflazer, K., Hakkani-Tür, D. Z., and Tür, G. (1999). Design for a Turkish treebank. EACL'99.

• Beesley, K. R. and Karttunen, L. (2003). Finite State Morphology. CSLI Publications.


Features, IGs and Tags

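• 126 unique features
• 9129 unique IGs
• ∞ unique tags
• 11084 distinct tags observed in 1M word training corpus

masa+Noun+A3sg+Pnon+Nom^DB+Adj+With

[Figure labels: stem = masa; features = Noun, A3sg, Pnon, Nom, Adj, With; inflectional groups (IGs) = Noun+A3sg+Pnon+Nom and Adj+With; derivational boundary = ^DB; tag = Noun+A3sg+Pnon+Nom^DB+Adj+With]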
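
To make the terminology concrete, the analysis string can be split mechanically into its parts. This is a minimal sketch, assuming the stem is everything before the first "+", IGs are separated by "^DB+", and features within an IG are separated by "+"; the function name and the returned dictionary keys are hypothetical.

```python
def parse_analysis(analysis):
    """Split 'stem+F1+F2^DB+F3' into stem, tag, IGs, and per-IG features."""
    stem, _, tag = analysis.partition("+")
    igs = tag.split("^DB+")                  # derivational boundaries separate IGs
    features = [ig.split("+") for ig in igs]
    return {"stem": stem, "tag": tag, "igs": igs, "features": features}

parsed = parse_analysis("masa+Noun+A3sg+Pnon+Nom^DB+Adj+With")
print(parsed["stem"])      # masa
print(parsed["igs"])       # ['Noun+A3sg+Pnon+Nom', 'Adj+With']
print(parsed["features"])  # [['Noun', 'A3sg', 'Pnon', 'Nom'], ['Adj', 'With']]
```

Since derivation can in principle be applied repeatedly, the number of possible IG sequences, and hence of tags, is unbounded, while the feature inventory stays small. That contrast is what the counts on the slide express.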


Morphological disambiguation

• Task: pick correct parse given context
1. masal+Noun+A3sg+Pnon+Acc
2. masal+Noun+A3sg+P3sg+Nom
3. masa+Noun+A3sg+Pnon+Nom^DB+Adj+With

– Uzun masalı anlat: Tell the long story
– Uzun masalı bitti: His long story ended
– Uzun masalı oda: Room with long table


Morphological disambiguation

• Task: pick correct parse given context
1. masal+Noun+A3sg+Pnon+Acc
2. masal+Noun+A3sg+P3sg+Nom
3. masa+Noun+A3sg+Pnon+Nom^DB+Adj+With

Key Idea

Build a separate classifier for each feature.


GPA on Morphological Disambiguation

1. If (W = çok) and (R1 = +DA) then W has +Det
2. If (L1 = pek) then W has +Det
3. If (W = +AzI) then W does not have +Det
4. If (W = çok) then W does not have +Det
5. If TRUE then W has +Det

Examples:
• “pek çok alanda” (rule 1)
• “pek çok insan” (rule 2)
• “insan çok daha” (rule 4)


GPA-Opus not feasible

Attributes for a five word window:
• The exact word string (e.g. W=Ali'nin)
• The lowercase version (e.g. W=ali'nin)
• All suffixes (e.g. W=+n, W=+In, W=+nIn, W=+'nIn, etc.)
• Character types (e.g. Ali'nin would be described with W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWER-LAST)

Average 40 features per instance.
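
A rough sketch of how such attributes might be generated for one window is given below. It simplifies in ways the slides do not: suffixes are emitted as literal lowercase strings (the slides use abstracted forms such as +In), the suffix length is capped at a hypothetical `max_suffix`, Python's `str.lower()` is not Turkish-aware (it maps I to i rather than ı), and the position names L2 and R2 are assumed by analogy with L1 and R1.

```python
def char_type(ch):
    """Coarse character class used for the FIRST/MID/LAST attributes."""
    if ch.isupper():
        return "UPPER"
    if ch.islower():
        return "LOWER"
    if ch.isdigit():
        return "DIGIT"
    if ch == "'":
        return "APOS"
    return "OTHER"

def word_attributes(word, pos="W", max_suffix=4):
    """Attributes for one non-empty word at window position pos."""
    lower = word.lower()
    attrs = {f"{pos}={word}", f"{pos}={lower}"}
    # Proper suffixes, shortest first, up to max_suffix characters.
    for n in range(1, min(max_suffix, len(lower) - 1) + 1):
        attrs.add(f"{pos}=+{lower[-n:]}")
    attrs.add(f"{pos}={char_type(word[0])}-FIRST")
    for ch in word[1:-1]:
        attrs.add(f"{pos}={char_type(ch)}-MID")
    attrs.add(f"{pos}={char_type(word[-1])}-LAST")
    return attrs

def window_attributes(words, i):
    """Attributes for the five-word window centred on words[i]."""
    attrs = set()
    for offset, name in [(-2, "L2"), (-1, "L1"), (0, "W"), (1, "R1"), (2, "R2")]:
        j = i + offset
        if 0 <= j < len(words):
            attrs |= word_attributes(words[j], name)
    return attrs

print(sorted(word_attributes("Ali'nin")))
```

Representing each instance as a set of attribute strings keeps rule matching simple: a rule such as (W = çok) and (R1 = +DA) applies when both strings are in the instance's attribute set.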


GPA limited look-ahead search

• New rules are restricted to adding one new feature to existing rules in the decision list
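
Candidate generation under this restriction might look like the sketch below. It assumes rules are (set of attribute strings, class) pairs and that a candidate may assign any class regardless of the class of the rule it extends; both are assumptions, as are the function name and the example values.

```python
def lookahead_candidates(dlist, features, classes):
    """Candidate rules: the conditions of an existing rule plus one new feature."""
    seen = set()
    candidates = []
    for conditions, _ in dlist:
        for feature in features:
            if feature in conditions:
                continue
            extended = frozenset(conditions | {feature})
            for cls in classes:
                if (extended, cls) not in seen:   # skip duplicates
                    seen.add((extended, cls))
                    candidates.append((extended, cls))
    return candidates

# Current list: "if W=çok then no +Det", then the default "if TRUE then +Det".
dlist = [(frozenset({"W=çok"}), False), (frozenset(), True)]
features = ["W=çok", "L1=pek", "R1=+DA"]
candidates = lookahead_candidates(dlist, features, classes=[True, False])
print(len(candidates))
```

Extending the default rule yields every single-feature rule, and extending "W=çok" yields two-feature rules such as (W = çok) and (R1 = +DA). The candidate set therefore grows with the list instead of covering all feature combinations up front.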


GPA Turkish morphological disambiguation results

• Test corpus: 1000 words, hand tagged

• Accuracy: 95.87% (conf. int: 94.57-97.08)

• Better than the training data!?


Contributions and Future Work

• Established GPA as a competitive alternative to SVMs, C4.5, etc.

• Need theory on why the best-gain rule does well.

• Need to study robustness to irrelevant or redundant attributes.

• Need to speed up the application of the resulting decision lists (convert to FSM?)

