Maxent and PosTagging

Date post: 11-Nov-2014
Description:
POS Tagging using Mallet
Page 1: Maxent and PosTagging

Mallet & MaxEnt POS Tagging

Shallow Processing Techniques for NLP Ling570

November 16, 2011

Page 2: Maxent and PosTagging

Roadmap
• Mallet
   • Classifiers
   • Testing
   • Resources
• HW #8
• MaxEnt POS Tagging
   • POS Tagging as classification
   • Feature engineering
   • Sequence labeling

Page 3: Maxent and PosTagging

Mallet Commands
• Mallet command types:
   • Data preparation
   • Data/model inspection
   • Training
   • Classification
• Command line scripts
   • Shell scripts
      • Set up java environment
      • Invoke java programs
   • --help lists command line parameters for scripts

Page 4: Maxent and PosTagging

Mallet Data
• Mallet data instances:
   • Instance_id label f1 v1 f2 v2 ...
• Stored in internal binary format: "vectors"
   • Binary format used by learners, decoders
   • Need to convert text files to binary format

Page 9: Maxent and PosTagging

Building & Accessing Models
• bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion 0.9 --output-classifier OF
   • Builds classifier model
   • Can also store model, produce scores, confusion matrix, etc
   • --trainer: MaxEnt, DecisionTree, NaiveBayes, etc
   • --report: train:accuracy, test:f1:en

Confusion Matrix, row=true, column=predicted  accuracy=1.0
label    0  1  |total
 0  de   1  .  |1
 1  en   .  1  |1
Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0
Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0
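
The accuracy printed with the confusion matrix can be recomputed from the counts. The following is a minimal sketch, assuming the matrix is held as a list of rows (row = true label, column = predicted label); the function name `accuracy_from_confusion` is illustrative and not part of Mallet.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified instances (the diagonal) / all instances."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Counts from the slide: one 'de' and one 'en' instance, both classified correctly.
confusion = [[1, 0],
             [0, 1]]
print(accuracy_from_confusion(confusion))  # 1.0
```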

Page 11: Maxent and PosTagging

Accessing Classifiers
• classifier2info --classifier maxent.model
   • Prints out contents of model file

FEATURES FOR CLASS en
 <default> -0.036953801963395115
 book 0.004605219133228236
 the 0.24270652500835088
 i 0.004605219133228236

Page 15: Maxent and PosTagging

Testing
• Use new data to test a previously built classifier
   • bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model
   • Also instance file, directories: classify-file, classify-dir
   • Prints class,score matrix

Inst_id  class1 score1  class2 score2
array:0  en 0.995  de 0.0046
array:1  en 0.970  de 0.0294
array:2  en 0.064  de 0.935
array:3  en 0.094  de 0.905
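
To turn the class,score matrix into one predicted label per instance, each line can be parsed and the highest-scoring class picked. This is a sketch only, assuming the whitespace-separated layout shown on the slide (instance id, then class/score pairs); `parse_classification_line` and `best_class` are hypothetical helper names, not Mallet functions.

```python
def parse_classification_line(line):
    """Split 'inst_id class1 score1 class2 score2 ...' into (id, {class: score})."""
    fields = line.split()
    # One id plus an even number of class/score fields gives an odd total.
    if len(fields) < 3 or len(fields) % 2 == 0:
        raise ValueError(f"unexpected line format: {line!r}")
    inst_id, rest = fields[0], fields[1:]
    scores = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest), 2)}
    return inst_id, scores


def best_class(scores):
    """Return the class with the highest score."""
    return max(scores, key=scores.get)


# Two lines copied from the slide's sample output.
for line in ["array:0 en 0.995 de 0.0046", "array:2 en 0.064 de 0.935"]:
    inst_id, scores = parse_classification_line(line)
    print(inst_id, best_class(scores))
```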

Page 18: Maxent and PosTagging

General Use
• bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors
   • Builds binary representation from feature:value pairs
• bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model
   • Trains MaxEnt classifier and stores model
• bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model
   • Tests on the new data

Page 21: Maxent and PosTagging

Other Information
• Website:
   • Download and documentation (such as it is)
   • http://mallet.cs.umass.edu
• API tutorial:
   • http://mallet.cs.umass.edu/mallet-tutorial.pdf
• Local guide (refers to older version 0.4)
   • http://courses.washington.edu/ling572/winter07/homework/mallet_guide.pdf

Page 22: Maxent and PosTagging

HW #8

Page 24: Maxent and PosTagging

Goals
• Get experience with Mallet
   • Import data
   • Build and evaluate classifiers
• Build your own text classification systems w/Mallet
   • 20 Newsgroups data
   • Build your own feature extractor
   • Train and test classifiers

Page 25: Maxent and PosTagging

Text Classification
• Q1: Build representations of 20 Newsgroups data
   • Use mallet built-in functions
   • text2vectors --input dropbox…/20_newsgroups/* --skip-headers --output news3.vectors
• Q2: Do the same thing but build your own features

Page 26: Maxent and PosTagging

Feature Creation
• Skip headers
   • Read data only from the first blank line onward
• Simple tokenization:
   • Convert all non-alphabetic chars ([^a-zA-Z]) to white space
   • Convert everything to lowercase
   • Split tokens on white space
• Feature values
   • Frequencies of tokens in documents
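
The steps above can be sketched in a few lines of Python. This is one possible reading of the recipe, not the reference solution for the homework: the function name `extract_features` is made up, and returning no features for a message with no blank line is an assumption.

```python
import re
from collections import Counter


def extract_features(raw_message):
    """Return token -> frequency for the body of a newsgroup message."""
    # Skip headers: keep only the text after the first blank line.
    parts = re.split(r"\r?\n[ \t]*\r?\n", raw_message, maxsplit=1)
    body = parts[1] if len(parts) == 2 else ""  # assumption: no blank line, no body
    # Replace every non-alphabetic character with a space, then lowercase.
    cleaned = re.sub(r"[^a-zA-Z]", " ", body).lower()
    # Split on white space and count token frequencies.
    return Counter(cleaned.split())


message = "Xref: host\nLines: 2\n\nThe gun, the GUN-show!\n"
print(extract_features(message))
```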

Page 27: Maxent and PosTagging

Example

Xref: cantaloupe.srv.cs.cmu.edu misc.headlines:41568 talk.politics.guns:53293

Lines: 38

[email protected] wrote:

: In article <[email protected]>, [email protected] (Steve Manes) writes:

Due to F. Xia

Page 28: Maxent and PosTagging

Tokenized Example

[email protected] wrote:

:In article<[email protected]>, [email protected](SteveManes) writes:

hambidge bms com wrote

In article c psog c magpie linknet com manes magpie linknet com stevemanes writes

Due to F. Xia

Page 29: Maxent and PosTagging

Example Feature Vector
• guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 article:1 as:5 associates:1 at:1 average:2 bait:1 ...

Due to F. Xia
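
A feature vector line like the one above can be produced from a label and a dictionary of token counts. The sketch below is illustrative: `format_instance` is a hypothetical helper, and sorting features alphabetically simply mirrors the example; the slides do not say whether any particular order is required.

```python
def format_instance(label, counts):
    """Render 'label feat:value feat:value ...' with features sorted by name."""
    pairs = " ".join(f"{feat}:{counts[feat]}" for feat in sorted(counts))
    return f"{label} {pairs}".rstrip()


print(format_instance("guns", {"about": 2, "a": 11, "absurd": 1}))
# guns a:11 about:2 absurd:1
```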

Page 30: Maxent and PosTagging

MaxEnt POS Tagging

Page 31: Maxent and PosTagging

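N-gram POS tagging

$$\arg\max_{t_1^n} P(t_1^n \mid w_1^n) \approx \arg\max_{t_1^n} \prod_i P(w_i \mid t_i)\, P(t_i \mid t_{i-N+1}^{i-1})$$

where $n$ is the sentence length and $N$ is the order of the n-gram model.

Bigram Model:
$$\prod_i P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$

Trigram Model:
$$\prod_i P(w_i \mid t_i)\, P(t_i \mid t_{i-1}, t_{i-2})$$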
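
To make the bigram model concrete, the product can be evaluated for one candidate tag sequence. This is a toy sketch: the probability values below are invented purely for illustration and do not come from any corpus, and `bigram_log_score` is a hypothetical name. Log probabilities are summed to avoid underflow on long sentences.

```python
import math


def bigram_log_score(words, tags, emission, transition, start="BOS"):
    """Sum of log P(w_i | t_i) + log P(t_i | t_{i-1}) over the sequence."""
    score = 0.0
    prev = start
    for word, tag in zip(words, tags):
        score += math.log(emission[(word, tag)])
        score += math.log(transition[(prev, tag)])
        prev = tag
    return score


# Made-up probabilities, for illustration only.
emission = {("time", "N"): 0.5, ("flies", "N"): 0.1, ("flies", "V"): 0.2}
transition = {("BOS", "N"): 0.8, ("N", "N"): 0.3, ("N", "V"): 0.4}

words = ["time", "flies"]
score_nn = bigram_log_score(words, ["N", "N"], emission, transition)
score_nv = bigram_log_score(words, ["N", "V"], emission, transition)
print("N N" if score_nn > score_nv else "N V")
```

A full tagger would search over all tag sequences (for example with Viterbi) rather than scoring candidates one at a time.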

Page 37: Maxent and PosTagging

MaxEnt POS Tagging
• POS tagging as classification
   • What are the inputs?
   • What units are classified?
      • Words
   • What are the classes?
      • POS tags
   • What information should we use?
      • Consider the ngram model

Page 44: Maxent and PosTagging

POS Feature Representation
• Feature templates
   • What feature templates correspond to trigram POS?
      • Current word: w0
      • Previous two tags: t-2t-1
   • What other feature templates could be useful?
      • More word context
         • Previous: w-1; Pre-pre: w-2; Next: w+1; ...
         • Word bigram: w-1w0
      • Backoff tag context: t-1

Page 51: Maxent and PosTagging

Feature Templates
• Time flies like an arrow

            w-1    w0     w-1w0       w+1    t-1  y
x1 (Time)   <s>    Time   <s> Time    flies  BOS  N
x2 (flies)  Time   flies  Time flies  like   N    N
x3 (like)   flies  like   flies like  an     N    V

Page 55: Maxent and PosTagging

Feature Templates

            w-1    w0     w-1w0       w+1    t-1  y
x1 (Time)   <s>    Time   <s> Time    flies  BOS  N
x2 (flies)  Time   flies  Time flies  like   N    N
x3 (like)   flies  like   flies like  an     N    V

In mallet:
N prevW=<s>:1 currw=Time:1 precurrW=<s>-Time:1 postW=flies:1 preT=BOS:1
N prevW=Time:1 currw=flies:1 precurrW=Time-flies:1 postW=like:1 preT=N:1
V prevW=flies:1 currw=like:1 precurrW=flies-like:1 postW=an:1 preT=N:1
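
Lines in this format can be generated from a tagged sentence by filling in each template per word. The sketch below reuses the feature names from the slide; the function name `mallet_lines`, the `</s>` end-of-sentence marker, and the tags assumed for "an" and "arrow" (DT, N) are not in the slides and are assumptions for illustration.

```python
def mallet_lines(words, tags):
    """Build one 'label feature:1 ...' line per word from the slide's templates."""
    padded_words = ["<s>"] + list(words) + ["</s>"]  # "</s>" is an assumed end marker
    padded_tags = ["BOS"] + list(tags)
    lines = []
    for i, (word, tag) in enumerate(zip(words, tags)):
        prev_w = padded_words[i]      # w-1
        next_w = padded_words[i + 2]  # w+1
        prev_t = padded_tags[i]       # t-1 (gold tag, as in training data)
        feats = [
            f"prevW={prev_w}",
            f"currw={word}",
            f"precurrW={prev_w}-{word}",
            f"postW={next_w}",
            f"preT={prev_t}",
        ]
        lines.append(tag + " " + " ".join(f + ":1" for f in feats))
    return lines


words = ["Time", "flies", "like", "an", "arrow"]
tags = ["N", "N", "V", "DT", "N"]  # last two tags are assumed, not from the slide
for line in mallet_lines(words, tags):
    print(line)
```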

Page 57: Maxent and PosTagging

MaxEnt Feature Template
• Words:
   • Current word: w0
   • Previous word: w-1
   • Word two back: w-2
   • Next word: w+1
   • Next next word: w+2
• Tags:
   • Previous tag: t-1
   • Previous tag pair: t-2t-1
• How many features? 5|V| + |T| + |T|^2
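
The count follows from the templates: each of the five word templates can take up to |V| values, the previous tag takes |T| values, and the tag pair takes |T|^2. A small sketch of the arithmetic, with a vocabulary size and tagset size chosen only as an example (they are not given in the slides):

```python
def num_features(vocab_size, num_tags, word_templates=5):
    """Upper bound on distinct feature values: 5|V| + |T| + |T|^2."""
    return word_templates * vocab_size + num_tags + num_tags ** 2


# Hypothetical sizes: |V| = 10,000 words, |T| = 45 tags.
print(num_features(10_000, 45))  # 52070
```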

Page 60: Maxent and PosTagging

Unknown Words
• How can we handle unknown words?
   • Assume rare words in training are similar to unknown words in test
• What similarities can we exploit?
   • Similar in link between spelling/morphology and POS
      • -able → JJ
      • -tion → NN
      • -ly → RB
      • Case: John → NP, etc

Page 65: Maxent and PosTagging

Representing Orthographic Patterns
• How can we represent morphological patterns as features?
   • Character sequences
      • Which sequences? Prefixes/suffixes
      • e.g. suffix(wi)=ing or prefix(wi)=well
   • Specific characters or character types
      • Which?
         • is-capitalized
         • is-hyphenated

Page 66: Maxent and PosTagging

MaxEnt Feature Set

Page 70: Maxent and PosTagging

Rare Words & Features
• Intuition:
   • Rare words = infrequent words in training
      • What qualifies as "Rare"? 5 in paper
   • Uncommon words better represented by spelling
      • Spelling could generalize
      • Specific words would be undertrained
• Intuition:
   • Rare features = features seen fewer than X times in training
      • Infrequent features unlikely to be informative
      • Skip
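
Skipping rare features amounts to counting every feature over the training instances and dropping those below the cutoff. A minimal sketch, assuming instances are (label, feature list) pairs; the cutoff X is left as the parameter `min_count` because the slides do not fix a value, and `drop_rare_features` is a hypothetical name.

```python
from collections import Counter


def drop_rare_features(instances, min_count):
    """Remove features seen fewer than min_count times across all instances."""
    counts = Counter(feat for _, feats in instances for feat in feats)
    return [(label, [f for f in feats if counts[f] >= min_count])
            for label, feats in instances]


train = [
    ("N", ["currw=Time", "preT=BOS"]),
    ("N", ["currw=flies", "preT=N"]),
    ("V", ["currw=like", "preT=N"]),
]
print(drop_rare_features(train, min_count=2))
```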

Page 72: Maxent and PosTagging

Examples
• well-heeled: rare word

JJ prevW=about:1 prev2W=stories-about:1 nextW=communities:1 next2W=and:1 pref=w:1 pref=we:1 pref=wel:1 pref=well:1 suff=d:1 suff=ed:1 suff=led:1 suff=eled:1 is-hyphenated:1 preT=IN:1 pre2T=NNS-IN:1
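
The spelling features in this example (prefixes, suffixes, hyphenation) can be generated mechanically for any rare word. The sketch below is an illustration: `spelling_features` is a made-up name, and the maximum affix length of 4 is inferred from the example rather than stated in the slides.

```python
def spelling_features(word, max_len=4):
    """Prefix, suffix, and character-type features for a rare or unknown word."""
    feats = []
    longest = min(max_len, len(word))
    for n in range(1, longest + 1):
        feats.append(f"pref={word[:n]}")   # first n characters
    for n in range(1, longest + 1):
        feats.append(f"suff={word[-n:]}")  # last n characters
    if "-" in word:
        feats.append("is-hyphenated")
    if word[:1].isupper():
        feats.append("is-capitalized")
    return feats


print(" ".join(f + ":1" for f in spelling_features("well-heeled")))
```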

Page 74: Maxent and PosTagging

Finding Features
• In training, where do features come from?
• Where do features come from in testing?
   • tag features come from classification of prior word

            w-1    w0     w-1w0       w+1    t-1  y
x1 (Time)   <s>    Time   <s> Time    flies  BOS  N
x2 (flies)  Time   flies  Time flies  like   N    N
x3 (like)   flies  like   flies like  an     N    V

Page 75: Maxent and PosTagging

Sequence Labeling

Page 76: Maxent and PosTagging

Sequence Labeling
• Goal: Find most probable labeling of a sequence
• Many sequence labeling tasks
   • POS tagging
   • Word segmentation
   • Named entity tagging
   • Story/spoken sentence segmentation
   • Pitch accent detection
   • Dialog act tagging

Page 83: Maxent and PosTagging

Solving Sequence Labeling
• Direct: Use a sequence labeling algorithm
   • E.g. HMM, CRF, MEMM
• Via classification: Use classification algorithm
   • Issue: What about tag features?
      • Features that use class labels depend on classification
   • Solutions:
      • Don't use features that depend on class labels (loses info)
      • Use other process to generate class labels, then use them as features
      • Perform incremental classification to get labels, use labels as features for instances later in sequence
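
The incremental option can be sketched as a left-to-right loop in which each predicted tag becomes the preT feature of the next word. This is a greedy illustration under stated assumptions, not a Mallet API: `greedy_tag` is a hypothetical function, and `toy_classify` is a hand-written stand-in for a trained MaxEnt classifier.

```python
def greedy_tag(words, classify, start_tag="BOS"):
    """Tag words left to right, feeding each predicted tag into the next instance."""
    tags = []
    prev_tag = start_tag
    for i, word in enumerate(words):
        prev_word = words[i - 1] if i > 0 else "<s>"
        features = {"currw": word, "prevW": prev_word, "preT": prev_tag}
        prev_tag = classify(features)  # the prediction becomes the next preT
        tags.append(prev_tag)
    return tags


def toy_classify(features):
    """Stand-in for a trained classifier; the rules are for illustration only."""
    if features["currw"] == "like" and features["preT"] == "N":
        return "V"
    return "N"


print(greedy_tag(["Time", "flies", "like"], toy_classify))
```

Because each decision is final, an early mistake propagates to later words; a sequence model or beam search would keep several candidate labelings instead.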

