CSA2050: Introduction to Computational Linguistics

Page 1

CSA2050: Introduction to Computational Linguistics

Part of Speech (POS) Tagging II

Transformation Based Tagging Brill (1995)

Page 2

February 2007 CSA3050: Tagging III and Chunking 2

3 Approaches to Tagging

1. Rule-Based Tagger: ENGTWOL Tagger (Voutilainen 1995)

2. Stochastic Tagger: HMM-based Tagger

3. Transformation-Based Tagger: Brill Tagger (Brill 1995)

Page 3

April 2005 CLINT Lecture IV 3

Transformation-Based Tagging

A combination of rule-based and stochastic tagging methodologies:

• like rule-based tagging, because rules are used to specify tags in a certain environment;
• like stochastic tagging, because machine learning is used.

Uses Transformation-Based Learning (TBL)

Input: a tagged corpus and a dictionary (with most frequent tags)

Page 4

Transformation-Based Tagging

Basic Process:

• Set the most probable tag for each word as a start value, e.g. tag all “race” as NN: P(NN|race) = .98, P(VB|race) = .02
• The set of possible transformations is limited by using a fixed number of rule templates containing slots, and allowing a fixed number of fillers to fill the slots

Page 5

Transformation-Based Error-Driven Learning

[Diagram after Brill (1996): unannotated text passes through the initial-state annotator to produce annotated text; the learner compares the annotated text with the TRUTH, selects transformation rules, and retags.]

Page 6

TBL Requirements

• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy

Page 7

Initial State Annotation

Input:
• Corpus
• Dictionary
• Frequency counts for each entry

Output:
• Corpus tagged with most frequent tags
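The initial-state annotation step can be sketched in Python. This is an illustrative toy, not Brill's implementation; the function names and the NN default for unknown words are my own choices:

```python
from collections import Counter, defaultdict

def train_initial_annotator(tagged_corpus):
    """Frequency counts for each dictionary entry: map each word
    to its single most frequent tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def annotate(words, lexicon, default="NN"):
    """Tag every word with its most frequent tag; unknowns get the default."""
    return [(w, lexicon.get(w, default)) for w in words]

corpus = [("the", "DT"), ("race", "NN"), ("to", "TO"),
          ("race", "NN"), ("race", "VB")]
lexicon = train_initial_annotator(corpus)
print(annotate(["the", "race"], lexicon))  # [('the', 'DT'), ('race', 'NN')]
```

Since "race" is tagged NN twice and VB once in the toy corpus, every occurrence of "race" starts out as NN, exactly the situation the transformations are learned to repair.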

Page 8

Transformations

Each transformation comprises:
• a source tag
• a target tag
• a triggering environment

Example: NN → VB when the previous tag is TO

Page 9

More Examples

Source tag   Target tag   Triggering environment
NN           VB           previous tag is TO
VBP          VB           one of the three previous tags is MD
JJR          RBR          next tag is JJ
VBP          VB           one of the two previous words is n’t
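One way to represent and apply such a transformation, purely as an illustrative sketch (the class and function names are mine, not Brill's, and triggers here read the pre-update tags for simplicity):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Transformation:
    source: str        # tag to change
    target: str        # tag to change it to
    trigger: Callable  # predicate over (tag sequence, position)

def apply_transformation(tags, t):
    """Apply t at every position where it fires; triggers are evaluated
    against the tag sequence as it was before any change."""
    before = list(tags)
    return [t.target if tag == t.source and t.trigger(before, i) else tag
            for i, tag in enumerate(tags)]

# First rule from the table: NN -> VB when the previous tag is TO
rule = Transformation("NN", "VB",
                      lambda tags, i: i > 0 and tags[i - 1] == "TO")
print(apply_transformation(["TO", "NN"], rule))  # ['TO', 'VB']
```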

Page 10

TBL Requirements

• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy

Page 11

Rule Templates: Triggering Environments

[Table: nine schemas over the context positions t(i-3) … t(i+3); each schema marks with * which positions around the current tag t(i) are examined.]

Page 12

Set of Possible Transformations

The set of possible transformations is enumerated by allowing every possible tag or word in every possible slot in every possible schema.

This set can get quite large.
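With a toy tagset and made-up schema names, the enumeration can be sketched as below; the real Penn tagset has around 45 tags and Brill uses more templates, so the actual set is far larger:

```python
from itertools import product

tags = ["NN", "VB", "TO", "DT", "JJ"]                # toy tagset
schemas = ["prev_tag", "next_tag", "prev_two_tags"]  # invented schema names

# One candidate rule per (source tag, target tag, schema, filler tag) choice
candidates = [(a, b, s, z)
              for a, b in product(tags, repeat=2) if a != b
              for s, z in product(schemas, tags)]
print(len(candidates))  # 20 source/target pairs x 3 schemas x 5 fillers = 300
```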

Page 13

Rule Types and Instances: Brill’s Templates

• Each rule begins with “change tag a to tag b”
• The variables a, b, z, w range over POS tags
• All possible variable substitutions are considered

Page 14

TBL Requirements

• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy

Page 15

Scoring Function

For a given tagging state of the corpus and a given transformation, for every word position in the corpus:

• If the rule applies and yields a correct tag, increment the score by 1
• If the rule applies and yields an incorrect tag, decrement the score by 1
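The scoring function above is a simple net count, which can be sketched directly (function and variable names are mine; a rule is taken here as a (source, target, trigger) triple for illustration):

```python
def score_rule(source, target, trigger, current_tags, gold_tags):
    """Net score of one candidate rule over the corpus:
    +1 for each correction it makes, -1 for each new error."""
    s = 0
    for i, tag in enumerate(current_tags):
        if tag == source and trigger(current_tags, i):  # rule applies at i
            s += 1 if target == gold_tags[i] else -1
    return s

# NN -> VB when the previous tag is TO
trigger = lambda tags, i: i > 0 and tags[i - 1] == "TO"
current = ["TO", "NN", "DT", "NN"]
gold    = ["TO", "VB", "DT", "NN"]
print(score_rule("NN", "VB", trigger, current, gold))  # 1
```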

Page 16

The Basic Algorithm

• Label every word with its most likely tag
• Repeat the following until a stopping condition is reached:
  • Examine every possible transformation, selecting the one that results in the most improved tagging
  • Retag the data according to this rule
  • Append this rule to the output list
• Return the output list
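The algorithm above can be written as a compact greedy loop. This is an illustrative sketch under my own conventions (rules as (source, target, trigger) triples, a min_score cutoff as the stopping condition); Brill's real learner enumerates templates and uses indexing tricks for speed:

```python
def tbl_learn(initial_tags, gold_tags, candidate_rules, min_score=1):
    """Greedy TBL: adopt the best-scoring rule, retag, and repeat
    until no rule improves the tagging by at least min_score."""
    tags = list(initial_tags)
    learned = []
    while True:
        def score(rule):
            src, tgt, trig = rule
            return sum(1 if tgt == gold_tags[i] else -1
                       for i, t in enumerate(tags)
                       if t == src and trig(tags, i))
        best = max(candidate_rules, key=score)
        if score(best) < min_score:          # stopping condition
            break
        src, tgt, trig = best
        tags = [tgt if t == src and trig(tags, i) else t
                for i, t in enumerate(tags)]  # retag the data
        learned.append(best)                  # append rule to output list
    return learned, tags

# "the race to race": the initial annotator tags every "race" as NN
prev_is = lambda wanted: (lambda tags, i: i > 0 and tags[i - 1] == wanted)
rules = [("NN", "VB", prev_is("TO")), ("VB", "NN", prev_is("DT"))]
initial = ["DT", "NN", "TO", "NN"]
gold    = ["DT", "NN", "TO", "VB"]
learned, final = tbl_learn(initial, gold, rules)
print(final)  # ['DT', 'NN', 'TO', 'VB']
```

Each adopted rule has a strictly positive net score, so overall accuracy rises with every iteration and the loop terminates.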

Page 17

Examples of learned rules

Page 18

TBL: Remarks

• Execution speed: the TBL tagger is slower than the HMM approach
• Learning speed is slow: Brill’s implementation took over a day (600k tokens)

BUT …

• Learns a small number of simple, non-stochastic rules
• Can be made to work faster with Finite State Transducers

Page 19

Tagging Unknown Words

• New words are added to (newspaper) language at 20+ per month, plus many proper names …
• Unknown words increase error rates by 1-2%

Methods:
• Assume the unknowns are nouns
• Assume the unknowns have a probability distribution similar to words occurring once in the training set
• Use morphological information, e.g. words ending in -ed tend to be tagged VBN
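The morphological method can be sketched as a cascade of suffix checks. The specific rules and their ordering here are illustrative guesses, not a tested rule set:

```python
def guess_tag(word, lexicon):
    """Tag an unknown word using crude morphological cues; known words
    keep their lexicon tag. (Suffix heuristics are illustrative only.)"""
    if word in lexicon:
        return lexicon[word]
    if word[:1].isupper():
        return "NNP"   # unseen capitalised word: likely a proper noun
    if word.endswith("ed"):
        return "VBN"   # -ed words tend to be past participles
    if word.endswith("ing"):
        return "VBG"   # -ing words tend to be gerunds/participles
    if word.endswith("s"):
        return "NNS"   # -s words tend to be plural nouns
    return "NN"        # fall back: assume unknowns are nouns

print(guess_tag("refactored", {}))  # VBN
```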

Page 20

Evaluation

• The result is compared with a manually coded “Gold Standard”
• Typically accuracy reaches 95-97%
• This may be compared with the result for a baseline tagger (one that uses no context)
• Important: 100% accuracy is impossible even for human annotators
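Accuracy against the gold standard is a plain token-level ratio; a minimal sketch:

```python
def accuracy(pred_tags, gold_tags):
    """Token-level accuracy of a tagger's output against a gold standard."""
    assert len(pred_tags) == len(gold_tags)
    correct = sum(p == g for p, g in zip(pred_tags, gold_tags))
    return correct / len(gold_tags)

# A tagger that misses one token out of four scores 0.75
print(accuracy(["DT", "NN", "TO", "NN"], ["DT", "NN", "TO", "VB"]))  # 0.75
```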

Page 21

A word of caution

• 95% accuracy: every 20th token wrong
• 96% accuracy: every 25th token wrong
  (an improvement of 25% from 95% to 96%???)
• 97% accuracy: every 33rd token wrong
• 98% accuracy: every 50th token wrong
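The figures above are just the reciprocal of the error rate, which is easy to verify:

```python
# One wrong token every 1 / (1 - accuracy) tokens
for acc in (0.95, 0.96, 0.97, 0.98):
    print(f"{acc:.0%} accuracy: one error every {round(1 / (1 - acc))} tokens")

# Going from 95% to 96% removes (5 - 4) / 5 = 20% of the remaining errors,
# even though the error interval grows from every 20th to every 25th token.
```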

Page 22

How much training data is needed?

When working with the STTS tagset (50 tags) we observed:

• a strong increase in accuracy when testing on 10,000, 20,000, …, 50,000 tokens,
• a slight increase in accuracy when testing on up to 100,000 tokens,
• hardly any increase thereafter.

Page 23

Summary

Tagging decisions are conditioned on a wider range of events than in the HMM models mentioned earlier. For example, left and right context can be used simultaneously.

Learning and tagging are simple, intuitive and understandable.

Transformation-based learning has also been applied to sentence parsing.

Page 24

The Three Approaches Compared

Rule-Based:
• Hand-crafted rules
• It takes too long to come up with good rules
• Portability problems

Stochastic:
• Find the sequence with the highest probability (Viterbi algorithm)
• Result of training not accessible to humans
• Large volume of intermediate results

Transformation-Based:
• Rules are learned
• Small number of rules
• Rules can be inspected and modified by humans

