10/8/2015 CPSC503 Winter 2008 1 CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.

Date post: 12-Jan-2016
Page 1:

04/21/23 CPSC503 Winter 2008 1

CPSC 503 Computational Linguistics

Lecture 2
Giuseppe Carenini

Page 2:

Today Sep 10
• Subscribe to mailing list cpsc503 (majordomo)
• Introductions
• Brief check of some background knowledge
• English Morphology
• FSA and Morphology
• Start: Finite State Transducers (FST) and Morphological Parsing/Gen.

Page 3:

Introductions
• Your Name
• Previous experience in NLP?
• Why are you interested in NLP?
• Are you thinking of NLP as your main research area? If not, what else do you want to specialize in…
• Anything else…


Page 5:

Knowledge-Formalisms Map (including some probabilistic formalisms)

[Diagram relating formalisms to kinds of linguistic knowledge]

Formalisms:
• State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
• Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars
• Logical formalisms (First-Order Logics)
• AI planners

Linguistic knowledge:
• Morphology
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue

Page 6:

Next Two Lectures

[Knowledge-Formalisms Map diagram]

State Machines (no prob.)
• Finite State Automata (and Regular Expressions)
• Finite State Transducers

(English) Morphology

Other parts of the map:
• Logical formalisms (First-Order Logics)
• Rule systems (and prob. version), e.g., (Prob.) Context-Free Grammars
• AI planners
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue

Page 7:

??

[Figure: two input tapes, cells indexed 0 1 2 3 4 5 6]

b a a a ! \

b a b a ! \
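
The slide does not name the automaton the two tapes are fed to. Assuming it is the textbook "sheep language" machine for /baa+!/, a table-driven recognizer could be sketched as below. The state numbers and the names `TRANSITIONS`, `ACCEPT_STATES` and `accepts` are illustrative assumptions, and the trailing `\` cell of each tape is left out.

```python
# Assumed FSA for /baa+!/ (not taken from the slide itself).
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def accepts(tape):
    """Run the deterministic FSA over the tape, one symbol per step."""
    state = 0
    for symbol in tape:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no arc for this symbol: reject
            return False
    return state in ACCEPT_STATES


print(accepts("baaa!"))  # True
print(accepts("baba!"))  # False
```

Under this assumption the first tape is accepted and the second is rejected at its third symbol, because state 2 has no arc labeled b.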

Page 8:

??

/CPSC50[34]/

/^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)/

/[0-9]+(\.[0-9]+){3}/
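
To check a guess about what each pattern matches, the three expressions can be tried with Python's `re` module. This is only a sketch: the test strings are made up for illustration, and the names `COURSE`, `HEADER` and `DOTTED` are not from the slides.

```python
import re

COURSE = re.compile(r"CPSC50[34]")                       # CPSC503 or CPSC504
HEADER = re.compile(r"^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)")  # line-initial header word
DOTTED = re.compile(r"[0-9]+(\.[0-9]+){3}")              # four dot-separated digit runs

print(bool(COURSE.search("CPSC503 Winter 2008")))     # True
print(bool(COURSE.search("CPSC505")))                 # False
print(bool(HEADER.search("Subject: lecture 2")))      # True
print(bool(HEADER.search("subjective remarks")))      # False, \b fails before "ive"
print(DOTTED.search("host 192.168.0.1 up").group(0))  # 192.168.0.1
```

The last pattern only checks the shape of the string; it would also match 999.999.999.999.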

Page 9:

Example of Usage: Text Searching/Editing

Find me all instances of the determiner “the” in an English text.
– To count them
– To substitute them with something else

You try:
/the/
/[tT]he/
/\bthe\b/
/\b[tT]he\b/

The other cop went to the bank but there were no people there.

s/\b([tT]he|[Aa]n?)\b/DET/
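
The four candidate patterns can be compared by counting their matches on the example sentence. The sketch below uses Python's `re` module rather than the Perl-style notation of the slide; the variable names are illustrative.

```python
import re

text = "The other cop went to the bank but there were no people there."

patterns = [r"the", r"[tT]he", r"\bthe\b", r"\b[tT]he\b"]
counts = {p: len(re.findall(p, text)) for p in patterns}
for pattern, n in counts.items():
    print(pattern, n)
# the 4            ("other", "the", "there", "there")
# [tT]he 5         (adds the sentence-initial "The")
# \bthe\b 1        (whole word, lowercase only)
# \b[tT]he\b 2     (both determiners, nothing else)

# Perl's s/.../DET/ without the g flag replaces only the first match;
# re.sub replaces every match unless count=1 is passed.
replaced = re.sub(r"\b([tT]he|[Aa]n?)\b", "DET", text)
print(replaced)
# DET other cop went to DET bank but there were no people there.
```

Only the last pattern finds exactly the two determiners: the first two overgenerate on "other" and "there", and the third misses the capitalized "The".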

Page 10:

Fundamental Relations

[Diagram with three nodes: FSA, Regular Expressions, Many Linguistic Phenomena]

Edge labels: model; implement (generate and recognize); describe


Page 12:

English Morphology

Def. The study of how words are formed from minimal meaning-bearing units (morphemes)

• We can usefully divide morphemes into two classes
– Stems: The core meaning-bearing units
– Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions

Examples: unhappily, …

Page 13:

Word Classes

• For now word classes: nouns, verbs, adjectives and adverbs.

• We’ll go into the gory details in Ch 5

• Word class determines to a large degree the way that stems and affixes combine

Page 14:

English Morphology

• We can also divide morphology up into two broad classes
– Inflectional
– Derivational

Page 15:

Inflectional Morphology

• The resulting word:
– Has the same word class as the original
– Serves a grammatical/semantic purpose different from the original

Page 16:

Nouns, Verbs and Adjectives (English)

• Nouns are simple (not really)
– Markers for plural and possessive
• Verbs are only slightly more complex
– Markers appropriate to the tense of the verb and to the person
• Adjectives
– Markers for comparative and superlative

Page 17:

Regulars and Irregulars

• Some words misbehave (refuse to follow the rules)
– Mouse/mice, goose/geese, ox/oxen
– Go/went, fly/flew
• Regulars…
– Walk, walks, walking, walked, walked
• Irregulars
– Eat, eats, eating, ate, eaten
– Catch, catches, catching, caught, caught
– Cut, cuts, cutting, cut, cut

Page 18:

Derivational Morphology

• Derivational morphology is the messy stuff that no one ever taught you.
– Changes of word class
– Less productive (-ant V -> N only with V of Latin origin!)

Page 19:

Derivational Examples

• Verb/Adj to Noun

-ation    computerize    computerization
-ee       appoint        appointee
-er       kill           killer
-ness     fuzzy          fuzziness

Page 20:

Derivational Examples

• Noun/Verb to Adj

-al       Computation    Computational
-able     Embrace        Embraceable
-less     Clue           Clueless

Page 21:

Compute

• Many paths are possible…
• Start with compute
– Computer -> computerize -> computerization
– Computation -> computational
– Computer -> computerize -> computerizable
– Compute -> computee

Page 22:

Summary

[Knowledge-Formalisms Map diagram]

State Machines (no prob.)
• Finite State Automata (and Regular Expressions)
• Finite State Transducers

(English) Morphology

Other parts of the map:
• Logical formalisms (First-Order Logics)
• Rule systems (and prob. version), e.g., (Prob.) Context-Free Grammars
• AI planners
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue

Page 23:

FSAs and Morphology

• GOAL1: recognize whether a string is an English word
• PLAN:
1. First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language)
2. Then we’ll add in the actual stems

Page 24:

FSA for Portion of N Inflectional Morphology

Page 25:

Adding the Stems

But it does not express that:
• Reg nouns ending in -s, -z, -sh, -ch, -x -> es (kiss, waltz, bush, rich, box)
• Reg nouns ending in -y preceded by a consonant change the -y to -i
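
The gap can be made concrete with a small sketch. `naive_accepts` mimics an FSA that simply concatenates a regular stem with the plural -s, while `pluralize` applies the two spelling rules listed above. The lexicon, the function names, and the choice to spell the second rule as -y -> -ies are assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon of regular noun stems.
REG_NOUNS = {"cat", "fox", "kiss", "waltz", "bush", "box", "butterfly"}


def naive_accepts(word):
    """FSA-style check: a regular stem, optionally followed by plural -s."""
    return word in REG_NOUNS or (word.endswith("s") and word[:-1] in REG_NOUNS)


def pluralize(stem):
    """Apply the two spelling rules before adding the plural ending."""
    if re.search(r"(s|z|sh|ch|x)$", stem):
        return stem + "es"         # kiss -> kisses, box -> boxes
    if re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "ies"   # consonant + y: y -> i, then -es
    return stem + "s"


print(naive_accepts("foxs"))   # True: the stem-plus-s machine overgenerates
print(naive_accepts("foxes"))  # False: and it misses the real plural
print(pluralize("fox"), pluralize("butterfly"), pluralize("cat"))
# foxes butterflies cats
```

The sketch handles spelling with ad hoc string rules only to show what the plain FSA is missing.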

Page 26:

Small Fragment of V and N Derivational Morphology

[noun_i] e.g. hospital
[adj_al] e.g. formal
[adj_ous] e.g. arduous
[verb_j] e.g. speculate
[verb_k] e.g. conserve

Page 27:

GOAL2: Morphological Parsing/Generation (vs. Recognition)

• Recognition is usually not quite what we need.
– Usually given a word we need to find: the stem and its class and morphological features (parsing)
– Or we have a stem and its class and morphological features and we want to produce the word (production/generation)
• Examples (parsing)
– From “cats” to “cat +N +PL”
– From “lies” to …

Page 28:

Computational problems in Morphology

• Recognition: recognize whether a string is an English word (FSA)
• Parsing/Generation: word <-> stem, class, lexical features
  e.g., lies <-> lie +N +PL
        lies <-> lie +V +3SG
  …
• Stemming: word -> stem
  …

Page 29:

Finite State Transducers

• FSA cannot help…
• The simple story
– Add another tape
– Add extra symbols to the transitions
– On one tape we read “cats”, on the other we write “cat +N +PL”

Page 30:

FSTs

[Figure: FST with the two directions labeled generation and parsing]

Page 31:

(Simplified) FST formal definition (you can skip 3.4.1)

• Q: a finite set of states
• I, O: input and output alphabets (which may include ε)
• Σ: a finite alphabet of complex symbols i:o, i ∈ I and o ∈ O
• Q0: the start state
• F: a set of accept/final states (F ⊆ Q)
• A transition relation δ that maps Q × Σ to 2^Q
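
One way to hold this definition in code is a small record type with one field per component. The sketch below is a hypothetical encoding, not something prescribed by the slides: the class name `FST`, its field names, the empty string standing in for ε, and the `is_well_formed` check are all assumptions.

```python
from dataclasses import dataclass

EPS = ""  # stands in for ε


@dataclass
class FST:
    states: frozenset           # Q
    input_alphabet: frozenset   # I
    output_alphabet: frozenset  # O
    start: str                  # Q0
    finals: frozenset           # F, must be a subset of Q
    delta: dict                 # maps (state, (i, o)) to a set of next states

    def sigma(self):
        """The complex symbols i:o that actually label some transition."""
        return {pair for (_, pair) in self.delta}

    def is_well_formed(self):
        """Check the side conditions of the definition."""
        if self.start not in self.states or not self.finals <= self.states:
            return False
        for (source, (i, o)), targets in self.delta.items():
            if source not in self.states or not set(targets) <= self.states:
                return False
            if i != EPS and i not in self.input_alphabet:
                return False
            if o != EPS and o not in self.output_alphabet:
                return False
        return True


fst = FST(
    states=frozenset({"q0", "q1"}),
    input_alphabet=frozenset({"+PL"}),
    output_alphabet=frozenset({"s"}),
    start="q0",
    finals=frozenset({"q1"}),
    delta={("q0", ("+PL", "s")): {"q1"}},
)
print(fst.is_well_formed())  # True
print(fst.sigma())           # {('+PL', 's')}
```

Storing δ as a dictionary from (state, i:o) to a set of states mirrors the "Q × Σ to 2^Q" signature directly, so nondeterminism needs no special handling.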

Page 32:

FST can be used as…

• Translators: input one string from I, output another from O (or vice versa)

• Recognizers: input a string from IxO

• Generator: output a string from IxO

Page 33:

Simple Example

Transitions (as a translator):
• c:c means read a c on one tape and write a c on the other (or vice versa)
• +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
• +PL:s means read +PL and write an s (or vice versa)

[FST diagram: arcs c:c, a:a, t:t, +N:ε, then either +PL:s or +SG:ε]

Page 34:

Examples (as a translator)

[FST diagram: arcs c:c, a:a, t:t, +N:ε, then either +PL:s or +SG:ε]

• Parsing: surface tape c a t s, lexical tape to be found
• Generation: lexical tape c a t +N +SG, surface tape to be found
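
Both directions can be traced with a short simulation of the cat transducer. This is a minimal sketch under stated assumptions: the state names q0 to q5, the tuple layout of `ARCS`, and the function `translate` are illustrative, and the search terminates only because this machine has no cycles.

```python
EPS = ""  # stands in for ε

# (source, lexical symbol, surface symbol, target); state names are assumed.
ARCS = [
    ("q0", "c", "c", "q1"),
    ("q1", "a", "a", "q2"),
    ("q2", "t", "t", "q3"),
    ("q3", "+N", EPS, "q4"),
    ("q4", "+PL", "s", "q5"),
    ("q4", "+SG", EPS, "q5"),
]
START, FINALS = "q0", {"q5"}


def translate(symbols, generate):
    """Return every output sequence for the given input symbols.

    generate=True reads the lexical side and writes the surface side;
    generate=False (parsing) reads the surface side and writes the lexical side.
    """
    results = set()
    stack = [(START, 0, ())]
    while stack:
        state, pos, out = stack.pop()
        if state in FINALS and pos == len(symbols):
            results.add(out)
        for source, lexical, surface, target in ARCS:
            if source != state:
                continue
            read, write = (lexical, surface) if generate else (surface, lexical)
            emitted = out + ((write,) if write != EPS else ())
            if read == EPS:  # nothing to consume on the input tape
                stack.append((target, pos, emitted))
            elif pos < len(symbols) and symbols[pos] == read:
                stack.append((target, pos + 1, emitted))
    return results


print(translate(list("cats"), generate=False))
# {('c', 'a', 't', '+N', '+PL')}
print(translate(["c", "a", "t", "+N", "+SG"], generate=True))
# {('c', 'a', 't')}
```

The same arc table serves both directions; only the choice of which side is read and which is written changes.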

Page 35:

More complex Example

Transitions (as a translator):
• l:l means read an l on one tape and write an l on the other (or vice versa)
• +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
• +PL:s means read +PL and write an s (or vice versa)
• …

[FST diagram, states q0 to q7: arcs l:l, i:i, e:e, then either +N:ε +PL:s or +V:ε +3SG:s]

Page 36:

Examples (as a translator)

[FST diagram, states q0 to q7: arcs l:l, i:i, e:e, then either +N:ε +PL:s or +V:ε +3SG:s]

• Parsing: surface tape l i e s, lexical tape to be found
• Generation: lexical tape l i e +V +3SG, surface tape to be found

Page 37:

Examples (as a recognizer and a generator)

[FST diagram, states q0 to q7: arcs l:l, i:i, e:e, then either +N:ε +PL:s or +V:ε +3SG:s]

lexical: l i e +V +3SG
surface: l i e s
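
The three uses of the lie transducer can be sketched by enumerating the pairs it accepts. The diagram's layout is not recoverable from the transcript, so the assignment of q0 to q7 to particular arcs below is an assumption, as are the names `generate_pairs`, `recognizes` and `parse`. Enumeration is feasible only because this machine has no cycles.

```python
EPS = ""  # stands in for ε; concatenating it adds nothing to a string

# state -> list of (lexical symbol, surface symbol, target); numbering is assumed.
ARCS = {
    "q0": [("l", "l", "q1")],
    "q1": [("i", "i", "q2")],
    "q2": [("e", "e", "q3")],
    "q3": [("+N", EPS, "q4"), ("+V", EPS, "q6")],
    "q4": [("+PL", "s", "q5")],
    "q6": [("+3SG", "s", "q7")],
}
FINALS = {"q5", "q7"}


def generate_pairs(state="q0", lexical="", surface=""):
    """Generator mode: yield every (lexical, surface) pair the FST accepts."""
    if state in FINALS:
        yield lexical, surface
    for lex, surf, target in ARCS.get(state, []):
        # Put a space before feature tags so the lexical string reads "lie +N +PL".
        next_lexical = lexical + (" " + lex if lex.startswith("+") else lex)
        yield from generate_pairs(target, next_lexical, surface + surf)


def recognizes(lexical, surface):
    """Recognizer mode: is this pair of strings accepted?"""
    return (lexical, surface) in set(generate_pairs())


def parse(surface):
    """Translator mode, surface to lexical: collect every analysis."""
    return sorted(lex for lex, surf in generate_pairs() if surf == surface)


print(list(generate_pairs()))
# [('lie +N +PL', 'lies'), ('lie +V +3SG', 'lies')]
print(recognizes("lie +V +3SG", "lies"))  # True
print(parse("lies"))
# ['lie +N +PL', 'lie +V +3SG']
```

Parsing "lies" returns two analyses because the surface string is ambiguous between the plural noun and the third-person singular verb.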

Page 38:

Next Time

• Finish FST and morphological analysis
• Porter Stemmer
• Read Chp. 3 up to 3.10 excluded (def. of FST: understand the one on slides) (3.4.1 optional)

