CPSC 503 Computational Linguistics, Lecture 2, Giuseppe Carenini (CPSC503 Winter 2010, 1/11/2016)

Date post: 21-Jan-2016
Page 1:

04/21/23 CPSC503 Winter 2010 1

CPSC 503 Computational Linguistics

Lecture 2
Giuseppe Carenini

Page 2:

Today Sep 14
• Subscribe to mailing list cpsc503 (majordomo)
• Questionnaire
• Brief check of some background knowledge (& annotated corpora)
• English Morphology
• FSA and Morphology
• Start: Finite State Transducers (FST) and Morphological Parsing/Gen.

Page 3:

Finite state machines
Regular Expressions & Finite State Automata 6.7
Finite State Transducers 2.0
Hidden-Markov Models 4.2
Basic Probability, Bayesian Statistics and Information Theory
Conditional Probability Programming 7.2 Java
Bayesian Networks 6.5
5.4 Python Entropy 5.4 3.4 Dynamic Programming
Machine Learning 5.7
Supervised Classification (e.g., Decision Trees) Search Algorithms 4.5 6.0
Unsupervised Learning (e.g., clustering) Linguistics 4.3 2.4
Richer Formalisms
Context-Free Grammar 4.3
First-Order Logics
5.4

Page 4:

Today Sep 14
• Brief check of some background knowledge
• English Morphology
• FSA and Morphology
• Start: Finite State Transducers (FST) and Morphological Parsing/Gen.

Page 5:

Knowledge-Formalisms Map (including probabilistic formalisms)

Formalisms:
• State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models)
• Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics, Prob. Logics)
• AI planners (MDP Markov Decision Processes)

Knowledge:
• Morphology
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue

Page 6:

Next Two Lectures

Formalisms:
• State Machines (no prob.)
– Finite State Automata (and Regular Expressions)
– Finite State Transducers
• Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics)
• AI planners

Knowledge:
• (English) Morphology
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue

Page 7:

??

b a a a ! \

0 1 2 3 4 65

b a b a ! \

0 1 2 3 4 65
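
A minimal sketch of how a deterministic FSA would process the two tapes. It assumes the tapes illustrate the textbook sheep language /baa+!/; the slide itself does not name the automaton, so the transition table and state numbers below are assumptions.

```python
# Transition table for /baa+!/ (an assumption: the slide does not show the machine).
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of extra a's
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def recognize(tape):
    """Return True if the FSA ends in an accept state after reading the tape."""
    state = 0
    for symbol in tape:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:  # no arc for this symbol: reject
            return False
        state = next_state
    return state in ACCEPT_STATES


print(recognize("baaa!"))  # True: first tape
print(recognize("baba!"))  # False: second tape, no arc for b in state 2
```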

Page 8:

??

/CPSC50[34]/

/^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)/

/[0-9]+(\.[0-9]+){3}/
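
One way to check what each of the three patterns matches is to run them with Python's re module. The test strings below are made up for illustration; only the patterns come from the slide.

```python
import re

course = re.compile(r"CPSC50[34]")
header = re.compile(r"^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)")
dotted = re.compile(r"[0-9]+(\.[0-9]+){3}")

# CPSC503 or CPSC504, nothing else
print(bool(course.search("CPSC503")), bool(course.search("CPSC505")))

# A header keyword at the start of a line, ending at a word boundary
print(bool(header.search("Subject: hello")))   # keyword at line start
print(bool(header.search("Subjects: hello")))  # no word boundary after "Subject"
print(bool(header.search("Re: Subject")))      # keyword not at line start

# Four dot-separated digit groups (note: no range check on each group)
match = dotted.search("host 192.168.0.1 is up")
print(match.group(0) if match else None)
print(bool(dotted.search("version 1.2.3")))    # only three groups
```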

Page 9:

Fundamental Relations

Nodes: FSA, Regular Expressions, Many Linguistic Phenomena
Relations: model, implement (generate and recognize), describe

Page 10:

Second Usage of RegExp: Text Searching/Editing

Find me all instances of the determiner “the” in an English text.
– To count them
– To substitute them with something else

You try:
/the/
/[tT]he/
/\bthe\b/
/\b[tT]he\b/

The other cop went to the bank but there were no people there.

s/\b([tT]he|[Aa]n?)\b/DET/
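
The four attempts can be compared by counting their matches on the example sentence. This is a sketch using Python's re module rather than the Perl-style notation on the slide; re.sub with count=1 is used to mimic s/// without a global flag.

```python
import re

sentence = "The other cop went to the bank but there were no people there."

# Count matches for each successive refinement of the pattern.
for pattern in [r"the", r"[tT]he", r"\bthe\b", r"\b[tT]he\b"]:
    print(pattern, len(re.findall(pattern, sentence)))
# the 4          (misses "The", wrongly hits "other" and both "there")
# [tT]he 5       (now finds "The", still hits "other" and "there")
# \bthe\b 1      (whole word only, but misses "The")
# \b[tT]he\b 2   (exactly the two determiners)

determiner = r"\b([tT]he|[Aa]n?)\b"
first_only = re.sub(determiner, "DET", sentence, count=1)
everywhere = re.sub(determiner, "DET", sentence)
print(first_only)
print(everywhere)
```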

Page 11:

Annotated Corpora
• Example: The CoNLL corpora provide chunk structures, which are encoded as flat trees.
• The CoNLL 2000 Corpus includes phrasal chunks.
• The CoNLL 2002 Corpus includes named entity chunks.
• http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html
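
A rough sketch of what "chunk structures encoded as flat trees" can look like in practice. It assumes the CoNLL-2000 style of one token per line with a POS tag and a B-/I-/O chunk tag; the tagged sentence is hand-made for illustration and is not taken from the corpus, and iob_to_chunks is a hypothetical helper, not an NLTK function.

```python
# Hand-tagged illustration in "word POS chunk-tag" format (not real corpus data).
SAMPLE = """\
The DT B-NP
other JJ I-NP
cop NN I-NP
went VBD B-VP
to TO B-PP
the DT B-NP
bank NN I-NP
. . O
"""


def iob_to_chunks(text):
    """Group tokens into a flat list of (chunk_type, [words]) pairs."""
    chunks = []
    for line in text.splitlines():
        word, _pos, tag = line.split()
        if tag == "O":                       # token outside any chunk
            chunks.append(("O", [word]))
        elif tag.startswith("I-") and chunks and chunks[-1][0] == tag[2:]:
            chunks[-1][1].append(word)       # continue the current chunk
        else:                                # B-X (or a stray I-X) opens a chunk
            chunks.append((tag[2:], [word]))
    return chunks


for chunk_type, words in iob_to_chunks(SAMPLE):
    print(chunk_type, " ".join(words))
```

The result is one level deep: a sentence node whose children are chunks, with no nesting inside a chunk.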

Page 12:

Next Two Lectures

Formalisms:
• State Machines (no prob.)
– Finite State Automata (and Regular Expressions)
– Finite State Transducers
• Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics)
• AI planners

Knowledge:
• (English) Morphology
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue

Page 13:

English Morphology

• We can usefully divide morphemes into two classes
– Stems: The core meaning-bearing units
– Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions

Def. The study of how words are formed from minimal meaning-bearing units (morphemes)

Examples: unhappily, ……………

Page 14:

Word Classes

• For now word classes: nouns, verbs, adjectives and adverbs.

• We’ll go into the gory details in Ch 5

• Word class determines to a large degree the way that stems and affixes combine

Page 15:

English Morphology

• We can also divide morphology up into two broad classes
– Inflectional
– Derivational

Page 16:

Inflectional Morphology

• The resulting word:
– Has the same word class as the original
– Serves a grammatical/semantic purpose different from the original

Page 17:

Nouns, Verbs and Adjectives (English)

• Nouns are simple (not really)
– Markers for plural and possessive
• Verbs are only slightly more complex
– Markers appropriate to the tense of the verb and to the person
• Adjectives
– Markers for comparative and superlative

Page 18:

Regulars and Irregulars

• Some words misbehave (refuse to follow the rules)
– Mouse/mice, goose/geese, ox/oxen
– Go/went, fly/flew
• Regulars…
– Walk, walks, walking, walked, walked
• Irregulars
– Eat, eats, eating, ate, eaten
– Catch, catches, catching, caught, caught
– Cut, cuts, cutting, cut, cut
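
A small sketch of the regular/irregular split: regular verbs get their forms from plain suffixing, irregular ones are looked up in a table. The function name verb_forms and the table layout are assumptions for illustration; no spelling rules are applied, so the regular branch is only right for stems like "walk".

```python
# Irregular forms copied from the slide: (3sg, -ing, past, past participle).
IRREGULAR_VERBS = {
    "eat": ("eats", "eating", "ate", "eaten"),
    "catch": ("catches", "catching", "caught", "caught"),
    "cut": ("cuts", "cutting", "cut", "cut"),
}


def verb_forms(stem):
    """Return (stem, 3sg, -ing form, past, past participle)."""
    if stem in IRREGULAR_VERBS:
        return (stem,) + IRREGULAR_VERBS[stem]
    # Regular case: bare concatenation, no spelling changes.
    return (stem, stem + "s", stem + "ing", stem + "ed", stem + "ed")


print(verb_forms("walk"))
print(verb_forms("eat"))
```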

Page 19:

Derivational Morphology

• Derivational morphology is the messy stuff that no one ever taught you.
– Changes of word class
– Less Productive (-ant V -> N only with V of Latin origin!)

Page 20:

Derivational Examples

• Verb/Adj to Noun

-ation computerize computerization

-ee appoint appointee

-er kill killer

-ness fuzzy fuzziness

Page 21:

Derivational Examples

• Noun/Verb to Adj

-al Computation Computational

-able Embrace Embraceable

-less Clue Clueless

Page 22:

Compute

• Many paths are possible…
• Start with compute
– Computer -> computerize -> computerization
– Computation -> computational
– Computer -> computerize -> computerizable
– Compute -> computee
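
The paths above can be viewed as a small graph of derivation steps. The sketch below stores the steps as explicit edges (it does not model the suffixes or spelling changes) and enumerates every path from "compute"; the DERIVES table and the paths function are illustrative names, and the edge from compute to computer is read off "Start with compute".

```python
# Each word maps to the words derived from it in one step (taken from the slide).
DERIVES = {
    "compute": ["computer", "computation", "computee"],
    "computer": ["computerize"],
    "computerize": ["computerization", "computerizable"],
    "computation": ["computational"],
}


def paths(word):
    """Return every derivation path that starts at word and cannot be extended."""
    next_words = DERIVES.get(word, [])
    if not next_words:
        return [[word]]
    return [[word] + rest for nxt in next_words for rest in paths(nxt)]


for path in paths("compute"):
    print(" -> ".join(path))
```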

Page 23:

Summary

Formalisms:
• State Machines (no prob.)
– Finite State Automata (and Regular Expressions)
– Finite State Transducers
• Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics)
• AI planners

Knowledge:
• (English) Morphology
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue

Page 24:

FSAs and Morphology

• GOAL1: recognize whether a string is an English word
• PLAN:
1. First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language)
2. Then we’ll add in the actual stems

Page 25:

FSA for Portion of Noun Inflectional Morphology

Page 26:

Adding the Stems

But it does not express that:

• Reg nouns ending in -s, -z, -sh, -ch, -x take -es in the plural (kiss, waltz, bush, rich, box)

• Reg nouns ending in -y preceded by a consonant change the -y to -i
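
A sketch of the recognizer with stems added, showing the gap the slide points out. The arc layout (reg-noun followed by an optional plural -s, plus irregular singular and plural nouns going straight to an accept state) is an assumption based on the usual textbook figure, since the diagram is not reproduced here; the tiny lexicon reuses words from earlier slides.

```python
# Tiny lexicon (words taken from the slides; the class names are assumptions).
REG_NOUN = {"cat", "box", "kiss"}
IRREG_SG_NOUN = {"goose", "mouse", "ox"}
IRREG_PL_NOUN = {"geese", "mice", "oxen"}


def is_noun(word):
    """Accept words the noun-inflection FSA would accept."""
    # q0 --irreg-sg-noun / irreg-pl-noun--> accept
    if word in IRREG_SG_NOUN or word in IRREG_PL_NOUN:
        return True
    # q0 --reg-noun--> q1 (accepting: the bare singular)
    if word in REG_NOUN:
        return True
    # q1 --plural -s--> q2 (accepting)
    return word.endswith("s") and word[:-1] in REG_NOUN


print(is_noun("cats"), is_noun("geese"), is_noun("gooses"))
print(is_noun("boxs"), is_noun("boxes"))  # the spelling-rule gap
```

With no spelling rules, the machine accepts "boxs" and rejects "boxes", which is exactly what the two bullets above say it fails to express.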

Page 27:

Small Fragment of V and N Derivational Morphology

[noun_i] e.g., hospital

[adj_al] e.g., formal

[adj_ous] e.g., arduous

[verb_j] e.g., speculate

[verb_k] e.g., conserve

Page 28:

GOAL2: Morphological Parsing/Generation (vs. Recognition)

• Recognition is usually not quite what we need.
– Usually given a word we need to find: the stem and its class and morphological features (parsing)
– Or we have a stem and its class and morphological features and we want to produce the word (production/generation)
• Examples (parsing)
– From “cats” to “cat +N +PL”
– From “lies” to ……

Page 29:

Computational problems in Morphology

• Recognition: recognize whether a string is an English word (FSA)
• Parsing/Generation: word <-> stem, class, lexical features
– e.g., lies <-> lie +N +PL
– lies <-> lie +V +3SG
• Stemming: word -> stem
– ….

Page 30:

Finite State Transducers

• FSA cannot help….
• The simple story
– Add another tape
– Add extra symbols to the transitions
– On one tape we read “cats”, on the other we write “cat +N +PL”

Page 31:

FSTs

Figure labels: generation, parsing

Page 32:

(Simplified) FST formal definition (you can skip 3.4.1 unless you want to work on FST)

• Q: a finite set of states
• I, O: input and output alphabets (which may include ε)
• Σ: a finite alphabet of complex symbols i:o, i ∈ I and o ∈ O
• q0: the start state
• F: a set of accept/final states (F ⊆ Q)
• A transition relation δ that maps Q × Σ to 2^Q
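
The definition can be written down almost literally as a data structure. The sketch below is one possible encoding, not the textbook's: complex symbols i:o are (i, o) tuples, ε is the empty string, and δ is a dict from (state, symbol pair) to a set of next states. The two-state machine at the bottom is a hypothetical toy used only to exercise the class.

```python
from dataclasses import dataclass

EPS = ""  # stands in for ε


@dataclass
class FST:
    states: set   # Q
    start: str    # q0
    finals: set   # F, a subset of Q
    delta: dict   # maps (state, (i, o)) to a set of next states, i.e. Q × Σ to 2^Q

    def accepts(self, pairs):
        """Recognize a sequence of complex symbols i:o (nondeterministically)."""
        current = {self.start}
        for pair in pairs:
            current = {nxt for q in current for nxt in self.delta.get((q, pair), ())}
        return bool(current & self.finals)


# Hypothetical toy machine: rewrites every a as b, for one or more a's.
toy = FST(
    states={"q0", "q1"},
    start="q0",
    finals={"q1"},
    delta={("q0", ("a", "b")): {"q1"}, ("q1", ("a", "b")): {"q1"}},
)
print(toy.accepts([("a", "b"), ("a", "b")]))  # True
print(toy.accepts([("a", "a")]))              # False
```

Keeping δ set-valued mirrors the 2^Q in the definition: several arcs may leave a state on the same pair.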

Page 33:

FST can be used as…

• Translators: input one string from I, output another from O (or vice versa)

• Recognizers: input a string from I × O

• Generators: output a string from I × O

Page 34:

Simple Example

Transitions (as a translator):
• c:c means read a c on one tape and write a c on the other (or vice versa)
• +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
• +PL:s means read +PL and write an s (or vice versa)

c:c a:a t:t +N:ε +PL:s

+SG:ε
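
A sketch of this transducer used as a translator in both directions. The state names q0 to q5 and the way the arcs are chained are assumptions (the slide lists only the arc labels), and translate is an illustrative helper rather than a library function.

```python
EPS = ""  # stands in for ε

# (source state, lexical symbol, surface symbol, target state); layout is assumed.
ARCS = [
    ("q0", "c", "c", "q1"),
    ("q1", "a", "a", "q2"),
    ("q2", "t", "t", "q3"),
    ("q3", "+N", EPS, "q4"),
    ("q4", "+PL", "s", "q5"),
    ("q4", "+SG", EPS, "q5"),
]
START, FINALS = "q0", {"q5"}


def translate(symbols, read_lexical):
    """Return every output symbol list the FST pairs with the input symbols."""
    results = []

    def walk(state, pos, out):
        if state in FINALS and pos == len(symbols):
            results.append(out)
        for src, lex, surf, dst in ARCS:
            if src != state:
                continue
            read, write = (lex, surf) if read_lexical else (surf, lex)
            emitted = out + [write] if write != EPS else out
            if read == EPS:
                walk(dst, pos, emitted)          # ε on the input side: consume nothing
            elif pos < len(symbols) and symbols[pos] == read:
                walk(dst, pos + 1, emitted)

    walk(START, 0, [])
    return results


print(translate(list("cats"), read_lexical=False))                   # parsing
print(translate(["c", "a", "t", "+N", "+SG"], read_lexical=True))    # generation
```

The same arc table serves both directions; only the choice of which side is read changes.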

Page 35:

Examples (as a translator)

parsing: surface c a t s -> lexical
generation: lexical c a t +N +SG -> surface

c:c a:a t:t +N:ε +PL:s

+SG:ε

Page 36:

Slightly More Complex Example

Transitions (as a translator):
• l:l means read an l on one tape and write an l on the other (or vice versa)
• +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
• +PL:s means read +PL and write an s (or vice versa)
• …

States: q0 q1 q2 q3 q4 q5 q6 q7

l:l i:i e:e +N:ε +PL:s

+V:ε +3SG:s
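
A sketch of parsing with this transducer, to show why "lies" is ambiguous. The wiring of the arcs to states q0 through q7 is an assumption (a shared l-i-e prefix that branches into a noun path and a verb path), since only the labels survive from the figure; parse is an illustrative helper.

```python
EPS = ""  # stands in for ε

# (source, lexical, surface, target); the topology is an assumption.
ARCS = [
    ("q0", "l", "l", "q1"),
    ("q1", "i", "i", "q2"),
    ("q2", "e", "e", "q3"),
    ("q3", "+N", EPS, "q4"),
    ("q4", "+PL", "s", "q5"),
    ("q3", "+V", EPS, "q6"),
    ("q6", "+3SG", "s", "q7"),
]
FINALS = {"q5", "q7"}


def parse(surface, state="q0", pos=0, out=()):
    """Return all lexical analyses for a surface string (depth-first search)."""
    analyses = []
    if state in FINALS and pos == len(surface):
        analyses.append(out)
    for src, lex, surf, dst in ARCS:
        if src != state:
            continue
        if surf == EPS:                        # nothing to read on the surface tape
            analyses += parse(surface, dst, pos, out + (lex,))
        elif surface[pos:pos + 1] == surf:     # read one surface symbol
            analyses += parse(surface, dst, pos + 1, out + (lex,))
    return analyses


for analysis in parse("lies"):
    print(" ".join(analysis))
```

Both branches consume the final s, so the parser returns two analyses, one nominal and one verbal; this fragment has no arc for the bare form "lie".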

Page 37:

Examples (as a translator)

parsing: surface l i e s -> lexical
generation: lexical l i e +V +3SG -> surface

States: q0 q1 q2 q3 q4 q5 q6 q7

l:l i:i e:e +N:ε +PL:s

+V:ε +3SG:s

Page 38:

Examples (as a recognizer and a generator)

lexical: l i e +V +3SG
surface: l i e s

States: q0 q1 q2 q3 q4 q5 q6 q7

l:l i:i e:e +N:ε +PL:s

+V:ε +3SG:s

Page 39:

Introductions

• Your Name
• Previous experience in NLP?
• Why are you interested in NLP?
• Are you thinking of NLP as your main research area? If not, what else do you want to specialize in….
• Anything else…………

Page 40:

Next Time

• Finish FST and morphological analysis
• Porter Stemmer
• Read Chp. 3 up to 3.10 excluded (def. of FST: understand the one on slides) (3.4.1 optional)

Assignment-1 will be out today (due Sept 21)

