Morphology and Finite-state Transducers Part 2
ICS 482: Natural Language Processing, Lecture 6
Husni Al-Muhtaseb
3/19/2008

In the name of God, the Most Gracious, the Most Merciful


NLP Credits and Acknowledgment

These slides were adapted from presentations by the authors of the book
SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,
with some modifications from presentations found on the Web by several scholars, including the following


NLP Credits and Acknowledgment

If your name is missing, please contact me: muhtaseb at kfupm.edu.sa


NLP Credits and Acknowledgment

Husni Al-Muhtaseb

James Martin

Jim Martin

Dan Jurafsky

Sandiway Fong

Song young in

Paula Matuszek

Mary-Angela Papalaskari

Dick Crouch

Tracy Kin

L. VenkataSubramaniam

Martin Volk

Bruce R. Maxim

Jan Hajič

Srinath Srinivasa

Simeon Ntafos

Paolo Pirjanian

Ricardo Vilalta

Tom Lenaerts

Heshaam Feili

Björn Gambäck

Christian Korthals
Thomas G. Dietterich
Devika Subramanian
Duminda Wijesekera
Lee McCluskey
David J. Kriegman

Kathleen McKeown

Michael J. Ciaraldi

David Finkel

Min-Yen Kan

Andreas Geyer-Schulz

Franz J. Kurfess

Tim Finin

Nadjet Bouayad

Kathy McCoy

Hans Uszkoreit

Azadeh Maghsoodi

Khurshid Ahmad

Staffan Larsson

Robert Wilensky

Feiyu Xu

Jakub Piskorski

Rohini Srihari

Mark Sanderson

Andrew Elks

Marc Davis

Ray Larson

Jimmy Lin

Marti Hearst

Andrew McCallum

Nick Kushmerick

Mark Craven

Chia-Hui Chang

Diana Maynard

James Allan

Martha Palmer
Julia Hirschberg
Elaine Rich
Christof Monz
Bonnie J. Dorr
Nizar Habash
Massimo Poesio
David Goss-Grubbs
Thomas K Harris
John Hutchins
Alexandros Potamianos
Mike Rosner
Latifa Al-Sulaiti
Giorgio Satta
Jerry R. Hobbs
Christopher Manning
Hinrich Schütze
Alexander Gelbukh
Gina-Anne Levow
Guitao Gao
Qing Ma
Zeynep Altan


Previous Lectures

• 1 Pre-start questionnaire

• 2 Introduction and Phases of an NLP system

• 2 NLP Applications

• 3 Chatting with Alice

• 3 Regular Expressions, Finite State Automata

• 3 Regular languages

• 4 Regular Expressions & Regular languages

• 4 Deterministic & Non-deterministic FSAs

• 5 Morphology: Inflectional & Derivational

• 5 Parsing


Today’s Lecture

• Review of Morphology

• Finite State Transducers

• Stemming & Porter Stemmer


Reminder: Quiz 1 Next class

• Next time: Quiz

– Ch 1!, 2, & 3 (Lecture presentations)

– Do you need a sample quiz?

• What is the difference between a sample and a template?

• Let me think. It might appear on the WebCT site late on Saturday.


Introduction

Linguistic levels and the formalisms used to model them:
• (English) Morphology: State Machines (no probability)
– Finite State Automata (and Regular Expressions)
– Finite State Transducers
• Syntax: Rule systems (and prob. version), e.g., (Prob.) Context-Free Grammars
• Semantics: Logical formalisms (First-Order Logics)
• Pragmatics, Discourse and Dialogue: AI planners


English Morphology

• Morphology is the study of the ways that

words are built up from smaller meaningful

units called morphemes

• morpheme classes

– Stems: The core meaning bearing units

– Affixes: Adhere to stems to change their

meanings and grammatical functions

– Example: unhappily = un- (prefix) + happy (stem) + -ly (suffix)


English Morphology

• We can also divide morphology up into two

broad classes

– Inflectional

– Derivational

• Non-English
– Concatenative Morphology

– Templatic Morphology


Word Classes

• By word class, we have in mind familiar

notions like noun, verb, adjective and adverb

• Why be concerned with word classes?

– The way that stems and affixes combine is based

to a large degree on the word class of the stem


Inflectional Morphology

• Word building process that serves

grammatical function without changing the

part of speech or the meaning of the stem

• The resulting word

– Has the same word class as the original

– Serves a grammatical/ semantic purpose different

from the original


Inflectional Morphology in English

on Nouns
• PLURAL -s books
• POSSESSIVE -'s Mary's
on Verbs
• 3 SINGULAR -s s/he knows
• PAST TENSE -ed talked
• PROGRESSIVE -ing talking
• PAST PARTICIPLE -en, -ed written, talked
on Adjectives
• COMPARATIVE -er longer
• SUPERLATIVE -est longest


Nouns and Verbs (English)

• Nouns are simple

– Markers for plural and possessive

• Verbs are slightly more complex

– Markers appropriate to the tense of the verb

• Adjectives

– Markers for comparative and superlative


Regulars and Irregulars

• some words misbehave (refuse to follow the

rules)

– Mouse/mice, goose/geese, ox/oxen

– Go/went, fly/flew

• The terms regular and irregular will be used to refer to words that follow the rules and those that don't.


Regular and Irregular Verbs

• Regulars…

– Walk, walks, walking, walked, walked

• Irregulars

– Eat, eats, eating, ate, eaten

– Catch, catches, catching, caught, caught

– Cut, cuts, cutting, cut, cut


Derivational Morphology

• word building process that creates new

words, either by changing the meaning or

changing the part of speech of the stem

– Irregular meaning change

– Changes of word class


Examples of derivational morphemes in English that change the part of speech

• -ful (N → Adj)
– pain → painful
– beauty → beautiful
– truth → truthful
– cat → *catful
– rain → *rainful
• -ment (V → N)
– establish → establishment
• -ity (Adj → N)
– pure → purity
• -ly (Adj → Adv)
– quick → quickly
• -en (Adj → V)
– wide → widen


Examples of derivational morphemes in English that change the meaning

• dis-
– appear → disappear
• un-
– comfortable → uncomfortable
• in-
– accurate → inaccurate
• re-
– generate → regenerate
• inter-
– act → interact


Examples of Derivational Morphology

V → N

compute computer

nominate nominee

deport deportation

computerize computerization

N → V

computer computerize

A → N

furry furriness

apt aptitude

sincere sincerity

N → A

cat catty, catlike

hope hopeless

magic magical

V → A

love lovable

A → V

black blacken

modern modernize


Derivational Examples

• Verb/Adj to Noun

-ation computerize computerization

-ee appoint appointee

-er kill killer

-ness fuzzy fuzziness


Derivational Examples

• Noun/ Verb to Adj

-al Computation Computational

-able Embrace Embraceable

-less Clue Clueless


Compute

• Many paths are possible…

• Start with compute

– Computer -> computerize -> computerization

– Computation -> computational

– Computer -> computerize -> computerizable

– Compute -> computee


Templatic Morphology: Root Pattern Examples from Arabic

Word & Transliteration : Meaning
<naâma> [نام] : He slept
<naâ'imun> [نائم] : Sleeping
<yanaâmu> [ينام] : He sleeps
<munawwamun> [منوَّم] : Under hypnotic
<nam> [نم] : Sleep
<na'ûmun> [نؤوم] : Late riser
<tanwîmun> [تنويم] : Lulling to sleep
<'anwamu> [أنوم] : More given to sleep
<manaâmun> [منام] : Dream
<nawwaâmun> [نوّام] : The most given to sleep
<nawmatun> [نومة] : Of one sleep
<manaâmun> [منام] : Dormitory
<nawwaâmatun> [نوامة] : Sleeper
<'an yanaâma> [أن ينام] : That he sleeps
<nawmiyyatun> [نومية] : Pertaining to sleep
<munawwamun> [منوِّم] : Hypnotic


Morphotactic Models

• English nominal inflection

[Figure: FSA with states q0, q1, q2 and arcs labeled reg-n, plural (-s), irreg-sg-n, irreg-pl-n]

•Inputs: cats, goose, geese

•reg-n: regular noun

•irreg-pl-n: irregular plural noun

•irreg-sg-n: irregular singular noun
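
The morphotactic model above can be sketched as a small recognizer. This is a minimal sketch, not the lecture's implementation: the arc topology (q0 to q1 on reg-n, q1 to q2 on plural -s, q0 to q2 on either irregular class, with q1 and q2 accepting) is an assumption based on the arc labels, and the toy lexicon and the names `LEXICON`, `segment`, and `recognize` are hypothetical.

```python
# Hypothetical toy lexicon; the word lists are illustrative assumptions.
LEXICON = {
    "reg-n": {"cat", "dog", "fox"},
    "irreg-sg-n": {"goose", "mouse"},
    "irreg-pl-n": {"geese", "mice"},
}

# Assumed topology: q0 -reg-n-> q1 -plural-> q2,
# q0 -irreg-sg-n-> q2, q0 -irreg-pl-n-> q2; q1 and q2 accept.
TRANSITIONS = {
    ("q0", "reg-n"): "q1",
    ("q0", "irreg-sg-n"): "q2",
    ("q0", "irreg-pl-n"): "q2",
    ("q1", "plural"): "q2",
}
ACCEPTING = {"q1", "q2"}


def segment(word):
    """Yield candidate morpheme-class sequences: a stem, optionally plus -s."""
    for cls, stems in LEXICON.items():
        if word in stems:
            yield [cls]
        if word.endswith("s") and word[:-1] in stems:
            yield [cls, "plural"]


def recognize(word):
    """Return True if some segmentation of word is accepted by the FSA."""
    for classes in segment(word):
        state = "q0"
        for c in classes:
            state = TRANSITIONS.get((state, c))
            if state is None:
                break
        if state in ACCEPTING:
            return True
    return False
```

With this lexicon the inputs from the slide (cats, goose, geese) are accepted, while gooses is rejected because no plural arc leaves q2. Spelling rules are ignored here, so the sketch would also accept foxs.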


• Derivational morphology: adjective fragment
[Figure: FSA with states q0–q5 and arcs labeled un-, adj-root1 (two arcs), adj-root2, {-er, -ly, -est}, and {-er, -est}]

• Adj-root1: clear, happy, real

• Adj-root2: big, red


Using FSAs to Represent the Lexicon and Do Morphological Recognition
• Lexicon: We can expand each non-terminal in our NFSA into each stem in its class (e.g. adj_root2 = {big, red}) and expand each such stem to the letters it includes (e.g. red → r e d, big → b i g)
[Figure: letter-level FSA with states q0–q7 and arcs labeled r, e, d, b, i, g and -er, -est]


Limitations

• To cover all of English will require very large FSAs with consequent search problems
– Adding new items to the lexicon means re-computing the FSA
– Non-determinism
• FSAs can only tell us whether a word is in the language or not. What if we want to know more?
– What is the stem?
– What are the affixes?
– We used this information to build our FSA: can we get it back?


Parsing with Finite State Transducers

• cats → cat +N +PL
• Kimmo Koskenniemi's two-level morphology
– Words represented as correspondences between the lexical level (the morphemes) and the surface level (the orthographic word)
– Morphological parsing: building mappings between the lexical and surface levels
Lexical: c a t +N +PL
Surface: c a t s


Finite State Transducers

• FSTs map between one set of symbols and

another using an FSA whose alphabet is

composed of pairs of symbols from input

and output alphabets

• In general, FSTs can be used for

– Translator (Hello:مرحبا)

– Parser/generator (Hello:How may I help you?)

– To map between the lexical and surface levels of

Kimmo's 2-level morphology


• An FST is a 5-tuple consisting of
– Q: a set of states {q0, q1, q2, q3, q4}
– Σ: an alphabet of complex symbols, each an i:o pair such that i ∈ I (an input alphabet) and o ∈ O (an output alphabet); Σ ⊆ I × O
– q0: a start state
– F: a set of final states in Q, here {q4}
– δ(q, i:o): a transition function mapping Q × Σ to Q
– Example: Emphatic Sheep → Quizzical Cow
q0 --b:m--> q1 --a:o--> q2 --a:o--> q3 --!:?--> q4, with an a:o self-loop on q3
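
The 5-tuple can be made concrete with the sheep-to-cow example. The following is a minimal sketch, assuming the arcs b:m, a:o, a:o, a self-loop a:o on q3, and !:? into the final state q4; the names `DELTA` and `transduce` are hypothetical.

```python
# Transition function delta: (state, input symbol) -> (next state, output symbol)
DELTA = {
    ("q0", "b"): ("q1", "m"),
    ("q1", "a"): ("q2", "o"),
    ("q2", "a"): ("q3", "o"),
    ("q3", "a"): ("q3", "o"),  # self-loop: any number of extra a's
    ("q3", "!"): ("q4", "?"),
}
START = "q0"
FINAL = {"q4"}


def transduce(text):
    """Map an input string to its output string, or None if it is rejected."""
    state, out = START, []
    for ch in text:
        if (state, ch) not in DELTA:
            return None
        state, sym = DELTA[(state, ch)]
        out.append(sym)
    return "".join(out) if state in FINAL else None


print(transduce("baaa!"))
```

Because this machine is deterministic, a single pass over the input suffices; the non-deterministic case needs a search over paths.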


FST for a 2-level Lexicon

• Example

Reg-n: c a t
Irreg-pl-n: g o:e o:e s e
Irreg-sg-n: g o o s e
[Figure: these lexicon entries drawn as letter-level transducers over states q0–q5]


FST for English Nominal Inflection

[Figure: FST with states q0–q7; arcs labeled reg-n, irreg-n-sg, irreg-n-pl, +N:ε (three arcs), +SG:-# (two arcs), +PL:-s#, and +PL:^s#]

Combining (by cascade or composition) this FSA with FSAs for each noun type replaces, e.g., reg-n with every regular noun representation in the lexicon


Orthographic Rules and FSTs

• Define additional FSTs to implement rules such as consonant doubling (beg → begging), 'e' deletion (make → making), 'e' insertion (watch → watches), etc.
Lexical:      f o x +N +PL
Intermediate: f o x ^ s #
Surface:      f o x e s
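
The three levels can be sketched as two functions applied in sequence. This is only an illustration under stated assumptions: the lexical form is written as a string such as `fox+N+PL`, only regular nouns are handled, a regular expression stands in for the e-insertion transducer, and the function names are hypothetical.

```python
import re


def lexical_to_intermediate(lexical):
    """Map e.g. 'fox+N+PL' to 'fox^s#' (regular nouns only)."""
    stem, *tags = lexical.split("+")
    if tags == ["N", "PL"]:
        return stem + "^s#"
    if tags == ["N", "SG"]:
        return stem + "#"
    raise ValueError(f"unsupported analysis: {lexical}")


def intermediate_to_surface(intermediate):
    """Apply e-insertion, then erase the boundary symbols ^ and #."""
    # Insert e after x, s or z at a morpheme boundary, before a word-final s.
    with_e = re.sub(r"([xsz])\^(s#)", r"\1^e\2", intermediate)
    return with_e.replace("^", "").replace("#", "")


def generate(lexical):
    """Run the two machines in cascade: lexical -> intermediate -> surface."""
    return intermediate_to_surface(lexical_to_intermediate(lexical))


print(generate("fox+N+PL"))
```

Inputs that the spelling rule does not apply to, such as `cat^s#`, pass through unchanged apart from the removal of the boundary symbols.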


• Note: These FSTs can be used for

generation as well as recognition by

simply exchanging the input and output

alphabets (e.g. ^s#:+PL)


FSAs and the Lexicon

• First we'll capture the morphotactics
– The rules governing the ordering of affixes in a language.
• Then we'll add in the actual stems


Simple Rules


Adding the Words

But it does not express that:

•Reg nouns ending in -s, -z, -sh, -ch, -x take -es (kiss, waltz, bush, rich, box)
•Reg nouns ending in -y preceded by a consonant change the -y to -i


Derivational Rules

[noun_i] e.g. hospital
[adj_al] e.g. formal
[adj_ous] e.g. arduous
[verb_j] e.g. speculate
[verb_k] e.g. conserve


Parsing/Generation vs. Recognition

• Recognition is usually not quite what we need.

– Usually if we find some string in the language we

need to find the structure in it (parsing)

– Or we have some structure and we want to produce

a surface form (production/ generation)


In other words

• Given a word we need to find: the stem and its class and properties (parsing)

• Or we have a stem and its class and properties and we want to produce the word (production/generation)

• Example (parsing)
– From “cats” to “cat +N +PL”
– From “lies” to ……


Applications

• The kind of parsing we're talking about is

normally called morphological analysis

• It can either be

– An important stand-alone component of an

application (spelling correction, information

retrieval)

– Or simply a link in a chain of processing


Finite State Transducers

• The simple story

– Add another tape

– Add extra symbols to the transitions

– On one tape we read “cats”, on the other we

write “cat +N +PL”, or the other way around.


FSTs
[Figure: an FST used in two directions, generation and parsing]


Transitions

• c:c means read a c on one tape and write a c on the other

• +N:ε means read a +N symbol on one tape and write nothing on the other

• +PL:s means read +PL and write an s

c:c a:a t:t +N:ε +PL:s
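
The arc sequence above can be run in either direction. The sketch below is an illustration only: it hard-codes the single linear path for "cats", represents ε as the empty string, and uses the hypothetical names `ARCS` and `run`.

```python
EPS = ""  # epsilon: nothing is read or written on that tape

# One (lexical, surface) pair per arc, in path order.
ARCS = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+PL", "s")]


def run(symbols, side):
    """Follow the path; side=0 reads the lexical tape, side=1 the surface tape."""
    out, i = [], 0
    for arc in ARCS:
        read, write = arc[side], arc[1 - side]
        if read != EPS:
            # A non-epsilon arc must consume the next input symbol.
            if i >= len(symbols) or symbols[i] != read:
                return None
            i += 1
        if write != EPS:
            out.append(write)
    return out if i == len(symbols) else None


print(run(["c", "a", "t", "+N", "+PL"], side=0))  # generation
print(run(list("cats"), side=1))                  # parsing
```

Swapping which element of each pair is read and which is written is all it takes to switch between generation and parsing.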


Typical Uses

• Typically, we'll read from one tape using the first symbol on the machine transitions (just as in a simple FSA).
• And we'll write to the second tape using the other symbols on the transitions.


Ambiguity

• Recall that in non-deterministic recognition

multiple paths through a machine may lead

to an accept state.

– It didn't matter which path was actually traversed

• In FSTs the path to an accept state does

matter since different paths represent

different parses and different outputs will

result


Ambiguity

• What's the right parse for

– Unionizable

– Union-ize-able

– Un-ion-ize-able

• Each represents a valid path through the

derivational morphology machine.


Ambiguity

• There are a number of ways to deal with this

problem

– Simply take the first output found

– Find all the possible outputs (all paths) and return

them all (without choosing)

– Bias the search so that only one or a few likely

paths are explored
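
The first two strategies can be illustrated on "unionizable". This is a minimal sketch with a hypothetical morpheme inventory; the surface form "iz" is mapped to the suffix -ize as a stand-in for e-deletion, and the function name `parses` is an assumption.

```python
# Hypothetical inventory, just large enough for the example.
PREFIXES = ("un",)
STEMS = ("union", "ion")
# Surface form -> morpheme; "iz" stands in for -ize after e-deletion.
SUFFIXES = {"ize": "ize", "iz": "ize", "able": "able"}


def parses(word):
    """Return every segmentation of word as prefix* stem suffix*."""
    results = []

    def suffixes(rest, acc):
        if not rest:
            results.append(acc)
            return
        for surface, morpheme in SUFFIXES.items():
            if rest.startswith(surface):
                suffixes(rest[len(surface):], acc + [morpheme])

    def stems(rest, acc):
        for stem in STEMS:
            if rest.startswith(stem):
                suffixes(rest[len(stem):], acc + [stem])
        for prefix in PREFIXES:
            if rest.startswith(prefix):
                stems(rest[len(prefix):], acc + [prefix + "-"])

    stems(word, [])
    return results


all_parses = parses("unionizable")
first_parse = all_parses[0]
print(all_parses)
```

Taking `all_parses[0]` corresponds to "take the first output found"; returning the whole list corresponds to "find all the possible outputs". Biasing the search would need extra information, such as weights on the arcs, which this sketch does not model.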


More Details

• It's not always as easy as

– “cat +N +PL” <-> “cats”

• There are geese, mice and oxen

• There are also spelling/ pronunciation

changes that go along with inflectional

changes


Multi-Tape Machines

• To deal with this we can simply add more

tapes and use the output of one tape

machine as the input to the next

• So to handle irregular spelling changes we'll

add intermediate tapes with intermediate

symbols


Spelling Rules and FSTs

Name | Description of Rule | Example
Consonant doubling | 1-letter consonant doubled before -ing/-ed | beg/begging
E deletion | Silent e dropped before -ing and -ed | make/making
E insertion | e added after -s, -z, -x, -ch, -sh before -s | watch/watches
Y replacement | -y changes to -ie before -s, and to -i before -ed | try/tries
K insertion | verbs ending with vowel + -c add -k | panic/panicked
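
The rules in the table can be approximated with a few regular expressions. This is a rough sketch, not a full account of English spelling: it assumes every final e is silent, doubles a consonant only in one-syllable consonant-vowel-consonant stems, and the function name `add_suffix` is hypothetical.

```python
import re

VOWELS = "aeiou"


def add_suffix(stem, suffix):
    """Attach -s, -ed or -ing to a stem using simplified spelling rules."""
    if suffix == "s":
        if re.search(r"(s|z|x|ch|sh)$", stem):             # E insertion
            return stem + "es"
        if re.search(rf"[^{VOWELS}]y$", stem):              # Y replacement
            return stem[:-1] + "ies"
        return stem + "s"
    if suffix in ("ed", "ing"):
        if re.search(rf"[{VOWELS}]c$", stem):               # K insertion
            return stem + "k" + suffix
        if suffix == "ed" and re.search(rf"[^{VOWELS}]y$", stem):
            return stem[:-1] + "ied"                         # Y replacement
        if stem.endswith("e") and not stem.endswith("ee"):  # E deletion
            return stem[:-1] + suffix
        if re.fullmatch(rf"[^{VOWELS}]*[{VOWELS}][^{VOWELS}wxy]", stem):
            return stem + stem[-1] + suffix                  # Consonant doubling
        return stem + suffix
    raise ValueError(f"unsupported suffix: {suffix}")


for stem, suffix in [("beg", "ing"), ("make", "ing"), ("watch", "s"),
                     ("try", "s"), ("panic", "ed")]:
    print(stem, suffix, add_suffix(stem, suffix))
```

A transducer-based implementation would encode each rule as its own FST; the regular expressions here only mimic the input/output behavior on the table's examples.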


Multi-Level Tape Machines

• We use one machine to transduce between the

lexical and the intermediate level, and another to

handle the spelling changes to the surface tape


Lexical to Intermediate Level Machine


FST for the E-insertion Rule: Intermediate to Surface
• The “add an e” rule, as in fox^s# <-> foxes:
ε → e / {x, s, z} ^ __ s #
[Figure: FST with states q0–q5; arcs labeled ^:ε, ε:e, z, s, x, #, and other]


Note

• A key feature of this machine is that it doesn't do anything to inputs to which it doesn't apply.

• Meaning that: they are written out unchanged

to the output tape.


English Spelling Changes

• We use one machine to transduce between the

lexical and the intermediate level, and another to

handle the spelling changes to the surface tape


Foxes

Machine 1

Machine 2


Overall Plan


Final Scheme: Part 1


Final Scheme: Part 2


Stemming vs Morphology

• Sometimes you just need to know the stem of a word and you don't care about the structure.

• In fact you may not even care if you get the

right stem, as long as you get a consistent

string.

• This is stemming… it most often shows up in

IR (Information Retrieval) applications


Stemming in IR

• Run a stemmer on the documents to be

indexed

• Run a stemmer on users' queries

• Match

– This is basically a form of hashing


Porter Stemmer

• No lexicon needed

• Basically a set of staged sets of rewrite rules

that strip suffixes

• Handles both inflectional and derivational

suffixes

• Doesn't guarantee that the resulting stem is really a stem
• Lack of guarantee doesn't matter for IR


Porter Example

• Computerization
– -ization -> -ize: computerize
– -ize -> ε: computer
• Other Rules
– -ing -> ε (motoring -> motor)
– -ational -> -ate (relational -> relate)
• Practice: See Porter's Stemmer in Appendix B and suggest some rules for a KFUPM Arabic stemmer
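
The idea of staged rewrite rules can be sketched with only the four rules quoted above. This is not the real Porter rule set: the grouping into two stages, the crude minimum-length guard (standing in for Porter's conditions on the remaining stem), and the names `STAGES` and `toy_stem` are all assumptions made for illustration.

```python
# Each stage is an ordered list of (suffix, replacement) rewrite rules.
# Only the rules quoted on the slide; the staging is an assumption.
STAGES = [
    [("ization", "ize"), ("ational", "ate")],
    [("ize", ""), ("ing", "")],
]


def toy_stem(word):
    """Apply at most one matching rule per stage, stage after stage."""
    for rules in STAGES:
        for suffix, replacement in rules:
            # Crude guard: leave at least three letters before the suffix.
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: len(word) - len(suffix)] + replacement
                break
    return word


for w in ["computerization", "motoring", "relational", "sing"]:
    print(w, "->", toy_stem(w))
```

No lexicon is consulted at any point, which is why the output is only guaranteed to be a consistent string rather than a true stem.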


Porter Stemmer

• The original exposition of the Porter stemmer

did not describe it as a transducer but…

– Each stage is a separate transducer

– The stages can be composed to get one big

transducer


Human Morphological Processing: How do people represent words?

• Hypotheses:
– Full listing hypothesis: words listed
– Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
– Priming experiments (Does seeing/hearing one word facilitate recognition of another?)
– Regularly inflected forms prime stem but not derived forms
– But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)



More Examples


Using FSTs for orthographic rules

ε → e / {x, s, z} ^ __ s #
[Figure: E-insertion FST with states q0–q5; arcs labeled ^:ε, ε:e, Z!, s, #, and other, where Z! = z, s, x]

Using FSTs for orthographic rules

fox^s#…we get to q1 with ‘x’

Using FSTs for orthographic rules

fox^s#…we get to q2 with ‘^’

Using FSTs for orthographic rules

fox^s#…we can get to q3 with ‘NULL’

Using FSTs for orthographic rules

fox^s#…we also get to q5 with ‘s’ but we don’t want to!

fox^s#…we also get to q5 with ‘s’ but we don’t want to!

So why is this transition there?

?friend^ship, ?fox^s^s (= foxes’s)

fox^s#…q4 with s

fox^s#…q0 with # (accepting state)

arizona: we leave q0 but return

Other transitions…

m i s s ^ s

Other transitions…


Peace be upon you, and the mercy of God.
Glory be to You, O God, and praise be to You. I bear witness that there is no god but You. I seek Your forgiveness and repent to You.

