October 2004 CSA3050 NLP Algorithms 1

CSA3050: Natural Language Algorithms

Morphological Parsing

Morphology

• Morphemes: the smallest units of a word that bear some meaning, such as rabbit and s.

• Morphology: the combination of morphemes to form words that are legal in some language.

• Two kinds of morphology
  – Inflectional
  – Derivational

Inflectional/Derivational Morphology

• Inflectional
  – +s plural
  – +ed past
  – category preserving
  – productive: always applies (esp. new words, e.g. fax)
  – systematic: same semantic effect

• Derivational
  – +ment
  – category changing: escape+ment
  – not completely productive: detractment*
  – not completely systematic: apartment

Noun Inflections

           Regular           Irregular
Singular   cat    church     mouse   ox
Plural     cats   churches   mice    oxen

Morphological Parsing

Input word: cats → Morphological Parser → Output analysis: cat N PL

• Output is a string of morphemes

• Reversibility?

Morphological Parsing

• The goal of morphological parsing is to find out what morphemes a given word is built from.

  mouse → mouse N SG
  mice → mouse N PL
  foxes → fox N PL

2 Steps

1. Split the word up into its possible components, using + to indicate possible morpheme boundaries.

  cats → cat + s
  foxes → fox + s
  foxes → foxe + s

2. Look up the categories of the stems and the meaning of the affixes, using a lexicon of stems and affixes.

  cat + s → cat + N + PL
  fox + s → fox + N + PL
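
The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the transducer implementation developed in the following slides: the lexicon entries, the e-insertion heuristic and the function names `split_candidates`, `lookup` and `parse` are all made up for the example.

```python
# Toy lexicon (assumed entries): stems with their category,
# affixes with their meaning.
STEMS = {"cat": "N", "fox": "N"}
AFFIXES = {"s": "PL"}


def split_candidates(word):
    """Step 1: propose (stem, affix) splits; deliberately over-generates."""
    candidates = [(word, None)]  # the word may be a bare stem
    for affix in AFFIXES:
        if word.endswith(affix) and len(word) > len(affix):
            stem = word[: -len(affix)]
            candidates.append((stem, affix))
            # Assumed heuristic: undo e-insertion after s, x or z,
            # so that foxes also yields fox + s.
            if stem.endswith("e") and stem[:-1].endswith(("s", "x", "z")):
                candidates.append((stem[:-1], affix))
    return candidates


def lookup(stem, affix):
    """Step 2: replace stem and affix by lexicon features, or return None."""
    if stem not in STEMS:
        return None
    number = "SG" if affix is None else AFFIXES[affix]
    return f"{stem} + {STEMS[stem]} + {number}"


def parse(word):
    """Run step 1, then keep only the splits that step 2 can look up."""
    analyses = []
    for stem, affix in split_candidates(word):
        analysis = lookup(stem, affix)
        if analysis is not None:
            analyses.append(analysis)
    return analyses


print(parse("cats"))   # ['cat + N + PL']
print(parse("foxes"))  # ['fox + N + PL']
```

Step 1 proposes both foxe + s and fox + s for foxes; the lexicon lookup in step 2 discards foxe because it is not a known stem.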

Step 1: Surface → Intermediate: FST

Step 1: Surface → Intermediate: Operation

2. Intermediate → Morphemes

Possible inputs to the transducer are:

• Regular noun stem: cat
• Regular noun stem + s: cat+s
• Singular irregular noun stem: mouse
• Plural irregular noun stem: mice

2. Intermediate → Morphemes: Transducer

Handling Stems

cat/cat

mice/mouse
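
One way to read these stem labels is as a chain of symbol-pair arcs, one arc per aligned character. The Python sketch below assumes that reading, treats the left side as the intermediate form and the right side as the underlying stem, and pads the shorter side with an empty symbol; the names `stem_arcs` and `transduce` are illustrative only.

```python
from itertools import zip_longest

EPS = ""  # the empty string stands for "no symbol" on one side of an arc


def stem_arcs(intermediate, underlying, start=0):
    """Expand a stem pair into a chain of arcs (from, to, in_sym, out_sym)."""
    pairs = zip_longest(intermediate, underlying, fillvalue=EPS)
    return [(start + i, start + i + 1, a, b) for i, (a, b) in enumerate(pairs)]


def transduce(arcs, word):
    """Follow the chain, consuming `word` on the input side.

    Returns the output string, or None if the word does not match.
    """
    out, pos = [], 0
    for _, _, in_sym, out_sym in arcs:
        if in_sym != EPS:
            if pos >= len(word) or word[pos] != in_sym:
                return None
            pos += 1
        out.append(out_sym)
    return "".join(out) if pos == len(word) else None


mice = stem_arcs("mice", "mouse")
print(mice)                     # m:m, i:o, c:u, e:s, then an arc that only outputs e
print(transduce(mice, "mice"))  # mouse
```

For a regular stem such as cat every arc is an identity pair, so the same machinery copies the stem unchanged.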

Completed Stage 2

Joining Stages 1 and 2

• If the two transducers run in a cascade (i.e. we let the second transducer run on the output of the first one), we can do a morphological parse of (some) English nouns.

• We can also change the direction of translation (in translation mode).

• This transducer can also be used for generating a surface form from an underlying form.
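
The cascade and its reversal can be illustrated with finite relations standing in for the two transducers. In this Python sketch the relation contents and the names `STAGE1`, `STAGE2`, `compose`, `invert` and `translate` are assumptions made for the example; a real transducer would represent these relations with states and arcs rather than by listing pairs.

```python
# Stage 1: surface form -> intermediate form (toy entries, assumed)
STAGE1 = {("cats", "cat+s"), ("cat", "cat"), ("foxes", "fox+s"), ("mice", "mice")}

# Stage 2: intermediate form -> morphemes (toy entries, assumed)
STAGE2 = {
    ("cat+s", "cat N PL"),
    ("cat", "cat N SG"),
    ("fox+s", "fox N PL"),
    ("mice", "mouse N PL"),
}


def compose(r1, r2):
    """Cascade: feed every output of r1 into r2."""
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}


def invert(relation):
    """Change the direction of translation by swapping each pair."""
    return {(b, a) for (a, b) in relation}


def translate(relation, x):
    """All outputs the relation pairs with input x."""
    return sorted(b for (a, b) in relation if a == x)


PARSER = compose(STAGE1, STAGE2)
GENERATOR = invert(PARSER)

print(translate(PARSER, "cats"))           # ['cat N PL']
print(translate(GENERATOR, "mouse N PL"))  # ['mice']
```

Inverting the cascade gives the same relation as cascading the inverted stages in the opposite order, which is why one specification serves both parsing and generation.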

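Prolog

• The transducer specifications we have seen translate easily into Prolog format, except for the "other" transition.

  arc(1,3,z:z).
  arc(1,3,s:s).
  arc(1,3,x:x).
  arc(1,2,'#':'+').    % quoted: unquoted #:+ reads as a single atom
  arc(1,3,<other>).    % placeholder for the "other" transition, not valid Prolog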

Handling other arcs

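  arc(1,3,z:z) :- !.
  arc(1,3,s:s) :- !.
  arc(1,3,x:x) :- !.
  arc(1,2,'#':'+') :- !.   % quoted: unquoted #:+ reads as a single atom
  arc(1,3,X:X) :- !.       % "other": tried only if no clause above matched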
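
The intent of the cuts is that the catch-all arc is tried only when no explicit arc applies. The same priority can be sketched in Python with an explicit table and a per-state default; the names `EXPLICIT`, `DEFAULT` and `step` are hypothetical, and only the arcs leaving state 1 listed above are modelled.

```python
# Explicit arcs leaving state 1: (state, input symbol) -> (next state, output symbol)
EXPLICIT = {
    (1, "z"): (3, "z"),
    (1, "s"): (3, "s"),
    (1, "x"): (3, "x"),
    (1, "#"): (2, "+"),
}

# "other" arcs: state -> target state, copying the input symbol to the output
DEFAULT = {1: 3}


def step(state, symbol):
    """Take the explicit arc if there is one, else the state's "other" arc."""
    if (state, symbol) in EXPLICIT:
        return EXPLICIT[(state, symbol)]
    if state in DEFAULT:
        return DEFAULT[state], symbol
    return None  # no transition possible


print(step(1, "#"))  # (2, '+')
print(step(1, "a"))  # (3, 'a')
```

Because the explicit table is consulted first, a symbol such as # can never fall through to the default arc, which is the behaviour the ordered Prolog clauses aim for.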

Combining Rules

• Consider the word “berries”.

• Two rules are involved
  – berry + s
  – y → ie under certain circumstances.

• Combinations of such rules can be handled in two ways
  – Cascade, i.e. sequentially
  – Parallel

• Algorithms exist for combining transducers together in series or in parallel.

• Such algorithms involve computations over regular relations.
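
As a rough illustration of the cascade option, the Python sketch below generates berries from berry+s by applying two rewriting steps in sequence. The context chosen for y → ie (a consonant before the y, the plural boundary after it) is an assumption standing in for "certain circumstances", and the function names are made up for the example.

```python
import re


def y_to_ie(form):
    """y -> ie before +s at the end of the form, if a consonant precedes (assumed context)."""
    return re.sub(r"(?<=[^aeiou])y(?=\+s$)", "ie", form)


def drop_boundary(form):
    """Remove the morpheme boundary marker."""
    return form.replace("+", "")


def generate(form, rules=(y_to_ie, drop_boundary)):
    """Cascade: each rule runs on the output of the previous one."""
    for rule in rules:
        form = rule(form)
    return form


print(generate("berry+s"))  # berries
print(generate("boy+s"))    # boys
```

Order matters in a cascade: if the boundary is dropped first, the y rule no longer sees its +s context and berry+s comes out as berrys.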

3 Related Frameworks

REGULAR LANGUAGES

REGULAR EXPRESSIONS

FSA

REGULAR RELATIONS

REGULAR RELATIONS

AUGMENTED REGULAR EXPRESSIONS

FINITE STATE TRANSDUCERS

Putting it all together

Execution of the FSTi takes place in parallel.

Kaplan and Kay: The Xerox View

• FSTi are aligned but separate

• FSTi intersected together
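
Running rules in parallel can be pictured as several constraints reading the same aligned pair of strings, with a pair accepted only if every constraint accepts it; intersecting the rules amounts to taking that conjunction once and for all. The Python sketch below is a simplified illustration under assumptions: the rule contexts, the set `SPECIAL_PAIRS`, the use of `0` for an unrealised boundary and all function names are invented for the example, and the rules only restrict where a pair may occur rather than forcing it to occur.

```python
# Non-identity symbol pairs (lexical, surface) permitted at all (assumed set).
SPECIAL_PAIRS = {("y", "i"), ("+", "e"), ("+", "0")}


def rule_feasible(pairs):
    """Every pair is an identity pair (except +:+) or a listed special pair."""
    return all((a == b and a != "+") or (a, b) in SPECIAL_PAIRS for a, b in pairs)


def rule_y_as_i(pairs):
    """y:i is allowed only immediately before +:e."""
    for k, pair in enumerate(pairs):
        if pair == ("y", "i"):
            if k + 1 >= len(pairs) or pairs[k + 1] != ("+", "e"):
                return False
    return True


def rule_boundary_as_e(pairs):
    """+:e is allowed only immediately after y:i, s:s, x:x or z:z."""
    triggers = {("y", "i"), ("s", "s"), ("x", "x"), ("z", "z")}
    for k, pair in enumerate(pairs):
        if pair == ("+", "e"):
            if k == 0 or pairs[k - 1] not in triggers:
                return False
    return True


RULES = [rule_feasible, rule_y_as_i, rule_boundary_as_e]


def accepted(lexical, surface):
    """Both strings must already be aligned symbol by symbol."""
    if len(lexical) != len(surface):
        return False
    pairs = list(zip(lexical, surface))
    return all(rule(pairs) for rule in RULES)  # all rules must agree


print(accepted("berry+s", "berries"))  # True
print(accepted("cat+s", "cates"))      # False
```

Each rule here sees the whole aligned string independently of the others, in contrast to a cascade where each rule sees only the previous rule's output.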

Summary

• Morphological processing can be handled by finite state machinery

• Finite State Transducers are formally very similar to Finite State Automata.

• They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.

