29.10.2002 CSA3050: NLP Algorithms 1
CSA3050: NL Algorithms
• Introduction to English Morphology
• Finite State Transducers
Acknowledgement
For further details see Jurafsky & Martin Ch.3
Morphology
• Morphology is the study of how word-parts combine to form word wholes.
• Several different dimensions:
• Orthographic - rules for combining strings of characters together.
• Syntactic - effect on syntactic category.
• Semantic - effect on meaning.
Examples of Morphological Processes
• Affixation
– prefix
– suffix
– circumfix: German ge + stem + t, e.g. sagen, gesagt
– infix: un-bloody-likely
• Vowel change: swim/swam
• Consonant change: send/sent
Inflectional/Derivational Morphology
• Inflectional: +s plural, +ed past
– category preserving
– productive: always applies (esp. new words, e.g. fax)
– systematic: same semantic effect
• Derivational: +ment
– category changing: escape+ment
– not completely productive: detractment*
– not completely systematic: catchment
English Inflectional Morphology
• Applies to nouns, verbs and adjectives only
• Number of inflections relatively small
• Nouns
– Plural, Possessive
• Verbs
– Verb forms
• Adjectives
– Comparison
Noun Inflections
            Regular            Irregular
Singular    cat     church     mouse    ox
Plural      cats    churches   mice     oxen
Regular Verb Inflections
stem                     walk      merge     try      map
-s form                  walks     merges    tries    maps
-ing participle          walking   merging   trying   mapping
-ed participle or past   walked    merged    tried    mapped
Irregular Verb Inflections
stem              eat      catch      cut       go
-s form           eats     catches    cuts      goes
-ing participle   eating   catching   cutting   going
Past              ate      caught     cut       went
-ed participle    eaten    caught     cut       gone
Morphological Parsing
Input word: cats → Morphological Parser → Output analysis: cat + PL
• Output is a string of morphemes
• Reversibility?
Morphological Parsing: Examples
Input word    Output morphemes
cats          cat +N +PL
cat           cat +N +SG
cities        city +N +PL
walks         walk +V +3SG
cook          cook +N +SG or cook +V
Morphemes
• Morpheme is a theoretical construct ...
• but has a practical use
• Choice of morpheme vocabulary: theoretical and practical motivation
• Distinction between underlying morpheme and its realisation.
• String of morphemes could be turned into another representation later
Morphological Parsing Requires
1. Lexicon: list of stems and affixes + related information (e.g. syntactic category)
2. Morphotactics: a model of ordering constraints over morphemes (e.g. the fact that +s comes after the stem not before).
3. Correspondences between input and output strings
4. Spelling Rules: city + s → cities
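The four ingredients above can be put together in a small Python sketch. This is a toy illustration under stated assumptions, not the lecture's own method: the lexicon entries, the names `candidate_stems` and `parse`, and the two simple spelling rules are all hypothetical, chosen only so that the example analyses for cats, cities, walks and cook come out as listed earlier.

```python
# Toy lexicon (assumed for illustration): stem -> syntactic categories.
LEXICON = {
    "cat": {"N"},
    "city": {"N"},
    "walk": {"V"},
    "cook": {"N", "V"},
}

# Features emitted for a bare stem and for a stem followed by +s.
BARE = {"N": "+N +SG", "V": "+V"}
WITH_S = {"N": "+N +PL", "V": "+V +3SG"}


def candidate_stems(word):
    """Undo the +s suffix, including two simple spelling rules."""
    stems = set()
    if word.endswith("s"):
        stems.add(word[:-1])          # cats -> cat
    if word.endswith("es"):
        stems.add(word[:-2])          # churches -> church
    if word.endswith("ies"):
        stems.add(word[:-3] + "y")    # cities -> city
    return stems


def parse(word):
    """Return every analysis licensed by the toy lexicon."""
    analyses = []
    # Morphotactics: the stem comes first, an optional +s follows it.
    for cat in sorted(LEXICON.get(word, ())):
        analyses.append(f"{word} {BARE[cat]}")
    for stem in sorted(candidate_stems(word)):
        for cat in sorted(LEXICON.get(stem, ())):
            analyses.append(f"{stem} {WITH_S[cat]}")
    return analyses


for w in ["cats", "cat", "cities", "walks", "cook"]:
    print(w, "->", " or ".join(parse(w)))
```

The sketch deliberately overgenerates: because the spelling rules are applied without checking their context, a non-word such as "cates" would also be analysed as cat +N +PL. Restricting where each rule may apply is what proper spelling-rule machinery adds.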
Lexicon
• Lexicon is generally divided into sublexicons
– Stem Lexicon
• Noun Stems
• Verb Stems
• etc
– Suffix Lexicon
– Prefix Lexicon
• Can all be represented as FSAs
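One simple way to represent a sublexicon as an FSA is a letter trie, where each arc consumes one character and each listed word ends in a final state. The sketch below assumes this representation; the word list and the names `build_fsa` and `accepts` are hypothetical and used only for illustration.

```python
def build_fsa(words):
    """Build a trie-shaped FSA: returns (transitions, final_states)."""
    transitions = {}      # (state, character) -> next state
    finals = set()
    next_state = 1        # state 0 is the initial state
    for word in words:
        state = 0
        for ch in word:
            key = (state, ch)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        finals.add(state)  # the state reached after the last letter is final
    return transitions, finals


def accepts(fsa, word):
    """Return True if the FSA accepts the word."""
    transitions, finals = fsa
    state = 0
    for ch in word:
        state = transitions.get((state, ch))
        if state is None:   # no arc for this character
            return False
    return state in finals


# Hypothetical sublexicon fragment; shared prefixes share states.
fsa = build_fsa(["the", "these", "this", "that", "those"])
print(accepts(fsa, "these"), accepts(fsa, "thes"))
```

A trie is not minimal, since common endings are not merged, but it is enough to show that a finite word list is always a regular language.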
FSA for Sublexicon Fragment
[Figure: FSA for a sublexicon fragment, with arcs labelled t, h, e, s, i, a, o]
FSA for Morphotactics for Noun Inflection
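A morphotactic FSA of this kind can be sketched as a transition table over morpheme classes rather than letters. The state layout, class names and lexicon entries below are assumptions made for illustration (a regular noun may be followed by plural +s, irregular singular and plural nouns take no suffix); they are not taken from the slide's figure.

```python
# Assumed toy mapping from morphemes to morpheme classes.
MORPHEME_CLASS = {
    "cat": "reg-noun",
    "church": "reg-noun",
    "mouse": "irreg-sg-noun",
    "mice": "irreg-pl-noun",
    "+s": "plural",
}

# Assumed state layout: 0 = start, 1 = after a regular noun, 2 = complete.
TRANSITIONS = {
    (0, "reg-noun"): 1,
    (0, "irreg-sg-noun"): 2,
    (0, "irreg-pl-noun"): 2,
    (1, "plural"): 2,
}
FINAL_STATES = {1, 2}


def well_formed(morphemes):
    """Check a morpheme sequence against the ordering constraints."""
    state = 0
    for morpheme in morphemes:
        cls = MORPHEME_CLASS.get(morpheme)
        state = TRANSITIONS.get((state, cls))
        if state is None:   # no arc: the sequence violates the morphotactics
            return False
    return state in FINAL_STATES


print(well_formed(["cat", "+s"]))    # regular noun + plural
print(well_formed(["mice", "+s"]))   # irregular plural cannot take +s
print(well_formed(["+s", "cat"]))    # suffix before the stem
```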
Morphotactics for Verb Inflection
Input/Output Correspondences
• Problem: how to specify correspondence between input word, and output analysis.
• Given: both input and output are strings.
• Two level morphology (Koskenniemi 1983) proposes
– Surface Tape (words)
– Lexical Tape (concatenation of morphemes)
2 Level Model
The automaton used to perform the mapping between these levels is the finite state transducer (FST).
Basic FS Transducer
• Each transition of a transducer is labelled with a pair of symbols
• Input symbols are matched against the lower-side symbols on transitions.
• If analysis succeeds, return the string of upper-side symbols
[Figure: a transition arc labelled with an input symbol and an output symbol]
Morphological Analysis
{ ("CATS", "CAT+N+PL"),
("CAT", "CAT+N+SG")
}
[Figure: transducer path with upper-side symbols C A T +N +PL over lower-side symbols C A T S]
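The two pairs above can be produced by a very small transducer. The following Python sketch is one possible encoding, assuming arcs stored as (source, lower, upper, target) tuples with the empty string standing for ε; the state numbers and the name `transduce` are hypothetical. Reading the lower side gives analysis, and reading the upper side gives generation, which illustrates the reversibility of the mapping.

```python
# Each arc: (source, lower symbol, upper symbol, target); "" stands for epsilon.
ARCS = [
    (0, "C", "C", 1),
    (1, "A", "A", 2),
    (2, "T", "T", 3),
    (3, "", "+N", 4),
    (4, "S", "+PL", 5),
    (4, "", "+SG", 5),
]
START, FINALS = 0, {5}


def transduce(symbols, direction="analyse"):
    """Map a symbol sequence across the transducer, returning all outputs."""
    results = []

    def walk(state, pos, out):
        if pos == len(symbols) and state in FINALS:
            results.append("".join(out))
        for src, lower, upper, dst in ARCS:
            if src != state:
                continue
            # Analysis reads the lower side; generation reads the upper side.
            read, write = (lower, upper) if direction == "analyse" else (upper, lower)
            new_out = out + [write] if write else out
            if read == "":
                # Epsilon on the read side: consume no input.
                # (This toy machine has no epsilon cycles, so recursion ends.)
                walk(dst, pos, new_out)
            elif pos < len(symbols) and symbols[pos] == read:
                walk(dst, pos + 1, new_out)

    walk(START, 0, [])
    return results


print(transduce(list("CATS")))                              # analysis
print(transduce(list("CAT")))                               # analysis
print(transduce(["C", "A", "T", "+N", "+PL"], "generate"))  # generation
```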
FST Formal Definition
• States, initial state, final states: same as FSA
• Alphabets I and O are input and output alphabets, not necessarily disjoint.
• FST Alphabet Σ ⊆ I × O
• Transition function δ(q, i:o) defines the state q' that ensues when the machine is in state q and encounters complex symbol i:o.
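Seen this way, an FST is simply an FSA whose alphabet consists of complex symbols i:o. A minimal sketch, assuming δ is stored as a Python dictionary keyed by (state, (i, o)); the particular states, symbols and the name `accepts_pairs` are hypothetical.

```python
# delta maps (state, (input symbol, output symbol)) to the next state.
DELTA = {
    (0, ("c", "c")): 1,
    (1, ("a", "a")): 2,
    (2, ("t", "t")): 3,
    (3, ("s", "+PL")): 4,
}
INITIAL = 0
FINAL = {3, 4}


def accepts_pairs(pairs):
    """Return True if the sequence of complex symbols i:o is accepted."""
    q = INITIAL
    for pair in pairs:
        q = DELTA.get((q, pair))
        if q is None:   # delta is undefined for this state and symbol pair
            return False
    return q in FINAL


print(accepts_pairs([("c", "c"), ("a", "a"), ("t", "t"), ("s", "+PL")]))
print(accepts_pairs([("c", "c"), ("a", "t")]))
```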
FST Alphabet Example
I = {c, t, a, '}
O = {c, a, t, ε}
I × O:
c:c  c:a  c:t  c:ε
t:c  t:a  t:t  t:ε
a:c  a:a  a:t  a:ε
':c  ':a  ':t  ':ε
Σ ⊆ I × O
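The set of candidate complex symbols can be enumerated mechanically as a Cartesian product. The sketch below uses `itertools.product` from the Python standard library; the particular subset chosen as `SIGMA` is a hypothetical example, since any subset of I × O is a possible FST alphabet.

```python
from itertools import product

I = ["c", "t", "a", "'"]   # input alphabet
O = ["c", "a", "t", "ε"]   # output alphabet, including epsilon

# All complex symbols i:o in I x O, one row per input symbol.
pairs = [f"{i}:{o}" for i, o in product(I, O)]
for row_start in range(0, len(pairs), len(O)):
    print(" ".join(pairs[row_start:row_start + len(O)]))

# A hypothetical FST alphabet: any subset of I x O qualifies.
SIGMA = {"c:c", "a:a", "t:t", "':ε"}
print(SIGMA <= set(pairs))
```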
Summary
• Morphological processing can be handled by finite state machinery
• Finite State Transducers are formally very similar to Finite State Automata.
• They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.