Natural Language Processing
Finite State Morphology
Slides adapted from Jurafsky & Martin, Speech and Language Processing
Outline
• Finite State Automata
• (English) Morphology
• Finite State Transducers
Finite State Automata
• Let’s start with the sheep language from Chapter 2: /baa+!/
Sheep FSA
• We can say the following things about this machine
  It has 5 states
  b, a, and ! are in its alphabet
  q0 is the start state
  q4 is an accept state
  It has 5 transitions
More Formally
• You can specify an FSA by enumerating the following things:
  The set of states: Q
  A finite alphabet: Σ
  A start state
  A set of accept/final states
  A transition relation that maps Q × Σ to Q
Yet Another View
• The guts of FSAs can ultimately be represented as tables
  State   b   a   !   ε
    0     1
    1         2
    2         3
    3         3   4
    4:
If you’re in state 1 and you’re looking at an a, go to state 2
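The table translates directly into code. A minimal sketch in Python (the dict representation and the convention that a missing entry means "reject" are my own):

```python
# Transition table for the sheep FSA: delta[state][symbol] -> next state.
# A missing entry means there is no transition, i.e. reject.
delta = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
accept = {4}

# "If you're in state 1 and you're looking at an a, go to state 2":
print(delta[1]["a"])  # 2
```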
Recognition
• Recognition is the process of determining if a string should be accepted by a machine
• Or… it’s the process of determining if a string is in the language we’re defining with the machine
• Or… it’s the process of determining if a regular expression matches a string
• Those all amount to the same thing in the end
Recognition
• Traditionally, (Turing’s notion) this process is depicted with a tape.
Recognition
• Simply a process of
  Starting in the start state
  Examining the current input
  Consulting the table
  Going to a new state and updating the tape pointer
  Until you run out of tape
D-Recognize
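The D-RECOGNIZE algorithm on this slide is, in outline, just the loop described above. A minimal Python sketch (function and variable names are mine):

```python
def d_recognize(tape, delta, start, accept):
    """Deterministic recognition: walk the tape, consulting the table."""
    state = start
    for symbol in tape:                # examine input, advance tape pointer
        if symbol not in delta[state]:
            return False               # no transition defined: reject
        state = delta[state][symbol]   # go to the new state
    return state in accept             # out of tape: accept iff in final state

# Sheep language /baa+!/ with the table from the earlier slide:
delta = {0: {"b": 1}, 1: {"a": 2}, 2: {"a": 3}, 3: {"a": 3, "!": 4}, 4: {}}
print(d_recognize("baaa!", delta, 0, {4}))  # True
print(d_recognize("ba!", delta, 0, {4}))    # False
```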
Key Points
• Deterministic means that at each point in processing there is always one unique thing to do (no choices).
• D-recognize is a simple table-driven interpreter
• The algorithm is universal: it works for any deterministic FSA, and every regular language has one. To change the machine, you simply change the table.
Recognition as Search
• You can view this algorithm as a trivial kind of state-space search.
• States are pairings of tape positions and state numbers.
• Operators are compiled into the table
• Goal state is a pairing with the end-of-tape position and a final accept state
• It is trivial because? There are never any choices, so no real search is needed.
Non-Determinism
Non-Determinism
• Yet another technique: epsilon (ε) transitions
  Key point: these transitions do not examine or advance the tape during recognition
Equivalence
• Non-deterministic machines can be converted to deterministic ones with a fairly simple construction (the subset construction)
• That means that they have the same power; non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept
ND Recognition
• Two basic approaches (used in all major implementations of regular expressions, see Friedl 2006)
1. Either take a ND machine and convert it to a D machine and then do recognition with that.
2. Or explicitly manage the process of recognition as a state-space search (leaving the machine as is).
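The second approach can be sketched as an agenda-based search over (state, tape-position) pairs. A toy implementation, assuming transitions map each state and symbol to a *set* of next states, with `""` standing in for ε (representation is mine):

```python
def nd_recognize(tape, delta, start, accept):
    """Non-deterministic recognition as state-space search.
    Search states are (machine state, tape position) pairs on an agenda."""
    agenda = [(start, 0)]
    seen = set()                           # avoid revisiting search states
    while agenda:
        state, pos = agenda.pop()          # stack = depth-first; queue = BFS
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(tape) and state in accept:
            return True                    # one successful path is enough
        # epsilon transitions: change state without advancing the tape
        for nxt in delta.get(state, {}).get("", set()):
            agenda.append((nxt, pos))
        if pos < len(tape):
            for nxt in delta.get(state, {}).get(tape[pos], set()):
                agenda.append((nxt, pos + 1))
    return False                           # all paths led to failure

# ND sheep FSA: the a-loop expressed as a choice at state 2
delta = {0: {"b": {1}}, 1: {"a": {2}}, 2: {"a": {2, 3}}, 3: {"!": {4}}}
print(nd_recognize("baaa!", delta, 0, {4}))  # True
```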
Non-Deterministic Recognition: Search
• In a ND FSA there exists at least one path through the machine for a string that is in the language defined by the machine.
• But not all paths through the machine for an accepted string lead to an accept state.
• No paths through the machine lead to an accept state for a string not in the language.
Non-Deterministic Recognition
• So success in non-deterministic recognition occurs when a path is found through the machine that ends in an accept.
• Failure occurs when all of the possible paths for a given string lead to failure.
Example
Key Points
• States in the search space are pairings of tape positions and states in the machine.
• By keeping track of as yet unexplored states, a recognizer can systematically explore all the paths through the machine given an input.
Why Bother?
• Non-determinism doesn’t get us more formal power, and it causes headaches, so why bother?
  More natural (understandable) solutions
  Not always equivalent in size: the deterministic equivalent can be much larger
Compositional Machines
• Formal languages are just sets of strings
• Therefore, we can talk about various set operations (intersection, union, concatenation)
• This turns out to be a useful exercise
Union
Concatenation
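Both set operations have simple machine constructions using epsilon transitions. A sketch, with machines as (delta, start, accepts) triples and state names assumed disjoint (representation is mine):

```python
def fsa_union(m1, m2):
    """Union: a fresh start state with epsilon ("") arcs to both starts."""
    d1, s1, a1 = m1
    d2, s2, a2 = m2
    delta = {**d1, **d2, "u0": {"": {s1, s2}}}
    return delta, "u0", a1 | a2

def fsa_concat(m1, m2):
    """Concatenation: epsilon arcs from M1's accept states to M2's start."""
    d1, s1, a1 = m1
    d2, s2, a2 = m2
    delta = {**d1, **d2}
    for q in a1:
        delta.setdefault(q, {}).setdefault("", set()).add(s2)
    return delta, s1, a2

# /a/ and /b/ as one-arc machines:
m_a = ({"a0": {"a": {"a1"}}}, "a0", {"a1"})
m_b = ({"b0": {"b": {"b1"}}}, "b0", {"b1"})
delta, start, accepts = fsa_union(m_a, m_b)
print(sorted(delta["u0"][""]))  # ['a0', 'b0']
```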
Words
• Finite-state methods are particularly useful in dealing with a lexicon
• Many devices, most with limited memory, need access to large lists of words
• And they need to perform fairly sophisticated tasks with those lists
• So we’ll first talk about some facts about words and then come back to computational methods
English Morphology
• Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes
• We can usefully divide morphemes into two classes
  Stems: the core meaning-bearing units
  Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions
English Morphology
• We can further divide morphology up into two broad classes
  Inflectional
  Derivational
Inflectional Morphology
• Inflectional morphology concerns the combination of stems and affixes where the resulting word:
  Has the same word class as the original
  Serves a grammatical/semantic purpose that is different from the original, but is nevertheless transparently related to the original
Nouns and Verbs in English
• Nouns are simple
  Markers for plural and possessive
• Verbs are only slightly more complex
  Markers appropriate to the tense of the verb
Regulars and Irregulars
• It is a little complicated by the fact that some words misbehave (refuse to follow the rules)
  Mouse/mice, goose/geese, ox/oxen
  Go/went, fly/flew
• The terms regular and irregular are used to refer to words that follow the rules and those that don’t
Regular and Irregular Verbs
• Regulars…
  Walk, walks, walking, walked, walked
• Irregulars…
  Eat, eats, eating, ate, eaten
  Catch, catches, catching, caught, caught
  Cut, cuts, cutting, cut, cut
Derivational Morphology
• Derivational morphology is the messy stuff that no one ever taught you.
  Quasi-systematicity
  Irregular meaning change
  Changes of word class
Derivational Examples
• Verbs and Adjectives to Nouns

  Suffix   Base           Derived
  -ation   computerize    computerization
  -ee      appoint        appointee
  -er      kill           killer
  -ness    fuzzy          fuzziness
Derivational Examples
• Nouns and Verbs to Adjectives

  Suffix   Base          Derived
  -al      computation   computational
  -able    embrace       embraceable
  -less    clue          clueless
Morphology and FSAs
• We’d like to use the machinery provided by FSAs to capture these facts about morphology
  Accept strings that are in the language
  Reject strings that are not
  And do so in a way that doesn’t require us to, in effect, list all the words in the language
Start Simple
• Regular singular nouns are ok
• Regular plural nouns have an -s on the end
• Irregulars are ok as is
Simple Rules
Now Plug in the Words
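The rules above, with word lists plugged in, can be sketched in a few lines. This uses sets rather than an explicit transition table, but has the same effect; the word lists are illustrative, not the slide’s:

```python
# Toy noun-inflection recognizer: regular singulars are ok, regular
# plurals add -s, irregulars are listed as-is.
regular_nouns = {"cat", "fox", "dog"}        # illustrative word lists
irregular_sg = {"goose", "mouse", "ox"}
irregular_pl = {"geese", "mice", "oxen"}

def noun_recognize(word):
    if word in irregular_sg or word in irregular_pl:
        return True                          # irregulars are ok as is
    if word in regular_nouns:
        return True                          # regular singular noun
    if word.endswith("s") and word[:-1] in regular_nouns:
        return True                          # regular plural: stem + s
    return False

print(noun_recognize("cats"))    # True
print(noun_recognize("gooses"))  # False
```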
Derivational Rules
If everything is an accept state, how do things ever get rejected?
Parsing/Generation vs. Recognition
• We can now run strings through these machines to recognize strings in the language
• But recognition is usually not quite what we need
  Often if we find some string in the language we might like to assign a structure to it (parsing)
  Or we might have some structure and we want to produce a surface form for it (production/generation)
• Example
  From “cats” to “cat +N +PL”
Finite State Transducers
• The simple story
  Add another tape
  Add extra symbols to the transitions
  On one tape we read “cats”, on the other we write “cat +N +PL”
FSTs
Applications
• The kind of parsing we’re talking about is normally called morphological analysis
• It can either be
  An important stand-alone component of many applications (spelling correction, information retrieval)
  Or simply a link in a chain of further linguistic analysis
Transitions
• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on the other
• +PL:s means read +PL and write an s

  c:c  a:a  t:t  +N:ε  +PL:s
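That one path can be sketched as a list of (input, output) arc pairs run in the generation direction, with `""` standing in for ε; running the arcs with each pair swapped gives the parsing direction (representation is mine):

```python
# A single-path FST for "cat +N +PL" <-> "cats": one (input, output)
# pair per arc, read here in the generation direction.
arcs = [("c", "c"), ("a", "a"), ("t", "t"), ("", " +N"), ("s", " +PL")]

def transduce(tape, arcs):
    """Match `tape` against the input side of each arc in order, emitting
    the output side; return None if the tape does not fit this path."""
    pos, out = 0, []
    for inp, outp in arcs:
        if inp:                                   # ordinary transition
            if pos >= len(tape) or tape[pos] != inp:
                return None
            pos += 1                              # advance the input tape
        out.append(outp)                          # "" = epsilon: consume nothing
    return "".join(out) if pos == len(tape) else None

print(transduce("cats", arcs))  # cat +N +PL
```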
Ambiguity
• Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state.
  It didn’t matter which path was actually traversed
• In FSTs the path to an accept state does matter, since different paths represent different parses, and different outputs will result
Ambiguity
• What’s the right parse (segmentation) for “unionizable”?
  Union-ize-able
  Un-ion-ize-able
• Each represents a valid path through the derivational morphology machine.
Ambiguity
• There are a number of ways to deal with this problem
  Simply take the first output found
  Find all the possible outputs (all paths) and return them all (without choosing)
  Bias the search so that only one or a few likely paths are explored
The Gory Details
• Of course, it’s not as easy as
  “cat +N +PL” <-> “cats”
• As we saw earlier, there are geese, mice and oxen
• But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes
  Cats vs Dogs
  Fox and Foxes
Multi-Tape Machines
• To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next
• So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols
Multi-Level Tape Machines
• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
Overall Scheme
Cascades
• This is an architecture that we’ll see again and again
• Overall processing is divided up into distinct rewrite steps
• The output of one layer serves as the input to the next
• The intermediate tapes may or may not wind up being useful in their own right
Overall Plan
Final Scheme
Conclusion
• Finite state machines provide flexible and efficient models of words
• Finite state transducers are the method of choice for morphological analysis
  If there is a solved problem in NLP, this is it!
• Why not use finite state techniques for all problems in NLP?
Lexical to Intermediate Level
Intermediate to Surface
• The “add an e” rule, as in fox^s# <-> foxes#
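The rule can be sketched as a context-sensitive rewrite on the intermediate tape. This toy version covers only the x/s/z context and uses a regex rather than an actual FST:

```python
import re

def e_insertion(intermediate):
    """Insert e between a stem ending in x, s, or z and the plural s:
    fox^s# -> foxes#  (the ^ and # markers are stripped downstream)."""
    return re.sub(r"([xsz])\^s#", r"\1es#", intermediate)

print(e_insertion("fox^s#"))  # foxes#
print(e_insertion("cat^s#"))  # cat^s#  (rule does not apply)
```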
Foxes
Composition
1. Create a set of new states that correspond to each pair of states from the original machines (New states are called (x,y), where x is a state from M1, and y is a state from M2)
2. Create a new FST transition table for the new machine according to the following intuition …
Composition
• There should be a transition between two states in the new machine if the output symbol of some transition leaving a state in M1 matches the input symbol of some transition leaving the corresponding state in M2, or …
Composition
• δ3((xa, ya), i:o) = (xb, yb) iff there exists c such that
  δ1(xa, i:c) = xb AND δ2(ya, c:o) = yb
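That definition translates almost line for line into code. A sketch, representing each FST as {state: {(input, output): set of next states}} (a representation of my own, and ignoring epsilons, which need extra care in real composition algorithms):

```python
def compose(fst1, fst2):
    """Compose two FSTs: delta3((xa, ya), (i, o)) contains (xb, yb) iff
    some c has delta1(xa, (i, c)) = xb and delta2(ya, (c, o)) = yb."""
    delta3 = {}
    for xa, row1 in fst1.items():
        for ya, row2 in fst2.items():
            for (i, c1), xbs in row1.items():
                for (c2, o), ybs in row2.items():
                    if c1 == c2:                  # the shared symbol c
                        arcs = delta3.setdefault((xa, ya), {})
                        arcs.setdefault((i, o), set()).update(
                            (xb, yb) for xb in xbs for yb in ybs)
    return delta3

# Toy check: an a:b machine composed with a b:c machine yields a:c.
m1 = {0: {("a", "b"): {1}}}
m2 = {0: {("b", "c"): {1}}}
print(compose(m1, m2))  # {(0, 0): {('a', 'c'): {(1, 1)}}}
```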