CSA4050 Advanced Topics in NLP

Page 1: CSA4050 Advanced Topics in NLP

November 2003 Computational Morphology VI 1

CSA4050 Advanced Topics in NLP

Non-Concatenative Morphology

– Reduplication
– Interdigitation

Page 2: CSA4050 Advanced Topics in NLP

Reference

Ken Beesley and Lauri Karttunen, Finite State Non-Concatenative Morphotactics, Proceedings of SIGPHON-2000

Page 3: CSA4050 Advanced Topics in NLP

Koskenniemi 1983

"Only restricted infixation and reduplication can be handled adequately with the present system. Some extensions or revisions will be necessary for an adequate description of languages possessing extensive infixation or reduplication"

Page 4: CSA4050 Advanced Topics in NLP

Non-Concatenative Languages

• Most languages build words by stringing together morphemes like beads on a string.

• The word-building processes of prefixation and suffixation can be straightforwardly modeled in finite state terms by concatenation.

• But some languages also exhibit non-concatenative morphotactics.

Page 5: CSA4050 Advanced Topics in NLP

Non-Concatenative Phenomena 1. Reduplication

• In Malay: bagi (bag), bagi-bagi (bags)

• Although this may appear concatenative, it does not involve concatenating a predictable morpheme like "s". Instead, the entire stem is copied, whatever its length.

• In general, the language {ww | w ∈ L} is context-sensitive, but if L is finite, we can construct a finite-state network that encodes it.
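A minimal sketch of why finiteness matters, assuming a small hypothetical stem list (only "bagi" comes from the slide, and the hyphen of the written form is ignored): when L is finite, {ww | w ∈ L} can simply be enumerated, and any finite set of strings is a regular language.

```python
# Hypothetical finite stem lexicon; only "bagi" is taken from the slide.
STEMS = {"bagi", "buku", "anak"}


def reduplication_language(stems):
    """Return the finite language {ww | w in stems}."""
    return {w + w for w in stems}


def accepts(word, stems=STEMS):
    """Membership test; the set lookup stands in for a finite-state network."""
    return word in reduplication_language(stems)


print(accepts("bagibagi"))  # True
print(accepts("bagibuku"))  # False: the two halves differ
```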

Page 6: CSA4050 Advanced Topics in NLP

General Solution for Reduplication

• Therefore, assuming the number of words subject to reduplication is finite, it is possible to construct a lexical transducer for languages like Malay.

• To handle reduplication, a new operator ^n is introduced:

• A^n denotes n concatenations of A.
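The operator can be illustrated on finite languages represented as Python sets. This is only a sketch: the function name power is an assumption for illustration and is not part of the Xerox tools.

```python
from itertools import product


def power(language, n):
    """Concatenate a finite language (a set of strings) with itself n times."""
    return {"".join(parts) for parts in product(language, repeat=n)}


print(power({"bagi"}, 2))            # {'bagibagi'}
print(sorted(power({"a", "b"}, 2)))  # ['aa', 'ab', 'ba', 'bb']
```

Note that when A contains more than one string, A^2 also contains mixed strings such as "ab", so the operator only yields reduplication when it is applied to a single stem at a time.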

Page 7: CSA4050 Advanced Topics in NLP

Remarks from Beesley on Context Sensitivity

• finite-state grammars (cannot handle unlimited nesting or non-nested terminal dependencies)

• context-free grammars (can handle unlimited nesting, such as matched parentheses in arithmetic expressions, but cannot handle non-nested dependencies between terminals)

• context-sensitive grammars (can also handle non-nested dependencies between terminals, as in dogdog, where terminal elements 1 and 4 have to be the same, 2 and 5 have to be the same, and 3 and 6 have to be the same. These dependencies cross, so they are not nested.)
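The crossing pattern in dogdog can be made concrete with a small sketch. The function names are hypothetical; the code only lists and checks the position pairs and makes no claim about grammar formalisms.

```python
def crossing_dependencies(word):
    """List the 1-based position pairs (i, j) that must match in a copy ww."""
    half = len(word) // 2
    return [(i + 1, i + 1 + half) for i in range(half)]


def is_copy(word):
    """Check that word has the form ww by testing every crossing pair."""
    if len(word) % 2:
        return False
    return all(word[i - 1] == word[j - 1] for i, j in crossing_dependencies(word))


print(crossing_dependencies("dogdog"))  # [(1, 4), (2, 5), (3, 6)]
print(is_copy("dogdog"))  # True
print(is_copy("doggod"))  # False: a mirror image has nested, not crossing, pairs
```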

Page 8: CSA4050 Advanced Topics in NLP

Non-Concatenation 2. Interdigitation

• In Arabic and Maltese, prefixes and suffixes attach to stems in the usual concatenative way, but stems themselves are formed by a process known as interdigitation.

• An example occurs with the Arabic stem "katab" (wrote).

• This stem is composed of three elements:
1. the all-consonant root ktb
2. an abstract consonant-vowel template CVCVC
3. a vocalisation aa (in this case signifying perfect tense and active voice)

Page 9: CSA4050 Advanced Topics in NLP

Interdigitation

• The same root ktb can combine with the same template CVCVC and a different vocalism ui (signifying perfect aspect and passive voice) to produce "kutib" (was written).

• The same root ktb can combine with a different template CVVCVC and the vocalism ui to produce "kuutib", another form of the verb.
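The three examples (katab, kutib, kuutib) can be reproduced with a simple slot-filling sketch. The function interdigitate is hypothetical, and the rule that the first vowel spreads over any extra V slots is an assumption made here so that the vocalism ui can fill the three V slots of CVVCVC.

```python
def interdigitate(root, template, vocalism):
    """Fill C slots from the root and V slots from the vocalism."""
    n_c = template.count("C")
    n_v = template.count("V")
    if len(root) != n_c or len(vocalism) > n_v:
        raise ValueError("root or vocalism does not fit the template")
    # Assumption: the first vowel is repeated to fill any extra V slots.
    vowels = iter(vocalism[0] * (n_v - len(vocalism)) + vocalism)
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else next(vowels)
                   for slot in template)


print(interdigitate("ktb", "CVCVC", "aa"))   # katab
print(interdigitate("ktb", "CVCVC", "ui"))   # kutib
print(interdigitate("ktb", "CVVCVC", "ui"))  # kuutib
```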

Page 10: CSA4050 Advanced Topics in NLP

Intermediate Result: Template + Root

d v v r v s

Page 11: CSA4050 Advanced Topics in NLP

Final Result: Intermediate Result + Vocalism

d u u r i s

Page 12: CSA4050 Advanced Topics in NLP

Merge

• In this case the filler language contains an infinite set of strings (i, ui, uui, …), but only one path can be constructed because all strings end in i. Hence the earlier vowels must be "u".

• This need not always be the case (e.g. if the filler language were u*i*).
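A rough way to see the difference between the two filler languages is to enumerate the fillers that fit three vowel slots. This sketch uses Python's re module as a stand-in for the filler network; the helper fillers and the two-letter alphabet are assumptions for illustration.

```python
import re
from itertools import product


def fillers(pattern, length, alphabet="ui"):
    """All strings of the given length over alphabet that match pattern."""
    candidates = ("".join(p) for p in product(alphabet, repeat=length))
    return [s for s in candidates if re.fullmatch(pattern, s)]


print(fillers("u*i", 3))   # ['uui']
print(fillers("u*i*", 3))  # ['uuu', 'uui', 'uii', 'iii']
```

With u*i exactly one filler fits, so the merge yields a single path; with u*i* four fillers fit, so the result would be ambiguous.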

Page 13: CSA4050 Advanced Topics in NLP

Merge Operators

• To introduce the merge operation into the Xerox calculus, two new operators, .<m. and .m>., have been introduced.

• These differ only in the order of arguments.

• [T .<m. F] and [F .m>. T] represent the same merge operation with F and T as filler and template respectively.

Page 14: CSA4050 Advanced Topics in NLP

The Composite Transducer

• With these operators, the network above can be compiled using the following expression:

[d r s] .m>. [C V V C V C] .<m. [u* i]
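The expression can be traced on single strings with a toy merge. This is a sketch, not the Xerox implementation: the members of the classes C and V are assumed, the function names are hypothetical, and the filler u u i is one string picked from the language u* i.

```python
# Assumed class definitions; the slides do not list the members of C and V.
CLASSES = {"C": set("bdkrst"), "V": set("aiu")}


def merge(template, filler, classes=CLASSES):
    """Replace class symbols in template by filler symbols that belong to them."""
    out, pos = [], 0
    for symbol in template:
        if (symbol in classes and pos < len(filler)
                and filler[pos] in classes[symbol]):
            out.append(filler[pos])  # filler symbol fits this slot
            pos += 1
        else:
            out.append(symbol)       # slot is left for a later merge
    if pos != len(filler):
        raise ValueError("filler does not fit the template")
    return out


def merge_right(filler, template):
    """F .m>. T : filler on the left, template on the right."""
    return merge(template, filler)


def merge_left(template, filler):
    """T .<m. F : template on the left, filler on the right."""
    return merge(template, filler)


step1 = merge_right("d r s".split(), "C V V C V C".split())
step2 = merge_left(step1, "u u i".split())
print(" ".join(step1))  # d V V r V s
print(" ".join(step2))  # d u u r i s
```

The two printed lines correspond to the intermediate result (template + root) and the final result (intermediate result + vocalism).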

Page 15: CSA4050 Advanced Topics in NLP

Merge

[Figure: the template c v v c v c merged with the root d r s and the vocalism u* i]

Page 16: CSA4050 Advanced Topics in NLP

Compile-Replace

• Regular expressions are compiled into networks as usual, but in addition,

• the compiler is then applied to its own output.

• Central idea:

– transduce to a language that has the format of regular expressions.

– The compile-replace algorithm then replaces the regular expression with the result of its own compilation.

Page 17: CSA4050 Advanced Topics in NLP

Compile-Replace Simple Example

0:^[ a * 0:^]

This network maps the string a* to ^[ a* ^] (i.e. the same RE but with special delimiters).

Applying CR to the lower side of the network eliminates the markers, compiles the RE a*, and maps the upper side to the language resulting from the compilation.

Page 18: CSA4050 Advanced Topics in NLP

The result of compiling ^[ a* ^]

[Figure: network diagram; recoverable arc labels: a*:a, a:0, *:0, *:0, 0:a]

• To answer the question: what does this network do?

• Figure out what it does in upward and downward directions

Page 19: CSA4050 Advanced Topics in NLP

The result of compiling ^[ a* ^]

[Figure: the same network as on the previous slide]

When applied in the upward direction, this transducer maps any string of the infinite a* language into the regular expression from which it was compiled.

When applied in the downward direction, it maps from a* to all the strings in the language a*: {0, a, aa, ...}
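The two directions can be mimicked at the string level. This is only a sketch of the relation the network encodes, not of the network itself; apply_up and apply_down are hypothetical names, and Python's re module stands in for the compiled language.

```python
import re
from itertools import count, islice

REGEX = "a*"  # the regular expression on the upper side


def apply_up(surface):
    """Map a string of the a* language to the regular expression itself."""
    return REGEX if re.fullmatch(REGEX, surface) else None


def apply_down(upper):
    """Map the string 'a*' to every string of the language a* (lazily)."""
    if upper != REGEX:
        return
    for n in count():
        yield "a" * n


print(apply_up("aaa"))                     # a*
print(apply_up("ab"))                      # None
print(list(islice(apply_down("a*"), 4)))   # ['', 'a', 'aa', 'aaa']
```

The downward direction is a generator because the language a* is infinite; the empty string corresponds to the 0 in the set above.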

Page 20: CSA4050 Advanced Topics in NLP

Compile-Replace: 1

• Copy the input path to the output path until ^[ is encountered on the indicated (in our case lower) side of the network.

• Extract the path until the closing delimiter ^].

0:^[ a * 0:^]

a:a *:*

Page 21: CSA4050 Advanced Topics in NLP

Compile-Replace: 2

• Symbols along the indicated side are concatenated into a string and eliminated from the path, leaving just the symbols on the opposite side.

• The extracted string is compiled into a second network using the standard network compiler.

[Figure: the remaining net, with arcs a and *; the compiled network, with an arc a]

Page 22: CSA4050 Advanced Topics in NLP

Compile-Replace: 3

• The two networks are combined using the cross-product operator.

• The result is spliced between the origin and destination states of the regular-expression path.

[Figure: the two networks (a * and a) and their cross product; recoverable arc labels: a*:a, a:0, *:0, *:0, 0:a]

Page 23: CSA4050 Advanced Topics in NLP

Reduplication Revisited

• Applying compile-replace to this transducer

Lexical: b a g i +Noun +Plural
Surface: ^[ [b a g i] ^ 2 ^]

• yields this one

Lexical: b a g i +Noun +Plural
Surface: b a g i b a g i
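The effect on the surface side can be imitated with a string-level toy. The real compile-replace algorithm works on network paths and calls the full regular-expression compiler; this sketch, with a hypothetical function name, handles only the single pattern [symbols] ^ n between the ^[ and ^] delimiters.

```python
import re

# Matches "^[ [symbols] ^ n ^]"; only this tiny subset of the syntax is handled.
DELIMITED = re.compile(r"\^\[\s*\[([^\]]*)\]\s*\^\s*(\d+)\s*\^\]")


def compile_replace(lower_side):
    """Replace each delimited expression by the string it denotes."""
    def expand(match):
        symbols = match.group(1).split()
        times = int(match.group(2))
        return " ".join(symbols * times)
    return DELIMITED.sub(expand, lower_side)


print(compile_replace("^[ [b a g i] ^ 2 ^]"))  # b a g i b a g i
```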

Page 24: CSA4050 Advanced Topics in NLP

Interdigitation Revisited

• Applying compile-replace to this transducer

Up: k i t e b +Verb +Past +3Sg
Down: [k t b] .m>. [C V C V C] .<m. [i e]

• yields this one

Up: k i t e b +Verb +Past +3Sg
Down: k i t e b

Page 25: CSA4050 Advanced Topics in NLP

Remember: Two Central Problems

• Morphotactics: constraints on combinations of morphemes governing the formation of valid words, e.g. unbelievable vs. believeunable

• Phonological/Orthographical Alternation (spelling rules): how morphemes are realised in particular environments, e.g. fly + s = flies
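The fly + s alternation can be sketched as a single rewrite rule. This is a simplified assumption about English spelling (y becomes ie before the suffix s when a consonant precedes it), written with Python's re module rather than xfst replace rules; realise is a hypothetical name.

```python
import re


def realise(lexical):
    """Apply a simplified y -> ie rule, then drop the morpheme boundary '+'."""
    # Assumption: the rule fires only when y follows a non-vowel letter.
    surface = re.sub(r"(?<=[^aeiou])y\+s$", "ie+s", lexical)
    return surface.replace("+", "")


print(realise("fly+s"))  # flies
print(realise("boy+s"))  # boys
print(realise("cat+s"))  # cats
```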

Page 26: CSA4050 Advanced Topics in NLP

Xerox Perspective

• Morphotactics: handle with lexc

• Phonological/Orthographical Alternation (spelling rules): handle with xfst

[Figure: lexc compiles the morphotactics into a Lexicon FST and xfst compiles the alternations into a Rules FST; composing the two (.o.) gives the Lexical Transducer]
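Composition can be illustrated on finite relations represented as sets of string pairs. This is a sketch under strong simplifications: real lexical transducers are networks, and the tags +Noun and +Pl as well as the two tiny relations are hypothetical examples.

```python
def compose(upper_rel, lower_rel):
    """Relational composition: link pairs whose middle strings are equal."""
    return {(a, c)
            for (a, b) in upper_rel
            for (b2, c) in lower_rel
            if b == b2}


# Hypothetical lexicon relation: analysis string -> morpheme string.
lexicon = {("fly+Noun+Pl", "fly+s"), ("cat+Noun+Pl", "cat+s")}
# Hypothetical rules relation: morpheme string -> surface string.
rules = {("fly+s", "flies"), ("cat+s", "cats")}

lexical_transducer = compose(lexicon, rules)
print(sorted(lexical_transducer))
# [('cat+Noun+Pl', 'cats'), ('fly+Noun+Pl', 'flies')]
```

The intermediate morpheme strings disappear in the composed relation, which maps analyses directly to surface forms.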

