2014/09/01

Workshop on Finite-State Language Resources

Sofia

Inflection transducers for simple words

Éric Laporte

Source: dcl.bas.bg/sites/default/files/inflection-simple.pdf

Outline

01/09/2014 2 Inflection transducers for simple words

Inflection

Lemma dictionaries

Generation of inflected forms

Inflection transducers

Operators

Inflection and derivation

Inflection
collect, collects, collected, collecting
collect is the lemma or canonical form

Derivation
collect, collection, collector, collective, collectivism
collect is the base

Dubious cases: diminutives

Inflection in Portuguese
copo, copinho
lemma: copo, features: msD

Derivation in German
Glas, Gläschen
lemma: Gläschen, features: Nsn

Inflected-form dictionary

блестящото,блестящ.A:snd
влак,влак.N+m:s0
влака,влак.N+m:c
влака,влак.N+m:sh
влако,влак.N+m:v
влакове,влак.N+m:p0
влаковете,влак.N+m:pd
влакът,влак.N+m:sl
глава,глава.N+f:s0
главата,глава.N+f:sd
глави,глава.N+f:p0
главите,глава.N+f:pd
главо,глава.N+f:v
добра,добър.A:sf0

Source: Svetla Koeva, Cvetana Krstev
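
The entries above follow the pattern inflected form, lemma, part of speech with optional "+" codes, then ":" features. A minimal Python sketch of reading one such line is given here for illustration. The names DelafEntry and parse_delaf are hypothetical, and the sketch assumes that the form and the lemma contain no comma, dot or colon; it is not the parser used by any particular tool.

```python
from typing import List, NamedTuple


class DelafEntry(NamedTuple):
    form: str            # inflected form, e.g. "влакът"
    lemma: str           # lemma, e.g. "влак"
    pos: str             # part of speech, e.g. "N"
    traits: List[str]    # codes after "+", e.g. ["m"]
    features: List[str]  # inflectional feature bundles after ":", e.g. ["sl"]


def parse_delaf(line: str) -> DelafEntry:
    """Split one inflected-form entry into its parts (simplified: no escaping)."""
    form, rest = line.strip().split(",", 1)
    lemma_part, _, feature_part = rest.partition(":")
    lemma, codes = lemma_part.split(".", 1)
    pos, *traits = codes.split("+")
    features = feature_part.split(":") if feature_part else []
    return DelafEntry(form, lemma, pos, traits, features)


entry = parse_delaf("влакът,влак.N+m:sl")
print(entry.lemma, entry.pos, entry.traits, entry.features)
```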

Updating an inflected-form dictionary

Evolution of the language, of the domain, of spelling, of a project

Errors

Generate a version
Each version of an inflected-form dictionary can be generated from a lemma dictionary

Variations

Variation in form
Suffixes
give, gave, given

Variation in grammatical features
Tense/mood: infinitive, preterit, past participle
Inflectional features

Inflection without variation in form
hit, hit, hit

Lemma

One of the inflected forms, chosen to represent all others in the lexical entry

collect, collects, collected, collecting
lemma: collect

Lemma dictionary

блестящ,A3
влак,N1+m
глава,N600+f
добър,A4
индианец,N2+m
индиански,A2
индиец,N3+m
индийски,A2
индикация,N603+f
индия,N601+f+NProp
кораб,N8+m
корабче,N301+n
ладия,N603+f
лице,N300+n
лодка,N602+f
лондонски,A2
мъж,N4+m
параход,N8+m
Париж,N7+m+Nprop
плавателен,A5
съд,N1+m
фракция,N603+f
франция,N601+f+NProp
французин,N9+m+NProp
французки,A2
червен,A3
член,N5+m
човек,N6+m

Source: Cvetana Krstev

The DELAS format

влак,N1+m

влак: lemma
N: part of speech
N1: inflectional behaviour
+m: other information
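
As an illustration of the format, here is a minimal Python sketch that splits a lemma-dictionary line into these four parts. The function name parse_delas is hypothetical. The sketch assumes that the part of speech is the non-digit prefix of the inflection code, which holds for the entries shown in this deck, and it ignores any escaping of special characters.

```python
import re


def parse_delas(line: str) -> dict:
    """Split a lemma-dictionary line such as 'влак,N1+m' (simplified sketch)."""
    lemma, codes = line.strip().split(",", 1)
    inflection_code, *other = codes.split("+")
    # Assumption: the part of speech is the leading non-digit part of the code.
    pos = re.match(r"\D+", inflection_code).group(0)
    return {
        "lemma": lemma,
        "pos": pos,
        "inflection_code": inflection_code,
        "other": other,
    }


print(parse_delas("влак,N1+m"))
print(parse_delas("индия,N601+f+NProp"))
```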

Generation of inflected forms

влакът,влак.N+m:sl

влак: lemma, from the lemma dictionary entry влак,N1+m
N1+m: codes from the same entry
:sl: inflectional features

Two approaches

Taxonomic approach
oaths,oath.N:p from oath, N1, :p
dashes,dash.N:p from dash, N3, :p
Detailed taxonomy of inflectional behaviours
Strengths: readable transducers, updatability
Tools: Unitex-Gramlab

Context-based approach
oaths,oath.N:p from oath, N1, :p
dashes,dash.N:p from dash, N1, :p
oath+s, dash+s
Context-sensitive rules
Strength: non-redundant transducers
Tools: two-level morphology
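
The contrast between the two approaches can be sketched in Python for the English plural. This is only an illustration: the class-to-suffix table and the spelling rule are assumptions chosen to reproduce oaths and dashes, the function names are hypothetical, and the rule is a plain string test, not an implementation of two-level morphology.

```python
# Taxonomic approach: every lemma carries its own inflection class,
# and each class lists its suffix explicitly (assumed values).
PLURAL_SUFFIX_BY_CLASS = {"N1": "s", "N3": "es"}


def plural_taxonomic(lemma: str, inflection_class: str) -> str:
    return lemma + PLURAL_SUFFIX_BY_CLASS[inflection_class]


# Context-based approach: a single class builds lemma+s, and a
# context-sensitive rule decides how the boundary is spelled.
def plural_context_based(lemma: str) -> str:
    underlying = lemma + "+s"            # e.g. "oath+s", "dash+s"
    stem, suffix = underlying.split("+")
    if stem.endswith(("s", "x", "z", "sh", "ch")):
        return stem + "e" + suffix       # insert "e" after a sibilant spelling
    return stem + suffix


print(plural_taxonomic("oath", "N1"), plural_taxonomic("dash", "N3"))
print(plural_context_based("oath"), plural_context_based("dash"))
```

In the first function, changing the entry for N3 cannot affect lemmas of class N1. In the second, changing the rule may change the output for any lemma.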

The taxonomic approach to generation of inflected forms

SILBERZTEIN, Max. 1998. “INTEX: An integrated FST toolbox”, in Derick WOOD, Sheng YU (eds.), Automata Implementation, p. 185-197, Lecture Notes in Computer Science, vol. 1436. Second International Workshop on Implementing Automata (1997), Berlin/Heidelberg: Springer.

Figure: inflection transducer for влак. Source: Cvetana Krstev

The taxonomic approach to generation of inflected forms

In the boxes
Suffixes to be appended to the lemma
Operators to edit the lemma

Below the boxes
Encoded inflectional features

Name of the transducer
N1.grf
Same as the code for the inflectional behaviour

Level of generality
All nouns with inflectional behaviour modelled by this transducer
Inflection class

How to create and edit an inflection transducer with Unitex

Type in the boxes
<E>/:s0
а/:sh:c
ът/:sl
...
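
To make the box notation concrete, here is a minimal Python sketch of what the three boxes shown above produce for влак. It is not how Unitex processes a graph internally. BOXES and inflect are hypothetical names, <E> is read as the empty suffix, and only the three visible boxes are included, so the forms produced by the elided boxes are missing.

```python
# Box labels as typed in the graph: "suffix/features".
BOXES = ["<E>/:s0", "а/:sh:c", "ът/:sl"]


def inflect(lemma: str, codes: str, boxes: list) -> list:
    """Return inflected-form entries, one per feature bundle of each box."""
    entries = []
    for box in boxes:
        suffix, features = box.split("/", 1)
        if suffix == "<E>":          # <E> stands for the empty word
            suffix = ""
        for bundle in features.split(":")[1:]:
            entries.append(f"{lemma}{suffix},{lemma}.{codes}:{bundle}")
    return entries


for entry in inflect("влак", "N+m", BOXES):
    print(entry)
```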

How to generate an inflected-form dictionary with Unitex

Save the lemma dictionary in the Dela directory of the language, with extension .dic

Save the inflection transducers in the Inflection directory

With the DELA menu of Unitex
> Check Format: choose the DELA format
> Inflect: choose "Allow only simple words"
> Compress into FST

Operators to modify the lemma

Operators modify the lemma before appending a suffix

L (for left): remove the last letter

Word            Operators
плавателен      LLният
плавателе       LLният
плавател        LLният
плавателният    LLният

Operators to modify the lemma

Last letters
L (for left): push the last letter onto the stack
D: delete the last letter
R (for right): pop one letter from the stack
C: duplicate the last letter
U: unaccent the last letter
and others: see the manual (Paumier, 2002)

First letters
P: capitalize the first letter
W: lower-case the first letter
<I=?>: insert ? before the first letter
<X=n>: delete the first n letters
<R=?>: replace the first letter with ?

Operators to modify the lemma

L (for left): push the last letter onto the stack

D: delete the last letter

R (for right): pop one letter from the stack

Any remaining letter in the stack is discarded at the end

Word            Stack   Operators
плавателен      _       LDRият
плавателе       н       LDRият
плавател        н       LDRият
плавателн       _       LDRият
плавателният    _       LDRият

Operators to modify the lemma

C: duplicate the last letter

French appeler "call"

Word       Stack   Operators
appeler    _       DDCe
appele     _       DDCe
appel      _       DDCe
appell     _       DDCe
appelle    _       DDCe

English

Word       Stack   Operators
hit        _       Cing
hitt       _       Cing
hitting    _       Cing

Operators to modify the lemma

U: unaccent the last letter

Portuguese miúdo "tiny"

Word        Stack   Operators
miúdo       _       DLURinho
miúd        _       DLURinho
miú         d       DLURinho
miu         d       DLURinho
miud        _       DLURinho
miudinho    _       DLURinho
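
The last-letter operators L, D, R, C and U can be simulated with a short Python sketch that reproduces the traces shown in this deck. The function names are hypothetical. The sketch assumes that every other character in the operator string is a literal suffix letter, and that U removes combining diacritics through Unicode decomposition. It is an illustration, not the Unitex implementation.

```python
import unicodedata


def unaccent(letter: str) -> str:
    """Strip combining marks from one letter, e.g. 'ú' becomes 'u'."""
    decomposed = unicodedata.normalize("NFD", letter)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def apply_last_letter_ops(lemma: str, command: str) -> str:
    word = list(unicodedata.normalize("NFC", lemma))
    stack = []
    for ch in command:
        if ch == "L":        # move the last letter onto the stack
            stack.append(word.pop())
        elif ch == "D":      # delete the last letter
            word.pop()
        elif ch == "R":      # pop one letter from the stack back onto the word
            word.append(stack.pop())
        elif ch == "C":      # duplicate the last letter
            word.append(word[-1])
        elif ch == "U":      # unaccent the last letter
            word[-1] = unaccent(word[-1])
        else:                # any other character is a suffix letter
            word.append(ch)
    # Letters still on the stack are discarded at the end.
    return "".join(word)


print(apply_last_letter_ops("плавателен", "LDRият"))
print(apply_last_letter_ops("appeler", "DDCe"))
print(apply_last_letter_ops("hit", "Cing"))
print(apply_last_letter_ops("miúdo", "DLURinho"))
```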

Operators to modify the lemma

<I=?>: insert ? before the first letter

Serbian trom "sluggish"

Word           Stack   Operators
trom           _       <I=j><I=a><I=n>ijem
jtrom          _       <I=j><I=a><I=n>ijem
ajtrom         _       <I=j><I=a><I=n>ijem
najtrom        _       <I=j><I=a><I=n>ijem
najtromijem    _       <I=j><I=a><I=n>ijem
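
The first-letter operators can be sketched in the same way. The Python code below handles only P, W, <I=?>, <X=n> and <R=?> and treats every other character as a suffix letter. The function name and the tokenizing regular expression are assumptions made for this illustration.

```python
import re

# Either a bracketed operator such as <I=j>, or any single character.
TOKEN = re.compile(r"<([IXR])=([^>]+)>|(.)", re.DOTALL)


def apply_first_letter_ops(lemma: str, command: str) -> str:
    word = lemma
    for match in TOKEN.finditer(command):
        op, arg, literal = match.groups()
        if op == "I":            # insert arg before the first letter
            word = arg + word
        elif op == "X":          # delete the first n letters
            word = word[int(arg):]
        elif op == "R":          # replace the first letter with arg
            word = arg + word[1:]
        elif literal == "P":     # capitalize the first letter
            word = word[:1].upper() + word[1:]
        elif literal == "W":     # lower-case the first letter
            word = word[:1].lower() + word[1:]
        else:                    # suffix letter
            word += literal
    return word


print(apply_first_letter_ops("trom", "<I=j><I=a><I=n>ijem"))
```

Because each <I=?> inserts at the front, the letters are written in reverse order: j, then a, then n gives the prefix naj.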

Details

Inflection transducers may have subgraphs

They may not contain lexical masks referring to information in dictionaries

The inflection tool preserves the case (upper vs. lower) of letters in lemmas and suffixes

The two approaches and updatability of transducers

Taxonomic approach
An update of a transducer does not affect inflection in other inflection classes
It is easy to control the evolution of the transducers

Context-based approach
Most rules apply to any entry
Only exceptions have conditions of application which take into account inflection classes
An update of a rule may affect the inflection of any entry
It is difficult to predict the consequences of a change

CONTACT

ÉRIC LAPORTE

+33 (0)1 60 95 75 52

[email protected]

Thanks

