Composite Tense Recognition and Tagging in Serbian

Duško Vitas
Faculty of Mathematics
University of Belgrade
[email protected]

Cvetana Krstev
Faculty of Philology
University of Belgrade
[email protected]

Abstract

The technology of finite-state transducers is implemented to recognize, lemmatize and tag composite tenses in Serbian in a way that connects the auxiliary and main verb. The suggested approach uses a morphological electronic dictionary of simple words and appropriate local grammars.

1 Introduction

The lemmatization of verb forms is, in general, reduced to the assignment of a predefined canonical form to simple verb forms. In Serbian/Croatian this canonical form is the infinitive. This principle can be successfully applied, under certain constraints, to other inflective words as well, namely to the lemmatization of nouns and adjectives. However, the lemmatization of verb forms, viewed as the establishment of a relation between textual word and lexical word and the assignment of values of morphological categories that connect them, has many deficiencies (Gross, 1998-1999), since composite verbs, though they represent conjugated forms of a verb, cannot be recognized within the same framework. For instance, the string video ga je (Engl. he saw him) will be tagged as an active past participle of the verb videti in singular masculine form, followed by a clitic pronoun ga, followed by the third person present of the auxiliary verb jesam. Comparing this string with the corresponding string in present tense vidi ga (Engl. he sees him) it can be clearly observed that the form video in the first example should be tagged as a third person perfect of the verb videti with the additional information that the form is of masculine gender.

One reason why composite tenses are not recognized during morphological analysis is the inserts that separate the auxiliary verb form from the form of the main verb. The distance between these two forms can be considerable, measured either by the number of inserted words or by the complexity of the syntactic structure of the inserted word sequence. Further reasons to postpone composite tense recognition until syntactic analysis can be found in the so-called free word order and in the ambiguities of auxiliary and main verb forms (Popović, 1997).

On the other hand, the consequences of inadequate recognition of composite verbs during morphological analysis are manifold. First of all, the problem of the recognition of composite tenses is thus pushed toward the syntactic analysis, and for that reason the process of lemmatization can be accomplished only partially on the morphological level, while a considerable number of ambiguities cannot be eliminated during the morphological analysis.

In this article we will present the problem of the recognition of composite active tenses in contemporary Serbian as well as one partial solution that is based on the application of finite transducers. First we will describe a Serbian morphological e-dictionary of simple verb forms (in section 2), in section 3 we will indicate the problems encountered in lemmatization based on simple verb forms, and in section 4 we will present the structure of composite active verb tenses in Serbian and one possibility to represent them by finite transducers. In the conclusion we will discuss the limitations of this solution and will outline further developments.

2 E-dictionary of simple verb forms

The morphological e-dictionary (DELAS) of Serbian is being developed in the format described in (Courtois, 1990), (Vitas, 2000). Presently this dictionary contains approximately 15,000 verb entries, which corresponds to typical one-volume Serbian/Croatian dictionaries. In this dictionary each verb (with a few exceptions) is represented by its infinitive form. For each verb, simple forms of conjugation, given as character strings between two consecutive separators, have been generated together with possible values of their morphological categories. This task has been accomplished by using descriptions of different verb classes in the form of regular expressions and their implementation by finite transducers incorporated in the INTEX system (Silberztein, 1993). A part of a regular expression in the INTEX format of the transducer V122.fst is:

    2/:Ays:Azs +
    2cxe/:Fzs:Fzp +
    2la/:Gsf:Gpn +
    2na/:Tfs:Tnp +
    <E>/:W +
    2vsxi/:X +
    4zxem/:Pxs +
    4zxi/:Yys +

So far, 339 transducers have been developed that precisely describe the simple verb forms of conjugation, starting from the verbs' infinitive forms. For each verb, in addition to its verb forms, all the inflected forms of the corresponding verbal noun and passive past participle, if they exist, have been generated as well. The area in DELAS dedicated to the designation of syntactic and semantic characteristics has been filled for each verb with its basic features: aspect, reflexiveness, and transitiveness. This dictionary contains verbs in both ekavian and ijekavian pronunciation, which is also marked in this area of the DELAS dictionary (Vitas, 2001).

An example of a few entries in the DELAS dictionary is:

    pokazati,V122+Perf+Tr+Iref+Ref
    pokazivati,V18+Impf+Tr+Iref+Ref

where, for instance, the tags +Perf+Tr+Iref+Ref signify, respectively, that the verb pokazati (Engl. to show) is perfective, transitive, and can, but need not, be reflexive. The tag V122 signifies that the conjugation of this verb is described by transducer V122.fst. The simple verb forms described by this transducer are: infinitive (W), present (P), aorist (A), imperfect (I), imperative (Y), future (F), active past participle (G), passive past participle (T), present participle (S), and perfect participle (X). An example of some of the 30 generated simple forms for the verb pokazati is:

    pokaza,pokazati.V122+Perf+Tr+Iref+Ref:Ays:Azs
    pokazacxe,pokazati.V122+Perf+Tr+Iref+Ref:Fzs:Fzp
    pokazala,pokazati.V122+Perf+Tr+Iref+Ref:Gsf:Gpn
    pokazana,pokazati.V122+Perf+Tr+Iref+Ref:Tfs:Tnp
    pokazati,pokazati.V122+Perf+Tr+Iref+Ref:W
    pokazavsxi,pokazati.V122+Perf+Tr+Iref+Ref:X
    pokazxem,pokazati.V122+Perf+Tr+Iref+Ref:Pxs
    pokazxi,pokazati.V122+Perf+Tr+Iref+Ref:Yys
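The relation between the V122.fst excerpt and the generated forms can be sketched in a few lines of Python. This is only an illustration, not the INTEX implementation: it assumes that the leading digit of each alternative is the number of final characters removed from the infinitive, that the following letters are appended, and that <E> stands for the empty word. The names V122_EXCERPT and generate_forms are hypothetical.

```python
# Each rule: (number of final characters to strip from the infinitive,
#             suffix to append, morphological codes).
# Assumption: the empty word <E> is modelled as "strip nothing, append nothing".
V122_EXCERPT = [
    (2, "", ":Ays:Azs"),
    (2, "cxe", ":Fzs:Fzp"),
    (2, "la", ":Gsf:Gpn"),
    (2, "na", ":Tfs:Tnp"),
    (0, "", ":W"),
    (2, "vsxi", ":X"),
    (4, "zxem", ":Pxs"),
    (4, "zxi", ":Yys"),
]


def generate_forms(lemma, tags, rules):
    """Return DELAF-style lines 'form,lemma.tags:codes' for one verb."""
    lines = []
    for strip, suffix, codes in rules:
        stem = lemma[: len(lemma) - strip]
        lines.append(f"{stem}{suffix},{lemma}.{tags}{codes}")
    return lines


if __name__ == "__main__":
    for line in generate_forms("pokazati", "V122+Perf+Tr+Iref+Ref", V122_EXCERPT):
        print(line)
```

Applied to pokazati, the eight rules of the excerpt yield the eight entries listed above; the remaining forms of the verb would need the part of the expression that the excerpt does not show.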

Proceeding from the information in the morphological e-dictionary and using the formalisms incorporated in the INTEX system it is possible to formulate complex queries on texts. In the initial phase of text processing (the application of lexical resources) each text string that occurs in some of the applied dictionaries of the DELAF form is assigned one or more lexical entries with possible grammatical categories. This enables the processing of, among others, queries of the forms:

• pokazati - matches all text strings that literally coincide with the query string;

• <pokazati> - matches all text strings to which the lemma pokazati is assigned in the dictionary;

• <pokazati:P> - matches all text strings that coincide with some form of the verb pokazati in present tense (according to the dictionary);

• <pokazati:Ps> - matches all text strings that coincide with some singular form of the verb pokazati in present tense (regardless of person);

• <pokazati:G> - matches all text strings that coincide with some form of the active past participle of the verb pokazati (regardless of number), etc.

The syntactic and semantic information associated to verb entries in DELAS can also be used to express queries:

• <V> - matches all text strings that coincide with some simple verb form;

• <V-Aux:P> - matches all text strings that coincide with some present tense form of a verb that is not auxiliary, etc.

Even more complex queries can be formulated through local grammars (Roche, 1997).
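The query forms listed above can be illustrated by a small matcher over DELAF-style entries. The sketch below is not INTEX code. It assumes that an upper-case head such as V names a part of speech while a lower-case head names a lemma, that "-Aux" excludes entries carrying the feature Aux, and that a code such as Ps matches when all of its characters occur in one of the codes of the entry. The function names parse_delaf and matches are hypothetical.

```python
import re


def parse_delaf(line):
    """Split 'form,lemma.CAT+feat+feat:code:code' into its parts."""
    form, rest = line.split(",", 1)
    lemma, rest = rest.split(".", 1)
    tags, *codes = rest.split(":")
    category, *features = tags.split("+")
    return {
        "form": form,
        "lemma": lemma or form,  # an empty lemma means "same as the form"
        "pos": re.match(r"[A-Z]+", category).group(),  # V122 -> V
        "features": set(features),
        "codes": codes,
    }


def matches(query, entry):
    """Check one entry against a literal, <lemma:code> or <POS-feat:code> query."""
    if not (query.startswith("<") and query.endswith(">")):
        return entry["form"] == query  # literal string query
    head, _, code = query[1:-1].partition(":")
    name, *excluded = head.split("-")
    if name[0].isupper():  # assumption: upper case = part of speech
        if entry["pos"] != name:
            return False
    elif entry["lemma"] != name:
        return False
    if any(feature in entry["features"] for feature in excluded):
        return False
    if not code:
        return True
    # Every character of the query code must occur in a single entry code.
    return any(set(code) <= set(entry_code) for entry_code in entry["codes"])


if __name__ == "__main__":
    entry = parse_delaf("pokazxem,pokazati.V122+Perf+Tr+Iref+Ref:Pxs")
    for query in ["pokazati", "<pokazati>", "<pokazati:Ps>", "<V-Aux:P>"]:
        print(query, matches(query, entry))
```

Keeping the codes of an entry as separate strings matters here: a form such as pokazala carries two codes (Gsf and Gpn), and a query should be satisfied by one of them as a whole rather than by characters drawn from both.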

3 Lemmatization of simple verb forms

The recognition of simple verb forms has been tested on several different texts. First we give quantitative data for four texts, marked consecutively as R, P, K, and F, that are described in more detail in Appendix A. Data about the length of the texts and the frequencies of particular simple verb forms (without disambiguation) are given in Table 1. N in the table denotes the number of simple forms (and of different simple forms), which are sequences of alphabetic characters between two separators.

Verb forms that participate in the production of composite tenses, the active past participle (<V:G>) and infinitive (<V:W>) for active tenses, and the passive past participle (<V:T>) for passive tenses, represent roughly a quarter or more of all strings potentially tagged as verb forms (<V>), without any disambiguation being attempted. The cells in the last row of Table 1 are computed as ((<V:G> + <V:W> + <V:T>) / <V>) * 100.

             R        P        K        F
    N        18188    147913   88095    60176
    (diff.)  (4966)   (26884)  (16412)  (15051)
    <V>      6465     34090    27354    12571
    <V:W>    340      1414     755      430
    <V:G>    940      6438     5361     3638
    <V:T>    239      2169     997      644
    %        24%      29%      26%      37%

Table 1: Text lengths and frequency of occurrences of certain verb forms.
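As a check on the last row of Table 1, the short Python sketch below recomputes it, reading each cell as the share of <V:G>, <V:W> and <V:T> strings among all <V> strings. The counts are copied from the table; the names TABLE1 and composite_share are illustrative only.

```python
# Counts copied from Table 1; "reported" is the percentage printed in its last row.
TABLE1 = {
    "R": {"V": 6465, "W": 340, "G": 940, "T": 239, "reported": 24},
    "P": {"V": 34090, "W": 1414, "G": 6438, "T": 2169, "reported": 29},
    "K": {"V": 27354, "W": 755, "G": 5361, "T": 997, "reported": 26},
    "F": {"V": 12571, "W": 430, "G": 3638, "T": 644, "reported": 37},
}


def composite_share(row):
    """Share (in percent) of participles and infinitives among all verb forms."""
    return 100.0 * (row["G"] + row["W"] + row["T"]) / row["V"]


if __name__ == "__main__":
    for text, row in TABLE1.items():
        print(f"{text}: {composite_share(row):.1f}% (table: {row['reported']}%)")
```

Under this reading the recomputed shares agree with the printed row to within one percentage point for all four texts, whereas the inverse quotient would give values far above 100.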

Active composite tenses are built with the auxiliary verbs jesam, biti (Engl. to be), and hteti (Engl. shall, will) and impersonal simple verb forms. Table 2 shows the total frequency of auxiliary verbs as well as the frequency of the forms that enter into composite tenses.

                R      P      K      F
    <jesam>     1239   7632   4705   3209
    <jesam:Pi>  1076   7090   3985   2905
    <jesam:Ph>  145    510    709    301
    <hteti>     210    974    429    290
    <hteti:Pi>  125    831    298    252
    <hteti:Ph>  33     111    74     16
    <hteti:G>   25     18     30     17
    <biti>      380    1667   1063   680
    <biti:P>    16     196    67     17
    <biti:A>    170    478    708    125
    <biti:G>    136    636    503    460

Table 2: Frequency of occurrences of auxiliary verbs in different texts.

From the data in Table 2 it can be concluded that auxiliary verb forms that participate in composite tense formation represent the dominant usage of these verbs. By comparison of data from Tables 1 and 2 one can see that tagging by e-dictionary does not give the proper insight into the way a particular verb is realized in the text.

Figure 1: Description of active composite tenses in Serbian

In the process of lemmatization and tagging of a Serbian text a high degree of ambiguity of simple verb forms is prominent. The origins of the ambiguity of strings that can potentially be verb forms can be various:

(a) A string can represent several different realizations of morphological categories of the same verb, e.g. the string peva is at the same time the third person singular of the present tense and the second and third person singular of the aorist tense of the verb pevati (Engl. to sing), while the clitic form će is the third person singular and plural of the present tense of the auxiliary verb hteti (Engl. to wish).

(b) A string can represent forms of different verbs. Such is the case with the string želi that represents the third person singular of the present tense of the verb želeti (Engl. to desire) and the plural masculine gender of the active past participle of the verb žeti (Engl. to reap).

(c) A string can represent verb forms as well as forms of some other part of speech. Such is the case with the string više that can represent one of several comparative forms of the adjective visok (Engl. high), the adverb više (Engl. more), the preposition više (Engl. above), and the third person of the aorist of the verb viti (Engl. to wind). Similarly, the string sirena is the nominative singular of the noun sirena (Engl. siren), but also the singular feminine gender of the passive past participle of the verb siriti (Engl. to produce cheese).

The problem of disambiguation is particularly difficult in the case of pronoun forms, such as mi (the nominative of the pronoun we and the clitic of the dative of the pronoun I) and je (the accusative of the pronoun she/it), the conjunction da (Engl. to, that, etc.), and the particles da (Engl. yes) and li (Engl. if, whether) with certain forms of the verbs miti (Engl. to wash), jesam (Engl. to be), dati (Engl. to give), and liti (Engl. to pour). This kind of ambiguity can partly be removed by putting more frequent forms into a filter dictionary that gives them precedence over less frequent forms. For instance, the particle li is much more frequent than the third person singular of the aorist of the verb liti.
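The effect of such a filter dictionary can be sketched as follows. This is a minimal illustration under stated assumptions, not the INTEX mechanism: the entry notation, the code PAR for particles and the bare category V for liti and pevati are placeholders, and the names READINGS, FILTER_DICTIONARY and apply_filter are hypothetical.

```python
# All readings that a general dictionary might assign to two ambiguous strings.
# Assumption: the codes PAR and V (without a class number) are placeholders.
READINGS = {
    "li": ["li,.PAR", "li,liti.V:Azs"],
    "peva": ["peva,pevati.V:Pzs", "peva,pevati.V:Ays:Azs"],
}

# The filter dictionary lists only the readings that should take precedence.
FILTER_DICTIONARY = {
    "li": ["li,.PAR"],
}


def apply_filter(form, readings, filter_dictionary):
    """Keep the preferred readings of a form; leave unfiltered forms unchanged."""
    preferred = [r for r in readings if r in filter_dictionary.get(form, [])]
    return preferred or readings


if __name__ == "__main__":
    for form, readings in READINGS.items():
        print(form, apply_filter(form, readings, FILTER_DICTIONARY))
```

The filter removes the rare aorist reading of li, while a string such as peva, which is absent from the filter, keeps all of its readings and remains ambiguous.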


    eprestano jezero pod sobom, tako da cxu primetiti: F1 kad se priblizxuje ladx
    otisxla je pesxice na Vecxe. O, ja nisam video: PR, ali kazxu da je isxao i
    --- O, ja nisam video, ali kazxu da je isxao: PR i jedan automobil. - Mora da
    sxao i jedan automobil. -- Mora da je bio: PR debeo led? - - Sedamnaest pedal
    . - Radim po jednom ugledu koji mi je dala: PR markiza. - - Vi radite za k
    iji muzx ima lovisxte. Stari markiz je umro: PR josx pre no sxto sam se rodi
    ezero. Ponekad pobesni. -- Nocxas cxe biti: F1 mirno? - - Mozxda i ne. Izgle
    oni su vecx dosta veliki, a uskoro cxe biti: F1 mnogo vecxi. - - Vi se ne bis
    cxe biti mnogo vecxi. - - Vi se ne biste mogli:C1 vratiti vecyeras? - Ne,ne

Figure 2: Composite tenses recognized by the graph from Figure 1.

4 The structure of composite tenses in Serbian

In Serbian six composite tenses are used in the active voice. The way they are constructed is described by the graph in Figure 1. The application of this transducer to text R recognizes a total of 401 occurrences of composite tenses. The concordances of the recognized occurrences are given in Figure 2.

However, the graph in Figure 1 does not take account of variations of different kinds. First, word order can vary so that the form of the auxiliary verb follows the form of the main verb. The recognized forms of the perfect (PR) and conditional I (C1) shown in Figure 2 can be realized, in a different context, as video nisam, išao je, ..., mogli biste. The recognized composite verb forms of the future I tense (F1) ću primetiti and ću biti shown in the same figure have two alternative forms: (a) the simple forms primetiću and biću, and (b) the da-constructions ja ću da primetim, ja ću da budem, which can be described with the regular expression (<hteti:Pi> + <hteti:Ph>) da <V:P>. Moreover, the auxiliary verb is omitted in the third person singular of the perfect tense when the reflexive pronoun se occurs together with the main verb. For instance, instead of the string bojao:Gsm se:PRO je:Pzs, in text R the string bojao:Gsm se:PRO is realized.
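The regular expression for the da-construction can be read as a pattern over tagged tokens. The Python sketch below applies it to a hand-tagged sentence. It is only an approximation of what a local grammar does: tokens are assumed to be already disambiguated tuples (form, lemma, part of speech, code), the code Pxsi for cxu is an assumption made for the example, and the function names are hypothetical.

```python
def is_hteti_present(token):
    """True for tokens that would satisfy <hteti:Pi> or <hteti:Ph>."""
    _form, lemma, _pos, code = token
    return lemma == "hteti" and (set("Pi") <= set(code) or set("Ph") <= set(code))


def find_da_constructions(tokens):
    """Find sequences matching (<hteti:Pi> + <hteti:Ph>) da <V:P>."""
    hits = []
    for i in range(len(tokens) - 2):
        aux, da, verb = tokens[i:i + 3]
        if (is_hteti_present(aux)
                and da[0] == "da"
                and verb[2] == "V" and verb[3].startswith("P")):
            hits.append((aux[0], da[0], verb[0]))
    return hits


# Hand-tagged examples; the codes are illustrative assumptions.
DA_FUTURE = [
    ("ja", "ja", "PRO", ""),
    ("cxu", "hteti", "V", "Pxsi"),
    ("da", "da", "CONJ", ""),
    ("primetim", "primetiti", "V", "Pxs"),
]
PLAIN_FUTURE = [
    ("ja", "ja", "PRO", ""),
    ("cxu", "hteti", "V", "Pxsi"),
    ("primetiti", "primetiti", "V", "W"),
]

if __name__ == "__main__":
    print(find_da_constructions(DA_FUTURE))
    print(find_da_constructions(PLAIN_FUTURE))
```

Like the expression itself, the sketch does not require the auxiliary and the present tense form to agree in person, so that constraint would have to be added separately.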

Second, the graph in Figure 1 does not express the condition that the auxiliary verb and main verb have to agree in gender and number. For instance, the sequence dosxla:Gsf smo:Pxp cannot be a potential perfect tense because there is no agreement in number (dosxla is a singular feminine active past participle, while smo is the first person plural of the verb jesam).
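The number constraint from the example above can be expressed directly on the morphological codes. The sketch below assumes that number is marked by s or p somewhere after the first letter of a code (as in Gsf, Gpn, Pxs, Pxp) and checks number only, since the present tense codes of the auxiliary shown in this paper carry no gender. The function names are hypothetical.

```python
def number_of(code):
    """Return 's' or 'p' for a code such as 'Gsf' or 'Pxp', else None."""
    return next((ch for ch in code[1:] if ch in "sp"), None)


def may_form_perfect(participle_code, auxiliary_code):
    """Active past participle (G) and present of jesam (P) must agree in number."""
    if not (participle_code.startswith("G") and auxiliary_code.startswith("P")):
        return False
    participle_number = number_of(participle_code)
    return participle_number is not None and participle_number == number_of(auxiliary_code)


if __name__ == "__main__":
    print(may_form_perfect("Gsf", "Pxp"))  # dosxla smo
    print(may_form_perfect("Gsf", "Pxs"))  # dosxla sam
```

A form with several codes, such as one tagged :Gsf:Gpn, would pass the check if any one of its codes agrees with the auxiliary, which is also how such a test could help with disambiguation.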

    Number of words      Frequency
    0                    428
    1                    84
    2                    41
    3                    19
    4                    37
    more (non-greedy)    648
    more (greedy)        586

Table 3: Frequency of inserts of different length

Third, a string of simple words of arbitrary length can be inserted between the form of the auxiliary verb and the form of the main verb. Table 3 gives the frequency of inserts of different length that occurred in text P between the auxiliary verb <jesam> and <V:G> (potentially representing the perfect tense).

The following occur among inserts comprising one word: the reflexive pronoun se, the clitic particle li, the clitic pronouns, adverbs, but also the conjunction da that introduces a dependent clause (for instance, Da su nxegovi roditelxi znali da sam ja htela,). Among inserts of two words occurs, for instance, Mislila sam da ste otisxli, which was already recognized among inserts of length 0 as Mislila sam and ste otisxli. This shows that the greedy algorithm is not an adequate solution in recognizing composite tenses. However, with a non-greedy algorithm undesirable occurrences of composite tense recognition also appear, as in the example To je kao moja majka koja nije htela... where the search algorithm looks for the first form of the active past participle following the clitic form of the auxiliary verb.

Figure 3: A part of the subgraph ins.fst that recognizes pronominal clitics.

Figure 4: Subgraph perfect.fst that recognizes perfect tense.

The structure of inserts can be modeled by a subgraph that needs to be inserted at certain positions in the graph shown in Figure 1. This graph, named ins.fst, acts as a filter that tends to describe the permitted inserts. This graph is built step by step on the basis of the analysis of the concordances that recognize the composite tenses. Figure 3 shows one of its subgraphs that recognizes insertions consisting of pronominal clitics by taking into consideration their order.
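The role of such an insert filter can be illustrated on hand-tagged tokens. The sketch below is not a rendering of ins.fst: the set PERMITTED_INSERTS is a hypothetical stand-in for the permitted inserts, the order of clitics is not checked, only the order auxiliary before participle is handled, and agreement is ignored. Tokens are assumed to be tuples (form, lemma, code).

```python
# Hypothetical stand-in for ins.fst: a few clitics allowed between
# the auxiliary and the participle.
PERMITTED_INSERTS = {"se", "li", "ga", "mi", "ti", "mu"}


def find_perfects(tokens, permitted=PERMITTED_INSERTS, max_inserts=4):
    """Return (auxiliary index, participle index) pairs for <jesam:P> ... <V:G>."""
    hits = []
    for i, (_form, lemma, code) in enumerate(tokens):
        if not (lemma == "jesam" and code.startswith("P")):
            continue
        j = i + 1
        # Skip only tokens that the filter permits as inserts.
        while (j < len(tokens) and j - i - 1 < max_inserts
               and tokens[j][0] in permitted):
            j += 1
        if j < len(tokens) and tokens[j][2].startswith("G"):
            hits.append((i, j))
    return hits


# Hand-tagged examples; lemmas and codes are illustrative assumptions.
MISLILA = [
    ("Mislila", "misliti", "Gsf"),
    ("sam", "jesam", "Pxs"),
    ("da", "da", "CONJ"),
    ("ste", "jesam", "Pyp"),
    ("otisxli", "oticxi", "Gpm"),
]
NIJE_SE_MOGLA = [
    ("nije", "jesam", "Pzsh"),
    ("se", "se", "PRO"),
    ("mogla", "mocxi", "Gsf"),
]

if __name__ == "__main__":
    print(find_perfects(MISLILA))
    print(find_perfects(NIJE_SE_MOGLA))
```

Because da is not a permitted insert here, sam is not paired with otisxli across the clause boundary, while the clitic se between nije and mogla is skipped. Covering Mislila sam would need the symmetric pattern with the participle first.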

The description of variations in the structure of composite verbs from Figure 1 leads to the substitution of paths in this graph with subgraphs that recognize particular composite tenses, taking into consideration the stated constraints. The subgraph perfect.fst that recognizes the perfect tense is given in Figure 4. The arcs in a subgraph can be represented by other subgraphs that are implemented as finite transducers. The output of each transducer is the morphological code of the recognized form. The transducers inside the graph are labeled as variables: $5 labels the subgraph jesam-ceo that encompasses all clitic and negated forms of the verb <jesam>. This kind of labeling enables a shift of the inserts into a new position, after the recognized form of the composite tense.

An example of successful recognition of the perfect tense in text R with the transducer perfect.fst is given in Figure 5. The underlined parts of text outside the parentheses are fragments recognized by subgraph ins.fst.

Examples of unsuccessful recognition are given in Figure 6. In the first example the number of the active past participle (p) and the number of the auxiliary verb (s) do not agree. In the second example the form of the clitic pronoun je has not been resolved correctly, as subgraph ins.fst does not forbid occurrences of the auxiliary verb forms.

    text       R      P      K      F
    c tenses   947    5985   1342   3027

Table 4: Number of recognized composite tenses in analyzed texts.


    To :Pxs:Gms sam mislio i. Mozxda cxu tamo nacxi
    niz ulice. :Gms:Pxs Peo sam se ipak ulicama. Kameni izlaz

    -- Znacyi :Pzsh:Gfs nije mogla se loviti riba. -- Ne, nije
    ...................................................................
    jednom ugledu koji mi :Pzs:Gfs je dala markiza.

    Stari markiz :Pzs:Gms je umro josx pre no sxto sam se rodila.
    umro josx pre no sxto :Pxs:Gfs sam rodila se.

    Iako :Pxsh:Gms nisam primetio josx nisxta na jezeru
    Kazxem cyoveku s kim :Pxs:Gms sam govorio josx malocyas na obali:

    -- Ah, pa ona :Pzs:Gfs je otisxla ima vecx deset minuta. Cyekala
    a vecx deset minuta. :Gfs:Pzs Cyekala je vas pet minuta,

    je pet minuta, ali :Pxp:Gmp smo mislili posle da ste se predomislili.
    smo posle mislili da :Pyp:Gmp ste predomislili se.

Figure 5: Perfect tenses with inserts recognized by the graph perfect.fst

    Siroto. :Gfp:Pzs Prosxle je zime celo jezero bilo zamrznuto.
    na jezero nikako :Pzs:Gms je video nisam.

Figure 6: Incorrect recognition by the graph perfect.fst

The graph composite.fst replaces the graph from Figure 1: the paths from the starting node to the final node are substituted with corresponding subgraphs analogous to the one from Figure 4. It recognizes the composite verb tenses and produces the result on text R shown in Figure 7. The total number of recognized composite tenses is given in Table 4.

5 Conclusion

By tagging the text with information obtained from the morphological e-dictionary and by construction of appropriate local grammars in the form of finite transducers, it is possible to recognize with considerable reliability the occurrences of composite tenses in Serbian texts. In this way the recognition of composite tenses remains in the scope of morphological analysis and can be achieved with the same technology that is used for other morphological phenomena. The refinement of obtained results is tightly coupled with the degree of precision of the graph ins.fst that recognizes inserts (Gross, 2000). On the other hand, it is expected that a number of ambiguities described in section 3 will be resolved through the development of a dictionary of compounds DELAC and a dictionary for disambiguation DESAMB.

Acknowledgement

We are thankful to Prof. Ljubomir Popović from the Faculty of Philology at the University of Belgrade for his valuable comments.

References

Courtois, Blandine; Max Silberztein (eds.). 1990. Dictionnaires électroniques du français. Langue française 87. Paris: Larousse.

Gross, Maurice. 1998-1999. "Lemmatization of compound tenses in English". Lingvisticae Investigationes, 22:71-122.

Gross, Maurice. 2000. A Bootstrap method for Constructing Local Grammars. In: Bokan, Neda (Ed.): Proceedings of the Symposium "Contemporary Mathematics", Faculty of Mathematics, University of Belgrade. 229-250.

Popović, Ljubomir. 1997. Red reči u rečenici. Beograd: Društvo za srpski jezik i književnost.

Roche, Emmanuel; Schabes, Yves (eds.). 1997. Finite State Language Processing. Cambridge, Mass.: The MIT Press.

Silberztein, Max D. 1993. Le dictionnaire électronique et analyse automatique de textes: Le système INTEX. Paris: Masson.


    :Pyp--PRO:se-V:W cxete vratiti,.V156+Perf+Tr+Iref+Ref:W se. {S}
    :Pxs:Gms sam,jesam.V575+Imperf+It+Iref+Aux:Pxsi mislio,misliti.V6
    :Pxs--V:W cxu nacxi,.V191+Perf+Tr+Iref+Ref:W tamo,.ADV za vecy
    :Pyp--V:W cxete biti sa,.PREP gostionicom zadovolxni,zadovolxan
    :Pzs:Pzp--V:W cxe udesiti,.V158+Perf+Tr+Iref+Ref:W vam i za spa
    :Pzs:Pzp--V:W cxe uzeti vas posxtanska,posxtanski.A2+PosQ:akms2g:
    :Pxs--V:W cxu primetiti,.V156+Perf+Tr+Iref+Ek:W kad se pribliz
    :Pxs--PRO:se-V:W cxu peti se ovim ulicyicama. {S} -- Bolxe je,jesa
    :Gms:Pxs peo,peti.V72+Imperf+Tr+Iref:Gsm sam,jesam.V575+Imperf+It+
    :Gfp:Pzs prosxle,procxi.V191+Perf+Tr+Iref:Gpf je,jesam.V575+Imperf
    :Pzsh-PRO:se:Gfp nije,jesam.V575+Imperf+It+Iref+Aux:Pzsh mogla,mo
    :Pzsh-PRO:se:Gfp nije,jesam.V575+Imperf+It+Iref+Aux:Pzsh mogla,mo
    :Gfp:Pzs otisxla,oticxi.V690+Perf+It+Iref:Gsf:Gpn je,jesam.V575+Im
    :Pxsh:Gms nisam,jesam.V575+Imperf+It+Iref+Aux:Pxsh video,videti.V
    :Pzs:Gms je,jesam.V575+Imperf+It+Iref+Aux:Pzsi isxao,icxi.V569+Im
    :Pzs:Gms je,jesam.V575+Imperf+It+Iref+Aux:Pzsi bio,biti.V77:Gsm
    :Pzs:Gfp je,jesam.V575+Imperf+It+Iref+Aux:Pzsi dala,dati.V103+Per
    :Pzs:Gms je,jesam.V575+Imperf+It+Iref+Aux:Pzsi umro josx pre n
    :Pxs-PRO:se:Gfp sam,jesam.V575+Imperf+It+Iref+Aux:Pxsi rodila,rod
    :Pzs:Gms je,jesam.V575+Imperf+It+Iref+Aux:Pzsi sluzxio,sluzxiti.V
    :Pzs:Pzp--V:W cxe biti mirno,miran.A18:aens1g:aens4g:aens5g ? {S}
    :Pxsh:Gms nisam,jesam.V575+Imperf+It+Iref+Aux:Pxsh primetio,prime
    :Pxs:Gms sam,jesam.V575+Imperf+It+Iref+Aux:Pxsi govorio,govoriti.
    :Pzs:Gfp je,jesam.V575+Imperf+It+Iref+Aux:Pzsi otisxla,oticxi.V69
    :Gnp:Pzs cyekala,cyekati.V1+Imperf+Tr+Iref:Gsf:Gpn je,jesam.V575+I

Figure 7: Excerpt from the concordances of the recognized composite tenses with assigned lemma.

Vitas, Duško; Krstev, Cvetana; Pavlović-Lažetić, Gordana; Nenadić, Goran. 2000. Recent Results in Serbian Computational Lexicography. In: Bokan, Neda (Ed.): Proceedings of the Symposium "Contemporary Mathematics", Faculty of Mathematics, University of Belgrade, 111-128.

Vitas, Duško; Krstev, Cvetana; Pavlović-Lažetić, Gordana. 2001. The Flexible Entry. In: Zybatow, G. et al. (eds.): Current Issues in Formal Slavic Linguistics. Leipzig: University of Leipzig. 461-468.

A List of analyzed texts

R - Rastko Petrović: Ljudi govore, Geca Kon, Beograd, 1931 (novel)

P - Six complete issues of the web edition of the daily newspaper Politika (from … to … October 2000)

K - Rade Kuzmanović: Partija karata, Nolit, Beograd 1982 (short stories)

F - Miodrag Popović: Velikani starog Filozofskog fakulteta u Beogradu, (numbers 1 to 36), Politika, … October to … November 2002, (feuilleton)

