
Systematic Parameterized Description of Pro-forms in the Prague Dependency Treebank 2.0

Page 1: Systematic Parameterized Description  of Pro-forms in the Prague Dependency Treebank 2.0

Systematic Parameterized Description of Pro-forms in the Prague Dependency Treebank 2.0

Magda Ševčíková

Zdeněk Žabokrtský

Institute of Formal and Applied Linguistics

Charles University

Prague, Czech Republic

{sevcikova,zabokrtsky}@ufal.mff.cuni.cz

TLT 2006

Outline of the talk

Introduction

Description of pro-forms in the PDT 2.0

Type 1

• Personal pronouns

Type 2

• Indefinite, negative, interrogative, and relative pronouns

• Pro-adverbs and pro-numerals

Pro-forms in other languages

Final remarks

Introduction

Pro-forms

pronouns, pro-adverbs, and pro-numerals

closed classes

used to replace or substitute other words, phrases, or sentences

anaphoric and deictic functions

semantically relevant regularities within the sub-classes

• nobody-never-nowhere

• everybody-always-everywhere

Pro-forms in the PDT 2.0

formal linguistic system for annotation of pro-forms

making the present regularities explicit

part of the deep-syntactic layer (tectogrammatical layer, t-layer)

representation by a reduced set of (underlying) lemmas in combination with relevant attributes

PDT project

Historical background

mid-1960s: Functional Generative Description (Petr Sgall et al.)

1994: Czech National Corpus

1995: PDT started

1998: PDT 0.5 pre-release

2001: PDT 1.0 released by LDC (LDC2001T10)

• manual annotation of morphology and surface syntax

2006: PDT 2.0 released by LDC (LDC2006T01)

• interlinked morphological, surface-syntactic, and complex deep-syntactic annotation

PDT 2.0

Layers of annotation

Lit.: [He] was would went to forest.

[He] would have gone to the forest.

Tectogrammatical layer

• deep-syntactic dependency tree

• 59 % of the a-layer data

• 3,165 doc., 49,431 sent., 833,195 tokens

Analytical layer

• surface-syntactic dependency tree

• 75 % of the m-layer data

• 5,330 doc., 87,913 sent., 1,503,739 tokens

Morphological layer

• m-lemma and m-tag associated with each token

• 7,110 textual documents, 115,844 sent., 1,957,247 tokens

Word layer

• original text, segmented on word boundaries
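The slide does not say which unit the 59 % and 75 % figures are counted in. The short check below recomputes the shares from the counts quoted above for documents, sentences, and tokens. The dictionary layout and the helper name `share` are illustrative assumptions.

```python
# Counts as quoted on the slide for the three annotated layers.
LAYERS = {
    "m-layer": {"docs": 7110, "sents": 115844, "tokens": 1957247},
    "a-layer": {"docs": 5330, "sents": 87913, "tokens": 1503739},
    "t-layer": {"docs": 3165, "sents": 49431, "tokens": 833195},
}


def share(part: str, whole: str, unit: str) -> int:
    """Percentage of `whole` covered by `part`, rounded to a whole number."""
    return round(100 * LAYERS[part][unit] / LAYERS[whole][unit])


for unit in ("docs", "sents", "tokens"):
    print(unit, share("a-layer", "m-layer", unit), share("t-layer", "a-layer", unit))
```

With these counts, the document-based shares come out at 75 and 59, so the quoted percentages appear to refer to documents rather than sentences or tokens.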

Description of pro-forms in the PDT 2.0

M-layer

pronouns, pro-adverbs, and pro-numerals treated separately

m-lemma, m-tag

T-layer

2 basic types of description

• type 1: personal pronouns

• type 2: indefinite, negative, interrogative, and relative pronouns together with pro-adverbs and pro-numerals

semantic features originally present in the word form extracted and stored as values of inner attributes of the t-node that corresponds to the given word form

Type 1

Personal pronouns in the PDT 2.0

all personal pronouns (no matter whether they are pro-dropped or present in the sentence) represented by nodes labeled with a single, artificial lemma #PersPron

grammatical information expressed by a personal pronoun in the sentence is stored in node attributes person, number, and gender

attribute politeness for discerning between honorific and non-honorific usage

vy jste přišel ("you came", said politely to a single person)

#PersPron + 2nd person + singular + masc.anim. + polite
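The #PersPron representation can be pictured as a small record. The sketch below is an illustration only, not the PDT 2.0 file format: the class name `TNode`, the field names, and the value spellings (`"2"`, `"sg"`, `"anim"`, `"polite"`) are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TNode:
    """Hypothetical, simplified t-layer node for a personal pronoun."""
    t_lemma: str
    person: Optional[str] = None
    number: Optional[str] = None
    gender: Optional[str] = None
    politeness: Optional[str] = None


def personal_pronoun(person: str, number: str, gender: str, politeness: str) -> TNode:
    # Every personal pronoun, overt or pro-dropped, gets the same artificial lemma;
    # the distinctions live in the attributes.
    return TNode("#PersPron", person, number, gender, politeness)


# "vy jste přišel": polite address to a single male person.
vy = personal_pronoun(person="2", number="sg", gender="anim", politeness="polite")
print(vy)
```

Because the lemma is constant, two pronoun occurrences can be compared by their attributes alone, regardless of the surface form.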

Type 1

Personal pronouns and co-reference

at the t-layer, representation of personal pronouns was completed with the annotation of co-reference (i.e. relations between nodes referring to the same entity)

Tím, že Evropská unie nechala ve rwandské operaci Francii na holičkách, podle Léotarda ukázala, že její politika nemá žádný africký rozměr.

According to Léotard, by the fact that the European Union left France in the lurch concerning the Rwanda operation, [it] has shown that its politics has no African dimension.
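Co-reference links of this kind can be pictured as pointers from pronoun nodes to their antecedent. The sketch below is a simplified illustration of the example sentence: the node identifiers, the `antecedent` field, and the reduced node inventory are hypothetical and do not reproduce the PDT 2.0 attribute names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    node_id: str
    t_lemma: str
    antecedent: Optional[str] = None  # hypothetical co-reference pointer


# Simplified fragment of the example sentence.
nodes: Dict[str, Node] = {
    n.node_id: n
    for n in [
        Node("n1", "unie"),  # head of "Evropská unie" (European Union)
        Node("n2", "#PersPron", antecedent="n1"),  # pro-dropped subject: [it] has shown
        Node("n3", "#PersPron", antecedent="n1"),  # possessive "její" (its)
    ]
}


def resolve(node_id: str) -> str:
    """Follow co-reference pointers until a node without an antecedent is reached."""
    node = nodes[node_id]
    while node.antecedent is not None:
        node = nodes[node.antecedent]
    return node.t_lemma


print(resolve("n3"))
```

Both pronoun nodes carry the same artificial lemma, so the link to the antecedent is what recovers the referent.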

Type 2

Indefinite, negative, interrogative, and relative pronouns in the PDT 2.0

in Czech, single meanings are expressed regularly by means of a relatively small group of prefixes that join together with a small set of bases

transparent correspondence between the semantic features and formal composition of pronouns:

• indefinite prefix ně-: někdo (somebody) – něco (something) – nějaký (some)

• negative prefix ni-: nikdo (nobody) – nic (nothing)…

at the t-layer, pronouns with the same base element grouped together, each pronoun group represented by the lemma corresponding to the respective relative pronoun: e.g. někdo (somebody) and nikdo (nobody) represented by the lemma kdo (who)

corresponding possessive pronouns represented in the same way as the non-possessive ones

the semantic feature completing the reduced lemma was stored in the indeftype attribute
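The reduction of surface pronouns to a shared lemma plus an indeftype value can be sketched as a lookup table. This is a minimal sketch under assumptions: the value spellings `indef1` and `negat` are assumed names for the indefinite and negative features, and the reduced lemmas for něco, nic, and nějaký are inferred from the grouping rule rather than quoted from the slides. A table is used instead of prefix stripping because forms such as nic do not decompose mechanically.

```python
from typing import Optional, Tuple

# surface lemma -> (reduced t-layer lemma, indeftype value)
REDUCED = {
    "někdo": ("kdo", "indef1"),    # somebody
    "nikdo": ("kdo", "negat"),     # nobody
    "něco": ("co", "indef1"),      # something
    "nic": ("co", "negat"),        # nothing
    "nějaký": ("jaký", "indef1"),  # some
}


def to_t_layer(m_lemma: str) -> Tuple[str, Optional[str]]:
    """Return (t-lemma, indeftype); words outside the table are passed through."""
    return REDUCED.get(m_lemma, (m_lemma, None))


for word in ("někdo", "nikdo", "nic"):
    print(word, to_t_layer(word))
```

In this sketch někdo and nikdo end up with the same lemma and differ only in the attribute value, which is the regularity the annotation scheme makes explicit.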

Type 2

Indefinite, negative, interrogative, and relative pronouns and the indeftype attribute

all indefinite, negative, interrogative, and relative pronouns represented by only four lemmas at the t-layer

the reduced lemmas were completed by a value of the indeftype attribute (11 values; the table listing them is not preserved in this transcript)

Type 2

Pro-adverbs and pro-numerals in the PDT 2.0

in Czech, pro-adverbs (e.g. nikde (nowhere), nějak (somehow)) and pro-numerals (e.g. několik (a few)) share certain semantic features with pronouns

represented in the same way as indefinite, negative, interrogative, and relative pronouns at the t-layer

another derivational relation can be seen between pro-adverbs with directional meaning and those of location – for example, the adverb odněkud (from somewhere) is represented as follows: lemma kde (where) + indef1 value (of the indeftype attribute) + functor DIR1 capturing the directional meaning
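The three-part representation of odněkud can be written out as a record, which also shows how a directional and a locational pro-adverb share everything but the functor. A minimal sketch, assuming hypothetical names (`ProAdverb`, `PRO_ADVERBS`, `same_reduced_form`); the entry for někde (somewhere) with functor `LOC` is an assumption added for contrast and is not taken from the slides.

```python
from typing import NamedTuple


class ProAdverb(NamedTuple):
    t_lemma: str
    indeftype: str
    functor: str


PRO_ADVERBS = {
    # from the slide: lemma kde + indef1 + functor DIR1
    "odněkud": ProAdverb("kde", "indef1", "DIR1"),  # from somewhere
    # assumed entry for contrast: same lemma and indeftype, locational functor
    "někde": ProAdverb("kde", "indef1", "LOC"),     # somewhere
}


def same_reduced_form(a: str, b: str) -> bool:
    """True if two pro-adverbs share the t-lemma and the indeftype value."""
    x, y = PRO_ADVERBS[a], PRO_ADVERBS[b]
    return (x.t_lemma, x.indeftype) == (y.t_lemma, y.indeftype)


print(same_reduced_form("odněkud", "někde"))
print(PRO_ADVERBS["odněkud"].functor, PRO_ADVERBS["někde"].functor)
```

Keeping the directional meaning in the functor rather than in the lemma is what lets both adverbs be grouped under kde.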

Zakládá-li si někdo na tom, že se vyhýbá cizím slovům, pak udělá nejlíp, když se nikdy nepodívá do Etymologického slovníku jazyka českého.

If someone finds it important that [he] avoids foreign words, then the best thing [he] can do is never to look in the Etymological Dictionary of the Czech Language.

Pro-forms in other languages

PDT-like description

indefinite, negative, interrogative, and relative pronouns and other pro-forms are unproductive classes with (at least to a certain extent) transparent derivational relations also in other languages

preliminary sketch of several English and German pronouns (table not preserved in this transcript)

still not solved: English anybody, German niemand and nirgendjemand …

Pro-forms in other languages

Helbig’s MultiNet

Negative pro-adverbs

with local meaning

Der Lehrer findet nirgends einen Fehler.

Lit.: The teacher finds nowhere a mistake.

with directional meaning

Peter fährt in den Ferien nirgendwo hin.

Lit.: Peter goes on holiday nowhere.

In Helbig, H. (2001), Die semantische Struktur natürlicher Sprache, Springer, p. 174

Final remarks

achievements:

all pro-forms in Czech divided into two groups:

• personal (and corresponding possessive) pronouns

• indefinite, negative, interrogative, and relative pronouns (and corresponding possessive pronouns) and pro-adverbs and pro-numerals

several pro-form analogies crossing the part-of-speech boundaries are explicitly marked in the annotation

verification of the formal system on large-scale data

future work:

to elaborate the system for other languages in more detail, taking into consideration specific phenomena of the respective language

to describe the relations among pro-form systems in more languages (for example, for the purposes of machine translation)

http://ufal.mff.cuni.cz/pdt2.0/

