Page 1: Lexicography and computer science: a harmless drudgery? Judith Knapp (jknapp@eurac.edu) Andrea Abel (aabel@eurac.edu) European Academy Bozen - Bolzano.

Lexicography and computer science: a harmless drudgery?

Judith Knapp (jknapp@eurac.edu)
Andrea Abel (aabel@eurac.edu)

European Academy Bozen - Bolzano

Page 2:

Content

• Learners' Difficulties and Needs
• Pedagogical Lexicography Today – A Short Overview
• ELDIT – Linguistic-lexicographic Background & Live Demo
• Datamodel
• Implementation
• Content Authoring
• ELDIT and Word Manager
• ELDIT and the TreeTagger
• Literature
• Conclusion

Page 3:

Learners' difficulties and needs

Problems with foreign language use (decoding, encoding)

Problems at the:
• Syntagmatic level
• Paradigmatic level
• Semantic level

Page 4:

PROBLEMS WITH SYNONYMS AND SIMILAR WORDS

Italian equivalents of "meeting":
• convegno
• riunione
• incontro
• assemblea

• assemblea condominiale (condominium meeting)
• assemblea d'affari (business meeting)

Page 5:

DIFFICULTIES WITH WORD COMBINATIONS

Collocations: fixed combinations of words (arbitrary, unpredictable)

Ex.:
• to brush one's teeth
• lavarsi i denti
• sich die Zähne putzen

Grammatical constructions: formed according to the rules of grammar, partly arbitrary

Ex.:
• to ask sb sth
• chiedere qlco a qlcu
• jemanden etwas fragen

Page 6:

Learners' difficulties and needs

Problems with foreign language use (decoding, encoding)
• Syntagmatic level
• Paradigmatic level
• Semantic level

Problems with dictionary use
• Metalanguage
  - Abbreviations
  - Technical terms
  - Other "codes"
  - Descriptive language

Page 7:

ABBREVIATIONS

Italian     German      Meaning
agg.        Adj.        (adjective)
art.        Art.        (article)
tr.         tr.         (transitive verb)
determ.     best.       (definite article)
pron.       Pron.       (pronoun)
femm.       w./Fem.     (feminine)
ant.        veralt.     (archaic)
volg.       vulg.       (vulgar)
region.     landsch.    (regional)
mus.        Mus.        (music)
sociol.     Soziol.     (sociology)

Page 8:

TECHNICAL TERMS

Italian          German
aggettivo        Adjektiv
articolo         Artikel
ausiliare        Hilfsverb
transitivo       transitiv
determinativo    bestimmt
pronome          Pronomen
femminile        weiblich
antico           veraltet
volgare          vulgär
dialetto         landschaftlich
musica           Musik
sociologia       Soziologie

Categories shown: grammar, language variation

Page 9:

OTHER "CODES"

International Phonetic Alphabet (IPA) or other transcription systems:
• focus
• shake
• chiesa [chiè-sa]

Syntactic information (valency) provided in coded or abbreviated form

Ex.:
(a) geben; [...] Vt j-m etw. g (Langenscheidt)
(b) give 2 Vnn (Cobuild) Vn
(c) dare 17. N-V-N1 (N2/a N3) (Blumenthal/Rovere)

Page 10:

UNDERSTANDING THE DEFINITION...

"I have to look it up in the dictionary to find out what a virgin is. [...] The dictionary says: virgin, woman (usually young) who is in a state of untouched chastity and remains in it. Now I have to look up untouched and chastity, and all I find here is that untouched means the opposite of touched, and chastity means chaste, and that means free from unlawful sexual intercourse. Now I have to look up intercourse [...] and I don't know what that means, and I am simply tired of being sent from one word to another in the heavy dictionary like a complete idiot, and all of this only because the people who wrote the dictionary don't want the likes of us to find anything out. I only want to know where I came from, but when you ask someone, they tell you to ask someone else, or they send you from word to word." (McCourt 1998: 412–413, German translation; rendered into English here)

Page 11:

Learners' difficulties and needs

Problems with foreign language use (decoding, encoding)
• Syntagmatic level
• Paradigmatic level
• Semantic level

Problems with dictionary use
• Metalanguage
  - Abbreviations
  - Technical terms
  - Other "codes"
  - Descriptive language
• Formal problems
  - Search
  - Presentation

Page 12:

Problems with searching

• Time consuming
  - 2000 pages
  - Small characters
  - Difficult metalanguage
• Complex expressions
  - Collocations ("Zähne putzen")
  - Idiomatic expressions
• …

Page 13:

Problems with presentation

• Limited space
• Linear presentation order
• Organisation of the dictionary
• Organisation of the entries

Page 16:

Learners' difficulties and needs

Problems with foreign language use (decoding, encoding)
• Syntagmatic level
• Paradigmatic level
• Semantic level

Problems with dictionary use
• Metalanguage
  - Abbreviations
  - Technical terms
  - Other "codes"
  - Descriptive language
• Formal problems
  - Search
  - Presentation

Solutions

Page 17:

Pedagogical Dictionaries

• Target group: language learners
• Functions: encoding & decoding
• General characteristics:
  - (usually) monolingual
  - selective regarding macrostructure (limited number of entries)
  - exhaustive regarding microstructure (detailed information for each entry)

Page 18:

ELDIT

Elektronisches Lern(er)wörterbuch Deutsch-Italienisch

Dizionario elettronico per apprendenti Italiano-Tedesco

http://www.eurac.edu/eldit

Page 20:

Three main characteristics:

1. Typologically innovative:

• a monolingual dictionary (German or Italian): definitions, collocations, idiomatic expressions, examples … in the target language

&

• a bilingual dictionary (German and Italian): translation equivalents, explanations in L1

"cross-lingual" dictionary German-Italian

Page 21:

2. Well-defined target group:

• beginners to intermediate students (Waystage level A1 up to Threshold level B1): basic vocabulary of approx. 3,000 entry words for each language

• addressed to the linguistic layman: limited use of metalanguage, abbreviations and symbols

Page 22:

3. Designed solely for computer use:

• not a transformation of a paper dictionary into an electronic dictionary

• exploits the possibilities of the electronic medium (multimedia & hypertext)

• modular structure: contains detailed information that is usually found in different types of dictionaries

Page 23:

Learners' difficulties and needs: problems → solutions

Problems with foreign language use (decoding, encoding)
• Syntagmatic level
• Paradigmatic level
• Semantic level → 1) Definitions 2) Examples 3) ...

Problems with dictionary use
• Metalanguage
  - Abbreviations → 1) Avoiding 2) Explaining
  - Technical terms → 1) Avoiding 2) Explaining
  - Other "codes" → 1) Sound files 2) Verb patterns
  - Descriptive language → 1) Simple 2) Use of L1 3) Multimedia
• Formal problems
  - Search → Electronic search possibilities
  - Presentation → Hypertext and hyperlinks

Page 24:

SOLUTIONS ... Descriptive language

1. Simple
2. Multiple descriptions
3. Hypertext

Page 25:

1. Simple =

a) Limited defining vocabulary
b) Easy syntax
d) Avoid circularity

Page 26:

2. Multiple descriptions =

a) Definitions
b) Lexicographic examples
c) Word fields
d) L1 (semantic equivalents)
[e) images]

Page 27:

Semantic Level:

Semantic information:
1. Definitions
2. Examples
3. Word fields
4. Equivalents

Word field for "das Haus":
• Hypernyms: das Gebäude
• Coordinates: das Haus, die Villa, das Schloss, die Wohnung ...
• Kinds of ...: das Hochhaus, das Bauernhaus ...

Entry "das Haus":

1. a) Ein Haus ist ein Gebäude, in dem Menschen wohnen. (casa)
      Sie wohnt mit ihrer Familie in einem zweistöckigen Haus am Stadtrand.
   b) Ein Haus ist das Gebäude, in dem man ständig lebt und in das man regelmäßig zurückkehrt. Es ist der Ort, wo man daheim ist.
      Sie verlässt das Haus jeden Morgen um sieben Uhr, um zur Arbeit zu fahren.
2. Das Haus sind die Bewohner eines Hauses (1a). (casa)
....

Page 28:

3. Hypertext =

a) Click on unknown words inside the definition
b) Click on the semantic equivalents
c) Click on any information you're interested in

Page 29:

Learners' difficulties and needs: problems → solutions

Problems with foreign language use (decoding, encoding)
• Syntagmatic level → 1) Collocations 2) Examples 3) ...
• Paradigmatic level
• Semantic level → 1) Definitions 2) Examples 3) ...

Problems with dictionary use
• Metalanguage
  - Abbreviations → 1) Avoiding 2) Explaining
  - Technical terms → 1) Avoiding 2) Explaining
  - Other "codes" → 1) Sound files 2) Verb patterns
  - Descriptive language → 1) Simple 2) Use of L1 3) Multimedia
• Formal problems
  - Search → Electronic search possibilities
  - Presentation → Hypertext and hyperlinks

Page 30:

Syntagmatic level:

1. Collocations
2. Idiomatic Expressions
3. Verb Valency

Page 31:

Verb Valency

- Definition: "Valency refers to the capacity of a verb to take a specific number and type of arguments" (Bianco)
- Theoretical origin: dependency grammar (Lucien Tesnière)

Page 32:

Verb Valency: a problem for learners and researchers

• verb constructions are largely arbitrary and unpredictable
• number of obligatory and facultative elements
• distinction between transitivity and intransitivity
• …

Page 33:

The description of verb valency in different dictionary types

1. General monolingual dictionaries

fragen: [jemdn.] unvermittelt, ... etw. fragen (Duden Deutsches Universalwörterbuch)
chiedere: v.tr. (2 argom.) (Disc)
chiedere: v.tr. (Devoto/Oli)

Page 34:

The description of verb valency in different dictionary types

2. Special mono- and bilingual verb valency dictionaries

fragen: 01a v 1b C (Bianco)
chiedere: N- V- N1 (N2/a N3) (Blumenthal/Rovere)

Page 35:

The description of verb valency in different dictionary types

3. (Monolingual) learners' dictionaries

fragen: Vt/i (j-n) (etw.) f. (Langenscheidt)
fragen: tr K jd fragt jdn [nach etw dat] (Pons Basiswörterbuch)
chiedere: tr. (Dib)

Page 36:

Description of Verb Valency in ELDIT

I. Learner friendly description:

Explicit way of describing verb valency

N-V-N1-(N2) / v.tr. (2 argom.) / Vt/i (etw.) (über j-n/etw.) r.

Page 37:

Description of Verb Valency in ELDIT

II. Multimedia:

Visualization of information to support comprehension

(colors and animations instead of meta-language)

Page 38:

Description of Verb Valency in ELDIT

III. Semiotic didactics:

Functions of the different colors:

- they indicate the parts of the sentence

- they show which parts of the verbs belong together

- correspondence between patterns and examples

Page 39:

Description of Verb Valency in ELDIT

IV. Additional explanations for the learner:

- Visible notes to describe semantic restrictions

- Variations for realizing single parts of the sentence

Page 40:

Learners' difficulties and needs: problems → solutions

Problems with foreign language use (decoding, encoding)
• Syntagmatic level → 1) Collocations 2) Examples 3) ...
• Paradigmatic level → Lexical fields, three-dimensional graphics
• Semantic level → 1) Definitions 2) Examples 3) ...

Problems with dictionary use
• Metalanguage
  - Abbreviations → 1) Avoiding 2) Explaining
  - Technical terms → 1) Avoiding 2) Explaining
  - Other "codes" → 1) Sound files 2) Verb patterns
  - Descriptive language → 1) Simple 2) Use of L1 3) Multimedia
• Formal problems
  - Search → Electronic search possibilities
  - Presentation → Hypertext and hyperlinks

Page 41:

PARADIGMATIC RELATIONS

• Word field theory:

"A word field is a group of words that are closely adjacent in content and that, by virtue of their interdependence, assign one another their functions." (Trier 1968/1973: 189, late definition; translated from the German)

• Existing projects
  - WordNet (GermaNet, Italian WordNet)
  - Alexia
  - Kirrkirr

Page 42:

Paradigmatic relations in ELDIT

• Ca. 150 words per language
• interactive graphic representation
• spatial arrangement and colors for the representation of paradigmatic lexical relations
• explicit description of the semantic relations between the lexical units and the lemma (no metalanguage)
• definitions and examples for describing similarities/differences of meaning, register, authentic context

Page 43:

Lexical fields in ELDIT

Types of meaning relations:

• hierarchical relations (hyperonymy/hyponymy; holonymy/meronymy)
• non-hierarchical relations (similarity: synonyms, quasi-synonyms …; contrast: gradable and nongradable antonyms; converse terms)

Page 47:

Learners' difficulties and needs: problems → solutions

Problems with foreign language use (decoding, encoding)
• Syntagmatic level → 1) Collocations 2) Examples 3) ...
• Paradigmatic level → Three-dimensional graphics
• Semantic level → 1) Definitions 2) Examples 3) ...

Problems with dictionary use
• Metalanguage
  - Abbreviations → 1) Avoiding 2) Explaining
  - Technical terms → 1) Avoiding 2) Explaining
  - Other "codes" → 1) Sound files 2) Verb patterns
  - Descriptive language → 1) Simple 2) Use of L1 3) Multimedia
• Formal problems
  - Search → Electronic search possibilities
  - Presentation → Hypertext and hyperlinks

Page 48:

Other modules

• Inflection
• Word family
• N.B.

Page 49:

Datamodel

Needs for an innovative presentation

Page 50:

A detailed data model

Page 52:

Implementation

– Hierarchically structured data
– Many changes were expected
– Communication with linguists

Page 53:

Use of XML

– XML and XML editor
  • Hierarchic structure
  • Communication with linguists
– Java Servlet technology
– DXML or JDOM
– Dynamic generation of HTML
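
The step from a hierarchical XML entry to a dynamically generated HTML page can be illustrated with a minimal sketch. It uses Python's standard library rather than the Java Servlet and JDOM stack named on the slide, and the element names (`entry`, `lemma`, `definition`, `example`) as well as the function `render_entry` are hypothetical, not ELDIT's actual schema.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical entry; the element names are illustrative, not ELDIT's schema.
ENTRY_XML = """
<entry>
  <lemma>Haus</lemma>
  <definition>Ein Haus ist ein Gebäude, in dem Menschen wohnen.</definition>
  <example>Sie wohnt mit ihrer Familie in einem zweistöckigen Haus am Stadtrand.</example>
</entry>
"""


def render_entry(xml_text: str) -> str:
    """Turn one XML dictionary entry into an HTML fragment."""
    root = ET.fromstring(xml_text)
    lemma = html.escape(root.findtext("lemma", default=""))
    parts = [f"<h1>{lemma}</h1>"]
    # Each definition and example becomes its own paragraph with a CSS class.
    for definition in root.findall("definition"):
        parts.append(f'<p class="definition">{html.escape(definition.text or "")}</p>')
    for example in root.findall("example"):
        parts.append(f'<p class="example">{html.escape(example.text or "")}</p>')
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_entry(ENTRY_XML))
```

Because the HTML is produced on each request, a change to the presentation only touches the rendering function, while the XML written by the linguists stays untouched.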

Page 54:

Content Authoring

• Content Authoring
  – Difficult
  – Time consuming
  – Error prone
• In ELDIT:
  – Innovative presentation
  – Efficient interface (real-world system)
  – Research of linguists

Page 55:

“Efficient” Authoring Interface

Page 56:

Efficient Authoring Interface

• Semi-structured data
• Automatic full-structuring
• Automatic enriching

Page 57:

Semi-structured Data

Page 59:

Automatic full-structuring

<example>
  <w>Meine</w>
  <w>Eltern</w>
  <w style="emphasized">haben</w>
  <w style="emphasized">das</w>
  <w style="emphasized">Haus</w>
  <w>vor</w>
  <w>50</w>
  <w>Jahren</w>
  <w style="emphasized">gebaut</w>
  <w>.</w>
</example>

<prebasuf>
  <article>die</article>
  <praefix>Be</praefix>
  <basis>haus</basis>
  <suffix>ung</suffix>
</prebasuf>
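
A minimal sketch of what such a full-structuring step might look like for the word-formation pattern. It assumes the semi-structured input is written in an underscore notation such as `die Be_haus_ung` (optional article, then prefix, base and suffix); the function name `structure_prebasuf` and the rule for empty segments are assumptions, not the ELDIT implementation.

```python
import xml.etree.ElementTree as ET

# German definite articles recognised at the start of the notation (assumption).
ARTICLES = {"der", "die", "das"}


def structure_prebasuf(text: str) -> ET.Element:
    """Expand e.g. 'die Be_haus_ung' into article/praefix/basis/suffix elements.

    Assumed input: an optional article, then prefix_base_suffix joined by
    underscores; a missing prefix or suffix is written as an empty segment.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty input")
    article = tokens[0] if len(tokens) == 2 and tokens[0] in ARTICLES else None
    segments = tokens[-1].split("_")
    if len(segments) != 3:
        raise ValueError(f"expected prefix_base_suffix, got {tokens[-1]!r}")
    root = ET.Element("prebasuf")
    if article:
        ET.SubElement(root, "article").text = article
    for tag, segment in zip(("praefix", "basis", "suffix"), segments):
        if segment:  # skip an empty prefix or suffix
            ET.SubElement(root, tag).text = segment
    return root


if __name__ == "__main__":
    print(ET.tostring(structure_prebasuf("die Be_haus_ung"), encoding="unicode"))
```

The author types one short line; the tool produces the nested elements, which keeps the authoring interface compact while the stored data stays fully structured.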

Page 60:

Automatic Enriching

By using Computational Linguistics tools

• WordManager
• TreeTagger
• PhraseManager, WordNet, Parser, …

Page 61:

<derivation>
  <prebasuf>die Be_haus_ung</prebasuf>
  <translation>la dimora</translation>
</derivation>

Page 62:

<derivation id="de.n.haus.1.deriv2">
  <pattern id="de.n.haus.1.deriv2.patt0" base="Behausung" ctag="N" lexref="">
    <article base="der" ctag="art" lexref="de.g.articles.1.item1">die</article>
    <praefix explref="de.prae.h.be">Be</praefix>
    <basis>haus</basis>
    <suffix explref="de.suff.h.ung">ung</suffix>
  </pattern>
  <translation id="de.n.haus.1.deriv2.trans0">
    <w id="de.n.haus.1.deriv2.trans0.w0" type="content" base="il" ctag="art" lexref="it.g.articles.1.item2">la</w>
    <w id="de.n.haus.1.deriv2.trans0.w1" type="content" base="dimora" ctag="N" lexref="it.n.dimora.1">dimora</w>
  </translation>
</derivation>
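
One part of this enrichment, the hierarchical ids, can be sketched as follows. The id scheme (`….trans0.w0`) is read off the slide; the function names are hypothetical, and the `base`, `ctag` and `lexref` attributes, which would come from tools such as a tagger or lemmatizer, are deliberately left out of this sketch.

```python
import xml.etree.ElementTree as ET


def tokenize_translation(text: str) -> ET.Element:
    """Split a plain translation string into <w> elements on whitespace."""
    translation = ET.Element("translation")
    for token in text.split():
        ET.SubElement(translation, "w").text = token
    return translation


def add_word_ids(translation: ET.Element, parent_id: str, index: int) -> None:
    """Give a <translation> element and its <w> children hierarchical ids."""
    trans_id = f"{parent_id}.trans{index}"
    translation.set("id", trans_id)
    for position, word in enumerate(translation.findall("w")):
        word.set("id", f"{trans_id}.w{position}")


if __name__ == "__main__":
    translation = tokenize_translation("la dimora")
    add_word_ids(translation, "de.n.haus.1.deriv2", 0)
    print(ET.tostring(translation, encoding="unicode"))
```

Deriving every id from its parent's id means that any word in the dictionary can be addressed, and therefore linked, without the author ever typing an identifier.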

Page 63:

ELDIT and WordManager

• WordManager

• WM Transducers

• WordManager in ELDIT

Page 64:

WordManager - 1992

– System for reusable morphological dictionaries
– Information about a word:
  • Inflection (declension and conjugation)
  • Word formation (derivation and composition)
  • Orthography (old and new for German)
  • …
– German, Italian, English

Page 65:

WM Transducers - 2000

Lemmatizer
  Häusern → haus (Cat N)

Inflection Analyzer
  Häusern → haus (Cat N)(Gender N)(Num PL)(Case Dat)

Inflection Generator
  Haus → haus (Cat N)(Gender N)(Num SG)(Case Nom),
         haus (Cat N)(Gender N)(Num SG)(Case Gen),
         häuser (Cat N)(Gender N)(Num PL)(Case Nom),
         häusern (Cat N)(Gender N)(Num PL)(Case Dat)

Word Formation Analyzer
  kennenlernen → kennen (Cat V)(Aux haben)
                 lernen (Cat V)(Aux haben)

Word Formation Generator
  bosco → abbracciabosco (Cat N)(Gen M)
          boscaglia (Cat N)(Gen F)
          boscaiolo (Cat N)(Gen M)
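
The relationship between the lemmatizer, the inflection analyzer and the inflection generator can be illustrated with a toy lookup table. This is not WordManager's API or its finite-state technology: the table `PARADIGMS` and the three function names are hypothetical, and the only data is the Haus paradigm exactly as listed on the slide.

```python
from typing import Dict, List, Tuple

# (surface form, feature string) pairs per lemma, copied from the slide.
Analysis = Tuple[str, str]
PARADIGMS: Dict[str, List[Analysis]] = {
    "haus": [
        ("haus", "(Cat N)(Gender N)(Num SG)(Case Nom)"),
        ("haus", "(Cat N)(Gender N)(Num SG)(Case Gen)"),
        ("häuser", "(Cat N)(Gender N)(Num PL)(Case Nom)"),
        ("häusern", "(Cat N)(Gender N)(Num PL)(Case Dat)"),
    ],
}


def generate(lemma: str) -> List[Analysis]:
    """Inflection generator: all known forms of a lemma with their features."""
    return PARADIGMS.get(lemma.lower(), [])


def analyze(form: str) -> List[Analysis]:
    """Inflection analyzer: (lemma, features) for every reading of a form."""
    wanted = form.lower()
    return [
        (lemma, features)
        for lemma, forms in PARADIGMS.items()
        for surface, features in forms
        if surface == wanted
    ]


def lemmatize(form: str) -> List[str]:
    """Lemmatizer: the analyzer's result reduced to distinct lemmas."""
    return sorted({lemma for lemma, _ in analyze(form)})


if __name__ == "__main__":
    print(lemmatize("Häusern"))
    print(analyze("Häusern"))
    print(generate("Haus"))
```

All three directions read the same table, which mirrors the idea of one reusable morphological dictionary serving several transducers.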

Page 66:

WM in ELDIT

Search (Lemmatizer)

Page 67:

Links and Additional Examples (Lemmatizer)

Page 68:

Exercises (Analyzer)

Page 69:

Conjugation tables (Generator)

Page 70:

ELDIT and TreeTagger

• ELDIT Text Corpus
• Development
• Tagging
• Manual Corrections

Page 72:

Development

• MSWord (Goethe Institut of Milan)
• HTML
• Simple XML

Page 73:

Tagging

• POS tagging (→ TreeTagger)
• XML with links
• Iterative correction by frequency of unlinked words
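
Correcting by frequency of unlinked words can be sketched as a simple count: list every word that has no dictionary link, most frequent first, so that each manual fix resolves as many occurrences as possible. The sketch assumes that a missing or empty `lexref` attribute on a `<w>` element marks an unlinked word; the sample corpus and the id `it.v.essere.1` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple

# Invented sample; real corpus markup may differ.
SAMPLE = """<text>
  <w lexref="it.v.essere.1">sono</w>
  <w lexref="">bel</w>
  <w>Bel</w>
  <w lexref="">vuol</w>
</text>"""


def unlinked_word_counts(corpus_xml: str) -> List[Tuple[str, int]]:
    """Return (word, count) pairs for unlinked <w> elements, most frequent first."""
    root = ET.fromstring(corpus_xml)
    counts = Counter(
        (w.text or "").lower()
        for w in root.iter("w")
        if not w.get("lexref")  # missing or empty attribute counts as unlinked
    )
    return counts.most_common()


if __name__ == "__main__":
    for word, count in unlinked_word_counts(SAMPLE):
        print(word, count)
```

After the most frequent cases are fixed, the count is run again; the loop stops when the remaining unlinked words are too rare to be worth a rule.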

Page 74:

Corrections

• Old German spelling rules valid until 1998

• The Italian verb “sono” (they are) was always tagged with “sonare” (=suonare, make music) instead of with “essere” (to be).

• The verb “sia” (he may be) was always recognized as a conjunction and tagged with “sia” (as well as) instead of with “essere” (to be).

• Many conjugated forms of “avere” were tagged with “riavere” (to get something back) instead of with “avere” (to have).

• Many conjugated forms of “andare” were tagged with “riandare” (to go back) instead of with “andare”.

• Abbreviated forms of Italian words (such as “bel”, “vuol”, “pur”, “fin”) were tagged as nouns and with the original form as lemma.

• Some Italian words which exist both as nouns and as past participles (such as the word “successo” (the success, it happened)) were tagged with the wrong word class.
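
Some of these systematic errors lend themselves to rule-based post-correction of the tagger output. The following is a hedged sketch, not the procedure used for ELDIT: the tables and the function `correct_lemma` are hypothetical, and the heuristic for "riavere" and "riandare" (only correct when the surface form does not itself start with "ri") is an assumption added here so that genuine forms of those verbs are left alone. Ambiguous cases such as "sia" or "successo" need context and are not handled.

```python
from typing import Dict, Tuple

# (surface form, wrong lemma) -> corrected lemma, from the cases listed above.
FORM_FIXES: Dict[Tuple[str, str], str] = {
    ("sono", "sonare"): "essere",
}

# Wrong lemma -> corrected lemma, applied only when the surface form does not
# start with "ri" (heuristic assumption, see lead-in).
PREFIX_FIXES: Dict[str, str] = {
    "riavere": "avere",
    "riandare": "andare",
}


def correct_lemma(form: str, lemma: str) -> str:
    """Return the corrected lemma for one tagged token."""
    form_lower = form.lower()
    if (form_lower, lemma) in FORM_FIXES:
        return FORM_FIXES[(form_lower, lemma)]
    if lemma in PREFIX_FIXES and not form_lower.startswith("ri"):
        return PREFIX_FIXES[lemma]
    return lemma


if __name__ == "__main__":
    for form, lemma in [("sono", "sonare"), ("ho", "riavere"), ("riavuto", "riavere")]:
        print(form, lemma, "->", correct_lemma(form, lemma))
```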

Page 75:

Literature

• http://www.eurac.edu/about/collaborators/JKnapp/index.htm
  → Publications (some linguistic ones, too)
  → PhD theses (Andrea Abel, Uni Innsbruck; Judith Knapp, Uni Hannover)

Page 76:

Conclusion

• syntagmatic, paradigmatic, pragmatic, polysemy, homography, homonymy, holonymy, hyponymy, hyperonymy, semiotic, ludativ, …
• Goal based scenarios, blended learning …
• TEI, CES, NLP, Lemmatizing, POS-Tagging …
• File server, web server, data model, HTTP request, client, protocol, port, …
• Mathematical notation (+∞, −∞, ∫, √, ∂u, ∆v)