Page 1

Lexicons … and Complex Expressions: towards Multilingual Linking

Nicoletta Calzolari

Copenhagen, October 2001

Page 2: What is SIMPLE?

• A common rich model, representation language, and methodology of building the lexicon
• Common Template Types, with default obligatory info (type-defining), and indication of optional info
• First time: on a large scale, for so many languages
• Lexical meaning represented in terms of integrated combinations of different sorts of information (semantic type, argument structure, relations, features, etc.)
• Ontology-based information comes together with predicative representation and syntactic linking
• A shared set of SemUs (from EWN) (about 700) of the 12 lexicons cross-lingually related

A set of 12 harmonised computational lexicons for HLT applications, geared for multilingual links

Page 3: PAROLE/SIMPLE Architecture + CLIPS Italian National Project

[Diagram. Box labels: MuS, SynU, SemU, Sem Info, Lexical Rel, Sem. Rel, Sem. Feat, TEMPLATE. Size labels: 55,000 lemmas; 55,000 SemU; 60,000 lemmas.]

Page 4: Semantic information in SIMPLE

Word senses encoded as Semantic Units (SemUs), containing the following info:

• Semantic type *
• Domain *
• Lexicographic gloss *
• Extended Qualia structure
• Reg. Polysemy altern.
• Event type
• Derivation relations
• Synonymy
• Collocations
• Argument structure for predicative SemUs *
• Selection restrictions on the arguments *
• Link of the arguments to the syntactic subcategorization frames (represented in the PAROLE lexicons) *

Page 5: Semantic Multidimensionality and NLP

NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning, represented in SIMPLE with the Extended Qualia Relations:

• Is_a_part_of: la pagina del libro (the page of the book)
• Member_of: il difensore della Juventus (Juventus fullback)
• Telic: il suonatore di liuto (the lute player)
• Made_of: il tavolo di legno (the wooden table)
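The four examples above can be read as typed links between two words. The sketch below is only an illustration of that reading: the `QualiaLink` record and the `relation_between` helper are hypothetical names, not part of any SIMPLE data format.

```python
from collections import namedtuple

# Hypothetical record for illustration: (relation, source word, target word).
QualiaLink = namedtuple("QualiaLink", ["relation", "source", "target"])

# One link per example on the slide.
LINKS = [
    QualiaLink("Is_a_part_of", "pagina", "libro"),
    QualiaLink("Member_of", "difensore", "Juventus"),
    QualiaLink("Telic", "suonatore", "liuto"),
    QualiaLink("Made_of", "tavolo", "legno"),
]

def relation_between(source, target, links=LINKS):
    """Return the qualia relation linking source to target, or None."""
    for link in links:
        if link.source == source and link.target == target:
            return link.relation
    return None

print(relation_between("tavolo", "legno"))
```

An NP-recognition or WSD component could use such a lookup to label the relation inside "N di N" phrases, provided the lexicon encodes the link.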

Page 6: Overall Organization

[Diagram. Labels: Type Ontology (150 types); Template; Instantiation; Italian lexicon, Catalan lexicon, Danish lexicon, Greek lexicon, ...; SemU; Pred. Layer (predicate, arguments, selection restrictions); Qualia; Derivation; Polysemy; Event Type.]

Page 7

Each type is associated with a Template consisting of a cluster of information (relations, features, argument structure, event type, etc.) that defines the type.

The information characterizing a Semantic Unit includes:

a. The type-defining information (associated with the template the SemU instantiates)

b. Additional information (other relations or features, selectional restrictions, terminology, cross-part-of-speech relations, polysemy, etc.)

The Core Ontology represents a first level of organization of the semantic type system.
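The split between type-defining and additional information can be sketched as a template with obligatory and optional slots. This is a minimal illustration, not the SIMPLE implementation: the class names and the choice of which slots are obligatory for [Perception] are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    semantic_type: str
    obligatory: frozenset               # type-defining slots
    optional: frozenset = frozenset()   # additional, non-defining slots

@dataclass
class SemU:
    identifier: str
    template: Template
    info: dict = field(default_factory=dict)

    def missing_type_defining(self):
        """Obligatory (type-defining) slots this SemU has not filled yet."""
        return sorted(self.template.obligatory - set(self.info))

# Assumed slot choice, for illustration only.
perception = Template(
    semantic_type="Perception",
    obligatory=frozenset({"event_type", "formal"}),
    optional=frozenset({"intentionality"}),
)

guardare = SemU("guardare_2", perception, {"event_type": "process"})
print(guardare.missing_type_defining())
```

A lexicographer's tool could run such a check to flag entries that instantiate a template without supplying everything the type requires.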

Page 8: Template

SemU: Identifier of a SemU
SynU: Identifier of the SynU to which the SemU is linked
BC Number: Number of the corresponding Base Concept in EuroWordNet
Template_Type: Semantic type of the SemU
Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy
Unification_path: Unification history of a template (only for unified top-types)
Domain: Domain information from ERLI's domain list
Semantic Class: One of WordNet Classes used by ERLI
Gloss: Lexicographic definition
Event Type: Event sort
Predicative Representation: Predicate associated with the SemU, and its argument structure
Selectional Restr.: Selectional restrictions on the arguments
Derivation: Derivational relations between SemUs
Formal: Formal relation between SemUs
Agentive: Agentive relations between SemUs
Constitutive: Constitutive relations between SemUs; constitutive semantic features
Telic: Telic relations between SemUs
Synonymy: Synonyms of the SemU
Collocates: Collocate information
Complex: Polysemous class of the SemU

[Side labels grouping the fields: Type System Coordinates; Predicative Layer; Qualia Structure; Contextual/Polysemy Information. Callout: "redundancy"]

Page 9: Template for "Perception"

Verb Examples: hear, smell, etc.
Noun Examples: sight, look, etc.
Linguistic Tests: ….
Levin Class: 30.1 (See verb, e.g. detect, see, notice), 30.4 (Stimulus subject, e.g. look, smell)
Comments: Processes involving an experiencing relation, ….

SemU: 1 <guardare_2> (look)
Usyn:
BC Number: 105
Template_Type: [Perception]
Template_Supertype: [Psychological_event]
Domain: General
Semantic Class: Perception
Gloss: //free// osservare con attenzione (to observe attentively)
Event type: process
Pred_Rep.: Lex_Pred (<arg0>,<arg1>)
Derivation: <Derivational relation>
Selectional Restr.: arg0 = Animate //concept//; arg1:default = [Entity]
Formal: isa (1, <SemU>:[Perception]) <percepire>:[Psych_ev] (percepire = to perceive)
Agentive: <Nil>
Constitutive: instrument (1, <SemU>:[Body_part]) <occhio> (occhio = eye); intentionality ={yes,no} //optional// ={yes}
Telic: <Nil>
Collocates: Collocates (<SemU1>,...<SemUn>)
Complex: <Nil>

Page 10: Modular Representation of a SemU

[Diagram, rendered as a list]

• Pred. Layer: predicate, arguments, selection restrictions, ..
• Rel. Layer (Semantic Relations): relations betw. SemUs
  – Qualia: multiple meaning dimensions in a sense
  – Derivation: cross-PoS relations
  – Polysemy: regular polysemous classes
  – Collocation: collocational information
• Features

Flexibility: an extendable framework to allow coherent future extensions & tuning for specific applications/text types

Page 11

[Diagram: hierarchy of 100 Rels. Labels by level: Top; Formal, Constitutive, Agentive, Telic; Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Purpose, Instrumental, Activity, ...; Is_the_habit_of, Used_for, Used_as, ...]

The targets of relations identify:

• prototypical semantic information associated with a SemU
• elements of dictionary definitions of SemUs
• typical corpus collocates of the SemU

Page 12: Ala (wing)

SemU: 3232, Type: [Part], Part of an airplane
SemU: 3268, Type: [Part], Part of a building
SemU: D358, Type: [Body_part], Organ of birds for flying
SemU: 3467, Type: [Role], Role in football

[Diagram linking the four SemUs to other SemUs. Relation labels: Isa, Is_a_part_of, Used_for, Agentive. Target SemUs: <uccello> bird, <parte> part, <volare> fly, <fabbricare> make, <edificio> building, <aeroplano> airplane, <giocatore> player.]

Page 13: Relations and Predicates

[Diagram. Three SemUs (Sell V, Sale N, Seller N) linked to one predicate, Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>. Link labels: Event_noun, Is_the_agent_of.]

Page 14

Comprendere V
  SemU: 61725, Type: [Cognitive_event], To understand
  SemU: 6962, Type: [Constitutive_state], To include

Comprensione N
  SemU: 61726, Type: [Cognitive_event], Understanding

Predicates:
  Comprendere#1 <Arg1 [Human]>, <Arg2 [Semiotic]>
  Comprendere#2 <Arg1 [Group]>, <Arg2>

[Diagram link labels: master, master, verb_nominalization. Callout: problems with selection restrictions !!!]
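The predicate sharing on this slide can be sketched as links from SemUs to predicates, each link carrying a type. The attachment of each SemU to a predicate below is a reading of the diagram (understand and understanding share Comprendere#1, include uses Comprendere#2), and all class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    name: str
    restrictions: tuple   # one semantic type (or None = unrestricted) per argument

@dataclass(frozen=True)
class PredLink:
    semu: str
    predicate: Predicate
    link_type: str

COMPRENDERE_1 = Predicate("Comprendere#1", ("Human", "Semiotic"))
COMPRENDERE_2 = Predicate("Comprendere#2", ("Group", None))

# Assumed reading of the diagram, for illustration only.
LINKS = [
    PredLink("61725", COMPRENDERE_1, "master"),
    PredLink("61726", COMPRENDERE_1, "verb_nominalization"),
    PredLink("6962", COMPRENDERE_2, "master"),
]

def semus_sharing(predicate):
    """SemUs that point to the same predicate (verb and its nominalization)."""
    return [link.semu for link in LINKS if link.predicate == predicate]

def satisfies(predicate, arg_types):
    """Exact-match check of argument types against selection restrictions."""
    return len(arg_types) == len(predicate.restrictions) and all(
        wanted is None or wanted == got
        for wanted, got in zip(predicate.restrictions, arg_types)
    )

print(semus_sharing(COMPRENDERE_1))
print(satisfies(COMPRENDERE_1, ("Group", "Semiotic")))
```

The exact-match test is a deliberate simplification. A real check would have to walk the type hierarchy, which is one place where the "problems with selection restrictions" flagged on the slide could arise.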

Page 15: SIMPLE/CLIPS figures (now)

(11,000 Lex. Units) 16,903 SemUs

Nouns:       12161
Verbs:        3476
Adjectives:   1266

Predicates: 4368

• Templates
Instrument               734
Human                    712
PsychologicalProperty    586
Profession               541
Purpose_Act              535
Part                     503
Human_Group              502
Relational_Act           521
AgentTemporaryActivity   320
Domain                   303

• Features & Relations
Agentive                1945
EventTypeProcess        1846
EventTypeTransition     1463
AgentiveCause           1175
Usedfor                 1488
Synonym                 1258
ResultingState          1197
Isapartof                909
Hasaspart                800
Istheactivityof          611
Objectoftheactivity      598
AntonymGrad              575
Createdby                525
Agentverb                454
Concerns                 421
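As a quick consistency check on the figures above, the per-category SemU counts can be summed and compared with the stated total. The snippet only uses numbers from the slide. The ratio is computed against the rounded figure of 11,000 lexical units, so it is approximate.

```python
# Counts copied from the slide.
semus_by_pos = {"Nouns": 12161, "Verbs": 3476, "Adjectives": 1266}
lexical_units = 11000  # rounded figure given on the slide

total_semus = sum(semus_by_pos.values())
semus_per_unit = total_semus / lexical_units

print(total_semus)               # 16903, matching the stated total
print(round(semus_per_unit, 2))  # 1.54 senses per lexical unit on average
```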

Page 16

PAROLE/SIMPLE/EWN start providing the common platform.

Following the subsidiarity concept, the process started at the EU level is continued at the national level:

• extended in (at least) 9 National Projects (Danish, Greek, Italian, Portuguese, Swedish, ...)
• (to be) used in applications

True Infrastructure of harmonised LRs in EU. Basis for Multilingual LR.

ENABLER (coord. A. Zampolli)

Core Lexicons enlarged in National Projects

Page 17: Harmonisation: Need for a Global View

• Interaction/sharing of data & software/tools
• Need of compatibility among various components
• An "exemplary cycle":

[Diagram labels: Formalisms; Grammars; Software: Taggers, Chunkers, Parsers; Representation; Annotation; Lexicon; Corpora; Software: Acquisition Systems; I/O Interfaces; Languages.]

Page 18: SIMPLE wrt EAGLES/ISLE Standards for Multilingual Lexical resources

[Diagram labels: EAGLES guidelines for syntactic and semantic lexicons; PAROLE/SIMPLE Lexicons; MT systems; Multilingual Lexicons; ISLE recommendations for multilingual lexicons.]

Page 19: Mission
(http://lingue.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm)

• MT and multilingual HLT need to enhance production, maintenance & extension of computational lexical resources

• ISLE goals
  – provide a common environment for the development, integration, interchange & sharing of lexical resources with various types of linguistic information
  – establish a virtuous circle betw. research, applications, & standardization process: lay down a bridge betw. the worlds of research and application
  – mark the boundary between well-consolidated practice and theoretical achievements in multilingual HLT, and areas still open to research but critical for future technological improvements

• Crucial role of intercontinental cooperation for preparing ISLE recommendations and for their validation

Page 20: ISLE and MT

• Academic and industrial members of the MT community actively involved in the ISLE group
  – Microsoft, NMSU, Sail Labs, Systran, UMIACS, UPenn, ISI, etc.

• Survey phase:
  – a number of lexical resources for MT systems surveyed by ISLE

• MT systems requirements provide the main reference points for ISLE work, to determine:
  – types of lexical information critical to SL-TL mapping
  – criteria to create bilingual resources from existing monolingual ones
  – common data structures to develop reusable multilingual resources
  – critical areas of the lexicon: MWEs, complex transfer cases, collocational/example-based information, etc.

[Callout: MWE parenthesis]

Page 21: MWE in ISLE & XMELLT - 2 types of MWE

1st: (Deverbal) nominalisations + support (light) verbs

  make an acquisition1 (noun.act; verb.possession)
  complete an acquisition1
  undertake an acquisition1

  make an application1 (noun/verb.communication)
  have an application1 in
  decide on an application1 (consider, hear)
  get an application1 (receive, take)
  submit an application1 (file)

2nd: Noun(/Adj/Poss)+Noun MW (Ital.: N+PP/N+Adj/N+Vinf/...)

  air pollution
  job application
  murder suspect
  police action; police scandal

  • coltello da macellaio = butcher's knife
  • carta di credito = credit card
  • carta telefonica (adj) = phone card
  • agenzia di viaggi = travel agency
  • film per adulti = adult movie (adj)
  • macchina da scrivere = typewriter (comp.)

[Callout: No equivalent structures]

Page 22: The Boundaries [1st MWE type]

· Support Verbs: more than Light Verbs?
· Nominalisations: …. to a broader set

Both verbs, combined with an event noun, whose subjects are:
  • participants in the event identified by the noun
  • related to some scenario associated with the event

  Type 1: take an exam, give an exam
  Type 2: pass an exam, fail an exam, grade (evaluate) an exam

  Type 1: perform an operation, undergo an operation
  Type 2: survive an operation

But also … enlarge the concept of nominalisation to event/result/abstract nouns not morphologically derived:

  dare un ceffone (to slap)
  provare rancore (to bear sb. a grudge)
  fare una festa (to have a party)
  fare festa (to have a holiday)
  fare festa a qno (to give sb. a warm welcome)
  prestare attenzione (to pay attention)
  fare la guerra (to wage war)

  fare una cessione (cedere) vs. make? a cession (…)
  avere una cessazione (cessare) delle ostilità vs. have? a cessation of hostilities (…)

[Callout: No verb (for diachronic reason)]

Page 23: Hypothesis for encoding: "Mel'čuk type" Lexical Functions (LF) [1st MWE type]

• to record semantic contribution and/or aspectual properties conveyed by the V
• to express argument-sharing betw. 2 arg structures

Oper1: perform an operation; made an apology
Oper2: undergo an operation; merits discussion; had a visit
Func0: silence reigns
Laborij: take into consideration
Incep: start the attack
Cont: maintain influence
Fin: complete the acquisition
Liqu: eradicate the disease
Real: keep the promise, approve the application
AntiReal: turn down, withdraw the application
……
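One way to picture the encoding hypothesis above is a table keyed by lexical function and noun, holding the support verbs that realise it. The sketch below uses a subset of the slide's examples (verbs given in lemma form); the dictionary layout and helper names are hypothetical, not an ISLE or SIMPLE format.

```python
# (lexical function, noun) -> support verbs, taken from the slide's examples.
LF_EXAMPLES = {
    ("Oper1", "operation"): ["perform"],
    ("Oper2", "operation"): ["undergo"],
    ("Oper1", "apology"): ["make"],
    ("Incep", "attack"): ["start"],
    ("Fin", "acquisition"): ["complete"],
    ("Liqu", "disease"): ["eradicate"],
    ("Real", "promise"): ["keep"],
    ("Real", "application"): ["approve"],
    ("AntiReal", "application"): ["turn down", "withdraw"],
}

def verbs_for(lf, noun):
    """Support verbs recorded for this lexical function and noun."""
    return LF_EXAMPLES.get((lf, noun), [])

def functions_of(noun):
    """Lexical functions for which the noun has at least one recorded verb."""
    return sorted({lf for (lf, n) in LF_EXAMPLES if n == noun})

print(verbs_for("AntiReal", "application"))
print(functions_of("operation"))
```

Keying on the function rather than on the verb makes the entry language-neutral on one side, which is what a multilingual link would need: two languages can be aligned on `("Oper1", noun)` even when the verbs differ.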

Page 24: Nominalisations: examples from Corpus [1st MWE type]

accusa (accusation)
  supp-v: formulare, lanciare, muovere, rivolgere,... (Oper1)
          subire [default], beccarsi, attirarsi, rischiare,... (Oper2)
          mettere, porre,... sotto a. (Laborij)
          rintuzzare, rigettare, smontare, … (Liqu)
  Problematic?: ritorcere, rovesciare… (...)
                sostenere,… (...)
                ripetere, … (...)
                ……

acquisizione (acquisition)
  supp-v: (fare) [default], condurre, curare, effettuare,... (Oper1)
          varare,... (Incep)
          perfezionare, completare, concludere, … (Fin)
          evitare, compromettere, … (Liqu)
          sfumare, … (LiquFunc0)
  Problematic?: annunciare, dichiarare, … (say)
                decidere, proporre, promuovere, stimolare, … (...)
                consentire, permettere, proporre, garantire, … (...)
                ……

[Callout: Automatic acquisition]

Page 25: Support Verbs: what to list for multilingual lexicons? [1st MWE type]

Decide whether to include/list, for a noun:

• all the verbs usable for a Melcukian LF

  INCEP: cominciare [default] vs. varare, intraprendere, …
  INCEP: begin [default] vs. open (an investigation), …
  OPER1: say a prayer (not make, like with other speech act nouns)
  OPER1: pay attention

• only those lexically dedicated to that noun (needed for generation), not the general ones available by default for a LF

  begin an exam/operation or finish an exam/operation

Similar words preferentially select different verbs to express similar meanings (same lexical functions): lexical preference
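The second option above (list only lexically dedicated verbs and let defaults cover the rest) can be sketched as a two-step lookup for generation. The table contents come from the slide's English examples; the function name and the idea of storing one default verb per function are assumptions made for this illustration.

```python
# General verbs available by default for a lexical function (not listed per noun).
LF_DEFAULTS = {"Incep": "begin", "Fin": "finish"}

# Verbs lexically dedicated to a particular noun: these must be listed.
DEDICATED = {
    ("Oper1", "prayer"): "say",
    ("Oper1", "attention"): "pay",
    ("Incep", "investigation"): "open",
}

def support_verb(lf, noun):
    """Prefer a lexically dedicated verb, else the LF default, else None."""
    return DEDICATED.get((lf, noun)) or LF_DEFAULTS.get(lf)

print(support_verb("Oper1", "prayer"))
print(support_verb("Incep", "exam"))
```

With this layout the lexicon stays small, since "begin an exam" needs no entry, while generation still produces "say a prayer" rather than an unidiomatic default.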

Page 26: Complex nominals in a multilingual framework [2nd MWE type; Fillmore]

Different syntactic patterns in L1 & L2: N+Nh (= head noun) in English is usually Nh+PP in Italian

  tooth brush = spazzolino da denti

& the syntactic pattern is not predictable

  hair/clothes brush = spazzola per capelli/abiti
  nail brush = spazzola per le unghie

  • travel agency = agenzia di viaggi
  • real estate agency = agenzia immobiliare
  • marriage bureau = agenzia matrimoniale

A MWE in L1 corresponding to a fully compositional phrase: cucchiaino da caffè = coffee spoon???

For MT this implies some conceptual (interlingual?) representation, but the "encoding" process must find an appropriate MWE if it is called for.

Analogous to blocking/pre-emption: a regular/compositional process is not carried out (dispreferred) because the semantic space occupied by the concept associated with that formation is already claimed by some ready-made expression.

Page 27 [2nd MWE type; Fillmore]

If we look at the devices in grammar that allow new MWEs to be produced, there is a continuum:

  N+PP > collocation > multi-word > idiom

• productive mechanisms in the language, but idiosyncratic
• information at the borderline betw. grammar & lexicon

Amounts to: describe productive modification relations of N in general, and in particular those lexically selected/preferred by a N (its semantic paradigm)

• MWE are a subset of these (give good hints to discover most prominent relations??)
• look at the semantic structure of Nouns, i.e. at the variety of modifiers they can select by virtue of their meaning

Broader scope: extension to non MWE?

Page 28: Noun Compounds/Complex Nominals …are pervasive [2nd MWE type; Fillmore, Busa]

There is a motivation in most N+N constructions: the context provides it.

The FrameNet (SIMPLE) way:
• appeal to specific frame structures (qualia structures) associated with the head noun
• determine from corpus attestations which frame elements (qualia) can get instantiated as a modifier word

"container": complex nominals can specify:
  material (aluminium c., glass c., …)
  contents (food c., trash c., …)
  size (3 quart c., …)
  function (shipping c., storage c., …)
  ...

Page 29: Noun Compounds/Complex Nominals & multidimensional semantic approaches [2nd MWE type]

a. FrameNet

Container Frame: Frame Elements: Material, Contents, Size, Function
• Material: aluminum container, glass c., metal c., tin c.
• Contents: food container, beverage c., trash c., water c., milk c., fuel c.
• Size: 3 quart container
• Function: shipping container, storage c.

b. SIMPLE

Qualia Relations of "container" used in compounds:
• Constitutive: made_of [MATERIAL]
  aluminum container, glass c., metal c., tin c.
• Telic: contains [ENTITY]
  food container, beverage c., trash c., water c., milk c., fuel c.
• Constitutive: size [QUANTITY]
  3 quart container
• Telic: is_used_for [EVENT]
  shipping container, storage c.
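The parallel between the two analyses of "container" can be made explicit as a mapping from FrameNet frame elements to SIMPLE qualia relations. The pairs are exactly those on the slide; the dictionary names and the `analyse` helper are hypothetical and only illustrate the correspondence.

```python
# Frame element -> (qualia role, relation, target semantic type), as on the slide.
FRAME_ELEMENT_TO_QUALIA = {
    "Material": ("Constitutive", "made_of", "MATERIAL"),
    "Contents": ("Telic", "contains", "ENTITY"),
    "Size": ("Constitutive", "size", "QUANTITY"),
    "Function": ("Telic", "is_used_for", "EVENT"),
}

# Modifiers attested with "container" on the slide, by frame element.
MODIFIER_FRAME_ELEMENT = {
    "aluminum": "Material", "glass": "Material", "metal": "Material", "tin": "Material",
    "food": "Contents", "beverage": "Contents", "trash": "Contents",
    "water": "Contents", "milk": "Contents", "fuel": "Contents",
    "3 quart": "Size",
    "shipping": "Function", "storage": "Function",
}

def analyse(modifier):
    """Describe '<modifier> container' in both vocabularies, or None if unattested."""
    frame_element = MODIFIER_FRAME_ELEMENT.get(modifier)
    if frame_element is None:
        return None
    role, relation, target_type = FRAME_ELEMENT_TO_QUALIA[frame_element]
    return {
        "frame_element": frame_element,
        "qualia_role": role,
        "relation": relation,
        "target_type": target_type,
    }

print(analyse("glass"))
```

Unattested modifiers return `None` here. A fuller system would fall back on the semantic type of the modifier instead of a closed list.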

Page 30: Complex Nominals/Lexical Constructions in a multilingual context… [2nd MWE type]

describe vs. list?
• if a compound noun is clearly lexicalized, it's simply one of the words in L1
• but if it is an instance of some productive word-formation rule, we should describe it

both describe & list:

• list explicitly in the lexical entry what is idiomatic/idiosyncratic wrt generation
  – for lexical selection:
      mucca pazza vs. matta (mad cow)
      prestare attenzione vs. pay attention
  – structural pattern:
      travel agency = agenzia di viaggi
      marriage bureau = agenzia matrimoniale (*di matrimonio)
      real estate agency = agenzia immobiliare

• but also, an apparatus to describe how word semantics of Ns interact when they co-occur (co-selection, co-composition, ...)
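The "both describe & list" position can be sketched as an encoder in which listed expressions pre-empt a productive rule. The rule used here (English N+Nh becomes Italian Nh + "di" + N) is a deliberately naive stand-in, and the word table and function names are assumptions for the example. The listed pairs and the starred form come from the slide.

```python
# Listed, idiosyncratic equivalents (from the slide).
LISTED = {
    "travel agency": "agenzia di viaggi",
    "marriage bureau": "agenzia matrimoniale",
    "real estate agency": "agenzia immobiliare",
}

# Small word-level table, assumed for illustration.
WORDS = {
    "travel": "viaggi",
    "marriage": "matrimonio",
    "agency": "agenzia",
    "bureau": "agenzia",
}

def compose(modifier, head):
    """Naive productive rule: English N+Nh -> Italian Nh + 'di' + N."""
    return f"{WORDS[head]} di {WORDS[modifier]}"

def encode(modifier, head):
    """Listed expressions block (pre-empt) the productive rule."""
    listed = LISTED.get(f"{modifier} {head}")
    return listed if listed is not None else compose(modifier, head)

print(compose("marriage", "bureau"))  # the form starred on the slide
print(encode("marriage", "bureau"))
```

The rule alone yields "agenzia di matrimonio", which the slide marks as ill-formed, while the listed entry supplies "agenzia matrimoniale". For an unlisted but regular case such as "travel bureau", the rule still produces a usable output.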

Page 31: In a multilingual context… [2nd MWE type]

...regularities in each language, but they don't match

Both for decoding & encoding, we need both:
• a linguistic apparatus for interpretation (e.g. to go to a language where it is not a MWE: cucchiaino da caffè, for a Japanese useful to know … "used for")
• lists for idioms…, for unpredictable/idiosyncratic

Same apparatus to interpret both MWE & regular N constructions (similar power of expressiveness): general principles of semantic constitution of lex. items & their combinatorics in terms e.g. of frames/qualia/…:

basic sem. notions & a general schema to characterise the problem, e.g.
• frame (qualia) structure of the headN
• semantic Type of the modifier N
• allow the headN to impose its interpretation on the modification rel.
• ...

Page 32

Complex nominals, e.g. knife (coltello) triggers
• a “cutting frame” (FrameNet)
• specific SIMPLE dimensions of meaning

extensively evaluate whether qualia roles (already) encoded in SIMPLE correspond to what is necessary to interpret N-N modification relations

SIMPLE Extended Qualia structure for the interpretation of the semantic relation betw. Ns (internal relational structure of MWE)

• butcher’s knife (coltello da macellaio): TELIC (used_by) Y [Human], PPda
• plastic knife (coltello di plastica): CONST (made_of) X [Material], PPdi
• table knife (coltello da tavola): TELIC (used_in) Z [Location], PPda
• hunting knife (coltello da caccia): TELIC (used_in_activity) E [Activity], PPda

• piatto di legno: CONST (made_of) X [Material], PPdi
• piatto di pasta: CONST (contains) X [Food], PPdi

2nd

PP disambig.
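The knife/piatto examples can be read as a lookup: the head noun offers qualia slots, and the preposition plus the semantic type of the modifier select one of them. The sketch below is only an illustration of that reading. The names `QualiaSlot`, `HEAD_SLOTS`, `MODIFIER_TYPES` and `interpret` are hypothetical and are not part of the SIMPLE encoding; the slot values are copied from the examples on this slide.

```python
from typing import NamedTuple, Optional


class QualiaSlot(NamedTuple):
    role: str       # qualia role, e.g. TELIC or CONST
    relation: str   # semantic relation, e.g. used_by
    sem_type: str   # semantic type required of the modifier noun
    prep: str       # Italian preposition introducing the modifier


# Slots offered by each head noun (values taken from the slide examples).
HEAD_SLOTS = {
    "coltello": [
        QualiaSlot("TELIC", "used_by", "Human", "da"),
        QualiaSlot("CONST", "made_of", "Material", "di"),
        QualiaSlot("TELIC", "used_in", "Location", "da"),
        QualiaSlot("TELIC", "used_in_activity", "Activity", "da"),
    ],
    "piatto": [
        QualiaSlot("CONST", "made_of", "Material", "di"),
        QualiaSlot("CONST", "contains", "Food", "di"),
    ],
}

# Toy semantic typing of the modifier nouns (assumed mini-lexicon).
MODIFIER_TYPES = {
    "macellaio": "Human",
    "plastica": "Material",
    "tavola": "Location",
    "caccia": "Activity",
    "legno": "Material",
    "pasta": "Food",
}


def interpret(head: str, prep: str, modifier: str) -> Optional[QualiaSlot]:
    """Return the head's slot that matches the preposition and the modifier's type."""
    mod_type = MODIFIER_TYPES.get(modifier)
    for slot in HEAD_SLOTS.get(head, []):
        if slot.prep == prep and slot.sem_type == mod_type:
            return slot
    return None
```

With the same preposition "da", `interpret("coltello", "da", "tavola")` and `interpret("coltello", "da", "caccia")` select different relations, which is the PP disambiguation the slide points at.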

Page 33

In SIMPLE: possible extension

Deverbal nominalisation:

noun murder (uccisione, delitto, omicidio (different sem. pref.))
• PPdi
• PPda_parte_di, di

verb murder (uccidere)
• subj:NP
• obj:NP

PRED: MURDER (uccidere)
• ARG1: agent [Hum/Anim?]
• ARG2: patient [Hum/Anim?]
• MOD1: instr [Weapon]
• MOD2: means [Action]
• MOD3: ... [...]

• :instr: PPcon [Weapon] (knife m., con coltello)
• :means: PPper [Action] (strangulation m., per strangolamento)
• :loc: PPploc|di [Location] (Kent State murders, nel ...)
• :time: PPptime|di [Time] (1983 murders, del 1983)

As if it were a Situation

2nd
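A minimal sketch of how the modifier slots of the nominalisation could be held as data, assuming one record per modifier. `ModifierSlot`, `MURDER_MODIFIERS` and `roles_for` are hypothetical names used for illustration only; roles, semantic types, PP labels and examples are copied from the slide (capitalisation of the PP labels normalised), and the elided "nel ..." is left as it is.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModifierSlot:
    role: str       # semantic role of the modifier
    sem_type: str   # semantic type restriction on the modifier
    pp: str         # PP label of the Italian realisation
    example: str    # English / Italian example from the slide


MURDER_MODIFIERS: List[ModifierSlot] = [
    ModifierSlot("instr", "Weapon", "PPcon", "knife m. / con coltello"),
    ModifierSlot("means", "Action", "PPper", "strangulation m. / per strangolamento"),
    ModifierSlot("loc", "Location", "PPploc|di", "Kent State murders / nel ..."),
    ModifierSlot("time", "Time", "PPptime|di", "1983 murders / del 1983"),
]


def roles_for(sem_type: str) -> List[str]:
    """Roles that a modifier of the given semantic type can fill."""
    return [m.role for m in MURDER_MODIFIERS if m.sem_type == sem_type]
```

The argument slots (ARG1, ARG2) are left out on purpose: the links between them and their syntactic realisations are not recoverable from the transcript.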

Page 34

… Monolingual Linguistic Representation

Strategy:
• consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models)
• evaluate their notions wrt EAGLES recommendations for syntax and semantics
• evaluate their usefulness & adequacy for multilingual tasks
• evaluate integrability of their notions in a unitary MILE
• look for deficient areas, e.g. MWE
• ...

To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values? … different answers for diff. notions

Page 35

… the Multilingual ISLE Lexical Entry (MILE)

General methodological principles (from EAGLES):

Basic requirements for the MILE:

Discover and list the (maximal) set of basic notions needed to describe the MILE (up to which level standardisation is feasible?)

Granularity

The leading principle for the design of the MILE: the edited union of existing lexicons/models (redundancy is not a problem)

Modular and layered: various degrees of specification possible

Allow for underspecification (& hierarchical structure)

Page 36

The MILE

• Main features
– factor out primitive units of lexical information
– explicit representation of information to be targeted by multilingual NLP tools
– rely on lexical analyses with the highest degree of inter-theoretical agreement
– avoid framework-specific representational solutions
– open to different paradigms of multilinguality
– oriented to the creation of large-scale lexical databases

Page 37

Objective: definition of the MILE
• as a meta-entry to act as a common format for resource sharing and integration/architecture for lexical data encoding
• its basic notions
• general architecture
• formalized as an entity-rel. model (XML, RDF, etc.), with a tool to support it
• open to task- & system-dependent parameterisation

MILE

Page 38

Agreed Principles

• MILE builds on the monolingual entry & expands it
• MILE incorporates previous EAGLES recommendations
• MILE is the “complete” entry
• adopt as starting point the PAROLE/SIMPLE DTD, to be revised, augmented, ...

We consider 2 broad categories of applications:
• MT
• CLIR (linking module may be simpler/ontology based)

(label info types wrt application)

Page 39

Modularity in MILE

Advantages:
• Flexibility of representation
• Easy to customise and update
• Easy integration of existing resources
• High versatility towards different applications

Modularity in at least three respects:
• in the macrostructure and general architecture of the MILE
• in the microstructure of the MILE
– monolingual linguistic representation (previous EAGLES revised/updated)
– collocational/corpus-driven information (new)
– multilingual apparatus (e.g. transfer conditions and actions; interlingua) (new)
• in the specific microstructure of the MILE word-sense

Page 40

MILE

A. MILE Macrostructure

Meta-information

Architecture

B. MILE Microstructure

1. Monolingual 2. Collocational 3. Multilingual

C. Word-Sense Microstructure

1. Coarse-grained

2. Fine-grained

Modularity in MILE

Page 41

The MILE Architecture: Monolingual Lexical Description

– three independent and yet linked layers characterising the MILE in a source language
– possibly corresponds to the typology of information contained in major existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet, COMLEX, FrameNet, etc.
– simple and complex lexical unit (to account for MWEs)
– various degrees of granularity of lexical units representation

morphological layer

syntactic layer

semantic layer

correspondence conditions

Page 42

The MILE Architecture: Multilingual Layer

– acts as an (independent) interface layer between monolingual lexicons

morphological layer

syntactic layer

semantic layer

correspondence conditions

Lexicon 1

multilingual layer

Lexicon 2
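The architecture on this slide (two monolingual lexicons, each with three layers, joined by an independent multilingual layer) can be sketched as plain data classes. This is an illustration under assumptions, not the MILE specification: the class names, the field names and the feature values are hypothetical; only the layer names and the travel agency / agenzia di viaggi pair come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MonolingualEntry:
    lemma: str
    language: str
    morphological: Dict[str, str] = field(default_factory=dict)
    syntactic: Dict[str, str] = field(default_factory=dict)
    semantic: Dict[str, str] = field(default_factory=dict)


@dataclass
class Correspondence:
    source: Tuple[str, str]   # (language, lemma)
    target: Tuple[str, str]   # (language, lemma)
    layer: str                # monolingual layer the link refers to
    conditions: List[str] = field(default_factory=list)


@dataclass
class MultilingualLayer:
    """Interface layer: it stores links only and never modifies the lexicons."""
    links: List[Correspondence] = field(default_factory=list)

    def link(self, source: MonolingualEntry, target: MonolingualEntry,
             layer: str, conditions: Optional[List[str]] = None) -> None:
        self.links.append(Correspondence(
            (source.language, source.lemma),
            (target.language, target.lemma),
            layer,
            list(conditions or []),
        ))

    def targets(self, source: MonolingualEntry) -> List[str]:
        key = (source.language, source.lemma)
        return [c.target[1] for c in self.links if c.source == key]


# Usage: one complex lexical unit in each lexicon, linked at the semantic layer.
en_entry = MonolingualEntry("travel agency", "en", syntactic={"cat": "N"})
it_entry = MonolingualEntry("agenzia di viaggi", "it", syntactic={"cat": "N"})
interface = MultilingualLayer()
interface.link(en_entry, it_entry, layer="semantic")
```

Keeping the links outside the entries is what makes the layer independent: either lexicon can be replaced or extended without touching the other.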

Page 43

The MILE Multilingual Layer … (NEW)

• Correspondences can be established between different types of linguistic objects (strings, syntactic descriptions, semantic elements, predicates, etc.)

• Transfer tests and actions to target various types of lexical information in the monolingual layers
– constrain syntactic positions and their fillers
– lexicalize syntactic positions
– add positions or arguments
– add new features to define more fine-grained sense distinctions relevant at the multilingual level
– restructuring argument configurations
– collocational information

– ...
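A transfer test paired with a transfer action can be sketched as a rule with two functions: the test inspects the source-side description, the action builds the target-side one. The sketch below is an assumption about how such rules might look, not the MILE formalism. `TransferRule`, `RULES`, `transfer` and the feature names are hypothetical, and the know / conoscere / sapere pair is an illustrative example that does not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, str]


@dataclass
class TransferRule:
    source_lemma: str
    test: Callable[[Context], bool]        # transfer test on the source description
    action: Callable[[Context], Context]   # transfer action producing the target description


# Hypothetical rules: the filler of the object position constrains the target verb.
RULES: List[TransferRule] = [
    TransferRule(
        "know",
        lambda c: c.get("obj_sem_type") == "Human",
        lambda c: {"lemma": "conoscere", "obj": "NP"},
    ),
    TransferRule(
        "know",
        lambda c: c.get("obj_cat") == "clause",
        lambda c: {"lemma": "sapere", "obj": "clause"},
    ),
]


def transfer(lemma: str, context: Context) -> Optional[Context]:
    """Apply the first rule whose test succeeds; return None if no rule applies."""
    for rule in RULES:
        if rule.source_lemma == lemma and rule.test(context):
            return rule.action(context)
    return None
```

Here the test corresponds to "constrain syntactic positions and their fillers"; other actions in the list above (adding arguments, restructuring configurations) would need richer target descriptions than this flat dictionary.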

Page 44

Paths to Discover the Basic Notions of MILE

• clues in dictionaries to decide on target equivalent
• guidelines for lexicographers
• clues (to disambiguate/translate) in corpus concordances
• lexical requirements from various types of transfer conditions and actions in MT systems
• lexical requirements from interlingua-based systems
• …

a list of critical information types that will compose each module of the MILE

Page 45

Organisational Proposal: division of labour

Highlighted some hot issues & assigned tasks:
• sense indicators (EU)
• selection preferences (EU)
• lexicographic relevance (EU)
• argument structure (US)
• MWE (EU & US)
• collocations & parallel corpora (US)
• modifiers (EU)
• semantic relations (EU)
• transfer conditions (EU & US)
• collocational patterns (US)
• ontology (US)
• metaphors (EU)
• interlingua requirements (US)
• spoken lexicon (EU)
• meta-representation (US & EU)
• ...

Page 46

Organisational Proposal: The tasks will lead to:

• an in-depth analysis of each area aiming at identifying:
– the most stable solutions adopted in the community
– linguistic specifications and criteria
– possible representational solutions, their compatibility, etc.
– evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations)
– open issues and current boundaries of the state-of-the-art (which cannot be standardised yet)
– model limitations through creation of a sample dictionary
– …
• see how the various pieces fit together & can be merged in a unified proposal
• evaluate if we can combine in a “hybrid super-model” the transfer & interlingua approaches

Page 47

Information Types: examples

Transfer conditions and actions
1. Identification of categories of transfer phenomena
2. Ranking of hard cases
3. Possible parameterisation wrt language types
4. How to formalise them
5. Types of actions

Ontology
1. Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
2. Inheritance
3. Which roles for ontologies in the MILE
4. Representational issues
5. Customisation and development criteria

Selectional preferences
1. How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
2. Different status of the preferences
3. Criteria to identify them
4. Expressive limits of existing formal resources

Page 48

CLWG Ongoing Activities

… to prepare a preliminary proposal of the MILE:
• existing models for lexical representation and data interchange (Genelex, Olif, etc.) are explored
• model limitations and expressive power are tested through creation of sample entries in a few languages

• groups at work
• lexical description and information: types of relevant info
• lexicographic exploration: systematic summary & classification of types of transfer tests (also extracted from MRDs)
• multilingual correspondences
• lexical data modeling: format & representation issues
• tool development

Page 49

Representation issues

• Working with GENELEX, lexicon development work is (can be) affected by:
– impossibility (or difficulty) of defining abstract and general classes or types of objects
– lack of inheritance mechanisms
– lack of default expression and default rewriting mechanisms

Cf. Lexical templates in SIMPLE:
• not included in the GENELEX data-structure
• implemented in the editing sw. tool
• very useful to capture relevant lexical generalizations, enhance consistency in encoding, speed-up lexicographers’ work, etc.
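What inheritance and defaults buy the lexicographer can be shown with a small template hierarchy: a template inherits its parent's defaults, and an entry only states what differs. This is a sketch under assumptions and does not reproduce the SIMPLE editing tool. `Template`, `instantiate`, the template names and the feature values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Template:
    name: str
    parent: Optional["Template"] = None
    defaults: Dict[str, str] = field(default_factory=dict)

    def resolve(self) -> Dict[str, str]:
        """Merge inherited defaults with local ones; local values win."""
        inherited = self.parent.resolve() if self.parent else {}
        return {**inherited, **self.defaults}


def instantiate(template: Template, **overrides: str) -> Dict[str, str]:
    """Build an entry from a template; explicit values override the defaults."""
    return {**template.resolve(), **overrides}


# Hypothetical two-level hierarchy.
artifact = Template("Artifact", defaults={"agentive": "made_by"})
instrument = Template("Instrument", parent=artifact, defaults={"telic": "used_for"})

knife = instantiate(instrument, lemma="coltello", telic="used_for_cutting")
```

The entry for coltello states two values and receives the third (`agentive`) from the template chain, which is the kind of generalisation and consistency gain the slide attributes to lexical templates.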

Page 50

CLWG Ongoing Activity

MILE Lexical Objects Formal Specifications

Monolingual & Multilingual Lexicons

MILE Shared Lexical Objects

User Defined Lexical Objects

MILE Lexical Entry Formal Specifications

Page 51

MILE Shared Lexical Objects

MILE Repository of Shared Lexical Objects:
• Basic syntactic constructions (e.g. transitive, etc.)
• (Micro-)semantic objects (e.g. features, relations)
• (Macro-)semantic objects (e.g. lexical templates)
• Multilingual constructions (e.g. basic transfer conditions and actions)

User-Defined Lexical Objects
- New Lexical objects defined by the User according to the common MILE formal data-structure specification.
- Sub-types of the Shared MILE Objects
- Possibly enriched with metadata defining their “semantics” and “usage”

Monolingual & Multilingual Lexicons
- Lexical entries obtained by referring to various lexical objects (both Shared and User-defined)
- The MILE lexical entry model specifies how lexical objects can be combined to achieve the proper lexical representation

Simplify using MILE
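The three boxes on this slide (shared objects, user-defined sub-types, entries that refer to both) suggest a registry in which entries hold references rather than copies. The sketch below illustrates that idea only. `Repository` and its methods are hypothetical, as are the object names `transitive`, `Instrument_template` and `Cutting_instrument`.

```python
from typing import Dict, List, Optional


class Repository:
    """Registry of lexical objects: shared ones plus user-defined sub-types."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}
        self._shared: Dict[str, bool] = {}

    def add_shared(self, name: str) -> None:
        self._parent[name] = None
        self._shared[name] = True

    def add_user_defined(self, name: str, subtype_of: str) -> None:
        # A user-defined object must be a sub-type of an object already registered.
        if subtype_of not in self._parent:
            raise KeyError(f"unknown parent object: {subtype_of}")
        self._parent[name] = subtype_of
        self._shared[name] = False

    def is_subtype(self, name: str, ancestor: str) -> bool:
        current: Optional[str] = name
        while current is not None:
            if current == ancestor:
                return True
            current = self._parent.get(current)
        return False

    def build_entry(self, lemma: str, object_names: List[str]) -> Dict[str, object]:
        # Entries refer to registered objects by name; unknown names are rejected.
        missing = [n for n in object_names if n not in self._parent]
        if missing:
            raise KeyError(f"unknown lexical objects: {missing}")
        return {"lemma": lemma, "objects": list(object_names)}


repo = Repository()
repo.add_shared("transitive")
repo.add_shared("Instrument_template")
repo.add_user_defined("Cutting_instrument", subtype_of="Instrument_template")
entry = repo.build_entry("coltello", ["Cutting_instrument"])
```

Because user-defined objects must descend from a shared one, two lexicons built with different local extensions can still be compared at the level of the shared ancestors.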

Page 52

Involvement of Asian Languages

• participation in last meetings
• some input from Asia
• formal cooperation EU-ASIA: steps to put in motion

Page 53

Impact & synergies

• real impact… to be evaluated later through the use in applications
• already its being a US/EU project & the Asian interest
• synergies now, e.g.:
– PAROLE/SIMPLE (also instantiated in 9 national projects): main input
– EuroWordNet: provides input
– XMELLT (NSF): provides input
– OLIF: expects (& provides) input
– SALT: complementary
– ENABLER: validation (& expects input)
– ELSNET: validation
– SENSEVAL: validation
– NIMM WG for Metadata for CL (also with the US OLAC)
– ...

Page 54

Target: … Multilingual Content Management

the Resources viewpoint

• The relevance/impact of (good vs. less good) LRs for high-quality Cross/Multilingual systems is high, even if not easily measurable.
• Different applications, component technologies - & approaches within - need different info types (e.g. CLIR or content access systems wrt MT)
• For each, need to specify (not an easy task):
– clear lexical/linguistic/conceptual requirements
– priority info types (which, how encoded, etc.)
– the respective role of e.g. annotated corpora, mono-, bi-, multilingual lexicons (with different info types), ontologies, KBs

Page 55

Economic Feasibility: in which (Multilingual) Resources to invest?

Wrt short- vs. medium-term impact:
• Basic, general purpose bi-/multilingual lexicons, but to be tuned, adapted to different applications
• need of robust systems able to acquire/tune (multilingual) lexical/linguistic/conceptual knowledge, to accompany static basic resources
• We shouldn’t rely only on parallel corpora. More advisable to aim at reliable methods for acquisition & use of ‘comparable corpora’, accompanied by robust technologies for annotation (at different levels: morphosyntactic, syntactic/functional, semantic, …), and by a shared set of (text) annotation schemata

Page 56

Target… Multilingual Knowledge Management

Technical Feasibility:

Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level, an achievable goal (to be able to automatically establish links among different languages)?

• Yes, at the lexical level
• More complex, for corpus annotation?

EAGLES/ISLE

Page 57

Content for practical use: Gap betw. Resources and Systems?

If we had real-size lexicons with very fine-grained semantic/conceptual info, would there be systems (non ad-hoc toy systems) able to use them?

A vicious circle between
i) lack of suitable, large-size and knowledge intensive, resources (lexicons and corpora, with many different types of syntactic and semantic information encoded), and
ii) systems’ ability to use them effectively

The two targets should be pursued in parallel, should closely interact with each other, and be gradually integrated

?

