Date post: | 19-Dec-2015 |
Copenhagen, Oct. 2001
Lexicons … and Complex Expressions: towards Multilingual Linking
Nicoletta Calzolari
Copenhagen, October 2001
What is SIMPLE?
A set of 12 harmonised computational lexicons for HLT applications, geared for multilingual links
• A common rich model, representation language, and methodology of building the lexicon
• Common Template Types, with default obligatory info (type-defining), and indication of optional info
• First time: on a large scale, for so many languages
• Lexical meaning represented in terms of integrated combinations of different sorts of information (semantic type, argument structure, relations, features, etc.)
• Ontology-based information comes together with predicative representation and syntactic linking
• A shared set of SemUs (from EWN) (about 700) of the 12 lexicons cross-lingually related
PAROLE/SIMPLE Architecture + CLIPS Italian National Project
[Diagram labels: MuS, SynU, SemU; TEMPLATE; Sem Info (Lexical Rel., Sem. Rel., Sem. Feat.); 55,000 lemmas; 60,000 lemmas]
Semantic information in SIMPLE
Word senses encoded as Semantic Units (SemUs), containing the following info:
• Semantic type *
• Domain *
• Lexicographic gloss *
• Extended Qualia structure
• Reg. Polysemy altern.
• Event type
• Derivation relations
• Synonymy
• Collocations
• Argument structure for predicative SemUs *
• Selection restrictions on the arguments *
• Link of the arguments to the syntactic subcategorization frames (represented in the PAROLE lexicons) *
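As a rough illustration of the list above, a SemU can be pictured as a record with a few starred fields and several optional ones. This is only a sketch: the class name `SemU`, its field names and the `missing_obligatory` helper are hypothetical, and reading the `*` marker as "obligatory" is an assumption, since the slide does not explain it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemU:
    """Hypothetical record for one Semantic Unit (one word sense)."""
    ident: str
    # Items marked * on the slide (assumed here to be obligatory).
    semantic_type: Optional[str] = None
    domain: Optional[str] = None
    gloss: Optional[str] = None
    # Unstarred items.
    qualia: Dict[str, List[str]] = field(default_factory=dict)
    event_type: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    collocations: List[str] = field(default_factory=list)
    # Only relevant for predicative SemUs.
    predicate: Optional[str] = None
    arguments: List[str] = field(default_factory=list)

    def missing_obligatory(self) -> List[str]:
        """Return the names of starred fields that are still empty."""
        starred = ["semantic_type", "domain", "gloss"]
        return [name for name in starred if not getattr(self, name)]


look = SemU(ident="guardare_2", semantic_type="Perception",
            domain="General", gloss="osservare con attenzione")
draft = SemU(ident="draft_1", semantic_type="Part")
print(look.missing_obligatory())   # []
print(draft.missing_obligatory())  # ['domain', 'gloss']
```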
Semantic Multidimensionality and NLP
NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning, represented in SIMPLE with the Extended Qualia Relations
• Is_a_part_of: la pagina del libro (the page of the book)
• Member_of: il difensore della Juventus (Juventus fullback)
• Telic: il suonatore di liuto (the lute player)
• Made_of: il tavolo di legno (the wooden table)
Overall Organization
[Diagram labels: Type Ontology (150 types); Template; Instantiation; Italian lexicon, Catalan lexicon, Danish lexicon, Greek lexicon, ...; SemU with Pred. Layer (predicate, arguments, selection restrictions), Qualia, Derivation, Polysemy, Event Type, …]
The Core Ontology represents a first level of organization of the semantic type system
Each type is associated with a Template consisting of a cluster of information (relations, features, argument structure, event type, etc.) that defines the type
The information characterizing a Semantic Unit includes:
a. The type-defining information (associated with the template the SemU instantiates)
b. Additional information (other relations or features, selectional restrictions, terminology, cross-part of speech relations, polysemy, etc.)
Template
[Side labels: Type System Coordinates; Predicative Layer; Qualia Structure; Contextual/Polysemy Information; “redundancy”]
SemU: Identifier of a SemU
SynU: Identifier of the SynU to which the SemU is linked
BC Number: Number of the corresponding Base Concept in EuroWordNet
Template_Type: Semantic type of the SemU
Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy
Unification_path: Unification history of a template (only for unified top-types)
Domain: Domain information from ERLI's domain list
Semantic Class: One of WordNet Classes used by ERLI
Glossa: Lexicographic definition
Event Type: Event Sort
Predicative Representation: Predicate associated with the SemU, and its argument structure
Selectional Restr.: Selectional restrictions on the arguments
Derivation: Derivational relations between SemUs
Formal: Formal relation between SemUs
Agentive: Agentive relations between SemUs
Constitutive: Constitutive relations between SemUs; Constitutive semantic features
Telic: Telic relations between SemUs
Synonymy: Synonyms of the SemU
Collocates: Collocate information
Complex: Polysemous class of the SemU
Template for “Perception”
Verb Examples: hear, smell, etc.
Noun Examples: sight, look, etc.
Linguistic Tests: ….
Levin Class: 30.1 (See verb, e.g. detect, see, notice), 30.4 (Stimulus subject, e.g. look, smell)
Comments: Processes involving an experiencing relation, ….
SemU: 1 <guardare_2> (look)
Usyn:
BC Number: 105
Template_Type: [Perception]
Template_Supertype: [Psychological_event]
Domain: General
Semantic Class: Perception
Gloss: //free// osservare con attenzione
Event type: process
Pred_Rep.: Lex_Pred (<arg0>,<arg1>)
Derivation: <Derivational relation>
Selectional Restr.: arg0 = Animate //concept//; arg1:default = [Entity]
Formal: isa (1, <SemU>:[Perception]) <percepire>:[Psych_ev]
Agentive: <Nil>
Constitutive: instrument (1, <SemU>:[Body_part]) <occhio>; intentionality ={yes,no} //optional// ={yes}
Telic: <Nil>
Collocates: Collocates (<SemU1>,...<SemUn>)
Complex: <Nil>
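One way to picture how a SemU instantiates a template is to copy the type-defining values and overlay the entry-specific ones. The sketch below assumes a plain dictionary encoding; `PERCEPTION_TEMPLATE` and `instantiate` are hypothetical names, and only a few of the fields from the "Perception" template are included.

```python
# Hypothetical encoding of part of the "Perception" template.
PERCEPTION_TEMPLATE = {
    "Template_Type": "Perception",
    "Template_Supertype": "Psychological_event",
    "Event type": "process",
    # arg1 = Entity is the default restriction given by the template.
    "Selectional Restr.": {"arg0": "Animate", "arg1": "Entity"},
}


def instantiate(template, **overrides):
    """Copy the template values and overlay entry-specific information."""
    entry = {key: (dict(val) if isinstance(val, dict) else val)
             for key, val in template.items()}
    for key, val in overrides.items():
        if isinstance(val, dict) and isinstance(entry.get(key), dict):
            entry[key].update(val)   # refine a nested default
        else:
            entry[key] = val         # add or replace a plain value
    return entry


guardare_2 = instantiate(
    PERCEPTION_TEMPLATE,
    Gloss="osservare con attenzione",
    Domain="General",
)
print(guardare_2["Template_Type"])
print(guardare_2["Selectional Restr."]["arg1"])
```

Nested dictionaries are copied, so refining a default restriction in one entry leaves the template itself unchanged.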
Modular Representation of a SemU
Pred. Layer: predicate, arguments, selection restrictions, ..
Rel. Layer (Semantic Relations): relations betw. SemUs
• Qualia: multiple meaning dimensions in a sense
• Derivation: cross-PoS relations
• Polysemy: regular polysemous classes
• Collocation: collocational information
Features
Flexibility: an extendable framework to allow coherent future extensions & tuning for specific applications/text types
[Diagram: hierarchy of relations, 100 Rels.]
Top: Formal, Constitutive, Agentive, Telic
Next levels: Is_a; Is_a_part_of, Property, Contains; Created_by, Agentive_cause; Indirect_telic, Purpose, Instrumental, Activity; Is_the_habit_of, Used_for, Used_as; ...
The targets of relations identify:
• prototypical semantic information associated with a SemU
• elements of dictionary definitions of SemUs
• typical corpus collocates of the SemU
Ala (wing)
SemU: 3232, Type: [Part], Part of an airplane
SemU: 3268, Type: [Part], Part of a building
SemU: D358, Type: [Body_part], Organ of birds for flying
SemU: 3467, Type: [Role], Role in football
[Diagram: the four SemUs are linked through Isa, Is_a_part_of, Used_for and Agentive relations to <parte> part, <aeroplano> airplane, <edificio> building, <uccello> bird, <volare> fly, <fabbricare> make, <giocatore> player]
Relations and Predicates
[Diagram labels: SemU Sell V; SemU Sale N; SemU Seller N; Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>; Event_noun; Is_the_agent_of]
Comprendere V
SemU: 61725, Type: [Cognitive_event], To understand
SemU: 6962, Type: [Constitutive_state], To include
Comprensione N
SemU: 61726, Type: [Cognitive_event], Understanding
Comprendere#1 <Arg1 [Human]>, <Arg2 [Semiotic]>
Comprendere#2 <Arg1 [Group]>, <Arg2>
[Link labels: master, master, verb_nominalization]
problems with selection restrictions !!!
SIMPLE/CLIPS figures (now)
(11,000 Lex. Units) 16,903 SemUs
• Nouns: 12161
• Verbs: 3476
• Adjectives: 1266
Predicates: 4368
• Templates
– Instrument 734
– Human 712
– PsychologicalProperty 586
– Profession 541
– Purpose_Act 535
– Part 503
– Human_Group 502
– Relational_Act 521
– AgentTemporaryActivity 320
– Domain 303
• Features & Relations
– Agentive 1945
– EventTypeProcess 1846
– EventTypeTransition 1463
– AgentiveCause 1175
– Usedfor 1488
– Synonym 1258
– ResultingState 1197
– Isapartof 909
– Hasaspart 800
– Istheactivityof 611
– Objectoftheactivity 598
– AntonymGrad 575
– Createdby 525
– Agentverb 454
– Concerns 421
PAROLE/SIMPLE/EWN start providing the common platform
For the subsidiarity concept, the process started at the EU level is continued at the national level:
• Core Lexicons enlarged in National Projects: extended in (at least) 9 National Projects (Danish, Greek, Italian, Portuguese, Swedish, ...)
• (to be) used in applications
True Infrastructure of harmonised LRs in EU
Basis for Multilingual LR
ENABLER (coord. A. Zampolli)
Harmonisation: Need for a Global View
• Interaction/sharing of data & software/tools
• Need of compatibility among various components
• An “exemplary cycle”:
[Diagram labels: Formalisms; Grammars; Software: Taggers, Chunkers, Parsers; Representation; Annotation; Lexicon; Corpora; Software: Acquisition Systems; I/O Interfaces; Languages]
SIMPLE wrt EAGLES/ISLE: Standards for Multilingual Lexical resources
[Diagram labels: EAGLES guidelines for syntactic and semantic lexicons; PAROLE/SIMPLE Lexicons; MT systems; Multilingual Lexicons; ISLE recommendations for multilingual lexicons]
Mission (http://lingue.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm)
• MT and multilingual HLT need to enhance production, maintenance & extension of computational lexical resources
• ISLE goals
– provide a common environment for the development, integration, interchange & sharing of lexical resources with various types of linguistic information
– establish a virtuous circle betw. research, applications, & standardization process: lay down a bridge betw. the worlds of research and application
– mark the boundary between well-consolidated practice and theoretical achievements in multilingual HLT, and areas still open to research but critical for future technological improvements
• Crucial role of intercontinental cooperation for preparing ISLE recommendations and for their validation
ISLE and MT
• Academic and industrial members of the MT community actively involved in the ISLE group
– Microsoft, NMSU, Sail Labs, Systran, UMIACS, UPenn, ISI, etc.
• Survey phase:
– a number of lexical resources for MT systems surveyed by ISLE
• MT systems requirements provide the main reference points for ISLE work, to determine:
– types of lexical information critical to SL-TL mapping
– criteria to create bilingual resources from existing monolingual ones
– common data structures to develop reusable multilingual resources
– critical areas of the lexicon: MWEs, complex transfer cases, collocational/example-based information, etc.
[Callout: MWE parenthesis]
MWE in ISLE & XMELLT - 2 types of MWE:
1st: (Deverbal) nominalisations + support (light) verbs
• make an acquisition1 (noun.act; verb.possession); complete an acquisition1; undertake an acquisition1
• make an application1 (noun/verb.communication); have an application1 in; decide on an application1 (consider, hear); get an application1 (receive, take); submit an application1 (file)
2nd: Noun(/Adj/Poss)+Noun MW (Ital.: N+PP/N+Adj/N+Vinf/...)
air pollution; job application; murder suspect; police action; police scandal
• coltello da macellaio: butcher's knife
• carta di credito: credit card
• carta telefonica (adj): phone card
• agenzia di viaggi: travel agency
• film per adulti: adult movie (adj)
• macchina da scrivere: typewriter (comp.)
[Callout: No equivalent structures]
The Boundaries:
· Support Verbs: more than Light Verbs?
· Nominalisations: …. to a broader set
Both verbs, combined with an event noun, whose subjects are:
• participants in the event identified by the noun
• related to some scenario associated with the event
Type 1: take an exam, give an exam
Type 2: pass an exam, fail an exam, grade (evaluate) an exam
Type 1: perform an operation, undergo an operation
Type 2: survive an operation
But also … enlarge the concept of nominalisation to event/result/abstract nouns not morphologically derived
• dare un ceffone (to slap)
• provare rancore (to bear sb. a grudge)
• fare una festa (to have a party)
• fare festa (to have a holiday)
• fare festa a qno (to give sb. a warm welcome)
• prestare attenzione (to pay attention)
• fare la guerra (to wage war)
[Callout: No verb (for diachronic reason)]
• fare una cessione (cedere) vs. make? a cession (…)
• avere una cessazione (cessare) delle ostilità vs. have? a cessation of hostilities (…)
Hypothesis for encoding: “Mel’čuk type” Lexical Functions (LF)
• to record semantic contribution and/or aspectual properties conveyed by the V
• to express argument-sharing betw. 2 arg structures
Oper1: perform an operation; made an apology
Oper2: undergo an operation; merits discussion; had a visit
Func0: silence reigns
Laborij: take into consideration
Incep: start the attack
Cont: maintain influence
Fin: complete the acquisition
Liqu: eradicate the disease
Real: keep the promise, approve the application
AntiReal: turn down, withdraw the application
…
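The lexical-function examples above could be stored as a small lookup table from function names to verb-noun pairs. This is a minimal sketch, not an ISLE format: the table name and the two helper functions are hypothetical, verbs are reduced to their base forms (e.g. "made" to "make"), and only some of the slide's examples are included.

```python
# Support-verb + noun pairs from the examples, keyed by lexical function.
LEXICAL_FUNCTIONS = {
    "Oper1": [("perform", "operation"), ("make", "apology")],
    "Oper2": [("undergo", "operation"), ("merit", "discussion"),
              ("have", "visit")],
    "Incep": [("start", "attack")],
    "Cont": [("maintain", "influence")],
    "Fin": [("complete", "acquisition")],
    "Liqu": [("eradicate", "disease")],
    "Real": [("keep", "promise"), ("approve", "application")],
    "AntiReal": [("turn down", "application"), ("withdraw", "application")],
}


def functions_for(verb, noun):
    """Return the lexical functions under which (verb, noun) is recorded."""
    return sorted(lf for lf, pairs in LEXICAL_FUNCTIONS.items()
                  if (verb, noun) in pairs)


def verbs_for(lf, noun):
    """Return the verbs recorded for a noun under one lexical function."""
    return [v for v, n in LEXICAL_FUNCTIONS.get(lf, []) if n == noun]


print(functions_for("undergo", "operation"))  # ['Oper2']
print(verbs_for("AntiReal", "application"))   # ['turn down', 'withdraw']
```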
Nominalisations: examples from Corpus
accusa
• supp-v: formulare, lanciare, muovere, rivolgere, ... (Oper1)
• subire [default], beccarsi, attirarsi, rischiare, ... (Oper2)
• mettere, porre, ... sotto a. (Laborij)
• rintuzzare, rigettare, smontare, … (Liqu)
• Problematic?: ritorcere, rovesciare … (...); sostenere, … (...); ripetere, … (...)
• …
acquisizione
• supp-v: (fare) [default], condurre, curare, effettuare, ... (Oper1)
• varare, ... (Incep)
• perfezionare, completare, concludere, … (Fin)
• evitare, compromettere, … (Liqu)
• sfumare, … (LiquFunc0)
• Problematic?: annunciare, dichiarare, … (say); decidere, proporre, promuovere, stimolare, … (...); consentire, permettere, proporre, garantire, … (...)
• …
[Callout: Automatic acquisition]
Support Verbs: what to list for multilingual lexicons?
Decide whether to include/list, for a noun:
• all the verbs usable for a Mel’čukian LF
– INCEP: cominciare [default] vs. varare, intraprendere, …
– INCEP: begin [default] vs. open (an investigation), …
– OPER1: say a prayer (not make, like with other speech act nouns)
– OPER1: pay attention
• only those lexically dedicated to that noun (needed for generation) (not the general ones available by default for a LF)
– begin an exam/operation or finish an exam/operation
Similar words preferentially select different verbs to express similar meanings (same lexical functions): lexical preference
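The "list only what is lexically dedicated" option can be sketched as a lookup with a default fallback: a noun's entry stores a verb only when the general verb for that lexical function would be wrong or dispreferred. The names below are hypothetical, and taking "make" as the general OPER1 verb for English is an assumption read off the remark about speech act nouns.

```python
# General verbs available by default for a lexical function.
# "begin" is marked [default] in the examples; "make" is an assumption.
DEFAULT_VERB = {"INCEP": "begin", "OPER1": "make"}

# Verbs lexically dedicated to a noun, listed in that noun's entry.
DEDICATED = {
    ("INCEP", "investigation"): ["open"],
    ("OPER1", "prayer"): ["say"],
    ("OPER1", "attention"): ["pay"],
}


def support_verb(lf, noun):
    """Prefer a verb dedicated to the noun, else fall back to the default."""
    dedicated = DEDICATED.get((lf, noun))
    if dedicated:
        return dedicated[0]
    return DEFAULT_VERB[lf]


print(support_verb("OPER1", "prayer"))         # say
print(support_verb("INCEP", "exam"))           # begin
print(support_verb("INCEP", "investigation"))  # open
```

With this layout, nouns such as "exam" or "operation" need no listed verb for INCEP, which is the saving the second option aims at.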
Complex nominals in a multilingual framework
Different syntactic patterns in L1 & L2: N+Nh (= head noun) in English is usually Nh+PP in Italian
• tooth brush: spazzolino da denti
& the syntactic pattern is not predictable
• hair/clothes brush: spazzola per capelli/abiti
• nail brush: spazzola per le unghie
• travel agency: agenzia di viaggi
• real estate agency: agenzia immobiliare
• marriage bureau: agenzia matrimoniale
A MWE in L1 corresponding to a fully compositional phrase: cucchiaino da caffè, coffee spoon???
For MT this implies some conceptual (interlingual?) representation, but the “encoding” process must find an appropriate MWE if it is called for
Analogous to blocking/pre-emption: a regular/compositional process is not carried out (dispreferred) because the semantic space occupied by the concept associated with that formation is already claimed by some ready-made expression
[Reference label: Fillmore]
If we look at devices in grammar that allow us to produce new MWEs
a continuum: N+PP > collocation > multi-word > idiom
• productive mechanisms in the language but idiosyncratic
• information at the borderline betw. grammar & lexicon
Amounts to:
• describe productive modification relation of N in general; in particular those lexically selected/preferred by a N (its semantic paradigm)
• MWE are a subset of these (give good hints to discover most prominent relations??)
• look at the semantic structure of Nouns: i.e. at the variety of modifiers they can select by virtue of their meaning
Broader scope: extension to non MWE?
[Reference label: Fillmore]
Noun Compounds/Complex Nominals … are pervasive
There is a motivation in most N+N constructions: the context provides it
The FrameNet (SIMPLE) way:
• appeal to specific frame structures (qualia structures) associated with the head noun
• determine from corpus attestations which frame elements (qualia) can get instantiated as a modifier word
“container”: complex nominals can specify:
– material (aluminium c., glass c., …)
– contents (food c., trash c., …)
– size (3 quart c., …)
– function (shipping c., storage c., …)
– ...
[Reference labels: Fillmore, Busa]
Noun Compounds/Complex Nominals & multidimensional semantic approaches
a. FrameNet
Container Frame: Frame Elements: Material, Contents, Size, Function
• Material: aluminum container, glass c., metal c., tin c.
• Contents: food container, beverage c., trash c., water c., milk c., fuel c.
• Size: 3 quart container
• Function: shipping container, storage c.
b. SIMPLE
Qualia Relations of "container" used in compounds:
• Constitutive: made_of [MATERIAL]: aluminum container, glass c., metal c., tin c.
• Telic: contains [ENTITY]: food container, beverage c., trash c., water c., milk c., fuel c.
• Constitutive: size [QUANTITY]: 3 quart container
• Telic: is_used_for [EVENT]: shipping container, storage c.
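The parallel between the two descriptions of "container" can be made explicit as a mapping from FrameNet frame elements to SIMPLE qualia relations. The sketch below is illustrative only: the dictionary names and the `interpret` function are hypothetical, and the modifier list is limited to the compounds quoted above.

```python
# Frame element -> (qualia role, relation, semantic type of the target).
FRAME_TO_QUALIA = {
    "Material": ("Constitutive", "made_of", "MATERIAL"),
    "Contents": ("Telic", "contains", "ENTITY"),
    "Size": ("Constitutive", "size", "QUANTITY"),
    "Function": ("Telic", "is_used_for", "EVENT"),
}

# Modifiers of "container" attested in the examples, with their frame element.
MODIFIER_ELEMENT = {
    "aluminum": "Material", "glass": "Material",
    "metal": "Material", "tin": "Material",
    "food": "Contents", "beverage": "Contents", "trash": "Contents",
    "water": "Contents", "milk": "Contents", "fuel": "Contents",
    "3 quart": "Size",
    "shipping": "Function", "storage": "Function",
}


def interpret(modifier):
    """Describe '<modifier> container' in both vocabularies, or None."""
    element = MODIFIER_ELEMENT.get(modifier)
    if element is None:
        return None
    role, relation, target_type = FRAME_TO_QUALIA[element]
    return {"frame_element": element, "qualia_role": role,
            "relation": relation, "target_type": target_type}


print(interpret("glass"))
print(interpret("storage"))
```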
Complex Nominals/Lexical Constructions in a multilingual context…
describe vs. list?
• if a compound noun is clearly lexicalized, it's simply one of the words in L1
• but if it is an instance of some productive word-formation rule, we should describe it
both describe & list:
• list explicitly in the lexical entry what is idiomatic/idiosyncratic wrt generation
– for lexical selection: mucca pazza vs. matta; prestare attenzione vs. pay attention
– structural pattern: travel agency, agenzia di viaggi; marriage bureau, agenzia matrimoniale (*di matrimonio); real estate agency, agenzia immobiliare
• but also, an apparatus to describe how word semantics of Ns interact when they co-occur (co-selection, co-composition, ...)
In a multilingual context…
...regularities in each language, but they don’t match
Both for decoding & encoding, we need both:
• a linguistic apparatus for interpretation (e.g. to go to a language where it is not a MWE: cucchiaino da caffè for a Japanese useful to know … “used for”)
• lists for idioms…, for unpredictable/idiosyncratic
Same apparatus to interpret both MWE & regular N constructions (similar power of expressiveness): general principles of semantic constitution of lex. items & their combinatorics in terms e.g. of frames/qualia/…:
basic sem. notions & a general schema to characterise the problem, e.g.
– frame (qualia) structure of the headN
– semantic Type of the modifier N
– allow the headN to impose its interpretation on the modification rel.
– ...
SIMPLE Extended Qualia structure for the interpretation of the semantic relation betw. Ns (internal relational structure of MWE)
a “cutting frame” (FrameNet): specific SIMPLE dimensions of meaning
extensively evaluate whether qualia roles (already) encoded in SIMPLE correspond to what is necessary to interpret N-N modification relations
Complex nominals, e.g. knife (coltello) triggers:
• butcher’s knife (coltello da macellaio): TELIC (used_by) Y [Human], PPda
• plastic knife (coltello di plastica): CONST (made_of) X [Material], PPdi
• table knife (coltello da tavola): TELIC (used_in) Z [Location], PPda
• hunting knife (coltello da caccia): TELIC (used_in_activity) E [Activity], PPda
• piatto di legno: CONST (made_of) X [Material], PPdi
• piatto di pasta: CONST (contains) X [Food], PPdi
[Callout: PP disambig.]
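The PP disambiguation idea can be sketched as a lookup in which the head noun's qualia structure, the preposition and the semantic type of the modifier jointly select a relation. This is an illustrative simplification, not the SIMPLE encoding: `TRIGGERS`, `MODIFIER_TYPE` and `interpret` are hypothetical names, and the data is limited to the coltello and piatto examples.

```python
# For each head noun: (preposition, modifier type) -> (qualia role, relation).
TRIGGERS = {
    "coltello": {
        ("da", "Human"): ("TELIC", "used_by"),
        ("di", "Material"): ("CONST", "made_of"),
        ("da", "Location"): ("TELIC", "used_in"),
        ("da", "Activity"): ("TELIC", "used_in_activity"),
    },
    "piatto": {
        ("di", "Material"): ("CONST", "made_of"),
        ("di", "Food"): ("CONST", "contains"),
    },
}

# Semantic types of the modifier nouns used in the examples.
MODIFIER_TYPE = {
    "macellaio": "Human", "plastica": "Material", "tavola": "Location",
    "caccia": "Activity", "legno": "Material", "pasta": "Food",
}


def interpret(phrase):
    """Interpret a three-word 'head prep modifier' nominal, or return None."""
    head, prep, modifier = phrase.split()
    sem_type = MODIFIER_TYPE.get(modifier)
    relation = TRIGGERS.get(head, {}).get((prep, sem_type))
    if relation is None:
        return None
    qualia_role, rel_name = relation
    return (qualia_role, rel_name, sem_type)


print(interpret("coltello da macellaio"))  # ('TELIC', 'used_by', 'Human')
print(interpret("piatto di pasta"))        # ('CONST', 'contains', 'Food')
```

The same preposition "di" yields made_of or contains for piatto depending only on the modifier's semantic type, which is the disambiguation the slide points to.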
In SIMPLE: possible extension
Deverbal nominalisation:
noun murder (uccisione, delitto, omicidio (different sem. pref.))
PRED: MURDER (uccidere)
ARG1: agent [Hum/Anim?]
ARG2: patient [Hum/Anim?]
MOD1: instr [Weapon]
MOD2: means [Action]
MOD3: ... [...]
[Syntactic realisation labels: noun murder: PPdi; PPda_parte_di, di. verb murder (uccidere): subj:NP; obj:NP]
:instr: PPcon [Weapon] (knife m., con coltello)
:means: PPper [Action] (strangulation m., per strangolamento)
:loc: Ppploc|di [Location] (Kent State murders, nel ...)
:time: Ppptime|di [Time] (1983 murders, del 1983)
As if it were a Situation
… Monolingual Linguistic Representation
Strategy:
• consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models)
• evaluate their notions wrt EAGLES recommendations for syntax and semantics
• evaluate their usefulness & adequacy for multilingual tasks
• evaluate integrability of their notions in a unitary MILE
• look for deficient areas, e.g. MWE
• ...
To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values? …. different answers for diff. notions
… the Multilingual ISLE Lexical Entry (MILE)
Basic requirements for the MILE:
• Discover and list the (maximal) set of basic notions needed to describe the MILE (up to which level standardisation is feasible?)
• Granularity
General methodological principles (from EAGLES):
• The leading principle for the design of the MILE: the edited union of existing lexicons/models (redundancy is not a problem)
• Modular and layered: various degrees of specification possible
• Allow for underspecification (& hierarchical structure)
The MILE
• Main features
– factor out primitive units of lexical information
– explicit representation of information to be targeted by multilingual NLP tools
– rely on lexical analyses with the highest degree of inter-theoretical agreement
– avoid framework-specific representational solutions
– open to different paradigms of multilinguality
– oriented to the creation of large-scale lexical databases
Objective: definition of the MILE
• as a meta-entry to act as a common format for resource sharing and integration/architecture for lexical data encoding
– its basic notions
– general architecture
• formalized as an entity-rel. model (XML, RDF, etc.) with a tool to support it
• open to task- & system-dependent parameterisation
Agreed Principles
• MILE builds on the monolingual entry & expands it
• MILE incorporates previous EAGLES recommendations
• is the “complete” entry
• adopt as starting point the PAROLE/SIMPLE DTD, to be revised, augmented, ...
We consider 2 broad categories of applications:
– MT
– CLIR (linking module may be simpler/ontology based)
(label info types wrt application)
Modularity in MILE
Modularity at least under three respects:
• in the macrostructure and general architecture of the MILE
• in the microstructure of the MILE
– monolingual linguistic representation (previous EAGLES revised/updated)
– collocational/corpus-driven information (new)
– multilingual apparatus (e.g. transfer conditions and actions; interlingua) (new)
• in the specific microstructure of the MILE word-sense
Advantages:
• Flexibility of representation
• Easy to customise and update
• Easy integration of existing resources
• High versatility towards different applications
MILE
A. MILE Macrostructure
Meta-information
Architecture
B. MILE Microstructure
1. Monolingual 2. Collocational 3. Multilingual
C. Word-Sense Microstructure
1. Coarse-grained
2. Fine-grained
Modularity in MILE
The MILE Architecture: Monolingual Lexical Description
– three independent and yet linked layers characterising the MILE in a source language
– possibly corresponds to the typology of information contained in major existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet, COMLEX, FrameNet, etc.
– simple and complex lexical unit (to account for MWEs)
– various degrees of granularity of lexical units representation
morphological layer
syntactic layer
semantic layer
correspondence conditions
The MILE Architecture: Multilingual Layer
– acts as an (independent) interface layer between monolingual lexicons
Lexicon 1
morphological layer
syntactic layer
semantic layer
correspondence conditions
multilingual layer
Lexicon 2
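The layered architecture above can be pictured as a small data model. What follows is only a minimal sketch under stated assumptions: the class names (`MonolingualEntry`, `MultilingualLayer`, `MultilingualLink`), the dictionary-based layers and the example values are hypothetical illustrations, not part of any MILE specification.

```python
from dataclasses import dataclass, field


@dataclass
class MonolingualEntry:
    """One lexical unit described on the three monolingual layers (hypothetical model)."""
    lemma: str
    morphological: dict = field(default_factory=dict)
    syntactic: dict = field(default_factory=dict)
    semantic: dict = field(default_factory=dict)
    # Correspondence conditions relate units across layers; stored here
    # as free-form (source layer, target layer, condition) triples.
    correspondences: list = field(default_factory=list)


@dataclass
class MultilingualLink:
    """A pointer from an entry in one lexicon to an entry in another."""
    source_lang: str
    source_lemma: str
    target_lang: str
    target_lemma: str


@dataclass
class MultilingualLayer:
    """Independent interface layer sitting between monolingual lexicons."""
    lexicons: dict = field(default_factory=dict)  # lang -> {lemma: MonolingualEntry}
    links: list = field(default_factory=list)

    def add_entry(self, lang, entry):
        self.lexicons.setdefault(lang, {})[entry.lemma] = entry

    def link(self, source_lang, source_lemma, target_lang, target_lemma):
        # The layer only points at existing monolingual entries; it never copies them.
        for lang, lemma in ((source_lang, source_lemma), (target_lang, target_lemma)):
            if lemma not in self.lexicons.get(lang, {}):
                raise KeyError(f"no entry {lemma!r} in lexicon {lang!r}")
        self.links.append(
            MultilingualLink(source_lang, source_lemma, target_lang, target_lemma)
        )

    def equivalents(self, source_lang, lemma, target_lang):
        """Return the target-language entries linked to a source entry."""
        return [
            self.lexicons[target_lang][ln.target_lemma]
            for ln in self.links
            if (ln.source_lang, ln.source_lemma, ln.target_lang)
            == (source_lang, lemma, target_lang)
        ]
```

Keeping the links outside the entries mirrors the idea that the multilingual layer is independent: either monolingual lexicon can be revised or replaced without touching the other.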
The MILE Multilingual Layer … (NEW)
• Correspondences can be established between different types of linguistic objects (strings, syntactic descriptions, semantic elements, predicates, etc.)
• Transfer tests and actions to target various types of lexical information in the monolingual layers
– constrain syntactic positions and their fillers
– lexicalize syntactic positions
– add positions or arguments
– add new features to define more fine-grained sense distinctions relevant at the multilingual level
– restructuring argument configurations
– collocational information
– ...
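One way to read "transfer tests and actions" is as guarded correspondences: a test constrains a syntactic position and its filler, and an action enriches the target-side description. The sketch below is an assumption-laden illustration, not the MILE formalism; `Correspondence`, `filler_is`, `add_feature`, `transfer` and the feature name `sense` are all hypothetical, and the English "know" versus Italian "sapere"/"conoscere" pair is used only as a familiar example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical context: syntactic position -> category of its filler.
Context = Dict[str, str]


@dataclass
class Correspondence:
    source: str
    target: str
    tests: List[Callable[[Context], bool]] = field(default_factory=list)
    actions: List[Callable[[dict], None]] = field(default_factory=list)


def filler_is(position: str, category: str) -> Callable[[Context], bool]:
    """Transfer test: constrain a syntactic position and its filler."""
    return lambda ctx: ctx.get(position) == category


def add_feature(name: str, value: str) -> Callable[[dict], None]:
    """Transfer action: add a feature to the target-side description."""
    def action(target_desc: dict) -> None:
        target_desc["features"][name] = value
    return action


def transfer(source: str, ctx: Context,
             correspondences: List[Correspondence]) -> Optional[dict]:
    """Return the target description of the first correspondence whose tests all pass."""
    for c in correspondences:
        if c.source == source and all(test(ctx) for test in c.tests):
            target_desc = {"lemma": c.target, "features": {}}
            for act in c.actions:
                act(target_desc)
            return target_desc
    return None


# Illustrative rules only: the choice of target depends on the object filler.
RULES = [
    Correspondence("know", "sapere", tests=[filler_is("object", "clause")]),
    Correspondence("know", "conoscere", tests=[filler_is("object", "np")],
                   actions=[add_feature("sense", "acquaintance")]),
]
```

Calling `transfer("know", {"object": "clause"}, RULES)` selects the first rule, while an `np` object selects the second and also triggers its action; a context matching no rule yields `None`.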
Paths to Discover the Basic Notions of MILE
• clues in dictionaries to decide on target equivalent
• guidelines for lexicographers
• clues (to disambiguate/translate) in corpus concordances
• lexical requirements from various types of transfer conditions and actions in MT systems
• lexical requirements from interlingua-based systems
• …
a list of critical information types that will compose each module of the MILE
Organisational Proposal: division of labour
Highlighted some hot issues & assigned tasks:
• sense indicators (EU)
• selection preferences (EU)
• lexicographic relevance (EU)
• argument structure (US)
• MWE (EU & US)
• collocations & parallel corpora (US)
• modifiers (EU)
• semantic relations (EU)
• transfer conditions (EU & US)
• collocational patterns (US)
• ontology (US)
• metaphors (EU)
• interlingua requirements (US)
• spoken lexicon (EU)
• meta-representation (US & EU)
• ...
Organisational Proposal
The tasks will lead to:
• an in-depth analysis of each area aiming at identifying:
– the most stable solutions adopted in the community
– linguistic specifications and criteria
– possible representational solutions, their compatibility, etc.
– evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations)
– open issues and current boundaries of the state-of-the-art (which cannot be standardised yet)
– model limitations through creation of a sample dictionary
– …
• see how the various pieces fit together & can be merged in a unified proposal
• evaluate if we can combine in a “hybrid super-model” the transfer & interlingua approaches
Information Types: examples

Transfer conditions and actions
1. Identification of categories of transfer phenomena
2. Ranking of hard cases
3. Possible parameterisation wrt language types
4. How to formalise them
5. Types of actions

Ontology
1. Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
2. Inheritance
3. Which roles for ontologies in the MILE
4. Representational issues
5. Customisation and development criteria

Selectional preferences
1. How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
2. Different status of the preferences
3. Criteria to identify them
4. Expressive limits of existing formal resources
CLWG Ongoing Activities
… to prepare a preliminary proposal of the MILE:
• existing models for lexical representation and data interchange (Genelex, Olif, etc.) are explored
• model limitations and expressive power are tested through creation of sample entries in a few languages
• groups at work
• lexical description and information: types of relevant info
• lexicographic exploration: systematic summary & classification of types of transfer tests (also extracted from MRDs)
• multilingual correspondences
• lexical data modeling: format & representation issues
• tool development
Representation issues
• Working with GENELEX, lexicon development work is (can be) affected by:
– impossibility (or difficulty) of defining abstract and general classes or types of objects
– lack of inheritance mechanisms
– lack of default expression and default rewriting mechanisms
Cf. Lexical templates in SIMPLE:
• not included in the GENELEX data-structure
• implemented in the editing sw. tool
• very useful to capture relevant lexical generalizations, enhance consistency in encoding, speed-up lexicographers’ work, etc.
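To make the contrast concrete, the following sketch shows what inheritance and overridable defaults could look like when templates are first-class objects. It is purely illustrative: the `Template` class, its methods and the attribute names are assumptions invented for this example and do not reproduce the SIMPLE templates or the GENELEX data structure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Template:
    """Hypothetical lexical template with inheritance and default values."""
    name: str
    defaults: dict = field(default_factory=dict)
    parent: Optional["Template"] = None

    def resolved(self) -> dict:
        """Defaults of this template merged over those inherited from its ancestors."""
        inherited = self.parent.resolved() if self.parent else {}
        return {**inherited, **self.defaults}

    def instantiate(self, lemma: str, **overrides) -> dict:
        """Build an entry: template defaults first, entry-specific values override them."""
        return {"lemma": lemma, "template": self.name,
                **self.resolved(), **overrides}


# Illustrative hierarchy; names and attributes are invented for the example.
general = Template("General", {"countable": True, "domain": "general"})
specific = Template("Specific", {"has_function": True}, parent=general)
entry = specific.instantiate("knife", domain="kitchen")
```

Here `entry` inherits `countable` from the parent template, takes `has_function` from its own template, and overrides the default `domain`, which is the kind of generalisation the slide says templates capture for lexicographers.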
CLWG Ongoing Activity
MILE Lexical Objects Formal Specifications
Monolingual & Multilingual Lexicons
MILE Shared Lexical Objects
User-Defined Lexical Objects
MILE Lexical Entry Formal Specifications
MILE Repository of Shared Lexical Objects:
• Basic syntactic constructions (e.g. transitive, etc.)
• (Micro-)semantic objects (e.g. features, relations)
• (Macro-)semantic objects (e.g. lexical templates)
• Multilingual constructions (e.g. basic transfer conditions and actions)
MILE Shared Lexical Objects
User-Defined Lexical Objects
- New Lexical objects defined by the User according to the common MILE formal data-structure specification.
- Sub-types of the Shared MILE Objects
- Possibly enriched with metadata defining their “semantics” and “usage”
Monolingual & Multilingual Lexicons
- Lexical entries obtained by referring to various lexical objects (both Shared and User-defined)
- The MILE lexical entry model specifies how lexical objects can be combined to achieve the proper lexical representation
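The relation between shared objects, user-defined objects and entries can be sketched as a repository that entries refer to by identifier. This is a minimal illustration under assumptions: `LexicalObject`, `Repository`, `LexicalEntry`, `build_entry` and the object identifiers are hypothetical names, not the MILE formal specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LexicalObject:
    id: str
    kind: str                          # e.g. "syntactic", "semantic", "multilingual"
    subtype_of: Optional[str] = None   # id of the object this one specialises, if any
    shared: bool = True                # False for user-defined objects


class Repository:
    """Holds shared objects and user-defined objects in one id space."""

    def __init__(self) -> None:
        self.objects: Dict[str, LexicalObject] = {}

    def add(self, obj: LexicalObject) -> None:
        # A sub-type may only specialise an object that is already defined.
        if obj.subtype_of is not None and obj.subtype_of not in self.objects:
            raise KeyError(f"unknown parent object {obj.subtype_of!r}")
        self.objects[obj.id] = obj

    def is_a(self, obj_id: str, ancestor_id: str) -> bool:
        """True if obj_id is ancestor_id or one of its (transitive) sub-types."""
        current: Optional[str] = obj_id
        while current is not None:
            if current == ancestor_id:
                return True
            current = self.objects[current].subtype_of
        return False


@dataclass
class LexicalEntry:
    lemma: str
    object_ids: List[str] = field(default_factory=list)


def build_entry(repo: Repository, lemma: str, object_ids: List[str]) -> LexicalEntry:
    """An entry is obtained by referring to objects that exist in the repository."""
    missing = [i for i in object_ids if i not in repo.objects]
    if missing:
        raise KeyError(f"undefined lexical objects: {missing}")
    return LexicalEntry(lemma, list(object_ids))
```

Because entries store references rather than copies, a user-defined sub-type remains usable wherever its shared parent is expected, which is one plausible way to keep customised lexicons interoperable.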
Simplify using MILE
Involvement of Asian Languages
• participation in last meetings
• some input from Asia
• formal cooperation EU-ASIA: steps to put in motion
Impact & synergies
real impact… to be evaluated later through the use in applications
already: its being a US/EU project & the Asian interest
synergies now, e.g.:
• PAROLE/SIMPLE (also instantiated in 9 national projects): main input
• EuroWordNet: provides input
• XMELLT (NSF): provides input
• OLIF: expects (& provides) input
• SALT: complementary
• ENABLER: validation (& expects input)
• ELSNET: validation
• SENSEVAL: validation
• NIMM WG for Metadata for CL (also with the US OLAC)
• ...
Target: …. Multilingual Content Management
the Resources viewpoint
The relevance/impact of (good vs. less good) LRs for high-quality Cross/Multilingual systems is high, even if not easily measurable.
Different applications, component technologies - & approaches within - need different info types (e.g. CLIR or content access systems wrt MT)
For each, need to specify (not an easy task):
• clear lexical/linguistic/conceptual requirements
• priority info types (which, how encoded, etc.)
• the respective role of e.g. annotated corpora, mono- bi- multilingual lexicons (with different info types), ontologies, KBs
Economic Feasibility: for which (Multilingual) Resources to invest?
Wrt short- vs. medium-term impact:
• Basic, general purpose bi-/multilingual lexicons, but to be tuned, adapted to different applications
• need of robust systems able to acquire/tune (multilingual) lexical/linguistic/conceptual knowledge, to accompany static basic resources
• We shouldn’t rely only on parallel corpora. More advisable to aim at reliable methods for acquisition & use of ‘comparable corpora’, accompanied by robust technologies for annotation (at different levels: morphosyntactic, syntactic/functional, semantic, …), and by a shared set of (text) annotation schemata
Target….. Multilingual Knowledge Management
Technical Feasibility:
Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level, an achievable goal (to be able to automatically establish links among different languages)?
Yes, at the lexical level
More complex, for corpus annotation?
EAGLES/ISLE
Content for practical use: Gap betw. Resources and Systems?
If we had real-size lexicons with very fine-grained semantic/conceptual info, would there be systems (non ad-hoc toy systems) able to use them?
A vicious circle between
i) lack of suitable, large-size and knowledge intensive resources (lexicons and corpora, with many different types of syntactic and semantic information encoded), and
ii) systems’ ability to use them effectively
The two targets should be pursued in parallel, should closely interact with each other, and be gradually integrated
??