
Infrastructural Language Resources & Standards for Multilingual Computational Lexicons

Date post: 18-Mar-2016
Page 1: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Pisa, September 2004

Infrastructural Language Resources & Standards for Multilingual Computational Lexicons

Nicoletta Calzolari … with many others

Istituto di Linguistica Computazionale - CNR - Pisa
[email protected]

Page 2: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The ENABLER Mission

• Language Resources (LRs) & Evaluation: central component of the “linguistic infrastructure”
• LRs supported by national funding in National Projects
• Availability of LRs is also a “sensitive” issue, touching the sphere of linguistic and cultural identity, but also with economic and political implications
• The ENABLER Network of National initiatives aims at “enabling” the realisation of a cooperative framework:
  • formulate a common agenda of medium- & long-term research priorities
  • contribute to the definition of an overall framework for the provision of LRs

Page 3: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

towards …

Only:
• Combining the strengths of different initiatives & communities
• Making the best use of the ‘modus operandi’ of the national funding authorities in different national situations
• Responding to/anticipating needs and priorities of R&D & industrial communities
• Promoting the adoption of [de facto] standards, best practices
• With a clear distinction of tasks & roles for different actors

We can produce the synergies, economy of scale, convergence & critical mass necessary to provide the infrastructural LRs needed to realise the full potential of a multilingual global information society

Page 4: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Lexicon and Corpus: a multi-faceted interaction

• L → C: tagging
• C → L: frequencies (of different linguistic “objects”)
• C → L: proper nouns, acronyms, …
• L → C: parsing, chunking, …
• C → L: training of parsers
• C → L: lexicon updating
• C → L: “collocational” data (MWE, idioms, gram. patterns ...)
• C → L: “nuances” of meanings & semantic clustering
• C → L: acquisition of lexical (syntactic/semantic) knowledge
• L → C: semantic tagging/word-sense disambiguation (e.g. in Senseval)
• C → L: more semantic information on LE
• C → L: corpus based computational lexicography
• C → L: validation of lexical models
• C → L: …
• L → C: ...

Page 5: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

... Language as a “Continuum”

Interesting - and intriguing - aspects of corpus use:
• impossibility of descriptions based on a clear-cut boundary between what is admitted and what is not
• in actual usage, language displays a large number of properties behaving as a continuum, and not as properties of “yes/no” type
• the same is true for the so-called “rules”, where we find more a “tendency” towards rules than precise rules in corpus evidence
• difficult to constrain word meaning within a rigorously defined organisation: by its very nature it tends to evade any strict boundary

BUT

Lexicon & Corpus as two viewpoints on the same linguistic object

… even more in a multilingual context

Page 6: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Extraction from texts vs. formal representation in lexicons

• It is difficult to constrain word meaning within a rigorously defined organisation: by its very nature it tends to evade any strict boundary
• The rigour and lack of flexibility of formal representation languages cause difficulties when mapping NL word meaning, ambiguous and flexible by its own nature, into them
• No clear-cut boundary when analysing many phenomena: it’s more a continuum
• The same impression if one looks at examples of types of alternations: no clear-cut classes across languages or within one language

Page 7: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Correlation between different levels of linguistic description in the design of a lexical entry

• To understand word-meaning: focus on the correlation between syntactic and semantic aspects
• But other linguistic levels - such as morphology, morphosyntax, lexical cooccurrence, collocational data, etc. - are closely interrelated/involved
• These relations must be captured when accounting for meaning discrimination
• The complexity of these interrelationships makes semantic disambiguation such a hard task in NLP
• Textual corpora as a device to discover and reveal the intricacy of these relationships
• Frame/SIMPLE semantics as a device to unravel and disentangle the complex situation into elementary and computationally manageable pieces

Page 8: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

towards Corpus based Semantic Lexicons

… at least in principle
• both in the design of the model, & in the building of the lexicon (at least partially)
• with (semi-)automatic means

Design of the lexical entry with a combined approach:
• theoretical: e.g. Fillmore Frame Semantics / Pustejovsky Generative Lexicon, …
• empirical: Corpus evidence
  • even if: there are not always sound and explicit criteria for classification according to “frame elements”/qualia relations/...

Page 9: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Infrastructure of Language Resources...

...static
• Semantic networks: Euro-/ItalWordNet
• Lexicons: PAROLE/SIMPLE/CLIPS
• TreeBanks

But … they will never be “complete”

…dynamic
• Lexical acquisition systems (syntactic & semantic) from corpora
• Infrastructure of tools
  • Robust morphosyntactic & syntactic analysers
  • Word-sense disambiguation systems
  • Sense classifiers
  • ...

International Standards

Page 10: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

ItalWordNet Semantic Network
[Italian module of EuroWordNet]

~ 50,000 lemmas organized in synonym groups (synsets), structured in hierarchies & linked by ~ 130,000 semantic relations

• ~ 50,000 hyperonymy/hyponymy relations
• ~ 16,000 relations among different POS (role, cause, derivation, etc.)
• ~ 2,000 part-whole relations
• ~ 1,500 antonymy relations, … etc.

• Synsets linked to the InterLingual Index (ILI = Princeton WordNet)
• Through the ILI, link to all the European WordNets (de-facto standard) & to the common Top Ontology
• Possibility of plug-in with domain terminological lexicons (legal, maritime)
• Usable in IR, CLIR, IE, QA, ...

Page 11: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

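EuroWordNet Multilingual Data Structure

[Figure: the language-specific wordnets (Italian WN, Spanish WN, Dutch WN, English WN, French WN, German WN, Estonian WN, Czech WN) are connected through the ILI and to the TOP ONTOLOGY (labels shown: LIVING, ANIMAL, HUMAN). Example words: hond (Dutch), dog (English), cane (Italian), perro (Spanish), all linked to the ILI record "dog".]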
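The hub-and-spoke idea behind the ILI can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifier `ili:dog`, the dictionary layout and the function names are hypothetical and are not the EuroWordNet database format; only the four example words and the top-ontology labels come from the slide.

```python
from collections import defaultdict

# Hypothetical identifier for one ILI record (not a real EuroWordNet ID).
ILI_DOG = "ili:dog"

# (language, word) -> ILI record; only the four words shown on the slide.
WORD_TO_ILI = {
    ("it", "cane"): ILI_DOG,
    ("es", "perro"): ILI_DOG,
    ("nl", "hond"): ILI_DOG,
    ("en", "dog"): ILI_DOG,
}

# ILI record -> top-ontology labels (labels from the slide, assignment assumed).
ILI_TO_TOP = {ILI_DOG: ["LIVING", "ANIMAL"]}


def build_reverse_index(word_to_ili):
    """Group words by ILI record and then by language."""
    index = defaultdict(dict)
    for (lang, word), ili in word_to_ili.items():
        index[ili].setdefault(lang, []).append(word)
    return index


ILI_TO_WORDS = build_reverse_index(WORD_TO_ILI)


def translate(word, source_lang, target_lang):
    """Return target-language words that share the source word's ILI record."""
    ili = WORD_TO_ILI.get((source_lang, word))
    if ili is None:
        return []
    return ILI_TO_WORDS[ili].get(target_lang, [])
```

The point of the design is that each wordnet links only to the ILI, so adding a new language needs no pairwise links to the wordnets already present.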

Page 12: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Synsets linked by Semantic Relations in ItalWordNet

Synset: {casa, abitazione, dimora}
English: home, domicile, .. / house
TOP Concepts: Object, Artifact, Building

• Hyperonym: {edificio, ..}
• Hyponym: {villetta}, {catapecchia, bicocca, ..}, {cottage}, {bungalow}
• Role_location: {stare, abitare, ...}
• Role_target_direction: {rincasare}
• Role_patient: {affitto, locazione}
• Mero_part: {vestibolo}, {stanza}
• Holo_part: {casale}, {frazione}, {caseggiato}
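A synset with typed relations, as in the casa example, could be modelled roughly as follows. This is a minimal sketch: the `Synset` class and its method names are assumptions made for illustration, not the ItalWordNet data format, and only relations that are unambiguous on the slide are encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A synonym group plus its outgoing semantic relations."""
    lemmas: tuple
    # relation name -> list of target synsets (each a tuple of lemmas)
    relations: dict = field(default_factory=dict)

    def add(self, relation, *targets):
        """Attach one or more target synsets under a relation name."""
        self.relations.setdefault(relation, []).extend(targets)

    def targets(self, relation):
        """Target synsets for a relation, or an empty list if none."""
        return self.relations.get(relation, [])


casa = Synset(("casa", "abitazione", "dimora"))
casa.add("hyperonym", ("edificio",))
casa.add("hyponym", ("villetta",), ("catapecchia", "bicocca"), ("cottage",), ("bungalow",))
casa.add("role_location", ("stare", "abitare"))
casa.add("role_target_direction", ("rincasare",))
casa.add("role_patient", ("affitto", "locazione"))

# Flatten the hyponym synsets into a single list of lemmas.
hyponym_lemmas = [lemma for synset in casa.targets("hyponym") for lemma in synset]
```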

Page 13: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Jur-WordNet
With ITTG-CNR (Istituto di Teoria e Tecniche dell’informazione Giuridica)

• Jur-WordNet: extension for the juridical domain of ItalWordNet
• Knowledge base for multilingual access to sources of legal information
• Source of metadata for semantic mark-up of legal texts
• To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.

Page 14: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Terminological Lexicon of Navigation & Sea Transportation

Nolo

Synsets         1,614
Lemmas          2,116
Senses          2,232
Nouns           1,621
Verbs             205
Adjectives         35
Proper Nouns      236

Page 15: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

PAROLE (Ital. Synt. Lex., ’96-’98)
• morphology: 20,000 entries
• syntax: 20,000 words
• SGML

SIMPLE (Ital. Sem. Lex., ’98-2000)
• semantics: 10,000 senses
• SGML

CLIPS (2000-2004)
• phonology
• morphology: 55,000 words
• syntax
• semantics: 55,000 senses
• XML

PAROLE Corpus
PAROLE/SIMPLE: 12 harmonised computational lexicons
http://www.ilc.cnr.it/clips/

Page 16: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

machine language learning

Page 17: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

machine language learning

• development of conceptual networks
• linguistic learning
• adaptive classification systems
• information extraction
• bootstrapping of grammars
• linguistic change models
• language usage models
• bootstrapping of lexical information

Page 18: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Architecture for linguistic knowledge acquisition ...

[Figure. Recoverable labels: unstructured text data; annotation tools; annotated data; machine learning for linguistic knowledge acquisition; structured knowledge; lexica; terminology; lexicon model; LKG; user needs; cross-lingual information retrieval; multi-lingual information extraction; multi-lingual text mining.]

… towards “dynamic” lexicons, able to auto-enrich

Page 19: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Harmonisation: More & more Need of a Global View for Global Interoperability

• Integration/sharing of data & software/tools
• Need of compatibility among various components
• An “exemplary cycle”:

[Figure. Recoverable labels: Formalisms, Grammars; Software: Taggers, Chunkers, Parsers, …; Representation, Annotation; Lexicon, Corpora; Terminology; Software: Acquisition Systems, I/O Interfaces; Languages.]

Page 20: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

A short guide to ISLE/EAGLES

http://www.ilc.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm

Multilingual Computational Lexicon Working Group

Page 21: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Target: … the Multilingual ISLE Lexical Entry (MILE)

General methodological principles (from EAGLES):
• high granularity: factor out the (maximal) set of primitive units of lexical info (basic notions) with the highest degree of inter-theoretical agreement
• modular and layered: various degrees of specification possible
• explicit representation of info
• allow for underspecification (& hierarchical structure)
• leading principle: edited union of existing lexicons/models (redundancy is not a problem)
• open to different paradigms of multilinguality
• oriented to the creation of large-scale & distributed lexicons

Page 22: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Paths to Discover the Basic Notions of MILE

• clues in dictionaries to decide on target equivalent
• guidelines for lexicographers
• clues (to disambiguate/translate) in corpus concordances
• lexical requirements from various types of transfer conditions & actions in MT systems
• lexical requirements from interlingua-based systems
• …

a list of critical information types that will compose each module of the MILE

Page 23: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Designing MILE

Steps towards MILE:
• Creating entries (Bertagna, Reeves, Bouillon)
• Identifying the MILE Basic Notions (Bertagna, Monachini, Atkins, Bouillon)
• Defining the MILE Lexical Model (Lenci, Calzolari, etc.)
• Formalising MILE (Ide)
• Development of the ISLE Lexical Tool (Bel)
• ISLE & spoken language & multimodality (Gibbon)
• Metadata for the lexicon (Peters, Wittenburg)
• A case-study: MWEs in MILE (Quochi, Lenci, Calzolari)

the MILE Basic Notions
the MILE Lexical Model

Page 24: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Basic Notions (the EAGLES/ISLE CLWG)

• Basic lexical dimensions & info-types relevant to establish multilingual links
• Typology of lexical multilingual correspondences (relevant conditions & actions)

Identified by:
• creating sample multilingual lexical entries (Bertagna, Reeves)
• investigating the use of sense indicators in traditional bilingual dictionaries (Atkins, Bouillon)
• …

Page 25: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Lexical Classes – Data Categories for Content Interoperability

Francesca Bertagna*, Alessandro Lenci°, Monica Monachini*, Nicoletta Calzolari*

*ILC–CNR – Pisa  °Pisa University

Page 26: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Overview

1. MILE Lexical Model with Lexical Objects and Data Categories
2. Mapping of existing lexicons onto MILE
3. RDF schema and DC Registry for some pre-instantiated lexical objects together with a sample entry from the PAROLE-SIMPLE lexicons in MILE
4. Future …

Page 27: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Lexical Model

[Figure. Recoverable labels: GENELEX Model; PAROLE-SIMPLE Lexicons; Multilingual Lexicons (EuroWordNet, etc.); Computational Lexicon Working Group Guidelines (syntactic, semantic lexicons); MILE Lexical Model; … where after?]

Page 28: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Main Features

• A general architecture devised as a common representational layer for multilingual Computational Lexicons
  • both for hand-coded and corpus-driven lexical data
• Key features:
  • Modularity
  • Granularity
  • Extensibility and “openness” - User-adaptability
  • Resource Sharing
  • Content Interoperability
  • Reusability

Semantic Web technologies & standards applied to Lexicon modelling

Page 29: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Lexical Model (MLM)

• The MLM core is the Multilingual ISLE Lexical Entry (MILE)
  • a general schema for multilingual lexical resources
  • a lexical meta-entry as a common representational layer for multilingual lexicons
• Computational lexicons can be viewed as different instances of the MILE schema

[Figure: MILE Lexical Model, with lexicon#1, lexicon#2 and lexicon#3 as its instances]

Page 30: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE: the building-block model

• The MILE architecture is designed according to the building-block model:
  • Lexical entries are obtained by combining various types of lexical objects (atomic and complex)
• Users design their lexicon by:
  • selecting and/or specifying the relevant lexical objects
  • combining the lexical objects into lexical entries
• Lexical objects may be shared:
  • within the same lexicon (intra-lexicon reusability)
  • among different lexicons (inter-lexicon reusability)
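The building-block idea, with lexical objects defined once and reused inside and across lexicons, can be illustrated with a small Python sketch. All class names, feature values and lemmas below are hypothetical and chosen only for illustration; they are not part of the MILE specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalObject:
    """An atomic building block, e.g. a syntactic frame or a semantic feature."""
    kind: str
    value: str


@dataclass
class LexicalEntry:
    """A lexical entry assembled from existing lexical objects."""
    lemma: str
    objects: list


# Shared pool of lexical objects (hypothetical values).
TRANSITIVE = LexicalObject("syntactic_frame", "transitive")
HUMAN = LexicalObject("sem_feature", "human")

# Intra-lexicon reuse: two entries of lexicon_a share TRANSITIVE.
lexicon_a = [LexicalEntry("mangiare", [TRANSITIVE]), LexicalEntry("leggere", [TRANSITIVE])]
# Inter-lexicon reuse: lexicon_b uses the very same object.
lexicon_b = [LexicalEntry("eat", [TRANSITIVE]), LexicalEntry("teacher", [HUMAN])]


def entries_using(obj, *lexicons):
    """Lemmas of all entries, across the given lexicons, that include obj."""
    return [entry.lemma for lex in lexicons for entry in lex if obj in entry.objects]
```

Because the entries hold references to the same object rather than copies, a correction to a shared lexical object would be visible in every entry that uses it.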

Page 31: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE: the building-block model

[Figure: Lexical Objects (syntactic frame, phrase, slot, Synfeature, Semfeature) combined into Lexical entry 1, Lexical entry 2 and Lexical entry 3]

Page 32: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Modularity in MILE

[Figure: multiple levels of modularity. Each mono-MILE contains a morphological layer, a syntactic layer, a semantic layer and linking conditions; the multi-MILE contains multilingual correspondence conditions between mono-MILEs.]

Horizontal organization, where independent, but interlinked, modules make it possible to express different dimensions of lexical entries

Page 33: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The Mono-MILE

Each monolingual layer within Mono-MILE identifies a basic unit of lexical description

• morphological layer: MU, the basic unit to describe the inflectional and derivational morphological properties of the word
• syntactic layer: SynU, the basic unit to describe the syntactic behaviour of the MU
• semantic layer: SemU, the basic unit to describe the semantic properties of the MU

Page 34: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The Mono-MILE

[Figure: one MU with several SynU nodes and several SemU nodes]

Within each layer, a basic linguistic information unit is identified
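The three basic units can be pictured as linked records. The sketch below is one possible reading, assuming that an MU points to its SynUs and each SynU points to the SemUs it can express; the field names are hypothetical, and the identifiers are borrowed from the migliorare example shown later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class SemU:
    """Semantic unit: the semantic properties of one sense."""
    ident: str


@dataclass
class SynU:
    """Syntactic unit: one syntactic behaviour, linked to the senses it expresses."""
    ident: str
    sem_units: list = field(default_factory=list)


@dataclass
class MU:
    """Morphological unit: the word with its morphological properties."""
    lemma: str
    pos: str
    syn_units: list = field(default_factory=list)

    def senses(self):
        """All distinct SemUs reachable from this MU, in first-seen order."""
        seen = []
        for syn in self.syn_units:
            for sem in syn.sem_units:
                if sem not in seen:
                    seen.append(sem)
        return seen


sem1 = SemU("SemU1_migliorare")
sem2 = SemU("SemU2_migliorare")
syn = SynU("SynU_migliorare", [sem1, sem2])
mu = MU("migliorare", "verb", [syn])
```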

Page 35: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Granularity in MILE

Concerns the vertical dimension. Within a given lexical layer, varying degrees of depth of lexical descriptions are allowed, both shallow and deep lexical representations

Page 36: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Defining the MLM

• The MLM is designed as an E-R model (MILE Entry Schema)
  • defines the lexical objects and the ways they can be combined into a lexical entry
• The MLM includes 3 types of lexical objects:
  • MILE Lexical Classes (MLC)
  • MILE Lexical Data Categories (MDC)
  • MILE Lexical Operations (MLO)

Page 37: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Lexical Objects

• Within each layer, basic lexical notions are represented by lexical objects:
  • MILE Lexical Classes (MLC)
  • MILE Data Categories (MDC)
  • Lexical operations
• They are an ontology of lexical objects as an abstraction over different lexical models and architectures

Page 38: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE E/R diagrams

The lexical objects are described with E-R diagrams which define them and the ways they can be combined into a lexical entry

Page 39: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Objects: Syntactic Layer

[E-R diagram: MLC:SynU
  hasSyntacticFrame → MLC:SyntacticFrame
  hasFrameSet → MLC:FrameSet
  composedby → MLC:Composition
  correspondTo → MLC:SemU (through MLC:CorrespSynUSemU)
 Cardinalities shown in the figure: 1..*, *, *, *]

Page 40: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

… expanding one node …

[Figure. Recoverable labels: SynU; SyntacticFrame; Construction; Self; Slot; Slot; Function; Phrase.]

Page 41: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Objects: Semantic Layer

[E-R diagram: MLC:SemU
  belongsToSynset → MLC:Synset
  hasSemFrame → MLC:SemanticFrame
  hasSemFeature → MLC:SemanticFeature
  hasCollocation → MLC:Collocation
  semanticRelation → MLC:SemU (through MLC:SemanticRelation)
 Cardinalities shown in the figure: *, 0..1, *, *, *]

Page 42: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Objects: Synt-Sem Linking

[E-R diagram: MLC:CorrespSynUSemU
  hasSourceSynu → MLC:SynU
  hasTargetSemu → MLC:SemU
  hasPredicativeCorresp → MLC:PredicativeCorresp
  IncludesSlotArgCorresp → MLC:SlotArgCorresp
 Cardinalities shown in the figure: 1, 1, 1, 0..*]

Page 43: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Syntax-Semantics Linking

[Diagram] CorrespSynUSemU connects a SynU (Frame with Slot0, Slot1) to a SemU (Predicate with Arg_0, Arg_1). Its PredCorresp pairs Slot0:Arg1 and Slot1:Arg0, subject to filters & conditions.

Page 44: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Syntax-Semantics Linking

John gave the book to Mary
John gave Mary the book

[Diagram]
• SynU#1: subj_NP, obj_NP, obl_PP_to
• SynU#2: subj_NP, obj_NP, obj_NP
• SemU#1, Semantic_Frame:GIVE: Arg1 Agent, Arg2 Theme, Arg3 Goal
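The linking of the two frames of "give" to a single semantic unit can be sketched in code. This is a minimal Python illustration, not part of MILE itself: the class names mirror the lexical objects on the slides, while the field names, the `roles` helper, and the particular slot-to-argument assignments are assumptions (the arrows of the original diagram did not survive extraction).

```python
from dataclasses import dataclass


@dataclass
class SynU:
    """Syntactic unit: an id plus the ordered slots of its frame."""
    id: str
    slots: list


@dataclass
class SemU:
    """Semantic unit: an id, a semantic frame, and argument -> role."""
    id: str
    frame: str
    args: dict


@dataclass
class CorrespSynUSemU:
    """Links one SynU to one SemU via slot-index -> argument pairs."""
    source: SynU
    target: SemU
    slot_to_arg: dict

    def roles(self):
        # Resolve each slot to the semantic role of its linked argument.
        return [
            (self.source.slots[i], self.target.args[arg])
            for i, arg in sorted(self.slot_to_arg.items())
        ]


give = SemU("SemU#1", "GIVE", {"Arg1": "Agent", "Arg2": "Theme", "Arg3": "Goal"})
synu1 = SynU("SynU#1", ["subj_NP", "obj_NP", "obl_PP_to"])  # John gave the book to Mary
synu2 = SynU("SynU#2", ["subj_NP", "obj_NP", "obj_NP"])     # John gave Mary the book

# Assumed linking: the first object of the double-object frame is the Goal.
link1 = CorrespSynUSemU(synu1, give, {0: "Arg1", 1: "Arg2", 2: "Arg3"})
link2 = CorrespSynUSemU(synu2, give, {0: "Arg1", 1: "Arg3", 2: "Arg2"})

print(link1.roles())
print(link2.roles())
```

Both syntactic units reuse the same SemU; only the slot-to-argument map differs, which is the point of keeping the correspondence as a separate object.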

Page 45: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Syntax-Semantic Linking in SIMPLE

[Diagram] SynU_migliorare has a Frameset with a transitive structure (Slot0, Slot1) and an intransitive structure (Slot0, Ø). Two semantic units, SemU1_migliorare (CHANGE_OF_STATE) and SemU2_migliorare (CAUSE_CHANGE_OF_STATE), refer to PRED_migliorare (ARG0:Agent, ARG1:Patient). Each is tied to the SynU by a CorrespSynUSemU with SlotArgCorresp links; one linking is labelled isomorphic, the other non-isomorphic.

Page 46: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The Multilingual layer

[E-R diagram] MultiCorresp and its relations:
• hasMUMUCorr → MUMUCorresp
• hasSynUSynuCorr → SynUSynUCorresp
• hasSemUSemUCorr → SemUSemUCorresp
• hasSynsetMultCorr → SynsetMultCorresp
• hasSemFrameCorr → SemanticFrameMultCorresp
Cardinalities as extracted: 1..0 on each relation

Page 47: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE approach to multilinguality

Open to various approaches
• transfer-based
  • monolingual descriptions are used to state correspondences (tests and actions) between source and target entries
• interlingua-based
  • monolingual entries linked to language-independent lexical objects (e.g. semantic frames, "primitive predicates", etc.)

Page 48: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The Multi-MILE

• Multi-MILE specifies a formal environment to express multilingual correspondences between lexical items
• Source and target lexical entries can be linked by exploiting (possibly combined) aspects of their monolingual descriptions
  • monolingual lexicons act as pivot lexical repositories, on top of which language-to-language multilingual modules can be defined

Page 49: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The Multi-MILE

Multi-MILE may include:
• Multilingual operations to establish transfer links between source and target mono-MILE
• Multilingual lexical objects
  • enrich the source and target lexical descriptions, but do not belong to the monolingual lexicons
• Language-independent lexical objects:
  • Primitive semantic frames, "interlingual synsets", etc.
  • Relevant for interlingua approaches to multilinguality

Page 50: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Multi-MILE

[Diagram] An IT-to-EN multi-MILE sits between the Italian mono-MILE (MU_1; SynU_1, SynU_2; SemU_1, SemU_2) and the English mono-MILE (MU_1; SynU_1; SemU_1) and states:
• IT_SemU_2 ↔ En_SemU_1
• IT_SynU_2 ↔ En_SynU_1
• IT_Slot_0 ↔ EN_Slot_1
• IT_Slot_1 ↔ EN_Slot_0
• AddFeature to source SemU: +HUMAN
• AddSlot to target SynU: MODIF [PP_with]

Page 51: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Multi-MILE

[Diagram] IT Lexicon ↔ EN Lexicon, linked through multilingual conditions:
• dito ↔ finger, under modif(mano)
• dito ↔ toe, under modif(piede)
• entrare ("to enter") + PP_di_corsa ↔ run + PP_into
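The "dito" case can be read as a transfer link guarded by a test on the source context. The Python sketch below illustrates that reading only; `TransferLink`, `transfer`, and the `modif` context key are hypothetical names chosen for the example, not MILE notation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransferLink:
    """A source-to-target link that applies only when its condition holds."""
    source: str
    target: str
    condition: Callable[[dict], bool]


# Italian "dito" maps to "finger" or "toe" depending on its modifier.
LINKS = [
    TransferLink("dito", "finger", lambda ctx: ctx.get("modif") == "mano"),
    TransferLink("dito", "toe", lambda ctx: ctx.get("modif") == "piede"),
]


def transfer(word, context):
    """Return every target whose multilingual condition is satisfied."""
    return [
        link.target
        for link in LINKS
        if link.source == word and link.condition(context)
    ]


print(transfer("dito", {"modif": "mano"}))   # selects "finger"
print(transfer("dito", {"modif": "piede"}))  # selects "toe"
```

With no modifier in the context, neither condition fires and the function returns an empty list, leaving the choice open rather than guessing.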

Page 52: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Classes

• Represent the main building blocks of lexical entries
• Formalize the MILE Basic Notions
• Define an ontology of lexical objects
  • represent lexical notions such as semantic unit, syntactic feature, syntactic frame, semantic predicate, semantic relation, synset, etc.
• Similar to class definitions in OO languages
  • specify the relevant attributes
  • define the relations with other classes
  • hierarchically structured

Page 53: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Classes: an ontology of lexical objects

[E-R diagram] MLM:SemU (id: xs:anyURI, comment: xs:string, example: xs:string) and its relations:
• correspondsToSynset → MLM:Synset (*)
• hasSemanticFrame → MLM:SemanticFrame (0..1)
• semURelation → MLM:SemU, through MLM:SemURelation (*)
• hasCollocation → MLM:Collocation (*)
• semFeature → MLM:semValues (*)

Page 54: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Data Categories

• MDC are instances of the MILE Lexical Classes
• Can be used "off the shelf" or as a departure point for the definition of new or modified categories
• Enable modular specification of lexical entities using all or parts of the lexical information in the repository
• Each MDC represents a resource uniquely identified by a URI
• Two types of MDC:
  • Core MDC
    • belong to shared repositories (Lexical Data Category Registry)
    • lexical objects and linguistic notions with wide consensus
  • User-defined MDC
    • user-specific or language-specific lexical objects

Page 55: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Data Categories

• Instances of the MILE Lexical Classes are Data Categories
• MDC can belong to a shared repository or be user-defined

[Diagram] MLC, instantiated by Core MDC and User-defined MDC

Page 56: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Data Categories

• User-adaptability and extensibility

[Diagram] Instances (instance_of) of MLC:SemanticFeature, grouped as Core and UserDefined: HUMAN, ARTIFACT, EVENT, ANIMAL, GROUP; AGE, MAMMAL
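A registry that holds both core and user-defined data categories, each identified by a URI, might look like the Python sketch below. Everything here is an assumption made for illustration: the `DataCategory` and `Registry` names, the `example.org` URIs, and the choice of HUMAN as core and MAMMAL as user-defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    """An instance of a lexical class, identified by a URI."""
    uri: str
    lexical_class: str
    core: bool  # True for shared (core) categories, False for user-defined


class Registry:
    def __init__(self):
        self._by_uri = {}

    def register(self, category):
        # URIs must be unique across core and user-defined categories.
        if category.uri in self._by_uri:
            raise ValueError(f"duplicate URI: {category.uri}")
        self._by_uri[category.uri] = category

    def instances_of(self, lexical_class, include_user_defined=True):
        """List the URIs of all categories instantiating a lexical class."""
        return sorted(
            c.uri
            for c in self._by_uri.values()
            if c.lexical_class == lexical_class
            and (c.core or include_user_defined)
        )


registry = Registry()
# Hypothetical URIs; the slides do not give the real ones.
registry.register(DataCategory("http://example.org/mdc/core#HUMAN", "MLC:SemanticFeature", True))
registry.register(DataCategory("http://example.org/mdc/user#MAMMAL", "MLC:SemanticFeature", False))

print(registry.instances_of("MLC:SemanticFeature"))
print(registry.instances_of("MLC:SemanticFeature", include_user_defined=False))
```

Keeping the core flag on the category rather than in two separate stores lets a lexicon opt in or out of user extensions with a single parameter.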

Page 57: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Data Categories

[Diagram] MLM:Feature, with MLM:SemFeature and MLM:SynFeature beneath it, and MLM:GrammaticalFunction. MDC instances (instance_of), each set split into Core and UserDefined:
• MLM:SemFeature: HUMAN, ARTIFACTUAL, EVENT, DURATION, GROUP, AGE, ANIMATE
• MLM:SynFeature: GENDER, CASE, PERSON, TENSE, CONTROL, ASPECT
• MLM:GrammaticalFunction: SUBJ, OBJ, IOBJ, PRED, X_COMP, C_COMP

Page 58: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Operations

• They are used to state conditions and perform operations over lexical entries
  • Link syntactic slots and semantic arguments
  • Constrain the syntax-semantic link
  • Express tests and actions in the transfer conditions in the multi-MILE
  • …
• They provide the "glue" to link various independent intra-lexical and inter-lexical components

Page 59: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Multilingual Operations

• Source-to-target language transfer conditions can be expressed by combining multilingual operations
• Three types of multilingual operations:
  • Multilingual correspondences
    • Link a source lexical object (MU, SemU, SynU, semantic argument, syntactic slot) and a target lexical object (MU, SemU, SynU, semantic argument, syntactic slot)
  • Add-operations
    • Add lexical information relevant for the cross-lingual link, but not present in the source or target mono-MILE
  • Constrain-operations
    • Constrain the transfer link to some portions of source and target mono-MILE

Page 60: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Defining the MLM

[Diagram] Components: MILE Entry Schema, MILE Lexical Classes, User Defined MDC, MDC Registry, RDF/S Descriptions, Monolingual/Multilingual Lexicon

Page 61: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

RDF Instantiation of the MLM

[Diagram] Lexicon#1, Lexicon#2, Lexicon#3; Resources: Lexical Objects, Lexical Classes, Lexical Data Categories; Resources: Metadata

Page 62: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Model

• Ideal structure for rendering in RDF:
  • hierarchy of lexical objects built up by combining atomic data categories via clearly defined relations
• Proof of concept:
  • Create an RDF schema for the MILE Lexical Model version 1.2
  • Instantiate MILE Lexical Data Categories

Page 63: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

User-Adaptability and Resource Sharing in MILE

• Compatible with different models of lexical analysis:
  • Relational semantic models (e.g. WordNet)
  • Syntactic and semantic frames
  • Ontology-based lexicons
• Compatible with different degrees of specification:
  • Deep lexical representations (e.g. PAROLE-SIMPLE)
  • Terminological lexicons
• Compatible with different paradigms of multilinguality
  • Lexicons for transfer-based MT
  • Interlingua-based lexicons
  • …

Page 64: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The MILE Lexical Model

[Diagram] MILE Lexical Model above lexicon_1, lexicon_2, lexicon_3 and DTD_1, DTD_2, …, DTD_n

Page 65: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

RDF Instantiation of the MLM

• Enable universal access to sophisticated linguistic info
• Provide means for inferencing over lexical info
• Incorporate lexical information into the Semantic Web

W3C standards:
• Resource Description Framework (RDF)
• Web Ontology Language (OWL)

Built on the XML web infrastructure to enable the creation of a Semantic Web
• web objects are classified according to their properties
• semantics of relations (links) to other web objects precisely defined

Page 66: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The RDF Schema

• Defines classes of objects (MLC) and their relations to other objects
  • Like a class definition in Java, etc.
• Classes and properties in the schema correspond to the E-R model
• Can specify sub-classes/sub-properties and inheritance

Page 67: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Goals

• Lexical information will form a central component of semantic information
• Need a standardized, machine-processable format so that information can be used and merged with others
• Main task: get the data model right

See Semantic Web

Page 68: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Advantages of RDF

• Modularity
  • Can create "instances" of bits of lexical information for re-use in a single lexicon or across lexicons
  • Instances can be stored in a central repository for use by others
  • Can use partial information or all of it
  • Building block approach to lexicon creation
• Web-compatible
  • RDF instantiation will integrate into Semantic Web
  • Inferencing capabilities

Page 69: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Example

Three parts:
• RDF Schema for lexical entries
  • Defines classes and properties, sub-classes, etc.
• Sample repository of RDF-instantiated lexical objects
  • Three levels of granularity
• Sample lexicon entries
  • Use repository information at different levels

Page 70: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Sample Repositories

1. repository of enumerated classes for lexical objects at the lowest level of granularity
   • definition of sets of possible values for various lexical objects
2. repository of phrases for common phrase types, e.g., NP, VP, etc.
3. repository of constructions for common syntactic constructions

Page 71: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Enumerated classes

<rdfs:Class rdf:about="http://www.cs.vassar.edu/~ide/rdf/isle-enumerated-classes#FunctionType">
  <owl:oneOf>
    <rdf:Seq>
      <rdf:li>Subj</rdf:li>
      <rdf:li>Obj</rdf:li>
      <rdf:li>Comp</rdf:li>
      <rdf:li>Arg</rdf:li>
      <rdf:li>Iobj</rdf:li>
    </rdf:Seq>
  </owl:oneOf>
</rdfs:Class>

<rdfs:Class rdf:about="http://www.cs.vassar.edu/~ide/rdf/isle-enumerated-classes#SynFeatureName">
  <owl:oneOf>
    <rdf:Seq>
      <rdf:li>tense</rdf:li>
      <rdf:li>gender</rdf:li>
      <rdf:li>control</rdf:li>
      <rdf:li>person</rdf:li>
      <rdf:li>aux</rdf:li>
    </rdf:Seq>
  </owl:oneOf>
</rdfs:Class>

<rdfs:Class rdf:about="http://www.cs.vassar.edu/~ide/rdf/isle-enumerated-classes#SynFeatureValue">
  <owl:oneOf>
    <rdf:Seq>
      <rdf:li>have</rdf:li>
      <rdf:li>be</rdf:li>
      <rdf:li>subject_control</rdf:li>
      <rdf:li>object_control</rdf:li>
      <rdf:li>masculine</rdf:li>
      <rdf:li>feminine</rdf:li>
    </rdf:Seq>
  </owl:oneOf>
</rdfs:Class>

Page 72: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Sample LDCR for a Phrase Object

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:mlc="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#">

  <Phrase rdf:ID="NP" rdfs:label="NP"/>

  <Phrase rdf:ID="Vauxhave">
    <hasSynFeature>
      <SynFeature>
        <hasSynFeatureName rdf:value="aux"/>
        <hasSynFeatureValue rdf:value="have"/>
      </SynFeature>
    </hasSynFeature>
  </Phrase>

</rdf:RDF>

Page 73: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Sample LDCR entry for a Construction object

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#">
  <Construction rdf:ID="TransIntrans">
    <slot>
      <SlotRealization rdf:ID="NPsubj">
        <hasFunction rdf:value="Subj"/>
        <filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
      </SlotRealization>
    </slot>
    <slot>
      <SlotRealization rdf:ID="NPobj">
        <hasFunction rdf:value="Obj"/>
        <filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
      </SlotRealization>
    </slot>
  </Construction>
</rdf:RDF>

Page 74: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Full entry

<Entry rdf:ID="eat1">
  <hasSynu rdf:parseType="Resource">
    <SynU rdf:ID="eat1-SynU">
      <example>John ate the cake</example>
      <hasSyntacticFrame>
        <SyntacticFrame rdf:ID="eat1SynFrame">
          <hasSelf>
            <Self rdf:ID="eat1Self">
              <headedBy>
                <Phrase rdf:ID="Vauxhave">
                  <hasSynFeature>
                    <SynFeature>
                      <hasSynFeatureName rdf:value="aux"/>
                      <hasSynFeatureValue rdf:value="have"/>
                    </SynFeature>
                  </hasSynFeature>
                </Phrase>
              </headedBy>
            </Self>
          </hasSelf>
Continued…

Page 75: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Continued from previous slide…

          <hasConstruction>
            <Construction rdf:ID="eat1Const">
              <slot>
                <SlotRealization rdf:ID="NPsubj">
                  <hasFunction rdf:value="Subj"/>
                  <filledBy rdf:value="NP"/>
                </SlotRealization>
              </slot>
              <slot>
                <SlotRealization rdf:ID="NPobj">
                  <hasFunction rdf:value="Obj"/>
                  <filledBy rdf:value="NP"/>
                </SlotRealization>
              </slot>
            </Construction>
          </hasConstruction>
          <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
        </SyntacticFrame>
      </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
</rdf:RDF>

Page 76: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Entry Using Phrase

<Entry rdf:ID="eat1">
  <hasSynu rdf:parseType="Resource">
    <SynU rdf:ID="eat1-SynU">
      <example>John ate the cake</example>
      <hasSyntacticFrame>
        <SyntacticFrame rdf:ID="eat1SynFrame">
          <hasSelf>
            <Self rdf:ID="eat1Self">
              <headedBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#Vauxhave"/>
            </Self>
          </hasSelf>
          <hasConstruction>
            <Construction rdf:ID="eat1Const">
              <slot>
                <SlotRealization rdf:ID="NPsubj">
                  <hasFunction rdf:value="Subj"/>
                  <filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
                </SlotRealization>
              </slot>
              <slot>
                <SlotRealization rdf:ID="NPobj">
                  <hasFunction rdf:value="Obj"/>
                  <filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
                </SlotRealization>
              </slot>
            </Construction>
          </hasConstruction>
          <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
        </SyntacticFrame>
      </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>

Page 77: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Entry Using Construction

<Entry rdf:ID="eat1">
  <hasSynu rdf:parseType="Resource">
    <SynU rdf:ID="eat1-SynU">
      <example>John ate the cake</example>
      <hasSyntacticFrame>
        <SyntacticFrame rdf:ID="eat1SynFrame">
          <hasSelf>
            <Self rdf:ID="eat1Self">
              <headedBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#Vauxhave"/>
            </Self>
          </hasSelf>
          <hasConstruction rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Constructions#TransIntrans"/>
          <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
        </SyntacticFrame>
      </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
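The three versions of the "eat" entry differ only in how much they spell out inline and how much they pull from the shared repositories through rdf:resource. A rough way to see what an entry reuses is to list those references. The Python sketch below does so with the standard library's ElementTree; the `shared_references` helper is hypothetical, and the XML is a trimmed copy of the entry above wrapped in an rdf:RDF element so that it parses on its own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MLC_NS = "http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#"
DATCATS = "http://www.cs.vassar.edu/~ide/rdf/isle-datcats"

# Trimmed copy of the "Entry Using Construction" slide, with namespaces declared.
ENTRY = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{MLC_NS}">
  <SyntacticFrame rdf:ID="eat1SynFrame">
    <hasSelf>
      <Self rdf:ID="eat1Self">
        <headedBy rdf:resource="{DATCATS}/Phrases#Vauxhave"/>
      </Self>
    </hasSelf>
    <hasConstruction rdf:resource="{DATCATS}/Constructions#TransIntrans"/>
  </SyntacticFrame>
</rdf:RDF>
""".strip()


def shared_references(xml_text):
    """Return (property, fragment) pairs for every rdf:resource reference."""
    root = ET.fromstring(xml_text)
    refs = []
    for element in root.iter():
        target = element.get(f"{{{RDF_NS}}}resource")
        if target is not None:
            prop = element.tag.split("}", 1)[-1]  # drop the namespace part
            refs.append((prop, target.rsplit("#", 1)[-1]))
    return refs


print(shared_references(ENTRY))
```

This only inspects the XML surface; a real RDF toolkit would resolve the references into a graph instead.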

Page 78: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Semantic Representation

• The data model underlying RDF/UML, etc. is universal, abstract enough to capture all types of info
• Semantic representations:
  • Registry of basic data categories
    • "meta"-categories: addressee, utterance, etc.
    • Information categories: eyebrow movement, gestures, pitch, …
  • Supporting ONTOLOGY of information categories
• Interpretative procedures yield another level of meaning representation
  • Registry of categories…

[Diagram] UNINTERPRETED REPRESENTATION → INTERPRETATION PROCESS → INTERPRETED REPRESENTATION

Page 79: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE Lexical Data Category Registry (MDC)

• Instantiation of pre-defined lexical objects
• Extension of the shared class schema with lexicon-specific sub-classes and sub-properties
• Can be used "off the shelf" or as a departure point for the definition of new or modified categories
• Enables modular specification of lexical entities
  • eliminate redundancy
  • identify lexical entries or sub-entries with shared properties

Page 80: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MLC in RDF/S: features

[Diagram] mlm:feature is a property linking mlm:LexObject to mlm:Values. mlm:SemValues and mlm:SynValues are rdfs:subClassOf mlm:Values; mlm:semFeature and mlm:synFeature are rdfs:subPropertyOf mlm:feature.

features are properties of lexical objects

Page 81: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MLC in RDF/S: syntactic features

<rdf:Property rdf:ID="synCat">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#synFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynCatValues"/>
</rdf:Property>

<rdfs:Class rdf:ID="SynCatValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="#Noun"/>
    <owl:Thing rdf:about="#Verb"/>
    <owl:Thing rdf:about="#Adjective"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>

Callout on the owl:oneOf list: feature values

Page 82: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MLC in RDF/S: semantic features

<rdf:Property rdf:ID="domain">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#semFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#DomainValues"/>
</rdf:Property>

<rdfs:Class rdf:ID="DomainValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SemValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="#Finance"/>
    <owl:Thing rdf:about="#Medicine"/>
    <owl:Thing rdf:about="#Sport"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>

Callout on the owl:oneOf list: "domain ontology"

Page 84: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Synsets in RDF/S

<rdfs:Class rdf:ID="Synset">
  <rdfs:label>Synset</rdfs:label>
  <rdfs:comment>This class formalizes the notion of synset as defined in WordNet (Fellbaum 1998).</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#LexObject"/>
</rdfs:Class>

<rdf:Property rdf:ID="synsetRelation">
  <rdfs:domain rdf:resource="#Synset"/>
  <rdfs:range rdf:resource="#Synset"/>
</rdf:Property>

<rdf:Property rdf:ID="hypernym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet hypernym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>

<rdf:Property rdf:ID="meronym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet meronym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>

Callouts: synsetRelation is the relation between synsets; hypernym and meronym are different types of synset relations

Page 85: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

WordNet 1.7 Synsets

<mlm:Synset rdf:about="http://www.cogsci.princeton.edu/~wn1.7/concept#01752990" mlm:source="WordNet1.7">
  <mlm:gloss>A member of the genus Canis</mlm:gloss>
  <mlm:word>dog</mlm:word>
  <mlm:word>domestic dog</mlm:word>
  <mlm:word>Canis familiaris</mlm:word>
  <mdc:synCat rdf:resource="#Noun"/>
  <mdc:domain rdf:resource="#Zoology"/>
  <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/~wn1.7/concept#01752283"/>
</mlm:Synset>

Callouts: features (mdc:synCat, mdc:domain); hypernym (mdc:hypernym)
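Declaring hypernym and meronym as sub-properties of synsetRelation is what makes simple inferencing possible: any hypernym statement is also a synsetRelation statement. The Python sketch below illustrates that one inference step over plain tuples. It is not an RDF engine; the `infer` function and the shortened `wn:` identifiers are assumptions made for the example.

```python
# Sub-property declarations, as in the schema: child -> parent.
SUBPROPERTY_OF = {
    "hypernym": "synsetRelation",
    "meronym": "synsetRelation",
}


def infer(triples):
    """Close a set of (subject, property, object) triples under sub-properties."""
    closed = set(triples)
    pending = list(triples)
    while pending:
        subject, prop, obj = pending.pop()
        parent = SUBPROPERTY_OF.get(prop)
        if parent is not None:
            derived = (subject, parent, obj)
            if derived not in closed:
                closed.add(derived)
                pending.append(derived)  # the parent may itself have a parent
    return closed


# The hypernym link of the "dog" synset, with abbreviated identifiers.
facts = {("wn:01752990", "hypernym", "wn:01752283")}
closure = infer(facts)

print(sorted(closure))
```

A query for all synsetRelation links then finds the hypernym link without the lexicon having to state it twice.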

Page 86: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Foundations of the Mapping Experiment

Page 87: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

1. The MILE building-block model

• The MILE Lexical Classes and the MILE Lexical Data Categories are the main building blocks of the MILE lexical architecture
• Building blocks allow two kinds of reusability:
  • intra-lexicon reusability (within the same lexicon)
  • inter-lexicon reusability (among different lexicons)

Page 88: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

How do building blocks work?

[Diagram] Lexical Objects (syntactic frame, phrase, slot, Synfeature, Semfeature) drawn on by Lexical entry 1, Lexical entry 2 and Lexical entry 3

Page 89: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

2. MILE: a meta-entry

• MILE is
  • a general schema for multilingual lexical resources
  • a lexical meta-entry, a common representational layer for multilingual lexicons
• Computational lexicons can be viewed as different instances of the MILE schema

[Diagram] MILE above lexicon#1, lexicon#2, lexicon#3

Page 90: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MILE and Content Interoperability

This common, shared, compatible representation of lexical objects is particularly suited to:
• manipulate objects available in different lexical resources
• understand their deep semantics
• apply the same operations to lexical objects of the same type

These are key elements of Content Interoperability

Page 91: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The Mapping Experiment: Why?

• It is a concrete experiment aimed at testing the expressive potential and capabilities of the MILE
• The idea is that if the MILE atomic notions, combined in different ways, suit the different "visions" underlying two lexicons such as FrameNet and NOMLEX, then
  • the MILE will come out strengthened
  • its adoption as an interface between differently conceived lexical architectures can be pushed further
  • key issues for content interoperability between resources can be addressed

Page 92: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The mapping scenarios

1. High-level mapping of the objects of a lexicon onto the objects of the abstract model
  • the native structure is maintained and no format conversion is performed

2. Translation of instances of lexical entries directly into MILE
  • MILE acts as a true interchange format

Page 93: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

FrameNet to MILE

Page 94: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

FrameNet-MILE: Observations

The mapping is promising
• Frame ↔ Predicate (primitive)
• Frame Elements ↔ Argument (enlarge the set of possible values)
• Lexical_Unit ↔ SemU
• Link SemU-Predicate (obligatory) should become underspecified

But …
• The lack of an inheritance mechanism in the Predicate makes it impossible to represent the hierarchical organization of Frames and Sub-frames, temporal ordering among Frames, and subsumption relations among Frames
• We could add a new object PredicateRelation to describe the relations that hold between predicates and sub-predicates
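The correspondences above, and the proposed PredicateRelation object, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation labels (`inherits_from`, `subframe_of`, `precedes`), the helper `related`, and the frame names used as sample data are hypothetical and are not taken from the MILE specification.

```python
from dataclasses import dataclass

# Correspondences listed on the slide (FrameNet object -> MILE object).
FRAMENET_TO_MILE = {
    "Frame": "Predicate",
    "FrameElement": "Argument",
    "LexicalUnit": "SemU",
}


@dataclass(frozen=True)
class PredicateRelation:
    """Hypothetical object linking two predicates by a named relation."""
    source: str
    relation: str  # assumed labels: "inherits_from", "subframe_of", "precedes"
    target: str


def related(relations, source, relation):
    """Follow one relation type transitively, starting from `source`."""
    found, frontier = [], [source]
    while frontier:
        current = frontier.pop()
        for r in relations:
            if r.source == current and r.relation == relation and r.target not in found:
                found.append(r.target)
                frontier.append(r.target)
    return found


# Illustrative sample data only.
relations = [
    PredicateRelation("Commerce_buy", "inherits_from", "Getting"),
    PredicateRelation("Arrest", "subframe_of", "Criminal_process"),
    PredicateRelation("Arrest", "precedes", "Arraignment"),
]

parents = related(relations, "Commerce_buy", "inherits_from")
```

Keeping the relation outside the Predicate itself, as a separate object, matches the modular style of the model: hierarchy, sub-frame structure and temporal ordering can all be expressed without changing the Predicate class.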

Page 95: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

[Diagram: NOMLEX to MILE mapping. NOMLEX feature shown: :nom-type ((subject)). MILE objects shown: MLC:SynU, MLC:SemU, MLC:CorrespSynUSemU, MLC:SemanticFrame (TypeOfLink: Agentnom; IncludedArg: 0), MLC:Predicate, MLC:Argument, MLC:Argument]

Page 96: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

NOMLEX-MILE: Observations

The mapping is promising
• Notions represented in NOMLEX have a correspondent in MILE

But ...
• they are expressed with two opposite lexical structures
• In NOMLEX,
  • lexical information is expressed in a very compact way
  • there are no clear-cut boundaries between the levels of linguistic description
• In MILE,
  • compressed information has to be decompressed and spread over different MILE lexical layers and objects: SynU, SemU, and SemanticFrame with its Predicate and relevant Arguments, to account for the incorporation of the Agent.
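A minimal sketch of this "decompression" step, assuming a NOMLEX-style record held as a Python dictionary. The field names (`orth`, `verb`, `nom-type`), the identifier patterns and the function `decompress_nomlex` are hypothetical; only the target objects (SynU, SemU, CorrespSynUSemU, SemanticFrame with TypeOfLink Agentnom and IncludedArg 0) come from the slides.

```python
def decompress_nomlex(entry):
    """Spread one compact NOMLEX-style record over separate MILE-style objects."""
    noun, verb = entry["orth"], entry["verb"]
    synu_id, semu_id = f"SYNU{noun}N", f"SEMU{noun}"  # assumed ID pattern

    mile = {
        "SynU": {"id": synu_id},   # syntactic layer
        "SemU": {"id": semu_id},   # semantic layer
        "CorrespSynUSemU": {       # linking layer
            "hasSourceSynU": synu_id,
            "hasTargetSemU": semu_id,
        },
    }

    # A subject nominalization incorporates the Agent: the noun itself
    # fills argument 0 of the predicate of the underlying verb.
    if "subject" in entry.get("nom-type", ()):
        mile["SemanticFrame"] = {
            "TypeOfLink": "Agentnom",
            "IncludedArg": 0,
            "Predicate": {"id": f"PRED{verb}"},  # assumed ID pattern
        }
    return mile


# Illustrative input only.
teacher = {"orth": "teacher", "verb": "teach", "nom-type": ("subject",)}
mile_teacher = decompress_nomlex(teacher)
```

One compact feature on the NOMLEX side ends up touching three MILE layers, which is the structural mismatch the slide describes.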

Page 97: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Lessons Learned from the mapping

The results of the experiments are promising
• FrameNet offers the chance to confront two similar lexical models whose lexical objects do not overlap perfectly: a test of the adequacy of the linguistic objects
• NOMLEX gives the opportunity to work with two lexicons where linguistic notions correspond but are expressed with an opposite lexicon structure: a test of the adequacy of the architectural model

The high granularity and modularity of MILE
• allow compatibility with differently packaged linguistic objects
• allow the addition of new objects and relations without distorting the general architecture

Page 98: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

RDF and MILE: Why?

Some reasons (from Nancy Ide et al. 2003)
• MILE as a hierarchy of lexical objects built up by combining data categories via clearly defined relations is an ideal structure for rendering in RDF
• RDF mechanism, with the capacity of expressing named relations between objects, offers a web-based means to represent the MILE architecture
• RDF representation of linguistic information is an invaluable resource for language processing applications in the Semantic Web
• RDF description and instantiation is in line with the goal of ISO TC37 SC4

Page 99: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

RDF Representation of MILE

MILE was already supplied with
• an RDF schema for the MILE Syntactic Layer
• an instantiation of pre-defined syntactic objects

We increased the repository of shared lexical objects with the RDF description and (partial!) instantiations of the objects of the semantic and linking layers

This has been carried out with the intent to
• submit it within ISO TC37/SC4
• foster the adoption of MILE, by offering a library of ready-to-use RDF objects

Page 100: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

An RDF Schema for the synt-sem linking

Classes

<!-- An RDF Schema for ISLE lexical entries v 0.1 2004/05/05 Author: Monachini -->
<!-- assumption: "mlcl" is a placeholder prefix for the linking schema; the slide reuses "mlc" for both schema URIs -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:mlc="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#"
         xmlns:mlcl="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#">

  <!-- ISLE/MILE lexical objects (classes for the synt-sem linking) -->

  <rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU">
    <rdfs:label>CorrespSynUSemU</rdfs:label>
    <rdfs:comment>This class links a SynU to a SemU</rdfs:comment>
  </rdfs:Class>

  <rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp">
    <rdfs:label>PredicativeCorresp</rdfs:label>
    <rdfs:comment>This class contains the associations between the syntactic slots and the semantic arguments</rdfs:comment>
  </rdfs:Class>

  <rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#SlotArgCorresp">
    <rdfs:label>SlotArgCorresp</rdfs:label>
    <rdfs:comment>This class links a syntactic slot to a semantic argument</rdfs:comment>
  </rdfs:Class>

Page 101: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

An RDF Schema for the synt-sem linking

Properties

  <!-- Properties (relations) between objects and between objects and atomic values -->

  <rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasSourceSynU">
    <rdfs:label>hasSourceSynU</rdfs:label>
    <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
    <rdfs:range rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#SynU"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasTargetSemU">
    <rdfs:label>hasTargetSemU</rdfs:label>
    <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
    <rdfs:range rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#SemU"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasPredicativeCorresp">
    <rdfs:label>hasPredicativeCorresp</rdfs:label>
    <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
    <rdfs:range rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#includesSlotArgCorresp">
    <rdfs:label>includesSlotArgCorresp</rdfs:label>
    <rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp"/>
    <rdfs:range rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#SlotArgCorresp"/>
  </rdf:Property>
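To see what these property declarations say, a short Python sketch can read them with the standard library and list each property with its domain and range. The wrapper `<rdf:RDF>` element, the constant names and the function `property_signatures` are assumptions made for this example; only two of the four properties are reproduced to keep it short.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
LINK = "http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#"
MLC = "http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#"

# Two property declarations from the slide, wrapped in a root element.
SCHEMA = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdf:Property rdf:about="{LINK}hasSourceSynU">
    <rdfs:label>hasSourceSynU</rdfs:label>
    <rdfs:domain rdf:resource="{LINK}CorrespSynUSemU"/>
    <rdfs:range rdf:resource="{MLC}SynU"/>
  </rdf:Property>
  <rdf:Property rdf:about="{LINK}includesSlotArgCorresp">
    <rdfs:label>includesSlotArgCorresp</rdfs:label>
    <rdfs:domain rdf:resource="{LINK}PredicativeCorresp"/>
    <rdfs:range rdf:resource="{LINK}SlotArgCorresp"/>
  </rdf:Property>
</rdf:RDF>"""


def local_name(uri):
    """Return the part of a URI after '#'."""
    return uri.rsplit("#", 1)[-1]


def property_signatures(xml_text):
    """Map each property name to its (domain, range) class names."""
    root = ET.fromstring(xml_text)
    signatures = {}
    for prop in root.findall(f"{{{RDF}}}Property"):
        name = local_name(prop.get(f"{{{RDF}}}about"))
        domain = prop.find(f"{{{RDFS}}}domain").get(f"{{{RDF}}}resource")
        range_ = prop.find(f"{{{RDFS}}}range").get(f"{{{RDF}}}resource")
        signatures[name] = (local_name(domain), local_name(range_))
    return signatures


signatures = property_signatures(SCHEMA)
```

Read this way, the schema states that a CorrespSynUSemU points to a SynU defined in the syntactic-layer schema, while a PredicativeCorresp groups SlotArgCorresp objects from the linking schema.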

Page 102: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The library of Pre-instantiated objects

Enable modular specification of lexical entities
• eliminate redundancy
• identify lexical entries or sub-entries with shared properties
• create ready-to-use packages that can be combined in different ways

Can be used “off the shelf” or as a departure point for the definition of new or modified categories

Page 103: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

MDCR for some objects

<!-- Sample LDCR entry for a PredicativeCorresp and SlotArgCorresp objects
     DataCats for ISLE lexical entries
     v 0.1 2004/05/17
     Author: Monachini -->

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         … …

Pre-instantiated PredicativeCorresp

  <PredicativeCorresp rdf:ID="isobivalent">
    <includesSlotArgCorresp rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/SlotArgCorresp#Arg0Slot0"/>
    <includesSlotArgCorresp rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/SlotArgCorresp#Arg1Slot1"/>
  </PredicativeCorresp>

Pre-instantiated SlotArgCorresp

  <SlotArgCorresp rdf:ID="Arg0Slot0" SlotNumber="0" ArgNumber="0"></SlotArgCorresp>
  <SlotArgCorresp rdf:ID="Arg1Slot1" SlotNumber="1" ArgNumber="1"></SlotArgCorresp>

</rdf:RDF>

Page 104: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

A Sample Entry in MILE

The entry is shown in two alternative forms:

1. with the full specification of a lexical object PredicativeCorresp

2. with an already instantiated object PredicativeCorresp

The advantage of the second form is that the object does not need to be specified in the entry and can be used and reused in other entries

The aim is to explore the potential of MILE for the representation of lexical data

Page 105: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Sample full entry for amare (V)

The “full” object PredicativeCorresp

        <!-- The SynU SemU link -->
        <correspondsTo>
          <CorrespSynUSemU>
            <hasSourceSynU mlcp:ID="SYNUamareV">
            </hasSourceSynU>
            <hasTargetSemU mlcp:ID="SEMUamareEXPEVE">
            </hasTargetSemU>
            <hasPredicativeCorresp>
              <PredicativeCorresp mlcp:ID="amare-PredCorresp">
                <includesSlotArgCorresp>
                  <SlotArgCorresp SlotNumber="0" ArgNumber="0">
                  </SlotArgCorresp>
                  <SlotArgCorresp SlotNumber="1" ArgNumber="1">
                  </SlotArgCorresp>
                </includesSlotArgCorresp>
              </PredicativeCorresp>
            </hasPredicativeCorresp>
          </CorrespSynUSemU>
        </correspondsTo>
      </SynU>
    </hasSynu>

Page 106: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

… the abbreviated entry

Instantiated object PredicativeCorresp

        <!-- The SynU SemU link -->
        <correspondsTo>
          <CorrespSynUSemU>
            <hasSourceSynU mlcp:ID="SYNUamareV">
            </hasSourceSynU>
            <hasTargetSemU mlcp:ID="SEMUamareEXPEVE">
            </hasTargetSemU>
            <hasPredicativeCorresp rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/PredicativeCorresp#isobivalent"/>
          </CorrespSynUSemU>
        </correspondsTo>
      </SynU>
    </hasSynu>
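The relation between the abbreviated entry and the full one can be sketched as reference resolution against a library of pre-instantiated objects. The in-memory `LIBRARY` dictionary and the function `resolve` are assumptions for illustration; a real setup would fetch the RDF objects from the data category repository rather than from a Python dict.

```python
BASE = "http://www.ilc.cnr.it/clips/rdf/isle-datacats/"

# Hypothetical in-memory copy of the library of pre-instantiated objects.
LIBRARY = {
    "SlotArgCorresp#Arg0Slot0": {"SlotNumber": 0, "ArgNumber": 0},
    "SlotArgCorresp#Arg1Slot1": {"SlotNumber": 1, "ArgNumber": 1},
    "PredicativeCorresp#isobivalent": {
        "includesSlotArgCorresp": [
            BASE + "SlotArgCorresp#Arg0Slot0",
            BASE + "SlotArgCorresp#Arg1Slot1",
        ]
    },
}


def resolve(value):
    """Replace every library reference with a full copy of the object it names."""
    if isinstance(value, str) and value.startswith(BASE):
        return resolve(LIBRARY[value[len(BASE):]])
    if isinstance(value, dict):
        return {key: resolve(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item) for item in value]
    return value


# The abbreviated entry: the PredicativeCorresp is only referenced.
abbreviated = {
    "hasSourceSynU": "SYNUamareV",
    "hasTargetSemU": "SEMUamareEXPEVE",
    "hasPredicativeCorresp": BASE + "PredicativeCorresp#isobivalent",
}

full = resolve(abbreviated)
```

After resolution, `full["hasPredicativeCorresp"]` holds the two slot-argument pairs that the full entry spells out inline, so any other bivalent verb could reuse the same `isobivalent` object.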

Page 107: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

The RDF Schema, the DCR for MILE objects and the entries are available at www.ilc.cnr.it/clips/rdf/

Page 108: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

… and INTERA?

INTERA Multilingual Terminological Lexica will follow and merge the two frameworks
• The MILE and
• ISO TMF (Terminological Markup Framework)

Page 109: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Beyond MILE: future work

MILE Lexical Model oriented towards an Open Distributed Lexical Infrastructure:
• Lexical Information Servers for multiple access to lexical information repositories
• Enhance user-adaptivity, resource sharing, cooperative creation
• Develop integration and interchange tools

Page 110: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Broadening MILE: ... other languages

Ongoing enlargement to Asian languages (Chinese, Japanese, Korean, Thai, Hindi ...)
• promote common initiatives between Asia & Europe (e.g. within the EU 6th FP)

The creation of an Open Distributed Lexical Infrastructure, also supported by Asian Institutions:
• AFNLP
• University of Tokyo (Dept. of Computer Science)
• Korean KAIST and KORTERM
• Academia Sinica (Taiwan)
• …

To valorise results & increase visibility of LR & standardisation initiatives in a world-wide context, while concretely promoting the launch of a new common platform for multilingual LR creation & management

Page 111: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Using semantically tagged corpora to …

• acquire semantic info and enhance Lexicons
• evaluate the disambiguating power of the semantic types of the lexicon
• assess the need to integrate lexicons with attested senses and/or phraseology
• identify the inadequacy of sense distinctions in lexicons
• check actual frequency of known senses in different text types
• have a more precise and complete view of the semantics of a lemma
  • identify the most general senses
  • capture the most specific shifts of meaning

Capture just the core, basic distinctions in a core lexicon

Corpus analysis must not lead to excessive granularity of sense distinctions, but draw a distinction between
• sense discrimination (to be kept “under control”): clustering (manually or automatically)
• additional, more granular information (often of collocational nature) which can/must be acquired/encoded within the broader senses, e.g. to help translation

Page 112: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

… Dynamic lexicon

Current computational lexicons (even WordNets) are static objects, still shaped on traditional dictionaries and suffering from the limitations imposed by the paper medium

Thinking about the complex relationships between lexicon and corpus leads towards a flexible model of dynamic lexicon
• extending the expressiveness of a core static lexicon
• adapting to the requirements of language in use as attested in corpora
• with semantic clustering techniques, etc.

Convert the extreme flexibility & multidimensionality of meaning into large-scale and exploitable (VIRTUAL?) resources

a Lexicon and Corpus together

Page 113: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

What to annotate?

Mix of:
• Word-sense annotation (implicit semantic markup)
• Semantic/conceptual markup
• …

• Syntagmatic relations
• Dependency relations
• Semantic roles
• …

Page 114: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Need for a common Encoding Policy?

Agree on common policy issues?
• is it feasible?
• desirable?
• to what extent?

This would imply, among others:
• analysis of needs (also applicative/industrial) before any large development initiative
• base semantic tagging on commonly accepted standards/guidelines? up to which level?
• Common semantic tagset: Gold Standard??
• build a core set of semantically tagged corpora, encoded in a harmonised way, for a number of languages??
• make annotated corpora available to the community at large
• involve the community, collect and analyse existing semantically tagged corpora
• devise a common set of parameters for analysis

Page 115: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

A few Issues for discussion: MILE & lexicon standards
More standardisation initiatives?

• MILE - a general schema for encoding multilingual lexical info, as a meta-entry, as a common representational layer
• Short & medium term requirements wrt standards for multilingual lexicons and content encoding, also industrial requirements
• Relation with Spoken language community (see ELRA)
• Semantic Web standards & the needs of content processing technologies: importance of reaching consensus on (linguistic & non-linguistic) “content”, in addition to agreement on formats & encoding issues (… words convey content & knowledge)
• Define further steps necessary to converge on common priorities

Page 116: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Broadening MILE: ... other communities

NLP, lexicons, terminologies, ontologies, Semantic Web: a continuum?

Knowledge management is critical.

For “content” interoperability, need to converge around agreed standards also for the semantic/conceptual level
• Is the field ‘mature’ enough to converge around agreed standards also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?
• Is the field of multilingual lexical resources ready to tackle the challenges set by the Semantic Web development?

Foster better integration with
• corpus-driven data
• terminology/ontology/semantic web communities
• multimodal & multimedial aspects

Oriented towards open, distributed lexical resources:
• Lexical Information Servers for multiple access to lexical information repositories

Page 117: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

A few Issues for discussion: NLP, lexicons, content, ontologies, Semantic Web: … a continuum?

• Need for robust systems, able to acquire/tune multilingual lexical/linguistic/conceptual knowledge, to auto-enrich static basic resources
• Relation betw. lexical standards & acquisition & text annotation protocols

Page 118: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Target….. Multilingual Knowledge Management

Technical Feasibility:

Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level, an achievable goal (to be able to automatically establish links among different languages)?

Yes, at the lexical level

More complex, for corpus annotation?

EAGLES/ISLE

Page 119: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

To make the Semantic Web a reality ...

… need to tackle the twofold challenge of content availability & multilinguality

Natural convergence with HLT:
• multilingual semantic processing
• ontologies
• semantic-syntactic computational lexicons

Page 120: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

… enables a new role of Multilingual Lexicons: to become an essential component for the Semantic Web

• Language - & lexicons - are the gateway to knowledge
• Semantic Web developers need repositories of words & terms - & knowledge of their relations in language use & ontological classification
• The cost of adding this structured and machine-understandable lexical information can be one of the factors that delays its full deployment
• The effort of making available millions of ‘words’ for dozens of languages is something that no small group is able to afford
• A radical shift in the lexical paradigm - whereby many participants add linguistic content descriptions in an open distributed lexical framework - is required to make the Web usable

Page 121: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Beyond MILE: next steps... towards an Open Distributed Lexical Infrastructure

• Create a first repository of shared lexical entries “extracted” from different lexical resources & mapped to MILE (choosing e.g. lexical entries in areas related to the Olympic Games)
  • to test mapping different lexicon models to MILE
  • provide a grid with all the ISLE Basic Notions, short descriptions, attributes and sub-elements, to be filled with the corresponding “notions”
• Create a list (Open Lexicon Interest Group)
• ...

• Enhance user-adaptivity, resource sharing, cooperative creation & management
• Lexical Information Servers for multiple access to lexical information repositories

[Diagram labels: Language, Knowledge]

Page 122: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Pisa, September 2004

A new paradigm for a “new generation” of LR?

New Strategic Vision towards a Distributed Open Lexical Infrastructure

Focus on cooperation, also between different communities

• for distributed & cooperative creation, management, etc. of Lexical Resources
• MILE as a common platform
• technical & organisational requirements

Page 123: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Beyond MILE: towards open & distributed lexicons

Semantic Lexicon
URI = http://www.xxx…

Syntactic Constructions
URI = http://www.yyy…

Ontology
URI = http://www.zzz…

Monolingual/Multilingual Lexicon

Lex_object: semFeature
URI = http://www.xxx…#HUMAN

Lex_object: syntagmaNT
URI = http://www.zzz…#NP

corpora
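
The idea on this slide, a lexicon entry that points by URI to lexical objects kept in separate, distributed resources, could be sketched roughly as follows. This is only an illustration, not the MILE data model: the `Repository` and `LexicalEntry` classes, the `resolve` helper, the `example.org` addresses, and the lemma "teacher" are all hypothetical, since the slide gives only elided URIs.

```python
from dataclasses import dataclass, field
from urllib.parse import urldefrag


@dataclass
class Repository:
    """A hypothetical lexical resource published at a base URI."""
    base_uri: str
    objects: dict = field(default_factory=dict)  # fragment id -> description


@dataclass
class LexicalEntry:
    """An entry that links to shared objects by URI instead of copying them."""
    lemma: str
    links: dict = field(default_factory=dict)  # object type -> URI


def resolve(uri, repositories):
    """Split a URI into base and fragment, then look the fragment up
    in the repository registered under that base (None if unknown)."""
    base, fragment = urldefrag(uri)
    repo = repositories.get(base)
    if repo is None:
        return None
    return repo.objects.get(fragment)


# Placeholder addresses (assumptions); the slide's real URIs are elided.
SEM = Repository("http://example.org/semantic-lexicon",
                 {"HUMAN": "semantic feature: human"})
ONT = Repository("http://example.org/ontology",
                 {"NP": "syntagma: noun phrase"})
REPOS = {repo.base_uri: repo for repo in (SEM, ONT)}

entry = LexicalEntry(
    lemma="teacher",  # illustrative lemma, not from the slides
    links={
        "semFeature": SEM.base_uri + "#HUMAN",
        "syntagmaNT": ONT.base_uri + "#NP",
    },
)

# Follow every link of the entry to the resource that owns the object.
resolved = {kind: resolve(uri, REPOS) for kind, uri in entry.links.items()}
```

Because the entry stores only URIs, the semantic lexicon and the ontology can be maintained by different groups and updated without touching the entries that refer to them.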

Page 124: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

A few issues for the future...

• Integration between WLR/SLR/MMR (see e.g. LREC)
• Integration between LRs & SemWeb
• Integration of Lexicons/Terminologies/Ontologies: towards Knowledge Resources
• Multilingual Resources: an open infrastructure
• Integration of Lexicon/Corpus (see e.g. FrameNet)
• Parallel evolution of LRs & LTechnology

Page 125: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

from Computational Lexicons to Knowledge Resources

Unified framework for lexicons, ontologies, terminologies, etc.

Towards an open, distributed infrastructure for lexical resources
• Lexical Information Servers
• flexible and extensible
• integrated with multimodal and multimedia data
• integrated with Web technology
• related initiatives: INTERA, ICWLRE

Page 126: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

… with a world-wide participation

looking for an appropriate call

… pushing to launch an Open & Distributed Lexical Infrastructure for content description and content interoperability, to make lexical resources usable within the emerging Semantic Web scenario

for Language Resources & Semantic Web….

Page 127: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

How to get to a framework allowing incremental creation/merging/…

How to:
• "organise" creation/acquisition of multilingual LRs: evaluate different models
• cope with/affect maintenance
• organise technology transfer among languages
• support BLARK (a commonly agreed list of minimal requirements for “national” LRs)
• launch an international initiative linking Semantic Web & LRs
• bootstrap this by "opening" a few LRs
• role of standards

Page 128: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Lexical WEB & Content Interoperability

As a critical step for semantic mark-up in the SemWeb

[Diagram labels: ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y, MILE; with intelligent agents??]

Page 129: Infrastructural  Language Resources  &  Standards for Multilingual Computational Lexicons

Semantic Lexicon

http://www.xxx…

Syntactic Lexicon

http://www.yyy…

Ontology

http://www.zzz…

corpora

A new paradigm for a “new generation” of LRs?

Cross-lingual links

