Post on 18-Dec-2015
REPORT on Computational Lexicon Working Group on Multilingual Lexicon
EU-WG Meeting
December 1st-2nd 2000, Pisa
UPenn, December 11 2000
The Multilingual ISLE Lexical Entry (MILE)
General methodological principles (from EAGLES):
1. Basic requirements for the MILE:
Modular and layered
Granular
Allow for underspecification
ISLE should discover and list (the maximal set of) basic notions to be included in the MILE
The leading principle for the design of the MILE should be the edited union of existing lexicons / models (redundancy should not be a problem)
MILE
3. Objective: definition of MILE, its basic notions and architecture, such that we can write a DTD & have a tool to support it
discover a methodology of work towards this
Some advantages:
Flexibility of representation
Easy to customise and update
Easy integration of existing resources
High versatility towards different applications
Modularity in at least three respects:
in the macrostructure and general architecture of the MILE
in the microstructure of the MILE
in the specific microstructure of the MILE word-sense
Modularity in MILE
A. Modularity in the macrostructure and general architecture of the MILE
1. Meta-information - versioning of the lexicon, languages, updates, status, project, origin, etc. (see e.g. OLIF, GENELEX)
2. Possible architecture(s) of multilingual lexicon(s) - interactions of the different modules within the general structure. Issues related to transfer-based and interlingua-based approaches, and hybrid solutions.
Modularity in MILE
B. Modularity in the microstructure of the MILE – The MILE could be organized in at least the following modules:
1. Monolingual linguistic representation
2. Collocational information
3. Multilingual apparatus (e.g. transfer conditions and actions)
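As a rough illustration of the three-module microstructure listed above, the sketch below models an entry as three independent record types. All class and field names, and the sample entry, are assumptions made for illustration; they are not part of any ISLE or PAROLE/SIMPLE specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MonolingualModule:
    """B.1: morphosyntactic, syntactic and semantic information."""
    lemma: str
    pos: str
    # None means the feature is left underspecified for this entry.
    gender: Optional[str] = None
    subcat_frames: List[str] = field(default_factory=list)
    semantic_type: Optional[str] = None


@dataclass
class CollocationalModule:
    """B.2: typical collocates and more or less fixed patterns."""
    collocates: List[str] = field(default_factory=list)
    support_verbs: List[str] = field(default_factory=list)


@dataclass
class MultilingualModule:
    """B.3: correspondences to entries in other languages."""
    # Maps a language code to candidate target lemmas.
    equivalents: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class MileEntry:
    """Hypothetical container: only the monolingual module is mandatory."""
    monolingual: MonolingualModule
    collocational: CollocationalModule = field(default_factory=CollocationalModule)
    multilingual: MultilingualModule = field(default_factory=MultilingualModule)


# Illustrative entry; the values are examples, not data from any lexicon.
entry = MileEntry(
    monolingual=MonolingualModule(lemma="decision", pos="noun"),
    collocational=CollocationalModule(support_verbs=["make", "take"]),
    multilingual=MultilingualModule(equivalents={"it": ["decisione"]}),
)
```

Keeping the modules as separate types means an existing monolingual resource could be loaded without any collocational or multilingual data, which is one possible reading of the "modular and layered" requirement.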
Monolingual Linguistic Representation
• It includes the morphosyntactic, syntactic, and semantic information characterizing the MILE in a certain source language.
• It possibly corresponds to the typology of information contained in existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet (EWN), COMLEX, FrameNet, etc.
Monolingual Linguistic Representation: a Provisional List
Morphological layer
• Grammatical category and subcategory
• Gender, number, person, mood
• Inflectional class
• Modifications of the lemma
• Mass/count, 'pluralia tantum'
• …
Monolingual Linguistic Representation: a Provisional List
Syntactic layer
• Idiosyncratic behaviour with respect to specific syntactic rules (passivisation, middle, etc.)
• Attributive vs. predicative function, gradability
• List of syntactic positions forming subcategorization frames
• Syntactic constraints and properties of the possible 'slot filler'
• Morphosyntactic and/or lexical features (agreement, auxiliary, prepositions and particles introducing clausal complements)
• Information on control (subject control, object control, etc.) and raising properties
• …
Monolingual Linguistic Representation: a Provisional List
Semantic layer
• Characterization of senses through links to an Ontology
• Domain information, gloss
• Argument structure, semantic roles, selectional preferences
• Event type for verbs, to characterize their actionality behaviour
• Link to the syntactic realization of the arguments
• Basic semantic relations between word senses (synonymy / synset, hyponymy, meronymy, etc.)
• Semantic/world-knowledge relations among word senses (such as EWN relations and SIMPLE Qualia Structure)
• Information about regular polysemous alternations
• Information concerning cross-part-of-speech relations
• …
Collocational Information
More or less typical and/or fixed syntactic-semantic patterns
• Typical or idiosyncratic syntactic constructions
• Typical collocates
• Support verb construction
• Phraseological or multiword constructions
• Compounds (e.g. noun-noun, noun-PP, adjective-noun, etc.)
• Corpus-driven examples of MILE
• …
Multilingual Apparatus
Transfer conditions and actions
• possible starting points: OLIF, GENELEX, etc.
• devise possible cases of problematic transfer (cf. e.g. the list of linguistic phenomena circulated)
• identify which conditions must be expressible and which transformation actions are necessary
• select which types of information these conditions must access
• examine the variability in granularity needed when translating into different languages, and the architectural implications of this
• which role for an Interlingua?
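To make the idea of "conditions" and "actions" concrete, here is a minimal sketch of a transfer rule as a pair of functions over a simple frame. The `Frame` layout, the `TransferRule` type and the rule table are hypothetical and are not taken from OLIF or GENELEX; the English-Italian examples (know → sapere/conoscere, like → piacere with argument switching) are standard textbook cases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A source or target frame, e.g. {"lemma": "know", "obj_type": "clause"}.
Frame = Dict[str, str]


@dataclass
class TransferRule:
    source_lemma: str
    condition: Callable[[Frame], bool]  # when the rule applies
    action: Callable[[Frame], Frame]    # how to build the target frame


def apply_transfer(frame: Frame, rules: List[TransferRule]) -> Optional[Frame]:
    """Return the target frame built by the first matching rule, or None."""
    for rule in rules:
        if rule.source_lemma == frame["lemma"] and rule.condition(frame):
            return rule.action(frame)
    return None


EN_IT_RULES = [
    # Lexical choice conditioned on the type of the object.
    TransferRule("know", lambda f: f.get("obj_type") == "clause",
                 lambda f: {**f, "lemma": "sapere"}),
    TransferRule("know", lambda f: f.get("obj_type") == "np",
                 lambda f: {**f, "lemma": "conoscere"}),
    # Argument switching: the English subject becomes the Italian indirect object.
    TransferRule("like", lambda f: True,
                 lambda f: {"lemma": "piacere", "subj": f["obj"], "iobj": f["subj"]}),
]

print(apply_transfer({"lemma": "like", "subj": "Mary", "obj": "music"}, EN_IT_RULES))
```

The conditions here only inspect syntactic information; in a fuller model they would also need access to the semantic and collocational information types discussed above.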
Modularity in MILE
C. Modularity in the specific microstructure of the MILE word-sense
Word-senses are the basic units at the multilingual level
Senses should also have a modular structure
Coarse-grained (general purpose) characterisation in terms of prototypical properties, captured by the formal means in (B.1)
Fine-grained (domain or text dependent) characterisation mostly in terms of collocational/syntagmatic properties (B.2) (particularly useful for specific tasks, such as WSD and translation)
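One way to picture the two layers of a word-sense is sketched below: a coarse-grained ontology label that is always available, plus fine-grained collocational cues used when context is present. The `WordSense` type, the sense identifiers, the ontology labels and the collocate sets are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class WordSense:
    sense_id: str
    ontology_type: str                                 # coarse-grained layer (B.1)
    collocates: Set[str] = field(default_factory=set)  # fine-grained layer (B.2)


def pick_sense(senses: List[WordSense], context: Set[str]) -> WordSense:
    """Pick the sense whose collocates overlap most with the context.

    With no overlap at all, max() returns the first sense in the list,
    which then acts as the general-purpose default.
    """
    return max(senses, key=lambda s: len(s.collocates & context))


# Hypothetical senses of English "bank".
BANK = [
    WordSense("bank_1", "Institution", {"account", "loan", "money"}),
    WordSense("bank_2", "GeographicalLocation", {"river", "water", "shore"}),
]

chosen = pick_sense(BANK, {"river", "fishing"})
print(chosen.sense_id, chosen.ontology_type)
```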
MILE
A. MILE Macrostructure
Meta-information
Architecture
B. MILE Microstructure
1. Monolingual
2. Collocational
3. Multilingual
C. Word-Sense Microstructure
1. Coarse-grained
2. Fine-grained
Monolingual Linguistic Representation
A strategy:
• consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models)
• evaluate their notions wrt EAGLES recommendations for syntax and semantics
• evaluate their usefulness & adequacy for multilingual tasks
• evaluate integrability of their notions in a unitary MILE
• look for deficient areas.
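The strategy above can be sketched as a small set operation: merge the notions of several lexicon models under agreed names, record which models supply each notion, and list the required notions that no model covers. The function names, the alias table and the toy models `lexicon_A` and `lexicon_B` are assumptions; they do not describe the actual content of any existing lexicon.

```python
from typing import Dict, Iterable, List, Set


def edited_union(models: Dict[str, Set[str]],
                 aliases: Dict[str, str]) -> Dict[str, Set[str]]:
    """Merge notions from all models, mapping variant names to one label.

    Returns a map from each canonical notion to the models that provide it.
    Redundancy between models is kept rather than treated as an error.
    """
    merged: Dict[str, Set[str]] = {}
    for model, notions in models.items():
        for notion in notions:
            canonical = aliases.get(notion, notion)
            merged.setdefault(canonical, set()).add(model)
    return merged


def deficient_areas(merged: Dict[str, Set[str]],
                    required: Iterable[str]) -> List[str]:
    """Required notions that no existing model covers."""
    return sorted(set(required) - set(merged))


# Toy input, invented for illustration.
models = {
    "lexicon_A": {"subcat_frame", "semantic_role"},
    "lexicon_B": {"subcategorization", "hyponymy"},
}
aliases = {"subcategorization": "subcat_frame"}  # the "editing" step

merged = edited_union(models, aliases)
missing = deficient_areas(merged, ["subcat_frame", "hyponymy", "transfer_condition"])
print(merged)
print(missing)
```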
To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values?
Collocational Information
Open issues:
• what is relevant
• what can be generalised and formally characterised
• what must be simply listed (but even lists may be partially categorised)
• what type of representation and analysis should be provided for these phenomena (e.g. a Mel'cuk-style analysis for support verb constructions, a FrameNet-style description of syntactic-semantic “constructions”, etc.)
Agreed Principles
MILE incorporates previous recommendations:
is the “complete” entry
(to be evaluated wrt usefulness & adequacy for multilingual tasks)
MILE builds on the monolingual entry & expands it
(at least) with an additional module where correspondences between languages are defined
We consider 2 broad categories of applications
translation
CLIR (linking module may be simpler)
(label info types wrt application)
Paths to discover Basic Notions of MILE
Clues in dictionaries to decide on target equivalent
Guidelines for lexicographers
Clues (to disambiguate/translate) in corpus concordances
Lexical requirements from various types of transfer conditions and actions in MT systems
Lexical requirements from interlingua-based systems
Examined guidelines for bilingual dictionaries provided by SA
Classification of Basic Notions of MILE
For all the notions:
notion already in previous work (EAGLES / PAROLE / SIMPLE / EWN / COMLEX / FrameNet / …)
evaluate if the existing specs are adequate
draw a list of “not yet recommended/adopted” notions:
method of work
priorities
for which applications
assign tasks
need of further development
Organisational Proposal
Start from available EAGLES recommendations, e.g. as instantiated in Parole/Simple
adopt as starting point the P/S DTD, to be revised & augmented
see Barcelona tool
Evaluate if we can combine in a “hybrid super-model” the transfer & interlingua approaches
1. Select a list of critical information types that will compose each module of the MILE
2. Start an in-depth analysis of each of these areas aiming at identifying:
The most stable solutions adopted in the community
Linguistic specifications and criteria
Possible representational solutions, their compatibility, etc.
An evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations)
Identify the open issues and the current boundaries of the state of the art (which cannot be standardised yet)
…..
Organisational Proposal
The tasks should lead to:
Information Types

Argument structure
1. How to represent it (e.g. frames, a selection of theta-roles, etc.)
2. Typology of arguments
3. Representational problems
4. Applicative constraints and needs
5. Linking with syntax (how to express it)
6. Open issues

Semantic relations
1. Typology (e.g. hyponymy, meronymy, etc.)
2. Available tests
3. Representational format(s)
4. Applicative constraints and needs
5. Expressive limits
6. Open issues

Modification relations
1. Types of modifiers
2. Representational issues
3. Open issues

Multiword Expressions
1. Typology
2. How to represent the “internal” structure of MWEs (e.g. Mel’cuk relations, etc.)
3. Encoding criteria
4. Application needs and biases
5. Open issues

Selectional preferences
1. How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
2. Different status of the preferences
3. Criteria to identify them
4. Expressive limits of existing formal resources

Transfer conditions and actions
1. Identification of categories of transfer phenomena
2. Ranking of hard cases
3. Possible parameterisation wrt language types
4. How to formalise them
5. Types of actions

Ontology
1. Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
2. Inheritance
3. Which roles for ontologies in the MILE
4. Representational issues
5. Customisation and development criteria
6. Limits

Collocational Patterns
1. Typology
2. How to represent them
3. Interaction with selectional preferences
Organisational Proposal
Highlighted some hot issues & assigned tasks:
sense indicators (Issco)
selectional preferences (Thurmair)
argument structure (US?….)
MWE (Pisa)
modifiers (Jock)
semantic relations (Piek?)
transfer conditions (…)
collocational patterns (…)
ontology (…)
….
Organisational Proposal
Ask the Americans, e.g.:
evaluate existing EAGLES etc. recommendations wrt usefulness, coverage, adequacy,…
analyse some of the above info types
look at other languages (Japanese, Chinese, Korean, …) for transfer conditions
look at transfer-based MT systems
look at interlingua MT systems (e.g. Mikrokosmos): additional info types?
…
Joint US & EU meeting, e.g. end of February or beginning of March?
DIET Tool
From ISSCO:
for text annotation (of test suites for semantic annotation)
to be used for evaluation purposes
…
Survey: List of Received Materials
                                      Comparison table   Linguistic phenomena
Collins, Hachette-Oxford              Yes                Yes
Van Dale Lexicons                     Yes                No
FrameNet                              Yes                No
Collins-Robert lexical-semantic db    Yes                No
PAROLE-Simple                         Yes                Yes
EuroWordNet                           Yes                Yes
Eurotra                               Yes                Yes
OLIF                                  No                 No
Genelex                               No                 No
EDR                                   No                 No