PrepNet: a Framework for Describing Prepositions:
Preliminary Investigation results
Patrick Saint-Dizier, IRIT-CNRS, France
Long-term objectives
• Construct a repository of preposition syntactic and semantic behaviors,
• Develop a multi-level approach, from prototypical uses to unexpected ones, that accounts for diversity of preposition uses and for their polysemic behavior,
• Develop a relatively shallow semantic characterization based on frames,
• Investigate the verb-preposition-NP relations: restrictions and compositionality
• Develop a multi-lingual approach. Applications: MT, Knowledge extraction, QA, etc.
This paper: basic elements of a preliminary approach
• Introduce a general characterization of preposition senses viewed as abstract notions,
• Characterize these abstract notions by means of frames (viewed as linguistic or conceptual macros),
• Populate preposition frames via corpus and then validate,
• Develop a multi-level characterization of preposition uses, to organize the diversity of their uses in language,
• Raise a few questions about multilinguality (prepositions can be realized by other categories or by morphology in some languages),
• Investigate evaluation methods, in abstracto, and via applications.
Related work
• Very little in CL circles compared to verbs and nouns, in spite of their necessity in a number of applications (MT, IE, QA, …),
• Almost nothing in EWN, FrameNet or VerbNet,
• Some valuable work in AI: e.g. temporal, spatial reasoning,
• A few isolated works in linguistics on a given preposition,
• Quite a lot of work in psycho-linguistics.
Other resources: B. Dorr’s large description for English, with MT in view (about 500 entries).
Why is that so?
• High polysemy (but maybe no more than adjectives?), over a smaller number of items: 95 prepositions in French + compounds, 32 in Spanish (there is not always agreement on what a preposition is...)
• Linguistic realizations very difficult to predict, large number of idiosyncratic uses and cross-linguistic differences,
• Syntactic difficulties due to the chain V-Prep-N, e.g.: PP-attachment problems, VPC,
• Deep level in the semantic-cognitive structure: prepositions often used in metalanguages as primitives
Study here only compositional uses of prepositions
Global architecture of the proposal
Prep. senses: 3-level set of abstract notions
Shallow semantic representation with strata
Uses in language 1, uses in language 2, etc.
General architecture (1): categorizing preposition senses
Preposition categorization on 3 levels:
– Family (roughly thematic roles): localization, manner, quantity, etc.
– Facets: localization: source, position, destination, etc.
– Modalities.
Facets viewed as abstract notions on which PrepNet is based
12 families defined
Families/ facets
Quantity: numerical / frequency / proportion
Accompaniment: adjunction / simultaneity / inclusion / exclusion
Manner: means / manners and attitudes / imitation or analogy
Localisation: source / destination / via / fixed position
Choice and exchange: exchange / choice or alternative / substitution
Causality: cause / goal or consequence / intention
Opposition
Ordering: priority / subordination / hierarchy / ranking / degree of importance
Minor elements: about, in spite of, comparison
(see examples in paper)
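The family/facet listing above can be held in a simple mapping. The following is a minimal sketch, not PrepNet's actual storage format: the names `FAMILIES`, `family_of` and `count_facets` are hypothetical, and only the families actually listed on the slide are included (the slide mentions 12 families, the others are not filled in here).

```python
# Families and their facets, copied from the slide listing.
# Assumption: a plain dict is enough for illustration; PrepNet's own
# representation is not described at this level in the source.
FAMILIES = {
    "Quantity": ["numerical", "frequency", "proportion"],
    "Accompaniment": ["adjunction", "simultaneity", "inclusion", "exclusion"],
    "Manner": ["means", "manners and attitudes", "imitation or analogy"],
    "Localisation": ["source", "destination", "via", "fixed position"],
    "Choice and exchange": ["exchange", "choice or alternative", "substitution"],
    "Causality": ["cause", "goal or consequence", "intention"],
    "Opposition": [],  # listed without facets on the slide
    "Ordering": ["priority", "subordination", "hierarchy", "ranking",
                 "degree of importance"],
    "Minor elements": ["about", "in spite of", "comparison"],
}


def family_of(facet):
    """Return the list of families under which a facet name appears."""
    return [family for family, facets in FAMILIES.items() if facet in facets]


def count_facets():
    """Total number of facets over all listed families."""
    return sum(len(facets) for facets in FAMILIES.values())
```

For instance, `family_of("via")` looks up the family of the 'via' facet used in the examples later in the talk.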
Conceptual/ontological status of these distinctions?
• Families: ‘superframes’: general principles and restrictions
• Facets: frames; strata: subframes, with some general forms of inheritance and property consistency
• Whenever appropriate: modalities as subframes
Frames are viewed as linguistic macros, to be interpreted. They are shallow or coarse-grained representations so far. Language realizations are a priori associated with the lower-level frame nodes.
(2): a conceptual, prelexical structure
[Figure: the frame of an abstract notion (name + gloss, shallow restrictions, simplified LCS representation) and, below it, the strata of the abstract notion as subframes SF1, SF2, SF3.]
Structure of a frame
• Structure:
– Number, name, gloss,
– Frame with shallow constraints: X <Action> Y [Number] Z
– Conceptual representation in simplified LCS (kind of LST)
– In the future: inferential patterns (within a frame or among frames)
195 senses/abstract notions described using 65 primitives
Shallow constraints:
(1) generic semantic types
(2) generic verb class types from WordNet
(3) generic semantic fields from the LCS: temp, poss, loc, psy, epist, perc, amount, comm, prop, abs, etc.
Example 1: ‘via’
[1] : VIA - generic.
'An entity X moving via a location Y'
X <ACTION> [1] Y
X: concrete entity, ACTION: movement verb, Y: location
representation: X : via(loc, Y)
French synset: {par, via}
example: Jean rentre par la porte
Stratification 1:
[1.1] : VIA - narrow passage.
'An entity X moving via / an action that uses a narrow passage in an object Y'
X <ACTION> [1.1] Y
X: concrete entity, ACTION: perception verb, Y: location with a narrow passage
representation: X : through(loc or temp, Y)
French synset: {à travers, au travers de, dans}
example: Jean regarde à travers la grille / dans les jumelles.
Example 1, cont’: Stratification 2:
[1.2.1] VIA UNDER – from generic
'An entity X moving via under a location Y'
X <ACTION> [1.2.1] Y
X: concrete entity, ACTION: movement verb, Y: location with a form of passage under it
representation: X : via(loc, under(loc,Y))
French synset: {par dessous}
example: Jean passe par dessous le pont.
[1.2.2] VIA ABOVE – from generic
etc.
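The 'via' entries above can be written down as records to make the stratification explicit. This is only a sketch under assumptions: the class `Frame`, the functions `nearest_ancestor` and `specialised_slots`, and the idea of deriving the parent from the dotted number are hypothetical conveniences, not something the source specifies. The frame contents are copied from the three entries shown.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    number: str          # dotted frame number, e.g. "1.2.1"
    name: str
    gloss: str
    constraints: dict    # shallow restrictions on X, ACTION, Y
    representation: str  # simplified LCS form, kept as a plain string
    synset_fr: tuple     # French realizations


FRAMES = {f.number: f for f in [
    Frame("1", "VIA - generic", "An entity X moving via a location Y",
          {"X": "concrete entity", "ACTION": "movement verb", "Y": "location"},
          "X : via(loc, Y)", ("par", "via")),
    Frame("1.1", "VIA - narrow passage",
          "An entity X moving via / an action that uses a narrow passage "
          "in an object Y",
          {"X": "concrete entity", "ACTION": "perception verb",
           "Y": "location with a narrow passage"},
          "X : through(loc or temp, Y)",
          ("à travers", "au travers de", "dans")),
    Frame("1.2.1", "VIA UNDER", "An entity X moving via under a location Y",
          {"X": "concrete entity", "ACTION": "movement verb",
           "Y": "location with a form of passage under it"},
          "X : via(loc, under(loc,Y))", ("par dessous",)),
]}


def nearest_ancestor(number, frames=FRAMES):
    """Walk up the dotted number until a defined frame is found (assumption)."""
    parts = number.split(".")
    while len(parts) > 1:
        parts.pop()
        candidate = ".".join(parts)
        if candidate in frames:
            return frames[candidate]
    return None


def specialised_slots(number, frames=FRAMES):
    """Constraint slots whose value differs from the nearest ancestor's."""
    parent = nearest_ancestor(number, frames)
    if parent is None:
        return []
    frame = frames[number]
    return sorted(slot for slot, value in frame.constraints.items()
                  if parent.constraints.get(slot) != value)
```

Here `specialised_slots("1.2.1")` reports that only the restriction on Y is narrowed with respect to the generic entry, whereas stratum 1.1 also changes the verb class.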
Example 2: instruments
Stratification requires taking into account 2 relations, characterized by means of primitives (Mari and Saint-Dizier 03):
– Actor/instrument: undergo (no control), select (controls another prop.), control,
– Instrument/ V+NP object: be (passive, but participates), react (other prop than controlled by the agent), act (full participation)
Contrast: cut the bread with a knife / eat soup with a spoon / John burned himself with boiling oil.
A generic entry for instruments and, potentially, 9 strata (combinations), depending on the language.
4 strata for French
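The figure of 9 potential strata follows from combining the three primitives of each relation. A minimal sketch, assuming the strata are simply the Cartesian product of the two primitive sets (the names `ACTOR_INSTRUMENT`, `INSTRUMENT_ACTION` and `candidate_strata` are hypothetical):

```python
from itertools import product

# Primitives for the two relations, as listed on the slide.
ACTOR_INSTRUMENT = ("undergo", "select", "control")
INSTRUMENT_ACTION = ("be", "react", "act")


def candidate_strata():
    """All (actor/instrument, instrument/action) primitive pairs: 3 x 3 = 9."""
    return list(product(ACTOR_INSTRUMENT, INSTRUMENT_ACTION))
```

A given language is expected to realize only some of these pairs, for instance 4 of them for French.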
(2) cont’
[5] : MANNER - MEANS - Instrument
'Someone X doing an action Y using instrument Z.'
X <ACTION> Y [5] Z
X: human, ACTION: verb of change, Y: object, Z: instrument
representation: X: by-means-of(_, Z)
Followed by a priori 9 strata.
Example: Application to French:
1. Be(X,Z) ∧ Undergo(Z, Action+Y) : synset: {grâce à}, restrictions…
2. Be(X,Z) ∧ Select(Z, Action+Y) : synset: {par}, restrictions…
3. Select(X,Z) ∧ React(Z, Action+Y) : synset: {avec}, restrictions…
4. Act(X,Z) ∧ Control(Z, Action+Y) : synset: {avec, au moyen de}, …..
(3) The language realization level
[Figure: for each subframe SFi (= lower frame level), a multi-level partitioning of realizations from usage norms: direct uses, indirect uses, etc. Each partition carries restrictions (restr1, restr2, restr3, derived types, …) associated with synsets (synset1, synset3, synsets ??, …) + frequency measures.]
Populating preposition frames from corpora
• Conceptual frames are associated with shallow constraints
Move on to the language level, elements of a method:
• For a given language: associate each frame stratum with corpus and dictionary observations
• Manual analysis: identify prototypical uses, promote usage norms, multi-level partitioning of realizations
• Contrast, if possible, direct versus indirect (mainly metaphorical) realization levels
• Elaborate conceptual/ontological status of categorizations and related constraints (mainly semantic types)
A few notes
• Multi-level architecture: helps to account for the large variety of (compositional) behaviors; partitioning strategies need to be investigated in more depth. Is incremental depth, to get a finer-grained analysis, worth pursuing?
• For each synset: develop frequency measures, identify contexts of use (from syntactic to type of text): frequency rates are very diverse (some uses are only found in dictionaries!)
• Populate, but then validate on new corpora: develop several forms of corpus annotations (the frame; the relation with the head, with the NP, etc.)
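The frequency measures per synset mentioned above could be computed from frame-annotated corpus occurrences. The sketch below assumes a hypothetical annotation format, a list of (frame number, preposition) pairs; the function names are illustrative and no real corpus figures are implied.

```python
from collections import Counter, defaultdict


def synset_frequencies(annotations):
    """Relative frequency of each preposition within each annotated frame.

    `annotations` is an iterable of (frame_number, preposition) pairs;
    this input format is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for frame_number, preposition in annotations:
        counts[frame_number][preposition] += 1
    return {
        frame: {prep: n / sum(c.values()) for prep, n in c.items()}
        for frame, c in counts.items()
    }


def unattested(synset, frame_frequencies):
    """Synset members never observed in the annotated corpus."""
    return [prep for prep in synset if prep not in frame_frequencies]
```

With a few invented annotations such as `[("1.1", "à travers"), ("1.1", "à travers"), ("1.1", "dans")]`, `unattested` would flag "au travers de" as a member of the synset found only in the dictionary, which is the kind of case noted above.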
Looking at other languages
• Hypothesis: given an abstract notion (interlingua), translations are constructed on the basis of the restrictions that hold on the corresponding synsets, BUT:
• Large realization variations are in general observed, even for closely related languages: up to what point are these just surface language contrasts? Or are they also conceptual? E.g.: regarder dans le microscope / look through the microscope (durch; a través de)
• Some languages do not use pre-/post-positions so much, but other categories, incorporation in heads, or just case marks.
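The hypothesis that translations go through a shared abstract notion can be sketched as a lookup from a notion to per-language synsets. Everything below is an assumption made for illustration: the table `REALIZATIONS`, the language codes, the attachment of "through", "durch" and "a través de" to notion 1.1, and the function `translate`. The restrictions holding on synsets, on which the hypothesis relies, are left out.

```python
# Abstract notion -> language -> realizations (hypothetical table).
# Only the French synset of [1.1] comes from the frame description; the
# other entries are inferred from the microscope example.
REALIZATIONS = {
    "1.1": {
        "fr": ["à travers", "au travers de", "dans"],
        "en": ["through"],
        "de": ["durch"],
        "es": ["a través de"],
    },
}


def translate(preposition, source_lang, target_lang, realizations=REALIZATIONS):
    """Target-language candidates sharing an abstract notion with the source."""
    candidates = []
    for by_lang in realizations.values():
        if preposition in by_lang.get(source_lang, []):
            for candidate in by_lang.get(target_lang, []):
                if candidate not in candidates:
                    candidates.append(candidate)
    return candidates
```

Under these assumptions `translate("dans", "fr", "en")` yields "through" rather than a word-for-word "in", which is the contrast the microscope example points at.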
Preliminary conclusions
• Preliminary investigation to identify difficulties and organize the research,
• Global architecture looks like an interesting approach,
• Abstract notion definitions seem to be quite stable, status of strata needs further investigation,
• Multi-level approach to language realizations seems a good direction, but needs much larger testing on a number of languages and a clearer method to organize sets of realizations,
• Implement an open system on the Web.
Some obvious research directions
• Ontological/conceptual status of categorizations and restrictions,
• Investigate integration with other frameworks: VerbNet, FrameNet,
• Investigate preposition polysemy and derived uses in more depth, and ways to characterize it,
• Relations Head-preposition-NP, and compositionality (Head is often a verb, but can be any other kind of predicate): some PPs have wider scope over the proposition,
• Inferential patterns associated with prepositions (e.g. for approximation notions, spatial notions, etc.)