Generation
Aims of this talk
- Discuss MRS and LKB generation
- Describe a larger research programme: modular generation
- Mention some interactions with other work in progress: RMRS, SEM-I
Outline of talk
- Towards modular generation
- Why MRS?
- MRS and chart generation
- Data-driven techniques
- SEM-I and documentation
Modular architecture
- Language-independent component
- Meaning representation
- Language-dependent realization
- String or speech output
Desiderata for a portable realization module
- Application independent
- Any well-formed input should be accepted
- No grammar-specific/conventional information should be essential in the input
- Output should be idiomatic
Architecture (preview)
[Diagram: an external LF is converted to an internal LF via the SEM-I; control and specialization modules feed the chart generator, which produces a string]
Why MRS?
- Flat structures
  - independence of syntax: conventional LFs partially mirror tree structure
  - manipulation of individual components: can ignore scope structure etc.
- Lexicalised generation
  - composition by accumulation of EPs: robust composition
- Underspecification
An excursion: Robust MRS
- Deep Thought: integration of deep and shallow processing via compatible semantics
- All components construct RMRSs
- Principled way of building robustness into deep processing
- Requirements for consistency etc. help human users too
Extreme flattening of deep output
Two scoped readings (wide-scope every vs wide-scope some):
- every(x, cat(x), some(y, dog_1(y), chase(e, x, y)))
- some(y, dog_1(y), every(x, cat(x), chase(e, x, y)))
Flattened RMRS:
lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat_n(x), lb5:dog_n_1(y),
lb4:some_q(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase_v(e), ARG1(lb3,x),
ARG2(lb3,y), h9 qeq lb2, h8 qeq lb5
Extreme Underspecification
- Factorize deep representation into minimal units
- Only represent what you know
Robust MRS
- Separate relations
- Separate arguments
- Explicit equalities
- Conventions for predicate names and sense distinctions
- Hierarchy of sorts on variables
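The factored representation above can be sketched as plain data: each elementary predication, argument fact, and qeq constraint is a separate unit, so a shallow processor can emit only what it knows. This is a minimal illustration using the slide's example; the tuple encodings (`EP`, `Arg`, `Qeq`) are invented for exposition, not the LKB's internal format.

```python
from collections import namedtuple

# Each fact is asserted separately, so partial knowledge is representable.
EP = namedtuple("EP", "label pred arg0")      # e.g. lb3:chase_v(e)
Arg = namedtuple("Arg", "role label value")   # e.g. ARG1(lb3, x)
Qeq = namedtuple("Qeq", "hole label")         # e.g. h9 qeq lb2

rmrs = {
    "eps":  [EP("lb1", "every_q", "x"), EP("lb2", "cat_n", "x"),
             EP("lb4", "some_q", "y"), EP("lb5", "dog_n_1", "y"),
             EP("lb3", "chase_v", "e")],
    "args": [Arg("RSTR", "lb1", "h9"), Arg("BODY", "lb1", "h6"),
             Arg("RSTR", "lb4", "h8"), Arg("BODY", "lb4", "h7"),
             Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y")],
    "qeqs": [Qeq("h9", "lb2"), Qeq("h8", "lb5")],
}

def args_of(rmrs, label):
    """Collect the separately asserted arguments of one EP."""
    return {a.role: a.value for a in rmrs["args"] if a.label == label}
```

Because arguments are separate facts, a component that only knows `chase_v(e)` can emit the EP without committing to its ARG1/ARG2.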
Chart generation with the LKB
1. Determine lexical signs from the MRS
2. Determine possible rules contributing EPs (`construction semantics': compound rule etc.)
3. Instantiate signs (lexical and rule) according to variable equivalences
4. Apply lexical rules
5. Instantiate the chart
6. Generate by parsing without string position
7. Check output against input
Lexical lookup for generation
- _like_v_1(e,x,y) – returns the lexical entry for sense 1 of the verb like
- temp_loc_rel(e,x,y) – returns multiple lexical entries
- multiple relations in one lexical entry: e.g., who, where
- entries with null semantics: heuristics
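A lookup along these lines can be sketched as a map from predicate names to candidate entries, with an arity check; an empty result corresponds to the lexical lookup failures discussed later. The lexicon contents and field names here are invented for illustration.

```python
# Hypothetical lexicon: one predicate may map to several entries
# (temp_loc_rel -> "on"/"at"), and real entries may also cover several EPs.
LEXICON = {
    "_like_v_1": [{"orth": "like", "arity": 3}],
    "temp_loc_rel": [{"orth": "on", "arity": 3}, {"orth": "at", "arity": 3}],
}

def lookup(pred, args):
    """Return candidate lexical entries for one EP; an empty list signals
    a lookup failure (unknown predicate or wrong arity)."""
    return [e for e in LEXICON.get(pred, []) if e["arity"] == len(args)]
```

For example, `lookup("temp_loc_rel", ("e", "x", "y"))` returns both candidates, and the generator must explore each.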
Instantiation of entries
- _like_v_1(e,x,y) & named(x,"Kim") & named(y,"Sandy")
- find the locations corresponding to `x's in all FSs
- replace all `x's with a constant; repeat for `y's etc.
- Also for rules contributing construction semantics
- `Skolemization' (a misleading name ...)
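The variable-to-constant step can be sketched as below, assuming EPs are encoded as tuples of predicate plus arguments; this is an illustration of the idea, not the LKB implementation. Replacing each variable with a unique constant means that ordinary unification preserves variable identity across signs.

```python
import itertools

def is_var(arg):
    """MRS variables start with e, x, h, i, or u; anything else
    (e.g. "Kim") is treated as a constant."""
    return arg[0].islower() and arg[0] in "exhiu"

def skolemize(eps):
    """Replace every occurrence of each variable with one fresh constant
    (the `Skolemization' step of entry instantiation; a sketch)."""
    counter = itertools.count()
    consts = {}
    def const(v):
        if v not in consts:
            consts[v] = f"{v}_{next(counter)}"   # e.g. x -> x_1 everywhere
        return consts[v]
    return [(pred,) + tuple(const(a) if is_var(a) else a for a in args)
            for pred, *args in eps]
```

All occurrences of `x` map to the same constant, so two entries sharing `x` are forced to unify their corresponding slots.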
Lexical rule application
- Lexical rules that contribute EPs are only used if the EP is in the input
- Inflectional rules will only apply if the variable has the correct sort
- Lexical rule application does morphological generation (e.g., liked, bought)
Chart generation proper
- Possible lexical signs are added to a chart structure
- Currently no indexing of chart edges: chart generation can use semantic indices, but current results suggest this doesn't help
- Rules are applied as for chart parsing: edges are checked for compatibility with the input semantics (a bag of EPs)
Root conditions
- Complete structures must consume all the EPs in the input MRS
- Should check for compatibility of scopes
  - precise qeq matching is (probably) too strict
  - requiring exactly the same scopes is (probably) unrealistic and too slow
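The bag-of-EPs bookkeeping behind edge compatibility and the root coverage condition can be sketched with bitvectors, assuming a fixed ordering of the input EPs; this is a common implementation idiom for chart generation rather than a description of the LKB's internals.

```python
# Each chart edge carries a bitvector of the input EPs it has consumed.

def can_combine(edge_a, edge_b):
    """Two edges may combine only if no EP would be used twice."""
    return edge_a & edge_b == 0

def combine(edge_a, edge_b):
    """The combined edge covers the union of both EP sets."""
    return edge_a | edge_b

def is_complete(edge, n_eps):
    """A root edge must consume every EP in the input MRS."""
    return edge == (1 << n_eps) - 1
```

For a 4-EP input, an edge covering EPs {0,1} can combine with one covering {2,3}, and the result satisfies the root condition.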
Generation failures due to MRS issues
- Well-formedness check prior to input to the generator (optional)
- Lexical lookup failure: predicate doesn't match an entry, wrong arity, wrong variable types
- Unwanted instantiations of variables
- Missing EPs in the input: syntax (e.g., no noun), lexical selection
- Too many EPs in the input: e.g., two verbs and no coordination
Improving generation via corpus-based techniques
- CONTROL: e.g., intersective modifier order: the logical representation does not determine order
  - wet(x) & weather(x) & cold(x)
- UNDERSPECIFIED INPUT: e.g.,
  - Determiners: none/a/the
  - Prepositions: in/on/at
Constraining generation for idiomatic output
- Intersective modifier order: e.g., adjectives, prepositional phrases
- The logical representation does not determine order: wet(x) & weather(x) & cold(x)
Adjective ordering
Constraints / preferences:
- big red car / * red big car
- cold wet weather / wet cold weather (OK, but dispreferred)
- Difficult to encode in a symbolic grammar
Corpus-derived adjective ordering
- n-grams perform poorly
- Thater: direct evidence plus clustering, positional probability
- Malouf (2000): memory-based learning plus positional probability: 92% on the BNC
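The positional-probability idea can be sketched as follows: estimate, from corpus counts, how likely each adjective is to occur early in a modifier sequence, and rank candidate orders accordingly. This is a toy illustration in the spirit of the cited approaches, with invented probabilities, not Malouf's actual model.

```python
from itertools import permutations

# Toy positional probabilities: how often each adjective occurs first
# in an observed adjective pair (values invented for illustration).
FIRST = {"big": 0.9, "red": 0.3, "cold": 0.7, "wet": 0.4}

def score(order):
    """Score an ordering by the product of each non-final adjective's
    probability of occurring early; unseen adjectives back off to 0.5."""
    p = 1.0
    for adj in order[:-1]:
        p *= FIRST.get(adj, 0.5)
    return p

def best_order(adjs):
    """Pick the highest-scoring permutation of the adjectives."""
    return max(permutations(adjs), key=score)
```

With these toy counts the model prefers "big red car" over "red big car" and "cold wet weather" over "wet cold weather", matching the judgments above.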
Underspecified input to generation
"We bought a car on Friday"
Accept:
  pron(x) & a_quant(y,h1,h2) & car(y) & buy(e_past,x,y) & on(e,z) & named(z,Friday)
and:
  pron(x) & general_q(y,h1,h2) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday)
And maybe:
  pron(x_1pl) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday)
Guess the determiner
- We went climbing in _ Andes
- _ president of _ United States
- I tore _ pyjamas
- I tore _ duvet
- George doesn't like _ vegetables
- We bought _ new car yesterday
Determining determiners
- Determiners are partly conventionalized, often predictable from local context
- Translation from Japanese etc., speech prosthesis applications
- More `meaning-rich' determiners are assumed to be specified in the input
- Minnen et al.: 85% on the WSJ (using TiMBL)
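Memory-based learning of the kind TiMBL provides is essentially nearest-neighbour classification over local-context features. The sketch below shows the idea with a 1-nearest-neighbour overlap metric; the training instances and feature scheme are invented for illustration and are far simpler than the features used by Minnen et al.

```python
# Toy training data: (head noun, governing word, number) -> determiner.
TRAIN = [
    (("president", "of", "sg"), "the"),
    (("Andes", "in", "pl"), "the"),
    (("vegetables", "like", "pl"), "none"),
    (("car", "bought", "sg"), "a"),
]

def overlap(f1, f2):
    """Count matching feature values (the simplest memory-based metric)."""
    return sum(a == b for a, b in zip(f1, f2))

def guess_det(features):
    """1-nearest-neighbour: return the determiner of the closest instance."""
    best = max(TRAIN, key=lambda inst: overlap(inst[0], features))
    return best[1]
```

A real system would use many more features (surrounding words, countability, named-entity class) and a weighted metric, but the retrieval-by-similarity structure is the same.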
Preposition guessing
- Choice between temporal in/on/at: in the morning, in July, on Wednesday, on Wednesday morning, at three o'clock, at New Year
- The ERG uses hand-coded rules and lexical categories
- A machine learning approach gives very high precision and recall on the WSJ, and good results on a balanced corpus (Lin Mei, 2004, Cambridge MPhil thesis)
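The hand-coded-rule flavour of temporal preposition choice can be sketched as a dispatch on the category of the temporal NP; the category names below are invented for illustration and are not the ERG's actual lexical categories.

```python
def temporal_prep(np_type):
    """Choose a temporal preposition from a coarse NP category
    (toy rules mirroring the examples: at three o'clock / at New Year,
    on Wednesday (morning), in July / in the morning)."""
    if np_type in {"clock_time", "holiday"}:
        return "at"
    if np_type in {"day", "day_plus_part"}:
        return "on"
    return "in"   # months, parts of day, years, ...
```

The interesting cases are the interactions ("in the morning" but "on Wednesday morning"), which is why modified-NP categories need their own rule.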
SEM-I: semantic interface
- Meta-level: manually specified `grammar' relations (constructions and closed-class)
- Object-level: linked to the lexical database for deep grammars
- Definitional: e.g., lemma + POS + sense
- Linked test suites, examples, documentation
SEM-I development
- The SEM-I eventually forms the `API': stable, with changes negotiated
- SEM-I vs the Verbmobil SEMDB
  - Technical limitations of the SEMDB: too painful!
- `Munging' rules: external vs internal
- SEM-I development must be incremental
Role of SEM-I in the architecture
- Offline
  - Definition of `correct' (R)MRS for developers
  - Documentation
  - Checking of test suites
- Online
  - In the unifier/selector: reject invalid RMRSs
  - Patching up input to generation
Goal: semi-automated documentation
[Diagram: the Lex DB and the [incr tsdb()] semantic test suite feed the object-level SEM-I, which together with the meta-level SEM-I supports the ERG documentation; strings are documented semi-automatically, with an appendix of examples auto-generated on demand]
Robust generation
- SEM-I is an important preliminary: check whether generator input is semantically compatible with the grammar
- Eventually: a hierarchy of relations outside the grammars, allowing underspecification
- `fill-in' of underspecified RMRS: exploit the work on determiner guessing etc.
Architecture (again)
[Diagram, repeated: an external LF is converted to an internal LF via the SEM-I; control and specialization modules feed the chart generator, which produces a string]
Interface
- External representation: public, documented, reasonably stable
- Internal representation: syntax/semantics interface, convenient for analysis
- External/internal conversion via the SEM-I
Guaranteed generation?
- Given a well-formed input MRS/RMRS, with elementary predications found in the SEM-I (and dependencies), can we generate a string?
  - with input fix-up? negotiation?
- Semantically bleached lexical items: which, one, piece, do, make
- Defective paradigms, negative polarity, anti-collocations etc.?
Next stages
- SEM-I development
- Documentation and test suite integration
- Generation from RMRSs produced by a shallower parser (or a deep/shallow combination)
- Partially fixed text in generation (cogeneration)
- Further statistical modules: e.g., locational prepositions, other modifiers
- More underspecification
- Gradually increase the flexibility of the interface to generation