Generation
Aims of this talk
- Discuss MRS and LKB generation
- Describe a larger research programme: modular generation
- Mention some interactions with other work in progress: RMRS, SEM-I
Outline of talk
- Towards modular generation
- Why MRS?
- MRS and chart generation
- Data-driven techniques
- SEM-I and documentation
Modular architecture
- Language-independent component
- Meaning representation
- Language-dependent realization
- String or speech output
Desiderata for a portable realization module
- Application independent
- Any well-formed input should be accepted
- No grammar-specific/conventional information should be essential in the input
- Output should be idiomatic
Architecture (preview)
[Diagram: an external LF is converted to an internal LF via the SEM-I; control and specialization modules feed the chart generator, which produces a string]
Why MRS?
- Flat structures
  - independence of syntax: conventional LFs partially mirror tree structure
  - manipulation of individual components: can ignore scope structure etc.
- Lexicalised generation
  - composition by accumulation of EPs: robust composition
- Underspecification
An excursion: Robust MRS
- Deep Thought: integration of deep and shallow processing via compatible semantics
- All components construct RMRSs
- Principled way of building robustness into deep processing
- Requirements for consistency etc. help human users too
Extreme flattening of deep output
Two scoped readings (wide-scope every vs wide-scope some):
- every(x, cat(x), some(y, dog_1(y), chase(e, x, y)))
- some(y, dog_1(y), every(x, cat(x), chase(e, x, y)))
Flattened RMRS:
lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat_n(x), lb5:dog_n_1(y),
lb4:some_q(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase_v(e), ARG1(lb3,x),
ARG2(lb3,y), h9 qeq lb2, h8 qeq lb5
Extreme Underspecification
- Factorize deep representation into minimal units
- Only represent what you know
Robust MRS
- Separate relations
- Separate arguments
- Explicit equalities
- Conventions for predicate names and sense distinctions
- Hierarchy of sorts on variables
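The factored representation above can be sketched as plain data: each elementary predication, argument fact, and qeq constraint is a separate unit, so a shallow processor can emit only what it knows. This is a minimal illustration using the slide's example; the tuple encodings (`EP`, `Arg`, `Qeq`) are invented for exposition, not the LKB's internal format.

```python
from collections import namedtuple

# Each fact is asserted separately, so partial knowledge is representable.
EP = namedtuple("EP", "label pred arg0")      # e.g. lb3:chase_v(e)
Arg = namedtuple("Arg", "role label value")   # e.g. ARG1(lb3, x)
Qeq = namedtuple("Qeq", "hole label")         # e.g. h9 qeq lb2

rmrs = {
    "eps":  [EP("lb1", "every_q", "x"), EP("lb2", "cat_n", "x"),
             EP("lb4", "some_q", "y"), EP("lb5", "dog_n_1", "y"),
             EP("lb3", "chase_v", "e")],
    "args": [Arg("RSTR", "lb1", "h9"), Arg("BODY", "lb1", "h6"),
             Arg("RSTR", "lb4", "h8"), Arg("BODY", "lb4", "h7"),
             Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y")],
    "qeqs": [Qeq("h9", "lb2"), Qeq("h8", "lb5")],
}

def args_of(rmrs, label):
    """Collect the separately asserted arguments of one EP."""
    return {a.role: a.value for a in rmrs["args"] if a.label == label}
```

Because arguments are separate facts, a component that only knows `chase_v(e)` can emit the EP without committing to its ARG1/ARG2.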
Chart generation with the LKB
1. Determine lexical signs from the MRS
2. Determine possible rules contributing EPs (`construction semantics': compound rule etc.)
3. Instantiate signs (lexical and rule) according to variable equivalences
4. Apply lexical rules
5. Instantiate the chart
6. Generate by parsing without string position
7. Check output against input
Lexical lookup for generation
- _like_v_1(e,x,y) – returns the lexical entry for sense 1 of the verb like
- temp_loc_rel(e,x,y) – returns multiple lexical entries
- multiple relations in one lexical entry: e.g., who, where
- entries with null semantics: heuristics
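A lookup along these lines can be sketched as a map from predicate names to candidate entries, with an arity check; an empty result corresponds to the lexical lookup failures discussed later. The lexicon contents and field names here are invented for illustration.

```python
# Hypothetical lexicon: one predicate may map to several entries
# (temp_loc_rel -> "on"/"at"), and real entries may also cover several EPs.
LEXICON = {
    "_like_v_1": [{"orth": "like", "arity": 3}],
    "temp_loc_rel": [{"orth": "on", "arity": 3}, {"orth": "at", "arity": 3}],
}

def lookup(pred, args):
    """Return candidate lexical entries for one EP; an empty list signals
    a lookup failure (unknown predicate or wrong arity)."""
    return [e for e in LEXICON.get(pred, []) if e["arity"] == len(args)]
```

For example, `lookup("temp_loc_rel", ("e", "x", "y"))` returns both candidates, and the generator must explore each.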
Instantiation of entries
- _like_v_1(e,x,y) & named(x,"Kim") & named(y,"Sandy")
- find the locations corresponding to `x's in all FSs
- replace all `x's with a constant; repeat for `y's etc.
- Also for rules contributing construction semantics
- `Skolemization' (a misleading name ...)
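The variable-to-constant step can be sketched as below, assuming EPs are encoded as tuples of predicate plus arguments; this is an illustration of the idea, not the LKB implementation. Replacing each variable with a unique constant means that ordinary unification preserves variable identity across signs.

```python
import itertools

def is_var(arg):
    """MRS variables start with e, x, h, i, or u; anything else
    (e.g. "Kim") is treated as a constant."""
    return arg[0].islower() and arg[0] in "exhiu"

def skolemize(eps):
    """Replace every occurrence of each variable with one fresh constant
    (the `Skolemization' step of entry instantiation; a sketch)."""
    counter = itertools.count()
    consts = {}
    def const(v):
        if v not in consts:
            consts[v] = f"{v}_{next(counter)}"   # e.g. x -> x_1 everywhere
        return consts[v]
    return [(pred,) + tuple(const(a) if is_var(a) else a for a in args)
            for pred, *args in eps]
```

All occurrences of `x` map to the same constant, so two entries sharing `x` are forced to unify their corresponding slots.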
Lexical rule application
- Lexical rules that contribute EPs are only used if the EP is in the input
- Inflectional rules will only apply if the variable has the correct sort
- Lexical rule application does morphological generation (e.g., liked, bought)
Chart generation proper
- Possible lexical signs are added to a chart structure
- Currently no indexing of chart edges: chart generation can use semantic indices, but current results suggest this doesn't help
- Rules are applied as for chart parsing: edges are checked for compatibility with the input semantics (a bag of EPs)
Root conditions
- Complete structures must consume all the EPs in the input MRS
- Should check for compatibility of scopes
  - precise qeq matching is (probably) too strict
  - requiring exactly the same scopes is (probably) unrealistic and too slow
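The bag-of-EPs bookkeeping behind edge compatibility and the root coverage condition can be sketched with bitvectors, assuming a fixed ordering of the input EPs; this is a common implementation idiom for chart generation rather than a description of the LKB's internals.

```python
# Each chart edge carries a bitvector of the input EPs it has consumed.

def can_combine(edge_a, edge_b):
    """Two edges may combine only if no EP would be used twice."""
    return edge_a & edge_b == 0

def combine(edge_a, edge_b):
    """The combined edge covers the union of both EP sets."""
    return edge_a | edge_b

def is_complete(edge, n_eps):
    """A root edge must consume every EP in the input MRS."""
    return edge == (1 << n_eps) - 1
```

For a 4-EP input, an edge covering EPs {0,1} can combine with one covering {2,3}, and the result satisfies the root condition.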
Generation failures due to MRS issues
- Well-formedness check prior to input to the generator (optional)
- Lexical lookup failure: predicate doesn't match an entry, wrong arity, wrong variable types
- Unwanted instantiations of variables
- Missing EPs in the input: syntax (e.g., no noun), lexical selection
- Too many EPs in the input: e.g., two verbs and no coordination
Improving generation via corpus-based techniques
- CONTROL: e.g., intersective modifier order: the logical representation does not determine order
  - wet(x) & weather(x) & cold(x)
- UNDERSPECIFIED INPUT: e.g.,
  - Determiners: none/a/the
  - Prepositions: in/on/at
Constraining generation for idiomatic output
- Intersective modifier order: e.g., adjectives, prepositional phrases
- The logical representation does not determine order: wet(x) & weather(x) & cold(x)
Adjective ordering
Constraints / preferences:
- big red car / * red big car
- cold wet weather / wet cold weather (OK, but dispreferred)
- Difficult to encode in a symbolic grammar
Corpus-derived adjective ordering
- n-grams perform poorly
- Thater: direct evidence plus clustering, positional probability
- Malouf (2000): memory-based learning plus positional probability: 92% on the BNC
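The positional-probability idea can be sketched as follows: estimate, from corpus counts, how likely each adjective is to occur early in a modifier sequence, and rank candidate orders accordingly. This is a toy illustration in the spirit of the cited approaches, with invented probabilities, not Malouf's actual model.

```python
from itertools import permutations

# Toy positional probabilities: how often each adjective occurs first
# in an observed adjective pair (values invented for illustration).
FIRST = {"big": 0.9, "red": 0.3, "cold": 0.7, "wet": 0.4}

def score(order):
    """Score an ordering by the product of each non-final adjective's
    probability of occurring early; unseen adjectives back off to 0.5."""
    p = 1.0
    for adj in order[:-1]:
        p *= FIRST.get(adj, 0.5)
    return p

def best_order(adjs):
    """Pick the highest-scoring permutation of the adjectives."""
    return max(permutations(adjs), key=score)
```

With these toy counts the model prefers "big red car" over "red big car" and "cold wet weather" over "wet cold weather", matching the judgments above.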
Underspecified input to generation
"We bought a car on Friday"
Accept:
  pron(x) & a_quant(y,h1,h2) & car(y) & buy(e_past,x,y) & on(e,z) & named(z,Friday)
and:
  pron(x) & general_q(y,h1,h2) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday)
And maybe:
  pron(x_1pl) & car(y) & buy(e_past,x,y) & temp_loc(e,z) & named(z,Friday)
Guess the determiner
- We went climbing in _ Andes
- _ president of _ United States
- I tore _ pyjamas
- I tore _ duvet
- George doesn't like _ vegetables
- We bought _ new car yesterday
Determining determiners
- Determiners are partly conventionalized, often predictable from local context
- Translation from Japanese etc., speech prosthesis applications
- More `meaning-rich' determiners are assumed to be specified in the input
- Minnen et al.: 85% on the WSJ (using TiMBL)
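Memory-based learning of the kind TiMBL provides is essentially nearest-neighbour classification over local-context features. The sketch below shows the idea with a 1-nearest-neighbour overlap metric; the training instances and feature scheme are invented for illustration and are far simpler than the features used by Minnen et al.

```python
# Toy training data: (head noun, governing word, number) -> determiner.
TRAIN = [
    (("president", "of", "sg"), "the"),
    (("Andes", "in", "pl"), "the"),
    (("vegetables", "like", "pl"), "none"),
    (("car", "bought", "sg"), "a"),
]

def overlap(f1, f2):
    """Count matching feature values (the simplest memory-based metric)."""
    return sum(a == b for a, b in zip(f1, f2))

def guess_det(features):
    """1-nearest-neighbour: return the determiner of the closest instance."""
    best = max(TRAIN, key=lambda inst: overlap(inst[0], features))
    return best[1]
```

A real system would use many more features (surrounding words, countability, named-entity class) and a weighted metric, but the retrieval-by-similarity structure is the same.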
Preposition guessing
- Choice between temporal in/on/at: in the morning, in July, on Wednesday, on Wednesday morning, at three o'clock, at New Year
- The ERG uses hand-coded rules and lexical categories
- A machine learning approach gives very high precision and recall on the WSJ, and good results on a balanced corpus (Lin Mei, 2004, Cambridge MPhil thesis)
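The hand-coded-rule flavour of temporal preposition choice can be sketched as a dispatch on the category of the temporal NP; the category names below are invented for illustration and are not the ERG's actual lexical categories.

```python
def temporal_prep(np_type):
    """Choose a temporal preposition from a coarse NP category
    (toy rules mirroring the examples: at three o'clock / at New Year,
    on Wednesday (morning), in July / in the morning)."""
    if np_type in {"clock_time", "holiday"}:
        return "at"
    if np_type in {"day", "day_plus_part"}:
        return "on"
    return "in"   # months, parts of day, years, ...
```

The interesting cases are the interactions ("in the morning" but "on Wednesday morning"), which is why modified-NP categories need their own rule.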
SEM-I: semantic interface
- Meta-level: manually specified `grammar' relations (constructions and closed-class)
- Object-level: linked to the lexical database for deep grammars
- Definitional: e.g., lemma + POS + sense
- Linked test suites, examples, documentation
SEM-I development
- The SEM-I eventually forms the `API': stable, with changes negotiated
- SEM-I vs the Verbmobil SEMDB
  - Technical limitations of the SEMDB: too painful!
- `Munging' rules: external vs internal
- SEM-I development must be incremental
Role of SEM-I in the architecture
- Offline
  - Definition of `correct' (R)MRS for developers
  - Documentation
  - Checking of test suites
- Online
  - In the unifier/selector: reject invalid RMRSs
  - Patching up input to generation
Goal: semi-automated documentation
[Diagram: the Lex DB and the [incr tsdb()] semantic test suite feed the object-level SEM-I, which together with the meta-level SEM-I supports the ERG documentation; strings are documented semi-automatically, with an appendix of examples auto-generated on demand]
Robust generation
- SEM-I is an important preliminary: check whether generator input is semantically compatible with the grammar
- Eventually: a hierarchy of relations outside the grammars, allowing underspecification
- `fill-in' of underspecified RMRS: exploit the work on determiner guessing etc.
Architecture (again)
[Diagram, repeated: an external LF is converted to an internal LF via the SEM-I; control and specialization modules feed the chart generator, which produces a string]
Interface
- External representation: public, documented, reasonably stable
- Internal representation: syntax/semantics interface, convenient for analysis
- External/internal conversion via the SEM-I
Guaranteed generation?
- Given a well-formed input MRS/RMRS, with elementary predications found in the SEM-I (and dependencies), can we generate a string?
  - with input fix-up? negotiation?
- Semantically bleached lexical items: which, one, piece, do, make
- Defective paradigms, negative polarity, anti-collocations etc.?
Next stages
- SEM-I development
- Documentation and test suite integration
- Generation from RMRSs produced by a shallower parser (or a deep/shallow combination)
- Partially fixed text in generation (cogeneration)
- Further statistical modules: e.g., locational prepositions, other modifiers
- More underspecification
- Gradually increase the flexibility of the interface to generation