
Information Status

Date post: 06-Jan-2016
Upload: snowy
Transcript
Page 1: Information Status

Information Status

Page 2: Information Status

Varieties of Information Status

– Contrast: John wanted a poodle but Becky preferred a corgi.

– Topic/comment: The corgi they bought turned out to have fleas.

– Theme/rheme: The corgi they bought turned out to have fleas.

– Focus/presupposition: It was Becky who took him to the vet.

– Given/new: Some wildcats bite, but this wildcat turned out to be a sweetheart.

Page 3: Information Status

Today: Given/New

• Why do we care about Given/New?
• Defining Given/New: why is this hard?
– Hearer-based and Discourse-based models
• Uses of Given/New information in NLP
• Identifying Given/New information automatically
– Rule-based
– Corpus-based
– The Boston Directions Corpus
– Laboratory studies suggest new directions

Page 4: Information Status

Why do we care about the given/new distinction?

• Building a model of the discourse
– What do S and H believe to be true?
– What is in their consciousness now?
– What is ‘grounded’?
• Speech technologies
– TTS: Given information is often deaccented while new information is usually accented
– ASR?

Page 5: Information Status

Defining Given/New

• Halliday ‘67:

– Given: Recoverable from some form of context

– New: Not recoverable

• Chafe ’74 ’76:

– Given: what S believes is in H’s consciousness

– New: what S believes is not…

– “Chafe-givenness”

Yesterday I had my class disrupted by a bulldog/dog.

I’m beginning to dislike dogs/bulldogs.

• But not vice versa….

Page 6: Information Status

Prince ’81: A Given/New Taxonomy

• Text as set of instructions from S to H on how to construct a discourse model
– Model includes discourse entities, attributes, and links between entities
– Discourse entities: individuals, classes, exemplars, substances, concepts (NPs)
– Entities as ‘hooks’ on which to hang attributes (Webber ’78)
• Entities when first introduced are new

Page 7: Information Status

– Brand-new (H must create a new entity)
I saw a dinosaur today.
– Unused (H already knows of this entity)
I saw your mother today.
• Evoked entities are old -- already in the discourse
– Textually evoked
The dinosaur was scaly and gray.
– Situationally evoked
The light was red when you went through it.
• Inferrables
– Containing

Page 8: Information Status

I bought [a carton of eggs]. One of them was broken.

[The door of the Bastille] was painted purple.

– Non-containing
A bus pulled up beside me. The driver was a monkey.

Page 9: Information Status

Given/New and Definiteness/Indefiniteness
– Definiteness: subject NPs tend to be syntactically definite and old
– Indefiniteness: object NPs tend to be indefinite and new
I saw a black cat yesterday. The cat looked hungry.
• Definite articles, demonstratives, possessives, personal pronouns, proper nouns, quantifiers like all, every signal definiteness…but…
There were the usual suspects at the bar.
• Indefinite articles, quantifiers like some, any, one signal indefiniteness…but….
This guy came into the room

Page 10: Information Status

What’s wrong with a simple Hearer-centric model of given/new?

• Hearer-centric information status:
– Given: what S believes H has in his/her consciousness
– New: what S believes H does not have in his/her consciousness
• But discourse entities may also be given and new wrt the current discourse
– Discourse-old: already evoked in the discourse
– Discourse-new: not evoked

Page 11: Information Status

(1) A: I’ve decided to make an appointment with Lee Bollinger.

(2) B: Why do you want to see Bollinger?

• Hearer status of discourse entities in 1? 2?
– If B is your roommate? your mother? a guy on the subway?
• Discourse status of discourse entities in 1? 2?
• What would be the hearer/discourse status of discourse entities in this version?

(1) A: I’ve decided to make an appointment with Lee Bollinger.

(2a) B: Why do you want to see the president?

(2b) B: Have you talked to his secretary?

Page 12: Information Status

What does this new Hearer/Discourse given/new distinction provide?

• A way to separate what is explicit in the discourse model from what is believed to be in speaker/hearer cognitive model

• A way to explain given/new in more complex terms
– To identify coreference relations
– To explain deaccenting in ASR and TTS

Page 13: Information Status

Gross Oversimplification: Given Items Tend to be Deaccented

• Accenting and deaccenting: making items intonationally prominent or not

• Critical to get this distinction ‘right’ in TTS
– Accenting everything makes it hard for people to understand anything, e.g.

I like my cat and my cat adores me.

One potato, two potato, three potato,…

If a discourse entity is given for one speaker then it may or may not be given for another speaker.

Page 14: Information Status

How can we determine automatically whether a discourse entity is given or new?

• A rule-based approach:
– Stem the content words in the discourse
– Select a window size: an incoming item that shares a stem with a previous item inside this window is labeled ‘given’
– Other items are labeled ‘new’
• Is this hearer-based? Discourse-based?
• How well does it work?
– 65-75% accurate (precision) depending on genre, domain
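
The rule-based approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system the slides evaluate: the suffix-stripping `stem` function, the small `STOPWORDS` list used to approximate "content words", and the default window of 10 content words are all hypothetical choices made for the example.

```python
import re

# Hypothetical, deliberately crude stemmer: strips a few common suffixes.
# A real system would likely use an established stemming algorithm.
SUFFIXES = ("ing", "es", "ed", "s")

# Hypothetical function-word list used to approximate "content words".
STOPWORDS = {
    "a", "an", "the", "and", "but", "or", "to", "of", "in", "on", "out",
    "i", "me", "my", "you", "your", "it", "this", "that", "some", "be",
}


def stem(word):
    """Lowercase a word and strip one common suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def label_given_new(text, window=10):
    """Label each content word 'given' or 'new'.

    A word is 'given' if its stem matches the stem of any of the
    previous `window` content words; otherwise it is 'new'.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    content = [t for t in tokens if t.lower() not in STOPWORDS]
    labels = []
    stems = []
    for i, token in enumerate(content):
        current = stem(token)
        recent = stems[max(0, i - window):i]
        labels.append((token, "given" if current in recent else "new"))
        stems.append(current)
    return labels


if __name__ == "__main__":
    for word, status in label_given_new(
        "I saw a black cat yesterday. The cat looked hungry."
    ):
        print(f"{word}\t{status}")
```

Because the rule only looks at surface strings in the preceding text, it captures something close to discourse-old status and has no access to what the hearer already knows, which is one way to approach the question the slide raises.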

Page 15: Information Status

Boston Directions Corpus (Hirschberg & Nakatani ’96)

• Experimental Design
• 12 speakers: 4 used
• Spontaneous and read versions of 9 direction-giving tasks
• Corpus: 50m read; 67m spon
• Labeling
– Prosodic: ToBI intonational labeling
– Discourse: Grosz & Sidner
– Given/new (Prince ’92), grammatical function, p.o.s.,…

Page 16: Information Status

Boston Directions Corpus: Describe how to get to MIT from Harvard

d1: dsp1: step 1: enter and get token
first
enter the Harvard Square T stop
and buy a token

d2: dsp2: inbound on red line
then
proceed to get on the
inbound
um
Red Line
uh subway

Page 17: Information Status

dp3 dsp3: take subway from hs, to cs to ks
and
take the subway
from Harvard Square
to Central Square
and then to Kendall Square

dp4: dsp4: get off T.
then get off the T

Page 18: Information Status

Hearer and Discourse Given/New Labeling

first
enter <HG/DN the Harvard Square T stop>
and buy <HI/DN a token>
then
proceed to get on <HI/DN the inbound um Red Line uh subway>
and
take <HG/DG the subway>
from <HG/DG Harvard Square>
to <HG/DN Central Square>
and then to <HG/DN Kendall Square>
then get off <HG/DG the T>
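
As a rough illustration of how such annotations could be read by a program, the sketch below extracts the tagged noun phrases from the example and tallies the hearer and discourse labels. The regular expression and the function name `parse_labels` are assumptions made for this example; the slides do not describe the corpus tooling. The reading of the tags (first code = hearer status, second code = discourse status) follows the slide title.

```python
import re
from collections import Counter

# Assumed tag shape, taken from the example: <H?/D? noun phrase>
ANNOTATION = re.compile(r"<(H[GIN])/(D[GN])\s+([^>]+)>")


def parse_labels(text):
    """Return (hearer_label, discourse_label, noun_phrase) triples."""
    return [
        (m.group(1), m.group(2), m.group(3).strip())
        for m in ANNOTATION.finditer(text)
    ]


sample = (
    "first enter <HG/DN the Harvard Square T stop> and buy <HI/DN a token> "
    "then proceed to get on <HI/DN the inbound um Red Line uh subway> "
    "and take <HG/DG the subway> from <HG/DG Harvard Square> "
    "to <HG/DN Central Square> and then to <HG/DN Kendall Square> "
    "then get off <HG/DG the T>"
)

entities = parse_labels(sample)
hearer_counts = Counter(hearer for hearer, _, _ in entities)
discourse_counts = Counter(discourse for _, discourse, _ in entities)

if __name__ == "__main__":
    for hearer, discourse, phrase in entities:
        print(f"{hearer}\t{discourse}\t{phrase}")
    print(dict(hearer_counts))
    print(dict(discourse_counts))
```

Counts like these, paired with accent labels for each phrase, are the kind of input needed to ask whether either status predicts deaccenting.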

Page 19: Information Status

What could we do with this labeled data?

• Can we predict given/new?
• Can we predict what will be accented and what will be deaccented?

Page 20: Information Status

Does Given/New Status Predict Deaccenting?

NPa          HG      HI      HN      DG      DN
Deaccented   37.1%   53.9%   26.2%   43.3%   38.8%
Total        1009    406     130     596     950

Page 21: Information Status

What else might be at work?

• Given/new and grammatical function

• Hypothesis: how discourse entities are evoked in a discourse influences how ‘given’ they are

• E.g., How might grammatical function and surface position interact with the accentuation of ‘given’ items?

• Cases:

– X has not been mentioned in the prior context

– X has been mentioned, with the same grammatical function/surface position

– X has been mentioned but with a different grammatical function/surface position

Page 22: Information Status

Experimental Design

• Major problem:
– How to elicit ‘spontaneous’ productions while varying desired phenomena systematically?
– Key: simple variations and actions can capitalize upon natural tendency to associate grammatical functions with particular thematic roles for a given set of verbs

Page 23: Information Status

[Figure: scene with five labeled shapes: Triangle, Cylinder, Diamond, Rectangle, Octagon]

Page 24: Information Status

[Figure: Context 1. Scene with five labeled shapes: Triangle, Cylinder, Diamond, Rectangle, Octagon]

Page 25: Information Status

[Figure: Context 2. Scene with five labeled shapes: Triangle, Cylinder, Diamond, Rectangle, Octagon]

Page 26: Information Status

[Figure: Context 3. Scene with five labeled shapes: Triangle, Cylinder, Diamond, Rectangle, Octagon]

Page 27: Information Status

[Figure: Target (A). Scene with five labeled shapes: Triangle, Cylinder, Diamond, Rectangle, Octagon]

Page 28: Information Status

[Figure: Target (B). Scene with five labeled shapes: Triangle, Cylinder, Diamond, Rectangle, Octagon]

Page 29: Information Status

Experimental Conditions

• 10 native speakers of standard American English
• Subject and experimenter in soundproof booth
• Subject told to describe scenes to confederate outside the booth, visible but providing no feedback
• 10 practice scenarios
• ~20 minutes per subject

Page 30: Information Status

Prosodic Analysis

• Target turns excised and analyzed by two judges independently for location of pitch accents
• Each judge rated each referring expression: accented (2), unsure (1), deaccented (0)
• The two ratings combine into an accentedness score from 0-4 (81% agreement for 0 and 2 scores)
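
A minimal sketch of the scoring scheme, assuming the 0-4 accentedness score is the sum of the two judges' 0-2 ratings (the slide gives the ranges but does not spell out the combination rule). The function names and the simple exact-match agreement measure are hypothetical and may differ from how the reported 81% figure was computed.

```python
VALID_RATINGS = {0, 1, 2}  # 0 = deaccented, 1 = unsure, 2 = accented


def accentedness(judge_a, judge_b):
    """Combine two judges' ratings into a 0-4 score (assumed: a sum)."""
    for rating in (judge_a, judge_b):
        if rating not in VALID_RATINGS:
            raise ValueError(f"rating must be 0, 1, or 2, got {rating!r}")
    return judge_a + judge_b


def agreement_rate(rating_pairs):
    """Fraction of items on which both judges gave the same rating."""
    if not rating_pairs:
        raise ValueError("need at least one rated item")
    same = sum(1 for a, b in rating_pairs if a == b)
    return same / len(rating_pairs)


if __name__ == "__main__":
    pairs = [(2, 2), (0, 0), (2, 1), (0, 2)]  # made-up ratings for illustration
    print([accentedness(a, b) for a, b in pairs])
    print(agreement_rate(pairs))
```

Under this reading, a score of 4 means both judges heard an accent, 0 means both heard none, and the middle values record disagreement or uncertainty.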

Page 31: Information Status

Grammatical Role/Surface Position Accenting

                  TARGET
CONTEXT           Subj    D-obj   Pp-obj
GIVEN  Subj       2.1     3.6     3.2
GIVEN  D-obj      3.3     0.6     1.6
GIVEN  Pp-obj     3.0     1.4     0.7
NEW               3.7     3.8     --

Page 32: Information Status

Findings

• In general
– Items that differ from context to target in grammatical function or surface position tend to be accented
– Items that share grammatical function and surface position tend to be deaccented
• But
– Subjects tend to be accented more often than objects, even if previously mentioned in the same role
– Direct objects and pp-objects tend to be more distinguished from subjects than from one another
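
These findings can be checked against the accentedness scores in the Grammatical Role/Surface Position table. The sketch below assumes that rows are the role in the context and columns the role in the target, and it takes an unweighted mean over cells, since per-cell item counts are not given; both are assumptions of this example rather than the authors' analysis.

```python
ROLES = ("Subj", "D-obj", "Pp-obj")

# SCORES[context_role][target_role], copied from the GIVEN rows of the table.
SCORES = {
    "Subj": {"Subj": 2.1, "D-obj": 3.6, "Pp-obj": 3.2},
    "D-obj": {"Subj": 3.3, "D-obj": 0.6, "Pp-obj": 1.6},
    "Pp-obj": {"Subj": 3.0, "D-obj": 1.4, "Pp-obj": 0.7},
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Cells where the item keeps its role from context to target.
same_role = [SCORES[role][role] for role in ROLES]

# Cells where the item changes role from context to target.
changed_role = [
    SCORES[context][target]
    for context in ROLES
    for target in ROLES
    if context != target
]

mean_same = mean(same_role)
mean_changed = mean(changed_role)

if __name__ == "__main__":
    print(f"same role:    {mean_same:.2f}")
    print(f"changed role: {mean_changed:.2f}")
```

The same-role cells average well below the changed-role cells, in line with the general finding, while the 2.1 for subject-to-subject stands out from the other two same-role cells (0.6 and 0.7), in line with the first exception.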

Page 33: Information Status

How can we explain these observations?

• Consider our examples, e.g. subj → D.O.

The TRIANGLE touches the CYLINDER.

The triangle touches the DIAMOND.

The triangle touches the OCTAGON.

The RECTANGLE touches the TRIANGLE.

• An entity may be ‘given’ or ‘new’ wrt the role it plays in the discourse

Page 34: Information Status

Given/New Sensitive to the Role the Discourse Entity Plays

• E.g., a discourse entity may retain a given or take on a new thematic role

– By the time the target is uttered, ‘triangle’ is established both as a ‘given’ discourse entity and as the discourse topic (or backward-looking center, BLC, in centering theory)

– But this status has been established for ‘triangle’ as agent

– What is new, and, perhaps, focused in the target is ‘triangle’s’ new thematic role as patient – the players are the same but the roles are different

Page 35: Information Status

Consequences for NLP

– Identification of given/new status must be sensitive to more complex model of context (grammatical function/thematic role)

– Will this help us predict deaccenting more accurately?

– Stay tuned…..

Page 36: Information Status

Next Class

