Meaningful Agents Talk goal - Rutgers Universitymdstone/pubs/nlucs06.pdf · Stone, DeCarlo, Oh,...

Human Communication Research Centre
University of Edinburgh

and

Department of Computer Science
Center for Cognitive Science
Rutgers University

Designing Meaningful Agents

Matthew Stone

Meaning

The content that a speaker conveys and the hearer recognizes with an utterance.

Speaker meaning, in the sense of Grice 1957

Meaningful Agents

Systems that mean what they say.

Meaningfulness:
– not a module or level of representation
– an attribution based on many coordinated processes

Talk goal

Get clearer on processes involved in meaning
– an important way computational thinking about language can contribute to cognitive science

Outline

Introduction
Background
Meaning in language
– Action
– Interpretation
– Convention

Conclusions and future work

Background

My research involves working with testbeds that model face-to-face conversation.

Cassell, Stone, and Yan. INLG 2000.

REA

Developed by Justine Cassell at MIT and team.

I designed and implemented REA’s architecture for utterance generation.

Methodology

Models are based on human-human talk
– where non-verbal cues mix closely with words

Methodology

Goal is to account for behaviors…

Methodology

…and their effects on the conversation

Generation in REA

Model how interlocutors plan behaviors.
– An integrated process across modalities.
– Incrementally add contextually-motivated and contextually-appropriate actions.

Works in real time.
Matches a corpus of human descriptions.

Argument

Modeling cues in face-to-face conversation
– yields useful generalizations about how all communication works.
– yields useful distinctions about how meaning in language is special.

Three case studies

Action in conversation
– seamlessly combines language and other actions

Understanding in conversation
– language and other actions are interpreted analogously

Language and convention
– we need to ground and track special commitments for language

Underlying moral

Computational thinking connects practical systems and questions about cognition.

Cognitive science guides system-building
– use linguistic techniques for multimodal systems

System-building informs cognitive science
– ask clearer questions after practical experience

Outline


– Action
– Interpretation
– Convention


Modeling communicative action

Key principles

Temporal consistency
– Voice and body emphasize same important places in the sentence

Semantic coherence
– Voice and body consistently reflect the conversational action the speaker is taking

Theoretical orientation

Nonverbal actions are integral, meaningful parts of contributions to face-to-face conversation.

Sources:
– Cassell in HCI
– Ekman, McNeill, Clark, Bavelas and others in psychology

Temporal consistency

Batons
Quick motions that align with speech accents
– nod down on “similar”
– nod up on “ever”

Temporal consistency

Underliners
Gradual motions over an entire phrase
– tilt and turn on “far greater”
– brow raise on “than any similar object”

Semantic coherence

All communicative actions support message

Here:
– head tilt and turn anticipates story’s punchline
– brow raise accompanies what’s surprising
– nods underscore key concepts: similar, ever

Describing utterances computationally: coding

“Musical score” for an utterance
– Made up of elements
– Composed in an abstract time-line
– Marking emphasis and synchrony

Representations are very close to those used already for expressive speech synthesis.

Coding for spoken utterances

TEXT        far greater than any similar object ever discovered
INTONATION  L+H* !H* H– L+H* !H* L– L+H* L+!H* L– L%
BROWS       1+2
HEAD        TR D* U*
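One way to make the “musical score” idea concrete is as a small data structure. The sketch below is only an illustration in Python, not RUTH's actual input format: the `Event` class, the word-index spans, and the helper names are assumptions. The alignments follow the batons and underliners described on the temporal consistency slides. The INTONATION tier is left out because the word alignment of its labels cannot be recovered from the slide text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One coded element on the abstract time-line (hypothetical structure)."""
    tier: str    # channel name, e.g. "BROWS" or "HEAD"
    label: str   # coding symbol from the slide, e.g. "1+2", "TR", "D*"
    start: int   # index of the first word the element covers
    end: int     # index of the last word the element covers (inclusive)

WORDS = "far greater than any similar object ever discovered".split()

# Alignments taken from the slide descriptions of batons and underliners.
SCORE = [
    Event("HEAD", "TR", 0, 1),    # tilt and turn on "far greater"
    Event("BROWS", "1+2", 2, 5),  # brow raise on "than any similar object"
    Event("HEAD", "D*", 4, 4),    # nod down on "similar"
    Event("HEAD", "U*", 6, 6),    # nod up on "ever"
]

def events_at(word_index, score=SCORE):
    """Return every element that is active while the given word is spoken."""
    return [e for e in score if e.start <= word_index <= e.end]

def span_text(event, words=WORDS):
    """Return the words that an element is synchronized with."""
    return " ".join(words[event.start:event.end + 1])
```

For example, `events_at(4)` returns both the brow raise and the downward nod, which is the kind of synchrony across channels that the coding is meant to mark.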

Our realization

DeCarlo, Stone, Revilla & Venditti (2004). “Specifying and animating facial signals for discourse in embodied conversational agents”, Computer Animation and Virtual Worlds 15(1).

Implementation

RUTH (Rutgers University Talking Head)
Real-time animation from 3D geometry
Freely available for research: http://www.cs.rutgers.edu/~village/ruth

Modeling utterance interpretation

Trying to get precise

Clearly language and gesture are different
– interpretation of gesture seems more elusive

But they do fit together
– we do understand what gesture portrays
– we do understand how that relates to speech


So there are these very low-level phonological errors that tend to not get reported…

…because they are being produced continually by an iterative process below our level of awareness.


Now one thing you could do is totally audiotape hours and hours…

…so that you get a large amount of data that you can think of as laid out on a time-line.


And exhaustively go through and make sure that you really pick up all the speech errors…

…by individually examining each unit of analysis along the time-line of your data.


Allow two different coders to go through it…

…and moreover get them to work independently and reconcile their activities.

Linking gesture and speech

The content of gesture connects with speech by rhetorical relations

– [speech] because [gesture]
– [speech] so that [gesture]
– [speech] by [gesture]
– [speech] and moreover [gesture]

Just like successive sentences in discourse connect together.

Parallel extends to formal model

Describe interpretation of gesture and speech in SDRT (Asher & Lascarides, 2003)

– start from underspecified meaning
– assume specific resolution of meaning
– find rhetorical connection to discourse
– to explain why action is coherent

In fact, REA already used same (simple) model to interpret words and gesture

Practical consequences

Can add meaningful nonverbal behaviors very easily to systems designed for language

– possible with deep techniques as in REA
– possible with shallow techniques too

Example: data-driven synthesis

Speech synthesis for limited domains
– Plan utterances using templates
– Select speech from application-specific database

Black and Lenzo 2000, Bulyko and Ostendorf 2002, Pan and Wang 2002

Becomes multimodal with database of speech and motion.
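The template-plus-database idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the system described in the talk: the `Unit` class and `plan` function are hypothetical, the voice and motion identifiers are copied from the performance tables later in these slides and treated as opaque labels, and picking the first matching recording stands in for whatever selection criterion a real system would apply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """One database entry pairing a phrase with recorded speech and motion."""
    content: str  # text of the phrase
    voice: str    # label of the speech recording (opaque identifier)
    motion: str   # label of the motion recording (opaque identifier)

# Entries copied from the performance tables in the slides.
DATABASE = [
    Unit("that was ugly", "#041", "#091"),
    Unit("dude", "#172", "#122"),
    Unit("dude", "#172", "#192"),
    Unit("you didn't manage", "#174", "#214"),
    Unit("to set up your landing", "#155", "#185"),
]

def plan(template, database=DATABASE):
    """Pick one recorded unit per phrase in the template.

    Simplification: takes the first match; a real system would choose
    among alternatives so that neighboring units fit together.
    """
    performance = []
    for phrase in template:
        matches = [u for u in database if u.content == phrase]
        if not matches:
            raise KeyError(f"no recording for phrase: {phrase!r}")
        performance.append(matches[0])
    return performance
```

Calling `plan(["that was ugly", "dude"])` yields two units whose voice and motion labels can be handed to separate audio and animation back ends, which is the sense in which the speech database "becomes multimodal".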

Our proof of concept

Convincing results

Because the animation closely follows human performance.

Performance in our animation

Content                  Voice   Motion
that was ugly            #041    #091
dude                     #172    #122
you didn’t manage        #174    #214
to set up your landing   #155    #185

Flexibility

The character responds dynamically to ongoing events – including player actions

By selecting recordings for new situations and adapting them to one another

Another selection of performance

Content        Voice   Motion
yikes          #181    #031
dude           #172    #192
on this jump   #133    #183
you got it     #094    #114

Live Demo

Stone, DeCarlo, Oh, Rodriguez, Stere, Lees & Bregler. “Speaking with Hands: Creating Animated Conversational Characters from Recordings of Human Performance”, ACM Transactions on Graphics 23(3) (SIGGRAPH 2004).

Clearly language and gesture are different
– interpretation of gesture seems more elusive

Why might this be?

Gesture lacks convention

Meta-talk is unintelligible:
– What do you mean [demonstration]?
– What does [demonstration] mean?

Language depends on convention

You see this in meta-talk in dialogue:
– What do you mean?
– What does that mean?

You see this in intuitions about reference
– Particularly when we borrow others’ reference

Reference borrowing

Someone, let’s say, a baby, is born; his parents call him by a certain name. They talk about him to their friends. Other people meet him. Through various sorts of talk the name is spread from link to link as if by a chain. A speaker who is on the far end of this chain, who has heard about, say, Richard Feynman, in the market place or elsewhere, may be referring to Richard Feynman even though he can’t remember from whom he first heard of Feynman or from whom he ever heard of Feynman. He knows that Feynman is a famous physicist. A certain passage of communication reaching ultimately to the man himself does reach the speaker. He is then referring to Feynman even though he can’t identify him uniquely.

Kripke 1980

Conventions involve commitments

Agents must work to keep meaning aligned with community

– Clarifying what they say, if asked
– Resolving inconsistencies, if necessary

Pursuing commitments

Clarification
– S: It’s light brown.
  U: Do you mean tan?
  S: Yeah.

Negotiation
– S: It’s a square.
  U: There is no square.
  S: Isn’t the red one a square?
  U: No.
  S: OK, it’s not a square; it’s the red one.

Conventions involve commitments

Sounds very abstract, but amenable to strategies already used to implement dialogue

– after Larsson & Traum, Purver

Information-state approach

Formalize the state of the ongoing dialogue
– including commitments and obligations

Formalize the moves interlocutors can make
– including meta-level moves that push side tasks

Create dialogue strategy
– rules for choosing good moves in each state
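The three ingredients above (state, moves, strategy) can be sketched as a tiny program. This is a minimal Python illustration under stated assumptions, not the design of any particular dialogue system: the `InfoState` class, the move names, and the stack-like `agenda` are hypothetical, and the walk-through follows the "light brown" clarification exchange from the earlier slide.

```python
from dataclasses import dataclass, field

@dataclass
class InfoState:
    """State of the ongoing dialogue (hypothetical, deliberately minimal)."""
    commitments: set = field(default_factory=set)  # content agreed so far
    agenda: list = field(default_factory=list)     # open obligations, newest last

def update(state, move, content=None):
    """Apply one move to the state and return the (mutated) state."""
    if move == "assert":
        # An assertion creates an obligation to ground its content.
        state.agenda.append(("ground", content))
    elif move == "clarify":
        # Meta-level move: pushes a side task on top of the open one.
        state.agenda.append(("answer", content))
    elif move in ("answer", "accept"):
        expected = "answer" if move == "answer" else "ground"
        if not state.agenda or state.agenda[-1][0] != expected:
            raise ValueError(f"move {move!r} does not fit the current state")
        _, settled = state.agenda.pop()
        state.commitments.add(settled)
    else:
        raise ValueError(f"unknown move: {move!r}")
    return state

def choose_move(state):
    """Dialogue strategy: discharge the newest obligation before moving on."""
    if not state.agenda:
        return "assert"
    task, _ = state.agenda[-1]
    return "answer" if task == "answer" else "accept"
```

In the clarification example, "It's light brown" is an `assert`, "Do you mean tan?" is a `clarify` that pushes a side task, and "Yeah" is the `answer` that pops it. Only then can the original assertion be accepted and added to the commitments.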

This is standard KR methodology

See “Living with Classic” (Brachman et al 1990)
– Get clear on the relevant objects and meaningful distinctions in the world
– Set up symbols for these objects and properties
– Give clear content to structures that say how things are
– Build algorithms that respect that content

Meaning and information state

Formalize the state of the ongoing dialogue
– Include links between words and referents interlocutors are committed to

Formalize the moves interlocutors can make
– Carry out a clarification subdialogue, to agree what a word refers to
– Carry out a negotiation subdialogue, to align concepts of how the world is
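A small sketch can show what it means to keep word-referent links in the state and revise them through negotiation. It is a toy illustration in Python, not the formalization cited in the talk: the `negotiate` and `describe` functions, the dictionary representation, and the object name `obj1` are all assumptions, and the example replays the "square" exchange from the earlier slide.

```python
def negotiate(lexicon, word, referent, applies):
    """Return a revised lexicon after the interlocutor's verdict.

    lexicon maps each word to the frozenset of referents the agent is
    committed to using it for; applies is the interlocutor's answer to
    "is the referent a <word>?". The input lexicon is not modified.
    """
    extension = set(lexicon.get(word, frozenset()))
    if applies:
        extension.add(referent)
    else:
        extension.discard(referent)
    return {**lexicon, word: frozenset(extension)}

def describe(lexicon, referent):
    """List, in sorted order, the words the agent may use for a referent."""
    return sorted(w for w, ext in lexicon.items() if referent in ext)

# The system S starts out committed to calling obj1 both "square" and "red".
before = {"square": frozenset({"obj1"}), "red": frozenset({"obj1"})}

# U answers "No" to "Isn't the red one a square?", so S drops that link.
after = negotiate(before, "square", "obj1", applies=False)
```

After the negotiation, `describe(after, "obj1")` contains only `"red"`, matching the system's final turn: "OK, it's not a square; it's the red one."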

Meaning and information state

Dialogue strategy
– Among other things, choose moves so that agents pursue commitments to public meaning

“Societal grounding”
– Formalized and illustrated in AAAI 2006: David DeVault, Iris Oved and Matthew Stone. “Societal grounding is essential for meaningful language use.”

Conclusions

Action in conversation
– seamlessly combines language and other actions

Understanding in conversation
– language and other actions are interpreted in similar ways

Language and convention
– only words have conventional meanings that we need to ground and track in conversation

Designing Meaningful Agents

An attribution based on coordinated processes
– Not a module or level of representation

Using language as a collaboration

[Figure: architecture diagram with components Intent Recognition, Deliberation, and Coordinated Execution; labels: observed events, inferred intentions, current intention, action; context: beliefs, desires, commitments]

Pursuing sharing, not just outcomes

[Figure: detail of the diagram showing inferred intentions, Deliberation, and current intention]

Deliberation that reflects
– Meta-level insight into one’s own meanings
– Commitment to make meanings shared
– Commitment to respect community standards

What makes language special?

A range of answers across cognitive science
– Social underpinning, e.g. Tomasello
– Linguistic processing, e.g. Pickering, Garrod
– Linguistic structure, e.g. Fitch, Hauser, Chomsky

Need to reconcile them rather than arbitrate
– Computational thinking can help make this clear

Semantics at Rutgers

Philosophy
Jerry Fodor, Ernie Lepore, Jason Stanley

Linguistics
Maria Bittner, Veneeta Dayal, Roger Schwarzschild

Computer Science
Chung-chieh Shan, Matthew Stone

New initiative

IGERT in Perceptual Science
– Graduate training program
– Investigations bridging humans and computers
– Includes face-to-face communication as a focus

Acknowledgments

Animation
Doug DeCarlo (Assoc Prof, RU)
Radu Gruian (U’00), Corey Revilla (U’01), Christian Rodriguez (M’04), Niki Shah (M’01), Adrian Stere (U’05)

Data analysis
Jennifer Venditti (Postdoc; PhD Ohio State ’00)
Chris Dymek (U’01)
Nathan Folsom-Kovarik (U’01)
Insuk Oh (PhD Student)

Semantics
Alex Lascarides (University of Edinburgh): gesture
David DeVault (PhD Student, Rutgers): grounding
Iris Oved (PhD Student, Rutgers): grounding

Motion capture
Chris Bregler (Assoc Prof, NYU)
Alyssa Lees (PhD Student, NYU)
Kate Brehm (Performer)

Animation
Loren Runcie (U’04)
Jared Silver (Staff, NYU)

Zoe model from Electronic Arts SSX 3
