
ICMI 2012 Workshop on gesture and speech production

Description:

In these slides, we present a common gesture and speech production framework for both virtual agents (ECAs, IVAs, VHs) and physical agents such as humanoid robots. The framework is designed for different embodiments, so that its processes are independent of any specific agent.
Transcript
Page 1: ICMI 2012 Workshop on gesture and speech production

A Common Gesture and Speech Production Framework for Virtual and Physical Agents

Workshop on Speech and Gesture Production, ICMI 2012, Santa Monica, CA, USA

Quoc Anh Le - Jing Huang - Catherine Pelachaud

CNRS, LTCI, Telecom-ParisTech, France

Page 2: ICMI 2012 Workshop on gesture and speech production

Introduction

Motivations:
- Similar approaches between virtual agents and humanoid robots
- Limits of existing systems: agent-dependent

Objectives:
- A common co-verbal gesture generation framework for both virtual and physical agents

Methodology:
- Based on the GRETA system
- Uses:
  - the same representation languages
  - the same algorithm for selecting and planning gestures
  - different algorithms for creating the animation

Page 3: ICMI 2012 Workshop on gesture and speech production

Architecture Overview

[Architecture diagram: input data (text, audio, video, etc.) feeds the Intent Planner (common module), which uses the Intent Lexicon to produce FML-APML; the Behavior Planner (common module) uses the Behavior Lexicon and agent-specific baselines (Nao, Greta) to turn FML-APML into BML; the Behavior Realizer (common module) uses agent-specific gestuaries (Nao, Greta) to turn BML into keyframes; agent-specific Animation Realizers turn keyframes into FAP-BAP values for the Greta FAP-BAP Player and joint values for Nao's built-in proprietary procedures. Each agent has its own animation lexicon. The modules exchange messages through the ActiveMQ messaging central system.]
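Because the modules are decoupled through the ActiveMQ broker, a common module and an agent-specific module only share message formats (FML-APML, BML, keyframes). A minimal sketch of publishing a BML message over ActiveMQ's STOMP interface with the third-party stomp.py library; the topic name and port are illustrative assumptions, not taken from the paper:

    import stomp

    # connect to the ActiveMQ broker via STOMP (61613 is the usual default port)
    conn = stomp.Connection([("localhost", 61613)])
    conn.connect(wait=True)

    # a module publishes its output for the next stage of the pipeline;
    # the topic name "/topic/BML" is an illustrative assumption
    conn.send(destination="/topic/BML", body='<bml id="bml1">...</bml>')
    conn.disconnect()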

Page 4: ICMI 2012 Workshop on gesture and speech production

[Same architecture diagram as on the previous slide, here highlighting the Behavior Realizer (common module) and the Behavior Lexicon.]

Page 5: ICMI 2012 Workshop on gesture and speech production

Behavior Realizer: Outline

Processes common to all agents:

1. Create the gesture from the agent's gestuary
2. Schedule the timing of the gesture phases
3. Generate keyframes: pairs (absolute time, symbolic description of the hand configuration at that time), as sketched below

Agent-specific databases:

For Nao: a gestuary (for instance, pointing with a fully stretched arm) and a velocity profile (empirically determined from Nao)

For Greta: a gestuary (for instance, pointing with one finger) and a velocity profile (empirically determined from real humans)
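A minimal sketch of step 3, emitting keyframes as (absolute time, symbolic description) pairs; the data layout and names are illustrative, not the authors' code:

    from dataclasses import dataclass

    @dataclass
    class Keyframe:
        time: float   # absolute time in seconds
        hand: dict    # symbolic hand description, e.g. {"vertical": "YUpperP"}

    def generate_keyframes(phases, gesture_start):
        """Emit one keyframe at the end of each scheduled gesture phase."""
        frames, t = [], gesture_start
        for phase in phases:   # e.g. preparation, stroke, hold, retraction
            t += phase["duration"]
            frames.append(Keyframe(time=t, hand=phase["pose"]))
        return frames

    # example: a pointing stroke whose apex is reached 0.8 s after gesture start
    frames = generate_keyframes(
        [{"duration": 0.5, "pose": {"vertical": "YUpperP", "horizontal": "XEP"}},
         {"duration": 0.3, "pose": {"hShape": "OPEN"}}],
        gesture_start=1.2)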

Page 6: ICMI 2012 Workshop on gesture and speech production


Example: Different pointing gestures


BML input:

    <bml id="bml1">
      <speech xmlns="" id="s1" start="0">
        <text>It is <sync id="tm1"/> over there! <sync id="tm2"/></text>
      </speech>
      <gesture id="g1" lexeme="pointing" start="s1:tm1" end="s1:tm2">
        <description priority="1" type="GRETA">
          <GRETA:SPC>0.80</GRETA:SPC>
          <GRETA:TMP>0.50</GRETA:TMP>
          <GRETA:FLD>-0.62</GRETA:FLD>
          <GRETA:PWR>0.30</GRETA:PWR>
          <GRETA:REP>0.00</GRETA:REP>
          <GRETA:OPE>1.00</GRETA:OPE>
          <GRETA:TEN>0.20</GRETA:TEN>
        </description>
      </gesture>
    </bml>

Nao gestuary (excerpt; pointing with a fully stretched arm):

    <gesture id="pointing">
      <phase type="stroke">
        <vertical>YUpperP</vertical>
        <horizontal>XEP</horizontal>
        <distance>XFar</distance>
        <hShape>OPEN</hShape>
      </phase>
    </gesture>

Greta gestuary (excerpt; pointing with one finger):

    <gesture id="pointing">
      <phase type="stroke">
        <vertical>YP</vertical>
        <horizontal>XP</horizontal>
        <distance>XMiddle</distance>
        <hShape>INDEX</hShape>
      </phase>
    </gesture>

[Figure: the same BML request flows through each agent's gestuary to agent-specific keyframes (time, description), and finally to joint values for Nao and BAP values for Greta.]
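Such gestuary entries can be loaded with a standard XML parser; a small sketch (the element names follow the excerpts above, the file name is hypothetical):

    import xml.etree.ElementTree as ET

    def load_gestuary(path):
        """Map gesture ids to their stroke-phase symbolic descriptions."""
        gestuary = {}
        for gesture in ET.parse(path).getroot().iter("gesture"):
            for phase in gesture.iter("phase"):
                if phase.get("type") == "stroke":
                    gestuary[gesture.get("id")] = {c.tag: c.text for c in phase}
        return gestuary

    # load_gestuary("nao_gestuary.xml")["pointing"] would give
    # {"vertical": "YUpperP", "horizontal": "XEP", "distance": "XFar", "hShape": "OPEN"}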

Page 7: ICMI 2012 Workshop on gesture and speech production

BR: Synchronization with speech

Algorithm:
- Compute the preparation phase
- Do not perform a gesture if there is not enough time, i.e., strokeEnd(i-1) > strokeStart(i) + duration
- Add a hold phase to fit the planned gesture duration
- Co-articulate consecutive gestures:
  - if there is enough time, insert a retraction phase (i.e., go back to the rest position)
  - otherwise, go from the end of the stroke directly to the preparation phase of the next gesture
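One way to realize these co-articulation rules; the function, names, and exact tests are illustrative, not the authors' code:

    def plan_transition(prev_stroke_end, next_stroke_start,
                        retraction_time, preparation_time):
        """Decide what happens between two consecutive gestures."""
        gap = next_stroke_start - prev_stroke_end
        if gap >= retraction_time + preparation_time:
            # enough time: retract to rest, then prepare the next gesture
            return ["retraction", "rest", "preparation"]
        if gap >= preparation_time:
            # co-articulation: skip the rest position, go straight to preparation
            return ["preparation"]
        # not enough time to prepare: do not perform the next gesture
        return ["skip"]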

[Timeline figure: start, stroke-start (S-start), stroke-end (S-end), and end markers for consecutive gestures.]

Page 8: ICMI 2012 Workshop on gesture and speech production

BR: Velocity profiles

Gesture velocity:
- Predict the movement duration using Fitts' law: MovementTime = a + b * log2(Distance + 1)
- Threshold on maximal speeds (empirically determined)
- The stroke phase differs from the other phases in velocity and acceleration (Quek, 1995)

Adding expressivity:
- Temporal extent (TMP): modulates the duration of the whole gesture => changes the coefficients of Fitts' law
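For instance, a duration prediction with a TMP modulation might look as follows; the slides do not give the exact mapping from TMP to the coefficients, so the linear scaling here is an illustrative assumption:

    import math

    def movement_time(distance, a, b, tmp=0.0):
        """Fitts'-law duration prediction: MT = a + b * log2(distance + 1).

        tmp in [-1, 1] stretches or shrinks the whole duration; the linear
        scaling of the coefficients is an assumption for illustration.
        """
        scale = 1.0 - 0.5 * tmp   # tmp > 0 -> faster movement (assumed)
        return scale * (a + b * math.log2(distance + 1.0))

    # e.g. movement_time(0.4, a=0.2, b=0.6, tmp=0.5) -> a faster 0.4 m reach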

Page 9: ICMI 2012 Workshop on gesture and speech production

BR: Build coefficients of Fitts’ law

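The slide carries only the title; in practice, a and b can be estimated by a least-squares fit of measured movement times against log2(distance + 1). A hypothetical sketch with made-up measurements, not the authors' data:

    import numpy as np

    # hypothetical measurements: reach distances and observed durations
    distances = np.array([0.1, 0.2, 0.4, 0.6, 0.8])       # metres
    durations = np.array([0.35, 0.48, 0.66, 0.79, 0.90])  # seconds

    x = np.log2(distances + 1.0)
    b, a = np.polyfit(x, durations, 1)   # least-squares line: MT = a + b * x
    print(f"Fitts' law coefficients: a={a:.3f}, b={b:.3f}")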

Page 10: ICMI 2012 Workshop on gesture and speech production

[Same architecture diagram as on slide 3, here highlighting the agent-specific Animation Realizer modules.]

Page 11: ICMI 2012 Workshop on gesture and speech production

Implemented expressivity parameters

EXP | Definition                      | Nao                                  | Greta
TMP | Velocity of movement            | Change coefficients of Fitts' law    | Change coefficients of Fitts' law
SPC | Amplitude of movement           | Limited to predefined key positions  | Change gesture space scales
PWR | Acceleration of movement        | Modulate stroke duration             | Modulate stroke acceleration
REP | Number of stroke repetitions    | Yes                                  | Yes
FLD | Smoothness and continuity       | No                                   | No
OPN | Relative spatial extent to body | No                                   | Elbow swivel angle
TEN | Muscular tension                | No                                   | No

These parameters drive the creation of the animation parameters: joint values for Nao, BAP values for Greta.
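A minimal sketch of carrying such a parameter set through the pipeline; the class and the assumed [-1, 1] normalization are my own illustration (the slides only name the parameters):

    from dataclasses import dataclass

    @dataclass
    class Expressivity:
        """Expressivity parameters, assumed normalized to [-1, 1]."""
        spc: float = 0.0  # spatial extent (amplitude)
        tmp: float = 0.0  # temporal extent (velocity)
        pwr: float = 0.0  # power (acceleration)
        rep: float = 0.0  # number of stroke repetitions
        fld: float = 0.0  # fluidity (not yet implemented)
        opn: float = 0.0  # openness (elbow swivel angle, Greta only)
        ten: float = 0.0  # muscular tension (not yet implemented)

    # the values from the BML example on slide 6 (OPE there matches OPN here)
    params = Expressivity(spc=0.80, tmp=0.50, fld=-0.62, pwr=0.30,
                          rep=0.00, opn=1.00, ten=0.20)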

Page 12: ICMI 2012 Workshop on gesture and speech production

Create animation parameters

Discretization of the gesture space of McNeill (1992): each symbolic position is translated into concrete joint values of the agent (for instance, the 6 joints of Nao, as in the table below).

Symbolic keyframes are translated into joint values, and the animation is obtained by interpolating between them:
- for Nao, with the robot's built-in proprietary procedures;
- for Greta, with Slerp (spherical linear interpolation) and time warping (ease-in/ease-out functions).

Code | ArmX | ArmY     | ArmZ    | Joint values (LShoulderPitch, LShoulderRoll, LElbowYaw, LElbowRoll, LWristYaw, Hand)
000  | XEP  | YUpperEP | ZNear   | (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0)
001  | XEP  | YUpperEP | ZMiddle | (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0)
002  | XEP  | YUpperEP | ZFar    | (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0)
010  | XEP  | YUpperP  | ZNear   | (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0)
...  | ...  | ...      | ...     | ...
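For the Greta side, a minimal sketch of Slerp with an ease-in/ease-out time warp over unit quaternions; the actual BAP generation is more involved, and the smoothstep easing is an illustrative choice:

    import numpy as np

    def ease_in_out(u):
        """Smoothstep time warping: slow start and end, faster middle."""
        return u * u * (3.0 - 2.0 * u)

    def slerp(q0, q1, u):
        """Spherical linear interpolation between unit quaternions q0, q1."""
        q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
        dot = np.dot(q0, q1)
        if dot < 0.0:          # take the shorter arc
            q1, dot = -q1, -dot
        if dot > 0.9995:       # nearly parallel: fall back to normalized lerp
            q = q0 + u * (q1 - q0)
            return q / np.linalg.norm(q)
        theta = np.arccos(dot)
        return (np.sin((1 - u) * theta) * q0 + np.sin(u * theta) * q1) / np.sin(theta)

    # interpolate a joint rotation a quarter of the way between two keyframes
    q_start = np.array([1.0, 0.0, 0.0, 0.0])                            # identity
    q_end = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4), 0.0, 0.0])  # 90 deg about x
    q = slerp(q_start, q_end, ease_in_out(0.25))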

Page 13: ICMI 2012 Workshop on gesture and speech production

Greta: Full Body IK

- Torso target depending on the hand position
- Torso IK
- Analytic method: arm to torso

Page 14: ICMI 2012 Workshop on gesture and speech production

Demo: Greta


Page 15: ICMI 2012 Workshop on gesture and speech production

Demo: Nao


Page 16: ICMI 2012 Workshop on gesture and speech production

Perceptive Evaluation

Objective:
- Evaluate how the robot's gestures are perceived by human users

Procedure:
- Participants (63 French speakers) rated videos of Nao telling a story
- Versions were displayed to participants in random order:
  - gestures with expressivity vs. gestures without expressivity
  - gesture-speech synchronization vs. gesture-speech asynchrony

Results (ANOVA):
- Synchronization: F(1, 124) = 4.94, p < .05; 76% agreed that the gestures were synchronized with speech in the synchronized version
- Expressivity: F(1, 124) = 4.43, p < .05; 70% agreed that the gestures were expressive in the expressive version

Page 17: ICMI 2012 Workshop on gesture and speech production

State of the art

Most similar work: Salem et al. (2012)
- Same idea (based on the existing Max virtual agent system)

Main differences:
- Our system: GRETA re-designed as a common framework
- Salem et al.'s system: Max's ACE adjusted to the ASIMO robot

Features           | Our model                                        | Salem et al.'s system
Gesture production | Online, from templates, regardless of the domain | Automatically generated from a domain-specific training corpus
Gesture shapes     | Agent-specific parameter                         | Original for Max, mapped to ASIMO configurations
Gesture timing     | Agent-specific parameter                         | Original for Max, adapted to ASIMO by feedback
Expressivity       | Yes                                              | No
Synchronization    | Adapt gesture to speech                          | Cross-modal adjustment

Page 18: ICMI 2012 Workshop on gesture and speech production

Future work

Short-term plan:
- Human-like gestures: enhance velocity profiles
- Expressivity: implement fluidity and tension

Long-term plan:
- Feedback mechanism
- Study of the coherence between consecutive gestures in a G-Unit (Kendon, 2004)

