Page 1:

APGV04

Towards Perceptually Realistic Talking Heads: Models, Methods and McGurk

David Marshall, Darren Cosker and Paul Rosin

Cardiff School of Computer Science

Susan Paddock and Simon Rushton

Cardiff School of Psychology

Cardiff University

Page 2:

Context: A Talking Head

• Development of a Video-Realistic Talking Head
• Animation from Continuous Speech
• Perceptual Analysis -> Realism

Page 3:

Contribution of this Paper: Perceptual Realism Test

• Perceptual Analysis via McGurk Test
• Perceptual Test with no prior bias
• Used to improve talking head synthesis

Page 4:

Outline of Talk

• Video Realistic Talking Head (Overview)

• Perceptual Analysis and Testing
• The McGurk Effect + McGurk Test
• Results: Implications of McGurk
• Conclusions + Future Work

Page 5:

Our Talking Head

• Image based synthesis
• Continuous Speech
• Flexible framework – emotion, behaviour

BASIC IDEA:
• Train on input video and audio
• Extracting only low level image and audio features
• No phonetic labelling
• Synthesise new video using only input audio
• Unseen utterances
• Speaker Independent

Page 6:

Hierarchical Facial Model

• Active Appearance Models – Control of shape and texture using single ‘appearance parameter’

• Based on Principal Component Analysis (PCA)

• Non-linear Hierarchical PCA (developed at Cardiff)

• Greater Separation of Variation

• High Degree of Control – Sub-Facial variation not orthogonal in standard PCA model

• Coupling of Speech Model (Cardiff Idea)
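For reference, the standard Active Appearance Model formulation that this slide builds on can be written as below. The symbols are assumptions chosen for illustration and do not appear on the slides: s is the vector of landmark points, g the shape-normalised texture, P_s and P_g the PCA eigenvector matrices, W_s a weighting between shape and texture units, and c the single appearance parameter vector. This is the generic linear model only; it does not attempt to reproduce the hierarchical or non-linear extension described here.

```latex
\begin{align}
  s &= \bar{s} + P_s\, b_s \\
  g &= \bar{g} + P_g\, b_g \\
  b &= \begin{pmatrix} W_s\, b_s \\ b_g \end{pmatrix} = Q\, c
\end{align}
```

Varying c moves shape and texture together, which is what "control of shape and texture using a single appearance parameter" refers to.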

Page 7:

Building A Talking Head - Initialisation

For Each Video Frame Extract:
• Shape – Key Landmark Points (Tracker Helps)
• Textures – Colour Pixel Values Normalised to Shape
• Speech Features – Mel-Cepstral, Linear Predictive Coding (LPC)

Page 8:

Building A Talking Head - Tracking


Semi Automated
• Hand Place Few Frames
• Build Interim Shape Model
• Track Other Frames
• Build Final Shape Model

Page 9:

Building A Talking Head - Learning/Model Building

Active Appearance Model (AAM) -> Shape (PCA) and Texture (PCA)
Speech/Appearance Model (SAAM, NEW) -> Speech (PCA) and AAM

Nonlinear PCA:
• Gaussian Mixture Model (GMM)

Model of Dynamics:
• Hidden Markov Model (HMM)

Page 10:

Building A Talking Head - Synthesis + Reconstruction

Input Speech -> Extract Speech Features + Find Best Clusters

Bottom up reconstruction: Mouth Driven
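The "find best clusters" step can be pictured with a deliberately simplified sketch. The slides do not say how "best" is defined, so this assumes a plain nearest-centre lookup by Euclidean distance, with each speech cluster paired with a mean appearance parameter vector. The function names and the toy data are hypothetical, and the GMM and HMM parts of the model are left out.

```python
import math

def nearest_cluster(feature, centres):
    """Return the index of the centre closest to `feature` (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(feature, centres[i]))

def synthesise(speech_frames, speech_centres, appearance_means):
    """Map each speech feature vector to the appearance parameters of its nearest cluster."""
    return [appearance_means[nearest_cluster(f, speech_centres)] for f in speech_frames]

# Hypothetical toy data: 2-D speech features, 1-D appearance parameters.
speech_centres = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
appearance_means = [(-0.5,), (0.2,), (0.9,)]
frames = [(0.1, -0.1), (0.9, 1.2), (3.5, 0.2)]

# One appearance parameter vector per input audio frame.
trajectory = synthesise(frames, speech_centres, appearance_means)
```

The resulting per-frame parameters would then drive the mouth model, with the rest of the face reconstructed from it (the "bottom up" direction named above).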

Page 11:

Talking Head Examples

Page 12:

Talking Head Example: Independent Speaker

Page 13:

Perceptual Analysis of Talking Heads

Current Talking Head Analysis Methods

• Subjective Evaluation
• Analyse and Compare Trajectories
• Improved Perception in Noisy Environments
• Forced Choice Testing

Page 14:

Perceptual Analysis of Talking Heads: Subjective and Trajectory Evaluation

• Analyse and Compare Trajectories

• Ground truth quantitative assessment

• Comparison to “seen” data

• No perceptual quality measurement

• Subjective Evaluation
• Does it “look good”?
• No formative comparison
• No feedback to improve model
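A trajectory comparison of the kind listed on this slide can be sketched as a root-mean-square error between synthesised and ground-truth parameter trajectories. The function name and the toy values are hypothetical, and RMSE is only one possible choice of distance. As the slide points out, a score like this is quantitative but says nothing about perceptual quality.

```python
import math

def trajectory_rmse(synthetic, ground_truth):
    """Root-mean-square error between two equal-length parameter trajectories.

    Each trajectory is a list of frames; each frame is a sequence of floats.
    """
    if len(synthetic) != len(ground_truth):
        raise ValueError("trajectories must have the same number of frames")
    total = 0.0
    count = 0
    for s_frame, g_frame in zip(synthetic, ground_truth):
        if len(s_frame) != len(g_frame):
            raise ValueError("frames must have the same number of parameters")
        total += sum((s - g) ** 2 for s, g in zip(s_frame, g_frame))
        count += len(g_frame)
    return math.sqrt(total / count)

# Hypothetical two-frame, two-parameter trajectories.
error = trajectory_rmse([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 3.0)])
```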

Page 15:

Perceptual Analysis of Talking Heads: Noisy Environment Evaluation

• Noisy Environment Evaluation
• Perceptual Evaluation
• Compare Performance of Synthetic v Real Talking Head in realistic situations
• Good overall test of talking head
• Lip-syncing, realism
• No Quantitative Measure of Performance

Page 16:

Perceptual Analysis of Talking Heads: Forced Choice Testing

Forced Choice Testing:
• Users Asked if Video is Real or Synthetic
• Only says if it looks realistic + lip sync is good
• Big Prior Introduced
• Users look for artefacts
• Randomness Bias in User selection
• Bored/Uninterested User
• No Quantitative Feedback for Model Improvement
• What makes it real/synthetic?

Page 17:

Perceptual Analysis of Talking Heads: A New McGurk Test

• McGurk Test for Perceptual Analysis

• Subject doesn’t develop a prior

• Helps address strengths and weaknesses

• Suggests improvements based on these

• Complements other tests

Page 18:

Perceptual Analysis of Talking Heads: The McGurk Effect

McGurk and MacDonald (1976):
• Auditory Syllable Dubbed onto Videotape of a Different Syllable Gives Perception of an Entirely Different Syllable, e.g.:
• Audio ‘Ba’
• Visual ‘Ga’
• Perception ‘Da’
• “Close Eyes – Illusion Vanishes”
• Raises Psychological Audio-Visual questions:
• How are Auditory and Visual Stimuli combined?
• Why combine when audio is enough?

Page 19:

Perceptual Analysis of Talking Heads: Some More McGurk Effect Examples

Page 20:

Perceptual Analysis of Talking Heads: McGurk Effect Examples (REAL)

Page 21:

Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS)

Tuple: Bent/Vest/Vent
Tuple: Mat/Dead/Gnat

Page 22:

Perceptual Analysis of Talking Heads: McGurk Effect Examples (Synthetic)

Page 23:

Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS)

Synthetic Examples

Tuple: Fame/Face/Feign
Tuple: Mat/Dead/Gnat

Page 24:

Perceptual Analysis of Talking Heads: Our McGurk Test

McGurk Perceptual Evaluation Test:

• Mix Real and Synthetic tuples.
• What word do you perceive?
• Users asked to note any differences
• NO PRIORS as to real/synthetic forced choice
• User only asked about what they hear/perceive
• Best Viewing resolution
• Tested different resolutions (72x75, 36x289, 720x576 pixels)

Page 25:

Perceptual Analysis of Talking Heads: Our McGurk Experimental Procedure

• Mix of Real and Synthetic McGurk Examples
• Real examples are a control
• Users presented with a series of 60 (30 real, 30 synthetic) random examples
• Users asked only to focus on the mouth area
• Two initial example “training” sequences (not in trial)
• Soundproofed booths with adjustable volume and artificial lighting
• Replay option for all examples
• Users simply record the word they perceive
• Users asked three questions after viewing all clips
• “Did you notice anything about the videos that you can comment on?”
• “Could you tell that some of the videos were computer generated?”
• “Did you use the replay button at all?”
• 20 psychology undergrad test subjects (4 male/16 female) with normal hearing/vision
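The randomised presentation described above could be set up along the following lines. This is a minimal sketch, not the authors' test software (which the Web Links slide says was built in Macromedia Director); the clip file names and the build_trials helper are hypothetical. The real/synthetic label is kept only for later analysis and would not be shown to the participant.

```python
import random

def build_trials(real_clips, synthetic_clips, seed=None):
    """Return a shuffled list of (source, clip) pairs mixing real and synthetic clips."""
    rng = random.Random(seed)  # a fixed seed makes one participant's order reproducible
    trials = [("real", clip) for clip in real_clips]
    trials += [("synthetic", clip) for clip in synthetic_clips]
    rng.shuffle(trials)
    return trials

# Hypothetical clip names: 30 real and 30 synthetic McGurk examples.
real = [f"real_{i:02d}.mov" for i in range(30)]
synthetic = [f"synth_{i:02d}.mov" for i in range(30)]

trials = build_trials(real, synthetic, seed=1)
```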

Page 26:

Perceptual Analysis of Talking Heads: How is Our McGurk Test a Test?

• How is this a test?
• Correct Lip Synch = McGurk Effect
• Incorrect Lip Synch = Audio/Other
• Audio should be dominant
• Questions Assess Behaviour/Output
• After test procedure participants asked whether they noticed anything unnatural

Page 27:

Perceptual Analysis of Talking Heads: Results

Four Types of Analysis of Results:

• Standard McGurk Response
• From tuples form accepted audio and accepted McGurk response
• Original McGurk observation
• Enhanced McGurk Response
• Assemble a List of All Participants’ McGurk Responses
• Allows for greater variability in accents/articulation
• Allows for greater analysis and Improvement of Head Models
• Effects of Resolution on McGurk Effect
• End of Test Questions Analysis
• General overall response, qualitative analysis
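The difference between the standard and enhanced analyses can be illustrated with a small scoring sketch. It assumes each tuple is ordered audio/visual/McGurk (as in Bent/Vest/Vent), and that the enhanced analysis simply widens the set of words counted as McGurk responses. The response list, the extra accepted word "went" and the function names are hypothetical, not data from the study.

```python
from collections import Counter

def classify(response, audio, mcgurk, extra_mcgurk=()):
    """Label one recorded word as 'audio', 'mcgurk' or 'other'."""
    word = response.strip().lower()
    if word == audio.lower():
        return "audio"
    accepted = {mcgurk.lower()} | {w.lower() for w in extra_mcgurk}
    if word in accepted:
        return "mcgurk"
    return "other"

def tally(responses, audio, mcgurk, extra_mcgurk=()):
    """Count response labels for one tuple across participants."""
    return Counter(classify(r, audio, mcgurk, extra_mcgurk) for r in responses)

# Hypothetical responses to the Bent/Vest/Vent tuple.
responses = ["Vent", "bent", "vent ", "went", "Bent"]

# Standard analysis: only the accepted McGurk word counts.
standard = tally(responses, audio="bent", mcgurk="vent")

# Enhanced analysis: other fused percepts reported by participants also count.
enhanced = tally(responses, audio="bent", mcgurk="vent", extra_mcgurk=["went"])
```

Running the same tally separately for real and synthetic clips gives the per-tuple comparison that the later results slides discuss.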

Page 28:

Perceptual Analysis of Talking Heads: Standard McGurk Response

Page 29:

Perceptual Analysis of Talking Heads: Enhanced McGurk Response

Page 30:

Perceptual Analysis of Talking Heads: Image Resolution

Page 31:

Perceptual Analysis of Talking Heads: End of Test Questions Results

• “Notice anything to comment on?” Some audio didn’t match video
• “Could you tell some synthetic?” No, 1 participant = some unnatural?
• “Did you use replay?” Few = once, One = twice

Page 32:

Perceptual Analysis of Talking Heads: Overall Results Analysis

• Realistic behaviour
• Most users were unaware of synthetic output
• More McGurk effects in real output
• Points to some weakness in model
• Good Synthesis of /F/, /D/, /S/, /A/ and /E/
• Poor Synthesis of /V/
• Some weak real and synthetic McGurk responses
• Beige-Gaze-Deige -> 2X Audio v McGurk
• Mock-Dock-Knock -> 50:50 Audio:McGurk
• Resolution has effect on real only
• Due to overall lower synthetic McGurk response

Page 33:

Conclusions

• Suggested a perceptual approach to analysis and development of a Talking Head
• Unbiased by prior forced choice making
• Insight into performance of algorithms

• Complements other tests

Page 34:

Perceptual Analysis of Talking Heads: Future Work

• Talking Head
• Full Emotion
• Performance Driven Animation
• 3D Modelling
• Full 3D appearance modelling
• Other perceptual tests
• Longer videos – McGurk sentences
• Real/Synthesised correct lip synch: McGurk = bad synch?
• Emotion – A McGurk emotion test?

Page 35:

Web Links

• Paper Downloads

www.cs.cf.ac.uk/user/D.P.Cosker/publications.html
www.cs.cf.ac.uk/Dave/Publications.html

• McGurk Video Clips and McGurk Test Software (Macromedia Director)

www.cs.cf.ac.uk/user/D.P.Cosker/McGurk/

