APGV04 Towards Perceptually Realistic Talking Heads: Models, Methods and McGurk David Marshall,...

APGV04

Towards Perceptually Realistic Talking Heads: Models, Methods and McGurk

David Marshall, Darren Cosker and Paul Rosin

Cardiff School of Computer Science

Susan Paddock and Simon Rushton

Cardiff School of Psychology

Cardiff University



Context: A Talking Head

• Development of a Video-Realistic Talking Head
• Animation from Continuous Speech
• Perceptual Analysis -> Realism




Contribution of this Paper: Perceptual Realism Test

• Perceptual Analysis via McGurk Test
• Perceptual Test with no prior bias
• Used to improve talking head synthesis




Outline of Talk

• Video Realistic Talking Head (Overview)

• Perceptual Analysis and Testing
• The McGurk Effect + McGurk Test
• Results: Implications of McGurk
• Conclusions + Future Work



Our Talking Head

• Image based synthesis
• Continuous Speech
• Flexible framework – emotion, behaviour

BASIC IDEA:

• Train on input video and audio
• Extracting only low level image and audio features
• No phonetic labelling
• Synthesise new video using only input audio
• Unseen utterances
• Speaker Independent



Hierarchical Facial Model

• Active Appearance Models – Control of shape and texture using a single ‘appearance parameter’

• Based on Principal Component Analysis (PCA)

• Non-linear Hierarchical PCA (developed at Cardiff)

• Greater Separation of Variation

• High Degree of Control – Sub-Facial variation not orthogonal in standard PCA model

• Coupling of Speech Model (Cardiff Idea)
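The PCA step that underlies an appearance model can be sketched as follows. This is a minimal illustration of standard linear PCA only, not the non-linear hierarchical model developed at Cardiff; the function names (`fit_pca`, `project`, `reconstruct`) and the `variance_kept` threshold are assumptions made for this example.

```python
import numpy as np

def fit_pca(samples, variance_kept=0.95):
    """Fit linear PCA to row-vector samples (e.g. one shape vector per frame).

    Returns the mean vector and the principal modes (one per row) that
    together explain at least `variance_kept` of the total variance.
    """
    mean = samples.mean(axis=0)
    centred = samples - mean
    # The rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    variances = singular_values ** 2 / (len(samples) - 1)
    cumulative = np.cumsum(variances) / variances.sum()
    n_modes = int(np.searchsorted(cumulative, variance_kept) + 1)
    return mean, vt[:n_modes]

def project(sample, mean, modes):
    """Map a sample to its low-dimensional parameter vector."""
    return modes @ (sample - mean)

def reconstruct(params, mean, modes):
    """Map a parameter vector back to an approximate sample."""
    return mean + modes.T @ params
```

Varying one entry of the parameter vector and calling `reconstruct` moves the sample along a single mode of variation, which is the sense in which a small parameter vector controls shape or texture.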



Building A Talking Head - Initialisation

For Each Video Frame Extract:

• Shape – Key Landmark Points (Tracker Helps)
• Textures – Colour Pixel Values Normalised to Shape
• Speech Features – Mel-Cepstral, Linear Predictive Coding (LPC)



Building A Talking Head - Tracking


Semi Automated

• Hand Place Few Frames
• Build Interim Shape Model
• Track Other Frames
• Build Final Shape Model



Building A Talking Head - Learning/Model Building

Active Appearance Model (AAM) -> Shape (PCA) and Texture (PCA)

Speech/Appearance Model (SAAM, NEW) -> Speech (PCA) and AAM

Nonlinear PCA:

• Gaussian Mixture Model (GMM)

Model of Dynamics:

• Hidden Markov Model (HMM)



Building A Talking Head - Synthesis + Reconstruction

Input Speech -> Extract Speech Features + Find Best Clusters

Bottom up reconstruction: Mouth Driven
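One way the "find best clusters" step could work is sketched below. The slides do not say how clusters are selected, so this is an assumption: each speech feature frame is assigned to the most probable component of a diagonal-covariance Gaussian mixture, and the HMM dynamics are ignored. The function name `best_clusters` and its parameters are hypothetical.

```python
import numpy as np

def best_clusters(features, means, variances, weights):
    """Return, for each frame, the index of the most probable Gaussian cluster.

    features:  (n_frames, dim) speech feature vectors
    means:     (n_clusters, dim) cluster means
    variances: (n_clusters, dim) diagonal covariances
    weights:   (n_clusters,) mixture weights
    """
    diff = features[:, None, :] - means[None, :, :]
    # Log-density of every frame under every diagonal Gaussian.
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2
    )
    # Adding the log weights gives the unnormalised log posterior per cluster.
    return np.argmax(log_lik + np.log(weights), axis=1)
```

A fuller system would presumably smooth this frame-by-frame choice with the model of dynamics rather than pick each cluster independently.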



Talking Head Examples



Talking Head Example: Independent Speaker



Perceptual Analysis of Talking Heads

Current Talking Head Analysis Methods

• Subjective Evaluation
• Analyse and Compare Trajectories
• Improved Perception in Noisy environments
• Forced Choice Testing



Perceptual Analysis of Talking Heads: Subjective and Trajectory Evaluation

• Analyse and Compare Trajectories

• Ground truth quantitative assessment

• Comparison to “seen” data

• No perceptual quality measurement

• Subjective Evaluation
• Does it “look good”?
• No formative comparison
• No feedback to improve model



Perceptual Analysis of Talking Heads: Noisy Environment Evaluation

• Noisy Environment Evaluation
• Perceptual Evaluation
• Compare Performance of Synthetic v Real Talking Head in realistic situations
• Good overall test of talking head
• Lip-syncing, realism
• No Quantitative Measure of Performance



Perceptual Analysis of Talking Heads: Forced Choice Testing

Forced Choice Testing:

• Users Asked if Video is Real or Synthetic
• Only says if it looks realistic + lip sync is good
• Big Prior Introduced
• Users look for artefacts
• Randomness Bias in User selection
• Bored/Uninterested User
• No Quantitative Feedback for Model Improvement
• What makes it real/synthetic?



Perceptual Analysis of Talking Heads: A New McGurk Test

• McGurk Test for Perceptual Analysis

• Subject doesn’t develop a prior

• Helps address strengths and weaknesses

• Suggests improvements based on these

• Complements other tests



Perceptual Analysis of Talking Heads: The McGurk Effect

McGurk and MacDonald (1976):

• Auditory Syllable Dubbed onto Videotape of Different Syllables Gives Perception of an Entirely Different Syllable, e.g.:
• Audio ‘Ba’
• Visual ‘Ga’
• Perception ‘Da’
• “Close Eyes – Illusion Vanishes”
• Raises Psychological Audio-Visual questions:
• How are Auditory and Visual Stimuli combined?
• Why combine when audio is enough?



Perceptual Analysis of Talking Heads: Some More McGurk Effect Examples



Perceptual Analysis of Talking Heads: McGurk Effect Examples (REAL)





Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS)

Tuple: Bent/Vest/Vent

Tuple: Mat/Dead/Gnat



Perceptual Analysis of Talking Heads: McGurk Effect Examples (Synthetic)





Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS)

Synthetic Examples

Tuple: Fame/Face/Feign

Tuple: Mat/Dead/Gnat



Perceptual Analysis of Talking Heads: Our McGurk Test

McGurk Perceptual Evaluation Test:

• Mix Real and Synthetic tuples.
• What word do you perceive?
• Users asked to note any differences
• NO PRIORS as to real/synthetic forced choice
• User only asked about what they hear/perceive
• Best Viewing resolution
• Tested different resolutions (72x75, 36x289, 720x576 pixels)



Perceptual Analysis of Talking Heads: Our McGurk Experimental Procedure

• Mix of Real and Synthetic McGurk Examples
• Real examples are a control
• Users presented with a series of 60 (30 real, 30 synthetic) random examples
• Users asked only to focus on the mouth area
• Two initial example “training” sequences (not in trial)
• Soundproofed booths with adjustable volume and artificial lighting
• Replay option for all examples
• Users simply record the word they perceive
• Users asked three questions after viewing all clips
• “Did you notice anything about the videos that you can comment on?”
• “Could you tell that some of the videos were computer generated?”
• “Did you use the replay button at all?”
• 20 psychology undergrad test subjects (4 male/16 female) with normal hearing/vision
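The randomised presentation described above (30 real and 30 synthetic clips in one shuffled series) can be sketched as follows. This is an illustrative sketch only: the function name `build_trial_list`, the clip identifiers and the optional `seed` are assumptions, and the original test software was written in Macromedia Director, not Python.

```python
import random

def build_trial_list(real_clips, synthetic_clips, seed=None):
    """Mix real and synthetic clips into one randomly ordered trial list.

    Each trial is a (clip, source) pair; the source label is kept for later
    analysis and would not be shown to the participant.
    """
    rng = random.Random(seed)
    trials = [(clip, "real") for clip in real_clips]
    trials += [(clip, "synthetic") for clip in synthetic_clips]
    rng.shuffle(trials)
    return trials

# Hypothetical clip identifiers, 30 of each kind as in the procedure above.
real = [f"real_{i:02d}" for i in range(30)]
synthetic = [f"synth_{i:02d}" for i in range(30)]
trials = build_trial_list(real, synthetic, seed=1)
```

Keeping the source label out of the participant's view matches the aim of introducing no prior about which clips are synthetic.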



Perceptual Analysis of Talking Heads: How is Our McGurk Test a Test?

• How is this a test?
• Correct Lip Synch = McGurk Effect
• Incorrect Lip Synch = Audio/Other
• Audio should be dominant
• Questions Assess Behaviour/Output
• After the test procedure, participants are asked whether they noticed anything unnatural



Perceptual Analysis of Talking Heads: Results

Four Types of Analysis of Results:

• Standard McGurk Response
• From tuples form accepted audio and accepted McGurk response
• Original McGurk observation
• Enhanced McGurk Response
• Assemble a List of All Participants’ McGurk Responses
• Allows for greater variability in accents/articulation
• Allows for greater analysis and Improvement of Head Models
• Effects of Resolution on McGurk Effect
• End of Test Questions Analysis
• General overall response, qualitative analysis
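The difference between the standard and enhanced analyses can be sketched as a response tally. This is a sketch under stated assumptions: the function names are hypothetical, the responses in the usage note are invented placeholders rather than data from the study, and the example assumes the first word of a tuple is the audio word and the last is the accepted McGurk word, by analogy with the Ba/Ga/Da ordering.

```python
from collections import Counter

def classify_response(response, audio_word, mcgurk_words):
    """Label one written response as 'audio', 'mcgurk' or 'other'."""
    word = response.strip().lower()
    if word == audio_word.lower():
        return "audio"
    if word in {w.lower() for w in mcgurk_words}:
        return "mcgurk"
    return "other"

def tally(responses, audio_word, mcgurk_words):
    """Count how many responses fall into each category for one clip."""
    counts = Counter(
        classify_response(r, audio_word, mcgurk_words) for r in responses
    )
    return {label: counts.get(label, 0) for label in ("audio", "mcgurk", "other")}
```

For the tuple Bent/Vest/Vent, a standard analysis would call `tally(responses, "bent", {"vent"})`, while an enhanced analysis would pass a larger set of accepted McGurk words assembled from all participants' responses. Comparing the resulting counts for real and synthetic versions of the same clip gives a per-clip measure of how well the synthetic head reproduces the effect.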



Perceptual Analysis of Talking Heads: Standard McGurk Response



Perceptual Analysis of Talking Heads: Enhanced McGurk Response



Perceptual Analysis of Talking Heads: Image Resolution



Perceptual Analysis of Talking Heads: End of Test Questions Results

• “Notice anything to comment on?” Some audio didn’t match video
• “Could you tell some synthetic?” No, 1 participant = some unnatural?
• “Did you use replay?” Few = once, One = twice



Perceptual Analysis of Talking Heads: Overall Results Analysis

• Realistic behaviour
• Most users were unaware of synthetic output
• More McGurk effects in real output
• Points to some weakness in model
• Good Synthesis of /F/, /D/, /S/, /A/ and /E/
• Poor Synthesis of /V/
• Some weak real and synthetic McGurk responses
• Beige-Gaze-Deige -> 2X Audio v McGurk
• Mock-Dock-Knock -> 50:50 Audio:McGurk
• Resolution has effect on real only
• Due to overall lower synthetic McGurk response



Conclusions

• Suggested a perceptual approach to analysis and development of a Talking Head
• Unbiased by prior forced choice making
• Insight into performance of algorithms

• Complements other tests



Perceptual Analysis of Talking Heads: Future Work

• Talking Head
• Full Emotion
• Performance Driven Animation
• 3D Modelling
• Full 3D appearance modelling
• Other perceptual tests
• Longer videos – McGurk sentences
• Real/Synthesised correct lip synch: McGurk = bad synch?
• Emotion – A McGurk emotion test?



Web Links

• Paper Downloads

www.cs.cf.ac.uk/user/D.P.Cosker/publications.html

www.cs.cf.ac.uk/Dave/Publications.html

• McGurk Video Clips and McGurk Test Software (Macromedia Director)

www.cs.cf.ac.uk/user/D.P.Cosker/McGurk/
