Human Emotion Synthesis David Oziem, Lisa Gralewski, Neill Campbell, Colin Dalton, David Gibson,...


Human Emotion Synthesis

David Oziem, Lisa Gralewski, Neill Campbell, Colin Dalton, David Gibson, Barry Thomas

University of Bristol, Motion Ripper, 3CR Research

Synthesising Facial Emotions – University of Bristol – 3CR Research

Project Group

• Motion Ripper Project

– Methods of motion capture.
– Re-using captured motion signatures.
– Synthesising new or extended motion sequences.
– Tools to aid animation.

• Collaboration between University of Bristol CS, Matrix Media & Granada.

Introduction

• What is an emotion?

• Ekman outlined six basic emotions:
– joy, disgust, surprise, fear, anger and sadness.

• Emotional states relate to one’s expression and movement.

• Synthesising video footage of an actress expressing different emotions.

Video Textures

• Video textures or temporal textures are textures with motion. (Szummer’96)

• Schodl’00 reordered frames from the original to produce loops or continuous sequences.

– Doesn’t produce new footage.

• Campbell’01, Fitzgibbon’01 and Reissell’01 used an autoregressive process (ARP) to synthesise frames.

Examples of Video Textures

Autoregressive Process

• Statistical model

• Calculating the model involves estimating the parameter vector (a1, …, an) and the noise weight w.

• n is known as the order of the process.

$$y(t) = -a_1\,y(t-1) - a_2\,y(t-2) - \dots - a_n\,y(t-n) + w\,\varepsilon$$

where y(t) is the current value at time t, (a1, …, an) is the parameter vector and w·ε is the noise term.
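A minimal sketch of fitting and sampling a scalar ARP in the sign convention above, assuming ordinary least squares for the a_i and the residual standard deviation for w. The helper names `fit_arp` and `simulate_arp` are hypothetical, and the original work may estimate the model differently.

```python
import numpy as np


def fit_arp(y, n):
    """Least-squares fit of y(t) = -a1*y(t-1) - ... - an*y(t-n) + w*eps."""
    y = np.asarray(y, dtype=float)
    # Column k-1 holds the lagged values y(t-k) for t = n, ..., len(y)-1
    X = np.column_stack([y[n - k:len(y) - k] for k in range(1, n + 1)])
    target = y[n:]
    c, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ c
    # The slide writes the coefficients with a minus sign, so a = -c
    return -c, residual.std()


def simulate_arp(a, w, history, steps, rng):
    """Generate `steps` new values, starting from at least n past values."""
    n = len(a)
    y = list(history)
    for _ in range(steps):
        past = np.array(y[-1:-n - 1:-1])  # y(t-1), ..., y(t-n)
        y.append(-np.dot(a, past) + w * rng.standard_normal())
    return np.array(y[len(history):])


rng = np.random.default_rng(0)
true_a = np.array([-1.5, 0.75])  # a stable order-2 process
y = simulate_arp(true_a, 0.5, [0.0, 0.0], 5000, rng)
a_hat, w_hat = fit_arp(y, 2)
```

With 5000 samples the fitted `a_hat` and `w_hat` land close to the values used to generate the data.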


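• Increasing the dimensionality of y drastically increases the complexity of calculating (a1, …, an): for a d-dimensional y, each ai becomes a d × d matrix, so there are n·d² coefficients to estimate.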
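To make the growth concrete, here is a hypothetical helper that counts only the coefficient entries (the noise term is ignored), using the fact that with d-dimensional y each of the n coefficients is a d × d matrix. The 10,000-dimensional case assumes, purely for illustration, a raw 100 × 100 greyscale frame.

```python
def arp_coefficient_count(d, n):
    """Entries in the n coefficient matrices of a d-dimensional ARP of order n."""
    return n * d * d


# Scalar series, ~40 PCA dimensions, and a raw 100x100 frame (illustrative)
for d in (1, 40, 100 * 100):
    print(d, arp_coefficient_count(d, n=2))
# prints:
# 1 2
# 40 3200
# 10000 200000000
```

Working in a few dozen PCA dimensions rather than raw pixels keeps the estimation problem tractable.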


[Figure: PCA analysis of the Sad footage in 2D (primary mode vs. secondary mode)]

• Principal Component Analysis (PCA) is used to reduce the number of dimensions of the original sequence.
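A minimal PCA sketch using NumPy's SVD, assuming each frame is flattened into one row of a (T, D) array. `pca_reduce` and `pca_reconstruct` are hypothetical helpers; the first two columns of `scores` correspond to the primary and secondary modes used in the 2D plots.

```python
import numpy as np


def pca_reduce(frames, k):
    """Project a (T, D) sequence of flattened frames onto its first k modes."""
    X = np.asarray(frames, dtype=float)
    mean = X.mean(axis=0)
    centred = X - mean
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:k]
    scores = centred @ basis.T  # (T, k): the reduced sequence
    return scores, basis, mean


def pca_reconstruct(scores, basis, mean):
    """Map reduced coordinates back to flattened frames."""
    return scores @ basis + mean


rng = np.random.default_rng(1)
# Synthetic rank-3 "footage": 50 frames of 20 values each
frames = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20)) + 5.0
scores, basis, mean = pca_reduce(frames, k=3)
restored = pca_reconstruct(scores, basis, mean)
```

Because the synthetic data has rank 3, keeping three modes reconstructs it exactly; real footage needs more modes and is reconstructed only approximately.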

[Figure: left, PCA analysis of the Sad footage in 2D; right, a sequence generated using an ARP (primary mode vs. secondary mode)]

• The non-Gaussian distribution of the data is modelled incorrectly by an ARP: when driven by Gaussian noise, a linear ARP produces Gaussian-distributed output.

Face Modelling

• Campbell’01 synthesised a talking head.

• Cootes and Taylor’00: combined appearance model.
– Isolates shape and texture.

• Requires labelled frames.
– Important features on the face must be labelled.

[Figure: labelled points]

Combined Appearance

[Figure: shape space]

Hand-labelled video footage provides a point set that represents the shape space of the clip.

Combined Appearance

[Figure: shape space and texture space]

Warping each frame into a standard pose creates the texture space.

The standard pose is the mean position of the points.
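A sketch of the shape side, assuming the hand-labelled points arrive as a (T, P, 2) array of P (x, y) points per frame; the helper names are hypothetical. Only the mean (standard) pose and the per-frame shape vectors are computed here. The warp of each frame into that pose, which yields the texture space, is not shown.

```python
import numpy as np


def shape_vectors(landmarks):
    """Flatten (T, P, 2) landmarks to (T, 2P): one shape vector per frame."""
    pts = np.asarray(landmarks, dtype=float)
    return pts.reshape(pts.shape[0], -1)


def standard_pose(landmarks):
    """Mean position of each labelled point over all frames, shape (P, 2)."""
    return np.asarray(landmarks, dtype=float).mean(axis=0)


landmarks = [[[0, 0], [2, 2]],
             [[2, 0], [4, 2]]]  # two frames, two labelled points each
pose = standard_pose(landmarks)
vectors = shape_vectors(landmarks)
```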

Combined Appearance

[Figure: combined space]

Joining the shape and texture spaces and re-analysing the result with PCA produces the combined space.
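A sketch of building the combined space, assuming shape and texture have already been flattened into per-frame rows. The shape parameters are scaled by r, where r² is the ratio of total texture variance to total shape variance; this weighting is a common choice in Cootes-style appearance models and is an assumption here, as are the helper names and the number of modes kept in each PCA.

```python
import numpy as np


def pca(X, k):
    """Return (scores, basis, mean) for the first k principal modes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:k]
    return (X - mean) @ basis.T, basis, mean


def build_combined(shapes, textures, ks, kt, kc):
    b_s, Ps, ms = pca(shapes, ks)    # shape parameters per frame
    b_t, Pt, mt = pca(textures, kt)  # texture parameters per frame
    # Scale shape parameters so their total variance matches the texture's
    r = np.sqrt(b_t.var(axis=0).sum() / b_s.var(axis=0).sum())
    joined = np.hstack([r * b_s, b_t])
    c, Pc, mc = pca(joined, kc)      # combined parameters per frame
    return c, (Ps, ms, Pt, mt, r, Pc, mc)


def reconstruct(c, model):
    """Recover (shapes, textures) from combined parameters."""
    Ps, ms, Pt, mt, r, Pc, mc = model
    joined = c @ Pc + mc
    ks = Ps.shape[0]
    b_s, b_t = joined[:, :ks] / r, joined[:, ks:]
    return b_s @ Ps + ms, b_t @ Pt + mt


rng = np.random.default_rng(2)
shapes = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 10))
textures = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
c, model = build_combined(shapes, textures, ks=2, kt=3, kc=5)
shapes_hat, textures_hat = reconstruct(c, model)
```

Here every mode is kept, so `reconstruct` recovers the input exactly; in practice kc is chosen smaller than ks + kt and the reconstruction of the original sequence is approximate.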

Combined Appearance

Reconstruction of the original sequence from the combined space.

[Figure: combined space (primary mode vs. secondary mode)]

Combined Appearance

[Figure: left, the original sequence in 2D; right, the combined appearance sequence, showing the change in distribution after applying the combined appearance technique (primary mode vs. secondary mode)]

Combined Appearance

[Figure: original sequence and the sequence generated by the ARP model (primary mode vs. secondary mode)]

• Visually, the generated plot appears to come from the same stochastic process as the original.

Copying and ARP

• Combine the benefits of copying with ARP:
– New motion signatures.
– Handles non-Gaussian distributions.

Copying and ARP

[Diagram: original input → PCA → reduced input]

• Reducing the input is important to lower the complexity of the search process.

• Around 30 to 40 dimensions are needed in this example.
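One way to arrive at a figure such as 30 to 40 dimensions is to keep enough PCA modes to explain a chosen fraction of the variance. The 95% threshold and the helper name are assumptions; the slides do not state the criterion used.

```python
import numpy as np


def dims_for_variance(frames, fraction=0.95):
    """Smallest number of PCA modes whose variance reaches `fraction` of the total."""
    X = np.asarray(frames, dtype=float)
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(s))  # guard against rounding when fraction is 1.0


# Toy data with two equally important directions
frames = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(dims_for_variance(frames, 0.4), dims_for_variance(frames, 0.95))  # prints: 1 2
```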

Copying and ARP

[Diagram: segmented input → PCA → reduced segments]

• Temporal segments of between 15 and 30 frames.

• Each segment must be reduced before ARPs can be trained on it.
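A sketch of cutting the reduced sequence into temporal segments and reducing each one again with its own PCA. The 20-frame length lies in the 15 to 30 frame range above; the step between segment starts (i.e. whether segments overlap), the number of modes kept per segment and the helper names are assumptions.

```python
import numpy as np


def segment(sequence, length, step):
    """Split a (T, d) sequence into windows of `length` frames, `step` apart."""
    starts = range(0, len(sequence) - length + 1, step)
    return [sequence[s:s + length] for s in starts]


def reduce_segment(seg, k):
    """Per-segment PCA: returns (scores, basis, mean) with k modes."""
    mean = seg.mean(axis=0)
    _, _, vt = np.linalg.svd(seg - mean, full_matrices=False)
    basis = vt[:k]
    return (seg - mean) @ basis.T, basis, mean


# Stand-in for a 100-frame sequence already reduced to 35 global PCA dimensions
reduced = np.random.default_rng(3).standard_normal((100, 35))
segments = segment(reduced, length=20, step=10)
per_segment = [reduce_segment(s, k=3) for s in segments]
```

Each reduced segment would then be used to train its own ARP.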

Copying and ARP

[Diagram: segmented input → PCA → reduced segments → ARP → synthesised segments]

• Many of the learned models are unstable.

• Only 10-20% are usable.
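A standard way to screen learned models is to check the eigenvalues of the ARP's companion matrix: the process is stable only if they all lie strictly inside the unit circle. The sketch assumes vector ARPs stored as an (n, d, d) array in the slides' sign convention; whether the original work filtered models this way is not stated.

```python
import numpy as np


def is_stable(A):
    """A[i] is the d x d matrix multiplying y(t-i-1) in y(t) = -sum A[i] y(t-i-1) + noise."""
    A = np.asarray(A, dtype=float)
    n, d, _ = A.shape
    companion = np.zeros((n * d, n * d))
    companion[:d, :] = -np.hstack(list(A))    # first block row: -A1 ... -An
    companion[d:, :-d] = np.eye((n - 1) * d)  # shift older states down one slot
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)


stable = np.array([[[-1.5]], [[0.75]]])  # roots of z^2 - 1.5z + 0.75 have |z| ~ 0.87
unstable = np.array([[[-1.1]]])          # y(t) = 1.1 y(t-1) grows without bound
print(is_stable(stable), is_stable(unstable))  # prints: True False
```

Usable models could then be kept with a filter such as `[m for m in models if is_stable(m)]`.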

Copying and ARP

[Diagram: segmented input → PCA → reduced segments → ARP → synthesised segments → segment selection → output sequence]

Example

[Figure: first mode against time t, marking the end of the generated sequence, the possible segments and the compared section]

Example

The closest 3 segments are chosen.

Example

The segment to be copied is randomly selected from the closest 3.
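The selection step can be sketched as follows, assuming the comparison is a sum of squared differences between the last `compare_len` frames of the output and the first `compare_len` frames of each candidate, measured in the reduced (PCA) space. The distance measure and helper name are assumptions; choosing the closest 3 and picking one of them at random follow the slides.

```python
import numpy as np


def choose_next_segment(generated, candidates, compare_len, k=3, rng=None):
    """Return the index of a candidate chosen at random among the k closest."""
    if rng is None:
        rng = np.random.default_rng()
    tail = generated[-compare_len:]
    # Distance from the end of the output to the start of each candidate
    dists = [np.sum((c[:compare_len] - tail) ** 2) for c in candidates]
    closest = np.argsort(dists)[:k]
    return int(rng.choice(closest))


generated = np.zeros((10, 2))
candidates = [np.full((5, 2), v) for v in (0.0, 1.0, 2.0, 10.0, 20.0)]
pick = choose_next_segment(generated, candidates, compare_len=3,
                           rng=np.random.default_rng(4))
```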


Example

Segments are blended together using a small overlap and averaging the overlapping pixels.
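A sketch of the blend, assuming frames are stacked along the first axis (so it works for PCA coordinates and pixel arrays alike) and an equal-weight average over the overlap, which is what "averaging the overlapping pixels" suggests. The helper name and overlap length are illustrative.

```python
import numpy as np


def append_with_blend(output, segment, overlap):
    """Append `segment` to `output`, averaging the `overlap` frames they share."""
    output = np.asarray(output, dtype=float)
    segment = np.asarray(segment, dtype=float)
    if overlap == 0:
        return np.concatenate([output, segment])
    blended = (output[-overlap:] + segment[:overlap]) / 2.0
    return np.concatenate([output[:-overlap], blended, segment[overlap:]])


# Five black 4x4 frames followed by four frames of value 2, overlapping by 2
out = append_with_blend(np.zeros((5, 4, 4)), np.full((4, 4, 4), 2.0), overlap=2)
print(out.shape, out[:, 0, 0])  # prints: (7, 4, 4) [0. 0. 0. 1. 1. 2. 2.]
```

Repeating selection and blending lets the output grow for as long as new segments are appended.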

[Figure: left, PCA analysis of the Sad footage in 2D; right, a sequence generated by the copying & ARP model (primary mode vs. secondary mode)]

Copying and ARP

• Potentially infinitely long.

• Includes novel motions.

Results (Angry)

[Video: source footage; copying with ARP; combined appearance ARP]

• Combined appearance produces higher resolution frames.

• The copying and ARP approach produces better motion.

Results (Sad)

[Video: source footage; copying with ARP; combined appearance ARP]

• Similar results to the angry footage.
– The copying approach is less blurred due to the reduced variance.

Comparison Results

[Figure legend: combined appearance; segment copying]

• Simple objective comparison.
– Randomly selected temporal segments.

Comparison

• Perceptually, is it better to have good motion or higher resolution?

[Video: combined appearance vs. segment copying with ARP]

Other potential uses

• Self Organising Map

• Uses combined appearance, as each ARP model provides a minimal representation of the given emotion.

• Can navigate between emotions to create new intermediate states.

[Figure: map labelled Angry, Sad and Happy]

Conclusions

• Both methods can produce synthesised clips of a given emotion.

• Combined appearance produces higher definition frames.

• Copying with ARPs generates more natural movement.

Questions

