EMPATH: A Neural Network that Categorizes Facial Expressions
Matthew N. Dailey and Garrison W. Cottrell, University of California, San Diego
Curtis Padgett, California Institute of Technology
Facial Expression Recognition (Theory 1): Categorical Perception
Categories are discrete entities
Sharp categorical boundaries
Discrimination of similar pairs of expressive faces is enhanced near category boundaries
Facial Expression Recognition (Theory 2)
Expressions are graded and considered points in a continuous, low-dimensional space
e.g. “Surprise” lies between “Happiness” and “Fear”
Historical Research (Categorical)
Ekman and Friesen (1976): 10-step photos between pairs of caricatures
Ekman (1999): essay on basic emotions
Harnad (1987): Categorical Perception
Beale and Keil (1995): morph image sequence with famous faces
Etcoff and Magee (1992): facial expression recognition tied to perceptual mechanism
Historical Research (Continuous)
Schlosberg (1952): category ratings and subjects' “errors” predicted accurately by arranging categories around an ellipse
Russell (1980): structure theory of emotions
Russell and Bullock (1986): emotion categories best thought of as fuzzy sets
Russell et al. (1989), Katsikitis (1997), Schiano et al. (2000): continuous multidimensional perceptual space for facial expression perception
Young et al.’s (1997) “Megamix” Experiments
Experiment 1: Subjects identify the emotional category in 10%, 30%, 50%, 70%, and 90% morphs between all pairs of the 6 prototypical expressions
  6-way forced choice identification
Experiment 2: Same as experiment 1 with the addition of the “neutral” face
  7-way forced choice
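As a rough illustration of the Experiment 1 stimulus set, the sketch below enumerates the morph conditions. It assumes the six prototypes are the usual basic emotions and that "all pairs" means unordered pairs (a 10% morph from A to B being the same image as a 90% morph from B to A); the slides do not state the stimulus count, so the totals here are only what follows from those assumptions.

```python
from itertools import combinations

# Assumed labels for the six prototypical expressions.
EMOTIONS = ["happiness", "sadness", "fear", "anger", "surprise", "disgust"]
MORPH_LEVELS = [10, 30, 50, 70, 90]  # percent morph levels from the slide

# Every unordered pair of prototypes defines one morph continuum.
pairs = list(combinations(EMOTIONS, 2))

# One stimulus per (continuum, morph level).
stimuli = [(a, b, pct) for a, b in pairs for pct in MORPH_LEVELS]

print(len(pairs))    # 15
print(len(stimuli))  # 75
```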
Young et al.’s (1997) “Megamix” Experiments
Experiment 3: Discriminate pairs of stimuli along the six transitions
  Sequential discrimination task (ABX)
  Simultaneous discrimination task (same-different)
Experiment 4: Determine what expression is “mixed-in” to a faint morph
  Given a morph or prototype stimulus, indicate the most apparent, second-most apparent, and third-most apparent emotion
“Megamix” Experiment Results
Results from experiments 1-3 support the categorical view of facial expression perception
Results from experiment 4 showed that subjects were significantly likely to detect a mixed-in emotion at the 30% level. This supports the continuous, dimensional accounts of facial expression perception
Rather than settling the issue of categorical vs. continuous theories, Young et al. found evidence to support BOTH theories
Until now, no computational model has ever been able to simultaneously explain these seemingly contradictory data
The Model
Three-layer neural network
  Perceptual analysis
  Object representation
  Categorization
Feedforward network (no backpropagation at later levels)
Input is a 240 x 292 grayscale face image
Perceptual Analysis Layer
Neurons whose response properties are similar to complex cells in the visual cortex
This is modeled by “Gabor Filters”
Basically, these units do nonlinear edge detection at five different scales and eight different orientations
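A minimal sketch of such a filter bank follows, assuming a standard complex Gabor kernel (a Gaussian envelope times a complex sinusoid) and taking the response magnitude as the nonlinearity. The wavelengths, envelope width, and patch size are hypothetical values chosen for illustration; the slides give only the counts of five scales and eight orientations.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the sinusoid varies.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    sigma = 0.5 * wavelength  # assumed envelope width, not from the slides
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier

def gabor_magnitudes(patch, wavelengths, n_orientations=8):
    """Response magnitude of every filter at the centre of a square patch."""
    size = patch.shape[0]
    responses = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta)
            # Taking the magnitude discards phase; this is the nonlinear step.
            responses.append(np.abs(np.sum(patch * kernel)))
    return np.array(responses)

WAVELENGTHS = [4, 6, 8, 12, 16]  # hypothetical: five scales, in pixels
rng = np.random.default_rng(0)
patch = rng.random((65, 65))     # stand-in for a region of a face image
features = gabor_magnitudes(patch, WAVELENGTHS)
print(features.shape)  # (40,)
```

Each image location therefore yields 5 x 8 = 40 non-negative responses under these assumptions.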
Object Representation Layer
Extract a small set of features from high-dimensional data
Equivalent to an “image compression” network that extracts global representations of the data
Principal components analysis is used to model this layer
50 linear hidden units
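The projection onto 50 principal components can be sketched with a plain SVD, as below. The data matrix is random placeholder data with hypothetical dimensions (80 samples, 400 features), standing in for the perceptual-layer outputs; only the number of components comes from the slide.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of vt are orthonormal principal directions
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Linear projection: one output per component, like linear hidden units."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 400))  # placeholder for perceptual-layer features
mean, components = fit_pca(X, 50)
Z = project(X, mean, components)
print(Z.shape)  # (80, 50)
```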
Categorization Layer
Simple perceptron with six outputs (one for each “basic” emotion)
The network is set up so that the outputs can be interpreted as probabilities (i.e. they are all positive and sum to 1)
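One standard way to obtain outputs that are positive and sum to 1 is a softmax over the six output units. The slides do not name the output function, so softmax is an assumption here, as are the emotion labels, the random weights, and the 50-dimensional input taken from the previous layer.

```python
import numpy as np

# Assumed labels for the six "basic" emotions.
EMOTIONS = ["happiness", "sadness", "fear", "anger", "surprise", "disgust"]

def softmax(z):
    """Map real-valued scores to positive values that sum to 1."""
    z = z - np.max(z)  # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

def categorize(features, W, b):
    """Single-layer perceptron: one linear unit per emotion, then softmax."""
    return softmax(W @ features + b)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 50))  # untrained, illustrative weights
b = np.zeros(6)
p = categorize(rng.normal(size=50), W, b)
print(dict(zip(EMOTIONS, np.round(p, 3))))
```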
Experiments & Results
Same experiments as the Young et al. “Megamix” experiments
Results
The model and humans find the same expressions difficult or easy to interpret
When presented with morphs between pairs of expressions, the model and humans place similar sharp category boundaries between prototypes
The model and humans are similarly sensitive to mixed-in expressions in morph stimuli
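To compare against the Experiment 4 task (most, second-most, and third-most apparent emotion), one plausible reading is to rank the network's output probabilities. The sketch below assumes that reading; the slides do not specify how the model's responses were scored, and the probability vector is made up for illustration.

```python
import numpy as np

# Assumed labels for the six "basic" emotions.
EMOTIONS = ["happiness", "sadness", "fear", "anger", "surprise", "disgust"]

def rank_emotions(probs, labels, k=3):
    """Return the k labels with the highest output probability, best first."""
    order = np.argsort(probs)[::-1][:k]
    return [labels[i] for i in order]

# Made-up outputs for a morph that is mostly happiness with some surprise.
probs = np.array([0.62, 0.03, 0.05, 0.02, 0.25, 0.03])
print(rank_emotions(probs, EMOTIONS))  # ['happiness', 'surprise', 'fear']
```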
More Results
Network generalization to unseen faces, compared to human agreement on the same face (six-way forced choice)
Conclusion
This model was able to simulate both the categorical and continuous nature of facial expression classification, consistent with the human experiments conducted by Young et al.
Categorical or Continuous?
Conclusion leans toward both theories being complementary instead of mutually exclusive
“tapping different computational levels of processing”
Which one applies is dictated by the task and the data