Date posted: 10-Jul-2015
Category: Technology
Uploaded by: debmalya-sinha
Emotion Recognition
Felicitous Computing Institute
Debmalya Sinha
Means of Emotion Recognition
• Speech
• Gestures
• Facial Features
• Heartbeat
• Skin Conductance
• Brain Imagery
• Movement Features
• Handwriting
• Dance
• Social Presence
• Biochemical
• Eye & Pupil

[Figure: Valence-Arousal circumplex with quadrants: Anger (high arousal, low valence), Joy (high arousal, high valence), Love (low arousal, high valence), Sadness (low arousal, low valence)]
Active User Participation
• Users usually have to consciously take part
• Chances of suppressing emotive cues
• Chances of showing an inverted affective state
• Results tend to be biased
Passive User Participation
• Users do not need to take part actively
• Users have much less control over the results
• Less attention and control means less bias
Active-to-Passive Transition
Two other key factors for the transition:
1. Device invisibility
2. Sensor distance
Over time, users tend to become familiarized:
• The actions slowly become more passive
• The tendency to control decreases
• Faking of emotion decreases
Device invisibility
Instead, an invisible tracker/sensor device sits in the background and does the work for the users.
Problems with direct-contact devices:
• In various situations, users will not have the mindset to actively engage (trauma, sadness, forlornness)
• Sensor distance is important in these situations
MoodScope
Sensor Distance
With external attachments:
• Facial Features
• Heartbeat
• Skin Conductance
• Brain Imagery
• Movement and Gestures
• Biochemical
• Eye & Pupil
Without attachments:
• No body attachments
• Large sensor distance
• Can be operated more passively
• Much more unconscious participation
• Bias can be further minimized
Modes of Passive Recognizers
Passive, ambient sensors:
• Facial Features
• Eye & Pupil
• Movement and Gestures

These need:
• Focus
• Facing the camera
• A degree of attachment to sensors
In many cases where:
1. the face is not visible,
2. there is no provision for attaching sensors to the body,
3. there is no speech input,

movement and gesture detection is a much more feasible way to detect affect.
Movements and Gestures: A scenario
Situations where body movements and gestures are crucial:
1. A Post-Traumatic Stress Disorder (PTSD) patient pacing in the room.
2. A schizophrenic patient at an asylum growing impatient and angry, making frivolous, jerky movements.
3. A patient with chronic depression pacing slowly, hands in pockets, head drooping.
An automated system that detects emotive states in such situations can even save lives.
HaiXiu (害羞, "shy")
• Records gestures and movement
• Derives a unique feature set
• Trains a neural net for later detection
• Continuous emotion detection
HaiXiu (害羞)
• Microsoft Kinect™ for movement detection
• Rather than discrete affective states, our target is to detect arousal and valence levels in continuous space.
• This model of continuous affective-level detection can be implemented with other continuous affective spaces, e.g. Plutchik's Emotion Wheel or the PAD model.
• Presently HaiXiu detects only arousal levels. Work is ongoing to include the valence level.
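As an illustration of how a continuous (valence, arousal) pair maps onto the four quadrant labels of the circumplex used throughout this deck, here is a minimal sketch; the function name and the zero thresholds are assumptions for illustration, not part of HaiXiu:

```python
# Hypothetical sketch: map continuous (valence, arousal) values, each
# assumed to lie in [-1.0, +1.0], to a circumplex quadrant label.
# Quadrants follow the diagram: high arousal splits Anger/Joy by valence,
# low arousal splits Sadness/Love by valence.

def circumplex_quadrant(valence: float, arousal: float) -> str:
    if arousal >= 0:
        return "Joy" if valence >= 0 else "Anger"
    return "Love" if valence >= 0 else "Sadness"
```

A real detector would output the continuous pair itself; the label is only a human-readable summary of where the point falls.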
[Figure: Valence-Arousal circumplex (Anger, Joy, Love, Sadness quadrants)]
Feature Set for Arousal level detection
Kinect gives us position data for 20 different joints.
We calculate:
1. Minimum coordinates for the X, Y and Z axes (relative to spine)
2. Maximum coordinates for the X, Y and Z axes (relative to spine)
3. Speed = Δs/Δt
4. Peak acceleration = max(Δu/Δt)
5. Peak deceleration = -min(Δu/Δt)
6. Average acceleration = (Σ (Δu/Δt))/f
7. Average deceleration = -(Σ (Δu/Δt))/f
8. Jerk index = (Σ (Δa/Δt))/f

where Δt = 0.2 s and f = total time / Δt.
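The feature list above can be sketched as follows, assuming a sequence of per-frame (x, y, z) positions for one joint, sampled every 0.2 s and already expressed relative to the spine joint; the function and variable names are illustrative, not taken from the HaiXiu code:

```python
# Minimal sketch of the per-joint movement-feature computation.
import math

DT = 0.2  # sampling interval in seconds, per the slide

def movement_features(positions, dt=DT):
    xs, ys, zs = zip(*positions)
    # 1-2. Per-axis extremes (relative to spine)
    extremes = {
        "min": (min(xs), min(ys), min(zs)),
        "max": (max(xs), max(ys), max(zs)),
    }
    # 3. Speed per frame: Δs / Δt
    speeds = [
        math.dist(positions[i], positions[i + 1]) / dt
        for i in range(len(positions) - 1)
    ]
    # Acceleration per frame: Δu / Δt (u = speed)
    accels = [(speeds[i + 1] - speeds[i]) / dt
              for i in range(len(speeds) - 1)]
    f = len(accels)  # number of intervals ≈ total time / Δt
    features = {
        "peak_accel": max(accels),            # 4.
        "peak_decel": -min(accels),           # 5.
        "avg_accel": sum(a for a in accels if a > 0) / f,   # 6.
        "avg_decel": -sum(a for a in accels if a < 0) / f,  # 7.
        # 8. Jerk index: mean rate of change of acceleration
        "jerk_index": sum(
            (accels[i + 1] - accels[i]) / dt
            for i in range(len(accels) - 1)
        ) / f,
    }
    return extremes, speeds, features
```

In practice this would be run over each of the tracked joints (or the upper-body subset mentioned below) and the results concatenated into the feature vector.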
Training the Neural Net
Initially we took 20 movement features (without the position features) and had 2 subjects walk at various arousal levels. We measured speed, acceleration, deceleration and jerk index for the upper-body joints.
Type: Bipolar feedforward ANN
Layers: 3 (20 : 6 : 1)
Learning: Backpropagation
Sample size: 34 walks (at different arousal levels) by 2 subjects
Error limit of the trained net: 0.0956
Detection

The ANN outputs a single variable for the arousal level.
The output ranges from -1 (totally relaxed) to +1 (very aroused).
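The network described above can be sketched as a small NumPy implementation; the 20:6:1 layer sizes, bipolar (tanh) activations, and backpropagation learning follow the slides, while the learning rate, initialization, and class structure are assumptions, not the HaiXiu source:

```python
# Hedged sketch of a 20:6:1 bipolar feedforward ANN trained with plain
# backpropagation on squared error. The tanh output naturally yields the
# -1 (relaxed) to +1 (aroused) range described above.
import numpy as np

class ArousalNet:
    def __init__(self, n_in=20, n_hidden=6, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, x):
        self.h = np.tanh(x @ self.W1 + self.b1)       # hidden layer
        self.y = np.tanh(self.h @ self.W2 + self.b2)  # output in (-1, +1)
        return self.y

    def train_step(self, x, target, lr=0.1):
        n = len(x)
        y = self.forward(x)
        # Backpropagation; tanh'(z) = 1 - tanh(z)^2
        d_out = (y - target) * (1 - y ** 2)
        d_hid = (d_out @ self.W2.T) * (1 - self.h ** 2)
        self.W2 -= lr * (self.h.T @ d_out) / n
        self.b2 -= lr * d_out.sum(axis=0) / n
        self.W1 -= lr * (x.T @ d_hid) / n
        self.b1 -= lr * d_hid.sum(axis=0) / n
        return float(np.mean((y - target) ** 2))  # error before the update
```

Repeated calls to `train_step` on the 34-walk feature matrix would drive the mean squared error down toward the quoted 0.0956 error limit.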
Challenges
1. Short working range of the Kinect: 0.8 m to 4.0 m
2. Shorter than the range needed in practical scenarios
3. Data not consistent enough for precise movement-feature calculation
4. Fault tolerance is needed for recording and detection
5. Kinect does not follow the BVH format, so available gesture databases in BVH cannot be used natively without a converter module (less efficient)
Next Step
1. Introducing the position coordinates
2. Fine-tuning the arousal-level recognizer
3. A robust gesture-recognition module
4. Building a valence-recognizer module
5. Getting more test data from more subjects
6. Multiple-Kinect integration for better recognition
7. A slightly better user interface
Integrated Emotion detection
1. Every one of the modes of recognition has its merits
2. There is a plethora of existing facial-expression detectors, like Affectiva
3. Speech-based emotion recognition has also been done extensively
4. MoodScope has changed smartphone-based affect detection
5. Powerful tools like AmbientDynamix make it easy to integrate various sensor inputs for processing and use on small devices like a smartphone
Thank You