FOT-Net Data Workshop 2: FOT Data Anonymization
DCode and DMask: Two Approaches for Video Data Anonymization
Based on Work done for FHWA Exploratory Advanced Research - Topic 2A and 2B
Amir Tamrakar, PI
September 1, 2015
Site: SAFER Vehicle and Traffic Safety Centre, Gothenburg, Sweden
Human Behavior Understanding Lab
• Multimodal Behavior and Communication Sensing
• Social Interaction Study and Modeling
FHWA Strategic Highway Research Program 2 (SHRP2)
• SHRP2 was established by Congress to investigate the underlying causes of highway crashes and congestion in a short-term program of focused research.
• The objective was to identify countermeasures that will significantly improve highway safety through an understanding of driving behaviors.
• Naturalistic Driving Study (NDS) under the SHRP2 program
– Collected normal driving behavior data
• 3,400+ drivers
• 5,400,000+ trips
• ~1 million hours of video data + other metadata
FHWA SHRP2 NDS Dataset
• Four camera views + GPS + lane trackers + vehicle operation data + cellphone records
• Data includes
– Different lighting conditions: day-time, night-time, and transitional light
– Different genders, age groups, ethnicities, facial hair, eye wear, head gear, …
SHRP2 Raw Video Data
[Figure: four in-cabin camera views (480x354, 240x356, 360x124, 360x124) and the in-cabin video capture hardware]
Projects DCode and DMask
• DCode: A Comprehensive Automatic Coding System for Driver Behavior Analysis
– Need: Way too much data for manual coding!
– Goal: Assist in the automatic coding of features relevant to safety researchers interested in using the SHRP2 NDS data.
• DMask: A Reliable Identity Masking System for Driver Safety Video Data
– Need: NDS video data is currently only accessible to researchers in secure data enclaves.
– Goal: Generate identity-masked video that can be disseminated to a wider audience.
Overview
• Driver Anonymization In Video
– Anonymize by Coding Driver Activity (DCode)
– Anonymize by Masking Driver’s Face / Body (DMask)
• Location Anonymization in Video
– Anonymize by Coding Driving Context (DCode)
– Anonymize by Masking the Location-Identifiable Portions of the Video
• Only a proposal (not part of our current project)
DCode: Technology Concept
• Goal: Assist in the automatic coding of features relevant to safety researchers interested in using the SHRP2 NDS data
• A comprehensive driving behavior study will need to take into account not only the actions and behaviors of the driver but also the "context" in which those actions are performed
– Context = everything external to the driver's person
Technical Overview
[Figure: three-tier processing architecture (Tier 1, Tier 2, Tier 3)]
• Lane trackers
• Accelerometers
• GPS
• Cell phone records
• Vehicle operation data
• Companion roadway information data
The Comprehensive List of High-Level Coded Features

Driver State

| Coded Feature | Intermediate Features Used | Relation to Safety |
|---|---|---|
| Coded Head Pose (Where is he looking?) | Quantize continuous head pose angles into discrete spatial zones in the cockpit | Various uses, e.g., for identifying distraction, mirror usage, blind spot monitoring, etc. |
| Gaze Direction (more accurate vector) | Gaze monitoring | Same as coded head pose |
| Driver's Eye Closure Frequency & Duration (blink rate and blink duration) | Eyes monitoring | Can indicate state of alertness or fatigue |
| Facial Expression (6 canonical ones) | Facial expression recognition | Can indicate fatigue, road rage, etc. |
| Driver Posture: Relaxed, Slouched, Engaged | Upper body tracking + head pose tracking | Can indicate fatigue, boredom, nervousness, etc. |
| Safety Belt Usage | Seat belt detection + seat belt wearing action detection | Safe driving behavior |
| # of Hands on the Wheel | Hand tracking + steering wheel detection | Safe driving behavior |
| Location of Hands on the Wheel | Hand tracking + steering wheel detection | For air bag related issues |
| Measure of Fatigue | Yawning + rubbing their eyes + blink rate | For impact on safety |
| Driver Affective State: Angry/Stressed/Fatigued/Distracted | Facial expression recognition + body posture + eye monitoring + measure of fatigue | For studying impact on safety |
| Object of Driver's Attention (What is he looking at?) | Head pose + eye gaze tracking + pedestrian tracking + vehicle tracking + other detected objects (cell phone, Sat Nav, billboards) | For understanding causes of distraction; also applicable to turning and lane changing behavior studies |
Task 4
The Comprehensive List of High-Level Coded Features

Driver Actions

| Coded Feature | Intermediate Features Used | Relation to Safety |
|---|---|---|
| Putting on/taking off safety belt | Driver gesture/action recognition | Safe driving behavior |
| Raising cell phone to the ear | Driver gesture/action recognition | Potential distraction |
| Driver rubbing their eyes | Driver gesture/action recognition | Can indicate fatigue |
| Driver yawning | Mouth monitoring | Can indicate fatigue, boredom, etc. |
| Putting on/taking off sunglasses | Driver gesture/action recognition | Safe driving behavior or object of distraction |
| Turning steering wheel (measure rate of turn) | Driver gesture/action recognition + steering wheel detection | |
| Raising/lowering visor | Driver gesture/action recognition | Safe driving behavior or object of distraction |
| Interacting with the instrument panel (radio, weather controls, Sat Navs) | Driver gesture/action recognition + steering wheel detection + dashboard | Distraction from driving |
| Driver talking on a cell phone (handheld or hands free) | Mouth moving + cell-phone-to-the-ear action + passenger detection + call info (positive or negative) | Distraction from driving |
| Driver talking to the passenger | | Distraction from driving |
| Drinking from a container | Driver gesture/action recognition | Potential distraction |
| Signaling to another driver / pedestrian | Driver gesture/action recognition + pedestrian tracking + vehicle tracking | Communication for safe driving |
The Comprehensive List of High-Level Coded Features

Driving Context

| Coded Feature | Intermediate Features Used | Relation to Safety |
|---|---|---|
| Weather Condition | Atmospherics classification | General context |
| Traffic Density | Vehicle tracking + radar info + roadway info | General context |
| Distance to Nearby Vehicles | Vehicle tracking + radar info | Maintaining safe distance or aggressive driving |
| Pedestrians Crossing the Street (walking or running hurriedly) | Pedestrian tracking + pedestrian action recognition | Safe interaction with pedestrians; cause of frustration, road rage, etc. |
| Pedestrians Loitering on the Street | Pedestrian tracking + pedestrian action recognition | Safe interaction with pedestrians; cause of frustration, road rage, etc. |
| Vehicle Changing Lanes (normal or aggressive) | Vehicle tracking + vehicle action recognition | For studying the impact of an aggressive vehicle on the driver, or identifying the driver's own aggressive behavior |
| Current Vehicle Tailgating or Another Vehicle Tailgating | Vehicle tracking + radar info + roadway info | |
| Vehicle Driving Erratically | Vehicle tracking + radar info | For studying the impact of nearby vehicle actions on the driver |
| Brake Light and Turn Signal States on Vehicles | Vehicle tracking + brake light detection + turn signal detection | |
Screenshot of Our Software Showing Various Codings
Driver’s Face Detection and Tracking
Core Feature: Head/Face Pose Tracking
Task 2.4
Courtesy of HPV metadata documentation from VTTI
This head pose angle corresponds to the front-facing direction for this driver.
[Figure: scatter plots of CB pan angle vs. head pose pan angle, and CB tilt angle vs. head pose tilt angle]
Errors:
Pan: mean = 0.99 deg, std = 2.24 deg
Tilt: mean = -5.32 deg, std = 4.70 deg
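The coded head pose feature listed earlier quantizes these continuous pan/tilt angles into discrete cockpit glance zones. The following is a minimal sketch of such a quantizer; the zone boundaries and labels are illustrative assumptions, not the calibrated values used in DCode.

```python
# Minimal sketch: quantize a continuous head pose (pan, tilt) into a discrete
# cockpit glance zone. Zone boundaries and labels are illustrative
# assumptions, not the calibrated values used in DCode.

def code_head_pose(pan_deg: float, tilt_deg: float) -> str:
    """Map a (pan, tilt) pair in degrees to a coarse cockpit zone label."""
    if tilt_deg < -20:
        return "instrument_panel" if -15 <= pan_deg <= 15 else "lap_or_console"
    if pan_deg < -30:
        return "left_window_or_mirror"
    if pan_deg > 30:
        return "right_window_or_mirror"
    return "forward_roadway"

print(code_head_pose(pan_deg=2.0, tilt_deg=-3.0))   # -> forward_roadway
print(code_head_pose(pan_deg=-45.0, tilt_deg=0.0))  # -> left_window_or_mirror
```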
Using Head/Face Pose to Compute 3D Glance Vectors
[Figure legend: Blue = landmark points; Red = glance target points; Box = legal volume for the driver's head]
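Turning a tracked pan/tilt head pose into a 3D glance vector is a small trigonometric step. The sketch below assumes a specific axis convention (x = right, y = up, z = forward); the talk does not specify the convention actually used.

```python
import math

def glance_vector(pan_deg: float, tilt_deg: float):
    """Convert head pose pan/tilt (degrees) into a unit 3D glance vector.

    Assumed convention (not from the talk): x = right, y = up,
    z = forward out of the driver's face; pan rotates about y, tilt about x.
    """
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    return (math.cos(tilt) * math.sin(pan),   # x
            math.sin(tilt),                   # y
            math.cos(tilt) * math.cos(pan))   # z

print(glance_vector(0.0, 0.0))     # (0.0, 0.0, 1.0): looking straight ahead
print(glance_vector(30.0, -10.0))  # glancing right and slightly down
```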
Intermediate Feature: Eyes/Gaze Monitoring
• Eye Blink Detection and Blink-Rate Estimation:
– Currently based solely on the tracked landmark features
Task 2.7
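The talk does not name the exact landmark-based method, but a common choice for landmark-only blink detection is the eye aspect ratio (EAR) of Soukupová and Čech (2016); the sketch below illustrates that family of approaches, with illustrative threshold values.

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR from six eye landmarks ordered [outer, top1, top2, inner, bot2, bot1].

    Ratio of vertical to horizontal eyelid distances; it drops sharply
    when the eye closes.
    """
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_series, thresh=0.2, min_frames=2):
    """Count dips of the EAR below thresh lasting at least min_frames frames."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < thresh:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    return blinks
```

Blink duration follows directly from the length of each below-threshold run divided by the frame rate.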
Intermediate Feature: Facial Expression Analysis
• Seven standard facial expression classes, plus neutral, were trained using the Cohn-Kanade+ (CK+) dataset
– Neutral, Angry, Contempt, Disgust, Fear, Happy, Sadness, Surprise
• This dataset only contains frontal faces.
• Thus, at evaluation time, we need to rotate the tracked faces in 3D and project them onto a fronto-parallel plane before we can use the trained classifiers.
• Qualitatively, the only expression that seems to arise in this data is "happy," when the drivers are chatting with the person in the passenger's seat.
Task 2.8
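A minimal sketch of the fronto-parallel projection step described above, assuming the tracker provides 3D landmark positions and an estimated head rotation matrix (the talk does not detail the exact parameterization):

```python
import numpy as np

def frontalize(landmarks_3d: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Undo the estimated head rotation and project landmarks orthographically.

    landmarks_3d: (N, 3) tracked landmarks in camera coordinates.
    R:            (3, 3) head rotation estimated by the pose tracker.
    Returns (N, 2) fronto-parallel points for the expression classifier.
    """
    centered = landmarks_3d - landmarks_3d.mean(axis=0)
    upright = centered @ R        # row-vector form of applying R^T (= R^-1)
    return upright[:, :2]         # drop depth: orthographic projection
```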
Core Features : Driver’s Hands and Upper Body Pose Tracking
• Goal:
– Track upper body joints
– Jointly from the frontal face view video and the overhead hands view video of the SHRP2 dataset
[Figure: our skeletal representation for upper body pose tracking, with joints for the head, chest, left/right shoulders, left/right elbows, and left/right hands]
Tasks 2.5 and 2.6
Upper Body Pose Tracking Examples
Local Driving Data
Intermediate Features: Driver Gesture/Action Recognition
• 11 gesture/action categories

| Action label | # of instances |
|---|---|
| Driving | 58 |
| Adjust mirror | 30 |
| Drink from cup | 66 |
| Look back to back up | 83 |
| Put on safety belt | 58 |
| Take off safety belt | 57 |
| Rest arm on window | 47 |
| Talk to passenger | 90 |
| Touch face | 80 |
| Make phone call | 84 |
| Put on glasses | 58 |
Task 2.9
Sample Identity Masked Video From Low-Level Coding
This video shows a visualization of the anonymized driver video using only the low-level body tracking information, including facial landmarks, head pose, and upper body skeleton.
Various Approaches to Identity Masking
DMask: Technology Concept
• Completely mask out the driver's head with an overlaid synthetic avatar.
• Keep as much of the driver's natural behavioral information as possible for downstream safety researchers.
[Figure: the four SHRP2 raw video views (480x354, 240x356, 360x124, 360x124) alongside the identity-protecting, face-masked driver video]
DMask: Approach
[Figure: DMask pipeline with Tracking, Masking, and Filling-in stages, plus a manual-assist step]
Synthesizing Avatars
• Why avatars?
– Guaranteed to cover the whole face!
– Should be able to largely preserve behavioral cues
• Eye state, facial expressions, lips moving, mouth opening, head pose and dynamics, even gaze direction to some extent
• How good will it be?
– Masking: 100%.
– Behavior transfer: As good as the tracking is.
[Figure: facial motion transfer pipeline, from input video through facial motion synthesis and avatar rendering to output video]
Modeling the Face and Facial Motion (For Transfer)
• It can't be too complicated.
– The avatar parameters will very quickly overwhelm the tracker.
• It will be harder to learn the mapping.
• It may also be too brittle, since there will be no information for some parts of the face but too much information for other parts.
– Tracker noise will also feature prominently in the rendering.
• But it can't be too simple either.
– None of the facial motion will transfer.
• The rigid "mask face" problem.
• Therefore, we've decided to stick with the fronto-parallel 2D point set model instead of venturing into 3D points.
Our Face Model
• Find a close mapping between the tracked landmark points on the image and a set of mesh vertices on the avatar's 3D mesh.
Facial Motion Transfer Step 1: Find a Mapping
• Map the set of tracked landmark points from a single frame of video to an optimal set of blend shape weight parameters that aligns the two faces, using an optimization approach
[Figure: reference neutral face B_N, reference target face A_N, and FACS-like blend shapes]
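One way to pose this single-frame alignment is as a bounded linear least-squares problem over the blend shape weights. The sketch below makes that assumption; the names, shapes, and the choice of solver are illustrative, not taken from the talk.

```python
import numpy as np
from scipy.optimize import lsq_linear

def fit_blendshape_weights(tracked, neutral, blendshapes):
    """Solve for blend shape weights w aligning the avatar to the tracked face.

    tracked:     (N, 2) tracked landmarks (fronto-parallel, normalized)
    neutral:     (N, 2) corresponding avatar mesh vertices at the neutral pose
    blendshapes: (K, N, 2) per-shape vertex offsets (FACS-like morph targets)

    Minimizes || (neutral + sum_k w_k * B_k) - tracked ||^2  s.t. 0 <= w <= 1.
    """
    K = blendshapes.shape[0]
    A = blendshapes.reshape(K, -1).T   # (2N, K) basis matrix
    b = (tracked - neutral).ravel()    # (2N,) residual to explain
    return lsq_linear(A, b, bounds=(0.0, 1.0)).x   # (K,) blend shape weights
```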
Some Close-Up Sample Results
Facial Motion Transfer Step 2: Learn the Mapping (Learn to Transfer)
• Key point:
– Features are defined as normalized deviations from a reference neutral face.
– Separate regression model for each dimension of W, using Support Vector Regression (SVR)
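A sketch of the per-dimension regression described above, using scikit-learn's SVR as a stand-in for whatever implementation was actually used. Here X holds the normalized landmark deviations from the neutral face, and W holds the blend shape weight vectors obtained from the Step 1 optimization.

```python
import numpy as np
from sklearn.svm import SVR

def train_weight_regressors(X: np.ndarray, W: np.ndarray):
    """Fit one independent SVR per blend shape weight dimension.

    X: (M, D) features = normalized landmark deviations from the neutral face
    W: (M, K) target blend shape weights from the Step 1 optimization
    """
    return [SVR(kernel="rbf").fit(X, W[:, k]) for k in range(W.shape[1])]

def predict_weights(models, x: np.ndarray) -> np.ndarray:
    """Predict the full K-dimensional weight vector for one frame's features."""
    return np.array([m.predict(x[None, :])[0] for m in models])
```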
Using the Learnt Model to Transfer Facial Expression/Motion
Example Rendering
This video shows the motion-transferred virtual avatar rendered over the original video.
FHWA NDS Video Data (Identity Masked, of course!)
Practical Problems: Natural Questions that Arise
• Will there be more than one model?
• Where does one get these models from?
• How does one select the most appropriate model for each driver?
• How much overhead will there be for adding a new model?
• How does one deal with / work around a person's hair?
• How does one handle head gear, eyewear, and other accessories?
• How do we judge whether the masking is good enough?
• How much slop is allowed?
Our Tools
• MakeHuman:
– For defining/designing a diverse set of models.
– Generates different models by specifying parameters like age, gender, race, and skin color, as well as extraneous parameters like hair style, hair color, etc.
– Real advantage: The mesh topology remains largely constant (even more so in the future), reducing the burden of re-training the motion synthesis.
• Blender:
– For defining and training motion synthesis.
– Allows us to define blend shapes (shape targets / morph targets) for parameterizing facial movements.
• Unity rendering engine:
– Efficient rendering over the video.
– Can use all of the models derived from the previous two software packages.
Some Avatars Generated from MakeHuman
Performance of Facial Feature Tracking
• The success of facial motion transfer and identity masking is totally dependent on the accuracy of the facial feature tracking.
• This graphic summarizes our current level of performance on the SHRP2 HPV dataset.
Analysis of Problem Cases
• Thick rimmed glasses
• Sunglasses
• Large pan angles (head turned away from the camera so that one side of the face is almost or fully occluded, i.e., profile faces)
• Beards (especially goatees)
• Shadows/highlights (due to sunlight) falling across the face
Going Beyond The Face
• Preserving the driver’s hands
• Computing alpha masks of the driver's hands and body (clothing)
– For future systems, it would be easier to do this using 3D sensors like the Kinect and Intel RealSense.
[Figure: 3D depth imagery]
Anonymizing Location
• It is not very hard these days to determine the exact location of a scene like this using crowdsourcing on the web.
• Therefore, there is a clear need to anonymize location information in external videos.
Masking Location Information In NDS Video
• Some parts of the driver video, e.g., looking out of the driver's side window, will also show external scenery that will need to be masked out.
• This is a tractable problem
– Integration of optic flow information over the video allows us to localize the windows (sketched below)
– Also needed for isolating the driver's foreground mask, so that driver data is not erased
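A minimal sketch of the flow-integration idea using OpenCV's Farnebäck dense optical flow: pixels with persistently high flow (moving exterior scenery) become candidate window regions, while the mostly static cabin interior does not. The threshold is an illustrative value, not from the project.

```python
import cv2
import numpy as np

def window_mask_from_flow(video_path: str, flow_thresh: float = 2.0):
    """Accumulate dense optical flow magnitude over a clip; pixels with
    persistently high average flow (moving exterior scenery) form a
    candidate window mask. flow_thresh is in pixels/frame (illustrative)."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    accum = np.zeros(prev.shape, np.float64)
    n = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        accum += np.linalg.norm(flow, axis=2)  # per-pixel flow magnitude
        prev, n = gray, n + 1
    cap.release()
    return (accum / max(n, 1)) > flow_thresh   # boolean window mask
```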
Pedestrian/Vehicle Detection and Tracking
• Can track vehicles, pedestrians, motorbikes, and bicycles in front-view video (see the detection sketch below)
– Can classify the type of a tracked vehicle (sedan, SUV, truck, etc.)
• Can approximately localize the surrounding vehicles from video
– Better with more processing
– Better with radar information
• Can detect and track lane markers
– Better with a roadway information database and GPS locations
• Can detect road signs / traffic lights
• Can detect brake lights / turn signals
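As a hedged illustration of the front-view pedestrian detection capability, OpenCV's stock HOG+SVM people detector can serve as a stand-in; the project's actual detector and tracker are not described in the talk.

```python
import cv2

# OpenCV's pre-trained HOG+SVM people detector, as a stand-in for the
# project's (unspecified) pedestrian detection module.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(frame):
    """Return pedestrian bounding boxes (x, y, w, h) in a front-view frame."""
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    return [tuple(b) for b in boxes]
```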
Proposal for Anonymizing The External Video
• As with the driver's video, we can simply represent the video using the tracked metadata.
• We can also render a virtual scene from all of the above-mentioned tracking information about the external scenery, without showing any of the original pixels from the video.
• Is this enough?
• What I'm interested in learning from safety researchers:
– How accurately does the external information need to be preserved?
Recap
DCode: Anonymization by Coding
DMask: Anonymization by Masking