Juergen Gall

Analyzing Human Behavior in Video Sequences

Analyzing Human Behavior


Low-level features, e.g., gradients, optical flow

Analyzing Human Behavior

Human Pose

09.10.2017 Juergen Gall – Institute of Computer Science III – Computer Vision Group

21 Actions from HMDB

928 clips, 33183 frames

HMDB51 (Kuehne et al, ICCV 2011)

Puppet Annotation

Joint-annotated HMDB (JHMDB)

[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]

Study with Annotated Data (2013)

• Large potential gain for pose feature

• Not with existing 2D human pose methods

Gain over baseline:

• Low: given puppet flow: + ~11%

• Mid: given puppet mask: + ~9%

• High: given joint positions (pose features): + ~20%



[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]

CNNs for Pose Estimation

Stack CNNs:

[ S.-E. Wei et al. Convolutional Pose Machines. CVPR 2016 ]

Coupled Action Recognition and Pose


[ U. Iqbal et al. Pose for Action – Action for Pose. FG 2017 ]

Pose Estimation in Videos

Video datasets for human pose in unconstrained videos do not exist.

Unconstrained means

• Publicly available content from the Internet (e.g.

• Multiple persons in a video (no assumption about

• Arbitrary number of visible joints (truncation and

• Large scale variations (unknown scale)

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose-Track Dataset

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Joint-annotated HMDB (JHMDB)

[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]

Pose-Track Dataset

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Challenge ICCV 2017

[ http://posetrack.net/workshops/iccv2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking

Estimate pose + person association over time:

• Predict body joints (CNN trained on MPII Pose)

• Build a graph with temporal and spatial edges

[Figure: frames f, f′, f″]

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking

Unaries: Confidences of detected joints

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking

Spatial binaries: Extract a square bounding box around

Two cases:

• Different joint type:

• Logistic regression based on distance and orientation

• Same joint type:

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
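
The "different joint type" case can be illustrated with a small sketch. It assumes that the only features are the length and the angle of the offset between two joint detections, and the weights and bias below are hypothetical placeholders (in practice they would be learned from annotated data). It is not the feature set of the cited paper, only an instance of logistic regression on distance and orientation.

```python
import math

def pair_features(p, q):
    """Distance and orientation (radians) of the offset from joint p to joint q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return [math.hypot(dx, dy), math.atan2(dy, dx)]

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def same_person_probability(p, q, weights, bias):
    """Probability that detections p and q belong to the same person."""
    feats = pair_features(p, q)
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return logistic(z)

# Hypothetical parameters: larger distances lower the probability,
# orientation is ignored (weight 0) to keep the toy example simple.
WEIGHTS = [-0.05, 0.0]
BIAS = 2.0

near = same_person_probability((100, 100), (110, 120), WEIGHTS, BIAS)
far = same_person_probability((100, 100), (400, 300), WEIGHTS, BIAS)
```

With these toy parameters a nearby pair scores above 0.5 and a distant pair scores close to 0, which is the kind of pairwise cost that would be attached to a spatial edge.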

Pose Track: Simultaneous Pose Estimation and Tracking

Temporal binaries: Compute optical flow (DeepMatching)

Logistic regression:

[Figure: frames f, f′]

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking

Solve integer linear program:

[Figure: frames f, f′, f″]

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Simultaneous Pose Estimation and Tracking

To obtain plausible poses, constraints are added:

• Spatial transitivity:

• Temporal transitivity:

• Spatio-temporal transitivity:

• Spatio-temporal consistency:

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
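
As an illustration of what a transitivity constraint requires: if detections a and b are assigned to the same person, and b and c as well, then a and c must be too. For binary edge variables this is commonly written as x_ab + x_bc - 1 <= x_ac. The sketch below only checks a given labeling for violations; the function name and the dictionary layout are hypothetical, and the same check would apply whether the edges are spatial, temporal, or mixed.

```python
from itertools import permutations

def violated_triples(nodes, same):
    """List ordered triples (a, b, c) that break transitivity.

    `same` maps frozenset({u, v}) to 1 if u and v are labeled as the
    same person and 0 otherwise (hypothetical data layout).
    """
    bad = []
    for a, b, c in permutations(nodes, 3):
        x_ab = same[frozenset((a, b))]
        x_bc = same[frozenset((b, c))]
        x_ac = same[frozenset((a, c))]
        # Linear form of the constraint: x_ab + x_bc - 1 <= x_ac
        if x_ab + x_bc - 1 > x_ac:
            bad.append((a, b, c))
    return bad

nodes = ["d1", "d2", "d3"]
consistent = {
    frozenset(("d1", "d2")): 1,
    frozenset(("d2", "d3")): 1,
    frozenset(("d1", "d3")): 1,
}
# Same labeling, but d1 and d3 are declared different persons.
inconsistent = dict(consistent)
inconsistent[frozenset(("d1", "d3"))] = 0

ok = violated_triples(nodes, consistent)
broken = violated_triples(nodes, inconsistent)
```

In an integer linear program these inequalities would be added as constraints rather than checked afterwards, so that the solver can only return labelings that describe valid person clusters.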

Pose Track: Simultaneous Pose Estimation and Tracking

Estimate pose + person association over time:

• Predict body joints (CNN trained on MPII Pose)

• Build a graph with temporal and spatial edges

• Partition spatio-temporal graph

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]

Pose Track: Evaluation

• Pose estimation accuracy (mAP)

• Person association (MOTA)

[ U. Iqbal et al. Pose-Track: Joint Multi-Person Pose Estimation and Tracking. CVPR 2017 ]
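
For reference, MOTA (Multiple Object Tracking Accuracy) is usually defined as one minus the ratio of all tracking errors (misses, false positives, identity switches) to the number of ground-truth objects, summed over frames. The sketch below assumes per-frame error counts are already available; how detections are matched to ground truth is not shown, and the function name is hypothetical.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """MOTA from per-frame counts (each argument is a list, one entry per frame)."""
    total_gt = sum(num_ground_truth)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt

# Toy example with three frames and ten ground-truth joints per frame.
score = mota(
    false_negatives=[1, 0, 2],
    false_positives=[0, 1, 0],
    id_switches=[0, 0, 1],
    num_ground_truth=[10, 10, 10],
)
```

Note that MOTA is not bounded below by zero: a tracker that produces more errors than there are ground-truth objects gets a negative score.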

Joint-annotated HMDB (JHMDB)

[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]

Video Analysis for Studying the Behavior of Mice

Recurrent Neural Networks

• Gated units (LSTM/GRU)
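
To make "gated units" concrete, here is a minimal sketch of a single GRU update in plain Python. It follows one common convention, h_new = (1 - z) * h + z * h_candidate; some references swap the roles of z and 1 - z. The parameter dictionary and the helper names are hypothetical and the weights are toy values, not a trained model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(t) for t in zip(*vectors)]

def gru_step(x, h, p):
    """One GRU update for input x and previous hidden state h."""
    # Update gate: how much of the candidate state replaces the old state.
    z = [sigmoid(a) for a in add(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    # Reset gate: how much of the old state feeds into the candidate.
    r = [sigmoid(a) for a in add(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in add(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def zero_params(n_in, n_hid):
    """All-zero toy parameters (hypothetical helper for the example)."""
    W = lambda cols: [[0.0] * cols for _ in range(n_hid)]
    return {"Wz": W(n_in), "Uz": W(n_hid), "bz": [0.0] * n_hid,
            "Wr": W(n_in), "Ur": W(n_hid), "br": [0.0] * n_hid,
            "Wh": W(n_in), "Uh": W(n_hid), "bh": [0.0] * n_hid}

# With all-zero parameters both gates are 0.5 and the candidate is 0,
# so the hidden state is simply halved.
h1 = gru_step([0.3, -0.1], [1.0, -2.0], zero_params(2, 2))
```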

Weakly Supervised Learning

• Fully supervised:

• Weakly supervised (transcripts)

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

• Represent an activity $a$ like “spoon_powder” by latent sub-activities $s_1^{(a)}, s_2^{(a)}, \dots$

• Optimal number of sub-activities is unknown:

• Many sub-activities for long activities

• Few sub-activities for short activities

[Figure: sub-activity sequence $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, s_4^{(a)}, s_5^{(a)}, s_6^{(a)}$]

• RNN with Gated Recurrent Units (GRU)


• Hidden Markov Model (HMM) enforces fixed order of sub-activities: $s_1^{(a)}, s_2^{(a)}, \dots$

• HMMs use probabilities of RNN as input

• Hidden Markov Model (HMM) for each activity

• The transcripts define the order of activities:
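
The interplay of frame-wise probabilities and a fixed state order can be sketched as a Viterbi alignment over a left-to-right model. This is a simplified illustration, not the method of the cited paper: it assumes uniform transition scores (each frame either stays in the current sub-activity or moves to the next one), no length model, and that `log_probs` holds per-frame log-probabilities such as an RNN might output. All names are hypothetical.

```python
import math

def align(log_probs, state_order):
    """Assign one sub-activity per frame, respecting the given order.

    log_probs[t][s]: log-probability of sub-activity s at frame t.
    state_order: sub-activity ids in the order they must occur.
    """
    T, N = len(log_probs), len(state_order)
    if T < N:
        raise ValueError("need at least one frame per sub-activity")
    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_probs[0][state_order[0]]
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else NEG
            if move > stay:
                best, back[t][n] = move, n - 1
            else:
                best, back[t][n] = stay, n
            if best > NEG:
                score[t][n] = best + log_probs[t][state_order[n]]
    # Backtrack from the last sub-activity at the last frame.
    n = N - 1
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = state_order[n]
        n = back[t][n]
    return path

# Toy frame-wise probabilities for three sub-activities over six frames.
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]
labels = align(log_probs, state_order=[0, 1, 2])
```

For a whole video, `state_order` would be the concatenation of the sub-activity sequences of the activities listed in the transcript, so the transcript fixes the order while the alignment decides where the boundaries fall.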

Weakly Supervised Learning

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]


• Accuracy on unseen sequences (video without

• Accuracy on unseen sequences (video with

[ A. Richard et al. Weakly Supervised Action Learning with RNN based Fine-to-Coarse Modeling. CVPR 2017 ]

Research Unit - Anticipating Human Behavior

20.09.2016 Research Unit 2535 - Anticipating Human Behavior

[ https://pages.iai.uni-bonn.de/FOR2535 ]

Thank you for your attention.
