
EHuM Workshop December, 2006 Leonid Sigal

Part I: HumanEva-I dataset and evaluation metrics

Leonid Sigal

Department of Computer Science

Brown University

Michael J. Black

http://www.cs.brown.edu/people/ls/

http://vision.cs.brown.edu/humaneva/

Motivation

381+ papers in the past ~20 years [D.A. Forsyth]

Real need for a common dataset with ground truth

Models: 2D, 2.5D, 3D; number of body parts; degrees of freedom per joint; …

Representation: kinematic (skeleton) tree; part-based models; graphical model; …

Shape: cylinders; conic cross-section; voxels; …

Likelihood: silhouette; edges (1st derivative filters); ridges (2nd derivative filters); optical flow; …

Priors: action-specific articulation priors; temporal priors; …

Inference Methods: direct optimization; stochastic optimization; particle filters; Hidden Markov Models; belief propagation; …

Motivation

That will help to address the following questions:

What is the state of the art in human motion and pose estimation?

What design choices are important and to what extent?

What are the strengths and weaknesses of different methods?

What are the main unsolved problems?

Similar datasets in other fields

Face recognition (FERET Dataset)

Human gait identification (HumanID Dataset)

Dense stereo vision

Activity Recognition (CAVIAR Dataset)

Pedestrian Classification (DaimlerChrysler Benchmark Dataset)

P.J. Phillips, H. Moon, S.A. Rizvi and P.J. Rauss. “The FERET evaluation methodology for face-recognition algorithms”. PAMI, 2000.

D. Scharstein and R. Szeliski. “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms”. IJCV, 2002.

EC Funded CAVIAR project/IST 2001 37540.

S. Sarkar, P. J. Phillips, Z. Liu, I. Robledo, P. Grother and K. W. Bowyer. “The Human ID Gait Challenge Problem: Data Sets, Performance, and Analysis”. PAMI, 2005.

S. Munder and D. M. Gavrila. “An Experimental Study on Pedestrian Classification”. PAMI, 2006.

HumanEva-I Hardware Setup

Motion Capture: Vicon (6 M1 cameras); frame rate of 120 fps

Video Capture 1: Spica Tech; 4 Pulnix TM6710 cameras; synchronized capture to disk; monochrome, 644 x 448 pixel, progressive scan; frame rate of 60 fps (120 fps max); hot-mirror filters (to filter out IR from Vicon)

Video Capture 2: IO Industries; 3 UniQ UC685CL cameras; synchronized capture to disk; color, 10-bit, 659 x 494 pixel, progressive scan; frame rate of 60 fps (110 fps max)

Automated software synchronization. Single world coordinate frame

I don’t recommend this camera!

This one is much better!

(Thank you to Stan Sclaroff and BU Team)

[Figure: capture setup with the world coordinate axes x, y, z]

Data collection and processing

HumanEva-I data is calibrated and software synchronized

Calibration of Mocap system

Intrinsic calibration of video cameras (Fc, Cc, Kc, αc = 0)

Extrinsic calibration of video cameras (Rc, Tc)

Temporal scaling (Ac)

Temporal alignment (Bc,s) (per sequence)

Data collection and processing

HumanEva-I data is calibrated and software synchronized

Calibration of Mocap system

Intrinsic calibration of video cameras (Fc, Cc, Kc, αc = 0)

Focal length – Fc ∈ ℝ²

Principal point – Cc ∈ ℝ²

Radial distortion – Kc ∈ ℝ⁵

Skew (we assume square pixels) – αc = 0

Extrinsic calibration of video cameras (Rc, Tc)

Temporal scaling (Ac)

Temporal alignment (Bc,s) (per sequence)

Based on Caltech Calibration Toolbox for Matlab
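The intrinsic parameters above can be read as a projection from camera-frame 3D points to pixels. The sketch below follows the lens model documented for the Caltech toolbox (radial terms Kc[0], Kc[1], Kc[4]; tangential terms Kc[2], Kc[3]); the function name and the numeric values are hypothetical and are not taken from the HumanEva support code.

```python
def project_intrinsic(point_cam, fc, cc, kc, alpha_c=0.0):
    """Project one camera-frame 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    X, Y, Z = point_cam
    x, y = X / Z, Y / Z                       # normalized pinhole coordinates
    r2 = x * x + y * y
    radial = 1.0 + kc[0] * r2 + kc[1] * r2**2 + kc[4] * r2**3
    dx = 2.0 * kc[2] * x * y + kc[3] * (r2 + 2.0 * x * x)   # tangential distortion
    dy = kc[2] * (r2 + 2.0 * y * y) + 2.0 * kc[3] * x * y
    xd, yd = radial * x + dx, radial * y + dy
    u = fc[0] * (xd + alpha_c * yd) + cc[0]   # alpha_c = 0 means no skew
    v = fc[1] * yd + cc[1]
    return u, v


# Hypothetical parameter values, chosen only to exercise the function.
fc = (800.0, 800.0)                 # focal length in pixels
cc = (320.0, 240.0)                 # principal point in pixels
kc = (0.0, 0.0, 0.0, 0.0, 0.0)      # no lens distortion
print(project_intrinsic((0.5, -0.25, 2.0), fc, cc, kc))
```

With all distortion coefficients at zero this reduces to the plain pinhole model, which makes it easy to check by hand.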

Data collection and processing

HumanEva-I data is calibrated and software synchronized

Calibration of Mocap system

Intrinsic calibration of video cameras (Fc, Cc, Kc, αc = 0)

Extrinsic calibration of video cameras (Rc, Tc)

Global rotation – Rc ∈ SO(3)

Global translation – Tc ∈ ℝ³

Temporal scaling (Ac)

Temporal alignment (Bc,s) (per sequence)

Based on Caltech Calibration Toolbox for Matlab
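As an illustration of the extrinsic parameters, the sketch below maps a world-frame point into a camera frame with X_cam = Rc · X_world + Tc, the convention used by the Caltech toolbox, and checks that a matrix lies in SO(3). The function names and the example values are hypothetical.

```python
import math


def world_to_camera(point_world, R, T):
    """Apply X_cam = R * X_world + T (R: 3x3 nested lists, T: length-3 sequence)."""
    return tuple(
        sum(R[i][j] * point_world[j] for j in range(3)) + T[i]
        for i in range(3)
    )


def is_rotation(R, tol=1e-9):
    """Check that R is in SO(3): R * R^T = I and det(R) = +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(R[i][k] * R[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    det = (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
           - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
           + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))
    return abs(det - 1.0) <= tol


# Hypothetical extrinsics: 90-degree rotation about z, translation of 3 units along z.
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
R_c = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
T_c = (0.0, 0.0, 3.0)
print(is_rotation(R_c), world_to_camera((1.0, 0.0, 0.0), R_c, T_c))
```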

Data collection and processing

HumanEva-I data is calibrated and software synchronized

Calibration of Mocap system

Intrinsic calibration of video cameras (Fc, Cc, Kc, αc = 0)

Extrinsic calibration of video cameras (Rc, Tc)

Temporal scaling (Ac)

Temporal alignment (Bc,s) (per sequence)

Manually mark some visible markers in a few frames, then use direct optimization
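One way to read the temporal scaling and alignment parameters is as a linear map from a video frame index to a MoCap frame index. The slides do not state the exact form, so the linear map, the function name and the numbers below are assumptions for illustration only.

```python
def video_to_mocap_frame(video_frame, scale, offset):
    """Map a video frame index to a (fractional) MoCap frame index.

    Assumes a linear relation: scale plays the role of Ac (per camera),
    offset the role of Bc,s (per camera and sequence).
    """
    return scale * video_frame + offset


# Hypothetical values: a scale near 2 would correspond to 120 fps MoCap
# against 60 fps video; the offset is made up.
A_c, B_cs = 2.0, 5.0
mocap_index = round(video_to_mocap_frame(10, A_c, B_cs))
print(mocap_index)
```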

HumanEva-I Dataset

7 video cameras (4 grayscale, 3 color)

4 subjects

6 actions each

Each action is repeated 3 times (twice with synchronized MoCap and video and once with MoCap Only)

HumanEva-I Dataset

Walking, Subject - S1

HumanEva-I Dataset

Jogging, Subject - S3

HumanEva-I Dataset

Boxing, Subject – S2

HumanEva-I Dataset

Gestures, Subject – S1

HumanEva-I Dataset

Throw and Catch, Subject – S2

HumanEva-I Dataset

Combo, Subject – S4

HumanEva-I Dataset

Training: MoCap (~35,000 frames); synchronized MoCap and video (~6,800 frames)

Validation: synchronized MoCap and video (~6,800 frames)

Testing: video only (~24,000 frames); synchronized MoCap is withheld; on-line evaluation (to disallow tweaking of parameters)

Background Subtraction

Background template images are given

Sample background subtraction support code

Better background subtraction techniques will be presented today

[Figure: background subtraction examples for the color cameras and the grayscale cameras]
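For readers unfamiliar with template-based background subtraction, a minimal per-pixel thresholding sketch follows. It is not the dataset's support code; the function name, the threshold and the tiny grayscale images are hypothetical.

```python
def subtract_background(frame, background, threshold):
    """Return a binary mask: 1 where |frame - background| exceeds the threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Made-up 2x3 grayscale images (intensity 0..255).
background = [[10, 10, 10],
              [12, 11, 10]]
frame = [[11, 200, 9],
         [12, 180, 90]]
mask = subtract_background(frame, background, threshold=30)
print(mask)
```

A fixed global threshold is the simplest possible choice; per-pixel statistical models are a common refinement.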

Quantitative Evaluation

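Average distance between markers corresponding to joints and limb endpoints:

$$D(X, \hat{X}, \hat{\Delta}) = \sum_{m=1}^{M} \frac{\hat{\delta}_m \, \lVert x_m - \hat{x}_m \rVert}{\sum_{i=1}^{M} \hat{\delta}_i}$$

where

$$X = \{x_1, x_2, \ldots, x_M\}, \quad x_i \in \Re^3$$

$$\hat{X} = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_M\}, \quad \hat{x}_i \in \Re^3$$

$$\hat{\Delta} = \{\hat{\delta}_1, \hat{\delta}_2, \ldots, \hat{\delta}_M\}, \quad \hat{\delta}_i \in [0, 1]$$

$$M = 15$$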
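The error measure is a weighted average of per-marker Euclidean distances, with the weights normalized to sum to one. A minimal sketch, assuming markers are given as 3-tuples; the function name and the example values are hypothetical.

```python
import math


def marker_error(X, X_hat, delta_hat):
    """Weighted average distance between true markers X and estimated markers X_hat.

    delta_hat[m] in [0, 1] weights marker m (0 drops it from the average).
    """
    total_weight = sum(delta_hat)
    if total_weight == 0:
        raise ValueError("all marker weights are zero")
    weighted = sum(
        d * math.dist(x, x_hat) for x, x_hat, d in zip(X, X_hat, delta_hat)
    )
    return weighted / total_weight


# Made-up example with M = 2 markers instead of 15.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
X_hat = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(marker_error(X, X_hat, [1, 1]))   # both markers counted
print(marker_error(X, X_hat, [1, 0]))   # second marker ignored
```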

Part II: Performance of APF on HumanEva-I

Leonid Sigal

Department of Computer Science

Brown University

Michael J. Black

Alexandru Balan

http://www.cs.brown.edu/people/alb/

http://vision.cs.brown.edu/humaneva/

Benchmark Reference Algorithm

Alexandru Balan, Leonid Sigal and Michael J. Black. “A Quantitative Evaluation of Video-based 3D Person Tracking”. VS-PETS, 2005

Annealed Particle Filtering [Deutscher, Blake & Reid, CVPR’00]

Based on general Bayesian recursive posterior estimation

$$p(X_t \mid \vec{Y}_{1:t}) \propto p(Y_t \mid X_t) \int p(X_t \mid X_{t-1}) \, p(X_{t-1} \mid \vec{Y}_{1:t-1}) \, dX_{t-1}$$

Posterior: probability of pose given image evidence

Temporal prior

Likelihood: probability that pose generated the image

Articulated Body Model

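Represent a “pose” at time t by a vector of all parameters: $X_t$

40D space

[Marr & Nishihara ’78]

Kinematic tree, with the parameters labeled in the figure:

$[\tau^{x}_{0,g}, \tau^{y}_{0,g}, \tau^{z}_{0,g}]$

$[\theta^{x}_{0,g}, \theta^{y}_{0,g}, \theta^{z}_{0,g}]$

$[\theta^{x}_{1,0}, \theta^{y}_{1,0}, \theta^{z}_{1,0}]$

$[\theta^{x}_{2,1}]$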
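To illustrate how a kinematic tree turns relative joint angles into body-part positions, here is a planar forward-kinematics sketch. It is a toy with three parts and one angle per joint, not the 40-parameter 3D model from the slide; the tree, the part lengths and the function name are hypothetical.

```python
import math

# Hypothetical chain: part 0 is the root, part 1 attaches to 0, part 2 to 1.
PARENT = {0: None, 1: 0, 2: 1}
LENGTH = {0: 0.5, 1: 0.4, 2: 0.4}   # made-up part lengths


def joint_positions(root_xy, angles):
    """Planar forward kinematics.

    angles[0] is the root's global orientation; angles[i] for i > 0 is the
    rotation of part i relative to its parent. Each child joint sits at the
    far end of its parent.
    """
    pos, heading = {}, {}
    for part in sorted(PARENT):              # parents are numbered before children
        parent = PARENT[part]
        if parent is None:
            pos[part] = root_xy
            heading[part] = angles[part]
        else:
            px, py = pos[parent]
            h = heading[parent]
            pos[part] = (px + LENGTH[parent] * math.cos(h),
                         py + LENGTH[parent] * math.sin(h))
            heading[part] = h + angles[part]
    return pos


pose = {0: 0.0, 1: math.pi / 2, 2: 0.0}
print(joint_positions((0.0, 0.0), pose))
```

Because each angle is relative to the parent, changing one joint moves every part below it in the tree.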

Likelihood $p(Y_t \mid X_t)$

p(bg pixel | limb location and orientation)

[Deutscher, Blake & Reid, CVPR’00]

Likelihood $p(Y_t \mid X_t)$

p(edge filter response | limb edge location and orientation)

[Deutscher, Blake & Reid, CVPR’00]

Temporal Prior $p(X_t \mid X_{t-1})$

Prior can be very simple [Deutscher, Blake & Reid, CVPR’00]:

$$p(X_t \mid X_{t-1}) = N(X_{t-1}, Q)$$

Include constraints via a pose prior (using rejection sampler)

Self-intersection constraints

Range of motion constraints for individual joints (can be learned from MoCap)

Action-specific

General
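The Gaussian temporal prior combined with a rejection sampler can be sketched as follows. The sketch assumes a diagonal covariance Q and models only range-of-motion limits (no self-intersection test); the function name, limits and standard deviations are hypothetical.

```python
import random


def sample_temporal_prior(prev_pose, sigmas, limits, rng, max_tries=1000):
    """Draw X_t ~ N(X_{t-1}, Q) with diagonal Q, rejecting out-of-range poses.

    sigmas[i] is the standard deviation for parameter i; limits[i] is its
    allowed (low, high) range.
    """
    for _ in range(max_tries):
        pose = [rng.gauss(x, s) for x, s in zip(prev_pose, sigmas)]
        if all(lo <= v <= hi for v, (lo, hi) in zip(pose, limits)):
            return pose
    raise RuntimeError("no valid pose found; limits may be too tight")


rng = random.Random(0)
limits = [(-1.0, 1.0), (0.0, 2.5)]          # made-up joint limits (radians)
pose = sample_temporal_prior([0.0, 1.0], [0.1, 0.1], limits, rng)
print(pose)
```

Rejection keeps the accepted samples distributed as the Gaussian restricted to the valid region, at the cost of wasted draws when the previous pose sits near a limit.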

Inference using Particle Filtering

[Isard & Blake ’96]

[Figure: one particle filtering step. Posterior $p(X_{t-1} \mid \vec{Y}_{t-1})$ → sample → Temporal dynamics $p(X_t \mid X_{t-1})$ → sample → Likelihood $p(Y_t \mid X_t)$ → normalize → Posterior $p(X_t \mid \vec{Y}_t)$]
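The sample, propagate, weight and normalize cycle can be sketched on a one-dimensional toy state. This is a generic bootstrap particle filter, not the tracker evaluated in the talk; the Gaussian dynamics and likelihood, their widths and the function name are assumptions.

```python
import math
import random


def particle_filter_step(particles, weights, observation, rng,
                         dyn_sigma=0.1, obs_sigma=0.2):
    """One step: resample, apply temporal dynamics, weight by likelihood, normalize."""
    n = len(particles)
    # Sample from the previous posterior (weighted resampling with replacement).
    resampled = rng.choices(particles, weights=weights, k=n)
    # Sample from the temporal dynamics p(X_t | X_{t-1}).
    predicted = [rng.gauss(x, dyn_sigma) for x in resampled]
    # Evaluate the likelihood p(Y_t | X_t) for every particle.
    lik = [math.exp(-0.5 * ((observation - x) / obs_sigma) ** 2) for x in predicted]
    total = sum(lik)
    if total == 0.0:                      # all likelihoods underflowed
        return predicted, [1.0 / n] * n
    # Normalize so the weights represent the new posterior.
    return predicted, [l / total for l in lik]


rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(200)]
weights = [1.0 / 200] * 200
for _ in range(20):                       # constant made-up observation
    particles, weights = particle_filter_step(particles, weights, 1.0, rng)
estimate = sum(p * w for p, w in zip(particles, weights))
print(estimate)
```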

Annealed Particle Filter

Smooth the likelihood:

$$p(Y_t \mid X_t)^{\beta_m}$$

where $\beta_m$ is the annealing parameter
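Raising the likelihood to a power below one flattens it, so more particles keep non-negligible weight. The sketch below shows the effect on the effective sample size; the likelihood values and the exponents are made up, and how the exponent is scheduled across annealing layers is not shown.

```python
def annealed_weights(likelihoods, beta):
    """Normalized particle weights proportional to likelihood ** beta."""
    powered = [p ** beta for p in likelihoods]
    total = sum(powered)
    return [w / total for w in powered]


def effective_sample_size(weights):
    """1 / sum(w_i^2); equals the particle count for uniform weights."""
    return 1.0 / sum(w * w for w in weights)


likelihoods = [0.8, 0.1, 0.1]             # made-up likelihood values
for beta in (0.0, 0.25, 1.0):
    w = annealed_weights(likelihoods, beta)
    print(beta, effective_sample_size(w))
```

With beta = 0 every particle is weighted equally, and with beta = 1 the original likelihood is used unchanged.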

Conclusions from VS-PETS 2005

Q: How does performance scale with the number of views?

A: Works poorly with < 3 views, does not gain significant benefit from more than 3 views

Q: How does performance scale with the number of particles?

A: Exponential [log(N) vs. error = straight line]

Q: How do different choices of likelihoods affect performance?

A: Silhouettes are most useful, adding edge features helps with internal edges

Q: Does annealing help?

A: Not as much as we initially thought

Results from VS-PETS 2005

HumanEva-I Experiments

We know how to set up APF to produce good tracking performance

Use all 7 views

Initialize from ground truth

Use 250 particles (more is better)

5 layers of annealing

Likelihood (silhouettes + edges)

Do the observations we have made on VS-PETS data generalize to the HumanEva dataset?

Do action-specific priors help and to what extent?

Do action-specific priors help?

How much? (Maybe the benefits of the general prior outweigh the additional error)

Action-specific prior on walking

General prior on walking

Other actions …

Collaborators

HumanEva-I: Alexandru Balan (Brown University); Michael Black (Brown University); Rui Li (Boston University); Payman Yadollahpour (Brown University); Ming-Hsuan Yang (Honda Research Institute); Horst Haussecker (Intel Research)

Annealed Particle Filtering: Alexandru Balan (Brown University); Michael Black (Brown University)

EHuM Program Committee Members

All contributors and attendees

