Multi-camera detection, tracking
and re-identification
Andrea Cavallaro
youtube.com/smartcameras
twitter.com/smartcameras
Disjoint fields of view
Wireless camera networks
• how to effectively scale up to networks of 10s or 100s of cameras?
• how to compensate for the absence of observations?
Acknowledgments: Riccardo Mazzon, Syed Fahad Tahir, Christian Nastasi
Introduction
Scale
• In-body: health, medicine
• Personal: video conferencing, gaming
• Home: energy efficiency, domotics
• Building: service optimisation, security
• Facility: goods tracking, safety
• City: traffic control, planning
• Inter-city: traffic monitoring, transit control
• In the wild: fauna monitoring, fire detection
A. Cavallaro
Data sharing strategies
Centralized Decentralized Distributed
Distributed and decentralized multi-camera tracking. M. Taj, A. Cavallaro. IEEE Signal Processing Magazine, Vol. 28, Issue 3, May 2011
Fields of view
Disjoint fields of view: re-identification
Person re-identification in crowd. R. Mazzon, S.F. Tahir, A. Cavallaro. Pattern Recognition Letters, to appear, 2012
Multi-camera tracking using a Multi-Goal Social Force Model. R. Mazzon, A. Cavallaro. Neurocomputing, to appear, 2012
What is the problem?
• Objective: to estimate target correspondence given two sets of object observations in two disjoint camera views
Prior work: 4 phases
• Person detection
– classifier, motion detection, or a combination
• Feature extraction
– colour, texture, shape and their combination
– support
  • single instance of an object
  • grouping object features over time (requires intra-camera tracking)
• Calibration
– cross-camera colour calibration
– spatio-temporal calibration
  • spatial relationships among cameras
  • entry/exit points in the fields of view
  • travel time across cameras
• Association
Challenge 1: movement
• (Large) variability of
– travel time
– entrance point in the second field of view
Challenge 2: pose and appearance
Main ideas
• Problems in real scenarios
– object matching (alone) is inaccurate
– (naïve) temporal prediction is not enough
• Solution
– crowd (motion prediction) modeling for trajectory propagation
– integrate information from a map of the scene
– appropriate object representation that works in crowd
Proposed approach
[Figure: block diagram with Detection, Tracking and Human motion prediction for each camera view, a common Top view, and Re-identification.]
Motion prediction in uncovered areas
[Figure: block diagram with Detection, Tracking and Human motion prediction for each camera view, a common Top view, and Re-identification.]
Motion prediction approaches
• Multi-Goal Social Force Model (MGSFM)
– models attractive and repulsive forces
– parameter based
– used for crowd simulation [Helbing2000]
• Landmark-based method (LBM)
– defines the position of points of interest (landmarks)
– movements constrained to pass through landmarks
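The attractive and repulsive forces of a social force model can be illustrated with a minimal sketch in the spirit of [Helbing2000]. This is not the MGSFM itself: the single goal, the exponential repulsion, the Euler integration and all parameter values (`desired_speed`, `tau`, `a`, `b`, `dt`) are assumptions chosen for illustration.

```python
import math

def social_force_step(pos, vel, goal, others, desired_speed=1.3,
                      tau=0.5, a=2.0, b=0.3, dt=0.1):
    """Advance one pedestrian by one time step (illustrative parameters)."""
    # Attractive force: relax the velocity towards the desired velocity,
    # which points from the current position to the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy) or 1e-9
    fx = (desired_speed * gx / dist - vel[0]) / tau
    fy = (desired_speed * gy / dist - vel[1]) / tau
    # Repulsive forces: decay exponentially with the distance to each
    # other person and push away from that person.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy) or 1e-9
        mag = a * math.exp(-d / b)
        fx += mag * dx / d
        fy += mag * dy / d
    # Explicit Euler integration of velocity, then position.
    new_vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel

# Propagate a trajectory through an unobserved area (hypothetical scene).
pos, vel = (0.0, 0.0), (1.0, 0.0)
for _ in range(50):
    pos, vel = social_force_step(pos, vel, goal=(10.0, 0.0),
                                 others=[(5.0, 0.2)])
```

Repeating the step from the last observed position and velocity gives a predicted trajectory in the area that no camera covers.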
Multi-Goal Social Force Model
[Figure: example scene with labels 1 to 4 and velocity $v_i^0$.]
Video: http://www.youtube.com/watch?v=3AIr92YPY94
Landmark-Based Method
[Figure: top view with cameras C1 and C2 and landmarks a1 to a8 and b1 to b12. Legend: crossing landmark, entry landmark, observed trajectory, velocity $v_i^0$, movement propagation.]
Re-identification
• How are people re-identified in the second field of view?
– match between predicted positions and current detections
– temporal cue: +/- delay between the time step of predicted reappearance and detections (time window)
– spatial cue: distance between predicted reappearance position and detected position
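The matching between predicted reappearances and detections can be sketched as a gated nearest-neighbour association. The greedy assignment, the gate values `max_delay` and `max_dist`, and the data layout are assumptions for illustration; the slides do not specify how the two cues are combined.

```python
import math

def reidentify(predictions, detections, max_delay=5.0, max_dist=2.0):
    """Greedily match predicted reappearances to detections.

    predictions: {person_id: (t_pred, x, y)} predicted reappearance
    detections:  {det_id: (t_det, x, y)} detections in the second view
    Returns {person_id: det_id} for the accepted matches.
    """
    candidates = []
    for pid, (tp, xp, yp) in predictions.items():
        for did, (td, xd, yd) in detections.items():
            delay = abs(td - tp)                 # temporal cue (time window)
            dist = math.hypot(xd - xp, yd - yp)  # spatial cue
            if delay <= max_delay and dist <= max_dist:
                candidates.append((dist, delay, pid, did))
    matches, used = {}, set()
    # Accept the closest pairs first; each person and detection is used once.
    for dist, delay, pid, did in sorted(candidates):
        if pid not in matches and did not in used:
            matches[pid] = did
            used.add(did)
    return matches

predictions = {"A": (10.0, 0.0, 0.0), "B": (12.0, 5.0, 5.0)}
detections = {"d1": (11.0, 0.5, 0.0), "d2": (13.0, 5.0, 5.5),
              "d3": (40.0, 0.0, 0.0)}
print(reidentify(predictions, detections))
```

Detection `d3` is rejected by the time window even though it lies exactly at the predicted position of person A.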
How about appearance?
• Appearance description [Gray2008, Prosser2010, Zheng2011]
– colour features: R, G, B, H, S, Y, Cb, Cr channels of the object patch
– texture features: Schmid filters and Gabor filters applied to the Y channel
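As a small illustration of the colour channels R, G, B, H, S, Y, Cb, Cr, the following sketch converts one RGB pixel into the eight values. The HSV conversion from Python's `colorsys` module and the full-range BT.601 (JPEG) YCbCr coefficients are assumptions; the slides do not state which conversions were used.

```python
import colorsys

def colour_channels(r, g, b):
    """Return (R, G, B, H, S, Y, Cb, Cr) for one 8-bit RGB pixel.

    H and S are in [0, 1]; the other channels are in [0, 255].
    """
    # Hue and saturation from the standard-library HSV conversion.
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Luma and chroma with full-range BT.601 coefficients (assumed).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (r, g, b, h, s, y, cb, cr)

print(colour_channels(255, 0, 0))
```

A per-channel histogram over the object patch would then give one possible colour descriptor.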
Appearance support in crowd
[Figure: candidate support regions for the object patch, with dimensions labelled in terms of width w and height h (w, w/2, w/4; h, 2h, h/4).]
Comparison (London Gatwick dataset)
[Figure: curves for motion prediction + appearance and for appearance only; [Gray2008], [Prosser2010], [Zheng2011].]
Influence of motion prediction
[Figure legend: Multi-Goal Social Force Model; Landmark-Based Method; Expected traveling time model (regions); Expected traveling time model.]
Influence of appearance
[Figure legend: Spatial + Temporal + Appearance; Spatial + Temporal.]
Tracking in wireless camera networks
Distributed target tracking under realistic network conditions. C. Nastasi, A. Cavallaro. Proc. of Sensor Signal Processing for Defence (SSPD), London, 28-29 September, 2011
WiSE-MNet: an experimental environment for Wireless Multimedia Sensor Networks. C. Nastasi, A. Cavallaro. Proc. of Sensor Signal Processing for Defence (SSPD), London, 28-29 September, 2011
What is the problem?
• Objective: continuous estimation of the target state given a set of measurements (observations) obtained from spatially distributed sensing nodes
Measurements: $Z_k = (z_k^1, z_k^2, \dots, z_k^N)$
State estimation: $x_k = f(Z_k, x_{0:k-1}, Z_{1:k-1})$
[Figure: four sensing nodes with measurements $z_k^1$, $z_k^2$, $z_k^3$, $z_k^4$.]
Distributed tracking: strategies
[Figure: two strategies, consensus and aggregation; the aggregation chain runs from a start node to the node that produces the estimate.]
• Distributed target tracking
– need a collaborative information exchange mechanism
– consensus-based algorithms
  • parallel (e.g. Kalman Consensus Filter [Olfati-Saber2005], Distributed Particle Filters [Gu2007])
– data aggregation algorithms
  • sequential (e.g. Distributed Particle Filters [Hlinka2009])
Distributed Particle Filters (DPFs)
• Basic ideas:
– each node executes a local Particle Filter (PF)
– measurements are synchronized, calibration is known
– some information is exchanged
• Likelihood sharing [Coates2004]
– exchange information to have a common model of the likelihood
– random number generators are synchronized
• Posterior sharing
– the network has a common knowledge of the posterior pdf
– consensus-based approach [Sheng2005, Gu2007]
– aggregation-based approach [Sheng2005, Hlinka2009]
  • spatial sequence of aggregation steps
  • Partial Posterior (PP) is exchanged among the nodes
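Aggregation-based DPF
[Figure: aggregation chain over four nodes with measurements $z_k^1$, $z_k^2$, $z_k^3$, $z_k^4$, from start to estimate. The partial posterior is updated at each hop:]
$f(x_k \mid Z_{1:k-1}, z_k^{1}) \rightarrow f(x_k \mid Z_{1:k-1}, z_k^{1:2}) \rightarrow f(x_k \mid Z_{1:k-1}, z_k^{1:3}) \rightarrow f(x_k \mid Z_{1:k-1}, z_k^{1:4}) = f(x_k \mid Z_k)$
• Problem: particle dissemination is not feasible!
• Solution: Gaussian Mixture Model of the Partial Posterior (GMM-PP)
– independence from the # of particles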
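The motivation for the GMM-PP can be made concrete by counting the values that must be transmitted per hop. The sketch below assumes a state of dimension `dim` (4 in the example, e.g. position and velocity), covariance matrices stored as their upper triangle, and 4-byte values; none of these figures are given in the slides.

```python
def particle_message_bytes(num_particles, dim, bytes_per_value=4):
    """Size of sending every particle (state vector plus weight)."""
    return num_particles * (dim + 1) * bytes_per_value

def gmm_message_bytes(num_components, dim, bytes_per_value=4):
    """Size of sending a GMM: per component a weight, a mean and the
    upper triangle of a symmetric covariance matrix."""
    per_component = 1 + dim + dim * (dim + 1) // 2
    return num_components * per_component * bytes_per_value

# The particle message grows with P, the GMM message does not.
for p in (100, 300, 500):
    print(p, particle_message_bytes(p, dim=4), gmm_message_bytes(5, dim=4))
```

Under these assumptions the mixture message has a fixed size regardless of how many particles each node uses locally.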
Proposed approach
• Goal
– distributed tracking under realistic conditions in wireless camera networks
• Problems
– existing approaches are theoretical and designed for WSNs
– need adaptation for limited Field-Of-View sensors (cameras)
  • detection miss
  • target hand-over
  • target loss
– need mechanisms for the definition of the aggregation chain
  • first node (starts iteration)
  • intermediate nodes (aggregate local measurement to the PP)
  • last node (performs estimation)
– a network-simulator environment is required
First node
Local measurement: $z_k^1$
1. Knows previous posterior and local measurement: $\{x_{k-1}^{(i)}, w_{k-1}^{(i)}\}_{i=1}^{P}$
2. Prediction and update:
• re-sampling
• draw from state-transition: $x_k^{(i)} \sim f(x_k \mid x_{k-1}^{(i)}), \quad i = 1, \dots, P$
• weight update from likelihood: $w_k^{(i)} = \frac{f(z_k^1 \mid x_k^{(i)})}{\sum_{j=1}^{P} f(z_k^1 \mid x_k^{(j)})}, \quad i = 1, \dots, P$
3. GMM-PP creation: $\{x_k^{(i)}, w_k^{(i)}\}_{i=1}^{P} \rightarrow f_{GMM-PP}^{1}$
4. Next-hop selection
5. Sends GMM-PP to the next hop
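The prediction and update at the first node amount to one iteration of a bootstrap particle filter. A minimal one-dimensional sketch follows, assuming a random-walk state transition and a Gaussian likelihood; both models and the noise parameters `q` and `r` are illustrative assumptions, not the models used in the slides.

```python
import math
import random

def first_node_step(particles, weights, z, q=0.5, r=1.0, rng=random):
    """Resample, predict and reweight with the local measurement z."""
    p = len(particles)
    # Re-sampling: draw P particles with probability proportional to weight.
    resampled = rng.choices(particles, weights=weights, k=p)
    # Prediction: draw from the (assumed) random-walk state transition.
    predicted = [x + rng.gauss(0.0, q) for x in resampled]
    # Update: weights proportional to the (assumed) Gaussian likelihood.
    lik = [math.exp(-0.5 * ((z - x) / r) ** 2) for x in predicted]
    total = sum(lik)
    if total == 0.0:
        # Degenerate case: no particle explains z; fall back to uniform.
        return predicted, [1.0 / p] * p
    return predicted, [l / total for l in lik]

rng = random.Random(0)
particles = [float(i) for i in range(10)]
particles, weights = first_node_step(particles, [0.1] * 10, z=5.0, rng=rng)
```

The resulting weighted particle set is what a GMM would then be fitted to before it is sent to the next hop.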
Intermediate node h
Local measurement: $z_k^h$
1. Receives PP from node h-1
2. Importance sampling:
• use the incoming PP as importance function: $g(x_k) = f_{GMM-PP}^{1:h-1}(x_k \mid z_k^{1:h-1})$
• draw from importance function: $x_k^{(i)} \sim f_{GMM-PP}^{1:h-1}(x_k \mid z_k^{1:h-1}), \quad i = 1, \dots, P$
• weight update (CONDENSATION): $w_k^{(i)} = \frac{f(z_k^h \mid x_k^{(i)})}{\sum_{j=1}^{P} f(z_k^h \mid x_k^{(j)})}, \quad i = 1, \dots, P$
3. GMM-PP creation: $\{x_k^{(i)}, w_k^{(i)}\}_{i=1}^{P} \rightarrow f_{GMM-PP}^{1:h}$
4. Next-hop selection
5. Sends GMM-PP to the next hop
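At an intermediate node the particles are drawn from the received mixture instead of from the state transition. The following one-dimensional sketch assumes the GMM-PP arrives as a list of `(weight, mean, std)` components and uses a Gaussian likelihood; the message representation, the likelihood, and the one-component moment-matching fit are assumptions.

```python
import math
import random

def sample_gmm(components, n, rng=random):
    """Draw n samples from a 1-D mixture given as (weight, mean, std)."""
    mix_weights = [w for w, _, _ in components]
    chosen = rng.choices(components, weights=mix_weights, k=n)
    return [rng.gauss(mean, std) for _, mean, std in chosen]

def intermediate_node_step(components, z, n, r=1.0, rng=random):
    """Importance sampling with the incoming PP as importance function."""
    particles = sample_gmm(components, n, rng)
    # Weights proportional to the likelihood of the local measurement z.
    lik = [math.exp(-0.5 * ((z - x) / r) ** 2) for x in particles]
    total = sum(lik)
    weights = [l / total for l in lik] if total > 0.0 else [1.0 / n] * n
    return particles, weights

def fit_single_gaussian(particles, weights):
    """Moment-match the weighted particles with a one-component mixture."""
    mean = sum(w * x for x, w in zip(particles, weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(particles, weights))
    return [(1.0, mean, math.sqrt(var))]

rng = random.Random(1)
incoming = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
particles, weights = intermediate_node_step(incoming, z=10.0, n=200, rng=rng)
outgoing = fit_single_gaussian(particles, weights)
```

A mixture with more components would need an iterative fit such as expectation-maximisation, which is omitted here.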
Last node
Local measurement: $z_k^N$
1. Receives PP from node N-1
2. Importance sampling as for intermediate nodes
3. Last PP is also the global PP: after importance sampling, $f_{PP}^{1:N} = f(x_k \mid Z_k)$
4. Target state estimation: $\hat{x}_k = \sum_{i=1}^{P} w_k^{(i)} x_k^{(i)}$
5. Next tracking step starts here: this node is the first node at k+1
Experimental setups
• Simulations
– number of nodes: N = 10, 50, 100, 300, 500, 700, 1000
– number of particles: P = 100, 300, 500
• DPF with different GMM configurations
– no GMM approximation: DPF-0
– variable number of GMM components: DPF-1, DPF-5
• Realistic network conditions
Simulator: WiSE-MNet www.eecs.qmul.ac.uk/~andrea/wise-mnet.html
Simulation setup
• Network
– T-MAC protocol, BW = 250 kbps
– request-to-send/clear-to-send mechanism
– acknowledged-transmission mechanism
– number of retransmissions: 10
• Cameras
– covering 6000 sqm (random uniform distribution)
– top-down facing cameras: 6 m from the ground plane (FOV is 10 m × 6 m)
– frame rate = 1 fps
• 100 simulation runs, each of 10 minutes
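The camera figures above allow a rough estimate of how much of the area is observed for a given number of nodes. The sketch below assumes that camera footprints are placed independently and uniformly and ignores boundary effects, so it is only an approximation and not a result from the slides.

```python
def expected_coverage(num_cameras, area=6000.0, fov=(10.0, 6.0)):
    """Expected fraction of the area seen by at least one camera."""
    # Chance that a single camera footprint contains a given point.
    p_single = (fov[0] * fov[1]) / area
    # A point is uncovered only if every camera misses it.
    return 1.0 - (1.0 - p_single) ** num_cameras

for n in (10, 50, 100, 300, 500, 700, 1000):
    print(n, round(expected_coverage(n), 3))
```

Under this approximation small networks leave most of the area unobserved, which is where detection misses and target loss become relevant.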
What do we measure?
• Estimation efficiency: $E = \frac{K_{tr}}{K}$
– $K_{tr}$: # of estimations (detected events)
– $K$: # of observations (all the events)
• Average estimation delay: $D = \frac{1}{K_{tr}} \sum_{k=1}^{K_{tr}} d(k)$
– $d(k)$: estimation delay for the k-th tracking step
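Both metrics can be computed directly from a log of tracking steps. In this sketch the log is a list with one entry per observation: the estimation delay in seconds when an estimate was produced, or `None` when it was not. This log format is an assumption.

```python
def estimation_metrics(delays):
    """Return (E, D): estimation efficiency and average estimation delay.

    delays: one entry per observation (K entries in total); a number is
    the delay d(k) of a produced estimate, None marks a missing estimate.
    """
    k = len(delays)
    produced = [d for d in delays if d is not None]
    k_tr = len(produced)
    efficiency = k_tr / k if k else 0.0
    avg_delay = sum(produced) / k_tr if k_tr else float("nan")
    return efficiency, avg_delay

efficiency, avg_delay = estimation_metrics([0.2, None, 0.4, 0.6])
```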
[Figures: estimation efficiency; network delay.]
Summary
• Distributed target tracking (DPF) for wireless cameras
– dealing with limited-FOV sensors
– independence from the number of particles
– importance of co-design between tracking algorithms and communication protocols
• Human motion prediction across disjoint camera views
– dealing with absence of observations
– Multi-Goal Social Force Model
– Landmark-Based Method
– importance of integrating spatial, temporal and appearance cues
Simulator available as open source at www.eecs.qmul.ac.uk/~andrea/wise-mnet.html