A motion-based event detection method for video sequences
Jian-feng YAO∗
Institut de Recherche MAthematique de Rennes, France
(joint work with Gwenaelle PIRIOU, Patrick BOUTHEMY)
∗ http ://name.math.univ-rennes1.fr/jian-feng.yao
Plan
I Motion and its measures in an image sequence
II Basic models for motion measures
III Application to event detection in a video
I. Motion and its measures in an image sequence
Reminder: digital picture
[Figure: a 132×128 grey-level picture]
Its digital version, upper-left corner (30 × 15 pixels):
217 217 217 214 217 217 217 217 217 217 217 217 217 217 217
217 217 217 217 217 217 217 217 217 217 229 229 229 229 217
217 217 217 217 217 217 217 217 229 217 217 229 217 229 229
217 229 229 217 217 229 217 229 217 229 217 229 229 229 229
229 217 229 217 229 229 217 217 229 229 229 217 217 229 229
229 229 229 229 229 217 229 229 229 217 217 229 229 229 229
229 217 229 217 217 229 229 217 229 217 229 217 229 229 229
229 229 234 234 234 234 234 229 217 234 229 229 229 229 229
234 234 229 234 234 234 234 234 234 234 250 234 217 229 250
229 229 229 229 229 234 229 234 229 234 250 217 195 214 229
229 229 234 234 234 234 229 234 234 229 234 229 189 214 234
229 234 229 234 229 229 229 234 234 234 250 250 150 153 173
229 229 229 229 229 234 250 229 234 234 250 250 173 164 164
234 234 229 217 229 229 234 217 229 179 179 189 173 153 117
229 229 229 250 229 229 234 229 250 150 127 102 102 102 102
217 229 234 234 250 250 234 250 250 189 117  51  77 102 102
229 250 250 195 214 250 234 234 250 250 137  77  77 102  89
229 234 250 173 166 250 250 250 250 250 127  89  89  77  77
229 250 250 217 164 217 250 250 250 189  77 102 102  77  77
229 234 234 250 195 150 164 189 189 127  77  77  77  77  77
217 214 217 214 166 137 137 204 117  77  77  77  51  77  51
166 150 150 166 117 102 150 217  89  51  51  51  51  51  51
217 102 178 204  89  77 117 175  77  51  51  51  51  51  51
234  77 102  89 153  77  51   0   0  51  51  51  51  51  51
 77 117 117 150 166   0   0   0  51  51  51  51   0  51   0
102 117 117 102  51  51  51  77  51  77  51  51  51  51   0
102 117 153 173   0  51  51  51  51  51  51  51   0  51  51
 77  89  89  77  51  51  77  77  51  51   0  51  51  51   0
 89  77  89  51  51  77  77  51  51  51  51   0  51  51  51
117 175 195  51  77  77  51  51   0  51  51  51  51  51  51
An image = an array of numbers in [0, 255] (grey levels).
[Figure: display of such an array as a grey-level picture]
Motion (optical flow) in a sequence
- an image: $I = [I(p)]$, grey level $I(p)$ at pixel $p = (x, y)$
- an image sequence: $(I_t)$, $I_t = [I_t(p)]$
- assume a moving object (or person) with trajectory $t \mapsto p_t = (x_t, y_t)$; the motion field is
$$v_t(p_t) = \frac{\partial p_t}{\partial t}.$$
- Problem: given $(I_t)$, estimate $(v_t)$.
- Ill-posed problem: the moving objects, and hence the trajectories $t \mapsto p_t$, are unknown!
- Basic assumption (brightness constancy): $t \mapsto I_t(p_t)$ is constant.
- Well-known restoration techniques then give the estimate $(v_t)_t$.
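Differentiating the constant map $t \mapsto I_t(p_t)$ gives the optical-flow constraint $\nabla I_t(p) \cdot v_t(p) + \partial I_t(p)/\partial t = 0$: one equation for two unknowns at each pixel, which is one way to see why the problem is ill-posed. A minimal sketch of one classical remedy, Lucas–Kanade (assume $v$ constant over a small window and solve in least squares), follows; it is not necessarily the technique used by the authors, and the function name and window bounds are hypothetical.

```python
import numpy as np

def lucas_kanade_window(I0, I1, rows, cols):
    """Estimate one flow vector (vx, vy) shared by all pixels of a window.

    Solves Ix*vx + Iy*vy + It = 0 in the least-squares sense over the
    window given by the slice objects `rows` and `cols`.
    """
    # Spatial gradients of the mean frame (axis 0 = y, axis 1 = x)
    Iy, Ix = np.gradient((I0 + I1) / 2.0)
    It = I1 - I0  # temporal derivative for a unit time step
    A = np.stack([Ix[rows, cols].ravel(), Iy[rows, cols].ravel()], axis=1)
    b = -It[rows, cols].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # array([vx, vy])

# Synthetic check: a smooth pattern translated by d = (0.3, -0.2) pixels
H, W = 64, 64
y, x = np.mgrid[0:H, 0:W].astype(float)
pattern = lambda x, y: np.sin(0.2 * x) + np.cos(0.15 * y) + 0.5 * np.sin(0.1 * (x + y))
I0 = pattern(x, y)
I1 = pattern(x - 0.3, y + 0.2)  # I1(p) = I0(p - d): content moves by d
v = lucas_kanade_window(I0, I1, slice(16, 48), slice(16, 48))
print(v)  # close to [0.3, -0.2]
```

A single window yields a single vector; a dense field is obtained by sliding the window over the image, which is where the aperture problem reappears in textureless regions.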
An application: segmentation of “moving regions”
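More details on motion measures:
$$\underbrace{v_t}_{\text{image motion}} = \underbrace{w_t}_{\text{dominant motion}} + \underbrace{u_t}_{\text{residual motion}}$$
- dominant motion $w_t$: large scale, camera motion
- residual motion $u_t$: lower scale, scene motion of interest
- Simple affine model for the camera motion $w_t$:
$$w_t(p) = w_t(x, y) = \begin{pmatrix} a_t + b_t x + c_t y \\ a'_t + b'_t x + c'_t y \end{pmatrix}, \qquad \theta_t = (a_t, b_t, c_t, a'_t, b'_t, c'_t),$$
$$\hat\theta_t = \arg\min_{\theta} \sum_p \left| I_{t+1}\big(p + w_\theta(p)\big) - I_t(p) \right|^2$$
(the sum $\sum_p$ may be restricted to local windows).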
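The criterion is nonlinear in $\theta$ because $\theta$ enters through $I_{t+1}(p + w_\theta(p))$. For small motions, the first-order expansion $I_{t+1}(p + w) \approx I_{t+1}(p) + \nabla I_{t+1}(p) \cdot w$ makes it quadratic in $\theta$, so one Gauss–Newton step is an ordinary least-squares problem. The sketch below shows that single step under this small-motion assumption; practical estimators typically iterate it (often coarse-to-fine, with a robust loss), and the function names are hypothetical.

```python
import numpy as np

def affine_field(theta, shape):
    """Affine camera-motion field w(x, y) = (a + b x + c y, a' + b' x + c' y)."""
    a, b, c, a2, b2, c2 = theta
    y, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    return a + b * x + c * y, a2 + b2 * x + c2 * y

def estimate_affine_step(I_t, I_next):
    """One linearized least-squares step for theta = (a, b, c, a', b', c').

    Uses I_{t+1}(p + w(p)) ~ I_{t+1}(p) + grad I_{t+1}(p) . w(p), which makes
    sum_p |I_{t+1}(p + w_theta(p)) - I_t(p)|^2 quadratic in theta.
    """
    gy, gx = np.gradient(I_next)  # axis 0 = y (rows), axis 1 = x (columns)
    y, x = np.mgrid[0:I_t.shape[0], 0:I_t.shape[1]].astype(float)
    # Column k of A multiplies theta[k] in grad I_{t+1}(p) . w_theta(p)
    A = np.stack([gx, gx * x, gx * y, gy, gy * x, gy * y], axis=-1).reshape(-1, 6)
    rhs = (I_t - I_next).ravel()
    theta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return theta

# Synthetic check: the whole frame translates by (0.3, -0.2) pixels
H, W = 64, 64
y, x = np.mgrid[0:H, 0:W].astype(float)
pattern = lambda x, y: np.sin(0.2 * x) + np.cos(0.15 * y) + 0.5 * np.sin(0.1 * (x + y))
I_t = pattern(x, y)
I_next = pattern(x - 0.3, y + 0.2)
theta = estimate_affine_step(I_t, I_next)
wx, wy = affine_field(theta, I_t.shape)
print(wx[32, 32], wy[32, 32])  # approximately 0.3 and -0.2
```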
- Estimation of the scene motion $u_t$:
$$u_t(p) = \text{local average of the residuals } \left| I_{t+1}\big(p + w_t(p)\big) - I_t(p) \right|.$$
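Once $w_t$ is known, the residual compares each pixel of $I_t$ with the position where the camera motion alone would have carried it in $I_{t+1}$; averaging over a neighbourhood gives $u_t(p)$. A minimal sketch, assuming bilinear interpolation at the non-integer positions $p + w_t(p)$ and a 3×3 box window for the local average (the slide specifies neither); all function names are hypothetical.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample img at real-valued positions (xs, ys), clamping them to the image."""
    H, W = img.shape
    xs = np.clip(xs, 0, W - 1)
    ys = np.clip(ys, 0, H - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    fx, fy = xs - x0, ys - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

def box_mean(a, radius=1):
    """Mean over a (2r+1) x (2r+1) window, replicating edge values at the borders."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / k**2

def scene_motion(I_t, I_next, wx, wy, radius=1):
    """u_t(p): local average of |I_{t+1}(p + w_t(p)) - I_t(p)|."""
    H, W = I_t.shape
    y, x = np.mgrid[0:H, 0:W].astype(float)
    residual = np.abs(bilinear_sample(I_next, x + wx, y + wy) - I_t)
    return box_mean(residual, radius)

# Usage: the frame content moves one pixel to the right and w_t = (1, 0) compensates it
rng = np.random.default_rng(0)
I_t = rng.uniform(0, 255, size=(20, 30))
I_next = np.roll(I_t, 1, axis=1)  # I_next[:, x + 1] = I_t[:, x]
u = scene_motion(I_t, I_next, np.ones_like(I_t), np.zeros_like(I_t))
print(u[:, :-2].max())  # 0.0 away from the right border, where the wrap-around breaks the match
```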
A benchmark video sequence: Athletics video
- a French TV video reporting an athletics meeting
- Source: Institut National d’Audiovisuel
- Project conducted at: INRIA/IRISA, Rennes (ref. Patrick BOUTHEMY)
- 17 mins (∼25000 images)
- typical scenes: [Figure: sample frames]
Motion measures of the Athletics video:
[Figure: for each of pole vault, track race and interview, a sample image $I_t$, its dominant motion $w_t$ and its residual motion $u_t$]
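II. Models for motion measures
a) Basic model for the scene motion $u = (u_t)$
- histograms of the scene motion $u = (u_t)$, $[u_t(p)]$, $1 \le t \le T$, $p \in S$ ($S$: the set of pixels)
- also used: its temporal gradient $\nabla u_t(p) = u_{t+1}(p) - u_t(p)$. Note: $\nabla u_t(p) \in \mathbb{R}$.
- Family of (marginal) mixture distributions:
$$f_1(x) = \beta_1 \mathbf{1}_{x=0} + (1-\beta_1)\, 2 g_{\sigma_1}(x)\, \mathbf{1}_{x>0}, \quad \text{for } u = [u_t(p)],$$
$$f_2(x) = \beta_2 \mathbf{1}_{x=0} + (1-\beta_2)\, g_{\sigma_2}(x)\, \mathbf{1}_{x\neq 0}, \quad \text{for } \nabla u,$$
with the Gaussian density $g_\sigma(x) = (2\pi\sigma^2)^{-1/2} \exp\!\left[-\frac{x^2}{2\sigma^2}\right]$.
- Basic model for $(u, \nabla u)$:
$$\xi(u, \nabla u) = \prod_{t=1}^{T} \prod_{p \in S} f_1\big(u_t(p)\big)\, f_2\big(\nabla u_t(p)\big).$$
Note: neither spatial correlation nor time dependency!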
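Because $f_1$ and $f_2$ put a point mass at 0 and a density elsewhere, $\log \xi$ is a sum of two kinds of terms: $\log \beta$ for zero values and the log of the continuous part otherwise. A minimal sketch of $\log \xi$, assuming $u$ is stored as a non-negative array of shape (T, H, W) and using the T − 1 available temporal differences for $\nabla u$ (the product in $\xi$ formally runs to $T$, which would need $u_{T+1}$); the function names are hypothetical.

```python
import numpy as np

def log_gauss(x, sigma):
    """log g_sigma(x) for the centred Gaussian density."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)

def log_f1(x, beta, sigma):
    """log f1: mass beta at 0, (1 - beta) * 2 * g_sigma(x) for x > 0 (half-normal)."""
    x = np.asarray(x, dtype=float)  # u is assumed non-negative
    return np.where(x == 0, np.log(beta),
                    np.log1p(-beta) + np.log(2.0) + log_gauss(x, sigma))

def log_f2(x, beta, sigma):
    """log f2: mass beta at 0, (1 - beta) * g_sigma(x) for x != 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x == 0, np.log(beta), np.log1p(-beta) + log_gauss(x, sigma))

def log_xi(u, beta1, sigma1, beta2, sigma2):
    """log xi(u, grad u) for u of shape (T, H, W): independent pixels and frames."""
    grad_u = np.diff(u, axis=0)  # u_{t+1}(p) - u_t(p), shape (T-1, H, W)
    return log_f1(u, beta1, sigma1).sum() + log_f2(grad_u, beta2, sigma2).sum()

# Usage on a tiny segment with one moving pixel
u = np.zeros((3, 2, 2))
u[2, 0, 0] = 1.5
print(log_xi(u, 0.9, 1.0, 0.8, 1.0))
```

Working on the log scale matters here: $\xi$ multiplies one factor per pixel and frame, so for a real segment the product underflows in floating point long before the log-sum does.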
b) Basic model for the camera motion $w = (w_t)$
- for each $t$ and each pixel $p$, the camera motion vector $w_t(p) \in \mathbb{R}^2$ has been computed;
- example histograms: [Figure: histograms of $w_t(p)$ for pole vault and for a wide shot of a track race]
- use of a mixture of bivariate Gaussian distributions:
$$f_3(x; \alpha) = \sum_{j=1}^{K} \alpha_j\, G_j(x; \mu_j, \Sigma_j)$$
- Basic model for $w = (w_t(p))$:
$$\zeta(w) = \prod_{t=1}^{T} \prod_{p \in S} f_3\big(w_t(p); \alpha\big)$$
Note: again, neither spatial correlation nor time dependency!
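Numerically, $\zeta(w)$ is best handled through $\log \zeta(w) = \sum_t \sum_p \log f_3(w_t(p); \alpha)$, with a log-sum-exp over the $K$ components to avoid underflow. A minimal sketch, assuming the camera-motion vectors are stacked in an array of shape (T, H, W, 2) and that the mixture parameters $(\alpha_j, \mu_j, \Sigma_j)$ are already known (how they are fitted, e.g. by EM, is not stated on the slide); the function names are hypothetical.

```python
import numpy as np

def log_gauss2(x, mu, cov):
    """log of the bivariate Gaussian density G(x; mu, cov) for the rows of x (N, 2)."""
    d = x - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", d, inv, d)  # squared Mahalanobis distances
    return -0.5 * maha - 0.5 * logdet - np.log(2 * np.pi)

def log_f3(x, alphas, mus, covs):
    """log f3(x; alpha) = log sum_j alpha_j G_j(x; mu_j, Sigma_j), via log-sum-exp."""
    comps = np.stack([np.log(a) + log_gauss2(x, np.asarray(m, dtype=float), np.asarray(c))
                      for a, m, c in zip(alphas, mus, covs)])  # shape (K, N)
    top = comps.max(axis=0)
    return top + np.log(np.exp(comps - top).sum(axis=0))

def log_zeta(w, alphas, mus, covs):
    """log zeta(w) for camera-motion vectors w of shape (T, H, W, 2)."""
    return log_f3(w.reshape(-1, 2), alphas, mus, covs).sum()

# Usage: a static camera (w = 0 everywhere) under a single standard Gaussian
w = np.zeros((2, 3, 3, 2))
print(log_zeta(w, [1.0], [[0.0, 0.0]], [np.eye(2)]))  # -18 * log(2*pi)
```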
III. Application to event detection in a video
Why event detection?
- verification of a company's paid advertising spots in TV broadcasts
- locating only the exciting moments of goals in a soccer match
- ...
- Generally: find all the places in a video where a short sequence similar to a given example occurs
Back to the benchmark Athletics video
- a French TV video reporting an athletics meeting
- 17 mins (∼25000 images)
- typical scenes: [Figure: sample frames]
- Proposal: detect short sequences showing the following events:
  - pole vault;
  - replay of pole vault;
  - wide shot of track race;
  - close-up of track race.
- Standard statistical scheme for event detection:
  Model + Learning + Detection algorithm
Preparation: all videos are pre-segmented (an independent problem)
- typically, a homogeneous video segment lasts from several seconds to several minutes
Two-stage strategy for event detection
Stage 1: rough classification
- Aim: to cope with a large variety of videos
- uses the previous model of the scene (residual) motion
Learning algorithm:
- given a learning database of video segments $\{s_i\}$ for which we know whether $s_i$ is an interesting segment or not, we get two sets:
  $E = \{\text{interesting segments}\}$, $\bar E = \{\text{uninteresting segments}\}$
- for each $s \in E$, fit a scene motion model $M_s = (\beta_j(s), \sigma_j(s))$, $j = 1, 2$
- use an ascending hierarchical classification tree to get $m$ clusters of interesting segments $C_1, \dots, C_m$;
  - fit a scene motion model again to each cluster $C_i$;
- do the same for the set $\bar E$ to get $m'$ clusters of uninteresting segments $C'_1, \dots, C'_{m'}$
- finally we get:
  - $m$ clusters of interesting segments with $m$ motion models $M_i$;
  - $m'$ clusters of uninteresting segments with $m'$ motion models $M'_j$.
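The slides do not spell out how $M_s$ is fitted or which distance drives the classification tree. A minimal sketch under two stated assumptions: (i) $\beta_j$ and $\sigma_j$ are fitted by maximum likelihood, which for $f_1$ and $f_2$ has a closed form ($\hat\beta$ = fraction of zero values, $\hat\sigma^2$ = mean of $x^2$ over the non-zero values); (ii) segments are grouped by average-linkage agglomerative clustering of the vectors $(\beta_1, \sigma_1, \beta_2, \sigma_2)$ under the Euclidean distance. The distance, the linkage and all function names are hypothetical.

```python
import numpy as np

def fit_zero_inflated(x):
    """ML fit of (beta, sigma) for f1 or f2: beta = P(x = 0), sigma^2 = mean x^2 on x != 0."""
    x = np.asarray(x, dtype=float).ravel()
    nonzero = x[x != 0]
    beta = 1.0 - nonzero.size / x.size
    sigma = np.sqrt(np.mean(nonzero**2)) if nonzero.size else np.nan
    return beta, sigma

def fit_scene_model(u):
    """M = (beta1, sigma1, beta2, sigma2) from a segment u of shape (T, H, W)."""
    beta1, sigma1 = fit_zero_inflated(u)
    beta2, sigma2 = fit_zero_inflated(np.diff(u, axis=0))
    return np.array([beta1, sigma1, beta2, sigma2])

def average_linkage(points, m):
    """Agglomerative clustering of the rows of `points` into m clusters (average linkage)."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > m:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = D[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i stays valid
    return clusters

def fit_cluster_models(segments, clusters):
    """Refit one model per cluster by pooling values (not frames) of its segments."""
    models = []
    for c in clusters:
        u_all = np.concatenate([segments[k].ravel() for k in c])
        du_all = np.concatenate([np.diff(segments[k], axis=0).ravel() for k in c])
        models.append(np.array([*fit_zero_inflated(u_all), *fit_zero_inflated(du_all)]))
    return models

# Usage on toy parameter vectors: two well-separated groups
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(average_linkage(pts, 2))
```

Pooling values rather than concatenating segments in time avoids creating spurious temporal differences at segment boundaries. Since $\beta$ lies in [0, 1] while $\sigma$ is in grey-level units, standardising the parameter vectors before clustering may matter; that choice, like the linkage, is a design assumption.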
Classification/recognition algorithm:
- given a test segment $s$ with scene motion $(u(s), \nabla u(s))$, compute the likelihoods
  $\xi_{M_i}(u(s), \nabla u(s))$, $i = 1, \dots, m$ (models learned from $E$),
  $\xi_{M'_j}(u(s), \nabla u(s))$, $j = 1, \dots, m'$ (models learned from $\bar E$);
- assign the label interesting or uninteresting according to the maximum likelihood.
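In practice the likelihoods are compared on the log scale, since $\xi$ is a product over all pixels and frames. A minimal sketch of the decision rule, assuming each learned model is given as a tuple $M = (\beta_1, \sigma_1, \beta_2, \sigma_2)$ and $u(s)$ as an array of shape (T, H, W); the log-likelihood helper restates the densities $f_1$, $f_2$ of Part II so that the block runs on its own, and the toy model values are hypothetical.

```python
import numpy as np

def log_xi(u, model):
    """log xi_M(u, grad u) with M = (beta1, sigma1, beta2, sigma2)."""
    beta1, sigma1, beta2, sigma2 = model
    du = np.diff(u, axis=0)

    def log_g(x, s):
        return -0.5 * np.log(2 * np.pi * s**2) - x**2 / (2 * s**2)

    l1 = np.where(u == 0, np.log(beta1), np.log1p(-beta1) + np.log(2.0) + log_g(u, sigma1))
    l2 = np.where(du == 0, np.log(beta2), np.log1p(-beta2) + log_g(du, sigma2))
    return l1.sum() + l2.sum()

def classify_stage1(u, models_E, models_Ebar):
    """Label a segment by the single best-fitting model among all m + m' models."""
    best_in = max(log_xi(u, M) for M in models_E)
    best_out = max(log_xi(u, M) for M in models_Ebar)
    return "interesting" if best_in > best_out else "uninteresting"

# Toy usage: "interesting" = active scene, "uninteresting" = nearly static scene
models_E = [(0.1, 5.0, 0.5, 1.0)]
models_Ebar = [(0.9, 1.0, 0.5, 1.0)]
print(classify_stage1(np.zeros((3, 2, 2)), models_E, models_Ebar))      # uninteresting
print(classify_stage1(np.full((3, 2, 2), 5.0), models_E, models_Ebar))  # interesting
```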
Experimental results of Stage 1
- we consider the TV Athletics video of 17 minutes (∼25000 images)
- typical scenes: [Figure: sample frames]
- learning basis = 10 minutes; test set = 7 minutes
- recognition rates on the test set, with 1 = interesting, 0 = uninteresting and $n_{ij} = \#\{\text{segments of class } i \text{ classified as } j\}$:
$$\text{Precision rate of the alarm} = \frac{n_{11}}{n_{11} + n_{01}} = 84\,\%$$
$$\text{Recall rate of the alarm} = \frac{n_{11}}{n_{11} + n_{10}} = 94\,\%$$
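With $n_{ij}$ as defined above, $n_{11}$ counts correct alarms, $n_{01}$ false alarms and $n_{10}$ missed interesting segments, so precision and recall are the usual TP/(TP + FP) and TP/(TP + FN). A small sketch with hypothetical counts (the slide reports only the resulting rates, not the $n_{ij}$):

```python
def precision_recall(n):
    """n[i][j] = number of segments of true class i classified as j (1 = interesting)."""
    n11, n10, n01 = n[1][1], n[1][0], n[0][1]
    precision = n11 / (n11 + n01)  # share of alarms that are truly interesting
    recall = n11 / (n11 + n10)     # share of interesting segments that raise an alarm
    return precision, recall

# Hypothetical counts, for illustration only
counts = [[50, 5],   # true 0: 50 classified 0, 5 classified 1
          [2, 18]]   # true 1: 2 classified 0, 18 classified 1
p, r = precision_recall(counts)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.78, recall = 0.90
```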
Stage 2: precise event detection
- aim: detect target events with more a priori knowledge in the models
- we need to fix a set of target events. Example of the Athletics video:
  - pole vault; replay of pole vault; wide shot of track race; close-up of track race
- use both the scene motion model $\xi$ and the camera motion model $\zeta$ to get the joint model:
$$h(s) = \xi(u_s, \nabla u_s)\, \zeta(w_s)$$
Learning algorithm:
- standard approach: for each target event $e$, learn a model $h_e$ from a learning sample of video segments containing the event
Final event detection algorithm:
- given a test segment $s$, use Stage 1 to decide whether it is interesting or not;
- stop if it is classified as 'uninteresting';
- otherwise, label it as the event $\hat e$ maximizing the likelihood of the joint model above:
$$\hat e = \arg\max_{e} h_e(s).$$
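The final algorithm chains the two stages: Stage 1 acts as a filter, and only the segments it keeps are scored by each event model $h_e$. Working with $\log h_e(s) = \log \xi_e(u_s, \nabla u_s) + \log \zeta_e(w_s)$ turns the product of the joint model into a sum. A minimal sketch, assuming the per-event log-likelihoods and the Stage 1 classifier are supplied as functions; all names and the toy scores are hypothetical.

```python
from typing import Callable, Dict, Optional

def joint_loglik(log_xi_e: Callable, log_zeta_e: Callable) -> Callable:
    """log h_e(s) = log xi_e(u_s, grad u_s) + log zeta_e(w_s)."""
    return lambda s: log_xi_e(s) + log_zeta_e(s)

def detect_event(s, is_interesting: Callable,
                 event_models: Dict[str, Callable]) -> Optional[str]:
    """Two-stage rule: None if Stage 1 rejects s, else argmax_e log h_e(s)."""
    if not is_interesting(s):
        return None  # classified 'uninteresting': stop here
    return max(event_models, key=lambda e: event_models[e](s))

# Toy usage with constant scores standing in for the learned models
models = {
    "pole vault": joint_loglik(lambda s: -10.0, lambda s: -5.0),
    "wide shot of track race": joint_loglik(lambda s: -8.0, lambda s: -4.0),
}
print(detect_event({}, lambda s: True, models))   # wide shot of track race
print(detect_event({}, lambda s: False, models))  # None
```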
Experimental results
- Athletics video: results on the 7-minute test sequence
  [Figure: top, ground truth; bottom, detection. Legend: pole vault, replay of pole vault, wide shot of track race, close-up of track race, “uninteresting”]
- Classification matrix:

                  Pole vault  Pv-Replay  Wide-Race  Close-Race
  Pole vault           2          0          0          0
  Pv-Replay            0          2          0          0
  Wide-Race            0          1          4          1
  Close-Race           0          0          0          6
  Uninteresting        0          1          0          0
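A quick tally of the matrix: its entries sum to 17 segments, 14 of which lie on the diagonal of the 4 × 4 event block; the "Uninteresting" row has no diagonal entry, since that label is not among the columns. The slide does not say whether rows or columns hold the ground truth, but this overall count does not depend on it. A small sketch of the arithmetic:

```python
import numpy as np

matrix = np.array([
    [2, 0, 0, 0],  # Pole vault
    [0, 2, 0, 0],  # Pv-Replay
    [0, 1, 4, 1],  # Wide-Race
    [0, 0, 0, 6],  # Close-Race
    [0, 1, 0, 0],  # Uninteresting
])
agree = int(np.trace(matrix[:4]))  # diagonal of the 4 x 4 event block
total = int(matrix.sum())
print(agree, total, round(agree / total, 3))  # 14 17 0.824
```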