A motion-based event detection method for video sequences

Jian-feng YAO∗

Institut de Recherche Mathématique de Rennes, France

(joint work with Gwenaelle PIRIOU, Patrick BOUTHEMY)

∗ http://name.math.univ-rennes1.fr/jian-feng.yao

Plan

I. Motion and its measures in an image sequence

II. Basic models for motion measures

III. Application to event detection in a video

I. Motion and its measures in an image sequence

Reminder: a digital picture (132×128)

Its digital version (upper left corner, 30 x 15 pixels):

217. 217. 217. 214. 217. 217. 217. 217. 217. 217. 217. 217. 217. 217. 217.
217. 217. 217. 217. 217. 217. 217. 217. 217. 217. 229. 229. 229. 229. 217.
217. 217. 217. 217. 217. 217. 217. 217. 229. 217. 217. 229. 217. 229. 229.
217. 229. 229. 217. 217. 229. 217. 229. 217. 229. 217. 229. 229. 229. 229.
229. 217. 229. 217. 229. 229. 217. 217. 229. 229. 229. 217. 217. 229. 229.
229. 229. 229. 229. 229. 217. 229. 229. 229. 217. 217. 229. 229. 229. 229.
229. 217. 229. 217. 217. 229. 229. 217. 229. 217. 229. 217. 229. 229. 229.
229. 229. 234. 234. 234. 234. 234. 229. 217. 234. 229. 229. 229. 229. 229.
234. 234. 229. 234. 234. 234. 234. 234. 234. 234. 250. 234. 217. 229. 250.
229. 229. 229. 229. 229. 234. 229. 234. 229. 234. 250. 217. 195. 214. 229.
229. 229. 234. 234. 234. 234. 229. 234. 234. 229. 234. 229. 189. 214. 234.
229. 234. 229. 234. 229. 229. 229. 234. 234. 234. 250. 250. 150. 153. 173.
229. 229. 229. 229. 229. 234. 250. 229. 234. 234. 250. 250. 173. 164. 164.
234. 234. 229. 217. 229. 229. 234. 217. 229. 179. 179. 189. 173. 153. 117.
229. 229. 229. 250. 229. 229. 234. 229. 250. 150. 127. 102. 102. 102. 102.
217. 229. 234. 234. 250. 250. 234. 250. 250. 189. 117. 51. 77. 102. 102.
229. 250. 250. 195. 214. 250. 234. 234. 250. 250. 137. 77. 77. 102. 89.
229. 234. 250. 173. 166. 250. 250. 250. 250. 250. 127. 89. 89. 77. 77.
229. 250. 250. 217. 164. 217. 250. 250. 250. 189. 77. 102. 102. 77. 77.
229. 234. 234. 250. 195. 150. 164. 189. 189. 127. 77. 77. 77. 77. 77.
217. 214. 217. 214. 166. 137. 137. 204. 117. 77. 77. 77. 51. 77. 51.
166. 150. 150. 166. 117. 102. 150. 217. 89. 51. 51. 51. 51. 51. 51.
217. 102. 178. 204. 89. 77. 117. 175. 77. 51. 51. 51. 51. 51. 51.
234. 77. 102. 89. 153. 77. 51. 0. 0. 51. 51. 51. 51. 51. 51.
77. 117. 117. 150. 166. 0. 0. 0. 51. 51. 51. 51. 0. 51. 0.
102. 117. 117. 102. 51. 51. 51. 77. 51. 77. 51. 51. 51. 51. 0.
102. 117. 153. 173. 0. 51. 51. 51. 51. 51. 51. 51. 0. 51. 51.
77. 89. 89. 77. 51. 51. 77. 77. 51. 51. 0. 51. 51. 51. 0.
89. 77. 89. 51. 51. 77. 77. 51. 51. 51. 51. 0. 51. 51. 51.
117. 175. 195. 51. 77. 77. 51. 51. 0. 51. 51. 51. 51. 51. 51.

An image = an array of numbers in [0, 255] (grey levels)

A display: [figure: the array rendered as a grey-level picture]
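To make the "image = array of grey levels" statement concrete, here is a minimal sketch that stores the first three rows of the 30 x 15 excerpt as a nested Python list. The names `patch` and `grey_level` are illustrative and do not come from the slides.

```python
# First three rows of the 30 x 15 upper-left excerpt of the picture.
patch = [
    [217, 217, 217, 214, 217, 217, 217, 217, 217, 217, 217, 217, 217, 217, 217],
    [217, 217, 217, 217, 217, 217, 217, 217, 217, 217, 229, 229, 229, 229, 217],
    [217, 217, 217, 217, 217, 217, 217, 217, 229, 217, 217, 229, 217, 229, 229],
]


def grey_level(image, x, y):
    """Return I(p) for pixel p = (x, y); rows are indexed by y, columns by x."""
    return image[y][x]


n_rows, n_cols = len(patch), len(patch[0])
# Every entry must be a grey level in [0, 255].
all_valid = all(0 <= value <= 255 for row in patch for value in row)
```

The convention that `x` indexes columns and `y` indexes rows is an assumption made for this sketch; the slides only write p = (x, y).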

Motion (optical flow) in a sequence

• an image: $I = [I(p)]$, with grey level $I(p)$ at pixel $p = (x, y)$

• an image sequence: $(I_t)$, $I_t = [I_t(p)]$

• assume a moving object (or person), with trajectory $t \mapsto p_t = (x_t, y_t)$

Motion field: $v_t(p_t) = \frac{\partial}{\partial t} p_t$.

• Problem: given $(I_t)$, estimate $(v_t)$

• Ill-posed problem: the moving objects, or the trajectories $t \mapsto p_t$, are unknown!

• Basic assumption: $t \mapsto I_t(p_t)$ is constant

• Well-known restoration techniques $\longrightarrow (v_t)_t$

An application: segmentation of “moving regions”
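As a brief worked step (standard material, not spelled out on the slide), the basic assumption can be turned into an equation on the motion field. Differentiating the constant function $t \mapsto I_t(p_t)$ with the chain rule gives the usual optical flow constraint:

```latex
\frac{d}{dt}\, I_t(p_t)
  = \nabla I_t(p_t) \cdot v_t(p_t) + \frac{\partial I_t}{\partial t}(p_t)
  = 0 ,
\qquad
v_t(p_t) = \frac{\partial}{\partial t}\, p_t .
```

This is a single scalar equation for the two unknown components of $v_t(p_t)$ at each pixel, which is another way to see why the estimation problem is ill-posed without additional constraints.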

More details on motion measures:

$$v_t = w_t + u_t$$

Image motion = Dominant motion + Residual motion

- dominant motion $w_t$: large scale, camera motion
- residual motion $u_t$: lower scale, scene motion of interest

• Simple affine model for camera motion $w_t$:

$$w_t(p) = w_t(x, y) = \begin{pmatrix} a_t + b_t x + c_t y \\ a'_t + b'_t x + c'_t y \end{pmatrix}, \qquad \theta_t = (a_t, b_t, c_t, a'_t, b'_t, c'_t),$$

$$\theta_t = \arg\min_{\theta} \sum_{p} \left| I_{t+1}(p + w_t(p)) - I_t(p) \right|^2 .$$

(the sum $\sum_p$ may be restricted to local windows)

• Estimation of scene motion $u_t$:

$u_t(p)$ = local average of the residuals $\left| I_{t+1}(p + w_t(p)) - I_t(p) \right|$
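The two steps above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: frames are nested lists indexed as `frame[y][x]`, the warped position is rounded to the nearest pixel, positions that leave the frame are skipped, and the local average uses a square window. None of these choices is stated on the slides, and the minimisation over $\theta$ itself is not implemented; `cost` only evaluates the criterion for a given $\theta$.

```python
def affine_flow(theta, x, y):
    """Camera motion w_t(x, y) for theta = (a, b, c, a2, b2, c2)."""
    a, b, c, a2, b2, c2 = theta
    return (a + b * x + c * y, a2 + b2 * x + c2 * y)


def residuals(frame_t, frame_next, theta):
    """|I_{t+1}(p + w_t(p)) - I_t(p)| per pixel; None where p + w_t(p) leaves the frame."""
    height, width = len(frame_t), len(frame_t[0])
    out = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            wx, wy = affine_flow(theta, x, y)
            # Nearest-pixel sampling: an assumption of this sketch.
            xs, ys = round(x + wx), round(y + wy)
            if 0 <= xs < width and 0 <= ys < height:
                out[y][x] = abs(frame_next[ys][xs] - frame_t[y][x])
    return out


def cost(frame_t, frame_next, theta):
    """Sum over p of the squared residuals: the criterion minimised over theta."""
    res = residuals(frame_t, frame_next, theta)
    return sum(r * r for row in res for r in row if r is not None)


def residual_motion(res, radius=1):
    """u_t(p): average of the defined residuals in a square window around p."""
    height, width = len(res), len(res[0])
    u = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                res[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
                if res[j][i] is not None
            ]
            u[y][x] = sum(window) / len(window) if window else 0.0
    return u


# Toy frames: one bright pixel that moves one step to the right.
frame_t = [[0] * 5 for _ in range(5)]
frame_t[2][1] = 100
frame_next = [[0] * 5 for _ in range(5)]
frame_next[2][2] = 100

shift = (1, 0, 0, 0, 0, 0)   # pure translation by one pixel in x
still = (0, 0, 0, 0, 0, 0)   # no camera motion
```

With these toy frames, the translation `shift` explains the whole change (zero cost, zero residual motion), whereas `still` leaves a non-zero residual around the moving pixel.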

A benchmark video sequence: Athletics video

• a French TV video reporting an athletics meeting

• Source: Institut National d’Audiovisuel

• Project conducted at: INRIA/IRISA, Rennes (ref. Patrick BOUTHEMY)

• 17 mins (∼ 25000 images)

• typical scenes: [figure: sample frames]

Motion measures of the Athletics video:

[figure: for each of three scenes (pole vault, track race, interview), a sample image $I_t$, its dominant motion $w_t$ and its residual motion $u_t$]

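II. Models for motion measures

a) Basic model for the scene motion $u = (u_t)$

• histograms of scene motion $u = (u_t) = [u_t(p)]$, $1 \le t \le T$, $p \in S$

• Also used: its gradient $\nabla u_t(p) = u_{t+1}(p) - u_t(p)$. Note: $\nabla u_t(p) \in \mathbb{R}$

• Family of (marginal) mixture distributions:

$$f_1(x) = \beta_1 \mathbf{1}_{x=0} + (1-\beta_1)\, 2 g_{\sigma_1}(x)\, \mathbf{1}_{x>0}, \quad \text{for } u = [u_t(p)]$$

$$f_2(x) = \beta_2 \mathbf{1}_{x=0} + (1-\beta_2)\, g_{\sigma_2}(x)\, \mathbf{1}_{x \neq 0}, \quad \text{for } \nabla u$$

with Gaussian $g_\sigma(x) = (2\pi\sigma^2)^{-1/2} \exp\left[-\frac{1}{2\sigma^2} x^2\right]$.

• Basic model for $(u, \nabla u)$:

$$\xi(u, \nabla u) = \prod_{t=1}^{T} \prod_{p \in S} f_1(u_t(p))\, f_2(\nabla u_t(p)) .$$

Note: no spatial correlation nor time dependency!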
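A minimal sketch of this model in Python follows. The densities `f1`, `f2` and the product $\xi$ (computed as a sum of logs) follow the formulas above. The `fit` function is an assumption: the slides do not say how $(\beta_j, \sigma_j)$ are estimated, so the sketch uses the maximum-likelihood values (share of exact zeros, and root mean square of the non-zero values). It assumes $0 < \beta < 1$, non-negative $u$ values and at least one non-zero observation.

```python
import math


def gauss(x, sigma):
    """Zero-mean Gaussian density g_sigma(x)."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


def f1(x, beta, sigma):
    """Mixture for u >= 0: mass beta at 0, half-Gaussian 2 * g_sigma on x > 0."""
    if x == 0:
        return beta
    return (1.0 - beta) * 2.0 * gauss(x, sigma) if x > 0 else 0.0


def f2(x, beta, sigma):
    """Mixture for the gradient: mass beta at 0, Gaussian g_sigma elsewhere."""
    return beta if x == 0 else (1.0 - beta) * gauss(x, sigma)


def log_xi(u_values, grad_values, model):
    """log xi(u, grad u): sum of log f1 and log f2 over all (t, p)."""
    beta1, sigma1, beta2, sigma2 = model
    total = sum(math.log(f1(x, beta1, sigma1)) for x in u_values)
    total += sum(math.log(f2(x, beta2, sigma2)) for x in grad_values)
    return total


def fit(values):
    """Maximum-likelihood (beta, sigma): share of zeros, RMS of the non-zero values."""
    nonzero = [x for x in values if x != 0]
    beta = 1.0 - len(nonzero) / len(values)
    sigma = math.sqrt(sum(x * x for x in nonzero) / len(nonzero))
    return beta, sigma
```

Working with `log_xi` rather than the raw product avoids numerical underflow, since the product runs over every pixel of every frame.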

b) Basic model for the camera motion $w = (w_t)$

• We have computed, for each $t$ and each pixel $p$, the camera motion vector $w_t(p) \in \mathbb{R}^2$;

• example of histograms: [figure: histograms for a pole vault segment and for a wide shot of a track race]

• Use of a mixture of bivariate Gaussian distributions:

$$f_3(x; \alpha) = \sum_{j=1}^{K} \alpha_j\, G_j(x; \mu_j, \Sigma_j)$$

• Basic model for $w = (w_t(p))$:

$$\zeta(w) = \prod_{t=1}^{T} \prod_{p \in S} f_3(w_t(p); \alpha)$$

Note: again, no spatial correlation nor time dependency!
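The mixture density and the resulting model can be evaluated as in the sketch below. It only evaluates $f_3$ and $\log \zeta$ for given parameters; how the weights, means and covariances are learned is not stated on the slides and is not shown here. Function and argument names are illustrative.

```python
import math


def bivariate_gaussian(x, mean, cov):
    """Density G(x; mu, Sigma) of a 2-D Gaussian; cov = ((s11, s12), (s21, s22))."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using the closed-form 2x2 inverse.
    quad = (s22 * dx * dx - (s12 + s21) * dx * dy + s11 * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def f3(x, weights, means, covs):
    """Mixture density: sum over j of alpha_j * G_j(x; mu_j, Sigma_j)."""
    return sum(a * bivariate_gaussian(x, m, c) for a, m, c in zip(weights, means, covs))


def log_zeta(vectors, weights, means, covs):
    """log zeta(w): sum of log f3 over all camera motion vectors w_t(p)."""
    return sum(math.log(f3(v, weights, means, covs)) for v in vectors)


identity = ((1.0, 0.0), (0.0, 1.0))
# Density of the standard bivariate Gaussian at its mean: 1 / (2 * pi).
peak = bivariate_gaussian((0.0, 0.0), (0.0, 0.0), identity)
```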

III. Application to event detection in a video

Why event detection?

• verification of a company's paid advertising spots in a TV broadcast

• locating only the exciting moments (goals) in a soccer match

• ...

• Generally: find all the places in a video where a short sequence similar to a given example occurs

Back to the benchmark Athletics video

• a French TV video reporting an athletics meeting

• 17 mins (∼ 25000 images)

• typical scenes: [figure: sample frames]

• Proposal: detecting short sequences of the following events:

- Pole vault
- Replay of pole vault
- Wide shot of track race
- Close-up of track race

• Standard statistical scheme for event detection:

Model + Learning + Detection algorithm

Preparation: all videos are pre-segmented (an independent problem)

• typically, a homogeneous video segment runs from several seconds to several minutes

Two-stage strategy for event detection

Stage 1: rough classification

- Aim: to cope with a large variety of videos
- use of the previous model of scene residual motion

Learning algorithm:

• given a learning database of video segments $\{s_i\}$, we know whether each $s_i$ is an interesting segment or not

• get two sets:

$E$ = {interesting segments}, $\bar{E}$ = {uninteresting segments}

• for each $s \in E$, fit a model of scene motion $M_s = (\beta_j(s), \sigma_j(s))$, $j = 1, 2$

• use a hierarchical ascending classification tree (agglomerative clustering) to get $m$ clusters of interesting segments $C_1, \ldots, C_m$

- fit again a scene motion model to each cluster $C_i$;

• do the same for the set $\bar{E}$ to get $m'$ clusters of uninteresting segments $C'_1, \ldots, C'_{m'}$

• finally we get

- $m$ clusters of interesting segments, with $m$ motion models $M_i$
- $m'$ clusters of uninteresting segments, with $m'$ motion models $M'_j$
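The clustering step can be sketched as a plain bottom-up merge. The slides do not specify the distance between segment models or the linkage rule, so this sketch assumes Euclidean distance between cluster centroids in the parameter space $(\beta_1, \sigma_1, \beta_2, \sigma_2)$ and stops when $m$ clusters remain. The parameter values below are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length parameter vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def agglomerate(points, m):
    """Ascending (bottom-up) clustering: merge the two closest clusters until m remain."""
    clusters = [[p] for p in points]
    while len(clusters) > m:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Centroid distance: an assumption, the slides do not name a criterion.
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical fitted models (beta1, sigma1, beta2, sigma2), one per segment.
segment_models = [
    (0.90, 1.0, 0.80, 0.5),
    (0.88, 1.1, 0.82, 0.6),
    (0.20, 6.0, 0.10, 4.0),
    (0.25, 5.5, 0.15, 4.2),
]
clusters = agglomerate(segment_models, m=2)
```

After clustering, the slides refit one scene motion model per cluster from the segments it contains; that refit is not shown here.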

Classification/recognition algorithm:

• given a test segment $s$, with scene motions $(u(s), \nabla u(s))$, compute the likelihoods

$\xi_{M_i}(u(s), \nabla u(s))$, for the models learned from $E$

$\xi_{M'_j}(u(s), \nabla u(s))$, for the models learned from $\bar{E}$

• assign the label interesting or uninteresting according to the maximum likelihood
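A sketch of this decision rule, assuming each learned model is a tuple $(\beta_1, \sigma_1, \beta_2, \sigma_2)$ with $0 < \beta_j < 1$ and that the $u$ values are non-negative. The model values and test data used to exercise it are hypothetical.

```python
import math


def log_gauss(x, sigma):
    """Log of the zero-mean Gaussian density g_sigma(x)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - x * x / (2.0 * sigma ** 2)


def log_xi(u_values, grad_values, model):
    """Log-likelihood of (u, grad u) under model = (beta1, sigma1, beta2, sigma2)."""
    beta1, sigma1, beta2, sigma2 = model
    total = 0.0
    for x in u_values:      # f1: mass at 0 plus half-Gaussian on x > 0
        total += math.log(beta1) if x == 0 else math.log(2.0 * (1.0 - beta1)) + log_gauss(x, sigma1)
    for x in grad_values:   # f2: mass at 0 plus Gaussian elsewhere
        total += math.log(beta2) if x == 0 else math.log(1.0 - beta2) + log_gauss(x, sigma2)
    return total


def classify(u_values, grad_values, interesting_models, uninteresting_models):
    """Label a segment according to the model with the largest likelihood."""
    best_interesting = max(log_xi(u_values, grad_values, m) for m in interesting_models)
    best_uninteresting = max(log_xi(u_values, grad_values, m) for m in uninteresting_models)
    return "interesting" if best_interesting >= best_uninteresting else "uninteresting"


# Hypothetical learned models: strong motion versus almost static scenes.
interesting_models = [(0.2, 5.0, 0.2, 5.0)]
uninteresting_models = [(0.9, 0.5, 0.9, 0.5)]

static_label = classify([0, 0, 0, 0.1], [0, 0, 0, -0.1], interesting_models, uninteresting_models)
moving_label = classify([4.0, 6.0, 3.0, 0], [2.0, -5.0, 4.0, 0], interesting_models, uninteresting_models)
```

Ties are resolved in favour of "interesting" here; that tie-breaking choice is arbitrary and not taken from the slides.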

Experimental results of Stage 1

• consider a TV video, the Athletics video, of 17 minutes (∼ 25000 images)

• typical scenes: [figure: sample frames]

• learning basis = 10 minutes; test set = 7 minutes

• recognition rates on the test set (1 = interesting, 0 = uninteresting):

$n_{ij} = \#\{\, i \text{ segments classified as } j \,\}$

$$\text{Precision rate of the alarm} = \frac{n_{11}}{n_{11} + n_{01}} = 84\,\%$$

$$\text{Recall rate of the alarm} = \frac{n_{11}}{n_{11} + n_{10}} = 94\,\%$$
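The two rates follow directly from the counts $n_{ij}$. The slide reports only the percentages, not the counts, so the numbers in this small sketch are hypothetical and serve only to exercise the formulas.

```python
def precision_recall(n11, n10, n01):
    """n_ij = number of class-i segments classified as j (1 = interesting)."""
    precision = n11 / (n11 + n01)  # share of raised alarms that are correct
    recall = n11 / (n11 + n10)     # share of interesting segments that are found
    return precision, recall


# Hypothetical counts, not the ones behind the reported 84 % and 94 %.
precision, recall = precision_recall(n11=8, n10=2, n01=2)
```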

Stage 2: precise event detection

• aim: detect target events with more a priori knowledge in the models

• need to fix a set of target events

Example of the Athletics video:

- Pole vault
- Replay of pole vault
- Wide shot of track race
- Close-up of track race

• use of both the scene motion model $\xi$ and the camera motion model $\zeta$ to get the joint model:

$$h(s) = \xi(u_s, \nabla u_s)\, \zeta(w_s)$$

Learning algorithm:

• standard way: for each target event $e$, learn a model $h_e$ from a learning sample of video segments containing the event

Final event detection algorithm:

• given a test segment $s$, use Stage 1 to know if it is interesting or not;

• stop if classified 'uninteresting'

• otherwise, claim it as an event $\hat{e}$ according to the maximum of the likelihood from the joint model above:

$$\hat{e} = \arg\max_{e} h_e(s) .$$
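The final algorithm is a filter followed by an arg max, which can be sketched as below. `is_interesting` stands in for the Stage 1 classifier and `event_models` maps each event name to a function returning $h_e(s)$ (or its logarithm, $\log \xi + \log \zeta$, which gives the same arg max). Both, and the toy scoring functions used in the example, are hypothetical placeholders rather than the learned models.

```python
def detect_event(segment, is_interesting, event_models):
    """Two-stage decision: Stage 1 filter, then arg max over e of h_e(s)."""
    if not is_interesting(segment):
        return None  # stop: the segment is classified 'uninteresting'
    return max(event_models, key=lambda event: event_models[event](segment))


# Toy stand-ins: a segment is reduced to two summary features.
toy_models = {
    "pole vault": lambda s: -abs(s["camera_speed"] - 1.0),
    "wide shot of track race": lambda s: -abs(s["camera_speed"] - 5.0),
}


def toy_stage1(segment):
    """Hypothetical Stage 1 rule: enough scene motion means 'interesting'."""
    return segment["scene_motion"] > 0.5


label = detect_event({"camera_speed": 4.2, "scene_motion": 0.9}, toy_stage1, toy_models)
rejected = detect_event({"camera_speed": 4.2, "scene_motion": 0.1}, toy_stage1, toy_models)
```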

Experimental results

• Athletics video: from the 7-minute test sequence:

[figure: two timelines, top: ground truth, bottom: detection; colour legend: pole vault, replay of pole vault, wide shot of track race, close-up of track race, “uninteresting”]

• Classification matrix:

               Pole vault  Pv-Replay  Wide-Race  Close-Race
Pole vault          2          0          0          0
Pv-Replay           0          2          0          0
Wide-Race           0          1          4          1
Close-Race          0          0          0          6
Uninteresting       0          1          0          0
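As a small worked summary of the classification matrix, the sketch below counts the segments whose row and column labels agree. The slide does not say which axis is the ground truth and which is the detection, so only this orientation-free total is computed. The column label is spelled "Pv-Replay" in the code.

```python
labels = ["Pole vault", "Pv-Replay", "Wide-Race", "Close-Race"]
# Rows of the classification matrix, in the column order given by `labels`.
matrix = {
    "Pole vault":    [2, 0, 0, 0],
    "Pv-Replay":     [0, 2, 0, 0],
    "Wide-Race":     [0, 1, 4, 1],
    "Close-Race":    [0, 0, 0, 6],
    "Uninteresting": [0, 1, 0, 0],
}

total = sum(sum(row) for row in matrix.values())                  # 17 segments
agreeing = sum(matrix[name][k] for k, name in enumerate(labels))  # 14 on the diagonal
agreement_rate = agreeing / total
```

So 14 of the 17 segments in the matrix fall on the diagonal, about 82 %.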

