Use sparse MPEG flow vectors to compute:
HOF: Histograms of flow
MBH: Motion boundary histograms
Efficient feature extraction, encoding and classification
for action recognition
Vadim Kantorov, Ivan Laptev
INRIA – WILLOW / École Normale Supérieure, Paris, France
Goal
Motivation
Contributions
Related work
Results
Approach
Huge amounts of video:
Large-scale applications:
Local motion descriptor
Descriptor aggregation
MPEG flow
Estimated motion vectors are part of most compressed video representations: MPEG, H.264, VP9.
MPEG motion vectors are sparse, typically defined on a 16x16 pixel grid.
The quality of MPEG flow is comparable to motion estimation by standard Optical Flow algorithms.
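The HOF and MBH descriptors named on this poster can be illustrated on a sparse grid of motion vectors (one vector per macroblock). The following is a minimal sketch under stated assumptions, not the released implementation: the function names, the 8 orientation bins, the extra zero-motion bin for HOF, the 0.5-pixel threshold, and the central-difference gradients are all illustrative choices, and the histograms are left unnormalized.

```python
import math


def orientation_histogram(vectors, n_bins=8, min_magnitude=0.0, zero_bin=False):
    """Magnitude-weighted histogram of 2D vector orientations.

    With zero_bin=True, an extra last bin counts vectors whose magnitude
    is <= min_magnitude (assumed handling of near-static blocks).
    """
    hist = [0.0] * (n_bins + (1 if zero_bin else 0))
    for dx, dy in vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude <= min_magnitude:
            if zero_bin:
                hist[n_bins] += 1.0
            continue
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        # The final modulo guards against angle rounding up to exactly 2*pi.
        index = int(angle / (2.0 * math.pi) * n_bins) % n_bins
        hist[index] += magnitude
    return hist


def spatial_gradients(field):
    """Finite-difference gradients of a 2D scalar grid (list of rows).

    Central differences inside the grid, one-sided differences at borders.
    """
    height, width = len(field), len(field[0])
    gradients = []
    for y in range(height):
        for x in range(width):
            gx = field[y][min(x + 1, width - 1)] - field[y][max(x - 1, 0)]
            gy = field[min(y + 1, height - 1)][x] - field[max(y - 1, 0)][x]
            gradients.append((gx, gy))
    return gradients


def hof_mbh(flow):
    """Compute (HOF, MBHx, MBHy) for a grid of (dx, dy) motion vectors."""
    vectors = [vector for row in flow for vector in row]
    # HOF: orientations of the flow vectors themselves.
    hof = orientation_histogram(vectors, zero_bin=True, min_magnitude=0.5)
    # MBH: orientations of the spatial gradients of each flow component.
    flow_x = [[vector[0] for vector in row] for row in flow]
    flow_y = [[vector[1] for vector in row] for row in flow]
    mbh_x = orientation_histogram(spatial_gradients(flow_x))
    mbh_y = orientation_histogram(spatial_gradients(flow_y))
    return hof, mbh_x, mbh_y
```

Calling `hof_mbh` on the motion vectors that fall inside one grid cell yields three histograms that can be concatenated into a local descriptor. Uniform motion contributes only to HOF, while MBH responds where neighbouring vectors differ.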
Motion in the synthetic MPI Sintel Flow dataset:
Motion in movie frames:
Hollywood 2
HMDB 51
UCF 50
Quantized Lucas-Kanade flow
Quantized Farnebäck flow
Fast action recognition.
State-of-the-art performance.
Decades of TV channels
5M years of video transfer per month in 2018
6000 years of new video each year
Video indexing
Surveillance
Augmented reality
Current state-of-the-art methods for action recognition typically process ≈1 frame per second
Time for video feature extraction
Dense trajectories [1]
61%
31%
8%
Our method
<1%
>100x speed-up of video feature extraction.
4x real-time action recognition (CPU).
Minor decrease in recognition accuracy.
Optical flow estimation
Tracking
Descriptor aggregation
Publicly available implementation: http://www.di.ens.fr/willow/research/fastvideofeat
[1] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Dense trajectories and motion boundary descriptors for action recognition. IJCV, 2013.
[2] F. Shi, E. Petriu, and R. Laganiere. Sampling strategies for real-time action recognition. In CVPR, pp. 2595–2602, 2013.
[3] F. Perronnin and J. Sanchez. High-dimensional signature compression for large-scale image classification. In CVPR, 2012.
[4] M. Muja and D. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP, pp. 331–340, 2009.
Quantized MPEG flow
Descriptor evaluation
Parameter sensitivity
Comparison to the state of the art
-1%
-1%
OF stride marginally affects accuracy
Stable recognition across codecs and bit-rates
Trajectory information has limited influence on results
V* V0
V*[1]
• Grid cells of two scales: 16x16 pixels, 5 frames; 24x24 pixels, 5 frames
• Dense descriptor sampling with 16 pixels spatial stride, 5 frames temporal stride
• Feature encoding and classification schemes: Histogram encoding + kernel SVM; VLAD + linear SVM; Fisher Vector [3] + linear SVM
• Descriptor assignment using approximate Nearest Neighbor search (FLANN) [4].
• Approximate FV aggregation with updates of five nearest centroids only.
• Code available: http://www.di.ens.fr/willow/research/fastvideofeat
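As an illustration of the VLAD scheme and the nearest-centroid descriptor assignment listed above, here is a minimal pure-Python sketch. It is not the released code: an exact brute-force search stands in for FLANN's approximate search [4], the names `nearest_centroid` and `vlad_encode` are hypothetical, and the final L2 normalization is an assumed, commonly used choice.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Index of the closest centroid by squared Euclidean distance.

    Exact brute-force search; a stand-in for approximate search (assumption).
    """
    best_index, best_distance = 0, float("inf")
    for index, centroid in enumerate(centroids):
        distance = sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector.

    Each descriptor adds its residual (descriptor - centroid) to the
    accumulator of its nearest centroid; accumulators are concatenated.
    """
    n_centroids, dim = len(centroids), len(centroids[0])
    residuals = [[0.0] * dim for _ in range(n_centroids)]
    for descriptor in descriptors:
        index = nearest_centroid(descriptor, centroids)
        for j in range(dim):
            residuals[index][j] += descriptor[j] - centroids[index][j]
    flat = [value for row in residuals for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat
```

The resulting vector has length (number of centroids) x (descriptor dimension) and can be passed to a linear classifier. Only the accumulator of the assigned centroid is updated per descriptor, which keeps aggregation cheap.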
Hollywood2
Histogram encoding