Date post: 09-Aug-2020

Use sparse MPEG flow vectors to compute:
HOF: Histograms of flow
MBH: Motion boundary histograms

Efficient feature extraction, encoding and classification for action recognition

Vadim Kantorov, Ivan Laptev

INRIA – WILLOW / École Normale Supérieure, Paris, France

Goal


Motivation

Contributions

Related work

Results

Approach

Huge amounts of video:

Large-scale applications:

Local motion descriptor

Descriptor aggregation

MPEG flow

Estimated motion vectors are part of most compressed video representations: MPEG, H.264, VP9.

MPEG motion vectors are sparse, typically defined on a 16x16 pixel grid.

The quality of MPEG flow is comparable to motion estimation by standard Optical Flow algorithms.
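As an illustration of how HOF and MBH could be built from sparse flow, the sketch below bins flow vectors (one per block of the sparse grid) by orientation. It is a minimal sketch, not the authors' implementation: the function names, the 8 orientation bins, the extra "no motion" bin and the magnitude threshold are all assumptions made for the example.

```python
import math

def _orientation_bin(dx, dy, n_bins):
    """Map a 2D vector to one of n_bins equal orientation sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi) * n_bins) % n_bins

def hof_histogram(flow_vectors, n_bins=8, min_magnitude=1.0):
    """HOF sketch: n_bins orientation bins plus one 'no motion' bin.

    flow_vectors: iterable of (dx, dy), e.g. one vector per 16x16 block.
    n_bins and min_magnitude are assumed values, not taken from the poster.
    """
    hist = [0.0] * (n_bins + 1)
    for dx, dy in flow_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude < min_magnitude:
            hist[n_bins] += 1.0  # near-static block goes to the extra bin
        else:
            hist[_orientation_bin(dx, dy, n_bins)] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def mbh_histogram(component, n_bins=8):
    """MBH sketch for one flow component (u or v) given as a 2D grid.

    Spatial gradients of the component are taken with central differences
    and binned by orientation, weighted by gradient magnitude.
    """
    hist = [0.0] * n_bins
    rows, cols = len(component), len(component[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (component[r][c + 1] - component[r][c - 1]) / 2.0
            gy = (component[r + 1][c] - component[r - 1][c]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude > 0.0:
                hist[_orientation_bin(gx, gy, n_bins)] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Because MBH works on gradients of the flow, a constant flow field (for example uniform camera translation) produces an all-zero histogram in this sketch.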

Motion in the synthetic MPI Sintel Flow dataset:

Motion in movie frames:

Hollywood 2

HMDB 51

UCF 50

Quantized Lucas-Kanade flow

Quantized Farnebäck flow

Fast action recognition.

State-of-the-art performance.


Decades of TV channels

5M years of video transfer per month in 2018

6000 years of new video each year

Video indexing

Surveillance

Augmented reality

Current state-of-the-art methods for action recognition typically process ≈1 frame per second

Time for video feature extraction

Dense trajectories [1]: 61%, 31%, 8% (chart segments)

Our method: <1%

>100x speed-up of video feature extraction.

4x real-time action recognition (CPU).

Minor decrease in recognition accuracy.

Optical flow estimation

Tracking

Descriptor aggregation

Publicly available implementation: http://www.di.ens.fr/willow/research/fastvideofeat

[1] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Dense trajectories and motion boundary descriptors for action recognition. IJCV, 2013.

[2] F. Shi, E. Petriu, and R. Laganière. Sampling strategies for real-time action recognition. In CVPR, pp. 2595–2602, 2013.

[3] F. Perronnin and J. Sánchez. High-dimensional signature compression for large-scale image classification. In CVPR, 2012.

[4] M. Muja and D. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP, pp. 331–340, 2009.

Quantized MPEG flow

Descriptor evaluation

Parameter sensitivity

Comparison to the state of the art

-1%

-1%

OF stride marginally affects accuracy

Stable recognition across codecs and bit-rates

Trajectory information has limited influence on results


Grid cells of two scales:
16x16 pixels, 5 frames
24x24 pixels, 5 frames

Dense descriptor sampling with:
16 pixels spatial stride
5 frames temporal stride
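The sampling grid described above can be sketched as a simple enumeration. The function name and the tuple layout are hypothetical, and the sketch assumes that the strides index cell origins and that cells must fit entirely inside the video volume; how cells are grouped into full descriptors is not shown.

```python
def descriptor_cells(width, height, n_frames,
                     cell_sizes=(16, 24), cell_frames=5,
                     spatial_stride=16, temporal_stride=5):
    """Enumerate (x, y, t, size, frames) cells on a dense space-time grid.

    Defaults follow the poster: cells of 16x16 and 24x24 pixels spanning
    5 frames, sampled every 16 pixels and every 5 frames.
    """
    cells = []
    for size in cell_sizes:
        for t in range(0, n_frames - cell_frames + 1, temporal_stride):
            for y in range(0, height - size + 1, spatial_stride):
                for x in range(0, width - size + 1, spatial_stride):
                    cells.append((x, y, t, size, cell_frames))
    return cells
```

For a 64x32 clip of 10 frames this yields 16 cells at the 16-pixel scale and 6 cells at the 24-pixel scale.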

Feature encoding and classification schemes:
Histogram encoding + kernel SVM
VLAD + linear SVM
Fisher Vector [3] + linear SVM
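To make the VLAD scheme concrete, here is a minimal sketch of the standard VLAD encoding: each descriptor is assigned to its nearest centroid and the residuals are accumulated per centroid. The function name is hypothetical, the nearest-centroid search is exact rather than approximate, and the final L2 normalization is a common choice assumed here, not something stated on the poster.

```python
import math

def vlad_encode(descriptors, centroids):
    """Return the L2-normalized VLAD vector of length len(centroids) * dim."""
    k, dim = len(centroids), len(centroids[0])
    residuals = [[0.0] * dim for _ in range(k)]
    for x in descriptors:
        # Exact nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        for j in range(dim):
            residuals[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

The resulting fixed-length vector can then be fed to a linear SVM, which is what keeps classification cheap compared with kernel SVMs on histograms.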

Descriptor assignment using approximate Nearest Neighbor search (FLANN) [4].

Approximate FV aggregation with updates of five nearest centroids only.
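One way to read the approximation is that each descriptor only updates the Fisher Vector statistics of its few nearest mixture components. The sketch below follows that reading for a diagonal-covariance GMM using the standard first- and second-order Fisher Vector gradients. The function name, the exact (non-approximate) neighbor search, the use of Euclidean distance to the means to pick neighbors, and the renormalization of posteriors over the selected components are all assumptions.

```python
import math

def approx_fisher_vector(descriptors, means, sigmas, weights, top_k=5):
    """Fisher Vector where each descriptor updates only top_k components.

    means, sigmas: per-component lists of length dim (diagonal GMM).
    Returns first-order then second-order gradients, length 2 * K * dim.
    """
    n_comp, dim = len(means), len(means[0])
    grad_mu = [[0.0] * dim for _ in range(n_comp)]
    grad_sigma = [[0.0] * dim for _ in range(n_comp)]
    for x in descriptors:
        # Exact search here; an approximate NN index could replace it.
        dists = [sum((a - m) ** 2 for a, m in zip(x, means[k]))
                 for k in range(n_comp)]
        nearest = sorted(range(n_comp), key=dists.__getitem__)[:top_k]
        # Log-likelihoods up to a constant shared by all components.
        log_p = {}
        for k in nearest:
            ll = math.log(weights[k])
            for a, m, s in zip(x, means[k], sigmas[k]):
                ll -= 0.5 * ((a - m) / s) ** 2 + math.log(s)
            log_p[k] = ll
        peak = max(log_p.values())
        norm = sum(math.exp(v - peak) for v in log_p.values())
        for k in nearest:
            gamma = math.exp(log_p[k] - peak) / norm  # truncated posterior
            for j in range(dim):
                z = (x[j] - means[k][j]) / sigmas[k][j]
                grad_mu[k][j] += gamma * z
                grad_sigma[k][j] += gamma * (z * z - 1.0)
    n = max(len(descriptors), 1)
    fv = []
    for k in range(n_comp):
        fv += [g / (n * math.sqrt(weights[k])) for g in grad_mu[k]]
    for k in range(n_comp):
        fv += [g / (n * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k]]
    return fv
```

With many mixture components, posteriors are typically concentrated on a handful of them, so skipping the rest changes the vector little while cutting the per-descriptor cost from all components to `top_k`.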

Code available: http://www.di.ens.fr/willow/research/fastvideofeat

Hollywood2

Histogram encoding
