Representing Videos using Mid-level Discriminative Patches - Arpit ...

Date post: 14-Feb-2017
Page 1: Representing Videos using Mid-level Discriminative Patches - Arpit ...

Motivation
Different approaches to video analysis
Part discovery
Using the mid-level representation
Results

Representing Videos using Mid-level Discriminative Patches

Arpit Jain, Abhinav Gupta, Mikel Rodriguez, Larry Davis

Sobhan Naderi

26 June 2013

Sobhan Naderi Representing Videos using Mid-level Discriminative Patches

Page 2: Representing Videos using Mid-level Discriminative Patches - Arpit ...

- Learning mid-level discriminative spatio-temporal patches
- Category-level action recognition
- Understanding actions at a finer level
  - action primitives: bend, pick, lift
  - objects: people, weight
  - scene
  - temporal localization
- Video alignment and label transfer

Page 3: Representing Videos using Mid-level Discriminative Patches - Arpit ...

1. Global spatio-temporal templates

2. Bag of local features

3. Part based approaches

This paper uses exemplar-SVM to automatically discover distinctive parts

Page 4: Representing Videos using Mid-level Discriminative Patches - Arpit ...

This paper’s approach

Look for patches that are ...

- recurrent: fire in many images
- distinctive: fire only (mostly) on samples of one category

The big challenge is that

- The space of all possible spatio-temporal patches is huge
- Most of the patches belong to the background or are uninteresting

Simple solution: run K-means and prune. But it doesn’t work!

Page 5: Representing Videos using Mid-level Discriminative Patches - Arpit ...

This paper’s approach

1. Form initial clusters
   - Use the training set
   - Sample 200 random spatio-temporal patches per image, ignoring uniform/no-motion patches
   - Keep the 500 most distinct patches per class (use 20-NN)

2. Select top-ranked clusters
   - Use the validation set
   - Rank by: appearance + λ · purity
   - Choose the top 80 clusters (for each class)
   - Q: are the e-SVM scores comparable?
   - Q: how is λ obtained?
   - Q: redundant clusters?
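
The ranking in step 2 can be sketched as follows. This is only an illustration: the slides do not say how the appearance and purity terms are computed or how λ is chosen, so `appearance`, `purity`, and `lam` are hypothetical, precomputed inputs, and `rank_clusters` is a name made up for this sketch.

```python
import numpy as np


def rank_clusters(appearance, purity, lam, top_k=80):
    """Rank clusters by appearance + lam * purity and keep the top_k.

    appearance, purity: one (assumed precomputed) score per cluster.
    Returns the indices of the selected clusters, best first, and all scores.
    """
    appearance = np.asarray(appearance, dtype=float)
    purity = np.asarray(purity, dtype=float)
    score = appearance + lam * purity
    order = np.argsort(-score, kind="stable")  # descending by score
    return order[:top_k], score


# Toy, made-up scores for three clusters of one class.
top, score = rank_clusters([0.2, 0.9, 0.5], [1.0, 0.1, 0.6], lam=1.0, top_k=2)
```

With a larger λ, clusters that fire almost only on one class are favored over clusters that merely look consistent, which is one way to read the open question about how λ is obtained.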

Page 6: Representing Videos using Mid-level Discriminative Patches - Arpit ...

This paper’s approach

Page 7: Representing Videos using Mid-level Discriminative Patches - Arpit ...

Patch selection procedure

1. Action classification
   - Run each e-SVM in sliding-window fashion
   - Construct SPM representation (with max-pooling)
   - Train an SVM classifier

2. Fine-grained video analysis
   - Use context to choose a subset of patches
   - Build correspondence between videos
   - Q: how?
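
A minimal sketch of the max-pooled pyramid feature from step 1, assuming the sliding-window responses of all e-SVMs are already stored in one array of shape (detectors, time, height, width). The pyramid layout (each level halves every axis, including time) and the function name `spm_max_pool` are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np


def spm_max_pool(scores, levels=2):
    """Max-pool detector responses over a spatio-temporal pyramid.

    scores: array of shape (n_detectors, T, H, W) with e-SVM scores.
    Level l splits each of T, H, W into 2**l parts (assumed layout);
    every axis must be at least 2**(levels - 1) long.
    Returns a 1-D feature of length n_detectors * sum(8**l for l in range(levels)).
    """
    _, t, h, w = scores.shape
    feats = []
    for level in range(levels):
        cells = 2 ** level
        t_edges = np.linspace(0, t, cells + 1).astype(int)
        h_edges = np.linspace(0, h, cells + 1).astype(int)
        w_edges = np.linspace(0, w, cells + 1).astype(int)
        for ti in range(cells):
            for hi in range(cells):
                for wi in range(cells):
                    block = scores[:,
                                   t_edges[ti]:t_edges[ti + 1],
                                   h_edges[hi]:h_edges[hi + 1],
                                   w_edges[wi]:w_edges[wi + 1]]
                    # Strongest response of each detector inside this cell.
                    feats.append(block.max(axis=(1, 2, 3)))
    return np.concatenate(feats)
```

The resulting vector, one per video, is what a standard SVM classifier would then be trained on.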

Page 8: Representing Videos using Mid-level Discriminative Patches - Arpit ...

Patch selection procedure

- Here we assume the video has been classified as class $k$
- Potential patches: highest-scoring detection of each e-SVM
- For a patch vocabulary of size $N$, let $x = (x_1, \ldots, x_N)$ be an indicator vector with $x_i \in \{0, 1\}$
- Find

  \[ x^* = \arg\max_x \; \sum_i A_i x_i + w_1 \sum_i C_{k,i} x_i - w_2 \sum_{i,j} P_{i,j} x_i x_j \]

  where:
  $A$: $N \times 1$ appearance vector
  $C_k$: $N \times 1$ class consistency vector for class $k$
  $P$: $N \times N$ penalty matrix

- Solve the optimization problem using IPFP. This requires writing the problem in the following form:

  \[ X^* = \arg\max_X \; X^T M X, \qquad X = \begin{pmatrix} 1 \\ x \end{pmatrix} \]
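
The slide does not spell out $M$. One construction that reproduces the objective, shown here as a sketch, places half of the combined linear term $A + w_1 C_k$ in the first row and first column and $-w_2 P$ in the remaining block. The code checks this on toy, made-up numbers and selects patches by exhaustive search, which stands in for IPFP only to keep the example short and is feasible only for very small $N$.

```python
import itertools
import numpy as np


def objective(x, a, c_k, p, w1, w2):
    """Slide objective: sum_i A_i x_i + w1 sum_i C_k,i x_i - w2 sum_ij P_ij x_i x_j."""
    return a @ x + w1 * (c_k @ x) - w2 * (x @ p @ x)


def build_m(a, c_k, p, w1, w2):
    """Build M so that X^T M X equals the objective for X = (1, x)."""
    n = a.shape[0]
    u = a + w1 * c_k              # combined linear term
    m = np.zeros((n + 1, n + 1))
    m[0, 1:] = u / 2.0            # linear term split across the first row ...
    m[1:, 0] = u / 2.0            # ... and the first column
    m[1:, 1:] = -w2 * p           # quadratic penalty block
    return m


def brute_force_select(m):
    """Exhaustive search over x in {0,1}^N (not IPFP; tiny N only)."""
    n = m.shape[0] - 1
    best_x, best_val = None, -np.inf
    for bits in itertools.product((0.0, 1.0), repeat=n):
        big_x = np.array((1.0,) + bits)
        val = big_x @ m @ big_x
        if val > best_val:
            best_x, best_val = np.array(bits), val
    return best_x, best_val


# Toy, made-up numbers for N = 3 candidate patches.
a = np.array([0.9, 0.8, 0.1])
c_k = np.array([0.5, 0.4, 0.2])
p = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])   # patches 0 and 1 are penalised when picked together
m = build_m(a, c_k, p, w1=1.0, w2=1.0)
x_star, value = brute_force_select(m)
```

In this toy case the penalty between patches 0 and 1 outweighs the gain from selecting both, so only one of the two is kept alongside the unpenalised patch 2.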

Page 9: Representing Videos using Mid-level Discriminative Patches - Arpit ...

Patch selection procedure

Page 10: Representing Videos using Mid-level Discriminative Patches - Arpit ...

Classification
Alignment

- Only cuboid patches
- Scale ranges from 120x120x50 to the entire video
- Each patch is represented by HOG3D (4x4x5 and 20 orientations)
- Experiments on UCF50 and the Olympics dataset
- This method outperforms Action Bank by 3.32% on UCF50

Page 11: Representing Videos using Mid-level Discriminative Patches - Arpit ...

Page 12: Representing Videos using Mid-level Discriminative Patches - Arpit ...

- Manually label 50 patches per class with:
  - Objects of interaction (e.g. golf club, weights)
  - Person bounding boxes
  - Person pose
- These extra annotations are transferred to test images after aligning
- Informal evaluations:
  - 50% of transferred joints are within 15 pixels of ground truth
  - 84% accuracy in localizing persons (50% overlap criterion)
- Q: How is the alignment done?
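
The two informal evaluation measures above could be computed along these lines. This is a sketch under assumptions: the slides do not define the overlap criterion, so intersection-over-union of axis-aligned boxes is assumed here, and the function names and sample coordinates are made up for illustration.

```python
import numpy as np


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def joints_within(pred, gt, radius=15.0):
    """Fraction of transferred joints within `radius` pixels of ground truth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= radius))


# Made-up example: a person box counts as localized if IoU >= 0.5.
localized = box_iou((0, 0, 10, 10), (0, 0, 10, 8)) >= 0.5
frac = joints_within([[0, 0], [30, 40]], [[3, 4], [0, 0]])
```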

Page 13: Representing Videos using Mid-level Discriminative Patches - Arpit ...
