Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Date post: 15-Apr-2017
Upload: xavier-giro
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks Alberto Montes July 15th, 2016 Xavi Giró Amaia Salvador
Transcript
Page 1: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Alberto Montes

July 15th, 2016

Xavi Giró, Amaia Salvador

Page 2: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

1. Introduction
2. Related Work
3. Methodology
4. Results
5. Conclusions and Future Work

2

Page 3: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Motivation

3

Page 4: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Motivation

4

Page 5: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Problem Definition

5

Videos

Page 6: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Problem Definition

6

Videos

Activity Classification

Longboarding

Page 7: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Problem Definition

7

Videos

Activity Temporal Localization

Longboarding

Page 8: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Problem Definition

8

How?

Page 9: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Problem Definition

9

Neural Network

Activity

Page 10: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Problem Definition

10

Activity

CNN + RNN

Page 11: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

11

Large-Scale Activity Recognition Challenge

Stats:

● 19,994 videos
● 200 activities
● 660 hours of video
● 313 hours of activities
● 65.6 million frames

Page 12: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Dataset

12

Page 13: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

1. Introduction
2. Related Work
3. Methodology
4. Results
5. Conclusions and Future Work

13

Page 14: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Literature Approaches

14

Activity

CNN + RNN

Page 15: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Convolutional Neural Network

15

Convolutional Layer

Page 16: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Recurrent Neural Network

16

c0 c1 c2

Page 17: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Literature Approaches

17

Activity

CNN + RNN

Page 18: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

3D Convolution

18

Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015, December). Learning spatiotemporal features with 3D convolutional networks. In 2015 IEEE International Conference on Computer Vision (ICCV) (pp. 4489-4497). IEEE.

Page 19: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

3D Convolution

19

● 16-frame video clip as input
● 80 million parameters
● 3x3x3 filter size at all conv layers

Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015, December). Learning spatiotemporal features with 3D convolutional networks. In 2015 IEEE International Conference on Computer Vision (ICCV) (pp. 4489-4497). IEEE.
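To give a sense of what a 3x3x3 filter size means in terms of weights, the sketch below counts the learnable parameters of a single 3D convolutional layer. The channel counts (3 input channels, 64 filters) are assumptions chosen for illustration and are not taken from the slides; the helper name `conv3d_params` is hypothetical.

```python
def conv3d_params(in_channels, out_channels, kernel=(3, 3, 3)):
    """Number of learnable parameters in one 3D conv layer (weights + biases)."""
    kt, kh, kw = kernel  # temporal depth, height, width of the filter
    weights = in_channels * out_channels * kt * kh * kw
    biases = out_channels  # one bias per output filter
    return weights + biases


# Assumed example: an RGB input (3 channels) convolved with 64 filters.
first_layer = conv3d_params(3, 64)
```

Each filter spans 3 frames in time as well as a 3x3 spatial neighbourhood, which is what lets the network capture short-term motion within the 16-frame clip.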

Page 20: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Literature Approaches

20

Activity

CNN + RNN

Page 21: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Literature Approaches

21

Activity

CNN + RNN

Page 22: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Segment Proposals

22

Shou, Z., Wang, D., & Chang, S. F. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs. CVPR 2016.

Page 23: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Literature Approaches

23

Activity

CNN + RNN

Page 24: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

RNN for Activity Localization

24

Yeung, Serena, Olga Russakovsky, Greg Mori, and Li Fei-Fei. "End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR 2016.

Page 25: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

1. Introduction
2. Related Work
3. Methodology
4. Results
5. Conclusions and Future Work

25

Page 26: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Architecture Overview

26

16 frames → 200 activities + background (repeated for each clip)
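One way to picture the per-clip processing above is to cut the video into consecutive 16-frame clips, each of which receives its own prediction over 201 classes (200 activities plus background). The sketch assumes non-overlapping clips and drops a trailing remainder shorter than 16 frames; both choices, and the name `split_into_clips`, are assumptions rather than details from the slides.

```python
NUM_CLASSES = 200 + 1  # 200 activities plus one background class


def split_into_clips(num_frames, clip_len=16):
    """Return (start, end) frame indices of consecutive non-overlapping clips.

    Frames left over at the end (fewer than clip_len) are dropped.
    """
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, clip_len)]


# A 100-frame video yields six full clips; the last 4 frames are dropped.
clips = split_into_clips(100)
```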

Page 27: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

3. Methodology
   a. Extracting C3D Features
   b. Audio Features
   c. Network Architecture
   d. Training Methodology
   e. Post-Processing

27

Page 28: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

3. Methodology
   a. Extracting C3D Features
   b. Audio Features
   c. Network Architecture
   d. Training Methodology
   e. Post-Processing

28

Page 29: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

C3D Network

29

Caffe +

by

feature vector

published on:

Page 30: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

C3D Network

30

Caffe by

feature vector

Page 31: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

3. Methodology
   a. Extracting C3D Features
   b. Audio Features
   c. Network Architecture
   d. Training Methodology
   e. Post-Processing

31

Page 32: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Audio Features

32

C3D

Recurrent Neural Network Input

Audio Features:
● MFCC
● Spectral

concat

video features

Provided by Ignasi Esquerra

Page 33: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

3. Methodology
   a. Extracting C3D Features
   b. Audio Features
   c. Network Architecture
   d. Training Methodology
   e. Post-Processing

33

Page 34: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Network Architecture

34

Page 35: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Network Architecture

35

Page 36: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Network Architecture

36

LSTM with previous output feedback

Page 37: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

3. Methodology
   a. Extracting C3D Features
   b. Audio Features
   c. Network Architecture
   d. Training Methodology
   e. Post-Processing

37

Page 38: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Training Methodology

Categorical Cross Entropy Loss

38

Page 39: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Training Methodology

For unbalanced data, weighted loss:

39

660 hours of video

313 hours of activities
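From the figures above, 660 - 313 = 347 hours (about 53% of the footage) contain no labelled activity, so background clips far outnumber those of any single activity. A minimal sketch of a weighted categorical cross-entropy follows; the background index `0` and the weight `0.3` are illustrative assumptions, not values given in the slides.

```python
import math

BACKGROUND = 0  # assumed index of the background class


def weighted_cross_entropy(probs, target, background_weight=0.3, eps=1e-12):
    """Categorical cross-entropy for one clip, down-weighted for background.

    probs  : predicted class probabilities for the clip
    target : index of the ground-truth class
    """
    loss = -math.log(max(probs[target], eps))  # plain categorical cross-entropy
    weight = background_weight if target == BACKGROUND else 1.0
    return weight * loss


# Same predicted probability, different ground-truth class.
loss_background = weighted_cross_entropy([0.5, 0.5], target=0)
loss_activity = weighted_cross_entropy([0.5, 0.5], target=1)
```

With a weight of 1.0 the function reduces to the unweighted categorical cross-entropy.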

Page 40: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

3. Methodology
   a. Extracting C3D Features
   b. Audio Features
   c. Network Architecture
   d. Training Methodology
   e. Post-Processing

40

Page 41: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification Post-Processing

41

Classes: Background, Activity 1, Activity 2, …, Activity 200

Clips: Clip 1, Clip 2, Clip 3, …, Clip N

Page 42: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification Post-Processing

42

Classes: Background, Activity 1, Activity 2, …, Activity 200

Clips: Clip 1, Clip 2, Clip 3, …, Clip N

Average

Page 43: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification Post-Processing

43

Classes: Background, Activity 1, Activity 2, …, Activity 200

Clips: Clip 1, Clip 2, Clip 3, …, Clip N

Average

Max Probability
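The two steps named on this slide, averaging over clips and then taking the maximum probability, can be sketched as follows. Excluding the background class from the final decision and placing it at index `0` are assumptions made for the example; `classify_video` is a hypothetical name.

```python
def classify_video(clip_probs, background=0):
    """Average per-clip class probabilities, then pick the most likely activity.

    clip_probs : list of per-clip probability lists, one entry per class
    background : index of the background class, ignored in the decision
    """
    num_clips = len(clip_probs)
    num_classes = len(clip_probs[0])
    mean = [sum(clip[k] for clip in clip_probs) / num_clips
            for k in range(num_classes)]
    candidates = [k for k in range(num_classes) if k != background]
    best = max(candidates, key=lambda k: mean[k])
    return best, mean[best]


# Toy video with three clips and three classes (background + 2 activities).
clip_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
label, score = classify_video(clip_probs)
```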

Page 44: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Detection Post-Processing

44

Classes: Background, Activity 1, Activity 2, …, Activity 200

Clips over time: Clip 1, Clip 2, Clip 3, …, Clip N

Applied a mean filter of k samples
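A mean filter of k samples replaces each clip's probability with the average over a window of k neighbouring clips, which suppresses isolated spikes. The sketch below assumes a window centred on the current clip that is simply truncated at the start and end of the video; the slides do not say how the borders are handled.

```python
def mean_filter(values, k=5):
    """Smooth a sequence with a centred moving average of (up to) k samples."""
    half = k // 2
    smoothed = []
    for i in range(len(values)):
        # Window is truncated near the borders instead of being padded.
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A single-clip spike is spread over its neighbours.
raw = [0.0, 0.0, 1.0, 0.0, 0.0]
smooth = mean_filter(raw, k=3)
```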

Page 45: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Detection Post-Processing

45

Classes: Background, Activity

Clips: Clip 1, Clip 2, Clip 3, …, Clip N

γ

Page 46: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Detection Post-Processing

46

γ
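The symbol γ on these slides appears to denote an activity threshold (a later support slide is titled "Activity Threshold"). Under that reading, detection can be sketched as marking every clip whose activity probability reaches γ and merging consecutive marked clips into one temporal segment. The default values for `gamma` and `fps`, and the function name, are assumptions for illustration.

```python
def detect_segments(activity_probs, gamma=0.5, clip_len=16, fps=30.0):
    """Turn per-clip activity probabilities into (start_s, end_s) segments.

    Consecutive clips with probability >= gamma are merged into one segment.
    """
    seconds_per_clip = clip_len / fps
    segments = []
    start = None  # index of the first clip of the currently open segment
    for i, p in enumerate(activity_probs):
        if p >= gamma and start is None:
            start = i
        elif p < gamma and start is not None:
            segments.append((start * seconds_per_clip, i * seconds_per_clip))
            start = None
    if start is not None:  # segment still open at the end of the video
        segments.append((start * seconds_per_clip,
                         len(activity_probs) * seconds_per_clip))
    return segments


# With 16 frames per clip at 16 fps, each clip lasts exactly one second.
segments = detect_segments([0.1, 0.8, 0.9, 0.2, 0.7], gamma=0.5, fps=16.0)
```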

Page 47: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

1. Introduction
2. Related Work
3. Methodology
4. Results
5. Conclusions and Future Work

47

Page 48: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification: Audio Features

48

mAP = 0.5755

mAP = 0.5938

Music unrelated to the activity is often added to the videos in post-processing, causing a decrease in performance when audio and video features are combined.

Page 49: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification: Depth Analysis

49

mAP = 0.5938 mAP = 0.5492 mAP = 0.5635

Deeper networks show overfitting

Page 50: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification Results Per Activity

50

Page 51: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification Results Per Activity

51

Using the Pommel Horse
Sailing
Playing Ice Hockey
Rock Climbing
BMX

Page 52: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Classification Results Per Activity

52

Drinking Coffee
Peeling Potatoes
Having an Ice Cream
Rock-Paper-Scissors
Polishing Shoes

Page 53: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Top Level Classification

53

Page 54: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Detection

54

mAP = 0.2251 mAP = 0.2067

Model with feedback did not improve results

Page 55: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Training with feedback

55

512-LSTM

video features

0 0 1 0 0 0

concat

When training

previous ground truth

Page 56: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Training with feedback

56

512-LSTM

video features

0 0.1 0.6 0.2 0.1 0

concat

When testing

previous prediction

Page 57: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Comparing Post-Processing

57

γ

Grid search for optimal parameters
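The slide only states that a grid search was used. Assuming the two tuned post-processing parameters are the smoothing width k and the threshold γ, an exhaustive search could look like the sketch below. `score_fn` is a hypothetical callback that would run post-processing with the given parameters and return a validation score such as mAP; the toy scoring function and candidate values are made up for the example.

```python
from itertools import product


def grid_search(score_fn, k_values, gamma_values):
    """Evaluate every (k, gamma) pair and return (best_score, best_k, best_gamma)."""
    best = None
    for k, gamma in product(k_values, gamma_values):
        score = score_fn(k, gamma)
        if best is None or score > best[0]:
            best = (score, k, gamma)
    return best


def toy_score(k, gamma):
    # Made-up score that peaks at k=5, gamma=0.2; stands in for a validation metric.
    return -((k - 5) ** 2) - (gamma - 0.2) ** 2


best_score, best_k, best_gamma = grid_search(
    toy_score, k_values=[0, 5, 10], gamma_values=[0.1, 0.2, 0.5]
)
```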

Page 58: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Detection Results per Activity

58

Page 59: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Detection Results per Activity

59

Windsurfing
Riding Bumper Cars
Playing Racquetball
Using the Pommel Horse
Using Parallel Bars

Page 60: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Detection Results per Activity

60

Drinking Coffee
Putting on Shoes
Rock-Paper-Scissors
Removing Curlers
Smoking a Cigarette

Page 61: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Top Level Detection

61

Page 62: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Qualitative Evaluation

62

Ground Truth: Playing water polo

Prediction:
0.765 Playing water polo
0.202 Swimming
0.007 Springboard diving

Page 63: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Qualitative Evaluation

63

Ground Truth: Hopscotch

Prediction:
0.848 Running a marathon
0.023 Triple jump
0.022 Javelin throw

Page 64: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Qualitative Evaluation

64

Page 65: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Qualitative Evaluation

65

Page 66: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Challenge Results

66

Classification Task (24 participants), mAP on a 0% to 100% scale:

● Baseline: 42.20%
● UPC Team: 58.74%
● Average Performance: 66.26%
● Winner: 93.23%

* results over test subset

Slide Design by Issey Masuda

Page 67: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Challenge Results

67

Detection Task (6 participants), mAP on a 0% to 50% scale:

● Baseline: 9.70%
● UPC Team: 22.36%
● Average Performance: 29.94%
● Winner: 42.47%

* results over test subset

Slide Design by Issey Masuda

Page 68: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Outline

1. Introduction
2. Related Work
3. Methodology
4. Results
5. Conclusions and Future Work

68

Page 69: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Conclusions

69

Classification: Longboarding

Detection: 42.7s – 193.5s Longboarding

Page 70: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Conclusions

70

Video

Spatial Net

Temporal Net

Output

Winning entry for ActivityNet Classification task

Wang, Limin, et al. "Towards good practices for very deep two-stream convnets." arXiv preprint arXiv:1507.02159 (2015).

Page 71: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Conclusions

71

Classification: Longboarding

Detection: 42.7s – 193.5s Longboarding

Page 72: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Conclusions

72

Best results were obtained for sport categories, due to the pretraining of C3D with the Sports-1M dataset

Page 73: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Future Work: E2E Training

73

Training the whole pipeline end-to-end would reduce the bias towards sport categories

Page 74: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Future Work: Attention Models

74

Temporal Attention

Filters

Neural Network

Page 75: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Challenge Submission

75

Page 76: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Open Sourced Contributions

76

github.com/imatge-upc/activitynet-2016-cvprw

Page 77: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Thank you for your attention

77

Page 78: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

78

Questions?

Page 79: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

79

Support Slides

Page 80: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Metrics

80

Classification: Hit@3

Detection: IoU
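Both metrics named on this slide are simple to state in code. Hit@3 checks whether the true label is among the three highest-scoring classes, and temporal IoU measures the overlap between a predicted and a ground-truth segment relative to their union. The sketch below uses hypothetical function names, and the ground-truth segment in the usage example is invented purely to show the call.

```python
def hit_at_k(probs, true_label, k=3):
    """True if true_label is among the k classes with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return true_label in ranked[:k]


def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) segments in seconds."""
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Predicted segment from the "Longboarding" example; ground truth is made up.
overlap = temporal_iou((42.7, 193.5), (40.0, 190.0))
```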

Page 81: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Smoothing Effect Comparison

81

Page 82: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Post-Processing Effect

82

Smoothing Filter:

Page 83: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Post-Processing Effect

83

Activity Threshold:

Page 84: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Activities Duration

84

Page 85: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

AP and Video Appearance Correlation

85

Page 86: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

AP and Video Appearance Correlation

86

Page 87: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Preparing Data

87

batch 1

batch 2

Page 88: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Preparing Data

88

Sequence of Video Vector Features

Sequence of Activities

time

Page 89: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Preparing Data

89

time

timesteps

Page 90: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Preparing Data

90

Page 91: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Preparing Data

91

Gradient Propagation

Page 92: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Gathering Audio Features

92

For each 16-Frame Clip along time t: several 10 ms MFCC Features and Spectral Features

Page 93: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Gathering Audio Features

93

For each 16-Frame Clip along time t: mean MFCC Features, std MFCC Features and Spectral Features

Page 94: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Gathering Audio Features

94

For each 16-Frame Clip along time t: mean MFCC Features, std MFCC Features and Spectral Features

Resulting features: mean MFCC Features, std MFCC Features, Spectral Features
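These slides suggest that the 10 ms MFCC frames falling inside one 16-frame clip are summarised by their mean and standard deviation and then joined with the clip's spectral features. A minimal sketch of that pooling follows; the use of the population standard deviation, the ordering of the three parts, and the name `clip_audio_descriptor` are assumptions.

```python
import math


def clip_audio_descriptor(mfcc_frames, spectral):
    """Pool the MFCC frames of one clip and append its spectral features.

    mfcc_frames : list of MFCC vectors (one per 10 ms frame inside the clip)
    spectral    : list of spectral feature values for the same clip
    """
    n = len(mfcc_frames)
    dims = len(mfcc_frames[0])
    mean = [sum(frame[d] for frame in mfcc_frames) / n for d in range(dims)]
    std = [math.sqrt(sum((frame[d] - mean[d]) ** 2 for frame in mfcc_frames) / n)
           for d in range(dims)]
    return mean + std + list(spectral)


# Two MFCC frames with two coefficients each, plus one spectral value.
descriptor = clip_audio_descriptor([[1.0, 2.0], [3.0, 2.0]], spectral=[0.5])
```

The resulting vector has a fixed length regardless of how many 10 ms frames fall inside the clip, which is what allows it to be concatenated with the clip's video features.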

Page 95: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Convolutional Neural Network

95

Convolutional Layer

Page 96: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Convolutional Neural Network

96

Pooling Layer

Page 97: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Convolutional Neural Network

97

Fully-Connected Layer

Page 98: Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Qualitative Evaluation

98

