Activity Understanding
“ProcNets: Learning to Segment Procedures in Untrimmed and Unconstrained Videos” by Zhou, Xu and Corso

Thomas Leyh

University of Freiburg

June 28th, 2017
Seminar on Current Works in Computer Vision

Outline

1 Introduction

2 Network Architecture
    Context-Aware Video Encoding
    Procedure Segment Proposal
    Sequential Prediction

3 Performance

4 Conclusion

1 Introduction

What is this about?

1 Grill the tomatoes in a pan

2 Add oil to a pan

3 Grill bacon until crispy

...

8 Finish with bread

Number of segments and positions are inferred automatically!

Why is this useful?

Video Description Generation

Activity Recognition

First step towards a self-learning robot cook?

Figure: Kim Kyung-Hoon/Reuters

2 Network Architecture

Network has three stages.

Stage 1

Reduce dimensionality of each frame.

What is ResNet?

What is Bi-LSTM?

What is ResNet? → Residual Network

Very popular convolutional network model

Can be very deep

Figure: medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106

State-of-the-art performance in image classification

Easy to train

Figure: chaosmail.github.io/deeplearning/2016/10/22/intro-to-deep-learning-for-computer-vision/
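Why a residual network can be very deep and still easy to train is easiest to see in a toy version of its building block. The sketch below is not the ResNet used in the paper: `residual_block`, `zero_layer` and `stack` are hypothetical names, and plain Python lists stand in for feature maps. It only illustrates the skip connection, where each block outputs its input plus a learned correction.

```python
def residual_block(x, f):
    """Return x + f(x), so the layers inside f only have to learn a residual."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f must preserve the vector length")
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_layer(x):
    # Hypothetical stand-in for layers that currently contribute nothing.
    return [0.0 for _ in x]


def stack(x, depth, f):
    # Chain `depth` residual blocks one after another.
    for _ in range(depth):
        x = residual_block(x, f)
    return x


features = [0.19, 0.94, 0.84]
deep_output = stack(features, depth=50, f=zero_layer)
```

Even after 50 blocks the input passes through unchanged when the inner layers output zeros, which is the intuition behind stacking many such blocks without degrading the signal.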

What is Bi-LSTM? → Bidirectional Long Short-Term Memory

Long Short-Term Memory (LSTM)

Special kind of Recurrent Neural Network (RNN)

Adds a ‘forgetting’ mechanism

Captures long-term dependencies

Easier to train than a traditional RNN
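The ‘forgetting’ mechanism can be made concrete with a single step of a scalar LSTM cell. This is a minimal sketch of the standard LSTM equations, not the paper's code: the function `lstm_step`, the weight dictionary layout, and the weight values in `keep` and `erase` are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w maps gate name -> (w_x, w_h, bias)."""
    def gate(name, squash):
        w_x, w_h, bias = w[name]
        return squash(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much of the old memory to keep
    i = gate("input", sigmoid)         # how much new information to write
    o = gate("output", sigmoid)        # how much of the memory to expose
    g = gate("candidate", math.tanh)   # the new information itself
    c = f * c_prev + i * g             # updated cell memory
    h = o * math.tanh(c)               # updated hidden state
    return h, c


# Hypothetical weights: only the biases matter in this demonstration.
keep = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
        "output": (0.0, 0.0, 10.0), "candidate": (1.0, 0.0, 0.0)}
erase = dict(keep, forget=(0.0, 0.0, -10.0))

_, c_kept = lstm_step(0.0, 0.0, 1.0, keep)
_, c_erased = lstm_step(0.0, 0.0, 1.0, erase)
```

With an open forget gate the stored value of 1.0 survives almost unchanged, while a closed forget gate wipes it to nearly zero.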

Bidirectional LSTM (Bi-LSTM)

For capturing past and future context

One network runs forward over sequence

One network runs backwards

Combine output of both
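The forward/backward idea can be sketched without any deep learning library. In the example below a running sum replaces the LSTM step (a deliberate simplification), and `run_rnn`, `bidirectional` and `running_sum` are hypothetical names; the point is only how the two passes are aligned and combined per time step.

```python
def run_rnn(sequence, step, state=0):
    """Apply a recurrent step function over a sequence, collecting each state."""
    outputs = []
    for x in sequence:
        state = step(x, state)
        outputs.append(state)
    return outputs


def bidirectional(sequence, step):
    """Pair each position's forward (past) and backward (future) state."""
    forward = run_rnn(sequence, step)
    # Run over the reversed sequence, then flip back so indices line up.
    backward = run_rnn(sequence[::-1], step)[::-1]
    return list(zip(forward, backward))


def running_sum(x, state):
    # Stand-in for an LSTM step: the state simply accumulates the inputs.
    return state + x


context = bidirectional([1, 2, 3, 4], running_sum)
```

Each entry of `context` now summarizes everything up to that position and everything from that position onward; a real Bi-LSTM would typically concatenate two hidden vectors instead of pairing two numbers.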

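Dimensionality Reduction with Context

$$\text{frame} \in \mathbb{R}^{720 \times 360 \times 3} \;\overset{\text{Stage 1}}{\longmapsto}\; \begin{pmatrix} 0.19 \\ 0.94 \\ 0.84 \\ \vdots \end{pmatrix} \in \mathbb{R}^{512}$$

(Numbers are made-up)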

Stage 2

Produce segment proposals and their likelihood.

What are Temporal Convolutional Anchors?

Region Proposal Networks

Introduce an ‘attention’ mechanism

Originally for object detection on images

Here used on the temporal axis

Generates multiple proposals, each with a score, for every feature

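Segment Proposals

$$\overbrace{\begin{pmatrix} & 0.19 & \\ & 0.94 & \\ \cdots & 0.84 & \cdots \\ & \vdots & \end{pmatrix}}^{\leftarrow\ \text{time axis}\ \rightarrow} \in \mathbb{R}^{512 \times 11} \;\overset{\text{Stage 2}}{\longmapsto}\; \begin{pmatrix} 0.95 & 0.74 & 0.92 \\ 0.51 & 0.25 & 0.90 \\ 0.28 & 0.10 & 0.46 \\ & \vdots & \end{pmatrix} \in \mathbb{R}^{15 \times 3}$$

k = 11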
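To make the anchor idea concrete, the sketch below enumerates candidate segments of a few fixed lengths around every time step and keeps the best-scoring ones. It is an illustration under assumptions, not the paper's proposal module: `temporal_anchors`, `top_proposals`, the anchor lengths and the scores are all hypothetical, and in the real network the scores would come from learned temporal convolutions.

```python
def temporal_anchors(num_steps, lengths):
    """Enumerate (start, end) segments of the given lengths centred on each step."""
    proposals = []
    for center in range(num_steps):
        for length in lengths:
            start = center - length // 2
            end = start + length
            # Keep only anchors that lie fully inside the sequence.
            if start >= 0 and end <= num_steps:
                proposals.append((start, end))
    return proposals


def top_proposals(proposals, scores, k):
    """Return the k proposals with the highest scores."""
    ranked = sorted(zip(scores, proposals), reverse=True)
    return [proposal for _, proposal in ranked[:k]]


anchors = temporal_anchors(num_steps=5, lengths=[1, 3])
best = top_proposals([(0, 1), (1, 4), (2, 5)], [0.2, 0.9, 0.5], k=2)
```

Every time step contributes several anchors of different lengths, so the number of candidates grows with both the sequence length and the number of anchor sizes.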

Stage 3

Choose a variable number of segment proposals.

Another Long Short-Term Memory.

LSTM Input:

All Proposal Scores

Discretized Location, e.g. second 3 to 5 ↦ [0 1 0 ···]

Segment Content
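One plausible reading of the discretized location is a one-hot vector over fixed-width time bins. The slide does not say how the bins are defined, so the binning by segment midpoint, the bin width of 3 seconds, and the function name `location_one_hot` below are all assumptions made for illustration.

```python
def location_one_hot(start, end, bin_width, num_bins):
    """Encode a segment as a one-hot vector over time bins, using its midpoint."""
    midpoint = (start + end) / 2
    # Clamp to the last bin so late segments still get a valid index.
    index = min(int(midpoint // bin_width), num_bins - 1)
    vector = [0] * num_bins
    vector[index] = 1
    return vector


# "Second 3 to 5" falls into the second bin when bins are 3 seconds wide.
encoded = location_one_hot(start=3, end=5, bin_width=3, num_bins=4)
```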

LSTM Output:

Likelihood that a proposal is the next segment

Maximize the likelihood over all segments and you get an array of segment positions
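The selection loop can be sketched as greedy decoding: repeatedly take the most likely next proposal until a "stop" option wins. This is a simplification under assumptions, since greedy choice only approximates maximizing the likelihood over the whole sequence, and `toy_likelihood` is a hand-written stand-in for the LSTM (it simply prefers the proposal starting closest after the previous segment). All names are hypothetical.

```python
def decode_segments(proposals, next_likelihood, max_steps=20):
    """Greedily pick the most likely next proposal until 'stop' scores highest."""
    chosen = []
    for _ in range(max_steps):
        # One score per proposal, plus a final score for stopping.
        scores = next_likelihood(chosen, proposals)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == len(proposals):
            break
        chosen.append(proposals[best])
    return chosen


def toy_likelihood(chosen, proposals):
    # Stand-in for the LSTM output, not a learned model.
    last_end = chosen[-1][1] if chosen else 0
    scores = [1.0 / (1 + start - last_end) if start >= last_end else 0.0
              for start, _ in proposals]
    stop = 0.0 if any(score > 0 for score in scores) else 1.0
    return scores + [stop]


proposals = [(0, 3), (2, 6), (3, 5), (5, 9), (6, 8)]
segments = decode_segments(proposals, toy_likelihood)
```

The number of returned segments is never passed in; it falls out of when the stop option becomes the most likely choice.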

Using different convolutional and recurrent network models to

Encode video frames

Encode temporal dependencies

And search for the most likely arrangement.

3 Performance

YouCookII Dataset

Since existing datasets were not sufficient, a new dataset was collected.

Set of YouTube cooking videos with different recipes.

Includes 2007 videos.

New Metrics

Existing methods are hardly comparable (e.g. the number of segments needs to be given).

Existing metrics fail to measure ordering information.

Therefore new metrics:

Average Recall at 0.5

Mean Intersection-over-Union (mIoU)

They essentially measure the overlap between predicted segments and ground truth.
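Both metrics build on the temporal intersection-over-union of two segments. The sketch below shows one way to compute them; the matching rule (each ground-truth segment is scored against its best-overlapping prediction) is an assumption, as the slides do not spell out the exact definition, and the function names are hypothetical. It assumes a non-empty ground-truth list.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) segments on the time axis."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def best_ious(ground_truth, predicted):
    # Assumed matching rule: best-overlapping prediction per ground-truth segment.
    return [max((temporal_iou(gt, p) for p in predicted), default=0.0)
            for gt in ground_truth]


def mean_iou(ground_truth, predicted):
    scores = best_ious(ground_truth, predicted)
    return sum(scores) / len(scores)


def recall_at(ground_truth, predicted, threshold=0.5):
    """Fraction of ground-truth segments matched with IoU >= threshold."""
    scores = best_ious(ground_truth, predicted)
    return sum(score >= threshold for score in scores) / len(scores)


truth = [(0, 10), (10, 20)]
guess = [(0, 5), (10, 20)]
miou = mean_iou(truth, guess)
recall = recall_at(truth, guess, threshold=0.5)
```

Here the first ground-truth segment is half covered (IoU 0.5) and the second is matched exactly (IoU 1.0).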

Performance Comparison

The uniform model produces 8 segments

The others produce 7 segments

r is the relaxation factor
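A uniform baseline of this kind can be sketched in a few lines, assuming it simply cuts the video into equally long pieces. The slides do not define the baseline further, so the function name `uniform_segments` and the equal-length rule are assumptions.

```python
def uniform_segments(duration, num_segments=8):
    """Split [0, duration] into equally long (start, end) segments."""
    step = duration / num_segments
    return [(i * step, (i + 1) * step) for i in range(num_segments)]


baseline = uniform_segments(duration=80, num_segments=8)
```

Such a baseline ignores the video content entirely, which makes it a useful lower bound for any learned segmentation.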

Conclusion

The performance looks quite nice and the approach is interesting.

But is this the right path to human level understanding?

Well...

Go try it out yourself!¹

github.com/LuoweiZhou/Procedure-Segmentation-Networks

¹ Unfortunately not finished at time of presentation.
