Unsupervised Video Object Segmentation for Deep Reinforcement Learning

Machine Learning and Data Analytics Symposium, Doha, Qatar, April 1, 2019

Vikash Goel, Jameson Weng, Pascal Poupart

2

Pascal: RBC Borealis AI Research Director

• Research institute funded by RBC

• 5 research centers:

– Montreal, Toronto, Waterloo, Edmonton and Vancouver

• 80 researchers:

– Integrated (applied & fundamental) research model

• ML, RL, NLP, computer vision, private AI, knowledge graphs

• We are hiring!

3

Pascal: ML Professor at U of Waterloo

• Deep Learning

– Automated structure learning, sum-product networks, transfer learning

• Reinforcement learning

– Constrained RL, motion-oriented RL, sport analytics

• NLP

– Conversational agents, machine translation, automated proofreading

• Theory

– Convex relaxations of sum-product networks, characterization of local optima in mixture models, consistent approximate Bayesian techniques

4

Outline

• Background

– Reinforcement learning: data inefficiency

– Solution: self-supervised learning

• MOREL: Motion-Oriented REinforcement Learning

– Unsupervised object & motion recognition

– Faster policy optimization & interpretability

Reference: Goel, Weng, Poupart (2018) Unsupervised Video Object Segmentation for Deep Reinforcement Learning, NeurIPS.

5

Reinforcement Learning

Games, robotics, automated trading, autonomous driving, recommender systems, conversational agents, operations research, data center optimization

[Diagram: agent-environment loop. The agent sends an action to the environment; the environment returns an observation and a reward.]
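The agent-environment loop can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the `Corridor` environment, its `reset`/`step` interface, and `run_episode` are hypothetical names chosen for the example. The reward is sparse: the agent receives 1 only when it reaches the end of the corridor.

```python
class Corridor:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self):
        # Start a new episode at the left end; the observation is the position.
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor.
        self.pos = max(0, min(self.length, self.pos + action))
        self.t += 1
        reached = self.pos == self.length
        reward = 1.0 if reached else 0.0  # sparse reward: only at the goal
        done = reached or self.t >= self.max_steps
        return self.pos, reward, done


def run_episode(env, policy):
    """Run one episode: observe, act, receive a reward, repeat until done."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total
```

A policy that always moves right collects the reward; one that always moves left never sees it, which is the sparse-feedback situation that makes learning data hungry.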

6

Data Inefficiency

• Most RL successes: simulated environments

• Atari baselines: 40M frames (Schulman et al., 2017)

[Example environments: Atari, MuJoCo, VizDoom, computer Go]

7

Image-based RL

[Diagram: image → deep neural network → actions or values, trained from a sparse reward]

8

Self-supervised learning

• Auxiliary tasks and objectives

– Future observation/reward prediction

– Past observation prediction (inverse dynamics)

– Observation reconstruction (auto-encoder)

[Diagram: agent-environment loop with observation, reward and action]
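One common way to use an auxiliary task is to add its loss to the RL loss with a weight. The sketch below assumes next-observation prediction scored by mean squared error; the function names and the default `aux_weight` are illustrative choices, not values from the talk.

```python
import numpy as np


def next_frame_mse(predicted_next, actual_next):
    """Auxiliary loss (assumed form): mean squared error on the predicted next frame."""
    return float(np.mean((predicted_next - actual_next) ** 2))


def combined_loss(rl_loss, predicted_next, actual_next, aux_weight=0.5):
    """RL loss plus a weighted auxiliary prediction loss."""
    return rl_loss + aux_weight * next_frame_mse(predicted_next, actual_next)
```

The auxiliary term is available on every transition, so it supplies a learning signal even when the reward is zero.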

9

Image-based RL

• Deep RL:

[Diagram: image → deep neural network → actions or values, trained from a sparse reward]

• Self-supervised RL (auxiliary tasks):

[Diagram: image → deep neural network → next image, trained from a dense signal]

10

Prior knowledge

• What do you see?

– Humans: moving objects

– RL agent: sequence of pixels

[Screenshots: Seaquest, Space Invaders, Breakout]

11

Discovery of relevant features slows down learning

[Diagram: image → deep neural network → actions or values, trained from a sparse reward; the network must handle both feature extraction and policy optimization]

12

Faster Learning

Can we learn a policy that automatically segments moving objects and identifies relevant objects?

[Screenshots: Seaquest, Space Invaders, Breakout]

14

MOREL: Motion-Oriented RL

• Phase 1: unsupervised object segmentation

– Only 1% of the frames (random actions)

• Phase 2: faster policy optimization

– Based on object segmentation and motion

15

Motion Consistency

• Supervised segmentation: labor-intensive labeling

• Idea: leverage optical flow (structure from motion)
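The motion-consistency idea can be illustrated with a toy example: if the predicted flow is right, moving the pixels of frame 1 along it should reproduce frame 2, and the reconstruction error needs no labels. The sketch below assumes integer-valued flow and bright objects on a black background; `forward_warp` and `reconstruction_error` are hypothetical helper names, and the forward "splatting" scheme is a simplification chosen for brevity.

```python
import numpy as np


def forward_warp(frame, flow):
    """Move each pixel of `frame` by its integer flow vector (dy, dx).

    Pixels that leave the image are dropped; pixels landing on the same
    location are summed, which is adequate for objects on a black background.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dst_y = ys + flow[..., 0]
    dst_x = xs + flow[..., 1]
    valid = (dst_y >= 0) & (dst_y < h) & (dst_x >= 0) & (dst_x < w)
    out = np.zeros_like(frame)
    np.add.at(out, (dst_y[valid], dst_x[valid]), frame[valid])
    return out


def reconstruction_error(frame1, frame2, flow):
    """Mean absolute difference between the warped frame 1 and frame 2."""
    return float(np.abs(forward_warp(frame1, flow) - frame2).mean())


# A single bright pixel moves two columns to the right between frames.
frame1 = np.zeros((4, 5))
frame1[1, 1] = 1.0
frame2 = np.zeros((4, 5))
frame2[1, 3] = 1.0

true_flow = np.zeros((4, 5, 2), dtype=int)
true_flow[1, 1] = (0, 2)
no_flow = np.zeros((4, 5, 2), dtype=int)
```

With `true_flow` the warped frame matches `frame2` exactly, while `no_flow` leaves a nonzero error; that difference is the training signal.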

16

SfM-Net

Vijayanarasimhan, Ricco, Schmid, Sukthankar, Fragkiadaki, SfM-Net: Learning of Structure and Motion from Video, arXiv, 2017.

17

SfM-Net predictions (KITTI 2015)

18

Simplified 2D SfM-Net

• No skip connection

• Reconstruction loss: DSSIM (structural dissimilarity)

• Flow regularization: L1 loss

• Curriculum: gradually increase $\lambda_{reg}$ from 0 to 1

$L_{reconstruct} = \mathrm{DSSIM}$

$L_{reg} = \sum_k \lVert M^{(k)} \times t_k \rVert_1$

$L_{all} = L_{reconstruct} + \lambda_{reg} L_{reg}$
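The loss on this slide (a DSSIM reconstruction term plus an L1 flow term weighted by a curriculum coefficient) can be sketched as follows. Several details are assumptions made for the example: SSIM is computed over a single global window rather than local windows, DSSIM is taken as (1 - SSIM) / 2, the curriculum is a linear ramp, and masks and translations have shapes (K, H, W) and (K, 2). All function names are hypothetical.

```python
import numpy as np


def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM for images with values in [0, 1] (simplified)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def dssim(x, y):
    """Structural dissimilarity: 0 for identical images."""
    return (1.0 - ssim_global(x, y)) / 2.0


def flow_l1(masks, translations):
    """L1 norm of the per-object flow, summed over the K masks."""
    flow = masks[..., None] * translations[:, None, None, :]  # (K, H, W, 2)
    return float(np.abs(flow).sum())


def lambda_reg(step, ramp_steps):
    """Curriculum weight: linear ramp from 0 to 1 (the ramp shape is assumed)."""
    return min(1.0, step / ramp_steps)


def total_loss(frame_next, frame_pred, masks, translations, step, ramp_steps):
    """Reconstruction loss plus the curriculum-weighted flow regularizer."""
    reg = lambda_reg(step, ramp_steps) * flow_l1(masks, translations)
    return dssim(frame_next, frame_pred) + reg
```

Starting with a zero weight lets the network first learn to explain motion at all; the regularizer then gradually pushes it toward sparse flow.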

19

Simplified 2D SfM-Net

[Figure: results on Breakout and Pong. Columns: frame 1, frame 2, masks (summed), most salient mask, optical flow]

20

Unsupervised object segmentation

[Figure: results on Space Invaders, Beam Rider and Seaquest. Columns: frame 1, frame 2, masks (summed), most salient mask, optical flow]

21

MOREL: Motion-Oriented RL

Multi-objective: max rewards and min optical flow error

• Comparison with PPO: better on 25 games, similar on 25 games, worse on 9 games

• Comparison with A2C: better on 26 games, similar on 30 games, worse on 3 games

22

Videos

[Videos: Pong, Breakout, Seaquest, Beamrider]

23

Performance Curves

[Plots: episode rewards vs. frames for Breakout, Seaquest, Pong and Beamrider]

24

Ablation Study

[Plots: episode rewards vs. frames for Breakout, Seaquest and Beamrider]

25

Conclusion

• MOREL: Motion-Oriented REinforcement Learning

– Unsupervised object & motion recognition

– Faster policy optimization & interpretability

• Future work

– 3D environments, physics-based dynamics, object-oriented RL, model-based RL


26

RBC Borealis AI

• Graduating soon?

– Join RBC Borealis AI (https://www.borealisai.com)

– Email: [email protected]

• Research Institute

– Fundamental research (publications)

– Applied research (products)

• Topics

– RL: automated trading

– NLP: news filtering, information extraction, text generation

– Computer Vision: satellite-based house valuation

– Privacy: differential privacy

– Knowledge graphs: recommender systems

https://www.borealisai.com/
