Unsupervised Video Object Segmentation for Deep Reinforcement Learning
Machine Learning and Data Analytics Symposium, Doha, Qatar, April 1, 2019
Vikash Goel, Jameson Weng, Pascal Poupart
Pascal: RBC Borealis AI Research Director
• Research institute funded by RBC
• 5 research centers
– Montreal, Toronto, Waterloo, Edmonton and Vancouver
• 80 researchers
– Integrated (applied & fundamental) research model
• ML, RL, NLP, computer vision, private AI, knowledge graphs
• We are hiring!
Pascal: ML Professor at U of Waterloo
• Deep Learning
– Automated structure learning, sum-product networks, transfer learning
• Reinforcement learning
– Constrained RL, motion-oriented RL, sport analytics
• NLP
– Conversational agents, machine translation, automated proofreading
• Theory
– Convex relaxations of sum-product networks, characterization of local optima in mixture models, consistent approximate Bayesian techniques
Outline
• Background
– Reinforcement learning: data inefficiency
– Solution: self-supervised learning
• MOREL: Motion-Oriented REinforcement Learning
– Unsupervised object & motion recognition
– Faster policy optimization & interpretability
Reference: Goel, Weng, Poupart (2018) Unsupervised Video Object Segmentation for Deep Reinforcement Learning, NeurIPS.
Reinforcement Learning
Games, robotics, automated trading, autonomous driving, recommender systems, conversational agents, operations research, data center optimization
[Diagram: agent-environment loop. The agent sends an action to the environment, which returns an observation and a reward.]
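The agent-environment loop can be sketched in a few lines of Python. This is only an illustration: `CoinMatchEnv` and `run_episode` are hypothetical names, and the toy environment (the observation is a coin face, the reward is 1.0 when the action matches it) is an assumption chosen to keep the example self-contained.

```python
import random


class CoinMatchEnv:
    """Hypothetical toy environment: the observation is a coin face (0 or 1)
    and the reward is 1.0 when the action matches it, else 0.0."""

    def __init__(self, horizon=10, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward depends on the coin the agent just observed.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)  # next observation
        done = self.steps >= self.horizon
        return self.coin, reward, done


def run_episode(env, policy):
    """Run one episode: observe, act, collect reward, repeat until done."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```

A policy that copies the observation collects the full reward in this toy setting, for example `run_episode(CoinMatchEnv(horizon=10), lambda obs: obs)`.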
Data Inefficiency
• Most RL successes: simulated environments
• Atari baselines: 40M frames (Schulman et al., 2017)
[Images: Atari, MuJoCo, VizDoom, Computer Go]
Image-based RL
[Diagram: image → deep neural network → actions or values, trained from a sparse reward]
Self-supervised learning
• Auxiliary tasks and objectives
– Future observation/reward prediction
– Past observation prediction (inverse dynamics)
– Observation reconstruction (auto-encoder)
[Diagram: agent-environment loop with observation, reward and action]
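One common way to use an auxiliary objective is to add it to the RL loss with a weight. The sketch below assumes a next-observation prediction task scored by mean squared error. The function names and the default `aux_weight` are hypothetical and not taken from the talk.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def combined_loss(rl_loss, predicted_next_obs, actual_next_obs, aux_weight=0.5):
    """RL loss plus a weighted auxiliary loss (assumed form, for illustration).

    The auxiliary term gives a learning signal on every transition,
    even when the reward is zero.
    """
    aux_loss = mean_squared_error(predicted_next_obs, actual_next_obs)
    return rl_loss + aux_weight * aux_loss
```

With a perfect prediction the auxiliary term vanishes and only the RL loss remains.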
Image-based RL
• Deep RL: image → deep neural network → actions or values (sparse reward)
• Self-supervised RL (auxiliary tasks): image → deep neural network → next image (dense signal)
Prior knowledge
• What do you see?
– Humans: moving objects
– RL agent: sequence of pixels
[Images: Seaquest, Space Invaders, Breakout]
Discovery of relevant features slows down learning
[Diagram: image → deep neural network → actions or values, trained from a sparse reward; the network must handle both feature extraction and policy optimization]
Faster Learning
Can we learn a policy that automatically segments moving objects and identifies relevant objects?
[Images: Seaquest, Space Invaders, Breakout]
Outline
• Background
– Reinforcement learning: data inefficiency
– Solution: self-supervised learning
• MOREL: Motion-Oriented REinforcement Learning
– Unsupervised object & motion recognition
– Faster policy optimization & interpretability
MOREL: Motion-Oriented RL
• Phase 1: Unsupervised object segmentation
– Only 1% of the frames (random actions)
• Phase 2: Faster policy optimization
– Based on object segmentation and motion
Motion Consistency
• Supervised segmentation: labor-intensive labeling
• Idea: leverage optical flow (structure from motion)
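The idea of using optical flow as a free training signal can be illustrated as follows: if the predicted flow is right, warping frame 1 along it should reproduce frame 2, so the reconstruction error needs no labels. This is a minimal sketch under strong assumptions: integer flow, nearest-pixel lookup, and mean absolute error as a stand-in for the reconstruction loss. The names `warp` and `mean_abs_error` are hypothetical, and learned models typically use differentiable sampling instead.

```python
def warp(frame1, flow):
    """Rebuild frame 2 by pulling each pixel from frame 1 along the flow.

    flow[y][x] = (dy, dx) means the content at (y, x) in frame 2 came from
    (y - dy, x - dx) in frame 1. Out-of-range sources are clamped to the border.
    """
    height, width = len(frame1), len(frame1[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = flow[y][x]
            src_y = min(max(y - dy, 0), height - 1)
            src_x = min(max(x - dx, 0), width - 1)
            row.append(frame1[src_y][src_x])
        out.append(row)
    return out


def mean_abs_error(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# A single bright pixel moves one step to the right between the two frames.
frame1 = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
right_flow = [[(0, 1)] * 4 for _ in range(3)]  # everything moved right by 1
zero_flow = [[(0, 0)] * 4 for _ in range(3)]   # nothing moved
```

Here the correct flow (`right_flow`) reconstructs `frame2` exactly, while `zero_flow` leaves a nonzero error, which is the signal a network could be trained to reduce.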
SfM-Net
Vijayanarasimhan, Ricco, Schmid, Sukthankar, Fragkiadaki, SfM-Net: Learning of Structure and Motion from Video, arXiv, 2017.
SfM-Net predictions (KITTI 2015)
Simplified 2D SfM-Net
• No skip connection
• Reconstruction loss: DSSIM (structural dissimilarity)
• Flow regularization: L1 loss
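• Curriculum: gradually increase $\lambda_{reg}$ from 0 to 1
$\mathcal{L}_{reconstruct} = \mathrm{DSSIM}$
$\mathcal{L}_{reg} = \sum_k \| M^{(k)} \times t^k \|_1$
$\mathcal{L}_{all} = \mathcal{L}_{reconstruct} + \lambda_{reg} \mathcal{L}_{reg}$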
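The loss on this slide combines a DSSIM reconstruction term with an L1 flow regularizer whose weight follows a curriculum. The sketch below shows one way to compute it, with several assumptions that are not from the talk: SSIM is computed over a single global window with the usual constants (0.01 and 0.03 times the dynamic range, squared), M^(k) is taken to be the k-th object mask and t^k its 2D translation, and the curriculum is a linear ramp. All function names are hypothetical.

```python
def dssim(x, y, dynamic_range=1.0):
    """Structural dissimilarity (1 - SSIM) / 2 over one global window.

    x and y are flat lists of pixel intensities of equal length.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # standard SSIM stabilizers
    c2 = (0.03 * dynamic_range) ** 2
    ssim = ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return (1.0 - ssim) / 2.0


def flow_l1(masks, translations):
    """Sum over objects k of the L1 norm of mask_k times translation_k."""
    total = 0.0
    for mask, (tx, ty) in zip(masks, translations):
        for m in mask:
            total += abs(m * tx) + abs(m * ty)
    return total


def lambda_reg(step, ramp_steps):
    """Assumed linear curriculum: weight grows from 0 to 1 over ramp_steps."""
    return min(1.0, max(0.0, step / ramp_steps))


def total_loss(predicted, target, masks, translations, step, ramp_steps):
    """Reconstruction loss plus the curriculum-weighted flow regularizer."""
    return dssim(predicted, target) + lambda_reg(step, ramp_steps) * flow_l1(
        masks, translations
    )
```

Starting the regularization weight at 0 lets the reconstruction term shape the flow first; the L1 penalty is then phased in as training progresses.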
Simplified 2D SfM-Net
[Figure: results on Breakout and Pong. Columns: Frame 1, Frame 2, Masks (summed), Most salient mask, Optical flow]
Unsupervised object segmentation
[Figure: results on Space Invaders, Beam Rider and Seaquest. Columns: Frame 1, Frame 2, Masks (summed), Most salient mask, Optical flow]
MOREL: Motion-Oriented RL
Multi-objective: max rewards and min optical flow error
• Comparison with PPO
– Better: 25 games
– Similar: 25 games
– Worse: 9 games
• Comparison with A2C
– Better: 26 games
– Similar: 30 games
– Worse: 3 games
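As a quick arithmetic check on the comparison counts, the short sketch below totals each comparison and converts the counts to percentages. The counts are the ones reported on this slide; the `results` dictionary and the `summarize` helper are hypothetical names introduced for illustration.

```python
# Game counts as reported on the slide (MOREL relative to each baseline).
results = {
    "PPO": {"better": 25, "similar": 25, "worse": 9},
    "A2C": {"better": 26, "similar": 30, "worse": 3},
}


def summarize(counts):
    """Return the number of games and each outcome's share in percent."""
    total = sum(counts.values())
    shares = {k: round(100 * v / total, 1) for k, v in counts.items()}
    return total, shares


for baseline, counts in results.items():
    total, shares = summarize(counts)
    print(baseline, total, shares)
```

Both comparisons cover the same number of games (59), so the two rows can be read side by side.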
Videos
[Videos: Pong, Breakout, Seaquest, Beamrider]
Performance Curves
[Figure: episode rewards vs. frames. Panels: Breakout, Seaquest, Pong, Beamrider]
Ablation Study
[Figure: episode rewards vs. frames. Panels: Breakout, Seaquest, Beamrider]
Conclusion
• MOREL: Motion-Oriented REinforcement Learning
– Unsupervised object & motion recognition
– Faster policy optimization & interpretability
• Future work
– 3D environments, physics-based dynamics, object-oriented RL, model-based RL
RBC Borealis AI
• Graduating soon?
– Join RBC Borealis AI (https://www.borealisai.com)
– Email: [email protected]
• Research Institute
– Fundamental research (publications)
– Applied research (products)
• Topics
– RL: automated trading
– NLP: news filtering, information extraction, text generation
– Computer Vision: satellite-based house valuation
– Privacy: differential privacy
– Knowledge graphs: recommender systems