Modern Artificial Intelligence via Deep Learning
S. M. Ali Eslami
October 2016
Algorithms
[Figure: Input → Algorithm → Output, the algorithm running on a programmable computer]
Introduction
Artificial Intelligence / Machine Learning
[Figure: Input → Algorithm → Output on a programmable computer; e.g. an image of a horse mapped to the answer "Horse"]
Introduction
An Analogy
Immediate Usefulness
General Applicability
Deep Supervised Learning
[Figure: the classical pipeline. Input image → Preprocessing → Feature Extraction → Feature Selection → Learned Discrimination → Calibration → Output ("Horse" / "Cow"); only the discrimination stage of the algorithm is learned]
[Figure: the deep learning pipeline. Input image → Stage 1 → Stage 2 → Stage 3 → Stage 4 → Stage 5 → Output ("Horse" / "Cow"); every stage of the algorithm is learned]
Introduction
Convolutional Neural Networks
[Figures: example convolutional network architectures and results: Torch (2015); Krizhevsky et al. (2012); Clarifai (2014)]
● Optimize directly for the end loss
● End-to-end training, no engineered inputs
● With enough data, learn a big non-linear function
● Supervised labeling is often enough for transferable representations
● Large labeled dataset + big / deep neural network + GPUs (see the sketch below)
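To ground these bullets, here is a minimal sketch of end-to-end supervised training. The framework (PyTorch), architecture and data are my own placeholder choices, not the talk's:

```python
# A minimal sketch of end-to-end supervised training of a small CNN:
# raw pixels in, class scores out, no hand-engineered feature stages.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),  # assumes 32x32 inputs, 10 classes
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()  # optimize directly for the end loss

def train_step(images, labels):
    """One SGD step on a minibatch of (image, label) pairs."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy minibatch, just to show the shapes involved.
loss = train_step(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
```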
Deep Supervised Learning
[Figures: Text Classification, Zhang et al. (2015); Video Classification, Simonyan et al. (2014)]
● Innovation continues
○ Inception (Szegedy et al., 2015)
○ Residual connections (He et al., 2015)
○ Batchnorm (Ioffe et al., 2015)
● Performance is continuously improving
[Figure: Inception architecture, Szegedy et al. (2015)]
Beyond Supervised Learning
● Sequence Modelling
● Unsupervised Learning
● Generative Modelling
● Probabilistic Modelling
● ...
Hochreiter et al. (1997); Vinyals et al. (2015); Kavukcuoglu et al. (2009); Hinton et al. (2006); Larochelle et al. (2011); Rasmus et al. (2015)
Where does the data come from?
Reinforcement Learning
● Supervised / unsupervised learning is important
○ Gives us tractable targets
○ Helps model development
○ Sometimes best to do algorithmic search when gradients are not noisy
○ However, large labelled datasets are not enough
● Real AI requires agents that
○ interpret their environments
○ act in their environments to gather data
○ control themselves and their environments
○ form representations that generalize
○ learn end-to-end with minimal engineering
Agents
Reinforcement Learning
Architecture
[Figure: the agent-environment loop: the agent receives observations from the environment and sends back actions]
Powered by Neural Networks
[Figures: RL powered by neural networks: Tesauro (1989); Lange et al. (2012); Levine et al. (2015); Schulman et al. (2015)]
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis (2015)
Human-level control in ATARI
100+ classic 8-bit Atari games
● Observations: raw video (~30k dimensional)
● Actions: 18 buttons, but the agent is not told what they do
● Goal: simply to maximize score
● Designed to be challenging and interesting for humans
● Widely adopted benchmark for evaluation (Bellemare et al., 2013)
● Provides a rich visual domain
● Many different games emphasize control, strategy, planning, etc.
ATARI agents
● Maximise total future reward: Rt = rt + γ rt+1 + γ² rt+2 + …
● For a policy π, the action-value function Q is: Qπ(s,a) = E[ Rt | st = s, at = a, π ]
● Qπ(s,a) measures how good action a is in state s
○ Greedy: follow the max
○ ε-greedy: follow the max with probability (1 − ε), act randomly otherwise (see the sketch below)
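As a tiny illustration (my own sketch, not from the slides), ε-greedy action selection over a vector of Q-values:

```python
# A minimal sketch of epsilon-greedy action selection over Q-values.
import random
import torch

def select_action(q_values: torch.Tensor, epsilon: float) -> int:
    """Follow argmax_a Q(s, a) with probability (1 - epsilon); otherwise act randomly."""
    if random.random() < epsilon:
        return random.randrange(q_values.numel())
    return int(q_values.argmax())
```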
Human-level control in ATARI
The action-value function
● Maximizing Qπ(s,a) over possible policies gives the optimal action-value function Q*(s,a), which satisfies the Bellman equation:
Q*(s,a) = E_s′[ r + γ max_a′ Q*(s′,a′) | s, a ]
● Basic idea:
○ Approximate Q*(s,a) with a deep network Q(s,a;Θ)
○ Apply the Bellman equation as an iterative update:
Q_i+1(s,a) = E[ r + γ max_a′ Q_i(s′,a′) | s, a ]
Human-level control in ATARI
Value iteration
Human-level control in ATARI
End-to-end reinforcement learning
Mnih et al. (2015)
● We need a loss function to minimize:
Li(Θi) = E[ ( r + γ max_a′ Q(s′,a′; Θi⁻) − Q(s,a; Θi) )² ]
● So now we can do our good old SGD update:
Θi+1 = Θi − α ∇Θi Li(Θi)
Human-level control in ATARI
Value iteration
● Experiences in a sequence are correlated
○ Do not do online updates; store transitions in a replay memory
○ Sample from the experience replay memory to apply Q-updates
● Targets cannot depend on the same Θi, so introduce a target network (both ideas appear in the sketch below)
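Putting the loss, replay memory and target network together, a minimal PyTorch sketch of the DQN-style update; the network shape, hyperparameters and storage format are illustrative assumptions, not the paper's exact configuration:

```python
# A minimal sketch of the DQN update: sample decorrelated transitions from
# replay memory and regress Q towards targets from a frozen target network.
import random
from collections import deque

import torch
import torch.nn as nn

n_actions = 18
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # Θ⁻ := Θ, refreshed periodically

optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
# Stores (s, a, r, s', done) tuples of tensors: float, long, float, float, float.
replay = deque(maxlen=100_000)
gamma = 0.99

def q_update(batch_size=32):
    """One Q-update on a minibatch sampled from the replay memory."""
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():  # targets do not depend on the current Θi
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a; Θi)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```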
Human-level control in ATARI
Value iteration
Human-level control in ATARI
Deep Q Networks
Human-level control in ATARI
Evaluation
1 epoch = 50,000 interactions = 30 minutes of experience
Total experience: 10M interactions = 5 days
Human-level control in ATARI
Data ‘efficiency’
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu (2016)
Reinforcement Learning
● DQN is very robust, but computationally expensive
○ About a week to train on a single GPU
● Off-policy Q-learning
○ We would like a robust system for both on-policy and off-policy methods
● Discrete action space
○ We want to be able to use the same method on continuous action spaces too
Asynchronous RL
Reinforcement Learning
● Asynchronous training of RL agents
● Parallel actor-learners implemented using CPU threads
● No replay? Parallel actor-learners have a similar stabilizing effect
● Choice of RL algorithm
○ on- or off-policy
○ value- or policy-based
Asynchronous RL
Reinforcement Learning
● Parallel actor-learners compute online 1-step updates
● Gradients are accumulated over a minibatch before each update
1-step Q-learning
Asynchronous RL
● Q-learning with a uniform mixture of backups of length 1 through N
● Variation of Peng and Williams (1995)
n-step Q-learning
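For concreteness, the n-step target has the standard form (a reconstruction, not transcribed from the slide); the backup follows n real rewards before bootstrapping from Q:

```latex
% n-step Q-learning target with discount gamma and target parameters theta^-
y_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k r_{t+k} \;+\; \gamma^n \max_{a'} Q\left(s_{t+n}, a'; \theta^-\right)
```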
Asynchronous RL
● The agent learns a policy and a state value function
● Policy gradient multiplied by an estimate of the advantage
● Similar to Generalized Advantage Estimation (Schulman et al., 2015)
Asynchronous Advantage Actor-Critic (A3C)
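A minimal sketch of the advantage actor-critic update for a single actor-learner, in PyTorch; the architecture, coefficients and names are illustrative assumptions, and the real A3C applies such updates asynchronously from many CPU threads:

```python
# Actor-critic update from one n-step rollout: policy gradient weighted by
# an advantage estimate, a value-regression loss, and an entropy bonus.
import torch
import torch.nn as nn

n_actions = 4
trunk = nn.Sequential(nn.Linear(4, 64), nn.ReLU())
policy_head = nn.Linear(64, n_actions)  # logits of the policy pi(a|s)
value_head = nn.Linear(64, 1)           # state value V(s)
params = (list(trunk.parameters()) + list(policy_head.parameters())
          + list(value_head.parameters()))
optimizer = torch.optim.RMSprop(params, lr=7e-4)

def a3c_update(states, actions, returns, beta=0.01):
    """states: (T, 4); actions: (T,) long; returns: (T,) n-step returns."""
    h = trunk(states)
    dist = torch.distributions.Categorical(logits=policy_head(h))
    values = value_head(h).squeeze(1)
    advantage = returns - values                      # A(s, a) ~ R - V(s)
    policy_loss = -(dist.log_prob(actions) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()              # fit V(s) to the return
    loss = policy_loss + 0.5 * value_loss - beta * dist.entropy().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```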
Asynchronous RL
[Figures: learning curves for asynchronous 1-step Q-learning and A3C]
Recap
● Lightweight framework for asynchronous reinforcement learning
○ Stable training with a variety of standard RL algorithms
○ State-of-the-art results on a range of domains in hours on a single machine
● Async advantage actor-critic excels on:
○ Both discrete and continuous actions
○ Feedforward and recurrent agents
○ 2D and 3D games
Model-based Methods
Model-based methods
Three learning paradigms
[Figure: Supervised Learning maps an image x to a label y, e.g. "horse"; Reinforcement Learning maps a state z to an action a, e.g. "left"; Generative Modelling posits a model from a latent cause z to the image x, and inverting it yields representations such as the vector (2.3, -1, 0.5, 3) or the description "not blinking"]
The Shape Boltzmann Machine
S. M. Ali Eslami, Nicolas Heess, John Winn (2011)
Model-based methods
The Shape Boltzmann Machine
[Figures: Shape Boltzmann Machine architecture, samples and shape completions, Eslami et al. (2012)]
Modern Variational Inference
[Figure: a latent-variable model: latent z generates the image x; inference goes from x back to z]
● Approximate p(z|x) using q(z|x)
● Parameterise q(z|x) by a deep network
● Parameterise p(x|z) by a deep network
● Minimise KL[ q(z|x) ‖ p(z|x) ] via SGD (see the sketch below)
● Samples from q(z|x) can be used as codes representing the image x
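The recipe above is, in essence, the variational autoencoder training loop. A minimal PyTorch sketch of one step, with illustrative sizes; minimising KL[q(z|x) ‖ p(z|x)] up to a constant is equivalent to maximising the ELBO, E_q[log p(x|z)] − KL[q(z|x) ‖ p(z)]:

```python
# One step of amortised variational inference (VAE-style) via SGD.
import torch
import torch.nn as nn

d_x, d_z = 784, 20
encoder = nn.Sequential(nn.Linear(d_x, 400), nn.ReLU(), nn.Linear(400, 2 * d_z))  # q(z|x)
decoder = nn.Sequential(nn.Linear(d_z, 400), nn.ReLU(), nn.Linear(400, d_x))      # p(x|z)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def elbo_step(x):
    """Maximise the ELBO on a minibatch x of binary images in [0, 1]."""
    mu, logvar = encoder(x).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
    recon = nn.functional.binary_cross_entropy_with_logits(
        decoder(z), x, reduction="sum")                    # -E_q[log p(x|z)]
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()  # KL[q(z|x) || p(z)]
    loss = (recon + kl) / x.size(0)                        # negative ELBO
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```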
Recurrent Neural Networks for Image Generation
Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra (2015)
Model-based methods
Recurrent Neural Networks for Image Generation
[Figure: one step of the DRAW network, Gregor et al. (2015): an encoding/inference pass reads from the image x and infers z; a decoding/generation pass writes to the canvas c, defining p(x|c). Read and Write operations connect x, z and the next canvas ct+1]
Model-based methods
Recurrent Neural Networks for Image Generation
[Figure: the DRAW network unrolled over time, Gregor et al. (2015): at each step the network reads from x, infers z, and writes an update to the canvas ct; after T steps the image is generated from p(x|cT)]
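To make the read/write loop concrete, a heavily simplified PyTorch sketch of the iterative canvas; it omits the attention mechanisms and the sampling of z, both central to the real DRAW model, and all names and sizes are my assumptions:

```python
# A crude sketch of a DRAW-style loop: read, infer, write, repeat.
import torch
import torch.nn as nn

d_x, d_z, T = 784, 10, 8
encoder = nn.GRUCell(2 * d_x, 128)  # consumes [x, error image]
to_z = nn.Linear(128, d_z)          # code for this step (sampling omitted)
decoder = nn.GRUCell(d_z, 128)
write = nn.Linear(128, d_x)         # additive update to the canvas

def draw_forward(x):
    canvas = torch.zeros_like(x)
    h_enc = torch.zeros(x.size(0), 128)
    h_dec = torch.zeros(x.size(0), 128)
    for _ in range(T):
        error = x - torch.sigmoid(canvas)                 # what is still unexplained
        h_enc = encoder(torch.cat([x, error], 1), h_enc)  # "read" / inference
        z = to_z(h_enc)
        h_dec = decoder(z, h_dec)
        canvas = canvas + write(h_dec)                    # "write" / generation
    return torch.sigmoid(canvas)                          # p(x | cT)

out = draw_forward(torch.rand(4, d_x))
```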
Model-based methods
Recurrent Neural Networks for Image Generation
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey Hinton (2016)
Model-based methods
Attend, Infer, Repeat
[Figure: images and their latent causes. A single image x of one object can be explained by one latent z ("blue brick"), but encoding a whole pile of bricks into one monolithic z is not sufficient for grasping, counting, transfer, or generalisation. Structuring the causes as per-object latents z1, z2 ("blue brick", "red brick"), each factored into z-what and z-where with attention windows att-y1, att-y2, recovers relations such as blue brick above, red brick below]
[Figure: the AIR generative model and inference network. The model decodes a variable-length set of latents z1, z2, z3 through a decoder into the image x. A recurrent inference network with hidden states h1, h2, h3 emits, at each step i, z-pres (is there another object?), z-what (its appearance, inferred VAE-style from the attended patch x-att / y-att) and z-where (its pose). The focus is on representation, not reconstruction; the output is a set, which raises the questions of order and count]
Attend, Infer, Repeat
Key Ideas
1. Build in structure; get out meaning
2. Inference networks that are
a. recurrent
b. variable-length
c. attentive
3. End-to-end learning through
a. discrete and continuous variables
b. inference and model networks
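A toy PyTorch sketch of idea 2, the recurrent, variable-length inference network; this is my own simplification (no attention, and no learning through the discrete z_pres decision), with assumed names and sizes:

```python
# AIR-style variable-length inference: a recurrent network emits
# (z_pres, z_what, z_where) per step and stops when z_pres = 0.
import torch
import torch.nn as nn

d_x, d_h, d_what, d_where = 784, 256, 20, 3
rnn = nn.GRUCell(d_x, d_h)
pres_head = nn.Linear(d_h, 1)         # Bernoulli logit: another object?
what_head = nn.Linear(d_h, d_what)    # appearance code
where_head = nn.Linear(d_h, d_where)  # pose (scale, shift)

def infer(x, max_steps=3):
    h = torch.zeros(x.size(0), d_h)
    objects = []
    for _ in range(max_steps):
        h = rnn(x, h)
        z_pres = torch.bernoulli(torch.sigmoid(pres_head(h)))
        if z_pres.sum() == 0:  # no more objects anywhere in the batch
            break
        objects.append((z_pres, what_head(h), where_head(h)))
    return objects             # a set of (presence, what, where) triples

zs = infer(torch.rand(2, d_x))
```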
Attend, Infer, Repeat
Omniglot
Attend, Infer, Repeat
Representational Power
[Figure: downstream tasks on AIR's inferred representations: given images of digits such as 6 and 9, answer "Sum?" or "Increasing order?" with yes / no]
Attend, Infer, Repeat
Additional Structure
[Figure: two choices of latent code z for an image x. Learned: a distributed vector that merely correlates with "blue brick". Specified: interpretable slots such as class=brick, colour=blue, position=P, rotation=R. In the specified case the decoder's inputs z1, z2, z3 carry fixed meanings]
Attend, Infer, Repeat
Inverse Graphics
Attend, Infer, Repeat
Policy learning
[Figure: the table-top MNIST policy-learning task]
Towards Deep Symbolic Reinforcement Learning
Marta Garnelo, Kai Arulkumaran, Murray Shanahan (2016)
Unsupervised Learning of 3D Structure from Images
Danilo Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess (2016)
Model-based methods
Unsupervised Learning of 3D Structure from Images
Unsupervised Learning of 3D Structure from Images
Inferring object meshes
Unsupervised Learning of 3D Structure from Images
Class-conditional samples
Unsupervised Learning of 3D Structure from Images
3D structure from multiple 2D images
Pixel Recurrent Neural Networks
Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu (2016)
Generative Modelling
Goal: Learn a generative model of natural images
Research landscape:
● Latent variable models (VAEs, DRAW)
● Adversarial (GANs)
● Fully visible (NADE, MADE, RIDE)
PixelRNN: Fully visible, probabilistic, tractable, density estimator
Pixel Recurrent Neural Networks
Pixel Recurrent Neural Networks
Model
● Fully visible
● Similar to language models with RNNs
● Model pixels with Softmax
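The "fully visible" modelling choice means the joint density over pixels factorizes by the chain rule, in raster-scan order, with each conditional a softmax over discrete pixel values:

```latex
% Joint density of an n x n image as a product of per-pixel conditionals
p(\mathbf{x}) = \prod_{i=1}^{n^2} p\left(x_i \mid x_1, \ldots, x_{i-1}\right)
```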
Pixel Recurrent Neural Networks
Masked Convolutions
[Figures: masks applied spatially (each pixel depends only on pixels above and to its left) and across colour channels (R, then G, then B)]
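A minimal PyTorch sketch of the spatial mask; the class name and sizes are assumptions, and the colour-channel masking of the real model is omitted:

```python
# A spatially masked convolution: the kernel is zeroed at (mask 'A') or
# after (mask 'B') the centre pixel, enforcing raster-scan causality.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Each output pixel sees only pixels above it and to its left."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + (mask_type == "B"):] = 0  # centre row, at/after centre
        mask[kh // 2 + 1:, :] = 0                         # all rows below centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask  # re-apply the causal mask every call
        return super().forward(x)

# Mask 'A' (first layer) also hides the current pixel; 'B' layers may see it.
layer = MaskedConv2d("A", in_channels=1, out_channels=16, kernel_size=7, padding=3)
out = layer(torch.rand(1, 1, 28, 28))  # spatial shape preserved: (1, 16, 28, 28)
```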
Pixel Recurrent Neural Networks
Binary MNIST
Pixel Recurrent Neural Networks
CIFAR-10
Pixel Recurrent Neural Networks
CIFAR-10
[Figure: occluded CIFAR-10 images, model completions, and originals]
[Figure: class-conditional samples labelled Elephant, Sandbar, Coral Reef, Horse, Lhasa Apso (Dog), Brown Bear, Lawn Mower, Robin (Bird), Geyser, White Whale, Hartebeest, Tiger, Alp]
Matching Networks for One Shot Learning
Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra (2016)
Core Machine Learning
Matching Nets
[Figure: a support set of examples (x1, y1), (x2, y2), (x3, y3) and a query q are embedded into z1, z2, z3 and zq; attention weights a1, a2, a3 over the support embeddings mix the labels y1, y2, y3 into the prediction ŷ. The attention mechanism, shown in red, in effect implements a same-class-or-not network]
The idea is useful because it allows us to construct a classifier on the fly, without any further training.
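A minimal PyTorch sketch of the matching readout: embed the support set and the query, attend with a softmax over cosine similarities (the kernel used in the paper), and mix the support labels; the embedding network itself is an illustrative placeholder:

```python
# Matching-networks readout: a classifier constructed on the fly from a
# labelled support set, with no gradient steps at test time.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 32))

def matching_predict(support_x, support_y, query_x):
    """support_y is one-hot, shape (k, n_classes); returns (m, n_classes)."""
    z = embed(support_x)   # (k, d) support embeddings
    zq = embed(query_x)    # (m, d) query embeddings
    sims = nn.functional.cosine_similarity(
        zq.unsqueeze(1), z.unsqueeze(0), dim=-1)  # (m, k) similarities
    attn = sims.softmax(dim=-1)                   # attention weights a(q, x_i)
    return attn @ support_y                       # mix the support labels

# 3-way, one-shot example with random data.
sx = torch.randn(3, 784)
sy = torch.eye(3)
probs = matching_predict(sx, sy, torch.randn(2, 784))
```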
Recap
● Machine Learning
● Deep Supervised Learning
● Deep Reinforcement Learning
● Model-based Methods
● Deep Variational Inference
● Structured / Unstructured Generative Models
● Matching Networks