Modern Artificial Intelligence via Deep Learning
S. M. Ali Eslami
October 2016
Algorithms
[Figure: Input → Algorithm → Output, the algorithm running on a programmable computer]
Introduction
Artificial Intelligence / Machine Learning
[Figure: Input → Algorithm → Output on a programmable computer; e.g. an image of a horse mapped to the answer "Horse"]
Introduction
An Analogy
Immediate Usefulness
General Applicability
Deep Supervised Learning
[Figure: the classical pipeline. Input image → Preprocessing → Feature Extraction → Feature Selection → Learned Discrimination → Calibration → Output ("Horse" / "Cow"); only the discrimination stage of the algorithm is learned]
[Figure: the deep learning pipeline. Input image → Stage 1 → Stage 2 → Stage 3 → Stage 4 → Stage 5 → Output ("Horse" / "Cow"); every stage of the algorithm is learned]
Introduction
Convolutional Neural Networks
[Figures: example convolutional network architectures and results: Torch (2015); Krizhevsky et al. (2012); Clarifai (2014)]
● Optimize directly for the end loss
● End-to-end training, no engineered inputs
● With enough data, learn a big non-linear function
● Supervised labeling is often enough for transferable representations
● Large labeled dataset + big / deep neural network + GPUs (see the sketch below)
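To ground these bullets, here is a minimal sketch of end-to-end supervised training. The framework (PyTorch), architecture and data are my own placeholder choices, not the talk's:

```python
# A minimal sketch of end-to-end supervised training of a small CNN:
# raw pixels in, class scores out, no hand-engineered feature stages.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),  # assumes 32x32 inputs, 10 classes
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()  # optimize directly for the end loss

def train_step(images, labels):
    """One SGD step on a minibatch of (image, label) pairs."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy minibatch, just to show the shapes involved.
loss = train_step(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
```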
Deep Supervised Learning
[Figures: Text Classification, Zhang et al. (2015); Video Classification, Simonyan et al. (2014)]
● Innovation continues
○ Inception (Szegedy et al., 2015)
○ Residual connections (He et al., 2015)
○ Batchnorm (Ioffe et al., 2015)
● Performance is continuously improving
[Figure: Inception architecture, Szegedy et al. (2015)]
Beyond Supervised Learning
● Sequence Modelling
● Unsupervised Learning
● Generative Modelling
● Probabilistic Modelling
● ...
Hochreiter et al. (1997); Vinyals et al. (2015); Kavukcuoglu et al. (2009); Hinton et al. (2006); Larochelle et al. (2011); Rasmus et al. (2015)
Where does the data come from?
Reinforcement Learning
● Supervised / unsupervised learning is important
○ Gives us tractable targets
○ Helps model development
○ Sometimes best to do algorithmic search when gradients are not noisy
○ However, large labelled datasets are not enough
● Real AI requires agents that
○ interpret their environments
○ act in their environments to gather data
○ control themselves and their environments
○ form representations that generalize
○ learn end-to-end with minimal engineering
Agents
Reinforcement Learning
Architecture
[Figure: the agent-environment loop: the agent receives observations from the environment and sends back actions]
Powered by Neural Networks
[Figures: RL powered by neural networks: Tesauro (1989); Lange et al. (2012); Levine et al. (2015); Schulman et al. (2015)]
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis (2015)
Human-level control in ATARI
100+ classic 8-bit Atari games
● Observations: raw video (~30k dimensional)
● Actions: 18 buttons, but the agent is not told what they do
● Goal: simply to maximize score
● Designed to be challenging and interesting for humans
● Widely adopted benchmark for evaluation (Bellemare et al., 2013)
● Provides a rich visual domain
● Many different games emphasize control, strategy, planning, etc.
ATARI agents
● Maximise total future reward: Rt = rt + γ rt+1 + γ² rt+2 + …
● For a policy π, the action-value function Q is: Qπ(s,a) = E[ Rt | st = s, at = a, π ]
● Qπ(s,a) measures how good action a is in state s
○ Greedy: follow the max
○ ε-greedy: follow the max with probability (1 − ε), act randomly otherwise (see the sketch below)
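As a tiny illustration (my own sketch, not from the slides), ε-greedy action selection over a vector of Q-values:

```python
# A minimal sketch of epsilon-greedy action selection over Q-values.
import random
import torch

def select_action(q_values: torch.Tensor, epsilon: float) -> int:
    """Follow argmax_a Q(s, a) with probability (1 - epsilon); otherwise act randomly."""
    if random.random() < epsilon:
        return random.randrange(q_values.numel())
    return int(q_values.argmax())
```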
Human-level control in ATARI
The action-value function
● Maximizing Qπ(s,a) over possible policies gives the optimal action-value function Q*(s,a), which satisfies the Bellman equation:
Q*(s,a) = E_s′[ r + γ max_a′ Q*(s′,a′) | s, a ]
● Basic idea:
○ Approximate Q*(s,a) with a deep network Q(s,a;Θ)
○ Apply the Bellman equation as an iterative update:
Q_i+1(s,a) = E[ r + γ max_a′ Q_i(s′,a′) | s, a ]
Human-level control in ATARI
Value iteration
Human-level control in ATARI
End-to-end reinforcement learning
Mnih et al. (2015)
● We need a loss function to minimize:
Li(Θi) = E[ ( r + γ max_a′ Q(s′,a′; Θi⁻) − Q(s,a; Θi) )² ]
● So now we can do our good old SGD update:
Θi+1 = Θi − α ∇Θi Li(Θi)
Human-level control in ATARI
Value iteration
● Experiences in a sequence are correlated
○ Do not do online updates; store transitions in a replay memory
○ Sample from the experience replay memory to apply Q-updates
● Targets cannot depend on the same Θi, so introduce a target network (both ideas appear in the sketch below)
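Putting the loss, replay memory and target network together, a minimal PyTorch sketch of the DQN-style update; the network shape, hyperparameters and storage format are illustrative assumptions, not the paper's exact configuration:

```python
# A minimal sketch of the DQN update: sample decorrelated transitions from
# replay memory and regress Q towards targets from a frozen target network.
import random
from collections import deque

import torch
import torch.nn as nn

n_actions = 18
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # Θ⁻ := Θ, refreshed periodically

optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
# Stores (s, a, r, s', done) tuples of tensors: float, long, float, float, float.
replay = deque(maxlen=100_000)
gamma = 0.99

def q_update(batch_size=32):
    """One Q-update on a minibatch sampled from the replay memory."""
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():  # targets do not depend on the current Θi
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a; Θi)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```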
Human-level control in ATARI
Value iteration
Human-level control in ATARI
Deep Q Networks
Human-level control in ATARI
Evaluation
1 epoch = 50,000 interactions = 30 minutes of experience
Total experience: 10M interactions = 5 days
Human-level control in ATARI
Data ‘efficiency’
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu (2016)
Reinforcement Learning
● DQN is very robust, but computationally expensive
○ About a week to train on a single GPU
● Off-policy Q-learning
○ We would like a robust system for both on-policy and off-policy methods
● Discrete action space
○ We want to be able to use the same method on continuous action spaces too
Asynchronous RL
Reinforcement Learning
● Asynchronous training of RL agents
● Parallel actor-learners implemented using CPU threads
● No replay? Parallel actor-learners have a similar stabilizing effect
● Choice of RL algorithm
○ on- or off-policy
○ value- or policy-based
Asynchronous RL
Reinforcement Learning
● Parallel actor-learners compute online 1-step updates
● Gradients are accumulated over a minibatch before each update
1-step Q-learning
Asynchronous RL
● Q-learning with a uniform mixture of backups of length 1 through N
● Variation of Peng and Williams (1995)
n-step Q-learning
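For concreteness, the n-step target has the standard form (a reconstruction, not transcribed from the slide); the backup follows n real rewards before bootstrapping from Q:

```latex
% n-step Q-learning target with discount gamma and target parameters theta^-
y_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k r_{t+k} \;+\; \gamma^n \max_{a'} Q\left(s_{t+n}, a'; \theta^-\right)
```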
Asynchronous RL
● The agent learns a policy and a state value function
● Policy gradient multiplied by an estimate of the advantage
● Similar to Generalized Advantage Estimation (Schulman et al., 2015)
Asynchronous Advantage Actor-Critic (A3C)
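A minimal sketch of the advantage actor-critic update for a single actor-learner, in PyTorch; the architecture, coefficients and names are illustrative assumptions, and the real A3C applies such updates asynchronously from many CPU threads:

```python
# Actor-critic update from one n-step rollout: policy gradient weighted by
# an advantage estimate, a value-regression loss, and an entropy bonus.
import torch
import torch.nn as nn

n_actions = 4
trunk = nn.Sequential(nn.Linear(4, 64), nn.ReLU())
policy_head = nn.Linear(64, n_actions)  # logits of the policy pi(a|s)
value_head = nn.Linear(64, 1)           # state value V(s)
params = (list(trunk.parameters()) + list(policy_head.parameters())
          + list(value_head.parameters()))
optimizer = torch.optim.RMSprop(params, lr=7e-4)

def a3c_update(states, actions, returns, beta=0.01):
    """states: (T, 4); actions: (T,) long; returns: (T,) n-step returns."""
    h = trunk(states)
    dist = torch.distributions.Categorical(logits=policy_head(h))
    values = value_head(h).squeeze(1)
    advantage = returns - values                      # A(s, a) ~ R - V(s)
    policy_loss = -(dist.log_prob(actions) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()              # fit V(s) to the return
    loss = policy_loss + 0.5 * value_loss - beta * dist.entropy().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```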
Asynchronous RL
[Figures: learning curves for asynchronous 1-step Q-learning and A3C]
Recap
● Lightweight framework for asynchronous reinforcement learning
○ Stable training with a variety of standard RL algorithms
○ State-of-the-art results on a range of domains in hours on a single machine
● Async advantage actor-critic excels on:
○ Both discrete and continuous actions
○ Feedforward and recurrent agents
○ 2D and 3D games
Model-based Methods
Model-based methods
Three learning paradigms
[Figure: Supervised Learning maps an image x to a label y, e.g. "horse"; Reinforcement Learning maps a state z to an action a, e.g. "left"; Generative Modelling posits a model from a latent cause z to the image x, and inverting it yields representations such as the vector (2.3, -1, 0.5, 3) or the description "not blinking"]
The Shape Boltzmann Machine
S. M. Ali Eslami, Nicolas Heess, John Winn (2011)
Model-based methods
The Shape Boltzmann Machine
[Figures: Shape Boltzmann Machine architecture, samples and shape completions, Eslami et al. (2012)]
Modern Variational Inference
[Figure: a latent-variable model: latent z generates the image x; inference goes from x back to z]
● Approximate p(z|x) using q(z|x)
● Parameterise q(z|x) by a deep network
● Parameterise p(x|z) by a deep network
● Minimise KL[ q(z|x) ‖ p(z|x) ] via SGD (see the sketch below)
● Samples from q(z|x) can be used as codes representing the image x
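The recipe above is, in essence, the variational autoencoder training loop. A minimal PyTorch sketch of one step, with illustrative sizes; minimising KL[q(z|x) ‖ p(z|x)] up to a constant is equivalent to maximising the ELBO, E_q[log p(x|z)] − KL[q(z|x) ‖ p(z)]:

```python
# One step of amortised variational inference (VAE-style) via SGD.
import torch
import torch.nn as nn

d_x, d_z = 784, 20
encoder = nn.Sequential(nn.Linear(d_x, 400), nn.ReLU(), nn.Linear(400, 2 * d_z))  # q(z|x)
decoder = nn.Sequential(nn.Linear(d_z, 400), nn.ReLU(), nn.Linear(400, d_x))      # p(x|z)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def elbo_step(x):
    """Maximise the ELBO on a minibatch x of binary images in [0, 1]."""
    mu, logvar = encoder(x).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation
    recon = nn.functional.binary_cross_entropy_with_logits(
        decoder(z), x, reduction="sum")                    # -E_q[log p(x|z)]
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()  # KL[q(z|x) || p(z)]
    loss = (recon + kl) / x.size(0)                        # negative ELBO
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```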
Recurrent Neural Networks for Image Generation
Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra (2015)
Model-based methods
Recurrent Neural Networks for Image Generation
[Figure: one step of the DRAW network, Gregor et al. (2015): an encoding/inference pass reads from the image x and infers z; a decoding/generation pass writes to the canvas c, defining p(x|c). Read and Write operations connect x, z and the next canvas ct+1]
Model-based methods
Recurrent Neural Networks for Image Generation
[Figure: the DRAW network unrolled over time, Gregor et al. (2015): at each step the network reads from x, infers z, and writes an update to the canvas ct; after T steps the image is generated from p(x|cT)]
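To make the read/write loop concrete, a heavily simplified PyTorch sketch of the iterative canvas; it omits the attention mechanisms and the sampling of z, both central to the real DRAW model, and all names and sizes are my assumptions:

```python
# A crude sketch of a DRAW-style loop: read, infer, write, repeat.
import torch
import torch.nn as nn

d_x, d_z, T = 784, 10, 8
encoder = nn.GRUCell(2 * d_x, 128)  # consumes [x, error image]
to_z = nn.Linear(128, d_z)          # code for this step (sampling omitted)
decoder = nn.GRUCell(d_z, 128)
write = nn.Linear(128, d_x)         # additive update to the canvas

def draw_forward(x):
    canvas = torch.zeros_like(x)
    h_enc = torch.zeros(x.size(0), 128)
    h_dec = torch.zeros(x.size(0), 128)
    for _ in range(T):
        error = x - torch.sigmoid(canvas)                 # what is still unexplained
        h_enc = encoder(torch.cat([x, error], 1), h_enc)  # "read" / inference
        z = to_z(h_enc)
        h_dec = decoder(z, h_dec)
        canvas = canvas + write(h_dec)                    # "write" / generation
    return torch.sigmoid(canvas)                          # p(x | cT)

out = draw_forward(torch.rand(4, d_x))
```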
Model-based methods
Recurrent Neural Networks for Image Generation
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey Hinton (2016)
Model-based methods
Attend, Infer, Repeat
[Figure: images and their latent causes. A single image x of one object can be explained by one latent z ("blue brick"), but encoding a whole pile of bricks into one monolithic z is not sufficient for grasping, counting, transfer, or generalisation. Structuring the causes as per-object latents z1, z2 ("blue brick", "red brick"), each factored into z-what and z-where with attention windows att-y1, att-y2, recovers relations such as blue brick above, red brick below]
[Figure: the AIR generative model and inference network. The model decodes a variable-length set of latents z1, z2, z3 through a decoder into the image x. A recurrent inference network with hidden states h1, h2, h3 emits, at each step i, z-pres (is there another object?), z-what (its appearance, inferred VAE-style from the attended patch x-att / y-att) and z-where (its pose). The focus is on representation, not reconstruction; the output is a set, which raises the questions of order and count]
Attend, Infer, Repeat
Key Ideas
1. Build in structure; get out meaning
2. Inference networks that are
a. recurrent
b. variable-length
c. attentive
3. End-to-end learning through
a. discrete and continuous variables
b. inference and model networks
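A toy PyTorch sketch of idea 2, the recurrent, variable-length inference network; this is my own simplification (no attention, and no learning through the discrete z_pres decision), with assumed names and sizes:

```python
# AIR-style variable-length inference: a recurrent network emits
# (z_pres, z_what, z_where) per step and stops when z_pres = 0.
import torch
import torch.nn as nn

d_x, d_h, d_what, d_where = 784, 256, 20, 3
rnn = nn.GRUCell(d_x, d_h)
pres_head = nn.Linear(d_h, 1)         # Bernoulli logit: another object?
what_head = nn.Linear(d_h, d_what)    # appearance code
where_head = nn.Linear(d_h, d_where)  # pose (scale, shift)

def infer(x, max_steps=3):
    h = torch.zeros(x.size(0), d_h)
    objects = []
    for _ in range(max_steps):
        h = rnn(x, h)
        z_pres = torch.bernoulli(torch.sigmoid(pres_head(h)))
        if z_pres.sum() == 0:  # no more objects anywhere in the batch
            break
        objects.append((z_pres, what_head(h), where_head(h)))
    return objects             # a set of (presence, what, where) triples

zs = infer(torch.rand(2, d_x))
```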
Attend, Infer, Repeat
Omniglot
Attend, Infer, Repeat
Representational Power
[Figure: downstream tasks on AIR's inferred representations: given images of digits such as 6 and 9, answer "Sum?" or "Increasing order?" with yes / no]
Attend, Infer, Repeat
Additional Structure
[Figure: two choices of latent code z for an image x. Learned: a distributed vector that merely correlates with "blue brick". Specified: interpretable slots such as class=brick, colour=blue, position=P, rotation=R. In the specified case the decoder's inputs z1, z2, z3 carry fixed meanings]
Attend, Infer, Repeat
Inverse Graphics
Attend, Infer, Repeat
Policy learning
[Figure: the table-top MNIST policy-learning task]
Towards Deep Symbolic Reinforcement Learning
Marta Garnelo, Kai Arulkumaran, Murray Shanahan (2016)
Unsupervised Learning of 3D Structure from Images
Danilo Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess (2016)
Model-based methods
Unsupervised Learning of 3D Structure from Images
Unsupervised Learning of 3D Structure from Images
Inferring object meshes
Unsupervised Learning of 3D Structure from Images
Class-conditional samples
Unsupervised Learning of 3D Structure from Images
3D structure from multiple 2D images
Pixel Recurrent Neural Networks
Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu (2016)
Generative Modelling
Goal: Learn a generative model of natural images
Research landscape:
● Latent variable models (VAEs, DRAW)
● Adversarial (GANs)
● Fully visible (NADE, MADE, RIDE)
PixelRNN: Fully visible, probabilistic, tractable, density estimator
Pixel Recurrent Neural Networks
Pixel Recurrent Neural Networks
Model
● Fully visible
● Similar to language models with RNNs
● Model pixels with Softmax
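The "fully visible" modelling choice means the joint density over pixels factorizes by the chain rule, in raster-scan order, with each conditional a softmax over discrete pixel values:

```latex
% Joint density of an n x n image as a product of per-pixel conditionals
p(\mathbf{x}) = \prod_{i=1}^{n^2} p\left(x_i \mid x_1, \ldots, x_{i-1}\right)
```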
Pixel Recurrent Neural Networks
Masked Convolutions
[Figures: masks applied spatially (each pixel depends only on pixels above and to its left) and across colour channels (R, then G, then B)]
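A minimal PyTorch sketch of the spatial mask; the class name and sizes are assumptions, and the colour-channel masking of the real model is omitted:

```python
# A spatially masked convolution: the kernel is zeroed at (mask 'A') or
# after (mask 'B') the centre pixel, enforcing raster-scan causality.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Each output pixel sees only pixels above it and to its left."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + (mask_type == "B"):] = 0  # centre row, at/after centre
        mask[kh // 2 + 1:, :] = 0                         # all rows below centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask  # re-apply the causal mask every call
        return super().forward(x)

# Mask 'A' (first layer) also hides the current pixel; 'B' layers may see it.
layer = MaskedConv2d("A", in_channels=1, out_channels=16, kernel_size=7, padding=3)
out = layer(torch.rand(1, 1, 28, 28))  # spatial shape preserved: (1, 16, 28, 28)
```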
Pixel Recurrent Neural Networks
Binary MNIST
Pixel Recurrent Neural Networks
CIFAR-10
Pixel Recurrent Neural Networks
CIFAR-10
[Figure: occluded CIFAR-10 images, model completions, and originals]
[Figure: class-conditional samples labelled Elephant, Sandbar, Coral Reef, Horse, Lhasa Apso (Dog), Brown Bear, Lawn Mower, Robin (Bird), Geyser, White Whale, Hartebeest, Tiger, Alp]
Matching Networks for One Shot Learning
Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra (2016)
Core Machine Learning
Matching Nets
[Figure: a support set of examples (x1, y1), (x2, y2), (x3, y3) and a query q are embedded into z1, z2, z3 and zq; attention weights a1, a2, a3 over the support embeddings mix the labels y1, y2, y3 into the prediction ŷ. The attention mechanism, shown in red, in effect implements a same-class-or-not network]
The idea is useful because it allows us to construct a classifier on the fly, without any further training.
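A minimal PyTorch sketch of the matching readout: embed the support set and the query, attend with a softmax over cosine similarities (the kernel used in the paper), and mix the support labels; the embedding network itself is an illustrative placeholder:

```python
# Matching-networks readout: a classifier constructed on the fly from a
# labelled support set, with no gradient steps at test time.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 32))

def matching_predict(support_x, support_y, query_x):
    """support_y is one-hot, shape (k, n_classes); returns (m, n_classes)."""
    z = embed(support_x)   # (k, d) support embeddings
    zq = embed(query_x)    # (m, d) query embeddings
    sims = nn.functional.cosine_similarity(
        zq.unsqueeze(1), z.unsqueeze(0), dim=-1)  # (m, k) similarities
    attn = sims.softmax(dim=-1)                   # attention weights a(q, x_i)
    return attn @ support_y                       # mix the support labels

# 3-way, one-shot example with random data.
sx = torch.randn(3, 784)
sy = torch.eye(3)
probs = matching_predict(sx, sy, torch.randn(2, 784))
```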
Recap
● Machine Learning
● Deep Supervised Learning
● Deep Reinforcement Learning
● Model-based Methods
● Deep Variational Inference
● Structured / Unstructured Generative Models
● Matching Networks