GPU processing with Theano

Razvan Pascanu (Google DeepMind)

Razvan Pascanu (Google DeepMind) Theano: an overview 17 August 2015 1/ 75

Personal Info

- Masters: Jacobs University Bremen, Germany

- Advisor: Herbert Jaeger

Source: http://minds.jacobs-university.de/herbert.html

- Topic: Memory-like behaviour in recurrent neural networks


Personal Info

- PhD: Université de Montréal, Canada

- Advisor: Yoshua Bengio

Source: http://www.iro.umontreal.ca/~bengioy/yoshua_en/index.html

- Topics:
  - Recurrent Neural Networks (e.g. learning dynamics/memory/etc.)
  - Optimization for neural networks (e.g. natural gradient/error landscape)
  - Theory for neural networks (e.g. deep vs shallow)
  - Theano (e.g. looping/conditional computation)


Personal Info

- Job: Researcher at Google DeepMind, London, UK

- Topics:
  - Optimization
  - Reinforcement Learning and Deep Learning
  - Deep Learning
  - Generative Models
  - etc.

- Publications: http://deepmind.com/publications.html


Now a bit about yourself

- How many of you have played with MLPs / convolutional nets?

- How many of you have played with RNNs?

- How many have used Torch, Caffe, or Theano (perhaps indirectly, through PyLearn2, Blocks, or Lasagne)?

- How many already know Theano?

- How many know about Reinforcement Learning (Q-learning)?


Software infrastructure: Why?

- Neural networks become interesting in the large-data regime

- Learning can be a slow process

- Neural networks are a flexible paradigm


Software infrastructure: Why?

To be competitive (or even to keep up) you need a good hammer:

- Solid software infrastructure

- Efficiency (on new hardware) with minimal effort

- Flexibility

- Simple interface

- Lots of coffee


Theano overview

And Theano does all of this (except the coffee).


Theano overview

Pros:

- extremely flexible
- symbolic differentiation
- efficient
- transparent GPU/CPU (stabilization tricks)
- intuitive syntax
- Python interface compatible with numpy/scipy

Cons:

- steep learning curve
- different philosophy of coding
- "unpredictable" slowness
- requires compilation at runtime (g++, nvcc)


Theano and extensions

- PyLearn2 (https://github.com/lisa-lab/pylearn2)

- Blocks (https://github.com/mila-udem/blocks)

- Lasagne (https://github.com/Lasagne/Lasagne)


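Basics of Theano: Philosophy

Figure: the symbolic graph for \Sigmoid(x \Dot W). Left: inputs x and W feed a Dot node, whose output feeds a Sigmoid node. Right: after optimization, the same expression is computed with the Dot node replaced by a single GEMV node.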
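To make the philosophy concrete, here is a toy sketch in plain Python and NumPy (not Theano's internals) of a symbolic graph that is built first, rewritten, and only then evaluated. The `Node` class, the op names, and the `rewrite_dot_to_gemv` pass are hypothetical illustrations; in this toy every `dot` is a matrix-vector product, so the pass simply relabels it as `gemv`, mirroring the Dot to GEMV substitution on this slide without changing the result.

```python
import numpy as np

class Node:
    """A node in a toy symbolic graph: an op name plus its input nodes."""
    def __init__(self, op, *inputs, value=None):
        self.op = op
        self.inputs = list(inputs)
        self.value = value  # only used by "input" nodes

def evaluate(node):
    """Recursively compute the value of a node (no caching, for clarity)."""
    if node.op == "input":
        return node.value
    args = [evaluate(i) for i in node.inputs]
    if node.op == "dot":
        return np.dot(args[0], args[1])
    if node.op == "gemv":
        # Stand-in for a fused BLAS call y = 1.0 * A @ x + 0.0 * y
        return args[0] @ args[1]
    if node.op == "sigmoid":
        return 1.0 / (1.0 + np.exp(-args[0]))
    raise ValueError("unknown op: " + node.op)

def rewrite_dot_to_gemv(node):
    """Toy optimization pass: relabel every 'dot' node as a 'gemv' node."""
    for child in node.inputs:
        rewrite_dot_to_gemv(child)
    if node.op == "dot":
        node.op = "gemv"

# Build sigmoid(W . x) symbolically; nothing is computed yet.
rng = np.random.default_rng(0)
W = Node("input", value=rng.uniform(-0.01, 0.01, size=(5, 8)))
x = Node("input", value=rng.uniform(size=8))
graph = Node("sigmoid", Node("dot", W, x))

before = evaluate(graph)
rewrite_dot_to_gemv(graph)   # the graph changes, the math does not
after = evaluate(graph)
```

The design point is the separation: because the whole expression exists as a graph before any number is computed, an optimizer is free to substitute faster kernels.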


Basics of Theano: Simple Example

import numpy
import theano
import theano.tensor as TT

x = TT.vector("input")
W = theano.shared(numpy.random.uniform(
    low=-.01, high=.01, size=(500, 784)))

out = TT.tanh(TT.dot(W, x))

# theano.function expects a list of input variables
fn = theano.function([x], out)


Basics of Theano: Variables

x = TT.fvector("input")
W = theano.shared(numpy.random.uniform(
    low=-.01, high=.01, size=(500, 784)).astype('float32'))


Source: Theano Library documentation


Basics of Theano: Operations

out = TT.tanh(TT.dot(W, x))


- TT.dot

- TT.tanh

- TT.nnet.sigmoid

- TT.nnet.relu

- TT.nnet.conv2d

- arithmetic operators: * + / -

- ...


Basics of Theano: Function

fn = theano.function([x], out)


Source: Theano Library documentation


Basics of Theano: Using a function

x = numpy.random.uniform(size=(784,)).astype('float32')
print(fn(x))


Basics of Theano: Grad

gW = TT.grad(out.sum(), W)


Source: Theano Library documentation
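What `TT.grad(out.sum(), W)` returns can be checked by hand. For out = tanh(W x), the derivative of sum(out) with respect to W is the outer product of (1 - out**2) and x. The NumPy sketch below (not Theano code; small shapes instead of (500, 784) to keep it fast, and the helper names are illustrative) compares that closed form with a central finite-difference estimate, a common way to sanity-check a symbolic gradient.

```python
import numpy as np

def forward(W, x):
    """out = tanh(W x), as in the simple example."""
    return np.tanh(W @ x)

def grad_sum_wrt_W(W, x):
    """Closed form of d sum(tanh(W x)) / dW = outer(1 - tanh(W x)**2, x)."""
    return np.outer(1.0 - forward(W, x) ** 2, x)

def numerical_grad(W, x, eps=1e-6):
    """Central finite differences, one entry of W at a time."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        g[idx] = (forward(W_plus, x).sum() - forward(W_minus, x).sum()) / (2 * eps)
    return g

rng = np.random.default_rng(0)
W = rng.uniform(-0.01, 0.01, size=(5, 8))
x = rng.uniform(size=8)
analytic = grad_sum_wrt_W(W, x)
numeric = numerical_grad(W, x)
print(np.max(np.abs(analytic - numeric)))  # tiny, on the order of round-off
```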


Basics of Theano: Scan I

$$x(n) = \tanh\left(W x(n-1) + W^{in}_{1} u(n) + W^{in}_{2} u(n-4) + W^{feedback} y(n-1)\right)$$

$$y(n) = W^{out} x(n-3)$$


Basics of Theano: Scan II

def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):
    x_t = T.tanh(theano.dot(x_tm1, W) +
                 theano.dot(u_t, W_in_1) +
                 theano.dot(u_tm4, W_in_2) +
                 theano.dot(y_tm1, W_feedback))
    y_t = theano.dot(x_tm3, W_out)
    return [x_t, y_t]


Basics of Theano: Scan III

([x_vals, y_vals], updates) = theano.scan(
    fn=oneStep,
    sequences=dict(input=u, taps=[-4, -0]),
    outputs_info=[dict(initial=x0, taps=[-3, -1]), y0],
    non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
    strict=True)
# for the second output (y), scan uses taps=[-1] by default
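To see what this scan call computes step by step, here is a plain NumPy loop under the same row-vector convention as `oneStep` (states multiply the weight matrices from the left). It is an illustrative reference, not how scan is implemented. The taps [-4, 0] on u mean the loop starts at t = 4, so a length-T input gives T - 4 steps; the taps [-3, -1] on x mean x0 must hold three initial states, x(-3), x(-2), x(-1). The function name `recurrence_reference` and all shapes are assumptions made for the example.

```python
import numpy as np

def recurrence_reference(u, x0, y0, W, W_in_1, W_in_2, W_feedback, W_out):
    """Unrolled loop for the recurrence implemented by oneStep.

    u:  (T, n_in) inputs; taps [-4, 0] give T - 4 steps
    x0: (3, n_hid) initial states x(-3), x(-2), x(-1)
    y0: (n_out,)   initial output y(-1)
    """
    xs = list(x0)  # every hidden state so far, oldest first
    y_tm1 = y0
    x_vals, y_vals = [], []
    for t in range(4, len(u)):
        x_tm1, x_tm3 = xs[-1], xs[-3]
        x_t = np.tanh(x_tm1 @ W + u[t] @ W_in_1 + u[t - 4] @ W_in_2
                      + y_tm1 @ W_feedback)
        y_t = x_tm3 @ W_out
        xs.append(x_t)
        x_vals.append(x_t)
        y_vals.append(y_t)
        y_tm1 = y_t
    return np.array(x_vals), np.array(y_vals)

# Usage with small random weights (sizes are arbitrary choices).
rng = np.random.default_rng(0)
T, n_in, n_hid, n_out = 10, 2, 4, 3
u = rng.normal(size=(T, n_in))
x0 = rng.normal(size=(3, n_hid))
y0 = np.zeros(n_out)
W = 0.1 * rng.normal(size=(n_hid, n_hid))
W_in_1 = rng.normal(size=(n_in, n_hid))
W_in_2 = rng.normal(size=(n_in, n_hid))
W_feedback = rng.normal(size=(n_out, n_hid))
W_out = rng.normal(size=(n_hid, n_out))
x_vals, y_vals = recurrence_reference(u, x0, y0, W, W_in_1, W_in_2, W_feedback, W_out)
```

Because y(n) reads x(n-3), the first three outputs come straight from the initial states in x0; the first computed state only reaches y at the fourth step.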


Basics of Theano: Scan IV


Basics of Theano: Scan V


Basics of Theano: Scan VI


Basics of Theano: Scan VII

theano.scan(fn,
            sequences=None,
            outputs_info=None,
            non_sequences=None,
            n_steps=None,
            truncate_gradient=-1,
            go_backwards=False,
            mode=None,
            name=None,
            profile=False,
            allow_gc=None,
            strict=False)
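The part of this signature that usually trips people up is the calling convention: at each step, `fn` receives the current slice of every sequence, then the previous value of every fed-back output, then every non-sequence, in that order. The pure-Python `toy_scan` below is a hypothetical sketch of only that convention (one tap per input, no updates, no gradients, no taps lists). It is not Theano's implementation, and unlike `theano.scan` it returns just the list of outputs rather than an (outputs, updates) pair.

```python
def toy_scan(fn, sequences=(), outputs_info=(), non_sequences=(),
             n_steps=None, go_backwards=False):
    """Sketch of scan's argument order, with a single tap per input.

    outputs_info needs one entry per output of fn: an initial value if the
    output is fed back into the next step, or None if it is not.
    n_steps is required when there are no sequences.
    """
    seqs = [list(s)[::-1] if go_backwards else list(s) for s in sequences]
    if n_steps is None:
        n_steps = min(len(s) for s in seqs)
    state = list(outputs_info)
    results = [[] for _ in state]
    for t in range(n_steps):
        fed_back = [s for s in state if s is not None]
        # Order: sequence slices, then previous outputs, then non-sequences.
        out = fn(*[s[t] for s in seqs], *fed_back, *non_sequences)
        out = list(out) if isinstance(out, (list, tuple)) else [out]
        for k, value in enumerate(out):
            results[k].append(value)
            if state[k] is not None:
                state[k] = value
    return results

# Cumulative sum: fn gets the sequence slice x_t, then the previous output acc.
print(toy_scan(lambda x_t, acc: acc + x_t, sequences=[[1, 2, 3, 4]], outputs_info=[0]))
# [[1, 3, 6, 10]]
```

Powers of two work without any sequence, by passing the constant as a non-sequence and fixing `n_steps`: `toy_scan(lambda prev, a: prev * a, outputs_info=[1], non_sequences=[2], n_steps=5)`.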


Basics of Theano

To be continued ...


The magic of GPUs

Source: http://docs.nvidia.com/cuda/cuda-c-programming-guide

ALU = Arithmetic Logic Units (workhorse for arithmetic and bitwise logical operations)


- The power is in having more cores rather than faster cores.


Comparing GPUs

Source: www.nvidia.com/object/tesla-servers.html

The K80, for example, runs each core at a maximum frequency of just 875 MHz.


Shift in paradigm

Around 1999/2000:

- Fallout

- Tomb Raider II

- Quake II

- ...

What I was doing:


Shift in paradigm

Now also heavily used for machine learning (deep learning)

Source: https://developer.nvidia.com/deep-learning

Almost every deep learning success story has hundreds of GPUs behind it.


Efficiency of parallelization (e.g. matrix multiplication)

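Source: https://en.wikipedia.org/wiki/Matrix_multiplication

In the idealized case of one core per independent unit of work, GPUs can turn an O(nm) algorithm into an O(m) one: the n independent pieces (e.g. the rows of a matrix product) are all computed at the same time.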
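The key property is independence. For a matrix-vector product y = A x with A of shape (n, m), entry y[i] is the dot product A[i] · x: O(m) work that needs no other entry. A serial loop therefore costs O(nm), while n cores could finish in O(m) wall-clock time. The sketch below (plain Python; `matvec_by_independent_rows` is a hypothetical helper, and a thread pool only stands in for GPU cores, so do not expect a speedup) computes the entries as independent tasks and checks them against a single product.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def matvec_by_independent_rows(A, x, workers=4):
    """Compute y = A @ x with each entry y[i] = A[i] . x as a separate task.

    No task reads another task's result, so all of them could run at once.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        entries = list(pool.map(lambda a_row: float(a_row @ x), A))
    return np.array(entries)

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 32))
x = rng.normal(size=32)
y = matvec_by_independent_rows(A, x)
```

The same decomposition applies to each row of a matrix-matrix product, which is why GEMM maps so well onto many slow cores.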


Theano and GPUs

Go to http://deeplearning.net/software/theano/install.html, then tell Theano where CUDA is installed in one of three ways:

- Define a $CUDA_ROOT environment variable equal to the CUDA root directory, as in CUDA_ROOT=/path/to/cuda/root, or

- add a cuda.root flag to THEANO_FLAGS, as in THEANO_FLAGS='cuda.root=/path/to/cuda/root', or

- add a [cuda] section to your .theanorc file containing the option root = /path/to/cuda/root.


Basics of Theano on GPU: Flags

THEANO_FLAGS=floatX=float32,device=gpu0 python my_script.py


And it should work?

Theoretically yes ... practically not quite.


How does GPU code work?

For most Ops there exists a GPU variant:

- Gemm ⇔ GpuGemm

- Softmax ⇔ GpuSoftmax

- ...

You have GpuFromHost and HostFromGpu to move data around.

- Optimizations will try to move ops from CPU to GPU, inserting GpuFromHost and HostFromGpu transfers where needed.
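The patching step can be pictured with a toy pass over a linear chain of ops. The function `move_to_gpu` and the op `CpuOnlyOp` are hypothetical; only the names Gemm/GpuGemm, Softmax/GpuSoftmax, GpuFromHost and HostFromGpu come from the slide. The pass swaps in a GPU variant wherever one exists and inserts a transfer whenever execution crosses between host and device (in this toy, the final result is always copied back to the host). It shows why a single CPU-only op in the middle of a graph costs two extra copies.

```python
# GPU variants known to this toy pass (names taken from the slide).
GPU_VARIANT = {"Gemm": "GpuGemm", "Softmax": "GpuSoftmax"}

def move_to_gpu(ops):
    """Toy optimizer for a linear chain of ops whose input starts on the host.

    Replaces each op by its GPU variant when one exists and inserts
    GpuFromHost / HostFromGpu transfers at every host/device boundary.
    """
    result, on_gpu = [], False
    for op in ops:
        gpu_op = GPU_VARIANT.get(op)
        if gpu_op is not None and not on_gpu:
            result.append("GpuFromHost")   # copy data to the device
            on_gpu = True
        elif gpu_op is None and on_gpu:
            result.append("HostFromGpu")   # copy data back for a CPU-only op
            on_gpu = False
        result.append(gpu_op if gpu_op is not None else op)
    if on_gpu:
        result.append("HostFromGpu")       # return the output on the host
    return result

def count_transfers(ops):
    """Number of host/device copies in an optimized chain."""
    return sum(op in ("GpuFromHost", "HostFromGpu") for op in ops)

print(move_to_gpu(["Gemm", "Softmax"]))
# ['GpuFromHost', 'GpuGemm', 'GpuSoftmax', 'HostFromGpu']
```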


How does a GPU Op work?

def c_code(self, node, nodename, inputs, outputs, sub):
    x, y = inputs   # C variable names of the inputs
    z, = outputs    # C variable name of the output
    return """
    if (%(x)s->nd != 2) ...
    if (CudaNdarray_gemm(1.0f, %(x)s, %(y)s, 0.0f, %(z)s)) {
        if (%(z)s) {
            Py_DECREF(%(z)s);
    ...
    """ % locals()


Debugging tools: Print Op

h = theano.printing.Print("value x", attrs=("mean", "max"))(h)


Debugging tools: debugprint

> theano.printing.debugprint(prediction)
Elemwise{gt,no_inplace} [@A] ''
 |Elemwise{true_div,no_inplace} [@B] ''
 | |DimShuffle{x} [@C] ''
 | | |TensorConstant{1} [@D]
 | |Elemwise{add,no_inplace} [@E] ''
 |   |DimShuffle{x} [@F] ''
 |   | |TensorConstant{1} [@D]
 |   |Elemwise{exp,no_inplace} [@G] ''
 |     |Elemwise{sub,no_inplace} [@H] ''
 |       |Elemwise{neg,no_inplace} [@I] ''
 |       | |dot [@J] ''
 ...


Debugging tools: pydotprint

> theano.printing.pydotprint(func)


Debugging tools: Test values

# enable on-the-fly graph computations
theano.config.compute_test_value = 'warn'

...

# input which will be of shape (5, 10)
x = T.matrix('x')
# provide Theano with a default test-value
x.tag.test_value = numpy.random.rand(5, 10)


Debugging tools: ipdb

import ipdb

...
ipdb.set_trace()


Configuring Theano: MODE

fun = theano.function([x], out, mode='DebugMode')


Configuring Theano: Linker

fun = theano.function([x], out,
                      mode=theano.Mode(linker='cvm', optimizer='fast_run'))


Configuring Theano: Profiling

fun = theano.function([x], out, profile=True)


Carefully writing your graph!?

Theano is helpful, but it is not magic:

- If code is slow, profile!

- Ensure you do not have many HostFromGpu, GpuFromHost nodes

- Ask for help and be patient

- Finding the optimal graph is NP-hard, so the optimizer relies on heuristics and will miss some cases; give it some slack

- Rely on the many existing examples ...


Carefully writing your graph – example

x = TT.matrix('x')
W = TT.matrix('W')
# scan returns a pair (outputs, updates)
out, updates = theano.scan(lambda x_t, W: TT.dot(W, x_t),
                           sequences=x, non_sequences=W)


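This should be just a GEMM call: applying W to every row of x is the single matrix product TT.dot(x, W.T), but the scan computes it one matrix-vector product at a time ...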
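A quick NumPy check of that claim (shapes are arbitrary choices): looping over the rows of x and applying W to each one, as the scan does, gives exactly the single matrix product x @ W.T. In Theano one would therefore write `TT.dot(x, W.T)` directly and let it map to one GEMM call.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 30))   # 100 rows (scan steps), 30 features each
W = rng.normal(size=(20, 30))

# What the scan computes: one matrix-vector product per row of x.
looped = np.stack([W @ x_t for x_t in x])

# The same numbers from a single matrix-matrix product (one GEMM call).
single_gemm = x @ W.T
```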


GPU: words of wisdom

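- only float32 (the GPU backend does not handle other dtypes)

- only matrix multiplications, convolutions, and large element-wise operations will be accelerated

- indexing and dimension-shuffling are as fast on CPU as on GPU

- summation over rows/columns is slightly slower

- copying large amounts of data between host and GPU is slow

- set allow_gc=False (e.g. in THEANO_FLAGS), so buffers for intermediate results are reused between calls: faster, but uses more memory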
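One more reason float32 matters: NumPy creates float64 arrays by default, so a weight matrix like the (500, 784) one from the simple example needs twice the memory, and twice the host-to-GPU copy volume, unless it is cast with `.astype('float32')` as on the Variables slide. A small check:

```python
import numpy as np

W64 = np.random.uniform(low=-.01, high=.01, size=(500, 784))  # NumPy default: float64
W32 = W64.astype('float32')                                    # what the GPU backend expects

print(W64.dtype, W64.nbytes)  # float64 3136000
print(W32.dtype, W32.nbytes)  # float32 1568000
```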


Benchmarks


Citing Theano

- F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley and Y. Bengio. Theano: new features and speed improvements. NIPS 2012 Deep Learning Workshop.

- J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley and Y. Bengio. Theano: A CPU and GPU Math Expression Compiler. Proceedings of the Python for Scientific Computing Conference (SciPy) 2010. June 30 - July 3, Austin, TX.


How to get help

- Subscribe to theano-announce for important changes

- Subscribe to theano-users for help

- Subscribe to theano-dev for development discussions

- Ask or browse questions and answers on StackOverflow

- Look at GitHub tickets


Developers (early days)


Developers

MILA students, from whom many of the new Theano developers come:

MILA: Montreal Institute for Learning Algorithms

Source: http://www.iro.umontreal.ca/~bengioy/talks/NIPS-OPT-workshop-12dec2014.pdf

But also many, many more ...


Thank you

Questions?


Possible exercise for the afternoon sessions I

Pick one or several tasks from the Deep Learning Tutorials:

- Logistic Regression

- MLP

- AutoEncoders / Denoising AutoEncoders

- Stacked Denoising AutoEncoders


Possible exercise for the afternoon sessions II

Compare different initializations for neural networks (MLPs and ConvNets) with rectifiers or tanh. In particular, compare:

- The initialization proposed by Glorot et al.

- Sampling uniformly from $\left[-\frac{1}{\text{fan}_{in}}, \frac{1}{\text{fan}_{in}}\right]$

- Setting all singular values to 1 (and biases to 0)

How do different optimization algorithms help with these initializations? Extra kudos for interesting plots or analysis. Please make use of the Deep Learning Tutorials.
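A minimal NumPy sketch of the three schemes, assuming a weight matrix of shape (fan_in, fan_out). For Glorot et al. it uses the uniform variant U[-sqrt(6 / (fan_in + fan_out)), +sqrt(6 / (fan_in + fan_out))] from Glorot & Bengio (2010). The third scheme takes the SVD of a random Gaussian matrix and keeps U V^T, which sets every singular value to 1. The function names are hypothetical, and wiring them into the Deep Learning Tutorials code is left to the exercise.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot & Bengio (2010) uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def uniform_inv_fan_in(fan_in, fan_out, rng):
    """Uniform in [-1/fan_in, 1/fan_in], as listed in the exercise."""
    return rng.uniform(-1.0 / fan_in, 1.0 / fan_in, size=(fan_in, fan_out))

def unit_singular_values(fan_in, fan_out, rng):
    """Random matrix with all singular values replaced by 1 (W = U V^T)."""
    a = rng.normal(size=(fan_in, fan_out))
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 500
W_glorot = glorot_uniform(fan_in, fan_out, rng)
W_small = uniform_inv_fan_in(fan_in, fan_out, rng)
W_orth = unit_singular_values(fan_in, fan_out, rng)
b = np.zeros(fan_out)  # biases set to 0, as in the third scheme
```

Note how different the scales are: for a 784x500 layer the Glorot limit is about 0.068, while 1/fan_in is about 0.0013, which is one thing the plots in the exercise could reveal.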


Possible exercise for the afternoon sessions III (requires convolutions)

Re-implement the AutoEncoder tutorial using convolutions in both the encoder and the decoder. Extra kudos for allowing pooling (or strides) in the encoder.
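For the shapes to work out, one common choice is a "valid" convolution in the encoder (an N x N input and a K x K kernel give an (N-K+1) x (N-K+1) map) and a "full" convolution in the decoder, since (N-K+1) + K - 1 = N restores the input size. Theano's `TT.nnet.conv2d` exposes this choice through `border_mode='valid'` / `'full'`. The naive single-channel NumPy `conv2d` below is only a sketch for checking the shape arithmetic, not a replacement for the Theano op.

```python
import numpy as np

def conv2d(image, kernel, mode="valid"):
    """Naive single-channel 2-D convolution (kernel flipped), 'valid' or 'full'."""
    kh, kw = kernel.shape
    if mode == "full":
        # Zero-pad so every partial overlap of kernel and image is included.
        image = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    elif mode != "valid":
        raise ValueError("mode must be 'valid' or 'full'")
    flipped = kernel[::-1, ::-1]
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(28, 28))        # e.g. one MNIST-sized input
enc_kernel = rng.normal(size=(5, 5))
dec_kernel = rng.normal(size=(5, 5))
hidden = conv2d(image, enc_kernel, mode="valid")          # (24, 24)
reconstruction = conv2d(hidden, dec_kernel, mode="full")  # (28, 28)
```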


Possible exercise for the afternoon sessions IV (requires Reinforcement Learning)

Attempt to solve the Catch game.

Figure: illustrations of the game (not actual screenshots).
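The rules of the Catch version used in the sessions are not given here, so the sketch below assumes a hypothetical minimal one: a ball falls one row per step down a 5x5 grid from a random column, a one-cell paddle on the bottom row moves left, stays, or moves right, and the episode ends with reward +1 if the paddle is under the ball when it lands and -1 otherwise. It uses tabular Q-learning to show the update rule in isolation; a neural-network version would replace the table `q` with a model built in Theano.

```python
import numpy as np

SIZE = 5              # hypothetical 5x5 grid
ACTIONS = (-1, 0, 1)  # paddle moves left, stays, or moves right

def step(state, action):
    """Advance the game one step. state = (ball_row, ball_col, paddle_col)."""
    row, col, paddle = state
    paddle = min(max(paddle + ACTIONS[action], 0), SIZE - 1)
    row += 1
    if row == SIZE - 1:  # ball reached the paddle's row: episode ends
        return (row, col, paddle), (1.0 if paddle == col else -1.0), True
    return (row, col, paddle), 0.0, False

def train(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((SIZE, SIZE, SIZE, len(ACTIONS)))
    for _ in range(episodes):
        state, done = (0, int(rng.integers(SIZE)), SIZE // 2), False
        while not done:
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(np.argmax(q[state]))
            next_state, reward, done = step(state, action)
            # Q-learning target: r + gamma * max_a' Q(s', a'), or just r at the end
            target = reward if done else reward + gamma * np.max(q[next_state])
            q[state + (action,)] += alpha * (target - q[state + (action,)])
            state = next_state
    return q

def catch_rate(q):
    """Fraction of starting columns where the greedy policy catches the ball."""
    caught = 0
    for col in range(SIZE):
        state, done = (0, col, SIZE // 2), False
        while not done:
            state, reward, done = step(state, int(np.argmax(q[state])))
        caught += reward > 0
    return caught / SIZE

q_table = train()
print(catch_rate(q_table))
```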
