Anil Thomas - Object recognition

Date post: 08-Feb-2017
Upload: nervana-systems
Page 1: Anil Thomas - Object recognition

Object Recognition for Fun and Profit

Anil Thomas

SV Deep Learning Meetup

November 17th, 2015

Page 2: Anil Thomas - Object recognition

Outline

•  Neon examples

•  Intro to convnets

•  Convolutional autoencoder

•  Whale recognition challenge

Page 3: Anil Thomas - Object recognition

NEON

Page 4: Anil Thomas - Object recognition

Neon

Backends: NervanaCPU, NervanaGPU, NervanaEngine (internal)

Datasets: Images: ImageNet, CIFAR-10, MNIST; Captions: flickr8k, flickr30k, COCO; Text: Penn Treebank, hutter-prize, IMDB, Amazon

Initializers: Constant, Uniform, Gaussian, Glorot Uniform

Learning rules: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad

Activations: Rectified Linear, Softmax, Tanh, Logistic

Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, Recurrent Sum, LookupTable

Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error

Metrics: Misclassification, TopKMisclassification, Accuracy

•  Modular components

•  Extensible, OO design

•  Documentation

•  neon.nervanasys.com

Page 5: Anil Thomas - Object recognition

HANDS ON EXERCISE

Page 6: Anil Thomas - Object recognition

INTRO TO CONVNETS

Page 7: Anil Thomas - Object recognition

Convolution

Input (3x3):

0 1 2
3 4 5
6 7 8

Kernel (2x2):

0 1
2 3

Output (2x2):

19 25
37 43

Example: (0, 1, 3, 4) · (0, 1, 2, 3) = 19

•  Each element in the output is the result of a dot product between two vectors
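
The dot-product view above can be sketched in a few lines of NumPy. This is only an illustration, not code from the talk: the function name `conv2d_valid` is made up here, and it assumes stride 1, no padding, and no kernel flip, which is what the slide's numbers imply.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Each output element is a dot product between the flattened
            # input patch and the flattened kernel.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.dot(patch.ravel(), kernel.ravel())
    return out

image = np.arange(9).reshape(3, 3)    # the 3x3 input from the slide
kernel = np.arange(4).reshape(2, 2)   # the 2x2 kernel from the slide
result = conv2d_valid(image, kernel)
print(result)
```

With the slide's input and kernel, the result holds the same four values as the slide's output: 19, 25, 37, 43.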

Page 8: Anil Thomas - Object recognition

Convolutional layer

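[Figure: the 3x3 input (0 to 8) drawn as nine input units and the 2x2 output (19, 25, 37, 43) drawn as four output units. Each output unit connects to four input units through the kernel weights 0, 1, 2, 3.]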

Page 9: Anil Thomas - Object recognition

Convolutional layer

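0 x 0 + 1 x 1 + 3 x 2 + 4 x 3 = 19

The weights are shared among the units.

[Figure: input units 0 to 8 connected to the output units through the same four weights (0, 1, 2, 3); the highlighted output unit has the value 19.]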
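
One way to see the weight sharing is to write the convolutional layer as an ordinary fully connected layer whose weight matrix reuses the same four kernel values in every row. The sketch below is an illustration under that framing; `conv_as_matrix` is a hypothetical helper, not part of neon or of the talk.

```python
import numpy as np

def conv_as_matrix(kernel, in_h, in_w):
    """Build the (outputs x inputs) weight matrix of a stride-1, unpadded conv layer."""
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    weights = np.zeros((out_h * out_w, in_h * in_w), dtype=kernel.dtype)
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j                  # index of the output unit
            for a in range(kh):
                for b in range(kw):
                    col = (i + a) * in_w + (j + b)   # index of the input unit
                    weights[row, col] = kernel[a, b]  # same kernel value in every row
    return weights

kernel = np.arange(4).reshape(2, 2)
weights = conv_as_matrix(kernel, 3, 3)
inputs = np.arange(9)            # the nine input units, 0 to 8
outputs = weights @ inputs       # the four output units
print(weights)
print(outputs)
```

Every row of `weights` contains only the values 0, 1, 2, 3 (plus zeros for unconnected inputs), shifted to a different position, and `outputs` holds 19, 25, 37, 43.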

Page 10: Anil Thomas - Object recognition

Recognizing patterns

Detected the pattern!

Page 11: Anil Thomas - Object recognition

B0 B1 B2

B3 B4 B5

B6 B7 B8

G0 G1 G2

G3 G4 G5

G6 G7 G8

R0 R1 R2

R3 R4 R5

R6 R7 R8

Page 15: Anil Thomas - Object recognition

Max pooling

Input (3x3):

0 1 2
3 4 5
6 7 8

Output (2x2):

4 5
7 8

Example: Max(0, 1, 3, 4) = 4

•  Each element in the output is the maximum value within the pooling window
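
A minimal NumPy sketch of the pooling step, assuming a 2x2 window moved with stride 1 (the setting that turns the slide's 3x3 input into a 2x2 output). The name `max_pool2d` and its parameters are illustrative, not a library API.

```python
import numpy as np

def max_pool2d(image, window=2, stride=1):
    """Take the maximum over each window position (no padding)."""
    out_h = (image.shape[0] - window) // stride + 1
    out_w = (image.shape[1] - window) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = image[r:r + window, c:c + window].max()
    return out

image = np.arange(9).reshape(3, 3)   # the 3x3 input from the slide
pooled = max_pool2d(image)
print(pooled)
```

For the slide's input this gives 4, 5, 7, 8. The classifier later in the deck uses pooling with stride 2, which halves each spatial dimension instead.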

Page 16: Anil Thomas - Object recognition

Deconvolution

•  Ill-posed problem, but we can approximate

•  Scatter versus gather

•  Used in convolutional layer backprop

•  Equivalent to convolution with a flipped kernel on zero-padded input

•  Useful for convolutional autoencoders
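
The "scatter versus gather" and "flipped kernel on zero-padded input" points can be checked numerically. The sketch below is an illustration with made-up function names: the scatter version adds a scaled copy of the kernel into the output for every input element, and the gather version pads the input with zeros and runs an ordinary sliding dot product with the kernel rotated by 180 degrees. It assumes stride 1.

```python
import numpy as np

def deconv2d_scatter(inputs, kernel):
    """Each input element scatters a scaled copy of the kernel into the output."""
    kh, kw = kernel.shape
    out = np.zeros((inputs.shape[0] + kh - 1, inputs.shape[1] + kw - 1),
                   dtype=inputs.dtype)
    for i in range(inputs.shape[0]):
        for j in range(inputs.shape[1]):
            out[i:i + kh, j:j + kw] += inputs[i, j] * kernel
    return out

def deconv2d_gather(inputs, kernel):
    """Same result, computed as a convolution with a flipped kernel on zero-padded input."""
    kh, kw = kernel.shape
    padded = np.pad(inputs, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = kernel[::-1, ::-1]          # rotate the kernel by 180 degrees
    out_h = padded.shape[0] - kh + 1
    out_w = padded.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=inputs.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Each output element gathers from a patch of the padded input.
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

inputs = np.arange(4).reshape(2, 2)
kernel = np.arange(4).reshape(2, 2)
scattered = deconv2d_scatter(inputs, kernel)
gathered = deconv2d_gather(inputs, kernel)
print(scattered)
print(np.array_equal(scattered, gathered))
```

With a 2x2 input of 0, 1, 2, 3 and the same values as the kernel, both versions produce the 3x3 array 0 0 1 / 0 4 6 / 4 12 9.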

Page 17: Anil Thomas - Object recognition

Deconv layer

Input (2x2):

0 1
2 3

Kernel (2x2):

0 1
2 3

Output (3x3):

0 0 1
0 4 6
4 12 9

Page 18: Anil Thomas - Object recognition

Deconv layer

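3 x 1 + 1 x 3 = 6

[Figure: input units 0 to 3 connected to nine output units (0, 0, 1, 0, 4, 6, 4, 12, 9) through the kernel weights 0, 1, 2, 3; the highlighted output unit has the value 6.]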

Page 19: Anil Thomas - Object recognition

Convolutional autoencoder

Input -> Conv1 -> Conv2 -> Conv3 -> Deconv1 -> Deconv2 -> Deconv3

Page 20: Anil Thomas - Object recognition

RIGHT WHALE RECOGNITION

Page 21: Anil Thomas - Object recognition

“Face” recognition for whales

•  Identify whales in aerial photographs

•  ~4500 labeled images, ~450 whales

•  ~7000 test images

•  Pictures taken over 10 years

•  Automating the identification process will aid conservation efforts

•  https://www.kaggle.com/c/noaa-right-whale-recognition

•  $10,000 prize pool

Page 22: Anil Thomas - Object recognition

Right whales

•  One of the most endangered whales

•  Fewer than 500 North Atlantic right whales left

•  Hunted almost to extinction

•  Makes a V-shaped blow

•  Has the largest testicle in the animal kingdom

•  Eats 2000 pounds of plankton a day

Page 23: Anil Thomas - Object recognition

“All y’all look alike!”

Page 24: Anil Thomas - Object recognition

Source: http://rwcatalog.neaq.org/

Page 25: Anil Thomas - Object recognition

Source: https://teacheratsea.files.wordpress.com/2015/05/img_2292.jpg

Page 26: Anil Thomas - Object recognition

Brute force approach

Churchill

Quasimodo

Aphrodite ?

*Not actual names

Page 27: Anil Thomas - Object recognition

A better method

Churchill

Quasimodo

Aphrodite ?

Page 28: Anil Thomas - Object recognition

Object localization

•  Many approaches in the literature

•  OverFeat (http://arxiv.org/pdf/1312.6229v4.pdf)

•  R-CNN (http://arxiv.org/pdf/1311.2524v5.pdf)

Page 29: Anil Thomas - Object recognition

Even better!

Churchill

Quasimodo

Aphrodite ?

Page 30: Anil Thomas - Object recognition

Getting mugshots

•  How to go from [original aerial photo] to [cropped mugshot]?

•  Training set can be manually labeled

•  No manual operations allowed on test set!

•  Estimate the heading (angle) of the whale using a CNN?

Page 31: Anil Thomas - Object recognition

Estimate angle

220°

160°

120° ?

Page 32: Anil Thomas - Object recognition

An easier way to estimate angle

•  Find two points along the whale’s body

•  θ = arctan((y1 – y2) / (x1 – x2))

•  But how do you label the test images?
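
The two-point angle estimate can be sketched as follows. Note one deliberate swap: the slide's formula uses a plain arctan, which cannot tell a heading from its opposite and fails when x1 = x2, so this sketch uses `math.atan2` instead. The function name, the argument order, and the convention of returning degrees in [0, 360) are assumptions made for the example, not details from the talk.

```python
import math

def heading_degrees(x1, y1, x2, y2):
    """Angle of the vector from point 2 to point 1, in degrees within [0, 360)."""
    # atan2 takes the numerator and denominator of the slide's ratio separately,
    # so the quadrant is preserved and x1 == x2 is handled.
    theta = math.degrees(math.atan2(y1 - y2, x1 - x2))
    return theta % 360.0

# Two opposite headings that a plain arctan would both report as 45 degrees.
forward = heading_degrees(1, 1, 0, 0)
backward = heading_degrees(0, 0, 1, 1)
print(forward, backward)
```

In image coordinates the y axis usually points down, so the sign convention of the resulting angle depends on how the two points are ordered and on the coordinate system of the labels.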

Page 33: Anil Thomas - Object recognition

Train with co-ords?

(80, 80)

(90, 130)

(80, 190) ?

Page 34: Anil Thomas - Object recognition

Train with a mask

?
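
The slides do not say how the mask targets were built or how a location was read back from the network output, so the following is only a hypothetical sketch of the idea: turn a labeled point into a small binary mask to use as a training target, and take the brightest pixel of a predicted heatmap as the estimated location. Both helper names, the square shape, and the `radius` parameter are assumptions.

```python
import numpy as np

def point_to_mask(height, width, x, y, radius=2):
    """Binary mask with a filled square of ones centred on (x, y), clipped to the image."""
    mask = np.zeros((height, width), dtype=np.float32)
    top, bottom = max(0, y - radius), min(height, y + radius + 1)
    left, right = max(0, x - radius), min(width, x + radius + 1)
    mask[top:bottom, left:right] = 1.0
    return mask

def heatmap_to_point(heatmap):
    """Return the (x, y) position of the largest value in the heatmap."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)

# Hypothetical 8x8 example: a labeled point at x=5, y=2.
mask = point_to_mask(8, 8, x=5, y=2, radius=1)

# Stand-in for a network prediction: a dim copy of the mask with one bright peak.
heatmap = 0.1 * mask
heatmap[2, 5] = 0.9
predicted = heatmap_to_point(heatmap)
print(mask.sum(), predicted)
```

A mask target like this pairs naturally with a per-pixel cost such as the sum of squares error used in the encoder code later in the deck.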

Page 35: Anil Thomas - Object recognition

Code for convolutional encoder

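# Imports assume the neon 1.x module layout.
# train, val and args are defined elsewhere in the script.
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import Conv, Deconv, GeneralizedCost
from neon.models import Model
from neon.optimizers import Adadelta
from neon.transforms import Rectlin, SumSquared

init = Gaussian(scale=0.1)
opt = Adadelta(decay=0.9)
common = dict(init=init, batch_norm=True, activation=Rectlin())

layers = []
nchan = 128
# Strided conv, then 16 3x3 convs while halving the channel count down to 16
layers.append(Conv((2, 2, nchan), strides=2, **common))
for idx in range(16):
    layers.append(Conv((3, 3, nchan), **common))
    if nchan > 16:
        nchan //= 2  # integer division keeps nchan an int under Python 3
# Deconv stack that maps back to a single-channel output
for idx in range(15):
    layers.append(Deconv((3, 3, nchan), **common))
layers.append(Deconv((4, 4, nchan), strides=2, **common))
layers.append(Deconv((3, 3, 1), init=init))

cost = GeneralizedCost(costfunc=SumSquared())
mlp = Model(layers=layers)
callbacks = Callbacks(mlp, train, eval_set=val, **args.callback_args)
mlp.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)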
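
To see which layer widths the loops above produce, the schedule can be traced in plain Python without neon. This sketch assumes the `if nchan > 16` test sits inside the first loop (the slide's flattened layout does not show indentation); the function name and tuple format are made up for illustration.

```python
def encoder_channel_schedule(start=128, floor=16, n_conv=16, n_deconv=15):
    """Return (layer_type, kernel_size, channels) tuples in the order the layers are added."""
    layers = []
    nchan = start
    layers.append(("conv", 2, nchan))        # strided 2x2 conv
    for _ in range(n_conv):
        layers.append(("conv", 3, nchan))
        if nchan > floor:
            nchan //= 2                      # halve until the floor is reached
    for _ in range(n_deconv):
        layers.append(("deconv", 3, nchan))
    layers.append(("deconv", 4, nchan))      # strided 4x4 deconv
    layers.append(("deconv", 3, 1))          # single-channel output
    return layers

schedule = encoder_channel_schedule()
conv_widths = [chan for kind, _, chan in schedule if kind == "conv"]
print(len(schedule), conv_widths)
```

Under that assumption the network has 34 layers, the conv widths fall from 128 to 64, 32, and then stay at 16, and the final deconv emits one channel.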

Page 36: Anil Thomas - Object recognition

Code for classifier

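# Imports assume the neon 1.x module layout.
# train, val and args are defined elsewhere in the script.
# DropoutBinary is the layer name shown on the slide; later neon releases may name it Dropout.
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import Affine, Conv, DropoutBinary, GeneralizedCost, Pooling
from neon.models import Model
from neon.optimizers import Adadelta
from neon.transforms import CrossEntropyMulti, Rectlin, Softmax

init = Gaussian(scale=0.01)
opt = Adadelta(decay=0.9)
common = dict(init=init, batch_norm=True, activation=Rectlin())

layers = []
nchan = 64
layers.append(Conv((2, 2, nchan), strides=2, **common))
# Six conv + pool blocks, doubling the channel count up to a cap of 1024
for idx in range(6):
    if nchan > 1024:
        nchan = 1024
    layers.append(Conv((3, 3, nchan), strides=1, **common))
    layers.append(Pooling(2, strides=2))
    nchan *= 2
layers.append(DropoutBinary(keep=0.5))
# One output per whale identity
layers.append(Affine(nout=447, init=init, activation=Softmax()))

cost = GeneralizedCost(costfunc=CrossEntropyMulti())
mlp = Model(layers=layers)
callbacks = Callbacks(mlp, train, eval_set=val, **args.callback_args)
mlp.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)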
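
The classifier's channel doubling and overall downsampling can be traced the same way. This is a sketch in plain Python, assuming the conv, pooling, and doubling statements all sit inside the loop; the function name is hypothetical. The stride product ignores the size lost at the borders by unpadded 3x3 convolutions, so it is only a rough measure of spatial reduction.

```python
def classifier_channel_schedule(start=64, cap=1024, n_blocks=6):
    """Channel count of the 3x3 conv in each conv + pool block."""
    widths = []
    nchan = start
    for _ in range(n_blocks):
        if nchan > cap:
            nchan = cap          # clamp before the layer is created
        widths.append(nchan)
        nchan *= 2
    return widths

widths = classifier_channel_schedule()
# One stride-2 conv up front, then one stride-2 pooling layer per block.
total_stride = 2 * 2 ** len(widths)
print(widths, total_stride)
```

The widths come out as 64, 128, 256, 512, 1024, 1024, and the combined stride of the strided conv and six pooling layers is 128.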

Page 37: Anil Thomas - Object recognition

Results: heatmaps

[Figure: input image and predicted heatmaps at epochs 0, 2, 4, and 6]

Prediction indicated by [marker]

Page 38: Anil Thomas - Object recognition

Results: sample crops from test set

Page 39: Anil Thomas - Object recognition

Page 40: Anil Thomas - Object recognition

Acknowledgements

•  NOAA Fisheries

•  Kaggle

•  Developers of sloth

•  Playground Global
