NLP - Yale University

Post on 17-Jun-2020


NLP

Libraries for Deep Learning

Deep Learning

Matrix Multiplication in Python

http://stackoverflow.com/questions/10508021/matrix-multiplication-in-python
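As a minimal sketch of what plain-Python matrix multiplication can look like, the function below uses nested lists and a list comprehension. The name `matmul` and the example matrices are illustrative assumptions, not taken from the slides or the linked question.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # zip(*b) iterates over the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


a = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
b = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

product = matmul(a, b)
print(product)  # [[58, 64], [139, 154]]
```

This is fine for small inputs, but the Python-level loops make it slow for large matrices, which is one reason to use NumPy or a deep learning library.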

Matrix Multiplication in Numpy
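The content of this slide is not reproduced in the transcript. A minimal sketch of the same kind of product in NumPy, with illustrative matrices that are assumptions rather than the slide's own example:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([[7, 8],
              [9, 10],
              [11, 12]])       # shape (3, 2)

product = np.dot(a, b)         # equivalently: a @ b
print(product.shape)           # (2, 2)
print(product)
```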

Libraries for Deep Learning

• Theano (Python): http://deeplearning.net/software/theano/

• Lasagne:
  – Built on top of Theano, has pre-designed networks: https://github.com/Lasagne/Lasagne

• Torch (Lua):
  – http://torch.ch/

• TensorFlow (Python and C++):
  – https://www.tensorflow.org/

• Keras

Matrix Multiplication in Theano

import theano
import theano.tensor as T
import numpy as np

# "symbolic" variables
x = T.matrix('x')
y = T.matrix('y')
dot = T.dot(x, y)

# this is the slow part: compiling the graph into a callable function
f = theano.function([x, y], [dot])

# now we can use this function
a = np.random.random((2, 3))
b = np.random.random((3, 4))
c = f(a, b)[0]  # 2x4 product; f returns a list because outputs is a list

Sigmoid in Theano

inp = T.vector('in')  # "in" is a reserved word in Python, so it cannot be a variable name
sigmoid = 1 / (1 + T.exp(-inp))
# same as T.nnet.sigmoid
sigmoid = T.nnet.sigmoid(inp)
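For comparison, the same elementwise formula can be evaluated eagerly in NumPy. This is a sketch only: the helper name `sigmoid` and the sample inputs are assumptions, and nothing here is compiled symbolically as it would be in Theano.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

values = sigmoid([-2.0, 0.0, 2.0])
print(values)  # the middle entry is 0.5; the outer two sum to 1
```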

Shared Variables vs Symbolic Variables

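# This is symbolic: it has no value until the function is called
x = T.matrix('x')

# shared means that it is not purely symbolic: it holds a value
# that persists between function calls
w = theano.shared(np.random.randn(n))  # n: number of input features, defined elsewhere
b = theano.shared(0.)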

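Computational Graph

# This is symbolic: it has no value until the function is called
x = T.matrix('x')
y = T.vector('y')  # target labels (0 or 1); used below

# shared means that it is not purely symbolic: it holds a value
# that persists between function calls
w = theano.shared(np.random.randn(n))  # n: number of input features, defined elsewhere
b = theano.shared(0.)

# Computational Graph
p_1 = T.nnet.sigmoid(T.dot(x, w) + b)  # probability that the label is 1
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1)  # Cross-entropy
cost = xent.mean()  # The cost to minimize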

Automatic Gradient Computation

p_1 = T.nnet.sigmoid(T.dot(x, w) + b)
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1)  # Cross-entropy
cost = xent.mean()  # The cost to minimize

# gradients of the cost with respect to w and b, derived automatically
gw, gb = T.grad(cost, [w, b])
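To see what T.grad computes for this particular cost, the gradients can also be written by hand and checked numerically. The NumPy sketch below assumes the logistic model and mean cross-entropy above; the function name `cost_and_grads` and the random data are illustrative assumptions.

```python
import numpy as np

def cost_and_grads(X, y, w, b):
    """Mean cross-entropy of a logistic model and its gradients w.r.t. w and b."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    cost = np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))
    gw = X.T @ (p - y) / len(y)   # analytic gradient w.r.t. w
    gb = np.mean(p - y)           # analytic gradient w.r.t. b
    return cost, gw, gb

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
w = rng.normal(size=3)
b = 0.0

cost, gw, gb = cost_and_grads(X, y, w, b)

# Central finite difference along the first weight as a sanity check.
eps = 1e-6
step = np.array([eps, 0.0, 0.0])
numeric_gw0 = (cost_and_grads(X, y, w + step, b)[0]
               - cost_and_grads(X, y, w - step, b)[0]) / (2 * eps)
print(gw[0], numeric_gw0)  # the two values should agree closely
```

Automatic differentiation removes the need to derive and maintain expressions like `gw` and `gb` by hand, which matters once the graph is larger than a single logistic unit.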

Compile a Function

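# "prediction" is not defined on the slides; the Theano logistic
# regression tutorial thresholds the probability like this
prediction = p_1 > 0.5

train = theano.function(
    inputs=[x, y],
    outputs=[prediction, xent],
    # after each call, replace w and b with one gradient descent step (learning rate 0.1)
    updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))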
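The effect of calling the compiled function repeatedly can be imitated in NumPy. This is a sketch under stated assumptions: the toy data, the `params` dictionary standing in for the shared variables, and the hand-written gradients are all illustrative, and only the 0.1 learning rate and the returned outputs mirror the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy, linearly separable labels

# Plays the role of the shared variables: state that survives between calls.
params = {"w": np.zeros(2), "b": 0.0}

def train(X, y, lr=0.1):
    """Return (prediction, xent), then apply one gradient step to params."""
    w, b = params["w"], params["b"]
    p_1 = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)
    gw = X.T @ (p_1 - y) / len(y)
    gb = np.mean(p_1 - y)
    params["w"] = w - lr * gw
    params["b"] = b - lr * gb
    return p_1 > 0.5, xent

costs = []
for _ in range(100):
    prediction, xent = train(X, y)
    costs.append(xent.mean())

print(costs[0], costs[-1])  # the mean cross-entropy goes down over the updates
```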

Computation Graphs in Theano

Computation Graphs in Tensorflow

LSTM Sentiment Analysis Demo

• If you’re new to deep learning and want to work with Theano, do yourself a favor and work through http://deeplearning.net/tutorial/

• An LSTM demo is described here: http://deeplearning.net/tutorial/lstm.html

• Sentiment analysis model trained on IMDB movie reviews

LSTMs: One Time Step

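[Figure: one LSTM time step. The input x1, previous hidden state h0, and previous cell state c0 feed the gates i1, f1, and o1 (σ) and the tanh candidate c̃1, which produce the new cell state c1 and hidden state h1.]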

[Slides from Catherine Finegan-Dollak]

LSTMs: Building a Sequence

The cat sat on …

Theano Implementation of an LSTM Step

(lstm.py, L. 174)

def _step(m_, x_, h_, c_):
    preact = tensor.dot(h_, tparams[_p(prefix, 'U')])
    preact += x_

    i = tensor.nnet.sigmoid(_slice(preact, 0, options['dim_proj']))
    f = tensor.nnet.sigmoid(_slice(preact, 1, options['dim_proj']))
    o = tensor.nnet.sigmoid(_slice(preact, 2, options['dim_proj']))
    c = tensor.tanh(_slice(preact, 3, options['dim_proj']))

    c = f * c_ + i * c
    c = m_[:, None] * c + (1. - m_)[:, None] * c_

    h = o * tensor.tanh(c)
    h = m_[:, None] * h + (1. - m_)[:, None] * h_

    return h, c

“preact” is the sum of Wx (passed in as x_) and the dot product of the previous step’s h with the weight matrix U. U concatenates Ui, Uf, Uo, and Uc for computational efficiency; W does the same with all the W matrices. The _slice function then splits the result back into four blocks to generate the three gates, i, f, and o, and the candidate cell state C̃.
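The concatenation trick can be checked in isolation. The NumPy sketch below assumes small illustrative sizes and random weights; `slice_block` is a hypothetical stand-in for the tutorial's _slice helper in the two-dimensional case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, dim_proj = 2, 3

h_prev = rng.normal(size=(n_samples, dim_proj))
# Four separate recurrent weight matrices, one per gate or candidate.
U_i, U_f, U_o, U_c = (rng.normal(size=(dim_proj, dim_proj)) for _ in range(4))

# Concatenate along the output axis: shape (dim_proj, 4 * dim_proj).
U = np.concatenate([U_i, U_f, U_o, U_c], axis=1)

def slice_block(x, n, dim):
    """Return the n-th block of width dim along the last axis."""
    return x[:, n * dim:(n + 1) * dim]

preact = h_prev @ U    # one large matrix product instead of four small ones
blocks = [slice_block(preact, n, dim_proj) for n in range(4)]
separate = [h_prev @ U_k for U_k in (U_i, U_f, U_o, U_c)]
print(all(np.allclose(a, b) for a, b in zip(blocks, separate)))  # True
```

One large product typically makes better use of the underlying linear algebra routines than four small ones, which is the efficiency argument made above.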

m_ is a mask, used for dealing with variable-length input.
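A small NumPy sketch of the masking lines in _step, using made-up values: where the mask is 1 the new state is kept, and where it is 0 (padding past the end of a shorter sequence) the previous state is carried forward unchanged.

```python
import numpy as np

# Two sequences in a batch; at this time step the second one is already over.
m = np.array([1.0, 0.0])                 # mask for this step, shape (n_samples,)
h_prev = np.array([[0.1, 0.2],
                   [0.3, 0.4]])          # state carried from the previous step
h_new = np.array([[0.9, 0.8],
                  [0.7, 0.6]])           # state computed at this step

# m[:, None] has shape (n_samples, 1) and broadcasts across the hidden units.
h = m[:, None] * h_new + (1.0 - m)[:, None] * h_prev
print(h)  # first row comes from h_new, second row from h_prev
```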

theano.scan iterates through a series of steps

rval, updates = theano.scan(_step,
                            sequences=[mask, state_below],
                            outputs_info=[tensor.alloc(numpy_floatX(0.),
                                                       n_samples,
                                                       dim_proj),
                                          tensor.alloc(numpy_floatX(0.),
                                                       n_samples,
                                                       dim_proj)],
                            name=_p(prefix, '_layers'),
                            n_steps=nsteps)

(lstm.py, L. 195)
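What the scan call does can be approximated with an ordinary Python loop. The NumPy sketch below is not Theano code: `lstm_step` is a hypothetical eager counterpart of _step, the sizes and random values are assumptions, and `state_below` is assumed to already hold the input projection for every time step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(m, x, h_prev, c_prev, U, dim):
    """One masked LSTM step; x already holds the input projection."""
    preact = h_prev @ U + x
    i = sigmoid(preact[:, 0 * dim:1 * dim])
    f = sigmoid(preact[:, 1 * dim:2 * dim])
    o = sigmoid(preact[:, 2 * dim:3 * dim])
    c_tilde = np.tanh(preact[:, 3 * dim:4 * dim])
    c = f * c_prev + i * c_tilde
    c = m[:, None] * c + (1.0 - m)[:, None] * c_prev
    h = o * np.tanh(c)
    h = m[:, None] * h + (1.0 - m)[:, None] * h_prev
    return h, c

rng = np.random.default_rng(0)
n_steps, n_samples, dim_proj = 4, 2, 3
U = rng.normal(size=(dim_proj, 4 * dim_proj))
state_below = rng.normal(size=(n_steps, n_samples, 4 * dim_proj))
# The second sequence has only two real time steps; the rest is padding.
mask = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]])

# Initial states are zeros, matching the two entries of outputs_info.
h = np.zeros((n_samples, dim_proj))
c = np.zeros((n_samples, dim_proj))
hs = []
for m_t, x_t in zip(mask, state_below):   # iterate over the leading (time) axis
    h, c = lstm_step(m_t, x_t, h, c, U, dim_proj)
    hs.append(h)
hs = np.stack(hs)                          # shape (n_steps, n_samples, dim_proj)
print(hs.shape)  # (4, 2, 3)
```

As in the scan call, the step function receives the current slice of each sequence first and the previous outputs after that, and the outputs of one step become the state inputs of the next.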

Links About Deep Learning

• Long lists of resources and papers:
  – http://www.cs.yale.edu/homes/radev/dlnlp2017.pdf
  – http://clair.si.umich.edu/~radev/dl/dl.pdf

• Richard Socher’s Stanford class
  – http://cs224d.stanford.edu/

• Learn Theano + deep learning in one tutorial
  – http://deeplearning.net/tutorial/
