Autoencoders, Unsupervised Learning, and Deep Architectures P. Baldi University of California, Irvine
Page 1: Autoencoders, Unsupervised Learning, and Deep Architectures P. Baldi University of California, Irvine.

Autoencoders, Unsupervised Learning, and Deep Architectures

P. Baldi

University of California, Irvine

Page 2

1. General Definition

2. Historical Motivation (1950s, 1980s, 2010s)

3. Linear Autoencoders over Infinite Fields

4. Non-Linear Autoencoders: the Boolean Case

5. Summary and Speculations

Page 3

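General Definition

• $x_1, \ldots, x_M$: training vectors in $E^N$ (e.g. $E = \mathbb{R}$ or $\{0,1\}$)

• Learn A and B to minimize: $\sum_i \Delta[F_{AB}(x_i) - x_i]$

[Figure: N → H → N architecture. B maps the N input units to the H hidden units; A maps the H hidden units to the N output units.]

Key scaling parameters: N, H, M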
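
The definition above can be sketched in a few lines. This is only an illustration, not code from the talk: the names `autoencoder_error`, `squared_error`, and `hamming`, the two-argument form of the dissimilarity, and the toy encoder/decoder are all assumptions made for the example.

```python
import numpy as np

def autoencoder_error(A, B, X, delta):
    """Sum over the rows x_i of X of delta(A(B(x_i)), x_i)."""
    return sum(delta(A(B(x)), x) for x in X)

def squared_error(y, x):
    """A natural dissimilarity when E is the real numbers."""
    return float(np.sum((y - x) ** 2))

def hamming(y, x):
    """A natural dissimilarity when E = {0, 1}."""
    return int(np.sum(y != x))

# Toy instance with N = 4, H = 2, M = 3 (hypothetical data).
X = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [1, 1, 1, 0]])

def B(x):
    """Encoder: keep the first H = 2 coordinates."""
    return x[:2]

def A(h):
    """Decoder: pad the code with zeros back to N = 4 coordinates."""
    return np.concatenate([h, np.zeros(2, dtype=h.dtype)])

total = autoencoder_error(A, B, X, hamming)  # only the third vector loses a bit
```

Here the bottleneck H = 2 forces the third vector to be reconstructed with one wrong bit, so the total Hamming error is 1.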

Page 4

Autoencoder Zoo

[Diagram: taxonomy of autoencoders]

• Linear: Complex, Real, Finite Fields (GF(2))

• Non-Linear: Boolean, Neural Network (sigmoidal), Boltzmann Machines (RBMs)

• Other diagram labels: Threshold Gates; Boolean/Linear

Page 5

Historical Motivation

• Three time periods: 1950s, 1980s, 2010s.

• Three motivations:
  – Fundamental Learning Problem (1950s)
  – Unsupervised Learning (1980s)
  – Deep Architectures (2010s)

Page 6

2010: Deep Architectures

Page 7

1950s

Page 8

Where do you store your telephone number?

Page 9

THE SYNAPTIC BASIS OF MEMORY CONSOLIDATION

Image credits: © 2007, Paul De Koninck; © 2004, Graham Johnson

Page 10

Scales

Object                 Size in meters     Size x 10^6 (meters)    Comparable object
Diameter of Atom       10^-10             10^-4                   Hair
Diameter of DNA        10^-9              10^-3
Diameter of Synapse    10^-7              10^-1                   Fist
Diameter of Axon       10^-6              1
Diameter of Neuron     10^-5              10                      Room
Length of Axon         10^-3 to 10^0      10^3 to 10^6            Park-Nation
Length of Brain        10^-1              10^5                    State
Length of Body         1                  10^6                    Nation

Page 11

The Organization of Behavior: A Neuropsychological Theory (1949)

Let us assume that the persistence or repetition of a reverberatory activity (or “trace”) tends to induce lasting cellular changes that add to its stability. … When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.

$\Delta w_{ij} \sim x_i x_j$
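
The rule $\Delta w_{ij} \sim x_i x_j$ can be read as an outer-product update. A minimal sketch, assuming a constant of proportionality `lr` and a single activity vector `x` shared by pre- and post-synaptic units (both are illustrative choices, not part of the slide):

```python
import numpy as np

def hebbian_update(W, x, lr=0.1):
    """Return W + lr * outer(x, x), so that w_ij grows in proportion to x_i * x_j."""
    return W + lr * np.outer(x, x)

x = np.array([1.0, 0.0, 1.0])           # units 0 and 2 are active together
W = hebbian_update(np.zeros((3, 3)), x)
```

Only the weights between co-active units (here units 0 and 2) change; every weight touching the silent unit 1 stays at zero.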

Page 12

1980s

• Hopfield

• PDP group

Page 13

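Back-Propagation (1985)

[Figure: back-propagation through a layered network between the INPUT LAYER and the OUTPUT LAYER; weight $w_{ij}$ connects unit j to unit i; error E = F(w)]

Gradient descent: $\Delta w_{ij} = \mu \, \mathrm{out}_j \, \epsilon_i$

$\mu$ = learning rate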
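
As a hedged illustration of the update $\Delta w_{ij} = \mu \, \mathrm{out}_j \, \epsilon_i$, the sketch below applies it to a single sigmoidal output layer. The squared-error loss, the sigmoid units, and the names `output_layer_step` and `error` are assumptions made for the example; the slide does not fix them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(W, out_prev, target):
    """E = 0.5 * ||target - sigmoid(W @ out_prev)||^2."""
    return 0.5 * np.sum((target - sigmoid(W @ out_prev)) ** 2)

def output_layer_step(W, out_prev, target, mu=0.5):
    """One gradient-descent step on E for the output layer.

    eps[i] is the back-propagated error of output unit i, and the update is
    delta w_ij = mu * out_prev[j] * eps[i].
    """
    y = sigmoid(W @ out_prev)
    eps = (target - y) * y * (1.0 - y)
    return W + mu * np.outer(eps, out_prev)

rng = np.random.default_rng(0)
W0 = rng.normal(size=(2, 3))            # 3 units feeding 2 output units
out_prev = np.array([0.2, -0.4, 1.0])   # activities of the previous layer
target = np.array([1.0, 0.0])
W1 = output_layer_step(W0, out_prev, target)
```

With this small learning rate the error after the step, `error(W1, out_prev, target)`, is lower than before it.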

Page 14

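First Autoencoder

• $x_1, \ldots, x_M$: training points (real-valued vectors)

• Learn A and B to minimize $\sum_i \|F_{AB}(x_i) - x_i\|^2$

[Figure: N → H → N architecture with sigmoidal neurons in the hidden layer (computed by B) and in the output layer (computed by A)]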

Page 15
Page 16

Linear Autoencoder

• $x_1, \ldots, x_M$: training vectors over $\mathbb{R}^N$

• Find two matrices A and B that minimize:

$\sum_i \|AB(x_i) - x_i\|^2$

[Figure: N → H → N linear architecture; B maps N to H, A maps H to N]

Page 17

Linear Autoencoder Theorem ($\mathbb{R}$)

• A and B are defined only up to group multiplication by an invertible H×H matrix C: $W = AB = (AC^{-1})(CB)$.

• Although the cost function is quadratic and the transformation W = AB is linear, the problem is NOT convex.

• The problem becomes convex if A or B is fixed. Assuming $\Sigma_{XX}$ is invertible and the covariance matrix has full rank: $B^* = (A^t A)^{-1} A^t$ and $A^* = \Sigma_{XX} B^t (B \Sigma_{XX} B^t)^{-1}$.

• Alternate minimization of A and B is an EM algorithm.
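
The two closed-form updates can be alternated directly. The following is a minimal sketch under stated assumptions: the function names, the random initialization, the fixed iteration count, and the synthetic data are all illustrative, and the code assumes that A keeps rank H and that $\Sigma_{XX}$ is invertible.

```python
import numpy as np

def fit_linear_autoencoder(X, H, n_iter=50, seed=0):
    """Alternate B* = (A^t A)^-1 A^t and A* = S B^t (B S B^t)^-1.

    X has shape (M, N), one training vector per row.
    Returns A (N x H, decoder) and B (H x N, encoder).
    """
    M, N = X.shape
    S = X.T @ X / M                                   # plays the role of Sigma_XX
    A = np.random.default_rng(seed).normal(size=(N, H))
    for _ in range(n_iter):
        B = np.linalg.solve(A.T @ A, A.T)             # best B for the current A
        A = S @ B.T @ np.linalg.inv(B @ S @ B.T)      # best A for the current B
    B = np.linalg.solve(A.T @ A, A.T)
    return A, B

def reconstruction_error(A, B, X):
    """Sum over the rows x_i of X of ||AB x_i - x_i||^2."""
    return np.sum((X @ (A @ B).T - X) ** 2)

# Synthetic data with clearly separated variances along 5 axes (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.25])
A, B = fit_linear_autoencoder(X, H=2)
```

On data like this, the error reached by the alternation should match the error of projecting onto the top H eigenvectors of `S`, which is M times the sum of the N - H smallest eigenvalues.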

Page 18

Linear Autoencoder Theorem ($\mathbb{R}$)

• The overall landscape of E has no local minima. All the critical points, where the gradient is 0, are associated with projections onto subspaces spanned by H eigenvectors of the covariance matrix. At any critical point, $A = U_I C$ and $B = C^{-1} U_I^t$, where the columns of $U_I$ are the H eigenvectors of $\Sigma_{XX}$ associated with the index set I. In this case, $W = AB = P_{U_I}$ corresponds to a projection.

• Generalization is easy to measure and understand.

• Projections onto the top H eigenvectors correspond to a global minimum. All other critical points are saddle points.
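
A small numerical check of this statement, written as a sketch: it builds $A = U_I C$ and $B = C^{-1} U_I^t$ (with $U_I^t$ the transpose of $U_I$) for every index set I of size H and compares the resulting errors. The synthetic data, the particular matrix C, and the helper names are assumptions for illustration only; the check does not test the saddle-point claim.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([2.0, 1.5, 1.0, 0.5])   # hypothetical data
M, N = X.shape
H = 2
S = X.T @ X / M
eigvals, U = np.linalg.eigh(S)      # eigenvalues in ascending order, orthonormal columns

def critical_point(I, C):
    """A = U_I C and B = C^-1 U_I^t for an index set I and an invertible C."""
    U_I = U[:, list(I)]
    return U_I @ C, np.linalg.inv(C) @ U_I.T

def error(A, B):
    return np.sum((X @ (A @ B).T - X) ** 2)

C = rng.normal(size=(H, H)) + 3.0 * np.eye(H)    # an arbitrary invertible H x H matrix
errors = {I: error(*critical_point(I, C)) for I in combinations(range(N), H)}
best = min(errors, key=errors.get)               # index set with the lowest error
```

Because `eigh` sorts eigenvalues in ascending order, `best` picks out the two largest ones, and each error equals M times the sum of the eigenvalues left out of I, independently of C.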

Page 19

Landscape of E

[Figure: landscape of the error function E]

Page 20

Linear Autoencoder Theorem ($\mathbb{R}$)

• Thus any critical point performs a form of clustering by hyperplane. For any vector x, all the vectors of the form x + Ker B are mapped onto the same vector $y = AB(x) = AB(x + \mathrm{Ker}\, B)$.

• At any critical point where C = Identity, $A = B^t$. The constraint $A = B^t$ can be imposed during learning by weight sharing, or symmetric connections, and is consistent with a Hebbian rule that is symmetric between pre- and post-synaptic units (folded autoencoder, or clamping input and output units).
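
The "clustering by hyperplane" property is easy to check numerically in the C = Identity case. In this sketch a random orthonormal matrix `U` stands in for the eigenvector matrix $U_I$ (an assumption: the property being illustrated only needs orthonormal columns), and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H = 5, 2
U, _ = np.linalg.qr(rng.normal(size=(N, H)))   # N x H matrix with orthonormal columns
A, B = U, U.T                                  # C = Identity, so A = B^t

x = rng.normal(size=N)
v = rng.normal(size=N)
k = v - U @ (U.T @ v)      # part of v orthogonal to the columns of U, so B @ k = 0

y1 = A @ B @ x             # image of x
y2 = A @ B @ (x + k)       # image of another point of x + Ker B
```

Both points land on the same output, and feeding that output back through the network leaves it unchanged.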

Page 21

Linear Autoencoder Theorem ($\mathbb{R}$)

• At any critical point, reverberation is stable: for every x, $(AB)^2 x = ABx$.

• The global minimum remains the same if additional matrices of rank ≥ H are introduced anywhere in the architecture. There is no gain in expressivity by adding such matrices.

• However, such matrices could be introduced for other reasons. Vertical composition law: “$N\,H_1\,H\,H_1\,N \sim N\,H_1\,N + H_1\,H\,H_1$”

• Results can be extended to the linear case with given output targets and to the complex field.
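
The stability of reverberation follows from the form of the critical points. A short derivation, assuming $A = U_I C$ and $B = C^{-1} U_I^t$, where $U_I^t$ is the transpose of $U_I$ and the columns of $U_I$ are orthonormal (so $U_I^t U_I$ is the H×H identity):

```latex
\begin{aligned}
(AB)^2 &= (U_I C\, C^{-1} U_I^t)\,(U_I C\, C^{-1} U_I^t) \\
       &= U_I \,(U_I^t U_I)\, U_I^t \\
       &= U_I U_I^t = AB .
\end{aligned}
```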

Page 22

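Vertical Composition

• $N\,H_1\,H\,H_1\,N \sim N\,H_1\,N + H_1\,H\,H_1$

[Figure: an architecture with layer sizes N, H1, H, H1, N shown alongside an N, H1, N architecture and an H1, H, H1 architecture]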

Page 23

Linear Autoencoder Theorem ($\mathbb{R}$)

• At any critical point, reverberation is stable: $(AB)^2 x = ABx$.

• The global minimum remains the same if additional matrices of rank ≥ H are introduced anywhere in the architecture. There is no gain in expressivity by adding such matrices.

• However, such matrices could be introduced for other reasons. Vertical composition law: “$N\,H_1\,H\,H_1\,N \sim N\,H_1\,N + H_1\,H\,H_1$”

• Results can be extended to the linear case with given output targets and to the complex field.

• Provides some intuition for the non-linear case.

Page 24

Boolean Autoencoder

Page 25

Boolean Autoencoder

• $x_1, \ldots, x_M$: training vectors over $\mathbb{H}^N$ (binary)

• Find Boolean functions A and B that minimize:

$\sum_i H[AB(x_i), x_i]$

H = Hamming distance

• Variation 1: Enforce $AB(x_i) \in \{x_1, \ldots, x_M\}$

• Variation 2: Restrict A and B (connectivity, threshold gates, etc)

Page 26

Boolean Autoencoder

Fix A

Page 27

Boolean Autoencoder

Fix A

h=10010

Page 28

Boolean Autoencoder

Fix A

h=10010

y=A(h)=11010110010

Page 29

Boolean Autoencoder

Fix A

h=10010

y=A(h)=11010110010

A(h1)

A(h2)

A(h3)

Page 30

Autoencoder

Fix A

h=10010

y=A(h)=11010110010

B({Voronoi A(h)}) =h

A(h1)

A(h2)

A(h3)

Page 31

Autoencoder

Fix A

h=10010

y=A(h)=11010110010

B({Voronoi A(h)}) =h

A(h1)

A(h2)

A(h3)

Page 32

Boolean Autoencoder

Fix B

Page 33

Boolean Autoencoder

Fix B

h=10100

A

Page 34

Boolean Autoencoder

Fix B

h=10100

A

A(h)=?

Page 35

Boolean Autoencoder

Fix B

00110101001

11010100101

10101010101

h=10100

A

A(h)=?

Page 36

Boolean Autoencoder

Fix B

00110101001

11010100101

10101010101

h=10100

A

A(h)=10110100101

Page 37

Boolean Autoencoder

Fix B

00110101001

11010100101

10101010101

h=10100

A

A(h)=10110100101

A(h) = Majority[$B^{-1}(h)$]

Page 38

Boolean Autoencoder Theorem

• A and B are defined only up to the group of permutations of the $2^H$ points in the H-dimensional hypercube of the hidden layer.

• The overall optimization problem is non-trivial. Polynomial-time solutions exist when H is held constant (centroids in the training set). When $H \sim \epsilon \log N$ the problem becomes NP-complete.

• The problem has a simple solution when A is fixed or B is fixed: $A^*(h) = \mathrm{Majority}\{B^{-1}(h)\}$ and $B^*(\{\mathrm{Voronoi}\ A(h)\}) = h$ [$B^*(x) = h$ such that A(h) is closest to x among $\{A(h)\}$].

• Every “critical point” ($A^*$ and $B^*$) corresponds to a clustering into $K = 2^H$ clusters. The optimum corresponds to the best clustering. (Maximum?) Plenty of approximate algorithms exist: k-means, hierarchical clustering, belief propagation (centroids in training set).

• Generalization is easy to measure and understand.
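
The two fixed-side solutions suggest a k-means-style alternation in Hamming space. The sketch below is one such approximate scheme, not the method of the talk: the function names, the random initialization from training vectors, the tie-breaking rule in `majority`, and the use of a cluster index in place of an explicit hidden code h are all assumptions.

```python
import numpy as np

def majority(rows):
    """Bitwise majority vote over the rows of a 0/1 array (ties go to 1, an assumption)."""
    rows = np.asarray(rows)
    return (2 * rows.sum(axis=0) >= len(rows)).astype(int)

def fit_boolean_autoencoder(X, K, n_iter=20, seed=0):
    """Alternate B* (nearest A(h) in Hamming distance) and A* (majority of B^-1(h)).

    X: (M, N) array of 0/1 integers. K plays the role of 2^H; each cluster
    index stands for one hidden code h. Returns the vectors A(h) and B(x_i).
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # B*: send each x to the code h whose A(h) is closest to x.
        dist = (X[:, None, :] != centroids[None, :, :]).sum(axis=2)
        assign = dist.argmin(axis=1)
        # A*: majority vote over B^-1(h); an empty cluster keeps its old A(h).
        for h in range(K):
            if np.any(assign == h):
                centroids[h] = majority(X[assign == h])
    dist = (X[:, None, :] != centroids[None, :, :]).sum(axis=2)
    return centroids, dist.argmin(axis=1)

# The three vectors shown on the "Fix B" slides and their bitwise majority.
cluster = np.array([list(map(int, s)) for s in
                    ("00110101001", "11010100101", "10101010101")])
a_h = "".join(map(str, majority(cluster)))
```

For the three vectors of the "Fix B" slides, the bitwise majority reproduces the value A(h)=10110100101 given there.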

Page 39

Boolean Autoencoder Theorem

• At any critical point, reverberation is stable.

• The global minimum remains the same if additional Boolean functions with layers of size ≥ H are introduced anywhere in the architecture. There is no gain in expressivity by adding such functions.

• However, such functions could be introduced for other reasons. Composition law: “$N\,H_1\,H\,H_1\,N \sim N\,H_1\,N + H_1\,H\,H_1$”. Can achieve hierarchical clustering in input space.

• Results can be extended to the case with given output targets.

Page 40

Learning Complexity

• Linear autoencoder over infinite fields can be solved analytically.

• Boolean autoencoder is NP-complete as soon as the number of clusters ($K = 2^H$) scales like $M^\epsilon$ (for $\epsilon > 0$). It is solvable in polynomial time when K is fixed.

• Linear autoencoder over finite fields is NP-complete in the general case.

• RBM learning is NP-complete in the general case.

Page 41

Embedding of Square Lattice in Hypercube

• 4x3 square lattice with embedding in $\mathbb{H}^7$

[Figure: square lattice with vertex labels 0000000, 1111111, 1111000, 0000111]
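
One embedding consistent with the four labels above is a pair of unary ("thermometer") codes, one per lattice axis. Whether this is the exact construction drawn on the slide is an assumption; the function names and the coordinate convention (i from 0 to 4, j from 0 to 3) are illustrative.

```python
def thermometer(k, width):
    """Unary code: k ones followed by width - k zeros."""
    return "1" * k + "0" * (width - k)

def embed(i, j, a=4, b=3):
    """Map lattice point (i, j), 0 <= i <= a and 0 <= j <= b, to a vertex of the (a+b)-cube."""
    return thermometer(i, a) + thermometer(j, b)

def hamming(u, v):
    """Number of positions where the bit strings u and v differ."""
    return sum(c != d for c, d in zip(u, v))

corners = [embed(0, 0), embed(4, 3), embed(4, 0), embed(0, 3)]
```

With this code the four corners reproduce the labels on the slide, and the Hamming distance between two embedded points equals their lattice (city-block) distance.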

Page 42

Vertical Composition

Page 43

Horizontal Composition

Page 44

Autoencoders with H>N

• Identity provides trivial solution

• Regularization // Horizontal Composition // Noise

Page 45

Information and Coding (Transmission and Storage)

[Figure: message and parity bits sent through a noisy channel, yielding the decoded message]

Page 46

Summary and Speculations

[Diagram: taxonomy of autoencoders]

• Linear: Complex, Real, Finite Fields (GF(2))

• Non-Linear: Boolean, Neural Network (sigmoidal), Boltzmann Machines (RBMs)

• Other diagram labels: Boolean/Linear over GF(2); Boolean/Linear over R or C; Threshold Gates

Page 47

Unsupervised Learning

Autoencoders

Clustering

Hebbian Learning

Page 48

Information and Coding Theory

Autoencoders

Compression

Communication

Page 49

Deep Architectures

Autoencoders

Vertical Composition

Horizontal Composition

Page 50

Summary and Speculations

• Unsupervised Learning: Hebb, Autoencoders, RBMs, Clustering

• Conceptually clustering is the fundamental operation

• Clustering can be combined with targets

• Clustering is composable: horizontally, vertically, recursively, etc.

• Autoencoders implement clustering and labeling simultaneously

• Deep architecture conjecture

