
CSC2535: Advanced Machine Learning Lecture 11b Adaptation at multiple time-scales

Transcript
Page 1

CSC2535: Advanced Machine Learning
Lecture 11b: Adaptation at multiple time-scales

Geoffrey Hinton

Page 2

An overview of how biology solves search problems

• Searching for good combinations can be very slow if it's done in a naive way.
• Evolution has found many ways to speed up searches.
  – Evolution works too well to be blind. It is being guided.
  – It has discovered much better methods than the dumb trial-and-error method that many biologists seem to believe in.

Page 3

Some search problems in biology

• Searching for good genes and good policies for when to express them.
  – To understand how evolution is so efficient, we need to understand forms of search that work much better than random trial and error.
• Searching for good policies for when to contract muscles.
  – Motor control works much too well for a system with a 30-millisecond feedback loop.
• Searching for the right synapse strengths to represent how the world works.
  – Learning works much too well to be blind trial and error. It must be doing something smarter than just randomly perturbing synapse strengths.

Page 4

A way to make searches work better

• In high-dimensional spaces, it is a very bad idea to try making multiple random changes.
  – It's impossible to learn a billion synapse strengths by randomly changing synapses.
  – Once the system is significantly better than random, almost all combinations of random changes will make it worse.
• It is much more effective to compute a gradient and change things in the direction that makes things better (see the sketch below).
  – That's what brains are for. They are devices for computing gradients. What of?
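To make the contrast concrete, here is a minimal sketch (the quadratic "fitness", the dimension, and the step sizes are illustrative assumptions, not anything from the lecture) comparing accept-if-better random perturbation with plain gradient descent on the same high-dimensional problem:

```python
# A minimal sketch (illustrative only): compare blind random perturbation
# with gradient descent on a simple high-dimensional quadratic "fitness".
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                       # number of "synapses"
w_star = rng.standard_normal(d)  # unknown optimum
loss = lambda w: np.sum((w - w_star) ** 2)

w_rand = np.zeros(d)             # random-perturbation search
w_grad = np.zeros(d)             # gradient descent
for step in range(200):
    # Random search: make a joint random change and keep it only if it helps.
    candidate = w_rand + 0.01 * rng.standard_normal(d)
    if loss(candidate) < loss(w_rand):
        w_rand = candidate
    # Gradient descent: move along the direction of steepest improvement.
    w_grad -= 0.1 * 2 * (w_grad - w_star)

print(f"random-perturbation loss: {loss(w_rand):.1f}")
print(f"gradient-descent loss:    {loss(w_grad):.1f}")
```

After a couple of hundred iterations the gradient method is essentially at the optimum, while the random-perturbation search has barely moved, and the gap only widens as the dimensionality grows.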

Page 5

A different way to make searches work better

• It is much easier to search a fitness landscape that has smooth hills rather than sharp spikes.
  – Fast adaptive processes can change the fitness landscape to make search much easier for slow adaptive processes.

Page 6

An example of a fast adaptive process changing the fitness landscape for a slower one

• Consider the task of drawing on a blackboard.
  – It is very hard to do with a dumb robot arm:
    • If the robot positions the tip of the chalk just beyond the board, the chalk breaks.
    • If the robot positions the chalk just in front of the board, the chalk doesn't leave any marks.
• We need a very fast feedback loop that uses the force exerted by the board on the chalk to stop the chalk.
  – Neural feedback is much too slow for this.

Page 7

A biological solution

• Set the relative stiffnesses of opposing muscles so that the equilibrium point has the tip of the chalk just beyond the board.

• Set the absolute stiffnesses so that small perturbations from equilibrium only cause small forces (this is called “compliance”).

• The feedback loop is now in the physical system, so it works at the speed of shock waves in the arm.
  – The feedback in the physics makes a much nicer fitness landscape for learning how to set the muscle stiffnesses.

Page 8

The energy landscape created by two opposing muscles

[Figure: physical energy in the opposing springs plotted against the location of the endpoint, with the start position and the location of the board marked.]

The difference of the two muscle stiffnesses determines where the minimum is. The sum of the stiffnesses determines how sharp the minimum is.
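A minimal worked version of that caption, assuming the two opposing muscles act like linear springs with stiffnesses k1, k2 and rest positions a1, a2 (these symbols and the numbers below are illustrative assumptions, not values from the lecture):

```python
# A minimal sketch, assuming two opposing muscles behave like linear springs
# with stiffnesses k1, k2 pulling toward rest positions a1, a2 (illustrative values).
import numpy as np

k1, k2 = 3.0, 1.0          # stiffnesses of the two opposing "muscles"
a1, a2 = 0.0, 1.0          # rest positions each spring pulls toward

def energy(x):
    """Total elastic energy stored in the two opposing springs at endpoint x."""
    return 0.5 * k1 * (x - a1) ** 2 + 0.5 * k2 * (x - a2) ** 2

# Closed form: the minimum (the equilibrium point of the chalk tip) is the
# stiffness-weighted average of the rest positions; the curvature (how sharp
# the minimum is) is the sum of the stiffnesses.
x_eq = (k1 * a1 + k2 * a2) / (k1 + k2)

# Numerical check against a fine grid of endpoint locations.
xs = np.linspace(-1.0, 2.0, 100_001)
print("analytic equilibrium:", x_eq)                       # 0.25
print("numeric equilibrium: ", xs[np.argmin(energy(xs))])  # ~0.25
print("curvature (k1 + k2): ", k1 + k2)
```

With the sum k1 + k2 held fixed, changing the difference of the stiffnesses slides the equilibrium point along the line; scaling both stiffnesses together changes only how sharp the minimum is, which is the compliance knob from the previous slide.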

Page 9

Two fitness landscapes

• System that directly specifies joint angles

• System that specifies spring stiffnesses

[Figure: fitness plotted against neural signals for each of the two systems.]

Page 10

Objective functions versus programs

• By setting the muscle stiffnesses, the brain creates an energy function.
  – Minimizing this energy function is left to the physics.
  – This allows the brain to explore the space of objective functions (i.e. energy landscapes) without worrying about how to minimize the objective function.
• Slow adaptive processes should interact with fast ones by creating objective functions for them to optimize.
  – Think how a general interacts with soldiers. He specifies their goals.
  – This avoids micro-management.

Page 11

Generating the parts of an object

[Figure: “square” + pose parameters → sloppy top-down activation of parts → clean-up using lateral interactions specified by the layer above → parts with top-down support.]

It's like soldiers on a parade ground.

Page 12

Another example of the same principle

• The principle: Use fast adaptive processes to make the search easier for slow ones.

• An application: Make evolution go a lot faster by using a learning algorithm to create a much nicer fitness landscape (the Baldwin effect).

• Almost all of the search is done by the learning algorithm, but the results get hard-wired into the DNA.
  – It's strictly Darwinian even though it achieves most of what Lamarck wanted.

Page 13

A toy example to explain the idea

• Consider an organism that has a mating circuit containing 20 binary switches. If exactly the right subset of the switches are closed, it mates very successfully. Otherwise not.

– Suppose each switch is governed by a separate gene that has two alleles.

– The search landscape for unguided evolution is a one-in-a-million spike.

• Blind evolution has to build about a million organisms to get one good one.

– Even if it finds a good one, that combination of genes will almost certainly be destroyed in the next generation by crossover.

Page 14

Guiding evolution with a fast adaptive process (godless intelligent design :-)

• Suppose that each gene has three alleles: ON, OFF, and “leave it to learning”.
  – ON and OFF are decisions hard-wired into the DNA.
  – “Leave it to learning” means that on each learning trial, the switch is set randomly.
• Now consider organisms that have 10 switches hard-wired and 10 left to learning.
  – One in a thousand will have the correct hard-wired decisions, and with only about a thousand learning trials, all 20 switches will be correct (see the simulation sketch below).
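A minimal simulation sketch of the numbers on this slide (the random seed, repeat counts, and helper names are illustrative assumptions, not the settings of the Hinton and Nowlan simulation):

```python
# A minimal sketch of the toy search problem (illustrative only; this is not
# the exact Hinton & Nowlan 1987 setup): 20 binary switches, and only one
# setting of all 20 gives a successful mating circuit.
import numpy as np

rng = np.random.default_rng(1)
target = rng.integers(0, 2, size=20)       # the single good combination

def learning_trials_needed(plastic_idx, max_trials=100_000):
    """Count random learning trials over the plastic switches until the whole
    circuit matches the target. The hard-wired switches are already correct."""
    genome = target.copy()                 # hard-wired part set correctly
    for trial in range(1, max_trials + 1):
        genome[plastic_idx] = rng.integers(0, 2, size=len(plastic_idx))
        if np.array_equal(genome, target):
            return trial
    return max_trials

# An organism with 10 correct hard-wired switches and 10 left to learning.
plastic = np.arange(10, 20)
trials = [learning_trials_needed(plastic) for _ in range(200)]
print("average learning trials:", round(np.mean(trials)))   # about 2**10 = 1024

# Blind evolution over all 20 switches has to hit a 1-in-2**20 spike, so it
# needs on the order of a million organisms to build one good one.
print("expected organisms for blind evolution:", 2 ** 20)
```

On the order of a thousand cheap learning trials replaces the need to build on the order of a million organisms, which is the trade-off the search-tree slide below quantifies.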

Page 15

The search tree

[Figure: a search tree in which evolution explores about 1,000 nodes and learning explores the remaining 999,000 nodes.]

99.9% of the work required to find a good combination is done by learning. A learning trial is MUCH cheaper than building a new organism.

Evolution can ask learning: “Am I correct so far?”

Page 16

The results of a simulation (Hinton and Nowlan 1987)

• After building about 30,000 organisms, each of which runs 1000 learning trials, the population has nearly all of the correct decisions hard-wired into the DNA.
  – The pressure towards hard-wiring comes from the fact that with more of the correct decisions hard-wired, an organism learns the remaining correct decisions faster.

• This suggests that learning performed almost all of the search required to create brain structures that are currently hard-wired.

Page 17

Using the dynamics of neural activity to speed up learning

• A Boltzmann machine has an inner-loop iterative search to find a locally optimal interpretation of the current visible vector.

– Then it updates the weights to lower the energy of the locally optimal interpretation.

• An autoencoder can be made to use the same trick: it can do an inner-loop search for a code vector that is better at reconstructing the input than the code vector produced by its feedforward encoder (sketched below).

– This speeds the learning if we measure the learning time in number of input vectors presented to the autoencoder (Ranzato, PhD thesis, 2009).
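A minimal sketch of that inner loop, assuming a tiny linear autoencoder with fixed random weights (the sizes, step size, and number of inner steps are illustrative assumptions, and this is not the specific algorithm from the thesis):

```python
# A minimal sketch, assuming a tiny linear autoencoder with fixed random
# weights (illustrative sizes only; not the specific model from the thesis).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_code = 20, 5
W_enc = rng.standard_normal((n_code, n_in)) / np.sqrt(n_in)   # feedforward encoder
W_dec = rng.standard_normal((n_in, n_code)) / np.sqrt(n_in)   # decoder

def recon_error(z, x):
    """How badly the code z reconstructs the input x through the decoder."""
    return np.linalg.norm(W_dec @ z - x)

x = rng.standard_normal(n_in)   # one input vector
z = W_enc @ x                   # code proposed by the feedforward encoder
print("error with feedforward code: ", round(recon_error(z, x), 3))

# Inner-loop search: gradient steps on the code itself (weights held fixed)
# to find a code that reconstructs this particular input better.
for _ in range(100):
    z -= 0.2 * W_dec.T @ (W_dec @ z - x)
print("error after inner-loop search:", round(recon_error(z, x), 3))

# The outer loop would then update the weights using this improved code
# (lower the decoder's reconstruction error and train the encoder to predict
# the improved code), so each presented input vector does more useful work.
```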

Page 18

Major Stages of Biological Adaptation

• Evolution keeps inventing faster inner loops to make the search easier for slower outer loops:
  – Pure evolution: each iteration takes a lifetime.
  – Development: each iteration of gene expression takes about 20 minutes. The developmental process may be optimizing objective functions specified by evolution (see next slide).
  – Learning: each iteration takes about a second.
  – Inference: in one second, a neural network can perform many iterations to find a good explanation of the sensory input.

Page 19

The three-eyed frog

• The two retinas of a frog connect to its tectum in a way that tries to satisfy two conflicting goals:
  – 1. Each point on the tectum should receive inputs from corresponding points on the two retinas.
  – 2. Nearby points on one retina should go to nearby points on the tectum.
• A good compromise is to have interleaved stripes on the tectum.
  – Within each stripe all cells receive inputs from the same retina.
  – Neighboring stripes come from corresponding places on the two retinas.

Page 20

What happens if you give a frog embryo three eyes?

• The tectum develops interleaved stripes of the form LMRLMRLMR…
  – This suggests that in the normal frog, the interleaved stripes are not hard-wired.
  – They are the result of running an optimization process during development (or learning).
• The advantage of this is that it generalizes much better to unforeseen circumstances.
  – It may also be easier for the genes to specify goals than the details of how to achieve them.

Page 21

The next great leap?

• Suppose that we let each biological learning trial consist of specifying a new objective function.

• Then we use computer simulation to evaluate the objective function in about one second.
  – This creates a new inner loop that is millions of times faster than a biological learning trial.
• Maybe we are on the brink of a major new stage in the evolution of biological adaptation methods. We are in the process of adding a new inner loop:
  – Evolution, development, learning, simulation.

Page 22

THE END

