
CSC2535: Advanced Machine Learning Lecture 11b Adaptation at multiple time-scales

Transcript
Page 1

CSC2535: Advanced Machine Learning
Lecture 11b: Adaptation at multiple time-scales

Geoffrey Hinton

Page 2

An overview of how biology solves search problems

• Searching for good combinations can be very slow if it's done in a naive way.
• Evolution has found many ways to speed up searches.
  – Evolution works too well to be blind. It is being guided.
  – It has discovered much better methods than the dumb trial-and-error method that many biologists seem to believe in.

Page 3

Some search problems in biology

• Searching for good genes and good policies for when to express them.
  – To understand how evolution is so efficient, we need to understand forms of search that work much better than random trial and error.
• Searching for good policies for when to contract muscles.
  – Motor control works much too well for a system with a 30-millisecond feedback loop.
• Searching for the right synapse strengths to represent how the world works.
  – Learning works much too well to be blind trial and error. It must be doing something smarter than just randomly perturbing synapse strengths.

Page 4

A way to make searches work better

• In high-dimensional spaces, it is a very bad idea to try making multiple random changes.
  – It's impossible to learn a billion synapse strengths by randomly changing synapses.
  – Once the system is significantly better than random, almost all combinations of random changes will make it worse.
• It is much more effective to compute a gradient and change things in the direction that makes things better (see the sketch below).
  – That's what brains are for. They are devices for computing gradients. What of?
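To make the contrast concrete, here is a minimal sketch (the quadratic "fitness", the dimension, and the step sizes are illustrative assumptions, not anything from the lecture) comparing accept-if-better random perturbation with plain gradient descent on the same high-dimensional problem:

```python
# A minimal sketch (illustrative only): compare blind random perturbation
# with gradient descent on a simple high-dimensional quadratic "fitness".
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                       # number of "synapses"
w_star = rng.standard_normal(d)  # unknown optimum
loss = lambda w: np.sum((w - w_star) ** 2)

w_rand = np.zeros(d)             # random-perturbation search
w_grad = np.zeros(d)             # gradient descent
for step in range(200):
    # Random search: make a joint random change and keep it only if it helps.
    candidate = w_rand + 0.01 * rng.standard_normal(d)
    if loss(candidate) < loss(w_rand):
        w_rand = candidate
    # Gradient descent: move along the direction of steepest improvement.
    w_grad -= 0.1 * 2 * (w_grad - w_star)

print(f"random-perturbation loss: {loss(w_rand):.1f}")
print(f"gradient-descent loss:    {loss(w_grad):.1f}")
```

After a couple of hundred iterations the gradient method is essentially at the optimum, while the random-perturbation search has barely moved, and the gap only widens as the dimensionality grows.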

Page 5

A different way to make searches work better

• It is much easier to search a fitness landscape that has smooth hills rather than sharp spikes.
  – Fast adaptive processes can change the fitness landscape to make search much easier for slow adaptive processes.

Page 6

An example of a fast adaptive process changing the fitness landscape for a slower one

• Consider the task of drawing on a blackboard.
  – It is very hard to do with a dumb robot arm:
    • If the robot positions the tip of the chalk just beyond the board, the chalk breaks.
    • If the robot positions the chalk just in front of the board, the chalk doesn't leave any marks.
• We need a very fast feedback loop that uses the force exerted by the board on the chalk to stop the chalk.
  – Neural feedback is much too slow for this.

Page 7

A biological solution

• Set the relative stiffnesses of opposing muscles so that the equilibrium point has the tip of the chalk just beyond the board.

• Set the absolute stiffnesses so that small perturbations from equilibrium only cause small forces (this is called “compliance”).

• The feedback loop is now in the physical system, so it works at the speed of shock waves in the arm.
  – The feedback in the physics makes a much nicer fitness landscape for learning how to set the muscle stiffnesses.

Page 8

The energy landscape created by two opposing muscles

[Figure: physical energy in the opposing springs plotted against the location of the endpoint, with the start position and the location of the board marked.]

The difference of the two muscle stiffnesses determines where the minimum is. The sum of the stiffnesses determines how sharp the minimum is.
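A minimal worked version of that caption, assuming the two opposing muscles act like linear springs with stiffnesses k1, k2 and rest positions a1, a2 (these symbols and the numbers below are illustrative assumptions, not values from the lecture):

```python
# A minimal sketch, assuming two opposing muscles behave like linear springs
# with stiffnesses k1, k2 pulling toward rest positions a1, a2 (illustrative values).
import numpy as np

k1, k2 = 3.0, 1.0          # stiffnesses of the two opposing "muscles"
a1, a2 = 0.0, 1.0          # rest positions each spring pulls toward

def energy(x):
    """Total elastic energy stored in the two opposing springs at endpoint x."""
    return 0.5 * k1 * (x - a1) ** 2 + 0.5 * k2 * (x - a2) ** 2

# Closed form: the minimum (the equilibrium point of the chalk tip) is the
# stiffness-weighted average of the rest positions; the curvature (how sharp
# the minimum is) is the sum of the stiffnesses.
x_eq = (k1 * a1 + k2 * a2) / (k1 + k2)

# Numerical check against a fine grid of endpoint locations.
xs = np.linspace(-1.0, 2.0, 100_001)
print("analytic equilibrium:", x_eq)                       # 0.25
print("numeric equilibrium: ", xs[np.argmin(energy(xs))])  # ~0.25
print("curvature (k1 + k2): ", k1 + k2)
```

With the sum k1 + k2 held fixed, changing the difference of the stiffnesses slides the equilibrium point along the line; scaling both stiffnesses together changes only how sharp the minimum is, which is the compliance knob from the previous slide.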

Page 9

Two fitness landscapes

• System that directly specifies joint angles

• System that specifies spring stiffnesses

[Figure: fitness plotted against neural signals for each of the two systems.]

Page 10

Objective functions versus programs

• By setting the muscle stiffnesses, the brain creates an energy function.
  – Minimizing this energy function is left to the physics.
  – This allows the brain to explore the space of objective functions (i.e. energy landscapes) without worrying about how to minimize the objective function.
• Slow adaptive processes should interact with fast ones by creating objective functions for them to optimize.
  – Think how a general interacts with soldiers. He specifies their goals.
  – This avoids micro-management.

Page 11

Generating the parts of an object

[Figure: “square” + pose parameters → sloppy top-down activation of parts → clean-up using lateral interactions specified by the layer above → parts with top-down support.]

It's like soldiers on a parade ground.

Page 12

Another example of the same principle

• The principle: Use fast adaptive processes to make the search easier for slow ones.

• An application: Make evolution go a lot faster by using a learning algorithm to create a much nicer fitness landscape (the Baldwin effect).

• Almost all of the search is done by the learning algorithm, but the results get hard-wired into the DNA.
  – It's strictly Darwinian even though it achieves most of what Lamarck wanted.

Page 13

A toy example to explain the idea

• Consider an organism that has a mating circuit containing 20 binary switches. If exactly the right subset of the switches are closed, it mates very successfully. Otherwise not.

– Suppose each switch is governed by a separate gene that has two alleles.

– The search landscape for unguided evolution is a one-in-a-million spike.

• Blind evolution has to build about a million organisms to get one good one.

– Even if it finds a good one, that combination of genes will almost certainly be destroyed in the next generation by crossover.

Page 14

Guiding evolution with a fast adaptive process (godless intelligent design :-)

• Suppose that each gene has three alleles: ON, OFF, and “leave it to learning”.
  – ON and OFF are decisions hard-wired into the DNA.
  – “Leave it to learning” means that on each learning trial, the switch is set randomly.
• Now consider organisms that have 10 switches hard-wired and 10 left to learning.
  – One in a thousand will have the correct hard-wired decisions, and with only about a thousand learning trials, all 20 switches will be correct (see the simulation sketch below).
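A minimal simulation sketch of the numbers on this slide (the random seed, repeat counts, and helper names are illustrative assumptions, not the settings of the Hinton and Nowlan simulation):

```python
# A minimal sketch of the toy search problem (illustrative only; this is not
# the exact Hinton & Nowlan 1987 setup): 20 binary switches, and only one
# setting of all 20 gives a successful mating circuit.
import numpy as np

rng = np.random.default_rng(1)
target = rng.integers(0, 2, size=20)       # the single good combination

def learning_trials_needed(plastic_idx, max_trials=100_000):
    """Count random learning trials over the plastic switches until the whole
    circuit matches the target. The hard-wired switches are already correct."""
    genome = target.copy()                 # hard-wired part set correctly
    for trial in range(1, max_trials + 1):
        genome[plastic_idx] = rng.integers(0, 2, size=len(plastic_idx))
        if np.array_equal(genome, target):
            return trial
    return max_trials

# An organism with 10 correct hard-wired switches and 10 left to learning.
plastic = np.arange(10, 20)
trials = [learning_trials_needed(plastic) for _ in range(200)]
print("average learning trials:", round(np.mean(trials)))   # about 2**10 = 1024

# Blind evolution over all 20 switches has to hit a 1-in-2**20 spike, so it
# needs on the order of a million organisms to build one good one.
print("expected organisms for blind evolution:", 2 ** 20)
```

On the order of a thousand cheap learning trials replaces the need to build on the order of a million organisms, which is the trade-off the search-tree slide below quantifies.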

Page 15

The search tree

[Figure: a search tree in which evolution explores about 1,000 nodes and learning explores the remaining 999,000 nodes.]

99.9% of the work required to find a good combination is done by learning. A learning trial is MUCH cheaper than building a new organism.

Evolution can ask learning: “Am I correct so far?”

Page 16

The results of a simulation (Hinton and Nowlan 1987)

• After building about 30,000 organisms, each of which runs 1000 learning trials, the population has nearly all of the correct decisions hard-wired into the DNA.
  – The pressure towards hard-wiring comes from the fact that with more of the correct decisions hard-wired, an organism learns the remaining correct decisions faster.

• This suggests that learning performed almost all of the search required to create brain structures that are currently hard-wired.

Page 17

Using the dynamics of neural activity to speed up learning

• A Boltzmann machine has an inner-loop iterative search to find a locally optimal interpretation of the current visible vector.

– Then it updates the weights to lower the energy of the locally optimal interpretation.

• An autoencoder can be made to use the same trick: it can do an inner-loop search for a code vector that is better at reconstructing the input than the code vector produced by its feedforward encoder (sketched below).

– This speeds the learning if we measure the learning time in number of input vectors presented to the autoencoder (Ranzato, PhD thesis, 2009).
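A minimal sketch of that inner loop, assuming a tiny linear autoencoder with fixed random weights (the sizes, step size, and number of inner steps are illustrative assumptions, and this is not the specific algorithm from the thesis):

```python
# A minimal sketch, assuming a tiny linear autoencoder with fixed random
# weights (illustrative sizes only; not the specific model from the thesis).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_code = 20, 5
W_enc = rng.standard_normal((n_code, n_in)) / np.sqrt(n_in)   # feedforward encoder
W_dec = rng.standard_normal((n_in, n_code)) / np.sqrt(n_in)   # decoder

def recon_error(z, x):
    """How badly the code z reconstructs the input x through the decoder."""
    return np.linalg.norm(W_dec @ z - x)

x = rng.standard_normal(n_in)   # one input vector
z = W_enc @ x                   # code proposed by the feedforward encoder
print("error with feedforward code: ", round(recon_error(z, x), 3))

# Inner-loop search: gradient steps on the code itself (weights held fixed)
# to find a code that reconstructs this particular input better.
for _ in range(100):
    z -= 0.2 * W_dec.T @ (W_dec @ z - x)
print("error after inner-loop search:", round(recon_error(z, x), 3))

# The outer loop would then update the weights using this improved code
# (lower the decoder's reconstruction error and train the encoder to predict
# the improved code), so each presented input vector does more useful work.
```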

Page 18

Major Stages of Biological Adaptation

• Evolution keeps inventing faster inner loops to make the search easier for slower outer loops:
  – Pure evolution: each iteration takes a lifetime.
  – Development: each iteration of gene expression takes about 20 minutes. The developmental process may be optimizing objective functions specified by evolution (see next slide).
  – Learning: each iteration takes about a second.
  – Inference: in one second, a neural network can perform many iterations to find a good explanation of the sensory input.

Page 19

The three-eyed frog

• The two retinas of a frog connect to its tectum in a way that tries to satisfy two conflicting goals:
  – 1. Each point on the tectum should receive inputs from corresponding points on the two retinas.
  – 2. Nearby points on one retina should go to nearby points on the tectum.
• A good compromise is to have interleaved stripes on the tectum.
  – Within each stripe all cells receive inputs from the same retina.
  – Neighboring stripes come from corresponding places on the two retinas.

Page 20

What happens if you give a frog embryo three eyes?

• The tectum develops interleaved stripes of the form LMRLMRLMR…
  – This suggests that in the normal frog, the interleaved stripes are not hard-wired.
  – They are the result of running an optimization process during development (or learning).
• The advantage of this is that it generalizes much better to unforeseen circumstances.
  – It may also be easier for the genes to specify goals than the details of how to achieve them.

Page 21

The next great leap?

• Suppose that we let each biological learning trial consist of specifying a new objective function.

• Then we use computer simulation to evaluate the objective function in about one second.
  – This creates a new inner loop that is millions of times faster than a biological learning trial.
• Maybe we are on the brink of a major new stage in the evolution of biological adaptation methods. We are in the process of adding a new inner loop:
  – Evolution, development, learning, simulation.

Page 22

THE END

