Complementary Learning Systems in Natural and Artificial Intelligence

James L. McClelland

Department of Psychology & Center for Mind, Brain and Computation

Stanford University

Tom’s questions for me

• What sort of NN architectures could serve an automated programmer in constructing a program?

• How do you imagine different memory systems working in a human programmer?

Outline for the session

• Complementary learning systems

– The basic theory

– Rapid schema consistent learning

– Comparison of the two learning systems

• Deep learning and complementary learning systems

– Rehearsal buffer in the DQN

– Memory based parameter adaptation

• Revisiting Tom’s prompt and a response

Your knowledge is in your connections!

• An experience is a pattern of activation over neurons in one or more brain regions.

• The trace left in memory is the set of adjustments to the strengths of the connections.

– Each experience leaves such a trace, but the traces are not separable or distinct.

– Rather, they are superimposed in the same set of connection weights.

• Recall involves the recreation of a pattern of activation, using a part or associate of it as a cue.

• The reinstatement depends on the knowledge in the connection weights, which in general will reflect influences of many different experiences.

• Thus, memory is always a constructive process, dependent on contributions from many different experiences.
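
The points above (traces superimposed in one set of weights, recall as re-creation of a pattern from a partial cue) can be sketched with a small Hebbian autoassociator. This is an illustrative toy, not a model from the talk: the three 8-unit patterns, the outer-product learning rule, and the function names are all assumptions chosen to keep the arithmetic easy to follow.

```python
# Toy Hebbian autoassociator (Hopfield-style sketch; all names are illustrative).
PATTERNS = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]

def store(patterns):
    """Superimpose every pattern's trace in a single weight matrix."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, cue, steps=3):
    """Re-create a full pattern from a partial cue (0 marks an unknown unit)."""
    state = list(cue)
    for _ in range(steps):
        state = [1 if sum(wij * s for wij, s in zip(row, state)) >= 0 else -1
                 for row in w]
    return state

weights = store(PATTERNS)
cue = [1, 1, 1, 1, 0, 0, 0, 0]              # only the first half of pattern 0
recalled = recall(weights, cue)
print(recalled == PATTERNS[0])
```

No pattern has its own storage location here: each entry of `weights` carries contributions from all three patterns, and the recalled pattern is rebuilt from those shared weights.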

Effects of Hippocampal Lesions

• Intact performance on tests of intelligence, general knowledge, language, other acquired skills

• Dramatic deficits in formation of some types of new memories:

– Explicit memories for episodes and events

– Paired associate learning

– Arbitrary new factual information

• Spared priming and skill acquisition

• Temporally graded retrograde amnesia:

– Lesion impairs recent memories, leaving remote memories intact.

Note: HM’s lesion was bilateral

Key Points

• We learn about the general pattern of experiences, not just specific things

• Gradual learning in the cortex builds implicit semantic and procedural knowledge that forms much of the basis of our cognitive abilities

• The Hippocampal system complements the cortex by allowing us to learn specific things without interference with existing structured knowledge

• In general these systems must be thought of as working together rather than being alternative sources of information.

Effect of Prior Association on Paired-Associate Learning in Control and Amnesic Populations

Cutting (1978), Expt. 1

[Figure: percent correct (y-axis, -20 to 100) by category of ease of association (Very Easy, Easy, Fairly Easy, Hard, Very Hard) for Control (Expt) and Amnesic (Expt) groups, with base rates.]

Kwok & McClelland Model of Semantic and Episodic Memory

• Model includes a slow-learning cortical system and a fast-learning hippocampal system.

• Cortex contains units representing both content and context of an experience.

• Semantic memory is gradually built up through repeated presentations of the same content in different contexts.

• Formation of new episodic memory depends on hippocampus and the relevant cortical areas, including context.

– Loss of hippocampus would prevent initial rapid binding of content and context.

• Episodic memories benefit from prior cortical learning when they involve meaningful materials.

[Figure: model diagram with labels Context, Relation, Cue, Target, Neo-Cortex, and Hippocampus.]

Simulation Results From KM Model

Cutting (1978), Expt. 1

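[Figure: percent correct (y-axis, -20 to 100) by category of ease of association (Very Easy, Easy, Fairly Easy, Hard, Very Hard) for Control (Model), Amnesic (Model), Control (Expt), and Amnesic (Expt), with base rates in the model. Numeric labels surviving extraction, with their assignment to bars not recoverable: 84, 0, 70, 9, 0, 68.]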

Emergence of Meaning in Learned Distributed Representations through Gradual Interleaved Learning

• Distributed representations (what ML calls embeddings) that capture aspects of meaning emerge through a gradual learning process

• The progression of learning and the representations formed capture many aspects of cognitive development

– Progressive differentiation

– Sensitivity to coherent covariation across contexts

– Reorganization of conceptual knowledge

The Rumelhart Model

The Training Data:

All propositions true of items at the bottom level of the tree, e.g.:

Robin can {grow, move, fly}

[Figure: panels labeled Early, Later, and Later Still along an Experience axis.]

What happens in this system if we try to learn something new?

Such as a Penguin

Learning Something New

• Used network already trained with eight items and their properties.

• Added one new input unit fully connected to the representation layer

• Trained the network with the following pairs of items:

– penguin-isa living thing-animal-bird

– penguin-can grow-move-swim

Rapid Learning Leads to Catastrophic Interference
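
Why rapid, focused learning of a new item disturbs old knowledge can be sketched with a two-weight delta-rule learner. This is a deliberately tiny stand-in for the trained network, not the simulation from the talk: the feature vectors, the single "can fly" output, and the learning rate are assumptions made for illustration.

```python
# Toy delta-rule learner; feature vectors and learning rate are illustrative assumptions.
ROBIN = [1.0, 0.0]       # [shared "bird" feature, "penguin" unit]
PENGUIN = [1.0, 1.0]     # shares the bird feature with robin

def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def delta_step(w, x, target, lr):
    """One delta-rule update: move the output for x toward target."""
    err = target - output(w, x)
    return [wi + lr * err * xi for wi, xi in zip(w, x)]

w = [1.0, 0.0]                              # prior knowledge: robin can fly (output 1)
w = delta_step(w, PENGUIN, 0.0, lr=0.5)     # one large step on "penguin cannot fly"

penguin_fly = output(w, PENGUIN)            # the new fact is learned at once
robin_fly = output(w, ROBIN)                # but the shared weight moved too
print(penguin_fly, robin_fly)
```

The new item is acquired in a single step, but only by changing the weight that robin also relies on, so the network's output for "robin can fly" drops.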

A Complementary Learning System in the Medial Temporal Lobes

[Figure: diagram with labels color, form, motion, action, valence, name, temporal pole, and Medial Temporal Lobe.]

Avoiding Catastrophic Interference with Interleaved Learning

Initial Storage in the Hippocampus Followed by Repeated Replay Leads to the Consolidation of New Learning in Neocortex, Avoiding Catastrophic Interference

[Figure: diagram with labels color, form, motion, action, valence, name, temporal pole, and Medial Temporal Lobe.]
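
The consolidation idea (store the new item quickly outside the cortex, then replay it interleaved with older knowledge in small steps) can be sketched with a two-weight delta-rule learner. This is a minimal toy under stated assumptions, not the simulation from the talk: the feature vectors, the single "can fly" output, the learning rate, and the epoch count are all invented for illustration.

```python
# Toy consolidation sketch; item vectors, learning rate, and epoch count are assumptions.
ROBIN = ([1.0, 0.0], 1.0)       # (features, "can fly" target): old cortical knowledge
PENGUIN = ([1.0, 1.0], 0.0)     # new item; shares the first ("bird") feature

def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def delta_step(w, x, target, lr):
    """One small delta-rule update toward the target for x."""
    err = target - output(w, x)
    return [wi + lr * err * xi for wi, xi in zip(w, x)]

def consolidate(w, hippocampal_store, old_items, lr=0.1, epochs=500):
    """Replay stored new items interleaved with old items, using small steps."""
    for _ in range(epochs):
        for x, target in old_items + hippocampal_store:
            w = delta_step(w, x, target, lr)
    return w

cortex = [1.0, 0.0]             # already knows that robin can fly
hippocampus = [PENGUIN]         # new item stored once, outside the cortical weights
cortex = consolidate(cortex, hippocampus, [ROBIN])

print(round(output(cortex, ROBIN[0]), 3), round(output(cortex, PENGUIN[0]), 3))
```

With interleaving, the weights settle on a solution that satisfies both items: the dedicated penguin weight absorbs the exception while the shared weight stays where robin needs it.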

Rapid Consolidation of Schema Consistent Information

Richard Morris

Tse et al. (Science, 2007, 2011)

During training, 2 wells uncovered on each trial

Schemata and Schema Consistent Information

• What is a ‘schema’?

– An organized knowledge structure into which existing knowledge is organized.

• What is schema consistent information?

– Information that can be added to a schema without disturbing it.

• What about a penguin?

– Partially consistent

– Partially inconsistent

• In contrast, consider

– a trout

– a cardinal

New Simulations

• Initial training with eight items and their properties as before.

• Added one new input unit fully connected to the representation layer also as before

• Trained the network on one of the following pairs of items:

– penguin-isa & penguin-can

– trout-isa & trout-can

– cardinal-isa & cardinal-can
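
The contrast between consistent and partially inconsistent items can be sketched with a small delta-rule learner. This is a rough toy, not the simulation described above: it covers only the cardinal and penguin cases, and the feature vectors, the single "can fly" output, and the learning rate are assumptions made for illustration.

```python
# Toy comparison; feature vectors, targets, and learning rate are illustrative assumptions.
ROBIN = [1.0, 0.0, 0.0]       # [shared "bird" feature, cardinal unit, penguin unit]
CARDINAL = [1.0, 1.0, 0.0]
PENGUIN = [1.0, 0.0, 1.0]

def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def learn_fast(w, x, target, lr=0.5, steps=10):
    """Focused (non-interleaved) delta-rule training on a single new item."""
    for _ in range(steps):
        err = target - output(w, x)
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

def interference(new_item, target):
    """Change in the old 'robin can fly' output after fast learning of a new item."""
    w = learn_fast([1.0, 0.0, 0.0], new_item, target)
    return abs(output(w, ROBIN) - 1.0)

print(interference(CARDINAL, 1.0))   # consistent with the schema: a bird that flies
print(interference(PENGUIN, 0.0))    # partially inconsistent: a bird that does not fly
```

In this toy the cardinal's property is already predicted by the shared weight, so fast learning produces no error signal and nothing is disturbed. The penguin contradicts the shared weight, so fast learning moves it.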

New Learning of Consistent and Partially Inconsistent Information

[Figure: panels labeled INTERFERENCE and LEARNING.]

Connection Weight Changes after Simulated NPA, OPA and NM Analogs

Tse et al. (2011)

How Does It Work?

Comparison of the two learning systems

Dense vs Sparse Coding

• Pattern separation:

– Sparse random conjunctive coding
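
How conjunctive coding separates similar patterns can be sketched in a few lines. The sketch below is a simplification under stated assumptions: it uses every possible conjunction of a given order rather than a sparse random subset, and the two input patterns and the overlap measure are invented for illustration.

```python
from itertools import combinations

def conjunctive_code(active_features, order):
    """Units that fire only when a specific conjunction of features is active."""
    return set(combinations(sorted(active_features), order))

def overlap(a, b):
    """Shared active units as a fraction of the units active in one pattern."""
    return len(a & b) / len(a)

A = {0, 1, 2, 3}        # two similar input patterns: 3 of 4 features shared
B = {0, 1, 2, 4}

print(overlap(A, B))                                            # input overlap
print(overlap(conjunctive_code(A, 2), conjunctive_code(B, 2)))  # pairwise conjunctions
print(overlap(conjunctive_code(A, 3), conjunctive_code(B, 3)))  # three-way conjunctions
```

The higher the order of the conjunction, the smaller the overlap between the two codes, which is the sense in which conjunctive units pull similar inputs apart.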

Similarity Based Representations in Cortex

In more detail…

• Input from neocortex comes into EC; EC projects to DG, CA3, and CA1

• Drastic pattern separation occurs in DG

• Downsampling in CA3 assigns an arbitrary code

• Invertible, somewhat sparsified representation in CA1

• Fewish-shot learning in DG, CA3, and CA3->CA1 allows reconstruction of the EC pattern from partial input.

• Other connections shown in black are part of the slow-learning neocortical network.

• Recurrence within CA3, through the hippocampal circuit shown, and through the outer loop also involving the rest of the neocortex

Two modes of generalization

• Parametric vs. Item-based

• As long as the embeddings are already known, these modes can both support generalization

• The hippocampus can do so without requiring interleaved learning

• Adapting the embeddings may be relatively hard
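
The two modes can be contrasted in a short sketch, assuming the embeddings are already known. Everything here is made up for illustration: the 1-D embeddings, the linear data, the least-squares fit standing in for the parametric mode, and the nearest-neighbor average standing in for the item-based mode.

```python
# Known embeddings (1-D here) paired with outputs; the data are made up for illustration.
ITEMS = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def parametric_predict(items, query):
    """Parametric: compress the items into one weight (least-squares slope through 0)."""
    slope = sum(x * y for x, y in items) / sum(x * x for x, _ in items)
    return slope * query

def item_based_predict(items, query, k=2):
    """Item-based: keep the items and average the outputs of the k most similar ones."""
    nearest = sorted(items, key=lambda item: abs(item[0] - query))[:k]
    return sum(y for _, y in nearest) / k

print(parametric_predict(ITEMS, 1.5), item_based_predict(ITEMS, 1.5))
```

Both modes generalize to the unseen query. The difference shows up when a new item arrives: the item-based store only needs the item appended, for example `item_based_predict(ITEMS + [(1.5, 10.0)], 1.5, k=1)`, whereas the parametric weight has to be refit against all the items together.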

[Figure: model diagram with labels Context, Relation, Cue, Target, Neo-Cortex, and Hippocampus.]

How might hippocampus support inference and generalization?

‘Inference’

• Finding missing links in the transitive inference task
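
One way to picture finding a missing link is to store each premise pair as a separate item and feed retrieved items back in as new cues. The sketch below is a symbolic stand-in for that recurrent retrieval, not the mechanism proposed in the talk: the premise pairs, the function names, and the exact-match retrieval are all assumptions.

```python
# Premise pairs stored as separate items, e.g. ("A", "B") means "A is chosen over B".
PREMISES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

def retrieve(premises, cue):
    """Item-based retrieval: the loser of every stored pair whose winner matches the cue."""
    return [loser for winner, loser in premises if winner == cue]

def infers(premises, winner, loser):
    """Feed retrieved items back in as cues until the link is found or nothing is left."""
    frontier, seen = [winner], set()
    while frontier:
        cue = frontier.pop()
        if cue in seen:
            continue
        seen.add(cue)
        for item in retrieve(premises, cue):
            if item == loser:
                return True
            frontier.append(item)
    return False

print(infers(PREMISES, "B", "D"), infers(PREMISES, "D", "B"))
```

The pair B-D was never stored, yet it is recovered by chaining two retrievals, with no change to the stored items themselves.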

Complementary Learning Systems in AI

• DQN (rehearsal buffer)

• MBPA (memory-based parameter adaptation)
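
The rehearsal (replay) buffer idea can be sketched as a fixed-size store of past transitions that is sampled at random during training, so that each update mixes older and newer experience. This is a minimal sketch, not the DQN implementation: the class name, the transition fields, and the capacity are assumptions made for illustration.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly for training."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Draw a minibatch that interleaves old and recent experience."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

batch = buffer.sample(2, rng=random.Random(0))
print(len(buffer), len(batch))
```

Sampling from the buffer plays a role loosely analogous to hippocampal replay: the slow learner sees an interleaved mixture of experiences instead of only the most recent one.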

Tom’s questions for me

• What sort of NN architectures could serve an automated programmer in constructing a program?

• How do you imagine different memory systems working in a human programmer?

• My version of the question:

– What additional forms of memory do intelligent agents need?

Working Memory

• Is there a special working memory system in the brain?

• Or do we learn connection weights that sustain information in an active state in memory?

• RNNs and LSTMs provide forms of working memory

• What is exciting about these models is that they learn what to retain

– We learn to retain the information that will be useful later
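
How gating lets a recurrent unit hold information in an active state can be sketched with a simplified version of the LSTM cell-state update. In an LSTM the gate values are produced by learned weights; in this sketch they are set by hand, and the function name and inputs are assumptions made for illustration.

```python
def gated_memory(inputs, write_gates, keep_gates):
    """Simplified cell-state update: c_t = keep_t * c_{t-1} + write_t * x_t."""
    c = 0.0
    history = []
    for x, write, keep in zip(inputs, write_gates, keep_gates):
        c = keep * c + write * x
        history.append(c)
    return history

inputs      = [5.0, 9.0, 7.0, 3.0]
write_gates = [1.0, 0.0, 0.0, 0.0]   # store only the first input
keep_gates  = [0.0, 1.0, 1.0, 1.0]   # then hold it across later steps

print(gated_memory(inputs, write_gates, keep_gates))
```

Learning what to retain amounts to learning the weights that produce these gate values, so that useful inputs are written and then protected from later inputs.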

The Differentiable Neural Computer

Learning what to store – in two senses

Memory Augmented Neural Networks

Santoro et al. (2016): One-shot learning with MANNs
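
Reading from an external memory by content can be sketched as a softmax over the cosine similarity between a key and each memory row, followed by a weighted sum of the rows. This covers only the read step and is not the architecture of either paper: the memory contents, the key, and the `sharpness` parameter are assumptions made for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def content_read(memory, key, sharpness=5.0):
    """Soft read: softmax over key/row cosine similarity, then a weighted sum of rows."""
    scores = [math.exp(sharpness * cosine(row, key)) for row in memory]
    total = sum(scores)
    weights = [s / total for s in scores]
    read = [sum(w * row[i] for w, row in zip(weights, memory))
            for i in range(len(key))]
    return weights, read

memory = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
weights, read = content_read(memory, key=[0.9, 0.1, 0.0])
print(weights.index(max(weights)))
```

Because every step is differentiable, a network trained end to end can learn which keys to emit, which is one of the two senses of learning what to store.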

Some closing comments

• Cognitive Science, Neuroscience, and AI now have increasingly powerful ideas that we can use to help us understand learning and memory

• AI has expanded the space of what we can consider to be learned rather than innate

• But currently, AI breakthroughs are drastically over-compartmentalized

• We can use meta-learning to teach a neural network just about anything

• But there’s little generalization outside of a limited meta-task space

• And there’s very little fully integrative work going on that would allow a single integrated learner to acquire a range of skills, all of which can be brought together to solve the problem of general artificial intelligence

Date post:	20-May-2020