Complementary Learning Systems in Natural and Artificial Intelligence
James L. McClelland
Department of Psychology &
Center for Mind, Brain and Computation
Stanford University
Tom’s questions for me
• What sort of NN architectures could serve an automated programmer in constructing a program?
• How do you imagine different memory systems working in a human programmer?
Outline for the session
• Complementary learning systems
– The basic theory
– Rapid schema consistent learning
– Comparison of the two learning systems
• Deep learning and complementary learning systems
– Rehearsal buffer in the DQN
– Memory based parameter adaptation
• Revisiting Tom’s prompt and a response
Your knowledge is in your connections!
• An experience is a pattern of activation over neurons in one or more brain regions.
• The trace left in memory is the set of adjustments to the strengths of the connections.
– Each experience leaves such a trace, but the traces are not separable or distinct.
– Rather, they are superimposed in the same set of connection weights.
• Recall involves the recreation of a pattern of activation, using a part or associate of it as a cue.
• The reinstatement depends on the knowledge in the connection weights, which in general will reflect influences of many different experiences.
• Thus, memory is always a constructive process, dependent on contributions from many different experiences.
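The points above can be illustrated with a minimal sketch of a Hebbian autoassociator. This is an illustrative toy, not a model from the talk: the 16-unit size, the use of Hadamard rows as patterns (chosen only so the example is exactly reproducible), and the names `hadamard`, `patterns`, `W`, and `cue` are all assumptions. Three experiences are superimposed in one weight matrix, and a partial cue is then used to recreate one of them.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n matrix of +1/-1 (n a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    return h

# Three "experiences": +1/-1 activation patterns over 16 units (assumed toy data).
patterns = hadamard(16)[[1, 2, 3]]

# Each experience adds its own Hebbian adjustment to the SAME weight matrix,
# so the traces are superimposed rather than stored separately.
W = np.zeros((16, 16), dtype=patterns.dtype)
for p in patterns:
    W += np.outer(p, p)

# Cue: the first half of pattern 0; the second half is unknown (set to zero).
cue = patterns[0].copy()
cue[8:] = 0

# Recall: each unit takes the sign of its summed, weighted input.
recalled = np.sign(W @ cue)
print(np.array_equal(recalled, patterns[0]))  # prints True
```

Because these particular patterns are orthogonal, the other two traces contribute nothing to the recalled pattern. With correlated patterns, the same weights would blend contributions from several experiences into the reinstated pattern.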
Effects of Hippocampal Lesions
• Intact performance on tests of intelligence, general knowledge, language, other acquired skills
• Dramatic deficits in formation of some types of new memories:
– Explicit memories for episodes and events
– Paired associate learning
– Arbitrary new factual information
• Spared priming and skill acquisition
• Temporally graded retrograde amnesia:
– Lesion impairs recent memories, leaving remote memories intact.
Note: HM’s lesion was bilateral
Key Points
• We learn about the general pattern of experiences, not just specific things
• Gradual learning in the cortex builds implicit semantic and procedural knowledge that forms much of the basis of our cognitive abilities
• The Hippocampal system complements the cortex by allowing us to learn specific things without interference with existing structured knowledge
• In general these systems must be thought of as working together rather than being alternative sources of information.
Effect of Prior Association on Paired-Associate Learning in Control and Amnesic Populations
Cutting (1978), Expt. 1
[Figure: Percent Correct (y-axis, -20 to 100) by Category (Ease of Association) (x-axis: Very Easy, Easy, Fairly Easy, Hard, Very Hard). Series: Control (Expt), Amnesic (Expt), Base rates.]
Kwok & McClelland Model of Semantic and Episodic Memory
• Model includes slow learning cortical system and a fast-learning hippocampal system.
• Cortex contains units representing both content and context of an experience.
• Semantic memory is gradually built up through repeated presentations of the same content in different contexts.
• Formation of new episodic memory depends on hippocampus and the relevant cortical areas, including context.
– Loss of hippocampus would prevent initial rapid binding of content and context.
• Episodic memories benefit from prior cortical learning when they involve meaningful materials.
[Figure: model diagram with labels Context, Relation, Cue, Target, Neo-Cortex, Hippocampus.]
Simulation Results From KM Model
Cutting (1978), Expt. 1
[Figure: Percent Correct (y-axis, -20 to 100) by Category (Ease of Association) (x-axis: Very Easy, Easy, Fairly Easy, Hard, Very Hard). Series: Control (Model), Amnesic (Model), Control (Expt), Amnesic (Expt), Base rates in model. Data labels in the figure: 84, 0, 70, 9, 0, 68.]
Emergence of Meaning in Learned Distributed Representations through Gradual Interleaved Learning
• Distributed representations (what ML calls embeddings) that capture aspects of meaning emerge through a gradual learning process
• The progression of learning and the representations formed capture many aspects of cognitive development
– Progressive differentiation
– Sensitivity to coherent covariation across contexts
– Reorganization of conceptual knowledge
The Rumelhart Model
The Training Data:
All propositions true of items at the bottom level of the tree, e.g.:
Robin can {grow, move, fly}
[Figure: learned representations at three points in experience: Early, Later, Later Still.]
What happens in this system if we try to learn something new?
Such as a Penguin
Learning Something New
• Used network already trained with eight items and their properties.
• Added one new input unit fully connected to the representation layer
• Trained the network with the following pairs of items:
– penguin-isa: living thing, animal, bird
– penguin-can: grow, move, swim
Rapid Learning Leads to Catastrophic Interference
A Complementary Learning System in the Medial Temporal Lobes
[Figure: cortical areas labeled color, form, motion, action, valence, name, and temporal pole, connected to the Medial Temporal Lobe.]
Avoiding Catastrophic Interference with Interleaved Learning
Initial Storage in the Hippocampus Followed by Repeated Replay Leads to the Consolidation of New Learning in Neocortex, Avoiding Catastrophic Interference
[Figure: cortical areas labeled color, form, motion, action, valence, name, and temporal pole, connected to the Medial Temporal Lobe.]
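The contrast between focused and interleaved learning can be sketched with a toy example. This is a single-layer linear unit trained with the delta rule, not the Rumelhart network used in the simulations. The three input features, the single "can fly" output, the learning rate, and the helper names `train` and `mse` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical input features: [is_bird, is_red, is_black_and_white]
OLD_X = np.array([[1.0, 1.0, 0.0],   # robin
                  [1.0, 0.0, 0.0]])  # canary
OLD_T = np.array([1.0, 1.0])         # both can fly
NEW_X = np.array([[1.0, 0.0, 1.0]])  # penguin
NEW_T = np.array([0.0])              # cannot fly

def train(w, xs, ts, epochs=1000, lr=0.1):
    """Online delta rule: one small weight update per item, swept repeatedly."""
    w = w.copy()
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            w -= lr * (w @ x - t) * x
    return w

def mse(w, xs, ts):
    """Mean squared error of the linear unit on a set of items."""
    return float(np.mean((xs @ w - ts) ** 2))

# Stage 1: learn the old items.
w_old = train(np.zeros(3), OLD_X, OLD_T)

# Stage 2a: focused training on the new item only.
w_focused = train(w_old, NEW_X, NEW_T)

# Stage 2b: the new item interleaved with the old items.
w_interleaved = train(w_old,
                      np.vstack([OLD_X, NEW_X]),
                      np.concatenate([OLD_T, NEW_T]))

print("old-item error after focused training:    ", mse(w_focused, OLD_X, OLD_T))
print("old-item error after interleaved training:", mse(w_interleaved, OLD_X, OLD_T))
```

In this toy, focused training lowers the shared `is_bird` weight to fit the penguin, which damages what was learned about the robin and canary. Interleaving instead places the penguin's exception on the `is_black_and_white` weight and leaves the old items intact.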
Rapid Consolidation of Schema Consistent Information
Richard Morris
Tse et al. (Science, 2007, 2011)
During training, 2 wells uncovered on each trial
Schemata and Schema Consistent Information
• What is a ‘schema’?
– An organized knowledge structure into which existing knowledge is organized.
• What is schema consistent information?
– Information that can be added to a schema without disturbing it.
• What about a penguin?
– Partially consistent
– Partially inconsistent
• In contrast, consider
– a trout
– a cardinal
New Simulations
• Initial training with eight items and their properties as before.
• Added one new input unit fully connected to the representation layer also as before
• Trained the network on one of the following pairs of items:
– penguin-isa & penguin-can
– trout-isa & trout-can
– cardinal-isa & cardinal-can
New Learning of Consistent and Partially Inconsistent Information
[Figure: two panels, Interference and Learning.]
Connection Weight Changes after Simulated NPA, OPA and NM Analogs
Tse et al. (2011)
How Does It Work?
Comparison of the two learning systems
Dense vs Sparse Coding
• Pattern separation:
– Sparse random conjunctive coding
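One way to see why sparse random conjunctive coding separates patterns is the following sketch. It is a toy under stated assumptions, not a model of the dentate gyrus: the layer sizes, the fan-in of 4, and the names `wiring`, `encode`, and `overlap` are all hypothetical. Each code unit fires only when all of its randomly chosen inputs are active, so two inputs that share most of their active units share far fewer active code units.

```python
import numpy as np

rng = np.random.default_rng(0)
N_INPUT, N_CODE, FAN_IN = 100, 20000, 4  # assumed sizes, chosen for illustration

# Each code unit is wired to FAN_IN randomly chosen input units.
wiring = np.array([rng.choice(N_INPUT, size=FAN_IN, replace=False)
                   for _ in range(N_CODE)])

def encode(x):
    """A code unit fires only if ALL of its inputs are active (a conjunction)."""
    return x[wiring].all(axis=1)

def overlap(u, v):
    """Fraction of u's active units that are also active in v."""
    return (u & v).sum() / u.sum()

# Two similar input patterns: 20 active units each, 15 of them shared.
a = np.zeros(N_INPUT, dtype=bool)
a[:20] = True
b = np.zeros(N_INPUT, dtype=bool)
b[5:25] = True

code_a, code_b = encode(a), encode(b)
print("input overlap:", overlap(a, b))
print("code overlap: ", overlap(code_a, code_b))
print("fraction of code units active:", code_a.mean())
```

With a fan-in of 4, a code unit active for the first input stays active for the second only if all four of its inputs fall in the shared region, which is why the code overlap comes out well below the input overlap.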
Similarity Based Representations in Cortex
In more detail…
• Input from neocortex comes into EC; EC projects to DG, CA3, and CA1
• Drastic pattern separation occurs in DG
• Downsampling in CA3 assigns an arbitrary code
• Invertible, somewhat sparsified representation in CA1
• Fewish-shot learning in DG, CA3, CA3->CA1 allows reconstruction of EC pattern from partial input.
• Other connections shown in black are part of the slow-learning neocortical network.
• Recurrence within CA3, through the hippocampal circuit shown, and through the outer loop also involving the rest of the neocortex
Two modes of generalization
• Parametric vs. Item-based
• As long as the embeddings are already known, these modes can both support generalization
• The hippocampus can do so without requiring interleaved learning
• Adapting the embeddings may be relatively hard
[Figure: model diagram with labels Context, Relation, Cue, Target, Neo-Cortex, Hippocampus.]
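The two modes of generalization can be contrasted in a small sketch, assuming the embeddings are already known. Everything here is illustrative: the 2-D embeddings, the "can fly" property, the temperature, and the function names are assumptions. Least squares stands in for parametric learning, and a similarity-weighted lookup over stored items stands in for item-based retrieval.

```python
import numpy as np

# Hypothetical embeddings, assumed already learned: [bird-like, fish-like]
E = np.array([[1.0, 0.0],   # robin
              [0.9, 0.1],   # canary
              [0.0, 1.0],   # salmon
              [0.1, 0.9]])  # sunfish
can_fly = np.array([1.0, 1.0, 0.0, 0.0])

def parametric_fit(embeddings, targets):
    """Parametric mode: compress all items into one weight vector."""
    w, *_ = np.linalg.lstsq(embeddings, targets, rcond=None)
    return w

def item_based_predict(query, embeddings, targets, temperature=0.1):
    """Item-based mode: keep the items and weight each by similarity to the query."""
    dist = np.linalg.norm(embeddings - query, axis=1)
    weights = np.exp(-dist / temperature)
    return float(weights @ targets / weights.sum())

w = parametric_fit(E, can_fly)
new_bird = np.array([0.95, 0.05])  # new item with a known embedding
new_fish = np.array([0.05, 0.95])

for name, q in [("new bird", new_bird), ("new fish", new_fish)]:
    print(name, "parametric:", float(q @ w),
          "item-based:", item_based_predict(q, E, can_fly))
```

In the item-based mode, adding a new experience is a single append to the stored items and leaves the others untouched, which is one way to read the claim that the hippocampus can generalize without interleaved learning.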
How might hippocampus support inference and generalization?
‘Inference’
• Finding missing links in the transitive inference task
Complementary Learning Systems in AI
• DQN
• MBPA (memory based parameter adaptation)
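The rehearsal buffer used with the DQN can be sketched as a fixed-capacity store of past transitions that is sampled at random during training, so that new experience is interleaved with old. This is a minimal illustration, not the implementation from any particular DQN codebase; the class name `ReplayBuffer`, its methods, and the transition fields are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest are evicted first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes recent and older experience.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

# Usage: store five transitions in a buffer that holds three.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = buffer.sample(2)
print(len(buffer), batch)
```

Sampling from the buffer plays a role analogous to hippocampal replay: the slow-learning network is trained on a mixture of old and new experiences rather than on the latest experience alone.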
Tom’s questions for me
• What sort of NN architectures could serve an automated programmer in constructing a program?
• How do you imagine different memory systems working in a human programmer?
• My version of the question:
– What additional forms of memory do intelligent agents need?
Working Memory
• Is there a special working memory system in the brain?
• Or do we learn connection weights that sustain information in an active state in memory?
• RNNs and LSTMs provide forms of working memory
• What is exciting about these models is that they learn what to retain
– We learn to retain the information that will be useful later
The Differentiable Neural Computer
Learning what to store – in two senses
Memory Augmented Neural Networks
Santoro et al. (2016): One-shot learning with MANNs
Some closing comments
• Cognitive Science, Neuroscience, and AI now have increasingly powerful ideas that we can use to help us understand learning and memory
• AI has expanded the space of what we can consider to be learned rather than innate
• But currently, AI breakthroughs are drastically over-compartmentalized
• We can use meta-learning to teach a neural network just about anything
• But there’s little generalization outside of a limited meta-task space
• And there’s very little fully integrative work going on that would allow a single integrated learner to acquire a range of skills, all of which can be brought together to solve the problem of general artificial intelligence