The Principle of Presence:
A Heuristic for Growing Knowledge Structured Neural
Networks
Laurent Orseau, INSA/IRISA, Rennes, France
Neural Networks
Efficient at learning single problems.
Fully connected: convergence in W^3 (W: number of weights).
Lifelong learning: specific cases can be important; more knowledge means more weights; catastrophic forgetting.
-> Full connectivity is not suitable -> need locality.
How can people learn so fast? Focus, attention.
Raw table storing? (e.g. "frog and car and running woman")
With generalization.
What do people memorize? (1)
Memory: a set of « things »; things are made of other, simpler things.
Thing = concept. Basic concept = perceptual event.
What do people memorize? (2)
Remember only what is present in mind at the time of memorization: what is seen, what is heard, what is thought, etc.
What do people memorize? (3)
Not what is not in mind!
Too many concepts are known.
What is present: few things, probably important.
What is absent: many things, probably irrelevant.
Good but not always true -> a heuristic.
Presence in everyday life
Easy to see what is present, harder to tell what is missing.
Infants lose attention to balls that have just disappeared.
The number zero was invented long after the other digits.
Etc.
The principle of presence
Memorization = create a new concept from only the currently active concepts.
Independent of the number of known concepts.
Few active concepts -> few variables -> fast generalization.
Implications
A concept can be active or inactive; activity must reflect importance and be rare (~ an event, in the programming sense).
New concept = conjunction of the active ones.
Concepts must be re-usable (lifelong): re-use = create a link from the concept.
2 independent concepts = 2 units.
-> More symbolic than an MLP, where a neuron can represent too many things.
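A minimal sketch of this memorization step, assuming a simple Concept structure; the names Concept and memorize are illustrative, not from the slides:

```python
# Sketch of "memorization = new concept built from active concepts only".
class Concept:
    def __init__(self, name, parts=()):
        self.name = name
        self.parts = tuple(parts)   # links to the concepts it was built from
        self.active = False

def memorize(known_concepts):
    """Create a new concept as a conjunction of the currently active
    concepts; inactive concepts are never even considered."""
    parts = [c for c in known_concepts if c.active]
    return Concept("".join(c.name for c in parts), parts)

# Only A and B are active at memorization time, so the new concept depends
# on 2 variables, no matter how many concepts are known overall.
a, b, c = Concept("A"), Concept("B"), Concept("C")
a.active = b.active = True
ab = memorize([a, b, c])
print(ab.name, [p.name for p in ab.parts])   # AB ['A', 'B']
```

The cost of this step depends only on the few active concepts, not on the total number of known concepts.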
Implementation: NN
Nonlinearity.
Graph properties: local or global connectivity.
Weights: smooth on-line generalization, resistant to noise.
But more symbolic:
Inactivity: piecewise-continuous activation function.
Knowledge not too distributed; concepts not too overlapping.
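A minimal sketch of such an activation, assuming "inactivity" means a strictly zero output below a threshold; the threshold value 2/3 is only illustrative:

```python
def presence_activation(x, threshold=2/3):
    """Piecewise-continuous: exactly zero below the threshold (true
    inactivity), graded value above it."""
    return 0.0 if x < threshold else x

print(presence_activation(0.5))  # 0.0 -> inactive, contributes nothing downstream
print(presence_activation(0.8))  # 0.8 -> active, graded value preserved
```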
First implementation
Inputs: basic events. Output: target concept. No macro-concepts -> 3 layers.
Neuron = conjunction, unless explicit (supervised learning) -> DNF.
Output weights simulate priority.
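A sketch of this 3-layer structure under one possible reading: hidden units are conjunctions of input events, the output concept is a disjunction of them (a DNF), and the output weights act as priorities. Function names, weights and thresholds are assumptions for illustration:

```python
def conjunction(present, weights):
    """Hidden unit: weighted sum over the inputs it is connected to,
    counting only those that are present (active)."""
    return sum(w for inp, w in weights.items() if inp in present)

def target_output(present, hidden_units, priorities, threshold=0.99):
    """Output concept: disjunction of the hidden conjunctions; the output
    weights (priorities) decide between competing hidden units."""
    acts = [conjunction(present, w) for w in hidden_units]
    return max((p for a, p in zip(acts, priorities) if a >= threshold),
               default=0.0)

# One hidden unit per stored conjunction -> the network computes a DNF.
hidden = [{"A": 1/3, "B": 1/3, "C": 1/3}]             # the term "A and B and C"
print(target_output({"A", "B", "C"}, hidden, [1.0]))  # 1.0: the term matches
print(target_output({"A", "B"}, hidden, [1.0]))       # 0.0: C is missing
```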
Locality in learning
Only one neuron is modified at a time: nearest = most activated.
If the target concept is not activated when it should be: generalize the nearest connected neuron, or add a neuron for that specific case.
If the target is active, but not enough or too much: generalize the most activating neuron.
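A self-contained sketch of this local rule; the data structures, the one-shot generalization update and the thresholds are assumptions, not the author's implementation:

```python
def act(weights, present):
    """Activation of a neuron: summed weight of its inputs that are present."""
    return sum(w for inp, w in weights.items() if inp in present)

def generalize(weights, present):
    """One-shot generalization: move all weight onto the inputs seen in this
    example, keeping the total equal to 1 (the real update is likely gradual)."""
    kept = [inp for inp in weights if inp in present]
    for inp in weights:
        weights[inp] = 1.0 / len(kept) if inp in kept else 0.0

def learn(present, should_fire, neurons, threshold=0.99):
    """Local rule: at most one neuron is modified per example, the 'nearest'
    one being the most activated by the present inputs."""
    acts = [act(w, present) for w in neurons]
    fired = any(a >= threshold for a in acts)
    if should_fire and not fired:
        if acts and max(acts) > 0:
            # generalize the nearest connected neuron ...
            generalize(neurons[acts.index(max(acts))], present)
        else:
            # ... or add a neuron for that specific case
            neurons.append({inp: 1.0 / len(present) for inp in present})
    elif fired and not should_fire:
        # target too active: weaken the most activating neuron (a crude
        # stand-in for the slide's "generalize the most activating neuron")
        strongest = neurons[acts.index(max(acts))]
        for inp in strongest:
            strongest[inp] *= 0.9
```

With these one-shot updates, presenting ABC (which creates a neuron) and then ABD (which generalizes it) already leaves weights of 1/2, 1/2, 0 on A, B, C, in the spirit of the worked example that follows.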
Learning: example (0)
Must learn AB. Examples: ABC, ABD, ABE, but never AB alone.
[Figure: inputs A, B, C, D, E, ...; the target concept AB already exists as an output unit.]
Learning: example (1)
ABC: a neuron N1 is created as a conjunction of A, B and C; the output AB is a disjunction over its hidden neurons.
[Figure: inputs A-E, conjunction neuron N1, disjunction output AB; weights of 1/3 on each input of N1, with values 2/3, 1 and 1 - 1/Ns on the conjunction/disjunction links.]
N1 is active when A, B and C are all active.
Learning: example (3)
ABE: N1 is slightly active for AB, so it is generalized: its weights on A and B grow above 1/3 while its weight on C shrinks below 1/3. A second neuron N2 (input weights of 1/3) has been added for a specific case.
[Figure: inputs A-E, neurons N1 and N2, output AB.]
Learning: example (4)
Final state: N1 has generalized and is active for AB (weights 1/2 on A, 1/2 on B, 0 on C).
N2 is now a useless neuron and is deleted by a criterion.
[Figure: inputs A-E, neurons N1 (weights 1/2, 1/2, 0) and N2 (weights 1/3), output AB.]
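A short numerical check of this trajectory, using the weight values shown in the figures; the activation threshold is an assumption:

```python
def fires(weights, present, threshold=0.99):
    """A conjunction fires when the summed weight of its present inputs is
    (nearly) 1, i.e. when all of its relevant inputs are active."""
    return sum(w for inp, w in weights.items() if inp in present) >= threshold

n1 = {"A": 1/3, "B": 1/3, "C": 1/3}       # N1 just after the example ABC
print(fires(n1, {"A", "B", "C"}))         # True : A, B and C all active
print(fires(n1, {"A", "B"}))              # False: only 2/3, "slightly active"

n1 = {"A": 1/2, "B": 1/2, "C": 0.0}       # N1 after generalization
print(fires(n1, {"A", "B"}))              # True : N1 now responds to AB alone
```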
NETtalk task
TDNN: 120 neurons, 25,200 connections, 90%.
Presence: 753 neurons, 6,024 connections, 74%; then learns the remaining cases by heart.
If the input activity is reversed -> catastrophic!
Are many cognitive tasks heavily biased toward the principle of presence?
Advantages w.r.t. standard NNs
As many inputs as wanted; only the active ones are used.
Lifelong learning: large-scale networks.
Learns specific cases and generalizes, both quickly.
Can lower weights without making wrong predictions -> imitation.
But…
Few data limit the number of neurons: not as good as backprop.
Creates many neurons (but they can be deleted).
No negative weights.