Distributed Representations:
Preaching to the Choir in Church(land)
Simon D. Levy
Department of Computer Science
Washington and Lee University
Lexington, VA 24450
PHIL 395
9 May 2006
Theme: A Neuro-Manifesto
The real motive behind eliminative materialism
is the worry that the “propositional” kinematics
and “logical” dynamics of folk psychology
constitute a radically false account of the
cognitive activity of humans, and of the higher
animals generally. The worry is that our folk
conception of how cognitive creatures represent
the world ... is a thoroughgoing
misrepresentation of what really takes
place inside us.
[It] turns out that we don't think the way we think we
think! The scientific evidence coming in all around us
is clear: Symbolic conscious reasoning, which is
extracted through protocol analysis from serial verbal
introspection, is a myth. [It] is entirely clear that the
symbolic mind that AI has tried for 50 years to
simulate is just a story we humans tell ourselves to
predict and explain the
unimaginably complex processes
occurring in our evolved brains.
Local vs. Distributed Representation
• Folk psychology representations are local: “a place
for every symbol, and every symbol in its place”.
• Neural-net representations are distributed: “each
entity is represented by a pattern of activity
distributed over many computing elements, and each
computing element is involved in representing many
different entities”. (Hinton 1984)
• The most common distributed representation is a vector
of real numbers.
• You already know how vectors can be obtained by
back-propagation / gradient-descent.
• Today I’ll talk about some other (faster, more
plausible) ways of obtaining the vectors.
Variation I: The Hard Problem
A typical American seventh grader knows the meaning
of 10-15 words today that she didn't know yesterday ...
The typical seventh grader would have read less than
50 paragraphs since yesterday, from which she should
have learned less than three new words.
Apparently, she mastered the meanings of many words
that she did not encounter. - Landauer 1997
Latent Semantic Analysis
“You shall know a word by the company it keeps”
– J. R. Firth
• Make a table showing how many times each word occurs
in each of a set of documents, or with another word, etc. -
purely local info
• Mathematically “smear” this information across each row
of the table, showing how likely the word would be to occur
in the other documents – distributed info (sketched below)
Landauer, T. K., Foltz, P. W., & Laham, D. (1998).
Introduction to Latent Semantic Analysis.
Discourse Processes, 25, 259-284.
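A minimal sketch of the “smearing” step, using numpy's SVD on a hypothetical toy term-document count matrix (the words and counts below are invented for illustration):

```python
import numpy as np

# Toy term-document count matrix: rows = words, columns = documents.
# (Hypothetical counts, for illustration only.)
words = ["lois", "clark", "loves", "kryptonite", "reporter"]
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 1],
    [2, 1, 0, 0],
    [0, 0, 3, 1],
    [1, 2, 0, 1],
], dtype=float)

# SVD factors the counts into orthogonal "topic" directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values: this is the "smearing".
# The rank-k reconstruction fills in plausible co-occurrences that
# the words never actually had.
k = 2
X_smeared = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each word's distributed representation is its row in the reduced space.
word_vectors = U[:, :k] * s[:k]

# Similar words end up close together (cosine similarity).
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(word_vectors[0], word_vectors[1]))  # e.g. "lois" vs. "clark"
```

Truncating to k singular values forces every word to be expressed as a blend of a few shared directions, which is what lets a word pick up meaning from documents it never appeared in.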
Latent Semantic Analysis
• As in Elman’s SRN, representations of similar concepts end up close
together in “meaning space”
• Amazingly useful:
• Intelligent information retrieval: “Smart Googling”
(Berry et al. 1994)
• Automatic essay grading: “Who’s really looking at your SAT?”
(Landauer et al. 2000)
• Disambiguating words for automatic translation
(Davis & Levy 2006: http://www.cs.wlu.edu/translate)...
Variation II: The Harder Problem
The Language of Thought: Binding and Recursion
• LSA (and Elman-style hidden vectors) only give us the
representations of individual words/concepts
• Documents are just unstructured “bags of words”
• Without folk-psychological structures, how do we represent
1) the distinction between, e.g., “Lois loves Clark” and
“Clark loves Lois”?
2) intentional concepts like
“Perry knows that [Lois loves Clark]”?
Binding as Vector Product (Smolensky 1990)
[Figure: tensor-product binding. © 2004 Indiana University and Michael Gasser.
www.cs.indiana.edu/classes/b651/Notes/convolution.html, 24 Feb 2004]
• Cool, but problematic, because representations keep
getting bigger...
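A minimal numpy sketch of binding as a vector (outer) product, with small random vectors standing in for the role and filler representations (the names and dimension are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # tiny dimension, for display

# Random role and filler vectors (hypothetical toy representations).
LOVER, LOVEE = rng.standard_normal(n), rng.standard_normal(n)
LOIS, CLARK = rng.standard_normal(n), rng.standard_normal(n)

# Smolensky binding: the tensor (outer) product of role and filler,
# summed over the proposition's role/filler pairs.
lois_loves_clark = np.outer(LOVER, LOIS) + np.outer(LOVEE, CLARK)
clark_loves_lois = np.outer(LOVER, CLARK) + np.outer(LOVEE, LOIS)

print(np.allclose(lois_loves_clark, clark_loves_lois))  # False: order matters

# The problem: each binding squares the dimensionality. A proposition
# over n-dim vectors takes n*n numbers; nesting another level takes
# n*n*n, and so on.
print(lois_loves_clark.shape)  # (4, 4)
```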
Holographic Reduced Representations (Plate 1991)
• Binding by “circular convolution”: a sum over diagonals that wraps around to keep a fixed size: c_j = Σ_k a_k · b_((j-k) mod n)
[Figure: circular convolution. © 2004 Indiana University and Michael Gasser.
www.cs.indiana.edu/classes/b651/Notes/convolution.html, 24 Feb 2004]
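A minimal sketch of circular convolution as binding, computed here via the FFT; circular correlation serves as the approximate inverse (unbinding). The variance-1/n normal vectors follow Plate's usual setup, but the specific numbers are illustrative:

```python
import numpy as np

def cconv(a, b):
    # Circular convolution: c[j] = sum_k a[k] * b[(j - k) mod n],
    # computed via the FFT for convenience.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    # Circular correlation: the approximate inverse of cconv,
    # used to unbind a role from a trace.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

n = 512
rng = np.random.default_rng(1)
LOVER, LOIS = rng.normal(0, 1 / np.sqrt(n), (2, n))

trace = cconv(LOVER, LOIS)     # same size as the inputs: (512,)
decoded = ccorr(LOVER, trace)  # noisy reconstruction of LOIS

# The decoded vector has high cosine similarity to LOIS.
print(decoded @ LOIS / (np.linalg.norm(decoded) * np.linalg.norm(LOIS)))
```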
Holographic Reduced Representations (Plate 1991)
• Keeping the number of dimensions constant allows us to build intentional representations of arbitrary complexity (sketched below):
KNOWER*PERRY + KNOWN * (LOVER*LOIS + LOVEE*CLARK)
• As with LSA, similar propositions end up close together in “proposition space”
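A sketch of that nested encoding (cconv/ccorr redefined from the previous sketch so this one stands alone); every symbol here is a random HRR vector, and the decoded filler is only approximately CLARK, as is usual with HRRs:

```python
import numpy as np

def cconv(a, b):  # circular convolution (binding)
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):  # circular correlation (unbinding)
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

n = 1024
rng = np.random.default_rng(2)
KNOWER, KNOWN, PERRY, LOVER, LOVEE, LOIS, CLARK = \
    rng.normal(0, 1 / np.sqrt(n), (7, n))

# Inner proposition: Lois loves Clark.
loves = cconv(LOVER, LOIS) + cconv(LOVEE, CLARK)

# Outer proposition: Perry knows that [Lois loves Clark].
# Still an n-dimensional vector, however deep the nesting goes.
knows = cconv(KNOWER, PERRY) + cconv(KNOWN, loves)

# Unbind twice to recover who is loved in the known proposition.
inner = ccorr(KNOWN, knows)
lovee_hat = ccorr(LOVEE, inner)
print(lovee_hat @ CLARK /
      (np.linalg.norm(lovee_hat) * np.linalg.norm(CLARK)))  # well above chance
```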
Holographic Reduced Representations (Plate 1991)
• Mathematically, the same operations are used to produce holograms
Variation III: The Hardest Problem
Language
• Language is a structured relationship between a set
of structured meanings and a set of structured
utterances.
• Children acquire this mapping after exposure to a
tiny fraction of the possible meaning/utterance pairs,
and [pace Elman] with very little corrective
feedback.
Asking the Right Questions
• How might a language organize itself to deal with
the fact that only an infinitesimal fraction of the
possible meaning/utterance pairs will be heard by a
given speaker in their lifetime?
• How might a nervous system (synaptic weights,
topology of neurons) organize itself to match the
regularities in its environment?
Self-Organizing Maps (Kohonen 1984)
• Input data consisting of N-dimensional vectors
• Nodes (units) in a 2D grid
• Each node has a synaptic weight vector of N dimensions
• Simple, “unsupervised” learning algorithm...
SOM Learning Algorithm
1. Pick an input vector at random
2. The “winning” node is the one whose weight vector is
closest to the input vector in vector space
3. Update the weights of the winner and its grid neighbors
to move them closer to the input
Get Matlab code: http://www.cs.wlu.edu/~levy/som
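The linked Matlab code is the author's; below is a minimal Python sketch of the same three steps. The 10x10 grid, 2-D inputs, and shrinking Gaussian neighborhood are common choices, not taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(3)

# Training data: random points in the unit square (so N = 2 here).
data = rng.random((1000, 2))

# A 10x10 grid of nodes, each with an N-dimensional weight vector.
rows, cols, n = 10, 10, 2
weights = rng.random((rows, cols, n))

# Grid coordinates of every node, for finding the winner's neighbors.
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                            indexing="ij"), axis=-1)

epochs = 20
for t in range(epochs):
    # Learning rate and neighborhood radius shrink over time.
    alpha = 0.5 * (1 - t / epochs)
    sigma = max(1.0, (rows / 2) * (1 - t / epochs))
    for x in data[rng.permutation(len(data))]:
        # 1. Pick an input vector at random (a shuffled pass over the data).
        # 2. The "winning" node has the weight vector closest to x.
        d = np.linalg.norm(weights - x, axis=-1)
        winner = np.unravel_index(np.argmin(d), d.shape)
        # 3. Move the winner and its grid neighbors toward the input,
        #    with a Gaussian falloff over distance in the grid.
        g = np.exp(-np.sum((grid - np.array(winner)) ** 2, axis=-1)
                   / (2 * sigma ** 2))
        weights += alpha * g[..., None] * (x - weights)

# After training, the grid has "unfolded" to cover the input distribution.
print(weights.reshape(-1, n).min(axis=0), weights.reshape(-1, n).max(axis=0))
```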
SOM Learning: A Two-Part Invention in Two Dimensions
SOM Learning: A Three-Part Invention in Three Dimensions
Self-Organizing Language
● So the grid can have any number of dimensions!
● Replace the grid with a high-dimensional HRR vector
● Learn to map from HRRs for meanings to HRRs
for utterances.
● What sort of regularities emerge?
Conclusions
● Distributed/vector representations can encode all
sorts of information once thought to be solely the
domain of folk psychology.
● But we will need completely new organizational
principles (holograms, deformable maps, fractals,
error gradients) to be able to tackle the really hard
problems.
Thank You!