HMM-Based Semantic Learning for a Mobile Robot
Kevin Squire
Language Acquisition and Robotics Group
University of Illinois at Urbana-Champaign
Adviser: Stephen E. Levinson
Language Learning
Kevin Squire—Lincoln Laboratory Interview, 6 June 2005 – p.1
Language Learning ... by Robot!
Philosophy of Language Acquisition
Fundamental Ideas:
The Language Engine is primarily semantic, not syntactic.
There is no such thing as a disembodied mind.
Language and meaning are acquired through interaction with the real world.
Sensory-motor function is essential for human-like cognition.
Mental processes are largely based on associative memory and learning.
Ph.D. Background and Research
1. Infrastructure Development
Hardware
Software
2. Research
Semantic Learning
HMM Cascade Model
Robot Hardware
Robot Hardware Modifications:
Added cameras, microphones, on-board computer, wireless transmitter
Miscellaneous structural changes
Replaced power supply, rewired to supply power to all components
Installed Linux
Distributed Communications
Distributed Computing Framework
[Figure: audio data flow between Illy (robot) and Hal (workstation) over the network. Components: Audio Source (Sound Card), Audio Source (Remote), Audio Server (Sink), Speech Recognition (Sink), Sound Source Location (Sink). Two Audio Ring Buffers are shown, with slots labeled full, filling, and full (old).]
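The slot labels in the ring-buffer figure (full, filling, full (old)) suggest one source writing fixed-size blocks into a circular buffer while several sinks read completed blocks at their own pace. The following is a minimal single-process sketch of that idea, not the robot's actual code. The names `AudioRingBuffer`, `write_block`, and `read_block` are hypothetical, and the policy of skipping overwritten blocks when a sink falls behind is an assumption.

```python
class AudioRingBuffer:
    """Single-writer, multi-reader ring of fixed-size blocks (illustrative)."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots   # completed ("full") blocks
        self.n_slots = n_slots
        self.write_seq = 0              # number of blocks written so far
        self.read_seq = {}              # next block index for each sink

    def add_sink(self, name):
        # A new sink starts at the next block to be written.
        self.read_seq[name] = self.write_seq

    def write_block(self, block):
        # Overwrites the oldest slot once the ring has wrapped around.
        self.slots[self.write_seq % self.n_slots] = block
        self.write_seq += 1

    def read_block(self, name):
        """Return the next unread block for this sink, or None if caught up."""
        seq = self.read_seq[name]
        oldest = max(0, self.write_seq - self.n_slots)
        if seq < oldest:
            seq = oldest                # sink fell behind: skip overwritten blocks
        if seq >= self.write_seq:
            self.read_seq[name] = seq
            return None
        self.read_seq[name] = seq + 1
        return self.slots[seq % self.n_slots]


ring = AudioRingBuffer(n_slots=3)
ring.add_sink("speech_recognition")
ring.add_sink("sound_source_location")
for i in range(5):
    ring.write_block([i] * 4)           # stand-in for a block of audio samples
fast = ring.read_block("speech_recognition")
```

Each sink keeps its own read position, so a slow consumer such as a speech recognizer does not stall a fast one; the cost is that a slow sink may miss blocks.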
Distributed Computing Framework
Allowed the integration of:
Sound source localization (D. Li)
Vision based navigation & learning (W. Zhu)
Speech recognition (Q. Liu/R.S. Lin)
Simple working memory (K. Squire)
Next steps:
Centralized controller (M. McClain)
Semantic learning (K. Squire)
Ph.D. Background and Research
1. Infrastructure Development
Hardware
Software
2. Research
Semantic Learning
HMM Cascade Model
Cognitive Framework
[Figure: block diagram of the Outside World, Sensory System, Motor System, Noetic System, and Somatic System, with Proprioceptive Feedback.]
Associative Learning and Memory
[Figure: the cognitive framework diagram (Outside World, Sensory System, Motor System, Somatic System, Proprioceptive Feedback), with the Noetic System shown as Working Memory and Associative Memory.]
Associative Learning and Memory
[Figure: the Noetic System contains Working Memory and Associative Memory; Associative Memory contains Semantic Memory, Episodic Memory, and Procedural Memory.]
Semantic Memory
[Figure: Associative (Long-term) Memory contains Semantic Memory, Episodic Memory, and Procedural Memory. Semantic Memory is shown with Sensory Inputs, a Visual Model, an Auditory Model, and a Concept Model, with an output to Working Memory.]
Concepts
[Figure: the "Concept of Apple" linked to the spoken word "Apple", the sound *crunch*, and other knowledge: facts, stories, experiences, etc.]
Concept: abstract symbol associated with symbolic representations in the various senses
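Cascade of HMMs
[Figure: the Semantic Memory diagram (Sensory Inputs, Visual Model, Auditory Model, Concept Model, output to Working Memory) mapped onto a cascade of HMMs. The visual HMM $\hat{\phi}^{vis}$ takes observations $y_n^{vis}$ and produces state estimates $\hat{x}_n^{vis}$; the auditory HMM $\hat{\phi}^{aud}$ takes $y_n^{aud}$ and produces $\hat{x}_n^{aud}$; the concept HMM $\hat{\phi}^{con}$ takes $y_n^{con} = \{\hat{x}_n^{vis}, \hat{x}_n^{aud}\}$ and produces $\hat{x}_n^{con}$.]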
Hidden Markov Models (HMMs)
[Figure: three-state HMM with states S1, S2, S3. Transition probabilities labeled on the arcs: 0.1, 0.8, 0.7, 0.2, 0.1, 0.1, 0.2, 0.6, 0.2.]
Maximum-Likelihood Estimation (1)
Traditional methods (Baum-Welch reestimation):
Let $p_n(y_1, \ldots, y_n; \phi)$ be the likelihood of observations $(y_1, \ldots, y_n)$ given HMM $\phi$.
Maximize $p_n(y_1, \ldots, y_n; \phi)$ by solving
$$\nabla_\phi\, p_n(y_1, \ldots, y_n; \phi) = 0.$$
Implemented as an Expectation-Maximization (EM) procedure.
Requires that all of $(y_1, \ldots, y_n)$ be available.
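The batch likelihood that Baum-Welch maximizes can be evaluated with the forward algorithm. The sketch below is a minimal illustration for a discrete-observation HMM, assuming the model is given as an initial distribution `pi`, a transition matrix `A`, and an emission table `B`; these names and the toy parameter values are hypothetical and do not come from the robot. It only evaluates the likelihood and does not perform reestimation.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log p_n(y_1..y_n; phi) for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][y] : probability of observing symbol y in state i
    obs     : the complete observation sequence (all of it is needed)
    """
    n_states = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)              # p(y_t | y_1..y_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Toy two-state model (hypothetical values, for illustration only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]
obs = [0, 1, 1, 0]
ll = forward_log_likelihood(pi, A, B, obs)
```

Rescaling `alpha` at every step avoids numerical underflow on long sequences; the log-likelihood is recovered as the sum of the log scale factors.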
Maximum-Likelihood Estimation (2)
Recursive (stochastic gradient) procedure (LeGland and Mevel):
Rewrite $p_n(y_1, \ldots, y_n; \phi)$ as a sum
$$\log p_n(y_1, \ldots, y_n; \phi) = \sum_{k=1}^{n} \log\big(b(y_k; \phi)'\, u_k(\phi)\big)$$
where
$$b_i(y_k; \phi) = p(y_k \mid x_k = i)$$
$$u_{k,i}(\phi) = \Pr(x_k = i \mid y_1, \ldots, y_{k-1}).$$
Update parameters $\phi$ at time $n + 1$ with
$$\phi = \phi + \varepsilon \Big( \nabla_\phi \log\big(b(y_{n+1}; \phi)'\, u_{n+1}(\phi)\big) \Big)$$
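The sum decomposition can be computed one observation at a time with the standard prediction-filter recursion, in which each new observation contributes one term log(b(y_k)'u_k) and the predicted state distribution is then propagated through the transition matrix. The sketch below illustrates only that streaming decomposition for a discrete HMM. It is not the full recursive update, which would also have to propagate the gradient of u_k with respect to the parameters. Function names and toy values are hypothetical.

```python
import math


def prediction_filter_step(u, A, b_y):
    """One step of the prediction filter.

    u[i]   = Pr(x_k = i | y_1..y_{k-1})
    b_y[i] = p(y_k | x_k = i)
    Returns (log(b(y_k)' u_k), u_{k+1}).
    """
    n = len(u)
    norm = sum(b_y[i] * u[i] for i in range(n))         # b(y_k)' u_k
    u_next = [
        sum(A[i][j] * b_y[i] * u[i] for i in range(n)) / norm
        for j in range(n)
    ]
    return math.log(norm), u_next


def streaming_log_likelihood(pi, A, B, obs):
    """Accumulate log p_n as a running sum, one observation at a time."""
    u = list(pi)                    # u_1 is the initial state distribution
    total = 0.0
    for y in obs:
        b_y = [B[i][y] for i in range(len(pi))]
        increment, u = prediction_filter_step(u, A, b_y)
        total += increment
    return total


# Toy two-state model (hypothetical values, for illustration only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]
total = streaming_log_likelihood(pi, A, B, [0, 1, 1, 0])
```

Because each step needs only the current `u` and the newest observation, the per-step term is available as soon as the observation arrives, which is what makes a stochastic-gradient update at time n + 1 possible.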
Semantic Learning Simulation
[Figure: two sets of Visual, Auditory, and Concept Models, with the spoken word "Apple".]
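Simulation Model Topology
[Figure: topology of the simulation, relating the models labeled "boy" ($\phi$) and "robot" ($\hat{\phi}$). The robot's auditory, visual, and concept HMMs $\hat{\phi}^a$, $\hat{\phi}^v$, $\hat{\phi}^c$ produce state estimates $\hat{x}_n^a$ and $\hat{x}_n^v$, with $y_n^c = \{\hat{x}_n^v, \hat{x}_n^a\}$.]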
Simulation Results
Model            Maximum Likelihood Classification   Viterbi Classification
$\hat{\phi}^a$   90.1% (3.7%)                        89.9% (4.2%)
$\hat{\phi}^v$   97.9% (1.6%)                        99.1% (2.4%)
$\hat{\phi}^c$   98.4% (1.1%)                        98.8% (1.1%)
Average classification accuracy for the learned models over 50 runs. The number in parentheses is the standard deviation.
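The two column headings name two ways of labeling observations with HMM states. As a rough sketch, Viterbi classification can be read as decoding the single most likely state sequence, which uses the transition structure. The slide does not spell out what "Maximum Likelihood Classification" means; the sketch assumes it is a per-observation choice of the state with the highest emission probability, ignoring the dynamics. Both functions, and the toy model, are illustrative assumptions for a discrete HMM.

```python
import math


def viterbi(pi, A, B, obs):
    """Most likely state sequence for a discrete HMM (log domain)."""
    n = len(pi)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    back = []                                   # back-pointers per time step
    for y in obs[1:]:
        prev = delta
        ptr, delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            ptr.append(best)
            delta.append(prev[best] + log(A[best][j]) + log(B[j][y]))
        back.append(ptr)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):                  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]


def ml_classify(B, obs):
    """Assumed reading of ML classification: argmax_i b_i(y_n) per observation."""
    return [max(range(len(B)), key=lambda i: B[i][y]) for y in obs]


# Toy "sticky" two-state model (hypothetical values).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.2, 0.8]]
obs = [0, 0, 1, 0, 0]
viterbi_path = viterbi(pi, A, B, obs)
ml_labels = ml_classify(B, obs)
```

On this toy input the per-observation rule follows the isolated outlier symbol, while Viterbi decoding smooths it away because a state switch is unlikely under `A`.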
Robot Implementation
The robot should:
Recognize visual inputs
Recognize auditory inputs
Learn concepts
Cascade of HMMs as an Associative Memory
Cascade Model: [Figure: the visual HMM $\hat{\phi}^{vis}$ maps $y_n^{vis}$ to $\hat{x}_n^{vis}$, the auditory HMM $\hat{\phi}^{aud}$ maps $y_n^{aud}$ to $\hat{x}_n^{aud}$, and the concept HMM $\hat{\phi}^{con}$ maps $y_n^{con} = \{\hat{x}_n^{vis}, \hat{x}_n^{aud}\}$ to $\hat{x}_n^{con}$.]
Auditory-only Classification: $y_n^{con} = \hat{x}_n^{aud}$
Visual-only Classification: $y_n^{con} = \hat{x}_n^{vis}$
Audio-Visual Learning: $y_n^{con} = \{\hat{x}_n^{vis}, \hat{x}_n^{aud}\}$
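One way to picture the concept layer acting as an associative memory is a classifier that accepts the state estimates of either or both lower models. The sketch below is a simplified, static illustration under several assumptions: the concept model's emission distribution is stored as a table over (visual state, auditory state) pairs, concept transitions are ignored, and a missing modality is handled by summing the table over the absent component. The slides state only which inputs the concept model receives in each mode, so the marginalization and the function name `concept_posterior` are hypothetical.

```python
def concept_posterior(prior, emit, vis=None, aud=None):
    """Posterior over concepts from visual and/or auditory state estimates.

    prior[c]      : prior probability of concept c
    emit[c][v][a] : Pr(visual state v, auditory state a | concept c)
    vis, aud      : observed state indices, or None if that modality is absent
    """
    scores = []
    for c, p_c in enumerate(prior):
        table = emit[c]
        if vis is not None and aud is not None:
            lik = table[vis][aud]                   # audio-visual input
        elif vis is not None:
            lik = sum(table[vis])                   # visual only: sum over audio
        elif aud is not None:
            lik = sum(row[aud] for row in table)    # auditory only: sum over vision
        else:
            lik = 1.0                               # no evidence: return the prior
        scores.append(p_c * lik)
    total = sum(scores)
    return [s / total for s in scores]


# Two concepts, two visual states, two auditory states (hypothetical values).
prior = [0.5, 0.5]
emit = [
    [[0.80, 0.10], [0.05, 0.05]],   # concept 0: mostly (vis 0, aud 0)
    [[0.05, 0.05], [0.10, 0.80]],   # concept 1: mostly (vis 1, aud 1)
]
both = concept_posterior(prior, emit, vis=0, aud=0)
vis_only = concept_posterior(prior, emit, vis=0)
aud_only = concept_posterior(prior, emit, aud=1)
```

With a joint table, evidence from one sense is enough to recover the concept, and the concept in turn indicates which state is expected in the other sense.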
Robot Demonstration
Visual Objects: Words:
cat
dog
red ball
green ball
ball
animal
Concepts:
cat
dog
red ball
green ball
ball
animal
Robotic Controller
[Figure: state diagram of the robot controller. States: 1-Explore, 2-Object Found, 3-Learn Name, 4-Play 1, 5-Play 2, 6-Interact, 7-Search. Transition labels (condition/action) include: visible object; object lost; object far/approach object; object near/learn object; speech/repeat & learn; unknown speech/beep; speech: "illy"/turn toward sound; hear known object/search for object; found desired object/say name; wrong object/explore; found nothing; timeout expired/beep; silence (timeout); ---/pick up object; ---/run into object; ---/explore.]
Video
Conclusion
1. Built a platform upon which to conduct language acquisition research.
2. Proposed a general model of semantic concept learning.
3. Successfully implemented this model in a real robot using a cascade of HMMs.
Future Directions/Interests
1. Grow/shrink models (combine/split states) as appropriate (e.g., use minimum description length (MDL) or related measures).
2. Apply associative learning to spatial/action information, other modalities.
3. Study sentence comprehension (e.g., combine syntactic and semantic information).
4. Incorporate reinforcement into current unsupervised training scheme.
Questions?
Robot Visual HMM
Initialization: K-means on labeled data
Online learning with RMLE
Features: color histogram, moment, height/width ratio
Fixed number of classes
[Figure: object HMM $\hat{\phi}^{obj}$ with states S1, S2, S3; labels $y_n^{obj}$, $\tilde{y}_n^{obj}$, and $\hat{x}_n^{obj}$.]
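Robot Auditory Model
3 State HMM
States: Silence, Voiced, Unvoiced
Features: 3 log-area ratios + log-energy + voicing
Online update with RMLE possible
"Word Recognizer"
Features: histogram of audio states + word length
Can distinguish some words
Initial training offline with RMLE
[Figure: audio observations $\{y_n^{aud}\}$ enter the auditory HMM $\hat{\phi}^{aud}$, which produces state estimates $\{\hat{x}_n^{aud}\}$; the word HMM $\hat{\phi}^{word}$ (states S1, S2, S3) takes $\tilde{y}_{n'}^{word}$ and produces $\hat{x}_{n'}^{word}$.]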
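The word recognizer's features (a histogram of audio states plus word length) can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the audio states are encoded as 0 = Silence, 1 = Voiced, 2 = Unvoiced, the histogram is normalized by the segment length, and word length is measured in frames. None of these choices is confirmed by the slides, and `word_features` is a hypothetical name.

```python
def word_features(audio_states, n_states=3):
    """Normalized histogram of decoded audio states plus segment length.

    audio_states : state indices decoded by the audio HMM for one word segment
    n_states     : number of audio states (3 for Silence, Voiced, Unvoiced)
    """
    length = len(audio_states)
    counts = [0] * n_states
    for s in audio_states:
        counts[s] += 1
    # Fractions of the segment spent in each state; zeros for an empty segment.
    hist = [c / length for c in counts] if length else [0.0] * n_states
    return hist + [float(length)]


# Assumed encoding: 0 = Silence, 1 = Voiced, 2 = Unvoiced.
segment = [2, 1, 1, 1, 1, 2, 2, 0]
features = word_features(segment)
```

A histogram discards the order of the states, which is consistent with the slide's remark that this recognizer can distinguish only some words.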
Robot Concept HMM
6 State Discrete HMM
Observations: states of Visual and Auditory models
Initialized offline
Online update with RMLE
[Figure: HMM state diagram with states S1, S2, S3.]
RMLE Convergence (1)
Conditions for RMLE Convergence:
The transition probability matrix $A(\phi^*)$ is aperiodic and irreducible.
The mapping $\phi \to A(\phi)$ is twice differentiable with bounded first and second derivatives and Lipschitz continuous second derivative.
The mapping $\phi \to b(y_k; \phi)$ is three times differentiable, and the function $b(y_k; \theta)$ is continuous on $\mathbb{R}$ for every $\theta \in \Theta$.
Alternately, for $y_k$ drawn from a finite alphabet, the mapping $\phi \to b(y_k; \phi)$ is twice differentiable with bounded first and second derivatives and Lipschitz continuous second derivative.
RMLE Convergence (2)
Conditions for RMLE Convergence:
Under $P_{\phi^*}$, the extended Markov chain
$$\{X_n, Y_n, u_n(\phi), w_n(\phi)\}$$
is geometrically ergodic (see LeGland and Mevel, '96).
Because of this geometric ergodicity, the initial values of $u_0(\phi)$ and $w_0(\phi)$ are forgotten exponentially fast.
MPEG Video