
HMM-Based Semantic Learning for a Mobile Robot

Kevin Squire

Language Acquisition and Robotics Group

University of Illinois at Urbana-Champaign

Adviser: Stephen E. Levinson

Language Learning

Kevin Squire—Lincoln Laboratory Interview, 6 June 2005 – p.1

Language Learning ... by Robot!


Philosophy of Language Acquisition

Fundamental Ideas:

The Language Engine is primarily semantic, not syntactic.

There is no such thing as a disembodied mind.

Language and meaning are acquired through interaction with the real world.

Sensory-motor function is essential for human-like cognition.

Mental processes are largely based on associative memory and learning.


Ph.D. Background and Research

1. Infrastructure Development

Hardware

Software

2. Research

Semantic Learning

HMM Cascade Model


Robot Hardware


Robot Hardware Modifications:

Added cameras, microphones, on-board computer, wireless transmitter

Miscellaneous structural changes

Replaced power supply, rewired to supply power to all components

Installed Linux





Distributed Communications


Distributed Computing Framework

[Figure: distributed audio pipeline between Illy (robot) and Hal (workstation), linked over the network. Labeled components: Audio Source (Sound Card), Audio Source (Remote), Audio Server (Sink) (shown twice), Speech Recognition (Sink), and Sound Source Location (Sink). Two AudioRing Buffers are drawn, with segments labeled full, filling, or full (old).]
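The segment labels on the AudioRing Buffer (full, filling, full (old)) suggest a ring of fixed-size slots in which one slot is written while the others remain readable. The following is a minimal sketch of that idea only. The class name `AudioRingBuffer`, its methods, and the slot sizes are hypothetical and are not taken from the robot's software.

```python
class AudioRingBuffer:
    """Ring of equal-sized slots: one slot fills while the others stay readable."""

    def __init__(self, num_slots, slot_size):
        self.slot_size = slot_size
        self.slots = [[] for _ in range(num_slots)]
        self.write_index = 0      # slot currently "filling"
        self.slots_completed = 0  # total number of slots ever filled

    def write(self, samples):
        """Append samples, moving to the next slot whenever the current one is full."""
        for sample in samples:
            current = self.slots[self.write_index]
            current.append(sample)
            if len(current) == self.slot_size:
                self.slots_completed += 1
                self.write_index = (self.write_index + 1) % len(self.slots)
                self.slots[self.write_index] = []  # reuse (overwrite) the oldest slot

    def state(self, index):
        """Label a slot the way the figure does (plus 'empty' before first use)."""
        if index == self.write_index:
            return "filling"
        if len(self.slots[index]) < self.slot_size:
            return "empty"
        # The slot just after the write position holds the oldest data.
        oldest = (self.write_index + 1) % len(self.slots)
        return "full (old)" if index == oldest else "full"

    def latest_full(self):
        """Return a copy of the most recently completed slot, or None."""
        if self.slots_completed == 0:
            return None
        newest = (self.write_index - 1) % len(self.slots)
        return list(self.slots[newest])


# Usage: three slots of two samples each, five samples written.
buffer = AudioRingBuffer(num_slots=3, slot_size=2)
buffer.write([1, 2, 3, 4, 5])
states = [buffer.state(i) for i in range(3)]
```

A consumer such as a speech recognizer would read `latest_full()` while the producer keeps writing, so neither side blocks on the other in this simplified single-threaded picture.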


Distributed Computing Framework

Allowed the integration of:

Sound source localization (D. Li)

Vision based navigation & learning (W. Zhu)

Speech recognition (Q. Liu/R.S. Lin)

Simple working memory (K. Squire)

Next steps:

Centralized controller (M. McClain)

Semantic learning (K. Squire)






Cognitive Framework

[Figure: block diagram of the Sensory System, Motor System, Noetic System, and Somatic System interacting with the Outside World, with Proprioceptive Feedback.]


Associative Learning and Memory

[Figure: the same block diagram (Sensory System, Motor System, Somatic System, Outside World, Proprioceptive Feedback), with the Noetic System shown alongside Working Memory and Associative Memory.]


Associative Learning and Memory

[Figure: Noetic System with Working Memory and Associative Memory; Associative Memory is broken out into Semantic Memory, Episodic Memory, and Procedural Memory.]


Semantic Memory

[Figure: Associative (Long term) Memory with Semantic Memory, Episodic Memory, and Procedural Memory. Semantic Memory is broken out into a Visual Model and an Auditory Model, driven by Sensory Inputs, and a Concept Model whose output goes to Working Memory.]


Concepts

[Figure: the Concept of Apple, linked to the word "Apple", the sound * crunch *, and other knowledge: facts, stories, experiences, etc.]

Concept: abstract symbol associated with symbolic representations in the various senses




Cascade of HMMs

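[Figure: the semantic memory block diagram (Visual Model, Auditory Model, Concept Model; Sensory Inputs in, output to Working Memory) is mapped (=⇒) onto a cascade of HMMs. The visual HMM $\hat{\phi}^{vis}$ takes observations $y_n^{vis}$ and outputs state estimates $\hat{x}_n^{vis}$; the auditory HMM $\hat{\phi}^{aud}$ takes $y_n^{aud}$ and outputs $\hat{x}_n^{aud}$; the concept HMM $\hat{\phi}^{con}$ takes $y_n^{con} = \{\hat{x}_n^{vis}, \hat{x}_n^{aud}\}$ and outputs $\hat{x}_n^{con}$.]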


Hidden Markov Models (HMMs)

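[Figure: a three-state HMM (S1, S2, S3) built up over four slide steps, with transition probabilities labeled 0.1, 0.2, 0.6, 0.7, and 0.8. The assignment of values to individual edges is not recoverable from the extracted text.]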


Maximum-Likelihood Estimation (1)

Traditional methods (Baum-Welch reestimation):

Let $p_n(y_1, \ldots, y_n; \phi)$ be the likelihood of observations $(y_1, \ldots, y_n)$ given HMM $\phi$.

Maximize $p_n(y_1, \ldots, y_n; \phi)$ by solving

$$\nabla_\phi\, p_n(y_1, \ldots, y_n; \phi) = 0.$$

Implemented as an Expectation-Maximization (EM) procedure.

Requires that all of $(y_1, \ldots, y_n)$ be available.
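To make the quantity being maximized concrete, here is a minimal sketch of evaluating log p_n(y_1, ..., y_n; ϕ) with the scaled forward algorithm for a discrete-observation HMM. It only evaluates the likelihood; it is not Baum-Welch reestimation. The function name and the example parameters `PI`, `A`, and `B` are hypothetical and are not the models used in the talk.

```python
import math


def forward_log_likelihood(pi, A, B, observations):
    """Return log p_n(y_1..y_n; phi) for a discrete HMM phi = (pi, A, B).

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][y] : probability of emitting symbol y from state i
    """
    num_states = len(pi)
    # alpha[i] is proportional to Pr(x_k = i, y_1..y_k); it is rescaled each step.
    alpha = [pi[i] * B[i][observations[0]] for i in range(num_states)]
    log_likelihood = 0.0
    for k, y in enumerate(observations):
        if k > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(num_states)) * B[j][y]
                for j in range(num_states)
            ]
        scale = sum(alpha)              # equals p(y_k | y_1..y_{k-1})
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_likelihood


# Hypothetical 3-state, 2-symbol model for illustration only.
PI = [0.6, 0.3, 0.1]
A = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
B = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

log_p = forward_log_likelihood(PI, A, B, [0, 0, 1, 1, 0])
```

Note that the whole observation sequence is passed in at once, which mirrors the batch requirement stated on the slide.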


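Maximum-Likelihood Estimation (2)

Recursive (stochastic gradient) procedure (LeGland and Mevel):

Rewrite $p_n(y_1, \ldots, y_n; \phi)$ as a sum

$$\log p_n(y_1, \ldots, y_n; \phi) = \sum_{k=1}^{n} \log\left( b(y_k; \phi)'\, u_k(\phi) \right)$$

where

$$b_i(y_k; \phi) = p(y_k \mid x_k = i)$$

$$u_{ki}(\phi) = \Pr(x_k = i \mid y_1, \ldots, y_{k-1}).$$

Update parameters $\phi$ at time $n + 1$ with

$$\phi = \phi + \varepsilon \left( \nabla_\phi \log\left( b(y_{n+1}; \phi)'\, u_{n+1}(\phi) \right) \right)$$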
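The sum above can be accumulated one observation at a time, which is what makes an online update possible. The sketch below computes the running log-likelihood using the standard prediction-filter recursion u_{k+1} = A' diag(b(y_k)) u_k / (b(y_k)' u_k) for a discrete-observation HMM. It does not implement the gradient step itself. The function name and the example parameters are hypothetical.

```python
import math


def prediction_filter_log_likelihood(u1, A, B, observations):
    """Accumulate sum_k log(b(y_k)' u_k) one observation at a time.

    u1[i]   : Pr(x_1 = i) before any observation is seen
    A[i][j] : transition probability from state i to state j
    B[i][y] : b_i(y) = p(y | x = i) for a discrete observation y
    Returns the running log-likelihood after each observation.
    """
    num_states = len(u1)
    u = list(u1)
    running, total = [], 0.0
    for y in observations:
        b = [B[i][y] for i in range(num_states)]
        increment = sum(b[i] * u[i] for i in range(num_states))  # b(y_k)' u_k
        total += math.log(increment)
        running.append(total)
        # Prediction filter: u_{k+1,j} = sum_i A[i][j] b_i(y_k) u_{k,i} / (b' u_k)
        u = [
            sum(A[i][j] * b[i] * u[i] for i in range(num_states)) / increment
            for j in range(num_states)
        ]
    return running


# Hypothetical 3-state, 2-symbol model for illustration only.
U1 = [0.6, 0.3, 0.1]
A = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
B = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

running_log_p = prediction_filter_log_likelihood(U1, A, B, [0, 0, 1, 1, 0])
```

Each new observation adds exactly one term to the total, so only the current filter `u` has to be kept in memory between observations.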


Cascade of HMMs

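[Figure: cascade of HMMs. $y_n^{vis}$ feeds $\hat{\phi}^{vis}$, giving $\hat{x}_n^{vis}$; $y_n^{aud}$ feeds $\hat{\phi}^{aud}$, giving $\hat{x}_n^{aud}$; both state estimates feed $\hat{\phi}^{con}$, which outputs $\hat{x}_n^{con}$.]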


Semantic Learning Simulation

[Figure: two sets of Visual, Auditory, and Concept models, shown with the example word "Apple".]


Simulation Model Topology

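[Figure: simulation topology, labeled "boy" and "robot". The layout is not recoverable from the extracted text. Legible labels include the models $\phi^v$, $\lambda^a$, and $\phi^c$, the learned models $\hat{\phi}^v$, $\hat{\phi}^a$, and $\hat{\phi}^c$, the observations $y_n^{vis}$ and $y_n^{aud}$, the state estimates $\hat{x}_n^v$ and $\hat{x}_n^a$, and the concept observation $\hat{y}_n^c = \{\hat{x}_n^v, \hat{x}_n^a\}$.]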


Simulation Results

Model    Maximum Likelihood Classification    Viterbi Classification
ϕ̂a       90.1% (3.7%)                         89.9% (4.2%)
ϕ̂v       97.9% (1.6%)                         99.1% (2.4%)
ϕ̂c       98.4% (1.1%)                         98.8% (1.1%)

Average classification accuracy for the learned models over 50 runs. The number in parentheses is the standard deviation.
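For readers unfamiliar with the Viterbi column, the following is a minimal sketch of Viterbi decoding for a discrete-observation HMM, together with a simple accuracy measure. The slides do not define either classifier in detail, so this is the textbook algorithm rather than the exact evaluation procedure used in the simulation. All names and the toy model are hypothetical.

```python
import math


def safe_log(p):
    """Log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(pi, A, B, observations):
    """Return the most likely state sequence for the given observations."""
    num_states = len(pi)
    delta = [safe_log(pi[i]) + safe_log(B[i][observations[0]])
             for i in range(num_states)]
    backpointers = []
    for y in observations[1:]:
        previous = delta
        pointers, delta = [], []
        for j in range(num_states):
            # Best predecessor of state j at this time step.
            best_i = max(range(num_states),
                         key=lambda i: previous[i] + safe_log(A[i][j]))
            pointers.append(best_i)
            delta.append(previous[best_i] + safe_log(A[best_i][j])
                         + safe_log(B[j][y]))
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(num_states), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def accuracy(predicted, truth):
    """Fraction of time steps where the decoded state matches the true state."""
    matches = sum(1 for p, t in zip(predicted, truth) if p == t)
    return matches / len(truth)


# Hypothetical 2-state, 2-symbol model for illustration only.
PI = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]

decoded = viterbi(PI, A, B, [0, 0, 1, 1])
```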


Robot Implementation

The robot should:

Recognize visual inputs

Recognize auditory inputs

Learn concepts


Cascade of HMMs as an Associative Memory

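Cascade Model:

[Figure: full cascade. $y_n^{vis}$ feeds $\hat{\phi}^{vis}$, giving $\hat{x}_n^{vis}$; $y_n^{aud}$ feeds $\hat{\phi}^{aud}$, giving $\hat{x}_n^{aud}$; both feed $\hat{\phi}^{con}$, which outputs $\hat{x}_n^{con}$.]

Auditory-only Classification:

[Figure: the concept model is driven by the auditory branch alone, $y_n^{con} = \hat{x}_n^{aud}$, and outputs $\hat{x}_n^{con}$.]

Visual-only Classification:

[Figure: the concept model is driven by the visual branch alone, $y_n^{con} = \hat{x}_n^{vis}$, and outputs $\hat{x}_n^{con}$.]

Audio-Visual Learning:

[Figure: full cascade with both the visual and auditory branches feeding $\hat{\phi}^{con}$.]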


Robot Demonstration

Visual Objects: Words:

cat

dog

red ball

green ball

ball

animal

Concepts:

cat

dog

red ball

green ball

ball

animal


Robotic Controller

[Figure: finite-state controller. States: 1-Explore, 2-Object Found, 3-Learn Name, 4-Play 1, 5-Play 2, 6-Interact, 7-Search. Transition labels (event / action) legible in the extracted text: visible object; object lost; object far / approach object; object near / learn object; speech / repeat & learn; unknown speech / beep; speech: "illy" / turn toward sound; hear known object / search for object; found desired object / say name; found nothing or wrong object / explore; silence (timeout); timeout expired / beep; --- / pick up object; --- / run into object. Which states each transition connects is not recoverable.]
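A controller of this kind is commonly written as a table that maps (state, event) pairs to an action and a next state. The sketch below shows that pattern using state names and event / action labels from the slide. The endpoints of each edge are guesses made for illustration, since the diagram's arrows do not survive in the text, and only four of the seven states are included.

```python
EXPLORE = "1-Explore"
OBJECT_FOUND = "2-Object Found"
LEARN_NAME = "3-Learn Name"
SEARCH = "7-Search"

# (state, event) -> (action, next state).
# ASSUMPTION: the labels come from the slide, but the source and target
# state of every edge below are illustrative guesses.
TRANSITIONS = {
    (EXPLORE, "visible object"): (None, OBJECT_FOUND),
    (EXPLORE, "hear known object"): ("search for object", SEARCH),
    (OBJECT_FOUND, "object far"): ("approach object", OBJECT_FOUND),
    (OBJECT_FOUND, "object near"): ("learn object", LEARN_NAME),
    (OBJECT_FOUND, "object lost"): (None, EXPLORE),
    (LEARN_NAME, "speech"): ("repeat & learn", LEARN_NAME),
    (LEARN_NAME, "unknown speech"): ("beep", LEARN_NAME),
    (LEARN_NAME, "silence (timeout)"): (None, EXPLORE),
    (SEARCH, "found desired object"): ("say name", OBJECT_FOUND),
    (SEARCH, "found nothing or wrong object"): ("explore", EXPLORE),
}


def step(state, event):
    """Return (action, next_state); unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), (None, state))


def run(events, state=EXPLORE):
    """Feed a sequence of events to the controller and collect its actions."""
    actions = []
    for event in events:
        action, state = step(state, event)
        if action is not None:
            actions.append(action)
    return state, actions


final_state, actions = run(["hear known object", "found desired object"])
```

Keeping the transitions in a single table makes it easy to add the remaining states (4-Play 1, 5-Play 2, 6-Interact) once their edges are known.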


Video


Conclusion

1. Built a platform upon which to conduct language acquisition research.

2. Proposed a general model of semantic concept learning.

3. Successfully implemented this model in a real robot using a cascade of HMMs.


Future Directions/Interests

1. Grow/shrink models (combine/split states) as appropriate (e.g., use minimum description length (MDL) or related measures).

2. Apply associative learning to spatial/action information, other modalities.

3. Study sentence comprehension (e.g., combine syntactic and semantic information).

4. Incorporate reinforcement into current unsupervised training scheme.


Questions?


Robot Visual HMM

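Initialization: K-means on labeled data

Online Learning with RMLE

Features:

Color histogram

Moment

Height/width ratio

Fixed number of classes

[Figure: object HMM $\hat{\phi}^{obj}$ with states S1, S2, S3; labels $y_n^{obj}$, $\tilde{y}_n^{obj}$, and state estimate $\hat{x}_n^{obj}$.]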


Robot Auditory Model

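3 State HMM

States: Silence, Voiced, Unvoiced

Features: 3 log-area ratios + log-energy + voicing

Online update with RMLE possible

“Word Recognizer”

Features: histogram of audio states + word length

Can distinguish some words

Initial training offline with RMLE

[Figure: audio observations $\{y_n^{aud}\}$ feed $\hat{\phi}^{aud}$, giving state estimates $\{\hat{x}_n^{aud}\}$; word-level features $\tilde{y}_{n'}^{word}$ feed the word HMM $\hat{\phi}^{word}$ (states S1, S2, S3), giving $\hat{x}_{n'}^{word}$.]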
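The word recognizer's features (histogram of audio states plus word length) can be sketched as follows. This assumes the histogram is normalized by the number of frames and that word length is measured in frames; the slide does not state either choice. The function name is hypothetical.

```python
AUDIO_STATES = ("silence", "voiced", "unvoiced")


def word_features(state_sequence):
    """Return [fraction silence, fraction voiced, fraction unvoiced, length].

    state_sequence: decoded audio HMM states for one word, one per frame.
    ASSUMPTION: normalized histogram, length counted in frames.
    """
    length = len(state_sequence)
    if length == 0:
        raise ValueError("state_sequence must not be empty")
    counts = {name: 0 for name in AUDIO_STATES}
    for state in state_sequence:
        counts[state] += 1
    histogram = [counts[name] / length for name in AUDIO_STATES]
    return histogram + [length]


features = word_features(["silence", "voiced", "voiced", "unvoiced"])
```

Because the histogram discards the order of the states, words with similar proportions of voiced and unvoiced frames look alike, which fits the slide's remark that only some words can be distinguished.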


Robot Concept HMM

6 State Discrete HMM

Observations: states of Visual and Auditory models

Initialized offline

Online update with RMLE

[Figure: HMM state diagram (S1, S2, S3).]
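Since the concept model is a discrete HMM whose observation is a pair of states, one simple way to feed it is to map each (visual state, auditory state) pair to a single symbol index. The sketch below shows that encoding. It is an assumption about representation, not the encoding used on the robot, and the state counts in the example are hypothetical.

```python
def encode_observation(visual_state, auditory_state, num_auditory_states):
    """Map a (visual, auditory) state pair to one discrete symbol index."""
    if not 0 <= auditory_state < num_auditory_states:
        raise ValueError("auditory_state out of range")
    if visual_state < 0:
        raise ValueError("visual_state must be non-negative")
    return visual_state * num_auditory_states + auditory_state


def decode_observation(symbol, num_auditory_states):
    """Inverse of encode_observation: return (visual_state, auditory_state)."""
    return divmod(symbol, num_auditory_states)


# Hypothetical sizes: 4 visual states and 6 auditory (word) states
# give 24 possible concept observations, numbered 0..23.
NUM_VISUAL, NUM_AUDITORY = 4, 6
symbol = encode_observation(2, 5, NUM_AUDITORY)
```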


RMLE Convergence (1)

Conditions for RMLE Convergence:

The transition probability matrix $A(\phi^*)$ is aperiodic and irreducible.

The mapping $\phi \to A(\phi)$ is twice differentiable with bounded first and second derivatives and Lipschitz continuous second derivative. The mapping $\phi \to b(y_k; \phi)$ is three times differentiable, and the function $b(y_k; \theta)$ is continuous on $\mathbb{R}$ for every $\theta \in \Theta$. Alternately, for $y_k$ drawn from a finite alphabet, the mapping $\phi \to b(y_k; \phi)$ is twice differentiable with bounded first and second derivatives and Lipschitz continuous second derivative.


RMLE Convergence (2)

Conditions for RMLE Convergence:

Under $P_{\phi^*}$, the extended Markov chain

$$\{X_n, Y_n, u_n(\phi), w_n(\phi)\}$$

is geometrically ergodic (see LeGland and Mevel, '96).

Because of this geometric ergodicity, the initial values of $u_0(\phi)$ and $w_0(\phi)$ are forgotten exponentially fast.
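The forgetting property can be observed numerically for the prediction filter u_n. The sketch below runs the same observations through two filters started from very different initial values and records the L1 distance between them at each step. It is an illustration on a hypothetical model with strictly positive transition and emission probabilities, not a check of the convergence conditions, and it does not cover the gradient filter w_n.

```python
def prediction_filter_step(u, A, B, y):
    """One update u_k -> u_{k+1} of the prediction filter for observation y."""
    num_states = len(u)
    b = [B[i][y] for i in range(num_states)]
    norm = sum(b[i] * u[i] for i in range(num_states))
    return [
        sum(A[i][j] * b[i] * u[i] for i in range(num_states)) / norm
        for j in range(num_states)
    ]


def filter_gap(u_a, u_b, A, B, observations):
    """L1 distance between two filters after each observation."""
    gaps = []
    for y in observations:
        u_a = prediction_filter_step(u_a, A, B, y)
        u_b = prediction_filter_step(u_b, A, B, y)
        gaps.append(sum(abs(p - q) for p, q in zip(u_a, u_b)))
    return gaps


# Hypothetical 3-state, 2-symbol model and a fixed observation pattern.
A = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
B = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
OBSERVATIONS = [0, 1, 1, 0, 1, 0, 0, 1] * 5

GAPS = filter_gap([0.98, 0.01, 0.01], [0.01, 0.01, 0.98], A, B, OBSERVATIONS)
```

Printing `GAPS` shows the distance shrinking toward zero, so after a short burn-in the filter value no longer depends on how it was initialized.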


MPEG Video

