Post on 27-Mar-2015

Audio Workgroup

Neuro-inspired Speech Recognition

Group Members

Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermansky, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney

Audio Projects

Localization

Speech Recognition

More ASR

Shihab is Running

See http://www.hardrock100.com/index.asp

Shihab arriving in Telluride in 2004

(should happen around 4PM today)

Localization Effort

Interaural Time Difference (ITD)

Estimated from time difference between spikes of two matching channels.

Interaural Intensity Difference (IID)

Difference of spike counts between two cochleae.

Azimuth: Combination of ITD and IID
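A minimal sketch of the two cues, assuming spike times (in seconds) are available for one pair of matching channels and spike counts for each cochlea. The function names, the 1 ms pairing window, and the use of a median are illustrative assumptions, not the workgroup's implementation. The slides do not say how ITD and IID were combined into azimuth, so the sketch stops at the two cues.

```python
import numpy as np

def estimate_itd(left_spikes, right_spikes, max_itd=1e-3):
    """Median spike-time difference (right minus left), in seconds.

    Only spike pairs closer than max_itd (an assumed window) are treated
    as belonging to the same acoustic event.
    """
    left = np.asarray(left_spikes, dtype=float)
    right = np.asarray(right_spikes, dtype=float)
    # All pairwise differences between the two matching channels.
    diffs = (right[None, :] - left[:, None]).ravel()
    diffs = diffs[np.abs(diffs) <= max_itd]
    return float(np.median(diffs)) if diffs.size else 0.0

def estimate_iid(left_counts, right_counts):
    """Difference of total spike counts between the two cochleae."""
    return float(np.sum(right_counts) - np.sum(left_counts))

# Synthetic example: the right channel lags the left by 0.3 ms.
left = np.arange(10) * 0.01
right = left + 0.0003
itd = estimate_itd(left, right)
iid = estimate_iid([5, 5], [8, 7])
print(f"ITD = {itd * 1e6:.0f} us, IID = {iid:.0f} spikes")
```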

[Figures: ITD estimation from pure tones; azimuth estimation from music. Setup labels: speaker, microphones]

FPAA/Mote – Word Recognition

Field Programmable Analog Array (FPAA)-based analog cochlea (non-spiking) with envelope detection.

MOTE-based pattern matching using matched filtering with “receptive fields”.

Robosapien listens to the spoken commands…
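A minimal sketch of matched filtering against stored "receptive fields", assuming the cochlea output is a matrix of channel envelopes (channels x time frames) and each word has one template of the same channel count. The names `match_score` and `recognize`, the normalized-correlation score, and the toy templates are assumptions for illustration only.

```python
import numpy as np

def match_score(envelopes, receptive_field):
    """Peak normalized correlation of a template slid along the time axis."""
    env = np.asarray(envelopes, dtype=float)
    rf = np.asarray(receptive_field, dtype=float)
    n_frames = rf.shape[1]
    rf_norm = np.linalg.norm(rf)
    best = 0.0
    for start in range(env.shape[1] - n_frames + 1):
        window = env[:, start:start + n_frames]
        denom = np.linalg.norm(window) * rf_norm
        if denom > 0:
            best = max(best, float(np.sum(window * rf)) / denom)
    return best

def recognize(envelopes, receptive_fields):
    """Return the best-matching word and the score of every template."""
    scores = {w: match_score(envelopes, rf) for w, rf in receptive_fields.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-channel templates for two commands.
fields = {
    "go": np.array([[1, 0], [0, 1]]),
    "stop": np.array([[0, 1], [1, 0]]),
}
# Input containing the "go" pattern at frames 2 and 3.
utterance = np.array([[0, 0, 1, 0, 0],
                      [0, 0, 0, 1, 0]])
word, scores = recognize(utterance, fields)
print(word, scores)
```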

Status:

FPAA – (we are using a new FPAA) 2nd-order sections synthesized, but a full auditory filter bank is not yet up.

MOTE – real-time communication with Matlab and sampling operational.

Relational Network (Simple)

[Figure: relational network diagram with nodes X, Y, and Z and blocks labeled M]

Patches of neurons

Each measure one quantity

Bidirectional relations for feedback/feedforward

Thanks to Rodney Douglas
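A minimal sketch of the idea, assuming three patches X, Y, and Z that each encode one small integer as a one-hot activity vector. The relation Z = X + Y is an assumption chosen for illustration (the slides do not state which relation was used), and the tensor `R` and helper names are hypothetical. The same relation is used in both directions: forward to get Z from X and Y, and backward to recover X from Y and Z.

```python
import numpy as np

N = 5  # assumed number of values (0..4) encoded by patches X and Y

# Relation tensor: R[x, y, z] = 1 when the assumed relation z = x + y holds.
R = np.zeros((N, N, 2 * N - 1))
for x in range(N):
    for y in range(N):
        R[x, y, x + y] = 1.0

def one_hot(value, size):
    """Activity of a patch in which one neuron (the encoded value) fires."""
    activity = np.zeros(size)
    activity[value] = 1.0
    return activity

def infer_z(x_act, y_act):
    """Feedforward direction: activity of Z implied by X and Y."""
    return np.einsum("x,y,xyz->z", x_act, y_act, R)

def infer_x(y_act, z_act):
    """Feedback direction: activity of X implied by Y and Z."""
    return np.einsum("y,z,xyz->x", y_act, z_act, R)

z = infer_z(one_hot(2, N), one_hot(1, N))
x = infer_x(one_hot(1, N), one_hot(3, 2 * N - 1))
print(int(np.argmax(z)), int(np.argmax(x)))
```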

Relational Network (example)

[Figure labels: input here; relational specification; relational feedback]

ASR Relational Network

[Diagram blocks: Cochlea, Delay, Phone Recognizer (two), Word Recognizer]

A patch of neurons (one-of-N output)

Note: We don’t know how to represent delays

Bidirectional links enforce phoneme/word constraints

Relational Advantages

Not an HMM: HMMs are great, but…

Incorporate other knowledge: bottom-up perception, top-down word hypothesis

Hallucinate: based on experience

Hear “ba..” and know that bad, bat, bar, bass, band follow
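The "ba.." example can be sketched as a prefix lookup that turns a partial percept into a set of word hypotheses. The lexicon below is hypothetical, and matching on spelling rather than phonemes is a simplifying assumption.

```python
# Hypothetical lexicon; a real system would store phoneme sequences.
LEXICON = ["bad", "bat", "bar", "bass", "band", "dog", "cat"]

def word_hypotheses(prefix, lexicon=LEXICON):
    """Return every known word that is consistent with what was heard so far."""
    return [word for word in lexicon if word.startswith(prefix)]

print(word_hypotheses("ba"))  # ['bad', 'bat', 'bar', 'bass', 'band']
```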

Silicon Cochlea

[Figure: silicon cochlea stages: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells (van Schaik, Liu, 2004)]

Silicon Frequency Response

[Figure: tone ramps into two cochleas]

Cochlear Rate Profiles

[Figure: left cochlea and right cochlea; y-axis: spikes per utterance]

Learning Algorithms

Statistical: SAS (pick best channels for decision); least squares (for software demo)

Liquid State Machine: take input to high dimensions with spiking net

Spike Timing Dependent Plasticity (STDP): Giacomo/Srinjoy chip; Brader/Fusi
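A minimal sketch of the least-squares option, assuming each utterance is summarized by a vector of per-channel firing rates and labeled +1 or -1. The data, function names, and the added bias column are illustrative assumptions; the slides do not describe how the software demo was set up.

```python
import numpy as np

def fit_least_squares(rates, labels):
    """Fit linear weights (plus a bias term) that map channel rates to labels."""
    X = np.hstack([rates, np.ones((rates.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return weights

def predict(rates, weights):
    """Classify by the sign of the linear readout."""
    X = np.hstack([rates, np.ones((rates.shape[0], 1))])
    return np.sign(X @ weights)

# Made-up firing rates on two channels for two sound classes.
rates = np.array([[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 10.0]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
weights = fit_least_squares(rates, labels)
print(predict(rates, weights))
```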

[Figure: LSM spiking output for Vowel 1 and Vowel 2]

Learning Chip Architecture

[Figure labels: Cochlea Chip (immediate cochlea, delayed cochlea); Learning Chip neurons (Phoneme 1, Phoneme 2); plastic synapses; non-plastic synapses (excit., inhib.); Relational Network; binary synaptic weights]

Tone Results

Tone recognition: spike input from silicon cochlea

Training: two tones, duplicated input, positive and negative examples

Testing

Phoneme Results

Phoneme recognition: spike input from silicon cochlea

Training: two phonemes, duplicated inputs, positive and negative examples

Testing

Behind the Curtain

Hardware Overview

[Diagram labels: Cochlea (two); PCI-AER (for remapping) (two); Learning: Phoneme; Learning: Word; Shih-Chii Liu; Giacomo Indiveri; Implemented in MATLAB]

Infrastructure Difficulties

Remapper: Because of problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow).

Power: Unpredictable problems caused by supply-voltage variation of as much as 1 V.

Sharing chips: The learning chip had to be shared with two other workgroups.

PC replacement
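A minimal sketch of what remapping AER data involves, assuming each event is a (timestamp in microseconds, source address) pair and the mapping is a plain lookup table. The event layout, the table contents, and the function name are assumptions for illustration; this is not the PCI-AER interface or the Matlab code that was used.

```python
def remap_events(events, address_map):
    """Translate cochlea addresses to learning-chip addresses.

    Events whose source address has no entry in the table are dropped.
    """
    return [(t, address_map[addr]) for t, addr in events if addr in address_map]

# Hypothetical table: cochlea channels 3 and 7 drive learning-chip inputs 0 and 1.
address_map = {3: 0, 7: 1}
events = [(0, 3), (10, 7), (20, 99)]
remapped = remap_events(events, address_map)
print(remapped)  # [(0, 0), (10, 1)]
```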

Impedance Difficulties

Cochlear firing rates

Cochlea: 6M spikes/second (30k channels, 200 spikes/second each)

Silicon Cochlea: 30k spikes/second (30 channels, 1k spikes/second each)

Learning Chip: 3k spikes/second (30 channels, 100 spikes/second each)

Dynamic range
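The totals above follow from channels times per-channel rate, and the silicon cochlea produces ten times more spikes per channel than the learning chip accepts. A minimal sketch of that arithmetic, with one possible way to close the gap (keeping every tenth spike). The thinning step is an assumption for illustration; the slides do not say how the mismatch was handled.

```python
def total_rate(channels, rate_per_channel):
    """Aggregate spike rate in spikes/second."""
    return channels * rate_per_channel

def thin_spikes(spike_times, keep_every):
    """Lower the rate by keeping only every keep_every-th spike."""
    return spike_times[::keep_every]

cochlea = total_rate(30_000, 200)
silicon = total_rate(30, 1_000)
learning = total_rate(30, 100)
ratio = silicon // learning

one_second = list(range(1_000))  # 1k spikes on one silicon-cochlea channel
thinned = thin_spikes(one_second, ratio)
print(cochlea, silicon, learning, ratio, len(thinned))
```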

Desired Results

[Figure: responses of the /A/ phoneme patch, /I/ phoneme patch, AI word patch, and IA word patch to the phoneme input A A A I, without and with relational feedback]

Simulation

Simulation 2

Simulation 3

Great Job!

Student Members

Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor

Silicon Cochlea

[Figure: mean firing rates for two different vowel inputs (/a/, /i/); plot title: "Mean firing rates in response to two tones"; x-axis: channel number; y-axis: mean firing rate]

[Figure: raster plot for two different tone inputs (200 Hz, 1000 Hz); x-axis: time in microseconds; y-axis: channel number]
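A minimal sketch of how per-channel mean firing rates like those plotted here could be computed from recorded events, assuming each event carries a timestamp in microseconds and a channel number. The function name and the event layout are assumptions for illustration.

```python
import numpy as np

def mean_firing_rates(timestamps_us, channels, n_channels):
    """Spikes per second on each channel over the recording duration."""
    timestamps_us = np.asarray(timestamps_us, dtype=float)
    channels = np.asarray(channels, dtype=int)
    duration_s = (timestamps_us.max() - timestamps_us.min()) / 1e6
    if duration_s <= 0:
        raise ValueError("recording must span a positive duration")
    counts = np.bincount(channels, minlength=n_channels)
    return counts / duration_s

# Four events over one second: three on channel 0, one on channel 1.
spike_rates = mean_firing_rates([0, 250_000, 500_000, 1_000_000], [0, 0, 1, 0], 3)
print(spike_rates)
```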

Word Recognizer

Four example raster plots (silence, A_, A_ with relational, AI)

Software Simulation
