
Neuromorphic Audition Group Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz,...

Date post: 27-Mar-2015
Neuromorphic Audition Group Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky, David Anderson, Malcolm Slaney, Andrew Schwartz, Tara Julia Hamilton, John Harris, Nima Mesgarani, Shihab Shamma
Transcript
Page 1: Neuromorphic Audition Group Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky,

Neuromorphic Audition Group

Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky, David Anderson, Malcolm Slaney, Andrew Schwartz, Tara Julia Hamilton, John Harris, Nima Mesgarani, Shihab Shamma

Page 2

Outline

• Field Programmable Analog Array (Dave)
• Speaker Identification (Malcolm, Nima and Max)
• Speech Recognition (Hynek, Misha, Jordon)
• STRF Noise Suppression (Nima, Shihab, Dave)
• Reconstructions from STRF/Modulation Detectors (Nima, Shihab)
• Social sonar demonstration using silicon cochlea and RoboQuad toy (Toby and Malcolm)
• Cochlear ITD Detector (Andrew, Malcolm, Shih-Chii)
• Cochlear Periodicity Detector (Teddy, John, Malcolm, Shih-Chii)

Page 3

FPAA

Page 4

Speaker ID

[Diagram: four parallel Features → Model paths feeding a Winner-Take-All stage. Feature labels: MFCC, STRF. Model labels: GMM, ART.]

Page 5

Speaker ID - STRF

Page 6

Speaker ID – ART

Malcolm Slaney – Heather Ames – Max Versace

Supervised Fuzzy Adaptive Resonance Theory neural network (ARTMAP) uses top-down expectations to learn categories

First test: three synthesized vowels (large clusters) spoken by three speakers (different colors) represented in 2D feature space.

Page 7

Speaker ID - ART Results

[Diagram: speaker-ID processing chain with ARTMAP. Block labels:]

• Speech input (.wav)
• Feature extraction: 12 MFCC + E, first and second derivatives
• Vowel extraction: ½-wave rectify, lowpass filter, choice of high-energy time slices
• Features: feature vectors for “vowel” data
• Utterance-independent transformation (TBD), giving transformed features
• ARTMAP training and testing; acoustic model of speaker identity
• Predicted speaker identity
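The vowel-extraction labels (½-wave rectify, lowpass filter, choice of high-energy time slices) can be illustrated with a minimal sketch. This is not the group's implementation: the one-pole filter, the frame length, the number of frames kept, and all function names are assumptions made for illustration only.

```python
def half_wave_rectify(signal):
    """Keep positive samples and set negative samples to zero."""
    return [max(0.0, s) for s in signal]


def lowpass(signal, alpha):
    """One-pole lowpass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_energy_slices(signal, frame_len, keep, alpha):
    """Return the indices of the `keep` frames with the largest smoothed envelope."""
    envelope = lowpass(half_wave_rectify(signal), alpha)
    n_frames = len(envelope) // frame_len
    energies = [sum(envelope[i * frame_len:(i + 1) * frame_len])
                for i in range(n_frames)]
    ranked = sorted(range(n_frames), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:keep])


# Toy signal (hypothetical): frames 0 and 2 are quiet, frames 1 and 3 are loud.
signal = [0.01, -0.01, 0.01, -0.01,
          1.0, -1.0, 1.0, -1.0,
          0.01, -0.01, 0.01, -0.01,
          1.0, -1.0, 1.0, -1.0]
selected = high_energy_slices(signal, frame_len=4, keep=2, alpha=0.5)
```

On this toy input the two loud frames are the ones selected; in a real system the selected slices would then be passed on as the “vowel” feature vectors.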

50% correct after 100 cross-validations (number of ARTMAP instances run) on a 10-speaker identification task

Continued work:
1. Improved vowel extraction
2. Utterance-independent transformation of feature space

Why we care? Top-down, online

Page 8

Speaker ID - Results

Test | % Correct | % Correct in 5 dB noise
MFCC (Baseline) | 81.3% | 81.0%
STRF | 79.8%
ART | ~60%

Very preliminary work! We are comparing against a technology (MFCC+GMM) that has been perfected over decades.

Page 9

ASR - Phoneme Posteriors

Page 10

ASR - Combining Information

[Diagram labels: Training, Context, X, C]

Q(X, C) = Pr{Correct | X, C}

Machines: P(word|sound) P(word|context)

Humans: [1-P(word|sound)] [1-P(word|context)]

Maximize

Page 11

Inverse model: from neural responses to sound


Page 12

Reconstruction of speech in white noise

• Reconstructed speech is “cleaner” than the original noisy speech

[Figure panels: Original Spectrograms; Reconstructed Spectrograms]

Page 13

Psychoacoustically-motivated Speech Enhancement

• Perceptual loudness: L = (b*e(t))^a

• By mapping loudness using the same type of function, noise can be decreased
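As a rough illustration of the power-law mapping L = (b*e(t))^a, the sketch below applies it to two envelope values. The slide gives no values for a or b, so the numbers here (b = 1, an expansive exponent a = 2, and the envelope levels) are assumptions chosen only to show how such a mapping can widen the gap between a strong component and a weak, noise-like one.

```python
def loudness(e, a, b=1.0):
    """Power-law mapping L = (b * e) ** a for a non-negative envelope value e."""
    return (b * e) ** a


# Hypothetical envelope levels for a speech-dominated and a noise-dominated sample.
speech_env = 1.0
noise_env = 0.1

ratio_before = speech_env / noise_env
ratio_after = loudness(speech_env, a=2.0) / loudness(noise_env, a=2.0)
```

With an exponent above 1 the ratio between the two levels grows; with an exponent below 1 the same function compresses it instead, so the choice of a decides the direction of the effect.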

• Results from STRF processing


Page 14

Noise suppression using inverse model

• Train G-filters on reconstructing clean stimuli from corresponding noisy responses. Apply the trained filters to new noisy responses
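The train-then-apply idea can be sketched in a heavily simplified form. The slide does not say how the G-filters are parameterised or fitted; the example below assumes a single least-squares gain, additive Gaussian noise, and synthetic data, all of which are illustrative assumptions rather than the method used in the project.

```python
import random


def train_gain(noisy, clean, ridge=1e-9):
    """Least-squares gain g minimising sum((c - g * r) ** 2) over training pairs."""
    num = sum(r * c for r, c in zip(noisy, clean))
    den = sum(r * r for r in noisy) + ridge
    return num / den


def apply_gain(g, noisy):
    """Apply a trained gain to new noisy responses."""
    return [g * r for r in noisy]


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


rng = random.Random(0)


def make_pair(n):
    """Synthetic clean signal and a noisy version of it (assumed additive noise)."""
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noisy = [c + rng.gauss(0.0, 1.0) for c in clean]
    return noisy, clean


# Train on one set of noisy/clean pairs, then apply to new noisy data.
train_noisy, train_clean = make_pair(2000)
g = train_gain(train_noisy, train_clean)

test_noisy, test_clean = make_pair(2000)
estimate = apply_gain(g, test_noisy)
```

Comparing `mse(estimate, test_clean)` with `mse(test_noisy, test_clean)` shows whether the trained filter generalises to responses it was not trained on. A realistic inverse filter would have many spectro-temporal taps rather than one gain.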

[Figure panels: Cortical decomposition; “Trained” inverse filters]

Page 15

Noise Suppression for White, Jet and City Noise


Page 16

RS Media

Linux version 2.4.18-rmk5-mx1ads-p3 ([email protected]) (gcc version 2.95.3 20010315 (release)) #517 Fri Feb 16 11:40:45 HKT 2007
Processor: ARM/CIRRUS Arm920Tsid(wb) revision 0
Architecture: Motorola MX1ADS
On node 0 totalpages: 8192
zone(0): 8192 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=fe01 ro mem=32M
Console: colour dummy device 80x30
Calibrating delay loop... 98.50 BogoMIPS
Memory: 32MB = 32MB total
Memory: 30816KB available (1023K code, 316K data, 60K init)
Dentry-cache hash table entries: 4096 (order: 3, 32768 bytes)
Inode-cache hash table entries: 2048 (order: 2, 16384 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
ttySA0 at I/O 0x206000 (irq = 29) is a MX1ADS
ttySA1 at I/O 0x207000 (irq = 23) is a MX1ADS
pty: 256 Unix98 ptys configured
DMA Initializing

Page 17

Cochlear - ITD Detector

[Figure: ITD detector output; axes: Time, Position]

Page 18

Cochlear - JAER Demo


Page 19

Cochlear - Periodicity detector

[Figure panels: Response to “hiss”; Response to “coo”]

Page 20
Page 21

When both channels are conditionally independent:
• pCpA – probability of correct recognition in both channels
• pC(1-pA) – correct in ch1 but not in ch2
• pA(1-pC) – correct in ch2 but not in ch1

These three cases are mutually exclusive, thus the probability of correct recognition is

p = pCpA + pC(1-pA) + pA(1-pC) = pC+pA-pCpA

Probability of error

e = (1-p) = 1-pC-pA+pCpA = (1-pC)(1-pA) = eCeA
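The two identities above can be checked numerically. The sketch below is only a sanity check of the algebra; the values chosen for pC and pA are hypothetical and do not come from the slides.

```python
def combined_correct(p_c, p_a):
    """Sum of the three mutually exclusive cases in which recognition is correct."""
    return p_c * p_a + p_c * (1 - p_a) + p_a * (1 - p_c)


def combined_error(p_c, p_a):
    """Product of the per-channel error probabilities, eC * eA."""
    return (1 - p_c) * (1 - p_a)


# Hypothetical per-channel probabilities of correct recognition.
p_c, p_a = 0.5, 0.6

p = combined_correct(p_c, p_a)   # equals pC + pA - pC*pA
e = combined_error(p_c, p_a)     # equals 1 - p
```

For these values the combined probability of correct recognition is higher than that of either channel alone, which is the point of multiplying the errors rather than the probabilities.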

[Diagram: stimulus → two parallel channels, context (top-down, pC) and acoustic (bottom-up, pA) → decision]

