Center for Auditory and Acoustic Research
Representation of Timbre in the Auditory System
Shihab A. Shamma
Center for Auditory and Acoustic Research, Institute for Systems Research
Electrical and Computer Engineering, University of Maryland, College Park
Musical Spectrograms
[Figure: spectrograms of a violin (vibrato) and a piano; time axis in ms.]
Anatomy of the Auditory System
[Figure: the auditory pathway from sound through the Early Auditory Stages, Midbrain Nuclei, Collicular Stages, and Central Auditory Stages. Labeled structures: AVCN, PVCN, DCN, TB, LL, NLL, IC, MGB.]
Attributes of Complex Sounds
Location, Timbre, Pitch
[Figure labels: ILD, ITD, spectral cues; spatial maps; the auditory spectrum; harmonic templates; computing pitch.]
Early Auditory Processing Stages
Analysis: cochlear filters
Transduction: hair cells
Reduction: lateral inhibition
[Figure: signal path from the eardrum through the cochlea (basilar membrane filters) and the hair cell stages to the lateral inhibitory network, producing the Auditory Spectrogram; axes: time (ms), log f.]
Auditory-Nerve Response Patterns to a Single Tone
[Figure: response patterns over a 60 ms time axis, vertical ticks from 250 to 4000, with the average response.]
Auditory-Nerve Response Patterns to Two-Tone Stimulus
[Figure: response patterns over a 60 ms time axis, vertical ticks from 250 to 4000, with the average response.]
[Figure: two panels of responses to the utterance /ritaway/; time axis in ms (to 500), vertical ticks from 250 to 4000.]
Cochlear Analysis, Auditory-Nerve Responses, Lateral Inhibition
[Figure in three stages. Cochlear Analysis: sound and basilar membrane vibrations, panels A, B, C; CF (kHz) against time (msec). Auditory-Nerve Responses: hair cells along the tonotopic axis and auditory-nerve fibers, panels A', B', C'; Characteristic Frequency Axis (CF), 250 to 4000 Hz, against time (msec). Lateral Inhibition: estimated stimulus spectrum of a harmonic series.]
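The sharpening role of lateral inhibition can be illustrated with a small sketch. It assumes a simple centre-surround rule (each channel minus the mean of its two neighbours, then half-wave rectified) applied to a hypothetical broad excitation pattern. The slides do not give the network equations, so the rule, the function names, and the Gaussian test profile are illustrative assumptions only.

```python
import math

def lateral_inhibition(profile, surround=0.5):
    """Subtract surround * (left + right neighbour) from each channel,
    then half-wave rectify. Edge channels reuse their single neighbour."""
    n = len(profile)
    out = []
    for i, centre in enumerate(profile):
        left = profile[i - 1] if i > 0 else profile[i + 1]
        right = profile[i + 1] if i < n - 1 else profile[i - 1]
        out.append(max(centre - surround * (left + right), 0.0))
    return out

def width_at_half_max(profile):
    """Number of channels at or above half of the peak value."""
    peak = max(profile)
    return sum(1 for v in profile if v >= 0.5 * peak)

# Hypothetical broad excitation pattern along the tonotopic axis:
# a Gaussian bump centred on channel 20 of 41 (assumption, not data).
broad = [math.exp(-0.5 * ((i - 20) / 3.0) ** 2) for i in range(41)]
sharp = lateral_inhibition(broad)

print("channels above half maximum before:", width_at_half_max(broad))
print("channels above half maximum after: ", width_at_half_max(sharp))
```

The peak stays on the same channel while the pattern around it narrows, which is the qualitative effect the reduction stage is meant to convey.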
[Figure: auditory spectrogram panels labeled Normal, Down-Shift, Compress, and Dilate; time axis in ms (100 to 1000), vertical ticks from 125 to 2000.]
Awake Set-up
Awake ferret with head restraint in cylindrical holder
Spike Sorting
The raw neural trace typically contained multiple distinct waveforms (typically representing 1-4 neurons), which were sorted off-line.
[Figure: raw trace (0 to 120 ms) and sorted responses for recording 10e, Unit 1 and Unit 2; time axis in ms (50 to 150), vertical ticks from 2000 to 16000.]
Waveforms were sorted in a semi-automatic procedure. First, a PCA-based algorithm was used to pre-sort the spikes. Then a MATLAB-based program was used to refine the classification.
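A minimal sketch of what a PCA-based pre-sort could look like follows. It is not the lab's procedure: it assumes spike waveforms are already extracted and aligned, projects them onto the leading principal component (found by power iteration), and splits the scores with a one-dimensional two-means step. All function names and the synthetic two-unit data are hypothetical.

```python
import math
import random

def leading_component(rows, iters=100, seed=0):
    """Leading principal axis of the waveforms (power iteration on the
    centred data) and the projection of each waveform onto it."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
        w = [sum(s * c[j] for s, c in zip(scores, centred)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

def two_means_1d(values, iters=50):
    """Split scalar scores into two clusters (labels 0 and 1)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        zeros = [x for x, lab in zip(values, labels) if lab == 0]
        ones = [x for x, lab in zip(values, labels) if lab == 1]
        if not zeros or not ones:
            break
        lo, hi = sum(zeros) / len(zeros), sum(ones) / len(ones)
    return labels

def synthetic_spikes(n_per_unit=30, length=20, noise=0.05, seed=1):
    """Two made-up spike shapes plus Gaussian noise (illustration only)."""
    rng = random.Random(seed)
    unit_a = [-1.0 * math.exp(-(((j - 8) / 2.0) ** 2)) for j in range(length)]
    unit_b = [-2.0 * math.exp(-(((j - 10) / 3.0) ** 2)) for j in range(length)]
    return [[x + rng.gauss(0.0, noise) for x in template]
            for template in (unit_a, unit_b) for _ in range(n_per_unit)]

spikes = synthetic_spikes()
_, scores = leading_component(spikes)
labels = two_means_1d(scores)
print(labels)
```

In practice such a pre-sort would only seed the manual refinement step described on the slide; overlapping units generally need more than one component.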
Three envelopes of modulation:
Slow (< 30 Hz)
Intermediate (< 500 Hz)
Fast (< 4 kHz)
[Figure: waveform and spectrogram of the utterance /come/ /home/ /right/ /away/; time axes in ms.]
Decomposing a Spectrogram into Dynamic Ripples
[Figure: a spectrogram (time in ms, vertical ticks from 125 to 2000) decomposed into moving ripples. A single ripple of amplitude ∆A is drawn along a logarithmic frequency axis (1 to 16 kHz) and drifts over 0 to 250 ms at ω = 4 Hz. Decomposition axes: rate (Hz), with ticks at -12, -4, 4, 12, and t (ms).]
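For concreteness, a moving ripple can be sketched as a sinusoidal envelope over time and log frequency. The parameterization below, S(t, x) = 1 + ∆A sin(2π(ωt + Ωx) + φ) with x in octaves above the lowest frequency, is a commonly used form that is assumed here; the slides show only the figure. The function names, sampling choices, and default depth are illustrative.

```python
import math

def ripple_envelope(t, x, rate_hz, scale_cyc_per_oct, depth=0.9, phase=0.0):
    """Envelope of a moving ripple at time t (s) and position x (octaves).
    rate_hz is the temporal rate (omega); scale_cyc_per_oct is the
    ripple density (Omega); depth plays the role of Delta-A."""
    arg = 2.0 * math.pi * (rate_hz * t + scale_cyc_per_oct * x) + phase
    return 1.0 + depth * math.sin(arg)

def ripple_spectrogram(rate_hz, scale_cyc_per_oct, duration_s=0.25,
                       frame_hz=1000, octaves=4, channels_per_octave=8,
                       depth=0.9):
    """Sample the envelope on a time-by-log-frequency grid.
    Returns (times, xs, spec) with spec[channel][frame]."""
    times = [n / frame_hz for n in range(int(round(duration_s * frame_hz)))]
    xs = [k / channels_per_octave for k in range(octaves * channels_per_octave)]
    spec = [[ripple_envelope(t, x, rate_hz, scale_cyc_per_oct, depth)
             for t in times] for x in xs]
    return times, xs, spec

# One period of a 4 Hz ripple at 0.4 cyc/oct over 4 octaves (1 to 16 kHz).
times, xs, spec = ripple_spectrogram(4.0, 0.4)
print(len(xs), "channels x", len(times), "frames")
```

With this sign choice a positive rate makes the pattern drift toward lower x over time (constant phase requires x to decrease as t increases); flipping the sign of the rate reverses the direction.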
Responses to Moving Ripples
[Figure: responses against time (ms) for each temporal frequency (Hz). Ripple frequency is Ω = 0.4 cyc/oct, ω = 4 to 32 Hz, 30 sweeps per ω, 55 dB.]
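One way to turn such responses into a point of the ripple transfer function is to fold the spikes over one ripple period and read off the magnitude and phase of the component at the ripple rate. The sketch below assumes that approach; the slides do not spell out the analysis, and the function name and the synthetic histogram are hypothetical.

```python
import cmath
import math

def first_harmonic(period_histogram):
    """Amplitude and phase (radians) of the fundamental component of a
    response folded over one ripple period. Needs at least 3 bins."""
    n = len(period_histogram)
    coeff = sum(count * cmath.exp(-2j * math.pi * k / n)
                for k, count in enumerate(period_histogram))
    coeff *= 2.0 / n
    return abs(coeff), cmath.phase(coeff)

# Synthetic 16-bin period histogram: mean rate 5, modulation 3,
# lagging the stimulus by a quarter of pi (made-up numbers).
bins = 16
histogram = [5.0 + 3.0 * math.cos(2.0 * math.pi * k / bins - math.pi / 4.0)
             for k in range(bins)]
amplitude, phase = first_harmonic(histogram)
print(round(amplitude, 6), round(phase, 6))
```

Repeating this for every tested ω at a fixed Ω (and for every Ω at a fixed ω) would fill in the magnitude and phase of the transfer function along those two cross-sections.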
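[Figure: (A) responses against time (ms), as a function of ω (Hz) at Ω = 0.8 cyc/oct and as a function of Ω (cyc/oct) at ω = 12 Hz. (B) The transfer function magnitude |TF(ω, Ω)|, plotted for ω from -ω to ω and Ω from 0, and the STRF(t, x) over time t (ms, 0 to T) and frequency x (0 to X), related through the Fourier transform F{ }.]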
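The relation between the two panels can be written as a two-dimensional Fourier transform pair. The sign and normalization conventions below are assumptions chosen for illustration; the slide shows only the symbols F{ }, TF(ω, Ω), and STRF(t, x).

```latex
\[
\begin{aligned}
TF(\omega, \Omega) &= \iint \mathrm{STRF}(t, x)\,
    e^{-j 2\pi (\omega t + \Omega x)}\, dt\, dx, \\
\mathrm{STRF}(t, x) &= \iint TF(\omega, \Omega)\,
    e^{+j 2\pi (\omega t + \Omega x)}\, d\omega\, d\Omega .
\end{aligned}
\]
```

Here t is time, x is position along the logarithmic frequency axis, ω is the ripple rate in Hz, and Ω is the ripple density in cyc/oct.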
Examples of Different STRF Shapes
[Figure: six STRF panels, each with vertical ticks from 0.125 to 4.]
Spectro-Temporal Response Fields
Multiscale Cortical Representation of a Spectrogram
[Figure: panels A and C of the multiscale representation; labeled axes: Frequency (ticks 250 to 8000) and Rate (Hz); further ticks from 0.25 to 8.]
Scale-Rate Decomposition
Reconstruction
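As a rough illustration of a scale-rate decomposition, the sketch below projects a spectrogram directly onto complex moving ripples and reports the magnitude at each (rate, scale) pair. This is a simplification assumed for the example: it stands in for the bank of spectro-temporal filters in a full cortical model, and the function names and test ripple are hypothetical.

```python
import cmath
import math

def ripple_component(spec, times, octaves, rate_hz, scale_cyc_per_oct):
    """Normalised magnitude of the projection of spec[channel][frame]
    onto one complex ripple exp(j*2*pi*(rate*t + scale*x))."""
    total = 0j
    for row, x in zip(spec, octaves):
        for value, t in zip(row, times):
            arg = 2.0 * math.pi * (rate_hz * t + scale_cyc_per_oct * x)
            total += value * cmath.exp(-1j * arg)
    return abs(total) / (len(times) * len(octaves))

def rate_scale_map(spec, times, octaves, rates, scales):
    """Magnitude at every (rate, scale) pair, as a dictionary."""
    return {(r, s): ripple_component(spec, times, octaves, r, s)
            for r in rates for s in scales}

# Test input: one ripple at 4 Hz and 1 cyc/oct, sampled over 1 s
# (32 frames) and 4 octaves (8 channels per octave).
times = [n / 32.0 for n in range(32)]
octaves = [k / 8.0 for k in range(32)]
spec = [[1.0 + 0.8 * math.sin(2.0 * math.pi * (4.0 * t + 1.0 * x))
         for t in times] for x in octaves]

pattern = rate_scale_map(spec, times, octaves,
                         rates=[-8, -4, 4, 8], scales=[0.5, 1.0, 2.0])
strongest = max(pattern, key=pattern.get)
print("strongest component (rate, scale):", strongest)
```

Signed rates separate the two drift directions: the same ripple mirrored in time would appear at rate -4 instead of +4. A reconstruction would sum the components back with their phases, which this magnitude-only sketch discards.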
MUSICAL TIMBRE
Musical Spectrograms
[Figure: spectrograms of a violin (vibrato) and a piano; time axis in ms.]
Patterns of Musical Timbre
[Figure: patterns for violin (vibrato), piano, oboe, and clarinet; horizontal axis: rate (Hz), negative (-32 to -1) and positive (1 to 32); vertical ticks from 0.25 to 8.]
Timbre Metric for Some Musical Instruments (TSVQ)
Timbre Metric for Musical Instruments
[Figure: two 12 x 12 matrices, exp1c-model-rs-mf and exp1c-subjects (Subjects 1-24), over the instruments Guitar, Harp, Violin Pizz., Violin Bowed, Bass, Synth A, Synth B, Oboe, Clarinet, Flute, Horn, Trumpet.]
Spectral cues
Temporal cues
Spectro-temporal cues
Mapping musical instruments
[Figure: spectrograms (Frequency (Hz) against Time (ms)) of Guitar, Trumpet, Trumpar, and an ACE Chord, with four corresponding patterns over negative and positive rates (1 to 16) and vertical ticks from 0.50 to 8.00.]
A Melody with the Trumpar
Speech Analysis & Assessment of Intelligibility
Three envelopes of modulation:
Slow (< 30 Hz)
Intermediate (< 500 Hz)
Fast (< 4 kHz)
[Figure: waveform and spectrogram of the utterance /come/ /home/ /right/ /away/; time axes in ms.]
Human versus Ferret Sensitivity to Spectrotemporal Modulations
Auditory Scene Analysis & Pitch Extraction
Relevance to Auditory Scene Analysis: Streaming and Grouping
[Figure: panels A and C of the multiscale representation; labeled axes: Frequency (ticks 250 to 8000) and Rate (Hz); further ticks from 0.25 to 8.]
Working Hypotheses
Streaming: Any consistently isolated feature in the multiscale representation can be streamed, e.g.,
- spectral patterns (tones or average vocal tract spectra)
- repetitive temporal dynamics (modulated noise or sinusoidal FM tones)
- transients as segmenters
Grouping:
- Harmonicity and its linearly interpolated extensions (pitch extraction and segregation, regular patterns)
- Shared dynamics (common onsets and modulations)
Cortical Representation of Harmonic & Shifted Spectra
[Figure: Auditory Spectrum (Frequency (Hz), 250 to 4000, against Time (ms)), Multiscale Representation (Scale, 0.5 to 8.0, against Frequency, 250 to 4000), and Reduced Representation (Scale, 0.5 to 4.0, against Frequency).]
Shifted spectra are also grouped although they are inharmonic.
Computing Pitch
Pitch Estimates
[Figure: two panels, pre-cortical processing and post-cortical processing; vertical ticks from 125 to 2000.]
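The idea of computing pitch with harmonic templates can be sketched as follows. This toy version assumes the input is already a list of resolved component frequencies; it scores each candidate fundamental by how closely the components fall on its harmonics, and ties go to the highest candidate so that sub-harmonics do not win. The scoring rule, tolerance, and function names are illustrative assumptions, not the model behind the slide.

```python
import math

def template_score(components_hz, f0_hz, tolerance=0.03):
    """Sum of Gaussian-weighted matches between each component and the
    nearest harmonic of f0 (mismatch measured relative to the harmonic)."""
    score = 0.0
    for f in components_hz:
        harmonic = max(1, round(f / f0_hz))
        mismatch = (f - harmonic * f0_hz) / (harmonic * f0_hz)
        score += math.exp(-0.5 * (mismatch / tolerance) ** 2)
    return score

def estimate_pitch(components_hz, candidates_hz):
    """Candidate with the best template score; among exact ties the
    highest candidate is returned."""
    scored = [(template_score(components_hz, f0), f0) for f0 in candidates_hz]
    best = max(score for score, _ in scored)
    return max(f0 for score, f0 in scored if score >= best - 1e-9)

candidates = range(50, 501)  # 1 Hz grid of candidate fundamentals
pitch_full = estimate_pitch([400, 600, 800], candidates)
pitch_missing = estimate_pitch([600, 800, 1000], candidates)
print(pitch_full, pitch_missing)
```

Both inputs lack energy at the fundamental itself, so the estimate comes entirely from the spacing of the harmonics.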
Estimating Pitch; Extracting Pitch Streams
[Figure: panels labeled Female, Male, and Male+Female, with (A) Pitch tracks and (B) Extracted Female; time axes to 10 s, vertical ticks from .125 to 2.]
Voice Morphing
Manipulating Temporal and Spectral Modulations
[Figure: four spectrograms (time in ms, vertical ticks from 125 to 2000):]
Normal
Temporally smeared
Spectrally smeared
Temporally sharpened
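Smearing can be illustrated with a deliberately simple stand-in: a moving average along time (temporal smearing) or along the frequency channels (spectral smearing) of a spectrogram. The slides do not state how the manipulations were produced, so the averaging rule and all names below are assumptions made for this sketch.

```python
import math

def smooth_along_time(spec, half_width):
    """Moving-average each channel over up to 2*half_width + 1 frames
    (shorter windows at the edges). spec is spec[channel][frame]."""
    smoothed = []
    for row in spec:
        n = len(row)
        out = []
        for i in range(n):
            lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
            out.append(sum(row[lo:hi]) / (hi - lo))
        smoothed.append(out)
    return smoothed

def smooth_along_frequency(spec, half_width):
    """Same averaging applied across channels: transpose, smooth, transpose."""
    transposed = [list(col) for col in zip(*spec)]
    return [list(col) for col in zip(*smooth_along_time(transposed, half_width))]

def modulation_depth(values):
    """(max - min) / (max + min) of a non-negative envelope."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)

# Toy spectrogram: two channels with an 8 Hz envelope at 100 frames/s.
toy = [[1.0 + 0.8 * math.sin(2.0 * math.pi * 8.0 * n / 100.0)
        for n in range(100)] for _ in range(2)]
smeared = smooth_along_time(toy, half_width=3)

print("depth before:", round(modulation_depth(toy[0]), 3))
print("depth after: ", round(modulation_depth(smeared[0]), 3))
```

Averaging over a longer window removes faster temporal modulations first; the same holds along the frequency axis for fine spectral detail. Sharpening would require boosting those modulations instead, which this sketch does not attempt.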
Morphing Voices
[Figure: spectrograms labeled Female, Oboe, and Female Oboe; time axes in ms, vertical ticks from 125 to 2000.]
Acknowledgment
Auditory Speech and Music Processing: Tai Chi, Mounya El-Hilali, Powen Ru
Cortical Physiology and Auditory Computations: Didier Depireux, Jonathan Fritz, David Klein, Jonathan Simon
Supported by:
MURI # N00014-97-1-0501 from the Office of Naval Research
# NIDCD T32 DC00046-01 from the NIDCD
# NSFD CD8803012 from the National Science Foundation