Representation of Timbre in the Auditory System

Shihab A. Shamma

Center for Auditory and Acoustic Research, Institute for Systems Research
Electrical and Computer Engineering, University of Maryland, College Park


Musical Spectrograms
[Figure: waveforms and auditory spectrograms of a violin (vibrato) and a piano; time (ms) vs. frequency (125–2000 Hz).]


Anatomy of the Auditory System
[Figure: the ascending pathway from sound through early auditory stages, midbrain nuclei, collicular stages and central auditory stages, with the structures DCN, PVCN, AVCN, TB, LL, NLL, IC and MGB labelled.]

Attributes of Complex Sounds
Location: spatial maps from ILD, ITD and spectral cues
Timbre: the auditory spectrum
Pitch: computing pitch with harmonic templates


Early Auditory Processing Stages
Analysis: cochlear filters
Transduction: hair cells
Reduction: lateral inhibition
[Figure: eardrum → cochlea → basilar membrane filters → hair cell stages → lateral inhibitory network, each stage plotted on a log-frequency axis, ending in an auditory spectrogram (time 0–1000 ms, frequency 125–2000 Hz).]
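The three stages can be illustrated with a deliberately crude sketch. It is not the model behind these slides: it assumes zero-phase filters on a log-frequency axis with a shallow low-frequency skirt and a steep high-frequency edge, half-wave rectification as the only hair-cell stage (no low-pass), a first difference across neighbouring channels as the lateral inhibitory network, and 8-ms frame averaging. `cochlear_gain` and `auditory_spectrogram` are hypothetical names.

```python
import numpy as np

def cochlear_gain(f_hz, cf_hz, lo_oct=0.3, hi_oct=0.05):
    """Hypothetical asymmetric filter on a log-frequency axis: shallow skirt
    below CF, steep edge above CF (widths in octaves)."""
    d = np.log2(np.maximum(f_hz, 1e-6)) - np.log2(cf_hz)
    return np.exp(-0.5 * (d / np.where(d > 0, hi_oct, lo_oct)) ** 2)

def auditory_spectrogram(x, fs, cfs, frame_ms=8.0):
    """Analysis -> transduction -> reduction -> frame averaging.
    cfs must be increasing; returns an array of shape (len(cfs), n_frames)."""
    n = len(x)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(n, 1.0 / fs)
    # 1) Analysis: zero-phase cochlear filters applied in the frequency domain
    y = np.array([np.fft.irfft(X * cochlear_gain(f, cf), n) for cf in cfs])
    # 2) Transduction: half-wave rectification as a stand-in for hair cells
    y = np.maximum(y, 0.0)
    # 3) Reduction: lateral inhibition as a first difference along the
    #    tonotopic axis, followed by half-wave rectification
    y = np.maximum(np.diff(y, axis=0, prepend=y[:1]), 0.0)
    # Short-term integration over non-overlapping frames
    hop = int(round(frame_ms * 1e-3 * fs))
    n_frames = n // hop
    return y[:, :n_frames * hop].reshape(len(cfs), n_frames, hop).mean(axis=2)

# Usage: a 1-s, 1-kHz tone analysed by 61 channels spaced 1/12 octave (125-4000 Hz)
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
cfs = 125 * 2 ** (np.arange(61) / 12)
aud = auditory_spectrogram(tone, fs, cfs)
peak_cf = cfs[np.argmax(aud.mean(axis=1))]
```

In this sketch the steep high-frequency filter edge is what makes the lateral-inhibition difference peak in the channel whose CF matches the tone; with symmetric filters the peak would slide onto the low-frequency flank.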


Auditory-Nerve Response Patterns to a Single Tone
[Figure: auditory-nerve responses across CF (250–4000 Hz) over 60 ms, with the average response.]


Auditory-Nerve Response Patterns to a Two-Tone Stimulus
[Figure: auditory-nerve responses across CF (250–4000 Hz) over 60 ms, with the average response.]


[Figure: response patterns for the utterance / r i t a w a y /; time (0–500 ms) vs. frequency (250–4000 Hz).]


Cochlear Analysis and Auditory-Nerve Responses
[Figure: a sound drives basilar membrane vibrations; hair cells along the tonotopic (characteristic frequency, CF) axis feed the auditory-nerve fibers, followed by lateral inhibition, yielding an estimated stimulus spectrum. Panels A–C show auditory-nerve responses (CF 250–4000 Hz, time in ms) and panels A′–C′ the estimated spectra, for a harmonic series on C4.]


[Figure: auditory spectrograms of the same sound in normal, down-shifted, compressed and dilated versions; time (0–1000 ms) vs. frequency (125–2000 Hz).]



Awake Set-up

Awake ferret with head restraint in cylindrical holder


The raw neural trace usually contained several distinct spike waveforms (typically from 1–4 neurons), which were sorted off-line.

Spike Sorting
[Figure: sorted spike waveforms and responses of Unit 1 and Unit 2 from recording 10e; time (ms) vs. frequency (2000–16000 Hz).]

Waveforms were sorted in a semi-automatic procedure. First, a PCA-based algorithm was used to pre-sort the spikes. Then a MATLAB-based program was used to refine the classification.
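The slides say only that a PCA-based algorithm did the pre-sort. A minimal sketch of such a pre-sort, assuming the spikes are projected onto their first two principal components and then clustered with plain k-means (farthest-point initialisation); the clustering rule and the name `pca_presort` are our assumptions, and the synthetic "units" are invented for illustration.

```python
import numpy as np

def pca_presort(waveforms, n_units, n_pcs=2, n_iter=20, seed=0):
    """Project spike waveforms (n_spikes x n_samples) onto their leading
    principal components and cluster them with k-means.
    Returns (labels, pc_scores)."""
    X = waveforms - waveforms.mean(axis=0)
    # Principal directions from the SVD of the centred waveform matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:n_pcs].T
    # Farthest-point initialisation: each new centre is the spike farthest
    # from the centres chosen so far
    rng = np.random.default_rng(seed)
    centers = [scores[rng.integers(len(scores))]]
    while len(centers) < n_units:
        d = np.min([((scores - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(scores[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((scores[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for k in range(n_units):
            if np.any(labels == k):
                centers[k] = scores[labels == k].mean(axis=0)
    return labels, scores

# Usage with two synthetic spike shapes plus noise (illustrative data only)
rng = np.random.default_rng(1)
s = np.linspace(0, 1, 32)
unit_a = np.exp(-((s - 0.3) / 0.05) ** 2)
unit_b = -0.6 * np.exp(-((s - 0.6) / 0.08) ** 2)
wf = np.vstack([unit_a + 0.05 * rng.standard_normal((100, 32)),
                unit_b + 0.05 * rng.standard_normal((100, 32))])
truth = np.repeat([0, 1], 100)
labels, scores = pca_presort(wf, n_units=2)
```

A manual refinement pass, like the MATLAB step described above, would then review these provisional labels.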


[Figure: waveform and auditory spectrogram of the words /come/ /home/ /right/ /away/.]
Three envelopes of modulation: slow (< 30 Hz), intermediate (< 500 Hz), fast (< 4 kHz).
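The slide names three envelope bands without saying how they are obtained. One simple reading, assumed here: half-wave rectify the waveform (or one cochlear channel) and low-pass it at 30 Hz, 500 Hz and 4 kHz with ideal FFT filters. `modulation_envelopes` is a hypothetical helper.

```python
import numpy as np

def modulation_envelopes(x, fs, cutoffs=(30.0, 500.0, 4000.0)):
    """Half-wave rectify x, then keep only components below each cutoff (Hz)
    using an ideal (brick-wall) FFT low-pass. Returns one envelope per cutoff."""
    r = np.maximum(x, 0.0)
    R = np.fft.rfft(r)
    f = np.fft.rfftfreq(len(r), 1.0 / fs)
    return [np.fft.irfft(np.where(f < fc, R, 0.0), len(r)) for fc in cutoffs]

# Usage: a 1-kHz carrier amplitude-modulated at 4 Hz, 1 s at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
slow, mid, fast = modulation_envelopes(x, fs)
```

The slow envelope keeps the 4-Hz modulation, the intermediate one still excludes the 1-kHz carrier, and only the fast one retains the carrier fine structure.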


Decomposing a Spectrogram into Dynamic Ripples
[Figure: an auditory spectrogram (time 0–1000 ms, frequency 125–2000 Hz) decomposed into moving ripples: sinusoidal spectral envelopes of modulation depth ∆A drifting at a rate w (e.g., 4 Hz) along a log frequency axis (1–16 kHz).]
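A moving ripple can be written down directly. The sketch assumes the sinusoidal envelope commonly used for these stimuli, S(t, x) = 1 + ∆A sin(2π(ωt + Ωx) + Φ), with x in octaves along the tonotopic axis; the grid sizes and the name `moving_ripple` are illustrative assumptions, and only the envelope (not the carrier tones) is generated.

```python
import numpy as np

def moving_ripple(rate_hz, scale_cyc_oct, depth=0.9, phase=0.0,
                  dur_s=1.0, fs_env=1000, n_oct=5.0, ch_per_oct=20):
    """Envelope S(t, x) = 1 + depth * sin(2*pi*(rate*t + scale*x) + phase).

    t: time in s; x: octaves above the lowest channel.
    Returns (S, t, x) with S of shape (time, channels)."""
    t = np.arange(int(dur_s * fs_env)) / fs_env
    x = np.arange(int(n_oct * ch_per_oct)) / ch_per_oct
    S = 1.0 + depth * np.sin(
        2 * np.pi * (rate_hz * t[:, None] + scale_cyc_oct * x[None, :]) + phase)
    return S, t, x

# Usage: w = 4 Hz, Omega = 0.4 cyc/oct over 5 octaves
S, t, x = moving_ripple(4.0, 0.4)
```

With this sign convention, positive rate and scale make the pattern drift toward lower frequencies: the value at a channel 5 ms later equals the value one channel (0.05 octave) higher at the start.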


Responses to Moving Ripples
[Figure: responses to moving ripples with a ripple frequency Ω = 0.4 cycles/octave and temporal frequencies ω = 4 to 32 Hz, 30 sweeps per ω, at 55 dB; time (ms) vs. temporal frequency (Hz).]


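[Figure: (A) responses as a function of ripple velocity w (Hz) at Ω = 0.8 cyc/oct, and as a function of ripple density Ω (cyc/oct) at w = 12 Hz; (B) the transfer-function magnitude |TF(w, Ω)| over −w to w and Ω, and the spectro-temporal response field STRF(t, x) over time t (ms) and frequency x, related to TF by a 2-D Fourier transform.]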
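For a linear model neuron, the ripple measurements fill in the transfer function TF(w, Ω) one point at a time, and an inverse 2-D Fourier transform turns TF back into the STRF. The sketch assumes a purely linear neuron, a discrete circular time axis, complex ripples on a DFT grid (real experiments use real ripples drifting in both directions, hence the ±w axis), and a hypothetical Gabor-like STRF; all function names are illustrative.

```python
import numpy as np

def example_strf(T=32, X=16, dt=0.005):
    """Hypothetical Gabor-like STRF: rows = time lags (dt s apart),
    columns = octave-spaced channels."""
    t = np.arange(T) * dt
    x = np.arange(X)
    temporal = np.sin(2 * np.pi * 8.0 * t) * np.exp(-t / 0.03)
    spectral = np.exp(-0.5 * ((x - X / 2) / 2.0) ** 2) * np.cos(2 * np.pi * (x - X / 2) / 6.0)
    return np.outer(temporal, spectral)

def linear_response(strf, stim):
    """r[t] = sum over lags tau and channels x of strf[tau, x] * stim[(t - tau) mod T, x]."""
    T = strf.shape[0]
    r = np.zeros(T, dtype=complex)
    for tau in range(T):
        r += stim[(np.arange(T) - tau) % T, :] @ strf[tau, :]
    return r

def measure_transfer_function(strf):
    """Probe the linear model with one complex ripple per (rate k, scale m) bin."""
    T, X = strf.shape
    t = np.arange(T)
    tf = np.zeros((T, X), dtype=complex)
    for k in range(T):
        for m in range(X):
            ripple = np.exp(2j * np.pi * (k * t[:, None] / T - m * np.arange(X)[None, :] / X))
            r = linear_response(strf, ripple)
            # A linear system answers with the same temporal exponential,
            # scaled and phase-shifted by TF(k, m); read that factor off.
            tf[k, m] = np.mean(r * np.exp(-2j * np.pi * k * t / T))
    return tf

strf = example_strf()
tf = measure_transfer_function(strf)
strf_back = np.fft.ifft2(tf).real  # STRF recovered as the inverse 2-D FFT of TF
```

On this grid the measured TF coincides with `np.fft.fft2(strf)`, which is why the inverse transform returns the original STRF.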


Examples of Different STRF Shapes
[Figure: six example STRFs, each with a frequency axis spanning 0.125–4 kHz.]


Spectro-Temporal Response Fields


Multiscale Cortical Representation of a Spectrogram
[Figure: a spectrogram (250–8000 Hz) and its multiscale cortical representation (panels A, C) over scale (0.25–8 cyc/oct), frequency and rate (Hz).]

Scale-Rate Decomposition
Reconstruction
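A scale-rate decomposition can be approximated by filtering the spectrogram's 2-D Fourier transform with a bank of band-pass filters, one per (rate, scale) pair. This sketch assumes Gaussian filters on log2 |rate| and log2 |scale| with a 0.5-octave width, ignores drift direction, and returns only the energy in each band; `rate_scale_energy` is a hypothetical name and the filter shapes are not those of the cortical model.

```python
import numpy as np

def rate_scale_energy(spec, dt, df, rates, scales, bw=0.5):
    """Energy of spec (time x log-frequency) in each (rate, scale) band.

    Rows are frames dt seconds apart; columns are channels df octaves apart.
    Filters are Gaussians in log2(|rate|) and log2(|scale|), bw octaves wide."""
    T, F = spec.shape
    S = np.fft.fft2(spec)
    w = np.abs(np.fft.fftfreq(T, dt))[:, None]   # temporal modulation (Hz)
    om = np.abs(np.fft.fftfreq(F, df))[None, :]  # spectral modulation (cyc/oct)
    with np.errstate(divide="ignore"):
        lw, lom = np.log2(w), np.log2(om)        # -inf at DC, giving zero gain there
    energy = np.zeros((len(rates), len(scales)))
    for i, rc in enumerate(rates):
        hw = np.exp(-0.5 * ((lw - np.log2(rc)) / bw) ** 2)
        for j, sc in enumerate(scales):
            hs = np.exp(-0.5 * ((lom - np.log2(sc)) / bw) ** 2)
            out = np.fft.ifft2(S * hw * hs)      # complex filtered output
            energy[i, j] = np.sum(np.abs(out) ** 2)
    return energy

# Usage: a ripple at 8 Hz and 1 cyc/oct on a 1-s, 4-octave grid
dt, df = 0.004, 1 / 16
t = np.arange(250)[:, None] * dt
x = np.arange(64)[None, :] * df
ripple = 1 + 0.9 * np.sin(2 * np.pi * (8 * t + 1.0 * x))
rates = [2, 4, 8, 16, 32]
scales = [0.25, 0.5, 1, 2, 4]
E = rate_scale_energy(ripple, dt, df, rates, scales)
best = np.unravel_index(np.argmax(E), E.shape)
```

The energy peaks in the (8 Hz, 1 cyc/oct) band, i.e., the ripple's own rate and scale.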


MUSICAL TIMBRE




Patterns of Musical Timbre
[Figure: rate-scale patterns of a violin (vibrato), piano, oboe and clarinet; rate from −32 to 32 Hz (both drift directions) vs. scale 0.25–8 cyc/oct.]


Timbre Metric for Some Musical Instruments (TSVQ)


Timbre Metric for Musical Instruments
[Figure: 12 × 12 dissimilarity matrices for Guitar, Harp, Violin Pizz., Violin Bowed, Bass, Synth A, Synth B, Oboe, Clarinet, Flute, Horn and Trumpet, one from the model (exp1c-model-rs-mf) and one from the subjects (exp1c-subjects). A further panel compares subjects 1–24 with spectral cues, temporal cues and spectro-temporal cues.]
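The slides do not spell out how the model and subject matrices are compared. A common approach, assumed here, is to compute Euclidean distances between per-instrument feature vectors (for instance averaged rate-scale energies) and correlate the upper triangles of the model and subject matrices. The function names and the random "features" are hypothetical.

```python
import numpy as np

def pairwise_distances(features):
    """Euclidean distance between every pair of instrument feature vectors."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def matrix_agreement(model_d, subject_d):
    """Pearson correlation between the upper triangles of two dissimilarity matrices."""
    iu = np.triu_indices_from(model_d, k=1)
    return np.corrcoef(model_d[iu], subject_d[iu])[0, 1]

# Usage with made-up data: 12 instruments, 50-dimensional features, and
# "subject" dissimilarities equal to the model's plus small symmetric noise
rng = np.random.default_rng(0)
feats = rng.random((12, 50))
d_model = pairwise_distances(feats)
noise = rng.normal(0, 0.01, (12, 12))
d_subj = d_model + (noise + noise.T) / 2
r = matrix_agreement(d_model, d_subj)
```

Only the upper triangle is used because the matrices are symmetric with a zero diagonal, so the remaining entries add no independent information.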


Mapping musical instruments
[Figure: auditory spectrograms of a guitar, a trumpet and the "Trumpar"; time (ms) vs. frequency (125–2000 Hz).]

ACE Chord
[Figure: auditory spectrogram (250–4000 Hz) and rate-scale patterns (rate ±1 to ±16 Hz, scale 0.5–8 cyc/oct) of an ACE chord.]

A Melody with the Trumpar


Speech Analysis & Assessment of Intelligibility





Human versus Ferret Sensitivity to Spectrotemporal Modulations




Auditory Scene Analysis & Pitch Extraction


Relevance to Auditory Scene Analysis: Streaming and Grouping
[Figure: the multiscale cortical representation of a spectrogram (panels A, C) over scale (0.25–8 cyc/oct), frequency and rate (Hz).]

Working Hypotheses

Streaming: any feature that is consistently isolated in the multiscale representation can be streamed, for example spectral patterns (tones or average vocal-tract spectra), repetitive temporal dynamics (modulated noise or sinusoidal FM tones), and transients, which act as segmenters.

Grouping: harmonicity and its linearly interpolated extensions (pitch extraction and segregation, regular patterns), and shared dynamics (common onsets and modulations).
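Grouping by shared dynamics can be illustrated by linking frequency channels whose temporal envelopes rise and fall together. This is a hypothetical sketch, not the grouping mechanism proposed in these slides: it assumes envelope correlation as the measure of "shared dynamics" and a fixed threshold of 0.8, and the name `group_by_shared_dynamics` is ours.

```python
import numpy as np

def group_by_shared_dynamics(env, threshold=0.8):
    """Group channels whose temporal envelopes are strongly correlated.

    env: array (channels, time). Returns a list of groups of channel indices."""
    c = np.corrcoef(env)
    n = env.shape[0]
    labels = -np.ones(n, dtype=int)
    groups = []
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        # Grow a group from this seed through strongly correlated channels
        stack, members = [seed], []
        labels[seed] = len(groups)
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(c[i] > threshold):
                if labels[j] < 0:
                    labels[j] = len(groups)
                    stack.append(j)
        groups.append(sorted(int(i) for i in members))
    return groups

# Usage: channels 0, 1, 4 share a 4-Hz modulation; channels 2, 3 share an 11-Hz one
t = np.arange(1000) / 1000
a = 1 + np.sin(2 * np.pi * 4 * t)
b = 1 + np.sin(2 * np.pi * 11 * t)
env = np.array([a, 0.5 * a, b, 2 * b, 0.8 * a])
groups = group_by_shared_dynamics(env)
```

Correlation ignores overall level, so channels with the same modulation but different gains still fall into one group.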


Cortical Representation of Harmonic & Shifted Spectra
[Figure: auditory spectrum (250–4000 Hz), its multiscale representation (scale 0.5–8 cyc/oct vs. frequency) and the reduced representation, for harmonic and shifted spectra.]

Shifted spectra are also grouped, although they are inharmonic.


Computing Pitch


Pitch Estimates
[Figure: pitch estimates (125–2000 Hz) from pre-cortical and post-cortical processing.]

Estimating Pitch / Extracting Pitch Streams
[Figure: (A) pitch tracks over 10 s for a female voice, a male voice and their mixture (Male+Female); (B) the female voice extracted from the mixture.]
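Computing pitch with harmonic templates can be sketched as a template match on a spectrum: for each candidate F0, sample the spectrum at its harmonics and pick the best-scoring candidate. This is our substitution, not the pre- or post-cortical method in the figure; weighting harmonic n by 1/n (an assumption) keeps the score from tying with subharmonic (F0/2) and octave-up (2·F0) candidates. `harmonic_template_pitch` and the synthetic spectrum are hypothetical.

```python
import numpy as np

def harmonic_template_pitch(freqs, spectrum, candidates, fmax=4000.0):
    """Return (best_f0, scores): the candidate F0 whose harmonic template best
    matches the spectrum. Harmonic n (n * f0 <= fmax) is weighted by 1/n."""
    scores = []
    for f0 in candidates:
        n = np.arange(1, int(fmax // f0) + 1)
        scores.append(np.sum(np.interp(n * f0, freqs, spectrum) / n))
    scores = np.asarray(scores)
    return candidates[int(np.argmax(scores))], scores

# Usage: a synthetic spectrum with 20 harmonics of 200 Hz (3-Hz-wide peaks)
freqs = np.arange(0, 5000, 1.0)
spectrum = sum(np.exp(-0.5 * ((freqs - 200 * h) / 3.0) ** 2) for h in range(1, 21))
candidates = np.arange(80, 401, 1.0)
best_f0, scores = harmonic_template_pitch(freqs, spectrum, candidates)
```

With flat harmonic amplitudes, the 200-Hz template scores about 3.60 (the sum of 1/n up to 20), the 400-Hz template about 2.93 and the 100-Hz template about 1.80.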


Voice Morphing


Manipulating Temporal and Spectral Modulations
[Figure: auditory spectrograms of the same sound, normal, temporally smeared, spectrally smeared and temporally sharpened; time (0–1000 ms) vs. frequency (125–2000 Hz).]
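Smearing and sharpening can be expressed as filtering in the modulation domain: removing fast temporal modulations smears in time, removing dense spectral modulations smears in frequency, and boosting fast temporal modulations sharpens. The sketch assumes ideal (brick-wall) masks on the 2-D Fourier transform of a time × log-frequency spectrogram; `modulation_filter` and its parameters are hypothetical, and the slides' actual manipulation may differ.

```python
import numpy as np

def modulation_filter(spec, dt, df, max_rate=None, max_scale=None,
                      boost_rate=None, gain=2.0):
    """Smear or sharpen spec (time x log-frequency) in the modulation domain.

    max_rate:   drop temporal modulations faster than this (Hz)   -> temporal smearing
    max_scale:  drop spectral modulations denser than this (cyc/oct) -> spectral smearing
    boost_rate: scale temporal modulations faster than this by gain -> temporal sharpening
    """
    T, F = spec.shape
    S = np.fft.fft2(spec)
    w = np.abs(np.fft.fftfreq(T, dt))[:, None]   # |rate| in Hz
    om = np.abs(np.fft.fftfreq(F, df))[None, :]  # |scale| in cyc/oct
    mask = np.ones((T, F))
    if max_rate is not None:
        mask = mask * (w <= max_rate)
    if max_scale is not None:
        mask = mask * (om <= max_scale)
    if boost_rate is not None:
        mask = mask * np.where(w > boost_rate, gain, 1.0)
    # The mask is symmetric in +/- rate and scale, so the result stays real
    return np.real(np.fft.ifft2(S * mask))

# Usage: a 16-Hz, 1-cyc/oct ripple on a 1-s, 4-octave grid
dt, df = 0.004, 1 / 16
t = np.arange(250)[:, None] * dt
x = np.arange(64)[None, :] * df
rip = 1 + 0.5 * np.sin(2 * np.pi * (16 * t + 1.0 * x))
smeared_t = modulation_filter(rip, dt, df, max_rate=8)
smeared_f = modulation_filter(rip, dt, df, max_scale=0.5)
sharpened = modulation_filter(rip, dt, df, boost_rate=8, gain=2.0)
```

Both kinds of smearing flatten this ripple to its mean level of 1, while sharpening doubles its modulation depth.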


Morphing Voices
[Figure: auditory spectrograms of a female voice, an oboe and the morphed "Female Oboe"; time (ms) vs. frequency (125–2000 Hz).]


Acknowledgment

Auditory Speech and Music Processing: Tai Chi, Mounya El-Hilali, Powen Ru
Cortical Physiology and Auditory Computations: Didier Depireux, Jonathan Fritz, David Klein, Jonathan Simon

Supported by:
MURI # N00014-97-1-0501 from the Office of Naval Research
NIDCD T32 DC00046-01 from the NIDCD
NSFD CD8803012 from the National Science Foundation

Date post:	13-Jan-2016