Date post: 20-Aug-2018
Page 1: No Slide Title - bcmi.sjtu.edu.cnbcmi.sjtu.edu.cn/dragon-star/resources/dragonstar-2013-CA-PartI-II.pdf · Real cepstrum • For speech, the ... • Complex cepstrum exists as well

Parts I-II 1

Computational Audition

DeLiang Wang

Perception and Neurodynamics Lab

Ohio State University

http://www.cse.ohio-state.edu/pnl


Human brain


Three levels of analysis

Marrian framework for understanding complex

information processing systems (Marr’82)

Computational theory

• Goals of computation, appropriateness of the goal, general strategies

• Representation/Algorithm

• How to represent the input and the output

• Algorithms for transforming from one representation to another

• Implementation

• How can the representation and algorithm be realized physically

(architecture, hardware)?


What is the goal of audition?

What is the goal of perception?

The perceptual systems are ways of seeking and extracting

information about the environment from sensory input (Gibson’66)

The purpose of vision is to produce a visual description of the

environment for the viewer (Marr’82)

By analogy, the purpose of audition is to produce an

auditory description of the environment for the listener

The description is unique to the listener, even though the physical

environment can be common for different listeners


Outline of short course

Part I: Acoustics and signal processing background

Part II: Physiological and perceptual basis of audition

Part III: Fundamentals of computational audition

Part IV: Computational auditory scene analysis

Part V: Pattern recognition and learning

Discussion and conclusion


Computational Audition

Part I: Acoustics and Signal Processing

Background

DeLiang Wang

Perception and Neurodynamics Lab

Ohio State University

http://www.cse.ohio-state.edu/pnl


Outline of Part I

Basic concepts in acoustics

Amplitude, phase, frequency/period

Power, intensity, decibels

Spectrum, formant, envelope

Pitch, loudness, timbre

Basic signal processing

Cepstrum

Linear prediction coding (LPC)


Basics of sound

• Mathematics of the pure tone

$x(t) = A \sin(2\pi t / T + \phi)$

or

$x(t) = A \sin(2\pi f t + \phi)$

• A: amplitude

• φ: phase

• T: period

• f: frequency
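The pure-tone definition above can be sampled directly. The following is a minimal sketch, not part of the original slides; the function name `pure_tone` and the example values (440 Hz, 8 kHz sample rate, 10 ms) are illustrative assumptions.

```python
import math

def pure_tone(amplitude, freq_hz, phase_rad, duration_s, sample_rate_hz):
    """Sample x(t) = A sin(2*pi*f*t + phi) at t = n / sample_rate_hz."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + phase_rad)
        for n in range(n_samples)
    ]

# Hypothetical example values: a 440 Hz tone, 10 ms long, sampled at 8 kHz.
tone = pure_tone(amplitude=1.0, freq_hz=440.0, phase_rad=0.0,
                 duration_s=0.01, sample_rate_hz=8000)

# Period and frequency are reciprocal: T = 1 / f.
period_s = 1.0 / 440.0
```

A non-zero `phase_rad` shifts the waveform in time without changing its amplitude or frequency.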


Phase lead – phase lag


Power, intensity, and decibels

• Treat signal x(t) as voltage

• By Ohm's law, the current i(t) = x(t)/R

• Then the instantaneous power is

$P(t) = x(t)\, i(t) = x^2(t) / R$

• Energy is the integrated power over a certain time period (e.g. kilowatt vs. kilowatt-hour)


Power, intensity, and decibels (cont.)

• Treat signal x(t) as sound pressure

• Then the instantaneous intensity is

$I(t) = x^2(t) / (\rho c)$

• I(t) is measured in watts/m² (x(t): pressure, in newtons/m² or pascals)

• ρ: the density of the medium

• c: speed of sound
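As a rough numerical illustration of the intensity formula, the sketch below evaluates I = x²/(ρc) for air. The density (1.2 kg/m³), the speed of sound (343 m/s) and the 20 µPa reference pressure are assumed typical values, not figures taken from the slides.

```python
# Assumed typical values for air at room temperature (not from the slides).
AIR_DENSITY_KG_M3 = 1.2
SPEED_OF_SOUND_M_S = 343.0

def intensity_w_m2(pressure_pa, density=AIR_DENSITY_KG_M3, speed=SPEED_OF_SOUND_M_S):
    """Instantaneous intensity I = x^2 / (rho * c) for a pressure x in pascals."""
    return pressure_pa ** 2 / (density * speed)

# 20 micropascals is the conventional reference pressure for hearing threshold
# (an assumption here); it corresponds to roughly 1e-12 W/m^2.
reference_intensity = intensity_w_m2(20e-6)

# Intensity grows with the square of pressure: doubling x quadruples I.
ratio = intensity_w_m2(2.0) / intensity_w_m2(1.0)
```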


Sound level

• Ratio of one sound to another (baseline), expressed as decibels (dB)

$L_2 - L_1 \text{ (decibels)} = 10 \log_{10}(I_2 / I_1)$

• Note the use of common logarithm

• Double intensity leads to 3 dB, and double amplitude leads to 6 dB

• SNR: signal-to-noise ratio

• Conversational speech is about 65 dB. Above 100 dB is damaging to the ear
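The 3 dB and 6 dB figures follow from the level formula because intensity is proportional to amplitude squared. A small sketch to check the arithmetic (the function names are illustrative assumptions):

```python
import math

def level_difference_db(intensity_2, intensity_1):
    """L2 - L1 = 10 log10(I2 / I1)."""
    return 10.0 * math.log10(intensity_2 / intensity_1)

def level_difference_db_from_amplitude(amplitude_2, amplitude_1):
    """Intensity scales with amplitude squared, so the factor becomes 20."""
    return 20.0 * math.log10(amplitude_2 / amplitude_1)

double_intensity_db = level_difference_db(2.0, 1.0)                 # about 3.01 dB
double_amplitude_db = level_difference_db_from_amplitude(2.0, 1.0)  # about 6.02 dB
```

The same function gives an SNR in dB when `intensity_2` is the signal power and `intensity_1` is the noise power.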


How loud are sounds?

Source: http://www.handsandvoices.org/resource_guide/055_audiogram.html


Spectrum

• Fourier Series: For any periodic function of time, x(t), with period T, i.e.

x(t+mT) = x(t), for all integer m

• x(t) can be represented as a Fourier series like this

$x(t) = A_0 + \sum_{n=1}^{\infty} [A_n \cos(\omega_n t) + B_n \sin(\omega_n t)]$

• Furthermore,

$\omega_n = n \omega_0 = 2\pi n / T$

• n is an integer and $\omega_0$ is the fundamental frequency
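To see a Fourier series converge, the sketch below sums the series of a unit square wave, a standard textbook example that is not in the slides. For this odd-symmetric wave A_0 and all A_n are zero, and B_n = 4/(πn) for odd n.

```python
import math

def square_wave_partial_sum(t, period, n_harmonics):
    """Partial Fourier sum of a unit square wave, using harmonics up to n_harmonics."""
    omega_0 = 2 * math.pi / period          # fundamental (angular) frequency
    total = 0.0                             # A_0 = 0 and A_n = 0 for this wave
    for n in range(1, n_harmonics + 1, 2):  # only odd harmonics contribute
        b_n = 4.0 / (math.pi * n)
        total += b_n * math.sin(n * omega_0 * t)
    return total

# At t = T/4 the square wave equals 1; more harmonics give a closer value.
approx_1 = square_wave_partial_sum(0.25, 1.0, 1)    # 4/pi, about 1.27
approx_99 = square_wave_partial_sum(0.25, 1.0, 99)  # close to 1
```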


Spectrum (cont.)

• "The multiplicity of vibrational forms which can be thus produced by the composition of simple pendular vibrations is not merely extraordinarily great; it is so great that it can not be greater." (H. Helmholtz, 1863)


Waveform illustration

A periodic waveform


Spectrum illustration


Spectrum illustration


Spectrum illustration


Fourier transform

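• For any function of time, x(t), the Fourier transform X(ω) of x(t) is defined in terms of the Fourier integral:

$X(\omega) = \int_{-\infty}^{\infty} e^{-i\omega t} x(t)\, dt$

• The Fourier transform converts a function of time to a function of frequency

• Inverse Fourier transform

$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega t} X(\omega)\, d\omega$

$e^{-i\omega t} = \cos(\omega t) - i \sin(\omega t)$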
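For sampled signals the Fourier integral is replaced by a finite sum, the discrete Fourier transform (DFT). The sketch below is a deliberately naive O(N²) version for illustration only; practical code would use an FFT routine. The test signal (an 8 Hz sine sampled at 64 Hz for one second) is an assumed example.

```python
import cmath
import math

def dft(samples):
    """Naive DFT: X[k] = sum_n x[n] * exp(-i * 2*pi * k * n / N)."""
    n_total = len(samples)
    return [
        sum(x_n * cmath.exp(-2j * math.pi * k * n / n_total)
            for n, x_n in enumerate(samples))
        for k in range(n_total)
    ]

sample_rate_hz = 64
signal = [math.sin(2 * math.pi * 8 * n / sample_rate_hz) for n in range(64)]

spectrum = dft(signal)
# Only the first half of the bins is needed for a real-valued signal.
magnitudes = [abs(c) for c in spectrum[:len(signal) // 2]]
peak_bin = magnitudes.index(max(magnitudes))
peak_freq_hz = peak_bin * sample_rate_hz / len(signal)  # bin index -> Hz
```

The time-domain sine becomes a single peak at its own frequency in the magnitude spectrum.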


Pitch

• Definition: "that attribute of auditory sensation in terms

of which sounds may be ordered on a musical scale"

(American Standards Association, 1960)

• Pitch is related to the repetition rate of the waveform:

• For pure tone, pitch corresponds to its frequency

• For a periodic complex tone, pitch corresponds to its fundamental

frequency
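Because pitch follows the repetition rate of the waveform, one simple way to estimate it is to find the lag at which the signal best matches a delayed copy of itself. The autocorrelation sketch below is one such method, chosen here for illustration rather than taken from the slides; the search range of 50 to 500 Hz and the test tone are assumptions.

```python
import math

def autocorrelation(signal, lag):
    """Sum of products of the signal with itself delayed by `lag` samples."""
    return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

def estimate_pitch_hz(signal, sample_rate_hz, min_hz=50.0, max_hz=500.0):
    """Pick the lag with the largest autocorrelation inside the search range."""
    min_lag = int(sample_rate_hz / max_hz)
    max_lag = int(sample_rate_hz / min_hz)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(signal, lag))
    return sample_rate_hz / best_lag

# A periodic complex tone: harmonics at 200, 400 and 600 Hz (assumed example).
sample_rate_hz = 8000
complex_tone = [
    math.sin(2 * math.pi * 200 * n / sample_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate_hz)
    + 0.25 * math.sin(2 * math.pi * 600 * n / sample_rate_hz)
    for n in range(800)
]
pitch_hz = estimate_pitch_hz(complex_tone, sample_rate_hz)
```

The estimate lands on the 200 Hz fundamental, the repetition rate of the whole waveform, rather than on any single upper harmonic.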


Envelope and formant

• Envelope: Amplitude variation (modulation)

• Formant: A resonance in the vocal tract which is usually manifested as a peak in the spectral envelope of a speech sound


Formant illustration


Three characteristics of sound sensation

• Pitch (fundamental frequency)

• Loudness (intensity)

• Timbre (quality - spectral envelope)


Signal representations: Cepstrum

• Source-filter separation for sound production

• For speech, source corresponds to excitation by a pulse train for voiced phonemes and to turbulence (noise) for unvoiced phonemes, and filter corresponds to vocal tract (resonators)

• For music, source corresponds to vibrations (e.g. vibrating strings in plucked or bowed string instrument) and filter corresponds to the body of the instrument

• Overall signal reaching the ear is the convolution of source with the impulse response of filter

$y(t) = x(t) * h(t) = h(t) * x(t) = \int_{-\infty}^{\infty} x(t')\, h(t - t')\, dt'$

• Cepstral analysis attempts to separate source from filter, hence it can be viewed as deconvolution
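In discrete time the convolution integral becomes a sum. The sketch below convolves a toy pulse train (standing in for a voiced source) with a short decaying impulse response (standing in for the filter); both sequences are made-up illustrative values.

```python
def convolve(source, impulse_response):
    """Discrete convolution: y[n] = sum_m source[m] * impulse_response[n - m]."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Toy source: one pulse every 4 samples. Toy filter: a short decay.
pulse_train = [1.0 if n % 4 == 0 else 0.0 for n in range(12)]
decay = [1.0, 0.5, 0.25]

y = convolve(pulse_train, decay)
# Each pulse is replaced by a copy of the impulse response:
# y starts 1.0, 0.5, 0.25, 0.0, 1.0, 0.5, 0.25, 0.0, ...
```

Swapping the two arguments gives the same output, which is the commutativity x * h = h * x.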


Speech production illustration


Real cepstrum

• For speech, the spectral magnitude can be written as

$|X(\omega)| = |V(\omega)| \, |E(\omega)|$

• Taking the logarithm yields

$\log |X(\omega)| = \log |V(\omega)| + \log |E(\omega)|$

• Observation for speech production

• The E term corresponds to an event (e.g. a pulse train with a frequency of 100 Hz) more extended in time than the impulse response of the vocal tract. Analogously, E corresponds to “carrier” and V corresponds to “envelope” in the frequency domain. In other words, E varies more quickly with respect to ω than V

• Hence, one can apply some kind of “filter” to separate “high-frequency” components from “low-frequency” components, and thus the E term from the V term


Real cepstrum (cont.)

• Change of notations because the variable is frequency

rather than time

• Filtering -> liftering

• Frequency response -> quefrency response

• Spectrum -> cepstrum

• High (low) frequency components -> high (low) time components or

high (low) quefrency components


Real cepstrum (cont.)

• The log-operation converts a multiplicative term into an additive term, which can be operated upon by a linear operation such as filtering. The cepstrum is defined as the inverse Fourier transform of the log spectral magnitude

$c(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i\omega n} \log |X(\omega)| \, d\omega$

• c(n) is called the nth cepstral coefficient

• Given separated cepstra for excitation and vocal tract, they can be inverted to give original spectral magnitudes

• Only a moderate number of cepstral coefficients (e.g. 10-14) is needed for many applications, including speech recognition

• Complex cepstrum exists as well
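The sketch below computes a real cepstrum on a toy frame to show the separation described above. Everything about the test signal is an assumption made for illustration: a decaying pulse train with a 32-sample period plays the excitation, a geometric decay plays the vocal tract, and the two are combined by circular convolution so that the spectra multiply exactly on the DFT grid. The naive O(N²) transform stands in for an FFT.

```python
import cmath
import math

def dft(values, sign=-1):
    """Naive Fourier sum; sign=+1 gives the unscaled inverse transform."""
    n_total = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * k * n / n_total)
            for n, v in enumerate(values))
        for k in range(n_total)
    ]

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum (floored to avoid log(0))."""
    log_magnitude = [math.log(max(abs(c), 1e-12)) for c in dft(frame)]
    n_total = len(frame)
    return [c.real / n_total for c in dft(log_magnitude, sign=+1)]

def circular_convolve(a, b):
    n_total = len(a)
    return [sum(a[m] * b[(n - m) % n_total] for m in range(n_total))
            for n in range(n_total)]

N, PERIOD = 256, 32
# Hypothetical excitation: pulses every PERIOD samples, decaying by 0.8 each time.
excitation = [0.8 ** (n // PERIOD) if n % PERIOD == 0 else 0.0 for n in range(N)]
# Hypothetical vocal-tract impulse response: a fast geometric decay.
vocal_tract = [0.5 ** n for n in range(N)]

frame = circular_convolve(excitation, vocal_tract)
cepstrum = real_cepstrum(frame)

# "Liftering": low quefrencies carry the filter, high quefrencies the source.
CUTOFF = 16
peak_quefrency = max(range(CUTOFF, N // 2), key=lambda n: cepstrum[n])
```

In this toy case the high-quefrency peak sits at the pulse period (32 samples), while the first few coefficients are dominated by the filter.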


Cepstral analysis illustration


LPC for speech modeling

• The vocal tract can be modeled as a cascaded set of acoustic tubes, each corresponding to a resonator

• Furthermore, each resonator corresponds to a formant

• Complete vowel spectrum can be reasonably represented by six resonators

• A direct implementation of the spectral model is written as an all-pole filter in the complex z domain (z-transform is the discrete-time counterpart of the Laplace transform - generalized form of the Fourier transform):

$H(z) = \frac{1}{1 - \sum_{j=1}^{P} a_j z^{-j}}$

• P is twice the number of resonators, $a_j$'s are coefficients
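A single resonator corresponds to P = 2. The sketch below evaluates |H(z)| on the unit circle for one resonator and locates its spectral peak. The pole-placement rule used in `resonator_coeffs` (pole radius from bandwidth, pole angle from centre frequency) is a common textbook choice assumed here, and the 500 Hz / 50 Hz / 8 kHz values are made up.

```python
import cmath
import math

def all_pole_magnitude(coeffs, freq_hz, sample_rate_hz):
    """|H(z)| at z = exp(i*2*pi*f/fs) for H(z) = 1 / (1 - sum_j a_j z^-j)."""
    z = cmath.exp(2j * math.pi * freq_hz / sample_rate_hz)
    denominator = 1 - sum(a * z ** (-j) for j, a in enumerate(coeffs, start=1))
    return abs(1 / denominator)

def resonator_coeffs(centre_hz, bandwidth_hz, sample_rate_hz):
    """Two coefficients (a_1, a_2) placing a conjugate pole pair at the formant."""
    radius = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    angle = 2 * math.pi * centre_hz / sample_rate_hz
    return [2 * radius * math.cos(angle), -radius * radius]

sample_rate_hz = 8000
coeffs = resonator_coeffs(centre_hz=500.0, bandwidth_hz=50.0,
                          sample_rate_hz=sample_rate_hz)

# Scan 0 to 4 kHz in 10 Hz steps and find where the response is largest.
grid_hz = [10.0 * k for k in range(401)]
peak_hz = max(grid_hz, key=lambda f: all_pole_magnitude(coeffs, f, sample_rate_hz))
```

Cascading several such sections, one per formant, gives the higher-order all-pole model described above.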


LPC illustration


LPC (cont.)

• In the above system, the discrete-time response y(n) to the excitation x(n) can be written as

$y(n) = x(n) + \sum_{j=1}^{P} a_j y(n-j)$

• In LPC, the coefficients are computed to give an approximation to the original signal. That is, one attempts to predict the speech signal by a linear, weighted sum of its previous values:

$\tilde{y}(n) = \sum_{j=1}^{P} a_j y(n-j)$

• $\tilde{y}(n)$ is the linear predictor of y(n)

• The coefficients that produce the best approximation are called the linear prediction coefficients


LPC (cont.)

• The difference between the predictor and the original signal is called the error signal, residual error, LPC residual, or prediction error

$e(n) = y(n) - \tilde{y}(n)$

• e(n) can be viewed as an approximation to the excitation signal


Residual error illustration


LPC (cont.)

• Computing the coefficients can be viewed as an optimization problem, where squared error is generally used

$D = \sum_{n=0}^{N-1} e^2(n) = \sum_{n=0}^{N-1} \Big[ y(n) - \sum_{j=1}^{P} a_j y(n-j) \Big]^2$

• Various methods can be employed to find coefficients, including gradient descent
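The squared-error criterion can also be minimized in closed form by setting the derivative of D with respect to each coefficient to zero, which gives a small linear system. The sketch below takes that route (an alternative to the gradient descent mentioned above, chosen here because it is deterministic) and checks it on a synthetic signal with known coefficients. The coefficient values 1.2 and -0.72 and the single-impulse excitation are made-up test data.

```python
def synthesize(coeffs, excitation):
    """y(n) = x(n) + sum_j a_j * y(n - j), with y(n) = 0 for n < 0."""
    output = []
    for n, x_n in enumerate(excitation):
        past = sum(a * output[n - j]
                   for j, a in enumerate(coeffs, start=1) if n - j >= 0)
        output.append(x_n + past)
    return output

def predict(coeffs, signal, n):
    """Linear predictor: weighted sum of the previous samples of the signal."""
    return sum(a * signal[n - j]
               for j, a in enumerate(coeffs, start=1) if n - j >= 0)

def lpc_least_squares(signal, order):
    """Minimize D = sum_n e(n)^2 by solving the normal equations."""
    def lagged(n, j):
        return signal[n - j] if n - j >= 0 else 0.0
    n_total = len(signal)
    phi = [[sum(lagged(n, i) * lagged(n, j) for n in range(n_total))
            for j in range(order + 1)] for i in range(order + 1)]
    # Augmented system: sum_j a_j * phi[i][j] = phi[i][0] for i = 1..order.
    rows = [phi[i][1:] + [phi[i][0]] for i in range(1, order + 1)]
    for col in range(order):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, order), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(order):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][order] / rows[i][i] for i in range(order)]

true_coeffs = [1.2, -0.72]        # hypothetical stable two-pole system
impulse = [1.0] + [0.0] * 63      # excitation: a single pulse
y = synthesize(true_coeffs, impulse)

estimated = lpc_least_squares(y, order=2)
residual = [y[n] - predict(estimated, y, n) for n in range(len(y))]
```

Here the estimated coefficients match the ones used for synthesis, and the residual recovers the single excitation pulse.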


LPC (cont.)

• Properties of LPC representation

• For a harmonic signal, the (spectral) model spectrum tends to follow

(hug) harmonic peaks, but not harmonic valleys, hence yielding an

estimate of the envelope of the signal spectrum

• Too many coefficients will yield a good fit to signal spectrum, but

miss spectral envelope. On the other hand, too few coefficients will

miss formants. A reasonable number is between 10 and 20.

• Prediction error is significantly higher for unvoiced speech

• Compared to Fourier and cepstral analysis, LPC is more

directly related to vocal tract characteristics


More LPC illustrations


More LPC illustrations


More LPC illustrations


Spectral analysis via filterbanks


Summary table


Summary of Part I

• A number of important concepts in acoustics, e.g.

intensity and decibels

• Sensation of a signal is based on acoustic characteristics,

but not the same

• Advanced signal representations, e.g. cepstrum, highlight

important features


Computational Audition

Part II: Physiological and Perceptual

Basis of Audition

DeLiang Wang

Perception and Neurodynamics Lab

Ohio State University

http://www.cse.ohio-state.edu/pnl


Outline of Part II

Physiological basis

Psychoacoustic basis

Auditory scene analysis

Real-world audition


Human ear

A complex mechanism for transducing pressure variations in the air to neural impulses in auditory nerve fibers


Traveling wave

• Different frequencies of sound give rise to maximum vibrations at different places along the basilar membrane

• The frequency of vibration at a given place is equal to that of the nearest stimulus component (resonance)

• Hence, the cochlea performs a frequency analysis


Auditory nerve response


Cochlear filtering model

The gammatone function approximates physiologically-recorded impulse responses

n = filter order (typically 4)

b = bandwidth

f0 = centre frequency

φ = phase
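The slide's equation did not survive extraction. The sketch below assumes the commonly used gammatone form g(t) = t^(n-1) exp(-2πbt) cos(2πf0 t + φ) for t ≥ 0, using the parameters listed above; the default values (b = 125 Hz, f0 = 1 kHz) are illustrative assumptions.

```python
import math

def gammatone_envelope(t, order=4, bandwidth_hz=125.0):
    """Gamma-shaped envelope t^(n-1) * exp(-2*pi*b*t); zero before t = 0."""
    if t < 0:
        return 0.0
    return t ** (order - 1) * math.exp(-2 * math.pi * bandwidth_hz * t)

def gammatone(t, order=4, bandwidth_hz=125.0, centre_hz=1000.0, phase=0.0):
    """Assumed standard form: envelope times a tone at the centre frequency."""
    return (gammatone_envelope(t, order, bandwidth_hz)
            * math.cos(2 * math.pi * centre_hz * t + phase))

# Setting the derivative of the envelope to zero gives its peak time:
# t_peak = (n - 1) / (2*pi*b), about 3.8 ms for these example values.
peak_time_s = (4 - 1) / (2 * math.pi * 125.0)

# Sample 25 ms of the impulse response at 16 kHz (assumed rate).
impulse_response = [gammatone(n / 16000.0) for n in range(400)]
```

A larger bandwidth b makes the envelope rise and decay faster, so wide filters respond more quickly in time.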


Gammatone filterbank

• Each position on the basilar membrane is simulated by a single gammatone filter with appropriate centre frequency and bandwidth

• A small number of filters (e.g. 32) are generally sufficient to cover the range 50 Hz to 8 kHz

• Note variation in bandwidth with frequency (unlike Fourier analysis)
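One common way to choose the centre frequencies and bandwidths is to space the filters evenly on an ERB-rate (equivalent rectangular bandwidth) scale. The slides do not say which scale is used; the sketch below assumes the widely cited Glasberg and Moore formulas, and all function names are illustrative.

```python
import math

def hz_to_erb_rate(freq_hz):
    """ERB-rate scale (assumed Glasberg-Moore form)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def erb_bandwidth_hz(freq_hz):
    """Equivalent rectangular bandwidth at a given centre frequency."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)

def centre_frequencies(low_hz, high_hz, n_filters):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    low, high = hz_to_erb_rate(low_hz), hz_to_erb_rate(high_hz)
    step = (high - low) / (n_filters - 1)
    return [erb_rate_to_hz(low + i * step) for i in range(n_filters)]

# 32 filters covering 50 Hz to 8 kHz, as in the example above.
centres = centre_frequencies(50.0, 8000.0, 32)
bandwidths = [erb_bandwidth_hz(f) for f in centres]
```

Under these assumptions the filters are narrow and densely packed at low frequencies and become wider toward 8 kHz, unlike the equal-width bins of Fourier analysis.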


Response to a pure tone

• Many channels respond, but those closest to tone frequency respond most strongly (place coding)

• The interval between successive peaks also encodes the tone frequency (temporal coding)

• Note propagation delay along the membrane model


Beyond the periphery

• The auditory system is complex with four relay stations between periphery and cortex rather than one in the visual system

• In comparison to the auditory periphery, central parts of the auditory system are less understood

• Number of neurons in the primary auditory cortex is comparable to that in the primary visual cortex despite the fact that the number of fibers in the auditory nerve is far fewer than that of the optic nerve (thousands vs. millions)

The auditory nerve


Basics of auditory perception

Demonstrations by the Institute for Perception Research (IPO) and the Acoustical Society of America

• Section I. Frequency Analysis and Critical Bands

• Demo 1. Cancelled harmonics (Track 1; 1:33)

• Demo 2. Critical bands by masking (2-6; 1:50)

• Section II. Sound Pressure, Power, Loudness

• Demo 4. Decibel scale (8-10; 1:57)

• Demo 5. Filtered noise (12-15; 1:50)


Basics of auditory perception (cont.)

• Section IV. Pitch

• Demo 12. Dependence of pitch on intensity (27-28; 0:48)

• Demo 19. Pitch streaming (36; 1:22)

• Demo 26. Scales with repetition pitch (49-51; 1:25)

• Demo 27. Circularity in pitch judgement, or Shepard scale illusion (52; 1:20)


Basics of auditory perception (cont.)

• Section V. Timbre

• Demo 28. Effect of spectrum on timbre (53; 1:17)

• Section VI. Beats, Combination Tones, Distortion, Echoes

• Demo 32. Primary and secondary beats (62; 1:32)

• Demo 35. Effect of echoes (70; 1:47)


Auditory scene analysis (ASA)

• Listeners are capable of parsing an acoustic scene (a

sound mixture) to form a mental representation of each

sound source – stream – in the perceptual process of

auditory scene analysis (Bregman’90)

• From events to streams

• Two conceptual processes of ASA:

• Segmentation. Decompose the acoustic mixture into sensory

elements (segments)

• Grouping. Combine segments into streams, so that segments in the

same stream originate from the same source


Simultaneous organization

Simultaneous organization groups sound components

that overlap in time. ASA cues for simultaneous

organization

• Proximity in frequency (spectral proximity)

• Common periodicity

• Harmonicity

• Fine temporal structure

• Common spatial location

• Common onset (and to a lesser degree, common offset)

• Common temporal modulation

• Amplitude modulation (AM)

• Frequency modulation (FM)


Sequential organization

Sequential organization groups sound components across

time. ASA cues for sequential organization

• Proximity in time and frequency

• Temporal and spectral continuity

• Common spatial location; more generally, spatial continuity

• Smooth pitch contour

• Smooth formant transition?

• Rhythmic structure

• Rhythmic attention theory (Large & Jones’99)


ASA demos (Bregman and Ahad)

• Simultaneous organization

• Demo 19. Spectral fusion based on common frequency change (Track 19; 1:00)

• Sequential organization

• Demo 1. Stream segregation in a cycle of six tones (1; 0:47)

• Demo 7. Streaming in African xylophone music (7; 1:30)

– Notes chosen from pentatonic scale

• Demo 11. Stream segregation of vowels and diphthongs (11; 1:17)

• Demo 14. Stream segregation of high and low bands of noise (14; 0:44)


Primitive versus schema-based organization

The grouping process involves two aspects:

Primitive grouping. Innate data-driven mechanisms,

consistent with those described by Gestalt psychologists for

visual perception (proximity, similarity, common fate, good

continuation, etc.)

It is domain-general, and exploits intrinsic structure of

environmental sound

Grouping cues described earlier are primitive in nature

Schema-driven grouping. Learned knowledge about

speech, music and other environmental sounds

Model-based or top-down

It is domain-specific, e.g. organization of speech sounds into

syllables


Organisation in speech: Broadband spectrogram

[Figure: broadband spectrogram of “… pure pleasure …”, annotated with grouping cues: onset synchrony, offset synchrony, common AM, continuity, harmonicity]


Organisation in speech: Narrowband spectrogram

[Figure: narrowband spectrogram of “… pure pleasure …”, annotated with grouping cues: onset synchrony, offset synchrony, continuity, harmonicity]


Real-world audition

What?

• Source type

• Speech

message

speaker

age, gender, linguistic origin, mood, …

• Music

• Car passing by

Where?

• Left, right, up, down

• How close?

Channel characteristics

Environment characteristics

• Room configuration

• Ambient noise


Sources of intrusion and distortion

• Additive noise from other sound sources

• Reverberation from surface reflections


Cocktail party problem

• Term coined by Cherry

• “One of our most important faculties is our ability to listen to, and

follow, one speaker in the presence of others. This is such a common

experience that we may take it for granted; we may call it ‘the

cocktail party problem’…” (Cherry’57)

• “For ‘cocktail party’-like situations… when all voices are equally

loud, speech remains intelligible for normal-hearing listeners even

when there are as many as six interfering talkers” (Bronkhorst &

Plomp’92)

Ball-room problem by Helmholtz

“Complicated beyond conception” (Helmholtz, 1863)

• Speech segregation problem


Listener’s performance

Speech Reception Threshold (SRT)

• The speech-to-noise ratio needed for 50% intelligibility

• Each 1 dB gain in SRT corresponds to about 10% increase in intelligibility (Miller et al.’51), dependent upon materials

Source: Steeneken (1992)


Effects of types of competing source

[Figure: SRT for different types of competing source, showing an SRT difference of 23 dB. Source: Wang & Brown (2006)]


Demo of effects of sound types on separation

• Overall SNR is set to 0 dB

• Noise-Noise: pink, white, pink+white

• Speech-Speech:

• Noise-Tone:

• Noise-Speech:

• Tone-Speech:


Phonemic restoration

• Demonstrations by Richard Warren and James Bashford

• Demo 1. Homophonic temporal induction: Broadband noise (Track 2-3; 1:27)

• Demo 2. Temporal induction of speech (10-15; 4:37)


Bregman figure

Pattern Completion


Location

Source: Bronkhorst & Plomp (1992)

SRT gain


Binaural versus 3D presentation

Source: Drullman & Bronkhorst (2000)


Summary of Part II

• A whirlwind tour of auditory physiology

• Basic phenomena in auditory perception

• ASA cues essentially reflect structural coherence of a

sound source. A subset of cues believed to be strongly

involved in ASA:

• Simultaneous organization: Periodicity, temporal modulation, onset

• Sequential organization: Location, pitch contour and other source

characteristics (e.g. vocal tract)

• Everyday audition has to contend with additive noise,

reverberation and channel distortions

