The psychoacoustics of reverberation

Steven van de Par

July 19, 2016

Thanks to Julian Grosse and Andreas Häußler

2016 AES International Conference on Sound Field Control

Introduction

The psychoacoustics of reverberation: what is this talk about?

• Reverberation is nearly always present in our daily life

• It creates large distortions of the physical waveform

• Yet it mostly has only a small effect on (speech) perception
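
To make the second point concrete, here is a minimal sketch that convolves a signal with a synthetic room impulse response and checks how similar the waveforms remain. The noise-based impulse response model, the 16 kHz sample rate, the 0.05 scaling of the reverberant part and the modulated-noise stand-in for speech are all assumptions made for illustration. A T60 of 250 ms is used.

```python
import numpy as np

def synthetic_rir(t60_s, fs, length_s=None, seed=0):
    """Direct impulse followed by exponentially decaying Gaussian noise.

    This is a common toy model of a room impulse response, not a measured one.
    """
    if length_s is None:
        length_s = 1.5 * t60_s
    n = int(length_s * fs)
    t = np.arange(n) / fs
    # Amplitude envelope that has fallen by 60 dB after t60_s seconds.
    envelope = 10.0 ** (-3.0 * t / t60_s)
    rng = np.random.default_rng(seed)
    rir = 0.05 * rng.standard_normal(n) * envelope  # assumed reverberant level
    rir[0] = 1.0  # direct sound
    return rir, envelope

fs = 16000
rir, envelope = synthetic_rir(t60_s=0.25, fs=fs)

# Stand-in for speech: noise with a slow (4 Hz) amplitude modulation.
rng = np.random.default_rng(1)
t = np.arange(fs) / fs
clean = rng.standard_normal(t.size) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

reverberated = np.convolve(clean, rir)[: clean.size]

# Waveform similarity: 1.0 would mean identical waveforms.
similarity = np.corrcoef(clean, reverberated)[0, 1]
print(f"waveform correlation, clean vs. reverberated: {similarity:.2f}")
```

The correlation drops well below 1 even for this short T60, although the slow amplitude modulation that carries most of the speech-like information is still clearly present in the reverberated signal.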

[Figure: clean speech and reverberated speech (T60 = 250 ms), amplitude versus time (s); one panel shows 0 to 5 s, a second panel zooms in on 2.95 to 3.2 s]

Outline:

• Principles and mechanisms in perception that help to overcome reverberation

• Some ideas about controlling sound fields in a perceptually motivated manner

The Peripheral Auditory System

Cochlea:

• Mechanical energy (oval window) is converted into a neural signal (auditory nerve)

• Performs a time-frequency analysis

The inner ear

1. cochlear duct

2. scala vestibuli

3. scala tympani

4. spiral ganglion

5. auditory nerve fibres

The Cochlea

• The red arrow is from the oval window

• The blue arrow points to the round window

• The cochlea is about 2 mm in diameter

Inner Ear: the Basilar Membrane

Frequency-to-place transformation:

Each point on BM acts as a band-pass filter

Cochleagram

Simulates basilar-membrane filtering and represents magnitudes in dB.

Brain captures a relatively coarse spectro-temporal representation
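
A cochleagram of this kind can be sketched with a bank of gammatone filters whose centre frequencies are equally spaced on the ERB-rate scale. The version below is a rough illustration under stated assumptions, not the model used for the figures in this talk: the filter order, the bandwidth factor 1.019, the 32 channels from 80 to 5000 Hz and the 20 ms frames are conventional choices picked here, and all function names are hypothetical.

```python
import numpy as np

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_spaced_frequencies(f_low, f_high, n_channels):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    def to_erb_rate(f):
        return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    rates = np.linspace(to_erb_rate(f_low), to_erb_rate(f_high), n_channels)
    return (10.0 ** (rates / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc_hz, fs, duration_s=0.1, order=4, b=1.019):
    """Gammatone impulse response, scaled to unit gain at fc_hz."""
    t = np.arange(int(duration_s * fs)) / fs
    g = (t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc_hz) * t)
         * np.cos(2 * np.pi * fc_hz * t))
    gain = np.abs(np.sum(g * np.exp(-2j * np.pi * fc_hz * t)))
    return g / gain

def cochleagram(x, fs, n_channels=32, f_low=80.0, f_high=5000.0, frame_s=0.02):
    """Return (levels_db[channel, frame], centre_frequencies)."""
    fcs = erb_spaced_frequencies(f_low, f_high, n_channels)
    frame_len = int(frame_s * fs)
    n_frames = len(x) // frame_len
    levels = np.empty((n_channels, n_frames))
    for ch, fc in enumerate(fcs):
        g = gammatone_ir(fc, fs)
        n_fft = len(x) + len(g) - 1
        # Band-pass filtering by FFT-based convolution.
        y = np.fft.irfft(np.fft.rfft(x, n_fft) * np.fft.rfft(g, n_fft), n_fft)
        y = y[: n_frames * frame_len].reshape(n_frames, frame_len)
        # Coarse temporal resolution: one energy value per frame, in dB.
        levels[ch] = 10.0 * np.log10(np.mean(y ** 2, axis=1) + 1e-12)
    return levels, fcs

fs = 16000
t = np.arange(int(0.5 * fs)) / fs
levels, fcs = cochleagram(np.sin(2 * np.pi * 1000 * t), fs)
peak_fc = fcs[np.argmax(levels.mean(axis=1))]
print(f"strongest channel is centred near {peak_fc:.0f} Hz")
```

Averaging energy within frames and within broad auditory filters is what makes the representation coarse: fine waveform detail, including much of what reverberation changes, is not visible in `levels`.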

Auditory signal representation

Cochleagram is a reasonable first-order approximation of perception (loudness, timbre)

Additional perceptual cues (‘texture cues’):

- Timing Information for binaural processing

- ITDs, 20 µs JND (source direction)

- Interaural cross-correlation (source width, listener envelopment)

- Temporal pitch cues

- Modulation cues (e.g. roughness of a sound)

Included in advanced models by e.g. Patterson, Meddis and colleagues and Dau et al. (1996, 1997)
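
The two binaural timing cues listed above can both be read off the normalized interaural cross-correlation function. The sketch below is a simple broadband version under assumptions chosen here: a lag range of ±1 ms, white noise as the test signal, and the hypothetical function name `iacc_and_itd`. Perceptual models typically compute this per auditory band.

```python
import numpy as np

def iacc_and_itd(left, right, fs, max_lag_s=1e-3):
    """Return (IACC, ITD in seconds) from the normalized cross-correlation.

    A positive ITD means the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(max_lag_s * fs))
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            cc[i] = np.sum(left[: len(left) - lag] * right[lag:])
        else:
            cc[i] = np.sum(left[-lag:] * right[: len(right) + lag])
    cc /= norm
    best = np.argmax(np.abs(cc))
    return np.abs(cc[best]), lags[best] / fs

fs = 48000
rng = np.random.default_rng(0)
left = rng.standard_normal(fs // 2)
delay = 24  # samples, i.e. 0.5 ms at 48 kHz
right = np.concatenate([np.zeros(delay), left[:-delay]])

iacc, itd = iacc_and_itd(left, right, fs)
print(f"IACC = {iacc:.3f}, ITD = {itd * 1e6:.0f} microseconds")
```

One sample at 48 kHz lasts about 21 µs, so resolving ITDs near the just-noticeable difference would require interpolating the correlation peak between samples, which this sketch does not do.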

Another function within the auditory system

Source segregation:

• Often multiple sources are present simultaneously

• We can focus on one source

• Cocktail party processing:

Listen to one speaker only

Spatial separation helps

How does the brain do it?

Complex acoustical scenes

Acoustic mixtures are often spectro-temporally sparse: for each time-frequency interval, one source dominates in level.

Grouping of signal components is essential to make sense of the speech signal.

[Figure: cochleagram of a mix of two speakers (energy over time in s and frequency in Hz, 80 to 5000 Hz) and the corresponding binary mask indicating source dominance (time in s, frequency 500 to 5000 Hz); an azimuth scale in degrees runs from -50 to 50]

Auditory grouping / segregation

Bregman 1990: Auditory Scene Analysis

Primitive grouping cues:

• Common onset

• Common pitch

• Common AM/FM modulation

• Common location

All have to do with the physics of sound generation

See also: http://webpages.mcgill.ca/staff/Group2/abregm1/web/

Common frequency modulation is a grouping cue (Bregman, http://webpages.mcgill.ca/staff/Group2/abregm1/web/index.htm)

Auditory grouping / segregation: fusion by common frequency change

Visual grouping / occlusion: apparent continuity

Difficult to see what we are dealing with

Without providing extra parts of the letters we can now see the letter “B”.

We added information about where the letters are cut.

The overlay is a physically plausible cause for not seeing part of the letters.

Is the auditory equivalent a two speaker situation?

[Figure: spectrograms (time versus frequency) of the female speaker, the male speaker and the two speakers together; female mask and male mask (dominating voice only) and their linear sum; 30 ms frames, 1 critical band]

[Audio examples: female speaker, male speaker, 2 speakers, mask]

Role of low-SNR speech glimpses?

Schoenmaker and van de Par (Advances in experimental medicine and biology, 2016)

Remove speech target tiles with an SNR below a criterion value

Speech intelligibility is impaired only when the criterion is raised beyond about 0 dB SNR

Only positive SNR parts of speech contribute to intelligibility
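
The tile-removal manipulation can be sketched as a mask over time-frequency tiles that keeps only tiles whose local SNR exceeds a criterion. With a criterion of 0 dB this is the binary source-dominance mask. The sketch is a simplification under assumptions made here: it uses uniform STFT bins rather than critical bands, half-overlapping 30 ms Hann frames, and hypothetical function names, so it is not the processing used in the cited study.

```python
import numpy as np

def stft_power(x, frame_len, hop):
    """Power of Hann-windowed frames, shape (n_frames, n_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def glimpse_mask(target, masker, fs, criterion_db=0.0, frame_s=0.03):
    """Boolean mask of tiles whose local SNR exceeds criterion_db."""
    frame_len = int(frame_s * fs)
    hop = frame_len // 2
    p_target = stft_power(target, frame_len, hop)
    p_masker = stft_power(masker, frame_len, hop)
    local_snr_db = 10.0 * np.log10((p_target + 1e-12) / (p_masker + 1e-12))
    return local_snr_db > criterion_db, local_snr_db

# Toy signals: the "target" is a 500 Hz tone, the "masker" a 2000 Hz tone.
fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 500 * t)
masker = np.sin(2 * np.pi * 2000 * t)

mask, local_snr_db = glimpse_mask(target, masker, fs)
strict_mask, _ = glimpse_mask(target, masker, fs, criterion_db=20.0)
print(f"tiles kept at 0 dB: {mask.mean():.0%}, at 20 dB: {strict_mask.mean():.0%}")
```

Raising `criterion_db` removes progressively more target tiles, which mirrors the experimental manipulation described above.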

What about reverberation?

[Figure: reverberated speech (T60 = 750 ms): female mask, male mask and their linear sum; 30 ms frames, 1 critical band]

[Audio examples: female speaker, male speaker, 2 speakers, mask]

Reverberation and the Auditory Representation

Reverberation will temporally smear the auditory signal

Multiple delayed reflections will add to the direct sound

Often reverberant field is stronger than direct sound (critical radius)

Speech phonemes will start to overlap (speech rate about 10 syllables/s)

Music is slower (Allegro, 150 bpm, about 3 notes/s)

Segregation will become more difficult
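
A short calculation shows why the event rate matters. Assuming an ideal decay that is linear in dB (60 dB over T60), and borrowing T60 = 750 ms from the two-speaker example, the tail of one event has decayed by only a few dB when the next event starts. The function name is hypothetical.

```python
def decay_between_events_db(event_rate_hz, t60_s):
    """Level drop of a reverberant tail during one inter-event interval.

    Assumes an ideal decay of 60 dB over t60_s seconds.
    """
    return 60.0 / (event_rate_hz * t60_s)

t60_s = 0.75
for label, rate_hz in [("speech, 10 syllables/s", 10.0), ("music, 3 notes/s", 3.0)]:
    drop_db = decay_between_events_db(rate_hz, t60_s)
    print(f"{label}: previous event is {drop_db:.1f} dB down at the next onset")
```

Under these assumptions the previous syllable is only 8 dB down when the next one begins, whereas at 3 notes per second the previous note has decayed by almost 27 dB.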

Remember the primitive grouping cues:

• Common onset (largely preserved)

• Common pitch (pitch unaffected, changes will be smeared)

• Common AM/FM modulation (high rates changed and converted)

• Common location (much reduced reliability)

Measure distribution of binaural cues

– Target at 10°

– Reverberation

Precedence effect:

- The first arriving wave front determines perceived direction

- Allows spatial cues to contribute to segregation in reverberant conditions

Sound localization: Precedence effect (Haas effect)

Intermediate summary

How does perception cope with reverberation:

• Reverberation is not represented well in the brain due to coarse spectro-temporal resolution of the auditory system

• Important perceptual segregation cues are robust against reverb

Common onset

Pitch

Common low-rate AM/FM

Spatial cues (due to precedence effect)

How to use this knowledge for sound field control

The auditory principles that cope with reverberation are implemented in the ‘transformed’ auditory domain.

It is not possible to apply these processing principles directly on acoustical signals.

Two examples will be given that use perceptual processing knowledge for sound field control

Authentic audio reproduction

[Figure: recording room and playback room]

Approach to authentic reproduction:

- Optimizing spatial parameters on a coarse spectro-temporal scale is enough:

  - Direct sound for directional information

  - Reverberant sound for apparent source width (ASW) and listener envelopment (LEV), via the IACC

- ‘Texture’ cues are represented in microphone signals

- Consider the (reverberant) acoustics at the reproduction side

Grosse and van de Par (IEEE Journal of Selected Topics in Signal Processing, 2015)

Room-in-room reproduction

Only direct sound can be reproduced optimally

No control over reverberant sound field

Perceptual approach

Perceptual Optimization

Optimization targeting perceptually relevant statistical properties of reverberant sound field

The acoustics of the playback room is an integral part of the optimization

Optimization

Optimize perceptually relevant statistical parameters:

• Auditory Transfer Function

– Direct sound (front loudspeakers)

– Reverberant sound (dipole loudspeakers)

• Interaural Cross Correlation (frequency dependent)

– Cross-talk dipole loudspeakers

• T60

– Direct-to-reverberant ratio
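
Two of the parameters above, T60 and the direct-to-reverberant ratio, can be estimated from a room impulse response. The sketch below uses Schroeder backward integration for the energy decay curve and a line fit between -5 and -25 dB. The fit range, the 2.5 ms direct-sound window, the synthetic impulse response and all function names are assumptions made for illustration, not details from the cited paper.

```python
import numpy as np

def energy_decay_curve_db(rir):
    """Schroeder backward-integrated energy decay curve, normalized to 0 dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def estimate_t60(rir, fs, start_db=-5.0, end_db=-25.0):
    """Extrapolate T60 from a straight-line fit to part of the decay curve."""
    edc = energy_decay_curve_db(rir)
    i0 = np.argmax(edc <= start_db)  # first sample at or below start_db
    i1 = np.argmax(edc <= end_db)    # first sample at or below end_db
    t = np.arange(i0, i1) / fs
    slope_db_per_s, _ = np.polyfit(t, edc[i0:i1], 1)
    return -60.0 / slope_db_per_s

def direct_to_reverberant_ratio_db(rir, fs, direct_window_s=0.0025):
    """Energy up to shortly after the main peak versus all later energy."""
    split = np.argmax(np.abs(rir)) + int(direct_window_s * fs)
    return 10.0 * np.log10(np.sum(rir[:split] ** 2) / np.sum(rir[split:] ** 2))

# Synthetic impulse response: direct impulse plus decaying noise (toy model).
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
rir = 0.05 * rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / 0.7)
rir[0] = 1.0

print(f"estimated T60: {estimate_t60(rir, fs) * 1000:.0f} ms")
print(f"DRR: {direct_to_reverberant_ratio_db(rir, fs):.1f} dB")
```

The same decay curve is what the `edc` traces in the objective comparison plot, so a reproduction method can be judged by how closely its curve follows that of the recording room.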

Evaluation

All reproduction methods were simulated over headphones, using room impulse responses convolved with dry instrument recordings

Compare objective parameters

Recording: Seminar room (699 ms) & Church (3040 ms)

Playback (PBR): Small Lab (371 ms) & Seminar Room (697 ms)

Objective parameters

RinR = Conventional reproduction without optimization

RinR,Opt = Our proposed optimization

mCH = Multi-channel reproduction with surround speakers

Ref = Recording room

- Coloration can be reduced compared to RinR

- Spatial properties (IACC) better conserved

[Figure: three panels comparing RinR, RinR,Opt and mCH: E (dB) versus frequency (Hz, 100 to 10000); IACC versus frequency (Hz), including the reference; energy decay curves L (dB) versus t (ms, 0 to 600), including the reference]

Listening test

All reproduction methods were simulated over headphones, using room impulse responses convolved with dry instrument recordings

MUSHRA test

Ref = Recording room

Recording: Seminar room (699 ms) & Church (3040 ms)

Playback (PBR): Small Lab (371 ms) & Seminar Room (697 ms)

Results and Conclusions

Simple loudspeaker set-up allows:

• Perceptually authentic reproduction

• Individualization by considering playback acoustics

Grosse and van de Par (2015) IEEE Journal of Selected Topics in Signal Processing

[Figure: listening test results for the seminar room and the church]

Perceptual dereverberation

Scenario:

- Speech reproduction in a reverberant room

- Preprocessing of the speech signal to enhance speech intelligibility

[Diagram: speech signal, preprocessing, reverberant room]

Main Idea:

- Conserve spectro-temporal pattern

- Use time-variant filtering (Hodoshima et al., 2006)
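
The slides do not spell out the preprocessing algorithm, so the following is only a toy illustration of the general idea of time-variant gains that take the room's decaying energy into account. It works on the frame energies of a single band, predicts how much energy from earlier frames is still ringing in the room, and lowers the gain of the current frame so that emitted energy plus predicted tail stays close to the clean target. The energy model, the gain floor `min_gain` and the function name are hypothetical, and this is neither the method of Hodoshima et al. nor the algorithm evaluated in this talk.

```python
import numpy as np

def precompensation_gains(target_energy, t60_s, frame_s, min_gain=0.1):
    """Per-frame amplitude gains for one band (toy single-band model)."""
    # Energy remaining after one frame, for a decay of 60 dB over t60_s.
    decay = 10.0 ** (-6.0 * frame_s / t60_s)
    tail = 0.0  # predicted reverberant energy from past frames
    gains = np.ones(len(target_energy))
    for m, target in enumerate(target_energy):
        if target > 0.0:
            wanted = max(target - tail, 0.0)  # energy still missing
            gains[m] = max(np.sqrt(wanted / target), min_gain)
        emitted = gains[m] ** 2 * target
        tail = decay * (tail + emitted)
    return gains

# A steady sound followed by a pause and a weaker sound, in 30 ms frames.
target_energy = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2, 0.2])
gains = precompensation_gains(target_energy, t60_s=0.75, frame_s=0.03)
print(np.round(gains, 2))
```

In this toy model the steady portion of a sound is attenuated after its onset, which leaves less reverberant energy to overlap with what follows.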

[Figure: clean sine and reverberated sine, amplitude versus time (s), 0 to 0.05 s]

Perceptual dereverberation

Approach:

Preprocessing of loudspeaker inputs

Adapt current frame based on past

Optimize algorithm parameters with perceptual model (Jørgensen et al., 2013)


Listening test:

- Reverberated (pre-processed) speech with reverberated noise

- Measure Speech Reception Threshold


Listening test:

- Robustness for position

- Measure Speech Reception Threshold

Summary

The auditory system:

• Uses low-resolution spectro-temporal representation

• Extracts some special ‘texture’ cues

• Uses robust cues for segregation/grouping

Two examples for sound field control were shown

• Authentic audio reproduction in a reverberant playback room

• Perceptual dereverberation

Thank you for your attention

Questions …
