The psychoacoustics of reverberation
Steven van de Par [email protected]
July 19, 2016
Thanks to
Julian Grosse and
Andreas Häußler
2016 AES International Conference on Sound Field Control
Introduction
The psychoacoustics of reverberation, what is this talk about?
• Reverberation is nearly always present in our daily life
• It creates large distortions of the physical waveform
• Yet it mostly has only a small effect on (speech) perception
[Figure: waveforms of clean speech and reverberated speech, T60 = 250 ms; amplitude vs. time (s), full 5 s excerpt and a zoom on 2.95 to 3.2 s]
Outline:
• Principles and mechanisms in perception that help to overcome reverberation
• Some ideas about controlling sound fields in a perceptually motivated manner
The Peripheral Auditory System
Cochlea:
• Mechanical energy (oval window) is converted into a neural signal (auditory nerve)
• Performs a time-frequency analysis
The inner ear
1. cochlear duct
2. scala vestibuli
3. scala tympani
4. spiral ganglion
5. auditory nerve fibres
The Cochlea
• The red arrow is from the oval window
• The blue arrow points to the round window
• The cochlea is about 2 mm in diameter
Inner Ear: the Basilar Membrane
Frequency-to-place transformation:
Each point on BM acts as a band-pass filter
Cochleagram
Simulates basilar-membrane filtering and represents magnitudes in dB.
The brain captures a relatively coarse spectro-temporal representation
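A minimal sketch of such a coarse representation, assuming NumPy is available. It is not the cochleagram model used for the slides: STFT power is simply pooled into rectangular bands spaced on the ERB-rate scale (Glasberg and Moore), and the function names, the 20 bands and the 30 ms frames are illustrative choices.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    """Frequency in Hz to ERB-rate (Glasberg & Moore formula)."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def coarse_cochleagram(x, fs, n_bands=20, f_lo=80.0, f_hi=5000.0, frame_s=0.030):
    """Band energies in dB, shape (n_bands, n_frames), plus the band edges in Hz.

    Assumption: rectangular ERB-spaced bands on an STFT stand in for
    basilar-membrane (e.g. gammatone) filtering.
    """
    n = int(round(frame_s * fs))
    hop = n // 2
    window = np.hanning(n)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop:i * hop + n] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, bins)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    edges = erb_rate_to_hz(
        np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n_bands + 1))
    bands = np.zeros((n_bands, n_frames))
    for b in range(n_bands):
        sel = (freqs >= edges[b]) & (freqs < edges[b + 1])
        bands[b] = power[:, sel].sum(axis=1)              # pool bins into one band
    return 10.0 * np.log10(bands + 1e-12), edges

# Usage: a 1 kHz tone should light up the band that contains 1 kHz.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
cg, edges = coarse_cochleagram(tone, fs)
peak_band = int(np.argmax(cg.mean(axis=1)))
print(f"strongest band: {edges[peak_band]:.0f} to {edges[peak_band + 1]:.0f} Hz")
```

With 30 ms frames and about 20 bands, fine temporal and spectral detail is averaged away, which is the sense in which the representation is coarse.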
Auditory signal representation
The cochleagram is a reasonable first-order approximation of perception (loudness, timbre)
Additional perceptual cues (‘texture cues’):
- Timing Information for binaural processing
- ITDs, 20 µs JND (source direction)
- Interaural cross-correlation (source width, listener envelopment)
- Temporal pitch cues
- Modulation cues (e.g. roughness of a sound)
Included in advanced models by e.g. Patterson, Meddis and colleagues and Dau et al. (1996, 1997)
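The two binaural cues listed above can be illustrated with a short sketch, assuming NumPy. It uses the common broadband definition of IACC (maximum of the normalised interaural cross-correlation for lags within ±1 ms) and takes the lag of that maximum as the ITD estimate; the function name and the test signals are illustrative, not taken from the talk.

```python
import numpy as np

def iacc_and_itd(left, right, fs, max_lag_s=1e-3):
    """Return (IACC, ITD in seconds) for two equal-length ear signals.

    Positive ITD means the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(max_lag_s * fs))
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            cc[i] = np.sum(left[:len(left) - lag] * right[lag:])
        else:
            cc[i] = np.sum(left[-lag:] * right[:len(right) + lag])
    cc /= norm
    best = int(np.argmax(cc))
    return cc[best], lags[best] / fs

fs = 48000
rng = np.random.default_rng(0)
noise = rng.standard_normal(4800)

# Point-like source: the right ear receives the same noise 10 samples later.
delay = 10
left = noise
right = np.concatenate([np.zeros(delay), noise[:-delay]])
iacc_point, itd_point = iacc_and_itd(left, right, fs)

# Diffuse-like case: independent noise at the two ears.
iacc_diffuse, _ = iacc_and_itd(noise, rng.standard_normal(4800), fs)

print(f"delayed copy:      IACC = {iacc_point:.2f}, ITD = {itd_point * 1e6:.0f} us")
print(f"independent noise: IACC = {iacc_diffuse:.2f}")
```

At 48 kHz one sample lasts about 20.8 µs, so resolving ITDs at the 20 µs JND would need interpolation of the correlation peak, which this sketch omits.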
Another function within the auditory system
Source segregation:
• Often multiple sources are present simultaneously
• We can focus on one source
• Cocktail party processing:
Listen to one speaker only
Spatial separation helps
How does the brain do it?
Complex acoustical scenes
Acoustic mixtures are often spectro-temporally sparse: for each time-frequency interval one source dominates in level.
Grouping of signal components is essential to make sense of the speech signal.
[Figure: two panels, frequency (Hz) vs. time (s); colour scales: azimuth (deg) and energy]
Cochleagram of a mix of two speakers; binary mask indicating source dominance
Auditory grouping / segregation Bregman 1990: Auditory Scene Analysis
Primitive grouping cues:
• Common onset
• Common pitch
• Common AM/FM modulation
• Common location
All have to do with the physics of sound generation
See also: http://webpages.mcgill.ca/staff/Group2/abregm1/web/
Common frequency modulation is a grouping cue (Bregman, http://webpages.mcgill.ca/staff/Group2/abregm1/web/index.htm)
Auditory grouping / segregation Fusion by common frequency change
Visual grouping / occlusion Apparent continuity
Difficult to see what we are dealing with
Without providing extra parts of the letters, we can now see the letter “B”.
We added information about where the letters are cut.
The overlay is a physically plausible cause for not seeing part of the letters.
Is the auditory equivalent a two speaker situation?
[Figure: spectrograms (frequency vs. time) of the female speaker, the male speaker and the two speakers together. Labels: female speaker, male speaker, 2 speakers, dominating voice only; female mask, male mask, linear sum; 30 ms frames, 1 critical band]
Role of low-SNR speech glimpses?
Schoenmaker and van de Par (Advances in experimental medicine and biology, 2016)
Remove speech target tiles with an SNR below a criterion value
Speech intelligibility is impaired only once the criterion exceeds about 0 dB SNR
Only positive SNR parts of speech contribute to intelligibility
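The tile-removal manipulation can be sketched as follows, assuming NumPy and assuming that target and masker energies are already available per time-frequency tile (for example bands × 30 ms frames). The function names and the random stand-in data are hypothetical; this is not the processing code of the cited study.

```python
import numpy as np

def tile_snr_db(target_energy, masker_energy, eps=1e-12):
    """Local SNR in dB for every time-frequency tile."""
    return 10.0 * np.log10((target_energy + eps) / (masker_energy + eps))

def remove_low_snr_tiles(target_energy, masker_energy, criterion_db):
    """Zero all target tiles whose local SNR is below criterion_db."""
    keep = tile_snr_db(target_energy, masker_energy) >= criterion_db
    return target_energy * keep, keep

# Stand-in data: random tile energies for a target and a masker talker.
rng = np.random.default_rng(0)
target = rng.exponential(size=(20, 100))
masker = rng.exponential(size=(20, 100))

for criterion_db in (-20.0, 0.0, 10.0):
    kept, keep = remove_low_snr_tiles(target, masker, criterion_db)
    print(f"criterion {criterion_db:+.0f} dB: {keep.mean():.0%} of tiles kept, "
          f"{kept.sum() / target.sum():.0%} of target energy kept")

# At a 0 dB criterion the kept tiles are exactly the binary dominance mask.
dominance_mask = tile_snr_db(target, masker) >= 0.0
```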
What about reverberation?
[Figure: reverberated speech, T60 = 750 ms. Labels: female speaker, male speaker, 2 speakers, mask; female mask, male mask, linear sum; 30 ms frames, 1 critical band]
Reverberation and the Auditory Representation
Reverberation will temporally smear the auditory signal
Multiple delayed reflections will add to the direct sound
Often the reverberant field is stronger than the direct sound (beyond the critical radius)
Speech phonemes will start to overlap (speech rate about 10 syllables/s)
Music is slower (Allegro, 150 bpm, about 3 notes/s)
Segregation will become more difficult
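The temporal smearing can be illustrated with a toy simulation, assuming NumPy. The room is replaced by exponentially decaying Gaussian noise with a chosen T60 (a purely diffuse tail, no separate direct sound), and the "speech" is noise gated on and off at 10 Hz. Both are assumptions made for illustration, not measurements from the talk.

```python
import numpy as np

def synthetic_rir(t60_s, fs, rng):
    """Gaussian noise with an amplitude envelope that reaches -60 dB at t = T60."""
    n = int(round(t60_s * fs))
    t = np.arange(n) / fs
    h = rng.standard_normal(n) * 10.0 ** (-3.0 * t / t60_s)
    return h / np.sqrt(np.sum(h ** 2))

def fft_convolve(x, h):
    """Linear convolution via the FFT."""
    n = len(x) + len(h) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

def modulation_depth(x, period, skip):
    """(max - min) / (max + min) of the energy envelope, averaged over periods.

    The envelope uses 10 frames per modulation period; the first `skip`
    samples are discarded so that the reverberant field has built up.
    """
    frame = period // 10
    x = x[skip:]
    n_periods = len(x) // period
    energy = (x[:n_periods * period] ** 2).reshape(n_periods, 10, frame).sum(axis=2)
    env = energy.mean(axis=0)
    return (env.max() - env.min()) / (env.max() + env.min())

fs = 16000
period = fs // 10                        # 10 Hz gating, roughly a syllable rate
rng = np.random.default_rng(1)
n = 4 * fs
gate = ((np.arange(n) // (period // 2)) % 2 == 0).astype(float)
clean = rng.standard_normal(n) * gate    # 50 ms noise bursts, 50 ms gaps

depth_clean = modulation_depth(clean, period, skip=fs)
depth_250 = modulation_depth(
    fft_convolve(clean, synthetic_rir(0.25, fs, rng))[:n], period, skip=fs)
depth_750 = modulation_depth(
    fft_convolve(clean, synthetic_rir(0.75, fs, rng))[:n], period, skip=fs)

print(f"modulation depth: clean {depth_clean:.2f}, "
      f"T60 = 250 ms {depth_250:.2f}, T60 = 750 ms {depth_750:.2f}")
```

The longer the reverberation time, the more the gaps between bursts are filled in, which is the overlap between successive speech sounds described above.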
Remember the primitive grouping cues:
• Common onset (largely preserved)
• Common pitch (pitch unaffected, changes will be smeared)
• Common AM/FM modulation (high rates changed and converted)
• Common location (much reduced reliability)
Measure distribution of binaural cues
– Target at 10°
– Reverberation
Precedence effect:
- The first arriving wave front determines perceived direction
- Allows spatial cues to contribute to segregation in reverberant conditions
Sound localization: Precedence effect (Haas effect)
Intermediate summary
How does perception cope with reverberation:
• Reverberation is not represented well in the brain due to coarse spectro-temporal resolution of the auditory system
• Important perceptual segregation cues are robust against reverb
– Common onset
– Pitch
– Common low-rate AM/FM
– Spatial cues (due to precedence effect)
How to use this knowledge for sound field control
The auditory principles that cope with reverberation are implemented in the ‘transformed’ auditory domain.
It is not possible to apply these processing principles directly to acoustical signals.
Two examples will be given that use perceptual processing knowledge for sound field control
Authentic Audio reproduction
[Figure: recording room and playback room]
Approach to authentic reproduction:
- Optimizing spatial parameters on a coarse spectro-temporal scale is enough:
  - Direct sound for directional information
  - Reverberant sound for ASW and LEV (IACC)
- ‘Texture’ cues are represented in microphone signals
- Consider the (reverberant) acoustics at the reproduction side
Grosse and van de Par (2015) IEEE Journal of Selected Topics in Signal Processing
Room-in-room reproduction
- Only direct sound can be reproduced optimally
- No control over reverberant sound field
Perceptual approach
Perceptual Optimization
Optimization targeting perceptually relevant statistical properties of reverberant sound field
The acoustics of the playback room is an integral part of the optimization
Optimization
Optimize perceptually relevant statistical parameters:
• Auditory Transfer Function
– Direct sound (front loudspeakers)
– Reverberant sound (dipole loudspeakers)
• Interaural Cross Correlation (frequency dependent)
– Cross-talk dipole loudspeakers
• T60
– Direct-to-reverberant ratio
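Two of the listed parameters, T60 and the direct-to-reverberant ratio, can be read from a room impulse response. The sketch below assumes NumPy and uses standard textbook estimators (Schroeder backward integration with a line fit between -5 and -25 dB, and a 2.5 ms direct-sound window). The synthetic impulse response, the fit range and the window length are illustrative assumptions, not the settings of the optimization in the talk.

```python
import numpy as np

def energy_decay_curve_db(h):
    """Schroeder backward integration, normalised to 0 dB at t = 0."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0] + 1e-30)

def estimate_t60(h, fs, fit_range_db=(-5.0, -25.0)):
    """Fit a line to the decay curve in fit_range_db and extrapolate to -60 dB."""
    edc = energy_decay_curve_db(h)
    t = np.arange(len(h)) / fs
    sel = (edc <= fit_range_db[0]) & (edc >= fit_range_db[1])
    slope, _ = np.polyfit(t[sel], edc[sel], 1)     # dB per second, negative
    return -60.0 / slope

def direct_to_reverberant_db(h, fs, direct_window_s=0.0025):
    """Energy up to the end of a short window after the main peak vs. the rest."""
    split = int(np.argmax(np.abs(h))) + int(direct_window_s * fs)
    return 10.0 * np.log10(np.sum(h[:split] ** 2) / np.sum(h[split:] ** 2))

# Synthetic test response: unit direct impulse plus a diffuse tail with
# T60 = 0.7 s and unit total energy.
fs = 16000
t60_true = 0.7
rng = np.random.default_rng(2)
n = int(1.5 * t60_true * fs)
tail = rng.standard_normal(n) * 10.0 ** (-3.0 * np.arange(n) / fs / t60_true)
tail /= np.sqrt(np.sum(tail ** 2))
h = tail.copy()
h[0] += 1.0

t60_est = estimate_t60(h, fs)
drr_db = direct_to_reverberant_db(h, fs)
print(f"estimated T60: {t60_est * 1000:.0f} ms, DRR: {drr_db:.1f} dB")
```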
Perceptual approach
Evaluation
All reproduction methods were simulated over headphones: room impulse responses convolved with dry instrument recordings
Compare objective parameters
Recording rooms: Seminar room (699 ms) & Church (3040 ms)
Playback rooms (PBR): Small Lab (371 ms) & Seminar Room (697 ms)
Objective parameters
RinR = Conventional reproduction without optimization
RinR,Opt = Our proposed optimization
mCH = Multi-channel reproduction with surround speakers
Ref = Recording room
- Coloration can be reduced compared to RinR
- Spatial properties (IACC) better conserved
[Figure: three panels. E (dB) vs. frequency (Hz) for RinR, RinR,Opt and mCH; IACC vs. frequency (Hz) for Ref, RinR, RinR,Opt and mCH; energy decay curves L (dB) vs. t (ms) for Ref, RinR, RinR,Opt and mCH]
Listening test
All reproduction methods were simulated over headphones: room impulse responses convolved with dry instrument recordings
MUSHRA test
Ref = Recording room
Recording rooms: Seminar room (699 ms) & Church (3040 ms)
Playback rooms (PBR): Small Lab (371 ms) & Seminar Room (697 ms)
Results and Conclusions
Simple loudspeaker set-up allows:
• Perceptually authentic reproduction
• Individualization by considering playback acoustics
Grosse and van de Par (2015) IEEE Journal of Selected Topics in Signal Processing
[Figure panels: Seminar room, Church]
Perceptual dereverberation
Scenario:
- Speech reproduction in a reverberant room
- Preprocessing of the speech signal to enhance speech intelligibility
[Diagram: Speech signal → Preprocessing → Reverberant room]
Perceptual dereverberation
Main Idea:
- Conserve spectro-temporal pattern
- Use time-variant filtering (Hodoshima et al., 2006)
[Figure: clean sine and reverberated sine; amplitude vs. time (s), 0 to 0.05 s]
Perceptual dereverberation
Approach:
- Preprocessing of loudspeaker inputs
- Adapt current frame based on past
- Optimize algorithm parameters with perceptual model (Jørgensen et al., 2013)
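One possible reading of "adapt current frame based on past" is sketched below, assuming NumPy. It is a toy energy model for a single frequency band and not the algorithm of the talk: the room is assumed to carry a fixed fraction of the received energy from one frame into the next, and each emitted frame is reduced by the reverberant tail that the past frames will leave behind. The function names, the frame length and the energy-additive room model are all assumptions.

```python
import numpy as np

def received_energy(emitted, decay):
    """Toy room: each received frame is the emitted energy plus the decayed past."""
    out = np.zeros(len(emitted))
    prev = 0.0
    for n, e in enumerate(emitted):
        prev = e + decay * prev
        out[n] = prev
    return out

def precompensate_frames(pattern, decay):
    """Choose emitted frame energies so that the received pattern tracks `pattern`.

    A frame cannot be pushed below the tail already in the room, so the
    emitted energy is clipped at zero.
    """
    pattern = np.asarray(pattern, dtype=float)
    emitted = np.zeros(len(pattern))
    received_prev = 0.0
    for n, wanted in enumerate(pattern):
        tail = decay * received_prev
        emitted[n] = max(wanted - tail, 0.0)
        received_prev = emitted[n] + tail
    return emitted

# Per-frame energy decay for 30 ms frames in a room with T60 = 750 ms.
frame_s, t60_s = 0.030, 0.750
decay = 10.0 ** (-6.0 * frame_s / t60_s)

pattern = np.array([1.0, 1.0, 0.05, 0.05, 1.0, 1.0, 0.05, 0.05])
plain = received_energy(pattern, decay)
processed = received_energy(precompensate_frames(pattern, decay), decay)

print("wanted:    ", np.round(pattern, 2))
print("plain:     ", np.round(plain, 2))
print("processed: ", np.round(processed, 2))
```

In this toy model the processed signal reproduces the wanted pattern wherever the pattern falls more slowly than the room decays, while steep drops remain limited by the room. Parameter choices such as the frame length would, as stated above, be tuned with a perceptual model rather than fixed by hand.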
Perceptual dereverberation
Listening test:
- Reverberated (pre-processed) speech with reverberated noise
- Measure Speech Reception Threshold
Perceptual dereverberation
Listening test:
- Robustness for position
- Measure Speech Reception Threshold
Summary
The auditory system:
• Uses low-resolution spectro-temporal representation
• Extracts some special ‘texture’ cues
• Uses robust cues for segregation/grouping
Two examples for sound field control were shown
• Authentic audio reproduction in a reverberant playback room
• Perceptual dereverberation
Thank you for your attention
Questions …