
lec4_SpeechEnhancement


Transcript

    1/26

    Speech Enhancement

    Noise Reduction

    Pham Van Tuan

    Electronic & Telecommunication Engineering

    Danang University of Technology


    2/26

    Introduction

    Aims:

    Improvement in the intelligibility of speech for human listeners.

    Improvement in the quality of speech that makes it more acceptable to human listeners.

    Modifications to the speech that lead to improved performance of automatic speech or speaker recognition systems.

    Modifications to the speech so that it may be encoded more effectively for storage or transmission.

    Noise Types:

    Additive acoustic noise

    Acoustic reverberation

    Convolutive channel effects

    Electrical interference

    Codec distortion


    3/26

    General Scheme

    The signal is first transformed into another domain to obtain a better representation of the speech signal.

    The noise level is estimated by a noise estimation stage.

    The noise component is removed from the noisy speech signal by a gain function; based on different linear and non-linear estimators, a gain function is designed (a skeleton of this processing chain is sketched below).
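    To make the scheme concrete, the following is a minimal NumPy sketch of the surrounding machinery only (framing, windowing, FFT, applying a spectral gain, inverse FFT, overlap-add). The function and parameter names are illustrative and not taken from the slides, and the gain function itself is left as a placeholder for the rules introduced on the following slides.

```python
import numpy as np

def enhance(y, frame_len=320, hop=160, gain_fn=None):
    """Generic analysis-modify-synthesis loop: window, FFT, apply a spectral
    gain, inverse FFT, overlap-add. `gain_fn` maps a noisy magnitude spectrum
    to a per-bin gain; None means "pass through" (no noise reduction)."""
    win = np.hanning(frame_len)
    out = np.zeros(len(y))
    norm = np.zeros(len(y))                       # window normalization for overlap-add
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = y[start:start + frame_len] * win
        Y = np.fft.rfft(frame)                    # short-time spectrum Y_i(w)
        G = 1.0 if gain_fn is None else gain_fn(np.abs(Y))
        S_hat = G * Y                             # apply gain, keep the noisy phase
        out[start:start + frame_len] += np.fft.irfft(S_hat, n=frame_len) * win
        norm[start:start + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-12)
```

    Keeping the noisy phase and modifying only the spectral magnitude is the design choice shared by all of the gain rules discussed on the following slides.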


    4/26

    Noise Estimation

    Based on different linear and non-linear estimators, a gain function will be designed.

    Noise estimation is the most difficult part of noise reduction algorithms, especially for non-stationary and non-white noise (whose characteristics change over time and across frequency bands).

    - Exploiting the periodicity of voiced speech.

    - Auditory model based systems.

    - Optimal linear estimators.

    - Statistical model based systems using optimal non-linear estimators.

    Due to the spectral overlap between speech and noise signals, the denoised speech obtained from single-channel methods exhibits more speech distortion. However, single-channel methods have low cost and small size.


    5/26

    Additive Noise Model

    Microphone signal:

    $y[k] = s[k] + n[k]$

    i.e. the desired-signal contribution $s[k]$ plus the noise contribution $n[k]$; the estimate of the desired signal is denoted $\hat{s}[k]$.

    Goal: estimate $s[k]$ based on $y[k]$.

    Applications: speech enhancement in conferencing, hands-free telephony, hearing aids, digital audio restoration, speech recognition, speech-based technology.

    Will consider speech applications: $s[k]$ = speech signal.

    The noise $n[k]$ can be stationary, non-stationary, narrowband, or broadband. Interfering speakers are also considered.


    6/26

    Additive Noise Model

    Strictly speaking, the estimation of statistical quantities via time averaging is only admissible when the signal is stationary and ergodic.

    The signal is chopped into frames (e.g. 10-20 ms); for each frame $i$ a frequency-domain representation is

    $Y_i(\omega) = S_i(\omega) + N_i(\omega)$

    where the spectral components are the short-time spectra of the time-domain signal frames $y_i[k]$, $s_i[k]$, $n_i[k]$ (obtained by a windowing technique using window $w$).


    7/26

    Observation

    Magnitude-squared DFT coefficients of a noisy, voiced speech sound, and the estimated noise power spectral density (PSD) of the noisy speech.


    8/26

    Observation

    The speech signal is an on/off (time-varying) signal, hence some frames contain speech + noise and some frames contain noise only.

    A speech detection algorithm, or Voice Activity Detection (VAD), is needed to distinguish between these two types of frames (based on statistical features).

    How to design a VAD? (A minimal energy-based sketch follows this list.)

    Normal VAD [McAulay, Malpass 1980]

    Soft-decision estimators [Sohn, Sung 1998]

    Minimum statistics [Martin 1994, 2001]

    Percentile filter [Pham, Kubin 2005]

    ...
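    None of the cited detectors is specified on this slide, so the sketch below is only a crude energy-threshold VAD (not the McAulay-Malpass, Sohn-Sung, minimum-statistics or percentile method): a frame is declared speech when its energy rises sufficiently above a running noise-floor estimate. The function name, threshold and smoothing constant are illustrative assumptions.

```python
import numpy as np

def simple_vad(frames, threshold_db=6.0, alpha=0.95):
    """Crude energy-based VAD. `frames` is (num_frames x frame_len);
    returns a boolean array, True where a frame is judged to contain speech."""
    energies_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    noise_floor = energies_db[0]              # assume the first frame is noise only
    speech = np.zeros(len(energies_db), dtype=bool)
    for i, e in enumerate(energies_db):
        speech[i] = e > noise_floor + threshold_db
        if not speech[i]:                     # track the floor during noise-only frames
            noise_floor = alpha * noise_floor + (1.0 - alpha) * e
    return speech
```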


    9/26

    Estimation

    Definition: $\hat{N}(\omega)$ = average amplitude of the noise spectrum,

    $\hat{N}(\omega) = E\{|N_i(\omega)|\}$

    Assumption: noise characteristics change slowly, hence estimate $\hat{N}(\omega)$ by (long-time) averaging over $M$ noise-only frames:

    $\hat{N}(\omega) = \frac{1}{M} \sum_{\text{noise-only frames } i} |Y_i(\omega)|$

    Estimate the clean speech spectrum $\hat{S}_i(\omega)$ by applying a gain function $G_i(\omega)$ to the corrupted speech spectrum $Y_i(\omega)$, where the gain is computed from $Y_i(\omega)$ and the estimate $\hat{N}(\omega)$:

    $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega), \qquad G_i(\omega) = f\!\left(Y_i(\omega), \hat{N}(\omega)\right)$
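    A minimal sketch of this averaging and of applying a gain, assuming the noise-only frames have already been marked (e.g. by a VAD as on the previous slide); array and function names are illustrative.

```python
import numpy as np

def estimate_noise_spectrum(Y_mag, noise_only):
    """Average the magnitude spectra |Y_i(w)| over the noise-only frames.
    Y_mag: (num_frames x num_bins) magnitudes; noise_only: boolean mask per frame."""
    return np.mean(Y_mag[noise_only], axis=0)          # N_hat(w), one value per bin

def apply_gain(Y, N_hat, gain_fn):
    """S_hat_i(w) = G_i(w) * Y_i(w), with G_i computed from |Y_i(w)| and N_hat(w)."""
    return gain_fn(np.abs(Y), N_hat) * Y
```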


    10/26

    Magnitude Spectral Subtraction

    Signal model:

    $Y_i(\omega) = S_i(\omega) + N_i(\omega), \qquad Y_i(\omega) = |Y_i(\omega)|\, e^{j\phi_{y,i}(\omega)}$

    Estimation of the clean speech spectrum (spectral subtraction):

    $\hat{S}_i(\omega) = \underbrace{\left(1 - \frac{\hat{N}(\omega)}{|Y_i(\omega)|}\right)}_{G_i(\omega)} Y_i(\omega)$

    PS: half-wave rectification:

    $G_i(\omega) \leftarrow \max\!\left(0,\, G_i(\omega)\right), \qquad |\hat{S}_i(\omega)| = \max\!\left(0,\, |Y_i(\omega)| - \hat{N}(\omega)\right)$
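    A direct NumPy transcription of this rule; the half-wave rectification is the `np.maximum(0, ...)`. Variable names are illustrative.

```python
import numpy as np

def magnitude_subtraction_gain(Y_mag, N_hat, eps=1e-12):
    """G_i(w) = max(0, 1 - N_hat(w) / |Y_i(w)|), i.e. half-wave rectified."""
    return np.maximum(0.0, 1.0 - N_hat / (Y_mag + eps))

# usage: S_hat = magnitude_subtraction_gain(np.abs(Y), N_hat) * Y   # keeps the noisy phase
```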


    11/26

    Power Spectral Subtraction

    Signal model:

    $Y_i(\omega) = S_i(\omega) + N_i(\omega)$

    Estimation of the clean speech spectrum (with speech and noise assumed uncorrelated):

    $E\{|\hat{S}_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\} = \left(1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}\right) E\{|Y_i(\omega)|^2\} = G_i^2(\omega)\, E\{|Y_i(\omega)|^2\}$

    PS: half-wave rectification:

    $|\hat{S}_i(\omega)|^2 = \max\!\left(0,\; |Y_i(\omega)|^2 - \hat{N}^2(\omega)\right)$
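    The same idea with squared magnitudes; as above, negative differences are clipped to zero before taking the square root. A minimal sketch with illustrative names.

```python
import numpy as np

def power_subtraction_gain(Y_mag, N_hat, eps=1e-12):
    """G_i(w) = sqrt(max(0, 1 - N_hat(w)^2 / |Y_i(w)|^2))."""
    return np.sqrt(np.maximum(0.0, 1.0 - N_hat ** 2 / (Y_mag ** 2 + eps)))
```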


    12/26

    Suppression Behavior

    $G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\rho_i(\omega)}$

    where $\rho_i(\omega) = E\{|Y_i(\omega)|^2\} / E\{|N_i(\omega)|^2\}$ is the a posteriori SNR.

    [Figure: gain $G_i(\omega)$ plotted as a function of $\rho_i(\omega)$]


    13/26

    Wiener Filter in Frequency Domain (Wiener Estimation)

    Goal: find the linear filter $G_i(\omega)$ such that the MSE

    $E\Bigl\{\bigl|S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)}\bigr|^2\Bigr\}$

    is minimized.

    Solution: the partial derivative of

    $E\{|S_i(\omega) - \hat{S}_i(\omega)|^2\} = E\bigl\{\bigl(S_i(\omega) - G_i(\omega) Y_i(\omega)\bigr)\bigl(S_i(\omega) - G_i(\omega) Y_i(\omega)\bigr)^{*}\bigr\}$

    with respect to the real part of $G_i(\omega)$ yields the condition

    $\frac{\partial\, E\{|S_i(\omega) - \hat{S}_i(\omega)|^2\}}{\partial\, \mathrm{Re}\{G_i(\omega)\}} = 0$

    and hence (with speech and noise uncorrelated) we have:

    $G_i(\omega) = \frac{E\{|S_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = \frac{E\{|S_i(\omega)|^2\}}{E\{|S_i(\omega)|^2\} + E\{|N_i(\omega)|^2\}} = \frac{E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}$
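    Using the last equality with the instantaneous periodogram $|Y_i(\omega)|^2$ in place of $E\{|Y_i(\omega)|^2\}$ (a common practical substitution, not stated on the slide), the Wiener gain can be sketched as:

```python
import numpy as np

def wiener_gain(Y_mag, N_hat, eps=1e-12):
    """G_i(w) = (|Y_i(w)|^2 - N_hat(w)^2) / |Y_i(w)|^2, clipped to [0, 1]."""
    return np.clip(1.0 - N_hat ** 2 / (Y_mag ** 2 + eps), 0.0, 1.0)
```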


    14/26

    Generalized Formula

    Generalized magnitude-squared spectral gain function:

    $G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\rho_i(\omega)}$

    Practical heuristic form of the spectral subtraction rule:

    $|\hat{S}_i(\omega)|^2 = \left(1 - \frac{\hat{N}^2(\omega)}{|Y_i(\omega)|^2}\right) |Y_i(\omega)|^2$


    15/26

    Suppression Behavior

    $G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\rho_i(\omega)}$

    where $\rho_i(\omega) = E\{|Y_i(\omega)|^2\} / E\{|N_i(\omega)|^2\}$ is the a posteriori SNR.

    [Figure: gain $G_i(\omega)$ plotted as a function of $\rho_i(\omega)$]


    16/26

    Ephraim-Malah Suppression Rule (EMSR)

    MMSE estimation:

    $G_i(\omega) = \frac{\sqrt{\pi}}{2}\,\sqrt{\frac{1}{1+\mathrm{SNR}_{\mathrm{post}}}}\,\sqrt{\frac{\mathrm{SNR}_{\mathrm{prio}}}{1+\mathrm{SNR}_{\mathrm{prio}}}}\; M\!\left[\frac{1+\mathrm{SNR}_{\mathrm{post}}}{1+\mathrm{SNR}_{\mathrm{prio}}}\,\mathrm{SNR}_{\mathrm{prio}}\right]$

    with:

    $M[\theta] = e^{-\theta/2}\left[(1+\theta)\, I_0\!\left(\tfrac{\theta}{2}\right) + \theta\, I_1\!\left(\tfrac{\theta}{2}\right)\right]$    ($I_0$, $I_1$: modified Bessel functions)

    $\mathrm{SNR}_{\mathrm{post}} = \frac{|Y_i(\omega)|^2}{E\{|N_i(\omega)|^2\}} - 1$

    $\mathrm{SNR}_{\mathrm{prio}} = (1-\beta)\,\max\!\left(\mathrm{SNR}_{\mathrm{post}},\, 0\right) + \beta\,\frac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{E\{|N_i(\omega)|^2\}}$    (decision-directed, using the previous frame; $0<\beta<1$)
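    A sketch of this rule for one frame, assuming SciPy's exponentially scaled modified Bessel functions `scipy.special.i0e`/`i1e` (so that the factor $e^{-\theta/2}$ in $M[\theta]$ is absorbed without overflow) and an illustrative decision-directed smoothing constant $\beta = 0.98$; these implementation choices are assumptions, not values from the slide.

```python
import numpy as np
from scipy.special import i0e, i1e   # exponentially scaled modified Bessel functions

def emsr_gain(Y_mag, noise_psd, prev_gain, prev_Y_mag, beta=0.98, eps=1e-12):
    """Ephraim-Malah style gain for one frame; all arguments are per-bin arrays
    except `beta`. `prev_gain`/`prev_Y_mag` come from the previous frame."""
    snr_post = Y_mag ** 2 / (noise_psd + eps) - 1.0
    snr_prio = (1.0 - beta) * np.maximum(snr_post, 0.0) \
             + beta * (prev_gain * prev_Y_mag) ** 2 / (noise_psd + eps)
    theta = (1.0 + snr_post) * snr_prio / (1.0 + snr_prio)
    # M[theta] = exp(-theta/2) * ((1+theta) I0(theta/2) + theta I1(theta/2)),
    # written with i0e/i1e, which already include the exp(-theta/2) factor
    M = (1.0 + theta) * i0e(theta / 2.0) + theta * i1e(theta / 2.0)
    G = (np.sqrt(np.pi) / 2.0) * np.sqrt(1.0 / (1.0 + snr_post + eps)) \
        * np.sqrt(snr_prio / (1.0 + snr_prio)) * M
    return np.clip(G, 0.0, 1.0)   # clip as a practical safeguard
```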


    17/26

    Gain functions

    Magnitude Spectral Subtraction

    Power Spectral Subtraction

    Non-linear estimation: Maximum Likelihood

    Ephraim-Malah Suppression Rule = most frequently used in practice


    18/26

    Interpretation

    The Power Spectral Subtraction method is interpreted as a time-variant filter with magnitude frequency response $G_i(\omega)$.

    The short-time energy spectrum $|Y_i(\omega)|^2$ of the noisy speech signal can be calculated directly. The noise level is estimated by averaging over many non-speech frames, where the background noise is assumed to be stationary.

    Negative values resulting from spectral subtraction are replaced by zero. This results in musical noise: a succession of randomly spaced spectral peaks emerges in the frequency bands, i.e. residual noise composed of narrow-band components located at random frequencies that turn on and off randomly in each short-time frame.


    19/26

    [Figure: example of magnitude subtraction]


    20/26

    Solutions

    Flooring factor

    Over-subtraction factor

    SNR-dependent subtraction factor

    Averaging the estimated noise level over K frames

    Reduce the noise variance at each frequency: apply a simple recursive first-order low-pass filter, with smoothing coefficient p controlling the bandwidth and time constant of the LP filter (a small sketch follows this list).
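    A minimal sketch combining three of the fixes above: an over-subtraction factor, a spectral floor, and first-order recursive smoothing of the noisy power spectrum across frames; the parameter values and names are illustrative, not taken from the slide.

```python
import numpy as np

def floored_oversubtraction_gain(Y_psd, N_psd, alpha=2.0, floor=0.1):
    """Over-subtract alpha * noise PSD, but never let the gain fall below `floor`."""
    G2 = 1.0 - alpha * N_psd / (Y_psd + 1e-12)
    return np.sqrt(np.maximum(G2, floor ** 2))

def smooth_psd(Y_mag, p=0.8):
    """First-order recursive low-pass of |Y_i(w)|^2 across frames (num_frames x bins)."""
    psd = np.empty_like(Y_mag, dtype=float)
    psd[0] = Y_mag[0] ** 2
    for i in range(1, len(Y_mag)):
        psd[i] = p * psd[i - 1] + (1.0 - p) * Y_mag[i] ** 2
    return psd
```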


    21/26

    Solutions

    - Magnitude averaging: replace $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames, i.e. in $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$ the gain is computed from the averaged spectrum while it multiplies the instantaneous spectrum.

    - EMSR (p7)

    - Augment $G_i(\omega)$ with a soft-decision VAD:

    $G_i(\omega) \;\leftarrow\; P\!\left(H_1 \mid Y_i(\omega)\right) \cdot G_i(\omega)$

    where $P(H_1 \mid Y_i(\omega))$ is the probability that speech is present, given the observation.


    22/26

    Wavelet Denoising

    Additive noise model in the wavelet domain

    Hard-Thresholding

    Soft-Thresholding

    Shrinking


    23/26

    Hard Thresholding


    24/26

    Soft Thresholding
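    A minimal sketch of wavelet denoising by hard or soft thresholding, assuming the PyWavelets package (`pywt`) is available; the wavelet, decomposition depth and the universal threshold with a MAD-based noise estimate are common default choices, not values specified on the slides.

```python
import numpy as np
import pywt

def wavelet_denoise(y, wavelet="db4", level=5, mode="soft"):
    """Threshold the detail coefficients; `mode` is 'soft' or 'hard'."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    # noise std from the finest detail band (median absolute deviation rule),
    # then the universal threshold sigma * sqrt(2 log N)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(y)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode=mode) for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(y)]
```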


    25/26

    Optimal Shrinking


    26/26

    LAB ASSIGNMENT
    Noise Reduction for Speech Enhancement

    1. From the provided Matlab code, present algorithm charts of the following algorithms: Spectral Subtraction, Wiener Filter and Log-MMSE. Be sure that you understand all parts of the algorithms.

    2. Test the algorithms with the provided audio samples. Evaluate the processed speech quality informally, based on subjective evaluation with CCR (Table 1, described below). Give your comments on this first test.

    3. From the observed results, find and explain the sensitive variables of the examined algorithms that affect their performance.

    4. Propose solutions to improve the performance of the algorithms.

    5. Test the modified algorithms with the provided audio samples again. Evaluate the processed speech quality informally, based on subjective evaluation with CCR.

    6. Hand in your report at the end of the Lab session.