
lec4_SpeechEnhancement


Transcript

    1/26

    Speech Enhancement

    Noise Reduction

    Pham Van Tuan

    Electronic & Telecommunication Engineering

    Danang University of Technology


    2/26

    Introduction

    Aims:

    Improvement in the intelligibility of speech for human listeners.

    Improvement in the quality of speech that makes it more acceptable to human listeners.

    Modifications to the speech that lead to improved performance of automatic speech or speaker recognition systems.

    Modifications to the speech so that it may be encoded more effectively for storage or transmission.

    Noise Types:

    Additive acoustic noise

    Acoustic reverberation

    Convolutive channel effects

    Electrical interference

    Codec distortion


    3/26

    General Scheme

    The signal is first transformed into another domain to obtain a better representation of the speech signal.

    The noise level is estimated by a noise estimation stage.

    The noise component is removed from the noisy speech signal by a gain function; based on different linear and non-linear estimators, a gain function is designed (a skeleton of this processing chain is sketched below).
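    To make the scheme concrete, the following is a minimal NumPy sketch of the surrounding machinery only (framing, windowing, FFT, applying a spectral gain, inverse FFT, overlap-add). The function and parameter names are illustrative and not taken from the slides, and the gain function itself is left as a placeholder for the rules introduced on the following slides.

```python
import numpy as np

def enhance(y, frame_len=320, hop=160, gain_fn=None):
    """Generic analysis-modify-synthesis loop: window, FFT, apply a spectral
    gain, inverse FFT, overlap-add. `gain_fn` maps a noisy magnitude spectrum
    to a per-bin gain; None means "pass through" (no noise reduction)."""
    win = np.hanning(frame_len)
    out = np.zeros(len(y))
    norm = np.zeros(len(y))                       # window normalization for overlap-add
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = y[start:start + frame_len] * win
        Y = np.fft.rfft(frame)                    # short-time spectrum Y_i(w)
        G = 1.0 if gain_fn is None else gain_fn(np.abs(Y))
        S_hat = G * Y                             # apply gain, keep the noisy phase
        out[start:start + frame_len] += np.fft.irfft(S_hat, n=frame_len) * win
        norm[start:start + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-12)
```

    Keeping the noisy phase and modifying only the spectral magnitude is the design choice shared by all of the gain rules discussed on the following slides.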


    4/26

    Noise Estimation

    Based on different linear and non-linear estimators, a gain function will be designed.

    Noise estimation is the most difficult part of noise reduction algorithms, especially for non-stationary and non-white noise (whose characteristics change over time and across frequency bands).

    - Exploiting the periodicity of voiced speech.

    - Auditory model based systems.

    - Optimal linear estimators.

    - Statistical model based systems using optimal non-linear estimators.

    Due to the spectral overlap between speech and noise signals, the denoised speech obtained from single-channel methods exhibits more speech distortion. However, single-channel methods have low cost and small size.


    5/26

    Additive Noise Model

    Microphone signal:

    $y[k] = s[k] + n[k]$

    i.e. the desired-signal contribution $s[k]$ plus the noise contribution $n[k]$; the estimate of the desired signal is denoted $\hat{s}[k]$.

    Goal: estimate $s[k]$ based on $y[k]$.

    Applications: speech enhancement in conferencing, hands-free telephony, hearing aids, digital audio restoration, speech recognition, speech-based technology.

    Will consider speech applications: $s[k]$ = speech signal.

    The noise $n[k]$ can be stationary, non-stationary, narrowband, or broadband. Interfering speakers are also considered.


    6/26

    Additive Noise Model

    Strictly speaking, the estimation of statistical quantities via time averaging is only admissible when the signal is stationary and ergodic.

    The signal is chopped into frames (e.g. 10-20 ms); for each frame $i$ a frequency-domain representation is

    $Y_i(\omega) = S_i(\omega) + N_i(\omega)$

    where the spectral components are the short-time spectra of the time-domain signal frames $y_i[k]$, $s_i[k]$, $n_i[k]$ (obtained by a windowing technique using window $w$).


    7/26

    Observation

    Magnitude-squared DFT coefficients of a noisy, voiced speech sound, and the estimated noise power spectral density (PSD) of the noisy speech.


    8/26

    Observation

    The speech signal is an on/off (time-varying) signal, hence some frames contain speech + noise and some frames contain noise only.

    A speech detection algorithm, or Voice Activity Detection (VAD), is needed to distinguish between these two types of frames (based on statistical features).

    How to design a VAD? (A minimal energy-based sketch follows this list.)

    Normal VAD [McAulay, Malpass 1980]

    Soft-decision estimators [Sohn, Sung 1998]

    Minimum statistics [Martin 1994, 2001]

    Percentile filter [Pham, Kubin 2005]

    ...
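    None of the cited detectors is specified on this slide, so the sketch below is only a crude energy-threshold VAD (not the McAulay-Malpass, Sohn-Sung, minimum-statistics or percentile method): a frame is declared speech when its energy rises sufficiently above a running noise-floor estimate. The function name, threshold and smoothing constant are illustrative assumptions.

```python
import numpy as np

def simple_vad(frames, threshold_db=6.0, alpha=0.95):
    """Crude energy-based VAD. `frames` is (num_frames x frame_len);
    returns a boolean array, True where a frame is judged to contain speech."""
    energies_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    noise_floor = energies_db[0]              # assume the first frame is noise only
    speech = np.zeros(len(energies_db), dtype=bool)
    for i, e in enumerate(energies_db):
        speech[i] = e > noise_floor + threshold_db
        if not speech[i]:                     # track the floor during noise-only frames
            noise_floor = alpha * noise_floor + (1.0 - alpha) * e
    return speech
```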


    9/26

    Estimation

    Definition: $\hat{N}(\omega)$ = average amplitude of the noise spectrum,

    $\hat{N}(\omega) = E\{|N_i(\omega)|\}$

    Assumption: noise characteristics change slowly, hence estimate $\hat{N}(\omega)$ by (long-time) averaging over $M$ noise-only frames:

    $\hat{N}(\omega) = \frac{1}{M} \sum_{\text{noise-only frames } i} |Y_i(\omega)|$

    Estimate the clean speech spectrum $\hat{S}_i(\omega)$ by applying a gain function $G_i(\omega)$ to the corrupted speech spectrum $Y_i(\omega)$, where the gain is computed from $Y_i(\omega)$ and the estimate $\hat{N}(\omega)$:

    $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega), \qquad G_i(\omega) = f\!\left(Y_i(\omega), \hat{N}(\omega)\right)$
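    A minimal sketch of this averaging and of applying a gain, assuming the noise-only frames have already been marked (e.g. by a VAD as on the previous slide); array and function names are illustrative.

```python
import numpy as np

def estimate_noise_spectrum(Y_mag, noise_only):
    """Average the magnitude spectra |Y_i(w)| over the noise-only frames.
    Y_mag: (num_frames x num_bins) magnitudes; noise_only: boolean mask per frame."""
    return np.mean(Y_mag[noise_only], axis=0)          # N_hat(w), one value per bin

def apply_gain(Y, N_hat, gain_fn):
    """S_hat_i(w) = G_i(w) * Y_i(w), with G_i computed from |Y_i(w)| and N_hat(w)."""
    return gain_fn(np.abs(Y), N_hat) * Y
```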


    10/26

    Magnitude Spectral Subtraction

    Signal model:

    $Y_i(\omega) = S_i(\omega) + N_i(\omega), \qquad Y_i(\omega) = |Y_i(\omega)|\, e^{j\phi_{y,i}(\omega)}$

    Estimation of the clean speech spectrum (spectral subtraction):

    $\hat{S}_i(\omega) = \underbrace{\left(1 - \frac{\hat{N}(\omega)}{|Y_i(\omega)|}\right)}_{G_i(\omega)} Y_i(\omega)$

    PS: half-wave rectification:

    $G_i(\omega) \leftarrow \max\!\left(0,\, G_i(\omega)\right), \qquad |\hat{S}_i(\omega)| = \max\!\left(0,\, |Y_i(\omega)| - \hat{N}(\omega)\right)$
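    A direct NumPy transcription of this rule; the half-wave rectification is the `np.maximum(0, ...)`. Variable names are illustrative.

```python
import numpy as np

def magnitude_subtraction_gain(Y_mag, N_hat, eps=1e-12):
    """G_i(w) = max(0, 1 - N_hat(w) / |Y_i(w)|), i.e. half-wave rectified."""
    return np.maximum(0.0, 1.0 - N_hat / (Y_mag + eps))

# usage: S_hat = magnitude_subtraction_gain(np.abs(Y), N_hat) * Y   # keeps the noisy phase
```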


    11/26

    Power Spectral Subtraction

    Signal model:

    $Y_i(\omega) = S_i(\omega) + N_i(\omega)$

    Estimation of the clean speech spectrum (with speech and noise assumed uncorrelated):

    $E\{|\hat{S}_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\} = \left(1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}\right) E\{|Y_i(\omega)|^2\} = G_i^2(\omega)\, E\{|Y_i(\omega)|^2\}$

    PS: half-wave rectification:

    $|\hat{S}_i(\omega)|^2 = \max\!\left(0,\; |Y_i(\omega)|^2 - \hat{N}^2(\omega)\right)$
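    The same idea with squared magnitudes; as above, negative differences are clipped to zero before taking the square root. A minimal sketch with illustrative names.

```python
import numpy as np

def power_subtraction_gain(Y_mag, N_hat, eps=1e-12):
    """G_i(w) = sqrt(max(0, 1 - N_hat(w)^2 / |Y_i(w)|^2))."""
    return np.sqrt(np.maximum(0.0, 1.0 - N_hat ** 2 / (Y_mag ** 2 + eps)))
```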


    12/26

    Suppression Behavior

    $G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\rho_i(\omega)}$

    where $\rho_i(\omega) = E\{|Y_i(\omega)|^2\} / E\{|N_i(\omega)|^2\}$ is the a posteriori SNR.

    [Figure: gain $G_i(\omega)$ plotted as a function of $\rho_i(\omega)$]


    13/26

    Wiener Filter in Frequency Domain (Wiener Estimation)

    Goal: find the linear filter $G_i(\omega)$ such that the MSE

    $E\Bigl\{\bigl|S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)}\bigr|^2\Bigr\}$

    is minimized.

    Solution: the partial derivative of

    $E\{|S_i(\omega) - \hat{S}_i(\omega)|^2\} = E\bigl\{\bigl(S_i(\omega) - G_i(\omega) Y_i(\omega)\bigr)\bigl(S_i(\omega) - G_i(\omega) Y_i(\omega)\bigr)^{*}\bigr\}$

    with respect to the real part of $G_i(\omega)$ yields the condition

    $\frac{\partial\, E\{|S_i(\omega) - \hat{S}_i(\omega)|^2\}}{\partial\, \mathrm{Re}\{G_i(\omega)\}} = 0$

    and hence (with speech and noise uncorrelated) we have:

    $G_i(\omega) = \frac{E\{|S_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = \frac{E\{|S_i(\omega)|^2\}}{E\{|S_i(\omega)|^2\} + E\{|N_i(\omega)|^2\}} = \frac{E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}$
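    Using the last equality with the instantaneous periodogram $|Y_i(\omega)|^2$ in place of $E\{|Y_i(\omega)|^2\}$ (a common practical substitution, not stated on the slide), the Wiener gain can be sketched as:

```python
import numpy as np

def wiener_gain(Y_mag, N_hat, eps=1e-12):
    """G_i(w) = (|Y_i(w)|^2 - N_hat(w)^2) / |Y_i(w)|^2, clipped to [0, 1]."""
    return np.clip(1.0 - N_hat ** 2 / (Y_mag ** 2 + eps), 0.0, 1.0)
```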


    14/26

    Generalized Formula

    Generalized magnitude-squared spectral gain function:

    $G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\rho_i(\omega)}$

    Practical heuristic form of the spectral subtraction rule:

    $|\hat{S}_i(\omega)|^2 = \left(1 - \frac{\hat{N}^2(\omega)}{|Y_i(\omega)|^2}\right) |Y_i(\omega)|^2$


    15/26

    Suppression Behavior

    $G_i^2(\omega) = 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = 1 - \frac{1}{\rho_i(\omega)}$

    where $\rho_i(\omega) = E\{|Y_i(\omega)|^2\} / E\{|N_i(\omega)|^2\}$ is the a posteriori SNR.

    [Figure: gain $G_i(\omega)$ plotted as a function of $\rho_i(\omega)$]


    16/26

    Ephraim-Malah Suppression Rule (EMSR)

    MMSE estimation:

    $G_i(\omega) = \frac{\sqrt{\pi}}{2}\,\sqrt{\frac{1}{1+\mathrm{SNR}_{\mathrm{post}}}}\,\sqrt{\frac{\mathrm{SNR}_{\mathrm{prio}}}{1+\mathrm{SNR}_{\mathrm{prio}}}}\; M\!\left[\frac{1+\mathrm{SNR}_{\mathrm{post}}}{1+\mathrm{SNR}_{\mathrm{prio}}}\,\mathrm{SNR}_{\mathrm{prio}}\right]$

    with:

    $M[\theta] = e^{-\theta/2}\left[(1+\theta)\, I_0\!\left(\tfrac{\theta}{2}\right) + \theta\, I_1\!\left(\tfrac{\theta}{2}\right)\right]$    ($I_0$, $I_1$: modified Bessel functions)

    $\mathrm{SNR}_{\mathrm{post}} = \frac{|Y_i(\omega)|^2}{E\{|N_i(\omega)|^2\}} - 1$

    $\mathrm{SNR}_{\mathrm{prio}} = (1-\beta)\,\max\!\left(\mathrm{SNR}_{\mathrm{post}},\, 0\right) + \beta\,\frac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{E\{|N_i(\omega)|^2\}}$    (decision-directed, using the previous frame; $0<\beta<1$)
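    A sketch of this rule for one frame, assuming SciPy's exponentially scaled modified Bessel functions `scipy.special.i0e`/`i1e` (so that the factor $e^{-\theta/2}$ in $M[\theta]$ is absorbed without overflow) and an illustrative decision-directed smoothing constant $\beta = 0.98$; these implementation choices are assumptions, not values from the slide.

```python
import numpy as np
from scipy.special import i0e, i1e   # exponentially scaled modified Bessel functions

def emsr_gain(Y_mag, noise_psd, prev_gain, prev_Y_mag, beta=0.98, eps=1e-12):
    """Ephraim-Malah style gain for one frame; all arguments are per-bin arrays
    except `beta`. `prev_gain`/`prev_Y_mag` come from the previous frame."""
    snr_post = Y_mag ** 2 / (noise_psd + eps) - 1.0
    snr_prio = (1.0 - beta) * np.maximum(snr_post, 0.0) \
             + beta * (prev_gain * prev_Y_mag) ** 2 / (noise_psd + eps)
    theta = (1.0 + snr_post) * snr_prio / (1.0 + snr_prio)
    # M[theta] = exp(-theta/2) * ((1+theta) I0(theta/2) + theta I1(theta/2)),
    # written with i0e/i1e, which already include the exp(-theta/2) factor
    M = (1.0 + theta) * i0e(theta / 2.0) + theta * i1e(theta / 2.0)
    G = (np.sqrt(np.pi) / 2.0) * np.sqrt(1.0 / (1.0 + snr_post + eps)) \
        * np.sqrt(snr_prio / (1.0 + snr_prio)) * M
    return np.clip(G, 0.0, 1.0)   # clip as a practical safeguard
```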


    17/26

    Gain functions

    Magnitude Spectral Subtraction

    Power Spectral Subtraction

    Non-linear estimation: Maximum Likelihood

    Ephraim-Malah Suppression Rule = most frequently used in practice


    18/26

    Interpretation

    The Power Spectral Subtraction method is interpreted as a time-variant filter with magnitude frequency response $G_i(\omega)$.

    The short-time energy spectrum $|Y_i(\omega)|^2$ of the noisy speech signal can be calculated directly. The noise level is estimated by averaging over many non-speech frames, where the background noise is assumed to be stationary.

    Negative values resulting from spectral subtraction are replaced by zero. This results in musical noise: a succession of randomly spaced spectral peaks emerges in the frequency bands, i.e. residual noise composed of narrow-band components located at random frequencies that turn on and off randomly in each short-time frame.


    19/26

    [Figure: example of magnitude subtraction]


    20/26

    Solutions

    Flooring factor

    Over-subtraction factor

    SNR-dependent subtraction factor

    Averaging the estimated noise level over K frames

    Reduce the noise variance at each frequency: apply a simple recursive first-order low-pass filter, with smoothing coefficient p controlling the bandwidth and time constant of the LP filter (a small sketch follows this list).
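    A minimal sketch combining three of the fixes above: an over-subtraction factor, a spectral floor, and first-order recursive smoothing of the noisy power spectrum across frames; the parameter values and names are illustrative, not taken from the slide.

```python
import numpy as np

def floored_oversubtraction_gain(Y_psd, N_psd, alpha=2.0, floor=0.1):
    """Over-subtract alpha * noise PSD, but never let the gain fall below `floor`."""
    G2 = 1.0 - alpha * N_psd / (Y_psd + 1e-12)
    return np.sqrt(np.maximum(G2, floor ** 2))

def smooth_psd(Y_mag, p=0.8):
    """First-order recursive low-pass of |Y_i(w)|^2 across frames (num_frames x bins)."""
    psd = np.empty_like(Y_mag, dtype=float)
    psd[0] = Y_mag[0] ** 2
    for i in range(1, len(Y_mag)):
        psd[i] = p * psd[i - 1] + (1.0 - p) * Y_mag[i] ** 2
    return psd
```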


    21/26

    Solutions

    - Magnitude averaging: replace $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames, i.e. in $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$ the gain is computed from the averaged spectrum while it multiplies the instantaneous spectrum.

    - EMSR (p7)

    - Augment $G_i(\omega)$ with a soft-decision VAD:

    $G_i(\omega) \;\leftarrow\; P\!\left(H_1 \mid Y_i(\omega)\right) \cdot G_i(\omega)$

    where $P(H_1 \mid Y_i(\omega))$ is the probability that speech is present, given the observation.


    22/26

    Wavelet Denoising

    Additive noise model in the wavelet domain

    Hard-Thresholding

    Soft-Thresholding

    Shrinking


    23/26

    Hard Thresholding


    24/26

    Soft Thresholding
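    A minimal sketch of wavelet denoising by hard or soft thresholding, assuming the PyWavelets package (`pywt`) is available; the wavelet, decomposition depth and the universal threshold with a MAD-based noise estimate are common default choices, not values specified on the slides.

```python
import numpy as np
import pywt

def wavelet_denoise(y, wavelet="db4", level=5, mode="soft"):
    """Threshold the detail coefficients; `mode` is 'soft' or 'hard'."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    # noise std from the finest detail band (median absolute deviation rule),
    # then the universal threshold sigma * sqrt(2 log N)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(y)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode=mode) for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(y)]
```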


    25/26

    Optimal Shrinking


    26/26

    LAB ASSIGNMENT
    Noise Reduction for Speech Enhancement

    1. From the provided Matlab code, present algorithm charts of the following algorithms: Spectral Subtraction, Wiener Filter and Log-MMSE. Be sure that you understand all parts of the algorithms.

    2. Test the algorithms with the provided audio samples. Evaluate the processed speech quality informally, based on subjective evaluation with CCR (Table 1, described below). Give your comments on this first test.

    3. From the observed results, find and explain the sensitive variables of the examined algorithms that affect their performance.

    4. Propose solutions to improve the performance of the algorithms.

    5. Test the modified algorithms with the provided audio samples again. Evaluate the processed speech quality informally, based on subjective evaluation with CCR.

    6. Hand in your report at the end of the Lab session.