COMPENSATION FOR NONLINEAR DISTORTION IN NOISE FOR ROBUST SPEECH RECOGNITION

Mark J. Harvilla
Ph.D. Thesis Defense
October 27, 2014

Introduction

Topic | Symbol | Fraction of thesis work
Dynamic range compression (DRC) and automatic speech recognition (ASR) | DRC & ASR | 11%
Blind amplitude normalization (BAN) | BAN | 14%
Blind amplitude reconstruction (BAR) | BAR | 28%
Robust estimation of distortion (RED) | RED | 28%
Artificially-matched training (AMT) | AMT | 9%
The Big Picture | Big Picture | 10%

Outline: Introduction, DRC & ASR, BAN, BAR, RED, AMT, Big Picture, Conclusion

Dynamic Range Compression (DRC)

• A form of nonlinear distortion
  ➢ Nonlinear systems are common (e.g., AM/FM radio, rectifiers)
• DRC is used extensively in audio engineering, typically for one of three reasons:
  1. To adhere to the dynamic range limitations of a signal transmission system while increasing average signal power
  2. To increase perceived signal loudness
  3. To eliminate drastic changes in volume (e.g., automatic gain control)
• Because DRC is ubiquitous, speech systems such as ASR are likely to encounter compressed speech

Dynamic Range Compression (DRC)

• DRC is characterized by two parameters: ratio (R) and threshold (τ).

[Figure: DRC input/output characteristic. x-axis: input amplitude (−1 to 1); y-axis: output amplitude (−1 to 1). Curves for R = 1.5, R = 2.5, and R = ∞, with thresholds τ = 0.1 and τ = 0.6 marked.]
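The slides do not give the formula behind the curves, so the following is only a minimal sketch. It assumes the conventional static compressor characteristic, which is linear in the dB domain above the threshold; the thesis may define DRC differently. The function name `drc` and its argument names are hypothetical.

```python
import math

def drc(x, ratio, tau):
    """Compress one sample (assumed curve, tau > 0): identity below tau."""
    mag = abs(x)
    if mag <= tau:
        return x                      # below threshold: unchanged
    if math.isinf(ratio):
        out = tau                     # R = infinity: hard clipping at tau
    else:
        # Above threshold, `ratio` dB of input yields 1 dB of output.
        out = tau * (mag / tau) ** (1.0 / ratio)
    return math.copysign(out, x)      # restore the sign of the input
```

Under this assumption, R = 1 leaves the signal untouched and R = ∞ reduces to clipping, which matches the two limiting cases shown in the figure legend.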

Dynamic Range Compression (DRC)

[Figure: three linked panels. A waveform (amplitude versus time, 0 to 0.03 seconds); the DRC characteristic (output amplitude versus input amplitude, curves for R = 1, R = 1.5, R = 2.5, R = ∞); and a second amplitude-versus-time panel drawn with time on the vertical axis.]

Dynamic Range Compression (DRC)

[Figure: SNR (dB, 0 to 20) versus τ, threshold (percentile, 0 to 100), for R = 1.5, 2, 3, 6, ∞.]

Some examples

Threshold (τ) | Ratio (R) | Crest factor | Word error rate (WER) | WER after processing
P100 | 1 | 17.1 dB | 6.4% | 6.4%
P75 | 4 | 7.7 dB | 20.3% | 6.4%
P75 | ∞ | 4.1 dB | 30.8% | 13.5%
P50 | 4 | 6.7 dB | 30.2% | 6.4%
P50 | ∞ | 2.2 dB | 49.5% | 23.0%

[The "Audio" column held an embedded clip for each row; the clips are not reproduced.]
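The crest factor column can be reproduced for any waveform with the usual definition, peak magnitude over RMS level, expressed in dB. The sketch below assumes that definition (the slides do not spell it out); `crest_factor_db` is a hypothetical helper name.

```python
import math

def crest_factor_db(samples):
    """Crest factor in dB: 20*log10(peak magnitude / RMS level)."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

# A full-scale sine has a crest factor of about 3 dB; a square wave has 0 dB.
sine = [math.sin(2.0 * math.pi * k / 1000) for k in range(1000)]
square = [1.0, -1.0, 1.0, -1.0]
```

Heavier compression flattens the peaks relative to the RMS level, which is why the crest factor in the table falls as τ decreases and R increases.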

Measuring the effect of DRC on ASR

Experiment 1 (no additive noise):

[Diagram: clean speech signal → DRC with controlled parameter values (R, τ) → ASR with a clean acoustic model → measure word error rate (WER)]

Measuring the effect of DRC on ASR

Experiment 2 (additive channel noise):

[Diagram: clean speech signal → DRC with controlled parameter values (R, τ) → additive noise at a controlled SNR → ASR with a clean acoustic model → measure word error rate (WER)]
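As a rough illustration of the "additive noise at controlled SNR" block, the sketch below scales a noise sequence so that the ratio of signal power to noise power hits a requested value in dB. White Gaussian noise and the helper name `add_noise_at_snr` are assumptions; the slides do not state the noise type.

```python
import math
import random

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise (assumed type) at snr_db relative to `signal`."""
    rng = rng or random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose the gain so that 10*log10(p_signal / (gain^2 * p_noise)) = snr_db.
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]
```

Passing the already-compressed waveform as `signal` gives an SNR measured with respect to the compressed signal.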

Measuring the effect of DRC on ASR

Experiment 1 (no additive noise):

[Figure: word error rate (%) versus τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, ∞. Clean acoustic model, no additive noise.]

Experiment 2 (additive channel noise):

[Figure: word error rate (%) versus τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, ∞. Additive noise at 20-dB SNR w.r.t. the compressed signal.]

[Figure: word error rate (%) versus τ, threshold (percentile). Additive noise at 15-dB SNR w.r.t. the compressed signal.]

Counteracting the effects of DRC

[Diagram: DRC splits into two cases, with the compensation methods attached.]
• Saturating ("clipping"): blind amplitude reconstruction (BAR)
• Non-saturating ("compression"): blind amplitude normalization (BAN)
• Artificially-matched training (AMT)
• Robust estimation of nonlinear distortion function (RED)

Blind Amplitude Normalization (BAN) (Balchandran & Mammone; ICASSP 1998)

• Step 1: Obtain an estimate of the cumulative distribution function (CDF) of the observed speech and of clean, unadulterated reference speech.
• Step 2: For a given reference signal amplitude, find the amplitude in the observed CDF with the same cumulative probability.
  ➢ Input amplitude of 0.061 maps to 0.2
• Step 3: Repeat for each input signal amplitude to obtain a full non-parametric estimate of the nonlinear mapping.

[Figure: two CDF panels, cumulative probability versus amplitude: observed speech (R = 10, τ = P50) and clean speech.]
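The three BAN steps amount to matching empirical CDFs. The sketch below is one simple way to do that with sorted samples; it is not the implementation from Balchandran & Mammone, and the function name `ban_mapping` and the rank-rounding rule are assumptions.

```python
import bisect

def ban_mapping(observed, reference):
    """Return a function that maps an observed amplitude to a reference one."""
    obs_sorted = sorted(observed)      # empirical CDF of the observed speech
    ref_sorted = sorted(reference)     # empirical CDF of the clean reference
    n_obs, n_ref = len(obs_sorted), len(ref_sorted)

    def invert(x):
        # Cumulative probability of x under the observed distribution.
        p = bisect.bisect_right(obs_sorted, x) / n_obs
        # Reference amplitude with (approximately) the same probability.
        idx = min(n_ref - 1, max(0, int(round(p * n_ref)) - 1))
        return ref_sorted[idx]

    return invert
```

When the observed signal is a monotone distortion of speech whose amplitude distribution matches the reference, this mapping approximately undoes the distortion without knowing R or τ.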

How well does BAN work?

• Experiment 1 (no additive noise):

[Figures: word error rate (%) versus τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, ∞; one plot before BAN and one after BAN.]

• Experiment 2 (additive channel noise at 20-dB SNR):

[Figures: word error rate (%) versus τ, threshold (percentile); one plot before BAN and one after BAN.]

• Experiment 2 (additive channel noise at 15-dB SNR):

[Figures: word error rate (%) versus τ, threshold (percentile); one plot before BAN and one after BAN.]

Robust BAN (Harvilla & Stern; unpub.)

• Idea: Shift each input sample by the amount that the centroid of the sample and its neighbors changes when the nonlinearity is inverted.

[Figure: two CDF panels, cumulative probability versus amplitude: observed speech after low-pass filter (R = 10, τ = P50, SNR = 15 dB) and clean speech after low-pass filter.]

Estimating the offset from the smoothed signals:
• Step 1: As before, for a given reference signal amplitude, find the amplitude in the observed CDF with the same cumulative probability.
• Step 2: The difference between the output and the input is the offset to be added to the original noisy, compressed waveform.
  ➢ Offset = output − input = 0.2 − 0.061 = 0.139
• Step 3: Repeat for each input signal amplitude, always using the inverse mapping defined by the smoothed signals.

Per-sample procedure:
• Step 1: For each sample, find the centroid of the value and its surrounding 4 samples.
• Step 2: Pass the centroid value through the inverse nonlinearity estimate.
• Step 3: Find the difference (“offset”) between the output of the inverse nonlinearity and the centroid.
• Step 4: Add the offset to the original noisy and compressed sample value from Step 1.
• Step 5: Repeat for each sample in the input signal.
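The per-sample procedure can be sketched as follows. Two details are assumptions rather than facts from the slides: the "surrounding 4 samples" are taken as two on each side, and the window is simply truncated at the signal edges. The `inverse` argument stands for any estimate of the inverse nonlinearity, and `robust_ban` is a hypothetical name.

```python
def robust_ban(samples, inverse, half_width=2):
    """Shift each sample by the change its local centroid undergoes."""
    n = len(samples)
    repaired = []
    for i, x in enumerate(samples):
        # Step 1: centroid of the sample and its neighbours (assumed 2 per side).
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        centroid = sum(samples[lo:hi]) / (hi - lo)
        # Steps 2-3: offset between the inverted centroid and the centroid.
        offset = inverse(centroid) - centroid
        # Step 4: apply the offset to the original noisy, compressed sample.
        repaired.append(x + offset)
    return repaired
```

Because only the smoothed centroid passes through the inverse mapping, sample-level noise is shifted rather than expanded, which is the stated motivation for the method.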

Robust BAN (Harvilla & Stern; unpub.)

[Figure: amplitude versus time (seconds) for the original waveform and the DRC + noise waveform (SNR = 15 dB), with a second panel showing the waveform repaired using BAN.]

[Figure: the same original and DRC + noise waveforms (SNR = 15 dB), with a second panel showing the waveform repaired using Robust BAN.]

Results summary

• RBAN is more useful as R becomes large and SNR decreases:

[Figure: relative improvement of RBAN over BAN (%) for R = 2, 4, 6, 10, 20, at 15-dB SNR and 20-dB SNR.]

Blind Amplitude Reconstruction (BAR)

• When R = ∞, BAN techniques are ineffective.
• All samples whose magnitude exceeds τ are completely lost (“clipping”).

[Figure: clipped waveform, amplitude versus time (seconds).]

Consistent Iterative Hard Thresholding (Kitic et al.; ICASSP 2013)

• Kitic-IHT works by learning a sparse representation of the incoming clipped speech in terms of Gabor basis vectors.
• Learning is done using a modified version of the Iterative Hard Thresholding (IHT) algorithm.
• The learned sparse representation is then used to reconstruct the signal on a frame-by-frame basis.
• Kitic-IHT will be used as a baseline against which the novel declipping algorithms are compared.

[Diagram labels: Gabor basis vectors; sparse representation, learned from clipped observation; repaired signal frame.]

Constrained BAR (Harvilla & Stern; Interspeech 2014)

• Declip the signal by interpolating the missing samples such that the energy in the second derivative is minimized (i.e., for smoothness).
• Ensure the interpolation matches the sign of the clipped signal and is greater than τ in absolute value.

[Figure: waveform, amplitude versus time (seconds).]

Constrained BAR (Harvilla & Stern; Interspeech 2014)

• Explaining masking matrices
  ➢ One matrix isolates the reliable samples
  ➢ One matrix isolates the clipped samples

CBAR objective function:
[Equation not recovered: minimize over x_c, subject to a constraint.]
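The equation itself did not survive extraction. Going only by the verbal description of CBAR (minimize the energy of the second derivative, keep the sign of the clipped signal, and exceed τ in magnitude on the clipped samples), the problem might be written as below. All symbols other than x_c and τ are assumptions introduced for illustration: x is the observed clipped frame, S_r and S_c are the masking matrices that select the reliable and clipped samples, x_r = S_r x, and D_2 is a second-difference operator.

```latex
\begin{aligned}
\underset{x_c}{\text{minimize}} \quad
  & \left\| D_2 \left( S_r^{\top} x_r + S_c^{\top} x_c \right) \right\|_2^2 \\
\text{subject to} \quad
  & x_c \circ \operatorname{sgn}\!\left( S_c\, x \right) \;\ge\; \tau \mathbf{1}
\end{aligned}
```

Here S_r^T x_r + S_c^T x_c reassembles the full frame from its reliable and reconstructed parts, and the elementwise constraint forces every reconstructed sample to lie beyond the clipping threshold on the correct side.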

Constrained BAR (Harvilla & Stern; Interspeech 2014)

• Because Constrained BAR (CBAR) imposes a hard constraint when minimizing the objective function, it is very slow.
• A line search algorithm is used to solve the constrained optimization separately for every frame.
• In the worst case, it is 400 times slower than real time.
• This motivates the development of a declipping algorithm that does not require a hard constraint.

Regularized BAR (Harvilla & Stern; ICASSP 2015)

• Replace CBAR’s hard constraint with regularization terms.

[Equations not recovered: the CBAR objective function (minimize over x_c, subject to the hard constraint), the RBAR objective function (minimize over x_c, with no constraint), and the frame-specific solution.]

• With the RBAR objective function, x_c can be solved for in closed form.
• The t0 and t1 terms are target vectors.
• They “float” above the clipped segments at the target amplitude.
• They are defined as a function of the fraction of clipped samples in a frame.
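To illustrate why dropping the hard constraint permits a closed-form solution, here is a simplified derivation. It is not the RBAR objective from the paper: it uses a single target vector t and a single weight λ, whereas the slides mention two targets (t0 and t1) whose exact role is not recoverable. The symbols are assumptions: S_r and S_c select the reliable and clipped samples, x_r holds the reliable samples, D_2 is a second-difference operator, A = D_2 S_c^T, and b = D_2 S_r^T x_r.

```latex
\begin{aligned}
J(x_c) &= \left\| A x_c + b \right\|_2^2 + \lambda \left\| x_c - t \right\|_2^2 \\
\nabla J(x_c) &= 2 A^{\top} \left( A x_c + b \right) + 2 \lambda \left( x_c - t \right) = 0 \\
x_c &= \left( A^{\top} A + \lambda I \right)^{-1} \left( \lambda t - A^{\top} b \right)
\end{aligned}
```

The solution is a single linear solve per frame, which is consistent with the slide's remark that x_c can be found in closed form rather than by a per-frame line search.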

Regularized BAR (Harvilla & Stern; ICASSP 2015)

[Figures: waveform plots, amplitude versus time (seconds), with the target vectors t0 and t1 marked.]

• The target amplitudes underestimate the true peak (future research).

Regularized BAR (Harvilla & Stern; ICASSP 2015)

• Amplitude prediction

[Figure: P95 / τ versus fraction of clipped samples (0 to 1), with exponential and power-law fits. ρ: fraction of clipped samples in frame.]

Processing speed

[Figure: log(TRT) versus τ, threshold (percentile), for CBAR, Kitic-IHT, and RBAR, where TRT is run time / input duration. The y-axis runs from −2 [0.13] to 6 [403]; the bracketed numbers give the corresponding TRT values.]
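The bracketed axis labels (0.13, 0.37, 1.00, 2.71, 7.39, ...) match powers of e, so the vertical axis appears to use the natural logarithm of run time divided by input duration. The short sketch below assumes that reading; `log_times_real_time` is a hypothetical helper name.

```python
import math

def log_times_real_time(run_time_s, input_duration_s):
    """Natural log of the ratio run time / input duration (assumed axis)."""
    return math.log(run_time_s / input_duration_s)

# A method that needs 400 s to process 1 s of audio sits just below 6
# on this axis, consistent with the "400 times slower than real time" figure.
worst_case = log_times_real_time(400.0, 1.0)
```

A value of 0 corresponds to exactly real-time processing, and negative values are faster than real time.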

Declipping performance

• Experiment 1 (no additive noise):

[Figure: word error rate (%) versus τ, threshold (percentile), for no declipping, RBAR, Kitic-IHT, and CBAR.]

• Experiment 1 (no additive noise), relative improvements:

[Figure: two panels showing relative decrease in WER (%) versus τ, threshold (percentile). CBAR panel: relative to no declipping, RBAR, and Kitic-IHT. RBAR panel: relative to no declipping and Kitic-IHT.]
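"Relative decrease in WER" is taken here to mean the usual relative error reduction, which is an assumption since the slides do not define it. The sketch applies it to two WER values from the "Some examples" table earlier in the deck; `relative_wer_decrease` is a hypothetical name.

```python
def relative_wer_decrease(wer_reference, wer_new):
    """Relative decrease in WER (%), measured against a reference system."""
    return 100.0 * (wer_reference - wer_new) / wer_reference

# From the examples table: tau = P75, R = 4, WER 20.3% before processing
# and 6.4% after, which is a relative decrease of roughly 68%.
example = relative_wer_decrease(20.3, 6.4)
```

A negative value means the processed system is worse than the reference, which is why the plotted axis extends below zero.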

Declipping performance

• Experiment 2 (additive noise):

[Figure: two panels, τ = P75 and τ = P95, showing word error rate (%) versus SNR (dB, 5 to 20) for no declipping, RBAR, CBAR, and Kitic-IHT.]

• The location of all clipped samples is assumed known.
• Kitic-IHT is more robust to additive noise (future research).

Robust Estimation of Distortion (RED)

• Given a received speech signal, how does one determine whether declipping (BAR) or decompression (BAN) needs to be performed?

[Flowchart: Receive audio → Is audio exposed to DRC? If no, extract features. If yes → Is audio clipped? If yes, apply BAR; if no, apply BAN. Then extract features.]
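One reading of the flowchart, written as control flow. Every callable passed in is a placeholder for a component described elsewhere in the talk; none of the names are taken from the slides.

```python
def compensate(audio, is_compressed, is_clipped, apply_bar, apply_ban,
               extract_features):
    """Route audio through BAR or BAN before feature extraction."""
    if is_compressed(audio):              # "Is audio exposed to DRC?"
        if is_clipped(audio):             # "Is audio clipped?"
            audio = apply_bar(audio)      # saturating case: declip
        else:
            audio = apply_ban(audio)      # non-saturating case: decompress
    return extract_features(audio)        # undistorted audio passes straight through
```

The two decision functions are what RED is meant to supply: without them, the system cannot choose between BAR and BAN blindly.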

Robust Estimation of Distortion (RED)

  ➢ Is audio exposed to DRC? Search for peaks in the probability distribution of the waveform amplitudes.
  ➢ Is audio clipped? Accurately estimate the value of R (recall: if R is “very” large, speech is effectively clipped).
  ➢ Apply BAR: requires estimating which samples are clipped, and must allow for the possibility of noise (e.g., as in Experiment 2).

Clipped speech detection & τ estimation (Harvilla & Stern; ICASSP 2015)

• Exposure to DRC significantly modifies the waveform amplitude distribution of the speech.

[Figure: two amplitude histograms, probability versus amplitude (−0.6 to 0.6): uncompressed speech with noise at 15-dB SNR, and DRC’ed speech (R = 6, τ = 0.06) with noise at 15-dB SNR.]

Clipping detection and τ estimation algorithm:
1. Detect peaks in the distribution.
2. Compute: [equation not recovered]
3. The output indicates whether clipping occurred and the amplitude value of τ (0.5*( |−τ| + 0 + |τ| )). If the output is ∞, there is no clipping.
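The "Compute" step did not survive extraction, so the following is a guess at the spirit of the algorithm rather than the published method. It builds an amplitude histogram, looks for local peaks away from zero on both sides, and averages their magnitudes; this agrees with the slide's 0.5*( |−τ| + 0 + |τ| ) example when the central peak sits at zero. The function name, the bin layout, and the `min_fraction` peak floor are all assumptions.

```python
import math

def estimate_tau(samples, n_bins, lo, hi, min_fraction=0.01):
    """Return an estimate of tau, or math.inf if no clipping peaks are found."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        k = min(n_bins - 1, max(0, int((x - lo) / width)))
        counts[k] += 1
    floor = min_fraction * len(samples)   # ignore tiny bumps in the tails

    def is_peak(k):
        left = counts[k - 1] if k > 0 else 0
        right = counts[k + 1] if k < n_bins - 1 else 0
        return counts[k] >= floor and counts[k] > left and counts[k] > right

    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    peaks = [centers[k] for k in range(n_bins) if is_peak(k)]
    negative = [c for c in peaks if c < -width]   # peaks left of the centre lobe
    positive = [c for c in peaks if c > width]    # peaks right of the centre lobe
    if not negative or not positive:
        return math.inf                           # one lobe only: no clipping
    return 0.5 * (abs(min(negative)) + abs(max(positive)))
```

As noise increases, the side peaks smear into the central lobe, so a detector of this kind would be expected to lose accuracy at low SNR and low τ.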

Clipped speech detection & τ estimation (Harvilla & Stern; ICASSP 2015)

Clipped signal detection accuracies

[Figure: two panels, τ = P95 and τ = P75, showing clipped-signal detection accuracy (%) versus SNR (dB, 5 to 20).]

• Because the amplitude distribution merges into one lobe (and thus one peak) as SNR and τ decrease, detection accuracy decreases correspondingly.

Clipped speech detection & τ estimation (Harvilla & Stern; ICASSP 2015)

τ-estimation accuracies for R = ∞

[Figure: four panels, SNR = 20 dB, 15 dB, 10 dB, and 5 dB, showing estimated τ versus actual τ (0.03 to 0.15).]

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

• Given the amplitude value of τ, how do we determine the location of clipped samples?

[Figure: two waveform panels, amplitude versus time (seconds), showing the signal samples and the clipping threshold: clipped speech with no noise, and clipped speech with noise at 10-dB SNR.]

Page 59

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

• Given the amplitude value of τ, how do we determine the location of clipped samples?

• Solution:

Given:
  amplitude value of τ
  percentile value of τ
  variance of the additive noise (σ_w²)
  variance of the observed signal (σ_y²)

• Model the clean speech and noise with separate Gaussians

• For each sample, classify as clipped if

Pr( clipped | observed sample, τ, σ_w², σ_y² ) > Pr( not clipped | observed sample, τ, σ_w², σ_y² )
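The decision rule above can be illustrated with a small sketch. The slides do not give the exact class likelihoods, so the following are assumptions: a clipped sample is modeled as ±τ plus Gaussian noise of variance σ_w², an unclipped sample is approximated by a zero-mean Gaussian with the observed variance σ_y², and the prior probability of clipping is taken from the percentile of τ (for example, P75 is read as 25% of samples clipped). The names `gaussian_pdf` and `is_clipped` are hypothetical.

```python
import math

def gaussian_pdf(v, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def is_clipped(y, tau, tau_percentile, var_w, var_y):
    """Classify one observed sample y as clipped (True) or not (False)."""
    # Assumed prior: the fraction of samples whose magnitude exceeds tau.
    p_clip = 1.0 - tau_percentile / 100.0
    # Assumed clipped model: the clean sample sat at +tau or -tau, then noise was added.
    like_clip = 0.5 * (gaussian_pdf(y, tau, var_w) + gaussian_pdf(y, -tau, var_w))
    # Assumed unclipped model: zero-mean Gaussian with the observed variance.
    like_not = gaussian_pdf(y, 0.0, var_y)
    # Comparing posteriors is equivalent to comparing prior * likelihood.
    return p_clip * like_clip > (1.0 - p_clip) * like_not


# Usage: tau = 0.07 at the 75th percentile, noise std 0.01, observed variance 0.002.
near_threshold = is_clipped(0.07, 0.07, 75, 1e-4, 2e-3)
near_zero = is_clipped(0.0, 0.07, 75, 1e-4, 2e-3)
```

Under these assumptions, samples that land close to ±τ are labeled clipped and samples near zero are not; the decision boundary moves with the noise variance and with the percentile of τ.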

Page 60

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

[Figure: probability density versus amplitude (−0.2 to 0.2) for the "clipped" and "not clipped" classes; speech clipped at τ = 0.07 and added to noise at 15-dB SNR]

Page 61

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

[Figure: mean classification accuracy versus SNR (dB), 0 to 24 dB; curves for τ = P95, τ = P75, τ = P55, τ = P35]

Page 62

Robust Estimation of Distortion (RED)

• Given a received speech signal, how does one determine whether declipping (BAR) or decompression (BAN) needs to be performed?

[Flowchart: Receive audio → Is audio exposed to DRC? If no → Extract features. If yes → Is audio clipped? If yes → Apply BAR; if no → Apply BAN. Then → Extract features]
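The decision flow on this slide can be written as a short dispatch routine. This is only a structural sketch: the detectors and the BAR/BAN routines are passed in as callables, and every name here (`red_front_end`, `is_drc`, `is_clipped`, `apply_bar`, `apply_ban`, `extract_features`) is a hypothetical placeholder rather than an interface from the thesis.

```python
def red_front_end(audio, is_drc, is_clipped, apply_bar, apply_ban, extract_features):
    """Route audio through declipping or decompression before feature extraction."""
    if is_drc(audio):
        if is_clipped(audio):
            # Clipped audio is repaired by amplitude reconstruction (BAR).
            audio = apply_bar(audio)
        else:
            # Compressed but unclipped audio is repaired by amplitude normalization (BAN).
            audio = apply_ban(audio)
    # Audio with no detected DRC goes straight to feature extraction.
    return extract_features(audio)


# Usage with stand-in callables that only record which branch ran.
calls = []
features = red_front_end(
    [0.1, -0.2],
    is_drc=lambda a: True,
    is_clipped=lambda a: False,
    apply_bar=lambda a: calls.append("BAR") or a,
    apply_ban=lambda a: calls.append("BAN") or a,
    extract_features=lambda a: ("features", a),
)
```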

Page 63

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

[Diagram: stages inside "Apply BAR": Voice activity detection → Estimation of noise variance → Estimation of τ percentile → Clipped sample estimation → Declipping]

Page 64

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

• Experiment 2 (additive noise):

[Figure: two panels (τ = P75, τ = P95) of word error rate (%) versus SNR (dB), 5 to 20 dB; curves: no declipping, RBAR, CBAR, Kitic-IHT]

The location of all clipped samples is assumed known.

Page 65

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

• Experiment 2 (additive noise):

[Figure: two panels (τ = P75, τ = P95) of word error rate (%) versus SNR (dB), 5 to 20 dB; curves: no declipping, RBAR, CBAR, Kitic-IHT]

Clipping occurrence and location are detected using RED techniques.

Page 66

Clipped sample estimation (Harvilla & Stern; ICASSP 2015)

• Experiment 2 (additive noise):

[Figure: two panels (τ = P75, τ = P95) of word error rate (%) versus SNR (dB), 5 to 20 dB; curves: no declipping, RBAR, CBAR, Kitic-IHT; each panel is annotated with the clipped signal detection accuracy]

Clipping occurrence and location are detected using RED techniques.

Page 67

Artificially-Matched Training (AMT)

• So far, the developed techniques have sought to repair clipped, compressed and noisy speech to “look like” clean speech:

[Diagram: noisy observations → compensation → clean models]

Page 68

Artificially-Matched Training (AMT)

• Ultimately, what matters is that the acoustic model and the testing data conditions match. Neither needs to be “clean.”

[Diagram: noisy observations → noisy models]

Page 69

Artificially-Matched Training (AMT)

• Experiment 1 (no additive noise):

[Figure: word error rate (%) versus τ, threshold (percentile), 15 to 95; curves for R = ∞, 20, 10, 6, 4, 2, 1; panel title: Clean training]

Page 70

Artificially-Matched Training (AMT)

• Experiment 1 (no additive noise):

[Figure: word error rate (%) versus τ, threshold (percentile), 15 to 95; curves for R = ∞, 20, 10, 6, 4, 2, 1; panel title: DRC-matched training]

Page 71

Artificially-Matched Training (AMT)

• One approach to achieving this in practice:

Artificially-Matched Training with Acoustic Model Selection (AMT-AMS)

[Diagram: x_n → MFCC → ASR → WER; a regression on the DRC parameters {R, τ} selects from a bank of acoustic models {R0, τ0}, {R1, τ1}, …, {Rk-1, τk-1}]

Current implementation uses the following parameter sets:
R = {∞}
τ = {P15, P35, P55, P75, P95}
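As a rough illustration of the model-selection step, the sketch below picks the acoustic model whose training threshold is closest to an estimated τ percentile. The nearest-neighbor rule and the names `MODEL_BANK_PERCENTILES` and `select_acoustic_model` are assumptions for illustration; the slides state only that a regression on the DRC parameters drives the selection from the bank.

```python
# Thresholds of the acoustic models in the bank (R = infinity for all of them),
# taken from the parameter sets listed on the slide.
MODEL_BANK_PERCENTILES = (15, 35, 55, 75, 95)

def select_acoustic_model(tau_percentile_estimate, bank=MODEL_BANK_PERCENTILES):
    """Return the bank percentile nearest to the estimated tau percentile."""
    return min(bank, key=lambda p: abs(p - tau_percentile_estimate))


# Usage: an utterance estimated to be clipped at the 80th percentile
# would be decoded with the model trained at P75.
chosen = select_acoustic_model(80)
```

A finer bank would reduce the mismatch between the estimated and trained conditions, at the cost of training and storing more models.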

Page 72

Artificially-Matched Training (AMT)

• Experiment 1 (no additive noise):

[Figure: word error rate (%) versus τ, threshold (percentile), 15 to 95; curves for R = ∞, 20, 10, 6, 4, 2, 1; panel title: Clean training]

Page 73

Artificially-Matched Training (AMT)

• Experiment 1 (no additive noise):

[Figure: word error rate (%) versus τ, threshold (percentile), 15 to 95; curves for R = ∞, 20, 10, 6, 4, 2, 1; panel title: AMT-AMS]

Page 74

The Big Picture

• With no knowledge of the noise conditions and characteristics of the incoming speech, how well does the combination of algorithms from the thesis work in practice?

[Diagram: x[n] → compress with probability pc → + w[n] (add noise with probability pn) → y[n]]

τ drawn uniformly in [τ0, τ1]
R drawn from Gamma dist., [kR, θR]
SNR in dB drawn from N(µ, σ²)

pc = 0.9, τ0 = 60, τ1 = 98, pn = 0.75, µ = 20, σ² = 25, k = 3, θ = 2

Compression
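The randomized test condition described on this slide can be sketched as below. Several details are assumptions rather than facts from the slides: the compressor is written as a hard-knee DRC (samples above τ are scaled by 1/R, so R = ∞ gives clipping), τ0 and τ1 are read as percentiles of the absolute amplitude, the noise is white Gaussian, and the SNR is measured against the signal after compression. How draws of R below 1 were treated is not stated, so they are left as drawn. The names `drc` and `simulate_condition` are hypothetical.

```python
import numpy as np

def drc(x, tau, ratio):
    """Hard-knee compressor (assumed form); ratio = inf reduces to clipping."""
    mag = np.abs(x)
    over = np.maximum(mag - tau, 0.0)          # amount by which |x| exceeds tau
    return np.sign(x) * (np.minimum(mag, tau) + over / ratio)

def simulate_condition(x, rng, pc=0.9, tau_lo=60.0, tau_hi=98.0,
                       pn=0.75, mu=20.0, var=25.0, k=3.0, theta=2.0):
    """Randomly compress x and add noise, using the slide's parameter values."""
    y = np.asarray(x, dtype=float).copy()
    if rng.random() < pc:                       # compress with probability pc
        percentile = rng.uniform(tau_lo, tau_hi)
        tau = np.percentile(np.abs(y), percentile)
        ratio = rng.gamma(shape=k, scale=theta)
        y = drc(y, tau, ratio)
    if rng.random() < pn:                       # add noise with probability pn
        snr_db = rng.normal(mu, np.sqrt(var))
        noise_power = np.mean(y ** 2) / (10.0 ** (snr_db / 10.0))
        y = y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y


# Usage: Laplacian samples stand in for one second of 16 kHz speech.
rng = np.random.default_rng(0)
x = rng.laplace(scale=0.05, size=16000)
y = simulate_condition(x, rng)
clipped_example = drc(np.array([-1.0, 0.2, 1.0]), 0.5, np.inf)
```

Passing `np.inf` as the ratio for every utterance, instead of drawing R from the Gamma distribution, gives the clipping-only variant of the same setup.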

Page 75

The Big Picture

Compression

[Figure: word error rate (%) for each compensation method: none, RBAN, BAN]

Page 76

The Big Picture

• With no knowledge of the noise conditions and characteristics of the incoming speech, how well does the combination of algorithms from the thesis work in practice?

[Diagram: x[n] → clip with probability pc → + w[n] (add noise with probability pn) → y[n]]

τ drawn uniformly in [τ0, τ1]
SNR in dB drawn from N(µ, σ²)

pc = 0.9, τ0 = 60, τ1 = 98, pn = 0.75, µ = 20, σ² = 25

Clipping

Page 77

The Big Picture

Clipping

[Figure: word error rate (%) for each compensation method: none, RBAR, CBAR, Kitic-IHT, AMT-AMS, AMT-AMS (RBAR)]

Page 78

Summary & Conclusions

• A previously-unexplored problem in speech recognition, DRC, was introduced.
• Novel solutions to the two primary aspects of the problem, clipping and compression, were developed.
• Techniques for detecting the occurrence of DRC were considered.
• A comprehensive solution to DRC for speech recognition was proposed.
• DRC, especially in noise, is a very hard problem, but this thesis lays the groundwork for very promising future research.

Page 79

Summary & Conclusions

• Areas of future research include:
  • Improving target amplitude estimates for RBAR [BAR]
  • Improving the robustness of BAR methods to additive noise [BAR]
  • Improving the robustness of clipped/compressed signal detection to low-valued SNR and τ [RED, Big Picture]
  • Development of an R-estimation algorithm [RED, Big Picture]
  • Further investigation of the performance of AMT-AMS with an increasing granularity of acoustic model references [AMT]

Page 80

Thank you!

• Questions?

