
Digital Signal Processors ET1304

Projects∗

Benny Sällberg, Ph.D.
Department of Electrical Engineering

Blekinge Institute of Technology
SE-371 75 Karlskrona, Sweden

E-mail: [email protected]

November 3, 2010

Abstract

The projects in the course Digital Signal Processors ET1304 focus this year on various topics in speech signal processing. Example projects include noise reduction in speech signals, dynamic range compression, and automatic gain control. Four main project groups are used as umbrella projects. All projects within a main project group use the same underlying platform, but each project implements a unique speech signal processing method. It is recommended that all projects within the same main project group collaborate on the development of the common platform, and it is required that each group implement and report their unique method individually.

Contents

1 Introduction
  1.1 Filter banks
  1.2 Analysis-synthesis
  1.3 Decimation
  1.4 Modulation
  1.5 Perfect reconstruction
  1.6 Subband processing

2 Filter Bank Topologies
  2.1 FIR filter bank
  2.2 IIR filter bank
  2.3 WOLA filter bank
  2.4 Uniform FFT modulated filter bank

3 Speech Processing Methods
  3.1 Adaptive Gain Equalizer Noise Suppressor
  3.2 Spectral Subtraction Noise Suppressor
  3.3 Wiener Filter Noise Suppressor
  3.4 Adaptive Gain Equalizer Speech Signal Suppressor
  3.5 Spectral Subtraction Speech Signal Suppressor
  3.6 Wiener Filter Speech Signal Suppressor
  3.7 Automatic Gain Control
  3.8 Calibrated Parametric Equalizer

4 Project groups

5 Project report and presentation
  5.1 Method Evaluation
  5.2 Project presentation
  5.3 Project report

A WOLA filter bank reference implementation
  A.1 Example initialization
  A.2 Analysis
  A.3 Synthesis

B Uniform FFT modulated filter bank reference implementation
  B.1 Example initialization
  B.2 Analysis
  B.3 Synthesis

C Voice activity detection
  C.1 Example initialization
  C.2 VAD function

D Implementation Issues and Hints
  D.1 Use MATLAB test vectors
    D.1.1 Example of test vectors
  D.2 Analysis filter bank verification
  D.3 Analysis-synthesis filter bank verification
  D.4 Estimating DSP processing load
  D.5 Recursive Averages
    D.5.1 Example 1: Power and magnitude spectrum estimation
    D.5.2 Example 2: Background noise spectrum estimation

∗Document version 1.1, November 3, 2010


1 Introduction

Modern speech processing methods are usually implemented in the time-frequency domain. In the time-frequency domain a signal is represented not only as a function of time, which is the normal representation, but also as a function of frequency. The time-frequency representation can be obtained by filtering the input time signal through a bank of bandpass filters with very little mutual overlap in frequency; see the example in fig. 1. Structures that transform a signal from the time domain to the time-frequency domain are usually referred to as filter banks (see, e.g., [1]). The transformed signals are denoted subband signals, since each of them describes a sub-band of the original signal. Signal processing methods are generally more efficient when implemented using filter banks, since the processing can be carried out in parallel for every subband. Filter bank processing can be viewed as a divide-and-conquer approach to signal processing, since a large problem is sub-divided into many smaller problems. Because filter banks are essential components of speech processing, and of signal processing in general, they are the focus of this year's course projects.

Figure 1: A bank of eight bandpass filters h_k[n], with Fourier transforms H_k[f] = F{h_k[n]}, comprises a filter bank. The plot shows 10 log|H_k[f]|^2 versus normalized frequency f for H_0[f], . . . , H_7[f].

1.1 Filter banks

Assume a sampled (and quantized) input signal x[n], such as the signal from the Analog-to-Digital Converter (ADC) of a sound codec connected to a DSP. The signal is uniformly sampled at the frequency Fs Hz, and we shall assume, for clarity, that the signal is represented using floating point arithmetic and that it takes values between -1 and 1. Now, a filter bank uses a set of K bandpass filters, where the impulse response of each bandpass filter is h_k[n], for k = 0 . . . K-1, where index k = 0 corresponds to the DC frequency and index k = K/2 corresponds to the Nyquist frequency. The filter bank output signals are the convolutions of the input signal with each of the bandpass filters, i.e., x_k[n] = (h_k ∗ x)[n], where ∗ denotes the linear convolution operator. Each filter bank output signal x_k[n] is referred to as a subband signal, since it describes one sub-band of the signal. It should be clear that the signal set {x_0[n], . . . , x_{K-1}[n]} is a time-frequency signal, since the index n represents time and the index k represents the sub-band index, which corresponds to a unique frequency band.


Figure 2: Analysis-synthesis filter bank, where the subband signals x_k[n] from the analysis are processed by subband processors g_k to yield subband output signals y_k[n] = g_k{x_k[n]}, to be reconstructed in the synthesis. Note that if the subband processors are filters, the subband output signals are computed using the ordinary convolution y_k[n] = (g_k ∗ x_k)[n].

1.2 Analysis-synthesis

The part of the filter bank that transforms a time signal into a corresponding time-frequency representation is referred to as an analysis filter bank. There exists a corresponding inverse transform that transforms a time-frequency signal, e.g., x_k[n], into a time signal, e.g., x[n]. The part of the filter bank that implements this inverse transform (which is also known as reconstruction) is referred to as a synthesis filter bank. An analysis-synthesis filter bank is illustrated in fig. 2.

1.3 Decimation

It may be noted that, in the definition above, we have a factor K over-representation of the signal, since the subband signals have the same sampling frequency as the original signal while each of them represents only 1/K of its frequencies. Hence, it is common practice to decimate the subband signals by a certain factor D, i.e., the analysis filter bank output is decimated as x_k[nD] = (h_k ∗ x)[nD]. The factor D is referred to as the decimation rate. The decimation rate is related to the number of subbands K by the over-sampling ratio O = K/D. The over-sampling ratio describes the redundancy in the subband data in relation to the original input time-signal data. If the over-sampling ratio is equal to one, which means that K = D, then we say that the filter bank is critically sampled or critically decimated. Normally, we allow a certain redundancy in the filter bank, e.g., by a factor of O = 2, which we refer to as a factor-two over-sampled filter bank. The decimation in the analysis filter bank is naturally compensated by a corresponding interpolation in the synthesis filter bank. Otherwise, we would change the sample rate from the input of the analysis filter bank to the output of the synthesis filter bank. Note that after the interpolation, it is required to filter the interpolated signal in order to suppress aliasing terms due to the interpolation process.
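As a small worked example of these relations (the numbers below are assumed for illustration only and are not tied to any specific project platform):

% relation between the number of subbands, decimation rate and over-sampling ratio
K = 32;        % number of subbands (assumed value)
D = 16;        % decimation rate (assumed value)
O = K/D;       % over-sampling ratio: here O = 2, a factor-two over-sampled filter bank
% choosing D = K (i.e., D = 32) would instead give O = 1, the critically sampled case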

1.4 Modulation

Some filter bank topologies construct the bandpass filters h_k[n] by modulating (shifting in frequency) a prototype low pass filter p[n] through multiplication by a complex sinusoid, h_k[n] = e^{j 2π k n / K} p[n]; note that h_0[n] = p[n]. The modulation of the prototype filter into the filter bank bandpass filters leads, after some mathematical manipulation, to a highly compact and efficient structure, suitable for DSP implementation. Filter banks that use modulated bandpass filters are referred to as modulated filter banks, but they are merely specialized cases of the more general filter bank definition above.
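A minimal MATLAB sketch of this modulation is given below; the number of subbands K, the prototype length N, and the fir1-based prototype design are assumptions made only for the illustration:

% construct K modulated (complex) bandpass filters h_k[n] from a prototype p[n]
K = 8;                                   % number of subbands (assumed)
N = 64;                                  % prototype filter length (assumed)
p = fir1(N-1, 1/K);                      % lowpass prototype p[n]
n = 0:N-1;
h = zeros(K, N);                         % row k+1 holds h_k[n]
for k = 0:K-1
    h(k+1,:) = exp(j*2*pi*k*n/K) .* p;   % h_k[n] = e^(j*2*pi*k*n/K) * p[n]
end
% note that h(1,:) equals p, i.e., h_0[n] = p[n]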

1.5 Perfect reconstruction

Denote by HA{·} the analysis filter bank operation and by HS{·} the synthesis filter bank operation. Then, if a certain filter bank yields x[n] = HS{HA{x[n]}}, we say that the filter bank provides perfect reconstruction. Perfect reconstruction is normally required when compressing sound and speech, but it may not be required when implementing adaptive or time-variant methods, such as the methods in the projects of this course. Hence, we shall allow non-perfect reconstruction in our filter banks, i.e., we require only that they provide an approximate reconstruction x[n] ≈ HS{HA{x[n]}}.

1.6 Subband processing

In the sequel, we shall assume a system model according to fig. 2, where the filter bank analysis stage produces a set of subband signals x_k[n] from a time signal x[n]. The subband signals are subject to subband processors, denoted g_k, that compute the subband output signals y_k[n] = g_k{x_k[n]}. The filter bank synthesis stage transforms the output subband signals y_k[n] into a time representation y[n]. The general work flow when using filter banks is as follows:

1. Filter bank analysis of x[n] to yield x_k[n],

2. subband processing of x_k[n] to yield subband output signals y_k[n] = g_k{x_k[n]}, where g_k are subband processors,

3. filter bank synthesis of y_k[n] to yield y[n],

4. update filter bank states.

To this point, we have not discussed the details of the subband processors g_k; we have only assumed that they provide a mapping from x_k[n] to y_k[n]. The exact definition of g_k will become clear in section 3, where the signal processing methods are defined.
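A minimal MATLAB sketch of this work flow is given below. The block length D, the test input, and the function names analysis, process_subbands and synthesis are placeholders (assumptions), standing in for whichever filter bank and subband processors a project uses:

% generic block-based filter bank processing loop (placeholder function names)
D = 16;                                   % samples per input block (assumed)
x = randn(1, 8000);                       % test input signal (assumed)
y = zeros(size(x));
for blk = 1:floor(length(x)/D)
    idx    = (blk-1)*D + (1:D);
    xblock = x(idx);                      % new input block
    xk     = analysis(xblock);            % 1. filter bank analysis of x[n]
    yk     = process_subbands(xk);        % 2. subband processors g_k
    y(idx) = synthesis(yk);               % 3. filter bank synthesis
end                                       % 4. filter bank states are updated inside the calls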

2 Filter Bank Topologies

Filter banks exist in numerous configurations today (see, e.g., [1]). We shall focus on four filter bank topologies, namely:

1. FIR filter bank (2.1),

2. IIR filter bank (2.2),

3. WOLA filter bank (2.3),

4. FFT modulated filter bank (2.4).

2.1 FIR filter bank

A FIR filter bank uses a bank of equal-length FIR bandpass filters in the analysis stage. Let h_k[n] denote the impulse response of each bandpass filter, where k = 0 . . . K-1 refers to the subband index and the filters are defined for n = 0 . . . N-1, where N is the filter length. Then each subband signal x_k[n] is computed using the convolution sum

x_k[n] = (x ∗ h_k)[n] = \sum_{l=0}^{N-1} x[n-l] h_k[l].   (1)

The convolution sum above represents the analysis stage of a FIR filter bank. The K bandpass filters can be designed in MATLAB using the fir1 function, and the coefficients of the designed filters are stored in the DSP memory. After the subband processor outputs are computed, i.e., y_k[n] = g_k{x_k[n]}, the synthesis stage of a FIR filter bank simply sums the subband output signals y_k[n] to yield the output signal y[n] according to

y[n] = \sum_{k=0}^{K-1} y_k[n].   (2)

The structure of an analysis-synthesis FIR filter bank is illustrated in figure 3. Note that we have no decimation in this filter bank structure.

Figure 3: Analysis-synthesis FIR filter bank structure. Note that the subband selective analysis bandpass filters h_k[n] are FIR filters.
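A minimal MATLAB sketch of such an analysis-synthesis FIR filter bank (no decimation, trivial subband processors g_k = 1) is given below; the number of bands, the filter length, the band edges and the test signal are assumptions for illustration only:

% FIR filter bank: analysis by bandpass filtering, synthesis by summation
K = 8;  N = 129;                       % number of bands and filter length (assumed)
x = randn(1, 4096);                    % test input signal (assumed)
edges = linspace(0, 1, K+1);           % band edges on the normalized frequency axis
h = zeros(K, N);
h(1,:) = fir1(N-1, edges(2), 'low');               % band 0: lowpass
h(K,:) = fir1(N-1, edges(K), 'high');              % band K-1: highpass
for k = 2:K-1
    h(k,:) = fir1(N-1, [edges(k) edges(k+1)]);     % bandpass filters
end
y = zeros(size(x));
for k = 1:K
    xk = filter(h(k,:), 1, x);         % subband signal x_k[n], eq. (1)
    yk = xk;                           % y_k[n] = g_k{x_k[n]} with g_k = 1
    y  = y + yk;                       % synthesis, eq. (2): y[n] = sum over k of y_k[n]
end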

2.2 IIR filter bank

An IIR filter bank is very similar to a FIR filter bank in that it also uses a bank of bandpass filters in the analysis stage. However, the bandpass filters are now IIR filters described by the rational transfer function

H_k[z] = B_k[z] / A_k[z] = (b_0 + b_1 z^{-1} + . . . + b_M z^{-M}) / (1 + a_1 z^{-1} + . . . + a_P z^{-P}),   (3)

where b_m are the feedforward (FIR) coefficients and a_p are the feedback (IIR) coefficients. One example of an IIR filter bank using eight second-order sections is illustrated in figure 4. Each subband signal x_k[n] is computed using the difference equation

x_k[n] = -\sum_{p=1}^{P} a_p x_k[n-p] + \sum_{m=0}^{M} b_m x[n-m].   (4)

The design of the K IIR bandpass filters can be done in MATLAB, and the a_p and b_m polynomial coefficients of the designed filters are stored in the DSP memory. After the subband processor outputs are computed, i.e., y_k[n] = g_k{x_k[n]}, the synthesis stage of an IIR filter bank simply sums the subband output signals y_k[n] to yield the output signal y[n] according to

y[n] = \sum_{k=0}^{K-1} y_k[n].   (5)

The structure of an analysis-synthesis IIR filter bank is illustrated in figure 5. Note that we have no decimation in this filter bank structure.
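A minimal MATLAB sketch using the second-order sections from the caption of figure 4 is given below; the pole radius r, the center frequencies θ_k, and the test signal are assumed values:

% IIR filter bank using second-order sections
% H_k[z] = (1 - z^-2) / (1 - 2 r cos(theta_k) z^-1 + r^2 z^-2)
K = 8;                                  % number of bands (assumed)
r = 0.98;                               % pole radius, sets the -3 dB bandwidth (assumed)
theta = pi*((0:K-1) + 0.5)/K;           % band center frequencies (assumed)
x = randn(1, 4096);                     % test input signal (assumed)
y = zeros(size(x));
for k = 1:K
    b  = [1 0 -1];                      % numerator coefficients b_m
    a  = [1 -2*r*cos(theta(k)) r^2];    % denominator coefficients a_p
    xk = filter(b, a, x);               % subband signal x_k[n], eq. (4)
    yk = xk;                            % trivial subband processor g_k = 1
    y  = y + yk;                        % synthesis, eq. (5)
end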


Figure 4: Example of an IIR filter bank using eight bands. The IIR bandpass filters are implemented, in this example, using second-order sections of the form H_k[z] = (1 - z^{-2}) / (1 - 2 r cos(θ_k) z^{-1} + r^2 z^{-2}), where r and θ_k determine the -3 dB bandwidth and center frequency of each bandpass filter, respectively.

2.3 WOLA filter bank

The Weighted OverLap Add (WOLA) filter bank is an efficient filter bank technique frequently used in DSPs, see, e.g., [2, 3]. Four variables, together with an analysis window function w[n], define the WOLA filter bank, namely: L, the length of the analysis window; D, the decimation rate (block rate); K, the number of subbands; and DF, the synthesis window decimation rate.

The analysis stage, see figure 6, accepts a block of D new input data samples. Each new block is fed into an input FIFO buffer u[n] of length L samples. The data in the input FIFO is element-wise weighted by the analysis window function and stored into a temporary buffer t1[n] = u[n] · w[n], of length L samples. The temporary buffer is time-folded into another temporary vector t2[n], of length K samples. The time-folding means that the elements of t1[n] are modulo-K-added to t2[n], according to

t_2[n] = \sum_{m=0}^{L/K-1} t_1[n + mK].   (6)

The temporary buffer t2[n] is circularly shifted by K/2 samples in order to produce a zero-phase signal for the FFT. This means that the upper half of t2[n] is swapped with the lower half of t2[n]. The circularly shifted buffer t2[n] is then fed into a K-sized FFT to compute the subband signals x_k[n].

The synthesis stage of the WOLA filter bank implements the actual weighted overlap-add procedure, i.e., the WOLA procedure. The synthesis stage, see figure 7, starts by applying a size-K IFFT to the processed subband signals y_k[n]. The IFFT output is circularly shifted K/2 samples, to counteract the circular shift used in the analysis stage, and the circularly shifted data is stored in a temporary buffer t3[n] of size K samples. The buffer t3[n] is then stacked, by repetition, in the buffer t4[n] of length L/DF, where L is the analysis window length and DF is the synthesis window decimation factor. The buffer t4[n] is weighted by a synthesis window function z[n] of size L/DF, defined as z[n] = w[nDF], i.e., a factor-DF decimated analysis window function. The weighted data is summed with the data in the output FIFO t5[n] of length L/DF, and the output FIFO data is overwritten with the summation result, i.e., t5[n] ← t5[n] + z[n] · t4[n]. The output FIFO is then shifted left by D samples, i.e., D zeros are filled in from the FIFO's rear, and the shifted-out data is the actual output data block, y[nD].


Figure 5: Analysis-synthesis IIR filter bank structure. Note that the subband selective analysis bandpass filters H_k[z] are IIR filters.

The WOLA filter bank is popular due to its simplicity and since many DSP technologies have built-in support for circular buffers and FIFOs, which of course helps when implementing the WOLA filter bank. Also, DSPs usually have direct support for FFT/IFFT, which makes the implementation even more efficient.

Due to the complexity of this filter bank structure, a reference MATLAB implementation of the WOLA filter bank is provided in appendix A.

2.4 Uniform FFT modulated filter bank

We use a set of K bandpass filters, H_k[z] for k = 0 . . . K-1, with impulse response functions h_k[n], each of length N taps. These bandpass filters are created by modulating (frequency shifting) a low-pass prototype filter (which is equal to the first bandpass filter at DC frequency, i.e., h_0[n]), according to

h_k[n] = W_K^{-kn} h_0[n],   for n = 0 . . . N-1,   (7)

where W_K = e^{-j 2π/K}. The Z-transform of each modulated bandpass filter is

H_k[z] = \sum_{n=0}^{\infty} h_k[n] z^{-n} = \sum_{n=0}^{\infty} W_K^{-kn} h_0[n] z^{-n} = \sum_{n=0}^{\infty} h_0[n] (W_K^{k} z)^{-n} = H_0[W_K^{k} z].   (8)

So, it is clear that each bandpass filter H_k[z] = H_0[W_K^{k} z] is a frequency shifted (modulated) version of the prototype filter H_0[z]. The notation "uniform" in the section title comes from the fact that the filters are uniformly distributed on the frequency axis during the modulation process. There are also non-uniform modulation techniques, which are not described here.

We shall now assume that the bandpass signals, i.e., the input signal filtered by each modulated bandpass filter, are subject to decimation by a factor D, where D = K/O, and O denotes the over-sampling ratio. A filter followed by a decimator can be implemented using a polyphase decomposition, see, e.g., [1]. The corresponding polyphase implementation is achieved by dividing the prototype filter H_0[z] into D polyphase components, denoted P_d[z] with impulse response functions p_d[n] for d = 0 . . . D-1, according to

H_0[z] = \sum_{d=0}^{D-1} z^{-d} P_d[z^D],   (9)

p_d[n] = h_0[d + nD],   (10)

P_d[z] = \sum_{n=0}^{\infty} p_d[n] z^{-n} = \sum_{n=0}^{\infty} h_0[d + nD] z^{-n}.   (11)


Figure 6: Analysis stage of a WOLA filter bank structure (reprinted from [3]).

Hence, inserting (9) and (11) into (8) yields a polyphase representation of each modulated bandpass filter H_k[z] as follows:

H_k[z] = H_0[W_K^{k} z] = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} P_d[W_K^{kD} z^D]
       = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} \sum_{n=0}^{\infty} h_0[d + nD] (W_K^{kD} z^D)^{-n}
       = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} \sum_{n=0}^{\infty} h_0[d + nD] W_K^{-nkD} z^{-nD}
       = \sum_{d=0}^{D-1} W_K^{-dk} z^{-d} \sum_{n=0}^{\infty} h_0[d + nD] W_O^{-nk} z^{-nD},   (12)

where we used that W_K^{-nkD} = W_O^{-nk}. The factor W_O^{-nk} is used to introduce O groups of subband polyphase components, where each group is described by the index g = 0 . . . O-1. Each group g contains the subbands with indices k = g + lO for l = 0 . . . D-1. Considering only the indices k belonging to group g, i.e., k = g + lO, yields W_O^{-nk} = W_O^{-n(g+lO)} = (W_O^{-g})^{n}. Each subband component of group g


Figure 7: Synthesis stage of a WOLA filter bank structure (reprinted from [3]).

is expressed as

H_{g+lO}[z] = \sum_{d=0}^{D-1} W_K^{-d(g+lO)} z^{-d} \sum_{n=0}^{\infty} h_0[d + nD] W_O^{-n(g+lO)} z^{-nD}
            = \sum_{d=0}^{D-1} W_D^{-dl} z^{-d} \sum_{n=0}^{\infty} q_{g,d}[n] z^{-nD}
            = \sum_{d=0}^{D-1} W_D^{-dl} z^{-d} Q_{g,d}[z^D]
            = ( W_D^{-0·l}, . . . , W_D^{-(D-1)l} ) ( z^{-0} Q_{g,0}[z^D], . . . , z^{-(D-1)} Q_{g,D-1}[z^D] )^T,   (13)

where we used the fact that W_K^{-dlO} = W_D^{-dl}. We denote the filters q_{g,d}[n] = W_K^{-dg} (W_O^{-g})^{n} h_0[d + nD] as the d-th modified polyphase components of group g, and they have the Z-transforms Q_{g,d}[z] = \sum_{n=0}^{\infty} q_{g,d}[n] z^{-n}. We now stack all subbands related to the g-th group, i.e., the subbands with indices k = g + lO for l = 0 . . . D-1, in a vector H_g[z] according to

H_g[z] = ( H_g[z], H_{g+O}[z], . . . , H_{g+(D-1)O}[z] )^T
       = [ W_D^{-dl} ]_{l,d = 0 . . . D-1} ( z^{-0} Q_{g,0}[z^D], . . . , z^{-(D-1)} Q_{g,D-1}[z^D] )^T
       = D W_D^{-1} ( z^{-0} Q_{g,0}[z^D], . . . , z^{-(D-1)} Q_{g,D-1}[z^D] )^T,   (14)

where W_D is identified as the D-th order discrete Fourier transform (DFT) matrix, i.e., W_D^{-1} is the D-th order inverse discrete Fourier transform (IDFT) matrix. Note that the subband indices in each group are reordered and thus need to be rearranged in increasing order after each analysis step.

Note that the size of the DFT/IDFT operations is D, the decimation rate. This filter bank structure requires that a block of D new data samples is input to the filter bank for each update of the filter bank states. Hence, the sampling rate of the subband signals is F_S/D, where F_S is the sampling frequency of the fullband signal x[n]. If D is a power of two, e.g., D = 2^p for an integer p > 0, then the DFT/IDFT operation is efficiently implemented using the fast Fourier transform (FFT) and its inverse, the IFFT. Many DSPs have built-in support for FFT/IFFT, which may be exploited to yield an efficient implementation.

The analysis stage of this filter bank is illustrated in figure 8.

Figure 8: Uniform FFT modulated analysis filter bank.

So far, only the analysis filter bank has been discussed. The synthesis is derived in a similar fashion as the analysis, but its derivation is left as an exercise for the interested student. The synthesis of this filter bank is illustrated in figure 9. It may, however, be noted that the major difference between analysis and synthesis lies in the filters q_{g,d}[n] of the analysis, which are replaced by r_{g,d}[n] in the synthesis. These filters r_{g,d}[n] are also referred to as the d-th modified polyphase components of group g, but they are based on a different prototype filter.

Figure 9: Uniform FFT modulated synthesis filter bank.

It is also noted that the subband signals of the FFT modulated filter bank are complex valued, which may require special attention in some implementations.

Due to the complexity of this filter bank structure, a reference MATLAB implementation of the uniform FFT modulated filter bank is provided in appendix B.

3 Speech Processing Methods

Each filter bank structure, see section 2, will be used as a foundation to implement eight different speech processing methods, described next. Hence, in total there are 4 × 8 = 32 unique projects in this course.

We have three groups of methods: noise suppression (sections 3.1, 3.2, 3.3), speech suppression (sections 3.4, 3.5, 3.6), and speech utility functions (sections 3.7 and 3.8). Noise suppression methods enhance speech and suppress noise; they can be used, for instance, to enhance speech in human communication systems such as VoIP or cellular telephony. Speech suppression methods, on the other hand, suppress speech signals in favor of noise, and can be used, for instance, in public address or karaoke systems, or as a support function to generate better noise references for other noise suppression systems. Speech utility functions are used to enhance the part of a signal related to speech.

In the following we shall assume that the received subband signals comprise a mixture of clean speech plus degrading noise, i.e.,

x_k[n] = s_k[n] + v_k[n],   (15)

where s_k[n] denotes clean subband speech and v_k[n] denotes subband noise.

3.1 Adaptive Gain Equalizer Noise Suppressor

The Adaptive Gain Equalizer (AGE) is a simple but effective method to reduce noise in speech signals [4]. The AGE uses the fact that speech can be considered non-stationary compared to the noise. Two energy, or magnitude, measures track the stationary background noise level and the varying speech-plus-noise level. The quotient between these two measures forms a gain function that amplifies non-stationary speech signals, hence the method's name "Adaptive Gain Equalizer". We shall add a minor adjustment to the original method described in [4], where the gain function is now:

g_k[n] = min( 1, ( A_{x;k}[n] / ( L_k · Ā_{x;k}[n] ) )^{p_k} ),   (16)

where L_k, A_{x;k}[n], Ā_{x;k}[n] and p_k are defined in [4]. Since g_k[n] has only one filter tap, the subband output signal is computed using a multiplication, y_k[n] = g_k[n] · x_k[n].

The measures A_{x;k}[n] and Ā_{x;k}[n] can be implemented using recursive averages, see section D.5.

Evaluate the method’s noise suppression performance using sound files from [5, 6].
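The following MATLAB sketch is one possible per-sample interpretation of (16) for a single subband; the smoothing constants, the margin L_k, the exponent p_k, and the rule that keeps the noise-floor average below the short-term average are assumptions, not the reference implementation of [4]. The state variables Ax and AxBar are kept between calls.

% Adaptive Gain Equalizer gain for one subband sample xk
Lk     = 4;        % noise-floor margin (assumed)
pk     = 1;        % gain exponent (assumed)
aShort = 0.05;     % smoothing constant, short-term magnitude average (assumed)
aFloor = 0.0005;   % smoothing constant, slow noise-floor average (assumed)
Ax    = (1 - aShort)*Ax    + aShort*abs(xk);   % short-term average A_{x;k}[n]
AxBar = (1 - aFloor)*AxBar + aFloor*abs(xk);   % slow noise-floor average
AxBar = min(AxBar, Ax);                        % keep the floor below the short-term level (assumed rule)
gk = min(1, (Ax/(Lk*AxBar + eps))^pk);         % gain according to (16)
yk = gk*xk;                                    % subband output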

3.2 Spectral Subtraction Noise Suppressor

The spectral subtraction method is frequently used as a speech enhancement method in industry and research today [7]. The spectral subtraction method assumes that the spectral properties of the noise do not change much from just before speech is active until the speech is active. Hence, if there is no speech present, the input signal is x[n] = v[n] and a spectrum of the background noise level, denoted P_{v;k}[n], is estimated. The spectrum of the input signal, with or without speech, is

P_{x;k}[n] = P_{s;k}[n] + P_{v;k}[n],   (17)

since we assume that noise and speech are uncorrelated, i.e., that P_{sv;k}[n] = 0. We may rewrite this relationship as

P_{s;k}[n] = P_{x;k}[n] - P_{v;k}[n].   (18)

Hence, the spectral subtraction method continuously subtracts the noise spectrum estimate from the input signal spectrum to acquire an approximate speech spectrum, i.e.,

P_{s;k}[n] = max( P_{x;k}[n] - P_{v;k}[n], 0 ),   (19)

where the max-function ensures a non-negative spectrum estimate. The enhanced speech signal is

y_k[n] = P_{s;k}[n]^{1/p} e^{j∠x_k[n]},   (20)

where ∠x_k[n] is the phase of the noisy speech and p determines whether the magnitude (p = 1) or the power spectrum (p = 2) is estimated.

The spectral subtraction method requires a Voice Activity Detector (VAD) that detects whether speech is active in a frame. The VAD decision is used to trigger the background noise spectrum estimation, which should be conducted during speech-silence periods, i.e., when the VAD output is zero. One example implementation of a VAD is given in appendix C.

The power spectra can be estimated using recursive averages, see section D.5.

Evaluate the method’s noise suppression performance using sound files from [5, 6].
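A minimal per-sample MATLAB sketch of (17)-(20) for one subband is given below, using p = 2 (power spectra); the smoothing constant and the use of the VAD decision d from appendix C are assumptions, and Px and Pv are state variables kept between calls:

% spectral subtraction for one subband sample xk, given a VAD decision d (0 or 1)
a = 0.05;  p = 2;                      % smoothing constant and spectrum exponent (assumed)
Px = (1 - a)*Px + a*abs(xk)^2;         % input power spectrum estimate P_{x;k}[n]
if d == 0
    Pv = (1 - a)*Pv + a*abs(xk)^2;     % noise spectrum P_{v;k}[n], updated in speech pauses only
end
Ps = max(Px - Pv, 0);                  % clean speech spectrum estimate, eq. (19)
yk = Ps^(1/p) * exp(j*angle(xk));      % magnitude from the estimate, phase from x_k[n], eq. (20)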


3.3 Wiener Filter Noise Suppressor

The Wiener filter provides an optimal estimate of a signal embedded in additive noise by minimizing the Mean Square Error (MSE),

E{|e_k[n]|^2} = E{|s_k[n] - g_k[n] · x_k[n]|^2},   (21)

where E{·} is the expectation operator. In other words, the Wiener filter gives the optimal estimate of speech given a linear combination (filtering) of speech plus noise. The MSE is expanded as

E{|e_k[n]|^2} = E{|s_k[n]|^2} - 2 g_k[n] · E{|s_k[n]|^2} + g_k[n]^2 · E{|x_k[n]|^2},   (22)

since E{s_k[n] x_k^*[n]} = E{|s_k[n]|^2} + E{s_k[n] v_k^*[n]} = E{|s_k[n]|^2}. The minimum MSE is found where ∂E{|e_k[n]|^2}/∂g_k[n] = 0, which gives

∂E{|e_k[n]|^2}/∂g_k[n] = -2 · E{|s_k[n]|^2} + 2 g_k[n] · E{|x_k[n]|^2} = 0,   (23)

g_k[n] = E{|s_k[n]|^2} / E{|x_k[n]|^2}.   (24)

Replacing the true expectations with their estimates yields the Wiener solution

g_k[n] = P_{s;k}[n] / P_{x;k}[n],   (25)

where we may use the trick from spectral subtraction in order to obtain a clean speech spectrum estimate, i.e., equation (19).

Note that we need a VAD to detect whether speech is active in a frame. The VAD decision is used to trigger the background noise spectrum estimation, which should be conducted during speech-silence periods, i.e., when the VAD output is zero. One example implementation of a VAD is given in appendix C.

The power spectra P_{v;k}[n] and P_{x;k}[n] can be estimated using recursive averages, see section D.5.

Evaluate the method’s noise suppression performance using sound files from [5, 6].
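A corresponding per-sample MATLAB sketch of the Wiener gain (25) is given below, reusing the spectral-subtraction estimate (19) for the clean speech spectrum; the smoothing constant and the VAD usage are assumptions, and Px and Pv are state variables kept between calls:

% Wiener filter gain for one subband sample xk, given a VAD decision d (0 or 1)
a = 0.05;                              % smoothing constant (assumed)
Px = (1 - a)*Px + a*abs(xk)^2;         % P_{x;k}[n] estimate
if d == 0
    Pv = (1 - a)*Pv + a*abs(xk)^2;     % P_{v;k}[n] estimate, updated in speech pauses only
end
Ps = max(Px - Pv, 0);                  % speech spectrum estimate via eq. (19)
gk = Ps/(Px + eps);                    % Wiener gain, eq. (25)
yk = gk*xk;                            % subband output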

3.4 Adaptive Gain Equalizer Speech Signal Suppressor

Modify the adaptive gain equalizer method in section 3.1 so that, instead of suppressing noise, speech is suppressed.

Evaluate speech suppression performance using sound files from [5, 6].

3.5 Spectral Subtraction Speech Signal Suppressor

Modify the spectral subtraction method in section 3.2 so that, instead of suppressing noise, speech is suppressed.

Evaluate speech suppression performance using sound files from [5, 6].

3.6 Wiener Filter Speech Signal Suppressor

Modify the Wiener filter in section 3.3 so that, instead of suppressing noise, speech is suppressed.

Evaluate speech suppression performance using sound files from [5, 6].


3.7 Automatic Gain Control

Automatic Gain Control (AGC) is sometimes referred to as Dynamic Range Compression (DRC). The objective of an AGC function is to expand weak signals and compress loud signals towards a predetermined signal level. This is useful in practice since we then know that an AGC-processed signal has a well-defined magnitude. Also, in some applications, such as hearing protectors, AGC functions are used to hear weak sounds better while protecting against harmful loud sounds.

The AGC is a recursive method that controls a signal gain g_k[n] so that the power of the AGC output signal, i.e., E{|y_k[n]|^2} where y_k[n] = g_k[n] · x_k[n], meets a predefined subband target power P_k. Define an input signal power estimate p_k[n] ≈ E{|x_k[n]|^2}. The output signal power is then E{|y_k[n]|^2} = g_k[n]^2 · E{|x_k[n]|^2} ≈ g_k[n]^2 · p_k[n]. The gain is increased if the output power is less than the target power, and it is decreased if the output power is above the target power. The gain is usually bounded so that G_{min;k} ≤ g_k[n] ≤ G_{max;k}, where G_{min;k} < 1 and G_{max;k} > 1 are system constants. The amount of decrease and increase in the gain value is controlled by the attack and release times T_{A;k} and T_{R;k}, respectively (see appendix D.5 for a definition of how these attack and release times relate to recursive averages). A VAD (see appendix C) is usually used to have the AGC operate only on speech signals. The VAD gives a speech activity decision d[n] that can be zero or one: zero if there is no speech and one if speech is present. The gain for the next input sample is updated as follows:

g_k[n+1] = (1 - λ_{A;k}) g_k[n] + λ_{A;k} G_{min;k},   if g_k[n]^2 · p_k[n] > P_k,
g_k[n+1] = (1 - λ_{R;k}) g_k[n] + λ_{R;k} G_{max;k},   else if g_k[n]^2 · p_k[n] ≤ P_k and d[n] = 1,
g_k[n+1] = g_k[n],                                     otherwise.   (26)

Evaluate the AGC performance using sound files from [5, 6].
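A minimal per-sample MATLAB sketch of the update (26) for one subband is given below; the target power, the gain bounds, the attack/release constants, and the power smoothing are assumed values, and gk and pk are state variables kept between calls:

% AGC for one subband sample xk, given a VAD decision d (0 or 1)
Pk = 0.01;  Gmin = 0.1;  Gmax = 10;         % target power and gain bounds (assumed)
lambdaA = 0.01;  lambdaR = 0.001;           % attack and release constants (assumed)
aP = 0.05;                                  % power smoothing constant (assumed)
yk = gk*xk;                                 % apply the current gain, y_k[n] = g_k[n]*x_k[n]
pk = (1 - aP)*pk + aP*abs(xk)^2;            % input power estimate p_k[n]
if gk^2*pk > Pk
    gk = (1 - lambdaA)*gk + lambdaA*Gmin;   % attack: output power above the target
elseif d == 1
    gk = (1 - lambdaR)*gk + lambdaR*Gmax;   % release: below target and speech active
end                                         % otherwise g_k is left unchanged, eq. (26)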

3.8 Calibrated Parametric Equalizer

A parametric equalizer is a structure that allows the subband gains g_k to be tuned according to a predefined function. The output of a parametric equalizer is y_k[n] = g_k · x_k[n]; note that g_k[n] ≡ g_k, since we expect the gains to be fixed during normal operation. The parametric equalizer can, for instance, provide a large gain for the low subbands to improve the bass (low frequency sounds).

We shall, in this project, have a calibrated parametric equalizer structure where the user can modify the gain settings of the equalizer in an implicit manner. By playing a known sound file, and simultaneously pressing one DSP button, the equalizer estimates the power level of the input sound file. When the user releases the button, the equalizer recomputes the equalizer gain values so that the sound from the sound file is equalized. The calibrated parametric equalizer has two operating modes:

Mode 1: Equalize. This is the default operating mode, where the input signal is simply scaled by the gain values, i.e., y_k[n] = g_k · x_k[n].

Mode 2: Calibrate. During the calibration, a sound file is input to the equalizer and the power spectral density of the input signal is estimated, i.e., P_{x;k}[n] ≈ E{|x_k[n]|^2}. When the calibration ends, the gain values are updated according to g_k = sqrt( P_{T;k} / P_{x;k}[n] ), where P_{T;k} are nominal power spectral density values defined from the system settings (i.e., P_{T;k} needs to be calibrated as well). The gain is bounded as G_{min;k} ≤ g_k ≤ G_{max;k}, where G_{min;k} < 1 and G_{max;k} > 1. When the calibration ends, and the gains are updated, the processing falls back to the default equalize mode.

Note that the user should be able to switch between these two operating modes, and the active mode should be indicated by an LED. On startup, before the user has performed a calibration, the gains should be set to 1.
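A minimal MATLAB sketch of the two modes for one subband is given below; the nominal target PT, the gain bounds, the smoothing constant, and the flags calibrating and calibrationDone (assumed to be set from the DSP button handling) are assumptions, as is the pass-through behaviour during calibration:

% calibrated parametric equalizer, one subband sample xk per call
aP = 0.01;  Gmin = 0.1;  Gmax = 10;  PT = 0.01;       % assumed parameters
if calibrating
    Px = (1 - aP)*Px + aP*abs(xk)^2;                  % Mode 2: estimate P_{x;k} of the calibration sound file
    yk = xk;                                          % pass the signal through during calibration (assumed choice)
else
    if calibrationDone
        gk = min(Gmax, max(Gmin, sqrt(PT/(Px + eps))));   % recompute gain when the button is released
        calibrationDone = 0;
    end
    yk = gk*xk;                                       % Mode 1: equalize
end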


4 Project groups

The projects in the course focus this year on various topics for speech signal processing. Four main project groups are used as umbrella projects. All projects within a main project group use the same underlying platform, but each project implements a unique speech signal processing method. It is recommended that all projects within the same main project group collaborate on the development of the common platform, and it is required that each group implement and report their unique method individually.

Main project group 1
Group  Filter bank  Application
1      FIR (2.1)    Adaptive Gain Equalizer Noise Suppressor (3.1)
2      FIR (2.1)    Spectral Subtraction Noise Suppressor (3.2)
3      FIR (2.1)    Wiener Filter Noise Suppressor (3.3)
4      FIR (2.1)    Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
5      FIR (2.1)    Spectral Subtraction Speech Signal Suppressor (3.5)
6      FIR (2.1)    Wiener Filter Speech Signal Suppressor (3.6)
7      FIR (2.1)    Automatic Gain Control (3.7)
8      FIR (2.1)    Calibrated Parametric Equalizer (3.8)

Main project group 2
Group  Filter bank  Application
9      IIR (2.2)    Adaptive Gain Equalizer Noise Suppressor (3.1)
10     IIR (2.2)    Spectral Subtraction Noise Suppressor (3.2)
11     IIR (2.2)    Wiener Filter Noise Suppressor (3.3)
12     IIR (2.2)    Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
13     IIR (2.2)    Spectral Subtraction Speech Signal Suppressor (3.5)
14     IIR (2.2)    Wiener Filter Speech Signal Suppressor (3.6)
15     IIR (2.2)    Automatic Gain Control (3.7)
16     IIR (2.2)    Calibrated Parametric Equalizer (3.8)

Main project group 3
Group  Filter bank  Application
17     WOLA (2.3)   Adaptive Gain Equalizer Noise Suppressor (3.1)
18     WOLA (2.3)   Spectral Subtraction Noise Suppressor (3.2)
19     WOLA (2.3)   Wiener Filter Noise Suppressor (3.3)
20     WOLA (2.3)   Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
21     WOLA (2.3)   Spectral Subtraction Speech Signal Suppressor (3.5)
22     WOLA (2.3)   Wiener Filter Speech Signal Suppressor (3.6)
23     WOLA (2.3)   Automatic Gain Control (3.7)
24     WOLA (2.3)   Calibrated Parametric Equalizer (3.8)

Main project group 4
Group  Filter bank       Application
25     Modulated (2.4)   Adaptive Gain Equalizer Noise Suppressor (3.1)
26     Modulated (2.4)   Spectral Subtraction Noise Suppressor (3.2)
27     Modulated (2.4)   Wiener Filter Noise Suppressor (3.3)
28     Modulated (2.4)   Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
29     Modulated (2.4)   Spectral Subtraction Speech Signal Suppressor (3.5)
30     Modulated (2.4)   Wiener Filter Speech Signal Suppressor (3.6)
31     Modulated (2.4)   Automatic Gain Control (3.7)
32     Modulated (2.4)   Calibrated Parametric Equalizer (3.8)


5 Project report and presentation

5.1 Method Evaluation

It is expected that each group undertakes a thorough evaluation of their implemented structure, both in MATLAB and by performing measurements on their DSP implementation. Each method section describes how the respective implementation can be evaluated.

It is furthermore expected that the evaluation analyzes relevant system parameters, and that the evaluation tries to find values of each system parameter that provide optimal performance in the respective application. System parameters include filter bank parameters and parameters from each respective speech processing method.

5.2 Project presentation

1. Each group should prepare to orally present, demonstrate and defend their respective implementation.

2. Each group should find a way to demonstrate their solution using a setup that clearly shows the implementation performance.

3. The total time for oral presentation and demonstration, combined, is restricted to 15 min per group.

Observe that each group needs to pass the presentation as well as the report, so be well prepared for the presentation.

5.3 Project report

1. Each group will write a project report of at most 5 pages.

2. All MATLAB and DSP C or assembly code should be submitted together with the report in a compressed zip file.

3. The report disposition and formatting should follow:

• The report should be formatted using the IEEE manuscript template for conference proceedings, see http://www.ieee.org/conferences_events/conferences/publishing/templates.html.

• The front page of the report should show the title of the group project, the full name and personal-id number of all project members, the course code, and the report submission date.

• The report should briefly (max 2 pages) describe the group's implementation, including the filter banks.

• The remaining pages should be used to describe the results from the evaluation, discussions regarding the implementation or improvements made to each respective project, conclusions, and references.

• Note that the report should discuss the MATLAB reference implementation as well as the DSP implementation.

• Relevant figures, tables and equations should have proper labels and they must be referred to from the text, i.e., floating figures that are not referred to from the text are disallowed.

• The amount of source code listed within the report should be kept to a minimum!

4. The report must be submitted electronically by email to the course teacher ([email protected]) before the project presentation.


Observe that reports longer than 5 pages, or reports that do not conform to the manuscript template or the instructions above, will be rejected and the project group will be failed.

If major errors are found in the report, or if the attached MATLAB or DSP source code does not execute, the student group will be given two extra weeks to correct the errors. The two extra weeks start on the day the errors are pointed out by the course teacher. If the student group fails to submit a correction to the report, or the source code, within these two weeks, the group fails.


References

[1] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall, 1993.

[2] R. E. Crochiere. A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Trans. Acoust. Speech and Sig. Proc., ASSP-28:99–102, February 1980.

[3] R. Brennan and T. Schneider. An ultra low-power DSP system with a flexible filterbank. IEEE 35th Asilomar Conference on Signals, Systems and Computers, 1:809–813, February 2001.

[4] N. Westerlund, M. Dahl, and I. Claesson. Speech enhancement for personal communication using an adaptive gain equalizer. Elsevier Signal Processing, 85(6):1089–1101, 2005.

[5] Y. Hu and P. Loizou. Subjective evaluation and comparison of speech enhancement algorithms. Elsevier Speech Communication, 49:588–601, 2007.

[6] Y. Hu and P. Loizou. Noizeus: A noisy speech corpus for evaluation of speech enhancement algorithms. Internet: http://www.utdallas.edu/~loizou/speech/noizeus/, 1 Nov. 2010.

[7] S. F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech and Sig. Proc., ASSP-27:113–120, April 1979.


Appendices

A WOLA filter bank reference implementation

The following lists a MATLAB reference implementation for the WOLA filter bank. It assumes an input block, denoted xblock, of D input data samples and it produces an output block, denoted yblock, of D output data samples. The initialization should be run once at program startup; then, for each new block of input samples, the analysis is executed before the synthesis, and we assume that subband processors transform the subband data before the synthesis.

A.1 Example initialization

% prototype filter length
L = 128;
% number of subbands
K = 32;
% oversampling ratio
OS = 2;
% prototype filter decimation ratio
DF = 1;
% output prototype filter length
LOUT = L / DF;
% data decimation rate
D = K/OS;
% analysis prototype filter
w = 0.85*K*fir1(L-1, 1/K)';
% synthesis prototype filter
if LOUT < L
    wOUT = w(1:DF:L);
else
    wOUT = w;
end;
% input FIFO
xmem = zeros(L,1);
% output FIFO
ymem = zeros(LOUT,1);

A.2 Analysis

% update input FIFO
xmem = [xmem(D+1:end); xblock];
% weight input FIFO with analysis prototype window
temp = xmem .* w;
% time-folding
xn = temp(1:K);
for k = K+1:K:L-K+1
    xn = xn + temp(k:k+K-1);
end;
% circular shift
xn = [xn(K/2+1:end); xn(1:K/2)];
% fft
X = fft(xn, K);
% since we know input data is real valued, its Fourier transform is
% complex conjugate symmetric, and we may omit analysis of the upper
% frequency half, since it can anyway be reconstructed from the
% lower frequency half...
X = X(1:K/2+1);

A.3 Synthesis

% reconstruct the upper frequency half, since we can assume complex
% conjugate symmetry of the data.
Y = [Y; conj(Y(end-1:-1:2))];
% compute ifft, certify that the result is real valued.
temp = real(ifft(Y));
% revert circular shift
temp = [temp(K/2+1:end); temp(1:K/2)];
% repeat the ifft-data, weight with synthesis prototype window, and
% add with output FIFO.
ymem = repmat(temp, LOUT/K, 1) .* wOUT + ymem;
% shift output FIFO
yblock = ymem(1:D);
ymem = [ymem(D+1:end); zeros(D,1)];

B Uniform FFT modulated filter bank reference implementation

The following lists a MATLAB reference implementation for the uniform FFT modulated filter bank. It assumes an input block, denoted xblock, of D input data samples and it produces an output block, denoted yblock, of D output data samples. The initialization should be run once at program startup; then, for each new block of input samples, the analysis is executed before the synthesis, and we assume that subband processors transform the subband data before the synthesis.

B.1 Example initialization

% number of subbands
K = 32;
% oversampling ratio
O = 4;
% decimation rate
D = K/O;
% number of taps per polyphase component
P = 3;
% total prototype filter length is L = D*P;
% prototype filter
PROTFLT = sqrt(2/O)*fir1(D*P - 1, 1/K);
% compute analysis and synthesis modified polyphase components for
% each group, g
POLYPHS_A = zeros(D, P, O);
POLYPHS_S = zeros(D, P, O);
IDXK = 0 : P-1;
for g = 0:O-1
    for d = 0:D-1
        POLYPHS_A(d+1, :, g+1) = fliplr(D * exp( j*g*2*pi*d/K ) ...
            * PROTFLT( IDXK*D + d + 1 ) .* exp( j*2*pi*g*IDXK/O ));
        POLYPHS_S(d+1, :, g+1) = flipdim(conj( POLYPHS_A(d+1, :, g+1)), 2);
    end;
end;
% input and output FIFOs
SMPMEM_A = zeros(D, P);
SMPMEM_S = zeros(D, P, O);
% temporary and intermediate variables
X = zeros(K/2+1, 1);
TempA = zeros(D, 1);
IDXI = 1:(K/2)/O;
IDX_A = zeros((K/2)/O, O);
IDX_S = zeros(D, O);
for g = 1 : O
    IDX_A(:, g) = g : O : K/2;
    IDX_S(:, g) = g : O : K;
end;

B.2 Analysis

% update analysis FIFO
SMPMEM_A = [xblock, SMPMEM_A(:, 1:end-1)];
% go through all groups
for g = 1:O
    % filter with respective modified polyphase component, and compute the
    % IFFT
    TempA = ifft( sum( SMPMEM_A .* POLYPHS_A(:,:,g), 2) );
    % since the subband indices for each group are in an irregular order,
    % we use IDX_A to reorder the subband indices in an increasing order.
    X(IDX_A(:,g)) = TempA(IDXI, :);
    if g==1
        X(end) = TempA(IDXI(end)+1, :);
    end;
    % note that X contains only the first frequency half since we assume
    % that xblock is real valued, which implies that X is complex conjugate
    % symmetric.
end;

B.3 Synthesis

% reconstruct the upper frequency half
Temp = [Y; conj(Y(end-1:-1:2))];
% go through each synthesis group and update the synthesis FIFOs and
% compute the FFT, here, IDX_S rearranges the subband indices from
% a linear ordering into an order according to each group g.
for g = 1 : O
    SMPMEM_S(:, 2:end, g) = SMPMEM_S(:, 1:end-1, g);
    SMPMEM_S(:, 1, g) = fft( Temp(IDX_S(:, g), :));
end;
% compute the output by filtering synthesis FIFO with synthesis modified
% polyphase components. real is used to ensure real output data.
yblock = real(sum(sum(SMPMEM_S .* POLYPHS_S, 3), 2));


C Voice activity detection

The following lists a MATLAB reference implementation for a Voice Activity Detector (VAD). It assumes one input sample, denoted xsample, and it produces one VAD decision, denoted VADout. If VADout = 0 it indicates that there is no speech, and if VADout = 1 it indicates speech. The VAD decision can be used, for instance, to estimate background noise during speech pauses, i.e., when VADout = 0. The initialization should be run once at program startup; the VAD function is then executed for each new input sample.

C.1 Example initialization

% Sampling frequency in Hz
fs = 8000;
% Time constants of the VAD
tA = 1/(fs*0.025);   % 0.025 s
tB = 1/(fs*0.1);     % 0.1 s
tBF = 1/(fs*5);      % 5 s
% SNR detection threshold
SNR_THS = 1;
% Number of subbands in VAD
K = 4;
% VAD filter bank parameters
alpha = 0.99;
T = 3;
q = linspace(0,1,K+T).^2;
q = 0.5*(q(1:end-T) + q(2:end-T+1));
R = 1-pi/2*sqrt(q)/K;
theta = pi*q;
% VAD filter bank memory
ZMEM = zeros(K,2);
% Energy measures
E = zeros(K,1);
Eflr = zeros(K,1);
% temporary vector
xk = zeros(K,1);

C.2 VAD function

% Compute VAD analysis filter bank
for k = 1:K
    Ak = [1, -2*R(k)*cos(theta(k)), R(k)^2];
    Bk = [1, -alpha];
    [xk(k), ZMEM(k,:)] = filter(Bk, Ak, xsample, ZMEM(k,:));
end;
% Compute subband energy
E = (1-tA)*E + tA*xk.^2;
if n < 3/tA
    Eflr = E;
end;
% Estimate subband SNR
SNREstimate = (E - Eflr)./Eflr;
% Update background noise estimate
id = find(SNREstimate <= SNR_THS);
jd = find(SNREstimate > SNR_THS);
if length(id) > 0
    Eflr(id) = min((1-tB)*Eflr(id) + tB*E(id), E(id));
end;
if length(jd) > 0
    Eflr(jd) = min((1-tBF)*Eflr(jd) + tBF*E(jd), E(jd));
end;
% Compute VAD detection output
VADOut = double(mean(SNREstimate) > SNR_THS);

D Implementation Issues and Hints

D.1 Use MATLAB test vectors

It is suggested that you first implement all parts of your method in MATLAB: first, to see that you have understood the concept and, second, so that you have a correct reference implementation. Then, to verify parts of your DSP implementation, generate random test input vectors in MATLAB and compute the corresponding test output vectors. Then, in your DSP implementation, input the same test input vectors and compute (on the DSP) the corresponding output vectors. If the MATLAB output vector and the DSP output vector are the same, your implementation is correct for those test vectors.

D.1.1 Example of test vectors

Assume I want to verify that my function mymul is correctly implemented. This function implements the following:

mymul(a, b) = a · b,   (27)

for two real variables a and b. In MATLAB, my code is

function [out] = mymul(a,b);
out = a*b;

And my corresponding C-implementation is

float mymul(float a, float b)
{
    return a*b;
}

To test my C-implementation, I generate random test vectors in MATLAB, e.g.,

NUM_TESTS = 3;
TEST_VEC_a = round(10000*randn(NUM_TESTS,1))/10;
TEST_VEC_b = round(10000*randn(size(TEST_VEC_a)))/10;
for k = 1:NUM_TESTS
    fprintf(1, 'mymul(%.1f,%.1f)=%.2f\n', TEST_VEC_a(k), TEST_VEC_b(k), ...
        mymul(TEST_VEC_a(k), TEST_VEC_b(k)));
end;

Example MATLAB console output:

mymul(-354.1,-801.2)=283704.92
mymul(-1372.4,23.0)=-31565.20
mymul(877.1,285.1)=250061.21

When I use the same input data a, b in my C-function I see that I get identical output, which leads me to conclude that my C-implementation works as expected (for these test vectors).


D.2 Analysis filter bank verification

A simple way to verify and test an analysis filter bank implementation is to input a sinusoidal signal, wherethe sinusoidal’s frequency is exactly matching the center frequency of one subband, and then analyze thesubband signals xk[n]. We know that the amplitude of adjacent subbands should be very low in themiddle of each subband, which means we have a little amount of aliasing. Hence, if the analysis stageis correctly implemented we shall se most of the signal energy only in the subband where the sinusoidalsignal is centered.

In this test, the subband processors are gk = 1 for k = 0 . . . K − 1. An example is illustrated in figure 10 where a sinusoidal signal is centered in band k = 2. A corresponding figure showing the input subband signals xk[n] after the analysis filter bank is given in figure 11.
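A minimal MATLAB sketch of this test is given below. The function analysis_fb is a placeholder for your own analysis filter bank routine (see section 2), and the center frequency expression is an assumption that must be adapted to the filter bank design you actually use.

% Hypothetical sinusoid test of the analysis stage. analysis_fb is a
% placeholder for your own analysis filter bank routine, and the center
% frequency formula is an assumption for a uniform K-band design.
fs = 8000;                          % sampling frequency in Hz
K  = 8;                             % number of subbands
k0 = 2;                             % band under test
f0 = (k0 + 0.5)*fs/(2*K);           % assumed center frequency of band k0
n  = (0:4095).';
x  = sin(2*pi*f0/fs*n);             % test sinusoid
xk = analysis_fb(x, K);             % subband signals, one column per band
Ek = sum(abs(xk).^2);               % energy per subband
bar(0:K-1, 10*log10(Ek/max(Ek)));   % most energy should land in band k0
xlabel('Subband index k'); ylabel('Relative subband energy [dB]');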

(Figure 10 plot: 10 log|Hk[f]|^2 versus normalized frequency f, showing H0[f], ..., H7[f].)

Figure 10: An eight band analysis filter bank (shaded) with a sinusoidal signal centered in the middle of band k = 2 (black). Notice that adjacent bands have a very small magnitude (< −60 dB) where the sinusoidal signal is located, which means we have little aliasing.

D.3 Analysis-synthesis filter bank verification

It is straightforward to verify a DSP-implementation of an analysis-synthesis filter bank. What you need is a white noise generator (e.g., the randn command in MATLAB), subband processors that are constant scalars, and a power spectrum estimator (e.g., the psd command in MATLAB). The subband processors are now scalar weights that can take on the values gk = 0 or gk = 1; see the filter bank configuration in figure 12.

We note that a linear system is defined by the relationship y[n] = (h ∗ x)[n], which has the corresponding Fourier transform relationship Y[f] = H[f] · X[f]. The squared modulus of this relationship is |Y[f]|^2 = |H[f]|^2 · |X[f]|^2. Taking the expectation yields E{|Y[f]|^2} = |H[f]|^2 · E{|X[f]|^2}, which is equal to Pyy[f] = |H[f]|^2 · Pxx[f], where Pxx[f] and Pyy[f] are the power spectral densities (PSD) of the input signal x[n] and the output signal y[n], respectively. Hence, the transfer magnitude function is computed as |H[f]| = √(Pyy[f]/Pxx[f]). We will use this last relationship to estimate the filter bank transfer function and thereby verify the filter bank.

The protocol we follow to verify a filter bank implementation is:

1. Generate a noise sequence x[n] by, for instance, using the randn command in MATLAB. Store this sequence as a wav-file.


(Figure 11 plot: subband signals x0[n], ..., x7[n] versus sample index.)

Figure 11: Input subband signals from an eight band analysis filter bank with a sinusoidal signal centered in the middle of band k = 2. Notice that most of the signal magnitude is found in subband 2. A small initial transient can be seen in all bands around sample 70; this is the step caused by the sinusoidal signal being switched on, which leaks into all subbands.

2. Input the generated noise sequence x[n] to a DSP-project that simply copies the ADC input to the DAC output, i.e., without any filter bank processing. Estimate the PSD of the DAC output and store the PSD as Pxx[f]. This step is illustrated in figure 13.

3. According to figure 14, repeat the following for each subband k = 0 . . .K − 1:

(a) Set the subband processor weights to zero except at index k, where the weight is one, i.e.,

gm = 1 if m = k, and gm = 0 if m ≠ k, for m = 0 . . . K − 1. (28)

(b) Input the generated noise sequence x[n] to the ADC input.

(c) Perform the analysis-synthesis filter bank operation.

(d) Output the synthesis filter bank output signal y[n] to the DAC output channel.

(e) Estimate the PSD of the DAC output and store the PSD as Pyy|k[f ].

(f) Estimate the transfer magnitude function as |Hk[f]| = √(Pyy|k[f]/Pxx[f]).

(g) Compare the estimated transfer magnitude function from step (f) to the ideal transfer magnitude function |Hk[f]|.

(h) Repeat steps 3.a to 3.g for each subband k = 0 . . .K − 1.

One example of how the results from the verification procedure may look for the third band, i.e., h2[n], of an eight band filter bank is illustrated in figure 15. We see that the estimated transfer magnitude function is very similar to the true transfer magnitude function |H2[f]|, which leads us to conclude that the filter bank implementation of subband k = 2 is correct. A MATLAB sketch of the offline part of this protocol is given below.
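The sketch below covers the MATLAB side of the protocol for band k = 2, under the assumption that the DAC output from step 2 has been recorded to dac_copy.wav and the DAC output from step 3 (with only g2 = 1 active) to dac_band2.wav; the file names and the pwelch settings are placeholders, not part of the project material.

% Hypothetical MATLAB side of the verification protocol for band k = 2.
% The wav-file names are placeholders for your own recordings of the DAC
% output (step 2: plain copy, step 3: only g2 = 1 active).
fs = 8000;
x  = randn(10*fs, 1);                        % step 1: white noise test sequence
wavwrite(0.1*x, fs, 16, 'noise.wav');        % scaled down to avoid clipping

yx = wavread('dac_copy.wav');                % recorded DAC output, step 2
yk = wavread('dac_band2.wav');               % recorded DAC output, step 3
[Pxx, f] = pwelch(yx, 1024, 512, 1024, fs);  % PSD estimate Pxx[f]
[Pyy, f] = pwelch(yk, 1024, 512, 1024, fs);  % PSD estimate Pyy|2[f]
H2 = sqrt(Pyy./Pxx);                         % step (f): estimated |H2[f]|
plot(f/fs, 20*log10(H2));                    % step (g): compare to the ideal |H2[f]|
xlabel('Normalized frequency - f'); ylabel('Estimated |H_2[f]| [dB]');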


(Figure 12 diagram: x[n] -> analysis filter bank -> x0[n], ..., xK-1[n] -> gains g0, ..., gK-1 -> y0[n], ..., yK-1[n] -> synthesis filter bank -> y[n].)

Figure 12: Analysis-synthesis filter bank configuration used to verify the filter bank implementation. The subband signals xk[n] are scaled by subband scaling constants gk to yield subband output signals yk[n] = gk · xk[n].

(Figure 13 diagram: white noise generator -> ADC -> DSP board, copy input to the output -> DAC -> power spectrum estimator -> Pxx[f].)

Figure 13: Pass a generated noise file through the DSP system without doing any processing of the signal, and estimate the PSD of the output signal.

D.4 Estimating DSP processing load

It is important to ensure that the implemented method does not require more than the available DSP cycles for its computation. Otherwise, we risk that the DSP-processed sound contains undesirable artifacts due to, e.g., clipping. The DSP processing load can be estimated by using a General Purpose IO (GPIO) pin. Connect any available GPIO pin to, e.g., an oscilloscope. In your DSP software, define the same GPIO as an output pin. Now, when you start processing a new sample, or block of samples, in your DSP code, set the GPIO high, and when your software has finished processing, set the GPIO low. This means that the GPIO signal will be toggling between low and high: it is low when you are not processing samples and high when you are processing samples. By measuring the duty cycle of the GPIO pin we can estimate the processing load. If, for instance, the GPIO signal's duty cycle is 50 %, the time the GPIO is low equals the time the GPIO is high; hence, we utilize 50 % of the available resources. Note that you must stay below 100 % resource utilization, and you should reserve a small processing margin (say 5 %) for other overhead tasks performed by the DSP to avoid sound artifacts.
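If your oscilloscope can export the captured GPIO trace to a file, the duty cycle, and thereby the load estimate, can also be computed in MATLAB. The sketch below is hypothetical and assumes a single-column CSV export of the voltage samples; the file name and the 50 % threshold are assumptions.

% Hypothetical load estimate from an exported GPIO capture. The file name,
% the single-column CSV format and the threshold are assumptions.
gpio  = csvread('gpio_trace.csv');    % exported oscilloscope capture
level = gpio > 0.5*max(gpio);         % logic high (processing) / low (idle)
duty  = mean(level);                  % fraction of time spent processing
fprintf(1, 'Estimated DSP load: %.1f %%\n', 100*duty);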

D.5 Recursive Averages

Recursive averages are highly useful in practical implementations due to their low computational cost. In effect, they implement an averaging memory whose memory length can, at least approximately, be defined arbitrarily.

Let us consider a one-to-one mapping (i.e., a scalar function of a scalar variable) fk of a random variable, e.g., xk[n], and its expectation E{fk(xk[n])}. Recursive averages are used to continuously approximate this expectation, and the approximation is updated for every new input sample. Denote the average that approximates E{fk(xk[n])} by ak[n]; it is defined as

ak[n] = (1− λk)ak[n− 1] + λkfk(xk[n]), (29)


(Figure 14 diagram: white noise generator -> ADC -> DSP board, filter bank -> DAC -> power spectrum estimator -> Pyy[f].)

Figure 14: Pass a generated noise file through the DSP system and apply an analysis-synthesis filter bank where the subband processors are zero except for at one subband index where it is one, and then estimate the PSD of the output signal.

(Figure 15 plot: 10 log|H2[f]|^2 versus normalized frequency f.)

Figure 15: Verification of band k = 2 of an eight band filter bank using the proposed verification method, where the estimated transfer magnitude function (black) and the true transfer magnitude function |H2[f]| (gray) are overlaid.

where λk is referred to as the learning parameter. The learning parameter is defined as λk = 1/(FS · Tλ;k), where FS is the sampling frequency in Hz and Tλ;k is the average's memory length in seconds. Note that if the data xk[n] is decimated by a factor D, for instance by a filter bank, then the corresponding definition of the learning parameter is λk = D/(FS · Tλ;k). We see that ak[n] is a low-pass filtered version of fk(xk[n]). Hence, if λk ≪ 1, we will have ak[n] ≈ E{fk(xk[n])}, asymptotically.
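A minimal sketch of equation (29), estimating E{x[n]^2} for white noise with a 0.1 s memory, is given below; the test signal and the parameter values are arbitrary examples.

% Minimal sketch of the recursive average in equation (29) for the mapping
% f(x) = x^2; the test signal and the 0.1 s memory length are arbitrary.
FS      = 8000;              % sampling frequency in Hz
Tlambda = 0.1;               % memory length in seconds
lambda  = 1/(FS*Tlambda);    % learning parameter
x = randn(10*FS, 1);         % ten seconds of zero-mean, unit-variance noise
a = 0;                       % recursive average a[n]
for n = 1:length(x)
    a = (1-lambda)*a + lambda*x(n)^2;
end;
fprintf(1, 'Recursive estimate of E{x^2}: %.3f (expected close to 1)\n', a);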

D.5.1 Example 1: Power and magnitude spectrum estimation

Let the factor p decide if we estimate power or magnitude spectrum of a signal xk[n]. If p = 2, a power spectrum is estimated, and if p = 1, a magnitude spectrum is estimated. The spectrum of xk[n] is E{|xk[n]|p}. The spectrum is estimated using a recursive average Px;k[n], according to

Px;k[n] = (1− λk)Px;k[n− 1] + λk|xk[n]|p. (30)
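A sketch of equation (30), applied to all K subbands at once, is shown below; the vector xk stands in for the K-by-1 vector of subband samples produced by your analysis filter bank at time n, and all parameter values are examples only.

% Sketch of equation (30) for all K subbands at once. The random vector xk
% is a stand-in for the analysis filter bank output at time index n.
FS = 8000; D = 1; K = 8;            % example parameters (D: decimation factor)
p      = 2;                         % p = 2: power spectrum, p = 1: magnitude
lambda = D/(FS*0.05);               % 50 ms memory
Px = zeros(K,1);                    % recursive spectrum estimates Px;k[n]
for n = 1:FS                        % one second of dummy subband data
    xk = randn(K,1);                % stand-in subband samples
    Px = (1-lambda)*Px + lambda*abs(xk).^p;
end;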


D.5.2 Example 2: Background noise spectrum estimation

Let the factor p decide if we estimate power or magnitude spectrum of the background noise of a signal xk[n]. If p = 2, a power spectrum is estimated, and if p = 1, a magnitude spectrum is estimated. Let a VAD (see section C) signal whether or not there is speech in a signal frame. The VAD detector value is d[n] = 0 if there is no speech in the frame and d[n] = 1 if there is speech in the frame. Since we want to estimate the background noise, we should only update the estimate when d[n] = 0, because xk[n] = vk[n] if there is no speech present. The spectrum of vk[n] is E{|vk[n]|p}. The spectrum is estimated, during speech pauses, using a recursive average Pv;k[n], according to

Pv;k[n] = (1 − λk)Pv;k[n − 1] + λk|xk[n]|p if d[n] = 0, and Pv;k[n] = Pv;k[n − 1] otherwise. (31)
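A corresponding sketch of equation (31) is given below; the subband vector xk and the VAD decision d are stand-ins for the quantities produced by your filter bank and by the VAD in section C, and all parameter values are examples only.

% Sketch of equation (31): VAD-gated background noise spectrum estimate.
% xk and d are stand-ins for the subband vector and the VAD decision
% (section C) at time index n; parameter values are examples only.
FS = 8000; D = 1; K = 8; p = 2;     % example parameters
lambda = D/(FS*0.2);                % 0.2 s memory
Pv = zeros(K,1);                    % background noise estimates Pv;k[n]
for n = 1:FS
    xk = randn(K,1);                % stand-in subband samples
    d  = 0;                         % stand-in VAD decision (noise only here)
    if d == 0
        Pv = (1-lambda)*Pv + lambda*abs(xk).^p;  % update during speech pauses
    end;                            % otherwise Pv;k[n] = Pv;k[n-1]
end;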
