
Hindawi Publishing Corporation, EURASIP Journal on Applied Signal Processing, Volume 2006, Article ID 26503, Pages 1–19. DOI 10.1155/ASP/2006/26503

Time Delay Estimation in Room Acoustic Environments: An Overview

    Jingdong Chen,1 Jacob Benesty,2 and Yiteng (Arden) Huang1

1 Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974, USA
2 INRS-EMT, Université du Québec, 800 de la Gauchetière Ouest, Suite 6900, Montréal, Québec, Canada H5A 1K6

    Received 31 January 2005; Revised 6 September 2005; Accepted 26 September 2005

Time delay estimation has been a research topic of significant practical importance in many fields (radar, sonar, seismology, geophysics, ultrasonics, hands-free communications, etc.). It is a first stage that feeds into subsequent processing blocks for identifying, localizing, and tracking radiating sources. This area has made remarkable advances in the past few decades, and it is continuing to progress, with an aim to create processors that are tolerant to both noise and reverberation. This paper presents a systematic overview of the state of the art of time-delay-estimation algorithms, ranging from the simple cross-correlation method to advanced blind channel identification based techniques. We discuss the pros and cons of each individual algorithm, and we outline their inherent relationships. We also provide experimental results to illustrate their performance differences in room acoustic environments where reverberation and noise are commonly encountered.

    Copyright 2006 Hindawi Publishing Corporation. All rights reserved.

    1. INTRODUCTION

Time delay estimation (TDE), which serves as the first stage that feeds into subsequent processing blocks of a system to detect, identify, and locate radiating sources, has plenty of applications in fields as diverse as radar, sonar, seismology, geophysics, ultrasonics, and communications. It has attracted a considerable amount of research attention ever since sensor arrays were introduced to measure a propagating wavefield.

Depending on the nature of its application, TDE can be dichotomized into two broad categories, namely, time of arrival (TOA) estimation [1–4] and time difference of arrival (TDOA) estimation [5–8]. The former aims at measuring the time delay between the transmission of a pulse signal and the reception of its echo, which is often of primary interest to an active system such as radar or active sonar, while the latter, as its name indicates, endeavors to determine the travel time of a wavefront between two spatially separated receiving sensors, which is often of concern to a passive system such as passive sonar or microphone array systems. Although there exists an intrinsic relationship between TOA and TDOA estimation, their essential difference is profound. In the former case, the clean reference signal, that is, the transmitted signal, is known, so that the time delay estimate can be obtained from a single sensor, generally using the matched-filter approach. On the contrary, in the latter, no such explicit reference signal is available, and the delay estimate is often acquired by comparing the signals received at two (or more) spatially separated sensors. This paper deals with TDE, with its emphasis on TDOA estimation. From now on, we will make no distinction between TDE and TDOA estimation unless necessary.

The estimation of TDOA would be an easy task if the two received signals were merely delayed and scaled versions of each other. In reality, however, the source signal is generally immersed in ambient noise since we are living in a natural environment where the existence of noise is inevitable. Furthermore, each observation signal may contain multiple attenuated and delayed replicas of the source signal due to reflections from boundaries and objects. This multipath propagation effect introduces echoes and spectral distortions into the observation signal, termed reverberation, which severely deteriorate the source signal. In addition, the source of the wavefront may also move from time to time, resulting in a changing time delay. All these factors make time delay estimation a complicated and challenging problem. Over the past few decades, researchers have approached this problem by exploiting different facets of the received signals. Numerous algorithms have been developed, and they can be categorized from the following points of view:

(i) the number of sources in the wavefield, that is, single-source TDE techniques [5, 9] and multiple-source TDE techniques [10, 11];

    http://-/?-http://-/?-
  • 8/3/2019 TDE Overview

    2/19

    2 EURASIP Journal on Applied Signal Processing

(ii) how the propagation condition is modeled, that is, the ideal single-path propagation model [5], the multipath propagation model [12–14], and the reverberation model [15–17];

(iii) what analysis tools are employed, for example, the generalized cross-correlation (GCC) method [5, 18–22], higher-order-statistics (HOS) based approaches [23, 24], and blind channel identification based algorithms [15, 25];

(iv) how the delay estimate is updated, that is, non-adaptive and adaptive approaches [26–30].

These methods have been applied with a certain degree of success in various applications. However, the tolerance of TDE with respect to distortion (especially reverberation) is still an open problem. A great deal of effort has been made to improve the robustness of TDE techniques over the past few years. By and large, the improvements are achieved in three different ways. The first is to incorporate some a priori knowledge about the distortion sources into the GCC method to ameliorate its performance. The second is to use multiple (more than two) sensors and take advantage of the redundancy to enhance the delay estimate between the two selected sensors. The third is to take reverberation into account in the signal model and exploit advanced system identification techniques to improve TDE. This paper attempts to summarize these efforts, and to review the state of the art, the critical techniques, and the recent advances which have significantly improved the performance of time delay estimation in adverse environments. We discuss the pros and cons of each individual algorithm, and outline the relationships across different algorithms. We also provide experimental results to illustrate their performance in room acoustic environments where reverberation, noise, and interference are commonly encountered.

    2. SIGNAL MODELS FOR TDE

Before discussing the TDE algorithms, we present mathematical models that can be employed to describe an acoustic environment for the TDE problem. Such a system modeling will, on the one hand, help us better understand the problem and, on the other hand, form a basis for the discussion and analysis of the various algorithms. Principally, three signal models have been used in the TDE literature: the ideal single-path propagation model, the multipath model, and the reverberation model.

    2.1. Ideal propagation model

Suppose that we have an array consisting of N receivers. The ideal propagation model assumes that the signal acquired by each sensor is a delayed and attenuated version of the original source signal plus some additive noise. In mathematical form, the received signals are expressed as

x_n[k] = α_n s[k − t − f_n(τ)] + w_n[k],  (1)

where α_n, n = 0, 1, 2, ..., N − 1, are the attenuation factors due to propagation effects, s[k] is the unknown source signal, t is the propagation time from the unknown source to sensor 0, w_n[k] is an additive noise signal at the nth microphone, τ is the relative delay between microphones 0 and 1, and f_n(τ) is the relative delay between microphones 0 and n, with f_0(τ) = 0 and f_1(τ) = τ. For n = 2, ..., N − 1, the function f_n depends not only on τ but also on the microphone array geometry. For example, in the far-field case (plane wave propagation), for a linear and equispaced array, we have

f_n(τ) = nτ,  n = 2, ..., N − 1,  (2)

and for a linear but nonequispaced array, we have

f_n(τ) = ( Σ_{i=0}^{n−1} d_i / d_0 ) τ,  n = 2, ..., N − 1,  (3)

where d_i is the distance between microphones i and i + 1, i = 0, 1, 2, ..., N − 2. In the near-field case, f_n depends also on the position of the source. Also note that f_n(τ) can be a nonlinear function of τ for a nonlinear array geometry, even in the far-field case (e.g., 3 equilateral sensors). In general τ is not known, but the geometry of the array is known, such that the mathematical formulation of f_n(τ) is well defined or given. It is further assumed that s[k] is reasonably broadband and that w_n[k] is a zero-mean Gaussian random process that is uncorrelated with both the source signal and the noise signals at other sensors. For this model, the TDE problem is formulated as determining an estimate of the true time delay τ from a set of finite observation samples.
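To make model (1) concrete, the following minimal Python sketch (ours, not from the paper; all names are illustrative) generates synthetic sensor signals for a linear, equispaced, far-field array, that is, with f_n(τ) = nτ as in (2), an integer delay τ in samples, and additive white Gaussian noise scaled to a prescribed per-sensor SNR.

```python
import numpy as np

def simulate_ideal_model(s, alphas, tau, snr_db, rng=None):
    """Generate x_n[k] = alpha_n * s[k - n*tau] + w_n[k] for a linear,
    equispaced far-field array, i.e., f_n(tau) = n*tau as in (2).
    `tau` is an integer delay in samples; the noise is white Gaussian,
    scaled so that each sensor has the requested SNR."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(s)
    x = np.zeros((len(alphas), K))
    for n, a in enumerate(alphas):
        d = n * tau                          # relative delay f_n(tau) in samples
        shifted = np.zeros(K)
        if d >= 0:
            shifted[d:] = s[:K - d]
        else:
            shifted[:K + d] = s[-d:]
        noise = rng.standard_normal(K)
        noise *= np.sqrt(np.var(a * shifted) / (np.var(noise) * 10 ** (snr_db / 10.0)))
        x[n] = a * shifted + noise
    return x
```

Running any of the TDE algorithms discussed below on two rows of the returned array should then recover τ for that sensor pair.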

    2.2. Multipath model

The ideal propagation model takes only the direct-path signal into account. In many situations, however, each sensor receives multiple delayed and attenuated replicas of the source signal due to reflections of the wavefront from boundaries and objects, in addition to the direct-path signal. This so-called multipath effect has been intensively studied in the literature [13, 14, 31, 32]. In this case, the received signals are often described mathematically as

x_n[k] = Σ_{m=1}^{M} α_{nm} s[k − t − τ_{nm}] + w_n[k],  n = 0, 1, ..., N − 1,  (4)

where α_{nm} is the attenuation factor from the unknown source to the nth sensor via the mth path, t is the propagation time from the source to sensor 0 via the direct path, τ_{nm} is the relative delay between sensor n and sensor 0 for path m, with τ_{01} = 0, M is the number of different paths, and w_n[k] is stationary Gaussian noise assumed to be uncorrelated with both the source signal and the noise signals observed at other sensors. This model is widely adopted in oceanic propagation environments, as illustrated in Figure 1, where each sensor receives not only the direct-path signal but also reflections from both the sea surface and the sea bottom [33, 34]. The primary interest of the TDE problem for this model is to measure τ_{n1}, n = 1, ..., N − 1, which is the TDOA between sensor n and sensor 0 via the direct path.


[Figure omitted: a source s[k] radiating toward a sensor array located between the sea surface and the sea bottom, with a direct path and surface/bottom reflections; w[k] denotes additive noise.]

Figure 1: Illustration of the signal model in a multipath environment.

    2.3. Reverberation model

The multipath model is valid for some but not all environments [35]. In addition, if there are many different paths, that is, M is large, it is difficult to estimate all the α_{nm} in (4). Recently, a more realistic reverberation model has been used to describe the TDE problem in a room environment, where each sensor often receives a large number of echoes due to reflections of the wavefront from objects and room boundaries such as walls, ceiling, and floor [15, 36, 37]. In addition, reflections can occur several times before a signal reaches the array, as shown in Figure 2. In this model, the received signals are expressed as

x_n[k] = h_n ∗ s[k] + w_n[k],  (5)

where ∗ denotes convolution, h_n is the channel impulse response between the source and the nth sensor, and again we assume that s[k] is reasonably broadband and that w_n[k] is uncorrelated with s[k] and with the noise signals at the other sensors. In a vector-matrix form, the signal model (5) can be rewritten as

x_n[k] = h_n^T s[k] + w_n[k],  n = 0, 1, ..., N − 1,  (6)

where

h_n = [ h_{n,0}  h_{n,1}  ...  h_{n,L−1} ]^T,
s[k] = [ s[k]  s[k − 1]  ...  s[k − L + 1] ]^T,  (7)

and L is the length of the longest channel impulse response among the N channels.

As seen, no time delay is explicitly expressed in (5); hence there is no plain solution to the TDE problem with the reverberation model. In this case, TDE is often achieved in two steps. The first step is to estimate the N channel impulse responses from the source to the N receivers. Once the channel impulse responses are measured, the TDOA information between any two receivers is obtained by identifying the two direct paths [15, 16, 38, 39]. Since we do not have any a priori knowledge about the source signal and the only information that can be accessed is the observation data, the channel impulse responses have to be estimated in a blind manner. However, blind channel identification is a very challenging problem, particularly in room acoustic environments where channel impulse responses are usually very long.

[Figure omitted: a source s[k] and a microphone array in a room, with the direct path and multiple wall reflections; w[k] denotes additive noise.]

Figure 2: Illustration of the signal model in a reverberant environment.

    3. TDE ALGORITHMS

Various TDE algorithms have been developed in the literature. In this section, we briefly describe some critical techniques. Some of them have already been widely used, while others may not yet be popular in existing systems but have great potential for use in future ones.

    3.1. Cross-correlation method

The cross-correlation (CC) method is the most straightforward and the earliest developed TDE algorithm. It is formulated based on the single-path propagation model given in (1) with only two receivers, that is, N = 2. Suppose that we have a block of observation signals at time instant k,

x_n(k) = [ x̄_n[0], x̄_n[1], ..., x̄_n[l], ..., x̄_n[K − 1] ]^T = [ x_n[k], x_n[k + 1], ..., x_n[k + K − 1] ]^T,  n = 0, 1,  (8)

where K is the block size. The delay estimate with the CC method is then obtained as the lag time that maximizes the cross-correlation function (CCF) between the two observation signals, that is,

τ_CC = argmax_m ψ_CC[m],  (9)

where

ψ_CC[m] = E{ x̄_0[l] x̄_1[l + m] }  (10)

is the CCF between x_0[l] and x_1[l], E{·} stands for mathematical expectation, τ_CC is an estimate of the true delay, m ∈ [−τ_max, τ_max], and τ_max is the maximum possible delay. In a digital implementation of (9), some approximation is required because the CCF is not known and must be estimated. A normal practice is to replace the CCF defined in


(10) by its time-averaged estimate, that is,

ψ̂_CC[m] = (1/K) Σ_{l=0}^{K−m−1} x̄_0[l] x̄_1[l + m],  m ≥ 0,
ψ̂_CC[m] = (1/K) Σ_{l=−m}^{K−1} x̄_0[l] x̄_1[l + m],  m < 0.  (11)

A similar method, formulated from the average-magnitude-difference function (AMDF), was also investigated in the literature [40], where TDE becomes the problem of identifying the minimum of the AMDF, that is,

τ_AMDF = argmin_m ψ̂_AMDF[m],  (12)

where

ψ̂_AMDF[m] = (1/K) Σ_{l=0}^{K−m−1} | x̄_0[l] − x̄_1[l + m] |,  m ≥ 0,
ψ̂_AMDF[m] = (1/K) Σ_{l=−m}^{K−1} | x̄_0[l] − x̄_1[l + m] |,  m < 0,  (13)

is the AMDF between x_0[l] and x_1[l]. It has been shown that [41, 42]

E{ ψ̂_AMDF[m] } = √( (2/π) [ E{x̄_0²[l]} + E{x̄_1²[l]} − 2 E{ψ̂_CC[m]} ] ).  (14)

There are three terms in the brackets under the square root of (14): the first two are the signal energies, and the third is the expectation of the CCF. The signal energy, which can be treated as a constant during the observation period, does not affect the peak position. Therefore, statistically, searching for the minimum of the AMDF is the same as finding the maximum of the CCF between the two observation signals. As a result, the AMDF approach should exhibit a performance similar to that of the CC method from a statistical point of view [43].
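A minimal direct-form sketch of these two estimators (ours; not the authors' code) is given below: it evaluates the time-averaged CCF of (11) and the AMDF of (13) over the admissible lag range and picks the delay according to (9) and (12).

```python
import numpy as np

def cc_and_amdf_delay(x0, x1, max_lag):
    """Return (tau_cc, tau_amdf) for two equal-length frames, using the
    time-averaged CCF of (11) and the AMDF of (13)."""
    K = len(x0)
    lags = np.arange(-max_lag, max_lag + 1)
    ccf = np.empty(len(lags))
    amdf = np.empty(len(lags))
    for i, m in enumerate(lags):
        if m >= 0:
            a, b = x0[:K - m], x1[m:]        # l = 0 .. K-m-1
        else:
            a, b = x0[-m:], x1[:K + m]       # l = -m .. K-1
        ccf[i] = np.dot(a, b) / K
        amdf[i] = np.sum(np.abs(a - b)) / K
    return lags[np.argmax(ccf)], lags[np.argmin(amdf)]
```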

    3.2. Generalized cross-correlation method

The generalized cross-correlation (GCC) algorithm can be treated as an improved version of the CC method. Not only does it unify various correlation-based algorithms into one general framework, but it also provides a mechanism to incorporate knowledge to improve the performance of TDE. This method has gained great popularity since the landmark paper [5] published by Knapp and Carter in 1976. In this framework, the delay estimate is obtained as

τ_GCC = argmax_m ψ_GCC[m],  (15)

where

ψ_GCC[m] = Σ_{k'=0}^{K−1} ϑ[k'] S_{x0x1}[k'] e^{j2πmk'/K} = Σ_{k'=0}^{K−1} ψ_{x0x1}[k'] e^{j2πmk'/K}  (16)

is the so-called generalized cross-correlation function (GCCF), S_{x0x1}[k'] = E{ X_0[k'] X_1*[k'] } is the cross-spectrum, (·)* denotes the complex conjugate operator, X_n[k'] is the discrete Fourier transform (DFT) of x_n(k), ϑ[k'] is a weighting function (sometimes called a prefilter), K is the length of the DFT, and ψ_{x0x1}[k'] = ϑ[k'] S_{x0x1}[k'] is the weighted cross-spectrum. In a practical system, the cross-spectrum S_{x0x1}[k'] has to be estimated, which is normally achieved by replacing the expected value with its instantaneous value, that is, Ŝ_{x0x1}[k'] = X_0[k'] X_1*[k'].

There is a number of member algorithms in the GCC family, depending on how the weighting function ϑ[k'] is selected. Commonly used weighting functions include the constant weighting (in this case, the GCC becomes a frequency-domain implementation of the cross-correlation method shown in (9)), the smoothed coherence transform (SCOT) [44], the Roth processor [45], the Eckart filter [5], the phase transform (PHAT), the maximum-likelihood (ML) processor [5], the Hassab-Boucher transform [18], and so forth. Combinations of some of these functions have also been reported in use [46].

Different weighting functions possess different properties. For example, the PHAT algorithm uses ϑ_PHAT[k'] = 1 / | S_{x0x1}[k'] |. Substituting ϑ_PHAT[k'] into (15) and neglecting noise effects, one can readily deduce that the weighted cross-spectrum is free from the source signal and depends only on the channel responses. Consequently, the PHAT algorithm performs more consistently than many other GCC members when the characteristics of the source signal change over time. It is also observed that the PHAT algorithm is more immune to reverberation than many other cross-correlation-based methods. Another example is the ML processor, with which the delay estimate obtained in the ideal propagation situation is optimal from a statistical point of view, since the estimation variance can achieve the Cramér-Rao lower bound (CRLB). It should be pointed out that, in order for the ML processor to achieve the optimal performance, the observation sample space has to be large enough; the environment should be free of reverberation; the delay has to be constant; and the observation signals should be stationary processes. In addition, the spectra of the noise signals have to be known a priori. If any of these conditions is not satisfied, the ML algorithm becomes suboptimal, like other GCC members.
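The following sketch (ours; illustrative names) implements the GCC recipe of (15)-(16) for a single frame: it forms the instantaneous cross-spectrum X_0[k'] X_1*[k'], applies the PHAT prefilter 1/|S|, inverse-transforms, and picks the peak within the admissible lag range. Setting weighting=None reduces it to a frequency-domain implementation of the plain CC method.

```python
import numpy as np

def gcc_delay(x0, x1, max_lag, weighting="phat", eps=1e-12):
    """GCC delay estimate per (15)-(16) from one frame of two signals."""
    K = len(x0)
    nfft = 2 * K                                  # zero-pad to avoid circular wrap-around
    X0 = np.fft.rfft(x0, nfft)
    X1 = np.fft.rfft(x1, nfft)
    S = X0 * np.conj(X1)                          # instantaneous cross-spectrum
    if weighting == "phat":
        S = S / (np.abs(S) + eps)                 # PHAT prefilter 1/|S|
    r = np.fft.irfft(S, nfft)                     # generalized cross-correlation function
    r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))   # lags -max_lag .. max_lag
    return int(np.argmax(r)) - max_lag
```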

    3.3. LMS-type adaptive TDE algorithm

This method, also based on the ideal propagation model with two sensors, was proposed by Reed et al. in 1981 [26]. It has been intensively investigated in the literature since


then [28–30, 47]. Different from the cross-correlation-based approaches, this algorithm achieves time delay estimation by minimizing the mean-square error between x_0[k] and a filtered (FIR filter) version of x_1[k], and the delay estimate is obtained as the lag time associated with the largest component of the FIR filter. If we define a signal vector of x_1[k] at time instant k as

x_1(k) = [ x_1[k − L], x_1[k − L + 1], ..., x_1[k], x_1[k + 1], ..., x_1[k + L] ]^T  (17)

and an FIR filter of length 2L + 1 as

h[k] = [ h_0, h_1, ..., h_l, h_{l+1}, ..., h_{2L} ]^T,  (18)

where L is the maximum possible time delay, then an error signal can be formulated as

e[k] = x_0[k] − h^T[k] x_1(k).  (19)

An estimate of h[k] can be achieved by minimizing E{e²[k]} using either a batch or an adaptive algorithm. For example, with the least-mean-square (LMS) adaptive algorithm, h[k] can be estimated through

h[k + 1] = h[k] + μ e[k] x_1(k),  (20)

where μ is a small positive adaptation step size. Given this estimate of h[k], the delay estimate can be determined as

τ_LMS = argmax_l | ĥ_l | − L.  (21)

Other adaptive algorithms [48] can also be used, which may lead to a better performance.
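A corresponding sample-by-sample sketch (ours; parameter names are illustrative) runs the LMS update of (20) over a pair of recordings and reads the delay off the adapted filter as in (21).

```python
import numpy as np

def lms_tde(x0, x1, L, mu):
    """LMS-type adaptive TDE per (17)-(21); returns the final delay estimate."""
    h = np.zeros(2 * L + 1)
    K = len(x0)
    for k in range(L, K - L):
        xvec = x1[k - L:k + L + 1]      # x_1(k) = [x1[k-L], ..., x1[k+L]]^T, eq. (17)
        e = x0[k] - h @ xvec            # error signal, eq. (19)
        h += mu * e * xvec              # LMS update, eq. (20)
    return int(np.argmax(np.abs(h))) - L    # delay estimate, eq. (21)
```

A small step size (for example, the μ = 0.0001 used later in Table 1) keeps the adaptation stable at the cost of slower convergence.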

    3.4. Fusion algorithm based on multiple sensor pairs

The GCC framework, which may yield much improvement over the traditional direct cross-correlation method if the weighting function is properly selected, still suffers significant performance degradation in adverse environments. Much attention has been paid to improving the tolerance of TDE against noise and reverberation. Besides using some a priori knowledge about the distortion sources, another way of combating noise and reverberation is to exploit the redundant information provided by multiple sensors. To illustrate the redundancy, let us consider a three-sensor linear array, which can be partitioned into three sensor pairs. Three delay measurements can then be acquired from the observation data, that is, τ_01 (TDOA between sensor 0 and sensor 1), τ_12 (TDOA between sensor 1 and sensor 2), and τ_02 (TDOA between sensor 0 and sensor 2). Apparently, these three delays are not independent. As a matter of fact, if the source is located in the far field, it is easily seen that τ_02 = τ_01 + τ_12. Such a relation was exploited in [49] to formulate a two-stage TDE algorithm. In the preprocessing stage, the three delays were measured independently using the GCC method. A state equation was then formed, and a Kalman filter was used in the postprocessing stage to enhance the delay estimates of τ_01 and τ_12. It was shown that, in the far-field case, the estimation variance of τ_01 can be reduced by a factor of 6 in low-SNR (SNR → 0) and by a factor of 4 in high-SNR (SNR → ∞) conditions. More recently, several approaches based on multiple sensor pairs were developed to deal with TDE in room acoustic environments [50–52]. Different from the Kalman filter method, these approaches fuse the estimated cost functions from multiple sensor pairs before searching for the time delay. We will call such a scheme the information-fusion-based algorithm. In general, the problem of TDE with the fusion algorithm can be formulated as

τ_FUSION = argmax_m Σ_{p=1}^{P} F{ ψ_p[m] },  (22)

where P is the total number of sensor pairs, ψ_p[m] represents some delay cost function measured from the pth sensor pair (it can be the CCF, GCCF, AMDF, etc.), and F{·} denotes some mathematical transformation, which ensures that the cost functions ψ_p[m] for all the P sensor pairs, after transformation, have their peaks due to the same source at the same location. Various methods can be formulated by selecting a different F{·} or ψ. For example, if all sensor pairs are centered around the same position, by choosing F{x} = x and ψ[m] as the GCCF from the PHAT algorithm, one can readily derive the so-called synchronous adding method in [50]. We can also easily derive the consistency method in [51] and the SRP (steered response power)-PHAT algorithm in [52]. Compared with the algorithms using only two sensors, the fusion technique can usually deliver a better performance. However, its computational complexity is also more than P times the complexity of the corresponding dual-sensor technique, where P is the number of sensor pairs.
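As a minimal illustration of the fusion rule (22) with F{x} = x (the synchronous-adding flavor described above), the sketch below (ours) simply sums per-pair cost functions that are assumed to be evaluated over a common, already aligned lag grid, for instance GCC-PHAT cost functions produced per pair by a hypothetical helper that returns the whole cost function rather than only its peak.

```python
import numpy as np

def fused_delay(pair_costs, lags):
    """Fusion rule (22) with F{x} = x: sum the cost functions of all P sensor
    pairs (each an array over the same lag grid, aligned so that the true
    source peaks at the same lag in every pair) and pick the peak."""
    total = np.sum(np.asarray(pair_costs), axis=0)
    return lags[int(np.argmax(total))]
```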

3.5. Multichannel cross-correlation algorithm

Recently, a squared multichannel cross-correlation coefficient (MCCC) was derived from the theory of spatial linear prediction and interpolation [53]. Consider the signal model given in (1) with a total of N sensors. At time instant k, the MCCC is defined as

ρ²_N(k, m) = 1 − det[ R(k, m) ] / Π_{l=0}^{N−1} r_{ll}(k, m) = 1 − det[ R̃(k, m) ],  (23)

where det[·] stands for the determinant of a matrix,

R(k, m) =
[ r_00(k, m)       r_01(k, m)       ...  r_{0,N−1}(k, m)
  r_10(k, m)       r_11(k, m)       ...  r_{1,N−1}(k, m)
  ...              ...              ...  ...
  r_{N−1,0}(k, m)  r_{N−1,1}(k, m)  ...  r_{N−1,N−1}(k, m) ]  (24)

is the signal covariance matrix, R̃(k, m) is its normalized counterpart, and

r_ij(k, m) = Σ_{p=0}^{k} λ^{k−p} x_i[p + f_j(m)] x_j[p + f_i(m)],  i, j = 0, 1, ..., N − 1,  (25)
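Since the remainder of this section is not reproduced above, the following sketch (ours) fills in the natural reading of (23): evaluate ρ²_N over the candidate lags and keep the maximizer. It uses a batch covariance estimate on (approximately) zero-mean frames in place of the exponentially weighted sum of (25), and assumes the equispaced far-field alignment f_n(m) = nm of (2).

```python
import numpy as np

def mccc_delay(x, max_lag):
    """Pick the integer delay m that maximizes the squared MCCC of (23),
    computed batch-wise from an N x K array of sensor frames, with the
    channels aligned under the far-field hypothesis f_n(m) = n*m."""
    N, K = x.shape
    best_m, best_rho2 = 0, -np.inf
    for m in range(-max_lag, max_lag + 1):
        span = K - (N - 1) * abs(m)          # samples common to all aligned channels
        if span < 2:
            continue
        aligned = np.empty((N, span))
        for n in range(N):
            start = n * m if m >= 0 else (n - (N - 1)) * m
            aligned[n] = x[n, start:start + span]
        R = np.corrcoef(aligned)             # normalized covariance matrix, cf. (23)-(24)
        rho2 = 1.0 - np.linalg.det(R)        # squared MCCC of (23)
        if rho2 > best_rho2:
            best_m, best_rho2 = m, rho2
    return best_m
```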


3.6. Adaptive eigenvalue decomposition algorithm

The adaptive eigenvalue decomposition (AED) algorithm [15] approaches TDE from the two-channel reverberation model (5). In the absence of noise, x_0[k] ∗ h_1 = x_1[k] ∗ h_0, so the vector u = [h_1^T, −h_0^T]^T lies in the null space of the covariance matrix R of the stacked observation vector x[k] = [x_0^T(k), x_1^T(k)]^T; it can therefore be identified, up to a scaling factor, as the eigenvector associated with the smallest eigenvalue of R [15]:

u[k + 1] = ( u[k] − μ e[k] x[k] ) / ‖ u[k] − μ e[k] x[k] ‖,  (33)

with the constraint that ‖u[k]‖ = 1, where

e[k] = u^T[k] x[k]  (34)

is an error signal, ‖·‖ denotes the ℓ2 norm of a vector or matrix, and μ, the adaptation step, is a positive constant.

With the identified impulse responses ĥ_0 and ĥ_1, the time delay estimate is determined as the difference between the two direct paths, that is,

τ_AED = argmax_l | ĥ_{1,l} | − argmax_l | ĥ_{0,l} |.  (35)
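A compact time-domain sketch of the AED recursion (33)-(35) is given below (ours; not the authors' implementation). L is the assumed modeling-filter length; the identified responses are determined only up to a common scaling and sign, which does not affect the delay read-out of (35).

```python
import numpy as np

def aed_tde(x0, x1, L, mu):
    """Adaptive eigenvalue decomposition TDE per (33)-(35)."""
    u = np.zeros(2 * L)
    u[0] = 1.0                                   # unit-norm initialization
    K = len(x0)
    for k in range(L - 1, K):
        # x[k] = [x0[k], ..., x0[k-L+1], x1[k], ..., x1[k-L+1]]^T
        xk = np.concatenate((x0[k - L + 1:k + 1][::-1],
                             x1[k - L + 1:k + 1][::-1]))
        e = u @ xk                               # error signal, eq. (34)
        u = u - mu * e * xk                      # gradient step of (33)
        u /= np.linalg.norm(u)                   # renormalize so that ||u|| = 1
    h1, h0 = u[:L], -u[L:]                       # u converges toward [h1^T, -h0^T]^T
    return int(np.argmax(np.abs(h1))) - int(np.argmax(np.abs(h0)))   # eq. (35)
```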

3.7. Adaptive multichannel time delay estimation

In the AED algorithm, the delay estimate is obtained by blindly identifying two channel impulse responses. It requires that the two channels do not share any common zeros, which is usually true for systems with short impulse responses. In many application scenarios such as room acoustic environments, however, the channel impulse response from the source to a microphone sensor can be very long, depending on the reverberation condition. As the two impulse responses become longer, the probability that they share no common zeros becomes lower, and the AED algorithm often fails when a zero is shared between the two channels or when some zeros of the two channels are close. One way to overcome this problem is to employ more channels in the system, since it is less likely for all channels to share a common zero when the number of sensors is large. This idea leads to an adaptive multichannel (AMC) time delay estimation approach based on a blind channel identification technique [39].

Considering the reverberation model in (5), we can define a cost function among all the N channels, at time instant k + 1, as

J[k + 1] = Σ_{i=0}^{N−2} Σ_{j=i+1}^{N−1} e²_ij[k + 1],  (36)

where

e_ij[k + 1] = ( x_i^T[k + 1] ĥ_j[k] − x_j^T[k + 1] ĥ_i[k] ) / ‖ ĥ[k] ‖,  i, j = 0, 1, ..., N − 1,  (37)

is an error signal between sensor i and sensor j at time k + 1, ĥ_n[k] is the modeling filter of h_n[k], and

ĥ[k] = [ ĥ_0^T[k]  ĥ_1^T[k]  ...  ĥ_{N−1}^T[k] ]^T.  (38)

It follows immediately that various adaptive algorithms can be used to obtain an estimate of ĥ[k] by minimizing J[k + 1]. For example, a multichannel LMS (MCLMS) algorithm was derived in [60], which updates ĥ through

ĥ[k + 1] = ( ĥ[k] − 2μ { R̃[k + 1] ĥ[k] − J[k + 1] ĥ[k] } ) / ‖ ĥ[k] − 2μ { R̃[k + 1] ĥ[k] − J[k + 1] ĥ[k] } ‖,  (39)

where again μ, the adaptation step, is a positive constant,

R̃[k + 1] =
[ Σ_{i≠0} R_{x_i x_i}[k+1]    −R_{x_1 x_0}[k+1]           ...  −R_{x_{N−1} x_0}[k+1]
  −R_{x_0 x_1}[k+1]           Σ_{i≠1} R_{x_i x_i}[k+1]    ...  −R_{x_{N−1} x_1}[k+1]
  ...                         ...                         ...  ...
  −R_{x_0 x_{N−1}}[k+1]       −R_{x_1 x_{N−1}}[k+1]       ...  Σ_{i≠N−1} R_{x_i x_i}[k+1] ],

R_{x_i x_j}[k + 1] = x_i[k + 1] x_j^T[k + 1],  i, j = 0, 1, ..., N − 1.  (40)

It was shown that with this MCLMS algorithm the channel estimate converges in the mean to the true impulse responses (up to a scale and a common delay). However, the convergence rate of this algorithm is normally slow. To accelerate the convergence rate, a normalized multichannel frequency-domain LMS (NMCFLMS) algorithm was developed in [25]. Different from the MCLMS method, which updates the channel estimate every snapshot, the NMCFLMS algorithm operates in the frequency domain on a block-by-block basis. First, the multichannel observation signals are partitioned into successive blocks. The fast Fourier transform (FFT) is then applied to each block to estimate its Fourier spectrum. The frequency-domain channel estimate is then updated using the normalized LMS algorithm. Finally, the time-domain impulse responses are obtained by applying the inverse FFT to the frequency-domain channel estimate. See Algorithm 5 for how to obtain the channel estimates and [25] for the detailed derivation of the NMCFLMS algorithm.

Once ĥ[k] is obtained (with either the MCLMS algorithm or the NMCFLMS algorithm), the time-domain estimate of the impulse responses is computed by the inverse Fourier transform, and the time delay between the ith and jth sensors is determined as

τ_ij = argmax_l | ĥ_{j,l} | − argmax_l | ĥ_{i,l} |.  (41)

4. ALGORITHM COMPLEXITY

This section briefly compares the computational complexity of the different TDE algorithms. As seen, all the algorithms estimate time-delay information in two steps. The first step involves the estimation of the cost function. The second step obtains the time delay estimate by searching for the extremum of the cost function. If we assume that the different cost functions have the same length, it can easily be checked that all the algorithms have a similar complexity in the second step.


Algorithm step | (Real-valued) multiplications
Obtain a frame of observation signal at time instant k: x_n(k) = [x̄_n[0], ..., x̄_n[K − 1]]^T = [x_n[k], ..., x_n[k + K − 1]]^T | —
Estimate the spectrum of x_0(k): X_0[k'] = Σ_{k=0}^{K−1} x̄_0[k] e^{−j2πkk'/K} = FFT_K{x_0(k)}, k' = 0, 1, ..., K − 1 | (K/2) log2(K) − 5K/4
Estimate the spectrum of x_1(k): X_1[k'] = FFT_K{x_1(k)}, k' = 0, 1, ..., K − 1 | (K/2) log2(K) − 5K/4
Compute the weighted cross-spectrum Ŝ_{x0x1}[k'] / |Ŝ_{x0x1}[k']|, with Ŝ_{x0x1}[k'] = X_0[k'] X_1*[k'] | 4K + 8
Estimate the PHAT cost function: ψ_PHAT[m] = Σ_{k'=0}^{K−1} ( Ŝ_{x0x1}[k'] / |Ŝ_{x0x1}[k']| ) e^{j2πmk'/K} = IFFT_K{ Ŝ_{x0x1}[k'] / |Ŝ_{x0x1}[k']| }, m = 0, 1, ..., K − 1 | 2K log2(K) − 7K + 12
Total | 3K log2(K) − (11/2)K + 20
Total/sample | 3 log2(K) − 11/2 + 20/K

Algorithm 1: Computational complexity of the PHAT algorithm. FFT_K{·} and IFFT_K{·} are K-point fast Fourier and inverse fast Fourier transforms, respectively. In addition, due to the symmetry property, we only need to perform K/2 + 1 complex multiplications and divisions during computation of the weighted cross-spectrum.

Therefore, we only compare the computational burdens required for estimating the cost function. Here the computational complexity is evaluated in terms of the number of real-valued multiplications/divisions required for the implementation of each algorithm. The number of additions/subtractions is neglected because they are much quicker to compute on most generic hardware platforms. We assume that complex-valued multiplications are transformed into real-valued multiplications. The multiplication between a real number and a complex number requires 2 real-valued multiplications. The multiplication between two complex numbers needs 4 real-valued multiplications. The division between a complex number and a real number requires 2 real-valued multiplications.

As mentioned earlier, there are different member algorithms in the GCC family. Each involves two FFT operations to estimate the cross-spectrum, some multiplications for the weighting process, and an IFFT operation for computing the GCC function. If the Fourier transform of a real-valued series of length K is computed using the FFT routine devised in [61], it requires (K/2) log2(K) − 5K/4 multiplications. An IFFT operation of a complex-valued series of length K requires 2K log2(K) − 7K + 12. The complexity of the PHAT algorithm is summarized in Algorithm 1. Similarly, the computational load for other GCC member algorithms can easily be counted, and it will not be presented here.

Unlike the GCC method, which estimates the time delay on a frame-by-frame basis, the LMS-type adaptive algorithm updates the cost function whenever a new data sample is available. For each data sample, the number of multiplications required for computing the cost function is shown in Algorithm 2, which is higher than that of the PHAT algorithm.

The MCCC can be computed either on a block-by-block basis or in an iterative way. Its complexity is described in Algorithm 3. We see that, depending on the number of sensors, the MCCC algorithm is generally more computationally expensive than the GCC method. Notice that a more computationally efficient algorithm can be formulated to calculate the MCCC using the FFT. This is, however, beyond the scope of this paper.

The computational burdens required for the estimation of channel impulse responses using either the AED or the NMCFLMS algorithm are presented in Algorithms 4 and 5, respectively. Depending on the length of the modeling filter, the estimation of channel impulse responses usually requires more multiplications than estimating the generalized cross-correlation function. However, such a magnitude of computational complexity should not be a big concern with today's computer processors.
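To give a feel for the numbers, the closing formulas of Algorithms 1 and 2 can be evaluated for the parameter values used later in the experiments (K = L = 1024); the snippet below is just that arithmetic.

```python
import math

K = L = 1024
phat_per_sample = 3 * math.log2(K) - 11 / 2 + 20 / K    # Algorithm 1, total/sample
lms_per_sample = 4 * L + 3                               # Algorithm 2, total/sample
print(f"PHAT: {phat_per_sample:.1f} mults/sample; LMS: {lms_per_sample} mults/sample")
# roughly 24.5 versus 4099 real-valued multiplications per sample
```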


Algorithm step | (Real-valued) multiplications
Parameters: h[k] = [h_0, h_1, ..., h_l, h_{l+1}, ..., h_{2L}]^T | —
Obtain a signal vector x_1(k) at time instant k: x_1(k) = [x_1[k − L], x_1[k − L + 1], ..., x_1[k − 1], x_1[k], x_1[k + 1], ..., x_1[k + L]]^T | —
Compute the error signal at time instant k: e[k] = x_0[k] − h^T[k] x_1(k) | 2L + 1
Update the filter coefficients: h[k + 1] = h[k] + μ e[k] x_1(k) | 2L + 2
Total/sample | 4L + 3

Algorithm 2: Computational complexity of the LMS-type adaptive algorithm.

Algorithm step | (Real-valued) multiplications
Obtain a frame of observation signal at time instant k: x̄_n[k], k = 0, 1, ..., K − 1, n = 0, 1, ..., N − 1 | —
Prewhitening: x̃_n(k) = IFFT_K{ FFT_K{x_n(k)} / |FFT_K{x_n(k)}| }, n = 0, 1, ..., N − 1 | N[(5/2)K log2(K) − (31/4)K + 13]
Compute the matrix R̃(k, m) (unit diagonal, off-diagonal entries ρ_ij(k, m) = r_ij(k, m) / √( r_ii(k, m) r_jj(k, m) ), with r_ij(k, m) updated recursively from r_ij(k − 1, m)), for i, j = 0, 1, ..., N − 1 and −τ_max ≤ m ≤ τ_max | (2K + 3) N (N − 1)(2τ_max + 1)
Estimate the MCCC cost function: det[ R̃(k, m) ], −τ_max ≤ m ≤ τ_max | (2τ_max + 1)(N³/3 + 5N/3)
Total | 4τ_max K N² + 4τ_max K N + 2KN² + (5/4)NK + (5/2)NK log2(K) + (2/3)τ_max N³ + 6τ_max N² + (1/3)N³ + (28/3)τ_max N + 3N² + (43/3)N
Total/sample | 4τ_max N² + 4τ_max N + 2N² + (5/4)N + (5/2)N log2(K) + (1/K)[ (2/3)τ_max N³ + 6τ_max N² + (1/3)N³ + (28/3)τ_max N + 3N² + (43/3)N ]

Algorithm 3: Computational complexity of the MCCC algorithm. It is assumed that the determinant of a matrix is computed through LU decomposition, which requires N³/3 + 5N/3 multiplications [62].

    5. RESOLUTION PROBLEM

All the TDE techniques described above measure time delay based on discrete signal samples. The delay estimate is, therefore, an integral multiple of the sampling period. Such a resolution, which depends on the sampling rate and several other factors, may not be adequate for some applications. How to improve the TDE resolution becomes another challenging problem, and it has attracted much attention in the past few decades. Different solutions can be applied, depending on the TDE algorithm and the nature of the application.


Algorithm step | (Real-valued) multiplications
Parameters: u = [h_1^T  −h_0^T]^T, h_0 = [h_{0,0} h_{0,1} ... h_{0,L−1}]^T, h_1 = [h_{1,0} h_{1,1} ... h_{1,L−1}]^T | —
Construct the signal vector at time instant k: x[k] = [x_0^T(k)  x_1^T(k)]^T, x_0(k) = [x_0[k], x_0[k − 1], ..., x_0[k − L + 1]]^T, x_1(k) = [x_1[k], x_1[k − 1], ..., x_1[k − L + 1]]^T | —
Compute the error signal at time instant k: e[k] = u^T[k] x[k] | 2L
Update the filter coefficients: u[k + 1] = ( u[k] − μ e[k] x[k] ) / ‖ u[k] − μ e[k] x[k] ‖ | 6L + 2
Total/sample | 8L + 2

Algorithm 4: Computational complexity of the AED algorithm.

To illustrate, let us examine a simple case in the context of direction of arrival (DOA) estimation, where we have two sensors and one source in the far field, as shown in Figure 3. The angular resolution, which governs the ability of the system to separate two closely spaced sources, is determined by how many different DOA measurements can be made between 0 and π. Assuming that the distance between the two sensors is d, the velocity of wave propagation is c, and the sampling rate is f, we can easily check that the maximum delay in samples that can be estimated is df/c, the minimum is −df/c, and the bearing angle θ relates to the time delay τ by

θ = arccos( cτ / d ).  (42)

Therefore, the number of different measurements of θ in [0, π] depends on the number of different delay estimates in [−df/c, df/c]. As a result, to increase the angular resolution, we need to have more distinct delay measurements between −df/c and df/c. This can be achieved in the following three ways.

(i) Interpolation. Since its mathematical expectation has been shown to be band limited and to present a symmetric peak around the true time delay, the estimated cross-correlation function can be approximated by a concave parabola in the neighborhood of its maximum [40, 63, 64]. As a result, parabolic interpolation can be applied to the cross-correlation-based algorithms to obtain a finer TDE resolution, which is a fraction of the sampling period (a sketch of this parabolic fit is given after this list). Such a scheme has been adopted in many systems. However, if the statistic of the cost function is not band limited, we, in general, cannot apply parabolic interpolation. Note that in real environments, the applicability of interpolation is also limited by the SNR condition. If the SNR is very low, then interpolation will introduce significant bias. For the channel-identification TDE techniques, if the estimated channel impulse responses approximate the true ones, the interpolation technique can also be applied to increase resolution. However, in most situations, the impulse responses estimated with the blind techniques are only accurate enough for identifying the direct path, but not good enough for interpolation.

(ii) Increasing the sampling rate. The higher the sampling rate, the more distinct delay estimates can be acquired between −df/c and df/c, which in turn leads to a higher DOA resolution. This approach, however, will increase the complexity of both the TDE algorithm and some subsequent processing blocks of the system.

(iii) Increasing d. The DOA resolution can also be improved by increasing d. Apparently, this will increase the array size, so this method is hard to implement in scenarios where space is limited. Also, a larger d may cause a spatial aliasing problem, which may not be a big concern for the task of source localization, but has to be treated with great care in the context of beamforming and noise reduction. In addition, increasing d may lead to a higher complexity since we may have to increase the block size to compute the cost function and search for the delay estimate over a larger delay range.
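A minimal sketch of the three-point parabolic fit mentioned in item (i) above is given below (ours; it assumes a uniformly spaced lag grid and a peak that is not at either end of the grid).

```python
import numpy as np

def parabolic_peak(cost, lags):
    """Refine the integer-lag peak of a cost function to a fractional lag by
    fitting a parabola through the peak sample and its two neighbors."""
    i = int(np.argmax(cost))
    if i == 0 or i == len(cost) - 1:
        return float(lags[i])                    # cannot interpolate at the edges
    y0, y1, y2 = cost[i - 1], cost[i], cost[i + 1]
    denom = y0 - 2.0 * y1 + y2
    offset = 0.0 if denom == 0 else 0.5 * (y0 - y2) / denom   # vertex of the parabola
    return float(lags[i] + offset * (lags[1] - lags[0]))
```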

    6. EXPERIMENTS

This section attempts to compare the performance of the different TDE algorithms in both noisy and reverberant environments.


Algorithm step | (Real-valued) multiplications
Parameters: ĥ = [ĥ_0^T ĥ_1^T ... ĥ_{N−1}^T]^T, ĥ_n = [ĥ_{n,0} ĥ_{n,1} ... ĥ_{n,L−1}]^T; ρ_f, the step size; δ, the regularization factor | —
Initialization: ĥ_n[0] and P_n[0] | —
Construct a frame of signal of length 2L at time instant k; update the filter coefficients at time k (for k = 0, 1, ...): | —
ĥ_n^{10}[k] = FFT_{2L}{ [ĥ_n^T[k]  0_{1×L}]^T } | N[L log2(L) − (3/2)L]
x_n[k + 1]_{2L×1} = FFT_{2L}{ x_n(k + 1)_{2L×1} } | N[L log2(L) − (3/2)L]
P_n[k + 1] = λ P_n[k]_{2L×1} + (1 − λ) Σ_{i=0, i≠n}^{N−1} x_i*[k + 1] ⊙ x_i[k + 1] | 4N²L − 2NL
[P_n[k + 1] + δ 1_{2L×1}]^{−1} → P_n^{−1}[k + 1]_{2L×1} | 2NL
e_ij[k + 1]_{2L×1} = x_i[k + 1] ⊙ ĥ_j^{10}[k] − x_j[k + 1] ⊙ ĥ_i^{10}[k] for i ≠ j, and 0_{2L×1} for i = j (i, j = 0, 1, ..., N − 1) | 2N²L − 2NL
e_ij[k + 1]_{2L×1} = IFFT_{2L}{ e_ij[k + 1]_{2L×1} } | [N(N − 1)/2][L log2(L) − (3/2)L]
Obtain e_ij[k + 1], which consists of the last L elements of e_ij[k + 1]; e_ij^{01}[k + 1] = FFT_{2L}{ [0_{1×L}  e_ij^T[k + 1]]^T } | [N(N − 1)/2][L log2(L) − (3/2)L]
Δĥ_n^{10}[k] = Σ_{i=0}^{N−1} x_i*[k + 1] ⊙ e_in^{01}[k + 1] ⊙ P_n^{−1}[k + 1] | 4NL
Δĥ_n^{10}[k] = IFFT_{2L}{ Δĥ_n^{10}[k] } | N[L log2(L) − (3/2)L]
ĥ_n^{10}[k + 1] = ĥ_n^{10}[k] − ρ_f Δĥ_n^{10}[k] | 2NL
Obtain ĥ_n[k + 1]_{L×1}, which consists of the first L elements of ĥ_n^{10}[k + 1]_{2L×1} | —
ĥ[k + 1] = ĥ[k + 1] / ‖ ĥ[k + 1] ‖ (impose the unit-norm constraint) | NL
Total | N(N + 2)[L log2(L) − (3/2)L] + 5NL + 6N²L
Total/sample | N(N + 2)[log2(L) − 3/2] + 6N² + 5N

Algorithm 5: Computational complexity of the NMCFLMS algorithm. FFT_{2L}{·} and IFFT_{2L}{·} are 2L-point fast Fourier and inverse fast Fourier transforms, respectively; ⊙ denotes the elementwise (dot) product.

    6.1. Experimental setup

In an attempt to simulate reverberant acoustic environments, the image model technique [65] is used. We consider a rectangular room with plane reflective boundaries (walls, ceiling, and floor). Each boundary is characterized by a uniform reflection coefficient, which is independent of the frequency and the incidence angle of the source signal. The following parameter values are used.

(i) Room dimensions: 120 × 180 × 150 inches (x × y × z).
(ii) Reflection coefficients: r_i (i = 1, 2, ..., 6), varying between 0 and 1.
(iii) Source positions: two omnidirectional point sources are located at (100, 100, 40) and (32, 100, 40), respectively.
(iv) Sensor positions: a linear array consisting of four (4) ideal point microphones placed in parallel with the x-axis. The four microphones are located at (20, 10, 40), (28, 10, 40), (36, 10, 40), and (44, 10, 40), respectively. The directivity pattern of each microphone is assumed to be omnidirectional.
(v) SNR: varying between −10 dB and 25 dB.

A low-pass sampled version of the impulse response of the acoustic transmission channel between each source and each microphone is generated using the image method. A speech signal from a female speaker, digitized with 16-bit resolution at 16 kHz, is then convolved with the synthetic impulse responses. Finally, mutually independent white Gaussian noise is properly scaled and added to each microphone signal to control the SNR.
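A minimal sketch of this data-generation step is given below (ours). The image-method impulse responses themselves are assumed to be precomputed by an existing implementation of [65]; the sketch only convolves the source with each response and adds white Gaussian noise scaled to the target SNR.

```python
import numpy as np

def make_microphone_signals(s, rirs, snr_db, rng=None):
    """Convolve the source s with each synthetic room impulse response and add
    independent white Gaussian noise scaled to the requested SNR."""
    rng = np.random.default_rng() if rng is None else rng
    outs = []
    for h in rirs:                               # one impulse response per microphone
        clean = np.convolve(s, h)[:len(s)]
        noise = rng.standard_normal(len(clean))
        noise *= np.sqrt(np.mean(clean ** 2) / 10 ** (snr_db / 10.0))
        outs.append(clean + noise)
    return np.array(outs)
```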


Table 1: Parameter setup for each TDE algorithm.

Algorithm | Window size | Window type | FFT size | Smoothing factor | Filter length | Adaptation step size
CC | K = 1024 | Kaiser | K = 1024 | 0.95 | N/A | N/A
PHAT | K = 1024 | Kaiser | K = 1024 | 0.95 | N/A | N/A
ML | K = 1024 | Kaiser | K = 1024 | 0.95 | N/A | N/A
AED | K = 1024 | Rectangular | N/A | 0.95 | L = 1024 | μ = 0.01
LMS | N/A | N/A | N/A | N/A | L = 1024 | μ = 0.0001
MCCC | K = 1024 | Rectangular | N/A | 0.95 | N/A | N/A
AMC | K = 1024 | Rectangular | 2048 | 0.8 [60] | L = 1024 | ρ_f = 0.2 [60]
FUSION | K = 1024 | Kaiser | K = 1024 | 0.95 | N/A | N/A

[Figure omitted: a plane wavefront from source s[k] impinging on sensors 0 and 1 separated by distance d.]

Figure 3: Illustration of the TDE resolution problem in the context of DOA estimation.


    6.2. Implementation

Delay estimates were obtained on a frame-by-frame basis. The frame size used in all experiments is 64 ms. For the cross-correlation-based techniques (including dual- and multiple-channel algorithms), a 64-ms Kaiser window was applied to the analysis frame, while a rectangular window of the same length was applied for the channel-identification-based algorithms. To reduce the temporal effect of noise on TDE performance, the cost function of each algorithm is smoothed using a single-pole recursion as follows:

ψ̄_k = α ψ̄_{k−1} + (1 − α) ψ_k,  (43)

where ψ_k denotes the cost function estimated using the kth frame of observation data and ψ̄_k is a smoothed version of the cost function, based on which the delay estimates were obtained. For the MCCC algorithm, the signal was prewhitened before computing the cost function. Therefore, this method, in the case of two sensors, is equivalent to the PHAT algorithm. For the ML method, we assume that the noise spectrum is known a priori. The fusion algorithm implemented here is the consistency method presented in [51]. All the parameters used in each algorithm are summarized in Table 1.
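The recursion (43) amounts to the few lines below (ours; α is the smoothing factor listed in Table 1).

```python
import numpy as np

def smooth_cost_functions(costs, alpha=0.95):
    """Single-pole recursive smoothing of per-frame cost functions, eq. (43)."""
    smoothed = []
    psi_bar = np.zeros_like(np.asarray(costs[0], dtype=float))
    for psi in costs:
        psi_bar = alpha * psi_bar + (1.0 - alpha) * np.asarray(psi, dtype=float)
        smoothed.append(psi_bar.copy())
    return smoothed
```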

It is not always easy to compare different algorithms fairly. In our experiments, we tuned each individual algorithm to its best performance in a nonreverberant and weakly noisy (SNR = 25 dB) environment. We then tested and compared all the algorithms in reverberant and different noise conditions. Such a process should, in general, not favor any specific algorithm.

    6.3. Experimental results

A great deal of effort has been devoted to analyzing the TDE performance of the GCC technique in reverberant environments [66, 67], but not much comparison has been made between correlation-based and system-identification-based algorithms. In this experiment, we compare all the algorithms outlined previously for their performance in different reverberant environments.


[Figure omitted: histograms of percent hits versus delay (samples), one row per algorithm (CC, PHAT, ML, AED, LMS, MCCC, AMC, FUSION) and one column per reverberation time (T60 = 120 ms, 350 ms, 580 ms).]

Figure 4: TDE performance in moderately noisy and reverberant environments, where SNR = 15 dB and T60 = 120 ms, 350 ms, and 580 ms, respectively.


Figure 4 shows histograms of TDE results in a moderate noise condition, where SNR = 15 dB. The source is a speech signal from a female speaker and its location is (100, 100, 40). The first, second, and third columns correspond, respectively, to reverberation times of 120 ms, 350 ms, and 580 ms. The true time delay between sensors 0 and 1 is equal to 5 (samples). It can be seen that, in the first two reverberant environments, all algorithms can accurately identify the time delay. When the reverberation time is increased to 580 ms, both the CC and the ML methods suffer significant performance degradation, showing that these two approaches are sensitive to reverberation. The PHAT algorithm, though it also belongs to the GCC family like the CC and ML methods, still yields a reasonable performance, implying its robustness with respect to reverberation. This corroborates many observations reported in the literature [46]. Among the five techniques that use two sensors (i.e., CC, PHAT, ML, AED, LMS), the AED algorithm delivers the best performance. This indicates that taking reverberation into account in the signal model is an effective way of dealing with reverberation. Comparing the MCCC, AMC, and fusion algorithms with the dual-sensor techniques, one can easily see the advantage of using multiple sensors. Since the AMC algorithm was formulated from the reverberation signal model and uses multiple sensors, it is not surprising to see that it achieves the best performance in this strongly reverberant environment.

The second experiment involves a set of data obtained in nonreverberant (simulated by setting all the reflection coefficients to 0) but noisy environments. The source signal and its presentation are the same as in the previous experiment.

Figure 5 shows histograms of the delay estimates. The first, second, and third columns correspond, respectively, to noise conditions of SNR = 15 dB, 5 dB, and −5 dB. In general, all TDE techniques are quite robust to noise. They work pretty well even when the SNR is as low as 5 dB. When the SNR drops down to −5 dB, the TDE performance begins to deteriorate, even though the degree of degradation varies across algorithms. Among all the eight algorithms studied, the LMS method is the most sensitive to noise. We may consider improving this technique by using adaptive algorithms that have a faster convergence rate or a lower steady-state error. The PHAT algorithm, which demonstrated the highest robustness with respect to reverberation in the GCC family, is inferior to both the CC and the ML approaches in additive noise. The ML algorithm delivers a better performance than the CC method. This indicates that some a priori knowledge can help the estimator cope with distortion. Among the five dual-sensor techniques, we noticed that the AED algorithm demonstrates the highest robustness not only to reverberation, but to additive noise as well. This observation differs from our intuition, since it is well perceived that blind channel identification techniques are in general sensitive to noise. We attribute this to the nature of the TDE problem, which only requires identifying the direct path. Estimation of the whole impulse response, depending on its length and many other factors, may be sensitive to noise; but identification of the direct path is a much easier task, and it can be immune to noise.

Comparing the AMC with the AED algorithm, we did not see as much improvement as we observed in the previous experiment. This is understandable. The motivation behind the AMC algorithm is to circumvent the common-zero problem: the probability of a common zero being shared among channels decreases when the number of channels increases. In this experiment, however, all the channels apparently share no common zero since there is no reverberation. As a result, the AMC is similar to the AED algorithm in performance.

Both the MCCC and fusion methods yield a performance superior to that of the techniques with two sensors, indicating that using multiple sensors is a good way to improve the robustness of TDE with respect to additive noise. The MCCC shows a better performance than the fusion method.

The final experiment tests the TDE algorithms for their tracking ability. To simulate a moving source, we first place the source at (100, 100, 40) for 30 seconds and then switch it to (32, 100, 40). Again, the source is a speech signal as used in the previous experiments. In this case, the true delay is 5 (samples) in the first 30 seconds and 3 (samples) thereafter. The average SNR is 0 dB, and T60 = 240 ms. Figure 6 shows the TDE results. The AED, LMS, and AMC algorithms are adaptive in nature; they take some time to converge to a new delay. The other five methods are nonadaptive. However, due to the smoothing process, they also take some time to adapt to the new source position. From the results, one can see that all the algorithms can adjust to the new delay in less than one second.

    7. SUMMARY

Time delay estimation, which serves as a fundamental step for a source localization or beamforming system, has attracted a considerable amount of research attention in the past few decades, and various techniques have been developed in the literature. This paper briefly summarized these efforts and reviewed the state of the art, the critical techniques, and the recent advances that have significantly improved the performance of time delay estimation in adverse environments. Broadly, the reviewed techniques can be classified into two categories: cross-correlation-based methods and system-identification-based approaches. Both categories can be implemented either with two sensors or with multiple sensors. We evaluated eight algorithms, including five dual-channel techniques and three multiple-channel techniques, in both reverberant and noisy environments. Among the five studied dual-channel techniques, the adaptive eigenvalue decomposition algorithm demonstrated the best performance in both noise and reverberation conditions, showing its great potential for real applications. In general, more sensors lead to higher robustness because of the redundancy. However, it should be pointed out that attention has to be paid to the implementation of the multichannel cross-correlation algorithm and the fusion method. Both need to synchronize either the signals observed at different sensors or the cost functions from different sensor pairs. In case the true delay is not an integral multiple of the sampling period, we will have to either increase the sampling rate or use interpolation, which may significantly increase the computational complexity. In case the observation signals or the cost functions are not properly aligned, we may not achieve much improvement.


    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    SNR = 15dB

    CC:

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    SNR = 5 dB

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    SNR = 5 dB

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    PHAT:

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    ML:

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    AED:

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    LMS:

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    MCCC:

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Pe

    rcenthits

    10 6 2 2 6 10

    Delay (samples)

    AMC:

    0

    20

    40

    60

    80

    100

    Pe

    rcenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Pe

    rcenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    FUSION:

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    0

    20

    40

    60

    80

    100

    Percenthits

    10 6 2 2 6 10

    Delay (samples)

    Figure 5: TDE performance in nonreverberant but noisy environments, where SNR = 15 dB, 5 dB, and -5 dB, respectively (percent hits versus delay in samples for the CC, PHAT, ML, AED, LMS, MCCC, AMC, and FUSION algorithms).


    Figure 6: Tracking performance of different algorithms in a noisy and reverberant environment, where SNR = 0 dB and T60 = 240 ms (estimated delay in samples versus time in seconds for the CC, PHAT, ML, AED, LMS, MCCC, AMC, and FUSION algorithms).


    If the true delay is not an integer multiple of the sampling period, we will have to either increase the sampling rate or use interpolation, which may significantly increase the computational complexity; and if the observation signals or the cost functions are not properly aligned, we may not achieve much improvement.
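    As one way of handling such fractional delays without raising the sampling rate, the following sketch refines an integer-lag estimate by fitting a parabola through the cross-correlation values at the peak and its two neighbors; this three-point interpolation is a common choice, not necessarily the scheme the authors have in mind.

```python
# Parabolic (three-point) peak interpolation around the integer-lag maximum
# of a cross-correlation function; a sketch under the assumption that the
# peak is locally well approximated by a parabola.
def parabolic_refine(cc_left, cc_peak, cc_right, peak_lag):
    """Refine an integer lag estimate to a fractional value.

    cc_left, cc_peak, cc_right are the correlation values at lags
    peak_lag - 1, peak_lag, and peak_lag + 1, respectively.
    """
    denom = cc_left - 2.0 * cc_peak + cc_right
    if denom == 0.0:              # flat neighborhood: keep the integer lag
        return float(peak_lag)
    offset = 0.5 * (cc_left - cc_right) / denom
    return peak_lag + offset
```

    The refinement costs only a few arithmetic operations per frame, at the price of a small bias whenever the correlation peak is not locally parabolic.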


    Jingdong Chen received the B.S. degree in electrical engineering and the M.S. degree in array signal processing from the Northwestern Polytechnic University in 1993 and 1995, respectively, and the Ph.D. degree in pattern recognition and intelligence control from the Chinese Academy of Sciences in 1998. From 1998 to 1999, he was with ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan. He then joined Griffith University, Brisbane, Australia, as a Research Fellow. From 2000 to 2001, he worked at ATR Spoken Language Translation Research Laboratories, Kyoto, Japan. He joined Bell Laboratories as a Member of Technical Staff in July 2001. His research interests include adaptive signal processing, speech enhancement, adaptive noise/echo cancellation, and microphone array processing. He coauthored one monograph book and coauthored/coedited one edited book.

    Jacob Benesty received the Master's degree in microwaves from Pierre and Marie Curie University, France, in 1987, and the Ph.D. degree in control and signal processing from Orsay University, France, in April 1991. From January 1994 to July 1995, he worked at Telecom Paris University. From October 1995 to May 2003, he was first a Consultant and then a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ, USA. In May 2003, he joined the University of Quebec, INRS-EMT, in Montreal, Quebec, Canada, as an Associate Professor. His research interests are in acoustic signal processing and multimedia communications. Dr. Benesty received the 2001 Best Paper Award from the IEEE Signal Processing Society. He was a Member of the editorial board of the EURASIP Journal on Applied Signal Processing and was the cochair of the 1999 International Workshop on Acoustic Echo and Noise Control. He coauthored two books. He also coedited/coauthored four other books.

    Yiteng (Arden) Huang received the B.S. degree from Tsinghua University in 1994, and the M.S. and Ph.D. degrees from the Georgia Institute of Technology (Georgia Tech) in 1998 and 2001, respectively, all in electrical and computer engineering. He is now a Member of Technical Staff at Bell Labs, where he conducts research in acoustic and speech signal processing for multimedia communications. Dr. Huang is currently an Associate Editor of the EURASIP Journal on Applied Signal Processing. He is a Member of the Signal Processing Theory and Methods and the Audio and Electroacoustics Technical Committees of the IEEE Signal Processing Society. He served as an Associate Editor for the IEEE Signal Processing Letters from 2002 to 2005. He was a technical cochair of the 2005 Joint Workshop on Hands-Free Speech Communication and Microphone Array. He coauthored one monograph book and coauthored/coedited two other edited books. He received the 2002 Young Author Best Paper Award from the IEEE Signal Processing Society, and a number of other awards/honors for his academic performance and services.

