Source Localization with a Small Circular Array

Bachelor's Thesis
to confer the academic degree of
Bachelor of Science
in the Bachelor's Program
Electronics and Information Technology

Author: Peter Schober
Submission: Institute of Signal Processing
Thesis Supervisor: Univ.-Prof. Dr. Mario Huemer
Assistant Thesis Supervisor: DI Andreas Gaich
October 2017

JOHANNES KEPLER UNIVERSITY LINZ
Altenbergerstraße 69, 4040 Linz, Austria
www.jku.at, DVR 0093696
Contents

1 Introduction
  1.1 Description of the Microphone Array
  1.2 Coordinate System, Direction of Arrival
  1.3 Near-Field and Far-Field
2 Sound Source Localizer
  2.1 Preprocessing the Microphone Signals
    2.1.1 Framing
    2.1.2 Discrete Fourier Transform
3 One Frame Direction of Arrival Estimation
  3.1 Based on Pairwise Time Delay Estimation
    3.1.1 Cross-Correlation
    3.1.2 Generalized Cross-Correlation
    3.1.3 Weighting Function ψ
    3.1.4 DOA Estimation with Root Mean Square Error Minimization
    3.1.5 Azimuth Estimation Closed Form
  3.2 Based on Steered Response Power
    3.2.1 SRP-PHAT
4 Computational Costs
  4.1 Time Difference of Arrival
  4.2 Root Mean Square Error Minimization
  4.3 Closed Form Azimuth Estimation
  4.4 Steered Response Power
5 Simulation
  5.1 Standardized Library of Audio Files
  5.2 Performance Measures, Standard Deviation and Average
    5.2.1 Error Rate
  5.3 Unknown Measurement Uncertainty
  5.4 Performance Evaluation for 0 Degree
    5.4.1 Performance Comparison
    5.4.2 Dominant Reflection from the Wall Behind
  5.5 Performance Evaluation for 15 Degree
  5.6 Estimation of the Radius in Closed Form Algorithm
6 Conclusion
7 Appendix
  7.1 Mathematic Symbol Reference
  7.2 Simulation
1 Introduction
The need for locating sound sources comes with electronic devices interacting with humans. Before analysing the meaning of spoken words, it is useful to let the device locate the speaker, e.g. in order to listen electronically in that direction. This thesis takes a closer look at Amazon Echo, an audio gadget which offers a voice-controlled personal assistant called Alexa. The device captures the sound and transmits it to a server via the internet. There, algorithms analyse the spoken words and are capable of responding to food orders, weather forecasts, music playback, traffic information and many other requests. To enable all of these interactions it is crucial to send clear voice signals, to mitigate other speakers, to suppress noise and to limit the negative effect of reverberation. These signal processing steps can be carried out when the source direction is known. In this thesis, source localization starts with sound waves being captured by multiple microphones arranged in an array. The microphone array used is similar to the one integrated in Amazon Echo (Figure 1). With knowledge of the spatial distribution of the microphones, signal processing is done on short audio frames to provide high update rates on the source location. The performance of this technique generally improves with computational power and the amount of data being used. However, this decreases update rates and increases latency. The goal of this bachelor's thesis is to compare the performance of three algorithms, one low-cost closed-form and two search-based, under different amounts of reverberation and sensor noise.
1.1 Description of the Microphone Array
In this thesis, a circular array with 6 microphones placed around one centrally located microphone is used (Figure 2).
Figure 1: Amazon Echo Dot [1]
Figure 2: Circular array with one centrally located microphone (measured microphone azimuths 59.66°, 120.63°, 181.21°, 240.58°, 299.59°, 359.32°; radii between 38.18 mm and 39.65 mm)
For ease of implementation, the 6 circularly arranged microphones are treated as uniformly distributed with angle y_m = 2πi/6, with index i = m − 2 running from 0 to 5, and a constant radius r = 0.039 m.
1.2 Coordinate System, Direction of Arrival
In this thesis, a three-dimensional coordinate system is used. The center of the microphone array is the origin and all microphones lie in the xy-plane. Figure 3 shows that the direction of arrival (DOA) vector ~ς points from the origin to the source; it is defined in spherical coordinates with elevation θ and azimuth ϕ.

~ς = [cosθ · cosϕ, cosθ · sinϕ, sinθ]ᵀ (1)
In the far-field (Section 1.3) the DOA vector is equivalent for each microphone, because the wave fronts appear planar.
Figure 3: Definition of the DOA vector in terms of azimuth ϕ and elevation θ
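The conversion from azimuth and elevation to the DOA unit vector of Equ. 1 can be sketched as follows (a minimal illustration with my own helper name, assuming NumPy):

```python
import numpy as np

def doa_vector(azimuth, elevation):
    """DOA unit vector of Equ. 1; angles in radians."""
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])

v = doa_vector(np.radians(30.0), 0.0)  # source at 30 deg azimuth in the array plane
```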
1.3 Near-Field and Far-Field
Sound waves propagate as spherical wavefronts centered around the source. An observer close to the radiator sees a curvature of the arriving waves; for an observer far from the source, the waves appear planar (Figure 4). This is called near- and far-field behavior. The curvature of the wave has to be taken into account for sources close to the microphone array. With increasing distance, the error made by assuming planar waves decreases. It should be mentioned that estimating the range from the microphone array to the source is not possible in the far-field.
The phase delay for microphone m with coordinates ~p_m and a source located at ~p_s can be modelled as

d_m(f, ~p_s, ~p_m) = exp(−j2πf |~p_s − ~p_m| / c). (2)

This is always valid, even for sources far away. In the far-field, Equ. 2 is transformed into a function of the DOA ~ς(ϕ,θ):

d_m(f, ~ς, ~p_m) = exp(−j2πf (|~p_m|/c) cosθ cos(ϕ − y_m)) (3)

In Equ. 3 a planar microphone array is assumed, with y_m being the azimuth of microphone m.
Figure 4: Plane wave approximation for a source far away
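The far-field phase-delay model of Equ. 3 can be sketched numerically (an illustration with my own function name, assuming NumPy; the value c = 343 m/s is the one used later in the thesis):

```python
import numpy as np

C = 343.0  # speed of sound in m/s, as used later in the thesis

def phase_delay_farfield(f, azimuth, elevation, r_m, y_m):
    """Far-field phase delay of Equ. 3 for one microphone at radius r_m, azimuth y_m."""
    return np.exp(-2j * np.pi * f * r_m / C
                  * np.cos(elevation) * np.cos(azimuth - y_m))

# A microphone looking straight at the source (phi = y_m, theta = 0):
d = phase_delay_farfield(1000.0, 0.0, 0.0, 0.039, 0.0)
```

The result is a pure phase factor, |d| = 1, as a delay does not change the magnitude.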
2 Sound Source Localizer
Sound source localization (SSL) is usually processed in the frequency domain (Figure 5). To start with, SSL segments the discrete microphone signals x_m[n], with microphone index m from 1 to M, into blocks (frames) of length N using a window function (Chapter 2.1.1). The notation for one block of a single microphone channel is x_m^(b)[n], with block index b from 1 to B. Subsequently, the discrete Fourier transform (Chapter 2.1.2) is applied to each block. Following that, a SSL algorithm operates on each block whenever the voice activity detector detects valid input. Eventually, the one-frame-SSL block outputs the estimated direction accompanied by a confidence level. By averaging over multiple one-frame estimations or by using knowledge about the room geometry and the array position, a SSL post-processor can enhance the precision of the estimation [2].
Figure 5: Block diagram of a Sound Source Localizer
2.1 Preprocessing the Microphone Signals
2.1.1 Framing
SSL in real-time requires segmenting the microphone inputs x_m[n] into blocks of length N and processing the blocks one after another.

x_m^(b)[n] = w[n] x_m[bA + n] for m = 1 ... M, n = 0 ... N − 1

The blocks x_m^(b)[n] usually overlap. In this thesis the overlap rate is 50%, i.e. A = N/2. Choosing the right window function w[n] and setting an appropriate length N is important for detecting frequency components close to each other (high spectral resolution). In this thesis the Hanning window (Equ. 4) is used.

w_H[n] = c_i · 0.5 [1 − cos(2πn/(N − 1))] for 0 ≤ n ≤ N − 1, and 0 else (4)

The window length N is a trade-off between responsiveness and accuracy. A larger block size also increases the computational demands. [3]
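The framing step above can be sketched as follows (my own helper, assuming NumPy; `np.hanning` implements the window of Equ. 4 without the scaling constant c_i):

```python
import numpy as np

def frames(x, N):
    """Blocks x_m^(b)[n] = w[n] x[bA + n] with 50% overlap (A = N/2) and a Hann window."""
    w = np.hanning(N)              # 0.5 * (1 - cos(2*pi*n/(N-1))), c_i omitted
    A = N // 2                     # hop size for 50% overlap
    B = (len(x) - N) // A + 1      # number of complete blocks
    return np.stack([w * x[b * A : b * A + N] for b in range(B)])

x = np.random.default_rng(3).standard_normal(16000)  # 1 s of noise at fs = 16 kHz
X = frames(x, 512)                                   # 61 blocks of length 512
```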
2.1.2 Discrete Fourier Transform
The discrete Fourier transform is defined as follows

X_m^(b)[k] = Σ_{n=0}^{N−1} x_m^(b)[n] exp(−jk(2π/N)n) (5)

= Σ_{n=0}^{N−1} x_m^(b)[n] cos(kn 2π/N) − j Σ_{n=0}^{N−1} x_m^(b)[n] sin(kn 2π/N), (6)

where N determines the DFT length with 0 ≤ k ≤ N − 1. After rewriting Equ. 5 using Euler's formula (Equ. 6), one can see that coefficient k can be interpreted as a correlation between the signal x_m^(b)[n] and a sine and a cosine.

For localizing speech sources, the bandwidth of interest is typically between f_low = 300 Hz and f_high = 6 kHz. Therefore there are at least

f_low · ∆T_min = 300 Hz · 20 ms = 6

periods of the lowest frequency component processed in one DFT. This is important to get higher peaks and less spectral leakage at lower frequencies.
3 One Frame Direction of Arrival Estimation
The algorithms for SSL under investigation in this thesis can be divided into two-step and one-step algorithms. Two-step methods are based on time delay estimation (TDE): they calculate pairwise displacements of the microphone signals and combine them according to the spatial distribution of the microphones. In this thesis a closed-form (Section 3.1.5) and a search-based version (Section 3.1.4) are discussed. One-step methods are based on the steered response power (SRP). These algorithms form a beam and steer it over predefined directions. A delay-and-sum beamformer with PHAT weighting (SRP-PHAT) will be discussed in Section 3.2.1.
3.1 Based on Pairwise Time Delay Estimation
The algorithms based on time delay estimation (TDE) use the propagation speed of sound waves to localize sources. Depending on the source position relative to the array, a sound wave reaches each spatially distributed microphone with a certain delay. Several techniques have been proposed as a second step after TDE to localize sources or directions of arrival (DOA, Section 1.2) [4, 3, 2]. The time difference of arrival (TDOA) is calculated pairwise; an M-microphone array has P = M(M − 1)/2 unique pairs. Selecting two microphones m1, m2, a sound wave in the far-field reaches microphone m2 (Figure 6) with delay

τ_d = (d/c) sinφ, (7)

where d represents the distance between the microphones and c is the velocity of propagation.
Figure 6: Time delay as indication for DOA [2, p. 273]
In ℝ³, a cone of revolution with vertex ~0 marks all DOAs producing the same time delay τ_d. By calculating the intersections of such cones, a set of possible DOAs is obtained.
3.1.1 Cross-Correlation
The most intuitive approach for TDOA estimation is cross-correlation. For discrete signals the cross-correlation is defined as

c_12[n] = (x1 ⋆ x2)[n] := Σ_{m=−∞}^{∞} x1*[m] x2[m + n]. (8)
With the DFT property x*[−m] ∘—• X*[k] and the time shift theorem x[n − m] ∘—• X[k] exp(−jk(2π/N)m), Equ. 8 can be transformed to the frequency domain.
DFT[c_12[n]] = Σ_{n=0}^{N−1} Σ_{m=−∞}^{∞} x1*[m] x2[m + n] exp(−jk(2π/N)n)

= Σ_{m=−∞}^{∞} x1*[m] Σ_{n=0}^{N−1} x2[m + n] exp(−jk(2π/N)n)

= Σ_{m=−∞}^{∞} x1*[m] X2[k] exp(jk(2π/N)m)   (time shift theorem)

= (Σ_{m=−∞}^{∞} x1[m] exp(−jk(2π/N)m))* X2[k]

= X1*[k] X2[k] (9)

This gives the Fourier transform pair x1 ⋆ x2 ∘—• X1* X2.
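The transform pair can be checked numerically; the sketch below (assuming NumPy, with the circular indexing implied by the DFT) compares a direct evaluation of Equ. 8 against the frequency-domain product of Equ. 9:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x1 = rng.standard_normal(N)
x2 = np.roll(x1, 5)                      # x1 delayed by 5 samples (circularly)

# Frequency-domain evaluation, r = iDFT[X1* X2] (Equ. 9):
r = np.fft.ifft(np.conj(np.fft.fft(x1)) * np.fft.fft(x2)).real

# Direct evaluation of Equ. 8 with circular indexing:
c = np.array([np.sum(x1 * np.roll(x2, -n)) for n in range(N)])

# Both agree, and the peak sits at the true delay of 5 samples.
```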
3.1.2 Generalized Cross-Correlation
Experience shows that the performance of TDE through cross-correlation decreases rapidly with reverberation and noise. Various techniques have been proposed to overcome this disadvantage. The generalized cross-correlation (GCC) [5] introduces a weighting function ψ into Equ. 9.
r_12[n] = iDFT[ψ · X1* X2] (10)
= iDFT[ψ · G_X1X2] (11)
= (1/K) Σ_{k=0}^{K−1} ψ[k] G_X1X2[k] exp(jk(2π/K)n) (12)

τ̂_d = (1/f_s) · argmax_{n : n/f_s ∈ D} r_12[n] (13)

D = {τ ∈ ℝ | −d/c ≤ τ ≤ d/c} restricts the search interval to the physically possible time delays. r_12 should exhibit a peak at the time delay τ_d. The height and uniqueness of the peak depend on the source signal, reverberation, noise and ψ.
3.1.3 Weighting Function ψ
A weighting function is motivated by the performance loss of the DOA estimation in reverberant and noisy surroundings. Assuming the far-field sound model and that s(t) is uncorrelated with the noise terms n1(t), n2(t),

x1(t) = s(t) + n1(t) (14)
x2(t) = s(t + τ_d,i) + n2(t)
X1(f) = S(f) + N1(f)
X2(f) = S(f) exp(−j2πf τ_d,i) + N2(f),

the power spectral densities G_X1X1(f), G_X2X2(f) and the cross-power spectral density G_X1X2(f) are given as

G_X1X1(f) = X1(f) X1*(f) (15)
= S(f)S*(f) + S(f)N1*(f) + S*(f)N1(f) + N1(f)N1*(f) (16)
= G_SS(f) + G_N1N1(f) (17)

G_X1X2(f) = α G_SS(f) exp(−j2πf τ_d,i) + G_N1N2(f), (18)

where the cross terms in Equ. 16 vanish because s and n1 are uncorrelated.
The multiplication by exp(−j2πf τ_d,i) in Equ. 18 corresponds to a convolution in the time domain. Hence, multiple peaks will be detected in the cross-correlation function.

c_12(τ) = c_SS(τ) ∗ Σ_i α_i δ(τ − τ_d,i) (19)

Equation 19 results from Equ. 11 when setting ψ = 1, using Equ. 18, considering only the reverberation delays τ_d,i and assuming zero noise power, G_N1N2 = 0. The peaks in r_12(τ) become lower with rising noise power G_N1N2 ≠ 0. Choosing the right weighting function ψ is therefore crucial to the performance.
PHAT: The phase transform (PHAT) is known to perform well in realistic environments. It is robust to reverberation, but sub-optimal under reverberation-free conditions. It is defined as follows [5]:

ψ_PHAT(f) = 1 / |G_X1X2(f)|. (20)

For the sound model with uncorrelated noise (Equ. 14), the cross-power spectral density (Equ. 18) and the definition of the GCC (Equ. 11), r_12(τ) shows peaks at the delays τ_d,i:

r_12(τ) = iDFT[G_X1X2(f) / |G_X1X2(f)|] (21)
= iDFT[exp(−j2πf τ_d,i)] (22)
= δ(τ − τ_d,i) (23)

PHAT weighting is sensitive to noise because low-power signals are emphasized. Implementations of ψ_PHAT should additionally be guarded to prevent a division by zero whenever the cross-power spectral density at some frequency f_i is zero, |G_X1X2(f_i)| = 0.
Filter Weighting: The bandpass weighting function attenuates frequencies outside the band of interest. For speech detection, ψ is defined as

ψ_bandpass(f) = 1 for 300 Hz ≤ f ≤ 6000 Hz, and 0 else. (24)

Since much of the noise power lies in the low frequencies, a bandpass weighting is often used in addition to other weighting functions. [3]
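A minimal GCC-PHAT delay estimator combining Equ. 10-13 and 20 might look as follows (an illustrative sketch with my own function name, assuming NumPy; the guard against division by zero implements the remark above):

```python
import numpy as np

def gcc_phat_delay(x1, x2, fs, max_tau):
    """TDOA estimate via GCC with PHAT weighting (Equ. 10-13 and 20)."""
    K = len(x1)
    G = np.conj(np.fft.fft(x1)) * np.fft.fft(x2)      # cross-power spectrum (Equ. 9)
    G /= np.maximum(np.abs(G), 1e-12)                 # PHAT weight, guarded against /0
    r = np.fft.ifft(G).real                           # Equ. 12
    r = np.concatenate((r[-(K // 2):], r[:K // 2]))   # reorder lags to -K/2 .. K/2-1
    lags = np.arange(-(K // 2), K // 2)
    valid = np.abs(lags / fs) <= max_tau              # the interval D of Equ. 13
    n_hat = lags[valid][np.argmax(r[valid])]
    return n_hat / fs

# Two identical noise signals, the second delayed by 3 samples (circularly):
fs = 16000
s = np.random.default_rng(1).standard_normal(1024)
tau = gcc_phat_delay(s, np.roll(s, 3), fs, max_tau=4 / fs)  # -> 3 / fs
```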
3.1.4 DOA Estimation with Root Mean Square Error Minimization
The DOA can be estimated by calculating the TDOAs τ_ij for all M(M − 1)/2 microphone pairs.

τ_ij(ϕ,θ) = ~ς · (~p_mi − ~p_mj) / c (25)

= [cosθ·cosϕ, cosθ·sinϕ, sinθ] · ([x_i, y_i, z_i]ᵀ − [x_j, y_j, z_j]ᵀ) · (1/c) (26)
This can be done once for a predefined set of ϕ and θ and stored in the vector ~τ.

~τ = [τ_12 τ_13 ... τ_1M τ_23 τ_24 ... τ_2M ... τ_(M−1)M] (27)
The measured time delays

τ̂_ij = (1/f_s) · argmax_{n : n/f_s ∈ D} r_ij[n] (28)

τ̂ = [τ̂_12 τ̂_13 ... τ̂_1M τ̂_23 τ̂_24 ... τ̂_2M ... τ̂_(M−1)M], (29)

are compared to the vector in Equ. 27. The source direction is found where the root mean square (RMS) error is minimal.

e(ϕ,θ) = sqrt((~τ(ϕ,θ) − τ̂)ᵀ(~τ(ϕ,θ) − τ̂)) (30)

[ϕ̂, θ̂] = argmin_{ϕ,θ} e(ϕ,θ) (31)
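The search of Equ. 30-31 can be sketched as follows (assuming NumPy, a planar array with θ = 0 and my own function names; for simplicity the "measured" delays here are computed noise-free rather than with the GCC):

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def rms_doa_azimuth(mic_pos, tau_meas, grid_deg):
    """Grid search minimizing the RMS error of Equ. 30-31 (planar case, theta = 0)."""
    M = len(mic_pos)
    pairs = [(i, j) for i in range(M) for j in range(i + 1, M)]
    best_phi, best_err = None, np.inf
    for phi in np.radians(grid_deg):
        doa = np.array([np.cos(phi), np.sin(phi), 0.0])  # Equ. 1 with theta = 0
        tau = np.array([doa @ (mic_pos[i] - mic_pos[j]) / C for i, j in pairs])  # Equ. 25
        err = np.sqrt(np.sum((tau - tau_meas) ** 2))     # Equ. 30
        if err < best_err:
            best_phi, best_err = phi, err
    return np.degrees(best_phi)

# Noise-free check: delays of a 40 deg far-field source on the uniform circular array.
ang = np.radians(np.arange(6) * 60.0)
mics = 0.039 * np.stack([np.cos(ang), np.sin(ang), np.zeros(6)], axis=1)
src = np.array([np.cos(np.radians(40.0)), np.sin(np.radians(40.0)), 0.0])
pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]
tau_true = np.array([src @ (mics[i] - mics[j]) / C for i, j in pairs])
phi_hat = rms_doa_azimuth(mics, tau_true, np.arange(0, 360, 5))
```

Since the simulated 40° source lies on the 5° grid, the search recovers it exactly.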
3.1.5 Azimuth Estimation Closed Form
For the following algorithm, microphone m2 (array from Section 1.1) is the origin of the coordinate system. This does not affect the angle estimation, because in the far-field all points within the array have the same DOA. Considering only the microphones m2, m6 and the source, we form the triangle ~0 – ~p6 – ~ps (Figure 7).
Figure 7: Triangle ~0 – ~p_6 – ~p_s used for the law of cosines
The law of cosines in this triangle leads to

(r_s + d_6)² = r_6² + r_s² − 2 ~p_6 · ~p_s (32)

ε_6 = r_6² − d_6² − 2 r_s d_6 − 2 ~p_6 · ~p_s. (33)

In this equation, d_6 = τ_6 c is the range difference, τ_6 is the TDOA computed with the GCC and ε_6 is the equation error. Rewriting Equ. 33 for all microphones m_i, with index i from 3 to M, in matrix form gives

ε = r² − d² − 2 r_s d − 2 S ~p_s (34)

with the squares taken element-wise and

S = [ x_3 y_3 z_3
      x_4 y_4 z_4
      ...
      x_M y_M z_M ]. (35)
Here, the error vector ε is linear in the unknown ~p_s. The least squares solution to Equ. 34 is [6]

~p_s = (1/2) (SᵀS)⁻¹ Sᵀ (δ − 2 r_s d). (36)

It is valid to disregard δ = [r_3² − d_3², r_4² − d_4², ..., r_M² − d_M²]ᵀ because of the far-field condition, which leaves

~p_s · (1/r_s) = −(SᵀS)⁻¹ Sᵀ d. (37)
With a planar array the elevation estimate is fixed to θ̂ = 0. The result is

~p_s · (1/r_s) = [cos θ̂ · cos ϕ̂, cos θ̂ · sin ϕ̂, sin θ̂]ᵀ = [cos ϕ̂, sin ϕ̂, 0]ᵀ, (38)

and the azimuth can be calculated with ϕ̂ = cos⁻¹(p_s,x / r_s).
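The least-squares step of Equ. 37 can be sketched as follows (assuming NumPy and my own function name; note that SᵀS is singular for a planar array, so the pseudo-inverse via `lstsq` stands in for (SᵀS)⁻¹Sᵀ, and `atan2` replaces the cos⁻¹ of the text to resolve the sign of the angle):

```python
import numpy as np

def closed_form_azimuth(S, d):
    """Closed-form azimuth of Equ. 37-38.

    S : (M-1, 3) microphone positions relative to the reference microphone
    d : (M-1,)  range differences d_i = tau_i * c from the GCC delays
    """
    # Pseudo-inverse via lstsq, since S^T S is singular for a planar array.
    u = -np.linalg.lstsq(S, d, rcond=None)[0]
    # atan2 instead of acos, to resolve the sign of the angle.
    return np.degrees(np.arctan2(u[1], u[0]))

# Noise-free far-field check on the uniform circular array (r = 0.039 m):
ang = np.radians(np.arange(6) * 60.0)
mics = 0.039 * np.stack([np.cos(ang), np.sin(ang), np.zeros(6)], axis=1)
S = mics[1:] - mics[0]                    # the first microphone is the reference
u_true = np.array([np.cos(np.radians(70.0)), np.sin(np.radians(70.0)), 0.0])
d = -S @ u_true                           # far-field range differences
phi_hat = closed_form_azimuth(S, d)
```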
3.2 Based on Steered Response Power
The one-step algorithms form a beam and compute the received power. In this thesis a delay-and-sum beamformer is used. The beamformer delays each signal x_m by the steering delay ∆_m and sums over all microphone inputs (Figure 8). ∆_m depends on the microphone position and the current steering direction.
y(t, ∆_1 ... ∆_M) = Σ_{m=1}^{M} x_m(t − ∆_m)

Y(f, ∆_1 ... ∆_M) = Σ_{m=1}^{M} X_m(f) exp(−j2πf ∆_m)
Figure 8: Delay-and-sum beamformer [7, p. 32]
3.2.1 SRP-PHAT
The power at the beamformer output is defined as

p_Bf(ϕ,θ,k) = dᴴ S_k d, (39)

where the steering vector d collects the phase delays of the microphones for a given DOA and frequency. For planar arrays with microphones only in the x and y coordinates, the phase delay of the m-th microphone is

d_m(ϕ,θ,f) = exp(j2πf (r_m/c) cosθ cos(ϕ − y_m)), (40)

with
• ϕ, θ ... azimuth and elevation.
• r_m ... length of the vector ~p_m to microphone m.
• f = k f_s/N ... the frequency of the current Fourier bin k at sample rate f_s.
• c ... speed of sound.
• y_m ... azimuth of microphone m; for the array described in Section 1.1, y = [2π(1/M) 2π(2/M) ... 2π(M−1)/M].
S_k = X_k X_kᴴ is an M×M matrix, with entries X_m,k = X_m[k] and

X_k = [X_1,k X_2,k ... X_M,k]ᵀ. (41)
The delay-and-sum beamformer outputs, one for each frequency bin, are combined with PHAT weighting.

P_SSL(ϕ,θ) = (1/N) Σ_{k=1}^{N} (M / |X_kᴴ X_k|) p_Bf(ϕ,θ,k) (42)
The source direction is determined by the maximum of P_SSL.

[ϕ̂, θ̂] = argmax_{ϕ,θ} P_SSL(ϕ,θ) (44)
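A direct, unoptimized sketch of the SRP-PHAT search (assuming NumPy, a planar array with θ = 0 and my own function names; only the positive-frequency bins are scanned):

```python
import numpy as np

C = 343.0   # speed of sound in m/s
FS = 16000  # sample rate in Hz

def srp_phat_azimuth(X, mic_radius, mic_az, grid_deg):
    """SRP-PHAT azimuth search (Equ. 39-44); X holds one DFT frame per mic, shape (M, K)."""
    M, K = X.shape
    freqs = np.arange(K) * FS / K
    power = np.zeros(len(grid_deg))
    for i, phi in enumerate(np.radians(grid_deg)):
        # steering vectors of Equ. 40 for all bins at once, shape (K, M)
        D = np.exp(2j * np.pi * np.outer(freqs, mic_radius / C * np.cos(phi - mic_az)))
        for k in range(1, K // 2):                     # positive-frequency bins only
            Sk = np.outer(X[:, k], np.conj(X[:, k]))   # Sk = Xk Xk^H (Equ. 41)
            w = M / max(np.abs(np.conj(X[:, k]) @ X[:, k]), 1e-12)  # PHAT weight (Equ. 42)
            power[i] += w * np.real(np.conj(D[k]) @ Sk @ D[k])      # d^H Sk d (Equ. 39)
    return grid_deg[np.argmax(power)]                  # Equ. 44, azimuth on the grid

# Synthetic far-field frame from a 120 deg source on the uniform circular array:
mic_az = np.arange(6) * np.pi / 3
r = np.full(6, 0.039)
K = 256
spec = np.random.default_rng(2).standard_normal(K) + 1.0
phases = np.outer(r / C * np.cos(np.radians(120.0) - mic_az), np.arange(K) * FS / K)
X = spec * np.exp(2j * np.pi * phases)
phi_hat = srp_phat_azimuth(X, r, mic_az, np.arange(0, 360, 5))
```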
4 Computational Costs
There are several ways to compare algorithms in terms of their computational costs. One possibility is the number of multiplications (MULTs) and additions (ADDs) needed during one DOA estimation.

For simplicity and clarity:
• Divisions will be treated as MULTs.
• Complex MULTs take four real MULTs and two real ADDs, and complex ADDs take two real ADDs.

• Comparisons of two numbers and subtractions will be treated as additions.

• The following operations will be ignored in the cost computation: negating, matrix transposition, evaluation of the exponential function and conjugating complex numbers.

• Preparations, i.e. calculations which can be done ahead of time, will not be taken into account.

• All algorithms use PHAT weighting and the GCC without interpolation.
Table 1: Variables used in computational cost calculations
DFT Length        K  2048
Frame Length      N  512
Microphones       M  6
Microphone Pairs  P  15
Search Steps      D  71
Valid GCC bins    L  9
A matrix MULT can be broken down into MULTs and ADDs:

• A_(a×b) · B_(b×a)
  – MULTs: a²b
  – ADDs: a²(b − 1)

• A_(a×b) · B_(b×1)
  – MULTs: ab
  – ADDs: a(b − 1)
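These counting rules can be captured in two small helpers (my own illustrative functions, not part of the thesis):

```python
def matmul_costs(a, b):
    """MULTs and ADDs of A_(a x b) * B_(b x a) under the rules above."""
    return a * a * b, a * a * (b - 1)

def matvec_costs(a, b):
    """MULTs and ADDs of A_(a x b) * B_(b x 1)."""
    return a * b, a * (b - 1)

# e.g. the 3x5 by 5x3 product appearing in the closed-form algorithm:
print(matmul_costs(3, 5))  # -> (45, 36)
```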
All algorithms require frequency-domain processing. The (inverse) fast Fourier transform is an efficient way to compute the (inverse) discrete Fourier transform. Counting the real operations contained in the complex MULTs and ADDs, it takes [8]

• MULTs: 3K log₂ K

• ADDs: 2K log₂ K
4.1 Time Difference of Arrival
Estimating the TDOA requires
1. FFT for M microphones.
2. calculating the cross power spectrum (Equ. 11). K complex MULTsfor P pairs of microphones.
3. PHAT weighting; normalizing the magnitude costs K MULTs for Ppairs.
4. iFFT for P microphone pairs.
5. a maximum search, comparing the L = 9 valid bins and scaling (Equ. 13).
Table 2: TDOA estimation costs
FFT                   MULT  M · 3K log2 K   4.06·10^5
                      ADD   M · 2K log2 K   2.70·10^5
Cross-power spectrum  MULT  4PK             1.23·10^5
                      ADD   2PK             6.14·10^4
Phase transform       MULT  PK              3.07·10^4
iFFT                  MULT  P · 3K log2 K   1.01·10^6
                      ADD   P · 2K log2 K   6.76·10^5
Maximum search        MULT  P               1.5·10^1
                      ADD   PL              1.35·10^2
Sum                   MULT                  1.57·10^6
                      ADD                   1.01·10^6
4.2 Root Mean Square Error Minimization
The RMS minimization algorithm requires

1. the TDOA estimation for all P microphone pairs.

2. generating the error vector. The square root will be ignored, because removing it does not affect the outcome of the search process.
   • MULTs: P for each of the D search steps.
   • ADDs: P subtractions for each of the D search steps, plus D(P − 1) for the dot products.

3. a minimum search.
Table 3: RMS minimization PHAT costs
TDOA                           MULT             1.57·10^6
                               ADD              1.01·10^6
Error vector e(ϕ,θ) (Equ. 30)  MULT  DP         1.07·10^3
                               ADD   D(2P − 1)  2.06·10^3
Minimum search (Equ. 31)       ADD   D          7.10·10^1
Sum                            MULT             1.57·10^6
                               ADD              1.01·10^6
4.3 Closed Form Azimuth Estimation
The closed-form algorithm requires

1. the TDOA for five microphone pairs, i.e. six FFTs and five iFFTs.

2. one (M−1)×(M−1) matrix inverse. The costs are about (M − 1)³/3 MULTs and ADDs [9].

3. three matrix operations. The cos⁻¹ calculation will be neglected.
Table 4: Closed-form PHAT costs
TDOA for five pairs             MULT                 7.95·10^5
                                ADD                  5.16·10^5
Equ. 37: A_3×(M−1) · B_(M−1)×3  MULT  2 · 3²(M − 1)  9.00·10^1
and A_3×3 · B_3×(M−1)           ADD   2 · 3²(M − 2)  7.20·10^1
A_3×(M−1) · B_(M−1)×1           MULT  3(M − 1)       1.50·10^1
                                ADD   3(M − 2)       1.20·10^1
Inverse                         MULT  (M − 1)³       1.25·10^2
                                ADD   (M − 1)³       1.25·10^2
Sum                             MULT                 7.95·10^5
                                ADD                  5.16·10^5
4.4 Steered Response Power
The SRP-PHAT algorithm requires calculating

1. the FFT for M microphones.

2. the cross-power spectrum for all P pairs.

3. the beamformer output power: for each of the D directions and K bins, two matrix operations A_1×M · B_M×M · C_M×1.

4. the PHAT weighting: for each of the D directions and K bins, one dot product A_1×M · B_M×1 and one scaling.

5. the sum over all frequency bins and the maximum search.
Table 5: SRP-PHAT costs
FFT                                  MULT                       4.06·10^5
                                     ADD                        2.70·10^5
Cross-power spectrum S_k = X_k X_kᴴ  MULT  4PK                  1.23·10^5
                                     ADD   2PK                  6.14·10^4
Beamformer power p_Bf = dᴴ S_k d     MULT  DK(M² + M)           6.11·10^6
                                     ADD   DK(M(M − 1) + M − 1) 5.09·10^6
PHAT weighting                       MULT  DK(M + 1)            1.02·10^6
                                     ADD   DK(M − 1)            7.27·10^5
Sum and search (Equ. 42, 44)         ADD   D(K − 1) + D         1.45·10^5
Sum                                  MULT                       7.65·10^6
                                     ADD                        6.29·10^6
5 Simulation
Although it would be possible for RMS minimization and SRP-PHAT to perform a 2D DOA localization, this thesis will focus on the azimuth estimation only. In general, a planar, circular microphone array is not effective in 2D estimation, but it performs well in 360° azimuth localization. Estimating the elevation with the closed-form algorithm presented in Section 3.1.5 would require a 3D microphone array geometry.
Settings of the Simulation: The sound pressure level (SPL) is a logarithmic measure of the effective pressure of a sound relative to the reference value p₀ = 20 µPa.

L_p = 20 log₁₀(p/p₀) dB

The source in this simulation radiates with 60 dB SPL at 1 metre distance. As all electronic devices are noisy, the microphones are simulated with a sensor SNR between 60 and 90 dB. The relations are shown in Figure 9.
Figure 9: Resulting input SNR with min and max sensor SNR (reference 94 dB SPL; a 60 dB SPL input with microphone noise floors of 34 and 4 dB SPL yields input SNRs of 26 and 56 dB)
The impact of reverberation on the estimations is simulated with T_rev = 0/300/600 ms. Table 6 summarizes the simulation parameters.
Table 6: Simulation settings
SPL Source                  60 dB
Input SNR                   26 to 56 dB
Reverberation               0/300/600 ms
Grid for search algorithms  0, 5, 10, ..., 355 degrees
Interpolation of the GCC: The speed of sound is approximately 343 m/s. At this speed, a sound wave travels the 80 mm across the array in 233 µs, which is 3.73 sampling periods at f_s = 16 kHz. This means that there are only 9 valid bins where the GCC function (Section 3.1.2) can reach its maximum. More precisely, the estimated time delay can only take the values t_delay = n/f_s for −4 ≤ n ≤ 4. To overcome this constraint, parabolic interpolation is used in the RMS minimization and closed-form algorithms (Sections 3.1.4 and 3.1.5). This significantly decreases the unknown measurement uncertainty (Section 5.3, Table 7) but increases the computational cost by ≈ 180%.
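The parabolic interpolation fits a parabola through the maximum GCC bin and its two neighbours and takes the vertex as the refined lag; a sketch (my own helper, assuming NumPy):

```python
import numpy as np

def parabolic_peak(r, n0):
    """Fit a parabola through (n0-1, n0, n0+1) and return the fractional peak lag."""
    y1, y2, y3 = r[n0 - 1], r[n0], r[n0 + 1]
    return n0 + 0.5 * (y1 - y3) / (y1 - 2 * y2 + y3)

# Exact for a sampled parabola with its vertex at lag 3.3:
n = np.arange(7)
r = -(n - 3.3) ** 2
n_hat = parabolic_peak(r, int(np.argmax(r)))  # -> 3.3
```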
5.1 Standardized Library of Audio Files
Simulations are done using an audio library of high-quality audio recordings, the Grid corpus [10]. In total, 50 files with different speakers, male and female, are used to provide meaningful results.
5.2 Performance Measures, Standard Deviation and Average
Statistical averages are often used as performance measures. Let X be a random variable (RV), e.g. the outcome of a simulation, with samples x_i and mean value µ:

E[X] = µ = (1/N) Σ_{i=1}^{N} x_i.

E denotes the average or expected value of X. The standard deviation of a RV is a measure of how much it fluctuates around its mean:

σ_X = sqrt(var(X)) = sqrt(E[(X − µ)²]) = sqrt((1/N) Σ_{i=1}^{N} (x_i − µ)²).
5.2.1 Error Rate
In general, the error rate is the percentage of estimations with an error greater than or equal to some threshold, plotted as a function of that threshold. The azimuth error rate will be used to evaluate the performance of the DOA estimation algorithms.
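This measure can be computed as follows (my own sketch, assuming NumPy):

```python
import numpy as np

def error_rate(errors_deg, thresholds_deg):
    """Percentage of estimates with |error| >= threshold, one value per threshold."""
    e = np.abs(np.asarray(errors_deg, dtype=float))
    return np.array([100.0 * np.mean(e >= t) for t in thresholds_deg])

# Four azimuth errors evaluated at thresholds of 0, 5 and 10 degrees:
rates = error_rate([0.4, -6.0, 2.1, 12.0], [0, 5, 10])  # -> [100., 50., 25.]
```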
5.3 Unknown Measurement Uncertainty
The TDOA estimation improves whenever the microphone array is symmetric with respect to the DOA vector. For DOAs matching this condition, the propagation times of the sound waves are pairwise equal, setting the resolution error caused by sampling to zero. For that reason, simulations at 0° and 15° will be discussed: the uncertainty is maximal for 15° and minimal for e.g. 0° and 30°. If a source is simulated with DOAs ~ς = [cosϕ, sinϕ, 0]ᵀ, ϕ = 0, 1, ..., 30°, at high SNR and without reverberation (ideal circumstances), the maximal unknown measurement uncertainty is obtained (Figures 10 and 11).
Figure 10: Azimuth range (0° to 30°) used for estimating the maximal unknown measurement uncertainty
The uncertainty depends on the algorithm and the calculation grid. For SRP-PHAT (Section 3.2.1) and RMS minimization (Section 3.1.4), the search was discretized in steps of 5 degrees, i.e. ϕ_i = i · 5°. Reducing the step size increases the computational complexity and lowers the uncertainty. The best performing algorithm is SRP-PHAT (Figure 11). All algorithms actually perform quite well under ideal circumstances and with interpolation of the GCC function. Table 7 shows the maximum and mean uncertainties and the time consumption ratios for SRP-PHAT, RMS minimization and closed-form. It should be mentioned that closed-form and RMS minimization are un-weighted here; adding PHAT weighting would decrease their accuracy and underline the superiority of SRP.
Figure 11: Mean of the azimuth estimation for source angles from 0 to 30°, without and with interpolation (curves: Closed Form, SRP-PHAT, Ideal, and the error criterion)
Detailed results on the standard deviation are plotted in the Appendix, Section 7.2, Figure 21.
Table 7: Maximum uncertainty error in °
Algorithm          Max    Mean   Mean-Absolute  Time Ratio
With interpolation
SRP-PHAT           -2     -0.19  0.99           761
RMS Min PHAT       -2.81  -0.14  1.12           11.4
Closed Form PHAT   -2.49  -0.33  0.82           2.8
Without interpolation
RMS Min PHAT       5.6    -0.65  2.03           4
Closed Form PHAT   7.4    -0.9   2.62           1
5.4 Performance Evaluation for 0 Degree
A number of simulations were performed to evaluate and compare the performance of SRP-PHAT, RMS minimization and closed-form, with and without PHAT weighting. In total there are 18 different simulation settings: six SNR steps, each combined with three reverberation times, all with the same source location (Figure 12).
Figure 12: Room dimensions, microphone array (black) and source (red)location
The simulation results for the standard deviation and average are listed in the tables below. All values are rounded to one decimal place, which corresponds to an error of at most 1.7 mm for a source at 1 metre distance.
Table 8: Standard deviation and average for SRP-PHAT in degrees
         Standard deviation in °   Average in °
SNR\Rev  0 s    0.3 s  0.6 s       0 s    0.3 s  0.6 s
26 dB    11.2   15.1   30.3        -0.1   0.9    4.1
32 dB    4.7    11.9   27.7        -0.1   0.6    4.1
38 dB    1.4    10.5   27.2        0      0.6    4.2
44 dB    0.6    8.7    26.8        0      0.4    4.1
50 dB    0.2    9      26.8        0      0.5    4.1
56 dB    0      8.7    27          0      0.4    4.1
SRP-PHAT (Table 8) shows an average estimation that is almost unaffected by the SNR; the variations are less than 0.6°. The effects of reverberation on the azimuth estimations are more pronounced. The search grid is set to 5°. Increasing the number of grid points would improve the performance.
Table 9: Standard deviation and average for closed-form PHAT in degrees
         Standard deviation in °   Average in °
SNR\Rev  0 s    0.3 s  0.6 s       0 s    0.3 s  0.6 s
26 dB    27.2   41.2   55.2        0.8    -0.3   -0.1
32 dB    16.8   29.2   50.7        0.3    0.3    0.5
38 dB    7.5    26.6   49.3        0      0.1    0.8
44 dB    5.7    22.7   49.9        0      0.8    1.3
50 dB    6.2    22.7   49.6        0.1    0.1    2
56 dB    6.6    22.4   49.8        0      0.3    2
Closed-form PHAT (Table 9) performs well in the average azimuth estimation because it is not limited to 5° grid steps like the search-based algorithms. One way to further improve the estimation performance is to increase the amount of data processed in one frame.
Table 10: Standard deviation and average for RMS minimization PHAT in degrees
         Standard deviation in °   Average in °
SNR\Rev  0 s    0.3 s  0.6 s       0 s    0.3 s  0.6 s
26 dB    30.7   40     50.3        -0.8   2      3
32 dB    17.7   26.4   45.4        0.1    1.1    2.7
38 dB    7.8    21.6   43.3        0      1.4    3.4
44 dB    4.3    19.2   42.7        -0.1   0.5    4
50 dB    3      18.2   42.2        0      0.4    5
56 dB    2.9    18.3   42.3        0      0.7    5.2
RMS minimization PHAT (Table 10) is a cost-effective search-based algorithm: with the current simulation settings it only takes 4 times the computation power of the closed-form solution. Because of this, a finer grid could be considered.
Performance of algorithms without weighting: As discussed in Section 3.1.3, PHAT solutions are superior to un-weighted algorithms in noisy and reverberant scenarios. Removing the weighting increases the standard deviation for realistic room environments significantly; most rooms have a reverberation time higher than 300 ms. For the average azimuth estimation over all frames, the dependence on the amount of reverberation is smaller, because variations with opposite signs cancel out. One advantage of the un-weighted algorithms is their robustness to the SNR: comparing Table 11 with Table 9 and Table 12 with Table 10, it can be observed that the performance of the un-weighted algorithms hardly changes with the SNR.
Table 11: Standard deviation and average for closed-form in degrees
         Standard deviation in °   Average in °
SNR\Rev  0 s    0.3 s  0.6 s       0 s    0.3 s  0.6 s
26 dB    7.8    54.1   80.4        0      0      -1.1
32 dB    2.6    54     80.5        0.1    1.9    -1.4
38 dB    1.1    54.3   81          0.1    1.2    -2.5
44 dB    1.2    54.5   81.2        0      2.1    -1.3
50 dB    1      54.5   81.4        0.1    2.1    -1.5
56 dB    0.9    54.5   81.3        0.1    1.8    -1.7
Table 12: Standard deviation and average for RMS minimization in degrees
         Standard deviation in °   Average in °
SNR\Rev  0 s    0.3 s  0.6 s       0 s    0.3 s  0.6 s
26 dB    6.9    53.9   76.2        0      9.8    24.2
32 dB    2      53.8   76.2        0      12.3   25.6
38 dB    0.7    53.5   75.5        0      12.5   27.8
44 dB    0.7    53.6   75.4        0      13.8   29.2
50 dB    0.6    53.7   75.6        0      13.2   29.1
56 dB    0.6    53.6   75.7        0      13.5   29
5.4.1 Performance Comparison
The comparison will be done with 600 ms reverberation time and four corner cases:
600 ms: Using the error percentages (Figures 13 to 15) as comparison criterion, one can see that

• SRP-PHAT consistently outperforms the closed-form and RMS minimization algorithms.

• the PHAT solutions are superior only for SNR > 38 dB.

• the PHAT solutions are more sensitive to noise.

This noise sensitivity of the PHAT weighting stems from the fact that low-power signals are emphasized (Section 3.1.3).
Figure 13: Closed-form azimuth error rate for 600 ms reverberation (panels: Closed-Form and Closed-Form PHAT, 26 to 56 dB SNR)
Figure 14: RMS minimization azimuth error rate for 600 ms reverberation (panels: RMS Minimization and RMS Minimization PHAT, 26 to 56 dB SNR)
[Figure: one panel, "SRP-PHAT 26-56dB SNR"; x-axis: error threshold in degrees (0 to 20), y-axis: azimuth error rate (0 to 100%); one curve per SNR: 26, 32, 38, 44, 50, 56dB]
Figure 15: SRP-PHAT azimuth error rate for 600ms reverberation
Corner Cases The error percentages of SRP-PHAT, RMS minimization PHAT and closed-form PHAT for the four corner cases (Table 13) are shown in Figure 16.
Table 13: Corner cases of the simulation

SNR\Rev   Min               Max
Min       26dB and 0 s      26dB and 0.6 s
Max       56dB and 0 s      56dB and 0.6 s
Notice that SRP-PHAT consistently outperforms the other two methods. Both GCC-PHAT solutions are clearly inferior even in high SNR scenarios, due to the TDOA estimation. Closed-form and RMS minimization accuracy are nearly the same for errors > 5°, although closed-form requires only a quarter of the computation power. On the one hand, all three algorithms are robust to reverberation because of the PHAT weighting: for example, 85 percent of the estimates for 56dB SNR and 600ms reverberation show an error of less than 5 degrees. On the other hand, for 26dB SNR and 0ms reverberation, only 81 percent are correct.
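The error-rate curves in Figures 13 to 16 plot, for each threshold on the x-axis, the fraction of frame estimates whose absolute azimuth error stays below that threshold. A minimal sketch of this measure, with made-up estimates rather than simulation data:

```python
import numpy as np

def error_rate(estimates_deg, true_deg, thresholds_deg):
    """Fraction of estimates with absolute error below each threshold."""
    err = np.abs(np.asarray(estimates_deg) - true_deg)
    return np.array([(err < t).mean() for t in thresholds_deg])

# Made-up frame estimates for a source at 0 degrees
est = [0.3, -1.2, 4.0, 7.5, 0.1, -0.2, 12.0, 0.4]
rates = error_rate(est, true_deg=0.0, thresholds_deg=[1, 5, 10, 20])
print(rates)   # monotonically non-decreasing, saturating at 1.0
```

The saturation level below 100% seen in the figures corresponds to the fraction of frames whose error never drops under the largest threshold.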
[Figure: four panels, "Min Reverberation, Max SNR", "Min Reverberation, Min SNR", "Max Reverberation, Max SNR", "Max Reverberation, Min SNR"; x-axis: error threshold in degrees (0 to 20), y-axis: azimuth error rate (0 to 100%); curves: SRP-PHAT, RMS Minimisation PHAT, Closed-Form PHAT]
Figure 16: Azimuth error rates for corner cases
5.4.2 Dominant Reflection from the Wall Behind
A closer look at Figures 13 to 15 shows that all curves saturate at a specific error threshold. The remaining error in the DOA estimates is the result of reflections from the wall closest to the microphone array (Figure 12). The histogram below shows the data of the corner case in Figure 16 (bottom left). Because of the symmetry of the microphone array used, estimates with negative angles are mapped to the equivalent positive angle.
[Figure: histogram; x-axis: absolute azimuth in degrees (0 to 180), y-axis: percentage of total estimations; series: Closed-Form PHAT, Closed-Form, RMS Minimisation PHAT, RMS Minimisation, SRP-PHAT]
Figure 17: Histogram of the absolute azimuth estimates 0-180°. Source DOA 0°
Figure 17 indicates that almost all DOA estimates with errors greater than ±10° lie in the interval from 170 to 190°. With knowledge of the position and orientation of the microphone array in the room, these errors could be corrected, significantly improving the performance; this is an SSL post-processing step, see Section 2. Tables 14 to 16 in Appendix 7.2 show the data of Tables 8 to 10 with the valid DOA estimates limited to ϕ = 0 . . .±170°. All algorithms benefit from the post-processing, but SRP-PHAT profits more than the others: for high SNR its standard deviation and average are zero even in highly reverberant environments. This is because SRP searches for the maximum power and not for the maximum of the GCC function.
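The restriction to ϕ = 0 . . .±170° amounts to discarding every frame estimate that falls into the blocked band behind the array. A minimal sketch of this post-processing step; the band edges are the 170 to 190° values from the text, and the function name is illustrative:

```python
import numpy as np

def restrict_doa(estimates_deg, blocked=(170.0, 190.0)):
    """Discard azimuth estimates falling inside a blocked angular band.

    Estimates are first mapped to absolute angles in 0..360 degrees;
    anything inside the band caused by the wall reflection behind the
    array is dropped.
    """
    a = np.mod(np.asarray(estimates_deg, dtype=float), 360.0)
    keep = (a < blocked[0]) | (a > blocked[1])
    return np.asarray(estimates_deg, dtype=float)[keep]

# Frames whose GCC/SRP peak came from the wall reflection land near 180 deg
est = [1.0, -2.5, 179.0, -178.0, 0.5, 185.0]
print(restrict_doa(est))   # the three reflection frames are removed
```

Statistics computed on the surviving frames then give the values listed in Tables 14 to 16.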
5.5 Performance Evaluation for 15 Degree
Apart from the near wall at 180°, the spatial distribution of the microphone array and the source in the room was favourable during the comparison of performance and robustness to noise and reverberation. The microphone array does not perform equally well for all DOAs: Figure 11 shows that in general the worst estimates are obtained for the 15° simulation. The dominant reflection from the wall behind (Section 5.4.2) also comes into play when the source angle is increased from 0 to 15°. Figure 18 is generated under the same conditions as Figure 17, apart from the different DOA. The histogram shows a decrease in performance for all algorithms and highlights the advantage of PHAT-weighted over un-weighted algorithms in realistic environments.
[Figure: histogram; x-axis: absolute azimuth in degrees (0 to 180), y-axis: percentage of total estimations; series: Closed-Form PHAT, Closed-Form, RMS Minimisation PHAT, RMS Minimisation, SRP-PHAT]
Figure 18: Histogram of the absolute azimuth estimates 0-180°. Source DOA 15°
5.6 Estimation of the Radius in Closed Form Algorithm
Due to the small microphone array dimensions and the relatively low sampling rate, the radius estimation is inaccurate; it was therefore fixed to r = 1m for the closed-form algorithm in all simulations (see Section 3.1.5 and Equ. 36 for the derivation). However, various techniques to estimate the radius using the near-field sound model are described in [6]. The outcome of the range estimation using the spherical-interpolation method, presented in Figure 19, confirms that sources close to the microphone array are estimated better than those in the far-field.
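The inaccuracy of the radius estimate can be made plausible numerically. The sketch below compares the exact near-field path difference between two microphones with the plane-wave (far-field) approximation; the microphone spacing and sampling rate are assumed example values for a small array, not figures from the thesis. For a centimetre-scale aperture the mismatch stays at a small fraction of one sample period, which is why the radius is practically unobservable at this sampling rate.

```python
import numpy as np

C = 343.0      # speed of sound in m/s
FS = 16_000    # assumed sampling rate in Hz
D = 0.08       # assumed microphone spacing in m (small array)

def path_difference(r, theta):
    """Exact (near-field) minus plane-wave (far-field) path difference [m]."""
    s = np.array([r * np.cos(theta), r * np.sin(theta)])   # source position
    p1, p2 = np.array([D / 2, 0.0]), np.array([-D / 2, 0.0])
    near = np.linalg.norm(s - p2) - np.linalg.norm(s - p1)
    far = D * np.cos(theta)                                # plane-wave model
    return near - far

# Source 0.5 m away at 45 degrees: mismatch in fractions of a sample
mismatch_m = path_difference(0.5, np.deg2rad(45.0))
mismatch_samples = mismatch_m / C * FS
print(mismatch_samples)
```

Since the whole near-field effect is far below one sample of delay, any radius estimate built on it is dominated by the TDOA quantization and noise.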
[Figure: x-axis: source distance in metres (0.2 to 2), y-axis: distance estimation in metres; curves: Spherical Interpolation, Ideal]
Figure 19: Performance of distance estimation using spherical interpolation for distances of r = 0.1 . . .2m
For comparison, Figure 20 shows the mean and standard deviation of the azimuth estimate of the closed-form algorithm for a source at ϕ = 0° with radius r = 0.1 . . .2m, for no reverberation and high SNR.
[Figure: x-axis: source distance in metres (0.2 to 2), y-axis: mean and standard deviation of azimuth in degrees; curves: Closed-Form with fixed 1m range, Closed-Form with estimated range]
Figure 20: Azimuth estimation for the closed-form algorithm with and without distance estimation
Due to the poor radius estimation, the closed-form algorithm using spherical interpolation for the range estimate shows a negative offset in the mean and a high standard deviation. For all simulated distances the fixed far-field assumption is the better choice.
6 Conclusion
The goal of this thesis is to compare source localization algorithms under different reverberation times and SNRs, and in particular to answer how strongly a changing SNR affects the DOA estimates. The first task was to choose at least one algorithm based on TDOA estimation and one based on the SRP. The choices are RMS minimization of TDOAs, the closed-form least-squares solution, and the SRP with PHAT weighting. To make the comparison fair, closed-form and RMS minimization are simulated both with and without PHAT weighting.

The work starts by analysing the given microphone array integrated in the Amazon Echo. The first insight was to reject the idea of estimating distances, due to the small size of the microphone array and the far-field behaviour of the arriving wave fronts. The second limitation is the 1D azimuth estimation, a result of the planar microphone array geometry and its weakness in estimating the elevation. Third, for TDOA-based algorithms, interpolation of the GCC is necessary to compete with the performance of the SRP, because otherwise the TDE is too inaccurate.

As a final step in preparation, a short chapter on computational cost compares the algorithms with regard to complexity. In general, closed-form solutions require less computing effort than search-based ones, and the SRP calculation is more costly than the TDOA estimation. This makes closed-form the least complex algorithm, followed by RMS minimization, and leaves the SRP as the most complex.

The main part covers simulations with a standardized audio file database and different settings, which are then compared using the error rate and first- and second-order statistics. The impact of a changing SNR depends on the chosen algorithm and weighting. It emerged that un-weighted implementations cannot keep up in realistic environments, and the simulations confirmed the theory on the positive effect of PHAT weighting. The simulations also show that microphones with high SNR increase the performance of the DOA estimation. Whenever a low SNR is expected, a complex algorithm such as the SRP should be chosen.
Moreover, although the SRP is more complex, using it in real-time applications has become more realistic due to the increasing computational power of electronic devices. When a relatively high SNR can be ensured (44dB in the simulations), the effect of the SNR on TDOA-based algorithms is low as well. The significantly lower computation time may then be worth the lower quality of the estimates compared to the SRP.
7 Appendix
7.1 Mathematical Symbol Reference
• Microphone input signal
– x_i(t) . . . continuous input signal of microphone i = 1 . . .M.
– x_i[n] . . . discrete input signal of microphone i = 1 . . .M.
– x_i^(b)[n] . . . framed discrete input signal, with block index (b).
– X_i^(b)[k] . . . Discrete Fourier Transform (DFT), x_i^(b)[n] ∘—• X_i^(b)[k].
– X . . . frequency-domain vectors and matrices are large and bold.
• a . . . scalars are small and italic.
• ~p . . . vector in 3D.
• x . . . vectors are written bold.
• M . . . total number of microphones in the microphone array.
• N . . . number of Fourier coefficients or number of samples in a vector.
• xᵀ . . . transposed x.
• xH . . . conjugate and transpose of x.
• ϕ̂ . . . measured or estimated values.
• E . . . matrices are large and bold.
7.2 Simulation
Performance evaluation for 0 degree with post processing Tables 14 to 16 use the same data set as the tables in Section 5.4, but only consider estimates in the range 0 . . .±170°. This scenario arises when the microphone array is positioned near a wall, so that there cannot be a talker at 170 . . .190°. Knowledge of the microphone array position in the room and the restriction to specific angles can increase the quality of the DOA estimates.
Table 14: Standard deviation and average for SRP-PHAT 0 . . .±170°

          Standard deviation in °       Average in °
SNR\Rev    0 s    0.3 s   0.6 s         0 s    0.3 s   0.6 s
26 dB      10.6   7.8     4.8           0      0.2     -0.1
32 dB      4.7    3.8     0.6           -0.1   0       0
38 dB      1.4    0.7     0.3           0      0       0
44 dB      0.6    0.3     0.1           0      0       0
50 dB      0.2    0.1     0             0      0       0
56 dB      0      0       0             0      0       0
Table 15: Standard deviation and average for closed-form PHAT 0 . . .±170°

          Standard deviation in °       Average in °
SNR\Rev    0 s    0.3 s   0.6 s         0 s    0.3 s   0.6 s
26 dB      24.9   33.9    42.3          0.7    0.5     1
32 dB      14.2   22.2    33.5          0.3    0.5     0.9
38 dB      7.1    17.2    28.2          0      0.3     0.7
44 dB      4.5    13.1    25            0      0.4     0.8
50 dB      5.7    11.8    19.7          0.1    0.3     1.1
56 dB      4.8    10.2    18.2          0      0.2     1.1
Table 16: Standard deviation and average for RMS minimization PHAT 0 . . .±170°

          Standard deviation in °       Average in °
SNR\Rev    0 s    0.3 s   0.6 s         0 s    0.3 s   0.6 s
26 dB      28     31.2    40.3          -1.2   0.3     0
32 dB      16.6   21.4    33.7          0.1    -0.1    -0.2
38 dB      7.3    15.3    29.4          0      0.4     -0.4
44 dB      4.3    12.4    24.9          -0.1   -0.3    -0.3
50 dB      3      11.7    21.8          0      -0.1    -0.6
56 dB      2.9    10.1    20.7          0      0.1     -0.2
Unknown measurement uncertainty with standard deviation When observing Figure 21, it should be mentioned that the SRP uses PHAT weighting. This is clearly a disadvantage under the current simulation settings, i.e. high SNR and low reverberation. Thus, removing the weighting would decrease the accuracy of the TDOA-based algorithms.
[Figure: two panels, "Without Interpolation" (top) and "With Interpolation" (bottom); x-axis: source location, azimuth in degrees (0 to 30), y-axis: mean and standard deviation in degrees; curves: Error Criterion, Closed Form, SRP Phat, Ideal]
Figure 21: Mean and standard deviation of the azimuth estimation for source angles from 0 to 30°
References
[1] Amazon. Amazon echo dot, 2017.
[2] Ivan Jelev Tashev. Sound Capture and Processing: Practical Approaches.Wiley Publishing, 2009.
[3] Joseph Hector DiBiase. A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays. PhD thesis, Brown University, 2000.

[4] J. Delosme, M. Morf, and B. Friedlander. Source location from time differences of arrival: Identifiability and estimation. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '80, volume 5, pages 818–824, Apr 1980.

[5] C. Knapp and G. Carter. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(4):320–327, Aug 1976.

[6] J. Smith and J. Abel. Closed-form least-squares source location estimation from range-difference measurements. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(12):1661–1669, Dec 1987.

[7] Ajoy Kumar Dey and Susmita Saha. Acoustic beamforming: Design and development of steered response power with phase transform. Master's thesis, Blekinge Tekniska Högskola, 2011.

[8] Hoang Tran Huy Do. Real-time SRP-PHAT source location implementations on a large-aperture microphone array, 2009.

[9] Schädle. Gauß-Elimination und LR-Zerlegung. 2010.