05743890

BLIND SOURCE SEPARATION BASED ON FAST-CONVERGENCE ALGORITHM USING ICA AND BEAMFORMING FOR REAL CONVOLUTIVE MIXTURE

Hiroshi SARUWATARI, Toshiya KAWAMURA, Katsuyuki SAWAI, Atsunobu KAMINUMA†, and Masao SAKATA†

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0101, JAPAN

†Nissan Research Center, NISSAN MOTOR CO., LTD., 1 Natsushima-cho, Yokosuka-shi, Kanagawa 237-8523, JAPAN

ABSTRACT

We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the low-convergence problem through optimization in ICA. The proposed method consists of the following three parts: (1) frequency-domain ICA with direction-of-arrival (DOA) estimation, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The inverse of the mixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

1. INTRODUCTION

Blind source separation (BSS) is the approach taken to estimate original source signals using only the information of the mixed signals observed in each input channel. This technique is applicable to the realization of noise-robust speech recognition and high-quality hands-free telecommunication systems. In recent work on BSS based on independent component analysis (ICA) [1], several methods in which the inverses of the complex mixing matrices are calculated in the frequency domain have been proposed to deal with the arrival lags among the elements of the microphone array system [2, 3, 4]. However, this ICA-based approach has the disadvantage that there is difficulty with the low convergence of nonlinear optimization [5].

In this paper, we describe a new algorithm for BSS in which ICA and beamforming are combined. The proposed method consists of the following three parts: (1) frequency-domain ICA with estimation of the direction of arrival (DOA) of the sound source, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The temporal utilization of null beamforming through ICA iterations can realize fast- and high-convergence optimization. The following sections describe the proposed method in detail, and it is shown that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method. Also, the experiment in a real car environment shows that the separation performances of the proposed method are remarkably superior to those of the conventional delay-and-sum (DS) array.

Fig. 1. Configuration of a microphone array and signals.

2. DATA MODEL AND CONVENTIONAL BSS METHOD

In this study, a straight-line array is assumed. The coordinates of the elements are designated as $d_k$ ($k = 1, \ldots, K$), and the directions of arrival of multiple sound sources are designated as $\theta_l$ ($l = 1, \ldots, L$) (see Fig. 1), where we deal with the case of $K = L = 2$.

In the frequency domain, the observed signals in which multiple source signals are mixed are given by $X(f) = A(f)S(f)$, where $X(f) = [X_1(f), \ldots, X_K(f)]^T$ is the observed signal vector, and $S(f) = [S_1(f), \ldots, S_L(f)]^T$ is the source signal vector. $A(f)$ is the mixing matrix, which is assumed to be complex-valued because we introduce a model to deal with the arrival lags among the elements of the microphone array and room reverberations.

In the frequency-domain ICA, first, the short-time analysis of observed signals is conducted by frame-by-frame discrete Fourier transform (DFT). By plotting the spectral values in a frequency bin of each microphone input frame by frame, we consider them as a time series. Hereafter, we designate the time series as $X(f, t) = [X_1(f, t), \ldots, X_K(f, t)]^T$. Next, we perform signal separation using the complex-valued inverse of the mixing matrix, $W(f)$, so that the $L$ time-series outputs $Y(f, t) = [Y_1(f, t), \ldots, Y_L(f, t)]^T = W(f)X(f, t)$ become mutually independent. We perform this procedure with respect to all frequency bins. Finally, by applying the inverse DFT and the overlap-add technique to the separated time series $Y(f, t)$, we reconstruct the resultant source signals in the time domain.
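The per-bin separation step $Y(f, t) = W(f)X(f, t)$ can be illustrated with a minimal sketch. It covers one frequency bin only and omits the DFT and overlap-add stages. The helper names (`mat_vec`, `separate_bin`, `inverse_2x2`) and the toy mixing matrix are illustrative assumptions, not part of the original method; the example only checks that an exact inverse of a known mixing matrix recovers the sources.

```python
def mat_vec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def separate_bin(W, X_frames):
    """Apply Y(f, t) = W(f) X(f, t) to every frame t of one frequency bin."""
    return [mat_vec(W, x) for x in X_frames]

def inverse_2x2(A):
    """Closed-form inverse of a 2x2 complex matrix (K = L = 2)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Toy data for a single bin: a made-up complex mixing matrix A(f)
# and three frames of two source spectra S(f, t).
A = [[1.0 + 0.0j, 0.5 - 0.2j], [0.3 + 0.4j, 1.0 + 0.0j]]
S_frames = [[1 + 1j, 2 - 1j], [0.5j, -1 + 0j], [-2 + 0j, 0.25 + 0.75j]]
X_frames = [mat_vec(A, s) for s in S_frames]

# With the ideal unmixing matrix W(f) = A(f)^-1 the sources are recovered.
Y_frames = separate_bin(inverse_2x2(A), X_frames)
max_err = max(abs(y - s)
              for Y, S in zip(Y_frames, S_frames)
              for y, s in zip(Y, S))
```

In the blind setting $A(f)$ is unknown, so $W(f)$ has to be learned from $X(f, t)$ alone, which is what the iterative update in this section is for.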

Fig. 2. Proposed algorithm combining frequency-domain ICA and beamforming.

0-7803-7402-9/02/$17.00 ©2002 IEEE

In the conventional ICA-based BSS method, the optimal $W(f)$ is obtained by the following iterative equation [2]:

$$W_{i+1}(f) = \eta \left[ \mathrm{diag}\left( \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right) - \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right] W_i(f) + W_i(f), \tag{1}$$

where $\langle \cdot \rangle_t$ denotes the time-averaging operator, $i$ is used to express the value of the $i$th step in the iterations, and $\eta$ is the step-size parameter. Also, we define the nonlinear vector function $\Phi(\cdot)$ as

$$\Phi(Y(f,t)) \equiv [\Phi(Y_1(f,t)), \ldots, \Phi(Y_L(f,t))]^T, \tag{2}$$

$$\Phi(Y_l(f,t)) \equiv \left[1 + \exp(-Y_l^{(R)}(f,t))\right]^{-1} + j \cdot \left[1 + \exp(-Y_l^{(I)}(f,t))\right]^{-1}, \tag{3}$$

where $Y_l^{(R)}(f, t)$ and $Y_l^{(I)}(f, t)$ are the real and imaginary parts of $Y_l(f, t)$, respectively.
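As an illustration of the iterative update with the sigmoid nonlinearity of Eqs. (2) and (3), the following sketch performs one update of $W(f)$ for a single frequency bin. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`sigmoid`, `phi`, `ica_step`), the list-of-lists matrix layout, and the toy input are all hypothetical.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function 1 / (1 + exp(-v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def phi(y):
    """Eq. (3): sigmoid applied separately to real and imaginary parts."""
    return complex(sigmoid(y.real), sigmoid(y.imag))

def ica_step(W, X_frames, eta):
    """One update W <- eta * [diag(R) - R] W + W, with R = <Phi(Y) Y^H>_t."""
    n = len(W)
    T = len(X_frames)
    R = [[0j] * n for _ in range(n)]
    for x in X_frames:
        # Separated outputs for this frame: y = W x.
        y = [sum(W[r][k] * x[k] for k in range(len(x))) for r in range(n)]
        p = [phi(v) for v in y]
        for a in range(n):
            for b in range(n):
                R[a][b] += p[a] * y[b].conjugate() / T
    # diag(R) - R keeps only the negated off-diagonal terms.
    G = [[(R[a][b] if a == b else 0j) - R[a][b] for b in range(n)]
         for a in range(n)]
    return [[W[a][b] + eta * sum(G[a][k] * W[k][b] for k in range(n))
             for b in range(n)] for a in range(n)]

# Toy call: identity start, a single frame, and an arbitrary step size.
W0 = [[1 + 0j, 0j], [0j, 1 + 0j]]
X = [[1 + 0j, 0j]]
W1 = ica_step(W0, X, eta=0.1)
```

Because the diagonal of the bracketed term is zero, each update is driven only by the cross terms between different outputs, which vanish on average when the outputs are mutually independent.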

3. PROPOSED ALGORITHM

The conventional ICA method inherently has a significant disadvantage, which is due to low convergence through nonlinear optimization in ICA. In order to resolve the problem, we propose an algorithm based on the temporal alternation of learning between ICA and beamforming; the inverse of the mixing matrix, $W(f)$, obtained through ICA is temporally substituted by the matrix based on null beamforming for a temporal initialization or acceleration of the iterative optimization. The proposed algorithm is conducted by the following steps with respect to all frequency bins in parallel (see Fig. 2).

[Step 1: Initialization] Set the initial $W_i(f)$, i.e., $W_0(f)$, to an arbitrary value, where the subscript $i$ is set to be 0.

[Step 2: 1-time ICA iteration] Optimize $W_i(f)$ using the following 1-time ICA iteration:

$$W^{(\mathrm{ICA})}_{i+1}(f) = \eta \left[ \mathrm{diag}\left( \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right) - \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right] W_i(f) + W_i(f), \tag{4}$$

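where the superscript "(ICA)" is used to express that the inverse of the mixing matrix is obtained by ICA.

[Step 3: DOA estimation] Estimate DOAs of the sound sources by utilizing the directivity pattern of the array system, $F_l(f, \theta)$, which is given by

$$F_l(f, \theta) = \sum_{k=1}^{K} W^{(\mathrm{ICA})}_{lk}(f) \exp\left[ j 2\pi f d_k \sin\theta / c \right], \tag{5}$$

where $W^{(\mathrm{ICA})}_{lk}(f)$ is the element of $W^{(\mathrm{ICA})}_{i+1}(f)$. In the directivity patterns, directional nulls exist in only two particular directions. Accordingly, by obtaining statistics with respect to the directions of nulls at all frequency bins, we can estimate the DOAs of the sound sources. The DOA of the $l$th sound source, $\hat{\theta}_l$, can be estimated as $\hat{\theta}_l = \frac{2}{N} \sum_{m=1}^{N/2} \theta_l(f_m)$, where $N$ is the total number of DFT points, and $\theta_l(f_m)$ represents the DOA of the $l$th sound source at the $m$th frequency bin. These are given by

$$\theta_1(f_m) = \min\left[ \arg\min_{\theta} |F_1(f_m, \theta)|,\ \arg\min_{\theta} |F_2(f_m, \theta)| \right], \tag{6}$$

$$\theta_2(f_m) = \max\left[ \arg\min_{\theta} |F_1(f_m, \theta)|,\ \arg\min_{\theta} |F_2(f_m, \theta)| \right], \tag{7}$$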

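where $\min[x, y]$ ($\max[x, y]$) is defined as a function that returns the smaller (larger) value of $x$ and $y$.

[Step 4: Beamforming] Construct an alternative matrix for signal separation, $W^{(\mathrm{BF})}(f)$, based on the null-beamforming technique, where the DOA results obtained in the previous step are used. In the case that the look direction is $\hat{\theta}_1$ and the directional null is steered to $\hat{\theta}_2$, the elements of the matrix for signal separation are given as

$$W^{(\mathrm{BF})}_{11}(f_m) = \exp\left[ -j 2\pi f_m d_1 \sin\hat{\theta}_2 / c \right] \times \left\{ \exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] - \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] \right\}^{-1}, \tag{8}$$

$$W^{(\mathrm{BF})}_{12}(f_m) = -\exp\left[ -j 2\pi f_m d_2 \sin\hat{\theta}_2 / c \right] \times \left\{ \exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] - \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] \right\}^{-1}. \tag{9}$$

Also, in the case that the look direction is $\hat{\theta}_2$ and the directional null is steered to $\hat{\theta}_1$, the elements of the matrix are given as

$$W^{(\mathrm{BF})}_{21}(f_m) = -\exp\left[ -j 2\pi f_m d_1 \sin\hat{\theta}_1 / c \right] \times \left\{ -\exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] + \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] \right\}^{-1}, \tag{10}$$

$$W^{(\mathrm{BF})}_{22}(f_m) = \exp\left[ -j 2\pi f_m d_2 \sin\hat{\theta}_1 / c \right] \times \left\{ -\exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] + \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] \right\}^{-1}. \tag{11}$$

In both cases, the weights give unit gain in the look direction and zero gain in the null direction of the directivity pattern in Eq. (5).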
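Steps 3 and 4 can be illustrated together with a small sketch for a two-element array: it builds one row of a null beamformer by solving the unit-gain and null constraints directly, then scans the directivity pattern of Eq. (5) to locate the null again. This is an illustrative sketch only. The speed of sound (340 m/s), the element coordinates, the test frequency, the grid resolution, and all function names are assumptions that do not come from the paper.

```python
import cmath
import math

C_SOUND = 340.0  # assumed speed of sound in m/s (not specified in the text)

def steering(f, d, theta):
    """exp[j 2 pi f d sin(theta) / c] for an element at coordinate d."""
    return cmath.exp(2j * math.pi * f * d * math.sin(theta) / C_SOUND)

def null_beamformer_row(f, d1, d2, theta_look, theta_null):
    """Weights [w1, w2] with unit gain at theta_look and a null at theta_null."""
    a1n, a2n = steering(f, d1, theta_null), steering(f, d2, theta_null)
    a1l, a2l = steering(f, d1, theta_look), steering(f, d2, theta_look)
    denom = a1l / a1n - a2l / a2n
    return [1.0 / (a1n * denom), -1.0 / (a2n * denom)]

def directivity(w, f, d1, d2, theta):
    """Directivity pattern of Eq. (5) for one row of the unmixing matrix."""
    return w[0] * steering(f, d1, theta) + w[1] * steering(f, d2, theta)

def find_null(w, f, d1, d2, n_grid=3601):
    """Grid search over [-90, 90] degrees for the minimum of |F(f, theta)|."""
    thetas = [math.radians(-90 + 180 * i / (n_grid - 1)) for i in range(n_grid)]
    return min(thetas, key=lambda th: abs(directivity(w, f, d1, d2, th)))

# Hypothetical setup: 4 cm spacing, 1 kHz bin, look at -30 deg, null at 40 deg.
f, d1, d2 = 1000.0, -0.02, 0.02
look, null = math.radians(-30), math.radians(40)
w = null_beamformer_row(f, d1, d2, look, null)
gain_look = directivity(w, f, d1, d2, look)
gain_null = directivity(w, f, d1, d2, null)
null_found_deg = math.degrees(find_null(w, f, d1, d2))
```

In the algorithm the direction of the flow is reversed: the nulls are first located in the directivity patterns of the ICA solution, averaged over frequency bins, and only then used to construct the beamformer.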

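[Step 5: Diversity with cost function] Select the most suitable unmixing matrix in each frequency bin and at each iteration point, i.e., algorithm diversity in both iteration and frequency domain. As a cost function used to achieve the diversity, we calculate two kinds of cosine distances between the separated signals, which are

$$J^{(\mathrm{ICA})}(f) = \frac{\left| \langle Y_1^{(\mathrm{ICA})}(f,t)\, Y_2^{(\mathrm{ICA})}(f,t)^* \rangle_t \right|}{\sqrt{ \langle |Y_1^{(\mathrm{ICA})}(f,t)|^2 \rangle_t \, \langle |Y_2^{(\mathrm{ICA})}(f,t)|^2 \rangle_t }}, \tag{12}$$

$$J^{(\mathrm{BF})}(f) = \frac{\left| \langle Y_1^{(\mathrm{BF})}(f,t)\, Y_2^{(\mathrm{BF})}(f,t)^* \rangle_t \right|}{\sqrt{ \langle |Y_1^{(\mathrm{BF})}(f,t)|^2 \rangle_t \, \langle |Y_2^{(\mathrm{BF})}(f,t)|^2 \rangle_t }}, \tag{13}$$

where $Y_l^{(\mathrm{ICA})}(f, t)$ is the signal separated by ICA, and $Y_l^{(\mathrm{BF})}(f, t)$ is the signal separated by beamforming. If the separation performance of beamforming is superior to that of ICA, we obtain the condition $J^{(\mathrm{ICA})}(f) > J^{(\mathrm{BF})}(f)$; otherwise $J^{(\mathrm{ICA})}(f) \le J^{(\mathrm{BF})}(f)$. Thus, an observation of the conditions yields the following algorithm: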

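$$W(f) = \begin{cases} W^{(\mathrm{ICA})}_{i+1}(f), & J^{(\mathrm{ICA})}(f) \le J^{(\mathrm{BF})}(f), \\ W^{(\mathrm{BF})}(f), & J^{(\mathrm{ICA})}(f) > J^{(\mathrm{BF})}(f). \end{cases} \tag{14}$$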

If the $(i + 1)$th iteration was the final iteration, go to Step 6; otherwise go back to Step 2 and repeat the ICA iteration, inserting the $W(f)$ given by Eq. (14) into $W_i(f)$ in Eq. (4) with an increment of $i$.

[Step 6: Ordering and scaling] Using the DOA information obtained in Step 3, we detect and correct the source permutation and the gain inconsistency [6].
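The selection rule of Step 5 can be sketched for one frequency bin as follows. The sketch assumes that the cosine distance is the magnitude of the time-averaged cross term between the two outputs, normalized by their root-mean-square magnitudes; that exact form, the function names (`cosine_distance`, `select_unmixing`), and the toy signals are assumptions made for illustration.

```python
import math

def cosine_distance(y1, y2):
    """|<y1 y2*>_t| / sqrt(<|y1|^2>_t <|y2|^2>_t) for two complex series."""
    T = len(y1)
    cross = sum(a * b.conjugate() for a, b in zip(y1, y2)) / T
    p1 = sum(abs(a) ** 2 for a in y1) / T
    p2 = sum(abs(b) ** 2 for b in y2) / T
    return abs(cross) / math.sqrt(p1 * p2)

def select_unmixing(W_ica, W_bf, X_frames):
    """Eq. (14): keep W_ica unless beamforming gives less-correlated outputs."""
    def outputs(W):
        Y = [[sum(w * x for w, x in zip(row, xt)) for row in W]
             for xt in X_frames]
        return [y[0] for y in Y], [y[1] for y in Y]
    j_ica = cosine_distance(*outputs(W_ica))
    j_bf = cosine_distance(*outputs(W_bf))
    return W_ica if j_ica <= j_bf else W_bf

# Toy bin: two orthogonal sources mixed by a made-up real matrix A.
s1 = [1 + 0j, 1 + 0j, -1 + 0j, -1 + 0j]
s2 = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
X_frames = [[a + 0.5 * b, 0.5 * a + b] for a, b in zip(s1, s2)]

W_poor = [[1 + 0j, 0j], [0j, 1 + 0j]]                      # leaves the mixture
W_good = [[1 / 0.75, -0.5 / 0.75], [-0.5 / 0.75, 1 / 0.75]]  # exact inverse of A

chosen = select_unmixing(W_poor, W_good, X_frames)
```

Here the untouched mixture has a cosine distance of 0.8 while the exact inverse gives a value near zero, so the second matrix is selected regardless of whether it is passed as the ICA or the beamforming candidate.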

4. EXPERIMENTS IN REVERBERANT ROOM

4.1. Conditions for experiments

A two-element array with an interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, −30° and 40°. Two kinds of sentences, those spoken by two male and two female speakers selected from the ASJ continuous speech corpus for research, are used as the original speech samples. Using these sentences, we obtain 12 combinations with respect to speakers and source directions. In these experiments, we use the following signals as the source signals: the original speech convolved with the impulse responses specified by different reverberation times (RTs) of 150 msec and 300 msec. The impulse responses are recorded in a variable reverberation time room as shown in Fig. 3. The analytical conditions of these experiments are as follows: the sampling frequency is 8 kHz, the frame length is 128 msec, the frame shift is 2 msec, and the step-size parameter $\eta$ is set to be $1.0 \times 10^{-5}$.

4.2. Objective evaluation of separated signals

In order to compare the performance of the proposed algorithm with that of the conventional BSS described in Sect. 2 for different iteration points in ICA, the noise reduction rate (NRR), defined as the output signal-to-noise ratio (SNR) in dB minus the input SNR in dB, is shown in Fig. 4. These values are averages over all of the combinations with respect to speakers and source directions. As for the proposed algorithm, we also plot the NRR rescaled by the computational cost (see dotted lines), because the proposed algorithm has a computational complexity of about 1.9-fold compared with the conventional ICA.
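The NRR definition above amounts to a difference of two SNRs in dB. A minimal sketch, with hypothetical function names and made-up power values chosen only to show the arithmetic:

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB from two (positive) power values."""
    return 10.0 * math.log10(signal_power / noise_power)

def noise_reduction_rate(out_sig, out_noise, in_sig, in_noise):
    """NRR in dB = output SNR (dB) - input SNR (dB)."""
    return snr_db(out_sig, out_noise) - snr_db(in_sig, in_noise)

# Illustrative numbers: the interferer is as strong as the target at the
# input and ten times weaker in power at the output.
nrr = noise_reduction_rate(out_sig=1.0, out_noise=0.1, in_sig=1.0, in_noise=1.0)
```

With these values the input SNR is 0 dB and the output SNR is 10 dB, so the NRR is 10 dB.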

[Figure: layout of the variable reverberation time room; legible labels include the loudspeakers (height: 1.35 m), the microphone array, and the dimensions 5.73 m and 2.15 m.]
