  • BLIND SOURCE SEPARATION BASED ON FAST-CONVERGENCE ALGORITHM USING ICA AND BEAMFORMING FOR REAL CONVOLUTIVE MIXTURE

    Hiroshi SARUWATARI, Toshiya KAWAMURA, Katsuyuki SAWAI, Atsunobu KAMINUMA†, and Masao SAKATA†

    Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0101, JAPAN

    †Nissan Research Center, NISSAN MOTOR CO., LTD., 1 Natsushima-cho, Yokosuka-shi, Kanagawa 237-8523, JAPAN

    ABSTRACT

    We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the low-convergence problem through optimization in ICA. The proposed method consists of the following three parts: (1) frequency-domain ICA with direction-of-arrival (DOA) estimation, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The inverse of the mixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

    1. INTRODUCTION

    Blind source separation (BSS) is the approach taken to estimate original source signals using only the information of the mixed signals observed in each input channel. This technique is applicable to the realization of noise-robust speech recognition and high-quality hands-free telecommunication systems. In recent works on BSS based on independent component analysis (ICA) [1], several methods, in which the inverses of the complex mixing matrices are calculated in the frequency domain, have been proposed to deal with the arrival lags among the elements of the microphone array system [2, 3, 4]. However, this ICA-based approach has the disadvantage that the nonlinear optimization converges slowly [5].

    In this paper, we describe a new algorithm for BSS in which ICA and beamforming are combined. The proposed method consists of the following three parts: (1) frequency-domain ICA with estimation of the direction of arrival (DOA) of the sound source, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The temporal utilization of null beamforming through ICA iterations can realize fast- and high-convergence optimization. The following sections describe the proposed method in detail, and it is shown that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method. Also, the experiment in a real car environment shows that the separation performances of the proposed method are remarkably superior to those of the conventional DS array.

    Fig. 1. Configuration of a microphone array and signals.

    2. DATA MODEL AND CONVENTIONAL BSS METHOD

    In this study, a straight-line array is assumed. The coordinates of the elements are designated as $d_k$ ($k = 1, \ldots, K$), and the directions of arrival of multiple sound sources are designated as $\theta_l$ ($l = 1, \ldots, L$) (see Fig. 1), where we deal with the case of $K = L = 2$.

    In the frequency domain, the observed signals in which multiple source signals are mixed are given by $X(f) = A(f)S(f)$, where $X(f) = [X_1(f), \ldots, X_K(f)]^T$ is the observed signal vector, and $S(f) = [S_1(f), \ldots, S_L(f)]^T$ is the source signal vector. $A(f)$ is the mixing matrix, which is assumed to be complex-valued because we introduce a model to deal with the arrival lags among the elements of the microphone array and room reverberations.

    In the frequency-domain ICA, first, the short-time analysis of observed signals is conducted by frame-by-frame discrete Fourier transform (DFT). By plotting the spectral values in a frequency bin of each microphone input frame by frame, we consider them as a time series. Hereafter, we designate the time series as $X(f,t) = [X_1(f,t), \ldots, X_K(f,t)]^T$. Next, we perform signal separation using the complex-valued inverse of the mixing matrix, $W(f)$, so that the $L$ time-series outputs $Y(f,t) = [Y_1(f,t), \ldots, Y_L(f,t)]^T = W(f)X(f,t)$ become mutually independent. We perform this procedure with respect to all frequency bins. Finally, by applying the inverse DFT and the overlap-add technique to the separated time series $Y(f,t)$, we reconstruct the resultant source signals in the time domain.
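
    The per-bin model above can be illustrated with a minimal sketch for a single frequency bin and $K = L = 2$. The matrix `A`, the source frames, and all helper names are illustrative assumptions, and the unmixing matrix is simply the exact inverse of `A` (so nothing here is blind); the sketch only shows how $X(f,t) = A(f)S(f,t)$ and $Y(f,t) = W(f)X(f,t)$ act frame by frame.

```python
# Sketch: per-bin mixing X(f,t) = A(f) S(f,t) and unmixing Y(f,t) = W(f) X(f,t)
# for K = L = 2. All numeric values are illustrative assumptions.

def matvec(m, v):
    """Multiply a 2x2 complex matrix by a length-2 complex vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse_2x2(m):
    """Inverse of a 2x2 complex matrix (assumes a non-zero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Complex-valued mixing matrix A(f) for one frequency bin (assumed values).
A = [[1.0 + 0.0j, 0.6 - 0.3j],
     [0.5 + 0.4j, 1.0 + 0.0j]]

# Source spectra S(f, t) for three frames (assumed values).
S_frames = [[1.0 + 2.0j, -0.5 + 0.1j],
            [0.3 - 1.0j, 2.0 + 0.0j],
            [-1.2 + 0.4j, 0.7 - 0.7j]]

X_frames = [matvec(A, s) for s in S_frames]   # observed X(f, t)
W = inverse_2x2(A)                            # ideal (non-blind) W(f)
Y_frames = [matvec(W, x) for x in X_frames]   # separated Y(f, t)

# Largest deviation between the separated and the original source values.
max_err = max(abs(y - s)
              for Y, S in zip(Y_frames, S_frames)
              for y, s in zip(Y, S))
```

    In the blind setting $A(f)$ is unknown, so $W(f)$ has to be learned from $X(f,t)$ alone, separately in every frequency bin.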

    In the conventional ICA-based BSS method, the optimal $W(f)$ is obtained by the following iterative equation [2]:

    $W_{i+1}(f) = \eta \left[ \mathrm{diag}\left( \langle \Phi(Y(f,t)) Y^H(f,t) \rangle_t \right) - \langle \Phi(Y(f,t)) Y^H(f,t) \rangle_t \right] W_i(f) + W_i(f)$, (1)

    0-7803-7402-9/02/$17.00 ©2002 IEEE

    Fig. 2. Proposed algorithm combining frequency-domain ICA and beamforming.

    where $\langle \cdot \rangle_t$ denotes the time-averaging operator, $i$ is used to express the value of the $i$-th step in the iterations, and $\eta$ is the step-size parameter. Also, we define the nonlinear vector function $\Phi(\cdot)$ as

    $\Phi(Y(f,t)) \equiv [\Phi(Y_1(f,t)), \ldots, \Phi(Y_L(f,t))]^T$, (2)

    $\Phi(Y_l(f,t)) \equiv [1 + \exp(-Y_l^{(R)}(f,t))]^{-1} + j \cdot [1 + \exp(-Y_l^{(I)}(f,t))]^{-1}$, (3)

    where $Y_l^{(R)}(f,t)$ and $Y_l^{(I)}(f,t)$ are the real and imaginary parts of $Y_l(f,t)$, respectively.
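
    As a minimal sketch, the nonlinearity of Eq. (3) and one step of the iterative update can be written for $K = L = 2$ as follows. The update is taken to have the form $W \leftarrow W + \eta\,[\mathrm{diag}(\langle \Phi(Y) Y^H \rangle_t) - \langle \Phi(Y) Y^H \rangle_t]\,W$; the helper names, the input frames, and the step size used in the demonstration are illustrative assumptions.

```python
import math

def phi(y):
    """Eq. (3): logistic sigmoid applied separately to the real and imaginary parts."""
    return complex(1.0 / (1.0 + math.exp(-y.real)),
                   1.0 / (1.0 + math.exp(-y.imag)))

def matvec(m, v):
    """Multiply a 2x2 complex matrix by a length-2 complex vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def ica_step(W, X_frames, eta):
    """One update W <- W + eta * [diag(<Phi(Y) Y^H>_t) - <Phi(Y) Y^H>_t] W."""
    n = len(X_frames)
    R = [[0j, 0j], [0j, 0j]]          # time average of Phi(Y) Y^H
    for x in X_frames:
        y = matvec(W, x)
        p = [phi(v) for v in y]
        for a in range(2):
            for b in range(2):
                R[a][b] += p[a] * y[b].conjugate() / n
    # diag(R) - R: zero diagonal, negated off-diagonal entries.
    D = [[0j, -R[0][1]], [-R[1][0], 0j]]
    DW = matmul(D, W)
    return [[W[a][b] + eta * DW[a][b] for b in range(2)] for a in range(2)]

# Demonstration with assumed observations and an identity starting point.
X_frames = [[1.0 + 0.5j, 0.2 - 1.0j],
            [-0.7 + 0.1j, 0.9 + 0.3j],
            [0.4 - 0.6j, -1.1 + 0.8j]]
W0 = [[1 + 0j, 0j], [0j, 1 + 0j]]
W1 = ica_step(W0, X_frames, eta=0.1)
```

    Because the bracketed term has a zero diagonal, a step from the identity matrix changes only the off-diagonal entries, which are driven by the residual cross-correlation between $\Phi(Y_1)$ and $Y_2$ (and vice versa).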

    3. PROPOSED ALGORITHM

    The conventional ICA method inherently has a significant disadvantage which is due to low convergence through nonlinear optimization in ICA. In order to resolve the problem, we propose an algorithm based on the temporal alternation of learning between ICA and beamforming; the inverse of the mixing matrix, $W(f)$, obtained through ICA is temporally substituted by the matrix based on null beamforming for a temporal initialization or acceleration of the iterative optimization. The proposed algorithm is conducted by the following steps with respect to all frequency bins in parallel (see Fig. 2).

    [Step 1: Initialization] Set the initial $W_i(f)$, i.e., $W_0(f)$, to an arbitrary value, where the subscript $i$ is set to be 0.

    [Step 2: 1-time ICA iteration] Optimize $W_i(f)$ using the following 1-time ICA iteration:

    $W_{i+1}^{(ICA)}(f) = \eta \left[ \mathrm{diag}\left( \langle \Phi(Y(f,t)) Y^H(f,t) \rangle_t \right) - \langle \Phi(Y(f,t)) Y^H(f,t) \rangle_t \right] W_i(f) + W_i(f)$, (4)

    where the superscript "(ICA)" is used to express that the inverse of the mixing matrix is obtained by ICA.

    [Step 3: DOA estimation] Estimate DOAs of the sound sources by utilizing the directivity pattern of the array system, $F_l(f, \theta)$, which is given by

    $F_l(f, \theta) = \sum_{k=1}^{K} W_{lk}^{(ICA)}(f) \exp[j 2\pi f d_k \sin\theta / c]$, (5)

    where $W_{lk}^{(ICA)}(f)$ is the element of $W_{i+1}^{(ICA)}(f)$ and $c$ is the velocity of sound. In the directivity patterns, directional nulls exist in only two particular directions. Accordingly, by obtaining statistics with respect to the directions of nulls at all frequency bins, we can estimate the DOAs of the sound sources. The DOA of the $l$-th sound source, $\hat{\theta}_l$, can be estimated as $\hat{\theta}_l = \frac{2}{N} \sum_{m=1}^{N/2} \theta_l(f_m)$, where $N$ is the total number of DFT points, and $\theta_l(f_m)$ represents the DOA of the $l$-th sound source at the $m$-th frequency bin. These are given by

    $\theta_1(f_m) = \min\left[ \arg\min_\theta |F_1(f_m, \theta)|, \arg\min_\theta |F_2(f_m, \theta)| \right]$, (6)

    $\theta_2(f_m) = \max\left[ \arg\min_\theta |F_1(f_m, \theta)|, \arg\min_\theta |F_2(f_m, \theta)| \right]$, (7)

    where $\min[x, y]$ ($\max[x, y]$) is defined as a function in order to obtain the smaller (larger) value among $x$ and $y$.

    [Step 4: Beamforming] Construct an alternative matrix for signal separation, $W^{(BF)}(f)$, based on the null-beamforming technique, where the DOA results obtained in the previous step are used. In the case that the look direction is $\hat{\theta}_1$ and the directional null is steered to $\hat{\theta}_2$, the elements of the matrix for signal separation are given as

    $W_{11}^{(BF)}(f_m) = \exp[-j 2\pi f_m d_1 \sin\hat{\theta}_1 / c] \times \{ \exp[j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c] - \exp[j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c] \}^{-1}$, (8)

    $W_{12}^{(BF)}(f_m) = -\exp[-j 2\pi f_m d_2 \sin\hat{\theta}_1 / c] \times \{ \exp[j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c] - \exp[j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c] \}^{-1}$. (9)

    Also, in the case that the look direction is $\hat{\theta}_2$ and the directional null is steered to $\hat{\theta}_1$, the elements of the matrix are given as

    $W_{21}^{(BF)}(f_m) = -\exp[-j 2\pi f_m d_1 \sin\hat{\theta}_2 / c] \times \{ -\exp[j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c] + \exp[j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c] \}^{-1}$, (10)

    $W_{22}^{(BF)}(f_m) = \exp[-j 2\pi f_m d_2 \sin\hat{\theta}_2 / c] \times \{ -\exp[j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c] + \exp[j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c] \}^{-1}$. (11)
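
    The idea behind Step 4 can be checked numerically with a small sketch. It does not transcribe Eqs. (8)-(11); instead it assumes the far-field steering term $\exp[j 2\pi f d_k \sin\theta / c]$ used in the directivity pattern and builds a 2x2 separation matrix by inverting the steering matrix, which gives each row unit gain in its look direction and a null in the other direction. The sound velocity, element positions, frequency, and function names are illustrative assumptions, and the row ordering and normalization may differ from the paper's expressions.

```python
import cmath
import math

C = 340.0  # assumed velocity of sound in m/s

def steering(f, d, theta_deg):
    """Far-field phase term exp[j 2 pi f d sin(theta) / c] for one element."""
    return cmath.exp(2j * math.pi * f * d * math.sin(math.radians(theta_deg)) / C)

def null_beamformer(f, d, theta1, theta2):
    """2x2 matrix whose row 0 looks at theta1 (null at theta2) and vice versa.

    Computed as the inverse of the steering matrix a[k][l] = steering(f, d[k], theta_l).
    """
    a11, a12 = steering(f, d[0], theta1), steering(f, d[0], theta2)
    a21, a22 = steering(f, d[1], theta1), steering(f, d[1], theta2)
    det = a11 * a22 - a12 * a21
    return [[a22 / det, -a12 / det],
            [-a21 / det, a11 / det]]

def response(w_row, f, d, theta_deg):
    """Directivity of one row: sum_k w_k exp[j 2 pi f d_k sin(theta) / c]."""
    return sum(w * steering(f, dk, theta_deg) for w, dk in zip(w_row, d))

# Assumed geometry: two elements 4 cm apart, sources at -30 and 40 degrees.
f = 1000.0
d = [-0.02, 0.02]
W_bf = null_beamformer(f, d, -30, 40)

gain_look_0 = abs(response(W_bf[0], f, d, -30))  # row 0, look direction
gain_null_0 = abs(response(W_bf[0], f, d, 40))   # row 0, null direction
gain_look_1 = abs(response(W_bf[1], f, d, 40))   # row 1, look direction
gain_null_1 = abs(response(W_bf[1], f, d, -30))  # row 1, null direction
```

    Because such a matrix depends only on the geometry and the estimated DOAs, it can be computed in closed form at every frequency bin without any iteration.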

    [Step 5: Diversity with cost function] Select the most suitable unmixing matrix in each frequency bin and each iteration point, i.e., algorithm diversity in both iteration and frequency domain. As a cost function used to achieve the diversity, we calculate two kinds of cosine distances between the separated signals, which are

    $J^{(ICA)}(f) = \frac{\left| \langle Y_1^{(ICA)}(f,t)\, Y_2^{(ICA)}(f,t)^* \rangle_t \right|}{\langle |Y_1^{(ICA)}(f,t)|^2 \rangle_t^{1/2}\, \langle |Y_2^{(ICA)}(f,t)|^2 \rangle_t^{1/2}}$, (12)

    $J^{(BF)}(f) = \frac{\left| \langle Y_1^{(BF)}(f,t)\, Y_2^{(BF)}(f,t)^* \rangle_t \right|}{\langle |Y_1^{(BF)}(f,t)|^2 \rangle_t^{1/2}\, \langle |Y_2^{(BF)}(f,t)|^2 \rangle_t^{1/2}}$, (13)


    where $Y_l^{(ICA)}(f,t)$ is the signal separated by ICA, and $Y_l^{(BF)}(f,t)$ is the signal separated by beamforming. If the separation performance of beamforming is superior to that of ICA, we obtain the condition $J^{(ICA)}(f) > J^{(BF)}(f)$; otherwise $J^{(ICA)}(f) \le J^{(BF)}(f)$. Thus, an observation of the conditions yields the following algorithm:

    $W_{i+1}(f) = \begin{cases} W_{i+1}^{(ICA)}(f), & J^{(ICA)}(f) \le J^{(BF)}(f), \\ W^{(BF)}(f), & J^{(ICA)}(f) > J^{(BF)}(f). \end{cases}$ (14)
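
    The selection rule of Eq. (14) can be sketched for one frequency bin as follows. The cost used here is an assumed form of the cosine distance between the two separated outputs (magnitude of the time-averaged cross term, normalized by the output powers); the toy sources, the mixing matrix, and the two candidate matrices named `W_ica` and `W_bf` are illustrative stand-ins, not results of ICA or beamforming.

```python
import math

def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 complex vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def cosine_distance(y1, y2):
    """|<y1 y2*>_t| / sqrt(<|y1|^2>_t <|y2|^2>_t) for two complex time series."""
    n = len(y1)
    cross = abs(sum(a * b.conjugate() for a, b in zip(y1, y2)) / n)
    p1 = sum(abs(a) ** 2 for a in y1) / n
    p2 = sum(abs(b) ** 2 for b in y2) / n
    return cross / math.sqrt(p1 * p2)

def cost(W, X_frames):
    """Cosine distance between the two outputs of Y = W X over all frames."""
    Y = [matvec(W, x) for x in X_frames]
    return cosine_distance([y[0] for y in Y], [y[1] for y in Y])

def select_unmixing(W_ica, W_bf, X_frames):
    """Eq. (14): keep the ICA matrix unless beamforming gives the smaller cost."""
    return W_ica if cost(W_ica, X_frames) <= cost(W_bf, X_frames) else W_bf

# Toy data: two mutually orthogonal source series and an assumed mixing matrix.
s1 = [1 + 0j, 1j, -1 + 0j, -1j]
s2 = [1 + 0j, -1j, -1 + 0j, 1j]
A = [[1.0, 0.5], [0.5, 1.0]]
X_frames = [matvec(A, [a, b]) for a, b in zip(s1, s2)]

W_ica = [[1.0, 0.0], [0.0, 1.0]]   # stand-in: no separation at all
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
W_bf = [[A[1][1] / det, -A[0][1] / det],
        [-A[1][0] / det, A[0][0] / det]]   # stand-in: exact inverse of A

chosen = select_unmixing(W_ica, W_bf, X_frames)
```

    A cost near zero means the two outputs share little residual correlation, so in this toy case the exact inverse is selected over the identity matrix.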

    If the $(i+1)$-th iteration was the final iteration, go to step 6; otherwise go back to step 2 and repeat the ICA iteration, inserting the $W_{i+1}(f)$ given by Eq. (14) into $W_i(f)$ in Eq. (4) with an increment of $i$.

    [Step 6: Ordering and scaling] Using the DOA information obtained in step 3, we detect and correct the source permutation and the gain inconsistency [6].
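
    The DOA estimation of Step 3, whose result is reused in Steps 4 and 6, can be sketched as a grid search for the null of each row's directivity pattern, followed by averaging over frequency bins. The 1-degree grid, the sound velocity, the chosen frequencies, and the stand-in unmixing matrices (inverses of an assumed steering matrix rather than real ICA output) are all illustrative assumptions.

```python
import cmath
import math

C = 340.0  # assumed velocity of sound in m/s

def steering(f, d, theta_deg):
    """Far-field phase term exp[j 2 pi f d sin(theta) / c] for one element."""
    return cmath.exp(2j * math.pi * f * d * math.sin(math.radians(theta_deg)) / C)

def directivity(w_row, f, d, theta_deg):
    """|F_l(f, theta)| of Eq. (5) for one row of the unmixing matrix."""
    return abs(sum(w * steering(f, dk, theta_deg) for w, dk in zip(w_row, d)))

def null_direction(w_row, f, d, grid):
    """Grid angle at which the row's directivity pattern is smallest."""
    return min(grid, key=lambda th: directivity(w_row, f, d, th))

def estimate_doas(W_by_bin, freqs, d, grid):
    """Average the smaller and the larger null direction over all bins."""
    th1, th2 = [], []
    for W, f in zip(W_by_bin, freqs):
        nulls = [null_direction(row, f, d, grid) for row in W]
        th1.append(min(nulls))
        th2.append(max(nulls))
    return sum(th1) / len(th1), sum(th2) / len(th2)

def toy_unmixing(f, d, theta1, theta2):
    """Stand-in for an ICA result: inverse of the 2x2 steering matrix."""
    a11, a12 = steering(f, d[0], theta1), steering(f, d[0], theta2)
    a21, a22 = steering(f, d[1], theta1), steering(f, d[1], theta2)
    det = a11 * a22 - a12 * a21
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]

# Assumed setup: 4 cm spacing, sources at -30 and 40 degrees, four bins
# below the spatial-aliasing limit of this spacing.
d = [-0.02, 0.02]
freqs = [500.0, 1000.0, 2000.0, 3000.0]
grid = range(-90, 91)
W_by_bin = [toy_unmixing(f, d, -30, 40) for f in freqs]
doa1, doa2 = estimate_doas(W_by_bin, freqs, d, grid)
```

    Taking the smaller and the larger null direction in each bin makes the estimate independent of the order in which ICA happens to output the sources at that bin, which is what allows the same DOA information to be used for permutation correction.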

    4. EXPERIMENTS IN REVERBERANT ROOM

    4.1. Conditions for experiments

    A two-element array with an interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, −30° and 40°. Two kinds of sentences, those spoken by two male and two female speakers selected from the ASJ continuous speech corpus for research, are used as the original speech samples. Using these sentences, we obtain 12 combinations with respect to speakers and source directions. In these experiments, we use the following signals as the source signals: the original speech convolved with the impulse responses specified by different reverberation times (RTs) of 150 msec and 300 msec. The impulse responses are recorded in a variable reverberation time room as shown in Fig. 3. The analytical conditions of these experiments are as follows: the sampling frequency is 8 kHz, the frame length is 128 msec, the frame shift is 2 msec, and the step-size parameter $\eta$ is set to be $1.0 \times 10^{-5}$.

    4.2. Objective evaluation of separated signals

    In order to compare the performance of the proposed algorithm with that of the conventional BSS described in Sect. 2 for different iteration points in ICA, the noise reduction rate (NRR), defined as the output signal-to-noise ratio (SNR) in dB minus the input SNR in dB, is shown in Fig. 4. These values are averages over all of the combinations with respect to speakers and source directions. As for the proposed algorithm, we also plot the NRR rescaled by the computational cost (see dotted lines), because the computational complexity of the proposed algorithm is about 1.9 times that of the conventional ICA.
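
    The NRR definition above amounts to a difference of two SNRs in dB, as in the following minimal sketch. It assumes that the target and interference components are available separately at the input and at the output, which this section does not specify; the function names and the sample values are illustrative.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the energies of the two components."""
    p_signal = sum(abs(v) ** 2 for v in signal)
    p_noise = sum(abs(v) ** 2 for v in noise)
    return 10.0 * math.log10(p_signal / p_noise)

def nrr_db(target_in, interf_in, target_out, interf_out):
    """Noise reduction rate: output SNR in dB minus input SNR in dB."""
    return snr_db(target_out, interf_out) - snr_db(target_in, interf_in)

# Assumed toy components: equal-power target and interference at the input,
# interference amplitude reduced by a factor of 10 at the output.
target_in = [1.0, -1.0, 1.0, -1.0]
interf_in = [1.0, 1.0, -1.0, -1.0]
target_out = list(target_in)
interf_out = [0.1 * v for v in interf_in]

nrr = nrr_db(target_in, interf_in, target_out, interf_out)
```

    In this toy case the input SNR is 0 dB and the output SNR is 20 dB, so the NRR is 20 dB.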

    [Fig. 3: variable reverberation time room. Labels recoverable from the figure: 5.73 m; loudspeakers (height: 1.35 m); 2.15 m; microphone array.]
