Biomedical Signal Processing and Control 7 (2012) 622–631
Contents lists available at SciVerse ScienceDirect
Journal homepage: www.elsevier.com/locate/bspc
1746-8094/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.bspc.2012.03.004

Using eigenstructure decompositions of time-varying autoregressions in common spatial patterns-based EEG signal classification

David Gutiérrez, Rocio Salazar-Varas
Centro de Investigación y de Estudios Avanzados (CINVESTAV), Unidad Monterrey, Apodaca, N.L., 66600, México

Corresponding author at: Centro de Investigación y de Estudios Avanzados (CINVESTAV), Vía del Conocimiento 201, Parque de Investigación e Innovación Tecnológica (PIIT), Autopista al Aeropuerto Km. 9.5, Lote 1, Manzana 29, Apodaca, N.L., 66600, México. Tel.: +52 81 1156 1740x4513; fax: +52 81 1156 1741. E-mail address: [email protected] (D. Gutiérrez). URL: http://www.gutierrezruiz.com (D. Gutiérrez).

Article history: Received 22 September 2011; received in revised form 6 February 2012; accepted 6 March 2012; available online 6 April 2012.

Keywords: Electroencephalography; Brain–computer interface; Time-varying autoregressions; Common spatial patterns

Abstract

Brain–computer interfaces based on common spatial patterns (CSP) depend on the operational frequency bands of the events to be discriminated. This problem has been addressed through sub-band decompositions of the electroencephalographic signals using filter banks, but then the performance relies on the number of filters that are stacked and on the criteria used to select their bandwidths. Here, we propose an alternative approach based on an eigenstructure decomposition of the signals' time-varying autoregressions (TVAR). The eigen-based decomposition of the TVAR allows for subject-specific estimation of the principal time-varying frequencies, and these principal eigencomponents can then be used in traditional CSP-based classification. We show through a series of numerical experiments that the proposed classification scheme achieves a performance comparable with that obtained through the filter bank-based approach. However, our method does not rely on a preliminary selection of a frequency band, yet good performance is achieved under realistic conditions (such as a reduced number of sensors and a small amount of training data) independently of the time interval selected.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

The common spatial patterns (CSP) method was introduced in the field of electroencephalography (EEG) analysis as a technique to discriminate between normal and abnormal brain activity [1]. In the CSP method, optimal spatial filters are constructed such that each filter enhances the variance of one feature of interest while reducing the variance of other features [2]. CSP filters are therefore well suited to discriminate mental states that are characterized by event-related synchronization/desynchronization effects. In [1], such a filtering approach was used to discriminate between normal and abnormal patterns corresponding to various neurological disorders in multiple subjects, while in [3] the CSP method was also used for extracting abnormal components from EEG measurements of a single subject. Furthermore, CSP has proven useful in applications related to brain source localization [4,5].

Even though the CSP method is, strictly speaking, not a classification but a signal enhancement method, it has also been used in multiple brain–computer interface (BCI) applications for the discrimination of motor imagery tasks, by itself and in combination with other processing tools (see, e.g., [6–8] for a detailed review of practical issues and several extensions of the CSP method). In order to achieve the desired discrimination effect, several parameters have to be selected before CSP can be used, namely the band-pass filter and the time intervals (typically a fixed time interval relative to all stimuli/responses). In practice, some general settings are used: a frequency band of 7–30 Hz, a time interval starting 1000 ms after the cue, and 2 or 3 filters from each side of the spectrum [9]. Therefore, the performance of the CSP filters depends on the correct selection of these parameters.

BCI systems relying on CSP-feature classification generally yield poor accuracies when the EEG measurements are either unfiltered or have been filtered with an inappropriately selected frequency range [10]. Therefore, setting a broad frequency range or manually selecting a subject-specific frequency range is common practice with the CSP algorithm. Still, the problem of manually selecting the operational subject-specific frequency band of the CSP has been addressed in several ways. The most recent approaches rely on the construction of filter banks that decompose the EEG measurements into multiple sub-bands; the CSP algorithm is then used on each of the sub-bands. In [10], the sub-bands are constructed by Gabor filters, while in [11] a bank of zero-phase Chebyshev Type II infinite impulse response filters is used. In both cases, the performance of the method relies on the number of filters that are stacked and on the criteria used to determine their cutoff frequencies and the overlap between them (if any). While the previous
approaches require human intervention to select the cutoff frequencies of the filters, this problem is avoided in [12] by optimizing a FIR filter of high complexity simultaneously with the spatial filter. In order to control the complexity of the temporal filter, a regularization scheme is introduced which favors sparse solutions for the FIR coefficients. Although some values of the regularization parameter seem to give good results in most cases, a model selection has to be performed in order to achieve optimal performance.

In this paper (see also [13] for a preliminary version of this work), we propose to substitute the filter bank-based sub-band decomposition with an eigenstructure decomposition in which the most significant frequency components of the EEG signals are extracted by means of non-stationary time series models. Specifically, time-varying autoregressions (TVAR) are used to obtain a representation of the time-frequency structure of the EEG signals. TVAR were first introduced in [14] and have since been partially reformulated and applied to various fields such as seismology, geology, and economics. In the case of EEG studies, TVAR have been applied to the analysis of non-stationary human epileptic EEG data [15]. In our case, the assessment of changes over time in the TVAR obtained from the EEG data allows for the identification of time-varying principal frequencies (most likely related to the physiological events of interest), from which the most significant ones, in an eigenstructure sense, are then used in the traditional CSP-based classification.

In Section 2, we present the proposed classification scheme, reviewing the basis of TVAR as well as the dynamic eigenstructure decomposition of the corresponding TVAR evolution matrix. Section 2 also includes a brief description of the CSP algorithm. In Section 3, we show the applicability of our methods through numerical examples using real EEG data. Finally, in Section 4, we discuss the results, limitations, and future work.

2. Methods

The proposed method can be seen as a two-step process:

• Eigenstructure decomposition of the TVAR.
• Computation of the CSP of selected components.

Next, in this section we describe each of these steps.

2.1. Decomposition of the TVAR

In the standard autoregressive framework, a discretely sampled EEG signal is modeled by representing the voltage level at time t as a linear combination of the voltage levels at times t − 1, t − 2, . . ., t − p, where p > 0 is the maximum time lag, plus a random component (driving noise or "innovation"). The relationship is assumed to be fixed over time, so the coefficients defining the linear combination are constant for the entire period of recording. In TVAR, those coefficients are allowed to vary over time, so they can adapt to changes evidenced in the series. In particular, such a representation can respond to and adequately capture the forms of frequency change seen in EEG oscillations.

Under those conditions, let us define xm(t) as the time series resulting from the EEG measurement at sensor m = 1, 2, . . ., M and at time samples t = 1, 2, . . ., N. The corresponding TVAR of order p is given by

(t) =p∑

ϕ (t)x (t − i) + n (t), (1)

m

i=1

i,m m m

where ϕi,m(t) are time-varying coefficients, which are often calculated using the Levinson–Wiggins–Robinson (LWR) algorithm [16], and nm(t) represents the noise input to channel m. A vector representation of (1) can be achieved by defining xm(t) = [xm(t), xm(t − 1), . . ., xm(t − p + 1)]^T, nm(t) = [nm(t), 0, . . ., 0]^T, and Gm(t) as

Gm(t) = ⎡ ϕ1,m(t)   ϕ2,m(t)   · · ·   ϕp−1,m(t)   ϕp,m(t) ⎤
        ⎢    1         0      · · ·       0          0    ⎥
        ⎢    0         1      · · ·       0          0    ⎥ .   (2)
        ⎢    ⋮         ⋮       ⋱          ⋮          ⋮    ⎥
        ⎣    0         0      · · ·       1          0    ⎦

Under these conditions, Eq. (1) is rewritten as

xm(t) = Gm(t) xm(t − 1) + nm(t).   (3)
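The companion-matrix recursion of Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration, not the paper's estimation procedure: the coefficient trajectory (an AR(2) resonance whose frequency drifts linearly) and all dimensions are assumed for demonstration only.

```python
# Sketch of the TVAR companion-matrix form in Eqs. (1)-(3); the coefficient
# trajectories phi_{i,m}(t) here are illustrative, not estimated from data.
import numpy as np

rng = np.random.default_rng(0)
p, N = 2, 500                      # model order and number of samples

def companion(phi):
    """Build G_m(t) of Eq. (2) from the coefficient vector phi (length p)."""
    G = np.zeros((p, p))
    G[0, :] = phi                  # first row holds the TVAR coefficients
    G[1:, :-1] = np.eye(p - 1)     # shifted identity implements the delays
    return G

# Slowly varying coefficients: an AR(2) resonance with drifting frequency
omega = np.linspace(0.2 * np.pi, 0.4 * np.pi, N)
r = 0.95                           # fixed modulus (assumed)
x = np.zeros((p, N))               # state vector x_m(t) = [x(t), x(t-1)]^T
for t in range(1, N):
    phi = np.array([2 * r * np.cos(omega[t]), -r**2])
    n = np.array([rng.standard_normal(), 0.0])   # innovation n_m(t)
    x[:, t] = companion(phi) @ x[:, t - 1] + n   # Eq. (3)
```

The scalar series of Eq. (1) is the first state component, `x[0]`; its oscillation frequency follows the drift in `omega`.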

Once the EEG series are modeled via TVAR, the focus is on exploring the time-frequency structure of the latent processes underlying the signals using a dynamic model decomposition based on the eigenstructure of Gm(t). Such analysis is based on the fact that, at each time instant, the eigenvalues of the matrix defined in (2) satisfy the equation

det[λ^p − λ^{p−1} ϕ1,m − λ^{p−2} ϕ2,m − · · · − ϕp,m] = 0,   (4)

where ϕ∗,m are used instead of ϕ∗,m(t) for notational convenience. Furthermore, we can assume that Gm(t) has pz pairs of complex eigenvalues zi,m(t) of the form

zi,m(t) = ri,m(t) exp(±jωi,m(t)), for i = 1, . . ., pz,   (5)

as well as py real eigenvalues yl,m(t), for l = 1, . . ., py, such that 2pz + py = p. Therefore, from the eigendecomposition Gm(t) = Bm(t) Λm(t) Bm^{−1}(t), we can define a transformation matrix Hm(t) = diag(Bm(t)[1, 0, . . ., 0]^T) Bm^{−1}(t) which, applied to (3), yields the following:

Hm(t) xm(t) = Hm(t) Gm(t) xm(t − 1) + Hm(t) nm(t),
ξm(t) = Λm(t) Hm(t) Hm^{−1}(t − 1) ξm(t − 1) + Hm(t) nm(t),   (6)

where ξm(t) = [ξ1,m(t), . . ., ξp,m(t)]^T = Hm(t) xm(t). As a consequence, xm(t) = [1, 1, . . ., 1] ξm(t), which we can rewrite as

xm(t) = Σ_{i=1}^{pz} ri,m(t) exp(±jωi,m(t)) + Σ_{l=1}^{py} yl,m(t).   (7)

In our case, we will consider pz and py to be fixed values due to the quasi-static condition of the EEG signals. However, this condition does not in general hold over time, and relatively small changes in ϕ∗,m can lead to one or more pairs of complex roots being substituted by real roots with low values. Nevertheless, previous studies on EEG signals (see [17]) have shown that this phenomenon is only observed for short periods of time, which produces a "break" in zi,m(t) but has little impact on the interpretation and understanding of its dynamical structure.
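The link between Eqs. (4) and (5) can be made concrete as follows: the instantaneous frequency ωi,m(t) and modulus ri,m(t) are the angle and magnitude of the complex eigenvalues of Gm(t). The sketch below uses a single complex pair (pz = 1, p = 2) with assumed coefficients and an assumed sampling rate, not values estimated from EEG.

```python
# Sketch: recover the instantaneous frequency and modulus of Eq. (5) from the
# eigenvalues of the companion matrix G_m(t). Coefficients are illustrative.
import numpy as np

p = 2
fs = 250.0                              # sampling rate in Hz (assumed)
omega_true, r_true = 0.3 * np.pi, 0.95  # one resonant pair (p_z = 1)

G = np.zeros((p, p))
G[0, :] = [2 * r_true * np.cos(omega_true), -r_true**2]
G[1:, :-1] = np.eye(p - 1)

z = np.linalg.eigvals(G)                # roots of Eq. (4)
i = np.argmax(z.imag)                   # pick the +j member of the pair
omega_hat, r_hat = np.angle(z[i]), np.abs(z[i])
freq_hz = omega_hat * fs / (2 * np.pi)  # map to Hz for interpretation
```

For an AR(2) pair with modulus r and angle ω, the recovered values match the construction exactly; tracking them over time is what produces the time-varying principal frequencies described above.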

Another important consideration in moving from the TVAR coefficients ϕ∗,m to the set of zi,m(t) is that identification must be enforced through an ordering in terms of their relative frequencies at each time instant. This condition is necessary because the frequency and amplitude characteristics of the latent components vary through time; a component that has the lowest frequency at one time may have a higher frequency later, for example. Therefore, in order to use the representation in (7) to analyze the time-frequency structure of xm(t) by assessing the changes over time in ωi,m(t) and ri,m(t), we have to bear in mind that components may "switch" from time to time as the data structure, and the model's response, evolve.

Under these conditions, we performed the decomposition in (7) using a software tool (freely available at http://www.stat.duke.edu/research/software/west/tvar.html) which implements the sequential updating and retrospective smoothing algorithms for computing (2)–(7) at each time instant [18]. Specifically, this software

Fig. 1. Block diagram of the proposed processing scheme.

implementation allows us to decompose a window of the original EEG signals defined by the time series [xm(to), . . ., xm(tf)]^T, where to < tf. Such decomposition returns pz components, from which those with the most significant frequency content, based on the eigenvalue assessment of ωi,m(t), will be used as input to the CSP algorithm. The selected components are then arranged into a spatio-temporal matrix X of size poM × (tf − to + 1), where po ≤ pz is the number of components selected from the decomposition of the TVAR.
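The arrangement into the spatio-temporal matrix X can be sketched as a simple stacking operation. All names and dimensions below are assumptions for illustration; `comps` stands in for the po components retained per channel after the TVAR decomposition.

```python
# Sketch: stacking the p_o selected eigencomponents of all M channels into the
# spatio-temporal matrix X of size (p_o*M) x (t_f - t_o + 1); data are synthetic.
import numpy as np

rng = np.random.default_rng(5)
M, po, to, tf = 6, 1, 0, 499                       # channels, components, window
comps = rng.standard_normal((M, po, tf - to + 1))  # stand-in for TVAR components

# Rows are ordered channel-major: all components of channel 1, then channel 2, ...
X = comps.reshape(po * M, tf - to + 1)
```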

2.2. CSP

Let us consider the case of two-class discrimination (e.g., contralateral brain activity from either the left or right motor cortex). Then, the composite spatial covariance can be computed as Cc = C1 + C2, where C1 and C2 are the average spatial covariances obtained from independent trials of classes 1 and 2, respectively. Furthermore, Cc can be factored as Cc = Uc Dc Uc^T, where Uc is the matrix of eigenvectors and Dc is the diagonal matrix of eigenvalues arranged in descending order. Then, we can use the following whitening transformation on C1 and C2 to equalize their variances in the space spanned by Uc:

P = Dc^{−1/2} Uc^T.   (8)

Therefore, S1 = P C1 P^T and S2 = P C2 P^T are matrices that share common eigenvectors, i.e., if S1 = B D1 B^T then S2 = B D2 B^T, where D1 + D2 = I.

Under these conditions, the spatial filter W is given by

W = (P^T Bo)^T,   (9)

where Bo contains the first and last eigenvectors in B. Then V = W X will produce feature vectors that are optimal for discriminating between the two classes in the least-squares sense [6]. Furthermore, the columns of W^{−1} become the CSP, and from the rows of V, denoted as vk^T, for k = 1, 2, . . ., poM, we construct the feature vector f = [f1, . . ., fpoM]^T to be used in the classification, where

fk = log[ var(vk^T) / Σ_{j=1}^{poM} var(vj^T) ],   (10)

and with var(·) denoting the variance of the vector's elements. Note that the logarithmic transformation makes the variance features' distributions close to Gaussian.
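The pipeline of Eqs. (8)–(10) can be sketched as below. The trial data are synthetic stand-ins (class 2 is given extra variance on one channel so the filters have something to find), and note one implementation detail: `np.linalg.eigh` returns eigenvalues in ascending order, whereas the text arranges Dc in descending order; the whitening in (8) and the choice of extreme eigenvectors are unaffected by the ordering convention.

```python
# Minimal CSP sketch following Eqs. (8)-(10); trial data are synthetic and the
# two-class covariances C1, C2 are averages over trials, as in the text.
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 6, 200, 20                     # channels, samples, trials per class

def avg_cov(trials):
    return np.mean([np.cov(X) for X in trials], axis=0)

# Synthetic trials: class 2 has extra variance on the last channel
cls1 = [rng.standard_normal((M, N)) for _ in range(K)]
cls2 = [rng.standard_normal((M, N)) * np.r_[np.ones(M - 1), 3.0][:, None]
        for _ in range(K)]

C1, C2 = avg_cov(cls1), avg_cov(cls2)
Dc, Uc = np.linalg.eigh(C1 + C2)         # Cc = Uc Dc Uc^T
P = np.diag(Dc ** -0.5) @ Uc.T           # whitening, Eq. (8)
D1, B = np.linalg.eigh(P @ C1 @ P.T)     # S1 = B D1 B^T (S2 shares B)
Bo = B[:, [0, -1]]                       # first and last eigenvectors
W = (P.T @ Bo).T                         # spatial filter, Eq. (9)

def features(X):
    v = W @ X                            # V = W X
    var = v.var(axis=1)
    return np.log(var / var.sum())       # log-variance features, Eq. (10)
```

Each trial maps to a two-element feature vector whose exponentials sum to one, matching the normalization inside the logarithm of Eq. (10).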


In order to allow a better understanding of the proposed method, Fig. 1 shows a diagram that summarizes the steps previously described.

3. Numerical examples

In [13], we performed a series of numerical experiments using simulated EEG data, from which we concluded that the proposed method was well suited for BCI applications under realistic conditions. Therefore, in this paper we evaluate the performance of the method on real EEG data corresponding to motor imagery tasks. We used two data sets from previous BCI competitions [19] which posed different challenges: firstly, data set IVa from the third competition has the restriction of providing a small number of training experiments; secondly, data set I from the fourth competition corresponds to uncued tasks, so the EEG signals do not have a fixed length.

Under those conditions, we evaluated the performance of the processing method described in Section 2. Given that the software tool used for the decomposition adjusts the parameters of the TVAR based on a random-walk model [20], it takes into account two discount factors, denoted β and δ. Then, for a given model order p, the parameters β and δ must be chosen in such a way that the joint log-likelihood function of the data time series and the predictions of the model is maximized [21]. Since such an estimation process will generally lead to different estimates for distinct data sets, we performed a series of preliminary tests in order to select fixed values for the parameters. In the case of the model's order, the value of p was selected in accordance with the Schwarz Bayesian Criterion, which for different measurements yielded optimal orders between 10 and 12. In any case, our preliminary tests showed that going from p = 5 to p = 10 produced an increment in the classification accuracy, in both simulated and real data, of between 8.61% and 11.68%. However, increasing the order to p = 15 produced variations in the accuracy ranging from a reduction of 0.91% to an increment of 0.31% compared to p = 10. We therefore chose a fixed value of p = 10 for all the experiments reported in this paper. For β and δ, tests over a grid of values with 0.99 ≤ β ≤ 0.999 and 0.98 ≤ δ ≤ 0.999 (the operational ranges recommended in [18]) showed essentially no significant effect on the accuracy. Therefore, we chose β = δ = 0.99, as these values offered good agreement between the predicted TVAR and our data.
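Order selection by the Schwarz Bayesian Criterion (BIC) can be illustrated with a simplified stand-in: the sketch below scores a static least-squares AR fit rather than the TVAR likelihood of [21], and the synthetic series and candidate range are assumptions, not the paper's data.

```python
# Sketch: choosing the AR order p with the Schwarz Bayesian Criterion (BIC),
# shown for a static least-squares AR fit as a simplified stand-in for TVAR.
import numpy as np

rng = np.random.default_rng(4)
N = 2000
x = np.zeros(N)
for t in range(2, N):                 # synthetic AR(2) series, stationary
    x[t] = 1.5 * x[t - 1] - 0.8 * x[t - 2] + rng.standard_normal()

def bic(x, p):
    """Regress x(t) on its p past values and score the residual variance."""
    N = len(x)
    X = np.column_stack([x[p - i - 1:N - i - 1] for i in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = np.mean((y - X @ coef) ** 2)
    n = len(y)
    return n * np.log(s2) + p * np.log(n)   # fit term + complexity penalty

best_p = min(range(1, 16), key=lambda p: bic(x, p))
```

With a genuine AR(2) generator the criterion settles near p = 2; on the EEG measurements of the paper the same trade-off between fit and the log(n) penalty yielded orders between 10 and 12.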

Next, the CSP of the decomposed TVAR were calculated, and the corresponding feature vectors in (10) were discriminated using a Mahalanobis distance-based classifier. We chose this classification


Table 1. Selection of sensors for each subject in data set IVa (labeled according to the 10–20 international system).

Subject  Sensors                                  M
aa       CP1, CP2, CP3, CP4, CPz                  5
al       FC1, FC2, C1, C2, CP1, CP2               6
av       FC3, FC4, C3, C4, FCz, Cz                6
aw       CCP1, CCP2, CP1, CP2, C1, C2, CPz, Cz    8
ay       FC3, FC4, CP3, CP4, C3, C4               6


algorithm as it is known to provide good performance in BCI applications where few measuring channels are used [22]. This well-known fact allows us to focus on evaluating the use of the TVAR decomposition in the EEG classification process. Therefore, in order to evaluate the performance, we computed the corresponding receiver operating characteristic (ROC) curves. A ROC curve is a plot of the probability of correct classification of one class against the probability of incorrect classification of the other class [23]; the best classification performance is achieved when the area under the ROC curve is close to one. For this reason, in this paper (see also [24]) we used the area under the ROC curves as a generalized evaluation framework.
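A two-class Mahalanobis-distance rule and the area-under-ROC measure can be sketched together as below. The feature vectors are synthetic stand-ins for the f of Eq. (10), and the difference-of-distances score is a generic formulation, not necessarily the exact decision rule of [22]; the AUC is computed via the rank-sum (Mann–Whitney) identity, ignoring ties, which vanish for continuous scores.

```python
# Sketch: two-class Mahalanobis-distance classification and the area under
# the ROC curve used as the performance measure; feature vectors are synthetic.
import numpy as np

rng = np.random.default_rng(2)
d, K = 4, 100
f1 = rng.standard_normal((K, d)) - 1.0       # class 1 feature vectors
f2 = rng.standard_normal((K, d)) + 1.0       # class 2 feature vectors

mu1, mu2 = f1.mean(0), f2.mean(0)
S1inv = np.linalg.inv(np.cov(f1, rowvar=False))
S2inv = np.linalg.inv(np.cov(f2, rowvar=False))

def mahalanobis(x, mu, Sinv):
    diff = x - mu
    return float(diff @ Sinv @ diff)

def score(x):
    # Positive score favours class 2 (closer to class 2 than to class 1)
    return mahalanobis(x, mu1, S1inv) - mahalanobis(x, mu2, S2inv)

scores = np.array([score(x) for x in np.vstack([f1, f2])])
labels = np.r_[np.zeros(K), np.ones(K)]

# Area under the ROC curve via the rank-sum (Mann-Whitney) identity
pos, neg = scores[labels == 1], scores[labels == 0]
auc = np.mean(pos[:, None] > neg[None, :])
```

Well-separated classes drive the AUC toward one, which is the sense in which the area is used as the performance measure in the tables below.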

We performed the ROC-based evaluation for different selections of the processing window [xm(to), . . ., xm(tf)]^T, as recent studies have shown that the performance of some methods is reduced as the processing window gets shorter [25]. Given that to in our case indicates the number of samples after the task's cue, and tf − to + 1 samples correspond to the length of the processing window, we analyzed cases where the window covered 2, 3, or 4 s of signal starting at the cue, as well as 2 or 3 s of signal starting 1 s after the cue.

Furthermore, we evaluated the performance of the classifier under different training conditions:

• 100% data training, which corresponds to training the classifier with all the data and then classifying the same data. This is an important test in our case, as it provides an upper bound on the performance, given that our proposed method is model-based and is only expected to provide close-to-perfect classification when the TVAR model fits the original data well, independently of the classification algorithm used.
• 50% data training, where the classifier was trained with half the data available for each class, and the other half was used to evaluate it. This process is repeated 2K times, where K is the number of independent experiments (trials) available for each class, replacing the data so that each trial becomes part of the evaluation set once. At the end, the mean and the standard deviation of the area under the ROC curves for the 2K experiments are reported as the final performance measure.
• Jackknife test, where the classifier is trained with all the data minus one trial, and the remaining trial is used to test it. The process is repeated 2K times to make sure each trial is classified once. The mean performance is then reported as a measure of consistency in the classification process.
• BCI competition's test, where the experimental settings required by the organizers of the BCI competition were used for comparison purposes. The description of these settings is available at the competition's official site: http://www.bbci.de/competition/.
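The jackknife protocol in the list above can be sketched as a leave-one-trial-out loop. To keep the sketch self-contained, a nearest-class-mean rule stands in for the actual Mahalanobis classifier, and the feature data are synthetic.

```python
# Sketch of the jackknife (leave-one-trial-out) protocol described above,
# with a nearest-class-mean stand-in for the actual classifier.
import numpy as np

rng = np.random.default_rng(3)
K, d = 30, 4                                  # trials per class, feature dim
X = np.vstack([rng.standard_normal((K, d)) - 1.0,
               rng.standard_normal((K, d)) + 1.0])
y = np.r_[np.zeros(K), np.ones(K)]

correct = 0
for i in range(2 * K):                        # each trial is left out once
    mask = np.arange(2 * K) != i
    Xtr, ytr = X[mask], y[mask]               # train on all but trial i
    mu0, mu1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = float(np.linalg.norm(X[i] - mu1) < np.linalg.norm(X[i] - mu0))
    correct += pred == y[i]

accuracy = correct / (2 * K)                  # mean performance reported
```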

In all cases, the evaluation trials of the two classes are arranged in randomized order before being presented to the classifier.

Finally, in order to evaluate the performance under conditions close to those encountered in real-life applications, we used a reduced number of sensors out of those available. The selected sensors were those covering the motor cortex, and the number varied between subjects from M = 5 to M = 8, depending on the combination of sensors providing the best performance in the 100% data training test.

3.1. Data set IVa

This data set was provided by Fraunhofer FIRST, Intelligent Data Analysis Group, and Campus Benjamin Franklin of the Charité University Medicine Berlin, Department of Neurology, Neurophysics Group [26]. In this case, five healthy subjects sat in a comfortable chair with arms resting on armrests. Visual cues indicated for 3.5 s which of the following two motor imageries the subject should perform: right hand or right foot. The presentation of target cues was paused by periods of random length, 1.75–2.25 s, in which the subject could relax. Data from this process were measured using an EEG array of 118 sensors located on the scalp using the international 10–20 placing system. The data were acquired at a sampling frequency of 1000 Hz, and we downsampled them to 250 Hz in order to achieve conditions similar to those applied in our preliminary simulations [13]. Finally, we filtered the data with a band-pass filter between 8 and 30 Hz. This process was performed on five data sets labeled aa, al, av, aw, and ay, where each data set contained K = 140 trials for each cue (hand or foot).

Under these conditions, we applied the proposed method as explained in Section 2. From a preliminary assessment of the data, we chose po = 1 component from the eigendecomposition, as it already corresponded to the frequency band of interest with the most relevant power in comparison to further components. An example of this is shown in Fig. 2, where the power densities of the first four components of the eigendecomposition were computed from the average data measured at channel C1 of subject al for each class. The resulting densities showed that the first component had the greatest energy content in the 10–18 Hz band, while the imagery tasks could be discriminated from the average densities. The second component turned out to be about ten times less significant in terms of energy content in comparison to the first component, and discrimination between classes was not straightforward. The third and fourth components were even less significant.

Next, the CSP were computed from the poM = M characteristic components. As stated previously, the final evaluation of the classification process was performed for a small value of M, which varied between subjects. However, in order to qualitatively compare our results to those in [11], we computed the CSP for all subjects for the case of the same array of M = 31 sensors as the one used in [13]. The resulting average spatial patterns are shown in Fig. 3. The obtained CSP show, with the exception of subject av, well-defined enhancing regions over the left motor cortex (contralateral activation corresponding to the right-hand task) and over the centro-parietal region (associated with the foot task). Furthermore, these patterns matched very closely those obtained with the filter bank-based method (see Figure 5 in [27]). However, the selection of the most significant frequency band for each subject was, in our case, a direct result of the eigendecomposition, while in [27] an additional feature selection algorithm was required to choose the best CSP.

We proceeded to evaluate the performance of the proposed method for the case of discriminating the data with the Mahalanobis classifier. Table 1 shows the selection of sensors that was used for each subject. This selection was performed through a series of preliminary tests in which different sensor arrangements were compared; Table 1 only reports the arrangements through which the best performance was achieved. An optimal selection of sensors can be systematically attained through different methods (for a more detailed discussion on that matter, see [28]). However, such an optimization process falls outside the scope of this work, so we simply opted for a manual search for the best sensor configuration, with the constraint of choosing the smallest possible number of channels without compromising performance.

Fig. 2. Original power densities and their four most significant components obtained from the eigendecomposition. The densities of the components are normalized against the maximum power in (a). Red lines correspond to the imagery hand task and blue lines to the foot task. (For interpretation of the references to color in the figure caption, the reader is referred to the web version of the article.)

Fig. 3. Spatial patterns interpolated from M = 31 sensors (indicated by black dots). From left to right, the figures correspond to the CSP of subjects aa, al, av, aw, and ay, respectively. The top row corresponds to the imagery foot task, while the bottom row corresponds to the hand task.


Table 2
Area under the ROC curves for the proposed method under different training conditions.

Subject   Time interval (s)   Training
                              100%    50%             Jackknife
aa        0–3                 0.890   0.813 ± 0.049   0.832
          0–2                 0.862   0.801 ± 0.023   0.829
          1–3                 0.836   0.729 ± 0.044   0.764
al        0–3                 0.973   0.958 ± 0.031   0.971
          0–2                 0.946   0.886 ± 0.056   0.921
          1–3                 0.971   0.944 ± 0.043   0.943
av        0–3                 0.768   0.694 ± 0.027   0.671
          0–2                 0.771   0.681 ± 0.032   0.636
          1–3                 0.756   0.674 ± 0.033   0.654
aw        0–3                 0.915   0.841 ± 0.022   0.932
          0–2                 0.832   0.720 ± 0.044   0.854
          1–3                 0.887   0.826 ± 0.028   0.907
ay        0–3                 0.916   0.861 ± 0.038   0.886
          0–2                 0.899   0.847 ± 0.023   0.879
          1–3                 0.893   0.826 ± 0.050   0.854
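The area-under-the-ROC-curve values reported in Table 2 can be obtained directly from the two sets of classifier scores via the Mann–Whitney statistic. A small illustrative implementation (not the authors' code):

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve as the probability that a score from the
    positive class exceeds one from the negative class (ties count 1/2)."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return float(((sp > sn) + 0.5 * (sp == sn)).mean())

print(round(roc_auc([2.0, 3.0, 1.5], [1.0, 2.5]), 3))  # 0.667
```

This pairwise formulation is equivalent to integrating the empirical ROC curve (see [23]) and avoids choosing an explicit decision threshold.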

[Fig. 4: scatter plots, panels (a)–(e), one per subject; each panel plots the first feature component f1 against the last one (f5, f6, or f8, depending on the subject).]
Fig. 4. Plots of the magnitude of the first versus the last component of the feature vectors f for each subject. Events marked with a circle (○) correspond to the foot task, while those marked with an asterisk (∗) correspond to the hand task.
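The class assignment behind plots like Fig. 4 uses the minimum Mahalanobis distance to each class mean; combined with the leave-one-out (Jackknife) protocol it can be sketched as follows (toy data and function names are ours, assuming per-class sample covariance estimates):

```python
import numpy as np

def jackknife_accuracy(X, y):
    """Leave-one-out accuracy of a minimum-Mahalanobis-distance rule:
    each held-out trial is assigned to the class whose mean (and inverse
    covariance, estimated on the remaining trials) is closest."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        Xtr, ytr = X[mask], y[mask]
        classes = np.unique(ytr)
        dists = []
        for c in classes:
            Xc = Xtr[ytr == c]
            mu = Xc.mean(axis=0)
            icov = np.linalg.pinv(np.cov(Xc, rowvar=False))
            d = X[i] - mu
            dists.append(d @ icov @ d)  # squared Mahalanobis distance
        hits += int(classes[int(np.argmin(dists))] == y[i])
    return hits / n

# Toy example: two well-separated Gaussian classes in a 4-D feature space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 4)), rng.normal(6.0, 1.0, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
acc = jackknife_accuracy(X, y)
```

For well-separated feature clouds such as those of subjects aa and al in Fig. 4, this rule approaches perfect accuracy; overlapping clouds (subject av) are where it degrades.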


The results of the Mahalanobis distance-based classification are shown in Table 2 in terms of the area under the ROC curve obtained for the 100% and 50% training tests, as well as the percentage of correct classifications for the case of the Jackknife tests. From those results we note that our method is capable of discriminating between two classes with high accuracy even under realistic conditions such as a reduced number of sensors and a reduced number of trials used for training. In particular, the 50% training test provided consistent results with a low standard deviation, while changing the selection of the time window to process only seemed to affect the performance significantly in subjects aa and aw (reducing it by 1 s made the performance drop to 0.72). The exception to these findings were the results obtained from subject av, who represented the worst case and for whom the CSP were not well defined independently of the sensor selection. As a qualitative reference, in Fig. 4 we plotted the magnitude of the first versus the last component of the feature vector f (those components offer the best discrimination between classes) for all subjects and for a representative set of data used in the 50% training test.

Another test performed on this data set corresponded to evaluating the classification accuracy according to the settings of the BCI Competition IV. The results of this test for each patient are shown in Table 3, while the averaged accuracy corresponds to 81%. This result would have ranked the proposed method in the fourth position of the competition.

Table 3
Classification accuracy under the BCI competition's conditions.

Subject   Correct classification (%)
aa        83
al        98
av        58
aw        84
ay        80

Fig. 5. Comparison of the average classification performance over the five subjects under different training conditions. Modified from [29].

Fig. 6. Comparison of the average classification performance for each subject over all training conditions. Modified from [29]. [Bar plot: correct classification rate (0.55–1) per subject (aa, al, av, aw, ay, and mean) for CSP, LW-CSP, LRDS, SRCSP, R-CSP-CV, R-CSP-A, and TVAR-CSP.]

Finally, and in addition to the tests originally proposed at the beginning of Section 3, we decided to evaluate the proposed method against competing solutions for completeness. This set of experiments was carried out under the same conditions and for the same methods originally suggested in [29], which are: conventional CSP, CSP using a covariance matrix which was regularized as proposed in [30] (denoted as LW-CSP), logistic regression with dual spectral regularization (LRDS) proposed in [31], spatially


Table 4
Area under the ROC curves for the proposed method under different training conditions (calibration set).

Subject   Time interval (s)   Training
                              100%    50%             Jackknife
A         0–4                 0.845   0.838 ± 0.046   0.850
          0–3                 0.841   0.797 ± 0.033   0.800
          1–4                 0.748   0.719 ± 0.049   0.795
          0–2                 0.779   0.726 ± 0.028   0.785
B         0–4                 0.589   0.559 ± 0.055   0.710
          0–3                 0.553   0.528 ± 0.050   0.665
          1–4                 0.592   0.574 ± 0.036   0.680
          0–2                 0.548   0.539 ± 0.053   0.555
C         0–4                 0.615   0.561 ± 0.045   0.550
          0–3                 0.607   0.568 ± 0.034   0.590
          1–4                 0.585   0.535 ± 0.045   0.625
          0–2                 0.566   0.510 ± 0.043   0.595
D         0–4                 0.796   0.779 ± 0.056   0.835
          0–3                 0.761   0.712 ± 0.043   0.785
          1–4                 0.802   0.785 ± 0.041   0.800
          0–2                 0.639   0.612 ± 0.043   0.700
E         0–4                 0.944   0.924 ± 0.025   0.960
          0–3                 0.878   0.833 ± 0.039   0.880
          1–4                 0.956   0.926 ± 0.019   0.950
          0–2                 0.753   0.712 ± 0.048   0.756
F         0–4                 0.801   0.716 ± 0.045   0.775
          0–3                 0.739   0.682 ± 0.052   0.770
          1–4                 0.617   0.568 ± 0.093   0.750
          0–2                 0.799   0.756 ± 0.027   0.785
G         0–4                 0.906   0.857 ± 0.045   0.915
          0–3                 0.925   0.873 ± 0.041   0.930
          1–4                 0.898   0.854 ± 0.032   0.880
          0–2                 0.846   0.795 ± 0.027   0.875

regularized CSP (SRCSP) proposed in [32], as well as the two regularized versions of the CSP proposed in [29] (denoted as R-CSP-CV and R-CSP-A, respectively). All these methods were tested under different training conditions, where the training samples corresponded to the first K trials from the original data set, for K = 10, 20, 40, 60, 80, 100, 120, 160, 200, and 240, while the remaining trials were used for testing. More details regarding the experiment set-up and the methods being compared can be found in [29] and references therein.

The results of the final test performed on this data set as described above are summarized in Figs. 5 and 6. In these figures our proposed method is denoted as TVAR-CSP, and they show the average performance over all subjects as a function of the number of training samples and the individual average performance, respectively. The results in Fig. 5 show that the proposed method reached a rate of 80% correct classifications as soon as K = 40 training trials were used, while keeping an average performance close to the best performing algorithms for larger training sets. This result confirms that the proposed algorithm is capable of a good performance under realistic conditions (such as those of a small-sample setting) even when our method only used a small number of channels as indicated in Table 1. Furthermore, Fig. 6 shows that the proposed method outperforms most of the competing methods, while being surpassed only by the methods proposed in [29]. Nevertheless, we strongly believe that our proposed method is better suited for real-life applications, as it requires few parameters to be adjusted in order to reach optimal performance (those related to the TVAR model) and reaches an average performance above 80% for most patients even when a few measuring channels are used.

3.2. Data set I

This data set was provided by the Berlin BCI group [33]. Here, motor imagery was performed without feedback. For each subject

two classes of motor imagery were selected from left hand, right hand, and foot (side chosen by the subject and optionally also both feet). Then, cues were presented in two ways: (i) arrows pointing left, right, or down were presented as visual cues on a computer screen for a period of 4 s, during which the subject was instructed to perform the cued motor imagery task; these periods were interleaved with 2 s of blank screen and 2 s with a fixation cross shown in the center of the screen and superimposed on the cues, i.e., it was shown for 6 s; (ii) the motor imagery tasks were cued by soft acoustic stimuli (the words left, right, and foot) for periods of varying length between 1.5 and 8 s, and the end of the motor imagery period was indicated by the word stop. For the purposes of the BCI Competition IV, the data generated in the first way was considered the calibration set, while the second was considered the evaluation set. In both cases, the data was acquired with an EEG array of 59 electrodes at a sampling frequency of 1000 Hz and, in the same way as with data set IVa, the data sets were band-pass filtered and down-sampled. As a result, seven data sets from healthy subjects (labeled A, B, C, D, E, F, and G) were obtained, each with K = 100 trials from two out of the three available cues (the selection varied between subjects based on the subject's best performance).
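Given the cue timing above, extracting fixed-interval trials (e.g., 0–4 s after each cue) from the continuous recording can be sketched as follows (`extract_epochs` is our hypothetical helper, not part of the competition toolchain):

```python
import numpy as np

def extract_epochs(eeg, cue_onsets_s, fs, t_start, t_end):
    """Cut fixed-interval trials from a continuous recording.

    eeg: (n_channels, n_samples); cue_onsets_s: cue times in seconds;
    the epoch spans [t_start, t_end) seconds relative to each cue.
    """
    i0, i1 = int(t_start * fs), int(t_end * fs)
    return np.stack([eeg[:, int(c * fs) + i0:int(c * fs) + i1]
                     for c in cue_onsets_s])

# Toy recording: 2 channels, 10 s at a (downsampled) rate of 100 Hz,
# with cues at 1 s and 4 s and the 0-2 s interval used in the tables.
fs = 100
eeg = np.arange(2 * 10 * fs, dtype=float).reshape(2, -1)
epochs = extract_epochs(eeg, [1.0, 4.0], fs, 0.0, 2.0)
print(epochs.shape)  # (2, 2, 200)
```

The different rows of Tables 4 and 5 correspond to changing `t_start` and `t_end` in exactly this way.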

In a similar manner as for the previous data sets, we evaluated the performance of the classifier for a selection of sensors: in the case of subjects A and F, we used the electrodes C1, C2, CP1, CP2, P1, and P2, while for the rest of the subjects the selection was C3, C4, CP3, CP4, CPz, and Cz. Under those conditions, we first computed the performance of the proposed method when using the calibration set, as in this case all the EEG signals have the same length. Therefore, we performed the 100%, 50%, and Jackknife training tests, and the results are shown in Table 4. In this case the results among subjects are not as consistent as those seen in the previous data sets. However, we believe this is the result of having an even more reduced number of trials for training. Still, under some conditions the subjects A, D, E and G achieved performances above 0.8 using

only six sensors and, in the case of subject G, this was true even when using the shortest processing window.

Table 5
Area under the ROC curves for the proposed method (evaluation set). The gray-shaded cells indicate the time interval used in the training stage of the "fixed interval" test.

Subject   Time interval (s)   Equal intervals   Fixed interval
A         0–4                 0.743             0.659
          0–3                 0.749             0.749
          1–4                 0.621             0.579
          0–2                 0.705             0.707
B         0–4                 0.613             0.613
          0–3                 0.513             0.581
          1–4                 0.573             0.533
          0–2                 0.565             0.485
C         0–4                 0.009             0.048
          0–3                 0.533             0.533
          1–4                 0.498             0.495
          0–2                 0.444             0.462
D         0–4                 0.809             0.787
          0–3                 0.839             0.839
          1–4                 0.791             0.796
          0–2                 0.752             0.812
E         0–4                 0.873             0.873
          0–3                 0.806             0.831
          1–4                 0.864             0.900
          0–2                 0.718             0.803
F         0–4                 0.509             0.564
          0–3                 0.548             0.587
          1–4                 0.508             0.554
          0–2                 0.704             0.704
G         0–4                 0.837             0.837
          0–3                 0.796             0.825
          1–4                 0.810             0.866
          0–2                 0.772             0.804

Furthermore, we evaluated the performance for the case when training our system with the calibration data set and testing it with the evaluation set. Therefore, the area under the ROC curve was computed for the different time intervals previously evaluated. However, while the training was performed with 100% of the calibration set, only the trials in the evaluation set with the appropriate length to fit the corresponding time interval were used for testing. The results of this process are shown in Table 5 in the column labeled "equal intervals". Then, a second test was performed where the system was trained using different time intervals, but was only tested using data in the evaluation set with a fixed time length corresponding to the best case in the previous evaluation. This test was performed to account for the inherent discrepancies in signal length encountered in uncued tasks. The results of this test are also shown in Table 5, in the "fixed interval" column. From these two tests we can conclude that using a different time interval between the training and testing data sets moderately affects the performance, especially in those cases where the training stage is performed with the shortest processing window (as in the case of subject F). On the contrary, training the system with the longest processing window available seems to be the safest option, as one can note from the results with subjects E and G.

Finally, we evaluated the performance of the proposed method under the settings of the BCI competition. In this case, the performance measure was the mean squared error (MSE) with respect to the target vector with the values {−1, 0, 1} indicating (at every individual time sample) the patient's true state corresponding to class 1, a baseline state (no task), or class 2, respectively. Even though the case of multi-class classification was out of the scope of this work and was planned as part of future developments, we decided to present a preliminary evaluation for comparison purposes and completeness. Therefore, in order to implement this experiment, we used the extension of the CSP method to multiple classes proposed in [34], and the classification was performed on a sliding window with a time length corresponding to the best fixed interval previously found for each patient (gray-shaded cells in Table 5), while the processing window was moved every 10 time samples, i.e., the classifier was able to update the detected patient's state every 40 ms. The MSE values in the classification of the data corresponding to patients A, B, F, and G are shown in Table 6, while we omitted the test for the remaining patients as those corresponded to simulated data. As a result, the average MSE produced by this test was 1.28, which would have ranked the proposed method in the twenty-fourth position of the competition. The poor performance in this test again indicates that the proposed method requires further adjustment before being applicable to uncued tasks, while more extensive work is required for the classification of multiple classes.

Table 6
Classification performance (MSE) under the BCI competition's conditions.

Subject   MSE
A         1.46
B         1.07
F         1.80
G         0.81
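The competition's MSE criterion and the sliding-window update described above can be illustrated as follows (a sketch; `hold_decisions` is our hypothetical helper, and the toy sequences are not competition data):

```python
import numpy as np

def hold_decisions(window_preds, hop, n_samples):
    """Expand one decision per window position into a per-sample state
    sequence by holding each decision for `hop` samples."""
    return np.repeat(np.asarray(window_preds), hop)[:n_samples]

def competition_mse(pred, target):
    """Mean squared error between per-sample state sequences in {-1, 0, 1}."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

# Toy run: three window decisions, each held for 3 samples.
target = np.array([-1, -1, -1, 0, 0, 0, 1, 1, 1])
pred = hold_decisions([-1, 0, 1], hop=3, n_samples=9)
print(competition_mse(pred, target))  # 0.0
```

Note that a misclassification between classes 1 and 2 contributes 4 to the squared error while confusing a class with the baseline contributes only 1, which is why sustained class-level errors dominate the scores in Table 6.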

4. Conclusions

We presented a classification method based on modeling EEG signals through a TVAR as a preliminary step of the CSP method. By selecting the most significant eigencomponents and using them as input to the CSP method, we were able to eliminate the need for filter banks while obtaining the same spatial patterns, thus also eliminating the additional feature selection step before the classification stage.

Our results showed that the proposed method can achieve good classification performance, especially in the case of data from cued tasks. In the case of uncued tasks, the performance seems to depend on the length of the processing window, especially in the training stage. Still, the proposed method is well suited for practical BCI applications given that real-life conditions (such as a reduced number of trials used for training and only a few sensors used from the measurements) were always considered.

Future work will include more intensive experimentation with different classification algorithms while using real EEG data corresponding to multi-class events. Also, in terms of the implementation, future work will introduce a systematic search for the optimal selection of sensors, so that performance can be improved on a subject-to-subject basis. Finally, an adaptive implementation of the proposed method is already being considered as a solution to the current shortcomings in the case of uncued tasks.

Acknowledgment

This work was supported by the National Council of Science andTechnology (CONACyT-Mexico) under Grant 101374.

References

[1] Z.J. Koles, M.S. Lazar, S.Z. Zhou, Spatial patterns underlying population differences in the background EEG, Brain Topography 2 (4) (1990) 275–284.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, Boston, 1990.
[3] Z.J. Koles, The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG, Electroencephalography and Clinical Neurophysiology 79 (1991) 440–447.


[4] Z.J. Koles, J.C. Lind, A.C.K. Soong, Spatio-temporal decomposition of the EEG: a general approach to the isolation and localization of sources, Electroencephalography and Clinical Neurophysiology 95 (1995) 219–230.
[5] A.C.K. Soong, Z.J. Koles, Principal-component localization of the sources of the background EEG, IEEE Transactions on Biomedical Engineering 42 (1) (1995) 59–67.
[6] H. Ramoser, J. Müller-Gerking, G. Pfurtscheller, Optimal spatial filtering of single trial EEG during imagined hand movement, IEEE Transactions on Rehabilitation Engineering 8 (2000) 441–446.
[7] G. Pfurtscheller, C. Neuper, Motor imagery and direct brain–computer communication, Proceedings of the IEEE 89 (7) (2001) 1123–1134.
[8] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K.-R. Müller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Processing Magazine 25 (1) (2008) 41–56.
[9] J. Müller-Gerking, G. Pfurtscheller, H. Flyvbjerg, Designing optimal spatial filters for single-trial EEG classification in a movement task, Clinical Neurophysiology 110 (1999) 787–798.
[10] Q. Novi, C. Guan, T.H. Dat, P. Xue, Sub-band common spatial pattern (SBCSP) for brain–computer interface, in: 3rd International IEEE/EMBS Conference on Neural Engineering, 2007, pp. 204–207.

[11] K.P. Thomas, C. Guan, C.T. Lau, A.P. Vinod, K.K. Ang, A new discriminative common spatial pattern method for motor imagery brain–computer interfaces, IEEE Transactions on Biomedical Engineering 56 (11) (2009) 2730–2733.
[12] G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, K.-R. Müller, Combined optimization of spatial and temporal filters for improving brain–computer interfacing, IEEE Transactions on Biomedical Engineering 53 (11) (2006) 2274–2281.
[13] D. Gutiérrez, R. Salazar-Varas, EEG signal classification using time-varying autoregressive models and common spatial patterns, in: Proceedings of the 33rd Annual Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 2011, pp. 6585–6588.
[14] T.S. Rao, The fitting of non-stationary signals, Journal of the Royal Statistical Society, Series B 32 (1970) 312–322.
[15] A.D. Krystal, R. Prado, M. West, New methods of time series analysis of non-stationary EEG data: eigenstructure decompositions of time varying autoregressions, Clinical Neurophysiology 110 (1999) 2197–2206.
[16] M. Morf, A. Vieira, D. Lee, T. Kailath, Recursive multichannel maximum entropy spectral estimation, IEEE Transactions on Geoscience Electronics 16 (1978) 85–94.
[17] M. West, R. Prado, A.D. Krystal, Evaluation and comparison of EEG traces: latent structure in non-stationary time series, Journal of the American Statistical Association 94 (448) (1999) 1083–1095.
[18] R. Prado, Latent structure in non-stationary time series, PhD thesis, Duke University, Durham, NC, 1998.

[19] B. Blankertz, K.-R. Müller, D. Krusienski, G. Schalk, J. Wolpaw, A. Schlögl, G. Pfurtscheller, J. del R. Millán, M. Schroeder, N. Birbaumer, The BCI competition III: validating alternative approaches to actual BCI problems, IEEE Transactions on Neural Systems and Rehabilitation Engineering 14 (2) (2006) 153–159.
[20] M. West, J. Harrison, Bayesian Forecasting and Dynamic Models, 2nd ed., Springer, New York, 1997.
[21] M. West, Time series decomposition, Biometrika 84 (1997) 489–494.
[22] F. Babiloni, L. Bianchi, F. Semeraro, J.R. Millán, J. Mourino, A. Cattini, S. Salinari, M.G. Marciani, F. Cincotti, Mahalanobis distance-based classifiers are able to recognize EEG patterns by using a few EEG electrodes, in: Proceedings of the 23rd Annual Conference of the IEEE Engineering in Medicine and Biology Society, Istanbul, Turkey, 2001, pp. 651–654.

[23] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (2006) 861–874.
[24] D. Gutiérrez, D.I. Escalona-Vargas, EEG data classification through signal spatial redistribution and optimized linear discriminants, Computer Methods and Programs in Biomedicine 97 (1) (2010) 39–47.
[25] Y. Zhang, X. Qing, B. Wang, X. Wang, LASSO based stimulus frequency recognition model for SSVEP BCIs, Biomedical Signal Processing and Control 7 (2) (2012) 104–111.
[26] G. Dornhege, B. Blankertz, G. Curio, K.-R. Müller, Boosting bit rates in non-invasive EEG single-trial classifications by feature combination and multi-class paradigms, IEEE Transactions on Biomedical Engineering 51 (6) (2004) 993–1002.
[27] K.K. Ang, Z.Y. Chin, H. Zhang, C. Guan, Filter bank common spatial pattern (FBCSP) in brain–computer interface, in: Proceedings of the IEEE International Joint Conference on Neural Networks, 2008, pp. 2390–2397.
[28] C. Sannelli, T. Dickhaus, S. Halder, E.-M. Hammer, K.-R. Müller, B. Blankertz, On optimal channel configurations for SMR-based brain–computer interfaces, Brain Topography 23 (2010) 186–193.
[29] H. Lu, H.-L. Eng, C. Guan, K.N. Plataniotis, A.N. Venetsanopoulos, Regularized common spatial pattern with aggregation for EEG classification in small-sample setting, IEEE Transactions on Biomedical Engineering 57 (12) (2010) 2936–2945.
[30] O. Ledoit, M. Wolf, A well-conditioned estimator for large dimensional covariance matrices, Journal of Multivariate Analysis 88 (2) (2004) 365–411.
[31] R. Tomioka, K. Aihara, Classifying matrices with a spectral regularization, in: Proceedings of the International Conference on Machine Learning, 2007, pp. 895–902.
[32] F. Lotte, C. Guan, Spatially regularized common spatial patterns for EEG classification, in: Proceedings of the International Conference on Pattern Recognition, 2010, pp. 3712–3715.

[33] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Müller, G. Curio, The non-invasive Berlin brain–computer interface: fast acquisition of effective performance in untrained subjects, NeuroImage 37 (2) (2007) 539–550.
[34] M. Grosse-Wentrup, M. Buss, Multiclass common spatial patterns and information theoretic feature extraction, IEEE Transactions on Biomedical Engineering 55 (8) (2008) 1991–2000.

