Identification of inter-ictal spikes in the EEG using neural network analysis


L. Tarassenko, Y.U. Khan and M.R.G. Holt

Indexing terms: Inter-ictal spikes, EEG, Neural network analysis, Epilepsy

Abstract: Between seizures, the electroencephalogram (EEG) of subjects who suffer from epilepsy is usually characterised by occasional spikes or spike and wave complexes (inter-ictal activity). These are notoriously difficult to detect reliably, and they are occasionally missed by the clinicians who review the paper records retrospectively. The authors investigate the training and testing of a neural network classifier for the detection of inter-ictal spikes in these subjects. For the characterisation of the EEG signal, they consider both time-domain parameters normalised with respect to context and coefficients from an autoregressive model. It is shown how to use balanced databases to evaluate the discriminatory power of these parameters when they are used as the input features to a multi-layer perceptron (MLP). Both patient-specific classifiers and a generic system tested on independent test subjects are investigated. With the former, spikes are detected with an accuracy varying from 85.6% to 95.6%, a sensitivity varying from 83.1% to 97.3% and a specificity varying from 85.9% to 95.5%. The performance of the generic MLP system is not substantially degraded with respect to this, but there are too many false positives for the system to be considered for regular clinical use at the moment. The authors suggest how this problem might be solved using a combination of techniques.

1 Introduction

In our previous work on the neural network analysis of the sleep electroencephalogram (EEG) [1-3], a neural network classifier was trained using a database of ten-dimensional feature vectors with equal amounts of wakefulness, light/dreaming sleep and deep sleep EEG data. The time course of the three outputs from the trained network allows the sleep-wake continuum of test subjects to be tracked second-by-second throughout the night [3]. This form of sleep EEG analysis has now been validated in a number of studies [4-6].

© IEE, 1998. IEE Proceedings online no. 19982328. Paper first received 30th April and in revised form 12th August 1998. L. Tarassenko and Y.U. Khan are with the Neural Networks Research Group, Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK. M.R.G. Holt was with the University of Oxford and is now with Nexan Telemed Ltd, The Quorum, Barnwell Road, Cambridge CB5 8RE, UK.

In this paper, we extend our work on the neural network analysis of the EEG to consider the detection of inter-ictal spikes in epileptic patients. Epilepsy does not refer to a specific disease, but rather to a group of symptoms due to varied causes. It is defined as a chronic brain disorder of various aetiologies characterised by recurrent seizures (ictal disturbances) due to excessive discharge of cerebral neurons. Between seizures, the EEG of subjects who suffer from epilepsy is usually characterised by occasional inter-ictal activity in the form of spikes or spike and wave complexes. When the EEG is recorded in the investigation of epilepsy, it is almost always an inter-ictal recording which is made. Occasionally, a seizure may also occur by chance during the recording but, in the majority of instances when epilepsy is confirmed by analysis of the EEG, it is on the basis of inter-ictal rather than ictal activity [7].

Fig. 1 Inter-ictal spikes (shown with rectangles) from one of the EEG recordings analysed in this paper

A spike, as shown in Fig. 1, is a sudden burst of electrical activity, lasting for up to 70ms, in an otherwise normal background. A sharp wave is simply a spike of longer duration, usually between 70 and 200ms. Spikes and sharp waves are sometimes followed by a slower wave, hence the description of ‘spike and wave complex’.

Ambulatory monitoring of patients with known or suspected epilepsy has become widely used, and this may involve one or more days of continuous EEG recording. Inter-ictal spikes are always a relatively rare occurrence and are occasionally missed by the clinicians who review the paper records retrospectively. There is therefore a clear need for an automated analysis system which can reliably identify spikes in an EEG signal. In this paper, we train neural networks to classify EEG patterns as either being normal or containing a spike-like waveform. Most of the previous work on the automated detection of spikes in the EEG has concentrated on the use of time-domain information [8-10]. In a later Section in this paper, we assess both time-domain and frequency-domain parameters as input features to the neural network classifiers.

IEE Proc.-Sci. Meas. Technol., Vol. 145, No. 6, November 1998

The databases of EEG recordings containing inter-ictal events are described briefly in the next Section. The rest of the paper deals with the issues of feature selection, the training of the neural networks and the assessment of their performance. A key issue is whether the spike detection system can be made generic or whether it has to be specific to each patient.

2 Databases

The assembly of a training database containing examples of inter-ictal spikes is complicated by the fact that clinicians assign a global score to each time period in the EEG record; although a spike (which is usually a localised event) may only be seen in a few channels, all of the channels for that time period will be given the same label. As a result, the channels in which there is no spike are mislabelled. The relabelling of individual channels by clinicians would require a huge amount of work and is therefore not practical. We have to accept that the databases will contain some labelling errors.

Two separate databases were available for the work described in this paper. The first of these consisted of data recorded on two alert subjects (A and B) with a 16-channel montage. For subject A, there were 31 recordings, corresponding to 190 minutes of data; for subject B, the 77 short recordings available amounted to 246 minutes of data. The EEG was sampled at 128 Hz, with 8-bit accuracy. The second database was acquired at a different hospital, using a 20-channel montage, for two groups of subjects. With the first group (Group C), 60 minutes of EEG were recorded in total, during periods of wakefulness; with the second group (Group D), the same amount of data was recorded from another set of patients, but during sleep. In both cases, the EEG was sampled at 200 Hz, with 12-bit accuracy, and subsequently resampled at 128 Hz, using linear interpolation. In all cases, the recordings were made using the 10-20 international system of electrode placement [7].

The EEG recordings from subjects A and B were used for the development of the neural network classification system. It was intended that the recordings from Groups C and D should be kept back for the assessment of the final system on independent test data. For all subjects, all the data from the 16 or 20 channels was saved, but each channel was effectively treated independently (save for the fact that all channels from the same time period have the same label as discussed above).

Even with 16 or 20 channels, spikes are under-represented in all of the databases. If the recordings are partitioned using 1-second segments (see next Section), spikes are found to occupy between 1 and 8% of the total recording time, depending on the subject. Even for the subject for whom they are most frequent (subject B), the 246 minutes of EEG data only include 18 minutes of spike data. This imbalance between normal and pathological data has an impact on the issue of feature selection as explained in Section 3.

3 Overview

The aim of the work described in this paper is simple enough to state: to design, train and test a neural network classifier capable of identifying spikes in EEG data. For our purposes, the detection of a spike is treated as a two-class problem (i.e. discriminating between normal EEG and ‘spike’ EEG). The design of a neural network analysis system involves two distinct phases [11]: ‘prototyping’, during which issues such as feature selection and network architecture are investigated as comprehensively as possible, followed by a phase during which the final system needs to be tested rigorously under conditions similar to its eventual use ‘in the field’. In addition to this, there are two other important considerations in this study: first, is it possible to detect spikes with a generic system as opposed to a patient-specific classifier? Secondly, how should the fact that spikes only occur rarely be taken into account?

The training strategy described below was developed firstly to establish benchmarks for a patient-specific classifier and then to evaluate whether the system could be made generic. The first decision to be made concerns the choice of segmentation strategy. Spikes and sharp waves last for between 50 and 200 ms. Hence the EEG could be segmented at quarter-second intervals, but the effort required to label an EEG record of several minutes' duration would be considerable. The shortest timescale over which human experts will assign labels to the EEG (‘normal’ or ‘spike’) is of the order of one second. All of our previous work on the sleep EEG has also used 1-second segments [12], and we therefore adopt the same time period for segmenting the EEG in this study.

If we are going to try and identify a set of features which is optimal for discriminating between normal EEG patterns and spikes, we need to ensure that there are sufficient examples of the latter in the training data. Thus, for the prototyping phase, a balanced database is constructed in which the number of spike patterns is made artificially equal to that of the normal patterns: all the available examples of spike patterns are retained, whilst an equal number of normal patterns are chosen at random from the data set. In the final system, however, a normal EEG pattern will on average be 20 times more frequent than a spike. It can be shown [11] that application of Bayes’ theorem makes it possible to compensate for the different prior probabilities used in training the network during prototyping (for which the prior probability of a spike is artificially raised to 0.5 by using a balanced database) and in the final system (for which the prior probability of a spike is of the order of 0.05): the threshold on the neural network output required for the identification of a spike in the final system is simply raised from 0.5 to 0.95.
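The threshold shift quoted above can be checked numerically. The following is a minimal sketch, assuming that the MLP output approximates the posterior probability of a spike under the balanced (0.5) training prior; the function names are hypothetical and are not taken from the paper or from [11].

```python
def corrected_posterior(y_balanced, prior_spike, train_prior=0.5):
    """Re-weight a network output trained with `train_prior` to the operating prior.

    Assumes y_balanced approximates P(spike | x) under the training prior.
    """
    spike = y_balanced * prior_spike / train_prior
    normal = (1.0 - y_balanced) * (1.0 - prior_spike) / (1.0 - train_prior)
    return spike / (spike + normal)


def equivalent_threshold(prior_spike, train_prior=0.5):
    """Raw-output threshold at which the corrected posterior equals 0.5."""
    normal_weight = (1.0 - prior_spike) / (1.0 - train_prior)
    spike_weight = prior_spike / train_prior
    return normal_weight / (spike_weight + normal_weight)


if __name__ == "__main__":
    # With a spike prior of 0.05, the 0.5 decision level maps to a raw output of 0.95.
    print(equivalent_threshold(0.05))
```

With a training prior of 0.5, the equivalent threshold reduces to one minus the operating prior, which is the 0.95 figure given in the text.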

The first tasks during the prototyping phase are the selection of the feature set and of the neural network architecture. The procedure for training the neural networks, required not only for these tasks but also for all subsequent stages until the final system test, is described in Section 5, after the time- and frequency- domain features have been introduced.


4 Feature selection

Although there have been attempts to use neural networks for spike detection with raw EEG data as the input [13], such networks will not be capable of generalisation across a wide patient database, if only because the input representation is amplitude-dependent and hence patient-dependent. Furthermore, the dimensionality of the raw data is usually very high (number of samples in the chosen time interval). For a neural network to learn a nonlinear function mapping, the input vectors must be of low dimension unless there are very large numbers of representative vectors available to populate the high-dimensional input space adequately. Hence a feature extraction stage is required to reduce the dimensionality of the input space, so that the network can learn the function mapping and not simply the details of the training data [11].

4.1 Time-domain parameters

Since a spike is a sharp transient, it will be characterised by high values of first and second derivatives. In the EEG literature, these parameters are known as the slope, $dv/dt$ (V/s), and sharpness, $d^2v/dt^2$ (V/s$^2$), where $v$ is the time-varying EEG signal. An alternative set of time-domain descriptors has been proposed by Hjorth [14] and subsequently used by Walmsley [15], namely activity (A), mobility (M) and complexity (C). These characterise the amplitude, slope and slope spread of the EEG as described below and the last two could therefore be used as alternatives to the more traditional slope and sharpness parameters.

Fig. 2 Three-sample window for the computation of the time-domain parameters

All of the time-domain parameters are calculated using a 3-sample window, as shown in Fig. 2. Let the slope from $v_0$ to $v_1$ be $s_0$ and that from $v_1$ to $v_2$ be $s_1$. We have:

$$\text{average slope} = \frac{\Delta v}{\Delta t} = \frac{|s_0| + |s_1|}{2} \qquad (1)$$

$$\text{sharpness} = \frac{\Delta^2 v}{\Delta t^2} = |s_1 - s_0| \qquad (2)$$

The time-domain descriptors, mobility (M) and complexity (C), are estimated as follows:

$$\sigma_v^2 = \frac{v_0^2 + v_1^2 + v_2^2}{3} \qquad (3)$$

The $2\pi$ factor ensures that mobility and complexity are defined in Hz. $M^2$ is always positive but $C^2$ may be negative, in which case the signal is considered to have no complexity and $C^2$ is an invalid measure.

If the EEG signal is digitised using a sampling frequency of $N$ Hz, $N - 2$ values of slope, sharpness, mobility and complexity are calculated in a one-second segment. From these, the maximum values $(\Delta v/\Delta t)_{max}$, $(\Delta^2 v/\Delta t^2)_{max}$, $M_{max}$ and $C_{max}$ are determined for that one-second segment.
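The sliding 3-sample computation of slope and sharpness can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that each slope is the difference between successive samples divided by the sampling interval, which the text does not state explicitly.

```python
import numpy as np


def slope_and_sharpness(v, fs=128.0):
    """Return the N-2 average-slope and sharpness values of a segment.

    v  : 1-D array holding the N samples of a one-second segment
    fs : sampling frequency in Hz (assumed 128 Hz, as in the paper)
    """
    v = np.asarray(v, dtype=float)
    s = np.diff(v) * fs                               # N-1 slopes between successive samples
    slope = (np.abs(s[:-1]) + np.abs(s[1:])) / 2.0    # (|s0| + |s1|) / 2 for each window
    sharpness = np.abs(s[1:] - s[:-1])                # |s1 - s0| for each window
    return slope, sharpness


if __name__ == "__main__":
    segment = np.random.default_rng(0).standard_normal(128)
    slope, sharpness = slope_and_sharpness(segment)
    print(slope.size, slope.max(), sharpness.max())   # 126 values per parameter
```

The per-segment maxima used as features are then simply `slope.max()` and `sharpness.max()`.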

All of the early work on the use of slope and sharpness for spike detection (see, for example, [16-18]) relied on predefined thresholds being exceeded as an indication of the occurrence of a spike. It is clear, however, that the threshold value should be adaptive, as first suggested by Carrie [19], to allow for variations in the amplitude and high-frequency noise content of different recordings. The running average can be used as a normalising factor to identify significant changes in any of these time-domain parameters with respect to the background activity. Thus, the maximum values identified in a one-second segment are scaled as follows with respect to the background values:

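$$\text{normalised maximum slope} = \frac{(\Delta v/\Delta t)_{max}}{(NF)_{\Delta}} \qquad (5)$$

$$\text{normalised maximum sharpness} = \frac{(\Delta^2 v/\Delta t^2)_{max}}{(NF)_{\Delta^2}} \qquad (6)$$

$$\text{normalised maximum mobility} = \frac{M_{max}}{(NF)_{M}} \qquad (7)$$

$$\text{normalised maximum complexity} = \frac{C_{max}}{(NF)_{C}} \qquad (8)$$

where $(NF)_{\Delta}$, $(NF)_{\Delta^2}$, $(NF)_{M}$ and $(NF)_{C}$ are the normalisation factors for slope, sharpness, mobility and complexity, respectively.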

It remains to determine what the optimal normalising factor is, optimal in the sense that it will allow maximum discrimination between normal EEG segments and those containing a spike. The discriminatory information present in each of the normalised parameters can be investigated qualitatively by plotting histograms of the distribution of its values for both normal EEG and spike EEG segments. The maximum amount of discrimination will be associated with that parameter for which there is least amount of overlap between the two histograms.

A systematic study was undertaken in order to find the optimal normalisation factor [20]. The following candidates were considered:

4.1.1 Average of parameter value in present one-second segment:

$$(NF) = \frac{1}{N-2} \sum_{i=1}^{N-2} x_i \qquad (9)$$

where $x_i$ is the value of $(\Delta v/\Delta t)$, $(\Delta^2 v/\Delta t^2)$, $M$ or $C$ for sample $i$ and $N = 128$.

4.1.2 Average of parameter value in present segment excluding the maximum value and its immediate neighbours:

$$(NF) = \frac{1}{N-5} \sum_{\substack{i=1 \\ i \notin \{i_{max}-1,\, i_{max},\, i_{max}+1\}}}^{N-2} x_i \qquad (10)$$

where $x_i$ is as defined above and $i_{max}$ is the sample corresponding to $(\Delta v/\Delta t)_{max}$, $(\Delta^2 v/\Delta t^2)_{max}$, $M_{max}$ or $C_{max}$ (different $i_{max}$ for each $x$).

4.1.3 Average of parameter value in preceding normal segment: Here, the average of the values of the preceding 1-second segment is used for normalisation purposes, provided that the preceding segment has been labelled as being normal (i.e. not containing a spike). Thus:

$$(NF) = \frac{1}{N-2} \sum_{i=1}^{N-2} x_i \qquad (11)$$

where the $x_i$ are now the parameter values in the preceding segment.

4.1.4 Average of parameter value in previous five normal segments: Gotman [21] has suggested that the preceding five seconds of normal background are sufficient for defining the context when attempting to detect spikes. Here, the normalisation factor is calculated only if the previous five segments immediately preceding the segment in question have been deemed to be normal, in which case:

$$(NF) = \frac{1}{5(N-2)} \sum_{j=1}^{5} \sum_{i=1}^{N-2} x_{i,j} \qquad (12)$$

where $x_{i,j}$ is the parameter value for sample $i$ of the $j$th preceding segment.
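The four candidate normalisation factors can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and `nf_preceding_normal` covers both the single preceding segment (`n_prev=1`) and the previous five segments (`n_prev=5`), returning `None` when any of the required segments is not labelled normal.

```python
import numpy as np


def nf_current(x):
    """Average of the parameter values in the present segment."""
    return float(np.mean(x))


def nf_excluding_peak(x):
    """Average over the present segment, leaving out the maximum and its neighbours."""
    x = np.asarray(x, dtype=float)
    i_max = int(np.argmax(x))
    keep = np.ones(x.size, dtype=bool)
    keep[max(i_max - 1, 0):i_max + 2] = False   # drop i_max - 1, i_max, i_max + 1
    return float(x[keep].mean())


def nf_preceding_normal(segments, is_normal, k, n_prev):
    """Average over the n_prev segments before segment k, if all are normal."""
    if k < n_prev or not all(is_normal[k - n_prev:k]):
        return None                             # no valid context for this segment
    return float(np.mean(np.concatenate(segments[k - n_prev:k])))


if __name__ == "__main__":
    x = [1.0, 1.0, 10.0, 1.0, 1.0]
    print(nf_current(x), nf_excluding_peak(x))
```

Segments for which `None` is returned would simply be left out of the analysis, which is consistent with the reduced pattern counts reported for the third and fourth factors.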

Each of the above normalisation factors was investigated using the database collected from subject B. For the first two factors, it was possible to make use of all the normal patterns (218523) and all the spike patterns (17782). For the last two, the number of available patterns is reduced, since there is a requirement for either the previous segment or the previous five segments to be normal. This reduced the number of patterns to 207972 normals and 7975 spikes for the third normalisation factor, with a further decrease to 177773 normals and 4582 spikes for the fourth. In each case, the histogram plots were normalised to have equal areas for each class. They are shown in Fig. 3 for the fourth normalisation factor (eqn. 12), which was found to give the best separation between the two classes. This qualitative assessment was also confirmed by calculating the Bhattacharyya distance [20, 22], which is a measure of the separability between two probability density functions.
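One way of estimating the Bhattacharyya distance from two sets of parameter values is to work directly from their normalised histograms. The sketch below makes that assumption; the paper does not say how the distance was computed in [20, 22], and the function name and bin count are hypothetical.

```python
import numpy as np


def bhattacharyya_distance(a, b, bins=50):
    """Histogram estimate of the Bhattacharyya distance between two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = p / p.sum()                      # equal-area normalisation for each class
    q = q / q.sum()
    coefficient = np.sum(np.sqrt(p * q)) # overlap between the two histograms
    if coefficient <= 0.0:
        return float("inf")              # no overlap at all
    return float(-np.log(coefficient))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, 5000)
    spike = rng.normal(3.0, 1.0, 5000)
    print(bhattacharyya_distance(normal, spike))
```

A larger distance corresponds to less overlap between the two histograms, so the normalisation factor with the largest distance would be preferred.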

Fig. 3 also shows that there is more discriminatory information in the normalised values of the traditional time-domain parameters, slope and sharpness, than in the normalised values of the mobility and complexity descriptors. For this reason, the first two are retained as possible input features for a neural network trained to detect spike patterns (see Section 5). An extra parameter, the duration of the spike with the maximum slope, is also calculated for each segment [20].

Fig. 3 Histogram plots of the four time-domain parameters for the two classes (normal or epileptic spike). The normalisation factor is the average of the parameter value in the previous five normal segments. Results are shown for subject B (177 773 normal patterns, 4582 epileptic spikes). a Slope; b Sharpness; c Mobility; d Complexity

4.2 Frequency-domain parameters

Although a sharp transient such as a spike can be identified in the time domain, its high-frequency content should also make it detectable in the frequency domain. In all of our previous work on EEG analysis (mostly in the context of sleep disorders), we have made use of autoregressive modelling. AR modelling is best known as an alternative to the discrete Fourier transform (DFT) which is traditionally used for the spectral analysis of sampled signals [23]. With one-second segments and a sampling rate of 128 Hz, there would be 64 DFT or FFT coefficients to characterise the amplitude spectrum of each segment. With an input dimensionality of $m$ and a hidden-layer size (using a first approximation for estimating hidden-layer size [11]) of $\sqrt{m}$, there will be of the order of $\sqrt{m}(m + 1)$ weights in a neural network classifier with a single output node (not including the bias weights). With $m = 64$, this corresponds to 576 weights or free parameters in the network. To achieve good generalisation on previously unseen patterns, it is recommended [11, 24] that there should be of the order of ten times as many training examples as weights in the network (i.e. approximately 6000 training patterns). With similar numbers of patterns possibly required for the validation and test sets (see Section 5), this places a totally unrealistic constraint on the number of spike patterns which would need to be collected. A low-dimensional representation is therefore essential to keep the number of weights in the network, and hence the number of patterns required to train it, down to a minimum.

A coarse, low-dimensional representation of the amplitude spectrum could be generated by grouping successive FFT coefficients together; for example, averaging each block of eight coefficients to obtain a single value over that frequency band would give a reduction from 64 to 8 input dimensions. However, this only succeeds in blurring the details of the EEG spectrum to such an extent that the discriminatory information is mostly lost. What is required is an accurate characterisation of the dominant frequencies in the EEG, so that changes in these over time can be tracked as a function of sleep state in our original work and as a result of the occurrence of an inter-ictal spike in this study.
Such a characterisation is provided by an all-pole AR model, which allows the spectral peaks to be tracked even with a low-order model [12].

The theoretical basis for AR modelling of the EEG has been explored in detail elsewhere [12, 23, 25]. The key concept is the assumption that the sequence $\{s_k\}$ of values from the sampled EEG signal is the output of a linear system driven by white noise. If successive samples $s_j$, $j = 0, 1, \ldots, (N - 1)$, from the output sequence are available, we can estimate a sample $s_k$ by the linearly weighted summation of the previous $p$ sample values:

$$\hat{s}_k = -\sum_{i=1}^{p} a_i s_{k-i}$$

where $p$ is the model order. At time $t = kT$, where $T$ is the sampling interval, we can calculate the error $e_k$ between the actual value and the predicted one:

$$e_k = s_k - \hat{s}_k = s_k + \sum_{i=1}^{p} a_i s_{k-i}$$

The parameters $a_i$ of the model (the AR coefficients) are estimated by minimising the expectation of the squared error $E$ over the $N$ samples in the sequence:

$$E = \frac{1}{N} \sum_{k} e_k^2 = \frac{1}{N} \sum_{k} \Bigl( s_k + \sum_{i=1}^{p} a_i s_{k-i} \Bigr)^2$$

The minimisation is performed by setting $\partial E / \partial a_i$ to zero, which yields $p$ linear equations known as the Yule-Walker equations [12], from which the $p$ AR coefficients can be determined by inverting a $p \times p$ matrix. Since this matrix is a Toeplitz matrix (the elements along any diagonal are identical), a more efficient solution, however, is to use a recursive procedure known as the Levinson-Durbin algorithm.
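The recursion can be illustrated with a short sketch. It follows the sign convention used above (the prediction error is the sample plus the weighted sum of previous samples) and assumes a biased autocorrelation estimate for each segment; the function names are hypothetical and the code is not taken from the paper.

```python
import numpy as np


def autocorrelation(s, p):
    """Biased autocorrelation estimates r[0..p] of a segment."""
    s = np.asarray(s, dtype=float)
    n = s.size
    return np.array([np.dot(s[:n - m], s[m:]) / n for m in range(p + 1)])


def levinson_durbin(r, p):
    """Solve the Yule-Walker equations recursively.

    Returns the AR coefficients a[1..p], the reflection coefficients
    and the final prediction error power.
    """
    a = np.zeros(p + 1)
    a[0] = 1.0
    reflection = np.zeros(p)
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err
        reflection[m - 1] = k
        previous = a.copy()
        a[1:m] = previous[1:m] + k * previous[m - 1:0:-1]
        a[m] = k
        err *= 1.0 - k * k          # prediction error shrinks at each order
    return a[1:], reflection, err


if __name__ == "__main__":
    segment = np.random.default_rng(0).standard_normal(128)
    coeffs, reflection, err = levinson_durbin(autocorrelation(segment, 6), 6)
    print(reflection, err)
```

Each pass of the loop raises the model order by one, so the reflection coefficients and the prediction error for every order up to `p` are obtained as by-products.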

An intermediate set of values, known as the partial correlation or reflection coefficients, is also produced as part of this recursive procedure. The reflection coefficients $K_i$ encode the same information as the AR coefficients but they have an important advantage in that it can be ensured [25] that they satisfy the following condition:

$$|K_i| < 1, \qquad i = 1, \ldots, p$$

This means that the distribution of values is bounded for each reflection coefficient and hence no scaling needs to be applied before they are used as inputs to a neural network. For these reasons, all of our recent EEG analysis work has used reflection coefficients in preference to AR coefficients [20, 25, 26].

4.2.1 Choice of model order: When autoregressive modelling is used for spectral estimation, a sufficient number of poles must be chosen to approximate the shape of the spectrum. The criterion for selecting the optimal order when AR or reflection coefficients are used as a feature vector at the input to a neural network is quite different. If the model order is too low, part of the information which would discriminate between the two classes will not be present in the input feature vector. If the model order is too high, some of the noise in the EEG signal will also be fitted. Hence the optimal model order is the minimum order which adequately encodes the discriminatory information and this can be established from simple classification experiments. Neural network classifiers are trained using a balanced database containing equal numbers of normal and spike patterns from subject A and partitioned into training, validation and test sets.

Fig. 4 Test classification error (%) against model order for reflection coefficients as input features (subject A)

For each model order, the network architecture is determined from performance on the validation set (see below) and the classification error is then evaluated for this architecture on the test set. When this classification error is plotted as a function of model order, the plot shown in Fig. 4 is obtained. This shows that a model order of 6 is optimal. A similar result is obtained if the experiment is repeated for a balanced database of normal and spike patterns for subject B.


4.2.2 Prediction error: Autoregressive modelling assumes that the time-varying signal is stationary within each segment (i.e. piecewise-stationary). A departure from stationarity (i.e. a change in EEG state, such as would occur with the generation of a spike) will be indicated by a large increase in the prediction error. It is therefore possible, for each segment, to use the prediction error for the sixth-order model as an extra input feature for the neural network classifier.

5 Training the neural network

The neural network classifier is a standard multi-layer perceptron [11, 27] with one layer of hidden units, trained using gradient descent to minimise the squared output error. The error-backpropagation algorithm [11] is used to calculate the weight updates in each layer of the network. (Typically, a learning rate of 0.01 and a momentum term of 0.5 may be used, but these values are not critical.) As the number of patterns in each database used for training is limited (see below), the technique of S-fold cross-validation is employed to partition the data.
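A single-hidden-layer perceptron of this kind, with pattern-by-pattern weight updates and a momentum term, can be sketched as follows. The sigmoid activation functions, the initial weight range and the class name are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyMLP:
    """One hidden layer, one output unit, squared-error cost (illustrative only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # The last column / element of each weight array acts on a constant bias input of 1.
        self.w1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in + 1))
        self.w2 = rng.uniform(-0.1, 0.1, n_hidden + 1)
        self.dw1 = np.zeros_like(self.w1)
        self.dw2 = np.zeros_like(self.w2)

    def forward(self, x):
        h = sigmoid(self.w1 @ np.append(x, 1.0))
        y = sigmoid(self.w2 @ np.append(h, 1.0))
        return h, y

    def update(self, x, target, lr=0.01, momentum=0.5):
        """Backpropagate the error for one pattern and apply the weight update."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        delta_hid = delta_out * self.w2[:-1] * h * (1.0 - h)
        self.dw2 = momentum * self.dw2 - lr * delta_out * np.append(h, 1.0)
        self.dw1 = momentum * self.dw1 - lr * np.outer(delta_hid, np.append(x, 1.0))
        self.w2 += self.dw2
        self.w1 += self.dw1


if __name__ == "__main__":
    net = TinyMLP(n_in=10, n_hidden=7)      # e.g. a 10-7-1 architecture
    net.update(np.zeros(10), 1.0)
    print(net.forward(np.zeros(10))[1])
```

The output lies between 0 and 1, so it can be compared with a decision threshold to label a segment as ‘normal’ or ‘spike’.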

Cross-validation is a technique which was designed to ensure that as much information as possible is used in the training process. The available data is split up into S subsets, each of equal size. The first subset is chosen to be the test set and the other S - 1 subsets are combined to form the training and validation sets. After the network is trained using these, the classification performance on the test set is recorded. The process is then repeated so that each of the other S - 1 subsets acts as the test set in turn. The final classification performance is the average of the S test set results. In the work described in this paper, values of 10 or 5 were used for S. The use of cross-validation removes any dependence on the choice of patterns for the test set. By the time the procedure is complete, each pattern will have appeared once in the test set.
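The partitioning can be sketched as below. How the remaining S - 1 subsets are divided between training and validation is not stated in the text, so the sketch assumes that one of them is held out for validation; the function name is hypothetical.

```python
import numpy as np


def s_fold_splits(n_patterns, s, seed=0):
    """Yield (train, validation, test) index arrays for S-fold cross-validation.

    Assumption: one of the S - 1 non-test subsets is used as the validation set.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_patterns), s)
    for i in range(s):
        j = (i + 1) % s                       # validation fold (assumed choice)
        train = np.concatenate([folds[m] for m in range(s) if m not in (i, j)])
        yield train, folds[j], folds[i]


if __name__ == "__main__":
    for train, validation, test in s_fold_splits(4200, 10):
        print(train.size, validation.size, test.size)
```

Averaging the test-set results over the S iterations gives the final classification performance described above.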

As the number of inputs in the MLP is determined by the number of features chosen (see next Section), the network architecture is effectively determined by the choice of number of hidden units. For a given network architecture, the weights are initialised with small random values and the patterns in the training set are repeatedly presented in random order to the network. The weight update equations are applied after the presentation of each pattern. The training process is controlled by monitoring the classification error on the validation set. When this stops decreasing or even starts to rise, training should stop. The stopping criterion is therefore the point at which the minimum classification error on the validation set is reached (a method known as ‘early stopping’ [11, 27]).
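The stopping rule can be sketched as a generic loop. The two callables and the `patience` parameter (the number of epochs without improvement that are tolerated before training is halted) are assumptions introduced for illustration; the text only states that training stops at the minimum of the validation error.

```python
def early_stopping(train_one_epoch, validation_error, max_epochs=500, patience=20):
    """Train until the validation error has not improved for `patience` epochs.

    train_one_epoch  : callable that runs one pass over the training set and
                       returns a snapshot of the network state (e.g. its weights)
    validation_error : callable that returns the current validation-set error
    """
    best_error = float("inf")
    best_epoch = -1
    best_state = None
    for epoch in range(max_epochs):
        state = train_one_epoch()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch, best_state = error, epoch, state
        elif epoch - best_epoch >= patience:
            break                           # validation error has stopped decreasing
    return best_state, best_epoch, best_error


if __name__ == "__main__":
    errors = iter([0.30, 0.22, 0.18, 0.19, 0.21, 0.25])
    epochs = iter(range(6))
    print(early_stopping(lambda: next(epochs), lambda: next(errors),
                         max_epochs=6, patience=2))
```

The network state returned is the one recorded at the epoch with the lowest validation error, not the state at the point where the loop exits.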

As the number of hidden units is gradually increased from its initial value, the minimum classification error on the validation set begins to decrease: the complexity of the neural network model more closely matches the complexity of the required input-output mapping. The optimal number of hidden units is that number for which the lowest classification error is achieved on the validation set. (If the number of hidden units is increased beyond this, performance does not improve and soon begins to deteriorate as the complexity of the neural network model is increased beyond that which is required for the problem.) Once the optimal network architecture has been determined, the performance can be evaluated on the test set, which should always consist of independent data not used in the training procedure, either to determine the weights (training set) or decide when to stop training (validation set). The combination of early stopping with S-fold cross-validation is described in detail in [11].

6 Evaluation of features on balanced databases

MLP classifiers are now trained on balanced databases from subjects A and B, using either the time- or frequency-domain parameters as input features. For subject A, there are 2100 spike patterns and so the same number of patterns is selected at random from the normal EEG patterns in order to construct the balanced database. For subject B, the same strategy yields 4500 patterns of each class. In each case, S-fold cross-validation is used to partition the balanced database into training, validation and test sets.

For the time-domain parameters, slope and sharpness, the normalisation factor of eqn. 12 is applied to take the context into account. Reflection coefficients are independent of the amplitude of the time-varying signal which they model, and hence require no normalisation with respect to context. The prediction error for a given 1-second segment (see eqn. 13) does depend on the amplitude of the sample values and must therefore also be normalised with respect to the previous five normal segments, as with the time-domain parameters.

Table 1: Classification error rates with normalised slope, sharpness and duration as input features (balanced databases)

Subject   Optimal network architecture   Validation set error rate (%)   Test set error rate (%)
A         3-4-1                          18.2                            18.7
B         3-4-1                          10.2                            10.4

Duration is the interval between two successive minima which surround the point of maximum slope [12]

Table 2: Classification error rates with 6 reflection coefficients and normalised prediction error as input features (balanced databases)

Subject   Optimal network architecture   Validation set error rate (%)   Test set error rate (%)
A         7-5-1                          7.7                             8.2
B         7-7-1                          9.3                             9.5

The error rates in Tables 1 and 2 represent average classification error rates on the validation sets and the corresponding error rates subsequently evaluated on the test sets with the same MLP. (The test error rates are averages from several experiments, as the S-fold cross-validation procedure is repeated several times, with different random allocations of patterns to the S subsets in each case.)

It is clear from Tables 1 and 2 that the reflection coefficients, together with the normalised prediction error, give lower classification error rates than the time-domain parameters. The difference is more noticeable for subject A, for whom some of the recordings were contaminated with 50 Hz noise. This has a greater effect on the computation of slope and sharpness than on that of the reflection coefficients, and is part of the reason for the higher classification error rates. It is true, in both cases, that the reflection coefficients and associated prediction error represent a more robust feature set than time-domain parameters.

With a neural network, it is possible to fuse the data from different domains at the input to the network. If there are higher-order correlations between the different parameters which are useful for discriminating between normal and spike patterns, these will be learnt during training. Ten-dimensional feature vectors consisting of the three time-domain parameters, the six reflection coefficients and the normalised prediction error are therefore used both for the patient-specific classifier and the generic systems investigated in the next two Sections.

7 Patient-specific classifier

So far, the emphasis has been on feature selection. We now turn our attention to developing an optimal patient-specific classifier. The performance of this classifier is assessed not only in terms of accuracy but also of sensitivity and specificity. Table 3 shows the four possibilities which can exist for each classification made by the MLP.

Table 3: Confusion matrix for MLP outputs

                   MLP output = ‘spike’    MLP output = ‘normal’
Label = ‘spike’    true positive (TP)      false negative (FN)
Label = ‘normal’   false positive (FP)     true negative (TN)

In the case of a true positive (TP), the MLP identifies a spike pattern which was labelled as such by the expert. A false positive (FP) is the detection of a spike in an EEG segment which is labelled as normal by the expert. A false negative (FN) indicates that the MLP has missed a spike which the expert has identified in that segment. Finally, in the case of a true negative (TN), the MLP and the expert both agree that the EEG pattern is normal.

$$\text{Sensitivity} = \frac{\text{number of spikes correctly identified}}{\text{total number of spike patterns}} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{\text{number of normal patterns predicted as normal}}{\text{total number of normal patterns}} = \frac{TN}{TN + FP}$$
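The two ratios can be computed directly from the four counts of Table 3. The sketch below is illustrative only: the function name `classification_metrics` is hypothetical, the counts in the example are invented, and accuracy is taken to be the usual fraction of correctly classified patterns, (TP + TN) / (TP + FN + FP + TN), which is an assumption since it is not defined in this Section.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        # Assumed definition: fraction of all patterns classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of expert-labelled spikes detected by the classifier.
        "sensitivity": tp / (tp + fn),
        # Fraction of expert-labelled normal patterns classified as normal.
        "specificity": tn / (tn + fp),
    }


# Invented counts for a balanced set of 100 spike and 100 normal patterns.
metrics = classification_metrics(tp=90, fn=10, fp=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```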

Patient-specific classifiers are now trained for each of subjects A and B. In each case, the same balanced databases as in the previous Section are split into training, validation and test sets using ten-fold cross-validation. The results are shown in Table 4 for both subjects, from which it can be seen that the use of the augmented 10-D feature vectors (rather than the 7-D vectors as in Table 2) has reduced the test classification error from 8.2% to 7.2% for subject A and from 9.5% to 8.3% for subject B.
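The split into training, validation and test sets can be sketched as follows. This is an illustration only: how the folds are assigned to the three sets is not stated here, so the rotation used below (one fold for testing, the next for validation, the remainder for training) is an assumption, as are the function name `s_fold_splits` and its parameters.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int], List[int]]


def s_fold_splits(n_patterns: int, s: int = 10, seed: int = 0) -> List[Split]:
    """Return s (train, validation, test) index splits over n_patterns patterns."""
    if s < 3:
        raise ValueError("need at least 3 folds for train/validation/test")
    rng = random.Random(seed)
    indices = list(range(n_patterns))
    rng.shuffle(indices)  # random allocation of patterns to the s subsets
    folds = [indices[i::s] for i in range(s)]

    splits: List[Split] = []
    for k in range(s):
        test_fold = k
        validation_fold = (k + 1) % s  # assumed rotation, see lead-in
        train: List[int] = []
        for j, fold in enumerate(folds):
            if j not in (test_fold, validation_fold):
                train.extend(fold)
        splits.append((train, folds[validation_fold], folds[test_fold]))
    return splits


splits = s_fold_splits(200, s=10)
train, validation, test = splits[0]
print(len(train), len(validation), len(test))
```

Repeating the call with a different `seed` gives a different random allocation of patterns to the subsets, which is how averaged test error rates over several experiments could be obtained.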


Table 4: Patient-specific classifiers for subjects A and B trained using ten-fold cross-validation and balanced databases

Subject    Optimal architecture    Accuracy (%)    Sensitivity (%)    Specificity (%)
A          10-7-1                  92.8            93.6               91.9
B          10-9-1                  91.7            91.4               92.1

Figures are shown for the test sets only

7.1 Final system test of patient-specific classifier

Now that the optimal network architecture has been determined, there remains a final system test to be performed, which simulates the use of a patient-specific classifier in the field. In practice, such a system would be trained from data recorded when the patient first came to the hospital. Once the training was complete (this requires the labelling of these initial records), the system could be used live on subsequent days or visits to the hospital.

For this final system test, 26 recordings from subject A are selected from the database in order to construct balanced training and validation sets. A 10-7-1 MLP is then trained using these data before being tested on an independent set of four recordings (for which the threshold for spike detection should be increased to 0.95). The same procedure is repeated for subject B, group C and group D with, in each case, a fraction of the recordings being kept back to act as an independent test set. The performance of the patient-specific classifiers is summarised in Table 5.
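The role of the detection threshold can be illustrated with a short sketch. It assumes that the single MLP output lies between 0 and 1, with values near 1 indicating a spike, and that a segment is flagged when the output reaches the threshold; the function name `detect_spikes` and the example outputs are hypothetical.

```python
from typing import List, Sequence


def detect_spikes(outputs: Sequence[float], threshold: float = 0.95) -> List[bool]:
    """Flag a segment as a spike when the MLP output reaches the threshold."""
    return [y >= threshold for y in outputs]


# Invented MLP outputs for four consecutive EEG segments.
outputs = [0.10, 0.97, 0.60, 0.99]
print(detect_spikes(outputs))                 # strict threshold of 0.95
print(detect_spikes(outputs, threshold=0.5))  # a lower threshold flags more segments
```

Raising the threshold makes the classifier more conservative: fewer segments are flagged, which tends to reduce false positives at the cost of possibly missing some spikes.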

Table 5: Performance of the patient-specific classifiers on an independent test set of recordings for each subject

Subject(s)    Accuracy (%)    Sensitivity (%)    Specificity (%)
A             95.6            97.3               95.5
B             89.6            97.6               89.4
C             85.6            83.1               85.9
D             91.2            94.9               91.1

The results for group C are significantly worse than those for the other subjects. Group C is a set of three different patients, all in a state of active wakefulness when the recordings were made. Their EEG recordings are the noisiest and contain more movement artefact than the other recordings, which makes it more difficult to detect spikes accurately.

There are only 754 spikes in total in all of the independent test sets put together. 724 of these are detected by the patient-specific classifiers and only 30 are missed. The main problem, however, is the prevalence of false positives (spike detection by the MLP in segments which have been labelled as normal by the expert) and we will return to this issue when discussing the results obtained with the generic system.

8 Generic system

The aim of this Section is first to establish whether the accuracy, sensitivity and specificity achieved by the patient-specific classifiers can be maintained in a generic system, tested on independent data (recordings from subjects not included in the training and validation sets). The ideal test of the generic system would be to train the MLP classifier on the recordings from the first database (subjects A and B) and then test it on the recordings from the second database (groups C and D). Unfortunately, the two datasets were not recorded under the same conditions and such a test is therefore not possible. Instead, the following three experiments are undertaken: (a) train an MLP classifier on subject B and test it on subject A; (b) train an MLP classifier on subject A and test it on subject B; (c) train an MLP classifier on group C and test it on group D. The results are given in Tables 6, 7 and 8, with the first column indicating the test subject, the next the results from the patient-specific classifier (i.e. trained on that subject) and the last one the results from the generic system (i.e. trained on subjects other than the test subject). To quantify the decrease in performance in going from a patient-specific classifier to a generic system, the test recordings, which were used in the previous Section to assess the performance of the patient-specific classifiers, are also used to evaluate the generic systems.

Table 6: Performance for the test recordings from subject A for the patient-specific and generic MLP classifiers

Subject A      Patient-specific (%)    Generic (%)
Accuracy       95.6                    95.0
Sensitivity    97.3                    97.1
Specificity    95.5                    94.9

Table 7: Performance for the test recordings from subject B for the patient-specific and generic MLP classifiers

Subject B      Patient-specific (%)    Generic (%)
Accuracy       89.6                    81.9
Sensitivity    97.6                    95.2


Table 8: Performance for the test recordings from group D for the patient-specific and generic MLP classifiers

Group D        Patient-specific (%)    Generic (%)
Accuracy       91.2                    64.2
Sensitivity    94.9                    97.4


The three Tables show that the ‘generic system’ trained on subject B achieves a level of performance in the analysis of the test recordings from subject A which is comparable to that of the patient-specific classifier trained on that same subject. There is some degradation when subject B is the test subject (with a ‘generic system’ trained on subject A), but that is probably due to the fact that there are only 26 recordings in the database for subject A (compared with 73 recordings for subject B) and hence a less comprehensive coverage of spike patterns in that training set. The biggest drop in accuracy is seen when group D provides the test subjects for the ‘generic system’ trained on group C. This is not unexpected, since the patients in group C were awake when their EEG was recorded, whereas those in group D were asleep. We know from our previous work that the sleep EEG is markedly different from the wakefulness EEG and this explains the decrease in both accuracy and specificity.

Although the other results are less conclusive, the evidence from Table 6 suggests that it should be possible to train a generic MLP classifier for spike detection, provided that a database covering a large cross-section of the patient population can be acquired and labelled by expert clinicians. It is not yet clear whether a single generic system could be trained to detect spikes during both wakefulness and sleep. Such a question will not be answered until the training database is extended to include the same subjects in both wakefulness and sleep states.

There is one significant problem which remains outstanding before any of this can be contemplated. When the classification results from the test recordings of subjects A and B and group D (which are shown in Tables 6, 7 and 8) are combined together, the following confusion matrix is obtained:

$$\begin{bmatrix} TP & FN \\ FP & TN \end{bmatrix} = \begin{bmatrix} 665 & 24 \\ 2725 & 13674 \end{bmatrix}$$

As with the patient-specific classifiers, the overall sensitivity of the ‘generic systems’ is high; 665 out of 689 spikes are detected, giving a sensitivity of 96.5%. The problem, however, is the high number of false positives: 2725 out of the 16399 normal segments are wrongly classified as containing a spike. The selectivity of the system (the number of true positives with respect to the total number of spikes detected) is only 665/(2725 + 665) (i.e. 19.6%). In other words, only 1 in 5 of the spikes highlighted by the ‘generic systems’ corresponds to an inter-ictal spike. Hardly any real spikes are missed, but there is little discrimination against the sharp transients which are not inter-ictal spikes. During wakefulness, false positives are generated by eye blinks or muscle movements and, during sleep, by sleep spindles or vertex waves. On the basis of the information at the input to the neural network (normalised time-domain parameters and reflection coefficients), the classification of these EEG waveforms as spikes is not, strictly speaking, erroneous, since they are similar in morphology and duration to inter-ictal spikes.

Experts use both contextual information (is the patient awake or asleep?) and spatial information (by comparing different channels from the multi-channel records) in order to discard false positives. The former could easily be incorporated by tracking the sleep-wake continuum from the background EEG and data fusion techniques could be used for multi-channel analysis. Movement artefacts could be identified by using another, independent, source of information such as the electro-oculogram (EOG) or the chin electromyogram (EMG), but this is not acceptable for long-term monitoring. Instead, in our present studies with the National Hospital for Neurology and Neurosurgery, the mastoid EEG is being recorded as one of the 20 EEG channels. The mastoid EEG will contain both inter-ictal spike information and muscle movements, and it should be possible to identify the latter by comparing with other channels (say the central or frontal EEG) using the technique of independent component analysis, for example [28].
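The sensitivity and selectivity figures quoted for the combined generic-system results can be checked with a few lines of arithmetic. In this sketch the false-negative and true-negative counts are not quoted directly in the text; they are derived here as 689 - 665 and 16399 - 2725 respectively, and the variable names are illustrative.

```python
# Counts quoted in the text for the combined test recordings.
spikes_total = 689        # expert-labelled spike patterns
spikes_detected = 665     # true positives (TP)
normals_total = 16399     # expert-labelled normal segments
false_positives = 2725    # normal segments flagged as spikes (FP)

# Derived counts (not quoted directly in the text).
false_negatives = spikes_total - spikes_detected   # FN
true_negatives = normals_total - false_positives   # TN

sensitivity = spikes_detected / (spikes_detected + false_negatives)
selectivity = spikes_detected / (spikes_detected + false_positives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity: {sensitivity:.1%}")  # 96.5%
print(f"selectivity: {selectivity:.1%}")  # 19.6%
print(f"specificity: {specificity:.1%}")  # 83.4%
```

The contrast between a specificity above 80% and a selectivity below 20% arises because normal segments outnumber spikes by more than twenty to one in these test recordings, so even a modest false-positive rate swamps the true detections.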


9 Conclusions

In this paper, we have presented our preliminary results on the training and testing of neural network classifiers to detect inter-ictal spikes in the EEG. We used balanced databases to evaluate the amount of discriminatory information present in different types of features. Although previous work by others has concentrated on time-domain parameters, we have shown that reflection coefficients, together with the normalised prediction error, better model the change in EEG frequency content associated with the occurrence of a spike.

A neural network classifier, using a combination of normalised time-domain parameters and reflection coefficients as input features, could form the basis of a generic spike detection system, provided that the training database incorporated data recorded both during sleep and during wakefulness. It is clear, however, that the selectivity of such a system needs to be improved for regular clinical use. We suggest that this could be achieved by a combination of contextual analysis, multi-channel data fusion and independent component analysis.

10 Acknowledgments

Y.U. Khan was supported by a Felix Scholarship. The authors are also grateful to the Medical Systems Division of Oxford Instruments and to Phil Allen at the National Hospital for Neurology and Neurosurgery, Queen Square, London, for supplying the EEG data.

11 References

1 ROBERTS, S.J., and TARASSENKO, L.: ‘Analysis of the sleep EEG using a multilayer network with spatial organisation’, IEE Proc.-F, Radar Signal Process., 1992, 139, pp. 420-425

2 ROBERTS, S.J., and TARASSENKO, L.: ‘Automated sleep EEG analysis using an RBF network’, in MURRAY, A.F. (Ed.): ‘Applications of neural networks’ (Kluwer Academic Publishers, 1995), pp. 305-322

3 PARDEY, J., ROBERTS, S.J., TARASSENKO, L., and STRADLING, J.: ‘A new approach to the analysis of the human sleep/wakefulness continuum’, J. Sleep Research, 1996, 5, (4), pp. 201-210

4 STRADLING, J.R., PARTLETT, J., DAVIES, R.J.O., SIEGWART, D., and TARASSENKO, L.: ‘Effect of short term graded withdrawal of nasal continuous positive airway pressure on systemic blood pressure in patients with obstructive sleep apnoea’, Blood Pressure, 1996, 5, pp. 234-240

5 DAVIES, R.J.O., BENNETT, L.S., BARBOUR, C., TARASSENKO, L., and STRADLING, J.R.: ‘Second by second patterns in cortical electroencephalograph and sub-cortical, systolic blood pressure, arousal during Cheyne-Stokes breathing’, Eur. Resp. J., (in press)

6 ESPIE, C.A., PAUL, A., MCFIE, J., AMOS, P., HAMILTON, D., MCCOLL, J.H., TARASSENKO, L., and PARDEY, J.: ‘Sleep studies of adults with severe or profound mental retardation and epilepsy’, Am. J. Ment. Retard., 1998, 103, pp. 47-59

7 HUGHES, J.R.: ‘EEG in clinical practice’ (Butterworth-Heinemann, 1994)

8 GOTMAN, J., and GLOOR, P.: ‘Automatic recognition and quantification of interictal epileptic activity in the human scalp EEG’, Electroencephalogr. Clin. Neurophysiol., 1976, 41, pp. 513-529

9 GLOVER, J.R., KTONAS, P.Y., and FROST, J.D.: ‘Context based automated detection of epileptogenic sharp transients in the EEG: elimination of false positives’, IEEE Trans. Biomed. Eng., 1989, 36, pp. 519-527

10 PIETILA, T., VAPAAKOSKI, S., NOUSIAINEN, U., VARRI, A., FREY, H., HAKKINEN, V., and NEUVO, Y.: ‘Evaluation of a computerized system for recognition of epileptic activity during long-term EEG recording’, Electroencephalogr. Clin. Neurophysiol., 1994, 90, pp. 438-443

11 TARASSENKO, L.: ‘A guide to neural computing applications’ (Arnold, London, 1998)

12 PARDEY, J., ROBERTS, S., and TARASSENKO, L.: ‘A review of parametric modelling techniques for EEG analysis’, Med. Eng. Phys., 1996, 18, pp. 2-11

13 WEBBER, W.R.S., LITT, B., WILSON, K., and LESSER, R.P.: ‘Practical detection of epileptiform discharges (EDs) in the EEG using an artificial neural network: a comparison of raw and parameterized EEG data’, Electroencephalogr. Clin. Neurophysiol., 1994, 91, pp. 194-204

14 HJORTH, B.: ‘The physical significance of time domain descriptors in EEG analysis’, Electroencephalogr. Clin. Neurophysiol., 1973, 34, pp. 306-310

15 WALMSLEY, M.: ‘On the normalised slope descriptor method of quantifying EEG’, IEEE Trans. Biomed. Eng., 1984, 31, pp. 720-723

16 SALTZBERG, B., HEATH, R.G., and EDWARDS, R.J.: ‘EEG spike detection in schizophrenia research’. Digest of the 7th international conference Medical and biological engineering, Stockholm, 1967, p. 266

17 WALTER, D.O., MULLER, H.F., and JELL, R.M.: ‘Semiautomatic quantification of sharpness of EEG phenomena’, IEEE Trans. Biomed. Eng., 1973, 20, pp. 53-54

18 SMITH, J.R.: ‘Automatic analysis and detection of EEG spikes’, IEEE Trans. Biomed. Eng., 1974, 21, pp. 1-7

19 CARRIE, J.R.: ‘A technique for analysing transient EEG abnormalities’, Electroencephalogr. Clin. Neurophysiol., 1972, 32, pp. 199-201

20 KHAN, Y.U.: ‘Detection of epileptic events in electroencephalograms using artificial neural networks’. DPhil thesis, Oxford University, 1997

21 GOTMAN, J.: ‘Automatic recognition of interictal spikes’, Electroencephalogr. Clin. Neurophysiol., 1985, Suppl. 37, pp. 93-114

22 FUKUNAGA, K.: ‘Introduction to statistical pattern recognition’ (Academic Press, Boston, 1990, 2nd edn.)

23 COHEN, A.: ‘Biomedical signal processing’ (CRC Press, Boca Raton, 1986)

24 BAUM, E.B., and HAUSSLER, D.: ‘What size net gives valid generalization’, Neur. Comp., 1989, 1, pp. 151-160

25 HOLT, M.R.G.: ‘The use of neural networks in the analysis of the anaesthetic electroencephalogram’. DPhil thesis, Oxford University, 1997

26 HOLT, M.R.G., TOOLEY, M., FOREST, F., PRYS-ROBERTS, C., and TARASSENKO, L.: ‘The use of parametric modelling and statistical pattern recognition in the detection of awareness during general anaesthesia’, (this issue)

27 BISHOP, C.M.: ‘Neural networks for pattern recognition’ (Oxford University Press, Oxford, 1995)

28 BELL, A.J., and SEJNOWSKI, T.J.: ‘An information-maximisation approach to blind separation and blind deconvolution’, Neur. Comp., 1995, 7, pp. 1129-1159

IEE Proc.-Sci. Meas. Technol., Vol. 145, No. 6, November 1998
