Two Brain Signal (EEG) processing applications

Zbigniew LEONOWICZ, PhD

Robust estimators & Blind Signal Separation (BSS)

Wroclaw University of Technology

• Foundation – 1945 (1910)
• Students – 32 821
• 1st, 2nd or 3rd place in National Ranking

Degree programmes
• Bachelor of Science – 13
• Master of Science – 24
• PhD – 10

Academic Staff
• Professors: 193
• Associate professors: 165
• Assistant professors: 90
• PhD Fellows: 957

Alumni since 1945

•Graduates - above 80 000

•PhD – above 4 500

•Degree Programmes taught in English - 7

Main topics

• Robust Averaging of Evoked Potentials
Z. Leonowicz, J. Karvanen, S. Shishkin: "Trimmed estimators for robust averaging of event-related potentials", Journal of Neuroscience Methods, Elsevier Ltd, 2005, vol. 142, No. 1, pp. 17-26.

• Alzheimer’s Disease-related EEG Analysis and Classification
A. Cichocki, S. Shishkin, T. Musha, Z. Leonowicz, T. Asada, T. Kurachi: "EEG filtering based on blind source separation (BSS) for detection of Alzheimer disease", Clinical Neurophysiology, 2005, vol. 116, No. 3, pp. 729-737.

Robust Estimation

• Trimmed estimators provide an alternative way to average experimental data.

• Location estimators considered: the trimmed mean, the Winsorized mean and the recently introduced trimmed L-mean, alongside the arithmetic mean and the median.

• A new robust location estimator, tanh, allows data-dependent optimization for averaging a small number of trials.

• The possibility of improving the signal-to-noise ratio (SNR) of averaged waveforms is examined for epochs from a set of real auditory evoked potential data.

Auditory evoked potentials (EP)

• EPs are “evoked” by a certain event, usually a sensory one, but not by independent endogenous processes.

• Demonstrate the efficiency of trimmed estimators of data location for computing EPs.

• Propose ways to optimize their parameters.

Statistical Estimators of Location

• The problem: sensitivity of an estimator to the presence of outliers, i.e. “the data points that deviate from the pattern set by the majority of the data set”.

• This motivates the development of robust location measures.

• Robustness of an estimator is measured by the breakdown value.
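
To make the breakdown value concrete: it is the smallest fraction of observations that, when replaced by arbitrary values, can carry the estimate arbitrarily far. The sketch below uses made-up numbers (not EEG data) to corrupt a single observation and compare how far the arithmetic mean and the median move.

```python
import numpy as np

# Hypothetical sample; the values are illustrative only.
clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Replace one observation by a gross outlier.
contaminated = clean.copy()
contaminated[-1] = 1e6

# How far does each location estimate move?
mean_shift = abs(contaminated.mean() - clean.mean())
median_shift = abs(np.median(contaminated) - np.median(clean))

print(f"shift of the mean:   {mean_shift:.1f}")
print(f"shift of the median: {median_shift:.1f}")
```

A single bad value out of five is enough to ruin the mean, whereas the median tolerates contamination of up to roughly half of the sample.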

Assumptions

• Averaging is probably the most widely used basic statistical procedure in experimental science.

• It estimates the location of data ("central tendency") in the presence of random variations among the observations.

• Data variations can be a result of variations in the phenomenon of interest or of some unavoidable measuring errors.

• In signal processing terms, this can be considered as contamination of the useful "signal" by useless "noise" linearly added to it.

• Since the noise usually has zero mean, averaging minimizes its contribution while the signal is preserved, so the signal-to-noise ratio is improved.

Synchronization

• Averaging consists of applying a statistical procedure to extract the useful information from the background noise.

• When the useful data are time-locked to some event and the noise is not, the noise can be cancelled by simple point-by-point summation of the data.

• This procedure is equivalent to the use of the arithmetic mean.
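
A minimal simulation of time-locked averaging, under stated assumptions: the waveform, the noise level and the number of trials are invented for illustration and do not come from the recordings discussed here. The same hypothetical evoked response is buried in independent zero-mean noise in every trial, and the point-by-point arithmetic mean is compared with a single trial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical evoked waveform: a damped 5 Hz oscillation over 1 s.
t = np.linspace(0.0, 1.0, 500)
signal = np.sin(2 * np.pi * 5 * t) * np.exp(-5 * t)

# Each trial = the same time-locked signal + independent zero-mean noise.
n_trials = 100
trials = signal + rng.normal(0.0, 1.0, size=(n_trials, t.size))

# Point-by-point averaging across trials (arithmetic mean).
average = trials.mean(axis=0)

# Residual noise level in one trial and in the average.
err_single = np.std(trials[0] - signal)
err_avg = np.std(average - signal)
print(f"single trial: {err_single:.3f}, average of {n_trials}: {err_avg:.3f}")
```

For independent noise the residual is expected to shrink roughly as one over the square root of the number of trials, which is why a small number of trials is the difficult case.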

Robust location estimators

• Many location estimators can be presented in a unified way by ordering the values of the sample as

$x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}$

and then applying the weight function

$\hat{\mu} = \sum_{i=1}^{N} w_i \, x_{(i)}$

• where $w_i$ is a function designed to reduce the influence of certain observations (data points) in the form of weighting and $x_{(i)}$ represents the ordered data.

Examples

• Median
When the data have the size of (2M+1), the median is the value of the (M+1)th ordered observation.

• Trimmed mean
For the α-trimmed mean (where p = αN) the weights can be defined as:

$w_i = \frac{1}{N - 2p}$ for $p + 1 \le i \le N - p$, and $w_i = 0$ otherwise,

i.e. the p highest and p lowest samples are removed.
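
The median and the trimmed mean can both be sketched as a weighted sum of the ordered sample. The code below assumes the usual convention that the N - 2p retained samples share equal weight 1/(N - 2p); the function names and the data are hypothetical.

```python
import numpy as np

def trimmed_mean_weights(n, p):
    """Weights for the ordered sample: p lowest and p highest values get 0."""
    if not 0 <= 2 * p < n:
        raise ValueError("need 0 <= 2p < n")
    w = np.zeros(n)
    w[p:n - p] = 1.0 / (n - 2 * p)
    return w

def l_estimate(x, w):
    """Weighted sum of the ordered sample."""
    return float(np.sum(w * np.sort(x)))

# Illustrative data with one high and one low outlier.
data = np.array([2.0, 3.0, 1.0, 100.0, 4.0, 5.0, -50.0])
n = data.size

mean_value = l_estimate(data, trimmed_mean_weights(n, 0))        # p = 0: arithmetic mean
trimmed_value = l_estimate(data, trimmed_mean_weights(n, 1))     # p = 1: drop one value per tail
median_value = l_estimate(data, trimmed_mean_weights(n, (n - 1) // 2))  # odd n: median

print(mean_value, trimmed_value, median_value)
```

With p = 0 the weights reduce to the arithmetic mean, and with p = M for a sample of size 2M+1 only the middle observation keeps a non-zero weight, which gives the median. For evoked potentials the same weights would be applied at each time point across trials.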

Winsorized mean

• The Winsorized mean replaces each observation in each α fraction (p = αN) of the tail of the distribution by the value of the nearest unaffected observation.

• Usually 0 ≤ p ≤ 0.25N, depending on the heaviness of the tails of the distribution.
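
A short sketch of the Winsorized mean as described: instead of discarding the p lowest and p highest ordered values, each is replaced by its nearest unaffected neighbour before averaging. The function name and the data are hypothetical.

```python
import numpy as np

def winsorized_mean(x, p):
    """Mean after clamping the p lowest and p highest ordered values."""
    xs = np.sort(np.asarray(x, dtype=float))
    if not 0 <= 2 * p < xs.size:
        raise ValueError("need 0 <= 2p < n")
    if p > 0:
        xs[:p] = xs[p]           # lower tail -> nearest unaffected value
        xs[-p:] = xs[-p - 1]     # upper tail -> nearest unaffected value
    return float(xs.mean())

# Illustrative data with one high and one low outlier.
data = [2.0, 3.0, 1.0, 100.0, 4.0, 5.0, -50.0]

plain = winsorized_mean(data, 0)   # no clamping: arithmetic mean
wins1 = winsorized_mean(data, 1)   # ordered sample becomes 1, 1, 2, 3, 4, 5, 5
print(plain, wins1)
```

Unlike trimming, all N values still enter the average, so the estimate keeps some information about how many observations lie in each tail.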

Weight functions

Weight functions - advanced

• The TL-mean applies higher weights to the middle observations.

• The tanh estimator applies smoothly changing weights to the values close to the extremes; it can be set to ignore extreme values.

Comparison

1. Conclusions

Trimmed estimators are a class of robust estimators of data location which can help to improve averaging of experimental data when:

• the number of experiments is small
• the data are highly nonstationary
• the data include outliers.

They are a compromise between the median, which is very robust but discards too much information, and the arithmetic mean conventionally used for averaging, which uses all the data but, because of this, is sensitive to outliers.

Additional improvement of averaging can be gained by introducing advanced weighting of the ordered data.

Main motivations of BSS

• Enhance speech and/or image signals and recognize voices and human faces.

• Extract features, and detect and discriminate different patterns and images.

• Recognize and classify different kinds of odors or smells, and/or somato-sensory stimuli such as touch, vibration, pain and temperature.

• Estimate, detect and classify abnormal and normal patterns of brain signals, which may in the future enable early non-invasive medical diagnosis and evaluation of human mental state, intelligence or abilities for specific mental tasks.

Instantaneous linear model

Array of m sensors receiving n sources:

$x(t) = A s(t) + n(t)$

• x(t): m x 1 vector (array output)
• s(t): n x 1 vector (source vector)
• n(t): m x 1 vector (additive noise), assumed independent from the source signals
• A: m x n matrix (mixing matrix), assumed unstructured (blind array processing)
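
The instantaneous linear model can be simulated in a few lines. Everything specific below (the two source waveforms, m = 4 sensors, the noise level, the random mixing matrix) is an assumption chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, n_samples = 4, 2, 1000          # sensors, sources, time points (hypothetical)
t = np.arange(n_samples)

# Source vector s(t): n x T, here a sine wave and a square wave.
s = np.vstack([
    np.sin(2 * np.pi * 0.01 * t),
    np.sign(np.sin(2 * np.pi * 0.003 * t)),
])

A = rng.normal(size=(m, n))                      # unstructured mixing matrix, m x n
noise = 0.1 * rng.normal(size=(m, n_samples))    # additive sensor noise, independent of s

# Array output x(t): each sensor sees a different linear mix of the sources.
x = A @ s + noise
print(x.shape)
```

In the blind setting only x is observed; neither A nor s is available to the separation algorithm.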

Independent Component Analysis (ICA)

X = AS

• X: matrix of observed data
• A: mixing matrix of basis vectors
• S: matrix of independent components

Challenge: to estimate both A and S, using X

General approaches to BSS/BSE

• There are in general two approaches for estimating the source signals:

• the simultaneous blind source separation approach (BSS);

• sequential blind source extraction (BSE), in which the sources are extracted one by one, eliminating the already extracted sources.

Extraction of signals with specified frequency band - audio-visual stimuli

Results showing EEG patterns for 64-channel recordings (audio-visual stimuli). The original recording had a very low spatial resolution, so the exact localization of signal sources was difficult.

Results showing EEG patterns for 64-channel recordings (audio-visual stimuli). The original data again had very low resolution for the responses expected in auditory and visual cortex areas.

• Results showing EEG patterns for 32-channel recordings (P300 response). The original recording had a significant distortion from the facial muscles. After separation the auditory response and visual activation were visible.

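Results

[Figure: block diagram of the extraction unit. Inputs x_1(k), x_2(k), ..., x_m(k) are weighted (w_i2, ..., w_im) and summed to give the output y_i(k) ≈ s_j(k); the diagram also contains a bandpass filter (f_ci) and a unit delay z^-1.]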

Auditory evoked potentials

The data set of the auditory evoked potential experiment was recorded from a normal adult male.

Results for auditory evoked potential data analysis (the total number of channels or components is 64 in each figure). (a) Exemplary plot of 16 channels of raw data. (b) Result for PCA, without evoked-response components. (c) Result for ICA; some evoked-response components and spontaneous brain noises were extracted.

EEG filtering based on blind source separation (BSS) improves detection of Alzheimer disease

• Objective: Improvement of detection of Alzheimer disease (AD) by filtering of EEG data using blind source separation (BSS) and projection of components which are possibly sensitive to cortical neuronal impairment found in early stages of AD.

• Method: Artifact-free 20 s intervals of raw resting EEG recordings from mild AD patients and age-matched controls were decomposed into spatio-temporally decorrelated components using the BSS algorithm "AMUSE". Filtered EEG was obtained by back-projection of the components with the highest linear predictability. Relative power of the filtered data in the delta, theta, alpha1, alpha2, beta1 and beta2 bands was processed with Linear Discriminant Analysis (LDA).

• Results: Preprocessing improved the percentage of correctly classified patients and controls, computed with jack-knifing cross-validation, from 59 to 73% and from 76 to 84%, respectively.

• Conclusions: The proposed approach can significantly improve the sensitivity and specificity of EEG-based AD diagnosis and may have potential for improvement of EEG classification in other clinical areas or fundamental research.

• Significance: Since patients with AD should be identified during large-scale screening, inexpensive tools are highly needed. The developed method is quite general, inexpensive and flexible, allowing for various extensions.

Main Idea

• "Filtering based on Blind Source Separation (BSS)", that is, filtering of EEG by selection of the most relevant components followed by reconstruction of the relevant part (subspace) of the EEG signal using back projection of only these components.

• Finding the rules to discriminate components which are more sensitive to Alzheimer’s disease and the related disorders than others.

EEG components clusters

• For the purposes of EEG classification the estimation of individual components corresponding to separate and meaningful brain sources is not required (unlike in other applications of BSS to EEG).

• We use clusters of components, which is beneficial when the data from different subjects are compared.

Basic assumptions

• The EEG signal is composed of a finite number of components: $s(t) = [s_1(t), \dots, s_n(t)]^T$

• Components are mixed through an unknown linear mixing process (described by the mixing matrix A): $x(t) = A s(t)$

• The BSS algorithm finds an un-mixing (separating) n x n matrix W whose coefficients are the weights with which the electrode signals are summed to form the estimated components: $y(t) = W x(t)$

• Back projection of some selected components: $x_r(t) = W^{-1} y_r(t)$
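
A minimal sketch of back projection, assuming that the unselected components are simply set to zero before mapping back to the electrode space with the inverse of W. The function name, the random matrices and the chosen component indices are hypothetical.

```python
import numpy as np

def back_project(x, W, keep):
    """Reconstruct sensor signals from the selected components only.

    x    : (n_channels, n_samples) sensor data
    W    : (n_channels, n_channels) un-mixing matrix
    keep : indices of the components to retain
    """
    y = W @ x                        # estimated components y(t) = W x(t)
    y_r = np.zeros_like(y)
    y_r[keep] = y[keep]              # zero out everything that is not selected
    return np.linalg.solve(W, y_r)   # x_r(t) = W^{-1} y_r(t)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 200))        # stand-in for EEG data
W = rng.normal(size=(4, 4))          # stand-in for an estimated un-mixing matrix

x_all = back_project(x, W, [0, 1, 2, 3])
x_first = back_project(x, W, [0, 1])
x_rest = back_project(x, W, [2, 3])
```

Keeping every component returns the original data, and the projections of complementary component sets add up to it, so the selected subset defines a subspace of the recording rather than a new signal.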

AMUSE Algorithm

• The AMUSE algorithm belongs to the group of second-order-statistics spatio-temporal decorrelation (SOS-STD) BSS algorithms.

• Estimated components should be spatio-temporally decorrelated and be less complex (i.e., have better linear predictability) than any mixture of those sources.

AMUSE (Tong et al., 1991, 1993; Szupiluk and Cichocki, 2001; Cichocki and Amari, 2003)

AMUSE

• AMUSE algorithm = 2 x PCA:

• First PCA is applied to the input data: $z(t) = Q x(t)$, where $Q = R_x^{-1/2}$ and $R_x = E\{x(t) x^T(t)\}$.

• Second PCA is applied to the time-delayed covariance matrix of the output of the previous stage: $R_z = E\{z(t) z^T(t-1)\} = U S V^T$.

• Unmixing matrix: $W = \hat{A}^{-1} = U^T Q$
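
The two PCA stages can be sketched with NumPy as follows. This is a simplified illustration rather than the authors' implementation: it assumes a delay of one sample, symmetrizes the delayed covariance matrix before the SVD (a common practical step that is an assumption here), and is demonstrated on synthetic sources with different autocorrelation instead of EEG.

```python
import numpy as np

def amuse(x, lag=1):
    """Two-stage decorrelation. x: (n_channels, n_samples). Returns (W, y)."""
    x = x - x.mean(axis=1, keepdims=True)
    n_samples = x.shape[1]

    # Stage 1: whitening, z = Q x with Q = R_x^{-1/2}.
    r_x = x @ x.T / n_samples
    eigval, eigvec = np.linalg.eigh(r_x)
    q = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    z = q @ x

    # Stage 2: SVD of the time-delayed covariance of z.
    r_z = z[:, lag:] @ z[:, :-lag].T / (n_samples - lag)
    r_z = (r_z + r_z.T) / 2          # symmetrization (assumption, see lead-in)
    u, _, _ = np.linalg.svd(r_z)

    w = u.T @ q                      # un-mixing matrix W = U^T Q
    return w, w @ x

# Synthetic test sources: sinusoid, AR(1) process, white noise.
rng = np.random.default_rng(0)
n_samples = 5000
t = np.arange(n_samples)
drive = rng.normal(size=n_samples)
ar = np.zeros(n_samples)
for k in range(1, n_samples):
    ar[k] = 0.7 * ar[k - 1] + drive[k]
s = np.vstack([np.sin(2 * np.pi * 0.01 * t), ar, rng.normal(size=n_samples)])

A = rng.normal(size=(3, 3))          # hypothetical mixing matrix
x = A @ s
W, y = amuse(x)

# Absolute correlation between each estimated component and each true source.
match = np.abs(np.corrcoef(np.vstack([y, s]))[:3, 3:])
print(np.round(match, 2))
```

Because the SVD returns singular values in decreasing order, the components come out sorted by the magnitude of their lag-one autocorrelation, which is one simple way to read "highest linear predictability first".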

Methodology

• AD patients in this database had, at the time of EEG recording, only memory impairment but no apparent loss in general cognitive, behavioral, or functional status. Recordings were made with eyes closed in an awake resting condition (with vigilance control) using 21 electrodes placed according to the 10-20 system.

• Each EEG was decomposed into 21 decorrelated components by AMUSE. Some of the components were selected for back projection, which formed the preprocessed ("AMUSE filtered") EEG data.

• Spectral analysis based on the Fast Fourier Transform was applied to the raw data, to the components and to the projections of selected components.

• Relative spectral powers were computed by dividing the power in the delta (1.5-3.5 Hz), theta (3.5-7.5 Hz), alpha 1 (7.5-9.5 Hz), alpha 2 (9.5-12.5 Hz), beta 1 (12.5-17.5 Hz) and beta 2 (17.5-25 Hz) bands by the power in the 1.5-25 Hz band.

• These values were transformed to better fit the normal distribution using the transformation log(x/(1-x)).
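
A hedged sketch of the relative band power computation and the log(x/(1-x)) transform. The band edges are the ones listed above; the sampling rate (200 Hz), the use of a plain FFT periodogram, the half-open band intervals and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np

# Band edges in Hz, as listed in the methodology.
BANDS = {
    "delta": (1.5, 3.5), "theta": (3.5, 7.5),
    "alpha1": (7.5, 9.5), "alpha2": (9.5, 12.5),
    "beta1": (12.5, 17.5), "beta2": (17.5, 25.0),
}

def relative_band_powers(signal, fs):
    """Power in each band divided by the power in the whole 1.5-25 Hz range."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = power[(freqs >= 1.5) & (freqs < 25.0)].sum()
    return {
        name: power[(freqs >= lo) & (freqs < hi)].sum() / total
        for name, (lo, hi) in BANDS.items()
    }

def logit(x):
    """Transformation log(x / (1 - x)) applied to a relative power."""
    return np.log(x / (1.0 - x))

# Synthetic 20 s "recording": a 10 Hz rhythm plus weak noise (not real EEG).
fs = 200.0                                   # assumed sampling rate in Hz
t = np.arange(int(20 * fs)) / fs
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.normal(size=t.size)

relative = relative_band_powers(eeg, fs)
features = {name: logit(value) for name, value in relative.items()}
print({name: round(float(value), 3) for name, value in relative.items()})
```

Since the six bands tile the 1.5-25 Hz range, the relative powers sum to one; the logit maps these bounded proportions onto the whole real line, which is what makes a normal approximation more plausible.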

• Linear discriminant analysis (LDA) was used for discriminating the AD and control groups on the basis of log-transformed relative spectral power in the 6 frequency bands, averaged over channels.

• To improve validation of the classification results, discriminant analysis was applied in combination with jack-knifing.

• Jack-knifing means that each case is classified using an individual discriminant function trained on all cases except that one.
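
The jack-knifing scheme can be sketched with a small two-class linear discriminant written in NumPy. This is an illustration under assumptions, not the original analysis: the discriminant uses a pooled covariance with class-frequency priors, and the six-dimensional features are random numbers generated only to exercise the code (group sizes 22 and 38 mirror the subject counts mentioned in this deck).

```python
import numpy as np

def fit_lda(X, y):
    """Two-class linear discriminant with pooled within-class covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    # Threshold at the midpoint, shifted by the log prior ratio (assumption).
    b = -0.5 * w @ (m0 + m1) + np.log(len(X1) / len(X0))
    return w, b

def predict_lda(w, b, x):
    """Return 1 if x falls on the class-1 side of the discriminant, else 0."""
    return int(x @ w + b > 0)

def jackknife_accuracy(X, y):
    """Classify each case with a discriminant trained on all other cases."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        w, b = fit_lda(X[mask], y[mask])
        hits += predict_lda(w, b, X[i]) == y[i]
    return hits / len(X)

# Synthetic features: 38 "controls" (class 0) and 22 "patients" (class 1).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(38, 6)),
    rng.normal(2.0, 1.0, size=(22, 6)),
])
y = np.array([0] * 38 + [1] * 22)

accuracy = jackknife_accuracy(X, y)
print(f"jack-knifed accuracy on synthetic data: {accuracy:.2f}")
```

Because the held-out case never influences its own discriminant function, the resulting rate is a less optimistic estimate than re-classifying the training set.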

EEG recordings and AMUSE components

How many components?

• How many components with the highest linear predictability provide the optimal classification rate?

• The overall misclassification rate was computed each time by applying the obtained discriminant function to the same 60 subjects (22 patients + 38 controls).

• The best classification was obtained for projection of 5 components (with numbers from 1 to 5).

Classification results [%]

AMUSE         Mild AD   Controls   All
No preproc.   59        76         70
c. 1-5        73        84         80
c. 1-7        73        84         80
c. 1-10       73        76         75
c. 6-21       59        71         67
c. 8-21       59        71         67
c. 16-21      45        68         60
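
As a consistency check on the table, the "All" column can be recomputed as the average of the two group rates weighted by group size, using the 22 patients and 38 controls stated earlier. That "All" is defined this way is an assumption; the check below only shows that the published numbers agree with it after rounding.

```python
# Group sizes from the slides: 22 mild AD patients, 38 controls.
n_ad, n_ctrl = 22, 38

# Rows of the table: (Mild AD %, Controls %, All %).
rows = {
    "No preproc.": (59, 76, 70),
    "c. 1-5": (73, 84, 80),
    "c. 1-7": (73, 84, 80),
    "c. 1-10": (73, 76, 75),
    "c. 6-21": (59, 71, 67),
    "c. 8-21": (59, 71, 67),
    "c. 16-21": (45, 68, 60),
}

def overall(ad, ctrl):
    """Size-weighted average of the two group classification rates."""
    return (n_ad * ad + n_ctrl * ctrl) / (n_ad + n_ctrl)

recomputed = {name: round(overall(ad, ctrl)) for name, (ad, ctrl, _) in rows.items()}
for name, (_, _, listed) in rows.items():
    print(f"{name:12s} listed {listed}  recomputed {recomputed[name]}")
```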

Sensitivity & Specificity

2. Conclusions

• Existing techniques are limited to removing only those parts of the raw signal which contain no or almost no components of brain origin, but rather external artifacts and noise.

• We found a cluster of AMUSE-decorrelated components which is sensitive to AD.

• There is room for improvement in the ranking and selection of optimal (significant) components.

Date post:	13-Jan-2016