Date post: | 13-Jan-2016 |
Two Brain Signal (EEG) processing applications
Zbigniew LEONOWICZ, PhD
Robust estimators & Blind Signal Separation (BSS)
Wroclaw University of Technology
Foundation – 1945 (1910)
Students - 32 821
1st, 2nd or 3rd place in National Ranking
Degree programmes
• Bachelor of Science - 13
• Master of Science – 24
• PhD - 10
Academic Staff
• Professors: 193
• Associate professors: 165
• Assistant professors: 90
• PhD Fellows: 957
Alumni since 1945
•Graduates - above 80 000
•PhD – above 4 500
•Degree Programmes taught in English - 7
Main topics
• Robust Averaging of Evoked Potentials
Z. Leonowicz, J. Karvanen, S. Shishkin: "Trimmed estimators for robust averaging of event-related potentials", Journal of Neuroscience Methods, Elsevier Ltd, 2005, vol. 142, No. 1, pp. 17-26.
• Alzheimer’s Disease-related EEG Analysis and Classification
A. Cichocki, S. Shishkin, T. Musha, Z. Leonowicz, T. Asada, T. Kurachi: "EEG filtering based on blind source separation (BSS) for detection of Alzheimer disease", Clinical Neurophysiology, 2005, vol. 116, No 3, pp. 729-737.
Robust Estimation
• Trimmed estimators provide an alternative way to average experimental data.
• Location estimators considered: trimmed mean, Winsorized mean and the recently introduced trimmed L-mean, alongside the arithmetic mean and the median.
• A new robust location estimator, tanh, allows data-dependent optimization when averaging a small number of trials.
• The possibility of improving the signal-to-noise ratio (SNR) of averaged waveforms is examined on epochs from a set of real auditory evoked potential data.
Auditory evoked potentials (EP)
• EPs are “evoked” by a certain event, usually a sensory one, but not by independent endogenous processes.
• Goal: demonstrate the efficiency of trimmed estimators of data location for computing EPs
• Goal: propose ways to optimize their parameters
Statistical Estimators of Location
• An estimator can be sensitive to the presence of outliers, i.e. “the data points that deviate from the pattern set by the majority of the data set”.
• This problem motivated the development of robust location measures.
• Robustness of an estimator is measured by the breakdown value.
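The effect of a single outlier can be illustrated with a minimal Python sketch. The numbers are hypothetical and not taken from the EEG data: one of five observations is replaced by an extreme value, which shifts the arithmetic mean strongly while the median stays where it was.

```python
from statistics import mean, median

# Hypothetical observations, not taken from the EEG recordings.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = clean[:-1] + [500.0]   # the largest value replaced by an outlier

# How far each location estimate moves because of the single outlier.
mean_shift = abs(mean(contaminated) - mean(clean))
median_shift = abs(median(contaminated) - median(clean))

print("shift of the mean:  ", mean_shift)
print("shift of the median:", median_shift)
```

A single bad point is enough to move the mean by an arbitrary amount, whereas the median tolerates contamination of up to about half of the sample.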
Assumptions
• Averaging is probably the most widely used basic statistical procedure in experimental science.
• Estimation of the location of data (“central tendency”) in the presence of random variations among the observations
• Data variations can be a result of variations in the phenomenon of interest or of some unavoidable measuring errors.
• In signal processing terms, this can be considered as contamination of useful “signal” by useless “noise” linearly added to it.
• Since the noise usually has zero mean, averaging minimizes its contribution, while the signal is preserved, and the signal-to-noise ratio is improved
Synchronization
• Averaging, in the broad sense, means applying a statistical procedure to extract the useful information from the background noise.
• When the useful data are time-locked to some event and the noise is not, the noise can be cancelled by simple point-by-point summation of the data.
• This procedure is equivalent to the use of the arithmetic mean
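Point-by-point averaging of time-locked epochs can be sketched as follows. This is an illustration on simulated data only: the waveform shape, the noise level, the number of trials and the epoch length are all assumptions, not values from the recordings discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is reproducible
n_trials, n_samples = 200, 256           # hypothetical trial count and epoch length
t = np.linspace(0.0, 1.0, n_samples)

# Hypothetical time-locked "evoked" waveform, identical in every trial.
signal = np.exp(-((t - 0.3) ** 2) / 0.002)

# Each epoch = signal + zero-mean noise that is not time-locked to the event.
epochs = signal + rng.normal(0.0, 2.0, size=(n_trials, n_samples))

# Point-by-point summation divided by the trial count is the arithmetic mean.
average = epochs.sum(axis=0) / n_trials

def rms(x):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(x ** 2)))

noise_single = rms(epochs[0] - signal)   # residual noise in a single epoch
noise_avg = rms(average - signal)        # residual noise after averaging

print("single-trial noise RMS:", noise_single)
print("averaged noise RMS:    ", noise_avg)
```

For uncorrelated zero-mean noise the residual noise is expected to fall roughly with the square root of the number of trials, which is why few trials or occasional outliers make the plain mean a poor choice.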
Robust location estimators
• Many location estimators can be presented in a unified way by ordering the values of the sample as
$x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}$
and then applying the weight function
$\hat{\mu} = \sum_{i=1}^{N} w_i \, x_{(i)}$
• where $w_i$ is a function designed to reduce the influence of certain observations (data points) in the form of weighting and $x_{(i)}$ represents the ordered data.
Examples
• Median
When the data have the size of (2M+1), the median is the value of the (M+1)th ordered observation.
• Trimmed mean
For the α-trimmed mean (where p = αN) the weights can be defined as:
$w_i = \frac{1}{N - 2p}$ for $p + 1 \le i \le N - p$, and $w_i = 0$ otherwise.
The p highest and p lowest samples are removed.
Winsorized mean
• The Winsorized mean replaces each observation in each α fraction (p = αN) of the tail of the distribution by the value of the nearest unaffected observation.
• Usually 0 ≤ p ≤ 0.25N, depending on the heaviness of the tails of the distribution.
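The trimmed and Winsorized means described above can be sketched in a few lines of Python. The function names and the sample values are illustrative assumptions. Here p is the number of observations affected in each tail.

```python
def trimmed_mean(data, p):
    """Mean after removing the p lowest and p highest ordered observations."""
    x = sorted(data)
    n = len(x)
    if not 0 <= 2 * p < n:
        raise ValueError("p must satisfy 0 <= 2p < N")
    kept = x[p:n - p]
    return sum(kept) / len(kept)

def winsorized_mean(data, p):
    """Mean after replacing the p extreme values in each tail by the
    nearest unaffected ordered observation."""
    x = sorted(data)
    n = len(x)
    if not 0 <= 2 * p < n:
        raise ValueError("p must satisfy 0 <= 2p < N")
    if p > 0:
        x[:p] = [x[p]] * p               # lower tail -> (p+1)th smallest value
        x[n - p:] = [x[n - p - 1]] * p   # upper tail -> (p+1)th largest value
    return sum(x) / n

# Hypothetical sample with one outlier (100.0).
sample = [3.0, 1.0, 2.0, 4.0, 100.0, 5.0, 6.0, 7.0]
print(trimmed_mean(sample, 0))      # p = 0 gives the arithmetic mean
print(trimmed_mean(sample, 1))
print(winsorized_mean(sample, 1))
```

With p = 0 both functions reduce to the arithmetic mean. Increasing p trades information from the tails for robustness.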
Weight functions
Weight functions - advanced
• TL-mean applies higher weights for the middle observations
• The tanh estimator applies smoothly changing weights to the values close to the extremes; it can be set to ignore extreme values
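The weights of the TL-mean can be sketched as below. The slides do not give the formula, so this assumes the usual trimmed L-moment definition (Elamir and Seheult), in which the weight of the i-th ordered observation is C(i-1, t) C(N-i, t) / C(N, 2t+1) for a trimming parameter t. The function names are hypothetical.

```python
from math import comb

def tl_mean_weights(n, t):
    """Weights of the trimmed L-mean for sample size n and trimming t.

    Assumed formula (not stated in the slides):
    w_i = C(i-1, t) * C(n-i, t) / C(n, 2t+1), for ranks i = 1..n.
    """
    if t < 0 or n < 2 * t + 1:
        raise ValueError("need t >= 0 and n >= 2t + 1")
    denom = comb(n, 2 * t + 1)
    return [comb(i - 1, t) * comb(n - i, t) / denom for i in range(1, n + 1)]

def tl_mean(data, t):
    """Weighted sum of the ordered observations with TL-mean weights."""
    x = sorted(data)
    w = tl_mean_weights(len(x), t)
    return sum(wi * xi for wi, xi in zip(w, x))

weights = tl_mean_weights(5, 1)
print(weights)                                  # largest weight in the middle
print(tl_mean([1.0, 2.0, 3.0, 4.0, 100.0], 1))  # the outlier gets zero weight
```

Under this assumption the t most extreme observations on each side receive zero weight, and t = 0 gives equal weights, i.e. the arithmetic mean.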
Comparison
1. Conclusions
Trimmed estimators are a class of robust estimators of data location which can help to improve averaging of experimental data when:
• the number of experiments is small
• data are highly nonstationary
• data include outliers.
They are a compromise between the median, which is very robust but discards too much information, and the arithmetic mean conventionally used for averaging, which uses all data but, because of this, is sensitive to outliers.
Additional improvement of averaging can be gained by introducing advanced weighting of ordered data.
Main motivations of BSS
• Enhance speech and/or image signals and recognize voices and human faces.
• Extract the features, detect and discriminate different patterns and images.
• Recognize and classify different kinds of odors or smells, and/or somato-sensory stimuli like touch, vibration, pain, temperature
• Estimate, detect and classify some abnormal and normal patterns of brain signals, which may in the future enable early non-invasive medical diagnosis and evaluation of human mental state and intelligence or abilities for specific mental tasks.
Instantaneous linear model
Array of m sensors receiving n sources:
$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t)$
• x(t): m x 1 vector (array output)
• s(t): n x 1 vector (source vector)
• n(t): m x 1 vector (additive noise), assumed independent from the source signals
• A: m x n matrix (mixing matrix), assumed unstructured (blind array processing)
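The dimensions of the instantaneous linear model can be checked with a small simulation. Everything numeric here is an assumption for illustration: the sensor and source counts, the sampling rate and the two source waveforms are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, T = 4, 2, 1000                  # hypothetical: 4 sensors, 2 sources, 1000 samples
fs = 250.0                            # hypothetical sampling rate in Hz
k = np.arange(T) / fs

# Source matrix s: one row per source, one column per time sample (n x T).
s = np.vstack([
    np.sin(2 * np.pi * 5.0 * k),             # 5 Hz sinusoid
    np.sign(np.sin(2 * np.pi * 3.0 * k)),    # 3 Hz square wave
])

A = rng.normal(size=(m, n))           # mixing matrix, unknown in a real recording
noise = 0.01 * rng.normal(size=(m, T))

# Every sensor records a weighted sum of all sources plus additive noise.
x = A @ s + noise

print(x.shape)    # one row per sensor
```

In a blind setting only x is observed. A, s and the noise are shown here only to make the model concrete.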
Independent Component Analysis (ICA)
X = AS
• X: matrix of observed data
• A: mixing matrix of basis vectors
• S: matrix of independent components
Challenge: to estimate both A and S, using X
General approaches to BSS/BSE
• There are in general two approaches for estimating the source signals:
• the simultaneous blind source separation approach (BSS);
• sequential blind extraction (BSE), in which the sources are extracted one by one, eliminating the already extracted sources at each step.
Extraction of signals with specified frequency band - audio-visual stimuli
• EEG patterns for 64-channel recordings (audio-visual stimuli). The original recording had a very low spatial resolution, so the exact localization of signal sources was difficult.
• EEG patterns for 64-channel recordings (audio-visual stimuli). The original data again had very low resolution for the responses expected in the auditory and visual cortex areas.
• EEG patterns for 32-channel recordings (P300 response). The original recording had a significant distortion from the facial muscles. After separation, the auditory response and visual activation were visible.
Results
[Figure: block diagram with inputs x_1(k), x_2(k), ..., x_m(k), weights w_i2, ..., w_im, a summing node, a bandpass filter, and output y_i(k) estimating a source s_j(k)]
Auditory evoked potentials
The data set of the auditory evoked potential experiment was recorded by testing a normal male adult.
Results for auditory evoked potential data analysis (64 channels or components in total in each figure). (a) Exemplary plot of 16 channels of raw data. (b) Result for PCA, without evoked-response components. (c) Result for ICA; some evoked-response components and spontaneous brain noise were extracted.
EEG filtering based on blind source separation (BSS) improves detection of Alzheimer disease
• Objective: Improvement of detection of Alzheimer disease (AD) by filtering of EEG data using blind source separation (BSS) and projection of components which are possibly sensitive to cortical neuronal impairment found in early stages of AD.
• Method: Artifact-free 20 s intervals of raw resting EEG recordings from mild AD patients and age-matched controls were decomposed into spatio-temporally decorrelated components using the BSS algorithm "AMUSE". Filtered EEG was obtained by back-projection of components with the highest linear predictability. Relative power of filtered data in delta, theta, alpha1, alpha2, beta1 and beta2 bands was processed with Linear Discriminant Analysis (LDA).
• Results: Preprocessing improved the percentage of correctly classified patients from 59 to 73% and of correctly classified controls from 76 to 84%, computed with jack-knifing cross-validation.
• Conclusions: The proposed approach can significantly improve the sensitivity and specificity of EEG based AD diagnosis and may have potential for improvement of EEG classification in other clinical areas or fundamental research.
• Significance: Since the patients with AD should be identified during large scale screening, inexpensive tools are highly needed. The developed method is quite general, inexpensive and flexible, allowing for various extensions.
Main Idea
• "filtering based on Blind Source Separation (BSS)", that is, filtering of EEG by selection of most relevant components followed by reconstruction of the relevant part (subspace) of EEG signal using back projection of only these components.
• Finding the rules to discriminate components which are more sensitive to Alzheimer’s disease and the related disorders than others.
EEG components clusters
• For the purposes of EEG classification the estimation of individual components corresponding to separate and meaningful brain sources is not required (unlike in other applications of BSS to EEG)
• We use clusters of components - beneficial when the data from different subjects are compared.
Basic assumptions
• EEG signal is composed of a finite number of components
• Components are mixed through an unknown linear mixing process (described by mixing matrix A)
• BSS algorithm finds an un-mixing (separating) n x n matrix W consisting of the coefficients with which the electrode signals should be weighted to form, by summation, the estimated components
• Back projection of some selected components
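$\mathbf{s}(t) = [s_1(t), \dots, s_n(t)]^T$
$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t)$
$\mathbf{y}(t) = \mathbf{W}\,\mathbf{x}(t)$
$\mathbf{x}_r(t) = \mathbf{W}^{-1}\,\mathbf{y}_r(t)$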
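Back projection of selected components can be sketched as follows. The function name `back_project`, the 3 x 3 mixing matrix and the random sources are assumptions made for illustration. In practice W would come from a BSS algorithm rather than from inverting a known A.

```python
import numpy as np

def back_project(x, W, keep):
    """Reconstruct the sensor signals from the selected components only.

    x    : (n, T) sensor signals
    W    : (n, n) un-mixing matrix, so that y = W x
    keep : indices of the components to retain
    """
    y = W @ x                          # estimated components
    A_hat = np.linalg.inv(W)           # estimated mixing matrix
    # Using only the kept columns/rows equals zeroing the other components.
    return A_hat[:, keep] @ y[keep, :]

rng = np.random.default_rng(2)
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])        # hypothetical mixing matrix
s = rng.normal(size=(3, 500))          # hypothetical sources
x = A @ s

W = np.linalg.inv(A)                   # idealised un-mixing matrix for the sketch
x_r = back_project(x, W, keep=[0, 1])  # drop the third component

print(np.allclose(back_project(x, W, keep=[0, 1, 2]), x))
```

Keeping every component returns the original recording. Keeping a subset returns only the part of the recording that those components explain at the electrodes.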
AMUSE Algorithm
• AMUSE algorithm belongs to the group of second-order-statistics spatio-temporal decorrelation (SOS-STD) BSS algorithms
• Estimated components should be spatio-temporally decorrelated and be less complex (i.e., have better linear predictability) than any mixture of those sources.
AMUSE (Tong et al., 1991, 1993; Szupiluk and Cichocki, 2001; Cichocki and Amari, 2003)
AMUSE
• AMUSE algorithm = 2 x PCA:
• First PCA is applied to input data.
• Second PCA is applied to the time-delayed covariance matrix of the output of previous stage.
• Unmixing matrix
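$\mathbf{z}(t) = \mathbf{Q}\,\mathbf{x}(t)$, $\mathbf{Q} = \mathbf{R}_x^{-1/2}$
$\mathbf{R}_x = E\{\mathbf{x}(t)\,\mathbf{x}^T(t)\}$
$\mathbf{R}_z = E\{\mathbf{z}(t)\,\mathbf{z}^T(t-1)\} = \mathbf{U}\mathbf{S}\mathbf{V}^T$
$\mathbf{W} = \hat{\mathbf{A}}^{-1} = \mathbf{U}^T\mathbf{Q}$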
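The two PCA stages can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the implementation used in the study: the delayed covariance is symmetrized before the decomposition (a common practical choice that the slides do not state), the delay is one sample, and the test signals and mixing matrix are hypothetical.

```python
import numpy as np

def amuse(x, lag=1):
    """Sketch of AMUSE: whitening followed by a decomposition of the
    time-delayed covariance matrix of the whitened data.

    x : (n, T) array, one row per channel.
    Returns the un-mixing matrix W and the components y = W x.
    """
    x = x - x.mean(axis=1, keepdims=True)           # remove channel means
    T = x.shape[1]

    # Stage 1: whitening with Q = Rx^(-1/2).
    Rx = (x @ x.T) / T
    evals, evecs = np.linalg.eigh(Rx)
    Q = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = Q @ x

    # Stage 2: delayed covariance E{z(t) z^T(t - lag)}.
    Rz = (z[:, lag:] @ z[:, :-lag].T) / (T - lag)
    Rz = 0.5 * (Rz + Rz.T)                           # symmetrize (assumption)
    U, S, Vt = np.linalg.svd(Rz)                     # singular values sorted descending

    W = U.T @ Q
    return W, W @ x

# Hypothetical test: a predictable sinusoid mixed with unpredictable white noise.
rng = np.random.default_rng(0)
T = 5000
s = np.vstack([np.sin(2 * np.pi * np.arange(T) / 100.0),
               rng.normal(size=T)])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
W, y = amuse(A @ s)

G = np.abs(W @ A)                                    # close to a scaled permutation
print(G / G.max(axis=1, keepdims=True))
```

Because the singular values come out in descending order, the components are ordered by decreasing magnitude of their delayed autocorrelation, so the most linearly predictable component comes first in this sketch.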
Methodology
• AD patients in this database had, at the time of EEG recording, only memory impairment but no apparent loss in general cognitive, behavioral, or functional status. Recording was made with eyes closed in an awake resting condition (with vigilance control) using 21 electrodes according to 10-20 system.
• Each EEG was decomposed into 21 decorrelated components by AMUSE. Some of the components were selected for back projection, which formed preprocessed ("AMUSE filtered") EEG data.
• Spectral analysis based on Fast Fourier Transform was applied to raw data, to the components and to the projections of selected components.
• Relative spectral powers were computed by dividing the power in delta (1.5-3.5 Hz), theta (3.5-7.5 Hz), alpha 1 (7.5-9.5 Hz), alpha 2 (9.5-12.5 Hz), beta 1 (12.5-17.5 Hz) and beta 2 (17.5-25 Hz) bands by the power in 1.5-25 Hz band.
• These values were normalized for better fitting the normal distribution using the transformation log(x/(1-x))
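The relative band powers and the log(x/(1-x)) transformation can be sketched with NumPy's FFT. The band limits and the 20 s interval length are those listed above. The sampling rate, the half-open treatment of the band edges and the test signal are assumptions made for this illustration.

```python
import numpy as np

# Band limits in Hz, as listed above.
BANDS = {
    "delta": (1.5, 3.5), "theta": (3.5, 7.5),
    "alpha1": (7.5, 9.5), "alpha2": (9.5, 12.5),
    "beta1": (12.5, 17.5), "beta2": (17.5, 25.0),
}

def relative_band_powers(signal, fs):
    """Power in each band divided by the power in the 1.5-25 Hz band."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = power[(freqs >= 1.5) & (freqs < 25.0)].sum()
    # Half-open intervals [lo, hi) so that no bin is counted twice (assumption).
    return {name: float(power[(freqs >= lo) & (freqs < hi)].sum() / total)
            for name, (lo, hi) in BANDS.items()}

def logit(x):
    """Transformation log(x / (1 - x)) used to normalize relative powers."""
    return np.log(x / (1.0 - x))

# Hypothetical 20 s test signal: a 10 Hz rhythm plus weak broadband noise.
fs = 200.0                                   # assumed sampling rate in Hz
t = np.arange(0.0, 20.0, 1.0 / fs)
rng = np.random.default_rng(4)
eeg = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.normal(size=t.size)

rel = relative_band_powers(eeg, fs)
features = {name: float(logit(value)) for name, value in rel.items()}
print(rel)
print(features)
```

The six relative powers sum to one because the bands tile the 1.5-25 Hz range. The transformation maps each value from (0, 1) to the whole real line, which suits a Gaussian-based classifier better.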
• Linear discriminant analysis (LDA) - used for discriminating AD and control groups on the basis of log-transformed relative spectral power in the 6 frequency bands, averaged over channels.
• To improve validation of the classification results, discriminant analysis was applied in combination with jack-knifing.
• Jack-knifing means that each case is classified using an individual discriminant function trained with all cases except this one.
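Jack-knifing with a two-class linear discriminant can be sketched as follows. This is a plain Fisher discriminant with equal priors, which is an assumption, since the exact LDA settings are not given. The data are synthetic Gaussian features with the same group sizes (22 and 38) and feature count (6) as in the study, not the EEG features themselves.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class Fisher discriminant; returns weights w and offset b.

    A case x is assigned to class 1 when x @ w + b > 0 (equal priors assumed).
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix.
    Sw = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(Sw, m1 - m0)
    b = -0.5 * float(w @ (m0 + m1))
    return w, b

def jackknife_accuracy(X, y):
    """Classify each case with a discriminant trained on all other cases."""
    hits = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i       # leave case i out
        w, b = fit_lda(X[train], y[train])
        predicted = int(X[i] @ w + b > 0)
        hits += int(predicted == y[i])
    return hits / len(X)

# Synthetic stand-in data: 22 "patients" (label 1) and 38 "controls" (label 0).
rng = np.random.default_rng(3)
y = np.array([1] * 22 + [0] * 38)
X = rng.normal(size=(60, 6))
X[y == 1] += 3.0                             # artificial, well-separated groups

acc = jackknife_accuracy(X, y)
print("jack-knifed accuracy:", acc)
```

Since the left-out case never contributes to its own discriminant function, the resulting rate is less optimistic than one computed on the training data.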
EEG recordings and AMUSE components
How many components?
• How many components with the highest linear predictability provide the optimal classification rate?
• Overall misclassification rate was computed each time by applying the obtained discriminant function to the same 60 subjects (22 patients + 38 controls).
• The best classification was obtained for projection of 5 components (with numbers from 1 to 5).
Classification results [%]
AMUSE          Mild AD   Controls   All
No preproc.    59        76         70
c. 1-5         73        84         80
c. 1-7         73        84         80
c. 1-10        73        76         75
c. 6-21        59        71         67
c. 8-21        59        71         67
c. 16-21       45        68         60
(c. = back-projected components)
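The "All" column is consistent with a size-weighted average of the two group rates. The short check below assumes the group sizes of 22 patients and 38 controls stated earlier. Treating AD as the positive class, the "Mild AD" rate plays the role of sensitivity and the "Controls" rate the role of specificity.

```python
N_AD, N_CTRL = 22, 38     # group sizes stated earlier: 22 patients + 38 controls

# (sensitivity %, specificity %) per row of the table above.
rows = {
    "No preproc.": (59, 76),
    "c. 1-5": (73, 84),
    "c. 1-7": (73, 84),
    "c. 1-10": (73, 76),
    "c. 6-21": (59, 71),
    "c. 8-21": (59, 71),
    "c. 16-21": (45, 68),
}

def overall_rate(ad_pct, ctrl_pct):
    """Overall percentage of correct classifications, weighted by group size."""
    return (ad_pct * N_AD + ctrl_pct * N_CTRL) / (N_AD + N_CTRL)

for name, (ad_pct, ctrl_pct) in rows.items():
    print(f"{name:12s} {overall_rate(ad_pct, ctrl_pct):5.1f}")
```

Rounded to whole percentages, these values reproduce the "All" column.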
Sensitivity & Specificity
2. Conclusions
• Existing techniques are limited to removing only those parts of the raw signal which contain no or almost no components of brain origin, but rather external artifacts and noise.
• We found a cluster of AMUSE-decorrelated components which is sensitive to AD.
• There is room for improvement in ranking and selection of optimal (significant) components.