HAL Id: tel-00426852
https://tel.archives-ouvertes.fr/tel-00426852
Submitted on 28 Oct 2009
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Mapping, timing and tracking cortical activations with MEG and EEG: Methods and application to human vision
Alexandre Gramfort
To cite this version:
Alexandre Gramfort. Mapping, timing and tracking cortical activations with MEG and EEG: Methods and application to human vision. Modeling and Simulation. Ecole nationale supérieure des telecommunications - ENST, 2009. English. tel-00426852
PhD THESIS
prepared at
INRIA Sophia Antipolis
and presented at
Graduate School of Telecom ParisTech
A dissertation submitted in partial fulfillment
of the requirements for the degree of
DOCTOR OF SCIENCE
Specialized in Signal and Image Processing
Mapping, timing and tracking
cortical activations
with MEG and EEG:
Methods and application
to human vision
Alexandre GRAMFORT
Advisors Dr. Maureen Clerc ENPC / INRIA Sophia Antipolis, France
Pr. Olivier Faugeras INRIA Sophia Antipolis, France
Reviewers Pr. Matti Hamalainen MGH/MIT/HMS Martinos Center, Boston, USA
Dr. Remi Gribonval IRISA, Rennes, France
Examiners Dr. Sylvain Baillet Medical College of Wisconsin, Milwaukee, USA
Pr. Eric Moulines Telecom ParisTech, Paris, France
Invited scientist Dr. Elsa Angelini Telecom ParisTech, Paris, France
THESIS
presented to obtain the degree of Doctor
of the Ecole Nationale Superieure des Telecommunications
Specialty: Signal and Image
Alexandre GRAMFORT
Localization and tracking of functional
brain activity with electro- and
magnetoencephalography:
Methods and applications to the human
visual system
Defended on 12 October 2009 before a jury composed of:
Advisors Dr. Maureen Clerc ENPC / INRIA Sophia Antipolis, France
Pr. Olivier Faugeras INRIA Sophia Antipolis, France
Reviewers Pr. Matti Hamalainen MGH/MIT/HMS Martinos Center, Boston, USA
Dr. Remi Gribonval IRISA, Rennes, France
Examiners Dr. Sylvain Baillet Medical College of Wisconsin, Milwaukee, USA
Pr. Eric Moulines Telecom ParisTech, Paris, France
Invited scientist Dr. Elsa Angelini Telecom ParisTech, Paris, France
ABSTRACT
The overall aim of this thesis is the development of novel electroencephalography (EEG) and
magnetoencephalography (MEG) analysis methods to provide new insights into the functioning
of the human brain. EEG and MEG are non-invasive techniques that measure, outside of the
head, the electric potentials and the magnetic fields induced by neuronal activity, respectively.
The objective of these functional brain imaging modalities is to localize in space and time
the origin of the measured signal. To do so, very challenging mathematical and computational
problems need to be tackled. The first part of this work proceeds from the biological origin
of the M/EEG signal to the resolution of the forward problem. Starting from Maxwell’s
equations in their quasi-static formulation and from a physical model of the head, the forward
problem predicts the measurements that would be obtained for a given configuration of current
generators. With realistic head models, the solution is not known analytically and is obtained
with numerical solvers. The first contribution of this thesis introduces a solution of this
problem using a symmetric boundary element method (BEM), which achieves excellent precision
compared to standard alternative BEM implementations. Once a forward model is available, the
next challenge consists in recovering the current generators that have produced the measured
signal. This problem is referred to as the inverse problem. Three types of approaches exist
for solving this problem: parametric methods, scanning techniques, and image-based methods
with distributed source models. The latter technique offers a rigorous formulation of the
inverse problem without making strong modeling assumptions. However, it requires solving a
severely ill-posed problem. Solving such problems classically requires imposing constraints
or priors on the solution. The second part of this thesis presents robust and tractable
inverse solvers, with a particular focus on efficient convex optimization methods using sparse
priors. The third part of this thesis is the most applied contribution. It is a detailed
exploration of the problem of retinotopic mapping with MEG measurements, from experimental
protocol design to data exploration and resolution of the inverse problem using time-frequency
analysis. The next contribution of this thesis aims at going one step further than simple
source localization by providing an approach to investigate the dynamics of cortical
activations. Starting from spatiotemporal source estimates, the proposed algorithm provides
a way to robustly track the “hot spots” over the cortical mesh in order to provide a clear
view of the cortical processing over time. The last contribution of this work addresses the
very challenging problem of single-trial data processing. We propose to make use of recent
progress in graph-based methods in order to achieve parameter estimation on single-trial data
and therefore reduce the estimation bias produced by standard multi-trial data averaging.
Both the source code of our algorithms and the experimental data are freely available to
reproduce the results presented. The retinotopy project was done in collaboration with the
LENA team at La Pitie-Salpetriere hospital (Paris).
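As a rough illustration of the distributed inverse problem described above, the following minimal Python sketch contrasts an ℓ2 (minimum-norm) estimate with an ℓ1 (sparse) estimate computed by proximal gradient iterations. It is not the implementation used in this thesis: the random gain matrix `G`, the problem sizes, the regularization values, and the function names `minimum_norm`, `soft_threshold` and `ista` are all assumptions chosen for illustration. A real gain matrix would come from a forward solver.

```python
import numpy as np

def minimum_norm(G, m, lam):
    """l2 estimate: argmin_x ||m - G x||^2 + lam * ||x||^2 (closed form)."""
    n_sensors = G.shape[0]
    # Solve the small (sensors x sensors) system rather than the large one.
    return G.T @ np.linalg.solve(G @ G.T + lam * np.eye(n_sensors), m)

def soft_threshold(x, t):
    """Proximity operator of t * ||x||_1 (entrywise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(G, m, lam, n_iter=500):
    """Proximal gradient for argmin_x 0.5 * ||m - G x||^2 + lam * ||x||_1."""
    lipschitz = np.linalg.norm(G, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = G.T @ (G @ x - m)
        x = soft_threshold(x - grad / lipschitz, lam / lipschitz)
    return x

# Hypothetical toy problem: 30 sensors, 200 candidate sources, 3 active ones.
rng = np.random.default_rng(0)
n_sensors, n_sources = 30, 200
G = rng.standard_normal((n_sensors, n_sources))  # stand-in for a real leadfield
x_true = np.zeros(n_sources)
x_true[[10, 50, 120]] = [1.0, -1.5, 2.0]
m = G @ x_true + 0.01 * rng.standard_normal(n_sensors)

x_mn = minimum_norm(G, m, lam=1.0)
x_l1 = ista(G, m, lam=0.5)
print("nonzero sources (l2):", np.count_nonzero(x_mn))
print("nonzero sources (l1):", np.count_nonzero(x_l1))
```

With far fewer sensors than sources, the data alone cannot determine the source amplitudes, which is why a prior is needed. The ℓ2 prior spreads energy over all sources, whereas the soft-thresholding step of the ℓ1 solver sets small amplitudes exactly to zero.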
Keywords:
Neuroimaging, magnetoencephalography (MEG), electroencephalography (EEG), human vision,
retinotopy, boundary element method, inverse problem, convex optimization, sparse
regression, single-trial analysis, graph cuts.
SUMMARY
This thesis is devoted to the study of signals measured by electroencephalography
(EEG) and magnetoencephalography (MEG) in order to improve our understanding of the human
brain. EEG and MEG are non-invasive brain imaging modalities. They measure, outside the
head, the electric potential and the magnetic field induced by neuronal activity,
respectively. The main objective in exploiting these data is the localization in space and
time of the current sources that generated the measurements. To do so, a number of difficult
mathematical and computational problems must be solved. The first part of this thesis goes
from the presentation of the biological foundations underlying M/EEG data to the resolution
of the forward problem. The forward problem predicts the measurements generated by a given
configuration of current sources. Solving this problem with Maxwell's equations in the
quasi-static approximation requires modeling the current generators as well as the geometry
of the conducting medium, in our case the head. This modeling leads to a linear forward
problem that admits no analytical solution when realistic head models are considered. Our
first contribution is the implementation of a numerical solver based on boundary elements,
whose excellent precision we demonstrate in comparison with other available implementations.
Once the forward problem is computed, the next step consists in estimating the positions and
amplitudes of the sources that generated the measurements. This is the inverse problem.
Three types of methods exist for this purpose: parametric methods, so-called “scanning”
methods, and distributed methods. The latter approach provides a rigorous framework for
solving the inverse problem while avoiding overly strong approximations in the modeling.
However, it requires solving a severely under-constrained problem, which in turn requires
imposing priors on the solutions. The second part of this thesis is therefore devoted to the
different types of priors that can be used in the inverse problem. Their presentation ranges
from the mathematical resolution methods to implementation details and their practical use
on realistic problem sizes. Particular attention is paid to sparsity-inducing priors, which
lead to non-differentiable convex optimization problems for which efficient optimization
methods based on proximal iterations are presented. The third part deals with the use of the
methods presented previously to estimate retinotopic maps in the visual system using MEG
data. The presentation covers both the experimental aspects related to the acquisition
protocol and the implementation of the inverse problem, exploiting properties of the
spectrum of the measured signal. The following contribution aims to go beyond the simple
localization of activity through the inverse problem in order to give access to the dynamics
of cortical activity. Starting from source estimates on the cortical mesh, the proposed
method uses combinatorial optimization methods based on graph cuts to robustly track the
activity over time. The last contribution of this thesis concerns parameter estimation on
raw, non-averaged M/EEG data. Given the low signal-to-noise ratio, the analysis of so-called
“single-trial” M/EEG data is a particularly difficult problem, whose interest is fundamental
for going beyond the analysis of averaged data by exploring inter-trial variability. The
proposed method uses recent graph-based tools. It guarantees global optimizations and avoids
classical problems such as parameter initialization or the use of the averaged signal in the
estimation. All the methods developed during this thesis were applied to real M/EEG data in
order to guarantee their relevance in the sometimes complex experimental context of real
M/EEG signals. The implementations and the data needed to reproduce the results are
available. The retinotopy project using MEG data was carried out in collaboration with the
LENA team at La Pitie-Salpetriere hospital (Paris).
Keywords:
neuroimaging, magnetoencephalography (MEG), electroencephalography (EEG), vision, retinotopy,
boundary element method, inverse problem, convex optimization, sparse regression,
single-trial analysis, graph cuts.
ACKNOWLEDGMENTS
First, I would like to express my deep gratitude to Maureen Clerc for supervising my
work and sharing her expertise during these three years. I would also like to thank Olivier
Faugeras for welcoming me into his research team and giving me the opportunity to work
in two prestigious and stimulating institutes: INRIA Sophia Antipolis and the Ecole
Normale Superieure in Paris.
I would also like to thank Matti Hamalainen and Remi Gribonval for reviewing my thesis
and taking the time to make insightful remarks and suggestions on my work. I am also
extremely grateful to Sylvain Baillet, Eric Moulines and Elsa Angelini for their
participation in the jury.
Many people have contributed to the work presented in this thesis and I want to thank
them here: Theo Papadopoulo for helping me clarify my thoughts on many subjects and
sharing his “geeky side”, Francis Bach for always being available and sharing his expertise on
optimization and machine learning with so much simplicity, Sylvain Baillet for mentoring me
in the MEG community, Renaud Keriven for introducing me to combinatorial optimization,
Rachid Deriche for being my new INRIA team leader and helping me with my doubts and
questions, Sylvain Arlot for sharing his knowledge of non-parametric statistics,
Jean-Yves Audibert for his feedback on manifold learning methods, Bertrand Thirion for the past
but mostly the future, Jean Lorenceau for assisting me in the hard moments of experimental
data acquisition, Benoit Cottereau for challenging me when it came to analyzing
real data, Matthieu Kowalski for our fruitful collaboration, Demian Wassermann for putting
up with me in his office, Nicole Voges for proofreading part of this thesis, Gabriel Peyre for
always answering my questions, Stanley Durrleman for all the hours we spent together in
traffic talking about our research and Marie-Cecile Lafont for all her help.
I will not forget all the friends I have made during these three years: Sylvain Vallaghe,
Florence Gombert, Julien Lefevre, Maxime Descoteaux, Jonathan Touboul, Maria-Jose
Escobar, Emmanuel Caruyer, Auro Ghosh, Francois Grimbert, Adrien Wohrer, Romain Veltz,
Mathieu Galtier, Joan Fruitet, Emmanuel Olivi, Rodolphe Jenatton and the “dream team” at
the ENS: Michael Pechaud, Pierre Maurel and Patrick Labatut.
I would also like to thank my parents for all the efforts they have made for me since day one.
Finally, I would like to thank Claire for her love and day-to-day support during these
years.
Contents
Introduction
1 Neural basis of EEG and MEG
1.1 Anatomy and electrophysiology of the human brain
1.1.1 General brain structures: From macro to nano
1.1.2 How neurons produce electromagnetic fields
1.2 Instrumentation for MEG and EEG
1.2.1 Electroencephalography (EEG)
1.2.2 Magnetoencephalography (MEG)
1.2.3 Other modalities for brain functional imaging
1.3 Conclusion
2 The forward problem
2.1 The physics of EEG and MEG
2.1.1 Maxwell’s equations
2.1.2 Quasi-static approximation
2.1.3 The electric potential equation
2.1.4 The magnetic field equation: the Biot-Savart law
2.2 Unbounded homogeneous medium
2.2.1 Dipolar sources
2.2.2 Multipolar sources
2.3 The spherically symmetric head model
2.3.1 Electric potential generated by a dipole
2.3.2 The magnetic field
2.3.2.1 The radial component of the magnetic field
2.3.2.2 Total magnetic field generated by a dipole
2.3.3 Magnetic field generated by a multipole
2.3.4 Limits of spherical models
2.4 Realistic head models
2.4.1 The Finite Difference Method (FDM)
2.4.2 The Finite Element Method (FEM)
2.4.3 The Boundary Element Method (BEM)
2.4.4 The Symmetric Boundary Element Method (SymBEM)
2.5 Implementation
2.6 Software
2.6.1 Review of available non-commercial software
2.6.2 OpenMEEG
2.7 Conclusion
3 The inverse problem with distributed source models
3.1 General introduction to inverse methods
3.1.1 Parametric models and dipole fitting approaches
3.1.2 Scanning methods: the beamformers
3.1.3 Image-based methods
3.2 Minimum-norm solutions and their variants
3.2.1 The Minimum-Norm solution
3.2.1.1 Minimum-norm equations
3.2.1.2 Choosing the regularization parameter
3.2.2 Variants around the minimum-norm solution
3.2.2.1 The weighted minimum-norm (WMN)
3.2.2.2 The ℓ2 priors and Gaussian models
3.2.2.3 Noise normalized methods: dSPM and sLORETA
3.2.2.4 Spatiotemporal minimum-norm estimation
3.3 Learning-based methods
3.3.1 Model selection using a multiresolution approach: MiMS
3.3.2 Restricted Maximum Likelihood (ReML) and Sparse Bayesian Learning (SBL)
3.4 Conclusion
4 Inverse modeling with sparse priors
4.1 Why use sparse priors?
4.2 Inversion with sparse priors: Methods
4.2.1 Iteratively Reweighted Least Squares (IRLS)
4.2.2 LARS-LASSO with the ℓ1 norm
4.2.3 Proximity operators and iterative schemes
4.3 Sparsity and spatially extended activations: The Total Variation
4.4 Sparsity and spatiotemporal data
4.4.1 VESTAL
4.4.2 ℓ1 over space and ℓ2 over time
4.5 Sparse priors with multiple experimental conditions: ℓ212
4.5.1 Method
4.5.2 Simulations
4.5.3 MEG study
4.6 Conclusion
5 Fast retinotopic mapping with MEG
5.1 From the eyes to the cortex
5.2 Retinotopic mapping with fMRI
5.3 Source localization with M/EEG in the visual cortex: previous studies
5.4 MEG experimental design
5.4.1 Stimulus design
5.4.2 Protocol design
5.5 Mapping V1 with MEG
5.5.1 Data exploration
5.5.2 Method
5.5.2.1 How to invert?
5.5.2.2 Estimating active regions with permutation tests
5.5.2.3 The mapping procedure
5.5.3 Mapping results
5.5.3.1 Localization results with ℓ2 inverse solvers
5.5.3.2 From localization to retinotopic maps
5.5.3.3 Reconstruct on WM-GM or GM-CSF interface?
5.5.3.4 Localization results beyond simple ℓ2 inverse solvers
5.5.3.5 Effect of the orientation constraint
5.6 Timing visual dynamics with MEG
5.6.1 Estimating timings in the visual cortex with M/EEG: Literature review
5.6.2 Extracting information from the phase
5.6.3 Preliminary results
5.7 Discussion
5.8 Conclusion
6 Tracking cortical activations
6.1 Introduction
6.2 Tracking with Graph Cuts on a Triangulated Surface
6.2.1 From Thresholding to Tracking
6.2.2 Discretization on a Triangulation
6.2.3 Tracking Results with Synthetic Data
6.3 Application to M/EEG Data
6.3.1 Results on visual stimulation
6.3.2 Results on somatosensory data
6.4 Conclusion
7 Single-trial analysis with graphs
7.1 Introduction
7.2 Manifold learning
7.2.1 Principal Component Analysis
7.2.2 Nonlinear embedding
7.2.3 Laplacian embedding algorithm
7.3 Spectral reordering of EEG time series
7.3.1 Toy examples
7.3.2 Spectral reordering with realistic time series
7.4 Robust latency estimation via discrete optimization
7.4.1 Optimization framework
7.4.2 Graph Cuts algorithm
7.4.3 Result of single-trial latency extraction
7.5 Parameter estimation and robustness
7.5.1 Parameter estimation
7.5.2 Validation
7.6 Discussion
7.7 Conclusion
Conclusion
Appendix
A Kronecker products
B Introduction to Graph-Cuts
C Time frequency analysis with Gabor filters
D Publications of the author
List of Tables
2.1 Review of non-commercial software computing the forward problem in M/EEG.
2.2 Sample geometry file for OpenMEEG.
2.3 Sample conductivity file for OpenMEEG.
2.4 Demo script for computing the forward problem with OpenMEEG in Python.
2.5 Output of Python demo script.
2.6 Output of testing procedure for OpenMEEG.
2.7 RDMs precision results with 42 vertices per interface.
2.8 RDMs precision results with 162 vertices per interface.
2.9 RDMs precision results with 642 vertices per interface.
2.10 MAGs precision results with 42 vertices per interface.
2.11 MAGs precision results with 162 vertices per interface.
2.12 MAGs precision results with 642 vertices per interface.
2.13 Computing an EEG leadfield with Fieldtrip and OpenMEEG.
3.1 Running an LCMV beamformer with EMBAL.
3.2 Running MUSIC with EMBAL.
3.3 Running a Minimum-Norm with EMBAL.
3.4 Running a Weighted Minimum-Norm with EMBAL.
3.5 Running the Gamma-MAP inverse solver with EMBAL.
4.1 Running an IRLS inverse solver with EMBAL.
4.2 Running a LASSO inverse solver using the LARS algorithm with EMBAL.
4.3 Running an inverse solver using proximity operators with EMBAL.
4.4 Running an inverse solver using proximity operators with EMBAL and a constraint on the reconstruction error.
4.5 Running an inverse solver with two priors (one non differentiable and an ℓ2 term) using proximity operators with EMBAL.
4.6 Running sparse inverse modeling with temporal data using proximity operators and EMBAL.
6.1 Edge weights, i.e., link capacities, of the graph for tracking on a triangulated mesh (no time).
6.2 Edge weights, i.e., link capacities, of the graph for tracking on a triangulated mesh.
7.1 Edge weights, i.e., link capacities, of the graph for robust time delay estimation.
7.2 Running the lag extraction pipeline on an EEGLAB dataset from the command line.
List of Figures
1.1 Main anatomical structures of the vertebrate brain. . . . . . . . . . . . . . . . . 32
1.2 Axial slide of the brain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.3 Standard naming conventions for planar slices through the brain. . . . . . . . . 33
1.4 Brain hemispheres. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.5 The different lobes of the cerebral cortex. . . . . . . . . . . . . . . . . . . . . . . 34
1.6 Main gyri. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.7 Cortical homunculus by Wilder Graves Penfield. . . . . . . . . . . . . . . . . . . 37
1.8 Cortical layers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.9 Brodmann areas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.10 Mountcastle’s experiment and cortical columns. . . . . . . . . . . . . . . . . . . 38
1.11 Diagram of a neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.12 Neurons observed with an electron microscope. . . . . . . . . . . . . . . . . . . . 39
1.13 From action potentials to post-synaptic potentials (PSP). . . . . . . . . . . . . . 40
1.14 Action potential propagation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.15 Pyramidal neurons in medial prefrontal cortex of macaque. . . . . . . . . . . . . 42
1.16 Diffusion MRI in the gray matter. . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.17 Dipole model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.18 Electric field produced by a current dipole. . . . . . . . . . . . . . . . . . . . . . 44
1.19 Magnetic field produced by a current dipole. . . . . . . . . . . . . . . . . . . . . . 45
1.20 EEG equipment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.21 Sample EEG recordings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.22 Standard positions for EEG electrodes. . . . . . . . . . . . . . . . . . . . . . . . . 46
1.23 An electric potential distribution measured with EEG. . . . . . . . . . . . . . . 47
1.24 MEG devices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.25 Magnetic field measured with MEG. . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.26 Spatiotemporal resolution and invasivity of brain functional imaging modalities. 49
1.27 Electrode implantation and recordings with sEEG. . . . . . . . . . . . . . . . . . 50
1.28 Sample fMRI activation map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.29 Sample PET activation map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.1 Dipolar approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.2 A spherical model with three layers. . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.3 Slice of a CT volume and an MRI volume. . . . . . . . . . . . . . . . . . . . . . . 66
2.4 A tetrahedral mesh of the head. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.5 Example of piecewise constant head model. . . . . . . . . . . . . . . . . . . . . . 70
2.6 Example of triangulated surface used as interface in the boundary element
method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.7 Head model with nested regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.8 Spherical head model with 5 dipoles close to the inner layer. . . . . . . . . . . . 84
17
18 LIST OF FIGURES
2.9 Evaluation of precision of different implementations of the BEM with three
layers spherical head models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.10 OpenMEEG computation times with parallel processing enabled. . . . . . . . . 86
3.1 Surface-based distributed dipolar sources illustration. . . . . . . . . . . . . . . . 94
3.2 L-curve in Minimum-Norm estimator. . . . . . . . . . . . . . . . . . . . . . . . . 99
3.3 Generalized Cross Validation with Minimum-Norm estimator. . . . . . . . . . . 100
3.4 Illustration of thresholded statistical map obtained with the dSPM and sLORETA.105
3.5 Illustration of a 300 mm2 cortical patch. . . . . . . . . . . . . . . . . . . . . . . . 108
3.6 GCV error vs. spatial resolution k in semilog scale. . . . . . . . . . . . . . . . . 108
3.7 γ-MAP convergence rates observed with the three update schemes. . . . . . . . 113
4.1 Graphical illustration of the difference between ℓ1 and ℓ2 norms. . . . . . . . . 120
4.2 Comparison of convergence speed between Landweber and Nesterov iterative
schemes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.3 Convergence of the optimization with constraint on the reconstruction error. . 128
4.4 Simulation result using a TV prior. . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.5 Evaluation of ‖.‖_{w;F} vs. ‖.‖_{w;212} vs. ‖.‖_{w;111} estimates on synthetic somatosensory
data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.6 Illustration of result on the primary somatosensory cortex. . . . . . . . . . . . . 140
4.7 Illustration of result on the primary visual cortex (V1). . . . . . . . . . . . . . . 141
4.8 Labeling results of the left primary somatosensory cortex in MEG. . . . . . . . 142
5.1 The path of the visual information from the eyes to the primary visual cortex. . 147
5.2 Schematic representation of the calcarine fissure in medial view (From 20th
U.S. edition of Gray’s Anatomy of the Human Body, 1918 (public domain)). . . . 148
5.3 Illustration of the retinotopic organization in V1. . . . . . . . . . . . . . . . . . . 149
5.4 Retinotopic organization of the primary visual cortex (V1). . . . . . . . . . . . . 149
5.5 Rings and wedges visual stimuli used for retinotopic mapping with fMRI. . . . 150
5.6 Polarity and eccentricity maps obtained by fMRI. . . . . . . . . . . . . . . . . . 151
5.7 Visual areas delineated by fMRI. . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.8 Circular checkerboard pattern used for visual stimulation in [151]. . . . . . . . 152
5.9 A normal pattern reversal VEP measured in EEG. . . . . . . . . . . . . . . . . . 153
5.10 Time-frequency plots obtained using a checkerboard pattern flickering at vari-
ous frequencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.11 Localization results obtained by Moradi in [151] with fMRI and MEG. . . . . . 156
5.12 Stimuli displayed for retinotopic mapping with MEG. . . . . . . . . . . . . . . . 157
5.13 Amplitude of the FFT at the fundamental frequency for different stimulation
frequencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.14 Amplitude of response of cat ON-center X ganglion cell. . . . . . . . . . . . . . . 159
5.15 A trial in the protocol for retinotopic mapping with MEG. . . . . . . . . . . . . . 159
5.16 Multi-taper example on a single-trial MEG measurement . . . . . . . . . . . . . 160
5.17 Multi-taper periodogram obtained with 3 different sizes of windows. . . . . . . 160
5.18 Power spectral density at 15 Hz represented on the sensors. . . . . . . . . . . . 161
5.19 Sample time frequency map estimated on the averaged signal measured on the
MLO11 sensor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.20 Example of histograms for non-parametric statistical tests. . . . . . . . . . . . . 165
5.21 Sample statistical map to be thresholded using a non-parametric statistical test. 168
5.22 Sample thresholded statistical map (p=0.05 with 15000 permutations). . . . . . 169
5.23 Color conventions for each condition represented at their position in the visual
field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.24 Retinotopic map result obtained with minimum-norm. . . . . . . . . . . . . . . 171
5.25 Comparison of retinotopic mapping results obtained on the GM-CSF and on the
WM-GM interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
5.26 Comparison of retinotopic map results obtained with a MN and with the ℓ_{212}
prior. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
5.27 Retinotopic map result obtained with an ℓ_{w;212} prior displayed on GM/CSF interface.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.28 Example of localization obtained with no orientation constraint. . . . . . . . . . 177
5.29 Illustration of phase locking value. . . . . . . . . . . . . . . . . . . . . . . . . . . 178
5.30 Example of phase lock map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.31 Sample phase map used for delay estimation. . . . . . . . . . . . . . . . . . . . . 180
5.32 GCV vs. L-Curve for retinotopic mapping (left hemisphere). . . . . . . . . . . . 182
5.33 GCV vs. L-Curve for retinotopic mapping (right hemisphere). . . . . . . . . . . 183
6.1 Schematic illustration of spatiotemporal active cortical regions. . . . . . . . . . 189
6.2 From thresholding to tracking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
6.3 Energy discretization on a triangulated mesh. . . . . . . . . . . . . . . . . . . . 192
6.4 Computation times of the tracking algorithm. . . . . . . . . . . . . . . . . . . . . 194
6.5 Result of tracking using the graph cut algorithm on a synthetic dataset. . . . . 195
6.6 Labeling errors obtained by the tracking algorithm for various pairs of regular-
ization parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.7 Result of tracking using the graph cut algorithm on the “Bunny” triangulation. 197
6.8 One block of successive frames used to produce expanding checkerboard rings. 198
6.9 Schematic representation of the cortical activation propagation produced by the
expanding checkerboard rings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.10 Experimental protocol for visual stimulation with the expanding checkerboard
rings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
6.11 Tracking results obtained with visual stimulation of expanding checkerboard
rings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.12 Comparison between naive thresholding and tracking with spatiotemporal reg-
ularization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.13 Result of tracking using the graph cut algorithm on somatosensory dataset. . . 204
6.14 Influence of the regularization on the tracking results on the somatosensory
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
7.1 Illustration of raster plot reordering on real EEG recordings. . . . . . . . . . . . 210
7.2 PCA analysis of a set of 500 jittered time series of 512 time samples. . . . . . . 211
7.3 Non-linear embedding into a low-dimensional Euclidean space . . . . . . . . . . 212
7.4 Illustration of manifold learning using graph Laplacian . . . . . . . . . . . . . . 216
7.5 Illustration of manifold learning using graph Laplacian on a synthetic dataset
with latency and scale variability. . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.6 Reordering results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.7 Spectral reordering results on synthetic data. . . . . . . . . . . . . . . . . . . . . 218
7.8 Spectral reordering results on EEG oddball time series. . . . . . . . . . . . . . . 219
7.9 Result of binary partitioning using the graph cut algorithm. . . . . . . . . . . . 220
7.10 Graph illustration for an image N × T (N = 3 time series of length T = 4) with
an example of minimal cut in red. . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
7.11 Evoked potentials illustrations using single-trial latency estimation. . . . . . . 223
7.12 E∗α as a function of r and σ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
7.13 Reordered raster plots with lags estimate for different values of α. . . . . . . . 225
7.14 Simulation results and errors estimates with different types of evoked responses. 226
C.1 Spectral support of a Gabor filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
C.2 Gabor atoms for different values of the oscillation parameter. . . . . . . . . . . 246
C.3 Sample time frequency map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
List of Abbreviations
AP Action potential
BEM Boundary Element Method
CNS central nervous system
ECD equivalent current dipole
EEG Electroencephalography
FDM Finite Difference Method
FDR False Discovery Rate
FEM Finite Element Method
fMRI Functional Magnetic Resonance Imaging
GCV generalized cross-validation
M1 Primary motor cortex
MEG Magnetoencephalography
MN Minimum-Norm
nIRS Near-infrared Spectroscopy
PET Positron Emission Tomography
PLV Phase locking value
PSP Post-synaptic potential
ReML Restricted Maximum Likelihood
S1 Primary somato-sensory cortex
SBL Sparse Bayesian Learning
SNR Signal to noise ratio
SVD Singular Value Decomposition
SymBEM Symmetric Boundary Element Method
TV Total variation
V1 Primary visual cortex
WMN Weighted Minimum-Norm
CONTEXT
With approximately 10^12 neurons in the central nervous system (CNS) and 10^15 synaptic
connections releasing and absorbing 10^18 neurotransmitter and neuromodulator molecules per
second, the human brain is an object of prodigious complexity. If it were a computer, it would
be capable of processing 10^12 gigabits of information per second, all in about 1.6 kg of weight and
with a power consumption of 10-15 watts [68]. The study of brain activity with medical
imaging methods is called functional neuroimaging.
In the last 30 years, neuroimaging has been a very active field of research. From 1985
to 2005, the number of related publications has increased by an order of magnitude. Func-
tional neuroimaging however has a history that dates back far earlier. Human brain activity
was first recorded by Hans Berger (1929) [98] who measured the first electroencephalogram
(EEG) in humans. Later, in the 60’s and 80’s, several other neuroimaging techniques were
introduced. The best known are magnetoencephalography (MEG), positron emission tomog-
raphy (PET), functional magnetic resonance imaging (fMRI) and near-infrared spectroscopy
(nIRS). fMRI is the most popular functional neuroimaging modality. One reason is that MRI
scanners used for anatomical imaging can also be employed for functional imaging. EEG with
its low instrumentation cost comes next, followed by PET, MEG, and nIRS. Although fMRI
is the most popular modality in the neuroimaging field, statistics show that MEG and EEG
research attracted growing interest in the 90’s, which can be explained by the improvement
of acquisition devices, by the development of MRI as an anatomical basis for M/EEG stud-
ies, and also by the development of new methods adapted from other research fields such as
signal and image processing, statistics, and scientific computing.
MEG and EEG (collectively M/EEG) are electromagnetic brain imaging modalities whose
interest comes from the electric nature of neuronal communications. Neurons communicate
with the displacements of electric charges that produce tiny currents. Neurons can be seen
as tiny current generators. In order to produce electromagnetic fields detectable outside of
the head, multiple neurons within the same structure need to act in concert. As opposed to
fMRI, which measures differences in blood oxygenation associated with neuronal activations,
M/EEG have direct and instantaneous access to the electric phenomena. Therefore,
MEG and EEG have an excellent temporal resolution.
In order to measure the electric potentials generated by neuronal activity, an EEG device
consists of a set of electrodes that are applied to the scalp so as to establish electrical contact
with the skin. Modern full-head EEG caps can have more than 200 electrodes.
The counterpart of EEG is MEG that measures the magnetic fields generated by the neuronal
activity. The first MEG measurements date back to the research of David Cohen in the 60’s
[35]; the first whole head MEG systems with hundreds of sensors capable of imaging the
entire brain became available in the early 90’s.
Traditionally, the EEG analysis has been based on inspection of the morphology of wave-
forms. As a matter of fact, most current practice of EEG in neurology is still based on these
first attempts. Typically, neurological clinics perform EEG examinations for epilepsy, sleep
disorder, migraine, and a few other pathological conditions for which the waveform bears di-
agnostic utility, as for spikes, spindles, generalized slowing, temporal theta, etc. Meanwhile,
basic electrophysiological research has taken a different path.
The development of digital computers, together with the advances of signal processing
methods helped transform M/EEG data analysis into a domain of research for engineers,
physicists and mathematicians. Assisted by the invention of the FFT algorithm in
1965 by Cooley and Tukey, frequency-domain analysis of EEG time series, such as power
spectral density estimation or phase coherence, has been used since the 60’s for cognitive
and clinical studies. Time-frequency analysis of event-related synchronization and
desynchronization (ERS/ERD) has provided means to study brain dynamics on the scale of tens
of milliseconds, preserving both spatial and spectral information. This has extended task-
related brain studies beyond evoked response potentials and morphology. More recently, in
the 90’s, the advances of anatomical MRI data, giving access to individual brain anatomy,
marked the transition into the era of functional localization of M/EEG activity.
Results of functional localization of M/EEG activity can be seen as 3D volumetric or 2D
surface images of the living brain. While a standard movie is streamed with 25 images per
second, accurate functional imaging with M/EEG could provide around a thousand images
of the brain per second. However, accurate functional localization of M/EEG activity with
a temporal resolution of 1 kHz is only a partially solved problem and remains a major challenge of
M/EEG data analysis. To reach this goal, various computational and mathematical challenges
need to be tackled, turning the study of brain activations with M/EEG into a strongly
multidisciplinary field of research at the crossroads of neurophysiology, signal processing,
electromagnetism, multivariate statistics, and scientific computing.
In this thesis, various mathematical and computational aspects of M/EEG data analysis
are covered, with the constant objective of being able to achieve accurate localization in space
and reconstruction of the dynamics of neural activity. Our contributions start by the accurate
modeling of the head as a medium that propagates the electromagnetic fields produced by the
neurons. This problem, known as the forward problem, has a unique solution. The solution
can be obtained analytically for spherical head models but requires numerical solvers when
realistic head models are considered. Improving the speed and accuracy of such solvers, but
also facilitating their usability in the M/EEG community is the topic covered by the first part
of this thesis.
To estimate the current generators underlying noisy M/EEG data, one has to solve an
electromagnetic inverse problem. Theoretically, a specific electromagnetic field pattern may
be generated by an infinite number of current distributions: the problem is said to
be ill-posed. Fortunately, physiological and anatomical information can be employed to
constrain the solution. In this thesis we focus on distributed inverse solvers. The use of such solvers
is motivated by their ability to provide localization results for activation patterns involving
multiple generators distributed over the entire brain. To tackle this challenging problem,
we provide efficient optimization methods that make the algorithms
tractable on real datasets. Our methodological contributions go beyond the inverse problem
by proposing a method to robustly follow the activations on the cortex over time. The main
motivation for the development of the methods detailed in this thesis was to contribute to the
study of human vision with MEG and EEG. Throughout this thesis, methods are tested with
real M/EEG data in order to prove their effectiveness and relevance for clinical and cognitive
M/EEG studies.
ORGANIZATION AND CONTRIBUTIONS OF THIS THESIS
Chapter 1 - Neural basis of EEG and MEG
The EEG and MEG signals are generated by the electrical activity of the neurons. At the
cellular level, displacements of electric charges create tiny differences of potential. In the
cortex, groups of neurons, particularly pyramidal neurons, form structured assemblies
that, when simultaneously active, produce electromagnetic fields detectable outside
of the head. Human EEG recordings date back to 1929 with the German physiologist and
psychiatrist Hans Berger, while the first MEG recordings were obtained in the late 60’s by
David Cohen. In this first chapter, we review the physiological basis of the generation of the
signal measured by MEG and EEG and provide some details on the evolution of acquisition
devices from their discovery to the most recent systems.
Chapter 2 - The forward problem
Understanding how a current generator located inside the head can produce a distribution of
potential on the scalp or a magnetic field outside of the head is called the forward problem.
Because of the low frequency of the signals measured with M/EEG, the time derivatives in
Maxwell’s equations can be neglected. In this quasi-static approximation, the forward modeling
implies that the signal measured on the sensors is the instantaneous sum of the signals
produced by each current generator. However, computing this linear operator, i.e., solving
the forward problem with a realistic head model can be mathematically and computationally
challenging. In this chapter, we review existing methods to solve the forward problem with
different assumptions for the conductor geometry of the head. With realistic head models the
solution is not known analytically and is obtained with numerical solvers. The first contribu-
tion of this thesis is on the efficient and precise numerical resolution of this problem using a
Boundary Element Method (BEM) called the Symmetric BEM. This approach is compared to
alternative open source solvers, demonstrating its excellent precision.
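
The linearity stated above can be illustrated with a minimal sketch. It assumes that the forward operator is available as a gain matrix G (sensors × sources); here G is filled with random numbers purely for illustration, whereas a real gain matrix would come from a forward solver. The names G, forward, x1 and x2 are hypothetical and not taken from the thesis software.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sensors, n_sources = 8, 20  # illustrative sizes, not taken from the thesis
# Hypothetical gain matrix; a real one would be computed by a forward solver.
G = rng.standard_normal((n_sensors, n_sources))


def forward(G, x):
    """Noise-free sensor measurements produced by the source amplitudes x."""
    return G @ x


# Two source configurations, each with a single active generator.
x1 = np.zeros(n_sources)
x1[3] = 2.0
x2 = np.zeros(n_sources)
x2[11] = -1.5

# Superposition: the field of both generators is the sum of the individual fields.
m_sum = forward(G, x1 + x2)
m_separate = forward(G, x1) + forward(G, x2)
superposition_holds = np.allclose(m_sum, m_separate)
print(superposition_holds)
```

In this picture, each column of G is the sensor pattern produced by one unit-amplitude generator.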
Chapter 3 - The inverse problem with distributed source models
While the forward problem provides the link between the measured signal and the neural
current generators, the inverse problem aims at estimating the positions and amplitudes of
these generators from a limited number of noisy measurements. Three types of approaches
exist: parametric methods, also referred to as dipole fitting, scanning techniques, and image-based
methods with distributed source models. The latter approach formulates the inverse
problem as a deconvolution problem where the convolution operator, or smoothing kernel,
is the solution of the forward problem. Such an approach offers a rigorous formulation of
the inverse problem without making strong modeling assumptions. However, the problem is
strongly ill-posed. Solving such problems classically requires imposing constraints
or priors on the solution. This chapter is dedicated to the presentation of priors based on
the ℓ2 norm. Implementation details and practical information are carefully detailed. The
presentation covers standard minimum-norm methods, noise normalized solutions (dSPM
and sLORETA), spatio-temporal solvers, and finally Bayesian approaches where the prior is
not fixed a priori but learned from the data.
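
As an illustration of an ℓ2 prior, the sketch below computes a generic Tikhonov-regularized (minimum-norm type) estimate in closed form. It is not the implementation used in this thesis: the random gain matrix G, the regularization value lam and the function name minimum_norm are assumptions made for the example.

```python
import numpy as np


def minimum_norm(G, m, lam):
    """Estimate argmin_x ||m - G x||^2 + lam * ||x||^2 in closed form.

    Uses x = G^T (G G^T + lam I)^{-1} m, which only requires solving a
    system of size n_sensors x n_sensors.
    """
    n_sensors = G.shape[0]
    return G.T @ np.linalg.solve(G @ G.T + lam * np.eye(n_sensors), m)


rng = np.random.default_rng(0)
G = rng.standard_normal((8, 20))  # hypothetical gain matrix
x_true = np.zeros(20)
x_true[5] = 1.0                   # a single focal source
m = G @ x_true + 0.01 * rng.standard_normal(8)

lam = 0.1                         # arbitrary regularization parameter
x_hat = minimum_norm(G, m, lam)

# Equivalent form written in source space, for comparison.
x_primal = np.linalg.solve(G.T @ G + lam * np.eye(20), G.T @ m)

# The estimate is linear in m, and it spreads energy over many sources.
n_active = np.count_nonzero(np.abs(x_hat) > 1e-8)
print(n_active)
```

In practice the regularization parameter would be chosen by a criterion such as the L-curve or generalized cross-validation rather than fixed by hand as here.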
Chapter 4 - M/EEG inverse modeling with non differentiable constraints and sparse
priors
Standard ℓ2 priors lead to very convenient linear inverse solvers but produce source estimates
smeared out over the cortex. The ℓ2 prior is said to lead to solutions with high diversity, as
opposed to solutions with high sparsity where only a few sources have non-zero activations.
Such a behavior of the ℓ2 norm can become problematic when one attempts to achieve pre-
cise localization of focal sources. In order to reduce this problem, Bayesian learning of the
prior can be an alternative. In this chapter, we investigate priors where the sparsity of the
reconstruction is induced by the choice of the prior. The ℓ1 norm has this interesting prop-
erty and has proved its ability to efficiently solve very challenging ill-posed problems in signal
processing and machine learning. Unfortunately, such a prior leads to non differentiable opti-
mization problems for which the solutions cannot be obtained in closed-form as in the ℓ2 case.
In this chapter, we review some algorithms that can be used to efficiently solve ill-posed prob-
lems involving the ℓ1 norm. We promote iterative algorithms based on the use of proximity
operators and show that they provide a very general approach for solving inverse problems
previously introduced in the M/EEG literature. We also explain how structured sparsity with
mixed norms can be used to provide an efficient spatiotemporal solver and develop a new
framework to compute source estimates for multiple experimental conditions simultaneously
using an inter-condition prior.
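
To make the role of proximity operators concrete, the sketch below shows the proximity operator of the ℓ1 norm (soft-thresholding) inside a basic proximal gradient iteration. It is a generic textbook scheme under stated assumptions, not the solver developed in chapter 4: the gain matrix is random, and the names soft_threshold, ista and objective are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximity operator of t * ||.||_1, applied entrywise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def objective(G, m, x, lam):
    """Cost 0.5 * ||m - G x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((m - G @ x) ** 2) + lam * np.sum(np.abs(x))


def ista(G, m, lam, n_iter=500):
    """Minimize the cost above by alternating gradient and proximal steps."""
    L = np.linalg.norm(G, 2) ** 2  # Lipschitz constant of the smooth part
    x = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = G.T @ (G @ x - m)                   # gradient of the data fit
        x = soft_threshold(x - grad / L, lam / L)  # proximal step
    return x


rng = np.random.default_rng(0)
G = rng.standard_normal((8, 20))  # hypothetical gain matrix
x_true = np.zeros(20)
x_true[[4, 15]] = [1.0, -2.0]     # two active sources
m = G @ x_true

lam = 0.1                         # arbitrary regularization parameter
x_hat = ista(G, m, lam)
print(np.count_nonzero(x_hat), objective(G, m, x_hat, lam))
```

The thresholding step sets small coefficients exactly to zero, which is how the ℓ1 prior induces sparsity. Replacing soft_threshold with another proximity operator changes the prior without changing the structure of the loop.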
Chapter 5 - Fast retinotopic mapping with MEG
This chapter presents a direct application of the previous chapters to a real case study. The
objective of this study was to achieve retinotopic mapping with MEG. The motivation for this
work was twofold. First, we wanted to demonstrate that MEG could reproduce the retinotopic
maps obtained by standard protocols in fMRI. Second, thanks to the excellent temporal res-
olution of MEG, we gain access to brain dynamics during visual processing. In this chapter,
we present the anatomical basis of the human visual system, detail the experimental protocol
we helped design, and describe the methodological tools we implemented in order to
provide retinotopic maps with MEG. The protocol is based on steady-state visual evoked po-
tentials. We discuss the algorithmic details of the signal extraction procedure and our method
for non-parametric statistical tests. We present results obtained with linear inverse solvers
and illustrate their limitations. To address these limitations, we propose to include all the
experimental conditions simultaneously in the analysis and to use an inter-condition sparse
prior based on a mixed norm described in the previous chapter. Finally, we give some insight
on how timings and delays of propagation could be extracted from the phase of the Fourier
spectrum of the source activation time series.
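
The idea of reading a delay from the phase of the Fourier spectrum can be sketched on a synthetic steady-state response. Everything below is assumed for illustration: the sampling frequency, the 15 Hz stimulation frequency, the 10 ms delay and the noise-free sinusoidal response. It is not the estimation procedure of chapter 5.

```python
import numpy as np

sfreq = 600.0     # sampling frequency in Hz (assumed)
f_stim = 15.0     # stimulation frequency in Hz (assumed)
delay = 0.010     # simulated response delay in seconds (assumed)
n_samples = 1200  # 2 s of signal, so 15 Hz falls exactly on a frequency bin

t = np.arange(n_samples) / sfreq
# Idealized steady-state response: a delayed cosine at the stimulation frequency.
signal = np.cos(2 * np.pi * f_stim * (t - delay))

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
k = np.argmin(np.abs(freqs - f_stim))  # bin of the fundamental frequency

amplitude = 2.0 * np.abs(spectrum[k]) / n_samples
phase = np.angle(spectrum[k])
# A delay d shifts the phase at f_stim by -2 * pi * f_stim * d.
estimated_delay = -phase / (2 * np.pi * f_stim)
print(amplitude, estimated_delay)
```

Such a phase-based estimate is only defined modulo one stimulation period (1/15 s in this example), so additional information is needed to resolve longer delays.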
Chapter 6 - Tracking cortical activations with graph cuts
The work presented in this chapter attempts to go one step further from source localization in
order to provide a clear representation of the cortical dynamics during neural processing. The
linear ℓ2 inverse solvers are convenient to use but produce huge amounts of data out of which
the relevant information needs to be extracted. The purpose of the contribution presented
in this chapter is to extract the spatio-temporally consistent activations from the mass of data
provided by distributed inverse solvers. The algorithm provides a robust and principled
way to track the “hot-spots”, i.e., active regions, over the triangulated cortical mesh. A vari-
ational formulation of the problem is derived and a very efficient optimization method based
on graph-cuts is detailed in order to find globally optimal solutions.
Chapter 7 - Graph-based estimation of 1-D variability in event related neural re-
sponses
The last contribution of this thesis addresses a particularly challenging problem in M/EEG
data processing: parameter estimation from single-trial data. In classical M/EEG data pro-
cessing pipelines, the signal-to-noise ratio of the measured data is improved by averaging
multiple recordings obtained under the same experimental conditions. By doing so, one as-
sumes that the signal of interest is the same in each repetition, also called a trial. This is
unfortunately not true, as the neural response of the subject can vary, typically because of
habituation effects, anticipation strategies, or fatigue. This is particularly the case for brain
responses occurring late after the stimulation. Such late activations can correspond to higher
cognitive levels of processing and are therefore of major interest to better understand how
our brain performs complex cognitive tasks. The method we propose relies on graph-based
techniques and has several advantages over alternative strategies: trial averaging is not used
in the estimation; the solutions are globally optimal, which avoids initialisation
problems; and, thanks to the efficiency of the method, parameters can be rapidly estimated
by cross-validation and grid search.
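
The effect of trial-to-trial variability on averaging can be illustrated with a small simulation. The sketch below is not the graph-based method of chapter 7; it only shows, on noise-free synthetic data, that averaging trials whose latency varies attenuates and widens the response. The Gaussian response shape, its width and the jitter range are assumptions. The sizes echo the 500 jittered time series of 512 samples mentioned in the caption of figure 7.2.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials, n_times = 500, 512
t = np.arange(n_times)


def evoked(latency, width=15.0):
    """Hypothetical single-trial response: a Gaussian bump of unit amplitude."""
    return np.exp(-0.5 * ((t - latency) / width) ** 2)


# Latency varies from trial to trial (uniform jitter of +/- 40 samples, assumed).
latencies = 256 + rng.integers(-40, 41, size=n_trials)
trials = np.stack([evoked(lat) for lat in latencies])

average = trials.mean(axis=0)
peak_single = trials[0].max()  # every single trial peaks at 1.0
peak_average = average.max()   # the average is smeared by the jitter
print(peak_single, peak_average)
```

Realigning the trials with respect to their estimated latencies before averaging would recover the single-trial amplitude, which is one motivation for estimating such parameters on single trials.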
Appendices
Appendix A - Kronecker products
This appendix is a brief introduction to the manipulation of Kronecker products. The Kro-
necker product is a valuable tool to manipulate spatiotemporal regularizations as illustrated
in chapter 3.
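
One standard identity that makes the Kronecker product useful for spatiotemporal problems is vec(A X B) = (B^T ⊗ A) vec(X), where vec stacks the columns of a matrix. The sketch below checks it numerically on random matrices. The matrix sizes and the helper vec are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))


def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")


lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

With this identity, an operator acting on the rows and columns of a space × time matrix can be rewritten as a single matrix acting on the vectorized unknown.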
Appendix B - Introduction to graph cuts
In this appendix, we present the basic concepts on graph cuts in order to facilitate the under-
standing of the optimization methods used in chapter 6 and chapter 7.
Appendix C - Time frequency analysis with Gabor filters
This appendix contains a description of the Gabor filters used to compute the time-frequency
analysis results presented in chapter 5.
Appendix D - Publications of the author
In this appendix, we list the submitted and the already published material from the author.
Software contributions
Finally, we would like to point out that all the algorithms presented in this thesis are avail-
able on the INRIA Forge.
The forward solver OpenMEEG detailed in chapter 2 is available at:
https://gforge.inria.fr/projects/openmeeg/
The Matlab interface we developed was integrated into the current release of Fieldtrip and is
available for download from the Fieldtrip home page:
http://fieldtrip.fcdonders.nl/
All the implementations of the inverse solvers presented in chapters 3 and 4, together with the
code to perform the tracking detailed in chapter 6, are available in a MATLAB Toolbox called
EMBAL (Electro-Magnetic Brain Activity Localization):
https://gforge.inria.fr/projects/embal
Most of the figures presented in this thesis were produced with the functions implemented in
EMBAL.
Finally the EEGLAB plugin to perform parameter estimation on single-trial M/EEG data
as described in chapter 7 is available here:
https://gforge.inria.fr/projects/eeglab-plugins/
CHAPTER 1
NEURAL BASIS OF EEG AND MEG
MEG and EEG measure the electromagnetic signal produced by the activity of our brain. To
provide more insight into the physiological phenomena behind M/EEG measurements, this
first chapter discusses the biological aspects of the functioning of the human brain.
Contents
1.1 Anatomy and electrophysiology of the human brain . . . . . . . . . . . . 32
1.1.1 General brain structures: From macro to nano . . . . . . . . . . . . . . . 32
1.1.2 How neurons produce electromagnetic fields . . . . . . . . . . . . . . . . 40
1.2 Instrumentation for MEG and EEG . . . . . . . . . . . . . . . . . . . . . . . 44
1.2.1 Electroencephalography (EEG) . . . . . . . . . . . . . . . . . . . . . . . . 44
1.2.2 Magnetoencephalography (MEG) . . . . . . . . . . . . . . . . . . . . . . 47
1.2.3 Other modalities for brain functional imaging . . . . . . . . . . . . . . . 47
1.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.1 ANATOMY AND ELECTROPHYSIOLOGY OF THE HUMAN BRAIN
1.1.1 General brain structures: From macro to nano
Together with the spinal cord, the brain forms the central nervous system (CNS). It is the
largest part of the nervous system and is itself composed of a lower part, the brainstem, and
an upper part, the prosencephalon, a.k.a. the forebrain. As shown in figure 1.1, the brainstem includes
the mesencephalon, the medulla, and the pons. It connects the two remaining structures
that form the prosencephalon, i.e., the telencephalon and the diencephalon, to the spinal cord.
The medulla, or lower part of the brainstem, controls unconscious activity of muscles and
glands involved in breathing, heart contraction, salivation, etc. Just above the medulla, the
pons connects the two hemispheres of the cerebellum which is located in the inferior posterior
portion of the head (directly dorsal to the pons). The diencephalon is located in the midline of
the brain and contains the thalamus and the hypothalamus. The most superior structure, the
telencephalon, or cerebrum, includes the lateral ventricles, the basal ganglia and the cerebral
cortex.
Figure 1.1: Main anatomical structures of the vertebrate brain (Source wikipedia.org).
An axial slice (see figure 1.3 for naming conventions) of the cerebrum presented in fig-
ure 1.2 exhibits two main structures: the white matter and the gray matter. The gray matter
of the cerebrum forms the cerebral cortex, a.k.a., the neocortex. The neocortex forms the
majority of the cerebrum and corresponds to its most exterior part. It has a left and a right
hemisphere (see figure 1.4). It is assumed that the neocortex is a recently evolved structure,
and is associated with “higher” information processing by more fully evolved animals (such
as humans, primates, dolphins, etc.).
Each hemisphere of the neocortex is generally divided into 4 lobes as represented in fig-
ure 1.5.
The following functions can be roughly related to each lobe:
Figure 1.2: Axial slice of the brain, with the white matter and the gray matter labeled (Adapted from: dartmouth.edu).
[Figure labels: sagittal, coronal and axial slices; medial, lateral right, lateral left; posterior, anterior; dorsal, ventral.]
Figure 1.3: Standard naming conventions for planar slices through the brain.
[Figure labels: left hemisphere, right hemisphere.]
Figure 1.4: Brain hemispheres. At first glance the two hemispheres are very similar but their
detailed structure is clearly different.
[Figure labels: frontal lobe, parietal lobe, temporal lobe, occipital lobe; central f., occipital f., Sylvian f., exooccipital ff.]
Figure 1.5: The different lobes of the cerebral cortex: the occipital, the parietal, the temporal,
and the frontal lobes (From 20th U.S. edition of Gray’s Anatomy of the Human Body, 1918
(public domain)).
• Frontal Lobe: associated with reasoning, planning, parts of speech, movement, emo-
tions, and problem solving.
• Parietal Lobe: associated with movement, orientation, recognition, perception of stim-
uli, and speech.
• Occipital Lobe: associated with visual processing.
• Temporal Lobe: associated with perception and recognition of auditory stimuli, memory,
and speech.
Lobes are separated by major fissures that are present in all individuals. This makes the
identification of the different lobes on a particular subject possible by simple visual inspec-
tion. For example, the parietal and the frontal lobes are separated by the central fissure,
a.k.a. the central sulcus, and the temporal lobe is separated from the parietal and
frontal lobes by the Sylvian fissure (cf. figure 1.5). Fissures are also commonly called sulci.
The counterpart of the cortical fissures are the gyri. Gyri are the structures between the
fissures. The main gyri are presented in lateral and medial views in figure 1.6. Some of the
gyri contain brain regions with known cognitive functions like the post-central gyrus that
includes the primary somatosensory cortex (S1), cf. figure 1.7.
Such knowledge of the localization of some brain functions is particularly interesting
from a methodological point of view as it provides a way to achieve validation. Many M/EEG
methodological tools are tested on datasets involving somesthetic stimulation. This is the
case, for example, in chapter 4 and chapter 6.
A closer look at the gray matter shows that its structure varies across the different
regions. The structural properties of the gray matter include the number of layers (see
figure 1.8), the cell composition, the thickness and the organization. These properties, which
neuroanatomists call cytoarchitectonic properties, are not the same over the whole surface of
the cortex. Their differences led, in 1909, the neuroanatomist Korbinian Brodmann to divide
the cortex into regions called Brodmann areas (see figure 1.9) whose histological
characteristics were homogeneous [25]. Some functions were then assigned to some of these areas.
For example the visual cortex, which is the object of an MEG study in chapter 5, corresponds to
areas 17 and 18. Although this subdivision of the cortex into Brodmann areas seems very
convenient, its utility in functional brain imaging studies is usually limited to labelling
a particular brain region: it is more convenient to write Brodmann area 5 (BA5) than the
“posterior part of the post-central gyrus”.
Generally speaking, most of the cortex is made up of six layers of neurons, from layer I
at the surface of the cortex to layer VI, close to the white matter. For humans, the cortical
thickness varies from 3 to 6 mm. The organization of the cortex is not only laminar. It
has been observed that neurons one meets when moving perpendicular to the cortex tend
to be connected to each other and to respond to precise stimulations with similar activities
throughout the layers. They form a cortical column. This columnar organization of the cortex
was discovered by Mountcastle with a pioneering experiment in 1957 [157]. With electrode
recordings, he showed that neurons inside columns 300 to 500 µm in diameter displayed
similar activities. This is illustrated in figure 1.10. More detailed information about cortical
structure and function can be found in [121, 124, 175].
The gray matter is composed of neurons and glial cells. The human brain contains around
10¹² neurons. The neurons are linked together and each neuron has up to 10000 connections.
The neuron is a cell with a special shape: it is composed of a soma or cell body, containing the
nucleus, a dendritic tree and an axon, as shown in figure 1.11. The white matter is formed
predominantly by myelinated axons interconnecting different regions of the central nervous
system.
36 CHAPTER 1. NEURAL BASIS OF EEG AND MEG
(a) Gyri lateral view
(b) Gyri medial view
Figure 1.6: Main gyri presented in lateral (a) and medial (b) views (From 20th U.S. edition of
Gray’s Anatomy of the Human Body, 1918 (public domain)).
Figure 1.7: Cortical homunculus by Wilder Graves Penfield [174]. It represents the mapping
of the primary somatosensory (S1) and primary motor (M1) cortices. S1 lies on the posterior
wall of the central sulcus (cf. post-central gyrus in figure 1.6(a)) and M1 on the anterior
wall. These maps were established by direct electrical stimulation on patients during surgery.
Primary auditory cortices (A1), left and right, are represented in the temporal lobes.
Figure 1.8: Cortical layers. Layer organization of the cortex (a) Weigert’s coloration shows
myelinated fibers (axons) and so the connections inside and between layers, (b) Nissl’s col-
oration only reveals cell bodies (c) Golgi’s coloration shows the whole cells (From [163]).
(a) Reproduction from original (lateral and medial surfaces)
(b) Lateral schematic view (From 20th U.S. edition of Gray’s Anatomy of the Human Body, 1918 (public domain))
(c) Medial schematic view (From 20th U.S. edition of Gray’s Anatomy of the Human Body, 1918 (public domain))
Figure 1.9: Brodmann areas. In 1909, Brodmann [25] divided the cortex into 52 cytoarchitec-
tonic areas according to the thickness of the cortical layers. For example, layer IV is very thin
in the primary motor cortex (area 4) while it is very thick in the primary visual cortex (area
17).
Figure 1.10: Mountcastle’s experiment and the discovery of the columnar organization of the
cortex. When he moved an electrode perpendicular to the cortex surface, he encountered
neurons with similar electrical activities, while moving the electrode obliquely gave him
different types of recordings. He thus showed the existence of 300-500 µm wide columns in the
cortex.
Figure 1.11: Diagram of a neuron showing the dendrites, cell body, nucleus, axon, myelin
sheath, Schwann cells, nodes of Ranvier and axon terminals (Source: wikipedia.org).
Figure 1.12: Neurons observed with an electron microscope.
1.1.2 How neurons produce electromagnetic fields
A neuron can be viewed as a signal receiver, processor and transmitter: the signal coming
from the dendrites is processed at the soma and generates (or not) an action potential which
is carried along the axon towards other neurons. During this process, neurons produce the
electromagnetic fields that are the basis of the M/EEG measurements.
The signals in the dendrites are called post-synaptic potentials (PSPs). The signal emitted,
moving along the axon, is called the action potential (AP).
Post-synaptic potential (PSP)
The junction between the axon terminal of a neuron and a dendrite or the soma of another
neuron is called a synapse. It can be a direct electrical junction, but synapses are mostly
chemical: when an action potential reaches the end of an axon terminal, it leads to the release
of neurotransmitters. Neurotransmitter molecules that reach another neuron affect the
membrane permeability so that specific ions (Na+ and K+) penetrate inside, raising the
potential above its resting state by about 10 mV for about 10 ms. This is called a
post-synaptic potential, shown in figure 1.13.
Action potential
If many post-synaptic potentials sum up, the membrane potential of the soma can locally
reach a certain threshold which causes the neuron to “spike”: some voltage-sensitive chan-
nels open, allowing positive ions to flow inside the cell, and the potential inside the neuron
increases suddenly. The potential comes back rapidly to its resting state (in 1 ms), with the
help of other voltage-sensitive channels that allow a compensating outward current. Because
of this peak of potential, the nearby regions also reach the threshold: the action potential
thus propagates along the axon, as illustrated in figure 1.14. See [125] for more details on
the ion mechanisms responsible for these two types of potentials.
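As a rough illustration of this summation, the following Python sketch adds up PSPs modeled as instantaneous 10 mV jumps that decay exponentially with a 10 ms time constant, and reports when the sum first crosses a threshold. The exponential shape, the 25 mV threshold and the function names are assumptions made for this example only; real membrane dynamics are considerably richer.

```python
import math


def membrane_depolarization(arrival_times_ms, t_ms, amplitude_mv=10.0, tau_ms=10.0):
    """Summed depolarization (mV) at time t_ms from all PSPs that already arrived.

    Toy model (assumption): each PSP jumps by amplitude_mv on arrival and
    then decays exponentially with time constant tau_ms.
    """
    total = 0.0
    for t0 in arrival_times_ms:
        if t_ms >= t0:
            total += amplitude_mv * math.exp(-(t_ms - t0) / tau_ms)
    return total


def first_spike_time(arrival_times_ms, threshold_mv=25.0, dt_ms=0.1, t_max_ms=100.0):
    """First sampled time (ms) at which the summed PSPs reach the threshold.

    threshold_mv is a hypothetical value chosen for illustration.
    Returns None if the threshold is never reached before t_max_ms.
    """
    steps = int(round(t_max_ms / dt_ms))
    for n in range(steps + 1):
        t = n * dt_ms
        if membrane_depolarization(arrival_times_ms, t) >= threshold_mv:
            return t
    return None


# Three PSPs arriving within 2 ms overlap and sum up; three PSPs spread
# over 100 ms have mostly decayed before the next one arrives.
synchronous = first_spike_time([0.0, 1.0, 2.0])
sparse = first_spike_time([0.0, 50.0, 100.0])
print("synchronous inputs spike at:", synchronous, "ms")
print("sparse inputs spike at:", sparse)
```

In this toy setting only the near-synchronous inputs reach the threshold, which is the qualitative point made above.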
Figure 1.13: From action potentials to post-synaptic potentials (PSP). Illustration with a
chemical synapse. The action potentials reach the neuron on its dendrites via chemical
synapses. They create post-synaptic potentials that, by summation, generate other action
potentials that can propagate along the axon of the neuron.
These two types of potentials create displacements of charges and therefore very small
currents within the neuron: the intracellular or primary currents. These currents,
however, create very tiny electromagnetic fields that cannot be directly measured outside of
the head with M/EEG. In order to have measurable signals, these tiny fields need to sum
up. Action potentials last about one millisecond, which makes them hard to synchronize so
that they sum up. In contrast, PSPs last around 10 ms. This makes PSPs much better
candidates to produce measurable electromagnetic fields outside the head. This temporal
overlap is a necessary but not sufficient condition to get good M/EEG signals. Electrical
currents are vector quantities: they have both an amplitude and a direction. In order to
actually sum up, the currents produced by the neurons need to have a common direction.
Following the conclusions of [159], it is necessary to add the field amplitudes of about 10⁴
neurons with dendrites having a common direction to produce a field amplitude that is
detectable from outside the head. For instance, stellate cells, which have dendrites in all
directions, cannot produce a measurable field. Only neurons called pyramidal cells have the
regular geometric organization that is required to sum up the fields generated by their
post-synaptic potentials.
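The role of a common direction can be illustrated with a small numerical sketch. It sums unit vectors standing in for elementary currents, once with all vectors aligned and once with directions drawn uniformly at random. The unit amplitudes, the random seed and the helper names are assumptions made for this example; it is not a model of real neurons.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def net_amplitude(vectors):
    """Euclidean norm of the vector sum."""
    sx = sum(v[0] for v in vectors)
    sy = sum(v[1] for v in vectors)
    sz = sum(v[2] for v in vectors)
    return math.sqrt(sx * sx + sy * sy + sz * sz)


n = 10_000                     # about 10^4 elementary currents
rng = random.Random(0)         # fixed seed, arbitrary choice

aligned = [(0.0, 0.0, 1.0)] * n                        # common direction
scattered = [random_unit_vector(rng) for _ in range(n)]  # no common direction

aligned_amp = net_amplitude(aligned)      # grows like n
scattered_amp = net_amplitude(scattered)  # grows like sqrt(n) on average

print("aligned sum:  ", aligned_amp)
print("scattered sum:", scattered_amp)
```

With aligned currents the net amplitude scales with the number of sources, whereas randomly oriented currents largely cancel, which is why an open, regular geometry matters.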
Pyramidal neurons
The bodies and dendrites of pyramidal neurons are located mostly in the gray matter of the
cortex, and they all have a thick dendrite (called the apical dendrite) extending towards the
exterior of the cortex, perpendicular to its surface, as shown in figure 1.15. These neurons
constitute about 70%-80% of the neurons of the neocortex, and their density is such that,
theoretically, the simultaneous activation of an area of 1 mm² of the cortex would be
detectable. However, an experimental study showed that the minimal detectable activity spreads
over an area of about 100 mm² [100].
(a) Pyramidal neurons
(b) Pyramidal neurons and the produced intracellular currents.
Figure 1.15: Pyramidal neurons in medial prefrontal cortex of macaque (Source brain-
maps.org).
This structured organization of pyramidal cells was discovered by invasive studies
that provided experimental results like the one presented in figure 1.15(a). Nowadays, thanks
to the progress of brain imaging techniques like MRI, and more precisely diffusion MRI, this
organization can be observed non-invasively. Diffusion MRI offers the possibility to measure
the anisotropy of the diffusion of water molecules in living tissues. This is presented in
figure 1.16. In order to obtain images with such a good signal-to-noise ratio, the acquisition
was performed ex-vivo with a very long scanning time. Note that the principal directions
of diffusion in the gray matter follow the organization of the cortical layers and the general
structure of pyramidal neuron assemblies.
(a) Principal eigenvector directions from tensor fitting superimposed on T1 MRI
(b) Tractlets rendered in 3D (courtesy of Gordon Kindlmann)
Figure 1.16: Principal directions of water molecule diffusion estimated with tensor fitting
on diffusion MRI. Orientations appear to be very well organized, with directions given by the
normals to the cortical mantle. (Data: Dr J McNab & Dr K Miller, FMRIB, Oxford, 3T Siemens
ex-vivo whole-head diffusion imaging, 0.7 × 0.7 × 0.7 mm).
Models of brain electric activity for EEG and MEG
The consequence of the latter observations for EEG and MEG is that the brain activity is
observed at a macroscopic scale with respect to the size of a neuron. EEG and MEG capture the
electrical activity of structured assemblies of neurons. The typical size of the neuron
assemblies observable with EEG or MEG is larger than the size of cortical columns but smaller
than the size of a cortical area. For the last three decades, neuroscientists have built models
of neuron assemblies [48, 76, 113, 199, 226, 237], based on the knowledge of neuronal
dynamics, but these dynamics are far from being fully understood.
These observations lead us to the problem of modeling brain electric activity for EEG and
MEG. The main assumption is that the measurements correspond to the activity of one or
several assemblies of neurons. For one assembly, the EEG or MEG measurements only reflect
its average activity, and the intrinsic dynamics of the group of neurons is usually unknown.
As a consequence, for EEG and MEG, the most common model of the brain activity assumes
that each source reflects the average activity within an assembly of neurons. The intrinsic
dynamics of an assembly of neurons is hidden due to this averaging. Note that such a model
agrees with the columnar organization of the cortex mentioned above. As explained in section
2.2.1, the area of a neuron assembly is small compared to the distance to the observation point
(the M/EEG sensors). Therefore, the electromagnetic fields produced by an active neuron
assembly at the sensor level are very similar to the fields produced by a current dipole. As a
first approximation, this makes current dipoles relatively good models for active brain regions
(cf. figure 1.17).
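The quality of this approximation can be checked numerically with a short sketch. It assumes an unbounded homogeneous conductor, in which the potential of a current dipole with moment q located at r0 is V(r) = q · (r − r0) / (4πσ|r − r0|³), and compares the summed potential of a 1 cm² patch of elementary dipoles with the potential of a single dipole placed at the patch center. The patch geometry, the 10 cm observation distance, the 0.3 S/m conductivity and all function names are assumptions made for this illustration; a real head is neither unbounded nor homogeneous.

```python
import math

SIGMA = 0.3  # S/m, assumed average conductivity of the medium


def dipole_potential(position, moment, sensor, sigma=SIGMA):
    """Potential (V) of a current dipole in an unbounded homogeneous conductor."""
    r = [s - p for s, p in zip(sensor, position)]
    dist = math.sqrt(sum(c * c for c in r))
    dot = sum(m * c for m, c in zip(moment, r))
    return dot / (4.0 * math.pi * sigma * dist ** 3)


def make_patch(half_width=0.005, step=0.001, moment=(0.0, 0.0, 1e-9)):
    """Square patch of identical elementary dipoles, all normal to the patch."""
    n = int(round(half_width / step))
    return [((i * step, j * step, 0.0), moment)
            for i in range(-n, n + 1) for j in range(-n, n + 1)]


def equivalent_dipole(sources):
    """Single dipole at the centroid of the sources, carrying the summed moment."""
    count = len(sources)
    centroid = tuple(sum(pos[k] for pos, _ in sources) / count for k in range(3))
    total = tuple(sum(mom[k] for _, mom in sources) for k in range(3))
    return centroid, total


sources = make_patch()
sensor = (0.0, 0.0, 0.1)  # observation point 10 cm above the patch

exact = sum(dipole_potential(pos, mom, sensor) for pos, mom in sources)
ecd_position, ecd_moment = equivalent_dipole(sources)
approx = dipole_potential(ecd_position, ecd_moment, sensor)
relative_error = abs(approx - exact) / abs(exact)

print("relative error of the single-dipole approximation:", relative_error)
```

For this geometry the relative error stays well below one percent, and it shrinks further as the observation point moves away from the patch.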
Assuming the simple dipolar model for current generators whose activity is measured by
M/EEG, the electric and magnetic fields produced by an active brain region can be
schematically represented as in figure 1.18 and figure 1.19. The summation of the neural
currents produced by elementary generators can be approximated by an equivalent current dipole
(ECD). The electromagnetic fields produced by this ECD are strong enough to be measured
outside the head. This raises the question of how to measure these fields.
Figure 1.17: The activity of a small region of the brain can be approximated by a current
dipole. The position of the dipole (the dot) is at the center of the activated cortex area (in red)
and the moment of the dipole (the green arrow) corresponds to the average orientation of the
pyramidal neurons in this region (perpendicular to the cortical surface).
Figure 1.18: Electric field produced by neural currents modeled by an equivalent current
dipole (ECD)
1.2 INSTRUMENTATION FOR MEG AND EEG
1.2.1 Electroencephalography (EEG)
The first human EEG recordings date back to the measurements made by the German
physiologist and psychiatrist Hans Berger in 1929. The recording is obtained by placing
electrodes, which measure the electric potential, on the scalp of the subject (cf. figure 1.20).
Figure 1.19: Magnetic field produced by neural currents modeled by an equivalent current
dipole (ECD)
(a) EEG recordings in 1949
(b) Modern EEG recordings (Odyssee project team, INRIA Sophia Antipolis)
Figure 1.20: EEG equipment: the electrode helmet is placed on the head of the subject, then
the signal is processed through an amplifier.
To ensure consistency among different laboratories, a standard electrode placement scheme
was proposed by Jasper in 1958 [115], based on anatomical landmarks of the head
(see figure 1.22). This standardization marked the beginning of modern
electroencephalography. The number of electrodes used in research has increased over the years
from around 19 in Jasper’s time to as many as 512 today. However, the 10-20 system with 19
electrodes is still the dominant standard in clinical settings, and most research is carried
out with 19 to 64 electrodes.
In a modern EEG system, the electrodes are connected to an amplifier and the signals are
then digitized and stored on a computer. Signals measured by EEG sensors are on the order
of a few µV. An example of EEG recordings is presented in figure 1.21.
The advantages of this device are its simplicity and low cost. Unfortunately, the low
conductivity of the skull tends to smear the electric potential. As illustrated in figure 1.23,
the potential at the surface of the scalp only roughly reflects the underlying brain activity.
Figure 1.21: Sample EEG recordings. Each time series is the signal measured by one
electrode. Electrodes are named (e.g., FP1, F3, C3, etc.) according to their position on the
scalp (cf. figure 1.22).
Figure 1.22: The international 10-20 system seen from (A) left and (B) above the head. A =
Ear lobe, C = central, Pg = nasopharyngeal, P = parietal, F = frontal, Fp = frontal polar, O =
occipital. (C) Location and nomenclature of the intermediate 10% electrodes, as standardized
by the American Electroencephalographic Society. (Adapted from [67]).
(a) 3D topography
(b) 2D topography
Figure 1.23: The electric potential distribution measured with EEG on a somato-sensory
experiment 20 ms after stimulation (Adapted from [211]).
1.2.2 Magnetoencephalography (MEG)
The magnetic counterpart of EEG, the magnetoencephalogram, was recorded 40 years later
(1968), using room temperature coils and signal averaging on the basis of EEG [35].
Further progress in MEG required highly sensitive magnetic detectors based on superconducting
and quantum phenomena, called SQUIDs (superconducting quantum interference devices).
In 1969, Zimmerman and colleagues developed the first SQUIDs. They were first used
for MEG in 1972 by David Cohen [36]. After this pioneering work, the field of MEG
developed first by using single-channel devices, followed by somewhat larger systems with 5 to 7
channels in the mid 1980s, then systems with arrays of 20 to 40 sensors in the late 1980s and
early 1990s. The first MEG systems with a helmet covering the entire cortex were introduced
in 1992. Today, MEG systems have several hundred channels in a helmet arrangement (see
figure 1.24), which makes it possible to capture the signal originating from the whole brain
simultaneously. More details can be found in [100, 217].
MEG measurements span a frequency range from about 10 mHz to 1 kHz and field
magnitudes from about 10 fT for spinal cord signals to several pT for brain rhythms. To
realize how small the MEG signals are, it should be recalled that the Earth’s field magnitude
is about 0.05 mT (50 µT) and the urban magnetic noise about 1 nT to 1 µT, which is a
factor of 1 million to 1 billion larger than the MEG signals. Such large differences between
signal and noise demand noise cancellation with extraordinary accuracy.
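To make these orders of magnitude concrete, the following sketch computes the ratio between urban magnetic noise and MEG signal amplitudes, both as a power of ten and in decibels. The value of 1 pT used for brain rhythms is an assumed representative figure for "several pT", and the function names are chosen for this example only.

```python
import math

FEMTO, PICO, NANO, MICRO = 1e-15, 1e-12, 1e-9, 1e-6

# Field magnitudes in tesla, taken from the ranges quoted in the text.
meg_fields = {"spinal cord": 10 * FEMTO, "brain rhythms": 1 * PICO}
urban_noise = {"quiet site": 1 * NANO, "noisy site": 1 * MICRO}


def orders_of_magnitude(noise_tesla, signal_tesla):
    """log10 of the noise-to-signal amplitude ratio."""
    return math.log10(noise_tesla / signal_tesla)


def required_attenuation_db(noise_tesla, signal_tesla):
    """Attenuation (dB, amplitude convention) that brings noise down to signal level."""
    return 20.0 * math.log10(noise_tesla / signal_tesla)


for site, noise in urban_noise.items():
    for source, signal in meg_fields.items():
        print(
            f"{site} vs {source}: "
            f"10^{orders_of_magnitude(noise, signal):.0f} "
            f"({required_attenuation_db(noise, signal):.0f} dB)"
        )
```

Even at a quiet site, the noise exceeds brain signals by several orders of magnitude, which motivates the shielding and sensor designs discussed next.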
An MEG system is very expensive compared to EEG, because the SQUID sensors need to
operate at very low temperature, and for this reason are immersed in liquid helium. Moreover,
a magnetically shielded room is most often necessary to use the system. The main advantage of
magnetic field measurements is that they are much less sensitive to the detailed conductivity
geometry of the head than the electric potential. The magnetic field observed outside the
head offers a more precise representation of the underlying brain activity, see figure 1.25 in
comparison to figure 1.23(b). That is why, in spite of their high cost, MEG systems are very
attractive for the exploration of the human brain.
1.2.3 Other modalities for brain functional imaging
Brain functional imaging modalities can be classified in two categories: direct and indirect
measures of the neuronal activity. The direct measures, like M/EEG, provide access to the
electrical activity. Indirect measures estimate the brain activations only via the metabolic
and hemodynamic processes caused by the actual neuronal activations.
(a) Schematic representation of historical MEG device with a small number of sensors
(b) Schematic representation of full head MEG device
(c) Recent MEG device (Magnetoencephalography center, La Timone, Marseille)
Figure 1.24: MEG devices. SQUID sensors are immersed in liquid helium.
Figure 1.25: Magnetic field measured with MEG on a somato-sensory experiment. It is a 2D
topography 20 ms after stimulation. Image obtained from the data used in [150].
Figure 1.26: Spatiotemporal resolution and invasivity of brain functional imaging modalities
(sEEG, MEG, EEG, fMRI, MRI(a,d), PET, SPECT and nIRS). Axes: spatial resolution (mm),
temporal resolution (ms) and invasivity (weak to strong).
Neuroimaging modalities each have characteristic features. They can be classified
in terms of spatial resolution, temporal resolution and invasivity. This is summarized in
figure 1.26.
Stereo-electroencephalography (sEEG)
Like M/EEG, stereoelectroencephalography (sEEG) provides access to the currents produced
by the neuronal activity. By surgically implanting depth electrodes into the brain tissues,
sEEG records the electrical potentials directly within the cortical layers. Electrodes are a
few centimeters long and contain multiple contacts. Each contact records the local electric
potential. Large deflections in the signal waveforms, typical of sEEG recordings, are observed
around the location of the activation (cf. figure 1.27).
In the treatment of epilepsy, this ability to precisely locate the origin of a neuronal
activation helps to define the boundaries of the “epileptogenic zone”, i.e., the area of the
brain generating the epileptic seizures. It can be necessary to surgically resect this area to
stop the seizures. This technique was introduced by the group of the Ste Anne Hospital,
Paris, France, in the second half of the 20th century [111, 200].
However, this technique has some drawbacks. The access to neuronal currents is
invasive and the number of electrodes limits the recordings to very specific brain regions. In
comparison, the spatial resolution of M/EEG is more limited, but M/EEG records widely
distributed cortical activations and is therefore not restricted to predefined brain regions.
Functional magnetic resonance imaging (fMRI)
Functional magnetic resonance imaging, or fMRI, works by detecting the changes in blood
oxygenation and flow that occur in response to neural activity. An active brain area
consumes more oxygen. To meet this increased demand, blood flow in the active area increases.
Functional MRI can be used to produce volumetric activation maps showing which parts of
the brain are involved in a particular mental process (cf. figure 1.28).
Figure 1.27: Electrode implantation and recordings with sEEG. (Reproduced from [72]).
Oxygen is delivered to neurons via haemoglobin carried by red blood cells. Haemoglobin
is diamagnetic when it is oxygenated while it is paramagnetic when deoxygenated. This
difference in magnetic properties leads to small differences in the MR signal. Since blood
oxygenation varies according to the levels of neural activity these differences can be used to
detect brain activity. This type of MRI is known as blood oxygenation level dependent (BOLD)
imaging.
One point to note is that the blood oxygenation increases following neural activation with
a delay of a few seconds. Because fMRI measures neural activation indirectly, its temporal
resolution is limited to the time scales of the measured hemodynamic processes. See [184] for a
historical perspective on fMRI development.
Positron Emission Tomography (PET)
Positron emission tomography (PET) is a nuclear medicine imaging technique which produces
a three-dimensional image of brain activations. The system detects pairs of gamma rays
emitted indirectly by a positron-emitting radionuclide, a tracer, which is injected into the
body on a biologically active molecule. Images of tracer concentration in 3D space within
the brain are then reconstructed by computer analysis, as illustrated in figure 1.29. Without
going into details, tracers used for brain PET scanning focus on the glucose consumption of
the different brain regions. Like fMRI, it gives access to neural activity indirectly via the
measurements of metabolic processes, but contrary to fMRI it requires the injection of a
radioactive tracer, which makes it invasive.
Figure 1.28: Sample fMRI activation map. The fMRI statistics (yellow) are overlaid on an
average of the brain anatomies of several humans (source: wikipedia.org)
Figure 1.29: Sample PET activation map (source: wikipedia.org).
Single Photon Emission Computed Tomography (SPECT)
Single Photon Emission Computed Tomography (SPECT) is similar to PET in its use of
radioactive tracer material and detection of gamma rays. In contrast to PET, however, the
tracer used in SPECT emits gamma radiation that is measured directly, whereas a PET tracer
emits positrons which annihilate with electrons up to a few millimeters away, causing two
gamma photons to be emitted in opposite directions. A PET scanner detects these emissions
“coincident” in time, which provides more radiation event localization information and thus
higher resolution images than SPECT. SPECT scans, however, are significantly less
expensive than PET scans, in part because they are able to use longer-lived, more easily obtained
radioisotopes than PET.
Optical imaging with near-infrared spectroscopy (nIRS)
Near-infrared spectroscopy (nIRS) uses near infrared light to measure the absorption of
haemoglobin. It relies on the fact that the absorption spectrum of haemoglobin varies with its
oxygenation status and, as a consequence, with the level of neural activity in a specific brain
region. It has the ability to measure both deoxygenated and oxygenated haemoglobin,
which is of particular interest for understanding hemodynamic processes. nIRS is more
convenient than fMRI with babies, for whom the skull is more transparent. It is also less noisy
than fMRI systems and therefore better suited to children. However, nIRS is restricted to
superficial sources. See for example [19, 88] for more details.
1.3 CONCLUSION
As the electric activity of the neurons produces an electromagnetic field and, more
importantly, because the organization of neural assemblies enables the summation of these
fields, it is possible to detect and measure the brain activity outside of the head.
This offers the possibility to directly measure the neuronal activity in a non-invasive way.
The high temporal resolution of M/EEG measurements makes them particularly interesting
compared to other brain functional imaging modalities that are limited by the time scales
of metabolic and hemodynamic processes. More than simply localizing the origin of the mea-
sured signal, M/EEG offer the possibility to investigate the dynamics of the cortical processing
involved in different cognitive tasks.
In order to use M/EEG for brain functional imaging, some modeling and computation
need to be done. Prior to any localization of activation, the way the neuronal currents and
electromagnetic fields propagate within the different head tissues needs to be modeled. This
aspect of the work, which consists in quantifying how the neuronal activations produce a signal
on a given sensor, involves physical considerations and is referred to as the direct (or forward)
problem. This problem is the subject of the following chapter. The aspect that consists in
localizing the activations based on the measured signal is called the inverse problem and will
be treated in chapters 3 and 4.
CHAPTER 2
THE FORWARD PROBLEM
In chapter 1, it was explained how neurons can induce electromagnetic fields that can be mea-
sured non-invasively outside of the head. The problem that consists in modeling the head in
order to compute the electric potential or the magnetic field that should be produced by a
given configuration of generators at the sensor level is called the forward problem. The solu-
tion of this problem is the first step in the M/EEG processing pipeline whose final objective
is the localization of brain activations. The accuracy of the solution of the forward problem is
fundamental in order to provide good localization results.
In the first part of this chapter, we review the equations and methods for solving the
forward problem. We then detail the software contributions made in this thesis, mainly in
the OpenMEEG project, which implements the symmetric BEM presented in this chapter. We
also provide a list of open source software projects that can be used for solving the M/EEG
forward problem. Finally, we provide some numerical evaluations that demonstrate that the
precision obtained by OpenMEEG clearly improves over competing implementations.
Contents
2.1 The physics of EEG and MEG
2.1.1 Maxwell’s equations
2.1.2 Quasi-static approximation
2.1.3 The electric potential equation
2.1.4 The magnetic field equation: the Biot-Savart law
2.2 Unbounded homogeneous medium
2.2.1 Dipolar sources
2.2.2 Multipolar sources
2.3 The spherically symmetric head model
2.3.1 Electric potential generated by a dipole
2.3.2 The magnetic field
2.3.3 Magnetic field generated by a multipole
2.3.4 Limits of spherical models
2.4 Realistic head models
2.4.1 The Finite Difference Method (FDM)
2.4.2 The Finite Element Method (FEM)
2.4.3 The Boundary Element Method (BEM)
2.4.4 The Symmetric Boundary Element Method (SymBEM)
2.5 Implementation
2.6 Software
2.6.1 Review of non commercial available software
2.6.2 OpenMEEG
2.7 Conclusion
2.1 THE PHYSICS OF EEG AND MEG
Notations
All vectors are denoted in bold characters. The vector indicating the position of a point
r of $\mathbb{R}^3$ is denoted by $\mathbf{r}$. In the following, we use vector calculus notation,
with the “nabla” operator $\nabla$. For a real function $f(\mathbf{r})$, $\nabla f$ is the gradient
of $f$. For a vector field $\mathbf{X}(\mathbf{r})$, $\nabla \cdot \mathbf{X}$ is the divergence of
this field (a scalar) and $\nabla \times \mathbf{X}$ is the curl of this field (a vector).
2.1.1 Maxwell’s equations
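Maxwell’s equations relate the electromagnetic field to the charge density and current
density. We denote by $\mathbf{E}$ the electric field, $\mathbf{B}$ the magnetic field, $\rho$ the
charge density and $\mathbf{J}$ the current density. Maxwell’s equations are a set of four partial
differential equations:
$$
\begin{aligned}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\epsilon} \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \\
\nabla \cdot \mathbf{B} &= 0 \\
\nabla \times \mathbf{B} &= \mu \left( \mathbf{J} + \epsilon \frac{\partial \mathbf{E}}{\partial t} \right)
\end{aligned}
\tag{2.1}
$$
where $\epsilon$ is the electrical permittivity of the medium and $\mu$ is the magnetic permeability.
For human tissues, the magnetic permeability $\mu$ is the same as for vacuum, $\mu = \mu_0$,
whereas the relative electrical permittivity $\epsilon_r = \epsilon / \epsilon_0$ varies a lot
depending on tissue and frequency. For instance, at a frequency of 100 Hz, $\epsilon_r$ is around
$4 \times 10^{6}$ for gray matter, $5 \times 10^{5}$ for fat and $6 \times 10^{3}$ for compact bone [83].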
2.1.2 Quasi-static approximation
As described in section 1.1, the post-synaptic potentials have a duration of about 10 ms. As a
consequence, it is commonly accepted that the temporal frequencies of the brain electromagnetic
field that can be observed outside the head rarely exceed 100 Hz. For such low frequencies,
the time derivatives in Maxwell’s equations can be neglected; this is called the quasi-static
approximation.
A justification of the quasi-static approximation can be found in [100]. Let us illustrate it
with some orders of magnitude. We know that in a simple medium, the general solution of
the electromagnetic wave equation can be written as a linear superposition of planar waves
of different frequencies and polarizations. Let us just consider one planar wave for the sake
of simplicity. Its equation is:
$$
\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \, e^{i 2\pi \mathbf{k} \cdot \mathbf{r}} \, e^{i 2\pi f t} , \tag{2.2}
$$
where i is the imaginary unit, E0 is a real amplitude vector contained in the wave plane, k is
a real spatial frequency vector normal to the wave plane (E0 · k = 0), and f is the temporal
frequency. Let us consider the Maxwell equation that includes the time derivative of E, in a passive
conductive non-magnetic medium with conductivity σ:
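$$
\nabla \times \mathbf{B} = \mu_0 \left( \sigma \mathbf{E} + \epsilon \frac{\partial \mathbf{E}}{\partial t} \right) . \tag{2.3}
$$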
The current J is replaced by σE in (2.1) following Ohm’s law, J = σE.
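To neglect the time derivative in (2.3), it is necessary that $\|\epsilon \, \partial \mathbf{E} / \partial t\| \ll \|\sigma \mathbf{E}\|$. For the planar
wave, this is equivalent to $\kappa = |2\pi f \epsilon / \sigma| \ll 1$. At a frequency of 100 Hz, the average permittivity
of the head tissues is $\epsilon = 10^5 \epsilon_0$ and the average conductivity is $\sigma = 0.3\ \Omega^{-1}\mathrm{m}^{-1}$. With these
values, $\kappa = 1.8 \times 10^{-3}$, hence the term $\epsilon \, \partial \mathbf{E} / \partial t$ can be neglected.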
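The order of magnitude of κ can be checked with a few lines of Python. This is only a sketch: the function name `quasi_static_ratio` and the constant `EPS0` are illustrative choices, while the numerical values are the ones quoted in the text.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def quasi_static_ratio(freq_hz, eps_r, sigma):
    """Return kappa = 2*pi*f*eps/sigma, the ratio between the
    displacement current term and the ohmic current term."""
    return 2.0 * math.pi * freq_hz * eps_r * EPS0 / sigma


# Values quoted in the text: 100 Hz, eps = 1e5 * eps0, sigma = 0.3 S/m.
kappa = quasi_static_ratio(freq_hz=100.0, eps_r=1e5, sigma=0.3)
print(f"kappa = {kappa:.2e}")
```

The result is of the order of $10^{-3}$, which is what justifies dropping the time derivative in (2.3).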
More intuitively, we can just consider the spatial wavelength λ of our planar wave, which
is given by the relation c = fλ, where $c = 1/\sqrt{\mu\epsilon}$ is the speed of the wave in the medium and f
is the temporal frequency of the wave. With a frequency of 100 Hz, it gives us a wavelength of
about $10^5$ m. Thus, at the scale of a human head, we can neglect the oscillations of the wave,
which gives $\nabla \times \mathbf{E} = 0$ instead of $\nabla \times \mathbf{E} = -\partial \mathbf{B} / \partial t$.
The practical consequences of the quasi-static approximation are twofold. First, the elec-
tric component is decoupled from the magnetic component, allowing the computation of the
electric potential separately. Second, the delays of propagation of the signal from the neu-
ronal sources to the M/EEG sensors can be neglected. We can, therefore, assume that M/EEG
sensors measure at each instant the activity produced at the very same instant.
2.1.3 The electric potential equation
In the quasi-static approximation, we neglect all the time derivatives. As a consequence, the
curl of the electric field E is zero, meaning that it derives from a scalar potential V :
E = −∇V. (2.4)
In a medium with current generators, the total current can be decomposed in two parts:
a primary current flow Jp related to the current generators and a volume current flow Jv due
to the electric field in the volume.
Using Ohm’s Law, Jv = σE, we have:
J = Jp + Jv = Jp + σE = Jp − σ∇V . (2.5)
The volume currents, a.k.a., the ohmic currents, correspond to the displacements of charges
due to the gradient of potential in the medium.
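Neglecting the time derivative in Maxwell’s equations and taking the divergence leads to:
$$
\begin{aligned}
& \nabla \times \mathbf{B} = \mu \mathbf{J} \\
\Rightarrow\quad & \nabla \cdot (\nabla \times \mathbf{B}) = \nabla \cdot (\mu \mathbf{J}) \\
\Rightarrow\quad & 0 = \nabla \cdot (\mu \mathbf{J}) \\
\Rightarrow\quad & 0 = \nabla \cdot \mathbf{J}
\end{aligned}
\tag{2.6}
$$
Substituting the decomposition (2.5) into $\nabla \cdot \mathbf{J} = 0$ finally leads to the potential equation:
$$
\nabla \cdot (\sigma \nabla V) = \nabla \cdot \mathbf{J}^p . \tag{2.7}
$$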
2.1.4 The magnetic field equation: the Biot-Savart law
Because ∇ ·B = 0, there exists a vector field A such that:
B = ∇×A .
We use the classical gauge condition ∇ · A = 0 to remove the indeterminacy in the
definition of A.
This leads to:
∇×B = ∇×∇×A = ∇ (∇ ·A)−∆A = −∆A .
Maxwell’s equation ∇ × B = µ0J becomes ∆A = −µ0J, which is a Poisson equation. If we
impose A(|r| → ∞) = 0 (no magnetic field at infinity), it has a general solution in R3 :
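$$
\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_{\mathbb{R}^3} \frac{\mathbf{J}(\mathbf{r}')}{\|\mathbf{r} - \mathbf{r}'\|} \, d\mathbf{r}' .
$$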
Taking the curl, we obtain the Biot-Savart law:
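$$
\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_{\mathbb{R}^3} \mathbf{J}(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \, d\mathbf{r}' .
$$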
Because the current can be written as J = Jp − σ∇V , we can transform the Biot-Savart
law into:
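$$
\mathbf{B}(\mathbf{r}) = \mathbf{B}_0(\mathbf{r}) - \frac{\mu_0}{4\pi} \int_{\mathbb{R}^3} \sigma \nabla V(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \, d\mathbf{r}' , \tag{2.8}
$$
with
$$
\mathbf{B}_0(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_{\mathbb{R}^3} \mathbf{J}^p(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \, d\mathbf{r}' .
$$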
With this formulation, B0 is often called the primary magnetic field while the second term is
called the secondary magnetic field.
Note that the primary magnetic field does not depend on the medium, which corresponds
to the head with M/EEG, and therefore it does not depend on the values of the conductivities.
2.2 UNBOUNDED HOMOGENEOUS MEDIUM
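Let us consider a homogeneous volume conductor with a constant conductivity σ.
Equation (2.7) becomes:
$$
\Delta V = \frac{1}{\sigma} \nabla \cdot \mathbf{J}^p ,
$$
which is a Poisson equation whose solution vanishing at infinity is:
$$
V(\mathbf{r}) = -\frac{1}{4\pi\sigma} \int_{\mathbb{R}^3} \frac{\nabla' \cdot \mathbf{J}^p(\mathbf{r}')}{\|\mathbf{r} - \mathbf{r}'\|} \, d\mathbf{r}' ,
$$
where ∇′ denotes differentiation with respect to r′. Applying the divergence theorem (integration by parts) yields
$$
V(\mathbf{r}) = \frac{1}{4\pi\sigma} \int_{\mathbb{R}^3} \mathbf{J}^p(\mathbf{r}') \cdot \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \, d\mathbf{r}' . \tag{2.9}
$$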
For the magnetic field, we take σ out of the integral in (2.8) because it is constant:
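$$
\mathbf{B}(\mathbf{r}) = \mathbf{B}_0(\mathbf{r}) - \frac{\mu_0 \sigma}{4\pi} \int_{\mathbb{R}^3} \nabla V(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \, d\mathbf{r}' . \tag{2.10}
$$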
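Using the identity
$$
\nabla V(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} = \nabla \times \left( \frac{\nabla V(\mathbf{r}')}{\|\mathbf{r} - \mathbf{r}'\|} \right) - \frac{\nabla \times (\nabla V(\mathbf{r}'))}{\|\mathbf{r} - \mathbf{r}'\|}
$$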
and the fact that the curl of a gradient is null, the integral on the right hand side of (2.10)
becomes
$$
\int_{\mathbb{R}^3} \nabla \times \left( \frac{\nabla V(\mathbf{r}')}{\|\mathbf{r} - \mathbf{r}'\|} \right) d\mathbf{r}' .
$$
Since V vanishes at infinity, using Stokes’ theorem, this integral is null. We obtain finally
that, in an infinite homogeneous medium, the magnetic field reduces to the primary field:
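$$
\mathbf{B}(\mathbf{r}) = \mathbf{B}_0(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_{\mathbb{R}^3} \mathbf{J}^p(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \, d\mathbf{r}' . \tag{2.11}
$$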
In this special case, the passive current σE = −σ∇V does not contribute to the magnetic field.
2.2.1 Dipolar sources
If the primary current Jp is reduced to a single current dipole at position r0 with moment
q, then Jp(r) = q δr0(r), where δr0 is the Dirac distribution at r0. Using equations (2.9) and
(2.11) with such a primary current, we obtain that the potential and magnetic field in an
homogeneous space have very simple formulations:
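$$
V_{dip}(\mathbf{r}) = \frac{1}{4\pi\sigma} \, \frac{\mathbf{q} \cdot (\mathbf{r} - \mathbf{r}_0)}{\|\mathbf{r} - \mathbf{r}_0\|^3} \tag{2.12}
$$
$$
\mathbf{B}_{dip}(\mathbf{r}) = \frac{\mu_0}{4\pi} \, \frac{\mathbf{q} \times (\mathbf{r} - \mathbf{r}_0)}{\|\mathbf{r} - \mathbf{r}_0\|^3} \tag{2.13}
$$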
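Formulas (2.12) and (2.13) translate directly into code. The sketch below uses only the Python standard library; the helper names (`v_dip`, `b_dip`, `cross`, ...) and the numerical values of the dipole and conductivity are illustrative assumptions, not part of the original text.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in H/m


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def v_dip(r, r0, q, sigma):
    """Potential (2.12) of a dipole q at r0 in an infinite homogeneous medium."""
    d = sub(r, r0)
    return dot(q, d) / (4.0 * math.pi * sigma * norm(d) ** 3)


def b_dip(r, r0, q):
    """Magnetic field (2.13) of a dipole q at r0 in an infinite homogeneous medium."""
    d = sub(r, r0)
    scale = MU0 / (4.0 * math.pi * norm(d) ** 3)
    return tuple(scale * c for c in cross(q, d))


# Hypothetical example: a 10 nA.m dipole along z, observed 10 cm away.
r0, q, sigma = (0.0, 0.0, 0.0), (0.0, 0.0, 1e-8), 0.3
on_axis, off_axis = (0.0, 0.0, 0.1), (0.1, 0.0, 0.0)
print(v_dip(on_axis, r0, q, sigma), b_dip(on_axis, r0, q))
print(v_dip(off_axis, r0, q, sigma), b_dip(off_axis, r0, q))
```

On the dipole axis the potential is maximal and the magnetic field vanishes, whereas in the plane orthogonal to q the potential vanishes and the magnetic field is tangential.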
Such formulas can also be obtained from a Taylor expansion of the function:
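$$
\Phi_{\mathbf{r}}(\mathbf{r}') = \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} .
$$
Let us assume that the primary current sources lie in a small volume δΩ (cf. figure 2.1). The
Taylor expansion of $\Phi_{\mathbf{r}}$ at r′ = r0, where r0 is the centroid of the small volume δΩ, gives:
$$
\Phi_{\mathbf{r}}(\mathbf{r}') = \Phi_{\mathbf{r}}(\mathbf{r}_0) + \nabla_{\mathbf{r}_0} \Phi_{\mathbf{r}} \, (\mathbf{r}_0 - \mathbf{r}') + o(\|\mathbf{r}' - \mathbf{r}_0\|)
$$
where $\nabla_{\mathbf{r}_0} \Phi_{\mathbf{r}}$ is the gradient of $\Phi_{\mathbf{r}}$ taken at position r0.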
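By approximating $\Phi_{\mathbf{r}}(\mathbf{r}')$ by $\Phi_{\mathbf{r}}(\mathbf{r}_0)$, equations (2.9) and (2.11) become:
$$
\begin{aligned}
V(\mathbf{r}) &= \frac{1}{4\pi\sigma} \left( \int_{\delta\Omega} \mathbf{J}^p(\mathbf{r}') \, d\mathbf{r}' \right) \cdot \frac{\mathbf{r} - \mathbf{r}_0}{\|\mathbf{r} - \mathbf{r}_0\|^3} \\
\mathbf{B}(\mathbf{r}) &= \frac{\mu_0}{4\pi} \left( \int_{\delta\Omega} \mathbf{J}^p(\mathbf{r}') \, d\mathbf{r}' \right) \times \frac{\mathbf{r} - \mathbf{r}_0}{\|\mathbf{r} - \mathbf{r}_0\|^3} .
\end{aligned}
\tag{2.14}
$$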
We obtain the formulas giving the potential and magnetic field of a dipolar source whose
moment is given by:
$$
\mathbf{q} = \int_{\delta\Omega} \mathbf{J}^p(\mathbf{r}') \, d\mathbf{r}' . \tag{2.15}
$$
As illustrated in figure 2.1, this approximation is justified if ‖r′ − r0‖ ≪ ‖r′ − r‖ in δΩ. This
implies that the primary currents produced by a region δΩ sufficiently small compared to the
distance to the observation point r can be correctly modeled by an equivalent current dipole
(ECD).
2.2.2 Multipolar sources
If the Taylor approximation at order 0 used to justify the dipolar source model does not hold,
it is necessary to consider more terms in the Taylor expansion. This leads to the multipolar
source models.

Figure 2.1: Measurement at point r of the electromagnetic fields produced by a current
distribution in a region δΩ when ‖r′ − r0‖ ≪ ‖r′ − r‖.
Using one more term we get:
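$$
\begin{aligned}
V_{mult}(\mathbf{r}) &= \frac{1}{4\pi\sigma} \left( \int_{\delta\Omega} \mathbf{J}^p(\mathbf{r}') \, d\mathbf{r}' \cdot \frac{\mathbf{r} - \mathbf{r}_0}{\|\mathbf{r} - \mathbf{r}_0\|^3} + \int_{\delta\Omega} \nabla_{\mathbf{r}_0} \Phi_{\mathbf{r}} \, (\mathbf{r}_0 - \mathbf{r}') \cdot \mathbf{J}^p(\mathbf{r}') \, d\mathbf{r}' \right) \\
&= V_{dip}(\mathbf{r}) + V_{quad}(\mathbf{r})
\end{aligned}
\tag{2.16}
$$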
where Vquad stands for the quadrupolar term.
Borrowing the notation “:” for tensor contraction from [118], we rewrite the quadrupolar
term Vquad(r) as:
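$$
V_{quad}(\mathbf{r}) = \frac{1}{4\pi\sigma} \nabla_{\mathbf{r}_0} \Phi_{\mathbf{r}} : \int_{\delta\Omega} (\mathbf{r}_0 - \mathbf{r}') \, \mathbf{J}^p(\mathbf{r}') \, d\mathbf{r}'
$$
where the term $\int_{\delta\Omega} (\mathbf{r}_0 - \mathbf{r}') \, \mathbf{J}^p(\mathbf{r}') \, d\mathbf{r}'$, denoted by $\mathbf{Q}_{quad}$, is called the quadrupolar moment.
This term is a 3 × 3 tensor.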
Similarly, for the magnetic field we get:
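$$
\mathbf{B}_{mult}(\mathbf{r}) = \mathbf{B}_{dip}(\mathbf{r}) + \mathbf{B}_{quad}(\mathbf{r})
$$
where
$$
\mathbf{B}_{quad}(\mathbf{r}) = \frac{\mu_0}{4\pi} \nabla_{\mathbf{r}_0} \Phi_{\mathbf{r}} : \mathbf{Q}_{quad}
$$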
When considering more terms in the expansion, we add correcting terms in the expression
of the electric and magnetic fields.
2.3 THE SPHERICALLY SYMMETRIC HEAD MODEL
Obviously, the human head is not an infinite homogeneous conductor. First of all, it
is a bounded conductor and no electric current can flow outside the head (except at the neck).
Secondly, the electrical conductivity σ of the head is not constant: for instance, the skull is
between 20 and 100 times less conductive than other head tissues. This must be taken into
account to get more accurate approximations of the potential and magnetic field generated
by the brain electrical activity. A first step towards head modeling is to consider the head as
a set of nested concentric spheres. Each volume enclosed between two spheres is supposed to
represent a different tissue with a constant isotropic conductivity. Figure 2.2 shows a sphere
model with three spheres. Although not drawn to scale, it could represent the brain, the
skull and the scalp of a human head. This simple geometry allows one to find an analytic
solution for the electric potential generated by a dipole, as for the infinite homogeneous
medium (2.12).
Figure 2.2: A spherical model with three layers. The regions Ω1, Ω2, Ω3 have conductivities
σ1, σ2, σ3 and are bounded by the spheres S1, S2, S3 of radii r1, r2, r3.
2.3.1 Electric potential generated by a dipole
The key point is to take advantage of the spherical symmetry of the geometry. First, we
use spherical coordinates (r, θ, φ) instead of Cartesian coordinates, and second, we expand
the electric potential in spherical harmonics $Y_l^m(\theta, \phi)$. The spherical harmonics have the
following form:
$$
Y_l^m(\theta, \phi) = N_l^m \, P_l^m(\cos\theta) \, e^{im\phi}, \quad l \in \mathbb{N},\ m \in \mathbb{Z},\ |m| \le l ,
$$
where $N_l^m$ is a normalization coefficient and $P_l^m$ is an associated Legendre function. In our
case, this is of particular interest because the general solution of Laplace’s equation ∆f = 0
in spherical coordinates can be written as a linear combination of spherical harmonics
$$
f(r, \theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \left( A_{lm} r^{-1-l} + B_{lm} r^{l} \right) Y_l^m(\theta, \phi) .
$$
If f is a real function, it simplifies to
$$
f(r, \theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=0}^{l} \left( A_{lm} r^{-1-l} + B_{lm} r^{l} \right) P_l^m(\cos\theta) \cos(m\phi) .
$$
Now let us consider a current dipole inside the spherical model at location r0 with q its
moment. We use the notation of figure 2.2, with indices increasing from the innermost sphere
to the outermost one. In all subregions Ωk where the dipole is not located, the potential
equation states that ∇ · (σk∇V ) = σk∆V = 0 because σk is constant. As a consequence, the
restriction Vk of V in each domain Ωk is harmonic and can be decomposed on the spherical
harmonic basis:
$$
V_k(r, \phi, \theta) = \sum_{l=0}^{\infty} \sum_{m=0}^{l} \left( A^k_{lm} r^{-1-l} + B^k_{lm} r^{l} \right) P_l^m(\cos\theta) \cos(m\phi) . \tag{2.17}
$$
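In the domain $\Omega_{k^*}$ where the dipole is located, the potential V satisfies $\sigma_{k^*} \Delta V = \nabla \cdot \mathbf{J}^p$,
with $\mathbf{J}^p = \mathbf{q} \, \delta_{\mathbf{r}_0}$. So we can decompose the potential as V = v + u, where v is the potential
generated by the dipole in an infinite homogeneous domain of conductivity $\sigma_{k^*}$, and u is a
harmonic function. The function v is defined as
$$
v(\mathbf{r}) = \frac{1}{4\pi\sigma_{k^*}} \, \frac{\mathbf{q} \cdot (\mathbf{r} - \mathbf{r}_0)}{\|\mathbf{r} - \mathbf{r}_0\|^3} .
$$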
This function can be decomposed in the spherical harmonic basis
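$$
v(r, \phi, \theta) =
\begin{cases}
\sum_{l=0}^{\infty} \sum_{m=0}^{l} q^{inf}_{lm} \, r^{l} \, P_l^m(\cos\theta) \cos(m\phi) , & r < r_0 \\
\sum_{l=0}^{\infty} \sum_{m=0}^{l} q^{sup}_{lm} \, r^{-1-l} \, P_l^m(\cos\theta) \cos(m\phi) , & r > r_0
\end{cases}
$$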
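So if we denote by $A^{k^*}_{lm}$ and $B^{k^*}_{lm}$ the coefficients of the decomposition of u, we have a decomposition
of $V_{k^*}$ in the spherical harmonic basis:
$$
V_{k^*}(r, \phi, \theta) =
\begin{cases}
\sum_{l=0}^{\infty} \sum_{m=0}^{l} \left( A^{k^*}_{lm} r^{-1-l} + (q^{inf}_{lm} + B^{k^*}_{lm}) \, r^{l} \right) P_l^m(\cos\theta) \cos(m\phi) , & r < r_0 \\
\sum_{l=0}^{\infty} \sum_{m=0}^{l} \left( (A^{k^*}_{lm} + q^{sup}_{lm}) \, r^{-1-l} + B^{k^*}_{lm} r^{l} \right) P_l^m(\cos\theta) \cos(m\phi) , & r > r_0
\end{cases}
\tag{2.18}
$$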
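Finally, to fully determine the potential in the whole domain Ω, one needs to fix the value
of the coefficients $A^k_{lm}$ and $B^k_{lm}$. This is done by considering the boundary conditions at each
surface $S_k$. The electric potential and the normal component of the current density must be
continuous across the interfaces:
$$
\begin{aligned}
V_k(r_k, \phi, \theta) &= V_{k+1}(r_k, \phi, \theta) \\
\sigma_k \frac{\partial V_k}{\partial r}(r_k, \phi, \theta) &= \sigma_{k+1} \frac{\partial V_{k+1}}{\partial r}(r_k, \phi, \theta)
\end{aligned}
\tag{2.19}
$$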
From (2.17), (2.18) and (2.19), a linear system can be built for the $A^k_{lm}$ and $B^k_{lm}$, which
leads to the determination of these coefficients. Because of the infinite series, in practical
situations one has to choose at which order the series have to be truncated. For high
orders, the solution is more accurate but the computation is more expensive. Several
approaches have been proposed for efficient computation of the electric potential in multilayer
spheres [18, 58, 238].
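The effect of the truncation order can be illustrated on the simplest series of this kind, the classical Legendre expansion of the inverse distance, $1/\|\mathbf{r} - \mathbf{r}_0\| = \sum_{l \ge 0} (r_0^l / r^{l+1}) \, P_l(\cos\gamma)$ for $r > r_0$, where γ is the angle between r and r0. This is a sketch on that simpler kernel, not the multilayer sphere solution itself; the function names and numerical values are illustrative.

```python
import math


def legendre(l_max, x):
    """Return [P_0(x), ..., P_l_max(x)] using Bonnet's recurrence."""
    p = [1.0, x]
    for l in range(1, l_max):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: l_max + 1]


def inverse_distance_series(r, r0, cos_gamma, l_max):
    """Truncated expansion of 1/||r - r0||, valid for r > r0."""
    p = legendre(l_max, cos_gamma)
    return sum(r0 ** l / r ** (l + 1) * p[l] for l in range(l_max + 1))


# Hypothetical configuration: source at half the radius of the observation point.
r, r0, cos_gamma = 1.0, 0.5, math.cos(0.7)
exact = 1.0 / math.sqrt(r * r + r0 * r0 - 2.0 * r * r0 * cos_gamma)
errors = {l_max: abs(inverse_distance_series(r, r0, cos_gamma, l_max) - exact)
          for l_max in (2, 5, 10, 30)}
for l_max, err in errors.items():
    print(f"l_max = {l_max:2d}  error = {err:.2e}")
```

The remainder of the series is bounded by a geometric series in $r_0 / r$, so the number of terms required grows as the source gets closer to the observation radius.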
2.3.2 The magnetic field
In the case of the magnetic field, there is no need to use an infinite series based on a
decomposition in spherical harmonics. Indeed, in a spherical geometry, the magnetic field has a
simple closed form.
2.3.2.1 The radial component of the magnetic field
We again consider a spherical geometry as described in figure 2.2. With spherical coordinates,
the radial component of the magnetic field is
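$$
B_r(\mathbf{r}) = \mathbf{B}(\mathbf{r}) \cdot \mathbf{e}_r = \mathbf{B}(\mathbf{r}) \cdot \frac{\mathbf{r}}{r} ,
$$
and the outward normal at each surface $S_k$ is
$$
\mathbf{n}(\mathbf{r}') = \frac{\mathbf{r}'}{r'} .
$$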
The spherical geometry in figure 2.2 is a special case of geometry with a piecewise constant
conductivity, so that the magnetic field can be expressed using the formula (2.28) described in
section 2.4.3. If we compute the radial component of the magnetic field, the following scalar
triple product appears in the surface integrals:
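$$
\left( \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \times \frac{\mathbf{r}'}{r'} \right) \cdot \frac{\mathbf{r}}{r} .
$$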
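This quantity is zero because r, r′ and (r − r′) lie in the same plane. As a consequence, in a
spherical geometry, the radial component of the magnetic field is equal to the radial
component of the primary field:
$$
B_r(\mathbf{r}) = \mathbf{B}_0(\mathbf{r}) \cdot \mathbf{e}_r = \frac{\mu_0}{4\pi} \int_{\Omega} \frac{\mathbf{J}^p(\mathbf{r}') \times (\mathbf{r} - \mathbf{r}')}{\|\mathbf{r} - \mathbf{r}'\|^3} \cdot \mathbf{e}_r \, d\mathbf{r}' . \tag{2.20}
$$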
2.3.2.2 Total magnetic field generated by a dipole
We assume that Jp is a dipole at position r0 with a moment q. Outside the domain Ω, there is
no current, so Maxwell’s equations in the quasi-static approximation state that ∇ × B = 0.
As a consequence, outside Ω, B derives from a scalar potential U:
B = −∇U ,
with U vanishing at infinity. For r outside Ω, we can then write the following line integral:
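$$
\begin{aligned}
U(\mathbf{r}) &= -\int_0^{\infty} \nabla U(\mathbf{r} + t \mathbf{e}_r) \cdot \mathbf{e}_r \, dt \\
&= \int_0^{\infty} \mathbf{B}(\mathbf{r} + t \mathbf{e}_r) \cdot \mathbf{e}_r \, dt \\
&= \int_0^{\infty} B_r(\mathbf{r} + t \mathbf{e}_r) \, dt .
\end{aligned}
$$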
Using the expression of the radial magnetic field in (2.20) and the dipolar approximation from
(2.15), we obtain:
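$$
U(\mathbf{r}) = \frac{\mu_0}{4\pi} \, \mathbf{q} \times (\mathbf{r} - \mathbf{r}_0) \cdot \mathbf{e}_r \int_0^{\infty} \frac{1}{\|\mathbf{r} + t \mathbf{e}_r - \mathbf{r}_0\|^3} \, dt .
$$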
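The computation of the integral in the right hand side leads to
$$
U(\mathbf{r}) = -\frac{\mu_0}{4\pi} \, \frac{\mathbf{q} \times \mathbf{r}_0 \cdot \mathbf{r}}{F} ,
$$
where $F = a (r a + r^2 - \mathbf{r}_0 \cdot \mathbf{r})$ with $\mathbf{a} = \mathbf{r} - \mathbf{r}_0$, $a = \|\mathbf{a}\|$ and $r = \|\mathbf{r}\|$. Taking the gradient results in the
following formulation for the total magnetic field:
$$
\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi F^2} \left( F \, \mathbf{q} \times \mathbf{r}_0 - (\mathbf{q} \times \mathbf{r}_0 \cdot \mathbf{r}) \, \nabla F \right) . \tag{2.21}
$$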
This formula for the total magnetic field generated by a dipole in a spherical geometry was
found by Sarvas [192]. Interestingly, although it is different from the formula in an infinite
homogeneous medium, it is also independent of the conductivity σ of the domain Ω.
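A minimal sketch of formula (2.21) is given below, assuming a spherical conductor centered at the origin. The closed form used for ∇F, $\nabla F = (a^2/r + \mathbf{a} \cdot \mathbf{r}/a + 2a + 2r) \, \mathbf{r} - (a + 2r + \mathbf{a} \cdot \mathbf{r}/a) \, \mathbf{r}_0$, follows from differentiating F with respect to r and is not written out in the text above; function names and numerical values are illustrative.

```python
import math

MU0_OVER_4PI = 1e-7  # mu_0 / (4 pi) in SI units


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def sarvas_field(r, r0, q):
    """Total field (2.21) at r, outside a sphere centered at the origin."""
    a_vec = tuple(x - y for x, y in zip(r, r0))
    a, rn = norm(a_vec), norm(r)
    big_f = a * (rn * a + rn ** 2 - dot(r0, r))
    # Gradient of F with respect to r: c_r * r - c_r0 * r0.
    c_r = a ** 2 / rn + dot(a_vec, r) / a + 2.0 * a + 2.0 * rn
    c_r0 = a + 2.0 * rn + dot(a_vec, r) / a
    grad_f = tuple(c_r * x - c_r0 * y for x, y in zip(r, r0))
    q_x_r0 = cross(q, r0)
    s = dot(q_x_r0, r)
    return tuple(MU0_OVER_4PI / big_f ** 2 * (big_f * u - s * g)
                 for u, g in zip(q_x_r0, grad_f))


def primary_field(r, r0, q):
    """Primary field (2.13) of the same dipole in an infinite medium."""
    d = tuple(x - y for x, y in zip(r, r0))
    return tuple(MU0_OVER_4PI * c / norm(d) ** 3 for c in cross(q, d))


# Hypothetical configuration: dipole 5 cm from the center, sensor about 12 cm away.
r_obs, r_dip = (0.03, 0.04, 0.11), (0.0, 0.0, 0.05)
b_radial_dipole = sarvas_field(r_obs, r_dip, (0.0, 0.0, 1e-8))
b_total = sarvas_field(r_obs, r_dip, (1e-8, 0.0, 0.0))
b_primary = primary_field(r_obs, r_dip, (1e-8, 0.0, 0.0))
radial_total = dot(b_total, r_obs) / norm(r_obs)
radial_primary = dot(b_primary, r_obs) / norm(r_obs)
print(b_radial_dipole, radial_total, radial_primary)
```

Two properties stated in this section can be read off the output: a radial dipole produces no magnetic field outside the sphere, and the radial component of the total field coincides with that of the primary field, as in (2.20).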
2.3.3 Magnetic field generated by a multipole
Considering multipolar expansions, Jerbi et al. showed in [118] that an expression similar to
(2.21) can be derived for the magnetic field when using spherical geometries.
Let Xr be the cross product tensor, defined by r × x = Xr · x. Using this notation, the
Sarvas formula can be rewritten for a current dipole:
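$$
\mathbf{B}_{dip}(\mathbf{r}) = -\frac{\mu_0}{4\pi F^2} \left( F \, \mathbf{X}_{\mathbf{r}_0} + \nabla F \, (\mathbf{r}_0 \times \mathbf{r}) \right) \cdot \mathbf{q} .
$$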
Borrowing again the notation “:” from [118], the magnetic field produced by a current
multipole can be obtained from:
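$$
\mathbf{B}_{mult}(\mathbf{r}) = -\frac{\mu_0}{4\pi F^2} \left( F \, \mathbf{X}_{\mathbf{r}_0} + \nabla F \, (\mathbf{r}_0 \times \mathbf{r}) \right) \cdot \mathbf{q} - \nabla_{\mathbf{r}_0} \left[ \frac{\mu_0}{4\pi F^2} \left( F \, \mathbf{X}_{\mathbf{r}_0} + \nabla F \, (\mathbf{r}_0 \times \mathbf{r}) \right) \right] : \mathbf{Q}_{quad} \tag{2.22}
$$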
The term q has 3 coefficients and Qquad has 9 coefficients. Yet, an analysis of this first
order multipole model [117] has shown that the forward field produced is only of rank 7. A
first order current multipole can therefore be modeled with 7 parameters instead of 12.
2.3.4 Limits of spherical models
The analytical or semi-analytical formulas of the electromagnetic field can be extended to
non-concentric spheres [149], or to ellipsoidal geometries [54]. However, for EEG, several studies
have shown that such simplified models cannot produce satisfactory results [33, 47, 109]. For
MEG data, the head modeling is not as crucial. With a spherical head model, the total magnetic
field does not depend on the conductivities, and it is observed with more complex head
geometries that this limited influence of the conductivities is maintained. This explains why
spherical models are very popular in MEG. With EEG, it is necessary to consider realistic
head models with less constrained geometrical properties.
2.4 REALISTIC HEAD MODELS
To improve the accuracy of the forward calculation one needs to consider more realis-
tic head models. The geometry of such improved head models can be obtained from other
anatomical imaging modalities: computed tomography (CT) and structural magnetic reso-
nance imaging (sMRI). Structural MRI is here opposed to functional MRI (fMRI) used for
brain functional imaging with MRI (cf. section 1.2.3).
MRI consists in applying a strong external magnetic field to a volume, which consequently
aligns all the magnetic moments of nuclei such as hydrogen with a fixed direction within this
volume. After stopping this magnetic field, MR machines are able to measure the relaxation
time in voxels located on a 3D grid within the volume. The relaxation time is the time it takes
for the hydrogen to return to equilibrium. This relaxation time depends on the physical
properties of each tissue, offering the possibility to get 3D images with a high contrast, especially
for soft tissues (white matter, gray matter, fat, muscle). MRI is however not well suited to
the imaging of bones due to the reduced presence of hydrogen in such structures. The computed
tomography (CT) imaging modality is more appropriate for bones. However, being based on X-rays,
this modality exposes the patients to the hazards of ionizing radiation and is therefore not
commonly used for M/EEG studies. Hence, precise models of the head tissues are often built
from structural MR images, but the skull is most of the time not clearly visible and the skull
models obtained from MRI are less accurate. CT and MRI images of the same subject are
shown in figure 2.3.

Figure 2.3: On the left, a slice of a CT image. On the right, the same slice obtained with T1
MRI. We observe that CT offers a clear view of the skull while MRI provides a good contrast
in soft tissues. (Source gehealthcare.com)
Volumetric anatomical data reveal the geometrical complexity of the head structures. We
will now present existing approaches and approximations to take this information into ac-
count for computing precise forward models. The different approaches that exist to compute
numerical solutions for the forward problem in M/EEG are Finite Difference Methods (FDM),
Finite Element Methods (FEM) and the Boundary Element Method (BEM).
2.4.1 The Finite Difference Method (FDM)
Finite difference methods provide numerical solutions to differential equations by
approximating derivatives with finite differences, i.e., difference quotients.
For instance, for a function f in 1D, the first order derivative is given by the limit:
$$
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} ,
$$
thus, for a small value of h, the derivative can be approximated by:
$$
f'(x) \simeq \frac{f(x + h) - f(x)}{h} .
$$
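As a quick illustration of this approximation, the following sketch measures the error of the forward difference quotient on a function whose derivative is known. The choice of f = sin and of the step sizes is arbitrary.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) by the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


x = 1.0
exact = math.cos(x)  # derivative of sin at x
steps = (1e-1, 1e-2, 1e-3)
errors = [abs(forward_difference(math.sin, x, h) - exact) for h in steps]
for h, err in zip(steps, errors):
    print(f"h = {h:.0e}  error = {err:.2e}")
```

The error decreases roughly linearly with h, which is the expected behavior of this first order scheme.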
For equation (2.7), we need to approximate the differential operator ∇ · (σ∇V). In 3D, a
point has 6 neighbors, located at a distance of +h and −h in each direction. This approximation
leads to:
$$
\left( \nabla \cdot \sigma \nabla V \right)(\mathbf{r}_0) \simeq \frac{1}{h^2} \left( \alpha_0 V(\mathbf{r}_0) - \sum_{i=1}^{6} \alpha_i V(\mathbf{r}_i) \right) , \tag{2.23}
$$
where the constants $\alpha_0$ and $\alpha_i$ depend on the conductivities at the points r0 and ri. Note
that this scheme corresponds exactly to Kirchhoff’s law for the balance of currents, assuming
that the points form a network of resistors. Generally, the head volume is discretized using
a cubic grid with a regular spacing h, therefore the same scheme (2.23) can be used at every
point of the grid by computing differences between closest neighbors.
For the source, we need to approximate the divergence operator ∇ · Jp. The primary
currents are defined over the edges between the grid points. For example, a dipole can be
represented as a small current flowing over the edge linking two points $\mathbf{r}_+$ and $\mathbf{r}_-$, so that the
divergence is reduced to the source and sink of current, i.e., $\nabla \cdot \mathbf{J}^p = I \delta_{\mathbf{r}_+} - I \delta_{\mathbf{r}_-}$, where I is
the amplitude of the current. Denoting by $[J_i]$ the values of the primary currents between the
neighboring grid points, the term ∇ · Jp can be written in matrix form as $B [J_i]$. By denoting
$[V_i]$ the values of the potential at grid points and plugging this expression into Kirchhoff’s
law, we get that the potential $[V_i]$ is the solution of the linear system:
$$
A [V_i] = B [J_i]
$$
The matrices involved are typically very large since the whole head domain has to be
discretized. However, the matrix A that needs to be inverted is highly sparse, because it has at
most six off-diagonal elements per row, which makes iterative methods efficient.
The main drawback of the FDM for M/EEG forward modeling is that, due to the
cubic grid, the complex interfaces between brain structures and thin layers cannot be
precisely modeled. Indeed, with a cubic grid, the interfaces have to follow the grid points, which
leads to a “staircase” effect.
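The resistor network interpretation can be made concrete on a 1D toy problem. The sketch below is an illustration under simplifying assumptions, not the 3D scheme (2.23): the grid is a line of nodes, each edge carries its own conductivity, no current leaves the two ends, and a current I is injected through a source and a sink node. The matrix is tridiagonal here; a small dense solver is used only to keep the example short. All names and values are hypothetical.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fdm_1d(sigma_edges, h, i_plus, i_minus, current):
    """Potential on a 1D grid; edge e links nodes e and e + 1."""
    n = len(sigma_edges) + 1
    a = [[0.0] * n for _ in range(n)]
    for e, s in enumerate(sigma_edges):
        g = s / h ** 2  # finite difference weight of this edge
        a[e][e] -= g
        a[e + 1][e + 1] -= g
        a[e][e + 1] += g
        a[e + 1][e] += g
    b = [0.0] * n
    b[i_plus] += current / h   # discrete Dirac at r+
    b[i_minus] -= current / h  # discrete Dirac at r-
    # The potential is defined up to a constant: pin V[0] = 0.
    a[0] = [1.0] + [0.0] * (n - 1)
    b[0] = 0.0
    return solve_dense(a, b)


# Hypothetical line of 6 nodes, 1 cm apart, with one poorly conducting edge.
sigma_edges = [0.3, 0.3, 0.006, 0.3, 0.3]
potential = fdm_1d(sigma_edges, h=0.01, i_plus=4, i_minus=1, current=1e-6)
print(potential)
```

Between the two injection nodes the potential drop equals the current times the sum of the edge resistances h/σ, and the potential is constant elsewhere, which is exactly Kirchhoff's balance of currents on a chain of resistors.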
2.4.2 The Finite Element Method (FEM)
The FEM can work on an unstructured grid, such as a triangulated surface in
2D or a tetrahedral mesh of a volume in 3D. Figure 2.4 is an example of a tetrahedral mesh of
the head. The problem is first reformulated in its variational form, also called weak form.
Then, an approximate solution of the problem in its weak formulation is found by looking for
a solution in a finite dimensional vector space. This second step requires a proper choice of
the finite dimensional space in order to guarantee the quality of the approximation. Once
discretized, the problem leads to a linear system that is solved numerically. Contrary to the
FDM, Finite Element Methods (FEM) do not suffer from the staircase effect.
To illustrate this, let us consider the equation (2.7) for the potential. We consider the
domain Ω describing the head and its boundary denoted ∂Ω. On the boundary, there is no
electric current flowing outside, so the differential equation with its boundary condition is
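$$
\begin{cases}
\nabla \cdot (\sigma \nabla V) = \nabla \cdot \mathbf{J}^p & \text{in } \Omega \\
\sigma \nabla V \cdot \mathbf{n} = 0 & \text{on } \partial\Omega
\end{cases}
\tag{2.24}
$$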
An important step of the FEM is to transform the differential equation (2.24) into its
variational formulation. We assume that V lives in a certain Hilbert space E of regular functions.
Figure 2.4: A tetrahedral mesh of the head. The different domains are shown with differ-
ent colors, from inside to outside: white matter, gray matter, CSF, skull, scalp (Adapted
from [229]).
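If V is a solution of (2.24), then for any function φ in E:
$$
\begin{aligned}
\int_{\Omega} (\nabla \cdot \mathbf{J}^p) \, \phi &= \int_{\Omega} \nabla \cdot (\sigma \nabla V) \, \phi \\
&= \int_{\partial\Omega} \phi \, \sigma \nabla V \cdot \mathbf{n} - \int_{\Omega} \sigma \nabla V \cdot \nabla \phi \\
&= -\int_{\Omega} \sigma \nabla V \cdot \nabla \phi
\end{aligned}
$$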
The last equality is of the form a(V, φ) = f(φ) where a is a bilinear functional and f is
linear. This provides the weak formulation of (2.24):
∀φ ∈ E, a(V, φ) = f(φ) . (2.25)
The second step consists in solving equation (2.25) in a finite dimensional subspace Eh of
E. The parameter h refers to the precision of the approximation. Let $(\phi_i)_{i=1 \ldots n}$ be a basis of
$E_h$. A solution V in $E_h$ can be written $V = \sum_{i=1}^{n} V_i \phi_i$.
The variational formulation in $E_h$ becomes:
$$
\forall j \in [1, \ldots, n] , \quad \sum_{i=1}^{n} V_i \, a_h(\phi_i, \phi_j) = f_h(\phi_j) . \tag{2.26}
$$
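The right hand side is given by:
$$
\begin{aligned}
f_h(\phi_j) &= \int_{\Omega_h} (\nabla \cdot \mathbf{J}^p) \, \phi_j \\
&= \int_{\partial\Omega_h} \phi_j \, \mathbf{J}^p \cdot \mathbf{n} - \int_{\Omega_h} \mathbf{J}^p \cdot \nabla \phi_j \\
&= -\int_{\Omega_h} \mathbf{J}^p \cdot \nabla \phi_j ,
\end{aligned}
$$
assuming that there are no sources on the boundary $\partial\Omega_h$ and that integration by parts is
possible. For instance, for a dipole, $\mathbf{J}^p = \mathbf{q} \, \delta_{\mathbf{r}_0}$, where $\delta_{\mathbf{r}_0}$ is the Dirac distribution at r0, the equality
$\int_{\Omega_h} (\nabla \cdot \mathbf{J}^p) \, \phi_j = -\int_{\Omega_h} \mathbf{J}^p \cdot \nabla \phi_j$ leads to:
$$
f_h(\phi_j) = -\mathbf{q} \cdot \nabla \phi_j(\mathbf{r}_0) .
$$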
As for the FDM, this leads to a linear system $A [V_i] = b$ which completely determines
the values $V_i$. An approximate solution for the potential can then be obtained by solving
this linear system, which can however be huge. The computation of the magnetic field is not
detailed here, but it can be obtained from the computed electric potential.
The success of the FEM comes from the idea that by using basis functions φi with a local
support, the matrix A can be very sparse. Iterative methods, like conjugate gradient methods,
can then perform well for solving this linear system.
Typical choices for the functions φi include the P0 and P1 elements. For instance,
the subspace of the piecewise constant functions corresponds to a P0 discretization. With P0
elements, the φi are indexed by the tetrahedra and each φi is constant on the ith tetrahedron. With
P1 elements, the functions φi are indexed by the nodes of the tetrahedral mesh. Each function
φi takes the value 1 at node i, 0 at every other node, and is affine on the tetrahedra adjacent
to node i. With such compactly supported elements, we observe that $a_h(\phi_i, \phi_j) = 0$ for two
non-neighboring elements, which guarantees the sparsity of A. As shown by (2.8), once the
potential V is computed, the magnetic field can be obtained by numerical integration.
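The assembly with P1 elements can be sketched on a 1D analogue, where the tetrahedra become intervals and the basis functions become "hat" functions. The example below solves $\int \sigma V' \phi_j' = q \, \phi_j'(x_0)$ for a dipole of moment q located at x0, i.e. the weak form written with the sign convention that makes the stiffness matrix positive semi-definite. The mesh, conductivities and function names are hypothetical, and a dense solver replaces the sparse iterative solver that would be used in practice.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fem_1d_dipole(nodes, sigma_elems, x0, q):
    """Assemble the P1 stiffness matrix and the dipole load vector."""
    n = len(nodes)
    k = [[0.0] * n for _ in range(n)]
    f = [0.0] * n
    for e, s in enumerate(sigma_elems):  # element e spans nodes e and e + 1
        h = nodes[e + 1] - nodes[e]
        ke = s / h  # integral of sigma * phi_i' * phi_j' over the element
        k[e][e] += ke
        k[e + 1][e + 1] += ke
        k[e][e + 1] -= ke
        k[e + 1][e] -= ke
        if nodes[e] <= x0 < nodes[e + 1]:
            # q * phi_j'(x0): the two hat functions have slopes -1/h and +1/h.
            f[e] -= q / h
            f[e + 1] += q / h
    return k, f


# Hypothetical non-uniform mesh with one poorly conducting element.
nodes = [0.0, 0.2, 0.5, 0.6, 1.0]
sigma_elems = [0.3, 0.3, 0.006, 0.3]
stiffness, load = fem_1d_dipole(nodes, sigma_elems, x0=0.55, q=1e-8)

# The potential is defined up to a constant: pin V[0] = 0 before solving.
pinned = [row[:] for row in stiffness]
pinned[0] = [1.0] + [0.0] * (len(nodes) - 1)
rhs = load[:]
rhs[0] = 0.0
potential = solve_dense(pinned, rhs)
print(potential)
```

Only neighboring nodes are coupled, so the stiffness matrix is tridiagonal in 1D, which is the counterpart of the sparsity discussed above. In this toy setting the potential simply jumps by q/σ across the element containing the dipole.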
For the sake of clarity in the current presentation, the potential V is decomposed over
a set of basis functions (φi)i, and it is these same functions that are used as test functions,
a.k.a. evaluation functions. The approximation of (2.25) leads to (2.26), in which only
the terms $a_h(\phi_i, \phi_j)$ appear. However, it is also possible to use different sets of functions for
the approximation of V and for the evaluation. If we consider different evaluation functions
(ψj)j, the discretization leads to the computation of $a_h(\phi_i, \psi_j)$. A particular choice for these
functions leads to what is called collocation methods. With such methods, the test functions
(ψj)j are Dirac functions and the computation of $a_h(\phi_i, \psi_j)$ reduces to a simple function
evaluation. A numerical method where P1 elements are used for the approximation and Dirac
functions are used for the evaluation is called “linear collocation” [155]. A collocation method
with P0 elements for the discretization of V is called “constant collocation”. Methods with
general test functions are referred to as “Galerkin methods”. For example, we call “linear
Galerkin” (resp. “constant Galerkin”) a method where both the approximation and the test
functions are P1 (resp. P0) elements.
2.4.3 The Boundary Element Method (BEM)
From a structural MRI of the subject, it is possible to extract the different structures of
the head (white matter, gray matter, etc.). In a first approximation, we can consider that
the conductivity within each of these structures is constant. The boundary element method
(BEM) is a numerical method for solving linear partial differential equations which have
been transformed into integral equations defined over the boundaries of the different domains
(white matter, gray matter, etc.). In order to achieve such a reformulation of the problem, one
needs to assume homogeneous conductivities of each domain.
The piecewise constant approximation
Figure 2.5 shows the kind of geometry that can be extracted from the MRI of a subject’s
head. In practice, each subregion corresponds to a certain type of head tissue which is sup-
posed to be sufficiently homogeneous to have a constant conductivity. The conductivity of the
head is only discontinuous at the interfaces between tissues.
Figure 2.5: Example of piecewise constant head model. (a) Sagittal cross section of an
anatomical MR volume image. (b) The domains extracted from an MRI: Ω1 white matter,
Ω2 grey matter, Ω3 cerebrospinal fluid, Ω4 skull, Ω5 scalp, bounded by the surfaces S1 to S5,
with conductivities σ1 to σ5.
With this approximation, we can model the head as a domain Ω composed of several sub-
regions Ωk separated by surfaces Sk, each with a constant conductivity σk, and with σ = 0
outside Ω. With the piecewise constant approximation, equations (2.7) and (2.8) can be trans-
formed into integral equations:
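$$
\frac{\sigma_k + \sigma_{k+1}}{2} V(\mathbf{r}) = V_0(\mathbf{r}) - \frac{1}{4\pi} \sum_l (\sigma_l - \sigma_{l+1}) \int_{S_l} V(\mathbf{r}') \, \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \cdot \mathbf{n}_l(\mathbf{r}') \, ds' , \tag{2.27}
$$
$$
\mathbf{B}(\mathbf{r}) = \mathbf{B}_0(\mathbf{r}) - \frac{\mu_0}{4\pi} \sum_l (\sigma_l - \sigma_{l+1}) \int_{S_l} V(\mathbf{r}') \, \frac{\mathbf{r} - \mathbf{r}'}{\|\mathbf{r} - \mathbf{r}'\|^3} \times \mathbf{n}_l(\mathbf{r}') \, ds' , \tag{2.28}
$$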
where r ∈ Sk, V0 and B0 are the electric potential and magnetic field generated by the pri-
mary current distribution Jp in a homogeneous domain. These formulas are obtained from
Geselowitz [86, 87]. Details on how to derive these equations are given in section 2.4.4 via
the use of the representation theorem.
We observe that the integrals are defined over the boundaries of the domains, i.e., the
interfaces. Provided with these equations and the triangulations of the interfaces (cf. figure 2.6),
an approximate solution is obtained. Let us denote by $S = \cup_k S_k$ the union of the surfaces
$S_k$, and E the space of functions square integrable on S. We assume that the surfaces $S_k$ are
approximated with a set of n triangles $\{T_i \mid i \in [1, \ldots, n]\}$. As for the FEM, the solution is
approximated in a subspace of finite dimension and this approximation is computed by using
test functions.
Figure 2.6: Example of triangulated surface used as interface in the boundary element
method. (a) The scalp surface extracted from an MRI. (b) An approximation with triangles.

Just for illustration purposes, let us consider a constant collocation method. The potential
is discretized in the subspace of piecewise constant functions, which are constant on each
triangle (P0 discretization). The test functions are Dirac functions located at the center of
each triangle. We denote the approximation subspace $E_h$, where h is an index which stands
for the size of the largest triangle. A basis (φi)i of P0 elements for $E_h$ is:
$$
\phi_i(\mathbf{r}) =
\begin{cases}
1, & \mathbf{r} \in T_i \\
0, & \mathbf{r} \notin T_i
\end{cases}
$$
Any function $f \in E_h$ can then be written $f = \sum_i f_i \phi_i$, where $f_i$ is the constant value of f
on $T_i$. Injecting this expression into equation (2.27) for the electric potential and testing
with Dirac functions leads to the problem:
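Find $V \in E_h$ such that:
$$
\forall i,\ T_i \in S_k, \quad \frac{\sigma_k + \sigma_{k+1}}{2} V_i = V_0(\mathbf{r}_i) - \frac{1}{4\pi} \sum_l (\sigma_l - \sigma_{l+1}) \sum_{T_j \in S_l} V_j \int_{T_j} \phi_j(\mathbf{r}') \, \frac{\mathbf{r}_i - \mathbf{r}'}{\|\mathbf{r}_i - \mathbf{r}'\|^3} \cdot \mathbf{n}_j \, ds' ,
$$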
where $\mathbf{r}_i$ is the center of triangle $T_i$ and $\mathbf{n}_j$ is the constant normal to triangle $T_j$. This last
equation is of the form
$$
V_i = b_i + \sum_j a_{ij} V_j ,
$$
where the bi and aij are constant coefficients that can be computed. The problem takes again
the form of a linear system:
A [Vi] = b . (2.29)
The values $V_i$ of V on each triangle are obtained by solving this linear system. This
matrix is however singular, because the potential is only defined up to an additive constant. In order
to obtain a non-singular matrix, another constraint needs to be added. This can be done by
forcing the average potential on all surfaces to be zero. This operation is generally referred
to as deflation. From this electric potential, an approximate solution of the magnetic field
generated by the same source can be computed with equation (2.28) [71].
Computationally, one advantage of the BEM is that the matrix A is generally sufficiently
small to use direct methods to solve the linear system. Such methods use
factorizations of the matrix A (e.g., A = LU) which transform the linear system (2.29) into a new
linear system which can be solved very rapidly. Because the contribution of the source Jp only
appears in the right hand side b of the system (2.29), the factorization of the matrix (which
corresponds to the most computationally expensive part) has to be performed only once for a given
head model, and then the solution of the forward problem can be computed rapidly for many
different source distributions.
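The two ideas above, deflation and the reuse of a factorization for several right hand sides, can be sketched on a toy matrix. The 4 × 4 matrix below is not a BEM matrix: it is a hypothetical symmetric matrix whose rows sum to zero, so that constant vectors lie in its null space, as for the potential. Deflation is implemented here by adding 1/n to every entry, which is one common way of imposing a zero-mean solution; the LU routine omits pivoting, which is adequate for this small positive definite example only.

```python
def lu_factor(a):
    """Doolittle LU factorization without pivoting (toy example only)."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lower[i][k] = upper[i][k] / upper[k][k]
            for j in range(k, n):
                upper[i][j] -= lower[i][k] * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))) / upper[i][i]
    return x


n = 4
# Toy singular matrix: each row sums to zero, so A applied to a constant is 0.
a = [[0.6, -0.3, 0.0, -0.3],
     [-0.3, 0.6, -0.3, 0.0],
     [0.0, -0.3, 0.6, -0.3],
     [-0.3, 0.0, -0.3, 0.6]]
# Deflation: adding 1/n to every entry removes the singularity and selects
# the solution with zero mean.
deflated = [[a[i][j] + 1.0 / n for j in range(n)] for i in range(n)]

lower, upper = lu_factor(deflated)  # expensive step, done once
# One right hand side per source configuration (each sums to zero here).
right_hand_sides = [[1.0, -1.0, 0.0, 0.0], [0.0, 2.0, -1.0, -1.0]]
solutions = [lu_solve(lower, upper, b) for b in right_hand_sides]
print(solutions)
```

Each additional source only costs two triangular solves, which is why the factorization is computed once per head model.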
The procedure described here uses a piecewise constant discretization for the potential
(i.e., P0 elements) and collocation. This is a very coarse level of discretization. In order to
improve the numerical precision, it is preferable to discretize the potential with P1 elements,
i.e., in the space of piecewise linear functions, and to use the same P1 functions as test
functions [57, 155].
However, the standard BEM derived from Geselowitz’s formulas is prone to certain
numerical errors. First, if there are large differences between the conductivities of the different
compartments of the head model, it can lead to an amplification of the numerical errors [148].
The Isolated Problem Approach can be used to reduce this effect [102]. Second, for sources
which are located close to an interface, typically at a distance smaller than the size of the
triangles used to describe the surfaces, the accuracy of the BEM drops severely. It is to
circumvent this limitation of the BEM that a new formulation has been introduced [131, 132].
This formulation, which we now present, is called the Symmetric BEM, since it
leads to a linear system where the matrix to be inverted is symmetric.
2.4.4 The Symmetric Boundary Element Method (SymBEM)
The Symmetric Boundary Element Method (SymBEM) is essentially a reformulation of the
integral equations (2.27) and (2.28) at the origin of the standard BEM. This method is based
on advanced representation theorems originally developed in the group of J-C Nedelec [164].
Green Representation Theorem
The Green Representation Theorem states that a piecewise harmonic function can be ex-
pressed as a combination of boundary integrals of its discontinuities and the discontinuities
of its normal derivative across interfaces.
Let ∂nV = n ·∇V denote the partial derivative of V in the direction of a unit vector n. The
restriction of a function f to a surface $S_j$ will be denoted $f_{S_j}$. We define the discontinuity of a
function $f : \mathbb{R}^3 \to \mathbb{R}$ across $S_j$ as
\[
[f]_{S_j} = f^-_{S_j} - f^+_{S_j},
\]
where the functions $f^-$ and $f^+$ on $S_j$ are respectively the interior and exterior limits of f:
\[
\text{for } r \in S_j, \quad f^{\pm}_{S_j}(r) = \lim_{\alpha \to 0^{\pm}} f(r + \alpha n).
\]
Let us consider an open region Ω and a function u such that ∆u = 0 in Ω and in R3\Ω. Let
$G(r) = \frac{1}{4\pi\|r\|}$ be the fundamental solution of the Laplacian such that −∆G = δ0. The Green
Representation Theorem states that, for a point r belonging to ∂Ω,
\[
\frac{u^-(r) + u^+(r)}{2} = -\int_{\partial\Omega} [u]\, \partial_{n'}G(r - r')\, ds(r') + \int_{\partial\Omega} [\partial_{n'}u]\, G(r - r')\, ds(r') .
\]
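As shown in [131], this representation also holds when Ω is the union of disjoint open sets:
$\Omega = \Omega_1 \cup \Omega_2 \cup \dots \cup \Omega_N$, with $\partial\Omega = S_1 \cup S_2 \cup \dots \cup S_N$, as in figure 2.7. In this case, for $r \in S_i$,
\[
\frac{u^-(r) + u^+(r)}{2} = \sum_{j=1}^{N} \left( -\int_{S_j} [u]_{S_j}\, \partial_{n'}G(r - r')\, ds(r') + \int_{S_j} [\partial_{n'}u]_{S_j}\, G(r - r')\, ds(r') \right) \tag{2.30}
\]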
The notation is simplified by introducing two integral operators, called the “double-layer”
and “single-layer” operators, which map a scalar function f on ∂Ω to another scalar function
on ∂Ω:
\[
(Df)(r) = \int_{\partial\Omega} \partial_{n'}G(r - r')\, f(r')\, ds(r')
\]
and
\[
(Sf)(r) = \int_{\partial\Omega} G(r - r')\, f(r')\, ds(r') .
\]
For a given operator A, its restriction which maps a function of $S_j$ to a function of $S_i$ is denoted
$A_{ij}$.

Figure 2.7: The head is modeled as a set of nested regions Ω1, . . . ,ΩN+1 with constant isotropic
conductivities σ1, . . . , σN+1, separated by interfaces S1, . . . , SN . Arrows indicate the normal
directions (outward).
The double-layer BEM
To apply the representation theorem to the forward problem of EEG, a harmonic function
which relates the potential and the sources must be produced. Let us decompose the
source term as $f = \sum_i f_i$ where the support of each $f_i$ lies inside homogeneous region $\Omega_i$,
and consider $v_{\Omega_i}$ such that $\Delta v_{\Omega_i} = f_i$ holds in all $\mathbb{R}^3$. The function $v_d = \sum_{i=1}^{N} v_{\Omega_i}$ satisfies
$\Delta v_d = f$ and is continuous across each surface $S_i$, as well as its normal derivative $\partial_n v_d$. The
function $u = \sigma V - v_d$ is a harmonic function in Ω, to which (2.30) can be applied. Since the
potential V and the normal current $\sigma \partial_n V$ are continuous across each interface, we have
$[u]_{S_i} = (\sigma_i - \sigma_{i+1}) V_i$ and $[\partial_n u] = 0$, and we obtain, on each surface $S_i$,
\[
\frac{\sigma_i + \sigma_{i+1}}{2} V_i + \sum_{j=1}^{N} (\sigma_j - \sigma_{j+1})\, D_{ij} V_j = v_d . \tag{2.31}
\]
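By noticing that, with $n'$ the unit normal at $r'$,
\[
(Df)(r) = \frac{1}{4\pi} \int_{\partial\Omega} f(r')\, \frac{(r - r') \cdot n'}{\|r - r'\|^3}\, ds(r')
\]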
we get that this formula is exactly the formula established by Geselowitz (2.27). Hence, the
classical BEM corresponds to a double-layer potential formulation because it involves the
double-layer operator D.
An extension of the Green Representation Theorem represents the directional derivative
of a harmonic function as a combination of boundary integrals of higher order. This requires
two more integral operators: the adjoint D∗ of the double-layer operator, and a hyper-singular
operator N defined by:
\[
(Nf)(r) = \int_{\partial\Omega} \partial_{n,n'}G(r - r')\, f(r')\, ds(r') .
\]
The theorem says that if r is a point of $S_i$, then
\[
-\frac{\partial_n u^-(r) + \partial_n u^+(r)}{2} = N[u] - D^*[\partial_n u] \tag{2.32}
\]
The Geselowitz formula uses the first boundary integral representation equation (2.30), whereas
the Symmetric BEM [131] uses both (2.30) and (2.32) in a formulation combining single- and
double-layer potentials.
The symmetric BEM
The originality of the symmetric Boundary Element Method is to consider one piecewise
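harmonic function per domain: the function $u_{\Omega_i}$ equal to $V - v_{\Omega_i}/\sigma_i$ within $\Omega_i$ and to $-v_{\Omega_i}/\sigma_i$ outside
$\Omega_i$. This function $u_{\Omega_i}$ is indeed harmonic in $\mathbb{R}^3 \setminus \partial\Omega_i$, and the representation equations (2.30)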
and (2.32) can be applied, leading to a system of integral equations involving two types of
unknowns: the potential Vi and the normal current (σ∂nV )i on each interface.
The surfaces are represented by triangular meshes. To fix ideas, we consider a three-
layer geometrical model for the head (cf. figure 2.7). Conductivities of each domain are re-
spectively denoted σ1, σ2 and σ3. The surfaces enclosing these homogeneous conductivity
regions are denoted S1 (inner skull boundary), S2 (skull-scalp interface) and S3 (scalp-air
interface). Denoting $\psi_i^{(k)}$ the P0 function associated to triangle i on surface $S_k$, and $\phi_j^{(l)}$
the P1 function associated to node j on surface $S_l$, the potential V on surface $S_k$ is
approximated as $V_{S_k}(r) = \sum_i x_i^{(k)} \phi_i^{(k)}(r)$, while $p = \sigma \partial_n V$ on surface $S_k$ is approximated by
$p_{S_k}(r) = \sum_i y_i^{(k)} \psi_i^{(k)}(r)$.
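As an illustration, considering the source term to be restricted to the brain compartment
$\Omega_1$, the variables $(x_k)_i = x_i^{(k)}$ and $(y_k)_i = y_i^{(k)}$ satisfy the linear system:
\[
\begin{pmatrix}
(\sigma_1+\sigma_2) N_{11} & -2 D^*_{11} & -\sigma_2 N_{12} & D^*_{12} & 0 \\
-2 D_{11} & (\sigma_1^{-1}+\sigma_2^{-1}) S_{11} & D_{12} & -\sigma_2^{-1} S_{12} & 0 \\
-\sigma_2 N_{21} & D^*_{21} & (\sigma_2+\sigma_3) N_{22} & -2 D^*_{22} & -\sigma_3 N_{23} \\
D_{21} & -\sigma_2^{-1} S_{21} & -2 D_{22} & (\sigma_2^{-1}+\sigma_3^{-1}) S_{22} & D_{23} \\
0 & 0 & -\sigma_3 N_{32} & D^*_{32} & \sigma_3 N_{33}
\end{pmatrix}
\begin{pmatrix} x_1 \\ y_1 \\ x_2 \\ y_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ c_1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\tag{2.33}
\]
where $b_1$ (resp. $c_1$) are the coefficients of the P0 (resp. P1) boundary element decomposition
of the source term $\partial_n v_{\Omega_1}$ (resp. $-\sigma_1^{-1} v_{\Omega_1}$).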
The blocks Nij and Dij map a potential Vj on Sj to a quantity defined on Si. The blocks
Sij map a normal current pj on Sj to a quantity defined on Si. The resulting matrix is block-
diagonal, and symmetric, hence the name “symmetric BEM”.
With OpenMEEG, the deflation is done by forcing the average potential on the external
surface to be zero. One can prove that this can be done by correcting the last diagonal block
with a matrix filled with ones. With the three layer BEM, this corresponds to the replacement
of N33 by a matrix N33 + α11T , α > 0.
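The effect of this correction can be checked on a toy example. The sketch below is not OpenMEEG code: the 3 x 3 matrix is a hypothetical stand-in for the block N33, chosen only because it is symmetric and singular with the constant vector in its kernel, and the value of α is arbitrary. Adding α to every entry (i.e., adding α11^T) makes the matrix invertible and selects the solution with zero average.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical stand-in for N33: symmetric, each row sums to zero, so the
# constant vector is in the kernel and the matrix is singular.
N = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
b = [1.0, 0.0, -1.0]  # compatible right-hand side (entries sum to zero)

alpha = 0.5  # arbitrary positive value for this illustration
n = len(N)
N_deflated = [[N[i][j] + alpha for j in range(n)] for i in range(n)]

x = solve(N_deflated, b)
mean_x = sum(x) / n
residual = [sum(N[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
```

In this example the solution of the deflated system still satisfies the original singular system, and its average is zero, which is the constraint described above.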
Compared to the standard BEM, the symmetric BEM introduces an additional unknown
into the problem, the normal current, so that the continuity of the normal current through
the interfaces is guaranteed. By doing so the Symmetric BEM leads to larger system matrices
but demonstrates significantly higher accuracy than the double-layer BEM [131]. This is
illustrated in the next section where the implementation of the SymBEM that we have helped
develop within the OpenMEEG software project is compared, in terms of precision, to other
available implementations of the BEM.
2.5 IMPLEMENTATION
We have seen during the presentation of the BEM and Symmetric BEM that the computation
of the electric potential produced by a dipole leads to a linear system that we write for
simplicity:
\[
A[V] = b .
\]
For the SymBEM the notation [V ] also contains the discretized normal derivative of the
potential, i.e., the normal current.
After applying the deflation to A, the potential (and the normal current with the SymBEM)
on each interface is therefore given by:
\[
[V] = A^{-1} b .
\]
In practice, the potential is only measured in EEG at the position of the electrodes on the
outer surface of the head model. As a consequence, the forward problem of EEG requires
computing the linear operator, denoted E, that maps the potential defined on the interfaces to
the potential at the sensor positions. This is just an interpolation of the potential
computed on the outer surface. This leads to:
\[
[V_{eeg}] = E A^{-1} b .
\]
The forward field, denoted $g_{eeg}$, for EEG is therefore computed in 5 steps:
• computation of A
• inversion of A
• computation of b
• computation of E
• computation of $g_{eeg} = E A^{-1} b$
With MEG, one needs to add the computation of the matrix that maps the primary currents
directly to the MEG sensors. This corresponds to the B0 term in equation (2.8). Let
us denote by c this linear operator (considering only one dipolar source). As with EEG,
one also needs to compute the operator that maps the potential on the interfaces to its
contribution at the MEG sensors. This matrix is denoted D.
The forward field, denoted $g_{meg}$, for MEG is then computed in 6 steps:
• computation of A
• inversion of A
• computation of b
• computation of D
• computation of c
• computation of $g_{meg} = c + D A^{-1} b$
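The two lists of steps above can be sketched numerically for a single dipole. Everything in this example is a hypothetical placeholder: the small matrices A, E, D and the vectors b and c carry no physical meaning and only illustrate how the operators are composed. The sketch also solves the linear system instead of forming the inverse explicitly, which is an implementation choice made here and not a statement about how any particular software proceeds.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def matvec(mat, vec):
    """Matrix-vector product for lists of lists."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]


# Hypothetical placeholders (3 unknowns, 2 electrodes, 1 MEG sensor).
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # head matrix
b = [1.0, 0.0, -1.0]                                      # source term of one dipole
E = [[0.0, 0.0, 1.0], [0.0, 0.5, 0.5]]                    # interpolation to electrodes
D = [[0.1, 0.0, 0.2]]                                     # potential -> MEG sensors
c = [0.3]                                                 # primary current -> MEG sensors

v = solve(A, b)                                      # [V] = A^{-1} b
g_eeg = matvec(E, v)                                 # g_eeg = E A^{-1} b
g_meg = [ci + di for ci, di in zip(c, matvec(D, v))]  # g_meg = c + D A^{-1} b
```

Only b and c depend on the dipole, so A, E and D can be reused unchanged for every source.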
When considering multiple dipoles, as in chapter 3 for distributed source models, the
forward field for EEG and MEG is computed for each of them, and the concatenation of all the
forward fields leads to what is usually called the leadfield matrix, or the gain matrix, which
will be denoted G. Each column of G is the forward field of one dipole.

Software                  Programming language   Licence           √ marks (one per supported feature column)
Brainstorm                Matlab                 GPL v2            √ √ √ √
Simbio                    Fortran/C++            GPL v2            √ √ √ √
Fieldtrip (dipoli)        C                      Not open source   √ √ √
SPM & Fieldtrip (BEMCP)   C/Matlab               GPL v2            √ √
MNE                       C/C++                  Not open source   √ √ √
OpenMEEG                  C++                    Cecill-B          √
Table 2.1: Review of non commercial software computing the forward problem in M/EEG. The
feature columns are, in order: DM with spherical HM in M/EEG, MM with spherical HM in MEG,
Standard BEM, Standard BEM with ISA enabled, FEM, Symmetric BEM. DM
stands for Dipolar Model, MM for Multipolar Model and HM stands for Head Model.
2.6 SOFTWARE
2.6.1 Review of non commercial available software
We now present a list of non commercial software packages that can be used to compute
the forward problem solutions presented within this chapter. Some of these packages are
completely open source: OpenMEEG, Simbio, Fieldtrip (BEMCP implementation) and Brain-
storm. The Fieldtrip Toolbox shares its code for M/EEG forward and inverse modeling with
the SPM toolbox. It offers two implementations of the BEM. The first one, called dipoli, was
written by Oostendorp and is not open source (only binary files for Linux are available), while
the second one, called BEMCP, is open source and was written by Christoph Phillips during
his PhD [176]. The Simbio solver implements a linear Galerkin method (P1 elements with
P1 test functions) as described in [57], while dipoli and BEMCP implement linear collocation
methods. The dipoli implementation details can be found in [165]. The MNE Matlab Tool-
box also offers a linear collocation implementation of the BEM usable via binary files. This
information is summarized in Table 2.1.
We now describe the work accomplished in the OpenMEEG software package. The fol-
lowing paragraph can be read as a short manual for computing the forward problem with
OpenMEEG.
2.6.2 OpenMEEG
Let us first present how the description in section 2.5 is transposed in the OpenMEEG nam-
ing conventions.
The matrix:
• A is called HeadMat or Head Matrix
• A−1 is the inverse of HeadMat
• B is called SourceMat or Source Matrix
• C is DipSource2MEGMat
• D is Head2MEGMat
• E is Head2EEGMat
The OpenMEEG package takes as input:
• subject.geom : a file describing the geometry of the head (see table 2.2)
• subject.cond : a file containing the conductivity of each tissue of the head (see table 2.3)
• eeg_electrodes.txt : a file containing the 3D positions of each EEG electrode (3
coordinates on each line)
• dipoles.txt : a file containing the 3D positions and orientations of each current dipole
(6 values on each line)
• meg_squids.txt : a file containing the 3D positions and orientations of each MEG
sensor (6 values on each line). More complex sensors can also be modeled by integration,
or with finite differences as for basic gradiometers.
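The sign convention used in the Domain lines of the geometry file (table 2.2) can be made explicit with a few lines of Python. The function parse_domains below is a hypothetical helper written for this illustration; it is not part of OpenMEEG and does not claim to reproduce its file reader. It only assumes what the caption of table 2.2 states: a positive index k means the domain lies outside the k-th interface of the list, and a negative index means it lies inside.

```python
def parse_domains(text):
    """Return {domain: [(interface_index, side), ...]} from geometry file text.

    Hypothetical helper: only lines starting with the keyword "Domain" are read.
    """
    domains = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "Domain":
            name = tokens[1]
            domains[name] = [
                (abs(int(t)), "inside" if int(t) < 0 else "outside")
                for t in tokens[2:]
            ]
    return domains


# Content of the sample geometry file of table 2.2.
GEOM = """\
# Domain Description 1.0
Interfaces 3 Mesh
skull.tri
brain.tri
scalp.tri
Domains 4
Domain Skin 1 -3
Domain Brain -2
Domain Air 3
Domain Skull 2 -1
"""

domains = parse_domains(GEOM)
for name, sides in domains.items():
    print(name, sides)
```

With the sample file, Skin is reported as outside interface 1 (skull.tri) and inside interface 3 (scalp.tri), in agreement with the description given in the caption of table 2.2.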
# Domain Description 1.0
Interfaces 3 Mesh
skull.tri
brain.tri
scalp.tri
Domains 4
Domain Skin 1 -3
Domain Brain -2
Domain Air 3
Domain Skull 2 -1
Table 2.2: Sample geometry file for OpenMEEG. It provides the names of the meshes for
all the interfaces and the structure by specifying which regions are being separated by each
mesh, e.g., the Skin region is between the meshes skull.tri (1 means that Skin is outside the
first interface on the list, i.e., “skull.tri”) and scalp.tri (-3 means that the Skin is inside the
third interface on the list, i.e., “scalp.tri”).

# Properties Description 1.0 (Conductivities)
Air 0.0
Skin 1
Brain 1
Skull 0.0125
Table 2.3: Sample conductivity file for OpenMEEG. It specifies the conductivity values for
each tissue.

The different matrices are computed with the following command lines (the .bin file extension
corresponds to matrix files stored in binary format):
OpenMEEG with the command line
Matrix A:
$ om_assemble -HeadMat subject.geom subject.cond HeadMat.bin
Note: the abbreviated option names -HM or -hm can be used instead of -HeadMat.
Matrix A^{-1}:
$ om_minverser HeadMat.bin HeadMatInv.bin
Matrix B:
$ om_assemble -DipSourceMat subject.geom subject.cond dipoles.txt SourceMat.bin
Note: the abbreviated option names -DSM or -dsm can be used instead of -DipSourceMat.
Matrix E:
$ om_assemble -Head2EEGMat subject.geom subject.cond eeg_electrodes.txt Head2EEGMat.bin
Note: the abbreviated option names -H2EM or -h2em can be used instead of -Head2EEGMat.
Matrix D:
$ om_assemble -Head2MEGMat subject.geom subject.cond meg_squids.txt Head2MEGMat.bin
Note: the abbreviated option names -H2MM or -h2mm can be used instead of -Head2MEGMat.
Matrix C:
$ om_assemble -DipSource2MEGMat dipoles.txt meg_squids.txt Source2MEGMat.bin
Note: the abbreviated option names -DS2MM or -ds2mm can be used instead of -DipSource2MEGMat.
Matrix Geeg:
$ om_gain -EEG HeadMatInv.bin SourceMat.bin Head2EEGMat.bin GainEEGMat.bin
Matrix Gmeg:
$ om_gain -MEG HeadMatInv.bin SourceMat.bin Head2MEGMat.bin Source2MEGMat.bin GainMEGMat.bin
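The EEG part of the command sequence above can be chained from a script. The sketch below only builds and prints the argument lists; the function name eeg_pipeline is hypothetical, and the executable names are assumed to be om_assemble, om_minverser and om_gain written with underscores. The call that would actually run the tools is left commented out since it requires an OpenMEEG installation.

```python
def eeg_pipeline(geom, cond, dipoles, electrodes):
    """Return the command lines (as argument lists) for an EEG gain matrix.

    Hypothetical helper; options and file names follow the commands listed above.
    """
    return [
        ["om_assemble", "-HeadMat", geom, cond, "HeadMat.bin"],
        ["om_minverser", "HeadMat.bin", "HeadMatInv.bin"],
        ["om_assemble", "-DipSourceMat", geom, cond, dipoles, "SourceMat.bin"],
        ["om_assemble", "-Head2EEGMat", geom, cond, electrodes, "Head2EEGMat.bin"],
        ["om_gain", "-EEG", "HeadMatInv.bin", "SourceMat.bin",
         "Head2EEGMat.bin", "GainEEGMat.bin"],
    ]


commands = eeg_pipeline("subject.geom", "subject.cond",
                        "dipoles.txt", "eeg_electrodes.txt")
for argv in commands:
    print("$ " + " ".join(argv))
    # To execute for real (assumes the OpenMEEG tools are on the PATH):
    # import subprocess; subprocess.run(argv, check=True)
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems when file names contain spaces.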
During this PhD we contributed a set of features to the OpenMEEG package:
• A scripting interface with Python: a demo script is provided in table 2.4, with its
output in table 2.5.
• Parallel processing with OpenMP to speed up computations on machines with multiple
processors (this required rewriting part of the code used by the operators). Computation
times with parallel processing enabled are available in figure 2.10. It can be observed
that our parallel implementation offers a significant improvement in terms of
computation time. The computation time of A and B decreases almost linearly with the
number of threads. The matrix inversion is not multithreaded, which explains why the
computation of $A^{-1}$ is not improved when increasing the number of threads. With 2
processors, a gain matrix with a realistic head model (cf. figure 2.10(c)) is assembled about
2 times faster, while with 8 processors, the same gain matrix is assembled 3 times faster.
• An advanced testing procedure to guarantee the integrity of the results obtained by the
forward problem computation. Results can be compared at no cost to analytical solutions
obtained with 3 layer spherical head models like the one presented in figure 2.8. This
procedure is based on the CTest testing software. The output of the testing procedure is
presented in table 2.6.
• A Matlab interface within the Fieldtrip Toolbox and SPM Toolbox (see table 2.13).
• A multi platform packaging system based on CPack allowing easy deployment on all
architectures (Linux, Mac and Windows environments).
Thanks to the integration of the OpenMEEG software into the Fieldtrip Toolbox, we have
been able to demonstrate that the precision obtained by our numerical solution of the for-
ward problem clearly outperforms the standard BEM implementations offered by the SPM
and Fieldtrip Toolbox (dipoli and BEMCP). We have also been involved in the development
#!/usr/bin/env python
import openmeeg as om

# =============
# = Load data =
# =============
condFile = 'om_demo.cond'
geomFile = 'om_demo.geom'
dipoleFile = 'cortex.dip'
squidsFile = 'meg_squids.txt'
electrodesFile = 'eeg_electrodes.txt'

geom = om.Geometry()
geom.read(geomFile, condFile)
dipoles = om.Matrix()
dipoles.load(dipoleFile)
squids = om.Sensors()
squids.load(squidsFile)
electrodes = om.Matrix()
electrodes.load(electrodesFile)

# =================================================
# = Compute forward problem (Build Gain Matrices) =
# =================================================
gaussOrder = 3  # Integration order over the triangles in the BEM
hm = om.HeadMat(geom, gaussOrder)
hminv = hm.inverse()
dsm = om.DipSourceMat(geom, dipoles, gaussOrder)
ds2mm = om.DipSource2MEGMat(dipoles, squids)
h2mm = om.Head2MEGMat(geom, squids)
h2em = om.Head2EEGMat(geom, electrodes)
gain_meg = om.GainMEG(hminv, dsm, h2mm, ds2mm)
gain_eeg = om.GainEEG(hminv, dsm, h2em)

# Print the dimensions (rows x columns) of each matrix
print("hm : %d x %d" % (hm.nlin(), hm.ncol()))
print("hminv : %d x %d" % (hminv.nlin(), hminv.ncol()))
print("dsm : %d x %d" % (dsm.nlin(), dsm.ncol()))
print("ds2mm : %d x %d" % (ds2mm.nlin(), ds2mm.ncol()))
print("h2mm : %d x %d" % (h2mm.nlin(), h2mm.ncol()))
print("h2em : %d x %d" % (h2em.nlin(), h2em.ncol()))
print("gain_meg : %d x %d" % (gain_meg.nlin(), gain_meg.ncol()))
print("gain_eeg : %d x %d" % (gain_eeg.nlin(), gain_eeg.ncol()))
Table 2.4: Demo script for computing the forward problem with OpenMEEG in Python.
Sorted List : 1 0 2
Sorted Domains : Brain Skull Scalp Air
Total number of points : 126
Total number of triangles : 240
Checking
Mesh 0 : internal conductivity = 1 and external conductivity = 0.0125
Mesh 1 : internal conductivity = 0.0125 and external conductivity = 1
Mesh 2 : internal conductivity = 1 and external conductivity = 0
OPERATOR S...
[********************]
OPERATOR S...
[********************]
OPERATOR S...
[********************]
OPERATOR N...
[********************]
OPERATOR N...
[********************]
OPERATOR N...
[********************]
OPERATOR D (Optimized)...
[********************]
OPERATOR D (Optimized)...
[********************]
OPERATOR D (Optimized)...
[********************]
OPERATOR D (Optimized)...
[********************]
OPERATOR S...
[********************]
OPERATOR S...
[********************]
OPERATOR N...
[********************]
OPERATOR N...
[********************]
OPERATOR D (Optimized)...
[********************]
[********************]
[********************]
hm : 286 x 286
hminv : 286 x 286
dsm : 286 x 42
ds2mm : 162 x 42
h2mm : 162 x 286
h2em : 162 x 286
gain_meg : 162 x 42
gain_eeg : 42 x 42
Table 2.5: Output of Python demo script presented in table 2.4.
Running tests...
Start processing tests
Test project openmeeg_trunk
1/ 73 Testing matlibtest Passed
2/ 73 Testing HM-Head1 Passed
3/ 73 Testing HMINV-Head1 Passed
4/ 73 Testing SSM-Head1 Passed
5/ 73 Testing AI-Head1 Passed
6/ 73 Testing H2EM-Head1 Passed
7/ 73 Testing SurfGainEEG-Head1 Passed
8/ 73 Testing ESTEEG-Head1 Passed
9/ 73 Testing EEG-HEAT-Head1 Passed
10/ 73 Testing EEG-MN-Head1 Passed
11/ 73 Testing EEG-TV-Head1 Passed
12/ 73 Testing H2MM-Head1 Passed
13/ 73 Testing SS2MM-Head1 Passed
14/ 73 Testing SurfGainMEG-Head1 Passed
15/ 73 Testing ESTMEG-Head1 Passed
16/ 73 Testing MEG-HEAT-Head1 Passed
17/ 73 Testing MEG-MN-Head1 Passed
18/ 73 Testing MEG-TV-Head1 Passed
19/ 73 Testing DSM-Head1 Passed
20/ 73 Testing DS2MM-Head1 Passed
21/ 73 Testing DipGainEEG-Head1 Passed
22/ 73 Testing DipGainMEG-Head1 Passed
...
52/ 73 Testing compareEEGEST-dip-Head1-d1 Passed
53/ 73 Testing compareEEGEST-dip-Head2-d1 Passed
54/ 73 Testing compareEEGEST-dip-Head1-d2 Passed
55/ 73 Testing compareEEGEST-dip-Head2-d2 Passed
56/ 73 Testing compareEEGEST-dip-Head1-d3 Passed
57/ 73 Testing compareEEGEST-dip-Head2-d3 Passed
58/ 73 Testing compareEEGEST-dip-Head1-d4 ***Failed - supposed to fail
59/ 73 Testing compareEEGEST-dip-Head2-d4 Passed
60/ 73 Testing compareEEGEST-dip-Head1-d5 ***Failed - supposed to fail
61/ 73 Testing compareEEGEST-dip-Head2-d5 Passed
62/ 73 Testing compareMEGEST-dip-Head1-d1 Passed
63/ 73 Testing compareMEGEST-dip-Head2-d1 Passed
64/ 73 Testing compareMEGEST-dip-Head1-d2 Passed
65/ 73 Testing compareMEGEST-dip-Head2-d2 Passed
66/ 73 Testing compareMEGEST-dip-Head1-d3 Passed
67/ 73 Testing compareMEGEST-dip-Head2-d3 Passed
68/ 73 Testing compareMEGEST-dip-Head1-d4 Passed
69/ 73 Testing compareMEGEST-dip-Head2-d4 Passed
70/ 73 Testing compareMEGEST-dip-Head1-d5 Passed
71/ 73 Testing compareMEGEST-dip-Head2-d5 Passed
72/ 73 Testing compareMEGEST-dip-Head1-d6 ***Failed - supposed to fail
73/ 73 Testing compareMEGEST-dip-Head2-d6 Passed
100% tests passed, 0 tests failed out of 73
Table 2.6: Output of testing procedure for OpenMEEG. Output is systematically compared to
analytical solutions with spherical head models.
of the SimBio package during this PhD, which allowed us to add to the comparison the SimBio
implementation of the BEM with ISA. The sample dataset used to demonstrate this is presented
in figure 2.8 with 5 dipoles at various distances from the inner layer. The quantification of
performance is based on the Relative Difference Measure (RDM) and the ratio of Magnitude
(MAG) between each numerical solution and the analytical solution. The analytical solutions
are computed with the formulas detailed in section 2.3.1.
The RDM between two forward fields is defined as:
\[
\mathrm{RDM}(g_{numeric}, g_{analytic}) = \left\| \frac{g_{numeric}}{\|g_{numeric}\|} - \frac{g_{analytic}}{\|g_{analytic}\|} \right\| \in [0, 2] .
\]
The closer the RDM is to 0, the better.
The MAG between two forward fields is defined as:
\[
\mathrm{MAG}(g_{numeric}, g_{analytic}) = \frac{\|g_{numeric}\|}{\|g_{analytic}\|}
\]
The closer the MAG is to 1, the better.
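The two measures defined above translate directly into code. The sketch below is a minimal pure-Python version using the Euclidean norm; the function names rdm and mag and the three test vectors are chosen for this illustration only.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def rdm(g_numeric, g_analytic):
    """Relative Difference Measure: distance between the normalized fields."""
    nn, na = norm(g_numeric), norm(g_analytic)
    return norm([a / nn - b / na for a, b in zip(g_numeric, g_analytic)])


def mag(g_numeric, g_analytic):
    """Ratio of magnitudes between the numerical and analytical fields."""
    return norm(g_numeric) / norm(g_analytic)


g_analytic = [1.0, 2.0, 2.0]
g_scaled = [2.0, 4.0, 4.0]      # same topography, twice the amplitude
g_flipped = [-1.0, -2.0, -2.0]  # opposite topography, same amplitude

rdm_scaled, mag_scaled = rdm(g_scaled, g_analytic), mag(g_scaled, g_analytic)
rdm_flipped, mag_flipped = rdm(g_flipped, g_analytic), mag(g_flipped, g_analytic)
```

The scaled field illustrates that the RDM is insensitive to a global amplitude error, which is captured by the MAG instead, while the sign-flipped field reaches the upper bound of the RDM with a MAG of 1.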
The results with EEG forward fields are presented in figure 2.9 for three-shell spherical
head models with 3 different point samplings on each interface: one with only 42 vertices
per interface and 42 EEG electrodes, one with 162 points per interface and 162 EEG
electrodes, and one with 642 points per interface and 642 EEG electrodes. The radii of the 3
shells, meant to reproduce the inner surface of the skull, the outer surface of the skull and
the skin, were set to 88, 92 and 100. The conductivities of the three domains were set to the
values commonly used in the literature: 1, 1/80 and 1.
From these simulations we can conclude that:
• The BEMCP implementation is clearly the least accurate solver.
• SimBio and DIPOLI give very similar results.
• Our implementation of the Symmetric BEM provides the most accurate solutions.
The numerical values plotted in figure 2.9 are reproduced in tables 2.7, 2.8, 2.9, 2.10, 2.11
and 2.12.
Remark. Late in our investigations, we also experimented with the Matlab BEM Toolbox developed
by Matti Stenroos (available at http://peili.hut.fi/BEM/). This software implements
the classical formulation of the BEM from Geselowitz with a linear collocation method. Our
first results with this toolbox indicate that it does not provide better accuracy than
OpenMEEG either.
Distance to inner layer 45.5 20 11.5 7.25 3.85
BEMCP 1.87e-01 1.06e+00 1.77e+00 1.84e+00 1.86e+00
DIPOLI 1.12e-01 2.58e-01 5.12e-01 6.51e-01 7.36e-01
OpenMEEG 4.23e-02 9.99e-02 1.57e-01 2.03e-01 2.45e-01
SimBio 5.89e-02 1.88e-01 4.44e-01 5.63e-01 6.23e-01
Table 2.7: RDMs precision results with 42 vertices per interface.
Figure 2.8: Spherical head model with 5 dipoles close to the inner layer. Panels: (a) 3 layers
spherical head model, (b) zoom.
Distance to inner layer 45.5 20 11.5 7.25 3.85
BEMCP 1.59e-01 3.69e-01 1.44e+00 1.82e+00 1.86e+00
DIPOLI 5.45e-02 9.77e-02 1.34e-01 2.89e-01 5.59e-01
OpenMEEG 2.55e-02 5.23e-02 8.10e-02 1.03e-01 1.29e-01
SimBio 4.49e-02 7.73e-02 1.14e-01 2.77e-01 5.46e-01
Table 2.8: RDMs precision results with 162 vertices per interface.
Distance to inner layer 45.5 20 11.5 7.25 3.85
BEMCP 6.38e-02 1.83e-01 2.77e-01 6.62e-01 1.83e+00
DIPOLI 2.39e-02 4.32e-02 5.25e-02 5.97e-02 1.76e-01
OpenMEEG 2.76e-03 6.91e-03 1.03e-02 1.31e-02 1.77e-02
SimBio 1.91e-02 3.43e-02 4.15e-02 4.88e-02 1.79e-01
Table 2.9: RDMs precision results with 642 vertices per interface.
Distance to inner layer 45.5 20 11.5 7.25 3.85
BEMCP 2.66e+00 2.67e+00 1.44e+01 5.27e+01 2.40e+02
DIPOLI 8.09e-01 9.03e-01 1.60e+00 3.25e+00 9.45e+00
OpenMEEG 1.07e+00 1.04e+00 9.99e-01 9.62e-01 9.19e-01
SimBio 1.49e+00 1.31e+00 1.18e+00 1.09e+00 9.88e-01
Table 2.10: MAGs precision results with 42 vertices per interface.
Distance to inner layer 45.5 20 11.5 7.25 3.85
BEMCP 1.32e+00 1.48e+00 1.81e+00 1.16e+01 6.94e+01
DIPOLI 8.06e-01 7.87e-01 8.25e-01 1.07e+00 2.53e+00
OpenMEEG 1.11e+00 1.12e+00 1.15e+00 1.18e+00 1.21e+00
SimBio 8.11e-01 7.96e-01 8.38e-01 1.09e+00 2.58e+00
Table 2.11: MAGs precision results with 162 vertices per interface.
Figure 2.9: Evaluation of precision of different implementations of the BEM (OpenMEEG, BEMCP,
DIPOLI, SimBio) with three layers spherical head models. Each panel plots the RDM or the MAG
against the distance to the inner layer: (a) RDM with 42 points per interface, (b) MAG with 42
points per interface, (c) RDM with 162 points per interface, (d) MAG with 162 points per
interface, (e) RDM with 642 points per interface, (f) MAG with 642 points per interface. We
observe that the Symmetric BEM outperforms the other methods in terms of precision.
Distance to inner layer 45.5 20 11.5 7.25 3.85
BEMCP 1.07e+00 1.11e+00 1.21e+00 9.49e-01 1.11e+01
DIPOLI 9.15e-01 9.05e-01 8.98e-01 9.01e-01 1.08e+00
OpenMEEG 1.00e+00 1.01e+00 1.01e+00 1.01e+00 1.01e+00
SimBio 9.28e-01 9.21e-01 9.16e-01 9.22e-01 1.12e+00
Table 2.12: MAGs precision results with 642 vertices per interface.
Figure 2.10: Computation times (in seconds, for 1, 2, 4 and 8 threads) with parallel processing
enabled for all the steps (A, A^{-1}, B, E, G) required to compute an EEG leadfield. Tests on 3
head models: (a) spherical head model with 3 layers (162 points per interface) and 1 dipole,
(b) spherical head model with 3 layers (642 points per interface) and 1 dipole, (c) realistic
head model with 600, 638 and 625 points on the 3 interfaces and 14055 dipoles. Computation was
performed on a Linux 64bit architecture with 8 processors and 64 GB of RAM. Thanks to
the parallel implementation of the operators, the computation time of A and B decreases
almost linearly with the number of threads. The matrix inversion is not multithreaded, which
explains why the computation of A^{-1} is not improved when increasing the number of threads.
% ===========================================
% = Generate a 3 layers spherical headmodel =
% ===========================================

% 3 Layers
r = [100 92 88]; % radius of each interface
c = [1 1/80 1]; % conductivity within each interface

[pnt,tri] = icosahedron162; % sphere with 162 vertices per interface

% create a set of electrodes on the outer surface
sens.pnt = max(r) * pnt;
sens.label = {};
nsens = size(sens.pnt,1);
for ii=1:nsens
  sens.label{ii} = sprintf('vertex%03d', ii);
end

% Position of the dipole
pos = [0 0 70];

% create a BEM volume conduction model (3 nested interfaces)
vol = [];
for ii=1:length(r)
  vol.bnd(ii).pnt = pnt * r(ii);
  vol.bnd(ii).tri = tri;
end
vol.cond = c;

% =========================
% = Compute the leadfield =
% =========================

% compute the BEM
cfg = []; % start from an empty configuration
cfg.method = 'openmeeg'; % can be dipoli or bemcp
vol = prepare_bemmodel(cfg, vol);
lf_openmeeg = compute_leadfield(pos, sens, vol);

% lf_openmeeg is a 162 x 3 matrix
% each column of lf_openmeeg is the forward field of a dipole
% in one direction of the coordinate system
Table 2.13: Computing an EEG leadfield with Fieldtrip and OpenMEEG.
2.7 CONCLUSION
Brain functional imaging with M/EEG requires an efficient and accurate forward model.
In this chapter, we have presented the general framework to achieve good forward modeling.
This required introducing the physics with Maxwell's equations, making a set of hypotheses such
as the quasi-static approximation, and using simplified head models. We have presented the
theory and discussed some implementation details before finally demonstrating that the
forward solver that we contributed to develop and promote in the M/EEG community was the
most accurate of the BEM solvers compared in this chapter.
Before closing this chapter, we would like to mention recent and promising work on for-
ward modeling with anisotropic conductivity models [170], i.e., models where the conductivity
can vary inside the same tissue. This method avoids the meshing step required by the above
FEM and BEM methods and provides quite accurate solutions in a very reasonable amount
of time.
We will now move on to the other fundamental aspect of brain functional imaging with
MEG: the inverse problem.
CHAPTER 3
THE INVERSE PROBLEM WITH
DISTRIBUTED SOURCE MODELS
In chapter 2, we have seen how the tiny electromagnetic fields produced by the neural activity
are modeled to understand what is measured with M/EEG devices. This aspect of the processing
of M/EEG data is called the forward problem. Its counterpart is the inverse problem. The
inverse problem of M/EEG is the procedure that consists in recovering the distribution of the
neural generators that have produced the measurements. Three main types of approaches
exist to solve this problem:
1. The parametric models usually referred to as dipole fitting approaches.
2. The beamforming or scanning techniques.
3. The image-based methods with distributed source models.
In this chapter, the first two approaches are briefly explained and commented. The last
one, the image-based method, is presented in more detail. The M/EEG inverse problem is pre-
sented in a classical framework where the solution is penalized with a regularization prior.
Standard priors are based on ℓ2 norms, leading to differentiable optimization problems and
closed-form solutions. In this chapter, focus is put on such priors, going from the simple
“Minimum-Norm” (MN) approach to spatiotemporal solvers and learning-based methods re-
cently presented in the literature.
This chapter covers the methodological aspects of multiple inverse solvers based on ℓ2
priors. It presents for each of them the hypotheses made and the possible limitations when
used with M/EEG data. The optimization strategies employed will also be commented and
discussed. For experimental results and simulation studies with the reviewed solvers, we
refer the reader to the original papers. We also provide the code snippets to use these solvers
using EMBAL, an open source toolbox that we wrote during this thesis.
Contents
3.1 General introduction to inverse methods
3.1.1 Parametric models and dipole fitting approaches
3.1.2 Scanning methods: the beamformers
3.1.3 Image-based methods
3.2 Minimum norm solutions and its variants
3.2.1 The Minimum-Norm solution
3.2.2 Variants around the minimum-norm solution
3.3 Learning-based methods
3.3.1 Model selection using a multiresolution approach: MiMS
3.3.2 Restricted Maximum Likelihood (ReML) and Sparse Bayesian Learning (SBL)
3.4 Conclusion
3.1 GENERAL INTRODUCTION TO INVERSE METHODS
Notations
All matrices and vectors are written in bold letters. Matrices, such as A, are written in
upper case, whereas vectors, such as b, are in lower case. A real valued matrix A ∈ Rn×p has
n rows and p columns. The notation ‖A‖F stands for the Frobenius norm of A, while |||A|||2,
or simply |||A|||, stands for the spectral norm. The notation ‖b‖2 stands for the ℓ2 norm of the
vector b. The matrix I stands for the identity matrix.
3.1.1 Parametric models and dipole fitting approaches
The dipole fitting approaches assume that the measured data have been produced by a small
number of active regions that can each be modeled by an equivalent current dipole (ECD). The
number of ECDs, denoted K, is fixed. Each dipole i has a position $r_i$ and a moment $q_i$. The
strength of the dipole is given by $x_i = \|q_i\|_2$. We denote by $g_i(r_i, q_i)$ the forward field
produced by this dipole and by m the M/EEG measurements (cf. chapter 2). Since the forward field
depends linearly on $x_i$, the forward field $g_i(r_i, q_i)$ can be rewritten as
$g_i(r_i, q_i / \|q_i\|)\, x_i$. The data m can
correspond either to one time instant or to a block of time samples. If the dipoles are allowed to
move during the time window of interest, the method is called moving dipole, whereas if they can
only rotate it is referred to as rotating dipole. Parametric dipole fitting algorithms minimize
a data fit cost function such as the squared Frobenius norm of the residual [11, 156, 193]:
\[
\min_{(r_i, q_i)_{i=1,\dots,K}} \left\| m - \sum_{i=1}^{K} g_i\!\left(r_i, \frac{q_i}{\|q_i\|}\right) x_i \right\|_F^2 .
\]
This optimization problem is nonlinear and non-convex, and solvers are easily trapped in local
minima as soon as K > 1. The optimization strategies employed range from Levenberg-Marquardt and
Nelder-Mead downhill simplex searches to global optimization schemes using multistart
methods, genetic algorithms and simulated annealing [209].
The main limitation of these methods is that the user has to fix a priori the number
of active regions. For these reasons, dipole fitting approaches are commonly used with only
one dipole, or sometimes a few, after one is set to a known position. These limitations imply
that such methods can only be used reliably with one very focal active region. This is usually
a valid assumption for brain activations occurring shortly after stimulation. However, when
other functional imaging data with high spatial resolution, such as fMRI activation maps, are
available, the positions of the dipoles can be assumed to be known. In this case, multiple dipoles
can be manually positioned. Their amplitudes and possibly their orientations are the only
estimated parameters. Such an approach is illustrated in chapter 5, where investigations on
the human visual cortex with this methodology are presented.
3.1.2 Scanning methods: the beamformers
The beamforming approaches, a.k.a. scanning methods, avoid the non-convexity issue by scanning
a region of interest, typically the gray matter forming the cortical mantle. It is also
possible to sample dipoles on a regular grid within the cortical envelope. An estimator of
the contribution of each putative source location to the data can be derived either via spatial
filtering techniques or signal classification indices. Historically, scanning methods were first
introduced in the radar and sonar community.
    clear options
    % C is the covariance matrix
    options.C = C;
    % pct is a percentage used to regularize the inversion of C
    options.pct = 10;
    [X,W] = lcmv_inverse(M,G,options);
    % W contains the spatial filters of all the dipoles
Table 3.1: Running an LCMV beamformer with EMBAL.
In its simplest presentation, a spatial filter is a vector w, function of the location and
orientation of the dipole of interest, that when correlated with the measurements m provides
an estimate of the moment’s amplitude:
\[ x(r) = w^T m \]
where x denotes the moment’s amplitude of the dipolar source considered. A well designed
spatial filter should filter out sources which do not come from a small volume around r.
When considering dipolar sources with unconstrained orientation, the spatial filter w is a
matrix with 3 columns (one per orientation) and the result of $w^T m$ provides the moment of
the dipole.
The simplest spatial filter is called a matched filter. Let $g_i$ be the forward field of a dipole
at position $r_i$ with a normalized moment q ($\|q\|_2 = 1$). The vector $g_i$ can be seen as the ith
column of the lead field, and the matched filter is obtained by normalizing this column. The
spatial filter for this position and orientation is given by
\[ w = \frac{g_i}{\|g_i\|_2} \]
This approach guarantees that, when only one source is active, the maximum of the absolute
value of the estimate is reached at the location of the true source. In practice this
single-source assumption is usually not valid.
Since the correlation between the columns of the leadfield matrix is high, the spatial
resolution of the matched filter is limited. The goal of more advanced spatial filters is to
estimate the activity at a source point while avoiding the crosstalk from other regions. By
doing so, the perturbing sources have little influence on the estimation in the region of interest.
The most common spatial filter is the Linearly Constrained Minimum Variance (LCMV)
beamformer [215]. It attempts to minimize the beamformer output power subject to a unit
gain constraint:
\[ w_{LCMV} = \arg\min_{w} \; \mathrm{trace}(w^T C w) \quad \text{subject to} \quad w^T g = 1 \tag{3.1} \]
where C is the data covariance matrix. By constraining the gain at the considered location
and minimizing the energy projected from elsewhere, the LCMV beamformer limits the influence
of the noise and the crosstalk between the different sources. The constrained optimization
problem (3.1) is solved with the method of Lagrange multipliers under the assumptions
of decorrelation between different sources, and between the sources and the noise. The
solution is given by:
\[ w_{LCMV}^T = (g^T C^{-1} g)^{-1} g^T C^{-1} . \]
The formula shows that it is important to have a correct estimate of the covariance matrix C,
which in practice requires a sufficient amount of data.
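To make the closed-form LCMV solution concrete, here is a minimal NumPy sketch for a single dipole with fixed orientation. It is not the EMBAL implementation: the function name `lcmv_weights`, the `loading` parameter (a simple diagonal loading standing in for a regularized inversion of C) and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def lcmv_weights(g, C, loading=0.0):
    """Return w = C^{-1} g / (g^T C^{-1} g) for one fixed-orientation dipole.

    `loading` is a hypothetical diagonal-loading factor (fraction of the
    mean sensor variance) used to stabilize the inversion of C.
    """
    d_m = C.shape[0]
    C_reg = C + loading * (np.trace(C) / d_m) * np.eye(d_m)
    Cinv_g = np.linalg.solve(C_reg, g)   # C^{-1} g without forming C^{-1}
    return Cinv_g / (g @ Cinv_g)

# Synthetic data: one source of interest, one interfering source, sensor noise.
rng = np.random.default_rng(0)
d_m, d_t = 8, 500
g = rng.standard_normal(d_m)             # forward field of the source of interest
g_other = rng.standard_normal(d_m)       # forward field of the interfering source
M = (np.outer(g, rng.standard_normal(d_t))
     + 3.0 * np.outer(g_other, rng.standard_normal(d_t))
     + 0.1 * rng.standard_normal((d_m, d_t)))
C = M @ M.T / d_t                        # empirical data covariance

w = lcmv_weights(g, C)
w_matched = g / (g @ g)                  # matched filter rescaled to unit gain
x_hat = w @ M                            # estimated source time course
```

Both `w` and `w_matched` satisfy the unit gain constraint, so comparing `w @ C @ w` with `w_matched @ C @ w_matched` shows how much output power the LCMV filter removes relative to the matched filter.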
Synthetic aperture magnetometry (SAM) [217] is an alternative to the LCMV beamformer.
Contrary to LCMV, SAM works on statistical quantities based on the differences between
a control period and the period of activations. Therefore SAM integrates some additional
information linked to the design of the experimental paradigms.
    clear options
    % k is the dimension of the signal subspace
    options.k = k;
    [X] = music_inverse(M,G,options);
    % X contains the scores obtained by the MUSIC cost function
Table 3.2: Running MUSIC with EMBAL.
An attractive feature of the beamformer methods is that they do not require any prior
on the number of underlying sources. However, they make the strong assumption that
the activations of the different sources are uncorrelated. This hypothesis is a particularly
critical point. Neural populations in different parts of the brain often co-activate, forming
a network of correlated sources. Even if simulation results [194] and evaluation on
real data [97] seem to indicate that LCMV-based beamforming methods are robust to moderate
levels of source/interference correlation, it is still a fundamental limitation of spatial filtering
methods.
The alternative to spatial filtering also originates from the radar and sonar community.
These methods separate signal from noise via signal subspaces.
MUltiple SIgnal Classification (MUSIC) is the most popular of these methods
[154]. In the MUSIC algorithm, the space spanned by the measurements m is divided with
an SVD into the signal space and the noise space. If we write the SVD of m as $m = U S V^T$
and estimate the rank of the signal space to be r, the signal space is spanned by the first r
columns of U, denoted $U_r$. Note that for the SVD to make sense, the measurements m have to
contain multiple time instants. The cost function associated with MUSIC in the beamforming
framework is given by:
\[ \frac{\|(I - U_r U_r^T)\, g_i\|}{\|g_i\|} . \tag{3.2} \]
The linear operator $(I - U_r U_r^T)$ acts as an orthogonal projection onto the noise subspace.
The smaller this score, the more the ith dipole contributes to the measurements.
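The MUSIC score (3.2) can be sketched in a few lines of NumPy. This is an illustrative sketch and not the EMBAL `music_inverse` function: the name `music_scores`, the random leadfield and the simulated sources are assumptions.

```python
import numpy as np

def music_scores(M, G, r):
    """Score (3.2) for every column of G, given a signal subspace of rank r."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    Ur = U[:, :r]                              # basis of the signal subspace
    residual = G - Ur @ (Ur.T @ G)             # (I - Ur Ur^T) applied to each g_i
    return np.linalg.norm(residual, axis=0) / np.linalg.norm(G, axis=0)

# Synthetic example: two active dipoles with independent time courses.
rng = np.random.default_rng(0)
d_m, d_x, d_t = 20, 100, 200
G = rng.standard_normal((d_m, d_x))            # stand-in for a leadfield
active = [7, 42]
X = np.zeros((d_x, d_t))
X[active] = rng.standard_normal((2, d_t))
M = G @ X + 1e-3 * rng.standard_normal((d_m, d_t))

scores = music_scores(M, G, r=2)
found = sorted(np.argsort(scores)[:2].tolist())  # the two smallest scores
```

In this low-noise simulation the two smallest scores are attained at the simulated dipoles; with real data the choice of r is the delicate step.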
A greedy strategy that aims at modeling source configurations with multiple generators
has been inspired by the MUSIC algorithm. This variant is called Recursively APplied MUSIC
(RAP MUSIC) [153] since it consists in applying the MUSIC cost function successively
after removing the contribution of the previously identified sources. Just as matching pursuit
algorithms are used for sparse signal decomposition over a dictionary of atoms [144], the RAP
MUSIC method adopts a greedy strategy to select the relevant dipoles in a dictionary of
sources.
The MUSIC algorithm is relatively popular in the M/EEG community, probably because of
its robustness to noise and its ability to provide precise locations for the current generators.
However, the necessity to set a priori the size of the signal subspace can be an issue. When
the singular values of m do not exhibit a sharp drop after the rth singular value, which
would indicate that m is well approximated by a rank r matrix, the choice of
the rank is left to the experience of the user. One clear advantage of the MUSIC method over
the LCMV beamformer is the relaxation of the decorrelation constraint between the sources.
However, MUSIC requires that the active dipoles have linearly independent time series.
3.1.3 Image-based methods
The alternative to dipole fitting and scanning methods is the image-based approach. With
this method source models, typically dipoles, are sampled, or distributed, over the source
space. The source space can be defined as a volume or as a surface. When considering dipo-
lar source models and a volumetric source space, dipoles are typically sampled on a regular
3D grid within the brain region. With surface-based source models, the dipoles are typically
distributed over the cortical mantle modeled as a triangular mesh. Such an approach was
pioneered by Dale and Sereno in [50], who presented surface-based reconstructions of the
neural activations. The neural activations are represented as scalar valued data, hence the
comparison with images. The difference comes from the discrete space over which the ac-
tivations are defined. Rather than having to deal with scalar valued data defined on a 2D
or 3D grid like in standard image processing applications, the space is a surface tessellation
defined with vertices and triangles. The surface-based distributed source model is illustrated
on a synthetic dataset presented in figure 3.1.
Figure 3.1: Surface-based distributed dipolar sources illustration. (a) Active region on
inflated cortical mesh. (b) Active dipoles with constrained orientations. (c) Zoom on active
dipoles with constrained orientations. The synthetic active region is located on the posterior
part of the central sulcus, where the human primary somatosensory cortex (S1) lies.
Source reconstructions on the cortical surface require segmenting the cortex using a T1
MRI image (this type of anatomic data was used to estimate the triangulated interfaces for
the BEM in section 2.4.3). This step is rather complex but well handled by software such
as BrainVisa [38] or FreeSurfer [51], which provide almost fully automatic pipelines to run
the segmentations. Such pipelines are generally not integrated in commercial M/EEG source
imaging software (see footnote 1), which therefore only provide volumetric source spaces with 3D grids.
The current generators that produce the electromagnetic fields are known to be located in
the gray matter forming the cortex. This implies that the estimated sources should at least
be constrained to be located within the gray matter. This is achieved with surface source
models. As a further argument in favor of such models, the fMRI
community also tends to map the acquired 3D data onto cortical surfaces [66]. One reason
for this is that anatomical landmarks are more easily defined on cortical segmentations than
on volumetric data.
Orientation vs. no orientation constraint
With distributed dipolar source models, the orientation of the dipoles can either be defined a
priori using the normal to the cortical mesh (cf. figure 3.1(c)), or left unconstrained. When
the dipole orientations are left unconstrained, 3 orthogonal dipoles are positioned at each
location. With MEG, since sensors are blind to the radial component of the source current
(exactly so in a spherical head model), only 2 can
be used. Considering our knowledge of the structure of the neural assemblies formed by the
pyramidal neurons (cf. chapter 1), constraining the orientation is a reasonable assumption.
One can also argue that the more prior knowledge is used to compute neural estimates, the better
it is. However, practice shows that the orientation is a critical parameter for a dipole since it
affects its forward field on the M/EEG sensors a lot more than its 3D position. This suggests
that if orientation constraints are used, the normals to the cortical mesh should be very
accurately estimated. Depending on the brain location of the sources this can be more or less
challenging.
In this chapter, many illustrations are presented on the somatosensory cortex lying on the
post-central gyrus. The central sulcus and central gyrus of the cortex are major structures
of the human cortex and are very well segmented with anatomical pipelines. For this reason
the orientation constraint is generally well justified in this brain region.
3.2 MINIMUM NORM SOLUTIONS AND ITS VARIANTS
When orientations are fixed and only the amplitudes of the dipolar current generators
need to be estimated, the forward problem results in the following linear problem:
\[ M = G X + E \tag{3.3} \]
where G stands for the forward operator, M corresponds to the measurements (electric potential
and/or magnetic field), X contains the unknown amplitudes of the sources and E is the noise.
We denote the number of sources by $d_x$, the number of sensors by $d_m$ and the number of
time instants by $d_t$. With these notations, we have $M \in \mathbb{R}^{d_m \times d_t}$,
$G \in \mathbb{R}^{d_m \times d_x}$, $X \in \mathbb{R}^{d_x \times d_t}$ and
$E \in \mathbb{R}^{d_m \times d_t}$.
In practice, $d_m$ ranges from 10, for low resolution EEG, to 400, for high resolution
combined MEG and EEG studies. The parameter $d_t$ is commonly between 1 and a few thousand.
With the digital amplifiers used in M/EEG, the sampling rate can be over 1000 Hz,
which leads to high values of $d_t$ when recording several seconds of signal. The number of
sources $d_x$ is given by the number of dipoles distributed over the cortical surface. In practice
the number of vertices on a typical segmented cortical surface ranges from 10000 to 50000.
With dipoles having their orientations constrained by the normal to the mesh, the number of
dipoles corresponds to the number of vertices. When the orientation is not constrained, the
number of dipoles can be three times bigger.
Footnote 1: To our knowledge, only the software package Curry from http://www.neuroscan.com/ provides segmentations and surface-based reconstructions.
These remarks show that the M/EEG inverse problem with distributed
source models is strongly ill-posed ($d_m \ll d_x$): the number of unknowns is much bigger than
the number of equations. Estimating the sources X given only the measurements
M therefore requires priors on the solutions.
Setting a prior typically consists in assuming that the solution is small for a given norm
denoted just for now by ‖ ‖. In other words, we assume that a good estimate X∗ of the true
source distribution is given by the solution to the following optimization problem:
\[ X^* = \arg\min_{X} \|M - GX\|_F , \quad \text{subject to} \quad \|X\| \le \eta . \tag{3.4} \]
The norm denoted by $\|\cdot\|_F$ corresponds to the Frobenius norm of the matrix and the
parameter η controls the regularity of the solution. We will refer to $\|M - GX\|_F$ as the
reconstruction error or the norm of the residual. In other words, we want to minimize the
reconstruction error while constraining the solution to be small for a given norm.
Problem (3.4) can also be presented as:
\[ X^* = \arg\min_{X} \|X\| , \quad \text{subject to} \quad \|M - GX\|_F \le \delta . \tag{3.5} \]
In this case, we want to minimize the norm of the solution while imposing that the
reconstruction error is smaller than δ.
Remark. In statistics, the norm ‖X‖ measures the model complexity. By choosing the norm,
i.e., a prior, a model is assumed for the solution. By constraining ‖X‖ we require the solution
to be simple according to a given measure of complexity. In other words, the optimal solution
is the “simplest” solution, for the model considered, that correctly explains the observed data.
In practice the problem (3.5) is more commonly presented in the Lagrangian formulation:
\[ X^* = \arg\min_{X} \|M - GX\|_F^2 + \lambda \|X\| , \quad \lambda > 0 . \tag{3.6} \]
The formulations in (3.6), (3.5) and (3.4) are equivalent if the problem is convex. We
will refer to (3.6) as the penalized formulation of the problem. The parameter λ controls
the “trade-off” between the fidelity to measurements and noise sensitivity. It balances the
reconstruction error and the regularity of the solution. The lower the level of noise present in
the measurements, the smaller the reconstruction error should be. There is a correspondence
between the parameter η and the parameter λ, although this link is usually not explicit.
In the following paragraphs, we will explore a series of approaches that can be cast into
this Lagrangian formulation. Focus will be put on a series of variants involving ℓ2 norms. For
priors involving non differentiable constraints and typically ℓ1 norms, we refer the reader to
chapter 4.
3.2.1 The Minimum-Norm solution
In the previous paragraph, we explained that the resolution of the inverse problem with
distributed source models leads to an optimization problem where the fit of the data is balanced
by a penalization based on a particular norm. In this regard, any distributed inverse solver
is a “minimum-norm” problem. However, in the M/EEG community, the Minimum-Norm
solution usually only refers to a minimization of an ℓ2 norm [101, 220].
Regularization of inverse problems with the ℓ2 norm was introduced by Tikhonov [203]
and is known in statistics as ridge regression.
3.2.1.1 Minimum-norm equations
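The standard Minimum-Norm solution is obtained by solving:
\[ X^* = \arg\min_{X} E(X) = \arg\min_{X} \|M - GX\|_F^2 + \lambda \|X\|_F^2 , \quad \lambda > 0 \tag{3.7} \]
The solution of this unconstrained and differentiable problem is obtained by setting the
derivative with respect to X to 0:
\[
\begin{aligned}
\frac{dE}{dX} = 0 \;&\Leftrightarrow\; -G^T (M - GX) + \lambda X = 0 \\
&\Leftrightarrow\; (G^T G + \lambda I) X = G^T M \\
&\Leftrightarrow\; X = (G^T G + \lambda I)^{-1} G^T M
\end{aligned}
\tag{3.8}
\]
The solution X∗ is given by a simple matrix multiplication:
\[ X^* = (G^T G + \lambda I)^{-1} G^T M . \tag{3.9} \]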
The fact that the inverse solution is given by a simple matrix multiplication is a general
property of ℓ2 based methods. This property makes them really attractive, although it can
happen that computing the inverse operator is intractable in practice.
To understand this, one can observe that equation (3.9) involves computing the matrix
$G^T G \in \mathbb{R}^{d_x \times d_x}$, where $d_x$ is the dimensionality of the source space, and
inverting a matrix of this size. When considering realistic cortical models this computation
becomes impractical. To give an order of magnitude, a matrix in double precision with 10 000 rows
and columns contains $10^8$ elements. A double precision number takes 8 bytes in memory, which
means that the matrix requires $8 \cdot 10^8$ bytes = 0.8 GB of RAM just for storage. On a standard
computer, even nowadays, inverting such a matrix can become a computational burden.
To circumvent these limitations, the following trick is used:
Lemma 3.1. Matrix Inversion (Woodbury matrix identity)
\[ (A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1} \tag{3.10} \]
or with A = I and C = I
\[ (I + UV)^{-1} = I - U (I + VU)^{-1} V . \tag{3.11} \]
Applying equation (3.11) to equation (3.9), with λ = 1 for simplicity, leads to
\[
\begin{aligned}
(G^T G + I)^{-1} G^T &= (I - G^T (I + G G^T)^{-1} G)\, G^T \\
&= G^T (I + G G^T)^{-1} (I + G G^T - G G^T) \\
&= G^T (I + G G^T)^{-1}
\end{aligned}
\tag{3.12}
\]
The solution X∗ is now given by:
\[ X^* = G^T (G G^T + \lambda I)^{-1} M , \tag{3.13} \]
which involves the inversion of a small matrix in $\mathbb{R}^{d_m \times d_m}$.
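The equivalence between (3.9) and (3.13) can be checked numerically. The sketch below uses a small random matrix as a stand-in for the leadfield (an assumption made so that the large $d_x \times d_x$ system can still be solved directly for comparison).

```python
import numpy as np

rng = np.random.default_rng(0)
d_m, d_x, d_t = 30, 400, 5
G = rng.standard_normal((d_m, d_x))      # stand-in for a leadfield
M = rng.standard_normal((d_m, d_t))
lam = 0.5

# Equation (3.9): requires solving a d_x x d_x system.
X_big = np.linalg.solve(G.T @ G + lam * np.eye(d_x), G.T @ M)

# Equation (3.13): only a d_m x d_m system has to be solved.
X_small = G.T @ np.linalg.solve(G @ G.T + lam * np.eye(d_m), M)

max_diff = np.max(np.abs(X_big - X_small))
```

Both expressions give the same estimate up to rounding errors, but the second one never forms the $d_x \times d_x$ matrix.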
By comparing the MN solution in (3.9) and the LCMV beamformer in (3.1), similarities
can be observed. This justifies the limited discussion on beamforming techniques and we
refer the reader to [152] where the authors detail how to relate linear beamformers such as
LCMV to the Minimum-Norm solutions.
3.2.1.2 Choosing the regularization parameter
The naive but quite efficient approach
The parameter λ is related to the level of noise present in the measurements. Schematically,
the larger the noise amplitude, the larger the reconstruction error and the larger λ should be.
If λ increases, ‖X∗‖ decreases and the reconstruction error increases. In theory, this parameter
has to be estimated on each dataset since it depends on the data. However, a simple strategy
exists to get a reasonable estimate of λ. This strategy is used by the Brainstorm toolbox [10].
In order to understand this method, it is necessary to introduce the singular value
decomposition (SVD) of G:
\[ G = U S V^T , \]
where the matrices U and V are square unitary matrices, i.e., $U^T U = I$ and $V^T V = I$, and
the matrix S is diagonal.
The diagonal entries of S are the singular values $(s_i)_i$ of G. Since $d_m < d_x$, there are
$d_m$ of them, ordered such that $s_1 \ge s_2 \ge \dots \ge s_{d_m} \ge 0$. By replacing G by its
SVD in equation (3.13) we get:
\[
\begin{aligned}
X^* &= G^T (U S^2 U^T + \lambda I)^{-1} M \\
&= G^T (U (S^2 + \lambda I) U^T)^{-1} M \\
&= G^T U (S^2 + \lambda I)^{-1} U^T M \\
&= V S (S^2 + \lambda I)^{-1} U^T M
\end{aligned}
\tag{3.14}
\]
The matrix $(S^2 + \lambda I)^{-1}$ is also diagonal and its diagonal coefficients are
$(1/(s_i^2 + \lambda))_i$. The λ should therefore take a value comparable to the $(s_i^2)_i$.
The heuristic choice of λ proposed by this strategy consists in setting $\lambda = 0.01\, s_1^2$.
This rule of thumb works quite well in practice.
Brainstorm’s implementation also removes singular values for which $s_i^2 < d_m 10^{-7} s_1^2$.
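As an illustration of (3.14), the inverse operator can be assembled from the SVD of G once λ is set with the rule of thumb $\lambda = 0.01\, s_1^2$. The sketch below is an assumption-laden illustration (the name `mn_inverse_svd` and its `ratio` argument are hypothetical, and the truncation of small singular values is left out); it is not the Brainstorm or EMBAL code.

```python
import numpy as np

def mn_inverse_svd(G, ratio=0.01):
    """Minimum-norm inverse operator V S (S^2 + lam I)^{-1} U^T, lam = ratio * s_1^2."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)   # economy-size SVD
    lam = ratio * s[0] ** 2
    filt = s / (s ** 2 + lam)                           # diagonal of S (S^2 + lam I)^{-1}
    G_inv = (Vt.T * filt) @ U.T                         # scales the columns of V
    return G_inv, lam

rng = np.random.default_rng(0)
d_m, d_x = 20, 300
G = rng.standard_normal((d_m, d_x))                    # stand-in for a leadfield
G_inv, lam = mn_inverse_svd(G)

# Reference: equation (3.13) with the same lambda.
G_inv_ref = G.T @ np.linalg.inv(G @ G.T + lam * np.eye(d_m))
```

The source estimate is then obtained as `G_inv @ M` for any measurement matrix M with $d_m$ rows.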
The L-curve
The L-curve approach was originally proposed by Hansen in [103]. The idea is to compute, for
multiple values of λ, the norm ‖X∗‖ and the reconstruction error. By plotting the
norm ‖X∗‖ as a function of the residual ‖M − GX∗‖ on a log-log scale, one gets a curve similar to
the one presented in figure 3.2. This curve describes an “L” and the best
λ is obtained at the corner of the curve. It is estimated in practice by looking for the point
with the highest curvature. Hansen argues that when λ is smaller than this optimal value,
the inverse solver reconstructs part of the noise. In [103], Hansen lists a set of conditions to
guarantee that the resulting curve describes an L. One of the conditions is that the measured
signal is not too buried in noise. It is observed that when increasing the amplitude of the
additive noise, the corner of the curve, used to estimate the λ, becomes harder to see.
When increasing λ from 0 to ∞, the 2D point (‖M − GX∗‖, ‖X∗‖) goes from the upper
left extremity of the curve to the lower right extremity. Thus, for a larger λ the reconstruction
error increases.
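The points of the L-curve can be computed cheaply from a single SVD of G. The following sketch is only illustrative: the function names, the simulated data and the crude finite-difference curvature (taken in absolute value, which sidesteps the sign convention) are assumptions, and more careful corner detection schemes exist.

```python
import numpy as np

def l_curve(M, G, lambdas):
    """Residual norm and solution norm of the minimum-norm estimate for each lambda."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    UtM = U.T @ M
    residual, solution = [], []
    for lam in lambdas:
        X = Vt.T @ ((s / (s ** 2 + lam))[:, None] * UtM)   # equation (3.14)
        residual.append(np.linalg.norm(M - G @ X))
        solution.append(np.linalg.norm(X))
    return np.array(residual), np.array(solution)

def corner_index(residual, solution):
    """Index of largest absolute curvature of the curve in log-log coordinates."""
    x, y = np.log(residual), np.log(solution)
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    curvature = np.abs(dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5
    return int(np.argmax(curvature))

rng = np.random.default_rng(0)
d_m, d_x, d_t = 20, 200, 10
G = rng.standard_normal((d_m, d_x))                        # stand-in for a leadfield
X_true = np.zeros((d_x, d_t))
X_true[:5] = rng.standard_normal((5, d_t))
M = G @ X_true + 0.5 * rng.standard_normal((d_m, d_t))

s1 = np.linalg.norm(G, 2)                                  # largest singular value
lambdas = s1 ** 2 * np.logspace(-6, 1, 40)
residual, solution = l_curve(M, G, lambdas)
best_lambda = lambdas[corner_index(residual, solution)]
```

As stated above, the residual norm grows and the solution norm shrinks as λ increases, which is what the arrays `residual` and `solution` exhibit.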
Figure 3.2: L-curve in Minimum-Norm estimator (log-log plot of the solution norm ‖x‖
against the residual norm ‖Ax − b‖). The lambda is estimated by
searching for the point of highest curvature.
The generalized cross-validation (GCV)
The generalized cross-validation (GCV) is an alternative to the L-curve from Hansen. It
was originally proposed by Wahba and Golub in [90, 218]. The idea behind the GCV is that
λ is correctly set if the measurements on $d_m − 1$ sensors help to predict
the measurements of the left-out sensor, hence the term cross-validation. Wahba and Golub
showed that this prediction error, averaged across all sensors, can be computed in
closed form for a given λ.
Let us denote the measurements on sensor j by $M_j$ and by $M_{|j}$ the measurements
obtained by removing the jth sensor. The sources estimated with $M_{|j}$ are denoted $X^*_{|j}$
and the leadfield obtained after removing row j is denoted $G_{|j}$. The jth row of G is denoted
$G_j$. Using (3.13), we get
\[ X^*_{|j} = G_{|j}^T (G_{|j} G_{|j}^T + \lambda I)^{-1} M_{|j} \]
The generalized cross-validation error is defined as:
\[ \mathcal{G}(\lambda) = \sum_{j} \|M_j - G_j X^*_{|j}\|_F^2 \]
Wahba and Golub have shown that this function of λ can be easily computed with the
following formula, where X∗ is the minimum-norm estimate (3.13) obtained with all sensors:
\[ \mathcal{G}(\lambda) = \frac{\|M - G X^*\|_F^2}{\left(\mathrm{trace}(I - G G^T (G G^T + \lambda I)^{-1})\right)^2} \tag{3.15} \]
Finding the best λ consists in minimizing this function with respect to λ. Such a function
$\mathcal{G}(\lambda)$ is illustrated in figure 3.3.
The equation (3.15) is obtained after assuming that the noise is independent and identi-
cally distributed across sensors. When this assumption does not hold the performance of the
GCV can be affected. This is a limitation pointed out by Hansen in his presentation of the
L-curve method. In practice, this issue can be addressed by pre-whitening the data.
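Formula (3.15) can be evaluated for many values of λ from a single SVD of G. The sketch below assumes that G has full row rank with $d_m \le d_x$, so that U is square; the function name `gcv` and the random data are assumptions made for illustration.

```python
import numpy as np

def gcv(M, G, lambdas):
    """Evaluate the GCV function (3.15) on a grid of lambdas."""
    U, s, _ = np.linalg.svd(G, full_matrices=False)   # U is d_m x d_m here
    UtM = U.T @ M
    values = []
    for lam in lambdas:
        f = lam / (s ** 2 + lam)      # eigenvalues of I - G G^T (G G^T + lam I)^{-1}
        numerator = np.sum((f[:, None] * UtM) ** 2)   # ||M - G X*||_F^2
        values.append(numerator / np.sum(f) ** 2)
    return np.array(values)

rng = np.random.default_rng(0)
d_m, d_x, d_t = 15, 120, 8
G = rng.standard_normal((d_m, d_x))                   # stand-in for a leadfield
M = rng.standard_normal((d_m, d_t))
lambdas = np.logspace(-3, 3, 50)
values = gcv(M, G, lambdas)
best_lambda = lambdas[np.argmin(values)]

# Direct evaluation of (3.15) for one lambda, as a cross-check.
lam = lambdas[10]
K = G @ G.T
H = K @ np.linalg.inv(K + lam * np.eye(d_m))
direct = np.linalg.norm(M - H @ M) ** 2 / np.trace(np.eye(d_m) - H) ** 2
```

The value `best_lambda` is the grid point minimizing the GCV function; a finer grid or a scalar minimizer can refine it.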
3.2.2 Variants around the minimum-norm solution
We will now present some alternative approaches also based on ℓ2 penalization, namely, the
weighted minimum-norm (WMN), the dSPM [49] and the sLORETA [171] methods. All these
methods work time instant by time instant, and therefore do not make use of the temporal
correlations of the activations. Finally, we will present an ℓ2 based method that takes into
account this temporal information.
Figure 3.3: Generalized Cross Validation with Minimum-Norm estimator (G(lambda) plotted
against lambda on logarithmic axes). The vertical line
indicates the minimum of the GCV, which provides the value of lambda.
Setting λ with a percentage as described in paragraph 3.2.1.2:
    clear options
    % 10 percents corresponds to Brainstorm’s default lambda
    options.pct = 10;
    [X,Ginv] = mn_inverse(M,G,options); % X = Ginv * M;
Or manually:
    clear options
    options.lambda = 1e-5;
    [X,Ginv] = mn_inverse(M,G,options);
Or with generalized cross-validation:
    clear options
    options.use_gcv = true;
    [X,Ginv] = mn_inverse(M,G,options);
Or with the L-curve:
    clear options
    options.use_lcurve = true;
    [X,Ginv] = mn_inverse(M,G,options);
Table 3.3: Running a Minimum-Norm with EMBAL.
3.2.2.1 The weighted minimum-norm (WMN)
When applying a simple Minimum-Norm solution, each dipolar source is penalized equally,
although the columns of the leadfield matrix G are not normalized. Sources that are
close to the sensors have a stronger forward field, i.e., a small activation has a large
effect on the sensors. As a consequence, the minimum-norm solution is biased towards the
superficial sources. The WMN was originally proposed to cope with this problem.
The weighted Minimum-Norm solution corresponds to the problem:
\[ X^* = \arg\min_{X} E(X) = \arg\min_{X} \|M - GX\|_F^2 + \lambda \|WX\|_F^2 , \quad \lambda > 0 . \tag{3.16} \]
The matrix $W \in \mathbb{R}^{d_x \times d_x}$ is the weighting matrix. To guarantee a valid
penalization, it is necessary to impose that W is not singular, i.e., $W^{-1}$ exists. If W is
singular, then the cost function to minimize might not be strictly convex, leading to a
non-unique solution.
Setting Y = WX and $\tilde{G} = G W^{-1}$ in equation (3.16) leads to
\[ Y^* = \arg\min_{Y} \tilde{E}(Y) = \arg\min_{Y} \|M - \tilde{G} Y\|_F^2 + \lambda \|Y\|_F^2 , \quad \lambda > 0 . \tag{3.17} \]
Using (3.13), we get that
\[ X^* = W^{-1} Y^* = (W^T W)^{-1} G^T (G (W^T W)^{-1} G^T + \lambda I)^{-1} M . \tag{3.18} \]
We observe that computing X∗ with a WMN using equation (3.18) requires inverting
a big matrix $W^T W \in \mathbb{R}^{d_x \times d_x}$, which is what was previously avoided thanks
to the matrix inversion lemma.
In order to make equation (3.18) tractable, the weighting matrix W has to be diagonal,
or easily invertible. We will now show how such a diagonal matrix can be chosen in order
to avoid the bias towards superficial sources.
The amplitude of the forward field for a dipole close to the sensors is bigger than for a
dipole deep in the brain. Hence the standard minimum-norm, which penalizes all the dipoles
equally, tends to explain the measurements with superficial dipoles close to the sensors.
If a small amplitude for a superficial dipole can explain the measurements, the same effect
for a deeper source requires a much bigger amplitude of activation.
Let $G_i$ denote the ith column of G and $(w_i)_i$ the diagonal coefficients of W. The norm
$\|G_i\|_2$ is the amplitude of the forward field of the dipole i. By setting
$w_i = \|G_i\|_2^{\gamma}$ with γ > 0, the bias is reduced. Typically γ is set to 1 or 0.5.
In practice, setting the parameter γ is an
issue that has led to its empirical estimation with real and simulated data in [140].
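With a diagonal W, the weighted solution only requires rescaling the columns of G before applying (3.13). The following sketch illustrates this; it is not the EMBAL `wmn_inverse` function, and the Python function name, the `depth_gain` factor mimicking deeper sources and the random data are assumptions.

```python
import numpy as np

def wmn_inverse(M, G, lam, gamma=0.5):
    """Weighted minimum-norm estimate with diagonal weights w_i = ||G_i||_2^gamma."""
    w = np.linalg.norm(G, axis=0) ** gamma      # diagonal of W
    G_tilde = G / w                             # G W^{-1}: rescale each column
    d_m = G.shape[0]
    Y = G_tilde.T @ np.linalg.solve(G_tilde @ G_tilde.T + lam * np.eye(d_m), M)
    return Y / w[:, None], w                    # X = W^{-1} Y

rng = np.random.default_rng(0)
d_m, d_x, d_t = 20, 150, 4
depth_gain = np.linspace(0.1, 1.0, d_x)         # small gain mimics deep sources
G = rng.standard_normal((d_m, d_x)) * depth_gain
M = rng.standard_normal((d_m, d_t))
lam = 0.1

X, w = wmn_inverse(M, G, lam)

# Check the optimality condition (G^T G + lam W^T W) X = G^T M.
lhs = G.T @ (G @ X) + lam * (w ** 2)[:, None] * X
rhs = G.T @ M
```

Only a $d_m \times d_m$ system is solved, and the $d_x \times d_x$ matrix $G^T G$ is never formed.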
In the case where we do not have access to $(W^T W)^{-1}$, the problem has to be solved from
the equation obtained after differentiating E(X):
\[ (G^T G + \lambda W^T W) X = G^T M \]
Then for each column of M the problem can be solved separately using an iterative method
such as the conjugate gradient algorithm. Care should be taken to avoid assembling and
storing the matrix $G^T G$.
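A matrix-free conjugate gradient for one column m of M can be sketched as follows. The operator is applied as $G^T (G x) + \lambda W^T (W x)$, so $G^T G$ is never assembled inside the solver. The function name, the stopping rule and the bidiagonal W used in the example are assumptions chosen for illustration.

```python
import numpy as np

def cg_normal_equations(m, G, W, lam, tol=1e-10, max_iter=1000):
    """Solve (G^T G + lam W^T W) x = G^T m by conjugate gradient, matrix-free."""
    def apply_A(x):
        return G.T @ (G @ x) + lam * (W.T @ (W @ x))

    b = G.T @ m
    x = np.zeros_like(b)
    r = b - apply_A(x)            # initial residual
    p = r.copy()                  # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * np.linalg.norm(b):
            break
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
d_m, d_x = 15, 100
G = rng.standard_normal((d_m, d_x))              # stand-in for a leadfield
m = rng.standard_normal(d_m)
W = np.eye(d_x) - 0.5 * np.eye(d_x, k=1)         # invertible, non-diagonal weights
lam = 1.0

x_cg = cg_normal_equations(m, G, W, lam)

# Dense reference, assembled here only to check the iterative solution.
x_direct = np.linalg.solve(G.T @ G + lam * W.T @ W, G.T @ m)
```

For a sparse W, such as a mesh Laplacian, the products with W and its transpose remain cheap, which is what makes the iterative approach attractive.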
Examples that face this problem are what we call the Laplacian-based penalizations.
When trying to impose spatial smoothness on the solution a natural choice for W is to use the
surface Laplacian Lsurf of the cortical mesh. This method is generally called in the literature
the maximum smoothness solution and corresponds to the LORETA solution [172], although
LORETA was originally formulated on a grid of dipoles rather than on the cortical mesh.
    clear options
    options.pct = 10;
    options.W = sqrt(sum(G.*G))'; % Set weights
    X = wmn_inverse(M,G,options);
Table 3.4: Running a Weighted Minimum-Norm with EMBAL.
An alternative to LORETA that also uses a spatial smoothing prior consists in penalizing
the estimate with the ℓ2 norm of the surface gradient $\nabla_{surf}$. The discretization of
$\nabla_{surf}$ is denoted in matrix form by $D_{surf}$. This solution is referred to as the HEAT
solution [5], since it can be related to the heat equation defined over the mesh. The surface
gradient $D_{surf}$ verifies $D_{surf}^T D_{surf} = L_{surf}$, which implies that the HEAT
solution is given by solving
\[ (G^T G + \lambda L_{surf}) X = G^T M . \]
Remark. The gradient and Laplacian are a bit more complex to compute on a tessellated
surface than on a grid since it involves a discretization with P1 elements of the activation
map. See chapter 2 for more details on P1 elements and their role in finite element methods.
We will come back to the Laplacian-based methods when considering spatiotemporal reg-
ularizations.
3.2.2.2 The ℓ2 priors and Gaussian models
Up to here, ℓ2 priors have been presented without much attention drawn to the underlying
assumptions. In order to understand the link between ℓ2 priors and Gaussian models, it is
necessary to relate the ℓ2 penalization model (3.7) to Bayesian estimation.
Let us assume that M and X are random variables. According to Bayes’ rule, we have:
\[ P(X|M) = \frac{P(M|X)\, P(X)}{P(M)} . \]
The optimal X is obtained by estimating a maximum a posteriori (MAP), i.e., solving:
\[
\begin{aligned}
X^*_{MAP} &= \arg\max_{X} P(X|M) \\
&= \arg\max_{X} P(M|X)\, P(X) \\
&= \arg\min_{X} -\log(P(M|X)) - \log(P(X))
\end{aligned}
\tag{3.19}
\]
The noise and the source amplitudes are assumed to be Gaussian variables with zero
mean and respectively $\Sigma_E$ and $\Sigma_X$ as covariances. This leads to:
\[
\begin{aligned}
X^*_{MAP} &= \arg\min_{X} (M - GX)^T \Sigma_E^{-1} (M - GX) + X^T \Sigma_X^{-1} X \\
&= \arg\min_{X} \|M - GX\|_{\Sigma_E}^2 + \|X\|_{\Sigma_X}^2
\end{aligned}
\tag{3.20}
\]
We recall that $\|X\|_{\Sigma}^2 = \mathrm{trace}(X^T \Sigma^{-1} X)$. Using a simple derivation
and the matrix inversion lemma 3.1, the solution of this problem is given by:
\[ X^*_{MAP} = \Sigma_X G^T (G \Sigma_X G^T + \Sigma_E)^{-1} M . \tag{3.21} \]
We observe that the standard MN corresponds to the case where $\Sigma_X = I$ and $\Sigma_E = \lambda I$.
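The MAP estimate (3.21) and its relation to the standard MN can be illustrated numerically. In the sketch below the function name `map_inverse`, the random leadfield and the diagonal covariances are assumptions made for the example.

```python
import numpy as np

def map_inverse(M, G, Sigma_X, Sigma_E):
    """MAP estimate (3.21); only a d_m x d_m system is solved."""
    SGt = Sigma_X @ G.T
    return SGt @ np.linalg.solve(G @ SGt + Sigma_E, M)

rng = np.random.default_rng(0)
d_m, d_x, d_t = 12, 80, 3
G = rng.standard_normal((d_m, d_x))          # stand-in for a leadfield
M = rng.standard_normal((d_m, d_t))
lam = 0.3

# Special case Sigma_X = I, Sigma_E = lam * I: the standard minimum-norm (3.13).
X_map = map_inverse(M, G, np.eye(d_x), lam * np.eye(d_m))
X_mn = G.T @ np.linalg.solve(G @ G.T + lam * np.eye(d_m), M)

# General diagonal covariances: check the stationarity condition of (3.20),
# G^T Sigma_E^{-1} (G X - M) + Sigma_X^{-1} X = 0.
sigma_x = rng.uniform(0.5, 2.0, d_x)
sigma_e = rng.uniform(0.1, 0.5, d_m)
X_gen = map_inverse(M, G, np.diag(sigma_x), np.diag(sigma_e))
lhs = G.T @ ((G @ X_gen) / sigma_e[:, None]) + X_gen / sigma_x[:, None]
rhs = G.T @ (M / sigma_e[:, None])
```

Note that `map_inverse` never inverts the source covariance, in line with the form of (3.21).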
The Bayesian approach offers a convenient framework for modeling the noise. Note that
in (3.21), the source covariance $\Sigma_X$ does not need to be inverted. This implies that if the
prior is defined on the covariance, there is no need to invert a matrix of size $d_x \times d_x$.
Another interesting aspect of the Bayesian approach is its ability to formalize model selection
methods. This will be detailed later in section 3.3.2.
3.2.2.3 Noise normalized methods: dSPM and sLORETA
The central idea behind noise normalized methods, as they are being called in the M/EEG
community, is to represent on the cortex not the activity itself, but a dimensionless statis-
tical quantity. As we will see, using statistical quantities attenuates the bias towards the
superficial sources and also has the advantage of providing a natural way of thresholding the
reconstructed estimates.
For simplicity, we will restrict the presentation of both methods to the inverse problem
with constrained orientations.
The mathematical foundations of both of these methods come from the knowledge on
Student's statistical distributions. Let $(x_i)_{i=1,\dots,N}$ be N normally distributed random
samples. We denote by $\bar{x}$ the empirical mean of the $(x_i)_i$ and $\sigma_{emp}$ the empirical standard
deviation:
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i ,$$
and
$$\sigma_{emp} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2} .$$
One can prove that, if the true mean of the $(x_i)_i$ equals $\mu$, then the quantity defined as:
$$\frac{\bar{x} - \mu}{\sigma_{emp}/\sqrt{N}}$$
follows a T-distribution, also called Student's distribution, with N − 1 degrees of freedom.
This quantity is the T-statistic of the sample.
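Definition 3.1 (Student's T-distribution). Student's t-distribution with n degrees of freedom
has a probability density function given by:
$$p(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}} ,$$
where $\Gamma$ is the Gamma function: $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt$.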
When dealing with M/EEG data the classical question is to know whether a given
experimental stimulation created an effect in the subject's brain and where. An effect is always
measured in comparison with a reference level usually estimated in the period before
stimulation, called the baseline period. To take the baseline period into account, a classical
preprocessing of the data is called the baseline correction. It consists in subtracting from every
sensor the mean value of the signal measured during the baseline period. This being done,
the model mentioned in section 3.2.2.2, which assumes that the sources X and the noise E have
zero mean, holds.
The relevant T-statistic to consider is therefore reduced to:
$$\frac{\bar{x}}{\sigma_{emp}/\sqrt{N}} .$$
104 CHAPTER 3. THE INVERSE PROBLEM WITH DISTRIBUTED SOURCE MODELS
Without going into more details on statistical tests, under the Gaussianity assumption, if this
statistic is large in absolute value, an effect is detected.
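The T-statistic above can be sketched in a few lines. The function name and the sample values are illustrative assumptions; the only point is the N − 1 normalization of the empirical standard deviation and the division by √N.

```python
import numpy as np

def t_statistic(samples, mu=0.0):
    """T-statistic of a sample against a hypothesized mean mu (0 after baseline correction)."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    mean = x.mean()
    sigma_emp = x.std(ddof=1)            # empirical standard deviation with 1/(N-1)
    return (mean - mu) / (sigma_emp / np.sqrt(n))

# Hypothetical baseline-corrected amplitudes at one source
t_val = t_statistic([1.0, 2.0, 3.0, 4.0, 5.0])
```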
dSPM and sLORETA differ in the way they estimate the empirical standard
deviation.
Dynamic Statistical Parametric Mapping (dSPM)
With the dSPM method, the variability of the estimate is assumed to be due to the
additive noise. The sources X are not assumed to be random variables, unlike the noise E. If
we assume that E is Gaussian and that the covariance matrix of each of its columns is given
by $\Sigma_E$, using the MN inverse formula, we get that the covariance of each of the columns of $\mathbf{X}^*$
is given by $\mathbf{C} = \mathbf{H} \Sigma_E \mathbf{H}^T$. The matrix H denotes here the MN inverse $\mathbf{H} = \mathbf{G}^T (\mathbf{G}\mathbf{G}^T + \Sigma_E)^{-1}$
obtained with (3.21). The dSPM method estimates the variances of the sources with the
diagonal elements of C. We get:
$$\mathbf{T}_{dSPM} = \mathbf{R} \mathbf{X}^* , \tag{3.22}$$
where R is a diagonal matrix whose coefficients are $R_{ii} = 1/\sqrt{C_{ii}}$.
Once the values of T are computed, we obtain statistical maps, instead of current esti-
mates, and active sources are detected by thresholding them. The threshold can be related to
a p-value according to the T-distribution calculated under Gaussianity assumptions.
Since the number of time samples used to calculate the noise covariance matrix ΣE is quite
large (typically more than 100), the T-distribution approaches a unit normal distribution.
With a large number of samples, the empirical variance approaches the true variance and
the T-statistic becomes equivalent to a z-score, a.k.a., a standard score.
sLORETA
In the sLORETA method, the variability is supposed to come also from the sources. If we
denote by $\Sigma_X$ the covariance matrix of the sources, this implies that the covariance of the
source estimates is given by $\mathbf{C} = \mathbf{H}(\Sigma_E + \mathbf{G} \Sigma_X \mathbf{G}^T)\mathbf{H}^T$ where the matrix H is now given by
$\mathbf{H} = \Sigma_X \mathbf{G}^T (\mathbf{G} \Sigma_X \mathbf{G}^T + \Sigma_E)^{-1}$. The matrix C is often called the resolution matrix. The result
is obtained as with the dSPM method using (3.22).
In practice the source covariance is not observed and without any learning procedure, this
covariance has to be fixed a priori. Like for the standard minimum-norm, this covariance is
usually set to ΣX = I.
Remark. When considering multiple orientations at each cortical position the statistical quan-
tity considered follows an F-distribution (cf. equation 9 in [140]) but the philosophy of the
method is the same.
One can prove that in a noiseless case, if only one source is active, the sLORETA method
has no localization bias [171].
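The two normalizations can be compared on a toy problem. The sketch below is a minimal illustration with fixed orientations, assuming a random gain matrix whose columns are rescaled to mimic weak (deep) and strong (superficial) sources; the function name, sizes and noise level are hypothetical. With noiseless data generated by a single source, the sLORETA map is expected to peak at the true location, in line with the zero localization bias property mentioned above.

```python
import numpy as np

def noise_normalized_maps(M, G, sigma_e, sigma_x=None):
    """Return the dSPM and sLORETA maps (3.22) for fixed-orientation sources."""
    if sigma_x is None:
        sigma_x = np.eye(G.shape[1])                 # usual a priori choice
    sigma_m = G @ sigma_x @ G.T + sigma_e
    H = sigma_x @ G.T @ np.linalg.inv(sigma_m)       # linear inverse operator
    X = H @ M                                        # current estimates
    # Diagonals of H sigma_e H^T (dSPM) and H sigma_m H^T (sLORETA)
    var_dspm = np.einsum("ij,jk,ik->i", H, sigma_e, H)
    var_slor = np.einsum("ij,jk,ik->i", H, sigma_m, H)
    return X / np.sqrt(var_dspm)[:, None], X / np.sqrt(var_slor)[:, None]

rng = np.random.default_rng(1)
d_m, d_x = 20, 200
G = rng.standard_normal((d_m, d_x))
G *= np.linspace(0.2, 2.0, d_x)          # column k scaled: small norm mimics a deep source
true_idx = 3                             # a weak source
M = G[:, [true_idx]]                     # noiseless data from one unit source, one time sample

t_dspm, t_slor = noise_normalized_maps(M, G, sigma_e=0.1 * np.eye(d_m))
peak_slor = int(np.argmax(np.abs(t_slor[:, 0])))
```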
The dSPM solver is probably more classical, in the sense that the sources are fixed and
only the additive noise is a Gaussian random variable. With the sLORETA solver, the sources
are also Gaussian random variables, which relates this method to the Bayesian approaches
exposed later in this chapter.
3.2.2.4 Spatiotemporal minimum-norm estimation
In the previous paragraphs, we have seen how to set a spatial smoothness prior in the inverse
problem. This was done for example with a spatial Laplacian defined over the mesh. We will
now see how a similar approach can be used to impose temporal smoothness for the current
estimates X∗.
(a) Result obtained with dSPM (b) Result obtained with sLORETA
Figure 3.4: Illustration of thresholded statistical maps obtained with the dSPM and sLORETA
methods on somatosensory data recorded with MEG.
Let $\mathbf{L}_{time}$ be a temporal Laplacian operator. For a 1D signal $x = (x_t)_t$, the Laplacian can
be approximated by:
$$(\mathbf{L}_{time}\,x)_t = \frac{x_{t-1} - 2x_t + x_{t+1}}{4} .$$
With our notations, whereas the spatial Laplacian is applied by multiplication on the left,
the temporal Laplacian is applied by multiplication on the right. The currents $\mathbf{X}^*$ are now
estimated by solving:
$$\mathbf{X}^* = \arg\min_{\mathbf{X}} \|\mathbf{M}-\mathbf{G}\mathbf{X}\|^2_F + \lambda \|\mathbf{L}_{surf}\mathbf{X}\|^2_F + \mu \|\mathbf{X}\mathbf{L}_{time}\|^2_F , \tag{3.23}$$
where λ > 0 and µ > 0 are the spatial and temporal regularization parameters.
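Equation (3.23) can be solved very elegantly using Kronecker products. We refer
the reader to Appendix A for an introduction to Kronecker products manipulation. We just
briefly recall that the stacking operator “vec” converts a matrix A to a vector by stacking all
the columns of A. The following equations make extensive use of the following identity:
$$\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X}) ,$$
assuming that the dimensions of the matrices A, B and X agree.
The cost function in equation (3.23) becomes:
$$\begin{aligned}
&\|\mathbf{M}-\mathbf{G}\mathbf{X}\|^2_F + \lambda \|\mathbf{L}_{surf}\mathbf{X}\|^2_F + \mu \|\mathbf{X}\mathbf{L}_{time}\|^2_F \\
&= \|\mathrm{vec}(\mathbf{M}-\mathbf{G}\mathbf{X})\|^2_F + \lambda \|\mathrm{vec}(\mathbf{L}_{surf}\mathbf{X})\|^2_F + \mu \|\mathrm{vec}(\mathbf{X}\mathbf{L}_{time})\|^2_F \\
&= \|\mathrm{vec}(\mathbf{M}) - (\mathbf{I} \otimes \mathbf{G})\mathrm{vec}(\mathbf{X})\|^2_F + \lambda \|(\mathbf{I} \otimes \mathbf{L}_{surf})\mathrm{vec}(\mathbf{X})\|^2_F + \mu \|(\mathbf{L}_{time}^T \otimes \mathbf{I})\mathrm{vec}(\mathbf{X})\|^2_F \\
&= \|\mathrm{vec}(\mathbf{M}) - \mathcal{G}\,\mathrm{vec}(\mathbf{X})\|^2_F + \lambda \|(\mathbf{I} \otimes \mathbf{L}_{surf})\mathrm{vec}(\mathbf{X})\|^2_F + \mu \|(\mathbf{L}_{time}^T \otimes \mathbf{I})\mathrm{vec}(\mathbf{X})\|^2_F
\end{aligned} \tag{3.24}$$
where $\mathcal{G}$ stands for $\mathbf{I} \otimes \mathbf{G}$.
By differentiating with respect to vec(X) and setting the derivative to 0 we get:
$$\begin{aligned}
\left(\mathcal{G}^T \mathcal{G} + \lambda (\mathbf{I} \otimes \mathbf{L}_{surf})^T (\mathbf{I} \otimes \mathbf{L}_{surf}) + \mu (\mathbf{L}_{time}^T \otimes \mathbf{I})^T (\mathbf{L}_{time}^T \otimes \mathbf{I})\right) \mathrm{vec}(\mathbf{X}) &= \mathcal{G}^T \mathrm{vec}(\mathbf{M}) \\
\left(\mathcal{G}^T \mathcal{G} + \lambda (\mathbf{I} \otimes \mathbf{L}_{surf}^T \mathbf{L}_{surf}) + \mu (\mathbf{L}_{time} \mathbf{L}_{time}^T \otimes \mathbf{I})\right) \mathrm{vec}(\mathbf{X}) &= \mathcal{G}^T \mathrm{vec}(\mathbf{M})
\end{aligned} \tag{3.25}$$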
The matrix on the left hand side is extremely large and cannot be stored in memory. How-
ever multiplying a vector by this matrix is not very computationally expensive since Lsurf
and Ltime are sparse matrices. The problem is solved using a conjugate gradient method.
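To make the matrix-free strategy concrete, the following sketch applies the left-hand side of (3.25) directly on the matrix X, so that no Kronecker product is ever formed, and solves the system with a plain conjugate gradient. It is a toy illustration under several assumptions: a 1D second-difference matrix stands in for the mesh Laplacian, dense NumPy arrays replace sparse matrices, and all names, sizes and parameter values are hypothetical.

```python
import numpy as np

def second_difference(n):
    """Rows hold the stencil (x[t-1] - 2 x[t] + x[t+1]) / 4; boundary rows are left at zero."""
    D = np.zeros((n, n))
    for t in range(1, n - 1):
        D[t, t - 1:t + 2] = [0.25, -0.5, 0.25]
    return D

def apply_normal_operator(X, G, L_surf, L_time, lam, mu):
    """Left-hand side of (3.25) applied to X without forming any Kronecker product."""
    return G.T @ (G @ X) + lam * L_surf.T @ (L_surf @ X) + mu * (X @ L_time) @ L_time.T

def conjugate_gradient(apply_A, B, tol=1e-10, max_iter=5000):
    """Conjugate gradient on matrix-shaped unknowns, using the Frobenius inner product."""
    X = np.zeros_like(B)
    R = B - apply_A(X)
    P = R.copy()
    rs_old = np.sum(R * R)
    b_norm = np.sqrt(np.sum(B * B))
    for _ in range(max_iter):
        AP = apply_A(P)
        alpha = rs_old / np.sum(P * AP)
        X += alpha * P
        R -= alpha * AP
        rs_new = np.sum(R * R)
        if np.sqrt(rs_new) <= tol * b_norm:
            break
        P = R + (rs_new / rs_old) * P
        rs_old = rs_new
    return X

rng = np.random.default_rng(2)
d_m, d_x, d_t = 8, 12, 6
G = rng.standard_normal((d_m, d_x))
M = rng.standard_normal((d_m, d_t))
L_surf = second_difference(d_x)        # stand-in for the mesh Laplacian (assumption)
L_time = second_difference(d_t).T      # applied on the right: column t of X @ L_time is the stencil at t
lam, mu = 1.0, 1.0

def vec(A):
    """Stack the columns of A."""
    return A.reshape(-1, order="F")

# Check vec(A X B) = (B^T kron A) vec(X) on the temporal term (A = I, B = L_time)
X0 = rng.standard_normal((d_x, d_t))
kron_gap = np.max(np.abs(vec(X0 @ L_time) - np.kron(L_time.T, np.eye(d_x)) @ vec(X0)))

def apply_A(X):
    return apply_normal_operator(X, G, L_surf, L_time, lam, mu)

rhs = G.T @ M
X_cg = conjugate_gradient(apply_A, rhs)
rel_residual = np.linalg.norm(apply_A(X_cg) - rhs) / np.linalg.norm(rhs)
```

Each operator application costs a few matrix products of the size of X, whereas the explicit Kronecker matrix would have (dx·dt)² entries.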
As with standard minimum-norm methods, i.e., with no temporal smoothness, the dif-
ferent techniques detailed above can be applied. The parameters can be set by GCV with
a function G(λ, µ). We can imagine an L-curve approach where we would look for a point of
highest curvature on a 2D surface parametrized by λ and µ [13, 26]. The dSPM and sLORETA
methods can also be extended. The only pitfall here is that, contrary to the standard MN, the
inverse matrices cannot be explicitly computed. We need for each pair (λ, µ) to run an itera-
tive solver, which can make the GCV and L-Curve methods particularly time consuming.
3.3 LEARNING-BASED METHODS
In previous sections, the ℓ2 priors used in the penalization of the inverse problem are de-
fined a priori. Following the explanations in section 3.2.2.2, this means that the proposed
methods assume a predefined covariance matrix for the sources. In the following paragraphs,
we will present inverse solvers that aim at designing a prior based on the data. The source
covariance matrix, i.e., the weights in the ℓ2 penalization term, is “learned”. We will also say
that the model is learned from the data [201].
For simplicity, we will present the following method in the context of instant-by-instant
inverse computation. The methods presented in this section use the Bayesian formulation of
the inverse problem with the assumption of no temporal correlations. We recall the Bayesian
framework from section 3.2.2.2:
$$p(\mathbf{X}|\mathbf{M}) = \frac{p(\mathbf{M}|\mathbf{X})\,p(\mathbf{X})}{p(\mathbf{M})} . \tag{3.26}$$
where we assume Gaussian variables:
E ∼ N (0,ΣE) (3.27)
X ∼ N (0,ΣX) (3.28)
and an additive model:
M = GX + E . (3.29)
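If ΣE and ΣX are known, X is obtained by maximizing the posterior, which amounts to solving:
$$\mathbf{X}^* = \arg\min_{\mathbf{X}} \|\mathbf{M}-\mathbf{G}\mathbf{X}\|^2_{\Sigma_E} + \|\mathbf{X}\|^2_{\Sigma_X} , \tag{3.30}$$
which leads to:
$$\mathbf{X}^* = \Sigma_X \mathbf{G}^T (\mathbf{G} \Sigma_X \mathbf{G}^T + \Sigma_E)^{-1} \mathbf{M} .$$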
In this framework the prior is an ℓ2 norm and learning the prior means learning ΣX, i.e., the
source covariance matrix. One may also want to learn the noise covariance matrix ΣE. Note
that in the WMN framework, learning ΣX consists in learning the weights.
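In the case where ΣX and ΣE are not fixed a priori, these parameters define the model
commonly denoted $\mathcal{M}$. Bayes' rule can be rewritten:
$$p(\mathbf{X}|\mathbf{M},\mathcal{M}) = \frac{p(\mathbf{M}|\mathbf{X},\mathcal{M})\,p(\mathbf{X}|\mathcal{M})}{p(\mathbf{M}|\mathcal{M})} . \tag{3.31}$$
$p(\mathbf{X}|\mathbf{M},\mathcal{M})$ is called the posterior.
$p(\mathbf{M}|\mathbf{X},\mathcal{M})$ is called the likelihood.
$p(\mathbf{X}|\mathcal{M})$ is called the prior.
$p(\mathbf{M}|\mathcal{M})$ is called the model evidence.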
3.3.1 Model selection using a multiresolution approach: MiMS
The Multiresolution Image Model Selection (MiMS) algorithm was proposed by Cottereau
et al. in [43]. Its principal motivation was to be able to reconstruct spatially extended active
regions, but also to quantify their extents. The idea behind this inverse solver is first to
estimate weights using a multiresolution approach, and second to use these weights in the
WMN framework. The multiresolution step is the learning step. The building blocks of the
MiMS source model are parcels of the cortical surface, designed at multiple spatial resolutions
in combination with anatomical and functional priors. The sources on the parcels are modeled
with current multipoles.
The procedure is iterative and goes from a coarse to a fine resolution. At each iteration k,
the source space $\mathcal{M}_k$ is segmented into $N_c$ parcels, or clusters, $(C^k_j)_j$ of elementary sources:
$$\mathcal{M}_k = \{C^k_j ,\ j \in [1, N_c]\} . \tag{3.32}$$
A cluster is a cortical region, as illustrated in figure 3.5. The authors call $\mathcal{M}_k$ a piecewise
image model at resolution k.
From iteration k to iteration k + 1, some sources are eliminated. After removing these
sources, a new image model at resolution k + 1 is defined. The procedure is summarized by
the following steps:
1. Design of the piecewise image model $\mathcal{M}_{k+1}$ at resolution k + 1 from the elementary
sources that survived the previous eliminatory procedure of Step 3.
2. Compact parametric modeling of regional neural activity from each elementary cluster
$C^{k+1}_j$ in $\mathcal{M}_{k+1}$.
3. Model selection: eliminate using generalized cross-validation (GCV) the least-significant
source cluster from $\mathcal{M}_k$ and loop back to Step 1.
We now briefly detail the technical aspects of the modeling and selection steps (see [6, 43]
for a full description). At each resolution k, the available cortical sources are clustered in $N_c$
patches $C^k_j$ of similar surface area. For MEG, Jerbi et al. showed in [119] that the current
quadrupolar expansion was an adequate model for extended sources (> 5 cm²) with only 7
moment parameters accounting for the cortical activity generally supported by about 100
elementary dipoles in conventional imaging models (see section 2.3.3). At resolution k,
the activity of cluster $C^k_j$ is modeled up to its quadrupolar expansion about its geometrical
centroid (cf. figure 3.5).
Let us denote Hk the forward field of the parcels at resolution k, and Qk the parameters
of the multipoles. The model becomes:
M = HkQk + noise . (3.33)
The operator Hk has 7Nc columns. It is possible to enforce the problem to be overdetermined
by assuming that 7Nc is smaller than the number of sensors. With an MEG device with 151
sensors, this leads to 21 parcels at each resolution.
Figure 3.5: Illustration of a 300 mm² cortical patch. Moments of current quadrupolar
expansion are obtained from the elementary dipole moments supporting the patch geometry,
expanded about the parcel centroid.
The least-significant cluster in the source model is removed by computing the Generalized
Cross-Validation error (GCV-error) for $N_c$ submodels indexed by j consisting of all clusters in
$\mathcal{M}_k$ except $C^k_j$. The cluster $C^k_{j_0}$ associated with the smallest GCV-error is supposed to be the
least significant and is removed at resolution k+1. At step k+1, the remaining $N_c − 1$ regions
of the cortex are redivided in $N_c$ parcels.
At the end of this exhaustive procedure, when no more parcels can be removed, the best
model in the GCV-error sense is selected retrospectively (see figure 3.6). As the initial cortical
parcellation at k = 0 is arbitrary and coarse, the entire process is restarted L times. A
weighted summation of the individual GCV errors across the L best models yields a so-called
multiresolution clustering (MRC) frequency map of dipole amplitudes. It is this frequency
map that is used as weights in a WMN (see (3.18)) in order to get the current estimates.
Figure 3.6: GCV error vs. spatial resolution k in semilog scale. A selection of image source
models $\mathcal{M}_k$ are shown with their associated cortical parcels. Here, the global minimum of the
GCV error is reached for $k^* = 62$ and the corresponding imaging model $\mathcal{M}^*_k$ is magnified in
the green boxplot (Adapted from [43]).
This method presents an interesting multiresolution approach and uses a relevant mod-
eling of active parcels via current quadrupoles. By doing so it provides a solution to the
challenging problem of estimating the spatial extent of active cortical regions. However, a
few critiques remain. First, this approach can only be used with spherical models in MEG (cf.
section 2.2.2 in chapter 2): it can be used neither with EEG nor with MEG when considering
non-spherical models. Second, it may appear relatively strange to use the frequency map
as an estimator of the source variances. Nevertheless, simulation studies [43] and experimental
results on retinotopic mapping [6, 42] obtained with this solver demonstrate its ability to
provide good localization results. The retinotopic data are presented in chapter 5.
3.3.2 Restricted Maximum Likelihood (ReML) and Sparse Bayesian
Learning (SBL)
More classical Bayesian learning methods use the evidence framework in order to learn
adaptive parametrized priors from the data itself. A MAP estimate only gives the maximum
of the posterior. In practice, this posterior might be multimodal implying that the maximum
is not representative of the full posterior. The following approaches aim at estimating the
posterior probability mass in order to provide better source estimates. This is done by
maximizing the model evidence.
Restricted Maximum Likelihood (ReML)
Restricted Maximum Likelihood [105] was introduced to the neuroimaging community by
Friston et al. [81]. Even if this ReML method has been applied to other problems than M/EEG
source imaging, we will present it in this particular context. The general idea of this approach
is to estimate hidden variables also called hyperparameters with an iterative procedure that
amounts to Expectation-Maximization (EM) update rules [60]. The quantity that drives the
learning procedure is the model evidence: $p(\mathbf{M}|\mathcal{M})$.
The model is the following:
$$\Sigma_E = \sum_i \mu_i \mathbf{Q}^i_E \tag{3.34}$$
$$\Sigma_X = \sum_i \lambda_i \mathbf{Q}^i_X \tag{3.35}$$
where the $\mathbf{Q}^i_E$ and $\mathbf{Q}^i_X$ are covariance matrices defined a priori. The $\mu_i$ and $\lambda_i$ are positive
hyperparameters that need to be learned in order to estimate the “good” model.
Like a standard EM algorithm, the procedure consists in maximizing a non-convex
likelihood by maximizing a convex surrogate functional for which the optimization is tractable.
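The likelihood that needs to be maximized is: $p(\mathbf{M}|\mathcal{M}) = p(\mathbf{M}|\lambda, \mu)$. We have:
$$\begin{aligned}
\log(p(\mathbf{M}|\mathcal{M})) &= \log\left(\int_{\mathbf{X}} p(\mathbf{M},\mathbf{X}|\mathcal{M})\,d\mathbf{X}\right) \\
&= \log\left(\int_{\mathbf{X}} q(\mathbf{X}|\mathbf{M})\,\frac{p(\mathbf{M},\mathbf{X}|\mathcal{M})}{q(\mathbf{X}|\mathbf{M})}\,d\mathbf{X}\right) \\
&\geq \int_{\mathbf{X}} q(\mathbf{X}|\mathbf{M}) \log\left(\frac{p(\mathbf{M},\mathbf{X}|\mathcal{M})}{q(\mathbf{X}|\mathbf{M})}\right) d\mathbf{X} \quad \text{(Jensen inequality)} \\
&= \left\langle \log\left(\frac{p(\mathbf{M},\mathbf{X}|\mathcal{M})}{q(\mathbf{X}|\mathbf{M})}\right) \right\rangle_{q(\mathbf{X}|\mathbf{M})} \\
&= \left\langle \log\left(\frac{p(\mathbf{M}|\mathbf{X},\mu)\,p(\mathbf{X}|\lambda)}{q(\mathbf{X}|\mathbf{M})}\right) \right\rangle_{q(\mathbf{X}|\mathbf{M})} \\
&= \mathcal{F}(q,\mathcal{M}) = \mathcal{F}(q, \lambda, \mu)
\end{aligned}$$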
The surrogate functional F is called the free energy. It is a function of the probability
density function q and the hyperparameters (λ, µ).
The EM algorithm alternates between an M-step and an E-step.
The M-step finds the Maximum Likelihood (ML) estimate of the hyperparameters: the
values of (λ, µ) that maximize $\mathcal{F}$ while keeping q fixed:
$$(\lambda_{k+1}, \mu_{k+1}) = \arg\max_{\lambda,\mu} \mathcal{F}(q_k, \lambda, \mu) ,$$
where the values of (q, λ, µ) at iteration k are denoted: $(q_k, \lambda_k, \mu_k)$.
Then the E-step consists in updating q using the new values of the hyperparameters
$(\lambda_{k+1}, \mu_{k+1})$. It is a coordinate ascent on $\mathcal{F}$:
$$q_{k+1} = \arg\max_{q} \mathcal{F}(q, \lambda_{k+1}, \mu_{k+1}) ,$$
and one can prove that:
$$q_{k+1}(\mathbf{X}|\mathbf{M}) = p(\mathbf{X}|\mathbf{M}, \lambda_{k+1}, \mu_{k+1}) .$$
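The E-step result can be understood from the standard decomposition of the log-evidence into the free energy and a Kullback-Leibler divergence. It is recalled here only as a sketch in the notation of this section; the symbol KL is introduced for this purpose and is not used elsewhere in the chapter.

```latex
\log p(\mathbf{M}|\lambda,\mu)
  = \mathcal{F}(q,\lambda,\mu)
  + \mathrm{KL}\big(q(\mathbf{X}|\mathbf{M}) \,\|\, p(\mathbf{X}|\mathbf{M},\lambda,\mu)\big),
\qquad
\mathrm{KL}(q \,\|\, p) = \int_{\mathbf{X}} q \log\frac{q}{p}\, d\mathbf{X} \;\geq\; 0 .
```

Since the left-hand side does not depend on q, maximizing the free energy over q for fixed (λ, µ) amounts to cancelling the divergence, which happens exactly when q is the posterior. The bound obtained with the Jensen inequality is then tight.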
In their numerous contributions on this subject [78, 79, 80, 81, 146, 177, 178] the authors
present the algorithm as a maximization of a Restricted Maximum Likelihood. ReML can be
regarded as embedding the E-step into the M-step to provide a single log-likelihood objective
function.
The method presented here offers an interesting framework in order to avoid the a priori
setting of the ℓ2 prior, i.e., the source covariance matrix. Its ability to determine the regulari-
sation parameter is also particularly convenient. However, a few critiques can be made about
this approach. The optimization method requires a nonlinear search for each M-step (Fisher
scoring method) which does not guarantee a positive definite estimated covariance. While
shown to be successful in estimating a handful of hyperparameters in [146, 177, 178], this
could potentially be problematic when very large numbers of hyperparameters are present.
Also, the optimization proposed requires to invert a matrix whose number of rows is equal to
the number of hyperparameters. This is another limiting factor when using a large number
of priors.
When experimenting with the software provided in the SPM package, we observed that
when using a large number of covariance priors, a fraction of the hyperparameters obtained
could be negative-valued. We also observed that the procedure may fail to converge and
oscillate between two sets of parameters. It is probably to avoid such problems that the SPM
implementation imposes a fixed number of EM iterations. This may appear rather surprising
since standard EM algorithms usually require many iterations to converge.
The following approach, based on what is referred to as Sparse Bayesian Learning (SBL)
and Automatic Relevance Determination (ARD), offers a more principled approach.
Sparse Bayesian Learning and γ −MAP
The Sparse Bayesian Learning (SBL) approach [142, 160] is an extremely important al-
ternative to the point estimate obtained by simple maximum likelihood. This approach is
also based on the maximization of the model evidence, but the procedure consists in selecting
among a very high number of priors, the ones that fit the best to the data. Contrary to the
ReML solver that is limited to a small number of priors, covariance templates in this case, the
following method is not. It achieves model selection with a sparsity inducing cost function.
In this context, SBL consists in maximizing a tractable Gaussian approximation of the
evidence, also known as the type-II likelihood or marginal likelihood:
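$$p(\mathbf{M}|\Sigma_M) = \int p(\mathbf{M}|\mathbf{X})\,p(\mathbf{X}|\Sigma_M)\,d\mathbf{X} = \mathcal{N}(0, \Sigma_M) ,$$
where $\Sigma_M$ stands for the measurements covariance matrix. This is equivalent to minimizing
the negative log marginal likelihood:
$$\begin{aligned}
\mathcal{L} &= -\log p(\mathbf{M}|\Sigma_M) \\
&= -\log\left(\frac{1}{\sqrt{(2\pi)^{d_m} |\Sigma_M|^{d_t}}} \exp\left(-\frac{\mathrm{trace}\left(\mathbf{M}^T \Sigma_M^{-1} \mathbf{M}\right)}{2}\right)\right) .
\end{aligned} \tag{3.36}$$
For convenience, the log marginal likelihood is simplified and redefined as:
$$\mathcal{L} = d_t \log |\Sigma_M| + \mathrm{trace}\left(\mathbf{M}^T \Sigma_M^{-1} \mathbf{M}\right) . \tag{3.37}$$
With the additive model (3.29), $\Sigma_M$ is given by:
$$\Sigma_M = \mathbf{G} \Sigma_X \mathbf{G}^T + \Sigma_E .$$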
The model proposed for $\Sigma_X$ is $\Sigma_X = \sum_{i=1}^{d_s} \gamma_i \mathbf{C}_i$. The $(\gamma_i)_i$ are the hyperparameters and
the $(\mathbf{C}_i)_i$ are the a priori source covariance matrices. With this model the likelihood $\mathcal{L}$ is
parameterized by the $(\gamma_i)_i$. The $(\mathbf{C}_i)_i$ can be defined as $\mathbf{C}_i = e_i e_i^T$ where $e_i$ is a vector with
zeros everywhere except at the ith element, where it is 1. Such covariance matrices model
isolated dipolar sources. It is also possible to use general covariance matrices and particularly
some that model extended activations, i.e., patches. This was proposed in [182].
The cost function (3.37) induces sparsity via the term log |ΣM|, a.k.a., volume-based regu-
larization. It penalizes a measure of the volume formed by the model covariance ΣM. In high
dimensions the volume is more efficiently reduced by setting a few dimensions to 0 rather
than by diminishing all of them by a small factor. This penalty term promotes a model covari-
ance that is maximally degenerate (or non-spherical), which pushes elements of γ to exactly
zero.
If we assume that $\mathbf{C}_i = e_i e_i^T$, the diagonal matrix $\Sigma_X = \Gamma = \mathrm{diag}(\gamma_i)$ is the prior source
covariance matrix which contains the vector of hyperparameters on the diagonal (i.e., the
variances). In the ARD framework, the precisions (i.e., inverse variances) are Gamma
distributed. The matrix $\Sigma_E$ is the noise covariance matrix, which is assumed here to be a
multiple of the identity matrix (e.g., $\sigma_E^2 \mathbf{I}$, where $\sigma_E^2$ is the noise variance, a hyperparameter that
can also be learned from the data or empirically obtained from the measurements).
The optimization is run with an iterative procedure that updates the (γi)i at each step.
Various update schemes exist to optimize (3.37).
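First, the evidence maximization can be achieved by using an Expectation-Maximization
update rule:
$$\gamma_i^{(k+1)} = \frac{1}{d_t r_i} \left\| \gamma_i^{(k)} \mathbf{G}(:,i)^T \left(\Sigma_M^{(k)}\right)^{-1} \mathbf{M} \right\|_F^2 + \frac{1}{r_i} \mathrm{trace}\left( \gamma_i^{(k)} \mathbf{I} - \gamma_i^{(k)} \mathbf{G}(:,i)^T \left(\Sigma_M^{(k)}\right)^{-1} \mathbf{G}(:,i)\, \gamma_i^{(k)} \right) ,$$
where $r_i$ is the rank of $\mathbf{G}(:,i)\mathbf{G}(:,i)^T$, and $\mathbf{G}(:,i)$ is a matrix grouping the column vectors from
G that are controlled by the same hyperparameter $\gamma_i$ [227].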
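It can also be achieved using a fixed-point gradient update rule (called MacKay updates):
$$\gamma_i^{(k+1)} = \frac{\gamma_i^{(k)}}{d_t} \left\| \mathbf{G}(:,i)^T \left(\Sigma_M^{(k)}\right)^{-1} \mathbf{M} \right\|_F^2 \left( \mathrm{trace}\left( \mathbf{G}(:,i)^T \left(\Sigma_M^{(k)}\right)^{-1} \mathbf{G}(:,i) \right) \right)^{-1} ,$$
or alternatively with:
$$\gamma_i^{(k+1)} = \frac{\gamma_i^{(k)}}{\sqrt{d_t}} \left\| \mathbf{G}(:,i)^T \left(\Sigma_M^{(k)}\right)^{-1} \mathbf{M} \right\|_F \left( \mathrm{trace}\left( \mathbf{G}(:,i)^T \left(\Sigma_M^{(k)}\right)^{-1} \mathbf{G}(:,i) \right) \right)^{-\frac{1}{2}} .$$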
Contrary to the MacKay updates, the latter scheme guarantees that the cost function de-
creases at each iteration. The proof is based on convex analysis and the introduction of a
surrogate function.
With fixed dipole orientations, G(:, i) is a vector, but with unconstrained orientations G(:, i)
is a dm × 3 matrix. For patch source models involving dipoles within a region, G(:, i) is a
matrix containing all gain vectors associated with the local patch of cortex. The last two
update schemes are much faster than the EM rule (cf. figure 3.7). It can be noticed that
one iteration of the gradient-based update is almost identical to the sLORETA algorithm,
which is expressed in a completely different framework. Once the optimal hyperparameters
have been learned, the source estimates are given by the classical closed-form solution of the
Minimum-Norm:
X∗ = E[X|M;ΣX] = ΣXGT (ΣM)−1
M .
It is important to note that many SBL algorithms are distinguished by the parameterization
of the source covariance matrix. Here the model presented is ΣX =∑ds
i=1 γiCi. It is referred
to as the γ-MAP inverse solver [227].
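The MacKay and convexity-based updates can be sketched for the simplest case Ci = ei eiT with fixed orientations, where G(:, i) is a single column. The code below is an illustrative toy implementation, not the EMBAL solver: the function name `gamma_map`, its arguments and the simulated data are all assumptions. It records the cost (3.37) at each iteration so that the monotonic decrease of the convexity-based scheme can be observed.

```python
import numpy as np

def gamma_map(M, G, sigma_e, n_iter=100, update="convex"):
    """Toy gamma-MAP with C_i = e_i e_i^T (one hyperparameter per fixed-orientation source)."""
    d_t = M.shape[1]
    gammas = np.ones(G.shape[1])
    costs = []
    for _ in range(n_iter):
        sigma_m = (G * gammas) @ G.T + sigma_e            # G Gamma G^T + Sigma_E
        sigma_m_inv_M = np.linalg.solve(sigma_m, M)
        sigma_m_inv_G = np.linalg.solve(sigma_m, G)
        _, logdet = np.linalg.slogdet(sigma_m)
        costs.append(d_t * logdet + np.sum(M * sigma_m_inv_M))   # cost (3.37)
        num = np.sum((G.T @ sigma_m_inv_M) ** 2, axis=1)  # ||G(:,i)^T Sigma_M^-1 M||_F^2
        den = np.sum(G * sigma_m_inv_G, axis=0)           # G(:,i)^T Sigma_M^-1 G(:,i)
        if update == "mackay":
            gammas = gammas / d_t * num / den
        else:                                             # convexity-based update
            gammas = gammas / np.sqrt(d_t) * np.sqrt(num / den)
    sigma_m = (G * gammas) @ G.T + sigma_e
    X = (gammas[:, None] * G.T) @ np.linalg.solve(sigma_m, M)    # minimum-norm with learned prior
    return X, gammas, np.array(costs)

rng = np.random.default_rng(3)
d_m, d_x, d_t = 15, 60, 10
G = rng.standard_normal((d_m, d_x))                # hypothetical gain matrix
X_true = np.zeros((d_x, d_t))
X_true[[5, 40]] = 3.0 * rng.standard_normal((2, d_t))   # two active sources
M = G @ X_true + 0.1 * rng.standard_normal((d_m, d_t))

X_hat, gammas, costs = gamma_map(M, G, sigma_e=0.01 * np.eye(d_m), n_iter=200)
```

Passing `update="mackay"` switches to the fixed-point rule, for which no monotonic decrease of the cost is guaranteed.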
We did not include the ReML framework from Friston and colleagues in the Sparse
Bayesian Learning framework because their approach, with a limited number of
hyperparameters and diagonal covariances with no zero elements, cannot produce sparse
estimates.
Comments on Sparse Bayesian Learning. First a remark on the implementation. It
can be observed in the update schemes detailed above that once a γi is set to 0 it stays at 0.
It then requires no more updating. This means that the faster the γi are set to 0, the faster
the loop over i. It allows for example to run the γ-MAP solver with thousands of covariance
templates. Very rapidly only a handful of γi are concerned by the update and the algorithm
converges even faster.
[Figure 3.7 plot: negative log likelihood versus iteration (logarithmic iteration axis) for the EM, MacKay and convex-based updates.]
Figure 3.7: Convergence rates observed with the three update schemes (EM,
MacKay, convexity-based approach). The EM-based scheme clearly appears as the least
efficient. Simulation was run with around 20000 covariance templates.
The second remark is related to the practical use of this solver. With a classical
event-related experimental setup, a subject is asked to perform a task or simply to respond to a
stimulation multiple times. For each repetition of the experiment, the M/EEG signals are
recorded, forming one trial. Let us suppose that dn trials are recorded and that each trial
contains dt successive time instants. By averaging all the dn trials, we obtain what is called
the evoked response. A first approach consists in using as input successive time frames of the
evoked response. The input data could for example be the measured evoked response between
40 and 50 ms or between 20 and 200 ms after the beginning of the stimulation. The problem
with the latter example is that source covariance is very likely to change during the time
interval. The (γi) might be different for early and late brain responses. In order to run the
inverse solver on the full temporal data, one might want to consider the possibility of letting
γi change over time. Indeed, the γ-MAP solver assumes that the noise and prior covariances
do not depend on time, which can be a problem for long time intervals. Note that this remark
is at the origin of our contribution presented in chapter 6.
An alternative that does not suffer from the problem just mentioned consists in using as
input the data measured at a given time instant t∗ across the dn repetitions. This procedure
provides a set of (γi) and localization results at this particular time instant t∗ but does not
integrate temporal information. Moreover, this approach assumes no variability across trials.
The source configuration is considered to be the same at t∗ in each repetition. As illustrated
in our contribution in chapter 7, this assumption can be questioned.
Finally we would like to mention that our experience with the γ-MAP inverse solver
showed that it could provide very accurate results. While performing numerical simulations,
it showed its ability to recover very complex source configurations. However, it also proved its
sensitivity to the definition of the noise covariance matrix provided as input. A wrong noise
covariance matrix can strongly bias the localization result.
With real data, since this solver can provide focal source estimates, we also observed
that the active source could be estimated on the wrong side of a gyrus, a location also
confirmed by simple dipole fitting. In this example, the forward modeling was probably at the
origin of the problem. The point of this remark is that a very precise inverse solver requires
a very precise forward model.
The γ-MAP inverse solver with $\mathbf{C}_i = e_i e_i^T$ can be used with the code snippet in table 3.5
extracted from EMBAL.
clear options
options.noise_cov = C; % Set noise covariance
options.maxit = 500; % Set maximum number of iterations of the gammas
[X,Ginv,gammas] = gmap_inverse(M,G,options);
Table 3.5: Running the Gamma-MAP inverse solver with EMBAL.
3.4 CONCLUSION
This chapter was written to provide an overview of the state of the art of M/EEG inverse
solvers based on Gaussian assumptions and ℓ2 priors. The list of solvers presented is fairly
long but does not claim to be exhaustive.
Throughout this chapter, particular attention was paid to implementation details, which are
mandatory to have software able to deal with realistic datasets. We also took care to provide
personal comments on each algorithm by discussing its advantages but also its
limitations.
All the algorithms detailed in this chapter (except for MiMS that should soon be integrated
in the Brainstorm Toolbox) have been implemented and tested on synthetic and real MEG
data. The source code of the solvers and the demo scripts with synthetic and real data are
available in a Matlab toolbox called EMBAL (Electro-Magnetic Brain Activity Localization):
https://gforge.inria.fr/projects/embal
The following two chapters discuss additional aspects of the distributed source models:
priors other than ℓ2 and frequency domain analyses.
CHAPTER 4
M/EEG INVERSE MODELING WITH NON DIFFERENTIABLE CONSTRAINTS AND SPARSE PRIORS
In chapter 3, focus has been put on distributed inverse solvers with ℓ2 priors, either with
fixed priors or learning-based approaches. When using an ℓ2 norm to regularize the inverse
problem, the cost function to minimize is differentiable and strictly convex, which leads to a
convenient solution obtained in closed-form. The ℓ2 norm is known for its good robustness to
noise. However, the ℓ2 norm suffers from various pitfalls.
The main criticism that is addressed to standard Minimum-Norm solutions, i.e., without
Bayesian learning, results from their tendency to smear out the estimated cortical currents,
often leading to solutions that are too widely extended. This is intrinsically due to the ℓ2 norm
used to regularize the inverse problem. In order to reduce this effect, a natural choice is to
use a regularizing prior that tries to limit the number of active sources, i.e., that introduces
sparsity in the source space.
At the center of sparse priors are the ℓp norms with p < 2 and particularly the ℓ1 norm that
achieves sparsity while leading to a convex problem. Such priors lead to non differentiable
optimization problems for which numerous optimization methods have been proposed in the
last few years [34, 39, 55, 63, 77, 99, 161, 180, 202, 212]. What motivated such an interest
is the ability of sparse priors to improve the solution of ill-posed inverse problems present in
machine learning and signal processing.
The major motivation for using sparse priors in M/EEG originates from the fact that they
provide a natural way to integrate relevant a priori information in the inverse problem. Such
information includes the number of active sources, the spatial extent of active regions, the
spatial or temporal smoothness of the reconstructions or even anatomo-functional knowledge
between multiple experimental conditions.
In this chapter, we review previous contributions that introduced sparse priors in the
context of M/EEG inverse modeling. The proposed optimization methods are commented on
and, where possible, simpler and more efficient algorithms are proposed. We finish this chapter
with our contributions that concern the integration of the anatomo-functional knowledge between
experimental conditions in the prior [93, 128]. As will be shown with simulations and real
MEG data, this approach offers a principled way to achieve functional mapping with M/EEG
with better results than classic MN estimators.
118 CHAPTER 4. INVERSE MODELING WITH SPARSE PRIORS
Contents
4.1 Why use sparse priors?
4.2 Inversion with sparse priors: Methods
4.2.1 Iterative Least Squares (IRLS)
4.2.2 LARS-LASSO with the ℓ1 norm
4.2.3 Proximity operators and iterative schemes
4.3 Sparsity and spatially extended activations: The Total Variation
4.4 Sparsity and spatiotemporal data
4.4.1 VESTAL
4.4.2 ℓ1 over space and ℓ2 over time
4.5 Sparse priors with multiple experimental conditions: ℓ212
4.5.1 Method
4.5.2 Simulations
4.5.3 MEG study
4.6 Conclusion
4.1 WHY USE SPARSE PRIORS?
When using an ℓ2 norm to regularize the inverse problem, the cost function to minimize
is differentiable and strictly convex. The strict convexity implies that the solution
necessarily exists and is unique. The differentiability allows the optimum to be found simply by setting
the derivative with respect to X to 0. This is what provides, in the ℓ2 case, the solution in
closed form. Here, we focus on strategies that do not exhibit such computational advantages,
but that offer the possibility to integrate other prior knowledge on the solution in order to
better constrain the inverse problem.
Historically, sparse priors have been introduced for M/EEG inverse modeling [91, 145] to
address a problem raised by standard Minimum-Norm solutions (without Bayesian learning).
The MN estimator has a tendency to smear out the estimated cortical currents, often leading
to solutions that are too widely extended.
Such behavior of standard MN estimators is intrinsically due to the ℓ2 norm used to regu-
larize the inverse problem. In order to reduce this effect a natural choice is to use a regular-
izing prior that tries to limit the number of active sources, i.e., that introduces sparsity in the
source space. This can be achieved by using ℓp (quasi)-norms with p < 2.
Definition 4.1 (ℓp norms). Let $x \in \mathbb{R}^I$. The ℓp norm for 1 ≤ p < ∞ and the quasi-norm for
0 ≤ p < 1 of the vector x is defined by:
$$\|x\|_p = \Big(\sum_i |x_i|^p\Big)^{1/p} . \tag{4.1}$$
Remarks.
• For p = 0, ‖x‖0 is equal to the number of non-zero coefficients of x.
• For 0 ≤ p < 1 we only obtain quasi-norms, since the triangle inequality is not satisfied.
The ℓp norms with p close to 1 measure “diversity”, a notion that is opposed to “sparsity”.
Hence, minimizing an ℓp norm with p close to 1 amounts to minimizing the diversity and thus
increasing the sparsity. A sparse vector is a vector with a small number of non-zero coefficients.
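As a small illustration of Definition 4.1, the following plain Python sketch compares a sparse and a dense vector that have the same ℓ2 norm. The function name `lp_norm` and the two test vectors are ours and purely illustrative; they are not part of EMBAL.

```python
def lp_norm(x, p):
    """l_p (quasi-)norm of a sequence x; p = 0 counts the non-zero entries."""
    if p == 0:
        return sum(1 for v in x if v != 0)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


sparse = [2.0, 0.0, 0.0, 0.0]   # a single active coefficient
dense = [1.0, 1.0, 1.0, 1.0]    # same l2 norm, energy spread over all entries

for p in (2, 1, 0.5, 0):
    # The l2 norm cannot tell the two vectors apart; smaller p favors the sparse one.
    print(p, lp_norm(sparse, p), lp_norm(dense, p))
```

Both vectors have an ℓ2 norm of 2, but the dense vector is penalized more and more heavily relative to the sparse one as p decreases, which is the behavior exploited by sparse priors.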
Let us illustrate this concept with the classical formulation of the M/EEG inverse problem
with distributed source models:
$$X^* = \arg\min_X E(X) = \arg\min_X \|M - GX\|_F^2 + \lambda\phi(X), \quad \lambda > 0 , \tag{4.2}$$
where $\phi(X) = \|X\|_1$ or $\phi(X) = \|X\|_2^2$ and $X \in \mathbb{R}^{d_x \times d_t}$. The ellipses in figure 4.1 represent the
isocontours of the datafit, while the circle and the square at the center correspond respectively
to the ℓ2 and ℓ1 “balls”. The isovalues for the ℓ1 prior are squares while the isovalues for the
ℓ2 prior are circles. At the optimum, the ellipses and the “balls” are tangent. If the tangent
point lies over a coordinate axis, a coefficient is set to zero and the solution is sparse. This
effect is illustrated in figure 4.1(c).
Popular choices for p are 0 and 1. The ℓ1 is attractive since it leads to convex optimization
problems whereas for p < 1 the problems become non convex. When computing the inverse
problem instant by instant, the ℓ1 norm is known in the M/EEG community as the Minimum
Current Estimate (MCE) inverse solution [145] while the ℓ0 norm refers to the FOCUSS
inverse solver [91, 92].
[Figure 4.1 panels: (a) ℓ2, (b) ℓ2, (c) ℓ1, (d) ℓ1]
Figure 4.1: Graphical illustration of the difference between ℓ1 and ℓ2 norms. The ℓ1 ball is very
likely to be tangent to the isovalues of the datafit represented by the ellipses at its corners
as in figure 4.1(c). This produces a solution that sets a coordinate to 0 inducing sparsity.
This is not likely to happen with the ℓ2 norm as illustrated in figure 4.1(a) and figure 4.1(b).
However, it can happen that both coefficients are non-zero with the ℓ1 norm, as illustrated in
figure 4.1(d).
4.2 INVERSION WITH SPARSE PRIORS: METHODS
This section presents different algorithms that can be used for inverse computation
with sparse priors. Here, the methods assume that the inverse solution is computed instant
by instant, as in the original contributions in the M/EEG community [91, 145]. The presentation
starts with Iterative Least Squares (IRLS), which consists in iteratively computing WMN
solutions with weights updated after each iteration [56, 139]. It is followed by a very brief
description of the LARS-LASSO algorithm [63, 202], which is an extremely powerful method
for solving the ℓ1 problem. The LARS-LASSO is a variant of the homotopy method from
Osborne [180]. Finally, methods based on proximity operators and iterative schemes are detailed
[39, 55, 161]. The latter methods are the ones used for the Total Variation (TV) problem and
the final contribution on inter-condition sparse priors.
There exist other methods for solving the ℓ1 penalization problem using iterative thresh-
olding like SPGL1 [212] and fixed point continuation [99] but their limited improvement over
simple proximal iterations does not justify their presentation here. The ℓ1 problem can also
be solved with simple coordinate descent [77] or by blockwise coordinate descent also called
Block Coordinate Relaxation (BCR) [27]. Depending on the problem of interest, the latter
methods can be very competitive.
4.2.1 Iterative Least Squares (IRLS)
IRLS with the ℓ1 norm
The Minimum Current Estimate (MCE) was introduced in the field of M/EEG by Matsuura
and Okabe [145]. As mentioned above, the MCE consists in solving the inverse problem,
instant by instant, with an ℓ1 penalization. With $d_t = 1$, the source amplitudes are denoted by a
vector x. The inverse problem with an ℓ1 prior then reads:
$$x^*_\lambda = \arg\min_x \frac{1}{2}\|m - Gx\|_F^2 + \lambda\|x\|_1, \quad \lambda > 0 . \tag{4.3}$$
This problem corresponds to the LASSO problem. Originally the authors in [145] proposed
to solve this problem using the simplex method by reformulating the problem as a Linear
Program (LP). This approach works. However, an optimal solution can be obtained by a
relatively simple IRLS algorithm.
The IRLS method is not very competitive in terms of convergence speed compared to state-of-the-art
methods for the ℓ1 prior and can suffer from numerical instabilities. However, due
to its link with the Minimum-Norm solutions presented in the previous chapter, we start by presenting
this algorithm. It also has the advantage of providing the outline of the FOCUSS algorithm
for the ℓ0 prior.
Let $W^k$ denote the weighting matrix used in the WMN at iteration k. The matrix $W^k$ is
diagonal, $W^k = \mathrm{diag}(w_i^k)$. The WMN optimization problem related with $W^k$ is:
$$\min_x \frac{1}{2}\|m - Gx\|_2^2 + \lambda \sum_i w_i^k |x_i|^2, \quad \lambda > 0 .$$
In order to give an intuition about the algorithm, it can be noticed that the ℓ2 norm
$\|x\|_{w,2} = \sum_i w_i^k |x_i|^2$ is equal to the ℓ1 norm $\|x\|_1 = \sum_i |x_i|$, when $w_i^k = 1/|x_i|$.
Algorithm 4.1 (IRLS ℓ1 solver).
• Initialization: $W^0 = I$
• Compute: $x^{k+1} = (W^k)^{-1} G^T \big(G (W^k)^{-1} G^T + \lambda I\big)^{-1} m$
• Update the weights: $w_i^{k+1} = 1/|x_i^{k+1}|$
• Stop if $\|x^{k+1} - x^k\|$ is smaller than a fixed tolerance value.
Proposition 4.1. Algorithm 4.1 converges to a minimizer of (4.3).
Sketch of the proof. Let $a \in \mathbb{R}_+$. One can prove that:
$$\forall w \in \mathbb{R}_+, \quad a \le f_a(w) = \frac{1}{2}\Big(\frac{a^2}{w} + w\Big)$$
and that $f_a(a) = a$. The function $f_a$ is strictly convex on $\mathbb{R}_+$.
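This gives:
$$\begin{aligned}
\min_x \frac{1}{2\lambda}\|m - Gx\|_2^2 + \|x\|_1
&= \min_x \frac{1}{2\lambda}\|m - Gx\|_2^2 + \sum_i |x_i| \\
&= \min_{x,w} \frac{1}{2\lambda}\|m - Gx\|_2^2 + \frac{1}{2}\sum_i \Big(\frac{(x_i)^2}{w_i} + w_i\Big)
\end{aligned} \tag{4.4}$$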
The minimization is performed alternately over w and x. For fixed x, the w at optimum is
given by: $w_i = |x_i|$ (the $w_i$ of (4.4) are thus the inverses of the weights $w_i^k$ of algorithm 4.1).
For fixed w, the problem corresponds to a weighted minimum-norm.
This proves that algorithm 4.1 decreases the energy in (4.3) at each iteration. The
reader can refer to [56] for the proof that this iterative scheme actually leads to a minimum.
It is worth noting that the update rule for x is equivalent to:
$$x^{k+1} = \Delta^k G^T (G \Delta^k G^T + \lambda I)^{-1} m ,$$
where $\Delta^k$ is the diagonal matrix whose diagonal elements are the $(|x_i^k|)_i$. This prevents division
by zero when coefficients vanish as the solution becomes more and more sparse during
the iterations. More details on IRLS methods using sparse priors can be found in [56, 139].
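The IRLS iteration written with the diagonal matrix of current amplitudes can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the EMBAL implementation: the names `solve` and `irls_l1` are ours, the dense Gaussian elimination is only suitable for tiny problems, and the stopping rule uses the largest coordinate change.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def irls_l1(G, m, lam, n_iter=200, tol=1e-12):
    """IRLS for 0.5*||m - G x||^2 + lam*||x||_1, using Delta^k = diag(|x_i^k|)."""
    n_sensors, n_sources = len(G), len(G[0])
    delta = [1.0] * n_sources            # Delta^0 = (W^0)^{-1} = I
    x = [0.0] * n_sources
    for _ in range(n_iter):
        # A = G Delta G^T + lam I (size: sensors x sensors)
        A = [[sum(G[a][i] * delta[i] * G[b][i] for i in range(n_sources))
              + (lam if a == b else 0.0)
              for b in range(n_sensors)] for a in range(n_sensors)]
        z = solve(A, m)
        # x^{k+1} = Delta G^T z: no division by a vanishing coefficient is needed
        x_new = [delta[i] * sum(G[a][i] * z[a] for a in range(n_sensors))
                 for i in range(n_sources)]
        step = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        delta = [abs(v) for v in x]      # update of the inverse weights
        if step < tol:
            break
    return x


x_hat = irls_l1([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

With G equal to the identity, the iterates approach the soft-thresholded measurements, here approximately [2, 0] for m = [3, 0.5] and λ = 1: the small coefficient is driven to zero while the large one is shrunk by λ.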
As we have seen, the IRLS solver for the LASSO problem (4.3) is extremely simple to
implement. However, it may suffer from numerical instabilities due to the limited precision of
the matrix inversion.
Note that a similar IRLS approach can also be used for a mixed norm involving grouped
variables, as we will see in section 4.4.2.
IRLS with the ℓ0 norm: FOCUSS (FOCal Underdetermined System Solver)
The FOCUSS algorithm, as proposed in [91, 92], is an IRLS method used to solve,
instant by instant, the inverse problem with an ℓ0 penalization. More generally, it works for
ℓp norms with p ≤ 1 [186].
The strategy is very similar to the IRLS solver used to compute the ℓ1 solution. With the
same notations it is given by:
Algorithm 4.2 (IRLS ℓ0 solver: FOCUSS).
• Initialization: $W^0 = I$
• Compute: $x^{k+1} = (W^k)^{-1} G^T \big(G (W^k)^{-1} G^T + \lambda I\big)^{-1} m$
• Update the weights: $w_i^{k+1} = 1/|x_i^{k+1}|^2$
• Stop if $\|x^{k+1} - x^k\|$ is smaller than a fixed tolerance value.
The difference between algorithm 4.1 and algorithm 4.2 comes from the update rule
for the weights. One can prove that updating the weights with $w_i^{k+1} = |x_i^{k+1}|^{p-2}$ leads to a
solution of the ℓp penalized problem.
We refer to [91, 92] for details on how to circumvent the bias for superficial sources with
an ℓ0 prior.
This IRLS solver can be run with EMBAL using the code snippet in table 4.1.
    clear options
    options.p = 1; % for Lasso
    options.maxit = 20; % Set maximum number of iterations
    [X] = irls_inverse(M,G,options);
Table 4.1: Running an IRLS inverse solver with EMBAL. Here p is set to 1 in order to solve
the LASSO problem. If p is set to 0, the FOCUSS solver is used.
4.2.2 LARS-LASSO with the ℓ1 norm
The LARS-LASSO algorithm is a powerful method to solve the LASSO problem (4.3) since it
makes it possible to compute the optimum x∗ for all values of λ in one run. The acronym LARS stands
for Least Angle Regression. The LARS-LASSO algorithm is a “path algorithm”. It consists in
finding the solution for the largest value of λ and following an optimal path of solutions while
decreasing λ. Such an approach is possible because,
in the LASSO case, this path is piecewise linear.
Let us denote Gi the ith column of G and more generally GΓ the concatenation of the
columns of G whose index belongs to a set of indices Γ.
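Proposition 4.2. $x^*_\lambda = (x_i)_i$ is optimal iff
$$\forall i \in \{1, \dots, p\}, \quad |G_i^T (m - G x^*_\lambda)| \le \lambda \tag{4.5}$$
$$\forall i \ \text{s.t.}\ x_i \neq 0, \quad G_i^T (m - G x^*_\lambda) = \lambda\, \mathrm{sign}(x_i) \tag{4.6}$$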
Sketch of the proof. Let us write the directional derivatives of E(x) around point x. One can
prove that this derivative in direction u is given by:
$$d_u E(x) = -u^T G^T (m - Gx) + \lambda \sum_{i=1}^{p} \begin{cases} u_i\, \mathrm{sign}(x_i) & \text{if } x_i \neq 0 \\ |u_i| & \text{if } x_i = 0 \end{cases}$$
In order for x to be optimal one needs to have, for all u: $d_u E(x) \ge 0$. For a given i, by
considering both cases ($x_i \neq 0$ and $x_i = 0$), one gets the constraints in (4.5) and (4.6).
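The optimality conditions (4.5) and (4.6) give a cheap way to check a candidate LASSO solution. The following plain Python sketch returns the largest violation of these conditions; the function name `lasso_optimality_gap` and the toy data are ours and only serve as an illustration.

```python
def lasso_optimality_gap(G, m, x, lam):
    """Largest violation of conditions (4.5)-(4.6); 0 means x is optimal for (4.3)."""
    n_sensors, n_sources = len(G), len(x)
    residual = [m[a] - sum(G[a][i] * x[i] for i in range(n_sources))
                for a in range(n_sensors)]
    worst = 0.0
    for i in range(n_sources):
        # corr = G_i^T (m - G x), correlation of column i with the residual
        corr = sum(G[a][i] * residual[a] for a in range(n_sensors))
        if x[i] != 0:
            sign = 1.0 if x[i] > 0 else -1.0
            violation = abs(corr - lam * sign)       # active coefficient: (4.6)
        else:
            violation = max(abs(corr) - lam, 0.0)    # inactive coefficient: (4.5)
        worst = max(worst, violation)
    return worst


G = [[1.0, 0.0], [0.0, 1.0]]
m = [3.0, 0.5]
gap_lasso = lasso_optimality_gap(G, m, [2.0, 0.0], lam=1.0)   # soft-thresholded m
gap_lsq = lasso_optimality_gap(G, m, [3.0, 0.5], lam=1.0)     # unpenalized fit
```

In this toy case the soft-thresholded vector satisfies both conditions, whereas the unpenalized least-squares fit violates (4.6) for both of its active coefficients.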
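Let us take Γ as the active set, i.e., $\Gamma = \{i \ \text{s.t.}\ x_i \neq 0\}$, and $\epsilon_\Gamma$ the sign of the active variables,
i.e., $\epsilon_\Gamma = \mathrm{sign}(x_\Gamma)$.
Then equation (4.6) leads to:
$$x^*_\Gamma(\lambda) = (G_\Gamma^T G_\Gamma)^{-1} (G_\Gamma^T m - \lambda \epsilon_\Gamma)$$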
This equation stays valid as long as the optimality conditions in (4.5) and (4.6) are satisfied.
On each interval of λ where they are satisfied, the solution is an affine function of λ. To
compute the solution on each interval, one has to follow the direction provided by the vector
$-(G_\Gamma^T G_\Gamma)^{-1} \epsilon_\Gamma$. When an optimality condition does not hold anymore, the active set Γ needs
to be updated. We refer the reader to [63] for more details on the update rules. As a result,
we obtain the solution for all possible λ.
After each update, the optimal direction is obtained by inverting a square matrix in
$\mathbb{R}^{d_\Gamma \times d_\Gamma}$, where $d_\Gamma$ stands for the size of the active set. The complexity of the LARS is
cubic in the number of active variables and can therefore be outperformed by other methods
when the optimal active set is big.
In practice, the inverse can be efficiently computed using prior inversions. Between two
calls of the inverse method, a variable can either be added to the active set or removed.
Therefore one row and one column are added to or removed from the matrix to invert. In this
case, it is possible to update the inverse, using tricks such as the matrix inversion
lemma or efficient Cholesky updates, yielding significant speed-ups.
This LARS algorithm can be run with EMBAL using the code snippet in table 4.2.
    clear options
    options.lambda = 1e-7;
    X = lars_inverse(M,G,options);
Table 4.2: Running a LASSO inverse solver using the LARS algorithm with EMBAL.
4.2.3 Proximity operators and iterative schemes
When G = I, i.e., there is no smoothing kernel or “convolution” operator, the problem in (4.3)
corresponds to:
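$$\begin{aligned}
x^* &= \arg\min_x \frac{1}{2}\|y - x\|_2^2 + \lambda\|x\|_1, \quad \lambda > 0 \\
&= \arg\min_x \sum_i \Big(\frac{1}{2}(y_i - x_i)^2 + \lambda|x_i|\Big)
\end{aligned} \tag{4.7}$$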
In this case, the problem can be solved coordinate by coordinate, and one can easily prove
that an exact solution is given by a soft thresholding [61]:
$$\forall i, \quad x^*_i = y_i \Big(1 - \frac{\lambda}{|y_i|}\Big)^+ , \tag{4.8}$$
where, by definition, $(x)^+ \overset{\mathrm{def}}{=} \max(x, 0)$. By convention, $\cdot/0 = 0$, meaning that if $y_i = 0$
then $x^*_i = 0$.
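The soft thresholding (4.8) translates directly into code. The sketch below is in plain Python; the function name `soft_threshold` is ours, and the convention for a zero entry is handled explicitly.

```python
def soft_threshold(y, lam):
    """Coordinate-wise solution (4.8) of min_x 0.5*||y - x||^2 + lam*||x||_1."""
    out = []
    for v in y:
        # (1 - lam/|v|)^+ with the convention that a zero entry stays at zero
        shrink = max(1.0 - lam / abs(v), 0.0) if v != 0 else 0.0
        out.append(v * shrink)
    return out


x_star = soft_threshold([3.0, -0.5, 0.0, -2.0], 1.0)
```

Entries whose magnitude is below λ are set exactly to zero, and the others are moved towards zero by λ, which is how the ℓ1 penalty produces sparsity when G = I.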
While the problem can be solved analytically when there is no convolution operator, G = I,
it is not the case with a general matrix G. In order to solve the general case in (4.3), one needs
to introduce the notion of proximity operator, well known in convex analysis, and the iterative
forward-backward algorithm [120].
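Definition 4.2 (Proximity operator). Let $\phi : \mathbb{R}^P \to \mathbb{R}$ be a lower semicontinuous, convex
function. The proximity operator associated with φ and $\lambda \in \mathbb{R}_+$, denoted by $\mathrm{prox}_{\lambda\phi} : \mathbb{R}^P \to \mathbb{R}^P$,
is given by
$$\mathrm{prox}_{\lambda\phi}(y) = \arg\min_{x \in \mathbb{R}^P} \frac{1}{2}\|y - x\|_2^2 + \lambda\phi(x) .$$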
Remark. When φ is the ℓ1 norm, the proximity operator is given by a soft thresholding (4.8).
The following algorithm provides an optimization strategy for:
$$x^*_\lambda = \arg\min_x \frac{1}{2}\|m - Gx\|_2^2 + \lambda\phi(x), \quad \lambda > 0 , \tag{4.9}$$
where φ is a lower semicontinuous, convex function.
Algorithm 4.3 (Forward-Backward Proximal iterations).
• Initialize: Choose $x^{(0)} \in \mathbb{R}^{d_x}$ (for example 0).
• Iterate:
$$x^{(k+1)} = \mathrm{prox}_{\mu\lambda\phi}\big(x^{(k)} + \mu G^T (m - G x^{(k)})\big)$$
where $0 < \mu < 2|||G^T G|||^{-1}$.
• Stop if $\|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\|$ is smaller than a fixed tolerance criterion.
Theorem 4.3. Algorithm 4.3 converges to a minimizer of (4.9), for any choice of $\mu \in [\epsilon, 2|||G^T G|||^{-1} - \epsilon]$, $\epsilon > 0$.
Proof. The convergence of this algorithm is guaranteed by results by Combettes et al. in [39]
using the properties of forward-backward proximal iterations originally proposed by Moreau
in [120]. Daubechies et al. in [55] prove a similar result but end up with the condition
$0 < \mu < |||G^T G|||^{-1}$.
In practice, we set $\mu = |||G^T G|||^{-1}$ as it appears to provide better results.
Remarks.
• The stopping criterion proposed here is based on the ratio $\|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\|$. This is
certainly not the most principled way to stop the algorithm but it appears in our context
to provide an acceptable strategy. A more rigorous criterion could be based on the size of
the duality gap [20]. This, however, may not be trivial to compute for certain priors.
• The iterations in algorithm 4.3 are also called Landweber iterations.
The solution of (4.3) is obtained by setting $\phi = \|\cdot\|_1$ and using for $\mathrm{prox}_{\mu\lambda\|\cdot\|_1}$ the soft
thresholding detailed in (4.8). This algorithm is called ISTA (Iterative Soft Thresholding Algorithm)
in the signal processing community.
In order to better understand the idea behind this method, one needs to notice that the
term $G^T (m - G x^{(k)})$ corresponds to the negative gradient of the reconstruction error
$\frac{1}{2}\|m - Gx\|_2^2$ at $x^{(k)}$. The algorithm
can therefore be understood as an alternating minimization over the regularization term, with
the proximity operator, and over the reconstruction error via simple gradient descent.
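The forward-backward iterations with an ℓ1 prior (ISTA) can be sketched in plain Python as below. This is an illustrative sketch, not the EMBAL solver: the names `ista` and `soft_threshold` are ours, a fixed number of iterations replaces the stopping criterion, and the step µ is taken as the inverse of the squared Frobenius norm of G, which is an upper bound on the spectral norm of $G^T G$ and therefore satisfies the step condition of algorithm 4.3.

```python
import math


def soft_threshold(v, thr):
    """Scalar soft thresholding: proximity operator of thr*|.| evaluated at v."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def ista(G, m, lam, n_iter=500):
    """Forward-backward iterations for 0.5*||m - G x||^2 + lam*||x||_1."""
    n_sensors, n_sources = len(G), len(G[0])
    # Squared Frobenius norm of G >= |||G^T G|||, hence mu <= 1/|||G^T G|||.
    mu = 1.0 / sum(g * g for row in G for g in row)
    x = [0.0] * n_sources
    for _ in range(n_iter):
        residual = [m[a] - sum(G[a][i] * x[i] for i in range(n_sources))
                    for a in range(n_sensors)]
        # Forward (gradient) step on the data fit ...
        forward = [x[i] + mu * sum(G[a][i] * residual[a] for a in range(n_sensors))
                   for i in range(n_sources)]
        # ... followed by the backward (proximal) step on the l1 penalty.
        x = [soft_threshold(v, mu * lam) for v in forward]
    return x


x_ista = ista([[2.0, 0.0], [0.0, 1.0]], [2.0, 0.5], lam=1.0)
```

For this diagonal toy problem the coordinates decouple, and the minimizer is 0.75 for the first coefficient and exactly 0 for the second, which the iterations reproduce.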
The convergence of this method can however be quite slow, especially if the matrix G is
badly conditioned. This is mainly due to the fact that the step size, scaled by µ,
is fixed and can be relatively small. This issue can be addressed using more elaborate optimization
schemes proposed by Nesterov [161]. A convenient rewriting of these algorithms using
proximity operators is presented in chapter 4 of [224]. We rewrite it here, with our notation:
Algorithm 4.4 (Nesterov scheme with proximity operators).
• Initialize: Choose $x^{(0)} \in \mathbb{R}^{d_x}$ (for example 0).
• Set auxiliary variables: $a = 0$, $g = 0$, $\mu = |||G^T G|||^{-1}$.
• Iterate:
– $t = 2\mu$
– $b = \dfrac{t + \sqrt{t^2 + 4ta}}{2}$
– $v = \mathrm{prox}_{a\lambda\phi}(x^{(0)} - g)$
– $u = \dfrac{a x^{(k)} + b v}{a + b}$
– $x^{(k+1)} = \mathrm{prox}_{\lambda\mu\phi}\big(u + \mu G^T (m - G x^{(k)})\big)$
– $g = g - b G^T (m - G x^{(k+1)})$
– $a = a + b$
• Stop if $\|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\|$ is smaller than a fixed tolerance criterion.
Nesterov proved that $E(x^{(k)}) - E(x^*)$ decreases in $O(1/k^2)$ and that this is the best convergence
rate that can be achieved by a first-order method. A first-order method is a method that
only requires the computation of gradients, i.e., first derivatives. A drawback of
this algorithm, however, is that when modifying λ, the “history” of the gradients that have been used
in this multistep approach needs to be cleared. In this sense, knowing x∗ for a given λ does
not help much to find the optimum for another λ, even a close one. Nesterov’s scheme does not
fully benefit from “warm restarts”.
In practice, for numerical stability of the algorithm, we observed that slightly decreasing µ can be
necessary. In our implementation we set $\mu = (1.05 \cdot |||G^T G|||)^{-1}$.
Each iteration in algorithm 4.4 contains two gradient computations and two calls to the
proximity operator. This is twice as many as in an iteration of algorithm 4.3. However, the cost is
fully justified by the speed of convergence (cf. figure 4.2).
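To give a feeling for accelerated proximal schemes, the sketch below implements FISTA, a closely related and widely used accelerated variant of the forward-backward iterations that also achieves an $O(1/k^2)$ rate. It is deliberately not a transcription of algorithm 4.4: FISTA uses a single gradient and a single proximity operator per iteration and a different extrapolation rule. The function names and the step choice (inverse of the squared Frobenius norm of G) are our assumptions.

```python
import math


def soft_threshold(v, thr):
    """Scalar soft thresholding: proximity operator of thr*|.| evaluated at v."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def fista(G, m, lam, n_iter=200):
    """Accelerated proximal gradient (FISTA) for 0.5*||m - G x||^2 + lam*||x||_1."""
    n_sensors, n_sources = len(G), len(G[0])
    mu = 1.0 / sum(g * g for row in G for g in row)   # 1 / (bound on |||G^T G|||)
    x = [0.0] * n_sources
    y = x[:]          # extrapolated point at which the gradient is evaluated
    t = 1.0
    for _ in range(n_iter):
        residual = [m[a] - sum(G[a][i] * y[i] for i in range(n_sources))
                    for a in range(n_sensors)]
        x_new = [soft_threshold(
                     y[i] + mu * sum(G[a][i] * residual[a] for a in range(n_sensors)),
                     mu * lam)
                 for i in range(n_sources)]
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        # Momentum: extrapolate along the direction of the last update.
        y = [x_new[i] + ((t - 1.0) / t_new) * (x_new[i] - x[i])
             for i in range(n_sources)]
        x, t = x_new, t_new
    return x


x_fista = fista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

On this identity toy problem the result approaches the soft-thresholded measurements, approximately [2, 0]. The extrapolated point y plays a role comparable to the auxiliary point u of algorithm 4.4, in the sense that each new iterate is computed from a combination of past iterates rather than from the last one only.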
Remark. We presented Nesterov’s optimization scheme using a quadratic reconstruction er-
ror. However Nesterov’s scheme can be applied to any functional of the form:
J (x) = ψ(x) + φ(x) ,
where ψ is differentiable with an L-Lipschitz derivative and φ a lower semicontinuous convex
function. In our case the Lipschitz constant L is given by the spectral norm |||GT G|||.
[Figure 4.2: energy as a function of the iteration number (logarithmic iteration axis) for the Landweber and Nesterov schemes.]
Figure 4.2: Comparison of convergence speed between Landweber and Nesterov iterative
schemes. Computation was run with a real MEG leadfield and an ℓ1 prior.
    clear options
    options.maxit = 500;
    options.mode = 'nesterov';
    % options.mode = 'landweber'; % Or use landweber
    options.lambda = 1e-7;
    options.penalty = 'l1'; % Use an L1 prior, i.e., a Lasso
    X = prox_inverse(M,G,options);
Table 4.3: Running an inverse solver using proximity operators with EMBAL .
Optimizing with a constraint on the reconstruction error
In practice, it happens that the inverse problem presented in its penalized form as in (4.9) is
not the most natural way to constrain the inverse problem. One may have a good estimate
of the noise amplitude, with the baseline period for example, and therefore of the norm of
the residual, a.k.a., the reconstruction error. Hence a natural formulation of the constrained
problem is:
$$x^*_\lambda = \arg\min_x \phi(x), \quad \text{s.t. } \|m - Gx\|_2 \le \delta, \quad \delta > 0 . \tag{4.10}$$
In order to solve this problem, we propose the following empirical strategy that consists
in updating the λ after p forward-backward iterations using the current value of the recon-
struction error. The parameter λ now depends on the iteration number and is indexed by k:
λ(k).
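Algorithm 4.5 (Forward-Backward Proximal iterations with constraint on the residual).
• Initialize: Choose $x^{(0)} \in \mathbb{R}^{d_x}$ (for example 0).
• Iterate:
$$x^{(k+1)} = \mathrm{prox}_{\mu\lambda^{(k)}\phi}\big(x^{(k)} + \mu G^T (m - G x^{(k)})\big)$$
where $0 < \mu < |||G^T G|||^{-1}$.
• Update λ: If $k + 1 \equiv 0 \pmod p$
$$\lambda^{(k+1)} = \lambda^{(k)} \frac{\delta}{\|m - G x^{(k)}\|_2}$$
and $\lambda^{(k+1)} = \lambda^{(k)}$ otherwise.
• Stop if $\|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\|$ is smaller than a fixed tolerance criterion.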
In practice, updating λ every 10 iterations, i.e., p = 10, is a good trade-off between the
computational cost of computing the residual and the speed of the convergence observed.
This trick that consists in dynamically changing the λ was proposed by Chambolle in [30]
when regularizing with the Total Variation. Our experience confirms that it also works with
all the convex priors detailed in this chapter.
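The empirical λ update of algorithm 4.5 can be sketched in plain Python for an ℓ1 prior as follows. The names, the initial value `lam0`, and the fixed number of iterations are our assumptions, and the residual used in the update is the one of the most recent iterate; this is an illustration of the idea rather than the EMBAL code. The step µ is the inverse of the squared Frobenius norm of G, which does not exceed $|||G^T G|||^{-1}$.

```python
import math


def soft_threshold(v, thr):
    """Scalar soft thresholding: proximity operator of thr*|.| evaluated at v."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def residual(G, m, x):
    return [m[a] - sum(G[a][i] * x[i] for i in range(len(x))) for a in range(len(G))]


def forward_backward_constrained(G, m, delta, lam0=0.2, period=10, n_iter=500):
    """Forward-backward iterations where lam is rescaled every `period` iterations
    so that the residual norm ||m - G x||_2 is driven towards delta."""
    n_sensors, n_sources = len(G), len(G[0])
    mu = 1.0 / sum(g * g for row in G for g in row)
    x, lam = [0.0] * n_sources, lam0
    for k in range(n_iter):
        r = residual(G, m, x)
        x = [soft_threshold(
                 x[i] + mu * sum(G[a][i] * r[a] for a in range(n_sensors)), mu * lam)
             for i in range(n_sources)]
        if (k + 1) % period == 0:
            norm = math.sqrt(sum(v * v for v in residual(G, m, x)))
            if norm > 0.0:
                lam *= delta / norm   # residual too large -> decrease lam, and vice versa
    return x, lam


x_c, lam_c = forward_backward_constrained([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.0], delta=1.0)
```

In this toy example the constrained solution is [2, 0], whose residual norm is exactly δ = 1, and the rescaled λ settles at the value 1 for which the penalized problem has the same solution.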
It is possible to run this solver with a constraint on the reconstruction error with EMBAL
using the code snippet provided in table 4.4.
[Figure 4.3: error on the data term as a function of the iteration number.]
Figure 4.3: Convergence of the optimization with constraint on the reconstruction error. The
error corresponds to $\delta - \|m - Gx\|_2$ and converges to 0.
    clear options
    options.maxit = 3000;
    options.delta = 1e-2;
    options.mode = 'landweber';
    options.penalty = 'l1'; % Use an L1 prior, i.e., a Lasso
    X = prox_inverse(M,G,options);
Table 4.4: Running an inverse solver using proximity operators with EMBAL and a
constraint on the reconstruction error.
Mixing sparse-priors and ℓ2 priors
A sparse prior like an ℓ1 norm leads to a convex problem. In order to guarantee the uniqueness
of the solution, one needs a strictly convex problem. An easy way to achieve this is to
add an ℓ2 term to the cost function to minimize:
$$x^*_\lambda = \arg\min_x \frac{1}{2}\|m - Gx\|_2^2 + \lambda\Big((1 - \rho)\phi(x) + \frac{1}{2}\rho\|Lx\|_2^2\Big), \quad \lambda > 0 . \tag{4.11}$$
Adding an ℓ2 term to the Lasso problem was proposed in [239] and is called in the litera-
ture Elastic-Net. Using a gradient for the operator L was introduced under the name of the
Smooth-Lasso in [106]. In practice adding an ℓ2 term to the LASSO problem tends to produce
results that are less sensitive to the noise inherent to any real dataset. When using a gradi-
ent operator for L it also promotes neighboring active dipoles, which produces in the context
of M/EEG inverse modeling spatially consistent active patterns. This idea of mixing priors
can be found in [210].
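Solving (4.11) can be done elegantly by noticing that the functional can be rewritten:
$$x^*_\lambda = \arg\min_x \frac{1}{2}\|m' - G'x\|_2^2 + \lambda(1 - \rho)\phi(x), \quad \lambda > 0 , \tag{4.12}$$
where:
$$m' = \begin{pmatrix} m \\ 0 \end{pmatrix} \quad \text{and} \quad G' = \begin{pmatrix} G \\ \sqrt{\lambda\rho}\, L \end{pmatrix} .$$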
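The stacking trick can be checked numerically. The plain Python sketch below builds the augmented pair (m', G') and compares the two data terms; the function names, the toy matrices and the finite-difference operator L are illustrative assumptions.

```python
import math


def augment(G, m, L, lam, rho):
    """Stack G with sqrt(lam*rho)*L and pad m with zeros, as in (4.12)."""
    scale = math.sqrt(lam * rho)
    G_aug = [row[:] for row in G] + [[scale * v for v in row] for row in L]
    m_aug = list(m) + [0.0] * len(L)
    return G_aug, m_aug


def sq_residual(G, m, x):
    """Squared l2 norm of m - G x."""
    return sum((m[a] - sum(G[a][i] * x[i] for i in range(len(x)))) ** 2
               for a in range(len(G)))


G = [[1.0, 2.0], [0.0, 1.0]]
m = [1.0, 2.0]
L = [[1.0, -1.0]]            # a one-row finite-difference operator
lam, rho = 0.5, 0.2
x = [0.3, -0.7]

G_aug, m_aug = augment(G, m, L, lam, rho)
lhs = sq_residual(G_aug, m_aug, x)
rhs = sq_residual(G, m, x) + lam * rho * sq_residual(L, [0.0], x)   # ||L x||^2 term
```

Both quantities agree up to rounding, which confirms that the quadratic penalty can be absorbed in the data term so that any solver for (4.9) can be reused unchanged with λ(1 − ρ) as regularization parameter.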
The algorithms 4.3 and 4.4 can now be reformulated for the new problem (4.11).
    clear options
    options.maxit = 3000;
    options.mode = 'nesterov';
    options.rho = 0.1;
    options.L = mesh_gradientP1(points,faces);
    options.lambda = 1e-7;
    options.penalty = 'l1'; % Use an L1 prior, i.e., a Lasso
    X = prox_inverse(M,G,options);
Table 4.5: Running an inverse solver with two priors (one non differentiable and an ℓ2 term)
using proximity operators with EMBAL .
Algorithm 4.6 (Forward-Backward Proximal iterations with an additive ℓ2 prior).
• Initialize: Choose $x^{(0)} \in \mathbb{R}^{d_x}$ (for example 0).
• Iterate:
$$x^{(k+1)} = \mathrm{prox}_{\mu\lambda(1-\rho)\phi}\Big(x^{(k)} + \mu\big(G^T (m - G x^{(k)}) - \lambda\rho L^T L x^{(k)}\big)\Big)$$
where $0 < \mu < 2/(|||G^T G||| + \lambda\rho|||L^T L|||)$.
• Stop if $\|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\|$ is smaller than a fixed tolerance criterion.
Algorithm 4.7 (Nesterov scheme with an additive ℓ2 prior).
• Initialize: Choose $x^{(0)} \in \mathbb{R}^{d_x}$ (for example 0).
• Set auxiliary variables: $a = 0$, $g = 0$, $\mu = (|||G^T G||| + \lambda\rho|||L^T L|||)^{-1}$.
• Iterate:
– $t = 2\mu$
– $b = \dfrac{t + \sqrt{t^2 + 4ta}}{2}$
– $v = \mathrm{prox}_{a\lambda(1-\rho)\phi}(x^{(0)} - g)$
– $u = \dfrac{a x^{(k)} + b v}{a + b}$
– $x^{(k+1)} = \mathrm{prox}_{\mu\lambda(1-\rho)\phi}\Big(u + \mu\big(G^T (m - G x^{(k)}) - \lambda\rho L^T L x^{(k)}\big)\Big)$
– $g = g - b\big(G^T (m - G x^{(k+1)}) - \lambda\rho L^T L x^{(k+1)}\big)$
– $a = a + b$
• Stop if $\|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\|$ is smaller than a fixed tolerance criterion.
This algorithm can be run with EMBAL using the code in table 4.5.
4.3 SPARSITY AND SPATIALLY EXTENDED ACTIVATIONS: THE TOTAL VARIATION
The ℓ1 and ℓ0 priors are not well suited to spatially extended activations. In order to understand
this, let us consider the case where the SNR is particularly bad. The lower the SNR, the
larger the regularization parameter λ. By increasing λ, the sparsity of x∗ is increased.
This implies that the number of active dipoles gets smaller, i.e., the extent of the active region
becomes more and more limited. Active regions become focal. Therefore, by increasing
the additive noise at the sensor level, the ℓ1 and ℓ0 priors create a bias towards very focal source
distributions. To tackle this limitation, the Total Variation (TV) prior can be used.
The TV prior penalizes the solution with the ℓ1 norm of the gradient, in the present case
the surface gradient: $\mathrm{TV}(x) = \|\nabla_{\mathrm{surf}} x\|_1$. It is a seminorm, since it can be equal to 0 even if x
is non-zero. Assuming x corresponds to a discretization with P1 elements on the tessellation,
$\nabla_{\mathrm{surf}} x$ is a constant vector of $\mathbb{R}^3$ on each triangle. Let us denote it $(\nabla_p^x x, \nabla_p^y x, \nabla_p^z x) \in \mathbb{R}^3$,
where p indexes triangles. More details on how to compute the surface gradient with a P1
discretization can be found in [4]. With these notations, TV(x) can be written:
$$\mathrm{TV}(x) = \sum_p \sqrt{(\nabla_p^x x)^2 + (\nabla_p^y x)^2 + (\nabla_p^z x)^2}$$
It can be proved that TV(x) is equal to the sum of the lengths of the isolevels of x [143]:
$$\mathrm{TV}(x) = \int_{\mathbb{R}} \mathrm{length}(C_t)\, dt ,$$
where $C_t = \{r \in \text{mesh s.t. } x(r) = t\}$ is the isolevel, or level set, at level t. The consequence
is that penalizing the inverse problem with the Total Variation tends to produce piecewise
constant reconstructions with regular borders.
The TV norm has been widely used for image deconvolution and restoration [12, 30, 31].
It is related in functional analysis to the space of functions of bounded variation. For the
M/EEG inverse problem the TV prior was originally proposed in [5]. Here, we argue that the
TV regularization can be cast into the general framework of M/EEG inverse solvers with
sparsity inducing priors and that the iterative optimization schemes detailed above for the
standard ℓ1 norm can be directly adapted to invert with a TV prior.
The optimization problem considered is given by:
$$x^* = \arg\min_x \frac{1}{2}\|m - Gx\|_F^2 + \lambda\, \mathrm{TV}(x), \quad \lambda > 0 . \tag{4.13}$$
Since the TV is based on an ℓ1 norm, it is convex and lower semicontinuous. Hence, assuming
one knows how to compute the proximity operator associated with the TV norm, the Forward-Backward
iterations and Nesterov schemes can be used to optimize (4.13).
The proximity operator $\mathrm{prox}_{\lambda\|\cdot\|_{TV}}$ corresponds to the following problem:
$$x^* = \arg\min_x \frac{1}{2}\|y - x\|_F^2 + \lambda\, \mathrm{TV}(x), \quad \lambda > 0 \tag{4.14}$$
This problem is known in the literature as the ROF (Rudin, Osher and Fatemi) problem
[189]. Various solvers have been proposed in the literature to solve this problem [31, 32].
Here, we detail the dual approach from Chambolle [30]. It consists in using a gradient-based
algorithm for solving the dual problem that interestingly is a smooth optimization problem
over a convex set.
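The duality between the ℓ1 norm and the ℓ∞ norm reads
$$\mathrm{TV}(x) = \|\nabla x\|_1 = \max_{\|z\|_\infty \le 1} \langle \nabla x, z \rangle . \tag{4.15}$$
We also need the adjoint relation between the gradient and the divergence operator:
$$\langle \nabla x, y \rangle = -\langle x, \mathrm{div}\, y \rangle \tag{4.16}$$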
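Minimization in equation (4.14) becomes:
$$\begin{aligned}
\min_x \Big(\frac{1}{2}\|y - x\|_2^2 + \lambda\, \mathrm{TV}(x)\Big)
&= \lambda \min_x \Big(\frac{1}{2\lambda}\|y - x\|_2^2 + \max_{\|z\|_\infty \le 1} \langle \nabla x, z \rangle\Big) \\
&= \lambda \max_{\|z\|_\infty \le 1} \Big(\min_x \Big(\frac{1}{2\lambda}\|y - x\|_2^2 + \langle \nabla x, z \rangle\Big)\Big) \\
&= \lambda \max_{\|z\|_\infty \le 1} \Big(\min_x \Big(\frac{1}{2\lambda}\|y - x\|_2^2 - \langle x, \mathrm{div}\, z \rangle\Big)\Big)
\end{aligned} \tag{4.17}$$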
The computation of the minimum and the maximum above can be exchanged because the
optimization over x is convex and the optimization over z is concave (see for example [188]).
By setting the derivative with respect to x to zero one gets:
x∗ = y + λ div z
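Replacing x in previous expression leads to:
$$\begin{aligned}
\min_x \frac{1}{2}\|y - x\|_2^2 + \lambda\, \mathrm{TV}(x)
&= \lambda \max_{\|z\|_\infty \le 1} \frac{\lambda}{2}\|\mathrm{div}\, z\|_2^2 - \langle y, \mathrm{div}\, z \rangle - \lambda\|\mathrm{div}\, z\|_2^2 \\
&= \lambda \max_{\|z\|_\infty \le 1} -\frac{\lambda}{2}\|\mathrm{div}\, z\|_2^2 - \langle y, \mathrm{div}\, z \rangle \\
&= -\lambda \min_{\|z\|_\infty \le 1} \frac{\lambda}{2}\|\mathrm{div}\, z\|_2^2 + \langle y, \mathrm{div}\, z \rangle \\
&= -\frac{1}{2} \min_{\|z\|_\infty \le 1} \lambda^2\|\mathrm{div}\, z\|_2^2 + 2\lambda\langle y, \mathrm{div}\, z \rangle \\
&= -\frac{1}{2} \min_{\|z\|_\infty \le 1} \|\lambda\, \mathrm{div}\, z + y\|_2^2 - \|y\|_2^2 \\
&= -\frac{\lambda^2}{2} \min_{\|z\|_\infty \le 1} \Big\|\mathrm{div}\, z + \frac{y}{\lambda}\Big\|_2^2 - \frac{1}{\lambda^2}\|y\|_2^2
\end{aligned} \tag{4.18}$$
Hence z∗ is obtained by:
$$z^* = \arg\min_{\|z\|_\infty \le 1} \Big\|\mathrm{div}\, z + \frac{y}{\lambda}\Big\|_2^2$$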
This provides the result from Chambolle [30].
This constrained problem can be solved by a projected gradient algorithm. The gradient
with respect to z is given by $-\nabla(\mathrm{div}\, z + y/\lambda)$, which gives the following iterative algorithm to
solve the ROF problem:
$$\begin{aligned}
x^n &= y + \lambda\, \mathrm{div}\, z^n \\
z_i^{n+1} &= \frac{z_i^n + (\tau/\lambda)(\nabla x^n)_i}{\max(1, |z_i^n + (\tau/\lambda)(\nabla x^n)_i|)}
\end{aligned} \tag{4.19}$$
where τ is the gradient step. In order to guarantee the convergence of the algorithm one needs to
have:
$$\tau \le \frac{2}{|||\mathrm{div}\, \nabla|||}$$
where $|||\mathrm{div}\, \nabla|||$ stands for the spectral norm of the operator.
Note that Chambolle proposes an alternative strategy based on a fixed point method to
solve this constrained problem, but our experience is that the fixed-point method does not
actually provide faster convergence rates than the simple projected gradient method. As with
any standard gradient descent, the projected gradient just detailed can be improved with a
multistep approach [1].
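The projected gradient iteration (4.19) is easiest to follow on a one-dimensional signal, where the gradient reduces to forward differences. The plain Python sketch below is such a simplified 1D analog, not the surface-based operator used on the cortical mesh: the names `grad`, `div` and `prox_tv_1d` are ours, and the step τ = 0.25 is chosen because the spectral norm of the 1D operator div ∇ is below 4.

```python
def grad(x):
    """Forward differences: (grad x)_i = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def div(z):
    """Discrete divergence, defined as minus the adjoint of grad, cf. (4.16)."""
    out = [0.0] * (len(z) + 1)
    for i, v in enumerate(z):
        out[i] += v
        out[i + 1] -= v
    return out


def prox_tv_1d(y, lam, tau=0.25, n_iter=500):
    """Solve the 1D ROF problem min_x 0.5*||y - x||^2 + lam*TV(x) through its dual."""
    z = [0.0] * (len(y) - 1)
    for _ in range(n_iter):
        x = [yi + lam * d for yi, d in zip(y, div(z))]           # x^n = y + lam div z^n
        z = [zi + (tau / lam) * gi for zi, gi in zip(z, grad(x))]
        z = [zi / max(1.0, abs(zi)) for zi in z]                 # projection on |z_i| <= 1
    return [yi + lam * d for yi, d in zip(y, div(z))]


step_small = prox_tv_1d([0.0, 4.0], lam=1.0)   # the jump is reduced by 2*lam
step_large = prox_tv_1d([0.0, 4.0], lam=3.0)   # the jump is removed entirely
flat = prox_tv_1d([1.0, 1.0, 1.0], lam=1.0)    # a constant signal is left unchanged
```

For a two-sample step of height 4, a small λ shrinks the jump while preserving the mean, giving [1, 3] for λ = 1, and a large enough λ flattens the signal to its mean, giving approximately [2, 2] for λ = 3. This is the piecewise constant behavior described above.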
The ROF problem for which we just described an optimization scheme corresponds to the
proximity operator associated with the Total Variation penalization. The solution of the full
inverse problem (4.13) can
then be obtained using forward-backward proximal iterations (algorithm 4.3) or Nesterov
iterations (algorithm 4.4).
To our knowledge, Nesterov schemes have never been applied to the inverse problem of
M/EEG, in particular with a surface TV prior. In [4], the optimization procedure used suffered
from a very slow convergence rate. In this work, Adde et al. implemented the forward-backward
algorithm 4.3 proposed for TV optimization in [12]. Due to its slow convergence
rate, the authors suggested using a fixed-step gradient scheme that requires making the TV
prior differentiable by replacing the TV by:
$$\mathrm{TV}(x) = \sum_p \sqrt{(\nabla_p^x x)^2 + (\nabla_p^y x)^2 + (\nabla_p^z x)^2 + \epsilon}, \quad \epsilon > 0 .$$
Unfortunately, this scheme does not reach an optimal solution. However, in practice, it provides
visually acceptable results in a small amount of time. The Nesterov scheme that we
propose here is a principled algorithm to solve (4.13) with guarantees of speed and optimality.
Figure 4.4 presents a result obtained with a TV prior with constrained orientations.
[Figure 4.4 panels: (a) Synthetic active region. (b) Reconstruction result obtained with a TV prior.]
Figure 4.4: Simulation result using a TV prior. (a) The synthetic active region used to simulate
MEG measurements. The measurements were corrupted with a small additive Gaussian
white noise (SNR=10). The activation pattern was designed to be piecewise constant,
which is the situation TV minimization is suited to. (b) The reconstruction result obtained by solving
the inverse problem with a TV prior. The solution presents a clear “hot spot” at the correct
location and sets to 0 the majority of the remaining cortical surface.
The TV prior offers a principled way to cope with spatially extended activations. However,
care should be taken when applying it to M/EEG data. Our experience shows that
when applied with too large values of λ, the TV prior tends to move the active regions towards
“flat” cortical regions. The complex shape of the cortical mantle near the active region may
therefore in practice lead to localization errors. Also, to our knowledge, the bias towards
superficial sources has never been tackled with a TV prior. Finally, the case of dipolar
source spaces with unconstrained orientations has to our knowledge never been treated.
We are now done with the presentation of image-based inverse solvers that work on an
instant-by-instant basis. We will now present solvers that make use of the temporal information
in the data.
4.4 SPARSITY AND SPATIOTEMPORAL DATA
The reason for using temporal information in the inverse problem is relatively
natural. Two neighboring time instants carry very similar information since the underlying
physiological phenomena have a low frequency compared to the sampling rate of the
recordings. The noise that corrupts the measurements is also highly correlated in time. And
our knowledge about physiology favors a view of the sources as a limited
number of sources whose activity is stable in time.
4.4.1 VESTAL
The VESTAL method [108] addresses the main criticism made of MCE. When using
MCE during a small time window, the set of active dipoles can vary significantly between
two neighboring instants, although one would expect a very similar current distribution. The
MCE solver leads to “spiky” estimated time courses.
In order to fix this problem, the VESTAL solver proposes to run an ℓ1 inverse problem at
each time instant, like MCE, but projects the sample-wise ℓ1-norm estimates into the signal
subspace defined by a set of temporal basis functions. Let us write the SVD of the
measurements, $M = U S V^T$. The first columns of $V$ are used as temporal basis functions.
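As an illustration only, and assuming that the projection is the orthogonal projection onto the span of the first $K$ columns of $V$, the projection step may be written as below. The symbols $\hat{X}$ (matrix of sample-wise ℓ1 estimates), $V_K$ and $K$ are notation introduced for this sketch and are not taken from [108].

```latex
% V_K: first K columns of V (orthonormal), \hat{X}: sample-wise l1 estimates
\[
  \hat{X}_{\mathrm{proj}} \;=\; \hat{X}\, V_K V_K^T ,
  \qquad V_K^T V_K = I_K .
\]
```

Since $V_K V_K^T$ is an orthogonal projector acting on the time dimension, each row of $\hat{X}_{\mathrm{proj}}$ is a linear combination of the $K$ retained temporal basis functions.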
The criticisms that can be made of the VESTAL solver are the following. First, the
optimization scheme proposed in [108] could be substantially improved, using the LARS algorithm
for example. Second, the authors do not clearly state what cost function is actually optimized
by their procedure. The following algorithm proposes a much better approach with the same
objective of mixing time and space with a sparsity-inducing prior.
4.4.2 ℓ1 over space and ℓ2 over time
An ℓ1 prior brings sparsity, i.e., a limited number of active sources, while an ℓ2 prior brings
what we call diversity, i.e., no zero coefficients. The ℓ2 norm spreads the energy over all the
sources and therefore brings smoothness. By using an ℓ1 prior over space while keeping an
ℓ2 prior over time, the problem of “spiky” activation time series faced by the MCE method is
addressed. We recall that the MCE solver runs on each time instant independently with an
ℓ1 prior.
The approach described above consists in penalizing the inverse problem with a mixed
norm given by:
\[ \|X\|_{21} = \sum_i \sqrt{\sum_t x_{it}^2} , \tag{4.20} \]
where $i$ indexes space and $t$ indexes time. This norm is denoted ℓ21.
It can also be modified to take into account some weighting coefficients:
\[ \|X\|_{w;21} = \sum_i \sqrt{\sum_t w_i x_{it}^2} \tag{4.21} \]
in order to reduce the bias for superficial sources.
The optimization problem becomes:
\[ X^* = \arg\min_X \frac{1}{2}\|M - GX\|_2^2 + \lambda \|X\|_{w;21}, \quad \lambda > 0 . \tag{4.22} \]
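To make the cost function in (4.22) concrete, the following minimal Python sketch evaluates the weighted ℓ21 norm of (4.21) and the full objective on small matrices stored as nested lists. The function names (`l21_norm`, `matmul`, `objective`) are hypothetical and chosen for this illustration; they are not part of EMBAL.

```python
import math

def l21_norm(X, w=None):
    """Weighted l21 norm (4.21): sum_i sqrt(sum_t w_i * x_it^2).

    X is a list of rows (one row per source, one column per time instant).
    With w=None all weights are 1, which gives the unweighted norm (4.20).
    """
    if w is None:
        w = [1.0] * len(X)
    return sum(math.sqrt(sum(wi * v * v for v in row)) for row, wi in zip(X, w))

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def objective(M, G, X, lam, w=None):
    """Cost of (4.22): 0.5 * ||M - G X||^2 + lam * ||X||_{w;21}."""
    GX = matmul(G, X)
    residual = sum((m - g) ** 2 for mr, gr in zip(M, GX) for m, g in zip(mr, gr))
    return 0.5 * residual + lam * l21_norm(X, w)
```

A row of X that is entirely zero contributes nothing to the penalty, which is why the minimizer tends to switch off whole rows, i.e., whole sources.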
This norm introduces an ℓ1 norm at a group level. Within a group the coefficients are
compared using an ℓ2 norm. In statistics and machine learning, it is known as the
Group-LASSO problem [235]. We also refer the reader to [141] for more details on mixed norms in
the signal processing context.
In the M/EEG community this norm was recently proposed in this exact form by Ou et
al. in [166]. Prior to this work, [74] proposed another grouping strategy, this time between
orientations. In [166], the authors detail how to group both time and orientations. If the
coefficients of $X$ are indexed by the position $i$, the orientation $r$, and the time $t$, the corresponding
prior is given by:
\[ \|X\|_{21} = \sum_i \sqrt{\sum_r \sum_t x_{irt}^2} . \]
Integrating the orientations in the prior does not add any difficulty in the optimization.
In order to speed up the computation, it is proposed in [166] to reduce the rank of the
matrix $M$ with an SVD and to work on the SVD components rather than on the raw temporal
data. When considering $K$ SVD components, the size of the matrix to invert is $d_m \times K$ rather
than $d_m \times d_t$. Due to the ℓ2 norm within the groups, this is perfectly justified. If we use all
the temporal components of the SVD, we project the data using a basis of orthogonal vectors
and the temporal ℓ2 norm is not changed.
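The invariance argument above can be checked numerically on a toy case. The sketch below is only an illustration: it uses a hand-built 2×2 rotation as the orthonormal temporal basis instead of an actual SVD, and the names `row_norm` and `project` are hypothetical.

```python
import math

def row_norm(row):
    """Euclidean (l2) norm of one time course."""
    return math.sqrt(sum(v * v for v in row))

def project(row, basis):
    """Coefficients of `row` on each basis vector (the rows of `basis`)."""
    return [sum(a * b for a, b in zip(row, vec)) for vec in basis]

theta = 0.3
# Rows of V form an orthonormal basis of R^2 (a rotation), standing in for
# the temporal basis given by the SVD of the measurements.
V = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

x = [3.0, 4.0]                  # one source time course with d_t = 2
full = project(x, V)            # all components kept: l2 norm is preserved
truncated = project(x, V[:1])   # K = 1 component kept: norm cannot increase
```

With all components the group norms entering the ℓ21 penalty are unchanged, so the penalty itself is unchanged; with K smaller than the number of time instants the reduction is an approximation.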
However, in [166], the authors use an interior point method to solve the optimization
problem. Interior point methods, also referred to as barrier methods, are a class of
algorithms to solve linear and nonlinear convex optimization problems. They guarantee the
optimality of the solution in polynomial time but do not scale very well to large problems, i.e.,
source spaces with a high number of dipoles and a lot of time instants. The complexity of the
algorithm proposed in [166] is in fact cubic in the number of variables.
As for the ℓ1 problem, various methods exist to solve the inverse problem with an ℓ21
norm. The one method that does not apply is LARS. One can use a coordinate
descent algorithm [77], an IRLS solver similar to the one proposed for the LASSO, or an
iterative method based on proximity operators similar to what has been presented for the ℓ1
and TV priors [55, 224].
To apply the forward-backward iterations or the optimal scheme from Nesterov, one needs
to compute the proximity operator associated to a $\|\cdot\|_{21}$ prior.
Let us denote by $X_i$ the $i$th row of $X$. By definition, the proximity operator associated to
the ℓ21 norm is given by:
\[ X^* = \operatorname{prox}_{\lambda\|\cdot\|_{21}}(Y) = \arg\min_X \frac{1}{2}\|Y - X\|_F^2 + \lambda\|X\|_{21} \tag{4.23} \]
The solution is obtained using the following proposition:
Proposition 4.4 (Group-LASSO proximity operator). The solution $X^* = \operatorname{prox}_{\lambda\|\cdot\|_{21}}(Y)$ of the
proximity operator associated to the Group-LASSO is given group by group (here row by row)
by:
\[ X_i^* = Y_i \left( 1 - \frac{\lambda}{\|Y_i\|_2} \right)^+ . \tag{4.24} \]
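Formula (4.24) translates almost directly into code. The following is a minimal sketch in Python for the unweighted case, with matrices stored as nested lists; the function name `prox_l21` is hypothetical and the handling of all-zero rows (returned as zero) is a choice made here to avoid a division by zero.

```python
import math

def prox_l21(Y, lam):
    """Row-wise group soft-thresholding, following (4.24).

    Each row Y_i is scaled by (1 - lam / ||Y_i||_2)^+ : rows whose l2 norm
    is below lam are set to zero, the others are shrunk towards zero.
    """
    out = []
    for row in Y:
        norm = math.sqrt(sum(v * v for v in row))
        # An all-zero row stays zero (avoids dividing by zero).
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([v * scale for v in row])
    return out
```

For example, with `lam = 1` a row `[3, 4]` (norm 5) is scaled by 0.8, while a row `[0.3, 0.4]` (norm 0.5) is set to zero for all time instants at once.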
Using algorithms 4.4 and 4.3 detailed for the ℓ1 prior, one can solve the optimization
problem in (4.22). The only difference comes from the modification of the proximity operator. In
order to run this algorithm with EMBAL, the code snippet in table 4.6 can be used.
clear options
options.penalty = 'l21'; % Use L21 as prior
options.project = 5;     % Set K=5 (SVD of the measurements)
options.lambda = 1e-5;   % Regularization parameter lambda
X = prox_inverse(M,G,options);
Table 4.6: Running sparse inverse modeling with temporal data using proximity operators and
EMBAL.
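For readers without access to EMBAL, the forward-backward iterations mentioned above can be sketched in a few lines of Python. This is a simplified illustration and not the EMBAL implementation: it handles the unweighted ℓ21 penalty only, uses a fixed number of iterations, and expects the caller to supply `lipschitz`, an upper bound on the largest eigenvalue of GᵀG. All function names are hypothetical.

```python
import math

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def prox_l21(Y, thresh):
    """Row-wise group soft-thresholding (4.24)."""
    out = []
    for row in Y:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        out.append([v * scale for v in row])
    return out

def forward_backward_l21(M, G, lam, lipschitz, n_iter=100):
    """Minimize 0.5 * ||M - G X||^2 + lam * ||X||_21 by proximal gradient.

    Each iteration takes a gradient step on the data-fit term (forward step)
    and then applies the l21 proximity operator (backward step).
    """
    Gt = transpose(G)
    X = [[0.0] * len(M[0]) for _ in range(len(G[0]))]
    for _ in range(n_iter):
        GX = matmul(G, X)
        R = [[m - g for m, g in zip(mr, gr)] for mr, gr in zip(M, GX)]
        step = matmul(Gt, R)  # G^T (M - G X), the negative gradient
        Y = [[x + s / lipschitz for x, s in zip(xr, sr)] for xr, sr in zip(X, step)]
        X = prox_l21(Y, lam / lipschitz)
    return X
```

When G is the identity and `lipschitz = 1`, the iteration reaches its fixed point after one step and returns the proximity operator of M, which gives a simple sanity check.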
This strategy, which consists in grouping time instants, achieves the same goal as the
VESTAL solver. It provides sparse solutions over space and smooth solutions over time. However,
the methodology behind the Group-LASSO is much more principled and is expected to
provide better results.
Due to the use of an ℓ1 prior over space, which favors focal sources with low SNR, this
approach is not adapted to the reconstruction of spatially extended activations. Another criticism
that can be made of this approach is that the active set, i.e., the list of active dipoles, is
the same for the full time window of interest. This also implies that the length of the time
window considered has an influence on the active set and therefore the solution. The ℓ1 prior
implies that an optimal solution has fewer active sources than the number of sensors. By
extending the size of the time window to look for example at late responses, the active set
estimated just after stimulation might be changed. Indeed, the ℓ2 prior over time, and the
grouping of all time instants, implies that if a dipole has an activation at one time instant, it
has one during the full time period. There is no way with this method to see a dipole become
active in the middle of the time window of interest. It is however possible to handle such a
case, by using groups with overlaps. In the current setting, a variable xit belongs to only one
group, the ith. Setting the ith group to zero sets the activation of dipole i to zero for the full
time period. Introducing overlap between groups means allowing a variable xit to belong to
multiple groups. This would circumvent the limitations of the ℓ21 prior. Unfortunately, the
optimization with overlapping groups is not trivial to handle. We refer the reader to [116]
and [112] for more details on this topic.
4.5 SPARSE PRIORS WITH MULTIPLE EXPERIMENTAL CONDITIONS: ℓ212
In the previous paragraph, we have seen how the use of priors with mixed norms, like ℓ21,
makes it possible to integrate structured sparsity. From a Bayesian point of view, using such a mixed
norm introduces some coupling between coefficients, instead of the independence hypothesis
associated with an ℓp norm. By grouping time instants, Ou et al. [166] introduced a coupling
between all time instants.
We now propose to integrate in a single framework a prior that brings structured sparsity
across space and time, but also across experimental conditions. The development of this method was
motivated by its application to the retinotopic mapping with MEG presented in chapter 5.
During an experiment, a subject is generally asked to perform different cognitive tasks or
to respond to various external stimuli. These are referred to as different experimental conditions.
With a standard ℓ2 prior, it may occur that the estimated active cortical regions in condition
1 overlap the active regions of condition 2, which may often be unrealistic considering what
is known about neuroanatomy. In order to take into account this anatomical knowledge, and
obtain more accurate mappings of some brain functional organization, we propose to use a
prior that penalizes overlap between active regions.
4.5.1 Method
In order to introduce such inter-condition sparsity constraints, currents corresponding to all
conditions have to be estimated simultaneously. Let $d_k$ denote the number of conditions. This
is achieved by concatenating all measurements, $M \in \mathbb{R}^{d_m \times d_k d_t}$. Let $X \in \mathbb{R}^{d_x \times d_k d_t}$ have its
elements now indexed by $(i, k, t)$, where $i$ indexes space, $k$ the condition and $t$ the time.
In chapter 5, the following norm is used with Fourier coefficients. Therefore, the following
definition and the following proposition, which gives the associated proximity operator, are
written with complex valued coefficients.
Definition 4.3 (Three level mixed norm). Let $x \in \mathbb{C}^{d_x d_k d_t}$ be indexed by a triple index $(i, k, t)$
such that $x = (x_{i,k,t})$. Let $p, q, r \ge 1$ and $w \in \mathbb{R}_{+,*}^{d_x d_k d_t}$ be a sequence of strictly positive weights
labelled by a triple index $(i, k, t)$. We call mixed norm of $x$ the norm $\ell_{w;p,q,r}$ defined by
\[ \|x\|_{w;pqr} = \left( \sum_{i=1}^{d_x} \left( \sum_{k=1}^{d_k} \left( \sum_{t=1}^{d_t} w_{i,k,t} |x_{i,k,t}|^p \right)^{q/p} \right)^{r/q} \right)^{1/r} . \]
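Definition 4.3 can be evaluated with three nested sums. The sketch below is a direct Python transcription, given for illustration only; the function name `mixed_norm` and the nested-list layout `x[i][k][t]` are assumptions of this example.

```python
def mixed_norm(x, w, p, q, r):
    """Three level mixed norm of Definition 4.3.

    x[i][k][t] holds the (possibly complex) coefficients and w[i][k][t] the
    strictly positive weights; i indexes space, k the condition, t the time.
    """
    total = 0.0
    for xi, wi in zip(x, w):
        over_k = 0.0
        for xik, wik in zip(xi, wi):
            # Innermost level: weighted l_p "energy" over time.
            s = sum(wt * abs(v) ** p for v, wt in zip(xik, wik))
            over_k += s ** (q / p)
        total += over_k ** (r / q)
    return total ** (1.0 / r)
```

With `p = q = r = 2` and unit weights this reduces to the Frobenius norm. With `(p, q, r) = (2, 1, 2)` the ℓ1 level acts across conditions: for one source with amplitudes 3 and 4 in two conditions (one time sample), the ℓ212 norm is 7 whereas the Frobenius norm is 5, so sharing a source between conditions is penalized.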
The problem that is addressed here is:
\[ X^* = \arg\min_X \|M - GX\|_F^2 + \lambda \|X\|^2_{w;212} , \quad \lambda \in \mathbb{R}_+ . \tag{4.25} \]
An ℓ1 prior is set over the index $k$ corresponding to the condition, while an ℓ2 prior is used
over space and time. By doing so, each dipole has an incentive to explain a small number
of conditions. The conditions are not supposed to change during the time window. Note that
$\|X\|_{w;222} = \|X\|_{w;F}$ and that if $d_k = 1$, i.e., only one condition, $\|X\|_{w;212} = \|X\|_{w;F}$. This means
that in the case where only one condition is considered, the ℓw;212 solution corresponds to the
widely used MN and WMN (see section 3.2).
Here also, solving (4.25) is based on the computation of the proximity operator associated
to the ℓw;212 norm.
The proximity operator associated with the mixed norm $\|\cdot\|^2_{w;212}$ is given analytically by
the following proposition. We denote $y_{i,k,\bullet} = (y_{i,k,1}, y_{i,k,2}, \ldots, y_{i,k,T})$.
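Proposition 4.5. Let $y \in \mathbb{C}^{d_x d_k d_t}$ be indexed by a triple index $(i, k, t)$. Let $w$ be a sequence of
strictly positive weights such that $\forall t,\ w_{i,k,t} = w_{i,k}$. Let $r_{i,k}$ be defined as $r_{i,k} \overset{\text{def}}{=} \|y_{i,k,\bullet}\|_{w,2} / w_{i,k}$,
where $\|y_{i,k,\bullet}\|_{w,2} \overset{\text{def}}{=} \sqrt{w_{i,k} \sum_t |y_{i,k,t}|^2}$. For each $i$, let the indexing denoted by $k'_i$ be defined
such that $\forall k'_i,\ r_{i,k'_i+1} \le r_{i,k'_i}$. Let the index $K_i$ and the quantity $K_{w_i} \overset{\text{def}}{=} \sum_{k'_i=1}^{K_i} w_{i,k'_i}$ be defined
such that
\[ \lambda \sum_{k'_i=1}^{K_i} w_{i,k'_i} \left( r_{i,k'_i} - r_{i,K_i} \right) < r_{i,K_i}
\quad \text{and} \quad
r_{i,K_i+1} \le \lambda \sum_{k'_i=1}^{K_i+1} w_{i,k'_i} \left( r_{i,k'_i} - r_{i,K_i+1} \right) . \]
Then the solution $z = \operatorname{prox}_{\frac{\lambda}{2}\|\cdot\|^2_{w;212}}(y)$ is given for each coordinate $(i, k, t)$ by
\[ z_{i,k,t} = y_{i,k,t} \left( 1 - \frac{\lambda\sqrt{w_{i,k}}}{1 + \lambda K_{w_i}} \, \frac{\sum_{k'_i=1}^{K_i} \|y_{i,k'_i,\bullet}\|_{w,2}}{\|y_{i,k,\bullet}\|_2} \right)^+ . \]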
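Proof. To simplify the notations in the proof, we drop the factor $\frac{1}{2}$ in the proximity
operator. We address here the equivalent problem:
\[ x^* = \arg\min_x \|y - x\|_2^2 + \lambda \|x\|^2_{w;212} , \tag{4.26} \]
with
\[ \|x\|^2_{w;212} = \sum_i \left( \sum_j \left( \sum_k w_{i,j} |x_{i,j,k}|^2 \right)^{1/2} \right)^2 . \]
To simplify the notations we write $\|y_{i,k,\bullet}\|_2$ and $\|y_{i,k,\bullet}\|_{w,2}$ respectively as $\|y_{i,k}\|_2$ and $\|y_{i,k}\|_{w,2}$.
Let us differentiate the functional in (4.26) with respect to $x_{i,j,k}$. It leads to the following system
of variational equations:
\[ |x_{i,j,k}| = |y_{i,j,k}| - \lambda\sqrt{w_{i,j}}\, |x_{i,j,k}|\, \|x_{i,j}\|_2^{-1}\, \|x_i\|_{w;21} \]
\[ \arg(x_{i,j,k}) = \arg(y_{i,j,k}) \]
which gives:
\[ |x_{i,j,k}| \left( 1 + \lambda\sqrt{w_{i,j}}\, \|x_{i,j}\|_2^{-1}\, \|x_i\|_{w;21} \right) = |y_{i,j,k}| \tag{4.27} \]
\[ \Rightarrow\ |x_{i,j,k}|^2 \left( 1 + \lambda\sqrt{w_{i,j}}\, \|x_{i,j}\|_2^{-1}\, \|x_i\|_{w;21} \right)^2 = |y_{i,j,k}|^2 \]
By summing over $k$, we get:
\[ \|x_{i,j}\|_2 \left( 1 + \lambda\sqrt{w_{i,j}}\, \|x_{i,j}\|_2^{-1}\, \|x_i\|_{w;21} \right) = \|y_{i,j}\|_2
\ \Rightarrow\ \|x_{i,j}\|_2 + \lambda\sqrt{w_{i,j}}\, \|x_i\|_{w;21} = \|y_{i,j}\|_2 . \tag{4.28} \]
We have that:
\[ \|x_i\|_{w;21} = \sum_{l / \|x_{i,l}\|_2 > 0} \sqrt{w_{i,l}}\, \|x_{i,l}\|_2 , \]
which implies that:
\[ \|x_{i,j}\|_2 = \|y_{i,j}\|_2 - \lambda\sqrt{w_{i,j}} \sum_{k / \|x_{i,k}\|_2 > 0} \sqrt{w_{i,k}}\, \|x_{i,k}\|_2 . \tag{4.29} \]
Using (4.28), we have, if $j$ and $k$ satisfy $\|x_{i,j}\|_2 > 0$ and $\|x_{i,k}\|_2 > 0$, that:
\[ \frac{\|x_{i,k}\|_2}{\sqrt{w_{i,k}}} = \frac{\|x_{i,j}\|_2}{\sqrt{w_{i,j}}} + \frac{\|y_{i,k}\|_2}{\sqrt{w_{i,k}}} - \frac{\|y_{i,j}\|_2}{\sqrt{w_{i,j}}} . \]
By substituting it in (4.29) we get:
\[ \|x_{i,j}\|_2 = \|y_{i,j}\|_2 - \lambda\sqrt{w_{i,j}} \sum_{k / \|x_{i,k}\|_2 > 0} w_{i,k} \left( \frac{\|x_{i,j}\|_2}{\sqrt{w_{i,j}}} + \frac{\|y_{i,k}\|_2}{\sqrt{w_{i,k}}} - \frac{\|y_{i,j}\|_2}{\sqrt{w_{i,j}}} \right) \]
\[ \Leftrightarrow\ \|x_{i,j}\|_2 = \|y_{i,j}\|_2 - \lambda K_{w_i} \|x_{i,j}\|_2 + \lambda K_{w_i} \|y_{i,j}\|_2 - \lambda\sqrt{w_{i,j}} \sum_{k / \|x_{i,k}\|_2 > 0} \sqrt{w_{i,k}}\, \|y_{i,k}\|_2 \]
\[ \Leftrightarrow\ \|x_{i,j}\|_2 = \|y_{i,j}\|_2 - \frac{\lambda\sqrt{w_{i,j}}}{1 + \lambda K_{w_i}} \sum_{k / \|x_{i,k}\|_2 > 0} \|y_{i,k}\|_{w,2} , \]
where $K_{w_i} = \sum_{k / \|x_{i,k}\|_2 > 0} w_{i,k}$. This provides the solution for $\|x_{i,j}\|_2 > 0$. When
\[ \|y_{i,j}\|_2 - \frac{\lambda\sqrt{w_{i,j}}}{1 + \lambda K_{w_i}} \sum_{k / \|x_{i,k}\|_2 > 0} \|y_{i,k}\|_{w,2} \le 0 \]
it implies that $\|x_{i,j}\|_2 = 0$.
We therefore have for all $i$:
\[ \|x_{i,j}\|_2 = \left( \|y_{i,j}\|_2 - \frac{\lambda\sqrt{w_{i,j}}}{1 + \lambda K_{w_i}} \sum_{k / \|x_{i,k}\|_2 > 0} \|y_{i,k}\|_{w,2} \right)^+ . \]
Let us introduce the indexing $k'_i$ such that $\|x_{i,k'_i+1}\|_2 \le \|x_{i,k'_i}\|_2$ and the index $K_i$ such that
$\|x_{i,K_i}\|_2 > 0$ and $\|x_{i,K_i+1}\|_2 = 0$. After reordering, we have that:
\[ \|y_{i,K_i}\|_2 - \frac{\lambda\sqrt{w_{i,K_i}}}{1 + \lambda K_{w_i}} \sum_{k'_i=1}^{K_i} \|y_{i,k'_i}\|_{w,2} > 0 , \]
which leads to:
\[ \lambda \sum_{k'_i=1}^{K_i} w_{i,k'_i} \left( r_{i,k'_i} - r_{i,K_i} \right) < r_{i,K_i} . \]
This provides the reordering in the proposition and the equation:
\[ \|x_{i,j}\|_2 = \left( \|y_{i,j}\|_2 - \frac{\lambda\sqrt{w_{i,j}}}{1 + \lambda K_{w_i}} \sum_{k'_i=1}^{K_i} \|y_{i,k'_i}\|_{w,2} \right)^+ . \tag{4.30} \]
Let us rewrite (4.27):
\[ |x_{i,j,k}| = \frac{|y_{i,j,k}|}{1 + \lambda\sqrt{w_{i,j}}\, \|x_{i,j}\|_2^{-1}\, \|x_i\|_{w;21}} = \frac{|y_{i,j,k}|\, \|x_{i,j}\|_2}{\|x_{i,j}\|_2 + \lambda\sqrt{w_{i,j}}\, \|x_i\|_{w;21}} \]
Using (4.28), we get:
\[ |x_{i,j,k}| = \frac{|y_{i,j,k}|\, \|x_{i,j}\|_2}{\|y_{i,j}\|_2} . \]
By substituting the result in (4.30) into this equation we get:
\[ |x^*_{i,j,k}| = \frac{|y_{i,j,k}| \left( \|y_{i,j}\|_2 - \frac{\lambda\sqrt{w_{i,j}}}{1 + \lambda K_{w_i}} \sum_{k'_i=1}^{K_i} \|y_{i,k'_i}\|_{w,2} \right)^+}{\|y_{i,j}\|_2}
= |y_{i,j,k}| \left( 1 - \frac{\lambda\sqrt{w_{i,j}}}{1 + \lambda K_{w_i}} \, \frac{\sum_{k'_i=1}^{K_i} \|y_{i,k'_i}\|_{w,2}}{\|y_{i,j}\|_2} \right)^+ . \]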
Remarks.
1. If $T = 1$, then $\operatorname{prox}_{\lambda\|\cdot\|^2_{w;212}}(y) = \operatorname{prox}_{\lambda\|\cdot\|^2_{\sqrt{w};12}}(y)$, which corresponds to the Elitist-Lasso
problem [129].
2. This proposition also provides the proximity operator for the ℓw;21 norm introduced in
section 4.4.2.
3. The proximity operator is known analytically. It is simply a shrinkage operator after a
sorting operation. It implies that the solution is exact and relatively fast to compute.
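The "sorting then shrinkage" structure of Proposition 4.5 can be sketched as follows for a single source location. This is an illustrative Python transcription under stated assumptions, not the EMBAL code: the function name `prox_l212_squared` is hypothetical, the input `y[k][t]` holds the coefficients of one location for each condition k, and `w[k]` is the weight of that location for condition k. The active set is found by scanning the sorted ratios r and keeping conditions while the strict inequality of the proposition holds.

```python
import math

def prox_l212_squared(y, w, lam):
    """Proximity operator of (lam/2) * ||.||^2_{w;212} at one location.

    y[k][t]: (possibly complex) coefficients for condition k and time t.
    w[k]:    strictly positive weight for condition k.
    """
    norms = [math.sqrt(sum(abs(v) ** 2 for v in row)) for row in y]
    r = [n / math.sqrt(wk) for n, wk in zip(norms, w)]  # r = ||y||_{w,2} / w
    order = sorted(range(len(y)), key=lambda k: r[k], reverse=True)

    W = S = 0.0          # running sums of w and of w * r over sorted conditions
    W_act = S_act = 0.0  # same sums restricted to the active conditions
    for k in order:
        W += w[k]
        S += w[k] * r[k]
        # Equivalent to: lam * sum_{k' <= K} w_k' (r_k' - r_K) < r_K
        if r[k] * (1.0 + lam * W) > lam * S:
            W_act, S_act = W, S
        else:
            break

    # S_act equals the sum of ||y_k||_{w,2} over the active conditions.
    tau = lam * S_act / (1.0 + lam * W_act)
    out = []
    for row, n, wk in zip(y, norms, w):
        scale = max(0.0, 1.0 - math.sqrt(wk) * tau / n) if n > 0 else 0.0
        out.append([v * scale for v in row])
    return out
```

With a single condition the operator reduces to a uniform scaling by 1/(1 + λw), i.e., a minimum-norm type shrinkage. With two conditions of amplitudes 3 and 1 (unit weights, λ = 1), the weaker condition is set to zero and the stronger one is shrunk to 1.5.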
Columns $(G_{\cdot i})_i$ of M/EEG forward operators are not normalized. The closer dipole $i$ is
to the head surface, the larger $\|G_{\cdot i}\|_2$ is. This implies that a naive inverse procedure would
favor dipoles close to the head surface. Using a weighted norm is one way to cope with
this problem. With the mixed norm $\|\cdot\|_{w;212}$, it is done by setting $w_{i,k} = w_i = \|G_{\cdot i}\|_2$.
ℓw;212 with unconstrained orientations
So far, the ℓw;212 prior was presented assuming a source space with constrained
orientations. It is however easy to integrate the unconstrained case into the framework. Let us
denote by $d_r$ the number of oriented dipoles set at each brain location. The number of brain
locations is denoted $d_x$. Let $X \in \mathbb{R}^{d_x d_r \times d_k d_t}$ have its elements now indexed by $(i, r, k, t)$, where $i$ indexes
the position, $r$ the orientation, $k$ the condition and $t$ the time. The prior is, in this case, given
by:
\[ \|X\|_{w;212} = \left( \sum_{i=1}^{d_x} \left( \sum_{k=1}^{d_k} \sqrt{ \sum_{t=1}^{d_t} \sum_{r=1}^{d_r} w_i |x_{i,r,k,t}|^2 } \right)^2 \right)^{1/2} . \]
This norm is still of the form of ℓw;212 and leads therefore to the same optimization strategy.
Like in the constrained case, bias for superficial sources can be corrected by setting:
\[ w_i = \sqrt{ \sum_r \|G_{\cdot ir}\|_2^2 } , \]
where $G_{\cdot ir}$ stands for the forward field of the dipole with position $i$ and orientation $r$.
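Both weighting rules amount to computing column norms of the gain matrix. The sketch below illustrates this in Python; the function names are hypothetical, and it assumes that the `n_orient` columns belonging to the same brain location are stored contiguously in G, which is a storage convention chosen for this example.

```python
import math

def column_norms(G):
    """l2 norm of each column of the gain matrix G (nested lists, rows = sensors)."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*G)]

def depth_weights(G, n_orient=1):
    """Weights w_i used to reduce the bias towards superficial sources.

    n_orient=1: w_i = ||G_{.i}||_2 (constrained orientations).
    n_orient>1: w_i = sqrt(sum_r ||G_{.ir}||_2^2), assuming the n_orient
    columns of a given location are adjacent in G.
    """
    norms = column_norms(G)
    return [math.sqrt(sum(n * n for n in norms[i:i + n_orient]))
            for i in range(0, len(norms), n_orient)]
```

For a gain matrix with 4 columns, `depth_weights(G)` returns 4 weights, while `depth_weights(G, n_orient=2)` returns 2 weights, one per location.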
4.5.2 Simulations
By setting an ℓ1 prior between conditions, the proposed mixed norm penalizes overlap
between active cortical regions. In order to illustrate this, we generated two synthetic datasets.
The first reproduces part of the organization of the primary somatosensory cortex (S1) [174].
Three non-overlapping cortical regions with a similar area (cf. Fig. 4.6a), that could
correspond to the localization of 3 right hand fingers, were computed and used to generate
synthetic measurements corrupted with additive Gaussian random noise. The amplitude
of activation for the most temporal region (colored in red in Fig. 4.6), that could correspond
to the thumb, was set twice as large as the amplitudes of the two other regions. This
situation, where the source amplitudes differ between conditions, is relatively common with real
M/EEG data. The inverse problem was then solved with a standard ‖.‖w;F norm and the
‖.‖w;212 mixed norm. Within the 3 neighboring active regions, each dipole was assigned the
label of the condition giving its maximum amplitude among the three conditions.
Performance was quantified for multiple values of signal-to-noise
ratio (SNR) by counting the percentage of dipoles that were incorrectly labeled. The
SNR is defined here as 20 times the base-10 logarithm of the ratio between the norm of the signal and the
norm of the added noise. Results are presented in Fig. 4.5. Results with an ℓ1 prior
have also been added. It can be observed that the ‖.‖w;212 norm systematically produces the best
result. The ℓ1 prior is very rapidly affected by the decrease of SNR, which is known in the M/EEG
community. In order to have a fair comparison between all methods, λ was set in each case to
have ‖M−GX∗‖F equal to the norm of the added noise, known in the simulations.
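The evaluation procedure described above can be sketched as follows. This is a minimal illustration, not the code used to produce Fig. 4.5: the function names are hypothetical, amplitudes are compared in absolute value (an assumption of this sketch), and the amplitude summary per dipole and condition is supposed to be computed beforehand.

```python
import math

def frobenius(A):
    """Frobenius norm of a nested-list matrix."""
    return math.sqrt(sum(v * v for row in A for v in row))

def snr_db(signal, noise):
    """SNR in dB: 20 * log10(||signal|| / ||noise||)."""
    return 20.0 * math.log10(frobenius(signal) / frobenius(noise))

def label_dipoles(amplitudes):
    """amplitudes[i][k]: amplitude of dipole i in condition k.

    Each dipole receives the index of the condition with the largest
    absolute amplitude.
    """
    return [max(range(len(row)), key=lambda k: abs(row[k])) for row in amplitudes]

def labeling_error(estimated, truth):
    """Percentage of dipoles whose estimated label differs from the true one."""
    wrong = sum(e != t for e, t in zip(estimated, truth))
    return 100.0 * wrong / len(truth)
```

For instance, a signal whose norm is 10 times the norm of the noise has an SNR of 20 dB, the highest value shown in Fig. 4.5.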
Results are illustrated in Fig. 4.6b and 4.6c on a region of interest (ROI) around the left
primary somatosensory cortex. It can be observed that the extent of the most lateral region,
obtained with ‖.‖w;F , is overestimated while the result obtained with the ‖.‖w;212 mixed norm
is relatively accurate. Similar simulations have been performed in the primary visual cortex
(V1), reproducing the well known retinotopic organization of V1. Results are presented in
Fig. 4.7. Simulations lead to the same conclusion about the superiority of the ‖.‖w;212 mixed
norm for the mapping of such brain functional organizations.
[Plot: error (%) as a function of SNR (dB) for the ‖.‖w;F, ‖.‖w;111 and ‖.‖w;212 priors]
Figure 4.5: Evaluation of ‖.‖w;F vs. ‖.‖w;212 vs. ‖.‖w;111 estimates on synthetic somatosensory
data. The error represents the percentage of wrongly labeled dipoles.
(a) Simulation data
(b) ‖.‖w;F result (ROI) (c) ‖.‖w;212 result (ROI)
Figure 4.6: Illustration of results on the primary somatosensory cortex (S1) (SNR = 20 dB).
Neighboring active regions reproduce the organization of S1.
(a) Simulation data
(b) ‖.‖w;F result (ROI) (c) ‖.‖w;212 result (ROI)
Figure 4.7: Illustration of results on the primary visual cortex (V1) with SNR = 20 dB.
Neighboring active regions reproduce the retinotopic organization of V1. When comparing the two
results in (b) and (c) with the simulation data in (a), it can be observed that the ‖.‖w;212 prior
in (c) provides the most accurate result.
4.5.3 MEG study
Results of the proposed algorithm using MEG data from a somatosensory experiment are
now presented. The data acquisition was done using a CTF Systems Inc. Omega 151 system
with a 1250 Hz sampling rate. The somatosensory stimulation was an electrical square-wave
pulse delivered randomly to the thumb, index, middle and little finger of each hand of a
healthy right-handed subject. Evoked data were computed by averaging 400 repetitions of
the stimulation of each finger. To produce precise localization results, the triangulation over
which cortical activations have been estimated was sampled with a very high number of
vertices (about 55 000). The forward modeling was performed with a spherical head model¹
using dipoles with fixed orientations given by the normals to the cortex [50].
Prior to the current estimation, data were whitened using the noise covariance matrix $\Sigma$,
estimated on the period before stimulation. Let $\Sigma = L L^T$ be the Cholesky factorization of $\Sigma$.
Whitening consists in replacing $G$ by $L^{-1}G$ and $M$ by $L^{-1}M$. With an additive Gaussian
noise model this implies that the noise, given by $M - GX \in \mathbb{R}^{d_m \times d_k d_t}$, is assumed to have
a standard normal distribution. This implies that a good estimate of $\|M - GX^*\|_F$ is given
by $\sqrt{d_m d_k d_t}$. Therefore, the regularization parameter λ was set in order for $X^*$ to also be the
solution of the constrained problem: $X^* = \arg\min_X \|X\|$ subject to $\|M - GX\|_F \le \sqrt{d_m d_k d_t}$.
The optimization is done with the algorithm 4.5.
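The whitening step can be illustrated with a small pure-Python sketch. It assumes the lower-triangular convention Σ = L Lᵀ, under which L⁻¹ whitens the noise, and it applies L⁻¹ by forward substitution rather than by forming the inverse. The function names are hypothetical and this is not the code used in the study.

```python
import math

def cholesky_lower(S):
    """Lower-triangular L such that S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def whiten(L, A):
    """Return L^{-1} A using forward substitution (L lower triangular)."""
    n, m = len(A), len(A[0])
    W = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for c in range(m):
            s = sum(L[i][k] * W[k][c] for k in range(i))
            W[i][c] = (A[i][c] - s) / L[i][i]
    return W

def target_residual(d_m, d_k, d_t):
    """Expected Frobenius norm of unit-variance white noise of size d_m x (d_k d_t)."""
    return math.sqrt(d_m * d_k * d_t)
```

In use, one would compute `L = cholesky_lower(Sigma)`, then replace G and M by `whiten(L, G)` and `whiten(L, M)` before running the solver, and tune λ until the residual norm reaches `target_residual(d_m, d_k, d_t)`.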
Results obtained with the right hand fingers during the period between 42 and 46 ms are
presented in figure 4.8. Knowing that for this somatosensory dataset active parcels should
have negative activations around 45 ms, regions with positive activations were first removed.
Within the remaining regions, a label was assigned to each dipole based on its maximum
amplitude across conditions. For each condition, or equivalently each label, the biggest connected
component was kept. The 4 estimated components, corresponding to the 4 right hand
fingers, are presented in Fig. 4.8. Solutions using both norms ‖.‖w;F and ‖.‖w;212 are detailed.
With ‖.‖w;212 the well known organization of the primary somatosensory cortex [174] is
successfully recovered, while with ‖.‖w;F, the component corresponding to the index finger is
overestimated, leading to an incorrect localization of the area corresponding to the thumb.
¹ http://neuroimage.usc.edu/brainstorm/
(a) Fingers color coded (a) ‖.‖w,212 result
(c) ‖.‖w,F result (ROI) (b) ‖.‖w,212 result (ROI)
Figure 4.8: Labeling results of the left primary somatosensory cortex in MEG.
4.6 CONCLUSION
In this chapter we presented various approaches and the algorithmic details necessary
to find the solution of the M/EEG inverse problem with a sparsity-inducing prior.
State-of-the-art optimization techniques were presented and discussed in order to provide methods
tractable on big datasets. We have presented simple algorithms, which is particularly
important to facilitate their adoption by the M/EEG community. Finally, we detailed fast
algorithms, which are of major interest since real studies require computing statistics in order
to validate neuroscientific hypotheses. Indeed, when using robust non-parametric statistics,
the inverse solver needs to be run thousands of times. This makes the speed of convergence
of the solver a very critical issue.
Our last contribution describes an inter-condition prior that improves the localization of
cortical activations by offering the possibility to use a prior between different experimental
conditions. By proposing to solve the inverse problem on multiple conditions simultaneously
and to use a mixed norm that sets an ℓ1 prior between conditions, the method
penalizes current estimates with an overlap between the corresponding active regions. When
such a hypothesis holds anatomically, the more conditions are recorded and used in the
inverse problem, the better the localization of neuronal activity. By keeping an ℓ2 prior over
space and time, the proposed method guarantees a good robustness to noise, like standard
ℓ2-based methods. This approach also improves over the smeared reconstructions observed with
standard ℓ2 inverse solutions. This is confirmed by the simulations and the MEG somatosensory
data, with which the method is successfully illustrated.
As for the previous chapter, all the algorithms detailed in this chapter have been
implemented and tested with synthetic and real MEG data. The source code of the solvers and the
demo scripts with synthetic and real data are available in a Matlab toolbox called EMBAL
(Electro-Magnetic Brain Activity Localization):
https://gforge.inria.fr/projects/embal
We refer the reader to the demo scripts running on synthetic MEG data:
• demo_inverse_l1.m: contains a comparison of LARS, IRLS, Landweber and Nesterov for
the ℓ1 problem.
• demo_inverse_l21.m: contains a comparison of IRLS, Landweber and Nesterov for the
ℓw,21 problem.
• demo_inverse_TV.m: contains a comparison of IRLS, simple gradient descent (with
adaptive step size obtained with line search), Landweber and Nesterov for the TV problem.
Nothing was said in this chapter about the use of IRLS and simple gradient descent for
solving the TV problem, as they only solve a “smoothed” version of the TV problem. Also, for the
IRLS solver, the presence of the gradient makes the solver even more numerically unstable
than in the ℓp case.
We finish the chapter by listing factors that could be investigated in future studies. Among
these is the ability to use a TV prior with dipoles having unconstrained orientations. This
can be related to an image deconvolution problem when considering a colored image with
3 channels (red, green and blue). Furthermore, the algorithms detailed in this chapter can
be directly applied to multichannel deconvolution problems. Also, we plan to investigate
recently proposed iterative algorithms with the same convergence rate as Nesterov’s scheme
[1] but with a smaller computational complexity. Finally, another major topic we would like to
explore is the case where the prior contains overlapping groups of variables. This would make it
possible to deal with temporal data where the active sources might not be flagged as active during the
full time interval. This latter limitation will be addressed in chapter 6 with a completely
different approach based on a graph cuts optimization technique.
CHAPTER 5
FAST RETINOTOPIC MAPPING
WITH MEG
As demonstrated by computer vision research that tries to reproduce the capabilities of
human perception with computers, the human visual system is an amazingly complex
machinery. Understanding how visual information is encoded and processed by our brain is still
a big challenge that functional brain imaging has been trying to tackle over the last decades
thanks to advanced techniques like fMRI [24, 62, 64, 204, 222, 223], PET [75], surface
electrodes [234], optical tomography [236], near-infrared spectroscopy [95], EEG [110, 213], and
MEG [69, 82, 151, 197].
The motivation for the work presented in this chapter is twofold. First we wanted to inves-
tigate how well MEG could reproduce the retinotopic maps obtained by standard protocols in
fMRI. Second, we wanted to exploit the excellent temporal resolution of MEG to gain access
to brain dynamics during visual processing.
In this chapter we start by presenting the basics of the human visual system and focus
particularly on the retinotopic properties of the primary visual cortex (V1). Results obtained
in the literature by functional imaging are then presented. We will then describe the exper-
imental protocol designed in order to explore the retinotopic mapping of V1 with MEG. The
methods used for data processing, functional mapping and timing include signal extraction
with spectral analysis, inverse modelling and statistics with resampling techniques using
permutations. Some information on our experimental protocol, analysis methods and results
are presented in [2, 42, 44, 45, 46].
Contents
5.1 From the eyes to the cortex
5.2 Retinotopic mapping with fMRI
5.3 Source localization with M/EEG in the visual cortex: previous studies
5.4 MEG experimental design
5.4.1 Stimulus design
5.4.2 Protocol design
5.5 Mapping V1 with MEG
5.5.1 Data exploration
5.5.2 Method
5.5.3 Mapping results
5.6 Timing visual dynamics with MEG
5.6.1 Estimating timings in the visual cortex with M/EEG: Literature review
5.6.2 Extracting information from the phase
5.6.3 Preliminary results
5.7 Discussion
5.8 Conclusion
5.1 FROM THE EYES TO THE CORTEX
We start this section with a description of the path conveying the visual information from
the eyes to the primary visual cortex (V1). We then present briefly the organization of the
visual cortex. For a more comprehensive description, see for instance [29, 124, 167].
Figure 5.1: The path of the visual information from the eyes to the primary visual
cortex (Adapted from http://homepage.psy.utexas.edu/homepage/Class/Psy308/Salinas/Vision/Vision.html).
In primates, in particular in humans, the visual system includes many anatomical
elements, from the eyes to the cortex. In the eye, light goes successively through the cornea, the
aqueous humor, and the pupil. Next it passes through the lens, before entering the vitreous
humor. It finally reaches the retina, which is covered with over 125 million photosensitive
receptors of two families. The cones form a population of around 8 million cells. Mainly
concentrated in the center of the retina, also known as the fovea, the cones are responsible
for chromatic vision and for vision under normal lighting (photopic) conditions. About 120 million
rods are found everywhere except in the fovea. They deal with black and white perception
and low-lighting conditions (scotopic).
These photosensitive receptors translate lighting information into electrical information,
transmitted to the optic nerves via the ganglion cells. The two optic nerves meet, forming
the optic chiasm, after which information is transmitted separately for each visual hemifield
(separated vertically with respect to the head position): the information from the left
(respectively right) parts of both retinas, corresponding to the right (left) visual field, is brought
together to form the left (right) optic tract (cf. figure 5.1).
The vast majority of the optic tract fibers project to a part of the thalamic
sensory relay system, the Lateral Geniculate Nucleus (LGN). Visual signals from the two eyes
remain segregated in the LGN, which contains approximately 1 million cells, corresponding to
the number of optic fibers. Finally, the LGN axons form the optic radiations which reach
the primary visual cortex (V1), centered around the calcarine fissure (cf. figure 5.2).
V1, also known as Brodmann area 17 (cf. figure 1.9) or “striate cortex” due to its cytoar-
chitectonic properties, is viewed as the entry of the visual cortex. It receives most outputs of
the LGN.
[Figure: medial view of the occipital lobe. Recoverable labels: lingual gyrus, cuneus, praecuneus, calcarine fissure, parieto-occipital fissure]
Figure 5.2: Schematic representation of the calcarine fissure in medial view (From 20th U.S.
edition of Gray’s Anatomy of the Human Body, 1918 (public domain)).
V1 contains a complete (mirror) representation of the contralateral hemifield. This is
illustrated by figure 5.3 and figure 5.4. This property corresponds to the retinotopic organization
of V1. A practical consequence of this organization is that, except in patients who have
suffered damage to the occipital lobe, the left (resp. right) visual hemifield projects to the right
(resp. left) occipital cortex.
Beyond this retinotopy, neurons in V1 are organized into sub-regions, each specialized in
the analysis of a given visual feature. Among these features are sensitivity to color, contrast,
orientations or direction of motion. The sensitivity and selectivity to orientation was first
measured by Hubel and Wiesel (Nobel Prize in Physiology or Medicine in 1981) in 1959. By
inserting a microelectrode into the primary visual cortex of an anesthetized cat they discov-
ered that some neurons responded more strongly to one particular orientation while other
neighboring neurons were more sensitive to other orientations. They called such neurons of
V1 “simple cells”. This observation motivated the design of the visual stimuli used in the
MEG experiment detailed below in section 5.4.
Next to V1 are other visual areas reported in the literature. Without going into much
detail, studies on macaque monkeys led Felleman and Van Essen [70] to differentiate 30
areas based on four main criteria: (i) local cortical cell architecture, (ii) connectivity patterns
across areas, (iii) global functional selectivity and (iv) retinotopy. In humans, the last two
criteria were successfully used to unveil several areas. We briefly list below the areas
neighboring V1.
V2, also called prestriate area, is subdivided in each hemisphere into two parts: V2v (for
ventral) and V2d (for dorsal). They respectively represent the upper and lower contralateral
quarterfields. Together the four regions provide a complete map of the visual field. Area V2
mainly receives its inputs from V1. Functionally, V2 has many properties in common with
V1. Cells are tuned to simple properties such as orientation, spatial frequency and color.
V3 refers to the region of the cortex located immediately next to V2. Like V2, it is
subdivided into two parts: V3v ventrally (sometimes also called VP, in reference to the Ventral
Posterior area in monkeys) and V3d dorsally. However, contrary to V1 and V2, there is still
some controversy regarding its exact extent and its functional selectivity. Even if no
consensus exists for humans, we can consider that V3v represents the upper quadrant and V3d
represents the lower quadrant.
Figure 5.3: Illustration of the retinotopic organization in V1. V1 contains a complete (mirror)
representation of the contralateral hemifield (Adapted from [219]).
Figure 5.4: Retinotopic organization of the primary visual cortex (V1): The visual field is
continuously mapped onto the visual cortex. A: A grid pattern made of concentric circles
is presented to a macaque. The black box corresponds to the right visual field, that will be
transferred to the left hemisphere of the brain. B: The grid pattern is reproduced at the level
of the primary visual cortex (V1). The mapping from visual field to V1 is termed retinotopic,
because it continuously preserves the topology of the retina. The central zone of the visual
field is processed with great precision: It is processed by many neurons in V1, compared
to the periphery. Mathematically, the mapping from visual field to V1 is well approximated
by a log-polar scheme (Adapted from [205]).
5.2 RETINOTOPIC MAPPING WITH FMRI
By looking at figure 5.4, it can be noticed that a natural coordinate system for the spatial
organization of V1 is the polar coordinate system. This observation led to the design of the
stimuli called wedges and rings. The wedges encode the polarity, i.e., the angular information,
while the rings encode the eccentricity, i.e., the radial information. Such stimuli are presented
in figure 5.5. Original results of retinotopic mapping with fMRI can be found in [64, 65, 195].
In standard fMRI protocols, the stimuli presented to the subjects are rotating wedges and
expanding rings. The measurements obtained with the rotating wedges provide a polarity
map (cf. figure 5.6(a)) while the expanding rings provide an eccentricity map (cf. figure 5.6(b)).
A position in the visual field is associated with a cortical position by intersecting the information
from the polarity map and the eccentricity map.
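As a minimal illustration of this polar parametrization, the sketch below converts a Cartesian position in the visual field into the two quantities encoded by the wedges and the rings. The function name, the choice of degrees of visual angle as unit and the angle convention (counter-clockwise from the right horizontal meridian, fixation point at the origin) are assumptions made for the example only.

```python
import math

def visual_field_to_polar(x_deg, y_deg):
    """Return (eccentricity, polar angle) of a visual field position.

    Assumed convention: fixation point at (0, 0), coordinates in degrees of
    visual angle, polar angle in [0, 360) measured counter-clockwise from
    the right horizontal meridian.
    """
    eccentricity = math.hypot(x_deg, y_deg)                        # radial information (rings)
    polar_angle = math.degrees(math.atan2(y_deg, x_deg)) % 360.0   # angular information (wedges)
    return eccentricity, polar_angle

# A point in the upper left quadrant of the visual field.
ecc, angle = visual_field_to_polar(-3.0, 3.0)
```

With this convention, the point used above lies at a polar angle of 135 degrees and an eccentricity of about 4.24 degrees.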
(a) Rings (b) Wedges
Figure 5.5: Rings and wedges visual stimuli used for retinotopic mapping with fMRI (Adapted
from [223]).
Once the polarity map and the eccentricity map have been computed, the visual areas can
be delineated. For example, the border between V1 and V2d is given by the lower vertical meridian
and the border between V1 and V2v is given by the upper vertical meridian. Delineation results are
presented in figure 5.7.
The conclusion of this brief presentation of retinotopic mapping with fMRI is that the
spatial resolution of fMRI enables the precise delineation of visual areas like V1, V2 and V3.
However, due to the low temporal resolution of fMRI, dynamical information remains
inaccessible. Retinotopic mapping with M/EEG finds its principal motivation in the measurement
of such dynamics.
5.3 SOURCE LOCALIZATION WITH M/EEG IN THE VISUAL CORTEX: PREVIOUS STUDIES
(a) Orientation, i.e., polarity, map obtained by stimulation using the wedges in figure 5.5(b).
(b) Eccentricity map obtained by stimulation using the rings in figure 5.5(a).
Figure 5.6: Polarity, i.e., orientation, map and eccentricity map obtained by fMRI. Cortical
maps are flattened for 2D representation (Adapted from [223]).
Figure 5.7: Visual areas delineated by fMRI (Adapted from [223]).
We now present a review of important previous studies involving M/EEG recordings for
non invasive investigations in the human visual cortex. This presentation is restricted to
contributions that achieve source localization.
The literature on the topic demonstrates that various experimental and methodological
strategies are possible. Designing a strategy involves choosing:
• the patterns used for stimulation,
• the experimental protocol (duration of presentation of the stimuli, number of repeti-
tions, etc.),
• the method for signal extraction (averaging, spectral analysis, etc.),
• forward modeling (Spherical head model, BEM, etc.),
• inverse modeling (dipole fitting, beamforming, distributed sources, etc.).
We now discuss the different possible choices.
Stimulation pattern
Neural response is generated by pattern onset stimuli. This response however differs
depending on whether the stimulus is black and white or colored, highly contrasted,
oriented, etc. A good pattern should evoke a response in the visual cortex with a high SNR in
order to provide the best source localization results. Following fMRI protocols, most previous
M/EEG studies use black and white checkerboard patterns [69, 110, 151]. The patterns are
displayed on a gray background to be well contrasted. However, contrary to fMRI, the
patterns are presented in only a portion of the visual field, as illustrated in figure 5.8 with a circular
checkerboard pattern. This is done in order to avoid crosstalk between neural current
generators. With expanding rings, currents produced by two sources on the walls of the left and
right cuneus (cf. figure 5.2) could cancel each other's effect on the sensors. Source localization would
then become impossible.
Figure 5.8: Circular checkerboard pattern used for visual stimulation in [151].
Experimental protocol and signal extraction
The experimental protocol details the way the pattern is presented but also what is asked of
the subject during the experiment. It requires choosing the duration of the pre-stimulation
and stimulation periods and also the inter-stimulation interval (ISI), usually set to be random
to limit habituation and anticipation by the subject. The pre-stimulation period is also known
as the baseline period. What drives the choice of the protocol is the way the signal of interest
is extracted from the measurements. For this purpose, two approaches exist.
The first, and most classical, consists in averaging the measurements of multiple recordings
and using as signal of interest the averaged response at a particular time instant. This
instant classically corresponds to a latency peak. Visual evoked potentials (VEP) are elicited
by pattern onset stimuli. As for all evoked potentials, the waveform of a VEP exhibits
characteristic peaks. For VEP, some of these peaks are known as the C1 (a.k.a. N75), the P1
(a.k.a. P100) or the N1 (a.k.a. N145). A typical EEG VEP waveform is presented in fig-
ure 5.9. In [110, 214], the authors estimate source amplitudes using the C1. The C1 occurs
approximately 70 ms after pattern onset. According to previous fMRI studies and the results
described in [110, 214], the neural generators of the C1 are located in V1. With such proto-
cols, the pattern is presented a few hundred times and the pre-stimulation and stimulation
periods are respectively about 100 ms and 500 ms. The main problem of such approaches is
that the latency peak is not stable across subjects. This implies that the latency of interest
on each subject needs to be set manually.
Figure 5.9: A normal pattern reversal VEP measured in EEG (Adapted from [216])
The second consists in extracting the signal of interest from the spectrum of the measured
time series. This alternative to VEP is called steady-state visual evoked potential (SSVEP).
Rather than displaying multiple time locked pattern onsets, the pattern is flashed on the
screen with a known frequency. The stimulus is said to be tagged in frequency. We speak
of “frequency tagging”. While protocols based on standard VEP focus on the transient period
just after pattern onset, the experimental paradigms based on SSVEP exploit the stationarity
in the measured time series. The advantage of working with stationary time series is that
standard signal processing tools like Fourier analysis are well adapted to extract the signal of
interest. Contrary to the transient VEP signal, the SSVEP signal is easily quantified in the
frequency domain and can be rapidly extracted from background noise. A possible drawback
is that stimulation periods need to be sufficiently long to let the neural generators enter in a
stationary mode. Also, SSVEP are not particularly pleasant for the subject.
SSVEP were previously measured on anaesthetized cats [183] using invasive techniques
called multi-unit activity (MUA) and local field potentials (LFP). It was observed that about
300 ms after flicker onset, responses stabilized and exhibited a highly regular oscillatory
pattern precisely locked to the stimulus: the stationary period started after 300 ms. This
duration of the transient period was confirmed in humans with EEG in [187]. Using such
frequency tagged stimuli, Regan [187] also observed that an “on-off” stimulation paradigm
and a reversing pattern do not produce neural activations with the same Fourier spectrum.
Pattern reversal stimulation produces a peak in the Fourier spectrum at twice the
stimulation frequency [69, 187, 191]. This can be explained by the observation that pattern reversal
produces a change of contrast in the visual field twice per period, i.e., twice per full cycle of
stimulation. The experimental results adapted from [69] presented in figure 5.10 confirm this
observation. The reversing pattern was presented for a long period of time, i.e., 14 s, at 4
different frequencies (2 Hz, 4 Hz, 8 Hz and 21 Hz). In [69] the stimuli consist of both square
wave modulated and sinusoidally modulated checkerboards, while in [191] the
checkerboards are only modulated by a sine wave. Fawcett et al. [69] report that the results
with both types of modulations are very similar.
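The frequency doubling observed with pattern reversal can be reproduced on a toy signal. The sketch below assumes, purely for illustration, a response that is identical for both polarities of the pattern (modeled as the absolute value of a sinusoid at the reversal frequency); the sampling rate and the helper name `dft_amplitude` are hypothetical and are not taken from the cited studies.

```python
import cmath
import math

def dft_amplitude(signal, fs, freq):
    """Amplitude of the Fourier coefficient of `signal` at `freq` (direct sum)."""
    n = len(signal)
    coeff = sum(s * cmath.exp(-2j * math.pi * freq * k / fs)
                for k, s in enumerate(signal))
    return abs(coeff) / n

fs = 600.0          # sampling rate in Hz (illustrative value)
f_stim = 7.5        # full-cycle reversal frequency in Hz
n_samples = 2400    # 4 s of data, i.e. an integer number of cycles

# Toy response: the same for both polarities of the pattern, hence |sin|.
response = [abs(math.sin(2 * math.pi * f_stim * k / fs)) for k in range(n_samples)]

amp_f = dft_amplitude(response, fs, f_stim)        # fundamental: vanishes
amp_2f = dft_amplitude(response, fs, 2 * f_stim)   # second harmonic: dominant
```

In this toy model the fundamental vanishes while the second harmonic dominates, which is the qualitative behavior reported for reversing patterns; the true neural response is of course not a rectified sinusoid.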
Inverse modeling
As described in chapter 3, the inverse problem can be solved with three main categories
of methods: dipole fitting, beamforming and image-based methods with distributed source
models. Previous studies on the human vision with M/EEG make use of all three of them.
The most common is dipole fitting. In [214], a small number of dipoles are positioned
in the visual areas localized a priori with fMRI and their amplitudes are then estimated.
Note that such a procedure is made possible by the fMRI since, as recalled in chapter 3,
parametric dipole fitting is a non convex problem for more than one dipole. The positions and
orientations of multiple dipoles could not be robustly estimated with pure M/EEG. In [82],
the authors use single dipole fitting with MEG data to localize a generator in V1. In [191], Di
Russo et al. use dipole fitting with “proximity seeding” and fMRI localizers.
In [69], Fawcett et al. investigated with MEG the neural response of V1 to frequency
tagged stimuli with a beamforming technique called Synthetic Aperture Magnetometry (SAM).
The beamforming method provided the amplitude over time of a manually positioned source
in V1. The position of the source was set according to the known position of the pattern in
the visual field. The computed time frequency decomposition presented in [69] is reproduced
in figure 5.10.
Alternatives to dipole fitting approaches and beamforming techniques are imaging methods
with distributed source models. An EEG study with ℓ2 source estimates is presented in [110].
Localization results are compared to fMRI data and it is observed that, with the proposed
data processing pipeline, the spatial resolution of EEG is approximately 3° of visual angle.
This study is however limited to visual stimuli displayed on the horizontal meridian. No
retinotopic mapping is performed per se. In [197], Sharon et al. demonstrate that combining
MEG and EEG measurements can improve localization results compared to pure MEG
inverse modeling. The inverse solver is dSPM (cf. chapter 3), which is a noise normalized solver
based on an ℓ2 prior. In [197], no retinotopic maps are presented either.
In 2002, Moradi et al. [151] proposed a quantitative comparison between localization
results with MEG and fMRI. The inverse problem is solved on a volumetric grid of distributed
dipolar sources and source amplitudes are estimated with an inverse method called Magnetic
Field Tomography (MFT) [3]. Results obtained by Moradi et al. are presented in figure 5.11.
Moradi et al. argue that distributed source models are better suited than dipole fitting since
the neural activations evoked by stimulus onsets are very likely to come also from
extra-striate regions. Source amplitudes are estimated at various peak latencies starting around
45 ms after visual presentation of the stimulus. Note that in this contribution, the sources
do not lie on a triangular mesh but on a 3D grid. With such a method, as with fMRI data,
obtaining a retinotopic mapping requires interpolating activations to the cortical surface.
The different studies detailed above demonstrate the ability of M/EEG to localize with a
good precision neural activations in the intricate occipital region around the calcarine sulcus.
This observation was also confirmed by the simulation study we published in [2]. However,
none of these contributions provides a complete retinotopic mapping of V1. Dipole fitting
and beamforming approaches are certainly not suited to such a purpose, which can only be
achieved with a distributed source model and an image-based inverse solver. The methodologies
developed in [110] and in [197] are probably the best attempts towards retinotopic mapping
with MEG and therefore the closest related work to the contribution presented in this
chapter. These methods however work with VEP and therefore suffer from various drawbacks
like the problem of variable peak latencies between subjects. Their experimental results are
also limited to the horizontal meridians in [110] and the four quadrants in [197].
Figure 5.10: Time-frequency plots obtained using a checkerboard pattern flickering at various
frequencies in the bottom right quadrant of the screen. Time-frequency plots are calculated
using Morlet wavelet analysis from a voxel in the left (that is, contralateral) medial visual
cortex and averaged across multiple subjects to improve the SNR. The cortical harmonics
of the stimulus frequency can be clearly seen in the active phase, together with an onset
response shortly after the stimulus onset. The colour bar on the right of each figure shows
the event-related synchronization (ERS) and event-related desynchronization (ERD) scale
used, expressed as a percentage of change from the baseline. (Adapted from [69])
Figure 5.11: Localization results obtained by Moradi in [151] with fMRI and MEG. Results
are displayed on a 2D slice going through the calcarine fissure. In blue are the results
obtained by fMRI (p<0.01) while in yellow and red are the active regions estimated with MEG
(p<0.01 and p<0.001). (Adapted from [151])
The use of frequency tagged stimuli in [69, 151] demonstrates the ability of M/EEG devices
to capture with a good SNR the signal evoked by a flickering pattern. Contrary to variable
peak latencies observed with VEP on multiple subjects, the frequency tuning of the neural
response evoked by SSVEP appears to be much more stable across subjects.
For these reasons, the protocol proposed in the following section is based on SSVEP and
the inverse problem is solved with a distributed source model. Besides the protocol, our
contribution is on the efficient processing of the data, both for the signal extraction and the
inverse modeling.
5.4 MEG EXPERIMENTAL DESIGN
5.4.1 Stimulus design
The primary objective of the design of the stimulation pattern was to produce a good signal to
noise ratio (SNR) in the MEG measurements. Considering what has been said earlier on the
functional properties of the simple cells in the visual cortex, a good SNR can be obtained with
highly contrasted patterns presenting multiple orientations. By multiplying the number of
orientations, more simple cells get activated and the global amplitude of the evoked response
of the brain increases.
The stimuli that have been designed are presented in figure 5.12. In order to obtain a map-
ping of V1, the “star” like patterns have been placed at multiple positions of the visual field.
Each position corresponds to an experimental condition. There are two types of experimental
conditions: the quadrants and the meridians. Each quadrant contains a big single “star”.
Along the meridians, 4 “stars” are displayed, whose sizes increase in accordance with the
cortical magnification factor in V1 [52].
Stimuli were back-projected using a video projector (60 Hz refresh rate) on a translucent
screen located 90 cm away from the subjects. The Michelson contrast of the displayed patterns
was 96%. They were presented against a grey background (103 cd/m²; see figure 5.12).
The Michelson contrast is defined by:
$$C_{\mathrm{Michelson}} = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}$$
where $L_{\min}$ and $L_{\max}$ are respectively the minimum and the maximum luminances (cd/m²)
measured on the pattern.
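For concreteness, the definition can be evaluated with a few lines of code. The luminance values below are hypothetical and were only chosen so that the result matches the 96% contrast reported above; they are not the measured luminances of the experiment.

```python
def michelson_contrast(l_min, l_max):
    """Michelson contrast from the minimum and maximum luminances (cd/m^2)."""
    if l_min < 0 or l_max <= 0 or l_min > l_max:
        raise ValueError("expected 0 <= l_min <= l_max and l_max > 0")
    return (l_max - l_min) / (l_max + l_min)

# Hypothetical luminances (not measured values) giving a contrast of 0.96.
contrast = michelson_contrast(l_min=2.0, l_max=98.0)
```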
In each quadrant, the pattern was 2.66° wide with an eccentricity of 8.85°. For the
meridians, the 4 patterns had varying sizes (width: 0.22°, 0.35°, 0.71°, 1.33°) and were presented at
4 different eccentricities (0.62°, 1.52°, 3.3°, 6.64° respectively).
(a) Upper left quadrant (b) Lower left quadrant (c) Upper Right quadrant
(d) Lower right quadrant (e) Left meridian (f) Right meridian
Figure 5.12: Stimuli displayed for retinotopic mapping with MEG. Each position corresponds
to an experimental condition. The subject is asked to look at the colored fixation point at the
center of the screen. The flickering patterns were displayed for 6.5 s at 7.5 Hz or 10 Hz in
each trial evoking a steady-state visual evoked potential (SSVEP).
5.4.2 Protocol design
The primary motivation for using SSVEP and frequency tagged stimuli is that for long enough
stimulation periods the visual system can be considered to be stationary. The complex dy-
namic phenomena that occur during the transient period just after a stimulus onset are not
considered. Since a SSVEP can be completely described in terms of the amplitude and phase
of each frequency component it can be quantified more unequivocally than an averaged tran-
sient evoked potential.
The choice of the stimulation frequency
In [69], Fawcett et al. presented a reversing pattern tagged with 4 different frequencies. The
time frequency maps presented in figure 5.10 demonstrate that the increase in synchronization
during stimulation, i.e., the active period, exceeds 100% at 2, 4 and 8 Hz. At 21 Hz the
steady-state response is clearly reduced. Due to the reversing pattern, the actual frequency
observed corresponds to the second harmonic. Similar experimental facts are reported by
Pastor et al. in [173] with EEG measurements and a flickering “on-off” pattern (no contrast
reversing). In order to estimate the best stimulation frequency for brain computer interface
(BCI) based on SSVEP, the authors measured the amplitude of the signal on the occipital
electrodes as a function of the stimulation frequency. As presented in figure 5.13, they ob-
served that the amplitude reached a maximum at 15 Hz and then fell with a plateau up to 27
Hz, declining at higher frequencies.
Figure 5.13: Average of the mean values of the amplitude of the FFT fundamental frequency
of the SSVEP recorded on three occipital EEG electrodes at the different stimulation fre-
quencies. The amplitude of the occipital neural response, expressed in microvolts, reached a
maximum at 15 Hz and then fell with a plateau up to 27 Hz, declining at higher frequencies.
(Adapted from [173])
These experimental observations indicate that the best frequency for a stimulation with
a reversing pattern is between 5 and 10 Hz, so that the response, observed at the second
harmonic, falls between 10 and 20 Hz. Our hypothesis explaining the dependence of
the SSVEP amplitude on the stimulation frequency is that the retina is the limiting step
of the visual processing pipeline. This is supported by the experimental results from [196]
reproduced in figure 5.14. By measuring the response of ganglion cells in the retina of an
anesthetized cat while it was presented with a sinusoidally modulated pattern, Shapley and Victor
concluded in 1978 that the retina attenuates the temporal frequencies between
10 and 20 Hz the least. These experimental results agree with the conclusions obtained by Pastor et al.
and Fawcett et al. with M/EEG measurements on humans.
Taking this into account in our experimental paradigm, stimuli were displayed at a
frequency of 7.5 Hz or 10 Hz. For experimental reasons, the stimulation frequencies were
constrained by the 60 Hz refresh rate of the screen. During the stimulation period, with 7.5 Hz,
the pattern was displayed during 4 successive frames before alternating with the reversed
pattern during the next 4 frames. At 10 Hz, the pattern was displayed during 3 successive
frames before alternating with the reversed pattern during the next 3 frames.
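The relation between the refresh rate, the number of frames per polarity and the stimulation frequency can be checked with the short sketch below. The function name is hypothetical; the only assumption is that one full cycle consists of the pattern followed by its reversed version, each shown for the same number of frames.

```python
def reversal_frequency(refresh_rate_hz, frames_per_phase):
    """Full-cycle frequency (Hz) of a counter-phase pattern.

    One cycle = `frames_per_phase` frames of the pattern followed by
    `frames_per_phase` frames of the reversed pattern.
    """
    return refresh_rate_hz / (2 * frames_per_phase)

freq_4_frames = reversal_frequency(60, 4)   # 4 + 4 frames per cycle
freq_3_frames = reversal_frequency(60, 3)   # 3 + 3 frames per cycle

# Contrast reversals occur twice per cycle, hence at twice these frequencies.
reversal_rates = (2 * freq_4_frames, 2 * freq_3_frames)
```

With a 60 Hz screen this gives 7.5 Hz and 10 Hz, with contrast reversals occurring at 15 and 20 per second respectively.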
Pre-stimulation and stimulation periods
Stimulation consists of multiple repetitions of alternating pre-stimulation and stimulation
periods. A cycle of pre-stimulation and stimulation period is called a trial.
A trial started with the display of a colored fixation disk at the center of the screen for
600 ms. It was followed by one of the patterns displayed in figure 5.12 flickering in
counter-phase at 7.5 Hz or 10 Hz. Two successive trials were separated by a random inter-stimulus
interval (ISI) of about 1.5 s (cf. figure 5.15).
Figure 5.14: Amplitude of response of cat ON-center X ganglion cell, reproduced from Shapley
and Victor (1978). Stimulus consists of 4 sinusoids with different contrasts. It reveals that
the frequencies between 10 and 20 Hz are the least attenuated by the retina. (Adapted from
[228] and original data from [196])
We call “run” the successive presentation of multiple trials. One run was done for the
quadrants and one for the meridians. In each run, the conditions were randomly presented.
Each run contained 15 trials for each condition, i.e., position in the visual field. For example,
the run for the quadrants had 60 trials with 15 trials for each quadrant.
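The structure of a run can be sketched as follows. This is only an illustration of the randomization described above, not the presentation software used for the experiment: the function name, the condition labels and the uniform ISI range are assumptions (the text only states that the ISI is random and close to 1.5 s).

```python
import random

def build_run(conditions, n_trials_per_condition=15, isi_range=(1.3, 1.7), seed=0):
    """Return a shuffled list of (condition, isi_seconds) trials.

    The uniform ISI range is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    trials = [c for c in conditions for _ in range(n_trials_per_condition)]
    rng.shuffle(trials)                                  # random order of conditions
    return [(c, rng.uniform(*isi_range)) for c in trials]

quadrants = ["upper left", "lower left", "upper right", "lower right"]
run = build_run(quadrants)   # 4 conditions x 15 trials
```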
[Figure 5.15 timeline, from left to right: inter-stimulation period (≈1.5 s), pre-stimulation period (0.6 s, fixation point, starting at t=0), stimulation period (6 s, flickering star and fixation point).]
Figure 5.15: A trial in the protocol for retinotopic mapping with MEG. Each trial is preceded
by an inter-trial period, also called ISI (inter-stimulus interval), of random
length close to 1.5 s. At t=0 a fixation point appears and 600 ms later the stimulation starts.
During a trial, the fixation disk randomly changed color every second. To keep the subjects
attentive and minimize eye movements, they were asked to report, at the end of the run,
which color had appeared most often.
5.5 MAPPING V1 WITH MEG
5.5.1 Data exploration
In order to confirm the presence of energy at the frequencies of stimulation in the data,
periodograms and spectrograms were computed on the signals measured by each MEG sensor.
The periodograms provide estimates of the power spectral density (PSD) of the signal
during the steady-state period. The stimulation started 0.6 s after the fixation point appeared
and the duration of the transient regime was estimated at around 0.4 s. The beginning of the
steady-state period was therefore set to t = 1 s.
The periodogram can simply be computed with a standard FFT or in a more efficient
way using a smooth periodogram technique introduced in [225]. This technique, commonly
referred to as a multitaper method, consists in computing the PSD on different portions of
the data and averaging the results in order to reduce the variance of the PSD estimate.
Each portion of the data is tapered by a time domain window. This is illustrated in figure 5.16,
where 6 tapers with a 50% overlap are extracted from a single-trial measurement. For each
of the 6 signal windows, the PSD is computed. The 6 results are then averaged to provide
the periodogram. By doing so the periodogram is smoothed and the PSD estimate is biased.
The variance of the estimate is however divided by the number of windows which makes the
PSD estimate more accurate. It corresponds to the classical trade-off in statistical estimation
between bias and variance.
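The window-averaging procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the code used for the figures: the Hann taper, the 600 Hz sampling rate, the PSD scaling and the function name are assumptions.

```python
import numpy as np

def averaged_periodogram(x, fs, n_per_window, overlap=0.5):
    """Average the periodograms of tapered, overlapping windows of `x`.

    Returns (frequencies in Hz, PSD estimate, number of windows).
    A Hann taper is assumed; the factor 2 of a one-sided PSD is omitted.
    """
    step = int(n_per_window * (1.0 - overlap))
    taper = np.hanning(n_per_window)
    scale = fs * np.sum(taper ** 2)
    starts = range(0, len(x) - n_per_window + 1, step)
    psds = [np.abs(np.fft.rfft(taper * x[s:s + n_per_window])) ** 2 / scale
            for s in starts]
    freqs = np.fft.rfftfreq(n_per_window, d=1.0 / fs)
    return freqs, np.mean(psds, axis=0), len(starts)

# Synthetic single-trial signal: a 15 Hz oscillation buried in white noise.
rng = np.random.default_rng(0)
fs = 600.0                                   # hypothetical sampling rate (Hz)
t = np.arange(3000) / fs                     # 5 s of data
x = np.sin(2 * np.pi * 15.0 * t) + rng.standard_normal(t.size)

# 1.6 s windows with 50% overlap.
freqs, psd, n_windows = averaged_periodogram(x, fs, n_per_window=960)
peak_freq = freqs[np.argmax(psd)]
```

Shorter windows give more segments to average, hence a smoother estimate, at the price of a coarser frequency resolution (here 0.625 Hz per bin).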
To illustrate this, PSD estimates computed on the grand average data during one
condition are represented in figure 5.17. It can be observed that using small windows leads to
a smoothed version of the periodogram. This procedure reduces the estimation bias called
spectral leakage. When estimating the PSD of finite-length signals or finite-length segments
of infinite signals, some energy leaks out of the original signal spectrum into
other frequencies. In the present case where the stimulation frequency produces a peak at
15 Hz, part of the signal of interest leaks to the frequency bins near 15 Hz.
The frequency bins are fixed by the FFT depending on the length of the input signal.
Therefore, they may not include a bin exactly at 15 Hz.
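The dependence of the frequency bins on the signal length can be made explicit with a small sketch. The 600 Hz sampling rate and the window lengths are hypothetical values chosen for illustration; the helper name is also an assumption.

```python
def closest_bin(freq_hz, n_samples, fs):
    """Index and frequency (Hz) of the FFT bin closest to `freq_hz`."""
    resolution = fs / n_samples            # spacing between FFT bins in Hz
    index = round(freq_hz / resolution)
    return index, index * resolution

# 5 s of data at a hypothetical 600 Hz sampling rate: 15 Hz falls on a bin.
idx_a, f_a = closest_bin(15.0, n_samples=3000, fs=600.0)

# 0.75 s of data (450 samples): bins are 1.33 Hz apart and none is at 15 Hz.
idx_b, f_b = closest_bin(15.0, n_samples=450, fs=600.0)
```

In the second case the closest bin lies near 14.67 Hz, so the energy of a 15 Hz component is spread over the neighboring bins.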
Figure 5.16: Multi-taper example on a single-trial MEG measurement
(a) 1 window of 5 s (b) 5 windows of 1.6 s (c) 12 windows of 0.8 s
Figure 5.17: Multi-taper periodogram obtained with 3 different sizes of windows. Each curve
corresponds to one sensor on the different groups of MEG sensors (OR: occipital right, OL:
occipital left, PR: parietal right, PL: parietal left, TR: temporal right, TL: temporal left). The
periodogram is estimated on the averaged data for one subject during stimulation of the lower
left quadrant at 7.5 Hz. One can observe the smoothing effect of the multitapering.
Once the PSD is estimated, the topography at the frequency of interest can be observed.
In figure 5.18, the estimated PSD at 15 Hz (or at the closest frequency bin) is displayed. The
topography represents an energy and is therefore positive. A hot spot on the occipital sensors
can be observed which confirms the spatial localization of the signal.
Figure 5.18: PSD at 15 Hz represented on the sensors. A clear hot spot on the occipital region
confirms the presence of a source responding at 15 Hz in the occipital cortex.
Another convenient way of exploring the spectral content of the data consists in computing
time-frequency (TF) maps. Using filter banks localized in time and frequency, such
representations describe the location of energy in a time-frequency plot. Common filters used in
M/EEG are Gabor and Morlet filters. We refer the reader to Appendix C for more details on
how the Gabor filters used in this work were designed.
An interest of the TF representations is that they enable a comparison between the
pre-stimulation and the stimulation periods. Let us denote $TF(t, f)$ the PSD estimated at time
$t$ and frequency $f$. Let $TF_{base}(t, f)$ be the restriction of $TF(t, f)$ to the pre-stimulation period
($0 < t < 0.6$ s) and $TF_{base}(f)$ the mean over $t$ of $TF_{base}(t, f)$. Let us call the Event Related
Synchronization / Event Related Desynchronization (ERS/ERD) coefficient the quantity:
$$\mathrm{ERS/ERD}(t, f) = \frac{TF(t, f) - TF_{base}(f)}{TF_{base}(f)} .$$
If no difference is present between pre-stimulation and stimulation periods, ERS/ERD(t, f)
is equal to 0. If ERS/ERD(t, f) is equal to 10, it means that the energy during stimulation
exceeds the energy during the pre-stimulation, a.k.a. baseline, period by 10 times the
baseline value, i.e., it is 11 times bigger. To have a
representation that scales to possibly large values of ERS/ERD(t, f), we represented in the
TF plot the quantity:
$$\mathrm{sign}(\mathrm{ERS/ERD}(t, f)) \, \log(1 + |\mathrm{ERS/ERD}(t, f)|) ,$$
where sign(x) stands for the sign of x, i.e., sign(x) = x/|x| if x ≠ 0 and 0 otherwise. The
function log stands for the decimal logarithm (log(10) = 1).
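The two quantities defined above can be computed as in the following sketch. The array layout (frequencies along rows, time along columns), the function names and the toy values are assumptions made for illustration.

```python
import numpy as np

def ers_erd(tf, times, baseline=(0.0, 0.6)):
    """Relative change of a time-frequency map with respect to its baseline.

    tf    : array of shape (n_freqs, n_times) holding PSD estimates
    times : array of shape (n_times,) in seconds
    """
    mask = (times > baseline[0]) & (times < baseline[1])
    tf_base = tf[:, mask].mean(axis=1, keepdims=True)   # mean over baseline times
    return (tf - tf_base) / tf_base

def signed_log(coef):
    """Display transform sign(c) * log10(1 + |c|)."""
    return np.sign(coef) * np.log10(1.0 + np.abs(coef))

# Toy map with a single frequency row: energy goes from 1 to 11 at onset.
times = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
tf = np.array([[1.0, 1.0, 1.0, 11.0, 11.0]])
coef = ers_erd(tf, times)      # 0 during the baseline, 10 during stimulation
display = signed_log(coef)     # 0 during the baseline, log10(11) afterwards
```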
In figure 5.19, a sample TF plot is presented. Like in [69], the TF plot was computed on
the signal obtained by averaging all the trials corresponding to the condition of interest. We
can observe a strong increase of PSD at the harmonics of f = 7.5 Hz, especially 2f = 15 Hz,
after the stimulation onset at 0.6 s. The harmonic 4f = 30 Hz can also be observed. The
dashed vertical bar represents the beginning of the stimulation. This observation, done also
on different sensors and conditions, led us to the conclusion that the signal of interest could
be extracted from the Fourier coefficients at 15 Hz.
[Figure 5.19 plot: time (s) on the horizontal axis, frequency (Hz) from 5 to 41 on the vertical axis, color scale sign(ERS/ERD) log(1 + |ERS/ERD|) ranging from 0 to 1.5.]
Figure 5.19: Sample time frequency map, a.k.a., spectrogram, estimated on the averaged
signal measured on the MLO11 sensor. Stimulation was performed at 7.5 Hz with a pattern
positioned on the lower left quadrant. Spectrogram was computed with Gabor filters with
ξ = 15 (cf. appendix C).
5.5.2 Method
5.5.2.1 How to invert?
In order to localize the current generators responding to the frequency tagged stimulation
pattern, the straightforward way consists in inverting the full temporal data in order to obtain
a time series per current dipole on the cortex. Then, for each dipole, the PSD can be computed.
The dipoles having the highest PSD are the most likely to be active.
Using a linear inverse method based on an ℓ2 prior like the standard minimum-norm (MN)
solver, the source estimates are given by:
$$X^* = G^T (G G^T + \lambda I)^{-1} M .$$
Spectral estimation with Fourier analysis is linear. Let us denote Φ the dictionary of Fourier
atoms such that the Fourier transform of a temporal vector x can be written in matrix
form Φx. The Fourier coefficients of the sources can then be written $\hat{X} = X^* \Phi^T$. The ith
row of $\hat{X}$ contains the Fourier transform of the temporal activation of the ith dipole of the
source space. Written in matrix form, it appears that the Fourier transform of the
sources can be obtained from the Fourier coefficients estimated on the sensors. We have
$\hat{X} = G^T (G G^T + \lambda I)^{-1} \hat{M}$ where $\hat{M} = M \Phi^T$. The practical consequence of this observation is
that when using a linear inverse solver the FFT on the sources can be obtained at the price
of the computation of the FFT on the sensors. The PSD can then be obtained from the FFT.
Rather than inverting the full data, this suggests inverting only the portion of the spectrum
that contains the relevant information. In the current study, this information appears to be
contained in the Fourier coefficient at 15 Hz.
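This commutation between the linear inverse operator and the Fourier transform can be verified numerically. The sketch below uses a random toy gain matrix and random measurements; the dimensions, the regularization value and the bin index are arbitrary assumptions and do not correspond to the actual head model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources, n_times = 8, 20, 64
G = rng.standard_normal((n_sensors, n_sources))   # toy gain matrix
M = rng.standard_normal((n_sensors, n_times))     # toy sensor measurements
lam = 0.1                                         # arbitrary regularization

# Minimum-norm inverse operator K = G^T (G G^T + lam I)^{-1}.
# (G G^T + lam I) is symmetric, so solving against G and transposing gives K.
A = G @ G.T + lam * np.eye(n_sensors)
K = np.linalg.solve(A, G).T

# Route 1: invert the full time series, then take the FFT of every source.
X_fft_full = np.fft.rfft(K @ M, axis=1)

# Route 2: take the FFT on the sensors, keep one bin, then invert that bin only.
k = 5                                             # index of the bin of interest
x_fft_bin = K @ np.fft.rfft(M, axis=1)[:, k]

same = np.allclose(X_fft_full[:, k], x_fft_bin)
```

Both routes give the same source Fourier coefficients, but the second one applies the inverse operator to a single complex vector instead of the full time series.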
5.5.2.2 Estimating active regions with permutation tests
The inverse problem gives the current estimates. However, what is really of interest are the
locations where these estimates are sufficiently significant to be considered as active. In order
to locate these statistically significant activations, non-parametric statistical tests provide a
robust and principled approach.
Basics of non-parametric statistical tests and permutations
H0 and H1. Like standard statistical tests, non-parametric tests are used to test the
validity of a hypothesis. There are two hypotheses: the null hypothesis, denoted H0, which states
that observations result purely from chance, and the alternative hypothesis, denoted H1, which states that
the observations are influenced by a non random cause. Rejecting H0 with a certain level of
confidence means that the observations are not purely the result of chance.
In the context of brain functional imaging with M/EEG, the null hypothesis H0 classically
states that the stimulation has no effect on the neural activation in a given brain region,
while H1 states that the activation is influenced by the stimulation. When rejecting H0, the
brain region of interest is flagged as active.
Type I and type II errors. Two types of errors can be made in statistical testing:
type I and type II. A type I error is made when the null hypothesis is erroneously rejected.
It corresponds to a false positive. A type II error is made when the null hypothesis is
erroneously accepted. It corresponds to a false negative. The statistical power of a test is related
to type II errors. The more powerful a test is, the fewer type II errors it produces. In practice,
one wants to use a statistical test with a high power and with a control of type I errors.
This control is provided by the threshold set on the P-value.
Let us illustrate this with an example in the context of non-parametric tests.
Example. Let us consider two populations of 10 subjects each. The two populations,
denoted A and B, have been assigned to 2 different treatment conditions (conditions A and
B). If the null hypothesis H0 is true, no difference between the means of conditions A and B
is expected at the end of the treatment. Under H0, i.e., if H0 is true, label A is not different
from label B, hence the labels can be exchanged without affecting the distribution of the difference between the means of
A and B. In the same way, each of the 10 observations of A can be swapped with its counterpart in B. Let us now
systematically rearrange the 20 observations by permuting the labels within each of the 10 pairs and
compute the difference between the means of A and B. There are 2^10 = 1024 possible permutations of
the labels, leading to 1024 values of the difference of the means between A and B under the null
hypothesis. We can now answer the following (inferential) question: Under H0, what is the
probability of obtaining a difference in the means at least as large as the one observed in the experimental data?
If this probability is smaller than 5%, we say that the hypothesis H0 is rejected with a P-value
of 0.05.
Let us rewrite this with simple equations. Let us denote $a = (a_l)_l \in \mathbb{R}^{10}$ (resp. $b = (b_l)_l \in \mathbb{R}^{10}$)
the vector of measurements observed in population A (resp. B). The difference of the
means obtained with experimental data is:
\[ T_{exp} = \frac{1}{10} \sum_l (a_l - b_l) = \frac{1}{10} (a - b)^T \cdot \mathbf{1} , \]
where $\mathbf{1}$ is a vector in $\mathbb{R}^{10}$ filled with ones. It corresponds to the value of the statistic, denoted
T, with the experimental data. We denote it $T_{exp}$. In order to compute a “permuted” version
of this statistic one can simply replace the vector $\mathbf{1}$ with a vector filled randomly with 1 and
−1. For a given permutation indexed by n we denote $p_n$ this random vector. We compute a
value of the statistic under H0 with:
\[ T_n = \frac{1}{10} (a - b)^T p_n . \]
Let us denote $P \in \mathbb{R}^{10 \times 1024}$ a matrix containing as columns all the possible vectors $p_n$. The
distribution under H0 is obtained by:
\[ (T_n)_n = \frac{1}{10} (a - b)^T P \in \mathbb{R}^{1024} . \]
The hypothesis H0 is rejected with a P-value of 0.05 if:
\[ \frac{\#\{n \,/\, T_n \geq T_{exp}\}}{1024} \leq 0.05 . \]
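The matrix formulation above translates almost literally into NumPy. The following is a minimal sketch on synthetic data (the vectors a and b are drawn at random and the shift of 1.0 is an arbitrary assumption); the matrix P is built by enumerating all 1024 sign vectors.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(10) + 1.0   # synthetic measurements, condition A
b = rng.standard_normal(10)         # synthetic measurements, condition B

# All 2**10 = 1024 sign vectors p_n, stored as the columns of P
P = np.array(list(itertools.product([1, -1], repeat=10))).T   # shape (10, 1024)

T_exp = (a - b) @ np.ones(10) / 10   # observed difference of the means
T_null = (a - b) @ P / 10            # the 1024 values T_n under H0

# Proportion of permuted statistics greater than or equal to the observed one
p_value = np.mean(T_null >= T_exp)
reject_h0 = p_value <= 0.05
```

The first column of P is the vector of ones, so the observed statistic belongs to the null distribution and the proportion is never below 1/1024.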
General permutation test procedure.
1. Select a test statistic which measures the differences between conditions (here differ-
ence of the means).
2. Compute the test statistic for the original condition labeling (Texp).
3. For each resampling k, randomly rearrange the condition labels and compute the test
statistic Tk for the permuted data and add it to the null distribution.
4. Repeat step 3 until a predefined number of resamplings has been performed (or all
resamplings if it is tractable).
5. Compare the null distribution of the test statistic to the original data.
6. Accept or reject the null hypothesis based on the proportion of permuted test statistics
greater than or equal to the original.
Remarks. Using the difference of the means as statistic has the advantage that the procedure
can be described with basic linear algebra formulas. This matrix formulation
also provides a straightforward way to implement it and to benefit from
efficient linear algebra software packages such as BLAS/LAPACK or the Intel MKL.
In practice, one often has more than 10 observations per condition. Therefore testing all
the possible permutations is not always tractable. To circumvent this problem, only a random
sample of all data permutations is selected. By doing so, the test loses some of its statistical power
and the P-value is only an approximation. It can be observed that the P-value cannot be
smaller than 1/1024 or, more generally, 1/N, where N is the number of random permutations.
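When the exhaustive enumeration is not tractable, the same computation can be run on a random sample of sign vectors. The helper below is a hypothetical sketch (the function name, its default arguments and the choice of forcing the original labeling into the sample are assumptions made for illustration); including the original labeling is what guarantees that the estimated P-value is at least 1/N.

```python
import numpy as np

def sign_flip_test(a, b, n_perm=1000, seed=0):
    """Approximate permutation test on the difference of the means."""
    rng = np.random.default_rng(seed)
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    # Random sign matrix: one column per resampling
    P = rng.choice([-1.0, 1.0], size=(d.size, n_perm))
    P[:, 0] = 1.0                 # keep the original labeling in the sample
    t_exp = d.mean()              # observed statistic
    t_null = d @ P / d.size       # statistics under H0
    return t_exp, np.mean(t_null >= t_exp)

# Example with a strong effect: every paired difference is positive
t_strong, p_strong = sign_flip_test(np.arange(1, 51), np.zeros(50))
# Example with no effect at all: every paired difference is zero
t_none, p_none = sign_flip_test(np.ones(20), np.ones(20))
```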
Non-parametric tests are very flexible compared to standard parametric tests. Any statistic,
such as a T-test, can be used depending on the application. Also, no assumption, such as
Gaussianity, on the statistical distribution of the random samples is required. More importantly,
they offer a very easy way to correct for the problem of multiple comparisons.
Multiple comparisons. When running the same test many times, assuming the tests
are independent, the probability that at least one test rejects H0 by chance increases. When running 100
tests on independent samples with a P-value of 0.01, this probability is equal to 1 − 0.99^100 ≈ 0.63,
and one false rejection is expected on average. Therefore, when running multiple tests,
it is important to control, for example, the FWER (familywise error rate). The FWER is the
probability of making one or more type I errors among all the hypotheses when performing
multiple tests.
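For independent tests each run at level α, the probability of at least one type I error among m tests is 1 − (1 − α)^m. The short sketch below evaluates this expression (the function name is hypothetical).

```python
def fwer_independent(alpha, n_tests):
    """Probability of at least one false rejection among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

fwer_100 = fwer_independent(0.01, 100)   # about 0.63
fwer_1 = fwer_independent(0.05, 1)       # a single test: the level itself
```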
Remark. The False Discovery Rate (FDR), often used in neuroimaging [85], is a different way
of controlling Type I errors when running multiple comparisons. FDR controls the expected
proportion of incorrectly rejected null hypotheses. If the FDR is controlled with a P-value of
0.05, it means that among 100 rejected null hypotheses, 5 are expected to be false positives.
With a control of the FDR, one rejects null hypotheses more “easily” than when controlling
the FWER. The control of the FDR is sometimes said to be less conservative than the FWER.
With non-parametric tests, control of the FWER can be done using what is called the
statistic of the “max”. Let us assume that a test is performed at multiple brain locations,
indexed by i.
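\begin{align}
P(\mathrm{FWER}) &= P(\cup_i \{T^i \geq u\} \mid H_0) && \text{(prob. that any position exceeds the threshold } u\text{)} \\
&= P(\max_i T^i \geq u \mid H_0) && \text{(prob. that the max over positions exceeds the threshold)} \\
&= 1 - F_{\max T \mid H_0}(u) && \text{(1 minus the cumulative distribution function of the max)} \\
&= 1 - (1 - \alpha) = \alpha \tag{5.1}
\end{align}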
This means that controlling the probability that the maximum statistic over the brain
exceeds a threshold u under H0 provides a control of the FWER with the same probability α.
Let us come back to our example. Suppose now that we run the test of the two treatment
conditions A and B, I > 1 times. The measured results are stored in two matrices $A \in \mathbb{R}^{10 \times I}$
and $B \in \mathbb{R}^{10 \times I}$. The experimental values of the statistic are given by:
\[ (T^i_{exp})_i = \frac{1}{10} (A - B)^T \cdot \mathbf{1} \in \mathbb{R}^{I} . \]
Under H0 the distribution of the maximum is given by:
\[ (T^{max}_n)_n = \max\left( \mathrm{abs}\left( \frac{1}{10} (A - B)^T P \right) \right) \in \mathbb{R}^{1 \times 1024} , \]
where the function max computes the maximum value of every column of its input and abs
computes the modulus of each coefficient of its input. The hypothesis H0 is rejected at position
i with a P-value of 0.05 if:
\[ \frac{\#\{n \,/\, T^{max}_n \geq T^i_{exp}\}}{1024} \leq 0.05 . \]
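The statistic of the max can be sketched in the same matrix form. The example below uses synthetic data (the number of locations, the random draws and the effect injected at the first five locations are assumptions for illustration) and follows the formulas above literally, including the comparison of the signed values of the experimental statistic with the maxima of absolute values.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_loc = 10, 200                      # toy sizes (assumed)
A = rng.standard_normal((n_obs, n_loc))
B = rng.standard_normal((n_obs, n_loc))
A[:, :5] += 3.0                             # synthetic effect at 5 locations

# All 2**10 sign vectors as columns of P
P = np.array(list(itertools.product([1, -1], repeat=n_obs))).T   # (10, 1024)

# Experimental statistic at every location
T_exp = (A - B).T @ np.ones(n_obs) / n_obs            # shape (n_loc,)
# Maximum over locations of the absolute statistic, for every permutation
T_max = np.abs((A - B).T @ P / n_obs).max(axis=0)     # shape (1024,)

# Corrected P-value at each location and resulting decision
p_corrected = (T_max[None, :] >= T_exp[:, None]).mean(axis=1)
active = p_corrected <= 0.05
```

A single null distribution, that of the maximum, is shared by all locations, which is what provides the control of the FWER.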
Sample histograms of $(T^i_{exp})_i$ and $(T^{max}_n)_n$ are presented in figure 5.20. Data are extracted
from the retinotopy MEG dataset for which I is between 15000 and 40000 depending on the
number of dipoles considered on the cortical mesh.
Figure 5.20: Example of histogram of the $(T^i_{exp})_i$, denoted T0, and histogram of the $(T^{max}_n)_n$,
denoted H0. The dashed vertical line represents the threshold of the statistic with a P-value
of 0.05. 5% of the $(T^{max}_n)_n$ are above this line. The null hypothesis is rejected at all the
positions i such that $T^i_{exp}$ is above this threshold.
Permutation tests with M/EEG data. In neuroimaging, and particularly with M/EEG
data, what is of interest is to know where and when a particular brain region is activated
by the stimulation. The control condition is then typically extracted from the measurements
recorded in the pre-stimulation period, a.k.a. the baseline period. Permutation tests were
introduced in the field of M/EEG in [168, 169].
5.5.2.3 The mapping procedure
The data exploration on the sensors in section 5.5.1 confirmed the presence of significant
information at 15 Hz. In section 5.5.2.1, it was explained how to restrict the computation
of the inverse problem to a portion of the Fourier spectrum. In section 5.5.2.2, the basics of
non-parametric statistical tests were presented. The mapping procedure that we now
present is based on all these remarks and results.
The frequency of interest is 15 Hz. Therefore, rather than computing an FFT to get the
Fourier coefficients at 15 Hz, or at the closest frequency bin, a simple correlation of the signal
with a complex sinusoid tuned at 15 Hz can be used. Such a sinusoid at a frequency f0 is
defined by
\[ \phi_{f_0}(t) = \exp(2 i \pi f_0 t) . \]
The discretized version of $\phi_{f_0}$ is a vector denoted $\boldsymbol{\phi}_{f_0}$. By computing the correlation of the
measurements in each trial, indexed by l, with $\boldsymbol{\phi}_{f_0}$, we obtain a complex-valued Fourier
coefficient for each sensor in each trial. Let us denote $M_l$ the measurements for trial l. The
coefficients for this trial are given by $m^l_{f_0} = M_l \boldsymbol{\phi}_{f_0} \in \mathbb{C}^{d_m}$. By concatenating all the $m^l_{f_0}$
for all the $d_l$ trials, we get a matrix of Fourier coefficients, $M_{f_0} \in \mathbb{C}^{d_m \times d_l}$. For the sake of
simplicity, we will omit from now on the index f0 for the Fourier coefficients.
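The correlation with a complex sinusoid can be sketched as follows. The sampling frequency, the number of samples and the random trials are assumptions chosen so that 15 Hz falls exactly on an FFT bin, which allows a comparison with the FFT. With the sign convention exp(+2iπ f0 t) and real-valued data, the product equals the complex conjugate of the corresponding NumPy FFT bin.

```python
import numpy as np

sfreq, n_times, f0 = 600.0, 400, 15.0          # assumed acquisition settings
t = np.arange(n_times) / sfreq
phi = np.exp(2j * np.pi * f0 * t)              # discretized complex sinusoid

rng = np.random.default_rng(0)
n_sensors, n_trials = 4, 6                     # toy sizes (assumed)
trials = rng.standard_normal((n_trials, n_sensors, n_times))   # the M_l

# One coefficient per sensor and per trial, stored as (n_sensors, n_trials)
M_f0 = (trials @ phi).T

# Comparison with the FFT bin located at f0
k = int(round(f0 * n_times / sfreq))           # index of the 15 Hz bin
fft_bin = np.fft.fft(trials, axis=-1)[..., k].T
matches_fft = np.allclose(M_f0, np.conj(fft_bin))
```

Only one matrix-vector product per trial is needed, whereas the FFT computes all the frequency bins.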
The data that we propose to invert is M. Using an ℓ2 prior, the Fourier coefficients in the
source space are given by:
\[ X = G^T (G G^T + \lambda I)^{-1} M \in \mathbb{C}^{d_x \times d_l} . \]
In order to run statistical tests, the Fourier coefficients need to be estimated under two
conditions, here on the stimulation period and on the baseline period. Let us denote $X^{stim}$
and $X^{base}$ the two sets of coefficients. If we were to consider the difference of the means as
statistic, the distribution under H0 would be given by:
\[ (T^{max}_k)_k = \max\left( \mathrm{abs}\left( \frac{1}{d_l} (X^{stim} - X^{base})^T P \right) \right) \in \mathbb{R}^{1 \times N} . \]
The matrix $P \in \mathbb{R}^{d_l \times N}$ is the permutation matrix filled with 1 and −1.
In order to compensate for depth bias, it is classical with M/EEG to normalize the
reconstructed currents using an estimate of the variance of the noise. We refer the reader to
chapter 3 and particularly to section 3.2.2.3. The estimate of the noise level is obtained
by computing the standard deviation of each row vector in $X^{base}$. We denote
this vector of standard deviations by $\sigma^{base} = (\sigma^{base}_i)_i \in \mathbb{R}^{d_x}$. The noise-normalized versions of
$X^{stim}$ and $X^{base}$, denoted $X^{stim}_{nn}$ and $X^{base}_{nn}$, are obtained by dividing each coefficient on row i
by the standard deviation $\sigma^{base}_i$.
The experimental value of the statistic at position i, $T^i_{exp}$, is then given by:
\[ T_{exp} = \frac{1}{d_l} (X^{stim}_{nn} - X^{base}_{nn})^T \cdot \mathbf{1} \in \mathbb{R}^{d_x} , \tag{5.2} \]
and the distribution under H0 by:
\[ (T^{max}_k)_k = \max\left( \mathrm{abs}\left( \frac{1}{d_l} (X^{stim}_{nn} - X^{base}_{nn})^T P \right) \right) \in \mathbb{R}^{1 \times N} . \]
Again, the hypothesis H0 is rejected at position i with a P-value of 0.05 if:
\[ \frac{\#\{k \,/\, T^{max}_k \geq T^i_{exp}\}}{N} \leq 0.05 . \]
The vertices of the triangulated source space where H0 is rejected are the active vertices.
Computation time. The Fourier coefficients of interest are obtained with a simple matrix
multiplication on both the stimulation and the baseline data. The inverse computation with
an ℓ2 prior is also achieved with a simple matrix multiplication. Finally, thanks to the matrix
formulation of the permutation test procedure presented in section 5.5.2.2,
the full procedure has a very limited computational demand. On a standard computer, the
mapping pipeline takes less than 1 minute. This of course assumes that the forward models
have been previously computed.
Testing on the Fourier coefficients vs. on the power spectral density. The procedure
just described works on the Fourier coefficients. An alternative strategy consists in working
on the power spectral densities (PSD). Rather than considering Texp as detailed in (5.2), the
statistic can be computed with:
\[ T_{exp} = \frac{1}{d_l} (|X^{stim}_{nn}|^2 - |X^{base}_{nn}|^2)^T \cdot \mathbf{1} \in \mathbb{R}^{d_x} . \]
The matrix $|X_{nn}|^2$ contains the squared modulus of the elements of $X_{nn}$. A possible motivation
for using the PSD rather than the Fourier coefficients with the phase information is
that the PSD can then be estimated using a multitaper approach. However, by neglecting the
phase, the statistical power of the test is reduced. On the other hand, the quality of the
estimate of the PSD, or equivalently of the modulus, is improved. In our study both approaches
were investigated.
5.5.3 Mapping results
5.5.3.1 Localization results with ℓ2 inverse solvers
The first step consists in computing $X^{stim}$ and $X^{base}$. The noise level can then be
estimated from $X^{base}$. This leads to the computation of $X^{stim}_{nn}$ and $X^{base}_{nn}$, and finally Texp. An
example of Texp map is represented in figure 5.21. This image was obtained with a stimulation
in the lower left quadrant of the visual field with a pattern flickering at 7.5 Hz. The
quantity Texp was here computed using the Fourier coefficients at 15 Hz (not the PSD).
In order to interpret the values displayed, it can be noticed that the value Texp can be
related to a z-score. A value of 2 corresponds to twice the standard deviation estimated on the
baseline. For a Gaussian distribution, a value twice as large as the standard deviation is
relatively significant. However, the active regions were not designated with such a rule of
thumb. The map in figure 5.21 is thresholded using the non-parametric statistical procedure
detailed above. The resulting thresholded activation map is presented in figure 5.22. The
image was obtained with a P-value set to 0.05 and 15000 permutations. The experimental
data contained 15 trials, which means that we used 15000 out of 2^15 = 32768 possible
permutations. In order to visually improve the result, we extracted from the thresholded map
the connected component containing the highest activations (cf. figure 5.22(c)). We observe
in this result that the estimated active region lies on the upper bank of the calcarine
fissure. Given the position of the stimulation pattern in the lower left quadrant, this
localization result agrees perfectly with our knowledge of the organization of the primary
visual cortex (cf. section 5.1).
Figure 5.21: Example of Texp map to be thresholded. The color represents at each position i
the value $T^i_{exp}$ computed on the Fourier coefficients at 15 Hz. Data correspond to the
stimulation in the lower left quadrant at a frequency of 7.5 Hz.
5.5.3.2 From localization to retinotopic maps
In the previous section, we detailed the different intermediate steps to complete in order to
obtain a localization result. However, our interest goes beyond simple localization, since our
objective is to obtain a pipeline to achieve retinotopic mapping with MEG. This implies that
the same parameters should be used for all the experimental conditions, meaning here all
the locations of the flickering pattern in the visual field. We would like to emphasize that
this can be relatively challenging and that this issue is rarely mentioned in classical studies
where the different experimental conditions are treated separately. This implies for example
that neither the regularization parameter in the inverse problem nor the statistical threshold
level should be manually tuned for each experimental condition. When achieving a mapping
as done here, all the data for the different conditions are processed in the very same way.
Therefore, the processing pipeline needs to be robust to the variations that necessarily occur
between different experimental datasets.
In order to address the problem of retinotopic mapping, the results for all the positions
in the visual field need to be displayed on a common source space, i.e., on the same triangulation.
When two different conditions both produce a significant activation at the same location,
the condition that is selected and displayed is the condition for which the value of Texp is
maximum.
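This selection rule can be sketched as follows. The function name and the array layout (one row per condition, one column per vertex) are assumptions made for illustration; vertices where no condition is significant are marked with −1.

```python
import numpy as np

def merge_conditions(T_exp, active):
    """Select, at each vertex, the significant condition with maximum T_exp.

    T_exp:  float array of shape (n_conditions, n_vertices).
    active: boolean array of the same shape (True where H0 is rejected).
    Returns the index of the selected condition per vertex, or -1 if none.
    """
    # Conditions that are not significant cannot be selected
    masked = np.where(active, T_exp, -np.inf)
    winner = masked.argmax(axis=0)
    winner[~active.any(axis=0)] = -1
    return winner

# Two conditions, three vertices
T_toy = np.array([[1.0, 5.0, 0.0],
                  [3.0, 2.0, 0.0]])
active_toy = np.array([[True, True, False],
                       [True, False, False]])
labels = merge_conditions(T_toy, active_toy)
```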
Figure 5.22: Example of thresholded statistical map Texp (p=0.05 with 15000 permutations).
(a) Thresholded map on the cortex. (b) Thresholded map on the inflated cortex. (c) Thresholded
map on the inflated cortex with only the main connected component lying in V1.
Data correspond to the stimulation in the lower left quadrant at a frequency of 7.5 Hz.
In the following results, the position of the flickering pattern in the visual field is color
coded. Color conventions are given in figure 5.23. A retinotopic mapping result obtained
with a minimum-norm solver is provided in figure 5.24. In order to obtain this result, the
regularization parameter of the minimum-norm solver was set using the 10% rule of thumb from Brainstorm
(cf. section 3.2.1.2). The Fourier coefficients were obtained during the steady-state period
in the stimulation time interval (after 400 ms of stimulation) and during the baseline. The
P-value was classically set to 0.05 and 15000 permutations were run for each condition.
In this result, it can be observed that the mapping for the left visual hemifield is
particularly well recovered on the right hemisphere. The horizontal meridian is correctly mapped
in the calcarine sulcus while the upper and lower left visual fields are respectively mapped
on the lower and upper banks. For the right visual hemifield,
it can be observed that the lower quadrant is correctly mapped on the upper bank of the left
calcarine sulcus and that the right horizontal meridian produces an activation that includes
the left calcarine sulcus. However, it can also be observed that the minimum-norm tends to
overestimate the extent of the activation for this particular condition (cf. figure 5.24(a) and
figure 5.24(b)).
Please note that it was after observing such results that we investigated the use of the
ℓw;212 prior to improve the quality of retinotopic mapping with MEG.
Figure 5.23: Color conventions for each condition represented at their position in the visual
field.
5.5.3.3 Reconstruct on WM-GM or GM-CSF interface?
Depending on the M/EEG source analysis pipeline, neural currents can be estimated over a
mesh separating the gray matter (GM) and the white matter (WM), or over a mesh separating
the gray matter from the cerebro-spinal fluid (CSF). This latter interface corresponds to
the outer surface of the gray matter. For example the MNE software mentioned at the end
of chapter 2 generally presents source estimates on the WM-GM interface while users of the
Brainstorm toolbox usually work with the GM-CSF interface extracted with BrainVISA [38].
One reason for this is that the MNE software computes forward models with a 3-layer BEM
whose inner layer is the inner skull interface, which is very close to the GM-CSF interface, while
Brainstorm users work with spherical head models and therefore can use the GM-CSF interface
as source space.
During this thesis, we tried our retinotopic mapping pipeline on both interfaces. We
present in figure 5.25(a) and figure 5.25(b) two results obtained with the very same
parameters for the reconstruction and for the statistical procedure. Some clear differences appear
between these two results, which demonstrates the influence of the source mesh on the results.
In both cases, the extent of the active region corresponding to the right meridian appears to
be overestimated. This is particularly problematic for the GM-CSF interface since the active
region for the meridian masks the active region for the lower right quadrant. This illustrates
particularly well the problem of working with multiple conditions simultaneously when no
parameter tuning is performed for each condition individually.
The ℓ212 mixed norm presented at the end of chapter 4 is the strategy we investigated
during this thesis in order to better control the extent of the active regions.
(a) Left hemisphere (Medial view) (b) Inflated left hemisphere (Medial view)
(c) Right hemisphere (Medial view). (d) Inflated right hemisphere (Medial view).
Figure 5.24: Retinotopic map result obtained using a minimum-norm inverse solver and sta-
tistical tests run on the Fourier coefficients (p=0.05 with 15000 permutations). Data corre-
spond to the stimulation at a frequency of 7.5 Hz.
(a) Result on the GM-CSF interface.
(b) Result on the WM-GM interface.
Figure 5.25: Comparison of retinotopic map results obtained by reconstructing with MN on
the GM-CSF and on the WM-GM interfaces. Statistical tests were run on the Fourier coeffi-
cients (p=0.05 with 15000 permutations). Data correspond to the stimulation at a frequency
of 7.5 Hz.
5.5.3.4 Localization results beyond simple ℓ2 inverse solvers.
The ℓ2-norm vs. the ℓ212-norm. As observed above, a standard ℓ2 prior, a.k.a.
minimum-norm, tends to overestimate the extent of active regions. In order to reduce this
problematic behaviour of standard ℓ2 inverse solvers, we have proposed to invert all the
experimental conditions simultaneously and to promote non-overlapping activations by using
what we called in chapter 4 an inter-condition sparse prior. This prior, described in detail
in section 4.5, is based on a mixed norm with 3 levels where sparsity is induced between
conditions using an ℓ1 norm. The inverse solver is no longer linear but the problem is
still convex, which makes it possible to perform the current estimation with very efficient
algorithms. In the following results, the optimization with the ℓ212 prior is performed with
Nesterov’s iterative scheme (cf. algorithm 4.4 in chapter 4).
When working with an ℓ212 prior, the measurements are indexed with a triple index. Here
we concatenate the Fourier coefficients estimated on all trials. Each coefficient is indexed by
the sensor, the condition and the trial. Compared to what is presented in chapter 4, the last
index used here does not correspond to the time but to the trial. With 6 experimental conditions,
the matrix to invert has size $d_m \times 6 d_l$. After running the inverse solver, the estimated Fourier
coefficients on the source space form a matrix of size $d_x \times 6 d_l$. The same statistical procedure as
for the MN is then used to threshold the activation maps.
A comparison of mapping results obtained with a simple MN and an ℓ212 prior is presented
in figure 5.26. It can be observed that the extent of the activation for the right horizontal
meridian is significantly improved by using the ℓ212 prior. The active region now clearly lies
only in the calcarine sulcus which is consistent with our knowledge about the organization
of the primary visual cortex. However the activation for the upper right quadrant is still not
correctly localized.
The full retinotopic maps obtained with the ℓ212 prior are presented in figure 5.27.
With the MiMS solver. The MiMS inverse solver (cf. section 3.3.1) was also tested
on the same data by Benoit Cottereau during his thesis. Results obtained with his
method can be found in [6].
The MiMS solver belongs to the class of Bayesian inverse solvers in the sense that the
weights of an ℓ2 prior are learned. These weights are learned with a multi-resolution
approach based on multipolar modeling of spatially extended cortical parcels (cf. section 3.3.1).
In the approach presented in [6], the weights are learned on the full temporal data during the
stimulation period. Then the WMN linear inverse solver obtained with the learned weights is
used to get the source estimates. Cottereau then estimates the PSD at 15 Hz for each source
in each trial using a multitaper approach. The PSD estimator is based on Welch’s method (cf.
section 5.5.1). Finally, he performs on the PSDs a non-parametric permutation test similar
to the one we presented above.
Our approach and his approach differ on a number of points. First, Cottereau learns
on the full data even though he is using only the Fourier coefficient at 15 Hz in his
non-parametric statistical test procedure. This means that he may exploit in his localization
pipeline information in a wider region of the spectrum than solely 15 Hz. In order to test
this hypothesis, it would be interesting to band-pass filter the data around 15 Hz to see if the
MiMS inverse solver keeps providing the same localization results. Second, by learning the
weights with multipolar expansions, Cottereau limits the influence of the dipole orientations
on the results. Indeed, the multiresolution approach can be seen as a way to provide regions
of interest without really considering the orientation of each individual dipole. Once the
matrix used for linear inversion is computed, his strategy is very similar to the one we used in
this chapter. However, it is necessarily slower since the PSDs are actually estimated in the
source space using the reconstructed time series.
(a) Map obtained with MN.
(b) Map obtained with the ℓ212 prior.
Figure 5.26: Comparison of retinotopic mapping results obtained with a MN and with the ℓ212
prior. Results are presented on the left hemisphere using the GM-CSF interface as source space.
Statistical tests were run on the Fourier coefficients (p=0.05 with 15000 permutations). Data
correspond to the stimulation at a frequency of 7.5 Hz.
(a) Left hemisphere (Medial view) (b) Inflated left hemisphere (Medial view)
(c) Right hemisphere (Medial view). (d) Inflated right hemisphere (Medial view).
Figure 5.27: Retinotopic map result obtained with an inverse solver based on an ℓw;212 prior.
Statistical tests were run on the Fourier coefficients (p=0.05 with 15000 permutations). Data
correspond to the stimulation at a frequency of 7.5 Hz. Results are represented on the inter-
face between the gray matter and the cerebro-spinal fluid.
To our knowledge, Cottereau uses an FFT to extract the PSDs. This is not required if
the interest is only in the spectral coefficient at 15 Hz. A simple correlation with a complex
sinusoid is enough. It also avoids problems at the borders of the time interval, since it does not
require computing a circular convolution as with a discrete Fourier transform.
Taking the best of both approaches would consist in plugging into our pipeline the mul-
tiresolution approach in order to improve the MN results with the weights learned by the
MiMS procedure. This would favor more spatially regular activation patterns and certainly
improve the mapping results while keeping a computationally efficient procedure.
5.5.3.5 Effect of the orientation constraint
As mentioned when discussing the mapping strategy based on MiMS, a possible limitation of
the solvers we experimented with is their strong dependence on the orientation of the dipoles
sampled over the triangulated source space. A solution to circumvent this problem is to work with
unconstrained orientations. At each location of the source space lie 3 dipoles, each oriented in
the direction of a coordinate axis. This provides 3 coefficients at each location. The amplitude
of the activation at each location is then obtained by computing the norm of the vector formed
by these 3 coefficients. This provides a straightforward way to estimate the PSD at a given
location when no orientation constraints are used. The statistical test procedure can then be
computed identically on the PSDs estimated separately on the stimulation and the baseline
periods.
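The combination of the 3 coefficients at each location can be sketched as follows. The function name and the row layout of the coefficient matrix (the three orientations of a location stored on consecutive rows) are assumptions made for illustration.

```python
import numpy as np

def free_orientation_power(X):
    """Power per location from unconstrained-orientation coefficients.

    X: complex array of shape (3 * n_locations, n_trials), with the three
    orientation components of each location on consecutive rows (assumed).
    Returns an array of shape (n_locations, n_trials).
    """
    n_locations = X.shape[0] // 3
    components = X.reshape(n_locations, 3, -1)
    # Squared norm of the 3-vector of coefficients at each location
    return (np.abs(components) ** 2).sum(axis=1)

# Two locations, one trial
X_toy = np.array([[3.0], [4.0], [0.0], [1j], [0.0], [0.0]])
power = free_orientation_power(X_toy)
amplitude = np.sqrt(power)      # norm of the vector of 3 coefficients
```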
A result obtained with unconstrained orientations on the same data as in section 5.5.3.1
is presented in figure 5.28. It can be observed that the absence of orientation constraints
produces spatially smoother active regions. The border of the active region is less influenced
by the change of curvature in the source space. Results are consequently more robust to the
intricate structure of the cortical region neighboring the calcarine fissure. However, ignoring
the orientations tends to produce even wider active regions by spreading the current
estimates over the banks of the calcarine fissure. As a result, retinotopic mapping with an ℓ2 prior
also appeared to be very challenging when using no orientation constraints. The improved
robustness of the method to the complex anatomical structure of the occipital cortex comes with the
drawback of an increased tendency of the minimum-norm to smear the reconstructions over
wide cortical areas when the orientations are ignored.
5.6 TIMING VISUAL DYNAMICS WITH MEG
The primary reason for using M/EEG is to exploit its very good temporal resolution.
Up to here, we focused on the spatial precision of M/EEG source estimates and we exploited
the excellent temporal resolution of M/EEG by using frequency-tagged stimuli. By doing
so, the SNR is improved, which facilitates the localization. However, the ultimate objective
concerns the measurement of delays between cortical activations and particularly between
different visual areas.
5.6.1 Estimating timings in the visual cortex with M/EEG: Literature review
Using EEG in [213], the authors address this issue with fitted dipoles whose positions were
constrained with fMRI localization results. Dipoles are located in V1, jointly V2v and V3v,
jointly V2d and V3d, left and right LOV5 (Lateral occipital V5). Once the amplitude time
series are estimated for each dipole, the peaks of the waveforms can be compared. The delays
are estimated by measuring peak-to-peak time intervals. In [162, 190], delays of activation
in the visual system are also obtained with fMRI localizers and dipole fitting. To our knowledge,
timing of activations in the visual cortex with MEG has always been done in conjunction with
fMRI data.
Figure 5.28: Example of localization obtained with no orientation constraint using a MN.
(a) Right hemisphere (Medial view). (b) Right hemisphere (Medial view) on inflated cortex.
Statistical tests were run on the PSDs (p=0.05 with 15000 permutations). Data correspond
to the stimulation at a frequency of 7.5 Hz in the left lower quadrant of the visual field.
In all these studies, the delays observed correspond to a few milliseconds. This observation
is of major interest with respect to the frequency used in our steady-state stimulations.
Measurements based on the phase are limited to the interval between 0 and 2π, or equivalently
−π and π. It corresponds to a full cycle which lasts, at a frequency of 15 Hz, approximately 66.7 ms.
A measure of phase difference can therefore be related to a time delay as long as the delay
of interest is smaller than 66.7 ms. This is the case, in particular, for the delays between
activations in the visual cortex as reported in various studies [28, 162, 190, 213]. This
latter remark suggests that time delays could be extracted from the phase estimated on the
sources. A recent study from Di Russo et al. [191] tends to confirm this point.
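The relation between a phase difference Δφ and a time delay at the frequency f0 is Δt = Δφ / (2π f0). A minimal sketch is given below; the function name and the sign convention (a positive phase difference gives a positive delay) are assumptions, and the phase difference is first wrapped to one cycle.

```python
import numpy as np

def phase_to_delay_ms(dphi, f0=15.0):
    """Convert a phase difference (radians) into a delay in milliseconds."""
    # Wrap the phase difference to the interval (-pi, pi]
    dphi = np.angle(np.exp(1j * np.asarray(dphi, dtype=float)))
    return 1000.0 * dphi / (2.0 * np.pi * f0)

quarter_cycle = phase_to_delay_ms(np.pi / 2)   # a quarter of a 15 Hz cycle
full_cycle = phase_to_delay_ms(2 * np.pi)      # indistinguishable from no delay
```

The second call illustrates the ambiguity mentioned above: a delay of a full cycle cannot be told apart from a null delay.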
5.6.2 Extracting information from the phase
The Fourier coefficients at 15 Hz at the source level are denoted $X^{stim}$. The coefficient,
denoted $x^{stim}_{il}$, on row i and column l of $X^{stim}$ corresponds to the Fourier coefficient at 15 Hz for
the dipole at position i during trial l. In order to evaluate the quality of the phase information,
a quantity referred to in the M/EEG literature as phase lock can be computed [133].
We define in our case the phase lock, or phase locking value (PLV), for dipole at position i
by:
PLV(i) =1
dl
∣
∣
∣
∣
∣
dl∑
l=1
xstimil
|xstimil |
∣
∣
∣
∣
∣
.
It can be observed that PLV(i) ∈ [0, 1] and that PLV(i) is equal to 1 if all the angles, i.e., the
arguments, of the complex values $x^{stim}_{il}$ are the same. This means that the larger PLV(i)
is, the more stable the phase is across trials. This is illustrated in figure 5.29.
Figure 5.29: Schematic representation to illustrate the computation of the phase locking
value (PLV) and the angular information called APLV (see text). Panels: PLV, APLV.
A PLV map computed on the dataset that helped to illustrate section 5.5.3.1 is presented
in figure 5.30. A clear “hot spot” can be observed on the upper bank of the calcarine sulcus
which is consistent with the mapping results obtained above on the same dataset.
Once the PLV has been computed to verify that the phase contains information that is stable
across trials, we can investigate the angular part of the average vector. Please note that if a
quantity is stable across trials, it means that it is related to the stimulation. While the
absolute value of the average vector provides the PLV, the angle contains the phase information
Figure 5.30: Example of phase locking value (PLV) map. The closer the PLV is to 1, the more
stable the phase of the estimated Fourier coefficients is across trials. One can observe a clear
“hot spot” on the upper bank of the calcarine sulcus. This agrees with our knowledge of V1
for a stimulation in the lower left quadrant of the visual field.
which can provide information about delays. We call this quantity APLV and we define it by:
\[
\mathrm{APLV}(i) = \mathrm{ang}\left( \frac{1}{d_l} \sum_{l=1}^{d_l} \frac{x^{stim}_{il}}{|x^{stim}_{il}|} \right) \in [-\pi, \pi] .
\]
The computation of the APLV is illustrated in figure 5.29.
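The two definitions above can be sketched in a few lines of NumPy. This is only an
illustration on simulated coefficients: the function name, the array layout (one row per
dipole, one column per trial) and the synthetic data are assumptions, and coefficients with
zero modulus would need to be handled separately.

```python
import numpy as np

def plv_aplv(x_stim):
    """x_stim: complex array (n_dipoles, n_trials) of Fourier coefficients."""
    unit = x_stim / np.abs(x_stim)      # keep only the phase of each coefficient
    mean_vec = unit.mean(axis=1)        # average unit vector across trials
    return np.abs(mean_vec), np.angle(mean_vec)   # PLV in [0, 1], APLV in [-pi, pi]

rng = np.random.default_rng(0)
n_trials = 50
# Dipole 0: phase tightly clustered around pi/3. Dipole 1: uniformly random phase.
locked = 2.0 * np.exp(1j * (np.pi / 3 + 0.05 * rng.standard_normal(n_trials)))
unlocked = 2.0 * np.exp(1j * rng.uniform(-np.pi, np.pi, n_trials))
plv, aplv = plv_aplv(np.vstack([locked, unlocked]))
```

In this simulation the phase-locked dipole yields a PLV close to 1 with an APLV close to π/3,
while the dipole with random phase yields a low PLV, for which the APLV carries no usable
information.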
5.6.3 Preliminary results
In previous sections, we detailed our strategy for the analysis of the phase information at the
cortical level in order to investigate delays. The PLV provides a principled way to assess whether
the phase is stable across trials and therefore whether it contains information related to the
stimulation. If the PLV is close to 1, as is the case with our example in figure 5.30, the APLV
provides angular information also related to the stimulation, which might lead to new insights on
the delays. A map of APLV restricted to the regions flagged as active by permutation tests on
the Fourier coefficients is provided in figure 5.31.
Such a map shows the presence of distinct values of angle in the occipital region. However,
it appears to be hard to directly interpret the differences of angle between two regions as a
delay of propagation. Given the numerous steps required to obtain such an image, we acknowledge
that such an interpretation requires much more validation, possibly by testing with multiple
stimulation frequencies. Our results on the phase are very preliminary and are presented here
just to motivate the study of delay estimation with MEG data and steady-state visual stimulation.
Also, we should recall that phase or delay estimation comes after the mapping, which means that
the mapping procedure should first be considered as a solved problem.
5.7 DISCUSSION
In this chapter, we demonstrated that fast retinotopic mapping of V1 with MEG
could be achieved using relatively simple mathematical and algorithmic tools. This was
made possible thanks to a set of technical decisions from the design of the protocol to the
(a) Full right hemisphere. (b) Zoom on occipital cortex.
Figure 5.31: Sample phase map used for delay estimation. The quantity represented is the
APLV (see text) restricted to the active region delineated by permutation test on the Fourier
coefficients. Data correspond to the stimulation at a frequency of 7.5 Hz in the lower left
quadrant of the visual field. Results are represented on the inflated interface between the
white matter and the gray matter.
data analysis. The first decision was to use a steady-state stimulation protocol with a flicker-
ing pattern. With a steady-state stimulation and a frequency tagged stimulus, it is possible
to automatically extract from the spectrum of the signal the relevant information. It is not
the case with a standard study based on event related potentials (ERPs). With an ERP based
protocol, the information is extracted at the peak of the waveform in the time domain. The
peak of activation is not stable across conditions and subjects, therefore the extraction re-
quires a manual intervention. The second advantage of the frequency tagged stimulus is that
it allowed us to extract the signal of interest from raw data. All the results presented in this
chapter were obtained without any data cleaning. We used for the stimulation a contrasted
pattern with multiple orientations in order to increase the amplitude of the neural response
and consequently improve the SNR. Finally, we decided to solve the inverse problem with a
distributed source model in order to be able to reconstruct the activation pattern produced
by an unknown number of active sources but also to obtain mapping results with spatially
extended active regions. These concomitant choices of protocol and data analysis strategy
allowed us to obtain a fully automatic procedure for the retinotopic mapping of V1 with MEG.
Our understanding of the different steps in the pipeline enabled us to significantly speed
up the computation by limiting the computation of the solution of the inverse problem to
the Fourier coefficients of interest and by performing the non-parametric tests with simple
linear algebra. In this chapter, we argued that to actually achieve retinotopic mapping it is
necessary to have a pipeline where all the experimental conditions are processed with the
same parameters. In our analysis of the data, we tried to find solutions where manual tuning
was not required for each experimental condition independently. This led us to the conclusion
that a possible solution was to invert the measurements, or more precisely here the Fourier
coefficients, for all the conditions simultaneously.
Such a multi-condition analysis was presented above using a regularization prior based
on an $\ell_{w;212}$ mixed norm. In section 5.5.3.4, we demonstrated that this prior can actually
improve the mapping results. Our experience with the $\ell_{w;212}$ norm shows that it clearly helps
to delineate the active regions for each condition. It sets the reconstructions to 0 over regions
where other conditions are more likely to be active and it enhances the amplitudes of the
reconstructions in other regions. By doing so, it helps to control the spatial extent of active
patterns and it improves the mapping.
Using the $\ell_{w;212}$ prior implies an increase in computation time. However, the efficient
algorithms detailed in chapter 4 allowed us to get results with an $\ell_{w;212}$ prior in a few
minutes with highly sampled cortical meshes. Our experience also shows that the $\ell_{w;212}$ prior
does not help to obtain significantly active dipoles for conditions where the MN has failed to
detect anything significant.
During our exploration of these data, we met a set of difficulties. Among these is the
approximate estimation of the dipole orientations when they are fixed by the normals to the
mesh. The cortical region neighboring the calcarine fissure can be very intricate, which makes
the estimates of the normals to the gray matter quite noisy in this brain region. The sensitivity
of the results to the source space is clearly illustrated above, where reconstructions on the
WM/GM and GM/CSF interfaces are compared. However, using unconstrained orientations
creates other problems already discussed.
Our interest in solvers other than MN was largely motivated by the difficulties we met
when addressing the problem of retinotopic mapping with MEG. In our investigations on
these data, we tried to use the ℓ21 prior (cf. section 4.4.2) but, as could have been expected,
it did not provide very good results: this solver by construction sets an ℓ1 norm over space
and consequently cannot really reconstruct spatially extended activations. We also tried to
use the Gamma-MAP inverse solver (cf. section 3.3.2) but it appeared quite difficult to obtain
source covariance templates and a noise covariance estimate good enough to produce good
results. This is probably due to our limited expertise with this solver on real data. In order to
promote spatially smooth reconstructions, we also tried a prior based on the ℓ2 norm of the
surface gradient. We called this solver HEAT in chapter 3. However, results with this solver
show that such a prior can significantly change and degrade the localization results when the
cortical region of interest is particularly intricate.
In our investigations, we also tried to use criteria other than the 10% rule of thumb from
Brainstorm to estimate the lambda in the MN inverse solver. We present in figure 5.32 and
figure 5.33 a retinotopic mapping obtained using the GCV and the L-curve methods. The
lambda parameter was estimated independently with the Fourier coefficients obtained in
each trial (baseline and stimulation). This result presents smaller active regions in comparison
to the results in figure 5.24(c). With the GCV, the active region for the upper left quadrant
of the visual field (lower bank of the calcarine sulcus on the right hemisphere) almost
disappears, while with the L-curve it is completely removed. This suggests that the GCV and the
L-curve both tend to estimate a regularization parameter smaller than the one obtained with
Brainstorm’s 10% rule. Also, we can conclude from this example that the L-curve tends to
provide a value of lambda smaller than the GCV. This agrees with the recurrent claim that
the L-curve approach tends to “under regularize” the inverse problem of M/EEG.
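As a reminder of what the GCV criterion computes, here is a minimal sketch for a
Tikhonov-regularized minimum-norm solution. It is a generic textbook formulation, not the code
used to produce figures 5.32 and 5.33. The gain matrix G, the data vector m and the grid of
candidate values are hypothetical, and G is assumed to have fewer rows (sensors) than columns
(sources) and full row rank.

```python
import numpy as np

def gcv_lambda(G, m, lambdas):
    """Pick the Tikhonov parameter that minimizes the GCV score for m = G x + noise."""
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    beta = U.T @ m                          # data expressed in the left singular basis
    scores = []
    for lam in lambdas:
        r = lam / (s**2 + lam)              # eigenvalues of I - H(lam), H the influence matrix
        scores.append(len(m) * np.sum((r * beta) ** 2) / np.sum(r) ** 2)
    scores = np.asarray(scores)
    return lambdas[np.argmin(scores)], scores

rng = np.random.default_rng(0)
G = rng.standard_normal((20, 100))          # hypothetical gain matrix: 20 sensors, 100 sources
x_true = np.zeros(100)
x_true[:5] = 1.0
m = G @ x_true + 0.5 * rng.standard_normal(20)
lambdas = np.logspace(-4, 4, 50)
lam_gcv, scores = gcv_lambda(G, m, lambdas)
```

Working in the singular basis avoids forming the influence matrix explicitly, so scanning a
grid of candidate values costs a single SVD of G.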
From the investigations and preliminary results presented in this chapter we can draw
some conclusions. The first one is that using the phase of the Fourier coefficients when
performing statistics can be very useful to improve the mapping results. It actually increases
the power of the test. However, our experience suggests that estimates of the phase
require long periods of stimulation to be robust. In the results presented above,
the phase was estimated on a stimulation period of 5 s. With the same dataset, we
tried our mapping pipeline after artificially shortening the period of stimulation. By doing so,
we observed that the mapping quality rapidly starts to degrade when the phase is estimated
on less than 3.5 s of MEG signal. We have also run the same computations when working with
only the PSDs. It appeared that the mapping is still relatively stable even when the periods
of stimulation last 3 s in each trial. This suggests that the phase of the Fourier coefficients
(a) Left hemisphere with GCV.
(b) Left hemisphere with L-curve.
Figure 5.32: Comparison of mapping results obtained using the GCV and the L-curve meth-
ods to estimate the regularization parameter. Statistics are performed on the Fourier coeffi-
cients. Data correspond to the stimulation at a frequency of 7.5 Hz in the right visual field.
Results are represented on the WM/GM interface.
(a) Right hemisphere with GCV.
(b) Right hemisphere with L-curve.
Figure 5.33: Comparison of mapping results obtained using the GCV and the L-curve meth-
ods to estimate the regularization parameter. Statistics are performed on the Fourier coef-
ficients. Data correspond to the stimulation at a frequency of 7.5 Hz in the left visual field.
Results are represented on the WM/GM interface.
5.8 CONCLUSION
While in the literature many studies involve both fMRI and M/EEG data when investigating
the human visual cortex, we demonstrated in this chapter that basic retinotopic mapping
of V1 is possible with MEG data only. Our review of the literature showed that our approach,
based on MEG measurements, steady-state stimuli and distributed inverse solvers,
had not been attempted before. The steady-state stimuli gave us an automatic way to extract
the signal of interest from raw data without any artifact rejection method or filtering.
Our expertise on inverse solvers allowed us to design a very efficient pipeline to estimate the
Fourier coefficients of interest at the source level. Finally, the non-parametric statistical
method we detailed above allowed us to extract significantly active regions very efficiently.
During this study, we faced a set of difficulties. The most important appeared when mapping
multiple conditions. What was particularly challenging was the fact that the different
conditions, i.e., the positions of the flickering pattern in the visual field, were producing data
with differences in quality and SNR. The amplitudes of the signal of interest were different at the
sensor level depending on the depth of the source, its orientation and its position (ventral or
dorsal). This issue motivated our methodological contribution: a solver where all the
experimental conditions are inverted simultaneously. This solver is based on a mixed norm where
the overlap of active regions is penalized using an ℓ1 norm. We called this regularization an
inter-condition sparse prior.
To conclude this chapter, we would like to mention that this study was at the center of our
work during this thesis. It motivated most of our methodological investigations and
contributions. It was also our first occasion to participate in data acquisitions and the first time
we were confronted with the problem of controlling an experimental protocol. We can now fully
appreciate the fact that this step of the study is nontrivial and particularly important when
dealing with functional brain imaging data.
CHAPTER 6
TRACKING CORTICAL
ACTIVATIONS WITH
SPATIO-TEMPORAL CONSTRAINTS
The work presented in this chapter goes one step beyond inverse modeling and source
localization. It aims at providing a way to sketch the evolution of the cortical activations after
stimulus onset. The proposed method builds on top of standard and widely used linear
inverse solvers and proposes a strategy to extract spatiotemporally consistent, i.e.,
physiologically relevant, active patterns. The limitations of classical linear inverse solvers are well
known. Indeed, as discussed in chapter 3, they tend to smear the estimated distributions
of currents over the cortex, and they do not achieve a selection of active regions by setting
coefficients to zero, as could be done with a sparsity inducing prior. This work gives a
principled method to achieve such a selection and to remove spurious activations that might be
introduced by basic instant by instant linear inverse solvers.
Exploiting the graph structure of the triangulated cortical surface and the high time
sampling of M/EEG recordings, neural activations are tracked over time using a very efficient
graph cut based algorithm. Such an approach computes a minimum cut on a specially
designed weighted graph, imposing spatiotemporal regularity constraints on the activation
patterns. Labels are assigned to each node of the graph, distinguishing between active and
non-active conditions. The method works globally on the full time period of interest, can
cope with spatially extended active regions and allows the active domain to exhibit topology
changes over time. The algorithm is illustrated and validated on synthetic data. Application
of the method to two MEG datasets demonstrates the ability of the algorithm to track cortical
activations in the primary visual cortex and the somatosensory cortex.
Contents
6.1 Introduction
6.2 Tracking with Graph Cuts on a Triangulated Surface
6.2.1 From Thresholding to Tracking
6.2.2 Discretization on a Triangulation
6.2.3 Tracking Results with Synthetic Data
6.3 Application to M/EEG Data
6.3.1 Results on visual stimulation
6.3.2 Results on somatosensory data
6.4 Conclusion
188 CHAPTER 6. TRACKING CORTICAL ACTIVATIONS
6.1 INTRODUCTION
By providing instantaneous measurements of the weak electromagnetic fields generated
by neural activations, M/EEG offer a way to estimate neural currents with a millisecond
temporal resolution. Given these current estimates, which can be seen as images of the active
brain, a new challenge consists in studying the spatiotemporal evolution of neural activities
rather than only localizing specific brain areas involved in experimental tasks [136].
A parallel can be drawn between the challenge proposed here and the problem referred
to as tracking in image processing (see [233] for a recent survey). Both applications have
some similarities and some differences. With neuroimaging data, activations can rapidly
move from one part of the brain to another via white matter tracts; the signals in the
connecting axons are not captured by MEG or EEG. This phenomenon can be compared to
the occlusion problem in video sequences. Brain activations can move and appear while their
intensities and contrasts change over time, which has some similarity with the illumination
problem. On the other hand, there are also some fundamental differences. Brain activations
are not rigid objects, which makes constraints such as point-to-point correspondence between
time instants [126], common motion constraints or shape priors [138] irrelevant. The topology
and the shapes of the active brain regions can evolve over time. This suggests that the track-
ing methods developed in the computer vision community can provide principled methods to
capture the dynamics of active brain regions, but cannot be applied directly. In the context of
brain functional imaging with M/EEG where the activations are defined over a triangulated
cortex with a natural graph structure, combinatorial optimization techniques based on graph
cuts are the most relevant. Graph cut based techniques in the computer vision community
were first applied for image restoration [96] and segmentation [21, 231]. Examples of video
segmentation via graph cuts are presented in [21], where a complete set of frames is provided
as input to the algorithm that treats the entire sequence as a 3D grid of pixels. Related work
such as [122, 232] present results of object tracking using graph cuts by solving the prob-
lem frame by frame. In this chapter, a graph cut based approach is designed to achieve the
tracking of brain activations and a global optimization on the full temporal data is advocated.
As discussed in chapters 3 and 4, when considering distributed source models, the inverse
problem is ill-posed and therefore requires priors to be set on the solution. These priors can,
for example, be based on the subject’s anatomy when constraining the sources to lie on the
triangulated cortex, or on results from other imaging modalities such as fMRI. Schematically,
setting a prior on the solution consists in defining a norm and finding the solution that has
the smallest norm among the ones that explain the measured data sufficiently well.
Commonly, this norm is an ℓ2 norm. Such constraints lead to linear
solutions (cf. chapter 3). Even if these methods are known to smear the estimated
distributions of cortical currents, often leading to solutions that are too widely extended, they are
still considered as the standard methods. Technical reasons for the success of such inverse
procedures are that they are easy to implement, make very few assumptions on the solutions,
are very fast to compute and are relatively robust considering the level of noise present in real
M/EEG datasets. More importantly, they are used in the M/EEG community because they
provide localization results that are sufficiently accurate. There are some cognitive
neuroscience studies where the precision of the localization is not critical. In contrast, high
precision is required for pre-surgical recordings, e.g., for epileptic patients.
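The link between an ℓ2 prior and a linear solution can be made explicit with the standard
Tikhonov form below. The notation is assumed for this illustration only: m is the vector of
measurements at one time instant, G the gain matrix, x the source amplitudes and λ > 0 the
regularization parameter. The estimate is obtained by applying a fixed matrix to m, hence the
term linear inverse solver.

```latex
\hat{x}_\lambda
  = \arg\min_{x} \; \|m - Gx\|_2^2 + \lambda \|x\|_2^2
  = G^\top \left( G G^\top + \lambda I \right)^{-1} m ,
  \qquad \lambda > 0 .
```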
Basic inverse solvers do not integrate constraints on the spatial or temporal regularity of
the activations, although physiology imposes such regularity on cortical activations. Various
contributions have proposed methods to integrate such smoothness priors in an ℓ2 inverse
problem by adding, for example, spatial or temporal smoothing operators, like a Laplacian,
in the regularization (see section 3.2.2.4). Such Laplacian based methods appear to be difficult
to use with real data as it is unclear how the smoothness prior affects the localization.
Depending on the complexity of the cortical sheet structure around the active brain region,
a bad spatial smoothness prior can strongly bias the localization result. More recently, the
use of a mixed norm has been proposed to better constrain the M/EEG inverse problem (cf.
section 4.4.2). However, such a method, based on an ℓ1 prior over space and an ℓ2 prior over
time, has difficulty coping with spatially extended activations. With an ℓ1 prior
over space, the lower the SNR, the more the inverse problem is penalized and the more focal
the active region is. Also, no matter how good the SNR is, the ℓ1 prior implies that an optimal
solution has fewer active focal sources than the number of sensors. This implies that with a
detailed source space an extended region cannot be reconstructed using such a prior. In order
to cope with this problem, one could introduce a TV prior with temporal regularization, but
such a solver would not be very tractable on real datasets. Since [9], other techniques based
on Bayesian estimation have also been applied to the M/EEG inverse problem (cf. section 3.3).
Such methods aim at estimating the unknown source covariance matrix in order to learn an
appropriate prior to regularize the inverse problem. The localization precision obtained in
simulation studies using these methods is very promising, but these approaches are not flawless.
First, these methods rely on the assumption that the source covariance is stable in the time
window during which the learning step is achieved. This is not true with real data, especially
for the long time windows that are considered when looking at late brain responses. Second, the
results obtained with these methods depend on the source covariance templates defined as
input. Even if the Bayesian estimators developed within these frameworks aim at selecting
the appropriate templates, the choice of these templates can have a strong impact on the
localization results. We refer the reader to the end of chapter 3 for a discussion on this topic.
Figure 6.1: Schematic illustration of spatiotemporal active cortical regions. Ω (resp. Ωc)
indicates the active (resp. non-active) region. Ωt is the restriction of Ω to time t.
Such an observation leads to the conclusion that, even if various recent contributions
offer very promising methods to solve the M/EEG inverse problem, standard linear inverse
methods, and more specifically simple ℓ2 priors, provide neural current estimates that are
good enough to track the dynamics of distributed and rapidly evolving cortical current
patterns in a principled manner. The method presented in this contribution provides a way
to follow over time the “hot spots”, i.e., the active regions (cf. figure 6.1), while preserving
spatiotemporal regularity. In the framework detailed in this chapter, the topology of active
regions can change over time. They can appear, split, merge, and disappear. This makes the
method able to handle spatially extended active regions while allowing the active domain to
evolve during the time window of interest. To our knowledge, no other existing method offers
such possibilities. Thanks to recent implementations of graph-based algorithms, the method
detailed here is tractable on real datasets and offers a very efficient tool to capture the brain
dynamics.
The rest of this chapter consists of two parts. Section 6.2 presents the optimization frame-
work that is used to select coherent spatiotemporal activations defined over a triangulated
mesh. A variational formulation of the tracking problem is introduced with its discretization
over a triangulation, leading to an optimization problem that can be very efficiently solved
using a graph cut algorithm. Section 6.2 concludes with a validation on synthetic datasets.
(a) Thresholding (b) Regularized thresholding (c) Tracking on spatiotemporal domain
Figure 6.2: From thresholding to tracking.
Section 6.3 presents the application of the algorithm to MEG data with two different datasets
exhibiting activations in the primary visual cortex and the somatosensory cortex. Even if the
results presented are obtained with MEG data, the method can be directly applied to neural
currents estimated from EEG data as well.
6.2 TRACKING WITH GRAPH CUTS ON A TRIANGULATED SURFACE
6.2.1 From Thresholding to Tracking
Let f be a real valued function, defined over a domain ∆
f : ∆→ R
When ∆ contains a temporal dimension, finding an “active” region, denoted Ω, vs. a “non
active” region of ∆, denoted Ωc can be viewed as detecting activity over time. The regions Ω
and Ωc form a partition of ∆, i.e., Ω ∩ Ωc = ∅ and Ω ∪ Ωc = ∆. The function f encodes the
likelihood for an element of ∆ to be inactive, and is thus assumed to take small values in
active regions.
A coarse tracking result can be obtained by simple thresholding, i.e.,
$\Omega^* = \{x \in \Delta \ \text{s.t.}\ f(x) \leq T\}$ where $T \in \mathbb{R}$ is the thresholding
value. However, results obtained by thresholding can be
very noisy when f is corrupted by noise. Results are considered to be noisy if the border of
the active region is irregular or if Ω consists of very small active regions. This is illustrated
in figure 6.2(a). It can be shown that the result obtained by thresholding is the solution of the
following variational problem:
\[
\Omega^* = \arg\min_{\Omega} \int_{\Omega} f(x)\,dx + \int_{\Omega^c} T\,dx = \arg\min_{\Omega} D(\Omega) . \tag{6.1}
\]
D(Ω) is a data fidelity term. One can improve the thresholding by forcing the solution to be
regular. This is done using a Lagrangian approach that adds a term to (6.1) that penalizes
solutions Ω based on a measure of their regularity $R(\Omega) \in \mathbb{R}^+$. Equation (6.1) becomes
\[
\Omega^* = \arg\min_{\Omega} D(\Omega) + \lambda R(\Omega) , \quad \lambda \in \mathbb{R}^+ . \tag{6.2}
\]
To improve robustness to noise, the regularity measure should prevent the occurrence of
small isolated regions. If the domain ∆ is only spatial, such a regularity can be enforced by
penalizing the solution Ω∗ by the length of its border ∂Ω∗ [158]. Figure 6.2(b) illustrates a
result obtained with such a regularization.
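The effect of the regularization can be checked on a toy example. The sketch below is a
discrete analogue of (6.1) and (6.2) on a 1D chain of points, where the length of the border
is replaced by the number of label changes along the chain. The values of f, T and λ are
arbitrary, and the exhaustive search is only viable for such a tiny domain.

```python
import itertools
import numpy as np

def energy(labels, f, T, lam):
    """Discrete analogue of (6.2) on a 1D chain; labels[i] = 1 means x_i is in Omega."""
    labels = np.asarray(labels)
    data = np.sum(np.where(labels == 1, f, T))      # D(Omega)
    border = np.sum(labels[1:] != labels[:-1])      # number of border points of Omega
    return data + lam * border

def brute_force(f, T, lam):
    """Exhaustive minimization over all 2^n labelings."""
    best = min(itertools.product([0, 1], repeat=len(f)),
               key=lambda lab: energy(lab, f, T, lam))
    return np.array(best)

f = np.array([0.9, 0.8, 0.2, 0.9, 0.1, 0.2, 0.1, 0.9])   # small values = active
T = 0.5
thresholded = (f <= T).astype(int)
unregularized = brute_force(f, T, lam=0.0)    # minimizer of the data term alone
regularized = brute_force(f, T, lam=0.2)      # minimizer with the border penalty
```

Without the penalty, the minimizer coincides with plain thresholding. With λ = 0.2, the
isolated active point at index 2 is removed while the three-point active segment is kept.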
If ∆ is a spatiotemporal domain, imposing the regularity of ∂Ω can be achieved by
enforcing the restriction of the domain at each time instant to have a small perimeter but
also by enforcing an overlap between neighboring time restrictions of Ω. The regularization
measure R(Ω) can be separated into two parts: a spatial regularization measure, denoted
$R_{space}(\Omega)$, and a temporal measure, denoted $R_{time}(\Omega)$. A solution obtained using the penalty
term $R(\Omega) = R_{space}(\Omega) + R_{time}(\Omega)$ is illustrated in figure 6.2(c). By requiring the active
region Ω∗ to be regular over space and time, one creates the tubular structures that appear
in figure 6.2(c). Each of the tubular structures can be seen as an active region evolving over
time. Being able to exhibit such tubular structures on a triangulated domain, and therefore
to track the activations over time, is the objective of the algorithm proposed in this chapter.
6.2.2 Discretization on a Triangulation
Let us consider $\mathcal{T}$ a triangulation consisting of vertices $x_i$ and triangles $n_p$, and f a
function defined over the vertices of $\mathcal{T}$ and over time:
\[
f : (x_i)_i \times (t_k)_{k=1,\dots,K} \to \mathbb{R} \tag{6.3}
\]
The set of pairs of adjacent triangles is denoted by $\mathcal{E}$, i.e., “$n_p$ and $n_q$ are adjacent
triangles” is equivalent to “$(p, q) \in \mathcal{E}$ and $(q, p) \in \mathcal{E}$”. The restriction of f to an
instant $t_k$ is denoted by $f_{t_k}$.
To clarify the presentation, K is first supposed to be equal to 1, i.e., $f = f_{t_1}$. This
corresponds to the case where ∆ is only spatial, i.e., has no temporal dimension. On a
triangulation, partitioning ∆ into Ωc and Ω consists in assigning to each triangle $n_p$ a label 0 or 1.
The label 0 corresponds to Ωc and 1 to Ω. The integrals in (6.2) can be rewritten:
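\[
\int_{\Omega} f(x)\,dx = \int_{\Omega} f_{t_1}(x)\,dx = \sum_{n_p \in \Omega} \int_{n_p} f_{t_1}(x)\,dx
\quad \text{and} \quad
\int_{\Omega^c} T\,dx = \sum_{n_q \in \Omega^c} \int_{n_q} T\,dx
\]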
The perimeter of the active region is obtained by the discretization of ∂Ω using the edges of
the triangulation. The regularization term is here given by the sum of the lengths of the
edges separating Ω from Ωc:
\[
\int_{\partial\Omega} dl = \sum_{(p,q) \in \mathcal{E} \,/\, n_p \in \Omega,\ n_q \in \Omega^c} l_{pq}
\]
where $l_{pq}$ stands for the length of the edge between triangle $n_p$ and triangle $n_q$. Furthermore,
this regularization term is weighted by a constant denoted $\lambda_{space}$. The energy in (6.2)
becomes:
\[
\Omega^* = \arg\min_{\Omega} \sum_{n_p \in \Omega} D_p(1) + \sum_{n_q \in \Omega^c} D_q(0)
+ \lambda_{space} \sum_{(p,q) \in \mathcal{E} \,/\, n_p \in \Omega,\ n_q \in \Omega^c} l_{pq} \tag{6.4}
\]
where $\int_{n_p} f_{t_1}(x)\,dx = D_p(1)$, and $\int_{n_q} T\,dx = D_q(0)$ (0 and 1 refer to the 2 labels).
If f is assumed to be affine on each triangle, i.e., f is discretized with P1 elements,
$D_p(1) = a_p \left( f(x_{p_1}) + f(x_{p_2}) + f(x_{p_3}) \right) / 3$ where $p_1$, $p_2$ and $p_3$ are the
indices of the vertices of the triangle $n_p$, and $a_p$ stands for its area. Similarly, $D_q(0) = a_q T$.
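The discrete quantities entering (6.4) can be computed directly from the mesh. The following
sketch does so for a hypothetical mesh made of a unit square split into two triangles. The
vertex coordinates, the values of f at the vertices and the threshold T are arbitrary
illustrative choices.

```python
import numpy as np

# Hypothetical mesh: a unit square split into two triangles sharing the edge (0, 2).
vertices = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]])
triangles = np.array([[0, 1, 2], [0, 2, 3]])
f = np.array([0.1, 0.2, 0.3, 0.9])    # values of f at the vertices
T = 0.5                                # threshold

def triangle_areas(vertices, triangles):
    """Area a_p of each triangle from the cross product of two of its edges."""
    a, b, c = (vertices[triangles[:, k]] for k in range(3))
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)

areas = triangle_areas(vertices, triangles)
D1 = areas * f[triangles].mean(axis=1)   # D_p(1): integral of the affine f over n_p
D0 = areas * T                           # D_p(0): integral of the constant T over n_p

# Length l_pq of the edge shared by the two triangles.
shared = np.intersect1d(triangles[0], triangles[1])
l_pq = np.linalg.norm(vertices[shared[0]] - vertices[shared[1]])
```

The mean of the three vertex values times the area is the exact integral of an affine function
over a triangle, which is why the P1 assumption leads to such a simple expression.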
By rewriting (6.2) in this discrete form, the energy to minimize has been cast into a
Markov Random Field optimization framework [84] that can be very efficiently solved us-
ing graph-based methods [179]. These methods, that have recently been extensively used in
Computer Vision [23], establish the equivalence between energy minimization and finding
Figure 6.3: Energy discretization on a triangulated mesh.
Table 6.1: Edge weights, i.e., link capacities, of the graph for tracking on a triangulated
mesh. Graph nodes np are indexed by a space index p. N-Links of type “Spatial” control
spatial regularization.
T-Links      Weight
S → np       Dp(0)
np → T       Dp(1)
N-Links      Weight        Type
np ↔ nq      λspace lpq    Spatial
the minimal cut of a specially designed graph. They are commonly known as “graph-cut”
methods. The difficulty is to design a weighted graph providing a natural correspondence
between the partitioning of the graph and the energy that is minimized.
See Appendix B for a short introduction to graph cuts.
An illustration of the graph constructed for the current optimization is presented in
figure 6.3. Such a construction is inspired by [96], where a similar graph is used for binary image
restoration. Contrary to [96], or more recently to [21], where the graph is constructed on nD
grids, the current application requires working on temporal data defined on 2D triangulated
surfaces embedded in 3D.
Each triangle of the cortex mesh corresponds to one node of the graph. These nodes are
thus indexed like the triangles, i.e., using the notation np. There are two supplementary
terminal nodes : the “Source” S and “Sink” T that represent respectively the domains Ω
and Ωc. Each node np is linked to both S and T (these edges referred as T-links imply that
the triangle np can belong to one of the two domains represented by S or T ). Furthermore,
the graph contains edges between each of the nodes that correspond to adjacent triangles.
These edges are referred as N-links. Cutting the graph in two consists in separating the
“Source” from the “Sink” by removing some edges. With a minimal cut, each node will remain
connected to only one of the terminal nodes so that the remaining graph directly corresponds
to a partitioning of the mesh into two domains Ω and Ωc. Table 6.1 details the edge weights
corresponding to the energy (6.4). In practice, edge weights, i.e., link capacities, must be
positive. Therefore, prior to the computation of the minimum cut, edge weights are translated
to guarantee that they all satisfy this computational constraint.
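The translation of the weights can be sketched as follows. This is only one possible way to do it, not necessarily the one used in this work: since any S/T partition severs exactly one of the two T-links of a node, removing the same constant from both T-links of a node changes the cost of every cut by that constant and leaves the minimizing partition unchanged. The function name `shift_t_links` and the list-based inputs are illustrative assumptions.

```python
def shift_t_links(d0, d1):
    """Shift each node's pair of T-link weights so that both are non-negative.

    d0[p] and d1[p] stand for D_p(0) and D_p(1) (hypothetical list inputs).
    Returns the shifted weights and the total offset, so that
    original cut cost = shifted cut cost + offset.
    """
    shifted0, shifted1, offset = [], [], 0.0
    for a, b in zip(d0, d1):
        m = min(a, b, 0.0)  # shift only when one of the weights is negative
        shifted0.append(a - m)
        shifted1.append(b - m)
        offset += m
    return shifted0, shifted1, offset


# Example with one negative weight on the first node
s0, s1, off = shift_t_links([-1.0, 2.0], [0.5, 3.0])
```

The N-link weights λspace lpq are already non-negative as long as λspace ≥ 0, so only the T-links need this treatment in the sketch.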
One can notice that the cost associated with a cut, defined as the sum of the weights of the
edges severed by the cut, is equal to an energy value. The minimum cut thus provides the
optimum. Graph partitioning via minimum cut is, in turn, known to be equivalent to a problem
solvable in polynomial time: the max-flow problem [73]. Because an exact solution is obtained
with a single binary cut, the algorithm is extremely fast (a few seconds for the problems
addressed in this contribution) and globally optimal [179]. In practice, the min-cut
Table 6.2: Edge weights, i.e., link capacities, of the graph for tracking on a triangulated
mesh. Graph nodes np,k are indexed by a space index p and a time index k. “Spatial” N-Links
control spatial regularization and “Temporal” ones control temporal overlap between
neighboring time instants, and thus temporal regularization. ap is the area of triangle p and
lpq is the length of the edge separating triangle p and triangle q.

T-Links          Weight
S → np,k         Dp,k(0)
np,k → T         Dp,k(1)

N-Links          Weight        Type
np,k ↔ nq,k      λspace lpq    Spatial
np,k ↔ np,k+1    λtime ap      Temporal
is therefore obtained via the computation of the max-flow using an open source
implementation¹ [22].
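To make the construction of Table 6.1 concrete, here is a minimal sketch on a toy chain of five "triangles". It is not the implementation of [22]: for self-containment it uses a plain Edmonds-Karp max-flow written in pure Python, which is far slower on large graphs. The function names, the dictionary-based graph and the toy values of Dp(0), Dp(1) and lpq are all assumptions made for illustration.

```python
from collections import deque


def build_graph(d0, d1, edges, lam_space):
    """Arc capacities following Table 6.1; edges holds (p, q, l_pq) triples."""
    cap = {}
    for p, (a, b) in enumerate(zip(d0, d1)):
        cap[("S", p)] = a  # T-link S -> n_p, weight D_p(0)
        cap[(p, "T")] = b  # T-link n_p -> T, weight D_p(1)
    for p, q, length in edges:
        cap[(p, q)] = lam_space * length  # N-link, one arc per direction
        cap[(q, p)] = lam_space * length
    return cap


def min_cut(capacity, source, sink, eps=1e-12):
    """Edmonds-Karp max-flow; returns (cut value, nodes on the source side)."""
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0.0) + c
        residual[v].setdefault(u, 0.0)  # reverse arc for flow cancellation
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: reachable nodes form the source side
            return flow, set(parent)
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy data: node 1 is a "noisy" node whose data cost alone favors the sink side
d0 = [2.0, 0.9, 2.0, 0.1, 0.1]
d1 = [0.1, 1.0, 0.1, 2.0, 2.0]
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
value, source_side = min_cut(build_graph(d0, d1, edges, 0.5), "S", "T")
```

With λspace = 0.5 the cut keeps nodes 0, 1 and 2 on the source side (cut value about 1.9), whereas with λspace = 0 node 1 switches to the sink side, which is the effect of the spatial N-links.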
When considering multiple time instants, a similar approach is used. The nodes (np,k)p,k
are now indexed by the triangle index p and the time k. The full graph is obtained by stacking
spatial graphs obtained for each tk and adding N-links between triangles in neighboring time
instants. The number of nodes in the graph is now equal to the number of time instants times
the number of triangles. Both terminal nodes S and T are still unique. Edge weights can
now integrate temporal smoothness (see Table 6.2).
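In the optimization framework detailed above, the smoothness term becomes
$\lambda_{space} R_{space}(\Omega) + \lambda_{time} R_{time}(\Omega)$, where:

\begin{align}
R_{space}(\Omega) &= \sum_{k=1}^{K} R(\Omega_{t_k})
  = \sum_{k=1}^{K} \;
    \sum_{\substack{(p,q) \in E \,/\, n_{p,k} \in \Omega_{t_k}, \\ n_{q,k} \in \Omega^c_{t_k}}} l_{pq} \notag \\
R_{time}(\Omega) &= \sum_{k=1}^{K-1} \Bigg(
    \sum_{\substack{p \,/\, n_{p,k} \in \Omega_{t_k}, \\ n_{p,k+1} \in \Omega^c_{t_{k+1}}}} a_p
  + \sum_{\substack{p \,/\, n_{p,k} \in \Omega^c_{t_k}, \\ n_{p,k+1} \in \Omega_{t_{k+1}}}} a_p
  \Bigg). \tag{6.5}
\end{align}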
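As a sanity check on (6.5), the two regularization terms can be evaluated directly for a given space-time labeling. The sketch below assumes a hypothetical data layout: `labels[k][p]` is True when node np,k belongs to the active domain at time tk, each pair of adjacent triangles appears once in `edges` as `(p, q, l_pq)`, and `areas[p]` holds ap.

```python
def regularization_terms(labels, edges, areas):
    """Return (R_space, R_time) of equation (6.5) for a binary labeling."""
    r_space = 0.0
    for frame in labels:
        for p, q, length in edges:
            if frame[p] != frame[q]:  # edge (p, q) lies on the domain border
                r_space += length
    r_time = 0.0
    for k in range(len(labels) - 1):
        for p, area in enumerate(areas):
            # Triangle p switches state between t_k and t_{k+1}
            if labels[k][p] != labels[k + 1][p]:
                r_time += area
    return r_space, r_time


# Three triangles in a chain, two time instants
labels = [[True, True, False], [True, False, False]]
edges = [(0, 1, 1.0), (1, 2, 2.0)]
areas = [0.5, 0.5, 1.0]
r_space, r_time = regularization_terms(labels, edges, areas)
```

Here the border crosses the edge of length 2 in the first frame and the edge of length 1 in the second, and only triangle 1 changes state over time, so the two terms are 3.0 and 0.5.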
In this formulation the regularization parameters λspace and λtime do not depend on the
position in space or in time. However, they could be tuned independently for each edge (p, q).
One could for example use the curvature of the cortex to promote cuts in regions of high
curvature. However, without such a prior and for the sake of simplicity, λspace
and λtime were kept constant over space and over time.
The weights are detailed in Table 6.2. N-Links of type “Temporal” promote overlap be-
tween neighboring time instants and thus enforce temporal smoothness.
The complexity of the graph cut algorithm is $O(N^3)$, where N stands for the number of
nodes in the graph. In practice, however, as observed in [22] with nD grids, computation
time appears to increase linearly with the number of nodes (cf. figure 6.4). More than the
computation time, the limiting factor when dealing with large graphs is the memory
consumption of the implementation used in this contribution.
6.2.3 Tracking Results with Synthetic Data
The tracking algorithm is now illustrated on two synthetic datasets. The first simulation on
a randomly triangulated sphere is designed to be simple and to demonstrate the influence
of the regularization parameters. The algorithm is then applied to a more realistic dataset
exhibiting three simultaneous moving “hot spots” on a Bunny triangulation.
For the first dataset, the domain ∆ consists of a triangulated sphere with 3 time instants.
About 30 000 vertices are randomly sampled over the sphere and the triangles are obtained
with a Delaunay triangulation. The f function was generated to simulate the displacement
¹ http://www.adastral.ucl.ac.uk/~vladkolm/software.html
[Figure 6.4, panels: (a) CPU time (s) versus number of vertices in the triangulation, for 3, 9,
30 and 90 frames; (b) CPU time (s) versus number of time frames, for 5000, 10 000, 20 000 and
30 000 vertices.]
Figure 6.4: Computation times measured on a synthetic dataset. The computation time of
the tracking algorithm appears in practice linear with the number of vertices in the mesh (a)
and the number of time frames (b). Computation was run on an Intel Core 2 Duo 2.3 GHz
CPU with 2 GB of RAM.
of an activation over time, with the addition of a small active region only at the second time
instant (see figure 6.5(a)). The function f, taking the value 0 in active regions and 1 outside,
was then corrupted by additive Gaussian noise with a standard deviation equal to 1.
The tracking algorithm was applied to the data with a threshold T equal to 0.5. Results
are presented in two conditions: first, in figure 6.5(c), with only the spatial regularization
constraint, i.e., λspace = 2 and λtime = 0, and second, in figure 6.5(d), with both spatial and
temporal constraints active, λspace = 2 and λtime = 0.1. It can be observed that λspace > 0
induces spatially coherent regions while λtime > 0 causes the small region only present in
frame 2 to disappear. It can also be noticed that the result in figure 6.5(b) obtained with sim-
ple thresholding is extremely noisy. The method actually manages to select spatiotemporally
consistent activations.
In order to evaluate the sensitivity of the method to the choice of the regularization pa-
rameters, a simulation study has been performed. The computation has been run multiple
times with various pairs of parameters (λspace, λtime). For each pair, the result was compared
to the ground truth that was used to simulate the data, i.e., the active region in figure 6.5(a)
without the small false positive region in frame 2.
The error was quantified with 3 different measures. The first one is given by the ratio
between the number of mislabeled vertices and the total number of vertices, here 30 000 × 3 (cf.
figure 6.6). It can be observed in figure 6.6(b) that the method provides accurate results
with parameters in a wide range around the optimum, obtained with λ∗space = 2.3 and λ∗time =
0.04. The second performance measure is based on the number of connected components
obtained in the result. The right number of connected components is 3, one per time frame.
The number of connected components for each pair of parameters is provided in figure 6.6(c).
One can observe that the right number of components is correctly estimated for a large range
of parameters. Finally, the error in active areas is quantified using Dice’s coefficient (DC)
between two domains Ω and Ω′:

\[
DC(\Omega, \Omega') = \frac{2\,\mathrm{area}(\Omega \cap \Omega')}{\mathrm{area}(\Omega) + \mathrm{area}(\Omega')} \tag{6.6}
\]

which ranges from 0 (no overlap) to 1 (perfect overlap). One can observe in figure 6.6(d) that
Dice’s coefficient stays close to 1 for a large range of parameters around (λ∗space, λ∗time).
These observations confirm that the method is robust to an approximate definition of the
parameters. This is also confirmed by the following results on synthetic and real MEG data,
(a) Borders of simulated active regions.
(b) Thresholding result: λspace = 0 and λtime = 0.
(c) Tracking results with λspace = 2 > 0 and λtime = 0.
(d) Tracking results with λspace = 2 > 0 and λtime = 0.1 > 0.
Figure 6.5: Result of tracking using the graph cut algorithm on a synthetic dataset defined on
a randomly triangulated sphere with 30 000 vertices. Colored lines correspond to the border
of the active regions and the color codes for the time instant. Initial data f represented in (a)
is equal to 0 in active regions and 1 outside. Prior to the tracking, Gaussian white noise with
a standard deviation equal to 1 was added to f .
for which the definition of regularization parameters never actually required a very fine tun-
ing.
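Two of the performance measures used above, the ratio of mislabeled elements and Dice's coefficient (6.6), are simple enough to sketch. The representation below is an assumption made for illustration: domains are Python sets of triangle indices, `areas[p]` is the area of triangle p, and labelings are flat sequences of booleans. Returning 1 when both domains are empty is a convention chosen here, not something stated in the text.

```python
def dice_coefficient(omega, omega_ref, areas):
    """Area-weighted Dice coefficient (6.6) between two sets of triangles."""
    def area(domain):
        return sum(areas[p] for p in domain)

    denom = area(omega) + area(omega_ref)
    if denom == 0:
        return 1.0  # convention for two empty domains (assumption)
    return 2.0 * area(omega & omega_ref) / denom


def labeling_error(labels, labels_ref):
    """Ratio between the number of mislabeled elements and their total number."""
    mislabeled = sum(a != b for a, b in zip(labels, labels_ref))
    return mislabeled / len(labels)


areas = [1.0, 1.0, 2.0, 2.0]
dc = dice_coefficient({0, 1, 2}, {1, 2, 3}, areas)
err = labeling_error([True, True, True, False], [False, True, True, True])
```

In this toy case the overlap has area 3 while the two domains have areas 4 and 5, so the coefficient is 6/9, and two of the four labels differ.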
In a second dataset, three “hot spots” are moving simultaneously during 100 time frames
over a “Bunny” triangulation with about 8000 vertices. These data, also used as a validation
set in [135], are presented at 5 different time instants in figure 6.7(a). Such data were
designed to provide a more complex and realistic synthetic dataset with respect to the geometry
but also to the time scales of the brain activations measured by M/EEG. In order to respect
the convention that f should take small values in active regions, f was set to the opposite of
the actual activations defined over the mesh. According to the signal amplitude, the
parameter T was set to −4 × 10^−3. Prior to the tracking, additive Gaussian white noise with
standard deviation equal to 6 × 10^−3 was added to the synthetic data. Results of thresholding
and tracking are presented in figure 6.7(b) and figure 6.7(c). It can be noticed that the method
can actually cope with topology changes since figure 6.7(c) presents the merging of two active
regions. Here also, it can be observed that the tracking algorithm provides a clear view of the
dynamics of the activation defined over the triangulation.
6.3 APPLICATION TO M/EEG DATA
[Figure 6.6, panels, each plotted over λspace (0 to 6) and λtime (0 to 0.6): (a) Labeling errors
in logarithmic scale; (b) Labeling errors smaller than 1.5%; (c) Region where the estimated
number of components is equal to 3; (d) Dice’s coefficient.]
Figure 6.6: Labeling errors obtained by the tracking algorithm for various pairs of
regularization parameters (λspace, λtime). In (a), the error is quantified by the ratio between
the number of mislabeled vertices and the total number of vertices. The color-coded errors are
presented in logarithmic scale. The best performance is obtained with λspace = 2.3 and
λtime = 0.04, but the performance remains very acceptable with parameters in a wide interval
around these values. This is illustrated in (b), which shows the region in which errors are
smaller than 1.5%. Panel (c) shows the region where the number of components is correctly
estimated as 3. In (d) the performance is measured using Dice’s coefficient (6.6). The closer it
is to 1, the better it is. All performance measures confirm that the result is relatively robust
to the definition of the parameters λspace and λtime.
Frame 1 Frame 25 Frame 50 Frame 75 Frame 100
(a) Synthetic dataset on “Bunny” triangulation presented at multiple time instants.
(b) Tracking result with no regularization: λspace = 0 and λtime = 0
(c) Tracking result with spatiotemporal regularization: λspace > 0 and λtime > 0
Figure 6.7: Result of tracking using the graph cut algorithm on a synthetic dataset defined
on the “Bunny” triangulation with 8000 vertices and 100 time instants. The original data
consist of three moving “hot spots” illustrated in (a). Data were corrupted by additive
Gaussian white noise with a standard deviation equal to 6 × 10^−3. Figure (c) demonstrates
the ability of the method to cope with topology changes. Between frame 0 and approximately
frame 30 the 2 activations on the head of the bunny merge.
The tracking method presented above is now applied to two MEG datasets. The
first one is obtained with a visual stimulation paradigm that consists of a series of expanding
checkerboard rings. Such a stimulation creates a propagation of activation along the
primary visual cortex (V1) that enables direct application of the tracking algorithm. The second
dataset consists of a somatosensory finger stimulation. For this dataset, due to the different
amplitude levels of activations within the various brain regions involved in the processing of
the task, a particular data fidelity cost is designed before running the tracking algorithm.
6.3.1 Results on visual stimulation
In this experiment, expanding checkerboard rings extending radially from 0 to 4 degrees of
visual eccentricity are presented periodically with a frequency of 5 Hz (see figure 6.8). Because
of the retinotopic organization along the calcarine fissure and V1, the optical flow generated
by the expanding checkerboard rings produces a posterior-anterior wave of activation, as
illustrated by figure 6.9. It is this propagating wave within V1 that we propose to track.
Frame 1 Frame 2 Frame 3 Frame 4
Figure 6.8: One block of successive frames used to produce expanding checkerboard rings.
The block of 4 frames is projected periodically with a frequency of 5 Hz.
[Figure 6.9 labels: lingual gyrus, cuneus, praecuneus, calcarine fissure, parieto-occipital
fissure; arrow: propagation.]
Figure 6.9: Schematic representation of the cortical activation propagation produced by the
expanding checkerboard rings. The propagating wave covers the primary visual cortex (V1)
on both sides (superior and inferior) of the calcarine fissure, from the posterior to the anterior
part of the cortex (Adapted from 20th U.S. edition of Gray’s Anatomy of the Human Body,
1918, public domain).
Data acquisition and analysis
The MEG data were acquired at 1250 samples/sec with a 151-SQUID sensor CTF MEG (CTF
System Inc.). In each trial, rings were presented for 2.5 s after a prestimulation period of
1.4 s. The first 500 ms of stimulation correspond to a transitory period before 2 s of steady
state period (cf. figure 6.10). In order to improve the signal-to-noise ratio, 33 trials were
averaged and the data were band-pass filtered between 2.5 and 7.5 Hz. These data come
originally from the study presented by D. Cosmelli et al. in [41].
[Figure 6.10 timeline: prestimulation period (1.4 s), then, from t = 0, transitory period
(0.5 s) followed by steady state period (2 s).]
Figure 6.10: Experimental protocol for visual stimulation with the expanding checkerboard
rings. Prestimulation period lasts for 1.4 s before 2.5 s of stimulation. Stimulation period is
divided in two: the transitory period estimated around 0.5 s and the steady state period that
lasts for 2 s. Tracking is performed during the steady state period.
In order to estimate the source amplitudes, the forward problem was computed with a
spherical head model and a distributed source space consisting of a cortex triangulation with
about 50 000 vertices. An inverse solution was computed with an ℓ2 penalization term using
dipoles with unconstrained orientations. The source amplitudes, denoted sit (i indexes space
and t time), were obtained by computing the norm of the equivalent current dipole at each
location. The amplitude sit was then normalized using the prestimulation recordings. Assuming
stimulation starts at t = 0, this corresponds to zit = sit/σi, where σi is the standard deviation
estimated on (sit)t<0. By doing so, the zit are all positive and have large values in active brain regions.
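The normalization just described can be sketched as follows. The data layout (`s[i][t]` as nested lists, `times[t]` in seconds with the stimulus at 0) and the function names are assumptions made for illustration; the text does not say which standard deviation estimator was used, so the unbiased sample estimator below is also an assumption.

```python
import math


def dipole_amplitude(moment):
    """Euclidean norm of a 3-component dipole moment (unconstrained orientation)."""
    return math.sqrt(sum(c * c for c in moment))


def normalize_by_baseline(s, times):
    """Return z[i][t] = s[i][t] / sigma_i, with sigma_i estimated on t < 0.

    Needs at least two prestimulus samples and a non-constant baseline.
    """
    pre = [j for j, t in enumerate(times) if t < 0]
    z = []
    for row in s:
        baseline = [row[j] for j in pre]
        mean = sum(baseline) / len(baseline)
        var = sum((x - mean) ** 2 for x in baseline) / (len(baseline) - 1)
        sigma = math.sqrt(var)
        z.append([x / sigma for x in row])
    return z


def data_term(z):
    """Sign flip so that f takes small values in active regions: f_t(i) = -z_it."""
    return [[-v for v in row] for row in z]


s = [[1.0, 3.0, 2.0]]          # one source, three samples
times = [-0.002, -0.001, 0.0]  # two prestimulus samples, stimulus at t = 0
f = data_term(normalize_by_baseline(s, times))
```

Note that only a division by σi is applied, without subtracting the baseline mean, which is why the zit remain positive.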
Tracking
The tracking algorithm was run using the computed zit as data term. Following the convention
introduced above, this leads to ft(i) = −zit. The threshold T was set manually in order to
obtain active regions lying approximately within V1. To limit computation time and memory
consumption during processing, the tracking was performed independently on each period
of stimulation during the steady state period. With the 5 Hz stimulation frequency, each
period lasts 200 ms. The results of the tracking algorithm are presented in figure 6.11,
while a comparison between thresholding and tracking results is presented in figure 6.12.
Figure 6.11 shows a clear propagation of the activation along the calcarine fissure
and V1, from the posterior part of the cortex towards its anterior part. The detailed
comparison in figure 6.12 shows that the graph cut based spatiotemporal
regularization provides more consistent activation patterns by regularizing the propagation
front and removing false activations outside of the primary visual cortex.
(a) Medial view (b) Zoom of the calcarine fissure
Figure 6.11: Tracking results obtained with visual stimulation of expanding checkerboard
rings (λspace = 3, λtime = 1). Color encodes the first time of activation during the time
window considered, between 2310 and 2367 ms after the beginning of the stimulation. The
colormap was reduced to 6 colors to present a clearer representation of the propagation. Source
estimates were obtained with a spherical forward model and a minimum-norm inversion us-
ing unconstrained orientations. Source amplitudes at each position were normalized using
prestimulation recordings (see text). Triangulation has about 50 000 vertices. The graph cut
based spatiotemporal regularization provides consistent activation patterns by regularizing
the propagation front and exhibiting active regions in the primary visual cortex.
6.3.2 Results on somatosensory data
Data acquisition and analysis
Acquisition of the somatosensory data was done with the same MEG device that was used for
the visual stimulation paradigm. The somatosensory stimulation was an electrical square-
wave pulse delivered randomly to the thumb, index, middle, and little finger of each hand of
a healthy right-handed subject. The stimulus intensity was below the motor threshold. In
order to improve the SNR, 400 recordings were averaged for each finger. These data come
originally from the study presented by S. Meunier et al. in [150]. To produce precise tracking
results, the triangulation over which cortical activations have been estimated was sampled
with a very high number of vertices (about 55 000). The forward modeling was performed
with a spherical head model. The source activations were computed with an ℓ2 prior using
constrained orientations. The reason for using constrained orientations with this dataset is
that the shape of the cortical mantle around the brain regions activated by such a stimulation
is much simpler than in the occipital region around the calcarine fissure. Dipole orientations
provided by the normals to the mesh obtained by segmentation of the gray matter are in this
case better estimated.
Somatosensory stimulation is commonly used to validate M/EEG methods since it is known
that somesthetic inputs project to precise brain areas [104]. Among these areas are the
primary somatosensory cortex (S1) and the secondary somatosensory cortex (S2). While the
amplitude of the first activation in S1 is high, the activation that appears later after
stimulation in S2 is much weaker but still present. Consequently, a single threshold for both
activations in S1 and S2 is bound to fail. This issue can be compared to the illumination
problem when tracking objects in video sequences: the object keeps moving but its intensity
and contrast can change over time. To tackle this problem, the tracking algorithm on the
somatosensory dataset requires, as a preprocessing step, the construction of a particular data cost function
[Figure 6.12, panels: thresholding and tracking results at (a) 2310 ms, (b) 2344 ms,
(c) 2368 ms and (d) 2374 ms.]
Figure 6.12: Comparison between naive thresholding and tracking with spatiotemporal reg-
ularization. Tracking and thresholding results are presented at multiple time instants dur-
ing visual presentation of the expanding checkerboard rings. Thresholding corresponds to
λspace = 0 and λtime = 0, i.e., no regularization, while the tracking is performed with
λspace = 3 and λtime = 1. (a) illustrates how the tracking manages to remove the spatially inconsistent
activation on the lower part of V1. (b) illustrates how the tracking makes the incorrect acti-
vation on the anterior part of the parieto-occipital fissure (cf. figure 6.9) disappear. (c) shows
how the tracking fills the hole in the active region. This is consistent with the retinotopic
organization of V1 and the rings used in the visual stimulation. (d) is another illustration of
the regularization, here during the half period when the activation leaves V1.
that makes it possible to define a common threshold at any time after stimulation while still
using the essential information that is the source amplitude.
Designing f with heterogeneous activation levels
M/EEG data typically have a sampling rate around 1000 Hz, and the characteristic times of
the phenomena being recorded are of the order of a few milliseconds. Although this time of
course depends on the type of neural activations, it suggests that an activity is significant if
it lasts for a few consecutive time instants. Hence, the construction of a function f based on
small time windows rather than on each time instant is still relevant. From now on, k indexes
time windows rather than time instants. These windows are denoted by wk.
The natural idea behind the choice of f is that a vertex is very likely to be active during a
time window wk if its activity is close, in relative distance, to the activation of a source that
captures a significant amount of energy.
Let $\mathbf{a}_i^0(k)$ denote the activation time series of vertex $i$ during window $w_k$ (for
the time being the superscript 0 can just be ignored). The template $\mathbf{a}_{i_0}^0(k)$ is
defined as the time series that captures the highest amount of energy
($i_0 = \arg\max_i \|\mathbf{a}_i^0(k)\|_2$). The function f can now be defined over window $w_k$ by:

\[
f_k(i) = \frac{\|\mathbf{a}_i^0(k) - \mathbf{a}_{i_0}^0(k)\|_2}{2\,\|\mathbf{a}_{i_0}^0(k)\|_2} \in [0, 1] \tag{6.7}
\]
The function f is designed to take its values between 0 and 1 in order to facilitate the choice
of the threshold T . The influence of the activation level does not appear any more directly in
f but only in the choice of the template.
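A minimal sketch of the single-template cost (6.7) for one window is given below. The input layout (`series[i]` as the list of samples of vertex i in the window) and the function names are assumptions for illustration. The bound by 1 follows from the triangle inequality, since the template has the largest norm.

```python
import math


def norm2(x):
    """Euclidean (l2) norm of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x))


def single_template_cost(series):
    """Return (f, i0): f[i] as in (6.7) and the index i0 of the template.

    Assumes at least one time series is non-zero (otherwise the
    normalization divides by zero).
    """
    norms = [norm2(a) for a in series]
    i0 = max(range(len(series)), key=norms.__getitem__)  # highest energy
    template = series[i0]
    f = [
        norm2([u - v for u, v in zip(a, template)]) / (2.0 * norms[i0])
        for a in series
    ]
    return f, i0


# Four vertices, two samples per window
series = [[2.0, 0.0], [1.0, 0.0], [-2.0, 0.0], [0.0, 0.0]]
f, i0 = single_template_cost(series)
```

In this toy window the template itself gets f = 0, a vertex with the same shape at half the amplitude gets 0.25, a silent vertex gets 0.5, and a vertex in phase opposition with the same amplitude reaches the maximum value 1.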
It is however possible to use multiple templates since activations are very likely to be
simultaneously localized in different regions having different temporal types of activations.
This is done using a greedy algorithm similar to standard matching pursuit algorithms. Note
that such greedy approaches have been successfully used in the field of M/EEG with the
RAP MUSIC inverse problem solver [153]. In both procedures, the most significant source
is first estimated. Its contribution to the data is then removed before looking for the next
significantly active dipole. This continues until the data have been sufficiently well explained.
Although the RAP MUSIC algorithm uses signal subspaces, the idea developed here using
temporal templates is fairly similar.
Let $\mathbf{A}_k^0$ be the matrix of all activations during window $k$,
$\mathbf{A}_k^0 = (\mathbf{a}_i^0(k))_i$. Each column of $\mathbf{A}_k^0$ contains the activation
time series of one dipole. The objective is to select the best $L$ templates that capture most
of the activity during window $w_k$. The strategy to do so consists in selecting them
iteratively. Template $l$ is obtained after template $l-1$ by finding the column of
$\mathbf{A}_k^l = (\mathbf{a}_i^l(k))_i$ that has the biggest ℓ2 norm,
$i_l = \arg\max_i \|\mathbf{a}_i^l(k)\|_2$. The matrix $\mathbf{A}_k^{l+1}$ is obtained by
projecting the columns of $\mathbf{A}_k^l$ orthogonally to the vector $\mathbf{a}_{i_l}^l$:

\[
\mathbf{A}_k^{l+1} = \Pi_k^l \mathbf{A}_k^l
  = \left( \mathbf{I} - \frac{\mathbf{a}_{i_l}^l \, \mathbf{a}_{i_l}^{l\,T}}{\|\mathbf{a}_{i_l}^l\|_2^2} \right) \mathbf{A}_k^l \tag{6.8}
\]

The process ends when $\|\mathbf{A}_k^l\|_2$ becomes smaller than a certain percentage
$P \in [0, 1]$ of $\|\mathbf{A}_k^0\|_2$ ($\|\cdot\|_2$ stands for the Frobenius norm). Note that
since $\Pi_k^l$ is a projector, necessarily $\|\mathbf{A}_k^{l+1}\|_2 < \|\mathbf{A}_k^l\|_2$
must hold. When considering multiple templates the function f is defined by:

\[
f_k(i) = \max_{l=0,\dots,L-1} \frac{\|\mathbf{a}_i^l(k) - \mathbf{a}_{i_l}^l(k)\|_2}{2\,\|\mathbf{a}_{i_l}^l(k)\|_2} \in [0, 1] \tag{6.9}
\]

The procedure has the advantage that the time series
$\mathbf{a}_{i_0}^0, \dots, \mathbf{a}_{i_{L-1}}^0$ correspond to particular
brain locations. More fundamentally, this construction of f has the advantage of making the
threshold easy to set. However, during this work several other attempts were also made to
define f differently using statistical quantities. Using “noise normalized” inverse methods
like dSPM [49] and sLORETA [171], it appeared to be impossible to define a common
threshold leading to well localized activations both for the activation peak around 45 ms and
for later responses. Permutation tests [169] were also investigated using time-dependent
thresholds computed from a given p-value. Thresholds obtained using False Discovery
Rates (FDR) [85] were also tested. Both of these approaches, which are particularly demanding
in computation time, did not produce very satisfactory results. The proposed approach
has the advantage of speed and simplicity when it comes to selecting the parameters, making
the tool quite easy to use.
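The greedy template selection of equation (6.8) can be sketched in a few lines. This is a plain-Python illustration under assumed conventions, not the code of the toolbox: `series[i]` holds the time series of dipole i in the current window (one column of the activation matrix), and the loop stops once the Frobenius norm of the deflated matrix drops below P times its initial value.

```python
import math


def frobenius(columns):
    """Frobenius norm of a matrix stored as a list of columns."""
    return math.sqrt(sum(v * v for col in columns for v in col))


def select_templates(series, P=0.2):
    """Greedy selection of template indices i_0, i_1, ... following (6.8)."""
    cols = [list(a) for a in series]
    total = frobenius(cols)
    selected = []
    while total > 0 and frobenius(cols) >= P * total:
        energies = [sum(v * v for v in col) for col in cols]
        il = max(range(len(cols)), key=energies.__getitem__)
        energy = energies[il]
        if energy == 0:
            break  # nothing left to explain (only reachable when P == 0)
        template = cols[il]
        selected.append(il)
        # Project every column orthogonally to the selected template
        deflated = []
        for col in cols:
            coef = sum(u * v for u, v in zip(template, col)) / energy
            deflated.append([v - coef * u for u, v in zip(template, col)])
        cols = deflated
    return selected


# Dipoles 0 and 2 share one time course, dipole 1 has another, dipole 3 is weak
series = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.5, 0.0, 0.0], [0.0, 0.0, 0.1]]
templates = select_templates(series, P=0.2)
```

On this toy input the first template is dipole 0, whose removal also explains dipole 2, and the second is dipole 1; the remaining energy is then below 20% of the initial norm, so the weak dipole 3 is never selected.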
Tracking
The window size was set equal to 20 ms with an overlap between neighboring windows of
75%, i.e., 15 ms. For each window, f was computed using multiple templates with P = 0.2.
In practice, a maximum of 3 templates was chosen within each time window. The threshold
was set equal to T = 0.2. Results with the right index finger are presented in figure 6.13.
Color encodes time, or equivalently a window index, and indicates the border of the active
region in each time window. This result provides a representation of the brain dynamics
after stimulation of the right index finger. Cortical activations are successfully tracked over
time, as early as 30 ms after stimulus onset in left primary somatosensory cortex (S1), during
the displacement of neural activity along the postcentral gyrus all the way to the secondary
somatosensory cortex (left and right) and the left Brodmann area 5. This confirms what is
reported in [104] about the processing of such somatosensory tasks by the human brain. In
order to illustrate the influence of the spatiotemporal regularization on the solution, results
with no regularization at all, and no temporal regularization are provided in figure 6.14.
It can be observed that the spatiotemporal regularization is actually required to prune the
spurious activations.
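The windowing used above (20 ms windows with 75% overlap, hence a 5 ms step) can be sketched as follows. The function name and the sampling rate of 1000 Hz in the example are assumptions; the sampling rate of the somatosensory recordings is not restated in this section.

```python
def sliding_windows(n_samples, sfreq, length_s=0.020, overlap=0.75):
    """Return (start, stop) sample indices of overlapping windows, stop exclusive."""
    win = int(round(length_s * sfreq))                   # window length in samples
    step = max(1, int(round(win * (1.0 - overlap))))     # shift between windows
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, step)]


# 40 samples at an assumed 1000 Hz: windows of 20 samples shifted by 5 samples
windows = sliding_windows(40, 1000.0)
```

Each window then provides one index k in the space-time graph, so the temporal N-links connect windows that are 5 ms apart under these assumptions.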
(a) Result on the partially inflated cortex. The green dot corresponds to the location of the
equivalent current dipole located at 44 ms after stimulation. Colored lines correspond to the
border of the active region at different time instants (1 to 3).
(b) Zoom on the tracking result in S1.
Figure 6.13: Result of tracking using the graph cut algorithm on the somatosensory dataset.
Source estimates were obtained with a spherical forward model and a minimum-norm inver-
sion on MEG data obtained from a somatosensory evoked response study. Triangulation has
about 55 000 vertices. Data presented here are for the stimulation of the right index finger.
Cortical activations are tracked over time (λspace = 0.05 and λtime = 0.05), as early as 30 ms
after stimulus onset in left primary somatosensory cortex (S1), during the displacement of
neural activity along the postcentral gyrus all the way to the secondary somatosensory cortex
(left and right) and left Brodmann area 5 [104].
(a) Tracking result on inflated cortex: Left hemisphere
(b) Tracking result on inflated cortex: Right hemisphere
(c) Result with no regularization (λspace = 0, λtime = 0)
(d) Result with no temporal regularization (λspace > 0, λtime = 0)
Figure 6.14: Influence of the regularization on the tracking result on the somatosensory
dataset. Source estimates were obtained with a spherical forward model and a minimum-
norm inversion on MEG data obtained from a somatosensory evoked response study. Trian-
gulation has about 55 000 vertices. Data presented here are for the right index finger. It can
be observed that the spatiotemporal regularization is actually required to prune the spurious
activations.
6.4 CONCLUSION
In this chapter a method has been proposed to address the challenging problem
of robustly estimating the spatiotemporal evolution of neural activities rather than just lo-
calizing specific brain areas involved in experimental tasks. The approach is based on an
optimization over the active domain that penalizes spurious activations and extracts activity
with spatiotemporal regularity. A Lagrangian formulation is derived and it is shown how the
functional obtained can be discretized over a triangulation and efficiently optimized with a
graph cut based procedure.
The principled method proposed here allows the active domain to have topology changes.
It implies that active regions can appear, split, merge and disappear during the time window
of interest. A source is not necessarily flagged as active during the full time period of interest.
Thanks to the use of a very efficient graph cut implementation, with a linear complexity
observed in practice, the optimization can be run in a few seconds on real M/EEG datasets
with highly discretized cortical meshes.
A possible perspective for this work is to integrate into the temporal regularization term
a prior coming from diffusion MRI data. By measuring white matter anisotropy, such data
provide insights into subcortical neural pathways. With the graph-based formulation,
such information could easily be integrated by adding temporal edges between triangles
strongly linked by the subcortical fiber bundles present in the white matter. Possibly, such
links could be defined between non-neighboring time instants in order to model delays in the
propagation of the neural activations. Moreover, using the residual capacities of the edges
after computing the min-cut, it may be possible to quantify which subcortical edges have
influenced the solution, leading to fruitful insight about how the processing of the information
has been distributed across the different functional brain regions involved in the task.
The source code of the tracking algorithm and the demo scripts necessary to reproduce
the figures of this chapter are available in a Matlab toolbox called EMBAL (Electro-Magnetic
Brain Activity Localization):
https://gforge.inria.fr/projects/embal
CHAPTER 7
GRAPH-BASED ESTIMATION OF
1-D VARIABILITY IN
EVENT-RELATED NEURAL
RESPONSES
Up to here, much interest has been put on the forward and inverse problems, assuming the
signal of interest had been accurately captured by the M/EEG sensors. As illustrated
in this chapter, various factors can cause the estimation of the evoked neural response to be
biased.
Classical source estimates are computed from data obtained by averaging many repeti-
tions of recordings measured under the same conditions. The measurements are realigned
in time according to a common reference, typically the stimulus onset, and then averaged to
get a clean evoked potential (EP). However, by doing so, it is assumed that the evoked neural
response stays the same across the different repetitions, also known as trials. This is
unfortunately not true. The event-related potential (ERP) can for example vary in latency,
amplitude or frequency, typically because of habituation effects, anticipation strategies, or
fatigue of the subject.
In this chapter, we address the challenging problem of single-trial data processing. We
propose to make use of recent progress in graph-based methods in order to achieve param-
eter estimation on single-trial data and therefore limit the estimation bias on the evoked
response. The method presented guarantees global optimality of the solution, hence avoiding
initialization problems. Contrary to many alternative methods, it also avoids the use of the
average data in the computation and the necessity to define an a priori model for the re-
sponse. The algorithm is data-driven and works in two steps. First, the graph Laplacian
offers a convenient way of reordering the dataset with respect to the response latency. And
second, the actual estimation of the response latency across trials is performed in a robust
way with a graph cuts algorithm. The full processing does not require any manual tuning
since a method to automatically set the parameters is also detailed.
Results of a simulation study are presented, demonstrating the ability of the method to
handle datasets with low SNR, as is the case with M/EEG single-trial data. Results on an
EEG auditory oddball dataset are also presented.
Contents
7.1 Introduction
7.2 Manifold learning
7.2.1 Principal Component Analysis
7.2.2 Nonlinear embedding
7.2.3 Laplacian embedding algorithm
7.3 Spectral reordering of EEG times series
7.3.1 Toy examples
7.3.2 Spectral reordering with realistic time series
7.4 Robust latency estimation via discrete optimization
7.4.1 Optimization framework
7.4.2 Graph Cuts algorithm
7.4.3 Result of single-trial latency extraction
7.5 Parameter estimation and robustness
7.5.1 Parameter estimation
7.5.2 Validation
7.6 Discussion
7.7 Conclusion
7.1 INTRODUCTION
Stimulus-locked averaging is applicable for very early (primary) responses whose charac-
teristics are very stable, such as early somatosensory evoked potentials. However, for later
responses, corresponding to a more complex treatment of information by the brain, charac-
teristics such as latencies and amplitudes are usually variable across trials, typically because
of habituation effects, anticipation strategies, or fatigue [130]. For responses with latency
variability, signals time-locked to the stimulus onset are not properly aligned with respect to
the timing of the single-trial brain responses, and this leads to considerable blurring when
the averaged evoked response is computed. The event related brain response is hence inaccu-
rately estimated, and this can lead to wrong conclusions about the latencies between neural
activations [123]. This chapter provides methodological tools to estimate the variability, e.g.,
the latency, of brain responses across trials, and thus to correct for the averaging bias.
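The blurring caused by latency variability can be illustrated with a small simulation. The following is a minimal sketch, not taken from the thesis: the Gaussian-shaped response, its width, the jitter of 50 samples and the trial count are all illustrative assumptions, and one sample is treated as one millisecond.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_times = 200, 600
t = np.arange(n_times)

def bump(center, width=20.0):
    """Hypothetical evoked response: a Gaussian deflection of unit amplitude."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Single-trial responses whose latency varies around 300 samples.
latencies = 300 + rng.normal(0.0, 50.0, size=n_trials)
trials = np.stack([bump(c) for c in latencies])

# Stimulus-locked average: trials are averaged without realignment.
stimulus_locked = trials.mean(axis=0)

# Average after realigning each trial on its (here known) latency.
shifts = np.round(latencies - 300).astype(int)
realigned = np.stack([np.roll(x, -s) for x, s in zip(trials, shifts)])
response_locked = realigned.mean(axis=0)

print("peak of stimulus-locked average:", stimulus_locked.max())
print("peak of realigned average:", response_locked.max())
```

In this toy setting the stimulus-locked average is both lower and wider than the single-trial response, which is the averaging bias that the rest of the chapter aims to correct.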
Analysis of single-trial EEG was pioneered by Woody’s cross-correlation averaging [230].
In this work, the latency of single-trial event-related activity is estimated by using a tem-
plate, supposed to model accurately the evoked response. The algorithm alternately esti-
mates the template and corrects for the latency bias in each trial by finding the maximum
of a cross-correlation. In each iteration, the template is updated with the average of the
corrected data. Convergence is observed in practice, but not guaranteed. The obtained so-
lution may not be globally optimal since it depends on the time series used to initialize the
algorithm. Subsequent work has extended Woody’s idea by including amplitude variability
and by placing the estimation of single-trial parameters in a maximum likelihood frame-
work [114, 134, 147, 206, 207]. Direct denoising of EEG single-trial data with a time-scale
decomposition has been proposed [181], which designs a wavelet template from the aver-
age signal across trials. As in [230], the average signal is explicitly used as a template in
the computation. Several other methods, based on linear decompositions or wavelet analy-
sis [15, 16, 185, 221], make the assumption that the evoked response can be well represented
by a linear combination of functions within a dictionary. The dictionary is provided as a prior
to the algorithm or learned from the data, e.g., with a singular value decomposition (SVD).
All methods described above suffer from various pitfalls: the lack of proof of convergence
of the procedure, the dependence of the results on the initialization, the use of the average
data in the computation, and the necessity to define an a priori model for the waveform of the
response.
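To make the template-based alternation concrete, here is a minimal sketch in the spirit of Woody's cross-correlation averaging. It is an illustration under simplifying assumptions (circular shifts, integer lags, a fixed number of iterations, noiseless toy data), not the exact procedure of [230]; the function name `woody_latencies` and its parameters are hypothetical.

```python
import numpy as np

def woody_latencies(trials, max_lag, n_iter=5):
    """Alternate between a template update and per-trial lag estimation."""
    trials = np.asarray(trials, dtype=float)
    lags = np.zeros(len(trials), dtype=int)
    candidate_lags = np.arange(-max_lag, max_lag + 1)
    for _ in range(n_iter):
        # Template: average of the trials realigned with the current lags.
        # At the first iteration this is the plain average of the data.
        template = np.mean([np.roll(x, -l) for x, l in zip(trials, lags)], axis=0)
        for i, x in enumerate(trials):
            # Cross-correlation between the template and the shifted trial.
            xcorr = [np.dot(np.roll(x, -l), template) for l in candidate_lags]
            lags[i] = candidate_lags[int(np.argmax(xcorr))]
    return lags

# Toy data (assumption): a Gaussian deflection circularly shifted per trial.
rng = np.random.default_rng(1)
t = np.arange(256)
template = np.exp(-0.5 * ((t - 128) / 8.0) ** 2)
true_shifts = rng.integers(-10, 11, size=30)
trials = np.stack([np.roll(template, s) for s in true_shifts])
estimated = woody_latencies(trials, max_lag=30)
```

The lags are only defined up to a common offset, so estimates should be compared to the true shifts after subtracting a reference trial. The sketch also shows the pitfalls listed above: the first template is the average of the data, and nothing in the loop guarantees convergence on noisy data.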
This chapter approaches cross-trial variability from a different perspective, in a data-
driven way, free of the pitfalls just enumerated. The problem is cast into an optimization
framework in which global optimality can be proved, and where initialization is not an issue.
Moreover, the solution can be found very efficiently by using associated fast algorithms. The
method proceeds in two stages. First, we propose to sort the trials automatically according to
the variability of the evoked potentials without estimating it explicitly. The estimation of the
variability is performed in a second stage.
As an illustration, figure 7.1 presents raster plots of a raw dataset (a) and a reordered
dataset (b): in these images, each line represents a trial. In figure 7.1(b), the trials have been
reordered according to the measured motor response of the subject. A structure thus becomes
visible and can be used to interpret the single-trial ERPs. Such representations of multi-trial
ERP recordings, called ERP images, were pioneered in [123] and made generally available
through EEGLAB [59]. EEGLAB proposes several ways of reordering ERP images, according
to event triggers, or to the phase of time-frequency decompositions.
In this chapter, we propose a reordering, based only on the evoked response, without re-
lying on any external information. In some multi-trial ERP recordings, it is reasonable to
assume that a similar neural activation occurs in each repetition of the experiment, but with
a latency between stimulation and response that is variable across trials. This leads to the
intuition that the latency of the response is the main “degree of freedom” within the data.
The single-trial time series then lie on a noisy one-dimensional manifold which can be pa-
rameterized by this latency. In order to capture this “degree of freedom”, manifold-learning
techniques are very well suited: by providing a low-dimensional representation of the data,
they offer an efficient way of revealing the structure present in a dataset. We propose to use
methods based on eigendecompositions of graph Laplacians [14, 37]. Nonlinear dimensional-
ity reduction methods have been applied before to functional MRI data [7]. In [8] and [198],
low dimensional representations of fMRI datasets have been used to identify and classify
brain functional regions. For EEG data, our previous work [94] has shown how Laplacian
Eigenmaps [14] can be used to reveal the structure of an EEG dataset.
In this chapter, the nodes of the constructed graphs correspond to single-trial EEG time
series and it is shown on an EEG sample dataset that the “best” one dimensional represen-
tation of the data obtained with graph Laplacian procedures is monotonically related to the
response latency. Equivalently, it is shown that the first coordinate in the low dimensional
space can be used to reorder the dataset as in figure 7.1(b). We will refer to this step of the
method as the spectral reordering of the raster plot.
[Figure 7.1 panels: two raster plots of trial index versus time (ms, 0 to 597), with a color scale from −20 to 20. (a) Original raster plot. (b) Raster plot reordered with the measured motor response of the subject.]
Figure 7.1: Illustration of raster plot reordering on real EEG recordings. Each line of the
image represents a time series and the color codes for the signal amplitude in µV . In this
dataset the measured latencies of the motor responses, represented by the dark semi-vertical
line in figure 7.1(b), are used to sort single-trial time series.
Spectral reordering reveals the structure present in the raster plot, but it does not ex-
plicitly estimate the trial-dependent parameters in each trial. This problem is tackled in the
second step of the procedure, eventually leading to optimized event-related averaging. Given
a reordered raster plot, estimating the latencies corresponds to finding an increasing function
similar to the one defined with the dark semi-vertical line in figure 7.1(b). This function takes
its values on the grid defined by the image, and can only assume a finite number of discrete
values. The problem of extracting the latency information from the reordered raster plots can
thus be viewed as a combinatorial problem, which can be very efficiently optimized using a
graph cut algorithm. This step is performed independently from the spectral reordering step,
and only requires a sorted raster plot as input.
The chapter is organized as follows. In section 7.2 we introduce Manifold Learning and
present the graph Laplacian method. In section 7.3, we apply it to the reordering of
multi-trial EEG data and show how the proposed approach compares to the more classical Principal
Components Analysis (PCA). Section 7.4 focuses on latency estimation and formulates it
as a combinatorial optimization problem. An efficient solution to this problem is proposed
by computing a minimal cut on a specially designed graph. Finally, a strategy to estimate
the different parameters is detailed and the robustness of the procedure is investigated by
numerical simulations. Results on synthetic and real data accompany each of the methods
presented.
7.2 MANIFOLD LEARNING
Let $(x_i)_{i=1,\dots,N}$ be N elements of a metric space $(\mathcal{X}, d_{\mathcal{X}})$, which are distributed with a
probability distribution p on a low-dimensional smooth sub-manifold $\mathcal{M}$ of $\mathcal{X}$. In this chapter,
the $(x_i)_{i=1,\dots,N}$ are time series and N is the number of trials.
This section deals with manifold learning techniques: Section 7.2.1 presents a Principal
Components approach to variability analysis. Sections 7.2.2 and 7.2.3 recall notions on non-
linear embedding, leading to the Laplacian embedding algorithm. Differences between the
linear and non-linear approaches are emphasized on synthetic datasets in section 7.2.1 and
will also be illustrated in section 7.3.1.
7.2.1 Principal Component Analysis
Principal Component Analysis represents the data in a new coordinate system, obtained
through a rotation that diagonalizes the empirical covariance matrix. Although PCA per
se does not modify the dimensionality of the representation, by ordering the eigenvalues of
the empirical covariance matrix, one can represent the data in the leading PCA directions.
The PCA representation is a valuable tool for exploratory analysis as it can provide a repre-
sentation of the structure present in the data.
To take the example of latency variability, consider a dataset of translated versions of a
reference template, $x_i(t) = x(t - \tau_i)$. The data is wide-sense stationary, and its covariance
matrix $C_x$ is diagonalized in the discrete Fourier basis. The Fourier transform of the translated
signal $x_i(t) = x(t - \tau_i)$ is phase modulated: $\hat{x}_i(\omega) = e^{i \tau_i \omega}\, \hat{x}(\omega)$, where $\omega$ denotes the frequency.
Principal Components correspond to the frequencies that dominate the power spectrum of
the signal $P_x(\omega)$ (the Fourier transform of the covariance matrix $C_x$). Consider the dominant
frequency $\omega_1$: the coordinates of each trial $x_i$ in the first two PCA directions are
$\cos(\tau_i \omega_1 + \phi)\,|\hat{x}(\omega_1)|$ and $\sin(\tau_i \omega_1 + \phi)\,|\hat{x}(\omega_1)|$, where $\phi$ is the phase of $\hat{x}(\omega_1)$. The data thus organize along
a one-dimensional manifold parameterized by the latency $\tau_i$.
[Figure 7.2 panels: (a) Original raster plot (a set of jittered Gabor wavelets), trial versus time (ms). (b) 2D PCA projection of the set of translated time-courses. (c) 3D PCA projection of the set of translated time-courses.]
Figure 7.2: PCA analysis of a set of 500 jittered time series of 512 time samples.
To illustrate this, a set of 500 jittered time series, each with 512 time samples, is displayed
as a raster plot in Figure 7.2(a). The projections of the data onto the leading two and three
PCA dimensions, in Figures 7.2(b) and 7.2(c), cluster along curves, indicating the 1D structure of the data.
Reordering the time-courses with respect to latency is equivalent to finding a parameterization
of these curves. In the next subsections, we present methods to
automatically reorder time series according to their 1D variability.
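The PCA analysis of jittered time series can be reproduced along the following lines. This is a minimal sketch under stated assumptions: the Gabor waveform parameters (`width`, `freq`) and the uniform latency range are illustrative choices, not the values used to produce Figure 7.2.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_times = 500, 512
t = np.arange(n_times)

def gabor(center, width=30.0, freq=0.02):
    """Hypothetical Gabor wavelet centered at `center` (in samples)."""
    envelope = np.exp(-0.5 * ((t - center) / width) ** 2)
    return envelope * np.cos(2 * np.pi * freq * (t - center))

# Jittered dataset: the same waveform translated by a random latency.
latencies = rng.uniform(200, 300, size=n_trials)
X = np.stack([gabor(c) for c in latencies])

# PCA through the SVD of the centered data matrix (rows are trials).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :3] * S[:3]              # coordinates in the 3 leading PCA directions
explained = S ** 2 / np.sum(S ** 2)    # fraction of variance per component
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `latencies`, should show the trials organized along a curve, as described above.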
7.2.2 Nonlinear embedding
[Figure 7.3 schematic: points $x_i$ on the manifold $\mathcal{M} \subset \mathcal{X}$ are mapped by $f$ to points $f(x_i)$ in $\mathbb{R}^n$.]
Figure 7.3: Non-linear embedding into a low-dimensional Euclidian space
Given the distance $d_{\mathcal{X}}$ and the points $(x_i)$, the aim is to recover the structure of $\mathcal{M}$ via
an embedding function f that maps the $(x_i)$ into a low-dimensional Euclidean space $\mathbb{R}^n$ (cf.
figure 7.3). The embedding f provides a low-dimensional representation of the dataset and
also a parameterization of the manifold. When $\mathcal{M}$ has a 1-d structure, the first coordinate
of f can be used to order the points (i.e., in our context, the time series), provided that the
function f satisfies a “regularity” constraint: if two points x and z are close in $\mathcal{M}$, then so
must f(x) and f(z) in $\mathbb{R}^n$. This is sometimes referred to as a minimal distortion property.
For n = 1, a Taylor expansion provides the following inequality [14]:
$$|f(z) - f(x)| \leq d_{\mathcal{M}}(x, z)\,\|\nabla f(x)\| + o(d_{\mathcal{M}}(x, z)) \tag{7.1}$$
where $\nabla f$ stands for the gradient of f and $d_{\mathcal{M}}$ is the geodesic distance on the manifold
between points x and z. The notation $g(z) = o(d_{\mathcal{M}}(x, z))$ means that $g(z)/d_{\mathcal{M}}(x, z)$ tends to 0 as z tends
towards x. The inequality in equation (7.1) means that $d_{\mathcal{M}}(x, z)\,\|\nabla f(x)\|$ is a good first order
upper bound for $|f(z) - f(x)|$.
In order to obtain an embedding that satisfies the “regularity” constraint, Laplacian-based
methods try to control the smoothness of f globally by minimising
$$\int_{\mathcal{M}} \|\nabla f(x)\|^2\, p(x)^s\, dx,$$
provided that
$$\int_{\mathcal{M}} \|f(x)\|^2\, p(x)^s\, dx = 1 \qquad \text{(i)}.$$
The latter condition removes the scaling indeterminacy for the function f. The parameter
s controls the influence of the probability density on the solution. If s is strictly positive, the
regularity of f will be more constrained in high density regions.
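The optimization problem under constraint (i) can be formulated as a saddle-point problem
for the Lagrangian $\mathcal{L}(f, \lambda)$:
$$\mathcal{L}(f, \lambda) = \int_{\mathcal{M}} \|\nabla f(x)\|^2\, p(x)^s\, dx + \lambda \left( 1 - \int_{\mathcal{M}} \|f(x)\|^2\, p(x)^s\, dx \right). \tag{7.2}$$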
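Introducing the s-weighted Laplacian operator defined as $\Delta_s f \triangleq -\frac{1}{p^s}\,\mathrm{div}(p^s \nabla f)$ and the inner product
$\langle f, g \rangle_{\mathcal{M}} \triangleq \int_{\mathcal{M}} \langle f(x), g(x) \rangle\, p(x)^s\, dx$, the Lagrangian $\mathcal{L}(f, \lambda)$ defined in equation (7.2) can be rewritten:
$$\begin{aligned}
\mathcal{L}(f, \lambda) &= \int_{\mathcal{M}} \|\nabla f(x)\|^2\, p(x)^s\, dx + \lambda \left( 1 - \int_{\mathcal{M}} \|f(x)\|^2\, p(x)^s\, dx \right) \\
&= \int_{\mathcal{M}} \langle \nabla f(x), \nabla f(x) \rangle\, p(x)^s\, dx + \lambda \left( 1 - \int_{\mathcal{M}} \langle f(x), f(x) \rangle\, p(x)^s\, dx \right) \\
&= \int_{\mathcal{M}} \langle p(x)^s \nabla f(x), \nabla f(x) \rangle\, dx + \lambda \left( 1 - \int_{\mathcal{M}} \langle f(x), f(x) \rangle\, p(x)^s\, dx \right) \\
&= \int_{\mathcal{M}} -\langle \mathrm{div}(p(x)^s \nabla f(x)), f(x) \rangle\, dx + \lambda \left( 1 - \int_{\mathcal{M}} \langle f(x), f(x) \rangle\, p(x)^s\, dx \right) \quad (*) \\
&= \int_{\mathcal{M}} \langle \Delta_s f(x), f(x) \rangle\, p(x)^s\, dx + \lambda \left( 1 - \int_{\mathcal{M}} \langle f(x), f(x) \rangle\, p(x)^s\, dx \right) \\
&= \langle \Delta_s f, f \rangle_{\mathcal{M}} + \lambda \left( 1 - \langle f, f \rangle_{\mathcal{M}} \right)
\end{aligned}$$
Step (*) comes from the fact that the gradient operator is the negative adjoint of the
divergence.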
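Differentiating $\mathcal{L}(f, \lambda)$ with respect to f leads, up to a constant factor of 2, to:
$$\frac{\partial \mathcal{L}(f, \lambda)}{\partial f} = \Delta_s f - \lambda f$$
Therefore, setting this derivative to zero requires f to satisfy:
$$\Delta_s f = \lambda f$$
so that f is an eigenfunction of the operator $\Delta_s$ associated with the eigenvalue $\lambda$.
Notice that if $\Delta_s f_i = \lambda_i f_i$,
$$\mathcal{L}(f_i, \lambda_i) = \lambda_i$$
This implies that optimizing the constrained problem whose Lagrangian is given by $\mathcal{L}(f, \lambda)$
requires finding the eigenfunctions of $\Delta_s$ with the smallest eigenvalues.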
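The value of the Lagrangian at an eigenpair follows in one line from its last expression, $\langle \Delta_s f, f \rangle_{\mathcal{M}} + \lambda (1 - \langle f, f \rangle_{\mathcal{M}})$. The short derivation below only spells out this step, using the notation already introduced:

```latex
\begin{aligned}
\mathcal{L}(f_i, \lambda_i)
  &= \langle \Delta_s f_i, f_i \rangle_{\mathcal{M}}
     + \lambda_i \left( 1 - \langle f_i, f_i \rangle_{\mathcal{M}} \right) \\
  &= \lambda_i \langle f_i, f_i \rangle_{\mathcal{M}}
     + \lambda_i - \lambda_i \langle f_i, f_i \rangle_{\mathcal{M}}
   = \lambda_i .
\end{aligned}
```

Under constraint (i), $\langle f_i, f_i \rangle_{\mathcal{M}} = 1$, so the smoothness term $\langle \Delta_s f_i, f_i \rangle_{\mathcal{M}}$ itself equals $\lambda_i$: the smaller the eigenvalue, the smoother the corresponding embedding coordinate.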
The constant function $f_{cst}$ equal to 1 everywhere is an eigenfunction of $\Delta_s$ for the
eigenvalue 0. To avoid this trivial embedding, the solution f is constrained to be orthogonal to the
function $f_{cst}$, i.e.,
$$\int_{\mathcal{M}} f(x)\, p(x)^s\, dx = 0 \qquad \text{(ii)}.$$
The optimal embedding f under constraints (i) and (ii) is the eigenfunction of $\Delta_s$
corresponding to the smallest non-zero eigenvalue.
In order to compute f from a manifold sampled with a limited number of points, the
operator $\Delta_s$ needs to be approximated. Graph Laplacian methods approximate $\Delta_s$ by the
Laplacian of a specially designed graph. Let $G = (\mathcal{V}, \mathcal{E})$ be an undirected graph where
$\mathcal{V}$ are the nodes $(x_i)_{i=1,\dots,N}$ and $\mathcal{E}$ are the edges. A weight $w_{ij}$ is associated to every edge
$(i, j) \in \mathcal{E}$, leading to a weighted graph G. The Laplacian L of the graph is a matrix defined by
$L = D - W$ where $W = (w_{ij})_{ij}$ and D is diagonal with $D_{ii} = \sum_j w_{ij}$.
The random-walk Laplacian is a normalized version of L defined by $L_{rw} = D^{-1} L = I - D^{-1} W$, where I is the identity.
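The two graph Laplacians just defined can be written down directly for a small graph. This is a minimal sketch; the 3-node weight matrix is an arbitrary example chosen for illustration.

```python
import numpy as np

# Symmetric weights of a small undirected graph (arbitrary example values).
W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.5],
              [0.2, 0.5, 0.0]])

d = W.sum(axis=1)                    # degrees D_ii = sum_j w_ij
D = np.diag(d)
L = D - W                            # graph Laplacian
L_rw = np.eye(3) - W / d[:, None]    # random-walk Laplacian I - D^{-1} W

# The constant vector is in the null space of both operators.
print(L @ np.ones(3), L_rw @ np.ones(3))
```

Both matrices map the constant vector to zero, which is the discrete counterpart of the trivial eigenfunction $f_{cst}$ of the continuous operator.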
A weighting matrix W yielding a good approximation of $\Delta_s$ must now be defined. This is
done with the help of a similarity measure k which is a non-increasing function of the distance
$d_{\mathcal{X}}$. Here, k is a Gaussian kernel with standard deviation $\sigma$:
$$k(x_i, x_j) = e^{-d_{\mathcal{X}}(x_i, x_j)^2 / \sigma^2}.$$
Results of [107] show that, for a given k, the random-walk Laplacian of G converges almost
surely to $\Delta_s$ when the sample size N goes to infinity, if
$$w_{ij} = \frac{k(x_i, x_j)}{\left(d(x_i)\, d(x_j)\right)^{1 - s/2}} \tag{7.3}$$
where $d(x) = \sum_{i=1}^{N} k(x, x_i)$ is an estimator of the probability density function over $\mathcal{M}$. The
embedding function can therefore be obtained by computing the eigenvectors $f_q$ of $L_{rw}$
satisfying:
$$\begin{aligned}
(I - D^{-1} W) f_q &= \lambda_q f_q \\
(D - W) f_q &= \lambda_q D f_q \\
L f_q &= \lambda_q D f_q
\end{aligned} \tag{7.4}$$
It can easily be proved that 0 is a trivial eigenvalue and also that L is a symmetric positive
semi-definite matrix, which implies, since $d(x_i) \geq 0$ for all i, that the generalized eigenvalues $\lambda_q$ are all
non-negative and can be ordered:
$$0 = \lambda_0 \leq \lambda_1 \leq \dots \leq \lambda_q \leq \lambda_{q+1} \leq \dots \leq \lambda_{N-1}.$$
The embedding f into $\mathbb{R}^n$ is then given by:
$$f(x_i) = (f_1(i), f_2(i), \dots, f_n(i)),$$
where $f_q(i)$ is the ith component of $f_q$.
7.2.3 Laplacian embedding algorithm
Observe that, in the case of a uniform sampling over the manifold, the parameter s has no
influence. Numerical experiments on synthetic and real EEG data showed that s did not have
much influence on the results, and as a consequence s was set to 1. This corresponds to the
Diffusion Map algorithm [37]. The following algorithm details the steps of the computation.
Laplacian-based n-dimensional embedding:
• Set $d_{\mathcal{X}}$, $\sigma$ and n, the dimension of the embedding.
• Compute K with $K(i, j) = e^{-d_{\mathcal{X}}(x_i, x_j) / \sigma^2}$.
• Compute $D_K$ with $D_K(i, i) = \sum_{j=1}^{N} K(i, j)$ and $D_K(i, j) = 0$ if $i \neq j$.
• Compute $W = D_K^{-1/2} K D_K^{-1/2}$ (equation (7.3) with s = 1).
• Compute D with $D(i, i) = \sum_{j=1}^{N} W(i, j)$ and $D(i, j) = 0$ if $i \neq j$.
• Find the n + 1 first generalized eigenvectors $f_k$, solutions of $(D - W) f_k = \lambda_k D f_k$,
$k = 0, \dots, n$.
• The coordinates of point $x_i$ in $\mathbb{R}^n$ are $(f_1(i), f_2(i), \dots, f_n(i))$.
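The steps above can be sketched with dense NumPy linear algebra. This is a minimal illustration, not the implementation used in the thesis: the function name `laplacian_embedding` is hypothetical, the kernel is written exactly as in the algorithm box, and the generalized eigenproblem is solved through the equivalent symmetric problem $D^{-1/2} L D^{-1/2} u = \lambda u$ with $f = D^{-1/2} u$. The toy data (shuffled, jittered Gaussian deflections) and the value of `sigma` are assumptions.

```python
import numpy as np

def laplacian_embedding(dissimilarity, sigma, n_components=1):
    """Return the n first non-trivial eigenvalues and embedding coordinates."""
    K = np.exp(-np.asarray(dissimilarity, dtype=float) / sigma ** 2)
    d_K = K.sum(axis=1)
    W = K / np.sqrt(np.outer(d_K, d_K))      # W = D_K^{-1/2} K D_K^{-1/2}
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # (D - W) f = lambda D f  <=>  D^{-1/2} L D^{-1/2} u = lambda u, f = D^{-1/2} u
    inv_sqrt_d = 1.0 / np.sqrt(d)
    L_sym = L * np.outer(inv_sqrt_d, inv_sqrt_d)
    eigvals, U = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    F = U * inv_sqrt_d[:, None]
    # Skip the trivial eigenvector f_0 associated with the eigenvalue 0.
    return eigvals[1:n_components + 1], F[:, 1:n_components + 1]

# Toy dataset (assumption): one deflection with a shuffled, variable latency.
rng = np.random.default_rng(0)
t = np.arange(200)
latencies = rng.permutation(np.linspace(60, 140, 60))
X = np.stack([np.exp(-0.5 * ((t - c) / 10.0) ** 2) for c in latencies])
X -= X.mean(axis=1, keepdims=True)
X /= np.linalg.norm(X, axis=1, keepdims=True)
dissimilarity = (1.0 - X @ X.T) / 2.0        # correlation-based dissimilarity

eigvals, F = laplacian_embedding(dissimilarity, sigma=0.3, n_components=2)
order = np.argsort(F[:, 0])                  # spectral reordering of the trials
```

On this toy example, `latencies[order]` should be close to monotonic, increasing or decreasing depending on the arbitrary sign of the eigenvector. A dense eigendecomposition is used for brevity; for large N an iterative sparse solver would be preferable.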
Since L is symmetric, its first eigenvectors can be computed efficiently with an iterative
method, for example an Implicitly Restarted Arnoldi Method [137]. For computational
efficiency, it is possible to set to 0 the $w_{ij}$ below a threshold, leading to sparse matrices, and
reducing the computational cost of matrix-vector multiplications at each iteration. Note
that too high a threshold leads to a very coarse approximation of the solution.
In order to provide more insight on the discrete solution, it can be observed from
equation (7.4) that the solution $f_1$ also solves the following optimization problem:
$$\arg\min_{f^T D f = 1,\; f^T D \mathbf{1} = 0} f^T L f$$
and that expanding $f^T L f$ gives:
$$f^T L f = \frac{1}{2} \sum_{(i,j) \in \mathcal{N}} w_{ij} \left(f(i) - f(j)\right)^2$$
Since each term of the sum is non-negative, minimizing $f^T L f$ favors making each term
$w_{ij}(f(i) - f(j))^2$ small, which implies that if $w_{ij}$ is large, i.e., $d_{\mathcal{X}}(x_i, x_j)$ is small, $(f(i) - f(j))^2$ should be small.
This can be directly related to the “regularity” constraint mentioned above. Although this
comment helps to bridge the gap between the discrete and the continuous formulations of the
problem, it can be noticed that, contrary to the continuous formulation, the discrete problem
does not provide much insight into the influence of the density p.
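The expansion of the quadratic form can be checked numerically. The sketch below uses a random symmetric weight matrix and a random vector (both arbitrary), and takes the sum over all ordered pairs (i, j), which is what the factor 1/2 compensates for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random symmetric non-negative weights with a zero diagonal.
A = rng.random((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W
f = rng.standard_normal(n)

quadratic_form = f @ L @ f
pairwise_sum = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
print(quadratic_form, pairwise_sum)
```

Both quantities agree up to rounding, and the pairwise form makes it apparent that the quadratic form cannot be negative when the weights are non-negative.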
Manifold learning methods are “data driven”. They capture the structure of the dataset,
provided that the chosen distance dX is appropriate. When dealing with time series, many
distance functions can be used. In practice, dX does not need to be an actual distance, and
instead it can measure the difference between features of interest for two elements of the
dataset. One may even design dX to be blind to some features of the data, which are irrelevant
for the application at hand.
7.3 SPECTRAL REORDERING OF EEG TIME SERIES
In this context, each $x_i$ is a time series, and $\mathcal{X} = \mathbb{R}^T$ where T is the number of time
samples. The same stimulus has been delivered N times, leading to N time series.
7.3.1 Toy examples
Once computed, the Laplacian embedding f provides a parameterization of the manifold $\mathcal{M}$,
which means that if $\mathcal{M}$ has a noisy 1D structure, the first coordinate, $f_1$, orders the elements
along the manifold. This is now illustrated with the noiseless synthetic dataset already pre-
sented in Section 7.2.1. It is simulated with T = 512 and N = 500 points. The embedding
was performed using the Euclidian distance and a Gaussian kernel. The embedded points
are represented in figure 7.4(c). It can be observed that the embedding unfolds the mani-
fold structure. The ordering provided by f1 can be encoded with a color, hence each point
of the PCA representation can be colored. The 2D and 3D PCA point clouds are presented
in figure 7.4(a) and figure 7.4(b): observe that the color changes continuously along the one
dimensional structure. Figure 7.4(d) presents the reordering of the raster plot already pre-
sented in figure 7.2(a).
To illustrate that the manifold learning method can capture more than a variability in
latency, a sample dataset with a variability in latency and scale has been designed (time series
are “stretched”). The reordered dataset and the 3D PCA colored embedding are presented in
figure 7.5.
[Figure 7.4 panels: (a) Points of the 2D PCA projection, colored with respect to their first Laplacian embedding coordinate. (b) Points of the 3D PCA projection, colored with respect to their first Laplacian embedding coordinate. (c) Embedded points colored with respect to the first Laplacian embedding coordinate. (d) Reordered raster plot, trial versus time (ms).]
Figure 7.4: Illustration of manifold learning using graph Laplacian on the synthetic dataset
of Figure 7.2.
[Figure 7.5 panels: (a) Points of the 3D PCA projection, colored with respect to their first Laplacian embedding coordinate. (b) Embedded points colored with respect to the first Laplacian embedding coordinate. (c) Reordered raster plot, trial versus time (ms).]
Figure 7.5: Illustration of manifold learning using graph Laplacian on a synthetic dataset
with latency and scale variability (time series are “stretched” in time with the increase of the
latency).
The EEGLAB toolbox offers an alternative way of ordering signals, based on the phase
of a time-frequency decomposition using Gabor wavelets. The user is asked to provide the
latency of the response, the number of oscillations and the frequency of interest (the frequency
can also be set automatically as the one maximizing the Fourier spectrum). Once these parameters
are set, the phase for the corresponding Gabor function is computed and the raster plot is
reordered accordingly. The two datasets presented in figure 7.2 and figure 7.5 are shown,
reordered with EEGLAB, in figure 7.6. It can be noticed that the method fails to accurately
reorder the raster plots, for different reasons in the two examples. In the first one, the Gabor
waveform that has been translated has multiple cycles, which implies that the phase in [0, 2π]
cannot capture the variability. In the second one, due to the temporal scaling, the frequency
slightly varies across trials. The EEGLAB procedure assumes a constant frequency across
trials, and estimating the phase when the frequency is not well defined leads to errors.
[Figure 7.6 panels: (a) Phase reordering on the dataset in figure 7.2. (b) Phase reordering on the dataset in figure 7.5. Both are raster plots of trial versus time (ms).]
Figure 7.6: Reordering results obtained on the datasets in figure 7.2 and figure 7.5 with the
EEGLAB reordering method based on the phase of a Gabor wavelet, with a priori defined
latency and frequency.
7.3.2 Spectral reordering with realistic time series
The manifold learning method has just been illustrated on synthetic toy examples. We now
apply it to a more challenging synthetic problem where the time series are corrupted by an
additive noise, and to real ERP recordings.
As a preprocessing step, the time series are centered and normalized so that $\|x_i\| = 1$. After
this normalization, the half Euclidean distance is given by $\|x_i - x_j\|_2 / 2 = \sqrt{(1 - \langle x_i, x_j \rangle)/2}$,
which implies that it is equivalent to considering the correlation between time series.
As mentioned earlier, $d_{\mathcal{X}}$ may not be a real distance but just a similarity measure. To
investigate the use of different similarity measures, we propose to use:
$$d_{\mathcal{X};r}(x_i, x_j) = \left( \frac{1 - \langle x_i, x_j \rangle}{2} \right)^{r/2} \tag{7.5}$$
where r is a power exponent applied to the classical Euclidean distance.
Such a similarity measure is sensitive to time shifting, and is therefore appropriate for
capturing the latency information. It also has the advantage that dX ;r(xi, xj) ∈ [0, 1] which
constrains the choice of σ (see Section 7.5).
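The preprocessing and the measure of equation (7.5) can be sketched as follows. The function names `normalize_trials` and `dissimilarity` are hypothetical, and the random data only serves to check the stated properties (values in [0, 1], and the half Euclidean distance recovered for r = 1).

```python
import numpy as np

def normalize_trials(X):
    """Center each time series (row) and scale it to unit Euclidean norm."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def dissimilarity(X, r=2.0):
    """Pairwise measure of equation (7.5): ((1 - <x_i, x_j>) / 2) ** (r / 2)."""
    Xn = normalize_trials(X)
    corr = np.clip(Xn @ Xn.T, -1.0, 1.0)   # guard against rounding outside [-1, 1]
    return ((1.0 - corr) / 2.0) ** (r / 2.0)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((5, 50))
d1 = dissimilarity(X_demo, r=1.0)
d2 = dissimilarity(X_demo, r=2.0)

# For r = 1 the measure is the half Euclidean distance between normalized series.
Xn = normalize_trials(X_demo)
half_dist = np.linalg.norm(Xn[:, None, :] - Xn[None, :, :], axis=2) / 2.0
```

Since the values are bounded by 1 whatever the amplitude of the recordings, a given σ has a comparable meaning from one dataset to another.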
The parameters used by the spectral embedding are s (fixed to 1 in practice), r and σ.
Section 7.5.1 provides a strategy to set these parameters.
The synthetic data consist of 100 time series (N = 100 and T = 500) computed from a
template (cf. figure 7.11(e), green curve) by translating the positive deflection with a random
time lag (with a Gaussian probability distribution of standard deviation $\sigma_{lag}$), and adding
noise with an SNR measured as a variance ratio (variance of the signal divided by variance of the
noise). Figure 7.7(a) presents a dataset with $\sigma_{lag}$ = 50 ms and an additive autoregressive (AR)
noise (SNR = 1.5, i.e., 1.76 dB). The AR model of order 8 was fitted on spontaneous EEG activity.
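A dataset of this kind can be simulated along the following lines. This is a sketch under loudly stated assumptions: the template is a hypothetical Gaussian deflection rather than the template of figure 7.11(e), the whole template is shifted circularly, and the AR coefficients are placeholder values for a stable order-2 model, not the order-8 model fitted on spontaneous EEG. One sample is treated as one millisecond.

```python
import numpy as np

def simulate_trials(template, n_trials, sigma_lag, snr, ar_coefs, rng):
    """Shift a template by random Gaussian lags and add AR noise at a given SNR."""
    template = np.asarray(template, dtype=float)
    ar_coefs = np.asarray(ar_coefs, dtype=float)
    n_times = len(template)
    lags = np.round(rng.normal(0.0, sigma_lag, size=n_trials)).astype(int)
    signal = np.stack([np.roll(template, lag) for lag in lags])
    # AR noise e[n] = sum_k a_k e[n - k] + white[n], with a discarded burn-in.
    p, burn = len(ar_coefs), 200
    white = rng.standard_normal((n_trials, n_times + burn))
    noise = np.zeros_like(white)
    for n in range(p, n_times + burn):
        past = noise[:, n - p:n][:, ::-1]      # e[n-1], e[n-2], ..., e[n-p]
        noise[:, n] = past @ ar_coefs + white[:, n]
    noise = noise[:, burn:]
    # Rescale the noise so that var(signal) / var(noise) equals the target SNR.
    noise *= np.sqrt(signal.var() / (snr * noise.var()))
    return signal + noise, lags, signal, noise

rng = np.random.default_rng(0)
t = np.arange(500)
template = np.exp(-0.5 * ((t - 250) / 30.0) ** 2)    # hypothetical deflection
data, lags, signal, noise = simulate_trials(
    template, n_trials=100, sigma_lag=50.0, snr=1.5,
    ar_coefs=[0.5, -0.3], rng=rng)                   # placeholder AR(2) model

snr_db = 10.0 * np.log10(signal.var() / noise.var())  # variance ratio in dB
```

The conversion at the end reproduces the figure quoted above: a variance ratio of 1.5 corresponds to about 1.76 dB.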
The second application of the method is to auditory oddball EEG data. This paradigm
consists of alternating frequent tones and rare (“target”) tones. It is known to elicit a positive
EEG deflection to the rare tones, referred to as the “P300” or “P3” wave, more prominent on
the midline electrodes and occurring at a latency around 300 ms [17]. The data is recorded
from the central electrode Cz (cf. figure 1.22 in chapter 1), sampled at a rate of 256 Hz and
processed with a high-pass filter at 0.5 Hz (Butterworth Zero Phase Filter, Time constant
0.3183 s, 12 dB/oct) and a low-pass filter at 8 Hz (Butterworth Zero Phase Filter, 48 dB/oct).
The positive deflection of the P300 wave, in the 3-5 Hz range, is preserved. Figures 7.7(a) and
7.8(a) present raster plots of both data sets. The random nature of the time latency of the
P300, as first observed in [130], is obvious in the oddball raster plot.
[Figure 7.7 panels: (a) Raster plot of raw time series, trial versus time (ms). (b) Two time series belonging to the dataset. (c) 2D embedding of the time series ($f_1$ versus $f_2$). (d) Reordered raster plot of time series using the first coordinate of the embedding f.]
Figure 7.7: Spectral reordering results on synthetic data (σlag = 50 ms). Embedding was
performed with r = 2 and σ = 0.1.
In both cases, the time series were embedded into a two dimensional space (cf. figure 7.7(c)
and figure 7.8(c)). In both figures, it can be noticed that the points are clustered along an
elongated 1D structure, as was the case with the toy example in figure 7.4. The first
coordinate can therefore be used to correctly parameterize the manifold and to order the
time series. By observing the reordered raster plots in figure 7.7(d) and figure 7.8(d), and
comparing them to a raster plot reordered using the externally measured motor response, it
appears that the first coordinate of f has correctly captured the latency information.
7.4 ROBUST LATENCY ESTIMATION VIA DISCRETE OPTIMIZATION
Once the raster plot has been correctly reordered, estimating the latency of the response
consists in tracing a non decreasing line through the extrema of the raster plot, like the dark
semi-vertical line in figure 7.1(b). Though this may appear a simple task, it is non-trivial to
automate.
[Figure 7.8 panels: (a) Raster plot of raw time series, trial versus time (ms). (b) Example of time series belonging to the dataset. (c) 2D embedding of the time series ($f_1$ versus $f_2$). (d) Reordered raster plot of time series using the first coordinate of the embedding f.]
Figure 7.8: Spectral reordering results on EEG oddball time series. The solid vertical line
in the raster plots corresponds to the stimulus onset. The vertical dashed lines provide the
limits on the time window used to reorder the data. Embedding was performed with r = 2 and σ = 0.05.
In order to automatically extract the latency corresponding to each time series from a
reordered plot, one should exploit the global information in the data and take into account the
a priori information that latencies are monotonically related in the reordered trials. Finding
an increasing function such as the one defined by the dark semi-vertical line in figure 7.1(b),
is equivalent to partitioning the raster plot in two, i.e., achieving a two class segmentation of
the ERP image, while taking into account the monotonicity constraint (cf. figure 7.9(b)).
7.4.1 Optimization framework
After reordering, it can be assumed that the latencies of brain responses of two neighboring
trials $x_i$ and $x_{i+1}$ are close. Let us denote $l_i$ the latency of the response for trial i. With the
Markov Random Field (MRF) optimization framework pioneered in [84], the robust
estimation of the $(l_i)_{i=1,\dots,N}$ amounts to solving:
$$(l_i)^*_{i=1,\dots,N} = \arg\min_{(l_i)_{i=1,\dots,N}} E(l_i) \quad \text{with} \quad E(l_i) = \sum_{i=1}^{N} D_i(l_i) + \alpha \sum_{i=1}^{N-1} V_i(l_i, l_{i+1}) \tag{7.6}$$
Each term $D_i(l_i)$, usually called a data term, tends to set $l_i$ to the best latency value for
time series $x_i$, regardless of the other series. When considering EEG evoked potentials, high
deflections of the signal from the baseline are of particular interest. Thus a reasonable choice
is to set $l_i$ so that $x_i(l_i)$ is maximal (or minimal). In our optimization formulation, this leads
to $D_i(l_i) = \phi(x_i(l_i))$, where $\phi$ is a positive decreasing function¹. In practice, when looking for
positive deflections, $\phi$ is defined by $\phi(x_i(j)) = M - x_i(j) \geq 0$ where $M = \max_{i,j} x_i(j)$.
Since single-trial EEG recordings are heavily corrupted by noise, robust estimation of
lags is obtained by adding a second term to the energy E. This has been made possible by the
spectral reordering, which guarantees that for any i, the time series $x_i$ and $x_{i+1}$ are similar, and
thus, neighboring latencies $l_i^*$ and $l_{i+1}^*$ should be close. The smoothing term $V_i(l_i, l_{i+1})$ must
increase when the distance between $l_i$ and $l_{i+1}$ increases. Without any prior information, it is
natural to set all the $V_i$ to a unique function V that does not depend on i.
In our case, we choose $V(l_i, l_{i+1}) = |l_{i+1} - l_i|$ while strongly imposing $l_i \leq l_{i+1}, \forall i$. Parameter
$\alpha$ penalizes the difference of latencies between two neighboring time series.
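The energy of equation (7.6), with the data and smoothing terms chosen above, can be evaluated as in the following sketch. The function name `energy` is hypothetical, latencies are expressed as column indices of the reordered raster plot, and the hard constraint $l_i \leq l_{i+1}$ is encoded by returning an infinite energy.

```python
import numpy as np

def energy(X, latencies, alpha):
    """E = sum_i (M - x_i(l_i)) + alpha * sum_i |l_{i+1} - l_i|, or inf if not monotonic."""
    X = np.asarray(X, dtype=float)
    lat = np.asarray(latencies, dtype=int)
    if np.any(np.diff(lat) < 0):
        return np.inf                     # violates l_i <= l_{i+1}
    M = X.max()
    data_term = np.sum(M - X[np.arange(len(lat)), lat])
    smooth_term = np.sum(np.abs(np.diff(lat)))
    return data_term + alpha * smooth_term

# Tiny reordered raster plot: 3 trials, 4 time samples.
X_small = np.array([[0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 2.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])

e_peaks = energy(X_small, [1, 2, 3], alpha=0.5)     # follows the maxima: 2 + 0.5 * 2
e_invalid = energy(X_small, [2, 1, 3], alpha=0.5)   # decreasing step: rejected
```

Here M = 2, so following the three maxima costs (2 − 1) + (2 − 2) + (2 − 1) = 2 for the data term and 0.5 × 2 = 1 for the smoothing term.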
7.4.2 Graph Cuts algorithm
[Figure 7.9 panels: (a) Automatic lag extraction result, shown on the raster plot of trial versus time (ms). (b) Corresponding binary segmentation.]
Figure 7.9: Result of binary partitioning using the graph cut algorithm applied on the raster
plot reordered with motor response (cf. figure 7.1(b))
¹ $D_i(l_i)$ should be positive for technical reasons imposed by the Graph Cuts method – see Section 7.4.2.
The unknown variables l_i take their values on the same time samples as the time series
x_i, and can thus only take a finite number of discrete values. This implies that the problem
necessarily has a solution, which may however not be unique. To solve such discrete
combinatorial optimization problems quickly, graph-based approaches are extensively used
by the Computer Vision community [23].
Whether MRF optimization (cf. equation (7.6)) with graph cuts is applicable depends both on
the discrete set of possible values for l_i and on the choice of the smoothing terms V. Graph
cuts can yield global or local minima. When l_i can take only two values, i.e., labels, global
optimality is guaranteed [96]. With multiple labels, global optimality is possible when the l_i
can be linearly ordered [23]. This is the case with the one-dimensional problem addressed in
this contribution. The optimal solution is here obtained by computing only one minimal cut of
a specially designed graph. More generally, graph cut methods are suited to MRF optimization
when the smoothing terms lead to a submodular energy [127]. See Appendix B for a
short introduction to graph cuts.
To construct the graph in our case, it can be observed that a set of latencies satisfying the
constraints can be defined by partitioning the raster plot into two classes. This is illustrated
in figure 7.9(b), where the optimization has been performed on the raster plot reordered by
the motor response (figure 7.9(a)). The border between the two regions provides an estimate
of the latency. Our problem thus amounts to designing a weighted graph that provides a
natural correspondence between the partitioning and the energy detailed in equation (7.6).
With the previously defined V, a graph can be constructed as in figure 7.10. Each node
n_{i,j} of the graph is indexed by trial number and by time, i.e., a line index i and a column index j,
except for the terminal nodes (“source” S and “sink” T). Cutting the graph in two consists in
separating the “source” from the “sink”. Edge weights are detailed in table 7.1. Edges with
∞ weights are used to guarantee the constraints: horizontal ∞-weighted edges guarantee that
the cut goes through each line once, i.e., l_i is unique for each time series, while vertical
∞-weighted edges guarantee that the solution is increasing.
One can notice that the cost associated with a cut, defined as the sum of the edge weights
along the path of the cut, is equal to an energy value. The minimum cut thus provides the
optimum. The solution is obtained by a single binary cut, which makes the algorithm extremely
fast (a few milliseconds) and also globally optimal. Graph partitioning via minimum cut is, in
turn, known to be equivalent to the max flow problem, which can be solved in polynomial
time [73]. We solve the minimum cut problem with the max flow algorithm described in [22]
(see footnote 2). The complexity observed in practice is linear in the number of nodes in the
graph, i.e., O(NT).
The spectral reordering may provide a raster plot ordered from large latencies to small
latencies, in which case one should seek a non-increasing partitioning of the raster plot. This
is why, in practice, the graph cut algorithm is run twice on each dataset, once with the order
provided by the manifold learning algorithm and once with this order inverted. The order
that leads to the smallest energy is kept.
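The graph construction and the single minimum cut described in this section can be sketched as follows. This is an illustrative sketch only: it uses a plain breadth-first augmenting-path (Edmonds-Karp) max flow written for readability rather than the implementation of [22], it assumes D_i(j) = M − x_i(j) and V(l_i, l_{i+1}) = |l_{i+1} − l_i|, and all function names and the 0-based indexing are choices made for this example.

```python
from collections import deque

INF = float("inf")


def add_edge(cap, u, v, c):
    """Add capacity c on the directed edge u -> v (with a residual slot v -> u)."""
    cap.setdefault(u, {}).setdefault(v, 0.0)
    cap.setdefault(v, {}).setdefault(u, 0.0)
    cap[u][v] += c


def residual_bfs(cap, source):
    """Breadth-first search over edges with positive residual capacity."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 0 and v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def extract_lags(x, alpha):
    """Return (lags, minimum energy) for trials x[i][j], lags being 0-based."""
    n_trials, n_times = len(x), len(x[0])
    m = max(max(row) for row in x)
    cap = {}
    for i in range(n_trials):
        add_edge(cap, "S", (i, 0), INF)        # T-link from the source
        add_edge(cap, (i, n_times), "T", INF)  # T-link to the sink
        for j in range(n_times):
            add_edge(cap, (i, j), (i, j + 1), m - x[i][j])  # data term D_i(j)
            add_edge(cap, (i, j + 1), (i, j), INF)  # one crossing per line
        if i + 1 < n_trials:
            for j in range(n_times + 1):
                add_edge(cap, (i, j), (i + 1, j), INF)    # l_i <= l_{i+1}
                add_edge(cap, (i + 1, j), (i, j), alpha)  # smoothing cost
    flow = 0.0
    while True:
        parent = residual_bfs(cap, "S")
        if "T" not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], "T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck
        flow += bottleneck
    # Nodes still reachable from S form the source side of a minimum cut;
    # the lag of trial i is the last column of line i on that side.
    lags = [max(j for j in range(n_times) if (i, j) in parent)
            for i in range(n_trials)]
    return lags, flow


x = [[0, 5, 1, 0], [0, 1, 5, 0], [0, 0, 1, 5]]  # hypothetical toy raster
print(extract_lags(x, alpha=1.0))
```

To mimic the two-pass strategy, one could call `extract_lags` on `x` and on `x[::-1]` and keep the ordering that returns the smaller energy.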
7.4.3 Result of single-trial latency extraction
The optimization procedure previously described was applied to the reordered datasets
displayed in figure 7.7(d) and figure 7.8(d). Results are presented in figure 7.11(a) and
figure 7.11(b). Such a lag extraction technique would have been inapplicable to the unordered
time series displayed in the raster plots of figure 7.7(a) and figure 7.8(a), and is only made
possible after reordering by the nonlinear embedding.
Figures 7.11(c) and 7.11(d) present the synthetic and the oddball datasets after realignment,
i.e., after raster plot reordering and lag correction. The evoked potentials were computed
by standard averaging with and without latency correction (cf. figure 7.11(e) and
figure 7.11(f)). In the synthetic case, we observe a very good match between the average of the
realigned time series and the reference used to generate the data. As expected, realigning the
time series provides higher and narrower deflections, because it greatly reduces the blurring
effect caused by the variable delays of the neural response.
Footnote 2: using the open source implementation http://www.adastral.ucl.ac.uk/~vladkolm/software.html
[Figure 7.10: graph with a source node S, a sink node T and nodes n_{i,j} for i = 1, 2, 3 and
j = 1, ..., 5. The horizontal edges n_{i,j} → n_{i,j+1} carry the weights D_i(j); the other edges
carry the weights ∞ or α listed in table 7.1.]
Figure 7.10: Graph illustration for an image N × T (N = 3 time series of length T = 4) with
an example of minimal cut in red. Time instants index the horizontal edges. In blue are the
nodes linked to the “source” node and in gray the nodes linked to the “sink” node after cutting
the graph. The corresponding energy is the sum of the weights on the edges (in red) crossed
by the minimum cut.
Link type   Edge                      Weight
T-link      S → n_{i,1}               ∞
T-link      n_{i,T+1} → T             ∞
N-link      n_{i,j} → n_{i+1,j}       ∞ (2)
N-link      n_{i,j} ← n_{i+1,j}       α
N-link      n_{i,j} → n_{i,j+1}       D_i(j) = φ(x_i(j))
N-link      n_{i,j} ← n_{i,j+1}       ∞ (1)
Table 7.1: Edge weights, i.e., link capacities, of the graph for robust time delay estimation.
Graph nodes n_{i,j} are indexed by a line index i, the trial number, and a column index j
corresponding to time. The infinite weight (1) guarantees that l*_i is unique for i fixed, i.e., that the
cut intersects each line only once, and (2) guarantees that l*_{i+1} ≥ l*_i, i.e., that the function is
increasing. In practice φ(x_i(j)) = M − x_i(j) ≥ 0, where M = max_{i,j} x_i(j).
[Figure 7.11: six panels. (a) to (d) are raster plots with Time (ms) on the horizontal axis and
Trial on the vertical axis; the left column is the simulation and the right column the experiment.
(e) and (f) plot the potential against Time (ms), with one curve labelled “No correction” and
one labelled “Lag correction”.]
Figure 7.11: Evoked potentials illustrations using single-trial latency estimation. Lag
extraction on the reordered raster plot of the synthetic time series (a) and of the oddball time series (b). Lag
correction on the raster plot of the synthetic time series (c) and of the oddball time series (d). (e) Evoked
potential computed after reordering and lag correction on the synthetic dataset (the green curve is
the template used to generate the data, the red curve is the evoked potential without reordering
and the blue curve is the evoked potential after reordering). We observe a good fit between
the template and the evoked potential after reordering. (f) Evoked potential computed after
reordering and lag correction on the auditory oddball dataset (the red curve is the evoked
potential without reordering and the blue curve is the evoked potential after reordering).
7.5 PARAMETER ESTIMATION AND ROBUSTNESS
The output of the spectral reordering depends on the definition of the distance dX and on
the σ used in the Gaussian kernel. It also depends on the parameter s used to weight the
Laplacian with the density, although our experience is that the influence of s on the results
is negligible. In order to limit the computation time of the following procedure, s was set to
1 by default, corresponding to the Diffusion Map algorithm proposed in [37]. According to
our definition (7.5), dX depends on the exponent r. Once the reordering is done, the graph
cut method requires the setting of α used to control the regularity of the cut. We propose an
automatic way to set these parameters. The robustness of the procedure is evaluated with
numerical simulations.
7.5.1 Parameter estimation
The spectral reordering is good if it succeeds in exhibiting the monotonic structure observed
for example in figure 7.11(a) and figure 7.11(b). With a proper reordering, the graph cut pro-
cedure, constrained to provide non decreasing cuts, reaches lower energy levels. This obser-
vation motivates the following strategy to estimate the parameters of the spectral reordering.
Estimating the parameters of the spectral reordering
Let us denote the parameters of the spectral reordering θ = (r, σ) and E∗α(θ) the value of the
energy reached by the graph cut algorithm with α fixed. The best θ∗ is simply obtained by:
\[ \theta^* = \arg\min_{\theta} E^*_\alpha(\theta). \]
In practice, the x_i are normalized and d_{X;r} is designed to take its values in [0, 1]. For r and
s fixed, the tested values for σ were (0.1 + 0.1k)_k with k = 1, . . . , 20. Using a fixed range of
values for σ is made possible by the constraint d_{X;r} ∈ [0, 1].
For a standard M/EEG dataset with a few hundred trials, testing all the different sets
of θ takes a few seconds. Lag extraction results for the oddball data are presented in
figure 7.13 for different values of α. For α set to 0.1, E*_α is displayed in figure 7.12 as a function
of σ with r = 1 and r = 2. The optimum is reached for r = 2 and σ = 0.05.
[Figure 7.12: plot of E_lags (vertical axis) against Sigma (horizontal axis, from 0 to 0.2), with
one curve for r = 1 and one curve for r = 2.]
Figure 7.12: E*_α as a function of r and σ. Computation is done on the oddball dataset with
α = 0.1.
Estimating α
For α fixed, the above procedure provides a way to estimate the parameters of the spectral
reordering. In order to select α, we propose a method based on K-fold cross-validation.
In figure 7.11(e), it can be observed that the new evoked potential obtained after lag
correction matches the template used in the simulation. This validates the procedure in the
synthetic case. Unfortunately, in practice there is no “ground truth”. To circumvent this
obstacle, the proposed strategy consists in estimating the new evoked potential e(t) on a portion
of the data, called the learning set, and checking whether it correlates well with the rest of the
data, called the test set. The dataset is in practice partitioned into K disjoint subsets (C_k)_k. For
k ∈ [1, K], the test set is the kth subset and the learning set is the rest. With K = 10, the
evoked potential e_k(t) is estimated on 90% of the data and tested on the remaining 10%.
[Figure 7.13: three raster-plot panels, with Time (ms) on the horizontal axis and Trial on the
vertical axis. (a) α = 0.01, σ = 0.05. (b) α = 0.1, σ = 0.05. (c) α = 1, σ = 0.15.]
Figure 7.13: Reordered raster plots with lag estimates for different values of α. The σ are
automatically defined for α fixed as described in section 7.5.1. Results are obtained with
r = 2.
Since the lag is unknown for the test set, the correlation between the evoked potential
and each time series is given by the maximum of the cross correlation with arbitrary lag. The
score of generalization Sk is given by:
\[ S_k = \sum_{i \in C_k} \max_{\tau} \langle e_k(\cdot), x_i(\cdot - \tau) \rangle \]
The procedure is run K times, leading to a score S = \sum_k S_k.
The α that achieves the maximum score S is selected.
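The generalization score can be sketched in a few lines. The sketch below assumes discrete time series of equal length, treats samples shifted outside the window as zero, and restricts τ to a finite range; these three choices, like the names `shifted_dot`, `generalization_score` and `max_shift`, are assumptions made for this example and are not specified in the text.

```python
def shifted_dot(e, x, tau):
    """Inner product <e(.), x(. - tau)>, with samples outside the window set to zero."""
    n = len(e)
    return sum(e[t] * x[t - tau] for t in range(n) if 0 <= t - tau < n)


def generalization_score(e_k, test_trials, max_shift):
    """Score S_k: for each test trial, keep the best correlation over all tested lags."""
    shifts = range(-max_shift, max_shift + 1)
    return sum(max(shifted_dot(e_k, x, tau) for tau in shifts)
               for x in test_trials)


# Hypothetical evoked potential and three test trials that are shifted copies of it.
e_k = [0, 1, 2, 1, 0, 0]
test_trials = [
    [0, 1, 2, 1, 0, 0],  # no delay
    [0, 0, 1, 2, 1, 0],  # delayed by one sample
    [1, 2, 1, 0, 0, 0],  # advanced by one sample
]
print(generalization_score(e_k, test_trials, max_shift=2))
```

The total score S would then be the sum of the `generalization_score` values over the K folds, computed once per candidate value of α.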
In practice the tested values for α were 0, 0.001, 0.01 and 0.1. As expected, with no noise,
the procedure automatically sets α to 0. On the oddball EEG dataset, the procedure leads
to α = 0.1. This also confirms that using smoothing terms V_i(l_i, l_{i+1}) is mandatory in the
presence of noise to obtain a proper estimate of the latencies and subsequently of the evoked
response.
With 4 values for α and 10 values for σ, the computation time on the oddball dataset is
around 5 minutes per value of r. About half of the time is spent in the computation of the
eigenvectors. Note that for each set of parameters the computation is independent. This
allows easy parallel computing which could speed up the parameter estimation procedure.
7.5.2 Validation
In order to validate the procedure and also investigate its robustness to noise, various numer-
ical experiments were performed on simulated datasets. For each dataset, the parameters
were automatically set using the procedure described above. The resulting evoked potential
is denoted e∗(t). Assuming the latencies are known, by realigning the data and averaging,
one obtains the best evoked potential given the SNR of the dataset. This evoked potential,
obtained with the known real latencies, is denoted erl(t). The real template used to generate
the synthetic dataset is denoted eref (t). The error on the solution was then computed as:
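\[ \mathrm{Error} = \left| \left\langle \frac{e^*}{\|e^*\|}, \frac{e_{ref}}{\|e_{ref}\|} \right\rangle - \left\langle \frac{e_{rl}}{\|e_{rl}\|}, \frac{e_{ref}}{\|e_{ref}\|} \right\rangle \right| \in [0, 2] \qquad (7.7) \]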
The smaller the error, the better the estimation of the lags and of the evoked potential.
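The error of equation (7.7) compares two normalized correlations with the reference template. A minimal sketch, assuming the three evoked potentials are sampled on the same time grid and are nonzero (the function names are chosen for this example only):

```python
import math


def normalized_dot(a, b):
    """Inner product of a/||a|| and b/||b|| (both vectors must be nonzero)."""
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / (norm_a * norm_b)


def correlation_error(e_star, e_rl, e_ref):
    """Error of equation (7.7): gap between the two correlations with e_ref."""
    return abs(normalized_dot(e_star, e_ref) - normalized_dot(e_rl, e_ref))


# Hypothetical signals: the estimate equals the known-latency average up to a scale.
e_ref = [0.0, 1.0, 2.0, 1.0, 0.0]
e_rl = [0.1, 0.9, 2.1, 1.0, 0.0]
e_star = [0.2, 1.8, 4.2, 2.0, 0.0]
print(correlation_error(e_star, e_rl, e_ref))
```

Because both terms are normalized, the error is insensitive to the overall amplitude of the estimated evoked potential.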
Results with 3 different templates (cf. figure 7.14(a)) are presented in figure 7.14(b).
Simulations were performed with two different types of noise (uncorrelated white noise and noise
computed with an autoregressive filter whose coefficients were fitted on spontaneous EEG
activity).
First it can be observed that in the noiseless case, the error is equal to 0. This validates
the method and demonstrates that the estimation procedure is unbiased. Second it can be
noticed that the best performance is obtained with uncorrelated white noise and that the
errors remain small even with low SNR.
[Figure 7.14: two panels. (a) plots Template 1, Template 2 and Template 3 against Time (ms),
from 0 to 500. (b) plots the Error against the SNR (dB), with one curve for AR noise and one
for White noise.]
(a) The 3 different templates used in the simulations.
(b) Errors (see (7.7)) obtained with the 3 templates in (a) and 2 different types of noise
(uncorrelated white noise and noise computed with an eighth-order autoregressive filter whose
coefficients were fitted on spontaneous EEG activity).
Figure 7.14: Simulation results and error estimates with different types of evoked responses
(σlag = 50 ms). These values are the average errors obtained out of 10 repetitions of the
experiment. It can be observed that errors are equal to 0 with no noise. The worst error, 0.1,
is observed with SNR = −3 dB on the template used in previous simulations. A correlation
error below 0.1 can be considered as satisfactory.
7.6 DISCUSSION
By making use of advanced graph-based methods, we have proposed a robust and very fast
two-step procedure for estimating the variability of evoked neural responses on single-trial
MEG or EEG data: spectral reordering by Laplacian embedding followed by an estimation
procedure by graph cuts. The whole process runs in a few seconds on real datasets of several
hundred trials covering several hundred time points. The approach is a model-free, “data
driven” algorithm, offering guarantees of global optimality for both of the steps. It does not
suffer from initialization problems and does not assume a model, e.g., imposing that the data
can be well represented in an a priori dictionary of waveforms [16]. The procedure has several
parameters that can be set automatically, as explained. Finally, numerical experiments on
synthetic datasets confirm the robustness to noise of the full procedure.
This contribution puts an emphasis on the latency estimation problem. However, as il-
lustrated in figure 7.5(c), the manifold learning method can handle other types of variability,
for whose estimation a graph cut procedure could be designed. As long as the model of a
1D manifold holds, i.e., that the variability can be parameterized by a single parameter, the
methodology of this chapter can be applied.
Quantifying the variability of brain response delays non-invasively in humans can help
to improve our understanding of the cognitive processing of information. As presented in this
chapter, once the delay has been corrected on each trial, the data can be realigned to the
neural response, improving the quality of the estimated evoked response. A better control of
the temporal aspects of the signal can moreover improve the spatial precision of the source
reconstructions when solving the inverse problem [11].
Another application of single-trial estimation is related to the correlations between
different channels, which stem from the correlations between latencies of different functional brain
regions. By computing response delays independently on different channels, interactions
between delays can be investigated. We can hypothesize that two independent neural processes
exhibit uncorrelated delays, while two sequential neural tasks have highly correlated
delays. To quantify such interactions between various functional brain areas, simple
correlations can be computed. It would be interesting to investigate what difference the delay
correction, based on the data of one channel, makes to the amplitude of the evoked potential on
other channels. An increase of this amplitude on another channel could be interpreted
as a strong correlation between them, while an absence or a very weak modification of this
amplitude would suggest, on the contrary, an absence of correlation or a very weak one. Such
questions, raised in the cognitive neuroscience community, can be addressed with the help
of the method proposed in this chapter.
Although our method was applied to time series coming from a single EEG channel, it
is also possible to estimate the delay of activations of an ICA component or more generally
any source configuration. By employing signal-space projectors [208], or by projecting the full
EEG or MEG recordings onto the forward field of a source configuration, we obtain a time
series for each repetition of the experiment, which is what our algorithm requires as input.
The method detailed in this chapter has been implemented as an EEGLAB plugin. The
code snippet in table 7.2 details how the plugin can be used from a Matlab script.
% Example Matlab code for single-trial latency estimation
% Load data (an EEGLAB .set file is a MAT-file containing the EEG structure)
load('data/oddball3-num1-512Hz-chan10.set', '-mat');

% Set parameters
use_ica = false;       % Set to true to realign based on an ICA component
channel = 1;           % Index of channel or ICA component used for realignment
time_win = [150 500];  % (ms) : work on this time window
bad_trials = [];       % set bad trials

% Candidate values for the kernel width sigma and the smoothing weight alpha
clear options
options.sigma = [0.01:0.01:0.2];
options.alpha = [0.001, 0.01, 0.1];
[EEG, com, order, lags, event_type, E_lags] = ...
    pop_extractlag(EEG, use_ica, channel, time_win, options);

% View ERP image reordered
figure;
pop_erpimage(EEG, 1, [channel], [], EEG.chanlocs(channel).labels, 1, 1, ...
    event_type, [], 'latency', 'yerplabel', '\muV', 'erp', 'cbar');

% Re-epoch the data
EEG = pop_epoch(EEG, event_type, [-0.4 0.3]);

% View ERP image of re-epoched data (empty cell: no sorting event)
figure;
pop_erpimage(EEG, 1, [channel], [], EEG.chanlocs(channel).labels, 1, 1, {}, [], '', ...
    'yerplabel', '\muV', 'erp', 'cbar');
Table 7.2: Running the lag extraction pipeline on an EEGLAB dataset from the command
line. Source code of the EEGLAB plug-in is available on the INRIA Forge:
https://gforge.inria.fr/projects/eeglab-plugins/.
7.7 CONCLUSION
The method presented in this chapter provides a computationally efficient and princi-
pled framework to address the challenging problem of estimating parameters on single-trial
M/EEG data. Single-trial data analysis is an important goal in the M/EEG community as
such analysis can give access to estimates that are not biased by the averaging process used
with classical ERP studies.
The source code and the demo scripts necessary to reproduce the figures of this chapter
are available in a Matlab EEGLAB Plug-in:
https://gforge.inria.fr/projects/eeglab-plugins/
In this thesis, the main methodological and theoretical aspects of M/EEG data processing
have been covered, from accurate solution of the forward problem to efficient approaches
to inverse problem in the context of distributed source models, including the challenging
problem of single-trial data processing. We have been the main contributor to an open source
software project, OpenMEEG, that offers to the M/EEG community the most accurate BEM
solver available today.
Our work with experimental data led us to the analysis and implementation of
state-of-the-art inverse solvers, a domain to which we contributed by introducing a framework that
makes it possible to include multiple experimental conditions simultaneously. This work was
motivated by the ambition to demonstrate that retinotopic mapping is possible with MEG. This
topic was investigated from the design of an experimental protocol and the exploration of the
data to the construction of a principled methodology that allowed us to obtain promising results,
even if the final objective, which consists in achieving timing of cortical processing in the visual
cortex with MEG only, is still for us an open problem.
Our interest in the research area motivated us to address some hard and still open
questions in the field. Going beyond simple localization, we proposed a tracking algorithm working
on triangular meshes that offers interesting perspectives for the investigation of cortical
dynamics. We applied this method to visual processing and somatosensory MEG data, which
demonstrated that such an approach could provide insight into the timing of cortical
processing.
The last topic addressed during this thesis concerns the problem of extracting information
on single-trial M/EEG data, which is an issue of major interest, as such methods can give
access to neural response estimates that are not biased by the averaging process used with
classical ERP studies.
The contributions are thus threefold: theoretical, methodological and applied. Throughout
this thesis, we tried to make the right mathematical choices to model the problems of interest.
We believe this enabled us to propose appropriate and efficient algorithms so that we could
finally tackle challenging neuroscience questions.
To summarize:
• We contributed to providing the M/EEG community with the most precise forward problem
solver when considering realistic head models with piecewise constant conductivities.
• We presented the mathematical and computational details of the state-of-the-art inverse
problem methods. Our implementation of all these methods is freely available in an
open source project called EMBAL. The community now has access to simple but very
efficient convex optimization schemes that we hope will contribute to the widespread
use of such methods.
• We developed a framework for M/EEG inverse modeling able to integrate a priori
anatomo-functional knowledge between experimental conditions.
• We contributed to set up a full experimental study from protocol design and data explo-
ration to the construction of a data analysis pipeline that offers promising results for
the study of the visual cortex with MEG.
• We proposed a novel approach to address the hard problem of single-trial data analy-
sis. We believe that this contribution can be a valuable tool to investigate inter-trial
variabilities, which is of major interest for cognitive neuroscience studies.
Finally, we hope that this thesis elucidates some aspects of M/EEG data processing in
order to improve the understanding but also the use of advanced methodological tools in the
community. Consequently, we hope that such a better understanding will improve the quality
of results obtained with EEG and MEG in order for these brain functional imaging modalities
to have a higher impact on both basic neuroscience and clinical studies.
Research Perspectives
Single-trial analysis
In this thesis, we approached the challenging problem of single-trial data analysis. This
topic is of major interest, especially in cognitive studies where the inter-trial variability can
provide valuable information. In the near future, we plan to apply our existing tools for
delay estimation to explore the human motor system. The next methodological step would
be to extend our approach to various kinds of inter-trial variabilities. This work is currently
starting in collaboration with Boris Burle in Marseille at CNRS / Université de Provence.
We would also be interested in single-trial inverse modeling with non-linear inverse solvers.
When considering linear inverse solvers, averaging the single-trial estimates or inverting the
averaged M/EEG measurements provides the same result. With non linear inverse solvers,
this does not hold. We would like to approach the problem using sparsity inducing priors
where the penalization would be on the time-frequency decompositions. M/EEG signals are
oscillatory. Therefore, using a time-frequency representation to constrain the inverse solvers
seems very reasonable.
Another problem we would like to address is “resting state M/EEG”. While in the last
two research problems the recordings were time locked to the beginning of the stimulation,
resting state M/EEG would impose to work on raw data without “time triggers” such as the
stimulus onsets. Independent component analysis (ICA) is an approach to address this prob-
lem but it suffers from some practical problems like the necessity to set a priori the number of
components. The approach we plan to consider is based on a research topic generally referred
to as “sparse coding”. Our preliminary results on this topic seem to confirm the power of the
approach.
Investigations of the visual system with MEG
The neuroscience topic that motivated this thesis is the understanding of human visual sys-
tem using M/EEG. To go one step further from where we arrived on the project of retinotopic
mapping with MEG, we would like to conduct more experiments and investigate the role
of the different stimulation parameters on the activation maps obtained. We would like to
investigate the comparison of mapping results using either classical visual evoked poten-
tials or steady state stimulations as in chapter 5. This would challenge the robustness and
ultimately improve our data processing pipeline for retinotopic mapping with MEG. If our
pipeline reaches a limit, we might consider going towards a generative model of the structure
of V1 and V2 to better constrain the inverse problem. To conclude, we believe that a necessary
condition for such a project to succeed is to have more interaction with experimentalists in
order to benefit from their expertise to achieve a good experimental control and an easy low
level processing of the data.
With well-designed experiments and robust data processing pipelines, the challenging
objective of achieving precise estimations of cortical dynamics could be addressed. Potential
approaches to this problem are based on the analysis of phase differences, as mentioned
in chapter 5, or on tracking methods, as in chapter 6.
Multi-conditions inverse modeling
During this thesis, we investigated the use of multiple experimental conditions simultaneously
within the inverse modeling. The classical way consists in applying an inverse solver to
the data from each condition individually and, in a second step, comparing the source
estimates obtained for each of them. However, this approach has limitations, especially when
considering inverse modeling with sparse priors.
Being able to compare experimental conditions is of major interest for brain research. A
cognitive question that could be answered is: Is this part of the brain activated by condition 1
and condition 2 simultaneously? Or equivalently what activation pattern is shared between
condition 1 and 2, and what is different in the cognitive process?
We think our expertise in the field of M/EEG inverse modeling provides us with good tools to
address such questions in a mathematically principled and computationally efficient way.
Improving the M/EEG data processing pipelines
As this thesis makes clear, we devoted a lot of time to developing and disseminating
software, detailing implementation problems, and providing practical tips to help analyze
M/EEG data. The motivation for this comes from the observation that analyzing M/EEG data
is difficult, time consuming, and sometimes even discouraging when the results are not as nice
as expected.
When confronted with real data, we want to be confident about the tools we use. We do
not want to permanently wonder if the low quality of the result is due to the bad quality of
the data, to a bad choice of a method or, worse, to a “buggy” implementation.
To avoid this, the challenge is to set up M/EEG data processing pipelines whose building
blocks are reliable and easy to use. These building blocks start from data processing and
removal of artefacts, to the accurate and automatic forward modeling with realistic head
models, but also the fast and efficient computation of inverse solvers.
The best examples today of freely available software packages that attempt to achieve
these goals are EEGLAB, Fieldtrip, Brainstorm, and MNE. Each of these packages has
different ambitions, a different usability for non-experts, a different level of automation,
and a different level of flexibility when it comes to integrating new tools in the
processing pipelines.
We believe that M/EEG research will benefit from the improvement of these processing
pipelines. By sharing data and software and by standardizing pipelines, one could guarantee
the reproducibility of the results and facilitate the comparison between methods. Sometimes,
we wonder if M/EEG research is not as popular as fMRI because of the complexity of each of
the steps to go from raw MEG data to clean cortical activations. The different steps require
mathematical and computing skills, a relatively good understanding of the physics, and an
ability to interpret the results for neuroscience. And as it is known “A chain breaks at its
weakest link”.
APPENDIX A
KRONECKER PRODUCTS
The Kronecker product is a very convenient tool that often enables compact and readable matrix
computations.
Definition A.1 (Kronecker product). Let A ∈ ℝ^{m×n} and B ∈ ℝ^{p×q}. Then the Kronecker
product (or tensor product) of A and B is defined as the matrix
\[ A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix} \in \mathbb{R}^{mp \times nq} \qquad (A.1) \]
PROPERTIES OF KRONECKER PRODUCTS
Theorem A.1. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{r×s}, C ∈ ℝ^{n×p} and D ∈ ℝ^{s×t}. Then
\[ (A \otimes B)(C \otimes D) = AC \otimes BD \in \mathbb{R}^{mr \times pt}. \]
Theorem A.2. For all A and B, (A ⊗ B)^T = A^T ⊗ B^T.
Theorem A.3. If A and B are nonsingular, (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.
Theorem A.4. Let A ∈ ℝ^{m×n} have a singular value decomposition U_A Σ_A V_A^T and let
B ∈ ℝ^{p×q} have a singular value decomposition U_B Σ_B V_B^T. Then
\[ (U_A \otimes U_B)(\Sigma_A \otimes \Sigma_B)(V_A^T \otimes V_B^T) \]
yields a singular value decomposition of A ⊗ B (after a simple reordering of the diagonal
elements of Σ_A ⊗ Σ_B and the corresponding right and left singular vectors).
Let A ∈ ℝ^{m×n}. The matrix A can be converted to a vector by stacking all columns of A on
top of one another. Let a_{·i} denote the ith column of A.
\[ \mathrm{vec}(A) = \begin{bmatrix} a_{\cdot 1} \\ \vdots \\ a_{\cdot n} \end{bmatrix} \in \mathbb{R}^{mn} \]
Proposition A.5. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q} and X ∈ ℝ^{n×p}. Then
\[ \mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X). \]
The proposition provides the following result, which makes it possible to compute the product
of a vector with a Kronecker product without actually assembling the full matrix A ⊗ B.
For x ∈ ℝ^{nq},
\[ (A \otimes B)x = \mathrm{vec}(B\,\mathrm{mat}(x)\,A^T) \]
where the notation mat(x) denotes the matrix in ℝ^{q×n} such that vec(mat(x)) is equal to x.
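The last identity can be checked numerically. The sketch below uses plain Python lists so that it stays self-contained; the helper names (`kron`, `vec`, `mat`, `kron_matvec`) are chosen for this example, and an array library would normally be preferred for real computations.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def vec(a):
    """Stack the columns of a on top of one another."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def mat(x, rows, cols):
    """Inverse of vec: the rows-by-cols matrix whose stacked columns give x."""
    return [[x[c * rows + r] for c in range(cols)] for r in range(rows)]


def kron_matvec(a, b, x):
    """Compute (A kron B) x as vec(B mat(x) A^T), without forming A kron B."""
    n, q = len(a[0]), len(b[0])
    return vec(matmul(matmul(b, mat(x, q, n)), transpose(a)))


# Hypothetical small matrices: A is 2 x 3, B is 3 x 2, so x has n * q = 6 entries.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [2, 1], [0, 3]]
x = [1, 2, 3, 4, 5, 6]
direct = [sum(r * v for r, v in zip(row, x)) for row in kron(A, B)]
print(direct == kron_matvec(A, B, x))
```

The direct route builds an mp × nq matrix, whereas `kron_matvec` only ever manipulates matrices with the sizes of A, B and mat(x).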
APPENDIX B
INTRODUCTION TO GRAPH-CUTS
Let us consider an oriented graph G = (V, E), where V is the set of vertices, often called nodes,
and E ⊂ V² is the set of oriented edges, i.e., (a, b) ≠ (b, a).
Let us consider the function w : E → ℝ₊ ∪ {+∞}, which assigns a weight, also called capacity,
to each edge. Notice that the values of w are necessarily non-negative.
Among V are two particular vertices, S and T. S is called the source and does not have
any incoming edges, while T is called the sink and does not have any outgoing edges.
Here is an example of such a graph:
[Figure: example of a weighted oriented graph with a source S and a sink T.]
Definition B.1 (Cut). A cut (𝒮, 𝒯) of the graph G is a partition of the vertices (i.e.,
𝒮 ∪ 𝒯 = V and 𝒮 ∩ 𝒯 = ∅) such that S ∈ 𝒮 and T ∈ 𝒯.
Here is an example of a cut:
[Figure: the example graph with a cut separating S from T; the edges crossed by the cut are
drawn in red.]
Definition B.2 (Weight of a cut). : The weight of a cut (S,T) is defined as
c(S,T) =∑
(p,q)∈Ep∈S,q∈T
w(p, q) (B.1)
241
242 APPENDIX B. INTRODUCTION TO GRAPH-CUTS
The weight of the cut presented in the figure above is given by the sum of the weights of
the red colored edges, i.e., 2 + 5 = 7.
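The definition of the weight of a cut can be illustrated with a short sketch. The graph below is a hypothetical four-vertex example, not the graph of the figures, and the function name `cut_weight` is an assumption. Only edges going from the source side to the sink side contribute to the weight.

```python
def cut_weight(capacity, source_side):
    """Weight of the cut (source_side, complement) as in equation (B.1).

    capacity: dict mapping a directed edge (p, q) to its weight w(p, q).
    source_side: set of vertices on the source side of the cut.
    """
    return sum(
        w for (p, q), w in capacity.items()
        if p in source_side and q not in source_side
    )

# Hypothetical directed graph with source "S" and sink "T".
capacity = {
    ("S", "a"): 4,
    ("S", "b"): 1,
    ("a", "b"): 2,
    ("a", "T"): 3,
    ("b", "T"): 2,
}

print(cut_weight(capacity, {"S"}))            # edges S->a and S->b: 4 + 1 = 5
print(cut_weight(capacity, {"S", "b"}))       # edges S->a and b->T: 4 + 2 = 6
print(cut_weight(capacity, {"S", "a", "b"}))  # edges a->T and b->T: 3 + 2 = 5
```

In the second cut, the edge a->b goes from the sink side to the source side and is therefore not counted.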
Definition B.3 (Minimum cut - MinCut). A cut is minimal if the weight of the cut is not
larger than the weight of any other cut.
The following figure presents a MinCut of G. Its weight is 2 + 3 = 5.
[Figure: a MinCut of the example graph G.]
The minimum cut might however not be unique. In the figure below the two represented
cuts are both minimum with a weight of 5.
[Figure: the example graph with two different minimum cuts.]
One of the fundamental results in combinatorial optimization is that the minimum cut
problem can be solved by finding a maximum flow from the source S to the sink T . Speaking
informally, maximum flow is the maximum “amount of water” that can be sent from the
source to the sink by interpreting graph edges as directed “pipes” with capacities equal to
edge weights. The theorem of Ford and Fulkerson [53] states that a maximum flow from S to
T saturates a set of edges in the graph dividing the nodes into two disjoint parts, $\mathcal{S}$ and $\mathcal{T}$,
corresponding to a minimum cut. Thus, the MinCut and MaxFlow problems are equivalent. In
fact, the maximum flow value is equal to the weight of the minimum cut.
More formally:
Definition B.4 (Flow). Let $G = (V, E)$ be a graph, $w$ its capacity function, and $S$ and $T$ the source
and the sink. A flow is a function $f : E^* \to \mathbb{R}$ ($E^*$ is the set of edges and their inverses)
satisfying the following properties:
- for each edge $e = (p, q) \in E$,
$$f(p, q) = -f(q, p) \qquad \text{(B.2)}$$
- for each vertex $p$ besides $S$ and $T$,
$$\sum_{\substack{e = (p, \cdot) \\ e \in E^*}} f(e) = 0 \qquad \text{(B.3)}$$
- for each edge $e \in E$,
$$f(e) \le w(e) \qquad \text{(B.4)}$$
The constraint in equation (B.3) is a conservation law similar to Kirchhoff's current
law. The constraint in equation (B.4) requires the flow in edge $e$ to be no larger than its capacity
$w(e)$. Together, the antisymmetry (B.2) and the conservation law (B.3) imply that:
$$\sum_{e = (S, \cdot)} f(e) = \sum_{e = (\cdot, T)} f(e) \qquad \text{(B.5)}$$
Equivalently, the amount of liquid that comes out of the source S is equal to the amount of
liquid that goes into the sink T. This quantity is called the value of the flow.
Definition B.5 (Maximum flow - MaxFlow). A flow is maximum if its value is not smaller
than the value of any other flow.
Theorem B.1 (MinCut - MaxFlow equivalence). The weight of a minimum cut of a graph G, as defined above,
is equal to the value of a maximum flow [53].
A MaxFlow on the example graph with a corresponding MinCut:
[Figure: the example graph with each edge labeled flow/capacity: 4/4, 4/5, 1/1, 0/2, 1/1, 3/3, 2/2.]
Our interest in problems that can be reformulated as a minimum cut problem comes from
the following theorem.
Theorem B.2 (MinCut - MaxFlow complexity). Finding the maximum flow, and equivalently
the minimum cut, of a graph is a problem that can be solved in polynomial time.
In other words, this theorem implies that MinCut problems are “efficiently solvable”
or “tractable”. In practice, minimum cuts are obtained via the computation of maximum
flows.
Algorithms for the MinCut and MaxFlow Problem
There are many standard polynomial time algorithms for MinCut/MaxFlow [40]. These
algorithms can be divided into two main groups: “push-relabel” style methods [89] and
algorithms based on augmenting paths. In practice the push-relabel algorithms perform better
for general graphs. In vision applications, however, the most common type of graph is a
grid of two or more dimensions. For regular graphs like grids, Boykov and Kolmogorov
[22] developed a fast augmenting path algorithm which often significantly outperforms the
push-relabel algorithm. Furthermore, its running time is observed to be linear in practice.
We now explain briefly how the augmenting path algorithm works. Given a flow $f$, the
residual capacity $r(p, q)$ of an edge $e = (p, q) \in E$ linking node $p$ to node $q$ is the maximum
additional flow that can be sent from node $p$ to node $q$ using the edges $(p, q)$ and $(q, p)$. The
residual capacity $r(p, q)$ has two components: the unused capacity $w(e) - f(e)$ of the edge $(p, q)$,
and the current flow $f(q, p)$ from node $q$ to node $p$, which can be reduced to increase the
flow from $p$ to $q$. A residual graph $G(f)$ of a graph $G$ consists of the node set $V$ and the
edges with positive residual capacity (with respect to the flow $f$). The topology of $G(f)$ is
identical to $G$. $G(f)$ differs only in the capacity of its edges, and so for zero flow, i.e., $f(p, q) = 0$
for all $(p, q) \in E$, $G(f)$ is the same as $G$. An augmenting path is a path from the source to the sink along
unsaturated edges of the residual graph. Augmenting path based algorithms for solving the
max-flow problem work by repeatedly finding augmenting paths in the residual graph and
saturating them. When no more augmenting paths can be found, i.e., the source and sink are
disconnected in the residual graph, the maximum flow is obtained.
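The augmenting path scheme described above can be sketched in a few lines. This is a minimal illustration that searches for shortest augmenting paths with a breadth-first search (the Edmonds-Karp variant), not the Boykov and Kolmogorov algorithm [22]. The function name, the dictionary-based graph encoding and the example graph are assumptions, and capacities are assumed finite.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Return (flow value, source side of a minimum cut).

    capacity: dict mapping a directed edge (p, q) to a finite weight.
    """
    # residual[p][q] is the residual capacity r(p, q).
    residual = {}
    for (p, q), w in capacity.items():
        residual.setdefault(p, {})
        residual.setdefault(q, {})
        residual[p][q] = residual[p].get(q, 0) + w
        residual[q].setdefault(p, 0)

    def reachable_from_source():
        """Breadth-first search along edges with positive residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            p = queue.popleft()
            for q, r in residual[p].items():
                if r > 0 and q not in parent:
                    parent[q] = p
                    queue.append(q)
        return parent

    value = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # source and sink are disconnected: the flow is maximum
        # Bottleneck residual capacity along the augmenting path.
        bottleneck = float("inf")
        q = sink
        while parent[q] is not None:
            bottleneck = min(bottleneck, residual[parent[q]][q])
            q = parent[q]
        # Saturate the path: push flow forward, open capacity backward.
        q = sink
        while parent[q] is not None:
            p = parent[q]
            residual[p][q] -= bottleneck
            residual[q][p] += bottleneck
            q = p
        value += bottleneck
    return value, set(parent)

# Hypothetical graph (not the one shown in the figures).
capacity = {("S", "a"): 4, ("S", "b"): 1, ("a", "b"): 2,
            ("a", "T"): 3, ("b", "T"): 2}
value, source_side = max_flow_min_cut(capacity, "S", "T")
cut = sum(w for (p, q), w in capacity.items()
          if p in source_side and q not in source_side)
print(value, source_side, cut)
```

On this small graph the flow value and the weight of the returned cut coincide, as stated by Theorem B.1. The vertices still reachable from the source in the final residual graph form the source side of a minimum cut.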
APPENDIX C
TIME FREQUENCY ANALYSIS WITH
GABOR FILTERS
Gabor filters are linear filters localized in time and frequency. In time, they consist of complex
exponential functions modulated by a Gaussian with standard deviation σ. The parameter σ
controls the trade-off between temporal precision and spectral precision of the filter.
Let $\psi^{\sigma}_{t_0,f_0}$ denote the Gabor filter centered at time $t_0$ and at frequency $f_0$ (cf. figure C.1)
[143].
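Figure C.1: Spectral support of the Gabor filter $\psi^{\sigma}_{t_0,f_0}$. The parameter σ in the text corresponds
to $\sigma_t$ in the figure. (Axes: time and frequency; the box is centered at $(t_0, f_0)$ with extents $\sigma_t$ and $\sigma_f$.)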
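The expression of $\psi^{\sigma}_{t_0,f_0}$ is given by:
$$\psi^{\sigma}_{t_0,f_0}(t) = (\pi\sigma^2)^{-1/4}\, e^{2i\pi f_0 (t - t_0)}\, e^{-\frac{(t - t_0)^2}{2\sigma^2}}$$
and its Fourier transform (with the convention $\hat{\psi}(f) = \int \psi(t)\, e^{-2i\pi f t}\, dt$) is given by:
$$\hat{\psi}^{\sigma}_{t_0,f_0}(f) = (4\pi\sigma^2)^{1/4}\, e^{-2i\pi f t_0}\, e^{-\frac{\sigma^2}{2}(2\pi(f - f_0))^2} .$$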
The temporal resolution of $\psi^{\sigma}_{t_0,f_0}$ is given by σ and, by applying the Fourier transform,
one can observe that its spectral resolution is proportional to 1/σ. This means that the area of
the box in figure C.1 is constant whatever the choice of σ. To be more precise in time, one
needs to reduce σ, which in turn degrades the spectral resolution.
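The trade-off can be checked numerically. The sketch below assumes that the temporal and spectral spreads are measured as the standard deviations of $|\psi|^2$ and $|\hat{\psi}|^2$, which is one possible definition of $\sigma_t$ and $\sigma_f$. The sampling rate, the time window and the function names are also assumptions.

```python
import numpy as np

def gabor(t, t0, f0, sigma):
    """Gabor filter centered at time t0 and frequency f0, with unit L2 norm."""
    envelope = (np.pi * sigma**2) ** (-0.25) * np.exp(-(t - t0) ** 2 / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * f0 * (t - t0))

def spread(axis, density):
    """Standard deviation of `axis` under an unnormalized density."""
    p = density / density.sum()
    mean = (axis * p).sum()
    return np.sqrt((((axis - mean) ** 2) * p).sum())

fs = 1000.0                                   # sampling rate in Hz (assumed)
t = np.arange(-2.0, 2.0, 1.0 / fs)            # time axis in seconds
f = np.fft.fftfreq(t.size, d=1.0 / fs)        # matching frequency axis in Hz

products = []
for sigma in (0.02, 0.05, 0.1):
    psi = gabor(t, 0.0, 40.0, sigma)
    spectrum = np.fft.fft(psi)
    sigma_t = spread(t, np.abs(psi) ** 2)       # grows with sigma
    sigma_f = spread(f, np.abs(spectrum) ** 2)  # shrinks as 1 / sigma
    products.append(sigma_t * sigma_f)

print(products)  # the product stays the same for every sigma
```

With this definition of the spreads, the product $\sigma_t \sigma_f$ equals $1/(4\pi)$ for every σ, which is the constant box area mentioned above.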
M/EEG signals are oscillatory and typically consist of bursts of activations with a few
oscillations. These oscillations can be observed on the raw signal, especially at low frequencies.
For this reason, we prefer to parameterize Gabor filters with an oscillatory parameter ξ
rather than with σ [16]. The parameterization with σ is more classical in the signal processing
community. The parameter ξ is defined by:
$$\xi = 2\pi f_0 \sigma .$$
Figure C.2: Gabor atoms for different values of the oscillation parameter ξ (modified by varying
$f_0$ with a constant σ). A low oscillation parameter produces a transient wave, and a high
value a sustained oscillation.
The parameter σ stretches or compresses the time support of the filter without modifying
its frequency, whereas ξ can be related to the number of visible oscillations of the filter (cf.
figure C.2). When f0 increases, the parameter σ decreases in order to maintain the number
of oscillations ξ constant.
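A minimal sketch of this parameterization is given below. The function name `gabor_atom_xi`, the sampling rate and the list of frequencies are assumptions. For a fixed ξ, the width σ is derived from $f_0$ through $\sigma = \xi / (2\pi f_0)$.

```python
import numpy as np

def gabor_atom_xi(t, t0, f0, xi):
    """Gabor atom parameterized by the oscillation parameter xi = 2*pi*f0*sigma.

    Returns the complex atom and the corresponding sigma.
    """
    sigma = xi / (2.0 * np.pi * f0)
    envelope = (np.pi * sigma**2) ** (-0.25) * np.exp(-(t - t0) ** 2 / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * f0 * (t - t0)), sigma

fs = 2000.0                          # sampling rate in Hz (assumed)
t = np.arange(-2.0, 2.0, 1.0 / fs)   # time axis in seconds
xi = 10.0

summary = {}
for f0 in (5.0, 10.0, 20.0, 40.0):
    atom, sigma = gabor_atom_xi(t, 0.0, f0, xi)
    energy = np.sum(np.abs(atom) ** 2) / fs  # Riemann sum of |atom|^2
    periods_per_sigma = sigma * f0           # equals xi / (2*pi) for every f0
    summary[f0] = (sigma, energy, periods_per_sigma)

for f0, (sigma, energy, periods) in summary.items():
    print(f0, sigma, energy, periods)
```

Doubling $f_0$ halves σ, so the number of oscillation periods under the Gaussian envelope stays the same while the atom becomes shorter in time.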
Figure C.3 shows an example of a time frequency decomposition obtained with
Gabor filters (ξ = 10). Note that the temporal resolution of the atoms increases
with frequency.
Figure C.3: Sample time frequency map, a.k.a. spectrogram, estimated with Gabor filters
with ξ = 10 on real MEG data extracted from the retinotopy study (cf. chapter 5).
(Axes: time (s) and frequency (Hz).)
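A time frequency map of this kind can be sketched by filtering a signal with one Gabor atom per frequency. The code below runs on a synthetic signal made of two bursts, not on MEG data. The function name `gabor_tf_map`, the truncation of each atom at four standard deviations, and all signal parameters are assumptions.

```python
import numpy as np

def gabor_tf_map(signal, fs, freqs, xi=10.0):
    """Power of the Gabor coefficients, one row per frequency in `freqs`.

    Each atom has sigma = xi / (2*pi*f0) and is truncated at +/- 4 sigma.
    The signal is assumed to be longer than the longest truncated atom.
    """
    tf = np.empty((len(freqs), signal.size))
    for i, f0 in enumerate(freqs):
        sigma = xi / (2.0 * np.pi * f0)
        half = int(np.ceil(4.0 * sigma * fs))
        tk = np.arange(-half, half + 1) / fs
        atom = ((np.pi * sigma**2) ** (-0.25)
                * np.exp(2j * np.pi * f0 * tk)
                * np.exp(-tk**2 / (2.0 * sigma**2)))
        coeffs = np.convolve(signal, atom, mode="same") / fs
        tf[i] = np.abs(coeffs) ** 2
    return tf

fs = 500.0
t = np.arange(0.0, 3.0, 1.0 / fs)
# Synthetic signal: a 10 Hz burst around t = 1 s and a 30 Hz burst around t = 2 s.
signal = (np.exp(-(t - 1.0) ** 2 / (2 * 0.1**2)) * np.cos(2 * np.pi * 10.0 * t)
          + np.exp(-(t - 2.0) ** 2 / (2 * 0.05**2)) * np.cos(2 * np.pi * 30.0 * t))

freqs = np.arange(5.0, 41.0, 1.0)
tf_map = gabor_tf_map(signal, fs, freqs, xi=10.0)

i1 = np.argmin(np.abs(t - 1.0))
i2 = np.argmin(np.abs(t - 2.0))
print(freqs[np.argmax(tf_map[:, i1])], freqs[np.argmax(tf_map[:, i2])])
```

The rows of `tf_map` at high frequencies are computed with shorter atoms, which is why the temporal resolution improves as the frequency increases.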
APPENDIX D
PUBLICATIONS OF THE AUTHOR
JOURNAL PAPERS
A. Gramfort, T. Papadopoulo, S. Baillet and M. Clerc, Tracking cortical activity with spatio-temporal
constraints using graph cuts, SIAM Imaging Science, (submitted).
A. Gramfort, R. Keriven and M. Clerc, Graph-based estimation of 1-D variability in event related
neural responses, IEEE Transactions on Biomedical Engineering (TBME), (submitted).
A. Gramfort and M. Kowalski, M/EEG inverse problem with structured sparse priors: why
and how, In preparation.
A. Gramfort, T. Papadopoulo, E. Olivi and M. Clerc, OpenMEEG: opensource software for
quasistatic bioelectromagnetics, In preparation.
PEER-REVIEWED CONFERENCE PAPERS AND ABSTRACTS
A. Gramfort and M. Kowalski, Improving M/EEG source localization with an inter-condition
sparse prior, Proceedings International Symposium on Biomedical Imaging: From Nano to
Macro (ISBI), jun. 2009.
M. Kowalski and A. Gramfort, A priori par normes mixtes pour les problèmes inverses:
Application à la localisation de sources en M/EEG, Proceedings GRETSI, sept. 2009.
B. Cottereau, J. Lorenceau, A. Gramfort, M. Clerc, B. Thirion and S. Baillet, Fine chronometric
mapping of human visual areas, Human Brain Mapping, jun. 2009.
A. Gramfort, T. Papadopoulo, B. Cottereau, S. Baillet and M. Clerc, Tracking cortical activity
with spatio-temporal constraints using graph cuts, International Conference on Biomagnetism
(BIOMAG), aug. 2008.
B. Cottereau, A. Gramfort, J. Lorenceau, M. Clerc, B. Thirion and S. Baillet, Fast retinotopic
mapping of visual fields using MEG, Human Brain Mapping, jun. 2008.
A. Gramfort, B. Cottereau, M. Clerc, B. Thirion and S. Baillet, Challenging the estimation of
cortical activity from MEG with simulated fMRI-constrained retinotopic maps, Proceedings of
the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC), 4945-4948, aug. 2007.
A. Gramfort and M. Clerc, Low dimensional representations of MEG/EEG data using Laplacian
Eigenmaps, Proceedings Noninvasive Functional Source Imaging of the Brain and Heart
(NFSI), 169-172, oct. 2007.
M. Clerc, A. Gramfort, P. Landreau and T. Papadopoulo, MEG and EEG processing with
OpenMEEG, Proceedings of Neuromath, 2007.
B. Cottereau, J. Lorenceau, A. Gramfort, B. Thirion, M. Clerc and S. Baillet, Fast Retinotopic
Mapping of Visual Fields using MEG, Proceedings of Neuromath, 2007.
SOFTWARE
OpenMEEG: C++ package to solve the M/EEG forward problem with the symmetric boundary
element method.
https://gforge.inria.fr/projects/openmeeg
EMBAL: Matlab Toolbox for M/EEG inverse modeling with distributed source models.
https://gforge.inria.fr/projects/embal
Matlab EEGLAB Plug-in for single-trial parameter estimation.
https://gforge.inria.fr/projects/eeglab-plugins/
Bibliography
[1] Beck A and Teboulle M. Fast iterative shrinkage-thresholding algorithm for linear
inverse problems. SIAM J. Imaging Sciences, 2:183 – 202, 2009.
[2] Gramfort A., Cottereau B., Clerc M., Thirion B., and Baillet S. Challenging the estima-
tion of cortical activity from MEG with simulated fMRI-constrained retinotopic maps.
In EMBC 2007: Proceedings of the 29th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, Jun 2007.
[3] A A Ioannides, J P R Bolton, and C J S Clarke. Continuous probabilistic solutions to the
biomagnetic inverse problem. Inverse Problems, 6:523–542, 1990.
[4] G Adde. Methodes de traitement d’image appliques au probleme inverse en Magneto-
Electro-Encephalographie. PhD thesis, Ecole Nationale des Ponts et Chaussees, 2005.
[5] G Adde, M Clerc, and R Keriven. Imaging methods for MEG/EEG inverse problem.
In Proc. Joint Meeting of 5th International Conference on Bioelectromagnetism and 5th
International Symposium on Noninvasive Functional Source Imaging, 2005.
[6] Cottereau B. Modeles hierarchiques en imagerie MEG/EEG - Application a la creation
rapide de cartes retinotopiques. PhD thesis, Universite Paris-Sud 11, May 2008.
[7] B. Thirion and O. Faugeras. Nonlinear dimension reduction of fMRI data: the Lapla-
cian embedding approach. In Proceedings ISBI, pages 372–375, Apr 2004.
[8] B. Thirion, S. Dodel, and J.-B. Poline. Detection of signal synchronizations in resting-
state fMRI datasets. NeuroImage, 29:321–327, Aug 2005.
[9] S Baillet and L Garnero. A bayesian approach to introducing anatomo-functional priors
in the eeg/meg inverse problem. Biomedical Engineering, 44(5), Jan 1997.
[10] S. Baillet, J. C. Mosher, and R. M. Leahy. Electromagnetic brain imaging using
BrainStorm. In Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium
on, volume 1, pages 652–655, 2004.
[11] S. Baillet, J.C. Mosher, and R.M. Leahy. Electromagnetic brain mapping. IEEE Signal
Processing Magazine, 18(6):14–30, 2001.
[12] J. Bect, L. Blanc-Feraud, G. Aubert, and A. Chambolle. A l1-unified variational frame-
work for image restoration. In T. Pajdla and J. Matas, editors, Proc. European Con-
ference on Computer Vision (ECCV), volume LNCS 3024, pages 1–13, Prague, Czech
Republic, May 2004. Springer.
[13] Murat Belge, Misha E. Kilmer, and Eric L. Miller. Efficient Determination of Multiple
Regularization Parameters in a Generalized L-Curve Framework. Inverse Problems,
18:2002, 2002.
[14] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data
representation. Neural Computation, 15(6):1373–1396, jun 2003.
[15] C. Benar, M. Clerc, and T. Papadopoulo. Adaptive time-frequency models for single-
trial M/EEG analysis. In Karssemeijer and Lelieveldt, editors, Information Processing
in Medical Imaging, volume 4584 of Lecture Notes in Computer Science, pages 458–469.
Springer, 2007.
[16] C. Benar, T. Papadopoulo, B. Torresani, and M. Clerc. Consensus matching pursuit for
multi-trial EEG signals. Journal of Neuroscience Methods, 180(1):161–170, 2009.
[17] C.G. Benar, D. Schon, S. Grimault, B. Nazarian, B. Burle, M. Roth, J.M. Badier, P. Mar-
quis, C. Liegeois-Chauvel, and J.L. Anton. Single-trial analysis of oddball event-related
potentials in simultaneous EEG-fMRI. Human Brain Mapping, 28:602–613, 2007.
[18] P. Berg and M. Scherg. A fast method for forward computation of multiple-shell spher-
ical head models. Electroencephalogr. Clin. Neurophysiol., 90(1):58–64, 1994.
[19] D.A. Boas, D.H. Brooks, E.L. Miller, C.A. DiMarzio, M. Kilmer, R.J. Gaudette, and Quan
Zhang. Imaging the body with diffuse optical tomography. Signal Processing Magazine,
IEEE, 18(6):57–75, Nov 2001.
[20] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University
Press, March 2004.
[21] Y Boykov and M Jolly. Interactive graph cuts for optimal boundary and region
segmentation of objects in N-D images. International Conference on Computer Vision, 1:115, Jan
2001.
[22] Y Boykov and V Kolmogorov. An Experimental Comparison of Min-Cut/Max-Flow Al-
gorithms for Energy Minimization in Vision. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 26(9), Sep 2004.
[23] Y. Boykov and O. Veksler. Mathematical Models in Computer Vision: The Handbook. N.
Paragios, Y. Chen and O. Faugeras Eds., chapter Graph Cuts in Vision and Graphics:
Theories and Applications. Springer, 2006.
[24] A.A Brewer, J Liu, A R Wade, and B A Wandell. Visual field maps and stimulus selectivity
in human ventral occipital cortex. Nature Neuroscience, 8(8):1102–1109, 2005.
[25] K. Brodmann. Vergleichende Lokalisationslehre der Großhirnrinde. J.A. Barth, Leipzig,
1909.
[26] DH Brooks, GF Ahmad, RS MacLeod, and GM Maratos. Inverse electrocardiography
by simultaneous imposition of multiple constraints. IEEE transactions on biomedical
engineering, 46(1):3–18, 1999.
[27] A Bruce, S Sardy, and P Tseng. Block coordinate relaxation methods for nonparametric
signal denoising. Proceedings of SPIE, 3391(75), Jan 1998.
[28] J Bullier. Integrated model of visual processing. Brain Res. Reviews, 36:96–107, 2001.
[29] L Chalupa and J.S Werner. The visual neurosciences. The MIT Press, 2004.
[30] A Chambolle. An algorithm for total variation minimization and applications. Journal
of Mathematical Imaging and Vision, 20(1-2):89–97, Jan 2004.
[31] A. Chambolle and P. L. Lions. Image recovery via total variation minimization and
related problems. Numer. Math., 76:167–188, 1997.
[32] Tony F. Chan, Gene H. Golub, and Pep Mulet. A nonlinear primal-dual method for total
variation-based image restoration. SIAM J. Sci. Comput., 20(6):1964–1977, 1999.
[33] N. Chauveau, X. Franceries, B. Doyon, B. Rigaud, J.P. Morucci, and P. Celsis. Effects of
skull thickness, anisotropy, and inhomogeneity on forward EEG/ERP computations us-
ing a spherical three-dimensional resistor mesh model. Human Brain Mapping, 21:86–
97, 2004.
[34] S Chen, D Donoho, and M Saunders. Atomic decomposition by basis pursuit. SIAM
Journal on Scientific Computing, Jan 1999.
[35] David Cohen. Magnetoencephalography: Evidence of magnetic fields produced by
alpha-rhythm currents. Science, 161(3843):784–786, August 1968.
[36] David Cohen. Magnetoencephalography: Detection of the brain’s electrical activity with
a superconducting magnetometer. Science, 175(4022):664–666, February 1972.
[37] R.R Coifman, S Lafon, A.B Lee, M Maggioni, and Nadler. Geometric diffusions as a tool
for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of
the National Academy of Sciences, 102(21):7426–7431, 2005.
[38] Y. Cointepas, J.-F. Mangin, Line Garnero, J.-B. Poline, and H. Benali. BrainVISA:
Software platform for visualization and analysis of multi-modality brain data. In Proc.
7th HBM, page S98, Brighton, United Kingdom, 2001.
[39] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward split-
ting. Multiscale Modeling and Simulation, 4(4):1168–1200, November 2005.
[40] William J. Cook, William H. Cunningham, William R. Pulleyblank, and Alexander
Schrijver. Combinatorial Optimization. John Wiley & Sons, 1998.
[41] Diego Cosmelli, Olivier David, Jean-Philippe Lachaux, Jacques Martinerie, Line Gar-
nero, Bernard Renault, and Francisco Varela. Waves of consciousness: ongoing cortical
patterns during binocular rivalry. NeuroImage, 23(1):128–140, September 2004.
[42] B. Cottereau, A. Gramfort, J. Lorenceau, B. Thirion, M. Clerc, and S. Baillet. Fast
retinotopic mapping of visual fields using MEG. In Human Brain Mapping, 2008.
[43] B Cottereau, K Jerbi, and S Baillet. Multiresolution imaging of meg cortical sources
using an explicit piecewise model. Neuroimage, Sep 2007.
[44] B. Cottereau, J. Lorenceau, A. Gramfort, M. Clerc, and S. Baillet. Fine chronometric
mapping of human visual areas. In Human Brain Mapping, jun 2009.
[45] B. Cottereau, J. Lorenceau, A. Gramfort, B. Thirion, M. Clerc, and S. Baillet. Fast
retinotopic mapping of visual fields using meg. In Proceedings of Neuromath, 2007.
[46] Benoit Cottereau, Jean Lorenceau, Alexandre Gramfort, Bertrand Thirion, Maureen
Clerc, and Sylvain Baillet. Fast retinotopic mapping of visual fields using meg. In
Proceedings of Neuromath, 2007.
[47] B.N. Cuffin. EEG localization accuracy improvements using realistically shaped head
models. IEEE Trans. on Biomed. Engin., 43(3), 1996.
[48] F.H Lopes da Silva, A van Rotterdam, P Barts, E van Heusden, and W Burr. Model
of neuronal populations. the basic mechanism of rhythmicity. M.A. Corner, D.F. Swaab
(eds) Progress in brain research, 45:281–308, 1976.
[49] A Dale, A Liu, B Fischl, and R Buckner. Dynamic statistical parametric neurotechnique
mapping: combining fMRI and MEG for high-resolution imaging of cortical activity.
Neuron, 26:55–67, 2000.
[50] A Dale and M Sereno. Improved localization of cortical activity by combining EEG and
MEG with MRI cortical surface reconstruction. Journal of Cognitive Neuroscience, Jan
1993.
[51] Anders Dale, Martin Sereno, Bruce Fischl, Sean Marrett, Arthur Liu, Eric Halgren,
Kevin Teich, Christian Haselgrove, Doug Greve, and Florent Segonne. Freesurfer man-
ual.
[52] P.M Daniel and D Whitteridge. The representation of the visual field on the cerebral
cortex in monkeys. Journal of Neurophysiology, 159:203–221, 1961.
[53] G. B. Dantzig and D. R. Fulkerson. On the max-flow min-cut theorem of networks. Ann.
Math. Studies, 38, 1956.
[54] G. Dassios and F. Kariotou. Magnetoencephalography in ellipsoidal geometry. Journal
of Mathematical Physics, 44:220–241, 2003.
[55] I Daubechies, M Defrise, and C De Mol. An iterative thresholding algorithm for linear
inverse problems with a sparsity constraint. Communications on Pure and Applied
Mathematics, Jan 2004.
[56] I Daubechies, R DeVore, M Fornasier, and S Gunturk. Iteratively re-weighted least
squares minimization: Proof of faster than linear rate for sparse recovery. Information
Sciences and Systems, 2008.
[57] J de Munck. A linear discretization of the volume conductor boundary integral equa-
tion using analytically integrated elements. IEEE Trans. Biomed. Eng., 39(9):986–990,
1992.
[58] J.C. de Munck and M.J. Peters. A fast method to compute the potential in the multi-
sphere model. IEEE Trans. on Biomed. Engin., 40(11):1163–1174, 1993.
[59] Arnaud Delorme and Scott Makeig. EEGLAB: an open source toolbox for analysis of
single-trial EEG dynamics including independent component analysis. Journal of Neu-
roscience Methods, 134(1):9–21, 2004.
[60] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete
data via the em algorithm. Journal of the Royal Statistical Society. Series B (Method-
ological), 39(1):1–38, 1977.
[61] D Donoho. De-noising by soft-thresholding. IEEE Trans. Information Theory,
41(3):613–627, May 1995.
[62] R Dougherty, V Koch, A Brewer, B Fischer, J Modersitzki, and B Wandell. Visual field
representations and locations of visual areas v1/2/3 in human visual cortex. Journal of
Vision, 3:586–598, 2003.
[63] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle
regression. Annals of Statistics, 32:407–499, 2004.
[64] S Engel, D Rumelhart, B Wandell, A Lee, G Glover, E-J Chichilnisky, and M Shadlen.
fMRI of human visual cortex. Nature, 369:525–529, 1994.
[65] S.A Engel, G.H Glover, and B.A Wandell. Retinotopic organization in human visual
cortex and the spatial precision of functional mri. Cerebral Cortex, 7:181–192, 1997.
[66] D Van Essen, H Drury, S Joshi, and M Miller. Functional and structural mapping
of human cerebral cortex: Solutions are in the surfaces. Proceedings of the National
Academy of Sciences, 95:788–795, 1998.
[67] Sharbrough F, Chatrian G-E, Lesser RP, Luders H, Nuwer M, and Picton TW. American
Electroencephalographic Society Guidelines for Standard Electrode Position Nomencla-
ture. Journal of Clinical Neurophysiology, 8:200–202, 1991.
[68] O Faugeras, F Clement, R Deriche, R Keriven, T Papadopoulo, J Roberts, T Vieville,
F Devernay, J Gomes, G Hermosillo, P Kornprobst, and D Lingrand. The inverse EEG
and MEG problems: The adjoint space approach I: The continuous case. Technical
Report 3673, INRIA, 1999.
[69] I Fawcett, G Barnes, A Hillebrand, and K Singh. The temporal frequency tuning of
human visual cortex investigated using synthetic aperture magnetometry. Neuroimage,
Jan 2004.
[70] D.J Felleman and D.C Essen. Distributed hierarchical processing in the primate cere-
bral cortex. Cereb Cortex, 1:1–47, 1991.
[71] A Ferguson, X Zhang, and G Stroink. A complete linear discretization for calculating
the magnetic field using the boundary element method. IEEE Trans. Biomed. Eng.,
41(5):455–459, 1994.
[72] Agnes Trebuchon-Da Fonseca, Christian-G Benar, Fabrice Bartolomei, Jean Regis,
Jean-Francois Demonet, Patrick Chauvel, and Catherine Liegeois-Chauvel. Electro-
physiological study of the basal temporal language area: A convergence zone between
language perception and production networks. Clinical Neurophysiology, pages 1–12,
Feb 2009.
[73] L. Ford and D. Fulkerson. Flows in Networks. Princeton University Press, 1962.
[74] M Fornasier and F Pitolli. Adaptive iterative thresholding algorithms for magnetoen-
cephalography (meg). Journal of Computational and Applied Mathematics, page 10,
Oct 2007.
[75] PT Fox, FM Miezin, JM Allman, DC Van Essen, and ME Raichle. Retinotopic organi-
zation of human visual cortex mapped with positron-emission tomography. Journal of
Neuroscience, 7, 1987.
[76] W.J Freeman. Simulation of chaotic eeg patterns with a dynamic model of the olfactory
system. Biological Cybernetics, 56:139–150, 1987.
[77] J Friedman, T Hastie, H Hofling, and R Tibshirani. Pathwise coordinate optimization.
Annals of Applied Statistics, 1(2):302–332, Jan 2007.
[78] K Friston, L Harrison, J Daunizeau, and S Kiebel. Multiple sparse priors for the m/eeg
inverse problem. Neuroimage, Jan 2008.
[79] K Friston, R Henson, C Phillips, and J Mattout. Bayesian estimation of evoked and
induced responses. Human brain mapping, 27(9):722–35, Sep 2006.
[80] K.J Friston, D.E Glaser, R.N.A Henson, S Kiebel, C Phillips, and J Ashburner. Classical
and bayesian inference in neuroimaging: Applications. NeuroImage, 16(2):484–512,
2002.
[81] K.J Friston, W Penny, C Phillips, and Kiebel. Classical and bayesian inference in neu-
roimaging: Theory. NeuroImage, 16(2):465–483, 2002.
[82] F Fylan, I Holliday, K Singh, and S Anderson. Magnetoencephalographic investigation
of human cortical area v1 using color stimuli. Neuroimage, 6:47–57, Jan 1997.
[83] S. Gabriel, R.W. Lau, and C. Gabriel. The dielectric properties of biological tissues: Ii.
measurements in the frequency range 10 hz to 20 ghz. Physics in Medicine and Biology,
41:2251–2269, 1996.
[84] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian
restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence,
6(6):721–741, 1984.
[85] C. Genovese, N. Lazar, and T. Nichols. Thresholding of statistical maps in functional
neuroimaging using the false discovery rate. NeuroImage, 15(4):870–878, 2002.
[86] D Geselowitz. On bioelectric potentials in an inhomogeneous volume conductor.
Biophysical Journal, 7:1–11, 1967.
[87] D Geselowitz. On the magnetic field generated outside an inhomogeneous volume con-
ductor by internal volume currents. IEEE Trans. Magn., 6:346–347, 1970.
[88] A P Gibson, J C Hebden, and S R Arridge. Recent advances in diffuse optical imaging.
Physics in Medicine and Biology, 50(4):R1–43, Feb 2005.
[89] A.V. Goldberg and R.E. Tarjan. A new approach to the maximum-flow problem. Journal
of the Association for Computing Machinery, 35(4):921–940, Oct 1988.
[90] G Golub, M Heath, and G Wahba. Generalized cross-validation as a method for choosing
a good ridge parameter. Technometrics, Jan 1979.
[91] I Gorodnitsky, J George, and B Rao. Neuromagnetic source imaging with focuss: a
recursive weighted minimum norm algorithm. Electroencephalography and clinical
Neurophysiology, Jan 1995.
[92] I.F. Gorodnitsky and B.D. Rao. Sparse signal reconstruction from limited data using
FOCUSS: A re-weighted minimum norm algorithm. Signal Processing, IEEE Transac-
tions on, 45:600–616, Mar 1997.
[93] A Gramfort and M Kowalski. Improving m/eeg source localization with an inter-
condition sparse prior. In Proceedings ISBI, Jun 2009.
[94] Alexandre Gramfort and Maureen Clerc. Low dimensional representations of
MEG/EEG data using laplacian eigenmaps. In NFSI 2007: Proceedings of the 6th In-
ternational Symposium, pages 169–172, oct 2007.
[95] G Gratton, M.R Goodman-Wood, and M Fabiani. Comparison of neuronal and hemody-
namic measures of the brain response to visual stimulation : An optical imaging study.
Human Brain Mapping, 13:13–25, 2001.
[96] D Greig, B Porteous, and A Seheult. Exact maximum a posteriori estimation for binary
images. Journal of the Royal Statistical Society, Series B, 51(2):271–279, 1989.
[97] J Gross, J Kujala, M Hamalainen, and L Timmermann. Dynamic imaging of coherent
sources: studying neural interactions in the human brain. Proceedings of the National
Academy of Sciences, 98(2):694–699, Jan 2001.
[98] Berger H. Über das Elektroenkephalogramm des Menschen. Archiv für Psychiatrie
und Nervenkrankheiten, 87:527–570, 1929.
[99] Elaine T Hale, Wotao Yin, and Yin Zhang. A fixed-point continuation method for l1-
regularized minimization with applications to compressed sensing. CAAM Technical
Report TR07-07, page 45, Jul 2007.
[100] M Hamalainen, R Hari, R Ilmoniemi, J Knuutila, and O.V Lounasmaa. Magnetoen-
cephalography: theory, instrumentation, and applications to noninvasive studies of the
working human brain. Reviews of Modern Physics, 65(2):413–497, 1993.
[101] M Hamalainen and R Ilmoniemi. Interpreting magnetic fields of the brain: minimum
norm estimates. Medical and Biological Engineering and Computing, 32(1):35–42, Jan
1994.
[102] M Hamalainen and J Sarvas. Realistic conductivity geometry model of the human head
for interpretation of neuromagnetic data. IEEE Trans. Biomed. Eng., 36(2):165–171,
1989.
[103] P Hansen. Analysis of discrete ill-posed problems by means of the l-curve. SIAM
Review, Jan 1992.
[104] R Hari and N Forss. Magnetoencephalography in the study of human somatosensory
cortical processing. Philos Trans R Soc Lond, B, Biol Sci, 354(1387):1145–54, Jul 1999.
[105] D.A Harville. Maximum likelihood approaches to variance component estimation and
to related problems. Journal of the American Statistical Association, 72(358):320–338,
1977.
[106] M. Hebiri. Regularization with the smooth-lasso procedure. Preprint Laboratoire de
Probabilites et Modeles Aleatoires, 2008.
[107] M Hein, JY Audibert, and U von Luxburg. Graph Laplacians and their Convergence on
Random Neighborhood Graphs. The Journal of Machine Learning Research, 8:1325–
1370, 2007.
[108] Ming-Xiong Huang, Anders M Dale, Tao Song, Eric Halgren, Deborah L Harrington,
Igor Podgorny, Jose M Canive, Stephen Lewis, and Roland R Lee. Vector-based spatial-
temporal minimum l1-norm solution for meg. Neuroimage, 31(3):1025–37, Jul 2006.
[109] G Huiskamp, M Vroeijenstijn, R Dijk, G Wieneke, and A Huffelen. The need for cor-
rect realistic geometry in the inverse EEG problem. IEEE Trans. on Biomed. Engin.,
46(11):1281–1287, 1999.
[110] Chang-Hwan Im, Arvind Gururajan, Nanyin Zhang, Wei Chen, and Bin He. Spatial
resolution of eeg cortical source imaging revealed by localization of retinotopic organi-
zation in human primary visual cortex. J Neurosci Methods, 161(1):142–54, Mar 2007.
[111] Bancaud J, Talairach J, Bonis A, Schaub C, Szikla G, and Morel P et al. La
stereoelectroencephalographie dans l’epilepsie: informations neurophysiopathologiques
apportees par l’investigation fonctionelle stereotaxique. Paris, Masson, 1965.
[112] L. Jacob, G. Obozinski, and J.-P. Vert. Group Lasso with Overlap and Graph Lasso. In
ICML’09 Proceedings of the 26th international conference on Machine learning, 2009.
[113] Ben Jansen and Vincent Rit. Electroencephalogram and visual evoked potential gener-
ation in a mathematical model of coupled cortical columns. Biol. Cybern., 73:357–366,
1995.
[114] P Jaskowski and R Verleger. Amplitudes and latencies of single-trial ERP’s estimated
by a maximum-likelihood method. IEEE Transactions on Biomedical Engineering,
46(8):987–993, Aug 1999.
[115] H. H. Jasper. The ten-twenty electrode system of the International Federation. Elec-
troencephalography and Clinical Neurophysiology, 10:371–375, 1958.
[116] R Jenatton, J-Y Audibert, and F Bach. Structured variable selection with sparsity-
inducing norms. Technical report, WILLOW (INRIA Rocquencourt), Imagine, 2009.
[117] K Jerbi, Sylvain Baillet, J.C Mosher, G Nolte, L Garnero, and R.M Leahy. Localization
of realistic cortical activity in MEG using current multipoles. Neuroimage, 22(2):779–
793, 2004.
[118] K Jerbi, C Mosher, S Baillet, and R.M Leahy. On meg forward modelling using multi-
polar expansions. Physics in Medicine and Biology, 47:523–555, 2002.
[119] K. Jerbi, J.C. Mosher, S. Baillet, and R.M. Leahy. On MEG forward modelling using
multipolar expansions. Phys. Med. Biol., 47:523–555, 2002.
[120] Moreau J.J. Proximite et dualite dans un espace hilbertien. Bull. Soc. Math. France.,
93:273–299, 1965.
[121] E.G Jones and A Peters. Cerebral cortex, functional properties of cortical cells, volume 2.
Plenum Press, 1984.
[122] O Juan and Y Boykov. Active graph cuts. In Computer Vision and Pattern Recognition,
2006 IEEE Computer Society Conference on, volume 1, pages 1023–1029, 2006.
[123] T.P. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T.J. Sejnowski.
Analysis and visualization of single-trial event-related potentials. Human Brain Map-
ping, 14:166–185, 2001.
[124] E.R Kandel, J.H Schwartz, and T.M Jessel. Principles of Neural Science. McGraw-Hill
Education, 2000.
[125] C Koch. Biophysics of Computation: Information Processing in Single Neurons. Oxford
University Press, USA, 1999.
[126] V. Kolmogorov and R. Zabih. Computing visual correspondence with occlusions using
graph cuts. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE Interna-
tional Conference on, volume 2, pages 508–515, 2001.
[127] V Kolmogorov and R Zabih. What energy functions can be minimized via graph cuts?
IEEE Transactions on Pattern Analysis and Machine Intelligence, Jan 2004.
[128] M Kowalski and A Gramfort. A priori par normes mixtes pour les problèmes inverses:
Application à la localisation de sources en M/EEG. In Proceedings GRETSI, Jun 2009.
[129] M Kowalski and B Torresani. Sparsity and persistence: mixed norms provide simple
signals models with dependent coefficients. Signal, Image and Video Processing, 2008.
[130] M Kutas, G McCarthy, and E Donchin. Augmenting mental chronometry: the P300 as
a measure of stimulus evaluation time. Science, 197:792–795, Aug 1977.
[131] J Kybic, M Clerc, T Abboud, O Faugeras, R Keriven, and T Papadopoulo. A common
formalism for the integral formulations of the forward EEG problem. IEEE Transactions
on Medical Imaging, 24(1):12–28, 2005.
[132] J Kybic, M Clerc, O Faugeras, R Keriven, and T Papadopoulo. Generalized head
models for MEG/EEG: boundary element method beyond nested volumes. Phys. Med. Biol.,
51:1333–1346, 2006.
[133] J.-P. Lachaux, E. Rodriguez, Jacques Martinerie, and Francisco Varela. Measuring
phase-synchrony in brain signals. Human Brain Mapping, 8(4):194–208, Nov 1999.
[134] D Lange, H Pratt, and G Inbar. Modeling and estimation of single evoked brain poten-
tial components. IEEE Transactions on Biomedical Engineering, 44(9):791–799, Sep
1997.
[135] J Lefevre and S Baillet. Optical flow and advection on 2-Riemannian manifolds: A
common framework. IEEE Transactions on Pattern Analysis and Machine Intelligence,
30(6):1081–1092, 2008.
[136] J Lefevre, G Obozinski, and S Baillet. Imaging brain activation streams from optical
flow computation on 2-Riemannian manifolds. IPMI 2007, Lecture Notes in Computer
Science, 4587:470–481, Jan 2007.
[137] R. Lehoucq, D. Sorensen, and D. Yang. ARPACK users’ guide: Solution of large-scale
eigenvalue problems with implicitly restarted Arnoldi methods. SIAM Publications,
Philadelphia, Jan 1998.
[138] M. Leventon, E. Grimson, and O. Faugeras. Statistical Shape Influence in Geodesic
Active Contours. In CVPR, pages 316–323, 2000.
[139] Y Li. A globally convergent method for lp problems. SIAM Journal on Optimization,
3(3):609–629, 1993.
[140] Fa-Hsuan Lin, Thomas Witzel, Seppo P. Ahlfors, Steven M. Stufflebeam, John W. Bel-
liveau, and Matti S. Hamalainen. Assessing and improving the spatial accuracy in
MEG source localization by depth-weighted minimum-norm estimates. NeuroImage,
31(1):160–171, May 2006.
[141] M. Kowalski. Sparse regression using mixed norms. Applied and Computational
Harmonic Analysis, in press, 2009.
[142] David J. C. MacKay. Bayesian interpolation. Neural Computation, 4:415–447, 1992.
[143] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.
[144] S Mallat and Z Zhang. Matching pursuit with time-frequency dictionaries. IEEE Trans.
on Signal Processing, 41(12):3397–3414, 1993.
[145] K. Matsuura and Y. Okabe. Selective minimum-norm solution of the biomagnetic in-
verse problem. IEEE Trans Biomed Eng, 42(6):608–615, June 1995.
[146] J Mattout, C Phillips, W Penny, and M Rugg. MEG source localization under multiple
constraints: An extended Bayesian framework. Neuroimage, Jan 2006.
[147] C McGillem, J Aunon, and C Pomalaza. Improved waveform estimation procedures
for event-related potentials. IEEE Transactions on Biomedical Engineering, 32(6):371–
379, Jun 1985.
[148] J Meijs, O Weier, M Peters, and A van Oosterom. On the numerical accuracy of the
boundary element method. IEEE Trans. Biomed. Eng., 36:1038–1049, 1989.
[149] J. W. H. Meijs and M. Peters. The EEG and MEG using a model of eccentric spheres to
describe the head. IEEE Transactions on Biomedical Engineering, 34:913–920, 1987.
[150] S Meunier, L Garnero, A Ducorps, and Mazières. Human brain mapping in dystonia
reveals both endophenotypic traits and adaptative reorganization. Annals of Neurol-
ogy, 50:521–527, 2001.
[151] F Moradi, L.C Liu, K Cheng, R.A Waggoner, K Tanaka, and A.A Ioannides. Consistent
and precise localization of brain activity in human primary visual cortex by MEG and
fMRI. NeuroImage, 18:595–609, 2003.
[152] J Mosher, S Baillet, and R Leahy. Equivalence of linear approaches in bioelectromag-
netic inverse solutions. Statistical Signal Processing, Jan 2003.
[153] J Mosher and R Leahy. Source localization using recursively applied and projected
(RAP) MUSIC. Signal Processing, 47(2):332–339, Jan 1999.
[154] J Mosher, P Lewis, and R Leahy. Multiple dipole modeling and localization from
spatio-temporal MEG data. Biomedical Engineering, Jan 1992.
[155] John Mosher, Richard Leahy, and Paul Lewis. EEG and MEG: Forward solutions for
inverse methods. IEEE Transactions on Biomedical Engineering, 46(3):245–259, 1999.
[156] John Mosher, Paul Lewis, and Richard Leahy. Multiple dipole modeling and localization
from spatio-temporal MEG data. IEEE Transactions on Biomedical Engineering,
39(6):541–553, 1992.
[157] V.B Mountcastle. Modality and topographic properties of single neurons of cat’s so-
matosensory cortex. Journal of Neurophysiology, 20:408–434, 1957.
[158] D Mumford and J Shah. Optimal approximations by piecewise smooth functions and
associated variational problems. Comm. Pure Appl. Math, Jan 1989.
[159] S Murakami and Y Okada. Contributions of principal neocortical neurons to mag-
netoencephalography and electroencephalography signals. The Journal of Physiology,
575(3):925–936, 2006.
[160] Radford M. Neal. Bayesian Learning for Neural Networks (Lecture Notes in Statistics).
Springer, 1 edition, August 1996.
[161] Y Nesterov. Gradient methods for minimizing composite objective function. CORE
Discussion Papers 2007076, Universite catholique de Louvain, Center for Operations
Research and Econometrics (CORE), Sep 2007.
[162] T. Noesselt, S. A. Hillyard, M. G. Woldorff, A. Schoenfeld, T. Hagner, L. Jancke, C. Tem-
pelmann, H. Hinrichs, and H. J. Heinze. Delayed striate cortical activation during
spatial attention. Neuron, 35:575–587, 2002.
[163] John Nolte. The human brain: an introduction to its functional anatomy. Mosby-Year
Book, 3 edition, 1993.
[164] Jean-Claude Nedelec. Acoustic and Electromagnetic Equations. Springer Verlag, 2001.
[165] T. F. Oostendorp and A. van Oosterom. Source parameter estimation in inhomogeneous
volume conductors of arbitrary shape. IEEE Trans. Biomed. Eng., BME-36:382–391,
1989.
[166] W Ou, M Hamalainen, and P Golland. A distributed spatio-temporal EEG/MEG inverse
solver. Neuroimage, 44:932–946, 2009.
[167] S. E. Palmer. Vision Science-Photons to Phenomenology. MIT Press, Cambridge, MA,
1999.
[168] D Pantazis, T Nichols, S Baillet, and R Leahy. A comparison of random field theory and
permutation methods for the statistical analysis of MEG data. Neuroimage, Jan 2005.
[169] D. Pantazis, Thomas E Nichols, Sylvain Baillet, and R.M. Leahy. Spatiotemporal
localization of significant activation in MEG using permutation tests. Inf Process Med
Imaging, 18:512–523, Jul 2003.
[170] Theodore Papadopoulo and Sylvain Vallaghe. Implicit meshing for finite element meth-
ods using levelsets. In Proceedings of MMBIA 07, 2007.
[171] R Pascual-Marqui. Standardized low resolution brain electromagnetic tomography
(sLORETA): technical details. Methods Find. Exp. Clin. Pharmacology, 24(D):5–12, Jan
2002.
[172] R. D. Pascual-Marqui, C. M. Michel, and D. Lehmann. Low resolution electromagnetic
tomography: A new method for localizing electrical activity of the brain. International
Journal of Psychophysiology, 18:49–65, 1994.
[173] M Pastor, J Artieda, J Arbizu, and M Valencia. Human cerebral activation during
steady-state visual-evoked responses. Journal of neuroscience, Jan 2003.
[174] W. Penfield and T. Rasmussen. The Cerebral Cortex of Man: A Clinical Study of Local-
ization of Function. Macmillan, 1950.
[175] Alan Peters and Edward G. Jones, editors. Cellular Components of the Cerebral Cortex,
volume 1 of Cerebral Cortex. Plenum, New York, 1984.
[176] C Phillips. Source estimation in EEG. PhD thesis, Université de Liège, Belgium, 2000.
[177] C Phillips, J Mattout, M Rugg, and P Maquet. An empirical Bayesian solution to the
source reconstruction problem in EEG. Neuroimage, Jan 2005.
[178] C Phillips, M Rugg, and K Friston. Anatomically informed basis functions for EEG source
localization: Combining functional and . . . . Neuroimage, Jan 2002.
[179] J Picard and H Ratliff. Minimum cuts and related problems. Networks, 5(4):357–370,
Jan 1975.
[180] B Presnell, B Turlach, and M Osborne. A new approach to variable selection in least
squares problems. IMA Journal of Numerical Analysis, 20:389–404, 2000.
[181] R. Q. Quiroga and H. Garcia. Single-trial event-related potentials with wavelet
denoising. Clinical Neurophysiology, 114(2):376–390, 2003.
[182] R Ramírez and S Makeig. Neuroelectromagnetic source imaging using multiscale
geodesic neural bases and sparse Bayesian learning. In Human Brain Mapping, 2006.
[183] G Rager and W Singer. The response of cat visual cortex to flicker stimuli of variable
frequency. The European journal of neuroscience, 10(5):1856–1877, 1998.
[184] M. Raichle. A brief history of human brain mapping. Trends in Neurosciences, Decem-
ber 2008.
[185] P.O. Ranta-aho, A.S. Koistinen, J.O. Ollikainen, J.P. Kaipio, J. Partanen, and P.A.
Karjalainen. Single-trial estimation of multichannel evoked-potential measurements.
IEEE Transactions on Biomedical Engineering, 50(2):189–196, 2003.
[186] B Rao, K Engan, S Cotter, J Palmer, and K Kreutz-Delgado. Subset selection in noise
based on diversity measure minimization. Signal Processing, Jan 2003.
[187] D. Regan. Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic
Fields in Science and Medicine. Elsevier, 1989.
[188] R.T Rockafellar. Convex analysis. Princeton University Press, 1970.
[189] L Rudin, S Osher, and E Fatemi. Nonlinear total variation based noise removal algo-
rithms. Physica D, 60:259–268, 1992.
[190] Francesco Di Russo, Antígona Martínez, Martin I. Sereno, Sabrina Pitzalis, and
Steven A. Hillyard. Cortical sources of the early components of the visual evoked
potential. Human Brain Mapping, 15:95–111, 2002.
[191] Francesco Di Russo, Sabrina Pitzalis, Teresa Aprile, Grazia Spitoni, Fabiana Patria,
Alessandra Stella, Donatella Spinelli, and Steven A Hillyard. Spatiotemporal analy-
sis of the cortical sources of the steady-state visual evoked potential. Human brain
mapping, 28(4):323–334, Apr 2007.
[192] Jukka Sarvas. Basic mathematical and electromagnetic concepts of the biomagnetic
inverse problem. Phys. Med. Biol., 32(1):11–22, 1987.
[193] M Scherg and D Von Cramon. Two bilateral sources of the late AEP as identified by a
spatio-temporal dipole model. Electroencephalography and Clinical Neurophysiology,
62:32–44, 1985.
[194] K. Sekihara, S. Nagarajan, D. Poeppel, and Y. Miyashita. Reconstructing spatio-
temporal activities of neural sources from magnetoencephalographic data using a vec-
tor beamformer. In ICASSP ’01: Proceedings of the Acoustics, Speech, and Signal Pro-
cessing, 2001. on IEEE International Conference, pages 2021–2024, Washington, DC,
USA, 2001. IEEE Computer Society.
[195] M.I Sereno, A.M Dale, J.B Reppas, and K.K Kwong. Borders of multiple visual areas in
humans revealed by functional magnetic resonance imaging. Science, pages 889–893,
1995.
[196] R Shapley and J Victor. The effect of contrast on the transfer properties of cat retinal
ganglion cells. The Journal of Physiology, 285(1):275–298, 1978.
[197] D Sharon, M Hamalainen, R Tootell, and E Halgren. The advantage of combining MEG
and EEG: Comparison to fMRI in focally stimulated visual cortex. Neuroimage,
36:1225–1235, Mar 2007.
[198] X Shen and F Meyer. Low-dimensional embedding of fMRI datasets. Neuroimage,
41(3):886–902, Jan 2008.
[199] P. Suffczynski, S. Kalitzin, G. Pfurtscheller, and FH Lopes da Silva. Computational
model of thalamo-cortical networks: dynamical control of alpha rhythms in relation to
focal attention. International Journal of Psychophysiology, 43(1):25–40, 2001.
[200] J Talairach, J Bancaud, and G Szikla. Approche nouvelle de la neurochirurgie de
l’épilepsie. Méthodologie stéréotaxique et résultats thérapeutiques. Neurochirurgie,
20:1–240, 1974.
[201] A Tarantola. Popper, Bayes and the inverse problem. Nature Physics, 2, Aug 2006.
[202] R Tibshirani. Regression shrinkage and selection via the Lasso. J.R. Statist. Soc.,
58(1):267–288, 1996.
[203] A.N Tikhonov and V.Y Arsenin. Solutions of Ill-Posed Problems. Winston & Sons,
Washington, 1977.
[204] R Tootell, N Hadjikhani, J Mendola, S Marrett, and A Dale. From retinotopy to
recognition: fMRI in human visual cortex. Trends in Cognitive Sciences, 2(5):174–183, 1998.
[205] R. Tootell, E. Switkes, M. Silverman, and S. Hamilton. Functional anatomy of the
macaque striate cortex. II. Retinotopic organization. Journal of Neuroscience,
8(5):1531–1568, 1988.
[206] W Truccolo, K H Knuth, A Shah, S L Bressler, C E Schroeder, and M Ding. Estima-
tion of single-trial multicomponent ERPs: Differentially variable component analysis
(dVCA). Biological Cybernetics, 89(6):426–438, Dec 2003.
[207] P D Tuan, J Mocks, W Kohler, and T Gasser. Variable latencies of noisy signals: Esti-
mation and testing in brain potential data. Biometrika, 74(3):525–533, 1987.
[208] M Uusitalo and R Ilmoniemi. Signal-space projection method for separating MEG or
EEG into components. Medical and Biological Engineering and Computing, 35:135–
140, Jan 1997.
[209] K Uutela, M Hamalainen, and R Salmelin. Global optimization in the localization of
neuromagnetic sources. IEEE Transactions on Biomedical Engineering, 45(6):716–723,
June 1998.
[210] Pedro A Valdes-Sosa, Mayrim Vega-Hernandez, Jose Miguel Sanchez-Bornot,
Eduardo Martínez-Montes, and María Antonieta Bobes. EEG source imaging with
spatio-temporal tomographic nonnegative independent component analysis. Human Brain
Mapping, 30(6):1898–1910, Jun 2009.
[211] Sylvain Vallaghe. EEG and MEG forward modeling : computation and calibration.
PhD thesis, Universite de Nice-Sophia Antipolis, 2008.
[212] E van den Berg and M Friedlander. Probing the pareto frontier for basis pursuit solu-
tions. Department of Computer Science, Jan 2008.
[213] S. Vanni, J. Warnking, M. Dojat, C. Delon-Martin, J. Bullier, and C. Segebarth. Se-
quence of pattern onset responses in the human visual areas: an fMRI constrained
VEP source analysis. NeuroImage, 21(3):801–817, 2004.
[214] S Vanni, J Warnking, M Dojat, C Delon-Martin, J Bullier, and C Segebarth. Sequence
of pattern onset responses in the human visual areas: an fMRI constrained VEP source
analysis. NeuroImage, 21:801–817, 2004.
[215] B Van Veen, W Van Drongelen, M Yuchtman, and A Suzuki. Localization of brain elec-
trical activity via linearly constrained minimum variance spatial filtering. Biomedical
Engineering, 44(9):867–880, Jan 1997.
[216] J. Vernon Odom, M. Bach, C. Barber, M. Brigell, M.F. Marmor, and A.P. Tormene. Visual
evoked potentials standard. Documenta Ophthalmologica, 108:115–123, 2004.
[217] J Vrba and E Robinson. Signal processing in magnetoencephalography. Methods,
25(2):249–271, Oct 2001.
[218] G Wahba. Practical approximate solutions to linear operator equations when the data
are noisy. SIAM Journal on Numerical Analysis, Jan 1977.
[219] B Wandell, S Dumoulin, and A Brewer. Visual field maps in human cortex. Neuron,
56(2):366–383, Oct 2007.
[220] J.-Z. Wang, S.J. Williamson, and L. Kaufman. Magnetic source images determined by
a lead-field analysis: the unique minimum-norm least-squares estimation. Biomedical
Engineering, IEEE Transactions on, 39(7):665–675, July 1992.
[221] Z. Wang, A. Maier, D.A. Leopold, N.K. Logothetis, and H. Liang. Single-trial evoked
potential estimation using wavelets. Computers in Biology and Medicine, 37(4):463–
473, Apr 2007.
[222] J Warnking. Délinéation des aires visuelles rétinotopiques chez l’homme par IRM
fonctionnelle. PhD thesis, Université Joseph Fourier-Grenoble I, 2002.
[223] J Warnking, M Dojat, A Guerin-Dugue, C Delon-Martin, S Olympieff, N Richard,
A Chehikian, and C Segebarth. fMRI retinotopic mapping - step by step. NeuroImage,
17:1665–1683, 2002.
[224] Pierre Weiss. Algorithmes rapides d’optimisation convexe. Applications à la
reconstruction d’images et à la détection de changements. PhD thesis, Université de Nice
Sophia-Antipolis, November 2008.
[225] P. Welch. The use of fast Fourier transform for the estimation of power spectra: A
method based on time averaging over short, modified periodograms. Audio and Elec-
troacoustics, IEEE Transactions on, 15(2):70–73, Jun 1967.
[226] F Wendling, J.J Bellanger, F Bartolomei, and P Chauvel. Relevance of nonlinear
lumped-parameter models in the analysis of depth-EEG epileptic signals. Biological
Cybernetics, 83:367–378, 2000.
[227] D Wipf and S Nagarajan. A unified Bayesian framework for MEG/EEG source imaging.
Neuroimage, 44(3):947–966, Feb 2009.
[228] Adrien Wohrer and Pierre Kornprobst. Virtual Retina : A biological retina model
and simulator, with contrast gain control. Journal of Computational Neuroscience,
26(2):219–249, 2009.
[229] C. H. Wolters, A. Anwander, X. Tricoche, D. Weinstein, M. A. Koch, and R. MacLeod.
Influence of tissue conductivity anisotropy on EEG/MEG field and return current
computation in a realistic head model: A simulation and visualization study using
high-resolution finite element modeling. NeuroImage, 30(3):813–826, 2006.
[230] C.D. Woody. Characterization of an adaptive filter for the analysis of variable latency
neuroelectrical signals. Medical and Biological Engineering, 5:539–553, 1967.
[231] Z. Wu and R. Leahy. An optimal graph theoretic approach to data clustering: Theory
and its application to image segmentation. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 15(11):1101–1113, 1993.
[232] Ning Xu, Ravi Bansal, and Narendra Ahuja. Object segmentation using graph cuts
based active contours. Computer Vision and Pattern Recognition, IEEE Computer So-
ciety Conference on, 2:46, 2003.
[233] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM
Comput. Surv., 38(4), 2006.
[234] D. Yoshor, W. H. Bosking, G. M. Ghose, and J. H. Maunsell. Receptive fields in human
visual cortex mapped with surface electrodes. Cereb Cortex, 17(10):2293–2302, October
2007.
[235] M Yuan and Y Lin. Model selection and estimation in regression with grouped vari-
ables. Journal of the Royal Statistical Society, Jan 2006.
[236] Benjamin W. Zeff, Brian R. White, Hamid Dehghani, Bradley L. Schlaggar, and
Joseph P. Culver. Retinotopic mapping of adult human visual cortex with high-
density diffuse optical tomography. Proceedings of the National Academy of Sciences,
104(29):12169–12174, 2007.
[237] LH Zetterberg, L. Kristiansson, and K. Mossberg. Performance of a model for a local
neuron population. Biological Cybernetics, 31(1):15–26, 1978.
[238] Zhi Zhang. A fast method to compute surface potentials generated by dipoles within
multilayer anisotropic spheres. Phys. Med. Biol., 40:335–349, 1995.
[239] H Zou and T Hastie. Regularization and variable selection via the elastic net. Journal
of the Royal Statistical Society Series B, Jan 2005.