Pattern Recognition in EEG - UGent · Pattern Recognition in EEG Pieter-Jan Kindermans, UGent,...

Post on 30-Jan-2020

Pattern Recognition in EEG

Pieter-Jan Kindermans, UGent, Department of Electronics and Information Systems (ELIS)

Who is familiar with machine learning?

Who is familiar with MATLAB?

Who knows how to program?

We are

Thibault Verhoeven, Pieter-Jan Kindermans

- Faculty of engineering and architecture

- Department of Electronics and Information Systems (ELIS)

- Reservoir Lab (a Machine learning group)

- PhD students

- Work on/related to Brain-Computer Interfaces

To illustrate basic machine learning principles

Outline

- Event-Related Potential classification (the task)

- Machine learning methods (the basic tools)

- Unsupervised classification in BCI (advanced tools)

- The hands-on session (the work)

- Your own data?

Event-Related Potential classification (the task)

Focus on ERPs in Brain-Computer Interfaces

Application: Brain-Computer Interfaces

Event-Related Potentials (Oddball paradigm)

Stimuli

[Figure: signal amplitude versus time (s), 0 to 1 s, for P300 and non-P300 responses]

ERP-based BCI

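General principle behind ERP-based BCI

Stimulus sequence: 1 2 3 | 3 1 2 | 3 2 1

EEG/response: one response is recorded per stimulus

1 iteration: every stimulus is presented once (3 stimuli in this example)

1 trial: several iterations (3 in this example)

Attended stimulus?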
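A minimal sketch of how the attended stimulus could be inferred from the stimulus/iteration/trial structure above: a classifier produces one score per stimulus presentation, the scores are summed per stimulus over all iterations of the trial, and the stimulus with the highest total is selected. The function name, the made-up scores and the plain summation are assumptions for illustration, not the method used in the slides.

```python
from collections import defaultdict


def infer_attended_stimulus(stimuli, scores):
    """Return the stimulus with the highest summed classifier score.

    stimuli: stimulus identifiers in presentation order (one trial).
    scores:  one classifier output per presentation; higher means
             "more target-like" (hypothetical values in the example).
    """
    if len(stimuli) != len(scores):
        raise ValueError("need exactly one score per stimulus presentation")
    totals = defaultdict(float)
    for stimulus, score in zip(stimuli, scores):
        totals[stimulus] += score
    return max(totals, key=totals.get)


# One trial of three iterations over three stimuli, as in the diagram.
stimuli = [1, 2, 3, 3, 1, 2, 3, 2, 1]
# Made-up scores in which stimulus 2 tends to score highest.
scores = [0.1, 0.9, -0.2, 0.0, -0.3, 0.4, 0.2, 0.7, -0.1]
attended = infer_attended_stimulus(stimuli, scores)
print(attended)
```

Summing over iterations is what makes several noisy single-stimulus decisions usable: more iterations per trial give a more reliable total, at the cost of a slower BCI.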

ERP variations

All these variations exhibit the same stimulus/iteration structure

- Visual speller

- Auditory (e.g. Amuse, PASS2D)

- Tactile

- ...

Example: auditory ERPs

[Figure: auditory ERPs for targets (t) and non-targets (nt) at channels Cz (thick lines) and F5 (thin lines), amplitude in µV versus time from −100 to 800 ms, with ssAUC(t, nt) on a scale from −0.1 to 0.1 for the intervals 130 to 160, 180 to 240, 260 to 280, 300 to 350 and 420 to 460 ms. Panel A: supervised blocks. Panel B: unsupervised blocks.]

Many differences between subjects

[Figure: panels labelled faw, fcb and GA, plotted against time after stimulus (100 to 600 ms)]

Unfortunately, the raw data looks like this

[Figure: four raw traces, signal (−40 to 40) versus time (0 to 2)]

ERP Speller: The default approach

1. Record training data (quite boring)

2. Machine learning magic (supervised)

3. Use the BCI

Questions?

We will build a decoder to discriminate between target and non-target ERP responses

It is already implemented. If you get bored, you can extend the implementation such that it predicts the symbols as well.

Machine learning methods (the basic tools)

Machine learning rules

- Do not optimise the model on the data used for evaluation

- Keep the model as simple as possible

- Use a proper cost function

- Do not directly interpret the classifier weights

Linear Discriminant Analysis

Pictures from Pattern Recognition and Machine Learning (C. Bishop)

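Class-conditional densities with a shared covariance matrix, and class priors:

\[
p(x \mid C_1) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right)
\]

\[
p(x \mid C_2) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_2)^T \Sigma^{-1} (x - \mu_2)\right)
\]

\[
p(C_1) = \pi_{C_1}, \quad 0 \le \pi_{C_1} \le 1
\]

\[
p(C_2) = 1 - \pi_{C_1}
\]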

Decision rule: assign x to class C1 when

\[
w^T x + w_0 > 0
\]

\[
w = \Sigma^{-1}(\mu_1 - \mu_2)
\]

\[
w_0 = -\frac{1}{2}\mu_1^T \Sigma^{-1} \mu_1 + \frac{1}{2}\mu_2^T \Sigma^{-1} \mu_2 + \log\frac{p(C_1)}{p(C_2)}
\]

[Figure: three 2-D scatter plots, both axes from −20 to 20]

[Figure: three 2-D scatter plots, both axes from −10 to 10]
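The LDA rule (class 1 when w^T x + w_0 > 0, with w the inverse shared covariance times the difference of the class means) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the synthetic 2-D Gaussian data, the function names and the pooled covariance estimate are choices made here, and the hands-on code may do this differently.

```python
import numpy as np


def fit_lda(X1, X2, prior1=0.5):
    """Estimate w and w0 for two classes with a shared covariance.

    X1, X2: arrays of shape (n_samples, n_features), one per class.
    prior1: assumed prior probability p(C1); p(C2) = 1 - prior1.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance: both classes share one covariance matrix.
    centered = np.vstack([X1 - mu1, X2 - mu2])
    sigma = centered.T @ centered / (len(centered) - 2)
    w = np.linalg.solve(sigma, mu1 - mu2)
    w0 = (-0.5 * mu1 @ np.linalg.solve(sigma, mu1)
          + 0.5 * mu2 @ np.linalg.solve(sigma, mu2)
          + np.log(prior1 / (1.0 - prior1)))
    return w, w0


def predict_is_class1(X, w, w0):
    """Apply the decision rule w^T x + w0 > 0 to each row of X."""
    return X @ w + w0 > 0


# Synthetic data (an assumption): two Gaussians with the same covariance.
rng = np.random.default_rng(0)
cov = np.array([[2.0, 1.5], [1.5, 2.0]])
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([-2.0, 0.0], cov, size=200)

w, w0 = fit_lda(X1, X2)
# Scored on the training data only to check that the code runs;
# a real evaluation needs held-out data.
accuracy = (predict_is_class1(X1, w, w0).mean()
            + (~predict_is_class1(X2, w, w0)).mean()) / 2
print(w, w0, accuracy)
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a numerical-stability choice; the result is the same w as in the formula.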

Overfitting and regularisation

[Figure: error versus model complexity, with the train error, the test error and the optimum marked]

Regularisation for LDA

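Estimating covariance matrices is difficult, especially in high dimensions.

Shrinkage regularisation:

\[
\hat{\Sigma} = \Sigma + \lambda I, \qquad w = \hat{\Sigma}^{-1}(\mu_1 - \mu_2)
\]

Effect: as \lambda grows, \hat{\Sigma} approaches \lambda I, so the weight vector becomes proportional to the difference between the class means.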
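A small numerical check of the shrinkage effect, as a sketch: the regularised covariance is the estimate plus a multiple of the identity, and the cosine between w and the mean difference moves towards 1 as the regularisation strength grows. The covariance, the means and the strength values (called `lam` here) are made up for illustration.

```python
import numpy as np


def shrinkage_lda_weights(sigma, mu1, mu2, lam):
    """Return w = (sigma + lam * I)^-1 (mu1 - mu2)."""
    sigma_hat = sigma + lam * np.eye(sigma.shape[0])
    return np.linalg.solve(sigma_hat, mu1 - mu2)


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical covariance and class means.
sigma = np.array([[2.0, 1.5], [1.5, 2.0]])
mu1 = np.array([2.0, 0.0])
mu2 = np.array([-2.0, 0.0])
diff = mu1 - mu2

for lam in (0.0, 1.0, 100.0, 10000.0):
    w = shrinkage_lda_weights(sigma, mu1, mu2, lam)
    print(lam, cosine(w, diff))
```

Only the direction of w converges to the mean difference; its length shrinks roughly as 1/lam, which does not matter for classification once the bias is adjusted accordingly.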

Training and testing

[Diagram: the data is split into a train/validation part, divided into folds 1 to 5, and a separate test part]

Crossvalidation

[Diagram: the data is divided into folds 1 to 5; in each round one fold is the test set and the others form the training set]

Nested crossvalidation

[Diagram: the data is divided into outer folds 1 to 5 (train and test); the training part is divided again into subfolds 1 to 4 (train and validation), for all the inner folds]
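The nested scheme can be sketched as an index generator: five outer folds provide the test sets, and the remaining data of each outer fold is split again into four inner folds for validation. The function names and the interleaved fold assignment are assumptions made for this sketch; for EEG recordings, contiguous folds are often a safer choice because neighbouring samples are correlated.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=4):
    """Yield (train, validation, test) index lists.

    For every outer fold the remaining data is split again into inner
    folds, so hyperparameters can be tuned on validation data that
    never overlaps with the outer test fold.
    """
    all_indices = list(range(n_samples))
    for test in k_fold_indices(all_indices, outer_k):
        held_out = set(test)
        rest = [i for i in all_indices if i not in held_out]
        for validation in k_fold_indices(rest, inner_k):
            in_validation = set(validation)
            train = [i for i in rest if i not in in_validation]
            yield train, validation, test


splits = list(nested_cv_splits(n_samples=20))
print(len(splits))
```

In use, the inner splits select a hyperparameter such as the shrinkage strength, the model is refitted on the full outer training part, and only then scored once on the outer test fold.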

The importance of multivariate interactions

[Figure: four raw traces, signal (−40 to 40) versus time (0 to 2), and the P300 and non-P300 responses versus time (s), 0 to 1 s]

[Figure: three 2-D scatter plots, both axes from −10 to 10, shown alone and then next to the P300 and non-P300 responses]

Error measures

Computing the accuracy is simple, just count how many examples you have classified correctly!

Yes, but …

What if 99% of the samples belong to the non-target class? A model that always predicts non-target then reaches 99% accuracy without ever detecting a target.

[Figure: three 2-D scatter plots, both axes from −20 to 20]

Images: Wikipedia

Error measures

True positive rate (or sensitivity, recall):

\[
\mathrm{TPR} = \frac{TP}{P}
\]

True negative rate (or specificity):

\[
\mathrm{TNR} = \frac{TN}{N}
\]

False positive rate:

\[
\mathrm{FPR} = \frac{FP}{N}
\]

Error measures: balanced accuracy

The true positive rate (TPR = TP/P) and the true negative rate (TNR = TN/N) can be combined into a balanced accuracy by averaging them:

\[
\text{balanced accuracy} = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}
\]
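A short sketch of these measures in plain Python, applied to the imbalanced case mentioned earlier: 1 target among 99 non-targets and a model that always predicts non-target. The function name and the label encoding (True for target) are assumptions made for the example.

```python
def rates(y_true, y_pred):
    """Return (accuracy, TPR, TNR, FPR) for boolean labels (True = target).

    Both classes must occur in y_true, otherwise a rate is undefined.
    """
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    negatives = len(y_true) - positives
    fp = negatives - tn
    accuracy = (tp + tn) / len(y_true)
    return accuracy, tp / positives, tn / negatives, fp / negatives


# 1 target, 99 non-targets, and a model that always says "non-target".
y_true = [True] + [False] * 99
y_pred = [False] * 100

accuracy, tpr, tnr, fpr = rates(y_true, y_pred)
balanced_accuracy = (tpr + tnr) / 2
print(accuracy, tpr, tnr, fpr, balanced_accuracy)
```

The plain accuracy looks excellent here, while the balanced accuracy sits at chance level, which is the point of averaging TPR and TNR.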

Error measures: area under curve

[Figure: ROC curve, true positive rate versus false positive rate, both axes from 0 to 1]
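The area under the ROC curve can also be computed without drawing the curve: it equals the probability that a randomly chosen target receives a higher classifier score than a randomly chosen non-target, with ties counted as one half. The sketch below uses this pairwise definition; the scores and labels are made up, and the quadratic loop is chosen for readability rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons.

    Equals the probability that a randomly chosen target gets a higher
    score than a randomly chosen non-target; ties count as one half.
    """
    targets = [s for s, is_target in zip(scores, labels) if is_target]
    non_targets = [s for s, is_target in zip(scores, labels) if not is_target]
    if not targets or not non_targets:
        raise ValueError("both classes must be present")
    wins = 0.0
    for t in targets:
        for n in non_targets:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(targets) * len(non_targets))


# Hypothetical classifier outputs for three targets and three non-targets.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
labels = [True, True, True, False, False, False]
auc = roc_auc(scores, labels)
print(auc)
```

Unlike accuracy, this measure does not depend on a decision threshold and is not inflated by class imbalance: a classifier that gives every sample the same score gets 0.5.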

Questions?

The hands-on session (the work)

Data

- Visual ERP data from a 6x6 matrix speller

- 1:5 ratio of targets to non-targets

- 15 iterations

- 12 stimuli per iteration

- 64 channels at 240 Hz
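These numbers fix the size of the classification problem: 15 iterations of 12 stimuli give 180 stimulus responses, and with the 1:5 ratio that is 30 targets and 150 non-targets. The sketch below cuts one window per stimulus from a continuous recording and flattens it into a feature vector. The 0.6 s window, the onset spacing, the array layout and the random stand-in data are assumptions; the actual hands-on files may be organised differently.

```python
import numpy as np

SAMPLING_RATE_HZ = 240   # from the data description
N_CHANNELS = 64          # from the data description
N_ITERATIONS = 15        # from the data description
STIMULI_PER_ITERATION = 12
WINDOW_S = 0.6           # assumed epoch length, not given in the slides


def extract_epochs(eeg, onsets, window_samples):
    """Cut one fixed-length window per stimulus onset.

    eeg:    array of shape (n_channels, n_samples).
    onsets: stimulus onsets as sample indices.
    Returns an array of shape (n_epochs, n_channels * window_samples),
    i.e. one flattened feature vector per stimulus.
    """
    epochs = np.stack([eeg[:, o:o + window_samples] for o in onsets])
    return epochs.reshape(len(onsets), -1)


n_stimuli = N_ITERATIONS * STIMULI_PER_ITERATION
window_samples = int(round(WINDOW_S * SAMPLING_RATE_HZ))

# Synthetic stand-in for a recording; real data would be loaded instead.
rng = np.random.default_rng(0)
onset_step = 42  # assumed spacing between stimulus onsets, in samples
onsets = np.arange(n_stimuli) * onset_step
eeg = rng.standard_normal((N_CHANNELS, onsets[-1] + window_samples))

features = extract_epochs(eeg, onsets, window_samples)
print(features.shape)
```

With 64 channels and 144 samples per window, each feature vector has far more dimensions than there are examples, which is why a regularised covariance estimate matters for this data.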

Find the target samples!

Feedback