Page 1: Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Stefanos Zafeiriou, Adv. Statistical Machine Learning (Course 495)

• Goal (Lecture): To present Kernel Principal Component Analysis (KPCA) and give a small flavour of auto-encoders.

Page 2: Course 495: Advanced Statistical Machine Learning/Pattern ...

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

β€’ Pattern Recognition & Machine Learning by C. Bishop Chapter 12

β€’ KPCA: SchΓΆlkopf, Bernhard, Alexander Smola, and Klaus-Robert

MΓΌller. "Nonlinear component analysis as a kernel eigenvalue

problem." Neural computation 10.5 (1998): 1299-1319.

β€’ Auto-Encoder: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.

Materials

2

Page 3: Non-linear Component Analysis

The data are mapped from the input space to a feature space $H$ by a non-linear map $\varphi$:

$\boldsymbol{x}_i \in \mathbb{R}^F \;\mapsto\; \varphi(\boldsymbol{x}_i) \in H$

$H$ can be of arbitrary dimensionality (it could even be infinite).

Page 4: Kernel Principal Component Analysis

$\varphi(\boldsymbol{x}_i)^T \varphi(\boldsymbol{x}_j) = k(\boldsymbol{x}_i, \boldsymbol{x}_j)$

• $\varphi(\cdot)$ may not be explicitly known, or may be extremely expensive to compute and store.

• What is explicitly known is the dot product in $H$ (also known as the kernel $k$):

$k(\cdot\,,\cdot) : \mathbb{R}^F \times \mathbb{R}^F \to \mathbb{R}, \qquad (\boldsymbol{x}_i, \boldsymbol{x}_j) \mapsto k(\boldsymbol{x}_i, \boldsymbol{x}_j)$

• All positive (semi-)definite functions can be used as kernels (see the sketch below).
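To make the kernel trick concrete, here is a minimal Python sketch (the helper names `phi` and `k` are ours, purely for illustration): for the degree-2 homogeneous polynomial kernel on 2-D inputs, the explicit feature map and the kernel give the same dot product.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel
    for 2-D inputs: phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2]."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    """The matching kernel: k(x, y) = (x^T y)^2, computed in input space."""
    return (x @ y) ** 2

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)

# The dot product in H equals the kernel evaluated in R^F.
assert np.isclose(phi(x) @ phi(y), k(x, y))
```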

Page 5: KPCA - Kernel Matrix

• Given a training population of $n$ samples $\boldsymbol{x}_1, \dots, \boldsymbol{x}_n$, we compute the training kernel matrix (also called the Gram matrix):

$\boldsymbol{K} = [\varphi(\boldsymbol{x}_i)^T \varphi(\boldsymbol{x}_j)] = [k(\boldsymbol{x}_i, \boldsymbol{x}_j)]$

• All the computations are performed via the kernel matrix or the centralised kernel matrix

$\bar{\boldsymbol{K}} = [(\varphi(\boldsymbol{x}_i) - \boldsymbol{m}_\Phi)^T (\varphi(\boldsymbol{x}_j) - \boldsymbol{m}_\Phi)], \qquad \boldsymbol{m}_\Phi = \frac{1}{n}\sum_{i=1}^{n} \varphi(\boldsymbol{x}_i)$
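As a sketch of how the Gram matrix is formed in practice (function and variable names are ours):

```python
import numpy as np

def gram_matrix(X, kernel):
    """K[i, j] = kernel(x_i, x_j), where the rows of X (n x F) are samples."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# Example with a Gaussian RBF kernel (r is the bandwidth).
rbf = lambda x, y, r=1.0: np.exp(-np.sum((x - y) ** 2) / r ** 2)

X = np.random.default_rng(0).standard_normal((5, 3))  # 5 samples in R^3
K = gram_matrix(X, rbf)
assert np.allclose(K, K.T)  # a kernel matrix is symmetric (and PSD)
```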

Page 6: Course 495: Advanced Statistical Machine Learning/Pattern ...

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

π‘˜ 𝒙𝑖 , 𝒙𝑗 = π‘’βˆ’| π’™π‘–βˆ’π’™π‘— |2/π‘Ÿ2

π‘˜ 𝒙𝑖 , 𝒙𝑗 = (𝒙𝑖𝑇𝒙𝑗 + 𝑏)𝑛

π‘˜ 𝒙𝑖 , 𝒙𝑗 = tanh(𝒙𝑖𝑇𝒙𝑗 + 𝑏)

KPCA- Popular Kernels

Gaussian Radial Basis Function (RBF) kernel:

Polynomial kernel:

Hyperbolic Tangent kernel:

6
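These three kernels translate directly into code; a minimal sketch (parameter names follow the slide, default values are ours):

```python
import numpy as np

def rbf_kernel(xi, xj, r=1.0):
    """Gaussian RBF kernel: exp(-||xi - xj||^2 / r^2)."""
    return np.exp(-np.sum((xi - xj) ** 2) / r ** 2)

def polynomial_kernel(xi, xj, b=1.0, n=2):
    """Polynomial kernel: (xi^T xj + b)^n."""
    return (xi @ xj + b) ** n

def tanh_kernel(xi, xj, b=0.0):
    """Hyperbolic tangent kernel: tanh(xi^T xj + b)."""
    return np.tanh(xi @ xj + b)
```

As an aside, unlike the RBF and polynomial kernels, the hyperbolic tangent kernel is not positive semi-definite for every parameter choice.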

Page 7: KPCA - Kernel Matrix

Input space: $\boldsymbol{X} = [\boldsymbol{x}_1, \dots, \boldsymbol{x}_n]$

Feature space: $\boldsymbol{X}^\Phi = [\varphi(\boldsymbol{x}_1), \dots, \varphi(\boldsymbol{x}_n)]$

Centralised: $\bar{\boldsymbol{X}}^\Phi = [\varphi(\boldsymbol{x}_1) - \boldsymbol{m}_\Phi, \dots, \varphi(\boldsymbol{x}_n) - \boldsymbol{m}_\Phi] = \boldsymbol{X}^\Phi(\boldsymbol{I} - \boldsymbol{E}) = \boldsymbol{X}^\Phi \boldsymbol{M}, \qquad \boldsymbol{E} = \frac{1}{n}\mathbf{1}\mathbf{1}^T$

Kernel: $\boldsymbol{K} = [\varphi(\boldsymbol{x}_i)^T \varphi(\boldsymbol{x}_j)] = [k(\boldsymbol{x}_i, \boldsymbol{x}_j)] = \boldsymbol{X}^{\Phi T} \boldsymbol{X}^\Phi$

Centralised kernel: $\bar{\boldsymbol{K}} = [(\varphi(\boldsymbol{x}_i) - \boldsymbol{m}_\Phi)^T(\varphi(\boldsymbol{x}_j) - \boldsymbol{m}_\Phi)] = (\boldsymbol{I} - \boldsymbol{E})\boldsymbol{X}^{\Phi T}\boldsymbol{X}^\Phi(\boldsymbol{I} - \boldsymbol{E}) = (\boldsymbol{I} - \boldsymbol{E})\boldsymbol{K}(\boldsymbol{I} - \boldsymbol{E}) = \boldsymbol{K} - \boldsymbol{E}\boldsymbol{K} - \boldsymbol{K}\boldsymbol{E} + \boldsymbol{E}\boldsymbol{K}\boldsymbol{E}$
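The last identity is easy to verify numerically; a quick sketch (all names are ours):

```python
import numpy as np

n = 6
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
K = A @ A.T                    # any symmetric PSD matrix stands in for K

I = np.eye(n)
E = np.ones((n, n)) / n        # E = (1/n) * 1 1^T

# (I - E) K (I - E) expands to K - EK - KE + EKE.
assert np.allclose((I - E) @ K @ (I - E),
                   K - E @ K - K @ E + E @ K @ E)
```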

Page 8: KPCA - Optimization Problem

• KPCA cost function:

$\boldsymbol{U}^\Phi_o = \arg\max_{\boldsymbol{U}^\Phi} \operatorname{tr}[\boldsymbol{U}^{\Phi T} \boldsymbol{S}_t^\Phi \boldsymbol{U}^\Phi] = \arg\max_{\boldsymbol{U}^\Phi} \operatorname{tr}[\boldsymbol{U}^{\Phi T} \bar{\boldsymbol{X}}^\Phi \bar{\boldsymbol{X}}^{\Phi T} \boldsymbol{U}^\Phi] \quad \text{subject to} \quad \boldsymbol{U}^{\Phi T}\boldsymbol{U}^\Phi = \boldsymbol{I}$

$\boldsymbol{S}_t^\Phi \boldsymbol{U}^\Phi_o = \boldsymbol{U}^\Phi_o \boldsymbol{\Lambda}$

• The solution is given by the $d$ eigenvectors that correspond to the $d$ largest eigenvalues.

Page 9: KPCA - Computing Principal Components

• How can we compute the eigenvectors of $\boldsymbol{S}_t^\Phi$? Do you see any problem with that? We do not even know $\varphi$!

• Remember our Lemma that links the eigenvectors and eigenvalues of matrices of the form $\boldsymbol{A}\boldsymbol{A}^T$ and $\boldsymbol{A}^T\boldsymbol{A}$: if

$\bar{\boldsymbol{K}} = \bar{\boldsymbol{X}}^{\Phi T}\bar{\boldsymbol{X}}^\Phi = \boldsymbol{V}\boldsymbol{\Lambda}\boldsymbol{V}^T$, then $\boldsymbol{U}^\Phi_o = \bar{\boldsymbol{X}}^\Phi \boldsymbol{V}\boldsymbol{\Lambda}^{-\frac{1}{2}}$

• All computations are performed via $\bar{\boldsymbol{K}}$ (the so-called kernel trick); a numerical check of the Lemma follows below.
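A small numerical check with numpy (matrix names are ours; the columns of `Xc` play the role of the centred feature vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
Xc = rng.standard_normal((10, 4))   # F = 10 features, n = 4 samples (columns)
K = Xc.T @ Xc                       # n x n, computable from kernels alone

lam, V = np.linalg.eigh(K)          # K = V Lambda V^T
U = Xc @ V @ np.diag(lam ** -0.5)   # U = Xc V Lambda^{-1/2}

# U holds orthonormal eigenvectors of S = Xc Xc^T with the same eigenvalues.
S = Xc @ Xc.T
assert np.allclose(S @ U, U @ np.diag(lam))
assert np.allclose(U.T @ U, np.eye(len(lam)))
```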

Page 10: KPCA - Computing Principal Components

• Still, $\boldsymbol{U}^\Phi_o = \bar{\boldsymbol{X}}^\Phi \boldsymbol{V}\boldsymbol{\Lambda}^{-\frac{1}{2}}$ cannot be analytically computed.

• But we do not want to compute $\boldsymbol{U}^\Phi_o$. What we want is to compute latent features.

• That is, given a test sample $\boldsymbol{x}_t$, we want to compute $\boldsymbol{y} = \boldsymbol{U}^{\Phi T}_o \varphi(\boldsymbol{x}_t)$ (this can be performed via the kernel trick).

Page 11: KPCA - Extracting Latent Features

$\boldsymbol{y} = \boldsymbol{U}^{\Phi T}_o (\varphi(\boldsymbol{x}_t) - \boldsymbol{m}_\Phi)$
$= \boldsymbol{\Lambda}^{-\frac{1}{2}} \boldsymbol{V}^T \bar{\boldsymbol{X}}^{\Phi T} (\varphi(\boldsymbol{x}_t) - \boldsymbol{m}_\Phi)$
$= \boldsymbol{\Lambda}^{-\frac{1}{2}} \boldsymbol{V}^T (\boldsymbol{I} - \boldsymbol{E}) \boldsymbol{X}^{\Phi T} (\varphi(\boldsymbol{x}_t) - \tfrac{1}{n}\boldsymbol{X}^\Phi \mathbf{1})$
$= \boldsymbol{\Lambda}^{-\frac{1}{2}} \boldsymbol{V}^T (\boldsymbol{I} - \boldsymbol{E}) (\boldsymbol{X}^{\Phi T}\varphi(\boldsymbol{x}_t) - \tfrac{1}{n}\boldsymbol{X}^{\Phi T}\boldsymbol{X}^\Phi \mathbf{1})$
$= \boldsymbol{\Lambda}^{-\frac{1}{2}} \boldsymbol{V}^T (\boldsymbol{I} - \boldsymbol{E}) (g(\boldsymbol{x}_t) - \tfrac{1}{n}\boldsymbol{K}\mathbf{1})$

where

$g(\boldsymbol{x}_t) = \boldsymbol{X}^{\Phi T}\varphi(\boldsymbol{x}_t) = \begin{bmatrix} \varphi(\boldsymbol{x}_1)^T \varphi(\boldsymbol{x}_t) \\ \vdots \\ \varphi(\boldsymbol{x}_n)^T \varphi(\boldsymbol{x}_t) \end{bmatrix} = \begin{bmatrix} k(\boldsymbol{x}_1, \boldsymbol{x}_t) \\ \vdots \\ k(\boldsymbol{x}_n, \boldsymbol{x}_t) \end{bmatrix}$
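Putting the pieces together, an end-to-end sketch of KPCA training and projection (a minimal illustration under our own naming; the rows of `X` are samples):

```python
import numpy as np

def kpca_fit(X, kernel, d):
    """Keep the top-d eigenpairs of the centred kernel matrix."""
    n = X.shape[0]
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    E = np.ones((n, n)) / n
    Kc = K - E @ K - K @ E + E @ K @ E         # centred kernel matrix
    lam, V = np.linalg.eigh(Kc)                # ascending eigenvalues
    return X, K, lam[::-1][:d], V[:, ::-1][:, :d], kernel

def kpca_project(model, xt):
    """y = Lambda^{-1/2} V^T (I - E) (g(xt) - (1/n) K 1)."""
    X, K, lam, V, kernel = model
    n = X.shape[0]
    g = np.array([kernel(xi, xt) for xi in X])        # g(xt) = [k(x_i, xt)]
    E = np.ones((n, n)) / n
    centred = (np.eye(n) - E) @ (g - K.mean(axis=1))  # K.mean(1) = (1/n) K 1
    return np.diag(lam ** -0.5) @ V.T @ centred

rbf = lambda x, y, r=1.0: np.exp(-np.sum((x - y) ** 2) / r ** 2)
X = np.random.default_rng(3).standard_normal((20, 2))
model = kpca_fit(X, rbf, d=2)
y = kpca_project(model, X[0])                  # 2-D latent features of a sample
```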

Page 12: KPCA - Example

Page 13: Latent Feature Extraction with Neural Networks

Remember the PCA model?

$\boldsymbol{y}_i = \boldsymbol{W}^T \boldsymbol{x}_i, \qquad \hat{\boldsymbol{x}}_i = \boldsymbol{W}\boldsymbol{W}^T \boldsymbol{x}_i$
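Read this way, PCA is a linear auto-encoder: $\boldsymbol{W}^T$ encodes and $\boldsymbol{W}$ decodes. A minimal numpy sketch (our names), taking $\boldsymbol{W}$ to be the top-$d$ eigenvectors of the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 5))       # 100 samples in R^5 (rows)
Xc = X - X.mean(axis=0)                 # centre the data

d = 2
cov = Xc.T @ Xc / len(Xc)               # sample covariance matrix
eigval, eigvec = np.linalg.eigh(cov)    # ascending eigenvalues
W = eigvec[:, ::-1][:, :d]              # columns = top-d principal directions

Y = Xc @ W                              # encoder:  y_i = W^T x_i
X_hat = Y @ W.T                         # decoder:  x_hat_i = W W^T x_i

# The mean reconstruction error equals the sum of the discarded eigenvalues.
mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))
assert np.isclose(mse, eigval[::-1][d:].sum())
```

An auto-encoder replaces these two linear maps with non-linear neural networks, the step taken in the Hinton & Salakhutdinov paper cited in the Materials.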

Page 14: Latent Feature Extraction with Neural Networks

Page 15: Latent Feature Extraction with Neural Networks

• G. E. Hinton and R. R. Salakhutdinov, Science 2006; 313: 504-507.

Page 16: Toolboxes on Component Analysis

Matlab Toolbox for Dimensionality Reduction:
http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

Matlab 2013b has PPCA implemented:
http://www.mathworks.co.uk/help/stats/ppca.html

Page 17: Toolboxes on Component Analysis

ICA toolboxes for image and signal processing:
http://www.bsp.brain.riken.jp/ICALAB/

ICA for EEG Analysis:
http://mialab.mrn.org/software/

FastICA:
http://research.ics.aalto.fi/ica/fastica/

