Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Course 495: Advanced Statistical Machine Learning/Pattern Recognition
• Goal (Lecture): To present Kernel Principal Component Analysis (KPCA) and give a small flavour of auto-encoders.
Materials
• Pattern Recognition & Machine Learning by C. Bishop, Chapter 12
• KPCA: Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller. "Nonlinear component analysis as a kernel eigenvalue problem." Neural Computation 10.5 (1998): 1299-1319.
• Auto-Encoder: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.
Non-linear Component Analysis
$\phi: \mathbf{x}_i \in \mathbb{R}^F \mapsto \phi(\mathbf{x}_i) \in \mathcal{H}$
$\mathcal{H}$ can be of arbitrary dimensionality (it could even be infinite).
Kernel Principal Component Analysis
$\phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j) = k(\mathbf{x}_i, \mathbf{x}_j)$
• $\phi(\cdot)$ may not be explicitly known, or may be extremely expensive to compute and store.
• What is explicitly known is the dot product in $\mathcal{H}$ (also known as the kernel $k$):
$k(\cdot\,,\cdot): \mathbb{R}^F \times \mathbb{R}^F \to \mathbb{R}$
• All positive (semi-)definite functions can be used as kernels.
KPCA - Kernel Matrix
• Given a training population of $N$ samples $\mathbf{x}_1, \ldots, \mathbf{x}_N$ we compute the training kernel matrix (also called the Gram matrix):
$\mathbf{K} = [\phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)] = [k(\mathbf{x}_i, \mathbf{x}_j)]$
• All the computations are performed via the use of the kernel or the centralised kernel matrix:
$\bar{\mathbf{K}} = [(\phi(\mathbf{x}_i) - \mathbf{m}_\Phi)^T (\phi(\mathbf{x}_j) - \mathbf{m}_\Phi)], \quad \mathbf{m}_\Phi = \frac{1}{N} \sum_{i=1}^{N} \phi(\mathbf{x}_i)$
KPCA - Popular Kernels
Gaussian Radial Basis Function (RBF) kernel:
$k(\mathbf{x}_i, \mathbf{x}_j) = e^{-\| \mathbf{x}_i - \mathbf{x}_j \|^2 / \sigma^2}$
Polynomial kernel:
$k(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^T \mathbf{x}_j + b)^n$
Hyperbolic Tangent kernel:
$k(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\mathbf{x}_i^T \mathbf{x}_j + b)$
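The three kernels above can be written directly in code. A minimal sketch; the function names and default parameter values are my own choices, not from the lecture:

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian RBF: k(xi, xj) = exp(-||xi - xj||^2 / sigma^2)."""
    return np.exp(-np.sum((xi - xj) ** 2) / sigma ** 2)

def polynomial_kernel(xi, xj, b=1.0, n=2):
    """Polynomial: k(xi, xj) = (xi^T xj + b)^n."""
    return (xi @ xj + b) ** n

def tanh_kernel(xi, xj, b=0.0):
    """Hyperbolic tangent: k(xi, xj) = tanh(xi^T xj + b)."""
    return np.tanh(xi @ xj + b)
```

Note that the RBF kernel is positive definite for any $\sigma$, while the tanh kernel is only positive semi-definite for some choices of its parameters.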
KPCA - Kernel Matrix
Input space: $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]$
Feature space: $\mathbf{X}_\Phi = [\phi(\mathbf{x}_1), \ldots, \phi(\mathbf{x}_N)]$
Centralised: $\bar{\mathbf{X}}_\Phi = [\phi(\mathbf{x}_1) - \mathbf{m}_\Phi, \ldots, \phi(\mathbf{x}_N) - \mathbf{m}_\Phi] = \mathbf{X}_\Phi (\mathbf{I} - \mathbf{E}) = \mathbf{X}_\Phi \mathbf{M}, \quad \mathbf{E} = \frac{1}{N} \mathbf{1}\mathbf{1}^T$
Kernel: $\mathbf{K} = [\phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)] = [k(\mathbf{x}_i, \mathbf{x}_j)] = \mathbf{X}_\Phi^T \mathbf{X}_\Phi$
Centralised kernel: $\bar{\mathbf{K}} = [(\phi(\mathbf{x}_i) - \mathbf{m}_\Phi)^T (\phi(\mathbf{x}_j) - \mathbf{m}_\Phi)] = (\mathbf{I} - \mathbf{E}) \mathbf{X}_\Phi^T \mathbf{X}_\Phi (\mathbf{I} - \mathbf{E}) = (\mathbf{I} - \mathbf{E}) \mathbf{K} (\mathbf{I} - \mathbf{E}) = \mathbf{K} - \mathbf{E}\mathbf{K} - \mathbf{K}\mathbf{E} + \mathbf{E}\mathbf{K}\mathbf{E}$
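The Gram matrix and its centralised form $(\mathbf{I} - \mathbf{E})\mathbf{K}(\mathbf{I} - \mathbf{E})$ can be sketched as follows (the function names are mine, chosen for illustration):

```python
import numpy as np

def gram_matrix(X, kernel):
    """X is (N, F): one sample per row. Returns K with K[i, j] = k(x_i, x_j)."""
    N = X.shape[0]
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = kernel(X[i], X[j])
    return K

def centre_gram(K):
    """Centralised kernel matrix: (I - E) K (I - E) = K - EK - KE + EKE."""
    N = K.shape[0]
    E = np.full((N, N), 1.0 / N)
    I = np.eye(N)
    return (I - E) @ K @ (I - E)
```

For the linear kernel $k(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j$, centring the Gram matrix this way is equivalent to subtracting the sample mean from the data first, which is a quick sanity check on the formula.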
KPCA - Optimization Problem
• KPCA cost function:
$\mathbf{U}_\Phi^o = \arg\max_{\mathbf{U}_\Phi} \mathrm{tr}[\mathbf{U}_\Phi^T \mathbf{S}_t^\Phi \mathbf{U}_\Phi] = \arg\max_{\mathbf{U}_\Phi} \mathrm{tr}[\mathbf{U}_\Phi^T \bar{\mathbf{X}}_\Phi \bar{\mathbf{X}}_\Phi^T \mathbf{U}_\Phi]$
subject to $\mathbf{U}_\Phi^T \mathbf{U}_\Phi = \mathbf{I}$
• The solution satisfies the eigenvalue problem $\mathbf{S}_t^\Phi \mathbf{U}_\Phi^o = \mathbf{U}_\Phi^o \mathbf{\Lambda}$.
• The solution is given by the $d$ eigenvectors that correspond to the $d$ largest eigenvalues.
KPCA - Computing Principal Components
• How can we compute the eigenvectors of $\mathbf{S}_t^\Phi$? We do not even know $\phi$!!
• Do you see any problem with that?
• Remember our Lemma that links the eigenvectors and eigenvalues of matrices of the form $\mathbf{A}\mathbf{A}^T$ and $\mathbf{A}^T\mathbf{A}$:
if $\bar{\mathbf{K}} = \bar{\mathbf{X}}_\Phi^T \bar{\mathbf{X}}_\Phi = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^T$, then $\mathbf{U}_\Phi^o = \bar{\mathbf{X}}_\Phi \mathbf{V} \mathbf{\Lambda}^{-\frac{1}{2}}$
• All computations are performed via the use of $\bar{\mathbf{K}}$ (the so-called kernel trick).
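The Lemma can be checked numerically in the finite-dimensional case (a small illustration of my own construction, with $\mathbf{A}$ standing in for $\bar{\mathbf{X}}_\Phi$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # stands in for X_bar_Phi (F x N)

lam, V = np.linalg.eigh(A.T @ A)         # eigendecomposition of A^T A
order = np.argsort(lam)[::-1]            # sort eigenvalues descending
lam, V = lam[order], V[:, order]

U = A @ V @ np.diag(lam ** -0.5)         # lemma: eigenvectors of A A^T
```

The columns of `U` come out orthonormal and satisfy $\mathbf{A}\mathbf{A}^T \mathbf{U} = \mathbf{U}\mathbf{\Lambda}$, i.e. they are eigenvectors of the (large) outer-product matrix obtained purely from the (small) inner-product matrix.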
KPCA - Computing Principal Components
• Still, $\mathbf{U}_\Phi^o = \bar{\mathbf{X}}_\Phi \mathbf{V} \mathbf{\Lambda}^{-\frac{1}{2}}$ cannot be analytically computed.
• But we do not want to compute $\mathbf{U}_\Phi^o$.
• What we want is to compute latent features.
• That is, given a test sample $\mathbf{x}_t$ we want to compute $\mathbf{y} = \mathbf{U}_\Phi^{oT} \phi(\mathbf{x}_t)$ (this can be performed via the kernel trick).
KPCA - Extracting Latent Features
$\mathbf{y} = \mathbf{U}_\Phi^{oT} (\phi(\mathbf{x}_t) - \mathbf{m}_\Phi)$
$= \mathbf{\Lambda}^{-\frac{1}{2}} \mathbf{V}^T \bar{\mathbf{X}}_\Phi^T (\phi(\mathbf{x}_t) - \mathbf{m}_\Phi)$
$= \mathbf{\Lambda}^{-\frac{1}{2}} \mathbf{V}^T (\mathbf{I} - \mathbf{E}) \left( \mathbf{X}_\Phi^T \phi(\mathbf{x}_t) - \frac{1}{N} \mathbf{X}_\Phi^T \mathbf{X}_\Phi \mathbf{1} \right)$
$= \mathbf{\Lambda}^{-\frac{1}{2}} \mathbf{V}^T (\mathbf{I} - \mathbf{E}) \left( \mathbf{g}(\mathbf{x}_t) - \frac{1}{N} \mathbf{K} \mathbf{1} \right)$
where
$\mathbf{g}(\mathbf{x}_t) = \mathbf{X}_\Phi^T \phi(\mathbf{x}_t) = [\phi(\mathbf{x}_1)^T \phi(\mathbf{x}_t), \ldots, \phi(\mathbf{x}_N)^T \phi(\mathbf{x}_t)]^T = [k(\mathbf{x}_1, \mathbf{x}_t), \ldots, k(\mathbf{x}_N, \mathbf{x}_t)]^T$
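Putting the pieces together, the extraction formula above can be sketched end to end (a minimal illustration; the function names and return convention are my own, not the lecture's):

```python
import numpy as np

def kpca_fit(X, kernel, d):
    """Fit KPCA on X (N, F). Returns (X, K, V_d, lam_d) for later projection."""
    N = X.shape[0]
    K = np.array([[kernel(a, b) for b in X] for a in X])   # Gram matrix
    E = np.full((N, N), 1.0 / N)
    K_bar = (np.eye(N) - E) @ K @ (np.eye(N) - E)          # centralised kernel
    lam, V = np.linalg.eigh(K_bar)
    order = np.argsort(lam)[::-1][:d]                      # d largest eigenvalues
    return X, K, V[:, order], lam[order]

def kpca_transform(model, kernel, x_t):
    """y = Lambda^(-1/2) V^T (I - E) (g(x_t) - (1/N) K 1)."""
    X, K, V, lam = model
    N = X.shape[0]
    g = np.array([kernel(x_i, x_t) for x_i in X])          # g(x_t) = [k(x_i, x_t)]
    E = np.full((N, N), 1.0 / N)
    centred = (np.eye(N) - E) @ (g - K.mean(axis=1))       # (I - E)(g - (1/N) K 1)
    return np.diag(lam ** -0.5) @ V.T @ centred
```

With the linear kernel this reduces to ordinary PCA, which makes it easy to sanity-check: projecting the training samples yields features whose (uncentred) second-moment matrix is exactly $\mathbf{\Lambda}_d$.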
KPCA - Example
Latent Feature Extraction with Neural Networks
Remember the PCA model?
$\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i$
$\tilde{\mathbf{x}}_i = \mathbf{W} \mathbf{y}_i = \mathbf{W} \mathbf{W}^T \mathbf{x}_i$
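The PCA model above, viewed as an encode/decode pair, can be sketched as follows (my own construction for illustration, with the columns of $\mathbf{W}$ taken as the top-$d$ eigenvectors of the data covariance):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))  # (N, F) data
Xc = X - X.mean(axis=0)                      # centre the data

S = Xc.T @ Xc / len(Xc)                      # sample covariance matrix
lam, U = np.linalg.eigh(S)
W = U[:, np.argsort(lam)[::-1][:2]]          # top-2 principal directions

Y = Xc @ W                                   # encode:  y_i = W^T x_i
X_rec = Y @ W.T                              # decode:  x_rec_i = W W^T x_i
```

This encode/decode view is exactly what a linear auto-encoder with tied weights computes, which motivates the neural-network perspective on the following slides.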
Latent Feature Extraction with Neural Networks
• G. E. Hinton and R. R. Salakhutdinov, Science 2006; 313: 504-507.
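The idea can be made concrete with a toy one-hidden-layer auto-encoder trained by plain gradient descent. This is only a minimal sketch, not the deep architecture of Hinton & Salakhutdinov (2006); all names and hyper-parameters here are my own choices:

```python
import numpy as np

# Encoder: h = tanh(W1 x + b1); decoder: x_rec = W2 h + b2; squared-error loss.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 4))  # near-2D data
F, H = X.shape[1], 2                         # input dim, bottleneck dim

W1 = rng.standard_normal((H, F)) * 0.1; b1 = np.zeros(H)
W2 = rng.standard_normal((F, H)) * 0.1; b2 = np.zeros(F)

def reconstruct(data):
    return np.tanh(data @ W1.T + b1) @ W2.T + b2

loss0 = np.mean((X - reconstruct(X)) ** 2)   # loss before training

lr = 0.05
for _ in range(500):
    # forward pass
    Hid = np.tanh(X @ W1.T + b1)             # hidden code    (N, H)
    X_rec = Hid @ W2.T + b2                  # reconstruction (N, F)
    # backward pass for the (1/2N) sum-of-squares loss
    dX = (X_rec - X) / len(X)
    dW2 = dX.T @ Hid; db2 = dX.sum(axis=0)
    dA = (dX @ W2) * (1 - Hid ** 2)          # back-propagate through tanh
    dW1 = dA.T @ X;  db1 = dA.sum(axis=0)
    # gradient step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

loss = np.mean((X - reconstruct(X)) ** 2)
```

Unlike KPCA, the non-linearity here is learned end-to-end from the reconstruction objective rather than fixed in advance by the choice of kernel.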
Toolboxes on Component Analysis
Matlab Toolbox for Dimensionality Reduction:
http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
Matlab 2013b has PPCA implemented:
http://www.mathworks.co.uk/help/stats/ppca.html
Toolboxes on Component Analysis
ICA toolboxes for image and signal processing:
http://www.bsp.brain.riken.jp/ICALAB/
ICA for EEG Analysis:
http://mialab.mrn.org/software/
FastICA:
http://research.ics.aalto.fi/ica/fastica/