A Large Dimensional Analysis of Kernel LS-SVM
ED STIC reception meeting 2019-2020
Zhenyu Liao
joint work with Romain Couillet
CentraleSupélec, Université Paris-Saclay, France.
Nov 28, 2019
Z. Liao (CentraleSupélec, U Paris-Saclay) RMT for LS-SVM Nov 28, 2019 1 / 9
Motivation: counterintuitive phenomena in large dimensional learning
Big Data era: large dimensional data in massive amounts
data number n and dimension p both large and comparable: analysis with Random Matrix Theory
“curse of dimensionality” in large dimensional classification:
C1 : N (−µ, Ip) versus C2 : N (+µ, Ip)
x ∈ Rp has norm ‖x‖ = O(√p) with spread ‖x‖ − E[‖x‖] = O(1).
indeed, for xi ∈ Ca, xj ∈ Cb, a, b ∈ {1, 2},
(1/p)‖xi − xj‖² ≃ τ
for p large, regardless of the classes Ca, Cb!
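This distance concentration is easy to check numerically. Below is a minimal sketch (not from the slides; the function name and the choice ‖µ‖ = 1 are my assumptions) comparing the average normalized squared distance within C1 against that between C1 and C2. Here τ = 2, and both averages approach it as p grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_dist2(p, n=100):
    """Average normalized squared distance (1/p)*||xi - xj||^2,
    within class C1 and between classes C1/C2, for
    C1 = N(-mu, I_p), C2 = N(+mu, I_p) with ||mu|| = 1 (assumed)."""
    mu = np.zeros(p)
    mu[0] = 1.0
    X1 = rng.standard_normal((n, p)) - mu  # samples from C1
    X2 = rng.standard_normal((n, p)) + mu  # samples from C2
    D11 = np.sum((X1[:, None] - X1[None, :]) ** 2, axis=-1) / p
    D12 = np.sum((X1[:, None] - X2[None, :]) ** 2, axis=-1) / p
    within = D11[np.triu_indices(n, k=1)].mean()
    between = D12.mean()
    return within, between

# here tau = 2: both averages approach 2 as p grows,
# so pairwise distances alone no longer tell the classes apart
for p in (5, 500):
    w, b = norm_dist2(p)
    print(f"p = {p:3d}: within = {w:.3f}, between = {b:.3f}")
```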
Consequences for large kernel matrices: Gaussian mixture
Classify data x1, . . . , xn into C1 or C2 with the distance-based kernel Kij = exp(−‖xi − xj‖²/(2p)).
[Figure: heatmaps of the kernel matrix K, (a) p = 5, n = 500; (b) p = 250, n = 500]
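The loss of visible class structure in the heatmaps can be reproduced with a short sketch (my own illustration; the helper name and ‖µ‖ = 2 are assumptions): for Kij = exp(−‖xi − xj‖²/(2p)), all off-diagonal entries concentrate around f(τ) = e⁻¹ as p grows, so their spread, and with it the block structure, shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel_offdiag_spread(p, n=500):
    """Standard deviation of the off-diagonal entries of
    K_ij = exp(-||xi - xj||^2 / (2p)) for a two-class mixture
    C1 = N(-mu, I_p), C2 = N(+mu, I_p) with ||mu|| = 2 (assumed)."""
    mu = np.zeros(p)
    mu[0] = 2.0
    X = np.vstack([rng.standard_normal((n // 2, p)) - mu,
                   rng.standard_normal((n // 2, p)) + mu])
    sq = np.sum(X ** 2, axis=1)
    D = (sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / p  # (1/p)||xi - xj||^2
    K = np.exp(-D / 2.0)
    off = K[~np.eye(n, dtype=bool)]
    # for large p the entries concentrate around f(tau) = e^{-1}
    return off.std()

for p in (5, 250):
    print(f"p = {p:3d}: off-diagonal std = {kernel_offdiag_spread(p):.3f}")
```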
Consequences for large kernel matrices: real-world datasets
Distance-based kernel Kij = exp(−‖xi − xj‖²/(2p)) on MNIST and Fashion-MNIST data.
[Figure: heatmaps of the kernel matrix K, (a) MNIST; (b) Fashion-MNIST]
Question: impact of large p on performance of kernel-based methods, e.g., LS-SVM?
Reminder on least-squares support vector machine
find classifier g(x) = wᵀϕ(x) + b by minimizing
L(w, b) = (γ/n) ∑_{i=1}^n (yi − wᵀϕ(xi) − b)² + ‖w‖²
on training set {(xi, yi)}_{i=1}^n, yi ∈ {−1, +1}.
“kernel trick”: g(x) = αᵀ{k(x, xi)}_{i=1}^n + b with
α = Q(y − b1n), b = (1nᵀ Q y)/(1nᵀ Q 1n)
where Q ≡ (K + (n/γ) In)⁻¹ is the resolvent of the kernel matrix
K ≡ {k(xi, xj)}_{i,j=1}^n = {f(‖xi − xj‖²/p)}_{i,j=1}^n.
for a new datum x, assign it to C1 if g(x) < 0 and to C2 otherwise.
Key observation: since (1/p)‖xi − xj‖² ≃ τ for large p, K depends on f only through f(τ), f′(τ) and f′′(τ)!
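The closed-form training above can be sketched directly in code. This is a hedged sketch, not the authors' implementation: the function names are mine, and I take f(t) = e^{−t/2} together with the n/γ regularization from the slide.

```python
import numpy as np

def rbf_kernel(X, Z, p):
    """K_ij = f(||x_i - z_j||^2 / p) with f(t) = exp(-t/2),
    the distance-based kernel from the slides."""
    D = (np.sum(X ** 2, 1)[:, None] + np.sum(Z ** 2, 1)[None, :]
         - 2.0 * X @ Z.T) / p
    return np.exp(-D / 2.0)

def lssvm_fit(X, y, gamma):
    """Closed-form LS-SVM: alpha = Q (y - b 1_n),
    b = (1_n^T Q y) / (1_n^T Q 1_n),
    with resolvent Q = (K + (n/gamma) I_n)^{-1}."""
    n, p = X.shape
    K = rbf_kernel(X, X, p)
    Q = np.linalg.inv(K + (n / gamma) * np.eye(n))
    ones = np.ones(n)
    b = (ones @ Q @ y) / (ones @ Q @ ones)
    alpha = Q @ (y - b * ones)
    return alpha, b

def lssvm_decision(X_train, alpha, b, X_new):
    """g(x) = alpha^T k(x) + b; assign x to C1 if g(x) < 0, C2 otherwise."""
    return rbf_kernel(X_new, X_train, X_train.shape[1]) @ alpha + b
```

In practice `np.linalg.solve` would be preferable to forming the inverse, but the explicit resolvent Q mirrors the formulas on the slide.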
Main result: exact performance of LS-SVM
Main result
Under a binary Gaussian mixture model C1 : N (µ1, C1) vs. C2 : N (µ2, C2), the decision function g(x) is asymptotically Gaussian,
g(x | x ∈ Ca) ∼ N (Ea, Va), a ∈ {1, 2},
with (Ea, Va) depending on the data statistics (µa, Ca), the hyperparameter γ and the kernel function f only “locally”.
[Figure: histograms of g(x | x ∈ C1) and g(x | x ∈ C2), each matching its limiting Gaussian density]
⇒ direct access to classification performance via the Gaussian tail function Q(x) = (1/√(2π)) ∫_x^∞ e^{−t²/2} dt.
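A small helper (my own sketch) makes this concrete: with g(x | Ca) ∼ N (Ea, Va) and a decision threshold at 0, the two misclassification probabilities are Gaussian tails; the 1/2 averaging assumes equal class priors, which is an assumption for illustration.

```python
from math import erfc, sqrt

def Q(x):
    # Gaussian tail: Q(x) = (1/sqrt(2*pi)) * int_x^inf exp(-t^2/2) dt
    return 0.5 * erfc(x / sqrt(2.0))

def error_rate(E1, V1, E2, V2, threshold=0.0):
    """Asymptotic misclassification rate when g(x | Ca) ~ N(Ea, Va):
    C1 (label -1) errs when g > threshold, C2 (label +1) when g < threshold.
    Equal class priors are assumed for the 1/2 averaging."""
    err1 = Q((threshold - E1) / sqrt(V1))   # P(g > threshold | C1)
    err2 = Q((E2 - threshold) / sqrt(V2))   # P(g < threshold | C2)
    return 0.5 * (err1 + err2)

# symmetric example: E1 = -E2 and equal variances give error = Q(1) ~ 0.159
print(error_rate(-1.0, 1.0, 1.0, 1.0))
```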
When applied to real-world datasets
[Figure: histograms of g(x) on (a) MNIST and (b) Fashion-MNIST data, matching the Gaussian predictions¹]
Why?
MNIST and Fashion-MNIST data are clearly NOT mixtures of Gaussian vectors,
yet when n, p are large, algorithms tend to work AS IF they were: they use only first- and second-order statistical information.
¹Means and covariances of the data empirically estimated from the whole database.
Conclusion and take-away message
counterintuitive phenomena in real-world large dimensional learning
RMT as a tool to assess exact performance, understand and improve large dimensional learning
in this work: “curse of dimensionality” ⇒ exact performance of kernel LS-SVM
more to be done in the general context of large dimensional learning!
Some references and related works:
Zhenyu Liao and Romain Couillet. “A Large Dimensional Analysis of Least Squares Support Vector Machines”. In: IEEE Transactions on Signal Processing 67.4 (2019), pp. 1065–1074.
Cosme Louart, Zhenyu Liao, and Romain Couillet. “A Random Matrix Approach to Neural Networks”. In: The Annals of Applied Probability 28.2 (2018), pp. 1190–1248.
Zhenyu Liao and Romain Couillet. “On the Spectrum of Random Features Maps of High Dimensional Data”. In: Proceedings of the 35th International Conference on Machine Learning. Vol. 80. PMLR, 2018, pp. 3063–3071.
Zhenyu Liao and Romain Couillet. “The Dynamics of Learning: A Random Matrix Approach”. In: Proceedings of the 35th International Conference on Machine Learning. Vol. 80. PMLR, 2018, pp. 3072–3081.
Xiaoyi Mai and Romain Couillet. “A Random Matrix Analysis and Improvement of Semi-supervised Learning for Large Dimensional Data”. In: The Journal of Machine Learning Research 19.1 (2018), pp. 3074–3100.
Mohamed El Amine Seddik, Mohamed Tamaazousti, and Romain Couillet. “Kernel Random Matrices of Large Concentrated Data: the Example of GAN-Generated Images”. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 7480–7484.
Thank you!