Page 1: A Large Dimensional Analysis of Kernel LS-SVM

ED STIC reception meeting 2019-2020

Zhenyu Liao

joint work with Romain Couillet

CentraleSupélec, Université Paris-Saclay, France.

Nov 28, 2019


Page 2: Motivation: counterintuitive phenomena in large dimensional learning

Big Data era: large dimensional and massive amounts of data

- data number n and dimension p both large and comparable: analysis with Random Matrix Theory
- “curse of dimensionality” in large dimensional classification:

C1 : N(−µ, Ip) versus C2 : N(+µ, Ip)

x ∈ ℝ^p has norm ‖x‖ = O(√p) with spread ‖x‖ − E[‖x‖] = O(1).

indeed, for xi ∈ Ca, xj ∈ Cb, a, b ∈ {1, 2},

(1/p)‖xi − xj‖² ≃ τ

for p large, regardless of the classes Ca, Cb!
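
A quick numerical check of this concentration (a minimal sketch, not from the talk; the scaling ‖µ‖ = 1 is an illustrative assumption): for growing p, the within-class and between-class values of (1/p)‖xi − xj‖² both collapse onto the same constant τ.

```python
# Distance concentration in high dimension: for x_i ~ N(±mu, I_p) with
# ||mu|| = O(1), every value ||x_i - x_j||^2 / p converges to the same
# constant tau as p grows, whichever classes x_i and x_j come from.
import numpy as np

rng = np.random.default_rng(0)

def sq_dists_over_p(A, B):
    """Matrix of ||a_i - b_j||^2 / p for rows a_i of A and b_j of B."""
    p = A.shape[1]
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1) / p

for p in [5, 250, 5000]:
    mu = np.ones(p) / np.sqrt(p)               # assumption: ||mu|| = 1
    X1 = rng.standard_normal((50, p)) - mu     # class C1: N(-mu, I_p)
    X2 = rng.standard_normal((50, p)) + mu     # class C2: N(+mu, I_p)
    within = sq_dists_over_p(X1, X1)[np.triu_indices(50, k=1)]  # i != j pairs
    between = sq_dists_over_p(X1, X2).ravel()
    print(f"p={p:5d}  within: {within.mean():.3f} (std {within.std():.3f})"
          f"  between: {between.mean():.3f} (std {between.std():.3f})")
```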

Page 3: Consequences to large kernel matrices: Gaussian mixture

Classify data x1, . . . , xn into C1 or C2 with the distance-based kernel Kij = exp(−‖xi − xj‖²/(2p)).

[Figure: the kernel matrix K displayed for (a) p = 5, n = 500 and (b) p = 250, n = 500.]
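
This experiment is easy to reproduce; a minimal sketch (not from the talk; the mean vector µ and the matplotlib rendering are assumptions):

```python
# Build the kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2p)) on a
# two-class Gaussian mixture and display it for small and large p.
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(X):
    """K_ij = exp(-||x_i - x_j||^2 / (2p)) for the rows x_i of X."""
    n, p = X.shape
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-sq_dists / (2 * p))

rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 2)
for ax, p in zip(axes, [5, 250]):
    n = 500
    mu = np.ones(p) / np.sqrt(p)                         # assumption: ||mu|| = 1
    X = np.vstack([rng.standard_normal((n // 2, p)) - mu,   # first half: C1
                   rng.standard_normal((n // 2, p)) + mu])  # second half: C2
    ax.imshow(gaussian_kernel(X))
    ax.set_title(f"p = {p}, n = {n}")
plt.show()
```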


Page 4: Consequences to large kernel matrices: real-world datasets

Distance-based kernel Kij = exp(−‖xi − xj‖²/(2p)) on MNIST and Fashion-MNIST data.

[Figure: the kernel matrix K displayed for (a) MNIST and (b) Fashion-MNIST.]
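
The same computation on real data, as a sketch: it assumes scikit-learn's fetch_openml is available to download MNIST, and the digit pair 3 vs 5 is an arbitrary choice (the talk does not specify which classes are displayed).

```python
# The distance-based kernel on MNIST: n = 500 images grouped by class,
# p = 784 pixels. Same formula K_ij = exp(-||x_i - x_j||^2 / (2p)) as above.
import numpy as np
from sklearn.datasets import fetch_openml

X_all, y_all = fetch_openml("mnist_784", version=1, as_frame=False, return_X_y=True)
idx = np.r_[np.where(y_all == "3")[0][:250], np.where(y_all == "5")[0][:250]]
X = X_all[idx] / 255.0                              # rescale pixels to [0, 1]
sq_norms = (X ** 2).sum(axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
K = np.exp(-sq_dists / (2 * X.shape[1]))            # 500 x 500 kernel matrix
```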

Question: what is the impact of large p on the performance of kernel-based methods, e.g., LS-SVM?


Page 5: Reminder on least-squares support vector machine

find classifier g(x) = wᵀϕ(x) + b by minimizing

L(w, b) = (γ/n) ∑_{i=1}^n (yi − wᵀϕ(xi) − b)² + ‖w‖²

on the training set {(xi, yi)}_{i=1}^n, yi ∈ {−1, +1}.

“kernel trick”: g(x) = αᵀ {k(x, xi)}_{i=1}^n + b with

α = Q(y − b 1n),   b = (1nᵀ Q y) / (1nᵀ Q 1n),

where Q ≡ (K + (n/γ) In)⁻¹ is the resolvent of the kernel matrix

K ≡ {k(xi, xj)}_{i,j=1}^n = {f(‖xi − xj‖²/p)}_{i,j=1}^n.

for new x, assign to C1 if g(x) < 0 and C2 otherwise.

Key observation: since (1/p)‖xi − xj‖² ≃ τ for large p, K only depends on f(τ), f′(τ) and f′′(τ)!
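
For concreteness, a minimal implementation of the training and decision formulas above (the defaults f(t) = e^{−t/2}, i.e., the Gaussian kernel used earlier, and γ = 1 are illustrative assumptions):

```python
# LS-SVM via the "kernel trick" formulas: Q = (K + n/gamma I)^-1,
# alpha = Q(y - b 1_n), b = 1_n^T Q y / 1_n^T Q 1_n.
import numpy as np

def lssvm_train(K, y, gamma=1.0):
    """Return (alpha, b) from the n x n kernel matrix K and labels y in {-1, +1}."""
    n = K.shape[0]
    Q = np.linalg.inv(K + (n / gamma) * np.eye(n))  # resolvent of K
    ones = np.ones(n)
    b = (ones @ Q @ y) / (ones @ Q @ ones)
    alpha = Q @ (y - b * ones)
    return alpha, b

def lssvm_decision(X_train, x_new, alpha, b, f=lambda t: np.exp(-t / 2)):
    """g(x) = alpha^T {k(x, x_i)}_i + b with k(x, x_i) = f(||x - x_i||^2 / p)."""
    p = X_train.shape[1]
    return alpha @ f(((X_train - x_new) ** 2).sum(axis=1) / p) + b
```

A new point x is then assigned to C1 when lssvm_decision returns a negative value, matching the rule above.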


Page 6: Main result: exact performance of LS-SVM

Main result: under a binary Gaussian mixture model C1 : N(µ1, C1) vs. C2 : N(µ2, C2), the decision function g(x) is asymptotically Gaussian,

g(x | x ∈ Ca) ∼ N(Ea, Va), a ∈ {1, 2},

with (Ea, Va) depending on the data statistics (µa, Ca), the hyperparameter γ, and the kernel function f “locally” (i.e., only through f(τ), f′(τ), f′′(τ)).

[Figure: histograms of g(x | x ∈ C1) and g(x | x ∈ C2).]

⇒ direct access to classification performance via the Gaussian tail Q(x) = (1/√(2π)) ∫_x^∞ e^{−t²/2} dt.
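
Concretely: with the decision rule “assign to C1 if g(x) < 0”, the asymptotic law g | Ca ∼ N(Ea, Va) gives the misclassification probabilities Q(−E1/√V1) and 1 − Q(−E2/√V2). A sketch (the numerical values of (Ea, Va) below are hypothetical, not taken from the talk):

```python
# Plug-in error estimates from the asymptotic Gaussian law of g(x).
from math import sqrt
from scipy.stats import norm

def lssvm_error(E1, V1, E2, V2):
    """P(error | C1) and P(error | C2) under g | Ca ~ N(Ea, Va)."""
    err1 = norm.sf(-E1 / sqrt(V1))   # C1 misclassified when g >= 0: Q(-E1/sqrt(V1))
    err2 = norm.cdf(-E2 / sqrt(V2))  # C2 misclassified when g < 0
    return err1, err2

# hypothetical statistics for the two classes
print(lssvm_error(E1=-0.005, V1=1e-5, E2=0.005, V2=1e-5))
```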


Page 7: When applied to real-world datasets

[Figure: histograms of the decision function g(x) for (a) MNIST and (b) Fashion-MNIST.]

Why?

- MNIST and Fashion-MNIST data are clearly NOT mixtures of Gaussian vectors
- when n, p are large, algorithms tend to work AS IF they were: they use only first- and second-order statistical information¹

¹ Means and covariances of the data are empirically estimated from the whole database.

Page 8: Conclusion and take-away message

counterintuitive phenomena in real-world large dimensional learning

RMT as a tool to assess exact performance, understand and improve large dimensional learning

in this work: “curse of dimensionality” ⇒ exact performance of kernel LS-SVM

more to be done in the general context of large dimensional learning!

Some references and related works:

Zhenyu Liao and Romain Couillet. “A Large Dimensional Analysis of Least Squares Support Vector Machines”. In: IEEE Transactions on Signal Processing 67.4 (2019), pp. 1065–1074.

Cosme Louart, Zhenyu Liao, and Romain Couillet. “A Random Matrix Approach to Neural Networks”. In: The Annals of Applied Probability 28.2 (2018), pp. 1190–1248.

Zhenyu Liao and Romain Couillet. “On the Spectrum of Random Features Maps of High Dimensional Data”. In: Proceedings of the 35th International Conference on Machine Learning. Vol. 80. PMLR, 2018, pp. 3063–3071.

Zhenyu Liao and Romain Couillet. “The Dynamics of Learning: A Random Matrix Approach”. In: Proceedings of the 35th International Conference on Machine Learning. Vol. 80. PMLR, 2018, pp. 3072–3081.

Xiaoyi Mai and Romain Couillet. “A Random Matrix Analysis and Improvement of Semi-supervised Learning for Large Dimensional Data”. In: The Journal of Machine Learning Research 19.1 (2018), pp. 3074–3100.

Mohamed El Amine Seddik, Mohamed Tamaazousti, and Romain Couillet. “Kernel Random Matrices of Large Concentrated Data: the Example of GAN-Generated Images”. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 7480–7484.


Page 9: Thank you

Thank you!


