Good visualization / Mathematical framework / Implementation
Visualizing Data Using t-SNE
David Khosid
Dec. 21, 2015
Agenda
- Good visualization
- Mechanics of t-SNE
- Examples: image, text, voice
- Scalability: visualizing large datasets, up to tens of millions of points
- Implementations: scikit-learn, Matlab, Torch
MNIST visualization with PCA
This PCA visualization is poor: the digit classes overlap heavily.
MNIST visualization with t-SNE in 2D
A t-SNE visualization can help you identify distinct clusters.
[Figure: (a) MNIST in t-SNE; (b) learning animation (view with Adobe Reader)]
YouTube link to 3D t-SNE: http://youtu.be/tMQAwqsMb6k
Good visualization (requirements)
- Each high-dimensional object is represented by a low-dimensional object
- The neighborhood is preserved
- Distant points correspond to dissimilar objects
- Scalability to large, high-dimensional data sets
Manifold Learning
Manifolds:
- MNIST: ~10 intrinsic dimensions in 28x28 images
- Images: ~100 intrinsic dimensions
- Text: ~1000 intrinsic dimensions

PCA is mainly concerned with preserving large pairwise distances in the map.

[Figure: Swiss Roll manifold]
Idea of t-SNE
A data point is a point x_i in the original data space R^D. A map point is a point y_i in the map space R^2/R^3; every map point represents one of the original data points. t-SNE is a visualization algorithm that chooses the positions of the map points in R^2/R^3.

t-SNE procedure:
1. Compute an N x N similarity matrix in the original R^D space.
2. Define an N x N similarity matrix in the low-dimensional embedding space; this embedding is what is learned.
3. Define the cost function: the Kullback-Leibler divergence between the two probability distributions.
4. Learn the low-dimensional embedding by minimizing this cost.

Result: t-SNE focuses on accurately modelling small pairwise distances, i.e., on preserving local data structure in R^2/R^3.
Conditional similarity between two data points
Similarity of data points x_i in the data space R^D:

p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}

p_{j|i} measures how close x_j is to x_i, under a Gaussian distribution centered at x_i with a given variance \sigma_i^2.
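As a sketch (not the reference implementation), the conditional similarity above is a few lines of NumPy; the helper name `conditional_p` is illustrative:

```python
import numpy as np

def conditional_p(X, i, sigma_i):
    """Conditional similarities p_{j|i}: a Gaussian kernel of width sigma_i
    centered at x_i, normalized over j != i (so p_{i|i} = 0)."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)   # squared distances ||x_i - x_j||^2
    logits = -d2 / (2.0 * sigma_i ** 2)
    logits[i] = -np.inf                    # exclude j = i from the distribution
    p = np.exp(logits - logits.max())      # shift by the max for numerical stability
    return p / p.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
p = conditional_p(X, i=0, sigma_i=1.0)     # a probability distribution over j != 0
```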
Symmetric similarity
Similarity of data points x_i in the data space R^D:

p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}   (1)

Make the similarity metric p_ij symmetric. The main advantage of symmetry is a simpler gradient in the learning stage:

p_{ij} = \frac{p_{i|j} + p_{j|i}}{2N}   (2)

- We set p_{ii} = 0, as we are only interested in pairwise similarities.
- \sigma_i is chosen such that each data point has a fixed perplexity (effective number of neighbors).
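The perplexity-to-sigma mapping has no closed form, so implementations typically binary-search each \sigma_i; a NumPy sketch (the function names are mine, not from any library):

```python
import numpy as np

def perplexity(p):
    """Perplexity 2^H(p) of a distribution (with the 0 log 0 := 0 convention)."""
    nz = p[p > 0]
    return 2.0 ** (-np.sum(nz * np.log2(nz)))

def sigma_for_perplexity(X, i, target, tol=1e-5, max_iter=100):
    """Binary-search sigma_i so that p_{.|i} has the target perplexity.
    Perplexity grows monotonically with sigma, so bisection converges."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    lo, hi = 1e-10, 1e10
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        logits = -d2 / (2.0 * sigma ** 2)
        logits[i] = -np.inf                 # exclude j = i
        p = np.exp(logits - logits.max())
        p /= p.sum()
        perp = perplexity(p)
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma                      # too wide: too many effective neighbors
        else:
            lo = sigma
    return sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
sigma0 = sigma_for_perplexity(X, i=0, target=10.0)
```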
Similarity of map points in Low Dimension
Student t-distribution with one degree of freedom (the same as the Cauchy distribution):

q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq m} (1 + \|y_k - y_m\|^2)^{-1}}   (3)

- We set q_{ii} = 0, as we are only interested in pairwise similarities.
- Heavy-tailed (discussed later).
- Still closely related to the Gaussian.
- Computationally convenient (no exponential).
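A NumPy sketch of Eq. (3); the helper name `q_matrix` is illustrative:

```python
import numpy as np

def q_matrix(Y):
    """Map-space similarities q_ij: Student-t kernel with one degree of
    freedom, normalized over all pairs k != m, with q_ii = 0."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + d2)            # (1 + ||y_i - y_j||^2)^(-1)
    np.fill_diagonal(num, 0.0)        # q_ii = 0
    return num / num.sum()

rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 2))
Q = q_matrix(Y)                       # symmetric, non-negative, sums to 1
```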
Kullback-Leibler divergence (Cost Function)
(p_ij) is fixed; (q_ij) is flexible. We want (p_ij) and (q_ij) to be as close as possible.

C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}   (4)

KL divergence:
- is not a distance, since it is asymmetric
- a large p_ij modelled by a small q_ij gives a large penalty
- a small p_ij modelled by a large q_ij gives a small penalty
- KL divergence meaning: cross-entropy
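The cost in Eq. (4) is a few lines of NumPy (a sketch; masking zero entries reflects the 0 log 0 := 0 convention):

```python
import numpy as np

def kl_cost(P, Q, eps=1e-12):
    """Cost from Eq. (4): sum_ij p_ij log(p_ij / q_ij); eps guards log(0)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Two toy pairwise-similarity matrices (zero diagonal, summing to 1)
rng = np.random.default_rng(0)
P = rng.random((5, 5)); np.fill_diagonal(P, 0.0); P /= P.sum()
Q = rng.random((5, 5)); np.fill_diagonal(Q, 0.0); Q /= Q.sum()
```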
Learning: Gradient of t-SNE
The t-SNE algorithm minimizes the KL divergence between the P and Q distributions:

\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij}) \frac{y_i - y_j}{1 + \|y_i - y_j\|^2}   (5)

- A positive (p_ij - q_ij) acts as attraction, a negative one as repulsion.
- Dissimilar data points with similar map points produce repulsion.
- Repulsions do not go to infinity.
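The gradient in Eq. (5) vectorizes neatly; a sketch (assuming P is the symmetric similarity matrix with zero diagonal, and with q_ij recomputed from the current map Y):

```python
import numpy as np

def tsne_grad(P, Y):
    """Gradient of Eq. (5): dC/dy_i = 4 sum_j (p_ij - q_ij)(y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + d2)
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()                # map similarities q_ij from Eq. (3)
    PQ = (P - Q) * inv                 # (p_ij - q_ij) / (1 + ||y_i - y_j||^2)
    # sum_j PQ_ij (y_i - y_j) = rowsum(PQ) * y_i - PQ @ Y
    return 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Y

rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 2))
```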
Learning: Physical Analogy
\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij}) \frac{y_i - y_j}{1 + \|y_i - y_j\|^2}

Physical analogy: F = k x, a system of springs exerting attraction and repulsion between map points.
Why the Student-t distribution for q_ij, instead of a Gaussian?

Q: How many equidistant data points can there be in 10 dimensions?

Crowding problem: the area of the 2D map that is available to accommodate moderately distant data points is not large enough compared with the area available to accommodate nearby data points.
t-SNE in sklearn
Follow example:http://alexanderfabisch.github.io/t-sne-in-scikit-learn.html
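A minimal scikit-learn sketch along the lines of that example (assuming a recent scikit-learn; the built-in 8x8 digits dataset stands in for MNIST):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small MNIST-like dataset: 8x8 digit images
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]                     # subsample to keep the demo fast

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
emb = tsne.fit_transform(X)                 # one row of 2D map coordinates per image
print(emb.shape)                            # (300, 2)
```

The `emb` coordinates can then be scattered and colored by the digit labels `y` to inspect the clusters.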
Scalability: Barnes-Hut-SNE
The original t-SNE has O(N^2) computational and memory complexity, which limits it to roughly 10K points. The tree-based Barnes-Hut-SNE algorithm reduces the complexity to O(N log N), scaling up to tens of millions of data points.
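In scikit-learn this approximation is exposed through the `method` parameter (a sketch; `method='barnes_hut'` is the default in current versions, and `angle` trades accuracy for speed):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))

# O(N log N) Barnes-Hut approximation; method='exact' gives the O(N^2) algorithm
emb_bh = TSNE(n_components=2, method='barnes_hut', angle=0.5,
              random_state=0).fit_transform(X)
print(emb_bh.shape)   # (400, 2)
```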
Review of t-SNE for Images, Speech, Text
(Embedded video; Flash Player must be installed on Windows to view it.)
Additional points
Q: Why do I get a (slightly) different result every time I run t-SNE?
Discussion: the KL divergence in information theory.
Q: We want p_ij = p_ji and defined p_ij = (p_{i|j} + p_{j|i}) / 2N. Why did we choose a symmetric similarity metric?
Discussion: what is the best visualization method for high-dimensional data so far?
Q: Is it feasible to use t-SNE to reduce a dataset to one dimension?
A: Yes.
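The one-dimensional case from the last question can be tried directly (a sketch using scikit-learn's `n_components` parameter):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(150, 5))
emb_1d = TSNE(n_components=1, perplexity=10, random_state=0).fit_transform(X)
print(emb_1d.shape)   # (150, 1)
```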
Summary, Q&A
- t-SNE is an effective method to visualize complex datasets
- t-SNE exposes natural clusters
- Implemented in many languages
- Scalable with the O(N log N) version
References
- Laurens van der Maaten's t-SNE page: https://lvdmaaten.github.io/tsne/
- Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012
- An illustrated introduction to the t-SNE algorithm: https://www.oreilly.com/learning/an-illustrated-introduction-to-the-t-sne-algorithm