Shape as Organizing Principle for Data
MLConf Seattle 2015
Anthony Bak, Principal Data Scientist
The Data Problem: Complexity
Solution: Topological Summaries
Shape as Organizing Principle
Reduce Bias, Discover Models
TDA tells you the data you have, not the data you want to have.
Generating Topological Summaries
Remember/Forget
Use multiple lenses/metrics to get the complete picture
Different lenses provide different summaries
Lenses: where do they come from?
Statistics
• Mean/Max/Min
• Variance
• n-Moment
• Density
• …
Machine Learning
• PCA/SVD
• Autoencoders
• Isomap/MDS/TSNE
• …
Geometry
• Centrality
• Curvature
• Harmonic Cycles
• …
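A few of the lenses listed above can be sketched in plain Python. This is a minimal illustration, not the implementation behind the talk: the function names, the Gaussian kernel, the `bandwidth` parameter, and the toy `points` are all assumptions made for the example. Each lens maps every data point to a single number.

```python
import math
import statistics

def mean_lens(points):
    # Statistical lens: mean of each point's coordinates.
    return [statistics.fmean(p) for p in points]

def variance_lens(points):
    # Statistical lens: population variance of each point's coordinates.
    return [statistics.pvariance(p) for p in points]

def density_lens(points, bandwidth=1.0):
    # Statistical lens: unnormalized Gaussian kernel density estimate.
    # The kernel and bandwidth are illustrative choices.
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        / len(points)
        for p in points
    ]

def eccentricity_lens(points):
    # Geometric (centrality-style) lens: distance to the farthest point.
    return [max(math.dist(p, q) for q in points) for p in points]

# Toy data (hypothetical): three nearby points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

lenses = {
    "mean": mean_lens(points),
    "variance": variance_lens(points),
    "density": density_lens(points),
    "eccentricity": eccentricity_lens(points),
}

for name, values in lenses.items():
    print(name, [round(v, 3) for v in values])
```

Each lens emphasizes something different about the same points: the density lens singles out the outlier as the lowest-density point, while the mean lens only reports coordinate averages.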
Why Topology?
Key Properties of TDA
• Deformation Invariance
• Compressed Representation
• Coordinate Freeness
Coordinate Invariance
1. The topology of a shape doesn’t depend on the coordinates used to describe the shape.
2. Different feature sets can describe the same phenomena.
3. While processing data, we frequently alter coordinates: scaling, rotating, whitening.
You want to study properties of your data that are invariant under coordinate changes.
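One way to see coordinate invariance concretely is to rotate a point cloud and check that a simple topological quantity does not change. The sketch below counts connected components of a neighborhood graph before and after a rotation. The sample points, the threshold `eps`, and the helper names are assumptions chosen for illustration. Rotation preserves pairwise distances, so the component count stays the same.

```python
import math

def rotate(points, angle):
    # Rotate 2-D points about the origin (a change of coordinates).
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def count_components(points, eps):
    # Connect points closer than eps and count connected components
    # with a small union-find structure.
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Two well-separated groups of points (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
rotated = rotate(points, 0.7)

print(count_components(points, eps=1.5))
print(count_components(rotated, eps=1.5))
```

The coordinates of every point change under the rotation, but the summary (two components) does not.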
Coordinate Invariance: Gene Expression
Datasets shown: NKI, GSE230
Coordinate Invariance: Disease State
Deformation Invariance
• Topological features don’t change when you stretch and distort the data
• Advantage: Makes problems easier
  • Noise resistance
  • Less pre-processing of data
  • Robust (stable) data
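As a small illustration of deformation invariance, the sketch below samples a circle, stretches it into an ellipse, and counts loops in a nearest-neighbor graph built on each. The graph construction (each point joined to its `k` nearest neighbors), the sample size, and the stretch factor are assumptions for this example, not the method used in the talk. The loop count is the first Betti number of the graph: edges minus vertices plus connected components.

```python
import math

def knn_edges(points, k):
    # Undirected edges joining each point to its k nearest neighbors.
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def count_components(n, edges):
    # Connected components of a graph on n vertices via union-find.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def loop_count(points, k=2):
    # First Betti number of the k-NN graph: E - V + C.
    edges = knn_edges(points, k)
    return len(edges) - len(points) + count_components(len(points), edges)

n = 40
circle = [
    (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
    for i in range(n)
]
stretched = [(3.0 * x, y) for x, y in circle]  # deform the circle into an ellipse

print(loop_count(circle), loop_count(stretched))
```

The distances between points change substantially under the stretch, yet both point clouds are summarized as a single loop.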
Compressed Representation
• Replace the metric space with a combinatorial summary: a simplicial complex.
• Data becomes easier to manage, search, and query while maintaining essential features.
• Leverages many known algorithms from graph theory, computational topology, and computational geometry.
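The idea of replacing a metric space with a combinatorial summary can be sketched with a Mapper-style construction. This is an assumption about the kind of summary meant here, and every name and parameter below (`mapper_graph`, `n_intervals`, `overlap`, `eps`) is hypothetical. The sketch covers the range of a lens with overlapping intervals, clusters the points that fall in each interval, and connects clusters that share points. The result is a graph, the 1-dimensional part of a simplicial complex.

```python
import math
from itertools import combinations

def single_linkage(indices, points, eps):
    # Split indices into connected components of the eps-neighborhood graph.
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper_graph(points, lens, n_intervals, overlap, eps):
    # Nodes are clusters of points; edges join clusters that share points.
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    nodes = []
    for b in range(n_intervals):
        start = lo + (b - overlap) * width
        end = lo + (b + 1 + overlap) * width
        members = [i for i, v in enumerate(lens) if start <= v <= end]
        nodes.extend(single_linkage(members, points, eps))
    edges = {
        (a, b)
        for a, b in combinations(range(len(nodes)), 2)
        if nodes[a] & nodes[b]
    }
    return nodes, edges

# 40 points on a circle, with the x-coordinate as the lens.
n = 40
points = [
    (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
    for i in range(n)
]
lens = [x for x, _ in points]

nodes, edges = mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=0.3)
print(len(points), "points ->", len(nodes), "nodes,", len(edges), "edges")
```

With these settings the 40 points compress to 6 nodes joined by 6 edges in a single cycle, so the summary is far smaller than the data but still records that the data forms a loop.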
Baby Steps: PCA
Data Stories
Model Introspection
Predictive Maintenance
Customer Churn
Transaction Fraud
We’re Hiring! http://www.ayasdi.com/company/careers/
Data Has Shape and Shape Has Meaning