  • 1/23

    Intro Curvature Application

    Topology for Data Science 3

    Peter Bubenik

    University of Florida, Department of Mathematics

    http://people.clas.ufl.edu/peterbubenik/

    January 23, 2017

    Tercera Escuela de Análisis Topológico de Datos y Topología Estocástica

    ABACUS, Estado de México

    Peter Bubenik Topology for Data Science 3

  • 2/23

    Intro Curvature Application Motivation

    Homology

    Definition

    Homology in degree k is given by the k-cycles modulo the k-boundaries.
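
    As a small worked illustration of this definition (not taken from the slides), the sketch below computes Betti numbers of a simplicial complex using dim H_k = dim C_k − rank ∂_k − rank ∂_{k+1}. The choice of Z/2 coefficients, the example complexes and the names rank_mod2 and betti_numbers are assumptions made for the example.

```python
from itertools import combinations

def rank_mod2(rows):
    """Rank over Z/2 of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # highest set bit -> reduced row
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]  # clear the top bit and keep reducing
    return len(basis)

def betti_numbers(simplices):
    """Betti numbers over Z/2 of a complex given as vertex tuples, closed under faces."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top_dim = max(by_dim)
    ranks = {0: 0, top_dim + 1: 0}  # the boundary maps at both ends are zero
    for k in range(1, top_dim + 1):
        index = {s: i for i, s in enumerate(by_dim[k - 1])}
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):  # the (k-1)-dimensional faces of s
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_mod2(rows)
    # cycles = kernel of d_k, boundaries = image of d_{k+1}
    return [len(by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(top_dim + 1)]

hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled = hollow + [(0, 1, 2)]
print(betti_numbers(hollow))  # [1, 1]: one component, one loop
print(betti_numbers(filled))  # [1, 0, 0]: the loop is now a boundary
```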

  • 3/23

    Persistent homology

    Main idea

    Vary a parameter and keep track of when homology appears and disappears.

    [Animation: 12 frames, radius = 0, 1, ..., 11]
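
    In degree 0 the main idea can be made concrete with a union-find structure: every point starts as its own component, and a component dies when it merges with another. The sketch below assumes the parameter is the radius of balls grown around the points, so that two points become connected at half their distance; that convention, the example points and the name h0_barcode are assumptions, not taken from the slides.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Degree-0 barcode for balls of growing radius around a finite point set."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two balls of radius r first touch when r is half the distance between centres.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2, i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for radius, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, radius))  # one component dies at this radius
    bars.append((0.0, math.inf))  # the last component never dies
    return bars

print(h0_barcode([(0, 0), (2, 0), (10, 0)]))
# [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```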

  • 4/23

    Barcode and Persistence Landscapes

    Barcode:

    [Figure: barcode over a horizontal axis from 0 to 14]

    Convert to Persistence Landscape:

    [Figure: landscape functions λ1, λ2, λ3; horizontal axis from 2 to 14, vertical axis from 0 to 6]

    λk = 0 for k ≥ 4
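
    The conversion can be sketched with the standard definition: each bar (b, d) contributes a tent function max(0, min(t − b, d − t)), and λk(t) is the k-th largest tent value at t. The three bars below are hypothetical (the bars on the slide are not recoverable from the transcript), and the function name landscape is an assumption.

```python
def landscape(bars, k, t):
    """Value at t of the k-th persistence landscape function (k = 1, 2, ...)."""
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in bars), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0

bars = [(0, 8), (2, 6), (4, 14)]  # hypothetical barcode with three bars

print([landscape(bars, k, 4) for k in (1, 2, 3, 4)])  # [4, 2, 0, 0.0]
print(landscape(bars, 1, 9))                          # 5
```

    With three bars there are only three tents, so λk vanishes identically for k ≥ 4, in line with the statement on the slide.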

  • 5/23

    Persistent homology of sampled points

  • 6/23

    Intro Curvature Application Setup Theory Computations Statistics & Machine Learning

    Short bars

    Question

    Can we understand the small bars in terms of the underlying geometry, specifically curvature?

    This is joint work in progress with

    Dhruv Patel (Univ of Florida)

    Benjamin Whittle (Univ of Florida)

  • 7/23

    Metric geometry

    Curvature in a metric space, M

    compare triangles in M with triangles in certain model spaces

    Model spaces of constant curvature K

    K = −1: Hyperbolic plane

    K = 0: Euclidean plane

    K = 1: Sphere of radius 1

    Assumptions:

    sample points independently

    from a uniform density

    on a unit disk of constant curvature
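
    One way to realize these assumptions is inverse-CDF sampling of the geodesic radius. The slides do not say how the sampling was done, so the following is only a sketch: it reads "unit disk" as a geodesic disk of radius 1, uses the area formulas πr², 2π(1 − cos r) and 2π(cosh r − 1) for K = 0, 1, −1, and returns geodesic polar coordinates. The name sample_unit_disk is hypothetical.

```python
import math
import random

def sample_unit_disk(n, curvature, seed=None):
    """Sample n points uniformly by area from a geodesic disk of radius 1.

    curvature is -1, 0 or 1; points are returned as (r, theta) in geodesic
    polar coordinates about the centre of the disk.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u = rng.random()  # fraction of the disk's area inside radius r
        if curvature == 0:
            r = math.sqrt(u)
        elif curvature == 1:
            r = math.acos(1 - u * (1 - math.cos(1)))
        elif curvature == -1:
            r = math.acosh(1 + u * (math.cosh(1) - 1))
        else:
            raise ValueError("curvature must be -1, 0 or 1")
        points.append((r, rng.uniform(0, 2 * math.pi)))
    return points

samples = {K: sample_unit_disk(1000, K, seed=0) for K in (-1, 0, 1)}
```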

  • 8/23

    Čech complex

    Acute triangles ↦ persistent H1 in the Čech complex

    Asymptotically almost all H1 is of this form.
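
    For three points in the Euclidean plane this can be checked directly. In the Čech complex an edge appears at half its length and the triangle is filled in at the radius of the smallest enclosing ball, which is the circumradius for an acute triangle and half the longest side otherwise. The sketch below is an illustration under those standard facts; the function name and example points are assumptions.

```python
import math

def cech_h1_interval(p, q, r):
    """(birth, death) of the 1-cycle formed by three points in the Euclidean plane."""
    a, b, c = sorted([math.dist(q, r), math.dist(p, r), math.dist(p, q)])
    birth = c / 2  # the cycle closes when the longest edge appears
    if a * a + b * b <= c * c:
        death = c / 2  # right or obtuse: filled in immediately, no persistence
    else:
        s = (a + b + c) / 2
        area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
        death = a * b * c / (4 * area)  # circumradius
    return birth, death

acute = cech_h1_interval((1, 0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2))
obtuse = cech_h1_interval((0, 0), (4, 0), (2, 0.5))
```

    Here acute is approximately (0.866, 1.0), while obtuse has birth equal to death, so only the acute triangle contributes a bar of positive length.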

  • 9/23

    Čech complex

    The most persistent such H1 arises from equilateral triangles.

    Consider equilateral triangles with circumcircle of radius 1.

    Hyperbolic: death/birth ≈ 1.119

    Euclidean: death/birth = 2/√3 ≈ 1.155

    Spherical: death/birth ≈ 1.225
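
    These three ratios can be reproduced numerically. The sketch assumes birth is half the side length and death is the circumradius, and obtains the side length from the law of cosines in each geometry, applied to the triangle formed by the circumcentre and two vertices (central angle 2π/3). The function name death_over_birth is an assumption.

```python
import math

def death_over_birth(curvature, R=1.0):
    """death/birth of the H1 class of an equilateral triangle with circumradius R."""
    angle = 2 * math.pi / 3  # angle at the circumcentre between two vertices
    if curvature == 0:
        side = 2 * R * math.sin(angle / 2)
    elif curvature == 1:   # spherical law of cosines
        side = math.acos(math.cos(R) ** 2 + math.sin(R) ** 2 * math.cos(angle))
    elif curvature == -1:  # hyperbolic law of cosines
        side = math.acosh(math.cosh(R) ** 2 - math.sinh(R) ** 2 * math.cos(angle))
    else:
        raise ValueError("curvature must be -1, 0 or 1")
    birth = side / 2  # the last edge appears at half the side length
    death = R         # the triangle is filled in at the circumradius
    return death / birth

for K in (-1, 0, 1):
    print(K, round(death_over_birth(K), 3))
# -1 1.119
# 0 1.155
# 1 1.225
```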

  • 10/23

    Points sampled from unit disks

    Sample 1000 points

    [Figure: two scatter plots of sampled points; axes x and y, each from −1.0 to 1.0]

  • 11/23

    Average Landscapes

    [Figure: Average PL in degree 1 for hyperbolic; horizontal axis "Index", 0 to 200; vertical axis 0 to 25]

    [Figure: Average PL in degree 1 for euclidean; horizontal axis "Index", 0 to 200; vertical axis 0 to 20]

    [Figure: Average PL in degree 1 for spherical; horizontal axis "Index", 0 to 200; vertical axis 0 to 20]
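
    Averaging landscapes is a pointwise operation once each landscape is sampled on a common grid. The sketch below discretizes λ1, ..., λk_max on a grid, concatenates them into one vector, and averages the vectors; reading the "Index" axis of the plots as the position in such a vector is a guess, and the grid, barcodes and function names are hypothetical.

```python
def landscape_vector(bars, k_max, grid):
    """Sample the landscape functions 1..k_max of a barcode on a grid and concatenate."""
    vector = []
    for k in range(1, k_max + 1):
        for t in grid:
            tents = sorted((max(0.0, min(t - b, d - t)) for b, d in bars), reverse=True)
            vector.append(tents[k - 1] if k <= len(tents) else 0.0)
    return vector

def average_landscape(barcodes, k_max, grid):
    """Pointwise mean of the discretized landscapes of several barcodes."""
    vectors = [landscape_vector(bars, k_max, grid) for bars in barcodes]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Two hypothetical one-bar barcodes, first landscape function only.
print(average_landscape([[(0, 4)], [(0, 2)]], k_max=1, grid=[1, 2]))  # [1.0, 1.0]
```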

  • 12/23

    Differences in Average Landscapes

    [Figure: hyperbolic − euclidean in degree 1; horizontal axis "Index", 0 to 200; vertical axis −1 to 5]

    [Figure: euclidean − spherical in degree 1; horizontal axis "Index", 0 to 200; vertical axis −1 to 2]

  • 13/23

    Classification

    100 samples from each of hyperbolic, euclidean and spherical

    Classify using SVM and 10-fold cross validation

    Classification accuracy

    Using degree 0: 100%

    Using degree 1: 87%

  • 14/23

    Intro Curvature Application Hippocampus Summary

    Alzheimer’s Disease Neuroimaging Initiative (ADNI)

    Joint work in progress with Ulrich Bauer (TU Munich) and Roland Kwitt (Salzburg).

    The data: 995 left and right (paired) hippocampi consisting of

    1. 284 Normal

    2. 307 Mild Cognitive Impairment

    3. 178 Late Mild Cognitive Impairment

    4. 226 Alzheimer’s Disease (AD)

    Each hippocampus is converted to a 32 × 32 × 32 binary cubical grid.

  • 15/23

    Left Hippocampi

    [Figure sequence: 8 frames]

  • 16/23

    Persistent homology transform

    Theorem (Turner, Mukherjee, Boyer (2014))

    For a surface in R³, persistent homology of sublevel sets in all directions is a sufficient statistic.

    Our approach:

    filter each hippocampus in 144 directions

    calculate persistent homology

    convert to persistence landscape

    concatenate
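
    The first step of this approach, filtering in many directions, can be sketched as follows. The slide does not say how the 144 directions were chosen; the Fibonacci lattice used here is only one common way to spread directions over the sphere, and the names fibonacci_directions and height_values are hypothetical. Each occupied voxel receives its height in the chosen direction as filtration value.

```python
import math

def fibonacci_directions(n=144):
    """n roughly evenly spread unit vectors on the sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    directions = []
    for i in range(n):
        z = 1 - (2 * i + 1) / n       # evenly spaced heights in (-1, 1)
        rho = math.sqrt(1 - z * z)    # radius of the horizontal circle at height z
        phi = i * golden_angle
        directions.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return directions

def height_values(voxels, direction):
    """Filtration value of each occupied voxel: its height in the given direction."""
    return {v: sum(c * d for c, d in zip(v, direction)) for v in voxels}

directions = fibonacci_directions(144)
# A tiny stand-in for the occupied cells of a 32 x 32 x 32 binary grid.
voxels = [(0, 0, 0), (1, 0, 0), (1, 2, 3)]
filtrations = [height_values(voxels, d) for d in directions]
```

    The sublevel sets of each of the 144 functions would then go through a persistent homology computation, and the resulting landscapes would be concatenated into one feature vector per hippocampus.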

  • 17/23

    Persistence Landscape Transform

    [Figure sequence: 8 frames; horizontal axis "Index", 0 to 30000; vertical axis from 0.0 up to between 2.0 and 3.0, depending on the frame]

  • 18/23

    Average Landscapes

    [Figure: Average PL in degree 0 for Normal; horizontal axis "Index", 0 to 30000; vertical axis 0.0 to 1.5]

    [Figure: Average PL in degree 0 for AD; horizontal axis "Index", 0 to 30000; vertical axis 0.0 to 1.4]

  • 19/23

    Average Landscape Difference

    [Figure: AD − Normal in degree 0; horizontal axis "Index", 0 to 30000; vertical axis −0.1 to 0.2]

    Is this difference significant?

    Permutation test: Yes (p-value 0.000)
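
    A permutation test of this kind can be sketched as below. The slide does not state the test statistic or the number of permutations; the sketch assumes the statistic is the Euclidean distance between the two group-mean vectors and reports the fraction of label shufflings whose statistic is at least as large as the observed one. All names and the toy data are hypothetical.

```python
import math
import random

def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def permutation_test(group_a, group_b, n_permutations=1000, seed=0):
    """Estimated p-value for the difference between two group means."""
    rng = random.Random(seed)

    def statistic(a, b):
        return math.dist(mean_vector(a), mean_vector(b))

    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if statistic(a, b) >= observed:
            hits += 1
    return hits / n_permutations

# Toy stand-ins for discretized landscapes: two clearly separated groups.
group_a = [[0.0, 0.1 * i] for i in range(10)]
group_b = [[5.0, 0.1 * i] for i in range(10)]
p_value = permutation_test(group_a, group_b)
```

    With a finite number of permutations, a reported value of 0.000 means that no shuffled labeling produced a difference as large as the observed one.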

  • 20/23

    Principal Components Analysis

    PCA for PL in degree 0

    [Figure: 3D scatter plot with axes pca1, pca2 and pca3; points are drawn as the digits 1 and 4, presumably the Normal and Alzheimer’s Disease groups]

  • 21/23

    Support Vector Machine on PCA coordinates

  • 22/23

    Classification on Landscape coordinates

    Support vector classification with 10-fold cross validation:

    pred \ true            Normal    Alzheimer’s Disease
    Normal                 232       83
    Alzheimer’s Disease    52        143

    Prediction accuracy: 73%
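
    The reported accuracy follows from the table: correct predictions sit on the diagonal. The short check below assumes rows are predicted classes and columns are true classes, which is consistent with the column sums matching the 284 Normal and 226 Alzheimer’s Disease subjects in the data.

```python
# Rows: predicted class, columns: true class (counts copied from the table).
confusion = [[232, 83],
             [52, 143]]

correct = sum(confusion[i][i] for i in range(2))
total = sum(map(sum, confusion))
accuracy = correct / total
true_totals = [sum(column) for column in zip(*confusion)]

print(f"{correct}/{total} = {accuracy:.1%}")  # 375/510 = 73.5%
print(true_totals)                            # [284, 226]
```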

  • 23/23

    Topological Data Analysis Summary

    [Diagram: boxes "Data", "Geometric structure", "Summary" and "Statistics & Machine Learning"; arrows labeled "Encode" and "Topology"]

    Outline:

    Intro: Motivation

    Curvature: Setup, Theory, Computations, Statistics & Machine Learning

    Application: Hippocampus, Summary

