Isaac Newton Institute - Cambridge

Post on 23-Jan-2016

description

Isaac Newton Institute - Cambridge. Object Oriented Data Analysis. J. S. Marron, Dept. of Statistics and Operations Research, University of North Carolina. September 1, 2014.

transcript

1

UNC, Stat & OR

Isaac Newton Institute - Cambridge

Object Oriented Data Analysis

J. S. Marron

Dept. of Statistics and Operations Research, University of North Carolina

September 1, 2014

2

Personal Opinions on Mathematical Statistics

What is Mathematical Statistics?

Validation of existing methods

Asymptotics (n → ∞) & Taylor expansion

Comparison of existing methods

(requires hard math, but really “accounting”???)

3

Personal Opinions on Mathematical Statistics

What could Mathematical Statistics be?

Basis for invention of new methods

Complicated data → mathematical ideas

Do we value creativity?

Since we don’t do this, others do…

(where are the ₤₤₤s???)

4

Personal Opinions on Mathematical Statistics

Since we don’t do this, others do…

Pattern Recognition

Artificial Intelligence

Neural Nets

Data Mining

Machine Learning

???

5

Personal Opinions on Mathematical Statistics

Possible Litmus Test:

Creative Statistics

Clinical Trials Viewpoint:

Worst Imaginable Idea

Mathematical Statistics Viewpoint:

???

6

Object Oriented Data Analysis, I

What is the “atom” of a statistical analysis?

1st Course: Numbers

Multivariate Analysis Course: Vectors

Functional Data Analysis: Curves

More generally: Data Objects

7

Object Oriented Data Analysis, II

Examples:

Medical Image Analysis

Images as Data Objects?

Shape Representations as Objects

Micro-arrays

Just multivariate analysis?

8

Object Oriented Data Analysis, III

Typical Goals:

Understanding population variation

Visualization

Principal Component Analysis +

Discrimination (a.k.a. Classification)

Time Series of Data Objects

9

Object Oriented Data Analysis, IV

Major Statistical Challenge, I:

High Dimension Low Sample Size (HDLSS)

Dimension d >> sample size n

“Multivariate Analysis” nearly useless

Can’t “normalize the data”

Land of Opportunity for Statisticians

Need for “creative statisticians”
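One way to read the “can’t normalize the data” remark is that classical sphering needs the inverse of the sample covariance, and that inverse does not exist when d > n. The sketch below checks this on simulated Gaussian data. The sizes n = 10 and d = 100, and the function name, are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def sample_covariance_rank(n, d, seed=0):
    """Numerical rank of the d x d sample covariance of n simulated Gaussian points."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))       # rows are data objects
    S = np.cov(X, rowvar=False)           # d x d sample covariance matrix
    return np.linalg.matrix_rank(S)

n, d = 10, 100                            # toy HDLSS setting with d >> n (assumed sizes)
rank = sample_covariance_rank(n, d)
print(f"d = {d}, n = {n}, rank of sample covariance = {rank}")
```

The rank is at most n - 1, far below d, so the covariance matrix is singular and cannot be inverted directly.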

10

Object Oriented Data Analysis, V

Major Statistical Challenge, II:

Data may live in non-Euclidean space

Lie Group / Symmet’c Spaces (manifold data)

Trees/Graphs as data objects

Interesting Issues:

What is “the mean” (pop’n center)?

How do we quantify “pop’n variation”?

11

Statistics in Image Analysis, I

First Generation Problems:

Denoising

Segmentation

Registration

(all about single images)

12

Statistics in Image Analysis, II

Second Generation Problems:

Populations of Images

Understanding Population Variation

Discrimination (a.k.a. Classification)

Complex Data Structures (& Spaces)

HDLSS Statistics

13

HDLSS Statistics in Imaging

Why HDLSS (High Dim, Low Sample Size)?

Complex 3-d Objects Hard to Represent

Often need d = 100’s of parameters

Complex 3-d Objects Costly to Segment

Often have n = 10’s cases

14

Medical Imaging – A Challenging Example

Male Pelvis

Bladder – Prostate – Rectum

How do they move over time (days)?

Critical to Radiation Treatment (cancer)

Work with 3-d CT

Very Challenging to Segment

Find boundary of each object?

Represent each Object?

15

Male Pelvis – Raw Data

One CT Slice (in 3d image)

Coccyx (Tail Bone)

Rectum

Prostate

16

Male Pelvis – Raw Data

Prostate:

manual segmentation

Slice by slice

Reassembled

17

Male Pelvis – Raw Data

Prostate:

Slices: Reassembled in 3d

How to represent?

Thanks: Ja-Yeon Jeong

18

Object Representation

Landmarks (hard to find)

Boundary Rep’ns (no correspondence)

Medial representations

Find “skeleton”

Discretize as “atoms” called M-reps

19

3-d m-reps

Bladder – Prostate – Rectum (multiple objects, J. Y. Jeong)

• Medial Atoms provide “skeleton”

• Implied Boundary from “spokes” → “surface”

20

3-d m-reps

M-rep model fitting

• Easy, when starting from binary (blue)

• But very expensive (30 – 40 minutes technician’s time)

• Want automatic approach

• Challenging, because of poor contrast, noise, …

• Need to borrow information across training sample

• Use Bayes approach: prior & likelihood → posterior

• ~Conjugate Gaussians, but there are issues:

• Major HDLSS challenges

• Manifold aspect of data

21

PCA for m-reps, I

Major issue: m-reps live in $\mathbb{R}^3 \times \mathbb{R}^+ \times SO(3) \times SO(2)$ (locations, radius and angles)

E.g. “average” of: $2^\circ, 3^\circ, 358^\circ, 359^\circ$ = ???

Natural Data Structure is: Lie Groups ~ Symmetric spaces (smooth, curved manifolds)
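To make the angle example concrete, here is a minimal sketch that compares the ordinary arithmetic mean of the four angles on this slide with a mean computed on the circle. Averaging unit vectors is one common choice of circular mean and is used here as an assumption; the slide does not say which notion of mean is intended.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Mean direction of angles in degrees, found by averaging unit vectors on the circle."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean_angle = np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())
    return np.rad2deg(mean_angle) % 360.0

angles = [2.0, 3.0, 358.0, 359.0]
naive = float(np.mean(angles))            # treats angles as ordinary numbers
intrinsic = circular_mean_deg(angles)     # respects the wrap-around at 360 degrees
print(naive, intrinsic)
```

The arithmetic mean is 180.5 degrees, which points away from all four data angles, while the circular mean is close to 0.5 degrees.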

22

PCA for m-reps, II

PCA on non-Euclidean spaces? (i.e. on Lie Groups / Symmetric Spaces)

T. Fletcher: Principal Geodesic Analysis

Idea: replace “linear summary of data” with “geodesic summary of data”…
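The geodesic idea can be illustrated on the simplest curved space, the unit sphere. The sketch below is a tangent-space approximation: compute an intrinsic mean, map the data to the tangent plane at that mean with the log map, and run ordinary PCA there. It is a toy under stated assumptions (unit sphere in 3-d, simulated data, hypothetical function names), not the m-rep analysis from the talk.

```python
import numpy as np

def log_map(p, x):
    """Log map on the unit sphere: tangent vector at p toward x, length = geodesic distance."""
    c = np.clip(x @ p, -1.0, 1.0)
    v = x - c * p                          # component of x orthogonal to p
    nv = np.linalg.norm(v)
    return np.zeros_like(p) if nv < 1e-12 else np.arccos(c) * v / nv

def exp_map(p, v):
    """Exp map on the unit sphere: follow the geodesic from p along v for length |v|."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def frechet_mean(X, iters=50):
    """Intrinsic mean: repeatedly average the log-mapped data and step along the geodesic."""
    mu = X[0] / np.linalg.norm(X[0])
    for _ in range(iters):
        mu = exp_map(mu, np.mean([log_map(mu, x) for x in X], axis=0))
    return mu

def pga_tangent(X):
    """PCA of the log-mapped data at the intrinsic mean (tangent-space approximation)."""
    mu = frechet_mean(X)
    V = np.array([log_map(mu, x) for x in X])      # n x 3 tangent vectors at mu
    _, s, Vt = np.linalg.svd(V, full_matrices=False)
    return mu, Vt, s**2 / len(X)                   # mean, directions, variances

# Toy data: points spread along the equator, with small off-equator noise.
rng = np.random.default_rng(0)
t = rng.uniform(-0.8, 0.8, size=50)
eps = 0.05 * rng.standard_normal(50)
X = np.column_stack([np.cos(t) * np.cos(eps), np.sin(t) * np.cos(eps), np.sin(eps)])
mu, directions, variances = pga_tangent(X)
print(mu, directions[0], variances[:2])
```

The first direction is tangent to the sphere at the mean and runs along the equator, which is the geodesic that carries most of the variation in this toy sample.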

23

PGA for m-reps, Bladder-Prostate-Rectum

Bladder – Prostate – Rectum, 1 person, 17 days

PG 1 PG 2 PG 3

(analysis by Ja Yeon Jeong)

26

HDLSS Classification (i.e. Discrimination)

Background: Two Class (Binary) version:

Using “training data” from Class +1, and from Class -1

Develop a “rule” for assigning new data to a Class

Canonical Example: Disease Diagnosis

New Patients are “Healthy” or “Ill”

Determined based on measurements

27

HDLSS Classification (Cont.)

Ineffective Methods:

Fisher Linear Discrimination

Gaussian Likelihood Ratio

Less Useful Methods:

Nearest Neighbors

Neural Nets

(“black boxes”, no “directions” or intuition)

28

HDLSS Classification (Cont.)

Currently Fashionable Methods:

Support Vector Machines

Trees Based Approaches

New High Tech Method

Distance Weighted Discrimination (DWD)

Specially designed for HDLSS data

Avoids “data piling” problem of SVM

Solves more suitable optimization problem

29

HDLSS Classification (Cont.)

Currently Fashionable Methods:

Trees Based Approaches

Support Vector Machines:

30

Distance Weighted Discrimination

Maximal Data Piling
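Data piling is easy to reproduce numerically. When d exceeds the total sample size, there is a direction onto which every point of Class +1 projects to one value and every point of Class -1 to another. The sketch below finds such a direction as the minimum-norm least-squares solution of a linear system on simulated data. The construction, the sizes and the variable names are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, d = 20, 200                       # toy HDLSS setting: d >> n (assumed sizes)
X = rng.standard_normal((2 * n_per_class, d))  # rows are data vectors
y = np.repeat([1.0, -1.0], n_per_class)        # class labels +1 / -1
X[y > 0, 0] += 2.0                             # small mean shift in one coordinate

# Centre at the overall mean, then take the minimum-norm w with Xc @ w = y.
Xc = X - X.mean(axis=0)
w, *_ = np.linalg.lstsq(Xc, y, rcond=None)
w /= np.linalg.norm(w)

proj = Xc @ w                                  # 1-d projections onto the direction
spread_plus = np.ptp(proj[y > 0])              # range of projections within Class +1
spread_minus = np.ptp(proj[y < 0])             # range of projections within Class -1
gap = proj[y > 0].mean() - proj[y < 0].mean()  # distance between the two piles
print(spread_plus, spread_minus, gap)
```

Both within-class spreads are zero up to rounding error, so the training data collapse onto two points. The direction separates the training sample perfectly, yet the piling reflects the geometry of d >> n rather than real class structure.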

31

Distance Weighted Discrimination

Based on Optimization Problem: $\min_{w,b} \sum_{i=1}^{n} \frac{1}{r_i}$

More precisely: work in appropriate penalty for violations

Optimization Method (Michael Todd):

Second Order Cone Programming

Still Convex gen’tion of quadratic prog’ing

Fast greedy solution

Can use existing software
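For reference, the sum of reciprocal residuals with a penalty for violations is usually written in the DWD literature in the following form. The symbols $x_i$ (data vectors), $y_i \in \{+1, -1\}$ (class labels), $\xi_i$ (violation terms) and $C$ (penalty weight) do not appear on the slide and are introduced here as assumptions.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi} \quad & \sum_{i=1}^{n} \frac{1}{r_i} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & r_i = y_i \left( x_i^{t} w + b \right) + \xi_i, \\
& r_i \ge 0, \quad \xi_i \ge 0, \quad i = 1, \ldots, n, \\
& \| w \| \le 1 .
\end{aligned}
```

Every data point contributes to the objective through $1/r_i$, with points near the separating hyperplane weighted most heavily, in contrast to the SVM, where only the support vectors matter.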

32

DWD Bias Adjustment for Microarrays

Microarray data:

Simult. Measur’ts of “gene expression”

Intrinsically HDLSS

Dimension d ~ 1,000s – 10,000s

Sample Sizes n ~ 10s – 100s

My view: Each array is “point in cloud”

33

DWD Batch and Source Adjustment

For Perou’s Stanford Breast Cancer Data

Analysis in Benito, et al (2004) Bioinformatics

https://genome.unc.edu/pubsup/dwd/

Adjust for Source Effects

Different sources of mRNA

Adjust for Batch Effects

Arrays fabricated at different times
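The adjustment step itself can be sketched in a few lines: project the data onto a direction that separates the two batches, then slide each batch rigidly along that direction until the projected batch means coincide. In the sketch below the difference of batch means stands in for the DWD direction, and the data are simulated. Both are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def adjust_along_direction(X, batch, w):
    """Shift each batch along unit direction w so that its projected mean becomes zero."""
    w = w / np.linalg.norm(w)
    X_adj = X.astype(float).copy()
    for b in np.unique(batch):
        rows = batch == b
        shift = (X_adj[rows] @ w).mean()       # projected batch mean
        X_adj[rows] -= shift * w               # move the whole batch rigidly along w
    return X_adj

# Toy data: two batches of "arrays" that differ by a systematic offset.
rng = np.random.default_rng(1)
n, d = 15, 500
batch = np.repeat([0, 1], n)
X = rng.standard_normal((2 * n, d))
X[batch == 1] += 0.5                           # simulated batch effect

# Stand-in for the DWD direction: the difference of batch means (an assumption).
w = X[batch == 1].mean(axis=0) - X[batch == 0].mean(axis=0)
X_adj = adjust_along_direction(X, batch, w)

w_unit = w / np.linalg.norm(w)
gap_before = (X[batch == 1] @ w_unit).mean() - (X[batch == 0] @ w_unit).mean()
gap_after = (X_adj[batch == 1] @ w_unit).mean() - (X_adj[batch == 0] @ w_unit).mean()
print(gap_before, gap_after)
```

Only the component along the chosen direction changes, so variation within each batch in every other direction is left untouched.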

34

DWD Adj: Raw Breast Cancer data

35

DWD Adj: Source Colors

36

DWD Adj: Batch Colors

37

DWD Adj: Biological Class Colors

38

DWD Adj: Biological Class Colors & Symbols

39

DWD Adj: Biological Class Symbols

40

DWD Adj: Source Colors

41

DWD Adj: PC 1-2 & DWD direction

42

DWD Adj: DWD Source Adjustment

43

DWD Adj: Source Adj’d, PCA view

44

DWD Adj: Source Adj’d, Class Colored

45

DWD Adj: Source Adj’d, Batch Colored

46

DWD Adj: Source Adj’d, 5 PCs

47

DWD Adj: S. Adj’d, Batch 1,2 vs. 3 DWD

48

DWD Adj: S. & B1,2 vs. 3 Adjusted

49

DWD Adj: S. & B1,2 vs. 3 Adj’d, 5 PCs

50

DWD Adj: S. & B Adj’d, B1 vs. 2 DWD

51

DWD Adj: S. & B Adj’d, B1 vs. 2 Adj’d

52

DWD Adj: S. & B Adj’d, 5 PC view

53

DWD Adj: S. & B Adj’d, 4 PC view

54

DWD Adj: S. & B Adj’d, Class Colors

55

DWD Adj: S. & B Adj’d, Adj’d PCA

56

DWD Bias Adjustment for Microarrays

Effective for Batch and Source Adj.

Also works for cross-platform Adj.

E.g. cDNA & Affy

Despite literature claiming contrary

“Gene by Gene” vs. “Multivariate” views

Funded as part of caBIG (“cancer Biomedical Informatics Grid”)

“Data Combination Effort” of NCI

57

Interesting Benchmark Data Set

NCI 60 Cell Lines

Interesting benchmark, since same cells

Data Web available:

http://discover.nci.nih.gov/datasetsNature2000.jsp

Both cDNA and Affymetrix Platforms

8 Major cancer subtypes

Use DWD now for visualization

58

NCI 60: Views using DWD Dir’ns (focus on biology)

59

DWD in Face Recognition, I

Face Images as Data

(with M. Benito & D. Peña)

Registered using landmarks

Male – Female Difference?

Discrimination Rule?

60

DWD in Face Recognition, II

DWD Direction

Good separation

Images “make sense”

Garbage at ends?

(extrapolation effects?)

61

Blood vessel tree data

Marron’s brain:

Segmented from MRA

Reconstruct trees in 3d

Rotate to view

67

Blood vessel tree data

Now look over many people (data objects)

Structure of population (understand variation?)

PCA in strongly non-Euclidean Space???

68

Blood vessel tree data

Possible focus of analysis:

• Connectivity structure only (topology)

• Location, size, orientation of segments

• Structure within each vessel segment

69

Blood vessel tree data

Present Focus:

Topology only

Already challenging

Later address others

Then add attributes to tree nodes

And extend analysis

70

Strongly Non-Euclidean Spaces

Statistics on Population of Tree-Structured Data Objects?

• Mean???

• Analog of PCA???

Strongly non-Euclidean, since:

• Space of trees not a linear space

• Not even approximately linear (no tangent plane)

71

Strongly Non-Euclidean Spaces

PCA on Tree Space?

Key Idea (Jim Ramsay):

• Replace 1-d subspace that best approximates data

• By 1-d representation that best approximates data

Wang and Marron (2007) define notion of Treeline (in structure space)

72

PCA for blood vessel tree data

Data Analytic Goals: Age, Gender

See these?

No…

73

Preliminary Tree-Curve Results

First Correlation of Structure to Age!

(Back Trees)

76

HDLSS Asymptotics

Why study asymptotics?

An interesting (naïve) quote:

“I don’t look at asymptotics, because I don’t have an infinite sample size”

Suggested perspective:

Asymptotics are a tool for finding simple structure underlying complex entities

77

HDLSS Asymptotics

Which asymptotics?

n → ∞ (classical, very widely done)

d → ∞ ???

Sensible?

Follow typical “sampling process”?

Say anything, as noise level increases???

78

HDLSS Asymptotics

Which asymptotics?

n → ∞ & d → ∞

n >> d: a few results around (still have classical info in data)

n ~ d: random matrices (Iain J., et al) (nothing classically estimable)

HDLSS asymptotics: n fixed, d → ∞

80

HDLSS Asymptotics

HDLSS asymptotics: n fixed, d → ∞

Follow typical “sampling process”?

Microarrays: # genes bounded

Proteomics, SNPs, …

A moot point, from perspective:

Asymptotics are a tool for finding simple structure underlying complex entities

82

HDLSS Asymptotics

HDLSS asymptotics: n fixed, d → ∞

Say anything, as noise level increases???

Yes, there exists simple, perhaps surprising, underlying structure

83

HDLSS Asymptotics: Simple Paradoxes, I

For $d$ dim’al “Standard Normal” dist’n: $Z = (Z_1, \ldots, Z_d)^t \sim N_d(0, I_d)$

Euclidean Distance to Origin (as $d \to \infty$): $\|Z\| = \sqrt{d} + O_p(1)$

- Data lie roughly on surface of sphere of radius $\sqrt{d}$

- Yet origin is point of “highest density”???

- Paradox resolved by:

“density w. r. t. Lebesgue Measure”
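The concentration of the norm near the square root of d is easy to check by simulation. The sketch below draws standard normal vectors for a few dimensions and records the mean norm relative to the square root of d, together with the standard deviation of the norm. The dimensions and the number of draws are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
results = {}
for d in (10, 1000, 10000):
    Z = rng.standard_normal((100, d))          # 100 draws from N_d(0, I_d)
    norms = np.linalg.norm(Z, axis=1)
    # Store (mean norm / sqrt(d), standard deviation of the norm).
    results[d] = (norms.mean() / np.sqrt(d), norms.std())

for d, (ratio, spread) in results.items():
    print(d, ratio, spread)
```

The ratio approaches 1 while the spread stays of order 1 in every dimension. Relative to the radius, the data therefore form an increasingly thin shell.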

84

HDLSS Asymptotics: Simple Paradoxes, II

For $d$ dim’al “Standard Normal” dist’n: $Z_1 \sim N_d(0, I_d)$ indep. of $Z_2 \sim N_d(0, I_d)$

Euclidean Dist. between $Z_1$ and $Z_2$ (as $d \to \infty$):

Distance tends to non-random constant: $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$

Can extend to $Z_1, \ldots, Z_n$

Where do they all go???

(we can only perceive 3 dim’ns)

85

HDLSS Asymptotics: Simple Paradoxes, III

For $d$ dim’al “Standard Normal” dist’n: $Z_1 \sim N_d(0, I_d)$ indep. of $Z_2 \sim N_d(0, I_d)$

High dim’al Angles (as $d \to \infty$): $\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_p(d^{-1/2})$

- “Everything is orthogonal”???

- Where do they all go??? (again our perceptual limitations)

- Again 1st order structure is non-random
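Both the distance and the angle statements can be checked with two simulated vectors. The sketch below uses one pair of independent standard normal vectors in an arbitrarily chosen dimension of 100,000, and reports the distance relative to the square root of 2d and the angle in degrees.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100_000                                    # illustrative dimension
z1, z2 = rng.standard_normal(d), rng.standard_normal(d)

dist_ratio = np.linalg.norm(z1 - z2) / np.sqrt(2 * d)
cos_angle = z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2))
angle_deg = np.degrees(np.arccos(cos_angle))
print(dist_ratio, angle_deg)
```

The ratio is close to 1 and the angle is close to 90 degrees, with deviations that shrink as d grows.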

86

HDLSS Asy’s: Geometrical Representation, I

Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$

Study Subspace Generated by Data

a. Hyperplane through 0, of dimension $n$

b. Points are “nearly equidistant to 0”, & dist $\sim \sqrt{d}$

c. Within plane, can “rotate towards $\sqrt{d} \times$ Unit Simplex”

d. All Gaussian data sets are “near Unit Simplex Vertices”!!!

“Randomness” appears only in rotation of simplex

With P. Hall & A. Neeman

87

HDLSS Asy’s: Geometrical Representation, II

Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$

Study Hyperplane Generated by Data

a. $n - 1$ dimensional hyperplane

b. Points are pairwise equidistant, dist $\sim \sqrt{2d}$

c. Points lie at vertices of “regular $n$-hedron”

d. Again “randomness in data” is only in rotation

e. Surprisingly rigid structure in data?
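The simplex picture can be checked by computing every pairwise distance in a small simulated sample. The sketch below uses n = 5 points in d = 50,000 dimensions, both arbitrary choices, and scales each of the 10 pairwise distances by the square root of 2d.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 50_000                               # illustrative sizes: n fixed, d large
Z = rng.standard_normal((n, d))

# All pairwise distances via the Gram matrix, scaled by sqrt(2d).
G = Z @ Z.T
sq = np.diag(G)
D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * G, 0.0))
iu = np.triu_indices(n, k=1)                   # indices of the distinct pairs
scaled = D[iu] / np.sqrt(2 * d)
print(scaled)
```

All scaled distances are close to 1, so the sample sits near the vertices of a regular simplex. A new random sample gives nearly the same shape in a different orientation.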

88

HDLSS Asy’s: Geometrical Representation, III

Simulation View: shows “rigidity after rotation”

89

HDLSS Asy’s: Geometrical Representation, III

Straightforward Generalizations:

non-Gaussian data: only need moments

non-independent: use “mixing conditions” (with P. Hall & A. Neeman)

Mild Eigenvalue condition on Theoretical Cov. (with J. Ahn, K. Muller & Y. Chi)

All based on simple “Laws of Large Numbers”

90

HDLSS Asy’s: Geometrical Representation, IV

Explanation of Observed (Simulation) Behavior:

“everything similar for very high d”

2 popn’s are 2 simplices (i.e. regular n-hedrons)

All are same distance from the other class

i.e. everything is a support vector

i.e. all sensible directions show “data piling”

so “sensible methods are all nearly the same”

Including 1 - NN

91

HDLSS Asy’s: Geometrical Representation, V

Further Consequences of Geometric Representation

1. Inefficiency of DWD for uneven sample size (motivates “weighted version”, work in progress)

2. DWD more “stable” than SVM (based on “deeper limiting distributions”) (reflects intuitive idea “feeling sampling variation”) (something like “mean vs. median”)

3. 1-NN rule inefficiency is quantified.

92

2nd Paper on HDLSS Asymptotics

Ahn, Marron, Muller & Chi (2007) Biometrika

Assume 2nd Moments (and Gaussian)

Assume no eigenvalues too large in sense:

For $\varepsilon = \frac{\left( \sum_{j=1}^{d} \lambda_j \right)^2}{d \sum_{j=1}^{d} \lambda_j^2}$ assume $\varepsilon^{-1} = o(d)$, i.e. $\varepsilon \gg 1/d$ (min possible)

(much weaker than previous mixing conditions…)
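To get a feel for the eigenvalue condition, the sketch below evaluates the ratio of the squared eigenvalue sum to d times the sum of squared eigenvalues for three spectra: an identity covariance, a covariance with all variance in one direction, and an identity with one moderate spike. The function name and the example spectra are illustrative assumptions.

```python
import numpy as np

def sphericity(eigenvalues):
    """epsilon = (sum lambda_j)^2 / (d * sum lambda_j^2); ranges from 1/d to 1."""
    lam = np.asarray(eigenvalues, dtype=float)
    return lam.sum() ** 2 / (lam.size * np.sum(lam ** 2))

d = 1000
eps_identity = sphericity(np.ones(d))                    # spherical covariance
eps_rank_one = sphericity(np.r_[1.0, np.zeros(d - 1)])   # all variance in one direction
eps_spike = sphericity(np.r_[d ** 0.5, np.ones(d - 1)])  # one moderate spike of size d^(1/2)
print(eps_identity, eps_rank_one, eps_spike)
```

The identity spectrum attains the maximum value 1 and the rank-one spectrum attains the minimum 1/d. The moderate spike stays well above 1/d, which is the kind of spectrum the condition allows.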

93

HDLSS Math. Stat. of PCA, I

Consistency & Strong Inconsistency:

Spike Covariance Model (Johnstone & Paul)

For Eigenvalues: $\lambda_{1,d} = d^{\alpha}, \lambda_{2,d} = 1, \ldots, \lambda_{d,d} = 1$

1st Eigenvector: $u_1$

How good are empirical versions, $\hat{\lambda}_{1,d}, \ldots, \hat{\lambda}_{d,d}, \hat{u}_1$, as estimates?

94

HDLSS Math. Stat. of PCA, II

Consistency (big enough spike):

For $\alpha > 1$, $\mathrm{Angle}(u_1, \hat{u}_1) \to 0$

Strong Inconsistency (spike not big enough):

For $\alpha < 1$, $\mathrm{Angle}(u_1, \hat{u}_1) \to 90^\circ$
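The two regimes can be seen in a small simulation of the spike model. The sketch below draws n = 10 Gaussian vectors in d = 10,000 dimensions whose first coordinate has variance d to the power alpha, and measures the angle between the first coordinate axis and the first sample eigenvector. The sizes and the two alpha values 1.5 and 0.5 are illustrative assumptions, and a single finite d only hints at the limit.

```python
import numpy as np

def first_pc_angle_deg(alpha, d, n, rng):
    """Angle (degrees) between the true first eigenvector e_1 and its sample estimate."""
    X = rng.standard_normal((n, d))
    X[:, 0] *= np.sqrt(d ** alpha)         # spike: lambda_1 = d^alpha, other eigenvalues = 1
    # Leading right singular vector of X = first eigenvector of X^t X (mean known to be 0).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.degrees(np.arccos(min(1.0, abs(Vt[0, 0]))))

rng = np.random.default_rng(0)
n, d = 10, 10_000
angle_big_spike = first_pc_angle_deg(1.5, d, n, rng)     # alpha > 1
angle_small_spike = first_pc_angle_deg(0.5, d, n, rng)   # alpha < 1
print(angle_big_spike, angle_small_spike)
```

With the large spike the estimated direction lies within a few degrees of the truth. With the small spike it is much closer to orthogonal.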

95

HDLSS Math. Stat. of PCA, III

Consistency of eigenvalues? $\frac{\hat{\lambda}_{1,d}}{\lambda_{1,d}} \xrightarrow{L} \frac{\chi_n^2}{n}$

Eigenvalues Inconsistent

But known distribution

Unless $n \to \infty$ as well

96

HDLSS Work in Progress, II

Canonical Correlations: Myung Hee Lee

Results similar to those for PCA

Singular values inconsistent

But directions converge under a much milder spike assumption.

97

HDLSS Work in Progress, III

Conditions for Geo. Rep’n & PCA Consist.:

John Kent example: $X \sim \frac{1}{2} N_d(0, I_d) + \frac{1}{2} N_d(0, 100 \cdot I_d)$

Can only say: $\|X\| = O_p(d^{1/2}) = \begin{cases} d^{1/2} & \text{w.p. } 1/2 \\ 10\, d^{1/2} & \text{w.p. } 1/2 \end{cases}$ not deterministic

Conclude: need some flavor of mixing
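The Kent mixture is simple to simulate. The sketch below draws 20 vectors in d = 10,000 dimensions, each from the standard normal or from the normal with covariance 100 times the identity with equal probability, and reports the norm of each divided by the square root of d. The dimension and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_draws = 10_000, 20
scale = rng.choice([1.0, 10.0], size=n_draws)            # sd 1 or 10, each w.p. 1/2
X = scale[:, None] * rng.standard_normal((n_draws, d))   # N(0, I) or N(0, 100 I)
ratios = np.linalg.norm(X, axis=1) / np.sqrt(d)
print(np.round(ratios, 2))
```

Each ratio is close to either 1 or 10, so the scaled norm stays random in the limit and the points do not settle on a single sphere.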

98

HDLSS Work in Progress, III

Conditions for Geo. Rep’n & PCA Consist.:

Conclude: need some flavor of mixing

Challenge: Classical mixing conditions require notion of time ordering

Not always clear, e.g. microarrays

99

HDLSS Work in Progress, III

Conditions for Geo. Rep’n & PCA Consist.:

Sungkyu Jung Condition:

$X_d \sim (0, \Sigma_d)$ where $\Sigma_d = U_d \Lambda_d U_d^t$

Define: $Z_d = \Lambda_d^{-1/2} U_d^t X_d$

Assume: ∃ a permutation, $\pi_d$

So that $\pi_d Z_d$ is ρ-mixing

100

HDLSS Deep Open Problem

In PCA Consistency:

Strong Inconsistency - spike $\alpha < 1$

Consistency - spike $\alpha > 1$

What happens at boundary ($\alpha = 1$)???

101

The Future of HDLSS Asymptotics?

1. Address your favorite statistical problem…

2. HDLSS versions of classical optimality results?

3. Contiguity Approach (~Random Matrices)

4. Rates of convergence?

5. Improved Discrimination Methods?

It is early days…

102

Some Carry Away Lessons

Atoms of the Analysis: Object Oriented

Viewpoint: Object Space Feature Space

DWD is attractive for HDLSS classification

“Randomness” in HDLSS data is only in rotations

(Modulo rotation, have constant simplex shape)

How to put HDLSS asymptotics to work?