Stairstep-like dendrogram cut: a permutation test approach
Dario Bruzzese, Department of Preventive Medical Sciences, University of Naples, Italy
Domenico Vistocco, Department of Economics, University of Cassino, Italy
UseR 2009
All computations and graphics were done using the R system (packages: cluster, clusterGeneration, ggplot2).
Slides have been composed using LaTeX (beamer class) and the Sweave tool.
(a not necessarily regular cut for a dendrogram)
Motivation
The rep1HighNoise dataset
Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene-expression data with repeated measurements. Genome Biology, 2003, 4:R34
n = 200, p = 20
It is a synthetic data set with error distributions derived from real array data.
Horizontal cut
k = 2 (red clusters)
k = 3 (green clusters)
k = 4 (blue clusters)
. . .
k = 7 (brown clusters)
α = 0.01: 5 clusters
An alternative cut
k = 3, k = 4, k = 5 (rainbow clusters)
The reference framework
Tools
Statistics: hierarchical clustering, permutation tests
R: hclust, plot.hclust {stats}; genRandomClust {clusterGeneration}; qplot, ggplot {ggplot2}
La Carte
1 A (? simple ?) idea
2 A (? not so ?) simple procedure
3 Some results
4 The Wishlist
The (? not so ?) simple idea - notation
Let:
$n$ be the number of objects to classify;
$C^k_L$ and $C^k_R$ be the two classes merged at level $k$ ($k = 1, \dots, n-1$);
$h(C^k_L \cup C^k_R)$ be the height necessary to merge $C^k_L$ and $C^k_R$;
$h(C^k_j)$ be the height at which $C^k_j$ has been obtained ($j \in \{L, R\}$).
The (? simple ?) idea
Input: A dataset and its related dendrogram
Output: A partition of the dataset
initialization:
    aggregationLevelsToVisit ← $h(C^1_L \cup C^1_R)$
    permClusters ← [ ]
    i ← 1
repeat
    if $C^i_L \equiv C^i_R$ then
        add $C^i_L \cup C^i_R$ to permClusters
    else
        add $h(C^i_L)$ and $h(C^i_R)$ to aggregationLevelsToVisit
        sort aggregationLevelsToVisit in descending order
    end
    remove the first element from aggregationLevelsToVisit
    i ← i+1
until aggregationLevelsToVisit is empty
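The traversal above can be sketched in a few lines of code. The slides use R; the block below is only an illustrative Python sketch, not the authors' implementation. The `Node` class, the `stairstep_cut` function and the `same_cluster` callback (standing in for the test of $H_0$) are hypothetical names introduced here. The sketch also assumes that a leaf is kept as a cluster of its own, a case the pseudocode does not spell out, and it queues nodes keyed by their height rather than the bare heights.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """A dendrogram node: merge height, member indices, two children (None for a leaf)."""
    height: float
    members: List[int]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def stairstep_cut(root: Node,
                  same_cluster: Callable[[Node, Node], bool]) -> List[List[int]]:
    """Walk the tree from the root, visiting pending nodes in descending height order."""
    # Max-heap on height via negated keys; the counter breaks ties so that
    # Node objects are never compared with each other.
    to_visit = [(-root.height, 0, root)]
    counter = 1
    perm_clusters: List[List[int]] = []
    while to_visit:
        _, _, node = heapq.heappop(to_visit)
        is_leaf = node.left is None or node.right is None
        if is_leaf or same_cluster(node.left, node.right):
            # H0 not rejected (or nothing left to split): keep the node as one cluster.
            perm_clusters.append(sorted(node.members))
        else:
            # H0 rejected: both children become nodes still to be visited.
            for child in (node.left, node.right):
                heapq.heappush(to_visit, (-child.height, counter, child))
                counter += 1
    return perm_clusters
```

Because each branch stops as soon as its own test fails to reject, the resulting clusters can sit at different heights of the tree, which is what gives the cut its stairstep shape.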
The (? not so ?) simple idea in action
Iteration i ← 1
aggregationLevelsToVisit: $h(C^1_L \cup C^1_R)$
permClusters: (empty)
clusters to compare: $C^1_L$, $C^1_R$
$H_0 : C^1_L \equiv C^1_R \mapsto$ reject
Iteration i ← 2
aggregationLevelsToVisit: $h(C^1_R), h(C^1_L)$
permClusters: (empty)
level visited: $h(C^1_R)$
clusters to compare: $C^2_L$, $C^2_R$
$H_0 : C^2_L \equiv C^2_R \mapsto$ reject
Iteration i ← 3
aggregationLevelsToVisit: $h(C^1_L), h(C^2_R), h(C^2_L)$
permClusters: (empty)
level visited: $h(C^1_L)$
clusters to compare: $C^3_L$, $C^3_R$
$H_0 : C^3_L \equiv C^3_R \mapsto$ reject
Iteration i ← 4
aggregationLevelsToVisit: $h(C^3_R), h(C^2_R), h(C^2_L), h(C^3_L)$
level visited: $h(C^3_R)$
clusters to compare: $C^4_L$, $C^4_R$
$H_0 : C^4_L \equiv C^4_R \mapsto$ accept
permClusters: $C^4_L \cup C^4_R \Leftrightarrow C^3_R$
. . .
Iteration i ← 9
aggregationLevelsToVisit: (empty)
permClusters: $C^3_L, C^3_R, C^2_L, C^4_L, C^4_R$
The (? not so ?) simple procedure
For each $k$, the difference between $\max_{j \in \{L,R\}} h(C^k_j)$ and $\min_{j \in \{L,R\}} h(C^k_j)$ can be considered as the minimum cost necessary to merge the two classes.
The difference between $h(C^k_L \cup C^k_R)$ and $\max_{j \in \{L,R\}} h(C^k_j)$ can instead be considered as the cost actually incurred for merging $C^k_L$ and $C^k_R$.
The ratio between these two costs:
$$\frac{\max_{j \in \{L,R\}} h(C^k_j) - \min_{j \in \{L,R\}} h(C^k_j)}{h(C^k_L \cup C^k_R) - \max_{j \in \{L,R\}} h(C^k_j)}$$
is thus a measure that characterizes the aggregation process resulting in the new class $C^k_L \cup C^k_R$.
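The ratio of the two costs depends only on three dendrogram heights, so it is easy to compute once those are known. The following is a minimal Python sketch (the slides use R; Python is used here only for illustration). The function name `merge_cost` is hypothetical, and returning infinity when the denominator is not positive is an assumption of this sketch, since the slides do not say how that case is handled.

```python
import math


def merge_cost(h_left: float, h_right: float, h_union: float) -> float:
    """Ratio between the minimum cost and the cost actually incurred by a merge.

    h_left, h_right: heights at which the two classes were obtained.
    h_union: height at which the two classes are merged.
    """
    h_max = max(h_left, h_right)
    h_min = min(h_left, h_right)
    denominator = h_union - h_max
    if denominator <= 0:
        # Assumption of this sketch: treat a non-positive denominator as infinite cost.
        return math.inf
    return (h_max - h_min) / denominator
```

For example, with child heights 2 and 5 and a merge height of 11, the minimum cost is 5 - 2 = 3, the cost actually incurred is 11 - 5 = 6, and `merge_cost(2.0, 5.0, 11.0)` gives 0.5.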
The (? not so ?) simple procedure: detail
D. Bruzzese, D. Vistocco ( ——————————————————————————————–Department of Department ofPreventive Medical Sciences EconomicsUNIVERSITY OF NAPLES UNIVERSITY OF CASSINOITALY ITALY )Stairstep-like dendrogram cut UseR 2009 11 / 22
The algorithm retraces down-ward the tree, startingfrom the root of the dendrogram where all objectsare classified in a unique cluster.
The (? not so ?) simple procedure: detail
D. Bruzzese, D. Vistocco ( ——————————————————————————————–Department of Department ofPreventive Medical Sciences EconomicsUNIVERSITY OF NAPLES UNIVERSITY OF CASSINOITALY ITALY )Stairstep-like dendrogram cut UseR 2009 11 / 22
C1L C1
R
The algorithm retraces down-ward the tree, startingfrom the root of the dendrogram where all objectsare classified in a unique cluster.∀ k a permutation test is designed to test the NullHypothesis that the two classes Ck
L and CkR really
belong to the same cluster, i.e. :
H0 : CkL ≡ Ck
R
The (? not so ?) simple procedure: detail
D. Bruzzese, D. Vistocco ( ——————————————————————————————–Department of Department ofPreventive Medical Sciences EconomicsUNIVERSITY OF NAPLES UNIVERSITY OF CASSINOITALY ITALY )Stairstep-like dendrogram cut UseR 2009 11 / 22
C1L C1
R
The algorithm retraces down-ward the tree, startingfrom the root of the dendrogram where all objectsare classified in a unique cluster.∀ k a permutation test is designed to test the NullHypothesis that the two classes Ck
L and CkR really
belong to the same cluster, i.e. :
H0 : CkL ≡ Ck
R
Under H0, mixing up (permuting) the statistical unitsof Ck
L and CkR should not alter the aggregation pro-
cess resulting in their merging in.
The (? not so ?) simple procedure: detail
D. Bruzzese, D. Vistocco ( ——————————————————————————————–Department of Department ofPreventive Medical Sciences EconomicsUNIVERSITY OF NAPLES UNIVERSITY OF CASSINOITALY ITALY )Stairstep-like dendrogram cut UseR 2009 11 / 22
mC1LmC1
R
C1L C1
R
Let mCkL and mCk
R be the two new classes obtained by permuting the elements in CkL and Ck
R
The algorithm retraces down-ward the tree, startingfrom the root of the dendrogram where all objectsare classified in a unique cluster.∀ k a permutation test is designed to test the NullHypothesis that the two classes Ck
L and CkR really
belong to the same cluster, i.e. :
H0 : CkL ≡ Ck
R
Under H0, mixing up (permuting) the statistical unitsof Ck
L and CkR should not alter the aggregation pro-
cess resulting in their merging in.
The (? not so ?) simple procedure: detail
D. Bruzzese, D. Vistocco ( ——————————————————————————————–Department of Department ofPreventive Medical Sciences EconomicsUNIVERSITY OF NAPLES UNIVERSITY OF CASSINOITALY ITALY )Stairstep-like dendrogram cut UseR 2009 11 / 22
mC1R
mC1L
mC1LmC1
R
C1L C1
R
Let mCkL and mCk
R be the two new classes obtained by permuting the elements in CkL and Ck
R
For each of them a new dendrogram is generated.
The (? not so ?) simple procedure: detail
D. Bruzzese, D. Vistocco ( ——————————————————————————————–Department of Department ofPreventive Medical Sciences EconomicsUNIVERSITY OF NAPLES UNIVERSITY OF CASSINOITALY ITALY )Stairstep-like dendrogram cut UseR 2009 11 / 22
mC1R
mC1L
mC1LmC1
R
C1L C1
R
h(mC1R)
h(mC1L)
Let mCkL and mCk
R be the two new classes obtained by permuting the elements in CkL and Ck
R
For each of them a new dendrogram is generated.
The heights at which each of the two classes are buit up again, clearly correspondto the heights of the root nodes of the corresponding dendrograms.
The (? not so ?) simple procedure: detail
D. Bruzzese, D. Vistocco ( ——————————————————————————————–Department of Department ofPreventive Medical Sciences EconomicsUNIVERSITY OF NAPLES UNIVERSITY OF CASSINOITALY ITALY )Stairstep-like dendrogram cut UseR 2009 11 / 22
mC1R
mC1L
mC1LmC1
R
C1L C1
R
h(mC1R)
h(mC1L)
The ratio:
cost“
mCkL ∪ mCk
R
”=
maxj∈{L,R}
h“
mCkj
”− min
j∈{L,R}h
“mCk
j
”h
`Ck
L ∪ CkR
´− max
j∈{L,R}h
“mCk
j
”is thus a measure that characterizes the aggregation process resulting in thenew (potential) class mCk
L ∪ mCkR
Under $H_0$ the aggregation process resulting in the new cluster $C^k_L \cup C^k_R$ should be very similar to the one that potentially produces $mC^k_L \cup mC^k_R$; thus the two values $\mathrm{cost}(mC^k_L \cup mC^k_R)$ and $\mathrm{cost}(C^k_L \cup C^k_R)$ should be close enough.
The permutation procedure is repeated $M$ times and each time a new couple $mC^k_L$, $mC^k_R$ is obtained. The Monte Carlo p-value is thus computed as:
$$p = \frac{\#\left\{\mathrm{cost}\left(mC^k_L \cup mC^k_R\right) \le \mathrm{cost}\left(C^k_L \cup C^k_R\right)\right\} + 1}{M + 1}$$
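Given the $M$ permuted costs and the observed cost, the p-value is a simple count. The Python sketch below shows that computation under the formula above; the function name `monte_carlo_p_value` and its argument names are hypothetical.

```python
def monte_carlo_p_value(permuted_costs, observed_cost):
    """Monte Carlo p-value: (#{permuted cost <= observed cost} + 1) / (M + 1)."""
    m = len(permuted_costs)
    count = sum(1 for cost in permuted_costs if cost <= observed_cost)
    return (count + 1) / (m + 1)
```

With permuted costs 0.1, 0.5 and 0.9 and an observed cost of 0.5, two permuted values are at or below the observed one, giving p = (2 + 1) / (3 + 1) = 0.75. The "+ 1" terms keep the p-value strictly positive, so with M = 999 the smallest attainable value is 0.001.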
La Carte
1 A (? simple ?) idea
2 A (? not so ?) simple procedure
3 Some results
4 The Wishlist
Some results
The yeast galactose dataset
Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner RE, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systemically perturbed metabolic network. Science 2001, 292:929-934.
n = 205, p = 80
It is a subset of 205 genes that reflect four functional categories in the Gene Ontology listings.
Settings: distanceMethod = euclidean, aggregationMethod = Ward, α = 0.05, M = 999
The diabetes dataset
Banfield JD, Raftery AE: Model-based Gaussian and Non-Gaussian Clustering. Biometrics, 1993, 49, 803-821.
n = 145, p = 3
It contains 145 subjects divided into three groups (normal, chemical diabetes, overt diabetes) on the basis of their oral glucose tolerance, described by three variables.
Settings: distanceMethod = euclidean, aggregationMethod = Ward, α = 0.05, M = 999
Some results... for 5 variables
genRandomCluster: numClust = 2:7, numNonNoisy = 5, sepVal = 0.01
Settings: distanceMethod = euclidean, aggregationMethod = Ward, M = 999, α = 0.1, 0.05, 0.01 (one overlay per α)
Some results... for 5 variables (100 replications)
Some results... for 10 variables
genRandomCluster: numClust = 2:7, numNonNoisy = 10, sepVal = 0.01
Settings: distanceMethod = euclidean, aggregationMethod = Ward, M = 999, α = 0.1, 0.05, 0.01 (one overlay per α)
Some results... for 10 variables (100 replications)
Some results... for 15 variables
genRandomCluster: numClust = 2:7, numNonNoisy = 15, sepVal = 0.01
Settings: distanceMethod = euclidean, aggregationMethod = Ward, M = 999, α = 0.1, 0.05, 0.01 (one overlay per α)
Some results... for 15 variables (100 replications)
The wishlist
Statistical issues
  Quality measures of the obtained partition
  Use of different types of clusters
    different cardinality of clusters
    different type of cluster generation
  Study on the stability of the number of Monte Carlo replications
  Computational complexity
R issues
  profiling and optimizing the R code
  use of compiled code
  use of S3–S4 methods
  deploying a package