EULERIAN TOUR ALGORITHMS FOR DATA VISUALIZATION
AND THE PAIRVIZ PACKAGE
Catherine Hurley, NUI Maynooth
R.W. Oldford, U. Waterloo
July 8 2009, UseR!
Monday 13 July 2009
Graphics: Effect Ordering
• Packages: seriation, gclus, corrgram
• Example: PCP Flea data
[Figure: parallel coordinate plots (PCP) of the flea data in two axis orders]
Standard order: Tars1 Tars2 Aede1 Aede2 Head Aede3
Correlation order: Tars2 Aede1 Aede2 Aede3 Tars1 Head
PairViz: relationship ordering
• Statistical graphics are about comparisons
between variables, cases, groups, models
[Figure: Flea data: correlation order (correlation scale -0.6 to 0.6)]
Axis sequence: Aede3 Aede2 Aede1 Tars2 Aede2 Aede1 Aede3 Tars2 Tars1 Head Tars2 Tars1 Aede1 Head Aede2 Tars1 Aede3 Head
A graph model
• Build a graph where nodes are statistical objects
• Edges are relationships
• Example:
Node     Node vis   Edge         Edge vis
Group    Boxplot    Two groups   CI for mean diff
Var      Hist       Two vars     Scatterplot
2 vars   Scat       4-d space    Dynamic scat
Model    Resid      2 Models     PCP
[Figure: example graph with nodes A, B, C, D, E, F]
Example: planned comparisons
Mice in 5 diet groups, response is lifetime
Nodes are treatments, edges are planned comparisons
Weights are p-values
[Figure: comparison graph with nodes N/N85, N/R40, N/R50, NP, R/R50, lopro and edge weights 0, 0, 0.0083, 0.0147, 0.3111]
[Figure: Planned comparisons of diets. Panels: Lifetime, Differences]
Group sequence: N/R50 N/N85 NP lopro N/R50 N/R40 R/R50 N/R50
Reducing calories and protein increases lifetime
Graph Traversal
• Traverse all nodes: hamiltonian path
• Traverse all edges: eulerian path
• Use gclus, seriation: hamiltonian paths on complete graphs
• PairViz: eulerian paths
[Figure: open hamiltonian path and closed hamiltonian path on nodes A to H]
[Figure: closed eulerian path on K7, nodes A to G]
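The two kinds of traversal can be stated as simple checks on a node sequence. The following is a minimal Python sketch for illustration only. The function names are hypothetical and it is not PairViz code. A hamiltonian path visits every node once, and an eulerian path uses every edge once.

```python
from itertools import combinations

def complete_graph(nodes):
    """Edge set of the complete graph on the given nodes."""
    return {frozenset(p) for p in combinations(nodes, 2)}

def steps(seq):
    """Consecutive pairs of a node sequence, as undirected edges."""
    return [frozenset(p) for p in zip(seq, seq[1:])]

def is_hamiltonian_path(seq, nodes, edges):
    """True if seq visits every node exactly once, moving along edges."""
    return sorted(seq) == sorted(nodes) and all(s in edges for s in steps(seq))

def is_eulerian_path(seq, edges):
    """True if seq traverses every edge exactly once."""
    s = steps(seq)
    return len(s) == len(edges) and set(s) == edges
```

On K3 with nodes A, B, C, the sequence A B C is a hamiltonian path, and the closed sequence A B C A is eulerian.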
Graph Structures
• Complete graph: all comparisons are interesting
• Edge-weighted graphs: low weight edges are more interesting
e.g. weight edges by 1-corr, eulerian follows low weight edges
[Figure: flea data PCP (correlation scale -0.6 to 0.6), axis sequence: Aede3 Aede2 Aede1 Tars2 Aede2 Aede1 Aede3 Tars2 Tars1 Head Tars2 Tars1 Aede1 Head Aede2 Tars1 Aede3 Head]
• Bipartite graph
e.g. only treatment-control comparisons are of interest
[Figure: bipartite graph with nodes X1, X2, X3 and Y1, Y2]
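The 1-corr weighting can be sketched as follows. This is an illustrative Python sketch, not PairViz code. The function names and the toy data in the usage note are assumptions. Highly correlated variable pairs get weights near 0, so a traversal that prefers low weights shows them first.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_weights(columns):
    """Edge weights 1 - corr for every pair of variables.

    columns: dict mapping variable name -> list of values.
    """
    return {(u, v): 1 - pearson(columns[u], columns[v])
            for u, v in combinations(columns, 2)}
```

For example, with made-up columns x = [1, 2, 3, 4], y = [2, 4, 6, 8], z = [4, 3, 2, 1], the edge (x, y) gets weight 0 and the edge (x, z) gets weight 2.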
Graph Structures- cont’d
• Hypercube graph
Cube for factorial experiment
or model selection: each node in G is a predictor subset, edge: add/drop predictor
[Figure: cube with nodes 000, 001, 010, 011, 100, 101, 110, 111]
• Line graph: transform G to L(G)
e.g. each node in G is a var, each node in L(G) is a var pair, edge is a 3-d transition
[Figure: G with nodes A, B, C, D and L(G) with nodes AB, AC, AD, BC, BD, CD]
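Both structures are easy to build from their definitions. The sketch below is illustrative Python with hypothetical function names, not PairViz code. Hypercube nodes are joined when their binary labels differ in one position. Line graph nodes are the edges of G, joined when they share a node of G.

```python
from itertools import combinations

def hypercube(n):
    """Edges of the n-cube: binary labels differing in exactly one position."""
    labels = [format(i, f"0{n}b") for i in range(2 ** n)]
    return {frozenset((a, b)) for a, b in combinations(labels, 2)
            if sum(x != y for x, y in zip(a, b)) == 1}

def line_graph(edges):
    """Edges of L(G): pairs of edges of G that share a node of G."""
    nodes = [frozenset(e) for e in edges]
    return {frozenset((e, f)) for e, f in combinations(nodes, 2) if e & f}
```

For G = K4 on A, B, C, D, the line graph has the 6 variable pairs as nodes. AB and AC are joined because they share A, and that edge involves the three variables A, B, C. AB and CD are not joined.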
Algorithms- Complete graph
• Closed eulerian path exists when each node has an even number of edges: i.e. for K_{2n+1}
• Hamiltonian decomposition of graph
  • into hamiltonian cycles: eulerian for K_{2n+1}
  • into hamiltonian paths: approx eulerian for K_{2n}
• classical algorithm: hpaths
• WHam: weighted_hpaths: pick best for H1, best orientation and order for others.
[Figure: three panels, each on nodes 1 to 7]
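One classical way to get such a decomposition is the zigzag construction, sketched below in Python. This is an illustration under stated assumptions, not necessarily how hpaths is implemented. The function names are hypothetical and nodes are numbered from 0. For even m, path k visits k, k+1, k-1, k+2, k-2, ... (mod m). For odd m, the paths on the first m-1 nodes are joined through the extra node to give a closed eulerian tour.

```python
def zigzag_paths(m):
    """Hamiltonian paths that together use every edge of K_m once (m even)."""
    if m % 2:
        raise ValueError("m must be even")
    offsets = [0]
    for j in range(1, m // 2):
        offsets += [j, -j]
    offsets.append(m // 2)
    # Path k is the zigzag pattern shifted by k.
    return [[(k + o) % m for o in offsets] for k in range(m // 2)]

def eulerian_odd_complete(m):
    """Closed eulerian tour on K_m (m odd), built from the paths on K_{m-1}."""
    if m % 2 == 0:
        raise ValueError("m must be odd")
    extra = m - 1
    tour = [extra]
    for path in zigzag_paths(m - 1):
        # Each path plus the extra node closes into a hamiltonian cycle.
        tour += path + [extra]
    return tour
```

For K7 this gives three hamiltonian cycles through node 6, concatenated into one tour over all 21 edges.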
Algorithms-Complete graph cont’d
• Recursive algorithm: eseq
• Start with eulerian on K_n, append edges to get eulerian on K_{n+2}
[Figure: eulerian on nodes 1 to 5, extended with nodes 6 and 7]
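The recursive idea can be sketched as below. This is one possible way to append the new edges, written in Python for illustration. It is an assumption, not necessarily the order eseq uses, and the function names are hypothetical. Given a closed tour on K_n (n odd) that starts and ends at node 0, two new nodes a and b are added and the tour is extended by 0, a, 1, b, 2, a, ..., a, b, 0.

```python
def extend_tour(tour, n):
    """Extend a closed eulerian tour on K_n (n odd, nodes 0..n-1, from 0) to K_{n+2}."""
    a, b = n, n + 1
    hubs = (a, b)
    ext = []
    # Alternate between the two new nodes, passing through each old node once.
    for i, v in enumerate(range(1, n)):
        ext += [hubs[i % 2], v]
    # n - 1 is even, so the walk is now at an old node reached from b;
    # finish with a, then the edge a-b, then return to 0.
    return tour + ext + [a, b, 0]

def recursive_eulerian(n):
    """Closed eulerian tour on K_n for odd n, grown from a single node."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    tour = [0]
    for m in range(1, n, 2):
        tour = extend_tour(tour, m)
    return tour
```

Because edges among low-numbered nodes are laid down first, the tour on K_n is a prefix of the tour on K_{n+2}.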
Algorithms- general
• Eulerian graph: connected, all nodes have even number of edges
• Otherwise, add edges, pairing up odd nodes
Chinese postman does this in an optimal way
• Classical algorithms (Hierholzer, Fleury)
• Our version, GrEul (etour), follows weight increasing edges
[Figure: diet comparison graph with nodes N/N85, N/R40, N/R50, NP, R/R50, lopro and edge weights 0, 0, 0.0083, 0.0147, 0.3111]
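A greedy, weight-following variant of Hierholzer's algorithm might look like the sketch below. It is illustrative Python only, assuming a simple connected graph in which every node already has an even number of edges. It is not the etour source, and the function name is hypothetical. At each node the walk takes the unused edge of lowest weight, and leftover cycles are spliced into the tour.

```python
def greedy_euler_tour(edges, start):
    """Closed eulerian tour that prefers low weight edges.

    edges: dict {(u, v): weight} for a simple, connected, undirected graph
    in which every node has even degree (assumed, not checked).
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((w, v))
        adj.setdefault(v, []).append((w, u))
    for nbrs in adj.values():
        nbrs.sort(key=lambda t: t[0])  # lowest weight first
    used = set()

    def walk(u):
        # Follow the cheapest unused edge until none is left at the current node.
        path = [u]
        while True:
            here = path[-1]
            nxt = next((v for _, v in adj[here]
                        if frozenset((here, v)) not in used), None)
            if nxt is None:
                return path
            used.add(frozenset((here, nxt)))
            path.append(nxt)

    tour = walk(start)
    i = 0
    while i < len(tour):
        sub = walk(tour[i])  # a closed walk back to tour[i], or just [tour[i]]
        if len(sub) > 1:
            tour[i:i + 1] = sub
        else:
            i += 1
    return tour
```

If the graph has odd nodes, extra edges would have to be added first, as in the second bullet above.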
Algorithms comparison
Complete graph, no weights
[Figure: three panels for K9, one per algorithm]
• Etour 9: prefers low vertices
• Eseq 9: prefers low edges
• hpaths 9: 4 hamiltonians
Algorithms: complete, weighted
Eurodist: 21 European cities
[Figure: three panels of Eurodist edge weights (0 to 4000) against position in the path (0 to 200)]
• Algorithm eseq: Eurodist edge weights. Ignores weights
• Weighted etour on Eurodist. Starts in Geneva
• Weighted hamiltonians on Eurodist (paths 1 to 10). Hamiltonian decomp, with increasing path lengths
Example: model selection
Mammal sleep data, Y = log brain wt.
Predictors A= non dreaming sleep, B=dreaming sleep, C=log body wt, D=life span
0
A B C D
AB AC AD BC BD CD
ABC ABD ACD BCD
ABCD
• Hypercube graph represents possible moves in a stepwise regression algorithm
• Graph Q_n is hamiltonian, and eulerian for even n
• Edge weights: change in SSE
• Eulerian starting with full model
• All models with C are good
• Bar chart: change in SSE
[Figure: Sleep data: Model residuals]
Model sequence: ABCD BCD CD ACD ABCD ABC BC C AC ABC AB A AD ABD BD D AD ACD AC A 0 D CD C 0 B BD BCD BC B AB ABD ABCD
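The model sequence shown on this slide can be checked against the hypercube Q_4. The Python sketch below is for illustration, with hypothetical function names. It builds the 32 add/drop edges among subsets of A, B, C, D and confirms that the sequence uses each edge exactly once and returns to the full model.

```python
from itertools import combinations

def hypercube_edges(predictors):
    """Pairs of predictor subsets that differ by adding or dropping one predictor."""
    subsets = [frozenset(c) for r in range(len(predictors) + 1)
               for c in combinations(predictors, r)]
    return {frozenset((s, s | {p}))
            for s in subsets for p in predictors if p not in s}

def parse_model(label):
    """'0' is the null model; otherwise each letter is a predictor."""
    return frozenset() if label == "0" else frozenset(label)

def is_closed_eulerian(labels, edges):
    """True if the model sequence is closed and uses every edge exactly once."""
    models = [parse_model(x) for x in labels]
    steps = [frozenset(pair) for pair in zip(models, models[1:])]
    return (models[0] == models[-1]
            and len(steps) == len(edges)
            and set(steps) == edges)

# Sequence copied from the slide.
SLIDE_SEQUENCE = (
    "ABCD BCD CD ACD ABCD ABC BC C AC ABC AB A AD ABD BD D AD "
    "ACD AC A 0 D CD C 0 B BD BCD BC B AB ABD ABCD"
).split()
```

Calling is_closed_eulerian(SLIDE_SEQUENCE, hypercube_edges("ABCD")) returns True: the 33 models in the sequence give 32 steps, one per edge of Q_4.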
More variables
Sleep data: 10 vars (nodes), 45 edges
Eulerian has length 50 (45 edges plus 5 added to pair up the 10 odd nodes)
Using outlying index from scagnostics package for eulerian traversal
[Figure: Eulerian on scagnostics: Outlying (index scale 0.0 to 0.6), zoom on first half of display]
Axis labels: GP Bd L Br Bd SW PS TS SE PS TS D L P L PS Br P TS Bd TS PS P D D Br P D
More variables-cont’d
Reduce the graph
NN graph: eliminate edges with outlier index < .2
Reduces graph from 10 to 5 nodes, and 45 to 5 edges
Other nodes have no edges
[Figure: NN Eulerian on scagnostics: Outlying (index scale 0.0 to 0.6), graph nodes SW, Bd, Br, L, GP]
Variable sequence: GP L Bd SW L Br GP
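The reduction step amounts to thresholding the edge index and dropping nodes left without edges. Below is a minimal Python sketch for illustration. The function name is hypothetical and the index values in the usage note are made up, not the scagnostics values from the sleep data.

```python
def threshold_graph(index, cutoff):
    """Keep edges whose index is at least cutoff.

    index: dict {(u, v): value}. Returns (nodes, edges), where nodes are
    only those that still have at least one edge.
    """
    edges = {e: w for e, w in index.items() if w >= cutoff}
    nodes = sorted({v for e in edges for v in e})
    return nodes, edges
```

With made-up values {("GP", "L"): 0.5, ("L", "Bd"): 0.3, ("Bd", "P"): 0.1} and cutoff 0.2, the edge Bd-P is eliminated and node P drops out of the graph.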
IN CONCLUSION..
• PairViz package: relationship ordering for data visualisation
• Current version: algorithms presented here
• Thanks to graph, igraph
• Work in progress: ordering dynamic visualisations via ggobi.
with Adrian Waddell, UW