Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
An XMT/MTGL Case Study: PageRank
Jonathan Berry
Scalable Algorithms Department
Sandia National Laboratories
July 23, 2008
Informatics Datasets Are Different
Informatics: the analysis of datasets arising from “information” sources such as the WWW (not physical simulation)
Motivating Applications:
• Homeland security
• Computer security (DOE emphasis)
• Biological networks, etc.
Primary HPC Implication: Any partitioning is “bad” (expander-like graphs have no small edge separators, so every partition cuts many edges)
“One of the interesting ramifications of the fact that the PageRank calculation converges rapidly is that the web is an expander-like graph”
(Page, Brin, Motwani, Winograd, 1999)
[Figure credits: UCSD ’08; Broder et al. ’00]
We Are Developing The MultiThreaded Graph Library
• Enables multithreaded graph algorithms
• Based upon a community standard (the Boost Graph Library)
• Abstracts data structures and other application specifics
• Hides some shared-memory issues
• Preserves good multithreaded performance
[Diagram: MTGL adapter layers (labels: MTGL, MTGLC, C)]
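The slides do not show MTGL’s interface, so the following is only a minimal sketch of the adapter idea from the bullets above, loosely modeled on the Boost Graph Library’s free-function style (`num_vertices(g)`, `out_degree(v, g)`): a generic algorithm written against those functions runs unchanged over two storage formats. The types `csr_adapter` and `adj_list_adapter` and the algorithm `max_out_degree` are hypothetical names, not MTGL’s API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adapter #1: compressed sparse row (CSR) storage.
struct csr_adapter {
    std::vector<std::size_t> offsets;  // offsets[v]..offsets[v+1] index into targets
    std::vector<std::size_t> targets;  // concatenated out-neighbor lists
};

std::size_t num_vertices(const csr_adapter& g) {
    assert(!g.offsets.empty());  // CSR always stores n + 1 offsets
    return g.offsets.size() - 1;
}

std::size_t out_degree(std::size_t v, const csr_adapter& g) {
    return g.offsets[v + 1] - g.offsets[v];
}

// Hypothetical adapter #2: vector-of-vectors adjacency lists.
struct adj_list_adapter {
    std::vector<std::vector<std::size_t>> adj;
};

std::size_t num_vertices(const adj_list_adapter& g) { return g.adj.size(); }

std::size_t out_degree(std::size_t v, const adj_list_adapter& g) {
    return g.adj[v].size();
}

// Generic algorithm: it only calls num_vertices/out_degree, so the same
// code works with either representation.
template <typename Graph>
std::size_t max_out_degree(const Graph& g) {
    std::size_t best = 0;
    for (std::size_t v = 0; v < num_vertices(g); ++v) {
        std::size_t d = out_degree(v, g);
        if (d > best) best = d;
    }
    return best;
}
```

For example, the CSR graph `{{0, 2, 3, 3}, {1, 2, 2}}` and the adjacency-list graph `{{1, 2}, {2}, {}}` describe the same three-vertex graph, and `max_out_degree` returns 2 for both.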
[Plots: S-T connectivity scaling (MTA-2) and SSSP scaling (MTA-2); x-axis: MTA-2 processors, y-axis: solve time (sec)]
Initial Algorithmic Impacts of MTGL on XMT Are Promising
• S-T Connectivity
– Gathering evidence for the 2005 prediction
– This plot shows results for ≤ 2 billion edges
– Even with Threadstorm 2.0 limitations, XMT predictions look good
• Connected Components
– Simple Shiloach-Vishkin (SV) is fast, but suffers from hot spots
– Multilevel Kahan algorithm scales (but the XMT data are incomplete)
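To show where SV’s hot spots come from, here is a minimal serial sketch of a simplified SV-style connected-components algorithm: conditional hooking (a root is attached to a smaller label) alternating with pointer jumping, repeated until nothing changes. This is an assumption-laden illustration, not the MTGL code; the classic SV algorithm also hooks stagnant trees unconditionally to obtain its logarithmic round bound, and that step is omitted here. The names `sv_components` and `edge_list` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using edge_list = std::vector<std::pair<std::size_t, std::size_t>>;

// Simplified SV-style connected components (serial sketch).
// Invariant: parent[x] <= x, so parent pointers never form cycles.
std::vector<std::size_t> sv_components(std::size_t n, const edge_list& edges) {
    std::vector<std::size_t> parent(n);
    for (std::size_t v = 0; v < n; ++v) parent[v] = v;

    bool changed = true;
    while (changed) {
        changed = false;
        // Hooking: attach the root with the larger label to the smaller label.
        // In a parallel version many edges may write parent[pv] at once,
        // which is the kind of hot spot mentioned on the slide.
        for (const auto& e : edges) {
            for (int dir = 0; dir < 2; ++dir) {
                std::size_t u = (dir == 0) ? e.first : e.second;
                std::size_t v = (dir == 0) ? e.second : e.first;
                assert(u < n && v < n);
                std::size_t pu = parent[u], pv = parent[v];
                if (pu < pv && parent[pv] == pv) {
                    parent[pv] = pu;
                    changed = true;
                }
            }
        }
        // Shortcutting (pointer jumping): point every vertex at its root.
        for (std::size_t v = 0; v < n; ++v) {
            while (parent[v] != parent[parent[v]]) parent[v] = parent[parent[v]];
        }
    }
    return parent;  // parent[v] is the smallest vertex id in v's component
}
```

With edges (0,1), (1,2), (3,4) on six vertices, the result labels vertices 0 to 2 with 0, vertices 3 and 4 with 3, and leaves vertex 5 alone.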
[Plots: connected components time (s) vs. # XMT processors and time (s) vs. # edges; series include MTGL Shiloach-Vishkin algorithm, MTGL Kahan’s algorithm, 32000p Blue Gene/L, MTGL/MTA 10p, MTGL/XMT 10p]
The “PageRank Derby”
• Ranking of data is a key operation
– Which terabytes of some petabytes of data are the most interesting?
– Which gigabytes of those terabytes? Etc.
• We have chosen PageRank as a candidate kernel ranking operation (though there are others)
• We wish to understand computational tradeoffs for various architectures and various datasets
– XMT, XT4, Niagara, Netezza, Hadoop, etc.
• We simulate real data with “R-MAT” graphs (Faloutsos, et al.)
– No previous results for traditional HPC (distributed memory)
– Two of Sandia’s top distributed-memory researchers have now obtained some
“R-MAT” (Recursive MATrix Decomposition)
• Think of dropping a marble through a sequence of plastic trays with holes in them
– Pick an initiator with k = pq holes
– The i’th level has k^i holes
– The marble goes through each hole with some probability (normalized to 1.0 over all holes)
– The bottom level has N cells (1×1) and is the adjacency matrix
• The probabilities determine the nature of the graph
– All equal generates Erdős–Rényi graphs
– Unbalanced probabilities can lead to inverse power-law graphs
Example initiator (the same 2×2 probabilities are applied recursively at every level):
$$\begin{pmatrix} 0.57 & 0.19 \\ 0.19 & 0.05 \end{pmatrix}$$
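The marble metaphor maps directly onto a loop: at each level, pick one quadrant of the current submatrix according to the initiator probabilities, then descend into it. The sketch below assumes the common 2×2 initiator (k = 4), matching the 0.57/0.19/0.19/0.05 values above. The function `rmat_edge` and its signature are hypothetical, and this sketch does not remove duplicate edges or self-loops.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <utility>

// Draw one R-MAT edge in a 2^scale x 2^scale adjacency matrix from a 2x2
// initiator [[a, b], [c, d]] whose entries sum to 1.
std::pair<std::uint64_t, std::uint64_t> rmat_edge(int scale, double a, double b,
                                                  double c, double d,
                                                  std::mt19937_64& rng) {
    assert(std::abs(a + b + c + d - 1.0) < 1e-9);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uint64_t row = 0, col = 0;
    for (int level = 0; level < scale; ++level) {
        // Descend one tray: choose a quadrant ("hole") by its probability.
        double r = unif(rng);
        row <<= 1;
        col <<= 1;
        if (r < a) {
            // top-left quadrant: both new bits stay 0
        } else if (r < a + b) {
            col |= 1;  // top-right
        } else if (r < a + b + c) {
            row |= 1;  // bottom-left
        } else {
            row |= 1;  // bottom-right
            col |= 1;
        }
    }
    return {row, col};  // the 1x1 cell where the marble lands
}
```

Calling it repeatedly with `(scale, 0.57, 0.19, 0.19, 0.05, rng)` concentrates edges in the top-left corner, which is what skews the degree distribution toward a power law; setting all four probabilities to 0.25 spreads edges uniformly, as in the Erdős–Rényi case.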
PageRank’s Kernel Operation
• Vertices “vote” by contributing their current rank to their neighbors (in proportion: each vertex splits its rank evenly among its out-neighbors)
• For example, supposing that all current ranks are 1.0 (so u and w have out-degree 2 and v has out-degree 3):
– u contributes 0.5 to x
– v contributes 0.33… to x
– w contributes 0.5 to x
• This operation, done over all edges, dominates the running time of PageRank
[Figure: vertices u, v, and w each have an edge to x]
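A push-style version of this voting pass, as a minimal serial sketch: each vertex divides its rank by its out-degree and adds the share to every out-neighbor. The vertex numbering (u = 0, v = 1, w = 2, x = 3) and the extra targets 4 and 5, which give u, v, and w out-degrees 2, 3, and 2, are assumptions chosen to reproduce the numbers in the example above. Vertices without out-edges simply contribute nothing here; the name `push_votes` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[v] lists the targets of v's out-edges.
using adjacency = std::vector<std::vector<std::size_t>>;

// One voting pass over all edges: each vertex splits its current rank evenly
// among its out-neighbors. Vertices with no out-edges contribute nothing.
std::vector<double> push_votes(const adjacency& out, const std::vector<double>& rank) {
    assert(out.size() == rank.size());
    std::vector<double> incoming(out.size(), 0.0);
    for (std::size_t v = 0; v < out.size(); ++v) {
        if (out[v].empty()) continue;
        const double share = rank[v] / static_cast<double>(out[v].size());
        for (std::size_t t : out[v]) {
            // If the outer loop ran in parallel, many vertices could add to
            // the same incoming[t] at once: a write hot spot at popular targets.
            incoming[t] += share;
        }
    }
    return incoming;
}
```

With all ranks equal to 1.0, x (vertex 3) receives 0.5 + 0.33… + 0.5 ≈ 1.33.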
Attempt 4: Remove The Hot Spots, Auto-Parallelize
Load balanced, with no hot spot! The cost is extra memory for in-adjacencies.
“CANAL” Output – Serial Inner Loop
Can still work well in cases with enough work, even with high-degree vertices.
Less total work than the previous attempt (nodep)
Serial inner loop is a scalability risk
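One way to read this slide is as a switch from pushing votes to pulling them: with in-adjacencies stored, each vertex sums contributions from its in-neighbors and writes only its own entry, so there is no shared write target. The minimal serial sketch below uses the standard damped update, $r'(v) = (1-d)/n + d \sum_{u \to v} r(u)/\mathrm{outdeg}(u)$, and assumes every vertex has at least one out-edge (no dangling-vertex correction). It is not the code from the slide, and the name `pagerank_pull` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// in[v] lists the sources of v's in-edges (the extra in-adjacency storage).
using adjacency = std::vector<std::vector<std::size_t>>;

// Pull-style PageRank: every write in an iteration goes to a distinct
// location, so there is no write hot spot.
// Assumes every vertex has out_degree > 0.
std::vector<double> pagerank_pull(const adjacency& in,
                                  const std::vector<std::size_t>& out_degree,
                                  double damping, int iterations) {
    const std::size_t n = in.size();
    assert(n > 0 && out_degree.size() == n);
    std::vector<double> rank(n, 1.0 / static_cast<double>(n)), next(n);
    for (int it = 0; it < iterations; ++it) {
        // Outer loop: iterations are independent, the natural loop to parallelize.
        for (std::size_t v = 0; v < n; ++v) {
            double sum = 0.0;
            // Inner loop is serial: a vertex with very high in-degree becomes
            // one long-running iteration, the scalability risk noted above.
            for (std::size_t u : in[v]) {
                assert(out_degree[u] > 0);
                sum += rank[u] / static_cast<double>(out_degree[u]);
            }
            next[v] = (1.0 - damping) / static_cast<double>(n) + damping * sum;
        }
        rank.swap(next);
    }
    return rank;
}
```

On a directed 3-cycle every rank stays at 1/3; when two vertices point at vertex 0 and vertex 0 points at vertex 1, vertex 0 ends up ranked highest.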
Current Performance Results
An environment variable throttles back the number of streams for 128P scaling
The MTGL Is Having An Interdisciplinary Impact
• Algorithms/architectures/visualization integration
– Sandia architects profiled MTGL to predict performance on XMT
– Titan visualization framework uses MTGL
– Qthreads/MTGL → X-caliber driver application
• Scalable facility location on MTA-2
– Based on expertise gained in EPA sensor placement WFO project
– Applications to community detection, sensor placement, …
PageRank On Niagara
[Plot: MTGL/Qthreads PageRank time (seconds) vs. threads]
Impact of HPC Informatics Activities
• LDRD
– The Networks Grand Challenge LDRD is building on MTGL’s success
• Industry
– 2005 WFO project helped justify the Cray XMT
• Scholarly community
– Three algorithms-track papers at the 1st MTAAP (2007)
– Keynote talk at 2nd MTAAP (2008)
– IEEE CiSE Special Issue on Combinatorial Computing
– DIMACS shortest paths challenge
– Indiana University collaboration: Parallel Processing Letters, BGL refactor
Acknowledgements
MultiThreading Background: Simon Kahan (Google, formerly Cray), Petr Konecny (Google, formerly Cray)
MultiThreading/Distributed Memory Comparisons: Kristyn Maschhoff (Cray), Bruce Hendrickson (1415), Douglas Gregor (Indiana U.), Andrew Lumsdaine (Indiana U.)
MTGL Algorithm Design and Development: Vitus Leung (1415), Kamesh Madduri (Georgia Tech.), William McLendon (1423), Cynthia Phillips (1412)
MTGL Integration: Brian Wylie (1424), Kyle Wheeler (Notre Dame), Brian Barrett (1422)