Parallel Sparse Operations in Matlab: Exploring Large Graphs
John R. Gilbert, University of California at Santa Barbara
Aydin Buluc (UCSB), Brad McRae (NCEAS), Steve Reinhardt (Interactive Supercomputing), Viral Shah (ISC & UCSB)
with thanks to Alan Edelman (MIT & ISC) and Jeremy Kepner (MIT-LL)
Support: DOE, NSF, DARPA, SGI, ISC
3D Spectral Coordinates
2D Histogram: RMAT Graph
Strongly Connected Components
Social Network Analysis in Matlab: 1993
Co-author graph from the 1993 Householder Symposium
Combinatorial Scientific Computing
Emerging large-scale, high-performance applications:
• Web search and information retrieval
• Knowledge discovery
• Computational biology
• Dynamical systems
• Machine learning
• Bioinformatics
• Sparse matrix methods
• Geometric modeling
• . . .
How will combinatorial methods be used by nonexperts?
Outline
• Infrastructure: Array-based sparse graph computation
• An application: Computational ecology
• Some nuts and bolts: Sparse matrix multiplication
Matlab*P
% Power iteration for the dominant eigenvector; the "*p" suffix
% makes the arrays distributed across the parallel server.
A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
    y = x;
    x = A*x;
    x = x / norm(x);
end;
Star-P Architecture
(Figure: ordinary MATLAB® variables live in the Matlab client; Star-P's client manager talks to a server manager, a matrix manager for distributed dense/sparse matrices and sort, and a package manager across processors #0 . . . #n-1, with backend libraries such as ScaLAPACK, FFTW, an FPGA interface, and UPC/MPI user code.)
Distributed Sparse Array Structure
• Rows are distributed across processors P0 . . . Pn; each processor stores its local vertices & edges in a compressed row structure.
• Has been scaled to >10^8 vertices, >10^9 edges in an interactive session.
(Figure: a sparse matrix split into row blocks among P0 . . . Pn.)
Sparse Array and Matrix Operations
• dsparse layout, same semantics as ordinary full & sparse
• Matrix arithmetic: +, max, sum, etc.
• matrix * matrix and matrix * vector
• Matrix indexing and concatenation
A (1:3, [4 5 2]) = [ B(:, J) C ] ;
• Linear solvers: x = A \ b; using SuperLU (MPI)
• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)
Large-Scale Graph Algorithms
• Graph theory, algorithms, and data structures are ubiquitous in sparse matrix computation.
• Time to turn the relationship around!
• Represent a graph as a sparse adjacency matrix.
• A sparse matrix language is a good start on primitives for computing with graphs.
• Leverage the mature techniques and tools of high-performance numerical computation.
Sparse Adjacency Matrix and Graph
• Adjacency matrix: sparse array w/ nonzeros for graph edges
• Storage-efficient implementation from sparse data structures
(Figure: a 7-vertex example graph, its transposed adjacency matrix A^T, and the product A^T x.)
Breadth-First Search: sparse mat * vec
• Multiplying by the adjacency matrix steps to the neighbor vertices: A^T x marks the vertices one edge from those marked in x, (A^T)^2 x those reached in two steps, and so on.
• Work-efficient implementation from sparse data structures
(Figure: successive frontiers x, A^T x, (A^T)^2 x on the 7-vertex example graph.)
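The frontier expansion above can be sketched outside Star-P as well; here is a minimal serial analogue using SciPy sparse matrices in place of distributed dsparse arrays (the small graph and vertex numbers are illustrative, not the figure's):

```python
# Hedged sketch: BFS frontier expansion by sparse matrix-vector product.
import numpy as np
from scipy.sparse import csr_matrix

# Directed edges of a small example graph.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
rows, cols = zip(*edges)
n = 5
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))

x = np.zeros(n)
x[0] = 1                           # frontier: start at vertex 0
frontier1 = (A.T @ x) > 0          # vertices reachable in one step
frontier2 = (A.T @ (A.T @ x)) > 0  # vertices reachable in two steps
print(np.flatnonzero(frontier1))   # vertices 1 and 2
print(np.flatnonzero(frontier2))   # vertex 3, via 1 or 2
```

A full level-synchronous BFS repeats this matvec, masking out already-visited vertices each round.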
HPCS Graph Clustering Benchmark
• Many tight clusters, loosely interconnected
• Input data is edge triples < i, j, label(i,j) >
• Vertices and edges permuted randomly
• Fine-grained, irregular data access
• Searching and clustering
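A sketch of how such edge triples become a sparse matrix, using SciPy's coordinate format as a stand-in for a dsparse array (the triples below are made up for illustration):

```python
# Hedged sketch: assemble edge triples <i, j, label(i,j)> into a sparse matrix.
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical triples with randomly permuted vertex numbers.
triples = [(2, 0, 7), (0, 3, 1), (3, 2, 4), (1, 3, 2)]
i, j, w = (np.array(t) for t in zip(*triples))
n = 4
A = coo_matrix((w, (i, j)), shape=(n, n)).tocsr()
print(A.nnz)     # 4 stored edges
print(A[2, 0])   # label of edge (2, 0) -> 7
```

Coordinate format accepts the triples in any order, which suits the benchmark's randomly permuted input; conversion to CSR then gives the compressed row structure used for traversal.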
Clustering by Breadth-First Search
% Grow each seed to vertices
% reached by at least k
% paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths
Toolbox for Graph Analysis and Pattern Discovery
Layer 1: Graph Theoretic Tools
• Graph operations
• Global structure of graphs
• Graph partitioning and clustering
• Graph generators
• Visualization and graphics
• Scan and combining operations
• Utilities
Typical Application Stack
Distributed Sparse Matrices: arithmetic, matrix multiplication, indexing, solvers (\, eigs)
Graph Analysis & PD Toolbox: graph querying & manipulation, connectivity, spanning trees, geometric partitioning, nested dissection, NNMF, . . .
Preconditioned Iterative Methods: CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Applications: computational ecology, CFD, data exploration
Landscape Connectivity Modeling
• Landscape type and features facilitate or impede movement of members of a species
• Different species have different criteria, scales, etc.
• Habitat quality, gene flow, population stability
• Corridor identification, conservation planning
Pumas in Southern California
(Figure: habitat quality model map of southern California showing L.A., Palm Springs, and Joshua Tree N.P.)
Predicting Gene Flow with Resistive Networks
(Figure: circuit model predictions, genetic vs. geographic distance, N = 100, m = 0.01.)
Early Experience with Real Genetic Data
• Good results with wolverines, mahogany, pumas
• Matlab implementation
• Needed:
– Finer resolution
– Larger landscapes
– Faster interaction
(Figure: map at 5 km resolution, too coarse.)
Circuitscape: Combinatorics and Numerics
• Model landscape (ideally at 100 m resolution for pumas).
• Initial grid models connections to 4 or 8 neighbors.
• Partition landscape into connected components via GAPDT.
• Use GAPDT to contract habitats into single graph nodes.
• Compute resistance for pairs of habitats.
• Direct methods are too slow for the largest problems.
• Use iterative solvers via Star-P: Hypre (PCG+AMG).
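As a toy-scale sketch of the numerics: build the graph Laplacian with unit conductances and solve a grounded linear system by conjugate gradients (SciPy's cg standing in for Hypre's PCG+AMG) to get an effective resistance between two nodes. The 4-node path graph here is illustrative only.

```python
# Hedged sketch: effective resistance via a Laplacian solve with CG.
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import cg

edges = [(0, 1), (1, 2), (2, 3)]   # three unit resistors in series
n = 4
rows = [e[0] for e in edges] + [e[1] for e in edges]
cols = [e[1] for e in edges] + [e[0] for e in edges]
A = csr_matrix((np.ones(2 * len(edges)), (rows, cols)), shape=(n, n))
L = (diags(np.ravel(A.sum(axis=1))) - A).tocsr()   # graph Laplacian

s, t = 0, 3
b = np.zeros(n)
b[s], b[t] = 1.0, -1.0          # inject 1 A at s, extract at t
keep = np.arange(n) != t        # ground node t so the system is nonsingular
v, info = cg(L[keep][:, keep], b[keep])
assert info == 0                # CG converged
print(v[s])                     # effective resistance s-t: ~3 ohms
```

Circuitscape does this at landscape scale, where the component partitioning and habitat contraction steps shrink the systems before the iterative solve.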
Parallel Circuitscape Results
• Pumas in southern California:
– 12 million nodes
– Under 1 hour (16 processors)
– Original code took 3 days at coarser resolution
• Targeting much larger problems:
– Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS
Sparse Matrix times Sparse Matrix
• A primitive in many array-based graph algorithms:
– Parallel breadth-first search
– Shortest paths
– Graph contraction
– Subgraph / submatrix indexing
– Etc.
• Graphs are often not mesh-like, i.e. they lack geometric locality and good separators.
• Often we do not want to optimize for one repeated operation, as in matvec for iterative methods.
Sparse Matrix times Sparse Matrix
• Current work:
– Parallel algorithms with 2D data layout
– Sequential and parallel hypersparse algorithms
– Matrices over semirings
ParSpGEMM
• C(I,J) += A(I,K)*B(K,J)
• Based on SUMMA
• Simple for non-square matrices, etc.
(Figure: block row I and block column J of C, with the K-th block pair A(I,K), B(K,J).)
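A serial walk through SUMMA's stage structure, with SciPy blocks standing in for the distributed layout; the block size and the q x q grid are arbitrary choices for illustration:

```python
# Hedged sketch: for each stage K, every block C(I,J) accumulates
# A(I,K) * B(K,J), which is the SUMMA update done serially here.
import numpy as np
from scipy.sparse import random as sprand, csr_matrix

rng = np.random.default_rng(0)
nb, q = 4, 2                      # block size, q x q grid (assumed)
n = nb * q
A = sprand(n, n, density=0.3, format="csr", random_state=rng)
B = sprand(n, n, density=0.3, format="csr", random_state=rng)

blk = lambda M, I, J: M[I*nb:(I+1)*nb, J*nb:(J+1)*nb]
C = {(I, J): csr_matrix((nb, nb)) for I in range(q) for J in range(q)}
for K in range(q):                # one SUMMA stage per block index K
    for I in range(q):
        for J in range(q):
            C[I, J] = C[I, J] + blk(A, I, K) @ blk(B, K, J)

# The assembled blocks match the direct product.
C_full = np.block([[C[I, J].toarray() for J in range(q)] for I in range(q)])
assert np.allclose(C_full, (A @ B).toarray())
```

In the parallel version each (I, J) pair lives on its own processor and stage K is a broadcast of A's block column and B's block row; the arithmetic per block is exactly the loop body above.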
How Sparse? HyperSparse!
• Blocked over p pieces, a matrix column with nnz(j) = c nonzeros keeps only nnz(j) = c/p of them per block on average, approaching 0 as p grows.
• Any local data structure that depends on the local submatrix dimension n (such as CSR or CSC) is too wasteful.
SparseDComp Data Structure
• “Doubly compressed” data structure
• Maintains both DCSC and DCSR
• C = A*B needs only A.DCSC and B.DCSR
• 4*nnz values communicated for A*B in the worst case (though we usually get away with much less)
Sequential Operation Counts
• Matlab: O(n + nnz(B) + f)
• SpGEMM: O(nzc(A) + nzr(B) + f·log k), where f is the number of required nonzero operations (flops) and nzc(A) is the number of columns of A containing at least one nonzero
(Figure: break-even point between the two costs.)
Parallel Timings
• 16-processor Opteron, HyperTransport, 64 GB memory
• R-MAT * R-MAT
• n = 2^20
• nnz = {8, 4, 2, 1, .5} * 2^20
(Figure: time vs. n/nnz, log-log plot.)
Matrices over Semirings
• Matrix multiplication C = AB (or matrix/vector):
Ci,j = Ai,1 B1,j + Ai,2 B2,j + · · · + Ai,n Bn,j
• Replace the scalar operations * and + by ⊗ and ⊕:
⊗: associative, distributes over ⊕, identity 1
⊕: associative, commutative, identity 0, which annihilates under ⊗
• Then Ci,j = Ai,1⊗B1,j ⊕ Ai,2⊗B2,j ⊕ · · · ⊕ Ai,n⊗Bn,j
• Examples: (*, +) ; (and, or) ; (min, +) ; . . .
• Same data reference pattern and control flow
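For instance, the (min, +) semiring turns matrix multiplication into shortest-path relaxation. A small NumPy sketch (dense for clarity; the same control flow applies to sparse storage):

```python
# Hedged sketch: matrix "multiplication" over the (min, +) semiring,
# where ⊕ = min and ⊗ = +, so squaring an edge-length matrix gives
# shortest paths that use at most two edges.
import numpy as np

INF = np.inf

def semiring_matmul(A, B):
    """C[i,j] = min_k (A[i,k] + B[k,j]) -- the (min, +) product."""
    C = np.full((A.shape[0], B.shape[1]), INF)
    for k in range(A.shape[1]):
        C = np.minimum(C, A[:, [k]] + B[[k], :])
    return C

# Edge-length matrix of a tiny graph (INF = no edge, 0 on the diagonal).
D = np.array([[0.0, 3.0, INF],
              [INF, 0.0, 1.0],
              [4.0, INF, 0.0]])
D2 = semiring_matmul(D, D)   # shortest paths with at most two edges
print(D2[0, 2])              # 0 -> 1 -> 2 costs 3 + 1 = 4
```

Note the loop body touches exactly the same data as ordinary matrix multiply; only the scalar ⊕ and ⊗ change, which is the point of the semiring abstraction.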
Remarks
• Tools for combinatorial methods built on parallel sparse matrix infrastructure
• Easy-to-use interactive programming environment
– Rapid prototyping tool for algorithm development
– Interactive exploration and visualization of data
• Sparse matrix * sparse matrix is a key primitive
• Matrices over semirings like (min,+) as well as (+,*)