Partitioning and Partitioning Tools (graphics.stanford.edu/sss/barth_partitioning.pdf)


Partitioning and Partitioning Tools

Tim Barth, NASA Ames Research Center

Moffett Field, California 94035-1000 USA

1

Graph/Mesh Partitioning

• Why do it?

• The graph bisection problem

• What are the standard heuristic algorithms?

• What tools are available?

2

Why do it?

• Efficient utilization of distributed computational resources

– Equidistribution of workload among processors (load balancing)

– Minimized time spent in interprocessor communication

∗ Communication takes time, and it is not always possible to hide this latency in data transfer

∗ The cost of communication is often modeled by the linear relationship for n messages: Cost = ∑_{i=1}^n (α + β m_i), where α is the per-message startup latency, β the per-byte transfer time, and m_i the length of message i

Figure 1: (a) Mesh partitioning with minimized number of messages; (b) mesh partitioning with minimized message length.

3
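The trade-off in Figure 1 can be made concrete with the linear cost model above. A minimal sketch in Python; the α and β values are illustrative assumptions, not measured numbers:

```python
# Linear communication cost model: Cost = sum over messages of (alpha + beta * m_i)
ALPHA = 1e-6   # assumed per-message startup latency, in seconds
BETA = 1e-9    # assumed per-byte transfer time, in seconds/byte

def comm_cost(message_sizes):
    """Total cost of sending the given messages (sizes m_i in bytes)."""
    return sum(ALPHA + BETA * m for m in message_sizes)

# Same total data volume (16 KiB), packaged differently:
few_large = comm_cost([4096] * 4)    # 4 messages of 4 KiB
many_small = comm_cost([64] * 256)   # 256 messages of 64 B
```

For this assumed α/β ratio the startup term dominates, so fewer, larger messages are cheaper; on a network where bandwidth rather than latency is the bottleneck, the balance shifts toward minimizing total message length.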


Why do it?

• As a strategy for reducing the overall arithmetic complexity of an algorithm

– Overlapping Schwarz methods

– "Divide and conquer" methods, e.g. nested dissection of a matrix, Schur complement substructuring

– Multiscale methods, e.g. agglomeration multigrid

4


Why do it?

• Overlapping Schwarz methods

[Figure: global residual norm (log scale, 10⁰ down to 10⁻¹⁵) versus Schwarz iterations (0 to 80) for a 2nd-order scheme, comparing 2 and 8 partitions with 1, 2, and 3 cells of overlap.]

5

Why do it?

• Overlapping Schwarz methods with subdomain size H, mesh cell size h, and overlap δ

Let A be the discretization matrix and M_as the additive Schwarz preconditioner. There exists a constant C, independent of H and h, such that the condition number satisfies

κ(M_as⁻¹ A) ≤ C H⁻² (1 + (H/δ)²). (1)

With 2-level coarse space correction: there exists a constant C, independent of H and h, such that

κ(M_as⁻¹ A) ≤ C (1 + H/δ). (2)

6

Why do it?

• Substructuring

[A₁ A₂; A₃ A₄] [x₁; x₂] = [b₁; b₂],  A⁻¹ = [C₁ C₂; C₃ C₄]

with S = A₄ − A₃A₁⁻¹A₂ and

C₁ = A₁⁻¹ + A₁⁻¹A₂S⁻¹A₃A₁⁻¹,  C₂ = −A₁⁻¹A₂S⁻¹,

C₃ = −S⁻¹A₃A₁⁻¹,  C₄ = S⁻¹.

κ(M_Schur⁻¹ A) = C(1 + log(H/δ))

7
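The block-inverse formulas above can be checked directly. A minimal sketch using scalar (1×1) blocks, so every block inverse is just a division; the numeric values are arbitrary assumptions:

```python
# Substructured 2x2 block system with scalar blocks A1..A4.
a1, a2, a3, a4 = 4.0, 1.0, 2.0, 3.0

# Schur complement: S = A4 - A3 A1^{-1} A2
s = a4 - a3 * (1 / a1) * a2

# Block entries of A^{-1} from the substructuring formulas
c1 = 1 / a1 + (1 / a1) * a2 * (1 / s) * a3 * (1 / a1)
c2 = -(1 / a1) * a2 * (1 / s)
c3 = -(1 / s) * a3 * (1 / a1)
c4 = 1 / s

# Compare against the direct 2x2 inverse for reference
det = a1 * a4 - a2 * a3
direct = [[a4 / det, -a2 / det], [-a3 / det, a1 / det]]
```

The same identities hold blockwise for matrix-valued A₁…A₄ whenever A₁ and S are invertible; the scalar case just makes the check a one-liner.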

Graph Bisection (NP-hard)

Define a partitioning vector p ∈ Zⁿ which 2-colors the vertices of a graph

p = [+1, −1, −1, +1, +1, …, +1, −1]ᵀ (3)

[Figure: example graph with its vertices 2-colored +1/−1 by the partitioning vector p.]

• Minimize the cut-weight of the weighted graph

• Produce balanced partitions

8

Heuristic Graph Partitioning

Three commonly used partitioning techniques

• Recursive coordinate bisection

• Recursive Cuthill-McKee

• Recursive spectral bisection

9

Recursive Coordinate Bisection

• Spatial coordinates are sorted along alternating horizontal and vertical directions

• Divisors are found to balance partitions

10
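The two steps above can be sketched in a few lines. A minimal 2-D version in pure Python, using equal-count median splits (a production tool would also weight the vertices):

```python
def rcb(points, levels, axis=0):
    """Recursive coordinate bisection sketch: sort along the current axis,
    split at the median, recurse on each half with the alternate axis."""
    if levels == 0:
        return [points]
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2          # divisor balancing the two halves
    nxt = (axis + 1) % 2         # alternate horizontal/vertical
    return rcb(pts[:mid], levels - 1, nxt) + rcb(pts[mid:], levels - 1, nxt)

# 16 grid points split into 4 balanced partitions (2 bisection levels)
grid = [(x, y) for x in range(4) for y in range(4)]
parts = rcb(grid, levels=2)
```

Each level of recursion doubles the number of parts, so `levels=k` yields 2ᵏ partitions of near-equal size.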

Graph Ordering Cuthill-McKee

Algorithm: Graph ordering, Cuthill-McKee.

Step 1. Find vertex with lowest degree. This is the root vertex.

Step 2. Find all neighboring vertices connecting to the root by incident

edges. Order them by increasing vertex degree. This forms level 1.

Step 3. Form level k by finding all neighboring vertices of level k − 1

which have not been previously ordered. Order these new vertices by

increasing vertex degree.

Step 4. If vertices remain, go to step 3.

11
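Steps 1-4 amount to a breadth-first search from a low-degree root, visiting each vertex's unordered neighbors in order of increasing degree. A minimal sketch for a connected graph stored as an adjacency dict (pure Python; the example graph is an assumption for illustration):

```python
from collections import deque

def cuthill_mckee(adj):
    """Return a Cuthill-McKee ordering of the vertices of `adj`,
    a dict mapping each vertex to the set of its neighbors."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    root = min(adj, key=deg.get)          # Step 1: lowest-degree vertex
    order, seen = [root], {root}
    queue = deque([root])
    while queue:                          # Steps 2-4: level by level
        v = queue.popleft()
        for u in sorted(adj[v] - seen, key=deg.get):  # increasing degree
            seen.add(u)
            order.append(u)
            queue.append(u)
    return order

# 5-vertex example: a star centered at 0 with a tail 3-4
g = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
order = cuthill_mckee(g)
```

Reversing the result gives the reverse Cuthill-McKee (RCM) variant commonly preferred for bandwidth and fill reduction.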

Graph Ordering Cuthill-McKee

Matrix nonzero pattern

Figure 2: Natural Ordering (left) and Cuthill-McKee ordering (right)

12

Recursive Cuthill-McKee

• The level structure computed in the Cuthill-McKee ordering is utilized

• Divisors are found to balance partitions

13

Recursive Spectral Bisection

Motivated by the observation that the cut-weight of a graph is precisely

W_c = (1/4) pᵀLp

Algorithm: Spectral Graph Bisection.

Step 1. Calculate the matrix L associated with the Laplacian of the

graph.

Step 2. Calculate the eigenvalues and eigenvectors of L.

Step 3. Order the eigenvalues by magnitude, λ_1 ≤ λ_2 ≤ λ_3 ≤ … ≤ λ_n.

Step 4. Determine the smallest nonzero eigenvalue λ_f and its associated eigenvector x_f (the Fiedler vector).

Step 5. Sort elements of the Fiedler vector.

Step 6. Choose a divisor at the median of the sorted list and 2-color the vertices of the graph according to whether their entries in the Fiedler vector are less than or greater than the median value.

14
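The motivating identity W_c = (1/4) pᵀLp can be checked without forming L at all, since the Laplacian quadratic form satisfies pᵀLp = Σ over edges (u,v) of (p_u − p_v)². A minimal sketch in pure Python; the example graph and coloring are assumptions for illustration:

```python
def cut_weight_quadratic(edges, p):
    """Evaluate (1/4) p^T L p via the Laplacian quadratic form:
    p^T L p = sum over edges (p_u - p_v)^2."""
    return sum((p[u] - p[v]) ** 2 for u, v in edges) / 4

def cut_edges(edges, p):
    """Directly count edges whose endpoints received different colors."""
    return sum(1 for u, v in edges if p[u] != p[v])

# 2-colored 5-vertex graph
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
p = {0: +1, 1: +1, 2: -1, 3: -1, 4: +1}
```

Each cut edge contributes (±1 − ∓1)² = 4 to the quadratic form and each uncut edge contributes 0, which is exactly why the 1/4 factor turns pᵀLp into the cut-edge count.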

Recursive Spectral Bisection

15

Multilevel k-way Partitioning

• Uses successive k-way graph contraction to coarsen the graph

• Performs a high-quality partitioning on the coarsened graph

• Prolongates to finer graphs, with local interface optimization to improve the cut-weight

16
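The first bullet, graph contraction, is typically done by edge matching: matched vertex pairs collapse into single coarse vertices, and parallel edges between coarse vertices accumulate their weights. A simplified greedy sketch in pure Python (real multilevel tools such as Metis use more careful heavy-edge matching and also aggregate vertex weights; the `coarsen` helper and the ring example are illustrative assumptions):

```python
def coarsen(adj):
    """One pass of greedy edge-matching contraction.
    adj: {v: {u: weight}}, assumed symmetric, no self-loops.
    Returns (coarse_adj, fine_to_coarse map)."""
    matched, groups = set(), []
    for v in adj:
        if v in matched:
            continue
        matched.add(v)
        cands = [u for u in adj[v] if u not in matched]
        if cands:
            u = max(cands, key=lambda x: adj[v][x])  # heaviest incident edge
            matched.add(u)
            groups.append((v, u))                    # collapse the pair
        else:
            groups.append((v,))                      # unmatched vertex survives
    f2c = {v: c for c, grp in enumerate(groups) for v in grp}
    coarse = {c: {} for c in range(len(groups))}
    for v, nbrs in adj.items():
        for u, w in nbrs.items():
            cv, cu = f2c[v], f2c[u]
            if cv != cu:                             # drop now-internal edges
                coarse[cv][cu] = coarse[cv].get(cu, 0) + w
    return coarse, f2c

# Unit-weight 4-cycle collapses to two coarse vertices joined by weight 2
ring = {0: {1: 1, 3: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {0: 1, 2: 1}}
coarse, f2c = coarsen(ring)
```

Because coarse edge weights record how many fine edges they absorbed, a good cut on the coarse graph corresponds to a good cut on the fine graph, which is what makes partitioning the small coarsened graph and prolongating back effective.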


Metis, ParMetis

• Extremely fast

• Parallel implementation (requires some initial partitioning)

• Supports weighted graphs by vertices or edges

• Supports incremental load balancing (repartitioning) with minimized data migration

19


Zoltan

• Relatively new package under development at Sandia under the GPL

• Interfaces with Metis or Jostle

• Documentation suggests that the package will contain most of the commonly needed services for parallel scientific codes: partitioning, repartitioning, data migration, etc.

21

Partitioning Tools for SSS?

• Domain specific languages?

– Language for finite element methods

– Language for molecular dynamics

– <Insert your favorite problem domain here>

• Partial or full data dependency specification (analogous to scene graph specification in Java3D).

• Automatic tools for performance enhancement

– Use hardware performance statistics (memory access patterns) from previous executions in subsequent compilations

– Runtime data migration

22