Collective Communication on Architectures that Support Simultaneous Communication over Multiple Links
Ernie Chan
Authors
Ernie Chan Robert van de Geijn
Department of Computer Sciences
The University of Texas at Austin
William Gropp Rajeev Thakur
Mathematics and Computer Science Division
Argonne National Laboratory
Testbed Architecture
IBM Blue Gene/L: 3D torus point-to-point interconnect network
One rack: 1024 dual-processor nodes, two 8 x 8 x 8 midplanes
Special feature to send simultaneously: use multiple calls to MPI_Isend
Outline
Testbed Architecture
Model of Parallel Computation
Sending Simultaneously
Collective Communication
Generalized Algorithms
Performance Results
Conclusion
Model of Parallel Computation
Target Architectures: distributed-memory parallel architectures
Indexing: p computational nodes, indexed 0 … p - 1
Logically Fully Connected: a node can send directly to any other node
Model of Parallel Computation
Old Model of Communicating Between Nodes: unidirectional sending or receiving
Model of Parallel Computation
Old Model of Communicating Between Nodes: simultaneous sending and receiving
Model of Parallel Computation
Communicating Between Nodes: a node can send or receive with 2N other nodes simultaneously along its 2N different links
Model of Parallel Computation
Communicating Between Nodes: a node cannot perform a bidirectional exchange on any link while sending or receiving simultaneously with multiple nodes
Model of Parallel Computation
Cost of Communication
α + nβ
α: startup time (latency)
n: number of bytes to communicate
β: transmission time per byte (inverse bandwidth)
Outline
Testbed Architecture
Model of Parallel Computation
Sending Simultaneously
Collective Communication
Generalized Algorithms
Performance Results
Conclusion
Sending Simultaneously
Old Cost of Communication with Sends to Multiple Nodes: cost to send to m separate nodes
(α + nβ) m
Sending Simultaneously
New cost of communication with simultaneous sends:

(α + nβ) m

can be replaced with

(α + nβ) + (α + nβ) (m - 1) τ

Cost of one send, plus the cost of the (m - 1) extra sends discounted by τ, where 0 ≤ τ ≤ 1
Sending Simultaneously
Benchmarking Sending Simultaneously
Log-log timing graphs
Midplane: 512 nodes
Sending simultaneously with 1 – 6 neighbors
Message sizes from 8 bytes to 4 MB
Outline
Testbed Architecture
Model of Parallel Computation
Sending Simultaneously
Collective Communication
Generalized Algorithms
Performance Results
Conclusion
Generalized Algorithms
Short-Vector Algorithms: Minimum-Spanning Tree
Long-Vector Algorithms: Bucket Algorithm
Generalized Algorithms
Minimum-Spanning Tree: disjoint partitions on an N-dimensional mesh
[Figure: 4 x 4 mesh of nodes 0 – 15]
Generalized Algorithms
Minimum-Spanning Tree: divide dimensions by a decrementing counter starting from N + 1
[Figure: the same 4 x 4 mesh of nodes 0 – 15, split along one dimension]
Generalized Algorithms
Minimum-Spanning Tree: now divide into 2N + 1 partitions
[Figure: the 4 x 4 mesh of nodes 0 – 15 divided into 2N + 1 disjoint partitions]
Outline
Testbed Architecture
Model of Parallel Computation
Sending Simultaneously
Collective Communication
Generalized Algorithms
Performance Results
Conclusion
Conclusion
IBM Blue Gene/L supports the ability to send simultaneously over multiple links
Benchmarking, together with checking against the cost model, verifies this claim
The new generalized algorithms show clear performance gains