
Department of Computer Science at Florida State

Date post: 29-Dec-2015
Upload: caitlin-bailey
Page 1

LFTI: A Performance Metric for Assessing Interconnect topology and routing design

• Background
‒ Innovations in interconnect topology and routing design are essential for future-generation ultra-scale supercomputers.
‒ Current methods for evaluating topology and routing design are not ideal.

Page 2

Current methods for evaluating interconnect topology and routing design

• Topology and routing are evaluated separately.
• Topology
‒ Diameter, bisection bandwidth, nodal degree, etc.
‒ Not directly related to application-level performance.
• Routing with topology
‒ Simulation to obtain throughput and packet latency.
‒ Limited network sizes and numbers of scenarios.
‒ Simulation sees the trees, but not the forest.
• The result is two kinds of metrics: simple metrics that do not directly relate to performance, and detailed metrics that are too expensive to obtain.
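
To make the "simple metrics" concrete, the sketch below computes two of them, diameter and nodal degree, for a torus using breadth-first search. The torus topology and all function names are illustrative assumptions, not part of the original slides. Bisection bandwidth is left out because finding a minimum bisection exactly is NP-hard for general graphs.

```python
from collections import deque
from itertools import product

def build_torus(dims):
    """Adjacency sets for a torus whose side lengths are given by `dims`.
    Nodes are coordinate tuples; each node links to its +1/-1 neighbor
    (with wraparound) along every dimension."""
    graph = {}
    for node in product(*(range(k) for k in dims)):
        nbrs = set()
        for axis, k in enumerate(dims):
            for step in (-1, 1):
                coord = list(node)
                coord[axis] = (coord[axis] + step) % k
                nbrs.add(tuple(coord))
        nbrs.discard(node)  # a side length of 1 would otherwise add a self-loop
        graph[node] = nbrs
    return graph

def eccentricity(graph, src):
    """Largest hop distance from `src`, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(graph):
    """Maximum eccentricity over all nodes (assumes a connected graph)."""
    return max(eccentricity(graph, s) for s in graph)

def max_degree(graph):
    """Largest nodal degree (number of distinct neighbors)."""
    return max(len(nbrs) for nbrs in graph.values())

torus = build_torus((4, 4, 4))
print(diameter(torus), max_degree(torus))  # 6 6
```

Neither number says how a particular application's traffic will load the links, which is the gap this slide points out.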

Page 3

Impact of evaluation methods

• Evaluation methods set the design optimization objective.
• Recent proposals (dragonfly, jellyfish) all have large bisection bandwidth and support certain traffic patterns effectively.
‒ Think about how these designs are justified!
‒ They are excellent designs by traditional metrics.
‒ But are these designs good for typical HPC workloads?
‒ There is no metric that can be used to compare different topology and routing designs for HPC workloads.

Page 4

What kind of metrics are we looking for?

• Desirable properties:
o Reflects overall network performance.
o Simple enough to be computed quickly; we do not want to run simulations.
• A related attempt is effective bisection bandwidth: it summarizes network performance by the average performance over all bisection communication patterns.
‒ Is this metric reflective of actual performance?
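
Effective bisection bandwidth can be estimated by sampling bisection patterns rather than enumerating all of them. A minimal sketch, assuming nodes are numbered 0..N-1 and that a throughput model for the network under study is available as a function; the `pattern_throughput` argument and both function names are hypothetical.

```python
import random

def random_bisection_pattern(num_nodes, rng):
    """One random bisection pattern: split the nodes into two equal halves at
    random and pair each node in one half with a distinct node in the other.
    Flows are returned in both directions as (src, dst) tuples."""
    if num_nodes % 2:
        raise ValueError("num_nodes must be even")
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)
    half = num_nodes // 2
    flows = []
    for a, b in zip(nodes[:half], nodes[half:]):
        flows.append((a, b))
        flows.append((b, a))
    return flows

def estimate_ebb(num_nodes, pattern_throughput, samples=1000, seed=0):
    """Average of `pattern_throughput` over randomly sampled bisection patterns.
    `pattern_throughput` is a caller-supplied model (hypothetical here) that
    maps a list of flows to a throughput value for the network evaluated."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += pattern_throughput(random_bisection_pattern(num_nodes, rng))
    return total / samples
```

Every sampled pattern has the same shape (one flow per node in each direction across a random cut), which is why the slide asks whether such an average reflects real workloads.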

Page 5

LFTI: LANL-FSU throughput indices

• A metric for throughput performance
• High-level ideas
− Use modeling to obtain the average throughput for one communication pattern.
− Find the set of representative communication patterns to be used in the metric.
− Summarize overall network performance using the average throughput over a large number of communication patterns common to HPC applications.

Page 6

LFTI: LANL-FSU throughput indices

• High-level ideas
‒ Once the patterns to be included are determined, LFTI can be derived from most topology and routing specifications without detailed simulation.
• If an interconnect achieves high overall performance for many common HPC patterns, it is likely to provide high performance for HPC workloads.
− Unlike some other metrics, LFTI is much harder to game, since a design cannot score well by optimizing for only a few traffic patterns.

Page 7

LFTI: LANL-FSU throughput index

• LFTI summarizes the throughput of an interconnect over a large number of communication patterns common in HPC applications.
‒ For each communication pattern, the interconnect's performance is quantified by its sustained throughput, a metric closely related to application-level performance for that pattern.
‒ For a class of patterns (e.g., 2DNN patterns), performance is quantified by the expected sustained throughput.
‒ LFTI aggregates the performance of many classes of patterns.
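
One way to write this structure as formulas, assuming the class indices are combined by a weighted average (the weights $w_c$ are our assumption; the slides only say "aggregate"). Here $T(p)$ is the normalized sustained throughput of pattern $p$ and $\mathcal{P}_c$ is the set of patterns in class $c$.

```latex
I_c = \mathbb{E}_{p \in \mathcal{P}_c}\left[\, T(p) \,\right],
\qquad
\mathrm{LFTI} = \sum_{c} w_c \, I_c,
\qquad
w_c \ge 0, \quad \sum_{c} w_c = 1
```

With equal weights, LFTI reduces to the plain mean of the class indices.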

Page 8

Computing the sustained throughput for a pattern (single path routing)

• Compute the link load (the number of flows going through each link).
• The sustained throughput of each flow is its share of the bandwidth on its bottleneck link, or its Max-Min fair rate.
• The sustained throughput of the pattern is the aggregate throughput of all flows in the pattern.
‒ Normalized: each flow's throughput is divided by the input link bandwidth.
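
The steps above can be sketched as follows, assuming each flow's single-path route is already known as a list of directed links and all links have the same bandwidth. The function names are ours, and treating the pattern's "aggregate" throughput as a plain sum of normalized flow rates is an assumption (dividing by the number of flows gives the per-flow average instead). Both allocation rules mentioned on the slide are shown.

```python
from collections import defaultdict

# A route is the list of directed links (u, v) that one flow traverses under
# single-path routing. Every route is assumed to be non-empty.

def link_loads(routes):
    """Number of flows crossing each directed link."""
    load = defaultdict(int)
    for route in routes:
        for link in route:
            load[link] += 1
    return load

def bottleneck_rates(routes, link_bw=1.0):
    """Each flow gets an equal share of its most heavily loaded link."""
    load = link_loads(routes)
    return [link_bw / max(load[link] for link in route) for route in routes]

def max_min_rates(routes, link_bw=1.0, eps=1e-12):
    """Max-min fair rates computed by progressive filling (water-filling)."""
    rates = [0.0] * len(routes)
    residual = {link: link_bw for link in link_loads(routes)}
    active = set(range(len(routes)))
    while active:
        # Count the flows that are still growing on each link.
        users = defaultdict(int)
        for f in active:
            for link in routes[f]:
                users[link] += 1
        # Raise all active flows by the largest amount every link can absorb.
        inc = min(residual[link] / n for link, n in users.items())
        for link, n in users.items():
            residual[link] -= inc * n
        saturated = {link for link in users if residual[link] <= eps}
        for f in list(active):
            rates[f] += inc
            if any(link in saturated for link in routes[f]):
                active.discard(f)  # frozen at its bottleneck rate
    return rates

def pattern_throughput(rates, injection_bw=1.0):
    """Aggregate throughput, normalized by the input (injection) link bandwidth."""
    return sum(rate / injection_bw for rate in rates)

# Flow A uses link (0,1); flow B uses (0,1) and (1,2); flows C and D use (1,2).
routes = [[(0, 1)], [(0, 1), (1, 2)], [(1, 2)], [(1, 2)]]
print(bottleneck_rates(routes))  # A gets 1/2, the others 1/3
print(max_min_rates(routes))     # A gets 2/3, the others 1/3
```

The example shows why the choice of rule matters: under bottleneck sharing, flow A is capped at 1/2 even though B can never use more than 1/3 of link (0,1), whereas max-min fairness gives that leftover bandwidth to A.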

Page 9

Computing the throughput index for a class of patterns

• A throughput index for a class of patterns (e.g., 2DNN patterns) is the expected sustained throughput across all patterns of that class.
‒ The index can be obtained by randomly sampling a large number of patterns (e.g., 10,000 patterns).
‒ Statistical methods may be applied to obtain the index with a stated confidence without sampling so many patterns.
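
A minimal sketch of the sampling approach, assuming a normal-approximation confidence interval as the "statistical method" (the slides do not name one) and caller-supplied functions for drawing a pattern and scoring it; all names here are hypothetical.

```python
import math
import random
import statistics

def estimate_class_index(sample_pattern, throughput, rng, rel_tol=0.01,
                         min_samples=30, max_samples=10_000, z=1.96):
    """Estimate the expected sustained throughput of a pattern class.

    Draws patterns with `sample_pattern(rng)` and scores them with
    `throughput(pattern)`. Sampling stops once the normal-approximation
    confidence interval (z=1.96 is roughly 95%) has a half-width within
    `rel_tol` of the mean, or when `max_samples` is reached.
    Returns (mean, half_width, samples_used).
    """
    min_samples = max(2, min_samples)  # stdev needs at least two values
    values = []
    half_width = math.inf
    while len(values) < max(max_samples, min_samples):
        values.append(throughput(sample_pattern(rng)))
        if len(values) >= min_samples:
            mean = statistics.fmean(values)
            half_width = z * statistics.stdev(values) / math.sqrt(len(values))
            if half_width <= rel_tol * abs(mean):
                break
    return statistics.fmean(values), half_width, len(values)
```

Stopping on the relative width of the interval means low-variance classes terminate after far fewer samples than high-variance ones.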

Page 10

Communication Patterns in LFTI indices

‒ Patterns with history
• All-to-all
• Bisect (effective bisection bandwidth)
‒ Low-dimensional stencil patterns
• 2DNN, 2DNN_DIAG, 3DNN, 3DNN_DIAG
‒ Random patterns, for applications with unstructured meshes or adaptive mesh refinement
• RANDOM 50, RANDOM N50
‒ Commonly used sub-communication patterns
• Permutation, shift
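
Several of these classes can be generated in a few lines. The sketch below uses hypothetical function names and assumes wraparound at the grid edges and that the _DIAG variants add diagonal neighbors. RANDOM 50 and RANDOM N50 are left out because the slides do not define them. For shift and permutation, a class index would average over the offset k or over the permutations drawn.

```python
import random

def nn2d(px, py, diag=False):
    """2D nearest-neighbor stencil on a px-by-py process grid with wraparound.
    With diag=True the four diagonal neighbors are added (our reading of
    2DNN_DIAG). Ranks are numbered row-major; duplicate flows are merged."""
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if diag:
        offsets += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    flows = set()
    for x in range(px):
        for y in range(py):
            src = x * py + y
            for dx, dy in offsets:
                dst = ((x + dx) % px) * py + (y + dy) % py
                if dst != src:
                    flows.add((src, dst))
    return sorted(flows)

def shift(n, k):
    """Shift pattern: rank i sends to rank (i + k) mod n."""
    return [(i, (i + k) % n) for i in range(n) if (i + k) % n != i]

def random_permutation(n, rng):
    """Random permutation: every rank sends to and receives from one rank."""
    dst = list(range(n))
    rng.shuffle(dst)
    return [(i, d) for i, d in enumerate(dst) if d != i]
```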

Page 11

LFTI categories

• Trying to reflect how the machine is used

• Whole system direct map LFTI

• Whole system random map LFTI

• Job allocation trace-based LFTI
‒ Uses the largest job from some job traces
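
The three categories differ mainly in how the ranks of a pattern are placed on physical nodes before its throughput is computed. A minimal sketch of that placement step, assuming rank-to-node mappings are stored as plain lists; the function names and the ordering used for trace-based allocation are our assumptions.

```python
import random

def direct_map(num_ranks):
    """Whole-system direct map: rank i runs on node i."""
    return list(range(num_ranks))

def random_map(num_ranks, num_nodes, rng):
    """Whole-system random map: ranks are placed on distinct random nodes."""
    return rng.sample(range(num_nodes), num_ranks)

def allocation_map(allocated_nodes):
    """Trace-based map: rank i runs on the i-th node the job trace allocated
    (hypothetical ordering; a real trace may record placement differently)."""
    return list(allocated_nodes)

def map_pattern(flows, mapping):
    """Translate (src_rank, dst_rank) flows into (src_node, dst_node) flows."""
    return [(mapping[s], mapping[d]) for s, d in flows]

ring = [(i, (i + 1) % 4) for i in range(4)]
print(map_pattern(ring, allocation_map([10, 3, 7, 21])))
# [(10, 3), (3, 7), (7, 21), (21, 10)]
```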

Page 12

Evaluating interconnect using LFTI

Fat-tree (ftree), dragonfly (dfly), hypercube (hcube), 6D torus (6D), 3D torus (3D), and jellyfish (jfish) networks of 25K-35K nodes, the size of next-generation supercomputers.

Page 13

Throughput index and communication time

Page 14

Whole system direct map LFTI

Page 15

Whole system direct map LFTI

Page 16

Whole system random map LFTI

Page 17

Whole system random map LFTI

Page 18

Job allocation based

Page 19

Job allocation based

Page 20

LFTI summary

Page 21

Conclusion

• Traditional performance metrics such as bisection bandwidth (BB) and effective bisection bandwidth (EBB) are not indicative of an interconnect’s performance.

• Optimizing for BB and EBB may not lead to high-performance interconnects.

• LFTI is indicative of application-level performance, yet can be derived rapidly without detailed simulation.
‒ It is a much better metric than current metrics.

Page 22

LFTI weakness

• Communication patterns and weights
─ Heavily concentrated on simulation-type applications.
─ Little coverage of data-intensive applications.
─ Calls for performance characterization work to find the truly “representative” workload to be included in the index.

Page 23

LFTI weakness

• LFTI relies on fast modeling of the throughput of each communication pattern.
o Depending on the routing algorithm, the modeling can be problematic.
‒ Indirect adaptive routing is an example: there is no effective modeling method other than simulation.
o New models need to be developed for all existing and future routing schemes, and for anything else that can affect the “sustained throughput”.

