Managed by UT-Battelle for the U. S. Department of Energy
Oak Ridge National Laboratory Computing and Computational Sciences
Computer Science and Mathematics
Power Measurement for High Performance Computing:
State of the Art
July 28, 2011
Chung-Hsing Hsu Steve Poole
PMP 2011
• Power utilization is a major concern.
• You cannot improve if you cannot measure.
• You cannot measure if you cannot monitor.
• More power-monitoring devices are available.
• The measurement methods are standardized.
• Facility & IT people will work together.
Observations
• The gap between common idling assumptions and real-world conditions [Brill 2010]:
  – Assume: power draw tracks IT work (network traffic).
  – Reality: significant variation in IT work but flat power draw at the UPS.
• The limits of power modeling based on extrapolation [Hackenberg 2010, Davis 2011]:
  – Reality: inter-node variability in power draw.
  – Reality: the method is valid only for load-balanced IT work.
Why Work Together
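The extrapolation limit can be illustrated with a minimal sketch. The per-node wattages are made-up values for illustration, not measurements from the cited studies, and `extrapolate_total` is a hypothetical helper name.

```python
def extrapolate_total(sample_watts, node_count):
    """Estimate machine power by scaling the mean of a measured node sample."""
    return sum(sample_watts) / len(sample_watts) * node_count


# Hypothetical per-node power draws in watts (illustrative only).
node_watts = [210.0, 250.0, 190.0, 270.0, 230.0, 200.0, 260.0, 240.0]
actual_total = sum(node_watts)

# Extrapolate from a single node versus from a four-node sample.
from_one = extrapolate_total(node_watts[:1], len(node_watts))
from_four = extrapolate_total(node_watts[:4], len(node_watts))

# Relative error of each estimate against the fully measured total.
error_one = abs(from_one - actual_total) / actual_total
error_four = abs(from_four - actual_total) / actual_total

print(f"1-node estimate error: {error_one:.1%}")
print(f"4-node estimate error: {error_four:.1%}")
```

With inter-node variability, the single-node estimate is off by more than the four-node one in this toy data. An unbalanced workload would widen the gap further.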
• Power measurement methods from the power system to the IT system:
  – Classified based on measurement domains.
  – Hierarchical domain structure.
  – Simultaneous measurements.
• Hardware-based methods:
  – Stand-alone “meters” and integrated “sensors”.
  – Drivers and interfaces.
• The real challenge is in the real-time analysis of massive sensor data.
This Survey
• A method for PUE calculation:
Simultaneous Measurements
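As a minimal sketch of why simultaneity matters, the code below uses the standard definition PUE = total facility power / IT equipment power and pairs only readings that share a timestamp. The function names and sample values are assumptions made for illustration, not the method shown on the original slide.

```python
def pue(facility_kw, it_kw):
    """Power usage effectiveness: total facility power over IT equipment power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return facility_kw / it_kw


def pue_series(facility_samples, it_samples):
    """Compute PUE only at timestamps measured in both domains.

    Each argument is a list of (timestamp_seconds, kilowatts) pairs.
    """
    it_by_time = dict(it_samples)
    return {
        t: pue(facility_kw, it_by_time[t])
        for t, facility_kw in facility_samples
        if t in it_by_time
    }


# Hypothetical readings; the facility meter has one sample the IT meter lacks.
facility = [(0, 1500.0), (60, 1600.0), (120, 1550.0)]
it_load = [(0, 1000.0), (60, 1000.0)]

result = pue_series(facility, it_load)
print(result)
```

The unmatched reading at 120 s is dropped rather than paired with a stale IT value, which is the point of measuring both domains simultaneously.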
• A site.
• An HPC facility.
• An HPC machine.
• A cabinet (or a server rack).
• A compute node (or a server).
• A multi-core processor.
• A processor core.
Seven Measurement Domains
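One possible way to encode the hierarchy in analysis code is an ordered enumeration, numbered as in the domain slides that follow. The identifier names and the `encloses` helper are assumptions chosen for illustration.

```python
from enum import IntEnum


class Domain(IntEnum):
    """The seven measurement domains, ordered from outermost to innermost."""
    SITE = 1
    HPC_FACILITY = 2
    HPC_MACHINE = 3
    CABINET = 4
    COMPUTE_NODE = 5
    PROCESSOR = 6
    PROCESSOR_CORE = 7


def encloses(outer, inner):
    """True when `outer` sits above `inner` in the hierarchy."""
    return outer < inner


for domain in Domain:
    print(domain.value, domain.name)
```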
• ORNL Primary 161-kV substation and associated lines:
Domain 1: A Site
Source: TVA, Environmental Assessment 2005.
• 100 of 567 Eaton PowerNet meters for CSB:
Domain 2: An HPC Facility
Source: J. Rogers, CUG 2009.
• Other options: wireless monitoring.
Domain 2: An HPC Facility
Source: 42U.com.
• The Jaguar Cray XT5:
Domain 3: An HPC Machine
Source: J. Rogers, CUG 2009.
• 3 meters for monitoring Jaguar XT5.
Domain 3: An HPC Machine
Source: Wenning et al., FEMP 2010.
• Monitor a 26-hour HPL run:
Domain 3: An HPC Machine
Source: J. Rogers, CUG 2009.
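Monitoring a long run yields a time series of power samples. A common way to turn it into energy is trapezoidal integration, sketched below. The three samples are hypothetical placeholders, not the measured Jaguar data, and `energy_kwh` is an illustrative name.

```python
def energy_kwh(samples):
    """Integrate (hours, kilowatts) samples with the trapezoidal rule.

    `samples` must be sorted by time; the result is in kilowatt-hours.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total


# Hypothetical power trace over a 26-hour run (illustrative values only).
trace = [(0.0, 4000.0), (13.0, 6000.0), (26.0, 6000.0)]
print(energy_kwh(trace), "kWh")
```

A real trace would have far more samples. The sampling interval bounds how well short power excursions are captured.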
• Aggregate over all cabinet measurements:
Domain 4: A Cabinet
Source: Server Technology website.
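A sketch of aggregating cabinet-level readings into a machine-level figure follows. The cabinet identifiers and kilowatt values are invented for illustration. Reporting the cabinets with missing readings is a design choice assumed here, so that an incomplete sum is not mistaken for the machine total.

```python
def aggregate_cabinets(readings_kw):
    """Sum per-cabinet power and list the cabinets that reported no value.

    `readings_kw` maps a cabinet id to kilowatts, or to None when missing.
    """
    valid = {cab: kw for cab, kw in readings_kw.items() if kw is not None}
    missing = sorted(cab for cab in readings_kw if cab not in valid)
    return sum(valid.values()), missing


# Hypothetical readings; one cabinet failed to report.
readings = {"c0-0": 30.0, "c1-0": 32.5, "c2-0": None, "c3-0": 31.5}
total_kw, missing = aggregate_cabinets(readings)
print(total_kw, missing)
```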
• Inline meters are the most popular, but IPMI-compliant servers are gaining attention:
Domain 5: A Compute Node
Watts Up? PRO
Yokogawa WT210
Domain 6: A Processor
PowerPack [Ge 2005]: Insert resistors for the PSU wires
PowerMon2 [Bedard 2010]: Plug into the PSU
Cray XT3/4/5 [Laros 2009]: Interrogate the VRMs from L0
L0 VRM
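The resistor-based approach rests on Ohm's law: the voltage drop across a known sense resistor gives the current, and current times rail voltage gives power. The sketch below shows only that arithmetic. The rail voltage, drop, and resistance are assumed example values, not PowerPack's actual configuration.

```python
def rail_power_watts(rail_volts, drop_volts, sense_ohms):
    """Power on a DC rail, inferred from the drop across a sense resistor."""
    if sense_ohms <= 0:
        raise ValueError("sense resistance must be positive")
    current_amps = drop_volts / sense_ohms  # Ohm's law: I = V / R
    return rail_volts * current_amps        # P = V * I


# Hypothetical 12 V rail with a 50 mV drop across a 10 milliohm resistor.
power = rail_power_watts(12.0, 0.05, 0.01)
print(f"{power:.1f} W")
```

Summing this over every rail that feeds the processor gives a processor-level estimate.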
• The issue of accounting:
  – Shared and distributed.
Domain 7: A Processor Core
Source: M. Schuette, LostCircuits 2008
Intel Core i7 CPU
• Today we have more options to use integrated sensors for each domain. The measurement process is also slowly being standardized.
• The real challenge is in the real-time analysis of massive sensor data:
  – The challenge of scale.
  – The challenge of accounting:
    • For end users
    • For power loss
To Sum Up
• Accounting for power loss:
The Challenge of Accounting
Source: P. Scheihing, DOE EERE
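One simple way to account for power loss is to model the delivery path as a chain of stages, each with an efficiency, and record what each stage dissipates. The stage names and efficiencies below are assumptions made for illustration and are not taken from the cited DOE EERE figure.

```python
def delivered_power(input_kw, stages):
    """Walk a chain of (name, efficiency) stages and tally per-stage loss.

    Returns the power reaching the load and a list of (name, loss_kw) pairs.
    """
    power = input_kw
    losses = []
    for name, efficiency in stages:
        output = power * efficiency
        losses.append((name, power - output))
        power = output
    return power, losses


# Hypothetical delivery chain (illustrative efficiencies only).
chain = [("transformer", 0.98), ("UPS", 0.92), ("PDU", 0.97), ("PSU", 0.90)]
load_kw, losses = delivered_power(1000.0, chain)

for name, loss_kw in losses:
    print(f"{name}: {loss_kw:.1f} kW lost")
print(f"delivered: {load_kw:.1f} kW")
```

Delivered power plus the per-stage losses adds back up to the input, which gives a basic consistency check when the stages are measured simultaneously.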
Thank You! Acknowledgments:
This work was supported by the United States Department of Defense and used resources of the Extreme Scale Systems Center at Oak Ridge National Laboratory. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725.