
Managed by UT-Battelle for the U. S. Department of Energy

Oak Ridge National Laboratory Computing and Computational Sciences

Computer Science and Mathematics

Power Measurement for High Performance Computing: State of the Art

July 28, 2011

Chung-Hsing Hsu, Steve Poole

PMP 2011


Observations

•  Power utilization is a major concern.

•  You cannot improve if you cannot measure.

•  You cannot measure if you cannot monitor.

•  More power-monitoring devices are available.

•  The measurement methods are being standardized.

•  Facility & IT people will work together.


Why Work Together

•  The gap between common idling assumptions and real-world conditions [Brill 2010]:
   –  Assumption: power draw tracks IT work (network traffic).
   –  Reality: IT work varies significantly, but power draw at the UPS stays flat.

•  The limits of power modeling based on extrapolation [Hackenberg 2010, Davis 2011]:
   –  Reality: power draw varies from node to node.
   –  Reality: the method is valid only for load-balanced IT work.
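The extrapolation point can be made concrete with a small calculation. The naive estimate of machine power is the mean of a few metered nodes multiplied by the node count. This minimal sketch compares that estimate with the true total for a job that is not load-balanced. The function names and the per-node readings are hypothetical illustrations, not measured data.

```python
from statistics import mean

def extrapolate_total(sampled_watts, node_count):
    """Estimate machine power by scaling the mean of a few metered nodes.

    This is the extrapolation approach; it implicitly assumes every node
    draws roughly the same power, i.e. the IT work is load-balanced.
    """
    if not sampled_watts:
        raise ValueError("need at least one sampled node")
    return mean(sampled_watts) * node_count

def relative_error(estimate, actual):
    """Signed relative error of an estimate against the actual value."""
    return (estimate - actual) / actual

# Hypothetical per-node readings (watts) for an 8-node job whose work is
# not load-balanced: half the nodes are busy, half are nearly idle.
node_watts = [300.0, 300.0, 300.0, 300.0, 150.0, 150.0, 150.0, 150.0]
actual_total = sum(node_watts)

# Metering only the first two nodes (both busy) overestimates the total.
estimate = extrapolate_total(node_watts[:2], len(node_watts))
print(f"estimate={estimate:.0f} W, actual={actual_total:.0f} W, "
      f"error={relative_error(estimate, actual_total):+.1%}")
```

With balanced work every node reads about the same, so any sample extrapolates well. The error here comes entirely from inter-node variability.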


This Survey

•  Power measurement methods from the power system to the IT system:
   –  Classified by measurement domain.
   –  Hierarchical domain structure.
   –  Simultaneous measurements.

•  Hardware-based methods:
   –  Stand-alone “meters” and integrated “sensors”.
   –  Drivers and interfaces.

•  The real challenge is in the real-time analysis of massive sensor data.
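Real-time analysis at scale rules out storing every sample before summarizing it. A common technique, which we chose here and which the slides do not prescribe, is to keep constant-memory running statistics per sensor. Welford's online algorithm does this for the mean and variance. The class names, sensor IDs and readings below are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class RunningStats:
    """Constant-memory summary of one sensor's stream (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations from the mean
    peak: float = -math.inf  # highest reading seen so far

    def update(self, watts: float) -> None:
        self.count += 1
        delta = watts - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (watts - self.mean)
        self.peak = max(self.peak, watts)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least two samples arrive.
        return self.m2 / self.count if self.count > 1 else 0.0

@dataclass
class SensorAggregator:
    """Keeps one RunningStats per sensor ID; raw samples are never stored."""
    stats: dict = field(default_factory=dict)

    def ingest(self, sensor_id: str, watts: float) -> None:
        self.stats.setdefault(sensor_id, RunningStats()).update(watts)

# Hypothetical interleaved readings from two node sensors.
agg = SensorAggregator()
for sensor_id, watts in [("node-0", 210.0), ("node-1", 190.0),
                         ("node-0", 230.0), ("node-1", 170.0)]:
    agg.ingest(sensor_id, watts)

s = agg.stats["node-0"]
print(s.count, s.mean, s.peak, s.variance)  # 2 220.0 230.0 100.0
```

Memory grows with the number of sensors, not the number of samples. That is the property a streaming design needs when sensor counts are large.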


Hierarchical Measurement Domains


Simultaneous Measurements

•  A method for PUE calculation:
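PUE (power usage effectiveness) is total facility power divided by IT equipment power. It is only meaningful when both quantities describe the same moment. This minimal sketch makes the following assumptions:

- Each meter reports timestamped readings in kW.
- "Simultaneous" means identical timestamps. A real deployment would need a tolerance window.

Readings without a partner are dropped. The function names and readings are hypothetical.

```python
def paired_samples(facility_kw, it_kw):
    """Return (facility, it) pairs for timestamps both meters reported.

    Both arguments map a timestamp (seconds) to a reading in kW. Readings
    without a simultaneous partner are dropped rather than interpolated.
    """
    common = sorted(facility_kw.keys() & it_kw.keys())
    return [(facility_kw[t], it_kw[t]) for t in common]

def instantaneous_pue(facility_kw, it_kw):
    """PUE at each shared timestamp: total facility power / IT power."""
    return [f / i for f, i in paired_samples(facility_kw, it_kw)]

def aggregate_pue(facility_kw, it_kw):
    """Ratio of summed readings; equals the energy ratio if samples are evenly spaced."""
    pairs = paired_samples(facility_kw, it_kw)
    if not pairs:
        raise ValueError("no simultaneous samples")
    return sum(f for f, _ in pairs) / sum(i for _, i in pairs)

# Hypothetical readings: the facility meter has nothing at t=120 and the
# IT meter has nothing at t=180, so only t=0 and t=60 are paired.
facility = {0: 1500.0, 60: 1560.0, 180: 1530.0}
it = {0: 1000.0, 60: 1200.0, 120: 1100.0}
print(instantaneous_pue(facility, it))   # [1.5, 1.3]
print(aggregate_pue(facility, it))
```

Pairing first and dividing second avoids dividing a facility reading by an IT reading taken at a different time. Unsynchronized meters would otherwise introduce exactly that error.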


Seven Measurement Domains

•  A site.

•  An HPC facility.

•  An HPC machine.

•  A cabinet (or a server rack).

•  A compute node (or a server).

•  A multi-core processor.

•  A processor core.
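The seven domains nest, so a reading at one level should cover the readings of the levels below it. This minimal sketch assumes one meter per domain and simultaneous readings. The difference between a parent's reading and the sum of its children's readings is power lost in conversion or drawn by loads that no lower-level meter sees. The class, domain names and wattages are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Domain:
    name: str
    watts: float                                  # simultaneous reading at this level
    children: list = field(default_factory=list)  # nested sub-domains

    def unaccounted(self) -> float:
        """Power seen here but not in any child (losses, unmetered loads)."""
        if not self.children:
            return 0.0
        return self.watts - sum(c.watts for c in self.children)

    def walk(self, depth=0):
        """Yield (depth, domain) pairs, parent before children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

# Hypothetical cabinet with two compute nodes, each with one metered processor.
cabinet = Domain("cabinet", 1000.0, [
    Domain("node-0", 480.0, [Domain("cpu-0", 300.0)]),
    Domain("node-1", 470.0, [Domain("cpu-0", 290.0)]),
])

for depth, d in cabinet.walk():
    print("  " * depth + f"{d.name}: {d.watts:.0f} W, unaccounted {d.unaccounted():.0f} W")
```

At the node level, the residual also includes memory, disks and fans, which this toy tree does not meter separately.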


Domain 1: A Site

•  ORNL Primary 161-kV substation and associated lines:

Source: TVA, Environmental Assessment 2005.


Domain 2: An HPC Facility

•  100 of 567 Eaton PowerNet meters for CSB:

Source: J. Rogers, CUG 2009.


Domain 2: An HPC Facility

•  Other options: wireless monitoring.

Source: 42U.com.


Domain 3: An HPC Machine

•  The Jaguar Cray XT5:

Source: J. Rogers, CUG 2009.


Domain 3: An HPC Machine

•  Three meters for monitoring the Jaguar XT5.

Source: Wenning et al., FEMP 2010.


Domain 3: An HPC Machine

•  Monitoring a 26-hour HPL (High-Performance Linpack) run:

Source: J. Rogers, CUG 2009.
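Monitoring a long run such as this HPL run yields a power time series, and the run's energy is the integral of that series over time. This minimal sketch uses the trapezoidal rule and assumes timestamped readings in kW. The helper names and the sample series are hypothetical; they are not data from the Jaguar run.

```python
def energy_kwh(samples):
    """Integrate (time_s, power_kw) samples with the trapezoidal rule.

    Samples must be sorted by time; gaps between samples are bridged
    linearly, so long gaps make the estimate less reliable.
    """
    total_kj = 0.0  # kW * s = kJ
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        total_kj += 0.5 * (p0 + p1) * (t1 - t0)
    return total_kj / 3600.0  # 3600 kJ = 1 kWh

def average_kw(samples):
    """Time-weighted average power over the sampled interval."""
    duration_h = (samples[-1][0] - samples[0][0]) / 3600.0
    return energy_kwh(samples) / duration_h

# Hypothetical hourly readings (kW) covering a 3-hour window.
run = [(0, 6000.0), (3600, 7000.0), (7200, 7000.0), (10800, 6000.0)]
print(f"{energy_kwh(run):.0f} kWh, average {average_kw(run):.0f} kW")
```

The time-weighted average differs from a plain mean of the readings whenever samples are unevenly spaced. The plain mean over-weights periods that happen to be sampled densely.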


Domain 4: A Cabinet

•  Aggregate over all cabinet measurements:

Source: Server Technology website.
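Aggregating cabinet measurements into machine-level power means summing readings that belong to the same moment, even though cabinet meters rarely report at exactly the same instant. This minimal sketch rests on three hypothetical choices:

- Readings are bucketed into fixed intervals.
- Each cabinet's readings are averaged within a bucket, and the cabinet averages are summed.
- Buckets in which any cabinet is missing are skipped.

The function, cabinet IDs and readings are also hypothetical.

```python
from collections import defaultdict

def machine_power(readings, interval_s, cabinet_count):
    """Sum per-cabinet power into machine-level power per time bucket.

    readings: iterable of (cabinet_id, time_s, watts).
    Each cabinet's readings are averaged within a bucket of interval_s
    seconds, then cabinets are summed. Buckets missing any cabinet are
    skipped, since a partial sum would understate machine power.
    """
    per_bucket = defaultdict(lambda: defaultdict(list))
    for cabinet, t, watts in readings:
        per_bucket[int(t // interval_s)][cabinet].append(watts)

    totals = {}
    for bucket in sorted(per_bucket):
        cabinets = per_bucket[bucket]
        if len(cabinets) < cabinet_count:
            continue
        totals[bucket * interval_s] = sum(
            sum(ws) / len(ws) for ws in cabinets.values())
    return totals

# Hypothetical readings from 3 cabinets with slightly jittered timestamps.
readings = [
    ("c0", 0.2, 30000.0), ("c1", 0.9, 31000.0), ("c2", 1.4, 29000.0),
    ("c0", 10.1, 32000.0), ("c1", 10.3, 31000.0),   # c2 missed this bucket
    ("c0", 20.0, 30000.0), ("c0", 25.0, 34000.0),
    ("c1", 21.0, 30000.0), ("c2", 22.0, 30000.0),
]
print(machine_power(readings, interval_s=10, cabinet_count=3))
# {0: 90000.0, 20: 92000.0}
```

Skipping incomplete buckets trades coverage for correctness. An alternative would be to carry a cabinet's last reading forward, at the cost of some staleness.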


Domain 5: A Compute Node

•  Inline meters are the most popular, but IPMI-compliant servers are gaining attention:

Watts Up? PRO

Yokogawa WT210
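On IPMI-compliant servers that support DCMI, the open-source ipmitool utility can report node power with `ipmitool dcmi power reading`. This sketch only parses the text that command prints. The sample text reflects the output format as we understand it. Field labels may differ across ipmitool versions and vendors, so treat them as an assumption to check against your own systems. The function name `parse_dcmi_power` is hypothetical.

```python
import re

# Matches lines such as "Instantaneous power reading:   220 Watts".
_WATTS_LINE = re.compile(r"^\s*(?P<label>[^:]+?):\s+(?P<value>\d+)\s+Watts\s*$")

def parse_dcmi_power(text):
    """Map each '<label>: <n> Watts' line in the output to an integer wattage."""
    readings = {}
    for line in text.splitlines():
        match = _WATTS_LINE.match(line)
        if match:
            readings[match.group("label")] = int(match.group("value"))
    return readings

# Assumed sample output; in practice the text would come from running the
# command, e.g. with subprocess.run(..., capture_output=True, text=True).
sample = """
    Instantaneous power reading:                   220 Watts
    Minimum during sampling period:                180 Watts
    Maximum during sampling period:                260 Watts
    Average power reading over sample period:      215 Watts
    IPMI timestamp:                           Thu Jan  1 00:00:00 1970
    Sampling period:                          00000001 Seconds.
    Power reading state is:                   activated
"""
power = parse_dcmi_power(sample)
print(power["Instantaneous power reading"])  # 220
```

Lines that do not end in a wattage, such as the timestamp and the sampling period, are ignored rather than raising errors. Vendor-specific extra lines then do not break the parser.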


Domain 6: A Processor

PowerPack [Ge 2005]: Insert resistors into the PSU wires

PowerMon2 [Bedard 2010]: Plug into the PSU

Cray XT3/4/5 [Laros 2009]: Interrogate the VRMs from L0
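Resistor-based methods such as PowerPack place a small sense (shunt) resistor in series with a supply wire and derive power from the voltage drop across it. This minimal sketch shows the arithmetic, based on Ohm's law. It assumes the load-side voltage and the drop are measured simultaneously. The function names, resistor value and readings are hypothetical; they are not PowerPack's actual parameters.

```python
def rail_power_watts(load_volts, shunt_drop_volts, shunt_ohms):
    """Power delivered on one supply wire, measured with a series shunt.

    Current follows from Ohm's law across the shunt (I = V_drop / R);
    power is that current times the voltage at the load side of the shunt.
    """
    if shunt_ohms <= 0:
        raise ValueError("shunt resistance must be positive")
    current_amps = shunt_drop_volts / shunt_ohms
    return load_volts * current_amps

def component_power(rails):
    """Sum the wires feeding one component; each rail is (volts, drop, ohms)."""
    return sum(rail_power_watts(*r) for r in rails)

# Hypothetical: two 12 V wires, each through a 0.01-ohm shunt.
rails = [(12.0, 0.05, 0.01), (12.0, 0.04, 0.01)]
print(f"{component_power(rails):.1f} W")  # 108.0 W
```

A small shunt keeps the voltage drop, and thus the disturbance to the supply, low. The cost is that the voltage signal is small and must be measured precisely.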


Domain 7: A Processor Core

•  The issue of accounting:
   –  Shared and distributed.

Source: M. Schuette, LostCircuits 2008

Intel Core i7 CPU
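Core-level accounting is hard because part of a multi-core processor's power is shared, rather than belonging to any one core. This sketch shows one possible attribution policy. It is our illustrative assumption, not a method from the slides, and it works in two steps:

- The package's idle draw is treated as shared and split evenly across cores.
- The remaining dynamic power is split in proportion to each core's utilization.

The function name, wattages and utilizations are hypothetical.

```python
def attribute_package_power(package_watts, idle_watts, core_utilization):
    """Split one processor-package reading into per-core shares.

    Illustrative policy (an assumption, not a standard):
      * idle_watts, the package draw when all cores are idle, is treated
        as shared and split evenly across cores;
      * the remaining dynamic power is split in proportion to utilization.
    Returns a list whose sum equals package_watts.
    """
    n = len(core_utilization)
    if n == 0:
        raise ValueError("need at least one core")
    shared_share = idle_watts / n
    dynamic = package_watts - idle_watts
    busy = sum(core_utilization)
    if busy == 0:
        # No activity to weight by: spread the dynamic part evenly too.
        return [shared_share + dynamic / n] * n
    return [shared_share + dynamic * u / busy for u in core_utilization]

# Hypothetical 4-core package: 95 W measured, 35 W when idle.
shares = attribute_package_power(95.0, 35.0, [1.0, 0.5, 0.5, 0.0])
print(shares)  # [38.75, 23.75, 23.75, 8.75]
```

Other policies, such as charging shared power only to active cores, are equally defensible. The choice is exactly the accounting question the slide raises.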


To Sum Up

•  Today we have more options for using integrated sensors in each domain. The measurement process is also slowly being standardized.

•  The real challenge is in the real-time analysis of massive sensor data:
   –  The challenge of scale.
   –  The challenge of accounting:
      •  For end users.
      •  For power loss.


The Challenge of Accounting

•  Accounting for power loss:

Source: P. Scheihing, DOE EERE
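Accounting for power loss means tracking how much power each conversion stage drops between the supply and the components that do useful work. This minimal sketch makes a simplifying assumption: each stage is characterized by a single efficiency, although real efficiencies vary with load. The stage names and efficiencies are hypothetical placeholders, not values from the DOE source, and the function name is ours.

```python
def delivered_power(input_watts, stages):
    """Walk a chain of (name, efficiency) stages and report per-stage loss.

    Returns (power_delivered, [(stage_name, watts_lost), ...]).
    """
    losses = []
    power = input_watts
    for name, efficiency in stages:
        if not 0.0 < efficiency <= 1.0:
            raise ValueError(f"{name}: efficiency must be in (0, 1]")
        out = power * efficiency
        losses.append((name, power - out))
        power = out
    return power, losses

# Hypothetical conversion chain with placeholder efficiencies.
chain = [("transformer", 0.98), ("UPS", 0.90), ("PDU", 0.98),
         ("PSU", 0.85), ("VRM", 0.85)]
delivered, losses = delivered_power(1000.0, chain)
for name, lost in losses:
    print(f"{name:12s} lost {lost:6.1f} W")
print(f"delivered {delivered:.1f} W of 1000.0 W")
```

Because losses compound multiplicatively, a modest inefficiency early in the chain also shrinks the power available to every later stage. This is why per-stage accounting, rather than a single end-to-end figure, shows where to improve.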


Thank You!

Acknowledgments:

This work was supported by the United States Department of Defense and used resources of the Extreme Scale Systems Center at Oak Ridge National Laboratory. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725.

