Date post: 12-Mar-2020
© 2013 IBM Corporation

Blue Gene Performance Data Repository

I-Hsin Chung

[email protected]

Introduction

• Goal
  – To characterize the applications on existing systems
  – To understand the system resource usage
    • To provide input for next-generation system design

• Objective
  – To collect performance data and store it in a relational database
    • With minimal overhead to the user and to job execution
  – Uniform storage format
    • To support queries and presentations
    • To make comparisons across applications or platforms

Application Performance data/trace

[Figure: Application → (instrumentation) → Job execution → (collection) → Performance data/trace]

Design consideration

• Low overhead
  – User may simply use the modified MPI compiler wrapper
  – A statically linked lightweight library is inserted
  – By default the performance tool generates limited data
    • Performance data from a fixed number of ranks
    • No trace data

• Usability
  – Performance data is stored in plain text files in SQL/CSV format
    • Policy compliance
    • Paired “undo” commands in separate files
  – To help users identify jobs later
    • Metadata is stored (e.g., user comments, tag)
  – To further assist users, user interface prototypes are developed to provide
    • Quick overview: web interface
    • Easy customization: spreadsheet
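
The idea of writing SQL to plain text with a paired "undo" file can be sketched as follows. This is only an illustration, not the tool's actual output format: the `job_id` and `app_name` columns and the `perfdb_input_job_…` file name are hypothetical, and the undo file name is merely modeled on the `perfdb_undo_job_xxx_y.sql` pattern shown in the slides.

```python
from pathlib import Path


def sql_literal(value):
    """Render a Python value as a SQL literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def write_paired_sql(out_dir, job_id, rows):
    """Write an input file and a matching undo file for one job.

    rows is a list of (table, {column: value}) pairs. Every table is assumed
    to carry a job_id column (hypothetical), so the undo file needs only one
    DELETE per table touched.
    """
    out_dir = Path(out_dir)
    inserts, tables = [], []
    for table, values in rows:
        cols = ", ".join(values)
        vals = ", ".join(sql_literal(v) for v in values.values())
        inserts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
        if table not in tables:
            tables.append(table)
    # Undo in reverse order so that later (dependent) tables are cleared first.
    undos = [f"DELETE FROM {t} WHERE job_id = {sql_literal(job_id)};"
             for t in reversed(tables)]
    input_path = out_dir / f"perfdb_input_job_{job_id}.sql"  # hypothetical name
    undo_path = out_dir / f"perfdb_undo_job_{job_id}.sql"
    input_path.write_text("\n".join(inserts) + "\n")
    undo_path.write_text("\n".join(undos) + "\n")
    return input_path, undo_path
```

Keeping the undo statements in their own file means a job's data can be removed later without parsing the input file again.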

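Performance Data Repository

[Figure: architecture on grotius. An instrumented binary is submitted to the Blue Gene compute nodes; DB2 on bgqsn2 is used for management (mgmt), and MySQL on bgqfen6 receives the performance data (perf. data).]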

How to use it?

• Link the application with the performance tool
  – Use the modified version of the MPI compiler wrapper
  – Link the profiler libraries (e.g., -lmpihpmperf or -lpomprofperf) manually
  – Supports MPI/hardware counter and OpenMP profilers

• Run the application

• Query the database after the job is done
  – SQL statements

Run the application

• (Optional) Set the performance database information
  – Example on grotius
    • setenv DBHOST 172.16.3.1 (IP address seen by the compute nodes)
    • setenv DBPORT 0 (default is 3306 for mysql)
    • setenv DBUSER root
    • setenv DBNAME bgresult

• (Optional) User comment/tag
  – DBNOTE (string), DBTAG (integer)

• (Optional) Control the performance data input to the database
  – To keep the performance data from being input into the database, set the environment variable DBINPUT to No:
    • setenv DBINPUT No (csh/tcsh)
    • export DBINPUT=No (sh/bash)
  – To undo the performance data input after the job has run
    • Identify the job ID (it should be printed when the job runs)
    • At the mysql command prompt:
      mysql> source perfdb_undo_job_xxx_y.sql
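
A minimal sketch of how a collector might read these environment variables is shown below. The variable names come from the slide; the function name, the returned dictionary, and the handling of defaults (treating DBPORT 0 or unset as the MySQL default port, and treating anything other than "No" as enabling input) are assumptions made for illustration.

```python
import os

DEFAULT_MYSQL_PORT = 3306  # default MySQL port, as noted on the slide


def read_perfdb_settings(env=None):
    """Collect the repository settings from environment variables."""
    env = os.environ if env is None else env
    port = int(env.get("DBPORT", "0"))
    tag = env.get("DBTAG")
    return {
        "host": env.get("DBHOST"),
        # Assumption: 0 or unset means "use the MySQL default".
        "port": port if port > 0 else DEFAULT_MYSQL_PORT,
        "user": env.get("DBUSER"),
        "name": env.get("DBNAME"),
        "note": env.get("DBNOTE", ""),
        "tag": int(tag) if tag else None,
        # Assumption: data is input unless DBINPUT is "No" (case-insensitive).
        "input_enabled": env.get("DBINPUT", "Yes").strip().lower() != "no",
    }
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check without touching the real environment.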

MySQL Relational Database

[Figure: schema diagram with the tables hpc_run, hpc_task, hpc_string, hpc_region, the hpc_mpi_ tables, the hpc_hpm_ tables, and the hpc_omp_ table, connected by arrows]

• Arrows represent the query order
• Starting with JobID
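
A query that starts from a job ID and walks from the run table to the per-task and MPI tables might look like the sketch below. It uses an in-memory SQLite database as a stand-in for the MySQL server, and everything beyond the table names `hpc_run` and `hpc_task` is hypothetical: the columns, the sample rows, and the table name `hpc_mpi_summary` (the slide only gives the `hpc_mpi_` prefix). The real schema and arrow directions may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.executescript("""
CREATE TABLE hpc_run  (job_id INTEGER PRIMARY KEY, app TEXT);
CREATE TABLE hpc_task (task_id INTEGER PRIMARY KEY, job_id INTEGER, mpi_rank INTEGER);
CREATE TABLE hpc_mpi_summary (task_id INTEGER, call_type TEXT, seconds REAL);
INSERT INTO hpc_run  VALUES (1001, 'bt');
INSERT INTO hpc_task VALUES (1, 1001, 0), (2, 1001, 1);
INSERT INTO hpc_mpi_summary VALUES
    (1, 'P2P', 0.5), (1, 'Collective', 1.5), (2, 'P2P', 0.25);
""")


def mpi_seconds_per_rank(conn, job_id):
    """Follow run -> task -> MPI table for one job and total the time per rank."""
    query = """
        SELECT t.mpi_rank, SUM(m.seconds)
        FROM hpc_run r
        JOIN hpc_task t ON t.job_id = r.job_id
        JOIN hpc_mpi_summary m ON m.task_id = t.task_id
        WHERE r.job_id = ?
        GROUP BY t.mpi_rank
        ORDER BY t.mpi_rank
    """
    return conn.execute(query, (job_id,)).fetchall()


print(mpi_seconds_per_rank(conn, 1001))
```

The job ID is passed as a query parameter rather than formatted into the SQL string, which is the usual practice with any database driver.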

Design – Excel Version

Client
• Platform:
  – Windows (Excel 2003)
  – Mac (Excel 2010)
  * Different version for each platform

• Environment setup:
  – ODBC driver (for mysql)

• Developed with:
  – Visual Basic for Applications (VBA)

• Event-driven
  – User clicks in the GUI trigger the database connection, data retrieval, chart presentation, etc.

[Figure: data flows from the database through the ODBC driver into Excel as a normal spreadsheet. On Mac, a format converter is used and the code is partially rewritten. The sample chart (y-axis 0 to 8) shows derived_GFlops, derived_ld_bytes_per_cycle, and derived_st_bytes_per_cycle for cm1, xhpl, bt, cg, ep, and ft.]

Design – Webpage Version

Client
• Platform:
  – IE / Firefox / Chrome / Safari (Mac)

• Developed with:
  – HTML (skeletal structure)
  – CSS (outer shell)
  – JavaScript + dojo (action and communication)
  – CGI with Perl (database connection with SQL, data management, and sending results back to the browser)

• Event-driven

• Security
  – Isolate the database from the client side

[Figure: the browser (JavaScript) sends filters over HTTP through a firewall to the CGI on the server, which queries the database and returns the filtered data as HTML. The sample chart (y-axis 0 to 8) shows derived_GFlops, derived_ld_bytes_per_cycle, and derived_st_bytes_per_cycle for cm1, xhpl, bt, cg, ep, and ft.]
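
Isolating the database from the client usually means the browser sends only filter values, and the server decides what SQL to run. The sketch below illustrates that idea in Python rather than the Perl CGI used by the prototype. The filter names, the column names, and the function itself are hypothetical; only the `hpc_run` table name comes from the slides.

```python
# Hypothetical mapping from browser-visible filter names to database columns.
ALLOWED_FILTERS = {"user": "user_name", "tag": "tag", "app": "app_name"}


def build_job_query(filters):
    """Turn browser-supplied filters into a parameterized SQL query.

    Only whitelisted filter names are accepted, and values are returned as
    parameters rather than spliced into the SQL text.
    """
    clauses, params = [], []
    for name, value in sorted(filters.items()):
        if name not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {name}")
        # %s is the placeholder style used by common MySQL drivers for Python.
        clauses.append(f"{ALLOWED_FILTERS[name]} = %s")
        params.append(value)
    sql = "SELECT job_id FROM hpc_run"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

With this split, database credentials and raw SQL stay on the server, and the client only ever receives filtered results.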

Tool (Step 1 – Jobs Selection)

[Screenshot with callouts:]
• User-defined DSN
• Save or load constraint set
• Select job from job list
• Selected job list or manual input
• Find jobs according to some constraints (none: no constraint), e.g., user, time, environment variables
• DBTAG, DBNOTE

Tool (Step 2 – Component/Issue Selection)

[Screenshot with callouts:]
• Presentation scope
• Component
• Issue
• Selected component/issue list (Add, Del)
• Histogram

Tool (Step 3 – Chart)

[Screenshot with callouts:]
• Change the y-axis to log/linear scale
• Check raw data
• Change the type of chart (e.g., bar, column…)
• Change the order of the chart series
• Show error range (for avg)
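
The two numeric chart options, an error range on averaged values and a log-scale y-axis, can be illustrated with a short sketch. How the tool actually defines the error range is not stated in the slides; the sample standard deviation used here is an assumption, and both function names are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for one metric across ranks."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, spread


def to_log_scale(values):
    """Map positive values to log10 for a log-scale y-axis; others become None."""
    return [math.log10(v) if v > 0 else None for v in values]
```

A chart would then draw the mean as the bar height and the spread as the error bar, and non-positive values would simply be left out of a log-scale plot.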

Rosetta - CPU

[Figure: chart (0% to 100%) for Rank 0 and Rank n. Series: Committed Instructions, Committed AXU uCode sub-ops, Committed XU uCode sub-ops, Flushed Instructions and Operation Cycles, FXU Dep Stalls, AXU Dep Stalls, Thread Arbitration Stalls, IU empty]

Rosetta – instruction & memory

                 Rank 0       Rank n
DDR bandwidth    0.001        0.005
Heap Usage       330100736    666894336
Stack Usage      20351        20351
Gflops           0            1.813
IPC              0.3869       0.2072

[Figure: chart (0% to 100%) for Rank 0 and Rank n. Series: integer ratio, float ratio, SIMD, non-SIMD, L1, L1P, L2, DDR]

Rosetta – MPI comm.

MPI Comm, Rank 0
              Call count   Data size   Time
P2P           217          868         700.658
Collective    7            84          0.034
total comm                             700.692
total time                             711.03

MPI Comm, Rank n
              Call count   Data size   Time
P2P           7            28          0.022
Collective    7            84          0.004
total comm                             0.026
total time                             711.03

Nektar - CPU

[Figure: chart (0% to 100%) for Rank 0 and Rank n. Series: Committed Instructions, Committed AXU uCode sub-ops, Committed XU uCode sub-ops, Flushed Instructions and Operation Cycles, FXU Dep Stalls, AXU Dep Stalls, Thread Arbitration Stalls, IU empty]

Nektar – instruction & memory

                 Rank 0       Rank n
Gflops           2.046        1.813
IPC              0.2029       0.2072
DDR bandwidth    2.152        2.055
Heap Usage       888291328    286769152
Stack Usage      20511        21983

[Figure: chart (0% to 100%) for Rank 0 and Rank n. Series: integer ratio, float ratio, SIMD, non-SIMD, L1, L1P, L2, DDR]

Nektar – MPI comm.

MPI Comm, Rank 0
              Call count   Data size      Time
P2P           157605       34012073.2     0.419
Collective    29680        47930094       3.773
total comm                                5.571
total time                                45.621

MPI Comm, Rank n
              Call count   Data size      Time
P2P           49833        15615488.70    0.247
Collective    29680        40885632.30    10.372
total comm                                10.908
total time                                45.552
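
One derived figure that makes the MPI tables easier to compare is the share of total time spent in communication. The short sketch below computes it from the "total comm" and "total time" values reported for Rosetta and Nektar; the function name and the dictionary layout are illustrative choices, not part of the repository.

```python
def comm_fraction(total_comm, total_time):
    """Fraction of total time spent in MPI communication."""
    return total_comm / total_time


# (total comm, total time) pairs copied from the Rosetta and Nektar MPI tables
measurements = {
    ("Rosetta", "Rank 0"): (700.692, 711.03),
    ("Rosetta", "Rank n"): (0.026, 711.03),
    ("Nektar", "Rank 0"): (5.571, 45.621),
    ("Nektar", "Rank n"): (10.908, 45.552),
}

for (app, rank), (comm, total) in measurements.items():
    print(f"{app} {rank}: {comm_fraction(comm, total):.1%} of time in MPI")
```

For these numbers, Rosetta rank 0 comes out at roughly 98.5% of its time in MPI while rank n is close to zero, and the two Nektar ranks come out at roughly 12.2% and 23.9%.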

Simulation

[Figure: the application performance data/trace and a hardware specification feed the simulation method, which produces a performance projection]

Simulation NPB D 64

[Figure: chart (0% to 400%) for the benchmarks bt, cg, ep, ft, is, lu, mg, and sp. Series: sim, bw/4, bw*4, comp/3.5, comp*3.5]
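
The series labels suggest what-if scenarios in which bandwidth or compute capability is scaled up or down. The slides do not describe the simulation method, so the following is only a toy first-order model to show what such scaling means, not the approach used for the chart: it assumes compute time shrinks in proportion to compute speed and communication time in proportion to bandwidth, with no overlap between the two. All names are hypothetical.

```python
def project_time(compute_time, comm_time, comp_scale=1.0, bw_scale=1.0):
    """Toy projection: compute scales with compute speed, comm with bandwidth."""
    return compute_time / comp_scale + comm_time / bw_scale


def relative_to_baseline(compute_time, comm_time, **scales):
    """Projected time as a fraction of the unscaled time."""
    baseline = project_time(compute_time, comm_time)
    return project_time(compute_time, comm_time, **scales) / baseline
```

For example, `relative_to_baseline(30.0, 10.0, bw_scale=4)` models a "bw*4" scenario for a job that spends 30 time units computing and 10 communicating, and `bw_scale=0.25` models "bw/4".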

Conclusion

• Blue Gene Performance Data Repository
  – Automatically collects performance information from different aspects of application executions
  – Stores it in a relational database for subsequent analysis
  – Imposes minimal overhead on users

• The performance data collected
  – Helps users understand application performance behavior and system hardware usage
  – Helps software/hardware co-design for next-generation systems

• The Blue Gene Performance Data Repository is available to Blue Gene/Q users under an open source license
