
Characterization of Power and Performance in Data...

Date post: 22-May-2020
Characterization of Power and Performance in Data-Intensive Applications using MapReduce over MPI

Michela Taufer 1,7

With the contributions of: Tao Gao 1,2, Yanfei Guo 3, Boyu Zhang 1, Pietro Cicotti 4, Yutong Lu 2,5,6, Josh Davis 1, Anthony Danalis 7, Heike Jagode 7, Jack Dongarra 7, Sunita Chandrasekaran 1, Pavan Balaji 3

1 University of Delaware
2 National University of Defense Technology
3 Argonne National Laboratory
4 San Diego Supercomputer Center
5 National Supercomputer Center in Guangzhou
6 Sun Yat-sen University
7 University of Tennessee Knoxville
Transcript
Page 1: Characterization of Power and Performance in Data-Intensive Applications using MapReduce over MPI (icl.utk.edu/jlesc9/files/STM2.2/jlesc9_taufer.pdf)


Page 2: Data Processing on the Cloud

• Variations of the MapReduce programming model can handle the parallel job executions, communications, and data movements in data-intensive applications

• Users provide limited components, e.g., map and reduce functions

Wordcount example:

[Figure: two inputs, each "Hello World" → map → <Hello,1> <World,1> → Shuffle → reduce → <Hello,2> <World,2>]
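The wordcount flow on this slide can be sketched in a few lines of plain Python. This is an illustrative single-process sketch, not Mimir or Hadoop code; the function names `map_words`, `shuffle`, `reduce_counts`, and `wordcount` are assumptions introduced here.

```python
from collections import defaultdict


def map_words(text):
    """Map: emit one <word, 1> pair per word in an input split."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def wordcount(splits):
    """Run map over every split, shuffle the pairs, then reduce per key."""
    pairs = [kv for split in splits for kv in map_words(split)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Same input as the slide: two splits, each containing "Hello World".
counts = wordcount(["Hello World", "Hello World"])
print(counts)
```

Only `map_words` and `reduce_counts` are application-specific here; the shuffle is generic, which matches the point that users provide limited components.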

Page 3: Data Processing: HPC vs. Cloud

• Key differences between Cloud computing and HPC systems discourage the naïve use of Cloud methods

[Figure: HPC systems (processor, interconnect, disk array, MPI/OpenMP, OS, storage) vs. Cloud computing systems (processor with disk, Ethernet, Hadoop/Spark, OS, storage)]

Page 4: Is MapReduce over MPI an effective way to handle big data processing on HPC systems?

Page 5: Mimir

• Mimir - a MapReduce over MPI framework

§ Integrate optimizations: combiner, dynamic repartition, split super-keys

§ Handle large data in-memory with better scaling compared with MR-MPI

• Open-source software: https://github.com/TauferLab/Mimir.git

• Papers: Tao et al., IPDPS 2017; Tao et al., ICPADS 2018

[Figure panels: WordCount on Tianhe-2; Octree Clustering on Tianhe-2; Join on Tianhe-2]

Page 6: Power Usage and Data Management

• Run WordCount miniapps on a 4B word dataset and on KML

• Use PAPI and power capping to measure

§ Total power

§ DRAM power

• Consider different features

§ Steps of the workflow

§ Optimizations (e.g., combiners)

§ Different types of words in the dataset

§ Different settings of Mimir

[Figure labels: Energy Usage, Runtime, Power cap, Execution Progress, Total power, DRAM power]
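The figure labels relate power traces to energy usage. A minimal sketch of that relationship, assuming power is sampled at known timestamps, integrates power over time with the trapezoidal rule. The sample values below are made up for illustration and are not measurements from the talk; the function `energy_joules` is a hypothetical helper, and no PAPI call is made.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Made-up samples: one reading per second for a three-second run.
t = [0.0, 1.0, 2.0, 3.0]
total_power = [100.0, 120.0, 120.0, 100.0]  # whole-node power in watts (illustrative)
dram_power = [10.0, 20.0, 20.0, 10.0]       # DRAM power in watts (illustrative)

total_energy = energy_joules(t, total_power)
dram_energy = energy_joules(t, dram_power)
print(total_energy, dram_energy)
```

Under a lower power cap a run may draw less power but take longer, so energy has to be computed from both quantities rather than read off the power trace alone.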

Page 7: Impact of MR Stages – WordCount with ReduceByKey vs (Map + Aggregate)

[Figure, two panels with the same layout: processes P0, P1, …, Pn each run map, aggregate, convert, and reduce; the data moves as input → <key,value> → <key,value> → <key,list<value>> → output; the stages are annotated with "barrier" and "interleave".]
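One reading of the stage diagram can be simulated in a single Python process, with a list entry standing in for each rank P0 … Pn. This is a sketch under assumptions, not Mimir's implementation: the partitioner `owner` (CRC32 of the key modulo the number of ranks) and the function `run_stages` are hypothetical, and the in-memory lists stand in for the MPI exchange.

```python
import zlib
from collections import defaultdict


def owner(key, nprocs):
    """Pick the rank that owns a key (hypothetical deterministic partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % nprocs


def run_stages(splits):
    """Simulate map, aggregate, convert, reduce with one rank per input split."""
    nprocs = len(splits)

    # map: each rank turns its input into <key, value> pairs
    mapped = [[(word, 1) for word in split.split()] for split in splits]

    # aggregate: exchange pairs so that every pair lands on the rank owning its key
    received = [[] for _ in range(nprocs)]
    for pairs in mapped:
        for key, value in pairs:
            received[owner(key, nprocs)].append((key, value))

    # convert: each rank builds <key, list<value>> from the pairs it received
    converted = []
    for pairs in received:
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        converted.append(grouped)

    # reduce: each rank sums the value list of every key it owns
    return [{k: sum(v) for k, v in grouped.items()} for grouped in converted]


per_rank = run_stages(["a b c a b", "b c a b c"])
merged = {k: v for part in per_rank for k, v in part.items()}
print(per_rank, merged)
```

Because every key is routed to exactly one rank during aggregate, the per-rank outputs never overlap and can be merged without further reduction.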

Page 8: Impact of Data Processing – WordCount with GroupByKey vs. ReduceByKey

[Figure: two inputs, "a b c a b" and "b c a b c".

GroupByKey: map emits one <word,1> pair per word (<a,1> <a,1> <b,1> <b,1> <c,1> and <b,1> <b,1> <c,1> <c,1> <a,1>); after the Shuffle, one reducer receives <a,1> <a,1> <a,1> <c,1> <c,1> <c,1> and the other receives <b,1> <b,1> <b,1> <b,1>; reduce outputs <a,3> <c,3> and <b,4>.

ReduceByKey: map emits locally combined pairs (e.g., <a,2> <b,2> and <b,2> <c,2>) before the Shuffle; reduce outputs <a,3> <c,3> and <b,4>.]
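The difference between the two variants can be made concrete by counting how many key-value pairs cross the shuffle. The sketch below is illustrative plain Python using the two inputs from the figure; the function names are assumptions, and local combining is modeled with `collections.Counter` rather than with Mimir's combiner.

```python
from collections import Counter


def map_group_by_key(split):
    """GroupByKey-style map: emit one <word, 1> pair per occurrence."""
    return [(word, 1) for word in split.split()]


def map_reduce_by_key(split):
    """ReduceByKey-style map: combine counts locally, one pair per distinct word."""
    return list(Counter(split.split()).items())


def reduce_all(shuffled_pairs):
    """Reduce: sum the values received for each key."""
    totals = Counter()
    for key, value in shuffled_pairs:
        totals[key] += value
    return dict(totals)


splits = ["a b c a b", "b c a b c"]
group_pairs = [kv for s in splits for kv in map_group_by_key(s)]
reduce_pairs = [kv for s in splits for kv in map_reduce_by_key(s)]

# Both variants produce the same counts; only the shuffled volume differs.
print(len(group_pairs), reduce_all(group_pairs))
print(len(reduce_pairs), reduce_all(reduce_pairs))
```

For these inputs the shuffle carries 10 pairs without local combining and 6 with it. The saving grows with the number of repeated words per split and shrinks when most words are unique.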

Page 9: Impact of Data Characteristics - WordCount with different num. of words

[Figure panels: 4B - 72; 4B - 288; 4B - 576; 4B - 5K; 4B - 1M; 4B - 3M; 4B - 14M; 4B - 28M; 4B - 50M]

Page 10: Impact of Data Characteristics - WordCount with different num. of words

[Figure panels: 4B - 72; 4B - 5K; 4B - 14M. Annotation: Sweet spot]

Page 11: Impact of Mimir – WordCount with different Mimir buffer settings

[Figure panels: 32 MB, 4B72; 64 MB, 4B72; 16 MB, 4B42M; 128 MB, 4B42M]

Page 12: Food for Thought

• Build dynamic workflows for mitigating performance loss and power usage based on type of data

§ Features of streamed data are unknown

§ ML can support data feature predictions

§ Workflow optimizations and settings can be selected at runtime

§ Power can be intrinsically controlled within the workflow

• Leverage MPI to support analytics workflows

§ MPI comes with robustness and resilience features

§ Workflows rely on MPI for fault-tolerance
