Characterization of Power and Performance in Data-Intensive Applications using MapReduce over MPI

Michela Taufer1,7

With the contributions of: Tao Gao1,2, Yanfei Guo3, Boyu Zhang1, Pietro Cicotti4, Yutong Lu2,5,6, Josh Davis1, Anthony Danalis7, Heike Jagode7, Jack Dongarra7, Sunita Chandrasekaran1, Pavan Balaji3

1 University of Delaware
2 National University of Defense Technology
3 Argonne National Laboratory
4 San Diego Supercomputer Center
5 National Supercomputer Center in Guangzhou
6 Sun Yat-sen University
7 University of Tennessee Knoxville

Data Processing on the Cloud

• Variations of the MapReduce programming model can handle the parallel job executions, communications, and data movements in data-intensive applications

• Users provide limited components, e.g., map and reduce functions

Wordcount example:

[Figure: two map tasks each read the input "Hello World" and emit <Hello,1> <World,1>; the shuffle groups the pairs by key; two reduce tasks output <Hello,2> and <World,2>.]
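The WordCount flow on this slide can be sketched in a few lines of Python. This is a minimal single-process illustration of the map, shuffle, and reduce steps, not code from any MapReduce framework; the function names (`map_words`, `shuffle`, `reduce_counts`, `wordcount`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_words(text):
    """Map: emit one <word, 1> pair per word in an input split."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def wordcount(splits):
    """Run map on every split, then shuffle and reduce the emitted pairs."""
    pairs = []
    for split in splits:
        pairs.extend(map_words(split))
    return reduce_counts(shuffle(pairs))


# Two input splits, as in the slide's example.
counts = wordcount(["Hello World", "Hello World"])
print(counts)
```

The user supplies only the map and reduce logic; the grouping in `shuffle` stands in for the work a framework would do across processes.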

Data Processing: HPC vs. Cloud

• Key differences between Cloud computing and HPC systems discourage the naïve use of Cloud methods

[Figure: HPC systems (MPI/OpenMP): processors connected by an interconnect, with storage on a separate disk array. Cloud computing systems (Hadoop/Spark): nodes that each pair a disk with a processor, connected by Ethernet. Other labels: Storage, OS.]

MapReduce over MPI

Is MapReduce an effective way to handle big data processing on HPC systems?

Mimir

• Mimir - a MapReduce over MPI framework
§ Integrate optimizations: combiner, dynamic repartition, split super-keys
§ Handle large data in-memory with better scaling compared with MR-MPI

• Open-source software: https://github.com/TauferLab/Mimir.git

• Papers: Tao et al., IPDPS 2017; Tao et al., ICPADS 2018

[Figure panels: WordCount on Tianhe-2; Octree Clustering on Tianhe-2; Join on Tianhe-2]

Power Usage and Data Management

• Run WordCount miniapps on a 4B word dataset and on KML

• Use PAPI and power capping to measure
§ Total power
§ DRAM power

• Consider different features
§ Steps of the workflow
§ Optimizations (e.g., combiners)
§ Different types of words in the dataset
§ Different settings of Mimir
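The total and DRAM power measurements listed above boil down to reading an energy counter at two points in time and dividing by the elapsed time. The sketch below shows only that arithmetic. It does not call PAPI, which the slides use; instead it assumes microjoule counters with a known maximum range, similar to the `energy_uj` and `max_energy_range_uj` files exposed by the Linux powercap interface. The function names are hypothetical.

```python
def energy_joules(start_uj, end_uj, max_range_uj):
    """Energy consumed between two counter readings, in joules.

    Assumes the counter wrapped around at most once between the readings.
    """
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped past its maximum range and restarted near zero.
        delta += max_range_uj
    return delta / 1e6


def average_power_watts(start_uj, end_uj, elapsed_s, max_range_uj):
    """Average power over the interval: energy divided by elapsed time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return energy_joules(start_uj, end_uj, max_range_uj) / elapsed_s


# Example with made-up readings: 50 J consumed over 2 s is 25 W on average.
print(average_power_watts(0, 50_000_000, 2.0, 100_000_000))
```

Sampling the counters at fixed intervals during a run gives a power trace over execution progress, and summing the per-interval energies gives the total energy of the run.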

[Figure labels: Energy Usage, Runtime, Power cap, Execution Progress, Total power, DRAM power]

Impact of MR Stages – WordCount with ReduceByKey vs. (Map + Aggregate)

[Figure, shown twice: workflow on processes P0, P1, …, Pn. Each process runs the stages map, aggregate, convert, and reduce. Data flows as input → <key,value> → <key,value> → <key,list<value>> → output. The stages are annotated with "interleave" and "barrier" markers.]
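The stage names in the workflow figure (map, aggregate, convert, reduce) can be illustrated with a small single-process simulation. This is a sketch under assumptions, not Mimir's API: it uses no MPI, models each process as a list entry, and interprets aggregate as the exchange that sends each <key,value> pair to the process that owns the key, and convert as the step that turns <key,value> pairs into <key,list<value>>. The `owner` partitioning rule and all function names are hypothetical.

```python
from collections import defaultdict


def owner(key, nprocs):
    """Deterministic, illustrative rule for which process owns a key."""
    return sum(key.encode()) % nprocs


def map_stage(split):
    """map: input -> <key, value> pairs."""
    return [(word, 1) for word in split.split()]


def aggregate_stage(per_proc_kvs):
    """aggregate: route every <key, value> pair to the owner of its key."""
    nprocs = len(per_proc_kvs)
    received = [[] for _ in range(nprocs)]
    for kvs in per_proc_kvs:
        for key, value in kvs:
            received[owner(key, nprocs)].append((key, value))
    return received


def convert_stage(kvs):
    """convert: <key, value> pairs -> <key, list<value>>."""
    kmv = defaultdict(list)
    for key, value in kvs:
        kmv[key].append(value)
    return dict(kmv)


def reduce_stage(kmv):
    """reduce: <key, list<value>> -> output (here, the sum per key)."""
    return {key: sum(values) for key, values in kmv.items()}


def run(splits):
    """Run the four stages with one simulated process per input split."""
    mapped = [map_stage(split) for split in splits]
    exchanged = aggregate_stage(mapped)
    return [reduce_stage(convert_stage(kvs)) for kvs in exchanged]


print(run(["a b c a b", "b c a b c"]))
```

In this simulation the stages run strictly one after another; the "interleave" and "barrier" annotations in the figure concern how a real implementation overlaps or synchronizes them, which the sketch does not model.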

Impact of Data Processing – WordCount with GroupByKey vs. ReduceByKey

[Figure: two map tasks process the inputs "a b c a b" and "b c a b c"; a shuffle feeds two reduce tasks that output <a,3> <c,3> and <b,4>.
GroupByKey: the maps emit one pair per word (<a,1> <a,1> <b,1> <b,1> <c,1> and <b,1> <b,1> <c,1> <c,1> <a,1>), so the reducers receive <a,1> <a,1> <a,1> <c,1> <c,1> <c,1> and <b,1> <b,1> <b,1> <b,1>.
ReduceByKey: the maps combine pairs that share a key before the shuffle (e.g., <a,2> <b,2>), so fewer pairs are shuffled.]
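The difference between the two variants on this slide is how many pairs cross the shuffle. The sketch below counts them for the slide's two inputs. It is an illustration in plain Python, not framework code, and the function names are hypothetical.

```python
from collections import Counter


def map_group_by_key(split):
    """GroupByKey-style map: emit one <word, 1> pair per word."""
    return [(word, 1) for word in split.split()]


def map_reduce_by_key(split):
    """ReduceByKey-style map: combine pairs with the same key locally."""
    return list(Counter(split.split()).items())


def shuffled_pairs(splits, mapper):
    """Number of <key, value> pairs the maps hand to the shuffle."""
    return sum(len(mapper(split)) for split in splits)


def final_counts(splits, mapper):
    """Reduce side: sum all values per key, whatever the map emitted."""
    totals = Counter()
    for split in splits:
        for key, value in mapper(split):
            totals[key] += value
    return dict(totals)


splits = ["a b c a b", "b c a b c"]
print(shuffled_pairs(splits, map_group_by_key))   # pairs without combining
print(shuffled_pairs(splits, map_reduce_by_key))  # pairs with combining
print(final_counts(splits, map_reduce_by_key))
```

Both variants produce the same final counts; combining in the map only reduces the number of pairs that must be moved and stored, and the saving grows as words repeat more often within a split.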

Impact of Data Characteristics - WordCount with different num. of words

[Figure panels: 4B - 72, 4B - 288, 4B - 576, 4B - 5K, 4B - 1M, 4B - 3M, 4B - 14M, 4B - 28M, 4B - 50M. A second slide with the same title shows the panels 4B - 72, 4B - 5K, 4B - 14M.]

Sweet spot

Impact of Mimir – WordCount with different Mimir buffer settings

[Figure panels: 32 MB, 4B72; 64 MB, 4B72; 16 MB, 4B42M; 128 MB, 4B42M]

Food for Thought

• Build dynamic workflows for mitigating performance loss and power usage based on type of data
§ Features of streamed data are unknown
§ ML can support data feature predictions
§ Workflow optimizations and settings can be selected at runtime
§ Power can be intrinsically controlled within the workflow

• Leverage MPI to support analytics workflows
§ MPI comes with robustness and resilience features
§ Workflows rely on MPI for fault-tolerance

Date post:	22-May-2020