Characterization of Power and Performance in Data-Intensive Applications using MapReduce over MPI
Michela Taufer (1,7)
With the contributions of: Tao Gao (1,2), Yanfei Guo (3), Boyu Zhang (1), Pietro Cicotti (4), Yutong Lu (2,5,6), Josh Davis (1), Anthony Danalis (7), Heike Jagode (7), Jack Dongarra (7), Sunita Chandrasekaran (1), Pavan Balaji (3)
1 University of Delaware
2 National University of Defense Technology
3 Argonne National Laboratory
4 San Diego Supercomputer Center
5 National Supercomputer Center in Guangzhou
6 Sun Yat-sen University
7 University of Tennessee Knoxville
Data Processing on the Cloud
• Variations of the MapReduce programming model can handle the parallel job executions, communications, and data movements in data-intensive applications
• Users provide limited components, e.g., map and reduce functions
WordCount example:
[Figure: two inputs, each "Hello World", pass through map tasks that emit <Hello,1> and <World,1>; the shuffle routes the pairs by key; the reduce tasks output <Hello,2> and <World,2>.]
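The WordCount example above can be sketched in a few lines of plain Python. This is only an illustration of the programming model, run in a single process; the function names (`map_words`, `shuffle`, `reduce_counts`, `wordcount`) are hypothetical and do not come from any MapReduce framework.

```python
from collections import defaultdict

def map_words(line):
    """Map: emit a <word, 1> pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

def wordcount(lines):
    """Run map, shuffle, and reduce over a list of input lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())

# Two inputs, as in the slide's figure.
counts = wordcount(["Hello World", "Hello World"])
print(counts)
```

The user supplies only the map and reduce functions; the grouping in `shuffle` stands in for the communication that a real framework performs across nodes.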
Data Processing: HPC vs. Cloud
• Key differences between Cloud computing and HPC systems discourage the naive use of Cloud methods on HPC systems
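[Figure: HPC systems (MPI/OpenMP): processors linked by an interconnect, with storage on a disk array. Cloud computing systems (Hadoop/Spark): nodes that each pair a processor with a local disk, linked by Ethernet. Both diagrams show OS and Storage layers.]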
MapReduce over MPI
Is MapReduce an effective way to handle big data processing on HPC systems?
Mimir
• Mimir - a MapReduce over MPI framework
§ Integrate optimizations: combiner, dynamic repartition, split super-keys
§ Handle large data in-memory with better scaling compared with MR-MPI
• Open-source software: https://github.com/TauferLab/Mimir.git
• Papers: Tao et al., IPDPS 2017; Tao et al., ICPADS 2018
[Figure panels: WordCount on Tianhe-2; Octree Clustering on Tianhe-2; Join on Tianhe-2]
Power Usage and Data Management
• Run WordCount miniapps on a 4B word dataset and on KML
• Use PAPI and power capping to measure
§ Total power
§ DRAM power
• Consider different features
§ Steps of the workflow
§ Optimizations (e.g., combiners)
§ Different types of words in the dataset
§ Different settings of Mimir
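Total and DRAM power readings become comparable across runs once they are integrated into energy. The sketch below shows that step under stated assumptions: the timestamps and power values are made-up samples, `energy_joules` is a hypothetical helper, and nothing here calls PAPI or reflects how the measurements in this talk were collected.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds), trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy

# Hypothetical samples taken once per second (not measured data).
times = [0.0, 1.0, 2.0, 3.0]
total_power = [100.0, 120.0, 120.0, 100.0]
dram_power = [10.0, 20.0, 20.0, 10.0]

total_energy = energy_joules(times, total_power)
dram_energy = energy_joules(times, dram_power)
print(total_energy, dram_energy, dram_energy / total_energy)
```

Because energy is power multiplied by time, a lower power cap can still raise total energy if it stretches the runtime enough, which is why both quantities are worth tracking together.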
[Figure labels: Energy Usage, Runtime, Power cap; Execution Progress, Total power, DRAM power]
[Figure, shown twice: workflow across processes P0, P1, …, Pn. Each process runs map, aggregate, convert, and reduce. Data moves from input to <key,value>, to <key,value>, to <key,list<value>>, to output. The stages are marked barrier, interleave, barrier, interleave.]
Impact of MR Stages – WordCount with ReduceByKey vs (Map + Aggregate)
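The map, aggregate, convert, and reduce stages from the workflow figure can be mimicked in one Python process, as a rough sketch. The assumptions are significant: each list entry plays the role of one MPI process, `aggregate_stage` stands in for the all-to-all exchange, the CRC32-based `owner` function is an arbitrary partitioning choice, and none of these names are Mimir's API.

```python
import zlib
from collections import defaultdict

def owner(key, nprocs):
    """Pick the process that owns a key (deterministic hash partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % nprocs

def map_stage(chunk):
    """Map: turn one input chunk into <key, value> pairs."""
    return [(word, 1) for word in chunk.split()]

def aggregate_stage(per_proc_pairs):
    """Aggregate: send every pair to the process that owns its key."""
    nprocs = len(per_proc_pairs)
    received = [[] for _ in range(nprocs)]
    for pairs in per_proc_pairs:
        for key, value in pairs:
            received[owner(key, nprocs)].append((key, value))
    return received

def convert_stage(pairs):
    """Convert: regroup <key, value> pairs into <key, list<value>>."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_stage(grouped):
    """Reduce: collapse each value list into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["a b c a b", "b c a b c"]          # one chunk per simulated process
mapped = [map_stage(c) for c in chunks]
exchanged = aggregate_stage(mapped)
converted = [convert_stage(p) for p in exchanged]
reduced = [reduce_stage(g) for g in converted]

# Each key lives on exactly one process, so the partial results never collide.
result = {k: v for part in reduced for k, v in part.items()}
print(result)
```

In this sketch the stages run strictly one after another; the barrier and interleave markings in the figure concern how a real implementation overlaps or separates them, which a single-process simulation cannot show.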
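Impact of Data Processing – WordCount with GroupByKey vs. ReduceByKey
[Figure, GroupByKey: inputs "a b c a b" and "b c a b c" are mapped to one <word,1> pair per word; the shuffle moves all pairs, giving <a,1> <a,1> <a,1> <c,1> <c,1> <c,1> on one reducer and <b,1> <b,1> <b,1> <b,1> on the other; the reducers output <a,3> <c,3> and <b,4>.]
[Figure, ReduceByKey: the same inputs are mapped and combined locally before the shuffle (e.g., <a,2> <b,2> on one map task and <b,2> <c,2> on the other), so fewer pairs are shuffled; the reducers output <a,3> <c,3> and <b,4>.]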
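The difference between the two variants is the number of pairs that cross the shuffle. A small sketch, assuming the two input strings from the figure and using hypothetical helper names (`map_pairs`, `combine`, `reduce_all`), counts the pairs in each case:

```python
from collections import Counter

def map_pairs(chunk):
    """Map: emit one <word, 1> pair per word."""
    return [(word, 1) for word in chunk.split()]

def combine(pairs):
    """Local combine (ReduceByKey): pre-sum counts on the map side."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_all(shuffled):
    """Reduce: sum whatever arrives after the shuffle."""
    total = Counter()
    for key, value in shuffled:
        total[key] += value
    return dict(total)

chunks = ["a b c a b", "b c a b c"]

# GroupByKey: every mapped pair is shuffled.
group_shuffled = [p for c in chunks for p in map_pairs(c)]

# ReduceByKey: each map task combines its own pairs first.
reduce_shuffled = [p for c in chunks for p in combine(map_pairs(c))]

print(len(group_shuffled), len(reduce_shuffled))
print(reduce_all(group_shuffled) == reduce_all(reduce_shuffled))
```

With these inputs the combined variant shuffles 6 pairs instead of 10 while producing the same counts. The saving grows when words repeat often within a chunk and shrinks when most words are unique.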
Impact of Data Characteristics – WordCount with different num. of words
[Figure panels: 4B - 72, 4B - 288, 4B - 576; 4B - 5K, 4B - 1M, 4B - 3M; 4B - 14M, 4B - 28M, 4B - 50M]
[Figure panels: 4B - 72, 4B - 5K, 4B - 14M; annotation: Sweet spot]
Impact of Mimir – WordCount with different Mimir buffer settings
[Figure panels: 32 MB, 4B72; 64 MB, 4B72; 16 MB, 4B42M; 128 MB, 4B42M]
Food for Thought
• Build dynamic workflows for mitigating performance loss and power usage based on type of data
§ Features of streamed data are unknown
§ ML can support data feature predictions
§ Workflow optimizations and settings can be selected at runtime
§ Power can be intrinsically controlled within the workflow
• Leverage MPI to support analytics workflows
§ MPI comes with robustness and resilience features
§ Workflows rely on MPI for fault-tolerance