READEX – Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing
EEHPCWG @ SC’17
Andreas Gocht – Technische Universität Dresden
Overview: Project Partners
• Grant agreement No 671657
• Officially started September 1st, 2015

• Technische Universität Dresden / ZIH (Coordinator)
• Norwegian University of Science and Technology
• Technische Universität München
• IT4Innovations, VSB - Technical University of Ostrava
• NUI Galway, Irish Centre for High-End Computing
• Intel France
• Gesellschaft für numerische Simulation mbH
Project Motivation
Applications exhibit dynamic behaviour
• Changing resource requirements
• Computational characteristics
• Changing load on processors over time
Overview
READEX creates a tools-aided methodology for automatic tuning of parallel applications
• Dynamically adjust system parameters to actual resource requirements
Join technologies from Embedded Systems and HPC
• HPC: PTF, Score-P, and HDEEM
• Embedded Systems: System scenario methodology
Overview
Co-design approach
• Manual tuning for energy efficiency as a baseline
• Automatic tuning for comparison
• Applications
  • PERMON and ESPRESO (FETI tools from IT4Innovations)
  • Indeed (GNS)
  • CORAL benchmark suite
  • Proxy Apps
Terminology: Region and Region Instance
int main(void) {
    // Initialize application
    int num_iterations = 2;
    for (int iter = 1; iter <= num_iterations; iter++) {
        // Application kernels (schematic calls, definitions omitted)
        laplace3D();
        reduction();
        fftw_execute();
    }
    // Post processing and finalization
    return 0;
}
Annotations on the code: Significant region, Runtime situation, Phase region, Phase
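The relationship between regions and region instances can be made concrete with a small sketch. It assumes, as one reading of the slide's annotations, that each loop iteration is one phase and that every call to a kernel inside an iteration is one region instance (runtime situation). The type `region_instance` and the function `enumerate_instances` are hypothetical names for illustration and are not part of READEX.

```c
#include <stddef.h>

/* Hypothetical record of one region instance ("runtime situation"). */
typedef struct {
    int phase;           /* 1-based iteration of the phase region */
    const char *region;  /* name of the significant region */
} region_instance;

/* Kernel names taken from the example loop body on the slide. */
static const char *const SIGNIFICANT_REGIONS[] = {
    "laplace3D", "reduction", "fftw_execute"
};
enum { NUM_REGIONS = 3 };

/*
 * Enumerate every region instance produced by num_phases iterations.
 * Writes at most `capacity` entries to `out` and returns the total
 * number of instances, i.e. num_phases * NUM_REGIONS.
 */
static size_t enumerate_instances(int num_phases,
                                  region_instance *out,
                                  size_t capacity)
{
    size_t count = 0;
    for (int phase = 1; phase <= num_phases; phase++) {
        for (int r = 0; r < NUM_REGIONS; r++) {
            if (count < capacity) {
                out[count].phase = phase;
                out[count].region = SIGNIFICANT_REGIONS[r];
            }
            count++;
        }
    }
    return count;
}
```

Under this reading, the example loop with `num_iterations = 2` yields two phases and six region instances, three per phase.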
Terminology: Tuning Parameter
int main(void) {
    // Initialize application
    int num_iterations = 2;
    for (int iter = 1; iter <= num_iterations; iter++) {
        // Application kernels (schematic calls, definitions omitted)
        laplace3D();
        reduction();
        fftw_execute();
    }
    // Post processing and finalization
    return 0;
}
Tuning Parameter: FREQ = 1.5 GHz
Tuning Parameter: FREQ = 2 GHz
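One way to picture a per-region tuning parameter is a lookup table from region name to frequency. The sketch below is illustrative only: the slide shows the values 1.5 GHz and 2 GHz, but which region receives which value is an assumption made here, and `tuning_entry`, `TUNING_TABLE`, and `lookup_freq_ghz` are hypothetical names rather than READEX interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning entry: one FREQ setting per significant region. */
typedef struct {
    const char *region;  /* significant region name */
    double freq_ghz;     /* value of the FREQ tuning parameter */
} tuning_entry;

/* ASSUMPTION: the region-to-value assignment is invented for illustration. */
static const tuning_entry TUNING_TABLE[] = {
    { "laplace3D", 2.0 },
    { "reduction", 1.5 },
};

/* Return the FREQ setting for a region, or default_ghz if it has no entry. */
static double lookup_freq_ghz(const char *region, double default_ghz)
{
    size_t n = sizeof TUNING_TABLE / sizeof TUNING_TABLE[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(TUNING_TABLE[i].region, region) == 0) {
            return TUNING_TABLE[i].freq_ghz;
        }
    }
    return default_ghz;
}
```

Regions without an entry fall back to the caller's default, so untuned code keeps running at the system setting.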
Overview: Workflow
1. Instrument application
   Score-P provides different kinds of instrumentation
2. Detect dynamism
   Check whether runtime situations could benefit from tuning
3. Detect energy saving potential and configurations (DTA)
   Use tuning plugin and power measurement infrastructure to search for optimal configuration
   Create tuning model
4. Runtime application tuning (RAT)
   Apply tuning model, use optimal configuration
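Step 3 of the workflow can be sketched as a search over candidate configurations that keeps the one with the lowest measured energy. The slides do not say which search strategy the tuning plugin uses, so the exhaustive search here is an assumption. The callback type `measure_energy_fn`, the function `find_best_config`, and the synthetic `toy_energy_model` (which stands in for a real power measurement) are hypothetical, and the model constants are made up.

```c
#include <stddef.h>

/* Hypothetical callback: run a region at freq_ghz, return energy in joules. */
typedef double (*measure_energy_fn)(double freq_ghz, void *ctx);

/*
 * Exhaustive search (an assumed strategy): measure every candidate and
 * return the index of the lowest-energy one, or -1 if n is zero.
 */
static int find_best_config(const double *candidates_ghz, size_t n,
                            measure_energy_fn measure, void *ctx)
{
    int best = -1;
    double best_energy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double energy = measure(candidates_ghz[i], ctx);
        if (best < 0 || energy < best_energy) {
            best = (int)i;
            best_energy = energy;
        }
    }
    return best;
}

/*
 * Synthetic stand-in for a measurement, NOT real data:
 * energy = (static power + dynamic power) * runtime, with
 * dynamic power ~ f^3 and runtime ~ 1/f. Constants are invented.
 */
static double toy_energy_model(double freq_ghz, void *ctx)
{
    (void)ctx;
    double power_w = 30.0 + 5.0 * freq_ghz * freq_ghz * freq_ghz;
    double runtime_s = 10.0 / freq_ghz;
    return power_w * runtime_s;
}
```

With the toy model, calling `find_best_config` on the candidates 1.2, 1.5, 2.0, and 2.5 GHz selects 1.5 GHz. In step 4 the stored result would then be applied whenever the corresponding region is entered.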
Current Status
• Application instrumentation, dynamism detection, DTA, and RAT are implemented
• Promising results:
[Figure: two bar charts, NPB BT.C and NPB MG.D, each comparing energy and runtime without READEX and with READEX; axis ticks run from 20 to 110]
Example: NAS-OMP BT.C Benchmark
Comparison of power consumption and runtime of the BT benchmark
[Figure: two plots, with READEX and without READEX]
Example: NAS-OMP BT.C Benchmark
Selected tuning parameter: core and uncore frequency of the BT benchmark
Example: NAS-OMP MG.D Benchmark
Comparison of power consumption and runtime of the MG benchmark
[Figure: two plots, with READEX and without READEX]
Example: NAS-OMP MG.D Benchmark
Selected tuning parameter: core and uncore frequency of the MG benchmark
Work in Progress
• Evaluating Application Tuning Parameters (ATPs)
  • Allow the tuning of program-internal “decisions”
  • Example: preconditioner in ESPRESO, a solver for FETI problems:
Preconditioner type        Number of    Single iteration cost          Total solution cost
                           iterations   Time          Energy           Time      Energy
No preconditioner          172          130 + 0 ms    32.3 + 0.00 J    21.4 s    5.50 kJ
Weight function            100          130 + 2 ms    32.3 + 0.53 J    12.9 s    3.28 kJ
Lumped                     45           130 + 10 ms   32.3 + 3.86 J    6.3 s     1.64 kJ
Light Dirichlet            39           130 + 10 ms   32.3 + 3.74 J    5.5 s     1.41 kJ
Full Dirichlet (default)   30           130 + 80 ms   32.3 + 20.6 J    6.3 s     1.59 kJ
Note: 130 ms and 32.3 J are the baseline single-iteration cost without a preconditioner.
11.3% energy savings against the default Full Dirichlet preconditioner (Light Dirichlet: 1.41 kJ vs. 1.59 kJ)
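The arithmetic behind the table can be checked with a short sketch. It assumes that the total cost is roughly the number of iterations times the single-iteration cost (baseline plus preconditioner overhead), which reproduces the reported totals to within a few percent. The values are copied from the table; the names `precond_row`, `ROWS`, `estimated_total_s`, `estimated_total_kj`, and `savings_percent` are hypothetical.

```c
/* One table row; values are copied from the slide. */
typedef struct {
    const char *name;
    int iterations;
    double extra_ms;   /* per-iteration time added by the preconditioner */
    double extra_j;    /* per-iteration energy added by the preconditioner */
    double total_s;    /* reported total solution time */
    double total_kj;   /* reported total solution energy */
} precond_row;

#define BASE_MS 130.0  /* baseline single-iteration time */
#define BASE_J  32.3   /* baseline single-iteration energy */

static const precond_row ROWS[] = {
    { "No preconditioner",        172,  0.0,  0.00, 21.4, 5.50 },
    { "Weight function",          100,  2.0,  0.53, 12.9, 3.28 },
    { "Lumped",                    45, 10.0,  3.86,  6.3, 1.64 },
    { "Light Dirichlet",           39, 10.0,  3.74,  5.5, 1.41 },
    { "Full Dirichlet (default)",  30, 80.0, 20.60,  6.3, 1.59 },
};

/* ASSUMED model: total time = iterations * (baseline + overhead). */
static double estimated_total_s(const precond_row *row)
{
    return row->iterations * (BASE_MS + row->extra_ms) / 1000.0;
}

/* ASSUMED model: total energy = iterations * (baseline + overhead). */
static double estimated_total_kj(const precond_row *row)
{
    return row->iterations * (BASE_J + row->extra_j) / 1000.0;
}

/* Relative saving of `tuned` against `baseline`, in percent. */
static double savings_percent(double baseline, double tuned)
{
    return 100.0 * (baseline - tuned) / baseline;
}
```

Comparing the reported energies of Light Dirichlet (1.41 kJ) and Full Dirichlet (1.59 kJ) with `savings_percent` gives about 11.3%, matching the note. The table also shows the trade-off: stronger preconditioners need fewer iterations but cost more per iteration.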
Work in Progress
• Runtime Calibration
  • Automatically finds a good configuration for unseen scenarios
  • Applied at runtime
  • Machine Learning techniques
Thank you for your attention
Questions?