
air • planet • people
CISL/TDD/ASAP

© UCAR, 2015

Using Parallel Python Tools to Postprocess Data for CMIP6

Sheri Mickelson, Kevin Paul

Eighth Symposium on Advances in Modeling and Analysis Using Python

AMS 2018


What is CMIP6?


§ Internationally coordinated effort to run sets of defined experiments. There are roughly 30 different centers from around the world that will be participating.

§ Each experiment has a defined set of protocols, forcings, and requested output.

§ Running the same experiments with multiple models leads to stronger results.

§ The results of these simulations are evaluated as part of the IPCC Climate Assessment Reports. The results are also used by international governments for policy decisions and for further research by scientific institutions and universities.


CMIP5 Workflow

[Workflow diagram: Model Run (CESM Model Run); Post-Processing (Time Series Conversion with NCO, CMOR, Diagnostics with NCO/NCL); Publication (Push to ESGF)]

These are all of the steps that we need to take to publish our data to the community. For CMIP5 this process took 15 months to postprocess 200 TB of data.


CMIP5 Workflow

Converting from Time Slice to Time Series

[Figure: time-slice files Slice 1 to Slice 5, each containing Field 1, Field 2, and Field 3, are converted into time-series files Series 1 (Field 1), Series 2 (Field 2), and Series 3 (Field 3)]

Model outputs data in a format that has one time slice and multiple variables. The preferred distribution format is one variable and multiple time slices. This step converts from one format to the other. The existing method used NCO for the conversion. This was the most expensive step in CMIP5.
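The conversion described on this slide can be illustrated with a small in-memory sketch. It is not the NCO or PyReshaper implementation: the files are stood in for by plain dictionaries of NumPy arrays, and the names `slices_to_series` and `field1` to `field3` are hypothetical.

```python
import numpy as np

def slices_to_series(time_slices):
    """Regroup a list of time slices into one array per variable.

    `time_slices` is a list of dicts, one per time step, each mapping a
    variable name to that variable's values at that time step.
    """
    series = {}
    for name in time_slices[0]:
        # Stack this variable's values from every slice along a new leading time axis.
        series[name] = np.stack([ts[name] for ts in time_slices], axis=0)
    return series

# Five hypothetical time slices, each holding three fields on a 2x3 grid.
slices = [
    {f"field{k}": np.full((2, 3), 10 * t + k, dtype=float) for k in (1, 2, 3)}
    for t in range(5)
]
series = slices_to_series(slices)
print({name: values.shape for name, values in series.items()})
```

Each output array holds one variable for all five time steps, which mirrors the "one variable, multiple time slices" layout preferred for distribution.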


CMIP5 Workflow

6 Component Diagnostic Packages (Atm, Lnd x2, Ocn, Sea Ice, BGC)

Used to document and evaluate the climate simulation.

All of the original packages:
1. Contain a top level control script
2. Create climatology files with NCO tools
3. Create hundreds of plots with NCL scripts
4. Create web pages that allow users to browse through plots
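As a rough illustration of what the climatology step computes, the sketch below averages each calendar month over several years of monthly data. It is a simplification under stated assumptions: the data are a synthetic in-memory array rather than NetCDF files, every month is weighted equally, and `monthly_climatology` is a hypothetical name, not a function from the diagnostic packages.

```python
import numpy as np

def monthly_climatology(monthly_values):
    """Average each calendar month over all years.

    `monthly_values` has shape (n_months, ...) with n_months a multiple of 12,
    starting in January. Returns an array of shape (12, ...).
    """
    n_months = monthly_values.shape[0]
    if n_months % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    # Reshape to (years, 12, ...) so that averaging over axis 0 averages across years.
    by_year = monthly_values.reshape(n_months // 12, 12, *monthly_values.shape[1:])
    return by_year.mean(axis=0)

# Ten hypothetical years of monthly data on a 2x2 grid: value = year index + month index.
years, months = np.meshgrid(np.arange(10), np.arange(12), indexing="ij")
data = (years + months).reshape(120, 1, 1) * np.ones((1, 2, 2))
clim = monthly_climatology(data)
print(clim.shape)
```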


CMIP5 Workflow

This step standardizes the model output in order to make it easier to compare against other models for the intercomparison.

Some examples:
• File formats (e.g., NetCDF4)
• Names of files and directory structure
• File attributes (e.g., institution, MIP name, …)
• Names of dimensions (e.g., lat, lon, …)
• Names of variables (e.g., psl, ta, tas, …)
• Dimensions of variables
• Variable data types (e.g., float, double, …)
• Attributes of variables (e.g., units, …)
• Ranges of time (e.g., 2006 to 2100)
• Deriving variables that are not outputted directly

Used Fortran and NCL code, NCO, and CMOR
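A few of the items in the list above (variable names, data types, variable attributes) can be sketched as a table-driven rename. This is only an illustration and not how CMOR or the Fortran/NCL code worked: the `RULES` table, the `standardize` function, and the attribute values are hypothetical, and the pairing of model names with `psl` and `tas` is shown purely as an example.

```python
import numpy as np

# Hypothetical mapping from model variable names to standardized names, units, and types.
RULES = {
    "PSL": {"name": "psl", "units": "Pa", "dtype": np.float32},
    "TREFHT": {"name": "tas", "units": "K", "dtype": np.float32},
}

def standardize(model_output, rules, global_attrs):
    """Return (variables, attributes) renamed and typed according to `rules`."""
    variables = {}
    for model_name, values in model_output.items():
        rule = rules.get(model_name)
        if rule is None:
            continue  # variable was not requested, so it is skipped
        variables[rule["name"]] = {
            "data": np.asarray(values, dtype=rule["dtype"]),
            "units": rule["units"],
        }
    return variables, dict(global_attrs)

model_output = {
    "PSL": np.array([101325.0, 100800.0]),
    "TREFHT": np.array([288.1, 287.6]),
    "UNREQUESTED": np.array([0.0, 1.0]),
}
variables, attrs = standardize(model_output, RULES, {"institution": "example institution"})
print(sorted(variables), attrs)
```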


CMIP5 Workflow

Motivation: For CMIP6 we will have to postprocess 6 PB of data within the same amount of time. We needed better methods in order to be able to create this amount of data for the community in time for AR6.

We Needed to do Three Things:
1. Increase Performance: Added parallelization into the workflow
2. Reduce Human Intervention: Worked on integrating our workflows into an automated workflow engine
3. Project Management: Everything is coordinated through a central database

[Chart: data volume, CMIP5 200 TB versus CMIP6 6 PB (Current Prediction)]
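To make the scale of the problem concrete, the arithmetic behind these numbers can be worked out directly. The sketch assumes decimal units (1 PB = 1000 TB), which the slides do not specify, and takes "the same amount of time" to mean the 15 months quoted for CMIP5.

```python
TB_PER_PB = 1000  # decimal units; an assumption, the slides do not state the convention

cmip5_tb = 200
cmip6_tb = 6 * TB_PER_PB
months = 15  # CMIP5 postprocessing time quoted earlier in the talk

growth = cmip6_tb / cmip5_tb     # how much more data CMIP6 produces
cmip5_rate = cmip5_tb / months   # TB postprocessed per month for CMIP5
cmip6_rate = cmip6_tb / months   # TB per month needed for CMIP6

print(f"data growth: {growth:.0f}x")
print(f"CMIP5 rate: {cmip5_rate:.1f} TB/month")
print(f"CMIP6 rate needed: {cmip6_rate:.0f} TB/month")
```

Under these assumptions the postprocessing throughput has to rise by roughly a factor of 30, from about 13 TB per month to about 400 TB per month.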


CMIP6 Workflow

[Workflow diagram: Model Run (CESM Model Run); Post-Processing (Time Series Conversion with PyReshaper, PyConform, Diagnostics with PyAverager); Publication (Push to ESGF). Workflow Driven by Cylc. Experiments Update Their Status in Run Database.]

We rewrote tools in Python and added task parallelization.

All of the tools depend on:
• MPI4Py for internode communication
• PyNIO and NetCDF4-python for I/O
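Task parallelization of this kind usually means that each MPI rank works on its own subset of independent tasks. The sketch below shows one simple way to divide a task list, a round-robin split; it is not taken from the tools themselves, the variable names in the list are only examples, and the ranks are simulated so that the code runs without MPI installed.

```python
def partition(tasks, rank, size):
    """Return the tasks assigned to `rank` out of `size` ranks (round-robin)."""
    return tasks[rank::size]

# With mpi4py, rank and size would come from the communicator, for example:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   rank, size = comm.Get_rank(), comm.Get_size()
# Here they are simulated so the sketch runs without MPI.
variables = ["T", "U", "V", "PS", "Q", "Z3", "PRECT"]
size = 3
assignments = {rank: partition(variables, rank, size) for rank in range(size)}
for rank, tasks in assignments.items():
    print(f"rank {rank}: {tasks}")
```

Because every rank can compute its own share from `rank` and `size` alone, no communication is needed to decide who does what.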


Parallelization Methods

PyReshaper

[Figure: time-slice files Slice 1 to Slice 3, each containing Field 1, Field 2, and Field 3, are converted into time-series files Series 1 (Field 1), Series 2 (Field 2), and Series 3 (Field 3), handled by Rank 1, Rank 2, and Rank 3 respectively]

PyAverager

[Figure: the Averages to Compute (AVG 1 to AVG 9) are divided among InterCommunicator 1, InterCommunicator 2, and InterCommunicator 3. Within each, Rank 1, Rank 2, and Rank 3 read the Time-Series Files for Var 1, Var 2, and Var 3 and hold the Time Averages (Avg Var 1, Avg Var 2, Avg Var 3) in Internal Memory; Rank 0 collects them into the Time Averaged Climatology File]

PyConform

[Figure: two operation chains, one per output variable]

"x = X1 + X2"
Read: X1[i]
Read: X2[i]
Evaluate: (X1+X2)[i]
Map: i → j
Validate: > minimum, < maximum, dimensions = [j], etcetera
Write: x[j] File

"y = X1 - X2"
Read: X1[i]
Read: X2[i]
Evaluate: (X1-X2)[i]
Map: i → j
Validate: > minimum, < maximum, dimensions = [j], etcetera
Write: y[j] File
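The PyConform chain on this slide (read, evaluate, map, validate, write) can be sketched as a single function. This is not PyConform's API: `conform`, the index mapping, and the bounds are hypothetical, only the two expressions shown on the slide are handled, and the final write to a file is left out so the example stays self-contained.

```python
import numpy as np

def conform(expression, inputs, index_map, minimum, maximum):
    """Read inputs, evaluate an expression, remap the axis, and validate the result."""
    # Read: pull the arrays the expression refers to.
    x1, x2 = inputs["X1"], inputs["X2"]
    # Evaluate: only the two expressions from the slide are supported here.
    if expression == "X1 + X2":
        result = x1 + x2
    elif expression == "X1 - X2":
        result = x1 - x2
    else:
        raise ValueError(f"unsupported expression: {expression}")
    # Map i -> j: output element j is taken from input element index_map[j].
    mapped = result[index_map]
    # Validate: every value must lie strictly between minimum and maximum.
    if not (mapped.min() > minimum and mapped.max() < maximum):
        raise ValueError("values outside the allowed range")
    return mapped

inputs = {"X1": np.array([1.0, 2.0, 3.0]), "X2": np.array([0.5, 0.5, 0.5])}
reverse = np.array([2, 1, 0])  # hypothetical i -> j mapping that reverses the axis
x = conform("X1 + X2", inputs, reverse, minimum=0.0, maximum=10.0)
y = conform("X1 - X2", inputs, reverse, minimum=0.0, maximum=10.0)
print(x, y)
```

Since the chain for `x` never touches the chain for `y`, the two output variables are independent tasks and can be handed to different ranks.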


PyReshaper Performance

Results are from running the PyReshaper tool on 16 Yellowstone cores, 4 cores on 4 nodes.


PyAverager Performance

[Figure: Time (minutes), log scale, by CESM Model Component (ATM-SE, ICE, LND, OCN, Total); series: NCO, PyAverager]

Time to compute climatology files for 10 years of CESM monthly time slice files. The PyAverager ran on 120 cores on Yellowstone and the diagnostics on 16 Yellowstone cores.

[Figure: Performance Comparison Across Diagnostic Packages. Time (min), log scale; series: Original, PyAverager/NCL in Parallel]


PyConform Performance (Preliminary Timing Numbers)

CESM Case Name            CMIP5 Table  Input Dataset Size  Output Dataset Size  Original Serial Runtime  PyConform Parallel Runtime (16 Procs)  Speedup
b40.rcp4_5.1deg.006       Amon         84 GB               62 GB                72 mins                  2 mins                                 38x
b40.20th.track1.1deg.012  Amon         135 GB              102 GB               120 mins                 8 mins                                 16x
                          3hr          540 GB              506 GB               6 hours                  11 mins                                34x
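Speedup here is the serial runtime divided by the parallel runtime. The short sketch below recomputes that ratio from the runtimes as printed in the table. The recomputed values come out slightly lower than the reported ones; a plausible explanation, though it is only an assumption, is that the reported speedups were derived from unrounded timings.

```python
# (CMIP5 table, serial minutes, parallel minutes, reported speedup) as shown in the table
rows = [
    ("Amon", 72, 2, 38),
    ("Amon", 120, 8, 16),
    ("3hr", 6 * 60, 11, 34),
]

# Speedup = serial runtime / parallel runtime, using the rounded values from the table.
recomputed = [serial / parallel for _, serial, parallel, _ in rows]

for (table, serial, parallel, reported), ratio in zip(rows, recomputed):
    print(f"{table:>4}: {serial} min / {parallel} min = {ratio:.1f}x (reported {reported}x)")
```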


CMIP6 Workflow

We adopted Cylc as our workflow engine (written by Hilary Oliver at NIWA).

We auto-generate the workflow description files from both the CESM and postprocess environments. All the user needs to do is edit a top level script to set certain variables (run length, which diagnostics to run, etc.) and then manually start the run through Cylc's GUI or command line interface.
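The idea of generating a workflow description from a few user settings can be sketched as follows. This is not the CESM-WF generator: the task names, the `settings` dictionary, and the assumption that publication waits for every diagnostic are all hypothetical. The output uses the `A => B` arrow notation found in Cylc dependency graphs, and only the choice of diagnostics is modeled, not the run length.

```python
def build_graph(settings):
    """Return dependency lines in `A => B` arrow notation for one experiment."""
    lines = ["model_run => timeseries", "timeseries => conform"]
    # One diagnostics task per requested component, each fed by the time-series step.
    for component in settings["diagnostics"]:
        lines.append(f"timeseries => diag_{component}")
    # Publication waits for standardization and every requested diagnostic.
    upstream = ["conform"] + [f"diag_{c}" for c in settings["diagnostics"]]
    lines.append(" & ".join(upstream) + " => publish")
    return lines

# Hypothetical top-level settings a user might edit.
settings = {"diagnostics": ["atm", "ocn"]}
graph = build_graph(settings)
print("\n".join(graph))
```

Changing the list of diagnostics changes the generated graph, so the user never edits the dependency lines by hand.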


Use Cases of Experiments That Used Cylc

§ Used Cylc to complete 1,240 out of 1,860 total runs (~750 TB of time-slice output) in about 1 month

§ Used Cylc to run and postprocess part of a 30-member ensemble in a couple of months

§ Used Cylc to build and run over 20,000 forecast ensembles in a couple of months



Questions?

§ PyReshaper: https://github.com/NCAR/pyreshaper
§ PyAverager: https://github.com/NCAR/pyAverager
§ PyConform (still in development): https://github.com/NCAR/PyConform
§ CESM/Cylc WF: https://github.com/NCAR/CESM-WF
§ Cylc: https://cylc.github.io/cylc/

Contact Info: mickelso .at. ucar.edu
