SDMCenter
Experience with Fusion Workflows
Norbert Podhorszki, Bertram Ludäscher
Department of Computer Science
University of California, Davis
kepler-project.org
New Challenges
• The CPES project brought new challenges for Kepler and the workflow automation people:
  • Remote computations, services, and tools
  • Long-running simulations, large amounts of data
  • One-time passwords
• Workflow = "glue":
  • Scientists only need to connect individual components together
  • Automate tedious processes (logins, data copies, control, start/stop)
  • Do it reliably
  • Show what is going on
Workflows
• Real-time monitoring of the simulation:
  • Transfer the current data set to a secondary resource
  • Execute short analysis/visualization routines
  • Display the result
• Archival and post-processing:
  • Transfer, pack, and archive data sets on the fly
Kepler actors for CPES
• Job submission to various resource managers
• Permanent SSH connection to perform tasks on a remote machine
• Generalized actors (workflows themselves) for specific tasks:
  • Watch a remote directory for simulation timesteps
  • Execute an external command on a remote machine
  • Tar and archive data in large chunks to HPSS
  • Transfer a remote image file and display it on screen
  • Control a running SCIRun server remotely
• The actors above do logging/checkpointing, so the final workflow can be stopped and restarted
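The watch-and-checkpoint behavior described in the actor list can be sketched in Python. This is a minimal local sketch, not the Kepler actor itself: it polls a local directory (the real actor watches a remote one over SSH), and the names `watch` and `pending_files`, the JSON checkpoint file, and the `timestep_*` file pattern are all hypothetical. Each file is recorded in the checkpoint only after it has been handled, so a stopped workflow restarts where it left off and, at worst, re-processes the one file that was in flight.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable, Optional


def load_checkpoint(path: Path) -> set[str]:
    """Return the names of files handled by a previous run (empty on first start)."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def save_checkpoint(path: Path, done: set[str]) -> None:
    """Write via a temp file and rename, so a crash never leaves a half-written checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    os.replace(tmp, path)


def pending_files(directory: Path, done: set[str], pattern: str) -> list[Path]:
    """Files matching `pattern` that have not been handled yet, in name order."""
    return [f for f in sorted(directory.glob(pattern)) if f.name not in done]


def watch(directory: Path, checkpoint: Path, handle: Callable[[Path], None],
          pattern: str = "timestep_*", poll_seconds: float = 30.0,
          max_polls: Optional[int] = None) -> set[str]:
    """Poll `directory`, call `handle` once per new file, checkpointing after each one.

    Hypothetical sketch: it does not verify that a file is completely written;
    a real watcher would need a completion marker or a stable-size check.
    """
    done = load_checkpoint(checkpoint)
    polls = 0
    while max_polls is None or polls < max_polls:
        for f in pending_files(directory, done, pattern):
            handle(f)          # process first ...
            done.add(f.name)   # ... then record it, so a crash re-processes rather than skips
            save_checkpoint(checkpoint, done)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
    return done
```

Calling `watch` again with the same checkpoint after a restart skips everything already recorded and handles only newly arrived timesteps.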
Archival Workflow
[Figure: archival workflow diagram with Monitor, Transfer, Convert, and Archive stages]
Plasma physics simulation on 2048 processors on Seaborg@NERSC (LBL): the Gyrokinetic Toroidal Code (GTC), used to study energy transport in fusion devices (plasma microturbulence)
Generates 800 GB of data (3000 files, 6000 timesteps, 267 MB/timestep) over a 30+ hour simulation run
Under workflow control:
• Monitor (watch) simulation progress (via remote scripts)
• Transfer data from NERSC to ORNL concurrently with the simulation run
• Convert each file to an HDF5 file
• Archive the files in 4 GB chunks into HPSS
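Packing files into fixed-size archive chunks, as in the last step of the archival workflow, can be sketched as a greedy pass that keeps files in arrival order and starts a new chunk whenever the next file would overflow the limit. The names `plan_chunks` and `pack_chunk` are hypothetical, reading "4 GB" as 4 GiB is an assumption, and a single file larger than the limit simply gets a chunk of its own. `pack_chunk` writes one chunk with the standard `tarfile` module; moving the tar file into HPSS is not shown.

```python
import os
import tarfile
from typing import Iterable

GiB = 1024 ** 3  # assumption: "4 GB" on the slide read as 4 GiB


def plan_chunks(files: Iterable[tuple[str, int]], limit: int = 4 * GiB) -> list[list[str]]:
    """Greedily group (name, size_in_bytes) pairs, in arrival order, into chunks of at most `limit` bytes.

    A file that alone exceeds `limit` is placed by itself in its own chunk.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in files:
        # Close the current chunk if adding this file would overflow it.
        if current and current_size + size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


def pack_chunk(paths: list[str], out: str) -> None:
    """Write one chunk as an uncompressed tar archive, storing only base file names."""
    with tarfile.open(out, "w") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.basename(p))
```

Under these assumptions, 267 MiB files fit 15 to a chunk (15 × 267 = 4005 MiB ≤ 4096 MiB), so chunks can be closed and archived while the simulation is still producing later timesteps.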
Monitoring Workflow
Future Plans
• Currently we have specialized actors that should be generalized for other disciplines and systems:
  • "watching for" simulation output
  • safe and robust transfer, recovery from failure
  • archiving to different mass storage systems (MSS) with different security policies, robust to failures and maintenance periods
• The next workflow is cyclic, not just streaming:
  • couple two simulations on two resources, transferring data and control between them
  • use the local job manager for code execution
• What about provenance management?
  • it is the main reason to use a scientific workflow system, e.g., in bioinformatics workflows: it is needed for debugging runs, interpreting results, etc.
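Robust transfer with recovery from failure, listed above as a generalization target, commonly starts with retries and exponential backoff. The sketch below is an assumption about how such a step could look, not an existing Kepler actor: `with_retries` and its parameters are hypothetical, the transfer is any Python callable, and the per-attempt log is only a tiny stand-in for the provenance record raised in the last bullet.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
                 log: Optional[list[str]] = None,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying on any exception with delays base_delay * 2**k.

    The final failure is re-raised. Each attempt is appended to `log` if one is given.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for k in range(attempts):
        try:
            result = action()
            if log is not None:
                log.append(f"attempt {k + 1}: ok")
            return result
        except Exception as exc:  # a real transfer step would catch narrower errors
            if log is not None:
                log.append(f"attempt {k + 1}: failed ({exc})")
            if k == attempts - 1:
                raise
            sleep(base_delay * 2 ** k)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the backoff testable; a production step would also need to resume partial transfers rather than restart them, which this sketch does not attempt.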
There is more, e.g., how to get from messy to neat & reusable designs?
Author: Tim McPhillips, UC Davis
The Answer (YMMV)
• Collection-Oriented Modeling & Design (COMAD):
  • embrace an assembly-line metaphor
  • data = tagged nested collections
    • e.g., represented as flattened, pipelined token streams:
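The idea of nested collections flattened into a token stream can be sketched in Python. The token format here is an assumption, not COMAD's actual representation (which the slide does not show): three hypothetical token kinds, `("open", label)`, `("data", value)`, and `("close", label)`, with string data values. `flatten` and `rebuild` convert between the nested and flat forms, and `uppercase_in` is a hypothetical example "actor" on the assembly line that transforms only the data inside collections with a given tag and passes every other token through untouched.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator

# Hypothetical token shape: ("open", label) | ("data", value) | ("close", label)
Token = tuple[str, str]


@dataclass
class Collection:
    """A tagged (labelled) collection whose items are sub-collections or string data."""
    label: str
    items: list = field(default_factory=list)


def flatten(c: Collection) -> Iterator[Token]:
    """Serialize a nested collection into a flat, pipelinable token stream."""
    yield ("open", c.label)
    for item in c.items:
        if isinstance(item, Collection):
            yield from flatten(item)
        else:
            yield ("data", item)
    yield ("close", c.label)


def rebuild(tokens: Iterable[Token]) -> Collection:
    """Inverse of `flatten`: reassemble the nesting from matching open/close tokens."""
    stack: list[Collection] = []
    root = None
    for kind, value in tokens:
        if kind == "open":
            node = Collection(value)
            if stack:
                stack[-1].items.append(node)
            elif root is not None:
                raise ValueError("more than one top-level collection")
            stack.append(node)
        elif kind == "data":
            if not stack:
                raise ValueError("data token outside any collection")
            stack[-1].items.append(value)
        elif kind == "close":
            if not stack or stack[-1].label != value:
                raise ValueError(f"unbalanced close token for {value!r}")
            node = stack.pop()
            if not stack:
                root = node
        else:
            raise ValueError(f"unknown token kind {kind!r}")
    if stack or root is None:
        raise ValueError("unterminated or empty token stream")
    return root


def uppercase_in(label: str, tokens: Iterable[Token]) -> Iterator[Token]:
    """Example actor: transform data only inside collections tagged `label`."""
    depth = 0  # number of enclosing collections that carry `label`
    for kind, value in tokens:
        if kind == "open" and value == label:
            depth += 1
        if kind == "data" and depth > 0:
            yield ("data", value.upper())
        else:
            yield (kind, value)  # everything else passes through unchanged
        if kind == "close" and value == label:
            depth -= 1
```

Because each actor consumes and yields a flat stream, actors chain like stations on an assembly line, for example `rebuild(uppercase_in("t1", flatten(run)))`, without any of them needing to materialize the whole nested structure.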