Date post: | 09-Jan-2017 |
Category: |
Science |
Upload: | daniel-wheeler |
fasteners
psutils
$ corrcli watcher start
Launch watcher with ID: 6ea96596ac29
$ python my_script.py params.json 6ea96596ac29
$ corrcli jobs list
  label     status    time stamp         pid
0 27f427d2  finished  16-06-26 21:03:46  27800
$ corrcli sync
Daniel Wheeler (a) and Faïçal Yannick P. Congo (a, b)
(a) Materials Science and Engineering Division, Material Measurement Laboratory, NIST
(b) LIMOS, Blaise Pascal University, France
Email: [email protected]
SIMULATION MANAGEMENT AND EXECUTION CONTROL
I-What is simulation management?
Data context (metadata) is essential for verifying scientific
research claims. Version control and other workflow tools each capture
different aspects of data context. Simulation management is concerned with the
metadata for a scientific simulation or execution. Execution control tools capture the context
of a simulation execution in the same way that version control captures the history of source code.
References
[1] A. Davison, “Automated Capture of Experiment Context for Easier Reproducibility in Computational Research”, CISE, 2012, DOI: 10.1109/MCSE.2012.41
[2] F. Y. P. Congo, “Building a Cloud Service for Reproducible Simulation Management”, Scipy Proceedings, 2015
[3] F. Y. P. Congo, “Cloud of Reproducible Records”, MGI website, 2015, https://mgi.nist.gov/cloud-reproducible-records
Workflow Tools
Wrapping Tools
II-Execution control in context
Execution Control
Version Control
Ubiquitous, robust
Command line
Web integration
Highly collaborative
Not suitable for capturing execution context
Suitable for recording stable automated executions
Provides log, search and view of execution history
Capture entire simulation context
Version environments
Collaborative
Not collaborative with current tools
Not robust or ubiquitous
Not suitable for log, search and view of history
Suitable for building pipelines of distinct tasks
Clear division of tasks enables flexible and adjustable pipeline implementation by non-experts
Black box design for each section of the pipeline
Monolithic in nature, encouraging isolated ecosystems of tools
IV-Summary
III-Execution Control Apps
$ smt init
$ smt configure --executable=python --main=script.py
$ smt run params.json
$ smt list
538732dcc47d
Record saved at end of execution
Relatively mature code base
Local web view
Computation launched as a subprocess
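The subprocess approach listed above can be sketched in a few lines of Python. This is a minimal illustration and not Sumatra's implementation: the function name `run_and_record`, the record fields and the label scheme are all assumptions made for the example. The computation runs as a child process and the record is assembled only once it has ended.

```python
import json
import subprocess
import sys
import time
import uuid


def run_and_record(command, parameters):
    """Run `command` as a subprocess and build a record once it has ended."""
    start = time.time()
    # The computation is a child process of the recording tool.
    completed = subprocess.run(command, capture_output=True, text=True)
    return {
        "label": uuid.uuid4().hex[:12],  # hypothetical label scheme
        "command": command,
        "parameters": parameters,
        "return_code": completed.returncode,
        "stdout": completed.stdout,
        "duration_s": round(time.time() - start, 3),
        "status": "finished" if completed.returncode == 0 else "failed",
    }


# A trivial stand-in for a simulation script.
record = run_and_record(
    [sys.executable, "-c", "print(2 + 3)"], parameters={"a": 2, "b": 3}
)
print(json.dumps(record, indent=2))
```

Because the record only exists after `subprocess.run` returns, a run that is killed part-way leaves no trace in this sketch, which is one motivation for updating records continuously instead.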
Robust local record store on file system
Use “watcher” to inspect process, no subprocesses
Continuously update records
Sync to CoRR API independently
https://github.com/usnistgov/corr-cli/
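The idea of a continuously updated record in a local file-system store can be illustrated as follows. This is a sketch under stated assumptions and not the CoRR-cli code: `write_record` and `watch` are hypothetical names, the record layout is invented, and the watcher samples its own process to stay within the standard library, whereas a real watcher would inspect a separate process identified by its PID.

```python
import json
import os
import tempfile
import time


def write_record(path, record):
    """Atomically replace the record file so readers never see a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
    os.replace(tmp_path, path)


def watch(path, pid, steps, interval=0.01):
    """Rewrite the record `steps` times while running, then mark it finished."""
    record = {"pid": pid, "status": "running", "updates": 0}
    for _ in range(steps):
        record["updates"] += 1
        # Assumption: CPU time of the current process stands in for the
        # usage figures a real watcher would read from the watched process.
        record["cpu_time_s"] = time.process_time()
        write_record(path, record)
        time.sleep(interval)
    record["status"] = "finished"
    write_record(path, record)
    return record


store = tempfile.mkdtemp()  # throwaway directory acting as the record store
record_path = os.path.join(store, "record.json")
final = watch(record_path, os.getpid(), steps=3)
with open(record_path) as handle:
    on_disk = json.load(handle)
print(on_disk["status"], on_disk["updates"])
```

Writing to a temporary file and renaming it with `os.replace` means a separate sync step can read the store at any moment and still find a complete record.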
Separate API and Cloud apps for scalability
Code base under construction
Centralized cloud platform
https://github.com/usnistgov/corr
Features
Capture version control details
Capture Python dependencies
Capture input and output files
Capture parameters
Capture CPU and memory usage
Capture record commentary
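To make the listed features concrete, the sketch below gathers several of them into one dictionary using only the Python standard library. It is an illustration under assumptions, not the metadata format of Sumatra or CoRR: the function names and field names are hypothetical, dependency capture is approximated by the versions of already imported modules, and CPU and memory usage are left out.

```python
import hashlib
import json
import platform
import subprocess
import sys
import tempfile
from pathlib import Path


def git_revision(directory):
    """Return the current Git commit hash, or None outside a Git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=directory, capture_output=True, text=True,
        )
    except OSError:  # git is not installed
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def module_versions():
    """Versions of imported top-level modules that expose __version__."""
    found = {}
    for name, module in list(sys.modules.items()):
        version = getattr(module, "__version__", None)
        if "." not in name and isinstance(version, str):
            found[name] = version
    return found


def capture_context(script, parameter_file, comment=""):
    """Collect execution context for one run into a plain dictionary."""
    script, parameter_file = Path(script), Path(parameter_file)
    return {
        "version_control": git_revision(script.parent),
        "python": platform.python_version(),
        "dependencies": module_versions(),
        "parameters": json.loads(parameter_file.read_text()),
        "input_files": {
            parameter_file.name: hashlib.sha256(
                parameter_file.read_bytes()
            ).hexdigest()
        },
        "comment": comment,
    }


# Usage with throwaway files standing in for a real script and parameters.
workdir = Path(tempfile.mkdtemp())
(workdir / "my_script.py").write_text("print('hello')\n")
(workdir / "params.json").write_text(json.dumps({"steps": 10}))
context = capture_context(
    workdir / "my_script.py", workdir / "params.json", comment="test run"
)
print(json.dumps(context, indent=2))
```

Hashing input files rather than copying them keeps the record small while still making it possible to detect that an input changed between two runs.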
What is CoRR?
The Cloud of Reproducible Records is a web platform and command line
tool client (CoRR-cli) for recording execution context as a set of records.
Future Work
Overcome government security issues to host live app
Implement searchable table view of data
Implement common metadata standard and interoperability between CoRR and Sumatra
Extend CoRR-cli metadata capture capabilities to match those of Sumatra for Python