
GlobusWorld 2015

Date post: 18-Feb-2017
Upload: tanu-malik
Page 1: GlobusWorld 2015

A Reproducible Framework

Powered By Globus

Tanu Malik, Kyle Chard, Ian Foster

Computation Institute
University of Chicago and Argonne National Laboratory

GeoDataspace


Page 2: GlobusWorld 2015

Share and Reproduce

Alice wants to share her models and simulation output with Bob, and Bob wants to re-execute Alice’s application to validate her inputs and outputs.


Page 3: GlobusWorld 2015

Alice’s Options

1. Create a tar and gzip archive

2. Build a website with model code, parameters, and data

3. Submit to a repository

4. Create a virtual machine


Page 4: GlobusWorld 2015


Bob’s Frustration

1. I cannot find the lib.so required for building the model.

2. How do I?

Lack of easy and efficient methods for sharing and reproducibility

[Figure: axes labeled "Amount of pain Bob suffers" and "Amount of pain Alice suffers"]

Page 5: GlobusWorld 2015

Some Reproducibility Requirements

• Automatically solve the “dependency hell” problem
    • “I have an incompatible version of the library”

• Connect programs with data and capture dataflows
    • “Which version of my program produced this data?”

• Allow easy annotation of human knowledge
    • “Insufficient documentation to install or run the program”

• Enable reproducibility efficiently and with minimal intervention
    • “No change of programming or authoring environments”


Page 6: GlobusWorld 2015


Reproducible Framework

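[Figure: An application on Machine A, together with its data, system files (/bin, /lib, ...), source code, parameters, configuration, and network connections, is captured as SciUnits. The SciUnits are shared and transferred (Docker Hub, GitHub, DataHub; Globus Catalog, Globus Publish) and run on Machine B on an execution platform (Docker, PTU, chroot). The figure numbers these stages 1 to 4.]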

1. Capture the scientific activity

• Capture the source code, the data, the environment, including the flows of data from process to process (local or distributed)

2. Preserve as SciUnits

• Preserve the captured information as physical files or as detailed metadata (annotations and provenance)

3. Share & Distribute

• Share the sciunits with others, including detailed metadata

4. Re-execute and Re-analyze

• Users can run the complete package without installation or configuration.

• Queries for detailed provenance of data and versions.
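
The capture and preserve steps can be illustrated with a minimal sketch. This is not the SciUnits implementation: the manifest layout and the function names `capture`, `preserve`, and `verify` are hypothetical, and the sketch records only file contents, not system libraries or dataflows.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def capture(root: Path) -> dict:
    """Record every file under `root` with a SHA-256 content digest."""
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            files[path.relative_to(root).as_posix()] = digest
    # "annotations" is a placeholder for user-supplied metadata (assumption).
    return {"files": files, "annotations": {}}


def preserve(unit: dict, target: Path) -> None:
    """Write the captured unit as a JSON manifest (hypothetical format)."""
    target.write_text(json.dumps(unit, indent=2, sort_keys=True))


def verify(unit: dict, root: Path) -> list:
    """Return the files that are missing or changed relative to the manifest."""
    problems = []
    for rel, digest in unit["files"].items():
        path = root / rel
        if not path.is_file():
            problems.append(rel)
        elif hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        work.mkdir()
        (work / "params.cfg").write_text("dt=0.1\n")
        unit = capture(work)
        preserve(unit, Path(tmp) / "sciunit.json")
        print(verify(unit, work))
```

A recipient such as Bob could call `verify` on a received copy before re-executing; an empty result means every captured file is present and unchanged.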

Page 7: GlobusWorld 2015

CI Components

• SciUnits
    • Units of scientific activity/research output

• Metadata Catalog
    • A scalable, flexible catalog for annotations conforming to the open-world assumption

• Globus services for sharing, transferring, and publishing sciunits
    • Share/Publish sciunits for others to use

• Replay capability through native re-execution, Docker, or Vagrant
    • Run sciunits without installation, configuration, or metadata information
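
To make the open-world assumption concrete, here is a minimal in-memory sketch. It is not the Globus Catalog API: the class `AnnotationCatalog` and its methods are hypothetical. The point it shows is that any key may be attached to a sciunit without a fixed schema, and a missing annotation means "unknown" rather than "false".

```python
from collections import defaultdict


class AnnotationCatalog:
    """Schema-free key/value annotations attached to sciunit identifiers."""

    def __init__(self):
        self._store = defaultdict(dict)

    def annotate(self, unit_id, key, value):
        """Attach or overwrite one annotation; no key is declared in advance."""
        self._store[unit_id][key] = value

    def lookup(self, unit_id, key):
        """Return the value, or None when nothing is recorded (unknown)."""
        return self._store.get(unit_id, {}).get(key)

    def search(self, **criteria):
        """Return ids of units whose recorded annotations match all criteria."""
        return sorted(
            uid
            for uid, annotations in self._store.items()
            if all(annotations.get(k) == v for k, v in criteria.items())
        )


if __name__ == "__main__":
    catalog = AnnotationCatalog()
    catalog.annotate("unit-1", "domain", "hydrology")
    catalog.annotate("unit-2", "domain", "seismology")
    catalog.annotate("unit-2", "tool", "GPlates")
    print(catalog.search(domain="seismology"))
    print(catalog.lookup("unit-1", "tool"))  # nothing recorded for this key
```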


Page 8: GlobusWorld 2015

Simplifying Data Management for Geoscience Models

Tanu Malik, Ian Foster, Kyle Chard, Joseph Baker, Mike Gurnis, Jonathan Goodall, Scott Peckham


Page 9: GlobusWorld 2015

Science Drivers

Solid Earth

Space Science

Hydrology

CSDMS

GeoDataspace


• http://workspace.earthcube.org/geodataspace

• Software, Source Code, Science Use Cases, Reports, Presentations, News

Page 10: GlobusWorld 2015

Project Goals

• GeoDataspace Project Goals:
    • Establish the reproducible framework with Globus
    • Enable three use cases for establishing geounits in Space Science, Hydrology, and Seismology
    • Make the geounit client widely accessible to the EarthCube community
    • Connect with a model and data repository (CSDMS)


Page 11: GlobusWorld 2015

Science Use Cases

• Seismology: geounits of 2D and 3D kinematic geoscience models, visualized in GPlates while modifying GPML data files
    • End Goal: Sharing, preserving, and publishing visualization sessions with data

• Space Science: geounits on SuperDARN data with analysis tools available from the Baker Laboratory at Virginia Tech
    • End Goal: Sharing and publishing geounits

• Hydrology: geounits of iRODS workflows on hydrology VIC models
    • End Goal: Demonstrating end-to-end reproducibility with iRODS, which does not support data provenance or data publishing


Page 12: GlobusWorld 2015

Acknowledgements

Funders:

Community:


Page 13: GlobusWorld 2015

Reproducible Framework Client

Core

• Provenance Data
• Annotations
• Application Virtualization (Source Code, Data, Environment, Library Dependencies)

Plugins

• Clipboard Events
• Commands
• Web Browsing History
• Metadata from Files
• Ontologies, Dictionaries, Vocabularies

Globus Services (Catalog, Transfer, Share, Publish)

Provenance Services (PROV Data Management)

Page 14: GlobusWorld 2015

geounit Client


1. Support for application virtualization.

2. Provenance collection:
    • audit <program name>, exec <program name> [activity]
    • specific version information collected if part of a VMS

3. Annotation: addannotation <file|dir|g> <key:value>

4. Create packages (Docker- or Vagrant-compliant)

5. Queries: why, what, where

6. Visualizers
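
As an illustration of provenance collection and querying, the sketch below logs which files each activity used and generated, then traces a file back through the activities that produced it. It is not the geounit client: the names `ProvenanceLog`, `record`, `producer`, and `lineage` are hypothetical, and reading a "why" query as "which activities and inputs led to this file" is an assumption, since the slides do not define the query semantics.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str                                       # program that was run
    used: list = field(default_factory=list)        # input files
    generated: list = field(default_factory=list)   # output files


class ProvenanceLog:
    """Append-only log of activities with their inputs and outputs."""

    def __init__(self):
        self.activities = []

    def record(self, name, used, generated):
        self.activities.append(Activity(name, list(used), list(generated)))

    def producer(self, path):
        """Return the most recent activity that generated `path`, or None."""
        for activity in reversed(self.activities):
            if path in activity.generated:
                return activity
        return None

    def lineage(self, path):
        """Trace back from `path` through producing activities to their inputs."""
        steps, frontier, seen = [], [path], set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            activity = self.producer(current)
            if activity is not None:
                steps.append((current, activity.name, tuple(activity.used)))
                frontier.extend(activity.used)
        return steps


if __name__ == "__main__":
    log = ProvenanceLog()
    log.record("preprocess", used=["raw.dat"], generated=["clean.dat"])
    log.record("model", used=["clean.dat", "params.cfg"], generated=["out.nc"])
    for output, program, inputs in log.lineage("out.nc"):
        print(f"{output} was generated by {program} from {inputs}")
```

Files with no recorded producer, such as the raw inputs here, simply end the trace.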

Page 15: GlobusWorld 2015

PROVaaS

