Post on 10-May-2015
www.ci.anl.gov
www.ci.uchicago.edu
A Community Roadmap for Enabling Access to Geosciences Data
Tanu Malik
Ian Foster
Computation Institute
University of Chicago and Argonne National Lab.
tanum@ci.uchicago.edu, foster@anl.gov
Outline
• Access Workshop
• DataSpace
• Post Charette EarthCube
Access is Vital for EarthCube’s Success
• The goal of EarthCube is to create a sustainable infrastructure that enables the sharing of all geosciences data, information, and knowledge in an open, transparent and inclusive manner.
I can't get access to *.
It is difficult for me to *.
I want to integrate data from other disciplines, but *.
Access refers to software and activities that make data and computational resources easily, efficiently and reliably available to scientists across disciplines.
Access Workshop Goals
• Encourage discussions on emergent issues:
– Use of cloud computing
– Exploiting the general principle of moving computation to data
– A technological and governance framework for cross-disciplinary access, service architecture, brokering principles, real-time data, uniform authentication and authorization environment, etc.
– Improving access to data in publications
• Bring some standardization to research data lifecycle issues:
– In general, data, once generated, follow a lifecycle: they are stored, described, processed, transformed, accessed, discovered, analyzed, and curated. In organized networks and campaigns, lifecycle stages are often documented and standardized, though they vary significantly across networks and campaigns. In individual initiatives, the lifecycle stages remain ad hoc and ill-defined. [RDLM-Workshop2011]
• Obtain community consensus on a few use cases
Workshop Activity Outcomes
• Use Case 1: Can I access “not large” but “big data” to conduct statistical analysis?
• Use Case 2: I have a hypothesis not tied to a physical instrument or geophysical parameter. Can I still access all the data, in an “interactive” fashion to test my hypothesis?
• Use Case 3: The storm dust paper is vital to my research. Can I access the data in the publication and change parameters of experiments to understand the nature of storm dust?
Workshop Reflections
• It's all about data!
[Diagram labels: Data; Resources, Services; People; Import; Export]
Workshop Reflections-2
• Discussing technology issues in isolation is a recipe for disaster.
– Access is closely aligned with other subgroups
– It is important to organize in functional units
Workshop Reflections-3
• Challenges will continue
Changing Requirements/Changing Technology
• Real-time data
• Cross-disciplinary data
• High dimensionality
• Network bandwidth, computational resource, and data management constraints
Adoption Culture
• Adoption is slow
• Sustainability
• Establishing practices
Social Challenges
• Transparency
• Openness
• Establishing social ties
Principles of Data Sharing in EarthCube
• Lowers the barrier to entry for data sharing and reuse
• Uses tenets like “metadata ASAP” to encourage submission of data
• Enables creation of “Curation Co-ops” among communities and sub-communities
• Serves the NSF DMP requirement
• Based on a cloud-based infrastructure to support data discovery, access, and mining
Enabling A Data Sharing Space: The DataSpace
• Embrace a “semi-structured” notion
– Ingest data in raw form
– Structuring and refinement of the data and metadata
• Open, extensible architecture that supports the Software as a Service (SaaS) model
– Process for vetting contributed services prior to their incorporation
– Based on on-demand resources
• Emphasis on usability instead of on developing technology/infrastructure
[Diagram labels: DataSpace; Data; Resources, Services; Import; Export]
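The “semi-structured” idea on this slide (ingest data in raw form, then structure and refine the metadata over time) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class `DataSpaceSketch`, its methods `ingest`, `refine`, and `find`, and the in-memory storage are hypothetical and are not part of any EarthCube or DataSpace software.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Record:
    """One ingested item: raw bytes plus whatever metadata is known so far."""
    raw: bytes
    metadata: Dict[str, Any] = field(default_factory=dict)


class DataSpaceSketch:
    """Hypothetical in-memory store; not an EarthCube or DataSpace API."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def ingest(self, raw: bytes, **metadata: Any) -> str:
        """Store data in raw form; metadata is optional at ingest time."""
        # Deriving the id from the content is an assumption made for brevity.
        record_id = hashlib.sha256(raw).hexdigest()[:12]
        self._records[record_id] = Record(raw=raw, metadata=dict(metadata))
        return record_id

    def refine(self, record_id: str, **metadata: Any) -> None:
        """Add or update metadata later, without touching the raw bytes."""
        self._records[record_id].metadata.update(metadata)

    def find(self, **criteria: Any) -> List[str]:
        """Return ids of records whose metadata matches every criterion."""
        return [
            rid for rid, rec in self._records.items()
            if all(rec.metadata.get(k) == v for k, v in criteria.items())
        ]
```

In this sketch a contributor could call `ingest` with nothing more than a title, and a curator could call `refine` later to attach a discipline or units, at which point `find` starts returning the record for those criteria.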
Post-Charette
• 2 EarthCube PI meetings at University of Colorado, Boulder
– A Concept group meeting
o Some representation from Community groups
o July 10, 2012
– A Concept and Community group meeting
o October 4-5, 2012
• Primary objective: Convergence
– Through Roadmaps
– Architecture
– On future steps
Highlights: Summary of Roadmaps
• Workplace to collaborate
• Lower barriers for participation
• Openness and extensibility
• Feedback and reproducibility
• Discovery of materials held by long-tailed scientists
• Education and reward system for scientists
• Cross-domain teams and broad collaboration
• A new community paradigm
Defining DataSpace: Architecture-1
[Diagram labels: Data; Resources, Services; Import; Export]
Defining DataSpace: Architecture-2
Acknowledgements
• Don Middleton, NCAR
• Robert Gibb, New Zealand Landcare Research
• Jeff Heard, U. of North Carolina
• Doug Lindholm, U. of Colorado
• Joseph Baker, Virginia Tech
• Anne Wilson, U. of Colorado
• Chris Lynnes, NASA/ESIP Federation
• Karsten Steinhauser, U. of Minnesota
• Ruth Duerr, NSIDC
• Dave Fulker, OPeNDAP
• Amarnath Gupta, UCS
• Robert Jacob, ANL
• Chris Jenkins, JPL
• Craig Mattocks, U. Miami
• Beth Plale, Indiana Univ.
• Stephen M. Richard, AZGS
• Sameer Sirugeri, Microsoft
• Zhangfan Xing, JPL
• John Williams, NCAR
Thank You!
• Tanu Malik, tanum@ci.uchicago.edu
• Ian Foster, foster@anl.gov
• Questions?