Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Date post: 05-Jan-2016
Transcript
Page 1: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

NASA Distributed Active Archive Center for Biogeochemical Dynamics

Geoscience Data Repository in Digital Object Model and Open-Source Frameworks: Provenance Applications (ESDORA Project)

Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

NASA ORNL DAAC, Environmental Science Division, Oak Ridge National Laboratory

Page 2: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Agenda

1. Geoscience Data Curation
2. System Components & Digital Object Model
3. Capabilities & OAIS Mapping
4. Provenance Applications
5. Concluding Remarks

Page 3: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Digital Data Curation

Maintaining and adding value to a trusted body of digital information for current and future use throughout its lifecycle

Page 4: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Important Aspects of Data Curation

• Auditing: What changed and when (contextual environment, status).
• Lineage and provenance: The derivation history of data is formally recorded and is understandable to both machines and humans, now and in the future.
• Versioning: Keep earlier versions of a data stream in the data system so that we can revert to an earlier version if needed.
• Identifier: Data are identifiable and citable using a standardized scheme, e.g., the Digital Object Identifier (DOI) system.
• Integrity: The integrity of data files is verifiable at any time in their lifecycle.
• Interoperable/accessible for the long term: Accessible with ease by users and software.
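
The versioning and auditing aspects on this slide can be illustrated with a minimal Python sketch. The class and method names (`VersionedDatastream`, `add_version`, `revert`) are hypothetical and are not ESDORA or Fedora APIs; the sketch only shows the idea of never overwriting an earlier version and of logging what changed and when.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VersionedDatastream:
    """Hypothetical append-only datastream: old versions are never overwritten."""
    name: str
    versions: list = field(default_factory=list)   # content of every version
    audit_log: list = field(default_factory=list)  # (timestamp, action, note)

    def _audit(self, action, note):
        # Record what changed and when.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, note))

    def add_version(self, content, note=""):
        self.versions.append(content)
        self._audit("add_version", note)

    def current(self):
        return self.versions[-1]

    def revert(self, index, note=""):
        # Reverting appends a copy of the earlier version, so history is kept.
        self.versions.append(self.versions[index])
        self._audit("revert", note or f"reverted to version {index}")


ds = VersionedDatastream("FGDC")
ds.add_version("<metadata>v0</metadata>", "initial ingest")
ds.add_version("<metadata>v1</metadata>", "fixed title")
ds.revert(0)
actions = [action for _, action, _ in ds.audit_log]
print(ds.current(), len(ds.versions), actions)
```

Treating a revert as a new version, rather than deleting later ones, keeps the audit trail complete; that is a design choice of this sketch, not a statement about how the repository implements it.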

Page 5: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

The Challenge

• A tremendous amount of geoscience data is being generated, so digital curation needs to be in place for preservation and reuse.

• Yet there is no generic, interoperable system to manage, preserve, and deliver relevant metadata and data-processing lineage information along with the actual content.

Page 6: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

ESDORA: A Complete Data System Built on Fedora Digital Object Model

• Archive Management: Fedora Repository
• User Interface: Drupal & Islandora
• Search & Discovery: Apache Solr & Fedora Semantic Store

http://esdora2.ornl.gov/

Page 7: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Fedora Digital Object Model

[Figure labels: Content; Digital Object; XML Encoding. Digital object components: Object Info (ID), Semantics, Audit, Metadata 1, Metadata 2, …, Content 1, Content 2, …]

(Payette, S. and C. Lagoze, 1998)

Page 8: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

ESDORA Capabilities:

• Metadata and data managed together in one logical unit
• Integrity checks, versions, and audit trails
• Machine-readable semantics for provenance knowledge
• XML encoding for long-term storage, access, and recovery
• Search, discovery, metadata publishing
• Multiple standards (FGDC, ISO, EML, etc.) accommodated (we use FGDC)

Page 9: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

OAIS Reference Architecture

Page 10: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Information Unit

Logical information units (packages) for ingestion, management, dissemination

• OAIS – SIP: Submission Information Package
• OAIS – AIP: Archival Information Package
• OAIS – DIP: Dissemination Information Package

Page 11: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

ESDORA SIP

Data set (folder)
-- Metadata (folder with structured and non-structured metadata files)
-- Data (folder with actual data files)

[Figure: digital object components shown alongside the folder: Object Info (ID), Semantics, Audit, FGDC Metadata, Free Text Metadata, …, Policy, Data Content]
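
The folder layout of the submission package can be sketched in Python as follows. This is only an illustration under stated assumptions: the subfolder names `Metadata` and `Data` are taken from the slide, while the function name `scan_sip`, the file names, and the shape of the returned dictionary are hypothetical and do not describe the actual ESDORA ingest code.

```python
import tempfile
from pathlib import Path


def scan_sip(dataset_dir):
    """Group files of a data set folder by its 'Metadata' and 'Data' subfolders."""
    root = Path(dataset_dir)
    package = {"id": root.name, "metadata": [], "data": []}
    for key, sub in (("metadata", "Metadata"), ("data", "Data")):
        folder = root / sub
        if folder.is_dir():
            # Relative paths keep the package independent of where it is staged.
            package[key] = sorted(
                p.relative_to(root).as_posix()
                for p in folder.rglob("*")
                if p.is_file()
            )
    return package


# Build a throwaway data set folder to demonstrate the layout.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "example_dataset"
    (dataset / "Metadata").mkdir(parents=True)
    (dataset / "Data").mkdir()
    (dataset / "Metadata" / "fgdc.xml").write_text("<metadata/>")
    (dataset / "Metadata" / "readme.txt").write_text("free text")
    (dataset / "Data" / "grid.nc").write_bytes(b"\x00")
    sip = scan_sip(dataset)

print(sip)
```

Each list in the result would correspond to one group of datastreams (metadata or data content) in the digital object built from the folder.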

Page 12: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

ESDORA AIP (data/metadata coexist)

[Figure: digital object components: Object Info (ID), Semantics, Audit, FGDC Metadata, Free Text Metadata, …, Policy, Data Content]

Page 13: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Inline Metadata Editor

Page 14: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

ESDORA DIP

• REST Web Services
• Data Objects
• Collection Objects
• Datastreams
• Metadata in OAI-PMH
• Indexing & Search

http://esdora2.ornl.gov/oaiprovider/?verb=ListRecords&metadataPrefix=fgdc
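
A harvesting client for the OAI-PMH endpoint above might look like the following Python sketch. The base URL and the `fgdc` metadata prefix come from the slide (the server may no longer be reachable, so no request is sent); the sample response and its identifiers are made up for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "http://esdora2.ornl.gov/oaiprovider/"  # from the slide; may not be live


def list_records_url(metadata_prefix="fgdc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return BASE_URL + "?" + query


def record_identifiers(xml_text):
    """Return the header identifier of every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


# Hand-written sample response; the identifiers are invented for this example.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url())
print(record_identifiers(SAMPLE))
```

A real harvester would also follow the `resumptionToken` that OAI-PMH uses to page through large result sets; that step is left out here.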

Page 15: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Solr-Enabled Indexing & Search

• Simple Keyword Search
• Faceted Search
• Spatial/Temporal Search
• Result linked to data objects
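
As a rough illustration of how keyword, faceted, and temporal search combine in a single Solr request, the Python sketch below composes a query URL from Solr's standard `q`, `fq`, `facet`, and `facet.field` parameters. The endpoint address and the field names `keyword` and `year` are assumptions made for the example and are not taken from the ESDORA index schema; spatial filtering is not shown.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed endpoint, not ESDORA's


def build_search_url(keyword, facet_field=None, year_range=None):
    """Compose a Solr select URL for keyword, faceted, and temporal search."""
    params = [("q", keyword), ("wt", "json")]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    if year_range:
        start, end = year_range
        # Range filter on a hypothetical integer field named 'year'.
        params.append(("fq", f"year:[{start} TO {end}]"))
    return SOLR_SELECT + "?" + urlencode(params)


url = build_search_url("land cover", facet_field="keyword", year_range=(2000, 2010))
decoded = parse_qs(urlsplit(url).query)  # round-trip check of the encoding
print(url)
print(decoded["fq"])
```

Putting the temporal constraint in `fq` rather than `q` keeps it out of relevance scoring, which is the usual reason to use filter queries for range restrictions.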

2011 NASA ESDSWG Meeting, Newport News, VA

Page 16: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Provenance in ESDORA

Page 17: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Where should provenance be stored?

[Figure labels: users; Application; internal metadata sources (often file system); structured metadata stores (database or indexing engine); external metadata sources on the Web]

In software applications: BAD

In accompanying files: BAD

In structured metadata records: BAD if not linked to data

Semantically a part of the content system: GOOD

Page 18: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

ESDORA: Metadata & semantic relations are stored in the same digital object as the data content

[Figure: a digital object containing DOI, FGDC, ISO, Read me, Guide docs, Semantics, Datastream 1, …, Datastream x; an Application box queries the object]

Application uses semantic queries for knowledge stored in objects

Page 19: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Synthetic Land Cover Data Chain (SYNMAP)

(Modeling and Synthesis Thematic Data Center, MAST-DC)

[Figure: AVHRR_CFTC, MODIS_GLC, GLCC, GLC2000 → Original_SYNMAP → Analyzed_SYNMAP, Potential_SYNMAP]

To provide a standardized land cover map for the Multi-scale Synthesis and Terrestrial Model Intercomparison Project, the Original SYNMAP is assembled from four independent products. It is in turn reprocessed (common resolution, extent, CF-compliant NetCDF) to produce the Analyzed SYNMAP and Potential SYNMAP at global and North American scales.
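
The SYNMAP chain can be written down as a small derivation graph and walked to list everything a product depends on. The Python sketch below is illustrative only: the edges are read from the diagram labels on this slide, and the dictionary layout and the `lineage` function are hypothetical, not how ESDORA stores or queries the chain.

```python
# "Derived from" edges, as read from the SYNMAP chain on the slide.
DERIVED_FROM = {
    "Analyzed_SYNMAP": ["Original_SYNMAP"],
    "Potential_SYNMAP": ["Original_SYNMAP"],
    "Original_SYNMAP": ["AVHRR_CFTC", "MODIS_GLC", "GLCC", "GLC2000"],
}


def lineage(product, graph=DERIVED_FROM):
    """Return every upstream product of `product`, nearest ancestors first."""
    seen, queue = [], list(graph.get(product, []))
    while queue:
        source = queue.pop(0)
        if source not in seen:
            seen.append(source)
            # Breadth-first: enqueue the sources of this source.
            queue.extend(graph.get(source, []))
    return seen


print(lineage("Analyzed_SYNMAP"))
# ['Original_SYNMAP', 'AVHRR_CFTC', 'MODIS_GLC', 'GLCC', 'GLC2000']
```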

Page 20: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Provenance: Data derivation history

Data derivation history is recorded and stored in the Fedora RDF semantic store. The semantic store is indexed and can be queried using SPARQL and iTQL.

Object: Analyzed_SYNMAP

Processing info…

Semantics (RDF): This object is “DerivedFrom” Original_SYNMAP
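
A query for the "DerivedFrom" relation could be composed along the following lines. This Python sketch assumes a Fedora 3.x style Resource Index Search endpoint that accepts `type`, `lang`, `format`, and `query` parameters; the endpoint address, the predicate URI, and the object identifier are all placeholders invented for the example, so the details should be checked against the repository's own documentation. No request is sent.

```python
from urllib.parse import urlencode

# All three constants are illustrative assumptions, not values from ESDORA.
RISEARCH = "http://localhost:8080/fedora/risearch"
DERIVED_FROM = "http://example.org/relations#DerivedFrom"
PID = "example:Analyzed_SYNMAP"


def derived_from_query(pid, predicate=DERIVED_FROM):
    """SPARQL asking which objects the given object is derived from."""
    return f"SELECT ?source WHERE {{ <info:fedora/{pid}> <{predicate}> ?source }}"


def risearch_url(query):
    """Wrap the query in a tuple-search request that asks for CSV output."""
    params = {"type": "tuples", "lang": "sparql", "format": "CSV", "query": query}
    return RISEARCH + "?" + urlencode(params)


query = derived_from_query(PID)
print(query)
print(risearch_url(query))
```

The query covers a single derivation step; following the whole chain would mean repeating it for each returned source.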

Page 21: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Provenance: Granule checksums
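
Checksum-based integrity checking of a granule can be sketched in Python as below. The slide does not say which algorithm ESDORA records, so SHA-256 is an assumption here, and the file name and helper functions are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large granules need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected, algorithm="sha256"):
    """True when the file still matches the checksum recorded at ingest."""
    return file_checksum(path, algorithm) == expected.lower()


with tempfile.TemporaryDirectory() as tmp:
    granule = Path(tmp) / "granule.dat"
    granule.write_bytes(b"example granule bytes")
    recorded = file_checksum(granule)      # stored at ingest time
    intact = verify(granule, recorded)     # later integrity check
    granule.write_bytes(b"corrupted bytes")
    damaged = verify(granule, recorded)    # the same check after a change

print(intact, damaged)
```

Storing the recorded value alongside the data content is what lets the check be repeated at any later point in the lifecycle.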

Page 22: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Provenance: Auditing trail and versioning history

Page 23: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

Concluding Remarks

The digital object model abstraction reduces the complexity of data curation.

Object semantics and XML encoding can be used to preserve provenance knowledge as well as descriptive metadata.

The integrated system addresses many metadata and provenance issues and can be used as an archive system for geoscience data content.

Page 24: Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Giri Palanisamy, Robert Cook

• http://esdora2.ornl.gov/

• Acknowledgement: This work is funded by NASA ACCESS Grant # 09-ACCESS09-8

• The team would like to thank Stephen Berrick for progress reviews and guidance

• Contact: Jerry Pan, [email protected]
