Emerging Provenance/Context Content Standard Discussion at Data Stewardship Committee Session at ESIP Federation Meeting January 5, 2012 H. K. “Rama” Ramapriyan and John Moses ESDIS Project NASA Goddard Space Flight Center [email protected]; [email protected]
Page 1: Emerging Provenance/Context Content Standard Discussion at Data Stewardship Committee Session at ESIP Federation Meeting January 5, 2012 H. K. “Rama” Ramapriyan.

Emerging Provenance/Context Content Standard

Discussion at Data Stewardship Committee Session at ESIP Federation Meeting

January 5, 2012

H. K. “Rama” Ramapriyan and John Moses

ESDIS Project

NASA Goddard Space Flight Center

[email protected]; [email protected]

Session Objectives

• Provide update on status
• Identify applicable representation standards
• Discuss use cases we currently have
• Show sample mapping of use cases to content items in PCCS matrix
• Identify any new use cases needed
• Discuss next steps

Status (1 of 3)

• Received inputs from Ted Habermann et al. (NOAA)
• Merged with NASA’s inputs based on USGCRP Workshop (1998) Report, and discussions (2010-2011) with EOS instrument teams (GLAS, HIRDLS) and TOMS instrument PI
  – Note – USGCRP workshop, jointly sponsored by NASA and NOAA, identified a number of scenarios (use cases) from which content recommendations were derived
• Content matrix was developed and posted on ESIP Data Stewardship and Preservation Cluster wiki
  – Initial version – March 1, 2011
  – Latest version (incorporates comments from cluster members) – June 8, 2011
  – Focused mostly on satellite remote sensing data; need to ensure we cover other types of data (aircraft, in situ)
  – Developed use cases during last ESIP meeting (July 2011)

Status (2 of 3)

• NASA Earth Science Data Preservation Content Specification
  – Data System Requirements are included in all satellite mission Level 1 Requirements. Preservation requirement is stated as:
    • The <<project name>> shall transfer to the <<designated NASA Earth Science Division-assigned Data Center(s)>> all the information and documentation required for long-term preservation of knowledge about the products resulting from <<project name>>, as defined in the NASA Earth Science Data Preservation Content Specification document published at http://earthdata.nasa.gov/about-eosdis/requirements, and shall baseline to a specific initial version.
  – Document has been prepared and baselined (available at URL above) as of November 23, 2011
    • Summarizes content of PCCS matrix including item descriptions and rationale
    • Current baselined version has been reviewed by a broad set of organizations
      – DAACs, SMAP, ICESat-II, EOS Instrument Teams, MEaSUREs, DAAC User Working Groups, ESDIS, NASA HQ
    • Accounts for concerns of data providers as well as users
    • Provides some flexibility, allowing for variations from project to project in approach to meeting the specifications – requires projects to provide checklist to show how specifications are met

Status (3 of 3)

• NASA Earth Science Data Preservation Content Specification
  – An evolution path has been provided for the specification - schedules and versions are intentionally defined to balance flight project needs for early specificity with deliberative and evolving nature of standards’ work by TIWG’s Data Stewardship Group, SPG, ESIP, IEEE, ISO
  – Provides opportunity for flight projects to select a version to baseline early, but review (as part of ESDIS CCB process) later versions and adopt if feasible
  – Revisions take into account concerns by existing projects
  – CCB-baselined version (version 2) has been posted (mid-Nov. 2011)
  – These will be reflected in an updated matrix (ESIP version)
  – Updates scheduled for Spring 2012, 2014 and 2016

Standards that Prescribe and/or Guide Construction of Preservation Content

What are the prominent standards (and groups) that guide or prescribe content which overlap or contribute to preservation content requirements?

• Metadata for interoperability in dataset search and access (ISO TC 211)
  – ISO 19115-1 Collection and Granule Metadata for geographic information (MD package)
  – ISO 19115-2 NOAA-NASA sponsored revision for imagery and gridded data (MI package)
  – ISO 19110 Geographic information for feature entity and attribute cataloguing
  – ISO 19119 Metadata for services
  – ISO 19157 How to describe, evaluate and report data quality
  – ANSI Content and Collection Management Standards and FGDC Digital Geospatial Metadata (ISO 19115 NAP) standards feed into ISO

• Open Geospatial Consortium (OGC)
  – Sensor Web Enablement (SWE) Sensor Observation Service and WCS, WFS, WMS serving netCDF and HDF
  – SensorML (instrument specifications)

• Consultative Committee for Space Data Systems (CCSDS)
  – Open Archival Information System (OAIS) Reference Model (nomenclature)
  – Satellite data requirements (i.e., Level 0 data content), XML Formatted Data Unit

• Library and Publishing applications (e.g., NISO, NARA, Library of Congress)
  – Dublin Core
  – Metadata Encoding and Transmission Standard (METS)
  – PREMIS

ISO Metadata standards supporting documentation content

Column headings: Category, Content Item, Path to ISO Standard

Preflight/Pre-Operations Calibration
• Instrument Description
  – /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:instrument
• Pre-operational Calibration Data

Science Data Products
• Raw Data and Derived Products
• Metadata
  – ISO 19115-1 Geographic Information (MD package), ISO 19115-2 Extensions for imagery and gridded data (MI package); ISO 19157 Data Quality, ISO 19119 Services

Science Data Product Documentation
• Product Team
  – /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact or gmd:credit
• Product Requirements (basis, structure & format)
  – /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:acquisitionRequirement
  – /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract; purpose; environmentDescription
  – /gmi:MI_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorFormat
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:algorithm
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source/gmd:LI_Source/gmd:sourceStep/gmi:LE_ProcessStep/gmd:description
• Processing and Algorithm Version History
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep
  – /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:resourceMaintenance/gmd:MD_MaintenanceInformation/gmd:maintenanceNote
• Product Generation Algorithm
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source/gmd:LI_Source/gmd:sourceStep/gmi:LE_ProcessStep/gmd:description
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:softwareReference and /gmi:algorithm
• Product Quality
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmd:DQ_QuantitativeAttributeAccuracy/gmd:evaluationMethodDescription
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmd:DQ_QuantitativeAttributeAccuracy/gmd:result/gmi:QE_CoverageResult/gmi:resultContentDescription/gmi:MI_CoverageDescription/gmd:dimension
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmi:QE_Usability/gmd:result/gmd:DQ_ConformanceResult
• Product Application

Mission Data Calibration
• Calibration Method
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source/gmd:LI_Source/gmd:sourceStep/gmi:LE_ProcessStep/gmd:description
• Calibration Data

Product Software
• Software documentation
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:softwareReference and /gmi:algorithm
• Product software

Algorithm Inputs
• Ancillary data documentation
  – /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source
  – gmd:dimension (CoverageContentTypeCode=auxilliaryData)
• Ancillary data
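
The paths on this chart read as XPath-style locations inside an ISO 19115-2 XML record. As a rough sketch of how an archive might check which content items a record actually documents, the Python below looks up three of those paths with the standard library. The sample record is made up and heavily trimmed for illustration, the function name `find_present_items` is hypothetical, and the paths are written relative to the `gmi:MI_Metadata` root rather than with the leading `/gmi:MI_Metadata`.

```python
import xml.etree.ElementTree as ET

# Namespace URIs commonly used by the ISO 19139 XML encoding of 19115 / 19115-2.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
}

# A subset of the chart's paths, relative to the gmi:MI_Metadata root element.
CONTENT_ITEM_PATHS = {
    "Instrument Description":
        "gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:instrument",
    "Product Team":
        "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact",
    "Processing and Algorithm Version History":
        "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep",
}

# Hypothetical, heavily trimmed record (not a valid, complete ISO record).
SAMPLE_RECORD = """<gmi:MI_Metadata
    xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:pointOfContact/>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  <gmi:acquisitionInformation>
    <gmi:MI_AcquisitionInformation>
      <gmi:instrument/>
    </gmi:MI_AcquisitionInformation>
  </gmi:acquisitionInformation>
</gmi:MI_Metadata>"""


def find_present_items(xml_text):
    """Return {content item: True/False} depending on whether its path exists."""
    root = ET.fromstring(xml_text)
    return {
        item: root.find(path, NS) is not None
        for item, path in CONTENT_ITEM_PATHS.items()
    }


report = find_present_items(SAMPLE_RECORD)
for item, present in report.items():
    print(f"{item}: {'present' if present else 'missing'}")
```

A check like this only confirms that an element exists at the path; whether its content is adequate for preservation still needs human review.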

Use Cases - Purpose

• Identify what content items are required and why
• Mapping of use cases to content items helps with rationale and priority for preserving them
• If no use cases are identified for some of the items
  – Look harder for use cases, or
  – Eliminate item from PCCS
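
The mapping exercise described above can be sketched in a few lines of Python: invert a use-case-to-item mapping and flag the content items that no use case supports. The mapping shown is purely hypothetical and chosen only to illustrate the mechanics; it is not the mapping presented in the session, and the function name `support_by_item` is an assumption.

```python
# Content items to check (a small subset, for illustration).
CONTENT_ITEMS = [
    "Instrument Description",
    "Calibration Data",
    "Product Quality",
    "Product Application",
]

# HYPOTHETICAL mapping of use cases to the content items they rely on.
USE_CASE_TO_ITEMS = {
    "Ozone Trends and Tropospheric Aerosols": [
        "Instrument Description",
        "Calibration Data",
    ],
    "Snow Cover Analysis": ["Product Quality"],
}


def support_by_item(items, mapping):
    """Invert the mapping: for each content item, list the use cases needing it."""
    support = {item: [] for item in items}
    for use_case, needed_items in mapping.items():
        for item in needed_items:
            # setdefault keeps items that appear only in the mapping.
            support.setdefault(item, []).append(use_case)
    return support


support = support_by_item(CONTENT_ITEMS, USE_CASE_TO_ITEMS)

# Items with no supporting use case are candidates for "look harder or eliminate".
unsupported = [item for item, cases in support.items() if not cases]
print(unsupported)
```

The number of use cases behind each item could also feed a priority ranking, though how to weight them is a judgment call for the group.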

Use Cases – ESIP Summer Meeting (2011)

• Creating a data set:
  – Creating a long term trend data set from multiple data sets
    • A researcher 100 years in the future is examining the historical record.
  – Giving credit to people involved in the data set
  – Describing appropriate uses of a data set
    • Asserting quality of data set
    • Intellectual property rights
    • Policies
    • Creating citations for published data set
    • Validation of data
• Using a data set:
  – Obtaining Data
    • Discovering data
      – Choosing a data set from multiple similar choices.
    • Assessing data
  – Applications for data
    • Analysis of new version of data set
      – How do the improvements affect our use of the data?
    • Sharing data set for collaboration
    • Comparing multiple data sets
    • Reproducing a dataset
    • Verification of an experiment

Use Cases – current list

• USGCRP Workshop Report (1998)
  – Ozone Trends and Tropospheric Aerosols
  – Precipitation Trend Analysis
  – Near-Surface Temperature Analysis
  – Ocean Frontal Analysis
  – Snow Cover Analysis
  – Ocean Topography
• ESIP Summer Meeting (2011)
  – Long list developed (see next chart)
  – Obtaining Data
    • Involves all of the ways users obtain data. The mechanisms considered range from ad-hoc methods (consulting colleagues at a meeting, finding out about a data set, and contacting the scientist who has it to get it) to using one of the major data centers’ systems (or GCMD) to find, assess and obtain relevant data sets.
  – Choosing a data set from multiple similar choices
    • A research user needs to pick, from multiple similar data sets, the one that best meets the user’s requirements for their intended application. E.g., a polar bear ecologist choosing a data set on sea ice conditions in a region of the Hudson Bay from the multiple data sets listed at NSIDC.
  – Reproducing a dataset

Preservation Content Use Cases

Column headings: Content Item, Category

Preflight/Pre-Operations Calibration
• Instrument Description
• Preflight/Pre-operational Calibration Data

Science Data Products
• Raw Data and Derived Products
• Metadata

Science Data Product Documentation
• Product Team
• Product Requirements (basis, structure & format)
• Processing and Algorithm Version History
• Product Generation Algorithm
• Product Quality
• Product Application

Mission Data Calibration
• Calibration Method
• Calibration Data

Science Data Product Software
• Science data product generation software and software documentation

Science Data Product Algorithm Inputs
• Ancillary data and documentation

Science Data Product Validation
• Datasets and documentation

Science Data Software Tools
• Software and documentation

Use cases

• Ozone Trends and Tropospheric Aerosols: Intercomparison with new in-situ observation instruments suggests previously disregarded satellite measurements were real. Recalculated TOMS measurements indicate earlier ozone depletion over Antarctica.
• Precipitation Trend Analysis: Including weather satellite data and accounting for measurement [station location and orbit] disparities to produce a reliable record.
• Near-Surface Temperature Analysis: Resolving discrepancy between land-surface weather station measurements and temperatures derived from satellite microwave instruments.
• Ocean Frontal Analysis: Detecting seasonal location of physical and biological mixing properties using long time series satellite derived sea surface temperature fields from Level 1 data and in situ measurements.
• Snow Cover Analysis: Improvements in snow cover extent seasonal changes and long term trends resulted from capability to reprocess with new land-ocean mask and snow cover filters.
• Ocean Topography: Radar altimeters map ocean surface topography to find small changes over a long period of time. Results depend on reducing Earth location errors.

Use case: Every example of successful re-analysis of archived data has depended on quality documentation (re: pg 1, GCSR for LTA)

Key Items

Category Descriptions (1 of 2)

Column headings: 1. Category, 2. Content Item, 3. Definition/Description

Preflight/Pre-Operations Calibration
• Instrument Description: Documentation of instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, instrument geometric calibration (geo-location offsets), noise characteristics, etc.).
• Preflight/Pre-operational Calibration Data: Numeric (digital data) files of instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, instrument geometric calibration (geo-location offsets), noise characteristics, etc.).

Science Data Products
• Raw Data and Derived Products: Raw data are data values at full resolution as directly measured by a spaceborne, airborne or in situ instrument. Derived products are higher level products (level 1b through 4) where calibration and geo-location transformations have been applied to generate sensor units, and/or algorithms have been applied to generate gridded geophysical parameters.
• Metadata: Information about data to facilitate discovery, search, access, understanding and usage associated with each of the data products.

Science Data Product Documentation
• Product Team: Names of key science team leads and product team members (development, help desk and operations), roles, performing organization, contact information, sponsoring agencies or organizations and comments about the products.
• Product Requirements: Requirements and designs for each science data product, either explicitly or by reference to the requirements/design documents. Product requirements and designs should include content, format, latency, accuracy and quality.
• Processing and Algorithm Version History: For all products held in the archive, documentation of processing history and production version history, indicating which versions were used when, why different versions came about, and what the improvements were from version to version. For all products held in the archive, the versions of source code used to produce the products should be available at the archive.
• Product Generation Algorithm: Detailed discussion of processing algorithms, outputs, error budgets and limitations. Processing algorithms and their theoretical (scientific and mathematical) basis, including complete description of any sampling or mapping algorithm used in creation of the product, geo-location, radiometric calibration, geophysical parameters, algorithm software documentation, and high-level data flow diagrams. Description of how the algorithm is numerically implemented.
• Product Quality: Description of the impact to product quality due to issues with computationally intensive operations (e.g., large matrix inversions, truncation and rounding). Documentation of product quality assessment (methods used, assessment summaries for each version of the datasets). Description of embedded data at the granule level including quality flags, product data uncertainty fields, data issues logs, etc. Relevant test reports, reviews, and appraisals.
• Product Application: Useful references to published articles about the use of the data and user feedback received by the science and instrument teams about the products. Includes reports of any peculiarities or notable features observed in the products.

Category Descriptions (2 of 2)

Column headings: 1. Category, 2. Content Item, 3. Definition/Description

Mission Data Calibration
• Calibration Method: The methods used for instrument/sensor radiometric and geometric calibration while in operation (e.g., in orbit). The source code used in applying the calibration algorithms. Documentation of in-line changes to calibration or to instrument or platform operations or conditions that occur throughout the mission.
• Calibration Data: Instrument and platform engineering data collected during operations (e.g., on orbit), including platform and instrument environment, events and maneuvers; attitude and ephemeris; aircraft position; acquisition logs that record data gaps; calibration look-up tables; and any significant external event data that may have impacted the observations.

Science Data Product Software
• Science data product generation software and software documentation: Source code used to generate products at all levels in the science data processing system. Software release notes, including references to versions of operating systems, compilers, commercial software libraries used in the code. Versions of science data product software should be archived for each major product release. A major product release is characterized by the appearance of peer reviewed publications where reported results are based on the product version.

Science Data Product Algorithm Inputs
• Ancillary data and documentation: Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product, either explicitly in data descriptions or by reference to appropriate publications. Ancillary data should be stored with the products unless it is available from another permanent archive facility.

Science Data Product Validation
• Datasets and documentation: Accuracy of products, as measured by validation testing, and compared to accuracy requirements. Description of validation process, including identification of validation data sets, measurement protocols, data collection, analysis and accuracy reporting.

Science Data Software Tools
• Software and documentation: Product access (reader) tools. Software source code that would facilitate use of the calibration data, ancillary data and the data products at all levels. Includes software source code useful for creating programs that will read and display the calibration data, ancillary data and product data and metadata values. Commercial tools should be identified with appropriate references.

European Space Agency Archive Content Descriptions

Column headings: NASA ES Category, ESA LTDP Dataset Composition Content Descriptions

Pre-Mission
• Mission, Sensor/Instrument requirements
• Sensor/Instrument characteristics
• Measurements qualification process
• Calibration data and methods

Data Products
• Raw data and/or Level 0 equivalent
• Processed data (e.g., L1, L2 or upper levels)
• Metadata

Product Documentation
• Data/Products structures and formats descriptions
• Processing algorithms and scientific bases
• Processing evolution (e.g. changes, history)
• Product qualification process (data & methods)
• Known errors and/or limits in processing or algorithm’s application

Mission Calibration
• Sensor/Instrument platform and its performances
• Instrument/Sensor calibration, tuning (data & methods)
• Sensor/Instrument evolution

Software
• Processing algorithms and methods (e.g. technical descriptions, software coding, platform references, etc.)

Algorithm Inputs
• Supporting information for data processing (e.g. ancillary)
• Ancillary, Auxiliary support elements & evolution

Validation
• Validation campaign data and results

Back-up Charts

Content Matrix - Introduction

• Using column headings discussed at January 2011 ESIP meeting (mostly)
• Each row corresponds to a content item and provides details
• Content items are mapped into 8 categories (see next chart)
• One or more content items are defined in each of the categories
• Column headings
  – Item Number (C.N – category and number within category)
  – Category
  – Content Name
  – Definition / Description
  – Rationale (why content is needed)
  – Criteria (how good content should be)
  – Priority (H, M, L or critical, essential, desirable)
  – Source (who should provide content item)
  – Project phase for capture
  – User community (who would be most likely to need the content item – this column is mostly blank in this version; needs group inputs)
  – Representation (while focus is on “what”, brief comments are included here on whether items are word files, numeric files, pointers, etc.)
  – Distribution restrictions (potential proprietary or ITAR concerns associated with content item)
  – Source identifying item (where content item came from – NASA, NOAA or both)
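
One way to picture a row of the content matrix is as a simple record with one field per column heading. The Python sketch below is only an illustration under assumptions: the class name `MatrixRow`, the field names, and the helper `missing_columns` are hypothetical, and the item number and priority in the example row are invented for demonstration rather than taken from the actual matrix.

```python
from dataclasses import dataclass


@dataclass
class MatrixRow:
    """One content item (row) of the matrix; one field per column heading."""
    item_number: str                       # "C.N": category and number within it
    category: str
    content_name: str
    definition: str
    rationale: str = ""                    # why content is needed
    criteria: str = ""                     # how good content should be
    priority: str = ""                     # e.g. H, M, L
    source: str = ""                       # who should provide the item
    project_phase: str = ""                # project phase for capture
    user_community: str = ""               # who is most likely to need it
    representation: str = ""               # word files, numeric files, pointers...
    distribution_restrictions: str = ""    # proprietary or ITAR concerns
    source_identifying_item: str = ""      # NASA, NOAA or both

    def missing_columns(self):
        """Names of columns still left blank for this row."""
        return [name for name, value in vars(self).items() if not value]


# Example row; the item number and priority are illustrative assumptions.
row = MatrixRow(
    item_number="1.1",
    category="Preflight/Pre-Operations Calibration",
    content_name="Instrument Description",
    definition="Documentation of instrument/sensor characteristics",
    priority="H",
)
print(row.missing_columns())
```

Listing the blank columns per row is a quick way to see where group input is still needed, for example in the user community column.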

Categories

1. Preflight/Pre-Operations: Instrument/Sensor characteristics including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets

2. Products (Data): Raw instrument data, Level 0 through Level 4 data products and associated metadata

3. Product Documentation: Structure and format with definitions of all parameters and metadata fields; algorithm theoretical basis; processing history and product version history; quality assessment information

4. Mission Calibration: Instrument/sensor calibration method (in operation) and data; calibration software used to generate lookup tables; instrument and platform events and maneuvers

5. Product Software: Product generation software and software documentation

6. Algorithm Input: Any ancillary data or other data sets used in generation or calibration of the data or derived product; ancillary data description and documentation

7. Validation: Record and data sets

8. Software Tools: product access (reader) tools.
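
As a small sketch of how the eight categories and the "C.N" item numbers from the content matrix could be handled in software, the Python below defines the categories as an enumeration and splits an item number into its category and position. The enum member names, the function name `parse_item_number`, and the assumption that "C.N" is written with a dot separator (for example "3.2") are all hypothetical.

```python
from enum import IntEnum


class Category(IntEnum):
    """The eight content categories, numbered as on the chart."""
    PREFLIGHT_PRE_OPERATIONS = 1
    PRODUCTS = 2
    PRODUCT_DOCUMENTATION = 3
    MISSION_CALIBRATION = 4
    PRODUCT_SOFTWARE = 5
    ALGORITHM_INPUT = 6
    VALIDATION = 7
    SOFTWARE_TOOLS = 8


def parse_item_number(item_number):
    """Split an item number such as "3.2" into (Category, number within category).

    Raises ValueError if the text is malformed or the category is not 1-8.
    """
    category_text, number_text = item_number.split(".")
    return Category(int(category_text)), int(number_text)


category, position = parse_item_number("3.2")
print(category.name, position)
```

Keeping the category as an enumeration means an out-of-range item number such as "9.1" fails loudly instead of silently creating a ninth category.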

