Emerging Provenance/Context Content Standard
Discussion at Data Stewardship Committee Session at ESIP Federation Meeting
January 5, 2012
H. K. “Rama” Ramapriyan and John Moses
ESDIS Project
NASA Goddard Space Flight Center
[email protected]; [email protected]
Session Objectives
• Provide update on status
• Identify applicable representation standards
• Discuss use cases we currently have
• Show sample mapping of use cases to content items in PCCS matrix
• Identify any new use cases needed
• Discuss next steps
Status (1 of 3)
• Received inputs from Ted Habermann et al. (NOAA)
• Merged with NASA’s inputs based on USGCRP Workshop (1998) Report, and discussions (2010-2011) with EOS instrument teams (GLAS, HIRDLS) and TOMS instrument PI
  – Note – USGCRP workshop, jointly sponsored by NASA and NOAA, identified a number of scenarios (use cases) from which content recommendations were derived
• Content matrix was developed and posted on ESIP Data Stewardship and Preservation Cluster wiki
  – Initial version – March 1, 2011
  – Latest version (incorporates comments from cluster members) – June 8, 2011
  – Focused mostly on satellite remote sensing data; need to ensure we cover other types of data (aircraft, in situ)
  – Developed use cases during last ESIP meeting (July 2011)
Status (2 of 3)
• NASA Earth Science Data Preservation Content Specification
  – Data System Requirements are included in all satellite mission Level 1 Requirements. Preservation requirement is stated as:
    • The <<project name>> shall transfer to the <<designated NASA Earth Science Division-assigned Data Center(s)>> all the information and documentation required for long-term preservation of knowledge about the products resulting from <<project name>>, as defined in the NASA Earth Science Data Preservation Content Specification document published at http://earthdata.nasa.gov/about-eosdis/requirements, and shall baseline to a specific initial version.
  – Document has been prepared and baselined (available at URL above) as of November 23, 2011
    • Summarizes content of PCCS matrix including item descriptions and rationale
    • Current baselined version has been reviewed by a broad set of organizations
      – DAACs, SMAP, ICESat-II, EOS Instrument Teams, MEaSUREs, DAAC User Working Groups, ESDIS, NASA HQ
    • Accounts for concerns of data providers as well as users
    • Provides some flexibility, allowing for variations from project to project in approach to meeting the specifications – requires projects to provide checklist to show how specifications are met
Status (3 of 3)
• NASA Earth Science Data Preservation Content Specification
  – An evolution path has been provided for the specification - schedules and versions are intentionally defined to balance flight project needs for early specificity with deliberative and evolving nature of standards’ work by TIWG’s Data Stewardship Group, SPG, ESIP, IEEE, ISO
  – Provides opportunity for flight projects to select a version to baseline early, but review (as part of ESDIS CCB process) later versions and adopt if feasible
  – Revisions take into account concerns by existing projects
  – CCB-baselined version (version 2) has been posted (mid-Nov. 2011)
  – These will be reflected in an updated matrix (ESIP version)
  – Updates scheduled for Spring 2012, 2014 and 2016
Standards that Prescribe and/or Guide Construction of Preservation Content
What are the prominent standards (and groups) that guide or prescribe content which overlap or contribute to preservation content requirements?
• Metadata for interoperability in dataset search and access (ISO TC 211)
  – ISO 19115-1 Collection and Granule Metadata for geographic information (MD package)
  – ISO 19115-2 NOAA-NASA sponsored revision for imagery and gridded data (MI package)
  – ISO 19110 Geographic information for feature entity and attribute cataloguing
  – ISO 19119 Metadata for services
  – ISO 19157 How to describe, evaluate and report data quality
  – ANSI Content and Collection Management Standards and FGDC Digital Geospatial Metadata (ISO 19115 NAP) standards feed into ISO
• Open Geospatial Consortium (OGC)
  – Sensor Web Enablement (SWE) Sensor Observation Service and WCS, WFS, WMS serving netCDF and HDF
  – SensorML (instrument specifications)
• Consultative Committee for Space Data Systems (CCSDS)
  – Open Archival Information System (OAIS) Reference Model (nomenclature)
  – Satellite data requirements (i.e., Level 0 data content), XML Formatted Data Unit
• Library and Publishing applications (e.g., NISO, NARA, Library of Congress)
  – Dublin Core
  – Metadata Encoding and Transmission Standard (METS)
  – PREMIS
ISO Metadata Standards Supporting Documentation Content
Columns: Category; Content Item; Path to ISO Standard

• Preflight/Pre-Operations Calibration
  – Instrument Description
    /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:instrument
  – Pre-operational Calibration Data
• Science Data Products
  – Raw Data and Derived Products
  – Metadata
    ISO 19115-1 Geographic Information (MD package), ISO 19115-2 Extensions for imagery and gridded data (MI package); ISO 19157 Data Quality, ISO 19119 Services
• Science Data Product Documentation
  – Product Team
    /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact or gmd:credit
  – Product Requirements (basis, structure & format)
    /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:acquisitionRequirement
    /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract; purpose; environmentDescription
    /gmi:MI_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorFormat
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:algorithm
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source/gmd:LI_Source/gmd:sourceStep/gmi:LE_ProcessStep/gmd:description
  – Processing and Algorithm Version History
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep
    /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:resourceMaintenance/gmd:MD_MaintenanceInformation/gmd:maintenanceNote
  – Product Generation Algorithm
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source/gmd:LI_Source/gmd:sourceStep/gmi:LE_ProcessStep/gmd:description
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:softwareReference and /gmi:algorithm
  – Product Quality
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmd:DQ_QuantitativeAttributeAccuracy/gmd:evaluationMethodDescription
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmd:DQ_QuantitativeAttributeAccuracy/gmd:result/gmi:QE_CoverageResult/gmi:resultContentDescription/gmi:MI_CoverageDescription/gmd:dimension
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmi:QE_Usability/gmd:result/gmd:DQ_ConformanceResult
  – Product Application
• Mission Data Calibration
  – Calibration Method
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source/gmd:LI_Source/gmd:sourceStep/gmi:LE_ProcessStep/gmd:description
  – Calibration Data
• Product Software
  – Software documentation
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:softwareReference and /gmi:algorithm
  – Product software
• Algorithm Inputs
  – Ancillary data documentation
    /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source
    gmd:dimension (CoverageContentTypeCode=auxilliaryData)
  – Ancillary data
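As a rough illustration of how the element paths in this chart could be used, the sketch below looks up one of them in an ISO 19115-2 XML record with Python's standard library. The sample record is hypothetical and heavily trimmed, and the helper name find_text is ours; only the path itself and the gmd/gmi/gco namespace prefixes come from the ISO XML encoding.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the ISO 19139 / 19115-2 XML encodings.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily trimmed record for illustration only.
SAMPLE = """<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:abstract>
        <gco:CharacterString>Example abstract</gco:CharacterString>
      </gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmi:MI_Metadata>"""


def find_text(root, path):
    """Return the non-empty text values found under a chart-style path."""
    # Chart paths start at /gmi:MI_Metadata, but ElementTree searches
    # relative to the root element, so that prefix is stripped first.
    prefix = "/gmi:MI_Metadata/"
    relative = path[len(prefix):] if path.startswith(prefix) else path
    values = []
    for node in root.findall(relative, NS):
        for text in node.itertext():
            if text.strip():
                values.append(text.strip())
    return values


root = ET.fromstring(SAMPLE)

# A path that is present in the sample record.
abstract = find_text(
    root,
    "/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract",
)

# A path from the chart that the sample record does not populate.
lineage = find_text(
    root,
    "/gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage",
)

print(abstract)
print(lineage)
```

An empty result for a path is one simple signal that a record does not yet carry the corresponding content item in its metadata.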
Use Cases - Purpose
• Identify what content items are required and why
• Mapping of use cases to content items helps with rationale and priority for preserving them
• If no use cases are identified for some of the items
  – Look harder for use cases, or
  – Eliminate item from PCCS
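The check described above (find content items that no use case calls for) can be sketched in a few lines. The mapping below is a hypothetical placeholder for illustration, not the actual assignments in the PCCS matrix; the function names are ours.

```python
# Hypothetical mapping for illustration only; the real assignments
# are recorded in the PCCS matrix, not here.
USE_CASE_ITEMS = {
    "Reproducing a dataset": {
        "Product Generation Algorithm",
        "Ancillary data and documentation",
    },
    "Choosing a data set from multiple similar choices": {
        "Metadata",
        "Product Quality",
    },
}

# A subset of content item names, used here as the list to check.
CONTENT_ITEMS = [
    "Metadata",
    "Product Quality",
    "Product Generation Algorithm",
    "Ancillary data and documentation",
    "Product Application",
]


def uncovered_items(content_items, use_case_items):
    """Return content items that no use case refers to, in list order."""
    covered = set()
    for items in use_case_items.values():
        covered |= items
    return [item for item in content_items if item not in covered]


def use_cases_for(item, use_case_items):
    """Return the sorted names of use cases that need a content item."""
    return sorted(name for name, items in use_case_items.items() if item in items)


uncovered = uncovered_items(CONTENT_ITEMS, USE_CASE_ITEMS)
print(uncovered)
print(use_cases_for("Metadata", USE_CASE_ITEMS))
```

Items returned by uncovered_items are the candidates for either a harder search for use cases or removal from the PCCS.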
Use Cases – ESIP Summer Meeting (2011)
• Creating a data set:
  – Creating a long term trend data set from multiple data sets
    • A researcher 100 years in the future is examining the historical record.
  – Giving credit to people involved in the data set
  – Describing appropriate uses of a data set
    • Asserting quality of data set
    • Intellectual property rights
    • Policies
    • Creating citations for published data set
    • Validation of data
• Using a data set:
  – Obtaining Data
    • Discovering data
      – Choosing a data set from multiple similar choices.
    • Assessing data
  – Applications for data
    • Analysis of new version of data set
      – How do the improvements affect our use of the data?
    • Sharing data set for collaboration
    • Comparing multiple data sets
    • Reproducing a dataset
    • Verification of an experiment
Use Cases – current list
• USGCRP Workshop Report (1998)
  – Ozone Trends and Tropospheric Aerosols
  – Precipitation Trend Analysis
  – Near-Surface Temperature Analysis
  – Ocean Frontal Analysis
  – Snow Cover Analysis
  – Ocean Topography
• ESIP Summer Meeting (2011)
  – Long list developed (see next chart)
  – Obtaining Data
    • Involves all of the ways users obtain data. The mechanisms considered range from ad-hoc methods, such as consulting colleagues at a meeting, finding out about a data set and contacting the scientist who has it, to using one of the major data centers' systems (or GCMD) to find, assess and obtain relevant data sets.
  – Choosing a data set from multiple similar choices
    • A research user needs to pick, from multiple similar data sets, the one that best meets the requirements of their intended application. E.g., a polar bear ecologist choosing a data set on sea ice conditions in a region of the Hudson Bay from the multiple data sets listed at NSIDC.
  – Reproducing a dataset
Preservation Content Use Cases
Rows: Category; Content Item

• Preflight/Pre-Operations Calibration
  – Instrument Description
  – Preflight/Pre-operational Calibration Data
• Science Data Products
  – Raw Data and Derived Products
  – Metadata
• Science Data Product Documentation
  – Product Team
  – Product Requirements (basis, structure & format)
  – Processing and Algorithm Version History
  – Product Generation Algorithm
  – Product Quality
  – Product Application
• Mission Data Calibration
  – Calibration Method
  – Calibration Data
• Science Data Product Software
  – Science data product generation software and software documentation
• Science Data Product Algorithm Inputs
  – Ancillary data and documentation
• Science Data Product Validation
  – Datasets and documentation
• Science Data Software Tools
  – Software and documentation

Columns (use cases):
• Ozone Trends and Tropospheric Aerosols: Intercomparison with new in-situ observation instruments suggests previously disregarded satellite measurements were real. Recalculated TOMS measurements indicate earlier ozone depletion over Antarctica.
• Precipitation Trend Analysis: Including weather satellite data and accounting for measurement [station location and orbit] disparities to produce a reliable record.
• Near-Surface Temperature Analysis: Resolving discrepancy between land-surface weather station measurements and temperatures derived from satellite microwave instruments.
• Ocean Frontal Analysis: Detecting seasonal location of physical and biological mixing properties using long time series satellite derived sea surface temperature fields from Level 1 data and in situ measurements.
• Snow Cover Analysis: Improvements in snow cover extent seasonal changes and long term trends resulted from capability to reprocess with new land-ocean mask and snow cover filters.
• Ocean Topography: Radar altimeters map ocean surface topography to find small changes over a long period of time. Results depend on reducing Earth location errors.
Use case: Every example of successful re-analysis of archived data has depended on quality documentation (re: pg 1, GCSR for LTA)
Key Items
Category Descriptions (1 of 2)
Columns: 1. Category; 2. Content Item; 3. Definition/Description

• Preflight/Pre-Operations Calibration
  – Instrument Description: Documentation of Instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, instrument geometric calibration (geo-location offsets), noise characteristics, etc.).
  – Preflight/Pre-operational Calibration Data: Numeric (digital data) files of Instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, instrument geometric calibration (geo-location offsets), noise characteristics, etc.).
• Science Data Products
  – Raw Data and Derived Products: Raw data are data values at full resolution as directly measured by a spaceborne, airborne or in situ instrument. Derived products are higher level products (level 1b through 4) where calibration and geo-location transformations have been applied to generate sensor units, and/or algorithms have been applied to generate gridded geophysical parameters.
  – Metadata: Information about data to facilitate discovery, search, access, understanding and usage associated with each of the data products.
• Science Data Product Documentation
  – Product Team: Names of key science team leads and product team members (development, help desk and operations), roles, performing organization, contact information, sponsoring agencies or organizations and comments about the products.
  – Product Requirements: Requirements and designs for each science data product, either explicitly or by reference to the requirements/design documents. Product requirements and designs should include content, format, latency, accuracy and quality.
  – Processing and Algorithm Version History: For all products held in the archive, documentation of processing history and production version history, indicating which versions were used when, why different versions came about, and what the improvements were from version to version. For all products held in the archive, the versions of source code used to produce the products should be available at the archive.
  – Product Generation Algorithm: Detailed discussion of processing algorithms, outputs, error budgets and limitations. Processing algorithms and their theoretical (scientific and mathematical) basis, including complete description of any sampling or mapping algorithm used in creation of the product, geo-location, radiometric calibration, geophysical parameters, algorithm software documentation, & high-level data flow diagrams. Description of how the algorithm is numerically implemented.
  – Product Quality: Description of the impact to product quality due to issues with computationally intensive operations (e.g., large matrix inversions, truncation and rounding). Documentation of product quality assessment (methods used, assessment summaries for each version of the datasets). Description of embedded data at the granule level including quality flags, product data uncertainty fields, data issues logs, etc. Relevant test reports, reviews, and appraisals.
  – Product Application: Useful references to published articles about the use of the data and user feedback received by the science and instrument teams about the products. Includes reports of any peculiarities or notable features observed in the products.
Category Descriptions (2 of 2)
Columns: 1. Category; 2. Content Item; 3. Definition/Description

• Mission Data Calibration
  – Calibration Method: The methods used for instrument/sensor radiometric and geometric calibration while in operation (e.g., in orbit). The source code used in applying the calibration algorithms. Documentation of in-line changes to calibration or to instrument or platform operations or conditions that occur throughout the mission.
  – Calibration Data: Instrument and platform engineering data collected during operations (e.g., on orbit), including platform and instrument environment, events and maneuvers; attitude and ephemeris; aircraft position; acquisition logs that record data gaps; calibration look-up tables; and any significant external event data that may have impacted the observations.
• Science Data Product Software
  – Science data product generation software and software documentation: Source code used to generate products at all levels in the science data processing system. Software release notes, including references to versions of operating systems, compilers, commercial software libraries used in the code. Versions of science data product software should be archived for each major product release. A major product release is characterized by the appearance of peer reviewed publications where reported results are based on the product version.
• Science Data Product Algorithm Inputs
  – Ancillary data and documentation: Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product, either explicitly in data descriptions or by reference to appropriate publications. Ancillary data should be stored with the products unless it is available from another permanent archive facility.
• Science Data Product Validation
  – Datasets and documentation: Accuracy of products, as measured by validation testing, and compared to accuracy requirements. Description of validation process, including identification of validation data sets, measurement protocols, data collection, analysis and accuracy reporting.
• Science Data Software Tools
  – Software and documentation: Product access (reader) tools. Software source code that would facilitate use of the calibration data, ancillary data and the data products at all levels. Includes software source code useful for creating programs that will read and display the calibration data, ancillary data and product data and metadata values. Commercial tools should be identified with appropriate references.
European Space Agency Archive Content Descriptions
Columns: NASA ES Category; ESA LTDP Dataset Composition Content Descriptions

• Pre-Mission
  – Mission, Sensor/Instrument requirements
  – Sensor/Instrument characteristics
  – Measurements qualification process
  – Calibration data and methods
• Data Products
  – Raw data and/or Level 0 equivalent
  – Processed data (e.g., L1, L2 or upper levels)
  – Metadata
• Product Documentation
  – Data/Products structures and formats descriptions
  – Processing algorithms and scientific bases
  – Processing evolution (e.g. changes, history)
  – Product qualification process (data & methods)
  – Known errors and/or limits in processing or algorithm’s application
• Mission Calibration
  – Sensor/Instrument platform and its performances
  – Instrument/Sensor calibration, tuning (data & methods)
  – Sensor/Instrument evolution
• Software
  – Processing algorithms and methods (e.g. technical descriptions, software coding, platform references, etc.)
• Algorithm Inputs
  – Supporting information for data processing (e.g. ancillary)
  – Ancillary, Auxiliary support elements & evolution
• Validation
  – Validation campaign data and results
Back-up Charts
Content Matrix - Introduction
• Using column headings discussed at January 2011 ESIP meeting (mostly)
• Each row corresponds to a content item and provides details
• Content items are mapped into 8 categories (see next chart)
• One or more content items are defined in each of the categories
• Column headings
  – Item Number (C.N – category and number within category)
  – Category
  – Content Name
  – Definition / Description
  – Rationale (why content is needed)
  – Criteria (how good content should be)
  – Priority (H, M, L or critical, essential, desirable)
  – Source (who should provide content item)
  – Project phase for capture
  – User community (who would be most likely to need the content item – this column is mostly blank in this version; needs group inputs)
  – Representation (while focus is on “what”, brief comments are included here on whether items are word files, numeric files, pointers, etc.)
  – Distribution restrictions (potential proprietary or ITAR concerns associated with content item)
  – Source identifying item (where content item came from – NASA, NOAA or both)
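One way to picture a row of the content matrix is as a simple record whose fields follow the column headings above. The sketch below is only an illustration: the field names, the reading of "C.N" as a dotted number such as "1.1", and the sample row values are our assumptions, while the eight category names are taken from the Categories chart in this deck.

```python
from dataclasses import dataclass

# Category names as listed in the Categories chart of this deck.
CATEGORIES = {
    1: "Preflight/Pre-Operations",
    2: "Products (Data)",
    3: "Product Documentation",
    4: "Mission Calibration",
    5: "Product Software",
    6: "Algorithm Input",
    7: "Validation",
    8: "Software Tools",
}


@dataclass
class ContentItem:
    """One row of the content matrix; field names are illustrative."""

    item_number: str  # assumed "C.N" form, e.g. "1.1"
    content_name: str
    definition: str
    rationale: str = ""
    criteria: str = ""
    priority: str = ""  # e.g. H, M or L
    source: str = ""
    project_phase: str = ""
    user_community: str = ""
    representation: str = ""
    distribution_restrictions: str = ""
    source_identifying_item: str = ""  # NASA, NOAA or both

    @property
    def category(self) -> str:
        """Derive the category name from the C part of the item number."""
        category_number = int(self.item_number.split(".")[0])
        return CATEGORIES[category_number]


# Hypothetical row; the item number and priority are placeholders.
row = ContentItem(
    item_number="1.1",
    content_name="Instrument Description",
    definition="Documentation of instrument/sensor characteristics",
    priority="H",
)
print(row.category)
```

Deriving the category from the item number keeps the two columns from drifting apart when rows are edited by hand.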
Categories
1. Preflight/Pre-Operations: Instrument/Sensor characteristics including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets
2. Products (Data): Raw instrument data, Level 0 through Level 4 data products and associated metadata
3. Product Documentation: Structure and format with definitions of all parameters and metadata fields; algorithm theoretical basis; processing history and product version history; quality assessment information
4. Mission Calibration: Instrument/sensor calibration method (in operation) and data; calibration software used to generate lookup tables; instrument and platform events and maneuvers
5. Product Software: Product generation software and software documentation
6. Algorithm Input: Any ancillary data or other data sets used in generation or calibration of the data or derived product; ancillary data description and documentation
7. Validation: Record and data sets
8. Software Tools: product access (reader) tools.