Data Integrity Issues: How to Proceed?

Engineering Node

Elizabeth Rye

August 3, 2006

http://pds.nasa.gov

3 August 2006 Data Integrity Issues 2 of 19

PDS Requirements for Data Integrity

• The PDS has made a commitment to ensure the integrity of its data archives. This commitment is primarily spelled out in the Level 3 requirement 4.1.2:

– “PDS will develop and implement procedures for periodically ensuring the integrity of the data.”

• Several other Level 3 requirements suggest additional implications for data integrity assurance.

PDS Requirements for Data Integrity

• The PDS is responsible for assisting data providers in determining how to validate the data they provide:

– “PDS will provide criteria for validating archival products” (1.3.3)

• The PDS is responsible for ascertaining that the data we deliver to the NSSDC is valid:

– “PDS will meet U.S. federal regulations for the preservation and management of data.” (2.8.3)

– “PDS will meet U.S. federal regulations for preservation and management of the data through its Memorandum of Understanding (MOU) with the National Space Science Data Center (NSSDC)” (4.1.5)

PDS Requirements for Data Integrity

• The PDS is responsible for enabling our users to verify the integrity of the data they receive from us:

– “PDS will develop and maintain online mechanisms allowing users to download portions of the archive” (3.2.1)

– “PDS will develop and maintain a mechanism for offline delivery of portions of the archive to users” (3.2.2)

– “PDS will provide mechanisms to ensure that data have been transferred intact” (3.2.3)

• The PDS needs to maintain data integrity throughout the media refreshing process:

– “PDS will develop and implement procedures for periodically refreshing the data by updating the underlying storage technology” (4.1.3)
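Requirement 3.2.3 calls for a way to confirm that data arrived intact. A minimal Python sketch, assuming the PDS publishes an MD5 checksum alongside each file a user downloads (the requirement itself does not name a mechanism); `md5_of_file` and `transferred_intact` are hypothetical helper names:

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice


def md5_of_file(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transferred_intact(path: Path, published_md5: str) -> bool:
    """Return True if the received file's MD5 matches the checksum published by the sender."""
    # Normalize the published value so upper-case hex or a trailing newline still compares equal.
    return md5_of_file(path) == published_md5.strip().lower()
```

Reading in chunks means large image or table products never need to fit in memory; the user recomputes the digest locally and compares it with the published value.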

PDS Requirements for Data Integrity

• The PDS has a stated goal of utilizing standardized procedures in areas that affect inter-node data transfers:

– “PDS will provide standard protocols for accessing data, metadata and computing resources across the distributed archive” (2.7.3)

PDS Requirements for Data Integrity

• From the above requirements, we can derive several areas of concern for data integrity:

– Verifying the integrity of data stored on physical media

– Detecting errors introduced during transfer of data to newer media

– Detecting errors that occur during transmission of data:
   • From data providers to the PDS
   • Between PDS nodes
   • From the PDS to the NSSDC
   • From the PDS to end users
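Detecting errors introduced when data are moved to newer media can follow a copy-then-verify pattern: take the source digest, copy the file, then re-read the copy and compare. A sketch under that assumption; `copy_and_verify` is a hypothetical name, and the same check applies to any of the transmission paths listed above.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB per read


def md5_of_file(path: Path) -> str:
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> str:
    """Copy src to dst, then confirm the copy's MD5 matches the source's.

    Returns the verified digest; raises OSError on a mismatch so a bad copy
    is never silently accepted.
    """
    source_md5 = md5_of_file(src)  # digest of the original before copying
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    copy_md5 = md5_of_file(dst)  # re-read the copy from its new location
    if copy_md5 != source_md5:
        raise OSError(f"checksum mismatch copying {src} -> {dst}: {source_md5} != {copy_md5}")
    return source_md5
```

One caveat: the operating system may satisfy the re-read from its file cache rather than from the new medium, so a production tool would need to bypass the cache or verify in a later pass to be sure it is checking what was actually written.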

PDS Requirements for Data Integrity

• Two additional areas involve data integrity issues but cannot be derived from existing PDS requirements:

– The re-delivery of non-archived data during the operations phase of a mission

– The potential updating of data to newer formats long after it has been archived

Mitch Gordon Survey

• For each numbered item, do you think that it is an important issue for us to address?

• Section A - It is critical that the PDS be able to ascertain the integrity of its archive. This includes (but is not limited to):

1. detecting errors that occur during the transmission of data from providers to the PDS,

2. detecting errors that occur during the transmission of data between PDS nodes,

3. detecting errors that occur during the transmission of data from the PDS to end users,

4. detecting errors that occur during the transmission of data from the PDS to the NSSDC,

5. verifying the integrity of data stored on various types of external physical media (all of which have finite life spans),

6. detecting errors introduced during transfer of data to newer media.

Mitch Gordon Survey

                              Lyle    Steve   Ed      Steve   Todd    David   Anne    Dick    Mitch
                              ATM     EN      GEO     PPI     PPI     SBN-Az  SBN-Md  RS      Rings
A   Archive Integrity
A1  Providers to PDS          Y       Y       Y       Y       N       Y       Y       N       Y
A2  PDS to PDS                Y       Y       Y       Y       N       Y       Y       N       Y
A3  PDS to users              Y       Y       M       Y       N       Y       Y       N       Y
A4  PDS to NSSDC              Y       Y       M       Y       N       Y       Y       Y       Y
A5  Integrity on media        Y       Y       Y       Y       Y       Y       Y       Y       Y
A6  Upgrading to new media    Y       Y       Y       Y       Y       Y       Y       Y       Y
B   Identify tool
B1  Use MD5                   M       Y       N       M       N       Y       N       M       Y
C   Policies for use of tool: Y Y N Y Y N M Y (eight responses, not attributed to individual columns)

Possible Solutions to the Problem

• Checksums are widely accepted in the broader community as a means of verifying data integrity

• MD5 checksums, in particular, are well suited to detecting accidental changes in data

• No node has suggested any mechanism other than checksums as a means of detecting changes in data

• There is no consensus within the PDS as to whether we should limit ourselves to the MD5 checksum algorithm

• There is little consensus within the PDS on whether we should adopt a standardized approach to using checksums to verify data integrity
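The property that makes a checksum useful here is that even a one-bit change produces a completely different digest. A small demonstration using Python's standard `hashlib` module; the sample bytes are a made-up stand-in for a product file. This illustrates detection of accidental corruption only: MD5 is not collision resistant, so it should not be relied on to detect deliberate tampering.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of `data` with one bit inverted, simulating media or transfer corruption."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


if __name__ == "__main__":
    original = b"PDS_VERSION_ID = PDS3\r\n" * 1000  # hypothetical stand-in for a product file
    damaged = flip_bit(original, byte_index=500, bit=3)

    print(md5_hex(original))
    print(md5_hex(damaged))
    print("change detected:", md5_hex(original) != md5_hex(damaged))
```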

Mitch Gordon Survey

• Section B - Identify a tool that can help with (though it need not be sufficient for) any, or hopefully all, of the above.

– Use a single tool, MD5, for generating and validating checksums

• Section C - Establish policies for the use of the tool in a variety of situations.

Issues to be Addressed

Standardization Issue

• Should we have a standardized approach across the PDS for storing and accessing checksums or should each node be permitted to use whatever mechanism it chooses?

– Some flexibility is needed to deal with the variety of ways in which data providers deliver data to the PDS

– Standardization permits the development of tools for generating, accessing, and periodically validating against checksums

– Standardization permits checksum tools to be added to existing interfaces (such as PDS-D and the NSSDC delivery mechanism) so that those interfaces can use and validate against checksums
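Once checksums are stored in one agreed format, a single program can periodically re-validate any node's holdings. A sketch, assuming a plain-text manifest with one `<md5 hex>  <relative path>` line per file (the layout produced by GNU `md5sum`); that layout is an assumption chosen for illustration, and `parse_manifest` and `validate_volume` are hypothetical names:

```python
from __future__ import annotations

import hashlib
import re
from pathlib import Path

CHUNK_SIZE = 1 << 20
# md5sum writes '<hash>  <path>' in text mode and '<hash> *<path>' in binary mode.
MANIFEST_LINE = re.compile(r"([0-9a-fA-F]{32}) [ *](.+)")


def md5_of_file(path: Path) -> str:
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Map each relative path in the manifest to its expected MD5 (lowercase hex)."""
    entries = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        match = MANIFEST_LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"malformed manifest line {lineno}: {line!r}")
        entries[match.group(2)] = match.group(1).lower()
    return entries


def validate_volume(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Re-check every listed file under root; report missing and mismatched paths."""
    report: dict[str, list[str]] = {"missing": [], "mismatched": []}
    for rel_path, expected in sorted(manifest.items()):
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif md5_of_file(path) != expected:
            report["mismatched"].append(rel_path)
    return report
```

Because every node would store checksums the same way, the same two functions could run unchanged on a schedule at any node, or be called from a delivery interface before data are accepted.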

Urgency

• The volume of data returned from missions is growing exponentially, multiplying every couple of years

• Going back and calculating checksums for every file already in the PDS holdings is currently feasible, but will become a significantly more difficult task with each passing year

Policy Questions to be Answered

• At what level of detail should checksums be required?

• For what parts of the archiving process should checksums be required?

• To what degree should standardization among nodes be insisted upon?

• When should we begin requiring checksums?

Current Proposal (SCR 3-1034, V9)

• Mandates generation of file checksums for every file on every archive volume

• Mandates standardized format and location for storage of checksums

• Is insufficient to solve all data integrity problems, but is a necessary part of the solution

• Would be required for all missions archiving to version 3.8 or higher of the Standards Reference (roughly, missions starting the archiving process late this year)
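Generating a checksum for every file on a volume is mechanical once the format and location are fixed. A sketch that writes one `<md5 hex>  <relative path>` line per file; the file name `MANIFEST.MD5` at the volume root is a hypothetical placeholder, not the name or location the SCR specifies, and `write_volume_manifest` is likewise a hypothetical helper:

```python
from __future__ import annotations

import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20
MANIFEST_NAME = "MANIFEST.MD5"  # hypothetical placeholder, not the name mandated by the SCR


def md5_of_file(path: Path) -> str:
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_volume_manifest(volume_root: Path) -> Path:
    """Write '<md5>  <relative path>' for every file under volume_root, sorted by path.

    The manifest itself is skipped so that regenerating it does not checksum
    a previous copy of the manifest.
    """
    manifest_path = volume_root / MANIFEST_NAME
    lines = []
    for path in sorted(p for p in volume_root.rglob("*") if p.is_file()):
        if path == manifest_path:
            continue
        rel = path.relative_to(volume_root).as_posix()  # forward slashes on every OS
        lines.append(f"{md5_of_file(path)}  {rel}")
    manifest_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return manifest_path
```

Sorting the paths makes the output reproducible, so regenerating the manifest for an unchanged volume yields an identical file; the same walk could be run over existing holdings to backfill checksums.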

Most Recent Votes on Checksum SCR

                  Version 5 (MC)   Version 9 (Tech)
Science Nodes
  ATM             no               reject
  GEO             no               reject
  IMG             yes              recommend
  PPI             yes              recommend
  RINGS           yes              recommend
  SBN             no               recommend
Support Nodes
  EN              yes              recommend
  NAIF            yes              recommend
  RS              no               reject

Options for Next Step

• Proceed with MC vote on version 9 of SCR

• Form new working group to come up with a new proposal

• MC draft policy on data integrity to provide further guidance to Tech group

• Drop the issue (which would leave our requirements unmet)

• Other?

