Regional Databases and Archives: the Effects of Scale…

Description:
Regional Databases and Archives: the Effects of Scale…. A Presentation for “Scalable Information Networks for the Environment Workshop” October 31, 2001 San Diego, California Raymond McCord Oak Ridge National Laboratory* - PowerPoint PPT Presentation
ARM Atmospheric Radiation Measurement. Regional Databases and Archives: the Effects of Scale… A Presentation for “Scalable Information Networks for the Environment Workshop”, October 31, 2001, San Diego, California. Raymond McCord, Oak Ridge National Laboratory*. *Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725
Transcript
Page 1: Regional Databases and Archives: the Effects of Scale…

ARM

Atmospheric Radiation Measurement

Regional Databases and Archives: the Effects of Scale…

A Presentation for “Scalable Information Networks for the Environment Workshop”

October 31, 2001

San Diego, California

Raymond McCord

Oak Ridge National Laboratory*

*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

Page 2: Regional Databases and Archives: the Effects of Scale…

Credits

Concepts are derived from managing data for environmental projects over the past 25 years.

Variations of the concepts have been observed from these disciplines:
  Plant community research
  Impact assessment in marine systems
  National acid rain surveys
  Environmental monitoring and cleanup projects at DOE facilities
  Military land use assessment
  Climate change research (atmospheric research)

Ideas are freely traded with Dick Olson (ORNL)

Page 3: Regional Databases and Archives: the Effects of Scale…

Presentation Strategy

Motivation and concerns
Archive overview
  Definition, components, functions, why & why not, examples
Archives and scale
Effects of scale
Mitigate scale effects
  Generate and manage metadata
Future: Archive issues to resolve

Page 4: Regional Databases and Archives: the Effects of Scale…

My Motivation & Concerns

Motivation
  Describe observations about the effects of scale on Archives
  Describe remedies to minimize scale effects
  Minimize remedy pain
Concerns
  Preaching to the choir!!
  Nothing new will happen!!
  Continuing unnecessary limits to future science!!

The enemy is our behavior. Will we change or whine???

Page 5: Regional Databases and Archives: the Effects of Scale…

Source: American Scientist, Vol 886 p 525.

You can’t keep running in here and demanding data every two years

Challenge: engage scientists in the process of archiving their data and provide the mechanism for archiving.

Page 6: Regional Databases and Archives: the Effects of Scale…

Archives and Scale: Presumptions

Regional data live in Archives
Information sharing is important
The archiving can be improved
Archive “neurons” are metadata
Multidisciplinary data will foster broader ecological discoveries
The limited number of permanent data archives for ecological data will increase

Page 7: Regional Databases and Archives: the Effects of Scale…

What Is an Archive?

Page 8: Regional Databases and Archives: the Effects of Scale…

What Is a Data Archive?

A data archive is a permanent, electronic collection of datasets with accompanying metadata such that users of the data can acquire, understand, and use the data.
  More than a long-term backup
  More than an index or catalog with pointers to datasets stored elsewhere
For more details, see Michener, W. K. and J. W. Brunt. 2000. Ecological Data: Design, Management and Processing. Blackwell Science. 180 pp.

Page 9: Regional Databases and Archives: the Effects of Scale…

Components of an Archive

Data and metadata
Storage devices
Information system
Network connections
Staff
  Data/metadata preparation and review
  Systems development and maintenance
  User support

Page 10: Regional Databases and Archives: the Effects of Scale…

Archive Functions

Store data
  Submitted by others
  Build catalog and structure
  Maintain storage across technology generations
  Review new data (QA, metadata)
“Advertise” contents
Find data for users
  Query and browse logic
Distribute data
  Provide access to data
  References to documentation

Page 11: Regional Databases and Archives: the Effects of Scale…

Data Centers at ORNL

CDIAC - Carbon Dioxide Information Analysis Center
ARM Archive - Atmospheric Radiation Measurement Program
ORNL DAAC - Distributed Active Archive Center for Biogeochemical Dynamics
NARSTO - tropospheric air pollution information for North America
OREIS - Oak Ridge Environmental Information System

Page 12: Regional Databases and Archives: the Effects of Scale…

Atmospheric Radiation Measurement (ARM) Program

ARM research questions:
  What happens to all of the sunlight energy?
  How is light absorbed by clouds?
  What does partly cloudy mean? Statistically? Spatially?
  What types of clouds form? When and How?

ARM is a ‘once in a lifetime’ research adventure for atmospheric scientists

ARM research includes instrumentation, system development, data analysis, and modeling (climate and process)

Page 13: Regional Databases and Archives: the Effects of Scale…

ARM Measurements Scope

All data collection is highly automated -- a REAL BLAST!!

Data collection is now a peer outcome with scientific discovery

Page 14: Regional Databases and Archives: the Effects of Scale…

ARM Archive

ARM Archive stores and provides access to the entire accumulation of data
  Currently 5 million files and 14,000 GB and growing
The ARM data in the Archive will be accessed for research for many years (decades)
  Currently distributes 50,000-100,000 files per month (100-200 GB)
More information:
  ARM Program www.arm.gov
  ARM Archive www.archive.arm.gov
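
A rough worked example of what the holdings and distribution figures above imply. This is only a sketch: it assumes decimal units (1 GB = 1000 MB), reads the monthly file count as 50,000 to 100,000, and pairs the low file count with the low volume and the high with the high. The function name is made up for illustration.

```python
# Figures quoted on the slide (2001). Decimal units are an assumption.
TOTAL_FILES = 5_000_000
TOTAL_GB = 14_000


def mean_file_size_mb(total_gb: float, n_files: int) -> float:
    """Average file size in MB, assuming 1 GB = 1000 MB."""
    return total_gb * 1000 / n_files


# Average size of a file held in the Archive.
holdings_mean = mean_file_size_mb(TOTAL_GB, TOTAL_FILES)

# Average size of a distributed file at each end of the monthly range
# (assumed pairing: 50,000 files with 100 GB, 100,000 files with 200 GB).
low_month_mean = mean_file_size_mb(100, 50_000)
high_month_mean = mean_file_size_mb(200, 100_000)

# Share of the total holdings (by volume) sent out in one month.
monthly_share_low = 100 / TOTAL_GB
monthly_share_high = 200 / TOTAL_GB

print(f"mean archived file: {holdings_mean:.1f} MB")
print(f"mean distributed file: {low_month_mean:.1f} to {high_month_mean:.1f} MB")
print(f"monthly share of holdings: {monthly_share_low:.1%} to {monthly_share_high:.1%}")
```

Under these assumptions the typical file is a few megabytes, so the Archive's scale problem is as much about file count as about volume.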

Page 15: Regional Databases and Archives: the Effects of Scale…

ARM Archive Schematic: “Archive Input & Output”

[Diagram. Labels: Other ARM Systems; Incoming Data Files; Data Reception; operations metadata; backup data files; catalog metadata; Mass Storage System; Archive web User Interface; query specifications (date, location, measurement); Data Retrieval; file list; Requested files; user copy]

Page 16: Regional Databases and Archives: the Effects of Scale…

[Diagram. Labels: Data Flow; Data; Metadata; User Interface; Core archive functions; Archive Development and Maintenance; Archive support; Data and Metadata Submission; Data/Metadata Ingest; Backup, Security, Migration; Network; User Request; Request pathways; User Support; User interactions]

Page 17: Regional Databases and Archives: the Effects of Scale…

Why Archive??

“I am doing Science. Trust me.”

Page 18: Regional Databases and Archives: the Effects of Scale…

Cycles of Research: “An Information View”

[Diagram. Labels: Planning; Automation and review; Information review; Problem Definition (Research Objectives); Analysis and modeling; Planning; Measurement Collection; Selection and extraction; Archive of Data; Publications; Original Observations; Secondary Observations; 200 yrs; 20 yrs]

Page 19: Regional Databases and Archives: the Effects of Scale…

Why Don’t I Archive My Data?

No incentives - what’s in it for me?
No acknowledgment - does a dataset = paper?
Give up publication rights - will somebody scoop me?
Poor planning - it was not in “the Plan”
No resources - who’s going to pay for it?
Lack of training - what do I do first?
Unsure about metadata content - how much is enough?

Page 20: Regional Databases and Archives: the Effects of Scale…

Why Should I Archive My Data? (management hints!!)

Career advancement (give them credit)
  you will get some recognition
  you can publish a data paper in ESA Ecological Archives
  it may help me do science with broader scope
Professional incentives (give them training)
  good scientific practice (create peer pressure)
Institutional incentives (have expectations)
  required by the sponsor
Technological advances (give them systems)
  it’s easier and there are more options

Page 21: Regional Databases and Archives: the Effects of Scale…

Archiving Supports Science

Metadata required for archiving will improve data quality
Extends data usefulness
Increases your information base for doing research: data volume and diversity
Permits replication of results

A KEY concept of Science

Page 22: Regional Databases and Archives: the Effects of Scale…

The Effects of Project Scale on Archives

“Metadata are archive neurons??”

Page 23: Regional Databases and Archives: the Effects of Scale…

Metadata Depends on Your “World View”

Investigator
  Doesn’t need extensive formal metadata
Project
  Metadata needed for project integration and modeling activities
  Project data manager may help write metadata
Data archive
  More detailed metadata (e.g., spatial coordinates)
  More standardization (e.g., keywords) to communicate clearly with future users
  Who writes the metadata?

Page 24: Regional Databases and Archives: the Effects of Scale…

Measurement

(In the beginning, was the measurement. It was formless and desolate. Without context…)

Page 25: Regional Databases and Archives: the Effects of Scale…

Single Experiment View

[Diagram. Measurement, surrounded by: date; sample ID; parameter name; location]

Page 26: Regional Databases and Archives: the Effects of Scale…

Research Project View

[Diagram. Measurement, surrounded by: QA flag; media; date; sample ID; parameter name; location]

Page 27: Regional Databases and Archives: the Effects of Scale…

Long-term or Multidisciplinary View

[Diagram. Measurement, surrounded by: QA flag; media; generator; method; date; sample ID; parameter name; location; records; units]
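
The three views above (single experiment, research project, long-term or multidisciplinary) can be read as cumulative lists of the context recorded with each measurement. The sketch below encodes that reading and checks a record against each list. The field names come from the diagrams, but the cumulative interpretation, the `missing_fields` helper, and the sample record are illustrative assumptions, not part of the presentation.

```python
# Context fields shown around "Measurement" in each view (cumulative reading assumed).
SINGLE_EXPERIMENT = ["date", "sample ID", "parameter name", "location"]
RESEARCH_PROJECT = SINGLE_EXPERIMENT + ["QA flag", "media"]
LONG_TERM = RESEARCH_PROJECT + ["generator", "method", "records", "units"]


def missing_fields(record: dict, required: list[str]) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in required if record.get(field) in (None, "")]


# A hypothetical record documented only to the single-experiment level.
record = {
    "date": "2001-10-31",
    "sample ID": "S-001",
    "parameter name": "temperature",
    "location": "Example Site A",
}

for view_name, fields in [
    ("single experiment", SINGLE_EXPERIMENT),
    ("research project", RESEARCH_PROJECT),
    ("long-term / multidisciplinary", LONG_TERM),
]:
    print(view_name, "-> missing:", missing_fields(record, fields))
```

A record that is complete for its own investigator can still be missing most of what a long-term user needs, which is the scale effect the diagrams illustrate.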

Page 28: Regional Databases and Archives: the Effects of Scale…

Integrated System & Archive View

[Diagram. Measurement, surrounded by: QA flag; media; generator; method; date; sample ID; parameter name; location; records; units. Linked definition boxes and their labels: Sample def. (type, date, location, generator); lab, field; Method def.; words, words, units, method; Parameter def.; org. type, name, custodian, address, etc.; coord., elev. type, depth; Record system; date, words, words; QA def.; Units def.; GIS]

Page 29: Regional Databases and Archives: the Effects of Scale…

Another View of Scale

Page 30: Regional Databases and Archives: the Effects of Scale…

Project Scale and Recorded Metadata

[Diagram. Axis: Increasing User Scope (PI, Group, Program, Archive). Metadata items:]

•Units

•Method

•QA flag

•Media

•Parameter name

•Measurement

•Date

•Sample ID

•Location

•Generator

•Records

Page 31: Regional Databases and Archives: the Effects of Scale…

Data Maturation and Scale

Individual Investigators
  collect data, quality assure, document, analyze, publish
Groups or Science Teams
  collate data, enhance, synthesize, model, publish
Project Information System
  collate data, review completeness, maintain data for project
Data Distribution and Archive Center
  long-term archive, distribute freely to users
Master Data Directory
  searchable index with pointers to data

Page 32: Regional Databases and Archives: the Effects of Scale…

Preparing for Archiving

I will not wait. I will not wait. I will not wait. I will not …

Page 33: Regional Databases and Archives: the Effects of Scale…

Generic Environmental Data Model (Which Piece Is First…?)

[Diagram. Measurement, surrounded by: QA flag; media; generator; method; date; sample ID; parameter name; location; records; units. Linked definition boxes and their labels: Sample def. (type, date, location, generator); lab, field; Method def.; words, words, units, method; Parameter def.; org. type, name, custodian, address, etc.; coord., elev. type, depth; Record system; date, words, words; QA def.; Units def.; GIS]

Page 34: Regional Databases and Archives: the Effects of Scale…

Sequence of Information Birth

[Diagram. Measurement, surrounded by: QA flag; media; generator; method; date; sample ID; parameter name; location; records; units. Linked definition boxes and their labels: Sample def. (type, date, location, generator); lab, field; Method def.; words, words, units, method; Parameter def.; org. type, name, custodian, address, etc.; coord., elev. type, depth; Record system; date, words, words; QA def.; Units def.; GIS]

Page 35: Regional Databases and Archives: the Effects of Scale…

Research ~ Publishing ~ Metadata

Metadata design can be a “checklist” for research planning

Metadata preparation can be integrated with publication process

Metadata are an investment in current and future science

Page 36: Regional Databases and Archives: the Effects of Scale…

Where to Archive Data?

Page 37: Regional Databases and Archives: the Effects of Scale…

Archive Choices

What determines your options?
  Sponsor requirements
  Repository access
  Metadata requirements
Scalable storage
  Personal web pages and files
  Project or network data centers
  Federal data centers
Links “transcend” storage structures
  Master directory
  Mercury

Page 38: Regional Databases and Archives: the Effects of Scale…

Personal Web Page

It’s fun, rewarding, relatively easy, can share data quickly, can control access to data
Data issues??
  complete metadata
  QA checks
Connected to basic archival center functions??
  ready access to data (24 h/d, 7 d/wk)
  user support
  data available on multiple media
  secure, backed-up, long-term storage

Page 39: Regional Databases and Archives: the Effects of Scale…

ESA Ecological Archives

Publishing datasets as peer reviewed, citable papers (with volume and page numbers)
Data papers are announced in abstract form in a print journal with data available electronically
Citation example
  Esser, G., H.F.H. Lieth, J.M.O. Scurlock and R.J. Olson. 2000. Osnabrück net primary productivity data set. (Ecological Archives data paper E081-011). Ecology 81, 1177-1177.
Bill Michener, Editor
http://esa.sdsc.edu/esapubs/Journals_main.htm

Page 40: Regional Databases and Archives: the Effects of Scale…

Master Data Directory

Provides search capability and pointers to a source of the data (Center does not archive data)
Maintains standard keywords/indices
Collects metadata from many sources
Examples
  Global Change Master Directory (GCMD) http://gcmd.gsfc.nasa.gov
  ORNL DAAC Mercury System http://mercury.ornl.gov

Page 41: Regional Databases and Archives: the Effects of Scale…

What is Mercury?

Mercury is used to assist an investigator with documenting data and making these data available to others.

1. The data provider uses the Metadata Editor to create a metadata file containing links to the data and documentation
2. Mercury harvests the metadata and builds an index
3. Users query the index
4. Full metadata are returned to the user, including links back to the data provider
5. User links to data provider’s server
6. Data and documentation are downloaded directly from the data provider

[Diagram. Labels: User; Metadata Index; NASA / ORNL; Data and documentation]
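
The harvest-and-query pattern described for Mercury can be sketched as a small in-memory keyword index that holds metadata and links but never the data themselves. This is a toy illustration, not Mercury's actual design or API: the class names, fields, and the example.org URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """One provider-written metadata file (field names are hypothetical)."""
    title: str
    keywords: list[str]
    data_url: str  # link back to the data provider's own server


class MetadataIndex:
    """Keyword index over harvested metadata; it stores pointers, not data."""

    def __init__(self) -> None:
        self._by_keyword: dict[str, list[MetadataRecord]] = {}

    def harvest(self, records: list[MetadataRecord]) -> None:
        # Read each provider's metadata and add it to the index.
        for record in records:
            for keyword in record.keywords:
                self._by_keyword.setdefault(keyword.lower(), []).append(record)

    def query(self, keyword: str) -> list[MetadataRecord]:
        # Return full metadata records, including the link to the provider.
        return list(self._by_keyword.get(keyword.lower(), []))


index = MetadataIndex()
index.harvest([
    MetadataRecord(
        title="Example soil carbon data set",
        keywords=["soil carbon", "biogeochemistry"],
        data_url="https://provider.example.org/soil-carbon/",
    )
])
for hit in index.query("Soil Carbon"):
    # Downloading happens outside the index: the user follows this link.
    print(hit.title, "->", hit.data_url)
```

Keeping only pointers in the index means the provider stays responsible for storage and access, which matches the "Center does not archive data" role of a master directory.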

Page 42: Regional Databases and Archives: the Effects of Scale…

Regional Archives

Page 43: Regional Databases and Archives: the Effects of Scale…

Sources of Regional Data

Carbon Dioxide Information Analysis Center
National Geophysical Data Center
National Environmental Satellite, Data, and Information Service
National Soils Data Access Facility
National Water Information System
Forest Inventory and Analysis
Breeding Bird Survey
Threatened and Endangered Species
Global Change Master Directory

Page 44: Regional Databases and Archives: the Effects of Scale…

NASA EOSDIS Distributed Active Archive Centers

[Map. Centers: JPL; U. Alaska; U. Colorado; EDC; ORNL; GSFC; LaRC; SEDAC. Discipline labels: Cryosphere; Land Processes; Socio-economic; Biogeochemical Dynamics; Sea Ice and Polar Processes; Atmospheric Processes; Upper Atmosphere, Global Biosphere, and Geophysics; Ocean Circulation and Air-sea Interaction]

Page 45: Regional Databases and Archives: the Effects of Scale…

[Figure. Panels: Precipitation; Cloud Amount; Fossil Fuel Emissions; Soil Carbon; Topography; LW Radiation; Clear-Sky Albedo; Vegetation Biophysics (fPAR)]

Global scale, 280 parameters: surface, atmospheric, fluxes

Page 46: Regional Databases and Archives: the Effects of Scale…

Future: Issues to Resolve

Size, diversity, and longevity
Accommodating change
Teaching good practices

Page 47: Regional Databases and Archives: the Effects of Scale…

Issues: Size, Diversity, Longevity

Size
  Online vs. Offline
  Database vs. File structure
  Multiple institutions
  Too big for technology migration??
Diversity
  Increased logic and documentation for “finding data”
  Spatial distribution
  Increased potential for uniqueness conflicts
Longevity
  Too old to explain or decode
  Too much evolution of methods and practices
  Asynchronous change in data and metadata

Page 48: Regional Databases and Archives: the Effects of Scale…

Issues: Planning and Requirements

Plan for archiving early and ongoing
  Avoids missing metadata
  Avoids panic
  Improves overall data quality and consistency
  Consider the timing of requirements
Requirements
  Standards: “to be or not to be?”
  Documentation expectations
  Accessibility

“It’s mine!! It’s my data!! You CAN’T have it!!”

Page 49: Regional Databases and Archives: the Effects of Scale…

Research Implies Change …

[Diagram. Labels: repeat…; New information requirements; New questions; Research; Discovery]

Not always true for other information systems

Page 50: Regional Databases and Archives: the Effects of Scale…

Issues: Accommodating Change

Change must be considered in the design
Things that will change
  Access expectations
  Logical hierarchy of information scope
    New parameters
    New disciplines
    New study sites
    New data sources or methods

Page 51: Regional Databases and Archives: the Effects of Scale…

Issues: More Changes

Unpredictable variation is: no excuse!!
  Often used as an excuse to avoid standards
  Cannot avoid all of it, but try…
    Missing values will occur; Plan ahead
    Do not do: Temp, temp, t, T, temperature
    Be clear, avoid ambiguity
Minimal observational intensity is: no excuse!!
  Quick study = no documentation??

The unexpected are rare and most valuable??
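
One way to act on the "Temp, temp, t, T, temperature" warning is to map every accepted spelling to a single controlled parameter name at ingest and to reject anything unrecognized. The sketch below assumes a tiny, made-up vocabulary; the names `CANONICAL` and `canonical_name` are hypothetical and not from the presentation.

```python
# Hypothetical controlled vocabulary: one canonical name per parameter,
# with the variant spellings that are tolerated on input.
CANONICAL = {
    "temperature": {"temp", "t", "temperature"},
}

# Reverse lookup from each lower-cased variant to its canonical name.
_LOOKUP = {
    variant: name
    for name, variants in CANONICAL.items()
    for variant in variants
}


def canonical_name(raw: str) -> str:
    """Map a raw column label to its canonical parameter name.

    Raises KeyError for unknown labels so that new variants are reviewed
    rather than silently accepted.
    """
    key = raw.strip().lower()
    if key not in _LOOKUP:
        raise KeyError(f"unrecognized parameter name: {raw!r}")
    return _LOOKUP[key]


for label in ["Temp", "temp", "t", "T", "temperature"]:
    print(f"{label!r} -> {canonical_name(label)!r}")
```

Note that accepting a bare "t" is itself a risk, since it could equally mean time; a stricter vocabulary would refuse it, which is the ambiguity the slide warns about.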

Page 52: Regional Databases and Archives: the Effects of Scale…

Rules for Creating Databases for Archiving

Unique occurrences
  Each type of measurement is represented in a consistent way
  Each measurement event is represented by only one value
Identifiers
  Each value is associated with a parameter name
  Each measurement value has a quality indicator and link to a method description
Place and time
  Each value is associated with a unique place name with a quantitatively defined location (geographic coordinates)
  Each value is associated with a date and time
Data Storage and Transport
  Data are stored or managed with a database management system or equivalent
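
The rules above can be expressed as constraints in a small relational schema. The following is a minimal sketch using SQLite from the Python standard library; the table names, column names, coordinates, and sample values are illustrative assumptions, not a schema from the presentation.

```python
import sqlite3

# Hypothetical schema: each rule becomes a constraint the database enforces.
SCHEMA = """
CREATE TABLE place (
    place_name TEXT PRIMARY KEY,      -- unique place name
    latitude   REAL NOT NULL,         -- quantitatively defined location
    longitude  REAL NOT NULL
);
CREATE TABLE method (
    method_id   TEXT PRIMARY KEY,
    description TEXT NOT NULL         -- method description
);
CREATE TABLE measurement (
    parameter_name TEXT NOT NULL,     -- each value has a parameter name
    place_name     TEXT NOT NULL REFERENCES place(place_name),
    measured_at    TEXT NOT NULL,     -- date and time (ISO 8601 text)
    value          REAL,              -- NULL marks a missing value
    qa_flag        TEXT NOT NULL,     -- quality indicator
    method_id      TEXT NOT NULL REFERENCES method(method_id),
    -- one value per measurement event
    UNIQUE (parameter_name, place_name, measured_at)
);
"""


def open_archive_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


conn = open_archive_db()
# Made-up place, method, and measurement used only to exercise the constraints.
conn.execute("INSERT INTO place VALUES ('Example Site A', 36.0, -84.0)")
conn.execute("INSERT INTO method VALUES ('M1', 'Hypothetical thermometer method')")
row = ("air_temperature", "Example Site A", "2001-10-31T12:00:00", 18.5, "good", "M1")
conn.execute("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)", row)
try:
    # A second value for the same measurement event violates the UNIQUE rule.
    conn.execute("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)", row)
except sqlite3.IntegrityError as err:
    print("rejected duplicate measurement event:", err)
```

Pushing the rules into constraints means violations are caught when data are loaded rather than discovered by a future user.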

Page 53: Regional Databases and Archives: the Effects of Scale…

Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive

Best Practices include
  Assign descriptive file names
  Use consistent and stable file formats
  Define the parameters
  Use consistent data organization
  Perform basic quality assurance
  Assign descriptive data set titles
  Provide documentation
Published: Cook et al. 2001. Ecological Bulletin
http://www.daac.ornl.gov/DAAC/bestpractices.html

Page 54: Regional Databases and Archives: the Effects of Scale…

Reflecting Into the Future…

Page 55: Regional Databases and Archives: the Effects of Scale…

Workshop Reactions

Distributed (sensor) processing
  Yes / No
  Automated QA
  Getting data dirty
Metadata early
  10X easier, scalable
Differentiate standards
  Intentional variance only
  Partition / isolate exceptions when possible
Look for 3, 5, 10X changes
  20-30% not worthwhile

Page 56: Regional Databases and Archives: the Effects of Scale…

Summary Points

Archives need structure and standards
Social and education solutions VERY important
Metadata are the “neurons” of Archives
Metadata early better than late
Need to think about our choices.

Page 57: Regional Databases and Archives: the Effects of Scale…

Future Thoughts

Will we be able to know “Where are we?” in the information structure
  How many 30 KB files are on a 100 GB tape cartridge?
The future limits will not be technology
  But our minds…
We need to plan NOW how to best leverage the future
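
The tape cartridge question above has a simple arithmetic answer, sketched here. It assumes decimal units (1 GB = 1,000,000 KB), files packed end to end, and no tape or file-system overhead; the function name is made up for illustration.

```python
def files_per_cartridge(cartridge_gb: int, file_kb: int) -> int:
    """Whole files that fit on one cartridge, assuming 1 GB = 1,000,000 KB
    and no per-file or per-tape overhead."""
    cartridge_kb = cartridge_gb * 1_000_000
    return cartridge_kb // file_kb


count = files_per_cartridge(cartridge_gb=100, file_kb=30)
print(f"{count:,} files of 30 KB fit on a 100 GB cartridge")
```

With millions of small files on a single cartridge, locating one of them depends on the catalog and its metadata, not on the storage hardware.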

Page 58: Regional Databases and Archives: the Effects of Scale…

A Future Scientist’s View

I told my college-age daughter about the Japanese announcement of 1 TB of optical memory in 1 cubic centimeter.

Her reply: “…We need to know how to think critically and select what kinds of projects and data we need to keep because the limiting factor will be our minds, not the technology.”

Page 59: Regional Databases and Archives: the Effects of Scale…

Looking Forward to a Future With Archives!!

