RAARMM
Atmospheric Radiation Measurement
Regional Databases and Archives: The Effects of Scale…
A Presentation for “Scalable Information Networks for the Environment Workshop”
October 31, 2001
San Diego, California
Raymond McCord
Oak Ridge National Laboratory*
*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725.
Credits
Concepts are derived from managing data for environmental projects over the past 25 years. Variations of these concepts have been observed in the following disciplines:
• Plant community research
• Impact assessment in marine systems
• National acid rain surveys
• Environmental monitoring and cleanup projects at DOE facilities
• Military land use assessment
• Climate change research (atmospheric research)
Ideas are freely traded with Dick Olson (ORNL).
Presentation Strategy
• Motivation and concerns
• Archive overview: definition, components, functions, why and why not, examples
• Archives and scale: effects of scale, mitigating scale effects
• Generating and managing metadata
• Future: archive issues to resolve
My Motivation & Concerns
Motivation
• Describe observations about the effects of scale on archives
• Describe remedies to minimize scale effects
• Minimize remedy pain
Concerns
• Preaching to the choir!!
• Nothing new will happen!!
• Continuing unnecessary limits to future science!!
The enemy is our behavior. Will we change or whine???
[Cartoon, source: American Scientist, Vol 886, p 525] “You can’t keep running in here and demanding data every two years.”
Challenge: engage scientists in the process of archiving their data and provide the mechanism for archiving.
Archives and Scale: Presumptions
• Regional data live in archives
• Information sharing is important
• Archiving can be improved
• Metadata are the archive “neurons”
• Multidisciplinary data will foster broader ecological discoveries
• The currently limited number of permanent data archives for ecological data will increase
What Is an Archive?
What Is a Data Archive?
A data archive is a permanent, electronic collection of datasets with accompanying metadata such that users of the data can acquire, understand, and use the data.
• More than a long-term backup
• More than an index or catalog with pointers to datasets stored elsewhere
For more details, see Michener, W. K. and J. W. Brunt. 2000. Ecological Data: Design, Management and Processing. Blackwell Science. 180 pp.
Components of an Archive
• Data and metadata
• Storage devices
• Information system
• Network connections
• Staff
  - Data/metadata preparation and review
  - Systems development and maintenance
  - User support
Archive Functions
• Store data
  - Submitted by others
  - Build catalog and structure
  - Maintain storage across technology generations
  - Review new data (QA, metadata)
• “Advertise” contents
• Find data for users
  - Query and browse logic
• Distribute data
  - Provide access to data
  - References to documentation
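“Maintain storage across technology generations” implies copying the holdings onto new media without silent corruption. A minimal sketch, assuming files sit in a directory tree and that one SHA-256 checksum per file is an acceptable fixity check (the slide does not say how any ORNL archive does this); `migrate` and `sha256_of` are hypothetical helpers, not part of any archive's software:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(files: list[Path], old_root: Path, new_root: Path) -> dict[str, str]:
    """Copy files to new storage, verify each copy, and return a checksum manifest."""
    manifest = {}
    for src in files:
        rel = src.relative_to(old_root)
        dst = new_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        checksum = sha256_of(src)
        # Refuse to record the file as migrated unless the copy is bit-identical.
        if sha256_of(dst) != checksum:
            raise OSError(f"checksum mismatch after copying {rel}")
        manifest[str(rel)] = checksum
    return manifest


# Usage: migrate one small file between two temporary "media".
with tempfile.TemporaryDirectory() as tmp:
    old_root, new_root = Path(tmp, "old_media"), Path(tmp, "new_media")
    (old_root / "site1").mkdir(parents=True)
    sample = old_root / "site1" / "obs_20010601.dat"
    sample.write_bytes(b"2001-06-01,21.4\n")
    manifest = migrate([sample], old_root, new_root)

print(manifest)
```

Storing the returned manifest with the catalog metadata lets the next migration re-verify the same checksums.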
Data Centers at ORNL
• CDIAC - Carbon Dioxide Information Analysis Center
• ARM Archive - Atmospheric Radiation Measurement Program
• ORNL DAAC - Distributed Active Archive Center for Biogeochemical Dynamics
• NARSTO - tropospheric air pollution information for North America
• OREIS - Oak Ridge Environmental Information System
Atmospheric Radiation Measurement (ARM) Program
ARM research questions:
• What happens to all of the sunlight energy?
• How is light absorbed by clouds?
• What does “partly cloudy” mean? Statistically? Spatially?
• What types of clouds form? When and how?
ARM is a ‘once in a lifetime’ research adventure for atmospheric scientists.
ARM research includes instrumentation, system development, data analysis, and modeling (climate and process).
ARM Measurements Scope
All data collection is highly automated: a REAL BLAST!!
Data collection is now an outcome on a par with scientific discovery.
ARM Archive
• Stores and provides access to the entire accumulation of ARM data
  - Currently 5 million files and 14,000 GB, and growing
• ARM data in the Archive will be accessed for research for many years (decades)
• Currently distributes 50,000-100,000 files per month (100-200 GB)
More information:
• ARM Program: www.arm.gov
• ARM Archive: www.archive.arm.gov
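The slide's numbers support a quick back-of-the-envelope check. Assuming decimal units (1 GB = 1,000 MB) and pairing the low and high ends of the two monthly ranges, the average file is a few megabytes, and monthly distribution moves a volume equal to the 2001 holdings in roughly 6 to 12 years:

```python
# Figures from the ARM Archive slide (2001); decimal units are an assumption.
ARCHIVE_FILES = 5_000_000
ARCHIVE_GB = 14_000
MONTHLY_FILES = (50_000, 100_000)  # low and high ends of the quoted range
MONTHLY_GB = (100, 200)            # assumed to pair with the file counts


def mean_file_mb(total_gb: float, n_files: int) -> float:
    """Average file size in MB, taking 1 GB = 1,000 MB."""
    return total_gb * 1_000 / n_files


holdings_mb = mean_file_mb(ARCHIVE_GB, ARCHIVE_FILES)
shipped_mb = [mean_file_mb(gb, n) for gb, n in zip(MONTHLY_GB, MONTHLY_FILES)]
# Months of distribution needed to move a volume equal to the whole archive.
months = [ARCHIVE_GB / gb for gb in MONTHLY_GB]

print(f"average archived file: {holdings_mb:.1f} MB")
print(f"average distributed file: {sum(shipped_mb) / len(shipped_mb):.1f} MB")
print(f"one archive's worth shipped every {min(months):.0f}-{max(months):.0f} months")
```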
ARM Archive Schematic: “Archive Input & Output”
[Figure: other ARM systems → incoming data files → data reception (operations metadata, catalog metadata, backup data files) → mass storage system. Archive web user interface → query specifications (date, location, measurement) → data retrieval → file list → requested files (user copy). Legend: data flow, data, metadata, user interface.]
[Figure: core archive functions: data and metadata submission; data/metadata ingest; backup, security, migration; network user requests (request pathways); user support (user interactions); archive development and maintenance (archive support).]
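The output side of the schematic (query by date, location, and measurement; receive a file list; request files) can be sketched against an in-memory catalog. This is a minimal illustration under assumed names: `CatalogEntry`, `query_catalog`, and the file names are hypothetical and do not describe the ARM Archive's actual catalog schema:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """Catalog metadata for one stored file (hypothetical fields)."""
    filename: str
    location: str
    measurement: str
    day: date


def query_catalog(catalog, start, end, location=None, measurement=None):
    """Return the sorted file list matching a date range and optional filters."""
    return sorted(
        entry.filename
        for entry in catalog
        if start <= entry.day <= end
        and (location is None or entry.location == location)
        and (measurement is None or entry.measurement == measurement)
    )


catalog = [
    CatalogEntry("siteA_radiometer_20010601.dat", "siteA", "radiometer", date(2001, 6, 1)),
    CatalogEntry("siteA_radiometer_20010602.dat", "siteA", "radiometer", date(2001, 6, 2)),
    CatalogEntry("siteA_lidar_20010601.dat", "siteA", "lidar", date(2001, 6, 1)),
    CatalogEntry("siteB_radiometer_20010601.dat", "siteB", "radiometer", date(2001, 6, 1)),
]

# A user asks for every measurement at siteA on 1 June 2001.
file_list = query_catalog(catalog, date(2001, 6, 1), date(2001, 6, 1), location="siteA")
print(file_list)  # ['siteA_lidar_20010601.dat', 'siteA_radiometer_20010601.dat']
```

In the schematic's terms, the returned file list is what would drive retrieval from mass storage; the catalog itself never touches the data files.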
Why Archive??
“I am doing Science. Trust me.”
Cycles of Research: “An Information View”
[Figure: a research cycle linking problem definition (research objectives), planning, measurement collection, automation and review, information review, selection and extraction, analysis and modeling, publications, and the archive of data; original and secondary observations are marked with time scales of 200 yrs and 20 yrs.]
Why Don’t I Archive My Data?
• No incentives - what’s in it for me?
• No acknowledgment - does a dataset = paper?
• Give up publication rights - will somebody scoop me?
• Poor planning - it was not in “the Plan”
• No resources - who’s going to pay for it?
• Lack of training - what do I do first?
• Unsure about metadata content - how much is enough?
Why Should I Archive My Data? (management hints!!)
• Career advancement (give them credit)
  - You will get some recognition
  - You can publish a data paper in ESA Ecological Archives
  - It may help you do science with broader scope
• Professional incentives (give them training)
  - Good scientific practice (create peer pressure)
• Institutional incentives (have expectations)
  - Required by the sponsor
• Technological advances (give them systems)
  - It’s easier, and there are more options
Archiving Supports Science
• Metadata required for archiving will improve data quality
• Extends data usefulness
• Increases your information base for doing research: data volume and diversity
• Permits replication of results, a KEY concept of Science
The Effects of Project Scale on Archives
“Metadata are archive neurons??”
Metadata Depend on Your “World View”
• Investigator
  - Doesn’t need extensive formal metadata
• Project
  - Metadata needed for project integration and modeling activities
  - Project data manager may help write metadata
• Data archive
  - More detailed metadata (e.g., spatial coordinates)
  - More standardization (e.g., keywords) to communicate clearly with future users
  - Who writes the metadata?
Measurement
(In the beginning was the measurement. It was formless and desolate. Without context…)
Measurement: Single Experiment View
[Figure: the measurement is described only by date, sample ID, parameter name, and location.]
Measurement: Research Project View
[Figure: a QA flag and media are added to date, sample ID, parameter name, and location.]
Measurement: Long-term or Multidisciplinary View
[Figure: generator, method, records, and units are added to QA flag, media, date, sample ID, parameter name, and location.]
Measurement: Integrated System & Archive View
[Figure: the long-term measurement record, with each attribute linked to its own definition: sample (type, date, location, generator; lab or field), method (descriptive text), parameter (units, method), generator (organization type, name, custodian, address, etc.), location (coordinates, elevation type, depth; linked to a GIS), record system (date, descriptive text), QA definition, and units definition.]
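The integrated view amounts to a relational design: each measurement stores short codes, and each code must resolve to a definition. A minimal sketch, assuming dictionary-based definition tables and invented codes (`S-001`, `SITE-01`, `M-12`, `ORG-7`, and the coordinates are all hypothetical); the record-system link is left out for brevity:

```python
from dataclasses import dataclass

# Hypothetical definition tables; in an archive these would be database tables.
DEFINITIONS = {
    "sample": {"S-001": {"type": "soil core", "lab_or_field": "field"}},
    "location": {"SITE-01": {"lat": 36.0, "lon": -84.0, "elev_m": 250.0, "depth_m": 0.1}},
    "parameter": {"soil_temp": {"description": "soil temperature", "units": "degC", "method": "M-12"}},
    "method": {"M-12": {"text": "thermistor probe, read hourly"}},
    "units": {"degC": {"name": "degrees Celsius"}},
    "qa": {"V": {"meaning": "valid"}, "S": {"meaning": "suspect"}},
    "generator": {"ORG-7": {"name": "Example Lab", "custodian": "J. Doe"}},
}


@dataclass(frozen=True)
class Measurement:
    sample_id: str
    date: str        # ISO 8601 date, e.g. "2001-06-01"
    location: str
    parameter: str
    value: float
    units: str
    method: str
    qa_flag: str
    media: str       # e.g. "soil", "air", "water"
    generator: str


def dangling_references(m: Measurement, defs: dict) -> list[str]:
    """Names of the definition tables that have no entry for m's codes."""
    links = {
        "sample": m.sample_id,
        "location": m.location,
        "parameter": m.parameter,
        "method": m.method,
        "units": m.units,
        "qa": m.qa_flag,
        "generator": m.generator,
    }
    return sorted(table for table, key in links.items() if key not in defs.get(table, {}))


ok = Measurement("S-001", "2001-06-01", "SITE-01", "soil_temp", 18.4, "degC", "M-12", "V", "soil", "ORG-7")
bad = Measurement("S-001", "2001-06-01", "SITE-01", "soil_temp", 65.1, "degF", "M-12", "V", "soil", "ORG-7")
print(dangling_references(ok, DEFINITIONS))   # []
print(dangling_references(bad, DEFINITIONS))  # ['units']
```

An unresolved code is exactly the kind of gap a future user cannot repair, so catching it at ingest is cheaper than catching it decades later.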
Another View of Scale
Project Scale and Recorded Metadata
[Figure: increasing user scope, from PI to metadata group to archive (program level), with the recorded metadata elements:]
• Units
• Method
• QA flag
• Media
• Parameter name
• Measurement
• Date
• Sample ID
• Location
• Generator
• Records
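Because each wider scope adds elements, the metadata a dataset still lacks depend on where it is headed. A sketch, assuming the element sets from the single-experiment, research-project, and long-term Measurement views map onto the PI, project, and archive scopes (that mapping is an interpretation of the figures, not something the slides state); `REQUIRED` and `missing_metadata` are hypothetical names:

```python
# Element sets read off the Measurement views; each scope extends the narrower one.
SINGLE_EXPERIMENT = ("date", "sample_id", "parameter_name", "location")
RESEARCH_PROJECT = SINGLE_EXPERIMENT + ("qa_flag", "media")
LONG_TERM = RESEARCH_PROJECT + ("generator", "method", "records", "units")

# Assumed mapping from user scope to the element set it needs.
REQUIRED = {"pi": SINGLE_EXPERIMENT, "project": RESEARCH_PROJECT, "archive": LONG_TERM}


def missing_metadata(record: dict, scope: str) -> list[str]:
    """Elements required at `scope` that are absent or blank in `record`."""
    return [name for name in REQUIRED[scope] if record.get(name) in (None, "")]


pi_record = {"date": "2001-06-01", "sample_id": "S-001", "parameter_name": "soil_temp", "location": "plot 3"}
print(missing_metadata(pi_record, "pi"))       # []
print(missing_metadata(pi_record, "archive"))  # ['qa_flag', 'media', 'generator', 'method', 'records', 'units']
```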
Data Maturation and Scale
• Individual investigators: collect data, quality assure, document, analyze, publish
• Groups or science teams: collate data, enhance, synthesize, model, publish
• Project information system: collate data, review completeness, maintain data for the project
• Data distribution and archive center: long-term archive, distribute freely to users
• Master data directory: searchable index with pointers to data
Preparing for Archiving
I will not wait. I will not wait. I will not wait. I will not …
Measurement: Generic Environmental Data Model (Which Piece Is First…?)
[Figure: the integrated measurement model: a measurement with date, sample ID, parameter name, location, QA flag, media, generator, method, records, and units, each linked to its own definition (sample, method, parameter, QA, units, location/GIS, record system).]
Measurement: Sequence of Information Birth
[Figure: the generic environmental data model, viewed as the sequence in which its pieces of information are created.]
Research ~ Publishing ~ Metadata
Metadata design can be a “checklist” for research planning
Metadata preparation can be integrated with publication process
Metadata are an investment in current and future science
Where to Archive Data?
Archive Choices
What determines your options?
• Sponsor requirements
• Repository access
• Metadata requirements
Scalable storage
• Personal web pages and files
• Project or network data centers
• Federal data centers
Links “transcend” storage structures
• Master directory
• Mercury
Personal Web Page
It’s fun, rewarding, and relatively easy; you can share data quickly and control access to data.
Data issues??
• Complete metadata
• QA checks
Connected to basic archival center functions??
• Ready access to data (24 h/d, 7 d/wk)
• User support
• Data available on multiple media
• Secure, backed-up, long-term storage
ESA Ecological Archives
• Publishing datasets as peer-reviewed, citable papers (with volume and page numbers)
• Data papers are announced in abstract form in a print journal, with data available electronically
Citation example:
Esser, G., H.F.H. Lieth, J.M.O. Scurlock and R.J. Olson. 2000. Osnabrück net primary productivity data set. (Ecological Archives data paper E081-011). Ecology 81, 1177-1177.
Bill Michener, Editor http://esa.sdsc.edu/esapubs/Journals_main.htm
Master Data Directory
• Provides search capability and pointers to a source of the data (the center does not archive data)
• Maintains standard keywords/indices
• Collects metadata from many sources
Examples:
• Global Change Master Directory (GCMD): http://gcmd.gsfc.nasa.gov
• ORNL DAAC Mercury System: http://mercury.ornl.gov
What is Mercury?
Mercury is used to assist an investigator with documenting data and making these data available to others.
[Figure: Mercury workflow (NASA / ORNL) linking the data provider's data and documentation, the metadata index, and the user:]
1. The data provider uses the Metadata Editor to create a metadata file containing links to the data and documentation.
2. Mercury harvests the metadata and builds an index.
3. Users query the index.
4. Full metadata are returned to the user, including links back to the data provider.
5. The user links to the data provider’s server.
6. Data and documentation are downloaded directly from the data provider.
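The six Mercury steps describe a harvest-and-index pattern: providers keep the data, and the central service keeps only metadata and links. A toy sketch of steps 2 to 4 using a keyword inverted index; the records, `example.org` URLs, and function names are hypothetical, and this is not Mercury's actual implementation:

```python
from collections import defaultdict

# Step 1 (assumed format): metadata records published by data providers.
records = [
    {"id": "ds-001", "title": "Grassland soil carbon survey", "keywords": ["soil", "carbon"],
     "data_url": "http://provider-a.example.org/ds-001/"},
    {"id": "ds-002", "title": "Forest canopy carbon flux", "keywords": ["flux", "carbon", "forest"],
     "data_url": "http://provider-b.example.org/ds-002/"},
    {"id": "ds-003", "title": "Stream water chemistry", "keywords": ["water", "chemistry"],
     "data_url": "http://provider-b.example.org/ds-003/"},
]


def build_index(harvested):
    """Step 2: build a word -> record-id index from harvested metadata only."""
    index, by_id = defaultdict(set), {}
    for rec in harvested:
        by_id[rec["id"]] = rec
        for word in rec["title"].lower().split() + [k.lower() for k in rec["keywords"]]:
            index[word].add(rec["id"])
    return index, by_id


def search(index, by_id, query):
    """Steps 3-4: return full metadata (with links) for records matching every term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [by_id[i] for i in sorted(hits)]


index, by_id = build_index(records)
for rec in search(index, by_id, "carbon"):
    # Steps 5-6 happen outside the index: the user follows data_url to the provider.
    print(rec["id"], rec["data_url"])
```

The design choice mirrors the slide: the index can be rebuilt at any time by re-harvesting, while the authoritative data never leave the provider.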
Regional Archives
Sources of Regional Data
• Carbon Dioxide Information Analysis Center
• National Geophysical Data Center
• National Environmental Satellite, Data, and Information Service
• National Soils Data Access Facility
• National Water Information System
• Forest Inventory and Analysis
• Breeding Bird Survey
• Threatened and Endangered Species
• Global Change Master Directory
NASA EOSDIS Distributed Active Archive Centers
[Map: DAACs at JPL, U. Alaska, U. Colorado, EDC, ORNL, GSFC, LaRC, and SEDAC, covering ocean circulation and air-sea interaction; sea ice and polar processes; cryosphere; land processes; biogeochemical dynamics; upper atmosphere, global biosphere, and geophysics; atmospheric processes; and socio-economic data.]
[Figure: example global data layers: precipitation, cloud amount, fossil fuel emissions, soil carbon, topography, LW radiation, clear-sky albedo, and vegetation biophysics (fPAR). Global scale, 280 parameters: surface, atmospheric, fluxes.]
Future: Issues to Resolve
• Size, diversity, and longevity
• Accommodating change
• Teaching good practices
Issues: Size, Diversity, Longevity
Size
• Online vs. offline
• Database vs. file structure
• Multiple institutions
• Too big for technology migration??
Diversity
• Increased logic and documentation for “finding data”
• Spatial distribution
• Increased potential for uniqueness conflicts
Longevity
• Too old to explain or decode
• Too much evolution of methods and practices
• Asynchronous change in data and metadata
Issues: Planning and Requirements
Plan for archiving early, and keep planning throughout the project
• Avoids missing metadata
• Avoids panic
• Improves overall data quality and consistency
Consider the timing of requirements
Requirements
• Standards: “to be or not to be?”
• Documentation expectations
• Accessibility
“It’s mine!! It’s my data!! You CAN’T have it!!”
Research Implies Change …
[Figure: a repeating cycle of research, discovery, new questions, and new information requirements (repeat…).]
Not always true for other information systems.
Issues: Accommodating Change
Change must be considered in the design. Things that will change:
• Access expectations
• Logical hierarchy of information scope
• New parameters
• New disciplines
• New study sites
• New data sources or methods
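One common way to let new parameters and disciplines arrive without redesigning tables (an assumption; the slide does not prescribe a layout) is to store one row per site, date, parameter, and value instead of one column per parameter. A minimal sketch with hypothetical parameter names; `to_long` is an illustrative helper:

```python
# Hypothetical wide rows: one column per parameter.
wide_rows = [
    {"site": "SITE-01", "date": "2001-06-01", "air_temp": 21.3, "precip": 0.0},
    {"site": "SITE-01", "date": "2001-06-02", "air_temp": 19.8, "precip": 4.2},
]

ID_COLUMNS = ("site", "date")


def to_long(rows, id_columns=ID_COLUMNS):
    """Melt wide rows into (site, date, parameter, value) tuples."""
    long_rows = []
    for row in rows:
        ids = tuple(row[c] for c in id_columns)
        for name, value in row.items():
            if name not in id_columns:
                long_rows.append((*ids, name, value))
    return long_rows


long_rows = to_long(wide_rows)
# A parameter from a new discipline is just another row; no column is added.
long_rows.append(("SITE-01", "2001-06-02", "soil_co2_flux", 3.7))
parameters = sorted({r[2] for r in long_rows})
print(parameters)  # ['air_temp', 'precip', 'soil_co2_flux']
```

The trade-off is that each value now carries its own parameter name, which is also where its units, method, and QA flag can attach.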
Issues: More Changes
Unpredictable variation is no excuse!!
• Often used as an excuse to avoid standards
• Cannot avoid all of it, but try…
• Missing values will occur; plan ahead
• Do not do: Temp, temp, t, T, temperature
• Be clear, avoid ambiguity
Minimal observational intensity is no excuse!!
• Quick study = no documentation??
• The unexpected are rare and most valuable??
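The “Temp, temp, t, T, temperature” problem and the missing-value warning can both be handled at ingest. A sketch, assuming a hypothetical synonym table that maps every variant to one defined parameter name (`air_temperature`) and a hypothetical set of missing-value codes that are all converted to NaN:

```python
import math

# Hypothetical synonym table: every label variant maps to one defined parameter.
# Lower-casing first means "T" and "t" collapse together, which is only safe
# when the project has agreed that both mean air temperature.
CANONICAL = {"temp": "air_temperature", "t": "air_temperature", "temperature": "air_temperature"}

# Hypothetical missing-value codes seen in incoming files.
MISSING_CODES = {"", "NA", "-999", "-9999"}


def standardize_header(label: str) -> str:
    """Map a column label to its defined parameter name, or fail loudly."""
    key = label.strip().lower()
    if key not in CANONICAL:
        raise KeyError(f"undefined parameter label: {label!r}")
    return CANONICAL[key]


def parse_value(text: str) -> float:
    """Convert one field to float, turning every missing-value code into NaN."""
    return math.nan if text.strip() in MISSING_CODES else float(text)


headers = ["Temp", "temp", "t", "T", "temperature"]
print({standardize_header(h) for h in headers})  # {'air_temperature'}
print([parse_value(v) for v in ["21.5", "-999", "NA", " 19.0 "]])  # [21.5, nan, nan, 19.0]
```

Raising on an unknown label, rather than guessing, keeps a new variant from slipping silently into the archive.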
Rules for Creating Databases for Archiving
Unique occurrences
• Each type of measurement is represented in a consistent way
• Each measurement event is represented by only one value
Identifiers
• Each value is associated with a parameter name
• Each measurement value has a quality indicator and a link to a method description
Place and time
• Each value is associated with a unique place name with a quantitatively defined location (geographic coordinates)
• Each value is associated with a date and time
Data storage and transport
• Data are stored or managed with a database management system or equivalent
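The rules map naturally onto database constraints. A minimal SQLite sketch using Python's standard `sqlite3` module, with hypothetical table, column, and code names: `UNIQUE` enforces one value per measurement event, `NOT NULL` and foreign keys tie each value to a parameter, method, QA flag, place, and time, and `CHECK` keeps coordinates in range:

```python
import sqlite3

SCHEMA = """
CREATE TABLE place (
    place_name TEXT PRIMARY KEY,
    latitude   REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    longitude  REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180)
);
CREATE TABLE method (method_id TEXT PRIMARY KEY, description TEXT NOT NULL);
CREATE TABLE parameter (parameter_name TEXT PRIMARY KEY, units TEXT NOT NULL);
CREATE TABLE measurement (
    place_name     TEXT NOT NULL REFERENCES place(place_name),
    observed_at    TEXT NOT NULL,          -- ISO 8601 date and time
    parameter_name TEXT NOT NULL REFERENCES parameter(parameter_name),
    value          REAL,                   -- NULL marks a missing value
    qa_flag        TEXT NOT NULL,
    method_id      TEXT NOT NULL REFERENCES method(method_id),
    UNIQUE (place_name, observed_at, parameter_name)  -- one value per event
);
"""


def open_archive_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(SCHEMA)
    return conn


def insert_measurement(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one measurement; return False if any constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_archive_db()
with conn:
    conn.execute("INSERT INTO place VALUES ('PLOT-1', 36.0, -84.3)")
    conn.execute("INSERT INTO method VALUES ('M-1', 'tipping-bucket rain gauge')")
    conn.execute("INSERT INTO parameter VALUES ('precip', 'mm')")

event = ("PLOT-1", "2001-06-01T12:00", "precip", 4.2, "V", "M-1")
print(insert_measurement(conn, event))  # True
print(insert_measurement(conn, event))  # False: same place, time, and parameter
print(insert_measurement(conn, ("PLOT-1", "2001-06-01T13:00", "precip", 1.0, "V", "M-9")))  # False: unknown method
```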
Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive
Best practices include:
• Assign descriptive file names
• Use consistent and stable file formats
• Define the parameters
• Use consistent data organization
• Perform basic quality assurance
• Assign descriptive data set titles
• Provide documentation
Published: Cook et al. 2001. Bulletin of the Ecological Society of America. http://www.daac.ornl.gov/DAAC/bestpractices.html
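“Perform basic quality assurance” often starts with a range check that flags, rather than deletes, suspicious values. A sketch, assuming hypothetical plausible ranges and a hypothetical three-code flag vocabulary (V = valid, R = out of range, M = missing):

```python
import math

# Hypothetical plausible ranges per parameter; real limits come from the
# parameter definitions and expert review.
VALID_RANGE = {
    "air_temperature": (-60.0, 60.0),  # degC
    "precipitation": (0.0, 500.0),     # mm per day
}


def qa_flag(parameter: str, value) -> str:
    """Return 'M' (missing), 'R' (out of range), or 'V' (passes the basic check)."""
    if value is None or math.isnan(value):
        return "M"
    low, high = VALID_RANGE[parameter]
    return "V" if low <= value <= high else "R"


readings = [
    ("air_temperature", 21.4),
    ("air_temperature", 99.0),
    ("precipitation", float("nan")),
    ("precipitation", 12.5),
]
flags = [qa_flag(p, v) for p, v in readings]
print(flags)  # ['V', 'R', 'M', 'V']
```

Keeping the original value next to its flag leaves the final judgment to the data user and preserves the rare, unexpected observation.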
Reflecting Into the Future…
Workshop Reactions
• Distributed (sensor) processing: yes / no
• Automated QA
• Getting data dirty
• Metadata early: 10X easier, scalable
• Differentiate standards
  - Intentional variance only
  - Partition / isolate exceptions when possible
• Look for 3, 5, 10X changes; 20-30% not worthwhile
Summary Points
• Archives need structure and standards
• Social and education solutions are VERY important
• Metadata are the “neurons” of archives
• Metadata early is better than late
• Need to think about our choices
Future Thoughts
• Will we be able to know “Where are we?” in the information structure?
  - How many 30 KB files are on a 100 GB tape cartridge?
• The future limits will not be technology, but our minds…
• We need to plan NOW how to best leverage the future
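The tape question has a concrete answer: millions of files per cartridge, which is why catalog metadata matter for knowing “where we are.” A quick calculation that ignores tape formatting and per-file overhead (an assumption; real usable capacity is lower), shown for both decimal and binary unit conventions:

```python
KB, GB = 10**3, 10**9     # decimal (SI) units
KiB, GiB = 2**10, 2**30   # binary units, for comparison

decimal_count = (100 * GB) // (30 * KB)
binary_count = (100 * GiB) // (30 * KiB)

print(f"{decimal_count:,} files (decimal units)")  # 3,333,333 files (decimal units)
print(f"{binary_count:,} files (binary units)")    # 3,495,253 files (binary units)
```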
A Future Scientist’s View
I told my college-age daughter about the Japanese announcement of 1 TB of optical memory in 1 cubic centimeter.
Her reply: “…We need to know how to think critically and select what kinds of projects and data we need to keep because the limiting factor will be our minds, not the technology.”
Looking Forward to a Future With Archives!!