The Research Data Archive at NCAR
Doug Schuster and Steve Worley
NCAR
Topic Outline
- Introduction/History
- Core Data Categories/Featured Datasets
- Archive Management/Tools
- New Supporting IT Infrastructure
- Future Possibilities
1/25/2011 AMS 2011 2
Introduction/History
Data Support Section (Founded 1965)
- Paper -> Punch Cards -> Tapes -> CDs/DVDs -> Hard Drives -> Network Based Storage and Transfer
- KB of observations -> Terabytes of Model Generated Data (Total archive volume over 600 TB)
- Weeks or months for a user to get data -> Users want data access now (over 7000 registered users)
- Pay for Data -> Free and open access to all datasets that aren't subject to source restrictions
Introduction/History
How do we evolve to support the growing needs of data users and generators?
- Stay aware of current research uses
- Strengthen datasets supporting core research data categories
- Update archive management tools
- Rebuild/Augment IT infrastructure
- Educate supporting staff
Core Data Categories
Content to support atmospheric and geosciences research
Some research examples:
- Climate
- Oceanographic
- Hydrologic
- Weather Prediction
- Renewable Energy (Wind/Solar)
Core Data Categories
Operational and Reanalysis model outputs
Meteorological and Oceanographic Observations
Remote Sensing Observations
Topography/Bathymetry, Vegetation, Land Use
Featured Datasets
Platform Observations
Dataset Title | Coverage | Update Frequency
NCEP GDAS observations (PREPBUFR and NetCDF) | Global, 1999 - Present | Daily
RDA Upper Air Database | Global, 1920 - Present | Monthly
NCDC TD3200 U.S. Cooperative Summary of Day | U.S., 1890 - Present | Monthly
Unidata IDD GTS based observations (NetCDF) | Global, 2002 - Present | Daily
NCEP operational observations (ON-29 Format) | Global, 1975 - 2007 | Fixed
International Comprehensive Ocean-Atmosphere Data Set (ICOADS) | Global, 1662 - Present | Monthly
[Timeline figure: Global Platform Observations, 1662 to 2011]
Featured Datasets
Analysis and Forecast Model Data
Dataset Title | Coverage | Update Frequency
THORPEX Interactive Grand Global Ensemble (TIGGE) | Global, 2006 - Present | Hourly
Unidata IDD (GFS 0.5deg, RUC 20km, NAM 12km) | Global and Regional, 2002 - Present | Daily
NCEP ETA/NAM (40km) | North America, 1995 - Present | Monthly
ECMWF Operational Deterministic (1.25 x 1.25 Deg) | Global, 1985 - Present | Bi-Yearly
NCEP GDAS Final Analysis (1x1 Deg) | Global, 1999 - Present | Daily
NCEP OI Global SST (1x1 Deg) | Global, 1981 - Present | Weekly
NOAA OI Global SST (0.25 x 0.25 Deg) | Global, 1981 - Present | Monthly
Hadley Centre Global Sea Ice and SST | Global, 1850 - Present | Monthly
[Timeline figure: Analysis and Forecast Model Data, 1850 to 2011]
Featured Datasets
High Resolution Re-Analysis
Dataset Title | Coverage | Update Frequency
ERA-40 (T159) | Global, 1957 - 2002 | Static Set
ERA-Interim (N128 Gaussian) | Global, 1989 - Present | Yearly
JRA-25 (1.125 Deg Gaussian) | Global, 1979 - Present | Yearly
NCEP/DOE (T62) | Global, 1979 - Present | Static Set
NCEP/NCAR (T62) | Global, 1948 - Present | Quarterly
NARR (32 x 32 km) | North America, 1979 - Present | Quarterly
CFSR (0.5 x 0.5 Deg) | Global, 1979 - Present | Monthly
NOAA-CIRES 20th Century | Global, 1870 - 2008 | Static Set
[Timeline figure: High Resolution Re-Analysis, 1870 to 2011]
Archive Management
How can we support an archive that continuously grows in volume and complexity with a fixed number of supporting staff?
Archive Management
Common Data Management Tools
Functionality Requirements
- Scalable
- Integrated: one call does all
- Automatable
Archive Management
Common Data Management Tools
Task Completion Requirements
1. Data Acquisition
   - Get Data (daily or irregularly)
2. Data Archival
   - Archive to disk and tape
3. Metadata Collection
   - Collect Metadata
   - Update Metadata Databases
4. Metadata Publishing
   - Update Web Server Pages
   - Update Internal Metadata Access Points
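The four tasks can be read as a pipeline that a single call drives from start to finish. The following is a minimal sketch only: the function names (`acquire`, `archive`, `collect_metadata`, `publish`, `update_dataset`), the `ArchiveState` class, and the dataset ID `ds000.0` are hypothetical, and in-memory dictionaries stand in for disk, tape, and the metadata databases. It does not describe the actual RDA tools.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveState:
    """In-memory stand-in for disk, tape, and metadata stores (hypothetical)."""
    disk: dict = field(default_factory=dict)       # file name -> bytes
    tape: dict = field(default_factory=dict)       # file name -> bytes
    metadata: dict = field(default_factory=dict)   # file name -> attributes
    published: list = field(default_factory=list)  # names visible to users


def acquire(source: dict) -> dict:
    """Task 1: get data. Here the 'remote source' is a dict of name -> bytes."""
    return dict(source)


def archive(state: ArchiveState, files: dict) -> None:
    """Task 2: archive every file to both disk and tape."""
    for name, payload in files.items():
        state.disk[name] = payload
        state.tape[name] = payload


def collect_metadata(state: ArchiveState, names, dataset: str) -> None:
    """Task 3: record simple attributes for each newly archived file."""
    for name in names:
        state.metadata[name] = {"dataset": dataset, "size": len(state.disk[name])}


def publish(state: ArchiveState) -> None:
    """Task 4: refresh the published file list from the metadata store."""
    state.published = sorted(state.metadata)


def update_dataset(state: ArchiveState, source: dict, dataset: str) -> list:
    """'One call does all': run the four tasks in order."""
    files = acquire(source)
    archive(state, files)
    collect_metadata(state, files, dataset)
    publish(state)
    return state.published
```

For example, `update_dataset(ArchiveState(), {"b.grb2": b"abc", "a.grb2": b"de"}, "ds000.0")` returns `["a.grb2", "b.grb2"]`. Keeping each task as its own function is one way to make the whole sequence automatable while still allowing a single step to be rerun by hand.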
Integrated Archival Tools
Step 1: Get Data
Incoming data types and formats:
- Model Generated Data: GRIB, NetCDF
- Obs Data: BUFR, ASCII, etc.
- Topography: Vector Image, Binary, etc.
- Remote Sensing Data: Binary
Transfer to RDA/CISL Servers:
- Automated: dsupdt
- Manual: Tape, FTP, etc.
Integrated Archival Tools
Step 2: Archive Data
Data on RDA/CISL Servers:
- Model Generated Data: GRIB, NetCDF
- Obs Data: BUFR, ASCII, etc.
- Topography: Vector Image, Binary, etc.
- Remote Sensing Data: Binary
dsarch (example: Model Generated Data Files, GRIB-2):
- Archives each file to DISK and HPSS
- Records file attribute metadata in the RDA Database: Name, Dataset, Location, Format
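The file attribute metadata (Name, Dataset, Location, Format) maps naturally onto one database row per archived copy. The sketch below illustrates that idea with Python's built-in `sqlite3` module. The table name `file_attributes`, the helper functions, and the choice of SQLite are all assumptions made for illustration; the slides do not say how the RDA Database is implemented.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical table of file attribute metadata."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_attributes ("
        " name TEXT NOT NULL,"
        " dataset TEXT NOT NULL,"
        " location TEXT NOT NULL,"
        " format TEXT NOT NULL,"
        " PRIMARY KEY (name, dataset, location))"
    )
    return conn


def record_file(conn, name: str, dataset: str, location: str, fmt: str) -> None:
    """Insert or refresh one row; one row per copy (DISK or HPSS)."""
    if location not in ("DISK", "HPSS"):
        raise ValueError(f"unknown location: {location}")
    conn.execute(
        "INSERT OR REPLACE INTO file_attributes VALUES (?, ?, ?, ?)",
        (name, dataset, location, fmt),
    )
    conn.commit()


def locations(conn, name: str, dataset: str) -> list:
    """Return the sorted list of locations holding a copy of the file."""
    rows = conn.execute(
        "SELECT location FROM file_attributes"
        " WHERE name = ? AND dataset = ? ORDER BY location",
        (name, dataset),
    )
    return [row[0] for row in rows]
```

Recording the same file once with `"DISK"` and once with `"HPSS"` yields two rows, so `locations(...)` returns `["DISK", "HPSS"]`. Rerunning an archive call does not create duplicates because the primary key covers name, dataset, and location.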
Integrated Archival Tools
Step 3: Collect File Content Metadata/Check Integrity
Example: Model Generated File, GRIB-2 Format, on RDA/CISL Servers
File contents:
- Temperature (Center, Date, Time, Level, Location)
- Humidity (Center, Date, Time, Level, Location)
- Vorticity (Center, Date, Time, Level, Location)
- Visibility (Center, Date, Time, Level, Location)
- Precip Rate (Center, Date, Time, Level, Location)
GatherMetadata -> RDA DB:
- File attribute metadata: Name, Dataset, Location, Format
- File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
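Step 3 combines two ideas: summarizing what each file contains, per parameter, and checking that the file is intact. The sketch below shows one possible shape for both. It assumes the fields have already been decoded into records (no GRIB decoding is attempted), and it uses a SHA-256 checksum as the integrity check, which is a substitute chosen for illustration. `FieldRecord`, `summarize`, `checksum`, and `verify` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import NamedTuple


class FieldRecord(NamedTuple):
    """One decoded field: parameter plus (Center, Date, Time, Level, Location)."""
    parameter: str  # e.g. "T", "RH", "Vort"
    center: str
    date: str       # ISO format "YYYY-MM-DD" so strings sort chronologically
    time: str       # "HH:MM"
    level: str
    location: str


def checksum(payload: bytes) -> str:
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(payload).hexdigest()


def verify(payload: bytes, expected: str) -> bool:
    """Integrity check: does the payload still match its recorded checksum?"""
    return checksum(payload) == expected


def summarize(records) -> dict:
    """Build per-parameter content metadata: count, time range, levels."""
    summary = defaultdict(
        lambda: {"count": 0, "first": None, "last": None, "levels": set()}
    )
    for rec in records:
        entry = summary[rec.parameter]
        stamp = f"{rec.date} {rec.time}"
        entry["count"] += 1
        entry["first"] = stamp if entry["first"] is None else min(entry["first"], stamp)
        entry["last"] = stamp if entry["last"] is None else max(entry["last"], stamp)
        entry["levels"].add(rec.level)
    return dict(summary)
```

A summary of this kind is what lets a search tool answer "which files contain temperature at 500 hPa for this date range" without opening the files themselves.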
Integrated Archival Tools
Step 4: Publish Metadata and Data
RDA DB (on RDA/CISL Servers):
- File attribute metadata: Name, Dataset, Location, Format
- File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
RDA Web Server
- Dynamic File lists
- Data Search tools
- Detailed Content Metadata
- Data Subsetting Interfaces
CISL Computational Node
- Detailed Metadata for files on disk
- Data Subsetting
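A dynamic file list can be produced by filtering the stored metadata at request time instead of maintaining static pages. The sketch below illustrates this under stated assumptions: the catalog is a plain list of dictionaries that merge attribute and content metadata, and `file_list` and its arguments are hypothetical names, not part of the RDA software.

```python
def file_list(catalog, dataset, parameter=None, on_disk_only=False):
    """Return sorted file names in a dataset, optionally filtered.

    catalog: iterable of dicts with keys "name", "dataset", "location",
             "format", and "parameters" (a set of parameter short names).
    parameter: keep only files whose content metadata lists this parameter.
    on_disk_only: keep only files with a copy whose location is "DISK".
    """
    names = set()
    for entry in catalog:
        if entry["dataset"] != dataset:
            continue
        if parameter is not None and parameter not in entry["parameters"]:
            continue
        if on_disk_only and entry["location"] != "DISK":
            continue
        names.add(entry["name"])
    return sorted(names)
```

Under these assumptions the same function could back both views on this slide: a web file list that shows every archived file, and a compute-node view limited to files on disk.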
New Supporting IT/Infrastructure
Online Disk Upgrades
- Larger Disk (450 TB)
- Common Disk Interfaces (webserver and compute nodes)
Tape Archive Upgrades
- High Performance Storage System (HPSS)
Computing Power Upgrades
- Additional and more powerful servers
New Supporting IT/Infrastructure
Complete User Community
Pros:
- Fast access to online data.
- Access to all RDA metadata.
- Access to RDA data processing services.
Cons:
- Small fraction of RDA online.
- Slow access to offline data.
- Data processing requests take a long time to finish.
NCAR User Community
Pros:
- Access to full RDA.
- Fast computing.
Cons:
- No access to online data.
- Forced to use MSS as a file server; access is too slow.
- No direct access to RDA metadata.
New Supporting IT/Infrastructure
Complete User Community
Improvements:
- Faster access to full RDA.
- Expanded data processing services available.
- Faster turnaround on data processing requests.
NCAR User Community
Improvements:
- Faster access to full RDA.
- Direct access to all RDA metadata.
Future Possibilities
Leverage New IT Infrastructure
- Server side parameter and spatial sub-setting across multiple datasets
  - Model or In-Situ observations
- Data provided in multiple output formats
- Web services based requests (REST, etc.)
- Addition of large and diverse data sets to the RDA.
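Server-side parameter and spatial sub-setting driven by a web request can be illustrated with a small sketch. Everything here is an assumption made for illustration: the query parameter names (`params`, `south`, `north`, `west`, `east`), the grid layout, and the function names do not describe an actual RDA service interface.

```python
from urllib.parse import parse_qs


def parse_request(query: str) -> dict:
    """Parse a hypothetical REST-style query string into subset arguments."""
    q = parse_qs(query)
    return {
        "params": q["params"][0].split(","),
        "south": float(q["south"][0]),
        "north": float(q["north"][0]),
        "west": float(q["west"][0]),
        "east": float(q["east"][0]),
    }


def subset(grid: dict, params, south, north, west, east) -> dict:
    """Keep only the requested parameters inside a latitude/longitude box.

    grid: {"lats": [...], "lons": [...], "fields": {name: rows}} where
          rows[i][j] is the value at lats[i], lons[j].
    Note: this simple version does not handle boxes that wrap around the
    longitude seam (for example, west=350, east=10).
    """
    lat_idx = [i for i, lat in enumerate(grid["lats"]) if south <= lat <= north]
    lon_idx = [j for j, lon in enumerate(grid["lons"]) if west <= lon <= east]
    return {
        "lats": [grid["lats"][i] for i in lat_idx],
        "lons": [grid["lons"][j] for j in lon_idx],
        "fields": {
            p: [[grid["fields"][p][i][j] for j in lon_idx] for i in lat_idx]
            for p in params
        },
    }
```

A request such as `subset(grid, **parse_request("params=T&south=35&north=50&west=255&east=270"))` would return only the temperature field over the chosen box, so the user downloads a fraction of the full file.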
http://dss.ucar.edu