Unidata Infrastructure for Data Services
Russ Rew
GO-ESSP Workshop, LLNL
2006-06-19
Some Current Unidata Infrastructure Projects
LDM for distributing and processing near real-time data
Integrated Data Viewer (IDV) for testing infrastructure in platform-independent data visualization and analysis
NetCDF C-based interfaces for data access
CFIOlib for a CF conventions API (tomorrow)
NetCDF Java for advanced data access infrastructure
Common Data Model for improving interoperability
NcML for metadata annotation and data aggregation
THREDDS Data Server (TDS) for remote access to archives
GALEON for serving netCDF data through OGC Web Coverage Services (WCS)
LDM-6 for Internet Data Distribution
Implements a peer-to-peer system for reliable, event-driven data distribution
Supports subscriptions to many near real-time data feeds; no data center needed
Data product abstraction is general: model output, observations, text products, satellite data, radar, …
Protocols use persistent connections to achieve low latency
Highly configurable: inject, distribute, capture, filter, and process arbitrary data products
In continuous use by over 160 universities, NOAA, USGS, NASA, international users, THORPEX global ensembles (TIGGE), …
Candidate for use in the new WMO weather information system
[Figure: data sources feeding a network of interconnected LDM nodes across the Internet]
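The subscribe-and-relay pattern behind LDM-6 can be sketched in a few lines. This is a minimal sketch, not the LDM's implementation: the names `Product`, `Node`, `subscribe`, and `insert`, the feed labels, and the use of regular expressions on product identifiers are illustrative assumptions. Each node pushes a product to matching subscribers as soon as it arrives (event-driven, no central data center), and a downstream node subscribes simply by registering its own `insert` as the handler.

```python
"""Minimal sketch of LDM-style event-driven relaying (hypothetical API, not the LDM's)."""
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Product:
    feedtype: str  # hypothetical feed label such as "MODEL" or "RADAR"
    ident: str     # product identifier that subscription patterns are matched against
    data: bytes


@dataclass
class Node:
    name: str
    # (feedtype, compiled identifier pattern, handler) triples
    subscriptions: List[Tuple[str, re.Pattern, Callable[[Product], None]]] = field(default_factory=list)
    received: List[Product] = field(default_factory=list)

    def subscribe(self, feedtype: str, pattern: str, handler: Callable[[Product], None]) -> None:
        """Register interest in products of one feed whose identifiers match `pattern`."""
        self.subscriptions.append((feedtype, re.compile(pattern), handler))

    def insert(self, product: Product) -> None:
        """Accept a product and immediately push it to every matching subscriber."""
        if product in self.received:  # drop duplicates that arrive by a second path
            return
        self.received.append(product)
        for feedtype, pattern, handler in self.subscriptions:
            if feedtype == product.feedtype and pattern.search(product.ident):
                handler(product)


if __name__ == "__main__":
    source, relay, leaf = Node("source"), Node("relay"), Node("leaf")
    source.subscribe("MODEL", r"^gfs", relay.insert)  # relay wants GFS output only
    relay.subscribe("MODEL", r".*", leaf.insert)      # leaf takes everything the relay gets
    source.insert(Product("MODEL", "gfs_2006061900_f024", b"..."))
    source.insert(Product("RADAR", "nexrad_KFTG", b"..."))
    print([p.ident for p in leaf.received])  # ['gfs_2006061900_f024']
```

Because every node can both receive and forward, the same class serves as source, relay, or leaf, which is the peer-to-peer property the slide emphasizes.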
IDV (Integrated Data Viewer)
Freely available, 100% Java reference application and framework for visualization and analysis of geoscience data
Provides integrated, time-synchronized 2-D and 3-D visualizations of model output, observations, and remotely sensed data, using the University of Wisconsin VisAD library
Handles diverse formats and protocols for local and remote access: GRIB, netCDF, OPeNDAP, ADDE, HTTP, GIS, …
Serves as end-to-end test for many Unidata technologies: THREDDS services, Java netCDF, XML bundles, plug-in architecture, interactive collaboration, …
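One simple way to line up sources whose time stamps differ is nearest-time matching within a tolerance. The sketch below is an assumption for illustration, not the IDV's algorithm: `nearest_time` and `synchronize` are hypothetical helpers, and the 15-minute tolerance in the example is arbitrary. For each animation step it picks, from each source, the closest available time, or `None` when nothing is close enough.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def nearest_time(times: List[datetime], t: datetime, tolerance: timedelta) -> Optional[datetime]:
    """Return the element of sorted `times` closest to `t`, or None if none is within tolerance."""
    i = bisect_left(times, t)
    candidates = [times[j] for j in (i - 1, i) if 0 <= j < len(times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(c - t))
    return best if abs(best - t) <= tolerance else None


def synchronize(animation: List[datetime], sources: Dict[str, List[datetime]],
                tolerance: timedelta) -> List[Dict[str, Optional[datetime]]]:
    """For each animation step, choose the matching time from every source."""
    ordered = {name: sorted(ts) for name, ts in sources.items()}  # sort each source once
    return [{name: nearest_time(ts, t, tolerance) for name, ts in ordered.items()}
            for t in animation]


if __name__ == "__main__":
    day = datetime(2006, 6, 19)
    sources = {
        "model": [day + timedelta(hours=h) for h in (0, 6, 12)],
        "radar": [day + timedelta(hours=5, minutes=55), day + timedelta(minutes=5)],
    }
    steps = [day + timedelta(hours=h) for h in (0, 6, 9)]
    for step, frame in zip(steps, synchronize(steps, sources, timedelta(minutes=15))):
        shown = {k: (v.strftime("%H:%M") if v is not None else None) for k, v in frame.items()}
        print(step.strftime("%H:%M"), shown)
```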
NetCDF’s Niche
Simple data model for scientific datasets
Portable, self-describing data
Appendable, sharable, archivable
Direct access for efficient subsetting
Metadata via attribute conventions such as CF
Flexible remote access via OPeNDAP, HTTP, WCS
Lots of applications: NCO, ncbrowse, ncview, IDV, IDL, MATLAB, ArcGIS, ...
Language interfaces include C, Java, Fortran, C++, Perl, Python, Ruby, ...
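Direct access works because a netCDF classic file stores each fixed-size (non-record) variable contiguously in row-major order, starting at a byte offset (`begin`) recorded in the header, so the position of any element can be computed rather than searched for. The helpers below are hypothetical, and the example values (a `begin` of 1024 and a 4-byte `float`) are made up; a real reader would take `begin` from the parsed header.

```python
from typing import Sequence


def row_major_index(shape: Sequence[int], index: Sequence[int]) -> int:
    """Linear position of `index` in an array of `shape`, last dimension varying fastest."""
    if len(shape) != len(index):
        raise ValueError("index rank must match variable rank")
    linear = 0
    for length, i in zip(shape, index):
        if not 0 <= i < length:
            raise IndexError(f"index {i} out of range for dimension of length {length}")
        linear = linear * length + i
    return linear


def element_offset(begin: int, shape: Sequence[int], index: Sequence[int], elem_size: int) -> int:
    """Byte offset of one element of a contiguously stored fixed-size variable."""
    return begin + row_major_index(shape, index) * elem_size


if __name__ == "__main__":
    # Hypothetical float variable of shape (4, 5, 6) whose data starts at byte 1024.
    print(element_offset(1024, (4, 5, 6), (1, 2, 3), 4))  # 1204
```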
NetCDF-3 Data Model
[Diagram: netCDF-3 data model classes]
File: location: Filename; create( ), open( ), …
Dimension: name: String; length: int; isUnlimited( )
Variable: name: String; shape: Dimension[ ]; type: DataType; array: read( ), …
Attribute: name: String; type: DataType; values: 1D array
DataType: char, byte, short, int, float, double
A file has named variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One dimension may be of unlimited length.
Variables and attributes have one of six primitive data types.
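The classes in the diagram map naturally onto a few Python dataclasses. This is a minimal, in-memory sketch of the data model only (no file I/O); the class and method names are assumptions loosely following the diagram, not the netCDF C or Java API. It enforces the rules stated above (six primitive types, at most one unlimited dimension, dimensions shared by reference) plus one classic-format rule not shown on the slide: a variable may use the unlimited dimension only as its first dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

# The six netCDF-3 primitive types.
PRIMITIVE_TYPES = ("char", "byte", "short", "int", "float", "double")


@dataclass
class Dimension:
    name: str
    length: Optional[int]  # None marks the single unlimited (record) dimension

    def is_unlimited(self) -> bool:
        return self.length is None


@dataclass
class Attribute:
    name: str
    type: str
    values: list  # a 1-D array of values


@dataclass
class Variable:
    name: str
    shape: List[Dimension]  # references to shared Dimension objects
    type: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)


@dataclass
class NcFile:
    location: str
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add_dimension(self, name: str, length: Optional[int] = None) -> Dimension:
        if length is None and any(d.is_unlimited() for d in self.dimensions.values()):
            raise ValueError("netCDF-3 allows only one unlimited dimension")
        dim = Dimension(name, length)
        self.dimensions[name] = dim
        return dim

    def add_variable(self, name: str, type_: str, dim_names: Sequence[str]) -> Variable:
        if type_ not in PRIMITIVE_TYPES:
            raise TypeError(f"{type_!r} is not a netCDF-3 type")
        shape = [self.dimensions[n] for n in dim_names]  # KeyError if a dimension is undefined
        if any(d.is_unlimited() for d in shape[1:]):
            raise ValueError("the unlimited dimension must come first")
        var = Variable(name, shape, type_)
        self.variables[name] = var
        return var


if __name__ == "__main__":
    f = NcFile("example.nc")
    f.add_dimension("time")  # unlimited
    f.add_dimension("lat", 3)
    f.add_dimension("lon", 4)
    temp = f.add_variable("temp", "float", ["time", "lat", "lon"])
    pres = f.add_variable("pres", "float", ["time", "lat", "lon"])
    print(temp.shape[1] is pres.shape[1])  # True: the variables share a grid
```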
Some NetCDF-3 Limitations
Only one shared unlimited dimension
No structures, just scalars and multidimensional arrays
No strings, just arrays of characters
Limited numeric types
No ragged arrays or nested structures
Only ASCII characters in names
Changes to file schema can be expensive
Efficient access requires reads in same order as writes
No built-in compression
Only serial I/O
Flat name space limits scalability
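One concrete illustration of why access order matters is the classic format's record layout: the data for all record variables is interleaved one record at a time, so record `r` of a variable lives at `recbegin + r * recsize + offset_in_record`, and reading a whole time series of one variable jumps through the file in steps of `recsize`. The sketch below assumes per-record slab sizes are already padded to the format's 4-byte alignment and listed in definition order; the function names and example sizes are hypothetical.

```python
from typing import Dict, List, Tuple


def record_layout(slab_sizes: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Return each record variable's byte offset within one record, plus the record size.

    `slab_sizes` maps variable name -> bytes per record (assumed already padded),
    in the order the variables were defined.
    """
    offsets: Dict[str, int] = {}
    position = 0
    for name, size in slab_sizes.items():
        offsets[name] = position
        position += size
    return offsets, position


def record_offsets(rec_begin: int, slab_sizes: Dict[str, int], name: str, nrecs: int) -> List[int]:
    """Byte offsets of variable `name` in records 0 .. nrecs-1."""
    offsets, recsize = record_layout(slab_sizes)
    return [rec_begin + r * recsize + offsets[name] for r in range(nrecs)]


if __name__ == "__main__":
    # Hypothetical: two float(lat=3, lon=4) variables (48 bytes each) and a double time.
    sizes = {"temp": 48, "pres": 48, "time": 8}
    print(record_offsets(2000, sizes, "pres", 3))  # [2048, 2152, 2256]
```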
NetCDF-4 Features to Address Limitations
Multiple unlimited dimensions
Portable structured types
String type
Additional numeric types
Variable-length types for ragged arrays
Unicode names
Efficient dynamic schema changes
Multidimensional tiling (chunking)
Per-variable compression
Parallel I/O
Nested scopes using Groups
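Chunking stores a variable as fixed-size multidimensional tiles, so the cost of a read depends on how many tiles the requested subset overlaps rather than on the order in which the data was written; in HDF5, compression filters are also applied one chunk at a time. The helper below is hypothetical (not a netCDF-4 or HDF5 call) and lists the chunks a hyperslab touches; the variable shape (1000, 180, 360) and chunk shape (100, 30, 30) in the example are arbitrary assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple


def chunks_touched(chunk_shape: Sequence[int], start: Sequence[int],
                   count: Sequence[int]) -> List[Tuple[int, ...]]:
    """Chunk-grid indices overlapped by the hyperslab [start, start + count)."""
    if not len(chunk_shape) == len(start) == len(count):
        raise ValueError("rank mismatch")
    per_dim = []
    for c, s, n in zip(chunk_shape, start, count):
        if n < 1:
            raise ValueError("count must be positive")
        first, last = s // c, (s + n - 1) // c  # first and last chunk along this dimension
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


if __name__ == "__main__":
    chunk = (100, 30, 30)  # hypothetical chunking of a (time=1000, lat=180, lon=360) variable
    series = chunks_touched(chunk, (0, 10, 10), (1000, 1, 1))  # one point, all times
    grid = chunks_touched(chunk, (0, 0, 0), (1, 180, 360))      # one time, whole map
    print(len(series), len(grid))  # 10 72
```

With contiguous row-major storage the same time series would need 1000 separate small reads while the map would be one contiguous read; choosing a chunk shape is a trade-off between such access patterns.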
NetCDF-4 Data Model (Common Data Access Model)
[Diagram: netCDF-4 data model classes]
File: location: Filename; create( ), open( ), …
Group: name: String
Dimension: name: String; length: int; isUnlimited( )
Variable: name: String; shape: Dimension[ ]; type: DataType; array: read( ), …
Attribute: name: String; type: DataType; values: 1D array
DataType: a PrimitiveType or a UserDefinedType
PrimitiveType: char, byte, short, int, int64, float, double, unsigned byte, unsigned short, unsigned int, unsigned int64, string
UserDefinedType (typename: String): Compound, VariableLength, Enum, Opaque
A file has a top-level unnamed group. Each group may contain one or more named subgroups, variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length.
Variables and attributes have one of twelve primitive data types or one of four user-defined types.
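Groups give netCDF-4 a hierarchical name space. In the netCDF-4 model a dimension is visible in the group that defines it and in all of that group's descendants, so name lookup walks up the tree. The sketch below models only that scoping rule; `Group`, `add_group`, `find_dimension`, `add_variable`, and `path` are hypothetical names, not the netCDF-4 C or Java API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple


@dataclass(eq=False)
class Group:
    name: str
    parent: Optional["Group"] = field(default=None, repr=False)  # None for the root group
    groups: Dict[str, "Group"] = field(default_factory=dict, repr=False)
    dimensions: Dict[str, int] = field(default_factory=dict)
    variables: Dict[str, Tuple[str, ...]] = field(default_factory=dict)  # name -> dimension names

    def add_group(self, name: str) -> "Group":
        child = Group(name, parent=self)
        self.groups[name] = child
        return child

    def find_dimension(self, name: str) -> Optional[int]:
        """Search this group, then each enclosing group up to the root."""
        group: Optional[Group] = self
        while group is not None:
            if name in group.dimensions:
                return group.dimensions[name]
            group = group.parent
        return None

    def add_variable(self, name: str, dim_names: Sequence[str]) -> None:
        for d in dim_names:
            if self.find_dimension(d) is None:
                raise KeyError(f"dimension {d!r} is not visible from {self.path()}")
        self.variables[name] = tuple(dim_names)

    def path(self) -> str:
        if self.parent is None:
            return "/"
        return f"{self.parent.path().rstrip('/')}/{self.name}"


if __name__ == "__main__":
    root = Group("")  # the top-level, unnamed group
    root.dimensions["lat"] = 3
    surface = root.add_group("forecast").add_group("surface")
    surface.dimensions["height"] = 2
    surface.add_variable("temp", ("lat", "height"))  # "lat" is inherited from the root
    print(surface.path())  # /forecast/surface
```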
NetCDF-4 Architecture
NetCDF-4 uses HDF5 for storage and high performance:
Parallel I/O
Chunking for efficient access in different orders and efficient use of compression
Conversion using a “reader makes right” approach
Provides a simple netCDF interface to a subset of HDF5
Also supports the netCDF classic and 64-bit offset formats
[Figure: layered architecture diagram. NetCDF Java, netCDF-3, netCDF-4, and HDF5 applications sit above the netCDF Java, netCDF-3, netCDF-4, and HDF5 libraries, which rest on the Java VM, POSIX I/O, and MPI I/O.]
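With “reader makes right,” the writer stores values in its own native representation and records which one it used; only a reader whose representation differs pays for a conversion. (NetCDF-3, by contrast, always stores big-endian data, so little-endian machines convert on every write and read.) The sketch below illustrates the idea for byte order only, using Python's `struct` module; the function names are hypothetical, and the recorded order is a plain string standing in for the datatype description HDF5 keeps in the file.

```python
import struct
import sys
from typing import List, Tuple


def _prefix(order: str) -> str:
    """struct format prefix for an explicit byte order."""
    return "<" if order == "little" else ">"


def write_native(values: List[float], order: str = sys.byteorder) -> Tuple[str, bytes]:
    """Writer: pack doubles in the writer's own byte order and tag the data with it."""
    return order, struct.pack(f"{_prefix(order)}{len(values)}d", *values)


def needs_conversion(stored_order: str, reader_order: str = sys.byteorder) -> bool:
    """The reader has to swap bytes only when its order differs from the stored one."""
    return stored_order != reader_order


def read_values(stored_order: str, data: bytes) -> List[float]:
    """Reader: decode using the recorded order (struct swaps bytes if needed)."""
    return list(struct.unpack(f"{_prefix(stored_order)}{len(data) // 8}d", data))


if __name__ == "__main__":
    for writer_order in ("little", "big"):
        order, blob = write_native([1.5, -2.25], writer_order)
        print(order, needs_conversion(order), read_values(order, blob))
```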
Status of NetCDF-4
NetCDF-4.0-alpha14 currently available for testing
Files created with the alpha release use unsupported artifacts
We’re seeking feedback on performance and functionality
NetCDF-4.0-beta waiting for HDF5 1.8-beta
Will finalize the file format and eliminate the need for artifacts
Expected within a few weeks of the HDF5 1.8-beta release, maybe by August 2006
HDF5 1.8 currently expected by November 2006
Has enhancements specifically for netCDF-4: variable creation order, Unicode names, dimension scales, on-the-fly numeric conversions
Plans for netCDF-4.1 and beyond are on the netCDF-4 web site
Summary
Unidata’s LDM-6 implements an event-driven architecture for low-latency data distribution
Unidata’s IDV provides a platform-independent visualization and analysis framework and reference application for integrating data from diverse sources
Unidata’s netCDF-4 software preserves backward compatibility and eliminates many limitations of netCDF-3 with only a modest increase in complexity