Unidata’s Common Data Model and the THREDDS Data Server
John Caron, Unidata/UCAR, Boulder CO
Jan 6, 2006, ESIP Winter 2006
Outline
• Definitions
• Creating a Common Data (Access) Model from NetCDF, HDF5, OPeNDAP
• CDM Coordinate Systems, Data Types
• CDM implementation
• NetCDF Markup Language (NcML)
• The THREDDS Data Server
NetCDF-3
• Machine and OS independent file format for “self-describing” scientific data
• C library (Fortran, C++, Perl, IDL, MatLab, Python, Ruby), Java library
• Efficient subsetting of multidimensional arrays.
• > 20,000 downloads last year
HDF5
• Machine and OS independent file format for “self-describing” scientific data
• C library (Fortran, Java, PyTables)
• Evolution from HDF4, but different.
• HDF-EOS, HDF5-EOS, standard formats for EOSDIS, ASCI, NPOESS
• Parallel-IO, chunked storage, compression filters, many data types.
• Developed at NCSA, now independent
NetCDF-4
• Project funded by NASA to create new version of netCDF using the HDF5 file format.
• “Extend and merge” netCDF and HDF5
  – Widespread use and simplicity of netCDF
  – Generality and performance of HDF5
NetCDF-Java 2.2 (nj22)
• 100% Java library
• Prototype implementation of CDM
• File formats:
  – General: NetCDF, HDF5, OPeNDAP
  – Grids: GRIB1, GRIB2
  – Radar: NEXRAD, NIDS, DORADE
  – Satellite: DMSP, GINI
• Access to THREDDS catalogs
OPeNDAP
• Client-server protocol for scientific data access
• C++ client and server, Java client and server libraries.
• Current version 2.0; NASA ESE standard
• Working on new 4.0 protocol spec
THREDDS
• Originally funded by NSDL – “discovery and use of scientific data”
  – Middleware between data providers and users
  – Dataset Inventory Catalogs (XML)
• Now part of Unidata core funding
  – Data Serving (pull)
What’s a Data Model?
• It’s about scientific data: storing, accessing
• It’s an abstraction
• Equivalent to an abstract object model in OOP
• An Abstract Data Model describes data objects and what methods you can use on them
What’s a Data Model?
• An API is the interface to the Data Model for a specific programming language
• A file format is a way to persist the objects in the Data Model.
• A data access protocol plays the role of a file format.
• The Abstract Data Model removes the details of any particular API and the persistence format.
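The distinction between the abstract data model and its persistence formats can be sketched in a few lines of Java. This is a minimal illustration only: the `Variable` interface, the two storage classes, and `DataModelSketch` are hypothetical names invented for this example and are not part of any Unidata library. The point is that `mean` is written once against the abstraction and works unchanged for either storage format.

```java
import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

// Hypothetical abstract data model: a named variable that yields float values.
interface Variable {
    String getName();
    float[] read();
}

// Hypothetical persistence format 1: comma-separated text.
final class TextVariable implements Variable {
    private final String name;
    private final String csv;

    TextVariable(String name, String csv) {
        this.name = name;
        this.csv = csv;
    }

    public String getName() { return name; }

    public float[] read() {
        String[] parts = csv.split(",");
        float[] out = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Float.parseFloat(parts[i].trim());
        }
        return out;
    }
}

// Hypothetical persistence format 2: big-endian 4-byte floats.
final class BinaryVariable implements Variable {
    private final String name;
    private final byte[] bytes;

    BinaryVariable(String name, byte[] bytes) {
        this.name = name;
        this.bytes = bytes.clone();
    }

    public String getName() { return name; }

    public float[] read() {
        // ByteBuffer is big-endian by default, matching encode() below.
        FloatBuffer fb = ByteBuffer.wrap(bytes).asFloatBuffer();
        float[] out = new float[fb.remaining()];
        fb.get(out);
        return out;
    }

    // Helper for the example: encodes floats in the layout read() expects.
    static byte[] encode(float... values) {
        ByteBuffer bb = ByteBuffer.allocate(4 * values.length);
        for (float v : values) {
            bb.putFloat(v);
        }
        return bb.array();
    }
}

final class DataModelSketch {
    // Client code written once, against the abstraction only.
    static float mean(Variable v) {
        float[] data = v.read();
        if (data.length == 0) {
            throw new IllegalArgumentException("empty variable");
        }
        double sum = 0;
        for (float f : data) {
            sum += f;
        }
        return (float) (sum / data.length);
    }
}
```

A caller would pass either `new TextVariable("T", "1.0, 2.0, 3.0")` or `new BinaryVariable("T", BinaryVariable.encode(1.0f, 2.0f, 3.0f))` to `DataModelSketch.mean` without changing the client code.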
Creating a Common Data Access Model
from NetCDF, HDF5, OPeNDAP
[Diagram: NetCDF-3 Data Model, OPeNDAP Data Model (DAP-2), HDF5 Data Model, Common Data (Access) Model]
Coordinate Systems and Scientific Data Types
Common Data Model Layers
[Diagram: layers shown are Data Access, Coordinate Systems, and Scientific Datatypes (Grid, Point, Radial, Trajectory, Swath, Station)]
Coordinate Systems needed
• NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems
  – so georeferencing is not part of the API
  – Need conventions to specify (e.g. CF-1, COARDS, etc.)
• Contrast GRIB, HDF-EOS, other specialized formats
• Must be done in a general way
• Same underlying mathematics as VisAD, ASCII
Coordinate Systems
Scientific Data Types
• Based on datasets Unidata is familiar with
  – APIs are evolving
• How are data points connected?
• Intended to scale to large, multifile collections
• Intended to support “specialized queries”
  – Space, Time
• Corresponding “standard” NetCDF file conventions
Point Observation Data
PointObsDataset Methods
// Collection of StructureData
Collection getData(
LatLonRect boundingBox,
Date start, Date end);
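The space and time query above can be illustrated with an in-memory sketch. The classes `PointOb`, `BoundingBox`, and `PointObsSketch` are hypothetical stand-ins invented for this example (the slide's `StructureData` and `LatLonRect` are not reproduced), and the bounding box is assumed not to cross the dateline. A linear scan is used for clarity; a real implementation for large collections would presumably use an index.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical observation record: location, time, and one measured value.
final class PointOb {
    final double lat;
    final double lon;
    final Date time;
    final double value;

    PointOb(double lat, double lon, Date time, double value) {
        this.lat = lat;
        this.lon = lon;
        this.time = time;
        this.value = value;
    }
}

// Hypothetical lat/lon rectangle; assumes lonMin <= lonMax (no dateline crossing).
final class BoundingBox {
    final double latMin;
    final double latMax;
    final double lonMin;
    final double lonMax;

    BoundingBox(double latMin, double latMax, double lonMin, double lonMax) {
        this.latMin = latMin;
        this.latMax = latMax;
        this.lonMin = lonMin;
        this.lonMax = lonMax;
    }

    boolean contains(double lat, double lon) {
        return lat >= latMin && lat <= latMax && lon >= lonMin && lon <= lonMax;
    }
}

final class PointObsSketch {
    // Returns observations inside the box with start <= time <= end.
    static List<PointOb> getData(List<PointOb> all, BoundingBox box, Date start, Date end) {
        List<PointOb> out = new ArrayList<>();
        for (PointOb ob : all) {
            boolean inTime = !ob.time.before(start) && !ob.time.after(end);
            if (inTime && box.contains(ob.lat, ob.lon)) {
                out.add(ob);
            }
        }
        return out;
    }
}
```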
Trajectory Data
TrajectoryObs Methods
int getNumPoints();
StructureData getData(int point);
Station Data
StationObs Methods
// return List of Station
List getStations();
// return List of StructureData
List getData(
Station s,
Date start, Date end);
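Station data differs from free point data in that observations are grouped by a fixed station, so a per-station time index is a natural fit. The sketch below assumes station names as keys and epoch-millisecond timestamps; `StationObsSketch` and its methods are hypothetical names for illustration and only mirror the shape of the methods listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StationObsSketch {
    // station name -> (observation time in epoch millis -> value), sorted by time
    private final Map<String, TreeMap<Long, Double>> byStation = new HashMap<>();

    void add(String station, long timeMillis, double value) {
        byStation.computeIfAbsent(station, k -> new TreeMap<>()).put(timeMillis, value);
    }

    // Returns station names in alphabetical order.
    List<String> getStations() {
        List<String> names = new ArrayList<>(byStation.keySet());
        Collections.sort(names);
        return names;
    }

    // Returns values for one station with start <= time <= end, in time order.
    // Requires startMillis <= endMillis (TreeMap.subMap rejects reversed bounds).
    List<Double> getData(String station, long startMillis, long endMillis) {
        TreeMap<Long, Double> series = byStation.get(station);
        if (series == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(series.subMap(startMillis, true, endMillis, true).values());
    }
}
```

The sorted map lets a time-range query touch only the matching entries of one station rather than scanning every observation.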
Radial Data
Radial methods
interface Radial {
  int getNumGates();
  float getData(int gate);
  float getStartingGate();
  float getGateSize();
  float getElevation();
  float getAzimuth();
  double getTime();
}
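One way to read the gate accessors above is as a description of evenly spaced samples along a radar ray. The sketch below rests on assumptions that the slides do not confirm: that `getStartingGate()` is the distance from the radar to the first gate, that `getGateSize()` is the spacing between gates in the same units, and that elevation is in degrees. `RadialSketch` is a hypothetical helper, and the ground-distance formula ignores earth curvature and beam refraction.

```java
final class RadialSketch {
    // Distance along the ray to gate number `gate` (0-based), under the
    // assumption that gates are evenly spaced starting at startingGate.
    static float rangeToGate(float startingGate, float gateSize, int gate) {
        if (gate < 0) {
            throw new IllegalArgumentException("gate must be >= 0");
        }
        return startingGate + gate * gateSize;
    }

    // Flat-earth horizontal distance for a slant range at a given elevation angle.
    static double groundDistance(double slantRange, double elevationDegrees) {
        return slantRange * Math.cos(Math.toRadians(elevationDegrees));
    }
}
```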
Gridded Data
Grid methods
interface GridCoordSys {
CoordinateAxis getTaxis();
CoordinateAxis getXaxis();
CoordinateAxis getYaxis();
CoordinateAxis getZaxis();
Projection getProjection();
}
Array getDataCube(Range time, Range z, Range y, Range x);
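Subsetting a grid by index ranges along time, z, y, and x can be sketched on a plain 4D Java array. `IndexRange` here is a hypothetical class with an inclusive last index and a stride, chosen for illustration; it is not the library's `Range` type, and `float[][][][]` stands in for the `Array` return type.

```java
// Hypothetical index range: first..last inclusive, taking every stride-th index.
final class IndexRange {
    final int first;
    final int last;
    final int stride;

    IndexRange(int first, int last, int stride) {
        if (first < 0 || last < first || stride < 1) {
            throw new IllegalArgumentException("invalid range");
        }
        this.first = first;
        this.last = last;
        this.stride = stride;
    }

    // Number of indices selected by this range.
    int length() {
        return (last - first) / stride + 1;
    }

    // The i-th selected index in the original array.
    int element(int i) {
        return first + i * stride;
    }
}

final class GridSubsetSketch {
    // Copies the selected hyperslab out of data[t][z][y][x].
    static float[][][][] getDataCube(float[][][][] data,
                                     IndexRange t, IndexRange z, IndexRange y, IndexRange x) {
        float[][][][] out = new float[t.length()][z.length()][y.length()][x.length()];
        for (int it = 0; it < t.length(); it++) {
            for (int iz = 0; iz < z.length(); iz++) {
                for (int iy = 0; iy < y.length(); iy++) {
                    for (int ix = 0; ix < x.length(); ix++) {
                        out[it][iz][iy][ix] =
                            data[t.element(it)][z.element(iz)][y.element(iy)][x.element(ix)];
                    }
                }
            }
        }
        return out;
    }
}
```

For example, `new IndexRange(0, 2, 2)` on the y axis selects rows 0 and 2, so the returned cube has a y dimension of length 2.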
Image/Swath
Standardizing NetCDF Formats
• Grid: CF-1 Convention
  – Need improvements for regional models (WRF), GIS info
• Radar: “Radar Exchange Format”
  – With radar community (led by NCAR ATD)
• Point Observations
  – Unidata Observation Dataset Conventions
CDM implementations: NetCDF-4 and NetCDF-Java 2.2
NetCDF-4 C Library
[Diagram: components shown are the netCDF-4 Library, the netCDF-3 Interface, and the HDF5 Library]
NetCDF-4 Status
• 4.0 Beta implements CDM access layer
  – complete, but waiting for HDF5 release 1.8 to finalize file format
• 4.1: adding Coordinate Systems
• 4.?: merge OPeNDAP access (pending funding)
NetCDF-Java 2.2 (nj22)
• Prototype implementation of CDM
• File formats:
  – General: NetCDF, HDF5, OPeNDAP
  – Grids: GRIB1, GRIB2
  – Radar: NEXRAD, NIDS, DORADE
  – Satellite: DMSP, GINI
• Access to THREDDS catalogs
• Implements NcML
Common Data Model
[Diagram: layers shown are Data Access, Coordinate Systems, and Scientific Datatypes (Grid, Point, Radial, Trajectory, Swath, Station)]
NetCDF-Java version 2.2 architecture
[Diagram: components shown are Application, Scientific Datatypes, Datatype Adapter, CoordSystem Builder, NetcdfDataset, NetcdfFile, I/O service provider (NetCDF-3, NetCDF-4, HDF5, GRIB, GINI, NIDS, Nexrad, DMSP, …), OPeNDAP, ADDE, THREDDS, Catalog.xml]
NetCDF-Java 2.2 Status
• Data Access layer: Beta quality
  – also waiting for HDF5 release to finish NetCDF-4, commit to API
• Coordinate Systems: early Beta
  – Finishing docs, runtime pluggability
• Data Types: Alpha, still experimenting with APIs
NetCDF Markup Language (NcML)
• XML representation of netCDF metadata (like ncdump -h)
• Create new netCDF files (like ncgen)
• Modify existing datasets
  – Add/delete/rename
  – Create logical sections of existing variables.
• Create unions and aggregations of multiple existing datasets.
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2" location="/data/nids/N0R_20041119_2147">
  <attribute name="DataType" value="Radar" />
  <remove type="attribute" name="password" />
  <variable name="Reflectivity" orgName="R34768">
    <attribute name="units" value="dBZ" />
  </variable>
</netcdf>
NcML example
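The effect of such a wrapper on global attributes can be sketched with the JDK's built-in XML parser. This is not how NetCDF-Java processes NcML; it is a simplified illustration that handles only top-level `attribute` and `remove type="attribute"` elements, ignores `variable` elements and the `location`, and uses the hypothetical class name `NcmlWrapSketch`. The original attribute map is left untouched, mirroring the idea that the underlying dataset is not modified.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class NcmlWrapSketch {
    // Returns a copy of attrs with the NcML's global attribute edits applied
    // in document order. Only direct children of the root are considered.
    static Map<String, String> applyGlobalAttributeEdits(String ncml, Map<String, String> attrs) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(ncml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse NcML", e);
        }

        Map<String, String> out = new TreeMap<>(attrs);
        NodeList kids = root.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text and comments
            }
            Element e = (Element) n;
            String tag = e.getTagName();
            if (tag.equals("attribute")) {
                // add a new attribute or override an existing one
                out.put(e.getAttribute("name"), e.getAttribute("value"));
            } else if (tag.equals("remove") && e.getAttribute("type").equals("attribute")) {
                out.remove(e.getAttribute("name"));
            }
        }
        return out;
    }
}
```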
NcML Aggregation
• Union
• Join Existing
• Join New
• Forecast Model Run
NcML Aggregation Example
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinNew">
    <variableAgg name="Temperature"/>
    <variableAgg name="Pressure"/>
    <scan location="C:/data/goes/" suffix=".gini"/>
  </aggregation>
</netcdf>
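The difference between the Join New and Join Existing aggregation types can be illustrated on plain arrays. This is a conceptual sketch, not the NcML implementation: `AggregationSketch` is a hypothetical class, each file is assumed to contribute one variable as a 2D array, and all inputs are assumed to have matching inner shapes. Join New stacks the inputs along a new outer dimension (such as `time` in the example above), while Join Existing concatenates along an outer dimension the inputs already have.

```java
import java.util.List;

final class AggregationSketch {
    // joinNew: each input is one [y][x] slice; the result gains a new
    // outer dimension with one entry per input.
    static float[][][] joinNew(List<float[][]> slices) {
        float[][][] out = new float[slices.size()][][];
        for (int i = 0; i < slices.size(); i++) {
            out[i] = copy2D(slices.get(i));
        }
        return out;
    }

    // joinExisting: each input is [time][x]; the result concatenates the
    // inputs along the existing outer (time) dimension.
    static float[][] joinExisting(List<float[][]> parts) {
        int total = 0;
        for (float[][] p : parts) {
            total += p.length;
        }
        float[][] out = new float[total][];
        int row = 0;
        for (float[][] p : parts) {
            for (float[] r : p) {
                out[row++] = r.clone();
            }
        }
        return out;
    }

    private static float[][] copy2D(float[][] a) {
        float[][] c = new float[a.length][];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i].clone();
        }
        return c;
    }
}
```

With two 2x2 inputs, `joinNew` yields a 2x2x2 result and `joinExisting` yields a 4x2 result.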
THREDDS Data Server
• Integrates data access with THREDDS catalogs and services
• Tomcat/Servlet, 100% Java, single war file
• Data input is the NetCDF-Java 2.2 library
• Data output:
  – OPeNDAP
  – HTTP Server
  – OGC Web Coverage Server (gridded)
THREDDS Data Server
[Diagram: components shown are an HTTP Tomcat Server at hostname.edu, the THREDDS Server Application, Catalog.xml, the NetCDF-Java library, Datasets, IDD Data, and the services OPeNDAP, HTTPServer, WCS]
TDS as WCS Gateway
[Diagram: components shown are an HTTP Tomcat Server at hostname.edu, the THREDDS Server Application, Catalog.xml, the NetCDF-Java library, the services OPeNDAP, HTTPServer, WCS, and an OPeNDAP Server at anotherHost.org]
TDS and NcML
[Diagram: components shown are an HTTP Tomcat Server at hostname.edu, the THREDDS Server Application, Catalog.xml, NetCDF-Java, NcML, Datasets, and the services OPeNDAP and WCS]
TDS and NcML
• Server serves the dataset “wrapped” by the NcML
  – Client sees OPeNDAP or WCS, not NcML
• Can “fix” metadata problems
• Can augment metadata
• Use NcML aggregation on the TDS
  – replaces the old “Aggregation Server”
TDS and Digital Libraries
[Diagram: components shown are an HTTP Tomcat Server at hostname.edu, the THREDDS Server Application, Catalog.xml, the NetCDF-Java library, Datasets, the services OPeNDAP, HTTPServer, WCS, an OPeNDAP Server at otherhost.gov, an OAI Harvester, and DL Records]
TDS and Digital Libraries
• Framework to add metadata
  – By hand (collection level)
  – Automatic extraction from datasets
• Send records to existing DLs
  – No search
• Both collection and inventory level
Future Plans
• NetCDF-Java
  – Get APIs stable, docs, runtime pluggability
  – NetCDF-4 (!)
  – HDF4, HDF-EOS, BUFR (need funding)
• NetCDF-4 C Library
  – DataTypes too immature to port
  – NcML?
  – Java on the server
TDS Future Plans
• Aggregation
  – Driven by IDD data (motherlode)
• Pluggable Authorization
  – access control by dataset
• Performance
• Services
  – Coordinate System Verifier (e.g. CF-1)
  – Data access
  – Subset and get netCDF file
Conclusion
N + M instead of N * M things on your TODO List!
[Diagram: components shown are File Format #1, File Format #2, …, File Format #N, the CDM, and the consumers Visualization & Analysis, NetCDF file, OPeNDAP Server, WCS Service]