Data Infrastructure for Hydrologic Observations
Ilya Zaslavsky
Spatial Information Systems Lab
San Diego Supercomputer Center
UCSD
SSI Global Water Initiative Seminar Series, February 3, 2009
http://his.cuahsi.org
http://hiscentral.cuahsi.org
http://hydroseek.net
http://river.sdsc.edu/ucsddash
http://wron.net.au/DemosII/Modules/ODMKMLGatway.aspx
Observation Stations
Ameriflux Towers (NASA & DOE) NOAA Automated Surface Observing System
USGS National Water Information System NOAA Climate Reference Network
Map for the US
Build a common window on water data using web services
US Map of USGS Observations
Antarctica
Puerto Rico
Hawaii
Alaska
Different types of nutrients by decade: Available Data Total
Some physical properties by decade: Available Data Total
Water Data Web Sites
NWISWeb site output
# agency_cd Agency Code
# site_no USGS station number
# dv_dt date of daily mean streamflow
# dv_va daily mean streamflow value, in cubic-feet per-second
# dv_cd daily mean streamflow value qualification code
#
# Sites in this file include:
# USGS 02087500 NEUSE RIVER NEAR CLAYTON, NC
#
agency_cd site_no dv_dt dv_va dv_cd
USGS 02087500 2003-09-01 1190
USGS 02087500 2003-09-02 649
USGS 02087500 2003-09-03 525
USGS 02087500 2003-09-04 486
USGS 02087500 2003-09-05 733
USGS 02087500 2003-09-06 585
USGS 02087500 2003-09-07 485
USGS 02087500 2003-09-08 463
USGS 02087500 2003-09-09 673
USGS 02087500 2003-09-10 517
USGS 02087500 2003-09-11 454
Time series of streamflow at a gaging station
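The NWISWeb listing above can be read programmatically. The following is a minimal Python sketch, assuming the columns are whitespace-separated as they appear on the slide (the actual file delimiter is not shown) and that the first non-comment line holds the column names. The names `DailyValue` and `parse_daily_values` are hypothetical, introduced only for illustration.

```python
from datetime import date
from typing import NamedTuple, Optional

# First three data rows copied from the NWISWeb output shown above.
SAMPLE = """\
# agency_cd Agency Code
# site_no USGS station number
# dv_dt date of daily mean streamflow
# dv_va daily mean streamflow value, in cubic-feet per-second
# dv_cd daily mean streamflow value qualification code
#
agency_cd site_no dv_dt dv_va dv_cd
USGS 02087500 2003-09-01 1190
USGS 02087500 2003-09-02 649
USGS 02087500 2003-09-03 525
"""


class DailyValue(NamedTuple):
    agency: str
    site_no: str
    day: date
    flow_cfs: float
    qualifier: Optional[str]  # dv_cd; absent in the sample rows


def parse_daily_values(text: str) -> list[DailyValue]:
    """Parse daily values, skipping '#' comment lines and the header row."""
    rows = []
    header_seen = False
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        if not header_seen:
            header_seen = True  # first non-comment line is the column header
            continue
        fields = line.split()
        agency, site_no, day, flow = fields[:4]
        qualifier = fields[4] if len(fields) > 4 else None
        rows.append(
            DailyValue(agency, site_no, date.fromisoformat(day), float(flow), qualifier)
        )
    return rows
```

Calling `parse_daily_values(SAMPLE)` yields one `DailyValue` per data row, which is the "time series of streamflow at a gaging station" in a form that code can work with.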
Point Observations Information Model
• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• An observation series is an array of observations at a given site, for a given variable, with start time and end time
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
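The definitions above map naturally onto a small set of nested types. The sketch below is one possible Python rendering, not the ODM schema itself: the class and field names are assumptions chosen to mirror the bullets, and the example is populated with the USGS values that accompany this model in the slides (206 cfs on 13 August 2006).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Value:
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol with additional information


@dataclass
class ObservationSeries:
    variable: str
    values: list[Value] = field(default_factory=list)

    @property
    def start_time(self) -> Optional[datetime]:
        return min((v.time for v in self.values), default=None)

    @property
    def end_time(self) -> Optional[datetime]:
        return max((v.time for v in self.values), default=None)


@dataclass
class Site:
    name: str
    series: dict[str, ObservationSeries] = field(default_factory=dict)


@dataclass
class Network:
    name: str
    sites: list[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: list[Network] = field(default_factory=list)


# Data source -> network -> site -> observation series -> value.
usgs = DataSource(
    "USGS", [Network("Streamflow gages", [Site("Neuse River near Clayton, NC")])]
)
site = usgs.networks[0].sites[0]
site.series["Discharge"] = ObservationSeries(
    "Discharge", [Value(206.0, datetime(2006, 8, 13))]  # value in cfs
)
```

The start and end times are derived from the values here rather than stored, which is a design choice of this sketch; a database-backed model might store them as a series catalog instead.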
Data Source
Network
Sites
ObservationSeries
Values{Value, Time, Qualifier}
USGS
Streamflow gages
Neuse River near Clayton, NC
Discharge, stage, start, end (Daily or instantaneous)
206 cfs, 13 August 2006
Return network information, and variable information within the network
Return site information, including a series catalog of variables measured at a site with their periods of record
Return time series of values
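The three "Return ..." operations above can be sketched as three functions over an in-memory catalog. This is a hedged illustration only: the function names `get_network_info`, `get_site_info` and `get_values`, the dictionary layout, and the argument lists are assumptions, not the signatures of the actual web services. The sample values are the first three rows of the Neuse River listing.

```python
from datetime import date

# Hypothetical in-memory catalog; a real service would query a database.
# Each series is assumed to be sorted by date.
CATALOG = {
    "network": "Streamflow gages",
    "variables": {"Discharge": "cfs"},
    "sites": {
        "02087500": {
            "name": "Neuse River near Clayton, NC",
            "series": {
                "Discharge": [
                    (date(2003, 9, 1), 1190.0),
                    (date(2003, 9, 2), 649.0),
                    (date(2003, 9, 3), 525.0),
                ]
            },
        }
    },
}


def get_network_info(catalog):
    """Return network information and the variables available within it."""
    return {
        "network": catalog["network"],
        "variables": dict(catalog["variables"]),
        "sites": sorted(catalog["sites"]),
    }


def get_site_info(catalog, site_code):
    """Return site information plus a series catalog with periods of record."""
    site = catalog["sites"][site_code]
    series_catalog = {
        variable: {"start": values[0][0], "end": values[-1][0], "count": len(values)}
        for variable, values in site["series"].items()
    }
    return {"code": site_code, "name": site["name"], "series_catalog": series_catalog}


def get_values(catalog, site_code, variable, start, end):
    """Return (date, value) pairs for one site and variable within [start, end]."""
    values = catalog["sites"][site_code]["series"][variable]
    return [(day, v) for day, v in values if start <= day <= end]
```

A client would typically call these in order: discover the network and its variables, inspect a site's periods of record, then request only the time window it needs.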
Information model challenges…
• Sites
– STORET has stations, and measurement points, at various offsets…
– Site metadata lacking and inconsistent (e.g. 2/3 no HUC info, 1/3 no state/county info); agency site files need to be upgraded to ODM…
– A groundwater site is different than a stream gauge…
• Censored values
– Values have qualifiers, such as “less than”, “censored”, etc. – per value. Sometimes mixed data types.
• Units
– There are multiple renditions of the same units, even within one repository
– There may be several units for the same parameter code (STORET)
– Unit multipliers (e.g. NCDC ASOS)
• Sources
– STORET requires organization IDs (which collected data for STORET) in addition to site IDs
• Time stamps: ISO 8601
– Many in local times; conversion needed
• Variable names and measurement methods don’t match
– E.g. NWIS parameter # 625 is labeled ‘ammonia + organic nitrogen’, Kjeldahl method is used for determination but not mentioned in parameter description. In STORET this parameter is referred to as Kjeldahl Nitrogen
– ‘bed sediment’ and ‘suspended sediment’ medium types in NWIS vs. STORET’s ‘sediment’.
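Three of the challenges listed above (censored values, multiple renditions of the same unit, and local time stamps) lend themselves to small normalization helpers. The Python sketch below is illustrative only: the alias table, the helper names, and the assumption of a known fixed UTC offset per record (ignoring daylight saving time) are all hypothetical simplifications.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical alias table: several renditions of one unit map to one label.
UNIT_ALIASES = {
    "cfs": "cfs",
    "ft3/s": "cfs",
    "cubic feet per second": "cfs",
    "mg/l": "mg/L",
    "milligrams per liter": "mg/L",
}

# Matches censored values such as "<0.05" or "> 10".
_CENSORED = re.compile(r"^\s*([<>])\s*(\d+(?:\.\d+)?)\s*$")


def parse_value(raw):
    """Split a raw value such as '<0.05' into (number, qualifier or None)."""
    match = _CENSORED.match(raw)
    if match:
        return float(match.group(2)), match.group(1)
    return float(raw), None


def normalize_unit(raw):
    """Map a unit spelling to its canonical label; unknown units pass through."""
    return UNIT_ALIASES.get(raw.strip().lower(), raw.strip())


def to_utc_iso(local_text, utc_offset_hours):
    """Convert a naive local time stamp with a known offset to ISO 8601 UTC."""
    local = datetime.fromisoformat(local_text).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours))
    )
    return local.astimezone(timezone.utc).isoformat()
```

Keeping the qualifier separate from the numeric value, as `parse_value` does, avoids the mixed data types mentioned above while preserving the "less than" information per value.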
http://his.cuahsi.org/odmdatabases.html
CUAHSI Observations Data Model
Information communication
• Water web pages
• Water web services
HyperText Markup Language (HTML)
Water Markup Language (WaterML)
WaterML design principles
• Driven largely by hydrologists; the goal is to capture semantics of hydrologic observations discovery and retrieval
• Relies to a large extent on the information model as in ODM (Observations Data Model), and terms are aligned as much as possible
– Several community reviews since 2005
• Driven by data served by USGS NWIS, EPA STORET, multiple individual PI-collected observations
• Is no more than an exchange schema for CUAHSI web services
• A fairly simple and rigid schema tuned to the current implementation; the least barrier for adoption by hydrologists
• Conformance with OGC specs not in the initial scope – but working with OGC on this (OGC Discussion Paper 07-041)
Water Data Services
• Set of query functions
• Returns data in WaterML
NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, USGS SNOTEL, ODM (multiple sites)
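Because every service returns data in WaterML, a client needs only one parser regardless of the source. The sketch below reads a simplified, WaterML-like fragment with Python's standard library. The element and attribute names used here are assumptions made for illustration, and the fragment is not a schema-valid WaterML document (it omits namespaces and most metadata); consult the published schema before relying on any of them.

```python
import xml.etree.ElementTree as ET

# Simplified, WaterML-like fragment for illustration only. Element and
# attribute names are assumptions; this is not a schema-valid document.
RESPONSE = """\
<timeSeriesResponse>
  <timeSeries>
    <sourceInfo><siteName>Neuse River near Clayton, NC</siteName></sourceInfo>
    <variable><variableName>Discharge</variableName><units>cfs</units></variable>
    <values>
      <value dateTime="2003-09-01T00:00:00">1190</value>
      <value dateTime="2003-09-02T00:00:00">649</value>
      <value dateTime="2003-09-03T00:00:00">525</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""


def read_time_series(xml_text):
    """Extract site, variable, units and (time, value) pairs from the fragment."""
    root = ET.fromstring(xml_text)
    series = root.find("timeSeries")
    return {
        "site": series.findtext("sourceInfo/siteName"),
        "variable": series.findtext("variable/variableName"),
        "units": series.findtext("variable/units"),
        "values": [(v.get("dateTime"), float(v.text)) for v in series.iter("value")],
    }
```

The same `read_time_series` call would then serve responses from any of the listed services, which is the practical benefit of a shared exchange schema.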
Test bed HIS Servers
Central HIS servers
ArcGIS
Matlab
IDL, R
MapWindow
Excel
Programming (Fortran, C, VB)
Desktop clients
Customizable web interface (DASH)
HTML - XML
WSDL - SOAP
Modeling (OpenMI)
Global search (Hydroseek)
WaterOneFlow Web Services, WaterML
Controlled vocabularies
Metadata catalogs
Ontology
ETL services
HIS Lite Servers
External data providers
Deployment to test beds
Other popular online clients
ODM DataLoader
Streaming Data Loading
Ontology tagging (Hydrotagger)
WSDL and ODM registration
Data publishing
ODMTools
Server config tools
HIS Central Registry & Harvester
Hydrologic Information System Service Oriented Architecture
Central HIS Data
Services
Catalog
Semantic Tagging of Harvested Variables
Hydroseek
http://www.hydroseek.net
Supports search by location and type of data across multiple observation networks including NWIS, STORET, and academic data
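Searching by type of data across networks depends on the semantic tagging step: source-specific variable names are first mapped to a shared concept, and the search then filters on that concept plus a bounding box. The sketch below illustrates the idea in Python. The concept table, catalog rows, site labels and coordinates are all hypothetical; only the NWIS and STORET naming mismatch for Kjeldahl nitrogen comes from the slides.

```python
# Hypothetical concept table: source-specific names map to one shared concept.
CONCEPTS = {
    ("NWIS", "ammonia + organic nitrogen"): "Kjeldahl nitrogen",
    ("STORET", "Kjeldahl Nitrogen"): "Kjeldahl nitrogen",
    ("NWIS", "Discharge"): "Streamflow",
}

# Hypothetical harvested catalog rows: (network, site, lat, lon, variable name).
CATALOG = [
    ("NWIS", "site-A", 35.6, -78.4, "ammonia + organic nitrogen"),
    ("STORET", "site-B", 35.7, -78.5, "Kjeldahl Nitrogen"),
    ("NWIS", "site-C", 47.0, -122.0, "ammonia + organic nitrogen"),
    ("NWIS", "site-A", 35.6, -78.4, "Discharge"),
]


def tag(network, variable):
    """Return the shared concept for a harvested variable, or None if untagged."""
    return CONCEPTS.get((network, variable))


def search(catalog, concept, south, west, north, east):
    """Find series for a concept inside a latitude/longitude bounding box."""
    return [
        (network, site, variable)
        for network, site, lat, lon, variable in catalog
        if south <= lat <= north
        and west <= lon <= east
        and tag(network, variable) == concept
    ]
```

With these rows, a search for "Kjeldahl nitrogen" in a box around the first two sites returns both the NWIS and the STORET series even though the two networks label the variable differently.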
• 11 WATERS Network test bed projects
• 16 ODM instances (some test beds have more than one ODM instance)
• Data from 1246 sites; of these, 167 sites are operated by WATERS investigators
National Hydrologic Information Server
San Diego Supercomputer Center
HIS Deployment
Against the NIH Syndrome
2006:
► CUAHSI HIS web services are discussed on the BASINS mailing list as a new way to access hydrologic data. The list is mostly used by hydrologists and developers outside academia;
► NCDC develops ASOS web services following WaterML
2007:
► MOU with USGS; USGS is developing WaterML-compliant GetValues service;
► GLEON uses an early version of ODM to develop their own schema (VEGA);
► Phoenix LTER is developing ODM (in MySQL) and WaterML services (in Java);
► A Google Earth-based client for CUAHSI web services is developed at CSIRO, Australia;
► Deployment to 11 hydrologic observatory test beds, + CBEO (CEOP project)
2008:
► KISTERS develops WaterML-compliant web services over their database, for a client;
► MapWindow open source GIS develops WaterOneFlow parsers;
► Florida, Texas and Idaho use ODM and WaterOneFlow web services to provide access to state data repositories; New Jersey is considering the same;
► Another CEOP project, at UC-Davis, is implementing ODM (in Postgres) and web services (in Java);
► Stroud Water Research Center; SBRP; Australian WRON; AWI…
► More, which we don’t know about…
Water Quality in Moreton Bay, Brisbane, Australia (Jane Hunter)
Summary
• Generic method for managing and publishing observational data
– Supports many types of point observational data
– Overcomes syntactic and semantic heterogeneity using a standard data model and controlled vocabularies
– Supports a national network of observatory test beds but can grow!
• WaterML is a common language for water observations data from academic and government sources
• Point Observations Data from Agencies and Academic Investigators can be consistently communicated using web services
• National Water Metadata Catalog is the most comprehensive index of the nation’s water observations presently existing
• Join the Water Data Federation!
Consortium of Universities for the Advancement of Hydrologic Science, Inc.
An organization representing more than one hundred United States universities. It receives support from the National Science Foundation to develop infrastructure and services for the advancement of hydrologic science and education in the U.S.
http://www.cuahsi.org/
122 US Universities as of July 2008
Databases Analysis
Models
CUAHSI Hydrologic Information System
Goal: Enhance hydrologic science by facilitating user access to more and better data for testing hypotheses and analyzing processes
• Advancement of water science is critically dependent on integration of water information
– Querying nation’s repository of water data
– Linking small integrated research sites (<100 km²) with global and continental models
– Integrating data from multiple disciplines to understand controls on hydrologic cycle
• It is as important to represent hydrologic environments precisely with data as it is to represent hydrologic processes with equations
Rainfall & Snow
Water quantity and quality
Remote sensing
Meteorology
Soil water
SDSC Spatial Information Systems Lab
Research and system development
• Services-based spatial information integration infrastructure
• Mediation services for spatial data, query processing, map assembly services
• Long-term spatial data preservation
• Spatial data standards and technologies for online mapping (SVG, WMS/WFS)
• Support of spatial data projects at SDSC and beyond
Mediator
LegendGenerator
MapAssembler
Ontology
…
GRID SERVICES FOR MAP INTEGRATION
services
In Geosciences (GEON, CUAHSI, CBEO,…)
Spatial web services
Federal Agencies
Figure 1.26 The Geography Network.
ESRI
County spatial data and toxicant information
Telesis, other local non-profits
CA state
WSDL
WS
Student projects
The CHI ME Model
In regional development (NIEHS SBRP, Katrina)
In Neurosciences (BIRN, CCDB)
http://spatial.sdsc.edu/lab/
Contact: [email protected]