Lessons learnt from GEOSS
Experience in Data Brokering
Stefano Nativi (GEO and CNR-IIA)
[email protected]@cnr.it
https://www.earthobservations.org/documents/GEO_Strategic_Plan_2016_2025_Implementing_GEOSS.pdf
GEO is a coalition of governments and participating organizations working towards the implementation of the Global Earth Observation System of Systems (GEOSS) to meet the need for timely, quality long-term global information as a basis for sound decision-making.
Earth observations from diverse sources, including satellite, airborne, in-situ platforms and citizen observatories, when integrated together, provide powerful tools for understanding the past and present conditions of Earth systems, as well as the interplay between them.
[Figure: GEOSS architecture diagram. Labels: GEOSS Providers (Enterprise Systems 1 … Z); UPSTREAM, MIDSTREAM, DOWNSTREAM; GEOSS Common Platform (Mediation modules, APIs); GEOSS Application Developers (intermediate Users); GEOSS Applications (SBA 1 … SBA 8); GEOSS Portal; GEOSS end-Users]
GEOSS Common Platform (GCI)
Some key numbers: 155 brokered data providers; about 45 million datasets; about 200 million granules
GEO Discovery and Access Broker (DAB)
• GEO DAB is a brokering framework
• Connect, mediate and harmonize hundreds of heterogeneous
data/information systems
• Provide discoverability, access and transformation capabilities
http://www.geodab.net/
Enhanced GEOSS Portal - Overview
• Enhanced during 2016
• Accessible from www.geoportal.org
• Coordinated with ESA, CNR-IIA, DG-RTD, DG-JRC and GeoSec
• Focus on engagement, delivery and advocacy
• Structured in 3 phases
• 1st phase – 2016: interface restyling: completed
• 2nd phase – 2017/18: deployment of major upgrades
• 3rd phase – 2019 onwards – operations and evolutions
Brokering: main benefits and challenges
• Benefits
– Multi-purpose (i.e. application agnostic)
– Re-usability
– High Specialization
– Composability with other (third-party) services
– Sustainability and evolvability
– Flexibility and configurability (even at run-time);
– Extensibility
• Challenges
– Trust
– Governance
– New cultural and business model
– Complexity (need for specialists in intermediation services)
Key Lessons from first decade
• Main success of GEO is the creation of a common yet
flexible organizational structure for voluntary
cooperation
• Agreement on common data sharing and data management
principles
– implementation needs to be strengthened.
• Cooperation across SBAs needs strengthening
• Stronger linkage between space-based and in-situ
communities needed to close the observational gaps
• The GEOSS Common Infrastructure (GCI) has greatly
advanced Earth observation data interoperability
– there is the need to develop a more User-driven GEOSS
Variety challenge in GEOSS
• Variety is the most important challenge for GEOSS.
More than 155 brokered systems; about 200 M granules
Main Adopted Solutions – GEO DAB
• Application of the Brokering pattern and introduction of a Brokering services tier (GEO DAB)
• GEO DAB maps the diverse data and metadata models onto its own internal model
– general enough to comprise all the necessary concepts
• GEO DAB internal data and metadata model MUST BE flexible and extensible to allow the addition of new concepts and related attributes
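The mapping idea above can be illustrated with a minimal Python sketch. It is not the GEO DAB implementation: the `InternalRecord` class, the two mapper functions and the source field names are all hypothetical, chosen only to show a core model that stays extensible by keeping unknown attributes in an open-ended dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class InternalRecord:
    """Hypothetical internal model: a few core concepts plus open extensions."""
    identifier: str
    title: str
    source: str
    extensions: Dict[str, Any] = field(default_factory=dict)


def from_dcat_like(doc: Dict[str, Any]) -> InternalRecord:
    # Field names imitate a DCAT-style record; they are illustrative only.
    core = {"dct:identifier", "dct:title"}
    return InternalRecord(
        identifier=doc["dct:identifier"],
        title=doc["dct:title"],
        source="dcat-like",
        # Attributes the core model does not know are kept, not dropped.
        extensions={k: v for k, v in doc.items() if k not in core},
    )


def from_iso_like(doc: Dict[str, Any]) -> InternalRecord:
    # Field names imitate an ISO-style record; they are illustrative only.
    core = {"fileIdentifier", "title"}
    return InternalRecord(
        identifier=doc["fileIdentifier"],
        title=doc["title"],
        source="iso-like",
        extensions={k: v for k, v in doc.items() if k not in core},
    )


# One mapper per brokered model; supporting a new model means adding an entry.
MAPPERS: Dict[str, Callable[[Dict[str, Any]], InternalRecord]] = {
    "dcat-like": from_dcat_like,
    "iso-like": from_iso_like,
}


def broker(kind: str, doc: Dict[str, Any]) -> InternalRecord:
    """Dispatch a source record to the mapper registered for its model."""
    return MAPPERS[kind](doc)
```

In this sketch a new concept from a provider never forces a change to the core fields: it simply lands in `extensions` until the internal model is extended to name it.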
OGC CSW 2.0.2 AP ISO 1.0 | INPE
OGC CSW 2.0.2 ebRIM EO | CKAN
OGC CSW 2.0.2 ebRIM CIM | DCAT
ESRI GEOPORTAL 10 | GI-cat
OAI-PMH 2.0 | ESRI GEOPORTAL 10
OpenSearch 1.1 | NCML-OD
OpenSearch 1.1 ESIP | BCODMO
OpenSearch GENESI DR | NCML-CF
CKAN | NetCDF-CF 1.4
CUAHSI HIS-Central | FTP populated with supported metadata types
ESRI REST API 10.3 | WAF Web Accessible Folders
OGC WCS | GeoNetwork (2.2.0 or greater)
OGC WMS | Ecological Markup Language 2.1.1
OGC WFS 1.0.0, 1.1.0, 2.0.0 | NERRS (National Estuarine Research Reserve System)
OGC WMTS | HMA CSW 2.0.2 ebRIM/CIM
OGC SOS 1.0.0, 2.0.0, 2.0.0 Hydro Profile | HDF
OGC WPS 1.0.0 | IADC DB (MySQL)
OGC CSW 2.0.0 Core | GrADS-DS
OGC CSW 2.0.2 AP ISO 1.0 | FedEO
OGC CSW 2.0.2 ebRIM/EO AP | ARPA DB (based on Microsoft SQL)
OGC CSW 2.0.2 ebRIM/CIM AP | ESRI Map Server
IRIS Station | SHAPE files (FTP)
IRIS Event | KISTERS Web - Environment of Canada
HYRAX THREDDS SERVER 1.9 | Environment Canada Hydrometric data (FTP)
OAI-PMH 2.0 - Harvesting | OpenSearch 1.1
GBIF | Earth Engine
DIF | RASAQM
HYDRO | EGASKRO
UNAVCO | SITAD (Sistema Informativo Territoriale Ambientale Diffuso)
CDI 1.04, 1.3, 1.4 | File System
ISO 19115-2 | GDACS
THREDDS 1.0.1, 1.0.2 | GeoRSS 2.0
THREDDS-NCISO 1.0.1, 1.0.2 | Degree catalog service 2.2
THREDDS-NCISO-PLUS 1.0.1, 1.0.2 | OpenSearch GENESI DR
Volume challenge in GEOSS
• Large number of (Big) datasets provided by
the supply systems
– e.g. millions of discoverable (small to medium size)
products, and long EO time/space series
• GEOSS DOES NOT store datasets
• GEOSS HAS to provide effective
discoverability and accessibility
– e.g. even constrained queries commonly return a large
number of datasets
Main Adopted Solutions
• GCI addresses this challenge by returning ordered
and/or smaller result sets:
– Views
– Ranking and Paging
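Ranking and paging can be sketched in a few lines of Python. The scoring rule here (counting query terms found in the title) is a toy assumption made for illustration, not the ranking used by the GCI, and the function names are hypothetical.

```python
from typing import Dict, List


def score(record: Dict[str, str], terms: List[str]) -> int:
    """Toy relevance: count how many query terms appear in the title."""
    title = record["title"].lower()
    return sum(1 for t in terms if t.lower() in title)


def rank_and_page(
    records: List[Dict[str, str]],
    terms: List[str],
    start_index: int = 1,
    count: int = 10,
) -> List[Dict[str, str]]:
    """Return one page of results, best match first (start_index is 1-based)."""
    ranked = sorted(records, key=lambda r: score(r, terms), reverse=True)
    first = start_index - 1
    return ranked[first:first + count]
```

A client then asks for `start_index=1`, then `start_index=1 + count`, and so on, so that no single response has to carry the full result set.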
GEOSS View
• Definition:
– Subset of the whole GEOSS resources defined by applying
(via the DAB) a set of clauses
• Discovery clauses (e.g. spatial envelope, keywords,
sources, etc.)
• Access clauses (e.g. data format, access protocol, CRS,
etc.)
• Defined “View” exposed on the GEOSS Portal
• Consumer-defined View, i.e. client-side: available only to the client application that defined the view.
• Provider-defined View, i.e. server-side: available to all client applications (e.g. a Community)
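A View, as defined above, is essentially a reusable set of clauses applied to the whole resource set. The sketch below assumes a simplified record with a bounding box, keywords and a format; the `Record` and `View` classes and their fields are hypothetical and cover only one discovery envelope, one keyword and one access format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # west, south, east, north


def intersects(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass(frozen=True)
class Record:
    title: str
    bbox: BBox
    keywords: Tuple[str, ...]
    fmt: str


@dataclass(frozen=True)
class View:
    """A named subset of resources; a clause left as None is not applied."""
    envelope: Optional[BBox] = None   # discovery clause
    keyword: Optional[str] = None     # discovery clause
    fmt: Optional[str] = None         # access clause

    def matches(self, r: Record) -> bool:
        if self.envelope is not None and not intersects(self.envelope, r.bbox):
            return False
        if self.keyword is not None and self.keyword not in r.keywords:
            return False
        if self.fmt is not None and self.fmt != r.fmt:
            return False
        return True


def apply_view(view: View, records: List[Record]) -> List[Record]:
    """Keep only the records that satisfy every clause of the view."""
    return [r for r in records if view.matches(r)]
```

Whether such a view is consumer-defined or provider-defined only changes where the `View` object lives: in the client application, or on the server where every client can reuse it.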
Velocity challenges in GEOSS
• Processing rate to transform and preview data
• Asynchronous approach for data access
• Real-time (or near real-time) data access
Main Adopted Solutions
• GEO DAB + GEOSS Portal support a fast preview service providing data preview (tile-based):
– when available, the data provider's fast preview service is
used, by implementing the required mediation
• The DAB + GEOSS Portal provide a set of (synchronous and/or asynchronous) access transformation services to deliver discovered datasets according to a Common Grid:
– Format
– Coordinate Reference System
– spatial and temporal extent and resolution
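To make the Common Grid idea concrete, the sketch below describes a target grid by format, CRS, spatial extent and resolution, and derives the raster shape every delivered dataset would share. The `CommonGrid` class is hypothetical, the temporal dimension is left out for brevity, and nothing here reflects the actual DAB transformation services.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CommonGrid:
    """Hypothetical target grid shared by all datasets in one request."""
    fmt: str           # output encoding, e.g. "GeoTIFF"
    crs: str           # coordinate reference system, e.g. "EPSG:4326"
    west: float
    south: float
    east: float
    north: float
    resolution: float  # cell size in CRS units

    def shape(self) -> Tuple[int, int]:
        """Number of (rows, cols) needed to cover the extent."""
        cols = math.ceil((self.east - self.west) / self.resolution)
        rows = math.ceil((self.north - self.south) / self.resolution)
        return rows, cols

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """(x, y) of a cell center; row 0 is the northernmost row."""
        x = self.west + (col + 0.5) * self.resolution
        y = self.north - (row + 0.5) * self.resolution
        return x, y
```

Once two datasets are resampled onto the same `CommonGrid`, their cells line up one to one, which is what makes them directly comparable downstream.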
Main Adopted Solutions
• Two strategies have been pursued to broker real-time
systems:
– Distribute Users' queries to the brokered near real-time
systems, on-the-fly:
• Most up-to-date content
• Lower performance
• Inconsistent ranking
– Harvest information from near real-time systems at regular
and effective intervals:
• Potentially not the most up-to-date content
• Good performance
• Consistent ranking
Global Biodiversity Information Facility (GBIF)
INPE Satellite Imagery
ESRI ArcGIS Online
...
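The trade-off between the two brokering strategies (distributing queries on-the-fly versus harvesting at intervals) can be shown with a small Python sketch. Both broker classes and the `make_source` helper are hypothetical stand-ins: a "source" here is just a function over an in-memory list, not a real brokered system.

```python
from typing import Callable, Dict, List

Source = Callable[[str], List[str]]  # a brokered system: query -> matching titles


def make_source(titles: List[str]) -> Source:
    """Stand-in for a remote system; an empty query returns everything."""
    return lambda query: [t for t in titles if query.lower() in t.lower()]


class DistributedBroker:
    """Forward every query to the sources at request time."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self.sources = sources

    def search(self, query: str) -> List[str]:
        hits: List[str] = []
        for fetch in self.sources.values():
            hits.extend(fetch(query))  # one remote call per source per query
        return hits


class HarvestingBroker:
    """Copy source content into a local index and answer queries from it."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self.sources = sources
        self.index: List[str] = []

    def harvest(self) -> None:
        """Refresh the local copy; meant to run at regular intervals."""
        self.index = [t for fetch in self.sources.values() for t in fetch("")]

    def search(self, query: str) -> List[str]:
        # A single local index allows one consistent ordering of results.
        return sorted(t for t in self.index if query.lower() in t.lower())
```

In this sketch the distributed broker always sees new content but pays a remote call per source on each query, while the harvesting broker answers from its own index and only sees new content after the next `harvest()`.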
Visualization challenges in GEOSS
• Support diverse (cross-)disciplinary applications
targeting different Communities and User categories
• Main solutions
– GEOSS Portal customization:
• e.g. display seismic events according to magnitude
– A set of high-level APIs (GEO DAB APIs) to allow
the development of ad-hoc applications
exploiting GEOSS content
A set of standard Web service interfaces:
• e.g. OGC service interfaces, CKAN, OAI-PMH, FTP, etc.
A set of APIs for software developers:
• Client side APIs:
– (high-level) JavaScript library
– … . (Python)
• Server side APIs:
– REST/JSON APIs
– OpenSearch APIs
– … .
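As an illustration of how a client might use an OpenSearch-style server-side API, the sketch below fills an OpenSearch 1.1 URL template. The parameter names `searchTerms`, `count`, `startIndex` and `geo:box` come from OpenSearch and its Geo extension; the endpoint `https://example.org/opensearch` and the `fill_template` helper are assumptions made for this example, not the GEO DAB API.

```python
import re
from typing import Dict
from urllib.parse import quote

# Matches {name} (required) and {name?} (optional) template parameters.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template: str, values: Dict[str, object]) -> str:
    """Substitute OpenSearch template parameters with URL-encoded values."""

    def substitute(match: "re.Match[str]") -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe=",")
        if optional:
            return ""  # optional parameters without a value become empty
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical template, as it might appear in a description document.
TEMPLATE = (
    "https://example.org/opensearch"
    "?q={searchTerms}&count={count?}&startIndex={startIndex?}&bbox={geo:box?}"
)

url = fill_template(
    TEMPLATE,
    {"searchTerms": "sea ice", "count": 10, "geo:box": "-10,35,20,60"},
)
```

Reading the template from the service's description document, rather than hard-coding query strings, keeps the client independent of how a given server names its URL parameters.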