Geonovum Testbed – Topic 4 “Spatial data on the Web using the current SDI”
INSPIRE Conference 2016
The starting point …
[Diagram: the current SDI]
Datasets: Agrarisch Areaal, Adressen en Gebouwen (BAG)
Services: WFS (data), CSW (metadata)
Users: GIS experts and developers
Data: any model, any CRS; XML, rich queries
Metadata: ISO 19115, ISO 19139/XML
What are we trying to do?
• presence on the Web of data
  • crawlability and linkability, i.e. make each resource hosted by a WFS or CSW available via a persistent URI and ensure that all resources can be reached via links from a “landing page” for the data set
• harmonisation of data discovery
  • classification of the resources using vocabularies supported by the main search engines on the Web
  • discovery of both spatial and non‐spatial data by the same search engine
• data access based on current Web practices
  • representations of data for consumption by humans (HTML), developers (JSON‐LD, GeoJSON, GML) and search engine crawlers (HTML with structured data annotations), accessible via HTTP(S)
• connecting data with other data on the Web
  • establishing and maintaining links between data
What have we built
[Diagram: the SDI from the starting point, extended towards the Web]
Datasets: Agrarisch Areaal, Adressen en Gebouwen (BAG), Bekendmakingen
New components: a “linked data proxy” and a “linked data proxy” with GeoNetwork
New audiences: search engine crawlers, open data portals, “Web developers”
Targets: Indexed Web; Linked Data Web? Web APIs?
Labels: schema.org, WGS 84; HTML & JSON-LD, content negotiation, “follow your nose”; DCAT, schema.org; links between datasets
Architectural principles used in the proxy design
• All resources are identified using persistent HTTP URIs
• All resources should be discoverable via search engines
• All interaction uses the HTTP protocol, consistent with its design
• APIs to access data should be self‐describing and support immediate use
• Resources can be accessed and understood by developers and citizens
• Resources are either explicitly linked using HTTP URIs or data is structured so that links can be established dynamically
Metadata Paul
Search for an address in Google
Feature from Address WFS reported as first hit
HTML page of the address – live from the WFS
Feature available under a persistent URI as HTML, GeoJSON, JSON‐LD, GML
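Serving one persistent URI in several formats is usually done with HTTP content negotiation. The following is a minimal sketch of server-side selection based on the Accept header, not the ldproxy implementation. The media type strings and the function names are assumptions chosen for illustration; the slides only name the formats (HTML, GeoJSON, JSON-LD, GML).

```python
# Sketch: choose a representation for a feature URI from the Accept header.
# The media types below are assumed; the slides only list the format names.
SUPPORTED = [
    "text/html",             # humans and crawlers (default)
    "application/geo+json",  # GeoJSON
    "application/ld+json",   # JSON-LD
    "application/gml+xml",   # GML
]


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q value."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept_header, supported=SUPPORTED):
    """Return the best supported media type, or None if nothing matches."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type == "*/*":
            return supported[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "application/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in supported:
            return media_type
    return None
```

A browser-style header such as `text/html,application/xhtml+xml;q=0.9,*/*;q=0.8` selects the HTML page, while a client sending `application/ld+json` receives JSON-LD from the same URI.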
HTML annotated with schema.org
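As an illustration of what a schema.org annotation for an address could look like, the sketch below builds a JSON-LD document and wraps it in a script element for embedding in the HTML page. The schema.org types and properties (Place, PostalAddress, GeoCoordinates) are real, but the mapping, the function names, the feature URI and the sample values are hypothetical; the annotations actually produced by the proxy may differ.

```python
import json


def address_to_jsonld(feature_uri, props, lon, lat):
    """Map a (hypothetical) address feature to a schema.org Place."""
    return {
        "@context": "http://schema.org/",
        "@type": "Place",
        "@id": feature_uri,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": props["streetAddress"],
            "postalCode": props["postalCode"],
            "addressLocality": props["addressLocality"],
        },
        # schema.org coordinates are latitude/longitude in WGS 84
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
    }


def as_script_tag(doc):
    """Serialise the document as an embeddable JSON-LD script element."""
    payload = json.dumps(doc, ensure_ascii=False, indent=2)
    # Escape "</" so the payload cannot close the script element early
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Made-up sample values, for illustration only
example = address_to_jsonld(
    "http://www.ldproxy.net/bag/inspireadressen/example-id",  # hypothetical URI
    {
        "streetAddress": "Voorbeeldstraat 1",
        "postalCode": "0000 AA",
        "addressLocality": "Valkenswaard",
    },
    lon=5.46,
    lat=51.35,
)
html_snippet = as_script_tag(example)
```

Embedding the data in a script element keeps the visible HTML unchanged while giving crawlers a machine-readable copy of the same feature.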
Landing page of the ldproxy.net demo server
A dataset (a WFS)
A dataset (a feature type in a WFS)
… paged access to all features
All data is accessed live from a WFS
Providing subsets of large data sets (here: by municipality)
Select the municipality
http://www.ldproxy.net/bag/inspireadressen/?fields=addressLocality&distinctValues=true
Dynamically establish links to other resources (here: a dataset with public announcements)
http://www.ldproxy.net/bag/inspireadressen/?addressLocality=Valkenswaard
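The two URLs above suggest a simple query pattern: one request lists the distinct values of a property, and each value then becomes a filter on the same collection. The sketch below only constructs such URLs; the parameter names are taken from the two URLs shown, while the helper names are hypothetical, and whether other properties can be used the same way is an assumption.

```python
from urllib.parse import urlencode

# Base URL of the address feature type, as shown on the slides
BASE = "http://www.ldproxy.net/bag/inspireadressen/"


def distinct_values_url(base, field):
    """URL that lists the distinct values of one property."""
    return base + "?" + urlencode({"fields": field, "distinctValues": "true"})


def subset_url(base, field, value):
    """URL for the subset of features where field equals value."""
    return base + "?" + urlencode({field: value})


def links_for_values(base, field, values):
    """One subset link per value, e.g. for a 'select the municipality' page."""
    return {value: subset_url(base, field, value) for value in values}


municipality_links = links_for_values(
    BASE, "addressLocality", ["Valkenswaard", "Den Haag"]
)
```

Because the links are derived from property values rather than stored, the same pattern can connect features in one dataset to resources in another dataset that share the value.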
A public announcement
Note: Workshop tomorrow morning at 09:00 about the work of the Working Group
Indexing is slow: 0.2% of the pages (features) indexed after 5 months
Indexing of the schema.org annotation is minimal
Similar across search engines
Google is experimenting with the discovery of datasets
https://developers.google.com/search/docs/data-types/datasets
Resources
• Detailed report: http://geo4web-testbed.github.io/topic4/
  Many additional aspects and issues covered, not discussed today
• Source code: https://github.com/interactive-instruments/ldproxy
• Docker image: https://hub.docker.com/r/iide/ldproxy/
Summary of the Findings
• To a large extent we were successful – SDIs can be leveraged to make data available on the Web
• But there are open questions and issues, e.g.
  • Data integration across datasets: Properly structured data using common vocabularies will decrease the cost of data integration (here: linking) significantly
  • SDI consistency: Metadata in WFS capabilities and the dataset metadata in catalogues are often incomplete or inconsistent; other issues include persistent identifiers in WFSs and dead links
  • Performance & data compactness: Response times and data size affect the indexing / ranking by search engines and usability in general
  • Search engines are largely a black box when we look at spatial data: how is structured spatial data used?
  • schema.org is a divergent kind of vocabulary, seen from the Linked Data perspective
  • Different vocabularies needed for different use cases / communities, e.g. schema.org vs GeoSPARQL
  • Content negotiation based on media types does not support different vocabularies / user communities
Docker
Docker image of ldproxy
Set up a local ldproxy service using docker
https://github.com/interactive-instruments/ldproxy/blob/master/docs/00-getting-started.md
Adapt an ldproxy service configuration
• change feature type label “lands2:watertorens” to “Watertorens”
• disable Foto_groot in the overviews
• disable properties OBJECTID and Foto_thumb everywhere
• have a better feature name than "watertorens.1"
• change "WOONPLAATS" to "Woonplats"
also documented at … https://github.com/interactive-instruments/ldproxy/blob/master/docs/00-getting-started.md
Service Manager without ldproxy services
… with one ldproxy service
The “landing page” for the ldproxy service
The page for a feature type
The page of a feature
Try it yourself, even without setting up your own proxy
http://services.live.geocat.net:7080