Geoservices Activities at EDINA

Posted on 08-Jul-2015



(OR Why the Elephant is your Friend)

About - EDINA National Data Centre

• A designated National Data Centre for Tertiary Education since 1995

• Based at The University of Edinburgh
• Our mission...

to enhance the productivity of research, learning and teaching in UK higher and further education

BY

delivering access to a range of online data services through a UK academic infrastructure, as well as supporting knowledge exchange and ICT capacity building, nationally and internationally.

• Focus is on service, but we also undertake R&D
• History
– first online GI service, UKBORDERS, launched in 1994
– flagship Digimap service now a teenager!
– substantial experience in handling geospatial data on a large scale (large databases; large user base)

The Geoservices Team

• Largest team within EDINA
• Highly experienced and skilled team
– provides advice nationally and internationally
– active in standards development and policy
– active in the GI community nationally and internationally

• The demands of the services offered mean the team has been at the leading edge of GI service development in the UK

[Figure: services and projects, 1999 vs. today]

Our Service requirements

• Fast servicing of requests

• Scalable and extensible
– accommodates steady or increasing demand

• Robust (our SLA aspires to 99% uptime!)

• Maintainable

• Standardised

– can easily substitute components for repair, upgrade, etc.

• Rapid prototyping and rollout

• All of above on tight budget!

What do we use Postgres/PostGIS for?

• Service operation and management

• Map creation
– Data store for vector-based maps
– Indexing service for raster-based maps
– Source for WMS GetFeatureInfo queries

• Data delivery
– Data store for vector products

• Searching/querying
– Advanced place name searching

… for service operation and management

• Store service-critical metadata

• User data

• Control user access

• Log activity

Case Study: Digimap

• Approx 50,000 active users at any point in time

• Academic year 2010/11 stats
– c. 400,000 logins
– Over 10 million maps created
– 240,000 high-quality print maps generated
– 100,000 data download requests
– Over 1 million data files downloaded

… as a ‘Data Store’ for mapping

• From the (very) large: Ordnance Survey’s MasterMap (in EDINA’s map schema)

Data rows and sizes (index sizes in brackets):
– Area: 107,293,931 rows; 49 GB (13 GB)
– Lines: 278,110,576 rows; 73 GB (24 GB)
– Boundary: 535,039 rows; 321 MB (46 MB)
– Points: 3,984,140 rows; 668 MB (399 MB)
– Symbols: 2,793,680 rows; 522 MB (236 MB)
– Text: 21,004,729 rows; 4 GB (1.7 GB)
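To put the MasterMap figures listed above in perspective, the short Python sketch below totals the row counts and sizes and estimates the average data size per row for each layer. It only restates the slide's numbers; converting MB to GB at 1,000 MB per GB is an assumption made for the arithmetic, and since the slide does not say whether the bracketed index sizes are included in the data sizes, they are totalled separately.

```python
# Row counts and sizes from the MasterMap slide: (rows, data GB, index GB).
# Assumption: MB figures converted at 1 GB = 1,000 MB.
MASTERMAP = {
    "Area": (107_293_931, 49.0, 13.0),
    "Lines": (278_110_576, 73.0, 24.0),
    "Boundary": (535_039, 0.321, 0.046),
    "Points": (3_984_140, 0.668, 0.399),
    "Symbols": (2_793_680, 0.522, 0.236),
    "Text": (21_004_729, 4.0, 1.7),
}


def totals(layers):
    """Return (total rows, total data GB, total index GB)."""
    rows = sum(r for r, _, _ in layers.values())
    data_gb = sum(d for _, d, _ in layers.values())
    index_gb = sum(i for _, _, i in layers.values())
    return rows, data_gb, index_gb


def bytes_per_row(layers, name):
    """Average data bytes per row for one layer (decimal gigabytes)."""
    rows, data_gb, _ = layers[name]
    return data_gb * 1e9 / rows


if __name__ == "__main__":
    rows, data_gb, index_gb = totals(MASTERMAP)
    print(f"{rows:,} rows, {data_gb:.1f} GB of data, {index_gb:.1f} GB of indexes")
    for name in MASTERMAP:
        print(f"{name}: ~{bytes_per_row(MASTERMAP, name):.0f} bytes/row")
```

Run as a script, it reports 413,722,095 rows in total, about 127.5 GB of data and 39.4 GB of indexes, with Area rows averaging roughly 457 bytes each.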

… as a ‘Data Store’ for mapping

• … via the small but cartographically complex: Ordnance Survey’s Strategi

Only 778,000 rows

Range of geometries

Strict layer draw order

Over 50 layers

Many drawn multiple times

… as a ‘Data Store’ for mapping

• … to the complex data schema: SeaZone’s Hydrospatial

Large range of features

Complex feature relationships

Individual layer scale control

… as a ‘Spatial Indexing’ system

• Spatial index for 1.4 million historical maps of Great Britain
• Covers the late 1840s to the early 1990s

Complex file structure

Reflects original capture: counties, towns, editions, scale

And the digitisation process

… but not critically TIME

• However, for historical data the temporal availability was critical.

• Using date information in addition to the spatial index allows maps to be placed in the correct time slot
– Publication date was used because survey date metadata was missing
– An example of a MapServer layer definition for 1890s maps:

area from (
  select * from historic.ancient_roam_tiles b,
    (select county, max(edition) as edition2, a.sheet_no
     from historic.ancient_roam_tiles a,
       (select max(version) as max_version, sheet_no
        from historic.ancient_roam_tiles
        where (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                        and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
          and (scale=10000 or scale=10560)
          and (version = 'ng' or version = 'cs_ng')
          and st_setsrid(!BOX!,27700) && area
        group by sheet_no) as selection
     where a.version = selection.max_version
       and a.sheet_no = selection.sheet_no
       and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                     and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
       and (scale=10000 or scale=10560)
     group by a.sheet_no, county) as sheet_group
  where b.sheet_no = sheet_group.sheet_no
    and b.county = sheet_group.county
    and (1890 between (cast((substr(cast(publish_year_start as varchar),1,3)) as int)*10)
                  and (cast((substr(cast(publish_year_end as varchar),1,3)) as int)*10))
    and (scale=10000 or scale=10560)
    and b.edition = sheet_group.edition2
) as subq using unique id using SRID=27700
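The nested query above is hard to follow at a glance. The Python sketch below mirrors its logic on in-memory records, purely as an illustration: keep tiles whose publication decades span the target decade at 1:10,000 or 1:10,560 scale; among those inside the requested box with version 'ng' or 'cs_ng', find the highest version per sheet; find the latest edition per sheet and county at that version; then return the tiles carrying that edition. The `Tile` class and its `bbox` tuple (a stand-in for the `area` geometry) are hypothetical, and the box test imitates the bounding-box check that PostGIS's `&&` operator performs.

```python
from dataclasses import dataclass


# Hypothetical in-memory stand-in for rows of historic.ancient_roam_tiles.
@dataclass(frozen=True)
class Tile:
    sheet_no: str
    county: str
    edition: int
    version: str
    scale: int
    publish_year_start: int
    publish_year_end: int
    bbox: tuple  # (minx, miny, maxx, maxy) in EPSG:27700; stands in for `area`


SCALES = {10000, 10560}
VERSIONS = {"ng", "cs_ng"}


def decade(year):
    # Same as cast(substr(cast(year as varchar),1,3) as int)*10 for 4-digit years.
    return int(str(year)[:3]) * 10


def boxes_overlap(a, b):
    # Bounding-box intersection, the test PostGIS's && operator performs.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_tiles(tiles, target_decade, view_box):
    # Filter shared by all three levels of the query: decade and scale.
    base = [t for t in tiles
            if decade(t.publish_year_start) <= target_decade <= decade(t.publish_year_end)
            and t.scale in SCALES]

    # Innermost subquery: highest version per sheet among tiles in the view box.
    max_version = {}
    for t in base:
        if t.version in VERSIONS and boxes_overlap(t.bbox, view_box):
            max_version[t.sheet_no] = max(max_version.get(t.sheet_no, t.version), t.version)

    # Middle subquery: latest edition per (sheet, county) at that version.
    max_edition = {}
    for t in base:
        if max_version.get(t.sheet_no) == t.version:
            key = (t.sheet_no, t.county)
            max_edition[key] = max(max_edition.get(key, t.edition), t.edition)

    # Outer query: tiles whose edition is the latest for their sheet and county.
    return [t for t in base if max_edition.get((t.sheet_no, t.county)) == t.edition]
```

Changing `target_decade` is the in-memory equivalent of swapping the hard-coded 1890 in the layer definition, which is how one layer per decade can share the same query shape.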

• Easy to use with a range of map rendering software

OS Strategi (Cadcorp GeognoSIS)

OS Open Data: Panorama and Vector Map District products plus grid lines and labels (MapServer)

… for WMS GetFeatureInfo

Example of proximity search (especially useful for point data)

• Easy to provide information about the selected feature.

• Allows use of additional search parameters, for example proximity to the point clicked.

• Access additional metadata tables for information.
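A proximity-based GetFeatureInfo lookup boils down to "features within some radius of the clicked point, nearest first". In PostGIS this is typically written with `ST_DWithin` in the WHERE clause and an `ORDER BY ST_Distance(...)`; the Python sketch below shows the same logic on plain coordinates, assuming British National Grid eastings and northings in metres. The `Feature` class, the radius and the limit are illustrative assumptions, not EDINA's implementation.

```python
import math
from dataclasses import dataclass


# Hypothetical point feature record.
@dataclass(frozen=True)
class Feature:
    fid: int
    x: float  # easting in metres (assumed British National Grid)
    y: float  # northing in metres
    name: str


def features_near(features, click_x, click_y, radius_m, limit=5):
    """Return up to `limit` features within `radius_m` of the click, nearest first."""
    hits = []
    for f in features:
        d = math.hypot(f.x - click_x, f.y - click_y)
        if d <= radius_m:  # the role ST_DWithin plays in SQL
            hits.append((d, f))
    hits.sort(key=lambda pair: pair[0])  # the role of ORDER BY distance
    return [f for _, f in hits[:limit]]
```

A radius search suits point data because a click rarely lands exactly on a point; doing the filtering in the database keeps the response small however many features the layer holds.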

Bedrock information and selected area highlighted.

Map sheet information stored in metadata tables.

… update interfaces to reflect current map

Legend shows only rock types in the area (over 1,000 in the full legend)

Timeline highlights selected as well as other available decades
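Restricting the legend to the rock types present in the current view amounts to collecting the distinct codes of features that intersect the map extent, then filtering the full legend while keeping its order. A minimal Python sketch under those assumptions follows; the rock codes and labels are hypothetical, and in PostGIS the first step would usually be a `SELECT DISTINCT` over features matching `geom && <extent>`.

```python
def boxes_overlap(a, b):
    # Axis-aligned bounding boxes as (minx, miny, maxx, maxy).
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def legend_for_view(features, view_box, full_legend):
    """features: (rock_code, bbox) pairs; full_legend: ordered (code, label) pairs."""
    present = {code for code, bbox in features if boxes_overlap(bbox, view_box)}
    # Preserve the full legend's order so entries keep a familiar sequence.
    return [(code, label) for code, label in full_legend if code in present]


if __name__ == "__main__":
    # Hypothetical rock codes and labels.
    legend = [("GRANITE", "Granite"), ("CHALK", "Chalk"), ("CLAY", "Clay")]
    features = [("CHALK", (0, 0, 10, 10)), ("GRANITE", (50, 50, 60, 60))]
    print(legend_for_view(features, (5, 5, 20, 20), legend))  # [('CHALK', 'Chalk')]
```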

… as a ‘Data Store’ for download

UKBORDERS provides bespoke extraction of vector boundary data in custom formats (Shape, MIF, KML, DXF)

Real-time extraction uses GeoServer over PostGIS as a WFS, piped through FME

Metamodel built around PostGIS (formerly Oracle). The migration resulted in a more scalable system (multiple dev/live/failover instances) with easier desktop prototyping

OpenBoundaries – same engine, different data (all based around derived OS Open Data) and skin
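The extraction chain starts with a standard OGC WFS GetFeature request against GeoServer, whose response FME can then translate into the requested format. A minimal Python sketch of building such a request URL follows; the endpoint, the `ukborders:districts` layer name and the choice of WFS 1.1.0 are assumptions for illustration, and the bounding box uses British National Grid (EPSG:27700) coordinates with the CRS appended as the optional fifth BBOX value.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(base_url, type_name, bbox, srs="EPSG:27700"):
    """Build a WFS 1.1.0 GetFeature URL for features inside `bbox`.

    bbox is (minx, miny, maxx, maxy) in the units of `srs`.
    """
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "srsName": srs,
        "bbox": ",".join(str(v) for v in bbox) + f",{srs}",
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, not EDINA's.
    print(wfs_getfeature_url("https://example.org/geoserver/wfs",
                             "ukborders:districts",
                             (400000, 100000, 410000, 110000)))
```

Keeping the extraction behind a standard WFS means the same engine can serve different data, as OpenBoundaries does, by changing only the layer requested.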

… for querying

• Unlock provides an Application Programming Interface (API) for querying over 11 million geographic names across a variety of gazetteers:
– GeoNames (world coverage)
– Pleiades ancient place names (world coverage)
– Natural Earth (world coverage)
– OS products (UK coverage): 1:50,000 Placename Gazetteer, Meridian 2, Boundary-Line, BN Grid references

• Placename outlines and attribution extracted from mapping data or published gazetteers

• Outlines are a unique service feature, enabling further spatial data extraction and analysis

• Unlock Places makes extensive use of stored database procedures:
– for writing dynamic queries
– for complex data filtering and parsing

Outline of Southampton returned by Unlock Places
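In PL/pgSQL, dynamic queries are usually assembled as a string and run with `EXECUTE ... USING`, so user-supplied values travel as `$1`, `$2`, ... parameters rather than being spliced into the SQL text. The Python sketch below mirrors only that composition step, to show how optional filters can be combined safely; the `gazetteer_names` table, its columns, the SRID 4326 (WGS 84) envelope and the filter set are all hypothetical and are not Unlock's actual schema.

```python
def build_place_query(name=None, gazetteer=None, bbox=None, max_rows=20):
    """Compose a parameterised place-name query from optional filters.

    Returns (sql, params): sql uses $1, $2, ... placeholders in the order of
    params, as EXECUTE ... USING expects. Table and column names are hypothetical.
    """
    clauses, params = [], []

    def placeholder(value):
        # Record the value and return its positional placeholder.
        params.append(value)
        return f"${len(params)}"

    if name is not None:
        clauses.append(f"name ILIKE {placeholder(name + '%')}")  # prefix match
    if gazetteer is not None:
        clauses.append(f"gazetteer = {placeholder(gazetteer)}")
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        envelope = ", ".join(placeholder(v) for v in (minx, miny, maxx, maxy))
        clauses.append(f"geom && ST_MakeEnvelope({envelope}, 4326)")  # assumed WGS 84
    where = " AND ".join(clauses) or "TRUE"
    sql = (f"SELECT name, gazetteer, ST_AsText(geom) FROM gazetteer_names "
           f"WHERE {where} LIMIT {placeholder(max_rows)}")
    return sql, params
```

For example, `build_place_query(name="South", gazetteer="geonames")` yields a query filtering on `name ILIKE $1 AND gazetteer = $2` with parameters `["South%", "geonames", 20]`. Doing this inside a stored procedure keeps the SQL in the database, which is what lets it be changed without redeploying the application.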

How do we use Postgres/PostGIS to best effect?

• Ensure data schemas are determined by functionality
– Do NOT accept defaults from loaders
– Use INTs for primary selection attributes

• Tailor data processing to the task
– For mapping, do NOT include non-mapped features or attributes

• Indexes are your friend
– Ensure all search attributes are indexed

• Clustered indexes are your best pal
– Critical for our mapping schemas

• Bad or unnecessary indexes are your worst enemy
– Can cause severe slowdowns, resulting in a bad user experience
– Make use of EXPLAIN

• Hide internal complexity behind database views – makes applications more portable

• Use schemas to roll out data updates: just set the search path to look in the new default schema. This also makes rolling back to the previous data version easy.

• Take advantage of stored procedures. If SQL is embedded in application code, changes may not be deployable instantly because the application must be recompiled and redeployed, and downtime might be required. Keeping SQL in stored procedures makes changes immediate and more seamless.

• Use built-in data replication per instance – feel more protected from bad luck!
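The schema-based rollout and clustered-index advice above can be combined into a small release routine: index and cluster the freshly loaded tables in a new schema, refresh planner statistics, then point the database's default `search_path` at the new schema while keeping the old one for rollback. The Python sketch below only generates the PostgreSQL statements. The database, schema and table names are hypothetical, identifiers are assumed to be simple lowercase names (real code should quote them), and `public` is kept on the path on the assumption that PostGIS is installed there.

```python
def release_statements(database, new_schema, tables, geom_column="geom",
                       previous_schema=None):
    """Return (statements, rollback) for publishing `new_schema`.

    Assumes each table in `tables` is already loaded into `new_schema`.
    All names are hypothetical and are not quoted. ALTER DATABASE ...
    SET search_path only affects sessions opened afterwards.
    """
    statements = []
    for table in tables:
        qualified = f"{new_schema}.{table}"
        index = f"{table}_{geom_column}_gist"
        statements += [
            f"CREATE INDEX {index} ON {qualified} USING GIST ({geom_column});",
            # Physically reorder rows by the spatial index so features that
            # are drawn together sit close together on disk.
            f"CLUSTER {qualified} USING {index};",
            # Refresh planner statistics after the reorder.
            f"ANALYZE {qualified};",
        ]
    statements.append(
        f"ALTER DATABASE {database} SET search_path = {new_schema}, public;")
    rollback = None
    if previous_schema is not None:
        rollback = (f"ALTER DATABASE {database} "
                    f"SET search_path = {previous_schema}, public;")
    return statements, rollback
```

For example, `release_statements("digimap", "mastermap_2012_01", ["area", "lines"], previous_schema="mastermap_2011_10")` produces seven statements plus a one-line rollback; because the old schema is never touched, rolling back is just running that single statement. Running EXPLAIN on a typical map query before switching is a cheap check that the new indexes are actually used.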

What we like about Postgres/PostGIS

• Reliable

• Performant

• Scalable

• Easier replication

• Standards compliant

• Comes with good tools

• Superb 3rd party support

..and the elephants ...

The future: What are we planning?

• Migrating to Postgres 9.1
– Currently we have a mix of 8.3 and 8.4 installs
– Take advantage of new functionality and bug fixes

• Exploring the new functionality in PostGIS 2.0 to enhance existing services and to develop possible new ones
– Raster capabilities
– Topology
– Generalisation with topological consistency constraints

Highly generalised Census 2001 Output Areas (OAs) in Nottingham. All input features are present after generalisation, with no overlaps or new slivers introduced.
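Topology helps here because a shared boundary is stored once, as an edge referenced by both neighbouring polygons. Simplifying each edge once therefore changes both polygons identically, so neighbours cannot drift apart along that boundary (guarding against an edge crossing some other edge needs further checks). The toy Python sketch below illustrates the idea with two polygons sharing an edge. Douglas-Peucker is used as a stand-in simplifier, and the edge/face structures are hypothetical; this is not PostGIS 2.0's topology schema or its actual generalisation routine.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(line, tolerance):
    """Simplify a polyline, always keeping its two end nodes."""
    if len(line) < 3:
        return list(line)
    index, dmax = 0, 0.0
    for i in range(1, len(line) - 1):
        d = perpendicular_distance(line[i], line[0], line[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [line[0], line[-1]]
    left = douglas_peucker(line[: index + 1], tolerance)
    right = douglas_peucker(line[index:], tolerance)
    return left[:-1] + right


def face_ring(face, edges):
    """Assemble a polygon ring from (edge_id, forward) references."""
    ring = []
    for edge_id, forward in face:
        coords = edges[edge_id] if forward else edges[edge_id][::-1]
        ring.extend(coords if not ring else coords[1:])
    return ring


# Two hypothetical faces, A and B, sharing the edge "shared".
EDGES = {
    "shared": [(0, 0), (1, 0.1), (2, -0.1), (3, 0)],
    "north": [(3, 0), (3, 2), (0, 2), (0, 0)],
    "south": [(3, 0), (3, -2), (0, -2), (0, 0)],
}
FACES = {"A": [("shared", True), ("north", True)],
         "B": [("shared", True), ("south", True)]}

# Simplify each edge once; both faces then reuse the same simplified edge.
simplified = {eid: douglas_peucker(coords, 0.5) for eid, coords in EDGES.items()}
rings = {fid: face_ring(face, simplified) for fid, face in FACES.items()}
```

Simplifying the two polygons independently could remove different vertices from each copy of the shared boundary, which is exactly how slivers and overlaps appear; edge-level simplification avoids that by construction.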

Conclusion

• Postgres and PostGIS have been used to power EDINA geo-services for over 8 years

• During late 2011 the last major service was migrated.

• All geo-services (and some non-geo ones!) at EDINA rely on Postgres/PostGIS as either the sole or principal database

• It will continue to form the core of our services for the foreseeable future.

• The elephant is our friend, it certainly could be yours!