
What happens when you put 1 billion points into Postgis Topology?

Foss4g 2015, Como 16/07/2015

Norwegian Institute of Bioeconomy Research, WWW.NIBIO.NO

(As of 1 July 2015, Skogoglandskap was merged into NIBIO together with two other institutes.)

Lars Aksel Opsahl ([email protected]), developer.

Is this possible?

7/18/15 31 billion points in Postgis Topology

Move 1 billion points into Postgis Topology

The answer is YES!

How long does it take to add them? 15-16 hours.

Is it possible to edit this topo layer? Yes.

Does an edit take a long time? 1 second or more.

The rest of the slides go into detail about how we solved this and why Topology is a good alternative for our case.

In this presentation we will focus on:

● WHAT type of data we test on.

● WHY we use Postgis Topology for this layer.

● HOW we use Postgis Topology.

● HOW we fill this Postgis Topology layer with data.

● HOW we plan to update this Topology layer.

AR5 is a high resolution land resource map that covers all of Norway.

● The map describes land resources based on land type, site index, tree species and ground conditions.

● Stored as simple features, it is 8 million polygons with a total of 1 billion points.

AR5 used in gardskart.nibio.no


AR5 used in kilden.nibio.no


Why use Postgis Topology for AR5.

View map changes

What you see

What's the history of the map

Added by aeb 10/01/2011

Added by lop 16/06/2015

Rollback a user map update

User adds a new line and surface attribute

Moderator deletes the new line

The new map

Initial map

No overlaps or gaps when the map is edited

User adds a new line and surface attribute

The new map

Initial map

This new line will not cause any overlap or gap with the existing surface

Old lines will keep their history and original points (2 new points)


HOW TO ILLUSTRATE

A good picture may say more than any text, but for some people a SQL fragment may say more than any text or picture.

When you see SQL fragments, I will explain the meaning. You can actually think of this as a picture.

HOW we use Postgis Topology.

Database structure for border (lines/edges)

    CREATE UNLOGGED TABLE topo_ar5.ar5_topo_linje(
        id serial PRIMARY KEY not null
    );
    SELECT topology.AddTopoGeometryColumn('topo_ar5_sysdata', 'topo_ar5', 'ar5_topo_linje', 'geo', 'LINESTRING') As new_layer_id;

    -- create a new table for linestring attributes
    CREATE UNLOGGED TABLE topo_ar5.ar5_topo_linje_attr(
        id serial PRIMARY KEY not null,
        -- could be a foreign key to topo_ar5_sysdata.edge_data, but since
        -- that table is updated outside our control we cannot use a foreign key here
        edge_id int not null,
        objtype_kode smallint not null CONSTRAINT objtype_kode_1_2_m1 CHECK (objtype_kode in (1,2,-1)),
        aravgrtype smallint not null,
        -- contains felles egenskaper (common properties) from ar5
        felles_egenskaper topo_ar5.sosi_felles_egenskaper,
        -- used for temp data, will be deleted after the data is added
        sl_sdeid int
    );

Table that holds the Topo object for lines

Holds attributes for edges

Why store attributes in a separate table for lines?

● We want to be sure that any edge can have only one attribute value.

● After a discussion with Sandro Santilli we will look at other ways to do this: my update code becomes complicated, and many of the same tests are already done in the Topology package by Sandro Santilli. The way I have solved this now needs to be redesigned.

Database structure surface

    CREATE UNLOGGED TABLE topo_ar5.ar5_topo_flate(
        id serial PRIMARY KEY not null,
        artype int4 CONSTRAINT artype_between_0_100 CHECK (artype > 0 and artype < 100),
        arskogbon int4 CONSTRAINT arskogbon_between_0_100 CHECK (arskogbon > 0 and arskogbon < 100),
        artreslag int4 CONSTRAINT artreslag_between_0_100 CHECK (artreslag > 0 and artreslag < 100),
        argrunnf int4 CONSTRAINT argrunnf_between_0_100 CHECK (argrunnf > 0 and argrunnf < 100),
        -- contains felles egenskaper (common properties) from ar5
        felles_egenskaper topo_ar5.sosi_felles_egenskaper,
        simple_geo geometry(MultiPolygon,4258) NULL
    );

    -- add a topogeometry column that is a ref to the polygon surface
    SELECT topology.AddTopoGeometryColumn('topo_ar5_sysdata', 'topo_ar5', 'ar5_topo_flate', 'geo', 'POLYGON') As new_layer_id;

Used for performance.

Adding the topo geometry

HOW we fill this Postgis Topology layer with data.

● Content balanced grid.

● Parallelize with GNU parallel and the grid cells.

● All code is wrapped in PL/pgSQL functions.

● We use simple feature lines and surface representation points when we create Postgis Topology.

    -- Core grid creation code; we use the && operator to increase index use
    sql := 'SELECT count(*) FROM ' || table_name || ' WHERE ' || geo_column_name || ' && ' ||
        'ST_MakeEnvelope(' || x_min || ',' || y_min || ',' || x_max || ',' || y_max || ',' || source_srid || ')';
    EXECUTE sql INTO num_rows_table_tmp;
    IF num_rows_table < max_rows THEN
        sectors[0] := grid_geom;
    ELSE
        x_delta := (x_max - x_min)/2;
        y_delta := (y_max - y_min)/2;
        x_center := x_min + x_delta;
        y_center := y_min + y_delta;
        sectors[0] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
            ST_MakeEnvelope(x_min,y_min,x_center,y_center, ST_SRID(grid_geom)), min_distance, max_rows);
        sectors[1] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
            ST_MakeEnvelope(x_center,y_min,x_max,y_center, ST_SRID(grid_geom)), min_distance, max_rows);
        sectors[2] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
            ST_MakeEnvelope(x_min,y_center,x_center,y_max, ST_SRID(grid_geom)), min_distance, max_rows);
        sectors[3] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
            ST_MakeEnvelope(x_center,y_center,x_max,y_max, ST_SRID(grid_geom)), min_distance, max_rows);

Create content balanced grid for AR5 in Norway

    -- Create a grid with around max 4000 lines in each cell
    SL_make_content_based_balanced_grid01(ARRAY['org_ar5.ar5_linje geo'],4000))

Too big, split in 4

Below limit, OK to use
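The split rule on this slide can be sketched outside the database. The following is a minimal pure-Python illustration, not the PL/pgSQL function used in the talk: it assumes the data is a plain list of (x, y) points rather than linestrings, and the names `balanced_grid` and `count_in_box` are hypothetical. A cell is kept when it holds fewer than `max_rows` items or cannot be split further; otherwise it is divided into four quadrants and each quadrant is handled the same way.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def count_in_box(points: List[Point], box: Box) -> int:
    """Count points in a half-open box so border points are counted once."""
    x_min, y_min, x_max, y_max = box
    return sum(1 for x, y in points if x_min <= x < x_max and y_min <= y < y_max)


def balanced_grid(points: List[Point], box: Box,
                  max_rows: int, min_distance: float) -> List[Box]:
    """Return leaf cells; a cell is split in 4 while it holds max_rows or more points."""
    x_min, y_min, x_max, y_max = box
    too_small = (x_max - x_min) <= min_distance or (y_max - y_min) <= min_distance
    if count_in_box(points, box) < max_rows or too_small:
        return [box]  # below limit (or cannot shrink further): OK to use
    x_center = x_min + (x_max - x_min) / 2
    y_center = y_min + (y_max - y_min) / 2
    quadrants = [
        (x_min, y_min, x_center, y_center),
        (x_center, y_min, x_max, y_center),
        (x_min, y_center, x_center, y_max),
        (x_center, y_center, x_max, y_max),
    ]
    cells: List[Box] = []
    for quadrant in quadrants:  # too big: split in 4 and recurse
        cells.extend(balanced_grid(points, quadrant, max_rows, min_distance))
    return cells


# Illustrative data: a dense cluster near the origin and one isolated point.
points = [(0.1 * i, 0.1 * i) for i in range(1, 9)] + [(6.0, 6.0)]
cells = balanced_grid(points, (0.0, 0.0, 8.0, 8.0), max_rows=5, min_distance=0.5)
```

Dense areas end up with many small cells and sparse areas with a few large ones, so each cell represents a similar amount of work.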

Content balanced grid for AR5 in Norway


Linestring and surface distribution for the grid used.

● Covered by a single cell (does not touch any cell border lines)
    ● Single cell edges: 18988984
    ● Single cell surfaces: 7093814

● Crosses/touches cell border lines
    ● Multi cell edges: 635048
    ● Multi cell surfaces: 534455
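The single cell / multi cell distinction can be illustrated with a small sketch. This is a simplification under stated assumptions: it tests bounding boxes instead of the real geometries, and the names `strictly_inside` and `classify` are hypothetical. The last two lines compute the multi cell share from the counts on the slide.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def strictly_inside(cell: Box, bbox: Box) -> bool:
    """True when bbox lies inside the cell without touching its border."""
    cx0, cy0, cx1, cy1 = cell
    x0, y0, x1, y1 = bbox
    return cx0 < x0 and cy0 < y0 and x1 < cx1 and y1 < cy1


def classify(features: Dict[str, Box], cells: List[Box]):
    """Split features into single cell (mapped to a cell index) and multi cell."""
    single: Dict[str, int] = {}
    multi: List[str] = []
    for feature_id, bbox in features.items():
        owner: Optional[int] = next(
            (i for i, cell in enumerate(cells) if strictly_inside(cell, bbox)), None
        )
        if owner is None:
            multi.append(feature_id)
        else:
            single[feature_id] = owner
    return single, multi


# Illustrative data: two neighbouring cells sharing the border x = 10.
cells = [(0, 0, 10, 10), (10, 0, 20, 10)]
features = {
    "line_a": (1, 1, 4, 4),    # inside cell 0
    "line_b": (12, 2, 18, 8),  # inside cell 1
    "line_c": (8, 3, 12, 5),   # crosses the shared border
    "line_d": (5, 5, 10, 6),   # touches the shared border
}
single, multi = classify(features, cells)

# Share of multi cell features, using the counts from the slide.
multi_edge_share = 635048 / (18988984 + 635048)
multi_surface_share = 534455 / (7093814 + 534455)
```

With the slide's counts, roughly 3% of the edges and 7% of the surfaces are multi cell, so most of the data can be processed one cell at a time.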


4 different operation types

● A: Process lines covered by single cells.

● B: Merge cells to include lines that cross cell borders (then do the same as in A for the lines found).

● C: Process surfaces covered by single cells.

● D: Merge cells to include surfaces that cross cell borders (then do the same as in C for the surfaces found).
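How the four operation types might be driven can be sketched as follows. This is only an illustration under assumptions: the talk uses GNU parallel and PL/pgSQL functions, while this sketch swaps in Python's `ThreadPoolExecutor`, and `run_phases` and `fake_job` are hypothetical names. Each phase finishes before the next one starts, and within a phase every job works on its own cell (or its own set of merged cells).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def run_phases(phases: List[Tuple[str, Callable[[int], int]]],
               cell_ids: Iterable[int], threads: int) -> List[Tuple[str, int]]:
    """Run the phases one after another; cells within a phase run in parallel."""
    cell_ids = list(cell_ids)
    log: List[Tuple[str, int]] = []
    for name, process_cell in phases:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # pool.map keeps the input order; the pool is drained before the next phase
            results = list(pool.map(process_cell, cell_ids))
        log.append((name, sum(results)))
    return log


def fake_job(items_per_cell: int) -> Callable[[int], int]:
    """Stand-in for a database call; returns the number of items handled for a cell."""
    return lambda cell_id: items_per_cell


phases = [("A", fake_job(3)), ("B", fake_job(1)), ("C", fake_job(2)), ("D", fake_job(1))]
log = run_phases(phases, range(1, 5), threads=2)
```

In a real run each job would call a database function for its cell instead of `fake_job`.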

A: Only process data covered by each cell

WAIT TO PROCESS: LINE NOT COVERED BY SINGLE CELL

START TO PROCESS: LINE COVERED BY SINGLE CELL

B: Merge cells to include lines that cross cell borders.

OK TO PROCESS NOW: LINE COVERED BY SET OF MERGED CELLS

DON'T PROCESS: DOESN'T TOUCH ANY ORIGINAL BORDERS

Process lines covered by single cells : 1. create topo.

    SELECT topology.toTopoGeom(geo, 'topo_ar5_sysdata', 1, 0.0000000001) as geo, sl_sdeid
    FROM (
        select arl.sl_sdeid, arl.geo
        from org_ar5.ar5_linje arl
        where cell_geo_in && arl.geo
          and ST_Contains(cell_geo_in, arl.geo)
          and arl.objType not in ('KantUtsnitt')
          and NOT EXISTS (select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid = f.sl_sdeid)
    ) AS a

Create the topo object.

Extreme performance.

Snap-to value.

Used to find attributes.

Merge cells and collect cell borders

    -- merge cells
    (
        SELECT ST_union(cell.geo) as cell_union
        FROM topo_ar5.cell_ad as cell
        WHERE cell.id >= cell_min_in and cell.id < (stop_cell_id)
    ) AS r2

    -- get cell borders
    FROM (
        SELECT (ST_Dump(grid_lines)).geom AS grid_line
        FROM (
            SELECT ST_Collect(ST_ExteriorRing(cell.geo)) as grid_lines
            FROM topo_ar5.cell_ad as cell
            WHERE cell.id >= cell_min_in and cell.id < (stop_cell_id)
        ) AS r
    ) AS r,

Use merged cells and cell borders to find new lines

    ....
    WHERE ST_intersects(r.grid_line, arl.geo)
      AND NOT EXISTS (
          select edge_id from topo_ar5_sysdata.edge_data
          where ST_Intersects(geom, arl.geo) and ST_Intersects(geom, r.grid_line))
      AND arl.objType not in ('KantUtsnitt')
      AND NOT EXISTS (select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid = f.sl_sdeid)
    ...
    WHERE ST_Contains(r2.cell_union, arl.geo)
      AND NOT EXISTS (select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid = f.sl_sdeid)

Covered by merged cell

Process lines covered by single cells : 2. add attributes

    SELECT distinct ON (edge_id) edge_id,
        topo_ar5.ar5_omkod_objtype_2_kode(b.objtype) as objtype_kode,
        aravgrtype,
        b.datafangstdato,
        ARRAY[b.informasjon] as informasjon,
        (b.maalemetode,b.noyaktighet,b.synbarhet)::topo_ar5.sosi_kvalitet as kvalitet,
        b.opphav,
        b.verifiseringsdato,
        (b.registreringsversjon,4.5)::topo_ar5.sosi_registreringsversjon as registreringsversjon,
        b.sl_sdeid
    FROM (
        select r.element_id as edge_id, arl.*
        FROM relation_ids_added ra, topo_ar5_sysdata.relation r, org_ar5.ar5_linje arl
        WHERE ra.topogeo_id = r.topogeo_id
          and ra.layer_id = r.layer_id
          and arl.sl_sdeid = ra.sl_sdeid
    ) AS b

Map by id.

Add attributes using user defined types.

Process surfaces covered by single cells: 1. add topo

    INSERT INTO topo_ar5.ar5_topo_flate (geo)
    SELECT topology.CreateTopoGeom('topo_ar5_sysdata',3,2,topoelementarray) as geo
    from (
        select distinct
            ST_GetFaceGeometry('topo_ar5_sysdata',l.face_id) as geo,
            topology.TopoElementArray_Agg(ARRAY[l.face_id,3]) as topoelementarray,
            ST_union(l.mbr) as union_face
        From topo_ar5_sysdata.face as l, topo_ar5.cell_ad cell
        where cell.id = cell_nr_in
          and ST_Contains(cell.geo,l.mbr)
          and NOT EXISTS (
              select re.element_id from topo_ar5_sysdata.relation re
              where re.layer_id = 2 and re.element_id = l.face_id)
        group by l.face_id
    ) as r1, topo_ar5.cell_ad cell
    where cell.id = cell_nr_in
      and ST_Contains(cell.geo, ST_Boundary(r1.union_face));

Build surface created

Find surfaces inside current cell

Create surface Topo geo

Process surfaces covered by single cells: 2. update simple geo

    update topo_ar5.ar5_topo_flate AS f
    set simple_geo = geo::geometry
    from arf_id as ft
    where f.id = ft.id_temp;

Just cast from topo geometry

Process surfaces covered by single cells: 3. update attributes

    -- update the rest of the attributes
    update topo_ar5.ar5_topo_flate as f
    SET (artype, arskogbon, artreslag, argrunnf, felles_egenskaper) =
        (c.artype, c.arskogbon, c.artreslag, c.argrunnf,
         (datafangstdato,informasjon,null,kvalitet,null,opphav,null,registreringsversjon,verifiseringsdato)::topo_ar5.sosi_felles_egenskaper)
    FROM (
        SELECT b.artype, b.arskogbon, b.artreslag, b.argrunnf, b.id_temp,
            b.datafangstdato,
            ARRAY[b.informasjon] as informasjon,
            (b.maalemetode,b.noyaktighet,b.synbarhet)::topo_ar5.sosi_kvalitet as kvalitet,
            b.opphav,
            b.verifiseringsdato,
            (b.registreringsversjon,'4.5')::topo_ar5.sosi_registreringsversjon as registreringsversjon
        FROM (
            select p.*, ft.id_temp
            from org_ar5.ar5_punkt as p, arf_id as ft, topo_ar5.ar5_topo_flate as f2
            where f2.id = ft.id_temp and ST_Covers(f2.simple_geo,p.geo)
        ) as b
    ) AS c
    where f.id = c.id_temp;

Find data by using the representation point

Performance test of the migration process (16 dual-core CPUs and SSD disks)

1 parallel thread: function_create_topo_ar5.sh vroom2 1 13000 200

15 parallel threads: function_create_topo_ar5.sh vroom2 15 13000 200

20 parallel threads: function_create_topo_ar5.sh vroom2 20 13000 200

Decreasing processing time when increasing number of parallel threads

    Number of threads    Total runtime in hours
    1                    108
    15                   16
    20                   18
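To read the runtime figures as speedup, the numbers can be divided through. This is a small arithmetic sketch using only the runtimes above; "efficiency" here is simply speedup divided by thread count, a definition added for illustration and not taken from the slides.

```python
# Total runtime in hours per number of parallel threads, as reported above.
runtime_hours = {1: 108, 15: 16, 20: 18}

baseline = runtime_hours[1]
summary = {}
for threads, hours in runtime_hours.items():
    speedup = baseline / hours       # how many times faster than one thread
    efficiency = speedup / threads   # 1.0 would mean perfect scaling
    summary[threads] = (speedup, efficiency)

for threads, (speedup, efficiency) in summary.items():
    print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

By this measure 15 threads give a 6.75 times shorter run than one thread, while 20 threads give 6 times, so adding threads beyond 15 did not help on this machine.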


Average operations per second for the 4 different operation types with different numbers of threads.

    Number of threads   A: Single cell linestrings   B: Multi cell linestrings   C: Single cell surfaces   D: Multi cell surfaces
    1                   91                           9                           305                       5
    15                  1043                         48                          972                       21
    20                  814                          48                          934                       27
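As a rough cross-check, the rates can be combined with the edge and surface counts from the grid distribution slide. The sketch below assumes that one "operation" corresponds to one edge or one surface, which the slides do not state, so the result is only an order-of-magnitude estimate; the name `estimated_hours` is hypothetical.

```python
# Counts from the grid distribution slide, keyed by operation type.
items = {"A": 18988984, "B": 635048, "C": 7093814, "D": 534455}

# Average operations per second from the table above, keyed by thread count.
rates = {
    1: {"A": 91, "B": 9, "C": 305, "D": 5},
    15: {"A": 1043, "B": 48, "C": 972, "D": 21},
    20: {"A": 814, "B": 48, "C": 934, "D": 27},
}


def estimated_hours(threads: int) -> dict:
    """Hours per operation type, assuming one operation per edge or surface."""
    return {op: items[op] / rates[threads][op] / 3600 for op in items}


total_1 = sum(estimated_hours(1).values())
total_15 = sum(estimated_hours(15).values())

for threads in rates:
    per_op = estimated_hours(threads)
    parts = ", ".join(f"{op}: {hours:.1f} h" for op, hours in per_op.items())
    print(f"{threads:>2} threads -> {parts}, total {sum(per_op.values()):.1f} h")
```

Under that assumption the totals land in the same range as the measured runtimes, and the multi cell surfaces (type D) take a large part of the time with 15 threads even though they are a small share of the data.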


Average operations per second at every hour when running single threaded.

[Chart. X axis: hours and operation type (A, then B, C and D). Y axis: operations per second, 0 to 500.]

Average operations per second at every hour when running 15 parallel threads.

[Chart. X axis: hours and operation type (A, then B, C and D). Y axis: operations per second, 0 to 1800.]

Summary: converting AR5 to Postgis Topology

● Content balanced grid and parallel threads.

● Two parallel threads cannot work in the same area.

● Function based index topo_ar5.get_relation_id(geo TopoGeometry) and indexes on the relation table.

● Heavy use of the && operator.

● OK with 16 hours of processing time since this is a one-time operation.

● ValidateTopology('topo_ar5_sysdata') shows no errors.

HOW to update the Postgis Topology layer.

● Draw a line and set attribute values

● Use stored procedures

● Use one single transaction

● Rollback if any errors

● Java backend with JSON API

● Simple test client using this API
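The "one single transaction, rollback if any errors" rule can be shown with a small stand-in. This sketch is an assumption throughout: it uses Python's built-in sqlite3 module and a toy table instead of the Java backend and PostgreSQL stored procedures from the talk, and `apply_edit` is a hypothetical name. The point is only that an edit made of several statements is either applied completely or not at all.

```python
import sqlite3
from typing import List, Tuple


def apply_edit(conn: sqlite3.Connection, statements: List[Tuple[str, tuple]]) -> bool:
    """Run all statements in one transaction; undo everything if one fails."""
    try:
        with conn:  # commits on success, rolls back when an exception is raised
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


# Toy stand-in for a line attribute table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_attr (edge_id INTEGER PRIMARY KEY, objtype_kode INTEGER NOT NULL)"
)
conn.commit()

ok = apply_edit(conn, [("INSERT INTO line_attr VALUES (?, ?)", (1, 2))])

# The second statement violates the primary key, so the whole edit is rolled back.
failed = apply_edit(conn, [
    ("INSERT INTO line_attr VALUES (?, ?)", (2, 1)),
    ("INSERT INTO line_attr VALUES (?, ?)", (1, 1)),
])

rows = conn.execute("SELECT edge_id FROM line_attr ORDER BY edge_id").fetchall()
```

After the failed edit the table still holds only the row from the first, successful edit.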

Two comments about updates

1) Jostein, head of AR5: "Don't delete old lines, it's nice to know the history behind changes."

2) Ingvild, my boss: "Why do I have to move old lines around with many hundreds of points? Why can't I just give you a new simple line that just shows the difference?"

Edit Topology data with surface data

Draw a polygon

Split a polygon

Update surface attributes

Extend a polygon

Edit Topology: Split a polygon - Original map

Edit Topology: Split a polygon - Input: point, line, attribute values

Edit Topology: What happens when you have a split surface operation.

Java backend calls: apply_line_on_topo_flate(geo_in geometry, p_in geometry, artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)

And the following happens:
- Adjust the input line to the current data, taking into account that equal surfaces should be equal
- Compute the area to be updated
- Take a copy of the unchanged data
- Take a copy of the data that may change
- Clear data from the line attribute table
- Clear data from the topo surface layer and delete the rows to be changed
- Add the adjusted line with topology.toTopoGeom
- Update the line attribute table
- Create new surfaces with the new attribute values
- Create old surfaces with the old values
- Check that the unchanged area is still the same

Edit Topology: Timing issues when you have a split surface operation.

Java backend calls this function:

topo_ar5.apply_line_on_topo_flate(geo_in geometry, p_in geometry, artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)

Small operations that include few changes take about 1000 ms, but bigger operations may take minutes.

http://trac.osgeo.org/postgis/ticket/2083

Edit Topology: Split a polygon - New map

Edit Topology: Extend a polygon

Java backend calls this function:

apply_line_on_topo_flate(geo_in geometry, p_in geometry, artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)

where p_in (0.0) means not set.


Edit Topology: Draw a new polygon

Java backend calls this function:

apply_polygon_on_topo_flate(geo_in geometry, artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)


Further plans this year

● Add many new layers to Postgis Topology this fall and adjust the Topology model to new requirements.

● Create a client that uses the JSON API for updating topology layers.

● Extend the update API with more functionality.

● We have to work more on performance, topology usage and the update client for AR5.

Postgis Topology is a great tool: you can add one billion points, and it is possible to update it afterwards.

Thanks to everybody that has contributed to Postgis Topology and other open source tools.

Questions ?

Thanks for your attention.

