Big size meteorological data processing and mobile displaying system using PostGIS and GeoServer
BJ Jang, JW Geum, JH Kwun, HG Park
Objective
The system was SLOW not because it used PostGIS
but because it was NOT TUNED.
PostGIS can definitely perform well if the
system is PROPERLY TUNED.
Let’s go over some tuning techniques from the CASE OF the MOBILE
weather chart service of KMA.
Background
Mobile Weather Chart Service Flow
[Flow diagram, Korea Meteorological Administration: Observation Data and Model → GRIB Data → Vectorize Image → Vector Chart → Chart Service (Weather Chart for service). The part whose performance was improved by tuning is marked.]
※GRIB data: GRIdded Binary, or General Regularly-distributed Information in Binary form, a format standardized by the World Meteorological Organization
indexingKeeping data sizeImport speed Problems to solvedbackground
Vector Weather Chart
Software Architecture
Characteristics of Weather Data
• Low Resolution: geographically low resolution
• Multiple Dimension: surface + height (isobaric surface); analysis model; data time + forecasting time
• Frequent Production: from a few times to hundreds of times
• Realtime / Near Realtime: always need up-to-date data
Data usage per day
Figures on the slide: 46; 35,000; 5,332; 67M
Labels on the slide: generation times (00, 06, 12, 18 UTC); # of spatial tables; # of weather charts; MB of data; # of spatial data rows
Problems to be solved
Problems of Existing System
Slow data collection
Difficult management of big size data
Slow searching for weather charts
Why is the service slow?
Failed to understand characteristics of data
Improvement Goal
PROBLEMS | GOALS | ACTIVITY
5 hr to insert data | Insert in less than 30 min. | Use addBatch() & executeBatch()
Data file grows 35 GB per day | Keep the size of the data file fixed | Use partitioning & truncate
Tens of seconds to search a single weather chart | Search a weather chart within a few seconds | Improve the index
Improvement on importing speed for big size data using batch
General Data Processing Time
source: http://novathin.kr/19
Run one by one
Run once after gathering as many statements as the batch size
There is a big difference depending on how the SQL is executed!
Time required for each batch size
Import speed comparison
Test criteria: one weather chart KML file, executing 3,000 row inserts
# of addBatch() | # of executions | Time (sec)
0 | 3,000 | 109.0
100 | 30 | 8.9
500 | 6 | 5.7
1,000 | 3 | 3.4
3,000 | 1 | 1.1
0 addBatch(): 1 insert / 1 commit. 3,000 addBatch(): KML file (3,000 inserts) / 1 commit.
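The relationship between batch size and number of executions can be sketched in a few lines. This is only an illustration, not the importer used at KMA: it uses Python's built-in sqlite3 module as a stand-in for PostGIS over JDBC, with executemany() playing the role of addBatch() and executeBatch(). The table layout and the helper names insert_in_batches and demo are hypothetical.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size):
    """Insert rows in groups of batch_size; return the number of batch executions.

    Assumption: stand-in for JDBC addBatch()/executeBatch(). Each group is sent
    with one executemany() call, and everything is committed once at the end.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    executions = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "INSERT INTO contour (placemark_name, val) VALUES (?, ?)", batch
        )
        executions += 1
    conn.commit()
    return executions


def demo(batch_size, n_rows=3000):
    """Load n_rows hypothetical rows into a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contour (placemark_name TEXT, val REAL)")
    rows = [(f"pm_{i}", float(i)) for i in range(n_rows)]
    executions = insert_in_batches(conn, rows, batch_size)
    stored = conn.execute("SELECT count(*) FROM contour").fetchone()[0]
    conn.close()
    return executions, stored
```

For 3,000 rows, demo(100) performs 30 executions and demo(3000) performs one, the same execution counts as in the comparison. The timings depend on the database and driver and are not reproduced by this sketch.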
Keeping data file size by managing tables
Data Management of PostGIS
• PostGIS is write-once.
  • Updated and deleted data is not removed; it is marked, and the new data is recorded below it
• Pros
  • Fast
  • Can manage several versions of data
• Cons
  • Data file size can grow enormously
  • Performance drops as the file size grows
  • The weather chart DB file grows by 35 GB per day!!!
Snapshot vs Write-once
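[Diagram, two panels.
Oracle / MySQL (snapshot): the table holds A, B', C, D, E, and the old record B is kept in a snapshot. The transaction owner sees the record after renewal (B'); other users see the record before renewal (B) until the transaction completes.
PostgreSQL (write-once): the table holds A, B (marked X), C, D, E, with the new record B' appended. The record before renewal and the record after renewal are both stored in the table.]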
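General VACUUM
[Diagram, three steps. The table holds A, B (X), C (X), D, E (X), B', C', where X marks records that are no longer needed. VACUUM execution: the unneeded records B, C and E are registered in the FSM. Data insert: the new record F reuses the space of B, leaving C and E in the FSM.]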
Source: http://www.geocities.jp/sugachan1973/doc/funto60.html
For KMA’s weather charts on PostGIS, general VACUUM cannot solve the problem of data files growing continuously.
VACUUM FULL
On PostGIS for KMA’s weather charts, VACUUM FULL takes 15 hr. During VACUUM FULL, an exclusive LOCK is held.
Source: http://www.devmedia.com.br/otimizacao-uma-ferramenta-chamada-vacuum/1710
VACUUM FULL: arranging unused space for big size data management
Partitioning
Partitioning?
• Managing one table as several conceptually separate tables
• Data size per table goes down
• Index size goes down and search speed goes up
[Diagram: the Weather Chart table is split into Weather Chart_0, Weather Chart_1, Weather Chart_2, Weather Chart_3, Weather Chart_4, Weather Chart_5 and Weather Chart_6. Labels: Insert on Sunday, Insert on Monday, Insert on Tuesday; Truncate on Sunday, Truncate on Monday, Truncate on Tuesday.]
Truncate takes only a few seconds, and the file size decreases without vacuum
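One way to picture the weekly rotation is a helper that picks the partition for a given day and produces the TRUNCATE statement for it. This is a sketch under assumptions, not the KMA implementation: the mapping of Sunday to suffix 0, truncating a partition just before that day's import, and the contour_N table names are all assumed for illustration. The function names are hypothetical.

```python
from datetime import date

TABLE_PREFIX = "contour"  # assumption: partition tables are contour_0 .. contour_6


def partition_for(day):
    """Return the partition table name for a date, with Sunday mapped to 0."""
    # isoweekday() is 1 (Monday) .. 7 (Sunday); % 7 moves Sunday to 0.
    return f"{TABLE_PREFIX}_{day.isoweekday() % 7}"


def daily_rotation_sql(day):
    """SQL to run before importing a day's charts: empty that day's partition."""
    return f"TRUNCATE TABLE {partition_for(day)};"
```

Because each partition is emptied in full with TRUNCATE rather than row by row with DELETE, no dead rows are left behind for VACUUM to clean up.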
Improvement on inquiry speed by resetting index
Improvement flow of inquiry speed
Data Condition Analysis → Query Finding → Query Plan Analysis → Index Improvement
Data Condition Analysis
• Understanding the # of rows per table
• select count(*) from table_name is foolish!
• The number of rows can be read from the statistics table
• Meaningful data is stored in the pg_class table
• Execution time within one minute
select relname as table_name, to_char(reltuples, '999,999,999') as row_count
from pg_class
where relnamespace = (select oid from pg_namespace where nspname = 'public')
and relam = 0
order by 2 desc, 1;
GeoServer SQL VIEW
• Registers an SQL query as a layer
• Available when the datasource is a geo DB
• Useful for:
  • Applying complex conditions to a layer
  • Reprojection
  • Joining multiple tables
  • Normal attribute -> spatial object
• GeoServer displays the weather charts, so its performance is affected by the search speed of PostGIS
Query Finding
• Identifying executed SQL using statistics
• Using the pg_stat_activity table
• A necessary step for tuning
• Possible to check execution time
• Queries differ by PostGIS version
select query_start, current_query
from pg_stat_activity
where usename = 'mobile'
and current_query not like '<IDLE>%'
order by query_start desc;
SELECT "val", encode(ST_AsBinary(ST_Force_2D("geom")),'base64') as "geom"
FROM (
  select mdl, mdl_var, placemark_name, val, lyrs_cd, forecast_time,
         create_time as anal_time, ST_Transform(the_geom, 7188) as geom
  from contour where mdl_var = 'TMP'
) as "vtable"
WHERE (((("mdl" = 'GDAPS' AND "lyrs_cd" = 'A925.0') AND "forecast_time" = '2011.06.27 00:00') AND "anal_time" = '2011.06.27 00:00') AND "geom" && ST_GeomFromText('POLYGON ((-1056768 -2105344, -1056768 -1040384, 8192 -1040384, 8192 -2105344, -1056768 -2105344))', 7188));
Query Plan Analysis
• PostGIS has a built-in query analysis function
• pgAdmin III: Query menu, Explain analyze function
• EXPLAIN ANALYZE command: easy to analyze a query
Index Improvement
Principles for Index Improvement
• Build one index over all columns in the WHERE clause
• The spatial column has a separate index
• Columns containing many distinct values come first
• Where possible, columns compared with the equality operator come first
• Unnecessary indexes should be removed because they slow down inserts
Examples
-- contour_0
DROP INDEX index_createtime_contour_0;
DROP INDEX index_forecasttime_contour_0;
DROP INDEX index_lyrscd_contour_0;
DROP INDEX index_mdl_contour_0;
DROP INDEX index_mdlvar_contour_0;
CREATE INDEX index_contour_0_all ON contour_0 (forecast_time ASC NULLS LAST, mdl_var ASC NULLS LAST, lyrs_cd ASC NULLS LAST, create_time DESC NULLS LAST, mdl ASC NULLS LAST);
Result
• Replacing the individual indexes with one integrated index reduces data capacity by 20%
• 6 ~ 25 times speed improvement depending on the table (bigger tables show a larger improvement)
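The effect of the integrated index on the query plan can be checked with a small experiment. The sketch below is an assumption-laden stand-in, not the PostGIS setup: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN in place of PostgreSQL's EXPLAIN ANALYZE, omits the spatial column (which has its own index) and the NULLS LAST options, and the column types and the function name plan_uses_index are hypothetical.

```python
import sqlite3


def plan_uses_index(index_name="index_contour_0_all"):
    """Return True if the chart lookup is planned as a search on the composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contour_0 (mdl TEXT, mdl_var TEXT, lyrs_cd TEXT, "
        "forecast_time TEXT, create_time TEXT, val REAL)"
    )
    # One index over every non-spatial column used in the WHERE clause.
    conn.execute(
        f"CREATE INDEX {index_name} ON contour_0 "
        "(forecast_time, mdl_var, lyrs_cd, create_time DESC, mdl)"
    )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT val FROM contour_0 "
        "WHERE mdl = 'GDAPS' AND mdl_var = 'TMP' AND lyrs_cd = 'A925.0' "
        "AND forecast_time = '2011.06.27 00:00' "
        "AND create_time = '2011.06.27 00:00'"
    ).fetchall()
    conn.close()
    # The last field of each plan row is the human-readable detail text.
    return any(index_name in row[-1] for row in plan)
```

On PostgreSQL the equivalent check is to run EXPLAIN ANALYZE on the query captured from pg_stat_activity and confirm that the plan scans index_contour_0_all instead of the whole table.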
Improvement Result
Under 300 isobaric/Temperature/ Isokinetics
Ground/Wet-number/Temperature
800 isobaric/Mixture ratio/Temperature
Conclusion
• Importance of executing with addBatch() and executeBatch()
  About 100 times performance improvement
• Combining partitioning and truncate
  Stably keeping N-1 accordance
• Appropriate index for the query
  20 times inquiry time improvement
Conclusion
PostGIS is a really great DBMS!
Perfectly suited to GeoServer
However, it needs tuning based on a thorough understanding of its features.