Date post: | 25-May-2015 |
Category: | Technology |
Upload: | osmfstateofthemap |
Inside
Roland Olbricht
at SOTM 2013 in Birmingham
Overview
1. The server as a whole
2. Processing of requests
3. The query statement pipeline
1. The server as a whole
4000 to 6000 unique IPs per day
150'000 to 250'000 requests per day
10 GB to 30 GB result data per day
Statistics of 2013-08-30

Download size per IP | # Unique IPs
> 1 GB | 4
100 MB – 1 GB | 20
10 MB – 100 MB | 150
1 MB – 10 MB | 384
100 KB – 1 MB | 595
10 KB – 100 KB | 1104
1 KB – 10 KB | 1189
< 1 KB | 764

Download size per request
884'037
8'513'484
61'999
20'245
4'740
1'696
921
339
Share resources across 10^7!
=> [timeout:...]: The server keeps track of "free time units".
The server accepts a client request if its timeout is below half of the server's free time units.
Client requests | Server state
(initially) | 240000 free time units
With timeout 180? Accepted. | 239820 free time units
With timeout 86400? Accepted. | 153420 free time units
With timeout 86400? Rejected, because 153420/2 < 86400. | 153420 free time units
With timeout 180? Accepted. | 153240 free time units
Short allowed runtime: high priority
Long allowed runtime: low priority
Since June 2012: all requests with [timeout:...] < 180 are accepted; requests with a longer timeout are occasionally rejected.
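The admission rule can be sketched in a few lines of Python. This is a minimal toy model, not the server's actual code: the class name `TimeBudget` and the method `try_admit` are hypothetical, and the sketch assumes that an accepted request simply reserves its declared timeout from the pool. It does not model time units being returned when a request finishes.

```python
class TimeBudget:
    """Toy model of the server's pool of free time units (names are hypothetical)."""

    def __init__(self, free_units: int) -> None:
        self.free_units = free_units

    def try_admit(self, timeout: int) -> bool:
        """Accept a request only if its timeout is below half of the free units."""
        if timeout < self.free_units / 2:
            self.free_units -= timeout  # reserve the declared runtime
            return True
        return False  # rejected: the pool is left unchanged


budget = TimeBudget(240000)
# Replay the four client requests from the slide, in order.
decisions = [(t, budget.try_admit(t), budget.free_units) for t in (180, 86400, 86400, 180)]
for timeout, accepted, free in decisions:
    verdict = "accepted" if accepted else "rejected"
    print(f"timeout {timeout}: {verdict}, {free} free time units")
```

Replaying the slide's sequence with this rule gives 239820, 153420, 153420 (rejected) and 153240 free time units, which is why a short timeout makes a request much more likely to be accepted on a busy server.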
2. Processing of requests
The bottleneck ...
… is disk I/O.
almost completely idle
peaks often near 100%
"out" vs "out skel" vs "out meta"

Request, disk time and memory for the query statement:
node
  [name="Aston Business School"];
Memory: (node 1473072867, lat = 52.4867839, lon = -1.8884618)

out;
(node 1473072867, lat = 52.4867839, lon = -1.8884618, amenity=bicycle_parking, bcc_ref=433, bicycle_parking=stands, capacity=10, covered=yes, name=Aston Business School)

out skel;
(node 1473072867, lat = 52.4867839, lon = -1.8884618)

out meta;
(node 1473072867, version = 2, timestamp = ..., …, lat = 52.4867839, lon = -1.8884618, amenity=bicycle_parking, bcc_ref=433, bicycle_parking=stands, capacity=10, covered=yes, name=Aston Business School)

Every statement takes disk time.
Internally, we only store skeletons.
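The difference between the three output modes can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the dictionaries `SKELETONS`, `TAGS_ON_DISK` and `META_ON_DISK` and the function `render` are hypothetical, the counter only tracks extra lookups made by the output step, and the timestamp is left out because the slide elides it. The mode name `body` stands for plain `out`.

```python
# Hypothetical stores: skeletons are what the query statement leaves in memory,
# tags and meta data must be looked up separately when the output asks for them.
SKELETONS = {1473072867: {"lat": 52.4867839, "lon": -1.8884618}}
TAGS_ON_DISK = {
    1473072867: {
        "amenity": "bicycle_parking",
        "bcc_ref": "433",
        "bicycle_parking": "stands",
        "capacity": "10",
        "covered": "yes",
        "name": "Aston Business School",
    }
}
META_ON_DISK = {1473072867: {"version": 2}}  # timestamp elided on the slide


def render(node_id: int, mode: str) -> tuple[dict, int]:
    """Return the output record and the number of extra lookups it needed."""
    record = {"id": node_id, **SKELETONS[node_id]}
    extra_lookups = 0
    if mode in ("body", "meta"):
        record["tags"] = TAGS_ON_DISK[node_id]
        extra_lookups += 1
    if mode == "meta":
        record.update(META_ON_DISK[node_id])
        extra_lookups += 1
    return record, extra_lookups


for mode in ("skel", "body", "meta"):
    record, lookups = render(1473072867, mode)
    print(mode, lookups, sorted(record))
```

In this model `skel` is the cheapest mode because it only prints what is already in memory, which is the reason to prefer it whenever tags and meta data are not needed.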
3. The query statement pipeline
The query statement is a pipeline
Ids:
  Planning decisions
  Collect ids of potential results
  Copy from memory if possible
Raw data:
  derive geo index from query
  lookup geo index by ids
  fetch all skeletons
Filtering:
  cheap filtering
  filter by key conditionals
  expensive filtering
More conditions are better than fewer.
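Why the filtering stages run from cheap to expensive, and why extra conditions help, can be shown with a short Python sketch. It is an illustration only: the helper `run_filters`, the sample nodes and the three predicates are hypothetical and stand in for the server's real filters.

```python
import re


def run_filters(candidates, filters):
    """Apply (name, predicate) filters in order; count how often each one runs."""
    calls = {name: 0 for name, _ in filters}
    survivors = []
    for node in candidates:
        for name, predicate in filters:
            calls[name] += 1
            if not predicate(node):
                break  # later, more expensive filters are skipped
        else:
            survivors.append(node)
    return survivors, calls


# Hypothetical candidate skeletons with their tags.
nodes = [
    {"id": 1, "lat": 52.487, "lon": -1.889,
     "tags": {"amenity": "bicycle_parking", "name": "Aston Business School"}},
    {"id": 2, "lat": 52.485, "lon": -1.885, "tags": {"amenity": "cafe"}},
    {"id": 3, "lat": 48.100, "lon": 11.500, "tags": {"amenity": "bicycle_parking"}},
    {"id": 4, "lat": 52.481, "lon": -1.881,
     "tags": {"amenity": "bicycle_parking", "name": "Other"}},
]

filters = [
    # cheap: coordinates are part of the skeleton
    ("cheap", lambda n: 52.48 <= n["lat"] <= 52.49 and -1.89 <= n["lon"] <= -1.88),
    # key conditional: needs the tags of the node
    ("key", lambda n: n["tags"].get("amenity") == "bicycle_parking"),
    # expensive: stands in for any costly check, here a regular expression
    ("expensive", lambda n: re.search("Business", n["tags"].get("name", "")) is not None),
]

survivors, calls = run_filters(nodes, filters)
print([n["id"] for n in survivors], calls)
```

Each additional condition removes candidates before the next stage sees them, so the expensive filter runs on only two of the four nodes here.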
The query statement pipeline:
node[name="Aston Business School"];
Disk time:
  Collect ids of potential results: (node 1473072867)
  lookup geo index by ids: (Idx 0x42f00f00)
  fetch all skeletons: (node 1473072867, lat=52.487, lon=-1.889)
The query statement pipeline:
node[amenity=bicycle_parking];
Disk time:
  Collect ids of potential results: (node 1000, …, node …, node …, node 1473072867, node …, node …) [~ 80'000 objects]
  lookup geo index by ids: (Idx 0x1, 0x2, 0x3, …, ...)
  fetch all skeletons: ((node 1, lat=..., lon=...), …, (node 1473072867, lat=52.487, lon=-1.889), ...)
… ~80'000 disk seeks
… ~30'000 disk seeks
The query statement pipeline:
node[amenity=bicycle_parking](52.48, -1.89, 52.49, -1.88);
Disk time:
  Collect ids of potential results
  derive geo index from query: (Idx 0x42f00f00)
  fetch all skeletons: (node 1473072867, lat=52.487, lon=-1.889)
The query statement pipeline:
node[name="Aston Business School"](51.0, -3.0, 60.0, 3.0);
Disk time:
  Collect ids of potential results: (node 1473072867)
  derive geo index from query: (Idx 0x42000000, …, Idx 0x42ffffff)
  fetch all skeletons: (node 1473072867, lat=52.487, lon=-1.889)
… ~3'000 disk seeks
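The effect of a bounding box on disk seeks can be sketched with a toy index in Python. Everything here is an assumption made for illustration: the grid cells of 0.01 degree, the integer coordinates in units of 0.001 degree, the synthetic nodes and the rule "one seek per index lookup or cell fetch" are not the server's real data layout.

```python
from collections import defaultdict

CELL = 10  # hypothetical cell size: 10 units of 0.001 degree


def cell_of(lat, lon):
    """Hypothetical geo index: integer grid cell of a coordinate."""
    return (lat // CELL, lon // CELL)


# Toy data: one real-looking node plus 50 synthetic ones, all bicycle parkings.
nodes = {1473072867: (52487, -1889, {"amenity": "bicycle_parking"})}
for i in range(50):
    nodes[1000 + i] = (40000 + 100 * i, 5000 + 100 * i, {"amenity": "bicycle_parking"})

tag_index = defaultdict(list)      # (key, value) -> node ids
cell_by_id = {}                    # node id -> cell
nodes_by_cell = defaultdict(list)  # cell -> node ids
for node_id, (lat, lon, tags) in nodes.items():
    cell_by_id[node_id] = cell_of(lat, lon)
    nodes_by_cell[cell_of(lat, lon)].append(node_id)
    for key_value in tags.items():
        tag_index[key_value].append(node_id)


def in_bbox(node_id, bbox):
    south, west, north, east = bbox
    lat, lon, _ = nodes[node_id]
    return south <= lat <= north and west <= lon <= east


def by_ids(key_value, bbox):
    """Collect ids from the tag index, then look up the geo index per id."""
    ids = set(tag_index.get(key_value, []))
    seeks = 0
    cells = set()
    for node_id in ids:
        cells.add(cell_by_id[node_id])
        seeks += 1  # one seek per id in the id -> cell index
    found = []
    for cell in cells:
        seeks += 1  # one seek per cell that holds skeletons
        found += [n for n in nodes_by_cell[cell] if n in ids and in_bbox(n, bbox)]
    return found, seeks


def by_bbox(key_value, bbox):
    """Derive the cells from the bounding box, then filter by tag."""
    south, west, north, east = bbox
    key, value = key_value
    seeks = 0
    found = []
    for cell_lat in range(south // CELL, north // CELL + 1):
        for cell_lon in range(west // CELL, east // CELL + 1):
            seeks += 1  # one seek per cell covered by the bounding box
            for n in nodes_by_cell.get((cell_lat, cell_lon), []):
                if nodes[n][2].get(key) == value and in_bbox(n, bbox):
                    found.append(n)
    return found, seeks


bbox = (52480, -1890, 52490, -1880)  # (52.48, -1.89, 52.49, -1.88)
print(by_ids(("amenity", "bicycle_parking"), bbox))
print(by_bbox(("amenity", "bicycle_parking"), bbox))
```

Both strategies return the same node, but in this toy setup the id-based path touches every bicycle parking in the data set, while the small bounding box limits the work to the four cells it covers.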
Summary
Be bold: the server can cope with large queries.
Select the right "out" mode for performance and for quick testing.
Use all available information, in particular small bounding boxes and specific search conditionals.
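These recommendations can be combined when a query is assembled by a client. The Python helper below is a minimal sketch: `build_query` and its parameters are hypothetical names, it only produces the query text and does not contact any server, and it does not escape quotes inside tag values.

```python
def build_query(element, conditions, bbox=None, out="body", timeout=None):
    """Assemble a query string with optional timeout, bounding box and out mode."""
    parts = []
    if timeout is not None:
        parts.append(f"[timeout:{timeout}];")
    statement = element + "".join(f'["{k}"="{v}"]' for k, v in conditions.items())
    if bbox is not None:
        # bounding box order: south, west, north, east
        statement += "(" + ",".join(str(c) for c in bbox) + ")"
    parts.append(statement + ";")
    parts.append("out;" if out == "body" else f"out {out};")
    return "\n".join(parts)


query = build_query(
    "node",
    {"amenity": "bicycle_parking"},
    bbox=(52.48, -1.89, 52.49, -1.88),
    out="skel",
    timeout=180,
)
print(query)
```

The example asks for skeletons only, restricts the search to a small bounding box and declares a short timeout, which are the three levers named in the summary.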
Thank you for your attention