High Performance Geoserver Clusters.pdf
Page 1: High Performance Geoserver Clusters.pdf

High Performance Geoserver Clusters
Derek Kern - Ubisense, Inc

1

Page 2: High Performance Geoserver Clusters.pdf

What this talk is about

• I want to walk you through the reasoning and process involved in scaling and clustering GeoServer / GeoWebCache
• Why would you begin scaling and clustering?
• While scaling and clustering, what do you need to consider?

2

Page 3: High Performance Geoserver Clusters.pdf

What this talk is *not* about

• This talk is *not* about tuning an individual GeoServer instance
• This is an important and complicated topic
• However, it has already been covered quite well at numerous previous FOSS4G gatherings
  – Google for the talk entitled: GeoServer on steroids
• We will discuss GeoServer parameters only insofar as they are needed for scaling and clustering

3

Page 4: High Performance Geoserver Clusters.pdf

We use GeoServer a lot

4

Page 5: High Performance Geoserver Clusters.pdf

We use GeoServer a lot

5

Page 6: High Performance Geoserver Clusters.pdf

Why scaling and clustering?

• So, what events would initiate the scaling / clustering process?
  – Poor application performance
  – GeoServer machine resources being exhausted
  – Onboarding new users
  – Onboarding new feature layers / layer groups
  – Onboarding new spatial applications
  – Onboarding new spatial data sources

6

Page 7: High Performance Geoserver Clusters.pdf

Why scaling and clustering?

• So, what events would initiate the scaling / clustering process? (cont’d)
  – Changing the scales at which layers are being rendered
  – Others
• At one customer site, we are using GeoServer / GeoWebCache, nightly, to construct SQLite tilestores that are distributed to offline users

7

Page 8: High Performance Geoserver Clusters.pdf

Why scaling and clustering?

• These events are relevant to performance insofar as they relate to the following factors affecting performance:
  – Number of users, i.e. tile requests (duh)
  – Hardware capacity (GeoServer and/or Database)
  – Network capacity
  – Database structure
  – Feature density

8

Page 9: High Performance Geoserver Clusters.pdf

Zoom in - Database structure

• The structure of tables, tablespaces, etc. can affect the rate at which data can be queried and rendered onto tiles
• Example #1: If a feature table is large enough, then a missing spatial index could dramatically slow the rendering process
• Example #2: In PostgreSQL, a table needing vacuuming might be enough to slow the rendering process (both fixes are sketched after this slide)

9
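The two examples above translate directly into SQL. Here is a minimal sketch, assuming a hypothetical PostGIS feature table named parcels with a geometry column named geom:

#!/bin/bash
DB=gisdb    # hypothetical database name

# Example #1: add the missing spatial index (a GiST index over the geometry column)
psql -d ${DB} -c "CREATE INDEX IF NOT EXISTS parcels_geom_idx ON parcels USING GIST (geom);"

# Example #2: reclaim dead rows and refresh planner statistics for the table
psql -d ${DB} -c "VACUUM ANALYZE parcels;"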

Page 10: High Performance Geoserver Clusters.pdf

Zoom in - Feature density

• The density of features per tile can greatly affect performance
• This offers a strong incentive to be very careful when choosing the scales at which to display features
• I’ve witnessed poor choices bring an entire GeoServer cluster to its knees

10

Page 11: High Performance Geoserver Clusters.pdf

Arch #1 - The Starting Point

11

This is the portrait of simplicity

Page 12: High Performance Geoserver Clusters.pdf

Arch #1 - The Starting Point

• This is the starting point for many geographically-enabled web applications
• There is a single, generic application server (Django, Ruby on Rails, etc)
• There is a single database server (PostgreSQL, MySQL, Oracle, etc)
• There is a single GeoServer and it is using its bundled GeoWebCache for caching

12

Page 13: High Performance Geoserver Clusters.pdf

Quick Note - Scaling In or Out?

• GeoServer can obviously be scaled across machines
• However, it can also be scaled within a machine, i.e. multiple GeoServer instances can run on different ports on a single machine (see the sketch after this list)
• Let’s call the former “scaling out” and the latter “scaling in”
• Most of this talk is structured around scaling out, but is equally applicable to scaling in
• “GeoServer on steroids” has some content on scaling in

13
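A minimal sketch of scaling in, assuming stock Tomcat and two hypothetical instance directories (/opt/tomcat-gs1 and /opt/tomcat-gs2) that have already been copied and given non-conflicting ports in their conf/server.xml files:

#!/bin/bash
# Two GeoServer/Tomcat containers on one machine, listening on different ports
export CATALINA_HOME=/opt/apache-tomcat            # hypothetical Tomcat install
export GEOSERVER_DATA_DIR=/data/geoserver_data     # hypothetical; shared or per-instance

CATALINA_BASE=/opt/tomcat-gs1 ${CATALINA_HOME}/bin/startup.sh
CATALINA_BASE=/opt/tomcat-gs2 ${CATALINA_HOME}/bin/startup.sh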

Page 14: High Performance Geoserver Clusters.pdf

Arch #2 - Obvious next step

14

Page 15: High Performance Geoserver Clusters.pdf

Arch #2 - Obvious next step

• We’ve simply added another GeoServer instance
• This architecture, theoretically, has double the capacity of #1
• It is also a very easy step to make
• However, it has problems
  – How is traffic to be balanced between the two servers?
    • Traffic management must be dealt with, somehow, by the application server

15

Page 16: High Performance Geoserver Clusters.pdf

Arch #2 - Obvious next step

• However, it has problems (cont’d)
  – Configuration data is not shared so configuration changes must be made twice
    • *** Assuming the instances are serving the same layers
  – Tiles are being cached twice
    • Duplication of effort
    • Managing expired tiles is now doubly difficult
• Aside: Handling expired tiles
  – GeoRSS
  – Bulk layer cache clearing ★
  – Targeted layer cache clearing ★

16

★ Careful: GeoServer disk quota processing can cause problems when tiles are cleared by means of the OS, i.e. ‘rm’. It should be disabled when clearing using ‘rm’ (see the sketch below)
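A minimal sketch of the two starred options, clearing the file cache directly with ‘rm’ once disk quota processing has been disabled. The layer and gridset directory names are hypothetical:

#!/bin/bash
GEOWEBCACHE_CACHE_DIR=/data/gwc_cache    # hypothetical cache location

# Bulk layer cache clearing: throw away every tile cached for a layer
# (workspace_layer directory naming as in the clearing script later in the deck)
rm -rf ${GEOWEBCACHE_CACHE_DIR}/myworkspace_mylayer

# Targeted layer cache clearing: only drop part of the cache, e.g. the zoom levels
# that actually changed (assuming the on-disk layout groups tiles into
# <gridset>_<zoom> subdirectories)
rm -rf ${GEOWEBCACHE_CACHE_DIR}/myworkspace_mylayer/EPSG_900913_1[0-5]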

Page 17: High Performance Geoserver Clusters.pdf

Aside - Tile lifecycle spectrum
High Entropy Content

  Real-time          Near real-time     Near daily            Daily
  Δ: x < 4m          Δ: 4m ≤ x < 3h     Δ: 3h ≤ x < 1d        Δ: 1d ≤ x < 3d
  Outage status      Device status      Veggie mgmt           Customers
  Scada status       Trouble calls      Construction status   As built
  Vehicle position

17

Page 18: High Performance Geoserver Clusters.pdf

Aside - Tile lifecycle spectrum
Low Entropy Content

  Weekly             Monthly             Yearly                    Never
  Δ: 3d ≤ x < 10d    Δ: 10d ≤ x < 3mo    Δ: 3mo ≤ x < 2y           Δ: N/A
  Legacy as built    Roads               Rail                      State boundaries
  Customers          Parcels             City/county boundaries    Water features

18

Page 19: High Performance Geoserver Clusters.pdf

Aside - Tile lifecycle spectrum

• Most applications have content (being rendered onto tiles) that falls all over the lifecycle spectrum
• The appropriate GeoServer / GeoWebCache architecture will ultimately be driven by factors that include:
  – The lifecycle of the tiles being served
  – The amount of data being served
  – The amount of time needed to render tiles
  – The number of users requesting GeoServer / GeoWebCache tiles

19

Page 20: High Performance Geoserver Clusters.pdf

Arch #2 - Obvious next step

• Example
  – If we balance the load by hits, then one GeoServer would serve the ‘NHealth’ layer and the other GeoServer would serve all other layers
  – Given the considerations already covered, would this be an equitable balance?
  – The answer: Not necessarily

20

  Layer Name           Hit%    Refresh Cycle
  NHealth              50%     Daily
  Status of Accounts   18%     15 mins
  Devices              8%      Daily
  Actives              8%      1 hour
  Cables               6%      30 mins
  Problem Accounts     4%      Daily
  Transmit             3%      Daily
  Tickets              1%      20 mins
  Outage Nodes         1%      10 mins
  Region               1%      Daily
  Facility             1%      Daily
  Total                100%

Page 21: High Performance Geoserver Clusters.pdf

Arch #2 - Obvious next step

• Example (cont’d)
  – The ‘Account Status’ layer is refreshed every 15 minutes so, depending upon how many tiles are expired during each cycle, the tile cache might be less effective
  – In order to strike an equitable balance, the statistic we want is:
    • Total hits * Average tile output time
    • This statistic is, essentially, total tile output time (TTOT); a small worked sketch follows the table below
  – As it turns out, balancing layer TTOT is difficult if architecture is not considered

21

  Layer Name         Hit%    Refresh Cycle
  NHealth            50%     Daily
  Account Status     18%     15 mins
  Devices            8%      Daily
  Actives            8%      1 hour
  Cables             6%      30 mins
  Problem Accounts   4%      Daily
  Transmit           3%      Daily
  Tickets            1%      20 mins
  Outage Nodes       1%      10 mins
  Region             1%      Daily
  Facility           1%      Daily
  Total              100%
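A small worked sketch of the TTOT statistic (TTOT per layer = total hits * average tile output time), assuming a hypothetical CSV named layer_stats.csv with columns layer,hits,avg_tile_output_ms gathered from your own logs:

#!/bin/bash
# Print TTOT for each layer, largest first
awk -F',' 'NR > 1 { printf "%-20s %12.0f ms\n", $1, $2 * $3 }' layer_stats.csv | sort -k2 -nr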

Page 22: High Performance Geoserver Clusters.pdf

Arch #3 - A Little Better

22

Page 23: High Performance Geoserver Clusters.pdf

Arch #3 - A little better

• We’ve added a load balancer to mediate traffic between the GeoServer instances
• The application server will point clients to the load balancer when tiles are needed
• While the theoretical capacity hasn’t changed, this architecture is better able to exploit that capacity

23

Page 24: High Performance Geoserver Clusters.pdf

Arch #3 - A little better

• The load balancer can be hardware or software-based
  – Examples (a minimal Apache configuration sketch follows this slide)
    • mod_proxy_balancer
    • NGINX
    • BigIP
    • Barracuda
• This architecture still has problems
  – Configuration data is still not shared so configuration changes must be made twice
    • *** Assuming the instances are serving the same layers

24
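For illustration only, a minimal mod_proxy_balancer fragment, written out from bash so the snippet is self-contained. The hostnames and conf path are hypothetical, and the proxy modules (mod_proxy, mod_proxy_http, mod_proxy_balancer, mod_lbmethod_byrequests) are assumed to be loaded:

#!/bin/bash
# Balance /geoserver requests across two GeoServer instances
cat > /etc/httpd/conf.d/geoserver-balancer.conf << 'EOF'
<Proxy "balancer://geoservers">
    BalancerMember "http://gsrvr01:8080/geoserver"
    BalancerMember "http://gsrvr02:8080/geoserver"
</Proxy>
ProxyPass        "/geoserver" "balancer://geoservers"
ProxyPassReverse "/geoserver" "balancer://geoservers"
EOF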

Page 25: High Performance Geoserver Clusters.pdf

Arch #3 - A little better

• This architecture still has problems (cont’d)
  – Tiles are still being cached twice
    • There is still a duplication of effort
    • Managing expired tiles is still doubly difficult

25

Page 26: High Performance Geoserver Clusters.pdf

Arch #4 - Almost there

26

Page 27: High Performance Geoserver Clusters.pdf

Arch #4 - Almost there

• We’ve made a minor change in the storage of configuration data
• Configuration data is now being stored in one location and shared amongst GeoServers via NFS
• One of the GeoServer instances should be designated as the writer. Configuration changes will be handled by the writer. All other GeoServer instances will be readers

27

Page 28: High Performance Geoserver Clusters.pdf

Arch #4 - Almost there

• Rather than NFS*, configuration data can also be shared using rsync
• The web administration interface should be disabled for the reader instances (a short sketch of both steps follows this slide)
  – Add -DGEOSERVER_CONSOLE_DISABLED=true to the Tomcat startup command line
  – From WEB-INF/lib, delete the files matching gs-web*-.jar

28

* For those who know Linux well, we have chosen to mount the configuration data using autofs, not /etc/fstab.
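A minimal sketch of both steps on a reader instance. The writer hostname, data directory, and Tomcat paths are hypothetical:

#!/bin/bash
CATALINA_BASE=/opt/tomcat     # hypothetical Tomcat instance directory
WRITER=gsrvr01                # hypothetical hostname of the writer instance

# Pull the writer's configuration data onto this reader (rsync alternative to NFS)
rsync -av --delete ${WRITER}:/data/geoserver_data/ /data/geoserver_data/

# Disable the web admin console on this reader via the Tomcat startup options
echo 'export JAVA_OPTS="${JAVA_OPTS} -DGEOSERVER_CONSOLE_DISABLED=true"' >> ${CATALINA_BASE}/bin/setenv.sh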

Page 29: High Performance Geoserver Clusters.pdf

Arch #4 - Almost there

• Again, this architecture has problems
  – Tiles are still being cached twice
    • There is still a duplication of effort
    • Managing expired tiles is still doubly difficult
  – Configuration data is now shared. However, each time the configuration data is changed by the writer, each GeoServer instance must be instructed to re-read its configuration data
    • Luckily, this problem is solvable

29

Page 30: High Performance Geoserver Clusters.pdf

Arch #4 - Almost there

• Instructing GeoServers to reread their configuration data

30

#!/bin/bash

# Get the Geoserver server hostname prefix
GSRVR_HNAME_PREFIX=$(hostname | rev | cut -c 3- | rev)

# Loop over possible Geoserver server hostnames.
for i in {1..20}
do
  # Assemble the next possible Geoserver server hostname
  GSRVR_HNAME="${GSRVR_HNAME_PREFIX}$(printf %02d ${i})"

  # See if the machine exists on the network
  PING_TEST=$(ping -c 2 -W 2 ${GSRVR_HNAME} &> /dev/null ; echo $?)
  if [ "${PING_TEST}" -eq 0 ]
  then
    # The server exists so send the reload command
    curl -u admin:geoserver -X POST -d "reload_configuration=1" "http://${GSRVR_HNAME}:8080/geoserver/rest/reload"
    echo "Reloaded configurations on ${GSRVR_HNAME}"
  fi
done

Page 31: High Performance Geoserver Clusters.pdf

Arch #5 - Cooking with grease

31

Page 32: High Performance Geoserver Clusters.pdf

Arch #5 - Cooking with grease

• We’ve put a GeoWebCache instance in front of the GeoServer instances. It is now responsible for caching tiles. GeoServers are now just tile generators
• We have a single, unified cache
• GeoWebCache uses the load balancer to determine which GeoServer instance will generate the tile that it needs
• This architecture is now poised to exploit the maximum amount of tiling capacity from the GeoServer instances

32

Page 33: High Performance Geoserver Clusters.pdf

Arch #5 - Cooking with grease

• This architecture has two minor problems
  – GeoWebCache has its own configuration data that must be maintained. Furthermore, this configuration data is dependent upon the configuration of the GeoServer instances. Again, it looks like we are back in the position of having to make configuration changes twice
  – GeoWebCache caches must be cleared for GeoServer layers whose configuration data has changed
• Both of these problems are solvable

33

Page 34: High Performance Geoserver Clusters.pdf

Arch #5 - Cooking with grease

• Re-writing GeoWebCache config from GeoServer config

34

#!/bin/bash

# Set the start tag
NEW_LAYERS_XML=" <layers>\n"

for a_layer_xml in $(find ${GEOSERVER_DATA_DIR} -name '*.xml' -exec grep -l -i -E "<layer>|<layerGroup>" {} \;)
do
  # Get the layer name from the file
  layer_name=$(sed -n '/name/{s/.*<name>//;s/<\/name.*//;p;}' ${a_layer_xml})
  if [ "${layer_name}" = "" ]
  then
    continue
  fi

  # Get the workspace name from the path
  workspace_name=$(echo ${a_layer_xml} | grep -Po '(?<=(workspaces/))\w+(?=(/))')
  if [ "${workspace_name}" != "" ]
  then
    layer_name="${workspace_name}:${layer_name}"
  fi

Page 35: High Performance Geoserver Clusters.pdf

Arch #5 - Cooking with grease

35

  # Add the layer def
  NEW_LAYERS_XML="${NEW_LAYERS_XML}  <wmsLayer>\n"
  NEW_LAYERS_XML="${NEW_LAYERS_XML}    <name>${layer_name}</name>\n"
  NEW_LAYERS_XML="${NEW_LAYERS_XML}    <wmsUrl><string>http://$(hostname)/geoserver/wms</string></wmsUrl>\n"
  NEW_LAYERS_XML="${NEW_LAYERS_XML}  </wmsLayer>\n"
done

# Set the end tag
NEW_LAYERS_XML="${NEW_LAYERS_XML} </layers>\n"

# Put the newly generated layer definitions into the GeoWebCache configuration.
# Use ElementTree to write the new layers XML to the GeoWebCache configuration.
echo "Writing layers taken from Geoserver configuration to the GeoWebCache configuration"
${PYTHONHOME}/bin/python << EOF
import xml.etree.ElementTree as ET

# Read in the current GeoWebCache configuration
cgeowebcache = ET.parse( "${GEOWEBCACHE_CACHE_DIR}/geowebcache.xml" )

# Get the namespace from the GeoWebCache doc
namespace = cgeowebcache.getroot().tag.split( '}' )[0].strip( '{' )

# Register the namespace
ET.register_namespace( "", namespace )

Page 36: High Performance Geoserver Clusters.pdf

Arch #5 - Cooking with grease

36

# Build an element to contain the new layers XML
newlayers = ET.fromstring( "${NEW_LAYERS_XML}" )

# Get the old layers so they can be removed
oldlayers = cgeowebcache.find( "{" + namespace + "}layers" )

# Remove the old layers
cgeowebcache.getroot().remove( oldlayers )

# Add the new layers
cgeowebcache.getroot().append( newlayers )

# Write out the new GeoWebCache XML
cgeowebcache.write( "${GEOWEBCACHE_CACHE_DIR}/geowebcache.xml", encoding="utf-8", xml_declaration=True )
EOF

# Finally, tell GeoWebCache to reread its layers
echo "Forcing GeoWebCache to reread its configuration"
curl -s -u geowebcache:secured -d "reload_configuration=1" http://localhost:8080/geowebcache/rest/reload > /dev/null

Page 37: High Performance Geoserver Clusters.pdf

Arch #5 - Cooking with grease

37

# Clear the caches for any layers whose definitions have changed recently (within the last 2 days, per -mtime -2)
for a_layer_xml in $(find ${GEOSERVER_DATA_DIR} -name '*.xml' -mtime -2 -exec grep -l -i -E "<layer>|<layerGroup>" {} \;)
do
  # Get the layer name from the file
  layer_cache_directory_name=$(sed -n '/name/{s/.*<name>//;s/<\/name.*//;p;}' ${a_layer_xml})
  if [ "${layer_cache_directory_name}" = "" ]
  then
    continue
  fi

  # Get the workspace name from the path
  workspace_name=$(echo ${a_layer_xml} | grep -Po '(?<=(workspaces/))\w+(?=(/))')
  if [ "${workspace_name}" != "" ]
  then
    layer_cache_directory_name="${workspace_name}_${layer_cache_directory_name}"
  fi

  # Now, clear the cache associated with the layer
  echo "Clearing cache directory ${GEOWEBCACHE_CACHE_DIR}/${layer_cache_directory_name}"
  rm -rf ${GEOWEBCACHE_CACHE_DIR}/${layer_cache_directory_name}
done

Page 38: High Performance Geoserver Clusters.pdf

Other thoughts on caching

• Block Size
  – File system block size on GeoWebCache server(s) can be very important
  – The default block size for RedHat ext4 is 4K
  – Very often, raster tiles can be less than 4K in size. Sometimes less than 2K
  – If the file system block size is too large, then GeoWebCache can prematurely exhaust its disk space
  – Note, however, setting the block size too small can adversely affect performance
  – This is clearly a balancing act (a couple of illustrative commands follow this slide)

38
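A couple of illustrative commands, assuming a hypothetical dedicated partition (/dev/sdb1) for the tile cache. Reformatting destroys existing data, so the second command only applies to a fresh cache volume:

#!/bin/bash
# Inspect the current ext4 block size of the cache volume
tune2fs -l /dev/sdb1 | grep -i "block size"

# Format a fresh cache volume with 2K blocks instead of the 4K default
mkfs.ext4 -b 2048 /dev/sdb1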

Page 39: High Performance Geoserver Clusters.pdf

Scaling and clustering obstacles

• The capacity of the database server will circumscribe the capacity of the cluster
• Poor configuration
• Poor usage (e.g. reading dense layers at high scales)
• Network capacity

39

Page 40: High Performance Geoserver Clusters.pdf

A little benchmarking

• I did some very simple benchmarking in order to give you some idea of scaling
• I had access to three machines for benchmarking
  – (1) My desktop
    • Linux Mint 17 Qiana (Ubuntu-based)
    • AMD FX-8150 Eight-Core Processor 3.7 GHz
    • 16Gb RAM
  – (2) Out-dated laptop
    • CentOS 6.7 (Redhat-based)
    • Intel i7-2760QM 2.4 GHz
    • 8Gb RAM

40

Page 41: High Performance Geoserver Clusters.pdf

A little benchmarking

• Machines for benchmarking
  – (3) Really ancient laptop
    • CentOS 6.7 (Redhat-based)
    • Intel i5-2530M 2.5 GHz
    • 8Gb RAM
• GeoWebCache wasn’t used as part of the benchmark. The benchmark is meant to measure the amount of processing power being added (a plausible harness is sketched after this slide)

41
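The deck doesn’t show the benchmark harness itself; one plausible sketch uses ApacheBench against a WMS GetMap URL sent through the load balancer. The balancer hostname, layer, and bounding box below are hypothetical, not taken from the talk:

#!/bin/bash
# Build a WMS 1.1.1 GetMap URL
WMS="http://balancer/geoserver/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
WMS="${WMS}&LAYERS=myworkspace:parcels&STYLES=&SRS=EPSG:4326"
WMS="${WMS}&BBOX=-105.1,39.5,-104.6,40.0&WIDTH=256&HEIGHT=256&FORMAT=image/png"

# Fire 1000 GetMap requests, 10 at a time, and note the requests/second that
# ApacheBench reports for each cluster configuration
ab -n 1000 -c 10 "${WMS}"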

Page 42: High Performance Geoserver Clusters.pdf

A little benchmarking

• Benchmark configurations
  – 1 GeoServer
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 1 GeoServer/Tomcat container
  – 2 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 1 GeoServer/Tomcat container
    • Machine (3) running: 1 GeoServer/Tomcat container
  – 3 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 2 GeoServer/Tomcat containers
    • Machine (3) running: 1 GeoServer/Tomcat container

42

Page 43: High Performance Geoserver Clusters.pdf

A little benchmarking

• Benchmark configurations (cont’d)
  – 4 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 2 GeoServer/Tomcat containers
    • Machine (3) running: 2 GeoServer/Tomcat containers
  – 5 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 3 GeoServer/Tomcat containers
    • Machine (3) running: 2 GeoServer/Tomcat containers

43

Page 44: High Performance Geoserver Clusters.pdf

A little benchmarking - results

• The performance jump from 1 to 2 GeoServers is substantial
• The performance jump from 2 to 3 GeoServers is less substantial. This is likely due to hardware limitations
• The performance slumps from 4 to 5 GeoServers. At this point, we’ve probably overloaded the hardware. Remember, at this point, the outdated laptop has 3 GeoServer containers

44

Page 45: High Performance Geoserver Clusters.pdf

45

?

Page 46: High Performance Geoserver Clusters.pdf


FIND OUT MORE
Derek Kern
Principal Architect
Email: [email protected]

www.ubisense.net

Thank you!

46

