
High Performance GeoServer Clusters
Derek Kern - Ubisense, Inc

What this talk is about
• I want to walk you through the reasoning and process involved in scaling and clustering GeoServer / GeoWebCache
• Why would you begin scaling and clustering?
• While scaling and clustering, what do you need to consider?

What this talk is *not* about
• This talk is *not* about tuning an individual GeoServer instance
• Tuning is an important and complicated topic
• However, it has already been covered quite well at numerous previous FOSS4G gatherings
  – Google for the talk entitled "GeoServer on steroids"
• We will discuss GeoServer parameters only insofar as they are needed for scaling and clustering

We use GeoServer a lot


Why scaling and clustering?
• So, what events would initiate the scaling / clustering process?
  – Poor application performance
  – GeoServer machine resources being exhausted
  – Onboarding new users
  – Onboarding new feature layers / layer groups
  – Onboarding new spatial applications
  – Onboarding new spatial data sources

Why scaling and clustering?
• So, what events would initiate the scaling / clustering process? (cont'd)
  – Changing the scales at which layers are being rendered
  – Others
• At one customer site, we are using GeoServer / GeoWebCache nightly to construct SQLite tilestores that are distributed to offline users

Why scaling and clustering?
• These events are relevant insofar as they relate to the following factors affecting performance:
  – Number of users, i.e. tile requests (duh)
  – Hardware capacity (GeoServer and/or database)
  – Network capacity
  – Database structure
  – Feature density

Zoom in - Database structure
• The structure of tables, tablespaces, etc. can affect the rate at which data can be queried and rendered onto tiles
• Example #1: If a feature table is large enough, then a missing spatial index could dramatically slow the rendering process
• Example #2: In PostgreSQL, a table needing vacuuming might be enough to slow the rendering process
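The following is a minimal sketch of addressing both examples on a PostgreSQL/PostGIS backend. The database name 'gisdb', table 'assets', and geometry column 'geom' are assumptions for illustration, not names from the talk.

# Example #1: create the missing spatial index (GiST) on the feature table
psql -d gisdb -c "CREATE INDEX IF NOT EXISTS assets_geom_gist ON assets USING GIST (geom);"

# Example #2: reclaim dead rows and refresh planner statistics on a table needing vacuuming
psql -d gisdb -c "VACUUM (ANALYZE) assets;"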

Zoom in - Feature density
• The density of features per tile can greatly affect performance
• This offers strong incentive to be very careful when choosing the scales at which to display features
• I've witnessed poor choices bring an entire GeoServer cluster to its knees

Arch #1 - The Starting Point
This is the portrait of simplicity

Arch #1 - The Starting Point
• This is the starting point for many geographically-enabled web applications
• There is a single, generic application server (Django, Ruby on Rails, etc.)
• There is a single database server (PostgreSQL, MySQL, Oracle, etc.)
• There is a single GeoServer and it is using its bundled GeoWebCache for caching

Quick Note - Scaling In or Out?
• GeoServer can obviously be scaled across machines
• However, it can also be scaled within a machine, i.e. multiple GeoServer instances can run on different ports on a single machine
• Let's call the former "scaling out" and the latter "scaling in"
• Most of this talk is structured around scaling out, but it is equally applicable to scaling in
• "GeoServer on steroids" has some content on scaling in
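As a rough sketch of "scaling in", two Tomcat instances can run GeoServer on different ports of one machine, each with its own CATALINA_BASE. The paths, ports, and memory settings below are assumptions, and each instance's conf/server.xml must already have its Connector listening on the corresponding port.

#!/bin/bash
# Hypothetical sketch: start two GeoServer/Tomcat instances on one machine
export CATALINA_HOME=/opt/tomcat              # shared Tomcat binaries (assumed path)
export GEOSERVER_DATA_DIR=/data/geoserver     # GeoServer data directory (assumed path)

for port in 8080 8081
do
  # Each instance has its own CATALINA_BASE whose conf/server.xml listens on ${port}
  CATALINA_BASE=/opt/tomcat-gs-${port} \
  JAVA_OPTS="-Xms1g -Xmx2g -DGEOSERVER_DATA_DIR=${GEOSERVER_DATA_DIR}" \
  ${CATALINA_HOME}/bin/startup.sh
done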

Arch #2 - Obvious next step


Arch #2 - Obvious next step
• We've simply added another GeoServer instance
• This architecture, theoretically, has double the capacity of #1
• It is also a very easy step to make
• However, it has problems
  – How is traffic to be balanced between the two servers?
    • Traffic management must be dealt with, somehow, by the application server

Arch #2 - Obvious next step
• However, it has problems (cont'd)
  – Configuration data is not shared, so configuration changes must be made twice
    • *** Assuming the instances are serving the same layers
  – Tiles are being cached twice
    • Duplication of effort
    • Managing expired tiles is now doubly difficult
• Aside: Handling expired tiles
  – GeoRSS
  – Bulk layer cache clearing ★
  – Targeted layer cache clearing ★

★ Careful: GeoServer disk quota processing can cause problems when tiles are cleared by means of the OS, i.e. 'rm'. It should be disabled when clearing using 'rm'.
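As an alternative to removing tile directories with 'rm', targeted clearing can go through the caching REST API, which keeps the disk quota bookkeeping consistent. A minimal sketch against GeoServer's embedded GeoWebCache follows; the layer name, host, and credentials are assumptions.

# Truncate (clear) the cache for a single layer via the embedded GWC REST endpoint
curl -u admin:geoserver -X POST -H "Content-Type: text/xml" \
     -d "<truncateLayer><layerName>myws:roads</layerName></truncateLayer>" \
     "http://gsrvr01:8080/geoserver/gwc/rest/masstruncate"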

Aside - Tile lifecycle spectrum

High Entropy Content
  Real-time      (Δ: x < 4m)        – Outage status, SCADA status, Vehicle position
  Near real-time (Δ: 4m ≤ x < 3h)   – Device status, Trouble calls
  Near daily     (Δ: 3h ≤ x < 1d)   – Veggie mgmt, Construction status
  Daily          (Δ: 1d ≤ x < 3d)   – Customers, As built

Low Entropy Content
  Weekly         (Δ: 3d ≤ x < 10d)  – Legacy as built, Customers
  Monthly        (Δ: 10d ≤ x < 3mo) – Roads, Parcels
  Yearly         (Δ: 3mo ≤ x < 2y)  – Rail, City/county boundaries
  Never          (Δ: N/A)           – State boundaries, Water features

Aside - Tile lifecycle spectrum
• Most applications have content (being rendered onto tiles) that falls all over the lifecycle spectrum
• The appropriate GeoServer / GeoWebCache architecture will ultimately be driven by factors that include:
  – The lifecycle of the tiles being served
  – The amount of data being served
  – The amount of time needed to render tiles
  – The number of users requesting GeoServer / GeoWebCache tiles

Arch #2 - Obvious next step
• Example
  – If we balance the load by hits, then one GeoServer would serve the 'NHealth' layer and the other GeoServer would serve all other layers
  – Given the considerations already covered, would this be an equitable balance?
  – The answer: Not necessarily

  Layer Name         Hit%   Refresh Cycle
  NHealth             50%   Daily
  Account Status      18%   15 mins
  Devices              8%   Daily
  Actives              8%   1 hour
  Cables               6%   30 mins
  Problem Accounts     4%   Daily
  Transmit             3%   Daily
  Tickets              1%   20 mins
  Outage Nodes         1%   10 mins
  Region               1%   Daily
  Facility             1%   Daily
  Total               100%

Arch #2 - Obvious next step
• Example (cont'd)
  – The 'Account Status' layer is refreshed every 15 minutes, so, depending upon how many tiles are expired during each cycle, the tile cache might be less effective
  – In order to strike an equitable balance, the statistic we want is:
    • Total hits * Average tile output time
    • This statistic is, essentially, total tile output time (TTOT)
  – As it turns out, balancing layer TTOT is difficult if architecture is not considered
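As a small illustration of ranking layers by TTOT, suppose hit counts and average tile output times have been exported to a CSV. The file name and column layout below are hypothetical.

# layer_stats.csv (assumed format): layer,hits,avg_tile_output_ms
# TTOT = total hits * average tile output time
awk -F, 'NR > 1 { printf "%-20s %15.0f\n", $1, $2 * $3 }' layer_stats.csv | sort -k2 -rn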


Arch #3 - A Little Better


Arch #3 - A little better
• We've added a load balancer to mediate traffic between the GeoServer instances
• The application server will point clients to the load balancer when tiles are needed
• While the theoretical capacity hasn't changed, this architecture is better able to exploit that capacity

Arch #3 - A little better
• The load balancer can be hardware or software-based
  – Examples
    • mod_proxy_balancer
    • Nginx
    • BigIP
    • Barracuda
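Since the benchmark environment later in the talk uses Apache 2.4, here is a minimal mod_proxy_balancer sketch; the member hostnames and ports are assumptions, and mod_proxy, mod_proxy_http, mod_proxy_balancer, and mod_lbmethod_byrequests must be loaded.

# httpd.conf fragment (hypothetical hostnames gsrvr01/gsrvr02)
<Proxy "balancer://geoservers">
    BalancerMember "http://gsrvr01:8080/geoserver"
    BalancerMember "http://gsrvr02:8080/geoserver"
    ProxySet lbmethod=byrequests
</Proxy>
ProxyPass        "/geoserver" "balancer://geoservers"
ProxyPassReverse "/geoserver" "balancer://geoservers"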

• This architecture still has problems
  – Configuration data is still not shared, so configuration changes must be made twice
    • *** Assuming the instances are serving the same layers

Arch #3 - A little better
• This architecture still has problems (cont'd)
  – Tiles are still being cached twice
    • There is still a duplication of effort
    • Managing expired tiles is still doubly difficult

Arch #4 - Almost there


Arch #4 - Almost there
• We've made a minor change in the storage of configuration data
• Configuration data is now being stored in one location and shared amongst GeoServers via NFS
• One of the GeoServer instances should be designated as the writer. Configuration changes will be handled by the writer. All other GeoServer instances will be readers

Arch #4 - Almost there
• Rather than NFS*, configuration data can also be shared using rsync
• The web administration interface should be disabled for the reader instances
  – Add -DGEOSERVER_CONSOLE_DISABLED=true to the Tomcat startup command line
  – From WEB-INF/lib, delete the files matching gs-web*.jar
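A minimal sketch of locking down a reader instance, assuming a standard Tomcat layout (the paths are assumptions):

# Disable the web admin console on a reader, then remove the web UI jars
echo 'CATALINA_OPTS="${CATALINA_OPTS} -DGEOSERVER_CONSOLE_DISABLED=true"' >> ${CATALINA_HOME}/bin/setenv.sh
rm ${CATALINA_HOME}/webapps/geoserver/WEB-INF/lib/gs-web*.jar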


* For those who know Linux well: we have chosen to mount the configuration data using autofs, not /etc/fstab.
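For the rsync alternative, a minimal sketch run from the writer might look like the following; the reader hostname and the excluded directories are assumptions.

# Push the writer's configuration to a reader (run from the writer, e.g. via cron)
rsync -az --delete --exclude 'gwc/' --exclude 'logs/' \
      ${GEOSERVER_DATA_DIR}/ gsrvr02:${GEOSERVER_DATA_DIR}/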

Arch #4 - Almost there
• Again, this architecture has problems
  – Tiles are still being cached twice
    • There is still a duplication of effort
    • Managing expired tiles is still doubly difficult
  – Configuration data is now shared. However, each time the configuration data is changed by the writer, each GeoServer instance must be instructed to re-read its configuration data
    • Luckily, this problem is solvable

Arch #4 - Almost there
• Instructing GeoServers to reread their configuration data

#!/bin/bash

# Get the Geoserver server hostname prefix
GSRVR_HNAME_PREFIX=$(hostname | rev | cut -c 3- | rev)

# Loop over possible Geoserver server hostnames.
for i in {1..20}
do
  # Assemble the next possible Geoserver server hostname
  GSRVR_HNAME="${GSRVR_HNAME_PREFIX}$(printf %02d ${i})"

  # See if the machine exists on the network
  PING_TEST=$(ping -c 2 -W 2 ${GSRVR_HNAME} &> /dev/null ; echo $?)
  if [ "${PING_TEST}" -eq 0 ]
  then
    # The server exists so send the reload command
    curl -u admin:geoserver -X POST -d "reload_configuration=1" "http://${GSRVR_HNAME}:8080/geoserver/rest/reload"
    echo "Reloaded configurations on ${GSRVR_HNAME}"
  fi
done

Arch #5 - Cooking with grease


Arch #5 - Cooking with grease
• We've put a GeoWebCache instance in front of the GeoServer instances. It is now responsible for caching tiles. The GeoServers are now just tile generators
• We have a single, unified cache
• GeoWebCache uses the load balancer to determine which GeoServer instance will generate the tiles that it needs
• This architecture is now poised to exploit the maximum tiling capacity of the GeoServer instances
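To make the plumbing concrete: in geowebcache.xml, each layer's wmsUrl points at the load balancer rather than at any single GeoServer, so cache misses are spread across the tile generators. The layer name and balancer hostname below are illustrative assumptions.

<wmsLayer>
  <name>myws:roads</name>
  <!-- Backend is the load balancer, not an individual GeoServer instance -->
  <wmsUrl><string>http://lb.example.local/geoserver/wms</string></wmsUrl>
</wmsLayer>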

Arch #5 - Cooking with grease
• This architecture has two minor problems
  – GeoWebCache has its own configuration data that must be maintained. Furthermore, this configuration data is dependent upon the configuration of the GeoServer instances. Again, it looks like we are back in the position of having to make configuration changes twice
  – GeoWebCache caches must be cleared for GeoServer layers whose configuration data has changed
• Both of these problems are solvable

Arch #5 - Cooking with grease
• Re-writing the GeoWebCache config from the GeoServer config (the end of the script also clears the caches of recently changed layers)

#!/bin/bash

# Set the start tag
NEW_LAYERS_XML=" <layers>\n"

for a_layer_xml in $(find ${GEOSERVER_DATA_DIR} -name '*.xml' -exec grep -l -i -E "<layer>|<layerGroup>" {} \;)
do
  # Get the layer name from the file
  layer_name=$(sed -n '/name/{s/.*<name>//;s/<\/name.*//;p;}' ${a_layer_xml})
  if [ "${layer_name}" = "" ]
  then
    continue
  fi

  # Get the workspace name from the path
  workspace_name=$(echo ${a_layer_xml} | grep -Po '(?<=(workspaces/))\w+(?=(/))')
  if [ "${workspace_name}" != "" ]
  then
    layer_name="${workspace_name}:${layer_name}"
  fi

  # Add the layer def
  NEW_LAYERS_XML="${NEW_LAYERS_XML}  <wmsLayer>\n"
  NEW_LAYERS_XML="${NEW_LAYERS_XML}    <name>${layer_name}</name>\n"
  NEW_LAYERS_XML="${NEW_LAYERS_XML}    <wmsUrl><string>http://$(hostname)/geoserver/wms</string></wmsUrl>\n"
  NEW_LAYERS_XML="${NEW_LAYERS_XML}  </wmsLayer>\n"
done

# Set the end tag
NEW_LAYERS_XML="${NEW_LAYERS_XML} </layers>\n"

# Put the newly generated layer definitions into the GeoWebCache configuration.
# Use ElementTree to write the new layers XML to the geowebcache configuration.
echo "Writing layers taken from Geoserver configuration to the GeoWebCache configuration"
${PYTHONHOME}/bin/python << EOF
import xml.etree.ElementTree as ET

# Read in the current GeoWebCache configuration
cgeowebcache = ET.parse( "${GEOWEBCACHE_CACHE_DIR}/geowebcache.xml" )

# Get the namespace from the GeoWebCache doc
namespace = cgeowebcache.getroot().tag.split( '}' )[0].strip( '{' )

# Register the namespace
ET.register_namespace( "", namespace )

# Build an element to contain the new layers XML
newlayers = ET.fromstring( "${NEW_LAYERS_XML}" )

# Get the old layers so they can be removed
oldlayers = cgeowebcache.find( "{" + namespace + "}layers" )

# Remove the old layers
cgeowebcache.getroot().remove( oldlayers )

# Add the new layers
cgeowebcache.getroot().append( newlayers )

# Write out the new GeoWebCache XML
cgeowebcache.write( "${GEOWEBCACHE_CACHE_DIR}/geowebcache.xml", encoding="utf-8", xml_declaration=True )
EOF

# Finally, tell GeoWebCache to reread its layers
echo "Forcing GeoWebCache to reread its configuration"
curl -s -u geowebcache:secured -d "reload_configuration=1" http://localhost:8080/geowebcache/rest/reload > /dev/null

# Clear the caches for any layers whose definitions have changed recently (-mtime -2, i.e. within the last two days)
for a_layer_xml in $(find ${GEOSERVER_DATA_DIR} -name '*.xml' -mtime -2 -exec grep -l -i -E "<layer>|<layerGroup>" {} \;)
do
  # Get the layer name from the file
  layer_cache_directory_name=$(sed -n '/name/{s/.*<name>//;s/<\/name.*//;p;}' ${a_layer_xml})
  if [ "${layer_cache_directory_name}" = "" ]
  then
    continue
  fi

  # Get the workspace name from the path
  workspace_name=$(echo ${a_layer_xml} | grep -Po '(?<=(workspaces/))\w+(?=(/))')
  if [ "${workspace_name}" != "" ]
  then
    layer_cache_directory_name="${workspace_name}_${layer_cache_directory_name}"
  fi

  # Now, clear the cache associated with the layer
  echo "Clearing cache directory ${GEOWEBCACHE_CACHE_DIR}/${layer_cache_directory_name}"
  rm -rf ${GEOWEBCACHE_CACHE_DIR}/${layer_cache_directory_name}
done

Other thoughts on caching
• Block size
  – The file system block size on the GeoWebCache server(s) can be very important
  – The default block size for Red Hat ext4 is 4K
  – Very often, raster tiles can be less than 4K in size, sometimes less than 2K
  – If the file system block size is too large, then GeoWebCache can prematurely exhaust its disk space
  – Note, however, that setting the block size too small can adversely affect performance
  – This is clearly a balancing act
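A quick way to judge whether block size matters for a given cache is to sample the cached tile sizes and, if warranted, format a dedicated cache volume with a smaller block size. This is only a sketch; the device name is a placeholder.

# Average size of cached PNG tiles (GNU find)
find ${GEOWEBCACHE_CACHE_DIR} -name '*.png' -printf '%s\n' \
    | awk '{ sum += $1; n++ } END { if (n) printf "tiles: %d  avg bytes: %.0f\n", n, sum / n }'

# If tiles are mostly under 4K, an ext4 volume with a 2K block size may waste less space
mkfs.ext4 -b 2048 /dev/sdX1    # /dev/sdX1 is a placeholder for the cache volume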

Scaling and clustering obstacles
• The capacity of the database server will circumscribe the capacity of the cluster
• Poor configuration
• Poor usage (e.g. reading dense layers at high scales)
• Network capacity

A little benchmarking
• I did some very simple benchmarking in order to give you some idea of scaling
• I had access to three machines for benchmarking
  – (1) My desktop
    • Linux Mint 17 Qiana (Ubuntu-based)
    • AMD FX-8150 eight-core processor, 3.7 GHz
    • 16 GB RAM
  – (2) Outdated laptop
    • CentOS 6.7 (Red Hat-based)
    • Intel i7-2760QM, 2.4 GHz
    • 8 GB RAM

A little benchmarking
• Machines for benchmarking (cont'd)
  – (3) Really ancient laptop
    • CentOS 6.7 (Red Hat-based)
    • Intel i5-2530M, 2.5 GHz
    • 8 GB RAM
• GeoWebCache wasn't used as part of the benchmark. The benchmark is meant to measure the amount of processing power being added

A little benchmarking
• Benchmark configurations
  – 1 GeoServer
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 1 GeoServer/Tomcat container
  – 2 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 1 GeoServer/Tomcat container
    • Machine (3) running: 1 GeoServer/Tomcat container
  – 3 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 2 GeoServer/Tomcat containers
    • Machine (3) running: 1 GeoServer/Tomcat container

A little benchmarking
• Benchmark configurations (cont'd)
  – 4 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 2 GeoServer/Tomcat containers
    • Machine (3) running: 2 GeoServer/Tomcat containers
  – 5 GeoServers
    • Machine (1) running: PostgreSQL 9.5, Apache 2.4
    • Machine (2) running: 3 GeoServer/Tomcat containers
    • Machine (3) running: 2 GeoServer/Tomcat containers
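The talk does not say which load-generation tool was used; as a hedged sketch, something like ApacheBench against a WMS GetMap URL would exercise the cluster in a similar way. The hostname, layer, and bounding box below are made up for illustration.

# 1000 GetMap requests, 10 at a time, against the balanced endpoint (hypothetical values)
URL="http://gsrvr01:8080/geoserver/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=myws:roads&SRS=EPSG:4326&BBOX=-105.3,39.6,-104.6,40.1&WIDTH=256&HEIGHT=256&FORMAT=image/png"
ab -n 1000 -c 10 "${URL}"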

A little benchmarking - results
• The performance jump from 1 to 2 GeoServers is substantial
• The performance jump from 2 to 3 GeoServers is less substantial. This is likely due to hardware limitations
• Performance slumps from 4 to 5 GeoServers. At this point, we've probably overloaded the hardware. Remember, at this point, the outdated laptop is running 3 GeoServer containers


?


FIND OUT MORE
Derek Kern
Principal Architect
Email: derek.kern@ubisense.net

www.ubisense.net

Thank you!
