Date post: 21-Jul-2018
Maintaining Spatial Data Infrastructures (SDIs) using distributed task queues Paolo Corti and Ben Lewis Harvard Center for Geographic Analysis 2017 FOSS4G Boston
Transcript
Page 1: Maintaining Spatial Data Infrastructures (SDIs) 2017 ... · Maintaining Spatial Data Infrastructures (SDIs) ... //tests4geeks.com/python-celery-rabbitmq-tutorial/ ... by MapProxy.

Maintaining Spatial Data Infrastructures (SDIs) using distributed task queues

Paolo Corti and Ben Lewis
Harvard Center for Geographic Analysis

2017 FOSS4G Boston

Page 2

Background

Harvard Center for Geographic Analysis

• WorldMap http://worldmap.harvard.edu
  – Biggest GeoNode instance on the planet
  – https://github.com/cga-harvard/cga-worldmap

• HHypermap http://hh.worldmap.harvard.edu
  – Map service registry
  – https://github.com/cga-harvard/HHypermap

Page 3

Note

Billion Object Platform (BOP)

https://github.com/cga-harvard/hhypermap-bop

Page 4

Demo of WorldMap / HHypermap

Page 6

The need for an asynchronous processor

In WorldMap and HHypermap, some user-triggered operations are too time-consuming to be handled within a web request:

● Harvest the metadata of a service and its layers
● Synchronize the metadata of a new or updated layer to the search engine
● Feed a gazetteer when a new layer is uploaded or updated
● Upload a spatial dataset to the server
● Create a new layer using a table join

Page 7

HTTP request/response cycle must be fast

● In web applications the HTTP request/response cycle can be synchronous as long as the interactions between the client and the server are very quick
● Unfortunately, there are cases when the cycle becomes slower
● In these situations the best practice for a web application is to process these tasks asynchronously using a task queue
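
As a rough illustration of this practice, the sketch below shows a request handler that only enqueues the slow job and returns straight away. It uses Python's standard library as a stand-in for a real broker; the handler name, the message fields and the 202 status are assumptions made for the example, not code from WorldMap or HHypermap.

```python
import queue
import uuid

# In-process stand-in for a broker; a real deployment would use RabbitMQ or similar.
task_queue = queue.Queue()


def handle_upload_request(layer_name):
    """Hypothetical web handler: enqueue the slow job and return at once."""
    task_id = str(uuid.uuid4())
    task_queue.put({"id": task_id, "task": "process_upload", "layer": layer_name})
    # 202 Accepted: the work has been queued, it has not been done yet.
    return {"status": 202, "task_id": task_id}


response = handle_upload_request("roads")
print(response["status"], task_queue.qsize())
```

The client gets a task id back immediately and can ask for the outcome later, while a separate worker process picks the message up from the queue.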

Page 8

Task Queues

Asynchronous processing in a web application can be delegated to a task queue, which is a system for parallel execution of tasks in a non-blocking fashion

Page 9

Asynchronous processing model

Page 10

Asynchronous processing model

● The asynchronous processing model is composed of services that produce processing tasks (producers) and services that consume and process these tasks (consumers)

● A message queue is a broker that facilitates message passing by providing a protocol or interface that other services can access. Work can be distributed across threads or machines

● In the context of a web application, the producer is the client application that creates messages based on user interaction. The consumer is a daemon process that consumes the messages and runs the required process
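
The model described above can be sketched in a few lines of standard-library Python. This is only an in-process analogy: `queue.Queue` plays the broker, two threads play the consumer daemons, and the message fields and the upper-casing "work" are placeholders invented for the example.

```python
import queue
import threading

broker = queue.Queue()           # stand-in for the message queue
results = []                     # filled in by the consumers
results_lock = threading.Lock()


def consumer():
    """Daemon-style worker: take messages from the broker and process them."""
    while True:
        message = broker.get()
        if message is None:      # sentinel: shut this worker down
            broker.task_done()
            break
        with results_lock:
            results.append(message["layer"].upper())  # placeholder "processing"
        broker.task_done()


workers = [threading.Thread(target=consumer) for _ in range(2)]
for w in workers:
    w.start()

# Producer: the web application creates one message per user action.
for layer in ["roads", "rivers", "parcels"]:
    broker.put({"task": "harvest", "layer": layer})

for _ in workers:
    broker.put(None)             # one sentinel per worker
broker.join()                    # wait until every message has been processed
for w in workers:
    w.join()

print(sorted(results))
```

In the Celery/RabbitMQ setup discussed in this talk, RabbitMQ would take the place of `broker` and Celery workers would take the place of the threads.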

Page 11

Glossary

● Task Queue: a system for parallel execution of tasks in a non-blocking fashion
● Broker or Message Queue: provides a protocol or interface for exchanging messages between different services and applications
● Producer: the code that places the tasks to be executed later in the broker
● Consumer or Worker: takes tasks from the broker and processes them
● Exchange: takes a message from a producer and routes it to zero or more queues (message routing)
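
To make the "zero or more queues" part of the Exchange entry concrete, here is a toy routing sketch. It is loosely modelled on a direct exchange, where a message goes to every queue bound to its routing key; the class, the queue names and the keys are all hypothetical and this is not how any real broker is implemented.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy exchange: routes a message to every queue bound to its routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)   # routing key -> list of queues

    def bind(self, routing_key, q):
        self.bindings[routing_key].append(q)

    def publish(self, routing_key, message):
        targets = self.bindings.get(routing_key, [])
        for q in targets:
            q.append(message)
        return len(targets)                 # number of queues the message reached


harvest_q, index_q, audit_q = deque(), deque(), deque()
exchange = DirectExchange()
exchange.bind("harvest", harvest_q)
exchange.bind("index", index_q)
exchange.bind("index", audit_q)

one = exchange.publish("harvest", "service 12")   # reaches one queue
two = exchange.publish("index", "layer 7")        # reaches two queues
none = exchange.publish("unknown", "ignored")     # reaches no queue
print(one, two, none)
```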

Tasks must be consumed faster than they are produced. If not, add more workers
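
A back-of-the-envelope way to apply this rule, assuming a steady arrival rate and a roughly constant processing time per task (both simplifications; the function names and numbers are made up for the example):

```python
import math


def workers_needed(arrival_rate, seconds_per_task):
    """Smallest worker count whose combined throughput keeps pace with arrivals.

    arrival_rate: tasks produced per second
    seconds_per_task: average time one worker spends on one task
    """
    return max(1, math.ceil(arrival_rate * seconds_per_task))


def backlog_growth(arrival_rate, seconds_per_task, workers):
    """Tasks added to the queue per second; zero or less means workers keep up."""
    return arrival_rate - workers / seconds_per_task


# 5 tasks arrive per second and each takes 2 seconds to process.
needed = workers_needed(5, 2)
growth_with_4 = backlog_growth(5, 2, 4)
growth_with_needed = backlog_growth(5, 2, needed)
print(needed, growth_with_4, growth_with_needed)
```

With fewer workers than `needed` the backlog grows without bound; in practice some headroom above the minimum is advisable so the queue can drain after a burst.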

Page 12

Use cases for task queues

● in web applications some processes take too much time and must be run asynchronously

● heterogeneous applications/services in a given system architecture need an easy way to communicate reliably with each other

● periodic operations (vs crontab)
● a way of parallelizing tasks across multiple processors
● monitoring processes and analyzing failing tasks (and executing them again)
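
The last use case, re-executing failing tasks, can be sketched with the standard library as follows. This is a simplified illustration rather than how any particular task queue implements retries; the function names, the attempt limit and the simulated failures are assumptions.

```python
import queue


def run_with_retries(tasks, max_attempts=3):
    """Run each (name, func) task; re-queue failures until max_attempts is reached."""
    pending = queue.Queue()
    for name, func in tasks:
        pending.put((name, func, 0))
    done, failed = {}, {}
    while not pending.empty():
        name, func, attempts = pending.get()
        try:
            done[name] = func()
        except Exception as exc:
            if attempts + 1 < max_attempts:
                pending.put((name, func, attempts + 1))   # try again later
            else:
                failed[name] = repr(exc)                  # keep the error for analysis
    return done, failed


calls = {"count": 0}


def flaky_harvest():
    """Simulated task that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("remote service not responding")
    return "harvested"


def always_broken():
    """Simulated task that never succeeds."""
    raise ValueError("bad layer definition")


done, failed = run_with_retries([("harvest", flaky_harvest), ("broken", always_broken)])
print(done, sorted(failed))
```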

Page 13

Typical use cases for a task queue in a web application

● Thumbnail generation
● Sending bulk email
● Fetching large amounts of data from APIs
● Performing time-intensive calculations
● Expensive queries
● Search engine index synchronization
● Interaction with another application/service
● Replacing cron jobs (backups, maintenance, etc.)

Page 14

Typical use cases for a task queue in a GIS Portal/SDI

● Upload a shapefile to the server (GeoNode)
● Thumbnail generation for layers and maps (GeoNode)
● OGC service harvesting (Harvard Hypermap)
● Geoprocessing operations
● Geospatial data maintenance

Page 15

Producer, broker and consumer architecture

[Figure: diagrams of producer, broker and consumer arrangements]

Page 16

Message broker implementations

Most of them are open source!

● RabbitMQ (AMQP, STOMP, JMS)
● Apache ActiveMQ (STOMP, JMS)
● Amazon Simple Queue Service (JMS)
● Apache Kafka

Several standard protocols:

● AMQP, STOMP, JMS, MSMQ (Microsoft .NET)

Page 17

Task (job) queue implementations

● Celery (RabbitMQ, Redis, Amazon SQS, Zookeeper)
● Redis Queue (Redis)
● Resque (Redis)
● Kue (Redis)

And many others!

Page 18

Celery

● asynchronous task queue based on distributed message passing
● focused on real-time operation, but supports scheduling as well
● the execution units, called tasks, are executed concurrently on one or more worker servers
● it supports many message brokers (RabbitMQ, Redis, MongoDB, CouchDB, ...)
● written in Python, but it can interoperate with other languages
● great integration with Django!
● great monitoring tools (Flower, django-celery-results)

Page 19

RabbitMQ

● RabbitMQ is a message broker: it accepts and forwards messages

● most widely deployed open source broker (35k+ deployments)

● supports many message protocols
● supported by many operating systems and languages
● written in Erlang

Page 20

Architecture of Celery/RabbitMQ

https://tests4geeks.com/python-celery-rabbitmq-tutorial/

Page 21

A real use case: Harvard Hypermap

HHypermap (Harvard Hypermap) Registry is a platform that manages the harvesting and orchestration of OWS, Esri REST, and other types of map services, and maintains uptime statistics for services and layers. Where possible, layers are cached by MapProxy.

HHypermap provides thousands of remote layers to WorldMap users

Page 22

Harvard Hypermap / WorldMap Architecture

Page 23

HHypermap interface

Page 24

Need for a task queue

SLOW!!!

Page 25

Producer

The code that places tasks in the broker to be executed later

Page 26

Celery messages

Page 27

Consumer

Takes tasks from the broker and processes them in a worker

Page 28

Replacing cron jobs

Page 29

Replacing cron jobs
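
As a hedged sketch of what a cron replacement can look like with Celery: its periodic scheduler (celery beat) reads a `beat_schedule` mapping whose entries name a task and how often to run it. The fragment below builds such a mapping as a plain dictionary; the entry names and task paths are hypothetical and are not taken from the HHypermap code base.

```python
from datetime import timedelta

# Candidate value for Celery's `beat_schedule` setting.
# Entry names and task paths are hypothetical, chosen only for illustration.
beat_schedule = {
    "check-all-services-hourly": {
        "task": "hypermap.tasks.check_all_services",   # hypothetical task path
        "schedule": timedelta(hours=1),                # run every hour
    },
    "sync-search-index-daily": {
        "task": "hypermap.tasks.sync_search_index",    # hypothetical task path
        "schedule": timedelta(days=1),                 # run every 24 hours
    },
}

for name, entry in beat_schedule.items():
    print(name, entry["task"], entry["schedule"].total_seconds())
```

With a Celery application object `app`, the mapping would be assigned to `app.conf.beat_schedule` and the scheduler started with `celery -A <project> beat`. For runs at fixed clock times, Celery also provides `celery.schedules.crontab`.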

Page 30

Workers and threads with htop

Page 31

Monitoring

Page 32

Monitoring a task

Page 33

Thanks!

Question and Answer

