Alvaro Videla, Building a Distributed Data Ingestion System with RabbitMQ

Date post: 29-Nov-2014
Description:
In this talk I am going to show how to build a system that can ingest data produced in geographically separate locations (think AWS and its many regions) and replicate it to a central cluster where it can be further processed and analysed. I will present an example of how to build such a system by using RabbitMQ Federation to replicate data across AWS Regions and RabbitMQ's support for many protocols to produce and consume data. To help with scalability, I am going to show an interesting way to implement sharded queues with RabbitMQ by using the Consistent Hash Exchange.
Transcript

Alvaro Videla

• Developer Advocate at Pivotal / RabbitMQ

• Co-Author of RabbitMQ in Action

• Creator of the RabbitMQ Simulator

• Blogs about RabbitMQ Internals: http://videlalvaro.github.io/internals.html

• @old_sound - github.com/videlalvaro

About Me

Co-authored RabbitMQ in Action

http://bit.ly/rabbitmq

About this Talk

• Exploratory Talk

• A ‘what could be done’ talk instead of ‘this is how you do it’

Agenda

• Intro to RabbitMQ

• The Problem

• Solution Proposal

• Improvements

https://twitter.com/spacemanaki/status/514590885523505153

What is RabbitMQ

RabbitMQ

• Multi Protocol Messaging Server

• Open Source (MPL)

• Polyglot

• Written in Erlang/OTP

http://www.rabbitmq.com/community-plugins.html

Community Plugins

Polyglot

• Java

• node.js

• Erlang

• PHP

• Ruby

• .Net

• Haskell

• Python

Even COBOL!!!11

Some users of RabbitMQ

• Instagram

• Indeed.com

• Telefonica

• Mercado Libre

• NHS

• Mozilla

The New York Times on RabbitMQ

This architecture - Fabrik - has dozens of RabbitMQ instances spread across 6 AWS zones in Oregon and Dublin.

Upon launch today, the system autoscaled to ~500,000 users. Connection times remained flat at ~200ms.

http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2014-January/032943.html

http://www.rabbitmq.com/download.html

Unix - Mac - Windows

Messaging with RabbitMQ

A demo with the RabbitMQ Simulator

https://github.com/RabbitMQSimulator/RabbitMQSimulator

http://tryrabbitmq.com

RabbitMQ Simulator

The Problem

Distributed Application

[Diagram: four App instances]

Data Producer: Obtain a Channel

    %% Record definitions come from the Erlang client:
    %% -include_lib("amqp_client/include/amqp_client.hrl").
    {ok, Connection} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
    {ok, Channel} = amqp_connection:open_channel(Connection),

Data Producer: Declare an Exchange

    amqp_channel:call(Channel, #'exchange.declare'{exchange = <<"events">>, type = <<"direct">>}),

Data Producer: Publish a message

    %% delivery_mode = 2 marks the message as persistent.
    amqp_channel:cast(Channel,
                      #'basic.publish'{exchange = <<"events">>},
                      #amqp_msg{props = #'P_basic'{delivery_mode = 2},
                                payload = <<"Hello Federation">>}),

Data Consumer: Obtain a Channel

    {ok, Connection} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
    {ok, Channel} = amqp_connection:open_channel(Connection),

Data Consumer: Declare Queue and bind it

    amqp_channel:call(Channel, #'exchange.declare'{exchange = <<"events">>, type = <<"direct">>}),
    %% Exclusive, server-named queue; the broker returns the generated name.
    #'queue.declare_ok'{queue = Queue} = amqp_channel:call(Channel, #'queue.declare'{exclusive = true}),
    amqp_channel:call(Channel, #'queue.bind'{exchange = <<"events">>, queue = Queue}),

Data Consumer: Start a consumer

    %% no_ack = true: the broker does not wait for acknowledgements.
    amqp_channel:subscribe(Channel, #'basic.consume'{queue = Queue, no_ack = true}, self()),
    receive #'basic.consume_ok'{} -> ok end,
    loop(Channel).

Data Consumer: Process messages

    loop(Channel) ->
        receive
            {#'basic.deliver'{}, #amqp_msg{payload = Body}} ->
                io:format(" [x] ~p~n", [Body]),
                loop(Channel)
        end.

Ad-hoc solution

A process that replicates data to the remote server

Possible issues

• Remote server is offline

• Prevent unbounded local buffers

• Prevent message loss

• Prevent unnecessary message replication

• No need for those messages on remote server

• Messages that became stale

Can we do better?

RabbitMQ Federation

• Supports replication across different administrative domains

• Supports mix of Erlang and RabbitMQ versions

• Supports Network Partitions

• Specificity - not everything has to be federated

RabbitMQ Federation

• It’s a RabbitMQ Plugin

• Internally uses Queue and Exchange Decorators

• Managed using Parameters and Policies

Enabling the Plugin

rabbitmq-plugins enable rabbitmq_federation

rabbitmq-plugins enable rabbitmq_federation_management

Federating an Exchange

rabbitmqctl set_parameter federation-upstream my-upstream \
    '{"uri":"amqp://server-name","expires":3600000}'

rabbitmqctl set_policy --apply-to exchanges federate-me "^amq\." \
    '{"federation-upstream-set":"all"}'

Configuring Federation

Config Options

rabbitmqctl set_parameter federation-upstream \
    name 'json-object'

json-object:

    {
      "uri": "amqp://server-name/",
      "prefetch-count": 1000,
      "reconnect-delay": 1,
      "ack-mode": "on-confirm"
    }

http://www.rabbitmq.com/federation-reference.html

Prevent unbounded buffers

expires: N // ms
message-ttl: N // ms

Prevent message forwarding

max-hops: N

Speed vs No Message Loss

ack-mode: on-confirm
ack-mode: on-publish
ack-mode: no-ack
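
The effect of max-hops can be pictured with a deliberately simplified model: brokers sit in a chain of federation links, and a link forwards a message only while the number of links the message has already crossed is below max-hops. The region names and the `brokers_reached` helper are hypothetical, and the real plugin tracks hops per upstream through message headers, so treat this as a sketch of the idea rather than of the implementation.

```python
def brokers_reached(chain, max_hops):
    """Return the brokers that see a message published at chain[0].

    Simplified model (an assumption, not the plugin's code): each
    federation link forwards the message only while the number of
    links already crossed is below max_hops.
    """
    reached = [chain[0]]
    hops = 0
    for broker in chain[1:]:
        if hops >= max_hops:
            break  # the next link refuses to forward the message
        hops += 1
        reached.append(broker)
    return reached


# Hypothetical chain of federated brokers, one per region.
chain = ["us-east", "eu-west", "ap-south", "central"]

print(brokers_reached(chain, max_hops=1))
print(brokers_reached(chain, max_hops=2))
```

With max-hops set to 1 the message stops at the first downstream broker, which is what prevents it from being forwarded further than intended.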

AMQP URI:

amqp://user:pass@host:10000/vhost

http://www.rabbitmq.com/uri-spec.html

Config can be applied via

• CLI using rabbitmqctl

• HTTP API

• RabbitMQ Management Interface
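
As a rough illustration of the HTTP API option, the sketch below builds (but does not send) the request that would define the same upstream as the rabbitmqctl example. The management address `http://localhost:15672` and the `upstream_request` helper are assumptions made for this example, and authentication is left out. The path layout follows the management plugin's parameters endpoint, where the vhost `/` is URL-encoded.

```python
import json
import urllib.request
from urllib.parse import quote


def upstream_request(base_url, vhost, name, definition):
    """Build the PUT request that defines a federation upstream.

    The request is only constructed here; sending it would also need
    credentials for the management API.
    """
    path = "/api/parameters/federation-upstream/{}/{}".format(
        quote(vhost, safe=""),  # "/" becomes "%2F"
        quote(name, safe=""),
    )
    body = json.dumps({"value": definition}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = upstream_request(
    "http://localhost:15672",  # assumed management API address
    "/",
    "my-upstream",
    {"uri": "amqp://server-name", "expires": 3600000},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Keeping the definition as a plain dictionary means the same structure can feed either the CLI (as a JSON string) or the HTTP API.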

http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/

Some Queueing Theory

RabbitMQ BasicQos Simulator

Prevent Unbounded Buffers

https://www.rabbitmq.com/blog/2014/01/23/preventing-unbounded-buffers-with-rabbitmq/

λ = mean arrival rate
µ = mean service rate
If λ > µ, what happens? Queue length goes to infinity over time.
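
A small deterministic model makes the point concrete. It treats λ and µ as messages per second and tracks the backlog second by second. The rates below are made-up numbers, and a real queue has random arrivals, so this is only a sketch of the trend.

```python
def queue_length_over_time(arrival_rate, service_rate, seconds):
    """Backlog after each second in a simple fluid model.

    Each second, arrival_rate messages come in and up to service_rate
    messages are processed; the backlog never drops below zero.
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history


# λ > µ: the backlog grows by (λ - µ) every second, without bound.
overloaded = queue_length_over_time(arrival_rate=120, service_rate=100, seconds=5)

# λ < µ: consumers keep up and the queue stays empty.
keeping_up = queue_length_over_time(arrival_rate=80, service_rate=100, seconds=5)

print(overloaded)
print(keeping_up)
```

This is why limits such as expires and message-ttl matter: when producers outpace consumers, nothing else stops the buffer from growing.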

Recommended Reading

Performance Modeling and Design of Computer Systems: Queueing Theory in Action

Scaling the Setup

The Problem

• Queue contents live on the node where the Queue was declared

• A cluster can access the queue from every connected node

• Each Queue is an Erlang process (tied to one core)

• Adding more nodes doesn’t really help

Enter Sharded Queues

Pieces of the Puzzle

• modulo hash exchange (consistent hash works as well)

• good ol’ queues

Sharded Queues

• Declare Queues with name: nodename.queuename.index

• Bind the queues to a partitioner exchange (for example, a consistent hash exchange)

• Transparent to the consumer (virtual queue name)
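
The naming and partitioning scheme can be sketched without a broker. The example below generates shard names in the nodename.queuename.index form and routes each key with a modulo hash, the simpler of the two partitioners mentioned earlier. The node name, the routing keys and the use of CRC32 are assumptions for illustration; the actual exchange plugins use their own hashing.

```python
import zlib


def shard_queue_names(node, queue, shards):
    """Names of the per-node shard queues: nodename.queuename.index."""
    return ["{}.{}.{}".format(node, queue, i) for i in range(shards)]


def pick_shard(routing_key, queues):
    """Modulo-hash partitioning: stable hash of the key, modulo shard count.

    CRC32 is used only because it is deterministic across runs; it is
    not the hash the RabbitMQ plugins use.
    """
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return queues[digest % len(queues)]


queues = shard_queue_names("rabbit@node1", "events", 4)  # hypothetical node
for key in ["user-1", "user-2", "user-3"]:
    print(key, "->", pick_shard(key, queues))
```

A plain modulo hash remaps most keys when the number of shards changes, which is the trade-off a consistent hash exchange is meant to soften.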

We need more scale!

Federated Queues

• Load-balance messages across federated queues

• Only moves messages when needed

Federating a Queue

rabbitmqctl set_parameter federation-upstream my-upstream \
    '{"uri":"amqp://server-name","expires":3600000}'

rabbitmqctl set_policy --apply-to queues federate-me "^images\." \
    '{"federation-upstream-set":"all"}'
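
The policy pattern is a regular expression matched against queue names, so only queues whose names start with "images." become federated. The sketch below checks a few hypothetical queue names against the same pattern using Python's `re` module, which behaves the same way as the broker's regex engine for a pattern this simple.

```python
import re


def queues_matching_policy(pattern, queue_names):
    """Return the queue names that a policy pattern would select."""
    compiled = re.compile(pattern)
    # search() finds the pattern anywhere in the name; the "^" in the
    # pattern is what anchors it to the start.
    return [name for name in queue_names if compiled.search(name)]


# Hypothetical queue names, for illustration only.
names = ["images.thumbnails", "images.originals", "logs.images", "events"]
federated = queues_matching_policy(r"^images\.", names)
print(federated)
```

This is the "specificity" mentioned earlier: queues outside the pattern stay local and are never federated.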

With RabbitMQ we can

• Ingest data using various protocols: AMQP, MQTT and STOMP

• Distribute that data globally using Federation

• Scale up using Sharding

• Load balance consumers with Federated Queues

Credits

world map: wikipedia.org

federation diagrams: rabbitmq.com

Questions?

Thanks

Alvaro Videla - @old_sound