Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Post on 15-Jan-2015

description

Eric will be presenting on SimpleReach's use of message architectures and why they are an important part of a distributed system stack. They are often overlooked because the prevailing sentiment is that the storage and processing engines are the most important aspects of the system. Without the highways, the data won’t be able to get to its destination.

transcript

Eric Lubow

@elubow

elubow@simplereach.com

Message Architectures in Distributed Systems

Message Architectures in Distributed Systems Eric Lubow @elubow

Overview

• SimpleReach

• Why is messaging important

• Goals

• Explanations

• Questions

Personal Vanity

• CTO of SimpleReach

• Co-author of Practical Cassandra

• Skydiver, Mixed Martial Artist, Motorcyclist, Dog dad, NY Giants fan

• IronMatt Foundation for Pediatric Brain Tumors (ironmatt.org)

SimpleReach

• Millions of URLs per day

• Over 3.75 billion page views per month

• 7b events per day (~80k events/second)

• Auto-scale 175-190 machines depending on traffic

• Built a predictive measurement algorithm for the social web
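
As a quick sanity check on the "7b events per day (~80k events/second)" figure, the average rate can be worked out directly. This is only arithmetic on the number quoted on the slide; it says nothing about peak traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: float) -> float:
    """Average rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


rate = events_per_second(7_000_000_000)
print(f"{rate:,.0f} events/second")  # prints "81,019 events/second"
```

The average comes out slightly above 80k per second, which matches the rounded figure on the slide.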

Why is Messaging Important?

• Most large scale systems discussions only talk about storage

• Direct high volumes of data around your infrastructure

• Control flow of data through your infrastructure

• Decouple important systems

• Scalability, Elasticity, Deliverability, and Redundancy

• Buffering and Asynchronous communication

The database is NOT a transport layer

Data Flow

[Diagram: how the App handles one request]

• incoming request

• sync: persist data

• send response

• async: queue message
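
The request path on this slide (persist synchronously, respond, queue a message asynchronously) can be sketched roughly as follows. This is a minimal illustration, not SimpleReach's code: the `database` dict, the `outbound` buffer, and the `handle_request` function are all hypothetical stand-ins, and the background worker only records messages where a real one would publish to the message queue.

```python
import queue
import threading

database = {}             # stand-in for the data store (hypothetical)
outbound = queue.Queue()  # in-process buffer in front of the message queue
published = []            # records what the background worker "published"


def publisher_loop():
    """Drain the buffer in the background, off the request path."""
    while True:
        message = outbound.get()
        try:
            if message is None:  # sentinel: stop the worker
                return
            # A real worker would publish to the message queue here.
            published.append(message)
        finally:
            outbound.task_done()


def handle_request(key, value):
    database[key] = value                       # sync: persist data
    outbound.put({"key": key, "value": value})  # async: hand off and move on
    return {"status": "ok"}                     # respond without waiting for the publish


worker = threading.Thread(target=publisher_loop, daemon=True)
worker.start()

response = handle_request("url:1", {"page_views": 1})
outbound.join()  # demo only: wait until the worker has drained the buffer
print(response, published)
```

The point of the split is that the caller only waits on the write it needs; everything downstream learns about the event through the queue rather than by polling the database.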

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow Patterns

• Enrichment/In-stream Modification Schemes

• Monitoring and Instrumentation

Messaging Systems

• RabbitMQ

• ZeroMQ

• Kafka

• Amazon SQS

• NSQ

• ActiveMQ

• Resque

• Custom

What Did SimpleReach Choose?

NSQ

• Distributed and decentralized topology

• At least once delivery guaranteed

• Multicast style message routing

• Simple to configure and deploy

• Allow for maintenance windows with no downtime

• Ephemeral channels for testing

• Channel sampling

github.com/bitly/nsq

Topics and Channels

• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)

• a channel is an independent queue for a topic (a topic can have multiple channels)

• consumers discover producers by querying nsqlookupd (a discovery service for topics)

• topics and channels are created at runtime (just start publishing/subscribing)

[Diagram: an nsqd with its Topics, Channels, and Consumers (on separate hosts). Labels: “event”, “metrics”, “enrichment”, “writer”, AAABBB]
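
The topic and channel semantics on this slide can be modeled in a few lines. This is a toy in-memory sketch, not NSQ itself: the `Topic` and `Channel` classes are hypothetical, round-robin delivery is a simplification of how nsqd spreads messages across a channel's consumers, and treating "event" as the topic with "metrics" and "writer" as channels is an assumption made for illustration.

```python
class Channel:
    """An independent queue for a topic; its consumers share the messages."""

    def __init__(self):
        self.consumers = []
        self._next = 0

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def deliver(self, message):
        if not self.consumers:
            return  # a real channel would hold the message until a consumer connects
        # Simplification: round-robin, so each message reaches exactly one consumer.
        consumer = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        consumer(message)


class Topic:
    """A distinct stream of messages; every channel gets a copy of each one."""

    def __init__(self, name):
        self.name = name
        self.channels = {}

    def channel(self, name):
        # Channels are created at runtime, on first use.
        return self.channels.setdefault(name, Channel())

    def publish(self, message):
        for channel in self.channels.values():
            channel.deliver(message)


topic = Topic("event")
metrics, writer_a, writer_b = [], [], []
topic.channel("metrics").subscribe(metrics.append)
topic.channel("writer").subscribe(writer_a.append)
topic.channel("writer").subscribe(writer_b.append)

for message in ["A", "B", "C", "D"]:
    topic.publish(message)

print(metrics, writer_a, writer_b)
```

Here the "metrics" channel sees all four messages, while the two consumers on the "writer" channel split them between themselves. That combination is what the deck later calls multicast-style routing: copy per channel, load-balance within a channel.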

Everyone Speaks The Same Language

http:// + {“content-type”: “application/json”}
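
A minimal sketch of what "HTTP + JSON" looks like for a producer, using nsqd's HTTP publish endpoint (`/pub?topic=...`, port 4151 by default). The address, the topic name "event", and the event fields are assumptions for illustration. The snippet only builds the request, so it runs without an nsqd present.

```python
import json
import urllib.parse
import urllib.request


def build_publish_request(nsqd_http_address, topic, event):
    """Build (but do not send) an HTTP POST that publishes one JSON message."""
    query = urllib.parse.urlencode({"topic": topic})
    return urllib.request.Request(
        f"http://{nsqd_http_address}/pub?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_publish_request(
    "127.0.0.1:4151",  # assumed local nsqd
    "event",           # hypothetical topic name
    {"url": "http://example.com/story", "type": "pageview"},
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running nsqd.
```

Because the interface is plain HTTP with a JSON body, any language with an HTTP client can act as a producer without a dedicated NSQ library.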

Goals

• Consistent interfaces between systems

NSQ Tools

• nsqadmin - web interface to administer and introspect an NSQ cluster at runtime (and empty, pause, or delete topics/channels)

• nsq_to_http - utility that helps transport an aggregate stream over HTTP

• nsq_to_file - utility that safely persists an aggregated stream to disk

• nsq_stat - iostat-like utility for a topic/channel

• nsq_tail - tail-like utility for a topic/channel

Right Tool For The Job

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

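How Does It Work?

[Diagram: three API hosts, each paired with an nsqd; two nsqlookupd instances; two consumers]

• PUBLISH: each API publishes messages to its nsqd

• REGISTER: each nsqd registers with the nsqlookupd instances

• DISCOVER: consumers ask nsqlookupd which nsqd instances produce a topic

• SUBSCRIBE: consumers subscribe to those nsqd instances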

The Schrute of the Problem

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

Simple Deployment & Automation

• Chef cookbook - github.com/simplereach/chef-nsq

• Written in Go

• Easily distributable binaries

• Deploy lookup nodes

• nsqd instances installed locally

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

Runtime Discovery

[Diagram: a consumer sending HTTP requests to two nsqlookupd instances]

➊ regularly poll for topic producers

➋ connect to all producers
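
The two discovery steps can be sketched as: ask every nsqlookupd for the producers of a topic, then take the union of the answers as the set of nsqd instances to connect to. The sample payloads below are assumptions modeled on the JSON returned by nsqlookupd's `/lookup?topic=...` endpoint (a `producers` list with `broadcast_address` and `tcp_port`, optionally wrapped in a `data` object by older releases). The helper names are hypothetical, and a real consumer would fetch the payloads over HTTP and repeat the poll on a timer.

```python
import json


def producers_from_lookup(payload):
    """Extract (address, tcp_port) pairs from one lookup response."""
    doc = json.loads(payload)
    doc = doc.get("data", doc)  # assumed: older releases wrap the body in "data"
    return {
        (producer["broadcast_address"], producer["tcp_port"])
        for producer in doc.get("producers", [])
    }


def discover(lookup_payloads):
    """Union the producers reported by every nsqlookupd that answered."""
    addresses = set()
    for payload in lookup_payloads:
        addresses |= producers_from_lookup(payload)
    return sorted(addresses)


# Sample responses from two lookup nodes (made up for this sketch).
LOOKUPD_A = json.dumps({
    "channels": ["writer"],
    "producers": [{"broadcast_address": "10.0.0.1", "tcp_port": 4150}],
})
LOOKUPD_B = json.dumps({
    "status_code": 200,
    "status_txt": "OK",
    "data": {
        "channels": ["writer"],
        "producers": [
            {"broadcast_address": "10.0.0.1", "tcp_port": 4150},
            {"broadcast_address": "10.0.0.2", "tcp_port": 4150},
        ],
    },
})

targets = discover([LOOKUPD_A, LOOKUPD_B])
print(targets)
```

Taking the union means a consumer still finds every producer when one lookup node is stale or unreachable, which is why clients need so little knowledge of the topology.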

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

Path of a Packet

[Diagram labels: Internet; EC; Internal API; API; Fire Hose; SC; Queue; Consumers; Solr; C*; Mongo; Redis; Vertica]

Controlled Data Flow

[Diagram labels: Social Event Collector; Social Data; Raw Data; Processed Data; Calculate Score; Batch & Write (twice); Write; NSQ; Broadcast; NSQ]

Broadcast Importance for Polyglottany

[Diagram labels: NSQ; Broadcast; Aggregator; Calculator; Mongo Writer; Redis Writer; Cassandra Writer; Solr Writer; Vertica Writer]

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow

What Is Enrichment?

A mechanism to add value to a message to enhance processing in your system

How Do We Enrich

[Diagram labels: Raw Event; Enriched Event; NSQ Broadcast; Consumer A; Consumer B; Consumer C]
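
A rough sketch of the enrichment step: take a raw event, add a derived field once, and broadcast the enriched event so that every downstream consumer gets the added value without recomputing it. The event shape, the `domain` field, and the `enrich` and `broadcast` helpers are all hypothetical; the deck does not say which fields SimpleReach adds.

```python
import json
from urllib.parse import urlparse


def enrich(raw_message):
    """Return a copy of the raw event with a derived field added in-stream."""
    event = json.loads(raw_message)
    event["domain"] = urlparse(event["url"]).netloc  # hypothetical enrichment
    return json.dumps(event).encode("utf-8")


def broadcast(message, consumers):
    """Stand-in for republishing to a topic that several channels read."""
    for consume in consumers:
        consume(message)


seen_a, seen_b, seen_c = [], [], []
raw = json.dumps({"url": "http://example.com/story", "type": "pageview"}).encode("utf-8")
broadcast(enrich(raw), [seen_a.append, seen_b.append, seen_c.append])

print(json.loads(seen_a[0]))
```

Doing the work once, upstream of the broadcast, keeps Consumers A, B, and C consistent with each other and keeps the lookup cost out of each consumer.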

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow

• Enrichment

Monitoring / Instrumentation

• Comes with statsd support built-in

• Statsd talks to both Graphite and nsqadmin

• Nsqadmin comes with graphs for message processing stats

• Nagios plugins available for monitoring topic/channel depth

• Average end to end latency calculations are done on a per-channel basis
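
A topic/channel depth check in the style of a Nagios plugin might look roughly like this. It is a sketch under assumptions, not one of the plugins the slide refers to: the stats document is modeled on the JSON that nsqd's `/stats?format=json` endpoint returns (`topics`, `topic_name`, `channels`, `channel_name`, `depth`), the sample data and thresholds are made up, and a real check would fetch the stats over HTTP and exit with the returned code.

```python
import json

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # conventional Nagios plugin exit codes


def check_channel_depth(stats_json, topic, channel, warn, crit):
    """Return (exit_code, message) for one channel's backlog."""
    stats = json.loads(stats_json)
    stats = stats.get("data", stats)  # assumed: older releases wrap the body in "data"
    for topic_stats in stats.get("topics", []):
        if topic_stats.get("topic_name") != topic:
            continue
        for channel_stats in topic_stats.get("channels", []):
            if channel_stats.get("channel_name") != channel:
                continue
            depth = channel_stats.get("depth", 0)
            if depth >= crit:
                return CRITICAL, f"CRITICAL: {topic}/{channel} depth={depth}"
            if depth >= warn:
                return WARNING, f"WARNING: {topic}/{channel} depth={depth}"
            return OK, f"OK: {topic}/{channel} depth={depth}"
    return UNKNOWN, f"UNKNOWN: {topic}/{channel} not found"


# Made-up stats for one topic with two channels.
SAMPLE_STATS = json.dumps({
    "topics": [{
        "topic_name": "event",
        "depth": 0,
        "channels": [
            {"channel_name": "writer", "depth": 1200},
            {"channel_name": "metrics", "depth": 3},
        ],
    }]
})

code, message = check_channel_depth(SAMPLE_STATS, "event", "writer", warn=1000, crit=5000)
print(code, message)
```

Channel depth is a useful alert signal because a growing backlog means consumers on that channel are falling behind producers, whatever the underlying cause.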

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow

• Enrichment

• Monitoring and Instrumentation

Summary

• Large Systems are more than just storage

• Abstraction

• Highly Available

• Controlled Data Flow Patterns

• Monitoring & Automation

We’re Hiring

Questions are guaranteed in life.

Answers aren’t.

Eric Lubow

@elubow

elubow@simplereach.com

Cassandra Day, New York

Thank you.