Transcript
Page 1: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Eric Lubow

@elubow

[email protected]

Message Architectures in Distributed Systems

Page 2: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Message Architectures in Distributed Systems Eric Lubow @elubow

Overview

• SimpleReach

• Why is messaging important

• Goals

• Explanations

• Questions

Page 3: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Personal Vanity

• CTO of SimpleReach

• Co-author of Practical Cassandra

• Skydiver, Mixed Martial Artist, Motorcyclist, Dog dad, NY Giants fan

• IronMatt Foundation for Pediatric Brain Tumors (ironmatt.org)

Page 4: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Page 5: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Page 6: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

SimpleReach

• Millions of URLs per day

• Over 3.75 billion page views per month

• 7b events per day (~80k events/second)

• Auto-scale 175-190 machines depending on traffic

• Built a predictive measurement algorithm for the social web
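The per-second figure follows from the daily volume. A quick check, assuming events are spread evenly over a 24-hour day (real traffic presumably peaks higher than this average):

```python
# Sanity check of the "~80k events/second" figure from the daily volume.
# Assumption: events arrive uniformly over a 24-hour day.
EVENTS_PER_DAY = 7_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"{events_per_second:,.0f} events/second on average")
```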

Page 7: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Why is Messaging Important?

• Most large scale systems discussions only talk about storage

• Direct high volumes of data around your infrastructure

• Control flow of data through your infrastructure

• Decouple important systems

• Scalability, Elasticity, Deliverability, and Redundancy

• Buffering and Asynchronous communication

Page 8: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

The database is NOT a transport layer

Page 9: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Data Flow

[Diagram. Labels: App, incoming request, sync persist data, send response, async queue message]
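One way to read the Data Flow diagram: the app persists the data synchronously, hands the message to a queue without waiting for it to be published, and responds. The sketch below is only an illustration under that reading. The dict standing in for the datastore, the in-process `queue.Queue` standing in for the message bus, and all function names are hypothetical and not from the talk.

```python
import queue
import threading

# Hypothetical stand-ins: a dict plays the datastore, a Queue plays the message bus.
datastore = {}
outbox = queue.Queue()
published = []


def publisher_loop():
    """Background worker: drains the outbox and 'publishes' each message."""
    while True:
        message = outbox.get()
        if message is None:  # sentinel used to stop the worker
            break
        published.append(message)  # a real system would hand this to the queue here


def handle_request(key, value):
    """Persist synchronously, queue the message asynchronously, then respond."""
    datastore[key] = value                    # sync persist data
    outbox.put({"key": key, "value": value})  # async queue message (does not block)
    return {"status": "ok"}                   # send response


worker = threading.Thread(target=publisher_loop)
worker.start()
response = handle_request("url:1", {"page_views": 1})
outbox.put(None)   # ask the worker to stop
worker.join()      # wait until the outbox is drained
```

The request handler never waits on the publisher, so a slow or unavailable queue does not delay the response.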

Page 10: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow Patterns

• Enrichment/In-stream Modification Schemes

• Monitoring and Instrumentation

Page 11: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Messaging Systems

• RabbitMQ

• ZeroMQ

• Kafka

• Amazon SQS

• NSQ

• ActiveMQ

• Resque

• Custom

Page 12: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

What Did SimpleReach Choose?

Page 13: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

NSQ

• Distributed and de-centralized topology

• At least once delivery guaranteed

• Multicast style message routing

• Simple to configure and deploy

• Allow for maintenance windows with no downtime

• Ephemeral channels for testing

• Channel sampling

github.com/bitly/nsq

Page 14: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Topics and Channels

• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)

• a channel is an independent queue for a topic (a topic can have multiple channels)

• consumers discover producers by querying nsqlookupd (a discovery service for topics)

• topics and channels are created at runtime (just start publishing/subscribing)

[Diagram. Labels: separate hosts, nsqd, Topics, “event”, Channels, “metrics”, “enrichment”, “writer”, Consumers, AAABBB]
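The topic and channel semantics can be sketched with a toy in-memory model: every channel on a topic receives a copy of each message, and within a channel each message goes to only one consumer. This is not the nsqd implementation. The `Topic` class is hypothetical, and the round-robin choice of consumer is a simplification chosen for the example.

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy in-memory model of one NSQ topic (not the real nsqd implementation)."""

    def __init__(self, name):
        self.name = name
        self.channels = defaultdict(list)  # channel name -> list of consumer inboxes
        self._next = {}                    # channel name -> round-robin iterator

    def subscribe(self, channel, inbox):
        # Channels are created at runtime, on first subscribe.
        self.channels[channel].append(inbox)
        self._next[channel] = cycle(self.channels[channel])

    def publish(self, message):
        # Every channel gets a copy; within a channel only one consumer receives it.
        for channel in self.channels:
            next(self._next[channel]).append(message)


events = Topic("event")
metrics_1, metrics_2, writer_1 = [], [], []
events.subscribe("metrics", metrics_1)
events.subscribe("metrics", metrics_2)
events.subscribe("writer", writer_1)
for msg in ["A", "B"]:
    events.publish(msg)
```

Here the two "metrics" consumers split the stream between them, while the single "writer" consumer sees every message.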

Page 15: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Everyone Speaks The Same Language

http:// + {“content-type”: “application/json”}
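As an illustration of the HTTP plus JSON convention, the sketch below builds a publish request against nsqd's HTTP `/pub` endpoint, which takes the topic as a query parameter and the message as the request body. The address, topic name, and event fields are hypothetical. nsqd treats the body as opaque bytes, so the JSON content type is a convention between producers and consumers rather than something nsqd enforces.

```python
import json
import urllib.parse
import urllib.request


def build_publish_request(nsqd_http_address, topic, event):
    """Build (but do not send) an HTTP POST that publishes one JSON message."""
    url = f"http://{nsqd_http_address}/pub?" + urllib.parse.urlencode({"topic": topic})
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address, topic, and payload.
request = build_publish_request(
    "127.0.0.1:4151", "event", {"url": "http://example.com/", "type": "pageview"}
)
# urllib.request.urlopen(request) would send it to a running nsqd.
```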

Page 16: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

Page 17: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

NSQ Tools

• nsqadmin - web interface to administer and introspect an NSQ cluster at runtime (and empty, pause, or delete topics/channels)

• nsq_to_http - utility that helps transport an aggregate stream over HTTP

• nsq_to_file - utility that safely persists an aggregated stream to disk

• nsq_stat - iostat-like utility for a topic/channel

• nsq_tail - tail-like utility for a topic/channel

Page 18: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Right Tool For The Job

Page 19: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

Page 20: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

How Does It Work?

[Diagram. Labels: API and nsqd (three instances), consumer (two instances), nsqlookupd (two instances), PUBLISH, REGISTER, DISCOVER, SUBSCRIBE]

Page 21: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

The Schrute of the Problem

Page 22: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

Page 23: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Simple Deployment & Automation

• Chef cookbook - github.com/simplereach/chef-nsq

• Written in Go

• Easily distributable binaries

• Deploy lookup nodes

• nsqd instances installed locally

Page 24: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

Page 25: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Runtime Discovery

➊ regularly poll for topic producers

➋ connect to all producers

[Diagram. Labels: nsqlookupd, nsqlookupd, consumer, HTTP requests]
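The two discovery steps can be sketched as follows: ask every nsqlookupd for the producers of a topic, then take the union of the answers as the set of nsqd instances to connect to. To stay self-contained, the sketch parses canned response bodies instead of issuing the HTTP requests. The field names `producers`, `broadcast_address`, and `tcp_port` are given as recalled from nsqlookupd's `/lookup` response and should be treated as assumptions, as should the sample addresses.

```python
import json


def producers_from_lookup(response_body):
    """Extract (host, tcp_port) pairs from one nsqlookupd /lookup response."""
    payload = json.loads(response_body)
    # Assumption: some nsqlookupd versions wrap the result in a "data" envelope.
    data = payload.get("data", payload)
    return {(p["broadcast_address"], p["tcp_port"]) for p in data.get("producers", [])}


def discover(responses):
    """Union the producers reported by every nsqlookupd instance."""
    found = set()
    for body in responses:
        found |= producers_from_lookup(body)
    return sorted(found)


# Hypothetical responses from two nsqlookupd instances for the same topic.
lookupd_a = '{"producers": [{"broadcast_address": "10.0.0.1", "tcp_port": 4150}]}'
lookupd_b = ('{"producers": [{"broadcast_address": "10.0.0.1", "tcp_port": 4150},'
             ' {"broadcast_address": "10.0.0.2", "tcp_port": 4150}]}')

nsqd_addresses = discover([lookupd_a, lookupd_b])
```

Because the consumer merges answers from all lookup nodes and repeats the poll regularly, a new nsqd is picked up without any client configuration change.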

Page 26: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

Page 27: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Path of a Packet

[Diagram. Labels: Internet, EC, Internal API, Solr, C*, Mongo, Redis, Vertica, API, Fire Hose, SC, Consumers, Queue]

Page 28: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Page 29: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Controlled Data Flow

[Diagram. Labels: Social Event, Collector, Social Data, Batch & Write, Processed Data, Batch & Write, Raw Data, Calculate Score, Write, NSQ, Broadcast, NSQ]

Page 31: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Broadcast Importance for Polyglottany

Aggregator

Mongo Writer

Broadcast

Redis Writer

Cassandra Writer

Solr Writer

Calculator

NSQ

Vertica Writer

Page 32: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Page 34: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow

Page 35: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

What Is Enrichment?

A mechanism to add value to a message to enhance processing in your system

Page 36: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

How Do We Enrich

[Diagram. Labels: Raw Event, Enriched Event, NSQ, Broadcast, Consumer A, Consumer B, Consumer C]
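A minimal sketch of an enrichment step, assuming JSON messages: a consumer takes the raw event, adds derived fields once, and emits the enriched event for downstream consumers. The `enrich` function, the `domain` and `enriched` fields, and the sample event are hypothetical. The talk does not say which fields SimpleReach adds.

```python
import json
from urllib.parse import urlparse


def enrich(raw_message):
    """Return a copy of the raw event with derived fields added."""
    event = json.loads(raw_message)
    enriched = dict(event)
    # Hypothetical enrichment: derive the publisher domain from the URL once,
    # so downstream consumers do not each have to parse it.
    enriched["domain"] = urlparse(event["url"]).netloc
    enriched["enriched"] = True
    return json.dumps(enriched, sort_keys=True)


raw = json.dumps({"url": "http://example.com/story", "type": "share"})
enriched_message = enrich(raw)
```

In a deployment along the lines of the diagram, the enriched message would be published to a second topic so that Consumers A, B, and C each receive it on their own channel.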

Page 37: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow

• Enrichment

Page 38: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Monitoring / Instrumentation

• Comes with statsd support built-in

• Statsd talks to both Graphite and nsqadmin

• nsqadmin comes with graphs for message processing stats

• Nagios plugins available for monitoring topic/channel depth

• Average end-to-end latency calculations are done on a per-channel basis
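A depth check in the style of a Nagios plugin might look like the sketch below. It is not one of the plugins mentioned above. It assumes a stats structure shaped like nsqd's JSON stats output (`topics`, `topic_name`, `channels`, `channel_name`, `depth`, as recalled and not verified here), and the thresholds and sample data are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def check_channel_depths(stats, warn, crit):
    """Return the worst (exit_code, message) across all channels in a stats dict."""
    worst_code, worst = OK, "all channels below thresholds"
    for topic in stats.get("topics", []):
        for channel in topic.get("channels", []):
            depth = channel["depth"]
            code = CRITICAL if depth >= crit else WARNING if depth >= warn else OK
            if code > worst_code:
                worst_code = code
                worst = f'{topic["topic_name"]}/{channel["channel_name"]} depth={depth}'
    return worst_code, worst


# Hypothetical stats: the "writer" channel has fallen far behind.
sample_stats = {"topics": [{"topic_name": "event", "channels": [
    {"channel_name": "metrics", "depth": 12},
    {"channel_name": "writer", "depth": 25000},
]}]}
code, message = check_channel_depths(sample_stats, warn=1000, crit=10000)
```

A growing channel depth means consumers on that channel are not keeping up, which is why depth is a natural thing to alert on.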

Page 39: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Goals

• Consistent interfaces between systems

• Allow access to many toolsets

• Minimize downtime/Minimize cost of downtime

• High availability

• Clients should have minimal architecture knowledge

• Horizontal Scaling

• Controlled Data Flow

• Enrichment

• Monitoring and Instrumentation

Page 40: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Summary

• Large Systems are more than just storage

• Abstraction

• Highly Available

• Controlled Data Flow Patterns

• Monitoring & Automation

Page 41: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

We’re Hiring

Page 42: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach

Questions are guaranteed in life.

Answers aren’t.

Eric Lubow

@elubow

[email protected]

Cassandra Day, New York

Thank you.

