
A real time architecture using Hadoop and Storm @ FOSDEM 2013

Date post: 26-Jan-2015
Upload: nathan-bijnens
A real-time architecture using Hadoop and Storm.
Transcript
Page 1: A real time architecture using Hadoop and Storm @ FOSDEM 2013

A real-time architecture using Hadoop and Storm.

Speakers

Nathan Bijnens (@nathan_gs)

Geert Van Landeghem (@gvanlandeghem)

Our Vision

Big Data:
- Volume
- Velocity
- Variety

Credits

Nathan Marz, engineer at BackType (now Twitter).
- Storm
- Cascalog
- ElephantDB

manning.com/marz


A Data System

Data is more than Information

Not all information is equal. Some information is derived from other pieces of information.

Data is more than Information

Eventually you will reach the most

This is the information you hold true, simply because it exists.

Events

Everything we do generates events:
- Pay with Credit Card
- Commit to Git
- Click on a webpage
- Tweet

Events - Before

Events were used to manipulate the master data.

Events - After

Today, events are the master data.


everything.

Data System

Events

Data is immutable.

Events

Data is time based.

Capturing change traditionally

Person  Location
Nathan  Antwerp
Geert   Dendermonde
John    Ghent

Person  Location
Nathan  Ghent
Geert   Dendermonde
John    Ghent

Capturing change

Person  Location     Time
Nathan  Antwerp      2005-01-01
Geert   Dendermonde  2011-10-08
John    Ghent        2010-05-02

Person  Location     Time
Nathan  Antwerp      2005-01-01
Geert   Dendermonde  2011-10-08
John    Ghent        2010-05-02
Nathan  Ghent        2013-02-03
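The difference between overwriting a row and appending a timestamped fact can be sketched in a few lines of Python. This is only an illustration: the `LocationEvent` type and the `current_locations` helper are hypothetical names, not something shown in the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: an event cannot be changed once recorded
class LocationEvent:
    person: str
    location: str
    time: date


def current_locations(events):
    """Derive each person's latest location from the full event history."""
    latest = {}
    for event in sorted(events, key=lambda e: e.time):
        latest[event.person] = event.location  # later events win
    return latest


# The master data is an append-only list of events.
events = [
    LocationEvent("Nathan", "Antwerp", date(2005, 1, 1)),
    LocationEvent("Geert", "Dendermonde", date(2011, 10, 8)),
    LocationEvent("John", "Ghent", date(2010, 5, 2)),
]

# Nathan moves: append a new fact instead of updating the old row.
events.append(LocationEvent("Nathan", "Ghent", date(2013, 2, 3)))

print(current_locations(events))
```

The old fact (Nathan lived in Antwerp from 2005) is still in the list, so the history stays queryable.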

Query

The data you query is often transformed, aggregated, ...


Query

Query = function ( data )

Number of people living in each city.

Person  Location     Time
Nathan  Antwerp      2005-01-01
Geert   Dendermonde  2011-10-08
John    Ghent        2010-05-02
Nathan  Ghent        2013-02-03

Location     Count
Ghent        2
Dendermonde  1
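As a rough sketch of "Query = function(data)", the count per city can be written as a plain function over all the facts in the table above. The function name `people_per_city` is an assumption made for this example.

```python
from collections import Counter

# (person, location, time) facts; ISO dates sort correctly as strings.
all_data = [
    ("Nathan", "Antwerp", "2005-01-01"),
    ("Geert", "Dendermonde", "2011-10-08"),
    ("John", "Ghent", "2010-05-02"),
    ("Nathan", "Ghent", "2013-02-03"),
]


def people_per_city(data):
    """Query = function(data): count people by their most recent location."""
    latest = {}
    for person, location, _time in sorted(data, key=lambda row: row[2]):
        latest[person] = location  # a later fact replaces an earlier one
    return Counter(latest.values())


print(people_per_city(all_data))
```

Antwerp does not appear in the result because Nathan's most recent fact places him in Ghent.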

Query

[Diagram: All Data -> Query]

Query: Precompute

[Diagram: All Data -> Precomputed View -> Query]


Layered Architecture

Speed Layer

Batch Layer

Serving Layer

Layered Architecture

[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]


Batch Layer

Batch Layer

[Diagram: Incoming Data, Hadoop, ElephantDB]

Batch Layer

Unrestrained computation.

Batch Layer

Horizontally scalable.

Batch Layer

High latency.

matter.

Batch Layer

Stores master copy of data set... append only.


Batch Layer

Batch: View generation

[Diagram: Master Dataset -> MapReduce -> View #1, View #2, View #3]

MapReduce

1. Take a large problem and divide it into sub-problems
2. Perform the same function on all sub-problems
3. Combine the output from all sub-problems

[Diagram: MAP runs DoWork() DoWork() DoWork()… on the sub-problems; REDUCE combines the results into Output]
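The three steps can be illustrated with a small in-process sketch. It is not Hadoop's API: the helpers `split`, `map_chunk` (playing the role of DoWork()) and `reduce_counts` are names invented for this example, and the chunks are processed sequentially rather than on a cluster.

```python
from collections import defaultdict
from functools import reduce


def split(records, n):
    """1. Divide the problem into n sub-problems (chunks)."""
    return [records[i::n] for i in range(n)]


def map_chunk(chunk):
    """2. The same function on every sub-problem: emit (city, 1) pairs."""
    return [(city, 1) for city in chunk]


def reduce_counts(acc, pairs):
    """3. Combine the output from all sub-problems."""
    for city, count in pairs:
        acc[city] += count
    return acc


cities = ["Ghent", "Dendermonde", "Ghent", "Antwerp", "Ghent"]
mapped = [map_chunk(chunk) for chunk in split(cities, 3)]  # could run in parallel
output = reduce(reduce_counts, mapped, defaultdict(int))
print(dict(output))
```

Because each chunk is mapped independently, the map step is the part that scales horizontally.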

Batch View Database

Read-only database. No random writes required.

Batch View Database

- ElephantDB
- Splout

Batch Layer

[Timeline: Data absorbed into Batch Views, followed by "Not yet absorbed" (just a few hours of data), up to Now]


Speed Layer

Overview

[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra]

Speed Layer

Stream processing.

Speed Layer

Continuous computation.

Speed Layer

Transactional.

Speed Layer

Storing a limited window of data. Compensating for the last few hours of data.
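A speed-layer view only has to cover what the batch layer has not absorbed yet. A minimal sketch, assuming integer timestamps and a hypothetical `RealTimeView` class (neither comes from the talk):

```python
from collections import Counter


class RealTimeView:
    """Counts for events that the batch layer has not absorbed yet."""

    def __init__(self):
        self.events = []  # (timestamp, key) pairs inside the window

    def update(self, timestamp, key):
        # Incremental: each incoming event is recorded as it arrives.
        self.events.append((timestamp, key))

    def expire(self, absorbed_until):
        # Drop everything the batch views now cover.
        self.events = [(t, k) for t, k in self.events if t > absorbed_until]

    def counts(self):
        return Counter(key for _, key in self.events)


view = RealTimeView()
view.update(10, "Ghent")
view.update(12, "Antwerp")
view.update(15, "Ghent")
view.expire(absorbed_until=11)  # a batch run finished for data up to t=11
print(view.counts())
```

After `expire`, the view holds only the events newer than the last batch run, which keeps the window small.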

Speed Layer

All the complexity is isolated in the Speed layer auto-corrected.

CAP

You have a choice between:
- Availability: queries are eventually consistent.
- Consistency: queries are consistent.

Eventual accuracy

Some algorithms are hard to implement in real time. For those cases we could estimate the results.
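One way to picture estimating a result is an approximate distinct count. The talk does not name a specific estimator; the sketch below uses a K-minimum-values sketch purely as an illustration, and `hash01` and `ApproxDistinct` are hypothetical names. The speed layer could serve such an estimate while the batch layer later computes the exact number.

```python
import hashlib
import heapq


def hash01(item):
    """Map an item to a pseudo-random float in [0, 1) via SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class ApproxDistinct:
    """K-minimum-values sketch: remembers only the k smallest hash values."""

    def __init__(self, k=256):
        self.k = k
        self.heap = []        # negated values, so heap[0] is the largest kept hash
        self.members = set()  # hashes currently kept

    def add(self, item):
        h = hash01(item)
        if h in self.members:
            return  # already counted
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -h)
            self.members.add(h)
        elif h < -self.heap[0]:
            evicted = -heapq.heappushpop(self.heap, -h)
            self.members.discard(evicted)
            self.members.add(h)

    def estimate(self):
        if len(self.heap) < self.k:
            return len(self.heap)  # fewer than k distinct items: exact
        kth_smallest = -self.heap[0]
        return int((self.k - 1) / kth_smallest)


sketch = ApproxDistinct(k=256)
for i in range(10_000):
    sketch.add(f"user-{i}")
    sketch.add(f"user-{i}")  # duplicates do not change the estimate
print(sketch.estimate())  # an estimate in the region of 10000, not an exact count
```

Memory stays bounded by `k` no matter how many events arrive, at the cost of a relative error of roughly 1/sqrt(k).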

Speed Layer

[Diagram: Incoming Data -> Real Time View 1, Real Time View 2]

Storm

- Message passing.
- Distributed processing.
- Horizontally scalable.
- Incremental algorithms.
- Fast.
- Data in motion.


Storm

[Diagram: Nimbus, Zookeeper, and three Worker Nodes, each with a Supervisor and three Workers]

Storm

- Tuple
- Stream

Storm

- Spout
- Bolt

Storm

- Grouping
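How tuples, streams, spouts, bolts and groupings fit together can be shown with a toy, single-process simulation. This is deliberately not Storm's API: `sentence_spout`, `split_bolt`, `CountBolt` and `fields_grouping` are hypothetical stand-ins, and a real topology would run the bolt tasks on different workers.

```python
from collections import Counter
import zlib


def sentence_spout():
    """Spout: a source that emits a stream of tuples."""
    for sentence in ["storm and hadoop", "storm is fast"]:
        yield (sentence,)


def split_bolt(tup):
    """Bolt: consumes a tuple and emits zero or more new tuples."""
    (sentence,) = tup
    for word in sentence.split():
        yield (word,)


class CountBolt:
    """Bolt task that keeps a running count per word."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        (word,) = tup
        self.counts[word] += 1


def fields_grouping(tup, num_tasks):
    """Grouping: route tuples with the same field value to the same task."""
    return zlib.crc32(tup[0].encode("utf-8")) % num_tasks


tasks = [CountBolt() for _ in range(2)]
for sentence_tuple in sentence_spout():
    for word_tuple in split_bolt(sentence_tuple):
        tasks[fields_grouping(word_tuple, len(tasks))].execute(word_tuple)

total = sum((task.counts for task in tasks), Counter())
print(total)
```

Grouping by field is what lets each task keep a correct running count: every occurrence of a given word lands on the same task.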

Speed Layer Views

The views are stored in a read & write database.
- Cassandra
- HBase
- MongoDB
- MySQL
- ElasticSearch

Much more complex than a read-only view.


Serving Layer

Overview

[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]

Serving Layer

This layer queries the Batch & Real Time views and merges them.

Serving Layer

[Diagram: Real Time Views + Batch Views -> Merge]
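For a count-style query, the merge can be as simple as adding the two views key by key. A minimal sketch, assuming both views are plain dictionaries; the `serve` function and the sample numbers are made up for illustration:

```python
from collections import Counter


def serve(batch_view, realtime_view):
    """Answer a query by merging the batch view with the real-time view."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds counts key by key
    return dict(merged)


batch_view = {"Ghent": 120, "Antwerp": 80}      # absorbed up to the last batch run
realtime_view = {"Ghent": 3, "Dendermonde": 1}  # only the last few hours

print(serve(batch_view, realtime_view))
```

Addition works because counts are additive; other queries (for example a maximum or a set of unique values) would need their own merge function.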


Overview

Overview

[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]

Lambda Architecture

Can discard any view, batch and real time, and just recreate everything from the master data.

Mistakes are corrected via recomputation.
- Write bad data? Remove the data & recompute.
- Bug in view generation? Just recompute the view.

Data storage is highly optimized.
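Correcting a mistake by recomputation can be sketched as follows. The view is a pure function of the master data, so it can be thrown away and rebuilt at any time; `build_view` and the misspelled sample record are assumptions made for this example.

```python
from collections import Counter


def build_view(master_data):
    """Recreate the view from scratch: a pure function of the master data."""
    return Counter(location for _, location in master_data)


master_data = [("Nathan", "Ghent"), ("John", "Ghent"), ("Geert", "Dendermond")]
stale_view = build_view(master_data)  # contains the misspelled city

# Wrote bad data? Remove it from the master data and recompute.
master_data = [row for row in master_data if row != ("Geert", "Dendermond")]
master_data.append(("Geert", "Dendermonde"))
view = build_view(master_data)
print(view)
```

No in-place repair of the view is needed: the old view is simply discarded.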


Recommendations


Serialization & Schema

Catch errors as quickly as they happen. Validation on write vs on read.


Serialization & Schema

CSV is actually a serialization language that is just poorly defined.

Serialization & Schema

Use a format with a schema.
- Thrift
- Avro
- Protocol Buffers
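Validation on write can be illustrated without any of these libraries. The sketch below uses a hand-rolled dictionary as the schema; it is not the Thrift, Avro or Protocol Buffers API, and `SCHEMA`, `validate` and `write` are hypothetical names.

```python
SCHEMA = {"person": str, "location": str, "time": str}  # hypothetical schema


def validate(record, schema=SCHEMA):
    """Raise ValueError unless the record has exactly the schema's typed fields."""
    if set(record) != set(schema):
        raise ValueError(f"fields {sorted(record)} do not match schema {sorted(schema)}")
    for field, expected in schema.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field!r} must be {expected.__name__}")


def write(store, record):
    """Validate on write, so bad data never reaches the master data set."""
    validate(record)
    store.append(record)


store = []
write(store, {"person": "Nathan", "location": "Ghent", "time": "2013-02-03"})
try:
    write(store, {"person": "Geert", "location": None, "time": "2011-10-08"})
except ValueError as error:
    print("rejected:", error)
```

The bad record is rejected at the moment it is written, rather than surfacing later when a batch job tries to read it.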

Questions?

What are your needs?

@nathan_gs & @gvanlandeghem


DataCrunchers

We enable companies in envisioning, defining and implementing a data strategy.

A one-stop-shop for all your Big Data needs.

The first Big Data Consultancy agency in Belgium.


Jobs

We are [email protected]

