A Gentle Introduction To Storm And Kafka

Posted on 12-Apr-2017

The Leader in Big Data Consulting

www.mammothdata.com | @mammothdataco

A Gentle Introduction to Kafka and Storm

{Percona University | Raleigh}

Open Software Integrators

Open Software Integrators is a Big Data consulting and services company specializing in Hadoop, Cassandra, MongoDB and other NoSQL technologies. OSI focuses on executive strategy, initial install, design and implementation.

Founded January 2008 by Andrew C. Oliver

Based in downtown Durham, NC

Partnered with Hortonworks, MongoDB, DataStax, Cloudera, Couchbase, Cloudbees & Neo Technology

A Gentle Introduction

● What are Kafka and Storm?
● What can they be used for?
● What do they excel at?

Kafka

Kafka and Storm

What is Apache Kafka?

Kafka is a distributed, partitioned, replicated commit log service.

The Commit Log

An append-only, immutable sequence of records ordered by time.

(Figure: a commit log running from the first record to the next written record.)
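The commit log idea fits in a few lines of Python. This is a toy, in-memory sketch, not Kafka's storage format or client API: the `CommitLog` class and its `append`/`read` methods are hypothetical names chosen for illustration. Each record gets an increasing offset, and readers only ever read; nothing is modified in place.

```python
from typing import List, Tuple


class CommitLog:
    """Toy append-only log (hypothetical; not Kafka's API)."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, record) pairs starting at offset.

        Reading never removes or changes records, so many readers can share one log.
        """
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = CommitLog()
for event in (b"login", b"click", b"logout"):
    log.append(event)

# A reader that has already handled offset 0 resumes at offset 1.
print(log.read(1))  # [(1, b'click'), (2, b'logout')]
```

Because records are only appended, each reader just remembers the next offset it wants; replaying history means reading again from an earlier offset.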

Kafka is:

● fast
● durable
● distributed
● scalable

Kafka Abstractions

● Topic: feeds of messages in categories
● Broker: a host running Kafka
● Producer: a process that publishes messages
● Consumer: a process that pulls messages
● Partition: portion of a topic’s stream of messages
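How topics, partitions, producers, and consumers fit together can be sketched in Python. This is a single-process toy model with hypothetical class names, not a Kafka client; brokers and replication are left out. Two details follow Kafka's design: messages with the same key go to the same partition, and each consumer tracks its own offset per partition. `zlib.crc32` is a stand-in for the murmur2 hash that Kafka's default partitioner applies to keys.

```python
import zlib
from typing import Dict, List, Tuple


class Topic:
    """Toy topic split into partitions (hypothetical model, not a Kafka client)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        # Each partition is its own append-only list of messages.
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, so messages for one key stay in order.
        # zlib.crc32 stands in for the murmur2 hash used by Kafka's default partitioner.
        return zlib.crc32(key) % len(self.partitions)


class Producer:
    def send(self, topic: Topic, key: bytes, value: bytes) -> Tuple[int, int]:
        """Publish one message; return (partition, offset) where it landed."""
        p = topic.partition_for(key)
        topic.partitions[p].append(value)
        return p, len(topic.partitions[p]) - 1


class Consumer:
    """Pulls new messages and remembers its own offset in each partition."""

    def __init__(self) -> None:
        self.offsets: Dict[int, int] = {}

    def poll(self, topic: Topic) -> List[Tuple[int, int, bytes]]:
        batch: List[Tuple[int, int, bytes]] = []
        for p, records in enumerate(topic.partitions):
            start = self.offsets.get(p, 0)
            for offset in range(start, len(records)):
                batch.append((p, offset, records[offset]))
            self.offsets[p] = len(records)  # the next poll starts after these
        return batch


clicks = Topic("clicks", num_partitions=3)
producer = Producer()
for user, page in [(b"alice", b"/home"), (b"bob", b"/cart"), (b"alice", b"/pay")]:
    producer.send(clicks, key=user, value=page)

consumer = Consumer()
first = consumer.poll(clicks)   # all three messages
second = consumer.poll(clicks)  # nothing new since the last pull
```

Partitioning is what lets a topic scale: different partitions can live on different brokers, while ordering is only guaranteed within a single partition.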

What Kafka is used for:

Enterprise-grade event streaming

What Kafka is not good at:

Doing anything other than being a commit log.

Storm

What is Apache Storm?

Storm is a distributed, real-time computation system.

Stream Processing

Several processes fall into the domain of stream processing:

● Event Sourcing
● Command and Query Responsibility Segregation (CQRS)
● Complex Event Processing
● etc.
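Event sourcing pairs naturally with a commit log: instead of storing only the current state, a system records every change as an event and derives state by replaying the events in order. A minimal Python sketch, assuming a hypothetical event format of `(account, amount)` pairs where positive amounts are deposits and negative amounts are withdrawals:

```python
from functools import reduce
from typing import Dict, List, Tuple

# Hypothetical event format: (account, amount); amount > 0 deposits, < 0 withdraws.
Event = Tuple[str, int]


def apply_event(balances: Dict[str, int], event: Event) -> Dict[str, int]:
    """Fold one event into the state, returning a new dict (input is untouched)."""
    account, amount = event
    updated = dict(balances)
    updated[account] = updated.get(account, 0) + amount
    return updated


def replay(events: List[Event]) -> Dict[str, int]:
    """Rebuild the current state from scratch by replaying the whole history."""
    return reduce(apply_event, events, {})


history: List[Event] = [("acct-1", 100), ("acct-2", 50), ("acct-1", -30)]
print(replay(history))  # {'acct-1': 70, 'acct-2': 50}
```

Replaying a prefix of the history, such as `replay(history[:2])`, gives the state as it was at that point, which is one reason an immutable log is useful here.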

What Storm Does

● Simple API
● Guaranteed data processing
● Fault tolerant
● Scalable
● Usable with any language
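"Guaranteed data processing" rests on acknowledgements: Storm spouts receive `ack` and `fail` callbacks for the message IDs they emitted, and a spout can re-emit anything that fails, giving at-least-once processing. The Python class below borrows those method names but is a hypothetical toy, not Storm's Java API, and it skips the machinery Storm uses to track each tuple's downstream tree and timeouts.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Tuple


class ReplayingSpout:
    """Toy at-least-once source (hypothetical; not Storm's ISpout).

    Each emitted message stays pending until acked. A failed message goes back
    on the queue and is emitted again, so it may be processed more than once.
    """

    def __init__(self, messages: Iterable[str]) -> None:
        self.queue: Deque[Tuple[int, str]] = deque(enumerate(messages))
        self.pending: Dict[int, str] = {}

    def next_tuple(self) -> Optional[Tuple[int, str]]:
        if not self.queue:
            return None
        msg_id, value = self.queue.popleft()
        self.pending[msg_id] = value
        return msg_id, value

    def ack(self, msg_id: int) -> None:
        self.pending.pop(msg_id)  # fully processed: forget it

    def fail(self, msg_id: int) -> None:
        self.queue.append((msg_id, self.pending.pop(msg_id)))  # replay later


spout = ReplayingSpout(["a", "b", "c"])
processed = []
failed_once = set()
while (item := spout.next_tuple()) is not None:
    msg_id, value = item
    if value == "b" and msg_id not in failed_once:
        failed_once.add(msg_id)  # simulate a downstream failure the first time
        spout.fail(msg_id)
        continue
    processed.append(value)
    spout.ack(msg_id)

print(processed)  # ['a', 'c', 'b']
```

Message "b" fails once, is replayed after "c", and still gets processed; nothing is lost, at the cost of possible duplicates in real failure scenarios.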

Storm Abstractions

Three abstractions:
● Spouts
● Bolts
● Topology

(Figure: an example topology in which two spouts feed a graph of four bolts.)
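The classic word-count example shows how the three abstractions relate: a spout is a source of a stream, bolts consume streams and emit or aggregate results, and the topology is the graph wiring them together. The sketch below runs in one Python process with hypothetical function and class names; real Storm components implement Storm's own spout and bolt interfaces and run distributed across a cluster, with tuples flowing between them continuously.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout() -> Iterator[str]:
    """Spout: the source of the stream (a fixed list here, for illustration)."""
    yield from ["the cow jumped over the moon", "the man went to the store"]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes sentences and emits one tuple per word."""
    for sentence in sentences:
        yield from sentence.split()


class CountBolt:
    """Bolt: keeps a running count for each word it receives."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, word: str) -> None:
        self.counts[word] += 1


def run_topology() -> Counter:
    """Topology: wires sentence_spout -> split_bolt -> CountBolt."""
    counter = CountBolt()
    for word in split_bolt(sentence_spout()):
        counter.execute(word)
    return counter.counts


counts = run_topology()
print(counts["the"])  # 4
```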

Storm Processes

Processes:
● UI
● Nimbus
● Supervisor
● Worker

(Figure: a Storm cluster with the Web UI, Nimbus, Zookeeper, and two Supervisors, each running two Workers.)

Storm Parallelism Model

● Worker process
● Executors
● Tasks
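The three levels nest: a worker process runs executors (threads), and each executor runs one or more tasks of a single spout or bolt. In Storm these numbers come from the topology's worker count, each component's parallelism hint (executors), and optionally its task count; by default there is one task per executor. The Python sketch below only illustrates how tasks could be grouped onto executors and executors spread over workers. The function names are hypothetical and the placement rule is an assumption, not Storm's actual scheduler.

```python
from typing import Dict, List


def assign_tasks(num_tasks: int, num_executors: int) -> List[List[int]]:
    """Split task IDs into contiguous groups, one group per executor (thread).

    Assumed rule for illustration: sizes differ by at most one task.
    """
    base, extra = divmod(num_tasks, num_executors)
    groups: List[List[int]] = []
    next_task = 0
    for e in range(num_executors):
        size = base + (1 if e < extra else 0)
        groups.append(list(range(next_task, next_task + size)))
        next_task += size
    return groups


def assign_executors(num_executors: int, num_workers: int) -> Dict[int, List[int]]:
    """Spread executor IDs round-robin over worker processes (assumed rule)."""
    placement: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    for e in range(num_executors):
        placement[e % num_workers].append(e)
    return placement


# Example: a bolt with 4 tasks run by 3 executors, spread over 2 workers.
tasks_per_executor = assign_tasks(num_tasks=4, num_executors=3)
executors_per_worker = assign_executors(num_executors=3, num_workers=2)
print(tasks_per_executor)    # [[0, 1], [2], [3]]
print(executors_per_worker)  # {0: [0, 2], 1: [1]}
```

Keeping more tasks than executors leaves room to add executors later without changing how many component instances exist.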

Use Case: Security

Use Case: Security

Security customer analytics platform:
● Pulling data from customer sites
● Placing data in a SQL database
● Performing analysis to spot anomalous traffic
● Pushing results back to clients to block traffic sources

Use Case: Security

Original system, mean turnaround time: 4.5 hours
Storm / Kafka solution, maximum processing time: 2.6 seconds
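The two figures can be put on the same scale. Since a mean can never exceed a maximum, the Storm / Kafka solution's mean processing time is at most 2.6 seconds, so the improvement in mean time is at least the following ratio (assuming the two measurements cover comparable work):

```latex
\[
\frac{4.5\ \text{h}}{2.6\ \text{s}}
  = \frac{4.5 \times 3600\ \text{s}}{2.6\ \text{s}}
  = \frac{16200}{2.6}
  \approx 6231
\]
```

That is, a lower bound of roughly a 6,200-fold reduction.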

Thank You

Links

Kafka: http://kafka.apache.org/
Storm: http://storm.apache.org/