Real Time Data Streaming using Kafka & Storm

Date post: 06-May-2015
Upload: ran-silberman
Description:
This presentation describes three real use cases of real-time data streaming and how they were implemented at LivePerson using Kafka and Storm.
Transcript
Page 1: Real Time Data Streaming using Kafka & Storm

DATA

LivePerson Case Study: Real Time Data Streaming

March 20th 2014

Ran Silberman

Page 2: Real Time Data Streaming using Kafka & Storm

About me

● Technical Leader of Data Platform at LivePerson

● Bird watcher and amateur bird photographer

Pharaoh Eagle-Owl / Bubo ascalaphus

This is what the people in the previous slide were looking at…

Amir Silberman

Page 3: Real Time Data Streaming using Kafka & Storm

Agenda

● Why we chose Kafka + Storm

● How implementation was done

● Measures of success

● Two examples of use

● Tips from our experience

Page 4: Real Time Data Streaming using Kafka & Storm

Data in LivePerson

[Diagram labels: Visitor in Site, Chat Window, Agent console, LivePerson SaaS Server, Login, Monitor, Rules, Intelligence, Decision, Chat, Invite, DATA, BIG DATA]

Page 5: Real Time Data Streaming using Kafka & Storm

Legacy Data flow in LivePerson

[Diagram labels: RealTime servers, ETL, Sessionize, Modeling, Schema View, BI DWH (Oracle), Real-Time data, Historical data]

Page 6: Real Time Data Streaming using Kafka & Storm

Why Kafka + Storm?

● Need to scale out and plan for future scale

○ Limit for scale should not be technology

○ Let the limit be cost of (commodity) hardware

● What Data platforms can be implemented quickly?

○ Open source - fast evolving and community

○ Micro-services - do only what you ought to do!

● Are there risks in this choice?

○ Yes! The technology is not mature enough

○ But there is no other mature technology that can address our needs!

Page 7: Real Time Data Streaming using Kafka & Storm

Long-eared Owl / Asio otus

Amir Silberman

Page 8: Real Time Data Streaming using Kafka & Storm

Legacy Data flow in LivePerson

[Diagram labels: RealTime servers, ETL, Sessionize, Modeling, Schema View, BI DWH (Oracle), Customers]

Page 9: Real Time Data Streaming using Kafka & Storm

1st phase - move to Hadoop

[Diagram labels: RealTime servers, ETL, Sessionize, Modeling, Schema View, Hadoop, HDFS, BI DWH (Vertica), Customers]

MR Job transfers data to BI DWH

Page 10: Real Time Data Streaming using Kafka & Storm

2. Move to Kafka

[Diagram labels: RealTime servers, Kafka (Topic-1), Hadoop, HDFS, BI DWH (Vertica), Customers]

MR Job transfers data to BI DWH

Page 11: Real Time Data Streaming using Kafka & Storm

3. Integrate with new producers

[Diagram labels: RealTime servers, New RealTime servers, Kafka (Topic-1, Topic-2), Hadoop, HDFS, BI DWH (Vertica), Customers]

MR Job transfers data to BI DWH

Page 12: Real Time Data Streaming using Kafka & Storm

4. Add Real-time BI

[Diagram labels: RealTime servers, New RealTime servers, Kafka (Topic-1, Topic-2), Storm Topology, Analytics DB, Hadoop, HDFS, BI DWH (Vertica), Customers]

MR Job transfers data to BI DWH

Page 13: Real Time Data Streaming using Kafka & Storm

Architecture

[Diagram labels: Real-time servers, Kafka, Storm (Real Time Processing), Cassandra / Couchbase, Dashboards]

Some Numbers: Cyber Monday 2013

● Flow rate into Kafka: 33 MB/sec

● Flow rate from Kafka: 20 MB/sec

● Total daily data in Kafka: 17 billion events

● 4 topologies reading all events
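To put the Cyber Monday figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the 17 billion events are spread evenly over 24 hours, and it treats 33 MB/sec as if it were sustained all day; the slide does not say whether that rate is a peak or an average, so the daily volume is only an upper-end illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 17_000_000_000  # from the slide
inflow_mb_per_sec = 33           # from the slide; peak vs. average is not stated

# Average event rate, assuming an even spread over the day (an assumption).
avg_events_per_sec = events_per_day / SECONDS_PER_DAY

# Daily volume if the inflow rate were sustained for 24 hours (an assumption).
daily_inflow_tb = inflow_mb_per_sec * SECONDS_PER_DAY / 1_000_000

print(f"{avg_events_per_sec:,.0f} events/sec on average")
print(f"{daily_inflow_tb:.2f} TB/day at a sustained 33 MB/sec")
```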

Page 14: Real Time Data Streaming using Kafka & Storm

Eurasian Wryneck / Jynx torquilla

Amir Silberman

Page 15: Real Time Data Streaming using Kafka & Storm

Two use cases

1. Visitor list

2. Agent State

Page 16: Real Time Data Streaming using Kafka & Storm

1st Storm Use Case: “Visitors List”

Use case:

● Show list of visitors in the “Agent Console”

● Collect data about visitor in real time

● Visitor stickiness in streaming process

Page 17: Real Time Data Streaming using Kafka & Storm

Visitors List Topology

Page 18: Real Time Data Streaming using Kafka & Storm

Selected Analytics DB - Couchbase

1st Storm Use Case: “Visitors List”

● Document Store - for complex documents

● Searchable - possible to search by different attributes.

● High throughput - Read & Write

Page 19: Real Time Data Streaming using Kafka & Storm

First Storm Topology – Visitor Feed

“Visitor List” Topology: Analytics DB: Couchbase - Document store

[Diagram: Kafka events stream → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → (emit) Write event to Visitor document → Add/Update → Couchbase]

Page 20: Real Time Data Streaming using Kafka & Storm

Visitors List - Storm considerations

● Complex calculations before sending to DB

○ Ignore delayed events

○ Reorder events before storing

● Document cached in memory

● Fields Grouping to bolt that writes to CouchBase

● High parallelism in bolt that writes to CouchBase
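The following is a minimal sketch, not the LivePerson implementation, of why fields grouping matters here. If every event for a given visitor is routed to the same bolt task, that task can keep the visitor's document in memory and drop events that arrive late. The names `task_for`, `VisitorWriterTask`, and the event fields are hypothetical, and `crc32` merely stands in for whatever hash the grouping actually uses. Reordering of events is not covered.

```python
import zlib
from dataclasses import dataclass, field


def task_for(visitor_id: str, num_tasks: int) -> int:
    """Map the grouping field to a task index, so one visitor always hits one task."""
    return zlib.crc32(visitor_id.encode("utf-8")) % num_tasks


@dataclass
class VisitorWriterTask:
    """One task of a hypothetical writer bolt; caches its visitors' documents."""
    documents: dict = field(default_factory=dict)

    def handle(self, event: dict) -> bool:
        doc = self.documents.setdefault(
            event["visitor_id"], {"last_ts": -1, "pages": []}
        )
        if event["ts"] <= doc["last_ts"]:
            return False  # delayed event: ignore it
        doc["last_ts"] = event["ts"]
        doc["pages"].append(event["page"])
        return True  # a real bolt would now add/update the document in the DB


tasks = [VisitorWriterTask() for _ in range(4)]
events = [
    {"visitor_id": "v1", "ts": 10, "page": "/home"},
    {"visitor_id": "v2", "ts": 11, "page": "/pricing"},
    {"visitor_id": "v1", "ts": 12, "page": "/cart"},
    {"visitor_id": "v1", "ts": 11, "page": "/late"},  # arrives after ts=12
]
results = [tasks[task_for(e["visitor_id"], len(tasks))].handle(e) for e in events]
v1_doc = tasks[task_for("v1", len(tasks))].documents["v1"]
```

Because the cache is local to one task, this only works when the grouping is sticky per visitor; with shuffle grouping the same visitor's events would land on different tasks and the "last seen" check would be meaningless.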

Page 21: Real Time Data Streaming using Kafka & Storm

Visitors List Topology

Page 22: Real Time Data Streaming using Kafka & Storm

European Roller / Coracias garrulus

Amir Silberman

Page 23: Real Time Data Streaming using Kafka & Storm

2nd Storm Use Case: “Agent State”

Use case:

● Show Agent activity on “Agent Console”

● Count Agent statistics

● Display graphs

Page 24: Real Time Data Streaming using Kafka & Storm

Agent Status Topology

Page 25: Real Time Data Streaming using Kafka & Storm

Selected Analytics DB - Cassandra

2nd Storm Use Case: “Agent State”

● Wide Column Store DB

● Highly Available w/o Single point of failure

● High throughput

● Optimized for counters

Page 26: Real Time Data Streaming using Kafka & Storm

“Agent Status” Topology: Analytics DB: Cassandra - Wide column store

[Diagram: Kafka events stream → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → (emit) Send events → Add → Cassandra]

Data visualization using Highcharts

Page 27: Real Time Data Streaming using Kafka & Storm

Agent Status - Storm considerations

● Counters stored by topology

● Calculations done after reading from DB

● Delayed events should not be ignored

● Order of events does not matter

● Using Highcharts for data visualization
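A small sketch of why event order does not matter for this use case: incrementing counters is commutative, so late or shuffled events still produce the same totals, and derived statistics can be computed when the counters are read. `collections.Counter` stands in for the counter store, and the agent IDs, state names, and `chat_share` function are made up for illustration.

```python
import random
from collections import Counter


def apply_events(events):
    """Increment one counter per (agent, state) pair, as the topology would."""
    counters = Counter()
    for agent_id, state in events:
        counters[(agent_id, state)] += 1
    return counters


def chat_share(counters, agent_id):
    """Read-side calculation: this agent's share of all started chats."""
    total = sum(n for (_, state), n in counters.items() if state == "chat_started")
    return counters[(agent_id, "chat_started")] / total if total else 0.0


events = [
    ("agent-1", "online"),
    ("agent-1", "chat_started"),
    ("agent-2", "online"),
    ("agent-2", "chat_started"),
    ("agent-1", "chat_started"),
]

in_order = apply_events(events)

shuffled = events[:]
random.Random(7).shuffle(shuffled)  # simulate delayed, out-of-order arrival
out_of_order = apply_events(shuffled)
```

Both runs end with identical counters, which is why delayed events can simply be counted instead of being dropped as in the Visitors List topology.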

Page 28: Real Time Data Streaming using Kafka & Storm

Spur-winged Lapwing / Vanellus spinosus

Amir Silberman

Page 29: Real Time Data Streaming using Kafka & Storm

3rd Storm Use Case: Data Auditing

Use case:

● Need to be able to tell whether events arrived

○ Were there any missing events?

○ Were there any duplicated events?

○ How long did it take for events to arrive?

● Data not important - only count of events
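Since only counts matter, the audit questions can be answered by comparing how many events a producer says it sent with how many were seen downstream. The sketch below assumes counts are kept per time bucket; the names `AuditRow` and `audit`, and the bucket labels, are hypothetical and not taken from the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRow:
    bucket: str    # e.g. a one-minute window taken from the event header
    sent: int      # count reported by the producer
    received: int  # count observed downstream

    @property
    def missing(self) -> int:
        return max(self.sent - self.received, 0)

    @property
    def duplicated(self) -> int:
        return max(self.received - self.sent, 0)


def audit(sent_counts: dict, received_counts: dict) -> list:
    """Compare producer and consumer counts for every bucket seen on either side."""
    buckets = sorted(set(sent_counts) | set(received_counts))
    return [
        AuditRow(b, sent_counts.get(b, 0), received_counts.get(b, 0))
        for b in buckets
    ]


rows = audit(
    {"12:00": 100, "12:01": 100},
    {"12:00": 98, "12:01": 103},
)
for row in rows:
    print(row.bucket, "missing:", row.missing, "duplicated:", row.duplicated)
```

Counts alone cannot tell a loss and a duplicate apart when both happen in the same bucket, since they cancel out. Measuring how long events took to arrive would additionally need a creation timestamp in each event.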

Page 30: Real Time Data Streaming using Kafka & Storm

3rd Storm Use Case: Data Auditing

[Diagram labels: Realtime server, Kafka Topics, Auditing Topic, Storm Sync topology, Audit-loader topology, MySQL, Hadoop, HDFS, audit job, Auditor; steps numbered 1 to 4]

Page 31: Real Time Data Streaming using Kafka & Storm

“Sync Audit” Topology: Sync messages between two topics

[Diagram: Kafka events stream → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → (emit) Send events → Add → Kafka Audit topic]

Page 32: Real Time Data Streaming using Kafka & Storm

“Load Audit” Topology: Analytics DB: MySQL - RDBMS

[Diagram: Kafka Audit topic → Kafka Spout → (emit) Parse Avro into tuple → (emit) Analyze relevant events → (emit) Send events → Add → MySQL → Auditing Report]

Page 33: Real Time Data Streaming using Kafka & Storm

“Load Audit” Topology:

● Stores statistics of events count

● SQL type DB

● Used for Auditing and other statistics

● Requires metadata in events header

Page 34: Real Time Data Streaming using Kafka & Storm

Challenges:

● High network traffic

● Writing to Kafka is faster than reading

● All topologies read all events

● How to avoid resource starvation in Storm

Subalpine Warbler / Sylvia cantillans

Amir Silberman

Page 35: Real Time Data Streaming using Kafka & Storm

Optimizations of Kafka

● Increase Kafka consuming rate by adding partitions

● Run on physical machines with RAID

● Set retention according to actual need

● Monitor data flow!
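Adding partitions helps because, within one consumer group, Kafka assigns each partition to at most one consumer, so the partition count caps read parallelism. A rough sizing sketch follows; the 33 MB/sec target comes from the Cyber Monday inflow figure in this deck, while the 5 MB/sec per-consumer rate is a made-up placeholder that would have to be measured on a real cluster.

```python
import math


def partitions_needed(target_mb_per_sec: float, per_consumer_mb_per_sec: float) -> int:
    """Smallest partition count that lets consumers keep up with the target rate."""
    if per_consumer_mb_per_sec <= 0:
        raise ValueError("per-consumer rate must be positive")
    return math.ceil(target_mb_per_sec / per_consumer_mb_per_sec)


# 33 MB/sec is from the slides; 5 MB/sec per consumer is a hypothetical value.
needed = partitions_needed(33, 5)
print(needed, "partitions")
```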

Page 36: Real Time Data Streaming using Kafka & Storm

Optimizations of Storm

● # of Kafka-Spouts = number of total partitions

● Set “Isolation mode” for important topologies

● Validate Network cards can carry network traffic

● Set Storm cluster on high CPU machines

● Monitor servers CPU & Memory (Graphite)

● Assess min. #Cores that topology needs

○ Use “top” -> “load” to find server load
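The first tip above, matching the number of Kafka spouts to the total number of partitions, can be illustrated with a simple assignment sketch. It assumes partitions are dealt out round-robin across spout tasks, which is an illustrative simplification and not necessarily how the Kafka spout assigns them; `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(num_partitions: int, num_spout_tasks: int) -> dict:
    """Deal partitions out to spout tasks round-robin (illustrative assumption)."""
    assignment = {task: [] for task in range(num_spout_tasks)}
    for partition in range(num_partitions):
        assignment[partition % num_spout_tasks].append(partition)
    return assignment


matched = assign_partitions(6, 6)    # one partition per spout task
too_few = assign_partitions(6, 4)    # some tasks read two partitions
too_many = assign_partitions(6, 8)   # two tasks have nothing to read

idle_tasks = [task for task, parts in too_many.items() if not parts]
print("idle spout tasks:", idle_tasks)
```

With fewer spouts than partitions, some tasks carry a double load; with more, the extra tasks sit idle while still occupying executors.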

Page 37: Real Time Data Streaming using Kafka & Storm

Demo

● Agent Console - https://z1.le.liveperson.net/

71394613 / [email protected]

● My Site - http://birds-of-israel.weebly.com/

Page 38: Real Time Data Streaming using Kafka & Storm

Questions?

Little Owl / Athene noctua

Amir Silberman

Page 39: Real Time Data Streaming using Kafka & Storm

Thank you!

Ruff / Philomachus pugnax

Amir Silberman

