Online Media Data Stream Processing with Kafka

Date post: 14-Dec-2014
Upload: christian-guegi
CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
Transcript
Page 1: Online Media Data Stream Processing with Kafka


Overview

•  What is Streaming Data?
•  Why Kafka?
•  Kafka Architecture
•  Use Case: Prospective Search

18. September 2012

About Sentric

•  Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland

•  Big Data expert, focused on Hadoop, HBase and Solr

•  Objective: Transforming data into insights

CC 2.0 by audreyjm529| http://flic.kr/p/mNMtL  

Data Streams

•  Website Activity Data
   •  User activity
   •  Server activity
•  Social Media Data
•  News Data
•  …
•  How to analyze in real time?

What is Streaming Data?

Offline vs. Online

[Figure: timeline (t) with a "now" marker, contrasting Offline (Hadoop/MR) with Online (Kafka)]

CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy  

Streaming Systems

•  Message Queues (RabbitMQ, ActiveMQ)
   •  Do not scale / have no persistence
•  Flume / Scribe
   •  Log aggregation only, high throughput and scalable, push model
   •  Focus on offline consumption
•  Kafka
   •  High throughput and scalable, pull model
   •  Different consumption profiles

Why Kafka?

Consumer Performance

Source: http://[email protected]/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

CC 2.0 by Presidente | http://flic.kr/p/2ptSZ  

Key Concepts

•  Messaging System
•  Publish-Subscribe
•  Persistent
•  High-Throughput

Kafka Architecture

Messaging

[Diagram: several Producers push messages to the Broker; several Consumers pull messages from the Broker; ZooKeeper is shown alongside]

Publish-Subscribe

[Diagram: the Topics "logs" and "page-views" hold messages (Msg); three Consumers subscribe to the topics]
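
To make the publish-subscribe idea concrete, here is a minimal sketch in Python. It is not Kafka's API: the class `TopicBroker`, its methods, and the sample messages are hypothetical, and messages are addressed by list index rather than by byte position to keep the example short. It shows that messages are grouped by topic and that every subscriber of a topic sees every message, because each consumer keeps its own read position.

```python
from collections import defaultdict


class TopicBroker:
    """Toy broker (hypothetical): one in-memory message list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Producer side: append a message to the named topic."""
        self._topics[topic].append(message)

    def pull(self, topic, position):
        """Consumer side: return all messages of a topic from position onwards."""
        return self._topics[topic][position:]


broker = TopicBroker()
broker.publish("logs", "disk full")
broker.publish("page-views", "/home")
broker.publish("page-views", "/news")

# Two independent subscribers of the same topic, each with its own position.
alerts_seen = broker.pull("page-views", 0)
stats_seen = broker.pull("page-views", 0)
log_seen = broker.pull("logs", 0)
```

Both subscribers of "page-views" receive the same two messages, while the "logs" subscriber only sees the log message.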

Persistent

•  Persists messages to disc
•  Topic is base abstraction
•  Binary write-ahead log
•  No message ID
•  Message offset ID (byte position)
•  Messages retained for a specific time
•  Default is 7 days
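
As a rough illustration of the points above, the following Python sketch models a topic as an append-only binary log in which a message has no separate ID and is addressed by its byte position. The class name `ByteLog`, the 4-byte length prefix, and the in-memory `bytearray` (standing in for a file on disc) are assumptions made for this example and do not reproduce Kafka's actual on-disc format. Time-based retention is left out.

```python
import struct


class ByteLog:
    """Toy append-only log; a message is identified by its byte offset."""

    HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian length

    def __init__(self):
        self._data = bytearray()  # stands in for a log file on disc

    def append(self, payload: bytes) -> int:
        """Append one message and return the byte offset where it starts."""
        offset = len(self._data)
        self._data += self.HEADER.pack(len(payload)) + payload
        return offset

    def read(self, offset: int):
        """Return (payload, next_offset) for the message starting at offset."""
        (length,) = self.HEADER.unpack_from(self._data, offset)
        start = offset + self.HEADER.size
        return bytes(self._data[start:start + length]), start + length


log = ByteLog()
first = log.append(b"page-view:/home")
second = log.append(b"page-view:/news")
payload, next_offset = log.read(first)
```

Reading a message also yields the offset of the next one, so a reader can walk the log sequentially without any index of message IDs.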

High-Throughput

•  API Simplicity
   •  Append message
   •  Fetch message from given byte position
•  Batching
•  Stateless Broker
•  O(1) disc access (no seeks)
•  Use of operating system features
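
The sketch below illustrates, under simplifying assumptions, how a two-call API (append, fetch from a byte position) combines with batching and a stateless broker. The classes `Broker` and `Consumer`, the newline framing, and the `max_bytes` parameter are hypothetical and chosen for brevity; they are not Kafka's API. The point is that the broker stores only the log, while each consumer remembers its own offset and fetches several messages per request.

```python
class Broker:
    """Toy broker: keeps only the log, no per-consumer state."""

    def __init__(self):
        self._log = b""

    def append(self, message: bytes) -> None:
        # Newline framing is an assumption that keeps the sketch short.
        self._log += message + b"\n"

    def fetch(self, offset: int, max_bytes: int) -> bytes:
        """Return up to max_bytes of whole messages starting at offset."""
        chunk = self._log[offset:offset + max_bytes]
        end = chunk.rfind(b"\n") + 1  # drop a trailing partial message
        return chunk[:end]


class Consumer:
    """The consumer, not the broker, remembers how far it has read."""

    def __init__(self, broker: Broker):
        self.broker = broker
        self.offset = 0

    def poll(self, max_bytes: int = 16) -> list:
        batch = self.broker.fetch(self.offset, max_bytes)
        self.offset += len(batch)  # advance by the bytes actually consumed
        return batch.split(b"\n")[:-1] if batch else []


broker = Broker()
for msg in (b"click:1", b"click:2", b"click:3"):
    broker.append(msg)

consumer = Consumer(broker)
first_batch = consumer.poll()
second_batch = consumer.poll()
```

With a 16-byte limit the first poll returns two messages in one batch and the second poll returns the remaining one. In this toy version a message longer than `max_bytes` would never be returned, so a real design would need a larger fetch size or a different framing.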

CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf

Solution Architecture

Prospective Search

[Diagram: solution architecture with n News Agents, REST, Kafka, HBase, MySQL, Solr, Web-UI and RT Alerts]

Icons by http://dryicons.com

Prospective Search with Kafka

[Diagram: Processing stage in which a Kafka Consumer pulls messages (Pull, Batch) and feeds Prospective Search, which produces RT Alerts]
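
A minimal sketch of this processing step in Python, assuming the usual meaning of prospective search: queries are stored up front and every incoming document is matched against them. The function names, the AND-of-terms matching rule, the stored queries, and the sample documents are all hypothetical; the slides do not describe the actual matching logic. The `batch` list stands in for messages that a consumer has pulled from Kafka.

```python
def matches(query_terms, text):
    """True if every query term occurs as a word in the text (assumed AND semantics)."""
    words = set(text.lower().split())
    return all(term in words for term in query_terms)


def process_batch(batch, stored_queries):
    """Match each pulled document against all stored queries; return alerts."""
    alerts = []
    for doc in batch:
        for name, terms in stored_queries.items():
            if matches(terms, doc):
                alerts.append((name, doc))
    return alerts


# Hypothetical stored queries: name -> set of lower-case terms.
stored_queries = {
    "kafka-news": {"kafka"},
    "swiss-big-data": {"swiss", "data"},
}

# Hypothetical batch of documents pulled by the consumer.
batch = [
    "Kafka meetup announced",
    "Swiss user group discusses big data",
    "Weather stays sunny",
]

alerts = process_batch(batch, stored_queries)
```

Each returned pair names the stored query that fired and the document that triggered it, which is the information a real-time alert would carry.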

Resources to get started

•  http://incubator.apache.org/kafka/
•  http://sites.computer.org/debull/A12june/A12JUN-CD.pdf

Thank you!

Questions? Christian Gügi, [email protected]

Swiss Big Data User Group
