Date post: 14-Dec-2014 | Category: Technology | Upload: christian-guegi
Overview
• What is Streaming Data?
• Why Kafka?
• Kafka Architecture
• Use Case: Prospective Search
18. September 2012
About Sentric
• Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland
• Big Data experts focused on Hadoop, HBase and Solr
• Objective: Transforming data into insights
Data Streams
• Website Activity Data
  • User activity
  • Server activity
• Social Media Data
• News Data
• …
• How to Analyze in Real Time?
What is Streaming Data?
Offline vs. Online
[Figure: timeline (t) up to "now": offline processing (Hadoop/MR) vs. online processing (Kafka)]
Streaming Systems
• Message Queues (RabbitMQ, ActiveMQ)
  • Limited scalability; not designed to persist large message backlogs
• Flume / Scribe
  • Log aggregation only; high throughput and scalable; push model
  • Focus on offline consumption
• Kafka
  • High throughput and scalable; pull model
  • Supports different consumption profiles (real-time and offline consumers)
Why Kafka?
Consumer Performance
Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Key Concepts
• Messaging System
• Publish-Subscribe
• Persistent
• High-Throughput
Kafka Architecture
Messaging
[Figure: several producers push messages to the broker; several consumers pull messages from it; ZooKeeper is used for coordination]
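The push/pull split on this slide can be sketched as a toy in-memory broker. This is not Kafka's client API: `Broker`, `push`, `pull` and `Consumer` are hypothetical names, and a Python list stands in for the broker's storage. The point is that producers only append, while each consumer pulls at its own pace and remembers its own position.

```python
class Broker:
    """Hypothetical in-memory broker: producers push, consumers pull."""

    def __init__(self) -> None:
        self._messages: list[str] = []  # messages in arrival order

    def push(self, message: str) -> None:
        # Push side: the broker only appends; it never sends to consumers.
        self._messages.append(message)

    def pull(self, position: int, max_messages: int) -> list[str]:
        # Pull side: return up to max_messages starting at the caller's position.
        return self._messages[position:position + max_messages]


class Consumer:
    """Hypothetical consumer that tracks its own read position."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.position = 0  # kept by the consumer, not by the broker

    def poll(self, max_messages: int = 10) -> list[str]:
        batch = self.broker.pull(self.position, max_messages)
        self.position += len(batch)
        return batch


broker = Broker()
for i in range(5):
    broker.push(f"event-{i}")

fast, slow = Consumer(broker), Consumer(broker)
print(fast.poll(10))  # all five events
print(slow.poll(2))   # ['event-0', 'event-1']; the rest stays available
```

Because the consumer decides when and how much to fetch, a slow consumer is not flooded by fast producers; it simply falls behind and catches up later.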
Publish-Subscribe
[Figure: messages (Msg) are organized into topics such as "logs" and "page-views"; several consumers subscribe to the topics]
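Topics and fan-out can be illustrated with a similar toy. Each topic is a separate message list, and unlike a classic queue, reading a message does not consume it, so several subscribers to the same topic each receive every message. The class and method names are hypothetical; only the topic names come from the diagram.

```python
class TopicBroker:
    """Hypothetical publish-subscribe broker keeping one message list per topic."""

    def __init__(self) -> None:
        self._topics: dict[str, list[str]] = {}

    def publish(self, topic: str, message: str) -> None:
        self._topics.setdefault(topic, []).append(message)

    def read(self, topic: str, position: int) -> list[str]:
        # Reading deletes nothing, so every subscriber sees every message.
        return self._topics.get(topic, [])[position:]


class Subscriber:
    """Hypothetical subscriber bound to a single topic."""

    def __init__(self, broker: TopicBroker, topic: str) -> None:
        self.broker = broker
        self.topic = topic
        self.position = 0  # each subscriber has its own position

    def poll(self) -> list[str]:
        batch = self.broker.read(self.topic, self.position)
        self.position += len(batch)
        return batch


broker = TopicBroker()
broker.publish("page-views", "view /home")
broker.publish("logs", "GET /home 200")
broker.publish("page-views", "view /about")

analytics = Subscriber(broker, "page-views")
archiver = Subscriber(broker, "page-views")
monitor = Subscriber(broker, "logs")
print(analytics.poll())  # ['view /home', 'view /about']
print(archiver.poll())   # ['view /home', 'view /about']
print(monitor.poll())    # ['GET /home 200']
```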
Persistent
• Persists messages to disk
  • Topic is the base abstraction
• Binary write-ahead log
  • No separate message ID
  • A message is addressed by its offset (byte position)
• Messages are retained for a specific time
  • Default is 7 days
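Addressing messages by byte position instead of by ID can be sketched as follows. The length-prefixed framing, the names `Log` and `segment_expired`, and the per-segment creation timestamp are assumptions for illustration, not Kafka's actual file format; a `bytearray` stands in for the file on disk.

```python
import struct

# Assumed framing: a 4-byte big-endian length prefix, then the payload.
HEADER = struct.Struct(">I")
RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days, the default named on the slide


class Log:
    """Toy append-only log in which a message's ID is its byte offset."""

    def __init__(self) -> None:
        self._data = bytearray()  # stands in for the log file on disk

    def append(self, payload: bytes) -> int:
        offset = len(self._data)  # current end of the log = new message's offset
        self._data += HEADER.pack(len(payload)) + payload
        return offset

    def read(self, offset: int) -> tuple[bytes, int]:
        """Return the message at `offset` and the offset of the next one."""
        (length,) = HEADER.unpack_from(self._data, offset)
        start = offset + HEADER.size
        return bytes(self._data[start:start + length]), start + length


def segment_expired(created_at: float, now: float,
                    retention: float = RETENTION_SECONDS) -> bool:
    """Time-based retention: a whole segment is dropped once it is too old."""
    return now - created_at > retention


log = Log()
first = log.append(b"hello")
second = log.append(b"kafka")
print(first, second)     # 0 9
print(log.read(second))  # (b'kafka', 18)
```

In this sketch the offset is simply where a message starts, so no table mapping message IDs to positions is needed, and the second value returned by `read` is where the next read begins.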
High-Throughput
• API Simplicity
  • Append message
  • Fetch messages from a given byte position
• Batching
• Stateless Broker
• O(1) disk access (no seeks)
• Use of operating system features (page cache, sendfile zero-copy transfer)
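The two API calls and batching can be combined in a small sketch: a fetch starts at a byte position supplied by the consumer and returns as many whole messages as fit into a size limit, together with the position for the next fetch. Because the consumer carries its own position, the broker keeps no per-consumer state. The 4-byte length-prefix framing and the names `encode`, `fetch` and `max_bytes` are assumptions made for this illustration.

```python
import struct

HEADER = struct.Struct(">I")  # assumed 4-byte big-endian length prefix per message


def encode(messages: list[bytes]) -> bytes:
    """Frame messages back to back, as one appended batch (sketch)."""
    return b"".join(HEADER.pack(len(m)) + m for m in messages)


def fetch(log: bytes, offset: int, max_bytes: int) -> tuple[list[bytes], int]:
    """Return the whole messages starting at `offset` that fit in `max_bytes`.

    The caller supplies the offset and receives the offset for its next
    fetch, so the broker does not need to remember anything per consumer.
    """
    messages: list[bytes] = []
    pos = offset
    while pos + HEADER.size <= len(log):
        (length,) = HEADER.unpack_from(log, pos)
        end = pos + HEADER.size + length
        if end > len(log) or end - offset > max_bytes:
            break  # next message is incomplete or would exceed the batch size
        messages.append(log[pos + HEADER.size:end])
        pos = end
    return messages, pos


log = encode([b"a", b"bb", b"ccc"])  # frames of 5, 6 and 7 bytes
batch, next_offset = fetch(log, 0, max_bytes=11)
print(batch, next_offset)  # [b'a', b'bb'] 11
```

One consequence of this design: if `max_bytes` is smaller than a single framed message, the fetch returns nothing and the consumer makes no progress, so the limit has to be at least the largest message size.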
Solution Architecture
Prospective Search
[Figure: solution architecture connecting n news agents, Kafka, HBase, Solr, MySQL, a REST interface, a web UI and real-time (RT) alerts]
Prospective Search with Kafka
[Figure: processing with a Kafka consumer that pulls messages in batches, runs the prospective search and emits real-time (RT) alerts]
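Prospective search inverts the usual search setup: the queries are stored in advance, and each new document is matched against all of them as it arrives. Below is a minimal sketch of the matching step, assuming a query is a set of terms that must all occur in a document. The slides do not say how queries are expressed, so `StoredQuery`, `tokenize` and `match_batch` are hypothetical, and the batch is a plain list standing in for what the Kafka consumer pulls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical saved search: matches when all terms occur in a document."""
    query_id: str
    terms: frozenset[str]


def tokenize(text: str) -> set[str]:
    # Naive tokenizer: split on whitespace, strip punctuation, lowercase.
    return {token.strip(".,!?;:").lower() for token in text.split()}


def match_batch(queries: list[StoredQuery],
                documents: list[str]) -> list[tuple[str, int]]:
    """Run every stored query against each document of a pulled batch.

    Returns (query_id, document index) pairs, i.e. candidate RT alerts.
    """
    alerts: list[tuple[str, int]] = []
    for index, document in enumerate(documents):
        tokens = tokenize(document)
        for query in queries:
            if query.terms <= tokens:  # all query terms present
                alerts.append((query.query_id, index))
    return alerts


queries = [
    StoredQuery("q1", frozenset({"kafka"})),
    StoredQuery("q2", frozenset({"hadoop", "zurich"})),
]
batch = ["New Kafka release!", "Hadoop meetup in Zurich", "Nothing relevant"]
print(match_batch(queries, batch))  # [('q1', 0), ('q2', 1)]
```

This brute-force loop only illustrates the data flow from a pulled batch to alerts; it compares every stored query with every document in the batch.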
Resources to get started
• http://incubator.apache.org/kafka/
• http://sites.computer.org/debull/A12june/A12JUN-CD.pdf
Thank you!
Questions? Christian Gügi, [email protected]
Swiss Big Data User Group