Data streaming-systems

Date post: 20-Mar-2017
Upload: imcpune
Page 1: Data streaming-systems

Every ad. Every sales channel. Every screen. One platform.

Building Distributed Data Streaming System

Ashish Tadose

Lead Software Engineer, Big Data Analytics - PubMatic

Page 2: Data streaming-systems

Agenda

• What is stream processing

• Streaming architecture

• Scalable Data Ingestion

• Real-time stream processing system

Page 3: Data streaming-systems

What is Stream Processing?

Page 4: Data streaming-systems

In simple words, Streaming is…

Page 5: Data streaming-systems

Batch & Streaming processing

• Batch processing: Data Generator → Ingestion → Distributed File system → Processing → Data Store

• Stream data processing: Data Generator → Ingestion → Message Queue → Processing → Data Store
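The difference between the two pipelines can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names and the sample events are assumptions, a plain list stands in for the distributed file system, and an iterator stands in for the message queue.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_count(stored_events: List[str]) -> Counter:
    """Batch: the full data set is already at rest, so it is processed in one pass."""
    return Counter(stored_events)


def stream_count(queue: Iterable[str]) -> Iterator[Counter]:
    """Stream: the result is updated as each event arrives from the queue."""
    running: Counter = Counter()
    for event in queue:
        running[event] += 1
        yield Counter(running)  # a snapshot is available after every event


events = ["click", "view", "click"]  # hypothetical sample events

print(batch_count(events))
for snapshot in stream_count(iter(events)):
    print(snapshot)
```

Both paths end with the same totals; the streaming path simply makes intermediate results available before all the data has arrived.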

Page 6: Data streaming-systems

Batch & Streaming processing

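Combined pipeline (diagram): a single Data Generator and Ingestion stage feeds both paths.

• Stream data processing: Data Generator → Ingestion → Message Queue → Processing → Data Store

• Batch processing: Data Generator → Ingestion → Distributed File system → Processing → Data Store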

Page 8: Data streaming-systems

Lambda Architecture: Velocity & Volume
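As a rough sketch of the idea behind this slide, a Lambda-style query can be thought of as merging a batch view (volume: everything in the historical data set) with a speed view (velocity: events that arrived after the last batch run). The function names, the `(ad, timestamp)` event shape, and the sample data below are assumptions made for illustration only.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute totals from the full stored history (volume)."""
    return Counter(ad for ad, _ts in master_dataset)


def speed_view(recent_events):
    """Speed layer: incremental totals for events the batch run has not seen yet (velocity)."""
    view = Counter()
    for ad, _ts in recent_events:
        view[ad] += 1
    return view


def query(ad, batch, speed):
    """Serving side: answer a query by merging both views."""
    return batch[ad] + speed[ad]


history = [("ad-1", 1), ("ad-2", 2), ("ad-1", 3)]  # already processed by the batch job
recent = [("ad-1", 4)]                             # arrived after the last batch run

total = query("ad-1", batch_view(history), speed_view(recent))
print(total)  # 2 from the batch view + 1 from the speed view = 3
```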

Page 9: Data streaming-systems

Streaming Ingestion Technologies

Page 10: Data streaming-systems

Ingestion Ecosystem

• Sources

    • Machine data

    • External stream & syslogs

• Data Collection

    • Flume

    • Kafka

    • Kinesis

    • Confluent

Page 11: Data streaming-systems

Flume

• Easier to set up

• Rich set of built-in tools

• No inherent support for data replication

• Nodes work in isolation

• Memory channel vs. file channel
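The memory channel vs. file channel trade-off comes down to speed against durability: events buffered in memory are lost if the agent process dies, while events written to disk survive a restart. The toy classes below are hypothetical and are not the Flume API; they only mimic that behaviour, with a JSON-lines file standing in for the file channel's on-disk storage.

```python
import json
import os
import tempfile
from collections import deque


class MemoryChannel:
    """Toy in-memory buffer: fast, but its contents vanish with the process."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take_all(self):
        events = list(self._events)
        self._events.clear()
        return events


class FileChannel:
    """Toy disk-backed buffer: slower, but events survive a restart."""

    def __init__(self, path):
        self._path = path

    def put(self, event):
        with open(self._path, "a", encoding="utf-8") as log:
            log.write(json.dumps(event) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the write to disk before returning

    def take_all(self):
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as log:
            events = [json.loads(line) for line in log]
        os.remove(self._path)
        return events


log_path = os.path.join(tempfile.mkdtemp(), "channel.log")
memory, durable = MemoryChannel(), FileChannel(log_path)
for event in ({"id": 1}, {"id": 2}):
    memory.put(event)
    durable.put(event)

# Simulate an agent restart: in-process state is gone, files on disk remain.
lost = MemoryChannel().take_all()
recovered = FileChannel(log_path).take_all()
print(lost, recovered)
```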

Page 12: Data streaming-systems

Kinesis

Page 13: Data streaming-systems

Kafka

http://kafka.apache.org/

Originated at LinkedIn, open sourced in early 2011

Implemented in Scala, some Java

9 core committers, plus ~ 20 contributors

Page 14: Data streaming-systems

Why is Kafka so fast?

• Fast writes:

• While Kafka persists all data to disk, essentially all writes go to the page cache of the OS, i.e. RAM.

• Fast reads:

• Very efficient to transfer data from page cache to a network socket

• Linux: sendfile() system call

• Combination of the two = fast Kafka!

• Example (Operations): On a Kafka cluster where the consumers are mostly caught up, you will see no read activity on the disks, as the brokers will be serving data entirely from cache.

http://kafka.apache.org/documentation.html#persistence
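The fast-read path can be illustrated with Python's `socket.sendfile()`, which uses the `sendfile()` system call where the platform supports it and falls back to ordinary `send()` otherwise. This is a minimal sketch of the technique, not Kafka's own code: the helper name `send_log_segment`, the sample payload, and the local socket pair standing in for a broker-to-consumer connection are all assumptions.

```python
import os
import socket
import tempfile


def send_log_segment(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected socket and return the bytes sent.

    Where os.sendfile() is available, the kernel moves the bytes from the
    page cache to the socket without copying them through a user-space buffer.
    """
    with open(path, "rb") as segment:
        return conn.sendfile(segment)


# A temporary file stands in for an on-disk log segment.
payload = b"message-1\nmessage-2\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    segment_path = tmp.name

broker_side, consumer_side = socket.socketpair()
try:
    sent = send_log_segment(segment_path, broker_side)
    broker_side.close()  # signal end of stream to the reader

    chunks = []
    while True:
        chunk = consumer_side.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    received = b"".join(chunks)
finally:
    broker_side.close()
    consumer_side.close()
    os.remove(segment_path)

print(sent, received == payload)
```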

Page 15: Data streaming-systems

Flafka – Flume meets Kafka

Page 16: Data streaming-systems

Confluent - Centralized Ingestion with Kafka Pipeline

Page 17: Data streaming-systems

Stream Processing

Page 18: Data streaming-systems

Real-Time Stream Processing

• Processing system

    • Apache Storm

    • Apache Samza

    • Apache Spark (Streaming)

    • Project Apex - DataTorrent

• Storage

    • Hive / HDFS

    • HBase

    • MySQL

    • Custom

• Access

    • Depends on the data storage

    • Scalable query interface - Kafka

Page 19: Data streaming-systems

Streaming Design Patterns

• Micro batching

• Unpredictable incoming data

• Creating multiple streams

• Out of sequence events

• Stream joins

• Top N metrics

• External Lookup
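Two of the patterns listed above, micro batching and Top N metrics, can be combined in a short sketch. It assumes size-based batches for simplicity (real systems often batch by time interval instead), and the function names and sample ad IDs are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a stream of events into small fixed-size batches."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def top_n(stream: Iterable[str], n: int, batch_size: int = 3) -> List[Tuple[str, int]]:
    """Maintain running counts, updated once per micro-batch, and return the top n."""
    totals: Counter = Counter()
    for batch in micro_batches(stream, batch_size):
        totals.update(batch)  # one state update per batch rather than per event
    return totals.most_common(n)


events = ["ad-1", "ad-2", "ad-1", "ad-3", "ad-1", "ad-2", "ad-4"]
print(top_n(events, n=2))  # [('ad-1', 3), ('ad-2', 2)]
```

Updating state once per batch trades a little latency for fewer writes to whatever store holds the running totals.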

Page 20: Data streaming-systems

Thank You
