
Seattle kafka meetup nov 2015 published siphon

Date post: 11-Feb-2017
Upload: nitin-kumar
Transcript
Page 1: Seattle kafka meetup nov 2015 published  siphon

Siphon - Kafka as DataBus in Microsoft

Nitin Kumar ([email protected]), Dev Manager, Microsoft

https://www.linkedin.com/in/nikuma


Monday, May 1, 2023

Agenda
• Scale: Kafka at Microsoft (Bing, Ads, Office)
• Use Case: NRT Customer facing reports
• Kafka based Streaming Solution
• Collector
• Consumer Restful APIs
• Monitoring: Canary/Audit Trail
• Production Experience
• Key Takeaways


Scale: Kafka at Microsoft (Ads, Bing, Office)

Kafka Brokers: 1000+ across 5 Datacenters
Operating System: Windows Server 2012 R2
Hardware Spec: 12 Cores, 32 GB RAM, 4x2 TB HDD (JBOD), 10 GB Network
Incoming Events: 1 million per sec (90 Billion per day, 100 TB per day)
Outgoing Events: 5 million per sec (1 Trillion per day, 500 TB per day)
Kafka Topics/Partitions: 50+/5000+
Kafka version: 0.8.1.1 (3-way replication)
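
The throughput figures above can be cross-checked with a little arithmetic. The sketch below is not from the talk: it assumes decimal terabytes (the slide does not say which unit is meant) and uses hypothetical helper names. The slide's per-second and per-day numbers are rounded and do not reconcile exactly (5 million events per second sustained for a day is about 432 billion events), so the results are order-of-magnitude only.

```python
# Back-of-the-envelope check of the slide's throughput figures.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TB = 10**12  # assumption: decimal terabytes


def events_per_day(events_per_sec: float) -> float:
    """Daily event count if the per-second rate were sustained all day."""
    return events_per_sec * SECONDS_PER_DAY


def avg_event_bytes(bytes_per_day: float, events_per_day_count: float) -> float:
    """Implied average event size in bytes."""
    return bytes_per_day / events_per_day_count


incoming_daily = events_per_day(1_000_000)       # 86.4 billion, quoted as 90 billion
incoming_size = avg_event_bytes(100 * TB, 90e9)  # roughly 1.1 KB per event
outgoing_size = avg_event_bytes(500 * TB, 1e12)  # 500 bytes per event
read_fan_out = (500 * TB) / (100 * TB)           # bytes out per byte in

print(f"incoming events/day at 1M/sec: {incoming_daily:,.0f}")
print(f"implied incoming event size:  {incoming_size:,.0f} bytes")
print(f"implied outgoing event size:  {outgoing_size:,.0f} bytes")
print(f"outgoing/incoming volume:     {read_fan_out:.0f}x")
```

The 5x ratio between outgoing and incoming volume suggests that each ingested byte is read by several consumers on average.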


Problem

[Diagram: the existing batch pipeline. Labels: Serving System ({Q}, {R}); Log Collection; Sorting / Partitioning; Online Fraud Detection; Feature Extraction; 300 GB/h; 200+ Features; ML Classification; Aggregation; Keyword; Advertiser; Stats 25 TB; Reporting DB. Latencies shown: 1.5 hours and 2.5 hours.]

Motivating question: "What is the click-through rate of my ad that launched at 5pm?"


Goals / Design Considerations

Reduce latency from 4 hours to 15 minutes

99.8% log completeness guarantee

Checkpointing and failure recovery

Exactly-once semantics

Highly available, scalable, and able to support rolling upgrades

Reuse existing C# libraries
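
The slide lists checkpointing and exactly-once processing as goals but does not say how Siphon or StreamScope achieves them. One common pattern, shown here purely as an illustrative sketch, is to save the processing state and the next input offset together in one atomic write, then skip any replayed record whose offset is already covered by the checkpoint. The class name, file format, and record shape are all hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Tuple


class CheckpointedCounter:
    """Counts events per key; state and offset are checkpointed together."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.next_offset = 0
        self.counts: Dict[str, int] = {}
        if os.path.exists(path):
            # Failure recovery: resume from the last durable checkpoint.
            with open(path) as f:
                saved = json.load(f)
            self.next_offset = saved["next_offset"]
            self.counts = saved["counts"]

    def process(self, records: Iterable[Tuple[int, str]]) -> None:
        for offset, key in records:
            if offset < self.next_offset:
                continue  # already reflected in the state, do not count twice
            self.counts[key] = self.counts.get(key, 0) + 1
            self.next_offset = offset + 1

    def checkpoint(self) -> None:
        # Write to a temp file, then rename, so a crash never leaves a
        # half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"next_offset": self.next_offset, "counts": self.counts}, f)
        os.replace(tmp, self.path)
```

After a restart, the input can be replayed from any offset at or before the checkpoint. Records before `next_offset` are ignored, so each record affects the counts exactly once even though it may be delivered more than once.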


Solution

[Diagram: the Kafka-based streaming pipeline. Labels: Serving System ({Q}, {R}); Siphon DataBus; Kafka; Audit; Online Fraud Detection; Feature Extraction; 100MBPS; 200+ Features; ML Classification; Aggregation; Keyword; Advertiser; Stats 25 TB; Reporting DB; StreamScope. Latencies shown: 1-2 sec (minimize latency) and < 15 minutes.]

• Kafka as a distributed queue
• StreamScope as a distributed processing system


Siphon

[Diagram: Siphon deployment across three datacenters (Asia DC, Europe DC, US DC), each labeled with Zookeeper, Canary, and Kafka. Other labels: Collector; Agent; Services Data Pull (Agent); Services Data Push; Device Proxy Services; Consumer API (Push/Pull); Streaming; Batch; Audit Trail. Legend: Open Source, Microsoft Internal, Siphon.]


Collector – Data Ingestion (Producers)

• Http(s) Server
• Restful API with SSL support
• Abstraction from Kafka internals (Partition, Kafka version)
• Throttling, QPS Monitoring
• PII scrubbing
• Load balancing/failover

[Diagram labels: Device Proxy Services; Services Data Push; Services Data Pull (Agent); Agent; Load Balancer; Collector (x3); Kafka Brokers with Broker (x4) holding partitions P0-P2, P3-P5, P6-P8, and P9-P11. Legend: Open Source, Microsoft Internal, Siphon.]
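
The Collector responsibilities listed on this slide (throttling, PII scrubbing, and hiding partition details from producers) can be illustrated with a small sketch. This is not Siphon's implementation: the token-bucket throttle, the e-mail regex used as a stand-in for PII scrubbing, the CRC32 partitioner, and every class and parameter name here are assumptions chosen for illustration. The `send` callback stands in for a real Kafka producer.

```python
import re
import time
import zlib
from typing import Callable

# Assumption: e-mail addresses stand in for whatever PII the real system scrubs.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TokenBucket:
    """Allows `rate_per_sec` requests on average, with bursts up to `burst`."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Collector:
    """Hypothetical ingestion front end; callers never see partitions."""

    def __init__(self, num_partitions: int,
                 send: Callable[[str, int, str], None],
                 bucket: TokenBucket) -> None:
        self.num_partitions = num_partitions
        self.send = send
        self.bucket = bucket

    def ingest(self, topic: str, key: str, body: str) -> int:
        """Handle one event and return an HTTP-style status code."""
        if not self.bucket.allow():
            return 429  # throttled
        clean = EMAIL.sub("[scrubbed]", body)
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.send(topic, partition, clean)
        return 200
```

Because producers only supply a topic, a key, and a body, the partition count or the Kafka version behind the Collector can change without any client changes, which is the abstraction the slide describes.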


Consumer API (Push/Pull)
• Restful Pull API – Simple consumer
• Config driven subscriptions for preconfigured sinks such as HDFS, Cosmos, and ELK

[Diagram labels: Config (ZK); Executor; Kafka .NET Library; Kafka.]

Supported destinations:
• Cosmos
• Elasticsearch
• Kafka
• HDFS
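
A config-driven subscription can be pictured as a mapping from topics to named sinks, with an executor that routes each consumed record according to that mapping. The sketch below is a guess at the general shape, not Siphon's design: the talk stores config in ZooKeeper and uses a .NET Kafka library, whereas here the config is a plain dictionary and the sinks are Python callables. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A sink receives (topic, payload); real sinks would write to HDFS, Cosmos, etc.
Sink = Callable[[str, bytes], None]


class Executor:
    """Routes consumed records to the sinks named in the subscription config."""

    def __init__(self, subscriptions: Dict[str, List[str]],
                 sinks: Dict[str, Sink]) -> None:
        # Fail fast if the config names a sink that is not registered.
        wanted = {name for names in subscriptions.values() for name in names}
        unknown = wanted - set(sinks)
        if unknown:
            raise ValueError(f"unknown sinks in config: {sorted(unknown)}")
        self.subscriptions = subscriptions
        self.sinks = sinks

    def dispatch(self, records: Iterable[Tuple[str, bytes]]) -> int:
        """Deliver each record to every subscribed sink; return delivery count."""
        delivered = 0
        for topic, payload in records:
            for name in self.subscriptions.get(topic, []):
                self.sinks[name](topic, payload)
                delivered += 1
        return delivered


# Usage: adding a destination for a topic is a config change, not a code change.
hdfs_rows: List[bytes] = []
elk_rows: List[bytes] = []
executor = Executor(
    subscriptions={"clicks": ["hdfs", "elk"], "impressions": ["hdfs"]},
    sinks={
        "hdfs": lambda topic, payload: hdfs_rows.append(payload),
        "elk": lambda topic, payload: elk_rows.append(payload),
    },
)
delivered = executor.dispatch([("clicks", b"c1"), ("impressions", b"i1"), ("other", b"x")])
```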


High Level Consumer

Monitoring using Canary, Audit Trail

[Diagram: the Collector ingestion diagram again (Device Proxy Services; Services Data Push; Services Data Pull (Agent); Agent; Load Balancer; Collector (x3); Kafka Brokers with Broker (x4) holding partitions P0-P11), with two additional labels: Synthetic message and Audit Trail.]
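
The slide names two monitoring mechanisms without detailing them. A plausible reading, sketched here under stated assumptions, is that the audit trail compares per-time-bucket counts of events produced and consumed to measure completeness against the 99.8% goal, while the canary injects synthetic messages to measure end-to-end latency. The bucket size, class, and function names below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

BUCKET_SECONDS = 60  # assumption: audit counts are kept per minute


class AuditTrail:
    """Compares produced and consumed counts per (topic, time bucket)."""

    def __init__(self) -> None:
        self.produced: Counter = Counter()
        self.consumed: Counter = Counter()

    @staticmethod
    def _bucket(event_time: float) -> int:
        return int(event_time // BUCKET_SECONDS)

    def record_produced(self, topic: str, event_time: float) -> None:
        self.produced[(topic, self._bucket(event_time))] += 1

    def record_consumed(self, topic: str, event_time: float) -> None:
        self.consumed[(topic, self._bucket(event_time))] += 1

    def completeness(self, topic: str, bucket: int) -> float:
        sent = self.produced[(topic, bucket)]
        if sent == 0:
            return 1.0  # nothing was sent, so nothing is missing
        return min(1.0, self.consumed[(topic, bucket)] / sent)

    def incomplete_buckets(self, threshold: float = 0.998) -> List[Tuple[str, int]]:
        """Buckets whose completeness falls below the threshold."""
        return sorted(key for key in self.produced
                      if self.completeness(*key) < threshold)


def make_canary(now: float) -> Dict[str, object]:
    """Synthetic message stamped with its send time."""
    return {"type": "canary", "sent_at": now}


def canary_latency(message: Dict[str, object], received_at: float) -> float:
    """End-to-end latency in seconds for a canary message."""
    return received_at - float(message["sent_at"])
```

Keying the counts by event time rather than arrival time means late-arriving events still credit the bucket they belong to, so a bucket can be rechecked once stragglers arrive.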


Production Experience

• System in production for 15 months
• End to End Advertiser report latency of 12+ minutes
• Other use cases from Office, Bing
• Integration with other streaming systems – Storm, Spark
• Monitoring using ELK


Key Takeaways
• Scale out with Kafka (50K -> 1M -> multi-million Events Per sec)
• Ability to build tunable Auditing/Monitoring
• Producer/Consumer Restful API provides a nice abstraction
• Config driven Pub/Sub system


“We are Hiring.”

Thank You

Nitin Kumar ([email protected])
https://www.linkedin.com/in/nikuma

