Date post: | 11-Feb-2017 |
Category: | Technology |
Upload: | nitin-kumar |
Siphon - Kafka as DataBus in Microsoft
Nitin Kumar ([email protected]), Dev Manager, Microsoft
https://www.linkedin.com/in/nikuma
Monday, May 1, 2023
Agenda
• Scale: Kafka at Microsoft (Bing, Ads, Office)
• Use case: near-real-time (NRT) customer-facing reports
• Kafka-based streaming solution
• Collector
• Consumer RESTful APIs
• Monitoring: Canary / Audit Trail
• Production experience
• Key takeaways
Scale: Kafka at Microsoft (Ads, Bing, Office)
Kafka brokers: 1000+ across 5 datacenters
Operating system: Windows Server 2012 R2
Hardware spec: 12 cores, 32 GB RAM, 4 x 2 TB HDD (JBOD), 10 GB network
Incoming events: 1 million per sec (90 billion per day, 100 TB per day)
Outgoing events: 5 million per sec (1 trillion per day, 500 TB per day)
Kafka topics / partitions: 50+ / 5000+
Kafka version: 0.8.1.1 (3-way replication)
Problem
"What is the click-through rate of my ad that launched at 5pm?"
[Figure: existing batch pipeline. Stages shown: Serving System ({Q}/{R}), Log Collection, Sorting / Partitioning, Feature Extraction (200+ features), Online Fraud Detection, ML Classification, Aggregation, Reporting DB (Keyword, Advertiser). Annotations: 300 GB/h, Stats 25 TB, stage latencies of 1.5 hours and 2.5 hours.]
Goals / Design Considerations
• Reduce latency from 4 hours to 15 minutes
• 99.8% log completeness guarantee
• Checkpointing and failure recovery
• Exactly-once semantics
• Highly available and scalable, with rolling upgrades
• Reuse existing C# libraries
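One common way to combine the checkpointing and exactly-once goals is to persist the aggregation state together with the next offset to read, so that replayed messages after a failure are recognized and skipped. The sketch below illustrates that idea only; it is not the Siphon or StreamScope implementation. The class name `CheckpointedAggregator`, the JSON checkpoint file, and the single-partition offset model are all assumptions made for illustration.

```python
import json
import os
import tempfile


class CheckpointedAggregator:
    """Hypothetical example: counts events per key for one partition and
    checkpoints the counts together with the next offset to apply."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.next_offset = 0   # first offset not yet reflected in counts
        self.counts = {}       # aggregation state
        self._load()

    def _load(self):
        # Recover state and offset from the last checkpoint, if one exists.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                saved = json.load(f)
            self.next_offset = saved["next_offset"]
            self.counts = saved["counts"]

    def _save(self):
        # Write to a temp file, then rename, so the state and the offset
        # are replaced together rather than one without the other.
        directory = os.path.dirname(os.path.abspath(self.checkpoint_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"next_offset": self.next_offset, "counts": self.counts}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def process(self, offset, key):
        # A replayed message (offset already checkpointed) is ignored,
        # which is what keeps each event counted exactly once.
        if offset < self.next_offset:
            return False
        self.counts[key] = self.counts.get(key, 0) + 1
        self.next_offset = offset + 1
        self._save()
        return True


def demo():
    """Process three events, simulate a restart, then replay with overlap."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        first = CheckpointedAggregator(path)
        for offset, key in [(0, "ad1"), (1, "ad1"), (2, "ad2")]:
            first.process(offset, key)
        # "Restart": a new instance recovers from the checkpoint file.
        second = CheckpointedAggregator(path)
        applied = [second.process(o, k) for o, k in [(1, "ad1"), (2, "ad2"), (3, "ad2")]]
        return second.counts, second.next_offset, applied
```

Checkpointing after every message keeps the sketch short; a real system would more likely checkpoint in batches and accept replaying the tail of the last batch.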
Siphon DataBus
Solution
• Kafka as a distributed queue
• StreamScope as a distributed processing system
[Figure: streaming pipeline. Components shown: Serving System ({Q}/{R}), Kafka, Audit, StreamScope, Feature Extraction (200+ features), Online Fraud Detection, ML Classification, Aggregation, Reporting DB (Keyword, Advertiser). Annotations: 100 MBPS, Stats 25 TB, 1-2 sec (minimize latency), < 15 minutes.]
Siphon
[Figure: Siphon architecture. Three datacenters (Asia DC, Europe DC, US DC), each shown with Kafka, Zookeeper and Canary. Other components shown: Collector, Agent, Device Proxy Services, Services Data Push, Services Data Pull (Agent), Consumer API (Push/Pull), Streaming, Batch, Audit Trail. Legend: Open Source, Microsoft Internal, Siphon.]
Collector – Data Ingestion (Producers)
• HTTP(S) server
• RESTful API with SSL support
• Abstraction from Kafka internals (partition, Kafka version)
• Throttling, QPS monitoring
• PII scrubbing
• Load balancing / failover
[Figure: Collector data path. Sources: Device Proxy Services, Services Data Push, Services Data Pull (Agent). A Load Balancer sits in front of three Collector instances, which write to four Kafka Brokers holding partitions P0-P2, P3-P5, P6-P8 and P9-P11. Legend: Open Source, Microsoft Internal, Siphon.]
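Three of the Collector responsibilities listed above (throttling, PII scrubbing, and hiding partition selection from callers) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Collector's actual code: the `TokenBucket` and `Collector` classes, the email-only scrubbing regex, the CRC32 partitioner, and the in-memory lists standing in for Kafka partitions are all hypothetical. The partition count of 12 matches P0 to P11 in the figure.

```python
import re
import zlib

# Assumption: PII scrubbing is illustrated with email addresses only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TokenBucket:
    """Allows `rate_per_sec` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst, now):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Collector:
    """Hypothetical ingestion front end; lists stand in for Kafka partitions."""

    def __init__(self, num_partitions, rate_per_sec, burst):
        self.num_partitions = num_partitions
        self.rate_per_sec = rate_per_sec
        self.burst = burst
        self.buckets = {}  # one token bucket per client
        self.partitions = [[] for _ in range(num_partitions)]

    def post(self, client_id, key, payload, now):
        """Returns (status, partition); 429 means the client was throttled."""
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.rate_per_sec, self.burst, now)
        if not self.buckets[client_id].allow(now):
            return 429, None
        scrubbed = EMAIL_RE.sub("[redacted]", payload)
        # The caller never sees partitions: the key is hashed to pick one.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[partition].append(scrubbed)
        return 200, partition
```

Passing `now` explicitly keeps the sketch deterministic; a server would read a monotonic clock instead.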
Consumer API (Push/Pull)
• RESTful Pull API – Simple consumer
• Config-driven subscriptions for preconfigured sinks (HDFS, Cosmos, ELK)
Supported destinations
• Cosmos
• Elastic Search
• Kafka
• HDFS
[Figure: components shown: Config (ZK), Executor, Kafka .NET Library, Kafka.]
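A config-driven subscription model like the one described above can be pictured as a table that maps topics to sinks, read by an executor that routes records accordingly. The following is a rough sketch under assumptions: the config shape, the `build_routes` and `dispatch` functions, and the `ListSink` class are hypothetical, and in-memory lists stand in for the real destinations. In Siphon the config is shown as living in Zookeeper; a plain dictionary is used here.

```python
class ListSink:
    """Stand-in for a real destination; it only records what it receives."""

    def __init__(self, kind):
        self.kind = kind
        self.records = []

    def write(self, record):
        self.records.append(record)


# Sink kinds taken from the slide's list of supported destinations.
SUPPORTED_SINKS = {"cosmos", "elasticsearch", "kafka", "hdfs"}


def build_routes(config):
    """Turn a subscription config into a topic -> [sink, ...] routing table."""
    routes = {}
    for subscription in config["subscriptions"]:
        kind = subscription["sink"]
        if kind not in SUPPORTED_SINKS:
            raise ValueError("unsupported sink: " + kind)
        routes.setdefault(subscription["topic"], []).append(ListSink(kind))
    return routes


def dispatch(routes, topic, records):
    """Push each record to every sink subscribed to the topic; returns writes made."""
    writes = 0
    for sink in routes.get(topic, []):
        for record in records:
            sink.write(record)
            writes += 1
    return writes


# Hypothetical config: two sinks subscribe to "clicks", one to "impressions".
EXAMPLE_CONFIG = {
    "subscriptions": [
        {"topic": "clicks", "sink": "hdfs"},
        {"topic": "clicks", "sink": "elasticsearch"},
        {"topic": "impressions", "sink": "cosmos"},
    ]
}
```

Adding a destination for a topic then becomes a config change rather than a code change, which is the property the slide highlights.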
High Level Consumer
Monitoring using Canary, Audit Trail
[Figure: the Collector data path (Device Proxy Services, Services Data Push, Services Data Pull (Agent), Load Balancer, three Collectors, four Kafka Brokers with partitions P0-P11), annotated with "Synthetic message" and "Audit Trail".]
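One plausible reading of the audit trail is a per-time-window comparison of how many events were produced against how many were consumed, checked against the 99.8% completeness goal stated earlier in the deck. The sketch below works under that assumption and is not taken from Siphon itself; the function names, the 60-second window, and the use of raw timestamps as input are all illustrative choices.

```python
from collections import Counter


def bucket_counts(event_timestamps, bucket_seconds=60):
    """Count events per time window, keyed by the window's start time in seconds."""
    return Counter(int(ts // bucket_seconds) * bucket_seconds for ts in event_timestamps)


def completeness_report(produced, consumed, threshold=0.998):
    """For each produced window, return (ratio consumed/produced, meets threshold)."""
    report = {}
    for bucket, sent in sorted(produced.items()):
        received = consumed.get(bucket, 0)
        ratio = received / sent
        report[bucket] = (ratio, ratio >= threshold)
    return report


def demo():
    # Window 0 arrives complete; window 60 loses 3 of 1000 events.
    first_window = [i * 0.05 for i in range(1000)]
    second_window = [60 + i * 0.05 for i in range(1000)]
    produced = bucket_counts(first_window + second_window)
    consumed = bucket_counts(first_window + second_window[:997])
    return completeness_report(produced, consumed)
```

A canary fits the same scheme: synthetic messages injected at a known rate give each window a known expected count even when real traffic is low.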
Production Experience
• System in production for 15 months
• End-to-end advertiser report latency of 12+ minutes
• Other use cases from Office and Bing
• Integration with other streaming systems: Storm, Spark
• Monitoring using ELK
Key Takeaways
• Scale out with Kafka (50K -> 1M -> multi-million events per sec)
• Ability to build tunable auditing / monitoring
• Producer / Consumer RESTful API provides a nice abstraction
• Config-driven Pub/Sub system
Thank You
“We are Hiring.”
Nitin Kumar ([email protected])
https://www.linkedin.com/in/nikuma