© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Peter Bakas, Director of Engineering, Event and Data Pipelines, Netflix
October 2015
BDT318
Netflix Keystone: How Netflix Handles Data Streams Up to 8 Million Events Per Second
Peter Bakas
@ Netflix : Cloud Platform Engineering - Event and Data Pipelines
@ Ooyala : Analytics, Discovery, Platform Engineering & Infrastructure
@ Yahoo : Display Advertising, Behavioral Targeting, Payments
@ PayPal : Site Engineering and Architecture
@ Play : Advisor to various Startups (Data, Security, Containers)
Who is this guy?
What to Expect from the Session
• Architectural design and principles for Keystone
• Current state of technologies that Keystone is leveraging
• Best practices in operating Kafka and Samza
Why are we here?
Publish, Collect, Process, Aggregate & Move Data
@ Cloud Scale
• 550 billion events per day
• 8.5 million events & 21 GB per second during peak
• 1+ PB per day
• Hundreds of event types
By the numbers
How did we get here?
Chukwa
Chukwa/Suro + Real-Time
Now what?!!
Keystone
Split Fronting Kafka Clusters
Normal-priority (majority)
• 2 copies, 12 hour retention
High-priority (streaming activities etc.)
• 3 copies, 24 hour retention
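The two tiers above map naturally onto per-topic Kafka settings. A minimal sketch of that mapping, using only the JDK — `retention.ms` is a standard Kafka topic-level config, while the replication factor is set at topic creation time; the class and field names here are illustrative, not Netflix's:

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

// Sketch of the two fronting-cluster tiers as per-topic settings.
class FrontingTierConfig {
    static Properties tier(int replicas, long retentionHours) {
        Properties p = new Properties();
        // Replication factor is a topic-creation parameter in Kafka.
        p.setProperty("replication.factor", Integer.toString(replicas));
        // retention.ms is a real Kafka topic-level config, in milliseconds.
        p.setProperty("retention.ms",
                Long.toString(TimeUnit.HOURS.toMillis(retentionHours)));
        return p;
    }

    static final Properties NORMAL_PRIORITY = tier(2, 12); // majority of topics
    static final Properties HIGH_PRIORITY   = tier(3, 24); // streaming activities etc.
}
```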
Split Fronting Kafka Clusters
Instance type: d2.xlarge (D2XL)
• Large disk (6 TB) with 450-475 MB/s measured sequential I/O throughput
• Large memory (30 GB)
• Medium network capability (~700 Mbps)
• Replication lag starts to show when inbound traffic exceeds 18 MB/second per broker with thousands of partitions
• A PR is available against Apache Kafka
• https://github.com/apache/kafka/pull/132
• https://issues.apache.org/jira/browse/KAFKA-1215
• Improved availability
• Reduced cost of maintenance
Kafka Zone Aware Replica Assignment
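The idea behind the zone-aware assignment can be sketched in plain Java. This is an illustrative toy, not the actual Kafka PR: each partition's replicas are spread across distinct availability zones (assuming the replication factor does not exceed the zone count), so losing one zone leaves at least one replica alive; all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

// Toy sketch of zone-aware replica placement.
class ZoneAwareAssigner {
    // brokersByZone: zone name -> broker ids in that zone.
    static List<Integer> replicasFor(int partition, int replicationFactor,
                                     SortedMap<String, List<Integer>> brokersByZone) {
        List<String> zones = new ArrayList<>(brokersByZone.keySet());
        List<Integer> replicas = new ArrayList<>();
        for (int r = 0; r < replicationFactor; r++) {
            // Rotate the starting zone by partition so leaders spread evenly;
            // consecutive replicas land in consecutive (distinct) zones.
            String zone = zones.get((partition + r) % zones.size());
            List<Integer> brokers = brokersByZone.get(zone);
            replicas.add(brokers.get(partition % brokers.size()));
        }
        return replicas;
    }
}
```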
Keystone
Control Plane + Data Plane
• The job manager is the control plane for the router
• The infrastructure is the data plane
• Declarative, reconciliation driven
• Smart scheduling managing tradeoffs
• Auto Scaling based on traffic
• Fault tolerance
• Application (router) faults
• AWS hardware faults
Keystone
Routing Service - Samza
Amazon S3 Routing
• ~5800 long-running containers for Amazon S3 routing
• ~500 c3.4xlarge AWS instances for Amazon S3 routing
Elasticsearch Routing
• ~850 long-running containers for Elasticsearch routing
• ~70 c3.4xlarge AWS instances for Elasticsearch routing
Kafka Routing
• ~3200 long-running containers for Kafka routing
• ~280 c3.4xlarge AWS instances for Kafka routing
Routing Service - Samza
Container Footprint:
• 2-5 GB memory
• 160 Mbps max network bandwidth
• 1 CPU share
• 20 GB disk for buffer & logs
• Processes 1-12 partitions
• Periodically reports health to infrastructure
Routing Service - Samza
Observed Numbers:
• Avg memory usage of ~1.8 GB per container
• Avg memory usage of ~20 GB per node (range: 7-25 GB)
• Avg CPU utilization of 8% per node
• Avg NetworkIn 256 Mbps per node
• Avg NetworkOut 156 Mbps per node
Routing Service - Samza
Publish to Amazon S3 sink:
• Every 200 MB or 2 minutes
• Average S3 upload latency: 200 ms
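The size-or-time upload rule above is easy to state precisely. A minimal sketch, with illustrative names and the thresholds from the slide: flush a buffered chunk to the sink when it reaches 200 MB or when 2 minutes have elapsed, whichever comes first.

```java
// Sketch of the S3 sink's size-or-time flush rule.
class FlushPolicy {
    static final long MAX_BYTES = 200L * 1024 * 1024; // 200 MB
    static final long MAX_AGE_MS = 2 * 60 * 1000;     // 2 minutes

    // Flush when either threshold is hit, whichever comes first.
    static boolean shouldFlush(long bufferedBytes, long msSinceLastFlush) {
        return bufferedBytes >= MAX_BYTES || msSinceLastFlush >= MAX_AGE_MS;
    }
}
```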
Producer to Router latency:
• 30th percentile of topics: under 500 ms
• 70th percentile: under 1 sec
• 90th percentile: under 2 sec
• Overall average: about 2.5 seconds
Kafka to Router consumer lag (est. time to catch up):
• 65th percentile: under 500 ms
• 90th percentile: under 5 seconds
+ Alterations
• Internal build of Samza version 0.9.1
• Fixed SAMZA-41 in 0.9.1
• Support static partition range assignment
• Added SAMZA-775 in 0.9.1
• Prefetch buffer specified based on heap to use
• Backported SAMZA-655 to 0.9.1
• Environment variable configuration rewriter
• Backported SAMZA-540 to version 0.9.1
• Expose latency related metrics in OffsetManager
• Integration with Netflix Alert & Monitoring systems
Keystone
Real-time processing
• Streaming jobs to analyze movie plays, A/B tests, etc.
• Direct Kafka API for Spark Streaming, available since Spark 1.3
• Observed 2x performance improvement compared to Spark 1.2
• Additional improvement possible with prefetching and connection pooling (not available yet)
• Campaigned for backpressure support
• Result: the Spark 1.5 release has community-developed backpressure support (SPARK-7398)
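For reference, the Spark 1.5 backpressure feature (SPARK-7398) is enabled with a single configuration flag. Shown below as plain `Properties` rather than a `SparkConf`, to keep the sketch dependency-free; only the config key is real, the class name is illustrative:

```java
import java.util.Properties;

// Sketch: enabling Spark Streaming backpressure (available since Spark 1.5).
class StreamingBackpressure {
    static Properties conf() {
        Properties p = new Properties();
        // Real Spark config key; lets the receiver rate adapt to processing speed.
        p.setProperty("spark.streaming.backpressure.enabled", "true");
        return p;
    }
}
```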
Great. How do I use it?
Annotation-based event definition
@Resource(type = ConsumerStorageType.DB, name = "S3Diagnostics")
public class S3Diagnostics implements Annotatable {
    ....
}

S3Diagnostics s3Diagnostics = new S3Diagnostics();
....
LogManager.logEvent(s3Diagnostics); // log this diagnostic event
Java
{
"eventName" : "ksproxytest",
"payload" : {
"k1" : "v1",
"k2" : "v2"
}
}
Non-Java : Keystone Proxy
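A minimal sketch of assembling the proxy payload shown above using only the JDK; a real non-Java client would typically use a JSON library and POST this to the Keystone proxy (the endpoint details are not part of this deck, and the class name here is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build the {"eventName": ..., "payload": {...}} wire format.
class ProxyEvent {
    static String toJson(String eventName, Map<String, String> payload) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"eventName\":\"").append(eventName).append("\",\"payload\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : payload.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
            first = false;
        }
        return sb.append("}}").toString();
    }
}
```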
Wire format
• Extensible
• Currently supports JSON
• Will support Avro
• Encapsulated as a shareable jar
• Immutable message through the pipeline
Producer Resilience
• An outage should never disrupt existing instances from serving their business purpose
• An outage should never prevent new instances from starting up
• After service is restored, event producing should resume automatically
Fail, but never block
block.on.buffer.full=false
Handle potential blocking of the first metadata request
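These two points can be sketched as producer settings, shown as plain `Properties` to stay dependency-free. `block.on.buffer.full` and `metadata.fetch.timeout.ms` are real 0.8.x/0.9-era Kafka producer configs; with `block.on.buffer.full=false` the producer drops events (raising an exception) instead of stalling the caller when the local buffer fills up. The broker address and timeout value are illustrative:

```java
import java.util.Properties;

// Sketch of "fail, but never block" producer settings.
class ResilientProducerConfig {
    static Properties props(String brokers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", brokers);
        // Drop rather than block when the client-side buffer is full.
        p.setProperty("block.on.buffer.full", "false");
        // Bound the first metadata fetch so startup can't hang indefinitely.
        p.setProperty("metadata.fetch.timeout.ms", "5000");
        return p;
    }
}
```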
Trust me, it works!
Keystone Dashboard
Trust, but verify!
• Broker monitoring
• Alert on offline brokers reported by ZooKeeper
• Consumer monitoring
• Alert on consumer lag, stuck consumers, and unconsumed partitions
• Heart-beating
• Produce/consume messages and measure end-to-end latency
• Broker performance testing
• Produce tens of thousands of messages per second on a single instance
• Create multiple consumer groups to test consumer impact on the broker
Auditor
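The consumer-side checks above reduce to simple offset arithmetic: per-partition lag is the log-end offset minus the committed offset, and a partition with no committed offset at all is "unconsumed". A minimal sketch with an illustrative threshold and hypothetical names:

```java
// Sketch of the auditor's consumer-lag check.
class ConsumerLagCheck {
    // Lag = how far the committed offset trails the log-end offset.
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0, logEndOffset - committedOffset);
    }

    // Alert on unconsumed partitions (no committed offset) or excessive lag.
    static boolean shouldAlert(long logEndOffset, Long committedOffset, long threshold) {
        if (committedOffset == null) return true; // unconsumed partition
        return lag(logEndOffset, committedOffset) > threshold;
    }
}
```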
Auditor - Broker Monitoring
Auditor - Consumer Monitoring
(dashboard charts: consumer offset, consumer lag, stuck consumers, unconsumed partitions)
Meet Winston
New Internal Automation Engine:
• Collect diagnostic information based on alerts & other operational events
• Help services self-heal
• Reduce MTTR
• Reduce pager fatigue
• Improve developer productivity
Winston
How do you like your Kaffee?
Kaffee
What’s next?
• Performance tuning + optimizations
• Self service
• Schemas + registry
• Event discovery + visualization
• Open Source Auditor/Kaffee
Near Term
And then???
Global real-time data stream + stream processing network
Remember to complete your evaluations!
Thank you!