MesosCon Europe 2017
Making and keeping Netflix highly available
Katharina Probst, Engineering Director
By the numbers
● 100+ million customers
● 125+ million hours watched per day
● 380 microservices in production
● 1000+ device types
Mantis overview
[Diagram: micro-service clusters → Mantis]
● Stream processing
● Cloud-native service
● Configurable message delivery guarantees
● Heterogeneous workloads
  ○ Real-time dashboarding and alerting
  ○ Anomaly detection and metric generation
  ○ Interactive exploration of streaming data
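The slide lists configurable message delivery guarantees but does not say how Mantis provides them. As a hypothetical illustration of what a configurable guarantee can look like, the sketch below uses a bounded buffer between two processing stages that, on overflow, either drops the message (at-most-once, favoring latency) or rejects it so the producer has to retry (lossless at this hop, at the cost of backpressure). `Delivery`, `BoundedChannel`, and their methods are illustrative names, not the Mantis API.

```python
from collections import deque
from enum import Enum


class Delivery(Enum):
    DROP_ON_OVERFLOW = "drop"      # at-most-once: shed load, never block the producer
    BACKPRESSURE = "backpressure"  # lossless at this hop: producer must retry instead


class BoundedChannel:
    """Hypothetical bounded buffer between two stream-processing stages."""

    def __init__(self, capacity: int, mode: Delivery):
        self.capacity = capacity
        self.mode = mode
        self.buffer = deque()
        self.dropped = 0  # number of messages lost in DROP_ON_OVERFLOW mode

    def offer(self, msg) -> str:
        """Try to enqueue msg; return 'accepted', 'dropped', or 'rejected'."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(msg)
            return "accepted"
        if self.mode is Delivery.DROP_ON_OVERFLOW:
            self.dropped += 1  # message is discarded; producer keeps going
            return "dropped"
        return "rejected"  # producer keeps msg and retries later

    def poll(self):
        """Dequeue the oldest message, or return None if the buffer is empty."""
        return self.buffer.popleft() if self.buffer else None
```

Dropping suits dashboards, where fresh data matters more than completeness; backpressure suits jobs such as metric generation, where every event should be counted.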
Anomaly Detection
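Anomaly detection is one of the Mantis workloads listed above. The slides do not describe the detector Netflix uses, so the sketch below substitutes a common, simple technique: a rolling z-score that flags a value when it lies more than `threshold` sample standard deviations from the mean of a trailing window. `RollingZScoreDetector` and its parameters are hypothetical.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flag values far from the trailing-window mean (hypothetical example)."""

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.window = deque(maxlen=window)  # oldest values fall off automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous relative to the trailing window."""
        anomalous = False
        if len(self.window) >= 2:  # need two points for a sample std deviation
            n = len(self.window)
            mean = sum(self.window) / n
            var = sum((x - mean) ** 2 for x in self.window) / (n - 1)
            std = math.sqrt(var)
            # Skip the check while the window has no variation at all.
            if std > 0 and abs(value - mean) / std > self.threshold:
                anomalous = True
        self.window.append(value)
        return anomalous
```

In a streaming setting, one detector instance per metric (for example, per region) would be fed each new data point as it arrives.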
Core architectural components
● AWS EC2
● Apache Mesos
● Mantis Framework
● Fenzo Scheduler
Fenzo
● Optimized for the cloud
● Scales the underlying agent cluster
● Fitness criteria, e.g.,
  ○ bin packing
  ○ spreading tasks across EC2 availability zones (AZs) for high availability
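Fenzo itself is a Java library with its own fitness-calculator API, which this is not. The Python sketch below only illustrates the idea of combining the two criteria on the slide: a bin-packing score that prefers hosts that end up fuller after placement, and a spreading score that prefers availability zones where the job already has fewer tasks, blended by a hypothetical `spread_weight`. `Host`, `place`, and both scoring formulas are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Host:
    name: str
    zone: str            # EC2 availability zone, e.g. "us-east-1a"
    cpus_total: float
    cpus_used: float = 0.0


def bin_packing_fitness(host: Host, cpus: float) -> float:
    """Higher (up to 1.0) when the host would be fuller after placement."""
    return (host.cpus_used + cpus) / host.cpus_total


def zone_spread_fitness(host: Host, tasks_per_zone: dict[str, int]) -> float:
    """Higher (up to 1.0) when this job has fewer tasks in the host's zone."""
    return 1.0 / (1 + tasks_per_zone.get(host.zone, 0))


def place(cpus: float, hosts: list[Host], tasks_per_zone: dict[str, int],
          spread_weight: float = 0.5) -> Host | None:
    """Assign the task to the feasible host with the best blended fitness."""
    best, best_score = None, -1.0
    for h in hosts:
        if h.cpus_total - h.cpus_used < cpus:
            continue  # not enough free CPU on this host
        score = ((1 - spread_weight) * bin_packing_fitness(h, cpus)
                 + spread_weight * zone_spread_fitness(h, tasks_per_zone))
        if score > best_score:  # ties keep the earlier host
            best, best_score = h, score
    if best is not None:
        best.cpus_used += cpus
        tasks_per_zone[best.zone] = tasks_per_zone.get(best.zone, 0) + 1
    return best
```

With `spread_weight=0` the sketch packs tasks tightly, which tends to leave whole hosts idle so the agent cluster can shrink; with `spread_weight=1` it places each task in the zone where the job has the fewest tasks. Intermediate values trade one goal against the other.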
Know something is wrong in seconds, not minutes
[Dashboard: real-time SPS (stream starts per second), with breakdowns by region and by device type]
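The dashboard shows SPS overall and broken down by region and device type. The slides do not show how these numbers are computed; the single-process sketch below illustrates one way, counting stream-start events in a trailing time window. It assumes each event carries a timestamp, region, and device type and that events arrive in timestamp order; `SpsTracker` and its methods are hypothetical names.

```python
from __future__ import annotations

from collections import Counter, deque


class SpsTracker:
    """Sliding-window stream starts per second, with breakdowns (hypothetical)."""

    def __init__(self, window_seconds: float = 10.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, region, device_type), oldest first

    def record(self, ts: float, region: str, device_type: str) -> None:
        """Add one stream-start event; assumes ts is non-decreasing."""
        self.events.append((ts, region, device_type))

    def _evict(self, now: float) -> None:
        # Keep only events in the window (now - window_seconds, now].
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def sps(self, now: float) -> float:
        """Average stream starts per second over the trailing window."""
        self._evict(now)
        return len(self.events) / self.window_seconds

    def sps_by(self, now: float, dimension: str) -> dict[str, float]:
        """SPS per value of dimension, which is 'region' or 'device_type'."""
        self._evict(now)
        idx = {"region": 1, "device_type": 2}[dimension]
        counts = Counter(event[idx] for event in self.events)
        return {key: n / self.window_seconds for key, n in counts.items()}
```

A short window makes a sudden drop in SPS visible within seconds, at the cost of noisier numbers; a longer window smooths the signal but delays detection.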
● Faster detection
● Faster insights into causes
● Faster mitigation
What does this all mean?
Happier customers!