NETFLIX’S CLOUD MIGRATION
Ariel Tseitlin, Partner, Scale Venture Partners
July 10, 2016
It all started in 2008…
A TALE OF A CLOUD TRANSFORMATION
ABOUT NETFLIX
Netflix is the world’s leading Internet television network, with over 81 million members in more than 190 countries enjoying more than 125 million hours of TV shows and movies per day, including original series. [1]
[1] http://ir.netflix.com/
A TROUBLED STATE OF AFFAIRS
• Single monolithic Java app
• 2 Data Centers, running 100% of production and test
• Centralized release process; the release train left every two weeks
• 6-8 weeks to provision new hardware
THE DAY THE SHELL CRACKED
• Oracle database corruption rendered the site unavailable
• It took 3 days to recover
• Unable to ship any DVDs during downtime
A JOURNEY TO THE SKY
• Combined cloud migration with micro-services re-architecture
• Started small, tunneling back to data center
• Shifted more and more services to cloud over time
• A difficult period of “Roman riding” (running in the data center and the cloud at the same time)
• 2010 iPhone launch done entirely from the cloud
CORPORATE SYSTEMS TOO
It wasn’t just the Product
CORPORATE SYSTEMS MOVED TO SAAS
• Email (Exchange->Google Apps)
• Expense Management (Concur->Workday)
• Document sharing (File Servers->Box)
• And many more…
• Goal: 100% SaaS, no data centers
MADE POSSIBLE IN THE CLOUD
Agility, APIs, Elasticity, Efficiency, Resiliency
AGILITY
• Removed separation between dev and ops
• Resources at the click of a button
• Decentralized continuous delivery
APIS
• Software-controlled infrastructure (start, terminate, scale)
• Inject failure
• Monitor & audit
• Automate operations
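To make "software-controlled infrastructure" concrete, here is a minimal sketch in Python. The `Fleet` class is a hypothetical in-memory stand-in for a cloud provider's instance API, not any real SDK; it only illustrates that start, terminate, and scale become ordinary function calls, and that every call can be recorded for audit.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Fleet:
    """Toy in-memory model of an instance API (hypothetical, for illustration)."""

    instances: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def start(self) -> str:
        """Launch one instance and record the action."""
        instance_id = f"i-{next(self._ids):04d}"
        self.instances.add(instance_id)
        self.audit_log.append(("start", instance_id))
        return instance_id

    def terminate(self, instance_id: str) -> None:
        """Stop one instance and record the action."""
        self.instances.remove(instance_id)
        self.audit_log.append(("terminate", instance_id))

    def scale(self, desired: int) -> None:
        """Start or terminate instances until the fleet size equals `desired`."""
        while len(self.instances) < desired:
            self.start()
        while len(self.instances) > desired:
            # Assumption: remove the newest instance first.
            self.terminate(sorted(self.instances)[-1])
```

Calling `Fleet().scale(3)` followed by `scale(1)` leaves one instance running and five entries in `audit_log`, which is the kind of record the "monitor & audit" bullet refers to.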
ELASTICITY
• Capacity planning replaced with forecasting
• Dynamic load-based auto-scaling
• New data centers at the click of a button
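One common way to implement load-based auto-scaling is a target-tracking rule: resize the fleet so that average load per instance returns to a chosen target. The sketch below assumes that rule; the function name, the CPU-percent metric, and the size limits are illustrative choices, not details from the talk.

```python
import math


def desired_capacity(current: int, load_per_instance: float, target: float,
                     min_size: int = 1, max_size: int = 100) -> int:
    """Return the instance count that brings per-instance load back to `target`.

    `load_per_instance` and `target` use the same unit (for example CPU percent).
    """
    if target <= 0:
        raise ValueError("target must be positive")
    # Total load is current * load_per_instance; divide by the per-instance target.
    raw = math.ceil(current * load_per_instance / target)
    # Clamp so the fleet never drops below min_size or grows past max_size.
    return max(min_size, min(max_size, raw))
```

With 10 instances at 90% CPU and a 60% target, the rule asks for 15 instances; at 30% CPU it asks for 5.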
EFFICIENCY
• ~10x peak-to-trough ratio
• Optimize machine class for each service
• Highly available red/black deployments
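A red/black deployment brings the new version up alongside the old one, moves traffic only once the new version passes a health check, and keeps the old version available for a quick rollback. The following sketch models that flow; the `Router` class and the `is_healthy` callback are hypothetical names used for illustration.

```python
from typing import Callable, Optional


class Router:
    """Tracks which version receives traffic and which is held in reserve."""

    def __init__(self, active: str):
        self.active = active
        self.standby: Optional[str] = None

    def red_black_deploy(self, new_version: str,
                         is_healthy: Callable[[str], bool]) -> bool:
        """Switch traffic to `new_version` only if it passes the health check."""
        if not is_healthy(new_version):
            return False  # the old version keeps serving traffic
        # Keep the old version as standby so rollback is a single swap.
        self.standby, self.active = self.active, new_version
        return True

    def rollback(self) -> None:
        """Send traffic back to the previously active version."""
        if self.standby is None:
            raise RuntimeError("no previous version to roll back to")
        self.active, self.standby = self.standby, self.active
```

Because the old version is never torn down before the switch, a failed health check or a rollback leaves users on a version that was already serving traffic.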
RESILIENCY
• Failure injection
• Redundancy (multiple AZs, multiple regions)
• Automated remediation
• Decentralized operations
• Improved performance and reliability
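Failure injection and automated remediation work as a pair: one process deliberately removes capacity, and another restores the desired state without human intervention. Below is a minimal sketch under simple assumptions (instances are plain string ids in a set, and replacements are numbered sequentially); it is not a description of Netflix's own tooling.

```python
import random


def inject_failure(instances: set, rng: random.Random) -> str:
    """Terminate one randomly chosen instance and return its id."""
    victim = rng.choice(sorted(instances))  # sorted for reproducible choice
    instances.remove(victim)
    return victim


def remediate(instances: set, desired: int, next_id: int) -> int:
    """Launch replacements until `desired` instances run; return the next unused id."""
    while len(instances) < desired:
        instances.add(f"i-{next_id:04d}")
        next_id += 1
    return next_id
```

Running `inject_failure` on a three-instance fleet and then `remediate(fleet, 3, 4)` returns the fleet to three instances, which is the property that routine failure injection is meant to verify.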
ORGANIZATION HAD TO CHANGE
• BEFORE: Centralized NOC & IT Ops reporting to CIO
• AFTER: Centralized platform, reporting to CPO, with service teams for performance, availability, security, and delivery
SPECIAL CASE OR A BLUEPRINT FOR SUCCESS?
• Was there anything unique about Netflix that enabled it to make the cloud & DevOps transformation?
YES & NO
• No, any organization / enterprise can (& will) transform itself to become cloud-native
• Yes, you need the right culture, talent, and desire (but you will be out-competed if you don’t!)