Date posted: 16-Aug-2015
Category: Technology
Uploaded by: nick-stephens
Cloud Scale: AWS and Azure Lessons Learned
October 15th, 2014
Nick Stephens
Cloud Scale Challenge
• Pariveda held an internal competition to build a highly scalable cloud application
• The application had to be built on two of the most popular clouds: AWS and Azure
• It was a great learning experience
Competition - Rules
• Build a simple e-commerce site
  – Search for Products
  – Add to Cart
  – Submit Order
• Build on both AWS and Azure
  – Must use 3 services each cloud offers
• Best performance for price wins
Competition - SLAs
• Search for Product
  – 600,000 requests/min with response within 1 sec
• Add to Cart
  – 30,000 requests/min with response within 500 ms
  – Request must be persisted within 10 sec
• Submit Order
  – 3,000 requests/min with response within 500 ms
  – Request must be persisted within 10 sec
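To put the SLA targets above on a common footing, the per-minute figures can be converted to per-second rates. The sketch below is only arithmetic on the numbers from the slide; the dictionary layout and function names are illustrative assumptions.

```python
# Per-minute SLA targets as stated on the slide.
SLAS = {
    "Search for Product": {"requests_per_min": 600_000, "max_response_ms": 1000},
    "Add to Cart": {"requests_per_min": 30_000, "max_response_ms": 500},
    "Submit Order": {"requests_per_min": 3_000, "max_response_ms": 500},
}


def per_second(requests_per_min):
    """Convert a per-minute request rate to a per-second rate."""
    return requests_per_min / 60


def total_per_second(slas):
    """Combined per-second rate if all operations run at full load at once."""
    return sum(per_second(s["requests_per_min"]) for s in slas.values())


for name, sla in SLAS.items():
    rate = per_second(sla["requests_per_min"])
    print(f"{name}: {rate:,.0f} requests/sec within {sla['max_response_ms']} ms")
print(f"Total: {total_per_second(SLAS):,.0f} requests/sec")
```

Search dominates the load: 10,000 requests/sec, against 500 for Add to Cart and 50 for Submit Order.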
Competition - Deliverables
• Teams pick their most cost effective solution
• Demo chosen solution to judges
• Must prove SLAs were met by generating load on system
My Team’s Solution
• Strategy
  – Re-use as much as possible
    • Chose IaaS over PaaS for portability
  – Pick right technology for problem
    • Chose NodeJS because of high networking and low CPU need
  – Handle Add to Cart and Submit Order requests asynchronously
    • Queue request to scale more easily
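The asynchronous strategy above can be sketched as a web tier that only enqueues and a worker that persists. This is a minimal single-process illustration, not the team's NodeJS code: the standard-library queue stands in for the real queue, a list stands in for the data store, and all function names are hypothetical.

```python
import queue
import threading


def handle_request(pending, payload):
    """Web tier: enqueue the request and acknowledge right away."""
    pending.put(payload)
    return {"status": "accepted"}


def worker(pending, store):
    """Worker tier: drain the queue and persist each request."""
    while True:
        payload = pending.get()
        if payload is None:  # sentinel value tells the worker to stop
            break
        store.append(payload)  # stand-in for a write to the real data store


def run_demo(count):
    pending = queue.Queue()  # in-process stand-in for the external queue
    store = []               # in-memory stand-in for the cloud database
    thread = threading.Thread(target=worker, args=(pending, store))
    thread.start()
    responses = [
        handle_request(pending, {"cart_id": i, "product_id": f"product-{i}"})
        for i in range(count)
    ]
    pending.put(None)  # no more requests
    thread.join()      # wait until everything queued has been persisted
    return responses, store
```

Because the response is sent before the write happens, the 500 ms response SLA and the 10 sec persistence SLA can be met by separate tiers that scale independently.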
My Team’s Solution
• Development
  – Coded to an interface to abstract cloud-specific storage logic
    • Separate implementations for each cloud
  – Used Redis as a queue with Redisq library
    • VM with Redis on AWS
    • Redis Cache on Azure
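Coding to an interface, as described above, might look like the following sketch. The class and method names are hypothetical, and both implementations use a plain dictionary where the real ones would call each cloud's storage service.

```python
from abc import ABC, abstractmethod


class OrderStore(ABC):
    """Cloud-neutral storage interface the application codes against."""

    @abstractmethod
    def save_order(self, order_id, order):
        ...

    @abstractmethod
    def get_order(self, order_id):
        ...


class AwsOrderStore(OrderStore):
    """Stand-in for the AWS implementation; a dict replaces the real service."""

    def __init__(self):
        self._orders = {}

    def save_order(self, order_id, order):
        self._orders[order_id] = order

    def get_order(self, order_id):
        return self._orders[order_id]


class AzureOrderStore(OrderStore):
    """Stand-in for the Azure implementation; a dict replaces the real service."""

    def __init__(self):
        self._orders = {}

    def save_order(self, order_id, order):
        self._orders[order_id] = order

    def get_order(self, order_id):
        return self._orders[order_id]


def make_store(cloud):
    """Pick the implementation once, at startup, from configuration."""
    stores = {"aws": AwsOrderStore, "azure": AzureOrderStore}
    if cloud not in stores:
        raise ValueError(f"unsupported cloud: {cloud}")
    return stores[cloud]()


def submit_order(store, order_id, items):
    """Application logic depends only on the OrderStore interface."""
    store.save_order(order_id, {"items": items})
    return order_id
```

With this shape, `submit_order` is identical on both clouds and only `make_store` knows which one is in use.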
My Team’s Solution
• AWS Architecture
  – NodeJS Web Server
  – Redis Server (Queue)
  – NodeJS Worker
• Services Used
  – EC2
  – DynamoDB
  – CloudSearch
My Team’s Solution
• Testing
  – Needed to generate heavy load on the system to prove SLAs
    • Built a custom load test rig to capture client response times and request persistence times
  – Response times were captured in SQL database for easy reporting
  – Used Remote Desktop to monitor servers
    • Watched CPU and network traffic to gauge performance
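Capturing response times in a SQL database, as described above, can be sketched with SQLite from the Python standard library. The table layout, function names, and sample values are assumptions for illustration; the slides do not say which database or schema the team used.

```python
import sqlite3
import time


def open_results_db(path=":memory:"):
    """Create the results table (hypothetical schema) and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "operation TEXT NOT NULL, response_ms REAL NOT NULL)"
    )
    return conn


def record(conn, operation, response_ms):
    """Store one measured response time."""
    conn.execute("INSERT INTO samples VALUES (?, ?)", (operation, response_ms))


def timed_call(conn, operation, func):
    """Run func, record how long it took in milliseconds, return its result."""
    start = time.perf_counter()
    result = func()
    record(conn, operation, (time.perf_counter() - start) * 1000)
    return result


def report(conn, operation, sla_ms):
    """Summarize samples for one operation against its response-time SLA."""
    count, avg_ms, max_ms, within = conn.execute(
        "SELECT COUNT(*), AVG(response_ms), MAX(response_ms), "
        "SUM(response_ms <= ?) FROM samples WHERE operation = ?",
        (sla_ms, operation),
    ).fetchone()
    return {"count": count, "avg_ms": avg_ms, "max_ms": max_ms,
            "within_sla": within or 0}


if __name__ == "__main__":
    db = open_results_db()
    for ms in (100, 200, 700):  # made-up sample values
        record(db, "add_to_cart", ms)
    print(report(db, "add_to_cart", sla_ms=500))
```

Keeping raw samples rather than running averages makes it possible to ask new questions of a test run afterwards, such as how many requests missed the SLA.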
My Team’s Solution
• Competition Results
  – We demoed our solution but didn’t meet all SLAs
    • Only achieved approximately 300,000 searches/min, half the 600,000/min target
  – We hadn’t tested our system at that scale
    • We discovered a bottleneck during the demo
  – We didn’t have all of the deployment automated
    • We couldn’t quickly redeploy, scale out, and retest
Winning Team’s Solution
• Development
  – Developed AWS and Azure solutions separately
    • Both started out using .NET on Windows
  – AWS solution switched to NodeJS on Linux
    • Linux servers are much cheaper than Windows
  – Azure solution ended up being cheaper
    • Higher SQS transaction costs (vs. Azure Storage) added up
Winning Team’s Solution
• Azure Architecture
  – .NET Web API
  – PaaS
  – Azure Storage
• Services Used
  – Web Roles
  – Worker Roles
  – Azure Storage
Winning Team’s Solution
• Testing
  – Wrote custom test harness
    • Could view aggregate results from test runners
  – Increased application servers until SLAs were met
  – Tried different sizes of instances
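The "add servers until the SLAs are met" approach above can be sketched as a simple loop around a load test. Everything here is hypothetical: `run_load_test` stands for whatever harness measures throughput at a given instance count, and the 45,000 requests/min per instance in the usage example is a made-up figure, not a result from the competition.

```python
def scale_until_sla_met(target_per_min, run_load_test, max_instances=100):
    """Return the smallest instance count whose measured throughput meets the target.

    run_load_test(n) is assumed to deploy n instances, drive load at them,
    and return the achieved requests/min.
    """
    for instances in range(1, max_instances + 1):
        achieved = run_load_test(instances)
        if achieved >= target_per_min:
            return instances
    raise RuntimeError(f"target not met with {max_instances} instances")


def fake_load_test(instances):
    """Pretend throughput scales linearly at a made-up 45,000 requests/min each."""
    return instances * 45_000


if __name__ == "__main__":
    needed = scale_until_sla_met(600_000, fake_load_test)
    print(f"Instances needed for 600,000 requests/min: {needed}")
```

Real throughput rarely scales perfectly linearly, which is why measuring at each step matters more than extrapolating from a single instance.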
Lessons Learned
• Scale Out not Up
  – This type of problem is network bound
  – More instances were better than larger instances
• Synchronous writes were possible for this scenario
  – The teams that had synchronous writes had to scale out more
  – Asynchronous writes can be quicker and scale better
Lessons Learned
• Capture metrics to judge performance
  – Metrics can show bottlenecks
  – Objective measure of performance
• Use existing tools whenever possible
  – Some teams used a load testing service instead of a custom tool
  – Allowed those teams to focus more on the application
Lessons Learned
• Automate deployment as much as possible
  – Fast and reliable process
• No clear winner in AWS vs Azure
  – Team submissions were split between AWS and Azure
  – Each cloud had similar but unique feature sets
  – Either cloud could have won with the right architecture
QUESTIONS?