Building Big: Lessons learned from Windows Azure customers – Part One
Mark Simms (@mabsimms), Principal Program Manager, Microsoft
Simon Davies (@simongdavies), Windows Azure Technical Specialist, Microsoft
3-029
Session Objectives
Designing large-scale services requires careful design and architecture choices.
This session will explore customer deployments on Azure and illustrate the key choices, tradeoffs and learnings.
Two part session:
• Part 1: Building for Scale
• Part 2: Building for Availability
Other Great Sessions
This session will focus on architecture and design choices for delivering large scale services.
If this isn’t a compelling topic, there are many other great sessions happening right now!
Room | Level | Title | Presenter
Nexus/Normandy | 300 | Advanced Windows Azure Infrastructure as a Service (IaaS) | Michael Washam
Trident/Thunder | 200 | What’s new in VS2012 | Orville McDonald
Odyssey | 300 | Apps for Office and SharePoint development using the all new browser-based “Napa” and Visual Studio 2012 | Saurabh Bhatia, Jim Nakashima
Magellan | 200 | WP8: Making Money with Your Application on Windows Phone | Todd Brix
Agenda
• Building Big – the scale challenge
• Partitioning your application
• Caching your data
What do we mean by large scale?
• Millions of users
• Hundreds of thousands of operations per second
• Thousands of cores
• Hundreds of databases
Designing and Deploying Internet Scale Services
James Hamilton, https://www.usenix.org/events/lisa07/tech/full_papers/hamilton/hamilton.pdf
What does Azure do for me?
• Redundancy and Fault Recovery
• Commodity hardware slice
• Single version software
• Multi-tenancy
• Support geo-distribution
• Automatic provisioning and installation
• Configuration and code as a unit
• Manage roles, not servers
• Deal with multi-system failures
• Recover at the service level
Designing and Deploying Internet Scale Services
James Hamilton, https://www.usenix.org/events/lisa07/tech/full_papers/hamilton/hamilton.pdf
Part 1: Design for Scale
• Partition the service
• Optimize for density
Part 2: Design for Availability
• Design for Failure
  • Do not trust underlying components
  • Decouple components
  • Avoid single points of failure
  • Support geo-distribution
• Instrument everything
  • Implement inter-service monitoring and alerting
  • Instrument for production testing
  • Configurable logging
Pottermore
http://www.microsoft.com/en-us/news/features/2012/jun12/06-06Pottermore.aspx
• 500 databases
• 1B page views
• 1000 cores
• 110M daily peak page views
Decomposing Typical Social Application Workloads
• Content Delivery: site-wide content, transient state (session state).
• Content Exploration: per-user content view, per-user stateful progress.
• Social Graph and Content: per-user content view (comments, likes, etc.), global reach (any user can reach any other user). Loosely consistent / asynchronous updates to N consumers.
• Interactive Gaming: N-user content view (game actions, session, etc.), global reach (any user can reach any other user). Interactive state updates shared amongst N players.
Build for Scale – Partitioning and Scale Out
Azure architecture is based on scale-out: composing multiple scale units to build large systems.
Azure Compute (Web, Worker, IaaS)
• 1-8 CPU cores
• 2-14 GB RAM
• 5-800 Mbps network
Azure Storage
• 100 TB storage (max)
• 5000 operations / sec
• 3 Gbps
Azure SQL Database
• 150 GB
• 305 threads
• 400 concurrent requests
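Because each scale unit has a fixed ceiling, sizing a deployment is mostly division and rounding up. The sketch below is illustrative only: the per-unit limits are the figures quoted on this slide, while the workload numbers, the `units_needed` helper and its `headroom` parameter are assumptions introduced for the example.

```python
import math

# Per-unit limits quoted on the slide (2012-era figures).
SQL_DB_MAX_GB = 150
STORAGE_ACCOUNT_OPS_PER_SEC = 5000


def units_needed(demand: float, per_unit_capacity: float, headroom: float = 0.0) -> int:
    """Return how many scale units cover `demand`.

    `headroom` is the fraction of each unit deliberately left unused (0 <= headroom < 1).
    """
    if per_unit_capacity <= 0 or not 0 <= headroom < 1:
        raise ValueError("capacity must be positive and headroom in [0, 1)")
    usable = per_unit_capacity * (1 - headroom)
    return max(1, math.ceil(demand / usable))


# Hypothetical workload: 2 TB of relational data and 40,000 storage operations/sec.
print(units_needed(2000, SQL_DB_MAX_GB))                  # 14 databases
print(units_needed(40_000, STORAGE_ACCOUNT_OPS_PER_SEC))  # 8 storage accounts
```

Once the answer is more than one unit, the application needs a partitioning scheme to decide which unit serves which request.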
Evaluating Scale
[Diagram: an Azure Cloud Service containing a Load Balancer, Web Role and Worker Role, backed by Windows Azure SQL Database]
Component | Aspect | Partitioning | Capacity
Web role | Low state | Automatic (via load balancer), round-robin | Add more instances (easy)
SQL Database | High state | Manual (via app code), choose partitioning function | Add more databases
Horizontal Partitioning
[Diagram: sample user rows (David Alexander, Jarred Carlson, Sue Charles, Simon Mitchel, Richard Zeng, each with an email address) split by row into partitions labelled A, C, M, Z]
Vertical Partitioning
[Diagram: the same sample user rows split by column]
Hybrid Partitioning
[Diagram: the same sample user rows split by column and by row into the ranges A-L and M-Z]
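The three layouts can be sketched on the sample rows from the diagrams. This is a minimal illustration, not the code from the session: the email values are placeholders, and the `id` column and the function names are assumptions added so the column split can be joined back together.

```python
# Sample rows from the slides; ids and email values are placeholders (hypothetical).
USERS = [
    {"id": 1, "first": "David", "last": "Alexander", "email": "david@example.com"},
    {"id": 2, "first": "Jarred", "last": "Carlson", "email": "jarred@example.com"},
    {"id": 3, "first": "Sue", "last": "Charles", "email": "sue@example.com"},
    {"id": 4, "first": "Simon", "last": "Mitchel", "email": "simon@example.com"},
    {"id": 5, "first": "Richard", "last": "Zeng", "email": "richard@example.com"},
]


def horizontal(rows, boundary="M"):
    """Split by row: last names before `boundary` (A-L) and from `boundary` on (M-Z)."""
    low = [r for r in rows if r["last"][0].upper() < boundary]
    high = [r for r in rows if r["last"][0].upper() >= boundary]
    return low, high


def vertical(rows):
    """Split by column: names in one store, emails in another, joined by id."""
    names = {r["id"]: (r["first"], r["last"]) for r in rows}
    emails = {r["id"]: r["email"] for r in rows}
    return names, emails


def hybrid(rows, boundary="M"):
    """Apply the row split first, then the column split inside each range."""
    return [vertical(part) for part in horizontal(rows, boundary)]
```

Horizontal partitioning spreads rows (and load) across stores; vertical partitioning separates columns with different access patterns; the hybrid form does both.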
Understanding Partitioning for Scale
1. Select the partition key. Example: Last Name
2. Convert partition key to a partition value (optional). Example: LastName.Substring(0, 2) -> "Si"
3. Map partition value to a logical partition. Example: ShardMap["Si"] -> S
4. Map logical partition to physical resource. Example: DbMap["S"] -> "Db0123S"
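The four steps can be traced end to end in a few lines. The sketch below is a minimal illustration; only the "Si" -> "S" -> "Db0123S" path comes from the slide, and every other map entry, along with the `resolve_database` name, is hypothetical.

```python
# Hypothetical maps; only the "Si" -> "S" -> "Db0123S" path is taken from the slide.
SHARD_MAP = {"Si": "S", "Sm": "S", "Al": "A"}
DB_MAP = {"S": "Db0123S", "A": "Db0123A"}


def resolve_database(last_name: str) -> str:
    partition_key = last_name                        # step 1: select the partition key
    partition_value = partition_key[:2]              # step 2: key -> partition value
    logical_partition = SHARD_MAP[partition_value]   # step 3: value -> logical partition
    return DB_MAP[logical_partition]                 # step 4: logical -> physical resource


print(resolve_database("Simms"))  # Db0123S
```

Keeping steps 3 and 4 as separate maps means a logical partition can be moved to a different database by changing one `DB_MAP` entry, without touching how keys are assigned.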
Partitioning the Database (Range Based)
1. The user (user ID) is a natural partitioning key; all workloads are user-centric. Example: "MaSimms"
2. Use a non-cryptographic hash to convert the user ID to an integer value. Example: 639837447
3. Map a range of integers to a logical "shard". Example: ShardMap.FirstOrDefault(e => e.IsInRange(639837447))
4. Map logical "shard" to physical resource (database). Example: DbMap[Shard].ConnectionString
Demo: Partitioning Code (Range Based)
Range Based Partitioning
• Range based partitioning
• Hash (MurMur3) against Upper()
• 5 shards, evenly distributed
[Diagram: JohnSmith -> Hash: -789794523 -> ShardMap -> Shard: 1 (-1288490190:-429496730) -> Resource Map -> UserData_001]
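A range-based shard map can be sketched as follows. This is an illustration under stated assumptions rather than the demo code: `zlib.crc32` stands in for the Murmur3 hash named on the slide (so the hash values will differ from the ones shown), and the `Shard` class, the boundary rounding and the database naming are hypothetical.

```python
import zlib
from dataclasses import dataclass

INT32_MIN = -2**31
INT32_SPAN = 2**32


@dataclass(frozen=True)
class Shard:
    index: int
    low: int   # inclusive lower bound
    high: int  # exclusive upper bound

    def is_in_range(self, value: int) -> bool:
        return self.low <= value < self.high


def build_shard_map(shard_count: int) -> list[Shard]:
    """Cut the signed 32-bit space into `shard_count` contiguous, near-equal ranges."""
    bounds = [INT32_MIN + (INT32_SPAN * i) // shard_count for i in range(shard_count + 1)]
    return [Shard(i, bounds[i], bounds[i + 1]) for i in range(shard_count)]


def hash_user_id(user_id: str) -> int:
    """Non-cryptographic hash of the upper-cased ID as a signed 32-bit integer.

    CRC32 is a stand-in here; the slide uses Murmur3.
    """
    unsigned = zlib.crc32(user_id.upper().encode("utf-8"))
    return unsigned - INT32_SPAN if unsigned >= 2**31 else unsigned


def find_shard(shard_map: list[Shard], value: int) -> Shard:
    return next(s for s in shard_map if s.is_in_range(value))


SHARD_MAP = build_shard_map(5)
# Hypothetical resource map: shard index -> database name.
DB_MAP = {s.index: f"UserData_{s.index:03d}" for s in SHARD_MAP}

shard = find_shard(SHARD_MAP, -789794523)  # hash value shown on the slide
print(shard.index, DB_MAP[shard.index])    # 1 UserData_001
```

The computed boundaries can differ from the slide's by a unit or two depending on how the division is rounded. Hashing the upper-cased ID makes the lookup case-insensitive.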
Logical Bucket Based Partitioning
• Range based partitioning
• Hash (MurMur3) against Upper()
• 5 shards, evenly distributed
• Logical buckets mapped to physical databases
[Diagram: JohnSmith -> Hash: -789794523 -> ShardMap (32 buckets) -> Shard: 27 -> Resource Map -> UserData_001]
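With logical buckets, the hash picks one of a fixed number of buckets and a separate table assigns buckets to databases. The sketch below is one assumed way to do it: taking the absolute hash value modulo 32 happens to reproduce the 27 shown on the slide, but the slide does not say how its value was computed, and the database names and the bucket assignment are hypothetical.

```python
BUCKET_COUNT = 32
# Hypothetical physical databases.
DATABASES = ["UserData_001", "UserData_002", "UserData_003", "UserData_004", "UserData_005"]


def bucket_for_hash(hash_value: int) -> int:
    """Map a signed hash to a logical bucket in [0, BUCKET_COUNT)."""
    return abs(hash_value) % BUCKET_COUNT


# Hypothetical assignment: contiguous blocks of buckets per database.
BUCKET_TO_DB = {b: DATABASES[b * len(DATABASES) // BUCKET_COUNT] for b in range(BUCKET_COUNT)}


def database_for_hash(hash_value: int) -> str:
    return BUCKET_TO_DB[bucket_for_hash(hash_value)]


def move_bucket(bucket: int, target_db: str) -> None:
    """Reassign one bucket; keys keep their bucket, only the resource map changes."""
    BUCKET_TO_DB[bucket] = target_db


print(bucket_for_hash(-789794523))  # 27
```

Because the bucket count never changes, adding a database only requires moving whole buckets and updating `BUCKET_TO_DB`; no key is rehashed.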
Lookup Bucket Based Partitioning
• Range based partitioning
• Hash (MurMur3) against Upper()
• 5 shards, evenly distributed
• Lookup records map each partition value to a logical/physical resource
[Diagram: JohnSmith -> Hash: -789794523 -> ShardMap (Lookup) -> Shard: 2 -> Resource Map -> UserData_001]
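In the lookup variant there is no formula from value to shard: each partition value has its own record. The sketch below keeps the records in memory and assigns unseen values to the least-loaded shard; both choices, and the `LookupShardMap` class itself, are assumptions for illustration. A real deployment would presumably keep the lookup records in a durable store.

```python
from collections import Counter


class LookupShardMap:
    """Explicit partition value -> shard records (hypothetical, in-memory)."""

    def __init__(self, shards):
        self._shards = list(shards)
        self._lookup = {}  # partition value -> shard

    def resolve(self, partition_value):
        shard = self._lookup.get(partition_value)
        if shard is None:
            # First sighting: place the value on the shard with the fewest records.
            load = Counter(self._lookup.values())
            shard = min(self._shards, key=lambda s: load[s])
            self._lookup[partition_value] = shard
        return shard

    def reassign(self, partition_value, shard) -> None:
        """Move a single partition value without affecting any other."""
        if shard not in self._shards:
            raise ValueError(f"unknown shard: {shard!r}")
        self._lookup[partition_value] = shard
```

The trade-off is an extra lookup on every request in exchange for the freedom to move any individual value, for example a single very active user, to its own resource.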
Distributed Caching
More capacity – now what?
• Not practical to query durable store for every request
  • Throughput and latency
  • Efficiency / COGS
• Not all data needs to be immediately consistent.
Build for Scale – Shift to Distributed Cache
Distributed cache engines can provide high-throughput, low-latency access to commonly accessed application data.
• Semantic: Key -> byte[]
• In-memory data (not written to disk)
• Scale-out architecture (client-side partitioning, explicit connections to physical resource)
• Examples: memcached, Azure Caching
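Client-side partitioning means the client, not a proxy, decides which cache node holds a key. A minimal sketch, assuming in-memory stand-ins for the cache servers and `zlib.crc32` as the key hash (both hypothetical choices; `CacheNode` and `PartitionedCacheClient` are not a real client API):

```python
import zlib
from typing import Optional


class CacheNode:
    """Stand-in for one cache server: an in-memory key -> bytes store."""

    def __init__(self, name: str):
        self.name = name
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class PartitionedCacheClient:
    """Routes each key to exactly one node, chosen on the client."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one cache node is required")
        self._nodes = list(nodes)

    def _node_for(self, key: str) -> CacheNode:
        return self._nodes[zlib.crc32(key.encode("utf-8")) % len(self._nodes)]

    def put(self, key: str, value: bytes) -> None:
        self._node_for(key).put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self._node_for(key).get(key)
```

With plain modulo routing, changing the node count remaps most keys, which is why many cache clients use consistent hashing instead.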
Press Association
• 8 datacentres
• 2B peak requests a day
• 50K peak requests per second
Publishing Information Stream
• One source, many subscribers
• Worker role collects data, publishes to cache
• Web instances feed from cache, publish to users
Caching Resource Data
[Diagram: the Azure Load Balancer in front of three WebRole instances; two Cache Role instances; a Worker Role instance; a Source Data Service. Arrows are labelled HTTP GET (reads) and PUT (the Worker Role instance writing to the Cache Role instances).]
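The one-source, many-subscribers flow can be sketched as a publisher that writes to the cache and request handlers that only ever read from it. Everything named here is hypothetical: the dict stands in for the cache role instances, and `fetch_from_source`, the key name and the payload are placeholders.

```python
import json

# Stand-in for the cache role instances (key -> bytes).
cache: dict[str, bytes] = {}


def fetch_from_source() -> dict:
    """Hypothetical call to the source data service."""
    return {"headline": "example", "version": 1}


def worker_publish(key: str = "latest-feed") -> None:
    """Worker role: collect data once and PUT it into the cache."""
    payload = json.dumps(fetch_from_source()).encode("utf-8")
    cache[key] = payload


def web_handle_request(key: str = "latest-feed"):
    """Web instance: serve from the cache only; never call the source directly."""
    payload = cache.get(key)
    if payload is None:
        return None  # nothing has been published yet
    return json.loads(payload)
```

The source service sees one reader (the worker) regardless of how many web instances or users there are.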
Memcached on Windows Azure
• Provisioned by running memcached within a worker role in your service
• Requires custom set-up and management code
• Good performance and scale*
Windows Azure Cache
• General Availability as part of the Windows Azure 1.8 SDK
• Cache is deployed into your service as a worker role
• Good performance and scale
High Availability for Windows Azure Cache
• What happens when rolling out a new application version, or a Guest OS or Host OS upgrade?
  Data is moved to available nodes by upgrade domain
• How does the cache behave if we add or remove instances?
  Adding – ring is rebalanced; data may be moved
  Deleting – data is NOT moved – be careful
• What about node failure?
  Depends on configuration
Dealing with Node Failure
• Cache can be protected from node failure by keeping a secondary copy
• Strong consistency model – overhead on writing
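The write overhead comes from updating both copies before a write is considered complete. The toy model below illustrates only that idea; it is not how Windows Azure Cache is implemented, and the `ReplicatedCache` class and its methods are hypothetical.

```python
class ReplicatedCache:
    """Toy model: every write goes to a primary and a secondary copy."""

    def __init__(self):
        self._primary = {}
        self._secondary = {}
        self._primary_up = True

    def put(self, key, value) -> None:
        # Both copies are updated before the call returns, so a read after
        # failover sees the same value; the price is two writes per put.
        if self._primary_up:
            self._primary[key] = value
        self._secondary[key] = value

    def get(self, key):
        store = self._primary if self._primary_up else self._secondary
        return store.get(key)

    def fail_primary(self) -> None:
        """Simulate losing the node that held the primary copy."""
        self._primary.clear()
        self._primary_up = False
```

Without the secondary copy, `fail_primary` would lose every cached item on that node and the application would have to repopulate it from the source.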
Cache Data Population and Refresh
• On Demand: Cache Aside – client pulls data from source and caches on cache miss
• Data Push: background tasks (e.g. worker roles) populate cache with data on a schedule
• Data Pull: async refresh triggered by client on detection of stale data – requires careful design
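The on-demand (cache-aside) option can be sketched as a read path that falls through to the source on a miss and stores the result. This is a minimal single-process illustration; the `CacheAside` class, the time-to-live expiry and the injectable clock are assumptions added for the example, not part of any Azure API.

```python
import time


class CacheAside:
    """Read-through on miss with a time-to-live (hypothetical helper)."""

    def __init__(self, load_from_source, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._load = load_from_source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the source is not touched
        # Cache miss or expired entry: pull from the source, then cache the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A usage example would pass the durable-store query as `load_from_source`, for instance `CacheAside(lambda user_id: query_database(user_id))`, where `query_database` is whatever function the application already uses. Under concurrent load, several clients can miss on the same key at once and all hit the source, which is part of why the slide flags refresh design as needing care.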
Demo: Integrating Distributed Cache
Recap and Resources
Building big:
• The scale challenge
• Partition your application
• Optimize state management (cache)
Resources:
• Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services
• TODO: failsafe doc link
Resources
• Follow us on Twitter @WindowsAzure
• Get Started: www.windowsazure.com/build
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.