Building Big: Lessons learned from Windows Azure customers – Part One

Mark Simms (@mabsimms), Principal Program Manager, Microsoft
Simon Davies (@simongdavies), Windows Azure Technical Specialist, Microsoft
Session 3-029

Session Objectives
Designing large-scale services requires careful design and architecture choices.
This session will explore customer deployments on Azure and illustrate the key choices, tradeoffs and learnings.
Two part session:
• Part 1: Building for Scale
• Part 2: Building for Availability

Other Great Sessions
This session will focus on architecture and design choices for delivering large scale services. If this isn’t a compelling topic, there are many other great sessions happening right now!

Room | Level | Title | Presenter
Nexus/Normandy | 300 | Advanced Windows Azure Infrastructure as a Service (IaaS) | Michael Washam
Trident/Thunder | 200 | What’s new in VS2012 | Orville McDonald
Odyssey | 300 | Apps for Office and SharePoint development using the all new browser-based “Napa” and Visual Studio 2012 | Saurabh Bhatia, Jim Nakashima
Magellan | 200 | WP8: Making Money with Your Application on Windows Phone | Todd Brix

Agenda
• Building Big – the scale challenge
• Partitioning your application
• Caching your data

What do we mean by large scale?
• Millions of users
• Hundreds of thousands of operations per second
• Thousands of cores
• Hundreds of databases

Designing and Deploying Internet Scale Services
What does Azure do for me?
James Hamilton, https://www.usenix.org/events/lisa07/tech/full_papers/hamilton/hamilton.pdf

• Redundancy and Fault Recovery
• Commodity hardware slice
• Single version software
• Multi-tenancy
• Support geo-distribution
• Automatic provisioning and installation
• Configuration and code as a unit
• Manage roles, not servers
• Deal with multi-system failures
• Recover at the service level


Designing and Deploying Internet Scale Services
James Hamilton, https://www.usenix.org/events/lisa07/tech/full_papers/hamilton/hamilton.pdf

Part 1: Design for Scale
Partition the service
Optimize for density

Part 2: Design for Availability
Design for Failure
• Do not trust underlying components
• Decouple components
• Avoid single points of failure
• Support geo-distribution
Instrument everything
• Implement inter-service monitoring and alerting
• Instrument for production testing
• Configurable logging



Pottermore
• 500 databases
• 1B page views
• 1000 cores
• 110M daily peak page views
http://www.microsoft.com/en-us/news/features/2012/jun12/06-06Pottermore.aspx

Decomposing Typical Social Application Workloads

Content Delivery
Site-wide content, transient state (session state).

Content Exploration
Per-user content view, per-user stateful progress.

Social Graph and Content
Per-user content view (comments, likes, etc.), global reach (any user can reach any other user). Loosely consistent / asynchronous updates to N consumers.

Interactive Gaming
N-user content view (game actions, session, etc.), global reach (any user can reach any other user). Interactive state updates shared amongst N players.

Build for Scale – Partitioning and Scale Out
Azure architecture is based on scale-out: composing multiple scale units to build large systems.

Azure Compute (Web, Worker, IaaS)
• 1-8 CPU cores
• 2-14 GB RAM
• 5-800 Mbps network

Azure Storage
• 100 TB storage (max)
• 5000 operations / sec
• 3 Gbps

Azure SQL Database
• 150 GB
• 305 threads
• 400 concurrent requests
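The per-unit limits above lend themselves to a quick back-of-the-envelope count of how many scale units a workload needs. The following is a minimal Python sketch, assuming the slide's 2012 figures (5000 operations/sec per storage account, 150 GB per SQL database). The workload numbers and the `units_needed` helper are hypothetical and chosen only for illustration.

```python
import math

# Per-unit limits quoted on the slide (2012 figures).
STORAGE_OPS_PER_SEC = 5000      # operations/sec per storage account
SQL_DB_MAX_GB = 150             # maximum size of one SQL database


def units_needed(demand, per_unit_limit):
    """Smallest whole number of scale units that covers the demand."""
    return math.ceil(demand / per_unit_limit)


# Hypothetical workload, not taken from the deck.
storage_accounts = units_needed(120_000, STORAGE_OPS_PER_SEC)  # 120K ops/sec
databases = units_needed(2_000, SQL_DB_MAX_GB)                 # 2 TB of relational data
print(storage_accounts, databases)
```

A real plan would leave headroom below each limit rather than sizing to exactly 100% of a unit.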

Evaluating Scale

[Diagram: an Azure Cloud Service with a load balancer in front of Web Role and Worker Role instances, backed by Windows Azure SQL Database]

Component | Aspect | Partitioning | Capacity
Web role | Low state | Automatic (via load balancer), round-robin | Add more instances (easy)
SQL Database | High state | Manual (via app code), choose partitioning function | Add more databases

Horizontal Partitioning

[Figure: a user table (David Alexander, Jarred Carlson, Charles, Simon Mitchel, Richard Zeng, each with an email address) split by rows into partitions A, C, M and Z]







Vertical Partitioning

[Figure: the same user table (David Alexander, Jarred Carlson, Charles, Simon Mitchel, Zeng, each with an email address), illustrating vertical partitioning]










Hybrid Partitioning

[Figure: the same user table (David Alexander, Jarred Carlson, Charles, Simon Mitchel, Zeng, each with an email address), with partitions A-L and M-Z]
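The three partitioning styles can be contrasted on one small table. This is a minimal Python sketch, not code from the session. The last names follow the slides, but the `id` column, the email addresses and the A-L / M-Z split used for the horizontal case are assumptions made for illustration.

```python
# Sample rows modelled on the slide's user table. The email addresses are
# hypothetical placeholders; the originals are not legible in the source.
USERS = [
    {"id": 1, "last_name": "Alexander", "email": "alexander@example.com"},
    {"id": 2, "last_name": "Carlson", "email": "carlson@example.com"},
    {"id": 3, "last_name": "Charles", "email": "charles@example.com"},
    {"id": 4, "last_name": "Mitchel", "email": "mitchel@example.com"},
    {"id": 5, "last_name": "Zeng", "email": "zeng@example.com"},
]


def horizontal(rows):
    """Split by rows: each partition keeps all columns for a subset of users."""
    parts = {"A-L": [], "M-Z": []}
    for row in rows:
        key = "A-L" if row["last_name"][0].upper() <= "L" else "M-Z"
        parts[key].append(row)
    return parts


def vertical(rows):
    """Split by columns: each partition keeps all users but only some columns."""
    names = [{"id": r["id"], "last_name": r["last_name"]} for r in rows]
    emails = [{"id": r["id"], "email": r["email"]} for r in rows]
    return names, emails


def hybrid(rows):
    """Split by rows first, then split each row range by columns."""
    return {key: vertical(part) for key, part in horizontal(rows).items()}
```

In the vertical and hybrid cases the shared `id` column is what lets the application rejoin the pieces of a row.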










Understanding Partitioning for Scale

1. Select the partition key
   Example: Last Name
2. Convert partition key to a partition value (optional)
   Example: LastName.Substring(0, 2) -> “Si”
3. Map partition value to a logical partition
   Example: ShardMap[“Si”] -> S
4. Map logical partition to physical resource
   Example: DbMap[“S”] -> “Db0123S”
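The four steps can be sketched end to end in a few lines of Python. Only the "Si" -> "S" -> "Db0123S" entries come from the slide. The function name `resolve_database` and the sample last name "Simms" are assumptions for illustration.

```python
# Hypothetical maps; only these two entries appear on the slide.
SHARD_MAP = {"Si": "S"}          # partition value -> logical partition
DB_MAP = {"S": "Db0123S"}        # logical partition -> physical database


def resolve_database(last_name):
    partition_key = last_name                 # step 1: select the partition key
    partition_value = partition_key[:2]       # step 2: key -> partition value
    logical = SHARD_MAP[partition_value]      # step 3: value -> logical partition
    return DB_MAP[logical]                    # step 4: logical -> physical resource


print(resolve_database("Simms"))
```

Keeping steps 3 and 4 as separate maps means a logical partition can be moved to a different database by changing one `DB_MAP` entry, without touching how keys are assigned.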

Partitioning the Database (Range Based)

1. The user (user ID) is a natural partitioning key; all workloads are user-centric
   Example: “MaSimms”
2. Use a non-cryptographic hash to convert the user ID to an integer value
   Example: 639837447
3. Map a range of integers to a logical “shard”
   Example: ShardMap.FirstOrDefault(e => e.IsInRange(639837447))
4. Map logical “shard” to physical resource (database)
   Example: DbMap[Shard].ConnectionString
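A minimal Python sketch of the same range-based lookup follows. It makes several assumptions: `zlib.crc32` stands in for the non-cryptographic hash (so it will not reproduce the slide's value 639837447 for “MaSimms”), the hash space is split into four equal unsigned ranges, and the shard names and connection strings are invented for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRange:
    name: str
    low: int    # inclusive
    high: int   # exclusive

    def is_in_range(self, value):
        return self.low <= value < self.high


# Hypothetical layout: four equal ranges over the unsigned 32-bit hash space.
STEP = 2**32 // 4
SHARD_MAP = [ShardRange(f"shard{i}", i * STEP, (i + 1) * STEP) for i in range(4)]
DB_MAP = {
    s.name: f"Server=example;Database=UserData_{i:03d}"
    for i, s in enumerate(SHARD_MAP)
}


def partition_value(user_id):
    # Assumption: crc32 is a stand-in for whichever non-cryptographic hash is used.
    return zlib.crc32(user_id.upper().encode("utf-8"))


def find_shard(value):
    # Counterpart of ShardMap.FirstOrDefault(e => e.IsInRange(value)).
    return next((s for s in SHARD_MAP if s.is_in_range(value)), None)


def connection_string(user_id):
    return DB_MAP[find_shard(partition_value(user_id)).name]
```

Upper-casing the user ID before hashing keeps "masimms" and "MaSimms" on the same shard.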

Demo: Partitioning Code (Range Based)

Range Based Partitioning
• Hash (MurmurHash3) against Upper() of the key
• 5 shards, evenly distributed

[Figure: JohnSmith -> Hash (-789794523) -> ShardMap -> Shard: 1 (range -1288490190 : -429496730) -> Resource Map -> UserData_001]
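The range lookup for five evenly sized shards reduces to integer arithmetic over the signed 32-bit hash space. The Python sketch below is an illustration under that assumption, not the demo code. Its computed bounds for shard 1 differ by a unit or two from the rounded figures on the slide, but the slide's example hash lands in the same shard.

```python
INT32_MIN = -2**31
HASH_SPACE = 2**32
SHARD_COUNT = 5


def shard_for_hash(hash_value):
    """Index of the evenly sized range that contains a signed 32-bit hash."""
    return (hash_value - INT32_MIN) * SHARD_COUNT // HASH_SPACE


def shard_bounds(index):
    """Inclusive lower bound and exclusive upper bound of a shard's range."""
    low = INT32_MIN + (index * HASH_SPACE + SHARD_COUNT - 1) // SHARD_COUNT
    high = INT32_MIN + ((index + 1) * HASH_SPACE + SHARD_COUNT - 1) // SHARD_COUNT
    return low, high


print(shard_for_hash(-789794523))   # the slide's example hash for JohnSmith
```

The hash value itself is taken from the slide rather than recomputed, since MurmurHash3 is not part of the Python standard library.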

Logical Bucket Based Partitioning
Logical buckets are mapped to physical databases.

[Figure: JohnSmith -> Hash (-789794523) -> ShardMap (32 buckets) -> Shard: 27 -> Resource Map -> UserData_001]
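A bucket scheme can be sketched as a fixed hash-to-bucket function plus an editable bucket-to-database map. In the Python sketch below, taking the absolute value and then the remainder reproduces the slide's example (hash -789794523, bucket 27), but the deck does not state the formula, so treat it as an assumption. The layout of 32 buckets over four databases is likewise hypothetical.

```python
BUCKET_COUNT = 32


def bucket_for_hash(hash_value):
    # Assumption: abs() then modulo; this matches the slide's single example.
    return abs(hash_value) % BUCKET_COUNT


# Hypothetical resource map: 32 logical buckets spread over 4 physical databases.
RESOURCE_MAP = {b: f"UserData_{b // 8 + 1:03d}" for b in range(BUCKET_COUNT)}


def database_for_hash(hash_value):
    return RESOURCE_MAP[bucket_for_hash(hash_value)]


print(bucket_for_hash(-789794523))
```

Because keys map to buckets and only buckets map to databases, capacity can be added by reassigning whole buckets in `RESOURCE_MAP` while the hash function stays fixed.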

Lookup Bucket Based Partitioning
Lookup records map each partition value to a logical/physical resource.

[Figure: JohnSmith -> Hash (-789794523) -> ShardMap -> Lookup -> Shard: 2 -> Resource Map -> UserData_001]
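With lookup-based partitioning, placement is data rather than a formula. The Python sketch below is a minimal illustration: the record for hash -789794523 (shard 2, UserData_001) follows the slide, while the second database name and the fewest-records placement rule for new values are assumptions.

```python
# Lookup records: partition value (hash) -> shard. The first entry is from the slide.
LOOKUP = {-789794523: 2}
# Shard -> database. The name for shard 1 is assumed.
RESOURCE_MAP = {1: "UserData_000", 2: "UserData_001"}


def database_for(partition_value):
    shard = LOOKUP.get(partition_value)
    if shard is None:
        # New value: place it on the shard with the fewest lookup records.
        shard = min(
            RESOURCE_MAP,
            key=lambda s: sum(1 for v in LOOKUP.values() if v == s),
        )
        LOOKUP[partition_value] = shard
    return RESOURCE_MAP[shard]
```

The trade-off is that the lookup table becomes state that must itself be stored durably and queried on every request, which makes it a natural candidate for caching.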

Distributed Caching

More capacity – now what?
Not practical to query durable store for every request:
• Throughput and latency
• Efficiency / COGS
Not all data needs to be immediately consistent.

Build for Scale – Shift to Distributed Cache
Distributed cache engines can provide high-throughput, low-latency access to commonly accessed application data.
• Semantic: Key -> byte[]
• In-memory data (not written to disk)
• Scale-out architecture (client-side partitioning, explicit connections to physical resource)
• Examples: memcached, Azure Caching
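The key -> byte[] semantic and client-side partitioning can be illustrated with in-memory stand-ins. This Python sketch is not the memcached or Azure Caching client API. `CacheNode` and `PartitionedCacheClient` are hypothetical names, and `zlib.crc32` with a modulo is an assumed way to pick a node.

```python
import zlib


class CacheNode:
    """Stand-in for one cache server: an in-memory key -> bytes table."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class PartitionedCacheClient:
    """Client-side partitioning: the client picks the node for each key."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def _node_for(self, key):
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value: bytes):
        self._node_for(key).data[key] = value

    def get(self, key):
        return self._node_for(key).data.get(key)   # None on a cache miss
```

Since values are opaque bytes, the application owns serialization, and since nothing is written to disk, every read path needs a fallback for a miss.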

Press Association
• 8 datacentres
• 2B peak requests a day
• 50K peak requests per second

Publishing Information Stream

• One source, many subscribers

• Worker role collects data, publishes to cache

• Web instances feed from cache, publish to users

Caching Resource Data

[Diagram: the Azure Load Balancer distributes requests across WebRole instances; each WebRole instance issues HTTP GET requests to Cache Role instances; a Worker Role instance reads from the Source Data Service and PUTs data into the Cache Role instances]
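The publishing flow (one source, a worker that fills the cache, web instances that read only from the cache) can be sketched as follows. This is a minimal Python illustration with an in-memory stand-in for the cache role. The class and function names are hypothetical, and the 404 on a miss is an assumed policy.

```python
class Cache:
    """In-memory stand-in for the cache role."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


def worker_publish(cache, fetch_source):
    """Worker role: collect data from the single source and PUT it into the cache."""
    for key, value in fetch_source().items():
        cache.put(key, value)


def web_handle_get(cache, key):
    """Web role: serve an HTTP GET from the cache only; never call the source."""
    value = cache.get(key)
    return (200, value) if value is not None else (404, None)
```

The source is called once per publish cycle regardless of how many users request the data, which is the point of the design: request volume lands on the cache, not on the source.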

Memcached on Windows Azure
• Provisioned by running memcached within a worker role in your service
• Requires custom set-up and management code
• Good performance and scale*

Windows Azure Cache
• General Availability as part of the Windows Azure 1.8 SDK
• Cache is deployed into your service as a worker role
• Good performance and scale

High Availability for Windows Azure Cache

What happens when rolling out a new application version, or during a Guest OS or Host OS upgrade?
• Data is moved to available nodes by upgrade domain

How does the cache behave if we add or remove instances?
• Adding: the ring is rebalanced and data may be moved
• Deleting: data is NOT moved, so be careful

What about node failure?
• Depends on configuration

Dealing with Node Failure
• Cache can be protected from node failure by keeping a secondary copy
• Strong consistency model: overhead on writing
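The idea of a secondary copy, and why it adds write overhead, can be shown with a toy model. This Python sketch is not how Windows Azure Cache is implemented. The `Node` and `ReplicatedCache` classes, the placement rule and the `alive` flag are all assumptions for illustration, and it expects at least two nodes.

```python
class Node:
    def __init__(self):
        self.data = {}
        self.alive = True


class ReplicatedCache:
    """Each item lives on a primary node and on one secondary node."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _pair(self, key):
        i = sum(key.encode("utf-8")) % len(self.nodes)   # simple deterministic placement
        return self.nodes[i], self.nodes[(i + 1) % len(self.nodes)]

    def put(self, key, value):
        # The write completes only after both copies are stored; that second
        # write is the overhead a strongly consistent secondary imposes.
        for node in self._pair(key):
            node.data[key] = value

    def get(self, key):
        for node in self._pair(key):
            if node.alive and key in node.data:
                return node.data[key]
        return None
```

With the secondary in place, losing the primary still leaves a readable copy; losing both copies turns the read into an ordinary cache miss.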

Cache Data Population and Refresh

On Demand
• Cache Aside: client pulls data from source and caches it on a cache miss

Data Push
• Background tasks (e.g. worker roles) populate the cache with data on a schedule

Data Pull
• Async refresh triggered by the client on detection of stale data; requires careful design
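The on-demand (cache-aside) and data-push strategies can be sketched against a small time-to-live cache. This is a minimal Python illustration; `TtlCache`, the function names and the injectable clock are assumptions, and a source value of `None` is treated as a miss for simplicity.

```python
import time


class TtlCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}            # key -> (value, stored_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or self.clock() - entry[1] > self.ttl:
            return None             # miss, or the entry is stale
        return entry[0]

    def put(self, key, value):
        self._items[key] = (value, self.clock())


def get_cache_aside(cache, key, load_from_source):
    """On demand: read the cache first, fall back to the source on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_source(key)
        cache.put(key, value)
    return value


def push_refresh(cache, keys, load_from_source):
    """Data push: a background task repopulates the cache on a schedule."""
    for key in keys:
        cache.put(key, load_from_source(key))
```

Cache-aside pays the source latency on the first request after each expiry, whereas a scheduled push keeps that cost off the request path at the price of refreshing data nobody may ask for.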

Demo: Integrating Distributed Cache

Recap and Resources

Building big:
• The scale challenge
• Partition your application
• Optimize state management (cache)

Resources:
• Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services: http://msdn.microsoft.com/en-us/library/windowsazure/jj717232.aspx
• TODO: failsafe doc link




Resources
• Follow us on Twitter @WindowsAzure
• Get Started: http://www.windowsazure.com/build

Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
