
Building Big: Lessons learned from Windows Azure customers – Part One

Date post: 20-Feb-2016
Page 1: Building Big: Lessons learned from Windows Azure  customers – Part One

Building Big: Lessons learned from Windows Azure customers – Part One
Mark Simms (@mabsimms), Principal Program Manager, Microsoft
Simon Davies (@simongdavies), Windows Azure Technical Specialist, Microsoft
3-029

Page 2: Building Big: Lessons learned from Windows Azure  customers – Part One

Session Objectives
Designing large-scale services requires careful design and architecture choices.
This session will explore customer deployments on Azure and illustrate the key choices, tradeoffs and learnings.
Two-part session:
• Part 1: Building for Scale
• Part 2: Building for Availability

Page 3: Building Big: Lessons learned from Windows Azure  customers – Part One

Other Great Sessions
This session will focus on architecture and design choices for delivering large-scale services.
If this isn’t a compelling topic, there are many other great sessions happening right now!

Room | Level | Title | Presenter
Nexus/Normandy | 300 | Advanced Windows Azure Infrastructure as a Service (IaaS) | Michael Washam
Trident/Thunder | 200 | What’s new in VS2012 | Orville McDonald
Odyssey | 300 | Apps for Office and SharePoint development using the all new browser-based “Napa” and Visual Studio 2012 | Saurabh Bhatia, Jim Nakashima
Magellan | 200 | WP8: Making Money with Your Application on Windows Phone | Todd Brix

Page 4: Building Big: Lessons learned from Windows Azure  customers – Part One

Agenda
• Building Big – the scale challenge
• Partitioning your application
• Caching your data

Page 5: Building Big: Lessons learned from Windows Azure  customers – Part One

What do we mean by large scale?
• Millions of users
• Hundreds of thousands of operations per second
• Thousands of cores
• Hundreds of databases

Page 6: Building Big: Lessons learned from Windows Azure  customers – Part One

Designing and Deploying Internet Scale Services
James Hamilton, https://www.usenix.org/events/lisa07/tech/full_papers/hamilton/hamilton.pdf

What does Azure do for me?
• Redundancy and Fault Recovery
• Commodity hardware slice
• Single version software
• Multi-tenancy
• Support geo-distribution
• Automatic provisioning and installation
• Configuration and code as a unit
• Manage roles, not servers
• Deal with multi-system failures
• Recover at the service level

Page 7: Building Big: Lessons learned from Windows Azure  customers – Part One

Designing and Deploying Internet Scale Services
James Hamilton, https://www.usenix.org/events/lisa07/tech/full_papers/hamilton/hamilton.pdf

Part 1: Design for Scale
Partition the service
Optimize for density

Part 2: Design for Availability
Design for Failure
• Do not trust underlying components
• Decouple components
• Avoid single points of failure
• Support geo-distribution
Instrument everything
• Implement inter-service monitoring and alerting
• Instrument for production testing
• Configurable logging

Page 9: Building Big: Lessons learned from Windows Azure  customers – Part One

Pottermore
• 500 databases
• 1B page views
• 1000 cores
• 110M daily peak page views

Page 10: Building Big: Lessons learned from Windows Azure  customers – Part One

Decomposing Typical Social Application Workloads

Content Delivery
Site-wide content, transient state (session state).

Content Exploration
Per-user content view, per-user stateful progress.

Social Graph and Content
Per-user content view (comments, likes, etc.), global reach (any user can reach any other user). Loosely consistent / asynchronous updates to N consumers.

Interactive Gaming
N-user content view (game actions, session, etc.), global reach (any user can reach any other user). Interactive state updates shared amongst N players.

Page 11: Building Big: Lessons learned from Windows Azure  customers – Part One
Page 12: Building Big: Lessons learned from Windows Azure  customers – Part One

Build for Scale – Partitioning and Scale Out
Azure architecture is based on scale-out: composing multiple scale units to build large systems.

Azure Compute (Web, Worker, IaaS)
• 1-8 CPU cores
• 2-14 GB RAM
• 5-800 Mbps network

Azure Storage
• 100 TB storage (max)
• 5000 operations / sec
• 3 Gbps

Azure SQL Database
• 150 GB
• 305 threads
• 400 concurrent requests
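
A back-of-envelope sketch of what composing scale units means in practice: given the per-database limits quoted on this slide, estimate how many databases a workload needs. The workload figures (40 TB of data, 60,000 concurrent requests) are hypothetical and chosen only for illustration, and the calculation assumes load spreads evenly across databases.

```python
import math

# Per-database limits as quoted on the slide (2012-era figures).
SQL_DB_MAX_GB = 150
SQL_DB_MAX_CONCURRENT_REQUESTS = 400


def databases_needed(total_gb: float, peak_concurrent_requests: int) -> int:
    """Smallest database count that satisfies both the size limit and the
    concurrency limit, assuming an even spread of data and requests."""
    by_size = math.ceil(total_gb / SQL_DB_MAX_GB)
    by_requests = math.ceil(peak_concurrent_requests / SQL_DB_MAX_CONCURRENT_REQUESTS)
    return max(by_size, by_requests)


# Hypothetical workload: 40 TB of data, 60,000 concurrent requests at peak.
print(databases_needed(40_000, 60_000))  # 267 (size is the binding limit)
```

Whichever limit binds first sets the number of scale units, which is why the partitioning function matters: an uneven spread hits a per-unit limit earlier than this estimate suggests.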

Page 13: Building Big: Lessons learned from Windows Azure  customers – Part One

Evaluating Scale

[Diagram: an Azure Cloud Service containing a Load Balancer, Web Role and Worker Role, backed by Windows Azure SQL Database]

Resource | Aspect | Partitioning | Capacity
Web role | Low state | Automatic (via load balancer), round-robin | Add more instances (easy)
SQL Database | High state | Manual (via app code), choose partitioning function | Add more databases

Page 17: Building Big: Lessons learned from Windows Azure  customers – Part One

Understanding Partitioning for Scale
1. Select the partition key. Example: Last Name
2. Convert partition key to a partition value (optional). Example: LastName.Substring(0, 2) -> "Si"
3. Map partition value to a logical partition. Example: ShardMap["Si"] -> S
4. Map logical partition to physical resource. Example: DbMap["S"] -> "Db0123S"
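
The four steps can be sketched as a pair of dictionary lookups. This is a minimal illustration, not the presenters' code: only the "Si" -> "S" -> "Db0123S" entries come from the slide, and the other map entries (and the name "Db0123D") are hypothetical.

```python
# Partition value -> logical partition. Only "Si" -> "S" is from the slide.
SHARD_MAP = {"Si": "S", "Sm": "S", "Da": "D"}

# Logical partition -> physical database. "Db0123D" is a hypothetical name.
DB_MAP = {"S": "Db0123S", "D": "Db0123D"}


def resolve_database(last_name: str) -> str:
    partition_key = last_name                        # 1. select the partition key
    partition_value = partition_key[:2]              # 2. key -> partition value
    logical_partition = SHARD_MAP[partition_value]   # 3. value -> logical partition
    return DB_MAP[logical_partition]                 # 4. logical -> physical resource


print(resolve_database("Simms"))  # Db0123S
```

Keeping steps 3 and 4 as separate maps means a logical partition can be moved to a different database by editing one DB_MAP entry, without touching how keys are assigned to partitions.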

Page 18: Building Big: Lessons learned from Windows Azure  customers – Part One

Partitioning the Database (Range Based)
1. The user (user ID) is a natural partitioning key; all workloads are user-centric. Example: "MaSimms"
2. Use a non-cryptographic hash to convert the user ID to an integer value. Example: 639837447
3. Map a range of integers to a logical “shard”. Example: ShardMap.FirstOrDefault(e => e.IsInRange(639837447))
4. Map logical “shard” to physical resource (database). Example: DbMap[Shard].ConnectionString
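
A minimal sketch of the range-based scheme above, under stated assumptions: CRC-32 from the Python standard library stands in for the non-cryptographic hash (the talk does not depend on that particular function), the signed 32-bit hash space is split into five equal ranges, user IDs are upper-cased before hashing, and the connection strings are hypothetical placeholders.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShardRange:
    shard: int
    low: int    # inclusive
    high: int   # exclusive

    def is_in_range(self, value: int) -> bool:
        return self.low <= value < self.high


def build_shard_map(shard_count: int) -> List[ShardRange]:
    """Split the signed 32-bit integer space into equal contiguous ranges."""
    lo, span = -2**31, 2**32
    bounds = [lo + (i * span) // shard_count for i in range(shard_count + 1)]
    return [ShardRange(i, bounds[i], bounds[i + 1]) for i in range(shard_count)]


def hash_user_id(user_id: str) -> int:
    """Stand-in non-cryptographic hash (CRC-32) folded to a signed 32-bit int.
    Upper-casing first makes the mapping case-insensitive."""
    unsigned = zlib.crc32(user_id.upper().encode("utf-8"))
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


SHARD_MAP = build_shard_map(5)
# Hypothetical connection strings, one per logical shard.
DB_MAP = {i: f"Server=example;Database=UserData_{i:03d}" for i in range(5)}


def shard_for_value(value: int) -> ShardRange:
    # Mirrors ShardMap.FirstOrDefault(e => e.IsInRange(value)).
    return next(e for e in SHARD_MAP if e.is_in_range(value))


def connection_string_for(user_id: str) -> str:
    return DB_MAP[shard_for_value(hash_user_id(user_id)).shard]


print(shard_for_value(639837447).shard)  # 3
print(connection_string_for("MaSimms"))
```

With these ranges the slide's example value 639837447 falls in the fourth of five ranges (index 3); a different hash function or range layout would of course place it elsewhere.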

Page 19: Building Big: Lessons learned from Windows Azure  customers – Part One

Demo: Partitioning Code (Range Based)

Page 20: Building Big: Lessons learned from Windows Azure  customers – Part One
Page 21: Building Big: Lessons learned from Windows Azure  customers – Part One

Range Based Partitioning
Hash (Murmur3) against Upper(); 5 shards, evenly distributed

JohnSmith -> Hash: -789794523 -> ShardMap -> Shard: 1 (-1288490190 : -429496730) -> Resource Map -> UserData_001

Page 22: Building Big: Lessons learned from Windows Azure  customers – Part One

Logical Bucket Based Partitioning
Range based partitioning: Hash (Murmur3) against Upper(); 5 shards, evenly distributed

JohnSmith -> Hash: -789794523 -> ShardMap (32 buckets) -> Shard: 27 -> Resource Map -> UserData_001

Logical buckets mapped to physical databases
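
A small sketch of bucket-based partitioning, with assumptions labeled: the slide does not say how the bucket is derived from the hash, so taking the absolute value modulo 32 is a guess (it does reproduce the slide's bucket 27 for hash -789794523). The bucket-to-database assignment and the "ShardDb_n" names are hypothetical.

```python
BUCKET_COUNT = 32

# Hypothetical assignment of the 32 logical buckets to 5 physical databases.
BUCKET_TO_DB = {b: f"ShardDb_{b % 5}" for b in range(BUCKET_COUNT)}


def bucket_for_hash(hash_value: int) -> int:
    # Assumed derivation: absolute value modulo the bucket count.
    return abs(hash_value) % BUCKET_COUNT


def database_for_hash(hash_value: int) -> str:
    return BUCKET_TO_DB[bucket_for_hash(hash_value)]


def move_bucket(bucket: int, new_database: str) -> None:
    """Re-home one logical bucket; keys keep their bucket, only the map changes."""
    BUCKET_TO_DB[bucket] = new_database


print(bucket_for_hash(-789794523))    # 27
print(database_for_hash(-789794523))  # ShardDb_2
```

Because there are more logical buckets than databases, capacity can be added by moving a few buckets to a new database (after copying their data) rather than rehashing every key.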

Page 23: Building Big: Lessons learned from Windows Azure  customers – Part One

Lookup Bucket Based Partitioning
Range based partitioning: Hash (Murmur3) against Upper(); 5 shards, evenly distributed

JohnSmith -> Hash: -789794523 -> ShardMap (Lookup) -> Shard: 2 -> Resource Map -> UserData_001

Lookup records map each partition value to a logical/physical resource
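
A sketch of lookup-based partitioning, in which an explicit record per partition value replaces the computed mapping. The in-memory dictionaries stand in for a durable lookup table, and the "assign new users to the least-loaded shard" policy is an assumption made for illustration; the slide does not specify how records are created.

```python
from typing import Dict

# Lookup records: partition value -> shard. A real system would persist these.
LOOKUP: Dict[str, int] = {}

# Hypothetical per-shard user counts, used only to place new users.
SHARD_LOAD: Dict[int, int] = {0: 0, 1: 0, 2: 0}


def shard_for_user(user_id: str) -> int:
    key = user_id.upper()
    if key not in LOOKUP:
        # Assumed policy: place a new user on the least-loaded shard
        # (ties broken by the lowest shard number).
        shard = min(SHARD_LOAD, key=lambda s: (SHARD_LOAD[s], s))
        LOOKUP[key] = shard
        SHARD_LOAD[shard] += 1
    return LOOKUP[key]


print(shard_for_user("JohnSmith"))  # 0
print(shard_for_user("johnsmith"))  # 0 (same record, case-insensitive)
print(shard_for_user("MaSimms"))    # 1
```

The trade-off is an extra lookup (usually cached) on every request in exchange for being able to move an individual user by rewriting a single record.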

Page 24: Building Big: Lessons learned from Windows Azure  customers – Part One

Distributed Caching

Page 25: Building Big: Lessons learned from Windows Azure  customers – Part One

More capacity – now what?
Not practical to query durable store for every request:
• Throughput and latency
• Efficiency / COGS

Not all data needs to be immediately consistent.

Page 26: Building Big: Lessons learned from Windows Azure  customers – Part One

Build for Scale – Shift to Distributed Cache
Distributed cache engines can provide high-throughput, low-latency access to commonly accessed application data.
• Semantic: Key -> byte[]
• In-memory data (not written to disk)
• Scale-out architecture (client-side partitioning, explicit connections to physical resource)
• Examples: memcached, Azure Caching
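
To make "Key -> byte[]" and "client-side partitioning" concrete, here is a minimal in-process sketch. The CacheNode class is a stand-in for a remote cache server, the class and method names are hypothetical, and CRC-32 modulo the node count is an illustrative placement rule rather than what memcached clients or Azure Caching actually use.

```python
import zlib
from typing import Dict, List, Optional


class CacheNode:
    """In-memory stand-in for one cache server; a real node is reached over the network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class PartitionedCacheClient:
    """Client-side partitioning: the client, not a proxy, picks the node for each key."""

    def __init__(self, nodes: List[CacheNode]) -> None:
        self._nodes = list(nodes)

    def node_for(self, key: str) -> CacheNode:
        # CRC-32 is an illustrative choice of non-cryptographic hash.
        index = zlib.crc32(key.encode("utf-8")) % len(self._nodes)
        return self._nodes[index]

    def put(self, key: str, value: bytes) -> None:
        self.node_for(key).put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self.node_for(key).get(key)


nodes = [CacheNode(f"cache-{i}") for i in range(3)]
client = PartitionedCacheClient(nodes)
client.put("user:42:profile", b'{"name": "example"}')
print(client.get("user:42:profile"))
```

Plain modulo placement remaps most keys whenever the node count changes, which is why production clients commonly use consistent hashing instead.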

Page 27: Building Big: Lessons learned from Windows Azure  customers – Part One

Press Association
• 8 datacentres
• 2B peak requests a day
• 50K peak requests per second

Page 28: Building Big: Lessons learned from Windows Azure  customers – Part One

Publishing Information Stream
• One source, many subscribers
• Worker role collects data, publishes to cache
• Web instances feed from cache, publish to users

Caching Resource Data
[Diagram: Azure Load Balancer, three WebRole instances, two Cache Role instances, a Worker Role instance and the Source Data Service; arrows labeled HTTP GET and PUT]
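
The publish pattern on this slide can be sketched in a few lines. Everything here is a stand-in: the dictionary plays the part of the cache role instances, fetch_from_source is a hypothetical call to the source data service, and the key name "feed:latest" is invented for the example.

```python
import json
from typing import Dict, Optional

# Stand-in for the cache role instances (key -> bytes).
cache: Dict[str, bytes] = {}


def fetch_from_source() -> dict:
    """Hypothetical call to the source data service."""
    return {"headline": "example", "version": 1}


def worker_publish() -> None:
    """Worker role: read the source once, then PUT the serialized result into the cache."""
    payload = json.dumps(fetch_from_source()).encode("utf-8")
    cache["feed:latest"] = payload


def web_handle_get() -> Optional[dict]:
    """Web role: serve users from the cache only; never call the source directly."""
    payload = cache.get("feed:latest")
    return None if payload is None else json.loads(payload.decode("utf-8"))


print(web_handle_get())  # None (nothing published yet)
worker_publish()
print(web_handle_get())  # {'headline': 'example', 'version': 1}
```

The source sees one reader (the worker) regardless of how many web instances or users there are, which is the point of "one source, many subscribers".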

Page 29: Building Big: Lessons learned from Windows Azure  customers – Part One

Memcached on Windows Azure
• Provisioned by running memcached within a worker role in your service
• Requires custom set-up and management code
• Good performance and scale*

Page 30: Building Big: Lessons learned from Windows Azure  customers – Part One

Windows Azure Cache
• General Availability as part of the Windows Azure 1.8 SDK
• Cache is deployed into your service as a worker role
• Good performance and scale

Page 31: Building Big: Lessons learned from Windows Azure  customers – Part One

High Availability for Windows Azure Cache

What happens when rolling out a new application version, or during a Guest OS or Host OS upgrade?
• Data is moved to available nodes by upgrade domain

How does the cache behave if we add or remove instances?
• Adding: the ring is rebalanced; data may be moved
• Deleting: data is NOT moved, so be careful

What about node failure?
• Depends on configuration

Page 32: Building Big: Lessons learned from Windows Azure  customers – Part One

Dealing with Node Failure
• Cache can be protected from node failure by keeping a secondary copy
• Strong consistency model: overhead on writing

Page 33: Building Big: Lessons learned from Windows Azure  customers – Part One

Cache Data Population and Refresh

On Demand
• Cache Aside: client pulls data from source and caches on cache miss

Data Push
• Background tasks (e.g. worker roles) populate cache with data on a schedule

Data Pull
• Async refresh triggered by client on detection of stale data; requires careful design
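
A minimal sketch of the on-demand (cache-aside) strategy, assuming a local dictionary in place of a distributed cache and a fixed time-to-live to bound staleness. The class name, the TTL value and the loader function are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """On a miss (or an expired entry), load from the source and cache the result."""

    def __init__(self, load: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                      # hit: entry is still fresh
        value = self._load(key)                  # miss or expired: go to the source
        self._entries[key] = (value, now + self._ttl)
        return value


source_reads = []


def load_from_database(key: str) -> bytes:
    """Hypothetical durable-store read; records each call so reads can be counted."""
    source_reads.append(key)
    return f"value-for-{key}".encode("utf-8")


profiles = CacheAside(load_from_database, ttl_seconds=30.0)
profiles.get("user:42")
profiles.get("user:42")
print(len(source_reads))  # 1 (the second call was served from the cache)
```

Cache-aside only pays the source-read cost for keys that are actually requested, whereas the push approach keeps hot data warm at the cost of refreshing entries nobody may read.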

Page 34: Building Big: Lessons learned from Windows Azure  customers – Part One

Demo: Integrating Distributed Cache

Page 35: Building Big: Lessons learned from Windows Azure  customers – Part One

Recap and Resources
Building big:
• The scale challenge
• Partition your application
• Optimize state management (cache)

Resources:
• Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services
• TODO: failsafe doc link

Page 36: Building Big: Lessons learned from Windows Azure  customers – Part One

Resources
• Follow us on Twitter @WindowsAzure
• Get Started: www.windowsazure.com/build

Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions

Page 37: Building Big: Lessons learned from Windows Azure  customers – Part One

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

