Storage Strategies

Agenda
- Partitioning
  - Horizontal Partitioning
  - Vertical Partitioning
- Non-Relational Data Modeling
- Upgrade Scenarios for the Data Tier

Outline
- Data Partitioning
  - Vertical Partitioning
  - Horizontal Partitioning
  - Partitioning in: Windows Azure Storage, SQL Azure
- Windows Azure Tables
  - Data modeling
  - Upgrade scenarios

Why Partition
- Data volume (too many bytes)
- Workload (too many transactions/second)
- Cost (using storage with different costs)
- Elasticity (just-in-time partitioning for high-load periods)

Horizontal Partitioning
[Diagram: five sample rows (name, email address, a 3 KB item, a 3 MB item) for David Alexander, Jared Carlson, Sue Charles, Simon Mitchel and Richard Zeng, split by row across partitions]

Horizontal Partitioning (Sharding)
- Spread data across similar nodes
- Achieve massive scale-out (data and load)
- Intra-partition queries are simple
- Cross-partition queries are harder
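A minimal sketch of the difference between intra- and cross-partition queries, assuming in-memory dicts stand in for the similar nodes. `ShardedStore`, the CRC-32 routing rule, the shard count and the example.com/example.org addresses are all hypothetical.

```python
import zlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shard:
    name: str
    rows: dict = field(default_factory=dict)  # customer name -> email
    queries: int = 0                          # how many times this shard was queried


class ShardedStore:
    def __init__(self, shard_count: int) -> None:
        self.shards = [Shard(f"node-{i}") for i in range(shard_count)]

    def _route(self, key: str) -> Shard:
        # Stable hash (CRC-32) so the same key always maps to the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, name: str, email: str) -> None:
        self._route(name).rows[name] = email

    def get(self, name: str) -> Optional[str]:
        # Intra-partition query: exactly one shard is touched.
        shard = self._route(name)
        shard.queries += 1
        return shard.rows.get(name)

    def find_by_domain(self, domain: str) -> list:
        # Cross-partition query: every shard must be asked and the results merged.
        results = []
        for shard in self.shards:
            shard.queries += 1
            results.extend(n for n, e in shard.rows.items() if e.endswith(domain))
        return sorted(results)


store = ShardedStore(3)
for name, email in [("David Alexander", "david@example.com"),
                    ("Jared Carlson", "jared@example.com"),
                    ("Sue Charles", "sue@example.org"),
                    ("Simon Mitchel", "simon@example.org"),
                    ("Richard Zeng", "richard@example.com")]:
    store.put(name, email)
```

A lookup by name costs one shard query; a search by email domain costs one query per shard, which is why cross-partition queries get harder as the node count grows.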


Vertical Partitioning
[Diagram: the columns of each row split across SQL Azure, Tables and BLOBs]
- Spread data across dissimilar nodes
- Place frequently queried data in more ‘expensive’ indexed storage
- Place large data in ‘cheap’ binary storage
- Retrieving a whole row requires more than one query
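A minimal sketch of why a vertically partitioned row needs more than one query, assuming two plain dicts: `indexed_store` stands in for SQL Azure or Tables and `blob_store` for Blob storage. The store names, the lookup counter and the `photos/<id>` naming scheme are hypothetical.

```python
indexed_store = {}                   # customer id -> small, frequently queried columns
blob_store = {}                      # blob name -> large binary payload
lookups = {"indexed": 0, "blob": 0}  # counts queries against each store


def save_customer(customer_id: int, name: str, email: str, photo: bytes) -> None:
    blob_name = f"photos/{customer_id}"
    blob_store[blob_name] = photo  # large data goes to cheap binary storage
    indexed_store[customer_id] = {"name": name, "email": email, "photo_blob": blob_name}


def load_customer(customer_id: int) -> dict:
    # Query 1: the indexed store holds the searchable columns plus a pointer.
    lookups["indexed"] += 1
    row = dict(indexed_store[customer_id])
    # Query 2: follow the pointer to fetch the large column.
    lookups["blob"] += 1
    row["photo"] = blob_store[row["photo_blob"]]
    return row


save_customer(1, "Jared Carlson", "jared@example.com", b"\x00" * 3_000_000)
full_row = load_customer(1)
```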



Horizontal Partitioning

Table Storage – Key Points
- Partitions are auto-balanced: no need to partition into equal bins
- Hot partitions may be scaled up: the Windows Azure fabric may dedicate more resources to partitions with a high transaction load
- PartitionKey AND RowKey = unique ID
- The PartitionKey must be included for Create, Update and Delete
- Select queries across partitions run sequentially
- Don’t use sequential partition keys
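The slide does not spell out why sequential keys hurt. One common reading is that monotonically increasing keys send every new insert to the same end of the key range, so one partition stays hot. Assuming that reading, a minimal sketch of one mitigation is to prefix a sequential id with a stable hash bucket; `BUCKETS` and the key layout are hypothetical.

```python
import zlib

BUCKETS = 16  # hypothetical number of key prefixes


def sequential_key(order_id: int) -> str:
    # Zero-padded so keys sort numerically: each new key is the largest so far.
    return f"{order_id:010d}"


def spread_key(order_id: int) -> str:
    # A stable CRC-32 bucket prefix scatters consecutive ids across the key space.
    bucket = zlib.crc32(str(order_id).encode("ascii")) % BUCKETS
    return f"{bucket:02d}-{order_id:010d}"


ids = range(1000, 1100)
seq_keys = [sequential_key(i) for i in ids]
spread_keys = [spread_key(i) for i in ids]
```

The trade-off: a query for a range of ids now has to visit every bucket, so this only suits data that is mostly read by exact key.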

Table Storage – Key Points
- Continuation tokens may be returned from cross-partition queries
  - Any query other than one specifying exactly the PartitionKey and RowKey needs to handle continuation tokens: http://tinyurl.com/ContToken
- Key columns: up to 1 KB in size
  - Should aim to stay within the 260-character URI limit
- Be aggressive: e.g. only ever query by an ID? Use a unique partition key and RowKey = ‘ ’ for a partition of 1
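A minimal sketch of draining a query that may return continuation tokens. `query_page` is a hypothetical stand-in for the Table service: it returns at most `page_size` entities plus a token, or `None` once the result set is complete. In the real REST API the token travels in the `x-ms-continuation-NextPartitionKey` and `x-ms-continuation-NextRowKey` response headers; here it is just an integer offset.

```python
from typing import Optional

# Hypothetical data spread across three partitions.
ENTITIES = [(f"pk{p}", f"rk{r}") for p in range(3) for r in range(4)]


def query_page(token: Optional[int], page_size: int = 5):
    # Stand-in for one round trip: returns a page and the next token (or None).
    start = token or 0
    page = ENTITIES[start:start + page_size]
    next_token = start + page_size if start + page_size < len(ENTITIES) else None
    return page, next_token


def query_all() -> list:
    results, token = [], None
    while True:
        page, token = query_page(token)
        results.extend(page)  # a page may be partial, so never assume it is complete
        if token is None:
            return results
```

The important part is the loop: a caller that stops after the first response silently loses every entity beyond the first page.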

Horizontal – SQL Azure

[Diagram: data spread across SQL Azure databases numbered 1, 2, … 13, … 26]

SQL Azure – Key Points
- Partition for:
  - Data volume > 50 GB
  - Transaction throttling (non-deterministic): always code for retry
- All partitioning logic is up to the developer
  - Algorithmic
  - Lookup based
- Partitions are not auto-balanced
  - Need to aim for ‘equal’ partitions
  - ‘Equal’ does not necessarily mean the same size
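"Always code for retry" can be sketched as a wrapper with exponential back-off. `TransientError` and `flaky_query` are hypothetical; real code would catch the database driver's exception and inspect its error number. The attempt count and delays are illustrative, not recommended values, and `sleep` is injectable so the delays can be observed.

```python
import time


class TransientError(Exception):
    """Stands in for a throttling or dropped-connection error."""


def with_retry(operation, attempts=4, base_delay=0.01, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # exponential back-off


calls = {"n": 0}


def flaky_query():
    # Fails twice (as if throttled), then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("throttled")
    return ["row1", "row2"]


delays = []
rows = with_retry(flaky_query, sleep=delays.append)
```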

Choosing a Partition Key
- Natural keys
  - Country
  - First letter of last name
  - Date
- Mathematical
  - Hash functions
  - Modulo operator
- Lookup based
  - Lookup table to resolve a value to a partition
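A minimal sketch of two of these options: a natural key (first letter of the last name) and a lookup table. The shard names and the country-to-shard table are hypothetical.

```python
def partition_by_last_name(full_name: str) -> str:
    # Natural key: first letter of the last name (up to 26 partitions, A..Z).
    last_name = full_name.split()[-1]
    return last_name[0].upper()


# Lookup based: an explicit table resolves a value to a partition, so a busy
# value can get its own partition while quiet values share one.
COUNTRY_TO_SHARD = {"US": "shard-us", "GB": "shard-eu", "FR": "shard-eu", "DE": "shard-eu"}
DEFAULT_SHARD = "shard-rest"


def partition_by_country(country_code: str) -> str:
    return COUNTRY_TO_SHARD.get(country_code.upper(), DEFAULT_SHARD)
```

Natural keys need no extra storage but inherit the skew of the data (far more last names start with S than with Z); the lookup table costs an extra read but lets partitions be rebalanced by editing the table.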

Using Modulo
- The remainder of a division
- Nice properties for partitioning: given two positive integers M and N, M mod N returns a number between 0 and N-1
- Want equal-sized partitions? Given an appropriate (e.g. uniform) distribution of M, we get N ‘equally full’ buckets

demo
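A minimal stand-in for the modulo demo, assuming random GUIDs as the values of M and N = 4. The seed, sample size and the skewed example are illustrative only.

```python
import random
import uuid
from collections import Counter


def partition_for(guid: uuid.UUID, n: int) -> int:
    # M mod N is always in 0..N-1, so every GUID maps to exactly one of N partitions.
    return guid.int % n


rng = random.Random(2011)  # fixed seed so the run is repeatable
guids = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(10_000)]
uniform_counts = Counter(partition_for(g, 4) for g in guids)

# A poorly distributed M defeats modulo: multiples of 4 all land in partition 0.
skewed_counts = Counter((4 * i) % 4 for i in range(100))
```

The uniform GUIDs fill the four buckets roughly equally; the skewed input shows what "given an appropriate distribution of M" rules out.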

Using Hash Values
- Using a hash function projects one distribution into another
- Use a hash function that projects a random distribution
- Do NOT use a cryptographic hash function
- Be careful if using Object.GetHashCode(): boxed types may return a different value from the unboxed equivalent
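Python has an analogous trap to `GetHashCode()`: the built-in `hash()` of a string is randomized per interpreter process unless `PYTHONHASHSEED` is fixed, so a partition computed with it can change after a restart. A minimal sketch using 32-bit FNV-1a instead, a fast non-cryptographic hash whose output is stable across runs; the choice of FNV-1a is ours, not the slide's.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: fast, non-cryptographic, and the same on every run."""
    h = 0x811C9DC5                          # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by FNV prime, keep 32 bits
    return h


def partition_for(key: str, n: int) -> int:
    # Hash first to randomize the distribution, then modulo to pick a bucket.
    return fnv1a_32(key.encode("utf-8")) % n


buckets = [partition_for(name, 4) for name in ("David Alexander", "Sue Charles", "Richard Zeng")]
```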

Partition Stability Over Time
- May need to change the partitioning scheme
- Two options:
  1. Re-partition all data
  2. Version the partitioning scheme, e.g. <Version><PartitionKey>:
     <v1><A3E567D7D8C68789>
     <v2><A8B978C8B6D77836>
     where v1 = GUID mod 4 and v2 = GUID mod 10
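A minimal sketch of a versioned partitioning scheme: every key records the scheme version that produced it, so entities written under v1 can still be located after v2 becomes current. v1 = GUID mod 4 comes from the slide; the v2 modulus of 10, the `SCHEMES` registry and the `v<version>-<partition>` key format are assumptions.

```python
import uuid

# Hypothetical registry: scheme version -> number of partitions.
SCHEMES = {1: 4, 2: 10}  # v1 = GUID mod 4; v2 modulus assumed to be 10
CURRENT = 2


def make_key(guid: uuid.UUID, version: int = CURRENT) -> str:
    # Prefix the partition number with the scheme version that produced it.
    return f"v{version}-{guid.int % SCHEMES[version]}"


def parse_key(key: str) -> tuple:
    # Recover (version, partition) so a reader knows which scheme applies.
    version, partition = key[1:].split("-")
    return int(version), int(partition)


g = uuid.UUID(int=123456789)
old_key = make_key(g, version=1)  # how the entity was keyed before the change
new_key = make_key(g)             # how new writes are keyed now
```

New writes use the current scheme while old entities keep their keys, so nothing has to be re-partitioned up front; a background job can migrate them later.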

Just-In-Time Partitioning
- In SQL Azure, partitions cost money
- In highly elastic scenarios, partitions may be needed for just a few hours or days
- If load is predictable:
  - Partition before the load commences
  - De-partition after the load has subsided


Goals for Vertical Partitioning

SQL Azure
- Fully indexable
- No query transaction charge
- $9.99/GB/month

Balance Performance vs. Cost
- Use appropriate storage for the type of data

Windows Azure Storage
- Limited indexing
- Pay per query
- $0.15/GB/month
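The two price points above make the trade-off easy to quantify. A minimal sketch that prices storage only: per-query charges for Windows Azure Storage are left out, and the 100 GB and 5 GB/95 GB splits are made-up inputs.

```python
SQL_AZURE_PER_GB_MONTH = 9.99      # from the slide
AZURE_STORAGE_PER_GB_MONTH = 0.15  # from the slide


def monthly_storage_cost(gb_in_sql_azure: float, gb_in_azure_storage: float) -> float:
    """Storage-only monthly cost in dollars; per-query charges are not modeled."""
    cost = (gb_in_sql_azure * SQL_AZURE_PER_GB_MONTH
            + gb_in_azure_storage * AZURE_STORAGE_PER_GB_MONTH)
    return round(cost, 2)


everything_in_sql = monthly_storage_cost(100, 0)  # all 100 GB in SQL Azure
split = monthly_storage_cost(5, 95)               # 5 GB searchable, 95 GB binary
```

With these inputs, moving 95 GB of large binary data out of SQL Azure cuts the storage bill from about $999 to about $64 a month, before any query charges.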

Vertical Partitioning
[Diagram: the same sample rows split by column: name and email in Tables or SQL Azure, the 3 KB item in Tables or Blobs, the 3 MB item in BLOBs]

Worked Example
- Searchable data in Table Storage or SQL Azure
  - Indexed (SQL Azure)
  - No cost per query (SQL Azure)
  - Lower-cost storage (Windows Azure Table Storage)
- Thumbnails in Tables
  - Binary properties < 64 KB
  - Batch queries save transaction costs
- Full photos in Windows Azure Blob Storage
  - Can handle large data
  - Can stream full-sized files directly back to the HTTP client, via the CDN if needed
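A minimal sketch of the photo placement rule, assuming plain dicts stand in for the table and the blob container. The 64 KB limit comes from the slide; the entity property names (`Thumbnail`, `PhotoBlob`) and the `photos/<id>.jpg` blob naming are hypothetical.

```python
MAX_BINARY_PROPERTY = 64 * 1024  # table binary property limit, per the slide

thumbnails_table = {}  # photo id -> entity holding the thumbnail inline
photo_blobs = {}       # blob name -> full-size photo bytes


def store_photo(photo_id: str, thumbnail: bytes, full_photo: bytes) -> str:
    # Validate before writing anything, so a rejected photo leaves no partial data.
    if len(thumbnail) > MAX_BINARY_PROPERTY:
        raise ValueError("thumbnail exceeds the table binary property limit")
    blob_name = f"photos/{photo_id}.jpg"
    photo_blobs[blob_name] = full_photo  # large data: Blob storage
    thumbnails_table[photo_id] = {       # small data: a single table entity
        "Thumbnail": thumbnail,
        "PhotoBlob": blob_name,
    }
    return blob_name


def gallery(photo_ids: list) -> list:
    # Thumbnails are served from the table entities alone; no blob is read.
    return [thumbnails_table[p]["Thumbnail"] for p in photo_ids]


store_photo("p1", b"t" * 3000, b"f" * 3_000_000)
```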

Non-Relational Data Modeling

- Storage is cheap
- Tables != RDBMS: cross-partition queries are resource intensive
- Aggressive data duplication can save money and boost performance
- Goal: to be able to include the PartitionKey in all queries

E.g. Tweet Storage

Tweet: TweetID, UserID, DateTimeStamp, Message

With an RDBMS you’d probably start with something like this:
SELECT * FROM Tweet WHERE Message LIKE '%SearchTerm%'

E.g. Tweet Storage

(IX = indexed column)

Tweet: …, Message
(TweetID, WordID)
(WordID, Word (IX))

Tweet: …, Message
(TweetID, Word (IX))

E.g. Tweet Storage

(PK = PartitionKey, RK = RowKey)

TweetID (RK), UserID (PK), DateTimeStamp, Message

TweetID (RK), UserID, DateTimeStamp, Message, Word (PK)
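A minimal sketch of the word-partitioned index, assuming dicts of dicts stand in for two tables keyed by (PartitionKey, RowKey). Each tweet is stored once under its UserID and duplicated in full under every word it contains, so a search for one word reads a single partition. The tokenizer regex and the sample tweets are hypothetical.

```python
import re
from collections import defaultdict

tweets_by_user = defaultdict(dict)  # PartitionKey = UserID, RowKey = TweetID
tweets_by_word = defaultdict(dict)  # PartitionKey = Word,   RowKey = TweetID


def add_tweet(tweet_id: str, user_id: str, timestamp: str, message: str) -> None:
    entity = {"TweetID": tweet_id, "UserID": user_id,
              "DateTimeStamp": timestamp, "Message": message}
    tweets_by_user[user_id][tweet_id] = entity
    for word in set(re.findall(r"[a-z0-9']+", message.lower())):
        # Duplicate the whole entity into the index: no second lookup on read.
        tweets_by_word[word][tweet_id] = dict(entity)


def search(word: str) -> list:
    # Single-partition read; .get avoids creating empty partitions on a miss.
    return sorted(tweets_by_word.get(word.lower(), {}).values(),
                  key=lambda e: e["TweetID"])


add_tweet("t1", "sue", "2011-05-01T10:00", "Partitioning is key")
add_tweet("t2", "simon", "2011-05-01T10:05", "Key design matters")
```

The cost is storage and write amplification (one extra entity per distinct word), which is the trade the "storage is cheap" point argues for.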

E.g. Tweet Storage

TweetID (RK), UserID (PK), DateTimeStamp, Message

TweetID (RK), UserID, DateTimeStamp, Message, UserID (PK)

Modeling in Tables
- Currently no secondary indexes (coming)
- Be careful to minimize cross-partition queries
- Build indexes yourself
  - Concentrate on useful partition keys
- If associated data is small enough, save additional queries: duplicate the data with each index

Upgrade Scenarios for the Data Tier

Entity Shape Change
- Have a version property in each entity
- Types of shape change:
  - Adding non-key properties: two-step upgrade process; use ADO.NET’s “IgnoreMissingProperties”
  - Removing non-key properties: similar two-step process to adding; in addition use ADO.NET’s “ReplaceOnUpdate”
  - Changing the PartitionKey or RowKey: copy entities to a new table
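Because PartitionKey plus RowKey is an entity's identity, a key change cannot be made in place. A minimal sketch of the copy, assuming dicts keyed by (PartitionKey, RowKey) stand in for the old and new tables; `copy_with_new_keys` and the re-keying by day are hypothetical.

```python
def copy_with_new_keys(old_table: dict, new_keys) -> dict:
    """Copy every entity into a new table under keys computed by new_keys(entity)."""
    new_table = {}
    for entity in old_table.values():
        pk, rk = new_keys(entity)
        if (pk, rk) in new_table:
            raise ValueError(f"new keys collide: {(pk, rk)}")
        # dict(...) builds a new entity; the old table is left untouched.
        new_table[(pk, rk)] = dict(entity, PartitionKey=pk, RowKey=rk)
    return new_table


old = {
    ("sue", "t1"): {"PartitionKey": "sue", "RowKey": "t1", "Day": "2011-05-01"},
    ("simon", "t2"): {"PartitionKey": "simon", "RowKey": "t2", "Day": "2011-05-02"},
}
# Hypothetical re-keying: partition by day instead of by user.
by_day = copy_with_new_keys(old, lambda e: (e["Day"], e["RowKey"]))
```

Keeping the old table intact lets old clients keep reading it until every client has switched to the new table.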

Adding Additional Property
[Table: entities (PK1, RK1), (PK2, RK2), (PK3, RK3), … all at Version 1]

Release new version of Table Schema with NEW Property

Upgrade Client to v1.5
[Table: entities still all at Version 1]
- v1.5 Client: if the entity is v1, store a default value; do not upgrade the entity
- v1 Client: ignores the new property, because it uses “IgnoreMissingProperties”

Upgrade Client to v2
[Table: (PK2, RK2) now at Version 2 with a real value; other entities still at Version 1 with the default]
- v2 Client: starts using real values for the new property; updates the entity to v2
- v1.5 Client: understands v1 and v2

Upgrade Entities to v2
[Table: entities being moved to Version 2; (PK2, RK2) keeps its real value]
- Use a background job to update the version number of all entities
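A minimal sketch of the client versions described above, assuming entities are plain dicts with a `Version` property. The new property `Rating`, its default of 0, and the function names are hypothetical; `read_as_v1` only imitates the effect of “IgnoreMissingProperties”, it is not that API.

```python
V1_PROPERTIES = {"PartitionKey", "RowKey", "Version", "Name"}
DEFAULT_RATING = 0  # hypothetical default for the new property "Rating"


def read_as_v1(entity: dict) -> dict:
    # v1 client: like IgnoreMissingProperties, properties its type lacks are skipped.
    return {k: v for k, v in entity.items() if k in V1_PROPERTIES}


def save_as_v15(entity: dict) -> dict:
    # v1.5 client: for a v1 entity, store a default value but leave Version at 1.
    if entity["Version"] == 1:
        return dict(entity, Rating=entity.get("Rating", DEFAULT_RATING))
    return dict(entity)


def save_as_v2(entity: dict, rating: int) -> dict:
    # v2 client: store a real value and upgrade the entity to v2.
    return dict(entity, Rating=rating, Version=2)


def upgrade_all(table: dict) -> None:
    # Background job: bump every remaining v1 entity to v2.
    for key, entity in table.items():
        if entity["Version"] == 1:
            table[key] = dict(entity, Rating=entity.get("Rating", DEFAULT_RATING), Version=2)


table = {
    ("PK1", "RK1"): {"PartitionKey": "PK1", "RowKey": "RK1", "Version": 1, "Name": "a"},
    ("PK2", "RK2"): {"PartitionKey": "PK2", "RowKey": "RK2", "Version": 1, "Name": "b"},
}
table[("PK1", "RK1")] = save_as_v15(table[("PK1", "RK1")])
table[("PK2", "RK2")] = save_as_v2(table[("PK2", "RK2")], rating=5)
after_clients = {k: dict(v) for k, v in table.items()}
v1_view = read_as_v1(table[("PK2", "RK2")])
upgrade_all(table)
```

The ordering is what makes this safe: by the time any v2 entity exists, every client in service (v1.5 or v2) can read it, and v1 clients simply skip the new property.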

Summary
- Partitioning data is key to cloud-scale apps
- Horizontally partition for scale-out
- Vertically partition for cost/performance
- Choose appropriate partition keys
- Table storage requires a different approach to data modeling
- Don’t be afraid to aggressively de-normalize and duplicate data

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
