Agenda
Partitioning
  Horizontal Partitioning
  Vertical Partitioning
Non-Relational Data Modeling
Upgrade Scenarios for the Data Tier
Outline
Data Partitioning
  Vertical Partitioning
  Horizontal Partitioning
Partitioning in:
  Windows Azure Storage
  SQL Azure
Windows Azure Tables
  Data modeling
  Upgrade scenarios
Why Partition
Data Volume (too many bytes)
Work Load (too many transactions/second)
Cost (using different cost storage)
Elasticity (just in time partitioning for high load periods)
Horizontal Partitioning
David Alexander   [email protected]   3kb   3MB
Jared Carlson     [email protected]   3kb   3MB
Sue Charles       [email protected]   3kb   3MB
Simon Mitchel     [email protected]   3kb   3MB
Richard Zeng      [email protected]   3kb   3MB
Horizontal Partitioning (Sharding)
Spread Data Across Similar Nodes
Achieve Massive Scale Out (Data and Load)
Intra-Partition Queries Simple
Cross-Partition Queries Harder
Vertical Partitioning
Spread Data Across Dissimilar Nodes (SQL Azure, Tables, BLOBs)
Place frequently queried data in more ‘expensive’ indexed storage
Place large data in ‘cheap’ binary storage
Retrieving a whole row requires more than one query
Horizontal Partitioning
Table Storage – Key Points
Partitions are Auto-Balanced
  No need to partition into equal bins
Hot partitions may be scaled up
  Windows Azure fabric may dedicate more resources to partitions with high Tx load
Partition Key AND Row Key = Unique ID
  Must include Partition Key for Create, Update, Delete
  Select queries across partitions run sequentially
Don’t use sequential partition keys
Table Storage – Key Points
Continuation Tokens May Be Returned from Cross Partition Queries
  Any query that does not specify both the PartitionKey and the RowKey (and nothing else) needs to handle continuation tokens
  http://tinyurl.com/ContToken
Key Columns Up to 1KB in size
  Should aim to keep to 260 char URI limit
Be aggressive
  e.g. Only ever query by an ID? Use a unique Partition Key and RowKey = ‘ ‘ for a partition of 1
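The continuation-token point above can be sketched as a paging loop. This is a minimal sketch, not the storage client's API: `query_page` and `fake_query_page` are hypothetical stand-ins for whatever call returns one page of rows plus an optional token.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]

def fetch_all(query_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every row, re-issuing the query until no token comes back."""
    token: Optional[str] = None
    while True:
        rows, token = query_page(token)
        yield from rows
        if token is None:
            break

# Hypothetical service stand-in: three pages, the middle one empty.
# The loop keys off the token, not off whether a page had rows.
_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "t1"),
    "t1": ([], "t2"),
    "t2": ([{"id": 3}], None),
}

def fake_query_page(token: Optional[str]) -> Page:
    return _PAGES[token]

all_rows = list(fetch_all(fake_query_page))
```

The design choice is to stop only when the token is absent, so an empty page in the middle of a result set does not end the scan early.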
Horizontal – SQL Azure
[Figure: partitions numbered 1, 2, … 13, … 26]
SQL Azure – Key Points
Partition for:
  Data volume > 50GB
  Transaction throttle (non deterministic)
  Always code for retry
All partition logic up to the developer
  Algorithmic
  Lookup based
Partitions are not Auto-Balanced
  Need to aim for ‘equal’ partitions
  ‘Equal’ not necessarily the same size
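"Always code for retry" can be sketched as a small wrapper with exponential backoff. This is an illustrative sketch only: `TransientError` is a hypothetical exception standing in for whatever the data layer raises when a request is throttled, and the attempt count and delays are arbitrary.

```python
import time

class TransientError(Exception):
    """Hypothetical marker for a throttled or dropped request."""

def with_retry(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientError with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting.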
Choosing a Partition Key
Natural Keys
  Country
  First letter, last name
  Date
Mathematical
  Hash functions
  Modulo operator
Lookup Based
  Lookup table to resolve value to partitions
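A natural key and a lookup-based key might look like the sketch below. The shard names and the country mapping are made up for illustration; only the two techniques come from the slide.

```python
def first_letter_partition(last_name: str) -> str:
    """Natural key: partition on the first letter of the last name."""
    return last_name[0].upper()

# Hypothetical lookup table resolving a value to a partition.
COUNTRY_TO_SHARD = {"UK": "shard-eu", "FR": "shard-eu", "US": "shard-us"}

def lookup_partition(country: str, default: str = "shard-default") -> str:
    """Lookup-based key: unknown values fall back to a default partition."""
    return COUNTRY_TO_SHARD.get(country, default)
```

A lookup table can be edited to rebalance partitions by hand, whereas a natural key fixes the distribution to that of the data.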
Using Modulo
The remainder of a division
Nice properties for partitioning:
  Given two positive integers M and N
  M mod N will return a number between 0 and N-1
Want equi-sized partitions?
  Given an appropriate distribution of M we will get N ‘equally full’ buckets.
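The modulo property can be checked with a few lines. This sketch assumes sequential integer IDs as the "appropriate distribution" of M; the function name is illustrative.

```python
from collections import Counter

def partition_for(m: int, n: int) -> int:
    """Map a positive integer key m to one of n partitions (0 .. n-1)."""
    return m % n

# 1000 sequential IDs spread over 4 partitions.
counts = Counter(partition_for(m, 4) for m in range(1000))
```

With evenly spread keys each of the four buckets receives 250 items. A skewed key distribution (for example IDs that are all multiples of 4) would fill only one bucket.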
demo
Using Hash Values
Using a hash function projects one distribution into another
Use a hash function that projects a random distribution
Do NOT use a cryptographic hash function
Be careful if using Object.GetHashCode()
  Boxed types may return different value to un-boxed equivalent
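A hash-based partition key needs a hash that is stable across processes and machines. The sketch below uses CRC-32 from Python's standard library as one non-cryptographic option; that choice is an assumption for illustration, not something the slides prescribe. Python's built-in `hash()` on strings carries a similar caveat to the `GetHashCode()` warning above, since it is randomized per process by default.

```python
import zlib

def stable_partition(key: str, n: int) -> int:
    """Hash a string key to a partition number in 0 .. n-1.

    zlib.crc32 returns the same unsigned value for the same bytes on
    every run, so the key always lands in the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % n
```

Encoding the key explicitly as UTF-8 avoids differences between clients that would otherwise hash different byte sequences for the same text.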
Partition Stability Over Time
May need to change partitioning scheme
Two options:
  1. Re-partition all data
  2. Version partitioning scheme
e.g.
  <Version>  <PartitionKey>
  <v1>       <A3E567D7D8C68789>
  <v2>       <A8B978C8B6D77836>
where
  v1 = GUID mod 4
  v2 = GUID mod 10
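Versioning the partitioning scheme can be sketched as follows. The moduli 4 and 10 are read from the slide's example and should be treated as illustrative; the `version-number` key format and the function names are assumptions.

```python
import uuid

# Version -> modulus, newest scheme first (values assumed from the example).
SCHEMES = {"v2": 10, "v1": 4}
NEWEST_FIRST = ["v2", "v1"]

def partition_key(guid: uuid.UUID, version: str) -> str:
    """Prefix the partition number with the scheme version that produced it."""
    return f"{version}-{guid.int % SCHEMES[version]}"

def candidate_keys(guid: uuid.UUID) -> list:
    """Keys to probe on read: data not yet moved still lives under v1."""
    return [partition_key(guid, v) for v in NEWEST_FIRST]
```

New writes would use only the newest version, while reads fall back through older versions until a background process has moved the old data.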
Just In Time Partitioning
In SQL Azure Partitions Cost Money
In highly elastic scenarios partitions may be needed for just a few hours or days
If load is predictable
  Partition before load commences
  De-partition after load has subsided
Vertical Partitioning
Goals for Vertical Partitioning
Balance Performance vs. Cost
Use appropriate storage for type of data
SQL Azure
  Fully indexable
  No query transaction charge
  $9.99/GB/Month
Windows Azure Storage
  Limited Indexing
  Pay per Query
  $0.15/GB/Month
Vertical Partitioning
Tables or SQL Azure                   Tables or Blobs   BLOBs
David Alexander   [email protected]   3kb               3MB
Jared Carlson     [email protected]   3kb               3MB
Sue Charles       [email protected]   3kb               3MB
Simon Mitchel     [email protected]   3kb               3MB
Richard Zeng      [email protected]   3kb               3MB
Worked Example
Searchable Data in Table Storage or SQL Azure
  Indexed (SQL Azure)
  No cost per query (SQL Azure)
  Lower cost storage (Windows Azure Table Storage)
Thumbnails in Tables
  Binary Properties < 64kb
  Batch queries save transaction costs
Full Photos in Windows Azure Blob Storage
  Can handle large data
  Can stream full sized files direct back to HTTP client via CDN if needed
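The worked example can be sketched with in-memory dictionaries standing in for the three stores. Nothing here calls a real storage API: the store names, the `photos/<id>` blob naming and the example address are assumptions made for illustration.

```python
MAX_THUMBNAIL_BYTES = 64 * 1024  # binary properties must stay under 64 KB

searchable = {}   # stand-in for SQL Azure or Table storage (indexed fields)
thumbnails = {}   # stand-in for Table storage binary properties
blobs = {}        # stand-in for Blob storage (large data)

def save_person(pid, name, email, thumbnail: bytes, photo: bytes) -> None:
    """Split one logical row across the three stores."""
    if len(thumbnail) >= MAX_THUMBNAIL_BYTES:
        raise ValueError("thumbnail too large for a table property")
    searchable[pid] = {"name": name, "email": email}
    thumbnails[pid] = thumbnail
    blobs[f"photos/{pid}"] = photo

def load_person(pid) -> dict:
    """Reassembling the whole row costs one lookup per store."""
    row = dict(searchable[pid])
    row["thumbnail"] = thumbnails[pid]
    row["photo"] = blobs[f"photos/{pid}"]
    return row

save_person(1, "David Alexander", "david@example.com",
            b"\x00" * 3 * 1024, b"\x00" * 3 * 1024 * 1024)
```

Most requests would touch only `searchable` or `thumbnails`; the blob is fetched only when the full photo is actually needed.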
Non-Relational Data Modeling
Storage is cheap
Tables != RDBMS
Cross partition queries are resource intensive
Aggressive data duplication can save money and boost performance
Goal: To be able to include Partition Key in all queries
E.g. Tweet Storage
Tweet: TweetID, UserID, DateTimeStamp, Message
With an RDBMS you’d probably start with something like this:
SELECT * FROM Tweet WHERE Message LIKE '%SearchTerm%'
E.g. Tweet Storage
Three tables: (…, Message)  (TweetID, WordID)  (WordID, Word (IX))
Two tables:   (…, Message)  (TweetID, Word (IX))
E.g. Tweet Storage
Table 1: TweetID (RK), UserID (PK), DateTimeStamp, Message
Table 2: TweetID (RK), UserID, DateTimeStamp, Message, Word (PK)
E.g. Tweet Storage
Table 1: TweetID (RK), UserID (PK), DateTimeStamp, Message
Table 2: TweetID (RK), UserID, DateTimeStamp, Message, UserID (PK)
Modeling In Tables
Currently no secondary indexes (coming)
Be careful to minimize cross partition queries
Build indexes yourself
  Concentrate on useful partition keys
If associated data is small enough
  Save additional queries
  Duplicate data with each index
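Building an index yourself, applied to the tweet example, might look like this. The two dictionaries stand in for two tables keyed by (partition key, row key); the word splitting is deliberately naive and all names are illustrative.

```python
from collections import defaultdict

tweets_by_user = defaultdict(dict)  # PK = UserID, RK = TweetID
tweets_by_word = defaultdict(dict)  # PK = Word,   RK = TweetID

def store_tweet(tweet_id, user_id, timestamp, message) -> None:
    """Write the tweet once per index, duplicating the whole entity."""
    entity = {"TweetID": tweet_id, "UserID": user_id,
              "DateTimeStamp": timestamp, "Message": message}
    tweets_by_user[user_id][tweet_id] = entity
    for word in set(message.lower().split()):
        tweets_by_word[word][tweet_id] = dict(entity)

def search(word: str) -> list:
    """Single-partition lookup: the search term is the partition key."""
    return list(tweets_by_word.get(word.lower(), {}).values())

store_tweet("t1", "u1", "2011-01-01T10:00", "Partitioning in Azure")
store_tweet("t2", "u2", "2011-01-01T10:05", "Storage is cheap")
```

Because each index row carries the full message, a search needs no second query back to the main table, at the price of storing the tweet once per distinct word.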
Upgrade Scenarios for the Data Tier
Entity Shape Change
Have a version property in each entity
Types of Shape Change:
  Adding non-key properties
    Two step upgrade process
    Use ADO.NET Data Services’ “IgnoreMissingProperties”
  Removing non-key properties
    Similar two step process to adding
    In addition use ADO.NET Data Services’ “ReplaceOnUpdate”
  Changing Partition key or Row key
    Copy entities to a new table
Adding Additional Property
PK1  RK1  1
PK2  RK2  1
PK3  RK3  1
...  …    …
Release new version of Table Schema with NEW Property
Upgrade Client to v1.5
PK1  RK1  1
PK2  RK2  1
PK3  RK3  1
...  …    …
v1.5 Client
  If entity is v1:
    Store a default value
    Do not upgrade the entity
v1 Client
  Ignores new property, because it uses “IgnoreMissingProperties”
[Figure: the new property holds the value “Default”]
Upgrade Client to v2
PK1  RK1  1
PK2  RK2
PK3  RK3  1
...  …    …
v2 Client
  Starts using real values for new property
  Updates entity to v2
v1.5 Client
  Understands v1 and v2
[Figure: new property values shown as Default, Value 1 and Value 2; entity versions shown as 1 and 2]
Upgrade Entities to v2
PK1  RK1
PK2  RK2  Value 2
PK3  RK3
...  …    …
Use a background job to update version number of all entities
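The staged upgrade above can be sketched with plain dictionaries as entities. The property name `rating` and its default are hypothetical; the steps follow the slides: the v1.5 client stores a default without upgrading the entity, the v2 client stores real values and marks the entity v2, and a background job finishes the rest.

```python
DEFAULT_RATING = 0  # hypothetical default for the new property

def save_v15(entity: dict) -> dict:
    """v1.5 client: store a default on v1 entities, leave the version alone."""
    if entity["version"] == 1:
        return {**entity, "rating": DEFAULT_RATING}
    return dict(entity)

def save_v2(entity: dict, rating: int) -> dict:
    """v2 client: store the real value and mark the entity as v2."""
    return {**entity, "rating": rating, "version": 2}

def background_upgrade(table: dict) -> None:
    """Background job: bring every remaining v1 entity up to v2."""
    for key, entity in list(table.items()):
        if entity["version"] == 1:
            rating = entity.get("rating", DEFAULT_RATING)
            table[key] = save_v2(entity, rating)
```

Keeping the version number unchanged in the v1.5 step means v1 and v1.5 clients can run side by side before any entity is actually converted.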
Summary
Partitioning Data Key to Cloud Scale Apps
  Horizontally Partition for Scale Out
  Vertically Partition for Cost/Performance
  Choose appropriate partition keys
Table storage requires different approach to data modeling
  Don’t be afraid to aggressively de-normalize and duplicate data
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION
IN THIS PRESENTATION.