
Storage Strategies

Date post: 23-Feb-2016
Upload: krista
Storage Strategies
Name, Title, Microsoft Corporation
Transcript
Page 1: Storage Strategies
Page 2: Storage Strategies

Agenda
Partitioning
  Horizontal Partitioning
  Vertical Partitioning
Non-Relational Data Modeling
Upgrade Scenarios for the Data Tier

Page 3: Storage Strategies

Outline
Data Partitioning
  Vertical Partitioning
  Horizontal Partitioning
Partitioning in:
  Windows Azure Storage
  SQL Azure
Windows Azure Tables
  Data modeling
  Upgrade scenarios

Page 4: Storage Strategies

Why Partition?

Data volume (too many bytes)

Workload (too many transactions per second)

Cost (using storage with different costs)

Elasticity (just-in-time partitioning for high-load periods)

Page 5: Storage Strategies

Horizontal Partitioning

Name             Email    Thumbnail  Photo
David Alexander  (email)  3 KB       3 MB
Jared Carlson    (email)  3 KB       3 MB
Sue Charles      (email)  3 KB       3 MB
Simon Mitchel    (email)  3 KB       3 MB
Richard Zeng     (email)  3 KB       3 MB

Page 6: Storage Strategies

Horizontal Partitioning (Sharding)

Spread Data Across Similar Nodes

Achieve Massive Scale Out (Data and Load)

Intra-Partition Queries Simple

Cross-Partition Queries Harder
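The difference between intra-partition and cross-partition queries can be shown with a toy in-memory sketch. It assumes three shards and a hypothetical `shard_for` rule (a plain modulo on an integer customer ID). A lookup by the partitioning key touches one shard, while a query on any other column must visit every shard and merge the results.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShardedStore:
    """Toy horizontally partitioned store: each shard is a dict of rows keyed by id."""

    shard_count: int
    shards: List[dict] = field(init=False)

    def __post_init__(self) -> None:
        self.shards = [{} for _ in range(self.shard_count)]

    def shard_for(self, customer_id: int) -> int:
        # Hypothetical partitioning rule: modulo on the integer key.
        return customer_id % self.shard_count

    def put(self, customer_id: int, row: dict) -> None:
        self.shards[self.shard_for(customer_id)][customer_id] = row

    def get(self, customer_id: int) -> Optional[dict]:
        # Intra-partition query: only one shard is consulted.
        return self.shards[self.shard_for(customer_id)].get(customer_id)

    def find_by_name(self, name: str) -> List[dict]:
        # Cross-partition query: every shard is scanned and the results merged.
        results: List[dict] = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if row["name"] == name)
        return results


store = ShardedStore(shard_count=3)
store.put(1, {"name": "David Alexander"})
store.put(5, {"name": "Sue Charles"})
print(store.get(1))
print(store.find_by_name("Sue Charles"))
```

Adding shards lets `get` scale out unchanged. `find_by_name`, however, does more work with every shard you add.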

Page 7: Storage Strategies

Name             Email    Thumbnail  Photo
David Alexander  (email)  3 KB       3 MB
Jared Carlson    (email)  3 KB       3 MB
Sue Charles      (email)  3 KB       3 MB
Simon Mitchel    (email)  3 KB       3 MB
Richard Zeng     (email)  3 KB       3 MB

Vertical Partitioning

SQL Azure / Tables / BLOBs

Page 8: Storage Strategies

Vertical Partitioning

Spread Data Across Dissimilar Nodes

Place frequently queried data in more ‘expensive’ indexed storage

Place large data in ‘cheap’ binary storage

Retrieving a whole row requires more than one query
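The cost of that last point shows up clearly in a small sketch. Two plain dicts stand in for indexed storage and binary storage; they are assumptions, not real storage clients, and the blob naming scheme is hypothetical. Saving a user splits one logical row in two, so loading it back takes two lookups.

```python
# Hypothetical stand-ins: a dict for indexed storage (e.g. SQL Azure or
# Table storage) and a dict for cheap binary storage (e.g. blob storage).
indexed_store = {}
blob_store = {}
query_log = []  # records which store each lookup hit


def save_user(user_id: str, name: str, email: str, photo: bytes) -> None:
    blob_name = f"photos/{user_id}"          # hypothetical naming scheme
    blob_store[blob_name] = photo            # large column -> binary storage
    indexed_store[user_id] = {"name": name, "email": email, "photo_blob": blob_name}


def load_user(user_id: str) -> dict:
    query_log.append("indexed")
    row = indexed_store[user_id]             # query 1: small, indexed columns
    query_log.append("blob")
    photo = blob_store[row["photo_blob"]]    # query 2: the large column
    return {"name": row["name"], "email": row["email"], "photo": photo}
```

A page that only lists names and emails can skip the second lookup entirely. That saving is the payoff for splitting the row.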

Page 9: Storage Strategies

Horizontal Partitioning

Page 10: Storage Strategies

Horizontal Partitioning

Page 11: Storage Strategies

Table Storage – Key Points
Partitions are auto-balanced
  No need to partition into equal bins

Hot partitions may be scaled up
  The Windows Azure fabric may dedicate more resources to partitions with a high transaction load

Partition Key AND Row Key = unique ID
  The Partition Key must be included for Create, Update and Delete
  Select queries across partitions run sequentially

Don’t use sequential partition keys (for example, timestamps): consecutive inserts then land in the same partition range and create a hot spot
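A simplified model shows why sequential keys hurt. It assumes, as a simplification of the service internals, that a table is served as contiguous ranges of PartitionKey values. The range boundaries below are hypothetical. A burst of time-ordered keys falls into a single range, while prefixing each key with a non-cryptographic hash (CRC-32 here, one possible choice) scatters the same burst.

```python
import bisect
import zlib

# Hypothetical range boundaries (lexicographic): <"4", "4".."8", "8".."c", >="c".
RANGE_BOUNDARIES = ["4", "8", "c"]


def range_for(partition_key: str) -> int:
    """Index of the key range this PartitionKey falls into."""
    return bisect.bisect_right(RANGE_BOUNDARIES, partition_key)


def sequential_key(n: int) -> str:
    # Time-ordered key: consecutive writes sort next to each other.
    return f"{n:010d}"


def spread_key(n: int) -> str:
    # Same sequence number, prefixed with 8 hex digits of CRC-32 so that
    # consecutive writes scatter across the key space.
    prefix = format(zlib.crc32(str(n).encode()), "08x")
    return f"{prefix}-{n:010d}"


burst = range(1000, 1100)
print(len({range_for(sequential_key(n)) for n in burst}))  # ranges hit by sequential keys
print(len({range_for(spread_key(n)) for n in burst}))      # ranges hit by hashed keys
```

Hashed prefixes have a trade-off: they give up cheap range scans in time order. Use them only when write spread matters more than that.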

Page 12: Storage Strategies

Table Storage – Key Points
Continuation tokens may be returned from cross-partition queries
  Any query that does not specify exactly the RowKey and PartitionKey (and nothing else) needs to handle continuation tokens: http://tinyurl.com/ContToken

Key columns can be up to 1 KB in size
  Aim to keep them within the 260-character URI limit

Be aggressive: for example, do you only ever query by an ID?
  Use a unique PartitionKey and RowKey = ‘ ’, giving a partition of one entity
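Handling continuation tokens comes down to a loop that keeps asking for the next page until no token comes back. This is a sketch under an assumption: `fetch_page` is a hypothetical function with the signature shown, not a real client-library call, since actual SDKs surface tokens differently. A fake paging service is included so the loop can be exercised.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: fetch_page(filter, token) returns one page of
# entities plus the next continuation token, or None when nothing remains.
FetchPage = Callable[[str, Optional[str]], Tuple[List[dict], Optional[str]]]


def query_all(fetch_page: FetchPage, query_filter: str) -> Iterator[dict]:
    """Yield every entity, following continuation tokens until none is returned."""
    token: Optional[str] = None
    while True:
        entities, token = fetch_page(query_filter, token)
        yield from entities
        # Stop on the token, not on the page size: a page may be short
        # (or even empty) while more results still remain.
        if token is None:
            break


def make_fake_fetch(rows: List[dict], page_size: int) -> FetchPage:
    """Simulated service returning at most `page_size` rows per call."""
    def fetch(query_filter: str, token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
        start = int(token) if token is not None else 0
        end = start + page_size
        next_token = str(end) if end < len(rows) else None
        return rows[start:end], next_token
    return fetch


rows = [{"RowKey": str(i)} for i in range(5)]
print(len(list(query_all(make_fake_fetch(rows, page_size=2), "Age gt 30"))))  # 5
```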

Page 13: Storage Strategies

Horizontal – SQL Azure

(Diagram: SQL Azure partitions numbered 1, 2, … 13, … 26)

Page 14: Storage Strategies

SQL Azure – Key Points
Partition for:
  Data volume > 50 GB
  Transaction throttling (non-deterministic): always code for retry

All partitioning logic is up to the developer
  Algorithmic
  Lookup-based

Partitions are not auto-balanced
  Aim for ‘equal’ partitions
  ‘Equal’ does not necessarily mean the same size
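"Always code for retry" usually means retrying transient failures with a growing delay. This is a minimal sketch assuming a hypothetical `TransientError` that stands in for whatever throttling or connection exception your data-access layer raises. It uses exponential backoff with jitter, which is one common policy rather than a prescribed one.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a throttling or dropped-connection error."""


def with_retry(operation: Callable[[], T], max_attempts: int = 5,
               base_delay: float = 0.1) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Double the wait each time and add jitter so throttled clients
            # do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Only wrap operations that are safe to repeat. A retried insert that actually succeeded the first time can otherwise be applied twice.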

Page 15: Storage Strategies

Choosing a Partition Key
Natural keys
  Country
  First letter of last name
  Date

Mathematical
  Hash functions
  Modulo operator

Lookup-based
  Lookup table to resolve a value to a partition
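Two of these strategies fit in a few lines each. One partitions by a natural key (first letter of the last name). The other uses a lookup table. The country codes and database names in the lookup table are hypothetical. Its advantage is that a mapping can be changed by editing data rather than a formula.

```python
# Natural key: the first letter of the last name picks the partition.
def partition_by_last_name(full_name: str) -> str:
    last_name = full_name.split()[-1]
    return last_name[0].upper()


# Lookup-based: an explicit table resolves a value to a partition.
# These country codes and database names are hypothetical.
COUNTRY_TO_DATABASE = {
    "US": "customers_db_1",
    "GB": "customers_db_2",
    "DE": "customers_db_2",  # small countries can share a partition
}


def partition_by_country(country_code: str) -> str:
    try:
        return COUNTRY_TO_DATABASE[country_code]
    except KeyError:
        raise ValueError(f"no partition configured for {country_code!r}") from None


print(partition_by_last_name("Sue Charles"))  # C
print(partition_by_country("DE"))             # customers_db_2
```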

Page 16: Storage Strategies

Using Modulo
The remainder of a division
Nice properties for partitioning:
  Given two positive integers M and N, M mod N returns a number between 0 and N-1

Want equal-sized partitions?
  Given an appropriate distribution of M, we get N ‘equally full’ buckets.
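"An appropriate distribution of M" is the important condition. The sketch below contrasts consecutive IDs, which fill four buckets evenly, with IDs that happen to be all even, which leave half the buckets empty.

```python
from collections import Counter


def bucket_for(m: int, n: int) -> int:
    """M mod N: for non-negative M and positive N, always in 0..N-1."""
    return m % n


# Consecutive IDs are an "appropriate distribution" for mod 4.
even_spread = Counter(bucket_for(m, 4) for m in range(1, 1001))
print(sorted(even_spread.items()))  # [(0, 250), (1, 250), (2, 250), (3, 250)]

# IDs that are all even are not: buckets 1 and 3 stay empty.
skewed = Counter(bucket_for(m, 4) for m in range(2, 1001, 2))
print(sorted(skewed.items()))       # [(0, 250), (2, 250)]
```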

Page 17: Storage Strategies

demo

Page 18: Storage Strategies

Using Hash Values
A hash function projects one distribution into another
Use a hash function that projects a random distribution
Do NOT use a cryptographic hash function
Be careful if using Object.GetHashCode()
  Boxed types may return a different value from their unboxed equivalent
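One way to meet these rules is a fast, non-cryptographic hash whose output is fixed by its definition. This sketch uses 32-bit FNV-1a, our choice of example rather than one named on the slide. The stability point has a Python parallel to the GetHashCode() caution: the built-in `hash()` for strings is salted per process (unless `PYTHONHASHSEED` is set), so it should not be used to route data to partitions.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: non-cryptographic, and identical across processes and machines."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


def partition_for(key: str, partitions: int) -> int:
    # Hash first to randomize the distribution, then take mod N for the bucket.
    return fnv1a_32(key.encode("utf-8")) % partitions


print(partition_for("Richard Zeng", 4))
```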

Page 19: Storage Strategies

Partition Stability Over Time
May need to change partitioning scheme

Two options:
  1. Re-partition all data
  2. Version partitioning scheme
     e.g. <Version><PartitionKey>
          <v1><A3E567D7D8C68789>
          <v2><A8B978C8B6D77836>
     where v1 = GUID mod 4, v2 = GUID mod 10
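Option 2 can be sketched by storing the scheme version alongside each partition key, so a reader knows which rule produced it. This follows the slide's example schemes (v1 = GUID mod 4, v2 = GUID mod 10). The `version-bucket` string format is a hypothetical encoding, not taken from the slide.

```python
import uuid
from typing import Tuple

# Partitioning schemes keyed by version, as in the slide's example:
# v1 = GUID mod 4, v2 = GUID mod 10.
SCHEMES = {"v1": 4, "v2": 10}
CURRENT_VERSION = "v2"


def partition_key(entity_id: uuid.UUID, version: str = CURRENT_VERSION) -> str:
    bucket = entity_id.int % SCHEMES[version]
    # Prefix the version so readers know which scheme produced this key;
    # data written under v1 can stay where it is.
    return f"{version}-{bucket}"


def parse_partition_key(stored_key: str) -> Tuple[str, int]:
    version, bucket = stored_key.split("-", 1)
    return version, int(bucket)


new_id = uuid.uuid4()
key = partition_key(new_id)
print(key, parse_partition_key(key))
```

New writes use the current scheme while old keys remain resolvable. Re-partitioning can therefore happen later, or never.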

Page 20: Storage Strategies

Just In Time Partitioning
In SQL Azure, partitions cost money
In highly elastic scenarios, partitions may be needed for just a few hours or days
If load is predictable:
  Partition before load commences
  De-partition after load has subsided

Page 21: Storage Strategies

Vertical Partitioning

Page 22: Storage Strategies

Goals for Vertical Partitioning

Balance performance vs. cost
  Use appropriate storage for the type of data

                  SQL Azure                    Windows Azure Storage
Indexing          Fully indexable              Limited indexing
Query charges     No query transaction charge  Pay per query
Storage price     $9.99/GB/month               $0.15/GB/month
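A quick calculation with the listed prices shows the scale of the difference for large binary data. It assumes a hypothetical 1,000 photos of about 3 MB each (roughly 3 GB). Amounts are in cents to avoid floating-point rounding. Only the capacity charge is counted: Windows Azure Storage's per-query transaction charges are deliberately left out.

```python
# List prices from the slide, in US cents per GB per month.
SQL_AZURE_CENTS_PER_GB = 999      # $9.99/GB/month
AZURE_STORAGE_CENTS_PER_GB = 15   # $0.15/GB/month


def monthly_storage_cents(gigabytes: int, cents_per_gb: int) -> int:
    # Capacity charge only; per-transaction charges are not included.
    return gigabytes * cents_per_gb


photos_gb = 3  # hypothetical: 1,000 photos of about 3 MB each
print(monthly_storage_cents(photos_gb, SQL_AZURE_CENTS_PER_GB))      # 2997 ($29.97)
print(monthly_storage_cents(photos_gb, AZURE_STORAGE_CENTS_PER_GB))  # 45 ($0.45)
```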

Page 23: Storage Strategies

Vertical Partitioning

Name             Email    Thumbnail  Photo
David Alexander  (email)  3 KB       3 MB
Jared Carlson    (email)  3 KB       3 MB
Sue Charles      (email)  3 KB       3 MB
Simon Mitchel    (email)  3 KB       3 MB
Richard Zeng     (email)  3 KB       3 MB

Storage:
  Name, Email: Tables or SQL Azure
  Thumbnail: Tables or Blobs
  Photo: BLOBs

Page 24: Storage Strategies

Worked Example
Searchable data in Table storage or SQL Azure
  Indexed (SQL Azure)
  No cost per query (SQL Azure)
  Lower-cost storage (Windows Azure Table storage)

Thumbnails in Tables
  Binary properties < 64 KB
  Batch queries save transaction costs

Full photos in Windows Azure Blob storage
  Can handle large data
  Can stream full-sized files directly back to the HTTP client, via the CDN if needed
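The worked example's placement rules are sketched below, with two dicts standing in for Table storage and Blob storage (assumptions, not real clients). The blob naming scheme is hypothetical. A thumbnail goes into the table entity only if it fits the 64 KB binary-property limit, the full photo goes to a blob, and the entity keeps the blob's name.

```python
from typing import List

MAX_BINARY_PROPERTY = 64 * 1024  # table binary properties are limited to 64 KB

table_entities = {}  # stand-in for Table storage, keyed by (PartitionKey, RowKey)
blobs = {}           # stand-in for Blob storage, keyed by blob name


def store_photo(user_id: str, photo_id: str, thumbnail: bytes, photo: bytes) -> None:
    if len(thumbnail) > MAX_BINARY_PROPERTY:
        raise ValueError("thumbnail too large for a table property; store it as a blob")
    blob_name = f"photos/{user_id}/{photo_id}"  # hypothetical naming scheme
    blobs[blob_name] = photo                    # large data -> cheap blob storage
    table_entities[(user_id, photo_id)] = {     # small data -> table entity
        "Thumbnail": thumbnail,
        "PhotoBlob": blob_name,
    }


def thumbnails_for(user_id: str) -> List[bytes]:
    # All of a user's thumbnails share one PartitionKey, so one partition
    # query can return them together instead of one request per photo.
    return [e["Thumbnail"] for (pk, _), e in table_entities.items() if pk == user_id]
```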

Page 25: Storage Strategies

Non-Relational Data Modeling

Page 26: Storage Strategies

Storage is cheap

Tables != RDBMS
  Cross-partition queries are resource-intensive

Aggressive data duplication can save money and boost performance

Goal: To be able to include Partition Key in all queries

Page 27: Storage Strategies

E.g. Tweet Storage

TweetID | UserID | DateTimeStamp | Message

With an RDBMS you’d probably start with something like this:
  SELECT * FROM Tweet WHERE Message LIKE '%SearchTerm%'

Page 28: Storage Strategies

E.g. Tweet Storage

Normalized word index:
  … | Message
  TweetID | WordID
  WordID | Word (IX)

De-normalized word index:
  … | Message
  TweetID | Word (IX)

(IX = indexed column)
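The relational version of the de-normalized word index can be tried with Python's built-in `sqlite3` module. The table names `Tweet` and `TweetWord` and the whitespace-only word splitting are illustrative assumptions. The index on `Word` gives an exact-match lookup. A `LIKE '%SearchTerm%'` pattern with a leading wildcard cannot use an ordinary B-tree index and has to scan every message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tweet (TweetID INTEGER PRIMARY KEY, UserID TEXT,
                    DateTimeStamp TEXT, Message TEXT);
-- De-normalized word index: one row per (tweet, word), Word indexed.
CREATE TABLE TweetWord (TweetID INTEGER REFERENCES Tweet(TweetID), Word TEXT);
CREATE INDEX IX_TweetWord_Word ON TweetWord (Word);
""")


def add_tweet(tweet_id: int, user_id: str, timestamp: str, message: str) -> None:
    conn.execute("INSERT INTO Tweet VALUES (?, ?, ?, ?)",
                 (tweet_id, user_id, timestamp, message))
    words = {w.lower() for w in message.split()}  # naive: splits on whitespace only
    conn.executemany("INSERT INTO TweetWord VALUES (?, ?)",
                     [(tweet_id, w) for w in words])


def search(word: str) -> list:
    # Exact match on the indexed Word column, then join back to the tweet.
    cursor = conn.execute(
        "SELECT t.Message FROM Tweet t JOIN TweetWord w ON w.TweetID = t.TweetID "
        "WHERE w.Word = ? ORDER BY t.TweetID", (word.lower(),))
    return [row[0] for row in cursor]
```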

Page 29: Storage Strategies

E.g. Tweet Storage

TweetID (RK) | UserID (PK) | DateTimeStamp | Message

TweetID (RK) | UserID | DateTimeStamp | Message | Word (PK)

(PK = PartitionKey, RK = RowKey)
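The Table storage version of this model can be sketched with nested dicts standing in for the two tables (PartitionKey → RowKey → entity); these are not a real storage client. Each tweet is written once under its UserID and again under every word it contains, with the whole small entity duplicated. A search is then a single-partition query on the word.

```python
from collections import defaultdict

# Stand-ins for two tables: PartitionKey -> {RowKey: entity}.
tweets_by_user = defaultdict(dict)
tweets_by_word = defaultdict(dict)


def add_tweet(tweet_id: str, user_id: str, timestamp: str, message: str) -> None:
    entity = {"TweetID": tweet_id, "UserID": user_id,
              "DateTimeStamp": timestamp, "Message": message}
    tweets_by_user[user_id][tweet_id] = entity  # PK = UserID, RK = TweetID
    for word in {w.lower() for w in message.split()}:
        # Duplicate the whole (small) tweet into the index entity so a
        # search needs no second lookup: PK = Word, RK = TweetID.
        tweets_by_word[word][tweet_id] = dict(entity, Word=word)


def search(word: str) -> list:
    # Single-partition query: the word is the PartitionKey.
    partition = tweets_by_word.get(word.lower(), {})
    return [entity["Message"] for _, entity in sorted(partition.items())]
```

RowKeys are strings, so zero-padded IDs such as "0001" keep them in numeric order.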

Page 30: Storage Strategies

E.g. Tweet Storage

TweetID (RK) | UserID (PK) | DateTimeStamp | Message

TweetID (RK) | UserID | DateTimeStamp | Message | UserID (PK)

Page 31: Storage Strategies

Modeling in Tables
Currently no secondary indexes (coming)
  Be careful to minimize cross-partition queries

Build indexes yourself
  Concentrate on useful partition keys

If associated data is small enough
  Duplicate it with each index to save additional queries

Page 32: Storage Strategies

Upgrade Scenarios for the Data Tier

Page 33: Storage Strategies

Entity Shape Change
Have a version property in each entity

Types of shape change:
  Adding non-key properties
    Two-step upgrade process
    Use ADO.NET’s “IgnoreMissingProperties”
  Removing non-key properties
    Similar two-step process to adding
    In addition, use ADO.NET’s “ReplaceOnUpdate”
  Changing Partition key or Row key
    Copy entities to a new table
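Because an entity's PartitionKey and RowKey cannot be edited in place, the copy step amounts to rewriting every entity under its new key. This is a sketch with dicts standing in for the old and new tables. `new_key` is a hypothetical caller-supplied mapping, and the uniqueness check guards against two entities collapsing onto one key.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (PartitionKey, RowKey)


def copy_with_new_keys(old_table: Dict[Key, dict],
                       new_key: Callable[[dict], Key]) -> Dict[Key, dict]:
    """Write every entity into a new table under keys computed by `new_key`."""
    new_table: Dict[Key, dict] = {}
    for entity in old_table.values():
        key = new_key(entity)
        if key in new_table:
            raise ValueError(f"new key {key} is not unique")
        new_table[key] = dict(entity)  # copy, leaving the old table untouched
    return new_table
```

Readers can keep using the old table until the copy completes, and then switch over.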

Page 34: Storage Strategies

Adding Additional Property

PK    RK    Version
PK1   RK1   1
PK2   RK2   1
PK3   RK3   1
...   …     …

Release new version of Table Schema with NEW Property

Page 35: Storage Strategies

Upgrade Client to v1.5

PK    RK    Version
PK1   RK1   1
PK2   RK2   1
PK3   RK3   1
...   …     …

v1.5 client
  If the entity is v1: store a default value; do not upgrade the entity

v1 client
  Ignores the new property, because it uses “IgnoreMissingProperties”

(Diagram label: Default)

Page 36: Storage Strategies

Upgrade Client to v2

(Diagram: entities PK1/RK1, PK2/RK2, PK3/RK3, …; version numbers 1 and 2; new-property values “Default” and “Value”)

v2 client
  Starts using real values for the new property
  Updates the entity to v2

v1.5 client
  Understands v1 and v2

Page 37: Storage Strategies

Upgrade Entities to v2

(Diagram: entities PK1/RK1, PK2/RK2 with “Value 2”, PK3/RK3, …; version numbers 1 and 2)

Use a background job to update the version number of all entities
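The v1 → v1.5 → v2 sequence can be sketched with plain dicts as entities. The property name `NewProperty` and the default value are hypothetical. `read_v1` only mimics the effect of “IgnoreMissingProperties” (unknown properties are dropped) and does not call ADO.NET. As an assumption, the background job here also writes the default value when it bumps the version.

```python
from typing import List

DEFAULT_VALUE = "Default"                           # hypothetical default
V1_PROPERTIES = ("PartitionKey", "RowKey", "Version")


def read_v1(entity: dict) -> dict:
    """v1 client: keeps only the properties it knows, ignoring new ones."""
    return {name: entity[name] for name in V1_PROPERTIES}


def read_v1_5(entity: dict) -> dict:
    """v1.5 client: understands v1 and v2; for v1 it supplies a default
    value for the new property but does not upgrade the stored entity."""
    view = dict(entity)
    if view["Version"] == 1:
        view.setdefault("NewProperty", DEFAULT_VALUE)
    return view


def write_v2(entity: dict, value: str) -> None:
    """v2 client: stores a real value and marks the entity as v2."""
    entity["NewProperty"] = value
    entity["Version"] = 2


def background_upgrade(table: List[dict]) -> int:
    """Background job: bring every remaining v1 entity up to v2."""
    upgraded = 0
    for entity in table:
        if entity["Version"] == 1:
            write_v2(entity, DEFAULT_VALUE)
            upgraded += 1
    return upgraded
```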

Page 38: Storage Strategies

Summary
Partitioning data is key to cloud-scale apps
  Horizontally partition for scale-out
  Vertically partition for cost/performance
  Choose appropriate partition keys
Table storage requires a different approach to data modeling
  Don’t be afraid to aggressively de-normalize and duplicate data

Page 39: Storage Strategies

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

