Post on 22-Feb-2020
Agenda
Disk Storage Technology in 2010 and Beyond
Large Mailbox Value
Store and ESE Database Innovations
Designing Exchange 2010 Storage Solutions
Disk Storage Technology in 2010 and Beyond
Disk Capacity trend predicted to continue
2 TB desktop-class SATA disks available (3-4 TB next year)
1 TB Nearline/Midline SAS disk available (2 TB end of year)
Sequential I/O throughput increasing linearly based on areal density (2010 SATA = ~250 MB/sec)
Random I/O performance not expected to improve substantially (15K RPM is the ceiling)
Solid State Disks (SSD)/Flash
High $/GB, low $/IO
Write performance improving
Reliability mostly addressed with wear leveling
Random vs. Sequential Disk IO
Random IO
Disk head has to move to process subsequent IO
Head movement = High IO latency
Seek Latency limits IOPS
Sequential IO
Disk head does not move to process subsequent IO
Stationary Head = Low IO latency
Disk RPM speed limits IOPS
7.2K RPM SATA Disk (20 ms latency)
Random = 50 IOPS
Sequential = +300 IOPS!
IOPS = Input/Output operations (IOs) per second
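The 50 IOPS figure follows from the 20 ms latency if the disk is assumed to service one IO at a time, with no queuing or caching. The sketch below is only that back-of-the-envelope arithmetic; the function name is illustrative and not from the slides.

```python
def iops_from_latency(latency_ms: float) -> float:
    """IOs completed per second when each IO takes latency_ms (one IO at a time)."""
    return 1000.0 / latency_ms


def latency_from_iops(iops: float) -> float:
    """Average per-IO latency in ms implied by a given IOPS rate."""
    return 1000.0 / iops


# 20 ms per random IO on the 7.2K RPM SATA disk from the slide.
print(iops_from_latency(20.0))

# Per-IO latency implied by the slide's 300 sequential IOPS.
print(round(latency_from_iops(300.0), 2))
```

Under the same assumption, 300 sequential IOPS corresponds to roughly 3.3 ms per IO, which is why keeping the head stationary matters so much.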
Disk Head
E-mail Trends
E-mail volume is growing
Users expect larger corporate mailboxes
E-mail is business critical
• Time loss after a failure is measured in seconds
• Data loss after a failure needs to be close to zero
Business users report that they currently spend 19% of their work day, or close to 2 hours/day, on e-mail. (Radicati, 2007)
The average corporate user, today, can expect to send and receive about 156 messages a day, and this number is expected to grow to about 233 messages a day by 2012. An increase of 33% over the four-year period. (Radicati, 2008)
Large Mailbox Value
Large Mailbox: 1-10 GB+
Aggregate Mailbox = Primary mailbox + Archive Mailbox
~1 Year of mail (minimum)
Increased knowledge worker productivity
Reduced mailbox management
Client Accessibility (Outlook/OWA/Mobile)
Eliminate/Reduce PSTs
Time       Items      Mailbox Size (MB)
1 Day      200        15
1 Month    4,000      ~300
1 Year     52,000     ~3,800
4 Years    208,000    ~15,000
Profile: 160 Receive + 40 Send/Day, 75 KB, no deletions, 5-day work week
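The table values can be reproduced from the profile with simple arithmetic. The sketch below assumes 75 KB is the average item size, 1 MB = 1024 KB, and working-day counts of 20, 260 and 1,040 for a month, a year and four years (inferred from the item counts, not stated on the slide).

```python
MESSAGES_PER_DAY = 160 + 40   # received + sent, from the profile
AVG_MESSAGE_KB = 75           # assumed to be the average item size


def mailbox_growth(work_days: int) -> tuple[int, float]:
    """Return (item count, mailbox size in MB) after work_days with no deletions."""
    items = MESSAGES_PER_DAY * work_days
    size_mb = items * AVG_MESSAGE_KB / 1024
    return items, size_mb


# Working-day counts are assumptions consistent with a 5-day work week.
for label, days in [("1 Day", 1), ("1 Month", 20), ("1 Year", 260), ("4 Years", 1040)]:
    items, size_mb = mailbox_growth(days)
    print(f"{label:8} {items:8,d} items  {size_mb:10,.0f} MB")
```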
Large Mailbox Challenges & Solutions: Client Experience
Performance Improvements: Office 2007 SP2 (KB 953195)
Updated OST sizing guidance (10 GB)
Utilize the Archive Mailbox to reduce data cached to OST
Store/ESE changes
Outlook 2007 Performance (Cached Exchange Mode)
Outlook 2007 (Online)/OWA Performance
Items/folder Limitations
View Creation Performance
Client Search Performance
Store/ESE changes
Search Performance Improvements
Real-time result views
2x increase in indexing performance
Store/ESE changes
Large Mailbox Challenges & Solutions: Deployment/Operations
Backup off passive copies
Daily Incremental/Weekly Full backups
DPM Express Full Backups
HA + Hold Policy is your backup
Long Backup Times
Fast Recovery Requirements (RTO)
High Storage Costs
IOPS (efficiently utilizing low performance/high capacity disks)
RAID overhead
HA
Store/ESE changes
Move Mailbox Downtime
Online Move Mailbox
Database Maintenance
Online Maintenance Duration (OLD)
DB corruption (-1018) pain point
DB re-seed performance hit on active copy
Store/ESE changes
Exchange 2010 Storage Vision
IO Reduction
Sequential IO
Large, Fast, Low-cost Mailboxes
SATA/Tier 2 Disk Optimization
Storage Design Flexibility
RAID’less Storage (JBOD)
Store Schema Changes: IOPS Reductions
Store Schema: the way the Store organizes data in the ESE Database
One simple theme: move away from doing many, random, small size, disk IOs to doing fewer, sequential, large size, disk IOs
Store Schema Changes: IOPS Reductions
Significant benefits, including fast/efficient…
OWA/Outlook Online Mode
…end user viewing for “cold” states/first time view creation
…Calendar Operations
…Search performance
Outlook Cached Mode/Exchange ActiveSync
OST sync = sequential IO
EAS sync = sequential IO
Server Management
…Move mailbox
…Content Index Crawls
Store Table Architecture: IOPS Reduction
E2007
Mailbox Table: Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx
Folders Table: Jeff:Inbox, Ann:Drafts, Joe:Unread
Message Table (Msg): Joe:Msg10, Jeff:Msg32, Ann:Msg180
Attachments Table: Jeff:Excel.xls, Ann:Pic.bmp, Joe:Help.doc
Message/Folder Table (MFT): Joe:Inbox:H3, Joe:Inbox:H2, Joe:Inbox:H1
Table scopes shown: Per Database, Per Folder
E2010
Mailbox Table: Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx
Folders Table: Joe:Inbox, Joe:Drafts, Joe:Unread
Message Header Table: Joe:H10, Joe:H302, Joe:H920
Body: Joe:Msg10, Joe:Help.doc, Joe:Msg302
View Tables (e.g. From): Joe:H920, Joe:H302, Joe:H10 (Secondary Indexes used for Views)
Table scopes shown: Per Database, Per Mailbox, Per View
New Store Schema = no more single instance storage within a DB
Store Schema Changes: Physical Contiguity
B+Tree = Table
Ex2007: B+ Tree 1078, DB Pages (Page Numbers) 92, 4577, 6, 872, 7210, 3278, 21, 9346
Many, small size, IOs (1 per 8K page)
Ex2010: B+ Tree 1078, DB Pages (Page Numbers) 1079, 1080, 1081, 1082, 1083, 3456, 3457, 3458
Fewer, larger size, sequential IOs
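One way to see the effect of physical contiguity is to count how many reads are needed if every run of consecutive page numbers can be fetched with a single IO. The sketch below applies that simplifying assumption to the page numbers shown on the slide; the function name is illustrative.

```python
def count_read_ios(pages: list[int]) -> int:
    """Count IOs when each run of consecutive page numbers is read in one IO."""
    ios = 0
    previous = None
    for page in pages:
        # A new IO starts whenever the page does not directly follow the last one.
        if previous is None or page != previous + 1:
            ios += 1
        previous = page
    return ios


ex2007_pages = [92, 4577, 6, 872, 7210, 3278, 21, 9346]
ex2010_pages = [1079, 1080, 1081, 1082, 1083, 3456, 3457, 3458]

print(count_read_ios(ex2007_pages))  # every page is its own IO
print(count_read_ios(ex2010_pages))  # two contiguous runs
```

With the scattered layout all eight pages need their own IO, while the contiguous layout needs one IO per run.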
Store Schema Changes: Lazy View Updates
View: All Unread or Flagged items
Timeline: M1 arrives, M2 arrives, M1 flagged, M3 arrives, M2 deleted; the user then uses OWA/Outlook Online and switches to this view
Ex2007 (Nickel & Dime Approach): Many, random, IOs (1 per update)
Ex2010 (Pay to Play Approach): Fewer, sequential, IOs (1 per view)
Reducing IO by deferring view updates
View updates utilize sequential IO
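The difference between updating a view on every change and deferring the work until the view is opened can be sketched as follows. This is a toy model, not the Store's implementation: the class, its methods and the IO counter are hypothetical, and one "IO" is counted per view update.

```python
class LazyView:
    """Toy view that queues changes and applies them only when opened."""

    def __init__(self) -> None:
        self.items: set[str] = set()
        self.pending: list[tuple[str, str]] = []
        self.view_ios = 0

    def record(self, action: str, message: str) -> None:
        # Deferred: the change is remembered, but the view is not touched.
        self.pending.append((action, message))

    def open(self) -> list[str]:
        # All queued changes are applied in one pass when the user opens the view.
        if self.pending:
            for action, message in self.pending:
                if action == "add":
                    self.items.add(message)
                else:
                    self.items.discard(message)
            self.pending.clear()
            self.view_ios += 1
        return sorted(self.items)


view = LazyView()
events = [("add", "M1"), ("add", "M2"), ("add", "M1"), ("add", "M3"), ("remove", "M2")]
for action, message in events:
    view.record(action, message)

eager_ios = len(events)      # one view update per event
contents = view.open()       # one deferred update for all five events
print(eager_ios, view.view_ios, contents)
```

The five events mirror the slide's timeline (M1 arrives, M2 arrives, M1 flagged, M3 arrives, M2 deleted); an eager design would touch the view five times, the lazy one once.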
ESE Changes: IOPS Reductions
Optimize for new Store Schema
• Space Hints (allocate database space in contiguous manner)
• Re-wrote how database maintenance works (maintain database contiguity)
• Utilize space efficiently (Database compression)
Increase IO Sizes
• DB page size increased from 8 KB to 32 KB
• Improved read/write IO coalescing (Gap coalescing)
• Provide improved async read capability (Pre-read)
Increase Cache Effectiveness
• 100 MB Checkpoint Depth (HA configurations only)
• DB Cache Compression (Dehydration)
• DB Cache Priority (Fast Evict)
Space Management: Allocate space based on contiguity
Database Space Allocation Hints:
• Allocate DB space based on either data compactness or data contiguity (usage pattern)
Figure: On Disk, Page 1 and Page 3 are Used. From the DB Cache, Page X and Page Y (Msg Header) are written to Page 4 and Page 5 (Space Contiguity), and Page Z (EventHistory) is written to Page 2 (Space Compactness).
Space Contiguity = Sequential/Bloat
Space Compactness = Random/Compact
Maintain Contiguity: IOPS Reductions
New database maintenance architecture
ESE Function: Cleanup (deleted items/mailboxes)
• Exchange 2007 SP1: Cleanup performed during Online Defrag (OLD), which occurs during the Online Maintenance (OLM) time window
• Exchange 2010: ESE performs cleanup at run time (when store hard delete occurs). Happens during Store dumpster cleanup (OLM); pages are zeroed by default.
ESE Function: Space Compaction
• Exchange 2007 SP1: Database is compacted and space reclaimed during Online Defrag (OLD)
• Exchange 2010: Database is compacted and space reclaimed at run time by OLD2. Auto-throttled.
ESE Function: Maintain Contiguity
• Exchange 2007 SP1: N/A: Contiguity is compromised by space compaction
• Exchange 2010: Database is analyzed for contiguity and space at run time and is defragmented in the background (B+Tree Defrag/OLD2). Auto-throttled.
ESE Function: Database Checksum
• Exchange 2007 SP1: When configured, ½ of OLD maintenance window reserved for sequential scan (Checksum), manual throttle. Active DB copy only.
• Exchange 2010: Two options (both Active and Passive copies): 1. Run DB Checksum in the background 24x7 (default). Sequential IO. 2. Run DB Checksum during OLM window. Sequential IO.
DB Contiguity Results: IOPS Reductions
Figure: DB Page Numbers for the Ex2007 Message Folder Table (aka MFT), FRAGMENTED, and the Ex2010 Message Header Table (aka MsgHeader), CONTIGUOUS; Random Deletes at the tail*
Blue = contiguous (good)
Red = fragmented (bad)
*Production database analysis
Database Compression: Mitigate DB Space Growth
Store Schema change, Space Hints, B+Tree Defrag & 32KB page size combine to increase DB file size by 20%.
Growth is 100% mitigated by Database Compression
7bit/XPRESS (based on LZ77) Compression for message headers and text/html bodies (Long Values)
DB File Size Comparison: E2007/RTF = 1.00, E2010/RTF = 1.20, E2010/Mix = 1.00, E2010/HTML = 0.88
1 Database, 750 x 250MB mailboxes, RTF = RTF Compressed, Mix = 77% HTML, 15% RTF, 8% Text, Avg. Message size = ~50KB
DB Page Size Increased to 32 KB: IOPS Reductions
Ex2007 DB Read, 20 KB Message (8 KB Pages): Msg Header on Page 1 and Msg Body on Page 3 and Page 5 are read from Disk into the DB Cache = 3 Read IOs
Ex2010 DB Read, 20 KB Message (32 KB Pages): Msg Header and Msg Body on Page 1 (32 KB) are read from Disk into the DB Cache = 1 Read IO
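The 3-versus-1 read count follows from how many pages a 20 KB message spans. The sketch below assumes the message starts on a page boundary and that each page costs one read IO (no coalescing); the function name is illustrative.

```python
import math


def pages_for_message(message_kb: float, page_kb: int) -> int:
    """Number of database pages (and, by assumption, read IOs) a message spans."""
    return math.ceil(message_kb / page_kb)


print(pages_for_message(20, 8))   # Ex2007: 8 KB pages
print(pages_for_message(20, 32))  # Ex2010: 32 KB pages
```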
IO Gap Coalescing: IOPS Reduction (Read Case)
Ex2007 DB Read Behavior: Page 1 (Msg Header), Page 3 and Page 5 (Msg Body) are read from Disk into the DB Cache separately, skipping Page 2 and Page 4 = 3 Read IOs
Ex2010 DB Read Behavior: Page 1 through Page 5 are read from Disk together; Page 1, Page 3 and Page 5 go to the DB Cache, Page 2 and Page 4 go to a Temp Buffer = 1 Read IO
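Gap coalescing trades a little extra data transfer for fewer IOs: unwanted pages that sit between wanted ones are read anyway and thrown away. The sketch below is a generic illustration of that idea, not ESE's algorithm; `max_gap` is a hypothetical tuning parameter.

```python
def coalesce_reads(wanted_pages: list[int], max_gap: int = 1) -> list[tuple[int, int]]:
    """Merge wanted pages into (first, last) read ranges.

    Two wanted pages share a range when at most max_gap unwanted pages
    lie between them.
    """
    ranges: list[list[int]] = []
    for page in sorted(wanted_pages):
        if ranges and page - ranges[-1][1] - 1 <= max_gap:
            ranges[-1][1] = page          # extend the current read over the gap
        else:
            ranges.append([page, page])   # start a new read
    return [(first, last) for first, last in ranges]


wanted = [1, 3, 5]
print(coalesce_reads(wanted, max_gap=0))  # no coalescing: one read per page
print(coalesce_reads(wanted, max_gap=1))  # gaps of one page are bridged

# Pages read only to bridge gaps (sent to the temp buffer on the slide).
first, last = coalesce_reads(wanted, max_gap=1)[0]
discarded = [p for p in range(first, last + 1) if p not in wanted]
print(discarded)
```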
100 MB Checkpoint Depth (Active Copies): IOPS Reductions
Checkpoint Depth: amount of data waiting to be committed to the database file (edb)
Default Checkpoint Depth Max increased from 20 MB to 100 MB for active mailbox databases configured for HA (non-HA is 20 MB, passive is 5 MB)
Chart: Database Pages Repeatedly Written/sec and DB Writes/sec (avg) vs. Checkpoint Depth (MB) at 20, 40, 60, 80 and 100 MB
Loadgen Test: 3000 Mailbox, 12 DB, Outlook 2007 Online Very Heavy Profile
100MB Checkpoint Depth = 40% DB write IO reduction
Deep Checkpoint Benefit = Efficient DB writes (~40% reduction)
Deep Checkpoint Risks: long store shutdown times, long crash recovery times
Risk Mitigation: shutdown databases in parallel, failover on store crash
DB Cache Compression: IOPS Reductions
Problem: New Store Schema + 32 KB pages can reduce the efficiency of the cache (e.g., a page with 8 KB of data consumes 32 KB of memory in the DB Cache)
Solution: Implement DB Cache Compression to shrink partially used cached pages in memory, allowing a more effective cache.
1. 32 KB Page with only 8 KB of data is read off disk
2. 32 KB page is compressed to an 8 KB in-memory image
Up to 30% more cache/mailbox server
More Cache = Less DB IO!
DB Cache Priority: IOPS Reductions
Problem: Background and recovery DB operations can pollute the cache (e.g., checksumming, OLD2, HA log replay)
Solution: Implement DB Cache Priority to allow lower cache priorities for background/replay operations
Figure: DB Cache Time (Past, Now, Future) from Cache Entry to Cache Eviction for Outlook Message Read, HA Log Replay (Passive) and DB Maintenance
ESE Caching Algorithm = LRU-K (Least Recently Used)
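The idea of giving background and replay operations a lower cache priority can be illustrated with a small cache that evicts the lowest-priority page first and breaks ties by age. This is a deliberately simplified stand-in, not ESE's LRU-K implementation; the class, the priority numbers and the page names are all hypothetical.

```python
from collections import OrderedDict


class PriorityCache:
    """Toy page cache: evict lowest priority first, oldest first within a priority."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, int]" = OrderedDict()  # page -> priority

    def touch(self, page: str, priority: int) -> None:
        if page in self.entries:
            self.entries.pop(page)        # re-insert below to mark as most recent
        elif len(self.entries) >= self.capacity:
            self._evict()
        self.entries[page] = priority

    def _evict(self) -> None:
        lowest = min(self.entries.values())
        # OrderedDict iterates oldest first, so this picks the oldest low-priority page.
        victim = next(p for p, prio in self.entries.items() if prio == lowest)
        del self.entries[victim]


cache = PriorityCache(capacity=2)
cache.touch("outlook_read_page", priority=2)   # user-driven read
cache.touch("log_replay_page", priority=1)     # background/replay work
cache.touch("maintenance_page", priority=1)    # forces an eviction
print(list(cache.entries))
```

Even though the user-read page is the oldest entry, it survives because the replay page has a lower priority, which is the behavior the slide describes as avoiding cache pollution.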
Optimize for SATA/Tier 2 Disks: DB Write IO “Burstiness”
Problem: Bursty DB writes negatively affect DB read and Log write latency, because the more write IOs are issued at a time, the more disk contention there is
Chart: IO Latency Based on Max DB Write IOs (ms); DB Read IO and Log Write IO Latency (ms) vs. Maximum DB Write IOs Issued (2, 4, 8, 16, 32, 64)
Single 7.2k SATA disk, logs/db on same spindle, Loadgen load generating 250 RPC Operations/second, ~50 IOPS
Solution: Throttle DB writes based on Checkpoint target (QoS), DB Write Smoothing
DB Write Smoothing Results
3000 Mailboxes, 3MB DB Cache/user, 12 x 7.2k SATA disks (DB/Logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile
Chart: Ex2010 Smooth DB IO Benefit; Exchange 2010 Baseline vs. Exchange 2010 Smooth DB IO for DB Read Latency (ms), Log Write Latency (ms) and RPC Average Latency
Data labels: 49, 34, 3.7, 0.7, 10.1, 5.1
50% Reduction!
Ex2007 vs. Ex2010 IOPS Reduction Results
Chart: DB IOPS Comparison, E2007 vs. E2010 (DB Read IO/Sec, DB Write IO/Sec, DB IO/Sec)
+70% Reduction!
3000 Mailboxes, 3MB DB Cache/user, Loadgen Outlook 2007 Online Very Heavy Profile, 250MB Mailbox Size, E2010 Beta
Exchange IOPS Trend
Chart: DB IOPS/Mailbox for Exchange 2003, Exchange 2007 and Exchange 2010
+90% Reduction!
JBOD/RAID'less Storage: Now an option!
JBOD: 1 disk = 1 Database/Log
Requires Exchange 2010 HA (3+ DB Copies)
Annual Disk Failure Rate (AFR) = ~5%
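To get a feel for what a ~5% AFR means when every disk holds one database copy, the sketch below estimates how many disk failures (and therefore database copy losses) to expect per year. It assumes failures are independent and uses a hypothetical 30-disk server; both are illustrative assumptions, not figures from this slide.

```python
def expected_failures(disks: int, afr: float) -> float:
    """Expected number of disk failures per year."""
    return disks * afr


def prob_at_least_one_failure(disks: int, afr: float) -> float:
    """Probability that at least one disk fails in a year (independent failures)."""
    return 1.0 - (1.0 - afr) ** disks


DISKS = 30   # hypothetical server with 30 JBOD disks
AFR = 0.05   # ~5% annual failure rate from the slide

print(expected_failures(DISKS, AFR))
print(round(prob_at_least_one_failure(DISKS, AFR), 2))
```

With numbers like these, disk failure is a routine event rather than an exception, which is why JBOD depends on multiple database copies.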
JBOD Advantages: Reducing Storage Costs/Complexity
Eliminates unnecessary DB copies: Server and Storage redundancy can be symmetrical
Reduces Disk IO: Eliminates RAID write penalty
Enables Simple Storage Design: 1 disk = 1 database
Enables Simple Storage Failure Recovery
JBOD Challenges: Exchange HA/Storage must replace RAID functionality
Disk Striping performance (e.g. RAID10) cannot be leveraged
Disk Failure = Database Failover (~30 second outage)
Re-enabling Resiliency = Spare disk assignment/partitioning/format/DB re-seed (scriptable)
Soft Disk Errors (bad blocks) must be detected and repaired
JBOD/RAID'less Storage: Single Page Restore (Active)
1. Page corruption detected on Active Copy (e.g. -1018)
2. Active DB places marker in log stream to notify passive copies to ship an up-to-date page
3. Passive receives log and replays up to marker, retrieves good page, invokes Replication service callback and ships page
4. Active receives good page, writes page to DB. Page is restored.
5. Subsequent page repair from additional copies ignored
Figure: Database Availability Group (DAG) with Mailbox Server Nodes 1, 2 and 3 and three copies of DB1 (DB1-Active, DB1-CopyA, DB1-CopyB); each copy has a Database (Page1, Page2, Page3) and a Log
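The five single page restore steps can be sketched as a small simulation. This is a toy model under stated assumptions: the `DatabaseCopy` class, the `CORRUPT` marker and the function are hypothetical, and the log-stream marker and Replication service callback are reduced to a plain loop over the passive copies.

```python
CORRUPT = None  # stand-in for a page that fails its checksum (e.g. -1018)


class DatabaseCopy:
    def __init__(self, name: str, pages: dict) -> None:
        self.name = name
        self.pages = dict(pages)  # page number -> page contents


def single_page_restore(active: DatabaseCopy, passives: list, page_no: int):
    """Repair one corrupt page on the active copy; return the name of the donor copy."""
    # Step 1: corruption is detected on the active copy.
    if active.pages[page_no] is not CORRUPT:
        return None
    repaired_from = None
    # Step 2: the active copy asks the passives for the page (log marker in the real flow).
    for passive in passives:
        # Step 3: the passive has replayed up to the marker and ships its page.
        shipped = passive.pages[page_no]
        if repaired_from is None and shipped is not CORRUPT:
            # Step 4: the active copy writes the good page to its database.
            active.pages[page_no] = shipped
            repaired_from = passive.name
        # Step 5: pages shipped by additional copies are ignored.
    return repaired_from


active = DatabaseCopy("DB1-Active", {1: "p1", 2: CORRUPT, 3: "p3"})
copy_a = DatabaseCopy("DB1-CopyA", {1: "p1", 2: "p2", 3: "p3"})
copy_b = DatabaseCopy("DB1-CopyB", {1: "p1", 2: "p2", 3: "p3"})

donor = single_page_restore(active, [copy_a, copy_b], page_no=2)
print(donor, active.pages[2])
```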
Storage Design Flexibility
SAN
• HA = Shared Storage Cluster
• +1.0 IOPS/Mailbox
• 3.5” 15K 146GB FC Disks
• RAID10 for DB & Logs
• Dedicated Spindles
• Multi-path (HBA’s, FC Switches, SAN array controllers)
• Backup = Streaming off active
• Fast Recovery = Hardware VSS (Snapshots/Clones)
DAS (SAS)
• HA = CCR
• .33 IOPS/Mailbox
• 2.5” 146GB 10K SAS Disks
• RAID5 for DB
• RAID10 for Logs
• SAS Array Controller (/w BBU)
• Backup = VSS Snapshot
• Fast Recovery = CCR
DAS (SATA/Tier2)
• HA = DAG (2+ DB copies)
• .11 IOPS/Mailbox
• 3.5” 1TB 7.2K SATA/Tier2 Disks
• RAID10 for DB & Logs
• SAS Array Controller (/w BBU)
• Backup = VSS Snapshot/Optional
• Fast Recovery = Database Failover
JBOD (SATA/Tier2)
• HA = DAG (3+ DB copies)
• .11 IOPS/Mailbox
• 3.5” 1TB 7.2K SATA/Tier2 Disks
• 1 DB = 1 Disk
• SAS Array Controller (/w BBU)
• Backup = VSS Snapshot/Optional
• Fast Recovery = Database Failover
More options to reduce storage cost
Storage Design Flexibility
Personal Archive provides mailbox storage flexibility
Exchange 2010 supports DAS, SAN, JBOD*, RAID, SATA class, Enterprise Class, SSD**
Exchange 2010 has been optimized for DAS storage and Tier 2 (SATA) disks
IOPS reductions/SATA optimizations enable lower performing storage!
Maximum number of databases/server = 100
Max recommended DB Size = 2 TB*
Max recommended Folder Item Count = 100,000***
* 2+ HA copies only
** Not recommended for mainstream due to high $/GB
*** Assuming no 3rd party applications
Exchange 2010 Storage Requirements
Storage Guidance: Stand Alone | HA (2 copies) | HA (3+ copies)
Storage Type: DAS, SAN (Fibre Channel, iSCSI)
Disk Type: SAS, Fibre Channel, SATA/Tier2, SSD
RAID: RAID recommended | RAID optional
RAID Type: RAID-1/0, RAID-5, RAID-6 | JBOD
DB/Log Isolation: Best Practice | Not required
Windows Disk Type: Basic (recommended), Dynamic (supported)
Partition Type: GPT (recommended), MBR (supported)
Partition Alignment: Windows 2008/R2 Default (1 MB)
File System: NTFS
NTFS Allocation Unit Size: 64 KB for both database and log volumes
Encryption Support: Outlook Protection Rules, Bitlocker
See Appendix for full details
HA/JBOD Example: Single Site, 3 Node, 3 Copy DAG
Database Availability Group (DAG): Mbx Server 1, Mbx Server 2, Mbx Server 3
Each server: 8 Cores, 32 GB RAM, Battery Backed Caching Array Controller
Each server's disk layout shows DB1 through DB30 (Legend: Active copy, Passive copy, Spare Disk)
JBOD: 30 disks/node
1 TB 7.2k RPM disks (SAS/SATA/Tier2)
Online Spares
10,000 Mailboxes
3,333 Active Mailboxes/Server
2 GB Mailbox Size
120 Messages/day
.11 IOPS/Mailbox
3 Nodes, 3 Copies = double disk failure resiliency
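The numbers in this example hang together arithmetically. The sketch below checks them under a few assumptions that are not stated on the slide: mailboxes are spread evenly over 30 databases, each database copy occupies one disk, and database size is just mailbox count times mailbox size (no logs, content index or free-space overhead).

```python
TOTAL_MAILBOXES = 10_000
DATABASES = 30            # DB1 through DB30
COPIES = 3
NODES = 3
MAILBOX_GB = 2
IOPS_PER_MAILBOX = 0.11


def size_dag() -> dict:
    """Back-of-the-envelope sizing for the 3-node, 3-copy JBOD example."""
    mailboxes_per_db = TOTAL_MAILBOXES / DATABASES
    return {
        "mailboxes_per_db": mailboxes_per_db,
        "db_size_gb": mailboxes_per_db * MAILBOX_GB,
        "db_iops": mailboxes_per_db * IOPS_PER_MAILBOX,
        "db_disks_per_node": DATABASES * COPIES // NODES,
        "active_mailboxes_per_server": TOTAL_MAILBOXES // NODES,
    }


for key, value in size_dag().items():
    print(f"{key}: {value:,.1f}")
```

Each database comes out at roughly 667 GB and about 37 IOPS, which fits on a single 1 TB 7.2k RPM disk and stays below the ~50 random IOPS quoted earlier for such a disk.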
Key Takeaways
Exchange Server 2010…
Reduces DB IOPS by +70%...again!
Optimizes for large mailboxes (10 GB+) and 100,000 Item counts
Improved performance for SATA (Tier 2 class) disks
Enables JBOD/RAID'less scenarios
Enables unmatched storage flexibility to reduce costs
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.