Date post: 02-Apr-2015
SQL Server High Availability: Overview, Considerations, and Solution Guidance Vineet Rao Lead Program Manager Microsoft Corporation SESSION CODE: DAT207
Page 1: Vineet Rao Lead Program Manager Microsoft Corporation SESSION CODE: DAT207.

SQL Server High Availability: Overview, Considerations, and Solution Guidance

Vineet Rao
Lead Program Manager
Microsoft Corporation

SESSION CODE: DAT207

Page 2

Application Trends

REVENUE

COMPLIANCE

24X7 GLOBAL BUSINESS

GROWTH

Page 3
Page 4

What is this talk about?

Ensuring IT services and operational continuity in the enterprise

Protect mission critical SQL Server databases using Always On Technologies

Analysis

Solution Design

Implementation

Testing

Maintenance

Deployments and Best Practices

Page 5

Defining HA and DR

High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period

Disaster Recovery involves processes and procedures designed to restore business operations after a natural or human-induced disaster
– Typically involves providing redundancy spanning multiple sites or across geographic regions

Availability defined in terms of service level agreements (SLA)
– Recovery time
– Data loss during unplanned downtime

Recovery Time Objective (RTO) guided by availability requirements
– How much downtime can you tolerate?

Recovery Point Objective (RPO) guided by criticality of application data
– How much data can you lose?

Availability Class        Acceptable Downtime (hrs/yr) or RTO   Acceptable Data Loss (time of last copy) or RPO
Tier 1 (>99.99%)          1 hr or less                          5 min or less
Tier 2 (99.9% - 99.99%)   1 - 8.5 hrs                           5 mins to 8.5 hrs
Tier 3 (<99.9%)           Hours to days                         Hours to days
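The tier boundaries on this slide follow from simple arithmetic on the availability percentage. The sketch below is illustrative only: it assumes a 365-day year (8,760 hours), and the function names are hypothetical, not part of the session material.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)


def availability_tier(availability_pct: float) -> str:
    """Map an availability percentage to the tiers listed on the slide."""
    if availability_pct > 99.99:
        return "Tier 1"
    if availability_pct >= 99.9:
        return "Tier 2"
    return "Tier 3"


if __name__ == "__main__":
    for pct in (99.999, 99.99, 99.9, 99.0):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% -> {availability_tier(pct)}, {hours:.2f} hrs/yr")
```

Under this assumption 99.99% works out to roughly 0.88 hours per year and 99.9% to roughly 8.76 hours per year, in line with the slide's rounded "1 hr" and "8.5 hrs" boundaries.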

Page 6

Protection Levels

Local HA
– Protection against resource failures: Machine, Database Corruption, Disk, Resource Bottlenecks
– Location Redundancy: Building, < 10 miles

Regional DR
– Protection against Network Outages, Site Failures
– Location Redundancy: City, County, < 100 miles

Geographic DR
– Protection against Natural Disasters
– Location Redundancy: State, Country, > 100 miles
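As a rough illustration, the distance thresholds on this slide can be expressed as a lookup. This is a sketch under stated assumptions: the function name is hypothetical, and the handling of exactly 10 and exactly 100 miles is an assumption because the slide leaves those boundaries undefined.

```python
def protection_level(distance_miles: float) -> str:
    """Classify a secondary location by its distance from the primary site."""
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    if distance_miles < 10:
        return "Local HA"       # same building or campus
    if distance_miles < 100:
        return "Regional DR"    # city or county (assumption: 10 miles counts as regional)
    return "Geographic DR"      # state or country (assumption: 100 miles counts as geographic)


if __name__ == "__main__":
    for miles in (0.5, 8, 60, 900):
        print(miles, "->", protection_level(miles))
```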

Page 7

Database Downtime Drivers

Database Downtime

Unplanned Downtime
– Failure Protection
– User Errors

Planned Downtime
– Upgrade and Migrations
– Online Administration
– Predictable Resourcing

Page 8

Always On Technologies

Unplanned Downtime
– Backup and Restore
– Log Shipping
– Database Mirroring
– Failover Clustering
– Replication

Planned Downtime
– Rolling Upgrades and Patching
– Online Operations
– Resource Governor
– Database Snapshots

Provides a full range of options to minimize operational downtime and maintain appropriate levels of application availability.

Page 9

Backup and Restore: Quick Summary

Base availability technology for any solution
– Protects against failures and recovery from errors
– Local HA and Site DR
– Requires restore for recovery (high RTO)

Types: Full, Differential, and Transaction Log
– Filegroup backup/restore for large databases

Backup Compression provides faster and smaller backups in SQL Server 2008

Page 10

Log Shipping: Overview

Enhanced Transaction Log Backup and Restore
– Backup on primary instance
– Restore on secondary instance(s)

Ability to apply changes to multiple secondaries

Configurable restore delay allows for recovery from user error as well as protection against failures

Page 11

Log Shipping: Trade-Offs

Asynchronous secondaries
+ Minimal performance impact
- Potential data loss (RPO > 0)

Separation of Capture and Apply mechanisms
+ No impact for increasing number of secondaries
- Manual failover (higher RTO)

Page 12

Database Mirroring: Overview

Optimized for very fast failover
– Mirrored database is hot-standby of principal

Redundancy: Log (physical)
– Database failover unit

Two Modes: Asynchronous and Synchronous

New in 2008
– Compression
– Faster Recovery for Manual Failover
– Automatic Page Repair

Page 13

Database Mirroring: Synchronous Mode

[Diagram: Application, Principal and Mirror SQL Server instances (each with Data and Log), and a Witness. Numbered steps 1 to 5 trace a Commit, with Log flowing from Principal to Mirror.]

Mirror is always redoing, so it remains current

Page 14

Database Mirroring: Trade-Offs

Hot-standby
+ Fastest failover (best RTO)

Synchronous
+ No data loss (RPO=0)
- Performance impact for commit

- Limited to one mirror per database
- Performance impact increases with number of mirrored databases
- No read access to mirror

Page 15

Failover Clustering: Overview

Instance-level failover built on Windows Failover Clustering
– Transparent failover includes network name, IP, storage, and SQL Server Agent

Combined with hardware redundancy like SAN for data loss protection

New in 2008
– More robust and reliable setup
– Rolling upgrade and patching
– Integrated OS and SQL Server health checks

New with Windows Server 2008
– Failover Cluster Validation tool
– DHCP support
– IPv6 support
– Up to 16-node support

Page 16

Failover Clustering: Multiple-Active Cluster

[Diagram: a Windows Server Cluster hosting two failover clusters, FC1 and FC2. Labels: FC1 Active Node, FC1 Passive Nodes, Shared Disk for FC1, FC2 Active Node, FC2 Passive Node, Shared Disk for FC2, and instances InstA, InstB, InstC.]

Page 17

Failover Clustering: Trade-Offs

Instance-level failover
+ Transparent to clients
+ Entire instance is highly-available
- Failover requires service restart (higher RTO)

No data redundancy
+ No performance impact
- Requires hardware redundancy like SAN for data loss protection of RPO=0

Site DR with VLAN and stretch storage-level replication
+ No special configuration within SQL Server
- Requires more complex hardware solutions

Page 18

Replication: Overview

Logical (query-level) redundancy at table-level

Key scenarios:
– Customized application-specific DR
– Real-time reporting on secondary server that can be used for Site DR
– Scale out application queries

Two types: Transactional and Peer-to-Peer

New in 2008
– Add and recover nodes without disruption
– Use log sequence number (LSN) to initialize a new node
– Topology Viewer: Visualize, Create/Modify

Page 19

Replication: Transactional and Peer-to-Peer

Transactional Replication: Reporting + Redundancy

Peer to Peer Replication: Query Scale Out + Redundancy

[Diagram: replication topologies across sites. Site labels: New Jersey, New York, Boston, Seattle, Tokyo, Shanghai, England.]

Page 20

Replication: Trade-Offs

Logical, table-level redundancy
+ Enables application-optimized solutions
+ Enables replication from dissimilar databases
- Performance impact
- Smallest failover unit

Read and write ability on replicated data
+ Highest hardware utilization
- Write ability may affect data integrity

Asynchronous
+ Limits performance impact from benefits above
- Manual failover (higher RTO)
- Data loss (RPO > 0)

Page 21

Always On Technologies

Unplanned Downtime
– Backup and Restore
– Log Shipping
– Database Mirroring
– Failover Clustering
– Replication

Planned Downtime
– Rolling Upgrades and Patching
– Online Operations
– Resource Governor
– Database Snapshots

Page 22

Rolling Upgrades and Patching: 3 Step Upgrade/Patching

1. Perform upgrades on the mirror, secondaries, subscribers, or passive nodes

2. Switch roles
– Database mirroring: Failover to the mirror
– Log shipping:
  1. Backup principal log with NORECOVERY
  2. Recover secondary
  3. Re-direct clients to secondary
– Replication: Re-direct clients to subscriber
– Failover Clustering: Failover to passive node

3. Perform upgrades on the original database server

Optional: switch roles again

Page 23

Rolling Upgrades and Patching: New in SQL Server 2008: Failover Clusters

[Diagram: failover instances FOI1 and FOI2 shown across four stages (Prepare for Upgrade, Upgrade Step 1, Upgrade Step 2, Upgrade Step 3), marking Passive Nodes Offline, FOI1 Possible Owners, and FOI1 Upgraded Nodes.]

Prepare for Upgrade:
1. Ensure .NET 3.5 and MSI 4.5 are installed on each node
2. Consider upgrading SQL Server shared components on each node first

Upgrade Step 1:
- Upgrade half of the nodes
- Start upgrading passive nodes first to minimize failovers
- Consider moving other instances to avoid service restart if this is the first Katmai instance on the node

Upgrade Step 2:
- When half of the nodes are upgraded, or when specified, setup will roll ownership to the upgraded nodes. Down-level nodes will be removed from possible owners and up-level nodes will be added

Upgrade Step 3:
1. Upgrade remaining nodes
2. Return ownership to desired nodes if needed
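The batching described in the upgrade steps (half of the nodes first, passive nodes before active ones) can be sketched as follows. This is only an illustration: the function and node names are hypothetical, rounding the first batch down for an odd node count is an assumption, and the code plans the order without touching a real cluster.

```python
from typing import List, Tuple


def plan_rolling_upgrade(nodes: List[str], active: List[str]) -> Tuple[List[str], List[str]]:
    """Split cluster nodes into two upgrade batches, passive nodes first.

    Batch 1 is upgraded before ownership rolls over (Upgrade Step 1);
    batch 2 holds the remaining nodes (Upgrade Step 3).
    """
    active_set = set(active)
    passive_nodes = [n for n in nodes if n not in active_set]
    active_nodes = [n for n in nodes if n in active_set]
    ordered = passive_nodes + active_nodes  # passive first to minimize failovers
    half = len(ordered) // 2                # assumption: round down for odd counts
    return ordered[:half], ordered[half:]


if __name__ == "__main__":
    first, second = plan_rolling_upgrade(["N1", "N2", "N3", "N4"], active=["N1", "N3"])
    print("Upgrade Step 1:", first)
    print("Upgrade Step 3:", second)
```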

Page 24

Online Operations

Online administration to reduce planned downtime.

Available online administrative operations include:
– Online system changes (e.g. alter table)
– Online granular restore
– Online backup and restore
– Online indexing

Page 25

Resource Governor: New in SQL Server 2008

Enables allocation of resources to workloads

Key benefits:
– Minimize runaway query scenario
– Predictable concurrent execution of workloads
– Limit known large workloads from abusing resources
– Workload prioritization

Page 26

Database Snapshots: Overview

Ability to interact with data at point-in-time

Snapshots can be used to keep point-in-time data to provide recovery from errors

Uses copy-on-write into sparse files to preserve the original pages
– Performance impact to insert/update/delete statements

Offload read-only workloads

Page 27

Always On Technologies

Unplanned Downtime
– Backup and Restore
– Log Shipping
– Database Mirroring
– Failover Clustering
– Replication

Planned Downtime
– Rolling Upgrades and Patching
– Online Operations
– Resource Governor
– Database Snapshots

Page 28

Always On Solution Characteristics

[Table: Solutions (Log Shipping, DBM Sync, DBM Async, Cluster, Transactional Replication, Peer-Peer Replication) compared on RPO (No Data Loss, RPO=0), Failover (Failover Unit: Inst / DB / Tab; Auto Failover, RTO), and Redundancy and Utilization (Read, Multiple, Write). Footnote markers * and ** refer to the notes below.]

Cost

Solution                    Hardware   App Perf Impact   Manageability
Log Shipping                Low        Low               Low
DBM Sync                    Low        High              Low
DBM Async                   Low        Low               Low
Cluster                     High***    Low***            Low***
Transactional Replication   Low        Low               High
Peer-Peer Replication       Low        Low               High

* Database Mirroring and Log Shipping can provide point in time read capability using database snapshots or STANDBY respectively
** Database Mirroring provides fastest failover to hot secondary
*** Depends on SAN technology
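One way to use a characteristics chart like this is to filter the candidate solutions by requirement. The sketch below is hedged: the attribute values are taken from the trade-off slides earlier in the deck, the class and function names are hypothetical, and two simplifications are assumed (the cluster's RPO=0 presumes redundant SAN storage, and automatic failover for synchronous mirroring presumes a witness).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Solution:
    name: str
    failover_unit: str    # "instance", "database", or "table"
    zero_data_loss: bool  # RPO = 0
    auto_failover: bool


# Values follow the trade-off slides; see the assumptions in the lead-in.
SOLUTIONS = [
    Solution("Log Shipping", "database", False, False),
    Solution("Database Mirroring (Sync)", "database", True, True),   # assumes a witness
    Solution("Database Mirroring (Async)", "database", False, False),
    Solution("Failover Cluster", "instance", True, True),            # assumes SAN redundancy
    Solution("Transactional Replication", "table", False, False),
    Solution("Peer-to-Peer Replication", "table", False, False),
]


def shortlist(need_zero_data_loss: bool,
              need_auto_failover: bool,
              failover_unit: Optional[str] = None) -> List[str]:
    """Return the names of solutions that satisfy the stated requirements."""
    return [
        s.name for s in SOLUTIONS
        if (s.zero_data_loss or not need_zero_data_loss)
        and (s.auto_failover or not need_auto_failover)
        and (failover_unit is None or s.failover_unit == failover_unit)
    ]


if __name__ == "__main__":
    print(shortlist(need_zero_data_loss=True, need_auto_failover=True))
    print(shortlist(True, True, failover_unit="instance"))
```

A filter like this only narrows the list; cost, performance impact, and manageability still have to be weighed separately.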

Page 29

AdventureWorks Inc Scenario

AdventureWorks Inc is a company that manufactures and sells bicycles across the world. It runs a number of applications, some of them mission critical, on multiple SQL Server instances

The DBA team is run by Darren, who is responsible for deploying and managing the application databases. One of his core responsibilities is to ensure availability of all application databases in order to meet the application SLA

One datacenter located in Omaha

Three applications
– Manufacturing: Tier 1
– Finance: Tier 2
– Scheduling: Tier 3

Manufacturing application runs on a dedicated SQL Server 2008 Instance

All other applications run on a second instance

Availability of tier 1 applications is critical

Implement a solution at the lowest possible cost

Page 30

Application Requirements

Manufacturing application has strict SLAs

Finance application requires readability on the secondary
– The reports are run every 4 hours and the data must be no more than one hour old. To offload the reporting load from the main system they would like to utilize the secondary

[Table: Applications (Manufacturing, Finance, Scheduling) compared on Data Loss RPO=0, RTO in secs, Failover Unit (Inst / DB / Tab), Auto Failover, Read, Multiple Sites, Read Write.]

Page 31

Solution Choice for Manufacturing Application

Clustering can provide a zero data loss solution that can also provide fast instance level failover
– Use RAID configuration to provide data redundancy on the SAN

If a redundant copy is required that can provide instance failover with zero data loss use SAN replication
– High Cost Solution

Use synchronous database mirroring if instance failover is not needed

[Table: Solutions (Cluster, SAN Replication, DBM - Sync, DBM - Async, Log Shipping, Transactional Replication, Peer-Peer Replication) compared on Data Loss RPO=0, Fast RTO, Failover Unit (Inst / DB / Tab), Auto Failover, Read, > 1 Sites \ Copy, Read Write. Callout: Clustering with RAID.]

Page 32

Solution Choice for Finance Application

For database level redundancy with acceptable data loss and minimal perf impact, asynchronous database mirroring is an optimal choice
– Use database snapshots at periodic intervals to provide a readable snapshot of the data for reporting
– Low cost solution

[Table: Solutions (Cluster, SAN Replication, DBM - Sync, DBM - Async, Log Shipping, Transactional Replication, Peer-Peer Replication) compared on Data Loss RPO=0, Fast RTO, Failover Unit (Inst / DB / Tab), Auto Failover, Read, > 1 Sites \ Copy, Read Write.]

[Diagram: Omaha Datacenter with Finance and Scheduling; Async Database Mirroring, with a Db Snapshot every hour feeding Reports.]
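Whether an hourly database snapshot on the mirror satisfies the Finance reporting requirement (data no more than one hour old) is a small arithmetic check. The sketch below is an illustration under assumptions: the function names are hypothetical, and `mirror_lag_min` is a made-up parameter standing in for asynchronous mirroring lag, which the slides do not quantify.

```python
def worst_case_staleness_min(snapshot_interval_min: float,
                             mirror_lag_min: float = 0.0) -> float:
    """Oldest the reporting data can be just before the next snapshot is taken."""
    return snapshot_interval_min + mirror_lag_min


def meets_freshness(snapshot_interval_min: float,
                    required_freshness_min: float,
                    mirror_lag_min: float = 0.0) -> bool:
    """True if reports always see data within the required freshness window."""
    return worst_case_staleness_min(snapshot_interval_min, mirror_lag_min) <= required_freshness_min


if __name__ == "__main__":
    # Hourly snapshot against a one-hour freshness requirement
    print(meets_freshness(60, 60))
    # Same schedule with a hypothetical 5 minutes of mirroring lag
    print(meets_freshness(60, 60, mirror_lag_min=5))
```

With zero lag the hourly schedule sits exactly on the one-hour limit, so any sustained mirroring lag would argue for a shorter snapshot interval.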

Page 33

DEMO

Using Log Shipping to set up Mirroring

Page 34

Adding a regional datacenter into the mix

Regulatory and compliance requirements drive the need for an additional datacenter within a 10 mile radius to provide redundancy against site level failure.

It is now required that all applications have the ability to fail over to the regional datacenter across the river in Council Bluffs

The SLAs need to be maintained for tier 1 applications even in the case of site failures

Page 35

Regional Site Solution Choices

[Diagram: Omaha Datacenter and CB Datacenter. Labels: Manufacturing, Cluster with SAN, Sync Mirroring (no witness), Log Shipping, Async Database Mirroring, Finance, Db Snapshot every hour, Reports, Scheduling.]

Page 36

Complete Architecture

Considering the potential of floods and tornadoes destroying the regional data centers, AdventureWorks Inc wants to maintain a disaster recovery site in San Antonio, TX

The disaster recovery site has lower SLA requirements for all applications.
– The manufacturing application can have an RPO of 1 hour
– The RTO is set at 4 hours
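If log shipping is what carries the manufacturing database to the disaster recovery site, the 1 hour RPO constrains the log backup schedule. The sketch below is a simplification under stated assumptions: worst-case data loss is approximated as the backup interval plus the time to copy a backup off-site, the restore delay is ignored because copied backups already exist at the DR site, and all names and sample numbers are hypothetical.

```python
def worst_case_data_loss_min(backup_interval_min: float,
                             copy_duration_min: float) -> float:
    """Approximate exposure if the primary site is lost just before a copy completes."""
    return backup_interval_min + copy_duration_min


def meets_rpo(backup_interval_min: float,
              copy_duration_min: float,
              rpo_min: float = 60.0) -> bool:
    """True if the log shipping schedule keeps exposure within the RPO."""
    return worst_case_data_loss_min(backup_interval_min, copy_duration_min) <= rpo_min


if __name__ == "__main__":
    # Hypothetical schedules checked against the 1 hour (60 minute) RPO
    print(meets_rpo(backup_interval_min=15, copy_duration_min=5))
    print(meets_rpo(backup_interval_min=60, copy_duration_min=5))
```

A log backup taken only once an hour leaves no headroom for the copy, which is why shorter intervals are the usual choice when the RPO is an hour.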

Page 37

Topology Diagram

[Diagram: Manufacturing topology. Labels: Cluster with SAN, Sync Mirroring (no witness), Log Shipping.]

Page 38

Licensing Facts

Passive servers are mirror, log shipped secondary and clustering passive node

No license required on passive if it is truly passive

A passive server does not need a license if the number of processors in the passive server is equal to or less than the number of processors in the active server.

The passive server can take the duties of the active server for 30 days. Afterwards, it must be licensed accordingly.

Page 39

HA Features Edition Support

Columns: Feature, Express, Workgroup, Standard, Enterprise, Comments

Database Mirroring (1): Advanced high availability solution that includes fast failover and automatic client redirection
Failover Clustering (2)
Backup Log-shipping: Data backup and recovery solution
Online System Changes: Includes Hot Add Memory, dedicated administrative connection, and other online operations
Online Indexing
Online Restore
Fast Recovery: Database available when undo operations begin

(1) Single thread redo
(2) Limited to 2 node cluster

Page 40

Summary

There is no “one size fits all” solution

Consider the costs, benefits, and constraints and compare them to the availability requirements of the organization to determine the best solution

Use the charts to understand the costs, benefits, and constraints of the various SQL Server High Availability solutions

TEST the solution to ensure it can meet the availability requirements and the SLAs

Page 41

Related Content

DAT401 | High Availability and Disaster Recovery: Best Practices for Customer Deployments
DAT305 | See the Largest Mission Critical Deployment of Microsoft SQL Server around the World
DAT303 | Architecting and Using Microsoft SQL Server Availability Technologies in a Virtualized World
DAT407 | Windows Server 2008 R2 and Microsoft SQL Server 2008: Failover Clustering Implementations
WSV313 | Failover Clustering Deployment Success
WSV314 | Failover Clustering Pro Troubleshooting with Windows Server 2008 R2
DAT09-HOL | Installing a Microsoft SQL Server 2008 + SP1 Clustered Instance
DAT12-HOL | Maintaining a Microsoft SQL Server 2008 Failover Cluster
VIR06-HOL | Implementing High Availability and Live Migration with Windows Server 2008 R2 Hyper-V

Page 42

INFRASTRUCTURE PLANNING AND DESIGN (IPD) GUIDE
Microsoft SQL Server 2008 and SQL Server 2008 R2

What are IPD Guides?
– Guidance & best practices for infrastructure planning of Microsoft technologies

SQL Server Guide Benefits
– Helps organizations confidently plan a Microsoft SQL Server 2008 and SQL Server 2008 R2 implementation.
– Assists database administrators and technical decision makers identify appropriate server roles
– Guides architects and administrators in determining the infrastructure components, server placement, and fault-tolerance configuration

It’s a free download!
Go to www.microsoft.com/ipd

Check out the entire IPD series for streamlined IT infrastructure planning

“At the end of the day, IT operations is really about running your business as efficiently as you can so you have more dollars left for innovation. IPD guides help us achieve this.” Peter Zerger, Consulting Practice Lead for Management Solutions, AKOS Technology Services

Page 43

DAT Track Scratch 2 Win

Find the DAT Track Surface Table in the Yellow Section of the TLC
Try your luck to win a Zune HD
Simply scratch the game pieces on the DAT Track Surface Table and Match 3 Zune HDs to win

Page 44

Resources

Sessions On-Demand & Community: www.microsoft.com/teched

Learning, Microsoft Certification & Training Resources: www.microsoft.com/learning

Resources for IT Professionals: http://microsoft.com/technet

Resources for Developers: http://microsoft.com/msdn

Page 45

Complete an evaluation on CommNet and enter to win!

Page 46

Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st

http://northamerica.msteched.com/registration

You can also register at the North America 2011 kiosk located at registration

Join us in Atlanta next year

Page 47

question & answer

Page 48

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 49

JUNE 7-10, 2010 | NEW ORLEANS, LA

