SQL Server High Availability

Post on 27-Jan-2015

SQL Server High Availability presentation to SQL Saturday #12

Copyright 2009 – Database Architechs, www.dbarchitechs.com

HA cover

SQL Saturday 2009 – Portland, Oregon

Paul Bertucci

• Founder, Database Architechs – www.dbarchitechs.com
  – Specializing in HA, Database Design, Data Architecture, Data Replication, and P&T for SQL Server, Sybase, DB2 and Oracle
  – More than 28 years of experience in the database industry
• Co-Author of SQL Server 2000 Unleashed! (SAMS)
• Co-Author of SQL Server 2005 Unleashed! (SAMS)
• Co-Author of SQL Server 2008 Unleashed! (SAMS) – Summer 2009!
• Co-Author of ADO.NET in 24 hours (SAMS)
• Author of MS SQL Server High Availability (SAMS)
• Author of Sybase Performance & Tuning
• Author of Sybase Physical DB Design
• Veritas SQL Server Performance Series
• Former Chief Data Architect, Symantec Corporation
• Current Chief Architect, Autodesk Corporation

pbertucci@dbarchitechs.com

Agenda

• What is High Availability?
• How do you assess your HA Requirements?
• What are the MS SQL Server related options for HA?
• How each option delivers HA…
• Performance and Tuning is critical too – SQL Shot!
• Q & A

Test

1. What is the quickest way to test if your SQL Server Clustering configuration is failing over properly?

2. What is the SQL Server feature in SQL Server 2005/2008 that replaces Log Shipping?

What is Availability?

[Diagram: Application Availability, made up of Uptime, Planned Downtime, Unplanned Downtime, and Recoverable Disaster]

Failure causes:
- Human
- Hardware
- Software

The cost of Un-Availability

• Airline Reservation Systems - $67K to $112K per hour
• ATM Service Fees - $12K to $17K per hour
• Brokerage (Retail) - $5.6M to $7.3M per hour

What is your cost of downtime?
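One rough way to answer that question is to multiply the hours of downtime implied by an availability level by the hourly cost. The sketch below assumes an always-on application (8,760 planned hours per year) and a flat hourly cost; the function name and the 99.9% target are illustrative and do not come from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 planned hours for an always-on application


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Estimate the yearly cost of downtime at a given availability level."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    downtime_hours = HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)
    return downtime_hours * cost_per_hour


# Hypothetical case: the low end of the airline reservation range ($67K/hour)
# at an assumed 99.9% availability, which is 8.76 hours of downtime per year.
print(f"${annual_downtime_cost(99.9, 67_000):,.0f} per year")
```

The same calculation can be repeated for each candidate availability target to see how much downtime cost an HA solution would have to avoid.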


Across all layers of your systems

- Network
- Application
- Middleware
- Database
- Operating System
- Hardware: Network Components, Servers, Disk Systems, Memory

Availability across planned operation

[Chart: Availability (%), on a 90% to 100% scale, for the periods Feb 14-28, Mar 1 – Apr 15, and Apr 16 – 20]

Availability Goals

Period   | Starting Date | Date of Failure | Days  | Hours | Minutes | MBU (minutes) | TU     | Avail %
Period 1 | 2/14/2008     | 2/28/2008       | 15.00 | 24.00 | 60.00   | 21600.00      | 38.00  | 99.82407
Period 2 | 3/1/2008      | 4/15/2008       | 46.00 | 24.00 | 60.00   | 66240.00      | 68.00  | 99.89734
Period 3 | 4/16/2008     | 4/20/2008       | 5.00  | 24.00 | 60.00   | 7200.00       | 442.00 | 93.86111
Overall  | 2/14/2008     | 4/20/2008       | 66.00 | 24.00 | 60.00   | 95040.00      | 548.00 | 99.4234
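The Avail % column can be reproduced from the other columns. The sketch below assumes that MBU is the number of planned minutes in the period (days × 24 × 60) and that TU is the number of minutes unavailable; that reading matches every row of the table, but the slides do not spell the abbreviations out.

```python
def availability_pct(planned_minutes: float, unavailable_minutes: float) -> float:
    """Percentage of planned operating time during which the system was up."""
    if planned_minutes <= 0:
        raise ValueError("planned_minutes must be positive")
    return (1.0 - unavailable_minutes / planned_minutes) * 100.0


# (days in period, minutes unavailable) for Periods 1 to 3 of the table
periods = [(15, 38), (46, 68), (5, 442)]

for days, unavailable in periods:
    planned = days * 24 * 60
    print(f"{planned:>6} planned min, {unavailable:>3} down: "
          f"{availability_pct(planned, unavailable):.5f}%")

total_planned = sum(days * 24 * 60 for days, _ in periods)
total_unavailable = sum(unavailable for _, unavailable in periods)
print(f"overall: {availability_pct(total_planned, total_unavailable):.4f}%")
```

Note how the single bad period (442 minutes down in 5 days) pulls the overall figure below 99.5% even though the other two periods were close to 99.9%.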


Availability Continuum

Characteristic                                            | Availability Range
Extreme Availability: near zero downtime!                 | 99.5% - 100%
High Availability: minimal downtime                       | 95% - 99.4%
Standard Availability: with some downtime tolerance       | 83% - 94%
Acceptable Availability: non-critical applications        | 70% - 82%
Marginal Availability: non-production applications        | up to 69%

8,760 hours/year | 168 hours/week | 24 hours/day

525,600 minutes/year | 10,080 minutes/week | 1,440 minutes/day

Availability Range describes the percentage of time relative to the “planned” hours of operations.
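The continuum can be turned into a simple lookup. This is a minimal sketch: the slide's ranges leave small gaps (for example between 99.4% and 99.5%), so it assumes each band starts at its stated lower bound and runs up to the next band. The names `TIERS` and `availability_tier` are illustrative.

```python
# (lower bound in percent, band name), highest band first
TIERS = [
    (99.5, "Extreme Availability"),
    (95.0, "High Availability"),
    (83.0, "Standard Availability"),
    (70.0, "Acceptable Availability"),
]


def availability_tier(pct: float) -> str:
    """Map a measured availability percentage onto the continuum."""
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must be between 0 and 100")
    for lower_bound, name in TIERS:
        if pct >= lower_bound:
            return name
    return "Marginal Availability"  # anything below 70%


print(availability_tier(99.4234))   # an overall figure just under 99.5%
print(availability_tier(93.86111))  # a period with a long outage
```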


Applications and Availability

[Diagram: applications plotted against two axes, Zero Unplanned Downtime and Zero Planned Downtime, across the bands Marginal, Acceptable, Standard, High, and Extreme Availability. Applications shown: email, eCommerce, ATM, HR, Accounting, Marketing Mailers, 911, Inventory Mgmt]

Five 9’s (99.999%) ~ 5.3 minutes/year downtime
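A downtime budget such as the five-nines figure follows directly from the 525,600 minutes in a year. The sketch below assumes round-the-clock planned operation; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% allows {downtime_minutes_per_year(target):,.2f} minutes/year")
```

Each additional nine cuts the budget by a factor of ten, which is why the jump from High to Extreme Availability is so demanding.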


What do you need?

It’s as simple as 1, 2, 3 +

• Step One – Launch a brief “Phase 0” HA Assessment
• Step Two – Complete an HA Primary Variables gauge
• Step Three – Match your need to the optimal HA solution
• Step + (optional) – Determine the ROI of the HA solution

Assessing HA with Primary Variables

Each variable is gauged between two extremes:

Uptime Requirement: 0% to 100%
Time to Recover: Long to Short
Tolerance of Recovery Time: High to Low
Data Resiliency: Low to High
Application Resiliency: Low to High
Cost of Downtime ($$ lost/hr): Low to High
Cost of the High Availability Solution ($$): Low to High
Degree of Distributed Access/Synchronization: Low to High
Scheduled Maintenance Frequency: Often to Never
Performance/Scalability: Low to High

Development Methodology “With High Availability built in”

Phases: Assessment (scope) -> Requirements -> Design -> Code & Test -> System Test & Acceptance -> Implementation

0. Assessment
- Project Planning
- Project Sizing
- Deliverables Identified (SOW)
- Schedules/milestones
- High-Level Requirements (scope)
- Estimate HA Primary Variables (gauges)

1. Requirements
- Detail Requirements (process/data/technology)
- Early Prototyping (optional)
- Detailed HA Primary Variables
- Detailed Service Level Agreements/Rqmts
- Detailed Disaster Recovery requirements

2. Design
- Detail Design (data/process/technology)
- Choose and design the matching HA solution for the application

3. Code & Test
- Code Development/Unit Testing
- Fully integrate the HA solution with the application

4. System Test & Acceptance
- Full system Test/User Acceptance
- Full HA Test/Validation/Acceptance

5. Implementation
- Production Build/Implementation
- Production HA build/monitoring begins

Spiral/Rapid Methodology (iterative approach)

Phases: Inception, Elaboration/Prototype, Construction, Transition

0. Initial assessment
- Project Planning
- Project Sizing
- Deliverables Identified (SOW)
- Schedules/milestones
- High-Level Functions (scope)
- Estimate HA Primary Variables (gauges)

1. High-level Rqmts/Prototyping
- High-level requirements (process/data/technology)
- High-level HA Primary Variables

2. Early Code & Test
- Early code and testing of apps/DI (process/data/technology)
- Prototyping of HA options

3. Requirements
- Detail Requirements (process/data/technology)
- Detail HA Primary Variables
- Detailed SLA/Rqmts
- Detailed Disaster Recovery requirements

4. Design
- Detail Designs (process/data/technology)
- Choose and design the matching HA solution for the application (verified via prototypes)

5. Code & Test
- Code Development/Unit Testing
- Fully integrate the HA solution with the application

6. System Test & Acceptance
- Full system Test/User Acceptance
- Full HA Test/Validation/Acceptance

7. Implementation
- Production Build/Implementation
- Production HA build/monitoring begins

Valid High Availability Options

- Disk Methods
- Other HW
- Cluster Services
- Data Replication
- SQL Clustering
- DB Mirroring
- Log Shipping

MSCS Cluster Services

[Diagram: Node A and Node B, each running Windows 2003 Enterprise Edition with local binaries on C:, attached over SCSI to a shared disk (D:) and a quorum disk (Q:). Label: Cluster Group Resources]

SQL Clustering

[Diagram: nodes COLTST1 and COLTST2, each running Windows 2003 Enterprise Edition and a physical SQL Server 2008 instance with local binaries on C:. Both attach over SCSI to shared disk E: (Master DB, TempDB, Appl 1 DB) and the quorum disk Q:. SQL connections go to the virtual SQL Server 2008 instance VSQLDBARCH\VSQLSRV1. Cluster group resources shown: MS DTC, SQL Agent]

Data Replication

[Diagram: Central Publisher/Remote Distributor replication model. A publication server (“Primary”, AdventureWorks, SQL Server 2008) feeds a distribution server (SQL Server 2008), which feeds two subscription servers (“Replicate”, AdventureWorks, SQL Server 2008)]

Can be used as a Warm Standby and/or for Reporting needs

Database Mirroring

[Diagram: clients connect over the network to the principal server (Adventure Works DB, SQL Server 2008). The principal and the mirror server (Adventure Works DB, SQL Server 2008) each have a transaction log. A witness server (MSDB DB, SQL Server 2008) completes the configuration]

Database Mirroring with DB Snapshots

[Diagram: the same principal, mirror, and witness servers (AdventureWorks DB, SQL Server 2008), plus a Database Snapshot that serves Reporting Users over the network]

Log Shipping

[Diagram: the primary server (“Source”, CallOne DB, SQL Server 2008) writes transaction log backups to \Backup\CallOne_tlog_200405141120.TRN. The backups are copied to \LogShare\CallOne_tlog_200405141120.TRN and restored on the secondary server (“Destination”, CallOne DB, SQL Server 2008). A monitor server (“Monitor”, MSDB DB, SQL Server 2008) tracks the last log shipped and the delay between logs loaded]
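The backup file name on the slide, CallOne_tlog_200405141120.TRN, appears to carry the database name and a YYYYMMDDHHMM timestamp. Assuming that reading, the sketch below builds such a name and parses the timestamp back out, which is the kind of information a monitor needs to report the delay since the last log was shipped. Both helper names are hypothetical and are not part of SQL Server.

```python
from datetime import datetime


def tlog_backup_name(database: str, taken_at: datetime) -> str:
    """Build a file name in the style <database>_tlog_<YYYYMMDDHHMM>.TRN."""
    return f"{database}_tlog_{taken_at:%Y%m%d%H%M}.TRN"


def tlog_backup_time(filename: str) -> datetime:
    """Recover the backup timestamp from a file name built as above."""
    stamp = filename.rsplit("_", 1)[1].split(".")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M")


name = tlog_backup_name("CallOne", datetime(2004, 5, 14, 11, 20))
print(name)

# Delay between the last shipped log and a hypothetical "now" on the monitor
now = datetime(2004, 5, 14, 11, 35)
print(now - tlog_backup_time(name))
```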


RAID Disk I/O Summary

RAID Level   | Fault Tolerance                                  | Logical Reads | Physical I/Os per Read | Logical Writes | Physical I/Os per Write
RAID 0       | None                                             | 1             | 1                      | 1              | 1
RAID 1 or 10 | Best (optimal for OLTP)                          | 1             | 1                      | 1              | 2 writes
RAID 5       | Moderate (optimal for mostly READ ONLY systems)  | 1             | 1                      | 1              | 2 reads + 2 writes (that’s 4 per write!)

NOTE: Several RAID vendors now show RAID 5 and RAID 10 performance as almost equivalent, thanks to cache/buffer advancements on their RAID controllers.
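The per-write costs in the summary can be applied to a workload to compare RAID levels. This is a minimal sketch that ignores controller caching (which, per the note, can narrow the gap); the 700/300 read/write mix is a made-up example and the names are illustrative.

```python
# Physical I/Os needed per logical write, taken from the summary table
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4}


def physical_ios(raid_level: str, logical_reads: int, logical_writes: int) -> int:
    """Total physical I/Os: one per logical read plus the write penalty per write."""
    return logical_reads + logical_writes * WRITE_PENALTY[raid_level]


# Hypothetical OLTP-like mix: 700 logical reads and 300 logical writes
for level in ("RAID 0", "RAID 10", "RAID 5"):
    print(level, physical_ios(level, 700, 300))
```

The more write-heavy the mix, the further RAID 5 falls behind RAID 10, which is consistent with recommending RAID 5 only for mostly read-only systems.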


Fault Tolerance and SQL DB Files

Device | Description | Fault Tolerance
Quorum Drive | The quorum drive used with MSCS should be isolated to a drive by itself (very often mirrored as well for maximum availability). | RAID 1 or RAID 10
SQL Server Database files (OLTP) | For OLTP (online transaction processing) systems, the database data/index files should be placed on a RAID 10 disk system. | RAID 10
SQL Server Database files (DSS) | For DSS (Decision Support Systems) systems that are primarily READ ONLY, the database data/index files should be placed on a RAID 5 disk system. | RAID 5
Temp DB | Highly volatile disk I/O (when not able to do all its work in cache). | RAID 10
SQL Server Transaction Log files | The SQL transaction log files should be on their own mirrored volume for both performance and database protection (for DSS systems, this could be RAID 5 also). | RAID 10 or RAID 1

Example DB data files configuration

[Diagram: drives E:, F:, G:, and H: on RAID 10 and RAID 5 volumes, holding TempDB, Master DB, OLTP X - DB and its log, OLTP Y - DB and its log, and DSS - DB (read only) and its log; Q: Quorum on RAID 1 or RAID 10]

Decision Tree approach

[Diagram: a Condition/Question node branches into Case A, Case B, Case C, Case D, ... Case n, leading to Action V, Action W, Action X, Action Y, ... Action Z]

HA options considered: Disk Methods, Other HW, Cluster Services, Data Replication, SQL Clustering, Database Mirroring, Log Shipping, Distributed Transactions, Database Snapshots

Decision-Tree Path Traversal

[Diagram: starting at question 1, each answer (a through e) extends the path, for example 1a2, 1b2, 1c2, 1d2, 1e2 and then 1a2a3, 1a2b3, 1a2c3, 1a2d3, 1a2e3, until the path ends at one of: HW/Disk Redundancy, Cluster Services, Data Replication, SQL Clustering, Log Shipping, Distributed Transactions, Database Mirroring, Database Snapshots, or HA Not Needed]

Decision-Tree: ASP Questions 1-3

1. What % of availability must your application have?
- A% <= 70%: Marginal Availability
- 70% < A% <= 83%: Acceptable Availability
- 83% < A% <= 95%: Standard Availability
- 95% < A% <= 99.5%: High Availability
- A% > 99.5%: Extreme Availability

2. How much tolerance of downtime by end-users?
- Very High: Not Critical
- High: Low Criticality
- Medium: Standard Criticality
- Low: High Criticality
- Very Low: Extremely Critical

3. What is the per hour cost of downtime for this application?
- $C <= $3K: Very Low Cost
- $3K < $C <= $7K: Low Cost
- $7K < $C <= $12K: Moderate Cost
- $12K < $C <= $20K: High Cost
- $C > $20K: Very High Cost
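Each question in the tree is a threshold lookup. As an illustration, the sketch below classifies the hourly cost of downtime using the question 3 boundaries; it assumes each upper bound is inclusive, as the "<=" signs on the slide indicate. The names are illustrative.

```python
from bisect import bisect_left

# Inclusive upper bounds (dollars per hour) for the first four bands
COST_BOUNDS = [3_000, 7_000, 12_000, 20_000]
COST_LABELS = [
    "Very Low Cost",
    "Low Cost",
    "Moderate Cost",
    "High Cost",
    "Very High Cost",
]


def downtime_cost_band(cost_per_hour: float) -> str:
    """Classify a per-hour cost of downtime into one of the five bands."""
    if cost_per_hour < 0:
        raise ValueError("cost_per_hour must not be negative")
    # bisect_left keeps a value equal to a bound inside the lower band
    return COST_LABELS[bisect_left(COST_BOUNDS, cost_per_hour)]


print(downtime_cost_band(15_000))
```

The same pattern, a sorted list of bounds plus a list of labels, covers question 1 as well once the availability thresholds are substituted.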


Decision-Tree: ASP Questions 4-6

4. How long does it take to get the application back online?
- Very Long: Marginal Recoverability
- Long: Acceptable Recoverability
- Average: Standard Recoverability
- Short: Fast Recoverability
- Very Short: Extreme Recoverability

5. How much of the application is distributed?
- None: Non-Distributed
- A Little: Low Distribution
- Medium: Moderately Distributed
- A Lot: High Distribution
- All: Extremely Distributed

6. How much data inconsistency can be tolerated?
- Very Little: Very High Consistency
- A Little: High Consistency
- Medium: Moderate Consistency
- A Lot: Low Consistency
- Very Much: Minimal Consistency

Decision-Tree: ASP Questions 7-9

7. How often is scheduled maintenance required?
- Answers: Rarely, Not Often, Average, Often, Very Often
- Classifications: Very High Downtime, High Downtime, Reasonable Downtime, Low Downtime, Minimal Downtime

8. How important is high performance and scalability?
- Not Very: Very Low Performance
- Somewhat: Low Performance
- Moderately: Reasonable Performance
- Very Much: High Performance
- Extremely: Extreme Performance

9. How important is the application connection to the end-user?
- Not Very: Not Needed
- Somewhat: Establish new Connection
- Moderately: Connection Retry process
- Very Much: Connection Re-established easily
- Extremely: Connection Fail-over

Decision-Tree: ASP Question 10

10. What is the estimated cost of the HA Solution (budget)?
- C$ < $10K: Very Low Cost
- $10K <= C$ < $100K: Low Cost
- $100K <= C$ < $250K: Moderate Cost
- $250K <= C$ < $500K: High Cost
- C$ >= $500K: Extreme Cost

Path traversal for the ASP application:

1. 1e -> Extreme Availability goal
2. 1e+2d -> Very low tolerance of downtime
3. 1e+2d+3e -> $15k/hr cost of downtime (High Cost)
4. 1e+2d+3e+4c -> Average recovery time
5. 1e+2d+3e+4c+5a -> No distributed components or synchronization
6. 1e+2d+3e+4c+5a+6b -> A little data inconsistency can be tolerated
7. 1e+2d+3e+4c+5a+6b+7c -> Average amount of scheduled downtime
8. 1e+2d+3e+4c+5a+6b+7c+8d -> Performance matters very much
9. 1e+2d+3e+4c+5a+6b+7c+8d+9b -> Connection can be re-established
10. 1e+2d+3e+4c+5a+6b+7c+8d+9b+10c -> Moderate HA Cost/Good budget

Best fitting HA Solution (together): Disk Methods, Other HW, Cluster Services, SQL Clustering
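The path notation is simply the question number and the chosen answer letter, joined with "+" as each question is answered. A minimal sketch of building the cumulative path follows; the function name is illustrative, and the answer letters are copied from the ASP traversal rather than derived.

```python
def traversal_path(answers: list[str]) -> list[str]:
    """Return the cumulative path after each question, e.g. ['1e', '1e+2d', ...]."""
    steps: list[str] = []
    path: list[str] = []
    for number, choice in enumerate(answers, start=1):
        path.append(f"{number}{choice}")
        steps.append("+".join(path))
    return steps


# Answer letters for questions 1 to 10, as listed in the ASP traversal
asp_answers = ["e", "d", "e", "c", "a", "b", "c", "d", "b", "c"]
for step in traversal_path(asp_answers):
    print(step)
```

Recording the full path, rather than only the final recommendation, keeps the reasoning behind the chosen HA solution reviewable.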


Basic “one-two” Punch approach

1. Build the proper foundation first:
- Hardware/Network Redundancy
- Disk Backups
- DB Backups
- Vendor SLA’s
- Training, QA, & Standards
- Software Upgrades

2. Then build in the appropriate HA solution that your application requires:
- Disk Methods
- Other HW
- Cluster Services
- Data Replication
- SQL Clustering
- Database Mirroring
- Log Shipping
- Distributed Transactions
- Database Snapshots

ASP – Scenario #1 with SQL Clustering

[Diagram: MSCS Active/Passive configuration. Nodes ASPProd1 and ASPProd2 each run Windows 2003 Enterprise Edition and a physical SQL Server 2005 instance with local binaries on C:. Over SCSI they share disks E:, F:, and G: (Master DB, TempDB, HOE DB) and the quorum disk Q:. JRUN/WebServices/IIS connect over the network to the virtual SQL Server 2005 instance ASQL\ASPSERV1. Cluster group resources shown: MS DTC, SQL Agent]

Log Shipping

[Diagram: the primary server (“Source”, CallOne DB, SQL Server 2000) writes transaction log backups to \Backup\CallOne_tlog_200405141120.TRN. The backups are copied to \LogShare\CallOne_tlog_200405141120.TRN and restored on the secondary server (“Destination”, CallOne DB, SQL Server 2000). A monitor server (“Monitor”, MSDB DB, SQL Server 2000) tracks the last log shipped and the delay between logs loaded]

Live REPL solution

[Diagram: Central Publisher/Remote Distributor replication model. The publication server at Headquarters (Santa Clara) holds MktgDB on SQL Server 2000 and uses a separate SQL Server 2000 distribution server. Subscription servers holding MktgDB on SQL Server 2000: North America (Reporting & “warm/hot” spare), Europe (Reporting), Far East (Reporting)]

Central Publisher (default option)

[Diagram: a SQL Server 2000 publication server with its distribution server publishes Northwind to four subscription servers: two SQL Server 2000, one Oracle, and one SQL Server 7.0]

Central Publisher, Remote Distributor

[Diagram: a SQL Server 2000 publication server publishes Northwind through a separate SQL Server 2000 distribution server to four subscription servers: two SQL Server 2000, one Oracle, and one SQL Server 7.0]

Distributing Data

Data Access | Latency | Autonomy | Sites (locations) | Frequency | Network | Machines | Owner | Other | Replication
Read Only, Reporting | short | high | many | high | fast/stable | 1 server/site | 1 OLTP site | Each site only needs regional data | Central Publisher, Transactional repl, filter by region
Read Only, Reporting | long | high | many | low | fast/stable | 1 server/site | 1 OLTP site | Each site only needs regional data | Central Publisher, Snapshot repl, filter by region
Read Mostly, a few updates | short | high | < 10 | medium | fast/stable | 1 server/site | 1 OLTP site | Regional updates on one table | Central Publisher, Transactional repl, Updating Subs
Read Mostly, a few updates | medium | high | < 10 | medium | slow/unreliable | 1 server/site | All update | Regional update, all tables | Central Publisher, Merge repl
Inserts (new orders) | short | high | many | high | fast/stable | 1 server/site | 1 report site | Each site only needs regional data | Central Subscriber, Transactional repl
Hot/Warm Spare | very short | high | < 2 | high | fast/stable | 1 server/site | 1 OLTP site | Fail-over | Central Publisher, Remote Distributor, Transactional repl
Read equal, equal updates | short | high | < 10 | medium | fast/stable | 1 server/site | All update | Regional update, all tables | Peer-to-Peer Transactional repl

[Slide callouts: Database Mirroring]

Piecing it together

Foundation, Foundation, Foundation:
- Hardware/Network Redundancy
- Disk Backups
- DB Backups
- Vendor SLA’s
- Training, QA, & Standards
- Software Upgrades

System Stack: Network, Application, Middleware, Database, Operating System, Hardware (Network Components, Servers, Disk Systems, Memory)

ROI Calculation

Database Mirroring

[Diagram: a principal server (Applx DB, SQL Server 2008) and a mirror server (Applx DB, SQL Server 2008), each with a transaction log, connected over the network, with a witness server (MSDB DB, SQL Server 2008)]

Transparent Client Redirect

“Copy-on-Write” technology

SQL Server 2005 Database Mirroring

SQL Server 2008 Database Mirroring

Database Mirroring with DB Snapshot

[Diagram: a principal server (Applx DB, SQL Server 2005) and a mirror server (Applx DB, SQL Server 2005), each with a transaction log, with a witness server (MSDB DB, SQL Server 2005). A Database Snapshot serves Reporting Users over the network]

Database Snapshots (SQL Server 2008)

[Diagram: Snapshot Users run SELECT … FROM the AdventureWorks SNAPSHOT. The snapshot is made up of a sparse file of pages and a system catalog of changed pages, alongside the source data pages of AdventureWorks DB]
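The copy-on-write idea behind a database snapshot can be illustrated with a toy model: the snapshot starts empty, the original version of a page is copied into the sparse file only the first time that page changes, and snapshot reads fall back to the source for unchanged pages. This is a conceptual sketch only; the classes are hypothetical, model updates to existing pages, and say nothing about how SQL Server implements the feature internally.

```python
class Database:
    """Toy source database: a mapping of page id to page contents."""

    def __init__(self, pages: dict[int, str]):
        self.pages = dict(pages)
        self.snapshots: list["Snapshot"] = []

    def create_snapshot(self) -> "Snapshot":
        snapshot = Snapshot(self)
        self.snapshots.append(snapshot)
        return snapshot

    def write(self, page_id: int, data: str) -> None:
        if page_id not in self.pages:
            raise KeyError("this sketch only models updates to existing pages")
        # Copy-on-write: preserve the original page in each snapshot
        # the first time the page changes after that snapshot was taken.
        for snapshot in self.snapshots:
            if page_id not in snapshot.sparse:
                snapshot.sparse[page_id] = self.pages[page_id]
        self.pages[page_id] = data


class Snapshot:
    """Toy snapshot: holds only the pages that changed since it was created."""

    def __init__(self, source: Database):
        self.source = source
        self.sparse: dict[int, str] = {}

    def read(self, page_id: int) -> str:
        # Changed pages come from the sparse file, the rest from the source.
        if page_id in self.sparse:
            return self.sparse[page_id]
        return self.source.pages[page_id]


db = Database({1: "order v1", 2: "customer v1"})
snap = db.create_snapshot()
db.write(1, "order v2")
print(snap.read(1), "|", db.pages[1], "|", sorted(snap.sparse))
```

Because only changed pages are copied, the snapshot stays small while the source sees few updates and grows as more pages are modified.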

PH Topology With Snapshots

[Diagram: a principal server (Adventure Works DB, SQL Server 2008, instance SQL2008xyz) and a mirror server (Adventure Works DB, SQL Server 2008, instance SQL2008zzz), each with a transaction log. Both use the endpoint name “endpoint4mirroring” with Role: PARTNER]

PH Topology

[Diagram: elements shown are a clustered (Active/Passive) SQL Server 2008 principal server, the OLTP application, replication, network links, a Database Snapshot, Critical Report Users, and Less Critical Reporting Users]

The Combo Pack

[Diagram: two SQL Server 2008 principal/mirror server pairs and a SQL Server 2008 witness server, combined with replication: a Publisher, a Distributor, and two SQL Server 2008 Subscribers]

DB Availability Improvement!

[Diagram: restart timeline comparing SQL Server 2000 with SQL Server 2005/2008. Stages along the time axis: Restart Stage, Transactions Rolled Forward, Transactions Rolled Back, Restart complete. Markers show the point at which the SQL Server 2005/2008 database is available and the point at which the SQL Server 2000 database is available]

Performance and Tuning counts in HA

SQL SHOT – MS SQL Server

Questions

Is there any time left????


SQL Server Resources

Stop!


Fail-Over via Move Group

ANSWER to question #1


Distributed Transactions

[Diagram: “Primary Location” (Northwind 1, SQL Server 2000) and “Secondary Location” (Northwind 2, SQL Server 2000), coordinated by MS DTC]

Reads: Try primary first. If not available, try secondary.

Updates: Must succeed together, or both be rolled back (two-phase commit).