Overview of high availability in Microsoft SQL Server

Szymon Wójcik

Agenda

• Introduction
• What is availability?
• What is high availability and why choose it?
• Key factors to consider for a high availability scenario
• High availability techniques in Microsoft SQL Server
  • Replication
  • Log shipping
  • Mirroring
  • Failover clustering
• Discussion

PLSSUG Cracow Partners

Introduction

Szymon Wójcik
• Experience with MS SQL Server since 2000 (dev/admin)
• MCITP: DBA SQL Server 2005
• Interests:
  • Performance tuning
  • High availability
• Blog – sqlphobosq.wordpress.com
• Twitter – @phobosq

Availability [1/5]

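• One of the concepts defined within ITIL
• The ability of a service to perform its agreed function when required
• Determined by:
  • Reliability – how long the service runs without failure (MTBF, mean time between failures)
  • Maintainability – how quickly it is restored (MTRS, mean time to restore service)
  • Serviceability – contract conditions
  • Performance
  • Security
    • Confidentiality
    • Integrity
    • Availability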

Availability [2/5]

Best practice – measure in %:

Availability [%] = (Agreed Service Time – Downtime) / Agreed Service Time × 100

• Agreed Service Time – defined in the SLA (Service Level Agreement)
• Downtime – duration of service unavailability during the Agreed Service Time
• Understanding the availability concept is important when planning or deploying a service
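
The percentage measure can be tried out with a few lines of Python. This is only a sketch: it assumes the usual ITIL form, availability = (Agreed Service Time – Downtime) / Agreed Service Time × 100, and the function names `availability_percent` and `allowed_downtime` are made up for this example.

```python
from datetime import timedelta


def availability_percent(agreed_service_time: timedelta, downtime: timedelta) -> float:
    """Availability as a percentage of the Agreed Service Time."""
    if agreed_service_time <= timedelta(0):
        raise ValueError("agreed service time must be positive")
    if not timedelta(0) <= downtime <= agreed_service_time:
        raise ValueError("downtime must lie within the agreed service time")
    # Dividing two timedeltas yields a plain float ratio.
    return (agreed_service_time - downtime) / agreed_service_time * 100.0


def allowed_downtime(agreed_service_time: timedelta, target_percent: float) -> timedelta:
    """Longest downtime that still meets the target availability."""
    if not 0.0 <= target_percent <= 100.0:
        raise ValueError("target must be between 0 and 100")
    return agreed_service_time * (1.0 - target_percent / 100.0)


# One week of an 8x5 service (40 hours) with a 99% target.
week_8x5 = timedelta(hours=40)
print(allowed_downtime(week_8x5, 99.0))
print(availability_percent(week_8x5, timedelta(minutes=24)))
```

The same two functions work for any Agreed Service Time, so a 24x7 week is simply `timedelta(hours=168)`.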

Availability [3/5] – figures for one week

Allowed downtime duration per week [hh:mm:ss format]

Availability level   8x5 (40 hours/week)   24x7 (168 hours/week)
80%                  08:00:00              33:36:00
90%                  04:00:00              16:48:00
95%                  02:00:00              08:24:00
98%                  00:48:00              03:21:36
99%                  00:24:00              01:40:48
99.9%                00:02:24              00:10:05
99.99%               00:00:14.4            00:01:00
99.999%              00:00:01.44           00:00:06

Availability [4/5] – figures for one year

Allowed downtime duration per year [DD.hh:mm:ss format]

Availability level   8x5 (40 hours/week, 2,080 hours/year)   24x7 (168 hours/week, 365 days/year)
80%                  17.08:00:00                             73.00:00:00
90%                  8.16:00:00                              36.12:00:00
95%                  4.08:00:00                              18.06:00:00
98%                  1.17:36:00                              7.07:12:00
99%                  0.20:48:00                              3.15:36:00
99.9%                0.02:04:48                              0.08:45:36
99.99%               0.00:12:29                              0.00:52:34
99.999%              0.00:01:15                              0.00:05:15

Availability [5/5] – important notes

• Availability != Uptime (a service may be up but unavailable)
• Scheduled downtime does not have to count as unavailability (it depends on the definition in the SLA)

High availability - definition

• A system design approach and service implementation that ensures a certain level of operational performance (Wikipedia)
• Masks the effects of hardware or software failure
• Maintains availability of applications so that perceived downtime is minimized (Microsoft)

High availability != disaster recovery

• High availability is about meeting the Service Level Target for availability
• Disaster recovery is about ensuring operational continuity
• They complement each other – HA can minimize the need to invoke DR, but can never replace it

Why choose high availability

For users:
• Minimizes downtime probability
• Allows the service to survive a failure if properly designed

For administrators:
• Simplifies migration effort
• Minimizes risk to continuity

Single point of failure

A whole system is only as strong as its weakest link

[Diagram: Server – Switch – Router – Server on the Web]

Hardware redundancy

Introduce additional hardware to minimize risk of failure

[Diagram: Server – Switch – Router – Server on the Web, with a second Switch and a second Router]

Hardware redundancy

Not only whole machines can be duplicated to achieve fault tolerance, but also components:
• Power supplies
• CPUs
• Hard disks
• Network interface cards
• Storage controllers
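
The effect of redundancy on a chain of components can be estimated with standard reliability arithmetic. The sketch below assumes that components fail independently, which is an assumption of this example and not a claim from the slides. The 99% figures and the function names are hypothetical.

```python
from math import prod


def series_availability(components):
    """Every component is required, so the availabilities multiply."""
    return prod(components)


def redundant_availability(availability, copies):
    """At least one of `copies` identical, independent components must work."""
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical figures, for illustration only.
server, switch, router = 0.99, 0.99, 0.99

# Single chain: any one failure takes the whole path down.
single_path = series_availability([server, switch, router])

# Same chain with the switch and the router each doubled.
redundant_network = series_availability([
    server,
    redundant_availability(switch, 2),
    redundant_availability(router, 2),
])

print(f"single path:       {single_path:.6f}")
print(f"redundant network: {redundant_network:.6f}")
```

In this model the server stays a single point of failure, so its availability caps the result no matter how many switches and routers are added.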

Standby node

A standby node is a machine in an HA system that takes over in case of primary server failure

Three types:
• Cold standby – unplugged, needs to be prepared before use
• Warm standby – ready to use, but requires a manual switch
• Hot standby – ready to use, takes over automatically

Fail over = switching from primary to standby
Fail back = return to primary

There may be more than one standby in an HA scenario!

Load balancing vs failover

Load balancing – distribution of workload between several peer servers
• If one goes down, the others take over
• Workload is distributed by a load balancer

Failover – automatic switch to standby
• Standby is not active
• Switch is initiated upon loss of heartbeat
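
A heartbeat-driven switch can be sketched in a few lines. This is a toy model and not how any particular product implements it: the class name `HeartbeatMonitor`, the timeout value and the use of plain numbers for time are assumptions made to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Fails over to the standby when the primary's heartbeat is overdue."""

    def __init__(self, timeout: float) -> None:
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self.timeout = timeout
        self.last_heartbeat = None
        self.active = "primary"

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the primary at time `now`."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Return the active node, switching to standby if the heartbeat is lost."""
        overdue = (
            self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout
        )
        if self.active == "primary" and overdue:
            self.active = "standby"
        return self.active


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat(now=0.0)
print(monitor.check(now=2.0))  # heartbeat still fresh
print(monitor.check(now=3.5))  # heartbeat overdue, standby takes over
```

The monitor never switches back on its own. Fail back stays a separate, deliberate step.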

Other points

• High availability brings additional costs – multiple components must be present according to the design in order to meet requirements
• It may become complex to maintain – additional configuration items (CIs) in the environment need to be kept up to date
• Hardware design must be matched by the software to fully benefit from HA
• KISS – Keep It Simple, Stupid

High availability in Microsoft SQL Server

• SQL Server, as an RDBMS, provides the means for failover scenarios
• Load balancing is difficult and must be properly designed in order to work
• High availability in SQL Server does not prevent logical data corruption – periodic DBCC checks are advised

HA methods overview in SQL Server

Method: Replication
  What it does: transfers completed transactions to standby nodes
  Standby type: cold/warm
  # of standby nodes: any
  Remarks:
  • Standby is accessible
  • May allow for updates
  • Conflicts may appear

Method: Log shipping
  What it does: performs regular log backups, copies them to the standby and restores them
  Standby type: cold
  # of standby nodes: any
  Remarks:
  • Database unavailable during restore
  • Standby may be accessible

Method: Database mirroring
  What it does: replays transactions as they are logged
  Standby type: warm/hot
  # of standby nodes: 1
  Remarks:
  • Standby unavailable
  • Requires third server to allow hot standby

Method: Failover cluster
  What it does: monitors Windows service status and transfers execution
  Standby type: hot
  # of standby nodes: any
  Remarks:
  • Requires shared (or replicated) storage
  • Requires identical hardware
  • Failover = downtime

Replication

Three server roles in replication:
• Publisher
• Distributor
• Subscriber

Three types:
• Snapshot
• Transactional
• Merge

Two subscription methods:
• Push – Distributor pushes articles to Subscribers
• Pull – Subscribers download from Distributor

Replication topology

[Diagram: replication topologies built from Publisher, Distributor and Subscriber nodes]

Possible applications of replication

• Create a second copy of data to be used in case of emergency (DR)
• Create a copy of data to offload the server (load balancing)
• Allow offline users to work with data and upload their changes later (high availability)

Replication agents

External programs which are used to implement replication:

Snapshot Agent:
• Creates snapshots

Log Reader Agent:
• Reads the transaction log
• Marks transactions for replication

Distribution Agent:
• Dispatches transactions to Subscriber

Merge Agent:
• Downloads remote and uploads local changes
• Resolves conflicts in merge replication

Snapshot replication

• Publisher makes a copy of a database, which is applied at the Subscriber
• Good for small, static data:
  • The whole snapshot is applied every time – changes made after a snapshot are applied with the next snapshot
  • Requires sufficient bandwidth

Transactional replication

• Starts with a snapshot
• Transactions are recorded at the Publisher and replayed at the Subscriber
• May allow for updatable subscriptions
• If the Subscriber is offline, records are stored at the Distributor

Merge replication

• Starts with a snapshot
• Merges changes between Publisher and Subscribers
• Allows synchronization via HTTPS (since SQL Server 2008)
• Allows the most autonomous design – e.g. mobile users, multiple branch offices working on the same data

Replication how-to

• Configure Distributor
• Configure Publisher:
  • Select replication type
  • Select articles to be published
  • [Optional] Set up article filtering
  • Set up security
• Configure Subscribers:
  • Connect to Distributor
  • Select subscription method
• Apply snapshot
• [Transactional/merge] Synchronize changes

Failover in replication

• Stop the subscription
• Redirect all traffic from Publisher to Subscriber:
  • Change application connection strings
  • Change DNS aliases, if required, or
  • Change IP addresses

Failback in replication

• After restoring the Publisher, restore a copy of the database from the Subscriber
• Redirect all traffic from Subscriber to Publisher
• Reestablish the replication

Log shipping

• Keeps a standby by automating the backup, copy and restore process
• Three server roles in log shipping:
  • Primary
  • Secondary
  • Monitor

How does it work? [1/2]

Restore a full backup from Primary to Secondary and then:
• A job on Primary backs up the transaction log
• A second job copies the log backup to Secondary
• A third job on Secondary restores the log once it is copied

[Optional] Monitor server tracks performance and incidents
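
The three jobs can be modelled with plain lists to show why the order of log backups matters. This is a simulation only, not SQL Server code: sequence numbers stand in for log backup files, and the function names are invented for the example.

```python
def backup_job(primary_share, next_seq):
    """Job 1 (Primary): write the next log backup to the backup share."""
    primary_share.append(next_seq)


def copy_job(primary_share, secondary_share):
    """Job 2: copy any log backup the Secondary does not have yet."""
    for seq in primary_share:
        if seq not in secondary_share:
            secondary_share.append(seq)


def restore_job(secondary_share, restored):
    """Job 3 (Secondary): restore copied backups in order, stopping at a gap."""
    expected = restored[-1] + 1 if restored else 1
    for seq in sorted(secondary_share):
        if seq < expected:
            continue  # already restored
        if seq != expected:
            break  # a backup is missing, so the chain cannot continue
        restored.append(seq)
        expected += 1


primary_share, secondary_share, restored = [], [], []
for seq in (1, 2, 3):
    backup_job(primary_share, seq)
copy_job(primary_share, secondary_share)
restore_job(secondary_share, restored)
print(restored)
```

If a backup never reaches the Secondary, `restore_job` stops at the gap and later backups wait until the missing one arrives.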

How does it work? [2/2]

[Diagram: Primary and Secondary servers, with a Monitor server]

Failover in log shipping

• Copy transaction log backups from primary to secondary
• Back up the tail of the log on primary
• Restore all backups except the tail-log with NORECOVERY
• Restore the tail-log with RECOVERY
• Disable log shipping jobs
• Redirect client traffic to secondary
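
The backup and restore part of this sequence can be expressed as an ordered list of T-SQL statements. The helper below only builds the statement text, as a sketch. The database name, file paths and function name are hypothetical, the tail-log file is assumed to be reachable from the secondary, and no quoting or escaping of names is attempted.

```python
def log_shipping_failover(database, copied_log_backups, tail_backup):
    """Return (statements for primary, statements for secondary), in run order."""
    # Tail-log backup; NORECOVERY leaves the primary database in a restoring state.
    on_primary = [
        f"BACKUP LOG [{database}] TO DISK = N'{tail_backup}' WITH NORECOVERY;"
    ]
    # Every regular log backup is restored without recovery, oldest first.
    on_secondary = [
        f"RESTORE LOG [{database}] FROM DISK = N'{path}' WITH NORECOVERY;"
        for path in copied_log_backups
    ]
    # Only the last restore (the tail) brings the database online.
    on_secondary.append(
        f"RESTORE LOG [{database}] FROM DISK = N'{tail_backup}' WITH RECOVERY;"
    )
    return on_primary, on_secondary


primary_sql, secondary_sql = log_shipping_failover(
    "Sales",  # hypothetical database name
    [r"\\share\logs\Sales_001.trn", r"\\share\logs\Sales_002.trn"],
    r"\\share\logs\Sales_tail.trn",
)
for statement in primary_sql + secondary_sql:
    print(statement)
```

Disabling the log shipping jobs and redirecting clients are left out because they depend on the environment.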

Drawbacks of log shipping

• You can't miss a transaction log backup
• The network traffic generated has to be considered
• The Secondary is always behind the Primary
• The Secondary is read-only

Database mirroring

• Keeps your standby up to date
• Allows automatic failover
• Cost-effective alternative to clustering
• Available in Standard Edition (2005 – 2008 R2)
• Does not require cluster-capable hardware
• Can be implemented when Windows Authentication mode is not possible (using certificates)

How does it work?

[Diagram: Principal and Mirror servers, with a Witness server]

Database mirroring modes

High availability (with witness)
• Automatic failover
• Synchronous transaction commit (principal commits after mirror confirms its commit)

High protection (without witness)
• Manual failover
• Synchronous transaction commit

High performance (without witness)
• Manual failover
• Asynchronous transaction commit

Manual failover in database mirroring

• Can be done with one mouse click in SSMS
• Requires client traffic redirection:
  • Possible within the connection string using the Failover Partner keyword
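
A client connection string that names both partners might be assembled as follows. This is a sketch assuming an ADO.NET-style (SqlClient) connection string. The server and database names are hypothetical, and other drivers spell the keyword differently.

```python
def mirrored_connection_string(principal, mirror, database):
    """Build a SqlClient-style connection string naming both mirroring partners."""
    parts = {
        "Server": principal,         # tried first
        "Failover Partner": mirror,  # used when the principal cannot be reached
        "Database": database,
        "Integrated Security": "True",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical server and database names.
print(mirrored_connection_string("SQLA", "SQLB", "Sales"))
```

With both names in the string, the client can reconnect to the mirror after a failover without a configuration change.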

Automatic failover in database mirroring

Initiated automatically when the mirror and the witness both lose contact with the principal:
• If the principal is unavailable, fails over to the mirror
• Does nothing if the mirror becomes unavailable
• Also fails over if the principal is up but unreachable over the network!

Requires client traffic redirection:
• Possible within the connection string using the Failover Partner keyword
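
The decision rule behind automatic failover can be written as a small predicate. This is a deliberately simplified model built on an assumption: the mirror takes over only when neither it nor the witness can reach the principal while the two can still reach each other. The real protocol has more states than these three flags.

```python
def should_fail_over(mirror_sees_principal: bool,
                     witness_sees_principal: bool,
                     mirror_sees_witness: bool) -> bool:
    """Simplified rule: fail over only if the principal is lost to both partners."""
    principal_lost = not mirror_sees_principal and not witness_sees_principal
    # The mirror needs the witness to agree before it takes over.
    return principal_lost and mirror_sees_witness


# Principal down or cut off from the network: the mirror takes over.
print(should_fail_over(False, False, True))
# Mirror unavailable: the principal keeps running, nothing happens.
print(should_fail_over(False, True, False))
```

The predicate cannot tell a crashed principal from one that is only unreachable, which is why a network partition also triggers a failover.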

Failover clustering

• Provides protection on a server level:
  • Automatic failover in case of server failure
  • Fails over logins, endpoints and jobs
• Combines multiple machines (nodes) in a single virtual server
• Requires cluster-capable hardware:
  • Shared or common storage
  • Certified server hardware

Clustering

[Diagram: a Cluster of Node A and Node B, with User and Storage]

Failover in a cluster

[Diagram sequence, three steps: a Cluster of Node A and Node B, with User and Storage]

Summary

Method: Replication
  What it does: transfers completed transactions to standby nodes
  Standby type: cold/warm
  # of standby nodes: any
  Remarks:
  • Standby is accessible
  • May allow for updates
  • Conflicts may appear

Method: Log shipping
  What it does: performs regular log backups, copies them to the standby and restores them
  Standby type: cold
  # of standby nodes: any
  Remarks:
  • Database unavailable during restore
  • Standby may be accessible

Method: Database mirroring
  What it does: replays transactions as they are logged
  Standby type: warm/hot
  # of standby nodes: 1
  Remarks:
  • Standby unavailable
  • Requires third server to allow hot standby

Method: Failover cluster
  What it does: monitors Windows service status and transfers execution
  Standby type: hot
  # of standby nodes: any
  Remarks:
  • Requires shared (or replicated) storage
  • Requires identical hardware
  • Failover = downtime

Discussion

THANK YOU!
