Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery

Date post: 15-May-2015
LeRoy Tuttle, Jr.

Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra

Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones

Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.

Category: Quick Guide
Applies to: SQL Server 2012
Source: White paper (link to source content)
E-book publication date: May 2012
32 pages

Copyright 2012 by Microsoft Corporation

All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

Microsoft and the trademarks listed at http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the Microsoft group of companies. All other marks are property of their respective owners.

The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred.

This book expresses the authors' views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.
Contents

High Availability and Disaster Recovery Concepts
    Describing High Availability
        Planned vs. Unplanned Downtime
        Degraded Availability
    Quantifying Downtime
        Recovery Objectives
        Justifying ROI or Opportunity Cost
        Monitoring Availability Health
        Planning for Disaster Recovery
    Overview: High Availability with Microsoft SQL Server 2012
        SQL Server AlwaysOn
        Significantly Reduce Planned Downtime
        Eliminate Idle Hardware and Improve Cost Efficiency and Performance
        Easy Deployment and Management
        Contrasting RPO and RTO Capabilities
SQL Server AlwaysOn Layers of Protection
    Infrastructure Availability
        Windows Operating System
        Windows Server Failover Clustering
        WSFC Cluster Validation Wizard
        WSFC Quorum Modes and Voting Configuration
        WSFC Disaster Recovery through Forced Quorum
    SQL Server Instance Level Protection
        Availability Improvements - SQL Server Instances
        AlwaysOn Failover Cluster Instances
    Database Availability
        AlwaysOn Availability Groups
        Availability Group Failover
        Availability Group Listener
        Availability Improvements - Databases
    Client Connectivity Recommendations
    Conclusion

High Availability and Disaster Recovery Concepts

You can make the best selection of a database technology for a high availability and disaster recovery solution when all stakeholders have a shared understanding of the related business drivers, challenges, and objectives of planning, managing, and measuring RTO and RPO objectives.
Readers who are familiar with these concepts can move ahead to the "Overview: High Availability with Microsoft SQL Server 2012" section of this paper.

Describing High Availability

For a given software application or service, high availability is ultimately measured in terms of the end user's experience and expectations. The tangible and perceived business impact of downtime may be expressed in terms of information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill.

The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs.

A platform is considered highly available per the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:

    Availability = (Actual uptime / Expected uptime) × 100%

The resulting value is often expressed by industry in terms of the number of 9s that the solution provides, meant to convey an annual number of minutes of possible uptime or, conversely, minutes of downtime.

    Number of 9s   Availability Percentage   Total Annual Downtime
    2              99%                       3 days, 15 hours
    3              99.9%                     8 hours, 45 minutes
    4              99.99%                    52 minutes, 34 seconds
    5              99.999%                   5 minutes, 15 seconds

Planned vs. Unplanned Downtime

System outages are either anticipated and planned for, or they are the result of an unplanned failure. Downtime need not be considered negatively if it is appropriately managed. There are two key types of foreseeable downtime:

• Planned maintenance. A time window is preannounced and coordinated for planned maintenance tasks such as software patching, hardware upgrades, password updates, offline re-indexing, data loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational procedures should minimize downtime and prevent any data loss. Planned maintenance activities can be seen as investments needed to prevent or mitigate other potentially more severe unplanned outage scenarios.
• Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or uncontrollable, or that are foreseeable but considered either too unlikely to occur or to have an acceptable impact. A robust high availability solution detects these types of failures, automatically recovers from the outage, and then reestablishes fault tolerance.

When establishing SLAs for high availability, you should calculate separate key performance indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned downtime.

Degraded Availability

High availability should not be considered as an all-or-nothing proposition. As an alternative to a complete outage, it is often acceptable to the end user for a system to be partially available, or to have limited functionality or degraded performance. These varying degrees of availability include:

• Read-only and deferred operations. During a maintenance window, or during a phased disaster recovery, data retrieval is still possible, but new workflows and background processing may be temporarily halted or queued.

• Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a partial platform failure, limited hardware resources may be overcommitted or undersized. User experience may suffer, but work may still get done in a less productive manner.

• Partial, transient, or impending failures. Robustness in the application logic or hardware stack that retries or self-corrects upon encountering an error. These types of issues may appear to the end user as data latency or poor application responsiveness.

• Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers of the solution stack (infrastructure, platform, and application), or horizontally between different functional components. Users may experience partial success or degradation, depending upon the features or components that are affected.

The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.
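The availability calculation and the advice to track planned and unplanned downtime as separate KPIs can be illustrated with a minimal Python sketch. The function names and the yearly downtime figures are hypothetical and are not taken from this paper; the sketch also assumes a 365-day year, which is consistent with the number-of-9s figures quoted under Describing High Availability.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year


def availability_pct(expected_uptime_s: float, downtime_s: float) -> float:
    """Availability = actual uptime / expected uptime x 100%."""
    actual_uptime_s = expected_uptime_s - downtime_s
    return actual_uptime_s / expected_uptime_s * 100.0


def annual_downtime(pct: float) -> tuple[int, int, int, int]:
    """Convert an availability percentage to (days, hours, minutes, seconds) per year."""
    total = round((100.0 - pct) / 100.0 * SECONDS_PER_YEAR)  # whole seconds
    days, rem = divmod(total, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


# Hypothetical yearly figures, reported as separate KPIs.
planned_s = 6 * 3_600    # six hours of announced maintenance
unplanned_s = 45 * 60    # forty-five minutes of unplanned outage

kpi_planned = availability_pct(SECONDS_PER_YEAR, planned_s)
kpi_unplanned = availability_pct(SECONDS_PER_YEAR, unplanned_s)
kpi_overall = availability_pct(SECONDS_PER_YEAR, planned_s + unplanned_s)

print(f"planned-only KPI:   {kpi_planned:.3f}%")
print(f"unplanned-only KPI: {kpi_unplanned:.3f}%")
print(f"overall:            {kpi_overall:.3f}%")
print("four 9s allows", annual_downtime(99.99), "(d, h, m, s) of downtime per year")
```

Keeping the two KPIs apart makes it visible when a generous maintenance budget, rather than unplanned failure, is what keeps a system below its target number of 9s.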
Quantifying Downtime

When downtime does occur, either planned or unplanned, the primary business goal is to bring the system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With unplanned downtime, you must balance the time and effort needed to determine why the outage occurred, what the current system state is, and what steps are needed to recover from the outage.

At a predetermined point in any outage, you should make or seek the business decision to stop investigating the outage or performing maintenance tasks, recover from the outage by bringing the system back online, and if needed, reestablish fault tolerance.

Recovery Objectives

Data redundancy is a key component of a high availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be lost on the secondary instances due to delays in data propagation.

You can both measure the impact and set recovery goals in terms of how long it takes to get back in business, and how much time latency there is in the last transaction recovered:

• Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.

• Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.

You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health.
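As a rough illustration of how the two recovery objectives can double as metrics, the following Python sketch derives the achieved recovery time and recovery point gap from four timestamps and compares them with the agreed goals. The class, field names, and sample values are hypothetical; how those timestamps are actually captured depends on your monitoring tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageRecord:
    outage_start: datetime                # service became unavailable
    service_restored: datetime            # new transactions can take place again
    last_commit_before_failure: datetime  # last committed transaction on the primary
    last_commit_recovered: datetime       # most recent data present after recovery

    @property
    def recovery_time(self) -> timedelta:
        """Duration of the outage, compared against the RTO."""
        return self.service_restored - self.outage_start

    @property
    def recovery_point_gap(self) -> timedelta:
        """Time gap of lost data, compared against the RPO."""
        return self.last_commit_before_failure - self.last_commit_recovered


def meets_objectives(outage: OutageRecord, rto: timedelta, rpo: timedelta) -> bool:
    return outage.recovery_time <= rto and outage.recovery_point_gap <= rpo


# Hypothetical outage: 12 minutes of downtime, 3 seconds of unrecovered transactions.
outage = OutageRecord(
    outage_start=datetime(2012, 5, 1, 9, 0, 0),
    service_restored=datetime(2012, 5, 1, 9, 12, 0),
    last_commit_before_failure=datetime(2012, 5, 1, 8, 59, 58),
    last_commit_recovered=datetime(2012, 5, 1, 8, 59, 55),
)
print(outage.recovery_time, outage.recovery_point_gap)
print(meets_objectives(outage, rto=timedelta(minutes=15), rpo=timedelta(seconds=5)))
```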
Justifying ROI or Opportunity Cost

The business costs of downtime may be either financial or in the form of customer goodwill. These costs may accrue with time, or they may be incurred at a certain point in the outage window. In addition to projecting the cost of incurring an outage with a given recovery time and data recovery point, you can also calculate the business process and infrastructure investments needed to attain your RTO and RPO goals or to avoid the outage altogether. These investment themes should include:

• Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure, distributing workloads across isolated points of failure, and planned downtime for preventive maintenance.

• Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on the customer experience through automatic and transparent recovery.

• Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage. It also can be leveraged for read-only workloads, or to improve overall system performance by distributing workloads across all available hardware.

For given RTO and RPO goals, the needed availability and recovery investments, combined with the projected costs of downtime, can be expressed and justified as a function of time. During an actual outage, this allows you to make cost-based decisions based on the elapsed downtime.

Monitoring Availability Health

From an operational point of view, during an actual outage, you should not attempt to consider all relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data latency on your standby instances as a proxy for expected RPO.

In the event of an outage, you should also limit the initial time spent investigating the root cause during the outage, and instead focus on validating the health of your recovery environment, and then rely upon detailed system logs and secondary copies of data for subsequent forensic analysis.
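The idea of expressing downtime cost as a function of time, with some costs accruing continuously and others incurred at a fixed point in the outage window, can be sketched as follows. This is a simplified model worked out before an outage, not a formula from this paper; the per-minute rate, the penalty thresholds, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StepCost:
    after_minutes: float  # point in the outage window at which the cost is incurred
    amount: float         # one-time cost, for example a contractual penalty


def downtime_cost(elapsed_minutes: float, cost_per_minute: float,
                  step_costs: List[StepCost]) -> float:
    """Projected cost of an outage that has lasted elapsed_minutes."""
    accrued = elapsed_minutes * cost_per_minute
    incurred = sum(s.amount for s in step_costs if elapsed_minutes >= s.after_minutes)
    return accrued + incurred


# Hypothetical figures: 100 per minute of lost business, an SLA penalty of 5,000
# at 30 minutes, and a further 20,000 at four hours.
penalties = [StepCost(after_minutes=30, amount=5_000),
             StepCost(after_minutes=240, amount=20_000)]

for minutes in (10, 30, 240):
    print(minutes, downtime_cost(minutes, 100.0, penalties))
```

A curve like this is meant to be prepared ahead of time and compared with the cost of the availability investments, so that during an actual outage the only live input is the elapsed downtime.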
Planning for Disaster Recovery

While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address what is done to reestablish high availability after the outage.

As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or manual failover and recovery plan should be tied to pre-established RTO and RPO thresholds. The scope of a sound disaster recovery plan should include:

• Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels; that is, data center, infrastructure, platform, application, or workload.

• Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible by appropriate parties.

• Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?

• Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery steps.

• Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?

• Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity so that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a "run book" or a "cookbook".

• Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO goals, and consider regular rotation of hosting the primary production site on the primary and each of the disaster recovery sites.
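One way to make failover criteria "in terms of RPO and RTO goals" concrete is to encode a branch of the decision tree as a small function. The rule below is purely hypothetical and deliberately simplistic; an actual run book would encode the organization's own criteria, roles, and escalation paths.

```python
from datetime import timedelta
from enum import Enum


class Action(Enum):
    CONTINUE_REPAIR = "continue repair in place"
    FAIL_OVER = "initiate failover"
    ESCALATE = "escalate for a business decision"


def triage(elapsed: timedelta, estimated_repair: timedelta,
           standby_data_lag: timedelta, rto: timedelta, rpo: timedelta) -> Action:
    """Hypothetical triage rule tied to pre-established RTO and RPO thresholds."""
    # Repairing in place is acceptable while the projected outage stays within the RTO.
    if elapsed + estimated_repair <= rto:
        return Action.CONTINUE_REPAIR
    # Otherwise fail over, provided the standby's data lag is within the RPO.
    if standby_data_lag <= rpo:
        return Action.FAIL_OVER
    # Failing over would lose more data than agreed, so people must decide.
    return Action.ESCALATE


rto, rpo = timedelta(minutes=30), timedelta(seconds=5)
print(triage(timedelta(minutes=20), timedelta(minutes=30),
             timedelta(seconds=2), rto, rpo).value)
```

Writing the rule down in this form also gives recovery rehearsals something specific to validate.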
Overview: High Availability with Microsoft SQL Server 2012

Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications and protection of critical data from unplanned and planned downtime. SQL Server provides a set of features and capabilities that can help achieve those goals while keeping the cost and complexity low.

Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the deeper coverage in the "SQL Server AlwaysOn Layers of Protection" section of this paper.

SQL Server AlwaysOn

AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It can provide data and hardware redundancy within and across datacenters, and it improves application failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility in configuration and enables reuse of existing hardware investments.

An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at both the database and the instance level:

• AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database mirroring and help ensure availability of application databases, and they enable zero data loss through log-based data movement for data protection without shared disks.

  Availability groups provide an integrated set of options including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.

• AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.

Significantly Reduce Planned Downtime

The key reason for application downtime in any organization is planned downtime caused by operating system patching, hardware maintenance, and so on. This can constitute almost 80 percent of the outages in an IT environment.
SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and enabling more online maintenance operations:

• Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This operating system configuration can reduce planned downtime by minimizing operating system patching requirements by as much as 60 percent.

• Online Operations. Enhanced support for online operations like LOB re-indexing and adding columns with default values helps to reduce downtime during database maintenance operations.

• Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of instances, which helps significantly to reduce application downtime.

• SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime. Administrators can perform maintenance operations on the host without impacting applications.

Eliminate Idle Hardware and Improve Cost Efficiency and Performance

Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The ability to simultaneously utilize both the primary and secondary database replicas helps improve performance of all workloads due to better resource balancing across your server hardware investments.

Easy Deployment and Management

Features such as the Configuration Wizard, support for the Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.
Contrasting RPO and RTO Capabilities

The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key drivers in selecting a SQL Server technology for your high availability and disaster recovery solution.

This table offers a rough comparison of the type of results that those different solutions may achieve:

    High Availability and Disaster Recovery                Potential Data  Potential Recovery    Automatic  Readable
    SQL Server Solution                                    Loss (RPO)      Time (RTO)            Failover   Secondaries (1)
    -----------------------------------------------------  --------------  --------------------  ---------  --------------------
    AlwaysOn Availability Group - synchronous-commit       Zero            Seconds               Yes (4)    0 - 2
    AlwaysOn Availability Group - asynchronous-commit      Seconds         Minutes               No         0 - 4
    AlwaysOn Failover Cluster Instance                     NA (5)          Seconds to minutes    Yes        NA
    Database Mirroring (2) - High-safety (sync + witness)  Zero            Seconds               Yes        NA
    Database Mirroring (2) - High-performance (async)      Seconds (6)     Minutes (6)           No         NA
    Log Shipping                                           Minutes (6)     Minutes to hours (6)  No         Not during a restore
    Backup, Copy, Restore (3)                              Hours (6)       Hours to days (6)     No         Not during a restore

(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.

SQL Server AlwaysOn Layers of Protection

SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been a common practice to have a separation of duties and responsibilities for the various involved audiences and roles, such that each was predominantly concerned with only a portion of those solution layers.
This section of the paper is organized to walk through a deeper description of each of those layers, and to offer rationale and guidance for your design discussions and implementation decisions.

A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:

• Infrastructure level. Server-level fault tolerance and intra-node network communication leverages Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.

• SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across and can fail over to server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).

• Database level. An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.

• Client connectivity. Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and availability group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.

The logical topology of a representative AlwaysOn solution is illustrated in this diagram:

[Figure: logical topology of a representative AlwaysOn solution]

Infrastructure Availability

Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows Server operating system and WSFC as a platform technology. More than ever before, successful Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.

Windows Operating System

SQL Server relies upon the Windows platform to provide foundational infrastructure and services for networking, storage, security, patching, and monitoring.
The different editions of SQL Server 2012 progressively build upon the increasing capabilities and capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server 2008 R2 Standard operating system, Windows Server 2008 R2 Enterprise operating system, and Windows Server 2008 R2 Datacenter operating system.

For more information, see: Hardware and Software Requirements for Installing SQL Server 2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).

Windows Server Core Installation Option

As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation option in Windows Server 2008 or later. The Server Core installation option provides a minimal environment for running specific server roles with limited functionality and very limited GUI application support. By default, only necessary services and a command prompt environment are enabled.

This mode of operation reduces the operating system attack surface and system overhead, and it can significantly reduce ongoing maintenance, servicing, and patching requirements.

A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment, configuration, administration, and maintenance of SQL Server and of the operating system must be done using a scripting environment such as Windows PowerShell, or through the use of command-line or remote tools.

Optimizing SQL Server for Private Cloud

High availability and disaster recovery scenarios are increasingly critical in the Private Cloud environment. Deploy SQL Server to your Private Cloud to help ensure that your computer, network, and storage resources are used efficiently, reducing both physical footprint and capital and operational expenses. It helps you consolidate deployments, scale your resources efficiently, and deploy resources on demand without compromising control.

In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL Server also supports Live Migration, which is the ability to move virtual machines between hosts with no discernible downtime. Live Migration also works in conjunction with guest clustering.
For more information, see Private Cloud Computing - Optimizing SQL Server for Private Cloud (http://www.microsoft.com/SqlServerPrivateCloud).

Windows Server Failover Clustering

Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server.

If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. With AlwaysOn solutions, this process applies to both FCIs and to availability groups.

The nodes in the WSFC cluster work together to collectively provide these types of capabilities:

• Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to the metadata or status on one node are automatically propagated to the other nodes in the cluster.

• Resource management. Individual nodes in the cluster may provide physical resources such as direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted applications, such as SQL Server, register themselves as a cluster resource, and they can configure startup and health dependencies upon other resources.

• Health monitoring. Inter-node and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the cluster is determined by the votes of a quorum of nodes in the cluster.

• Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they can react appropriately.
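The health monitoring and failover coordination capabilities listed above can be pictured with a toy model. The Python sketch below is not the WSFC implementation and calls no WSFC API; it assumes a simple timeout on the last heartbeat received from each node and an ordered list of possible owners, and the node names and timings are hypothetical.

```python
from typing import Dict, List, Optional, Set


def failed_nodes(last_heartbeat: Dict[str, float], now: float,
                 timeout_s: float) -> Set[str]:
    """Nodes whose most recent heartbeat is older than the timeout."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout_s}


def choose_owner(possible_owners: List[str], failed: Set[str]) -> Optional[str]:
    """First node in the ordered owner list that is not considered failed."""
    for node in possible_owners:
        if node not in failed:
            return node
    return None  # no healthy node is left to host the resource


# Hypothetical cluster: NODE1 currently hosts the resource but has gone quiet.
heartbeats = {"NODE1": 100.0, "NODE2": 109.0, "NODE3": 108.5}  # seconds
down = failed_nodes(heartbeats, now=110.0, timeout_s=5.0)
new_owner = choose_owner(["NODE1", "NODE2", "NODE3"], down)
print(down, new_owner)
```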
For more information, see Windows Server | Failover Clustering and Node Balancing (http://www.microsoft.com/windowsserver2008/en/us/failover-clustering-main.aspx).

Note: It is now critically important that database administrators understand the inner workings of WSFC clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery steps are all intrinsically tied to your WSFC configuration.

WSFC Storage Configurations

Windows Server Failover Clustering relies upon each node in the cluster to manage its connected storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely robust, and therefore if the storage device attached to a node is unavailable, the cluster node is considered to be at fault.

For write-based operations, a disk volume is logically attached to a single cluster node at a time using a SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a node fails, logical ownership of the disk volume can be transferred to another node in the cluster.

SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration combinations, including:

• Direct-attached vs. remote. Storage devices are directly physically attached to the server, or they are presented by a remote device through a network or host bus adapter (HBA). Remote storage technologies include Storage Area Network (SAN) based solutions such as iSCSI or Fibre Channel, as well as Server Message Block (SMB) file-share based solutions.

• Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk volume configuration and file paths are presented to each node in the cluster. The physical implementation and capacity of the underlying disk volumes can vary.
• Dedicated vs. shared. Dedicated storage is reserved for use and assigned to a single node in the cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of compliant shared storage devices can be transferred from one node to another using SCSI-3 protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared volume.

Note: SQL Server FCIs still require symmetrical shared storage to be accessible by all possible node owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated, local or remote storage.

WSFC Resource Health Detection and Failover

Each resource in a WSFC cluster node can report its status and health, periodically or on demand. A variety of circumstances may indicate a cluster resource failure, including: power failure, disk or memory errors, network communication errors, misconfiguration, or nonresponsive services.

You can make WSFC cluster resources such as networks, storage, or services dependent upon one another. The cumulative health of a resource is determined by successive rollup of its health with the health of each of its resource dependencies.

For AlwaysOn Availability Groups, the availability group and the availability group listener are registered as WSFC cluster resources. For AlwaysOn Failover Cluster Instances, the SQL Server service and the SQL Server Agent service are registered as WSFC cluster resources, and both are made dependent upon the instance's virtual network name resource.

If a WSFC cluster resource experiences a set number of errors or failures over a period of time, the configured failover policy causes the cluster service to do one of the following:

• Restart the resource on the current node.
• Set the resource offline.
• Initiate an automatic failover of the resource and its dependencies to another node.

Note: WSFC cluster resource health detection has no direct impact on the individual node's health or the overall health of the cluster.
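The rollup of cumulative resource health described in this section can be sketched as a recursive walk over a dependency map. This is an illustrative model only, not how the cluster service is implemented. The resource names follow the FCI example above, while the "IP Address" dependency of the network name is an assumption added for illustration; the sketch also assumes that the dependency graph contains no cycles.

```python
from typing import Dict, List


def cumulative_health(resource: str, own_health: Dict[str, bool],
                      depends_on: Dict[str, List[str]]) -> bool:
    """A resource is healthy only if it and all of its dependencies are healthy."""
    if not own_health[resource]:
        return False
    # Assumes an acyclic dependency graph; a cycle would recurse indefinitely.
    return all(cumulative_health(dep, own_health, depends_on)
               for dep in depends_on.get(resource, []))


depends_on = {
    "SQL Server": ["Network Name"],
    "SQL Server Agent": ["Network Name"],
    "Network Name": ["IP Address"],  # assumption for illustration
}
own_health = {
    "SQL Server": True,
    "SQL Server Agent": True,
    "Network Name": True,
    "IP Address": False,  # simulated failure at the bottom of the chain
}

for name in own_health:
    print(name, cumulative_health(name, own_health, depends_on))
```

In this model a single failed dependency marks every resource above it as unhealthy, which is why a failover moves a resource together with its dependencies.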
WSFC Cluster Validation Wizard

The cluster validation wizard is a feature that is integrated into failover clustering in Windows Server 2008 and Windows Server 2008 R2. It is a key tool for a database administrator to use to help ensure that a clean, healthy, stable WSFC environment exists before deploying a SQL Server AlwaysOn solution.

With the cluster validation wizard, you can run a set of focused tests on either a collection of servers that you intend to use as nodes in a cluster, or on an existing cluster. This process tests the underlying hardware and software directly, and individually, to obtain an accurate assessment of how well a WSFC cluster would be supported on a given configuration.

This validation process consists of a series of tests and data collection on each node in these categories:

• Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating system versions, devices, services, drivers, and so on.
• Network. Information on NIC binding order, network communications, IP configuration, and firewall configuration. Validates inter-node communications on all NICs.
• Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI commands, disk failover functionality, and symmetric or asymmetric storage configuration.
• System configuration. Validates Active Directory configuration, that drivers are signed, memory dump settings, required operating system features and services, compatible processor architecture, and service pack and Windows Software Update levels.

The results of these validation tests give you the information needed to fine-tune a cluster configuration, track the configuration, and identify potential cluster configuration issues before they cause downtime. You can save a report of the test results as an HTML document for later reference.

You should run these tests before and after you make any changes to WSFC configuration, before you install SQL Server, and as a part of any disaster recovery process. A cluster validation report is required by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC cluster configuration.
For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster (http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).

Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation steps.

For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).

WSFC Quorum Modes and Voting Configuration

WSFC uses a quorum-based approach to monitor overall cluster health and maximize node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.

Cluster Health Detection by Quorum

Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.

A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy enough to provide node-level fault tolerance.

The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also causes all SQL Server instances registered with the cluster to be stopped.

Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
Quorum Modes

A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes in the cluster.

One of the following quorum modes determines what constitutes a quorum of votes:

• Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.
• Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
  As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.
• Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
• Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.

For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).

Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk, you should generally use the Node Majority quorum mode if you have an odd number of voting nodes, or the Node and File Share Majority quorum mode if you have an even number of voting nodes.

Voting and Non-Voting Nodes

By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file share witness, and disk witness has a single vote in determining the overall cluster health. The quorum discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.
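The vote arithmetic behind the quorum modes can be made concrete with a short model. The Python sketch below is a simplification under stated assumptions, not the WSFC algorithm: the has_quorum function, its parameter names, and the mode strings are hypothetical, each voter is assumed to carry exactly one vote, and a witness is treated as either reachable or not.

```python
def has_quorum(mode, nodes_up, voting_nodes, witness_reachable=False):
    """Return True if the affirmative votes form a quorum under the given mode.

    mode              -- illustrative mode name (not an official identifier)
    nodes_up          -- voting nodes that currently respond
    voting_nodes      -- total number of voting nodes configured
    witness_reachable -- whether the file share or disk witness is reachable
    """
    if mode == "NodeMajority":
        # More than one-half of the voting nodes must vote affirmatively.
        return 2 * nodes_up > voting_nodes
    if mode in ("NodeAndFileShareMajority", "NodeAndDiskMajority"):
        # The witness adds one possible vote to the total.
        possible_votes = voting_nodes + 1
        affirmative = nodes_up + (1 if witness_reachable else 0)
        return 2 * affirmative > possible_votes
    if mode == "DiskOnly":
        # Any node that can reach the witness disk satisfies the quorum.
        return nodes_up >= 1 and witness_reachable
    raise ValueError(f"unknown quorum mode: {mode}")


# Three voting nodes, one down: 2 of 3 is still a majority.
print(has_quorum("NodeMajority", nodes_up=2, voting_nodes=3))  # True
# Four voting nodes split 2/2: no majority without a witness.
print(has_quorum("NodeMajority", nodes_up=2, voting_nodes=4))  # False
# The same split with a reachable file share witness: 3 of 5 votes.
print(has_quorum("NodeAndFileShareMajority", 2, 4, witness_reachable=True))  # True
```

The second and third calls show why a witness is recommended when the number of voting nodes is even: it breaks what would otherwise be a tie.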
Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, or appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of that node.

For all of the quorum modes except Disk Only, the effectiveness of a quorum vote depends on reliable communications among all of the voting nodes in the cluster. You should trust the quorum vote when all nodes are on the same physical subnet.

However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually online and otherwise healthy, that is most likely due to a network communications failure between subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that network communications failure may effectively create more than one set (or subset) of voting nodes.

If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another.

Note: The split-brain scenario is possible only if a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.

To simplify your quorum configuration and increase uptime, you may want to adjust each node's NodeWeight setting (a value of 0 or 1) so that the node's vote is not counted towards the quorum.

Recommended Adjustments to Quorum Voting

To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:

1. No vote by default. Assume that each node should not vote without explicit justification.
2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.
3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as the result of an automatic failover, should have a vote.
4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.
5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.
6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.

For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).

You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.

Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting.

For more information, see Monitor Availability Groups (http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).
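Guidelines 1 through 5 above can be read as a simple procedure. The Python sketch below is one possible interpretation, offered as an assumption rather than a Microsoft-supplied algorithm: the Node class, its field names, and the recommend_votes function are all hypothetical, and the returned values correspond to the 0 or 1 NodeWeight setting described earlier. Guideline 6 (reassessing after a failover) amounts to running the same procedure again with updated inputs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a cluster node for vote planning."""
    name: str
    hosts_primary: bool = False              # hosts a primary replica or is preferred FCI owner
    automatic_failover_target: bool = False  # could become primary by automatic failover
    secondary_site: bool = False             # resides at the disaster recovery site


def recommend_votes(nodes):
    """Return (votes by node name, whether a witness is needed to avoid a tie)."""
    votes = {}
    for node in nodes:
        vote = 0  # guideline 1: no vote by default
        if node.hosts_primary or node.automatic_failover_target:
            vote = 1  # guidelines 2 and 3
        if node.secondary_site:
            vote = 0  # guideline 4: exclude secondary site nodes
        votes[node.name] = vote
    # Guideline 5: an even number of node votes can tie, so suggest a witness.
    needs_witness = sum(votes.values()) % 2 == 0
    return votes, needs_witness


nodes = [
    Node("NODE1", hosts_primary=True),
    Node("NODE2", automatic_failover_target=True),
    Node("NODE3", secondary_site=True),
]
print(recommend_votes(nodes))
# ({'NODE1': 1, 'NODE2': 1, 'NODE3': 0}, True)
```

With two voting nodes at the primary site, the sketch flags that a witness is needed, which matches the earlier note recommending Node and File Share Majority for an even number of voting nodes.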
WSFC Disaster Recovery through Forced Quorum

Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL Server instances, and Availability Groups in the WSFC cluster to be set offline, because the cluster cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to communicate with a quorum.

To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least one node under the existing configuration. In a disaster scenario, you may need to reconfigure or identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC cluster to reflect the surviving cluster topology as well.

You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.

This type of disaster recovery process should include the following steps:

1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are nonresponsive and which cluster nodes are online and available for post-disaster use, and then examine the Windows event logs and the SQL Server system logs. Where practical, you should preserve forensic data and system logs for later analysis.
2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node, manually force the cluster to come online using the forced quorum procedure. To minimize potential data loss, select a node that was last hosting an availability group primary replica.
   For more information, see Force a WSFC Cluster to Start Without a Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).
   Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC cluster achieves a majority of votes and automatically transitions to a regular quorum mode of operation.
3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to specify the forced quorum option when you start the cluster service on the other nodes.
   As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to synchronize the new cluster configuration state. Remember to do this one node at a time to prevent potential race conditions in resolving the last known state of the cluster.
   Note: Ensure that each node that you start can communicate with the other newly online nodes, or you run the risk of creating more than one quorum node set; that is, a split-brain scenario. If your findings in step 1 are accurate, this should not occur.
4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the cluster using the forced quorum procedure, and if you corrected the root cause of the quorum failure, you do not need to make changes to the original quorum mode and node vote configuration.
   Otherwise, you should evaluate the newly recovered cluster node and availability replica topology, and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC cluster service on unrecovered nodes offline, or set their node votes to zero.
   Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored back to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster Manager, or the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate DMVs, verify that a healthy quorum has been restored.
5) Recover availability group database replicas as needed. Some databases may recover and come back online on their own as part of the regular SQL Server startup process. The recovery of other databases may require additional manual steps.
   You can minimize potential data loss and recovery time for the availability group replicas by bringing them back online in this sequence, if possible: primary replica, synchronous secondary replicas, asynchronous secondary replicas.
6) Repair or replace failed components and revalidate the cluster. Now that you have recovered from the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.
   Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate the transaction log past the last known point of the farthest-behind availability replica. If a failed replica is not repaired or removed from the availability group, the transaction logs will grow and you will run the risk of running out of transaction log space on the other replicas.
7) Repeat step 4 as needed. The goal is to reestablish the appropriate level of fault tolerance and high availability for healthy operations.
8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and Windows event logs to determine the root cause of the failure, and to document actual Recovery Point and Recovery Time experiences.

SQL Server Instance-Level Protection

The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities and features offered by Microsoft SQL Server 2012 and its integration with Windows Server infrastructure components.

Availability Improvements: SQL Server Instances

These are new SQL Server 2012 instance-level features that enhance availability for both AlwaysOn Failover Cluster Instances and standalone instances that host AlwaysOn Availability Groups.
These improvements represent enhancements for managing and troubleshooting failover scenarios:

• Flexible Failover Policy. The output of the new system stored procedure used for robust failure detection, sp_server_diagnostics, uses the FailureConditionLevel property to convey the severity of a failure affecting the SQL Server instance. A WSFC failover policy governs how this value impacts the SQL Server instance, ranging from relative tolerance of errors to being sensitive to any SQL Server internal component error.
  You can configure failover to be triggered by any one of a range of error levels, including: server down, server unresponsive, critical error, moderate error, or any qualified error. The FailureConditionLevel property can be used for FCI or availability group failover policies.
  Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any service-level failure caused failover.
  For more information, see Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).
• Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system configuration views, DMVs, performance counters, and an extended event health session that captures and dumps information needed to troubleshoot, tune, and monitor your AlwaysOn deployment. Many of these are exposed via new SQL Server Policy Management facets and policies.
  For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes (http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).
• SMB file share support. You can place database files on a Windows Server 2008 or later remote file share for both standalone and failover cluster instances, negating the need for a separate drive letter per FCI. This is a good option for storage consolidation or for hosting database file storage on a physical server for a virtual machine guest operating system. With the right configuration, I/O performance can very nearly approximate that of direct-attached storage.
  For more information, see SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sqldatabasesonfilesharesitstimetoreconsiderthescenario.aspx).
  Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server resource group; you must take separate measures to ensure the availability of the file share. If the file share becomes unavailable, SQL Server throws an I/O exception and goes offline.
• WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP addresses, regardless of online or offline state, are registered with DNS under the same virtual network name. Client calls to resolve the virtual network name in DNS return all of the registered IP addresses in a varying round-robin sequence.

AlwaysOn Failover Cluster Instances

The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability of a SQL Server instance hosted on local server and storage hardware within a single data center.

An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover Clustering (WSFC) cluster, but only active on one node at a time. Client applications connect to a virtual network name and virtual IP address that are owned by the active cluster node.

Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster service also replicates relevant changes from the active instance's entries in the Windows registry to each installed node. Each node that the FCI is installed on is designated as a possible owner of the instance and its resources, within a preferred failover sequence.

Database files are stored on shared symmetrical storage volumes that are registered as a resource with the WSFC cluster and are owned by the node that currently hosts the FCI.

For more information, see AlwaysOn Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).
FCI Failover Process

If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC cluster service using this high-level process to do a failover:

1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration indicates a failed state. By default, a service restart is attempted before a failover to another node is initiated. A timeout in the restart attempt indicates a resource failure.
2) A failover is indicated. A Failover Policy check indicates the need for a node failover.
3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server service is attempted.
4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and its dependent network and shared storage resources is transferred to the next preferred node owner of the FCI.
5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup procedures. If it does not come back online within a pending timeout period, the cluster service puts the resource on this new node in a failed state.
6) User databases are recovered on the new node. Each user database is placed in recovery mode while transaction log redo operations are applied and uncommitted transactions are rolled back.

FCI Improvements

Previous versions of SQL Server have offered an FCI installation option; however, several feature enhancements in SQL Server 2012 improve availability, robustness, and serviceability:

• Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network interface is available; this is known as an OR cluster resource dependency.
  Prior versions of SQL Server required that all network interfaces be functional for the SQL Server service to start or fail over, and that they all exist on the same subnet or VLAN.
  Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate data and coordinate storage failover between cluster nodes.
  For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sqlserver2012alwayson_3a00_multisitefailoverclusterinstance.aspx).
• Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic information.
  Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a simple one-way polling process. In this process, the WSFC cluster service periodically created a new SQL client connection to the instance, queried the server name, and then disconnected. A failure to connect, or a query timeout, for whatever reason, triggered a failover with very little available diagnostic information.
  For more information, see sp_server_diagnostics (http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).

There is now broader support for FCI storage scenarios:

• Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource dependency during setup.
• tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.
  Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage volume that failed over with other system databases.
  Note: The location of tempdb is stored in the master database, which moves between nodes during failover. It must be on a valid symmetrical file path (drive, folders, and permissions) on all potential node owners, or else the SQL Server service will not start on some nodes.
Database Availability

The high availability capabilities offered by the infrastructure and SQL Server instance-level components work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of options for explicitly protecting database data and data-tier applications.

AlwaysOn Availability Groups

An availability group is a set of user databases that fail over together from one SQL Server instance to another within the same WSFC cluster. Client applications can connect to the availability group's databases through a WSFC virtual network name, known as an availability group listener, which abstracts the underlying SQL Server instances.

AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring, failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it does not require the use of symmetrical shared storage.

For more information, see Overview of AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).

Availability Replicas and Roles

Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the user databases in the availability group. A SQL Server instance can host only one availability replica from a given availability group, but multiple availability groups may reside on the same instance. The SQL Server instance must have dedicated (non-shared) storage volumes.

One of the availability replicas serves in the role of primary replica. It is designated as the master copy of the availability group databases and is enabled for read/write operations.

An availability group can contain from one to four additional read-only availability replicas that each separately serve in the role of a secondary replica.
Availability Replica Synchronization

The contents of each database in an availability group are synchronized from the primary replica to each of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all databases in the availability group must be set to the full recovery model.

Secondary replicas are initialized with a full backup and restore of the primary replica's databases and transaction logs. As new transactions are committed on the primary replica, the corresponding portion of the transaction log is cached, queued, and then sent over the network to a database mirroring endpoint on each of the secondary replica nodes.

In this manner, new entries in the primary replica transaction log are appended onto each of the secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log has been hardened and flushed to the remote disk.

Note: Each availability replica has its own set of independent transaction log redo threads that are not part of the availability replica synchronization process. You may perceive delays in the log redo process on the secondary replicas as data latency.

In addition to having a role of primary or secondary, each availability replica also has an availability mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN statement:

• Synchronous-commit mode. The primary replica commits a given transaction only after all synchronous-commit secondary replicas acknowledge that they have finished hardening their respective transaction logs past that transaction's LSN. An availability group can have up to 2 synchronous-commit secondary replicas.
  Synchronous-commit mode introduces transaction latency on the primary replica databases, but it ensures that there is no data loss on the secondary replicas for committed transactions.
• Asynchronous-commit mode. The primary replica commits transactions after hardening the local transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary replica has hardened its transaction log. An availability group can have up to 4 asynchronous-commit secondary replicas, but no more than a total of 4 secondary replicas of any type.
  Asynchronous-commit mode minimizes transaction latency on the primary replica databases but allows the secondary replica transaction logs to lag behind, making some data loss possible.

For more information, see Availability Modes (http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).

The overall health of the data flow between the availability replicas is indicated by the synchronization state of each replica. You will most likely experience data loss if you fail over to a secondary replica with a synchronization state of anything other than Synchronized or Synchronizing.

Each secondary replica's synchronization stream has a session timeout property. When a secondary replica configured for synchronous-commit availability mode fails with a session timeout, it is temporarily marked internally as asynchronous. This is done so that the secondary replica failure does not impact hardening of the transaction log on the primary replica. After that secondary replica is healthy and caught back up with the primary replica, it automatically reverts to normal synchronous-commit mode operations.

Availability Group Failover

The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health and failover policy of the primary replica.

An availability group failover policy uses the FailureConditionLevel property to indicate the severity tolerance level for a failure affecting the availability group, in conjunction with the sp_server_diagnostics system stored procedure. This same mechanism is used for FCI failover policies.
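The availability mode rules from the Availability Replica Synchronization section can be illustrated with a small model. The Python sketch below is only an approximation under stated assumptions: the Secondary class and the validate_group and can_commit functions are hypothetical names, LSNs are reduced to plain integers, and none of this reflects SQL Server internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Secondary:
    """Hypothetical secondary replica with a simplified integer LSN watermark."""
    name: str
    synchronous: bool
    hardened_lsn: int = 0


def validate_group(secondaries: List[Secondary]) -> None:
    """Enforce the SQL Server 2012 limits quoted in this paper."""
    if len(secondaries) > 4:
        raise ValueError("no more than 4 secondary replicas of any type")
    if sum(1 for s in secondaries if s.synchronous) > 2:
        raise ValueError("no more than 2 synchronous-commit secondary replicas")


def can_commit(txn_lsn: int, secondaries: List[Secondary]) -> bool:
    """The primary commits once every synchronous secondary has hardened past txn_lsn.

    Asynchronous secondaries are ignored, so they may lag behind.
    """
    return all(s.hardened_lsn >= txn_lsn for s in secondaries if s.synchronous)


group = [
    Secondary("SYNC1", synchronous=True, hardened_lsn=120),
    Secondary("ASYNC1", synchronous=False, hardened_lsn=80),
]
validate_group(group)
print(can_commit(100, group))  # True: SYNC1 is past LSN 100, ASYNC1 is not consulted
print(can_commit(130, group))  # False: SYNC1 has not hardened LSN 130 yet
```

The first call shows where potential data loss comes from: the commit succeeds while the asynchronous replica is still 20 log records behind.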
In the event of a failover, instead of transferring ownership of shared physical resources to another node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over the role of primary replica. The availability group's virtual network name resource is then transferred to that instance. All client connections to the involved availability replicas are reset.

Based upon the current health, synchronization state, and availability mode of the replicas, each replica has a composite failover readiness state that indicates the potential for data loss. This replica health information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states system view.

Each availability replica also has a configured failover mode, which governs replica behavior when failover is indicated.

• Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn configuration because the secondary replica transaction log is already hardened and synchronized. Open transactions on the primary replica are rolled back, and the primary replica role is transferred to a secondary replica without any user intervention.
  The primary and secondary replicas must be set to automatic failover mode, and both must be set to synchronous-commit availability mode. The synchronization state between the replicas must be Synchronized. Additionally, the WSFC cluster must have a healthy quorum.
  Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is blocked to prevent a potential race condition between availability group and FCI failovers.
• Manual failover. This allows the administrator to assess the state of the primary replica, and make a decision to deliberately fail over to a secondary replica or not.
  Depending upon the availability mode and synchronization state, you have these choices:
  o Planned manual failover (without data loss). You can perform this type of failover only if both the primary and secondary replicas are healthy and in a Synchronized state. This is functionally equivalent to an automatic failover.
  o Forced manual failover (allowing potential data loss). This is the only form of failover that is possible if the target secondary replica is in asynchronous-commit availability mode, or if it is not synchronized with the primary replica.
    Warning: You should use this failover option in a disaster recovery situation only. If the primary replica is healthy and available, you should change the availability mode of the involved replicas to synchronous-commit and then perform a planned manual failover.
    For more information, see Perform a Forced Manual Failover of an Availability Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).

You must perform a manual failover if any of the following conditions are true about either the primary replica or the secondary replica that you want to fail over to:

• Failover mode is set to manual.
• Availability mode is set to asynchronous-commit.
• Replica resides on an FCI.

For more information, see Failover Modes (AlwaysOn Availability Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).

Note: After a failover, if the new primary replica is not set to the synchronous-commit mode, the secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary replicas until the primary replica is set to synchronous-commit mode.

Availability Group Listener

An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a database in the availability group. The VNN cluster resource is owned by the SQL Server instance on which the primary replica resides.

The virtual network name is registered with DNS only during availability group listener creation or during configuration changes. All virtual IP addresses that are defined in the availability group listener are registered with DNS under the same virtual network name.

To use the availability group listener, a client connection request must specify the virtual network name as the server, and a database name that is in the availability group. By default, this should result in a connection to the SQL Server instance that is hosting the primary replica.
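As an illustration of the last paragraph, the following Python sketch assembles a connection string that names the listener as the server and an availability group database as the initial catalog. It is a minimal sketch under assumptions: the helper name listener_connection_string, the listener name AGListener, the database name SalesDb, and port 1433 are all hypothetical, and the keyword spelling follows the .NET SqlClient style, so other client libraries may expect different keywords.

```python
def listener_connection_string(listener_name, database, port=1433):
    """Build a SqlClient-style connection string that targets the listener.

    listener_name -- the listener's virtual network name (hypothetical value below)
    database      -- a database that belongs to the availability group
    port          -- TCP port the listener was configured with (assumed here)
    """
    return (
        f"Server=tcp:{listener_name},{port};"
        f"Database={database};"
        "Integrated Security=SSPI"
    )


print(listener_connection_string("AGListener", "SalesDb"))
# Server=tcp:AGListener,1433;Database=SalesDb;Integrated Security=SSPI
```

The point of the design is that the client never names a physical SQL Server instance, so the same string keeps working after the primary replica role moves to another node.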
At runtime, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to the virtual network name. The client then attempts to connect to each of the IP addresses, until it is successful, or until it reaches the connection timeout. The client will attempt to make these connections in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.

In the event of a failover, client connections are reset on the server, ownership of the availability group listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is bound to the new instance's virtual IP addresses and TCP ports.

For more information, see Client Connectivity and Application Failover (http://msdn.microsoft.com/en-us/library/hh213417(SQL.110).aspx).

Application Intent Filtering

While connecting through the availability group listener, the application can specify whether its intent is to both read and write data or whether it will exclusively perform read-only operations. If not specified, the default application intent for the client is read-write.

For the primary role and secondary role of each availability replica, you can also specify a connection access property that will be used as a connection-level filter on the client's application intent. By default, invalid application intent and connection access combinations result in a refused connection. SQL Server should filter out client connection requests using the following rules.

While the availability replica is in the primary role, and connection access is equal to:

• Allow any application intent. Do not filter any client connections for application intent.
• Allow only explicit read/write intent. If client specifies read-only, reject connection.

While the availability replica is in the secondary role, and connection access is equal to:

• No connections allowed. Refuse all connections; replica is used only for disaster recovery.
• Allow any application intent. Do not filter any client connections for application intent.
• Read-only application intent. If client does not specify read-only, reject connection.
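The filtering rules above amount to a small decision table, sketched here in Python. This is an illustrative model, not SQL Server code: the function name accept_connection is hypothetical, and the access labels ALL, READ_WRITE, NO, and READ_ONLY are shorthand chosen for this sketch to stand for the five connection access settings listed above.

```python
def accept_connection(role, connection_access, intent="ReadWrite"):
    """Return True if a connection is accepted under the filtering rules above.

    role              -- "primary" or "secondary" (current role of the replica)
    connection_access -- shorthand label for the replica's connection access setting
    intent            -- "ReadWrite" (the default) or "ReadOnly"
    """
    if role == "primary":
        if connection_access == "ALL":         # allow any application intent
            return True
        if connection_access == "READ_WRITE":  # allow only explicit read/write intent
            return intent != "ReadOnly"
    elif role == "secondary":
        if connection_access == "NO":          # no connections allowed
            return False
        if connection_access == "ALL":         # allow any application intent
            return True
        if connection_access == "READ_ONLY":   # read-only application intent
            return intent == "ReadOnly"
    raise ValueError(f"unsupported combination: {role}/{connection_access}")


print(accept_connection("primary", "READ_WRITE", "ReadOnly"))  # False
print(accept_connection("secondary", "READ_ONLY", "ReadOnly"))  # True
print(accept_connection("secondary", "READ_ONLY"))              # False (default intent is read-write)
```

The last call shows a common surprise: a client that omits the application intent is treated as read-write and is refused by a secondary replica that accepts only read-only intent.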
For more information, see Configure Connection Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).

Application Intent Read-Only Routing

A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your secondary replicas for read-only access, you can offload significant workloads from your primary replicas.

Workloads that can be readily adapted to run off of a read-only secondary replica include: reporting, database backups, database consistency checks, index fragmentation analysis, data pipeline extraction, operational support, and ad hoc queries.

For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server instance endpoints to be applied while that replica is in the primary role. If present, this list is used to redirect client connection requests that specify read-only application intent to the first available secondary replica in the list that satisfies the application intent filters noted earlier.

Note: The read-only routing redirection is performed by the availability group listener, which is bound to the primary replica. If the primary replica is offline, client redirection will not function.

For more information, see Configure Read-Only Routing on an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).

Availability Improvements - Databases

SQL Server 2012 has a number of feature enhancements that are specific to database configuration and capabilities.

The following improvement reduces recovery time:

• Predictable Recovery Time. You can set a target recovery time interval per database, which is used to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs periodically, based upon estimated time needed to recover the transaction log in the event of a restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each checkpoint, and increasing recovery time (RTO) predictability.
  Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval, irrespective of transaction volume or load, which could lead to unpredictable recovery times.

  For more information, see Database Checkpoints (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).

These improvements mitigate common scenarios that can drive planned downtime:

• Online index operations for LOB columns. Indexes that contain columns with varbinary(max), varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.
• Online schema modification for new NOT NULL columns. If a new NOT NULL column is added with a default value to a SQL Server 2012 database table, only a schema lock is required to update system metadata; existing rows do not have to be populated during the ALTER TABLE statement.

  SQL Server will physically persist the default column value only if a row is actually modified or re-indexed. Queries return the default value from metadata, unless an actual column value exists.

The following improvement is an example of broader support for storage scenarios:

• Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of errors by asynchronously requesting and applying a fresh copy of the affected data pages from a different availability replica.

  Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced to support multiple replicas.

  For more information, see Automatic Page Repair (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).

Client Connectivity Recommendations

Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server 2012 AlwaysOn technologies:

• AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS) protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.02, and the SQL Native Client 11.0.
• Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings to enable client libraries to attempt to connect in parallel to all IP addresses that are registered for the availability group listener or for an FCI that has IP addresses in multiple subnets.
• Connection provider property: ApplicationIntent=ReadOnly. Where practical, offload read-only workloads from your primary replica onto the secondary replicas.
• Legacy client connection timeout. Legacy client database libraries do not implement parallel connection attempts, so when multiple IP addresses are present, they try to connect to each of them sequentially, until they encounter a TCP timeout, or until they make a successful connection.

  You should adjust your connection timeout on legacy clients to accommodate the potential sequential timeouts and retries when multiple IP addresses are present, to a value that is at least 15 seconds + 21 seconds for every secondary replica. For example, with two secondary replicas, the timeout should be at least 15 + (2 x 21) = 57 seconds.

Conclusion

This white paper has established the baseline context for how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

Many of the business drivers and challenges of planning, managing, and measuring a highly available database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).

SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level that can help your organization address common high availability and disaster recovery scenarios, in a manner that can be well justified using RPO and RTO goals.

For more information:

http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5 (excellent), how would you rate this paper and why have you given it this rating?
For example: Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason? Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing? This feedback will help us improve the quality of white papers we release. Send feedback. Version 1.1, 21 February 2012.

