Post on 24-Dec-2015
transcript
Exchange Server 2013 Managed Availability
Scott Schnoll
OFC-B315
Agenda
Understand our approach to monitoring
Understand Managed Availability
Service Management
Service Health Landscape
Exchange Online drove changes in the on-premises product, and changed our approach to monitoring
Scale drives automation
Component-based monitoring does not measure client experience
Client access is proxied by CAS to the Mailbox server that hosts the active database copy
[Diagram: Load Balancer, CAS Array, and a DAG containing MBX-A and MBX-B, each with a copy of DB1]
Exchange 2013 Managed Availability
Cloud Trained
Bring experience from the service to the enterprise
User Focused
Monitor end user experience
Recovery Oriented
Restore end user experience with recovery actions
Cloud Trained
Exchange Engineering team has been operating the Exchange Online service for 7 years
Relevant features, experience, and knowledge from service operation are put into the on-premises product
Engineers are on call for service-related issues
Drives accountability for awareness and motivates the team toward auto-healing and recovery
Scale, Auto-Deployment, Optics, and High Availability are key tenets
User Focused
If you can’t measure it, you can’t manage it
Availability: Can I access the service?
Latency: How is my experience?
Errors: Am I able to accomplish what I want?
[Diagram: Availability, Latency, and Errors measured at Customer Touch Points]
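As a rough illustration of the three questions on this slide, the sketch below reduces a batch of synthetic probe results to availability, latency, and error figures. The `ProbeSample` type, its fields, and the sample values are hypothetical and do not reflect Exchange's actual data model.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class ProbeSample:
    """One synthetic transaction result (hypothetical shape, for illustration)."""
    reachable: bool              # Availability: could the service be reached at all?
    latency_ms: Optional[float]  # Latency: how long the transaction took, if it ran
    succeeded: bool              # Errors: did the transaction accomplish its goal?


def summarize(samples: List[ProbeSample]) -> dict:
    """Reduce probe samples to the three user-focused measures."""
    total = len(samples)
    reached = [s for s in samples if s.reachable]
    latencies = [s.latency_ms for s in reached if s.latency_ms is not None]
    return {
        # Share of probes that reached the service at all.
        "availability": len(reached) / total if total else None,
        # Median latency of the probes that reached the service.
        "median_latency_ms": median(latencies) if latencies else None,
        # Share of reachable probes that still failed their transaction.
        "error_rate": (sum(1 for s in reached if not s.succeeded) / len(reached)
                       if reached else None),
    }


# Made-up samples: three reachable probes (one failing) and one unreachable.
samples = [
    ProbeSample(True, 120.0, True),
    ProbeSample(True, 180.0, True),
    ProbeSample(True, 900.0, False),
    ProbeSample(False, None, False),
]
report = summarize(samples)
print(report)
```

Keeping the three measures separate mirrors the slide: a service can be reachable (available) while still being slow or returning errors.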
Recovery Oriented
OWA send probe
OWA failure monitor
OWA fast recovery responder
OWA verified as healthy

OWA send probe
OWA failure monitor
OWA fast recovery responder
Failover database responder
OWA verified as healthy
MBX1 is failover target
[Diagram: load balancer (LB) with CAS1 and CAS2; a DAG with MBX1, MBX2, and MBX3, each running OWA and hosting copies of DB1 and DB2]
“Stuff breaks, but the experience does not”
Overview
Managed Availability Components
Probe Engine (Probe, Check, Notify): Check user experience
Monitor: Evaluate probe data
Escalate: Generate event log
Respond: Restore service or prevent failure
[Diagram: Exchange Server 2013 Managed Availability alongside System Center Operations Manager]
Probe Engine
Probes
Key goal: measure the user’s perception of the service
Typically synthetic user transactions (e.g., send a message via OWA)
Checks
Key goal: measure actual user traffic and become aware when users are experiencing issues
Typically implemented as performance counters where thresholds can be set to detect spikes
Notify
Key goal: take action immediately based on a critical event
Typically conditions that can be detected without a large sample set
Monitors
Evaluate data collected by probes to determine if action needs to be taken
Depending on the rule, a monitor can initiate a responder or escalate
Define the time from failure at which a responder is executed
Responders
Execute a response to an alert generated by a monitor
Restart – Terminates and restarts service
Reset AppPool – Cycles IIS application pool
Failover – Initiates a database or server failover
Bugcheck – Initiates a bugcheck of the server
Offline – Takes a protocol on a machine out of service
Online – Places a machine back into service
Escalate – Escalates an issue to an admin
Responder Throttling
Built-in sequencing mechanism controls actions
Throttling ensures the entire service isn’t compromised
All responders can throttle in some fashion
Some take into account minimum number of servers within a group
Some take into account time
Some take into account number of occurrences
Some may use a combination of the above
When throttling occurs, responder action may be delayed or skipped
For example, when the Bugcheck Responder is throttled, the action is skipped, not delayed
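The throttling criteria listed on this slide (group size, time, and number of occurrences) can be pictured with a minimal sketch. The class names, fields, and limit values are all hypothetical; actual Managed Availability throttling settings differ per responder and are not described in this deck. The sketch models the "skipped, not delayed" behavior only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThrottlePolicy:
    """Hypothetical limits, chosen only to illustrate the three criteria."""
    min_seconds_between_actions: float  # time-based limit
    max_actions_per_window: int         # occurrence-based limit
    window_seconds: float
    min_healthy_servers_in_group: int   # group-based limit


@dataclass
class Throttle:
    policy: ThrottlePolicy
    history: List[float] = field(default_factory=list)  # timestamps of allowed actions

    def allow(self, now: float, healthy_servers_in_group: int) -> bool:
        """Return True if the recovery action may run now; otherwise it is skipped."""
        p = self.policy
        # Group check: assume the action takes this server out of the group,
        # and refuse when that would leave too few healthy servers.
        if healthy_servers_in_group - 1 < p.min_healthy_servers_in_group:
            return False
        # Time check: enforce a minimum gap since the last allowed action.
        if self.history and now - self.history[-1] < p.min_seconds_between_actions:
            return False
        # Occurrence check: cap how many actions run inside the sliding window.
        recent = [t for t in self.history if now - t < p.window_seconds]
        if len(recent) >= p.max_actions_per_window:
            return False
        self.history.append(now)
        return True


policy = ThrottlePolicy(
    min_seconds_between_actions=600,
    max_actions_per_window=2,
    window_seconds=3600,
    min_healthy_servers_in_group=2,
)
throttle = Throttle(policy)

decisions = [
    throttle.allow(now=0, healthy_servers_in_group=4),     # first action runs
    throttle.allow(now=60, healthy_servers_in_group=4),    # too soon after the last one
    throttle.allow(now=700, healthy_servers_in_group=4),   # gap satisfied, second action
    throttle.allow(now=1500, healthy_servers_in_group=4),  # window already holds two
    throttle.allow(now=4400, healthy_servers_in_group=2),  # group would drop below minimum
]
print(decisions)
```

A delayed (rather than skipped) action would need a queue and a retry time, which this sketch leaves out.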
Monitoring Layers
[Diagram: four numbered monitoring layers across CAS (protocol proxy) and MBX (protocol, store), on an axis from proactive to reactive marked 20s, 5min, and 20min]
System Level Checks
Mailbox Self Test (e.g. OWA MST) [detection 5m]
Protocol Self Test (e.g. OWA PST) [detection 20 secs]
Proxy Self Test (e.g. OWA PrST) [detection 20 secs]
End User Experience Level Checks
Customer Touch Point – CTP (e.g. OWA CTP) [detection 20m]
Monitor States
Managed Availability Pipeline
Sampling: Probe Definition, Probe, Probe Results (Samples), Notification Item
Detection: Monitor Definition, Monitor, Monitor Results (Alerts)
Recovery: Responder Definition, Responder, Responder Results (Responses)
Monitor states: Healthy, T1, T2, T3
Named Times: 00:00:00, 00:00:10, 00:00:30
Sequenced Responder Pipeline: Restart, Reset AppPool, Failover, Bugcheck, Offline, Escalate
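The monitor states and named times on this slide suggest a timeline in which a monitor moves from Healthy through T1, T2, and T3 as a failure persists, running one responder at each step. The sketch below is one way to picture that. The pairing of each named time with a particular state and responder is an assumption made for illustration; the slide does not spell it out.

```python
from datetime import timedelta
from typing import List, Optional, Tuple

# (state name, offset from first detected failure, responder run on entering the state).
# The offsets come from the slide; pairing them with these states and responders
# is an assumption made for illustration only.
TRANSITIONS: List[Tuple[str, timedelta, str]] = [
    ("T1", timedelta(seconds=0), "Restart"),
    ("T2", timedelta(seconds=10), "Failover"),
    ("T3", timedelta(seconds=30), "Escalate"),
]


def monitor_state(unhealthy_for: Optional[timedelta]) -> str:
    """Return the monitor state for a given time since the first failure."""
    if unhealthy_for is None:
        return "Healthy"
    state = "Healthy"
    for name, offset, _responder in TRANSITIONS:
        if unhealthy_for >= offset:
            state = name
    return state


def responders_due(unhealthy_for: Optional[timedelta]) -> List[str]:
    """List the responders whose named time has been reached, in sequence."""
    if unhealthy_for is None:
        return []
    return [r for _name, offset, r in TRANSITIONS if unhealthy_for >= offset]


for seconds in (None, 0, 15, 45):
    elapsed = None if seconds is None else timedelta(seconds=seconds)
    print(seconds, monitor_state(elapsed), responders_due(elapsed))
```

Sequencing responders by elapsed time means the least disruptive action is tried first, and escalation to an administrator happens only if the earlier recovery actions have not restored health.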
Architecture
Managed Availability Architecture
Uses worker process model
Exchange Health Manager Service (MSExchangeHMHost.exe)
Exchange Health Manager Worker process (MSExchangeHMWorker.exe)
Uses persistent storage
Registry used to store runtime data, like bookmarks and local overrides
Active Directory used to store global overrides
Crimson Event Channel used to store work item results
Consumes data collected by Exchange Diagnostic Service
Leverages multiple HealthMailboxes per database
These are user accounts, visible in the MESO/Monitoring Mailboxes container
Can also be viewed using Get-Mailbox -Monitoring
Server and Service Health
Health Sets
Health Groups
Management Tasks and Cmdlets
Event Logging
Health Sets
A health set is a group of monitors, probes and responders for a component that determine whether the component is healthy or unhealthy
View list of health sets:
Get-HealthReport -Identity <ServerName>
View list of probes, monitors and responders associated with a health set:
Get-MonitoringItemIdentity -Server <ServerName> -Identity <HealthSetName> | ft Identity,ItemType,Name -auto
Health is reported using a “worst of” evaluation of the monitors in the health set
View details of a health set to see which monitors are healthy/unhealthy:
$Health = Get-HealthReport -Identity EX1 | ? {$_.HealthSet -ilike "<Name>"}
$Health.Entries | ft Name, AlertValue -auto
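The “worst of” evaluation can be sketched as taking the most severe state among a health set's monitors. The severity ordering and the monitor names below are assumptions for illustration; the deck names the states but does not rank them.

```python
from typing import Dict

# Assumed severity ranking for the "worst of" comparison (higher is worse).
SEVERITY = {"Healthy": 0, "Degraded": 1, "Unhealthy": 2}


def health_set_state(monitor_states: Dict[str, str]) -> str:
    """Return the worst state among the monitors of one health set.

    Expects at least one monitor; monitor names are hypothetical.
    """
    return max(monitor_states.values(), key=lambda state: SEVERITY[state])


owa_monitors = {
    "MonitorA": "Healthy",
    "MonitorB": "Unhealthy",
    "MonitorC": "Degraded",
}
print(health_set_state(owa_monitors))
```

A single unhealthy monitor is therefore enough to mark the whole health set unhealthy, which is why the per-monitor entries are worth inspecting.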
OWA Health Sets | Monitoring Layers
Protocol Health Set: OWA.Protocol
Proxy Health Set: OWA.Proxy
CTP Health Set: OWA
[Diagram: the three OWA health sets placed on the four numbered monitoring layers across CAS (protocol proxy) and MBX (protocol, store), on an axis from proactive to reactive marked 20s, 5min, and 20min]
System Level Checks
Mailbox Self Test (e.g. OWA MST) [detection 5m]
Protocol Self Test (e.g. OWA PST) [detection 20 secs]
Proxy Self Test (e.g. OWA PrST) [detection 20 secs]
End User Experience Level Checks
Customer Touch Point – CTP (e.g. OWA CTP) [detection 20m]
Health Groups
Portals in System Center Operations Manager
Customer Touch Points – components that affect real-time user interactions (protocols)
Service Components – components without direct real-time user interactions (MRS, OABGen)
Server Components – physical resources of the server (disk space, memory, network)
Dependency Availability – server’s ability to use dependencies (AD, DNS, etc.)
System Center Operations Manager
Displays health information related to the Exchange environment
Management Pack Support: SCOM 2007 R2, SCOM 2012
Escalate responder writes an event to the event log, which is consumed by a monitor within SCOM
When an alert is received by SCOM, it may not be the sum total of problems at a given point in time
Dashboard is broken down into three areas
Active Alerts
Organization Health
Server Health
Viewing Health in System Center
The state of a health group is based on the health of the monitors within the group
Health evaluated by a "worst of" evaluation of the monitors in the group
A health group can have one of six states: Healthy, Degraded, Unhealthy, Repairing, Disabled or Unavailable
Viewing Health in SCOM
Management Tasks and Cmdlets
Extract or view system health
Get-ServerHealth
Get-HealthReport
View probes, monitors and responders for a health set
Get-MonitoringItemIdentity
Details about probes, monitors, and responders
Get-MonitoringItemHelp
Overrides
Admins can alter the thresholds and parameters used by the probes, monitors and responders
Enables emergency actions
Enables fine tuning of thresholds specific to the environment
Can be deployed for specific servers or for the entire environment
Server related overrides are stored in the registry
Global overrides are stored in Active Directory
Can be set for a specified duration or to apply to a specific version of the server
Are not immediately implemented
Exchange Health Service reads configuration every 10 minutes
Global changes depend on Active Directory replication
Wildcards are not supported
Cannot override entire health set in one task
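Because overrides are read on a 10-minute cycle and can carry a duration, there is a window in which a new override exists but is not yet in effect. The sketch below models that timing for a local (server) override. The `Override` type, the assumption that polls are counted from service start, and the assumption that the duration is measured from creation are all hypothetical simplifications.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the slide: the health service reads configuration every 10 minutes.
POLL_INTERVAL = timedelta(minutes=10)


@dataclass
class Override:
    """Hypothetical model of a duration-based server override."""
    created: datetime
    duration: timedelta

    @property
    def expires(self) -> datetime:
        # Assumption: the duration is counted from the time of creation.
        return self.created + self.duration


def first_pickup(override: Override, service_start: datetime) -> datetime:
    """Earliest poll at or after the override was created.

    Assumes polls happen at fixed 10-minute intervals counted from service start.
    """
    elapsed = override.created - service_start
    polls = math.ceil(elapsed / POLL_INTERVAL)
    return service_start + polls * POLL_INTERVAL


def is_active(override: Override, service_start: datetime, now: datetime) -> bool:
    """True once the override has been read and before it expires."""
    return first_pickup(override, service_start) <= now < override.expires


start = datetime(2014, 5, 12, 9, 0)
ov = Override(created=datetime(2014, 5, 12, 9, 3), duration=timedelta(hours=1))

print(first_pickup(ov, start))
print(is_active(ov, start, datetime(2014, 5, 12, 9, 5)))
print(is_active(ov, start, datetime(2014, 5, 12, 9, 30)))
print(is_active(ov, start, datetime(2014, 5, 12, 10, 5)))
```

A global override would add Active Directory replication latency on top of the polling delay, which this sketch does not model.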
Management Tasks and Cmdlets
Create an override
Add-ServerMonitoringOverride
Add-GlobalMonitoringOverride
View overrides
Get-ServerMonitoringOverride
Get-GlobalMonitoringOverride
Remove an override
Remove-ServerMonitoringOverride
Remove-GlobalMonitoringOverride
Event Logging
Managed Availability makes extensive use of the crimson channel event log
Microsoft-Exchange-ActiveMonitoring
ProbeDefinition
ProbeResult
MonitorDefinition
MonitorResult
ResponderDefinition
ResponderResult
Microsoft-Exchange-ManagedAvailability
Monitoring
RecoveryActionResults
Definitions
Probe, monitor and responder definitions initialized and logged when Health Manager worker process starts
Recovery Actions
Managed Availability logs all recovery actions to the crimson channel Microsoft-Exchange-ManagedAvailability/RecoveryActionResults
Event 500 indicates that a recovery action was started
Event 501 indicates that a recovery action was successful
Event 502 indicates that a recovery action was unsuccessful
Managed Availability – Recovery Actions
Useful properties for a Recovery Action event
Id - Action that was taken. Common values are RestartService, RecycleApplicationPool, ComponentOffline, or ServerFailover
State - Whether the action has started (event 500) or finished (event 501/502)
ResourceName - The object that was affected by the action. This will be the name of a service for RestartService actions, or the name of a server for server-level actions
EndTime - The time the action completed
Result - Whether the action succeeded or not
RequestorName - The name of the Responder that took the action
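To show how the event IDs and properties fit together, here is a small sketch that folds a chronological list of recovery action events into the latest known outcome per action and resource. The events are represented as plain dictionaries with made-up values; reading them from the actual event channel is outside the scope of this sketch, and the resource and responder names are hypothetical.

```python
from typing import Dict, List, Tuple

# Event ID meanings as listed on the slide.
EVENT_MEANING = {500: "started", 501: "succeeded", 502: "failed"}


def latest_outcomes(events: List[dict]) -> Dict[Tuple[str, str], str]:
    """Map (Id, ResourceName) to the most recent outcome.

    Assumes the events are supplied in chronological order.
    """
    outcomes: Dict[Tuple[str, str], str] = {}
    for event in events:
        key = (event["Id"], event["ResourceName"])
        outcomes[key] = EVENT_MEANING[event["EventId"]]
    return outcomes


# Hypothetical sample data: one restart that succeeds, one failover that fails.
events = [
    {"EventId": 500, "Id": "RestartService", "ResourceName": "ExampleService",
     "RequestorName": "ExampleRestartResponder"},
    {"EventId": 501, "Id": "RestartService", "ResourceName": "ExampleService",
     "RequestorName": "ExampleRestartResponder"},
    {"EventId": 500, "Id": "ServerFailover", "ResourceName": "EX1",
     "RequestorName": "ExampleFailoverResponder"},
    {"EventId": 502, "Id": "ServerFailover", "ResourceName": "EX1",
     "RequestorName": "ExampleFailoverResponder"},
]

outcomes = latest_outcomes(events)
for (action, resource), outcome in outcomes.items():
    print(f"{action} on {resource}: {outcome}")
```

An action that appears only with event 500 stays in the "started" state, which is a quick way to spot recovery actions that never reported a result.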
Summary
Exchange 2013 Managed Availability is…
Cloud Trained
Bring experience from the service to the enterprise
User Focused
Monitor end user experience
Recovery Oriented
Restore end user experience with recovery actions
Related content
OFC-B318 Microsoft Exchange Server 2013 SP1 High Availability and Site Resilience
OFC-B244 Microsoft Exchange Server 2013 SP1 Tips and Tricks
OFC-B248 Publishing Microsoft Exchange Server: Which TLA Should You Choose?
OFC-B321 Monitoring and Tuning Microsoft Exchange Server 2013 Performance
Resources
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
msdn
Resources for Developers
http://microsoft.com/msdn
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.