Date post: 17-Dec-2015

Managed Availability Made Easy

Jay Cotton, Microsoft Premier Field Engineer

Managed Availability

Managed Availability works by implementing Probes, Monitors and Responders:

• The Probe is the component that performs the simple test. It doesn’t care whether the test passes or fails; it simply performs the test.
• The Monitor consumes the results of the Probe and uses them to determine the Health state of the item being monitored.
• Depending on the Health state, one or more graduated Responder actions may be invoked.

All of this information is logged in the Crimson event channel, under either Active Monitoring or Managed Availability.
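The division of labor between the three components can be sketched in a few lines of Python. This is a conceptual model only, not Exchange code: the names `ProbeResult`, `run_probe`, `monitor` and `respond`, and the "more than two failures" rule, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    """Outcome of one probe run; the probe only records it."""
    succeeded: bool


def run_probe(test: Callable[[], bool]) -> ProbeResult:
    # The probe performs the test and reports what happened, nothing more.
    try:
        return ProbeResult(succeeded=bool(test()))
    except Exception:
        return ProbeResult(succeeded=False)


def monitor(results: List[ProbeResult], max_failures: int = 2) -> str:
    # Assumed rule: unhealthy once more than `max_failures` results failed.
    failures = sum(1 for r in results if not r.succeeded)
    return "Unhealthy" if failures > max_failures else "Healthy"


def respond(health_state: str) -> List[str]:
    # Responders are only invoked when the monitor reports a problem.
    return ["Restart"] if health_state == "Unhealthy" else []


results = [run_probe(lambda: False) for _ in range(3)]
state = monitor(results)
print(state, respond(state))  # prints: Unhealthy ['Restart']
```

Note that the probe never decides anything: the judgment lives in the monitor, and the action lives in the responder.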

Managed Availability (Quick Review)

Probes
• Measure the user’s perception of the service
• Typically synthetic user transactions (e.g., send a message via OWA)

Monitors
• Evaluate data collected by probes to determine if action needs to be taken
• Depending on the rule, a monitor can initiate a responder or escalate
• Define the time from failure at which a responder is executed

Responders
• Execute a response to an alert generated by a monitor

Responders

• Restart – Terminates and restarts a service
• Reset AppPool – Cycles an IIS application pool
• Failover – Initiates a database or server failover
• Bugcheck – Initiates a bugcheck of the server
• Offline – Takes a protocol on a machine out of service
• Online – Places a machine back into service
• Escalate – Escalates an issue to an admin
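Because a monitor defines how long after a failure each responder runs, the responders form a graduated chain. A minimal Python sketch of that idea follows; the `SCHEDULE` table and its delays are hypothetical values chosen for illustration, not Exchange defaults.

```python
from datetime import timedelta
from typing import List, Tuple

# Hypothetical schedule: (time since first failure, responder action).
# The delays are illustrative values, not Exchange defaults.
SCHEDULE: List[Tuple[timedelta, str]] = [
    (timedelta(minutes=0), "Restart"),
    (timedelta(minutes=5), "Failover"),
    (timedelta(minutes=15), "Escalate"),
]


def due_responders(unhealthy_for: timedelta) -> List[str]:
    """Return every action whose delay has elapsed, mildest first."""
    return [action for delay, action in SCHEDULE if unhealthy_for >= delay]


print(due_responders(timedelta(minutes=6)))  # prints: ['Restart', 'Failover']
```

The mildest action runs first, and escalation to an admin only happens if the item is still unhealthy after the automated recoveries have had their turn.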

Settings

• XML files in the $exinstall\bin\Monitoring\config folder are used to store configuration settings for some of the probe and monitor work items.

Management Tasks and Cmdlets

Extract or view system health
• Get-ServerHealth
• Get-HealthReport

View probes, monitors and responders for a health set
• Get-MonitoringItemIdentity

Details about probes, monitors, and responders
• Get-MonitoringItemHelp

Overrides

• Admins can alter the thresholds and parameters used by the probes, monitors and responders
  • Enables emergency actions
  • Enables fine-tuning of thresholds specific to the environment
• Can be deployed for specific servers or for the entire environment
  • Server-related overrides are stored in the registry
  • Global overrides are stored in Active Directory
• Can be set for a specified duration or to apply to a specific version of the server
• Are not implemented immediately
  • The Exchange Health Manager service reads configuration every 10 minutes
  • Global changes depend on Active Directory replication
• Wildcards are not supported
  • Cannot override an entire health set in one task
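The constraints above (no wildcards, no whole-health-set overrides, a fixed duration, and a delay of up to 10 minutes before a change is read) can be modeled in a short Python sketch. It is a conceptual model, not the cmdlets themselves: the helper names, the `HealthSet\ItemName` identity format, and the `ExampleHealthSet\ExampleMonitor` identity are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# From the slide: configuration is read every 10 minutes.
POLL_INTERVAL = timedelta(minutes=10)


def validate_identity(identity: str) -> None:
    """Reject identities the slide says are unsupported."""
    if "*" in identity or "?" in identity:
        raise ValueError("wildcards are not supported")
    # Assumed format: HealthSet\ItemName, so a bare health set is rejected.
    parts = identity.split("\\")
    if len(parts) < 2 or not all(parts):
        raise ValueError("name a specific item, not an entire health set")


def is_active(created: datetime, duration: timedelta, now: datetime) -> bool:
    """A duration-based override lapses once its duration has passed."""
    return now < created + duration


def latest_pickup(created: datetime) -> datetime:
    """Worst case for a server override: the next read is a full interval away."""
    return created + POLL_INTERVAL


created = datetime(2015, 12, 17, 9, 0)
validate_identity("ExampleHealthSet\\ExampleMonitor")  # placeholder identity
print(is_active(created, timedelta(days=60), datetime(2016, 1, 17, 9, 0)))
print(latest_pickup(created))
```

For global overrides the worst case is longer than `latest_pickup` suggests, since Active Directory replication has to complete before the next configuration read can see the change.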

Management Tasks and Cmdlets

Create an override
• Add-ServerMonitoringOverride
• Add-GlobalMonitoringOverride

View overrides
• Get-ServerMonitoringOverride
• Get-GlobalMonitoringOverride

Remove an override
• Remove-ServerMonitoringOverride
• Remove-GlobalMonitoringOverride

Event Logging

Managed Availability makes extensive use of the crimson channel event logs:

Microsoft-Exchange-ActiveMonitoring
• ProbeDefinition
• ProbeResult
• MonitorDefinition
• MonitorResult
• ResponderDefinition
• ResponderResult

Microsoft-Exchange-ManagedAvailability
• Monitoring
• RecoveryActionResults

Definitions

• Probe, monitor and responder definitions are initialized and logged when the Health Manager worker process starts

Managed Availability – Recovery Actions

Managed Availability logs all recovery actions to the crimson channel Microsoft.Exchange.ManagedAvailability/RecoveryActions:
• Event 500 indicates that a recovery action was started
• Event 501 indicates that a recovery action was successful
• Event 502 indicates that a recovery action was unsuccessful

Useful properties for Recovery Action events:
• Id - Action that was taken. Common values are RestartService, RecycleApplicationPool, ComponentOffline, or ServerFailover
• State - Whether the action has started (event 500) or finished (event 501/502)
• ResourceName - The object that was affected by the action. This will be the name of a service for RestartService actions, or the name of a server for server-level actions
• EndTime - The time the action completed
• Result - Whether the action succeeded or not
• RequestorName - The name of the Responder that took the action
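To show how events 500, 501 and 502 combine into a picture of what happened, here is a small Python sketch that reduces a list of recovery-action records to the latest known state per action and resource. The sample records are hand-written, the `EventId` key and all `Example...` names are placeholders, and the records are assumed to be in chronological order.

```python
from typing import Dict, List, Tuple

# Hand-written sample records using the property names from the slide.
# Real events would come from the crimson channel; these values are made up.
events: List[Dict[str, object]] = [
    {"EventId": 500, "Id": "RestartService", "ResourceName": "ExampleService",
     "RequestorName": "ExampleRestartResponder"},
    {"EventId": 501, "Id": "RestartService", "ResourceName": "ExampleService",
     "RequestorName": "ExampleRestartResponder"},
    {"EventId": 500, "Id": "RecycleApplicationPool", "ResourceName": "ExampleAppPool",
     "RequestorName": "ExampleRecycleResponder"},
    {"EventId": 502, "Id": "RecycleApplicationPool", "ResourceName": "ExampleAppPool",
     "RequestorName": "ExampleRecycleResponder"},
]

# Event numbers and their meaning, as listed on the slide.
STATE_BY_EVENT = {500: "started", 501: "succeeded", 502: "failed"}


def summarize(records: List[Dict[str, object]]) -> Dict[Tuple[str, str], str]:
    """Map (action, resource) to its latest state; later records win."""
    status: Dict[Tuple[str, str], str] = {}
    for record in records:
        state = STATE_BY_EVENT.get(record["EventId"])
        if state is None:
            continue  # not a recovery-action event number
        status[(str(record["Id"]), str(record["ResourceName"]))] = state
    return status


for (action, resource), state in summarize(events).items():
    print(f"{action} on {resource}: {state}")
```

An action that stays in the "started" state has a 500 event with no matching 501 or 502, which is usually the first thing worth investigating.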

How to Troubleshoot

1. Get-HealthReport (as a current review)
2. Start with MA Recovery Actions
3. Then look at responders
4. Review the monitors for those responders
5. Then dig into the probes for those monitors
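Steps 2 through 5 amount to walking a chain backwards: from the recovery action to the responder that requested it, then to the monitor behind that responder, then to the probe feeding that monitor. A minimal Python sketch of that walk follows. The three lookup tables, every `Example...` name, and the `monitor` and `probe` link fields are hypothetical placeholders, not real Exchange properties.

```python
from typing import Dict, List

# Hypothetical definition tables; the names and the "monitor"/"probe" link
# fields are placeholders for illustration, not real Exchange properties.
responders: Dict[str, Dict[str, str]] = {
    "ExampleRestartResponder": {"monitor": "ExampleMonitor"},
}
monitors: Dict[str, Dict[str, str]] = {
    "ExampleMonitor": {"probe": "ExampleProbe"},
}
probes: Dict[str, Dict[str, str]] = {
    "ExampleProbe": {"last_error": "simulated timeout"},
}


def trace(requestor_name: str) -> List[str]:
    """Follow recovery action -> responder -> monitor -> probe."""
    monitor_name = responders[requestor_name]["monitor"]
    probe_name = monitors[monitor_name]["probe"]
    return [requestor_name, monitor_name, probe_name]


chain = trace("ExampleRestartResponder")
print(" -> ".join(chain))
print(probes[chain[-1]]["last_error"])
```

The starting point is the RequestorName recorded on the recovery action event, which names the responder that took the action.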

Troubleshooting Managed Availability

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

