\Web Service(Default Web Site)\Current Connections
Slide 8
\MSExchange Active Manager(_total)\Database Mounted
Slide 9
Slide 10
Slide 11
Slide 12
Slide 13
Large Organization
Configuration
  36 cores / 450 GB RAM per server
  Higher mailbox density
  Deployed Exchange 2013 in All-In-One configuration
  Hardware NLB configured for Least Connections
What Happened?
  Policy change required removal of local storage of email
  Outlook now required to run in Online Mode
Impact
  Increase in network traffic
  Users frequently disconnected during peak periods
  ~2 weeks to isolate problem
  ~2 weeks to get remediation changes in place
IIS (Default Web Site)
  Location: inetpub\logs\LogFiles\W3SVC1
  File names: u_exXXXXXX.log, httperrXXXXX.log
  Perfmon counter: \Web Service(Default Web Site)\Current Connections
RpcHttp (W3SVC1)
  Location: Logging\RpcHttp\W3SVC1
  File names: RpcHttpXXXXXXXX-X.log
  Perfmon counter: \RPC/HTTP Proxy\Current Number of Incoming RPC over HTTP Connections
HttpProxy
  Location: Logging\HttpProxy\RpcHttp
  File names: HttpProxyXXXXXXXXXX-X.log
  Perfmon counter: \MSExchange HttpProxy\Accepted Connection Count
IIS (Exchange Back End)
  Location: inetpub\logs\LogFiles\W3SVC2
  File names: u_exXXXXXX.log, httperrXXXXX.log
  Perfmon counter: \Web Service(Exchange Back End)\Current Connections
RpcHttp (W3SVC2)
  Location: Logging\RpcHttp\W3SVC2
  File names: RpcHttpXXXXXXXX-X.log
  Perfmon counter: \RPC/HTTP Proxy\Current Number of Incoming RPC over HTTP Connections
RPC Client Access
  Location: Logging\RPC Client Access
  File names: RCA_XXXXXXXXXX-X.log
  Perfmon counter: \MSExchange RpcClientAccess\Current Connections
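The IIS u_ex logs listed above are W3C extended logs, so a connection burst can be spotted by bucketing requests per minute. The following is a minimal sketch under that assumption; the function name `requests_per_minute` and the sample log lines are hypothetical and not taken from the case study.

```python
from collections import Counter

def requests_per_minute(lines):
    """Count requests per (date, HH:MM) bucket in W3C extended log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # skip other directives and rows seen before #Fields
        row = dict(zip(fields, line.split()))
        if "date" in row and "time" in row:
            counts[(row["date"], row["time"][:5])] += 1
    return counts

# Hypothetical sample rows for illustration only.
sample = [
    "#Version: 1.0",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2015-05-04 09:00:01 RPC_IN_DATA /rpc/rpcproxy.dll 200",
    "2015-05-04 09:00:42 RPC_OUT_DATA /rpc/rpcproxy.dll 200",
    "2015-05-04 09:01:03 RPC_IN_DATA /rpc/rpcproxy.dll 401",
]
counts = requests_per_minute(sample)
for bucket, n in sorted(counts.items()):
    print(bucket, n)
```

In practice the lines would come from reading a u_ex file in the W3SVC1 or W3SVC2 folder; comparing the busiest minutes across servers shows whether one server is absorbing most new requests.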
Slide 21
Network, CPU, Memory, Storage
Slide 22
Network (Requests)
  \Web Service(Default Web Site)\Current Connections
  \MSExchangeIS Store(*)\RPC Average Latency: < 100 ms
  \MSExchangeIS Client Type(*)\RPC Average Latency: < 100 ms
  \MSExchangeIS Store(*)\RPC Operations/sec
  \MSExchangeIS Client Type(*)\RPC Operations/sec
  Overall RPC Average Latency is not impacted
CAS Experience
  MoMT
    \MSExchange RpcClientAccess\RPC Averaged Latency
    \MSExchange RpcClientAccess\RPC Operations/sec
  EAS
    \MSExchange ActiveSync\Requests/sec
    \MSExchange ActiveSync\Current Requests
  EWS
    \MSExchangeWS\Average Response Time
    \MSExchangeWS\Requests/sec
  OWA
    \MSExchange OWA\Average Response Time
    \MSExchange OWA\Average Search Time
    \MSExchange OWA\Requests/sec
  POP
    \MSExchangePop3(*)\Average LDAP Latency
    \MSExchangePop3(*)\Average RPC Latency
    \MSExchangePop3(*)\Request Rate
  IMAP
    \MSExchangeImap4(*)\Average LDAP Latency
    \MSExchangeImap4(*)\Average RPC Latency
    \MSExchangeImap4(*)\Request Rate
Management / Background Ops
  PS
    \MSExchangeRemotePowershell\Current Connection Sessions
    \MSExchangeRemotePowershell\Current Connected Unique Users
Slide 23
Memory (Exchange Process Usage)
  \Memory\% Committed Bytes in Use: < 80%
  \Memory\Available MBytes: > 5% of RAM
  \.NET CLR Memory(*)\% Time in GC: should be below 10% on average
  \.NET CLR Exceptions(*)\# of Exceps Thrown / sec: should be less than 5% of total requests per second (RPS), that is, \Web Service(_Total)\Connection Attempts/sec * 0.05
  \.NET CLR Memory(*)\# Bytes in all Heaps: only 30% bytes committed memory (WorkstationGC to ServerGC)
  \.NET CLR Memory\Allocated Bytes/sec: sustained > 50 MB
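Two of the memory guidelines above are simple ratios, so they can be turned into concrete numbers for a given server. This is a small sketch of that arithmetic; the function names and the sample request and exception rates are hypothetical, while the 5% factors and the 450 GB figure come from the slides.

```python
def exception_rate_limit(connection_attempts_per_sec, fraction=0.05):
    """Exceptions/sec ceiling: 5% of requests per second, per the slide."""
    return connection_attempts_per_sec * fraction

def available_memory_floor_mb(total_ram_gb, fraction=0.05):
    """Minimum Available MBytes: 5% of installed RAM, per the slide."""
    return total_ram_gb * 1024 * fraction

# Hypothetical sample: 400 connection attempts/sec and 12 exceptions/sec.
limit = exception_rate_limit(400)
print(f"exception limit: {limit:.1f}/sec, within limit: {12 < limit}")

# 450 GB is the per-server RAM from the first case study.
floor_mb = available_memory_floor_mb(450)
print(f"available memory floor: {floor_mb:.0f} MB")
```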
Slide 24
Storage (Exchange I/O)
  \MSExchange Active Manager(_total)\Database Mounted: balanced across all MBX servers
  \MSExchange Database ==> Instances(*)\I/O Database Reads (Attached) Average Latency: < 20 ms
  \MSExchange Database ==> Instances(*)\I/O Database Writes (Attached) Average Latency: < 50 ms
  \MSExchange Database ==> Instances(*)\I/O Log Writes Average Latency: < 10 ms
  \MSExchange Database ==> Instances(*)\I/O Database Reads (Recovery) Average Latency: < 200 ms
  \MSExchange Database ==> Instances(*)\I/O Database Writes (Recovery) Average Latency: < read latency for the same instance
  As above, I/O is acceptable
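The storage thresholds above lend themselves to a mechanical check per database instance. The sketch below assumes the counter values have already been collected into a dictionary; the function name `storage_violations` and the sample latencies are hypothetical, and the recovery-write rule reflects one reading of the slide (recovery writes should be faster than recovery reads on the same instance).

```python
# Upper limits in milliseconds, as listed on the slide.
LIMITS_MS = {
    "I/O Database Reads (Attached) Average Latency": 20,
    "I/O Database Writes (Attached) Average Latency": 50,
    "I/O Log Writes Average Latency": 10,
    "I/O Database Reads (Recovery) Average Latency": 200,
}
REC_READS = "I/O Database Reads (Recovery) Average Latency"
REC_WRITES = "I/O Database Writes (Recovery) Average Latency"

def storage_violations(latencies_ms):
    """Return (counter, value, limit) tuples for values at or over their limit."""
    violations = []
    for counter, limit in LIMITS_MS.items():
        value = latencies_ms.get(counter)
        if value is not None and value >= limit:
            violations.append((counter, value, limit))
    # Assumption: recovery writes are compared with recovery reads
    # on the same instance rather than with a fixed number.
    reads = latencies_ms.get(REC_READS)
    writes = latencies_ms.get(REC_WRITES)
    if reads is not None and writes is not None and writes >= reads:
        violations.append((REC_WRITES, writes, reads))
    return violations

# Hypothetical values for one database instance.
sample = {
    "I/O Database Reads (Attached) Average Latency": 24.0,
    "I/O Database Writes (Attached) Average Latency": 31.0,
    "I/O Log Writes Average Latency": 4.0,
    REC_READS: 150.0,
    REC_WRITES: 90.0,
}
violations = storage_violations(sample)
for counter, value, limit in violations:
    print(f"{counter}: {value} ms (limit {limit} ms)")
```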
Slide 25
CPU (Exchange Processes)
  \Processor(_Total)\% Processor Time: should be less than 75% on average
  \Processor(_Total)\% Privileged Time (kernel): should be less than 75% on average
  \Processor(_Total)\% User Time: should be less than 75% on average
  \Process(*)\% Processor Time
  \System\Processor Queue Length (all instances): shouldn't be greater than 5 per processor
  w3wp#3 shows high CPU; w3wp#3 is the MSExchangeRpcProxyFrontEndAppPool

Investigation
  ~4 weeks
  Preferred architecture not followed
  Customer scaled beyond tested configuration
  NLB algorithm not optimized for Exchange load profile
Resolution
  Least Connection / Slow Start on hardware LB
  Reduced cores < 20
  Scalability improvements coming: .NET 4.6 (in preview)

Large number of connections to server in short timeframe
RpcProxy FrontEnd AppPool requests backlogged
Managed Availability probe fails
Managed Availability restarts service
Network load balancer takes server out of rotation
Network load balancer adds server to rotation
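The cycle above starts when a server that has just rejoined the pool receives a flood of new connections. The sketch below illustrates why plain Least Connections produces that flood and how a slow-start cap dampens it. It is a deliberately simplified model: real load balancers implement slow start in vendor-specific ways, and the server names, connection counts, and the `max_share` parameter are all hypothetical.

```python
def assign_least_connections(conns, new_connections):
    """Send each new connection to the server with the fewest connections."""
    conns = dict(conns)
    received = {name: 0 for name in conns}
    for _ in range(new_connections):
        target = min(conns, key=conns.get)
        conns[target] += 1
        received[target] += 1
    return received

def assign_with_slow_start(conns, new_connections, recovering, max_share=0.25):
    """Least connections, but cap the recovering server's share of new traffic."""
    conns = dict(conns)
    received = {name: 0 for name in conns}
    for i in range(1, new_connections + 1):
        candidates = conns
        # Skip the recovering server if one more connection would exceed its cap.
        if received[recovering] + 1 > max_share * i:
            candidates = {n: c for n, c in conns.items() if n != recovering}
        target = min(candidates, key=candidates.get)
        conns[target] += 1
        received[target] += 1
    return received

# Hypothetical pool: cas3 has just been added back to rotation with no clients.
pool = {"cas1": 5000, "cas2": 5000, "cas3": 0}
plain = assign_least_connections(pool, 1000)
slow = assign_with_slow_start(pool, 1000, recovering="cas3")
print("least connections:", plain)
print("with slow start:  ", slow)
```

Under plain Least Connections every one of the 1000 new connections lands on the empty server, which matches the "large number of connections to server in short timeframe" step; with the cap, the recovering server takes only its configured share while it warms up.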
Slide 29
Large Organization
Configuration
  16 cores / 92 GB RAM per server
  Deployed Exchange 2013 in All-In-One configuration
  NLB configured for Round Robin
What Happened?
  File writes failing, MA probe failures, MDB failovers
  Encountered bug with anti-virus
  Failed to deploy recommended fixes prior to migration
  Exposed new bug
Impact
  Users frequently disconnected during peak periods
  ~8 weeks to isolate problem
  ~3 weeks to get fix and configuration changes in place
Slide 30
[Diagram: client request path and I/O stack]
  Request path: IIS, RpcHttp, HttpProxy, IIS, RpcHttp, RPC Client Access, Store Worker
  I/O stack: I/O Manager, File System Driver, Anti-Virus Filter Driver, Device Driver, Mini-Port Driver, MBxDB
  Anti-Virus Filter Driver: Is valid file to scan?
  Stalled I/O delays client responses (dump showed a 6-minute lock)
  Continued delayed or stalled I/O forces MA to move databases
Slide 31
Goals
  Bring Office 365 capabilities on-premises
  Monitor based upon end user experience
  Focus on recovery oriented computing
Components
  Probes test components and user experience
  Monitors analyze probe(s) for Pass/Fail
  Responders take action based upon monitor results
When troubleshooting
  Monitor failures are a signal of a problem
  Consistent failures can force a bluescreen
Monitors: Services, Performance Counters, Event Logs
Probes: OutlookProxyTestProbe, OutlookRpcSelfTestProbe, OutlookRpcCtpProbe
Responders: Restart, Reset AppPool, Failover MBX, BugCheck, Offline, Escalate
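The probe, monitor, and responder roles described above can be pictured as a small loop: probes yield pass/fail results, a monitor decides when failures are persistent, and responders escalate. The sketch below is an illustration only, not how Managed Availability is implemented; the failure threshold, the responder order, and the function name `run_recovery` are assumptions.

```python
# Hypothetical escalation order, using responder names from the slide.
RESPONDER_CHAIN = ["Restart", "Reset AppPool", "BugCheck", "Escalate"]

def run_recovery(probe_results, failures_needed=3, chain=RESPONDER_CHAIN):
    """Return the responder actions triggered by a sequence of probe results.

    probe_results: iterable of booleans (True means the probe passed).
    A monitor trips after `failures_needed` consecutive failures; each
    further trip without an intervening pass moves one step up the chain.
    """
    actions = []
    consecutive = 0
    step = 0
    for passed in probe_results:
        if passed:
            consecutive = 0
            step = 0  # a healthy result resets the escalation
            continue
        consecutive += 1
        if consecutive >= failures_needed:
            actions.append(chain[min(step, len(chain) - 1)])
            step += 1
            consecutive = 0
    return actions

# Hypothetical probe history: six failures in a row, a pass, then one failure.
history = [True, False, False, False, False, False, False, True, False]
actions = run_recovery(history)
print(actions)
```

This also shows why consistent failures matter when troubleshooting: each additional run of failed probes pushes the response further along the chain, toward the more disruptive responders.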
Slide 32
Storage: some database I/O latencies, but overall I/O is fairly healthy.
Slide 33
CPU: w3wp#11 CPU utilization running hot? The server appears to be busy, but it is uncertain whether this is normal or a bug.
Slide 34
Memory: massive growth in the memory footprint of the w3wp#11 process throughout the day.
Private Bytes reached 10 GB+ before restarting.
W3WP process ID = 62192
Slide 35
AppDomain
  Used to enable isolation within a process
  3 AppDomains by default
  Normal W3WP for Exchange has 3-4 AppDomains
  Created as a result of config change
Exchange
  Leak in W3SVC/1 = MSExchangeRpcProxyFrontEndAppPool
Process Explorer
  View AppDomains and other .NET stats for running processes
Slide 36
Outlook Anywhere
  Servicelets are used by Exchange for minor tasks
  RPCHTTPServicelet runs every 15 minutes
  RPCHTTPServicelet was writing an update to the Default Web Site/Rpc site, from SSL to None, on every run
  What was causing this change to be applied continually?
Slide 37
[Diagram: AppDomain growth in the RPC app pool]
  Processes: MSExchangeRPCAppPool, MSExchange Services Host
  AppDomains: System AppDomain, Default AppDomain, Front-End AppDomain, Back-End AppDomain
  AppDomain contents: binaries, config, heaps (~125 MB at startup), connections
  Every 15 min: Set SSLOffloading = true
  Front-End AppDomain instances shown with +100 users, +50 users, +60 users, +200 users
  Downstream: RPC Client Access, Store Worker instance, MBxDB
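The figures above (a configuration write every 15 minutes, ~125 MB per AppDomain at startup) give a rough feel for how quickly the process can grow. The sketch below works through that arithmetic under a strong assumption that is not stated in the slides: every run leaves behind exactly one extra AppDomain that keeps its full startup footprint. The function names are hypothetical.

```python
import math

RUN_INTERVAL_MIN = 15   # RPCHTTPServicelet interval from the slide
APPDOMAIN_MB = 125      # approximate AppDomain size at startup from the slide

def leaked_mb(hours):
    """Memory held by leftover AppDomains after `hours`, under the assumption above."""
    runs = int(hours * 60 // RUN_INTERVAL_MIN)
    return runs * APPDOMAIN_MB

def hours_until(limit_mb):
    """Hours until the leftover AppDomains alone reach `limit_mb`."""
    runs = math.ceil(limit_mb / APPDOMAIN_MB)
    return runs * RUN_INTERVAL_MIN / 60

print(f"after 24 h: {leaked_mb(24)} MB")
print(f"hours to reach 10 GB: {hours_until(10 * 1024)}")
```

Under this assumption the leftover AppDomains alone would pass 10 GB in under a day, which is at least consistent with the observation that Private Bytes reached 10 GB+ before the process restarted; actual growth also depends on the connections each AppDomain holds.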
Slide 38
Investigation
  ~10 weeks of investigation
  Many iterations of data collected and analyzed (data collection, analysis)
Deployment Guidance Missteps
  NLB configuration set to Round Robin
  Most recent CU update + hotfixes not deployed
Resolution
  NLB configuration changed to Slow Start
  Most recent CU update + hotfixes installed
  Interim configuration change until KB2925281 hotfix release
  Final fix in Exchange 2013 Service Pack 1
Slide 39
Slide 40
BRK3131: Exchange Design Concepts and Best Practices
BRK3197: Exchange Server Preferred Architecture
BRK3178: Exchange on IaaS: Concerns, Tradeoffs, and Best Practices
BRK3173: Experts Unplugged: Exchange Server Deployment and Architecture
BRK3158: Experts Unplugged: Exchange Top Issues
BRK3129: Deploying Exchange Server 2016
BRK3102: Experts Unplugged: Exchange Server High Availability and Site Resilience