©2002 BMC Software, Inc. All Rights Reserved.
Server Consolidation
Eric D. Ho
Advisory Software Consultant
BMC Software, Inc.
March 20, 2002
2
Objective
This presentation shows a methodology for conducting a server consolidation study using PATROL Perform and Predict.
3
Server Consolidation Advantages
Reduced total cost of ownership
Lower license cost for software
Improved system manageability
  System backup and recovery
  Software distribution
  Reduced manpower requirements
Improved infrastructure technology
  Replace older servers with newer technology
  Faster processors will reduce workload response times and memory requirements
4
Server Consolidation Challenges
Need to:
  Measure current usage accurately
  Capture the configuration details
  Characterize workloads
  Understand usage patterns
  Predict the consolidation effect
  Evaluate alternatives
  Project growth
5
Server Consolidation Risks
You don’t know where to start!
  Methodology, process, tools
Wrong size: it does not fit!
  Oops! Career change?
It fits! But performance stinks
  Buy more... spend more!
It will take a long time!
  By the time you are done, the solution is obsolete.
6
Server Consolidation Methodology
Six steps to success
1. Baseline current performance
   Collect detailed performance data
2. Characterize workload
   System, utility, application, database, etc.
3. Analyze resource usage level
   Time series graphical analysis
   Peaks, batch windows, trends, growth pattern
   Workload profiles
4. Combine systems for sizing
   Server and workload stacking
5. Consolidation modeling
   Resource contention analysis
   Response time degradation analysis
   Growth sensitivity analysis
6. Validate recommendation
7
1 - Baseline Current Performance
Collect detailed performance metrics
  System, I/O, memory, process, user
  24x7
  Data collected every 10 seconds
  Data logged every 15 minutes
    96 data records per day per server
Servers: as10, db02, db14, db15, and db25
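The record counts on this slide follow from the two intervals. A minimal sketch of the arithmetic, assuming each logged record summarizes the samples collected during its 15-minute window (the slide does not say how samples are aggregated); the constant names are illustrative only:

```python
# Sampling arithmetic for the collection settings on this slide.
SAMPLE_INTERVAL_S = 10   # data collected every 10 seconds
LOG_INTERVAL_MIN = 15    # data logged every 15 minutes
SERVERS = ["as10", "db02", "db14", "db15", "db25"]

# Logged records per server per day.
records_per_day = 24 * 60 // LOG_INTERVAL_MIN

# Samples falling inside one logging window (assumed to be summarized
# into a single record).
samples_per_record = LOG_INTERVAL_MIN * 60 // SAMPLE_INTERVAL_S

# Records for a one-week study across all five servers.
records_per_week_all = records_per_day * 7 * len(SERVERS)

print(records_per_day)       # 96
print(samples_per_record)    # 90
print(records_per_week_all)  # 3360
```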
8
PATROL Data Collection
[Diagram: PATROL data collection]
Components shown: PATROL Agent, OS KM, Perform Agent, PATROL Collector
Proactive monitoring
  Thresholds
  Status/alerts
  Detect problems
  Recovery actions
  Goals: availability, problem determination, problem resolution
Proactive planning
  Performance check
  Workload analysis
  Bottleneck analysis
  Performance reporting
  Predictive analysis
  Capacity planning
PATROL Perform & Predict Architecture
[Diagram: PATROL Perform & Predict components and data stores]
Data sources: kernel data, PATROL History File, PATROL Collect (UNIX, NT)
Analyze (UNIX or NT): performance analysis
Predict: predictive analysis
Investigate: UNIX and NT real-time analysis
Visualizer (Windows-based): graphical analysis, Visualizer reports, Visualizer database
Files: daily performance files, performance model, performance results
Ports shown: TCP 6767, 6768; TCP 10128
Sample chart: Node CPU Utilization by Priority (High, Medium, Low), AS/400 node COMPANYA on 4/24/96, % Proc from 10 AM to 3 PM
10
Node CPU Utilization, DistSys Node <ALL>, on 03/12/2002 - 03/18/2002
[Chart: calendar view (Sunday through Saturday) of daily CPU profiles, 12:00 AM to 11:00 PM. Y axis: % Total, scale to 1500. Series: as10, db02, db14, db15, db25]
11
Node CPU Utilization, DistSys Node <ALL>, on 03/12/2002-03/18/2002
[Chart: X axis: date and time of day, 03/12/2002 through 03/18/2002. Y axis: % Total, 0 to 2500. Series: db25, db15, db14, db02, as10]
12
2 - Workload Characterization
Logical grouping
  Who (users)
  What (processes)
  Where (servers)
Dynamic, post data collection
Business perspective
  Application
  Business unit/budget
  Geographic
[Diagram: systems, users, and transactions grouped into workloads]
13
Sample Workloads
Characterize workloads
  Oracle (1 process = 1 transaction)
  axciom (Oracle Instance MEMPWD)
  f45 (Oracle Financials Form 45)
  f60 (Oracle Financials Form 60)
  ar25run
  RGRARG
  PMSERVER
  GL
  tools (BMC, HP, etc.)
  system
  zzz (the rest of the processes)
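The grouping of processes into named workloads, with zzz as the catch-all, can be sketched as an ordered rule list. This is only an illustration of the idea, not how PATROL Perform defines workloads; the process-name patterns and sample process names below are hypothetical.

```python
from fnmatch import fnmatchcase

# Ordered (workload, pattern) rules; the first match wins.
# All patterns are hypothetical examples, not taken from the study.
WORKLOAD_RULES = [
    ("axciom", "oracleMEMPWD*"),  # one specific Oracle instance
    ("oracle", "oracle*"),        # every other Oracle process
    ("f45", "f45*"),
    ("f60", "f60*"),
    ("tools", "Patrol*"),
]
CATCH_ALL = "zzz"  # the rest of the processes


def classify(process_name: str) -> str:
    """Return the workload name for a process, or the catch-all."""
    for workload, pattern in WORKLOAD_RULES:
        if fnmatchcase(process_name, pattern):
            return workload
    return CATCH_ALL


for name in ["oracleMEMPWD", "oraclePROD", "f60runm", "sendmail"]:
    print(name, "->", classify(name))
```

Rule order matters: the more specific axciom pattern has to come before the general oracle pattern, otherwise the instance would be absorbed into the broader workload.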
14
3 - Analyze Resource Usage Level
(A) Time Series Graphical Analysis
  Peaks
  Batch windows
  Trends
  Growth pattern
(B) Workload Analysis
15
Top 10 Node CPU Utilizations on 03/18/2002
[Chart: X axis: time of day, 12:00 AM to 11:00 PM. Y axis: % Total, 0% to 2000%. Series: as10, db02, db14, db15, db25]
Utilization shown is total utilization
Number of processors: 48
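The "% Total" scale counts 100% per processor, so a reading above 100% means more than one processor is busy. A small sketch of how the 48-processor total can be reproduced, assuming the "/NN" suffix in each server model on the workload analysis slides is the processor count (the deck confirms this only for as10 and db02); the function name is illustrative:

```python
# Processor counts read from the "/NN" model suffixes on the workload
# analysis slides (an interpretation; confirmed in the deck only for
# as10 = 6 and db02 = 20).
PROCESSORS = {"as10": 6, "db02": 20, "db14": 8, "db15": 8, "db25": 6}

total_processors = sum(PROCESSORS.values())
full_scale_pct = 100 * total_processors  # every processor 100% busy


def fraction_of_capacity(total_pct: float, processors: int) -> float:
    """Convert a '% Total' reading into a 0..1 share of node capacity."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return total_pct / (100.0 * processors)


print(total_processors)                  # 48
print(full_scale_pct)                    # 4800
print(fraction_of_capacity(1000.0, 20))  # 0.5
```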
16
Node I/O Rate, DistSys Node <ALL>, on 03/18/2002
[Chart: X axis: time of day, 12 AM to 10 PM. Y axis: IO/Sec, 0 to 25000. Series: as10, db02, db14, db15, db25]
17
Node Paging Rate, DistSys Node <ALL>, on 03/18/2002
[Chart: X axis: time of day, 12 AM to 10 PM. Y axis: Pg/Sec, 0 to 100. Series: as10, db02, db14, db15, db25]
18
Top 10 Disk Utilizations, Overall, on 03/18/2002
[Chart: X axis: time of day, 12:00 AM to 11:00 PM. Y axis: utilization, 0% to 100%, with per-disk data labels. Series: db02 c11t1d5, db02 c11t6d2, db02 c19t0d0, db02 c19t1d3, db02 c22t1d0, db02 c25t1d1, db15 sd5323, db15 sd5324, db15 sd6853, db15 sd6854]
19
3 - Analyze Resource Usage Level
(A) Time Series Graphical Analysis
  Peaks
  Batch windows
  Trends
  Growth pattern
(B) Workload Analysis
20
Workload Analysis - as10
as10
  HP N4000/06, 440 MHz
  14 GB memory
  Peak utilizations from
    9am-10pm
    1pm-2pm
Major workloads
  dis4ws
  f45
  f60
[Chart: Workload CPU Utilization, DistSys Node as10, Workload <ALL>, on 03/18/2002. X axis: time of day. Y axis: % Total, 0 to 150. Series: zzzTCP_UDP, zzz, tools, system, rw-procs, oracle, f60, f45, dis4ws]
21
Workload Analysis - db02
db02
  HP V2500/20, 440 MHz
  12 GB memory
  Peak utilization: 7am-7pm
Major workloads
  Oracle
  RGRARG
[Chart: Workload CPU Utilization, DistSys Node db02, Workload <ALL>, on 03/18/2002. X axis: time of day. Y axis: % Total, 0 to 1500. Series: f60, zzzTCP_UDP, GL, tools, ar25run, system, zzz, RGRARG, oracle]
22
Workload Analysis - db14
db14
  Sun F6800/08, 750 MHz
  16 GB memory
  Peak utilization from 6pm-12am
Major workload
  Oracle
[Chart: Workload CPU Utilization, DistSys Node db14, Workload <ALL>, on 03/18/2002. X axis: time of day. Y axis: % Total, 0 to 500. Series: zzzTCP_UDP, tools, pmserver, system, zzz, oracle]
23
Workload Analysis - db15
db15
  Sun F6800/08, 750 MHz
  16 GB memory
  Peak utilization from 1pm-7pm
Major workload
  Oracle-Axciom
[Chart: Workload CPU Utilization, DistSys Node db15, Workload <ALL>, on 03/18/2002. X axis: time of day. Y axis: % Total, 0 to 600. Series: zzzTCP_UDP, pmserver, rw-procs, tools, system, zzz, oracle, axciom]
24
Workload Analysis - db25
db25
  Sun E4500/06, 440 MHz
  6 GB memory
  Peak utilization: 5pm-10pm
Major workload
  Oracle
[Chart: Workload CPU Utilization, DistSys Node db25, Workload <ALL>, on 03/18/2002. X axis: time of day. Y axis: % Total, 0 to 150. Series: zzzTCP_UDP, tools, zzz, system, oracle]
25
4 - Combine Systems for Sizing
Server stacking
  Combined as10 and db02 into 1 server
    Change db02 from HP to Sun F6800
  Combined 2 database servers (db14 and db25) into 1 server
Workload stacking
  Stack up all Oracle instances
Check total capacity requirement
  Use graphical visualization for quick check!
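Stacking amounts to adding the servers' utilization series interval by interval and reading off the combined peak. The sketch below assumes hourly "% Total" values and made-up sample data; it ignores differences in processor speed between the source and target machines, which the modeling step addresses, and the `headroom` parameter is an assumption rather than part of the methodology.

```python
import math


def stack(series_by_server: dict[str, list[float]]) -> list[float]:
    """Sum '% Total' CPU series element-wise across servers."""
    lengths = {len(s) for s in series_by_server.values()}
    if len(lengths) != 1:
        raise ValueError("all series must cover the same intervals")
    return [sum(values) for values in zip(*series_by_server.values())]


def processors_needed(stacked: list[float], headroom: float = 0.0) -> int:
    """Processors required for the stacked peak at 100% per processor."""
    peak = max(stacked)
    return math.ceil(peak * (1.0 + headroom) / 100.0)


# Hypothetical four-interval sample, not measured data.
sample = {
    "as10": [50.0, 120.0, 100.0, 60.0],
    "db02": [400.0, 1050.0, 1100.0, 500.0],
}
stacked = stack(sample)
print(stacked)                           # [450.0, 1170.0, 1200.0, 560.0]
print(processors_needed(stacked))        # 12
print(processors_needed(stacked, 0.25))  # 15
```

Summing interval by interval matters because the servers peak at different times of day; adding the individual peaks would overstate the requirement.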
26
Server Stacking
Stacked as10 and db02 servers
Total CPU requirement is about 1200%
12 processors needed?
[Chart: Workload CPU Utilization, DistSys Node [ORAFIN], Workload <ALL>, on 03/18/2002. X axis: time of day. Y axis: % Total, 0 to 1500. Series: oracle@as10, f60@db02, zzzTCP_UDP@as10, zzzTCP_UDP@db02, tools@as10, GL@db02, rw-procs@as10, f60@as10, system@as10, tools@db02, f45@as10, ar25run@db02, zzz@as10, dis4ws@as10, system@db02, zzz@db02, RGRARG@db02, oracle@db02]
27
Server Stacking
Stacked 3 database servers into 1
Total CPU requirement is less than 1000% (10 processors)
I/O issue?
Paging issue?
[Chart: Workload CPU Utilization, DistSys Node [DB], Workload <ALL>, on 03/18/2002. X axis: time of day. Y axis: % Total, 0 to 1000. Series: zzzTCP_UDP@db14, zzzTCP_UDP@db15, zzzTCP_UDP@db25, pmserver@db15, tools@db14, rw-procs@db15, tools@db15, tools@db25, pmserver@db14, zzz@db25, system@db25, system@db14, system@db15, zzz@db14, zzz@db15, oracle@db25, oracle@db15, axciom@db15, oracle@db14]
28
Workload Stacking
Stacked all Oracle workloads into 1 server
Total CPU requirement is slightly over 800% (on 8 processors)
[Chart: Workload CPU Utilization, DistSys Node <ALL>, Workload [DATABASE], on 03/12/2002-03/18/2002. X axis: date and time of day, 03/12/2002 through 03/18/2002. Y axis: % Total, 0 to 1000. Series: oracle@db14, axciom@db15, oracle@db25, oracle@db15]
29
5 - Consolidation Modeling
Resource contention analysis
  Combined as10 and db02 into 1 server
    Change db02 from HP to Sun F6800/16 @ 750 MHz
  Consolidate workloads from db14 and db25 into db14
Response time degradation analysis
Growth sensitivity analysis
Use 3/18/2002 14:00 to 15:00 as baseline interval
Let’s see how PATROL Predict works…
30
Baseline Model: Mar-18-2002, 14:00
Prepare the baseline model
  Build a model for all nodes at peak utilization
  Calibrate the models so that calculated values match measured values
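One common way to check a model against measurements is the utilization law from queueing theory: utilization equals throughput times service demand. The sketch below is a generic illustration of such a calibration check, not PATROL Predict's actual algorithm; the throughput, CPU demand, measured value, and 5% tolerance are all hypothetical.

```python
def calculated_utilization_pct(throughput_per_s: float,
                               cpu_seconds_per_txn: float) -> float:
    """Utilization law: U = X * D, expressed on the '% Total' scale."""
    return throughput_per_s * cpu_seconds_per_txn * 100.0


def is_calibrated(measured_pct: float, calculated_pct: float,
                  tolerance: float = 0.05) -> bool:
    """True when the model is within the relative tolerance of measurement."""
    if measured_pct <= 0:
        raise ValueError("measured utilization must be positive")
    return abs(calculated_pct - measured_pct) / measured_pct <= tolerance


# Hypothetical workload: 20 transactions/s at 0.5 CPU-seconds each
# keeps 10 processors busy, i.e. 1000% on the '% Total' scale.
calc = calculated_utilization_pct(20.0, 0.5)
print(calc)                         # 1000.0
print(is_calibrated(980.0, calc))   # True
print(is_calibrated(800.0, calc))   # False
```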
31
Baseline Analysis - Response Time
Note: Response time corresponds to transaction turnaround time.
Relative response time is set to 1 for the baseline. Any “what-if” scenario changes the relative response time to reflect improvement or degradation.
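Relative response time is simply the scenario's response time divided by the baseline's, so the baseline itself is 1. A minimal sketch with hypothetical response times in seconds; the helper names are illustrative:

```python
def relative_response_time(scenario_s: float, baseline_s: float) -> float:
    """Scenario response time as a multiple of the baseline (baseline = 1)."""
    if baseline_s <= 0:
        raise ValueError("baseline response time must be positive")
    return scenario_s / baseline_s


def describe(relative: float) -> str:
    """Turn a relative response time into a plain-language change."""
    change_pct = (relative - 1.0) * 100.0
    if abs(change_pct) < 1e-9:
        return "unchanged"
    direction = "slower" if change_pct > 0 else "faster"
    return f"{abs(change_pct):.0f}% {direction}"


# Hypothetical response times, in seconds.
print(describe(relative_response_time(2.0, 2.0)))   # unchanged
print(describe(relative_response_time(2.54, 2.0)))  # 27% slower
print(describe(0.90))                               # 10% faster
```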
32
Baseline Analysis - Utilization
Note: This report shows the current workload breakdown of as10 and db02.
We would “move” the application workloads from as10 to db02 as part of the server consolidation.
33
Baseline Analysis
Note: This report shows the current workload breakdown of db14, db15 and db25.
We would “move” the application workloads from db25 to db14 as part of the server consolidation.
34
What-if Analyses
Growth sensitivity analysis
Server sizing
  Application server
  Database server
  …
Application sizing
Disaster planning
Hardware purchase planning
Capacity planning
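A growth sensitivity question can be phrased as: at a given growth rate, how long until utilization crosses a planning threshold? The sketch below uses simple compound growth. It is not PATROL Predict's model, which also accounts for queueing effects; the starting load, capacity, growth rate, and 80% threshold are all hypothetical.

```python
from typing import Optional


def periods_until_threshold(current_pct: float, capacity_pct: float,
                            growth_rate: float, threshold: float = 0.8,
                            max_periods: int = 120) -> Optional[int]:
    """Periods of compound growth until load exceeds threshold * capacity.

    Returns None if the limit is not exceeded within max_periods.
    """
    limit = capacity_pct * threshold
    load = current_pct
    for period in range(max_periods + 1):
        if load > limit:
            return period
        load *= 1.0 + growth_rate
    return None


# Hypothetical: 800% today on a 16-processor server (1600% capacity),
# growing 5% per month, with an 80% planning threshold (1280%).
print(periods_until_threshold(800.0, 1600.0, 0.05))  # 10
print(periods_until_threshold(800.0, 1600.0, 0.0))   # None
```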
35
Consolidation Modeling #1
Resource contention analysis
  Combined as10 and db02 into 1 server
    Change db02 from HP to Sun F6800/16 @ 750 MHz
Used 3/18/2002 14:00 to 15:00 as baseline interval
36
What-if Modeling - Utilization
Note: This report shows that the workloads f45, f60, dis4ws and rw-procs were moved from as10 to db02.
Next, we would look at the relative response time changes.
37
What-if Modeling - Response Time
Note: This report shows that the workloads f45, f60, dis4ws and rw-procs are about 27% slower after they were moved.
The reason is that db02 has slower processors (25.27 SPECint95 per processor) than as10 (32.96 SPECint95 per processor), even though db02 has 20 processors versus 6 on as10.
Let’s see what happens when db02 is changed to a Sun F6800/16 machine.
38
Note: This report shows the effect of the server upgrade.
The moved workloads now take 90% of the original response time.
The oracle workload is now improved by 25%.
(Sun F6800/16 at 750 MHz is rated at 35.34 SPECint95 per processor.)
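A first-order feel for these results comes from the per-processor SPECint95 ratings quoted in the notes: CPU time per transaction scales roughly with the ratio of the old rating to the new one. The sketch below applies only that ratio. It ignores queueing, I/O, and memory effects, so it approximates rather than reproduces the modeled figures (27% slower, 90% of original time, 25% improvement); the dictionary keys are illustrative labels.

```python
# Per-processor SPECint95 ratings quoted in the slide notes.
SPECINT95_PER_CPU = {
    "as10 (HP N4000)": 32.96,
    "db02 (HP V2500)": 25.27,
    "Sun F6800/16 @ 750 MHz": 35.34,
}


def cpu_time_ratio(source_rating: float, target_rating: float) -> float:
    """CPU time on the target relative to the source (below 1 is faster)."""
    if source_rating <= 0 or target_rating <= 0:
        raise ValueError("ratings must be positive")
    return source_rating / target_rating


as10 = SPECINT95_PER_CPU["as10 (HP N4000)"]
db02 = SPECINT95_PER_CPU["db02 (HP V2500)"]
f6800 = SPECINT95_PER_CPU["Sun F6800/16 @ 750 MHz"]

print(f"{cpu_time_ratio(as10, db02):.2f}")   # 1.30  as10 workloads on old db02
print(f"{cpu_time_ratio(as10, f6800):.2f}")  # 0.93  as10 workloads on F6800
print(f"{cpu_time_ratio(db02, f6800):.2f}")  # 0.72  oracle workload on F6800
```

The simple ratios point the same way as the modeled results, but the gaps between them (for example 1.30 versus 1.27) are a reminder that response time is not purely CPU time.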
39
Consolidation Modeling #2
Resource contention analysis
  Consolidate workloads from db14 and db25 into db14
Used 3/18/2002 20:00 to 21:00 as baseline interval, since db14 and db25 had higher utilization at night.
40
Workload Migration - Utilization
Note: This report shows the Oracle@db25 workload moved to db14.
41
Workload Migration - Response Time
Note: This report shows that the Oracle@db25 workload, now running on db14, received a 41% improvement in response time.
Original workloads on db14 were not affected by the moved Oracle workload.
42
6 - Validate Recommendation
Create test environment
Observe results of initial implementation
Compare modeled results with “consolidated” measurement
Re-model the combined systems to account for unforeseen changes
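Comparing modeled and measured results can be as simple as flagging workloads whose measured relative response time deviates from the model by more than a chosen tolerance. In the sketch below the modeled values echo the what-if results in this deck (90% of original time, 25% improvement); the measured values and the 10% tolerance are hypothetical.

```python
def flag_deviations(modeled: dict[str, float], measured: dict[str, float],
                    tolerance: float = 0.10) -> dict[str, float]:
    """Return workloads whose measured value misses the model by > tolerance.

    Values are relative response times; the result maps workload name to
    the signed relative deviation (measured - modeled) / modeled.
    """
    flagged = {}
    for name, model_value in modeled.items():
        if name not in measured:
            continue  # nothing measured yet for this workload
        deviation = (measured[name] - model_value) / model_value
        if abs(deviation) > tolerance:
            flagged[name] = deviation
    return flagged


modeled = {"f45": 0.90, "f60": 0.90, "oracle": 0.75}
measured = {"f45": 0.92, "f60": 1.05, "oracle": 0.74}  # hypothetical

for name, deviation in flag_deviations(modeled, measured).items():
    print(f"{name}: {deviation:+.0%} versus model, re-model this workload")
```

Workloads that are flagged are candidates for the re-modeling step, since the deviation suggests something changed that the baseline did not capture.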
43
Server Consolidation Review
Six steps to success
1. Baseline current performance
   Collect detailed performance data
2. Characterize workload
   System, utility, application, database, etc.
3. Analyze resource usage level
   Time series graphical analysis
   Peaks, batch windows, trends, growth pattern
   Workload profiles
4. Combine systems for sizing
   Server and workload stacking
5. Consolidation modeling
   Resource contention analysis
   Response time degradation analysis
   Growth sensitivity analysis
6. Validate recommendation
44
STORAGE Consolidation Too?
BMC’s Application Centric Storage Management (ACSM) products can be leveraged to consolidate the storage side…
45
PATROL Performance Management Summary
An established process
  An integrated suite of products and services to manage mission critical client/server applications
A proven methodology
  Performance and capacity management across multiple platforms
Multi-functional
  Performance analysis
  Daily performance visualization
  Interactive performance prediction
High degree of process automation
46
ROI
Ensure a consistent approach to server consolidation projects
  ROI: Reduce risks and costs
Enable IT staff to understand performance information and evaluate alternatives effectively
  ROI: Better IT performance/$ ratio
Empower IT staff to plan for and justify expenditures with confidence
  ROI: Timely hardware/software acquisitions