
Transform RMF and SMF into Availability Intelligence

Date post: 08-Feb-2017
Transcript
Page 1: Transform RMF and SMF into Availability Intelligence

Transform RMF/SMF into Availability Intelligence

Brent Phillips, Managing Director, Americas
Jerry Street, Senior Performance Consultant

www.intellimagic.com

Page 2

Today’s Agenda

1. z/OS infrastructure availability
2. The “Availability Intelligence” concept
3. Sample use cases
4. How to create intelligence about threats to availability
5. Availability Intelligence as a Service

Page 3

Session Abstract: Using Availability Intelligence to Better Protect the Production Site

• It is time for a new, more intelligent approach to interpreting RMF & SMF data: one that provides a dramatically different result that you can easily verify on your own data.

• RMF & SMF produce the world’s richest source of machine-generated data about enterprise infrastructure performance and configuration. But even the best-run shops are not able to use this data to avoid incidents that cause unavailability.

• To outsmart unavailability, you have to automatically “crawl” through all the workload data every day at a very granular level. This data needs to be enriched and constantly evaluated against detailed expert knowledge about the infrastructure.

• By automatically applying expert domain knowledge to mine and interpret the data, you can see the risk in your infrastructure’s ability to handle your peak workloads, and how that risk is changing over time. This new visibility gives you warning before your online monitors can even detect any disruption to service levels.

Page 4

We are inspired by creating intelligence that illuminates the risks hiding inside your IT infrastructure.

“Any sufficiently advanced technology is indistinguishable from magic”

Arthur C. Clarke, 1962

Page 5

1. z/OS Infrastructure Availability

Page 6

Availability on z/OS Systems

• What does the “z” stand for?
  “zero downtime”
• What is your availability?
• z/OS vs. end-user experience

Page 7

z/OS Infrastructure Areas

• Many components required:
  ‒ Processor, Memory, WLM Goals, etc.
  ‒ Channels
  ‒ Coupling Facility
  ‒ XCF
  ‒ FICON
  ‒ Disk Storage
  ‒ Replication / DR / GDPS
  ‒ Tape / Virtual Tape Storage

Page 8

Infrastructure Availability Today: Either Good or Bad

[Figure: stress level (Little to Panic) and brain status (Engaged to Hard to focus) versus availability, showing only two states]

Page 9

The Missing Stage: About to Be Bad

[Figure: the same chart of stress level (Little to Panic) and brain status (Healthy, Engaged, Hard to focus) versus availability, now with an intermediate stage]

Page 10

Seeing Threats to Continuous Availability

• Question: Which has better intelligence to avoid outages:
  ‒ A 20-thousand-dollar automobile; or
  ‒ A SAN storage infrastructure costing millions of dollars?

Page 11

Incidents Leading to Application Unavailability

[Chart: incidents split into a Predictable and an Unpredictable portion]

Response for Unpredictable:
• Find the problem quicker
• Accelerate the problem fix

Response for Predictable:
• Avoid incident with proactive action

Page 12

Increasing the Predictable Portion

[Chart: incidents split into Predictable and Unpredictable portions, with the Predictable portion increased]

What would be the impact on:
1. Your IT staff?
2. Your employees?
3. Your customers?

Page 13

2. Availability Intelligence

Page 14

© IntelliMagic 2014

IT Infrastructure Availability Monitoring Today

[Chart: response time over time against the SLA performance level, with markers for detection and end-user impact]

• Your existing monitors look at symptoms here, only after users experience problems.
• Response time is an easy metric to get, but it is an effect, not a cause.

Page 15

Monitoring with Availability Intelligence

[Chart: response time and sub-component saturation over time against the SLA performance level, with markers for detection and end-user impact]

• Availability Intelligence identifies risk here, in sub-component saturation, before response time suffers.
• This requires evaluating every data point with expert domain knowledge about every component.
• Response time is an easy metric to get, but it is an effect, not a cause.

Page 16

Changing the Outcome: Avoiding Disruptions

[Chart: response time and sub-component saturation over time against the SLA performance level]

• Most infrastructure “fires” can be prevented by intervening here, while there is still no end-user impact.
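Why does saturation of a sub-component give earlier warning than response time? A textbook queueing approximation makes the shape visible. This is a minimal sketch under an assumption of ours, not the model behind these charts: it treats one sub-component (for example, one back-end drive group) as an M/M/1 queue, where mean response time is R = S / (1 - U) for service time S and utilization U. The 5 ms service time is hypothetical.

```python
# Hypothetical illustration: an M/M/1 queue as a stand-in for one
# sub-component. A textbook approximation, not IntelliMagic's model.


def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time R = S / (1 - U) for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 5.0  # hypothetical per-request service time
    for u in (0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95):
        r = mm1_response_time(service_ms, u)
        print(f"utilization {u:4.0%} -> mean response time {r:6.1f} ms")
```

Between 20% and 60% utilization the modeled response time only doubles, while going from 80% to 95% quadruples it. Utilization climbs steadily long before response time reacts, which is the window in which intervening causes no end-user impact.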

Page 17

What is Availability Intelligence?

What: Foreknowledge about hidden threats to availability

Why: To better protect continuous availability at the primary site by:
1. Avoiding incidents (making more of them predictable)
2. Accelerating resolution (reducing MTTR)

How: Use built-in expert domain knowledge in automatic analysis of the performance and configuration data

Page 18

What Availability Intelligence Requires

• It is not enough to only have:
  ‒ Easier, nicer graphs and visualizations
  ‒ Statistical analysis (as is common with ITOA, IT Operations Analytics)
• Rather, understanding what the data means for risk requires:
  ‒ HW component knowledge (as gained from performance modeling)
  ‒ Knowing whether a value is good or bad, and rating the risk of unavailability
  ‒ Knowing how to derive new, meaningful metrics from the raw data
  ‒ Best practices to configure and manage the infrastructure
  ‒ Knowing how to visualize the risk and problems in the infrastructure

Page 19

Illuminating Threats Inside the Storage Arrays

[Diagram: storage array response time as the lag measure, surrounded by its lead measures]

• Measure (lag): storage array response times
• Lead measures:
  ‒ Imbalance within an array and between arrays
  ‒ Application workloads
  ‒ Configuration or failure changes
  ‒ Disk device loads
  ‒ FW bypass, etc.
  ‒ Back-end, cache, and front-end adapter utilization
  ‒ Fibre switch errors

Page 20

3. Sample Use Cases

Page 21

Data Center Rollups of KRIs (Key Risk Indicators)

[Dashboard: Disk Storage Systems, with Performance Metrics and Key Risk Indicators panels and the Highest Rating for this Dashboard]

Consolidate individual ratings on infrastructure resources into data center views to see risk across the enterprise at a glance.
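Because the dashboard shows the "Highest Rating for this Dashboard", a rollup that keeps the worst child rating at each level is one natural reading. The sketch below assumes that reading, plus non-negative numeric ratings where higher means riskier; the data center, system, and pool names are hypothetical, not IntelliMagic's data model.

```python
# Hypothetical sketch: roll resource ratings up to system and data-center
# views by keeping the highest (worst) rating at each level.
# Assumes ratings are non-negative numbers where higher means riskier.
from collections import defaultdict


def rollup_max(ratings):
    """ratings: iterable of (data_center, system, resource, rating) tuples.

    Returns (per_system, per_data_center) dicts of maximum ratings."""
    per_system = defaultdict(float)
    per_dc = defaultdict(float)
    for dc, system, _resource, rating in ratings:
        per_system[(dc, system)] = max(per_system[(dc, system)], rating)
        per_dc[dc] = max(per_dc[dc], rating)
    return dict(per_system), dict(per_dc)


sample = [
    ("DC-East", "DSS-01", "pool-A", 0.10),
    ("DC-East", "DSS-01", "pool-B", 0.65),
    ("DC-East", "DSS-02", "pool-A", 0.00),
    ("DC-West", "DSS-03", "pool-A", 0.20),
]
systems, data_centers = rollup_max(sample)
print(data_centers)  # {'DC-East': 0.65, 'DC-West': 0.2}
```

Taking the maximum rather than the average is deliberate here: averaging would dilute one risky storage pool among many healthy ones, which is exactly what an enterprise-level view must not hide.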

Page 22

Visualizing Risk to Continuous Availability

What does the data mean for your infrastructure availability? Key metrics are rated automatically according to built-in expert knowledge, producing intelligence about threats that you can use to protect availability.

• No border: no rating
• Green border: good
• Yellow border: early warning
• Red border: performance exception

Page 23

Rating the Risk using Expert Domain Knowledge

• Based on straight thresholds where appropriate (like hardware limits)
• Based on dynamic thresholds where the limits also depend on workload characteristics
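The difference between a straight and a dynamic threshold can be sketched in a few lines. Everything numeric here is an assumption for illustration: the port-busy limits, the 0.3 ms cache-hit cost, the 6 ms back-end miss cost, and the per-KB transfer cost are hypothetical, not vendor or IntelliMagic values. The point is only that the dynamic limits move with the workload mix.

```python
# Hypothetical sketch: a static threshold (a fixed hardware-style limit)
# versus a dynamic threshold that depends on workload characteristics.
# All constants are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    warning: float
    exception: float


# Static: a limit that holds regardless of workload (e.g., port busy %).
STATIC_PORT_BUSY_PCT = Thresholds(warning=50.0, exception=70.0)


def dynamic_response_time_ms(read_miss_fraction: float,
                             kb_per_io: float) -> Thresholds:
    """Response-time limits scaled to the workload mix.

    More cache read misses (served by back-end drives) and larger
    transfers both raise the response time that should be expected."""
    if not 0.0 <= read_miss_fraction <= 1.0:
        raise ValueError("read_miss_fraction must be in [0, 1]")
    hit_ms, miss_ms, ms_per_kb = 0.3, 6.0, 0.01  # assumed costs
    expected_ms = ((1.0 - read_miss_fraction) * hit_ms
                   + read_miss_fraction * miss_ms
                   + kb_per_io * ms_per_kb)
    return Thresholds(warning=1.5 * expected_ms, exception=2.5 * expected_ms)


cache_friendly = dynamic_response_time_ms(read_miss_fraction=0.05, kb_per_io=4.0)
cache_hostile = dynamic_response_time_ms(read_miss_fraction=0.40, kb_per_io=27.0)
print(cache_friendly)
print(cache_hostile)
```

With these assumed costs, the same 3 ms response time sits below the warning limit for the cache-hostile workload but above the exception limit for the cache-friendly one, which a single straight threshold cannot express.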

Page 24

Disk Infrastructure Use Case: Avoiding disruption to production service levels

Page 25

Disk Storage System Dashboard [rating: 0.49]
Rating based on DSS data using DSS thresholds

Response time on the first storage array is rated green: there is no discernible problem for end users yet.

But a threat to availability exists in an underlying metric (back-end disk drive read response time).

Page 26

Response Time (ms) [rating: 0.00]
Rating based on DSS data using DSS thresholds

Response time is a lag measure. But seeing it plotted against the dynamic thresholds (grey backgrounds) is useful to get an idea of what can be expected for that type of workload on that particular array configuration.

Page 27

Breakdown of Response Time Components (ms)

Breakdown of response time into its components allows identification of the largest contributors
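RMF reports average DASD response time as four components: IOSQ, pending, connect, and disconnect time. Ranking them per interval shows where to look first. A minimal sketch; the interval values below are hypothetical.

```python
# Minimal sketch: rank the components of an average DASD response time.
# Component names follow RMF's breakdown; the values are hypothetical.
from __future__ import annotations


def rank_components(components_ms: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (name, ms, share_of_total) tuples, largest contributor first."""
    total = sum(components_ms.values())
    if total <= 0:
        raise ValueError("total response time must be positive")
    ranked = sorted(components_ms.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, ms, ms / total) for name, ms in ranked]


interval = {"IOSQ": 0.05, "Pending": 0.15, "Connect": 0.40, "Disconnect": 1.10}
for name, ms, share in rank_components(interval):
    print(f"{name:<10} {ms:5.2f} ms {share:6.1%}")
```

In this made-up interval, disconnect time dominates, which would point the analysis toward cache misses and back-end activity rather than toward channels or queuing on the host.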

Page 28

Disconnect (ms) [rating: 0.00]
Rating based on DSS data using DSS thresholds

Overall, disconnect time is not yet out of range for this array.

Page 29

Disconnect Time Components (ms)

Built-in knowledge enables a further breakdown of disconnect time into its components.

Page 30

Drive Read Response (ms) [rating: 0.49]
Rating based on DSS data using thresholds specific to this DSS configuration

What was identified on the exception report is a deeper issue: back-end drives are starting to become saturated.

With minimal workload growth, this will soon show up in response time and impact production users.
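How soon is "soon"? A minimal sketch, assuming utilization scales linearly with I/O rate (constant per-I/O service time) and that the workload grows at a compound monthly rate. The 55% current utilization, 60% limit, and 2% monthly growth are hypothetical inputs, not values from the exception report.

```python
# Hypothetical sketch: growth headroom before a component reaches its limit,
# assuming utilization scales linearly with workload.
import math


def growth_headroom(current_util: float, limit_util: float) -> float:
    """Fractional workload growth until utilization reaches limit_util."""
    if not (0.0 < current_util < 1.0 and 0.0 < limit_util < 1.0):
        raise ValueError("utilizations must be in (0, 1)")
    return limit_util / current_util - 1.0


def months_until_limit(current_util: float, limit_util: float,
                       monthly_growth: float) -> float:
    """Months of compound growth until the limit is reached."""
    headroom = growth_headroom(current_util, limit_util)
    if headroom <= 0.0:
        return 0.0  # already at or over the limit
    if monthly_growth <= 0.0:
        return math.inf  # no growth: the limit is never reached
    return math.log1p(headroom) / math.log1p(monthly_growth)


print(f"{growth_headroom(0.55, 0.60):.1%} headroom, "
      f"{months_until_limit(0.55, 0.60, 0.02):.1f} months at 2%/month")
```

Expressing the lead measure as remaining growth, rather than as a utilization percentage, turns the rating into a planning horizon that can be compared with hardware lead times.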

Page 31

Cost comparison use case: Holistic Evaluation (CPU vs. IO)

Page 32

[Chart: Using and Delay components per Service Class (%), top 20 service classes]

Faster job execution is required.

Question: For the selected service class(es), is it cheaper to obtain the needed performance gain with upgraded CPU or with upgraded storage?

Page 33

Is the time spent waiting on DASD already best in class, or is there room for improvement?

[Chart: Average Response Time Components for Entire Subsystem (ms, 0 to 4) from 0:30 to 2:30; components: IOSQ, Pending, Connect, Disconnect]

Approx. 65% of time is using/waiting on DASD

Page 34

Comparing Options for Run Time Improvement

Results of modeling: (1) upgrading CPU to the best available vs. (2) upgrading storage to the next generation (all times in seconds)

Option               CPU Using  CPU Delay  DASD Using & Delay  Total Seconds  Run Time Savings
Before                    1196       1523                3915           6634  n/a
1. CPU Upgrade             416        265                3915           4596  15%
2. Storage Upgrade        1196       1523                1027           3746  44%

Page 35

4. How to Create Availability Intelligence out of the RMF/SMF Data

Page 36

IntelliMagic Vision for z/OS

1. z/OS Systems
   ‒ Processors, WLM, Coupling Facility, XCF, Jobs/Datasets
2. z/OS Disk & Replication
   ‒ Supports every disk vendor and configuration
   ‒ FICON, Replication, Jobs, Datasets, Storage groups, GDPS…
3. z/OS Tape/Virtual Tape
   ‒ IBM TS7700, Oracle StorageTek VSM
   ‒ 2016: EMC DLm

Page 37

How to generate Availability Intelligence: 7 areas to apply expert domain knowledge

Built-in domain knowledge and expertise are required for Availability Intelligence.

[Diagram: machine-generated data + domain knowledge and expertise + automation → Availability Intelligence]

Page 38

How to generate Availability Intelligence

#1 – Collect data from different sources
• z/OS RMF/CMF data
• Specific SMF data
  ‒ IBM GDPS
  ‒ IBM XRC
• SAN collector
• Vendor APIs

Page 39

How to generate Availability Intelligence

#2 – Normalize
Validate, normalize, and properly categorize the collected data:
• Consolidate the same data from different sources
• Enable different summarizations (e.g., by storage pool)
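The two normalization duties named above, consolidating the same data from different sources and enabling summaries such as per storage pool, can be sketched as follows. The source names (`rmf`, `vendor_api`), record fields, device numbers, and pool mapping are all hypothetical; a real implementation would also validate units and interval alignment.

```python
# Hypothetical sketch of the Normalize step: keep one record per device and
# interval when two sources report the same metric, then summarize by pool.
# Source names, field names, and the pool mapping are illustrative only.
from collections import defaultdict

SOURCE_PRIORITY = {"rmf": 0, "vendor_api": 1}  # lower value wins a conflict


def consolidate(records):
    """Keep one record per (device, interval), preferring higher-priority sources."""
    best = {}
    for rec in records:
        key = (rec["device"], rec["interval"])
        current = best.get(key)
        if current is None or (SOURCE_PRIORITY[rec["source"]]
                               < SOURCE_PRIORITY[current["source"]]):
            best[key] = rec
    return list(best.values())


def summarize_by_pool(records, pool_of_device):
    """Total I/O rate per (storage pool, interval)."""
    totals = defaultdict(float)
    for rec in records:
        pool = pool_of_device.get(rec["device"], "unassigned")
        totals[(pool, rec["interval"])] += rec["io_rate"]
    return dict(totals)


records = [
    {"source": "vendor_api", "device": "9A00", "interval": "08:00", "io_rate": 418.0},
    {"source": "rmf", "device": "9A00", "interval": "08:00", "io_rate": 420.0},
    {"source": "vendor_api", "device": "9A01", "interval": "08:00", "io_rate": 130.0},
]
pools = {"9A00": "POOL1", "9A01": "POOL1"}
print(summarize_by_pool(consolidate(records), pools))  # {('POOL1', '08:00'): 550.0}
```

Without the consolidation step, device 9A00 would be counted twice and the pool total would be inflated by roughly 76%.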

Page 40

How to generate Availability Intelligence

#3 – Enrich
Fill the gaps:
• Calculate component utilization from workload data
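One way to calculate a component utilization that is not reported directly is the Utilization Law, U = X × S (operations per second times seconds per operation). The sketch below applies it to a back-end drive group. The RAID-5 write penalty of 4, the 6 ms drive service time, and the workload numbers are assumptions for illustration; real arrays often coalesce destages, which would lower the effective penalty.

```python
# Hypothetical sketch of the Enrich step: derive back-end drive utilization
# from workload data with the Utilization Law U = X * S.
# The RAID-5 write penalty and service time are assumptions.


def backend_drive_utilization(read_misses_per_s: float,
                              destage_writes_per_s: float,
                              drives: int,
                              service_time_ms: float,
                              write_penalty: int = 4) -> float:
    """Average utilization of the drives in one array group.

    write_penalty=4 models RAID-5 small random writes: read old data,
    read old parity, write new data, write new parity."""
    if drives <= 0:
        raise ValueError("drives must be positive")
    backend_ops_per_s = read_misses_per_s + write_penalty * destage_writes_per_s
    ops_per_drive = backend_ops_per_s / drives
    return ops_per_drive * service_time_ms / 1000.0  # X * S, with S in seconds


print(backend_drive_utilization(600.0, 100.0, drives=8, service_time_ms=6.0))  # 0.75
```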

Page 41

How to generate Availability Intelligence

#4 – Assess
Is it good or bad?
• Apply hardware and workload knowledge
• Are the metrics as expected, given the hardware in use and the workload profile?
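Assessment asks whether a measured value is what the installed hardware and current load would explain. A minimal sketch, assuming a hypothetical table of nominal drive read times and an M/M/1-style load factor; neither the table nor the factor comes from the source.

```python
# Hypothetical sketch of the Assess step: compare an observed drive read
# response time with what the hardware profile and load would explain.
# The drive-type table and the load factor are illustrative assumptions.

NOMINAL_READ_MS = {"15k_hdd": 6.0, "10k_hdd": 8.0, "flash": 0.5}


def assess_drive_read(observed_ms: float, drive_type: str,
                      utilization: float) -> float:
    """Return observed / expected; well above 1.0 means worse than expected."""
    if drive_type not in NOMINAL_READ_MS:
        raise KeyError(f"no hardware profile for {drive_type!r}")
    nominal_ms = NOMINAL_READ_MS[drive_type]
    # Expected time grows with queueing as load rises (M/M/1-style factor),
    # capped so that near-saturated intervals do not divide by zero.
    expected_ms = nominal_ms / (1.0 - min(utilization, 0.95))
    return observed_ms / expected_ms


for observed, kind, util in [(12.0, "15k_hdd", 0.5), (3.0, "flash", 0.3)]:
    print(kind, round(assess_drive_read(observed, kind, util), 2))
```

The same 3 ms reading that would look healthy on spinning drives is several times worse than expected on flash, which is why the assessment needs the hardware profile and not just the metric.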

Page 42

How to generate Availability Intelligence

#5 – Rate
How significant and risky is it?
• Rating is always based on two thresholds (warning, exception)
• Rating is based on:
  ‒ Knowledge of HW components
  ‒ Best practices
• Focused on lead measures
• Avoid false positives
• Avoid false negatives
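A two-threshold rating maps naturally onto the border colors used in the dashboards (no border, green, yellow, red). The sketch below assumes higher values are worse and that the thresholds are already known; it is not IntelliMagic's rating formula, and the numeric ratings shown on the dashboards are not reproduced here.

```python
# Hypothetical sketch of the Rate step: classify a value against a warning
# and an exception threshold and map it to dashboard border colors.
# Not IntelliMagic's rating formula.
from enum import Enum
from typing import Optional


class Rating(Enum):
    NONE = "no border"         # metric not rated
    GOOD = "green border"
    WARNING = "yellow border"  # early warning
    EXCEPTION = "red border"   # performance exception


def rate(value: Optional[float], warning: float, exception: float) -> Rating:
    """Higher values are assumed to be worse."""
    if value is None:
        return Rating.NONE
    if not warning < exception:
        raise ValueError("warning threshold must be below exception threshold")
    if value >= exception:
        return Rating.EXCEPTION
    if value >= warning:
        return Rating.WARNING
    return Rating.GOOD


for v in (None, 4.0, 7.5, 12.0):
    print(v, rate(v, warning=6.0, exception=10.0).value)
```

The band between the two thresholds is what produces the yellow early warning: the "about to be bad" stage that a single pass/fail threshold cannot show.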

Page 43

How to generate Availability Intelligence

#6 – Recommendations
What to do next?
• For the rated exceptions in the entire environment, include recommendations about what is likely going on and what to do…
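Attaching a likely cause and a next action to each exception can be as simple as a rule table keyed by metric. The metric keys and causes below are hypothetical; the actions echo the sample actions listed in this document (rebalance work, fix lost redundancy, isolate change, correct error, hardware upgrade).

```python
# Hypothetical sketch of the Recommend step: attach a likely cause and a next
# action to each rated exception. Metric keys and rules are illustrative.

RULES = {
    "drive_read_response": ("Back-end drives approaching saturation",
                            "Rebalance work, or plan a hardware upgrade"),
    "path_redundancy": ("A channel path or link is no longer redundant",
                        "Fix lost redundancy"),
    "switch_errors": ("Errors reported on Fibre switch ports",
                      "Correct the error"),
}
DEFAULT_RULE = ("No known pattern matched",
                "Isolate recent configuration or workload changes")


def recommend(exceptions):
    """exceptions: iterable of (resource, metric) pairs rated as exceptions."""
    advice = []
    for resource, metric in exceptions:
        cause, action = RULES.get(metric, DEFAULT_RULE)
        advice.append({"resource": resource, "metric": metric,
                       "likely_cause": cause, "action": action})
    return advice


for item in recommend([("DSS-01", "drive_read_response"), ("SW-2", "crc_errors")]):
    print(f"{item['resource']}: {item['likely_cause']} -> {item['action']}")
```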

Page 44

How to generate Availability Intelligence

#7 – Visualization
• Optimized presentation of results
• Rating always visible
  ‒ By coloured frames and bubbles
• Web interface
• Automated reporting based on rating result

Page 45

7 Key Areas to Apply Expert Knowledge to SMF/RMF

[Diagram: machine-generated data + domain knowledge and expertise + automation → Availability Intelligence, through Collect, Normalize, Enrich, Assess, Rate, and Recommend]

Benefits:
1. Neutralize threats
2. Accelerate fixes

Sample actions:
• Rebalance work
• Fix lost redundancy
• Isolate change
• Correct error
• Hardware upgrade

Page 46

Automation & the Power of Always Knowing

• Identify risk for every interval, on every device, in every data center
• A “thousand pairs of eyes” is the only way to continually execute the ITIL v3 definition of the capacity management process:
  ‒ “ensuring…the IT Infrastructure is able to deliver agreed Service Level Targets in a cost effective and timely manner…considers all Resources required to deliver the IT Service...”

Page 47

5. Availability Intelligence as a Service from IntelliMagic

Page 48

IntelliMagic

• Creating the world’s best intelligence about performance and availability risk in your infrastructure
• 20+ year history of delivering solutions for deep infrastructure analysis
• Privately held, financially independent
• Customer centric, responsive
• Solutions used daily in some of the world’s largest data centers

Page 49

Availability Intelligence as a Service

• Stays up to date on frequently updated hardware knowledge
• Very quick time to results (~24 hours)
• Okay for security: no PII in infrastructure measurement data
• Easy dissemination of intelligence reports
• Easy access to expert consultants

Page 50

Example US Mainframe SaaS Customers

• Insurance
  ‒ One of the largest in the US
• Banking/Financial
  ‒ One of the largest in the US
• Shipping
  ‒ One of the largest in the US

Page 51

Conclusion

Outsmart unavailability with the world’s best intelligence about the current levels of risk hiding in your z/OS infrastructure.

To see the difference, just send us the historical RMF/SMF data prior to your last service disruption.

For questions/more details, contact: [email protected]

“Any sufficiently advanced technology is indistinguishable from magic”

Arthur C. Clarke, 1962

