Real-Time Coherence Monitoring in Integrated Environments

Post on 11-Nov-2014

description

In this presentation, Everett Williams, SL’s Oracle Coherence expert, discusses the problems of analyzing host metrics and Oracle Coherence metrics within a cluster using current tool sets. He also discusses how to integrate the two sets of statistics so that problems spanning both can be diagnosed in an integrated and intelligent manner. Users of SL’s Oracle Coherence Monitor can address these issues and determine whether their Coherence problems stem from the underlying hardware or originate elsewhere.

transcript

© 2012 SL Corporation. All Rights Reserved.

© 2013 SL Corporation. All Rights Reserved. 1

Real-Time Coherence Monitoring in Integrated Environments

Correlating Coherence Monitoring Metrics with Infrastructure, Database, and Application Server Metrics

5 December 2013 - London, UK

Everett Williams

Senior Director of Technology

SL Corporation

Tom Lubinski

Chief Technology Officer

SL Corporation

Disclaimer

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for SL’s products remains at the sole discretion of SL.

Agenda

• Customer quotes/Problem Statement

• Data Collected

• Current Tools/Analysis capabilities

• Architecture

• Demo

• Expanding to App Servers

• Challenges

Customer Questions/Comments

• Nodes, Caches, and Services are nice. But we are server people. – Online Major Retailer

• How do I know if hardware/network are causing my Coherence problem? – Major Apparel Company

• Does blade configuration #1 or blade configuration #2 run my application better? – Major Retailer

• We spend a great deal of time “poking around” looking for system metrics. – Investment Bank

Assumptions

• 100-500 JVMs

• 10-50 hosts

• 100s of caches

• Analysis over time

• Overlapping Data set (more than outlier analysis)

Data Collected and Aggregated

• Coherence Information

– Cache

– Service

– Node

– Storage Manager

• Host Information

– Host CPU/Memory

– Host Network

– Host Process CPU/Memory

• Coherence information aggregated to the host level.
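
The slides do not show how the aggregation to host level is done. As a minimal sketch, assuming each Coherence node reports a sample tagged with the host it runs on, per-node values could be summed per host. The field names (`heap_used_mb`, `requests_per_s`) and host names are hypothetical and are not Coherence MBean attribute names.

```python
from collections import defaultdict

# Hypothetical per-node samples; field names are illustrative only.
node_samples = [
    {"host": "host-a", "node": 1, "heap_used_mb": 512, "requests_per_s": 120.0},
    {"host": "host-a", "node": 2, "heap_used_mb": 768, "requests_per_s": 80.0},
    {"host": "host-b", "node": 3, "heap_used_mb": 640, "requests_per_s": 200.0},
]


def aggregate_by_host(samples):
    """Sum each numeric metric over all nodes that run on the same host."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for sample in samples:
        host = sample["host"]
        counts[host] += 1
        for key, value in sample.items():
            if key not in ("host", "node"):
                totals[host][key] += value
    # One row per host: node count plus the summed metrics.
    return {
        host: {"node_count": counts[host], **dict(metrics)}
        for host, metrics in totals.items()
    }


host_view = aggregate_by_host(node_samples)
```

With one row per host, Coherence figures can sit next to host CPU, memory, and network figures that are already reported per host.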

What makes Coherence “unique”

• Single task spread across multiple processes and servers.

• Impact of Network and latency

Server view of data

• Single Server

• Single state

• No Context

• Doesn’t scale visually

Top

• Single Server

• Single state

• No Context

• Doesn’t scale visually

Top/NMon

• Single Server

• Single state

• No Context

• Doesn’t scale visually

Taskmon Graphs

• Single Server

• No Context

• Time Series not aligned
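
To illustrate what aligning time series involves, here is a minimal sketch that assumes each source is polled on its own schedule and that averaging samples into fixed-width buckets is acceptable. The metric names, timestamps, and the 10-second bucket width are made up for the example; this is not a description of how any particular tool does it.

```python
from collections import defaultdict


def align_to_buckets(series, bucket_s=10):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns {bucket_start: mean_value}, where bucket_start is a multiple
    of bucket_s.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in series:
        bucket = int(ts // bucket_s) * bucket_s
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


def join_series(named_series, bucket_s=10):
    """Align several series and keep only buckets present in all of them."""
    aligned = {
        name: align_to_buckets(samples, bucket_s)
        for name, samples in named_series.items()
    }
    common = set.intersection(*(set(a) for a in aligned.values()))
    return {
        b: {name: aligned[name][b] for name in aligned}
        for b in sorted(common)
    }


# Hypothetical samples: host CPU and a Coherence node metric polled at
# different offsets (timestamps in seconds, values invented).
rows = join_series({
    "host_cpu_pct": [(1, 40.0), (11, 60.0), (21, 80.0)],
    "node_send_queue": [(4, 2.0), (7, 4.0), (14, 10.0)],
})
```

Each resulting row holds values from both sources for the same interval, which is what a side-by-side comparison needs.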

Stacked Graphs

• Doesn’t visually scale

• Top N Servers don’t re-sort
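
The re-sorting problem can be shown with a short sketch that recomputes the top N servers at every interval instead of fixing the set once. The host names and CPU values are hypothetical.

```python
import heapq


def top_n_per_interval(snapshots, n=2):
    """For each interval, return the n hosts with the highest value, re-ranked."""
    return [
        heapq.nlargest(n, snapshot.items(), key=lambda kv: kv[1])
        for snapshot in snapshots
    ]


# Hypothetical CPU % per host at three consecutive intervals.
snapshots = [
    {"host-a": 90.0, "host-b": 40.0, "host-c": 10.0},
    {"host-a": 50.0, "host-b": 45.0, "host-c": 70.0},
    {"host-a": 20.0, "host-b": 85.0, "host-c": 75.0},
]
ranked = top_n_per_interval(snapshots)
```

In this data the busiest host changes at every interval, so a chart that keeps the initial ordering would show a stale ranking.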

Single Graph with Multiple Trends

• Good for Outlier analysis

• Not good for overlapping trends.

• Single Server

• No Context

RTView Enterprise Solution

Collect, Analyze, Correlate, and Visualize data from multiple disparate sources

[Diagram labels: RTView Enterprise System; RTView Developer; Generic JVM; VMware; Oracle WebLogic; Oracle Coherence; TIBCO; IBM; Oracle Database; Custom Package; OEM Connector; System Metrics; OEM; Target Systems; … and many more]

RTView Enterprise Solution

[Diagram labels: Users; EM Central Server(s); DataServers; CMDB; ALERTDEFS; HISTORY; Configuration Management; Alert Aggregation; Directory Cache Map; Display Server]

The EM Central Server(s) provide configuration to the DataServers and alert management to users. They also provide a CacheMap identifying the location of all data contained in the DataServers.
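
The slide gives no detail on the CacheMap beyond its purpose. As a rough illustration of the idea only, a directory that maps a cache name to the DataServer holding it could look like the toy class below. The class, its methods, the cache names, and the server addresses are all hypothetical and are not RTView APIs.

```python
class CacheMapDirectory:
    """Toy directory mapping a cache name to the data server that holds it."""

    def __init__(self):
        self._locations = {}

    def register(self, cache_name, data_server):
        """Record that data_server holds the data for cache_name."""
        self._locations[cache_name] = data_server

    def locate(self, cache_name):
        """Return the data server for cache_name, or None if it is unknown."""
        return self._locations.get(cache_name)


directory = CacheMapDirectory()
# Hypothetical cache names and server addresses.
directory.register("CoherenceNodeStats", "dataserver-coherence:1")
directory.register("HostStats", "dataserver-hosts:1")

target = directory.locate("HostStats")
missing = directory.locate("UnknownCache")
```

A client that needs a given data set asks the directory first and then queries only the server that owns it.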

DataServer Configuration

RTView DataServer with H/A Deployment

Primary and backup servers run on different machines.

[Diagram labels: Host A; Host B; DataServer <primary>; DataServer <backup>; Historian <primary>; Historian <backup>; RTView EM ConfigServer; Systems Being Monitored; H/A Database]

DataServer components obtain configuration information via the EM ConfigServer.

End-To-End Monitoring

• Capture the nested levels in which systems are implemented (heterogeneous component layering)

[Diagram labels: Host Layer (Physical Servers, Network, Disk, OS); App Server (Servlet, JSP, EJB); Caching (Cache, Service); Messaging (Topic, Queue, Route)]

DEMO

Challenges

• Consolidated snapshot of data

• Visual scalability

• Multiple combinations of trends

Process/Node

Network/Service

Network/Service #2

Host/Service

Network/TCMP

Network/TCMP #2

Expanding to Application Servers

Correlation of WebLogic and Coherence Monitoring Metrics
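
One simple way to quantify how two already-aligned metric series move together is the Pearson correlation coefficient. The sketch below is an illustration under that assumption, not the method used in the product; the metric names and sample values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples taken over the same five intervals.
weblogic_active_threads = [10, 20, 30, 40, 50]
coherence_request_ms = [5.0, 9.0, 16.0, 19.0, 26.0]

r = pearson(weblogic_active_threads, coherence_request_ms)
```

A value of `r` close to 1 or -1 suggests the two metrics are worth examining together; it does not by itself establish which one drives the other.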

On-Line Store Overview Diagram

System overview diagram

WebLogic Cluster/Server Summary

All Servers Organized by Cluster, with Health State

WebLogic Cluster App Summary

Each Cluster shown as a unit, with server metrics aggregated

Load Balance Analysis

Load Balance Comparison of multiple metrics across WebLogic and Coherence
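
A load balance comparison across several metrics needs one number per metric that says how evenly the load is spread. A minimal sketch, assuming the coefficient of variation (population standard deviation divided by the mean) is an acceptable measure; the metric names, member names, and values are hypothetical.

```python
from statistics import mean, pstdev


def imbalance(values):
    """Coefficient of variation: population std dev divided by the mean.

    0.0 means perfectly even load; larger values mean more skew.
    """
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


# Hypothetical per-member values for one WebLogic and one Coherence metric.
per_member = {
    "weblogic_open_sessions": {"wls-1": 100, "wls-2": 100, "wls-3": 100},
    "coherence_cache_size": {"node-1": 900, "node-2": 100, "node-3": 200},
}

report = {
    metric: imbalance(list(members.values()))
    for metric, members in per_member.items()
}
```

Because the measure is unitless, a session count and a cache size can be ranked on the same scale.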

Aggregating Other Middleware Information

Health State of each service aggregated from multiple components
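
The slides do not state the rule used to combine component states. A common choice, shown here as an assumption, is to report the most severe state among a service's components. The state names, severity order, and component names are hypothetical.

```python
# Hypothetical severity order; higher numbers are worse.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def service_health(component_states):
    """Return the most severe state among a service's components."""
    if not component_states:
        return "OK"
    return max(component_states.values(), key=SEVERITY.__getitem__)


# Hypothetical components backing one service.
online_store = {
    "weblogic-cluster": "OK",
    "coherence-cache-service": "WARNING",
    "oracle-database": "OK",
    "host-a": "OK",
}

state = service_health(online_store)
```

Under this worst-of rule, a single degraded component is enough to flag the whole service.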

Aggregating Other Middleware Information

Including Aggregate Service Alert History over Time

Aggregating Other Middleware Information

Including Detailed History of Coherence Cache Service

Thank you!

For more information, please visit www.sl.com and www.sl.com/blog