The IBM dashboard for operational metrics

Date post: 26-Jan-2015
Upload: platform-cf
Page 1: The IBM dashboard for operational metrics

Page 2: The IBM dashboard for operational metrics

The IBM dashboard for operational metrics

Daniel Krook, Senior Certified IT Specialist, IBM

Page 3: The IBM dashboard for operational metrics

We run Cloud Foundry on dozens of OpenStack VMs

Two intranet clusters

• Classic: 38 huge VMs deployed with Chef: 1,302 users, 1,710 apps
• NG: 41 medium VMs deployed with BOSH: 123 users, 247 apps

Not counting Dev deployments
All on 50+ Nova Compute nodes

In the past year, we’ve learned how to

• Keep Cloud Foundry running smoothly
• Discover and prevent impending problems
• Resolve unexpected issues quickly

Page 4: The IBM dashboard for operational metrics

Goals of this lightning talk

1. Show the key data points we track
2. Show how our metrics dashboard helps us monitor that data
3. Share ideas on how to find better data in NG and beyond
4. Spark discussion on improved visibility for CF admins and customers

We are looking to get better at this, and help the community get better as well.

Page 5: The IBM dashboard for operational metrics

1. The key data

Page 6: The IBM dashboard for operational metrics

What are the important metrics?

• Data that can be tracked over time to see trends and behaviors
• Data that can help us predict problems before they happen

At the PaaS layer, that means:

DEAs and apps health
• Memory reserved as a proportion of the memory available

General health of all components
• Health of the virtual machines
• Status of the processes running on them

Database nodes and services
• Number of provisioned services against capacity available
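
The two capacity ratios named on this slide (memory reserved against memory available, and provisioned services against capacity) can be computed directly once the raw numbers are in hand. The sketch below is a minimal illustration; the function name, the sample values, and the 80% warning threshold are hypothetical and are not taken from the dashboard itself.

```python
def utilization(used: float, capacity: float) -> float:
    """Return used/capacity as a fraction; 0.0 when capacity is zero or negative."""
    if capacity <= 0:
        return 0.0
    return used / capacity


# Hypothetical sample numbers for illustration only (not real cluster data).
dea_memory = utilization(used=6144, capacity=8192)   # MB reserved vs. MB available
service_slots = utilization(used=45, capacity=50)    # provisioned vs. capacity

WARN_AT = 0.80  # assumed threshold; tune per environment

for name, ratio in [("DEA memory", dea_memory), ("Service node", service_slots)]:
    status = "WARN" if ratio >= WARN_AT else "ok"
    print(f"{name}: {ratio:.0%} {status}")
```

Tracking these fractions over time, rather than the raw counts, makes nodes of different sizes comparable on one chart.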

Page 7: The IBM dashboard for operational metrics

Why do we need metrics?

• Deliver continuous availability in the cloud
• Proactively solve problems rather than react to them
• Understand the behavior of the system to automate it

Page 8: The IBM dashboard for operational metrics

Where can we find them?

NATS message bus
• Discover the components to interrogate
• Best for dynamically changing data

Cloud Controller database (CCDB)
• Longer-lived data that isn’t in the varz endpoints
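
To illustrate the discovery step, the sketch below turns one component's reply on the message bus into an HTTP request for its varz data. It assumes the reply is JSON carrying a `host` value of the form "ip:port" and a `credentials` pair used for HTTP basic authentication. Those field names, the sample reply, and the helper name `varz_request` are assumptions made for illustration, and no network call is made.

```python
import base64
import json
import urllib.request


def varz_request(discover_reply: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a component's /varz endpoint."""
    info = json.loads(discover_reply)
    user, password = info["credentials"]  # assumed shape: [user, password]
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"http://{info['host']}/varz")
    req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical reply, shaped like what a component might publish when discovered.
sample = json.dumps({
    "type": "DEA",
    "host": "10.0.0.5:34567",
    "credentials": ["user", "secret"],
})

req = varz_request(sample)
print(req.full_url)
```

In a live setup the request could be sent with `urllib.request.urlopen(req)` and the response body parsed with `json.load`, which is where the dynamically changing values would come from.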

Page 9: The IBM dashboard for operational metrics

2. Monitoring that data

Page 10: The IBM dashboard for operational metrics

Our metrics dashboard provides…

1. Views of component health
2. Resource usage details
3. Ongoing growth trends
4. Access to logs and raw varz
5. Email notifications

Page 11: The IBM dashboard for operational metrics

It helps us find (and fix) problems

• Components nearing capacity or failure
• Already failed components
• Out of control apps and noisy users

It helps us see patterns

• Active/inactive users and apps
• Growth trends and runtime/service adoption
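
One way to turn a growth trend into an early warning is to fit a straight line to recent samples and project when that line crosses capacity. The sketch below is a minimal least-squares version. The sample data, the function name, and the linear-growth assumption are all illustrative and do not describe how the dashboard itself computes trends.

```python
from typing import Optional, Sequence


def days_until_full(samples: Sequence[float], capacity: float) -> Optional[float]:
    """Project days until `capacity` is reached, given one sample per day.

    Fits y = slope * day + intercept by ordinary least squares.
    Returns None when there are too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))  # days remaining after the latest sample


# Hypothetical daily app counts, for illustration only.
print(days_until_full([100, 110, 120, 130], capacity=200))
```

A straight line is the simplest possible model; real adoption curves are rarely linear, so a projection like this is best treated as a rough prompt to look closer.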

Page 12: The IBM dashboard for operational metrics

User and app trends

There is also one unauthenticated page for high-level stats

Page 13: The IBM dashboard for operational metrics

DEA list

Page 14: The IBM dashboard for operational metrics

DEA details

Page 15: The IBM dashboard for operational metrics

Service node list

Page 16: The IBM dashboard for operational metrics

Service node details

Page 17: The IBM dashboard for operational metrics

User list

Page 18: The IBM dashboard for operational metrics

User details

Page 19: The IBM dashboard for operational metrics

App list

Page 20: The IBM dashboard for operational metrics

App details

Page 21: The IBM dashboard for operational metrics

Log list

Page 22: The IBM dashboard for operational metrics

Log details

Page 23: The IBM dashboard for operational metrics

Email notifications

Page 24: The IBM dashboard for operational metrics

3. Finding and acting on better data

Page 25: The IBM dashboard for operational metrics

Let’s resolve gaps in data captured from NG

NG provides granular user/org/space views…
• This enables better BSS potential in terms of QoS and departmental billing

…But we lost user and app data linkages from the health manager
• Can’t see what DEA my app resides on (not currently enabled in our NG version)
• Can’t see how many apps a user has (replaced by orgs and spaces, but still valuable to trace)
• See https://github.com/cloudfoundry/cloud_controller_ng/issues/81

We’d like to restore that data, surfacing it either
• in varz endpoints (dynamic data, preferred) or
• in the CCDB (static data, could be a security concern)

Page 26: The IBM dashboard for operational metrics

Let’s begin to link metrics to automation

Detect errors in applications that are traceable to users/orgs
• Preemptively reach out to them to see if they need help
• Think customer service and proactive support!
• Can we hook into BOSH or Jenkins for automation?

Automate (and expand links to the IaaS and SaaS stacks)
• Self-healing systems (out of disk, move apps)
• Self-scaling systems (detect when nearing thresholds)
• Evolving topologies (replace unused service nodes with popular ones)
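
A first step toward that kind of automation could be a small rule table that maps a metric reading to a suggested action. The sketch below is hypothetical throughout: the metric names, thresholds, and action strings are invented for illustration, and it only reports a recommendation rather than calling BOSH, Jenkins, or any IaaS API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    metric: str       # key into the metrics snapshot
    threshold: float  # fire when the reading is at or above this fraction
    action: str       # human-readable suggestion, not an executed command


# Assumed rules for illustration; real thresholds would come from operations experience.
RULES = [
    Rule("dea_disk_used", 0.90, "move apps off this DEA"),
    Rule("dea_memory_reserved", 0.80, "add a DEA"),
    Rule("service_slots_used", 0.85, "add a service node"),
]


def recommend(snapshot: Dict[str, float]) -> List[str]:
    """Return the actions whose rule fires for the given metrics snapshot."""
    return [r.action for r in RULES
            if snapshot.get(r.metric, 0.0) >= r.threshold]


print(recommend({"dea_disk_used": 0.95, "dea_memory_reserved": 0.60}))
```

Keeping the rules as data rather than code means an operator can review and adjust them before any action is ever wired to a real deployment tool.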

Page 27: The IBM dashboard for operational metrics

Let’s expand the broadcast of metrics to more users

Admins are the primary beneficiary right now
• But data is almost completely read only
• Should we provide UAA based tiers of access to admins?

Others can and should benefit
• Customers
  • End users
  • Developers
• Management
  • Executives, line of business owners
  • Finance

Page 28: The IBM dashboard for operational metrics

Thanks!

Page 29: The IBM dashboard for operational metrics

The metrics dashboard innovators

Chris Peters
Russell Boykin
Doug Davis
Wei Feng

Page 30: The IBM dashboard for operational metrics

We’re hiring!

Search Jobs at IBM by:

SmartCloud Application Services
