EarthLink and Micromuse: Growing up Together Doug McClure EarthLink Operations Sr. Manager, Fault and Performance Mgmt June 3, 2004

EarthLink and Micromuse:

Growing up Together

Doug McClure
EarthLink Operations
Sr. Manager, Fault and Performance Mgmt
June 3, 2004


Fault & Performance Mgmt Overview

• One of the Nation’s Largest ISPs

• Headquarters in Atlanta, GA
  – Key facilities in Dallas, TX; Pasadena and San Jose, CA; Knoxville, TN; and Seattle, WA

• Profitable, strong balance sheet

• Largest DSL footprint

• First-to-market with products that provide the best possible Internet experience

• Customer Advocacy: Fighting SPAM with technical solutions, litigation, legislative support, industry collaboration and consumer education

  – Howard Carmack, aka the "Buffalo Spammer," was sentenced to 3-1/2 to seven years in prison on May 27th after EarthLink received a $16.4M civil judgment in May 2003

• 10th Anniversary (1994-2004) – http://www.redefineyourworld.com


Fault & Performance Mgmt Overview

5.25M Customers
• ~4M Dialup (Premium ~3.5M, Value ~500K)
• ~1.2M Broadband (Cable, xDSL)
• ~160K Web Hosting (Unix, Windows)
• ~50K Wireless (Blackberry, PDA, Laptops, Wi-Fi)

Dial Access Coverage > 90% of US Population
• ~16K Local Dial Access Numbers
• ~500K Active Modem Ports (~50% ELNK, ~50% Outsourced)
• ~400 PoPs (18 Core Backbone PoPs, four data centers)

Broadband Coverage
• ~200 Markets with Broadband Offerings

Large and Diverse Infrastructure
• 2300 Network Elements
• 1500 Server Elements
• Thousands of Access Circuits, Hundreds of WAN Circuits


Fault & Performance Mgmt Overview

Access Technology Innovation
• Premium and Value Dial-up
• Broadband (Cable, xDSL, Satellite)
• Voice (Converged Devices, VoIP)
• Wireless (WiFi, CDMA, Blackberry, PDA)
• Broadband over Power Lines (BPL)

Value Added Service and Product Innovation
• Blocker Family: spamBlocker, POP-UP Blocker, ScamBlocker, Virus Blocker, Spyware Blocker
• Parental Controls
• Webmail
• Web Accelerator


Fault & Performance Mgmt Overview

Exceptional Customer Service
• 2003 PC Magazine Readers' Choice Awards for both high-speed and dial-up services
• 2003 highest ranking in customer satisfaction for the second year in a row for high-speed Internet service by J.D. Power and Associates in its Internet Service Provider Residential Customer Satisfaction Study(SM)
• 2003 CNET Editors' Choice award


Fault & Performance Mgmt Innovation = Constant Change

Drivers
• Speed to Market, Competition – Do more, faster
• Quality, Performance, Support Costs
• Compliance – Sarbanes-Oxley

Operational Challenges
• Release Management
• Change Management
• Service Level Management


Fault & Performance Mgmt Operations Maturity: Growing Up

Production Improvement Program (PIP)
• Foundation in IT Service Management, ITIL, CobIT
• Focusing on four main areas: Service Level Mgmt, Change Mgmt, Release Mgmt, and Production Security
  – Over 10% of Operations staff have now attended ITIL Foundation Training
    • 1 Master Level Certified (more planned)
    • 9 Practitioner Level Trained in CCR Quadrant (pending certification results)
    • 114 Foundation Level Trained (most pending certification results)


Fault & Performance Mgmt Operations Maturity: Growing Up

Service Level Management
• NOC, Help Desk
• Set and manage expectations internal/external to Operations

Change Management
• Provide oversight and control of the production environment
• Minimize risk and impact from change activities

Release Management
• Development Operations
• Minimize poor quality production releases

Enterprise Security
• Compliance, control, audit


Fault & Performance Mgmt EarthLink and Micromuse Facts

Very Early Netcool Adopter
• EarthLink (Mindspring) was Micromuse’s first US customer
  – Began evaluating Micromuse Netcool in 1996, official customer April 1997

Early Innovation
• Early joint innovation and development helped build foundation for many of Micromuse’s key products
  – EarthLink and Micromuse are revitalizing joint development projects with emerging service and business activity monitoring products

Driving 3rd Party Vendor Integration & Partnerships
• EarthLink requires detailed integration with Micromuse suite – much more than just “sending SNMP TRAPs”
  – Quest Software, Compuware, PeopleSoft, Remedy, Cisco Systems, Arbor Networks

Current Deployment
• Netcool OMNIbus, Internet Service Monitors, Desktop Clients, Webtop, Impact, numerous Gateways, Probes, Data Source Adaptors
  – Two Senior System Engineers, Three System Engineers, Two System Analysts devoted to Fault and Performance Management (Netcool + Other)
• Services provided for NOC (3 shifts, 6 per shift), Systems Administration (3 shifts, 10 per shift), Network Engineering


Fault & Performance Mgmt Moving Beyond “MoM” and Apple Pie

EarthLink’s Early Micromuse Netcool Deployment
• Focused on Netcool as the “Manager of Managers” or “MoM”
• Needed during EarthLink’s rapid growth and expansion
• Enabled event management, eliminated “swivel chair NOC”

“Apple Pie” is Event Correlation and Deduplication
• The Netcool sweet spot was providing EarthLink with event correlation and deduplication
  – Able to reduce the event stream from 100,000’s to 1,000’s per week
  – Further reduction expected to 100’s per week through use of advanced Netcool/Impact policies and deployment of Netcool/Precision
• Enables NOC and support staffs to operate efficiently

Focus now on End-to-End Service Management
• Netcool Suite allows EarthLink to manage entire service
  – Understand service relationships, service levels, perform service modeling and service discovery
• Enables impact assessment, prioritization, understanding service delivery chain
• Eliminates “needle in the haystack” approach of event management
  – This is the problem that needs attention now (compared to “I think this is the event causing problems”)
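
As a rough illustration of the deduplication idea on this slide, the Python sketch below folds repeated raw events into one row per source and problem, keeping a tally. It is not Netcool code: the `Event` fields, the `(node, summary)` key, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    node: str          # element that raised the event
    summary: str       # human-readable description
    severity: int      # higher means more severe
    first_seen: float  # timestamp of the first occurrence
    last_seen: float   # timestamp of the most recent occurrence
    tally: int = 1     # number of raw events folded into this row


def deduplicate(raw_events):
    """Fold (timestamp, node, summary, severity) tuples into one row per key."""
    table = {}
    for ts, node, summary, severity in raw_events:
        key = (node, summary)  # hypothetical identifier; real keys are site-specific
        row = table.get(key)
        if row is None:
            table[key] = Event(node, summary, severity, ts, ts)
        else:
            row.tally += 1
            row.first_seen = min(row.first_seen, ts)
            row.last_seen = max(row.last_seen, ts)
            row.severity = max(row.severity, severity)
    return list(table.values())


# Illustrative data only: three repeats of one problem plus one distinct event.
raw = [
    (100.0, "router-1", "link down", 5),
    (101.0, "router-1", "link down", 5),
    (102.0, "router-1", "link down", 5),
    (103.0, "server-9", "disk full", 4),
]
rows = deduplicate(raw)
for row in rows:
    print(row.node, row.summary, row.tally)
```

Four raw events collapse to two rows here. The same folding applied to a much larger stream is what makes reductions of the kind quoted above plausible, although the actual ratio depends on how repetitive the stream is.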


Fault & Performance Mgmt Service Management Complexity

[Figure: layered service architecture diagram. Layers: Client Applications, Presentation Layer, Application Services Layer, Core Services Layer, Infrastructure Layer. Clients (any web browser, Palm client) reach components labeled S80 through S112 over HTML, IMAP, SMTP, and POP3, with APIs 1 through 7 connecting Mail, Storage, Tickets, and other systems. Callouts: "Good Customer Experience?", "Performance?", "Infrastructure Events to Netcool".]

Source: EarthLink Product Group


Fault & Performance Mgmt Service Management Complexity

[Figure: Number of Components plotted over Time (24x7x365), with System Changes and Infrastructure Events marked along the timeline.]

Identify key service elements
Instrument those elements
Consolidate & analyze data
Develop service model and SLAs

Dealing with EarthLink Service Complexity:
• The complexity and amount of data generated from end-to-end service management is enormous
• Networks, Firewalls, Servers, Switches, Routers, Load Balancers, Applications, Databases, etc.
• Netcool/ObjectServer is a must have for EarthLink to effectively manage and understand EarthLink’s service event stream from end-to-end
• Impact 3.0’s cluster capability will enable EarthLink to analyze, enrich, suppress, and manage event stream regardless of our growth

• RAD (future)
• Impact
• Precision (future)

• ISM
• System Agents
• SNMP

• ObjectServer
• RAD (future)
• Impact

• RAD (future)
• Impact
• ISM

Source: EarthLink Product Group


Fault & Performance Mgmt The Customer IS Important

Customer Experience Monitoring and Management

• The Micromuse Netcool Suite enables proactive, real-time monitoring of the customer’s experience for core EarthLink services

– Over 14K Internet Service Monitor (ISM) instances in operation covering all key services (HTTP, HTTPS, SMTP, POP3, IMAP) and dedicated customers (ICMP)

• Allows for customer experience monitoring information to be correlated, analyzed, and presented in real-time

– Micromuse Netcool/ISMs, Keynote, Compuware Client Vantage, Quest Foglight

– External/internal synthetic testing, system and network element monitoring, system and network port monitoring

• Immediate notification to support groups when customer’s experience degrades


Fault & Performance Mgmt The Business IS Important

Business Activity Monitoring and Management
• Expands IT Operations visibility vertically and horizontally
• Ties IT Operations data and Business data together
  – System Downtime vs. Contact Center Call Volume
  – Real-Time Customer Subscriptions vs. Sales Forecasts
• Enables Real Time Monitoring and Management of Business and IT processes
  – Change and Downtime Management
  – Customer Registration Management


Fault & Performance Mgmt Production Improvement Program

[Figure: Production Improvement Program process flow with three lanes (Release Mgt, Prod Sec, Change Mgt).]

Corp Project, Ops Project, Non-Project

Release Mgt
• Release Planning
• Dev / Procurement
• Release Design, Build
• Release Acceptance
• Roll-out Planning
• Comm, Prep, Training
• Distribution / Installation

Prod Sec
• Policy, Procedures, Standards & Guidelines
• Security Consulting
• Security Assessment
• Security Monitoring
• Security Test & Sign off

Change Mgt
• REQUEST FOR CHANGE (RFC)
• STATUS CHANGE (1): Prioritization, Risk Assessment and Forward Schedule of Change
• STATUS CHANGE (2): Change Approval and Proj. Service Availability
• STATUS CHANGE (3): Final Change Approval and Implementation
• STATUS CHANGE (4): Review Changes
• CLOSED RFC

Metrics & Reporting

Mutual Benefit from EarthLink’s Innovation and Advanced Use of Micromuse Products

Micromuse OMNIbus, Impact, Webtop, RAD, NFSM

Source: EarthLink SLM Group


Fault & Performance Mgmt Business Activity Monitoring

Managing the Impact of Change and Downtime Activities on the Business and Operations


Fault & Performance Mgmt Overview

Drivers
• Adoption of ITIL/COBIT Best Practices for Change Management
  – Production Improvement Program (PIP), SOX Compliance, etc.
  – Significant change for many groups – Fear, Uncertainty, Doubt (FUD)
• No Real-Time Visibility into Change/Downtime Management Activities
  – Business Process
    • Who, What, When, Where, Why, and How, Cost, Risk, and Impact
  – Workflow – Monitor Lifecycle, SLAs, Bottlenecks – Is the process enabling Operations or is it a bottleneck?
  – Impact on Infrastructure – False Positives, Contact Center Call Volume (COGS)
• Drive out False Positives from Production Monitoring Systems
  – Huge burden on NOC and other support staff
• Desire to have Automated Remedy Trouble Ticket Creation
  – Reduces time to address problems, reduces MTTR


Fault & Performance Mgmt Overview

Solution
• Provide Real-Time Visibility into Change/Downtime Process
  – There are 12 pending and 24 scheduled change requests for tonight, 6 are underway and 8 start in 15 minutes or less
• Create Actionable Information
  – Dept. 828 has five outstanding major change requests, attention is needed
• Ensure Business Rules are Guiding/Enabling the Process – Not Hindering It
  – Eliminate FUD
• Report (dashboards, reports) on Process and Impact
  – NOC and other support groups know what’s happening during change and downtime windows
  – Management has oversight and visibility
  – Business understands impact of change and downtime activity
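
The "12 pending and 24 scheduled" style of status line above is essentially a count of change requests by state, plus a check on start times. A minimal Python sketch of that roll-up follows. The status names, record fields, and the 15-minute default are assumptions for illustration; nothing here reflects the actual RFC system or any Netcool API.

```python
from collections import Counter


def summarize_changes(requests, now, soon_seconds=15 * 60):
    """Count change requests by status and flag scheduled ones that start soon."""
    by_status = Counter(r["status"] for r in requests)
    starting_soon = sum(
        1
        for r in requests
        if r["status"] == "scheduled" and 0 <= r["start"] - now <= soon_seconds
    )
    return {
        "pending": by_status["pending"],
        "scheduled": by_status["scheduled"],
        "underway": by_status["underway"],
        "starting_soon": starting_soon,
    }


now = 1_000_000.0  # illustrative epoch timestamp
requests = [
    {"id": "RFC-1", "status": "pending", "start": now + 7200},
    {"id": "RFC-2", "status": "scheduled", "start": now + 600},
    {"id": "RFC-3", "status": "scheduled", "start": now + 3600},
    {"id": "RFC-4", "status": "underway", "start": now - 900},
]
summary = summarize_changes(requests, now)
print(summary)
```

Grouping the same records by owning department instead of status would give the "Dept. 828 has five outstanding major change requests" view.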


Fault & Performance Mgmt Implementation

• Micromuse Netcool/OMNIbus
  – Custom integration with Request for Change (RFC) and Downtime Management System
  – ObjectServer flexibility allows for definition of important business and IT data in each event to capture Change/Downtime Status
    • Service Impact, Business Impact, Customer Impact, SLA, Restoral Priority, Escalation Path, etc.
• Micromuse Netcool/Impact 3.0
  – Impact policies build lists in real time for all nodes listed in change/downtime request
  – As change/downtime activity progresses through its lifecycle, the change/downtime Netcool event changes states
  – Change/Downtime event suppression policy updates all incoming events that match node list during the maintenance window with “Suppression Status” and “Change/Downtime Reference Number”
• Micromuse Netcool/Webtop 1.2 – RAD 2.0
  – Process owner (Change/Downtime Management Group) dashboard for monitoring and managing the overall end-to-end process, workflow, and business impact
  – Business group dashboards for monitoring change/downtime activities within area of control (Network Engineering, MIS, etc.)
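
The suppression behavior described above (match incoming events against the node list of an active change/downtime request, then stamp them with a suppression status and reference number) can be sketched in plain Python. This is not Netcool/Impact policy language. The `ChangeRequest` type, the field names, and the sample values are assumptions chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    reference: str    # change/downtime reference number
    nodes: frozenset  # nodes listed in the request
    start: float      # maintenance window start (epoch seconds)
    end: float        # maintenance window end (epoch seconds)


def annotate_event(event, change_requests):
    """Return a copy of the event, tagged if it falls inside a maintenance window."""
    tagged = dict(event)
    tagged["suppression_status"] = "not_suppressed"
    tagged["change_reference"] = None
    for cr in change_requests:
        in_window = cr.start <= event["timestamp"] <= cr.end
        if event["node"] in cr.nodes and in_window:
            tagged["suppression_status"] = "suppressed"
            tagged["change_reference"] = cr.reference
            break
    return tagged


# Illustrative request: two routers under maintenance for one hour.
window = ChangeRequest("CHG-0001", frozenset({"router-1", "router-2"}), 1000.0, 4600.0)

during = annotate_event({"node": "router-1", "timestamp": 2000.0}, [window])
after = annotate_event({"node": "router-1", "timestamp": 5000.0}, [window])
other = annotate_event({"node": "server-9", "timestamp": 2000.0}, [window])
print(during["suppression_status"], during["change_reference"])
```

Tagging events rather than dropping them keeps them available for the dashboards and reports, which matches the slide's wording that the policy "updates" matching events.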


Fault & Performance Mgmt Webtop 1.2 Presentation


Fault & Performance Mgmt RAD 2.0 Presentation


Fault & Performance Mgmt Netcool Event Management

Change/Downtime Request Events

Suppressed Change/Downtime Activity Events

Change / Downtime Status

Event Suppressed by Change / Downtime

Change / Downtime ID


Fault & Performance Mgmt Future Enhancements

Planned Netcool/Impact Policies
• COGS Impact
  – Assess support cost impact due to change and downtime activities within Operations and Customer Support in Real-Time
• Data Gap Management
  – A common question: Why does my chart or graph have gaps?
  – The solution: Annotate graphs, charts, portals, etc. with the reason for data gaps caused by planned change/downtime activities
  – How: Integrate change and downtime event information with all performance, utilization, and capacity monitoring solutions via Impact 3.0
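
The data gap idea reduces to an interval-overlap check: for each gap in a metric series, look for a planned change/downtime window that overlaps it and attach that window's reference as the explanation. The Python sketch below shows only that matching step. The tuple layouts and the reference string are assumptions for illustration, and the integration with monitoring tools via Impact 3.0 is not modeled.

```python
def annotate_gaps(gaps, windows):
    """Attach a change/downtime reference to each data gap it overlaps.

    gaps:    list of (start, end) intervals where data is missing
    windows: list of (start, end, reference) planned change/downtime windows
    Returns a list of (gap_start, gap_end, reference_or_None).
    """
    annotated = []
    for g_start, g_end in gaps:
        reason = None
        for w_start, w_end, reference in windows:
            if g_start < w_end and w_start < g_end:  # the two intervals overlap
                reason = reference
                break
        annotated.append((g_start, g_end, reason))
    return annotated


# Illustrative data: one gap explained by planned work, one unexplained.
gaps = [(10, 20), (50, 60)]
windows = [(5, 25, "CHG-0001")]
result = annotate_gaps(gaps, windows)
print(result)
```

Gaps that come back with no reference are the ones worth investigating, since no planned activity accounts for them.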


Fault & Performance Mgmt Business Activity Monitoring

EarthLink Customer Registration, Provisioning, and Fulfillment Dashboards


Fault & Performance Mgmt RAD 2.0 Joint Development

Business Activity Monitoring: Real-Time Customer Registration Dashboard


Fault & Performance Mgmt RAD 2.0 Joint Development

Business Activity Monitoring: Real-Time Customer Registration Dashboard


Fault & Performance Mgmt Continuous Improvement

Building better Network and Systems Management

• Founded Atlanta Network and Systems Management Technical User Group (ANSMTUG) in January 2004
  – http://www.ansmtug.org
  – Metro-Atlanta Fortune 100, Service Providers, Enterprise, Media, and Emerging Technology Companies
    • BellSouth, The Home Depot, EarthLink, Southern Company, N2 Broadband, eDeltacom, Delta, CNN, Cingular, E*Trade, Knology Broadband, Cox Communications
• Customers helping Customers
  – Use Micromuse and other NSM products better
  – Collectively drive product requirements and features into Micromuse and other NSM vendors
• Special Interest Groups (SIG) Forming
  – Best practices for NSM using Micromuse Netcool Suite
  – Aligning NSM solutions to ITIL, MOF, CobIT, etc.


Fault & Performance Mgmt Challenges facing Micromuse

• Product Development, Focus, and Release Cycle
  – Business * Monitoring (BAM, BSM, BI, BTI, B-I-N-G-O)
  – Performance Monitoring & Management Solution
  – Features vs. New Product – Finding the Right Balance
  – Licensing – Needs Review and Simpler Approach
  – Support New Technologies Sooner Across Core Products
  – Uniform Release Cycle (core architecture components and capabilities)
• Discovery, Root Cause Analysis (RCA), Next-Gen Polling
  – Emerging Competition
  – Service / Application Discovery & RCA
  – Universal Poller Concept
• Out of the Box Functionality and Updates
  – Appearance of Requiring Too Much Customization
    • Competition is focusing on this
    • Many customers have product still on the shelf
  – Ease of Use
    • More out of the box, templates, examples, plug and play, wizards
    • Tools and Utilities section on Support website is a start
  – Improving Documentation


Fault & Performance Mgmt Closing and Q&A

Closing

Q&A

Doug McClure
Sr. Manager, Fault and Performance Mgmt
EarthLink Operations

[email protected] (W)678-362-7712 (C)

