
Thomas J. Watson Research Center
PO Box 218
Yorktown Heights, NY 10598

Challenges and Opportunities in Autonomic Computing

June 25, 2002
presentation to ICS'02

Alfred Z. Spector
VP, Services & Software
IBM Research
[email protected]

Copyright IBM 2002

1

AZS Presentation to ICS'02 June 25 02 Copyright IBM

Abstract

Significant advances are required to make systems more adaptive to the growing range of impulses affecting them and to reduce their total cost of management. Progress seems to require significant innovation in adaptive techniques, systems architecture, software engineering, and standards. In this presentation, I will survey the space of the requirements and draw example problems from real systems. I'll then discuss the space of our research at IBM and highlight some of the more compelling research projects we are doing in the area. I'll conclude with a summary of some key challenges for the broader community as they relate to autonomic computing.

2


Introduction
Autonomic Computing
  Motivation
  Space
  Goals
  Examples, Mature and Research
  Our Research Agenda

The Space of Research

Outline

3

[Figure: world map of IBM Research labs (Watson, Almaden, Austin, Zürich, Haifa, Tokyo, Beijing, Delhi) with establishment years (1955, 1961, 1972, 1982, 1986, 1995, 1995, 1998), plus two historical notes: 1945, 1st IBM Research Lab in NY (Columbia U); 1952, San Jose, California.]

IBM Research Worldwide

4

Geometric growth now generating really large quantum gains
Installed base has reached critical mass
Building blocks, painstakingly developed over many years, work
Society increasingly accepts & needs I/T

So many more things are now feasible
But challenges in harnessing I/T technology grow; e.g., using massive parallelism

Unabashed Technical Optimism

5

Autonomic Computing

6

An application server typically supports
  5 Applications
  10 EJBs
  Hundreds of servlets
  ~ 100 configuration parameters

A web server typically serves
  Thousands of web artifacts
  ~ 20 configuration parameters

Failure protocols for each component are different: time-out, number of retries, where and what they log, how they fail

The increasing challenge of managing large systems is due to the inherent complexity of the solution and the sheer number of heterogeneous components

[Figure: Typical Enterprise System Configuration (Complex System Topology). A front end for online customer service, organized into Presentation, Business Logic, and Gateway tiers (Netscape Ent. Server and WebSphere App Server on SUN, MQ on AIX, application and gateway logging, a hub server group), connects over HTTP, MQ, JDBC, SNA, and APPC LU 6.2 to back-end systems: Sybase security servers and Sybase Expressnet DB servers, DSS gateways, e-mail and address capture, LocalDirector, TPF, CICS, and IMS DSUs on OS390 sysplexes.]

Messaging has ~ 50 configuration parameters

CIOs speak out:

“Most of my costs are really pure maintenance and operations – keeping the processes running that keep the ship afloat. Our development budget suffers.”

“Y2K and 9/11 have forced us to look at what we have – and we have too much complexity.”

7

Increasing emphasis on Total Cost Of Ownership
Increasing emphasis on QoS
Increasing emphasis on time to market installing applications
  Which creates change and instability
Improvement in Manageability
  Absolute requirement w/exponential growth of boxes outstripping productivity improvements for administrators

Problems:
  Increasing complexity
  Management is people intensive
    Cost of management
    Availability of people and skills to do management

Solutions must be open

Industry Trends

8


Towards Autonomic Computing

Self-optimizing: System designed to automatically manage resources to allow the servers to meet the enterprise needs in the most efficient fashion

Self-configuring: System designed to define itself "on the fly"

Self-protecting: System designed to protect itself from any unauthorized access anywhere

Self-healing: Autonomic problem determination and resolution

9

IBM Goals

Create and deploy self-managing infrastructure technologies to reduce complexity, lower cost of ownership, and increase reliability
Establish an architectural framework for leadership in Autonomic Computing
Provide technologies to reduce the cost of managing systems; that is, automating automation (automation squared)

10

[Figure: axes labeled Failure, Load Variability, Attack, and Other dimensions, with scale labels Random, Malicious, Catastrophic, Sparse, Aggressive, Small, and Highly malicious.]

Autonomic Computing Dimensions

11

Principles
  Local management structure
  Redundancy, heterogeneity
  Dynamic run-time binding
  Validation and self-protection

Requirements
  System is always on, always live
  Zero IT administration
  Any system element can fail

Problems
  Testing / verification
  Root cause analysis
  Global system management
  "Evolving" software vs. upgrading
  Machine-optimizable components
  Standards

Principles, Requirements, Problems

12

[Figure: a scale of scope running Component, System, Campus, Enterprise, Society; the Component end is labeled "Static, predesigned, fewer options" and the far end "Dynamic, self-assembling, many options".]

Architectural Styles at Various Stages

13

zSeries CPU recovery
  CPU duplex
zSeries Sysplex
WebSphere
DB2 self management
Intrusion detection and rejection
Antivirus immune system
Network Dispatcher

IBM Example Mature Technologies

14

Duplicated:
  Complex controls
  Arithmetic dataflow

Shared:
  Cache controls
  Cache data/address flow

R-Unit
  Check all state updates
  Preserve known good state

If error
  1. Stop state updates
  2. Refresh from saved state
  3. Restart CPU

If error persists
  1. Extract saved state (SE)
  2. Load into spare CPU
  3. Start spare CPU

[Figure: I-Unit and E-Unit, each present as an unchecked copy and a mirror copy, a parity-protected Cache, and an R-Unit with ECC on saved state. Flows are labeled Address, Cache data, Instructions, Results / state updates, and Saved state data. CFW 3/30/00]

zSeries CPU Error Detection and Recovery
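
The recovery steps on this slide can be mimicked in software as a rough analogy. The sketch below is not the zSeries hardware mechanism; `Cpu`, `faults_left`, `run_with_recovery`, and `max_retries` are hypothetical names invented for illustration. It keeps a known good state, restores it on a detected error, retries, and fails over to a spare if the error persists.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Toy CPU (hypothetical): `faults_left` is how many more operations will fail."""
    name: str
    faults_left: int = 0
    state: dict = field(default_factory=dict)

    def execute(self, op):
        # The error is detected before the state update is committed.
        if self.faults_left > 0:
            self.faults_left -= 1
            raise RuntimeError(f"{self.name}: error detected")
        key, value = op
        self.state[key] = value


def run_with_recovery(cpu, spare, ops, max_retries=1):
    """Run ops; on error restore the saved state and retry, then fail over."""
    saved = dict(cpu.state)                  # known good state
    log = []
    for op in ops:
        attempts = 0
        while True:
            try:
                cpu.execute(op)
                saved = dict(cpu.state)      # checkpoint the new good state
                break
            except RuntimeError:
                cpu.state = dict(saved)      # stop updates, refresh from saved state
                if attempts < max_retries:
                    attempts += 1            # restart the same CPU
                    log.append(f"retry on {cpu.name}")
                else:
                    spare.state = dict(saved)  # load saved state into the spare
                    cpu = spare                # start the spare CPU
                    log.append(f"failover to {cpu.name}")
                    attempts = 0
    return cpu, log


if __name__ == "__main__":
    final, log = run_with_recovery(Cpu("cpu0", faults_left=2), Cpu("spare"),
                                   [("a", 1), ("b", 2)])
    print(final.name, final.state, log)
```

A single transient fault is absorbed by the retry; only a persistent fault moves the work to the spare, which starts from the last checkpoint rather than from scratch.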

15

[Figure: four SMP CECs, each running CICS, IMS, and DB2, connected through duplicated Sysplex Timers, Coupling Facilities, and ESCON Directors, serving CICS, IMS, and DB2 applications.]

No SPOF - hardware or software
CEC: 16 CPU SMP
Sysplex: 32 CECs or 512 processors

zSeries Parallel Sysplex

16

Nanny process to restart application server processes that have failed or hung.
Basic resource management - threads, connections, bean pools allocated as needed (within pre-set min and max).
Optimized workload management using both session and transactional affinity.
Transaction log recoverability.
Centralized administration for clustering. Can duplicate server configuration across servers.

WebSphere Application Server: Today

17

Initial Design and Layout
  Hardware configuration (a la Estimator for DB2 for 390)
  Logical database design
  Physical data layout (partitioning, allocation to nodegroups, clustering)
  Auxiliary data structures (indexes, ASTs)

Configuration parameters
  DB2 for Unix, Windows, & OS/2 V7.1:
    73 database manager parms, 72 database parameters (vs. 52 in V5!)
    330 registry variables!
    Memory allocation among various heaps, buffer pools, etc.
  DB2 for OS/390 and z/OS V7:
    200 DB2 system parameters (ZPARMs) -- 116 hidden
    Memory allocation among EDM, Statement Caching, and Sort pools
    60 bufferpools with choices of Virtual, Hiper, and DataSpace-backed

Dynamic Monitoring & Adjustment
  Database statistics to collect and when, Clustering and REORG
  Buffer pool hit ratios, Memory allocation
  Problem determination (deadlocks, bad plans, ...)
  System / query status & visualization of all the above

Huge Scope of DBA Responsibilities

18

Event Correlation to improve accuracy and scalability
Intrusion Tolerance to ensure that the IDS itself is protected against attack
Behavior-Based Intrusion Detection to enable detection of previously unknown attacks
Distributed Event Triage and Correlation
Agent-based ID systems

State of the Art in Intrusion Detection

19

[Figure: clients and administrators at Widget Co. connect over an active network to an Automated Virus Analysis Center, which performs three steps: Analyze, Derive Cure, Distribute.]

* Sold as Norton Anti-Virus Corporate Edition

Digital Immune System

20

[Figure: clients and administrators at two companies, Widget Co. and Wodget Co., connect over an active network to the Automated Virus Analysis Center.]

Digital Immune System

21

[Figure: Internet traffic arrives at an Active dispatcher backed by a Standby.]

Multiple Virtual Clusters
  Multiple services within each Cluster
  Separate balancing parameters used for each Cluster
Automatically balances load within each Cluster
Fault tolerant: standby ND automatically takes over for failed active ND
Requires no operating system modifications
Requires no physical alteration to network
Requires no specific code on servers. Server agent code can be installed but is not required
Utilizes up to three metrics to balance within each Cluster
  Static: based on counts at ND (no server code)
  Advisors: Measures performance of specific application (server code)
  System: Measures overall performance of the system (utilizes OS performance monitors)
Dynamic feedback used to balance the load
  Monitors systems and uses a weighted combination of the metrics to reassign load
  Weighted round-robin, weights automatically adjusted based on feedback
Remotely manageable
  Interfaces available to connect to a broader autonomic system
  Start, Stop, Quiesce machines in a Cluster
  Add or Remove Clusters
Layer 3 and layer 7 routing supported

Network Dispatcher: Autonomic Load Distribution
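
To make "weighted round-robin, weights automatically adjusted based on feedback" concrete, here is a minimal sketch. The slides do not give Network Dispatcher's actual formula, so everything here is an assumption: the metric names (`connections`, `advisor`, `system`), their normalization to 0..1, the proportions, and the mapping from combined load to an integer weight. The scheduling step uses the well-known smooth weighted round-robin scheme, chosen only for illustration.

```python
def combined_load(metrics, proportions):
    """Weighted combination of load metrics, each assumed normalized to 0..1."""
    return sum(proportions[name] * metrics[name] for name in proportions)


def update_weights(server_metrics, proportions, max_weight=10):
    """Feedback step: map each server's combined load to a weight in 1..max_weight."""
    weights = {}
    for server, metrics in server_metrics.items():
        load = combined_load(metrics, proportions)
        weights[server] = max(1, round(max_weight * (1.0 - load)))
    return weights


def schedule(weights, n):
    """Smooth weighted round-robin: return the next n server picks."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for server, weight in weights.items():
            current[server] += weight
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks


if __name__ == "__main__":
    # Hypothetical proportions for the static, advisor, and system metrics.
    proportions = {"connections": 0.5, "advisor": 0.3, "system": 0.2}
    observed = {
        "s1": {"connections": 0.2, "advisor": 0.2, "system": 0.2},
        "s2": {"connections": 0.9, "advisor": 0.9, "system": 0.9},
    }
    weights = update_weights(observed, proportions)
    print(weights, schedule(weights, sum(weights.values())))
```

Rerunning `update_weights` on each monitoring interval closes the feedback loop: a server that slows down reports higher load, gets a smaller weight, and receives fewer of the next picks.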

22

[Figure: Four-tier Web Serving Architecture. The tiers are front-end caching PODs (eNetDispatchers in front of caches), origin caches, origin servers, and content management servers, fed by content sources (Results, Lotus News/Photos, Publishing, CIS/NetCam). Labels mark cache HIT and MISS paths and a pre-feed path from the content management servers.]

IBM Olympic Experiences

23

Oceano: provisioning and running stateless servers
eWLM: ebusiness Work Load Manager, open servers
eBPM: WebSphere
ABLE: AI, Policy engine, and Agents
Blue Gene: Cellular computing architecture
Security
Self healing

Ongoing IBM Research Projects

24

Ongoing IBM Research Projects

25

[Figure: requests for two hosted sites, Macy's and SportsWeb, arrive through a router; in the Océano case the router throttles incoming requests.]

Today:
  Fixed resource allocation
  Separate management
  Best effort basis, using own resources

Océano:
  Virtualized Hardware
  Single Point of System Management
  Track performance metrics
  Aggregate & correlate metrics (end-to-end) to SLA violations
  Orchestrate reconfiguration

Océano Project

26

Self-tuning, End-to-End Performance Management:
  Dynamic allocation of server resources
  Workload balancing & routing
  Cross platform reporting
  Policy based for various classes of users & applications

[Figure: requests flow from the Internet through Appliance Servers and Web Application Servers to Data and Transaction Servers, which also connect to Existing Business Data and, over the Internet/Extranet, to Business Partners.]

Distributed Workload Management

27

Adjust every configuration parameter dynamically, while the system is in use!
Expand and shrink memory usage, based on workload
Automatically profile workloads and create/recommend indexes, partitioning, clustering, summary tables, ... to improve performance
Automatically detect the need, estimate the duration of, and schedule maintenance operations (like reorg, statistics collection, backup, load, rebind)
Observe actual performance and exploit that information to improve operations. Recommend action when things aren't the way you want them to be.
Project into the future to detect coming problems, like low memory or constrained disk space, and notify you by page or e-mail days or weeks in advance!

Wouldn't it be great if your database was as easy to maintain and as self-controlled as your fridge?

Can your database do this? Soon it will...

SMART's Vision
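
As one illustration of "project into the future to detect coming problems", the sketch below fits a straight line to past disk-usage samples and estimates how many days remain before the disk fills. It is not taken from SMART; the function names, the linear-growth assumption, and the 14-day lead time are all assumptions made for this example.

```python
def days_until_full(samples, capacity):
    """Least-squares line through (day, used) samples.

    Returns the number of days after the last sample at which projected
    usage reaches `capacity`, or None if usage is not growing.
    """
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    if sxx == 0 or sxy <= 0:
        return None
    slope = sxy / sxx
    intercept = mean_used - slope * mean_day
    full_day = (capacity - intercept) / slope
    return max(0.0, full_day - samples[-1][0])


def should_notify(samples, capacity, lead_days=14):
    """True when the projected fill date falls within the lead time."""
    remaining = days_until_full(samples, capacity)
    return remaining is not None and remaining <= lead_days


if __name__ == "__main__":
    usage_gb = [(0, 50), (1, 52), (2, 54), (3, 56)]   # (day, GB used)
    print(days_until_full(usage_gb, capacity=100))
    print(should_notify(usage_gb, capacity=100))
    print(should_notify(usage_gb, capacity=70))
```

A real system would likely need seasonality and noise handling; the point here is only that a warning "days or weeks in advance" requires a model of the trend, not just a threshold on the current value.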

28

Java-based agent framework and AI component library
Agent builder, test and debug tools, multi-agent platform
Add adaptivity through on-line machine learning (data mining)
Policy-based behavior using rules-based knowledge representation
Add reflexive, reactive, and deliberative goal-seeking behaviors
Distributed hierarchical communication and feedback control

[Figure: an AbleAgent with Sensors and Effectors around Learning, Reasoning, and Intelligent Control components; System Monitors and System Controls sit outside the agent.]

ABLE Autonomic Components

29

Chip (2 processors): 2.8/5.6 GF/s, 4 MB
Board (8 chips, 2x2x2): 22.4/44.8 GF/s, 2.08 GB
Rack (128 boards, 8x8x16): 2.9/5.7 TF/s, 266 GB
System (64 cabinets, 32x32x64): 180/360 TF/s, 16 TB

[Figure: chip diagram with two 440 cores, EDRAM, and I/O.]

Autonomic Computing Issues: checkpointing, routing around failed nodes, data migration, communication route optimization

Blue Gene/L System
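
The packaging figures on this slide roll up by simple multiplication, which the short check below reproduces. The constants are the slide's own numbers; the function names are made up for this example. The computed system totals come out near 183.5 and 367 TF/s, which the slide quotes as 180/360.

```python
# Figures from the slide; the roll-up itself is plain multiplication.
CHIP_GFLOPS = (2.8, 5.6)     # the slide's two per-chip figures, in GF/s
CHIPS_PER_BOARD = 8          # 2x2x2
BOARDS_PER_RACK = 128        # 8x8x16 chips per rack
RACKS = 64                   # 32x32x64 chips in the system


def roll_up(per_chip):
    """Return (board, rack, system) throughput in GF/s for one per-chip figure."""
    board = per_chip * CHIPS_PER_BOARD
    rack = board * BOARDS_PER_RACK
    system = rack * RACKS
    return board, rack, system


def chip_counts():
    """Return the number of chips per board, per rack, and in the system."""
    per_rack = CHIPS_PER_BOARD * BOARDS_PER_RACK
    return CHIPS_PER_BOARD, per_rack, per_rack * RACKS


if __name__ == "__main__":
    for per_chip in CHIP_GFLOPS:
        board, rack, system = roll_up(per_chip)
        print(f"{per_chip} GF/s chip: board {board:.1f} GF/s, "
              f"rack {rack / 1000:.2f} TF/s, system {system / 1000:.1f} TF/s")
    print("chips per board, rack, system:", chip_counts())
```

The chip counts also match the quoted dimensions: 2x2x2 = 8, 8x8x16 = 1024, and 32x32x64 = 65536.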

30

Behavior-Based Intrusion Detection
Secure Distributed Storage
Secure Boot & System Configuration Monitoring
Tamper-responsive hardware
Traps for catching worms and DoS agents
Certified systems that guarantee program separation

Current Security Research

31

Self-managing storage systems
Self-managing data base systems
  LEO, DB2 Learning Optimizer
Architecture for control of autonomic systems

A Few New Projects

32

[Figure: a Database, a File System, and a Storage System, each with its own Autonomic Manager and Policy and History store, exchanging Policy and Alerts; labels also show a Standard Porting Layer and Enhancement additions. Access patterns (Sequential, Skip Sequential, Random) are shown in both Space (1, 2, 3) and Device (a, b, c) terms.]

ALOMS-Tango: Storage for Data Base Systems

33

[Figure: SQL Compilation feeds the Optimizer, which uses Statistics and Adjustments to choose the Best Plan for Plan Execution; Estimated Cardinalities from the optimizer are compared with Actual Cardinalities from execution.]

1. Monitor
2. Analyze
3. Feedback
4. Exploit

Learning in Query Optimization
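
The monitor, analyze, feedback, exploit cycle can be sketched as follows. This is a toy model, not LEO's implementation: the class name, the per-predicate multiplicative correction, and the example predicate and numbers are all assumptions chosen to show how a gap between estimated and actual cardinalities can feed back into later estimates.

```python
class FeedbackOptimizer:
    """Toy cardinality estimator that learns per-predicate correction factors."""

    def __init__(self, selectivity):
        self.selectivity = dict(selectivity)  # predicate -> selectivity from statistics
        self.adjustment = {}                  # predicate -> learned correction factor

    def estimate(self, predicate, table_rows):
        """Exploit: apply any learned adjustment to the statistics-based estimate."""
        factor = self.adjustment.get(predicate, 1.0)
        return table_rows * self.selectivity[predicate] * factor

    def feedback(self, predicate, estimated, actual):
        """Analyze and feed back: scale the adjustment by actual / estimated."""
        if estimated > 0:
            previous = self.adjustment.get(predicate, 1.0)
            self.adjustment[predicate] = previous * actual / estimated


if __name__ == "__main__":
    optimizer = FeedbackOptimizer({"city = 'NY'": 0.01})
    rows = 100_000
    estimated = optimizer.estimate("city = 'NY'", rows)
    actual = 5_000                     # monitor: row count seen during execution
    optimizer.feedback("city = 'NY'", estimated, actual)
    print(estimated, optimizer.estimate("city = 'NY'", rows))
```

Keeping adjustments separate from the base statistics means the learned corrections can be discarded or recomputed without touching the statistics themselves.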

34

[Figure: a stack of layers (Application and Integration Middleware, DataBase, Operating System, File System, Storage System, Processor System), each shown as a Managed Component paired with an Autonomic Manager. Each Autonomic Manager performs policy based management (measure, model, direct) and keeps a Policy and History store. Labels on the connections: Policy, Alerts, Measurement, Workload and service agreements, Hints and Directions, Managed Operations, and, toward the Administrator, Alerts and measurement.]

Autonomic Computing - The Whole System
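
The "measure, model, direct" loop of an autonomic manager can be sketched for a single managed component. The slides do not specify an interface, so the classes, the utilization policy, the 100 requests per second per worker capacity, and the worker-pool resizing are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Policy:
    target_utilization: float = 0.7
    max_workers: int = 16


@dataclass
class ManagedComponent:
    name: str
    workers: int
    offered_load: float                 # requests per second
    per_worker_capacity: float = 100.0  # assumed capacity of one worker

    def measure(self):
        """Measurement: current utilization of the component."""
        return self.offered_load / (self.workers * self.per_worker_capacity)

    def direct(self, workers):
        """Managed operation: resize the worker pool."""
        self.workers = workers


@dataclass
class AutonomicManager:
    policy: Policy
    history: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def step(self, component):
        utilization = component.measure()                    # measure
        self.history.append((component.name, utilization))
        # model: workers needed to bring utilization to the policy target
        needed = max(1, math.ceil(
            component.workers * utilization / self.policy.target_utilization))
        if needed > self.policy.max_workers:
            # Escalate to the administrator when policy cannot be met.
            self.alerts.append(f"{component.name}: policy cannot be met")
            needed = self.policy.max_workers
        component.direct(needed)                             # direct
        return needed


if __name__ == "__main__":
    manager = AutonomicManager(Policy())
    db = ManagedComponent("db", workers=2, offered_load=400.0)
    print(manager.step(db), round(db.measure(), 3), manager.alerts)
```

The alert path mirrors the diagram's "Alerts and measurement" channel: the manager acts on its own within policy and involves the administrator only when it runs out of room.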

35

[Figure: an autonomic element containing a management unit (Mgt. Unit) that monitors and controls a functional unit (Func. Unit), with access control and four channels: management channel (input), management channel (output), functional channel (input), functional channel (output).]

An Autonomic Element
  Encapsulates services

Functional unit
  Provides the service
  Web server, DB, etc.

Management unit
  Controls functional unit
  Control access
  Negotiates for input, output services

Autonomic System Architecture: An Autonomic Element

36

[Figure: Web Servers, a DB, and Storage services around a Directory.]

Systems
  Webs of elements
  Composition of elements
  Composition of services
  Late binding
    Dynamic
    By negotiated SLA

Self-configuring
  New web server added
  Negotiates with directory for service
  Gets location of DB, storage services

(Leg of a) Strawman Architecture: An Autonomic System

37

[Figure: Web Servers, a DB, and Storage services around a Directory.]

Systems
  Webs of elements
  Composition of elements
  Composition of services
  Late binding
    Dynamic
    By negotiated SLA

Self-configuring
  New web server added
  Negotiates with directory for service
  Gets location of DB, storage services
  Negotiates with DB, storage services

(Leg of a) Strawman Architecture: An Autonomic System

38

[Figure: three Web Servers, a DB, and Storage services around a Directory.]

Systems
  Webs of elements
  Composition of elements
  Composition of services
  Late binding
    Dynamic
    By negotiated SLA

Self-healing
  Storage service dies

(Leg of a) Strawman Architecture: An Autonomic System

39

[Figure: three Web Servers, a DB, and Storage services around a Directory.]

Systems
  Webs of elements
  Composition of elements
  Composition of services
  Late binding
    Dynamic
    By negotiated SLA

Self-healing
  Storage service dies
  DB gets location of new storage service

(Leg of a) Strawman Architecture: An Autonomic System

40

[Figure: three Web Servers, a DB, and Storage services around a Directory.]

Systems
  Webs of elements
  Composition of elements
  Composition of services
  Late binding
    Dynamic
    By negotiated SLA

Self-healing
  Storage service dies
  DB gets location of new storage service
  DB binds new storage service
  DB initializes new storage service

(Leg of a) Strawman Architecture: An Autonomic System

41

[Figure: three Web Servers, a DB, and Storage services around a Directory.]

Systems
  Webs of elements
  Composition of elements
  Composition of services
  Late binding
    Dynamic
    By negotiated SLA

Self-healing
  Storage service dies
  DB gets location of new storage service
  DB binds new storage service
  DB initializes new storage service
  Back in business with no interruption!

(Leg of a) Strawman Architecture: An Autonomic System
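
The self-healing sequence above (storage service dies, DB gets the location of a new one from the directory, binds it, initializes it) can be sketched with late binding through a directory. All class and method names below are hypothetical, and the sketch leaves out SLA negotiation and any recovery of data held by the failed service.

```python
class Storage:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.initialized = False
        self.data = {}

    def initialize(self):
        self.initialized = True

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class Directory:
    """Registry that hands out the location of a live service of a given kind."""

    def __init__(self):
        self.services = {}

    def register(self, kind, service):
        self.services.setdefault(kind, []).append(service)

    def lookup(self, kind):
        for service in self.services.get(kind, []):
            if service.alive:
                return service
        raise LookupError(f"no live {kind} service")


class Database:
    def __init__(self, directory):
        self.directory = directory
        self.storage = None
        self.bind()

    def bind(self):
        self.storage = self.directory.lookup("storage")  # get location, bind
        self.storage.initialize()                         # initialize

    def put(self, key, value):
        try:
            self.storage.write(key, value)
        except ConnectionError:
            self.bind()                                   # storage died: re-bind
            self.storage.write(key, value)


if __name__ == "__main__":
    directory = Directory()
    first, second = Storage("storage-1"), Storage("storage-2")
    directory.register("storage", first)
    directory.register("storage", second)
    db = Database(directory)
    db.put("a", 1)
    first.alive = False            # storage service dies
    db.put("b", 2)                 # the caller sees no failure
    print(db.storage.name, second.data)
```

Because the DB never holds a fixed address, only a binding obtained from the directory, callers of `put` do not see the failover; that is the sense of "late binding" in the bullets above.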

42

A long list of difficult problems
  Systems
    An extremely different way of creating systems
  Theory
    Difficult issues in complex systems, etc.

Candidate Grand Challenge in Computing Research Association (CRA) Grand Challenges Conference (ongoing today)

Autonomic Computing:A Grand Challenge?

43

Architecture and basic principles
Fundamentals and theory
Standards
Product applications + implications
Software engineering discipline

Proof points for all above

(IBM) Autonomic Computing Action Framework

44

[Table, flattened in extraction. Columns: Component, System, Federation; an arrow labeled Time runs across them. Entries are listed per research area.]

Optimization Algorithms: Data Mining, Continual Optimization; Workload management; Extended cross-system workload management
Control Theory: Resource SLA management; Component policy management and enforcement; Monitoring; Aggregating data and keeping relevant history; End-to-end service level agreement management
Distributed Alg. & Control: Scripting sensors & control; Distributed Alg. & Control; Optimization without complete or up-to-date information
Security: Intrusion detection; Sensor, Instrumentation; Federated Intrusion Detection
Special Languages: Translate Business Policy to component policies; SLA specification language and processor, Policy specification language and processor; Rationalizing distributed policy
Adaptive/Learning Theories: Call Center Optimization, SLA and Policy Engine
Complex Systems: Automated Operation, Agent Technology, Autonomic Computing framework; Federated System Architecture
Infrastructure: Component level problem determination, Unit of work tracking

The Space of Research

45

Thank you for listening.

46

