
Network Monitoring

Chu-Sing Yang

Department of Electrical Engineering, National Cheng Kung University

Outline

Introduction
Network monitoring architecture
Performance monitoring
Fault monitoring
Accounting monitoring

Introduction

Network monitoring
  Observes and analyzes the status and behavior of the end systems, intermediate systems, and subnetworks that make up the configuration to be managed

Three major design areas for network monitoring
  Access to monitored information
    How to define monitoring information
    How to get that information from a resource to a manager
  Design of monitoring mechanisms
    How best to obtain information from resources
  Application of monitored information
    How the monitored information is used in various management functional areas

Network-Monitoring Information

Static information
  Characterizes the current configuration and the elements in the current configuration
    The number and identification of ports on a router
  Is typically generated by the element involved
  Is made available to a manager by an agent or a proxy

Dynamic information
  Is related to events in the network
    A change of state of a protocol machine
    Transmission of a packet on a network
  Is collected and stored by the network element responsible for the underlying events

Network-Monitoring Information (cont.)

Statistical information
  Is derived from dynamic information
    Average number of packets transmitted per unit time
  Is generated by any system that has access to the underlying dynamic information
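As a minimal sketch of how statistical information can be derived from dynamic information, the Python below computes an average packet rate from periodic readings of a cumulative packet counter. The class name, field names, and sample values are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of an agent's cumulative packet counter (hypothetical layout)."""
    timestamp_s: float  # time at which the counter was read, in seconds
    packets_sent: int   # cumulative packets transmitted (dynamic information)


def average_packet_rate(samples):
    """Return the average packets per second between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (last.packets_sent - first.packets_sent) / elapsed


# Illustrative readings taken every 10 seconds.
samples = [
    CounterSample(0.0, 1000),
    CounterSample(10.0, 1600),
    CounterSample(20.0, 2500),
]
rate = average_packet_rate(samples)  # statistical information: packets per second
```

Any system that can read the counter values, not only the element that owns them, could run this kind of calculation.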

Monitoring Real-Time System

Network-Monitoring System

Monitoring application
  Includes the functions of network monitoring that are visible to the user
    Performance monitoring, fault monitoring, accounting monitoring

Manager function
  Is the module at the network monitor
  Performs the basic monitoring function of retrieving information from other elements

Agent function
  Gathers and records management information for one or more network elements
  Communicates the information to the monitor

Managed objects
  Are the management information that represents resources and their activities

Monitoring agent
  An additional module concerned with statistical information
  Generates summaries and statistical analyses of management information

Network Monitoring Configurations

Network Monitoring Configurations

Network monitor
  Includes agent software and a set of managed objects
  To ensure that the monitor continues to perform its function
    Monitors the load on itself and on the network
    Monitors the status and behavior of the network monitor
    Monitors the amount of network management traffic into and out of the network monitor

External monitors (remote monitors)
  Include one or more agents that monitor traffic on a network

Proxy agent
  Used if network elements do not share a common network management protocol with the network monitor

Two-Tier Management Communication Model

[Figure: the model. Labels: Manager (Network Management System, Database); Network Queries; Unsolicited Events; Network Elements (three Managed Elements, each with an Agent, and one Unmanaged Element)]

Two-Tier Management Communication

[Figure: The Real World. Labels: Network Management System (CiscoWorks, HP-OpenView); Network Queries; Unsolicited Events; Network Elements (Router, Call Manager, Printer, Router, Switch); Unmanaged Element]

Three-Tier Management Communication

[Figure: The Model. Labels: Manager (NMS, MDB); Proxy Agent; RMON Probe; Network Elements (Managed Element with Agent)]

Three-Tier Management Communication

[Figure: The Real World. Labels: Network Management System (CiscoWorks, Concord eHealth); SwitchProbe; Managed Element (Switch)]

Polling and Event Reporting

Information that is useful for network monitoring is collected and stored by agents and made available to one or more manager systems

Polling
  Is a request-response interaction between a manager and an agent
  The manager queries any agent and requests the values of various information elements
  Is used to generate a report on behalf of a user and to respond to specific user queries

Event reporting
  Agent may generate a report
    Periodically, to give the manager its current status
    When a significant or unusual event occurs
  Manager
    Is a listener waiting for incoming information
    May preconfigure or set the reporting period
  Benefits
    Useful for detecting problems as soon as they occur
    More efficient than polling for monitoring objects whose states or values change relatively infrequently

Polling

Manager
  Queries any agent and requests the values of various information elements
  Learns about the configuration it is managing
  Periodically obtains an update of conditions
  Investigates an area in detail after being alerted to a problem

Agent
  Responds with information from its MIB
  Reports information matching certain criteria
  Supplies the manager with information about the structure of the MIB at the agent

Polling vs. Event Reporting

Factors in the choice
  The amount of network traffic generated by each method
  Robustness in critical situations
  The time delay in notifying the network manager
  The amount of processing in managed devices
  The tradeoffs of reliable versus unreliable transfer
  The network-monitoring applications being supported
  The contingencies required in case a notifying device fails before sending a report

In general
  SNMP approach: polling
  Telecommunications management systems: both
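The two interaction styles can be contrasted with a small in-memory sketch. This is not SNMP or any real management protocol; the Manager and Agent classes, the "error_count" object, and the threshold value are all hypothetical names chosen for illustration.

```python
class Manager:
    """Polls agents on request and listens for unsolicited event reports."""

    def __init__(self):
        self.events = []  # event reports received from agents

    def poll(self, agents, object_name):
        # Polling: the manager asks each agent for a value and gets a response.
        return {agent.name: agent.get(object_name) for agent in agents}

    def notify(self, agent_name, object_name, value):
        # Event reporting: the manager only listens; the agent initiates.
        self.events.append((agent_name, object_name, value))


class Agent:
    """Holds a tiny MIB-like table and reports when a threshold is crossed."""

    def __init__(self, name, manager, error_threshold):
        self.name = name
        self.manager = manager
        self.error_threshold = error_threshold
        self.mib = {"error_count": 0}

    def get(self, object_name):
        return self.mib[object_name]

    def record_errors(self, count):
        self.mib["error_count"] += count
        # An unusual event (too many errors) triggers an unsolicited report.
        if self.mib["error_count"] > self.error_threshold:
            self.manager.notify(self.name, "error_count", self.mib["error_count"])


manager = Manager()
agents = [Agent("router1", manager, 10), Agent("switch1", manager, 10)]
polled = manager.poll(agents, "error_count")  # manager-initiated
agents[0].record_errors(12)                   # agent-initiated report follows
```

In this sketch polling costs one exchange per agent every time the manager asks, while event reporting generates traffic only when something changes, which is the traffic tradeoff listed among the factors above.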

Performance Indicators

Difficulties in the selection and use of indicators
  There are too many indicators in use
  The meanings of most indicators are not yet clearly understood
  Some indicators are supported by some manufacturers only
  Most indicators are not suitable for comparison with each other
  Indicators are accurately measured but incorrectly interpreted
  The calculation of indicators takes too much time, and the final results can hardly be used for controlling the environment

Performance Indicators

Service-oriented measures (the highest priority)
  Availability
  Response time
  Accuracy

Efficiency-oriented measures
  Throughput
  Utilization

Availability

The percentage of time that a network system, a component, or an application is available for a user

Availability is based on the reliability of the individual components of a network
  MTBF: mean time between failures
  MTTR: mean time to repair
  Availability = MTBF / (MTBF + MTTR)

Availability of a system depends on the availability of its individual components plus the system organization
  Redundant components

Example with component availability A = 0.98
  A(serial) = 0.98 x 0.98 = 0.9604, about 0.96
  Unavailability = 1 - A = 0.02
  Unavailability(parallel) = 0.02 x 0.02 = 0.0004
  A(parallel) = 1 - 0.0004 = 0.9996
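The availability formulas on this slide can be checked with a few lines of Python. This is a sketch that assumes component failures are independent; the function names and the MTBF and MTTR figures (98 and 2 hours) are illustrative choices that happen to give A = 0.98.

```python
import math


def availability(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def serial_availability(*components):
    """All components must be up: multiply the availabilities."""
    return math.prod(components)


def parallel_availability(*components):
    """At least one component must be up: 1 minus the product of unavailabilities."""
    return 1.0 - math.prod(1.0 - a for a in components)


a = availability(mtbf=98.0, mttr=2.0)      # 0.98 for these assumed figures
a_serial = serial_availability(a, a)       # two components in series
a_parallel = parallel_availability(a, a)   # two redundant components
```

Placing the same two components in parallel rather than in series moves availability from about 0.96 to 0.9996, which is why redundancy appears as the main organizational lever.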

Availability (cont.)

Functional availability for a dual-link system
  Nonpeak periods account for 40% of requests; either link can handle the traffic load
  During peak periods, both links are required to handle the full load, but one link can handle 80% of the peak load
  Af = (capability when 1 link is up) * Pr[1 link up] + (capability when 2 links are up) * Pr[2 links up]
  Af (nonpeak) = 1 * [A(1-A) + (1-A)A] + 1 * (A*A)
  Af (peak) = 0.8 * [A(1-A) + (1-A)A] + 1 * (A*A)
  Af = 0.6 * Af (peak) + 0.4 * Af (nonpeak)
  If A = 0.9: Af (nonpeak) = 0.99, Af (peak) = 0.954, Af = 0.9684
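The dual-link calculation can be reproduced as follows. The function and parameter names are illustrative; the default values (60% of requests at peak, 80% capability on a single link during peak) come from the slide, and link failures are assumed to be independent.

```python
def functional_availability(a, peak_fraction=0.6, single_link_peak_capability=0.8):
    """Functional availability of a two-link system with per-link availability a."""
    p_one_up = a * (1 - a) + (1 - a) * a  # exactly one of the two links is up
    p_two_up = a * a                      # both links are up

    # Nonpeak: a single link carries the whole load, so capability is 1 either way.
    af_nonpeak = 1.0 * p_one_up + 1.0 * p_two_up
    # Peak: a single link carries only part of the load.
    af_peak = single_link_peak_capability * p_one_up + 1.0 * p_two_up

    af = peak_fraction * af_peak + (1 - peak_fraction) * af_nonpeak
    return af_nonpeak, af_peak, af


af_nonpeak, af_peak, af = functional_availability(0.9)
```

With A = 0.9 this gives the slide's figures of 0.99, 0.954, and 0.9684.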

Base Requirements for Availability

Secure facilities
Power systems
Circuit diversity
Intra-chassis redundancy
  Dual power supplies
  Online insertion and removal
  Multi-processor design

Response Time

The time it takes for a response to appear at a user’s terminal after a user action calls for it

The cost of shorter response time
  Computer processing power
    Increased processing power means increased cost
  Competing requirements
    Providing rapid response time to some processes may penalize other processes

Productivity increases as rapid response times are achieved
  A response time of up to 2 seconds is acceptable for most interactive applications

System Response Time

Elements of Response Time

Accuracy

The percentage of time that no errors occur in the transmission and delivery of information
  Built-in error correction mechanisms in protocols
    Data link and TCP protocols
  Monitoring the rate of errors
    May indicate an intermittently faulty line
    May indicate a source of noise or interference

Throughput

The rate at which application-oriented events occur
  Is an application-oriented measure
    Number of transactions of a given type over a period of time
    Number of customer sessions for a given application during a period of time
    Number of calls in a circuit-switched environment
  It is useful to track these measures over time
    Reveals performance trouble spots

Utilization

The percentage of the theoretical capacity of a resource (e.g., multiplexer, transmission line, switch) that is being used
  Is a more fine-grained measure than throughput
  Used to search for potential bottlenecks and areas of congestion
  Response time usually increases exponentially as the utilization of a resource increases
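As a rough illustration of the definition, utilization of a transmission line can be estimated from two readings of a cumulative octet counter. The function name, the counter values, and the 10 Mbps capacity are assumptions made for this sketch, and counter wraparound is ignored.

```python
def link_utilization(octets_start, octets_end, interval_s, capacity_bps):
    """Fraction of the line's theoretical capacity used during the interval."""
    if interval_s <= 0 or capacity_bps <= 0:
        raise ValueError("interval and capacity must be positive")
    bits_transferred = (octets_end - octets_start) * 8
    return bits_transferred / (capacity_bps * interval_s)


# Hypothetical readings: 37,500,000 octets moved in 60 s on a 10 Mbps line.
u = link_utilization(
    octets_start=0,
    octets_end=37_500_000,
    interval_s=60.0,
    capacity_bps=10_000_000,
)
```

Tracking this value per resource over time is one way to spot the bottlenecks and congested areas mentioned above.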

Simple Efficiency Analysis

Performance-Monitoring Function

Three components for performance monitoring
  Performance measurement
    Gathers statistics about network traffic and timing
    Accomplished by agent modules that observe the behavior of nodes
      Number of connections, the traffic per connection
    External (remote) monitor
      Able to offload the processing requirement from operational nodes to a dedicated system
  Performance analysis
    Consists of software for reducing and presenting the data
  Synthetic traffic generation
    Permits the network to be observed under a controlled load

Performance Measurement Reports

Host communication matrix
Group communication matrix
Packet type histogram
Data packet size histogram
Throughput-utilization distribution
Packet interarrival time histogram
Channel acquisition delay histogram
Communication delay histogram
Collision count histogram
Transmission count histogram
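Two of these reports, the host communication matrix and the data packet size histogram, can be sketched from a list of captured packet records. The record layout (source, destination, size in bytes), the sample data, and the 512-byte bin width are illustrative assumptions.

```python
from collections import Counter

# Hypothetical captured packets: (source host, destination host, size in bytes).
packets = [
    ("A", "B", 64),
    ("A", "B", 1500),
    ("B", "C", 512),
    ("A", "C", 64),
]


def host_communication_matrix(records):
    """Count packets for each (source, destination) pair."""
    return Counter((src, dst) for src, dst, _size in records)


def packet_size_histogram(records, bin_width=512):
    """Count packets per size bin; each key is the lower edge of the bin."""
    return Counter((size // bin_width) * bin_width for _src, _dst, size in records)


matrix = host_communication_matrix(packets)
histogram = packet_size_histogram(packets)
```

Here the matrix is stored sparsely, keyed by pair, which avoids allocating cells for hosts that never talk to each other.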

Inquiry Concerns Possible Errors and Inefficiencies

Are there source-destination (S-D) pairs with unusually heavy traffic?
Are some packet types of unusually high frequency, indicating an error or an inefficient protocol?
What is the distribution of data packet size?
What are the channel acquisition and communication delay distributions?
Are collisions a factor in getting packets transmitted?
What is the channel utilization and throughput?

Inquiry Concerns Increasing Traffic Load

What is the effect of traffic load on utilization, throughput and time delay?

When does traffic load start to degrade system performance?

What is the tradeoff among stability, throughput and delay?

What is the max capacity of the channel under normal operating conditions?

How many active users are necessary to reach this maximum?

Inquiry Concerns Varying Packet Sizes

Do larger packets increase or decrease throughput and delay?

How does constant packet size affect utilization and delay?

Statistical versus Exhaustive measurement

When an agent is monitoring a heavy load of traffic, it may not be practical to collect exhaustive data
  Exhaustive: monitors the total number of packets in a given time period between each S-D pair on the LAN
  Statistical: samples the traffic stream to estimate the value of the random variable
  Statistical methods: probabilities
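A minimal sketch of the statistical approach: instead of counting every packet, the monitor inspects one packet in every N and scales the counts up. Systematic 1-in-N sampling is used here only so the example is reproducible; a real monitor might sample randomly. The function name and traffic data are hypothetical.

```python
from collections import Counter


def estimate_pair_counts(packet_stream, sample_every):
    """Estimate packets per S-D pair by inspecting one packet in every sample_every."""
    sampled = Counter()
    for index, (src, dst) in enumerate(packet_stream):
        if index % sample_every == 0:
            sampled[(src, dst)] += 1
    # Scale each sampled count by the sampling interval to estimate the total.
    return {pair: count * sample_every for pair, count in sampled.items()}


# Hypothetical traffic: 80 packets from A to B followed by 20 from C to D.
stream = [("A", "B")] * 80 + [("C", "D")] * 20
estimates = estimate_pair_counts(stream, sample_every=10)
```

The estimate is exact here because the traffic is perfectly regular; with real traffic it carries a sampling error that shrinks as more packets are inspected.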

Fault Monitoring Objective

Identify faults as quickly as possible after they occur and identify the cause of the fault so that remedial action may be taken

Problems of fault observation: locating and diagnosing faults
  Unobservable faults
    Certain faults are inherently unobservable locally
    The existence of a deadlock between cooperating distributed processes may not be observable locally
  Partially observable faults
    A node failure may be observable, but the observation may be insufficient to pinpoint the problem
    The failure of a low-level protocol
  Uncertainty in observation
    Lack of response from a remote device may mean that the device is stuck, the network is partitioned, congestion caused the response to be delayed, or the local timer is faulty

Fault Monitoring (cont.)

Problems in fault isolation
  Multiple potential causes
    With multiple technologies, the potential points of failure and the types of failures increase
  Too many related observations
    A single failure may generate many secondary failures
  Interference between diagnosis and local recovery procedures
    Local recovery procedures may destroy important evidence concerning the nature of the fault, disabling diagnosis
  Absence of automated testing tools
    Testing to isolate faults is difficult and costly to administer


Fault-Monitoring Functions

Detect faults
  Agent reports errors independently to one or more managers
  Agent maintains a log of significant events and errors
  Criteria for issuing a fault report
    Avoids overloading

Anticipate faults
  Set up thresholds
    Packet loss rate

An effective user interface
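Anticipating faults with thresholds can be sketched as a simple classification of the observed packet loss rate. The function name and the warning and critical levels (1% and 5%) are arbitrary illustrative values, not recommendations from the slides.

```python
def classify_packet_loss(packets_sent, packets_lost, warning=0.01, critical=0.05):
    """Return 'ok', 'warning', or 'critical' for the observed packet loss rate."""
    if packets_sent <= 0:
        return "ok"  # nothing was sent, so there is no loss to report
    loss_rate = packets_lost / packets_sent
    if loss_rate >= critical:
        return "critical"
    if loss_rate >= warning:
        return "warning"
    return "ok"


status_low = classify_packet_loss(1000, 5)    # loss rate 0.005
status_mid = classify_packet_loss(1000, 20)   # loss rate 0.02
status_high = classify_packet_loss(1000, 80)  # loss rate 0.08
```

Reporting only when the classification changes, rather than on every measurement, is one way to apply the "avoids overloading" criterion listed above.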

Test a Fault Monitoring System

Connectivity test
Data integrity test
Protocol integrity test
Data saturation test
Connection saturation test
Response-time test
Loopback test
Function test
Diagnostic test

Accounting Monitoring

Keeps track of users’ usage of network resources
  An internal accounting system assesses the overall usage of resources and determines the cost of shared resources to each department
  System offers a public service

Resources that may be subject to accounting
  Communications facilities
    LANs, WANs, leased lines, dial-up lines, and PBX systems
  Computer hardware
    Workstations and servers
  Software and systems
    Applications and utility software in servers, a data center, and end-user sites
  Services
    Includes all commercial communication and information services

Collect Accounting Data

Based on the requirements of the organization

Communications-related accounting data might be gathered and maintained on each user
  User identification
  Receiver
  Number of packets
  Security level
    Identifies the transmission and processing priorities
  Time stamps
    Associated with each transmission and processing event
    Transaction start and stop times
  Network status codes
    Indicate the nature of any detected errors or malfunctions
  Resources used
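A per-user accounting record and a simple usage summary might look like the sketch below. The field names mirror some of the items listed above but are otherwise hypothetical, as are the sample records and the convention that status code 0 means no error.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AccountingRecord:
    """One communication event charged to a user (hypothetical layout)."""
    user_id: str
    receiver: str
    packets: int
    start_s: float    # transaction start time stamp, in seconds
    stop_s: float     # transaction stop time stamp, in seconds
    status_code: int  # assumed convention: 0 means no error detected


def usage_by_user(records):
    """Total packets and connection time for each user."""
    totals = defaultdict(lambda: {"packets": 0, "seconds": 0.0})
    for record in records:
        totals[record.user_id]["packets"] += record.packets
        totals[record.user_id]["seconds"] += record.stop_s - record.start_s
    return dict(totals)


records = [
    AccountingRecord("alice", "server1", 120, 0.0, 30.0, 0),
    AccountingRecord("bob", "server2", 40, 10.0, 15.0, 0),
    AccountingRecord("alice", "server2", 80, 40.0, 50.0, 0),
]
usage = usage_by_user(records)
```

Grouping by a department identifier instead of a user identifier would give the per-department cost allocation that an internal accounting system needs.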

Summary

Network monitoring is the most fundamental aspect of automated network management
  Gathers information about the status and behavior of network elements
    Static information
    Dynamic information
    Statistical information

Agent collects local management information and transmits it to one or more NMSs

Each NMS includes network management application software plus software for communication with agents

Summary

Performance monitoring
  Availability
  Response time
  Accuracy
  Throughput
  Utilization

Fault monitoring
  Identifies faults as quickly as possible
  Identifies the cause of the fault so that corrective action can be taken
  The fault-monitoring function is complicated

Accounting monitoring
  Gathers usage information for each resource

Date post:	19-Dec-2015