Date posted: 19-Dec-2015
Network Monitoring
Chu-Sing Yang
Department of Electrical Engineering, National Cheng Kung University
Outline
  Introduction
  Network monitoring architecture
  Performance monitoring
  Fault monitoring
  Accounting monitoring
Introduction
Network monitoring
  Observes and analyzes the status and behavior of the end systems, intermediate systems, and subnetworks that make up the configuration to be managed
Three major design areas for network monitoring
  Access to monitored information
    How to define monitoring information
    How to get that information from a resource to a manager
  Design of monitoring mechanisms
    How best to obtain information from resources
  Application of monitored information
    How the monitored information is used in various management functional areas
Network-Monitoring Information
Static information
  Characterizes the current configuration and the elements in the current configuration
    Example: the number and identification of ports on a router
  Is typically generated by the element involved
  Is made available to a manager by an agent or a proxy
Dynamic information
  Is related to events in the network
    A change of state of a protocol machine
    Transmission of a packet on a network
  Is collected and stored by the network element responsible for the underlying events
Network-Monitoring Information (cont.)
Statistical information
  Is derived from dynamic information
    Example: the average number of packets transmitted per unit time
  Is generated by any system that has access to the underlying dynamic information
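As a minimal sketch of how statistical information can be derived from dynamic information, the following Python snippet computes an average packet rate from periodic readings of a packet counter. The function name and the sample readings are hypothetical and serve only as an illustration.

```python
def average_packet_rate(samples):
    """Average packets per second from (timestamp_seconds, packet_counter) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    # Counter difference divided by elapsed time gives the average rate.
    return (c1 - c0) / (t1 - t0)


# Hypothetical counter readings taken every 10 seconds (dynamic information).
samples = [(0, 1000), (10, 1500), (20, 2300), (30, 2800)]
rate = average_packet_rate(samples)
print(rate)  # 60.0 packets per second
```

Any system that can read the counter, whether the element itself, an agent, or a manager, could run this kind of calculation.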
Monitoring Real-Time System
Network-Monitoring System
Monitoring application
  Includes the functions of network monitoring that are visible to the user
  Performance monitoring, fault monitoring, accounting monitoring
Manager function
  Is the module at the network monitor
  Performs the basic monitoring function of retrieving information from other elements
Agent function
  Gathers and records management information for one or more network elements
  Communicates the information to the monitor
Managed objects
  Are the management information that represents resources and their activities
Monitoring agent
  An additional module concerned with statistical information
  Generates summaries and statistical analyses of management information
Network Monitoring Configurations
Network monitor
  Includes agent software and a set of managed objects
  To assure that the monitor continues to perform its function
    Monitors the load on itself and on the network
    Monitors the status and behavior of the network monitor
  Monitors the amount of network management traffic into and out of the network monitor
External monitors (remote monitors)
  Include one or more agents that monitor traffic on a network
Proxy agent
  Used when network elements do not share a common network management protocol with the network monitor
Two-Tier Management Communication Model
[Figure: the manager (network management system with its database) sends network queries to, and receives unsolicited events from, agents on managed network elements; an unmanaged element is also shown.]
Two-Tier Management Communication: The Real World
[Figure: network management systems (CiscoWorks, HP-OpenView) exchange network queries and unsolicited events with network elements (router, Call Manager, printer, switch); an unmanaged element is shown with a proxy agent.]
Three-Tier Management Communication: The Model
[Figure: labels include NMS, manager with MDB, RMON probe, and network elements (managed element with agent).]
Three-Tier Management Communication: The Real World
[Figure: labels include network management systems (CiscoWorks, Concord eHealth), SwitchProbe, and switch (managed element).]
Polling and Event Reporting
Information that is useful for network monitoring is collected and stored by agents and made available to one or more manager systems
Polling
  Is a request-response interaction between a manager and an agent
  The manager queries any agent and requests the values of various information elements
  Is used to generate a report on behalf of a user and to respond to specific user queries
Event Reporting
  Agent may generate a report
    Periodically, to give the manager its current status
    When a significant or unusual event occurs
  Manager
    Is a listener waiting for incoming information
    Can preconfigure or set the reporting period
  Benefits
    Useful for detecting problems as soon as they occur
    More efficient than polling for monitoring objects whose states or values change relatively infrequently
Polling
Manager
  Queries any agent and requests the values of various information elements
  Learns about the configuration it is managing
  Periodically obtains an update of conditions
  Investigates an area in detail after being alerted to a problem
Agent
  Responds with information from its MIB
  Reports information matching certain criteria
  Supplies the manager with information about the structure of the MIB at the agent
Polling vs. Event Reporting
Factors in the choice
  The amount of network traffic generated by each method
  Robustness in critical situations
  The time delay in notifying the network manager
  The amount of processing in managed devices
  The tradeoffs of reliable versus unreliable transfer
  The network-monitoring applications being supported
  The contingencies required in case a notifying device fails before sending a report
In general
  SNMP approach: mainly polling
  Telecommunications management systems: both
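The traffic factor can be illustrated with a rough message count. The sketch below is not tied to SNMP or any real protocol; it assumes one request and one response per poll and one message per reported state change, and the link-status trace is hypothetical.

```python
def polling_messages(duration_s, poll_interval_s):
    """Messages exchanged when a manager polls one agent.

    Assumes one request and one response per poll.
    """
    polls = duration_s // poll_interval_s
    return 2 * polls


def event_report_messages(values):
    """Messages sent when an agent reports only changes in a monitored value."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


# Hypothetical link status sampled once per second for 60 seconds:
# up for 40 s, down for 5 s, then up again for 15 s.
status = ["up"] * 40 + ["down"] * 5 + ["up"] * 15

print(polling_messages(len(status), poll_interval_s=5))  # 24
print(event_report_messages(status))                     # 2
```

For a value that changes rarely, event reporting generates far fewer messages. With polling, a change may go unnoticed for up to one polling interval, and an event report is lost if the notifying device fails before sending it, which is why the other factors in the list also matter.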
Performance Indicators
Difficulties in selection and use of the indicators
  There are too many indicators in use
  The meanings of most indicators are not yet clearly understood
  Some indicators are supported by some manufacturers only
  Most indicators are not suitable for comparison with each other
  Indicators are accurately measured but incorrectly interpreted
  The calculation of indicators takes too much time, and the final results can hardly be used for controlling the environment
Performance Indicators (cont.)
Service-oriented measures (the highest priority)
  Availability
  Response time
  Accuracy
Efficiency-oriented measures
  Throughput
  Utilization
Availability
The percentage of time that a network system, a component, or an application is available for a user
Availability is based on the reliability of the individual components of a network
  MTBF: mean time between failures
  MTTR: mean time to repair
  Availability = MTBF / (MTBF + MTTR)
Availability of a system depends on the availability of its individual components plus the system organization
  Redundant components
  Example with two components, each with availability A = 0.98
    A(serial) = 0.98 x 0.98 = 0.96
    Unavailability = 1 - A = 0.02
    Unavailability of parallel = 0.02 x 0.02 = 0.0004
    A(parallel) = 1 - 0.0004 = 0.9996
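The availability formulas above can be checked with a short Python sketch. The function names are illustrative, and the MTBF and MTTR figures are hypothetical values chosen to give A = 0.98.

```python
def availability(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def serial(*components):
    """All components must be up: multiply the availabilities."""
    result = 1.0
    for a in components:
        result *= a
    return result


def parallel(*components):
    """At least one component must be up: 1 minus the product of unavailabilities."""
    unavailable = 1.0
    for a in components:
        unavailable *= 1.0 - a
    return 1.0 - unavailable


# Hypothetical component: 980 hours between failures, 20 hours to repair.
print(round(availability(980, 20), 2))  # 0.98
print(round(serial(0.98, 0.98), 4))     # 0.9604
print(round(parallel(0.98, 0.98), 4))   # 0.9996
```

The serial value 0.9604 is the unrounded form of the 0.96 shown on the slide.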
Availability (cont.)
Functional availability for a dual-link system
  Nonpeak periods account for 40% of requests; either link can handle the traffic load
  During peak periods, both links are required to handle the full load, but one link can handle 80% of the peak load
  Af = (capability when 1 link is up) * Pr[1 link up] + (capability when 2 links are up) * Pr[2 links up]
  With link availability A = 0.9
    Af (nonpeak) = 1 * [A(1-A) + (1-A)A] + 1 * (A*A) = 0.99
    Af (peak) = 0.8 * [A(1-A) + (1-A)A] + 1 * (A*A) = 0.954
    Af = 0.6 * Af (peak) + 0.4 * Af (nonpeak) = 0.9684
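The dual-link calculation can be reproduced step by step. This is a small sketch with an illustrative function name; the values A = 0.9, the 80% single-link capability, and the 60/40 peak/nonpeak split come from the example above.

```python
def functional_availability(a, capability_one_link):
    """Functional availability of two identical links with availability a."""
    p_one = a * (1 - a) + (1 - a) * a  # exactly one of the two links is up
    p_two = a * a                      # both links are up
    return capability_one_link * p_one + 1.0 * p_two


a = 0.9
af_nonpeak = functional_availability(a, 1.0)  # one link carries the full nonpeak load
af_peak = functional_availability(a, 0.8)     # one link carries 80% of the peak load
af = 0.6 * af_peak + 0.4 * af_nonpeak         # peak periods account for 60% of requests

print(round(af_nonpeak, 4), round(af_peak, 4), round(af, 4))  # 0.99 0.954 0.9684
```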
Base Requirements for Availability
  Secure facilities
  Power systems
  Circuit diversity
  Intra-chassis redundancy
    Dual power supplies
    Online Insertion and Removal
    Multi-processor design
Response Time
The time it takes for a response to appear at a user’s terminal after a user action calls for it
The cost of shorter response time
  Computer processing power
    Increased processing power means increased cost
  Competing requirements
    Providing rapid response time to some processes may penalize other processes
Productivity increases as rapid response times are achieved
  A response time of up to 2 seconds is acceptable for most interactive applications
System Response Time
Elements of Response Time
Accuracy
The percentage of time that no errors occur in the transmission and delivery of information
Built-in error correction mechanisms in protocols
  Data link and TCP protocols
Monitoring the rate of errors
  May indicate an intermittently faulty line
  May indicate a source of noise or interference
Throughput
The rate at which application-oriented events occur
Is an application-oriented measure
  Number of transactions of a given type for a period of time
  Number of customer sessions for a given application during a period of time
  Number of calls for a circuit-switched environment
Is useful to track these measures over time
  Helps reveal performance trouble spots
Utilization
The percentage of the theoretical capacity of a resource (e.g., multiplexer, transmission line, switch) that is being used
Is a more fine-grained measure than throughput
Used to search for potential bottlenecks and areas of congestion
Response time usually increases exponentially as the utilization of a resource increases
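As a rough illustration of the definition, utilization of a transmission line can be computed from two readings of a byte counter. The function name, the counter readings, and the 10 Mbit/s capacity below are assumptions made for the example.

```python
def utilization_percent(bytes_start, bytes_end, interval_s, capacity_bps):
    """Percentage of a line's theoretical capacity used during an interval."""
    bits = (bytes_end - bytes_start) * 8
    return 100.0 * bits / (interval_s * capacity_bps)


# Hypothetical: 45,000,000 bytes carried in 60 s on a 10 Mbit/s line.
print(utilization_percent(0, 45_000_000, 60, 10_000_000))  # 60.0
```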
Simple Efficiency Analysis
Performance-Monitoring Function
Three components for performance monitoring
  Performance measurement
    Gathers statistics about network traffic and timing
    Accomplished by agent modules that observe the behavior of nodes
      Number of connections, the traffic per connection
    External (remote) monitor
      Able to offload the processing requirement from operational nodes to a dedicated system
  Performance analysis
    Consists of software for reducing and presenting the data
  Synthetic traffic generation
    Permits the network to be observed under a controlled load
Performance Measurement Reports
  Host communication matrix
  Group communication matrix
  Packet type histogram
  Data packet size histogram
  Throughput-utilization distribution
  Packet interarrival time histogram
  Channel acquisition delay histogram
  Communication delay histogram
  Collision count histogram
  Transmission count histogram
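Two of these reports can be sketched from a list of observed packets. The record layout (source, destination, size in bytes), the 512-byte bin width, and the sample packets are hypothetical; a real monitor would read them from captured traffic.

```python
from collections import Counter


def host_communication_matrix(packets):
    """Count packets for each (source, destination) pair."""
    matrix = Counter()
    for src, dst, _size in packets:
        matrix[(src, dst)] += 1
    return matrix


def packet_size_histogram(packets, bin_width=512):
    """Count packets per size bin; each bin is keyed by (low, high) in bytes."""
    hist = Counter()
    for _src, _dst, size in packets:
        low = (size // bin_width) * bin_width
        hist[(low, low + bin_width - 1)] += 1
    return hist


# Hypothetical observed packets: (source, destination, size in bytes).
packets = [
    ("A", "B", 64),
    ("A", "B", 1500),
    ("B", "A", 64),
    ("C", "A", 600),
    ("A", "B", 700),
]

matrix = host_communication_matrix(packets)
hist = packet_size_histogram(packets)
print(matrix[("A", "B")])  # 3
print(hist[(0, 511)])      # 2
```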
Inquiries Concerning Possible Errors and Inefficiencies
Are there source-destination (S-D) pairs with unusually heavy traffic?
Are some packet types of unusually high frequency, indicating an error or an inefficient protocol?
What is the distribution of data packet size?
What are the channel acquisition and communication delay distributions?
Are collisions a factor in getting packets transmitted?
What is the channel utilization and throughput?
Inquiries Concerning Increasing Traffic Load
What is the effect of traffic load on utilization, throughput and time delay?
When does traffic load start to degrade system performance?
What is the tradeoff among stability, throughput and delay?
What is the max capacity of the channel under normal operating conditions?
How many active users are necessary to reach this maximum?
Inquiries Concerning Varying Packet Sizes
Do larger packets increase or decrease throughput and delay?
How does constant packet size affect utilization and delay?
Statistical versus Exhaustive Measurement
When an agent is monitoring a heavy load of traffic, it may not be practical to collect exhaustive data
  Exhaustive: monitors the total number of packets in a given time period between each S-D pair on the LAN
  Statistical: samples the traffic stream to estimate the value of the random variable
Statistical methods: probabilities
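The difference between the two approaches can be sketched as follows. This example assumes simple random sampling in which each packet is kept with a fixed probability; the slides do not prescribe a sampling method, and the traffic mix is hypothetical.

```python
import random
from collections import Counter


def sampled_estimate(packets, probability, rng):
    """Estimate per-pair packet counts from a random sample of the stream.

    Each packet is kept with the given probability, and the sampled
    counts are scaled back up by 1 / probability.
    """
    sample = [p for p in packets if rng.random() < probability]
    return {pair: count / probability for pair, count in Counter(sample).items()}


# Hypothetical traffic: 7500 packets from A to B and 2500 from C to D.
packets = [("A", "B")] * 7500 + [("C", "D")] * 2500

exact = Counter(packets)  # exhaustive measurement: every packet is counted
estimate = sampled_estimate(packets, 0.1, random.Random(1))  # about 10% of packets

for pair in exact:
    print(pair, exact[pair], round(estimate.get(pair, 0)))
```

The estimate examines roughly one packet in ten, so it is cheaper to collect but only approximates the exact counts; the error shrinks as the sample grows.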
Fault Monitoring
Objective
  Identify faults as quickly as possible after they occur and identify the cause of the fault so that remedial action may be taken
Problems of fault observation: locating and diagnosing faults
  Unobservable faults
    Certain faults are inherently unobservable locally
    The existence of a deadlock between cooperating distributed processes may not be observable locally
  Partially observable faults
    A node failure may be observable, but the observation may be insufficient to pinpoint the problem
    Example: the failure of a low-level protocol
  Uncertainty in observation
    Lack of response from a remote device may mean that the device is stuck, the network is partitioned, congestion caused the response to be delayed, or the local timer is faulty
Fault Monitoring (cont.)
Problems in fault isolation
  Multiple potential causes
    When multiple technologies are involved, the potential points of failure and the types of failures increase
  Too many related observations
    A single failure may generate many secondary failures
  Interference between diagnosis and local recovery procedures
    Local recovery procedures may destroy important evidence concerning the nature of the fault, disabling diagnosis
  Absence of automated testing tools
    Testing to isolate faults is difficult and costly to administer
Fault-Monitoring Functions
Detect faults
  Agent reports errors independently to one or more managers
  Agent maintains a log of significant events and errors
  Criteria for issuing a fault report
    Avoid overloading
Anticipate faults
  Set up thresholds
    Packet loss rate
An effective user interface
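Anticipating faults with a threshold can be sketched as a simple check on the packet loss rate. The 2% threshold, the function names, and the counter values are assumptions for illustration, not recommended settings.

```python
def packet_loss_rate(sent, received):
    """Fraction of sent packets that were not received."""
    if sent == 0:
        return 0.0
    return (sent - received) / sent


def check_threshold(sent, received, threshold=0.02):
    """Return a warning string when the loss rate exceeds the threshold, else None."""
    rate = packet_loss_rate(sent, received)
    if rate > threshold:
        return f"packet loss rate {rate:.1%} exceeds threshold {threshold:.1%}"
    return None


print(check_threshold(1000, 995))  # None
print(check_threshold(1000, 950))  # packet loss rate 5.0% exceeds threshold 2.0%
```

Reporting only when the threshold is crossed also serves as a criterion for issuing a fault report, so the manager is not overloaded with routine readings.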
Test a Fault Monitoring System
  Connectivity test
  Data integrity test
  Protocol integrity test
  Data saturation test
  Connection saturation test
  Response-time test
  Loopback test
  Function test
  Diagnostic test
Accounting Monitoring
Keep track of users’ usage of network resources
  An internal accounting system assesses the overall usage of resources and determines the cost of shared resources to each department
  Systems that offer a public service
Resources that may be subject to accounting
  Communications facilities
    LANs, WANs, leased lines, dial-up lines, and PBX systems
  Computer hardware
    Workstations and servers
  Software and systems
    Applications and utility software in servers, a data center, and end-user sites
  Services
    Include all commercial communication and information services
Collect Accounting Data
Based on the requirements of the organization
Communications-related accounting data might be gathered and maintained on each user
  User identification
  Receiver
  Number of packets
  Security level
    Identifies the transmission and processing priorities
  Time stamps
    Associated with each transmission and processing event
    Transaction start and stop times
  Network status codes
    Indicate the nature of any detected errors or malfunctions
  Resources used
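One way to picture such accounting data is as a record per transaction that is later aggregated per user. The field names, types, and sample records below are hypothetical; they simply mirror the items in the list above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AccountingRecord:
    """One communications-related accounting entry (illustrative layout)."""
    user_id: str         # user identification
    receiver: str        # destination of the transmission
    packets: int         # number of packets
    security_level: int  # transmission and processing priority
    start_time: float    # transaction start time stamp (seconds)
    stop_time: float     # transaction stop time stamp (seconds)
    status_code: int     # network status code, 0 meaning no error detected


def packets_per_user(records):
    """Total packets sent by each user, as a basis for charging."""
    totals = defaultdict(int)
    for r in records:
        totals[r.user_id] += r.packets
    return dict(totals)


records = [
    AccountingRecord("alice", "server1", 120, 1, 0.0, 4.5, 0),
    AccountingRecord("bob", "server2", 40, 2, 1.0, 2.0, 0),
    AccountingRecord("alice", "server2", 30, 1, 5.0, 6.0, 0),
]
print(packets_per_user(records))  # {'alice': 150, 'bob': 40}
```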
Summary
Network monitoring is the most fundamental aspect of automated network management
  Gathers information about the status and behavior of network elements
    Static information
    Dynamic information
    Statistical information
Agent collects local management information and transmits it to one or more NMSs
Each NMS includes network management application software plus software for communication with agents
Summary (cont.)
Performance monitoring
  Availability
  Response time
  Accuracy
  Throughput
  Utilization
Fault monitoring
  Identifies faults as quickly as possible
  Identifies the cause of the fault to take corrective action
  Fault monitoring function is complicated
Accounting monitoring
  Gathers usage information for each resource