
Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Date post: 30-Dec-2015
Page 1: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Lili Qiu, Paramvir Bahl, Ananth Rao, and Lidong Zhou
Microsoft Research

Presented by Maitreya Natu

Page 2: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Network Management

Faulty network

Root cause

Faults directory

Corrective measure

Healthy network

Page 3: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Tasks involved in Network Management

• Continuously monitoring the functioning of the network
• Collecting information about the nodes and the links
• Removing inconsistencies and noise from the reported information
• Analyzing the information
• Taking appropriate actions to improve network reliability and performance

Page 4: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Challenges in wireless networks

• Dynamic and unpredictable topology
  • Link errors due to fluctuating environmental conditions
  • Node mobility
• Limited capacity
  • Scarcity of resources
• Link attacks

Page 5: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Proposed framework

Reproduce, inside a simulator, the real-world events that took place

Use online trace-driven simulation to detect faults and analyze the root causes

Page 6: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Network Management

Healthy network

Types of faults

Network model

Faults directory

Creating a network model

Page 7: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Network Management

Faulty network

Types of faults

Network model

Detected faults

Fault diagnosis

Faults directory

Page 8: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Network Management

Types of faults

Network model

What-if analysis

Detected faults

Faults directory

Corrective measures

Page 9: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Key issues

How to accurately reproduce, inside a simulator, what happened in the network

How to build fault diagnosis on top of a simulator to perform root cause analysis

Page 10: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Accurate modeling

• Use real traces from the diagnosed network
  • Removes dependency on generic theoretical models
  • Captures nuances of the hardware, software, and environment of the particular network
• Collect good-quality data
  • By developing a technique to effectively rule out erroneous data

Page 11: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Fault diagnosis

Performance data emitted by trace-driven simulation is used as the baseline

Any significant deviation from the baseline indicates a potential fault

Simulator selectively injects a set of suspected faults and searches for the set that best reproduces the observed performance

An efficient algorithm is designed to determine root causes
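
The detection step on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the per-link dictionaries, and the threshold value are all assumptions made for the example.

```python
def find_deviations(expected, observed, threshold):
    """Return links whose observed metric deviates from the simulated baseline.

    expected, observed: dicts mapping a link name to a metric (here, loss rate).
    threshold: hypothetical cut-off for a "significant" deviation.
    """
    deviations = {}
    for link, baseline in expected.items():
        if link not in observed:
            continue  # no measurement reported for this link
        diff = observed[link] - baseline
        if abs(diff) > threshold:
            deviations[link] = diff
    return deviations


# Illustrative numbers only, not measurements from the paper.
expected_loss = {"1-2": 0.02, "1-4": 0.03, "2-3": 0.01}
observed_loss = {"1-2": 0.25, "1-4": 0.30, "2-3": 0.02}
suspects = find_deviations(expected_loss, observed_loss, threshold=0.10)
```

Links flagged here are only candidates; the search over injected faults is what attributes them to a root cause.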

Page 12: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

System Overview

[Figure: system overview. Cleaned trace data (link RSS, link load, routing updates, topology changes) drive the network simulator (traffic simulator, interference injection). The simulator's expected loss rate, throughput, and noise are compared (+/-) with the measured loss rate, throughput, and noise; the resulting error drives a search over the faults directory (e.g., link/node failure).]

1. Receive cleaned data
2. Drive simulation
3. Compute expected performance
4. Compare expected and average performance
5. Discrepancy found
6. Search for the set of faults that gives the best explanation
7. Report the cause of failure

Page 13: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Why Simulation Based Diagnosis?

• Much better insight into the network behavior than any heuristic or theoretical technique
• Highly customizable and applies to a large class of networks
• Ability to perform what-if analysis
  • Helps to foresee the consequences of a corrective action
• Recent advances in simulators have made their use for real-time analysis possible

Page 14: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Accurate modeling

Healthy network

Types of faults

Network model

Faults directory

Page 15: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Current network models

Bayesian networks to map symptom-fault dependencies

Context-free grammars

Correlation matrix

Page 16: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Can online simulations be used as a core tool?

Page 17: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Building confidence in simulator accuracy

Problem
• Hard to accurately model the physical layer and the RF propagation
• Traffic demands on the router are hard to predict

Page 18: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Building confidence in simulator accuracy

Problem
• Hard to accurately model the physical layer and the RF propagation
• Traffic demands on the router are hard to predict

Solution
• "After the fact" simulation
• Agents periodically report information about the link conditions and traffic patterns to the simulator

Page 19: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Simulations when the RF condition of the link is good

Modeling the overheads of the protocol stack, such as parity bits, MAC-layer back-off, IEEE 802.11 inter-frame spacing and ACK, and headers.

Modeling the contention from flows within the interference and communication ranges.

Page 20: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Simulations with varying received signal strength

Measured throughput matches the simulator's estimate closely when signal quality is good

Simulator estimate deviates from the measured throughput when signal strength is poor

Page 21: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Why do simulation results deviate in case of poor signal strength?

• Lack of an accurate model of packet loss as a function of packet size, RSS, and ambient noise
  • Depends on the signal processing hardware and the RF antenna within the wireless cards
• Lack of accurate auto-rate control
  • Adjustment of the sending rate is done by WLAN cards based on the transmission conditions

Page 22: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

How to model auto-rate control done by WLAN cards?

• Use trace-driven simulation
• When auto-rate is in use
  • Collect the rate at which the wireless card is operating and provide the reported rate to the simulator
• Otherwise
  • Data rate is known to the simulator

Page 23: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

How to model accurate packet loss as a function of packet size, RSS, and ambient noise?

• Use offline analysis
• Calibrate the wireless cards and create a database associating environmental factors with expected performance
  • E.g., mapping from signal strength and noise to loss rate
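
One way to picture such a calibration database is a small lookup table. The sketch below is hypothetical: it assumes the mapping can be keyed on the signal-to-noise gap (RSS minus noise) alone, and the bucket edges and loss rates are made-up values, not calibration results from the paper.

```python
import bisect

# Hypothetical calibration table. Values are illustrative, not measurements.
SNR_EDGES_DB = [5, 10, 15, 20]                    # bucket boundaries in dB
LOSS_BY_BUCKET = [0.90, 0.50, 0.20, 0.05, 0.01]   # one loss rate per bucket


def expected_loss_rate(rss_dbm, noise_dbm):
    """Look up the calibrated loss rate for a given RSS and noise level."""
    snr_db = rss_dbm - noise_dbm
    bucket = bisect.bisect_right(SNR_EDGES_DB, snr_db)
    return LOSS_BY_BUCKET[bucket]


strong_link = expected_loss_rate(rss_dbm=-60, noise_dbm=-90)  # 30 dB gap
weak_link = expected_loss_rate(rss_dbm=-88, noise_dbm=-90)    # 2 dB gap
```

A real table would be built per card model from offline measurements and would likely include packet size as an additional key, as the slide title suggests.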

Page 24: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Experiment to model the loss rates due to poor signal strength

• Collect another set of traces
  • Slowly send out packets
  • Place packet sniffers near both the sender and the receiver, and derive the loss rate from the packet-level trace
• Seed the wireless link in the simulator with a Bernoulli loss rate that matches the loss rate in the real traces
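
The two steps of this experiment can be sketched in a few lines: derive a loss rate from sender-side and receiver-side sniffer traces, then drop packets independently with that probability (a Bernoulli loss model). The function names and the trace representation (lists of packet sequence numbers) are assumptions made for illustration.

```python
import random


def loss_rate_from_traces(sent_ids, received_ids):
    """Fraction of sent packets that never appear in the receiver-side trace."""
    sent = set(sent_ids)
    delivered = sent & set(received_ids)
    return 1.0 - len(delivered) / len(sent)


def bernoulli_link(packets, loss_rate, rng):
    """Deliver each packet independently with probability 1 - loss_rate."""
    return [p for p in packets if rng.random() >= loss_rate]


# Toy traces: 10 packets sent, 7 seen near the receiver.
measured = loss_rate_from_traces(range(10), [0, 1, 2, 3, 4, 5, 6])

# Seed the simulated link with the measured loss rate.
rng = random.Random(42)
delivered = bernoulli_link(list(range(10000)), measured, rng)
simulated_loss = 1.0 - len(delivered) / 10000
```

With enough packets the simulated loss rate approaches the measured one, which is the property the slide relies on.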

Page 25: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Estimated and measured throughput when compensating for the loss rate due to poor signal strength

• Even though the match is not perfect, it is not expected to be a problem, because many routing protocols try to avoid the use of poor-quality links
  • Poor-quality links are used only when certain parts of the mesh network have poor connectivity to the rest of the network
  • In a well-engineered network, not many nodes depend on such bad links for routing
• Loss rate and measured throughput do not monotonically decrease with the signal strength due to the effect of auto-rate

Page 26: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Stability of channel conditions

How rapidly do channel conditions change, and how often should a trace be collected?

Page 27: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Temporal fluctuation in RSS

• Fluctuation magnitude is not significant
• Relative quality of signals across different numbers of walls remains stable

Page 28: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Stability of channel conditions

How rapidly do channel conditions change, and how often should a trace be collected?

• When the environment is generally static, nodes may report only the average and standard deviation of the RSS to the manager every few minutes
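
A node that reports only the average and standard deviation does not need to store every RSS sample. The sketch below keeps a running summary with Welford's online algorithm; the class name and interface are hypothetical, and the slides do not say how the agents actually compute these statistics.

```python
import math


class RssSummary:
    """Running mean and standard deviation of RSS samples (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, rss_dbm):
        self.count += 1
        delta = rss_dbm - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (rss_dbm - self.mean)

    def report(self):
        """Return (mean, population standard deviation) for the reporting interval."""
        if self.count == 0:
            return 0.0, 0.0
        return self.mean, math.sqrt(self._m2 / self.count)


summary = RssSummary()
for sample in [-60, -62, -58, -60]:
    summary.add(sample)
mean_rss, std_rss = summary.report()
```

The agent would send the two numbers from `report()` to the manager and start a fresh summary for the next interval.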

Page 29: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Dealing with imperfect data

• By neighborhood monitoring
  • Each node reports performance and traffic statistics for its incoming and outgoing links
  • And for other links in its communication range
    • Possible when the node is in promiscuous mode
  • Thus multiple reports are sent for each link
• Redundant reports can be used to detect inconsistency
  • Find the minimum set of nodes that can explain the inconsistency in the reports
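
Finding a smallest set of nodes that explains every inconsistency is a covering problem, so a simple approximation is a greedy pass: repeatedly blame the node involved in the most unresolved conflicts. The sketch below uses that greedy heuristic as an assumption; it is not necessarily the procedure used by the authors, and the report format and tolerance are invented for the example.

```python
from collections import Counter


def find_suspect_nodes(reports, tolerance):
    """Greedily pick nodes that explain all conflicting link reports.

    reports: {link: {reporting_node: reported_value}} (hypothetical format).
    tolerance: largest difference between two reports still deemed consistent.
    """
    conflicts = set()
    for link, by_node in reports.items():
        nodes = sorted(by_node)
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                if abs(by_node[a] - by_node[b]) > tolerance:
                    conflicts.add((link, a, b))

    suspects = []
    while conflicts:
        counts = Counter(n for _, a, b in conflicts for n in (a, b))
        worst = max(sorted(counts), key=counts.get)  # ties go to the lowest id
        suspects.append(worst)
        conflicts = {c for c in conflicts if worst not in c[1:]}
    return suspects


# Node 3 overhears both links and reports values that disagree with the endpoints.
reports = {
    "1-2": {1: 100, 2: 100, 3: 40},
    "2-4": {2: 50, 4: 50, 3: 10},
}
suspects = find_suspect_nodes(reports, tolerance=5)
```

Reports from the suspect nodes can then be discarded before the remaining data is fed to the simulator.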

Page 30: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Summary

• How to accurately model the real behavior?
  • Solution: Use trace-based simulation
  • Problem: Simulation results are good for strong signals but deviate for bad RF conditions
    • Need to model the auto-rate control
      • Use trace-driven data
    • Need to model the loss rate due to poor signal strength
      • Use offline analysis
• How often should a trace be collected?
  • Very little data (average and standard deviation of RSS), at fairly low time granularity, as channels are relatively stable
• How to deal with imperfect data?
  • By neighborhood monitoring

Page 31: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Fault diagnosis

Faulty network

Types of faults

Network model

Detected faults

Faults directory

Page 32: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Current fault diagnosis approaches

• AI techniques
  • Rule-based systems
  • Neural networks
• Model traversing techniques
  • Dependency graphs
  • Causality graphs
  • Bayesian networks

Page 33: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Fault Isolation and Diagnosis

Establish the expected performance in the simulation

Find difference between expected and observed performance

Search over the fault space to determine which set of faults can reproduce performance similar to what has been observed

Page 34: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Collecting data from traces

Trace data collection

• Network topology
  • Each node reports its neighbor and routing tables
• Traffic statistics
  • Each node maintains counters of traffic sent to and received from immediate neighbors
• Physical medium
  • Each node reports signal strength of wireless links to neighbors
• Network performance
  • Includes both link and end-to-end performance, which can be measured through loss rate, delay, and throughput
  • Focus is on link-level performance

Page 35: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Simulating the network performance

• Traffic load simulation
  • Link-based traffic simulation
  • Adjust application sending rate to match the observed link-level traffic counts
• Route simulation
  • Use actual routes taken by packets as input to the simulator
• Wireless signal
  • Use real measurement of signal strength
• Fault injection
  • Random packet dropping
  • External noise sources
  • MAC misbehavior

Page 36: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Fault diagnosis algorithm

General approach

Network settings -> Simulator -> Expected performance

Network settings + Faults set -> Simulator -> Observed performance

How to find the faults set?

Page 37: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

How to search the faults efficiently?

• Different types of faults often change one or a few metrics
  • E.g., random dropping only affects link loss rate
• Thus use the metrics in which observed and expected performance differ significantly to guide the search

Page 38: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Scenario where faults do not have strong interactions

Consider large deviation from expected performance as anomaly

Use decision tree to determine the type of fault

Fault type determines the metric to quantify performance difference

Locate faults by finding the set of nodes and links with large difference between expected and observed performance

Page 39: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Scenario where faults have strong interactions

• Get the initial diagnosis set from the decision tree algorithm
• Iteratively refine the fault set
  • Adjust the magnitudes of faults in the fault set
    • Translate the difference in performance into a change in the faults' magnitudes
    • It maps the impact of a fault into its magnitude
  • Remove faults whose magnitudes are too small
  • Add new faults that can explain large differences between the expected and observed performance
  • Iterate until the change in the fault set is negligible
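
The refinement loop on this slide can be sketched with a toy simulator. Everything here is an assumption made for illustration: faults are per-link packet-drop probabilities, the gap between observed and simulated loss is added directly to the fault magnitude, and `toy_simulate` stands in for the real network simulator.

```python
def refine_fault_set(simulate, observed, initial_faults,
                     min_magnitude=0.01, max_iterations=20):
    """Iteratively adjust fault magnitudes until simulation matches observation."""
    faults = dict(initial_faults)
    for _ in range(max_iterations):
        simulated = simulate(faults)
        changed = False
        for link, seen in observed.items():
            gap = seen - simulated[link]
            if abs(gap) > min_magnitude:
                # Adjust an existing fault, or add a new one for this link.
                faults[link] = faults.get(link, 0.0) + gap
                changed = True
        # Remove faults whose magnitude is too small to matter.
        faults = {l: m for l, m in faults.items() if m >= min_magnitude}
        if not changed:
            break
    return faults


# Toy stand-in for the simulator: baseline loss combined with injected dropping.
BASE_LOSS = {"1-2": 0.10, "1-4": 0.05}


def toy_simulate(faults):
    return {link: 1 - (1 - base) * (1 - faults.get(link, 0.0))
            for link, base in BASE_LOSS.items()}


# "Observed" data generated by a hidden 30% drop fault on link 1-2.
observed = toy_simulate({"1-2": 0.30})
diagnosis = refine_fault_set(toy_simulate, observed, initial_faults={})
```

In this toy setting the loop recovers a drop rate close to 0.30 on link 1-2 within a few iterations and reports no fault on link 1-4.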

Page 40: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Example scenario

[Figure: example topology with five nodes, numbered 1 to 5]

Page 41: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Example scenario

[Figure: example topology with five nodes, numbered 1 to 5]

Observed performance
• Increased loss rate at 1-4 and 1-2
• No increase in the sending rate of 1-4, 1-2
• No increase in noise experienced by neighbors

Inference (decision tree)

Increased sending rate?
  Y: Too low CW
  N: Increased noise?
    Y: Noise
    N: Increased loss?
      Y: Packet drop
      N: Normal

Page 42: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Example scenario

[Figure: example topology with five nodes, numbered 1 to 5]

Observed performance
• Increased loss rate at 1-4 and 1-2
• No increase in the sending rate of 1-4, 1-2
• No increase in noise experienced by neighbors

Inference (decision tree)

Increased sending rate?
  Y: Too low CW
  N: Increased noise?
    Y: Noise
    N: Increased loss?
      Y: Packet drop
      N: Normal

Result: Packet dropping at node 1
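
The decision tree on this slide translates directly into a chain of checks. In the sketch below, each argument is the observed value minus the simulated (expected) value for one metric; the function name, argument names, and threshold values are assumptions, since the slides do not give concrete thresholds.

```python
def classify_fault(sent_diff, noise_diff, loss_diff,
                   sent_thresh, noise_thresh, loss_thresh):
    """Walk the decision tree: sending rate, then noise, then loss."""
    if sent_diff > sent_thresh:
        return "too low CW"    # node sends more than expected: MAC misbehavior
    if noise_diff > noise_thresh:
        return "noise"         # neighbors see more noise than expected
    if loss_diff > loss_thresh:
        return "packet drop"   # only the loss rate is higher than expected
    return "normal"


# Example scenario: loss increased on links 1-4 and 1-2, but neither the
# sending rate nor the noise level did. Numbers are illustrative.
verdict = classify_fault(sent_diff=0.0, noise_diff=0.0, loss_diff=0.25,
                         sent_thresh=0.1, noise_thresh=0.1, loss_thresh=0.1)
```

Because both affected links share node 1, the per-link verdicts point to packet dropping at node 1, matching the inference on the slide.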

Page 43: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Accuracy of fault diagnosis

• Correctness of the model
  • Complete information
  • Consistent information
  • Timely information
• Correctness of the reported symptoms
  • Right size of the threshold to report a symptom
  • Difference in the behavior of faults
  • Timely reporting of symptoms

Page 44: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

System implementation

• Windows XP
  • Agents run on every wireless node and report collected information on demand
  • Managers collect and analyze information
  • Collected information is cast into performance counters supported by Windows
  • Manager is connected to a backend simulator. Collected information is converted to a script to drive the simulation
• Testbed:
  • Multihop wireless testbed built using IEEE 802.11a cards
  • Commercially available network sniffer called AiroPeek is used for data collection
  • Native 802.11 NICs provide a rich set of networking information

Page 45: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Evaluation: Data collection overhead

Management traffic overhead
• No data cleaning: Each link is reported only once
• With data cleaning: Each link is reported by all observers for consistency check
• Overhead < 800 bits/s/node

Performance of FTP flow with and without data collection
• Data collection traffic has little effect

Page 46: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Data cleaning effectiveness

Higher accuracy with denser networks

Higher accuracy with client-server traffic

Coverage greater than 80% in all cases

Higher accuracy with grid topology

Higher coverage when using history

Page 47: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Evaluation: Fault diagnosis

Detecting random dropping
• Symptom: Significant difference in loss rates on links
• Less than 20% of faulty links are left undetected
• No-effect faults are faulty links sending fewer than the threshold (250) packets of data

Detecting external noise
• Symptom: Significant difference in noise level at nodes
• Noise sources are correctly identified with at most one or two false positives
• Inference error in magnitudes of noise is within 4%

Page 48: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Evaluation: Fault diagnosis

Detecting MAC misbehavior; detecting combinations of all faults

• Symptom: Significant discrepancy in throughput on links
• Coverage is mostly around 80% or higher
• False positives within 2
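
The evaluation slides quote coverage and false positives without defining them. A plausible reading, assumed here rather than taken from the slides, is that coverage is the fraction of injected faults that the diagnosis finds and that false positives are the diagnosed locations where no fault was injected. Under that assumption the two numbers can be computed as:

```python
def diagnosis_metrics(true_faults, diagnosed):
    """Return (coverage, false_positive_count) for one diagnosis run.

    Assumed definitions: coverage = found true faults / all true faults;
    false positives = diagnosed locations that are not true faults.
    """
    true_faults, diagnosed = set(true_faults), set(diagnosed)
    if true_faults:
        coverage = len(true_faults & diagnosed) / len(true_faults)
    else:
        coverage = 1.0  # nothing to find
    false_positives = len(diagnosed - true_faults)
    return coverage, false_positives


# Illustrative run: five injected faults, four found, one spurious report.
coverage, false_positives = diagnosis_metrics(
    true_faults={"1-2", "1-4", "2-3", "3-5", "4-5"},
    diagnosed={"1-2", "1-4", "2-3", "3-5", "2-5"},
)
```

For this made-up run the coverage is 0.8 with one false positive.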

Page 49: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

what-if analysis

Types of faults

Network model

Detected faults

Faults directory

Corrective measures

Page 50: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

What-if analysis

Diagnosis

Topology

Corrective measures

Page 51: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Limitations

• Limited by accuracy of the simulator
• Time to detect the faults is acceptable for detecting long-term faults but not transient faults
• Choice of traces to drive the simulation has important implications
• Focus has only been on faults resulting in different behavior

Page 52: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Conclusion

• Used trace data for modeling the network
• Data collection techniques are presented to collect network information and detect a deviation from the expected performance
• A fault diagnosis algorithm is proposed to detect the root causes of failure
• A scheme for what-if analysis is proposed to evaluate alternative network configurations for efficient network operation

Page 53: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Future work

• Validation on a large testbed
• Performance analysis in presence of mobility
• Detecting malicious attacks
• Diagnosis in presence of incomplete network information
• More deeply investigating the potential of what-if analysis

Page 54: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

References

L. Qiu, P. Bahl, A. Rao, L. Zhou, Fault Detection, Isolation, and Diagnosis in Multihop Wireless Networks, Microsoft Technical Report, Microsoft Research-TR-2004-11, Dec. 2003

M. Steinder, A. Sethi, A survey of fault localization techniques in computer networks, Technical Report 2001, CIS Dept., Univ of Delaware, Feb 2001

M. Steinder, Probabilistic inference for diagnosing service failures in communication systems, PhD thesis, Univ. of Delaware, 2003

Page 55: Fault Detection, Isolation, and Diagnosis In Multihop Wireless Networks

Questions

• What is the proposed solution to model the throughput when the signal strength is poor?
• In Table 2, the simulated throughput monotonically decreases with the loss rate while the measured throughput does not. Why?
• What could cause false positives in the fault diagnosis results? When can the false positive ratio increase?

http://www.cis.udel.edu/~natu/861/861.html

