Post on 17-Jan-2018
Wireless Mesh Network Management - Fault and Performance Diagnosis: A Survey
Vijay P Gabale (CSE, IIT Bombay)
MTech Seminar under the guidance of Prof. Bhaskaran Raman
Agenda
• Overview
• Motivation
• Enterprise vs. Long-Distance Networks
• Techniques
• Fault Diagnosis - Examples
• Future scope
• Conclusion
Wireless Network Woes!
• My machine says: wireless connection unavailable.
• Why is the network performance so low?
• Is someone interfering with my transmissions?
• Do we have complete coverage in all the buildings?
• I wonder if someone has sneakily installed an unauthorized access point.
Wireless Network Anomalies
• RF holes
• Interference
• Hidden terminal
• Rogue Access Points
Which anomaly was the cause of undesired network performance?
Challenges
• Quantification of possible causes
• Attribution of a performance problem to a specific root cause i.e. recognizing a fault
• Network management to proactively deal with likely faults
• Avoiding personal visits to nodes on long-distance links
Effects
• System downtime
• Loss of productivity (loss of faith)
• Recovery cost
Number of wireless-related complaints logged by the IT department of a major US corporation. Source: [4]
Enterprise Network
• Comprises a dense deployment of access points & clients in a university or corporate building
• Challenges
  • RF holes
  • Interference
  • Hidden terminals
  • Rogue Access Points
• Solution space : Characterizing & then analyzing the entire wireless behaviour, online and offline diagnosis
Long-Distance Network
• Comprises point-to-point links of several kilometers that build a multi-hop mesh network
• Challenges
  • Physical visits are costly
  • Remote locations could sometimes become inaccessible
  • Lack of trained personnel
  • Poor power quality
• Solution space : In-node recovery or inference techniques, independent control mechanisms
Diagnostic Questions!
• What is the per-packet signal strength at every node? – RF holes
• How many concurrent receptions are there? – Hidden Terminal
• How is the noise level varying over time? – Interference
• Is there any foreign node wandering in the network? – Rogue AP
• Is the remote node working? What is the software or hardware status of the node? – Primary link failure or software/hardware failures
Existing Techniques
• Offline data collection & analysis : [1], [6]
• Online anomaly detection : [2]
• Simulation : [3]
• Daemon running as a part of the node : [4]
• Software & Hardware redundancies : [5], [7]
Offline data collection & analysis
• Steps :
  • Dense deployment of monitors
  • Synchronization & unification at a central server
  • Inference techniques
• Example : Jigsaw[1], MacWild[6]
• Fault Diagnosis :
  • Pr (Interference | Concurrent Transmissions)
  • Over-protective 802.11g clients and access points
Offline data collection & analysis - Framework
[Figure: framework diagram showing a central server. Source: MacWild[6]]
Offline data collection & analysis
• Steps :
  • Dense deployment of monitors
  • Synchronization & unification at a central server
  • Inference techniques
• Example : Jigsaw[1], MacWild[6]
• Fault Diagnosis :
  • Pr (Loss due to Interference | Concurrent Transmissions)
  • Over-protective 802.11g clients and access points
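The conditional probability above can be estimated directly from a unified trace. The following is a minimal sketch, assuming each frame record already carries two flags (whether it was lost, and whether another transmission overlapped it). The `Frame` type and the function name are hypothetical and are not taken from Jigsaw or MacWild.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    lost: bool        # True if the frame was never acknowledged (assumed flag)
    concurrent: bool  # True if another transmission overlapped it in time


def loss_given_concurrent(frames):
    """Estimate Pr(loss | concurrent transmission) as a simple ratio."""
    overlapped = [f for f in frames if f.concurrent]
    if not overlapped:
        return None  # no overlapping frames, so the estimate is undefined
    return sum(f.lost for f in overlapped) / len(overlapped)


trace = [
    Frame(lost=True, concurrent=True),
    Frame(lost=False, concurrent=True),
    Frame(lost=True, concurrent=True),
    Frame(lost=False, concurrent=False),
]
estimate = loss_given_concurrent(trace)
```

Comparing this ratio with the loss rate of non-overlapped frames gives a rough sense of how much loss is attributable to interference rather than to a weak signal.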
Online anomaly detection
• Steps :
  • Deploy multiple monitors
  • Sample physical layer parameters
  • Dynamic interference engine
• Example : Mojo[2]
• Fault Diagnosis : Threshold for Hidden Terminal, Capture Effect, Non 802.11 interference
Simulation
• Steps :
  • Traces to drive simulation
  • Deviation of observed behavior from expected behavior
  • Decision trees to distinguish between possible faults
• Example : Troubleshooting Wireless Mesh Networks[3]
• Fault Diagnosis : External noise, Packet dropping, Misbehaving clients
Simulation (contd…) Decision Tree
[Figure: decision tree. Tests: simSent – realSent > ThreshSentDiff; simNoise – realNoise > ThreshNoiseDiff; simLoss – realLoss > ThreshLossDiff. Outcomes: External Noise, CW misuse, Packet dropping, Normal]
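The decision tree can be sketched as a chain of threshold tests. The slide does not preserve which test leads to which outcome, so the pairing below (sent count to CW misuse, noise to external noise, loss to packet dropping) and the order of the tests are assumptions, as is the use of absolute differences. The function and key names are hypothetical.

```python
def diagnose(sim, real, thresh):
    """Classify a node by comparing simulated and observed statistics.

    sim, real, thresh: dicts with keys 'sent', 'noise', 'loss'.
    The test-to-outcome pairing is an assumption, not taken from the slide.
    """
    if abs(sim["sent"] - real["sent"]) > thresh["sent"]:
        return "CW misuse"
    if abs(sim["noise"] - real["noise"]) > thresh["noise"]:
        return "External noise"
    if abs(sim["loss"] - real["loss"]) > thresh["loss"]:
        return "Packet dropping"
    return "Normal"


thresholds = {"sent": 50, "noise": 5, "loss": 20}
simulated = {"sent": 1000, "noise": -95, "loss": 10}
observed = {"sent": 1010, "noise": -80, "loss": 12}
verdict = diagnose(simulated, observed, thresholds)
```

Here only the noise figures differ by more than their threshold, so under the assumed pairing the node is classified as suffering from external noise.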
Daemon running as a part of the node
• An application resides on the client side
• Takes reactive or proactive actions in response to an event
• Example : Client Conduit technique[4]
• Fault Detection : Rogue APs, RF holes
Software & Hardware redundancies
• Experiences of software & hardware failures
• Techniques :
  • Software & hardware watchdogs
  • Independent control mechanisms
  • Tracking & predicting health of a node
• Example : Beyond Pilots[5], Fault Diagnosis[7]
• Fault Diagnosis : Erratic power conditions, Primary link failure, Non 802.11 interference, Antenna misalignment
Problem : Intermittent Connectivity
• Symptoms : Irregular changes in connectivity or total failure
• Causes : Weak RF signal, lack of signal, unpredictable ambient conditions, obstructions
• Parameter : Received signal strength
How to tackle total failure? How to track a mobile node?
Remedy : Client Conduit
• It is a mechanism to allow disconnected users to convey messages to system and network administrators.
Problem : Rogue Access Point
• What is a Rogue Access Point?
• Security holes, unwanted RF interference and network load.
• Access Point Database
• Location, MAC, Channel
Remedy : Client Conduit
[Flowchart: rogue AP detection. Checks: Is MAC Registered? Is AP at Expected Location? Is AP Advertising Expected SSID? Is AP on Expected Channel? A "No" at any check leads to "Rogue AP Detected"; a "Yes" leads to the next check.]
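The checks in this flowchart can be sketched as a lookup against the access point database. This is a minimal illustration, assuming the database is keyed by MAC address and stores the expected location, SSID, and channel. The `APRecord` type, the field names, and the order of the checks are hypothetical and are not taken from [4].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class APRecord:
    mac: str
    location: str
    ssid: str
    channel: int


def rogue_reason(observed, database):
    """Return why an observed AP looks rogue, or None if every check passes."""
    expected = database.get(observed.mac)
    if expected is None:
        return "MAC not registered"
    if observed.location != expected.location:
        return "unexpected location"
    if observed.ssid != expected.ssid:
        return "unexpected SSID"
    if observed.channel != expected.channel:
        return "unexpected channel"
    return None


database = {"00:11:22:33:44:55": APRecord("00:11:22:33:44:55", "B1-F2", "corp", 6)}
seen = APRecord("00:11:22:33:44:55", "B1-F2", "corp", 11)
reason = rogue_reason(seen, database)
```

Returning the failed check rather than a plain boolean gives the administrator a starting point for investigation.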
Problem & Remedy : Hidden Terminal
• Symptoms : Degraded performance, lower throughput
• Causes : One transmitter not able to hear other transmissions to the same receiver, heterogeneous transmit powers
• Remedy : Quantify number of concurrent transmissions
• Around 40%
• Capture Effect : Around 5%
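Quantifying concurrent transmissions amounts to counting frames whose airtime overlaps another frame sent to the same receiver by a different sender. The sketch below assumes a trace of `(start, end, sender, receiver)` tuples with synchronized timestamps. The tuple layout and function name are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```python
def concurrent_fraction(transmissions):
    """Fraction of transmissions overlapped by another sender to the same receiver.

    transmissions: list of (start, end, sender, receiver) tuples (assumed layout).
    """
    if not transmissions:
        return 0.0
    overlapped = 0
    for i, (s1, e1, tx1, rx1) in enumerate(transmissions):
        for j, (s2, e2, tx2, rx2) in enumerate(transmissions):
            # Two intervals overlap when each starts before the other ends.
            if i != j and rx1 == rx2 and tx1 != tx2 and s1 < e2 and s2 < e1:
                overlapped += 1
                break
    return overlapped / len(transmissions)


trace = [
    (0, 10, "A", "R"),
    (5, 15, "B", "R"),   # overlaps A's frame at the same receiver
    (20, 30, "A", "R"),
    (40, 50, "C", "R"),
]
fraction = concurrent_fraction(trace)
```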
Problem & Remedy : Non 802.11 Interference
• Symptoms : Retransmissions at the MAC layer, No concurrent transmissions detected
• Quantify noise level
• Moving window average
• Threshold
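The moving-window average and threshold test can be sketched as follows. This is an illustration under stated assumptions: noise samples are given in dBm and averaged directly (a simplification, since dBm is a logarithmic unit), and the window size and threshold are arbitrary example values. The function name is hypothetical.

```python
from collections import deque


def noisy_sample_indices(samples_dbm, window, threshold_dbm):
    """Return indices where the moving average of noise exceeds the threshold."""
    recent = deque(maxlen=window)  # holds only the last `window` samples
    flagged = []
    for i, sample in enumerate(samples_dbm):
        recent.append(sample)
        if len(recent) == window and sum(recent) / window > threshold_dbm:
            flagged.append(i)
    return flagged


samples = [-95, -94, -96, -80, -78, -79, -95]
flagged = noisy_sample_indices(samples, window=3, threshold_dbm=-85)
```

Averaging over a window keeps a single noisy sample from raising an alarm, while a sustained rise in the noise floor still crosses the threshold.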
Problem : Connectivity problems over Long-Distance Links
• Symptoms : Remote node NOT Reachable
• Causes: IP address misconfiguration, routing misconfiguration, power shutdown at remote node, a board failure, malfunctioning wireless card
• Solution :
• Link Local IP addressing
• SMS backchannel
Solution : Troubleshooting a Link
Does Link Local IP Addressing Work?
• Yes : Log in & fix configuration problem
• No : Send SMS query and get the result
  • Power unavailable : Wait for power (visit not required)
  • Power available, router down : Reboot or visit & replace
  • Router is up : Get status report: signal strength, noise (visit may be required)
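The troubleshooting flow can be sketched as a small triage function. The pairing of each SMS result with an action follows the most natural reading of the flowchart and should be treated as an assumption. The status strings, the function name, and the fallback for a missing SMS reply are hypothetical.

```python
def triage(link_local_ok, sms_status=None):
    """Suggest the next step for an unreachable remote node.

    sms_status: 'power_unavailable', 'router_down', 'router_up', or None
    (hypothetical labels for the result of the SMS query).
    """
    if link_local_ok:
        return "Log in and fix configuration problem"
    if sms_status == "power_unavailable":
        return "Wait for power; visit not required"
    if sms_status == "router_down":
        return "Reboot, or visit and replace"
    if sms_status == "router_up":
        return "Get status report (signal strength, noise); visit may be required"
    # Not on the slide: fallback when the SMS backchannel gives no answer.
    return "No SMS reply; status unknown"


next_step = triage(link_local_ok=False, sms_status="power_unavailable")
```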
Problem & Remedy : Software and Hardware Failures
• Symptoms : Node suddenly goes down, node does not respond on trying to connect over the primary link
• Causes : Damaged power supplies or router boards, damaged CF cards, low voltages leave router in wedged state, battery problems
• Techniques : Software and hardware watchdogs, power controllers, Low Voltage Disconnect, read only boot loader
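A software watchdog can be sketched as a timer that the monitored process must reset periodically. The version below is a minimal illustration with explicit timestamps so that it behaves deterministically. The class name and the 30-second timeout are hypothetical, and a real deployment would trigger a reboot or power cycle where this sketch only reports expiry.

```python
class SoftwareWatchdog:
    """Reports expiry when the monitored process stops sending heartbeats."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.last_kick = now

    def kick(self, now):
        """Record a heartbeat from the monitored process."""
        self.last_kick = now

    def expired(self, now):
        """True if no heartbeat arrived within the timeout."""
        return now - self.last_kick > self.timeout_s


watchdog = SoftwareWatchdog(timeout_s=30, now=0)
watchdog.kick(now=25)
healthy = not watchdog.expired(now=50)   # 25 s since the last heartbeat
wedged = watchdog.expired(now=56)        # 31 s since the last heartbeat
```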
Future Scope
• Comprehensive Network Monitoring & Inference Tool
• Quantify Performance Improvement
• User Friendly GUIs
• Automatic Recovery
Conclusion
• Classification of techniques for fault diagnosis
• Enterprise as well as Long-Distance Mesh Networks
• Faults: Connectivity, Hidden Terminal, Interference, Hardware Failures
• Need for a ‘Complete Monitoring & Inference’ suite to detect root-level causes
References of the Survey
[1] Yu-Chung Cheng, John Bellardo, and Peter Benko. Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis. SIGCOMM’06.
[2] Anmol Sheth, Christian Doerr, Dirk Grunwald, Richard Han, and Douglas Sicker. Mojo: A Distributed Physical Layer Anomaly Detection System for 802.11 WLANs. MOBISYS’06.
[3] Lili Qiu, Paramvir Bahl, Ananth Rao, and Lidong Zhou. Troubleshooting Wireless Mesh Networks. SIGCOMM’06.
[4] Atul Adya, Paramvir Bahl, Ranveer Chandra, and Lili Qiu. Architecture and Techniques for Diagnosing Faults in IEEE 802.11 Infrastructure Networks. MOBICOM’04.
[5] Sonesh Surana, Rabin Patra, Sergiu Nedevschi, and Manuel Ramos. Beyond Pilots: Keeping Rural Wireless Networks Alive. To appear in USENIX NSDI’08.
[6] Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan. Analyzing the MAC-Level Behavior of Wireless Networks in the Wild. SIGCOMM, 2006.
[7] Sonesh Surana, Rabin Patra, and Eric Brewer. Simplifying Fault Diagnosis in Locally Managed Rural WiFi Networks. SIGCOMM NSDR, 2007.
[8] Yu-Chung Cheng, Mikhail Afanasyev, Patrick Verkaik, and Peter Benko. Automating Cross-Layer Diagnosis of Enterprise Wireless Networks. SIGCOMM, 2007.
[9] Kameswari Chebrolu, Bhaskaran Raman, and Sayandeep Sen. Long-Distance 802.11b Links: Performance Measurements and Experience. MOBICOM, 2006.