
Troubleshooting Wireless Mesh Networks

Troubleshooting Wireless Mesh Networks. Victor Bahl, joint work with Lili Qiu, Ananth Rao (UCB) & Lidong Zhou. Microsoft Research, April 1, 2004.
Page 1: Troubleshooting Wireless  Mesh Networks

Troubleshooting Wireless Mesh Networks

Victor Bahl

joint work with Lili Qiu, Ananth Rao (UCB) & Lidong Zhou

Microsoft Research

April 1, 2004

Page 2: Troubleshooting Wireless  Mesh Networks

Mesh Network Management

ISO’s definition of network management:
– Fault Management
– Configuration Management
– Security Management
– Performance Management
– Accounting

“Network management is a process of controlling a complex data network so as to maximize its efficiency and productivity”

Page 3: Troubleshooting Wireless  Mesh Networks

Goals

Assist with Mesh Router configuration

Reactive and Pro-active Troubleshooting
– Investigate reported performance problems
  • Time-series analysis to detect deviation from normal behavior
– Localize and isolate trouble spots
  • Collect and analyze traffic reports from mesh nodes
– Determine possible causes for the trouble spots
  • Interference, hardware problems, network congestion, malicious nodes, …

Respond to trouble spots
– Re-route traffic
– Rate limit
– Change topology via power control & directional antenna control
– Flag environmental changes & problems
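The "time-series analysis to detect deviation from normal behavior" goal can be illustrated with a minimal sketch. It assumes a simple rolling mean and standard deviation test. The slides do not say which time-series method is used, so the function name, window size and threshold `k` below are all hypothetical.

```python
from statistics import mean, stdev

def find_deviations(samples, window=5, k=3.0):
    """Return indices of samples that lie more than k standard deviations
    away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # max() guards against a perfectly flat history (sigma == 0).
        if abs(samples[i] - mu) > k * max(sigma, 1e-9):
            flagged.append(i)
    return flagged

# Hypothetical per-interval throughput readings (Mbps) for one flow.
throughput_mbps = [2.5, 2.4, 2.6, 2.5, 2.5, 2.4, 0.3, 2.5]
deviating = find_deviations(throughput_mbps)
print(deviating)
```

A flagged index only marks a sample as unusual relative to recent history. Deciding why it is unusual is the job of the diagnosis step.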

Page 4: Troubleshooting Wireless  Mesh Networks

Nomenclature

Mesh Management Module (M3)
– Runs on every node

Mesh Management Server (MMS)
– Runs on gateway or designated nodes

Mesh Network Management Protocol (MNMP)
– Protocol (similar to SNMPv3) between M3 and MMS

Page 5: Troubleshooting Wireless  Mesh Networks

Focus of this talk

• Gathering & Distributing Data

• Cleaning Data

• Fault Isolation & Diagnosis

Page 6: Troubleshooting Wireless  Mesh Networks

Challenges in Fault Diagnosis

Characteristics of multi-hop wireless networks
– Unpredictable physical medium, prone to link errors
– Network topology is dynamic
– Resource limitation calls for a diagnosis approach with low overhead
– Vulnerable to link attacks

Identifying root causes
– Just knowing link statistics is insufficient
– Signature-based techniques don’t work well
– Determining normal behavior is hard

Handling multiple faults
– Complicated interactions between faults and traffic, and among faults themselves

Page 7: Troubleshooting Wireless  Mesh Networks

Previous Approaches to Fault Diagnosis

Protocols for Network Management
• ANMP [singh99]
• Guerrilla [shen02]

Detecting Routing and MAC misbehavior
• Watchdog & pathrater [Baker00]
• MACMis [Vaidya03]

Fault Management in Infrastructure mode
• AirWave, AirDefense, UniCenter, Symbol’s WNMS, IBM’s WSA, Wibhu’s SpectraMon, …

Page 8: Troubleshooting Wireless  Mesh Networks

Our Approach

Use a network simulator as a real-time diagnostic tool

Page 9: Troubleshooting Wireless  Mesh Networks

Fault Detection, Isolation & Diagnosis Process

[Figure: process diagram. Stages: Collect Data, Clean Data, Simulate, Diagnose Faults. Labels: Raw Data; Routes, Link Loads, Signal Strength; Measured Performance; Performance Estimate; Inject Candidate Faults; Root Causes. Modules: Agent Module, Manager Module.]

• SNMP MIBs
• Performance Counters
• WRAPI
• MCL
• NativeWiFi

Page 10: Troubleshooting Wireless  Mesh Networks

[Figure: Root Cause Analysis Module. Labels: Network Reports (Link RSS, Link Load, Routing Update); Wireless Network Simulation; Traffic Simulation; Topology Changes; Interference Injection; Faults Directory; Delay; measured Loss rate, Throughput, Noise, …; Expected Loss rate, Throughput, Noise, …; +/- comparison; Error; output {Link, Node, Fault}.]

Page 11: Troubleshooting Wireless  Mesh Networks

Our Fault Diagnosis Framework

Advantages
– Flexible & customizable for a large class of networks
– Captures complicated interactions within the network, between the network & environment, and among multiple faults
– Extensible in its ability to detect new faults
– Facilitates what-if analysis

Challenges
– To accurately reproduce the behavior of the network inside a simulator
– To build a fault diagnosis technique using the simulator as a diagnosis tool

Page 12: Troubleshooting Wireless  Mesh Networks

Handling the Challenges

Reproducing network behavior
• Identify the set of traces to collect
• Rule out erroneous data from the trace
• Drive the simulator with the cleaned traces

Building fault diagnosis
• Use performance results from trace-driven simulation to establish the normal behavior
• Deviation from the normal behavior indicates a potential fault
• Identify root causes by efficiently searching over the fault space to reproduce faulty symptoms
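The detection step (simulated performance as the baseline, deviation as a symptom) can be sketched as a per-flow comparison. This is only an illustration: the relative-difference metric, the 20% threshold and all throughput values are assumptions, not taken from the talk.

```python
def deviating_flows(measured, expected, rel_threshold=0.2):
    """Return {flow: relative_difference} for flows whose measured value
    differs from the simulator's expected value by more than rel_threshold."""
    flagged = {}
    for flow, exp in expected.items():
        rel = abs(measured[flow] - exp) / exp
        if rel > rel_threshold:
            flagged[flow] = round(rel, 3)
    return flagged

# Hypothetical throughputs in Mbps: `expected` would come from the
# trace-driven simulation, `measured` from the live network.
expected = {"flow1": 2.5, "flow2": 2.4, "flow3": 2.1}
measured = {"flow1": 2.45, "flow2": 0.9, "flow3": 2.0}
suspect = deviating_flows(measured, expected)
print(suspect)
```

Only flows that exceed the threshold would be handed to the root-cause search, which keeps that more expensive step off the normal-case path.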

Page 13: Troubleshooting Wireless  Mesh Networks

Why Simulator?

[Figure: example topology. Node labels: A B C D E, O P Q R S, F G, U T. Flows: F1 F2 F3 F4 F5.]

Flow     Throughput
Flow1    2.5 Mbps
Flow2    0.23 Mbps
Flow3    2.09 Mbps
Flow4    0.17 Mbps
Flow5    2.55 Mbps

Page 14: Troubleshooting Wireless  Mesh Networks

Simulator Accuracy: RF Propagation

RF propagation model versus measured signal strengths for IEEE 802.11a cards from different vendors

Page 15: Troubleshooting Wireless  Mesh Networks

Simulator Accuracy: Throughput

Estimated versus actual throughput when channel conditions are good (IEEE 802.11a)

Page 16: Troubleshooting Wireless  Mesh Networks

Simulator Accuracy: Throughput (2)

Estimated matches measured throughput till the channel conditions become poor

Page 17: Troubleshooting Wireless  Mesh Networks

Simulator Accuracy: Throughput

No. of Walls   Loss Rate   Measured Throughput   Simulated Throughput
4              11.0 %      15.52 Mbps            15.94 Mbps
5              7.01 %      12.56 Mbps            14.01 Mbps
6              3.42 %      12.97 Mbps            11.55 Mbps

Estimated matches measured throughput for poor channel conditions when loss rate is incorporated
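To put a number on how closely the simulated values track the measured ones, the relative error can be computed for each row of the table. The numbers are copied from the slide. The choice of relative error as the metric is an assumption made for this sketch.

```python
# (walls, measured_mbps, simulated_mbps) as listed on the slide.
rows = [
    (4, 15.52, 15.94),
    (5, 12.56, 14.01),
    (6, 12.97, 11.55),
]

def relative_error_pct(measured, simulated):
    """Absolute simulator error as a percentage of the measured value."""
    return abs(simulated - measured) / measured * 100.0

errors = {walls: relative_error_pct(m, s) for walls, m, s in rows}
for walls, err in errors.items():
    print(f"{walls} walls: {err:.1f}% relative error")
```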

Page 18: Troubleshooting Wireless  Mesh Networks

How Stable is the Channel?

Under good environmental conditions, received signal strength remains stable

Page 19: Troubleshooting Wireless  Mesh Networks

Data Collection

What should we collect?
– Network Topology/Connectivity Info (Neighbor Table)
– Noise level & signal strength
– Traffic load to direct neighbor
– Loss rate to direct neighbor (retransmission count)
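One way to picture the collected items is as a per-node report record. The sketch below is hypothetical: the class names, field names and units are not from the talk, and the loss-rate estimate (retransmissions divided by total transmission attempts) is an assumption about how a retransmission count could be turned into a rate.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class NeighborStats:
    """Per-neighbor measurements kept by a node (hypothetical layout)."""
    signal_strength_dbm: float
    packets_sent: int        # traffic load to this direct neighbor
    retransmissions: int     # retransmission count toward this neighbor

    def loss_rate(self) -> float:
        # Assumption: every retransmission corresponds to one lost attempt.
        attempts = self.packets_sent + self.retransmissions
        return self.retransmissions / attempts if attempts else 0.0

@dataclass
class NodeReport:
    """One node's report; the neighbor table doubles as connectivity info."""
    node_id: str
    noise_level_dbm: float
    neighbors: Dict[str, NeighborStats] = field(default_factory=dict)

report = NodeReport(node_id="A", noise_level_dbm=-95.0)
report.neighbors["B"] = NeighborStats(signal_strength_dbm=-60.0,
                                      packets_sent=90, retransmissions=10)
print(report.neighbors["B"].loss_rate())
```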

Page 20: Troubleshooting Wireless  Mesh Networks

Data Distribution

Design Goal
– Minimize bandwidth consumption

Techniques
– Dynamic scoping
  • Each node takes a local view of the network
  • The coverage of the local view adapts to traffic patterns
– Adaptive monitoring
  • Minimize measurement overhead in normal case
  • Change update period
  • Push and pull
– Delta compression
– Multicast
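Of the techniques listed, delta compression is the easiest to show in a few lines: a node sends only the fields that changed since its last report. This is a minimal sketch with hypothetical field names. The slides do not describe the actual encoding, and this version does not handle fields that disappear between reports.

```python
def delta_encode(previous, current):
    """Return only the fields of `current` that are new or changed."""
    return {key: value for key, value in current.items()
            if key not in previous or previous[key] != value}

def delta_apply(previous, delta):
    """Rebuild the full report on the receiving side."""
    merged = dict(previous)
    merged.update(delta)
    return merged

# Hypothetical consecutive reports from one node.
last_report = {"pkts_sent": 1200, "pkts_rcvd": 1180, "rss_dbm": -61}
new_report = {"pkts_sent": 1350, "pkts_rcvd": 1180, "rss_dbm": -61}

delta = delta_encode(last_report, new_report)
print(delta)
```

When nothing changes the delta is empty, which fits the stated goal of keeping measurement overhead low in the normal case.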

Page 21: Troubleshooting Wireless  Mesh Networks

Management Overhead

[Figure: management overhead plot. Labels: 40 Kb/sec, 25 Kb/sec, 15 Kb/sec; Avg: 1 to 5 hops.]

BW requirement does not go up much with network size

Info distributed:
– Routing changes
– Traffic counters (e.g. pkts. sent & rcvd.)
– Signal strength

Page 22: Troubleshooting Wireless  Mesh Networks

Measurement Overhead on Throughput

Page 23: Troubleshooting Wireless  Mesh Networks

Data Cleaning

Data may not be pristine. Why?
– Liars, malicious users
– Missing data
– Measurement errors

Clean the Data
– Detect liars
  • Assumption: most nodes are honest
  • Approach:
    – Neighborhood Watch
    – Find the smallest number of lying nodes to explain inconsistency in traffic reports
– Smoothing & Interpolation
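For the smoothing and interpolation item, a minimal sketch is linear interpolation over missing samples followed by a trailing moving average. Both choices are assumptions made for illustration, since the slides do not name the specific methods.

```python
def fill_gaps(samples):
    """Replace None entries by linear interpolation between the nearest
    known samples; gaps at either end copy the nearest known value."""
    known = [i for i, value in enumerate(samples) if value is not None]
    if not known:
        raise ValueError("no known samples to interpolate from")
    filled = list(samples)
    for i, value in enumerate(samples):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = samples[right]
        elif right is None:
            filled[i] = samples[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = samples[left] + frac * (samples[right] - samples[left])
    return filled

def moving_average(samples, window=3):
    """Trailing moving average; early points use the samples available."""
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical signal-strength readings with missing reports.
readings = [10.0, None, 14.0, None, None, 20.0]
cleaned = moving_average(fill_gaps(readings))
print(cleaned)
```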

Page 24: Troubleshooting Wireless  Mesh Networks

Example: Resiliency against Liars/Lossy Links

Problem
• Identify nodes that report incorrect information (liars)
• Detect lossy links

Assume
• Nodes monitor neighboring traffic, build traffic reports and periodically share info.
• Most nodes provide reliable information

Challenge
• Wireless links are error prone and unstable

Approach
• Find the smallest number of lying nodes to explain inconsistency in traffic reports
• Use the consistent information to estimate link loss rates
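The "smallest number of lying nodes" idea can be sketched as a covering problem: every link whose two endpoint reports disagree must have at least one liar among its endpoints. The brute-force search below is an illustration under several assumptions. The report format, the `tolerance` parameter (standing in for ordinary wireless loss) and the exhaustive search are all hypothetical; the slides do not specify the algorithm, and this version is exponential in the number of suspect nodes.

```python
from itertools import combinations

def inconsistent_links(sent, received, tolerance):
    """Links whose sender and receiver reports disagree by more than
    `tolerance` packets. Keys are (sender, receiver) pairs."""
    return [link for link, n_sent in sent.items()
            if abs(n_sent - received.get(link, 0)) > tolerance]

def smallest_liar_set(links):
    """Smallest set of nodes that touches every inconsistent link."""
    nodes = sorted({node for link in links for node in link})
    for size in range(len(nodes) + 1):
        for suspects in combinations(nodes, size):
            if all(a in suspects or b in suspects for a, b in links):
                return set(suspects)

# Hypothetical reports: B under-reports what it received from A and
# over-reports what it sent to C.
sent = {("A", "B"): 100, ("B", "C"): 200, ("C", "D"): 100}
received = {("A", "B"): 40, ("B", "C"): 100, ("C", "D"): 97}

bad_links = inconsistent_links(sent, received, tolerance=10)
liars = smallest_liar_set(bad_links)
print(bad_links, liars)
```

Blaming B alone explains both inconsistent links, which matches the stated assumption that most nodes are honest. The remaining consistent reports, such as the C to D link here, would then be used to estimate link loss rates.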

Results

[Figure, two panels: "Detect liars" (y-axis: fraction of lying nodes identified) and "Detect lossy links" (y-axis: fraction of lossy links identified). Each panel shows coverage and false positive on a 0 to 1 scale for NL = 1, 2, 5, 8, 10, 15, 20.]

Page 25: Troubleshooting Wireless  Mesh Networks

Fault Diagnosis Algorithm

1. Initialization: diagnosed fault set F = { }

2. Forward addition
   while (diff(MeasuredPerf, SimulatedPerf(F)) > threshold) {
       Find the candidate fault that explains the most of the mismatch between current and predicted performance, and add it to F
   }

3. Backward deletion
   while (diff(MeasuredPerf, SimulatedPerf(F)) > threshold) {
       Find a fault in F that explains the mismatch the least. Delete it from F if excluding it results in little change
   }

4. Report F
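The greedy search can be sketched in runnable form against a toy stand-in for the simulator. Everything specific here is an assumption: `toy_simulate`, the fault tuples, the additive impact model and the sum-of-absolute-differences `diff` are hypothetical. The sketch also reads "results in little change" as "the mismatch stays within the threshold without that fault", which is one interpretation of the backward-deletion step rather than the authors' stated rule.

```python
# Toy model: each fault lowers one flow's throughput by a fixed amount.
BASE = {"f1": 2.0, "f2": 2.0, "f3": 2.0}
IMPACT = {("f1", "drop"): 1.0, ("f2", "noise"): 0.5, ("f3", "drop"): 1.0}

def toy_simulate(faults):
    """Hypothetical simulator: predicted per-flow throughput under `faults`."""
    perf = dict(BASE)
    for flow, kind in faults:
        perf[flow] -= IMPACT[(flow, kind)]
    return perf

def diagnose(measured, simulate, candidates, threshold):
    def diff(fault_set):
        estimate = simulate(fault_set)
        return sum(abs(measured[k] - estimate[k]) for k in measured)

    faults = set()
    # Forward addition: add the candidate that shrinks the mismatch most.
    while diff(faults) > threshold:
        remaining = candidates - faults
        if not remaining:
            break
        best = min(remaining, key=lambda f: diff(faults | {f}))
        if diff(faults | {best}) >= diff(faults):
            break  # no remaining candidate improves the match
        faults.add(best)
    # Backward deletion: visit faults that matter least first and drop
    # any whose removal keeps the mismatch within the threshold.
    for fault in sorted(faults, key=lambda f: diff(faults - {f})):
        if diff(faults - {fault}) <= threshold:
            faults.discard(fault)
    return faults

measured = {"f1": 1.0, "f2": 1.5, "f3": 2.0}
diagnosed = diagnose(measured, toy_simulate, set(IMPACT), threshold=0.1)
print(sorted(diagnosed))
```

Each call to `diff` runs one simulation, so the cost grows with the number of candidate faults tried per iteration. The extra stopping checks in the forward loop are safeguards added for the sketch so it terminates when no candidate helps.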

Page 26: Troubleshooting Wireless  Mesh Networks

Performance

Number of faults   4     6     8     10    12    14
Coverage           1     1     0.75  0.7   0.92  0.86
False positive     0     0     0     0     0.25  0.29

Faults detected:
- Random packet dropping
- MAC misbehavior
- External noise

25 node random topology
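The slide does not define the two metrics. A common reading, assumed here, is that coverage is the fraction of true faults that were diagnosed, and false positive is the number of wrongly diagnosed faults divided by the number of true faults. The fault labels in the example are hypothetical.

```python
def coverage(true_faults, diagnosed):
    """Fraction of true faults that appear in the diagnosis."""
    return len(true_faults & diagnosed) / len(true_faults)

def false_positive(true_faults, diagnosed):
    """Wrongly diagnosed faults relative to the number of true faults.
    Under this assumed definition the value can exceed 1."""
    return len(diagnosed - true_faults) / len(true_faults)

# Hypothetical example with four injected faults.
true_faults = {"n3:drop", "n7:drop", "n9:noise", "n12:mac"}
diagnosed = {"n3:drop", "n7:drop", "n9:noise", "n20:drop"}

print(coverage(true_faults, diagnosed), false_positive(true_faults, diagnosed))
```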

Page 27: Troubleshooting Wireless  Mesh Networks

What-if Analysis

[Figure: topology of laptop nodes with flows labeled F1, F2, F3, F7 and F8]

Improvement on removing flows

Action                                  Total Throughput (Mbps)
None                                    1.064
Reduce Flow 8 by ½                      1.148
Re-route Flow 8 around grid boundary    1.217
Increase power from 15 dBm to 20 dBm    0.99
Increase power from 15 dBm to 25 dBm    1.661
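The what-if table can be read as a ranking problem: evaluate each candidate action and compare it with the baseline. The sketch below uses the throughput values from the slide. The percentage-change calculation and the selection of the best action are illustrative additions.

```python
# Total throughput (Mbps) per action, as listed on the slide.
results = {
    "None": 1.064,
    "Reduce Flow 8 by 1/2": 1.148,
    "Re-route Flow 8 around grid boundary": 1.217,
    "Increase power from 15 dBm to 20 dBm": 0.99,
    "Increase power from 15 dBm to 25 dBm": 1.661,
}

baseline = results["None"]
# Percentage change in total throughput relative to taking no action.
change_pct = {action: (value - baseline) / baseline * 100.0
              for action, value in results.items() if action != "None"}

best_action = max(change_pct, key=change_pct.get)
for action, pct in change_pct.items():
    print(f"{action}: {pct:+.1f}%")
print("best:", best_action)
```

The comparison shows that the effect of a power increase is not monotonic in this example: going to 20 dBm lowers total throughput, while going to 25 dBm gives the largest gain.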

Page 28: Troubleshooting Wireless  Mesh Networks

Mesh Visualization Module

Page 29: Troubleshooting Wireless  Mesh Networks
Page 30: Troubleshooting Wireless  Mesh Networks

Thanks!

http://www.research.microsoft.com/sn/mesh

Page 31: Troubleshooting Wireless  Mesh Networks

Backup

Page 32: Troubleshooting Wireless  Mesh Networks

Detection of Intentional Packet Drops

Scenario
- 49 node network
- Randomly pick nodes that drop packets

[Figure, two panels: "Coverage" (0% to 100%) and "False positive" (0% to 200%), each showing runs r1, r2, r3 for fault = 1 to fault = 5. Legend: detected faults, no effect faults.]

