Fault Tolerance for WLAN
Speaker : Mark Yang
93.04.27
2 / 40
Outline
Hardware Fault Tolerance
• Dependability enhancement for IEEE 802.11 wireless LAN with redundancy techniques
• Tolerance to Access-Point Failures in Dependable Wireless Local-Area Networks
• Comparison
Software Fault Tolerance
• TCP-DCR: A novel protocol for tolerating wireless channel errors
• Implementation of Explicit Wireless Loss Notification Using MAC-Layer Information
• Comparison
Simulation
Conclusion
3 / 40
Hardware (1)
Dependability enhancement for IEEE 802.11 wireless LAN with redundancy techniques
Dependable Systems and Networks, 2003. Proceedings. 2003 International Conference on, 22-25 June 2003, Pages: 521-528
4 / 40
Hardware (1) – Abstract
Proposes the alternative approach of tolerating the existence of "shadow regions", as opposed to preventing them, in order to enhance connection dependability.
A redundant AP is placed in the shadow region to serve the mobile stations which roam into that region.
• DS configuration: the secondary AP is connected to the same distribution system as the primary AP.
• Forwarding configuration: the secondary AP acts as a wireless forwarding bridge for traffic between the mobile stations in the shadow region and the primary AP.
5 / 40
Hardware (1) – DS configuration
An additional AP, operating on the same frequency as the primary AP, is placed in the shadow area. The secondary AP forwards data between the mobile stations in the shadow area and the primary AP. The two APs communicate over the distribution system.
6 / 40
Hardware (1) – Forwarding configuration
The secondary AP is placed at a location where it can communicate with both the mobile terminals in the shadow area and the primary AP. The secondary AP can thus forward packet transmissions in both directions.
7 / 40
Hardware (1) – Aspects
Since the beacon interval is less than 100 ms, the maximum detection delay for link failures is 100 ms.
A mobile station in the shadow region may only transmit data when it is granted a TXOP by the secondary AP.
The primary AP sends the specification of the TXOP to the secondary AP. Simultaneously, the primary AP broadcasts the same channel reservation message in the cell in the form of a QoS-Poll frame. All stations in the non-shadowed area receive the QoS-Poll frame and defer any transmission attempt until the channel reservation time is over. With the channel reserved, the secondary AP then sends the QoS-Poll frame to the mobile stations in the shadow region sequentially, so that they can send their data packets free of collisions.
8 / 40
Hardware (1) – Fault models
Reliability:
Availability:
Survivability:
9 / 40
Hardware (1) – Numerical examples
Availability = 1 - 10^-2 = 0.99
1/λ_os ≈ 2.8 hours
10 / 40
Hardware (2)
Tolerance to Access-Point Failures in Dependable Wireless Local-Area Networks
Object-Oriented Real-Time Dependable Systems, 2003. Proceedings. Ninth IEEE International Workshop on, 1-3 Oct. 2003, Pages: 136-143
11 / 40
Hardware (2) – Abstract
Enhances the dependability of wireless networks by tolerating AP failures, and develops and evaluates a new fault-detection approach based on signal-to-noise ratio.
Detection of AP failures:
• Beacon-frame monitoring
• Signal-to-noise ratio
Three techniques to recover from AP failures:
• Access-Point Replication
• Overlapping-Coverage
• Link-Multiplexing
12 / 40
Hardware (2) – Beacon-frame monitoring
Handoff mechanism in 802.11 WLAN:
• Passive Scanning: A mobile station sweeps from channel to channel to detect the presence of Beacon frames, which are periodically transmitted by the APs.
• Active Scanning: A mobile station actively seeks out APs by broadcasting Probe Request frames on every channel.
Need to distinguish between user mobility and AP failure.
User mobility:
• Only a few users try to hand off to a new AP at a given point in time.
• Employ active scanning to discover new APs.
AP failure:
• The number of users trying to hand off to a new AP could be relatively large.
• Use passive scanning instead to detect the presence of a new AP.
13 / 40
Hardware (2) – Signal-to-noise ratio
Uses the strength of the signal that a mobile station receives from an AP as an indicator of the AP's "up/down" status.
Initiate the fault-recovery mechanism if the signal-to-noise ratio (SNR) drops suddenly.
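The SNR-based detection idea can be illustrated with a minimal Python sketch. It compares each new SNR sample against the average of recent healthy samples and flags a sudden drop. The class name, the window size, and the 15 dB threshold are illustrative assumptions, not values from the paper.

```python
from collections import deque


class SnrFailureDetector:
    """Flags a suspected AP failure when the SNR drops sharply below its recent average."""

    def __init__(self, window=5, drop_db=15.0):
        # window and drop_db are hypothetical tuning values, not taken from the paper.
        self.history = deque(maxlen=window)
        self.drop_db = drop_db

    def observe(self, snr_db):
        """Return True if this sample is a sudden drop relative to recent samples."""
        suspected = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            suspected = (baseline - snr_db) >= self.drop_db
        if not suspected:
            # Only healthy samples update the baseline, so a failed AP stays flagged.
            self.history.append(snr_db)
        return suspected


detector = SnrFailureDetector()
samples = [30, 31, 29, 30, 30, 28, 8, 7]
flags = [detector.observe(s) for s in samples]
print(flags)
```

A gradual decline, as expected when a user walks away from the AP, would lower the baseline step by step and is less likely to cross the threshold in a single sample.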
14 / 40
Hardware (2) – Access-Point Replication
Uses an additional AP that is designated as a backup and can be activated once the primary AP fails.
Drawbacks:
• The latency involved in detecting AP failures and performing the fail-over (authenticate, ACK, re-association request, re-association response) is relatively large (7.03 seconds).
• Additional infrastructure cost: the backup AP might not be actively used under fault-free conditions.
15 / 40
Hardware (2) – Overlapping-Coverage
If one AP fails, mobile stations associated with that AP can be transferred over to another AP whose coverage area intersects with that of the failed AP.
In IEEE 802.11, the channels used by neighboring APs must be separated by at least five channels; this limited availability of channels can result in shadow areas.
Drawbacks:
• Requires that some spare capacity be reserved at each AP to take over the additional users that the AP will have to support in case a neighboring AP (with overlapping coverage) fails.
• The latency involved in detecting an AP failure and switching to a functional AP is relatively large.
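To see why the five-channel separation rule limits overlapping coverage, the following Python sketch lists the channels that satisfy it and checks a channel assignment for neighboring APs. The channel range 1 to 11 and the AP names are assumptions made for illustration.

```python
def non_interfering_channels(first=1, last=11, min_separation=5):
    """Greedily pick channels that are at least min_separation apart.

    The default range 1..11 is an assumption; the slide does not state it.
    """
    chosen = []
    channel = first
    while channel <= last:
        chosen.append(channel)
        channel += min_separation
    return chosen


def conflicts(assignment, neighbors, min_separation=5):
    """Return the neighbor pairs whose channels are closer than min_separation."""
    return [(a, b) for a, b in neighbors
            if abs(assignment[a] - assignment[b]) < min_separation]


usable = non_interfering_channels()
# Hypothetical layout: AP3 overlaps AP2 but is only three channels away from it.
bad_pairs = conflicts({"AP1": 1, "AP2": 6, "AP3": 3},
                      [("AP1", "AP2"), ("AP2", "AP3")])
print(usable, bad_pairs)
```

Under these assumptions only three channels are usable side by side, so a dense layout of overlapping APs quickly runs out of channels.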
16 / 40
Hardware (2) – Link-Multiplexing
Uses redundant communication paths from a mobile station, with each path connecting a distinct wireless network-interface card at the mobile station to a distinct AP.
Link-multiplexing is used rather than link-replication:
• The total bandwidth used for communication can remain the same as that used by a single link.
• The average message-transmission delay increases due to the multiplexing and demultiplexing.
17 / 40
Hardware (2) – Link-Multiplexing (cont.)
Requires additional software to be installed at the client and server.
Uses a library interpositioning (interceptor) approach to capture the network layer calls made by the application; the interceptor can be embedded inside a middleware layer at both the client and the server.
The additional software is responsible for:
• Fault detection.
• Intercepting network layer calls made by the application.
• Multiplexing/de-multiplexing data from/to the application.
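The multiplexing and de-multiplexing step can be sketched in Python as follows. This is a simplified illustration, not the paper's middleware: the function names, the chunk size, the round-robin policy, and the use of sequence numbers for reassembly are all assumptions.

```python
def multiplex(payload, num_links, chunk_size=4):
    """Split payload into sequence-numbered chunks and spread them round-robin over links."""
    links = [[] for _ in range(num_links)]
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    for seq, chunk in enumerate(chunks):
        links[seq % num_links].append((seq, chunk))
    return links


def demultiplex(links):
    """Merge the chunks from all links back into the original byte order."""
    received = [item for link in links for item in link]
    return b"".join(chunk for _, chunk in sorted(received))


def redistribute(links, failed):
    """After a detected link failure, move the failed link's chunks to the surviving links."""
    survivors = [i for i in range(len(links)) if i != failed]
    if not survivors:
        raise ValueError("no surviving link")
    result = [list(link) if i != failed else [] for i, link in enumerate(links)]
    for n, item in enumerate(links[failed]):
        result[survivors[n % len(survivors)]].append(item)
    return result


data = b"fault tolerant wlan"
links = multiplex(data, num_links=2)
after_failure = redistribute(links, failed=0)
print(demultiplex(links) == data, demultiplex(after_failure) == data)
```

Because each chunk travels on only one link, the total bandwidth stays the same as for a single link, while the reordering in `demultiplex` is one source of the extra delay.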
18 / 40
Hardware – Comparison
              Hardware (1)                                  Hardware (2)
Measures      reliability, availability and survivability   delay in detecting AP failure and switching
Extra cost    redundant AP                                  multiple network-interface cards
Software      standard 802.11e (channel reservation)        interceptor installed at client and server
Recover time  longer                                        shorter
Other         only for shadow regions                       mobile station consumes more battery power
19 / 40
Software (1)
TCP-DCR: A novel protocol for tolerating wireless channel errors
Accepted for publication in IEEE Transactions on Mobile Computing (February 2004)
http://www.crhc.uiuc.edu/wireless/groupPubs.html
20 / 40
Software (1) – Abstract
TCP-DCR delays the triggering of the congestion response algorithms for a small bounded period of time T to allow link-level retransmissions to recover losses due to channel errors.
If the packet is not recovered by link-level retransmission by the end of the delay period, TCP-DCR triggers the congestion recovery algorithms of fast retransmission and recovery.
Through simulations, TCP-DCR:
• Does not impact fairness towards the native implementations of TCP.
• Performs significantly better when channel errors contribute more towards packet losses in the network.
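The delayed congestion response can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the protocol implementation: the class and method names are hypothetical, the timer is checked only when a duplicate ACK arrives, and fast recovery is reduced to halving cwnd.

```python
class DcrSender:
    """Toy model of a sender that delays its congestion response by T seconds."""

    def __init__(self, cwnd, delay):
        self.cwnd = cwnd
        self.delay = delay          # the bounded delay period T
        self.dcr_start = None       # time of the first duplicate ACK, if any
        self.events = []

    def on_dupack(self, now):
        if self.dcr_start is None:
            # First duplicate ACK: start the delay period instead of reacting.
            self.dcr_start = now
            self.events.append("dcr start")
        elif now - self.dcr_start >= self.delay:
            # Not recovered within T: fall back to fast retransmit and recovery.
            self.cwnd = max(1, self.cwnd // 2)   # simplification of fast recovery
            self.dcr_start = None
            self.events.append("fast recovery")
        else:
            self.events.append("delay fast recovery")

    def on_new_ack(self, now):
        if self.dcr_start is not None:
            # The link layer recovered the packet in time: no congestion response.
            self.dcr_start = None
            self.events.append("dcr cancel")


sender = DcrSender(cwnd=20, delay=0.5)
sender.on_dupack(0.0)
sender.on_dupack(0.2)
sender.on_new_ack(0.4)      # recovered by link-level retransmission
cwnd_after_recovery = sender.cwnd
sender.on_dupack(1.0)
sender.on_dupack(1.6)       # delay period exceeded
print(sender.events, cwnd_after_recovery, sender.cwnd)
```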
21 / 40
Software (1) – Behavior
22 / 40
Software (1) – Choice of T
Timeline:
• t0
• t0+(RTT/2 – rtt/2)
• t0+RTT/2
• t0+RTT/2+rtt/2: BS receives indication that the packet is lost
• t0+RTT/2+rtt: Packet is recovered at receiver
• t0+RTT+rtt: Sender receives an ACK for the packet
The sender would have to delay the congestion response by at least (t0+RTT+rtt) - (t0+RTT) = rtt.
The interpacket delays are non-zero and the TCP sender may not know the value of rtt, so the lower bound of T is one RTT.
The retransmission timeout is usually set to RTT + 4 times the RTT variation. The choice of T should be such that unnecessary retransmission timeouts are avoided, so the upper bound of T is one RTT.
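The arithmetic behind the lower bound can be checked with a short Python sketch. The example values (RTT = 100 ms, rtt = 10 ms) are made up for illustration, and reading rtt as the round-trip time of the wireless hop is an assumption, since the slide does not define it.

```python
def recovery_timeline(t0, RTT, rtt):
    """Event times taken from the slide's timeline (same time unit as the inputs)."""
    return {
        "bs_learns_of_loss": t0 + RTT / 2 + rtt / 2,
        "recovered_at_receiver": t0 + RTT / 2 + rtt,
        "ack_reaches_sender": t0 + RTT + rtt,
    }


def minimum_delay(t0, RTT, rtt):
    """Extra wait beyond t0 + RTT before the ACK for the recovered packet arrives."""
    return recovery_timeline(t0, RTT, rtt)["ack_reaches_sender"] - (t0 + RTT)


# Hypothetical values in milliseconds.
timeline = recovery_timeline(t0=0, RTT=100, rtt=10)
delay = minimum_delay(t0=0, RTT=100, rtt=10)
print(timeline, delay)
```

The minimum delay equals rtt regardless of t0 and RTT, which matches the difference computed on the slide.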
23 / 40
Software (1) – Simulation
No Congestion Losses
24 / 40
Software (1) – Simulation (cont.)
Only Congestion Losses
10 Mbps, 5 ms
congestion
12 TCP-SACK flows & 12 TCP-DCR flows
25 / 40
Software (1) – Simulation (cont.)
Channel Errors & Congestion Losses
12 TCP-SACK flows & 12 TCP-DCR flows
TCP-DCR flows can make use of the link bandwidth not utilized effectively by the TCP-SACK flows.
26 / 40
Software (2)
Implementation of Explicit Wireless Loss Notification Using MAC-Layer Information
Wireless Communications and Networking, 2003. WCNC 2003. 2003 IEEE, Volume 2, 16-20 March 2003, Pages: 1339-1343
27 / 40
Software (2) – Abstract
TCP suffers a significant degradation in performance over wireless networks because it does not distinguish wireless link loss from congestion loss.
To overcome this problem, the Explicit Wireless Loss Notification (EWLN) scheme is proposed to explicitly inform the TCP sender of wireless link loss.
The EWLN scheme uses information from the MAC layer and takes into account the interplay with the error recovery mechanism at the link layer.
The sender's congestion control mechanism can be decoupled from the retransmission mechanism and set to react only to congestion-related losses.
28 / 40
Software (2) – MAC Protocol
MAC Protocol
Link-level retry
Comparing the seqNo of the current and buffered packets
To mobile terminal
To next node
29 / 40
Software (2) – Receiver
Ewln_flag = 1: Send duplicate ACK
Ewln_flag = 0: Congestion error
Normal
Wireless link error that the link level cannot recover
30 / 40
Software (2) – Sender
Retransmit the packet upon receiving the first ACK with EWLN set.
To avoid duplicate transmissions, retransmit only on the first ACK that has EWLN set.
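The sender rule can be sketched in Python as below. This is an illustrative model only: the class and method names are hypothetical, the ACK number is taken to identify the missing packet, and the handling of ordinary duplicate ACKs (three duplicates, then halve cwnd) is a simplified stand-in for standard TCP behaviour rather than something stated on the slides.

```python
class EwlnSender:
    """Toy sender that treats EWLN-marked duplicate ACKs as wireless loss."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ewln_retransmitted = set()   # packets already resent because of EWLN
        self.dupacks = {}                 # ordinary duplicate-ACK counters
        self.retransmissions = []

    def on_dup_ack(self, ack_no, ewln):
        if ewln:
            # Wireless loss: retransmit once, on the first EWLN ACK, and keep cwnd.
            if ack_no not in self.ewln_retransmitted:
                self.ewln_retransmitted.add(ack_no)
                self.retransmissions.append(ack_no)
            return
        # No EWLN flag: treat as congestion (simplified fast retransmit).
        self.dupacks[ack_no] = self.dupacks.get(ack_no, 0) + 1
        if self.dupacks[ack_no] == 3:
            self.cwnd = max(1, self.cwnd // 2)
            self.retransmissions.append(ack_no)


sender = EwlnSender(cwnd=16)
for _ in range(3):
    sender.on_dup_ack(2, ewln=True)    # only the first one triggers a retransmission
cwnd_after_wireless_loss = sender.cwnd
for _ in range(3):
    sender.on_dup_ack(4, ewln=False)   # congestion loss: window is reduced
print(sender.retransmissions, cwnd_after_wireless_loss, sender.cwnd)
```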
31 / 40
Software (2) – Example
Two packet losses occur over a wireless link in a single transmission window
Receive 1
Not receive 2
Not receive 2
Not receive 4
Receive 6
Not receive 4
Not receive 4
Not receive 4
If lose again?
32 / 40
Software (2) – Simulation
No wireless link error
33 / 40
Software (2) – Simulation (cont.)
congestion
34 / 40
Software – Comparison
              Software (1)               Software (2)
Measures      throughput                 sequence number
Modification  sender                     sender, receiver and BS
Requirement   receiver buffer            buffering technique in MAC protocol
Assumption    link level retransmission  1. link level retransmission; 2. packet corruption only in the payload part
35 / 40
Simulation – Environment
Paper : TCP-DCR: A novel protocol for tolerating wireless channel errors
Software : Linux 9 + NS 2.26 (DCR: modify tcp-sack1.cc)
Topology :
Tcl (additional setup):
• Error Model (exponential)
• Link Level Retransmission (LL/LLSnoop)
36 / 40
Simulation – DCR code verify
Time out:
ack no 1417 received at 87.2862, cwnd=22
ack no 1417 received at 87.4168, cwnd=22
dcr start at 87.4168 [ack no=1417, delay time=0.849]
ack no 1429 received at 91.4158, cwnd=1
dcr cancel at 91.4158 [ack no=1429]
ack no 1429 received at 91.4769, cwnd=2
LL retransmission (70.2242 - 69.9014 < 0.553):
ack no 1131 received at 69.7674, cwnd=19
ack no 1131 received at 69.9014, cwnd=19 (1st dupack)
dcr start at 69.9014 [ack no=1131, delay time=0.553]
ack no 1131 received at 70.2204, cwnd=19
ack no 1131 received at 70.2242, cwnd=19
delay fast recovery at 70.2242! [ack no=1131]
ack no 1140 received at 70.7637, cwnd=19
dcr cancel at 70.7637 [ack no=1140]
Fast recovery (85.656 - 85.4978 > 0.144):
ack no 4128 received at 85.4946, cwnd=19
ack no 4129 received at 85.4962, cwnd=19
ack no 4129 received at 85.4978, cwnd=19
dcr start at 85.4978 [ack no=4129, delay time=0.144]
ack no 4129 received at 85.6545, cwnd=19
ack no 4129 received at 85.656, cwnd=19
fast recovery begin at 85.656, dcr cancel! [ack no=4129]
ack no 4142 received at 85.6816, cwnd=9
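The two timing checks on this slide can be reproduced from the trace with a short Python sketch. The regular expressions assume the log line format shown on the slide, and the function name is hypothetical; only the lines needed for the check are included.

```python
import re

START = re.compile(r"dcr start at ([\d.]+) \[ack no=\d+, delay time=([\d.]+)\]")
DECISION = re.compile(r"(delay fast recovery|fast recovery begin) at ([\d.]+)")

LOG_LL_RETRANSMISSION = """\
dcr start at 69.9014 [ack no=1131, delay time=0.553]
ack no 1131 received at 70.2242, cwnd=19
delay fast recovery at 70.2242! [ack no=1131]
"""

LOG_FAST_RECOVERY = """\
dcr start at 85.4978 [ack no=4129, delay time=0.144]
ack no 4129 received at 85.656, cwnd=19
fast recovery begin at 85.656, dcr cancel! [ack no=4129]
"""


def check_delay(log):
    """Return (decision, elapsed, still_within_delay) for the first decision in the log."""
    start = delay = None
    for line in log.splitlines():
        m = START.search(line)
        if m:
            start, delay = float(m.group(1)), float(m.group(2))
            continue
        m = DECISION.search(line)
        if m and start is not None:
            elapsed = float(m.group(2)) - start
            return m.group(1), round(elapsed, 4), elapsed < delay
    return None


ll_case = check_delay(LOG_LL_RETRANSMISSION)
fr_case = check_delay(LOG_FAST_RECOVERY)
print(ll_case, fr_case)
```

In the first trace the elapsed time is still below the delay period, so the congestion response is postponed; in the second it exceeds the delay period, so fast recovery begins.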
37 / 40
Simulation – Performance (1)
Some "Fast-recovery" and "Timeout" events still happen.
Almost all errors are recovered by "Link level retransmission".
38 / 40
Simulation – Performance (2)
Some "Fast-recovery" and "Timeout" events still happen.
Almost all errors are recovered by "Link level retransmission".
39 / 40
Simulation – Performance (3)
Some "Fast-recovery" and "Timeout" events still happen.
Almost all errors are recovered by "Link level retransmission".
40 / 40
Conclusion
Papers for WLAN Fault Tolerance
• Hardware Fault Tolerance : fewer
• Software Fault Tolerance : more
Simulation / Experiment method
• Hardware : Numerical examples or Experiment
• Software : NS (Network Simulation tool)