Post on 28-Dec-2021
Debloating the Linux WiFi Stack
Toke Høiland-Jørgensen
Karlstad University
toke.hoiland-jorgensen@kau.se
1 QCA, 9th Dec 2016 | Toke Høiland-Jørgensen
Outline
- A history of bloat fixes in Linux
- The issues
- The make-wifi-fast changes
- Implementation details
- Going forward
- Summary
What is Bufferbloat?
[Figure: three stacked time series over 0 to 140 s: two throughput panels (Mbits/s, axes up to 20 and 10) and a latency panel (ms, axis from 200 to 1400).]
[Figure: the same three-panel layout over 0 to 140 s: two throughput panels (Mbits/s, axes from 1.5 to about 3.5) and a latency panel (ms, axis from 50.0 to 53.5).]
A history of Bloat Fixes in Linux
Fix                         Mainline     OpenWrt / LEDE
Byte Queue Limits           Linux 3.3    Dec 2012
FQ-CoDel qdisc              Linux 3.5    Apr 2013
TCP small queues            Linux 3.6    -
sqm-scripts                 -            Oct 2014
Packet pacing (fq qdisc)    Linux 3.12   -
cake qdisc                  -            May 2016
BBR congestion control      Linux 4.9    -
WiFi queue rework (ath9k)   Linux 4.10   Oct 2016
Issue 1: Bufferbloat
[Figure: cumulative probability (0.0 to 1.0) of latency (ms, log scale from 10^1 to 10^3), comparing "Current kernel" with "Our modifications".]
Issue 2: Airtime fairness
For station $i \in I$ transmitting A-MPDU aggregates of size $s_i$ at PHY rate $r_i$:

\[
T(i) =
\begin{cases}
\dfrac{1}{|I|} & \text{with fairness} \\[1ex]
\dfrac{T_{data}(s_i, r_i)}{\sum_{j \in I} T_{data}(s_j, r_j)} & \text{otherwise}
\end{cases}
\tag{1}
\]

\[
R(i) = T(i)\, R(s_i, r_i) \tag{2}
\]

where $R(s_i, r_i) = \dfrac{s_i}{T_{data}(s_i, r_i) + T_{oh}}$ is the effective station rate with no collisions.
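To make equations (1) and (2) concrete, here is a minimal numeric sketch. The slide does not define T_data or give a value for T_oh, so the code assumes T_data is simply payload bits divided by PHY rate and uses a hypothetical fixed overhead of 100 µs; the station sizes and rates are also made up for illustration.

```python
def t_data_us(size_bytes, rate_mbps):
    """Assumed model: transmission time in microseconds = payload bits / PHY rate."""
    return size_bytes * 8 / rate_mbps


def airtime_shares(stations, fair):
    """Equation (1): airtime share T(i) for each (size_bytes, rate_mbps) station."""
    if fair:
        return [1 / len(stations)] * len(stations)
    times = [t_data_us(size, rate) for size, rate in stations]
    total = sum(times)
    return [t / total for t in times]


def effective_rates_mbps(stations, fair, t_oh_us=100.0):
    """Equation (2): R(i) = T(i) * s_i / (T_data(s_i, r_i) + T_oh).

    t_oh_us is a hypothetical per-transmission overhead, not a value from the talk.
    """
    shares = airtime_shares(stations, fair)
    return [
        share * (size * 8) / (t_data_us(size, rate) + t_oh_us)
        for share, (size, rate) in zip(shares, stations)
    ]


# Two fast stations and one slow station (illustrative numbers only).
stations = [(30000, 120.0), (30000, 120.0), (3000, 6.0)]

unfair = airtime_shares(stations, fair=False)
fair = airtime_shares(stations, fair=True)
print("airtime shares without fairness:", unfair)
print("airtime shares with fairness:   ", fair)
print("total Mbit/s without fairness:", sum(effective_rates_mbps(stations, False)))
print("total Mbit/s with fairness:   ", sum(effective_rates_mbps(stations, True)))
```

Under these assumed numbers the slow station takes half of the airtime when nothing enforces fairness, and the aggregate throughput rises when each station is limited to an equal share.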
Issue 2: Airtime fairness
[Figure: two panels plotting microseconds (0 to 200000) against time (0 to 40 s).]
[Figure: two queueing diagrams spanning the qdisc layer, the MAC layer and the ath9k driver. The first shows a qdisc-layer FIFO (footnote: can be replaced with an arbitrary configuration), TID assignment, per-TID buf_q and retry_q queues served round-robin (RR), and a per-hardware-queue (x4) FIFO of 2 aggregates leading to the hardware. The second shows the qdisc layer bypassed, flows split into per-TID FQ-CoDel queues under a global limit of 8192, a per-TID retry_q served round-robin, and the same hardware-queue (x4) FIFO of 2 aggregates.]
Airtime scheduler (ath9k only)
function on_tx(pkt)
    station = get_station(pkt)
    station.deficit -= pkt.duration

function on_rx(pkt)
    station = get_station(pkt)
    station.deficit -= calc_dur(pkt)

function schedule(hwq)
    if full(hwq) then return
begin:
    station = list_head(station_list)
    if station.deficit <= 0 then
        station.deficit += quantum
        list_move_end(station, station_list)
        goto begin
    if !station.queue then
        list_del(station)
        goto begin
    queue_aggregate(station)
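The deficit scheduler above can be tried out with a small simulation. This is a sketch under stated assumptions, not the ath9k code: the quantum of 300 µs, the Station class, and the simulate() helper are all hypothetical, only the transmit side is modelled (on_rx is left out), the goto is replaced with a loop, and schedule() returns None when no station is active, a guard the pseudocode does not show.

```python
from collections import deque
from dataclasses import dataclass, field

QUANTUM_US = 300  # hypothetical quantum, in microseconds of airtime


@dataclass
class Station:
    name: str
    deficit: int = 0
    queue: deque = field(default_factory=deque)  # airtime (us) of each pending aggregate


def on_tx(station, duration_us):
    """Charge the airtime of a transmitted aggregate to the station."""
    station.deficit -= duration_us


def schedule(active):
    """Return the next station allowed to transmit, or None if none is active."""
    while active:
        station = active[0]
        if station.deficit <= 0:
            # Out of credit: top up and move to the back of the list.
            station.deficit += QUANTUM_US
            active.rotate(-1)
            continue
        if not station.queue:
            # Nothing to send: drop the station from the active list.
            active.popleft()
            continue
        return station
    return None


def simulate(durations_us, budget_us):
    """Run always-backlogged stations until budget_us of airtime has been used."""
    stations = [Station(name, queue=deque([d])) for name, d in durations_us.items()]
    active = deque(stations)
    used = {name: 0 for name in durations_us}
    elapsed = 0
    while elapsed < budget_us:
        station = schedule(active)
        if station is None:
            break
        duration = station.queue.popleft()
        station.queue.append(duration)  # keep the station backlogged
        on_tx(station, duration)
        used[station.name] += duration
        elapsed += duration
    return used


used = simulate({"fast": 100, "slow": 1000}, budget_us=1_000_000)
print(used)
```

Because a station is charged for the airtime it actually used rather than for the number of packets sent, the slow station ends up transmitting less often, and both stations converge on roughly equal airtime.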
Airtime fairness performance
[Figure: airtime share (0.0 to 1.0) of Station 1, Station 2 and Station 3 (slow), and throughput (Mbits/s, 0 to 40) of Station 1, Station 2, Station 3 and the average, under FIFO, FQ-CoDel, FQ-MAC and Airtime fair FQ.]
30 Stations
[Figure: airtime share (0.0 to 0.7) of the slow station, and throughput (Mbits/s, 0 to 20), under FQ-CoDel, FQ-MAC and Airtime fair FQ.]
Sparse station optimisation
[Figure: two cumulative probability plots of latency (ms). The first (5 to 30 ms) compares the optimisation enabled and disabled for UDP and for TCP; the second (20 to 100 ms) compares the optimisation enabled and disabled.]
Better aggregation
[Figure: packets (axis from 2 to 16) for Station 1, Station 2 and Station 3 (slow) under FIFO, FQ-CoDel, FQ-MAC and Airtime fair FQ.]
Helps your web browsing
[Figure: mean download time (s, log scale from 10^0 to 10^1) for a small page and a large page under FIFO, FQ-CoDel, FQ-MAC and Airtime fair FQ.]
Better VoIP quality than the VO queue?
                  5 ms            50 ms
Scheme    QoS   MOS   Thrp     MOS   Thrp
FIFO      VO    4.17  27.5     4.13  21.6
          BE    1.00  28.3     1.00  22.0
FQ-CoDel  VO    4.17  25.5     4.08  15.2
          BE    1.24  23.6     1.21  15.1
FQ-MAC    VO    4.41  39.1     4.38  28.5
          BE    4.39  43.8     4.37  34.0
Airtime   VO    4.41  39.9     4.38  32.7
          BE    4.39  52.0     4.36  46.1
When to queue?
- The smart queue should be close to the HW
- But some parts of 802.11 are sensitive to reordering
  - Crypto IV
  - Sequence numbers
- State can change while packets are queued
Solution: Split mac80211 TX handlers
Changes to mac80211 / driver interface
Before:
1. mac80211 calls drv_tx()
2. Driver queues packet for HW transmit
Now:
1. mac80211 queues packet, calls drv_wake_tx_queue()
2. Driver wakes up, pulls packets from queue
Driver opts in to new behaviour by implementing wake_tx_queue() op.
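The shift from a push model to a pull model can be illustrated with a toy example. This is only a sketch of the control flow: the classes TxQueue, PullDriver and Stack are hypothetical Python stand-ins rather than the mac80211 API, and the hardware queue depth of 2 is an assumption chosen to keep the example small.

```python
from collections import deque


class TxQueue:
    """Hypothetical stand-in for a queue owned by the stack, not by the driver."""

    def __init__(self):
        self.packets = deque()

    def dequeue(self):
        return self.packets.popleft() if self.packets else None


class PullDriver:
    """Toy driver that pulls packets only when the hardware has room."""

    HW_DEPTH = 2  # assumed hardware queue depth

    def __init__(self):
        self.hw = []

    def wake_tx_queue(self, txq):
        # Pull from the stack's queue until the hardware queue is full.
        while len(self.hw) < self.HW_DEPTH:
            pkt = txq.dequeue()
            if pkt is None:
                break
            self.hw.append(pkt)

    def tx_complete(self, txq):
        # A slot has freed up, so pull the next packet.
        self.hw.pop(0)
        self.wake_tx_queue(txq)


class Stack:
    """Toy stack: queue the packet, then wake the driver."""

    def __init__(self, driver):
        self.driver = driver
        self.txq = TxQueue()

    def transmit(self, pkt):
        self.txq.packets.append(pkt)
        self.driver.wake_tx_queue(self.txq)


stack = Stack(PullDriver())
for i in range(5):
    stack.transmit(i)
print("in hardware:", stack.driver.hw)
print("still queued in the stack:", list(stack.txq.packets))
```

Keeping the backlog in the stack's queue rather than in the driver is what lets a queue management scheme act on packets right up until the hardware is ready for them.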
Drivers covered
- ath9k (full - airtime fairness pending)
- ath10k (partial - airtime fairness stalled)
- mt76 (full - no airtime fairness)
How to handle FullMAC / thick firmware devices?
Scaling CoDel
- CoDel can be too aggressive at very low rates
- Needs dynamic scaling below ~4 Mbps
Possible solution: get_expected_throughput() rate control hook
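One way such scaling could look is sketched below. This is purely illustrative and not a scheme from the talk: it starts from CoDel's standard defaults (5 ms target, 100 ms interval), takes the ~4 Mbps threshold from the slide, and assumes a simple inverse-proportional scaling rule below that threshold. The function name codel_params and its argument are hypothetical; in a real stack the expected rate would come from the rate control hook.

```python
CODEL_TARGET_MS = 5.0      # CoDel default target
CODEL_INTERVAL_MS = 100.0  # CoDel default interval
SCALE_BELOW_MBPS = 4.0     # threshold taken from the slide


def codel_params(expected_mbps):
    """Return (target_ms, interval_ms) for a station's expected throughput.

    Assumed rule: above the threshold use the defaults; below it, stretch
    both parameters in inverse proportion to the expected rate.
    """
    if expected_mbps <= 0:
        raise ValueError("expected throughput must be positive")
    factor = max(1.0, SCALE_BELOW_MBPS / expected_mbps)
    return CODEL_TARGET_MS * factor, CODEL_INTERVAL_MS * factor


print(codel_params(20.0))
print(codel_params(1.0))
```

The intent is that a slow station, where a single packet can take several milliseconds to transmit, is not penalised for queueing delay it cannot avoid.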
Smarter aggregate sizing
Idea: Cut latency by scaling aggregate size with number of active stations.
Airtime policy enforcements
"My guest network should only use 20% of the capacity"
Less multicast
- Multicast traps the whole network at a low rate
- Unicasting to everyone can be faster
Promising patch set on linux-wireless, but breaks IP-layer assumptions
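A back-of-the-envelope comparison shows why unicasting can win. The numbers are assumptions chosen for illustration (a 1500-byte frame, a 6 Mbit/s multicast rate, five stations at 60 Mbit/s each), and the model ignores per-frame overhead, acknowledgements and retries, all of which narrow the gap in practice.

```python
def multicast_airtime_us(payload_bytes, multicast_rate_mbps):
    """Airtime of one multicast frame: payload bits / rate gives microseconds."""
    return payload_bytes * 8 / multicast_rate_mbps


def unicast_airtime_us(payload_bytes, station_rates_mbps):
    """Airtime of sending one copy to each station at that station's own rate."""
    return sum(payload_bytes * 8 / rate for rate in station_rates_mbps)


payload = 1500                   # bytes, illustrative
multicast_rate = 6.0             # Mbit/s, assumed low multicast rate
station_rates = [60.0] * 5       # Mbit/s, five assumed fast stations

print("multicast:", multicast_airtime_us(payload, multicast_rate), "us")
print("unicast x5:", unicast_airtime_us(payload, station_rates), "us")
```

With these assumed rates, five unicast copies use half the airtime of a single multicast frame; the balance tips back towards multicast as the number of receivers grows or their rates drop.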
Improving rate selection
- Minstrel's convergence time depends on the number of rates
- We think we can do better with better statistics
Currently exploring a model based on the multi-armed bandit problem
Rethinking QoS?
- QoS tends to map diffserv to HW priority queues
- This wastes TXOPs
What if we get rid of the TID separation and assign QoS levels dynamically for all traffic?
Getting closer to the hardware
- Idea: Hardware interrupt on start of transmission
- Build next aggregate when this arrives
Acknowledgements
- Co-authors
  - Michał Kazior, Dave Täht, Per Hurtig and Anna Brunstrom
- LEDE and Linux devs
  - Felix Fietkau, Johannes Berg, Kalle Valo
- Bufferbloat and make-wifi-fast communities
  - Too many to mention by name
Summary
- Reduced WiFi bufferbloat by an order of magnitude
- Achieved almost perfect airtime fairness in most cases
- Solved several practical implementation issues
[Figure: cumulative probability (0.0 to 1.0) of latency (ms, log scale from 10^1 to 10^3), comparing "Current kernel" with "Our modifications".]
Questions?