8/10/2019 USB Performance Analysis of Bulk Traffic
Platform Architecture Lab
USB Performance Analysis
of Bulk Traffic
Brian [email protected]
Introduction
Bulk Traffic
  Designed for reliable, highly variable data transfer
  No guarantees are made in the specification for throughput
  Is scheduled last, after ISOC, Interrupt, and Control
Throughput is dependent on many factors
Introduction
We will look at Bulk Throughput from the following aspects
  Distribution of Throughput for Various Packet Sizes and Endpoints
  Low Bandwidth Performance
  Small Endpoint Performance
  NAK Performance
  CPU Utilization
  PCI Bus Utilization
Test Environment -- Hardware
PII 233 (8522px) with 512 Bytes Cache
Atlanta Motherboard with 440LX (PIX 4A) Chipset
32 Meg Memory
Symbios OHCI Controller (for OHCI Measurements)
Intel Lava Card as Test Device
Test Environment -- Software
Custom Driver and Application
  Test Started by IOCTL
  IOCTL allocates static memory structures, submits IRP to USBD
  Completion routine resubmits next buffer
  All processing done at ring 0, IRQL_DISPATCH
Terminology
A Packet is a single packet of data on the bus. Its size is determined by the Max Packet Size of the device.
  Valid numbers are 8, 16, 32, 64
A Buffer is the amount of data sent to USBD in a single IRP.
  In this presentation buffers range from 8 Bytes to 64K Bytes
Unless otherwise specified, most data was taken at 64 Byte Max Packet Size, with 15 Endpoints configured in the system
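The relationship between a Buffer and its Packets can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `packets_per_buffer` is hypothetical, and it assumes a buffer is split into full Max Packet Size packets plus at most one short packet.

```python
import math

# Max Packet Sizes listed as valid in the terminology above.
VALID_MAX_PACKET_SIZES = (8, 16, 32, 64)

def packets_per_buffer(buffer_bytes, max_packet_size=64):
    """Number of bus packets needed to move one buffer (one IRP)."""
    if max_packet_size not in VALID_MAX_PACKET_SIZES:
        raise ValueError("max packet size must be 8, 16, 32, or 64")
    if buffer_bytes <= 0:
        raise ValueError("buffer must hold at least one byte")
    return math.ceil(buffer_bytes / max_packet_size)

print(packets_per_buffer(512))     # 8
print(packets_per_buffer(65536))   # 1024
print(packets_per_buffer(8, 8))    # 1
```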
Host Controller Operation (UHCI)
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (UHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32767 Bytes), Throughput (0 to 1,200,000 Bytes per Second). Callouts: Single Endpoint Throughput; Flat Throughput @ 512 and 1024 Byte Buffers; Oscillations @ 256, 512 Byte Buffers; Small Buffer Throughput]
Small Buffer Throughput
For Buffer Sizes < Max Packet Size
Host Controller sends 1 Buffer per Frame
No Ability to Look Ahead and Schedule Another IRP Even Though Time Remains in the Frame
Why is this?
Interrupt Delay
[Timing diagram. Labels: Last Packet of Buffer 'n'; Start of Frame; Interrupt; Software Latency; Unused Frame; First Packet of Buffer 'n+1']
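A rough way to see the effect of the interrupt delay is to simulate a driver that keeps one IRP in flight. The sketch below is a toy model, not the test driver: it assumes 1000 frames per second, a per-frame limit of 960 bytes (15 packets of 64 bytes, the figure used elsewhere in this presentation), and that the next buffer cannot start until the frame after the previous one completes. The name `simulate_one_irp_in_flight` is hypothetical.

```python
def simulate_one_irp_in_flight(buffer_bytes, bytes_per_frame=960, frames=1000):
    """Bytes moved in `frames` frames when each buffer is resubmitted
    only after the completion interrupt at the end of its last frame."""
    total = 0
    remaining = buffer_bytes          # bytes left in the current buffer
    for _ in range(frames):
        sent = min(remaining, bytes_per_frame)
        total += sent
        remaining -= sent
        if remaining == 0:
            # Completion is seen at the frame boundary; the rest of this
            # frame stays unused and the next buffer starts next frame.
            remaining = buffer_bytes
    return total

for size in (8, 64, 512, 1024):
    print(size, simulate_one_irp_in_flight(size))
```

Under these assumptions a 512 byte buffer and a 1024 byte buffer both come out at 512,000 bytes per second, which is consistent with the flat region noted on the UHCI graph.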
Single Endpoint Graph
Flat Throughput @ 1024 and 512 Byte Graphs
Single Ended Throughput for 64K Byte Buffers Below Theoretical Max of 1216000 Bytes per Second
Both are explained by Looking at the Number of Packets per Frame
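The theoretical maximum quoted above is consistent with 19 packets of 64 bytes in each 1 ms frame. The sketch below converts a packets-per-frame count into bytes per second; the helper name is hypothetical, and the 1000 frames per second figure is the full-speed USB frame rate.

```python
FRAMES_PER_SECOND = 1000  # full-speed USB: one frame per millisecond

def throughput_from_ppf(packets_per_frame, bytes_per_packet=64):
    """Bytes per second when every frame carries the given packet count."""
    return packets_per_frame * bytes_per_packet * FRAMES_PER_SECOND

for ppf in (15, 17, 18, 19):
    print(ppf, throughput_from_ppf(ppf))
```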
Maximum Packets per Frame
Columns:
  Buffer Size (Bytes)
  Maximum Bytes per Frame (15 Packets @ 64 Bytes per Packet)
  Number of Frames to Transfer Bulk of Data
  Number of Bytes Left Over
  Total Number of Frames to Transfer Data
  Maximum Expected Throughput (Bytes per Second for Transfer Size)
  Measured Throughput (Bytes per Second)

Buffer  Max Bytes  Frames  Bytes      Total   Max Expected  Measured
Size    per Frame  (Bulk)  Left Over  Frames  Throughput    Throughput
     8        960       1          0       1          8000        8071
    16        960       1          0       1         16000       16082
    32        960       1          0       1         32000       32293
    64        960       1          0       1         64000       64264
   128        960       1          0       1        128000      129186
   256        960       1          0       1        256000      255667
   512        960       1          0       1        512000      512017
  1024        960       1         64       2        512000      515515
  2048        960       2        128       3        682666      682803
  4096        960       4        256       5        819200      819200
  8192        960       8        512       9        910222      910131
 16384        960      17         64      18        910222      910404
 32768        960      34        128      35        936228      936072
 65536        960      68        256      69        949797      948087
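The Maximum Expected Throughput column can be reproduced with a short calculation. This sketch assumes the model implied by the table: at most 960 bytes move per frame, a buffer occupies a whole number of frames, and the next buffer starts in a fresh frame. The function name is hypothetical.

```python
import math

BYTES_PER_FRAME = 15 * 64   # 15 packets of 64 bytes, as in the table
FRAMES_PER_SECOND = 1000

def expected_throughput(buffer_bytes):
    """Return (total frames, bytes left over, expected bytes per second)."""
    total_frames = math.ceil(buffer_bytes / BYTES_PER_FRAME)
    # The table lists no leftover for buffers that fit in a single frame.
    left_over = buffer_bytes % BYTES_PER_FRAME if total_frames > 1 else 0
    rate = buffer_bytes * FRAMES_PER_SECOND // total_frames
    return total_frames, left_over, rate

for size in (512, 1024, 2048, 65536):
    print(size, expected_throughput(size))
```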
Throughput for Multiple Endpoints
512 Byte Buffers
[Chart: Throughput (0 to 1,200,000 Bytes per Second) vs. Number of Endpoints (1 to 15)]
512 Byte Buffers 1 Endpoint
8 Packets * 64 Bytes per Packet * 1000 Frames per Second = 512,000 B/S (511,986 B/S Measured)
Columns: Endpoint, Inter Delay (Bits), Packet Number, End Time (Bits)
1 SOF 1000 0 1 2 3 4 5 6 7 8 5000
8 Packets Total per Frame
512 Byte Buffers 2 Endpoints
16 Packets * 64 Bytes per Packet * 1000 Frames per Second = 1,024,000 B/S (1,022,067 B/S Measured)
Notice that Interrupt Delay is not a factor here!
Columns: Endpoint, Inter Delay (Bits), Packet Number, Ending Time (Bits)
2 SOF 5 7 0 1 2 3 4 5 6
1 0 1 2 3 4 5 6 7 480
16 Packets Total per Frame
512 Byte Buffer -- 3 Endpoints
24 Packets * 64 Bytes per Packet / 2 Frames * 1000 Frames per Second = 768,000 B/S (776,211 B/S Measured)
For Frame N
Columns: Endpoint, Inter Delay (Bits), Packet Number, Ending Time
3 S 0 1 2 3 4 5 554
2 O 1000 0 1 2 3 4
1 F 0 1 2 3
15 Packets Total in This Frame
For Frame N + 1
Columns: Endpoint, Inter Delay (Bits), Packet Number, Ending Time
3 S 6 7
2 O 5 5 6 7
1 F 4 5 6 7 5700
9 Packets Total in This Frame
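The arithmetic for the one, two, and three endpoint cases above follows one pattern: count the packets all endpoints move over a repeating group of frames, then scale to one second. A minimal sketch, with a hypothetical function name and the 1000 frames per second rate assumed:

```python
def aggregate_throughput(total_packets, frames, bytes_per_packet=64,
                         frames_per_second=1000):
    """Bytes per second when `total_packets` are sent every `frames` frames."""
    return total_packets * bytes_per_packet * frames_per_second // frames

print(aggregate_throughput(8, 1))    # 1 endpoint:  512000
print(aggregate_throughput(16, 1))   # 2 endpoints: 1024000
print(aggregate_throughput(24, 2))   # 3 endpoints: 768000
```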
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (OHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32768 Bytes), Total Throughput (0 to 1,200,000 Bytes per Second). Callouts: High End Throughput, 18 PPF vs. 17 PPF; Single Ended Throughput, 900,000 vs. 950,000 B/S; Flat Throughput @ 512 and 1024 B Buffers; Small Buffer Throughput; Oscillations @ 256 and 512 B Buffers]
Minimal Endpoint Configuration
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (UHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32768 Bytes), Total Throughput (0 to 1,200,000 Bytes per Second). Callout: Higher Single Endpoint Throughput, 17 vs. 15 PPF]
Host Controller Operation (UHCI)
Throughput of a Single Endpoint in Single and Multiple Endpoint Configurations (UHCI)
[Chart: Throughput (0 to 1,200,000 Bytes per Second) vs. Buffer Size (8 to 65536 Bytes). Series: Single, Multiple]
Results
We are working with Microsoft to remove unused endpoints from the Host Controller Data Structures
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (OHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32768 Bytes), Total Throughput (0 to 1,200,000 Bytes per Second). Callouts: Higher Single Endpoint Throughput; More Endpoints get 18 Packets per Frame]
Distribution of Throughput across Endpoints
Throughput by Endpoint vs. Number of Endpoints, 64K Byte Buffers (UHCI)
[3-D chart. Axes: Endpoint Number (1 to 15), Number of Endpoints (up to 15), Throughput (0 to 1,000,000 Bytes per Second)]
Results
We are working with Microsoft to get the Host Controller driver to start sending packets at the next endpoint rather than starting over at the first endpoint at the beginning of each frame.
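To illustrate why the starting point matters, the toy model below hands out a fixed number of packet slots per frame to endpoints in list order. It is a sketch under simplifying assumptions (every endpoint always has data, one packet per visit, a fixed packet budget per frame) and is not the Host Controller driver's actual algorithm; all names are hypothetical.

```python
def packets_per_endpoint(num_endpoints, packets_per_frame, frames, resume):
    """Count packets each endpoint gets over `frames` frames.

    resume=False restarts at the first endpoint every frame;
    resume=True continues from where the previous frame stopped.
    """
    counts = [0] * num_endpoints
    nxt = 0
    for _ in range(frames):
        if not resume:
            nxt = 0
        for _ in range(packets_per_frame):
            counts[nxt] += 1
            nxt = (nxt + 1) % num_endpoints
    return counts

print(packets_per_endpoint(4, 6, 2, resume=False))  # [4, 4, 2, 2]
print(packets_per_endpoint(4, 6, 2, resume=True))   # [3, 3, 3, 3]
```

In this model, restarting at the first endpoint every frame gives the early endpoints a permanent advantage, while resuming spreads the extra slots evenly over time.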
Throughput by Endpoint vs. Number of Endpoints, 64K Byte Buffers (OHCI)
[3-D chart. Axes: Endpoint Number (1 to 15), Number of Endpoints (up to 15), Throughput (0 to 900,000 Bytes per Second)]
Limited Bandwidth Operation
Throughput by Endpoint vs. Number of Endpoints, 1023 Bytes / Frame Isoc Traffic (UHCI)
[3-D chart. Axes: Endpoint Number (1 to 15), Number of Endpoints (1 to 15), Throughput (0 to 300,000 Bytes per Second)]
Throughput by Endpoint vs. Number of Endpoints, 768 Bytes / Frame Isoc Traffic (OHCI)
[3-D chart. Axes: Endpoint Number (1 to 15), Number of Endpoints (1 to 15), Throughput (0 to 400,000 Bytes per Second)]
Small Endpoint Performance
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8 Byte Max Packet Size (UHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32768 Bytes), Total Throughput (0 to 450,000 Bytes per Second)]
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8 Byte Max Packet Size (OHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32768 Bytes), Total Throughput (0 to 500,000 Bytes per Second)]
Total Throughput for a Single Endpoint for Various Packet Sizes (OHCI)
[Chart: Throughput (0 to 1,000,000 Bytes per Second) vs. Buffer Size (8 to 65536). Series: 8, 16, 32, 64]
Throughput by Endpoint vs. Number of Endpoints, Mixed 64 and 8 Byte Endpoints (UHCI)
[3-D chart. Axes: Endpoint Number (1 to 15), Number of Endpoints (1 to 15), Throughput (0 to 1,000,000 Bytes per Second)]
If you care about throughput:
Use 64 byte Max Packet Size Endpoints
Use Large Buffers
NAK Performance
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints with 1 Endpoint NAKing 64 Bytes OUT (OHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32768 Bytes), Total Throughput (0 to 1,200,000)]
Single Endpoint Throughput With 64 Byte Endpoint NAKing on the Bus (OHCI)
[Chart: Throughput (0 to 1,000,000 Bytes per Second) vs. Buffer Size (8 to 65536). Series: No NAK, NAK. Callout: 45% Drop in Total Throughput]
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 14 Endpoints OUT, 1 Endpoint NAK IN (UHCI)
[3-D chart. Axes: Number of Endpoints (1 to 15), Buffer Size (8 to 32768 Bytes), Total Throughput (0 to 1,200,000 Bytes per Second)]
Single Endpoint Throughput, One Endpoint NAKing IN
[Chart: Throughput (0 to 1,000,000 Bytes per Second) vs. Buffer Size (8 to 65536). Series: NAK, No NAK]
CPU Utilization
CPU Utilization
Idle process incrementing a counter in main memory
  Designed to simulate a heavily CPU bound load
Numbers indicate how much work the CPU could accomplish after servicing USB traffic
  Higher numbers are better
Small buffers and large numbers of Endpoints take more overhead
  Software Stack Navigation
Endpoint 0 is the Control -- No USB Traffic running
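One way to read the idle counts is relative to the control run with no USB traffic. The sketch below turns a pair of idle counts into the fraction of CPU work taken by USB servicing. The counts in the example are made-up placeholders, not measurements from this study, and the function name is hypothetical.

```python
def usb_cpu_fraction(idle_count, control_idle_count):
    """Fraction of CPU work lost to USB servicing, relative to the control."""
    if control_idle_count <= 0:
        raise ValueError("control idle count must be positive")
    return 1.0 - idle_count / control_idle_count

# Placeholder numbers for illustration only.
control = 10_000_000   # idle count with no USB traffic running
loaded = 7_500_000     # idle count while bulk traffic runs
print(usb_cpu_fraction(loaded, control))   # 0.25
```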
CPU Utilization (UHCI)
[3-D chart. Axes: Buffer Size (2048 to 65536 Bytes), Number of Endpoints (1 to 15), Idle Count (0 to 12,000,000)]
CPU Utilization (OHCI)
[3-D chart. Axis ticks: 2048 to 65536; 1 to 15; 0 to 12,000,000]
PCI Utilization
PCI Utilization (UHCI)
[3-D chart. Axes: Buffer Size (2048 to 65536), Number of Endpoints (1 to 15), % Utilization (0 to 35)]
PCI Utilization
(UHCI) 15 Endpoint Configuration
For low numbers of active endpoints, the Host Controller must poll memory for each unused endpoint, causing relatively high utilization.
Removing unused endpoints will lower single endpoint PCI utilization for this configuration.
Conclusions
UHCI Host Controller Driver needs a few tweaks
  Needs to get the Host Controller to start sending packets where it last left off rather than at endpoint 1.
  Needs to remove unused endpoints from the list
Performance Recommendations
  Use 64 Byte Max Packet Size Endpoints
  Large Buffers are better than small buffers
  Reduce NAKed traffic
  Use fast devices if possible
Future Research Topics
Multiple IRPs per Pipe
USB needs to control throughput to the slow device
  Small Endpoints aren't good
  Small Buffers aren't good
  NAKing isn't good