12/11/1997
High Performance Computing & Communication Research Laboratory
Hyok Kim
http://www.hallym.ac.kr/~hkim
Performance Analysis of TCP/IP Data Send/Receive Processing Under UNIX Operating Systems
Talk Outline
Project overview
Performance analysis of TCP/IP protocol
  Performance analysis of Parallel TCP/IP
  Bottlenecks in processing TCP/IP
  Performance analysis techniques
  Measurement tool and performance metrics
  Empirical results
Future & on-going works
Concluding remarks
Project Overview
H/W implementation of TCP/IP protocol
  Handling ATM traffic (155 Mbps or higher)
  ATM interfacing
Specification
  Design of TCP/IP protocol processor
  ATM interfacing
  PCI/AMBA interfacing
  API implementation for TCP/IP H/W
Joint project with
  Hallym U. & Pusan National U. (major institute)
  Kwangwoon U. & Kyungpook National U.
Internet Layering and Peer Model
[Figure: two hosts with matching protocol layers, connected through the medium]
application:  FTP client        <- FTP protocol ->        FTP server
transport:    TCP               <- TCP protocol ->        TCP
network:      IP                <- IP protocol ->         IP
link:         data link driver  <- data link protocol ->  data link driver
Bandwidth delivery by TCP/IP
[Figure: the bandwidth requirement of the application versus the bandwidth supply of ATM, Fast Ethernet, and FDDI, with TCP/IP in between]
Reasonable bandwidth delivery?
Coarse Grain Architecture of Parallel TCP/IP
Parallel Architecture of TCP Data Receiver
[Figure: block diagram between the Application and the IP Layer, using TCP Control Info. and TCP Conn. Info. Blocks: Wnd. sizing, Urgent request, Segment re-assembly, TCP error check, TCP checksum, Queue, Flag test, Security check, Connection name check, ACK check, Status check, Wnd. check]
Performance of TCP data receiver
[Figure: bar chart "Performance of TCP Receive" (y-axis 0 to 3500) showing Cycle, Inst., Data Read, and Data Write counts for Con. Name Search, TCP checksum, Rcv. Wnd. Check, ACK check, Sequencing, and Data Rcv.]
Performance of Parallel TCP/IP
Estimated speed-up against sequential execution
  IP Send       1.06
  IP Receive    2.55
  TCP Send      1.14
  TCP Receive   1.02
Bottlenecks in TCP/IP Processing
data copies
  between user space and kernel space
  between kernel space and network device
checksum calculation
memory/timer management
interaction between protocol and OS
  NOT the protocol itself
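To illustrate why checksum calculation appears in this list: the Internet checksum used by TCP and IP is a 16-bit ones' complement sum, so every byte of the segment has to be read once. The sketch below follows the well-known RFC 1071 formulation; it is not the kernel code that was measured, and the function name internet_checksum is only an illustrative choice.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement of the ones' complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte

    total = 0
    for i in range(0, len(data), 2):
        # Add the data as big-endian 16-bit words.
        total += (data[i] << 8) | data[i + 1]

    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
    print(hex(internet_checksum(sample)))
```

The loop touches each 16-bit word exactly once, so the cost grows linearly with the segment size, unlike the fixed-cost header checks.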
Performance Measurement (I)
S/W based measurement
  unacceptable perturbation due to interrupt handling or memory swapping
H/W based measurement
  specially designed H/W or logic analyzer
  limited flexibility
  data acquisition only on execution time
  ex) MultiKron chip (project) by NIST
Probabilistic analysis: Queueing Theory
Performance Measurement (II)
Our measurement
  using counters in Intel Pentium processor
  time resolution is the same as system clock tick
    166 MHz -> 6 ns
    200 MHz -> 5 ns
  provides additional information
    memory access counts (memory bandwidth)
    number of H/W interrupts
    mis-aligned data memory references
    branches
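The resolution figures above follow directly from the clock frequency: one tick lasts 1/f. A minimal sketch of that arithmetic, and of turning a counter reading into elapsed time, is given below. The helper names and the 10000-cycle reading are hypothetical and are not taken from the measurements.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one clock tick in nanoseconds."""
    return 1000.0 / clock_mhz


def cycles_to_us(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to elapsed microseconds."""
    return cycles / clock_mhz


if __name__ == "__main__":
    for mhz in (166, 200):
        print(f"{mhz} MHz -> {cycle_time_ns(mhz):.2f} ns per tick")
    # Hypothetical reading of 10000 cycles on a 200 MHz processor.
    print(f"10000 cycles at 200 MHz = {cycles_to_us(10000, 200):.1f} us")
```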
Performance measurement setup - sender’s part -
[Figure: the communicating party and the system under measurement, connected by an isolated 10BaseT Ethernet. The user process performs socket initialization, connection, write, and disconnect; steps (1) to (7) pass through the socket, TCP, IP, and data link layers.]
Legends:
(1) memory allocation and data copy
(2) TCP processing
(3) IP processing
(4) data send to media
(5) ACK arrives at data link layer
(6) ACK processing at IP
(7) ACK processing at TCP
Performance Measurement Setup - receiver’s part -
[Figure: the communicating party and the system under measurement, connected by an isolated 10BaseT Ethernet. The user process calls socket(), bind(), listen(), accept(), read(), and disconnect(); steps (1) to (7) pass through the socket, TCP, IP, and data link layers.]
Legends:
(1) Frame arrives at data link layer
(2) IP processing
(3) TCP processing
(4) data copy from kernel space to user space
(5) ACK construction at TCP
(6) IP processing
(7) data send to media
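The user-level call sequence that drives the measured kernel path can be sketched as follows. This is only an illustration of the socket calls named in the setup: it runs both parties on one machine over the loopback interface (an assumption, since the measurement used two machines on an isolated 10BaseT Ethernet), and the function names transfer and run_receiver are hypothetical.

```python
import socket
import threading


def run_receiver(server: socket.socket, result: list) -> None:
    """Receiver side: accept one connection and read until the peer disconnects."""
    conn, _ = server.accept()              # accept()
    with conn:
        received = 0
        while True:
            chunk = conn.recv(4096)        # read(): kernel-to-user data copy
            if not chunk:                  # empty read means the sender closed
                break
            received += len(chunk)
    result.append(received)


def transfer(payload: bytes) -> int:
    """Send payload over a loopback TCP connection; return the bytes received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket()
    server.bind(("127.0.0.1", 0))          # bind(); port 0 picks a free port
    server.listen(1)                       # listen()
    port = server.getsockname()[1]

    result: list = []
    receiver = threading.Thread(target=run_receiver, args=(server, result))
    receiver.start()

    # Sender side: connection, write, disconnect.
    with socket.create_connection(("127.0.0.1", port)) as sender:
        sender.sendall(payload)            # write: user-to-kernel data copy
    receiver.join()
    server.close()
    return result[0]


if __name__ == "__main__":
    print(transfer(b"x" * 1440), "bytes received")
```

The 1440-byte payload matches the largest packet size used in the empirical results; the per-layer counts themselves come from processor counters, which this user-level sketch does not read.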
Empirical Result (I)
Cycle counts in TCP/IP send processing
[Figure: bar chart of cycle counts (y-axis 0 to 100000) for 100, 300, 500, and 1440 byte packets, broken down into Data copy, TCP output, IP output, Ethernet output, Ethernet input, IP input, and TCP input]
Empirical Result (II)
Dynamic instruction counts in TCP/IP send processing
[Figure: bar chart of dynamic instruction counts (y-axis 0 to 1600) for 100, 300, 500, and 1440 byte packets, broken down into Data copy, TCP output, IP output, Ethernet output, Ethernet input, IP input, and TCP input]
Empirical Result (III)
Memory access counts in TCP/IP send processing
[Figure: bar chart of memory access counts (y-axis 0 to 1200) for 100, 300, 500, and 1440 byte packets, broken down into Data copy, TCP output, IP output, Ethernet output, Ethernet input, IP input, and TCP input]
Empirical Result (IV)
Cycle counts in TCP/IP receive processing
[Figure: bar chart of cycle counts (y-axis 0 to 12000) for 100, 300, 500, and 1440 byte packets, broken down into Ethernet input, IP input, TCP input, Socket append, Socket wakeup, Data copy, TCP output, IP output, and Ethernet output]
Empirical Result (V)
Dynamic Instruction counts in TCP/IP receive processing
[Figure: bar chart of dynamic instruction counts (y-axis 0 to 4000) for 100, 300, 500, and 1440 byte packets, broken down into Ethernet input, IP input, TCP input, Socket append, Socket wakeup, Data copy, TCP output, IP output, and Ethernet output]
Empirical Result (VI)
Memory access counts in TCP/IP receive processing
[Figure: bar chart of memory access counts (y-axis 0 to 3500) for 100, 300, 500, and 1440 byte packets, broken down into Ethernet input, IP input, TCP input, Socket append, Socket wakeup, Data copy, TCP output, IP output, and Ethernet output]
Memory Bandwidth Requirement (I)
“By matching the memory to the special needs of packet processing, one could achieve high performance at an acceptable cost”, by V. Jacobson.
Memory Bandwidth Requirement (II)
Then, how many memory accesses occur?
  we measured it
$$\mathrm{BW}\ (\text{max. bytes/sec}) = \frac{\text{bus width (bits)}}{8} \times \text{\# of memory accesses} \times \frac{1}{\text{time required}}$$
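The bandwidth formula can be evaluated as in the sketch below. The input values (a 32-bit bus, 1000 memory accesses, 100 microseconds) are hypothetical and chosen only to make the arithmetic easy to follow; they are not the measured figures.

```python
def memory_bandwidth_bytes_per_sec(
    bus_width_bits: int, num_accesses: int, time_required_sec: float
) -> float:
    """Maximum memory bandwidth: (bus width / 8) bytes per access, per unit time."""
    bytes_per_access = bus_width_bits / 8
    return bytes_per_access * num_accesses / time_required_sec


if __name__ == "__main__":
    # Hypothetical inputs: 32-bit bus, 1000 accesses, 100 microseconds.
    bw = memory_bandwidth_bytes_per_sec(32, 1000, 100e-6)
    print(f"{bw / 1e6:.1f} Mbytes/sec")
```

The result is an upper bound because it assumes every access transfers a full bus width of data.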
Pure TCP/IP performance
                                              Send Processing           Receive Processing
                                              Mbps   Memory BW required Mbps   Memory BW required
                                                     (Mbytes/sec.)             (Mbytes/sec.)
TCP/IP performance (including ACK processing) 56     61                 62     172
TCP/IP performance (excluding ACK processing) 87     61                 85     172
* Calculation on 1440 bytes packet
  not considering data link latency
  considering data send/receive and ACK segment send/receive time only in TCP/IP layer
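A throughput figure in Mbps can be derived from a cycle count as sketched below: the processing time is the cycle count divided by the clock frequency, and the throughput is the payload size in bits divided by that time. The cycle counts and the 200 MHz clock in the example are hypothetical; the sketch does not reproduce the numbers in the table.

```python
def throughput_mbps(payload_bytes: int, cycles: int, clock_mhz: float) -> float:
    """Throughput in Mbps when processing payload_bytes takes the given cycles."""
    seconds = cycles / (clock_mhz * 1e6)
    return payload_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical counts for one 1440-byte packet on a 200 MHz processor.
    data_cycles = 28800   # data segment processing only
    ack_cycles = 9600     # extra cycles spent on the ACK segment
    print(f"excluding ACK: {throughput_mbps(1440, data_cycles, 200):.1f} Mbps")
    print(f"including ACK: {throughput_mbps(1440, data_cycles + ack_cycles, 200):.1f} Mbps")
```

Adding the ACK cycles to the same payload lowers the result, which is the direction of the difference between the two rows of the table.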
From Empirical Results
To enhance the performance of TCP/IP
  design of efficient interface between protocol stack and OS is required
  And how?
Future & On-going Works
Feasibility study of ATM internetworking
  Analysis of
    AAL5 traffic
    signaling protocols
    commercial SAR chips & bus interfaces
  Internetworking technology
    LANE, IP over ATM, Multiprotocol over ATM
    Next Hop Resolution Protocol, etc.
Development of TCP/IP H/W module
  now, Ethernet-based implementation
Overview of TCP/IP H/W Implementation
[Figure: TCP timer module, checksum module, and memory management unit attached to the ARM target system]
ARM Target System
  ARM7TDMI RISC processor
  AMBA expansion connectors
FPGA implementation
Timer Management Module
[Figure: block diagram with Duration Register, ID Manager, Lookup Table, State Scheduler, Expired & Reference Time Generator, Timing Scheduler, CAM, Timer Record Memory, Stack Manager, and Zero Detect]
Conclusion
OS overheads play a major role in high performance TCP/IP processing
Measurement of memory access counts
  estimation of memory bandwidth requirement
H/W implementation is needed for time-consuming modules