Date post: | 20-Dec-2015 |
Architecture for Network Hub in 2011
David Chinnery
Ben Horowitz
Internet Model
Network time-of-flight latency – unavoidable
End-point latency – limited by the need for a cheap solution for users
Latency of internet nodes (hubs, gateways) – can provide differentiated services: high-priority packets and other packets
– If bandwidth is insufficient, use multiple chips and send an interval of wavelengths to each
Internet Visualization
San Francisco, USA
Perth, Australia
? hubs, 2 gateways, 2 end users
Worst-case packet journey: halfway around the world
0.200 s tolerable latency for video conferencing
Maximum Nodes a Packet Travels
Average number of nodes traveled = log(number of nodes in internet)
– Journey of 15.7 nodes on average in 1996
Estimate one node per person in 2011
– Journey of 22.7 nodes on average in 2011
39 nodes in the worst case in 1996 (1 in 1000)
Scaling by the ratio of averages gives 56.3 nodes in the worst case in 2011 (1 in 1000)
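The scaling above can be checked in a few lines. This sketch assumes that "log" means the natural logarithm and that one node per person gives about 7 billion nodes in 2011; both are our assumptions, chosen because together they reproduce the slide's 22.7 average.

```python
import math

# Assumptions (ours, not stated on the slide): natural log, ~7 billion nodes in 2011.
NODES_2011 = 7e9
AVG_HOPS_1996 = 15.7     # average journey in 1996 (slide)
WORST_HOPS_1996 = 39     # 1-in-1000 worst case in 1996 (slide)

def average_hops(num_nodes: float) -> float:
    """Average path length modelled as log(number of nodes)."""
    return math.log(num_nodes)

avg_2011 = average_hops(NODES_2011)
# Scale the 1996 worst case by the ratio of the averages.
worst_2011 = WORST_HOPS_1996 * avg_2011 / AVG_HOPS_1996

print(f"average hops 2011: {avg_2011:.1f}")       # 22.7
print(f"worst-case hops 2011: {worst_2011:.1f}")  # 56.3
```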
Time of Flight
Optic fiber delay 5 us/km
Restore signal with repeaters every 100 km
– Repeater delay 0.92 us [1999]
Worst-case journey length ~20,100 km
20,100 km × 5 us/km + 201 repeaters × 0.92 us ≈ 100,700 us
Time-of-flight delay of 0.101 s
[Figure: repeaters (0.92 us each) every 100 km, with 500 us of fiber delay per 100 km span]
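The time-of-flight sum can be written as a small function. It is a sketch that takes the slide's constants as given and, like the slide, counts one repeater per full 100 km span.

```python
FIBER_DELAY_US_PER_KM = 5.0   # optic fiber delay (slide)
REPEATER_SPACING_KM = 100     # one repeater every 100 km (slide)
REPEATER_DELAY_US = 0.92      # per-repeater delay [1999] (slide)

def time_of_flight_us(distance_km: int) -> float:
    """Fiber propagation delay plus repeater delays, in microseconds."""
    repeaters = distance_km // REPEATER_SPACING_KM   # 201 for 20,100 km
    return distance_km * FIBER_DELAY_US_PER_KM + repeaters * REPEATER_DELAY_US

tof = time_of_flight_us(20_100)
# 100,685 us, which the slide rounds to 100,700 us, i.e. 0.101 s
print(f"{tof:,.0f} us = {tof / 1e6:.3f} s")
```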
Internet Visualization
San Francisco, USA
Perth, Australia
52 hubs, 2 gateways, 2 end users
Worst-case packet journey: 0.101 s, halfway around the world
0.200 s tolerable latency for video conferencing
End User Model
Worst-case scenario: processing-intensive application
– MPEG-4 encoding for HDTV2
Limited silicon area, as it must be low cost
– Sufficient for 1920×1080 HDTV2 at 30 Hz
Processing latency 1/30 s ≈ 0.033 s
End user to end user: processing latency doubled, 2 × 0.033 s ≈ 0.067 s
Internet Visualization
San Francisco, USA
Perth, Australia
52 hubs, 2 gateways, 2 end users (0.067 s; 0.033 s each)
Worst-case packet journey: 0.101 s, halfway around the world
0.200 s tolerable latency for video conferencing
Node Hardware Model
Processing cores are Intel IXP1200 network processors
Conservative ASIC frequency estimate
– IXP1200 speed of 166 MHz in 0.28 um
– Linearly scale to 0.18 um: speed ×1.56
– Speed ×3.00 from 0.18 um to 0.05 um [ITRS]
IXP1200 speed of 775 MHz in 2011
Assume an across-chip speed of 775 MHz
– With custom macros at 10 GHz in 2011
ITRS estimate: across-chip speed of 1.5 GHz
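The 775 MHz figure follows from chaining the two scaling factors. A minimal check using only the slide's numbers, with the linear factor taken as the exact feature-size ratio 0.28/0.18:

```python
BASE_MHZ = 166.0                 # IXP1200 in 0.28 um (slide)
LINEAR_SCALE = 0.28 / 0.18       # 0.28 um -> 0.18 um, about x1.56 (slide)
ITRS_SCALE = 3.00                # 0.18 um -> 0.05 um [ITRS] (slide)

ixp_2011_mhz = BASE_MHZ * LINEAR_SCALE * ITRS_SCALE
print(f"IXP1200 in 2011: {ixp_2011_mhz:.0f} MHz")   # 775 MHz
```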
Node Router Hardware
For gateways or hubs
– 2011 ASIC: 8 cm², 811 million transistors/cm²
6500 million transistors
6.5 million transistors per IXP1200
– If 2/3 of the chip is memory and wires, up to 333 IXP1200s fit on the same chip
Estimate 300 IXP1200s
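The transistor budget behind "up to 333 IXP1200s" is a short calculation. This sketch uses the slide's ITRS figures and treats the 2/3 memory-and-wires overhead as a fixed fraction of the chip:

```python
TRANSISTORS_PER_CM2_2011 = 811e6   # ITRS (slide)
CHIP_AREA_CM2 = 8                  # ITRS maximum ASIC size (slide)
IXP1200_TRANSISTORS = 6.5e6        # per IXP1200 (slide)
LOGIC_FRACTION = 1 / 3             # 2/3 of the chip is memory and wires (slide)

chip_transistors = TRANSISTORS_PER_CM2_2011 * CHIP_AREA_CM2
max_ixps = chip_transistors * LOGIC_FRACTION / IXP1200_TRANSISTORS

print(f"{chip_transistors / 1e6:.0f} million transistors per chip")  # 6488
print(f"up to {max_ixps:.0f} IXP1200s")                              # 333
```

The slides then round down to 300 IXP1200s as a working estimate.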
Packet Processing at Nodes
Figures in parentheses assume the 1.5 GHz ITRS across-chip speed
Maximum onto-chip bandwidth
– 927 chip-to-package pins in 2011
359 Gbit/s (695 Gbit/s)
Scaling the IXP1200 to 2011, it can process 11 million (21 million) packets/s
– The chip can process 3.3 billion packets/s (6.3 billion)
Smallest IP packet is 20 bytes (header size)
– Maximum required processing of 2.2 billion packets/s (4.3 billion)
Spare processing power available
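The worst-case comparison between required and available packet rates can be sketched as below. It assumes, as in the backup table later in the deck, that each pin moves one bit per clock and that half the pin bandwidth brings data on chip; it also assumes IXP1200 throughput scales linearly with clock from 2.3 million packets/s at 166 MHz, so its capacity figures differ slightly from the slide's rounded ones.

```python
PINS = 927               # chip-to-package pins in 2011
PACKET_BITS = 20 * 8     # smallest IP packet: 20-byte header
IXP_PPS_1999 = 2.3e6     # IXP1200 packets/s in 1999 (backup table)
IXP_MHZ_1999 = 166
IXP_COUNT = 310          # IXP1200s per chip (backup table)

def packet_budget(clock_mhz: float) -> tuple[float, float, float]:
    """Return (onto-chip bit/s, required packets/s, chip capacity in packets/s)."""
    # Assumption: one bit per pin per clock; half the pins carry traffic onto
    # the chip and the other half carry it back off.
    onto_chip_bps = PINS * clock_mhz * 1e6 / 2
    required_pps = onto_chip_bps / PACKET_BITS
    # Assumption: IXP1200 throughput scales linearly with clock frequency.
    capacity_pps = IXP_COUNT * IXP_PPS_1999 * clock_mhz / IXP_MHZ_1999
    return onto_chip_bps, required_pps, capacity_pps

for mhz in (775, 1500):
    bps, need, have = packet_budget(mhz)
    print(f"{mhz} MHz: {bps / 1e9:.0f} Gbit/s in, "
          f"need {need / 1e9:.2f}e9 packets/s, have {have / 1e9:.2f}e9")
```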
Bus and I/O Overview
[Figure: twenty groups, each with an in queue (Q1in to Q20in), an out queue (Q1out to Q20out) and fifteen IXP1200s (IXP1,1 to IXP20,15); a global Qin and Qout, each with control; 448-bit input bus, 448-bit output bus, 48-bit header detection, 128-bit and 64-bit control buses, and a 32-bit I/O bus]
Header Detection Hardware
Custom header detection macro runs at 13 times chip speed, 10.075 GHz
– 12 cycles for comparison, 1 to send positions
Forty 48-bit comparators (80 at 1.5 GHz)
– Up to 6 bytes of detection (Ethernet destination)
– Store the last 47 bits from the previous 448-bit word
[Figure: forty 48-bit comparators fed through a 1-bit shifter from a window made of the last 47 bits of word t-1 and the 448 bits of word t]
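Keeping 47 bits of the previous word gives a 495-bit window with 448 possible 48-bit start positions, so forty comparators need ceil(448 / 40) = 12 cycles. The sketch below models that window in software with bit strings. The schedule (comparator c checks offset c × 12 + k on cycle k, as if the window shifts one bit per cycle) and the function name `detect` are our assumptions, not the authors' circuit.

```python
import math

WORD_BITS = 448      # input bus width
CARRY_BITS = 47      # bits kept from the previous word
PATTERN_BITS = 48    # comparator width (e.g. Ethernet destination)
COMPARATORS = 40

def detect(prev_word: str, word: str, pattern: str) -> list[int]:
    """Return window offsets where `pattern` (a 48-char '0'/'1' string) matches.

    Hypothetical schedule: comparator c checks offset c * cycles + k on cycle k.
    """
    assert len(word) == WORD_BITS and len(pattern) == PATTERN_BITS
    window = prev_word[-CARRY_BITS:] + word        # 47 + 448 = 495 bits
    starts = len(window) - PATTERN_BITS + 1        # 448 candidate offsets
    cycles = math.ceil(starts / COMPARATORS)       # 12 with 40 comparators
    hits = []
    for k in range(cycles):
        for c in range(COMPARATORS):
            offset = c * cycles + k
            if offset < starts and window[offset:offset + PATTERN_BITS] == pattern:
                hits.append(offset)
    return sorted(hits)

pattern = "1" + "0" * 46 + "1"
# A header that straddles the word boundary is still found.
prev = "0" * (WORD_BITS - 20) + pattern[:20]
word = pattern[20:] + "0" * (WORD_BITS - 28)
print(detect(prev, word, pattern))   # [27]
```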
48-bit Comparators
Set mask for comparison to 0, 1 or X (don't care)
Custom comparison circuit
– Signals and their negations are available from registers
– 10 transistors to implement
7-bit counter with each, to set the header position
About 30,000 transistors total
Possible 3 packets per 448 bits
31 bits of bus to send positions
[Figure: per-bit comparison circuit with inputs input_i, mask_i and care_i]
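The per-bit rule of a 0/1/X comparator can be expressed with bitwise operations: a bit matches if it is don't-care, or if the input equals the mask bit. This sketch reads `mask` as the value to compare against and `care` as the enable for each bit; that mapping of the figure's input_i, mask_i and care_i signals is our interpretation.

```python
MASK48 = (1 << 48) - 1

def ternary_match(value: int, mask: int, care: int) -> bool:
    """48-bit compare; bits where care is 0 are don't-care (X).

    Per bit i: match_i = (not care_i) or (input_i == mask_i).
    """
    mismatches = (value ^ mask) & care & MASK48
    return mismatches == 0

# Match any 48-bit address whose first three bytes are AA:BB:CC.
print(ternary_match(0xAABBCC123456, 0xAABBCC000000, 0xFFFFFF000000))  # True
print(ternary_match(0xAABBCD123456, 0xAABBCC000000, 0xFFFFFF000000))  # False
```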
Simulator
Other simulators were cumbersome for our task
Wrote an event-driven simulator in Java
– Worst-case simulations: can easily process at maximum bandwidth with no additional latency
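The authors' simulator was written in Java and is not reproduced here. As a rough illustration of the event-driven technique only, this Python sketch keeps a time-ordered event heap for one IXP1200, assumes unlimited parallel contexts (no queueing), and uses an inter-arrival time of about 107 cycles, which we derive from the backup table (775 MHz / 7,230,414 packets/s).

```python
import heapq

DEPARTURE, ARRIVAL = 0, 1   # departures sort first when events share a timestamp

def simulate(interarrival: int, service: list[int], n_packets: int) -> tuple[int, int]:
    """Return (max packets in flight, max latency in cycles) for one processor."""
    events = [(i * interarrival, ARRIVAL, i) for i in range(n_packets)]
    heapq.heapify(events)
    start = {}
    in_flight = max_in_flight = max_latency = 0
    while events:
        t, kind, pid = heapq.heappop(events)
        if kind == ARRIVAL:
            # Assumption: processing starts immediately (a free context is always available).
            start[pid] = t
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
            done = t + service[pid % len(service)]
            heapq.heappush(events, (done, DEPARTURE, pid))
        else:
            in_flight -= 1
            max_latency = max(max_latency, t - start[pid])
    return max_in_flight, max_latency

# Worst case: every packet takes the 600-cycle maximum.
print(simulate(107, [600], 1000))   # (6, 600)
```

With evenly spaced arrivals it reports 6 packets in flight, within the slides' bound of at most 7 per IXP1200.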
Worst Case Scenario Results
Worst-case scenario
– Minimum packet size is 20 bytes
– 448-bit input bus
3 packets or fewer per cycle
– IXP1200 time to calculate the next destination: 75 cycles minimum, 345 cycles average, 600 cycles maximum
At most 7 packets processed simultaneously on an IXP1200
– The IXP1200 has 6 micro-engines; the load is handled easily
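The "3 packets or fewer" bound follows from the minimum packet length: packet starts are at least 160 bits apart, so at most three can begin inside one 448-bit word. A one-line check:

```python
WORD_BITS = 448
MIN_PACKET_BITS = 20 * 8   # 20-byte minimum IP packet

def max_starts_per_word(word_bits: int, min_packet_bits: int) -> int:
    """Most packet starts that fit in one bus word (first start may be at bit 0)."""
    return (word_bits - 1) // min_packet_bits + 1

print(max_starts_per_word(WORD_BITS, MIN_PACKET_BITS))   # 3
```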
Conclusions from Simulation
Latency of 605 cycles: 0.78 us (0.40 us)
Largest possible packet that could be sent after processing has started is 65,536 bytes
– Additional 1170 cycles of latency: 1.51 us (0.78 us)
Transceiver delay 0.05 us [1999]
– Additional 0.10 us/hop
Total latency per hop of 2.4 us (1.3 us)
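The per-hop total can be reproduced for both clock assumptions. This sketch takes the slide's 605-cycle processing latency, streams a 65,536-byte packet over the 448-bit bus (rounded down to the slide's 1170 cycles), and reads the 0.10 us/hop as two 0.05 us transceivers per hop; that last reading is our interpretation.

```python
PROCESS_CYCLES = 605         # worst-case header processing latency (slide)
MAX_PACKET_BYTES = 65_536    # largest packet that can follow (slide)
BUS_BITS = 448               # input bus width
TRANSCEIVER_US = 0.05        # per transceiver [1999]; assumed two per hop

def hop_latency_us(clock_hz: float) -> float:
    """Worst-case latency added by one hub, in microseconds."""
    # 524,288 bits / 448 bits per cycle, rounded down as on the slide (1170).
    body_cycles = MAX_PACKET_BYTES * 8 // BUS_BITS
    cycles = PROCESS_CYCLES + body_cycles
    return cycles / clock_hz * 1e6 + 2 * TRANSCEIVER_US

print(f"{hop_latency_us(775e6):.1f} us")   # 2.4 us
print(f"{hop_latency_us(1.5e9):.1f} us")   # 1.3 us
```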
Internet Visualization
San Francisco, USA
Perth, Australia
52 hubs (probability of 1 in 1000): 0.0000024 s/hub, < 0.001 s in total
2 gateways: < 0.001 s
2 end users: 0.067 s (0.033 s each)
Worst-case packet journey: 0.101 s, halfway around the world
Total: 0.169 s, within the 0.200 s tolerable latency for video conferencing
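The end-to-end budget is the sum of the components in the slides. This sketch assumes the two gateways use the same 2.4 us/hop hardware as the hubs; the exact sum comes to about 0.168 s, under the rounded 0.169 s and the 0.200 s tolerable latency.

```python
TOLERABLE_S = 0.200          # tolerable latency for video conferencing
TIME_OF_FLIGHT_S = 0.101     # halfway around the world
END_USER_S = 1 / 30          # per end user (1/30 s processing latency)
HOP_S = 2.4e-6               # per hub at 775 MHz
HUBS, GATEWAYS, END_USERS = 52, 2, 2

total = (TIME_OF_FLIGHT_S
         + END_USERS * END_USER_S
         + (HUBS + GATEWAYS) * HOP_S)   # assumption: gateways cost one hop each

print(f"total {total:.3f} s, budget {TOLERABLE_S:.3f} s")   # total 0.168 s, budget 0.200 s
```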
Conclusions
Limiting factor is maximum bandwidth
Average case simulations done
– Can easily process at maximum bandwidth with 40 IXP1200 processors (mostly longer packets)
Reduce processing power to levels sufficient for the bandwidth and model
– Fewer IXP1200s on chip
– Smaller chip size reduces cost
– Reduced processing power increases congestion, and may require high-priority packets for some communications
448 Bit Operation Cycles
448 bits onto chip
Up to 48-bit header detection on the previous 47 bits and 401 bits of the current 448 bits (48-bit comparators)
– Send header positions in this 448-bit window
Send to high-priority and low-priority in queues
Packet priority detection (header) in queues
Incorrect priority queue drops the packet; in-queue controller informed
Remainder of packet sent to the appropriate in queue
Process packet header, send packet body to out queue
Process times between 70 and 600 cycles, 345 cycles avg.
Send updated packet header to out queue
Inform out-queue controller that the packet is ready to send
Send when output bus is available
448 bits off chip
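The priority-queue step can be sketched as follows: the start of each packet goes to both in queues, each queue checks the priority in the header, and the queue with the wrong priority drops it and reports to the controller, so the remainder follows only the accepting queue. The class and function names (`Packet`, `InQueue`, `dispatch`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    pid: int
    high_priority: bool   # decoded from the header inside the in queue

class InQueue:
    """Hypothetical in queue that accepts only packets of its own priority."""
    def __init__(self, high_priority: bool):
        self.high_priority = high_priority
        self.items = deque()
        self.dropped = []     # packet ids reported to the in-queue controller

    def offer(self, pkt: Packet) -> bool:
        if pkt.high_priority != self.high_priority:
            self.dropped.append(pkt.pid)   # wrong queue: drop and inform controller
            return False
        self.items.append(pkt)
        return True

def dispatch(pkt: Packet, queues: list) -> InQueue:
    """Send the packet start to every in queue; return the one that kept it."""
    accepted = [q for q in queues if q.offer(pkt)]
    assert len(accepted) == 1   # exactly one queue matches the priority
    return accepted[0]          # the remainder of the packet is sent here

high, low = InQueue(True), InQueue(False)
target = dispatch(Packet(1, high_priority=True), [high, low])
print(target is high, low.dropped)   # True [1]
```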
Maximum Throughput Node Hardware
For gateways or hubs
6.5 million transistors for IXP1200
0.5 million transistors for other applications such as speech codecs, V.42bis, Huffman compression, and 3DES
Up to 310 IXP1200s on the same chip

| Quantity | Value |
|---|---|
| ASIC max. transistors in 1999 (millions/cm²) [ITRS] | 20 |
| ASIC max. transistors in 2011 (millions/cm²) [ITRS] | 811 |
| ASIC max. chip size in 1999 (cm²) [ITRS] | 8 |
| ASIC max. chip size in 2011 (cm²) [ITRS] | 8 |
| ASIC max. number of transistors per chip in 2011 (millions) | 6488 |
| Transistors for backbone IXP1200 + other possible applications, one of each (millions) | 7.0 |
| Ideal possible number of IXP1200 + other possible applications per chip | 931 |
| Number of IXP1200s et al., assuming 2/3 overhead for memory, routing, and other CPUs | 310 |
927 pins with I/O at clock speed
Packet Processing at Nodes
Maximum onto-chip bandwidth
Smallest IP packet is 20 bytes (header size)
Maximum required processing power

| Quantity | 775 MHz | 1.5 GHz |
|---|---|---|
| Chip-to-package pads in 2011 | 927 | 927 |
| Maximum I/O speed at IXP1200 operating speed with maximum use of pads (Gbit/s) | 718 | 1391 |
| Maximum I/O bandwidth onto chip, which must get back off chip as well (Gbit/s) | 359 | 695 |
| Number of bits/s that must be processed on chip (Gbit/s) | 359 | 695 |
| Smallest possible packet size (bytes) | 20 | 20 |
| Worst-case number of 20-byte packets/s that must be processed per IXP1200 | 7,230,414 | 14,000,372 |
| Number of packets/s an IXP1200 can process in 1999 | 2,300,000 | 2,300,000 |
| Number of packets/s an IXP1200 can process in 2011 | 10,733,333 | 20,783,133 |
| Number of packets/s that can be processed on our chip in 2011 | 3,331,317,770 | 6,450,486,216 |
Hub Cache and Main Memory
Required for IXP1200s
Assumed by Scott in IXP1200 simulations:
– 4 MB of DRAM
– 2 MB of SRAM

| Quantity | Value |
|---|---|
| DRAM memory per IXP1200 in 1999 (GB) | 0.004 |
| SRAM memory per IXP1200 in 1999 (GB) | 0.002 |
| DRAM memory, or equivalent, required in 2011 for IXP1200s (GB) | 1.24 |
| SRAM memory, or equivalent, required in 2011 for IXP1200s (GB) | 0.62 |
| Area required for DRAM for IXP1200s (cm²) | 0.17 |
| Area required for SRAM for IXP1200s (cm²) | 0.24 |
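The 2011 memory totals are the per-IXP1200 amounts multiplied by the 310 IXP1200s per chip. A minimal check (the area figures depend on memory densities the slides do not list, so they are not recomputed here):

```python
DRAM_PER_IXP_GB = 0.004   # 4 MB per IXP1200 (simulation assumption)
SRAM_PER_IXP_GB = 0.002   # 2 MB per IXP1200
IXP_COUNT = 310

dram_gb = IXP_COUNT * DRAM_PER_IXP_GB
sram_gb = IXP_COUNT * SRAM_PER_IXP_GB
print(f"DRAM {dram_gb:.2f} GB, SRAM {sram_gb:.2f} GB")   # DRAM 1.24 GB, SRAM 0.62 GB
```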
Hub Register Memory

| Quantity | Value |
|---|---|
| Average packet latency in 2011 (s) | 0.00000009 |
| Latency for a single packet in 2011 (s) | 0.00000045 |
| Number of pins for packet I/O (this many each to get on and off) | 463.5 |
| Maximum bandwidth onto chip and back off chip (Gbit/s) | 359 |
| Minimum IPv4 packet size (bytes) | 20 |
| Maximum number of IPv4 packets/s (×10^9) | 18 |
| Maximum number of IPv4 packets to store while a packet is processed | 9668 |
| Maximum packet size in IPv6 (bytes) | 65536 |
| Average storage capacity required at maximum bandwidth (Gbit) | 0.00019 |
| Number of in queues | 20 |
| Number of out queues | 20 |
| Register storage required, assuming one maximum-length packet in each queue (Gbit) | 0.021 |
| Area of all the registers (cm²) | 0.028 |
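The register storage figure comes from holding one maximum-length (65,536-byte) packet in each of the 20 in queues and 20 out queues:

```python
MAX_PACKET_BYTES = 65_536
IN_QUEUES = 20
OUT_QUEUES = 20

bits = (IN_QUEUES + OUT_QUEUES) * MAX_PACKET_BYTES * 8
print(f"{bits:,} bits = {bits / 1e9:.3f} Gbit")   # 20,971,520 bits = 0.021 Gbit
```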
Average Scenario Information
Assumed normal distribution between 80 and 600 cycles to process a packet
– Average of 340 cycles
– 80 and 600 are two standard deviations from the mean
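Since 80 and 600 lie two standard deviations either side of 340, the standard deviation is (600 − 80) / 4 = 130 cycles. The slides do not say how values outside [80, 600] are handled; this sketch assumes they are resampled (a truncated normal), which keeps the mean at 340 because the cut is symmetric.

```python
import random

MEAN = 340
LOW, HIGH = 80, 600
SIGMA = (HIGH - LOW) / 4   # two standard deviations each side -> 130 cycles

def processing_cycles(rng: random.Random) -> int:
    """Draw one packet's processing time in cycles.

    Assumption: draws outside [LOW, HIGH] are rejected and redrawn.
    """
    while True:
        x = rng.gauss(MEAN, SIGMA)
        if LOW <= x <= HIGH:
            return round(x)

rng = random.Random(1)
samples = [processing_cycles(rng) for _ in range(10_000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```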
Packet sizes: