
Architecture for Network Hub in 2011 David Chinnery Ben Horowitz.

Date post: 20-Dec-2015


Internet Model

Network time-of-flight latency
– Unavoidable

End-point latency
– Limited by the need for a cheap solution for users

Latency of internet nodes (hubs, gateways)
– Can provide differentiated services: high-priority packets and other packets
– If bandwidth is insufficient, use multiple chips and send an interval of wavelengths to each


Internet Visualization

San Francisco, USA to Perth, Australia
– ? hubs, 2 gateways, 2 end users
– Worst-case packet journey: halfway around the world
– 0.200 s tolerable latency for video conferencing


Maximum Nodes a Packet Travels

Average number of nodes traveled = log(number of nodes in the internet)
– Journey of 15.7 nodes on average in 1996

Estimate one node per person in 2011
– Journey of 22.7 nodes on average in 2011

39 nodes worst case in 1996 (1 in 1000)
– Scaling by the ratio of averages gives 56.3 nodes worst case in 2011 (1 in 1000)

[Figure: packet path through nodes numbered 1 to 56]
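The hop estimate can be reproduced in a few lines. This is a sketch under one assumption the slide does not state: that "log" is the natural logarithm, which turns roughly 7.0 billion nodes (one per person, an assumed 2011 population) into the 22.7-node average. Function names are illustrative.

```python
import math

def average_hops(num_nodes: float) -> float:
    # Assumption: the slide's "log" is the natural logarithm.
    return math.log(num_nodes)

def scaled_worst_case(worst_1996: float, avg_1996: float, avg_2011: float) -> float:
    # Scale the 1-in-1000 worst case by the ratio of the two averages.
    return worst_1996 * avg_2011 / avg_1996

avg_2011 = average_hops(7.0e9)           # assumed ~7.0 billion nodes, one per person
worst_2011 = scaled_worst_case(39, 15.7, 22.7)
print(f"average {avg_2011:.1f} nodes, worst case {worst_2011:.1f} nodes")
```

With these rounded inputs the ratio gives about 56.4 nodes rather than 56.3; either way the worst case is about 56 nodes.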


Time of Flight

Optic fiber delay: 5 us/km
Restore signal with repeaters every 100 km
– Repeater delay 0.92 us [1999]

Worst-case journey length ~20,100 km
– 20,100 km × 5 us/km + 201 repeaters × 0.92 us ≈ 100,700 us
– Time-of-flight delay of 0.101 s

[Figure: 100 km fiber segments (500 us each) joined by repeaters (0.92 us each)]
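A minimal sketch of the time-of-flight arithmetic, using the slide's 5 us/km fiber delay and one 0.92 us repeater per 100 km; the constant and function names are illustrative.

```python
FIBER_DELAY_US_PER_KM = 5.0      # optic fiber delay
REPEATER_SPACING_KM = 100.0      # signal restored every 100 km
REPEATER_DELAY_US = 0.92         # repeater delay [1999]

def time_of_flight_us(distance_km: float) -> float:
    # One repeater per full 100 km segment (201 for 20,100 km, as on the slide).
    repeaters = int(distance_km // REPEATER_SPACING_KM)
    return distance_km * FIBER_DELAY_US_PER_KM + repeaters * REPEATER_DELAY_US

t_us = time_of_flight_us(20_100)
print(f"{t_us:,.0f} us = {t_us / 1e6:.3f} s")
```

The fiber itself contributes 100,500 us; the repeaters add only about 185 us.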


Internet Visualization

San Francisco, USA to Perth, Australia
– 52 hubs, 2 gateways, 2 end users
– Worst-case packet journey (halfway around the world): 0.101 s
– 0.200 s tolerable latency for video conferencing


End User Model

Worst-case scenario: a processing-intensive application
– MPEG-4 encoding for HDTV2

Limited silicon area, as the end-user device must be low cost
– Sufficient for 1920×1080 HDTV2 at 30 Hz
– Processing latency of 1/30 s ≈ 0.033 s

End user to end user: processing latency doubled, 2 × 0.033 s ≈ 0.067 s


Internet Visualization

San Francisco, USA to Perth, Australia
– 52 hubs, 2 gateways, 2 end users
– End-user processing: 0.033 s at each end, 0.067 s in total
– Worst-case packet journey (halfway around the world): 0.101 s
– 0.200 s tolerable latency for video conferencing


Node Hardware Model

Processing cores are Intel IXP1200 network processors

Conservative ASIC frequency estimate
– IXP1200 speed of 166 MHz in 0.28 um
– Linearly scale to 0.18 um: speed ×1.56
– Speed ×3.00 from 0.18 um to 0.05 um [ITRS]
– IXP1200 speed of 775 MHz in 2011

Assume an across-chip speed of 775 MHz
– With custom macros at 10 GHz in 2011

ITRS estimate: across-chip speed of 1.5 GHz
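The frequency estimate chains two factors: a linear 0.28/0.18 ≈ 1.56 for the move to 0.18 um, then the ITRS factor of 3.00 down to 0.05 um. A sketch of that arithmetic (variable names are illustrative):

```python
BASE_MHZ = 166.0                  # IXP1200 at 0.28 um
LINEAR_SCALE = 0.28 / 0.18        # linear scaling to 0.18 um, about 1.56
ITRS_SCALE = 3.00                 # ITRS speed-up from 0.18 um to 0.05 um

ixp_2011_mhz = BASE_MHZ * LINEAR_SCALE * ITRS_SCALE
print(f"{LINEAR_SCALE:.2f}x, then {ITRS_SCALE:.2f}x: {ixp_2011_mhz:.0f} MHz")
```

The unrounded product is about 774.7 MHz, which the slide rounds to 775 MHz.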


Node Router Hardware

For gateways or hubs
– 2011 ASIC: 8 cm², 811 million transistors/cm²
– About 6500 million transistors (8 × 811 = 6488 million)

6.5 million transistors per IXP1200
– If 2/3 of the chip is memory and wires: up to 333 IXP1200s on the same chip; estimate 300 IXP1200s
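A sketch of the transistor budget, using the ITRS density and chip area from the slide; the names are illustrative.

```python
CHIP_AREA_CM2 = 8
DENSITY_MTX_PER_CM2 = 811        # ITRS 2011 ASIC density, millions of transistors/cm^2
IXP1200_MTX = 6.5                # millions of transistors per IXP1200
OVERHEAD_FRACTION = 2 / 3        # share of the chip taken by memory and wires

budget_mtx = CHIP_AREA_CM2 * DENSITY_MTX_PER_CM2
cores = int(budget_mtx * (1 - OVERHEAD_FRACTION) // IXP1200_MTX)
print(budget_mtx, cores)
```

Using the unrounded 6488 million gives 332 IXP1200s; the slide's 333 comes from rounding the budget to 6500 million, and the estimate is then rounded down to 300.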


Packet Processing at Nodes

Figures in parentheses assume the 1.5 GHz ITRS across-chip speed; the others assume 775 MHz.

Maximum onto-chip bandwidth
– 927 chip-to-package pins in 2011: 359 Gbit/s (695 Gbit/s)

Scaling the IXP1200 to 2011, each can process 11 million (21 million) packets/s
– The chip can process 3.3 billion packets/s (6.3 billion)

Smallest IP packet is 20 bytes (header size)
– Maximum required processing of 2.2 billion packets/s (4.3 billion)
– Spare processing power available
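The bandwidth and packet-rate figures follow from the pin count, the clock and the 20-byte minimum packet. This sketch assumes, as the figures imply, that each of the 927 pins moves one bit per clock with half carrying data on and half off the chip, and that IXP1200 packet rates scale linearly with clock from 2.3 million packets/s at 166 MHz; the 310-processor count comes from the later transistor table. Names are illustrative.

```python
PINS = 927
MIN_PACKET_BITS = 20 * 8          # smallest IPv4 packet: a 20-byte header
IXP1200_PPS_1999 = 2.3e6          # packets/s per IXP1200 in 1999
IXP1200_MHZ_1999 = 166.0
CORES = 310                       # IXP1200s per chip

def packet_budget(clock_mhz: float):
    # Assumption: half the pins bring data on chip, half take it off.
    io_gbps = PINS * clock_mhz / 1000 / 2
    # Worst case: every packet is minimum size.
    required_pps = io_gbps * 1e9 / MIN_PACKET_BITS
    # Assumption: per-IXP1200 throughput scales linearly with clock.
    capacity_pps = IXP1200_PPS_1999 * clock_mhz / IXP1200_MHZ_1999 * CORES
    return io_gbps, required_pps, capacity_pps

for mhz in (775, 1500):
    io, need, have = packet_budget(mhz)
    print(f"{mhz} MHz: {io:.0f} Gbit/s, need {need / 1e9:.1f}G pkt/s, have {have / 1e9:.1f}G pkt/s")
```

In both cases the processing capacity exceeds the worst-case packet rate, which is the spare processing power noted above.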


Bus and I/O Overview

[Figure: bus and I/O layout. Rows 1 to 20, each with an in queue (Q1in to Q20in), an out queue (Q1out to Q20out) and fifteen IXP1200s (IXP1,1 to IXP1,15 through IXP20,1 to IXP20,15); a global in queue (Qin) and out queue (Qout), each with control. Buses: 448-bit input bus, 448-bit output bus, 48-bit header detection, 128-bit and 64-bit control buses, 32-bit I/O bus.]


Header Detection Hardware

Custom header detection macro runs at 13 times the chip speed, 10.075 GHz
– 12 cycles for comparison, 1 to send positions

Forty 48-bit comparators (80 at 1.5 GHz)
– Up to 6 bytes of detection (Ethernet destination)
– Store the last 47 bits from the previous 448-bit word

[Figure: forty 48-bit comparators fed through a 1-bit shifter from the last 47 bits of word t-1 and the 448 bits of word t]
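One way to read the header-detection scheme in software: keep the last 47 bits of the previous 448-bit word so that a 48-bit field straddling two words is still seen, then test each of the 448 start positions in the resulting 495-bit window. Forty comparators over 12 fast cycles give 480 comparisons, enough for those 448 positions (an inference from the slide's numbers). The sketch is a hypothetical behavioral model, not the custom macro; the `(value, care)` pair encoding the 0/1/X mask is an assumption, with a care bit of 0 meaning "don't care".

```python
WORD_BITS = 448                   # width of the input bus
PATTERN_BITS = 48                 # widest field matched (e.g. Ethernet destination)
CARRY_BITS = PATTERN_BITS - 1     # 47 bits kept from the previous word

def match_positions(prev_tail: int, word: int, value: int, care: int) -> list:
    """Start offsets (0 = first carried bit, MSB first) where the ternary
    pattern (value, care) matches inside the 47 + 448 bit window."""
    window = (prev_tail << WORD_BITS) | word
    width = CARRY_BITS + WORD_BITS                    # 495 bits
    mask48 = (1 << PATTERN_BITS) - 1
    hits = []
    for offset in range(width - PATTERN_BITS + 1):    # 448 candidate positions
        candidate = (window >> (width - PATTERN_BITS - offset)) & mask48
        if (candidate ^ value) & care == 0:           # only "care" bits must agree
            hits.append(offset)
    return hits

def next_tail(word: int) -> int:
    # Keep the last 47 bits for the next window.
    return word & ((1 << CARRY_BITS) - 1)

# Usage: a pattern split across two words, 24 bits in each.
pattern = 0xABCDEF012345
prev_word = pattern >> 24                         # top 24 bits end word t-1
word = (pattern & 0xFFFFFF) << (WORD_BITS - 24)   # bottom 24 bits start word t
print(match_positions(next_tail(prev_word), word, pattern, (1 << PATTERN_BITS) - 1))  # [23]
```

The match is found at offset 23, i.e. inside the carried 47 bits, which is exactly the case the stored tail exists to catch.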


48-bit Comparators

Set mask for comparison to 0, 1 or X (don't care)

Custom comparison circuit
– Signals and their negations are available from registers
– 10 transistors to implement

7-bit counter with each, to set header position
About 30,000 transistors total
Possible 3 packets per 448 bits
– 31 bits of bus to send positions

[Figure: per-bit comparison circuit with inputs input_i, mask_i, their negations, and care_i]
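At the level of a single bit, each comparator cell decides whether an input bit satisfies a 0, 1 or X mask bit. A behavioral sketch (it models the logic, not the 10-transistor circuit; encoding X with a separate care bit is an assumption based on the figure's care_i label):

```python
def ternary_bit_match(input_bit: int, mask_bit: int, care_bit: int) -> int:
    """1 if the bit matches; care_bit = 0 encodes X (don't care)."""
    return int((not care_bit) or (input_bit == mask_bit))

def ternary_match(inputs: int, mask: int, care: int, width: int = 48) -> bool:
    # A comparator matches only if every per-bit cell matches.
    return all(
        ternary_bit_match((inputs >> i) & 1, (mask >> i) & 1, (care >> i) & 1)
        for i in range(width)
    )

print(ternary_match(0b1010, 0b1000, 0b1101, width=4))  # True: bit 1 is X
print(ternary_match(0b1010, 0b1000, 0b1111, width=4))  # False: bit 1 must be 0
```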


Simulator

Other simulators were cumbersome for our task
– Wrote an event-driven simulator in Java

Worst-case simulations
– Can easily process at maximum bandwidth with no additional latency


Worst Case Scenario Results

Worst-case scenario
– Minimum packet size is 20 bytes
– 448-bit input bus: 448 / 160 bits = 2.8, so 3 packets or fewer per cycle

IXP1200 time to calculate next destination
– 75 cycles minimum, 345 cycles average, 600 cycles maximum

At most 7 packets processed simultaneously on an IXP1200
– IXP1200 has 6 microengines; load handled easily
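The authors' Java simulator is not shown. As an illustration only, here is a minimal event-driven sketch of the worst case above: three minimum-size packets arriving per cycle, each taking the 600-cycle maximum. It assumes, hypothetically, one packet per microengine at a time (6 per IXP1200, so 1800 contexts across 300 IXP1200s) and that each packet goes to the earliest-free context.

```python
import heapq

def max_queueing_delay(arrivals, service_times, contexts):
    """FCFS multi-server sketch: arrivals must be sorted by cycle.
    Returns the worst wait (cycles) between arrival and start of processing."""
    free_at = [0] * contexts          # min-heap of cycles at which contexts free up
    worst = 0
    for arrive, service in zip(arrivals, service_times):
        earliest = heapq.heappop(free_at)
        start = max(arrive, earliest)
        worst = max(worst, start - arrive)
        heapq.heappush(free_at, start + service)
    return worst

PACKETS_PER_CYCLE = 3
arrivals = [k // PACKETS_PER_CYCLE for k in range(6000)]
service = [600] * len(arrivals)       # worst-case processing time for every packet

print(max_queueing_delay(arrivals, service, contexts=300 * 6))  # 0
print(max_queueing_delay(arrivals, service, contexts=1500))     # 300
```

With 1800 contexts no packet waits: from cycle 600 onward three contexts free up each cycle, matching the arrival rate. With a hypothetical 1500 contexts the delay grows by 100 cycles with each wave of packets.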


Conclusions from Simulation

Latency of 605 cycles: 0.78 us (0.40 us at 1.5 GHz)

Largest possible packet that could be sent after processing has started is 65,536 bytes
– Additional 1170 cycles of latency (65,536 × 8 bits / 448 bits per cycle): 1.51 us (0.78 us)

Transceiver delay 0.05 us [1999]
– Additional 0.10 us per hop

Total latency per hop of 2.4 us (1.3 us)
– ≈ 0.0000024 s per hub
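The per-hop figure adds three parts: 605 processing cycles, 1170 cycles to stream a 65,536-byte packet over the 448-bit bus, and two 0.05 us transceivers. A sketch of that sum (names are illustrative):

```python
LARGEST_PACKET_BYTES = 65_536
BUS_BITS = 448
TRANSCEIVER_US = 0.05             # per transceiver [1999]; two per hop

def per_hop_latency_us(clock_mhz: float, processing_cycles: int = 605) -> float:
    # Cycles to stream the largest packet onto the chip (1170, as on the slide).
    streaming_cycles = LARGEST_PACKET_BYTES * 8 // BUS_BITS
    cycles = processing_cycles + streaming_cycles
    return cycles / clock_mhz + 2 * TRANSCEIVER_US

for mhz in (775, 1500):
    print(f"{mhz} MHz: {per_hop_latency_us(mhz):.1f} us per hop")
```

Floor division matches the slide's 1170 cycles; the exact quotient is about 1170.3.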


Internet Visualization

San Francisco, USA to Perth, Australia
– 52 hubs (worst case, probability of 1 in 1000): 0.0000024 s per hub, < 0.001 s in total
– 2 gateways: < 0.001 s
– 2 end users: 0.033 s each, 0.067 s in total
– Worst-case packet journey (halfway around the world): 0.101 s
– Total of 0.169 s, within the 0.200 s tolerable latency for video conferencing
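Summing the worst-case latency budget from the earlier slides. This sketch uses the 775 MHz per-hop latency and treats the < 0.001 s gateway bound as exactly 0.001 s, a pessimistic assumption.

```python
TOLERABLE_S = 0.200               # video conferencing budget
time_of_flight_s = 0.101          # ~20,100 km of fiber plus repeaters
end_points_s = 2 * (1 / 30)       # processing at each end user
hubs_s = 52 * 2.4e-6              # 52 hubs at ~2.4 us each
gateways_s = 0.001                # assumption: bound taken as the actual value

total_s = time_of_flight_s + end_points_s + hubs_s + gateways_s
print(f"total {total_s:.3f} s, slack {TOLERABLE_S - total_s:.3f} s")
```

The hubs contribute only about 0.000125 s; time of flight and end-user processing dominate.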


Conclusions

Limiting factor is maximum bandwidth

Average-case simulations done
– Can easily process at maximum bandwidth with 40 IXP1200 processors (mostly longer packets)

Reduce processing power to levels sufficient for the bandwidth and model
– Fewer IXP1200s on chip
– Smaller chip size reduces cost
– Reduced processing power increases congestion, and may require high-priority packets for some communications


448-Bit Operation Cycles

– 448 bits onto chip
– Up to 48-bit header detection on the previous 47 bits and 401 bits of the current 448 bits (48-bit comparators)
– Send header positions in this 448-bit window
– Send to high-priority and low-priority in queues
– Packet priority detection (header) in queues
– Incorrect priority queue drops the packet; in-queue controller informed
– Remainder of packet sent to the appropriate in queue
– Process packet header; send packet body to out queue
– Processing times between 70 and 600 cycles, 345 cycles average
– Send updated packet header to out queue
– Inform out-queue controller that the packet is ready to send
– Send when output bus is available
– 448 bits off chip
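The priority step in the sequence above can be modeled with two queues that both see the start of each packet. The sketch is hypothetical: `Packet.header_priority` stands in for whatever header bits encode priority, and the in-queue controller is reduced to a log of drop notifications.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    header_priority: str   # hypothetical field: "high" or "low", decoded from the header
    body: bytes

class InQueue:
    def __init__(self, priority: str):
        self.priority = priority
        self.packets = deque()

    def offer(self, packet: Packet, controller_log: list) -> bool:
        # Priority detection happens in the queue: the queue whose priority
        # does not match drops the packet and informs the controller.
        if packet.header_priority != self.priority:
            controller_log.append((self.priority, "dropped"))
            return False
        self.packets.append(packet)   # the remainder of the packet follows here
        return True

def dispatch(packet: Packet, queues: list, controller_log: list) -> list:
    # The start of every packet goes to both the high- and low-priority queues.
    return [q for q in queues if q.offer(packet, controller_log)]

high, low = InQueue("high"), InQueue("low")
log = []
kept = dispatch(Packet("high", b"payload"), [high, low], log)
print(len(high.packets), len(low.packets), log)   # 1 0 [('low', 'dropped')]
```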


Maximum Throughput Node Hardware

For gateways or hubs
– 6.5 million transistors per IXP1200
– 0.5 million transistors for other applications, such as speech codecs, V.42bis, Huffman compression and 3DES

Up to 310 IXP1200s on the same chip

ASIC max. transistors in 1999 [ITRS]: 20 million/cm²
ASIC max. transistors in 2011 [ITRS]: 811 million/cm²
ASIC max. chip size in 1999 [ITRS]: 8 cm²
ASIC max. chip size in 2011 [ITRS]: 8 cm²
ASIC max. number of transistors per chip in 2011: 6488 million
Transistors for one backbone IXP1200 plus other possible applications: 7.0 million
Ideal possible number of IXP1200s plus other applications per chip: 931
Assuming 2/3 overhead for memory, routing and other CPUs, number of IXP1200s et al.: 310


Packet Processing at Nodes

927 pins with I/O at clock speed

Maximum onto-chip bandwidth; smallest IP packet is 20 bytes (header size); maximum required processing power

(first value at 775 MHz, second at 1.5 GHz)
Chip-to-package pads in 2011: 927 / 927
Maximum I/O speed at IXP1200 operating speed, with maximum use of pads (Gbit/s): 718 / 1391
Maximum I/O bandwidth onto chip, which must also get back off chip (Gbit/s): 359 / 695
Number of bits/s that must be processed on chip (Gbit/s): 359 / 695
Smallest possible packet size (bytes): 20 / 20
Worst-case number of 20-byte packets/s that must be processed per IXP1200: 7,230,414 / 14,000,372
Number of packets an IXP1200 can process per second in 1999: 2,300,000 / 2,300,000
Number of packets an IXP1200 can process per second in 2011: 10,733,333 / 20,783,133
Number of packets our chip can process per second in 2011: 3,331,317,770 / 6,450,486,216


Hub Cache and Main Memory

Required for IXP1200s; assumed by Scott in IXP1200 simulations:
– 4 MB of DRAM
– 2 MB of SRAM

DRAM memory per IXP1200 in 1999: 0.004 GB
SRAM memory per IXP1200 in 1999: 0.002 GB
DRAM memory, or equivalent, required in 2011 for the IXP1200s: 1.24 GB
SRAM memory, or equivalent, required in 2011 for the IXP1200s: 0.62 GB
Area required for DRAM for the IXP1200s: 0.17 cm²
Area required for SRAM for the IXP1200s: 0.24 cm²
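The 2011 memory requirement scales the per-IXP1200 allowance by the 310 IXP1200s on the chip. A sketch (names are illustrative):

```python
IXP1200_COUNT = 310
DRAM_GB_PER_IXP = 0.004   # 4 MB, as assumed in the IXP1200 simulations
SRAM_GB_PER_IXP = 0.002   # 2 MB

dram_gb = IXP1200_COUNT * DRAM_GB_PER_IXP
sram_gb = IXP1200_COUNT * SRAM_GB_PER_IXP
print(f"DRAM {dram_gb:.2f} GB, SRAM {sram_gb:.2f} GB")
```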


Hub Register Memory

Average packet latency in 2011: 0.00000009 s
Latency for a single packet in 2011: 0.00000045 s
Number of pins for packet I/O (this many each to get on and off): 463.5
Maximum bandwidth onto chip and back off chip: 359 Gbit/s
Minimum IPv4 packet size: 20 bytes
Maximum number of IPv4 packets/s: 18 × 10^9
Maximum number of IPv4 packets to store while a packet is processed: 9668
Maximum packet size in IPv6: 65,536 bytes
Average storage capacity required at maximum bandwidth: 0.00019 Gbit
Number of in queues: 20
Number of out queues: 20
Register storage required, assuming one maximum-length packet in each queue: 0.021 Gbit
Area of all the registers: 0.028 cm²
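The register-storage figure assumes one maximum-length packet buffered in each of the 40 queues. A sketch of that product (names are illustrative):

```python
MAX_PACKET_BYTES = 65_536        # maximum IPv6 packet size used above
IN_QUEUES, OUT_QUEUES = 20, 20

register_bits = (IN_QUEUES + OUT_QUEUES) * MAX_PACKET_BYTES * 8
print(f"{register_bits:,} bits = {register_bits / 1e9:.3f} Gbit")  # 20,971,520 bits = 0.021 Gbit
```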


Average Scenario Information

Assumed normal distribution between 80 and 600 cycles to process a packet
– Average of 340 cycles
– 80 and 600 are two standard deviations from the mean

Packet sizes:


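A sketch of how processing times could be drawn for the average scenario (mean 340 cycles, with 80 and 600 two standard deviations out, so σ = 130). The deck does not say how draws outside 80 to 600 cycles were handled; this version assumes they are rejected and redrawn (a truncated normal), which keeps the mean at 340 because the bounds are symmetric. The seed is arbitrary.

```python
import random

MEAN_CYCLES = 340
LOW, HIGH = 80, 600
SIGMA = (HIGH - MEAN_CYCLES) / 2    # 80 and 600 are two standard deviations out

def sample_processing_cycles(rng: random.Random) -> int:
    # Assumption: out-of-range draws are rejected and redrawn.
    while True:
        x = rng.gauss(MEAN_CYCLES, SIGMA)
        if LOW <= x <= HIGH:
            return round(x)

rng = random.Random(2011)
samples = [sample_processing_cycles(rng) for _ in range(10_000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```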