Photonic Networks for Intra-Chip, Inter-Chip, and
Box Interconnects in High-Performance Computing
Keren Bergman
Columbia University, Department of Electrical Engineering
Table of Contents
1. Introduction
2. Large Scale Supercomputing Systems
3. Photonics Design Considerations
4. Interconnection Network Architectures
5. Implementations: OSMOSIS, Data Vortex
6. Off-Chip Bottlenecks
7. Photonic Network-on-Chip
8. Emerging Enabling Technologies
9. SPINet Design and Implementation
10. Intra-Chip Challenges
11. Future Directions and Opportunities
Top 10 Supercomputers (June 2006)
1. DOE/NNSA/LLNL (United States) – IBM eServer Blue Gene Solution; 131,072 processors; Rpeak 367,000 GFlops; PowerPC 440 @ 700 MHz; IBM BlueGene/L, MPP; proprietary interconnect.
2. IBM Thomas J. Watson Research Center (United States) – IBM eServer Blue Gene Solution; 40,960 processors; Rpeak 114,688 GFlops; PowerPC 440 @ 700 MHz; IBM BlueGene/L, MPP; proprietary interconnect.
3. DOE/NNSA/LLNL (United States) – IBM eServer pSeries p5 575 1.9 GHz; 12,208 processors; Rpeak 92,781 GFlops; POWER5 @ 1900 MHz; IBM pSeries, MPP; SP Switch Federation.
4. NASA/Ames Research Center/NAS (United States) – SGI Altix 1.5 GHz, Voltaire Infiniband; 10,160 processors; Rpeak 60,960 GFlops; Intel IA-64 Itanium 2 @ 1500 MHz; SGI Altix, MPP; NUMAlink/Infiniband.
5. Commissariat à l'Énergie Atomique (CEA) (France) – Bull SA NovaScale 5160, Itanium 2 1.6 GHz, Quadrics; 8,704 processors; Rpeak 55,705.6 GFlops; Intel IA-64 Itanium 2 @ 1600 MHz; Bull NovaScale SMP cluster, Constellations; Quadrics.
6. Sandia National Laboratories (United States) – Dell PowerEdge 1850, 3.6 GHz, Infiniband; 9,024 processors; Rpeak 64,972.8 GFlops; Intel EM64T Xeon @ 3600 MHz; Dell PowerEdge, Cluster; Infiniband.
7. GSIC Center, Tokyo Institute of Technology (Japan) – NEC/Sun Sun Fire X4600 Cluster, Opteron 2.4/2.6 GHz, Infiniband; 10,368 processors; Rpeak 49,868.8 GFlops; AMD x86_64 Opteron Dual Core @ 2400 MHz; Sun Fire, Cluster; Infiniband.
8. Forschungszentrum Juelich (FZJ) (Germany) – IBM eServer Blue Gene Solution; 16,384 processors; Rpeak 45,875 GFlops; PowerPC 440 @ 700 MHz; IBM BlueGene/L, MPP; proprietary interconnect.
9. Sandia National Laboratories (United States) – Cray Inc. Red Storm Cray XT3, 2.0 GHz; 10,880 processors; Rpeak 43,520 GFlops; AMD x86_64 Opteron @ 2000 MHz; Cray XT3, MPP; Cray XT3 internal interconnect.
10. The Earth Simulator Center (Japan) – NEC Earth-Simulator; 5,120 processors; Rpeak 40,960 GFlops; NEC vector processor @ 1000 MHz; NEC Vector, MPP; multi-stage crossbar.
System Performance: GFlops/Watt
• Optimize performance/Watt: performance/rack = performance/Watt × Watt/rack
• Watt/rack is roughly constant for air-cooled systems (~20 kW)
• Use low-power, low-frequency processor cores
• Key system metric: peak Flops / total power
• Large numbers of moderate-frequency processors require EXTREME SCALING
• The network must scale in both performance and packaging
IBM BG/L Interconnect
• System peak: 360 TFlops
• 65,536 (2^16) dual-core nodes
• 1024 dual-core processor nodes per rack
– 27.5 kW per rack
– ~0.25 GFlop/Watt
– ~85% of inter-node connectivity
• Main compute interconnect: 3D torus (64×32×32)
• BW: 2.1 GB/s inter-node
• MPI latency: > 2 µs (strong load dependence)
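As a quick sanity check of the efficiency figure quoted above, the short Python sketch below recomputes GFlops/Watt at the rack level from the numbers on this slide (assuming the 360 TFlops system peak is spread evenly over the 64 racks implied by 65,536 nodes at 1024 nodes per rack).

```python
# Rack-level GFlops/Watt check from the BG/L numbers quoted above.
system_peak_gflops = 360_000      # ~360 TFlops system peak
nodes_total = 65_536
nodes_per_rack = 1024
rack_power_w = 27_500             # ~27.5 kW per rack

racks = nodes_total // nodes_per_rack                  # 64 racks
rack_peak_gflops = system_peak_gflops / racks          # ~5,600 GFlops per rack

print(f"racks: {racks}")
print(f"per-rack peak: {rack_peak_gflops:.0f} GFlops")
print(f"efficiency: {rack_peak_gflops / rack_power_w:.2f} GFlops/Watt")
# ~0.20 GFlops/Watt, in the same ballpark as the ~0.25 GFlop/Watt quoted above
```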
Cray XT3
• System peak: 43.5 TFlops
• Interconnect:
– 3D torus
– 6 switch ports per SeaStar, 7.6 GB/s each (45.6 GB/s total)
• 10,880 compute PEs
• Interconnect bisection bandwidth: 11.7 TB/s
• 64-bit AMD Opteron 100 series
• 96 dual-core 2.6 GHz microprocessors per cabinet
• 998 GFlops per cabinet
• 14.5 kW per cabinet (~0.07 GFlops/Watt)
Bull SA NovaScale (Tera 10)
CEA: Commissariat à l'Énergie Atomique (France's Atomic Energy Authority)
• Europe's top supercomputer
• System peak: 55.7 TFlops
• Interconnect: Quadrics QsNet/QsNetII
– 8-port ASIC routers
– fat-tree topology
– scalability: up to 4K nodes
– BW: 900 MBytes/s per node
– MPI latency: ~2-3 µs
Box Interconnection Networks
• HPCS require interconnection networks that deliver:
– ultra-low-latency message exchange
– dynamic, bursty bandwidth
– self-conflict-resolving packet routing
– capacity approaching Pbytes
– port-count scalability (>1k to 10k)
– flexible packet sizes
– efficiency on small messages (GUPS)
– high-bandwidth processing
– significantly lower power consumption
• Broader applications:
– optical interconnections for chips or chipsets (cost, footprint, power dissipation critical)
– optical backplanes (cost, scalability)
– high-capacity routers
Performance Trends & Photonics Opportunity
• Increases in the performance of individual CPU chips will come from:
– the number of processor cores per chip
– the number of parallel functional units
• One of the most important features of a massively parallel supercomputer is the network that connects the processors and allows the machine to operate as a large coherent entity
• The interconnection network must SCALE in a highly parallel system
• Power consumption must be addressed along with scaling
• Scalability: bandwidth, latency, throughput
• Photonic opportunity: bandwidth (WDM), throughput, power efficiency, latency
Box Interconnect: Key Metrics
• High-bandwidth, low-latency communication between nodes is necessary to provide high parallel efficiency for applications running on petaFLOPS-scale computing systems
• Communication networks are characterized by five critical performance metrics:
– Message latency (end-to-end message exchange)
– Message throughput (messages processed at a node)
– Message bandwidth (exchanged message bandwidth)
– Load/store bandwidth (load/store operations per second)
– Bisection bandwidth (global network bandwidth)
Architectural Considerations
• Constraints:
– O/E and E/O conversions are expensive
– nontrivial attenuation and signal degradation
– no buffering (fiber delay lines only)
– poor signal processing
• Features:
– wavelength parallelism
– high channel bandwidth
Leverage the unique features of both photonics and electronics.
Architectural Considerations
Leverage the unique features of both photonics and electronics.
• Optics: ultrahigh-bandwidth transmission, speed-of-light latency, efficient propagation
• Electronics: digital logic, signal processing, buffering
Figures of Merit
• System: power consumption, cost, reliability, serviceability
• Network: acceptance rate, throughput, latency
• Physical: transparency, dynamic range, stability
Architectural Foundations
[Figure: photonic interconnection network managed by an electronic control plane.]
Architectural Foundations
• No buffering → deflection routing, judicious I/O queuing, over-provisioning of paths
• High bandwidth → multiple-wavelength encoding
• Low-latency electronic routing control
• Simplicity → banyan routing, MIN topology, modularity
Switching Node Design
[Figure: 2×2 switching node. The frame (λF) and header (λH) wavelengths of the multi-λ packet are tapped off by 70:30 couplers and O/E converted; control logic drives a pair of SOA gates that route the packet between the West/South inputs and North/East outputs through 50:50 couplers.]
Wavelength-parallel, transparent routing of multi-λ packets.
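To make the control path concrete, here is a toy sketch (hypothetical Python, not the actual combinational logic) of the decision implied by the figure: detect optical power on the frame wavelength, read the single routing bit carried on the header wavelength, and enable one of the two SOA gates. The mapping of bit values to the North/East outputs is an assumption made only for illustration.

```python
# Toy model of the switching-node control logic sketched above.
def route_packet(frame_detected: bool, header_bit: int) -> str:
    """Pick which SOA gate to enable for an incoming multi-wavelength packet.

    frame_detected -- True if optical power is present on the frame wavelength
    header_bit     -- the routing bit recovered from the header wavelength
    """
    if not frame_detected:
        return "no packet: both SOAs off"
    # One tapped header bit selects one of the two outputs; the SOA on that
    # path is switched on and the payload passes through transparently.
    return "enable SOA to North output" if header_bit == 0 else "enable SOA to East output"

print(route_packet(True, 0))   # enable SOA to North output
print(route_packet(True, 1))   # enable SOA to East output
print(route_packet(False, 0))  # no packet: both SOAs off
```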
Packet Structure
• slot: 25.7 ns; packet: 22.4 ns; payload: 19.3 ns
• dead time: 3.3 ns; guard time: 1.6 ns
• multiple-wavelength packet: a frame wavelength, header wavelengths (H0–H3), and payload wavelengths (P0–P15)
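The slot arithmetic can be checked directly; combined with the 16 payload wavelengths at 10 Gb/s each used in the experimental system described later (an assumption carried over from that slide), it also gives a rough effective per-port data rate.

```python
# Timing check for the multi-wavelength packet format above.
slot_ns, packet_ns, payload_ns = 25.7, 22.4, 19.3
deadtime_ns, guardtime_ns = 3.3, 1.6   # guard time separates payload from headers

assert abs(slot_ns - (packet_ns + deadtime_ns)) < 1e-9   # 22.4 ns + 3.3 ns = 25.7 ns

# Assumed from the experimental-setup slide: 16 payload wavelengths (P0..P15)
# modulated at 10 Gb/s each, i.e. 160 Gb/s peak line rate per port.
payload_lambdas, rate_gbps = 16, 10
peak_rate = payload_lambdas * rate_gbps                  # 160 Gb/s
effective_rate = peak_rate * payload_ns / slot_ns        # duty-cycle adjusted

print(f"peak per-port rate:      {peak_rate} Gb/s")
print(f"effective per-port rate: {effective_rate:.0f} Gb/s")   # ~120 Gb/s
```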
Implemented Switching Node
Packet Structure: Optical Spectrum
[Figure: measured average power (dBm) vs. wavelength (nm), showing the frame (F) and header (H0–H3) wavelengths of a packet.]
Multi-wavelength Switch Block
• Truly broadband switching of multi-wavelength packets using a single switch
• The power dissipated switching a multi-wavelength packet equals that of a single-wavelength switch: P_dissipated,single-wavelength = P_dissipated,multi-wavelength
Topologies
• Simple banyan (e.g., omega): n = ½ N log₂ N switching elements
Topologies
• Clos network (e.g., Beneš network): n = ½ N (2 log₂ N − 1)
Topologies
• Augmented banyan (e.g., omega with K extra stages): n = ½ N (log₂ N + K)
Topologies
• Data Vortex (cyclic butterfly), with distinct input and output nodes: n₁ₓ₂ = A · N · (log₂ N + 1), where A is the Data Vortex angle parameter
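For a sense of scale, the sketch below evaluates these element counts for a 64-port network (log base 2 assumed; the K and A values are illustrative placeholders, not figures from the talk).

```python
# Switching-element counts for the topologies above, for a 64-port network.
from math import log2

N = 64
K = 2   # extra stages in the augmented banyan (illustrative)
A = 5   # Data Vortex angle parameter (illustrative)

print("simple banyan (omega):", int(N / 2 * log2(N)))            # 192
print("Clos / Benes:         ", int(N / 2 * (2 * log2(N) - 1)))  # 352
print("augmented banyan:     ", int(N / 2 * (log2(N) + K)))      # 256
print("Data Vortex (1x2):    ", int(A * N * (log2(N) + 1)))      # 2240
```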
Data Vortex Topology
[Figure: routing example for destination address 1110. A packet injected at an input node traverses nodes (cylinder, height, angle) = (0,0,0), (0,2,1), (1,2,2), (1,3,0), (2,3,1) on its way to the output node, the reachable address range narrowing from [0xxx]/[1xxx] to [10xx], [11xx], and finally [1110] — one destination bit resolved per cylinder.]
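The routing rule this illustrates is single-bit-per-stage address resolution: a node at cylinder level c only needs to examine bit c of the destination address. The toy sketch below (simplified; it ignores deflections along the angle dimension) prints the resolved prefix after each cylinder crossing.

```python
# Single-bit-per-stage address resolution, as in the Data Vortex example above.
def resolved_prefix(dest: str, level: int) -> str:
    """Destination bits pinned down after `level` cylinder crossings."""
    return dest[:level] + "x" * (len(dest) - level)

dest = "1110"
for level in range(len(dest) + 1):
    print(f"after cylinder {level}: [{resolved_prefix(dest, level)}]")
# after cylinder 0: [xxxx]  ->  after cylinder 4: [1110]
```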
Data Vortex Topology
[Figure: Data Vortex deflection structure, showing the input and output nodes and the west/east/north/south links.]
System Implementation
Data Vortex experimental system:
• 12×12 switch
• ~100 ns routing latency
• 160 Gbps per port
• Terabit capacity
System Implementation
Experimental Setup
[Figure: test bed for the 12×12 Data Vortex network. A pulse-pattern generator and data-timing generator (PPG/DTG) drive modulators that generate the frame (F), header (H0–H3), and payload (P0–P15) wavelengths (figure notes rates of 10 Gb/s and ~39 Mb/s); packets pass through booster and gating SOAs into the 12×12 Data Vortex network over a 5-node path and are received and measured with a BER tester (BERT); EDFAs provide amplification.]
Routing Demonstration
[Figure: oscilloscope traces (50 ns/div) of the frame (F), header (H0–H3), and payload (P) wavelengths for packets routed to addresses 0001, 0010, 0011, 0101, 0110, 0111, 1001, 1010, 1011, 1101, 1110, 1111.]
• 7 hops ≈ 160 ns
• 3 hops ≈ 60 ns
Deflection Demonstration
[Figure: traces (20 ns/div) of the frame (F) and header (H0–H3) wavelengths for packets from inputs #4 and #7 addressed to 1101; contention produces a +3-hop deflection.]
Error-Free Transmission
OSNR Degradation
[Figure: optical spectra showing OSNR degradation of the frame (F) and header (H0–H3) wavelengths.]
Dynamic Power Range
Just-in-Time Optical Cell Switching
• Fast optical switch for fixed-size data packets (cells)
• Transparent data path with multiple cells in flight
• Out-of-band electronic control path
• Just-in-time switching as cells arrive
[Figure: control and switched 40G optical cells (5 ns/div); Tx nodes (with electrical VOQ buffers) and Rx nodes connected by the just-in-time optical switch.]
Bufferless Crossbar Design: Implemented via a 2-Stage Broadcast-and-Select Architecture
[Figure: 40 Gb/s transmitters (laser-integrated modulators) feed 8×1 combiners and EDFAs into 1×128 star couplers; fast SOA fiber-selector and color-selector gates, followed by muxes, select the signal for the 40 Gb/s packet receivers (ports #0a,b … #63a,b).]
• Optical gain: semiconductor optical amplifier on-off gates
• High-sensitivity receivers
• S: multiple fibers (8, scaling to 40+)
• λ: multiple colors per fiber (8, scaling to 100+)
• T: switching time (~2 ns, scaling to <0.1 ns)
• High bit rates (40G, scaling to 100G+)
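The scaling claim is easy to quantify: aggregate capacity is roughly fibers × wavelengths per fiber × line rate. A quick sketch using the numbers above (taking the "40+/100+/100G+" targets literally, purely for illustration):

```python
# Broadcast-and-select capacity scaling: fibers x wavelengths x line rate.
def capacity_tbps(fibers: int, wavelengths: int, rate_gbps: int) -> float:
    return fibers * wavelengths * rate_gbps / 1000.0

print("demonstrator: ", capacity_tbps(8, 8, 40), "Tb/s")      # 2.56 Tb/s (64 x 40 Gb/s ports)
print("scaled target:", capacity_tbps(40, 100, 100), "Tb/s")  # 400 Tb/s (illustrative)
```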
OSMOSIS Demonstrator Prototype
[Figure: optical switch, source/sink cluster proxies with I/O adapters, arbiter prototype, and management GUI.]
Error-free Cell Transmission with Wide Dynamic Range: 20 Mcell/s, 64 ports, 40 Gb/s per port
• 5 dB cell-to-cell dynamic range with error-free recovery (+2.0 dBm)
• Programmable cell structure: preamble, data, post-amble, and inter-packet gap
• High-sensitivity receiver for scaling: 7.5 dB sensitivity margin at 28 dB OSNR
[Figure: 40 Gb/s eye diagram (25 ps), switched-cell traces (20 ns/div), cell structure detail (500 ps/div), and BER vs. received power (dBm) for OSNR values from ~45 dB down to 26.8 dB.]
Measured Performance Summary
• Channel performance:
– Data path bit rate: 40 Gbit/s per port, 64 ports
– Control path bit rate: 2.5 Gbit/s per port
– Received OSNR: >35 dB
– Data cell size: 2048 bits
– Data cell structure: fully programmable (via FPGA)
– Latency: <500 ns
– Efficiency: 75%
– Bit error rate: <10⁻¹⁴ switched, uncorrected; correctable by FEC and protocol to <10⁻²¹
• System performance:
– Switch size: 64 ports in the initial implementation
– Out-of-band control channel at 20 Mcell/s per port
– Switching at every cell boundary under full load
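Taken together, the per-port figures above imply the aggregate numbers below — a quick back-of-the-envelope check, not a quoted specification.

```python
# Aggregate throughput implied by the per-port figures above.
ports, line_rate_gbps, efficiency = 64, 40, 0.75

raw_tbps = ports * line_rate_gbps / 1000
print(f"raw switch capacity:         {raw_tbps:.2f} Tb/s")               # 2.56 Tb/s
print(f"sustained at 75% efficiency: {raw_tbps * efficiency:.2f} Tb/s")  # ~1.9 Tb/s
```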
Semiconductor Optical Amplifier Switches
• Switches entire data cells
• Fast switching: 0.1–2 ns
• Inherent gain (~20 dB)
• High on-off ratio (>45 dB)
• Low polarization sensitivity (<0.6 dB)
• Low noise figure (<6.5 dB)
• Broadband and WDM-friendly (>80 nm)
• Monolithically integratable
• Future ultra-fast all-optical capability
[Figure: monolithic SOA array electrically switched at 1 GHz; discrete SOA optically switched at 80 GHz; Q factor (dB) vs. total power into the SOA (dBm, 8 channels) at 8×40 Gb/s capacity, showing a 20 dB dynamic range above the 10⁻¹² BER level.]
Towards Commercialization: Optical Integration Provides a 10-30X Benefit
• Feasibility demonstrator (~450 mm × 450 mm) built from discrete devices
• Integrated prototype: InP 8×1 combiner, monolithic SOA array, monolithic optical interface, silicon arrayed waveguide
• Discrete vs. integrated:
– Power: ~250 W vs. ~8 W (>30X)
– Complexity: ~2000 parts vs. ~100 parts (20X)
– Size: ~0.2 m² vs. ~0.015 m² (>10X)
• 10 Tbit/s per shelf
Off-Chip Interconnects
Inter-Chip Interconnects (chip-to-chip)
• Current challenges
• Photonic networks-on-chip
• Design considerations
• SPINet
CPU/Off-Chip Bandwidth Performance Gap
[Figure: International Technology Roadmap for Semiconductors projections.]
High-performance computing systems: Distributed Shared-Memory (DSM) microprocessors
• Shared address space implemented by physically distributing memory among many processors
• Fundamental DSM communications bottleneck: remote memory access latency
• Emerging performance gap between CPU bandwidth and off-chip clock rates; fundamental limits on multi-GHz electronic signaling are being reached (power dissipation)
Photonic Integrated Networks
• But: packet size does not scale down — a typical packet of >10 ns spans ≈2 m of silica fiber (message size ~0.1 to 1 m), while NoC dimensions are only ~100 µm to 1 cm
• Large-scale photonic (O/E) integration:
– breaks the optical cost barrier
– very low power dissipation
• Novel bufferless architecture using transparent lightpaths for an acknowledgment echo
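The 10 ns ≈ 2 m figure follows directly from the group velocity of light in silica (about 2×10⁸ m/s for a group index of ~1.5):

```python
# Physical length of a packet propagating in silica fiber.
c = 3.0e8            # speed of light in vacuum, m/s
n_group = 1.5        # approximate group index of silica
v = c / n_group      # ~2e8 m/s

packet_ns = 10
length_m = v * packet_ns * 1e-9
print(f"a {packet_ns} ns packet occupies ~{length_m:.1f} m of fiber")   # ~2.0 m
```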
Rationale for Integrated OIN
• 64-port network-on-chip with >4 Tbit/s capacity
• Latency in the 100 ns range
• Power dissipation 10X-100X below current electronic interconnection networks that deliver only a fraction of the throughput bandwidth
• Integration of a universal, programmable 2×2 multi-wavelength switching building block
Programmable Multi-Wavelength Switching Building Block
[Figure: 2×2 switching element. The frame (λF0, λF1) and address (λA0, λA1) wavelengths from inputs in0 and in1 are tapped by 70:30 couplers and O/E converted; a CPLD drives four SOA gates that connect in0/in1 to outputs out0/out1 through 50:50 couplers.]
Prototype Switching Node
[Figure: implemented 2×2 node. Frame (λF0, λF1) and address (λA0, λA1) detection (O/E conversion plus control logic) drives four SOA gates between inputs in0/in1 and outputs out0/out1 through 50:50 and 70:30 couplers.]
Six switching states: straight, interchange, upper straight, upper interchange, lower straight, lower interchange.
SPINet: Scalable Photonic Integrated Network
• Multistage interconnection network (MIN)
• Ultra-broadband: each message encompasses the entire WDM bandwidth
• Multi-wavelength 2×2 switching elements
• Simple WDM addressing; on-the-fly single-bit routing; self de-conflicting; no buffers; low latency
• Instantaneous lightpaths established every time slot
• Contentions resolved by dropping
• Physical-layer acknowledgements
Demonstration
[Figure: routing example on an 8-port, three-stage SPINet (nodes 0.0–2.3). Messages are injected from input 0 to output 1, 2→1, 3→5, 5→4, and 7→3; contending messages are dropped, acknowledgements are sent back on the lightpaths reaching outputs 1, 3, 4, and 5, and the paths are then torn down.]
Acceptance Rate
• 64-port network
• Average BW per port: 0.25 × 0.83 × 320 Gb/s ≈ 64 Gb/s (offered load × acceptance rate × peak port rate)
• Network BW: 64 Gb/s × 64 ports ≈ 4 Tb/s!
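A quick check of that arithmetic (reading the factors as offered load × acceptance rate × peak port rate, which is an inference from this and the following slide):

```python
# Average accepted bandwidth per port and aggregate network bandwidth.
load, acceptance, port_rate_gbps, ports = 0.25, 0.83, 320, 64

per_port = load * acceptance * port_rate_gbps
print(f"average BW per port: {per_port:.0f} Gb/s")                # ~66 Gb/s (slide rounds to 64)
print(f"network BW:          {per_port * ports / 1e3:.1f} Tb/s")  # ~4.2 Tb/s (~4 Tb/s)
```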
Latency
• 64-port network
• Mean queueing latency (load = 0.25): 0.46 slots
4-Node Experimental Implementation
• Optical waveforms of signals at the network's input and output ports
• Demonstrated correct routing and contention resolution between optical packets
Intra-Chip Interconnects (on-chip)
• Current on-chip interconnect challenges
• Emergence of multi-cores
• Photonic opportunities
• Design considerations
• Intra-chip networking
Paradigm Shift in High-Performance Processor Chip Design
• Before: exponential performance acceleration with each generation via increased clock frequencies and integration densities
• As clock frequencies rise, the fraction of the chip reachable in a single clock cycle decreases at the same exponential rate
• Diminishing returns: increasing processor frequency increases instruction execution latencies, and performance can degrade
• Now: designers are limited not by the number of transistors on a single die but by the logic reachable within one clock cycle
• Power dissipation and the optimization of performance per Watt are driving the trend toward multi-core parallel processors
• High-performance processor chips are becoming distributed systems
• Evolution from computation-bound towards communication-bound design
Critical Roadblocks and the Photonics Opportunity
• On-chip communication latency
– Time of flight: RC delay worsens with each process generation
– "The intrinsic interconnect delay of a 1-mm interconnect for a 35-nm technology will be longer than the MOSFET switching delay by two orders of magnitude" [Davis et al., Proc. IEEE '01]
– Optical signal velocity is independent of data rate
– Serialization latency: optical TDM compression
– Queueing latency: bufferless interconnection network with guaranteed queueing-free paths for latency-sensitive packets
– Latency-insensitive design (LID): EDA tools
• Exacerbated growth in power dissipation
– Propagation power dissipation is independent of the optical signal rate
– Power-efficient design in photonic switching
On-Chip Interconnect Latency
[Figure: fraction of die area reachable (%) within 1, 2, 4, 8, and 16 clock cycles vs. technology node (250, 180, 130, 100, 80, 60 nm).]
• "For a 60-nanometer process a signal can reach only 5% of the die's length in a clock cycle" [D. Matzke (Texas Instruments), IEEE Computer, Sept. 1997]
• Shift from function-centric to communication-centric design
Processor Chips Become Distributed Systems
• Interconnect latency
– Interconnect delays can be an order of magnitude larger than switching delays
– Hard to estimate because they are affected by many phenomena: process variations, cross-talk, power-supply drop variations
– Breaks the synchronous assumption that lies at the basis of design-automation tool flows
• Local (scaled-length) wires span a fixed number of gates and scale well together with logic
• Global (fixed-length) wires span a fixed fraction of the die and do not scale [Ho et al., 2001]
Interconnect Power Dissipation
• Power dissipation is arguably the most critical problem in high-performance chip design
• Over the last two decades microprocessor power dissipation has grown exponentially, with a primary contribution from interconnects [Horowitz et al., 2005]
• Interconnect is responsible for ~50% of dynamic power dissipation [Magen et al., 2004]
[Figure: microprocessor power (Watts) vs. year.]
The Rise of Multi-Core Architectures
• Rise of parallel multi-core architectures to mitigate power dissipation
• Parallel architectures with multiple simpler processing cores provide better performance per Watt than architectures based on a single complex processor
• State-of-the-art commercial chips feature more parallel and distributed architectures that are essentially multi-core chips:
– Montecito (Intel)
– Cell (IBM, Toshiba, Sony)
• The key is to design robust, scalable, fast, and power-efficient intra-chip communication networks
Optical Interconnection Networks-on-Chip
• Photonic intra-chip interconnection networks are a potentially disruptive technology:
– ultra-high throughput
– minimal access latencies
– low power dissipation, independent of capacity
• Globally shared optical network with a regular topology:
– local electronic interconnect
– electronic computation
• Architecture and data routing designed for photonics:
– optical buffering is not practical on chip
– no significant processing in the optical domain
Manhattan Street Network (MSN) Optical Interconnection
• Regular shared topology:
– torus replaced by a mesh
– dense grid of unidirectional waveguides
– adjacent waveguides are directed in opposite directions (like Manhattan's streets and avenues)
• Simple 2×2 switching elements:
– 2-state operation
– no buffering
– routing logic in a parallel electronic control plane
– power dissipated during state transitions
• A Tx/Rx pair for each major on-chip module (processor core) at a specific grid address
• Asynchronous operation:
– modules transmit and receive packets
– very simple photonic switching elements
– asynchronous messages stretch over multiple switching elements
Asynchronous Electronic/Optical Master/Slave Routing
• The optical interconnection network functions in synergy with an electronic control plane that mimics the photonic network topology
• Parallel electronic control network:
– control packets are exchanged to provide path setup/release requests and acknowledgment functionality
– employs path diversity, deflecting around paths in use
– an electronic router controls every photonic switching element (PSE) at every intersection of the MSN grid
Routing and Data Flow Control
• Path setup proceeds as in optical burst switching:
– A source that wants to send a message first sends an electronic path setup packet (PSP), which encodes several fields including control packet type, destination address (X and Y coordinates), priority, and source address
– The electronic PSP travels through the parallel control network, setting up routers and PSEs along its way
– No buffering takes place at any point
– Each router has only two inputs and two outputs, so an available output port (for deflection) always exists
• The decision at every router is computed by comparing the coordinates of the router/PSE with the destination coordinates of the packet (a simplified sketch of this decision follows the figure below)
• The message payload is then transmitted in the optical domain, immediately following the electronic PSP like a comet tail behind the path being set up
• After the optical payload transmission ends, a path release packet (PRP) is sent to reset all the routers and PSEs
[Figure: MSN grid with Tx/Rx modules. An electronic setup signal that detects a used path alters its route; if two paths would overlap, one must be rerouted (deflected).]
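Here is a minimal sketch of the per-router setup decision described above, under the stated constraints (two inputs, two outputs, no buffering). The X-first preference and the helper name are illustrative assumptions, not the actual control logic.

```python
# Simplified setup-packet routing for a 2x2 MSN router with deflection.
def route_psp(router_xy, dest_xy, preferred_output_busy: bool) -> str:
    """Choose the output ('row' or 'column') for a path setup packet (PSP)."""
    x, y = router_xy
    dest_x, dest_y = dest_xy   # dest_y would steer routing once on the right column
    # Productive direction: move along the row until the destination column
    # is reached, then turn onto the column (X-first routing, illustrative).
    preferred = "row" if x != dest_x else "column"
    alternate = "column" if preferred == "row" else "row"
    # With only two outputs a free port always exists, so deflect if needed.
    return alternate if preferred_output_busy else preferred

print(route_psp((0, 0), (3, 2), preferred_output_busy=False))  # row (toward x = 3)
print(route_psp((0, 0), (3, 2), preferred_output_busy=True))   # column (deflected)
print(route_psp((3, 0), (3, 2), preferred_output_busy=False))  # column (turn toward y = 2)
```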
Throughput, Latency, Dynamic Programmability
• Packet payload transparency enables enormous scaling in capacity:
– dynamic support of variable packet sizes
– a message can extend over many PSEs, creating lightpath circuits
• Asynchronous design; shared, multiple global communications
• Heavily path-diversified network
• Different algorithms (X-first, Y-first, or a mixture) are used to balance load and avoid local congestion
• Programmable fields in the electronic PSP enable multiple classes of service
Asynchronous, Bufferless MSN
• Asynchronous bufferless network based on the Manhattan Street Network topology with deflection routing
• Can provide a guaranteed queueing-free path for latency-sensitive signals
• Path diversity reduces load; differentiated services for different traffic classes
• Latency guarantees verified by simulation
Differentiated Classes of Service
• Provides a guaranteed queueing-free path for latency-sensitive signals
• Different classes of service are defined:
– Real-time signaling: a dedicated path, preempting any other traffic; the path is re-used for other traffic when not occupied by the real-time signal
– Guaranteed bandwidth and CBR (constant bit rate): some paths are designed to be time-multiplexed with long-lasting connections that require a guaranteed bit rate
– Best effort: for non-latency-sensitive applications, exploiting the vast bandwidth offered by the network
• Latency is addressed by fast propagation velocity, path diversity, and complete avoidance of buffering
• Deflection-free, buffering-free paths can be secured for a small number of high-priority signals
Initial Simulation Results
Summary and Conclusions
Multiple opportunities for the insertion of photonics, leveraging the unique features of both photonics and electronics:
• unique architectures that allow for synergy
• multiple-wavelength transmission to maximize bandwidth
• transparent optical pathways
• power efficiency
• design for high acceptance rates
• design of enabling technologies in Si photonics for the network
• simplicity and repeatability, since photonics is still nascent
Bibliography
1. J. Protic, M. Tomasevic, V. Milutinovic, "Distributed Shared Memory: Concepts and Systems," IEEE Parallel & Distributed Technology, vol. 4, no. 2, pp. 63–79, Summer 1996.
2. D. Dai and D. K. Panda, "How Can We Design Better Networks for DSM Systems?" Lecture Notes in Computer Science, vol. 1417, pp. 171–184, Jan. 1998.
3. J. P. G. Sterbenz and J. D. Touch, High-Speed Networking: A Systematic Approach to High-Bandwidth Low-Latency Communication, New York, NY: Wiley and Sons, 2001.
4. W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, San Francisco, CA: Morgan Kaufmann, 2004.
5. D. A. B. Miller, "Rationale and Challenges for Optical Interconnects to Electronic Chips," Proc. IEEE, vol. 88, pp. 728–748, June 2000.
6. R. Luijten, C. Minkenberg, R. Hemenway, M. Sauer, R. Grzybowski, "Viable Opto-Electronic HPC Interconnect Fabrics," in Proc. ACM/IEEE SC|05 Conf. on Supercomputing, Seattle, WA, Nov. 2005.
7. K. Kodi and A. Louri, "Design of a High-Speed Optical Interconnect for Scalable Shared-Memory Multiprocessors," IEEE Micro, vol. 25, no. 1, pp. 41–49, Jan./Feb. 2005.
8. A. Shacham, B. A. Small, O. Liboiron-Ladouceur, K. Bergman, "A Fully Implemented 12x12 Data Vortex Optical Packet Switching Interconnection Network," J. Lightwave Technol., vol. 23, no. 10, pp. 3066–3075, Oct. 2005.
9. R. Nagarajan et al., "Large-Scale Photonic Integrated Circuits," IEEE J. Select. Topics Quantum Electron., vol. 11, no. 1, pp. 50–65, Jan./Feb. 2005.
10. M. Lipson, "Guiding, Modulating and Emitting Light on Silicon – Challenges and Opportunities," J. Lightwave Technol., vol. 23, no. 12, Dec. 2005 (invited).
11. C. Gunn, "CMOS Photonics for High-Speed Interconnects," IEEE Micro, vol. 26, no. 2, pp. 58–66, Mar./Apr. 2006.
12. B. A. Small, T. Kato, K. Bergman, "Dynamic Power Considerations in a Complete 12x12 Optical Packet Switching Fabric," IEEE Photon. Technol. Lett., vol. 17, no. 11, pp. 2472–2474, Nov. 2005.
13. A. Shacham, B. G. Lee, K. Bergman, "A Scalable, Self-Routed, Terabit Capacity, Photonic Interconnection Network," in Proc. 13th Annu. IEEE Symp. on High Performance Interconnects (Hot Interconnects), Stanford, CA, Aug. 2005, pp. 147–150.
14. A. Shacham, B. G. Lee, K. Bergman, "A Wideband, Non-Blocking, 2x2 Switching Node for a SPINet Network," IEEE Photon. Technol. Lett., vol. 17, no. 12, pp. 2742–2744, Dec. 2005.
15. A. Pattavina, Switching Theory – Architecture and Performance in Broadband ATM Networks, West Sussex, UK: Wiley & Sons, 1998.
16. A. Shacham and K. Bergman, "Utilizing Path Diversity in Optical Packet Switched Interconnection Networks," in Proc. Optical Fiber Commun. Conf. (OFC 2006), Anaheim, CA, Mar. 2006, paper OTuN5.
17. D. S. Meliksetian and C. Y. R. Chen, "A Markov-Modulated Bernoulli Process Approximation for the Analysis of Banyan Networks," in Proc. ACM SIGMETRICS, Santa Clara, CA, 1993, pp. 183–194.
18. L. P. Carloni and A. L. Sangiovanni-Vincentelli, "Coping with Latency in SOC Design," IEEE Micro, vol. 22, no. 5, pp. 24–35, Sept./Oct. 2002.
19. The High-End Computing Revitalization Task Force (HECRTF), "Federal Plan for High-End Computing." Available at http://www.nitrd.gov/subcommittee/hec/hecrtf-outreach/.
20. A. Gara, M. A. Blumrich, D. Chen, G. L.-T. Chiu, P. Coteus, M. E. Giampapa, R. A. Haring, P. Heidelberger, D. Hoenicke, G. V. Kopcsay, T. A. Liebsch, M. Ohmacht, B. D. Steinmacher-Burow, T. Takken, and P. Vranas, "Overview of the Blue Gene/L System Architecture," IBM J. Res. Develop., vol. 49, no. 2/3, pp. 195–212, May 2005.
21. Committee on the Future of Supercomputing, "Getting Up to Speed: The Future of Supercomputing." Available at http://www.nap.edu/catalog/11148.html.