
Hybrid On-chip Data Networks - Columbia...

Date post: 30-Mar-2018

Hybrid On-chip Data Networks

Gilbert Hendry

Keren Bergman

Lightwave Research Lab

Columbia University

2

Chip-Scale Interconnection Networks

[Figure: Intel Polaris, IBM Cell, AMD Opteron]

• Chip multi-processors create need for high performance interconnects

• Performance bottleneck of on-chip networks and I/O

• Power dissipation constraints of the chip package

• > 50% of total power comes from interconnects*

* N. Magen et al., “Interconnect-power dissipation in a microprocessor,” SLIP 2004.

3

Motivation

• CMPs of the future = 3D stacking

• Lots of data on chip

• Photonics offers key advantages

4

Why Photonics?

ELECTRONICS:

Buffer, receive and re-transmit at every router.

Each bus lane routed independently. (P ∝ N_LANES)

Off-chip BW is pin-limited and power hungry.

OPTICS:

Modulate/receive high bandwidth data stream once per communication event.

Broadband switch routes entire multi-wavelength stream.

Off-chip BW = On-chip BW for nearly same power.

Photonics changes the rules for Bandwidth, Energy, and Distance.

[Figure: TX/RX placement along the electronic and optical links]

5

Hybrid Network Premise

Optical processing difficult and limited

Source, destination routing inefficient

Use electronics for routing, optics for switching and transmission

Hybrid Circuit-Switching

6

Hybrid Circuit-Switched Networks

Step 1: Path SETUP request

Electronic

SETUP Msg

Source core

Destination Core

7

Hybrid Circuit-Switched Networks

Step 2: Path ACK

Electronic

ACK Msg

8

Hybrid Circuit-Switched Networks

Step 3: Transmit Data

Photonic

Switch Use

Information

9

Hybrid Circuit-Switched Networks

Meanwhile: Path Contention

Path

BLOCKED Msg

(Backoff)

10

Hybrid Circuit-Switched Networks

Step 4: Path TEARDOWN

Electronic

TEARDOWN Msg

Source core

Destination Core
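
The four steps above (SETUP, ACK, photonic transmission, TEARDOWN) plus the BLOCKED/backoff case can be sketched as a small reservation model. This is only an illustration under stated assumptions, not the authors' simulator: the dimension-ordered `xy_route` policy, the `HybridNetwork` class, and the rule that a photonic switch holds at most one circuit at a time are all hypothetical simplifications.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Dimension-ordered path on a 2D grid (assumed routing policy)."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


class HybridNetwork:
    """Electronic control plane that reserves photonic switches per circuit."""

    def __init__(self) -> None:
        # Maps each reserved switch to the id of the circuit holding it.
        self.reserved: Dict[Node, int] = {}
        self.next_id = 0

    def setup(self, src: Node, dst: Node) -> Optional[int]:
        """Steps 1-2: forward SETUP hop by hop; return a circuit id (ACK)
        or None (BLOCKED), releasing any partially reserved path."""
        claimed: List[Node] = []
        cid = self.next_id
        for node in xy_route(src, dst):
            if node in self.reserved:
                for n in claimed:  # BLOCKED: undo the partial reservation
                    del self.reserved[n]
                return None
            self.reserved[node] = cid
            claimed.append(node)
        self.next_id += 1
        return cid

    def teardown(self, cid: int) -> None:
        """Step 4: release every switch held by the circuit."""
        for node in [n for n, c in self.reserved.items() if c == cid]:
            del self.reserved[node]


net = HybridNetwork()
first = net.setup((0, 0), (2, 0))    # reserves (0,0), (1,0), (2,0)
second = net.setup((1, 1), (1, 0))   # contends for (1,0), so it is blocked
print(first, second)
net.teardown(first)                  # after teardown the sender may retry
print(net.setup((1, 1), (1, 0)))
```

In this sketch a blocked sender simply retries later; the backoff delay itself is not modeled.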

11

Hybrid Circuit-Switched Networks

Pros:

• Energy-efficient end-to-end transmission

• High bandwidth through WDM

• Electronic network still available for small control messages*

• Network-level support for secure regions

Cons:

• Path setup latency

• Path setup contention (no fairness)

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]

Programming and Communication

13

Shared Memory

[Figure: spectrum from Implicit Communication to Explicit Communication, labeled "scaling"]

“… [OpenMP on large systems] often performs worse than message passing due to a combination of false sharing, coherence traffic, contention, and system issues that arise from the difference in scheduling and network interface moderation”

~ Exascale Report

14

Partitioned Global Address Space

[Figure: spectrum from Implicit Communication to Explicit Communication]

[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

Access          Method

Local Read      Optical receive

Local Write     Optical send

Remote Read     Electronic request, optical receive

Remote Write    Optical send

Shared R/W      ?

15

Message Passing

[Figure: spectrum from Implicit Communication to Explicit Communication]

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]

• Complex, dynamic access patterns

• Relatively larger blocks of data

• Scientific computing

16

Streaming

[Figure: spectrum from Implicit Communication to Explicit Communication; stages 1 to 4 between Input Data and Output Data]

Persistent optical circuits

• Embedded / specialized systems (Graphics, Image + Signal Proc.)

• Execution mode of general-purpose systems (Cell Processor)

Electronic Plane

18

Electronic Router

[Figure: electronic router block diagram. Labels: Arbiter, Control Router, Data Switch, Buffer, Crossbar, Buffer Cntrl, Data Path, Xbar Cntrl, Request Bus, Flow Control, Xbar Allocation, Data Switch Allocation, Routing Logic, Credits In, Ring Cntrl]

• Low frequency operation (~ 1GHz)

• 1 VC (typically)

• Small buffers (64-28)

• Narrow Channels (8-32)

19

Network Gateway

[Figure: network gateway shared by four cores. Labels: Core (×4), Tx/Rx, Network IF, Bidirectional Waveguide, Bidirectional Electronic Channel, Control Router, Electronic Crossbar, 5-port photonic switch, To/From Control plane, To/From Data plane, Serialization, Drivers, Deserialization, Receivers]

[P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]

External Concentration

The Photonic Plane

21

Silicon Photonic Waveguide Technology

[Vlasov and McNab, Optics Express 12 (8) 1622 (2004)]

Low-loss (1.7 dB/cm), high-bandwidth (> 200 nm) silicon photonic waveguides can be fabricated in commercial CMOS process.

[B. G. Lee et al., Photon. Technol. Lett. 20 (10) 767 (2008)]

1.28 Tb/s Data Transmission Experiment (occupies small slice of available WG BW)

[Figure: signal traces for channels C23 (1559 nm), C28 (1555 nm), C46 (1541 nm), and C51 (1537 nm), before injection into waveguide and after 5-cm waveguide and EDFA; time scale 100 ps]

Silicon photonic waveguides provide low-power optical interconnects in CMOS-compatible platform.

22

Silicon Photonic Modulator and Detector Technology

[M Watts, Group Four Photonics (2008)]

[M Lipson, Optics Express (2007)]

85 fJ/bit demonstrated at 10 Gb/s

Scalable to < 25 fJ/bit

18 Gb/s demonstrated

[S Koester, J. Lightw. Technol. (2007)]

Ge-on-Si Detectors:

40-GHz bandwidths

1 A/W responsivities

Receivers (detectors w/ CMOS amplifiers):

1.1 pJ/bit demonstrated at 10 Gb/s

Scalable to < 50 fJ/bit

[Figure: optical link of (CW) laser, modulator, and detector]

23

Silicon Photonic Micro-Ring Switch Explanation

[Figure: micro-ring switch with ports in0, in1, out0, out1, and a plot of transmission (in i to out i). Labels: bar state, cross state; no current, on-resonance; current, off-resonance]

Fast control of resonance wavelength via carrier injection

24

Higher Order Switch Designs

25

On-Chip Topology Exploration

• Photonic Torus • Nonblocking Photonic Torus

[A. Shacham et al., Trans. on Comput., 2008] [M. Petracca et al. IEEE Micro, 2008]

26

On-Chip Topology Exploration

• TorusNX • Square Root

[J. Chan et al. JLT, May 2010]

27

Photonic Plane Characteristics

• Insertion Loss

• Noise

• Power

28

Insertion Loss and Optical Power Budget

[Figure: optical power budget, bounded by Nonlinear Effects and Detector Sensitivity, shared between Worst-case Insertion Loss and the WDM Factor]
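
The budget diagram on this slide can be read as a simple dB inequality: worst-case insertion loss plus the WDM factor must fit between the nonlinear-effects limit and the detector sensitivity. A minimal sketch, assuming the WDM factor is 10·log10(N) for N wavelengths sharing the injected power; the 20 dBm limit and -20 dBm sensitivity below are hypothetical placeholders, not values from the slides.

```python
import math


def max_wavelengths(budget_db: float, worst_case_loss_db: float) -> int:
    """Largest N such that worst_case_loss_db + 10*log10(N) <= budget_db."""
    margin_db = budget_db - worst_case_loss_db
    if margin_db < 0:
        # Even a single wavelength would arrive below detector sensitivity.
        return 0
    return math.floor(10 ** (margin_db / 10))


# Hypothetical budget: nonlinear limit (dBm) minus detector sensitivity (dBm).
nonlinear_limit_dbm = 20.0
detector_sensitivity_dbm = -20.0
budget_db = nonlinear_limit_dbm - detector_sensitivity_dbm

for loss_db in (20.0, 30.0, 40.0):
    print(f"loss {loss_db:.1f} dB -> {max_wavelengths(budget_db, loss_db)} wavelengths")
```

Under this model every additional 3 dB of insertion loss roughly halves the number of usable wavelengths, which is why insertion loss and bandwidth are treated together.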

29

Insertion Loss vs. Bandwidth

[Figure labels: Network Size, Number of λ, Topologies]

30

Simulation Results

[Figure: four bar charts of insertion loss (dB, axis 0 to 50) versus topology size (nodes), one per topology. Each bar is broken down into Propagation, Crossing, and Dropping Into a Ring. The bar labels are tabulated below.]

Insertion loss (dB)

Size     Torus   Non-Blocking Torus   TorusNX   Square Root

4×4      20.6    18.7                 15.8      12.2

6×6      25.6    25.3                 19.5      -

8×8      31.2    31.5                 23.2      21.5

10×10    37.0    38.0                 27.1      -

12×12    42.8    44.1                 31.0      -

14×14    48.6    50.6                 34.9      -

16×16    54.5    56.8                 38.8      30.6

18×18    60.3    63.2                 42.7      -

31

Simulation Results

[Figure: four plots of Number of Wavelength Channels (log scale) versus Number of Access Points, one each for the Torus, Non-Blocking Torus, TorusNX, and Square Root topologies, annotated with the optical power budget]

Original is based on the IL results from previous slide, Improved is based on a hypothetical improvement in crossing loss from 0.15 dB to 0.05 dB.

32

Photonic Plane Characteristics

• Insertion Loss

• Noise

• Power

33

Noise and Crosstalk

Laser Noise

Inter-Message Crosstalk

Intra-Message Crosstalk

Modulation Noise

Crosstalk

Filter

Coherent noise

Incoherent noise

34

Effects of Noise

[Figure labels: Optical SNR versus Network Size, Number of λ, and Network Load]

35

Simulation Results

[Figure: optical SNR (dB, axis 0 to 50) versus message size (bit, 10^0 to 10^7) for the Torus, Non-blocking Torus, TorusNX, and Square Root topologies]

The line at OSNR = 16.9 dB is where a bit-error rate of 10^-12 can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling.
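
The 16.9 dB threshold can be sanity-checked with a textbook model. The sketch below assumes BER = Q(√SNR) for ideal detection of binary orthogonal signals in Gaussian noise and treats the OSNR as that SNR; the slide does not state which formula the authors used, so this is an assumed model and `ber_orthogonal_binary` is a hypothetical helper name.

```python
import math


def ber_orthogonal_binary(osnr_db: float) -> float:
    """BER = Q(sqrt(SNR)) with Q(x) = 0.5 * erfc(x / sqrt(2)) (assumed model)."""
    snr = 10 ** (osnr_db / 10)  # convert dB to a linear power ratio
    return 0.5 * math.erfc(math.sqrt(snr / 2))


for osnr_db in (14.0, 16.9, 20.0):
    print(f"OSNR {osnr_db:.1f} dB -> BER {ber_orthogonal_binary(osnr_db):.2e}")
```

Under this assumption an OSNR of 16.9 dB gives a bit-error rate on the order of 10^-12, consistent with the threshold line in the plot.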

Results

• Results are plotted for a network size of 8×8 at saturation, measured at the detectors.

• Maximum OSNR = ~45 dB (due to laser noise)

• Minimum OSNR < 17 dB (due to message-to-message crosstalk)

• Variations between networks are due to the varying likelihood of two messages intersecting on the network topology.

System Performance

• SNR measures the likelihood of error-free transmission.

• Lower-SNR designs will require additional retransmissions, resulting in lower throughput.

36

Photonic Plane Characteristics

• Insertion Loss

• Noise

• Power

37

Power Usage

[Figure: ring tuning. Electronic Control: 0V / 1V across the p-region and n-region. Thermal Control: 0V / 1V on an Ohmic Heater. Transmission plot: Injected Wavelengths against the Off-resonance profile and On-resonance profile]

• Laser Power

• Active Power
  • Modulating
  • Detecting
  • Broadband

• Static Power
  • Thermal tuning

• Tx/Rx Power
  • Drivers
  • TIAs

38

Energy Per Bit

[Figure: energy per bit (J/bit, log scale from 10^-13 to 10^-7) versus message size (bit, 10^0 to 10^7) for the Torus, Non-blocking Torus, TorusNX, and Square Root topologies]

39

Power Breakdown

Component            Torus Topology      Nonblocking Torus Topology

Router Logic         43%                 45%

Router Buffer        44%                 44%

Electronic Wire      3%                  2%

Detector             3%                  2%

Modulator            4%                  4%

PSE                  2%                  2%

Thermal              1%                  1%

Wavelengths          7 @ 10 Gbps each    12 @ 10 Gbps each

Power Dissipation    1.59 W              4.31 W

• Results are based on randomly generated traffic with message sizes of 100 kbit, with the network in saturation.

• Data was collected on 64-node topologies constrained to a total surface area of 2 cm × 2 cm.

40

Power Breakdown

Component            Square Root Topology    TorusNX Topology

Router Logic         37%                     34%

Router Buffer        31%                     31%

Electronic Wire      1%                      7%

Detector             10%                     8%

Modulator            17%                     14%

PSE                  1%                      2%

Thermal              3%                      4%

Wavelengths          38 @ 10 Gbps each       27 @ 10 Gbps each

Power Dissipation    3.22 W                  1.89 W
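
The wavelength counts on the two Power Breakdown slides translate directly into the line rate of a single optical path, since each wavelength carries 10 Gbps. A small arithmetic sketch follows; the pairing of counts and power figures with topologies follows the order in which the slides list them, which is an assumption, and `path_rate_gbps` is a hypothetical helper.

```python
GBPS_PER_WAVELENGTH = 10  # stated on the slides: 10 Gbps per wavelength

# (wavelengths, power dissipation in W), paired in slide order (assumed).
configs = {
    "Torus": (7, 1.59),
    "Nonblocking Torus": (12, 4.31),
    "Square Root": (38, 3.22),
    "TorusNX": (27, 1.89),
}


def path_rate_gbps(wavelengths: int) -> int:
    """Line rate of one WDM path when every wavelength is modulated."""
    return wavelengths * GBPS_PER_WAVELENGTH


for name, (wavelengths, watts) in configs.items():
    print(f"{name}: {path_rate_gbps(wavelengths)} Gbps per path, {watts} W total")
```

This is the per-path rate only; it says nothing about delivered throughput or energy per bit, which depend on traffic and path setup overhead.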

41

Performance

Other Interesting Issues

43

Memory Access

Processor Core

Network Router

Memory Access Point

[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

44

Other Arbitration Means - TDM

[G. Hendry et al. Silicon Nanophotonic Network-On-Chip Using TDM Arbitration. In HOTI, Aug. 2010]

45

Wavelength Granularity

• Original

• Re-design

[Figure: original and re-designed layouts, each labeled λ]

Scalable number of WDM channels

46

Conclusion

• Some applications / programming models are well suited to a circuit-switched photonic network

• Interesting tradeoffs and design space

