Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07

Date post: 07-Jul-2020
Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments
Part IV: 3D WiNoC Architectures
Hiroki Matsutani, Keio University, Japan
Tutorial at DATE'14, Mar 24th, 2014

Outline: 3D WiNoC Architectures

• 3D IC technologies: Wired vs. Wireless [5min]
• Prototype systems: Cube-0 & Cube-1 [15min]
• Wireless 3D NoC architectures [15min]
  – Ring-based 3D WiNoC
  – Irregular 3D WiNoC
• Experiment results and Summary [10min]

So far we have focused on 2D WiNoC architecture and its physical link design. This part explores 3D WiNoC architectures, especially the inductive-coupling 3D option.

Design cost of LSI is increasing

• System-on-Chip (SoC)
  – Required components are integrated on a single chip
  – A different LSI must be developed for each application
• System-in-Package (SiP) or 3D IC
  – Required components are stacked for each application

By changing the chips in a package, we can provide a wider range of chip families at modest design cost. The next slides show techniques for stacking multiple chips.

3D IC technology for going vertical

[Figure: 3D IC technologies arranged along two axes, Scalability and Flexibility]

• Two chips (face-to-face)
  – Wired: Microbump
  – Wireless: Capacitive coupling
• More than three chips
  – Wired: Through silicon via
  – Wireless: Inductive coupling

Inductive coupling link for 3D ICs

• Stacking after chip fabrication
• Only known-good dies are selected
• More than 3 chips
• Bonding wires for power supply
• Inductor for transceiver: implemented as a square coil with metal in common CMOS
• Footprint of inductor: not a serious problem, since only metal layers are occupied

We have developed some prototype systems of wireless 3D ICs using inductive coupling.

Note: This part focuses on inter-chip wireless, not the intra-chip wireless introduced in Parts II and III.


An example: MuCCRA-Cube (2008)

• 4 MuCCRA chips are stacked on a PCB board

[Figure: MuCCRA chip layout with 16 PEs, data memory, an inductive-coupling up link, and an inductive-coupling down link; dimensions shown: 5.0mm and 2.5mm]

Technology: 90nm
Chip thickness: 85um, Glue: 10um
[Saito, FPL’09]

Stacking method: Staircase stacking

• Inductor has TX/RX/Idle modes (mode change takes 1 cycle)

[Figure: chips slid and stacked in a staircase, with a pillar, bonding wires, and TX and RX inductors]

Stacking method: Staircase stacking

• Inductive-coupling link
  – Local clock @ 4GHz
  – Serial data
• System clock for NoC: 200MHz
• Local clock is shared by neighboring chips; no global synchronization
• 35-bit transfer for each clock

[Figure: staircase-stacked chips with a pillar, TX and RX inductors, and TxData/TxClk signals between neighboring chips]

We have fabricated some prototype multi-core systems using this wireless technology.

An example: Cube-0 (2010)

• Test chip for vertical communication schemes
  – Vertical point-to-point link between adjacent chips
  – Vertical shared bus (broadcast)
• Each chip has
  – 2 cores (packet counter)
  – 2 routers
  – Inductors (P2P ring)
  – Inductors (vertical bus)

[Figure: Cube-0 chip layout (2.1mm x 2.1mm) showing Core 0 & 1, Router 0 & 1, inductors (bus), and inductors (P2P)]

Process: Fujitsu 65nm (CS202SZ)
Voltage: 1.2V
System clock: 200MHz
[Matsutani, NOCS’11]

An example: Cube-0 (2010)

• Stacking for Ring network: slide & stack, with TX and RX inductors

[Matsutani, NOCS’11]

An example: Cube-0 (2010)

• Stacking for Vertical bus: TX/RX inductors

[Matsutani, NOCS’11]

An example: Cube-1 (2012)

• Test chips for building-block 3D systems
  – Two chip types: Host CPU chip & Accelerator chip
  – We can customize the number & types of chips in a SiP
• Cube-1 Host CPU chip
  – Two 3D wireless routers
  – MIPS-like CPU
• Cube-1 Accelerator chip
  – Two 3D wireless routers
  – Processing element array

[Figure: chip layouts showing the MIPS CPU core, the 8x8 PE array, and the inductors]

[Miura, IEEE Micro 13]

An example: Cube-1 (2012)

• Microphotographs of test chips

[Figure: microphotographs of the Host CPU chip, the Accelerator chip, and a stack of Host CPU + 3 Accelerators]

[Miura, IEEE Micro 13]

An example: Cube-1 (2012)

• Block diagram of CPU & Accelerator chips

An example: Cube-1 (2012)

• Inductive-coupling ThruChip Interface (TCI)

Note: Please refer to Part III for antenna design for on-chip wireless.


An example: Cube-1 (2012)

• Cube-1 demo system
  – PE array chip performs image processing
  – CPU chip for control

[Figure: photograph of Cube-1 on the motherboard]

[Miura, HotChips’13 Demo]


Big picture: Wireless 3D NoC

• Arbitrary chips are stacked after fabrication
  – Each chip has vertical links at pre-specified locations, but we do not know the internal topology of each chip
  – A wireless 3D NoC is required to stack unknown topologies

[Figure: an example with 4 chips, including a CPU chip from C, a memory chip from A, and a GPU chip from B; required chips are stacked for each application]

Note: We can add long-range links to induce small-world effects [See Part I].

Two approaches: Wireless 3D NoC arch

Chips should be added, removed, and swapped for each application.

• Ring-based approach [Matsutani, NOCS’11]
  – Easy to add & remove (Good)
  – Inefficient hop count (Bad)
  – No scalability (Bad)
• Irregular approach [Matsutani, ASPDAC’13]
  – We can use any links
  – Irregular routing needed
  – Plug-and-play protocol

[Figure: two stacks of Chip 0 to Chip 4, one for each approach]

Ring-based 3D wireless NoC

• Chips are connected via unidirectional rings

[Figure: staircase-stacked chips with a pillar; TX and RX inductors on each chip connect to a router that attaches to the horizontal NoC]
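The hop-count cost of a unidirectional ring can be illustrated with a small calculation. This is only a sketch: the function name `ring_average_hops` is hypothetical, and it assumes every ring node sends to every other node with equal probability, which the slides do not state.

```python
def ring_average_hops(n: int) -> float:
    """Average hop count between distinct nodes on a unidirectional ring of n nodes."""
    # On a unidirectional ring, a packet from src to dst travels (dst - src) mod n hops.
    total = sum((dst - src) % n
                for src in range(n)
                for dst in range(n)
                if src != dst)
    return total / (n * (n - 1))


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, ring_average_hops(n))
```

Under this assumption the average grows linearly with the number of ring nodes (n/2), which is one way to read the "inefficient hop count" and "no scalability" remarks about the ring-based approach.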

Ring approach: Deadlock problem

• Ring inherently includes a cyclic dependency

[Figure: ring of routers, each with a buffer, TX and RX inductors, and a port to the horizontal NoC]


Ring approach: Deadlock problem

• When every buffer in the ring is occupied, each packet waits for the next buffer and cannot move
• No packet can advance, so deadlock avoidance is needed

Ring approach: Deadlock avoidance

• Adding extra VCs
  – Conventional way
  – Duplicating buffers
  – 2 VCs for each message class
• Bubble flow control [Puente, ICPP’99]
  – Buffer space for a single packet must always be reserved in each router
  – All message classes share the same buffers

[Figure: ring in which a cyclic dependency is formed]

Ring approach: Deadlock avoidance

• Adding extra VCs: the cyclic dependency is cut at the dateline
  – Two VCs (VC0 and VC1) are used; packets switch VCs when crossing the dateline
  – 2 VCs are required for each message class, and a multi-core system uses multiple classes

[Puente, ICPP’99]
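The dateline idea can be sketched as a per-hop VC selection rule. This is a minimal illustration, not the routers' actual logic: the names `next_vc` and `vcs_along_path`, the choice of node 0 as the dateline, and the convention that packets are injected on VC0 and move to VC1 on the hop that enters the dateline node are all assumptions made here.

```python
def next_vc(current_vc: int, node: int, ring_size: int, dateline: int = 0) -> int:
    """Return the VC used on the link leaving `node` on a unidirectional ring."""
    next_node = (node + 1) % ring_size
    # The hop that enters the dateline node crosses the dateline: switch to VC1.
    if next_node == dateline:
        return 1
    return current_vc


def vcs_along_path(src: int, dst: int, ring_size: int, dateline: int = 0) -> list:
    """List the VC used on each hop from src to dst (assumed injection on VC0)."""
    vc, node, used = 0, src, []
    while node != dst:
        vc = next_vc(vc, node, ring_size, dateline)
        used.append(vc)
        node = (node + 1) % ring_size
    return used


if __name__ == "__main__":
    print(vcs_along_path(2, 1, 4))  # path 2 -> 3 -> 0 -> 1 crosses the dateline
    print(vcs_along_path(0, 3, 4))  # path 0 -> 1 -> 2 -> 3 never crosses it
```

Because a packet crosses the dateline at most once, neither VC0 nor VC1 is used around the full ring, so the channel dependency is no longer cyclic. The cost is the one noted on the slide: the buffers are duplicated for every message class.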

Ring approach: Deadlock avoidance

• Bubble flow control: deadlock occurs because all buffers are occupied

[Puente, ICPP’99]

Ring approach: Deadlock avoidance

• Bubble flow control: deadlock does not occur unless all buffers are occupied
  – Empty space for one packet is always kept

We employ bubble flow control for CMPs with multiple message classes. [Puente, ICPP’99]
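The bubble rule can be sketched as two admission checks on a ring buffer. This is a simplified, local reading of bubble flow control, assuming buffers are counted in whole packets; the class `RingBuffer` and the functions `can_forward` and `can_inject` are hypothetical names, not taken from the Cube-0 routers.

```python
from dataclasses import dataclass


@dataclass
class RingBuffer:
    capacity: int      # buffer size, counted in packets (assumed unit)
    occupied: int = 0  # packets currently stored

    @property
    def free(self) -> int:
        return self.capacity - self.occupied


def can_forward(target: RingBuffer) -> bool:
    """A packet already travelling on the ring may advance if one packet slot is free."""
    return target.free >= 1


def can_inject(target: RingBuffer) -> bool:
    """A new packet may enter the ring only if one free slot (the bubble) remains afterwards."""
    return target.free >= 2


if __name__ == "__main__":
    buf = RingBuffer(capacity=2, occupied=1)
    print(can_forward(buf), can_inject(buf))
```

Since injection never consumes the last free slot, the ring as a whole cannot become completely full, so at least one packet can always advance. No per-class VC duplication is needed, which matches the slide's point that all message classes share the same buffers.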


Two approaches: Wireless 3D NoC arch

Chips should be added, removed, and swapped for each application.

• Ring-based approach [Matsutani, NOCS’11]
  – Easy to add & remove (Good)
  – Inefficient hop count (Bad)
  – No scalability (Bad)
• Irregular approach [Matsutani, ASPDAC’13] (Good)
  – We can use any links
  – Irregular routing needed
  – Plug-and-play protocol

Irregular approach: Ad-hoc topology

• Wireless 3D CMPs
  – Various chips are stacked, depending on the application
• Each chip
  – Must have vertical links
  – May not have horizontal links
  – May have VCs for horizontal links
• Ad-hoc wireless 3D NoC
  – We cannot predict the network topology, number of VCs, or bandwidth before stacking

[Figure: stacked chips labeled Chip 0, Chip 1, Chip 2, and Chip 7]


Irregular approach: Ad-hoc topology

• A chip in the stack may have no horizontal link

Irregular approach: Ad-hoc topology

• Extreme case: only the bottom chip has horizontal links
• We need a mechanism to route packets even in such cases

Irregular approach: Up*/down* routing

• Up*/down* (UD) routing [Schroeder, JSAC’91]
  – Irregular network routing
  – A root node is selected
  – Packets go up and then go down
• An example
  – 4x4 2D mesh
  – A root node is selected

[Figure: 4x4 mesh with nodes 0 to 15; node 0 is the root]

Note: Please refer to Part II for routing strategy for irregular WiNoCs.

Irregular approach: Up*/down* routing

• An example (4x4 2D mesh, continued)
  – Direction (up or down) is determined

[Figure: 4x4 mesh with node 0 as the root and the up direction marked] [Schroeder, JSAC’91]

Irregular approach: Up*/down* routing

• An example (4x4 2D mesh, continued)
  – Routing path is generated
  – Down-up turn is prohibited
  – It generates imbalanced paths

[Figure: 4x4 mesh with node 0 as the root, showing an allowed path (OK) and a prohibited path (NG)] [Schroeder, JSAC’91]
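The 4x4 mesh example can be sketched in a few lines. This is a minimal illustration under stated assumptions: the up direction is derived from breadth-first levels from the root with ties broken by node ID (a common construction for up*/down* routing), and the helper names `mesh_links`, `bfs_levels`, `is_up`, and `is_legal` are hypothetical.

```python
from collections import deque


def mesh_links(width, height):
    """Adjacency list of a width x height 2D mesh, nodes numbered row by row."""
    adj = {n: [] for n in range(width * height)}
    for n in range(width * height):
        x, y = n % width, n // width
        if x + 1 < width:
            adj[n].append(n + 1)
            adj[n + 1].append(n)
        if y + 1 < height:
            adj[n].append(n + width)
            adj[n + width].append(n)
    return adj


def bfs_levels(adj, root):
    """Distance of every node from the root (its level in the BFS spanning tree)."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(level, u, v):
    """True if the hop u -> v goes toward the root (ties broken by node ID)."""
    return (level[v], v) < (level[u], u)


def is_legal(level, path):
    """A path is legal if it never takes an up hop after a down hop."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(level, u, v):
            if gone_down:
                return False  # down-up turn is prohibited
        else:
            gone_down = True
    return True


if __name__ == "__main__":
    level = bfs_levels(mesh_links(4, 4), root=0)
    print(is_legal(level, [5, 1, 2]))  # up, then down
    print(is_legal(level, [4, 5, 1]))  # down, then up
```

With node 0 as the root, the path 5 → 1 → 2 goes up and then down and is allowed, while 4 → 5 → 1 takes a down-up turn and is rejected. Rejected turns force some traffic onto detours near the root, which is the source of the imbalanced paths mentioned above.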

Irregular approach: Up*/down* routing

• Another example
  – 3D NoC with 4 chips

[Figure: Chip 0 to Chip 3 stacked, with nodes 0 to 7 and a root node] [Schroeder, JSAC’91]


Irregular approach: Up*/down* routing

• Another example (3D NoC with 4 chips, continued)
  – Paths are marked OK or NG; as before, a down-up turn is prohibited
  – The best spanning-tree root is selected by exhaustive or heuristic search using communication traces (9sec for 64-tile)

[Schroeder, JSAC’91]

Irregular approach: UD with VCs

• UD routing with multiple VCs [Koibuchi, ICPP’03] [Lysne, TPDS’06]
  – Each layer (VC) has its own spanning tree
  – Packets can transit multiple layers in descending order

[Figure: the 4-chip example (nodes 0 to 7) drawn twice, once per layer (VC1 and VC0), with spanning-tree roots Root and Root’; a path that is OK in either layer can be used: "You can use either VC0 or VC1"]

How do we recognize the topology & build multiple spanning trees?
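The layered scheme can be sketched as a per-hop VC assignment. This is an illustration only, assuming each layer's up direction comes from breadth-first levels from that layer's root and that a packet starts in the highest-numbered layer and drops to a lower one whenever the next hop would be a down-up turn; `bfs_levels` and `assign_vcs` are hypothetical names, and the cited papers may construct and use the layers differently.

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every node from the root (its level in the BFS spanning tree)."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def assign_vcs(levels_by_vc, path):
    """Return one VC index per hop, or None if the path cannot be made legal.

    levels_by_vc[vc] maps node -> level in the spanning tree of layer `vc`.
    Layers are visited in descending order only.
    """
    vc = len(levels_by_vc) - 1
    gone_down = False
    vcs = []
    for u, v in zip(path, path[1:]):
        while True:
            level = levels_by_vc[vc]
            up = (level[v], v) < (level[u], u)
            if up and gone_down:
                # Down-up turn in this layer: continue in the next lower layer.
                if vc == 0:
                    return None
                vc -= 1
                gone_down = False
                continue
            gone_down = gone_down or not up
            vcs.append(vc)
            break
    return vcs


if __name__ == "__main__":
    # Hypothetical 4-node example: a 2x2 mesh with links 0-1, 0-2, 1-3, 2-3.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    layers = [bfs_levels(adj, 3), bfs_levels(adj, 0)]  # VC0 rooted at 3, VC1 rooted at 0
    print(assign_vcs(layers, [1, 3, 2]))
    print(assign_vcs(layers[1:], [1, 3, 2]))
```

In this toy graph the path 1 → 3 → 2 is illegal with a single tree rooted at node 0, but becomes usable when the second hop moves to the layer rooted at node 3. Restricting transitions to descending order keeps the layers from forming a cycle among themselves.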


Full-system CMP simulations

• Application performance of the two approaches is evaluated
  – Ring-based approach [Matsutani, NOCS’11]
  – Irregular approach

Network topology: Irregular

• The following iteration is performed 1,000 times
  – Each tile has a router and a core (e.g., processor or caches)
  – Each horizontal link appears with a probability of 50%
• We examined three cases: 16, 32, and 64 tiles
  – 16-tile (2,2,4): 2x2 mesh * 4 chips
  – 32-tile (4,2,4): 4x2 mesh * 4 chips
  – 64-tile (4,4,4): 4x4 mesh * 4 chips

Network topology: Irregular

• Among the 1,000 random topologies, the one with the most typical hop count value is selected for the full-system evaluation
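The topology generation and selection can be sketched as follows. Several points are assumptions made for illustration and are not stated on the slides: every tile has a vertical link to the tile directly above it, hop count is measured as the average shortest-path length, disconnected samples are skipped, and "most typical" is read as closest to the median. The function names are hypothetical.

```python
import random
import statistics
from collections import deque
from itertools import product


def random_topology(nx, ny, nz, p_horizontal=0.5, rng=random):
    """Stack nz chips of nx x ny tiles; vertical links always exist (assumption)."""
    nodes = list(product(range(nx), range(ny), range(nz)))
    adj = {n: set() for n in nodes}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for x, y, z in nodes:
        if z + 1 < nz:
            link((x, y, z), (x, y, z + 1))
        # Each horizontal link appears with probability p_horizontal.
        if x + 1 < nx and rng.random() < p_horizontal:
            link((x, y, z), (x + 1, y, z))
        if y + 1 < ny and rng.random() < p_horizontal:
            link((x, y, z), (x, y + 1, z))
    return adj


def average_hops(adj):
    """Average shortest-path hop count over all node pairs, or None if disconnected."""
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            return None
        total += sum(dist.values())
    return total / (len(adj) * (len(adj) - 1))


def pick_typical(nx, ny, nz, trials=1000, seed=0):
    """Return (average hops, topology) of the sample closest to the median hop count."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        adj = random_topology(nx, ny, nz, rng=rng)
        hops = average_hops(adj)
        if hops is not None:
            samples.append((hops, adj))
    median = statistics.median(h for h, _ in samples)
    return min(samples, key=lambda s: abs(s[0] - median))


if __name__ == "__main__":
    hops, topology = pick_typical(2, 2, 4, trials=100)
    print(f"16-tile typical average hop count: {hops:.2f}")
```

Calling `pick_typical(4, 4, 4)` corresponds to the 64-tile case. The evaluated system routes with up*/down*, so its actual hop counts may differ from the shortest-path figure used in this sketch.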

Parallel programs are running on it

• GEMS/Simics is used for full-system simulations of the two approaches
  – Ring-based approach [Matsutani, NOCS’11]
  – Irregular approach [Matsutani, ASPDAC’13]

Parallel programs are running on it

Table 1: Topologies to be examined

          Routers  CPUs  L2$ banks  MCs
16-tile   16       4     32         4
32-tile   32       8     64         8
64-tile   64       8     128        16

Table 2: Simulation parameters

L1$ size & latency      64K / 1 cycle
L2$ size & latency      256K / 6 cycles
Memory size & latency   4G / 160 cycles
Router latency          [RC/VSA] [ST] [LT]
Router buffer size      5 flits per VC
Protocol                MOESI directory

Table 3: Application programs

Solaris 9 is running on 8-core UltraSPARC
NPB (IS, DC, CG, MG, EP, LU, UA, SP, BT, FT)

GEMS/Simics is used for full-system simulations.

Application exec time: 16-tile

• Ring-based approach (VC flow & Bubble flow controls)
• Irregular approach
• Irregular approach outperforms the ring-based one by 10.8% in the 16-tile case

Application exec time: 64-tile

• Ring-based approach (VC flow & Bubble flow controls)
• Irregular approach
• Irregular approach outperforms the ring-based one by 46.0% in the 64-tile case
• The ring has no scalability; the irregular approach improves significantly

Application exec time: 16-tile

• Irregular (50% of horizontal links are implemented)
• 3D mesh (all horizontal links are implemented)
• Performance of the Irregular approach, Irr3(min), is close to that of 3D mesh

Application exec time: 64-tile

• Irregular (50% of horizontal links are implemented)
• 3D mesh (all horizontal links are implemented)
• Performance of the Irregular approach, Irr3(min), is close to that of 3D mesh
• Optimized Irr3(min) improves by 15.1% compared to the worst

Experiment results: Cube-1 (2012)

• Cube-1 demo system
  – PE array chip performs image processing
  – CPU chip for control

[Figure: photograph of Cube-1 on the motherboard]

[Miura, HotChips’13 Demo]


Experiment results: Cube-1 (2012)

• Packet error rate: error-free operation at nominal supply voltage
• Power consumption: 5.8mW per 2Gbps channel (at 0.92V)

[Miura, IEEE Micro 13]
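The power figure can be restated as energy per transferred bit. This is simple arithmetic on the two numbers above, assuming the channel runs continuously at its full 2Gbps rate; the variable names are illustrative.

```python
power_w = 5.8e-3   # 5.8 mW per channel (from the slide)
rate_bps = 2e9     # 2 Gbps per channel (from the slide)

# Energy per bit = power / data rate, converted from joules to picojoules.
energy_per_bit_pj = power_w / rate_bps * 1e12

print(f"{energy_per_bit_pj:.1f} pJ/bit")  # prints "2.9 pJ/bit"
```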

Summary: 3D WiNoC Architectures

• Inductive-coupling 3D SiP
  – A low-cost alternative for building low-volume custom systems by stacking off-the-shelf known-good dies
  – No special process technology is required; inductors are implemented with metal layers
• Cube-1: A practical 3D WiNoC system
  – Two types: Host CPU chip & Accelerator chips
  – We can customize the number & types of chips in a SiP

Future plans: 3D WiNoC Architectures

Axis: Steady to Challenging

• Cube-2 design
  – STM 28nm process
  – CPU & Accelerator chips
  – Memcached accelerator chip for smart sensors
• All-in-one TX/RX macro
  – Coil + Routers/buffers
  – Coil uses only metal layers; silicon available
• Power/heat management
  – Dynamic on/off control of inductors
  – Closed-loop control [Elfadel, DATE’13] (See Part I)
• Combine 2D & 3D WiNoC
  – mm-wave wireless for intra-chip (See Parts II and III)

Future plans: 3D WiNoC Architectures

Axis: Steady to Challenging

• Jumbo inductors
  – Jumbo inductor like a power ring
  – <1mm communication
  – Relax SiP assembly
• Wireless broadcast bus
  – Combine wireless vertical bus & P2P links
  – Static/dynamic TDMA vs. CSMA/CD
• Cartridge style computer
  – Insert necessary chips
  – Power/clk from cartridge
  – Inter-chip data transfers use wireless
  – Power from wireless ??
• Exploiting small-world
  – Add a random NoC chip to 3D WiNoC SiP to shorten path length (See Part I)

References (1/2)

• Cube-0: The first real 3D WiNoC
  – H. Matsutani, et al., "A Vertical Bubble Flow Network using Inductive-Coupling for 3-D CMPs", NOCS 2011.
  – Y. Take, et al., "3D NoC with Inductive-Coupling Links for Building-Block SiPs", IEEE Trans. on Computers (2014).
• Cube-1: The heterogeneous 3D WiNoC
  – N. Miura, et al., "A Scalable 3D Heterogeneous Multicore with an Inductive ThruChip Interface", IEEE Micro (2013).
• MuCCRA-Cube: Dynamically reconfigurable processor
  – S. Saito, et al., "MuCCRA-Cube: a 3D Dynamically Reconfigurable Processor with Inductive-Coupling Link", FPL 2009.

References (2/2)

• Vertical bubble flow control on Cube-0
  – H. Matsutani, et al., "A Vertical Bubble Flow Network using Inductive-Coupling for 3-D CMPs", NOCS 2011.
  – Y. Take, et al., "3D NoC with Inductive-Coupling Links for Building-Block SiPs", IEEE Trans. on Computers (2014).
• Spanning trees optimization for 3D WiNoCs
  – H. Matsutani, et al., "A Case for Wireless 3D NoCs for CMPs", ASP-DAC 2013 (Best Paper Award).

