http://www.artist-embedded.org/
Page 1:  · Quiet evolution: mixing heterogeneous and homogeneous – Communications are key: Network-on-Chip (NoC) – Control distribution Revolution ? – Dynamic adaptation through reconfiguration


A well-known evolution: Multi-core SoC

From System-on-Chip… to …Multi-Core System-on-Chip

[Figure: a single-processor System-on-Chip (processor, memory, UART, audio, SIM, keyboard, USB, Bluetooth, display, camera, GPIO) beside a multi-core System-on-Chip with the same peripherals and several processors and memories. Source: ITRS 2009 (www.itrs.net)]

In embedded systems: ITRS 2009
●  « SOC-Consumer Portable Drivers »
–  Performance × 1000 in 15 years
–  Power consumption objective: 500 mW
–  PE = dedicated accelerators, 250 kG/64 Kbits
–  Same design effort

[Figure: software view (Functions A to E) mapped onto the architecture: four main processors, main memory, peripherals, and an array of PEs]

Two possible paths
●  Homogeneous = replication of identical resources
+ Programming simplicity
+ Fault and variability tolerance
+ Flexibility
- Area
- Power consumption / performance
●  Heterogeneous = each resource has its own dedicated function
+ Area
+ Power consumption / performance
- Each resource is critical
- Programming is more complex

Towards regular and adaptable architectures
●  Quiet evolution: mixing heterogeneous and homogeneous
–  Communications are key: Network-on-Chip (NoC)
–  Control distribution
●  Revolution?
–  Dynamic adaptation through reconfiguration
–  Distributing decisions

Outline
●  Context
●  MAGALI overview
●  NoC, GALS and Low-Power
●  Dynamic reconfiguration
●  Distributed decisions
●  Configuring & programming
●  Conclusion

An application starting point

“Software Defined Radio” Femtocells MIMO (ICT projects Befemto & ARTIST4G)

“Cognitive Radio” TERROP NEWCOM


What are the problems?
●  Increasing complexity
–  MIMO scheme
–  Spectral efficiency increase
=> 1 Tops needed in 2015
●  Increasing flexibility
–  Software Defined Radio, Cognitive Radio
=> more control, more configurations
●  Strict constraints
–  Hard real-time: frame = 1 ms
–  Mastering computing latency is mandatory (latency => memory => real estate => cost)
–  Power consumption under 500 mW

C.H. Van Berkel, “Multi-Core for Mobile Phones”, DATE’09

LETI’s NoC
●  2-D mesh based NoC
●  Supports heterogeneous tiles: IP, memory blocks, programmable cores, reconfigurable hardware
●  Data-flow homogeneous programming model
●  Communication/Configuration (CC) controller
●  GALS implementation for advanced power management
[Figure: NoC tile. LCG = Local Clock Generator, GALS = GALS interface, CC = Communication/Configuration controller, IP = IP core, Power = power control]

MAGALI Chip

●  TRX_OFDM: 32-2048 FFT/iFFT, GI insertion, framing/deframing, power normalization
●  DCM: fully programmable memory cores for data storage and manipulation (32 Kwords of 32 bits), configuration server
●  MEPHISTO: VLIW cores for complex matrix computation (8 GMAC/s)
●  BIT cores: support for mapping / interleaving / puncturing (TX and RX)
●  FEC decoders: reconfigurable channel decoders supporting LDPC, Viterbi and Turbo decoding

MAGALI Chip layout

●  ST 65 nm LP technology, 5400 µm x 5400 µm, 30 mm²
●  Total power < 500 mW
●  NoC area (15 routers + 20 GALS interfaces + NoC links): 11% of overall chip area


What is a NoC?
●  “NoC is an interconnection structure for exchanging information on a chip between heterogeneous or homogeneous HW/SW resources”

Some history

●  1980 to 2000: multiprocessor networking
●  2000: Jantsch et al., « NoC: an architecture for billion transistor era »
●  2000: A. Greiner et al., « SPIN, a fat-tree topology for IP communications »
●  2001: Dally et al., « Route packets, not wires »
●  2002: Benini and De Micheli, « NoC: a new SoC paradigm »

NoC research worldwide


Communication-Centric platform
●  Concepts
–  Architecture platform built around a Network-on-Chip
–  Network-on-Chip with QoS for high-throughput, low-latency, reliable communications, free of deadlock and livelock
–  Efficient implementation with GALS techniques
   ●  Key element for power management and isolation of faulty elements
–  Need for:
   ●  Efficient programming model
   ●  Associated tools
–  Development possible thanks to the platform concept

SoC standard methodology
[Figure: design flow with milestones. Steps: SoC specification, architecture definition, HW units design, communication design, SoC integration, communication re-definition, software definition and tools development, application mapping 1. Milestones: T0, T0+6, T0+18+n*x, T0+24+n*x, T0+30+n*x]

SoC communication-centric methodology

[Figure: design flow with milestones. Steps: SoC specification, architecture definition, HW units design, communication configuration, SoC integration, software definition and tools adaptation, application mapping 1. Reused inputs: communication template, software & tools libraries. Milestones: T0, T0+9, T0+18, T0+24]

NoC topics
●  NoC is at the heart of the programming model
–  What are the functions of a NoC?
   ●  Just an efficient interconnection medium
   ●  Added Quality of Service
   ●  Partial/full support for the programming model
–  Communication protocol stack implemented
●  NoC is at the heart of parallel and distributed computing
–  New tools for application mapping are needed
●  NoC is at the heart of implementation issues
–  Globally Asynchronous, Locally Synchronous structures
–  NoC is a potential weak point for reliability, variability

NoC topics
●  NoC is at the heart of power consumption issues
–  NoC itself can be power hungry
–  NoC can open new solutions for smart power management of the whole structure
●  NoC is a new paradigm, shifting from IP re-use to platform re-use
–  Need new design tools (exploration, construction)
●  NoC raises new questions on classical topics
–  Testability of the NoC itself, and its associated IPs
–  Debug is a difficult issue: determinism is often required by industry, but difficult to achieve with GALS, parallel and distributed structures

NoC Protocol Stack
The programming model of the NoC-based platform is essential. It can determine:
•  Reconfiguration management
•  Task synchronization
•  Power management
•  Bandwidth allocation
•  End-to-end flow control
•  Protocol wrappers
•  Packet routing
•  GALS strategy
[Figure label: OSI transmission level]

An example of NoC particularities: Topology

[Figure: example topologies: chordal ring, mesh, hypercube, and omega network, with switch configurations]

Scalability and implementation

●  Topology: chordal ring
●  Implementation:
–  4*4 node only => 20 to 25% area gain, < 5% performance gain, compared to the 5*5 node needed for a mesh
–  Is it a good layout?
–  Long wires?

Topology vs 2-D layout: scaling
[Figure: chordal-ring node numbering and its 2-D placement for two network sizes]
●  16 nodes, topology 4*4, layout 1:1: 2 medium wires, 1 long, 1 cross => OK
●  24 nodes, topology 6*4, layout 3:2: 2 medium wires, 1 long, 1 cross => OK

Topology vs 2-D layout: 32 units case

[Figure: two 2-D placements of a 32-node chordal ring, including a "new layout"; verdict labels on the slide: "NOK" and "OK, but long wires"]
●  Topology 8*4, layout 2:1: 2 average wires, 1 long, 1 cross
●  Topology 8*4, layout 1:1: 3 average wires, 2 long wires, 1 cross

Deleting the long wires ?

[Figure: 32-node chordal ring placed without long wires, shown next to the previous 1:1 layout]
●  Topology 8*4, layout 1:1
●  4 average wires, 1 wire crossing, long lines deleted
●  Diameter = 16
●  Equivalent mesh diameter = 10

Final comparison mesh/chordal ring

CHORDAL RING
●  Diameter = N/4 (if all the wires are kept)
–  N = 16, D = 4
–  N = 32, D = 8
–  N = 64, D = 16
–  N = 79, D = 20
●  With layout view: layout 1:1 with long lines
–  If long lines are deleted: equivalent diameter = real diameter * 2 + p * mean line cost
–  Else: equivalent diameter = real diameter + 1 * long line cost + p * mean line cost
–  If cost ~ real distance: equivalent diameter = real diameter * 2 + p * mean line cost

MESH
●  Diameter = 2*(SQRT(N)-1)
–  N = 16, D = 6
–  N = 32, D = 10
–  N = 64, D = 14
–  N = 79, D = 16
●  With layout view: layout 1:1, short lines only
–  Equivalent diameter = real diameter
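
The closed-form diameters in this comparison can be checked numerically. The following is a minimal Python sketch: the function names are illustrative and do not come from the slides, and rounding up for non-square N is an assumption that happens to reproduce the values listed above.

```python
import math


def mesh_diameter(rows: int, cols: int) -> int:
    """Hop-count diameter of a rows x cols 2-D mesh (corner to opposite corner)."""
    return (rows - 1) + (cols - 1)


def square_mesh_diameter(n: int) -> int:
    """Slide formula 2*(sqrt(N)-1), rounded up when N is not a perfect square (assumption)."""
    return math.ceil(2 * (math.sqrt(n) - 1))


def chordal_ring_diameter(n: int) -> int:
    """Slide formula N/4 (all wires kept), rounded up (assumption)."""
    return math.ceil(n / 4)


for n in (16, 32, 64, 79):
    print(n, chordal_ring_diameter(n), square_mesh_diameter(n))
```

For the 32-node case, `mesh_diameter(8, 4)` gives the same value as the "equivalent mesh diameter" quoted for the 8*4 layout.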


Implementation challenges
●  Globally Asynchronous Locally Synchronous (GALS) architecture
–  NoC is at the center of such issues
●  Low-power schemes
–  Communication is power-consuming
–  NoC implementation influences low-power policies
●  Test & Debug
–  Mandatory for industrial acceptance
–  Distributed systems induced by NoC are difficult to debug (loss of pure determinism in many cases)
●  Tools
–  Mandatory for NoC-based architecture design
●  Other challenges:
–  Optical NoC, 3-D implementation

GALS Architectures
●  With technology shrinks
–  Timing closure & clock tree synthesis problems, even when using physical synthesis
–  Reliability issues
–  Communication power consumption (due to long wire loads)
●  Globally Asynchronous Locally Synchronous (GALS) architecture
–  IPs are synchronous islands
–  System communications are asynchronous

Metastability issue (1)

[Figure: Dout, launched on Clk 1, is sampled as Din by a flip-flop clocked by Clk 2; timing diagram of Clk 1, Clk 2, Dout, and Din]

Metastability issue (2)

[Figure: Din is resampled by a second flip-flop clocked by Clk 2, producing Dout2]
Multiple flip-flops can “solve” the problem

Boundary Synchronization (mesochronous)

[Figure: a locally-synchronous island clocked by Clk(n), connected through adaptation layers to neighbouring domains clocked by Clk(n-1) and Clk(n+1)]
+ Low area overhead
+ Power consumption
- Verification
- Throughput
- Latency

R. Dobkin, R. Ginosar, C. Sotiriu, “Data Synchronization Issues in GALS SoCs”, Proceedings of the 10th International Symposium on Asynchronous Circuits and Systems, pp. 170-179, Crete, Greece, 19-23 April 2004.

T. Bjerregaard, S. Mahadevan, R. Grøndahl Olsen, J. Sparsø, “An OCP Compliant Network Adapter for GALS-based SoC Design Using the MANGO Network-on-Chip”, Proceedings of the International Symposium on System-on-Chip (SoC'05), pp. 171-174, 2005.

Bi-synchronous Gray FIFO based

[Figure: a locally-synchronous island (clk) with port controllers on the asynchronous side (aclk)]
+ Simple solution
+ No additional cells
+ High throughput
- Area cost
- Power consumption

T. Chelcea, S. Nowick, “Low-latency asynchronous FIFO's using token rings”, Proceedings of International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp. 210-220, April 2000.

A. Chakraborty, M. Greenstreet, “Efficient Self-Timed Interfaces for Crossing Clock Domains”, Proceedings of 9th International Symposium on Asynchronous Circuits and Systems (ASYNC'2003), pp. 78-88, Vancouver, Canada, 2003.

E. Beigne, P. Vivet, “Design of On-chip and Off-chip Interfaces for a GALS NoC Architecture”, Proceedings of 12th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'06), Grenoble, France, pp. 172-181, March 2006.

Pausable (or stretchable) clocks
+ Low area overhead
+ Low consumption
+ Adaptable to DFS
- Need local clock generator & specialized cells
- Throughput lowered

K. Yun, R. Donohue, “Pausible Clocking: A first step toward heterogeneous systems”, Proceedings of International Conference on Computer Design (ICCD), October 1996.

J. Muttersbach, T. Villiger, W. Fichtner, “Practical Design of Globally-Asynchronous Locally-Synchronous Systems”, Proceedings of the Sixth International Symposium on Advanced Research in Asynchronous Circuits and Systems, ASYNC'2000, Eilat, Israel, pp. 52-59, April 2-6, 2000.

GALS interfaces: conclusion
●  Mesochronous is simple BUT limited
●  Pausable clock has intrinsic drawbacks for industrialization
●  GALS FIFOs are the best way. Gray code is not optimal => other code
●  Ex: 65 nm, MAGALI chip, Johnson code
–  500 MHz
–  0.014 mm²
–  10 µW leakage
–  3 pJ/flit
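
Gray and Johnson codes both change a single bit between consecutive values, which is the property that lets a FIFO pointer be sampled from another clock domain without seeing an inconsistent value. The sketch below generates both sequences so the property can be checked. The function names and the 4-bit width are illustrative assumptions; the slides do not describe the actual MAGALI pointer logic.

```python
def gray_code(i: int) -> int:
    """Reflected binary Gray code of the integer i."""
    return i ^ (i >> 1)


def johnson_sequence(width: int) -> list[int]:
    """States of a Johnson (twisted-ring) counter: shift left, feed the inverted MSB into the LSB."""
    mask = (1 << width) - 1
    state = 0
    states = []
    for _ in range(2 * width):  # a Johnson counter cycles through 2*width states
        states.append(state)
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) & mask) | (msb ^ 1)
    return states


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def single_bit_steps(seq: list[int]) -> bool:
    """True if every consecutive pair, including the wrap-around, differs in exactly one bit."""
    return all(hamming(seq[k], seq[(k + 1) % len(seq)]) == 1 for k in range(len(seq)))


gray = [gray_code(i) for i in range(16)]
johnson = johnson_sequence(4)
print([format(s, "04b") for s in johnson])
print(single_bit_steps(gray), single_bit_steps(johnson))
```

With the same number of bits, the Johnson counter covers fewer states (2 x width instead of 2^width) in exchange for a very regular shift-register structure.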


Asynchronous NoC nodes and links
●  5x5 network router, mesh topology
●  Delay insensitivity
●  High robustness to process variations and external conditions
–  temperature, voltage drop…
●  Natural enabler for Dynamic Voltage Scaling
–  no need for clock frequency scaling during transitions
[Figure: QDI 4-rail pipeline stage]

Async. Node Architecture & Performance

●  Architecture
–  Fully decentralized arbitration
–  5 input controllers: flit routing
–  5 output controllers: flit arbitration
–  2 virtual channels
●  Performance
–  Technology: CMOS 65 nm
–  Throughput: 550 Mflits/s (17.6 Gb/s)
–  Leakage: 210 µA
–  Energy: 30 pJ/flit
–  Area: 0.17 mm²
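
The throughput and energy figures above can be related with simple arithmetic. The sketch below derives the flit width implied by the two throughput numbers and an upper bound on the dynamic power of one router port running at full rate, which comes to about 16.5 mW. Both derived values are calculations from the slide's numbers, not figures stated in the presentation, and the function names are illustrative.

```python
FLIT_RATE = 550e6          # flits per second, from the slide
THROUGHPUT_BPS = 17.6e9    # bits per second, from the slide
ENERGY_PER_FLIT = 30e-12   # joules per flit, from the slide


def bits_per_flit(throughput_bps: float, flit_rate: float) -> float:
    """Flit width implied by a bit throughput and a flit rate."""
    return throughput_bps / flit_rate


def peak_dynamic_power(flit_rate: float, energy_per_flit: float) -> float:
    """Dynamic power in watts if a flit is transferred at the full flit rate."""
    return flit_rate * energy_per_flit


print(bits_per_flit(THROUGHPUT_BPS, FLIT_RATE))
print(peak_dynamic_power(FLIT_RATE, ENERGY_PER_FLIT) * 1e3, "mW")
```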


LETI’s NoC

[Figure: NoC tile. LCG = Local Clock Generator, GALS = GALS interface, CC = Communication/Configuration controller, IP = IP core, Power = power management]

●  2-D mesh based NoC

●  Communication/Configuration (CC) controller

●  Support heterogeneous tiles : IP, memories (MEM), programmable cores, reconfigurable hardware (RH)

●  GALS implementation

●  Tools for NoC-based design and exploitation


Low-Power & NoC
●  Transmission lines
●  Local DVFS
●  Partial activation of routers
●  Data coding
●  Routing algorithms
●  Topology choice
●  Programming model
●  Application
[Figure labels: the list spans from transistor level to system level; arrow: power gain]

Local DVFS
●  Always associated with GALS techniques
●  Island partitioning
–  NoC regions are at different voltages
●  Each unit with its local voltage/frequency

U. Y. Ogras, R. Marculescu, P. Choudhary, D. Marculescu, “Voltage-Frequency Island Partitioning for GALS-based Networks-on-Chip”, Proceedings of DAC 2007, June 4–8, 2007, San Diego, California, USA.

E. Beigné, F. Clermidy, S. Miermont, P. Vivet, “Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC”, Proceedings of the 2nd IEEE International Symposium on Networks-on-Chip, NOCS’2008, Newcastle, UK, April 2008.

VDD Hopping: Principle
●  Energy per operation scales with V²
●  Use of two PMOS power switches
–  Vhigh, Vlow: a discrete DVS
–  Switch between Vhigh and Vlow:
   ●  Smooth and fast transitions (less than 100 ns)
   ●  Programmable duty ratio
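
The V² scaling and the programmable duty ratio can be combined into a small numerical model. The sketch below is an illustration under stated assumptions: dynamic power is taken as proportional to f·V², the voltage and frequency values are hypothetical (they are not MAGALI settings), and the cost of the transitions themselves is ignored.

```python
def duty_for_target(f_target: float, f_high: float, f_low: float) -> float:
    """Fraction of time to spend at Vhigh so that the mean frequency equals f_target."""
    if not f_low <= f_target <= f_high:
        raise ValueError("target frequency must lie between f_low and f_high")
    return (f_target - f_low) / (f_high - f_low)


def hopping_average(duty: float, v_high: float, v_low: float,
                    f_high: float, f_low: float) -> tuple[float, float]:
    """Return (mean frequency, mean dynamic power relative to always running at Vhigh)."""
    f_avg = duty * f_high + (1 - duty) * f_low
    # Dynamic power ~ C * f * V^2; the capacitance C cancels out in the ratio.
    p_avg = duty * f_high * v_high ** 2 + (1 - duty) * f_low * v_low ** 2
    p_ref = f_high * v_high ** 2
    return f_avg, p_avg / p_ref


# Hypothetical operating points, for illustration only.
V_HIGH, V_LOW = 1.2, 0.8        # volts
F_HIGH, F_LOW = 400.0, 200.0    # MHz

duty = duty_for_target(300.0, F_HIGH, F_LOW)
print(duty, hopping_average(duty, V_HIGH, V_LOW, F_HIGH, F_LOW))
```

Hopping between two fixed supplies reaches any mean frequency between the two operating points, which is how a discrete scheme can approximate continuous DVFS.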


VDD-Hopping: distribution

●  VDD-Hopping offers DVFS at IP level
–  No need for an inductor, capacitor, or charge pump
–  Fully integrable
–  Low area (3% of IP area)
–  High power efficiency (95%)
–  Only requires two external supplies per IP:
   ●  Vhigh (nominal voltage) & Vlow (set with respect to logic & SRAM constraints)

LPM: Local Power Manager
LCG: Local Clock Generator

VDD Hopping: clock management


Resource power control

[Figure: a resource attached to the asynchronous router through the Communication and Configuration Controller (CCC). A PMU selects a target frequency from the current function (f1(X): freq1, f2(X): freq2, idle: low) and passes it to the Local Clock Generator, which delivers the unit clock (320-790 MHz) as the core clock of the processing core]

Exploration of VDD-Hopping benefits
●  VDD-Hopping power reduction capabilities:
–  On-line dynamic slack time optimization: 30% gain with respect to static DVFS
–  DVFS compared to On/Off mode: 45% gain
–  Total chip budget: reduction from 340 mW down to 160 mW
●  3GPP-LTE application (MAGALI), SystemC-TLM power simulation


Semi-distributed control

[Figure: 3 x 4 tile array:
PE   PE   DCM   PE
DCM  PE   HOST  PE
PE   PE   PE    DCM]
●  Data-flow directed synchronization (fork, join, loop) through the CC associated with each PE
●  Complex data and flow mixing performed in DCM
●  If more complex control is needed => host control

DCM = Data and Configuration Memory
CC = Communication & Configuration controller

J. Martin et al., “A Microprogrammable Memory Controller for High‑Performance Dataflow Applications”, ESSCIRC’09

Communication scheme

[Figure: a resource (R) made of a CORE with input (ICC) and output (OCC) communication controllers, holding configuration, task, and context information. Example: task T1 takes 200 data in (50 from producer P1, 150 from producer P2) and sends 100 data out (50 to consumer C1, 50 to consumer C2)]

µProgrammed data synchronization
[Figure: four cores (A, B, C, D) exchanging data through their input (ICC) and output (OCC) controllers, with micro-programmed send and receive counts such as "Send 30" and "Recv 10", and loop counts x2 and x3]

Mnemonic, operand(s)   Description
RC c s                 Request configuration
RCL c s                Request configuration + loop pointer
LL n                   Go back to stored loop position. Loop n times
GL n                   Go back to first instruction. Loop n times
LLi r                  Go back to stored loop position. Loop number in register r
GLi r                  Go back to first instruction. Loop number in register r
STOP                   End of micro-program

Dynamic reconfiguration

[Figure: 3 x 4 tile array with PEs, three DCMs, and the host]
●  PE configurations are stored in DCM memories
●  When a PE has to run a configuration that is not loaded, it requests it from its associated DCM
●  Configurations can be modified online by the host

F. Clermidy et al., “A Communication and Configuration Controller for NoC based Reconfigurable Data Flow Architecture”, NOCS'09

Self-configuration protocol

[Figure: message exchange between a resource and its configuration server (DCM). The resource's configuration memory is divided into slots, tracked by configuration-versus-slot descriptors. The resource sends REQ_MOVE @s, @d, where @s is the source base address in the server's configuration memory and @d the destination base address in its own configuration memory. The server answers with MOVE @d followed by Data Word 1 … Data Word N. The exchange is repeated (* N) with REQ_MOVE @s+1, @d+l and MOVE @d+l, Data Word 1 … Data Word N]
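
The REQ_MOVE / MOVE exchange can be sketched as a small simulation. This is a minimal model under loud assumptions, not the MAGALI protocol: the class names, the use of a configuration identifier as the source address, the slot index as the destination address, and the round-robin slot replacement are all hypothetical choices made for illustration.

```python
class ConfigServer:
    """Stands in for a DCM acting as configuration server."""

    def __init__(self, memory: dict[int, list[int]]):
        self.memory = memory  # source address -> configuration words

    def handle_req_move(self, src: int, dst: int) -> tuple[str, int, list[int]]:
        """Answer REQ_MOVE @s, @d with MOVE @d plus the data words."""
        return ("MOVE", dst, list(self.memory[src]))


class Resource:
    """A PE with a small configuration memory split into slots."""

    def __init__(self, server: ConfigServer, n_slots: int):
        self.server = server
        self.slot_cfg = [None] * n_slots          # which configuration each slot holds
        self.slot_words = [[] for _ in range(n_slots)]
        self.next_victim = 0                       # round-robin replacement (assumption)
        self.requests = 0

    def run(self, cfg_id: int) -> int:
        """Return the slot holding cfg_id, fetching it from the server if needed."""
        if cfg_id in self.slot_cfg:
            return self.slot_cfg.index(cfg_id)     # already loaded, no traffic
        slot = self.next_victim
        self.next_victim = (slot + 1) % len(self.slot_cfg)
        msg, dst, words = self.server.handle_req_move(src=cfg_id, dst=slot)
        assert msg == "MOVE" and dst == slot
        self.slot_cfg[slot] = cfg_id
        self.slot_words[slot] = words
        self.requests += 1
        return slot


server = ConfigServer({1: [0xA1, 0xA2], 2: [0xB1], 3: [0xC1, 0xC2, 0xC3]})
pe = Resource(server, n_slots=2)
for cfg in (1, 2, 1, 3, 1):
    pe.run(cfg)
print(pe.requests, pe.slot_cfg)
```

The point of the sketch is that the resource pulls its own configurations: the server only reacts to requests, so no central controller has to schedule the loads.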


Some results: reconfiguration time
●  3GPP-LTE: real-time constraint of 1 ms
–  4 configuration phases
–  Most configuration time hidden by computation time
=> 4 µs reconfiguration time


Why distributing decisions?

●  Number of cores is increasing => central decision is slow
●  Process variations
●  Increasing flexibility demand (several applications)
●  Individual optimization required
–  Power
–  Variability
–  Thermal
–  Real-time (reducing buffering needs)
●  And at run-time!

Options?

●  Design-time optimization
–  Greedy algorithm, tabu search, simulated annealing, genetic algorithms, linear model optimization
=> Processing requirement is too high for run-time usage
●  Run-time optimization
–  Convex optimization, non-linear Lagrange optimization, integer linear programming, off-line exploration + on-line manager
=> Centralized method: scalability of processing and communication?
=> So? Distributing centralized methods or optimizing distributed algorithms?

Distributed Scheme: Game Theory
●  Game theory models:
–  Players
–  Interacting through actions
–  Making decisions (distributed & parallel)
–  Maximizing individual gain (objective function)
–  Solution: nobody can unilaterally improve their gain (Nash equilibrium)

Game Theory in MP-SoC
●  Game theory => MP-SoC
–  Players => PEs
–  Actions => PE parameters (e.g. frequency)
–  Decision making => actuators in PE (e.g. DVFS)
–  Individual gain => objective function per PE (e.g. performance, power)
–  Solution: Nash equilibrium => objective function maximization
[Figure: four PEs (PE-1 to PE-4), each with its own DVFS setting and its own performance and power objectives]
●  So... what do we need?
–  Distributed objective function
–  Local maximization algorithm
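
The two ingredients named above, a per-PE objective function and a local maximization step, can be illustrated with a best-response iteration. This is a toy sketch and not the algorithm used in the presentation: the utility function, the chain of neighbouring PEs, the preferred frequencies, and the frequency grid are all hypothetical. Each PE repeatedly picks the frequency that maximizes its own utility given its neighbours' current choices; when a full round passes with no change, no PE can improve alone, which is the Nash equilibrium condition.

```python
def make_utility(wanted: list[int], coupling: int = 1):
    """Hypothetical per-PE objective: stay near a preferred frequency and near the neighbours."""
    def utility(i: int, freqs: list[int]) -> int:
        cost = (freqs[i] - wanted[i]) ** 2
        for j in (i - 1, i + 1):                      # PEs form a chain (assumption)
            if 0 <= j < len(freqs):
                cost += coupling * (freqs[i] - freqs[j]) ** 2
        return -cost
    return utility


def best_response(i: int, freqs: list[int], choices: list[int], utility) -> int:
    """Frequency that strictly improves PE i's utility the most, else its current one."""
    best, best_u = freqs[i], utility(i, freqs)
    for c in choices:
        trial = freqs[:i] + [c] + freqs[i + 1:]
        u = utility(i, trial)
        if u > best_u:
            best, best_u = c, u
    return best


def play(freqs: list[int], choices: list[int], utility, max_rounds: int = 1000):
    """Sequential best-response rounds until no PE changes its frequency."""
    freqs = list(freqs)
    for rounds in range(1, max_rounds + 1):
        changed = False
        for i in range(len(freqs)):
            new = best_response(i, freqs, choices, utility)
            if new != freqs[i]:
                freqs[i] = new
                changed = True
        if not changed:
            return freqs, rounds
    raise RuntimeError("no equilibrium reached within max_rounds")


choices = list(range(100, 201, 10))                   # MHz, hypothetical grid
utility = make_utility(wanted=[100, 200, 100, 200])
equilibrium, rounds = play([100, 100, 100, 100], choices, utility)
print(equilibrium, rounds)
```

Because the coupling term is symmetric between neighbours, every strict improvement by one PE also increases a shared potential, so this particular toy game is guaranteed to stop; a different utility would need its own convergence argument.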


Thermal management example
[Figure: an application graph of six tasks (T1 to T6, from IN to OUT) mapped onto six PEs (PE-1 to PE-6). Applicative view: synchronization, controlled through frequency. Technological view: temperature, controlled through frequency. Combined view: applicative + technological]
How do we set frequencies to optimize synchronization + temperature?

Temperature optimization

Different trade-offs between application latency and temperature

Best latency Best temperature


Convergence & Scalability

[Figure: convergence (game cycles) versus number of processors, showing the average and the 68%, 95% and 99.7% bounds]

Convergence does not explode with the number of processors!

●  300,000 scenarios

●  Synthetic applications

●  10 frequencies, from 100 MHz to 200 MHz

Page 61:

Optimality study

[Figure: histogram of the number of simulations versus optimization (%)]

●  Comparison with Matlab Minimax function

●  8000 random scenarios

●  Average optimization: 89%

Page 62:

Implementation (65nm)

●  Local Decision Maker (LDM)

Criteria                                      µprog MIPS   HW Matlab model   HW Optimized model
Frequency [MHz]                               400          25                100
Performance overhead [game cycle duration]    2420         461               752
Area overhead (mm²)                           0.122        0.061             0.014
Com. overhead (clk cycles)                    58           7                 7

●  Reactivity time of the controller is about 5 ms

●  Throughput degradation: 0.17%

Page 63:

Power Management using Consensus

[Figure: energy consumption (mJ, 0 to 0.3) versus algorithm iterations (0 to 600), shown against the minimal energy consumption; annotations: 80% and 87%; Mode 1: Rb = 1; Mode 3: Rb = 2; Mode 5: Rb = 10]

Page 64:

Modifying latency constraints on-line

Page 65:

Outline

●  Context

●  MAGALI overview

●  NoC, GALS and Low-Power

●  Dynamic reconfiguration

●  Distributed decisions

●  Configuring & programming

●  Conclusion

Page 66:

Programming Steps

[Diagram steps: SoC spec.; Com. def.; HW units design; Com. config.; SoC integration; Software def., tools adaptation; Com. mapping; Software & tools libraries; Com. template]

Page 67:

NS-2 Modeling

●  NS-2 components adaptation

●  Network design

–  2D-mesh, packet-switching

●  Units + Network Interface design

–  Network Interface: use of the Agent component for modeling the protocol

–  Generic processing units: dataflow modeling (Application component)

–  Configuration parameters

NS-2 / NoC Relationships

NS-2 component               NoC element
Application                  Processing Units
Agent                        Network Interface
Nodes, Links, Classifiers    Network

[Figure: NAM view]

Page 68:

Results

●  Applicative throughput

–  Cumulative throughput for each resource

–  Maximum value: 20 resources × 3.2 Gbps (100 MHz) = 64 Gbps

–  Simulation: maximum throughput 20 Gbps

–  A NoC is needed for such an application

[Figure: cumulated Rx throughput for all resources; global throughput (Gbps) versus time (µs) over Frame 1 and Frame 2 (user traffic + Rx sampling); mean throughput 12.5 Gbps]
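
The throughput figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that each resource transfers one word per clock cycle; the 32-bit word width is inferred from 3.2 Gbps at 100 MHz and is not stated on the slide, and the function names are illustrative.

```c
#include <assert.h>

/* Peak throughput of one resource in Mbps, assuming one word of
 * link_bits is transferred per clock cycle (an assumption). */
long resource_peak_mbps(long clock_mhz, long link_bits)
{
    assert(clock_mhz > 0 && link_bits > 0);
    return clock_mhz * link_bits;
}

/* Peak throughput summed over all resources, in Mbps. */
long aggregate_peak_mbps(long resources, long clock_mhz, long link_bits)
{
    assert(resources > 0);
    return resources * resource_peak_mbps(clock_mhz, link_bits);
}

/* Share of the peak that is actually used, in percent (rounded down). */
long utilization_percent(long observed_mbps, long peak_mbps)
{
    assert(observed_mbps >= 0 && peak_mbps > 0);
    return observed_mbps * 100 / peak_mbps;
}
```

Under these assumptions, 20 resources at 100 MHz give the 64 Gbps ceiling quoted above. The simulated maximum of 20 Gbps is about 31% of that ceiling, and the 12.5 Gbps mean about 19%.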

Page 69:

SystemC-TLM environment

●  Generated from the IP-XACT Magillem tool

●  Complete NoC SystemC/TLM platform

–  Based on SystemC 2.1 + TLM OSCI 2.0 draft + ST TLM devkit

–  Includes NoC nodes + CC controller

●  IP integration within the NoC?

–  A new IP derives from the CC base classes

–  The user only needs to implement the IP's computation and configuration functionalities

Page 70:

NS-2 / SystemC Comparison

●  Comparison of NS-2 with the SystemC model (behavioral)

–  15% difference, due to the switching-mode modeling in NS2

[Figure: TX data symbol latency (µs) versus data symbol number, SystemC vs NS2]

[Figure: TX pilot symbol latency (µs) versus pilot symbol number, SystemC vs NS2]

●  Simulation time: time needed to decode a 3GPP-LTE frame

Model               Simulation time   Speed-up over previous row
RTL                 17 min 25 s
Co-sim (25% RTL)    1 min 50 s        × 9.5
Full TLM            5.14 s            × 21.4
NS2                 1.47 s            × 3.5
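
The speed-up factors can be re-derived from the simulation times. The sketch below reads the times as 17 min 25 s, 1 min 50 s, 5.14 s and 1.47 s, a reading that is consistent with the factors on the slide. The function name is illustrative, and the result is returned in tenths to stay in integer arithmetic.

```c
#include <assert.h>

/* Speed-up of a faster simulation over a slower one, in tenths
 * (95 means x9.5), rounded to the nearest tenth. Times are in ms. */
long speedup_tenths(long slow_ms, long fast_ms)
{
    assert(slow_ms > 0 && fast_ms > 0);
    return (slow_ms * 10 + fast_ms / 2) / fast_ms;
}
```

Chaining the three steps gives an overall factor of roughly 711 between RTL simulation and NS2 for this workload.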

Page 72:

NI automatic generation

●  Communication & Configuration (CC) controller = NI + high-level communication and configuration primitives

●  Numerous parameters

–  Fundamental

    ●  Number of cores

    ●  Input/output flows

–  Level of functionalities

    ●  Context size

    ●  Number of configurations

–  Power management

    ●  Global gated clock enable

Page 73:

CC Micro-Architecture and Design Configuration

[Diagram blocks: Communication; QoS; Debug; Power Management (DVFS)]

Page 74:

CC controller generation

●  All CC blocks are IP-XACT compliant

●  Magillem tool (MDS collaboration)

–  Generator to create a CC: TGI interface

# Parameter = Value ;                        # Range or values
$unit_name = trx_ofdm ;
$nb_cores = 1 ;                              # 1 .. 4
$nb_fifo_in = 2 ;                            # 1 .. 4
$nb_fifo_out = 2 ;                           # 1 .. 4
$nb_cfg_icc = 6 ;                            # 1 .. 2^nb_bits_slot_id
$nb_cfg_occ = 6 ;                            # 1 .. 2^nb_bits_slot_id
$default_size_available_fifo_in[0] = 16 ;    # <2^16 (default 9)
$nb_bits_size_available_fifo_in[0] = 8 ;     # 1 .. 16 (default 8)
$nb_bits_size_released_fifo_in[0] = 8 ;      # 1 .. 16 (default 9)
$default_size_available_fifo_in[1] = 16 ;    # 1 .. 16 (default 9)
$nb_bits_size_available_fifo_in[1] = 8 ;     # 1 .. 16 (default 8)
$nb_bits_size_released_fifo_in[1] = 8 ;      # 1 .. 16 (default 9)
$nb_bits_size_available_fifo_out[0] = 5 ;    # 1 .. 16 (default 9)
$nb_bits_size_available_fifo_out[1] = 5 ;    # 1 .. 16 (default 9)
$core_name[0] = trx_ofdm ;
$core_binding_fifo_in[0] = [0,1] ;
$core_binding_fifo_out[0] = [0,1] ;
$nb_bits_core_status[0] = 16 ;               # 1 .. 32
$core_cfg_begin[0] = 0 ;                     # 0 .. 2^nb_bits_core
$nb_bits_core_addr[0] = 10 ;                 # 1 .. 21 (default 8)
$core_cfg_size[0] = 8 ;                      # 1 .. nb_bits_core_addr
$nb_cfg_core[0] = 3 ;                        # 1 .. 2^nb_bits_slot_id
$has_gc_en_core[0] = 1 ;                     # 0 1
$nb_bist_elements[0] = 14 ;                  # user-def
$scan_counter_width = 9 ;                    # user-def
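
The range comments in this parameter file lend themselves to an automatic check before generation. The sketch below mirrors a handful of the parameters in C and validates them against the ranges listed above. The `cc_params` struct and the `cc_params_check` function are hypothetical helpers written for illustration and are not part of the Magillem generator or its TGI interface; only one core is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C mirror of a few generator parameters (single core). */
struct cc_params {
    int nb_cores;             /* 1 .. 4  */
    int nb_fifo_in;           /* 1 .. 4  */
    int nb_fifo_out;          /* 1 .. 4  */
    int nb_bits_core_status;  /* 1 .. 32 */
    int has_gc_en_core;       /* 0 or 1  */
};

static int in_range(int value, int lo, int hi)
{
    return value >= lo && value <= hi;
}

/* Returns 0 when every field is within its documented range, otherwise
 * the 1-based position of the first offending field in the struct. */
int cc_params_check(const struct cc_params *p)
{
    assert(p != NULL);
    if (!in_range(p->nb_cores, 1, 4))             return 1;
    if (!in_range(p->nb_fifo_in, 1, 4))           return 2;
    if (!in_range(p->nb_fifo_out, 1, 4))          return 3;
    if (!in_range(p->nb_bits_core_status, 1, 32)) return 4;
    if (!in_range(p->has_gc_en_core, 0, 1))       return 5;
    return 0;
}
```

The values of the `trx_ofdm` example (1 core, 2 input FIFOs, 2 output FIFOs, 16 status bits, gated clock enabled) pass this check.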

Page 76:

NoC programming general scheme

[Diagram linking: the platform XML model (topology, functions); the application model (SME (RAM), rotor, rx_ofdm, chan_est, equal, dmap, rx_fht); semi-automatic mapping of the application onto the platform; configuration files; simulation; debug; optimization]

Page 77:

Programming choices rationale

●  Limited memory

–  Off-line computing of communications

–  On-line full programming

●  Numerous parameters for one application

–  But few local adaptations

●  Fast reconfiguration

=> Off-line computing with on-line adaptation

Page 78:

Mapping and configuration manipulation

●  Bottom-up view

–  SW libraries for programming the communications, the HW IP, …, at several levels

⇒  Communication and configuration APIs (F2 APIs)

●  Top-down view

–  High-level models

–  Tools for mapping the application on the hardware

⇒  Communication compiler (Comc)

Page 79:

F2 APIs: layered architecture

●  HAL

–  Registers

–  Network (f2_write_packet, …)

–  Paths

–  Memory sharing with local SME

●  NoC protocol

–  Send data, send config., send credits, enable task, request session, …

●  Configurations

–  NI configurations: ITM, AMR, LPM, CFM, IDM, …

–  SME configurations

–  Core configurations: MEP, RX bit, TX bit, …

Page 80:

F2 APIs: description of the register map

●  Macros generated from IP-XACT

●  Example for the NI (partial):

/* Definitions for block ITM_CONFIG */
#define ITM_CONFIG_OFFSET 0x00
#define ITM_CONFIG_RANGE 2

/* Definitions for register CONFIG_1 */
#define ITM_CONFIG_CONFIG_1_OFFSET 0x00

/* Definitions for register field CHANNEL */
#define ITM_CONFIG_CONFIG_1_CHANNEL_OFFSET 31
#define ITM_CONFIG_CONFIG_1_CHANNEL_SIZE 1
#define ITM_CONFIG_CONFIG_1_CHANNEL_SET(_val_) \
    SET_VAL(_val_,ITM_CONFIG_CONFIG_1_CHANNEL)

/* Definitions for register field SOURCE_ID */
#define ITM_CONFIG_CONFIG_1_SOURCE_ID_OFFSET 18
#define ITM_CONFIG_CONFIG_1_SOURCE_ID_SIZE 7
#define ITM_CONFIG_CONFIG_1_SOURCE_ID_SET(_val_) \
    SET_VAL(_val_,ITM_CONFIG_CONFIG_1_SOURCE_ID)

/* Definitions for register field PATH_TO_TARGET */
#define ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_OFFSET 0
#define ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_SIZE 18
#define ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_SET(_val_) \
    SET_VAL(_val_,ITM_CONFIG_CONFIG_1_PATH_TO_TARGET)
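
The generated macros rely on a `SET_VAL` helper that the slide does not show. The sketch below gives one plausible definition, based on token pasting of the `_SIZE` and `_OFFSET` suffixes, and uses it to assemble a `CONFIG_1` word. The `SET_VAL` body and the `itm_config_1_pack` function are assumptions made for illustration; the field offsets and sizes are copied from the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions copied from the generated NI header shown above. */
#define ITM_CONFIG_CONFIG_1_CHANNEL_OFFSET        31
#define ITM_CONFIG_CONFIG_1_CHANNEL_SIZE          1
#define ITM_CONFIG_CONFIG_1_SOURCE_ID_OFFSET      18
#define ITM_CONFIG_CONFIG_1_SOURCE_ID_SIZE        7
#define ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_OFFSET 0
#define ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_SIZE   18

/* ASSUMPTION: one possible SET_VAL. It masks the value to the field width
 * and shifts it to the field position; valid for field sizes below 32. */
#define SET_VAL(_val_, _field_) \
    ((((uint32_t)(_val_)) & ((UINT32_C(1) << _field_##_SIZE) - 1u)) \
     << _field_##_OFFSET)

#define ITM_CONFIG_CONFIG_1_CHANNEL_SET(_val_) \
    SET_VAL(_val_, ITM_CONFIG_CONFIG_1_CHANNEL)
#define ITM_CONFIG_CONFIG_1_SOURCE_ID_SET(_val_) \
    SET_VAL(_val_, ITM_CONFIG_CONFIG_1_SOURCE_ID)
#define ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_SET(_val_) \
    SET_VAL(_val_, ITM_CONFIG_CONFIG_1_PATH_TO_TARGET)

/* Hypothetical helper: builds the 32-bit CONFIG_1 word from its fields.
 * The asserts catch out-of-range arguments in debug builds. */
uint32_t itm_config_1_pack(uint32_t channel, uint32_t source_id,
                           uint32_t path_to_target)
{
    assert(channel < (UINT32_C(1) << ITM_CONFIG_CONFIG_1_CHANNEL_SIZE));
    assert(source_id < (UINT32_C(1) << ITM_CONFIG_CONFIG_1_SOURCE_ID_SIZE));
    assert(path_to_target <
           (UINT32_C(1) << ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_SIZE));
    return ITM_CONFIG_CONFIG_1_CHANNEL_SET(channel)
         | ITM_CONFIG_CONFIG_1_SOURCE_ID_SET(source_id)
         | ITM_CONFIG_CONFIG_1_PATH_TO_TARGET_SET(path_to_target);
}
```

The three fields occupy bits 31, 24..18 and 17..0, so they never overlap and can be combined with a bitwise OR. How the resulting word is sent to the NI (for example through the network layer of the F2 APIs) is outside this sketch.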

Page 81:

Comc: goals

●  Ease the tasks of the SW developer by using a functional description of the data flow

●  Hide the complexity due to the architectural concepts

●  Allow parameterized configurations to be described

Page 82:

Communication mapping workflow

[Diagram labels: hardware platform; application (SME (RAM), rotor, rx_ofdm, chan_est, equal, dmap, rx_fht); data flow description; C code; compilation & link; binary configurations]

Page 83:

Conclusion

●  NoC-based embedded systems are a paradigm shift

–  Communication-centric scheme

–  Large choices and optimization possibilities

–  Implementation (GALS, Low-Power)

●  Scalability pushes intelligence down to lower levels

–  Control

–  Reconfiguration

–  Decisions

●  Programmability of heterogeneous platforms is key

Page 84:

Thank you

