
Reconfigurable Computing

Date post: 29-Jan-2016
Upload: salvatore-girone
Page 1

Reconfigurable Computing

Page 2

Reconfigurable Computing: Agenda

• Needs and Architectures
• Programming
• Reconfigurable Machines
• Examples of applications

Page 3

Evolution in practice

1946 - ENIAC
• 500 FLOPS
• 19000 vacuum tubes, 1500 relays, 100000 resistors, capacitors, crystal diodes and inductors
• 63 m², 175 kW
• 0.0029 FLOPS/W
Source: http://msrb.wordpress.com/2007/12/04/energy-dinosaurs/

2011 - NVIDIA Tegra 3
• 12 GFLOPS
• TSMC 40 nm, 4 Cortex-A9 cores, GPUs, …
• 80 mm², a few watts
• A few GFLOPS/W

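The efficiency figures above are simply sustained performance divided by power. A minimal Python sketch of that arithmetic, using the slide's ENIAC numbers; the 4 W used for Tegra 3 is a hypothetical stand-in for "a few watts", not a measured value.

```python
def flops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency: sustained floating-point operations per second per watt."""
    return flops / watts


eniac = flops_per_watt(500, 175e3)   # 500 FLOPS at 175 kW (slide figures)
tegra3 = flops_per_watt(12e9, 4.0)   # 12 GFLOPS at an assumed 4 W ("a few watts")

print(f"ENIAC:   {eniac:.4f} FLOPS/W")          # ENIAC:   0.0029 FLOPS/W
print(f"Tegra 3: {tegra3 / 1e9:.1f} GFLOPS/W")  # Tegra 3: 3.0 GFLOPS/W
print(f"Improvement: ~{tegra3 / eniac:.1e}x")   # roughly twelve orders of magnitude
```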
Page 4

iPhone Teardown

Page 5

Apple A4

• System-in-Package SoC designed by Apple, manufactured by Samsung in 45 nm
• Single application processor with:
  • ARM single-core Cortex-A8 CPU
  • PowerVR GPU [SGX535] (Imagination Technologies)
  • Two 256Mb DDR DRAM chips (stacked)
• Used in: iPad tablet (1 GHz), iPhone 4, iPod touch (4th generation)

Page 6

Samsung Exynos 3110 / Hummingbird

• Multichip System-in-Package SoC designed and manufactured by Samsung in 45 nm
• Single application processor with:
  • ARM Cortex-A8 CPU (same layout as the A4)
  • PowerVR SGX540 GPU
  • DDR memory on-package
• Used in Samsung Galaxy S, Google Nexus S, Samsung Galaxy Tab, Enspert Identity, Meizu M9, other tablets, …

Page 7

Nvidia Tegra

• Multichip System-in-Package SoC designed by Nvidia, manufactured by TSMC in 40 nm
• A single application processor containing:
  • ARM Cortex-A9 dual-core CPU
  • PortalPlayer audio decoder
  • Proprietary Nvidia graphics processors
  • DDR memory in-package
• Used in Zune, Sony and Samsung MP3 players, LG Optimus, Motorola Atrix, Samsung Galaxy smartphones, Lenovo and other tablets, …

Page 8

Electronic Devices Market Description

SoC design is a continuous trade-off between semiconductor market demands/needs and architectural/circuit-level solutions.

Constraints:
• Low performance (MIPS): device too slow, customer unhappy
• Power dissipation (mW): battery drains too fast, customer unhappy
• Soft errors due to heat, power peaks, electromagnetic fields: unexpected behavior, customer unhappy

Market needs:
• Evolution of the wireless services offered by handheld devices: phone, Wi-Fi, GPS, sound, video, PDA
• Evolution of communication and multimedia storage formats:
  • GSM / GPRS / UMTS-HSDPA / LTE
  • 3G, Wi-Fi / WiMAX
  • (all audio/video technologies)

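Page 9

Trade-off: programmable processor vs ASIC

• Spatial computation (ASIC): Ax² + Bx + C, with each operation mapped onto its own hardware operator
• Temporal computation (processor): (Ax + B)x + C, with the operations executed one after another on shared hardware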

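To make the contrast concrete, here is a small Python sketch that evaluates the same polynomial both ways. The operator and step counts come from a hypothetical model (one dedicated operator per operation for the spatial version, one shared ALU for the temporal version), not from a real synthesis flow.

```python
from typing import Tuple


def spatial(a: int, b: int, c: int, x: int) -> Tuple[int, int, int]:
    """Evaluate A*x^2 + B*x + C as a dataflow graph with one operator per operation.

    Operations in the same stage do not depend on each other, so dedicated
    hardware could compute them in parallel. Returns (result, operators, stages),
    where the counts describe the fixed graph below.
    """
    # Stage 1: two independent multipliers
    x2 = x * x
    bx = b * x
    # Stage 2: one multiplier and one adder working in parallel
    ax2 = a * x2
    bx_c = bx + c
    # Stage 3: final adder
    result = ax2 + bx_c
    return result, 5, 3


def temporal(a: int, b: int, c: int, x: int) -> Tuple[int, int, int]:
    """Evaluate (A*x + B)*x + C (Horner form) on a single ALU, one instruction per step.

    Returns (result, operators, steps).
    """
    program = [("mul", x), ("add", b), ("mul", x), ("add", c)]
    acc = a
    for op, operand in program:
        acc = acc * operand if op == "mul" else acc + operand
    return acc, 1, len(program)


print(spatial(2, 3, 4, 5))   # (69, 5, 3): more hardware, shorter latency
print(temporal(2, 3, 4, 5))  # (69, 1, 4): one ALU reused over more steps
```

The spatial version trades area (five operators) for latency (three stages), while the temporal version reuses one operator and pays in time, which is the trade-off the slide contrasts.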
Page 10

Programmable Processors: Drawbacks

[Figure: growth over the years of algorithm requirements (Shannon's law, GOPS), available computational power (Moore's law, GOPS and GOPS/mW) and battery capacity]

• The gap between the available computational power and the computational requirements of recent algorithms/protocols keeps increasing

Page 11

ASIC: Drawbacks

• Exponential increase of technology costs:
  • (Design)
  • Masks
  • Verification
  • Test
• Product reliability and yield decrease rapidly

Page 12

Application-Specific Standard Processor Template

[Figure: µP (ARM/PowerPC), ASIC cores, memory and IO connected through an interconnect]

• Standard template for an ASSP (Application-Specific Standard Processor)

Page 13

Programmability on System-on-Chip

• Programmable hardware allows SoC volumes to increase, reducing the NRE cost per unit
• Programmable hardware increases the product lifetime
• Programmable hardware has a negative impact on area and power, thus reducing product margins

[Figure: margin ($) over time up to the end of product lifetime, for an ASIC-oriented SoC and a programmable-HW-oriented SoC]

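The margin argument can be made explicit by amortizing the NRE over the production volume. The sketch below uses made-up prices, costs and volumes (all hypothetical) only to show how a larger volume can outweigh the higher per-die cost of programmable hardware, and how it cannot at equal volume.

```python
def unit_cost(nre: float, volume: int, recurring: float) -> float:
    """Per-unit cost: NRE amortized over the volume, plus the recurring (per-die) cost."""
    return nre / volume + recurring


def unit_margin(price: float, nre: float, volume: int, recurring: float) -> float:
    """Per-unit margin at a given selling price."""
    return price - unit_cost(nre, volume, recurring)


# Hypothetical figures: the programmable SoC costs more per die (area/power overhead)
# but, being reusable across products, ships in larger volumes.
asic = unit_margin(price=10.0, nre=10e6, volume=2_000_000, recurring=4.0)
prog = unit_margin(price=10.0, nre=10e6, volume=8_000_000, recurring=5.0)
prog_same_volume = unit_margin(price=10.0, nre=10e6, volume=2_000_000, recurring=5.0)

print(asic, prog, prog_same_volume)  # 1.0 3.75 0.0
```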
Page 14

Application-Specific Standard Processor Template (2)

[Figure: µP (ARM/PowerPC), DSPs, ASIC cores, memory and IO connected through an interconnect]

• Adding specialized DSPs reduces the "computational pressure" on the microprocessor and lowers the "full-ASIC" share of the system, leaving room for hardware reconfigurability

Page 15

Reconfigurable Processors

• A reconfigurable processor is a special DSP architecture, programmable at execution time and based on hardware reconfiguration:
  • Fine-grain: architectures such as FPGAs, based on ~1- to 4-bit parallelism and typically featuring LUTs
  • Coarse-grain: parallel architectures based on ~8- to 32-bit hardwired blocks (ALU, multiplier, MAC, …)
  • Processor arrays: architectures obtained by interconnecting a set of simple/small processors featuring 8/16/32-bit datapaths

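In a fine-grain fabric, "reconfiguring" a logic element amounts to loading a new truth table into a LUT. A minimal Python model of a 4-input LUT, assuming a 16-bit configuration word indexed by the input bits; the class and helper names are hypothetical.

```python
class LUT4:
    """A 4-input lookup table whose configuration is a 16-bit truth table.

    Bit i of `config` holds the output for the input combination whose
    binary value (d<<3 | c<<2 | b<<1 | a) equals i.
    """

    def __init__(self, config: int = 0) -> None:
        self.configure(config)

    def configure(self, config: int) -> None:
        if not 0 <= config < 1 << 16:
            raise ValueError("a 4-input LUT needs exactly 16 configuration bits")
        self.config = config

    def __call__(self, a: int, b: int, c: int, d: int) -> int:
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.config >> index) & 1


def truth_table(fn) -> int:
    """Build the 16-bit configuration word that makes a LUT4 compute fn(a, b, c, d)."""
    word = 0
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        word |= (fn(a, b, c, d) & 1) << i
    return word


lut = LUT4(truth_table(lambda a, b, c, d: a ^ b ^ c ^ d))  # 4-input parity
print(lut(1, 0, 1, 1))  # 1
lut.configure(truth_table(lambda a, b, c, d: a & b & c & d))  # same "hardware", now a 4-input AND
print(lut(1, 0, 1, 1))  # 0
```

Only the configuration word changes between the two functions; this flexibility is what costs fine-grain fabrics their area and routing overhead compared with coarse-grain ALUs.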
Page 16

ASIC-oriented solutions: Configurable Processors

• E.g. Xtensa by Tensilica: "configurable" microprocessors based on instruction-set extension

[Figure: instruction decode and register file feeding a conventional datapath and an application-specific datapath]

• Problem 1: building "re-targetable" compilers
• Problem 2: how to define the ideal acceleration (granularity of the acceleration)
• Problem 3: how to bring enough data to the datapath to exploit its computational capacity

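Instruction-set extension can be illustrated with a toy register machine whose opcode table is fixed when the core is "built". This is a hypothetical Python model, not Tensilica's tooling: `ConfigurableCore`, the base opcodes and the `sqdiff` custom instruction are all invented for the example, and one instruction per cycle is assumed.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple[str, int, int, int]  # (opcode, rd, rs1, rs2) as register indices
BinOp = Callable[[int, int], int]

BASE_ISA: Dict[str, BinOp] = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}


class ConfigurableCore:
    """Toy design-time configurable processor: base ISA plus chosen extensions."""

    def __init__(self, extensions: Optional[Dict[str, BinOp]] = None, nregs: int = 8) -> None:
        # The instruction set is frozen at construction ("fabrication") time.
        self.isa: Dict[str, BinOp] = {**BASE_ISA, **(extensions or {})}
        self.regs: List[int] = [0] * nregs
        self.cycles = 0

    def run(self, program: List[Instr]) -> None:
        for op, rd, rs1, rs2 in program:
            self.regs[rd] = self.isa[op](self.regs[rs1], self.regs[rs2])
            self.cycles += 1  # assume one instruction per cycle


def squared_diff(x: int, y: int) -> int:
    """Hypothetical custom instruction fusing a subtract and a multiply."""
    return (x - y) * (x - y)


base = ConfigurableCore()
base.regs[0], base.regs[1] = 7, 3
base.run([("sub", 2, 0, 1), ("mul", 2, 2, 2)])

ext = ConfigurableCore(extensions={"sqdiff": squared_diff})
ext.regs[0], ext.regs[1] = 7, 3
ext.run([("sqdiff", 2, 0, 1)])

print(base.regs[2], base.cycles)  # 16 2
print(ext.regs[2], ext.cycles)    # 16 1
```

The extended core halves the instruction count here, but only if the compiler knows `sqdiff` exists, which is exactly Problem 1 on the slide.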
Page 17

Application-Specific Standard Processor Template (3)

• "Processorization" of the SoC: to widen the market segment of a given product, and to make bug fixes easier by moving them from hardware to software, more and more tasks are migrated from ASIC blocks to processors/DSPs or to processor-oriented architectures

[Figure: µP (ARM/PowerPC), DSPs, ASIC cores, a processor pipeline, ASIC/reconfigurable cores, memory and IO connected through an interconnect]

Page 18

FPGAs

FPGAs (Field-Programmable Gate Arrays):
• Allow "spatial" computation while retaining run-time programmability
• Require an overhead of ~100× in area and power, ~10× in timing
• Require HDL design flows, unfamiliar to application developers (C/C++/Java/Matlab)

Page 19

FPGA-oriented solutions: Reconfigurable Processors

• E.g. "configurable" microprocessors based on instruction-set extension

[Figure: instruction decode and register file feeding a conventional datapath and an embedded FPGA (eFPGA)]

• Problem 1: building "re-targetable" compilers
• Problem 2: how to define the ideal acceleration (granularity of the acceleration)
• Problem 3: how to bring enough data to the datapath to exploit its computational capacity
• Problem 4: area overhead of the eFPGA compared with an ASIC

Page 20

Run-time programmable processors: DREAM

• RISC µP to handle control, interrupts and configuration, and to regulate the data/instruction flow
• eFPGA to implement customizable functional units
• Parallel-access memory with programmable "DMA" engines to offer maximum parallelism (MIMD)

[Figure: STxP70 µP with control interface, PiCoGA, address generators, interconnect cross-bar and high-bandwidth memory bank registers]

Page 21

DREAM Computing Model

…
Set_conf
…
Set_io, Set_df
…
for (i = 0; i < N; i++)
    Execute ID
…
Unset_conf

[Figure: control RISC processor, PiCoGA and IO banks]

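The flow on this slide (configure once, bind I/O, execute in a loop, release) can be mimicked in a few lines of Python. This is a hypothetical model: the method names only mirror the slide's Set_conf / Set_io / Execute / Unset_conf steps, Set_df is not modelled, and nothing here reflects the real DREAM toolchain API.

```python
from typing import Callable, Dict, List


class PiCoGAModel:
    """Toy model of the DREAM programming flow (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self.loaded: Dict[int, Callable[[int], int]] = {}  # configuration ID -> operation
        self.inputs: List[int] = []
        self.outputs: List[int] = []
        self.reconfigurations = 0

    def set_conf(self, conf_id: int, op: Callable[[int], int]) -> None:
        # Loading a configuration is the expensive step, done once outside the loop.
        self.loaded[conf_id] = op
        self.reconfigurations += 1

    def set_io(self, inputs: List[int]) -> None:
        self.inputs, self.outputs = list(inputs), []

    def execute(self, conf_id: int, i: int) -> None:
        # Each call reuses the already-loaded configuration.
        self.outputs.append(self.loaded[conf_id](self.inputs[i]))

    def unset_conf(self, conf_id: int) -> None:
        del self.loaded[conf_id]


ga = PiCoGAModel()
ga.set_conf(1, lambda v: (v * 3) & 0xFF)  # hypothetical 8-bit custom operation, ID 1
data = [10, 20, 90]
ga.set_io(data)
for i in range(len(data)):
    ga.execute(1, i)
ga.unset_conf(1)
print(ga.outputs, ga.reconfigurations)  # [30, 60, 14] 1
```

Keeping Set_conf outside the loop is what amortizes the reconfiguration cost over the N executions.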
Page 22

Reconfigurability overhead of FPGAs

[Figure: FPGA fabric of LUTs (L) surrounded by switch boxes]

• Yellow: area actually used for computation (LUTs)
• Green/red: routing and configuration

Page 23

CGRA: Coarse Grained Reconfigurable Architecture

[Figure: 4 × 4 array of PEs on a reconfigurable interconnect fabric; each PE is built around an ALU]

• LUTs are replaced by more complex arithmetic operators
• More efficient use of interconnect resources
• Less flexible computation (1-bit operations occupy 8/32-bit ALUs)

Page 24

MorphoSys RC Array

• SIMD model: all the cells in a row execute the same 128-bit operation
• MIMD model: each cell executes an independent operation

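The difference between the two execution models comes down to how many operations the array is configured with. A minimal Python sketch on a hypothetical 2 × 3 grid of cells (the real RC array size and operation set are not modelled here): in SIMD mode one operation is broadcast per row, in MIMD mode every cell gets its own.

```python
from typing import Callable, List

Op = Callable[[int], int]
Grid = List[List[int]]


def run_simd(data: Grid, row_ops: List[Op]) -> Grid:
    """SIMD: one operation per row, broadcast to every cell of that row."""
    return [[row_ops[r](v) for v in row] for r, row in enumerate(data)]


def run_mimd(data: Grid, cell_ops: List[List[Op]]) -> Grid:
    """MIMD: every cell applies its own, independent operation."""
    return [[cell_ops[r][c](v) for c, v in enumerate(row)] for r, row in enumerate(data)]


def inc(v: int) -> int:
    return v + 1


def dbl(v: int) -> int:
    return v * 2


def neg(v: int) -> int:
    return -v


data = [[1, 2, 3], [4, 5, 6]]
print(run_simd(data, [inc, dbl]))                          # [[2, 3, 4], [8, 10, 12]]
print(run_mimd(data, [[inc, dbl, neg], [neg, inc, dbl]]))  # [[2, 4, -3], [-4, 6, 12]]
```

SIMD needs one configuration word per row, MIMD one per cell: more flexibility at the cost of more configuration data.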
Page 25

Coarse-Grain Reconfigurable Processor: PACT XPP

• XPP is an array of coarse-grain elements (PAEs) with a packet-based interconnection network.
• Computation is distributed over the elements, each of which computes as soon as the data it needs are available.

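The "compute when the data are available" rule is the classic dataflow firing rule. Below is a hypothetical Python sketch of it: the `PAE` class, its ports and the token queue are invented for illustration (the queue loosely stands in for the packet network) and say nothing about XPP's actual hardware protocol.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PAE:
    """Toy processing element that fires when every input port holds a token."""
    name: str
    op: Callable[..., int]
    n_inputs: int
    targets: List[Tuple[str, int]]  # (destination PAE, input port) for the result
    ports: Dict[int, int] = field(default_factory=dict)

    def ready(self) -> bool:
        return len(self.ports) == self.n_inputs


def run_dataflow(paes: Dict[str, PAE], initial: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Deliver (pae, port, value) tokens and fire each PAE as soon as its inputs are complete.

    There is no program counter: the firing order depends only on data availability.
    Returns the results of PAEs that have no targets (the graph outputs).
    """
    pending = deque(initial)
    outputs: Dict[str, int] = {}
    while pending:
        name, port, value = pending.popleft()
        pae = paes[name]
        pae.ports[port] = value
        if pae.ready():
            result = pae.op(*(pae.ports[p] for p in range(pae.n_inputs)))
            pae.ports.clear()  # consume the input tokens
            if not pae.targets:
                outputs[name] = result
            for dest, dest_port in pae.targets:
                pending.append((dest, dest_port, result))
    return outputs


# (a + b) * (c - d) mapped onto three PAEs
paes = {
    "add": PAE("add", lambda x, y: x + y, 2, [("mul", 0)]),
    "sub": PAE("sub", lambda x, y: x - y, 2, [("mul", 1)]),
    "mul": PAE("mul", lambda x, y: x * y, 2, []),
}
# Tokens arrive in an arbitrary order; each PAE fires only when its inputs are complete.
outputs = run_dataflow(paes, [("add", 0, 2), ("sub", 0, 9), ("add", 1, 3), ("sub", 1, 4)])
print(outputs)  # {'mul': 25}
```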
Page 26

PACT XPP Computing Model

Mixed two-level model:
• An event-based language to describe synchronization among PAEs
• Assembly for the individual PAEs
• A C compiler exists, but its results are still disappointing

Page 27

Processor Arrays

• Multi-Processor System-on-Chip
• Set of processors interconnected by an on-chip network
• They can be seen as a kind of CGRA in which each PE is a processor
• They allow well-known concepts (multi-threading, sockets, process scheduling) to be reused, bringing parallelism back into a standard programming environment:
  • OS support
  • Use of C for computation
  • Support for well-known communication systems

Page 28

Processor Arrays: picoChip Processor

[Figure: picoArray made of array processing elements and a switch matrix, with I/O ports, external memory and an inter-picoArray interface]

• 322 PEs made of small 16-bit processors with distributed memory
• Heterogeneous architecture with 4 PE types:
  • Standard (STAN)
  • Multiply-accumulate (MAC)
  • Memory (MEM)
  • Control (CTRL)
• Deterministic interconnection fabric based on time-division multiplexing (TDM)
• Used commercially in base stations

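A TDM interconnect is deterministic because the owner of every time slot is decided before run time, so worst-case latency can be computed statically. The Python sketch below shows that principle on one shared link; the slot-allocation scheme, frame length and PE names are hypothetical and not picoChip's actual tool flow.

```python
from typing import List, Optional, Tuple

Conn = Tuple[str, str]  # (source PE, destination PE)


def build_slot_table(requests: List[Tuple[Conn, int]], frame_len: int) -> List[Optional[Conn]]:
    """Statically assign TDM slots: each connection reserves a number of slots per frame."""
    table: List[Optional[Conn]] = [None] * frame_len
    slot = 0
    for conn, n_slots in requests:
        for _ in range(n_slots):
            if slot == frame_len:
                raise ValueError("bandwidth requests exceed the frame length")
            table[slot] = conn
            slot += 1
    return table


def owner_at(table: List[Optional[Conn]], cycle: int) -> Optional[Conn]:
    """Connection allowed to drive the shared link in a given cycle (fixed before run time)."""
    return table[cycle % len(table)]


def worst_case_wait(table: List[Optional[Conn]], conn: Conn) -> int:
    """Longest wait, in cycles, between a word becoming ready and one of its slots."""
    if conn not in table:
        raise ValueError(f"{conn} has no reserved slot")
    n = len(table)
    return max(
        next(w for w in range(n) if table[(arrival + w) % n] == conn)
        for arrival in range(n)
    )


a = ("STAN0", "MAC1")  # hypothetical PE names
b = ("MEM2", "CTRL3")
table = build_slot_table([(a, 2), (b, 1)], frame_len=4)
print(table)               # [('STAN0', 'MAC1'), ('STAN0', 'MAC1'), ('MEM2', 'CTRL3'), None]
print(owner_at(table, 6))  # ('MEM2', 'CTRL3')
print(worst_case_wait(table, a), worst_case_wait(table, b))  # 2 3
```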
Page 29

Tilera Tile Processor (RAW / Tilera)

• Processor mesh:
  • 2-D array of homogeneous cores
  • Basic block: a general-purpose processor core plus a switch connected to the 2-D on-chip network

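Page 30

SoC Communication: Bus vs Networks-on-Chip

[Figure, left: bus hierarchy with masters M1, M2, slaves S1, S2, S3 and bridge B1 to a second bus with master M1.1 and slaves S1.1, S1.2 (M = master, S = slave, B = bridge)]

Typically, for each bus: M < 4, S + B < 16

[Figure, right: network-on-chip connecting initiators I1, I2 and targets T1, T2, T3 (T = target, I = initiator)]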

Page 31

MPSoC Communication: Networks-on-Chip

• Communication system designed for maximum scalability
• Based on the following components:
  • Router: routes packets between N inputs and M outputs
  • Network interface (NI): connects a computing island to the NoC
  • Link: bus connecting the routers

[Figure: two IP cores, each attached through an NI to a router, with the routers joined by links; protocol layers shown: physical, link, network, transport, application]

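The slide does not say how routers choose an output port. As an assumption, the Python sketch below uses dimension-ordered (XY) routing, a common simple choice for 2-D mesh NoCs: a packet first travels along X, then along Y, then is ejected to the local NI. Coordinates and port names (NORTH meaning +y) are hypothetical conventions.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in a 2-D mesh

STEP: Dict[str, Coord] = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_next_hop(current: Coord, dest: Coord) -> str:
    """Dimension-ordered (XY) routing: correct X first, then Y, then eject locally."""
    (cx, cy), (dx, dy) = current, dest
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"


def route(src: Coord, dest: Coord) -> List[str]:
    """Output ports a packet takes from the source router to the destination NI."""
    hops: List[str] = []
    pos = src
    while (port := xy_next_hop(pos, dest)) != "LOCAL":
        hops.append(port)
        sx, sy = STEP[port]
        pos = (pos[0] + sx, pos[1] + sy)
    hops.append("LOCAL")
    return hops


print(route((0, 0), (2, 1)))  # ['EAST', 'EAST', 'NORTH', 'LOCAL']
```

Because every router decides using only the destination coordinates, the scheme needs no routing tables and scales with the mesh size.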
Page 32

MPSoC: Clock, Power & Workload Management

• Multi-Processor Systems-on-Chip: from the implementation point of view, they allow independent computing islands to be developed:
  • GALS: Globally Asynchronous Locally Synchronous design (much shorter clock trees, more uniform distribution of power consumption)
  • Power management: SD, DVFS (Dynamic Voltage and Frequency Scaling: changing the voltage and frequency of a processor according to its workload)
  • Redundancy: workload distribution, failure recovery (migrating a task from a faulty or overloaded processor to a neighboring one)

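DVFS pays off because dynamic CMOS power grows with the square of the supply voltage times the clock frequency. A minimal Python sketch of a per-island governor, assuming a hypothetical table of voltage/frequency operating points and a hypothetical effective capacitance; real governors also account for leakage and transition costs, which are ignored here.

```python
from typing import List, Tuple

# Hypothetical operating points (voltage in V, frequency in MHz) of one computing island.
OPERATING_POINTS: List[Tuple[float, float]] = [(0.8, 200.0), (0.9, 400.0), (1.0, 600.0), (1.1, 800.0)]


def dynamic_power(c_eff: float, v: float, f_mhz: float) -> float:
    """Dynamic power estimate P = C * V^2 * f in watts (activity factor folded into C)."""
    return c_eff * v * v * f_mhz * 1e6


def pick_operating_point(required_mhz: float) -> Tuple[float, float]:
    """Lowest (V, f) pair whose frequency still covers the island's current workload."""
    for v, f in OPERATING_POINTS:
        if f >= required_mhz:
            return v, f
    return OPERATING_POINTS[-1]  # saturated: run at the highest point


v, f = pick_operating_point(350.0)  # workload needs ~350 MHz on this island
saving = 1 - dynamic_power(1e-9, v, f) / dynamic_power(1e-9, *OPERATING_POINTS[-1])
print(v, f, f"{saving:.0%}")  # 0.9 400.0 67%
```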
Page 33

Morpheus

• Heterogeneous NoC-based reconfigurable processor, with independent frequency domains for each core

[Figure: ARM9, PACT XPP-III, DREAM and eFPGA (M2K) cores plus on-chip memory and IO, connected by a NoC carrying data and configuration; each reconfigurable core has a data interface and a configuration interface]

Page 34

ManyAC Architecture

[Figure: array of computational tiles, each with a µP, memory (MEM) and an ASIC accelerator, managed by computational-tile (CT) controllers and a cluster controller]

• Device based on REGULAR HETEROGENEITY: an array of cells with identical structure but customizable ASIC acceleration

Page 35

Conclusion: Reconfigurable Processors

• Reconfigurable processors (RPs) are computing architectures that exploit configurable hardware to add application-specific instructions to the standard instruction set:
  • Design-time programmable
    • Configurable processors
  • Run-time programmable
    • RISC + FPGA
    • RISC + CGRA
    • Microprocessor arrays
• They are necessary to extend a device's market share far enough to cover its design costs
• They bring hardware overhead (area, timing)
• They cause software productivity and porting problems
• For these reasons, they are becoming established mainly in "regularly heterogeneous" multiprocessor systems

Page 36

Acronyms
• ASIC Application Specific Integrated Circuit
• ASSP Application Specific Standard Product
• SoC System-on-Chip
• SiP System-in-Package
• NRE Non-Recurring Engineering (costs)
• TTM Time to Market
• FPGA Field Programmable Gate Array
• LUT Lookup Table
• CGRA Coarse Grained Reconfigurable Architecture
• PE Processing Element (in CGRAs)
• RP Reconfigurable Processor
• RISP Reconfigurable Instruction Set Processor
• SIMD Single Instruction Multiple Data
• MIMD Multiple Instruction Multiple Data
• PA Processor Array
• MPSoC Multi-Processor System-on-Chip
• GALS Globally Asynchronous Locally Synchronous
• NoC Network-on-Chip
• DVFS Dynamic Voltage and Frequency Scaling

