Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs
PhD work of Mickaël Dardaillon

Mickaël Dardaillon, Kevin Marquet (Citi), Tanguy Risset (Citi), Jérôme Martin (CEA Leti), Henri-Pierre Charles (CEA List)

June 24th, 2016

Context Programming Model for SDR Micro-Scheduling Experimentations on Magali Conclusion


Evolution of telecommunication protocols

[Figure: data rate (kbps, logarithmic scale from 10 to 1,000,000) versus year (1990 to 2010) for 2G, 3G, 4G, Bluetooth and Wi-Fi, with an "SDR" label added over the plot.]


4G LTE-Advanced: Downlink

[Figure: one frame (10 ms) split into sub-frames 0 to 9 (1 ms each). One sub-frame is drawn as a grid of 14 OFDM symbols by 2048 subcarriers (20 MHz), with regions labelled Control, User 1, User 2, User 3, ... and Data.]

- MIMO: 4×2 antennas
- LTE throughput: 1.4 Gbps
- LTE-Advanced: 7 Gbps
- Latency: 2 ms
- Power budget: 500 mW
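To give an order of magnitude for the sub-frame grid described on this slide, the sketch below counts the frequency-domain values in one sub-frame and per second. It assumes one value per subcarrier and per OFDM symbol for a single antenna, counts all 2048 subcarriers, and ignores the cyclic prefix; the constant and function names are illustrative, not taken from the slides.

```cpp
#include <cassert>

// Figures read from the slide: 14 OFDM symbols per 1 ms sub-frame,
// 2048 subcarriers for a 20 MHz channel.
constexpr long kSymbolsPerSubframe = 14;
constexpr long kSubcarriers = 2048;
constexpr long kSubframesPerSecond = 1000;  // one sub-frame lasts 1 ms

// Values in the grid of one sub-frame (assumption: one antenna, all subcarriers).
constexpr long valuesPerSubframe() {
    return kSymbolsPerSubframe * kSubcarriers;
}

// Values to process per second under the same assumption.
constexpr long valuesPerSecond() {
    return valuesPerSubframe() * kSubframesPerSecond;
}

static_assert(valuesPerSubframe() == 28672, "14 x 2048");
```

Under these assumptions a receiver handles 28,672 values per sub-frame, that is about 28.7 million values per second and per antenna, before any MIMO decoding.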


Context
What is an SDR software?

Baseband processing in software
- ZigBee
- ...
- LTE Advanced

Constraints
- Computing power ∼ GFLOPS
- Reconfiguration time < 100 µs
- Consumption < 500 mW

Architecture independent SDR software

[Figure: receiver processing chain with blocks RF Frontend 1, AGC + synchronization, FFT, CFO estimation, CFO correction, channel estimation, RF Frontend 2, FFT, CFO correction, MIMO decoding, Demodulation, Deinterleaving and Error correction.]

Context
What is an SDR software?
What is an SDR hardware platform?

- EVP16?
  - VLIW
  - Vector Processor
- SB3500?
  - DSP
  - Control Processor
- Magali?
  - Configurable Units
  - NoC
- ...

⇒ No unified hardware platform model for SDR.

Problem Statement: how to program and compile a telecommunication protocol to a heterogeneous MPSoC?

[Figure: EVP16 illustrated by a page excerpt from "Vector Processing for Software-Defined Radio", showing a generic vector-processor architecture (Figure 6), the OnDSP architecture (Figure 7) and the EVP architecture (Figure 8). According to the excerpt, the SIMD width of EVP16 is P = 16 (256 bits), and one VLIW instruction can combine up to five vector operations, four scalar operations, three address updates and loop control.]

[Figure: SB3500 illustrated by a page excerpt from IEEE Signal Processing Magazine, March 2010, showing the Sandbridge SB3500 platform architecture (FIG5) and IMEC's ADRES processor in the BEAR platform (FIG4). According to the excerpt, the platform has three Sandblaster cores with SIMD support and "token triggered threading", a form of simultaneous multithreading.]

[Figure: Magali block diagram with units OFDM (ofdm1 to ofdm4), TURBO (turbo), DEMOD (demod), MOD (mod), LDPC (ldpc), WIFLEX (wiflex), ARM (arm), 8051, DMA (dma1 to dma5) and DSP (dsp1 to dsp5).]



Magali SDR

[Figure: Magali chip with units OFDM (ofdm1 to ofdm4), TURBO (turbo), DEMOD (demod), MOD (mod), LDPC (ldpc), WIFLEX (wiflex), DMA (dma1 to dma5) and DSP (dsp1 to dsp5).]

LTE demonstrator [Clermidy et al., 09]
Power consumption: 231 mW

Outline

- Context
  - SDR software?
- Programming Model for SDR
  - Dataflow Model of Computation
  - Input Format
- Dataflow Refinement and Buffer Verification
  - Mapping and Scheduling
  - Micro-Scheduling
- Experimentations on Magali
  - Code Generation
  - Experimental Results
- Conclusion

State of the Art in SDR Programming

Imperative Concurrent

Platform                                  Language
ExoCHI [Wang et al., 07]                  OpenMP + C
BEAR [Derudder et al., 09]                Matlab + C

Dataflow

Platform                                  Language
Simulink
LabView
GNU Radio                                 Python + C
RVC-CAL [Lucarz et al., 08]               XML + C
DiplodocusDF [Gonzalez-Pina et al., 12]   UML
MAPS [Castrillon et al., 13]              C like

Static Dataflow (SDF) [Lee et al., 87]

[Figure: SDF graph with a source actor, Decod1 and Ctrl; every edge carries fixed integer token rates (10 and 1 in the example).]

Phase Approach with Static Dataflow

[Figure: a first static graph (source, Decod1, Ctrl) followed by several alternative static graphs built from a source, Decod2 and a sink, one per configuration; three are drawn, followed by "...".]


Dynamic Dataflow (DDF) [Buck, 93]

[Figure: models of computation placed on an axis from SDF (analysable) to DDF (expressive): MCDF, SPDF, BPDF, SADF, PiMM and KPN.]

- Scenario Aware DataFlow (SADF) [Theelen et al., 06]
- Mode Controlled DataFlow (MCDF) [Moreira et al., 12]
- Schedulable Parametric DataFlow (SPDF) [Fradet et al., 12]
- Parameterized and Interfaced dataflow Meta-Model (PiMM) [Desnos et al., 13]
- Boolean Parametric DataFlow (BPDF) [Bebelis et al., 13]
- Kahn Process Network (KPN) [Kahn, 74]


Schedulable Parametric DataFlow (SPDF) [Fradet et al., 12]

- Model of Computation
- Analysis
- Quasi-Static Scheduling

[Figure: SPDF graph with actors Src, Decod1, Ctrl, Decod2 and Sink. Ctrl sets the parameter p (annotation "set p[1]"), and p appears as a rate next to the fixed rates 10, 100 and 1.]

Parametric DataFlow Format (PaDaF)

[Figure: SPDF graph with actors Src, Decod1, Decod2, Ctrl (set p[1]) and Sink, with rates 10, 100, 1 and p.]

Actor specification

    class Decod : public Actor {
        PortIn<int> in;
        PortOut<int> out;
        ParamIn p;
        void compute() {
            [...]
            out.push(res, p);
        }
    };

Graph specification

    Src src;
    Decod decod[2];
    [...]
    for (int i = 0; i < 2; i++) {
        decod[i].in <= src.out[i];
    }

Front End Implementation

[Figure: front-end flow: PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR.]

SDR Programming Model
- Propose SPDF for SDR
- C++ input format
- [IWCMC 12, IGI 14]

Front End
- Based on LLVM framework
- Derived from SystemC analysis [Marquet et al., 10]
- Static graph structure
- [CASES 14]


SDF Scheduling

[Figure: SDF graph with actors Src, Decod1, Ctrl, Decod2 and Sink and fixed rates 10, 100, 1 and 5.]

Iteration vector: (Src; Decod1; Ctrl; (Decod2)^10; (Sink)^5)
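The iteration vector of the SDF Scheduling slide can be recovered from the SDF balance equations: for every edge, the tokens produced per iteration must equal the tokens consumed. The sketch below is a minimal C++17 solver for a connected graph with positive rates. The edge list used in the usage note is an assumed reading of the figure (Src to Decod1 with rates 10/10, Decod1 to Ctrl 1/1, Src to Decod2 100/10, Decod2 to Sink 5/10), and the names `Edge` and `repetitionVector` are illustrative.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// One SDF edge: `prod` tokens produced per firing of `src`,
// `cons` tokens consumed per firing of `dst`.
struct Edge { int src, dst, prod, cons; };

// Solves r[src] * prod == r[dst] * cons for a connected graph of `n` actors.
// Returns the smallest positive integer solution, or an empty vector if some
// actor is unreachable or the rates are inconsistent.
std::vector<long> repetitionVector(int n, const std::vector<Edge>& edges) {
    if (n <= 0) return {};
    // r[i] is kept as the fraction num[i] / den[i]; num[i] == 0 means "not reached".
    std::vector<long> num(n, 0), den(n, 1);
    num[0] = 1;
    for (bool changed = true; changed;) {
        changed = false;
        for (const Edge& e : edges) {
            int from, to;
            long a, b;
            if (num[e.src] != 0 && num[e.dst] == 0) {
                from = e.src; to = e.dst; a = e.prod; b = e.cons;
            } else if (num[e.dst] != 0 && num[e.src] == 0) {
                from = e.dst; to = e.src; a = e.cons; b = e.prod;
            } else {
                continue;
            }
            long p = num[from] * a, q = den[from] * b;
            long g = std::gcd(p, q);
            num[to] = p / g;
            den[to] = q / g;
            changed = true;
        }
    }
    // Scale the fractions to integers, then reduce by their common divisor.
    long scale = 1;
    for (int i = 0; i < n; ++i) {
        if (num[i] == 0) return {};
        scale = std::lcm(scale, den[i]);
    }
    std::vector<long> r(n);
    long common = 0;
    for (int i = 0; i < n; ++i) {
        r[i] = num[i] * (scale / den[i]);
        common = std::gcd(common, r[i]);
    }
    for (long& x : r) x /= common;
    // Every edge must be balanced, including those not used for propagation.
    for (const Edge& e : edges)
        if (r[e.src] * e.prod != r[e.dst] * e.cons) return {};
    return r;
}
```

With actors numbered Src = 0, Decod1 = 1, Ctrl = 2, Decod2 = 3, Sink = 4 and the edges {0,1,10,10}, {1,2,1,1}, {0,3,100,10}, {3,4,5,10}, the function returns (1, 1, 1, 10, 5), which matches the iteration vector on the slide.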

SPDF Scheduling [Fradet et al., 12]

[Figure: SPDF graph with actors Src, Decod1, Ctrl, Decod2 and Sink; Ctrl sets the parameter p (set p[1]), which appears as a rate alongside 10, 100 and 1.]

Iteration vector: (Src; Decod1; Ctrl; (Decod2)^10; (Sink)^p)


SPDF Mapping

[Figure: the SPDF graph (Src, Decod1, Decod2, Ctrl, Sink) mapped onto four Magali units: demod (DEMOD), arm (ARM), dma1 and dma2 (DMA).]

SPDF Quasi-Static Scheduling

[Figure: the SPDF graph mapped on demod, dma1, dma2 and arm.]

S(dma1) = (Src)
S(arm) = (Ctrl; set(p))
S(demod) = (Decod1; get(p); (Decod2)^10)
S(dma2) = (get(p); (Sink)^p)
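The four schedules of the Quasi-Static Scheduling slide are fixed at compile time except for the repetition count of Sink, which depends on the run-time value of p. The sketch below only expands these schedules into firing sequences for a given p; it is an illustration of the notation, not the compiler's scheduler, and the function name `expandSchedules` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Firing sequence of each processing unit for one iteration, once p is known.
// The unit and actor names follow the schedules S(dma1), S(arm), S(demod), S(dma2).
std::map<std::string, std::vector<std::string>> expandSchedules(unsigned p) {
    std::map<std::string, std::vector<std::string>> s;
    s["dma1"] = {"Src"};
    s["arm"] = {"Ctrl", "set(p)"};

    // S(demod) = (Decod1; get(p); (Decod2)^10): the loop count is a constant.
    s["demod"] = {"Decod1", "get(p)"};
    for (int i = 0; i < 10; ++i) s["demod"].push_back("Decod2");

    // S(dma2) = (get(p); (Sink)^p): the loop count is only known at run time.
    s["dma2"] = {"get(p)"};
    for (unsigned i = 0; i < p; ++i) s["dma2"].push_back("Sink");
    return s;
}
```

For p = 3, dma2 executes get(p) followed by three firings of Sink, while the sequence on demod keeps the same twelve entries whatever the value of p.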

SPDF Symbolic Execution

[Figure: timeline with one row per unit (arm, demod, dma2, dma1) showing Src, Ctrl, D1, (D2)^10 and (Sink)^p over time.]

S(dma1) = (Src)
S(arm) = (Ctrl; set(p))
S(demod) = (Decod1; get(p); (Decod2)^10)
S(dma2) = (get(p); (Sink)^p)

SPDF Buffer Sizing

[Figure: the SPDF graph mapped on demod, dma1, dma2 and arm, annotated with buffer sizes [100], [10], [1] and [10*pmax].]

Problem: overestimates buffer size

e.g. Magali
- FFT size: 2048
- Buffer size: 16

SPDF Model Refinement

[Figure: the SPDF graph mapped on demod, dma1, dma2 and arm, annotated with refined buffer sizes [10], [10], [1] and [pmax].]

    Src::compute() {
        [...]
        out[1].push(ctrl, 10);
        for (int i = 0; i < 10; i++)
            out[2].push(data[i], 10);
    }

Idea: model each individual data communication
- Micro-Scheduling
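The gain from the refinement can be quantified by adding up the buffer sizes annotated on the two figures: [100], [10], [1], [10*pmax] before refinement and [10], [10], [1], [pmax] after. The sketch below does this arithmetic; it assumes the annotations are token counts and that the four buffers are the only ones in the graph, and the function names are illustrative.

```cpp
#include <cassert>

// Total buffer space (in tokens) with actor-level sizing: [100] + [10] + [1] + [10*pmax].
int coarseBufferTokens(int pmax) {
    return 100 + 10 + 1 + 10 * pmax;
}

// Total buffer space (in tokens) after model refinement: [10] + [10] + [1] + [pmax].
int refinedBufferTokens(int pmax) {
    return 10 + 10 + 1 + pmax;
}
```

For a hypothetical pmax of 4, the totals are 151 tokens before refinement and 25 tokens after.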

Micro-Scheduling: an Example

[Figure: timeline with one row per unit (arm, demod, dma2, dma1) showing Src, Ctrl, D1, (D2)^10 and (Sink)^p over time.]

µS(Src) = (push_{Src,D1}(10); (push_{Src,D2}(10))^10)
µS(D2) = (pop_{Src,D2}(10); push_{D2,Sink}(p))
µS(Sink) = ((pop_{D2,Sink}(1))^10)


Buffer Sizing Verification

How to verify buffer sizes using micro-schedules?

Proposed Verification Method
- Based on Model Checking
- Derived from buffer minimization [Geilen et al., 05]

Model
- Schedule
- Buffer sizes
- + Micro-Schedule
- + Parameter values

Model Checker
- SPIN
- Check for deadlocks
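To illustrate what "check for deadlocks" means for given buffer sizes, the sketch below replays the micro-schedules of Src, D2 and Sink on bounded FIFOs and reports whether every unit runs to completion. It is a greedy simulation written for this example, not the SPIN model described on the slide. Several points are assumptions: Decod1 is reduced to a single pop of 10 tokens, Ctrl and the get(p)/set(p) steps are left out, each push or pop of n tokens is treated as atomic, and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Push, Pop };
struct Op { Kind kind; int channel; int tokens; };
using Program = std::vector<Op>;  // sequential micro-schedule of one unit

// Runs every unit as far as it can go, round after round. Returns true if all
// programs finish, false if the units block forever (deadlock).
bool runsToCompletion(const std::vector<Program>& units,
                      const std::vector<int>& capacity) {
    std::vector<int> fill(capacity.size(), 0);
    std::vector<std::size_t> pc(units.size(), 0);
    for (bool progress = true; progress;) {
        progress = false;
        for (std::size_t u = 0; u < units.size(); ++u) {
            while (pc[u] < units[u].size()) {
                const Op& op = units[u][pc[u]];
                int& f = fill[op.channel];
                if (op.kind == Kind::Push) {
                    if (f + op.tokens > capacity[op.channel]) break;  // buffer full
                    f += op.tokens;
                } else {
                    if (f < op.tokens) break;  // not enough tokens yet
                    f -= op.tokens;
                }
                ++pc[u];
                progress = true;
            }
        }
    }
    for (std::size_t u = 0; u < units.size(); ++u)
        if (pc[u] != units[u].size()) return false;
    return true;
}

// Builds one iteration for a given p and checks the three buffer capacities.
bool buffersSuffice(int capSrcD1, int capSrcD2, int capD2Sink, int p) {
    enum { SrcD1, SrcD2, D2Sink };
    Program dma1, demod, dma2;
    dma1.push_back({Kind::Push, SrcD1, 10});           // µS(Src)
    for (int i = 0; i < 10; ++i) dma1.push_back({Kind::Push, SrcD2, 10});
    demod.push_back({Kind::Pop, SrcD1, 10});           // assumed input of Decod1
    for (int i = 0; i < 10; ++i) {                     // (Decod2)^10 with µS(D2)
        demod.push_back({Kind::Pop, SrcD2, 10});
        demod.push_back({Kind::Push, D2Sink, p});
    }
    for (int f = 0; f < p; ++f)                        // (Sink)^p with µS(Sink)
        for (int i = 0; i < 10; ++i) dma2.push_back({Kind::Pop, D2Sink, 1});
    return runsToCompletion({dma1, demod, dma2}, {capSrcD1, capSrcD2, capD2Sink});
}
```

With capacities 10, 10 and pmax = 4, an iteration with p = 4 completes. Shrinking the Src to D2 buffer to 5 tokens, or the D2 to Sink buffer to 3 tokens, blocks a push forever and the function returns false. A model checker such as SPIN explores all interleavings and parameter values, which this single greedy run does not attempt.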

Micro-Scheduling Implementation

[Figure: compilation flow. Front End: PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR. Back End: Mapping, Scheduling, Buffer Verification (SPIN).]

Micro-Scheduling
- SPDF model refinement
- Sequential communications

Buffer Verification
- Model checking
- Model generation
- [CASES 14]


Code Generation

[Figure: code generation from Graph + LLVM IR for the Magali units (OFDM, DEMOD, TURBO, DSP, DMA, ARM, ...), with control and communication code generation steps producing control code (C) and Magali code (ASM).]

Benchmarks using LTE

OFDM: compilation

[Figure: OFDM graph with actors Src, FFT, Defram and Sink mapped on dma1, ofdm1 and dma3; edge rates shown are 7168, 1024, 1024, 1024, 600 and 4200.]

Demodulation: communications

[Figure: demodulation graph with actors Src (two instances), DeinterBit, Depunct, DecodTurbo, Demap, DeinterWord and Sink mapped on dma1, dma2, demod, turbo, dma3 and dma4; edge rates shown are 1200, 900, 300, 1353 and 57.]

Benchmarks using LTE

Parametric Demodulation: parameter

[Figure: parametric demodulation graph with two chains of actors Src, DeinterBit, Depunct, DecodTurbo, Split, Demap and DeinterWord, a Control actor (set p[1]) and a Sink, mapped on dma1, dma2, dma3, dma4, demod, turbo and arm; edge rates shown include 1440, 240, 60, 30, 93, 4, 8, p, 1200, 300p, 300, 1353 and 57.]

Results: Estimated Development Time

Compiler Development
- Front-End: 4 man-months
- Back-End: 8 man-months

Application      Native C / ASM (#lines)   Native (hours)   PaDaF C++ (#lines)   PaDaF (hours)
OFDM             150 / 200                 40               60                   1
Demodulation     300 / 600                 160              160                  4
Param. Demod.    500 / 800                 480              260                  8

Takeaway Message: Reduces development time

Results: Buffer Verification Time

Evaluation framework
- 2.4 GHz Intel Core i5, 8 GB RAM, OS X 10.9.2
- SPIN Model Checker

Application      States        Transitions   Exec. Time (s)
OFDM             1.28 × 10^4   2.56 × 10^4   0.1
Demodulation     2.12 × 10^6   1.07 × 10^7   9
Param. Demod.    6.07 × 10^7   2.22 × 10^8   199

Takeaway Message: Reduces development time, improves verification


Execution Model

[Figure: OFDM graph with actors Src, FFT, Defram and Sink mapped on dma1, ofdm1 and dma3; edge rates shown are 7168, 1024, 1024, 1024, 600 and 4200.]

Phase Approach

[Figure: timeline over dma1, ofdm1, dma3 and arm; durations shown: 25 µs, 37 µs, 16 µs, 21 µs.]

Distributed

[Figure: timeline over dma1, ofdm1, dma3 and arm; durations shown: 25 µs, 74 µs, 25 µs, 23 µs.]

Results: Execution Time

Evaluation framework
- SystemC TLM based on 65 nm CMOS implementation
- ARM code run on QEMU Virtual Machine

Application      Native (µs)   Generated (µs)   Optimized (µs)
OFDM             149           168 (+13%)       149 (+0%)
Demodulation     180           283 (+57%)       180 (+0%)
Param. Demod.    419           558 (+33%)       288 (-31%)

Takeaway Message: Reduces development time, improves verification, maintains performance
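The percentages in the execution time results are relative changes with respect to the native implementation. The short sketch below recomputes them from the times in microseconds; the function name is illustrative, and rounding to the nearest integer is an assumption that happens to reproduce the published figures.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `measured_us` with respect to `native_us`, in percent,
// rounded to the nearest integer.
long relativeChangePercent(double native_us, double measured_us) {
    return std::lround(100.0 * (measured_us - native_us) / native_us);
}
```

For instance, the generated OFDM code takes 168 µs against 149 µs natively, a change of +13%, and the optimized parametric demodulation takes 288 µs against 419 µs, a change of -31%.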

Back End Implementation

[Figure: compilation flow. Front End: PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR. Back End: Mapping, Scheduling, Buffer Verification (SPIN), Code Generation → MPSoC Code (ASM).]

Magali Support
- Computation
- Communication
- Control

LTE Experimentation
- Performance close to native
- Buffer verification
- Central controller
- [ComPAS 14, CASES 14]


Conclusion

- A complete research experience spanning telecommunication, SoC programming and compilation.
- Mickaël Dardaillon is currently working at National Instruments (Austin) on the compilation of parametric dataflow in LabView-FPGA.
- CEA has stopped its activities on Magali (and is, in general, less involved in telecommunication chips because of STMicroelectronics' strategy).
- There are many open questions:
  - How to program FPGA-based SDR machines?
  - How to handle fast dynamic reconfiguration in heterogeneous MPSoCs?

Questions?

[Figure: compilation flow. Front End: PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR. Back End: Mapping, Scheduling, Buffer Verification (SPIN), Code Generation → MPSoC Code (ASM).]

Programming Model
- PaDaF
- Front End

Micro-Scheduling
- Buffer verification
- Model checking

Experimentations
- Magali Back End
- LTE experiments

Perspectives

On dataflow programming
- Compiler
- Runtime

[Figure: the compilation flow, from PaDaF (C++) through the Clang front end, LLVM IR, graph construction, mapping, scheduling, buffer verification (SPIN) and code generation to MPSoC code (ASM).]

On heterogeneous MPSoC
- Future of dedicated platforms
- Development on such platforms

[Figure: data rate (kbps) versus year (1990 to 2010) for 2G, 3G, 4G, Bluetooth and Wi-Fi.]

On data manipulation
- 50% of telecom. protocol
- Complexity abstraction

[Figure: LTE sub-frame grid with regions labelled Control, User 1, User 2, User 3, ... and Data.]