Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs
PhD work of Mickael Dardaillon
Mickaël Dardaillon, Kevin Marquet (Citi), Tanguy Risset (Citi), Jérôme Martin (CEA Leti), Henri-Pierre Charles (CEA List)
June 24th, 2016
Context Programming Model for SDR Micro-Scheduling Experimentations on Magali Conclusion
Evolution of telecommunication protocols
[Figure: data rate (kbps, logarithmic axis from 10 to 1000000) against year (1990 to 2010) for 2G, 3G, 4G, Bluetooth and Wi-Fi; a second build of the slide overlays the label SDR on the plot.]
4G LTE-Advanced: Downlink
[Figure: 1 frame (10 ms) made of sub-frames 0 to 9, 1 sub-frame lasting 1 ms. Each sub-frame is a grid of 14 OFDM symbols by 2048 subcarriers (20 MHz), with a Control region followed by a Data region shared between User 1, User 2, User 3, ...]
- MIMO: 4 × 2 antennas
- LTE throughput: 1.4 Gbps
- LTE-Advanced: 7 Gbps
- Latency: 2 ms
- Power budget: 500 mW
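The grid dimensions on this slide determine how many frequency-domain samples a receiver has to process. The sketch below is not part of the original slides: it only multiplies out the figures shown (2048 subcarriers, 14 OFDM symbols per 1 ms sub-frame, 10 sub-frames per frame). The constant and function names are illustrative, and the number of antenna streams is left as an argument because the slide does not say which side of the 4 × 2 MIMO link is the receiver.

```cpp
#include <cassert>

// Figures read from the slide.
constexpr long kSubcarriers = 2048;         // 20 MHz channel
constexpr long kSymbolsPerSubframe = 14;    // OFDM symbols in one sub-frame
constexpr long kSubframesPerFrame = 10;     // one frame lasts 10 ms
constexpr long kSubframesPerSecond = 1000;  // one sub-frame lasts 1 ms

// Frequency-domain samples in one sub-frame, summed over `antennas` streams.
long samples_per_subframe(long antennas) {
    assert(antennas > 0);
    return kSubcarriers * kSymbolsPerSubframe * antennas;
}

// Same count over one 10 ms frame.
long samples_per_frame(long antennas) {
    return samples_per_subframe(antennas) * kSubframesPerFrame;
}

// Sustained rate the baseband has to keep up with.
long samples_per_second(long antennas) {
    return samples_per_subframe(antennas) * kSubframesPerSecond;
}
```

For a single antenna stream this gives 28672 samples per sub-frame, that is 28.672 million samples per second, which gives an order of magnitude for the work that has to fit in the latency and power budgets listed on the slide.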
Context
What is an SDR software?
Baseband processing in software
- ZigBee
- . . .
- LTE Advanced
Constraints
- Computing power ∼ GFLOPS
- Reconfiguration time < 100 µs
- Consumption < 500 mW
Architecture independent SDR software
[Figure: receiver block diagram with the blocks RF Frontend 1, AGC + synchronization, FFT, CFO estimation, CFO correction, channel estimation, RF Frontend 2, FFT, CFO correction, MIMO decoding, Demodulation, Deinterleaving, Error correction.]
Context
What is an SDR software?
What is an SDR hardware platform?
- EVP16?
  - VLIW
  - Vector Processor
- SB3500?
  - DSP
  - Control Processor
- Magali?
  - Configurable Units
  - NoC
- . . .
⇒ No unified hardware platform model for SDR.
Problem Statement: how to program and compile a telecommunication protocol to a heterogeneous MPSoC?
[Figure: page excerpt from the paper "Vector Processing for Software-Defined Radio" illustrating EVP16, with Figure 6 (a generic vector-processor architecture), Figure 7 (the OnDSP architecture) and Figure 8 (the EVP architecture).]
[Figure: page excerpt from IEEE Signal Processing Magazine, March 2010, p. 26, illustrating SB3500, with FIG5 (Sandbridge SB3500 platform architecture) and FIG4 (IMEC's ADRES processor in the BEAR platform).]
[Figure: Magali platform units: OFDM (ofdm1 to ofdm4), TURBO (turbo), DEMOD (demod), MOD (mod), LDPC (ldpc), WIFLEX (wiflex), ARM (arm), 8051, DMA (dma1 to dma5) and DSP (dsp1 to dsp5).]
Magali SDR
[Figure: Magali chip, built up over successive slides: DSP (dsp1 to dsp5), then OFDM (ofdm1 to ofdm4), TURBO (turbo), DEMOD (demod), MOD (mod), LDPC (ldpc) and WIFLEX (wiflex), then DMA (dma1 to dma5).]
LTE demonstrator [Clermidy et al., 09]
Power consumption: 231 mW
Outline
Context
  SDR software?
Programming Model for SDR
  Dataflow Model of Computation
  Input Format
Dataflow Refinement and Buffer Verification
  Mapping and Scheduling
  Micro-Scheduling
Experimentations on Magali
  Code Generation
  Experimental Results
Conclusion
State of the Art in SDR Programming

Imperative Concurrent
Platform                                 Language
ExoCHI [Wang et al., 07]                 OpenMP + C
BEAR [Derudder et al., 09]               Matlab + C

Dataflow
Platform                                 Language
Simulink
LabView
GNU Radio                                Python + C
RVC-CAL [Lucarz et al., 08]              XML + C
DiplodocusDF [Gonzalez-Pina et al., 12]  UML
MAPS [Castrillon et al., 13]             C like
Static Dataflow (SDF) [Lee et al., 87]
[Figure: SDF graph Src1 → Decod1 → Ctrl. Src1 produces 10 tokens per firing and Decod1 consumes 10; Decod1 produces 1 token and Ctrl consumes 1.]
Phase Approach with Static Dataflow
[Figure: a first static graph (Src1 → Decod1 → Ctrl, with rates 10, 10, 1, 1) followed by one static Src2 → Decod2 → Sink graph per configuration; three variants are drawn, followed by "...", each annotated with the rates 100 and 10.]
Dynamic Dataflow (DDF) [Buck, 93]
[Figure: axis from SDF (Analysable) to DDF (Expressive), on which KPN, MCDF, SPDF, BPDF, SADF and PiMM are placed.]
Scenario Aware DataFlow (SADF) [Theelen et al., 06]
Mode Controlled DataFlow (MCDF) [Moreira et al., 12]
Schedulable Parametric DataFlow (SPDF) [Fradet et al., 12]
Parameterized and Interfaced dataflow Meta-Model (PiMM) [Desnos et al., 13]
Boolean Parametric DataFlow (BPDF) [Bebelis et al., 13]
Kahn Process Network (KPN) [Kahn, 74]
Schedulable Parametric DataFlow (SPDF) [Fradet et al., 12]
[Figure: SPDF graph. Src sends 10 tokens to Decod1 (which consumes 10) and 100 tokens to Decod2 (which consumes 10). Decod1 sends 1 token to Ctrl (which consumes 1), and Ctrl sets the parameter p (set p[1]). Decod2 produces p tokens per firing for Sink, which consumes 10.]
- Model of Computation
- Analysis
- Quasi-Static Scheduling
Parametric DataFlow Format (PaDaF)
[Figure: the SPDF graph with actors Src, Decod1, Decod2, Ctrl (set p[1]) and Sink, annotated with the rates 10, 1, 100 and p.]
Actor specification

class Decod : public Actor {
  PortIn<int> in;
  PortOut<int> out;
  ParamIn p;
  void compute() {
    [...]
    out.push(res, p);

Graph specification

Src src;
Decod decod[2];
[...]
for (int i = 0; i < 2; i++) {
  decod[i].in <= src.out[i];
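The two fragments above are cut off on the slide and rely on library types that the deck does not show. As a rough illustration only, the sketch below supplies hypothetical stand-ins for `Actor`, `PortIn`, `PortOut` and `ParamIn` so that the same syntax (`out.push(res, p)` and `in <= out`) compiles and runs. Only the type names and the call syntax come from the slide; every implementation detail, the body of `compute()`, and the helper `run_once` are assumptions, not the PaDaF library.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Hypothetical stand-ins: only the type names appear on the slide.
struct Actor {
    virtual ~Actor() = default;
    virtual void compute() = 0;
};

template <typename T>
struct PortOut {
    std::shared_ptr<std::deque<T>> fifo = std::make_shared<std::deque<T>>();
    // Push `n` tokens read from `data`, mirroring out.push(res, p).
    void push(const T* data, std::size_t n) { fifo->insert(fifo->end(), data, data + n); }
};

template <typename T>
struct PortIn {
    std::shared_ptr<std::deque<T>> fifo;
    // Bind this input to a producer, written `in <= out` as on the slide.
    void operator<=(PortOut<T>& producer) { fifo = producer.fifo; }
    void pop(T* data, std::size_t n) {
        assert(fifo && fifo->size() >= n);
        for (std::size_t i = 0; i < n; ++i) { data[i] = fifo->front(); fifo->pop_front(); }
    }
};

struct ParamIn {
    std::size_t value = 1;
    operator std::size_t() const { return value; }
};

class Decod : public Actor {
public:
    PortIn<int> in;
    PortOut<int> out;
    ParamIn p;
    void compute() override {
        int res[10];
        in.pop(res, 10);         // placeholder body: forward the first p tokens
        assert(p.value <= 10);
        out.push(res, p);
    }
};

class Src : public Actor {
public:
    PortOut<int> out[2];
    void compute() override {
        int data[10];
        for (int i = 0; i < 10; ++i) data[i] = i;
        for (auto& port : out) port.push(data, 10);
    }
};

// Build the two-decoder graph, fire each actor once, and return how many
// tokens each decoder left on its output.
std::vector<std::size_t> run_once(std::size_t p) {
    Src src;
    Decod decod[2];
    for (int i = 0; i < 2; i++) {
        decod[i].in <= src.out[i];
        decod[i].p.value = p;
    }
    src.compute();
    std::vector<std::size_t> produced;
    for (auto& d : decod) { d.compute(); produced.push_back(d.out.fifo->size()); }
    return produced;
}
```

In this sketch the graph is built by ordinary C++ code, which matches the idea of a C++ input format: the loop that connects ports is plain C++, and the number of tokens pushed depends on the parameter `p`.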
Front End Implementation
[Figure: front-end flow. PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR.]
SDR Programming Model
- Propose SPDF for SDR
- C++ input format
- [IWCMC 12, IGI 14]
Front End
- Based on LLVM framework
- Derived from SystemC analysis [Marquet et al., 10]
- Static graph structure
- [CASES 14]
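SDF Scheduling
[Figure: SDF graph. Src sends 10 tokens to Decod1 (which consumes 10) and 100 tokens to Decod2 (which consumes 10); Decod1 sends 1 token to Ctrl (which consumes 1); Decod2 produces 5 tokens per firing for Sink, which consumes 10.]
Iteration vector: (Src; Decod1; Ctrl; (Decod2)^{10}; (Sink)^{5})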
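The firing counts in the iteration vector come from the SDF balance equations: for every edge, the number of tokens produced per iteration must equal the number consumed. A minimal sketch of that computation follows, assuming a connected graph; the `Edge` type, the function names and the edge list in `slide_graph` (reconstructed from the rates that survive on the slide) are illustrative and not taken from the author's compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One edge: `src` produces `prod` tokens per firing, `dst` consumes `cons`.
struct Edge { std::size_t src, dst; long prod, cons; };

static long gcd_of(long a, long b) {
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

// Smallest positive integer solution of r[src] * prod == r[dst] * cons,
// or an empty vector if the graph is disconnected or the rates are inconsistent.
std::vector<long> repetition_vector(std::size_t actors, const std::vector<Edge>& edges) {
    assert(actors > 0);
    // r[a] is held as num[a] / den[a]; den[a] == 0 means "not reached yet".
    std::vector<long> num(actors, 0), den(actors, 0);
    num[0] = 1; den[0] = 1;
    for (bool changed = true; changed;) {
        changed = false;
        for (const Edge& e : edges) {
            assert(e.src < actors && e.dst < actors && e.prod > 0 && e.cons > 0);
            const bool from_src = den[e.src] != 0 && den[e.dst] == 0;
            const bool from_dst = den[e.dst] != 0 && den[e.src] == 0;
            if (!from_src && !from_dst) continue;
            const std::size_t known = from_src ? e.src : e.dst;
            const std::size_t fresh = from_src ? e.dst : e.src;
            // r[dst] = r[src] * prod / cons, or the inverse when walking backwards.
            const long n = num[known] * (from_src ? e.prod : e.cons);
            const long d = den[known] * (from_src ? e.cons : e.prod);
            const long g = gcd_of(n, d);
            num[fresh] = n / g;
            den[fresh] = d / g;
            changed = true;
        }
    }
    long scale = 1;  // least common multiple of all denominators
    for (std::size_t a = 0; a < actors; ++a) {
        if (den[a] == 0) return {};
        scale = scale / gcd_of(scale, den[a]) * den[a];
    }
    std::vector<long> r(actors);
    long common = 0;
    for (std::size_t a = 0; a < actors; ++a) {
        r[a] = num[a] * (scale / den[a]);
        common = gcd_of(common, r[a]);
    }
    for (long& x : r) x /= common;
    for (const Edge& e : edges)
        if (r[e.src] * e.prod != r[e.dst] * e.cons) return {};  // inconsistent rates
    return r;
}

// Assumed reading of the slide. Actor indices: 0 Src, 1 Decod1, 2 Ctrl, 3 Decod2, 4 Sink.
std::vector<Edge> slide_graph() {
    return {{0, 1, 10, 10}, {1, 2, 1, 1}, {0, 3, 100, 10}, {3, 4, 5, 10}};
}
```

On this reading of the graph, `repetition_vector(5, slide_graph())` gives one firing each for Src, Decod1 and Ctrl, 10 for Decod2 and 5 for Sink, which agrees with the iteration vector on the slide.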
SPDF Scheduling [Fradet et al., 12]
[Figure: SPDF graph. Src sends 10 tokens to Decod1 (which consumes 10) and 100 tokens to Decod2 (which consumes 10); Decod1 sends 1 token to Ctrl, which sets the parameter p (set p[1]); Decod2 produces p tokens per firing for Sink, which consumes 10.]
Iteration vector: (Src; Decod1; Ctrl; (Decod2)^{10}; (Sink)^{p})
SPDF Mapping
[Figure: the SPDF graph (Src, Decod1, Decod2, Ctrl with set p[1], Sink) mapped onto the Magali units DMA dma1, ARM arm, DEMOD demod and DMA dma2.]
SPDF Quasi-Static Scheduling
[Figure: the SPDF graph (Src, Decod1, Decod2, Ctrl with set p[1], Sink) with its mapping onto dma1, arm, demod and dma2.]
S(dma1) = (Src)
S(arm) = (Ctrl; set(p))
S(demod) = (Decod1; get(p); (Decod2)^{10})
S(dma2) = (get(p); (Sink)^{p})
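In a quasi-static schedule the order of firings on each unit is fixed at compile time and only the repetition counts that depend on p are resolved once p has been set. The sketch below expands the four per-unit schedules above for a concrete value of p. It is an illustration only: the string encoding of firings and the function name `expand_schedules` are assumptions, and real generated code would be loops on each unit rather than lists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Schedule = std::vector<std::string>;

// Expand S(dma1), S(arm), S(demod) and S(dma2) for a known value of p.
// set(p) and get(p) mark where the parameter is sent and received.
std::map<std::string, Schedule> expand_schedules(int p) {
    assert(p >= 0);
    std::map<std::string, Schedule> s;

    s["dma1"] = {"Src"};
    s["arm"] = {"Ctrl", "set(p)"};

    Schedule demod = {"Decod1", "get(p)"};
    demod.insert(demod.end(), std::size_t{10}, std::string("Decod2"));  // (Decod2)^10
    s["demod"] = demod;

    Schedule dma2 = {"get(p)"};
    dma2.insert(dma2.end(), static_cast<std::size_t>(p), std::string("Sink"));  // (Sink)^p
    s["dma2"] = dma2;

    return s;
}
```

For p = 3, the schedule of dma2 becomes get(p) followed by three Sink firings, while the schedules of dma1, arm and demod are the same for every value of p.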
SPDF Symbolic Execution
[Figure: timeline per unit. dma1 runs Src, arm runs Ctrl, demod runs D1 then (D2)^{10}, dma2 runs (Sink)^{p}.]
S(dma1) = (Src)
S(arm) = (Ctrl; set(p))
S(demod) = (Decod1; get(p); (Decod2)^{10})
S(dma2) = (get(p); (Sink)^{p})
SPDF Buffer Sizing
[Figure: the mapped SPDF graph annotated with the buffer sizes [100], [10], [1] and [10*pmax].]
Problem: overestimates buffer size
e.g. Magali
- FFT size: 2048
- Buffer size: 16
SPDF Model Refinement
[Figure: the mapped SPDF graph annotated with the refined buffer sizes [10], [10], [1] and [pmax].]
Src::compute() {
  [...]
  out[1].push(ctrl, 10);
  for (int i = 0; i < 10; i++)
    out[2].push(data[i], 10);
}
Idea: model each individual data communication
- Micro-Scheduling
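One reading that reproduces the buffer sizes in the deck is the following: the coarse bound reserves room for every token an edge carries during one iteration ([100] for 1 firing of Src pushing 100 tokens, [10*pmax] for 10 firings of Decod2 pushing up to pmax tokens each), while the refined bound only needs room for the largest single push ([10] and [pmax]). This interpretation is an assumption, as are the function names below; the slides give the sizes but not the formula.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Coarse bound (assumed): room for all tokens produced on the edge in one iteration.
long coarse_capacity(long tokens_per_firing, long producer_firings) {
    assert(tokens_per_firing > 0 && producer_firings > 0);
    return tokens_per_firing * producer_firings;
}

// Refined bound (assumed): room for the largest individual push on the edge.
long refined_capacity(const std::vector<long>& push_sizes) {
    assert(!push_sizes.empty());
    return *std::max_element(push_sizes.begin(), push_sizes.end());
}
```

With a hypothetical pmax of 8, the Decod2 to Sink edge needs 80 tokens under the coarse bound and 8 under the refined one; the Src to Decod2 edge drops from 100 to 10 because Src issues ten pushes of 10 tokens. Whether such reduced sizes are actually sufficient is what the buffer verification step of the deck is meant to check.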
Micro-Scheduling: an Example
[Figure: timeline per unit. dma1 runs Src, arm runs Ctrl, demod runs D1 then (D2)^{10}, dma2 runs (Sink)^{p}.]
µS(Src) = (push_{Src,D1}(10); (push_{Src,D2}(10))^{10})
µS(D2) = (pop_{Src,D2}(10); push_{D2,Sink}(p))
µS(Sink) = ((pop_{D2,Sink}(1))^{10})
Buffer Sizing Verification
How to verify buffer sizes using micro-schedules?
Proposed Verification Method
- Based on Model Checking
- Derived from buffer minimization [Geilen et al., 05]
Model
- Schedule
- Buffer sizes
+ Micro-Schedule
+ Parameter values
Model Checker
- SPIN
- Check for deadlocks
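The deck performs this check with the SPIN model checker. As a much simpler stand-in, the sketch below executes the micro-schedules directly against bounded buffers and reports whether every actor finishes. This works for the example because each buffer has one producer and one consumer and each actor follows a fixed sequence, so running an enabled operation never disables another one; it is not a replacement for model checking in general. The types, the helper names and the micro-schedule of Decod1 (a single pop of 10 tokens) are assumptions; the micro-schedules of Src, D2 and Sink follow the example slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One communication of a micro-schedule: push or pop `count` tokens on a channel.
struct Op { bool is_push; std::size_t channel; long count; };
using MicroSchedule = std::vector<Op>;

Op push(std::size_t channel, long count) { return Op{true, channel, count}; }
Op pop(std::size_t channel, long count) { return Op{false, channel, count}; }

// Repeat a micro-schedule body `times` times, e.g. (pop(1))^10.
MicroSchedule repeat(const MicroSchedule& body, long times) {
    assert(times >= 0);
    MicroSchedule out;
    for (long i = 0; i < times; ++i) out.insert(out.end(), body.begin(), body.end());
    return out;
}

// Run all actors until none can advance; true if every actor reached its end.
bool completes_without_deadlock(const std::vector<MicroSchedule>& actors,
                                const std::vector<long>& capacity) {
    std::vector<long> fill(capacity.size(), 0);
    std::vector<std::size_t> pc(actors.size(), 0);
    for (bool progress = true; progress;) {
        progress = false;
        for (std::size_t a = 0; a < actors.size(); ++a) {
            while (pc[a] < actors[a].size()) {
                const Op& op = actors[a][pc[a]];
                assert(op.channel < capacity.size() && op.count > 0);
                long& f = fill[op.channel];
                if (op.is_push) {
                    if (f + op.count > capacity[op.channel]) break;  // blocked: buffer full
                    f += op.count;
                } else {
                    if (f < op.count) break;                          // blocked: not enough tokens
                    f -= op.count;
                }
                ++pc[a];
                progress = true;
            }
        }
    }
    for (std::size_t a = 0; a < actors.size(); ++a)
        if (pc[a] != actors[a].size()) return false;
    return true;
}

// One iteration of the example for a given p and Decod2 -> Sink buffer size.
bool slide_example_ok(long p, long d2_sink_capacity) {
    assert(p > 0);
    const std::size_t SRC_D1 = 0, SRC_D2 = 1, D2_SINK = 2;
    MicroSchedule src = {push(SRC_D1, 10)};
    const MicroSchedule to_d2 = repeat({push(SRC_D2, 10)}, 10);
    src.insert(src.end(), to_d2.begin(), to_d2.end());
    const MicroSchedule d1 = {pop(SRC_D1, 10)};  // assumed: not given on the slide
    const MicroSchedule d2 = repeat({pop(SRC_D2, 10), push(D2_SINK, p)}, 10);
    const MicroSchedule sink = repeat({pop(D2_SINK, 1)}, 10 * p);
    return completes_without_deadlock({src, d1, d2, sink}, {10, 10, d2_sink_capacity});
}
```

With p = 4, a Decod2 to Sink buffer of 4 tokens lets the iteration complete, whereas a buffer of 3 tokens blocks Decod2 on its first push and the run is reported as deadlocked.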
Micro-Scheduling Implementation
[Figure: compilation flow. Front End: PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR. Back End: Mapping, Scheduling, Buffer Verification (SPIN).]
Micro-Scheduling
- SPDF model refinement
- Sequential communications
Buffer Verification
- Model checking
- Model generation
- [CASES 14]
Code Generation
[Figure: starting from Graph + LLVM IR, separate code-generation steps (labelled control, communication and ARM) target the Magali units (OFDM ofdm1 to ofdm4, DMA dma1 to dma5, DSP dsp1 to dsp5, ARM, 8051, LDPC, WIFLEX, TURBO, MOD, DEMOD) and emit Control code (C) and Magali code (ASM).]
Benchmarks using LTE
OFDM: compilation
[Figure: dataflow graph with actors Src, FFT, Defram and Sink on units dma1, ofdm1 and dma3; rates shown: 7168, 1024, 1024, 1024, 4200, 600.]
Demodulation: communications
[Figure: dataflow graph with two Src actors and the actors Demap, DeinterWord, DeinterBit, Depunct, DecodTurbo and Sink on units dma1, dma2, demod, dma3, turbo and dma4; rates shown: 1200, 900, 300, 1353, 57.]
Benchmarks using LTE
Parametric Demodulation: parameter
[Figure: dataflow graph with two chains of actors (Src, Split, Demap, DeinterWord, DeinterBit, Depunct, DecodTurbo), a Control actor (set p[1]) and a Sink, on units dma1, dma2, demod, dma3, turbo, dma4 and arm; rates shown include 1440, 240, 60, 30, 93, 4, 8, p, 1200, 300p, 300, 1353 and 57.]
Results: Estimated Development Time
Compiler Development
- Front-End: 4 man-months
- Back-End: 8 man-months

Application     Native C / ASM (#lines)  Native (hours)  PaDaF C++ (#lines)  PaDaF (hours)
OFDM            150 / 200                40              60                  1
Demodulation    300 / 600                160             160                 4
Param. Demod.   500 / 800                480             260                 8

Takeaway Message: Reduces development time
Results: Buffer Verification Time
Evaluation framework
- 2.4 GHz Intel Core i5, 8 GB RAM, OS X 10.9.2
- SPIN Model Checker

Application     States        Transitions   Exec. Time (s)
OFDM            1.28 × 10^4   2.56 × 10^4   0.1
Demodulation    2.12 × 10^6   1.07 × 10^7   9
Param. Demod.   6.07 × 10^7   2.22 × 10^8   199

Takeaway Message: Reduces development time, improves verification
Results: Execution Time
Evaluation framework
- SystemC TLM based on 65 nm CMOS implementation
- ARM code run on QEMU Virtual Machine

Application     Native (µs)  Generated (µs)  Optimized (µs)
OFDM            149          168 (+13%)      149 (+0%)
Demodulation    180          283 (+57%)      180 (+0%)
Param. Demod.   419          558 (+33%)      288 (-31%)

Takeaway Message: Reduces development time, improves verification, maintains performance
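The percentages in this table are relative differences against the native implementation. The short sketch below recomputes them from the timings on the slide; the function name is illustrative and rounding to the nearest percent is an assumption about how the figures were obtained.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `measured_us` against `reference_us`, in whole percent.
long overhead_percent(double reference_us, double measured_us) {
    assert(reference_us > 0.0);
    return std::lround(100.0 * (measured_us - reference_us) / reference_us);
}
```

For the parametric demodulation, `overhead_percent(419, 558)` is 33 and `overhead_percent(419, 288)` is -31, so the optimized generated code is faster than the native version for that benchmark.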
Execution Model
[Figure: the OFDM graph (Src, FFT, Defram, Sink on dma1, ofdm1 and dma3; rates 7168, 1024, 1024, 1024, 4200, 600) and two timelines over dma1, ofdm1, dma3 and arm. Phase Approach: durations 25 µs, 37 µs, 16 µs and 21 µs. Distributed: durations 25 µs, 74 µs, 25 µs and 23 µs.]
Back End Implementation
[Figure: compilation flow. Front End: PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR. Back End: Mapping, Scheduling, Buffer Verification (SPIN), Code Generation → MPSoC Code (ASM).]
Magali Support
- Computation
- Communication
- Control
LTE Experimentation
- Performance close to native
- Buffer verification
- Central controller
- [ComPAS 14, CASES 14]
Conclusion
- A complete research experience between Telecommunication, SoC programming and Compilation.
- Mickael Dardaillon is currently working at National Instruments (Austin) on the compilation of Parametric Dataflow in LabView-FPGA.
- CEA has stopped the activities on Magali (and is, in general, less involved in telecommunication chips because of ST-microelectronics strategy).
- There are many open questions:
  - How to program FPGA-based SDR machines?
  - How to handle fast dynamic reconfiguration in heterogeneous MP-SoC?
Question?
[Figure: compilation flow. Front End: PaDaF (C++) → C++ Front End (Clang) → LLVM IR → Graph Construction → Graph + LLVM IR. Back End: Mapping, Scheduling, Buffer Verification (SPIN), Code Generation → MPSoC Code (ASM).]
Programming Model
- PaDaF
- Front End
Micro-Scheduling
- Buffer verification
- Model checking
Experimentations
- Magali Back End
- LTE experiments
Perspectives
On dataflow programming
- Compiler
- Runtime
[Figure: the compilation flow, from PaDaF (C++) to MPSoC Code (ASM).]
On heterogeneous MPSoC
- Future of dedicated platforms
- Development on such platforms
[Figure: data rate (kbps) against year (1990 to 2010) for 2G, 3G, 4G, Bluetooth and Wi-Fi.]
On data manipulation
- 50% of telecom. protocol
- Complexity abstraction
[Figure: sub-frame layout with a Control region and a Data region shared between User 1, User 2, User 3, ...]