A Flexible Interconnection Structure
for Reconfigurable FPGA Dataflow Applications
Gianluca Durelli, Alessandro A. Nacci, Riccardo Cattaneo, Christian Pilato, Donatella Sciuto and Marco Domenico Santambrogio
Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria
Milano, IT
20th Reconfigurable Architectures Workshop, May 20-21, 2013, Boston, USA
Rationale
• Strive for performance in compute-intensive applications
• Reconfigurable HW is well suited for certain classes of applications
  – Multimedia, computational biology, physical simulation
• FPGAs are used in HPC systems
• High maintenance costs
  – Need to share resources among users
• Need to dynamically share and reuse components on the FPGA among different users
Outline
• Goals
• State of the Art
• Proposed Solution
• Design and Evaluation
• Case Study
• Conclusions and Future Work
Goals
• Design an interconnection able to:
  – Create different pipelines by reusing the components available on the FPGA
  – Share resources among different applications
  – Avoid inserting any stall in the pipeline
• Target: FPGAs in HPC scenarios
State of the Art
• Bus interconnection
  – Congestion problem
  – Does not scale
• Network on Chip
  – Possible congestion problem
  – Good scalability
• Both introduce unexpected delays in the computation
  – Performance cannot be guaranteed when the device is shared among different users
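To illustrate why a shared medium makes per-user latency unpredictable, here is a minimal Python sketch under simplifying assumptions that are not taken from the slides: a bus that moves one data item per cycle and serves pending users round-robin, compared with dedicated point-to-point links that each move one item per cycle. The function names are hypothetical.

```python
def shared_bus_finish(items_per_user):
    """Cycle at which each user's last item crosses a round-robin shared bus.

    Simplified model: the whole bus performs one transfer per cycle.
    """
    remaining = list(items_per_user)
    finish = [0] * len(remaining)
    cycle = 0
    while any(remaining):
        for user, left in enumerate(remaining):
            if left:
                remaining[user] -= 1
                cycle += 1
                if remaining[user] == 0:
                    finish[user] = cycle
    return finish


def point_to_point_finish(items_per_user):
    """Dedicated link per user, one item per cycle each: users never interfere."""
    return list(items_per_user)


# The same 4-item stream finishes at cycle 4 when alone on the bus,
# but only at cycle 7 when a second user shares it.
print(shared_bus_finish([4]), shared_bus_finish([4, 4]))  # [4] [7, 8]
print(point_to_point_finish([4, 4]))                      # [4, 4]
```

A user's completion time on the bus depends on how much traffic the other users generate, which is exactly the kind of delay that cannot be predicted when the device is shared.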
Proposed Solution
• Switch-based interconnection
  – Core inputs connected to the interconnection outputs
  – Core outputs connected to the interconnection inputs
  – Fully pipelined point-to-point communication
• Data are read/written only when all the inputs are available
• Configured by setting, for each input and output channel:
  – Switching configuration
    • Multiplexer configuration to route information
  – The clock cycle from which the channel is active
  – How much data has to be read/written through that channel
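The per-channel configuration above can be modelled in a few lines. This is a hypothetical Python sketch, not the authors' Maxeler implementation: `ChannelConfig`, `route` and `can_fire` are illustrative names, and it assumes a fully pipelined channel moves one data item per clock cycle, so its active window spans `data_count` cycles starting at `start_cycle`.

```python
from dataclasses import dataclass


@dataclass
class ChannelConfig:
    """Hypothetical configuration of one interconnection output channel."""
    mux_select: int   # switching configuration: which interconnection input is routed here
    start_cycle: int  # clock cycle from which the channel is active
    data_count: int   # how many data items flow through the channel

    def active(self, cycle: int) -> bool:
        # Assumes one item per cycle, so the window is [start, start + count).
        return self.start_cycle <= cycle < self.start_cycle + self.data_count


def route(inputs, outputs, cycle):
    """One crossbar step: every active output copies the input its multiplexer selects."""
    return [inputs[cfg.mux_select] if cfg.active(cycle) else None for cfg in outputs]


def can_fire(core_inputs):
    """A core reads (and writes) only when all of its inputs hold valid data."""
    return all(x is not None for x in core_inputs)


# Output 0 forwards input 2 during cycles 0-2; output 1 forwards input 0 during cycles 1-2.
outputs = [ChannelConfig(2, 0, 3), ChannelConfig(0, 1, 2)]
for cycle in range(4):
    routed = route(["a", "b", "c"], outputs, cycle)
    print(cycle, routed, can_fire(routed))
```

A core fed by both outputs fires only in cycles 1 and 2, when both channels carry data, which mirrors the rule that data are read only when all inputs are available.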
• Suited for dataflow/pipelined applications
• Parameters can be extracted from a high-level description of the application and of the pipeline structure
  – Possibility to automate the parameter extraction and the interconnection design
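As an illustration of how the parameters could be derived automatically, the sketch below walks a linear pipeline description and emits one (multiplexer select, start cycle, data count) triple per channel. It is a hypothetical Python sketch: it assumes each core k drives interconnection input k, the host stream enters on a given port, each core's latency in cycles is known, and data flow at one item per cycle. None of these names come from the original work.

```python
def extract_config(pipeline, latency, n_items, host_port):
    """Derive (mux_select, start_cycle, data_count) for each channel of a linear pipeline.

    pipeline  -- ordered list of core ids, e.g. [1, 0, 2]
    latency   -- dict: core id -> pipeline depth in clock cycles (assumed known)
    n_items   -- length of the input data stream
    host_port -- interconnection input port carrying the host stream
    """
    config = {}
    source, start = host_port, 0
    for core in pipeline:
        # The core's input channel selects the previous producer and becomes
        # active when the first item from that producer arrives.
        config[core] = (source, start, n_items)
        source = core            # assumption: core k drives interconnection input k
        start += latency[core]   # first result leaves the core after its latency
    # Channel that returns the final stream to the host.
    config["host_out"] = (source, start, n_items)
    return config


cfg = extract_config([1, 0, 2], latency={0: 4, 1: 2, 2: 3}, n_items=100, host_port=7)
for channel, params in cfg.items():
    print(channel, params)
```

Changing the order of the list is enough to obtain a different pipeline from the same cores, which is the reuse the interconnection is designed for.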
Implementation
• Solution implemented with HLS
  – HLS is well suited for dataflow/stencil loop synthesis
  – Simplifies HW development
  – Generates compatible interfaces
• Maxeler Technologies
  – HPC dataflow computing exploiting FPGAs
  – Proprietary HLS starting from a Java-like description
    • The proposed interconnection is easily described in Java
• MaxWorkstation 3A
  – Intel i7 quad-core
  – Xilinx Virtex-6 XC6VSX475T
  – PCIe communication
    • Maximum 8 channels/streams
Evaluation: Area Occupation
• Area increase (10-30%) due to the additional switching logic
• The interconnection occupies up to 6% of the FPGA
  – Plenty of space remains for user cores
Evaluation: Frequency
• Tested with pass-through cores to evaluate the maximum working frequency of the interconnection (300 MHz)
• In real-life applications (a brain network with cores running at 200 MHz), the interconnection does not affect the critical path
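The critical-path argument reduces to a one-line calculation. The Python sketch below assumes (this is not stated on the slide) that the interconnection and the cores share a single clock domain, so the design runs at the lowest maximum frequency among its components; the function names are hypothetical.

```python
def system_clock_mhz(f_interconnect_mhz, core_freqs_mhz):
    """Single clock domain: the slowest component sets the clock."""
    return min([f_interconnect_mhz, *core_freqs_mhz])


def interconnect_limits_clock(f_interconnect_mhz, core_freqs_mhz):
    """True only if the interconnection is slower than every user core."""
    return f_interconnect_mhz < min(core_freqs_mhz)


# Figures from the slide: 300 MHz interconnection, 200 MHz brain-network cores.
print(system_clock_mhz(300, [200]))           # 200
print(interconnect_limits_clock(300, [200]))  # False
```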
Case Study
• Application
  – Image processing pipeline (up to 4 stages)
    • Gray scale (GS), Gaussian blur (GB) and edge detection (ED) filters
    • Their combinations
• Tested architectures:
• Experiments
  – Single execution of an N-stage pipeline
  – Batch execution of a workload of 100 random applications
[Figure: architectures (A), (B), (C), (D)]
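A software reference model of the case-study filters can help check the output of a hardware pipeline. The Python sketch below is an assumption, not the filters used in the paper: GS uses ITU-R BT.601 luma weights, GB a 3×3 binomial approximation of a Gaussian, and ED the Sobel gradient magnitude |Gx| + |Gy|, all with clamp-to-edge borders. Images are lists of rows, and the hypothetical `run_pipeline` applies any ordered combination of stages, mirroring how the interconnection chains cores.

```python
def gray_scale(img):
    """GS: RGB tuples -> luminance (BT.601 weights); already-gray pixels pass through."""
    return [[0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2] if isinstance(p, tuple) else p
             for p in row] for row in img]


def _filter3x3(img, k):
    """3x3 correlation with clamp-to-edge border handling."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(k[dy + 1][dx + 1] * px(y + dy, x + dx)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def gaussian_blur(img):
    """GB: 3x3 binomial approximation of a Gaussian, normalised by 16."""
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    return [[v / 16 for v in row] for row in _filter3x3(img, k)]


def edge_detection(img):
    """ED: Sobel gradient magnitude approximated as |Gx| + |Gy|."""
    gx = _filter3x3(img, [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    gy = _filter3x3(img, [[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
    return [[abs(a) + abs(b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


STAGES = {"GS": gray_scale, "GB": gaussian_blur, "ED": edge_detection}


def run_pipeline(img, stages):
    """Apply the named stages in order, e.g. ["GS", "GB", "ED"]."""
    for name in stages:
        img = STAGES[name](img)
    return img


# A vertical step edge: ED responds only at the boundary columns.
step = [[0, 0, 10, 10] for _ in range(3)]
print(run_pipeline(step, ["ED"]))  # [[0, 40, 40, 0], [0, 40, 40, 0], [0, 40, 40, 0]]
```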
Case Study: Single execution
[Figure: architectures (A), (B), (C), (D)]
[Figure: architectures (A), (B), (C), (D)]
Case Study: Batch execution
• The proposed solution (D) does not introduce overhead in the overall execution time w.r.t. the other two architectures
• Low system load
  – Up to 30% reduction in the overall workload execution time
• Low system load (1-2 applications)
  – The proposed solution (D) does not introduce delays in the execution of a single application of the workload
• Higher system load (more than 2 applications)
  – 10-30% reduction in single-application execution time
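To reproduce a batch experiment, a workload of random applications could be generated as below. This Python sketch assumes, without confirmation from the slides, that each application is a random chain of 1 to 4 stages drawn from GS, GB and ED. `random_workload` and `reduction_pct` are hypothetical helpers; the latter computes the kind of percentage reduction in execution time reported in the results.

```python
import random

FILTERS = ("GS", "GB", "ED")


def random_workload(n_apps=100, max_stages=4, seed=None):
    """Return n_apps applications, each a random pipeline of 1..max_stages filters."""
    rng = random.Random(seed)  # seeded generator for reproducible workloads
    return [[rng.choice(FILTERS) for _ in range(rng.randint(1, max_stages))]
            for _ in range(n_apps)]


def reduction_pct(baseline_time, new_time):
    """Percentage reduction of new_time with respect to baseline_time."""
    return 100.0 * (baseline_time - new_time) / baseline_time


workload = random_workload(seed=42)
print(len(workload), workload[:3])
```

Fixing the seed lets every architecture under test receive the identical sequence of applications, so differences in execution time come from the architecture rather than from the workload.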
Conclusions and Future Work
• Conclusions
  – Design of an interconnection to support HW resource sharing in multi-application scenarios
  – Solution suited for dataflow/pipelined systems
  – Possibility to realize different pipeline configurations at run-time
• Future work
  – Design of a mapping/reconfiguration strategy to allocate user cores and configure new core instances at run-time