Computing with FPGA · lnr.irb.hr/pd/daqws/images/P.Skoda.pdf · 2011-11-08 · Motivation Perpetual...

Computing with FPGAs

Peter Škoda

Division of Electronics

Division of Electronics

Laboratories and groups:

Laboratory for Information Systems

Laboratory for Stochastic Signals and Processes Research (LISSP)

Computational biology and bioinformatics group

Research:

Intelligent data and signal analysis techniques

Knowledge representations for information systems

Development of advanced measurement systems and signal processing techniques with applications in biomedicine, bioinformatics

DEL and CIR (Centre for Informatics and Computing) have recently proposed the establishment of a Scientific Computing and Information Processing Institute (SCIP)

Laboratory for Stochastic Signals and Processes Research

Research:

High-resolution measurement in the time and amplitude domains

Methods for processing and compressing huge data structures in computational linguistics and bioinformatics

Methods for analysis of time series applying theory of stochastic processes, chaotic and fractal signals and nonlinear dynamics

New programmable architectures and advanced features based on FPGA embedded systems design

Research and development projects related to PLD/FPGA at DEL and CIR:

PLD Development and Programming System, CPM Operating System, 1988

R&D of Optoelectronic based laser simulators, 1993.

Real Life Data Measurement and Characterization, Long term scientific project (Ministry of Science Education and Sport), (2007-).

Reconfigurable embedded systems based assistive applications for elderly people, Croatian-Hungarian Intergovernmental S&T Programme, (2009-2011).

Reliability of programmable logic devices in industrial embedded systems, R&D project with the KONČAR Electrical Engineering INSTITUTE, (2007-2009).

Quantum Random Number Generator, World Bank Croatia TAL2 project (2004-2006), (with DEP).

Motivation

Perpetual issue: demand for computing power keeps on increasing

Multi-core CPUs, multi-processor systems, computer clusters

Heterogeneous Computing

Use of different kinds of processing units in a single computing system – CPUs, DSPs, GPUs, custom accelerator units

Most common today: CPU+GPU, CPU+FPGA

FPGAs in computing – used to implement custom accelerator units

FPGA – Field Programmable Gate Array

User-programmable digital integrated circuit

Building elements:

Logic blocks

Input/output blocks

Programmable interconnect

Specialized memory, arithmetic and communication blocks

Logic Block

Implements general combinational and sequential logic

Look-Up Tables (LUT) – combinational functions

Flip-Flops (FF) – sequential functions
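
The division of labour between LUTs and flip-flops can be sketched in software. The model below is only an illustration, not a description of any particular FPGA's logic block: the class names `LUT` and `FlipFlop` and the toggle circuit are assumptions made for this example.

```python
class LUT:
    """k-input look-up table: stores one output bit per input combination."""

    def __init__(self, truth_table):
        self.table = list(truth_table)

    def eval(self, *bits):
        # The inputs form the address into the table (first input is the MSB).
        addr = 0
        for b in bits:
            addr = (addr << 1) | (b & 1)
        return self.table[addr]


class FlipFlop:
    """D flip-flop: the stored bit changes only when tick() is called."""

    def __init__(self):
        self.q = 0

    def tick(self, d):
        self.q = d & 1
        return self.q


# A 2-input XOR "configured" into a LUT: entries for addresses 00, 01, 10, 11.
xor_lut = LUT([0, 1, 1, 0])
ff = FlipFlop()


def step(enable):
    """One clock cycle of a toggle circuit: next state = state XOR enable."""
    return ff.tick(xor_lut.eval(ff.q, enable))


states = [step(e) for e in (1, 1, 0, 1)]
print(states)
```

Changing the truth table passed to `LUT` changes the combinational function without touching the rest of the circuit, which is the sense in which the logic block is programmable.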

Input/Output Block

Provides connections to outside components

Direction:

Output

Input

Bidirectional

Buffers:

Convert signal voltage levels

Drive internal (In Buf) and external (Out Buf) lines

Interconnect

Provides connections between blocks

Two types of nets:

Signal net – regular connections

Clock net – clock signal distribution

Switch matrix

Specialized Blocks

Memory

Arithmetic

Multipliers

Multiply-accumulate

Communication

Fast serializer/deserializer

CPU vs. FPGA

CPU                                       FPGA
Fixed hardware                            User-defined hardware
Easier to program                         More difficult to program
High clock speed – GHz range              Low clock speed – 100s of MHz range
Sequential execution of instructions      Logic circuits that operate concurrently
Limited parallelism levels – data, task   Wide range of parallelism levels – bit, operation, data, task
Fixed set of arithmetic precisions        Custom arithmetic precisions

FPGA in computer systems

Provides a platform for implementation of custom accelerators

Used in addition to CPU

FPGA executes only the computation kernel – the computationally most intensive part of the application

Coprocessor

Connects directly to CPU (Hyper Transport, FSB), has direct access to main memory

Peripheral processing unit

Connects through peripheral bus (PCIe)

Programming FPGAs

Describe hardware function

In text form:

Hardware description languages: VHDL, Verilog

C to HDL tools: Jacquard ROCCC, Mentor Graphics Catapult C, Impulse C

In graphical form:

NI LabVIEW

Xilinx System Generator for DSP + MathWorks Simulink

Synthesis

Translates the HDL description into configurations of FPGA building blocks (logic, IO, memory, etc.)

Place and Route

Distributes blocks and connections across physical resources on the FPGA

Bitstream generation

Generates the configuration file that is written to the FPGA

Hardware Description vs. Programming Languages

HDL (VHDL, Verilog)                                                    Programming language (C/C++, Java)
Concurrent execution                                                   Sequential execution
Explicit expression of parallelism                                     No expression of parallelism
Sequential execution through finite state machines (FSM)               Parallel execution through thread mechanism
Wide range of behavioural abstraction levels (logic, RTL, algorithm)   High level of behavioural abstraction (algorithm)
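
To illustrate what "sequential execution through finite state machines" means in practice, the sketch below models a sequential task (summing a list) as an explicit state machine that does one piece of work per clock cycle. It is a software analogy only: the state names `IDLE`, `ADD` and `DONE` and the function `fsm_sum` are hypothetical, and real HDL code would describe registers and next-state logic rather than a loop.

```python
def fsm_sum(values):
    """Sum a list one element per clock cycle using an explicit state machine."""
    state, index, acc, cycles = "IDLE", 0, 0, 0
    while state != "DONE":
        cycles += 1  # one loop iteration stands for one clock cycle
        if state == "IDLE":
            # Reset the registers and decide whether there is work to do.
            index, acc = 0, 0
            state = "ADD" if values else "DONE"
        elif state == "ADD":
            # Accumulate one element, then advance or finish.
            acc += values[index]
            index += 1
            if index == len(values):
                state = "DONE"
    return acc, cycles


total, cycles = fsm_sum([3, 4, 5])
print(total, cycles)
```

In a programming language the same task is a single `sum(values)` call; in an HDL the ordering of steps has to be spelled out as states and transitions.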

Example: Artificial Neural Network

Artificial neural networks (ANN)

Computational models inspired by biological neural networks of the brain

Processing is mainly parallel and distributed

Information is stored in connections

ANNs are widely used in many domains

E.g. signal processing, automation and control.

Artificial Neuron

Fundamental parts:

Inputs

Synaptic links with weights

Activation function Φ

Bias constant b – usually incorporated into the weight vector

Total synaptic input: $u = \sum_{i=1}^{n} w_i x_i + b$

Output: $y = \Phi(u)$

Commonly used activation functions:

$f(x) = x$

$f(x) = \frac{1}{1 + e^{-x}}$

$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
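
The neuron model on this slide (weighted sum of inputs plus bias, passed through an activation function) can be written as a short floating-point sketch. The function names are chosen for this example; `log_sigmoid` and `tan_sigmoid` follow the naming used later in the performance table, and the tan-sigmoid is computed with `math.tanh`, which equals the ratio of exponentials.

```python
import math


def linear(x):
    return x


def log_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tan_sigmoid(x):
    # (e^x - e^-x) / (e^x + e^-x) is the hyperbolic tangent.
    return math.tanh(x)


def neuron(inputs, weights, bias, activation):
    """Total synaptic input u = sum(w_i * x_i) + b, output y = activation(u)."""
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(u)


# Example values are arbitrary; here u = 0.5*1.0 - 0.25*2.0 + 0.0 = 0.0.
y = neuron([1.0, 2.0], [0.5, -0.25], 0.0, log_sigmoid)
print(y)
```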

Multilayer Perceptron (MLP)

One of the most commonly used ANN types

Feed-forward network

No connections between non-adjacent layers

No connections between neurons in the same layer

Input layer

Hidden layers

Output layer
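
A feed-forward pass through such a network can be sketched as follows. This is a minimal floating-point illustration, not the FPGA design: the functions `layer_forward` and `mlp_forward`, the weight layout (one row per neuron with the bias stored as the last weight) and the example numbers are all assumptions of this sketch.

```python
def layer_forward(inputs, weight_rows, activation):
    """Evaluate one layer; each row holds one neuron's weights, bias last."""
    extended = list(inputs) + [1.0]  # constant input that multiplies the bias
    return [
        activation(sum(w * x for w, x in zip(row, extended)))
        for row in weight_rows
    ]


def mlp_forward(inputs, layers):
    """Each layer only sees the outputs of the layer directly before it."""
    signal = inputs
    for weight_rows, activation in layers:
        signal = layer_forward(signal, weight_rows, activation)
    return signal


def identity(x):
    return x


# Tiny example network: 2 inputs, 2 hidden neurons, 1 output neuron.
layers = [
    ([[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]], identity),
    ([[1.0, 1.0, 0.0]], identity),
]
output = mlp_forward([2.0, 3.0], layers)
print(output)
```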

MLP Parallelism

Layer parallelism

In multilayer networks the layers can be pipelined

Node parallelism

Corresponds to individual neurons – neurons are processed in parallel

Weight parallelism

In computation of total synaptic input – inputs are multiplied with weights in parallel

FPGA Implementation – Neuron

Implemented in two parts

Basic functional unit (BFU)

Implements computation of total synaptic input

Computed sequentially using a multiply-accumulate (MAC) unit

Synaptic weights stored in local ROM

Bias constant included as a synaptic weight

Activation function look-up table (LUT)

ROM addressed by total synaptic input
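
The two-part neuron can be mimicked with integer arithmetic as a rough behavioural sketch. Several details are assumptions made for illustration and are not taken from the slides: the LUT address width `ADDR_BITS`, the number of dropped accumulator bits `SHIFT`, and the wrap-around (rather than saturating) handling of out-of-range sums.

```python
ADDR_BITS = 8  # assumed width of the activation LUT address
SHIFT = 4      # assumed number of low-order accumulator bits dropped


def bfu(inputs, weight_rom):
    """Basic functional unit: one multiply-accumulate per clock cycle."""
    acc = 0
    cycles = 0
    # A constant 1 is appended so the last ROM entry acts as the bias.
    for x, w in zip(list(inputs) + [1], weight_rom):
        acc += x * w
        cycles += 1
    return acc, cycles


def make_lut(activation):
    """Precompute the activation for every signed LUT address."""
    size = 1 << ADDR_BITS
    lut = []
    for addr in range(size):
        signed = addr - size if addr >= size // 2 else addr
        lut.append(activation(signed))
    return lut


def fpga_neuron(inputs, weight_rom, lut):
    u, cycles = bfu(inputs, weight_rom)
    # Scale the sum down and use its two's-complement bits as the ROM address.
    # Sums outside the address range wrap around in this sketch.
    addr = (u >> SHIFT) & ((1 << ADDR_BITS) - 1)
    return lut[addr], cycles


linear_lut = make_lut(lambda a: a)
result = fpga_neuron([3, -2], [10, 4, 6], linear_lut)
print(result)
```

With n inputs the accumulation takes n + 1 cycles in this model, because the bias is processed as one extra weight.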

FPGA Implementation – MLP

Single layer

One BFU per neuron

Single activation function LUT for a layer

Total synaptic inputs are loaded into shift registers and shifted to the activation function LUT

Computation on new inputs is carried out simultaneously with shifting of old results

Multilayer implementations

Pipelined layers – cascading

Sequential layers – results routed back as new inputs
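
The overlap between computing new inputs and shifting out old results can be reasoned about with a simplified two-stage timing model. This is a sketch under stated assumptions, not the timing of the actual design: it assumes n_inputs + 1 MAC cycles per input vector (bias as an extra weight), one shift per neuron through the shared LUT, and no other overhead. The function name `layer_timing` is hypothetical, and the cycle counts it produces are model outputs, not figures from the slides.

```python
def layer_timing(n_inputs, n_neurons, n_vectors):
    """Cycle estimate for one layer with MAC and shift phases overlapped."""
    mac_cycles = n_inputs + 1   # all BFUs run in parallel, bias as extra weight
    shift_cycles = n_neurons    # one result per cycle through the shared LUT
    # In steady state a new vector can start once the slower phase finishes.
    interval = max(mac_cycles, shift_cycles)
    total = mac_cycles + (n_vectors - 1) * interval + shift_cycles
    return mac_cycles, shift_cycles, total


# Layer dimensions taken from the performance slide (266 inputs, 176 neurons).
print(layer_timing(266, 176, 1))
print(layer_timing(266, 176, 2))
```

Under these assumptions the shift phase is shorter than the MAC phase for this layer, so sharing a single LUT would not limit throughput.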

Performance

Evaluated on a single layer of a larger neural network

266 inputs

176 neurons

linear activation function

Target device: Xilinx Virtex-5 XC5VSX50T

Placed and routed at 85 MHz clock frequency

14.96 Gop/s (fixed-point multiply-accumulate operations)

Precision (bits):

Input     16
Weights   14
Output    16

Resource    Available   Used    Utilization
DSP48E      288         176     61%
Flip-flop   32640       2825    9%
LUT         32640       20197   62%
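
The throughput and utilization figures can be checked with a few lines of arithmetic. The sketch assumes one multiply-accumulate per neuron per clock cycle, which is consistent with 176 DSP48E blocks being used for 176 neurons; the variable names are chosen for this example.

```python
neurons = 176     # one MAC unit per neuron
clock_hz = 85e6   # placed-and-routed clock frequency

# Peak rate: every MAC unit completes one operation per clock cycle.
gops = neurons * clock_hz / 1e9

# (available, used) pairs copied from the resource table.
resources = {
    "DSP48E": (288, 176),
    "Flip-flop": (32640, 2825),
    "LUT": (32640, 20197),
}
utilization = {
    name: round(100 * used / available)
    for name, (available, used) in resources.items()
}

print(f"{gops:.2f} Gop/s")
print(utilization)
```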

Performance

Extrapolation to entire network

Sequential layers implementation

Needs 542 clock cycles to evaluate (6.4 μs at 85 MHz)

Executes 62746 multiply-accumulate operations

9.84 Gop/s

Layer   Number of nodes   Activation function
input   266               -
1st     176               linear
2nd     88                tan-sigmoid
3rd     2                 log-sigmoid
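
The extrapolated figures can be reproduced from the layer sizes. The operation count matches the slide only when each neuron has one extra weight for its bias, which agrees with the bias being stored as a synaptic weight. The cycle count of 542 is taken from the slide as given and is not derived here; the variable names are chosen for this example.

```python
layer_sizes = [266, 176, 88, 2]  # input, 1st, 2nd, 3rd layer
clock_hz = 85e6
cycles = 542                     # from the slide, not derived here

# Each neuron performs one MAC per input plus one for the bias weight.
macs = sum(
    (n_in + 1) * n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)

time_s = cycles / clock_hz
time_us = time_s * 1e6
gops = macs / time_s / 1e9

print(macs)
print(f"{time_us:.1f} us, {gops:.2f} Gop/s")
```

The result is lower than the single-layer peak because with sequential layers only as many MAC units are busy as the current layer has neurons.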

Conclusion

FPGAs provide great opportunities for computing acceleration...

Custom architectures tailored for specific applications

Wide range of parallelism levels – bit, operation, data, task

...but are underutilized

Development for FPGA requires significantly more effort than regular computer programming

Development tools and processes are geared towards integrated circuit design

Limited support for computing applications

Future prospects

Hardware/software co-design

Automated hardware and software generation from high-level system model

The end

