Computing with FPGAlnr.irb.hr/pd/daqws/images/P.Skoda.pdf · 2011-11-08 · Motivation Perpetual...

Computing with FPGAs

Peter Škoda

Division of Electronics

Division of Electronics (DEL)

Laboratories and groups:

Laboratory for Information Systems

Laboratory for Stochastic Signals and Processes Research (LISSP)

Computational biology and bioinformatics group

Research:

Intelligent data and signal analysis techniques

Knowledge representations for information systems

Development of advanced measurement systems and signal processing techniques with applications in biomedicine, bioinformatics

DEL and CIR (Centre for Informatics and Computing) have recently proposed the establishment of a Scientific Computing and Information Processing Institute (SCIP)

Laboratory for Stochastic Signals and Processes Research

Research:

High resolution measurement in the time and amplitude domain

Methods for processing and compressing huge data structures in computational linguistics and bioinformatics

Methods for analysis of time series applying theory of stochastic processes, chaotic and fractal signals and nonlinear dynamics

New programmable architectures and advanced features based on FPGA embedded systems design

Research and development projects related to PLD/FPGA at DEL and CIR:

PLD Development and programming System, CPM Operating system, 1988

R&D of Optoelectronic based laser simulators, 1993.

Real Life Data Measurement and Characterization, long-term scientific project (Ministry of Science, Education and Sport), (2007-).

Reconfigurable embedded systems based assistive applications for elderly people, Croatian-Hungarian Intergovernmental S&T Programme, (2009-2011).

Reliability of programmable logic devices in industrial embedded systems, R&D project with the KONČAR Electrical Engineering INSTITUTE, (2007-2009).

Quantum Random Number Generator, World Bank Croatia TAL2 project (2004-2006), (with DEP).

Motivation

Perpetual issue: demand for computing power keeps increasing

Multi-core CPUs, multi-processor systems, computer clusters

Heterogeneous Computing

Use of different kinds of processing units in a single computing system – CPUs, DSPs, GPUs, custom accelerator units

Most common today: CPU+GPU, CPU+FPGA

FPGAs in computing – used to implement custom accelerator units

FPGA – Field Programmable Gate Array

User-programmable digital integrated circuit

Building elements:

Logic blocks

Input/output blocks

Programmable interconnect

Specialized memory, arithmetic and communication blocks

Logic Block

Implements general combinational and sequential logic

Look-Up Tables (LUT) – combinational functions

Flip-Flops (FF) – sequential functions
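The split between LUT and flip-flop can be illustrated with a small software model. This is only a sketch: the class name `LogicBlock`, the address bit ordering and the single-flip-flop structure are assumptions made for illustration and do not describe any particular FPGA family.

```python
class LogicBlock:
    """Toy model of a logic block: a k-input LUT feeding one D flip-flop."""

    def __init__(self, truth_table):
        # truth_table[a] is the LUT output for the input pattern with address a.
        n = len(truth_table)
        if n < 2 or n & (n - 1):
            raise ValueError("truth table length must be a power of two")
        self.truth_table = list(truth_table)
        self.num_inputs = n.bit_length() - 1
        self.ff = 0  # flip-flop state, changes only on a clock edge

    def lut(self, inputs):
        """Combinational path: the input bits form an address into the table."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:  # first input is the most significant address bit
            address = (address << 1) | (bit & 1)
        return self.truth_table[address]

    def clock(self, inputs):
        """Sequential path: on a clock edge the flip-flop captures the LUT output."""
        self.ff = self.lut(inputs)
        return self.ff


# A 2-input LUT configured as XOR: addresses 00, 01, 10, 11 map to 0, 1, 1, 0.
xor_block = LogicBlock([0, 1, 1, 0])
combinational = [xor_block.lut((a, b)) for a in (0, 1) for b in (0, 1)]

# Feeding the registered output back into the LUT gives a toggling bit.
toggle = LogicBlock([0, 1, 1, 0])
trace = [toggle.clock((toggle.ff, 1)) for _ in range(4)]  # next = state XOR 1
```

Changing the truth table changes the implemented function without changing the structure, which is the sense in which the block is user-programmable.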

Input/Output Block

Provides connections to outside components

Direction:

Output

Input

Bidirectional

Buffers:

Convert signal voltage levels

Drive internal (In Buf) and external (Out Buf) lines

Interconnect

Provides connections between blocks

Two types of nets:

Signal net – regular connections

Clock net – clock signal distribution

Switch matrix

Specialized Blocks

Memory

Arithmetic

Multipliers

Multiply-accumulate

Communication

Fast serializer/deserializer

CPU vs. FPGA

CPU                                      FPGA
Fixed hardware                           User-defined hardware
Easier to program                        More difficult to program
High clock speed – GHz range             Low clock speed – 100s of MHz range
Sequential execution of instructions     Logic circuits that operate concurrently
Limited parallelism levels – data, task  Wide range of parallelism levels – bit, operation, data, task
Fixed set of arithmetic precisions       Custom arithmetic precisions

FPGA in computer systems

Provides a platform for implementation of custom accelerators

Used in addition to CPU

FPGA executes only the computation kernel – the most computationally intensive part of the application

Coprocessor

Connects directly to CPU (HyperTransport, FSB), has direct access to main memory

Peripheral processing unit

Connects through peripheral bus (PCIe)

Programming FPGAs

Describe hardware function

In text form:

Hardware description languages: VHDL, Verilog

C to HDL tools: Jacquard ROCCC, Mentor Graphics Catapult C, Impulse C

In graphical form:

NI LabVIEW

Xilinx System Generator for DSP + MathWorks Simulink

Synthesis

Translates the HDL description into configurations of FPGA building blocks (logic, IO, memory, etc.)

Place and Route

Distributes blocks and connections to physical resources on the FPGA

Bitstream generation

Generates the configuration file which is written to the FPGA

Hardware Description vs. Programming Languages

HDL (VHDL, Verilog)                                                   Programming language (C/C++, Java)
Concurrent execution                                                  Sequential execution
Explicit expression of parallelism                                    No expression of parallelism
Sequential execution through finite state machines (FSM)              Parallel execution through thread mechanism
Wide range of behavioural abstraction levels (logic, RTL, algorithm)  High level of behavioural abstraction (algorithm)

Example: Artificial Neural Network

Artificial neural networks (ANN)

Computational models inspired by biological neural networks of the brain

Processing is mainly parallel and distributed

Information is stored in connections

ANNs are widely used in many domains

E.g. signal processing, automation and control

Artificial Neuron

Fundamental parts:

Inputs

Synaptic links with weights

Activation function Φ

Bias constant b – usually incorporated into the weight vector

Total synaptic input:

$$u = \sum_{i=1}^{n} w_i x_i + b$$

Output:

$$y = \Phi(u)$$

Commonly used activation functions:

$$f(x) = x$$

$$f(x) = \frac{1}{1 + e^{-x}}$$

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
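The neuron model above can be written out in a few lines. This is a floating-point sketch for illustration only; the function names are assumptions, with `log_sigmoid` and `tan_sigmoid` borrowing the names used for the activation functions later in the slides.

```python
import math


def total_synaptic_input(x, w, b):
    """u = sum_i w_i * x_i + b"""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def linear(u):
    return u


def log_sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def tan_sigmoid(u):
    return math.tanh(u)  # equal to (e^u - e^-u) / (e^u + e^-u)


def neuron(x, w, b, phi):
    """y = phi(u)"""
    return phi(total_synaptic_input(x, w, b))


x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.75

u = total_synaptic_input(x, w, b)   # 0.5 - 2.0 + 0.75 + 0.75
y_lin = neuron(x, w, b, linear)
y_log = neuron(x, w, b, log_sigmoid)
y_tan = neuron(x, w, b, tan_sigmoid)

# Bias folded into the weight vector: append a constant input of 1.0
# and store b as the matching weight.
y_folded = neuron(x + [1.0], w + [b], 0.0, linear)
```

The last line shows the bias being incorporated into the weight vector: the result is the same as with a separate bias term.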

Multilayer Perceptron (MLP)

One of the most commonly used ANN types

Feed-forward network

No connections between non-adjacent layers

No connections between neurons in the same layer

Input layer

Hidden layers

Output layer

MLP Parallelism

Layer parallelism

In multilayer networks the layers can be pipelined

Node parallelism

Corresponds to individual neurons – neurons are processed in parallel

Weight parallelism

In computation of total synaptic input – inputs are multiplied with weights in parallel
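One way to see the three levels is as the three nested loops of a plain software evaluation of an MLP. The sketch below is sequential Python, not a hardware description; the function name `mlp_forward` and the layer representation are assumptions made for illustration. Each loop marks the place where the corresponding parallelism could be exploited.

```python
import math


def mlp_forward(x, layers):
    """Evaluate an MLP; layers is a list of (weights, biases, phi).

    weights[j][i] is the weight linking input i to neuron j of that layer.
    """
    for weights, biases, phi in layers:        # layer level: stages can be pipelined
        outputs = []
        for w_j, b_j in zip(weights, biases):  # node level: neurons are independent
            u = b_j
            for w_ji, x_i in zip(w_j, x):      # weight level: products are independent
                u += w_ji * x_i
            outputs.append(phi(u))
        x = outputs                            # outputs feed the adjacent layer only
    return x


def identity(u):
    return u


def log_sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


# Tiny 2-2-1 network with hand-picked weights.
hidden = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], identity)
output = ([[2.0, -1.0]], [0.0], log_sigmoid)
result = mlp_forward([1.0, 2.0], [hidden, output])
```

In this example the output neuron sees a total synaptic input of 2.0 - 2.0 = 0.0, so the log-sigmoid output is 0.5.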

FPGA Implementation - Neuron

Implemented in two parts

Basic functional unit (BFU)

Implements computation of total synaptic input

Computed sequentially using multiply-accumulate (MAC) unit

Synaptic weights stored in local ROM

Bias constant included as synaptic weight

Activation function look-up table (LUT)

ROM addressed by total synaptic input
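A behavioural sketch of the two parts, in integer arithmetic, may help make the data flow concrete. Everything numeric here is an assumption chosen for illustration: the fixed-point format (`FRAC_BITS`), the ROM address width (`ADDR_BITS`), the input range covered by the table (`INT_BITS`) and the saturating address calculation are not taken from the slides, and the actual word widths of the design are not modelled.

```python
import math

FRAC_BITS = 8   # assumed number of fraction bits in the fixed-point format
INT_BITS = 4    # assumed: the activation ROM covers u in [-8, 8)
ADDR_BITS = 8   # assumed ROM address width (256 entries)
ONE = 1 << FRAC_BITS

SHIFT = INT_BITS + FRAC_BITS - ADDR_BITS   # low bits of u dropped when addressing
OFFSET = 1 << (INT_BITS + FRAC_BITS - 1)   # maps the most negative u to address 0
SPAN = 1 << (INT_BITS + FRAC_BITS)


def to_fixed(value):
    return int(round(value * ONE))


class BFU:
    """Basic functional unit: weight ROM plus one MAC unit, one input per cycle."""

    def __init__(self, weights, bias):
        # The bias is stored as one more weight; its input is the constant 1.0.
        self.rom = [to_fixed(w) for w in weights] + [to_fixed(bias)]

    def run(self, inputs_fixed):
        if len(inputs_fixed) + 1 != len(self.rom):
            raise ValueError("input count does not match weight ROM")
        acc, cycles = 0, 0
        for w, x in zip(self.rom, list(inputs_fixed) + [ONE]):
            acc += w * x   # one multiply-accumulate per clock cycle
            cycles += 1
        return acc >> FRAC_BITS, cycles   # rescale back to FRAC_BITS fraction bits


def build_activation_rom(phi):
    """Tabulate phi at the midpoint of each address interval."""
    rom = []
    for address in range(1 << ADDR_BITS):
        u_mid = ((address << SHIFT) + (1 << (SHIFT - 1)) - OFFSET) / ONE
        rom.append(to_fixed(phi(u_mid)))
    return rom


def activation_lookup(rom, u_fixed):
    """Address the ROM with the total synaptic input, saturating out-of-range values."""
    biased = min(max(u_fixed + OFFSET, 0), SPAN - 1)
    return rom[biased >> SHIFT]


bfu = BFU([0.5, -1.0, 0.25], bias=0.75)
u_fixed, cycles = bfu.run([to_fixed(v) for v in (1.0, 2.0, 3.0)])
sigmoid_rom = build_activation_rom(lambda u: 1.0 / (1.0 + math.exp(-u)))
y_fixed = activation_lookup(sigmoid_rom, u_fixed)
```

With three inputs the BFU needs four cycles, because the bias occupies one extra multiply-accumulate step. The looked-up output differs slightly from the exact sigmoid value because the table is quantized.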

FPGA Implementation – MLP

Single layer

One BFU per neuron

Single activation function LUT for a layer

Total synaptic inputs are loaded into shift registers and shifted to the activation function LUT

Computation on new inputs is carried out simultaneously with shifting of old results

Multilayer implementations

Pipelined layers – cascading

Sequential layers – results routed back as new inputs

Performance

Evaluated on a single layer of a larger neural network

266 inputs

176 neurons

linear activation function

Target device: Xilinx Virtex-5 XC5VSX50T

Placed and routed at 85 MHz clock frequency

14.96 Gop/s (fixed-point multiply-accumulate operations)

Precision (bits)
Input    16
Weights  14
Output   16

Resource   Available  Used   Utilization
DSP48E     288        176    61%
Flip-flop  32640      2825   9%
LUT        32640      20197  62%
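The throughput and utilization figures can be cross-checked with a few lines of arithmetic. The sketch assumes that each of the 176 BFUs completes one multiply-accumulate per clock cycle and that one multiply-accumulate counts as one operation; the slides do not state this explicitly, but it reproduces the quoted numbers.

```python
CLOCK_HZ = 85e6   # placed-and-routed clock frequency
NEURONS = 176     # one BFU, hence one MAC unit, per neuron

# Assumption: every BFU performs one multiply-accumulate per cycle.
peak_gops = NEURONS * CLOCK_HZ / 1e9

# (available, used) per resource, as listed in the table.
resources = {
    "DSP48E": (288, 176),
    "Flip-flop": (32640, 2825),
    "LUT": (32640, 20197),
}
utilization = {
    name: round(100 * used / available)
    for name, (available, used) in resources.items()
}
```

Under this assumption the DSP48E count matches the neuron count, which is consistent with one hardware multiplier per BFU.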

Performance

Extrapolation to entire network

Sequential layers implementation

Needs 542 clock cycles to evaluate (6.4 μs at 85 MHz)

Executes 62746 multiply-accumulate operations

9.84 Gop/s

Layer  Number of nodes  Activation function
input  266              -
1st    176              linear
2nd    88               tan-sigmoid
3rd    2                log-sigmoid
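The operation count and sustained throughput for the whole network can be reproduced from the layer sizes. The sketch assumes that each neuron performs one multiply-accumulate per input plus one for the bias (the bias being stored as an extra weight); with that accounting the total comes out at exactly the 62746 operations quoted. The cycle count of 542 is taken from the slide as given, not derived.

```python
CLOCK_HZ = 85e6
CYCLES = 542   # cycle count quoted on the slide, not derived here

layer_sizes = [266, 176, 88, 2]   # input, 1st, 2nd, 3rd layer

# Assumption: (inputs + 1 bias weight) multiply-accumulates per neuron.
macs_per_layer = [
    (n_in + 1) * n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_macs = sum(macs_per_layer)

eval_time_us = CYCLES / CLOCK_HZ * 1e6
sustained_gops = total_macs / (CYCLES / CLOCK_HZ) / 1e9
```

The sustained figure is lower than the single-layer peak because the second and third layers have fewer neurons than the first, so fewer multiply-accumulate operations are performed per cycle while they are evaluated.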

Conclusion

FPGAs provide great opportunities for computing acceleration...

Custom architectures tailored for specific applications

Wide range of parallelism levels – bit, operation, data, task

...but are underutilized

Development for FPGA requires significantly more effort than regular computer programming

Development tools and processes geared towards integrated circuit design

Limited support for computing applications

Future prospects

Hardware/software co-design

Automated hardware and software generation from high-level system model

The end
