
A Monte Carlo Simulation Accelerator using FPGA Devices

Date post: 30-Dec-2015

A Monte Carlo Simulation Accelerator using FPGA Devices

Final Year Project: LHW0304

Ng Kin Fung && Ng Kwok Tung

Supervisor: Professor LEONG, Heng Wai Philip

Overview

Objective
Background
  Software-only Implementation
  Hardware Implementation
    FPGA
    Soft-Core Micro-Processor
  Interest Rate Modeling
  Brace-Gatarek-Musiela (BGM) Model
Motivation and Contribution
System Design
  System Design Overview
  System Components
  System Operations
Experiment and Result
  Resources
  Performance
  Data Transmission Overhead
Conclusion
Future Improvement
Q & A Section

Objective

What we achieved last semester
  Studied and became familiar with the development-related tools
  Implemented some simple examples to gain experience in system development on FPGA with a Soft-Core Micro-Processor
  First ever successful port of the Microblaze system to the Celoxica RC200 development board
  Studied the performance and power consumption of the system

What about this semester
  Build a Monte Carlo Simulation Accelerator using FPGA technology and a Soft-Core Micro-Processor
  Study the speed-up and performance
  Study the transmission overhead of the channel between the user core and the Soft-Core Micro-Processor

FPGA and Soft-Core Micro-Processor

Software-only implementation

The performance is NOT satisfactory
  Sequential execution of instructions instead of parallel execution
  Slow memory access
Lack of ability to customize hardware
  No way to save power by switching off hardware modules
There is a need to solve the problem with another approach

FPGA Technology

More and more popular in system design
  Higher degree of parallelism
  Fewer clock cycles required
Explicitly hardwired to perform a certain operation
  Optimized for a specific purpose, hence higher performance
Enables customization of hardware modules
  Power saving
Reconfigurable
  Enables reuse of hardware
Able to simulate and synthesize the circuits from a high-level, program-like description
  Easy system development and system testing
  Shorter time to market, hence higher profit

Soft-Core Micro-Processor

Most systems use a PC + FPGA accessed through a PCI bus
  Bottleneck for the entire system
Use of a Soft-Core Micro-Processor
  Everything is implemented in the FPGA
  Transmission of data is within the FPGA
  Higher transmission bandwidth and lower latency
Other advantages
  Easier to develop
  Retains the advantages of using FPGA
  Flexible
  Retargetable
Conclusion
  FPGA technology + Soft-Core Micro-Processor

Interest Rate Modeling

Importance of interest rate modeling
  Simulate market behavior with historical parameter values
  Explain interest rate movements in terms of an underlying model
  Decision making on economic policy
  Risk management

Brace-Gatarek-Musiela (BGM) Model

One of the most popular interest rate models
Based on the Monte Carlo method
Looping part (most computationally expensive)

Implementing BGM Model using FPGA and Soft Core Microprocessor

[Figure: BGM core generates 50 paths with 9 fixed points; horizontal axis pt1 to pt9, vertical axis 0 to 120]

Implementing BGM Model using FPGA and Soft Core Microprocessor

Implemented by FPGA in parallel style
Post-processing calculation by Microblaze
  Average and standard error
Fast Simplex Link Bus for data transmission between BGM core and Microblaze
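The post-processing step (average and standard error over the generated paths) can be sketched in a few lines. This is a minimal illustration in Python, not the Microblaze code from the project; the function name `mean_and_standard_error` and the sample values are hypothetical.

```python
import math


def mean_and_standard_error(samples):
    """Return (mean, standard error of the mean) for a list of path values."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # Unbiased sample variance (divide by n - 1).
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    # The standard error of a Monte Carlo estimate shrinks as 1 / sqrt(n).
    return mean, math.sqrt(variance / n)


# Hypothetical values standing in for the output of the BGM core.
values = [98.0, 102.0, 100.0, 104.0, 96.0]
mean, std_err = mean_and_standard_error(values)
print(f"mean = {mean:.3f}, standard error = {std_err:.3f}")
```

Because the standard error falls only with the square root of the number of paths, halving it takes four times as many paths, which is why the path-generation loop dominates the run time.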

Contribution

Improve the performance of the system

Implementation                      Responsibility   Performance
Software-only                       On Market        Lowest
FPGA + PC                           CSE Research     High
FPGA + Soft-Core Micro-Processor    Our Task         Highest

System Design

System Design Overview

System Component

Microblaze
  A soft-core microprocessor
  Delivered as HDL source code for synthesis
  Designed in VHDL
  Specially optimized for Xilinx FPGAs
  A reduced instruction set computer (RISC)
  Speed of Microblaze across different Xilinx devices (Xilinx statistics)

Device               Clock     Performance
Virtex-II Pro (-6)   150 MHz   101 D-MIPS
Virtex-II (-5)       125 MHz   82 D-MIPS
Virtex-E (-7)        75 MHz    49 D-MIPS
Spartan-IIE (-6)     75 MHz    49 D-MIPS
Spartan-II (-4)      65 MHz    43 D-MIPS

User Core – BGM

Connect the core designed in VHDL to the Microblaze system
Solves the most computationally expensive task fully in hardware
Needs to follow the signals and timing of the bus it is connected to
A Microprocessor Peripheral Description (MPD) file
  Defines the interface of the peripheral
    Ports, buses
A Peripheral Analyze Order (PAO) file
  A list of the HDL files needed for synthesis, in order of compilation

Fast Simplex Link (FSL)

32-bit wide bus
Unidirectional point-to-point data streaming interfaces
Control and data communication support
FIFO-based communication
Fast internal data and control transmission
Peak bandwidth: 300 MB/s

[Figures from Xilinx Fast Simplex Link Channel Product Specification DS449 (v1.1), Aug 06, 2003]

Use the read macro microblaze_bread_datafsl(val, id) to read data from the FSL FIFO into Microblaze
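As a rough behavioral picture of a FIFO-based, unidirectional link, the following Python sketch models a single channel carrying 32-bit words. It is not the Xilinx API: the class `FslChannelModel`, its methods, and the depth of 16 are assumptions made for illustration, and a real blocking read such as `microblaze_bread_datafsl` would stall the processor on an empty FIFO instead of raising an error.

```python
from collections import deque


class FslChannelModel:
    """Toy model of one unidirectional FIFO link carrying 32-bit words."""

    def __init__(self, depth=16):  # depth is an assumed value
        self.depth = depth
        self.fifo = deque()

    def write(self, word):
        """Master side: push a word; returns False when the FIFO is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(word & 0xFFFFFFFF)  # keep only the low 32 bits
        return True

    def read(self):
        """Slave side: pop the oldest word (first in, first out)."""
        if not self.fifo:
            raise RuntimeError("FIFO empty: a blocking read would stall here")
        return self.fifo.popleft()


channel = FslChannelModel()
for word in (1, 2, 3):
    channel.write(word)
print([channel.read() for _ in range(3)])
```

Data flows one way only in this model, so a design that needs traffic in both directions would use two such channels, one per direction.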

On-Chip Memory, Local Memory Bus and Memory Bus Controller

On-Chip Memory
  Storage medium for the data and instructions
  Minimizes the transmission overhead between the Microblaze and the memory
Local Memory Bus (LMB)
  Single-cycle access to on-chip dual-port block RAM
  Performance of 125 MHz
LMB BRAM Interface Controller
  Interface between the LMB and the bram_block peripheral
  Separate controller for data and control

On-Chip Peripheral Bus (OPB Bus)

Connection between the main system and the peripherals
Makes the Microblaze system more functional
In this project
  UART
  OPB Timer
  GPIO

Universal Asynchronous Receiver-Transmitter (UART)

Handles asynchronous serial communication
Libgen allows the mapping of standard input and output
  Use of scanf and printf for communication with the user

OPB Timer

Facilitates the correct measurement of the performance

Operation          Function
Initiate timer     XStatus XTmrCtr_Initialize
Start timer        void XTmrCtr_Start
Stop timer         void XTmrCtr_Stop
Get timer value    Xuint32 XTmrCtr_GetValue
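Turning two timer readings into an elapsed time is simple arithmetic. The sketch below is plain Python, not the Xilinx driver: it assumes a 32-bit up-counting timer that wraps at most once between readings, and a hypothetical 50 MHz clock. Neither the counting direction nor the clock frequency is stated in the slides.

```python
COUNTER_BITS = 32  # matches the Xuint32 value returned by XTmrCtr_GetValue


def elapsed_ms(start_ticks, stop_ticks, clock_hz):
    """Elapsed time in milliseconds between two counter readings."""
    # Modulo arithmetic handles a single wrap-around of the counter.
    ticks = (stop_ticks - start_ticks) % (1 << COUNTER_BITS)
    return ticks * 1000.0 / clock_hz


ASSUMED_CLOCK_HZ = 50_000_000  # hypothetical; not given in the slides
print(elapsed_ms(1_000, 51_000, ASSUMED_CLOCK_HZ))
```

Reading the counter immediately before and after the measured section keeps the timer overhead itself out of the result as far as possible.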

General Purpose Input Output (GPIO)

Problem found on the FSL bus
  Reset signal is connected to ground
  No way to reset the BGM core through the FSL bus
Solution
  Make changes to the VHDL source code
  Use GPIO

General Purpose Input Output (GPIO)

[Diagram: Reset by FSL: Microblaze connects to the BGM core through FSL, but the reset path is unavailable (marked X). Reset by GPIO: Microblaze drives the reset of the BGM core through GPIO.]

System Operations

Microblaze system flow:
1. Microblaze system starts
2. BGM core is reset
3. Timer is started
4. BGM process runs
5. Any more data? If yes, return to step 4; if no, continue
6. Post-processing calculation by Microblaze
7. Timer is stopped
8. Result is printed out
9. End of Microblaze system

System Operations

BGM process flow:
1. BGM process starts
2. BGM core generates a path
3. Data is transferred from the BGM core to the Microblaze system
4. Data format is transformed
5. Data is stored temporarily
6. End of Microblaze System
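The two flow charts can be read as a single control loop. The Python sketch below mirrors that loop with stand-in functions: `reset_core`, `read_path`, and the fixed list of values are hypothetical placeholders for the GPIO reset, the FSL transfer, and the BGM core output, and `time.perf_counter` stands in for the OPB timer.

```python
import time


def run_system(core_output):
    """Reset, start timing, drain the data, post-process, stop timing."""
    pending = list(core_output)  # stand-in for values produced by the BGM core
    stored = []                  # temporary storage on the processor side

    def reset_core():
        # Placeholder for the GPIO-driven reset.
        stored.clear()

    def read_path():
        # Placeholder for the FSL transfer plus data format conversion.
        return float(pending.pop(0))

    reset_core()
    start = time.perf_counter()            # timer is started
    while pending:                         # "Any more data?"
        stored.append(read_path())
    if not stored:
        raise ValueError("the core produced no data")
    average = sum(stored) / len(stored)    # post-processing calculation
    elapsed = time.perf_counter() - start  # timer is stopped
    return average, elapsed


average, elapsed = run_system([98, 102, 100])
print(f"average = {average:.2f}, elapsed = {elapsed:.6f} s")
```

The timer brackets both the data collection and the post-processing, matching the order of the steps in the flow chart.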

Experimental Results

Resources

Selected device: 2v1000fg456-4
Resources for BGM core alone

Resource            Used     Available   Utilization
Slices              6455     5120        126%
Slice Flip Flops    5768     10240       56%
4-input LUTs        10974    10240       107%
Bonded IOBs         42       324         12%
MULT18X18s          37       40          92%
GCLKs               3        16          18%
DCMs                1        8           12%

Unable to place the whole system on the FPGA board
System simulated with ModelSim
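The percentages in the resource table, and the reason the whole system does not fit, follow directly from the used and available counts. The sketch below recomputes them in Python; the function names are hypothetical, and truncating to a whole percentage is an assumption that happens to reproduce every value in the table.

```python
# (used, available) pairs copied from the resource table.
RESOURCES = {
    "Slices": (6455, 5120),
    "Slice Flip Flops": (5768, 10240),
    "4-input LUTs": (10974, 10240),
    "Bonded IOBs": (42, 324),
    "MULT18X18s": (37, 40),
    "GCLKs": (3, 16),
    "DCMs": (1, 8),
}


def utilization_percent(used, total):
    """Utilization as a whole percentage, truncated toward zero."""
    return 100 * used // total


def over_capacity(resources):
    """Names of resources whose demand exceeds what the device offers."""
    return [name for name, (used, total) in resources.items() if used > total]


for name, (used, total) in RESOURCES.items():
    print(f"{name:18s} {utilization_percent(used, total):4d}%")
print("Over capacity:", over_capacity(RESOURCES))
```

Slices and 4-input LUTs both exceed 100%, so the BGM core alone already needs more logic than the selected device provides.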

Performance

Comparison of performance for running the BGM core in FPGA and in PC (by Dr. Zhang)
  Speed-up factor: 19.87

Comparison of performance for running the BGM core in FPGA and in PC with different numbers of paths generated (by Dr. Zhang)
  Stable performance with different path numbers

Simulation of Microblaze system
  Total time required for generating 50 paths: 2.871 ms
  Speed-up factor: 21.94
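A speed-up factor is the ratio of software-only time to accelerated time, so the two figures for the simulated Microblaze system imply a software baseline. The sketch below only does that arithmetic; the baseline it prints is derived, not a measurement reported in the slides, and the function names are hypothetical.

```python
def speed_up(software_ms, accelerated_ms):
    """Speed-up factor = software-only time / accelerated time."""
    return software_ms / accelerated_ms


def implied_software_ms(accelerated_ms, factor):
    """Software-only time implied by a measured time and a speed-up factor."""
    return accelerated_ms * factor


ACCELERATED_MS = 2.871  # time for 50 paths, from the slide
SPEED_UP = 21.94        # speed-up factor, from the slide

baseline = implied_software_ms(ACCELERATED_MS, SPEED_UP)
print(f"implied software-only time: {baseline:.2f} ms")
```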

Transmission Bandwidth

Transmission Media    Peak Transmission Bandwidth
Serial Port           15 KB/s
Parallel Port         150 KB/s
10M Ethernet          1.2 MB/s
USB                   1.5 MB/s
100M Ethernet         12 MB/s
PCI Bus               100 MB/s
FSL Bus               300 MB/s

Transmission Bandwidth

On the FSL bus, 32 bits of data are sent in about 40,000 ps (40 ns)
Transmission bandwidth is around 100 MB per second
  Same order of magnitude as the peak transmission bandwidth stated in the specification
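The 100 MB per second figure follows from the two measured quantities: 32 bits is 4 bytes, and 4 bytes every 40,000 ps is 100 million bytes per second. A small Python check, assuming 1 MB = 10^6 bytes (the function name is hypothetical):

```python
def bandwidth_mb_per_s(bits, picoseconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for `bits` sent in `picoseconds`."""
    bytes_sent = bits / 8
    # bytes per picosecond * 1e12 ps/s / 1e6 bytes/MB = bytes * 1e6 / ps
    return bytes_sent * 1e6 / picoseconds


measured = bandwidth_mb_per_s(32, 40_000)
PEAK_MB_PER_S = 300  # peak FSL figure quoted from the specification
print(f"measured: {measured:.0f} MB/s, {measured / PEAK_MB_PER_S:.0%} of peak")
```

The measured rate is about a third of the quoted peak, which is the sense in which the two figures are comparable.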

Conclusion

A Monte Carlo Simulation Accelerator was implemented using FPGA technology and the Xilinx Microblaze Soft-Core Micro-Processor
A speed-up factor of 21.94 was obtained compared with the software-only implementation
Higher bandwidth and lower latency can be achieved using the FSL link between Microblaze and the BGM core
High performance, parallel execution of instructions, reconfigurability, reusability and short development time……

Future Development

Put the whole system on the FPGA board
Implement other applications for which high performance and short development time are the major considerations
Study the other IP cores included and improve the system

Q & A

