
Mapping Multiple Multivariate Gaussian Random Number Generators on an FPGA

Chalermpol Saiprasert, Christos-S. Bouganis and George A. Constantinides


Outline

• Monte Carlo Simulation

• Multivariate Gaussian Random Number Generator (MVGRNG)

• Objective

• Optimization algorithm

• Proposed framework – Hardware architecture

• Experimental Results

• Conclusions


Introduction

• Monte Carlo simulation

» Mathematical technique

» Repeated random sampling

» Evaluate non-deterministic processes

• Prerequisite for MC simulation: random numbers

• Multivariate Gaussian distribution captures many correlated variables

• Acceleration of MC using FPGA

» Speed up simulations

» Optimization of MVGRNG


Objective

• Existing approaches focus only on single-distribution MVGRNGs

• This work: mapping of multiple multivariate Gaussian distributions

• Example: optimization of many financial portfolios

» Represented by many multivariate Gaussian distributions

• An MVGRNG is usually part of a larger application

» Resource usage is CRUCIAL

• Efficient resource sharing


Generating Multivariate Gaussian Random Numbers

• Mean (m) and Covariance matrix (Σ)

• OBJECTIVE: approximate Σ

• Eigenvalue decomposition using SVD

• Using any number K of decomposition levels

$\Sigma = U \Lambda U^T = U \Lambda^{1/2} \Lambda^{1/2} U^T$

$x = U \Lambda^{1/2} z + m, \quad z \sim N(0, I)$

$x \approx \sum_{i=1}^{K} c_i z_i + m$
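As an illustration of this construction, here is a minimal NumPy sketch (a software model, not the paper's hardware; the function name and the eigenvalue clipping are additions for robustness):

```python
import numpy as np

def mvg_samples(mean, cov, n_samples, rng=None):
    """Draw multivariate Gaussian samples via eigenvalue decomposition:
    x = U @ Lambda^{1/2} @ z + m, with z ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    lam, U = np.linalg.eigh(cov)            # cov = U @ diag(lam) @ U.T
    lam = np.clip(lam, 0.0, None)           # guard against tiny negative eigenvalues
    A = U @ np.diag(np.sqrt(lam))           # A = U Lambda^{1/2}
    z = rng.standard_normal((cov.shape[0], n_samples))
    return A @ z + mean[:, None]            # one sample per column

# Example: 3x3 correlation matrix, 100,000 vectors
m = np.zeros(3)
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
x = mvg_samples(m, S, 100_000)
print(np.round(np.cov(x), 2))               # should be close to S
```

Truncating $A = U\Lambda^{1/2}$ to its K dominant columns corresponds to using K levels of decomposition.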

Proposed Algorithm

[Flowchart: the proposed algorithm]

Input Matrices Σ1, Σ2, …, Σm
→ per distribution: Approximation Optimization → Approximation Error Calculation
→ Calculate Overall Approximation Error
→ Calculate Remainder of Target Matrices
→ Check for Termination Constraint: No → iterate; Yes → output Vector Coefficients c

Annotations:

• Approximate Σ for each distribution

• Target redundancies between ALL input distributions

• Exploit similarities in PRECISION REQUIREMENTS

• Select the appropriate precision to minimize the approximation error for all distributions

• Distinct coefficients for each distribution

• The algorithm takes any number M of distributions
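As a schematic sketch of this loop, the following Python assumes that each iteration extracts the dominant eigen-component of every remainder matrix and that precision selection is a simple search over fixed-point word lengths; `quantize` and `approximate_all` are illustrative names, and the real algorithm additionally shares coefficients and hardware across distributions:

```python
import numpy as np

def quantize(v, bits):
    """Illustrative fixed-point model: round to `bits` fractional bits."""
    scale = 2.0 ** bits
    return np.round(v * scale) / scale

def approximate_all(covs, precisions, max_levels, tol):
    """Schematic version of the loop: per level, extract the dominant
    eigen-component of each remainder matrix, pick the precision that
    minimizes the resulting error, and subtract the contribution."""
    remainders = [S.astype(float).copy() for S in covs]
    coeffs = [[] for _ in covs]
    for _ in range(max_levels):
        for j, R in enumerate(remainders):
            lam, U = np.linalg.eigh(R)
            c = np.sqrt(max(lam[-1], 0.0)) * U[:, -1]     # dominant component
            # Approximation Optimization: choose precision with lowest error
            best = min(precisions, key=lambda b: np.linalg.norm(
                R - np.outer(quantize(c, b), quantize(c, b))))
            cq = quantize(c, best)
            coeffs[j].append(cq)                          # vector coefficient
            remainders[j] = R - np.outer(cq, cq)          # remainder of target
        overall = sum(np.linalg.norm(R, 'fro') ** 2 for R in remainders)
        if overall < tol:                                 # termination constraint
            break
    return coeffs
```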

Error Estimation Model

• Mean square error

• Approximation error for each distribution

$\text{Error} = \frac{1}{N^2} \left\| \Sigma - \hat{\Sigma} \right\|^2$

where $\Sigma$ is the actual matrix, $\hat{\Sigma}$ the approximated matrix, and $N$ the matrix order.
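In code, this metric is a one-liner (assuming the Frobenius norm, consistent with the mean-square-error description above):

```python
import numpy as np

def approximation_error(actual, approx):
    """Per-distribution mean square error, normalized by matrix order N
    (Frobenius norm assumed)."""
    N = actual.shape[0]
    return np.linalg.norm(actual - approx, 'fro') ** 2 / N ** 2
```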

[Figure: MVGRNG datapath. A GRNG feeds a chain of computational blocks CB1, CB2, CB3, each a multiply-add stage with its own coefficients c_i per distribution; Gaussian inputs z1/z2 produce output vectors x1/x2.]

Hardware Architecture

Constructed from K computational blocks (CBs), where K = number of decomposition levels

Mixed precisions in the datapath

LUT-based implementation

Precision of the adder path = maximum precision over all CBs


Hardware Architecture

Two multivariate Gaussian distributions (3x3 correlation matrices)

Using 3 levels of decomposition (K = 3)

GRNG with a different seed for each input distribution (completely independent streams)

x1 produced after K clock cycles; x2 produced after 2K clock cycles (see the sketch below)

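A minimal software model of this time-multiplexed datapath is sketched below; the round-robin interleaving of the two distributions is an assumption inferred from the stated K-cycle and 2K-cycle latencies, and `interleaved_mvgrng` is an illustrative name:

```python
import numpy as np

def interleaved_mvgrng(coeffs_per_dist, n_vectors, rng=None):
    """Software model of the shared datapath: each output vector is the
    accumulation of c_i * z_i over K cycles (one CB per level), with the
    distributions served in round-robin order. Not RTL; the interleaving
    is assumed, consistent with the K- and 2K-cycle latencies."""
    rng = np.random.default_rng() if rng is None else rng
    outputs = {d: [] for d in range(len(coeffs_per_dist))}
    for n in range(n_vectors):
        d = n % len(coeffs_per_dist)        # alternate x1, x2, x1, ...
        acc = np.zeros_like(coeffs_per_dist[d][0], dtype=float)
        for c in coeffs_per_dist[d]:        # K decomposition levels
            z = rng.standard_normal()       # GRNG sample (separate seed per dist in HW)
            acc = acc + c * z               # multiply-add, as in each CB
        outputs[d].append(acc)
    return outputs
```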

Experiment I

Accuracy of the Error and Resource Estimation Models

Accuracy of the Error Estimation Model

[Plot: empirical vs. estimated approximation error of the correlation matrices; both axes log-scale, roughly 10^-15 to 10^0.]

Accuracy of the Resource Estimation Model

[Plot: empirical vs. estimated resource utilization (LUTs), roughly 0 to 3500 LUTs.]

Experiment II

Comparison with Existing Approaches

Experimental Setup

• Approaches under consideration

» [Thomas and Luk 2008]

» Our previous work [Saiprasert et al 2009]

• Throughput of the existing approaches adjusted to the same level

» For a fair comparison

• For [Saiprasert et al 2009], M consecutive levels are forced to use the same CB

» M = number of input distributions


[Figure: forced sharing in [Saiprasert et al 2009]: coefficient pairs (a1, a2) and (a3, a4) mapped onto shared CBs (CB1, CB2, CB3) to produce x.]

Comparison of All Approaches

                              [Thomas08]    [Saiprasert09]    This work
Architecture                  DSP           LUTs              LUTs
Precision                     Fixed         Mixed             Mixed
Optimization across all
input distributions           No            No                Yes

Hardware sharing:

» [Thomas08]: reuse the same hardware for all input matrices

» [Saiprasert09]: force M consecutive decomposition levels to share the same hardware

» This work: optimized precisions and coefficients for all input distributions

Experimental Setup

• 4 sets of input correlation matrices

» Set I: Four 2x2 matrices

» Set II: Four 4x4 matrices

» Set III: Four 6x6 matrices

» Set IV: Two 2x2 and two 4x4 matrices

• One MVGRNG optimized for each set

• 100,000 vectors generated for each set

Set I Matrices (2x2)

[Plot: approximation error of the correlation matrix vs. resource utilization (LUTs) for Set I, comparing the Proposed Approach, the extension of our previous work, and [Thomas and Luk 08] with 18-bit and floating-point GRNGs. Marked design points: 18-bit upstream with double-precision hardware; double-precision upstream with double-precision hardware; 18-bit upstream with 18-bit hardware; 18-bit upstream with mixed-precision hardware.]

Set II Matrices (4x4)

[Plot: approximation error of the correlation matrix vs. resource utilization (LUTs) for Set II, comparing the Proposed Approach, the extension of our previous work, and [Thomas and Luk 08] with 18-bit and floating-point GRNGs.]

Set III Matrices (6x6)

[Plot: approximation error of the correlation matrix vs. resource utilization (LUTs) for Set III, same four approaches; annotations of 50% and 38%.]

Set IV Matrices (Mixed Matrix Orders)

[Plot: approximation error of the correlation matrix vs. resource utilization (LUTs) for Set IV, same four approaches.]

Conclusions

• A novel approach to MVGRNG design for multiple distributions

• One generator optimized for all input distributions

• Effective resource-sharing algorithm

• Exploits similarities in precision requirements

• Up to 50% reduction in resource usage

• No penalty on the quality of the generated data


THANK YOU FOR YOUR ATTENTION