Terry Spitz Citi 14 May 2013 Citi | Markets Quantitative Analysis Optimising Risk Management Computational Methods and Technologies for Finance
Transcript
Page 1: Citi | Markets Quantitative Analysis · Citi | Markets Quantitative Analysis Optimising Risk Management Computational Methods and Technologies for Finance . ... root of all evil.

Terry Spitz

Citi

14 May 2013

Citi | Markets Quantitative Analysis

Optimising Risk Management

Computational Methods and Technologies for Finance

Page 2:

Agenda

1. Increasing Compute Requirements

2. High Performance Hardware

3. Software Optimisation Technologies

4. Writing efficient parallel code

5. Case Study: Optimising Pricing Models

6. Conclusions

Page 3:

Increasing Compute Requirements

Pre-2008

• More trades

• More complex payoffs, more observation dates

• More complex models, more factors, more complex calibration

Post-2008

• More competitive marketplace demands reduced cost per trade

• More stability/accuracy from more steps/trials

• More market data scenarios

• More regulatory testing, for example hedging simulations

Page 4:

In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to smartphones are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done.

The free lunch is over. Now welcome to the hardware jungle.

Herb Sutter (2012)

Page 5:

High Performance Hardware

• NVidia

– GeForce/Quadro (1999-)

– Tesla (2007-)

– Fermi (2010-)

– Kepler (2012-)

• Intel

– Xeon (2004-)

– Larrabee (2006-2009)

– MIC Architecture: Knights Ferry / Intel Xeon Phi (2012-)

• Sony Cell (2005-2009)

• FPGA vendors

Page 6:

Hardware cost

Device                   Intel Xeon on Grid   NVidia Tesla M2090   Intel Xeon Phi
Cost per year            $6300/server         $2000/card           est. $2000/card
Cores                    12 x 3 GHz           512 x 1.3 GHz        62 x 1.05 GHz
Cost per core per year   $525/core            $4/core              est. $31/core
Speed                    300 GFLOPS           665 GFLOPS           est. 1290 GFLOPS
Cost per GFLOP           $21/GFLOP            $3/GFLOP             est. $1.6/GFLOP
Memory                   16 GB                6 GB                 8 GB

Page 7:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%.

Donald Knuth (1974)

Page 8:

Software Optimisation Technologies

• Optimise serial code

– Profile and optimise algorithms & code

– Better compiler: e.g. Intel CC

• Parallelise on CPU

– SSE/AVX

– Multicore

– OpenMP

– MPI

– Grid

• Port to GPU

– CUDA

– OpenCL

– C++ AMP

– Data parallel: Thrust, Microsoft Accelerator

Page 9:

Software Parallelisation APIs

Raw OS    fork() / pthread_create(…) / CreateThread(…);

OpenMP    #pragma omp parallel for
          for(int i = 0; i < N; i++) { …

TBB/PPL   parallel_for (0, size, [&](int i) { …

Grid      Session::sendTaskInput(Message* taskInput);

CUDA      __global__ void VecAdd(float* A, float* B) {…
          VecAdd<<<blocksPerGrid, threadsPerBlock>>>(…);

OpenCL    kernel = clCreateKernel(program, …)
          clEnqueueNDRangeKernel(queue, kernel, …)

C++ AMP   parallel_for_each(array.extent, [=](index<2> i) restrict(amp) { …

Thrust    thrust::transform(rng.begin(), rng.end(), payoffs.begin(), compute_payoff(…))
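As an illustration of the OpenMP row, here is a minimal sketch of a payoff sum parallelised with a single directive. The function name and the call-payoff example are assumptions made for illustration, not code from the talk. When the compiler is run without OpenMP support the pragma is ignored and the loop runs serially, giving the same answer up to floating-point rounding.

```cpp
#include <cmath>
#include <vector>

// Sums the call payoffs max(S - K, 0) over a set of terminal spot prices.
// The reduction clause gives each thread a private partial sum and
// combines the partial sums when the loop finishes.
double sumCallPayoffs(const std::vector<double>& spots, double strike) {
    double sum = 0.0;
    const long n = static_cast<long>(spots.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += std::fmax(spots[i] - strike, 0.0);
    }
    return sum;
}
```

The appeal of this API level is that the serial loop body is untouched; the cost is that the programmer must still make sure iterations are independent.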

Page 11:

Writing efficient GPU code

• Balance memory versus compute bottlenecks

• You need to run > 10,000 threads to keep the GPU busy

• SIMD (Single instruction, Multiple Data) with few divergent branches/loops

• Find the parallelism, for example:

– Multiple contracts

– Outer loops, e.g. MC Paths

– Third-party functions, e.g. matrix operations, RNG, parallel_reduce

• Extreme optimisation

– coherent memory access, shared memory, block/tile size, synchronisation, atomics, fast_maths, asynch/overlapped operations, …

• Limitations

– No exceptions, virtual functions, STL/vectors, new/delete, debugging, recursive functions, IEEE compliant maths, unsupported types, complex objects, …

Page 12:

Porting existing applications

https://developer.nvidia.com/content/assess-parallelize-optimize-deploy

• Analyse

– identify the hot spots by profiling with one or more realistic data sets

– estimate performance improvements considering the strong and weak scaling

• Parallelise

– GPU-accelerated libraries

– OpenACC directives

– GPU programming languages

• Optimise

• Deploy

The key with GPUs is to ask why you want to use them – are you looking to do something a lot cheaper, or a lot faster? Or are you looking to do something that you cannot feasibly do today?

The assess phase pretty much has to have whatever metric is applicable (time to solution, # solutions/second per watt, solution in under 5 minutes – whatever) firmly in mind.

John Ashley - NVidia

Page 13:

Case Study:

Migrating Pricing Models to

CUDA

Page 14:

Binomial Tree

BinomialTree::calculateFairValue(…)
    treeResult = tree.calculate();
        applyTerminalCondition();
        for(int i = startBackwardAt; i>=0; --i)
        {
            _steps[i]->update(*this, spots, fairs);
            applyExerciseDecision(…)
        }

Page 15:

kernel

Binomial Tree Parallel

BinomialTree::calculateFairValue(…)
    treeResult = tree.calculate();
        applyTerminalCondition();
        for(int i = startBackwardAt; i>=0; --i)
        {
            _steps[i]->update(*this, spotsVector, fairsVector);
            applyExerciseDecision(…)
        }

host

device_vector<double> d_spots(numContracts * numSteps);

device_vector<double> d_fairs(numContracts * numSteps);

applyTerminalConditions(d_spots, d_fairs);

binomialTreePricer<<<numContracts>>>(…)

for(int nContract=0; …)

fairValue[nContract] = h_fairs[nContract*numSteps + offset];
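To make the shape of the per-contract work concrete, here is a self-contained CPU sketch of a tree pricer with the same three stages: terminal condition, backward loop, exercise decision. It uses a standard Cox-Ross-Rubinstein tree; the `Contract` struct, `priceOnTree` and `priceAll` are hypothetical names and not the library code shown on the slides. Each contract is priced independently, which is why the slide launches one kernel instance per contract.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical contract description, for illustration only.
struct Contract {
    double spot, strike, rate, vol, expiry;
    bool isCall, isAmerican;
};

// Cox-Ross-Rubinstein tree: node (i, j) has spot * u^(2j - i).
double priceOnTree(const Contract& c, int steps) {
    const double dt = c.expiry / steps;
    const double u = std::exp(c.vol * std::sqrt(dt));
    const double d = 1.0 / u;
    const double disc = std::exp(-c.rate * dt);
    const double p = (std::exp(c.rate * dt) - d) / (u - d);  // risk-neutral up probability
    auto payoff = [&](double s) {
        return std::max(c.isCall ? s - c.strike : c.strike - s, 0.0);
    };
    std::vector<double> fairs(steps + 1);
    for (int j = 0; j <= steps; ++j)              // terminal condition
        fairs[j] = payoff(c.spot * std::pow(u, 2 * j - steps));
    for (int i = steps - 1; i >= 0; --i) {        // backward induction
        for (int j = 0; j <= i; ++j) {
            double value = disc * (p * fairs[j + 1] + (1.0 - p) * fairs[j]);
            if (c.isAmerican)                     // exercise decision
                value = std::max(value, payoff(c.spot * std::pow(u, 2 * j - i)));
            fairs[j] = value;
        }
    }
    return fairs[0];
}

// Contracts do not interact, so this outer loop is the natural unit of parallelism.
std::vector<double> priceAll(const std::vector<Contract>& contracts, int steps) {
    std::vector<double> fairValues(contracts.size());
    for (std::size_t k = 0; k < contracts.size(); ++k)
        fairValues[k] = priceOnTree(contracts[k], steps);
    return fairValues;
}
```

The backward loop over `i` is inherently sequential, so a per-contract mapping only keeps a GPU busy when there are many contracts to price at once.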

Page 16:

CUDA Approach

CPU

1. Prepare data

2. Loop over ‘kernel’

3. Summarise results

CUDA

1. Prepare data

2. Allocate device memory

3. Copy data to device

4. Invoke parallel kernel

5. Copy data from device

6. Summarise results

Page 17:

VarianceSwap Pricer

VarianceSwap::calculateFairValue(…)
  VarianceSwapReplication::calculateFairValue(…)
    calcFutureVariance(…)
      integrandStart = createLogContractIntegrand(…)
      integrator.integrate(…)
        for (i=0; i<degreeDiv2; ++i)
          calculateEuropean(…)
            engine.calculate(…)
              European::calculateFairValue(…)
                call = new Call(…)
                integration.integrate(…)
                  for (i=0; i<degree; ++i)
                    double payoff = blackPremium(…)

Page 18:

VarianceSwap Pricer Parallel

VarianceSwap::calculateFairValue(…)
  VarianceSwapReplication::calculateFairValue(…)
    calcFutureVariance(…)
      integrandStart = createLogContractIntegrand(…)
      integrator.integrate(…)
        for (i=0; i<degreeDiv2; ++i)
          calculateEuropean(…)
            engine.calculate(…)
              European::calculateFairValue(…)
                call = new Call(…)
                integration.integrate(…)
                  for (i=0; i<degree; ++i)
                    double payoff = blackPremium(…)

kernel

host

device_vector<double> bsPayoffs(…)

device_vector<double> inputs(…)

calculateBS<<<varSwapDegree, europeanDegree>>>(…)

Page 19:

MonteCarlo Pricer

EuropeanMCPricer::calculateFairValue(…)
{
    MCPathsGeneratorUtils::generatePaths(…)
        pathGen->generateIndependentNormals(…)
            _RNGFactory->createNewRNG(…)
            RNG->getIndependentNormals(…)
                getIndependentUniforms(variates);
                convertUniformstoNormals(variates);
        pathGen->generateCorrelatedNormals(…)
        pathGen->generatePaths(…)
            _model->diffuse(…)
    for( size_t iTrial = 0; iTrial<trials; ++iTrial)
    {
        contract->calcContractualFlows(mcFixingSchedule …
        for(size_t i=0; i < nFlows; ++i)
        {
            ImplementorUtils::getDefaultImplementor(…)
            calcEngine.calculate(flowImpl, fvRequest, results)
            fairValue += results->getFairValue();
        }
        priceCollector(fairValue);
    }
    result->setFairValue(acc::mean(priceCollector));
    result->set(FairValueStdDev, sqrt(variance(priceCollector)/trials));
}

Page 20:

MonteCarlo Pricer Parallel

EuropeanMCPricer::calculateFairValue(…)
{
    MCPathsGeneratorUtils::generatePaths(…)
        pathGen->generateIndependentNormals(…)
            _RNGFactory->createNewRNG(…)
            RNG->getIndependentNormals(…)
                getIndependentUniforms(variates);
                convertUniformstoNormals(variates);
        pathGen->generateCorrelatedNormals(…)
        pathGen->generatePaths(…)
            _model->diffuse(…)
    for( size_t iTrial = 0; iTrial<trials; ++iTrial)
    {
        contract->calcContractualFlows(mcFixingSchedule …
        for(size_t i=0; i < nFlows; ++i)
        {
            ImplementorUtils::getDefaultImplementor(…)
            calcEngine.calculate(flowImpl, fvRequest, results …
            fairValue += results->getFairValue();
        }
        priceCollector(fairValue);
    }
    result->setFairValue(acc::mean(priceCollector));
    result->set(FairValueStdDev, sqrt(variance(priceCollector)/trials));
}

host

curandCreateGenerator(…)
curandGenerateNormalDouble(…)
thrust::device_vector<double> contractFlows(nFlows*trials);
doTrialKernel<<<nTrials>>>(…)

kernel

pathGen->generateCorrelatedNormal(nTrial, …)
pathGen->generatePath(nTrial, …)
_model->diffuse(…)
contract->calcContractualFlows(nTrial, …)

host

thrust::transform(contractFlows, flowPVs, PVs);
double fairValue = thrust::reduce(PVs)/trials;
double variance = thrust::inner_product(PVs, PVs, 0.0)/trials - fairValue*fairValue;
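The final summary step estimates the fair value as the mean of the per-trial PVs and the variance as the mean of the squares minus the square of the mean. A host-side sketch of that reduction follows, with `std::accumulate` and `std::inner_product` standing in for `thrust::reduce` and `thrust::inner_product`; the `McEstimate` struct and `summarise` function are illustrative names only.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

struct McEstimate {
    double fairValue;  // mean of the per-trial PVs
    double variance;   // population variance of the per-trial PVs
    double stdError;   // standard error of the mean
};

// Reduces per-trial PVs to a Monte Carlo estimate.
// Precondition: pvs is not empty.
McEstimate summarise(const std::vector<double>& pvs) {
    const double n = static_cast<double>(pvs.size());
    const double mean = std::accumulate(pvs.begin(), pvs.end(), 0.0) / n;
    const double meanSq =
        std::inner_product(pvs.begin(), pvs.end(), pvs.begin(), 0.0) / n;
    const double variance = meanSq - mean * mean;  // E[x^2] - E[x]^2
    return McEstimate{mean, variance, std::sqrt(variance / n)};
}
```

This one-pass form maps well to two parallel reductions, but it can lose precision through cancellation when the variance is small relative to the mean.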

Page 21:

Copying Object Graphs

C++ Data Marshalling Best Practices - Cliff Woolley, NVIDIA http://on-demand.gputechconf.com/gtc/2012/presentations/S0377-GTC2012-Data-Marshalling-Practices.pdf

• We need data marshalling (serialization) to GPU

– We are moving data from one physical address space to another

– Virtual function tables must be updated

– Possible differences in structure layout

– Want bus transfers to be as efficient as possible

– Want parallel-friendly data organization to benefit the GPU

• C-style struct: cudaMemcpy and you’re done (even for an array)

– Except for fixing alignment

• TRICKIER CASES

– Virtual functions (device vtable ≠ host vtable)

- Split off any class containing virtual functions into:

• Base class contains only Plain Old Data members

• Derived class contains the virtual functions

– Bitfields

– AoS vs. SoA

– STL
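The "AoS vs. SoA" item refers to converting an array of structures into a structure of arrays before transfer, so that neighbouring GPU threads reading the same field touch consecutive addresses. A minimal host-side sketch of the conversion, with hypothetical `AssetState` fields chosen purely for illustration:

```cpp
#include <vector>

// Array-of-structures element: the fields of one asset sit next to each other.
struct AssetState {
    double spot;
    double vol;
};

// Structure-of-arrays layout: each field is stored contiguously across assets.
struct AssetStateSoA {
    std::vector<double> spot;
    std::vector<double> vol;
};

// Repacks an AoS vector into SoA form, preserving element order.
AssetStateSoA toSoA(const std::vector<AssetState>& aos) {
    AssetStateSoA soa;
    soa.spot.reserve(aos.size());
    soa.vol.reserve(aos.size());
    for (const AssetState& a : aos) {
        soa.spot.push_back(a.spot);
        soa.vol.push_back(a.vol);
    }
    return soa;
}
```

Each SoA member is then a plain array of doubles that can be copied to the device with a single transfer.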

Page 22:

Copying Object Graphs - Example

class PathGenerator : public Base
{
public:
    __host__ __device__ shared_ptr<MCPaths> generateCorrelatedNormals(
        shared_ptr<MCPaths> uniformRandomNumbers ) const;
private:
    shared_ptr<PathParams> _params;
    ublas::matrix<double> _cholesky;
};

Object graph:

PathGenerator: vtbl, _params, _cholesky

PathParams: vtbl, _antithetic, _brownianBridge

ublas::matrix: _size1, _size2, _data

Page 23:

Copying Object Graphs - DeviceWriter

class PathGenerator : public Base
{
public:
    void prepareDeviceMemory(DeviceWriter& writer)
    {
        writer.writeObject(this);
        writer.writeObject(_params);
        writer.writeMatrix(&_cholesky);
    }
};

Original (object graph):

PathGen: vtbl, _params, _cholesky

PathParams: vtbl, _antithetic, _brownianBridge

ublas::matrix: _size1, _size2, _data

Host: vtbl _params _cholesky vtbl _antithetic _brownianBridge _size2 _data _size1 …

Device: vtbl _params _cholesky vtbl _antithetic _brownianBridge _size2 _data _size1 …
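The idea of writing an object graph into one contiguous block and then copying that block in a single transfer can be sketched on the host as follows. `FlatWriter` is a hypothetical, much simplified stand-in for the `DeviceWriter` on the slide: it handles only trivially copyable objects and records offsets, whereas the real class must also rewrite embedded pointers and vtable pointers, which is not attempted here.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <unordered_map>
#include <vector>

// Packs objects into one contiguous byte buffer and remembers where each
// host object landed, so that its address in a copy of the buffer can be
// computed as (base of the copy) + offset.
class FlatWriter {
public:
    // Appends *obj at the next suitably aligned offset and returns that offset.
    template <class T>
    std::size_t writeObject(const T* obj) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable objects can be copied byte-wise");
        const std::size_t a = alignof(T);
        const std::size_t offset = (buffer_.size() + a - 1) / a * a;
        buffer_.resize(offset + sizeof(T));  // padding bytes are zero-filled
        std::memcpy(buffer_.data() + offset, obj, sizeof(T));
        offsets_[obj] = offset;
        return offset;
    }

    // Offset at which a previously written host object was stored.
    std::size_t offsetOf(const void* hostObj) const {
        return offsets_.at(hostObj);
    }

    // The packed bytes, ready for a single bulk copy.
    const std::vector<unsigned char>& bytes() const { return buffer_; }

private:
    std::vector<unsigned char> buffer_;
    std::unordered_map<const void*, std::size_t> offsets_;
};
```

In a CUDA build the `bytes()` buffer would be the source of one `cudaMemcpy`, and a lookup in the spirit of the slide's `getDevicePtr` would add `offsetOf(obj)` to the device base address.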

Page 24:

Calling member function on device

file: PathGenerator.cu
#include "cuda_shared_ptr.hpp"
#include "cuda_vector.h"
#include "cuda_matrix.hpp"
#include "MCPaths.h"

PathGenerator::generateCorrelatedNormals(shared_ptr<MCPaths> uniformRandomNumbers)
{
    shared_ptr<thrust::device_vector<double>> correlatedNormals(
        thrust::device_vector<double>(uniformRandomNumbers.size()));
    DeviceWriter writer;
    prepareDeviceMemory(writer);
    writer.writeObject(uniformRandomNumbers);
    writer.copyToDevice();
    generateCorrelatedNormalKernel<<<1, uniformRandomNumbers.getTrials>>>(
        writer.getDevicePtr(this),
        writer.getDevicePtr(uniformRandomNumbers),
        thrust::raw_pointer_cast(correlatedNormals->data()));
    return correlatedNormals;
}

Device: vtbl _params _cholesky vtbl _antithetic _brownianBridge _size2 _data _size1 …

file: cuda_shared_ptr.hpp
template<class T> class shared_ptr
{
    __host__ __device__ T* operator->() const {…}
};

Page 25:

Calling virtual function on device

file: PathGenerator.cu
#include "cuda_shared_ptr.hpp"
#include "cuda_vector.h"
#include "cuda_matrix.hpp"
#include "MCPaths.h"

PathGenerator::generatePaths(shared_ptr<MCPaths> correlatedNormals)
{
    shared_ptr<thrust::device_vector<double>> paths(
        thrust::device_vector<double>(correlatedNormals()));
    DeviceWriter writer;
    prepareDeviceMemory(writer);
    writer.writeObject(correlatedNormals);
    writer.addHostvtbl("BSModel", getvtbl(_bsModel));
    writer.addDevicevtbl("BSModel", getDevicevtbl<BSModel>());
    writer.copyToDevice();
    generatePathKernel<<<1, correlatedNormals.getTrials>>>(..)
        _model->diffuse(…)
}

Device: vtbl _params _cholesky vtbl _antithetic _brownianBridge _size2 _data _size1 …

file: Model.h
class ModelBase
{
    __host__ __device__ virtual void diffuse(…) {…}
};

Page 26:

__host__ __device__?

file: PathGenerator.2.h
#ifndef __CUDACC__
shared_ptr<MCPaths> PathGenerator::generateCorrelatedNormals(…) const
{
    shared_ptr<MCPaths> dZs_ptr(new MCPaths(…));
    MCPaths& dZs = *dZs_ptr;
#else
__device__ void DevicePathGenerator::generateCorrelatedNormal(const int iTrial, …) const
{
    MCPathsRaw& dZs = *pdZs;
#endif
    const size_t nStateVariables = dZs.getIndices();
    const size_t nTimeSteps = dZs.getTimeSteps();
#ifndef __CUDACC__
    for( size_t iTrial =0; iTrial< randomNumbers.getTrials(); ++iTrial)
    {
#endif
        for(size_t index =0; index< nStateVariables; ++index)
        {
            for(size_t jTime=0; jTime< nTimeSteps; ++jTime)
            {
                dZs(2*iTrial+1,index,jTime) = crng;

Page 27:

Case Study: Conclusions

1. Understand the balance between performance needs, hardware, development complexity/costs, risk

2. To CUDA-enable existing libraries:

– Choose appropriate granularity for parallelisation, review code which will need porting to CUDA

– Simple objects are unchanged

– Move object allocation (new/vector resizing) out of lower level functions

– Add *.cu files

• #include cuda-enabled STL and boost headers

• write kernels to dispatch to __device__ member functions

• write host code to prepare/copy objects and call kernel

– Modify/split existing headers

• Use #ifdef __CUDACC__ to wrap device functions/subclasses

• CUDAHOSTDEVICE prefix for shared header functions

– Project files

• Add a build configuration to disable CUDA

3. Issues

– Persisting objects on device between calls (using this paradigm)

– nvcc compiler warnings all on/off

– Debugging errors is painful (Nsight should help)
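The `CUDAHOSTDEVICE` prefix mentioned above is typically a macro that expands to the CUDA function qualifiers under nvcc and to nothing under an ordinary C++ compiler, so one header serves both builds. The slides name the macro but do not show its definition, so the definition below is an assumption, and `discountedCallPayoff` is a hypothetical helper used only to show the macro in place.

```cpp
// Under nvcc (__CUDACC__ defined) shared functions are compiled for both
// host and device; under any other compiler the prefix disappears.
#ifdef __CUDACC__
#define CUDAHOSTDEVICE __host__ __device__
#else
#define CUDAHOSTDEVICE
#endif

// Hypothetical shared helper: uses no allocation, exceptions or STL,
// so the same source is valid in host code and in a kernel.
CUDAHOSTDEVICE inline double discountedCallPayoff(double spot, double strike,
                                                  double discountFactor) {
    const double intrinsic = spot > strike ? spot - strike : 0.0;
    return discountFactor * intrinsic;
}
```

This also covers the "build configuration to disable CUDA" item: with the macro empty, the header compiles unchanged in a CPU-only build.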

Page 28:

Citi | Markets Quantitative Analysis

The End

