SIMULATION IN A BIG DATA WORLD

Bojan Nikolic

BN Algorithms Ltd

London 6th July 2017

London -- 06/07/2017 BOJAN NIKOLIC -- BN ALGORITHMS LTD

LEGAL DISCLAIMER


1. No warranty, express or implied, to the fitness of the presented architecture or code for any purpose

2. Use entirely at your own risk – BN Algorithms or Bojan Nikolic shall not be responsible for any damages direct or consequential

3. © Bojan Nikolic 2017, All Rights Reserved. Code sections present in this presentation are licensed to the reader under the QuantLib open source license.

4. Apache Spark and Apache Zeppelin are trademarks of the Apache Software Foundation

5. AWS EMR is a product of Amazon Inc

Using “Big Data” Technologies to scale-up simulation workloads

Concrete technologies: Apache Spark + Zeppelin stack

Concrete example workload: valuation of financial derivatives using QuantLib


THIS TALK IS ABOUT…

• Reuse the investment in Big Data technologies in a different, large field:

  • Training

  • User Interfaces

  • Cloud infrastructures / own data centre deployments

• Opportunity to combine big data analysis and simulations

• Opportunity for technology transfer from simulations into big data


THIS IS (HOPEFULLY!) OF INTEREST BECAUSE…

Career in Finance & Science connected by computing:

Design of PetaFLOPS+/PetaByte+ computing systems to process radio astronomy data

Grid/Cloud Risk Management systems for financial derivatives

Technologies enabling novel large radio astronomy telescopes


ABOUT ME…

Why derivatives? Simplify business, diversify risk

(But can equally be used to amplify risk)


FINANCIAL DERIVATIVES

Derivative = A security (contract) whose value depends in a non-linear way on a price in an open market

Example: European Call Option at maturity: $P(S) = \begin{cases} 0 & S \le K \\ S - K & S > K \end{cases}$
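A minimal sketch of the call payoff above, in plain Python. The function name `call_payoff` and the sample spot and strike values are assumptions made for illustration only.

```python
def call_payoff(spot: float, strike: float) -> float:
    """Payoff at maturity of a European call: 0 if S <= K, else S - K."""
    return max(spot - strike, 0.0)


# The payoff is flat below the strike and linear above it, so taken as a
# whole it is a non-linear function of the underlying market price.
payoffs = [call_payoff(spot, strike=100.0) for spot in (80.0, 100.0, 125.0)]
print(payoffs)
```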

What does the business want?

1. Short Time-To-Solution, Reliably (Low Max Power)

2. Low Capital Expenditure

3. Low Total Cost of ownership

4. High Degree of flexibility


ARCHITECTURAL DRIVERS


ARCHITECTURE

[Diagram] Modules: Boost C++ Libraries, QuantLib, QuantLib-SWIG, Apache Spark, Apache Zeppelin.

Key: each module is marked either as a non-domain-specific module or as a finance-specific module; an arrow from X to Y means “X” uses “Y”.

1. Project started in 2000 – now on 17 years of active development

2. C++ / Object Oriented Architecture, 2000s vintage

3. BSD-like license

4. In commercial use at a number of institutions (generally they are shy of publicising their use)

5. Designed primarily for single threaded use


ABOUT QUANTLIB


QUANTLIB OBJECT ORIENTATED ARCHITECTURE

OO design patterns enforce a distribution topology and organisation

1. Usually very inefficient to scale-up in this enforced topology/organisation

2. Difficult to ensure reliability


OBJECT ORIENTATION VS BIG DATA TECHNOLOGIES

Object Orientation == “Dataless Programming”

Not a good match for “Big Data” Technologies!


TASK GRANULARITY-COUPLING PLANE

[Figure] Vertical axis: Task Granularity (Fine Grained to Coarse Grained). Horizontal axis: Task Coupling (Loosely Coupled to Strongly Coupled).

Workloads placed on the plane: Derivatives Risk Analysis, MCMC Optimisation, Hydrodynamic Simulation, Deterministic Convex Problem Optimisation.

1. Manually aggregate a fixed number of valuations into a single task

2. Each of these tasks recreates all the necessary objects


TASK STRATEGY

Sample Application:

Calculate model value for a set of 100 swaptions for a wide range of model market conditions
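The task strategy on this slide (a fixed number of valuations per task, each task rebuilding its own objects) can be sketched in plain Python as below. This is an illustration only: `build_pricing_context` and `toy_value` are hypothetical stand-ins for the QuantLib object construction and swaption valuation, and the batch size and scenario values are assumed.

```python
from typing import Iterator, List, Sequence


def batch(items: Sequence[float], batch_size: int) -> Iterator[List[float]]:
    """Split the scenarios into tasks of a fixed number of valuations."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def build_pricing_context() -> dict:
    """Hypothetical stand-in for rebuilding the pricing object graph."""
    return {"strike": 0.02, "notional": 1_000_000.0}


def toy_value(context: dict, rate: float) -> float:
    """Toy valuation, standing in for a real swaption valuation."""
    return context["notional"] * max(rate - context["strike"], 0.0)


def run_task(rates: List[float]) -> List[float]:
    # Every task recreates all the objects it needs; nothing is shared
    # between tasks, so tasks can run on any worker in any order.
    context = build_pricing_context()
    return [toy_value(context, rate) for rate in rates]


scenarios = [0.01 + 0.0025 * i for i in range(10)]   # assumed market rates
tasks = list(batch(scenarios, batch_size=4))
results = [value for task in tasks for value in run_task(task)]
print(len(tasks), len(results))
```

The batch size trades scheduling overhead against load balance: larger batches amortise the cost of rebuilding objects, smaller ones spread more evenly across workers.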


QUANTLIB SWIG BINDING ARCHITECTURE

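[Diagram] Elements: QuantLib, QuantLib-SWIG interface definitions, C++ API, C++ Adapters, C / JNI, Java Classes, JVM Bytecode, Apache Spark.

Key: each module is marked either as a non-domain-specific module or as a finance-specific module; one arrow type from X to Y means “X” uses “Y”; a second arrow type from X to Y means “Y” is auto-generated from “X” using SWIG.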


SPARK ZEPPELIN ARCHITECTURE – COMPONENT & CONNECTOR VIEW

[Diagram] Run-time components: HTML/JS Client, Apache Zeppelin, Driver, Cluster Scheduler, Worker (three instances).

Key: an arrow from X to Y means “X” sends requests to “Y”, and “Y” replies asynchronously.

Build QuantLib & Java bindings for the AWS EMR runtime environment

Set up & spin up an EMR cluster

Construct a Zeppelin notebook with Scala/Spark QuantLib simulation/valuation

Explore/visualize the simulation results


STRATEGY

HOW TO BUILD QUANTLIB FOR AMAZON EMR:

1. Single standalone shared (*.so) library, compatible with GCC 4.8 used by AWS EMR

2. SWIG Java bindings (for use via Scala/Spark) -> Single .JAR

3. The Nix system is the recommended way to build!

4. Pre-built example used here:

   1. Shared Lib: https://s3.amazonaws.com/bnalgo-ql-emr-77x45/libQuantLibJNI.so

   2. JAR: https://s3.amazonaws.com/bnalgo-ql-emr-77x45/QuantLib.jar


1. Standard AWS EMR Zeppelin + Spark (here using EMR AMI V5.7)

2. Bootstrap actions to add QuantLib: s3://bnalgo-ql-emr-77x45/qlemrbootstrap.sh

3. Spin-up & all ready-to-go (in 5 mins!)


CLUSTER AMAZON EMR START-UP
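As a rough sketch of the start-up steps on this slide, the request below is shaped for boto3's EMR `run_job_flow` call. Only the bootstrap script path comes from the slide. The cluster name, instance types, instance count and role names are assumptions, and the release label `emr-5.7.0` is assumed to correspond to the "EMR AMI V5.7" mentioned there.

```python
def emr_cluster_request(bootstrap_script: str) -> dict:
    """Build keyword arguments for boto3's EMR `run_job_flow` call."""
    return {
        "Name": "quantlib-zeppelin",               # assumed cluster name
        "ReleaseLabel": "emr-5.7.0",               # assumed label for EMR 5.7
        "Applications": [{"Name": "Spark"}, {"Name": "Zeppelin"}],
        "Instances": {
            "MasterInstanceType": "m4.large",      # assumed instance type
            "SlaveInstanceType": "m4.large",       # assumed instance type
            "InstanceCount": 3,                    # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": True,   # keep the notebook cluster up
        },
        "BootstrapActions": [
            {
                "Name": "install-quantlib",        # assumed action name
                "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",      # assumed: AWS default roles
        "ServiceRole": "EMR_DefaultRole",
    }


request = emr_cluster_request("s3://bnalgo-ql-emr-77x45/qlemrbootstrap.sh")
# With boto3 installed and AWS credentials configured, the cluster could be
# requested with: boto3.client("emr").run_job_flow(**request)
print(request["BootstrapActions"][0]["ScriptBootstrapAction"]["Path"])
```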

Scala/Spark QuantLib valuation function

• Spark does not know how to distribute QuantLib values

• It does know how to distribute Scala closures, including ones that use QuantLib types

-> Write functions which close over an environment without any QuantLib values
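The closure rule above can be illustrated in plain Python, standing in for the Scala of the notebook. `ToyEngine` is a hypothetical stand-in for a QuantLib pricing object and `make_valuation` is an assumed helper name. The point of the sketch is that the returned function captures only plain numbers and builds the engine inside the call, on whichever worker runs it.

```python
class ToyEngine:
    """Hypothetical stand-in for a QuantLib object that cannot be shipped."""

    def __init__(self, strike: float, notional: float) -> None:
        self.strike = strike
        self.notional = notional

    def value(self, rate: float) -> float:
        return self.notional * max(rate - self.strike, 0.0)


def make_valuation(strike: float, notional: float):
    """Return a valuation function that closes over plain floats only."""

    def valuation(rate: float) -> float:
        # The engine is created here, inside the function body, so it never
        # becomes part of the environment that has to be distributed.
        engine = ToyEngine(strike, notional)
        return engine.value(rate)

    return valuation


valuation = make_valuation(strike=0.02, notional=1_000_000.0)
# Inspect what the closure actually captured: two floats, no engine object.
captured = [cell.cell_contents for cell in valuation.__closure__]
print(captured)
```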

QuantLib Global State

• Spark does not distribute or synchronise the QuantLib global state

• Write in a functional style – set global state in each valuation function

• Use executor-cores = 1 to separate global state between tasks
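A small Python sketch of the functional style described above. `GLOBAL_SETTINGS` is a hypothetical stand-in for process-wide library state (QuantLib keeps, for example, its evaluation date in a global settings object), and the scenario fields are assumed for illustration. Each call sets the state it depends on before pricing, so results do not depend on what ran earlier in the same process.

```python
import datetime as dt

# Hypothetical stand-in for process-wide library state.
GLOBAL_SETTINGS = {"evaluation_date": None}


def discount_factor(rate: float, maturity: dt.date) -> float:
    """Library-style code that silently reads the global state."""
    today = GLOBAL_SETTINGS["evaluation_date"]
    years = (maturity - today).days / 365.0
    return (1.0 + rate) ** (-years)


def valuation(scenario: dict) -> float:
    # Set all global state from the scenario on every call; never rely on
    # a value left behind by a previous task on the same worker.
    GLOBAL_SETTINGS["evaluation_date"] = scenario["evaluation_date"]
    return discount_factor(scenario["rate"], scenario["maturity"])


scenario_a = {"evaluation_date": dt.date(2017, 7, 6), "rate": 0.05,
              "maturity": dt.date(2018, 7, 6)}
scenario_b = {"evaluation_date": dt.date(2018, 1, 6), "rate": 0.05,
              "maturity": dt.date(2018, 7, 6)}

first = valuation(scenario_a)
valuation(scenario_b)          # a different scenario runs in between
again = valuation(scenario_a)
print(first == again)
```

This only protects against stale state between consecutive calls. Two valuations running concurrently in one process would still interfere, which is why the slide pairs it with one core per executor.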

Spark distributed Map operation

• The input RDD is the set of simulation input parameters (i.e., scenarios)

• The map function is the QuantLib valuation function


SCALA/SPARK APPLICATION: THE THREE TRICKY BITS
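The distributed map described on this slide has the shape sketched below, with Python's built-in `map` standing in for Spark's RDD map so that the example runs without a cluster. The `Scenario` fields and the toy `valuation` formula are assumptions for illustration. In the real application the function body would build QuantLib objects and price with them.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    """One set of simulation input parameters (assumed fields)."""
    rate: float
    volatility: float


def valuation(scenario: Scenario) -> float:
    """Toy stand-in for the QuantLib valuation function."""
    return 1_000_000.0 * max(scenario.rate - 0.02, 0.0) * (1.0 + scenario.volatility)


# The input collection plays the role of the input RDD: one element per
# scenario, each small and self-describing.
scenarios = [Scenario(rate=r / 100, volatility=v / 100)
             for r in range(1, 5) for v in (10, 20)]

# The map applies the valuation function independently to every scenario.
# With PySpark the same step would read:
#   sc.parallelize(scenarios).map(valuation).collect()
values = list(map(valuation, scenarios))
print(len(values))
```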

ZEPPELIN NOTEBOOK!


1. For many traditional “Big-Data” applications $\rho \sim 1$ to $10$

2. In this example: $\rho \gg 10^6$


SIMULATION OR BIG DATA ?

$\rho = \dfrac{\text{Number of executed CPU operations}}{\text{Number of bytes of input from storage or network}}$
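To make the ratio concrete, the sketch below evaluates it for two sets of numbers. All four inputs are assumed, purely illustrative values and not measurements from the talk. They are chosen only to show how a data scan and a simulation end up many orders of magnitude apart.

```python
def compute_intensity(cpu_operations: float, input_bytes: float) -> float:
    """CPU operations executed per byte of input read."""
    return cpu_operations / input_bytes


# Assumed example 1: a scan that does a handful of operations per byte.
scan = compute_intensity(cpu_operations=5e9, input_bytes=1e9)

# Assumed example 2: one simulation scenario described by about 100 bytes
# of parameters that triggers about 1e9 operations of valuation work.
simulation = compute_intensity(cpu_operations=1e9, input_bytes=100)

print(scan, simulation)
```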

1. Scale-up

2. Resilience

3. Use existing internal infrastructure or public cloud

4. Load balance against other analytics work

5. Results are stored in an environment ideally suited for further analysis and visualisation

6. Reproducibility

7. Collaboration with remote colleagues


WHAT HAVE WE ACHIEVED?

WWW.BNIKOLIC.CO.UK

• SW Solutions

• Consulting

• Training


Thank you!

