Date posted: 18-Dec-2014
Category: Software
Uploaded by: paradigm4inc
Massively Scalable Computational Finance with SciDB
Bryan Lewis, Chief Data Scientist
Frank Smietana, Solutions Architect
© Paradigm4
GoToWebinar
• Ask questions using the Q&A window
• This webinar is being recorded
• Replays will be available from paradigm4.com
Common issues
• Expensive data ETL
• Lack of horizontal scalability
• Hard to program
• Hard to extend
• Difficulty with data JOINS
What is SciDB?
Massively scalable distributed array database
What is SciDB?
Open source
What is SciDB?
Mike Stonebraker, CTO
Big Science and SciDB
• Lawrence Berkeley
• NASA Goddard: projects using satellite image data
• Institute for Geoinformatics: global land change analysis on remote sensing data (LANDSAT, MODIS, SENTINEL)
Commercial applications
• Pharma, Biotech, Healthcare
• Quantitative Finance
• Image & Sensor Analytics
• E-commerce
Arrays for finance
[Figure: array with Symbol and Time axes]
Fast multidimensional SELECTs
Table model
i  j  data
1  1   0.5
1  2   0.3
1  3   0.1
1  4  -0.5
2  1   0.9
2  2   0.0
2  3  -0.8
2  4  -0.8
3  1   1.1
3  2   1.0
3  3   1.2
3  4   1.5
4  1   0.9
4  2   1.0
4  3   1.2
4  4   1.5
Array model
0.5 0.3 0.1 -0.5
0.9 0.0 -0.8 -0.8
1.1 1.0 1.2 1.5
0.9 1.0 1.2 1.5
(rows indexed by i, columns by j; cell (1,1) is the top-left entry)
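The two models above hold the same values. A minimal sketch in plain Python of how the table rows map onto array coordinates, and why a range SELECT is cheaper on the array. The function names are hypothetical and this is not SciDB code; it only illustrates the idea with the numbers from the slides.

```python
# Table model: one (i, j, value) row per cell, as on the "Table model" slide.
table = [
    (1, 1, 0.5), (1, 2, 0.3), (1, 3, 0.1), (1, 4, -0.5),
    (2, 1, 0.9), (2, 2, 0.0), (2, 3, -0.8), (2, 4, -0.8),
    (3, 1, 1.1), (3, 2, 1.0), (3, 3, 1.2), (3, 4, 1.5),
    (4, 1, 0.9), (4, 2, 1.0), (4, 3, 1.2), (4, 4, 1.5),
]


def table_to_array(rows, n_i, n_j):
    """Place each value at its (i, j) coordinate (1-based, as on the slides)."""
    array = [[None] * n_j for _ in range(n_i)]
    for i, j, value in rows:
        array[i - 1][j - 1] = value
    return array


def select_table(rows, i_lo, i_hi, j_lo, j_hi):
    """Table model: a range query has to test every row."""
    return [v for i, j, v in rows if i_lo <= i <= i_hi and j_lo <= j <= j_hi]


def select_array(array, i_lo, i_hi, j_lo, j_hi):
    """Array model: a range query is a direct slice by coordinate."""
    return [v for row in array[i_lo - 1:i_hi] for v in row[j_lo - 1:j_hi]]


array = table_to_array(table, 4, 4)
window_from_table = select_table(table, 2, 3, 2, 3)
window_from_array = select_array(array, 2, 3, 2, 3)
```

Both selections return the same 2 x 2 window; the array version reaches it by position instead of scanning all 16 rows.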
Our approach
• Less data movement
• Spatial data clustering
• Leverage popular languages
• Extensibility
Use Popular Languages
• C++
• Julia
• Java/JVM
• JavaScript
• Array SQL
• JDBC
• Protocol buffers
• C/C++ API
• HTTP
Shared-nothing architecture
[Figure: SciDB instances 0, 1, 2, …]
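A rough sketch of what shared-nothing means for an array: the array is cut into chunks, each chunk belongs to exactly one instance, and each instance computes on its own chunks before the partial results are combined. The round-robin placement rule and all function names below are assumptions made for illustration; the slides do not describe how SciDB actually assigns chunks.

```python
def chunk_of(i, j, chunk_rows, chunk_cols):
    """Chunk coordinate containing cell (i, j), using 0-based indices."""
    return (i // chunk_rows, j // chunk_cols)


def instance_of(chunk, chunks_per_row, n_instances):
    """Hypothetical placement rule: round-robin over the chunk's linear index."""
    ci, cj = chunk
    return (ci * chunks_per_row + cj) % n_instances


def local_sums(cells, chunk_rows, chunk_cols, chunks_per_row, n_instances):
    """Each instance sums only the cells it owns; no data is shared."""
    partial = [0.0] * n_instances
    for (i, j), value in cells.items():
        chunk = chunk_of(i, j, chunk_rows, chunk_cols)
        owner = instance_of(chunk, chunks_per_row, n_instances)
        partial[owner] += value
    return partial


# A 4 x 4 array of ones, cut into 2 x 2 chunks and spread over 3 instances.
cells = {(i, j): 1.0 for i in range(4) for j in range(4)}
partial = local_sums(cells, chunk_rows=2, chunk_cols=2,
                     chunks_per_row=2, n_instances=3)
total = sum(partial)  # combine the per-instance partial results
```

Only the small per-instance results cross the network in the final step, which is the "less data movement" point from the earlier slide.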
Common issues
• Expensive data ETL
• Lack of horizontal scalability
• Hard to program
• Hard to extend
• Difficulty with data JOINS
SciDB
• Minimize ETL
• Massively scalable
• Program from many languages
• Open-source extensibility
• Fast parallel JOIN
Poll
Examples
• Order books
• Network analysis
Order book challenges
• Lots of exchanges
• Regulatory compliance
• Margins are shrinking
• Want more alpha
Create order book
• Load raw data into array
• Dimension along symbol and time coordinate axes
• Create order book entries with custom aggregation function ORDERBOOK
https://github.com/Paradigm4/orderbook-example
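The steps above can be sketched in plain Python. This is a simplified stand-in, not the ORDERBOOK aggregate from the linked repository: the event layout (symbol, time, side, price, size), the one-second bucket, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict


def build_order_books(events, bucket_seconds=1, depth=10):
    """Aggregate raw quotes into one book per (symbol, time bucket)."""
    levels = defaultdict(lambda: {"bid": defaultdict(int), "ask": defaultdict(int)})
    for symbol, t, side, price, size in events:
        bucket = int(t // bucket_seconds)          # time coordinate
        levels[(symbol, bucket)][side][price] += size
    books = {}
    for key, sides in levels.items():
        bids = sorted(sides["bid"].items(), reverse=True)[:depth]  # highest bid first
        asks = sorted(sides["ask"].items())[:depth]                # lowest ask first
        books[key] = {"bid": bids, "ask": asks}
    return books


# Hypothetical raw events: (symbol, time in seconds, side, price, size)
events = [
    ("ABC", 0.2, "bid", 10.00, 100),
    ("ABC", 0.4, "bid", 10.01, 200),
    ("ABC", 0.7, "bid", 10.00, 50),
    ("ABC", 0.9, "ask", 10.03, 300),
    ("ABC", 1.1, "ask", 10.02, 100),
]
books = build_order_books(events)
```

Each (symbol, bucket) key plays the role of one cell in the symbol-by-time array, and the value in that cell is the aggregated book, truncated to the requested depth.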
Consolidate order books
• Load as arrays
• Merge into single array
• Impute missing values (inexact temporal join)
• Aggregate by time and symbol
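A minimal sketch of the merge, impute, and aggregate steps for a single symbol. It assumes each exchange supplies time-sorted (time, best bid, best ask) quotes and that "impute" means carrying the last known quote forward; both are assumptions, as are the function names and sample numbers.

```python
def last_known(quotes, t):
    """Inexact temporal join: latest quote at or before time t, else None."""
    best = None
    for time, bid, ask in quotes:  # quotes are sorted by time
        if time <= t:
            best = (bid, ask)
        else:
            break
    return best


def consolidate(books, times):
    """Merge per-exchange quotes and keep the best bid and ask at each time."""
    merged = {}
    for t in times:
        known = [q for q in (last_known(quotes, t) for quotes in books.values()) if q]
        if known:
            merged[t] = (max(bid for bid, _ in known), min(ask for _, ask in known))
    return merged


# Hypothetical quotes from two exchanges for one symbol.
books = {
    "X1": [(1, 10.00, 10.04), (3, 10.01, 10.03)],
    "X2": [(2, 10.02, 10.05)],
}
consolidated = consolidate(books, times=[1, 2, 3])
```

At time 2 exchange X1 has no new quote, so its time-1 quote is carried forward and still contributes the best ask.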
Example Order Books
Merge and impute
Consolidated Order Book
Benchmark Results
• 9 exchanges; 358,000,000 events; 8,000 symbols
• Order book depth: 10
Financial network analysis
A graph
Sparse matrix representation
Bitcoin transactions
A directed graph, represented as a nonsymmetric sparse matrix
[Figure: From address → To address, with Date, Amount, Transaction ID]
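A small sketch of the representation: only the cells that hold a transaction are stored, keyed by from-address and to-address. The addresses, dates, and amounts are made up, and summing the amounts per cell is an assumption; the slide does not say how repeated transactions between the same pair are stored.

```python
from collections import defaultdict

# Hypothetical transactions: (from_address, to_address, date, amount, tx_id)
transactions = [
    ("a", "b", "2014-01-01", 1.5, "tx1"),
    ("a", "c", "2014-01-02", 0.5, "tx2"),
    ("b", "c", "2014-01-03", 2.0, "tx3"),
    ("a", "b", "2014-01-04", 1.0, "tx4"),
]


def to_sparse(transactions):
    """Store only nonzero cells: matrix[from][to] = total amount sent."""
    matrix = defaultdict(dict)
    for src, dst, _date, amount, _tx_id in transactions:
        matrix[src][dst] = matrix[src].get(dst, 0.0) + amount
    return matrix


matrix = to_sparse(transactions)

# A directed graph is generally nonsymmetric: a -> b does not imply b -> a.
is_symmetric = all(
    matrix.get(dst, {}).get(src) == amount
    for src, row in matrix.items()
    for dst, amount in row.items()
)
```

Three stored cells describe the whole graph here, whereas the dense 3 x 3 matrix would hold nine.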
Bitcoin network schema
(using the Reid/Harrigan user ID method)
Identify important nodes
• Kleinberg HITS method
• Subgraph centrality
• Fiedler clustering
• Other methods...
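As an illustration of the first method in the list, a minimal power-iteration sketch of Kleinberg's HITS on a tiny directed edge list. The graph, the fixed iteration count, and the function name are assumptions; this is plain Python for exposition, not how the computation is expressed in SciDB.

```python
import math


def hits(edges, iterations=50):
    """Power iteration for hub and authority scores on a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes pointing in.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of authority scores of the nodes pointed to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = new_hub
        # Normalize both score vectors to unit Euclidean length.
        for scores in (hub, auth):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical graph: a pays c and d, b pays c.
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
```

Node a sends to the most recipients and ranks as the top hub; node c receives from the most senders and ranks as the top authority.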
Bitcoin subgraph centrality
• Identify top 5 most central hub and authority nodes
• 16.3M nodes
• 6.3M x 6.3M sparse matrix
• 8-instance SciDB cluster on a single workstation (8 cores)
• Run time: 20 seconds
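For reference, the basic definition of subgraph centrality is the diagonal of the matrix exponential of the adjacency matrix, which counts closed walks through each node with longer walks weighted less. The sketch below applies that definition to a tiny undirected graph using a dense truncated series. It is an assumption-laden illustration only: the hub and authority variant named on the slide applies to directed graphs, and a matrix of the size quoted above would need sparse methods that are not shown here.

```python
def mat_mul(a, b):
    """Dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def subgraph_centrality(adj, terms=30):
    """Diagonal of exp(adj), via the truncated series sum_k adj^k / k!."""
    n = len(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # adj^0 / 0!
    diag = [1.0] * n
    for k in range(1, terms):
        power = mat_mul(power, adj)
        power = [[x / k for x in row] for row in power]  # now adj^k / k!
        for i in range(n):
            diag[i] += power[i][i]
    return diag


# Undirected path graph 0 - 1 - 2: the middle node closes the most walks.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
scores = subgraph_centrality(adj)
```

The middle node scores highest and the two end nodes tie, as expected from the symmetry of the path.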
Correlation network
1. Compute bar data closing prices from TAQ trades
2. na.locf imputation
3. Correlation matrix across all instruments
4. Regularize
5. Precision matrix
6. Threshold
7. Plot clusters
All steps up to the plot run inside SciDB.
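Steps 2 through 6 can be sketched on toy data in plain Python. Several choices here are assumptions, since the slide does not specify them: the bars are taken as already computed, correlation is run on the imputed series as given, regularization is a simple shrink toward the identity matrix, and the threshold is applied to raw precision-matrix entries. The sample prices and the shrink and threshold values are illustrative only.

```python
import math


def locf(series):
    """Carry the last observation forward over None gaps (like R's na.locf)."""
    out, last = [], None
    for x in series:
        last = x if x is not None else last
        out.append(last)
    return out


def correlation(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlation_network(bars, shrink, threshold):
    names = sorted(bars)
    n = len(names)
    filled = [locf(bars[s]) for s in names]                      # step 2
    corr = [[correlation(filled[i], filled[j]) for j in range(n)]
            for i in range(n)]                                   # step 3
    reg = [[(1 - shrink) * corr[i][j] + shrink * (i == j) for j in range(n)]
           for i in range(n)]                                    # step 4
    prec = invert(reg)                                           # step 5
    return [(names[i], names[j]) for i in range(n) for j in range(i + 1, n)
            if abs(prec[i][j]) > threshold]                      # step 6


# Hypothetical closing prices per bar; None marks a bar with no trade.
bars = {
    "A": [1.0, 2.0, None, 4.0, 5.0],
    "B": [2.0, 4.1, 6.0, None, 10.0],
    "C": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = correlation_network(bars, shrink=0.1, threshold=2.0)
```

The surviving edges are the input to the plotting step; with these toy numbers only the strongly related pair remains connected.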
Take away
• Bringing the analysis to the data
• In-database complex math
• Parallel time series analysis
• Programmable from C++, R, Python ...
• MPP on commodity clusters, clouds
• Extensible, open-source
www.paradigm4.com
Questions?
Tell us about your application • [email protected]
Try our Quick Start • scidb.org/forum
• Download a VM or EC2 AMI
www.paradigm4.com