
Haefele june27 1150am_room212_v2

Description:
Deep Value has been using Hadoop to run simulations of trading strategies that trade over 3.5% of the US stock market. We provide both high-frequency market making and execution strategies. Our largest customer is the NYSE, where we provide execution services to the floor broker community. We have taken our high-performance, fault-tolerant Java trading engine and adapted it to run as a MapReduce job. Our execution engine Mapper is used to pull out the order-by-order data of all orders going into the US stock market and replay it against our production algorithmic logic. We do this to understand whether changes made to the algorithmic logic improve the overall performance of our trading. This approach, although it solves one set of issues ("is this approach better than that one?"), creates a new set of challenges. These include not blowing our compute budget (EC2 costs add up, so we built our own 50-server base cluster) and dealing with the escalating data that these simulations generate. Luckily these are first-world problems that Hadoop itself can help us address. We will describe how we went about converting our execution engine to use Hadoop and what components are needed to build a suitable trading simulation environment. We will also examine the types of analysis that we have built on top of the trading data and that have helped us understand what we are doing.
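
The abstract describes running the Java execution engine as a Mapper that replays historical orders against the production logic, but the deck itself contains no code. A minimal sketch of that shape is below; ExecutionEngine, Order and Fill are hypothetical stand-ins for the production classes, and only the Hadoop Mapper plumbing is standard.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch: each input line is one historical order event; the mapper
// replays it through the production algorithmic logic and emits the resulting fills.
// ExecutionEngine, Order and Fill are stand-ins, not Deep Value's actual classes.
public class ReplayMapper extends Mapper<LongWritable, Text, Text, Text> {

    private ExecutionEngine engine;

    @Override
    protected void setup(Context context) {
        // Boot one engine instance per map task, configured for simulation mode.
        engine = new ExecutionEngine(context.getConfiguration());
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        Order order = Order.parse(line.toString());
        // Replay the historical order against the current algorithmic logic.
        for (Fill fill : engine.replay(order)) {
            context.write(new Text(order.getSymbol()), new Text(fill.toRecord()));
        }
    }
}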
Transcript
Page 1: Haefele june27 1150am_room212_v2

Deep Value

Hadoop Summit June 2013

Deep Value, Inc.

Page 2: Haefele june27 1150am_room212_v2

Outline of talk

• Who we are
• What we do
• What is HFT?
• The structure of our technology effort
• How we use Hadoop
• Focus on what we've built at the top level and lessons learned
• Next steps? Open source with a founding team

Page 3: Haefele june27 1150am_room212_v2

Deep Value

• Started in 2006 to provide high-performance execution algorithms on a "paid for performance" basis
• Execution algorithms take large client orders and split them into small pieces to execute through the day
• Routinely trade 0.5–1% of US stock market volume; the highest day in 2012 was ~4%, and ~3% so far this year
• Provide exchange-sponsored execution algorithms to NYSE floor brokers
• 45 people based in the US and India

Page 4: Haefele june27 1150am_room212_v2

What do we do

• Use sophisticated math and statistics to find patterns in the data and turn them into trading tactics
• Use simulation to understand whether trading ideas in fact work
• Core business is providing tools (algos) to mutual funds and others so they avoid being gamed by pure HFT traders
• The ability to harness compute resources is a key determinant of success – Hadoop
• All compute resources are now cluster-based and need a grid platform to utilize them – Hadoop

Page 5: Haefele june27 1150am_room212_v2

What is HFT?

• Look at every order in the market and make real-time decisions on what to do next
• Look to receive rebates by providing liquidity when it is sensible to do so
  – Citibank was a favourite for many years due to its low price and thus large percentage spread
• Some amount of "sniffing out" of large orders
• Often a speed game – faster routers, shorter wires, FPGAs
• We use smarts to try not to show our hand

Page 6: Haefele june27 1150am_room212_v2

Trading Systems

• Order Management Systems (OMS) / Execution Management Systems (EMS)
• Takes in market data representing every order placed in every market
• Sends out orders to the market, manipulates those orders (replace/cancel) and receives fills
  – Via the name-value protocol called FIX (see the sketch after this list)
• Fills represent actual trades
• Logs what it is doing via structured logging
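
FIX itself is not shown in the deck. As background, its name-value wire format is a sequence of tag=value pairs separated by the SOH (0x01) character; the sketch below is illustrative only (a real OMS/EMS would use a full FIX engine), though the tags shown are standard FIX field numbers.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the FIX tag=value format: fields separated by SOH (0x01).
// Illustrative only; production systems use a full FIX engine.
public class FixMessageSketch {

    private static final char SOH = '\u0001';

    public static Map<String, String> parse(String raw) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : raw.split(String.valueOf(SOH))) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // 35=D is a NewOrderSingle; 55=Symbol, 54=Side (1=Buy), 38=OrderQty, 44=Price.
        String raw = "8=FIX.4.2" + SOH + "35=D" + SOH + "55=IBM" + SOH
                + "54=1" + SOH + "38=100" + SOH + "44=25.50" + SOH;
        Map<String, String> msg = parse(raw);
        System.out.println("Symbol=" + msg.get("55") + " Qty=" + msg.get("38"));
    }
}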

Page 7: Haefele june27 1150am_room212_v2

Cloe

Page 8: Haefele june27 1150am_room212_v2

Lessons from building the grid

• Cluster-wide locks are the problem
  – Focus on these in the design
  – Batch changes and take the lock once (see the sketch after this list)
• Build for the performance case, and let the failure case be potentially slower / more complex
  – Regular message processing doesn't take cluster locks
• Hybrid of message passing & centralized control
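
A minimal sketch of "batch changes and take the lock once" follows. ClusterLock and StateChange are hypothetical interfaces, not Deep Value's actual grid API; in practice the lock might be backed by something like ZooKeeper.

import java.util.ArrayList;
import java.util.List;

// Sketch: regular message processing only queues work locally; the expensive
// cluster-wide lock is acquired once per batch. ClusterLock and StateChange
// are hypothetical stand-ins for the grid's coordination primitives.
public class BatchedUpdater {

    public interface ClusterLock {
        void acquire() throws InterruptedException;
        void release();
    }

    public interface StateChange {
        void apply();
    }

    private final ClusterLock lock;
    private final List<StateChange> pending = new ArrayList<>();

    public BatchedUpdater(ClusterLock lock) {
        this.lock = lock;
    }

    // Fast path: no cluster lock is touched while queuing.
    public void queue(StateChange change) {
        pending.add(change);
    }

    // The cluster-wide lock is taken once for the whole batch of changes.
    public void flush() throws InterruptedException {
        if (pending.isEmpty()) {
            return;
        }
        lock.acquire();
        try {
            for (StateChange change : pending) {
                change.apply();
            }
            pending.clear();
        } finally {
            lock.release();
        }
    }
}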

Page 9: Haefele june27 1150am_room212_v2

Questions to solve: Hadoop

• What is the algorithm actually doing?
  – Complexity, e.g. feedback loops
  – Testing against intentions
• Can we do better next time?
  – Back-testing
  – Improved research process
• Log and historical market data management

Page 10: Haefele june27 1150am_room212_v2

DV Research Process

• Want to be able to look at "raw" market data to prove ideas
  – Typically non-programmers with a statistical background
  – R project, including RHadoop
• Want to be able to make a change to production code and test whether it works better via simulation
  – Does it work better, how, when?
• Roll out code to production easily

Page 11: Haefele june27 1150am_room212_v2

Hadoop-ifying Cloe

• Realized we could run Cloe under Hadoop
• Drive "orders" into Cloe via Hadoop
• Pass in market data quote files via HBase
• Store simulation results in Hadoop/HBase (driver sketch below)
• Market Simulation Framework outputs fills
• Cascading to allow complex analysis by senior coders
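
The deck does not show how this wiring looks in code. Assuming quotes live in one HBase table and simulated fills are written back to another, a job driver might be shaped roughly like the sketch below; CloeSimulationMapper, FillWriterReducer and the table names are made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver: scan quote rows from an HBase table, replay them through a
// simulation mapper, and write the resulting fills to another HBase table.
// CloeSimulationMapper, FillWriterReducer and the table names are illustrative.
public class SimulationJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "cloe-simulation");
        job.setJarByClass(SimulationJobDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner batches for a full replay
        scan.setCacheBlocks(false);  // don't pollute the block cache with a scan

        TableMapReduceUtil.initTableMapperJob(
                "quotes", scan, CloeSimulationMapper.class,
                ImmutableBytesWritable.class, Text.class, job);
        TableMapReduceUtil.initTableReducerJob(
                "sim_fills", FillWriterReducer.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}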

Page 12: Haefele june27 1150am_room212_v2
Page 13: Haefele june27 1150am_room212_v2

Lessons learned – Hadoop

• EC2 costs can mount quickly
  – Had a hybrid plan (either our own hardware or EC2)
  – Built our own 50-node cluster. See the DV blog.
• Smaller files should go in HBase, not HDFS, because HDFS has a NameNode limitation (see the sketch after this list)
  – All file pointers are held in memory
• Different tasks with different resource requirements don't play nicely in a single cluster
  – YARN should solve this
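
To make the small-files point concrete: rather than writing one tiny quote file per symbol per day to HDFS (each file and block costs NameNode heap), the blob can be stored as an HBase cell. The sketch below uses the standard HBase client API; the table, row-key scheme and column names are assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: store each small quote blob as an HBase cell keyed by symbol and date,
// instead of as its own HDFS file. Table and column names are illustrative.
public class SmallFileStore {
    public static void store(String symbol, String date, byte[] quoteBlob) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("quote_blobs"))) {
            Put put = new Put(Bytes.toBytes(symbol + "#" + date));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("quotes"), quoteBlob);
            table.put(put);
        }
    }
}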

Page 14: Haefele june27 1150am_room212_v2

Lessons learned – Hadoop...

• Make developer machine setup turn-key
  – We use extensive scripting to make getting a dev environment running a one-step process
  – The dev environment is controlled to closely match the cluster environment
• Cascading is great for complex analysis (sketch below)
• Importance of cluster configuration
  – Memory, threads and cores for your jobs
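
As an example of the kind of Cascading (2.x) analysis this refers to, the sketch below totals simulated fill quantity per symbol; the paths and field names are assumptions, not the actual Deep Value schema.

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Sum;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

// Illustrative Cascading 2.x flow: group simulated fills by symbol and sum quantity.
// Input/output paths and field names are assumptions for this sketch.
public class FillAnalysisFlow {
    public static void main(String[] args) {
        Fields fillFields = new Fields("symbol", "side", "qty", "price");
        Tap source = new Hfs(new TextDelimited(fillFields, "\t"), "sim/fills");
        Tap sink = new Hfs(new TextDelimited(new Fields("symbol", "total_qty"), "\t"),
                "sim/fills-by-symbol");

        Pipe pipe = new Pipe("fills-by-symbol");
        pipe = new GroupBy(pipe, new Fields("symbol"));
        // Sum the qty column within each symbol group.
        pipe = new Every(pipe, new Fields("qty"), new Sum(new Fields("total_qty")), Fields.ALL);

        Properties properties = new Properties();
        Flow flow = new HadoopFlowConnector(properties).connect(source, sink, pipe);
        flow.complete();
    }
}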

Page 15: Haefele june27 1150am_room212_v2

Next steps

• Considering open-sourcing via the Apache license
• Bring some sanity to the traditional execution technology space
• Looking for a founding team
• Please talk to me afterward if you're interested in investigating further

Page 16: Haefele june27 1150am_room212_v2

End

