Statistically Sound Machine Learning for
Algorithmic Trading of Financial Instruments
Developing Predictive-Model-Based Trading Systems Using TSSB
David Aronson
with
Timothy Masters, Ph.D., Technical Advisor
Edition 1.20
Table of Contents
Introduction 1
Two Approaches to Automated Trading 1
Predictive Modeling 2
Indicators and Targets 3
Converting Predictions to Trade Decisions 4
Testing the Trading System 5
Walkforward Testing 7
Cross Validation 9
Overlap Considerations 10
Performance Criteria 11
Model Performance Versus Financial Performance 12
Financial Relevance and Generalizability 12
Performance Statistics in TSSB 13
Desirable Program Features 16
A Simple Standalone Trading System 19
The Script File 19
The Audit Log 23
A Walkforward Fold 26
Out-of-Sample Results for This Fold 31
The Walkforward Summary 33
A Simple Filter System 35
The Trade File 35
The Script File 37
The Audit Log 41
Out-of-Sample Results for This Fold 43
The Walkforward Summary 45
Common Initial Commands 47
Market Price Histories and Variables 47
Quick Reference to Initial Commands 48
Detailed Descriptions 50
INTRADAY BY MINUTE 50
INTRADAY BY SECOND 50
MARKET DATE FORMAT YYMMDD 51
MARKET DATE FORMAT M_D_YYYY 51
MARKET DATE FORMAT AUTOMATIC 51
REMOVE ZERO VOLUME 52
READ MARKET LIST 52
READ MARKET HISTORIES 53
MARKET SCAN 54
RETAIN YEARS 55
RETAIN MOD 56
CLEAN RAWDATA 59
INDEX 59
READ VARIABLE LIST 60
OUTLIER SCAN 61
DESCRIBE 62
CROSS MARKET AD 63
CROSS MARKET KL 63
CROSS MARKET IQ 64
STATIONARITY 65
A Final Example 68
Reading and Writing Databases 73
Quick Reference to Database Commands 73
Detailed Descriptions 74
RETAIN MARKET LIST 74
VARIABLE IS TEXT 75
WRITE DATABASE 75
READ DATABASE 76
READ UNORDERED DATABASE 77
APPEND DATABASE 78
IS PROFIT 79
A Saving/Restoring Example 80
Creating Variables 83
Overview and Basic Syntax 83
Index Markets and Derived Variables 84
An Example of IS INDEX and MINUS INDEX 85
Multiple Indices 86
Historical Adjustment to Improve Stationarity 87
Centering 87
Scaling 88
Normalization 89
An Example of Centering, Scaling, and Normalization 89
Cross-Market Normalization 92
Pooled Variables 93
MEDIAN Pooling 94
CLUMP60 Pooling 94
Mahalanobis Distance 96
Absorption Ratio 97
Trend Indicators 99
MA DIFFERENCE ShortLength LongLength Lag 99
LINEAR PER ATR HistLength ATRlength 99
QUADRATIC PER ATR HistLength ATRlength 100
CUBIC PER ATR HistLength ATRlength 100
RSI HistLength 101
STOCHASTIC K HistLength 101
STOCHASTIC D HistLength 101
PRICE MOMENTUM HistLength StdDevLength 101
ADX HistLength 102
MIN ADX HistLength MinLength 102
RESIDUAL MIN ADX HistLength MinLength 102
MAX ADX HistLength MaxLength 103
RESIDUAL MAX ADX HistLength MaxLength 103
DELTA ADX HistLength DeltaLength 103
ACCEL ADX HistLength DeltaLength 104
INTRADAY INTENSITY HistLength 104
DELTA INTRADAY INTENSITY HistLength DeltaLength 104
REACTIVITY HistLength 105
DELTA REACTIVITY HistLength DeltaDist 107
MIN REACTIVITY HistLength Dist 107
MAX REACTIVITY HistLength Dist 107
Trend-Like Indicators 108
CLOSE TO CLOSE 108
N DAY HIGH HistLength 108
N DAY LOW HistLength 109
Deviations from Trend 110
CLOSE MINUS MOVING AVERAGE HistLen ATRlen 110
LINEAR DEVIATION HistLength 110
QUADRATIC DEVIATION HistLength 111
CUBIC DEVIATION HistLength 111
DETRENDED RSI DetrendedLength DetrenderLength Lookback 112
Volatility Indicators 113
ABS PRICE CHANGE OSCILLATOR ShortLen Multiplier 113
PRICE VARIANCE RATIO HistLength Multiplier 114
MIN PRICE VARIANCE RATIO HistLen Mult Mlength 114
CHANGE VARIANCE RATIO HistLength Multiplier 114
MIN CHANGE VARIANCE RATIO HistLen Mult Mlen 115
ATR RATIO HistLength Multiplier 115
DELTA PRICE VARIANCE RATIO HistLength Multiplier 115
DELTA CHANGE VARIANCE RATIO HistLength Multiplier 116
DELTA ATR RATIO HistLength Multiplier 116
BOLLINGER WIDTH HistLength 116
DELTA BOLLINGER WIDTH HistLength DeltaLength 117
N DAY NARROWER HistLength 117
N DAY WIDER HistLength 118
Indicators Involving Indices 119
INDEX CORRELATION HistLength 119
DELTA INDEX CORRELATION HistLength DeltaLength 119
DEVIATION FROM INDEX FIT HistLength MovAvgLength 120
PURIFIED INDEX Norm HistLen Npred Nfam Nlooks Look1 ... 121
Basic Price Distribution Statistics 123
PRICE SKEWNESS HistLength Multiplier 123
CHANGE SKEWNESS HistLength Multiplier 123
PRICE KURTOSIS HistLength Multiplier 123
CHANGE KURTOSIS HistLength Multiplier 124
DELTA PRICE SKEWNESS HistLen Multiplier DeltaLen 124
DELTA CHANGE SKEWNESS HistLen Multiplier DeltaLen 124
DELTA PRICE KURTOSIS HistLen Multiplier DeltaLen 124
DELTA CHANGE KURTOSIS HistLen Multiplier DeltaLen 124
Indicators That Significantly Involve Volume 125
VOLUME MOMENTUM HistLength Multiplier 125
DELTA VOLUME MOMENTUM HistLen Multiplier DeltaLen 125
VOLUME WEIGHTED MA OVER MA HistLength 126
DIFF VOLUME WEIGHTED MA OVER MA ShortDist LongDist 126
PRICE VOLUME FIT HistLength 127
DIFF PRICE VOLUME FIT ShortDist LongDist 127
DELTA PRICE VOLUME FIT HistLength DeltaDist 127
ON BALANCE VOLUME HistLength 128
DELTA ON BALANCE VOLUME HistLength DeltaDist 128
POSITIVE VOLUME INDICATOR HistLength 129
DELTA POSITIVE VOLUME INDICATOR HistLen DeltaDist 129
NEGATIVE VOLUME INDICATOR HistLength 129
DELTA NEGATIVE VOLUME INDICATOR HistLen DeltaDist 129
PRODUCT PRICE VOLUME HistLength 130
SUM PRICE VOLUME HistLength 131
DELTA PRODUCT PRICE VOLUME HistLen DeltaDist 131
DELTA SUM PRICE VOLUME HistLen DeltaDist 131
Entropy and Mutual Information Indicators 132
PRICE ENTROPY WordLength 133
VOLUME ENTROPY WordLength 133
PRICE MUTUAL INFORMATION WordLength 133
VOLUME MUTUAL INFORMATION WordLength 134
Indicators Based on Wavelets 135
REAL MORLET Period 137
REAL DIFF MORLET Period 138
REAL PRODUCT MORLET Period 138
IMAG MORLET Period 139
IMAG DIFF MORLET Period 139
IMAG PRODUCT MORLET Period 140
PHASE MORLET Period 140
DAUB MEAN HistLength Level 141
DAUB MIN HistLength Level 141
DAUB MAX HistLength Level 141
DAUB STD HistLength Level 141
DAUB ENERGY HistLength Level 141
DAUB NL ENERGY HistLength Level 142
DAUB CURVE HistLength Level 142
Follow-Through-Index (FTI) Indicators 143
Low-Pass Filtering and FTI Computation 144
Block Size and Channels 144
Essential Parameters for FTI Calculation 144
Computing FTI 147
Automated Choice of Filter Period 148
Trends Within Trends 148
FTI Indicators Available in TSSB 150
FTI LOWPASS BlockSize HalfLength Period 150
FTI MINOR LOWPASS BlockSize HalfLength LowPeriod HighPeriod 151
FTI MAJOR LOWPASS BlockSize HalfLength LowPeriod HighPeriod 151
FTI FTI BlockSize HalfLength Period 151
FTI LARGEST FTI BlockSize HalfLength LowPeriod HighPeriod 152
FTI MINOR FTI BlockSize HalfLength LowPeriod HighPeriod 152
FTI MAJOR FTI BlockSize HalfLength LowPeriod HighPeriod 152
FTI LARGEST PERIOD BlockSize HalfLength LowPeriod HighPeriod 153
FTI MINOR PERIOD BlockSize HalfLength LowPeriod HighPeriod 153
FTI MAJOR PERIOD BlockSize HalfLength LowPeriod HighPeriod 153
FTI CRAT BlockSize HalfLength LowPeriod HighPeriod 154
FTI MINOR BEST CRAT BlockSize HalfLength LowPeriod HighPeriod 154
FTI MAJOR BEST CRAT BlockSize HalfLength LowPeriod HighPeriod 155
FTI BOTH BEST CRAT BlockSize HalfLength LowPeriod HighPeriod 155
Target Variables 156
NEXT DAY LOG RATIO 156
NEXT DAY ATR RETURN Distance 157
SUBSEQUENT DAY ATR RETURN Lead Distance 157
NEXT MONTH ATR RETURN Distance 157
HIT OR MISS Up Down Cutoff ATRdist 158
FUTURE SLOPE Ahead ATRdist 158
RSQ FUTURE SLOPE Ahead ATRdist 159
Screening Variables 161
Chi-Square Tests 162
Options for the Chi-Square Test 163
Output of the Chi-Square Test 164
Running Chi-Square Tests from the Menu 165
Nonredundant Predictor Screening 167
Options for Nonredundant Predictor Screening 170
Running Nonredundant Predictor Screening from the Menu 171
Examples of Nonredundant Predictor Screening 173
Models 1: Fundamentals 179
Overview and Basic Syntax 181
Mandatory Specifications Common to All Models 183
The INPUT List 183
The OUTPUT Specifier 184
Number of Inputs Chosen by Stepwise Selection 184
The Criterion to be Optimized in Indicator Selection 185
A Lower Limit on the Number or Fraction of Trades 189
Summary of Mandatory Specifications for All Models 190
Optional Specifications Common to All Models 191
Mitigating Outliers 191
Testing Multiple Stepwise Indicator Sets 192
Stepwise Indicator Selection With Cross Validation 193
When the Target Does Not Measure Profit 194
Multiple-Market Trades Based on Ranked Predictions 195
Restricting Models to Long or Short Trades 196
Prescreening For Specialist Models 196
Building a Committee with Exclusion Groups 197
Building a Committee with Resampling and Subsampling 199
Avoiding Overlap Bias 200
A Popularity Contest for Indicators 202
Bootstrap Statistical Significance Tests for Performance 204
Monte-Carlo Permutation Tests 206
An Example Using Most Model Specifications 207
Sequential Prediction 208
Models 2: The Models 213
Linear Regression 213
The MODEL CRITERION Specification for LINREG Models 213
The Identity Model 215
Quadratic Regression 216
The General Regression Neural Network 219
The Multiple-Layer Feedforward Network 220
The Number of Neurons in the First Hidden Layer 220
The Number of Neurons in the Second Hidden Layer 221
Functional Form of the Output Neuron 221
The Domain of the Neurons 222
A Basic MLFN Suitable for Most Applications 223
A Complex-Domain MLFN 224
The Basic Tree Model 225
A Forest of Trees 227
Boosted Trees 229
Operation String Models 231
Use of Constants in Operation Strings 233
Split Linear Models for Regime Regression 236
An Ordinary SPLIT LINEAR Model 238
The NOISE Version of the SPLIT LINEAR Model 239
Committees 241
Model Specifications Used by Committees 242
The AVERAGE Committee 244
The LINREG (Linear Regression) Committee 245
Constrained Linear Regression Committee 246
Models as Committees 247
Creating Component Models for Committees 248
Exclusion Groups 249
Explicit Specification of Different Indicators 250
Using Different Selection Criteria 250
Varying the Training Set by Subsampling 251
Varying the Training Set by Resampling 252
Oracles 253
Model Specifications Used by Oracles 254
Traditional Operation of the Oracle 255
Prescreen Operation of the Oracle 257
The HONOR PRESCREEN Option 257
The PRESCREEN ONLY Option 257
An Example of Prescreen Operation 259
More Complex Oracles 260
Testing Methods 263
Performance for the Entire Dataset 264
Walkforward Testing 265
Cross Validation by Time Period 266
Cross Validation Using a Control Variable 267
Cross Validation by Random Blocks 269
Preserving Predictions for Trade Simulation 270
Market States as Trade Triggers 271
An Example of Simple Triggering 272
Triggering Based on State Change 275
Triggering Versus Prescreening 277
Commands Common to All Four Examples 278
Example 1: Model Specialization via PRESCREEN 280
Example 2: Unguided Specialization 285
Example 3: Triggering on High Volatility 290
Example 4: Triggering on Low Volatility 295
Permutation Training 299
The Components of Performance 302
Permutation Training and Selection Bias 306
Multiple-Market Considerations 311
Transforms 313
Expression Transforms 315
Quantities That May Be Referenced 316
Vector Operations in Expression Transforms 320
Vector-to-Scalar Functions 321
An Example with the @SIGN_AGE Function 321
Logical Evaluation in Expression Transforms 323
An Example with Logical Expressions 324
A More Complex Example 325
Principal Component Transforms 326
Invoking the Principal Components Transform 327
Tables Printed 328
An Example 329
Linear and Quadratic Regression Transforms 332
A Regression Transform Example 333
The Nominal Mapping Transform 336
Inputs and the Target 336
Gates 337
Focusing on Extreme Targets 338
Declaring the Transform and its Options 338
A Nominal Mapping Example 341
The ARMA Transform 345
The PURIFY Transform 353
Defining the Purified and Purifier Series 353
Specifying the Predictor Functions 355
Miscellaneous Specifications 356
Usage Considerations 357
A Simple Example 359
Complex Prediction Systems 363
Stacking Models and Committees 365
Graphics 371
Series Plot 372
Series + Market 374
Histogram 375
Thresholded Histogram 377
Density Map 380
Bivariate and Trivariate Plots 386
Trivariate Plots 391
Equity 393
Prediction Map 397
Indicator-Target Relationship 400
Isolating Predictability of Direction Versus Magnitude 405
Finding Independent Predictors 407
A FIND GROUPS Demonstration 410
Market Regression Classes 417
REGRESSION CLASS Demonstrations 419
The Hierarchical Method 419
The Sequential Method 423
The Leung Method 427
Developing a Stand-Alone System 431
Choosing Predictor Candidates and the Target 431
Choosing the Target 431
Quality Does Not Equal Quantity for Predictors 434
Predictor and Target Selection for this Study 436
Stationary 439
The Problem of Outliers 440
Cross-Market Compatibility 442
Data Snooping: Friend or Foe? 444
Checking Stability with Subsampling 445
How Long Does the Model Hold Up? 449
Finding Models for a Committee 451
The Trading System 453
The Final Test 456
Trade Simulation and Portfolios 459
Writing Equity Curves 461
Performance Measures 465
Portfolios (File-Based Version) 466
A Portfolio Example 470
Integrated Portfolios 479
A FIXED Portfolio Example 482
An OOS Portfolio Example 483