FUNDAMENTAL: Using Macroeconomic Indicators and
Genetic Algorithms in Stock Market Forecasting
Oleksandr Yefimochkin
Dissertation submitted to obtain the Master's Degree in
Electrical and Computer Engineering
Jury
Chairman: Professor Marcelino Bicho dos Santos
Supervisor: Professor Rui Fuentecilla Maia Ferreira Neves
Co-supervisor: Professor Nuno Cavaco Gomes Horta
Member: Professor Miguel Leitao Bignolas Mira da Silva
October 2011
Resumo
Capital markets have become extremely popular in the academic community, mainly in the area
of Machine Learning and Soft Computing, where the impact of various factors and the prediction
of future prices are investigated using a variety of algorithms. Among these intelligent
methodologies, techniques such as Genetic Algorithms, Genetic Programming and Neural
Networks stand out. In this work, an application was developed that, using a Genetic Algorithm
and Macroeconomic Indicators from different regions of the world (United States of America,
European Union and Germany), and measuring the impact of the indicators through the
volatility of S&P500 Index Futures, is able to forecast the future evolution of this Index's prices.
Despite the variety of existing technical indicators, only Moving Averages and the VIX were
used in this work, thereby placing greater emphasis on measuring the impact of the
Macroeconomic Indicators. To validate the results, the strategies obtained were compared with
B&H and with strategies based on Moving Averages and the VIX in the period between 2010/01
and 2011/09, and showed better performance. The developed application obtained excellent
results during the simulation, using almost exclusively the Macroeconomic Indicators. The most
important conclusion of this work is that the impact of macroeconomic news can be
successfully measured using the market volatility associated with its publication. The impact
measured in this way can be used successfully in short-term investment, even though
macroeconomic analysis is generally considered to deal with factors that affect markets in the
long term.
Keywords: Genetic Algorithms, Stock Markets, Fundamental Analysis, Macroeconomic
Indicators, Technical Analysis.
Abstract
Capital markets have become extremely popular among the academic community, particularly in
the Machine Learning and Soft Computing areas, where the impact of various factors and the
prediction of future prices are investigated using a variety of algorithms together with
Fundamental and/or Technical Analysis. Among those intelligent methodologies, it is possible to
highlight techniques such as Genetic Algorithms, Genetic Programming and Neural Networks.
In this work, a Genetic Algorithm based application was developed that, using mainly
Macroeconomic Indicators from different regions (United States of America, European Monetary
Union and Germany) and measuring their impact through S&P500 Index Futures volatility, can
successfully forecast the Index’s price evolution. Despite the wide range of existing Technical
Indicators, only MAs and the VIX were used in this work, since the intention was to give greater
emphasis to measuring the impact of the Macroeconomic Indicators. To validate the results, the
strategies obtained were compared against B&H and MA based strategies on S&P500 Index
Futures in the period between 2010/01 and 2011/09, and showed better performance. The
developed application made an excellent profit in a simulation exercise using almost
exclusively Macroeconomic Indicators and by optimizing them. The most important conclusion
of this work is that the impact of Macroeconomic News can be successfully measured using the
market volatility associated with its release. The impacts of the Macroeconomic Indicators,
measured this way, can be successfully used in short-term forecasting, even though
Macroeconomic analysis is usually considered to deal with factors that act over the long term.
Keywords: Genetic Algorithms, Stock Markets, Fundamental Analysis, Macroeconomic
Indicators, Technical Analysis.
Acknowledgements
I am heartily thankful to my supervisor Rui Neves and to my co-advisor Nuno Horta, whose
encouragement, guidance and support from the initial to the final stage enabled me to develop
an understanding of the subject.
Lastly, I offer my regards and blessings to my family and to all of those who supported me in
any respect during the completion of the project.
“Look at market fluctuations as your friend rather than your enemy; profit from folly rather than
participate in it.”
Warren Buffett
Table of Contents
RESUMO
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS AND ABBREVIATIONS
  OPTIMIZATION AND COMPUTER ENGINEERING RELATED
  INVESTMENT RELATED
CHAPTER 1 INTRODUCTION
  1.1 CONTEXT AND MOTIVATION
  1.2 WORK’S PURPOSE
  1.3 EXISTING METHODOLOGIES
  1.4 DOCUMENT STRUCTURE
CHAPTER 2 RELATED WORK
  2.1 INTRODUCTION
  2.2 MARKET ANALYSIS AND INVESTMENT TECHNIQUES
    2.2.1 Fundamental Analysis
      2.2.1.1 Industry Analysis
      2.2.1.2 Company Analysis
      2.2.1.3 Economic Analysis
    2.2.2 Technical Analysis
      2.2.2.1 Transaction Volumes
      2.2.2.2 Moving Averages
      2.2.2.3 Trend Lines
      2.2.2.4 Support and Resistance Lines
      2.2.2.5 Ascending/Descending Triangle Pattern
      2.2.2.6 Reversal Patterns
      2.2.2.7 Technical Indicators
    2.2.3 Fundamental Analysis vs. Technical Analysis
  2.3 SOFT COMPUTING METHODOLOGIES AND EXISTING SOLUTIONS
    2.3.1 Artificial Neural Networks
    2.3.2 Genetic Algorithms and Genetic Programming
    2.3.3 Other solutions
  2.4 CONCLUSIONS
CHAPTER 3 DATA TIME SERIES ANALYSIS
  3.1 DATA TIME SERIES
    3.1.1 Macroeconomic Data Time Series
    3.1.2 Index Data Time Series
    3.1.3 Macroeconomic Data Impact Measurement
    3.1.4 Macroeconomic Data Filtering
    3.1.5 Correlation between Index Prices and Macroeconomic Data
    3.1.6 Technical Data Time Series
    3.1.7 Volatility Measurement
  3.2 CONCLUSIONS
CHAPTER 4 SOLUTION’S ARCHITECTURE AND IMPLEMENTATION
  4.1 OVERALL ARCHITECTURE
  4.2 IMPLEMENTATION’S ARCHITECTURE
  4.3 OPTIMIZATION AND SIMULATION LAYER
    4.3.1 Genetic Algorithm
    4.3.2 Hypotheses Representation
    4.3.3 The Fitness Function
    4.3.4 The Genetic Operations
      4.3.4.1 Recombination Operation
      4.3.4.2 Mutation Operation
    4.3.5 Termination Criterion
    4.3.6 Selection Function
    4.3.7 The Flow-Chart of the Genetic Algorithm
    4.3.8 Algorithm’s Parameters
    4.3.9 Optimization Package Class Diagram
  4.4 APPLICATION LAYER
    4.4.1 Problem Specific Model
      4.4.1.1 Evaluation Function
      4.4.1.2 Crossover Operation
      4.4.1.3 Mutation Operation
      4.4.1.4 Model Package Class Diagram
    4.4.2 Problem Specific Data Time Series
CHAPTER 5 RESULTS
  5.1 CASE STUDY I – MAS, VIX AND ALL MEVS
    5.1.1 Case Study I.I – MAs and VIX
    5.1.2 Case Study I.II – MAs, VIX and all MEVs with Linear Contribution
    5.1.3 Case Study I.III – MAs, VIX and all MEVs with Linear Contribution and Simple Decay
    5.1.4 Case Study I.IV – MAs, VIX and all MEVs with Linear Contribution and Exponential Decay
    5.1.5 Case Study I.V – MAs, VIX and all MEVs with Unit Contribution
    5.1.6 Case Study I.VI – MAs, VIX and all MEVs with Unit Contribution and Simple Decay
    5.1.7 Case Study I.VII – MAs, VIX and all MEVs with Unit Contribution and Exponential Decay
    5.1.8 Case Study I.VIII – Case Study I.III with Restricted Parameters
    5.1.9 Conclusions
  5.2 CASE STUDY II – MAS, VIX AND MEVS’ OPTIMISATION WITH LINEAR CONTRIBUTION AND SIMPLE DECAY
    5.2.1 Case Study II.I - MEVs’ Optimization
    5.2.2 Case Study II.II - MEVs’ Optimization with Sliding Window
    5.2.3 Conclusions
  5.3 SUMMARY
    5.3.1 Best Strategies
    5.3.2 Key Macroeconomic Indicators
CHAPTER 6 CONCLUSIONS AND FUTURE WORK
  6.1 CONCLUSION
  6.2 FUTURE WORK
REFERENCES
APPENDIX A – MACROECONOMIC INDICATORS
APPENDIX B - APPLICATION’S USER GUIDE
  APPLICATION’S INSTALLATION
  APPLICATION’S USER INTERFACE
  APPLICATION’S INPUT PARAMETERS
  APPLICATION’S OUTPUT
APPENDIX C – INDEX DATA TIME SERIES COLLECTING PROGRAM
APPENDIX D – ENTERPRISE ARCHITECT QUICK USE GUIDE
List of Tables
Table 1 – Company's Fundamental Indicators
Table 2 – The Most Influential U.S. Economic Indicators
Table 3 - U.S. Economic Indicators Most Sensitive to Stocks
Table 4 – US Economic Indicators Most Sensitive to Bonds
Table 5 - Indicators That Most Influence the U.S. Dollar’s Value
Table 6 - Indicators That Lead the Rest of the Economy
Table 7 - “Top Ten” International Economic Indicators
Table 8 – Technical Indicators
Table 9 - A Genetic Algorithm Prototype
Table 10 - Terminal and function sets
Table 11 - Summary of the existing solutions
Table 12 - Mean Index Variation from 01/01/2007 to 01/01/2010
Table 13 - Top 50 AMV Macroeconomic Indicators
Table 14 - Correlation between Variables' Impacts and Index Prices
Table 15 - Moving Averages Used
Table 16 - Genetic Algorithm’s Parameters
Table 17 - Parameters' Description and Ranges of Values
Table 18 - Hypothesis' additional Parameters
Table 19 - Mutation Operation
Table 20 - Case Study’s I Constant Parameters
Table 21 – Application’s Parameters Case Study I.I
Table 22 - Case Study I.I PI Evaluation Function
Table 23 - Case Study I.I PIMDD Evaluation Function
Table 24 - Case Study I.I CR Evaluation Function
Table 25 – Application’s Parameters Case Studies I.II – I.VII
Table 26 - Case Study I.II PI Evaluation Function
Table 27 - Case Study I.II PIMDD Evaluation Function
Table 28 - Case Study I.II CR Evaluation Function
Table 29 - Case Study I.III PIMDD Evaluation Function
Table 30 - Case Study I.III CR Evaluation Function
Table 31 - Case Study I.III PI Evaluation Function
Table 32 - Case Study I.III Best Solutions’ Decay
Table 33 - Case Study I.IV PIMDD Evaluation Function
Table 34 - Case Study I.IV CR Evaluation Function
Table 35 - Case Study I.V PIMDD Evaluation Function
Table 36 – Case Study I.V CR Evaluation Function
Table 37 - Case Study I.VI PIMDD Evaluation Function
Table 38 - Case Study I.VI CR Evaluation Function
Table 39 - Case Study I.VII PIMDD Evaluation Function
Table 40 - Case Study I.VII CR Evaluation Function
Table 41 - Application’s Parameters Case Study I.VIII
Table 42 - Case Study I.VIII PIMDD Evaluation Function
Table 43 - Case Study’s II Constant Parameters
Table 44 - Application’s Parameters Case Study II.I
Table 45 - Case Study II.I
Table 46 - Case Study II.I Best Solution’s Parameters
Table 47 - Application’s Parameters Case Study II.II
Table 48 - Case Study II.II – 1 year Training 3 months Investment
Table 49 - Case Study II.II – 2 years Training 6 months Investment
Table 50 - Case Study II.II – 3 years Training 9 months Investment
Table 51 - Case Study II.II Best Solution Evolution
Table 52 - EMU Macroeconomic Indicators
Table 53 - German Macroeconomic Indicators
Table 54 - USA Macroeconomic Indicators
Table 55 - Application's Input Parameters
Table 56 - Example of the Configuration File
List of Figures
Figure 1 - Simple Moving Average and Exponential Moving Average
Figure 2 - Up and Down Trend Lines
Figure 3 - Resistance and Support Lines
Figure 4 - Ascending Triangle (bear type) and Descending Triangle (bull type)
Figure 5 - Double Bottom, Double Top and Head and Shoulders Patterns
Figure 6 – ANN Generic Structure
Figure 7 - Macroeconomic Indicator Data Format
Figure 8 - Index Data Format
Figure 9 - Impact Example, Unemployment Rate and Nonfarm Payrolls Release
Figure 10 - MEV Impact Sum and S&P 500 Index Futures in 2007
Figure 11 - Moving Average Usage Example
Figure 12 - VIX Data Format
Figure 13 – VIX and S&P500
Figure 14 - Solution's Overall Architecture
Figure 15 - Package Diagram
Figure 16 - Genetic Algorithm Flowchart
Figure 17 - Implementation and Usage of the Optimization API
Figure 18 - Optimization Package Class Diagram
Figure 19 – Hypothesis Optimization Structure
Figure 20 - Crossover Operation
Figure 21 - Model Package Class Diagram
Figure 22 - Events Modeling Class Diagram
Figure 23 - Case Study I.III Best Strategies’ Profitability
Figure 24 - Case Study I.III Best Solutions vs. Buy and Hold
Figure 25 - Study I.III Best Strategy Decisions Evaluation
Figure 26 - Case Study II.I Best Solution vs. B&H
Figure 27 - Case Study II.II Best Solution vs. B&H
Figure 28 - Best Strategies vs. B&H
Figure 29 - Setting Loading
Figure 30 - Data Time Series Loading
Figure 31 - Optimization Process Evolution
Figure 32 - Index Data Time Series Collecting Program Interface 1
Figure 33 - Index Data Time Series Collecting Program Interface 2
Figure 34 - Enterprise Architect Configuration
Figure 35 - Enterprise Architect Tools and Project Browser
Figure 36 - Importing, Exporting and Synchronizing the Source Code
List of Acronyms and Abbreviations
Optimization and Computer Engineering Related
GA - Genetic Algorithm
GP - Genetic Programming
SVM - Support Vector Machines
ANN - Artificial Neural Networks
ES - Evolution Strategies
EC - Evolutionary Computation
EA - Evolutionary Algorithm
FCM - Fuzzy Cognitive Maps
SOM - Self-Organizing Maps
SUS - Stochastic Universal Sampling
FPS - Fitness Proportionate Selection
API - Application Programming Interface
UML - Unified Modelling Language
Investment Related
EMH - Efficient Market Hypothesis
B&H - Buy-and-Hold
TA - Technical Analysis
FA - Fundamental Analysis
MA - Moving Average
EMA - Exponential Moving Average
SMA - Simple Moving Average
RSI - Relative Strength Index
ROC - Rate of Change
MACD - Moving Average Convergence/Divergence
OBV - On Balance Volume
EPS - Earnings Per Share
PER - Price Earnings Ratio
PCF - Price Cash Flow
PSR - Price Sales Ratio
POR - Pay Out Ratio
DY - Dividend Yield
PBV - Price Book Value
ROE - Return On Equity
ROA - Return On Assets
DER - Debt Equity Ratio
QR - Quick Ratio
MC - Market Capitalization
EV - Enterprise Value
EVM - Enterprise Value Multiple
GDP - Gross Domestic Product
CPI - Consumer Price Index
PPI - Producer Price Index
LEI - Index of Leading Economic Indicators
AMV - Absolute Mean Variation
PI - Profitability Index
ROI - Return On Investment
MDD - Maximal Drawdown
CR - Calmar Ratio
CBOE - Chicago Board Options Exchange
VIX - CBOE Volatility Index
PIR - Profit Investment Ratio
CHAPTER 1 INTRODUCTION
Stock markets are an important component of most countries' economies and play a major role
in the international financial system. They are important from both the industry’s and the
investor’s point of view. Stock markets drove the industrial revolution and continue to be a
fundamental part of the highly successful economic system in which we live. Even the
Communist Party of China has recognized this. However, several uncertainties are involved in
the movements of stock markets, and investing has become a very complex task. Many factors
interact in the stock market, including political events, macroeconomic factors and traders’
expectations. Therefore, predicting market price movements is quite difficult. To this end, most
investors use technical analysis, which looks at the price movement of a security, and
fundamental analysis, which looks at economic factors. Fundamental analysis performs a
top-down analysis of the factors and trends that affect the success of a company, while
technical analysis uses historical price and volume data to project future price behaviour.
Although technical analysis and fundamental analysis are widely used, market behaviour
forecasting still attracts criticism. For instance, according to the Efficient Market Hypothesis
(EMH) it is not possible to use either fundamental analysis or technical analysis to trade and
beat the market. According to this hypothesis, the best investment strategy available is
buy-and-hold (B&H), in which an investor buys stocks and holds them for a long period of time,
waiting for them to appreciate, regardless of market fluctuations. The EMH is widely accepted in
academic circles, but in the technical community the idea of purely random price movements is
totally rejected. Financial practitioners also reject the EMH as being inconsistent with their
real-world experience.
This report intends to demonstrate, contrary to the EMH, that it is possible to predict
market trends using Soft Computing methodologies and macroeconomic time series. Many
approaches are used in stock market forecasting, among which Genetic Algorithms, Artificial
Neural Networks, Decision Trees, Support Vector Machines, Fuzzy Cognitive Maps, Bayes nets
and the Rough Sets technique stand out. Each approach has its advantages and disadvantages
and, based on the analysis made in CHAPTER 2, it was decided to implement a Genetic Algorithm
based solution.
A Genetic Algorithm (GA) is an approach based on simulated evolution that performs a
probabilistic search for the best hypothesis (symbolic expression or computer program) in a
hypothesis space, and it is capable of global optimization of multivariate functions. This
approach has traditionally been used in optimization but, with a few enhancements, can also be
used in classification and prediction. It is a member of the family of evolutionary algorithms,
which start from a high-level statement of what needs to be done and solve the problem
automatically using principles of Darwinian natural selection and biologically inspired
operations. During the search process, hypotheses (possible solutions or strategies) are
treated as individuals of a population, and their fitness, which is to be maximized, measures
their quality (in this case, the profit generated by the investment). Elements of the
population mate, mutate, reproduce and evolve until some termination condition is met and an
approximate solution is found. The GA adapts to the problem regardless of the size and
complexity of the desired solution, which is why it is used in this work.
1.1 Context and Motivation
Investing can mean the difference between losing all of your hard-earned money and potentially
gaining much more, so knowing the best investing strategies is very important. Many
strategies can be built on important market information, and nowadays the latest stock
market news, economic news and company information can easily be found online for free
or at very affordable prices.
Some fundamental macroeconomic factors can be key determinants of stock index movements.
However, stock markets can be affected by many macroeconomic factors, and market trends can be
extremely difficult to predict. Establishing cause-effect relationships can become
impractical for humans, because the impact that a particular economic factor has on the stock
market may vary over time. The need therefore arises for tools that can both establish
cause-effect relationships between multiple factors and predict future market trends.
1.2 Work’s Purpose
This thesis aims to develop an application that predicts the evolution of S&P500 Index
Futures prices based essentially on the news of Macroeconomic Indicators, which show the
evolution of countries’ economic health. The application takes as input Macroeconomic
Indicators from different regions (United States of America, European Monetary Union,
Germany) and the S&P500 index futures. It must be able to measure the impact of macroeconomic
news on the market and, using soft computing techniques and some auxiliary tools, estimate the
future evolution of the index prices in order to make profitable investment decisions. A
further aim is to demonstrate that, using intelligent computing techniques, it is possible to
beat the market and outperform the B&H investment strategy.
1.3 Existing Methodologies
Soft Computing methodologies have been applied to market forecasting and trading rules,
and in many cases have demonstrated better performance than competing approaches such as
standard econometric models. Many authors use Artificial Neural Networks (ANNs) for market
forecasting because they can extract nonlinear regularities from economic time series.
There is growing interest in fuzzy logic computing and its use in forecasting future
changes in stock prices. Genetic Algorithms (GA), Genetic Programming (GP), Fuzzy
Cognitive Maps (FCM), Decision Trees and many other methodologies have been successfully
used in market forecasting. Hybrid solutions are also very common. All these approaches
are described in the next chapter, together with the advantages and disadvantages of each.
1.4 Document Structure
This thesis is structured as follows:
Chapter 2 addresses the theory behind the developed work, namely the concepts of Market
Analysis and Investment Techniques and Soft Computing methodologies. It also gives an
overview of the different methodologies that can be used and are already in use.
Chapter 3 describes the architecture of the developed application.
Chapter 4 presents the validation procedure used to evaluate the developed system, with
a study of the solution’s performance and robustness.
Chapter 5 summarizes the report and presents the conclusions and future work.
The appendix provides the installation and user manual needed to run the application.
CHAPTER 2 RELATED WORK
2.1 Introduction
This chapter presents the investment analysis techniques and strategies used by investors in
the Capital Markets, as well as various Soft Computing techniques used in stock market
forecasting. Section 2.2 presents the fundamentals of Fundamental and Technical Analysis and
introduces some investment strategies. Section 2.3 presents Soft Computing methodologies and
related publications and lists existing solutions. Finally, conclusions are drawn in Section 2.4.
2.2 Market Analysis and Investment Techniques
Typically, investors use fundamental analysis, technical analysis, or both, to evaluate
investment opportunities. These are the two main approaches in the financial markets. As
already mentioned in the previous chapter, technical analysis looks at the price movement
and volume of a security and uses this data to predict its future trends. Fundamental analysis,
on the other hand, looks at economic, political and company data, known as fundamentals.
However, the idea that an investor uses or may use only one of these techniques is deeply
wrong. To be successful, the investor must take into account both types of factors,
technical and fundamental.
2.2.1 Fundamental Analysis
Fundamental analysis is based on several factors, which can be grouped into three main
strands: Economic Analysis, Industry Analysis and Company Analysis. In the first, the state of
the national and global economy is examined using several periodically published economic
reports. In the second, industry conditions such as Customers, Market Share, Industry Growth,
Regulation and Competition are taken into account. Finally, company analysis focuses on the
intrinsic value of the business, using several financial statements and indicators.
2.2.1.1 Industry Analysis
To minimize risk, a company should have a variety of customers. If a company bases its
business on a limited range of customers, even small changes in customer preferences can be
devastating (for instance, the relationship between military suppliers and the government).
The company’s market share can be used to forecast business volumes, while the forecast number
of customers influences Industry Growth. If no increase in the number of customers is expected,
then the company has to win market share in order to grow, and the competition must be taken
into account. Local regulation can also affect the attractiveness of a company, limiting or
driving its growth. Industry Analysis must examine many factors, which are sometimes very
subjective and difficult to analyze and interpret.
2.2.1.2 Company Analysis
The general idea behind this type of analysis is to find undervalued companies by analyzing
their intrinsic value based on the company's financial statements. These statements are used
to calculate a number of useful ratios and to reach conclusions about the company’s
profitability, price, liquidity, leverage and efficiency. The most used and useful indicators
are presented in Table 1, each with a brief description.
Table 1 – Company's Fundamental Indicators
Indicator | Measure Description | Strategy Associated
Earning Per Share (EPS) | Profit allocated to each share, determining a share's price, calculation of other indicators. | Growing/Good (15% per year) EPS: Buy
Price Earning Ratio (PER) | Valuation of share price compared to EPS, time needed to recover investment, earnings growth in the future. | Low PER: Buy (High PER can suggest higher earnings expectations)
Price Cash Flow (PCF) | Relation between Share Price and Cash Flow Per Share reflects the real cash flow generated. | Low PCF: Buy
Price Sales Ratio (PSR) | Relation between Share Price and Total Revenue reflects the real revenue generated. | Low PSR: Buy
Pay Out Ratio (POR) | Earnings paid out in dividends, conclusions about company’s strategy or health (reinvestment, expansion). | Analyze company’s strategy
Dividend Yield (DY) | Pay out in dividends relative to share price, return on investment, share price evaluation. | High DY: Buy
Price Book Value (PBV) | Compare market value to book value, company’s health analysis. | Low PBV: Buy (undervalued)
Return On Equity (ROE) | Net income returned over shareholders equity, profitability of a company. | High ROE: Buy
Return On Assets (ROA) | Company’s profitability relative to its total assets, management efficiency. | High ROA: Buy
Debt Equity Ratio (DER) | Measure of financial leverage. | Low DER: Buy (Small Debt)
Quick Ratio (QR) | Short-term liquidity, ability to meet short-term obligations. | High QR: Buy
Market Capitalization (MC) | Total market value of shares, determining company's dimension |
The value ranges of the presented indicators are highly dependent on the industry and on
many other factors. The company’s indicators can be used as signals of future valuation, but
also as signs that something is wrong; they should therefore not be used in isolation but in
conjunction with other available approaches. Further study of these indicators and the
associated strategies can be found in [1].
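As an illustration, the following minimal sketch computes some of the Table 1 indicators from a handful of financial statement figures. The `Financials` container, its field names and the sample numbers are hypothetical, and the formulas are the usual textbook definitions rather than anything prescribed by [1].

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical container for figures taken from financial statements."""
    net_income: float
    shares_outstanding: float
    share_price: float
    dividends_paid: float
    shareholders_equity: float
    total_assets: float
    total_debt: float


def fundamental_ratios(f: Financials) -> dict:
    """Compute a few Table 1 indicators using their usual textbook definitions."""
    eps = f.net_income / f.shares_outstanding
    dividend_per_share = f.dividends_paid / f.shares_outstanding
    return {
        "EPS": eps,                                      # profit per share
        "PER": f.share_price / eps,                      # price relative to earnings
        "POR": f.dividends_paid / f.net_income,          # share of earnings paid out
        "DY": dividend_per_share / f.share_price,        # dividend relative to price
        "ROE": f.net_income / f.shareholders_equity,
        "ROA": f.net_income / f.total_assets,
        "DER": f.total_debt / f.shareholders_equity,
        "MC": f.share_price * f.shares_outstanding,      # total market value
    }


# Made-up numbers, for illustration only.
example = Financials(
    net_income=200.0, shares_outstanding=100.0, share_price=30.0,
    dividends_paid=50.0, shareholders_equity=1000.0,
    total_assets=2500.0, total_debt=400.0,
)
ratios = fundamental_ratios(example)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

Since the acceptable ranges depend on the industry, such values are only meaningful when compared against peers or against the company's own history.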
2.2.1.3 Economic Analysis
There is no way to understand the market and the companies without understanding economic
fundamentals, because companies and markets are parts of the financial and economic system.
Economic Indicators are simply economic statistics and can be used to analyze the general
economic trend. The most important Economic Indicators that reflect the state of the economy
can be found in Employment Reports, Interest Rate Statements, Reports on Inflation and
Money Supply, Retail Sales Reports, Gross Domestic Product and other statistics published by
governments. New releases of economic data are published daily, weekly, monthly and
quarterly, and they often tell conflicting stories about the state of the global economy.
Although this crucial data is easily accessible, it is very difficult for a human to correlate
all these factors. Increasing economic globalization leads to increasing economic
interdependence of national economies and increasingly complex relationships between them.
Particularly prominent is the economy of the United States of America, the world's largest
national economy, whose variations affect almost all other world economies. Bernard Baumohl,
in his book [2], presents a detailed description of the most important U.S. and worldwide
economic indicators. The most influential U.S. indicators are presented in Table 2.
Table 2 – The Most Influential U.S. Economic Indicators
Group | Indicators
Employment | Employment Situation; Weekly Claims for Unemployment Insurance; Help-Wanted Advertising Index; Corporate Layoff Announcements; Mass Layoff Statistics; ADP National Employment Report
Consumer Spending and Confidence | Personal Income and Spending; Retail Sales; E-Commerce Retail Sales; Weekly Chain Store Sales; Consumer Credit Outstanding; Consumer Confidence Index; Survey of Consumer Sentiment; ABC News/Washington Post Consumer Comfort Index; UBS/Gallup Index of Investor Optimism
National Output and Inventories | Gross Domestic Product (GDP); Durable Goods Orders; Factory Orders; Business Inventories; Industrial Production and Capacity Utilization; Institute for Supply Management (ISM) Manufacturing Survey; Institute for Supply Management (ISM) Non-Manufacturing Business Survey; Chicago Purchasing Managers Index; Index of Leading Economic Indicators (LEI)
Housing and Construction | Housing Starts and Building Permits; Existing Home Sales; New Home Sales; Housing Market Index: National Association of Home Builders (NAHB); Weekly Mortgage Applications Survey and the National Delinquency Survey; Construction Spending
Regional Federal Reserve Bank Surveys | Federal Reserve Bank of New York: Empire State Manufacturing Survey; Federal Reserve Bank of Philadelphia: Business Outlook Survey; Federal Reserve Bank of Kansas City: Manufacturing Survey of the 10th District; Federal Reserve Bank of Richmond: Manufacturing Activity for the Fifth District; Federal Reserve Bank of Chicago: National Activity Index (CFNAI); The Federal Reserve Board’s Beige Book; The Federal Open Market Committee (FOMC) Statement
Foreign Trade | International Trade in Goods and Services; Current Account Balance (Summary of International Transactions); Treasury International Capital (TIC) System
Prices, Productivity, and Wages | Consumer Price Index (CPI); Producer Price Index (PPI); Employment Cost Index; Import and Export Prices; Productivity and Costs; Employer Costs for Employee Compensation; Real Earnings; Yield Curve
According to [2], an indicator’s predictive ability and accuracy vary over time and over the
business cycle, and different indicators can influence the value of stocks, bonds and
currencies in different ways. A summary of the most important U.S. indicators by area of
interest (stocks, bonds, currencies, other) is presented in Table 3, Table 4, Table 5 and Table 6.
Table 3 - U.S. Economic Indicators Most Sensitive to Stocks
Rank Indicator
1 Employment Situation Report (Payroll Survey)
2 ISM Report—Manufacturing
3 Weekly Claims for Unemployment Insurance
4 Consumer Prices
5 Producer Prices
6 Retail Sales
7 Consumer Confidence and Sentiment Surveys
8 Personal Income and Spending
9 Industrial Production
10 GDP
Table 4 – US Economic Indicators Most Sensitive to Bonds
Rank Indicator
1 Employment Situation Report
2 Consumer Prices
3 ISM Report—Manufacturing
4 Producer Prices
5 Weekly Claims for Unemployment Insurance
6 Retail Sales
7 Housing Starts
8 Personal Income and Spending
9 ADP National Employment Report
10 GDP
Table 5 - Indicators That Most Influence the U.S. Dollar’s Value
Rank Indicator
1 Employment Situation Report (Payroll Survey)
2 International Trade
3 GDP
4 Current Account
5 Industrial Production/Capacity Utilization
6 ISM Report—Manufacturing
7 Retail Sales
8 Consumer Prices
9 Consumer Confidence and Sentiment Surveys
10 Productivity and Costs
Table 6 - Indicators That Lead the Rest of the Economy
Indicator
Yield Curve
New Orders for Durable Goods
Producer Prices (crude goods without food and energy)
Personal Income and Spending (purchases of durable goods)
Housing Permits
Weekly Applications for Mortgages
Housing Market Index
Weekly Claims for Unemployment Insurance
Institute for Supply Management (manufacturing survey)
UBS/Gallup Survey of Investor Optimism
The most influential international economic indicators are shown in Table 7. Their importance
is measured using several factors, namely the size of the economy, market liquidity, the ease
of buying and selling securities, trading partnerships with the U.S., the exchange of goods and
services, and the service sectors.
Table 7 - “Top Ten” International Economic Indicators
Indicator
German Industrial Production
German IFO Business Survey
German Consumer Price Index
Japan Tankan Survey
Japan Industrial Production
Eurozone/Global Purchasing Managers Index
OECD Composite Leading Indicators (CLI)
China Industrial Production
India GDP and Wholesale Price Index
Brazil Industrial Production
2.2.2 Technical Analysis
The assumption of technical analysis is that equity prices are formed by the movement of supply
and demand, and that the supply-demand relationship can be seen as a price-volume relationship.
Therefore, the study of these factors, expressed in prices and transaction volumes, is what
really matters in stock market forecasting. A number of investment strategies are based
only on these two factors: Price and Volume.
2.2.2.1 Transaction Volumes
Volume measures the activity of a financial instrument and corresponds to the number
of shares traded over a certain period of time. Volume makes it possible to identify
trends and chart patterns, and large price movements are usually considered important if
relatively high volumes are involved or the variation in volume is significant. Rising prices
require volume, while volume usually decreases when prices fall. An example of volume
bars in a chart can be seen in Figure 1.
2.2.2.2 Moving Averages
Given a series of sequential values in time, we can calculate an average over a
specific period of time. Moving averages (MA) are nothing more than means of prices over a
particular time window. The most commonly used are the Simple Moving Average (SMA) of the
last N days and the Exponential Moving Average (EMA), in which the prices of recent days
have a greater weight in the average. Price movements usually show many variations and
are therefore very difficult to analyze. With averages, this movement is smoothed out
and the investor can get a clear and comprehensive view of trends in the short, medium and
long term. There are several strategies based on this indicator. In general, the investor
should buy when the Moving Average is rising or crosses below the price line, and sell
when the Moving Average is falling or crosses above the price line. Intersections of
Moving Averages with different numbers of days are often also seen as buying or selling
signals. An example of MAs (simple and exponential) is illustrated in Figure 1.
Figure 1 - Simple Moving Average and Exponential Moving Average
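The moving averages and the crossing rule described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, the EMA uses the common smoothing factor 2/(N+1), which is an assumption, and the price series is made up.

```python
def sma(prices, n):
    """Simple moving average of the last n prices; None until n values exist."""
    out = []
    for i in range(len(prices)):
        if i + 1 < n:
            out.append(None)
        else:
            window = prices[i + 1 - n:i + 1]
            out.append(sum(window) / n)
    return out


def ema(prices, n):
    """Exponential moving average with the common smoothing factor 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [prices[0]]  # seed the average with the first price
    for price in prices[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


def crossover_signals(prices, n):
    """'buy' when the price moves above its SMA, 'sell' when it moves back to or below it."""
    avg = sma(prices, n)
    signals = []
    for i in range(1, len(prices)):
        if avg[i - 1] is None:
            continue  # not enough history yet
        was_above = prices[i - 1] > avg[i - 1]
        is_above = prices[i] > avg[i]
        if is_above and not was_above:
            signals.append((i, "buy"))
        elif was_above and not is_above:
            signals.append((i, "sell"))
    return signals


# Made-up daily closing prices.
prices = [10, 11, 12, 11, 9, 8, 9, 11, 13]
print(sma(prices, 3))
print(ema(prices, 3))
print(crossover_signals(prices, 3))
```

A price crossing above the average is the same event as the average crossing below the price line, which is why it is treated here as the buy signal.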
2.2.2.3 Trend Lines
Trend lines and support and resistance lines, just like the Moving Average indicator, are based
on price evolution and volume, which are closely related to each other. In the case of trend
lines, the goal is to know the current trend in order to follow it. However, in most cases
uptrends and downtrends have many oscillations, and to identify the overall direction it is
necessary to draw a line segment connecting the lowest or highest points of the price line.
This makes it possible to visualize the direction of movement, i.e. the trend. The use of
trend lines is illustrated in Figure 2.
2.2.2.4 Support and Resistance Lines
Support is a buying movement with enough volume to halt a price fall for an
appreciable period of time, while resistance is a selling movement with enough volume to
halt a price rise. Besides volume, price repetition and time distances are used to identify
support and resistance lines. An example of these lines is shown in Figure 3.
Figure 2 - Up and Down Trend Lines
Figure 3 - Resistance and Support Lines
2.2.2.5 Ascending/Descending Triangle Pattern
Combining a trend line with a support/resistance line yields an ascending/descending
triangle. With triangles, attention should be paid to the points where the price crosses
the support/resistance line, which are normally followed by a price increase or decrease. When
the triangle is of the bull/bear type, the crossing point is followed by a price
increase/decrease. This behaviour can be seen in Figure 4.
Figure 4 - Ascending Triangle (bull type) and Descending Triangle (bear type)
2.2.2.6 Reversal Patterns
Reversal Patterns are figures formed by price lines that can give good signals to
investors. Two of the most “powerful” figures are the Double Top and Bottom and the Head and
Shoulders. Typically, head and shoulders patterns precede a downtrend, while double bottoms
follow extended downtrends and precede an uptrend. The double top pattern typically follows an
extended uptrend and precedes a downtrend. These types of patterns are illustrated in Figure 5.
[Figure panels: Double Top Pattern (two tops), Double Bottom Pattern (two bottoms), Head and Shoulders Pattern (a head between two shoulders)]
Figure 5 - Double Bottom, Double Top and Head and Shoulders Patterns
2.2.2.7 Technical Indicators
In technical analysis investors use various indicators based on prices and volumes, among
which the following stand out: Relative Strength Index (RSI), Rate of Change (ROC),
Moving Average Convergence/Divergence (MACD), On Balance Volume (OBV) and the MAs
already described. The meaning of each indicator and the general strategy
associated with it are presented in Table 8.
Table 8 – Technical Indicators
Indicator | Measure Description | Strategy Associated
RSI | Momentum indicator that compares recent gains to recent losses, identification of overbought and oversold conditions. RS is the average of n periods' up closes over the average of n periods' down closes. | RSI>=70: Sell (overvalued/overbought). RSI<=30: Buy (undervalued/oversold)
ROC | Percentage change between recent price and past price. | ROC>0: Buy (upward momentum). ROC<0: Sell (selling pressure)
MACD | Trend following momentum indicator, relationship between two moving averages of prices. | MACD upward cross zero: Buy. MACD downward cross zero: Sell.
OBV | Momentum detecting method, trend confirmation, relation between volume and price change. | OBV downward: Sell (downtrend confirm). OBV upward: Buy (uptrend confirm)
Several strategies can be established based on technical indicators and volume-price charts.
Further study of these strategies can be found in [1].
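Two of the Table 8 indicators can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the RSI uses the widely used relation RSI = 100 - 100 / (1 + RS) with simple (unsmoothed) averages of gains and losses, the function names are hypothetical, and the prices are made up.

```python
def roc(prices, n):
    """Rate of Change: percentage change between the latest price and the price n periods ago."""
    past = prices[-1 - n]
    return 100.0 * (prices[-1] - past) / past


def rsi(prices, n):
    """RSI over the last n price changes, using simple averages of gains and losses."""
    changes = [b - a for a, b in zip(prices[-n - 1:-1], prices[-n:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        return 100.0  # no losses in the window: RS is unbounded
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi_signal(value):
    """Thresholds taken from Table 8: RSI>=70 sell, RSI<=30 buy, otherwise hold."""
    if value >= 70:
        return "sell"
    if value <= 30:
        return "buy"
    return "hold"


# Made-up closing prices.
prices = [10, 11, 12, 11, 13]
print(roc(prices, 4))              # percentage change over 4 periods
print(rsi(prices, 4))              # RSI over the last 4 changes
print(rsi_signal(rsi(prices, 4)))
```

Smoothed variants of the RSI average gains and losses exponentially instead; the simple average is used here only to keep the sketch short.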
2.2.3 Fundamental Analysis vs. Technical Analysis
These two approaches should not be viewed as disjoint; they should be used in
conjunction. To be successful, the investor must take into account both technical and
fundamental analysis, because there is evidence that both technical and fundamental
trading have roles to play in stock price forecasting.
2.3 Soft Computing methodologies and Existing solutions
Various prediction algorithms and models for stock market forecasting have been proposed
by many academic and industry researchers. Artificial Neural Networks (ANNs) and Genetic
Algorithms (GA) make it possible to extract nonlinear relations from economic time series and
are the most used approaches in market forecasting. There is growing interest in fuzzy logic
computing and its use in the forecasting problem. Fuzzy Cognitive Maps (FCM), Decision Trees
and many other methodologies have also been used. Hybrid solutions are also very common in
this area. The following sections describe all these approaches, with special attention given
to GA and ANN, and present related publications. At the end of the chapter, the advantages and
disadvantages of all the algorithms are presented in order to draw conclusions about the
direction to follow in this thesis work.
2.3.1 Artificial Neural Networks
The Artificial Neural Network (ANN) is an approach that uses computational analogues of
neurons, and it is one of the most used tools in classification and forecasting problems. The
generic ANN structure, the Perceptron structure and the most used activation functions are
shown in Figure 6.
[Figure panels: generic ANN with Inputs, Input Layer, Hidden Layers, Output Layer and Outputs; Perceptron with weights w0, w1, w2, ..., wn, a summation node and a sigmoid; most common sigmoids]
Figure 6 – ANN Generic Structure
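The structure in Figure 6 can be sketched as a forward pass through layers of perceptrons. This is a minimal illustration only: the logistic function is assumed as the sigmoid, the bias plays the role of the weight w0 attached to the constant input 1, and the layer layout and weights below are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, one commonly used activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def perceptron(inputs, weights, bias):
    """Weighted sum w0 + w1*x1 + ... + wn*xn passed through the sigmoid (bias = w0)."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)


def forward(inputs, layers):
    """Propagate inputs through the layers; each layer is a list of (weights, bias) units."""
    activations = inputs
    for layer in layers:
        activations = [perceptron(activations, w, b) for w, b in layer]
    return activations


# Arbitrary 2-input network with one hidden layer of two units and one output unit.
network = [
    [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)],  # hidden layer
    [([1.0, -1.0], 0.2)],                      # output layer
]
print(forward([1.0, 2.0], network))
```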
The gradient method is the most used method of training ANNs by minimizing a cost function.
To accelerate the learning process, Adaptive Step Size and Momentum methods are
also frequently used. Equation (1) is the main recursive procedure, while equations (2) and
(3) represent the adaptive step size and momentum terms respectively. A more detailed
description of ANNs and other related topics can be found in [3].
(1)
(2)
(3)
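As an illustration, one common textbook formulation of these three ingredients is sketched below; it may differ in detail from equations (1) to (3). All symbols are assumptions introduced here: $E$ is the cost function, $w_{ij}$ a weight, $\eta_{ij}$ its step size, $\alpha \in [0, 1)$ the momentum coefficient, and $u > 1$, $0 < d < 1$ the step size increase and decrease factors.

```latex
\begin{align*}
w_{ij}(n+1) &= w_{ij}(n) + \Delta w_{ij}(n) \\
\eta_{ij}(n) &=
  \begin{cases}
    u \, \eta_{ij}(n-1) & \text{if } \dfrac{\partial E}{\partial w_{ij}}(n) \,
                                     \dfrac{\partial E}{\partial w_{ij}}(n-1) > 0 \\[2ex]
    d \, \eta_{ij}(n-1) & \text{otherwise}
  \end{cases} \\
\Delta w_{ij}(n) &= -\eta_{ij}(n) \, \frac{\partial E}{\partial w_{ij}}(n)
                    + \alpha \, \Delta w_{ij}(n-1)
\end{align*}
```

In this form the step size grows while successive gradients keep the same sign and shrinks when they alternate, and the momentum term reuses a fraction of the previous update to damp oscillations.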
ANNs are one of the most used approaches in the stock market forecasting area. Over the last
decade, ANNs have been widely used and have shown better performance than other approaches in
many cases. Many authors have used ANNs to predict buying and selling timings and to select
securities in stock markets, but their goals, strategies and input data differ. This approach
has also been used successfully in the prediction of foreign exchange rates, as
demonstrated in [4].
One of the decisive factors is the choice of input data. Most of the solutions, [5], [6], [7]
and [8], use Fundamental and Technical indicators at the same time as input, while other
solutions, [9], [10] and [11], use Fundamentals only. Since industry factors are very
subjective and difficult to analyze, quantify and interpret, only Macroeconomic and Company
fundamentals are used. Solutions [5] and [7] use Technical Indicators and Companies’
Fundamentals, [6] and [8] use Technical and Macroeconomic indicators, [9] and [10] use
Companies’ Fundamentals only, while solution [11] uses Macroeconomic indicator time series only.
Some authors focus not only directly on stock price movement but also on other
strands, such as building portfolios and selecting stocks [7] or forecasting the revenue growth
rate of firms [10]. Since trained ANNs cannot be interpreted directly, solution [5] analyzes
the internal representation of a hierarchical neural network using clustering methods in order
to find the relationships between past technical and economic indexes and buying/selling
timings. Publication [10] compares ANN performance with Decision Tree C4.5, Bayes net and the
Rough Sets technique, while [6] focuses on a feasibility analysis consisting of a series of
univariate and multivariate, linear and nonlinear statistical tests that help to define the
topology of the ANN. In [8], hybrid self-organizing maps and genetic algorithm based
backpropagation neural networks are used. A summary of the ANN-based solutions mentioned can
be found in Table 11. For a broader view of solutions based on Artificial Neural Networks,
[12], [13], [14] and [15] should be consulted.
2.3.2 Genetic Algorithms and Genetic Programming
Genetic Algorithms (GA) and Genetic Programming (GP, a specialization of GA) are approaches
based on simulated evolution that perform a probabilistic search for the best hypothesis
(symbolic expression or computer program) in a hypothesis space. This approach has
traditionally been used in optimization [16] but, with a few enhancements, can also be used in
classification and prediction. A generic description of the GA is presented in Table 9. A
more detailed description of GA, GP, Genetic Operators and other related topics
can be found in [3].
The prototypical genetic algorithm presented maintains a population of p hypotheses. On each
iteration, the successor population is formed by probabilistically
selecting current hypotheses according to their fitness and by adding new hypotheses. New
hypotheses are created by applying a crossover operator to pairs of the fittest hypotheses and
by creating mutations in the resulting generation of hypotheses. This process is iterated until
sufficiently fit hypotheses are discovered. The GA can be used for the global optimization of
multivariate functions. It is a probabilistically directed search based on recombination and
mutation operations performed over trial solutions, and it is an efficient approach
for finding the exact or approximate location of a global optimum.
As with ANNs, existing GA solutions show some differences despite sharing the same theoretical
basis. The publications can be distinguished by their representation of the hypotheses, but
also by their goals and the input data used. There are significant differences in the way the
authors model the problem, which are reflected in the structure and representation of the
hypotheses. Generally, the GA uses a fixed-length binary string to represent a chromosome. In
[17], [18] and [19], GAs are used to estimate the optimal model parameters, and the hypotheses
are represented using this kind of symbolic expression with a known, established structure. In
GP, the difference lies in the fact that the zeros and ones corresponding to genes in a GA are
replaced by syntax trees. Solutions [20], [21] and [22] use GP, where the hypotheses are
represented using syntax trees and the models and parameters are constructed and estimated
using the components presented in Table 10. Solutions [21], [18] and [19] use Companies’
Fundamental Indicators as input data, solutions [20] and [22] use both Technical and
Companies’ indicators, while [17] uses only Macroeconomic data. A summary of the GA and GP
solutions mentioned can be found in Table 11.
Table 9 - A Genetic Algorithm Prototype
GA(Fitness, Fitness_threshold, p, r, m)
{
Fitness: A function that assigns an evaluation score, given a hypothesis.
Fitness_threshold: A threshold specifying the termination criterion.
p: The number of hypotheses to be included in the population.
r: The fraction of the population to be replaced by Crossover at each step.
m: The mutation rate.
Initialize population: P ← Generate p hypotheses at random.
Evaluate: For each h in P, compute Fitness(h).
While [max_h Fitness(h)] < Fitness_threshold do
{
Create a new generation, P_S:
1. Select: Probabilistically select (1 - r)p members of P to add to P_S. The probability
Pr(h_i) of selecting hypothesis h_i from P is given by:
$\Pr(h_i) = \frac{Fitness(h_i)}{\sum_{j=1}^{p} Fitness(h_j)}$
2. Crossover: Probabilistically select (r p)/2 pairs of hypotheses from P, according to
Pr(h_i) given above. For each pair <h_1, h_2>, produce two offspring by applying the Crossover
operator. Add all offspring to P_S.
3. Mutate: Choose m percent of the members of P_S, with uniform probability. For each, invert
one randomly selected bit in its representation.
4. Update: P ← P_S.
5. Evaluate: for each h in P, compute Fitness(h).
}
Return the hypothesis from P that has the highest fitness.
}
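The prototype of Table 9 can be sketched in Python as follows. This is an illustrative sketch, not the algorithm implemented in this work: hypotheses are fixed-length bit strings, crossover is single-point, the fitness function is a toy one (number of ones) rather than investment profit, and the parameters `n_bits`, `rng` and `max_generations` are assumptions added to make the example self-contained and guaranteed to terminate.

```python
import random


def genetic_algorithm(fitness, fitness_threshold, p, r, m, n_bits, rng,
                      max_generations=200):
    """Prototype GA over fixed-length bit strings (fitness must be non-negative).

    p: population size; r: fraction replaced by crossover; m: fraction mutated.
    n_bits, rng and max_generations are assumptions added for this sketch.
    """
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(p)]
    for _ in range(max_generations):
        scores = [fitness(h) for h in population]
        if max(scores) >= fitness_threshold:
            break
        # Tiny offset keeps the selection weights valid if every score is zero.
        weights = [s + 1e-9 for s in scores]
        # 1. Select: keep (1 - r) * p members, chosen proportionally to fitness.
        n_keep = round((1 - r) * p)
        successors = [list(h) for h in rng.choices(population, weights=weights, k=n_keep)]
        # 2. Crossover: pick pairs proportionally to fitness, two offspring per pair.
        for _ in range((p - n_keep) // 2):
            a, b = rng.choices(population, weights=weights, k=2)
            cut = rng.randint(1, n_bits - 1)
            successors.append(a[:cut] + b[cut:])
            successors.append(b[:cut] + a[cut:])
        # 3. Mutate: flip one random bit in a fraction m of the successors.
        for h in rng.sample(successors, int(m * len(successors))):
            i = rng.randrange(n_bits)
            h[i] = 1 - h[i]
        # 4. Update: the successors become the current population.
        population = successors
    return max(population, key=fitness)


def count_ones(h):
    """Toy fitness: the number of ones in the bit string."""
    return sum(h)


best = genetic_algorithm(count_ones, fitness_threshold=16, p=20, r=0.6, m=0.1,
                         n_bits=16, rng=random.Random(0))
print(best, count_ones(best))
```

In an investment setting the bit string would encode the parameters of a trading strategy and the fitness would be the profit obtained when simulating that strategy on historical data.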
Table 10 - Terminal and function sets
Article | Terminal Set | Function Set
[21] | metric, average, minimum, and maximum | AND, OR, NOT, <, >
[22] | metric | plus, divide, square, times, minus, log
[20] | indicator type, “<” or “>”, floating-point number | OR, AND
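To make the syntax tree representation concrete, the sketch below evaluates a small rule built from comparison terminals and the AND, OR and NOT functions. The nested-tuple encoding, the function name and the indicator names are hypothetical and are not taken from [20], [21] or [22].

```python
def evaluate(tree, indicators):
    """Recursively evaluate a rule tree against a dict of indicator values."""
    op = tree[0]
    if op == "AND":
        return evaluate(tree[1], indicators) and evaluate(tree[2], indicators)
    if op == "OR":
        return evaluate(tree[1], indicators) or evaluate(tree[2], indicators)
    if op == "NOT":
        return not evaluate(tree[1], indicators)
    # Terminal node: (comparison, indicator name, floating-point threshold).
    name, threshold = tree[1], tree[2]
    if op == "<":
        return indicators[name] < threshold
    if op == ">":
        return indicators[name] > threshold
    raise ValueError(f"unknown node type: {op}")


# Hypothetical rule: buy when ROC is positive and RSI is not overbought.
rule = ("AND", (">", "ROC", 0.0), ("<", "RSI", 70.0))

print(evaluate(rule, {"ROC": 2.5, "RSI": 55.0}))
print(evaluate(rule, {"ROC": 2.5, "RSI": 80.0}))
```

In GP, crossover swaps subtrees between two such rules and mutation replaces a subtree or a terminal, so the structure of the rule evolves together with its parameters.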
2.3.3 Other solutions
Many other approaches are used in stock market forecasting, among which Decision
Trees [10], Support Vector Machines (SVM) [16], Fuzzy Cognitive Maps (FCM) [23], Bayes nets
[10] and the Rough Sets technique [10] stand out. A decision tree is a decision support tool
that uses a tree-like graph, constructed on the basis of the information gain of the
attributes, in which internal nodes denote tests on the attributes and each branch represents
the result of a test. The leaves of the tree represent classes or distributions. The Rough Set
is a predictive data mining tool, and the Bayes net is a data structure that enables fast
processing of probability distributions and is frequently used for inferring unobserved
variables, parameter learning and structure learning. In [10], Decision Tree C4.5, Bayes net,
ANN and the Rough Sets technique are studied and compared in forecasting the Revenue Growth
Rate of firms using Company's Fundamentals. Support Vector Machines (SVMs), frequently used in
classification and regression, are methods that analyze data and recognize patterns; they are
used in [16] to predict the direction of stock market movement, using Technical and
Fundamental indicator time series. The Fuzzy Cognitive Map is a network-based description of a
system that incorporates fuzzy logic principles. It is a network of interrelated factors or
concepts in which the relationships have a cause-effect form, the nodes represent concepts and
the links represent existing relationships. This approach is used in [23] to establish the
cause-effect relationships between factors, using Fundamental and Technical Indicators as
input. For a more detailed study of these topics, [24] and [3] should be consulted. A summary
of the existing solutions mentioned can be found in Table 11.
Table 11 - Summary of the existing solutions
Article | Year | Heuristic | Input Data | Financial Assets | Goals of the Study | Innovative Aspects | Market Period | Algorithm's performance | Comparison
[5] | 1990 | ANN, Clustering | Technical Indicators and Companies' Fundamentals: Interest rates, moving averages, other | TOPIX Index (Japan) | Buying/selling timing, technical/economic indexes relationships | Supplementary learning alg., training and monitoring | 1987 - 1990 | Return 25% per year | B&H, Multiple Regression
[23] | 2001 | FCM, ES | Macroeconomic and Technical Indicators: Interest rates, moving averages, other | Athens Stock Exchange | Factors Cause-effect relationships | ES-based FCM | 1997 - 1998 | Return 147% per year | B&H
[6] | 1996 | ANN, Univariate and Multivariate Analysis | Macroeconomic and Technical Indicators: Interest rates, moving averages, other | Swiss Bond Yield | Buying/selling timing | Sensitivity analysis, Fundamental data | 1995 | Percentage of correct predictions 67% | ANN using Technical Indicators
[9] | 1997 | ANN | Company's Fundamentals: profits, dividends, sales, other | S&P 500 | Market Forecast | Fundamental Data | 1993 - 1995 | Return 33% per year | B&H, ANN and B&H, other
[10] | 2007 | Decision Tree C4.5, Bayes net, ANN, Rough Sets technique | Company's Fundamentals: price to earning ratio, gross sales, book to market ratio, return on net worth, return on equities, earning per share, other | Taiwan stock | Forecasting Revenue Growth Rate of firms | Fundamental Data, Rough sets technique | 2004 - 2005 | Rough Sets. Accuracy (2005): 75,15% | Decision Tree C4.5, Bayes net, ANN, Rough sets technique
[4] | 2008 | ANN | Macroeconomic and Technical Indicators: Interest rates, gross domestic products, moving averages, other | Forex | Prediction of foreign exchange rates | Levenberg-Marquardt alg., Fundamental Data | Not specified | Percentage of correct predictions | Influence of Fundamental data not captured
[20] | 2009 | GP, Coevolutionary Algorithms | Technical Indicators and Companies' Fundamentals: company's profit, moving averages, other | Warsaw Stock Exchange | Rule discovery on Stock Market | Coevolution usage for selling and buying rules | Not specified | Averaged profit ratio 39,13% | B&H
[7] | 2004 | ANN | Technical Indicators and Companies' Fundamentals: P/E Ratio, Book Value per share, ROE, Payout Ratio, Dividend Yield, Price to Book ratio, other | Australian Securities Exchange (ASX) | Security selection in the Stock market | Improving Stock selection rules | 1994 - 2003 | Outperform simple selection rules | Simple selection rules
[21] | 2003 | GP | Companies' Fundamentals: Cash and Short-Term Investments, Long-Term Debt, Sales, Retained Earnings, other | S&P | Induction of useful classification rules | Not specified | 1972 - 1999 | Prediction more right than wrong | Different Initial settings
[16] | 2009 | Least Squares SVM, GA | Macroeconomic and Technical Indicators: gross domestic products, moving averages, other | S&P 500, DJIA, NYSE | Predict stock market movement direction | LSSVM + GA-based input feature selection | 1926 - 2005 | Hit ratios above 75% | Different Models Generated by GA
[11] | 2009 | Hybrid ARDL, ARIMA and ANN | Macroeconomic Indicators: consumer price index, interest rate, exchange rate and money volume, other | Tehran Stock Exchange (Iran) | Predict stock market movement direction | Resilient Back propagation technique | 1993 - 2006 | Hybrid model and economy indicators | Hybrid model, ARDL, ARIMA, ANNs
[22] | 2009 | GP Planning | Technical Indicators and Companies' Fundamentals: gross profit, earning per stock, Relative Strength Index, other | China Steel stock | Construct investment model, Rule discovery | Genetic Programming Planning | 2004 - 2007 | Outperform Decision Tree | Decision Tree model
[8] | 2008 | SOM, GA based Backpropagation ANN | Technical Indicators and Companies' Fundamentals: P/E Ratio, Dividend Yield, Annual growth in Sales, other | NSE (India) | Method for stock picking | Selection of stocks using fundamental analysis | 2005 - 2008 | Outperform Simple backpropagation ANN | Simple backpropagation neural networks
[17] | 1997 | GA | Macroeconomic Indicators: Industrial production, unemployment rate, Consumer Price Index, other | S&P500, Treasury Bills | Investment strategy, switching decisions | Macroeconomic Time-Series | 1958 - 1993 | Wealth ranging from 3.55 to 3.93 | Perfect foresight with wealth index of 4.26
[18] | 2006 | GA | Company's Fundamentals: return on capital employed, price/earnings ratio, earning per share and liquidity ratio, other | Shanghai Stock Exchange | Select high quality stocks with investment value | Not specified | 2002 - 2004 | Significantly outperformed the benchmark | B&H
[19] | 2007 | GA | Company's Fundamentals: 50 financial statement variables | 3181 Chinese listed companies | Predict the direction of one-year-ahead earnings change | All possible ratios used | 2000 - 2004 | Outperforms PNN and decision tree | Probabilistic Neural Network and Decision Tree
2.4 Conclusions
A review of existing solutions in stock market forecasting reveals that ANNs are the approach
most used by researchers. Numerous successful applications have shown that ANNs are very
useful in stock market modelling and forecasting, but they sometimes exhibit inconsistent
results due to their limitations. Local minima and overfitting are the most common problems
encountered when using ANNs. Moreover, despite the accuracy of the estimates obtained by
ANNs, the results are difficult to interpret and investors cannot draw any conclusion about
the factors that drive market trends. In [16] SVM is used successfully and shows better
performance than ANN; however, the SVM model often has difficulty in improving computational
efficiency, optimizing model parameters and selecting relevant input features. As in the case
of ANNs, the SVM model is not helpful for developing comprehensive forecasting models. Bayes
nets, on the other hand, are more efficient than ANNs and SVM, but their solutions are also
difficult to examine, in addition to the problem of obtaining the required probability
knowledge. The Decision Tree approach seems to solve the problem of the previous approaches
in understanding the key drivers that affect stock price movements, but in [22] GP is shown
to perform better and is also capable of inducing interpretable classification rules. Using
FCM it is possible to establish cause-effect relationships between the factors [23], but
experts must always intervene to determine the structure and estimate the link values, an
intervention that is not necessary, for instance, in the case of GP. Genetic Algorithms are
used in [17] and show good performance in switching between the S&P index and Treasury Bills.
Solutions [18] and [19], which use companies' fundamentals and GAs, show that it is possible
to beat the B&H strategy and that GAs can outperform the PNN and decision trees in
forecasting.
The study of existing solutions also reveals that most studies are based on Technical Analysis
rather than Fundamental Analysis. When Fundamental data is used, it is usually used in
conjunction with Technical Indicators. In [11] and other studies it is shown that the use of
macroeconomic variables can improve the estimation result. Even when Fundamental data is
used, in most cases it consists of Companies' Fundamentals, and it is easy to notice that less
importance is given to macroeconomic factors. However, in [16] it was found that most of the
key determinants of stock index movement (three of four) are fundamental macroeconomic
factors. As plainly declared in that article: “This finding is very meaningful for investors
and decision makers because this finding can tell them that investing stock market should
depend mostly on the fundamental analysis rather than technical analysis...”. Despite these
facts, even when macroeconomic variables are used, only a few of them are involved in each
study.
It is extremely difficult to compare the designed strategies in terms of profitability, since
most of them are applied to different financial assets and market periods; for these reasons
the B&H strategy is considered as a benchmark. Also, based on all the arguments presented, we
conclude that much of the potential of Genetic Algorithms, Genetic Programming and
Fundamental Analysis, more precisely Economic Analysis, is under-explored. We believe that
using this data and these algorithms, which are capable of constructing models and calculating
the model parameters at the same time, it is possible to obtain good results in stock market
forecasting.
CHAPTER 3 Data Time Series Analysis
In this chapter, before proceeding with the formulation of the problem, the formulation of the
solution and the description of the development, the data time series available for the
problem are analyzed. The chapter starts with a description of the macroeconomic data time
series and the index data time series, followed by their detailed analysis. At the end of the
chapter, conclusions about the data are drawn.
3.1 Data Time Series
This study uses Macroeconomic Indicators from different regions (United States of America,
European Monetary Union, Germany) and the S&P500 index futures. The work focuses on the U.S.
market due to its size and the impact it has on the rest of the world. The S&P500 index was
chosen because it includes 500 leading companies in leading industries of the U.S. economy,
capturing 75% coverage of U.S. equities. The futures of the index are used instead of the
index itself because they are traded for longer hours every day, even when the main market is
closed. This way it is possible to track the index behaviour at the instants when news is
released, even if the market is closed. Detailed descriptions of all the data time series used
in this work are presented in the following sections.
3.1.1 Macroeconomic Data Time Series
The American, European Union and German Macroeconomic Indicators used in this work were
chosen relying on [2] (see Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7), i.e.,
taking into account Employment/Unemployment, Manufacturing, Consumer Prices, Producer Prices,
Retail Sales, Consumer Confidence and Sentiment, Personal Income and Spending, Industrial
Production, GDP and many other factors. The data from January 2007 to June 2011 was collected
from the website [25] (a global online currency trading portal), and the Volatility levels
used to rate the Macroeconomic Indicators were also taken into account in the selection of
the Indicators. The format of the collected data is presented in Figure 7.
DATE ACTUAL CONSENSUS PREVIOUS
Figure 7 - Macroeconomic Indicator Data Format
A total of 110 Indicators were collected: 63 American, 25 European and 22 German. Due to its
length, the detailed listing of the collected Indicators is given in APPENDIX A –
Macroeconomic Indicators (Table 52, Table 53 and Table 54).
3.1.2 Index Data Time Series
The S&P500 index futures' data time series from 01/2006 to 09/2011 were collected from the
website [26]. For this purpose a program was developed using Microsoft's .NET technology and
the C# programming language in the Visual Studio 2010 IDE. This program is capable of
extracting daily, hourly and one-minute data time series of the futures and also of
automatically converting the time zone (from Moscow time, GMT+3/+4 hours, to GMT+0), taking
into account daylight saving time conventions and all the other adjustment rules. A brief
description of the program and its manual are available in APPENDIX C. The Index Data Series
format is presented in Figure 8.
DATE TIME OPEN HIGH LOW CLOSE VOLUME
Figure 8 - Index Data Format
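The time zone conversion described above can be illustrated with a short Python sketch. It is not the C# program developed in this work; the function name `moscow_to_gmt` is hypothetical, and the sketch assumes that the IANA time zone database entry `Europe/Moscow` is available on the system, so that the historical daylight saving rules are applied automatically.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# IANA zone holding the historical Moscow offsets (GMT+3 in winter, GMT+4 in summer).
MOSCOW = ZoneInfo("Europe/Moscow")


def moscow_to_gmt(naive_local: datetime) -> datetime:
    """Interpret a naive timestamp as Moscow local time and convert it to GMT+0."""
    return naive_local.replace(tzinfo=MOSCOW).astimezone(timezone.utc)


# A bar stamped 12:00 in Moscow in January 2010 (GMT+3) and in July 2010 (GMT+4).
winter = moscow_to_gmt(datetime(2010, 1, 15, 12, 0))
summer = moscow_to_gmt(datetime(2010, 7, 15, 12, 0))
```

Under these assumptions the same local hour maps to different GMT hours in winter and in summer, which is why a fixed offset is not sufficient for aligning the index data with the news release times.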
3.1.3 Macroeconomic Data Impact Measurement
With such a wide range of economic variables and the difficulty of evaluating their impact on
the market, there is a need to restrict the search space as well as to define a measure of
their impact. For this reason, this study attempted to find a versatile way to measure the
impact of macroeconomic news on the market.
By analyzing the one-minute index price data it was concluded that there are huge price
variations at the instants when important news is released. It was also noticed that, besides
the absolute values of the variations of the Macroeconomic Indicators, expectations have a
large influence on the direction of the price variations at these moments. It should also be
noted that variations of this kind can only be caused by large differences in supply and
demand, with large associated volumes, that reflect the market's interpretation of the news
involved. An example of this kind of variation is presented in Figure 9, where the release of
the Unemployment Rate and Nonfarm Payrolls triggered a variation of more than 10 U.S. dollars,
with more than 70 thousand contracts traded in just one minute.
Figure 9 - Impact Example, Unemployment Rate and Nonfarm Payrolls Release
Based on the facts presented, it was decided to use the one-minute price variations to
measure the intensity and the sign (positive or negative) of the news releases. This way it is
possible to analyse the impact of the macroeconomic variables without analyzing their values
or the impact that these variables should theoretically have. The one-minute price variations
that occur when the news is released reveal how the big players in the market interpreted the
corresponding news.
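As an illustration of this measure, the following Python sketch looks up the one-minute bar at the release time and returns its signed price variation. The function name, the dictionary layout and the sample prices are hypothetical, and taking the variation as close minus open of the release minute is an assumption made here; the text does not fix the exact definition.

```python
from datetime import datetime


def release_impact(minute_bars, release_time):
    """Signed price variation (close - open) of the minute in which the news came out.

    minute_bars maps a minute timestamp to a dict with "open" and "close" prices.
    Returns None when no bar exists for that minute.
    """
    bar = minute_bars.get(release_time.replace(second=0, microsecond=0))
    if bar is None:
        return None
    return bar["close"] - bar["open"]


# Hypothetical bar: the index falls 10.5 dollars during the release minute.
bars = {datetime(2010, 1, 8, 13, 30): {"open": 1140.0, "close": 1129.5}}
impact = release_impact(bars, datetime(2010, 1, 8, 13, 30, 12))
```

The sign of the result gives the direction in which the market interpreted the news and its absolute value gives the intensity.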
3.1.4 Macroeconomic Data Filtering
Having defined the measure of the impact of Macroeconomic Indicators, there is a need to
restrict the search space by choosing only the most influential variables. To select the most
important variables among the available data, the Absolute Mean Variation (AMV) was calculated
using (4) from 01/01/2007 to 01/01/2010 for all 110 variables (indicators).
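$$AMV = \frac{1}{N}\sum_{i=1}^{N} \left| \Delta P_i \right| \qquad (4)$$
$\Delta P_i$ - Price variation during minute $i$ in U.S. dollars.
$N$ - Number of minutes considered.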
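A minimal Python sketch of this calculation is given below, assuming that the AMV is the arithmetic mean of the absolute one-minute price variations observed at the releases of an indicator. The function name and the sample values are hypothetical.

```python
def absolute_mean_variation(variations):
    """Mean of the absolute one-minute price variations (in dollars)."""
    if not variations:
        raise ValueError("at least one variation is required")
    return sum(abs(v) for v in variations) / len(variations)


# Three hypothetical releases moving the index by +2, -4 and +3 dollars.
amv = absolute_mean_variation([2.0, -4.0, 3.0])
```

Because absolute values are used, releases that move the market in opposite directions do not cancel each other out.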
To obtain a scale that allows limits to be established, the mean variations of the index were
calculated for the same period (see Table 12).
Table 12 - Mean Index Variation from 01/01/2007 to 01/01/2010
Variation Interval Variation ($)
1 minute 0,225124
1 hour 2,198593
1 day 10,276654
From all the Economic data available, the 50 Indicators with the highest AMV (all at least
three times the one-minute AMV of the Index) were selected and analyzed; they can be seen in
Table 13.
Table 13 - Top 50 AMV Macroeconomic Indicators
Macroeconomic Indicator Region AMV
Fed Interest Rate Decision U.S.A. 4,767
Average Hourly Earnings (YoY) U.S.A. 4,55
Average Hourly Earnings (MoM) U.S.A. 4,513
Average Weekly Hours U.S.A. 4,435
Unemployment Rate U.S.A. 4,318
Nonfarm Payrolls U.S.A. 4,318
Consumer Price Index Ex Food & Energy (YoY) U.S.A. 2,294
Consumer Price Index Ex Food & Energy (MoM) U.S.A. 2,294
Consumer Price Index (YoY) U.S.A. 2,294
Consumer Price Index (MoM) U.S.A. 2,294
Real Personal Consumption Expenditures (QoQ) U.S.A. 2,244
Gross Domestic Purchases Price Index U.S.A. 2,169
Gross Domestic Product Annualized U.S.A. 2,066
Retail Sales (MoM) U.S.A. 2,056
Retail Sales ex Autos (MoM) U.S.A. 2,036
Producer Price Index ex Food & Energy (YoY) U.S.A. 1,876
Producer Price Index ex Food & Energy (MoM) U.S.A. 1,871
Producer Price Index (MoM) U.S.A. 1,871
Producer Price Index (YoY) U.S.A. 1,853
Continuing Jobless Claims U.S.A. 1,621
Durable Goods Orders U.S.A. 1,62
Durable Goods Orders ex Transportation U.S.A. 1,616
Building Permits (MoM) U.S.A. 1,5
Initial Jobless Claims U.S.A. 1,461
Housing Starts (MoM) U.S.A. 1,44
Import Price Index (YoY) U.S.A. 1,376
Import Price Index (MoM) U.S.A. 1,376
ECB Interest Rate Decision E.M.U. 1,356
Consumer Confidence U.S.A. 1,323
Richmond Fed Manufacturing Index U.S.A. 1,281
Core Personal Consumption Expenditure - Prices Index (YoY) U.S.A. 1,262
Personal Income (MoM) U.S.A. 1,26
Personal Consumption Expenditures (MoM) U.S.A. 1,26
Core Personal Consumption Expenditure - Prices Index (MoM) U.S.A. 1,26
ISM Manufacturing U.S.A. 1,229
Pending Home Sales (MoM) U.S.A. 1,216
ADP Employment Change U.S.A. 1,2
NY Empire State Manufacturing Index U.S.A. 1,147
Nonfarm Productivity U.S.A. 1,104
New Home Sales U.S.A. 1,097
ISM Non-Manufacturing U.S.A. 1,097
Housing Price Index (MoM) U.S.A. 1,083
Factory Orders U.S.A. 1,079
Trade Balance U.S.A. 1,046
Construction Spending (MoM) U.S.A. 0,997
Existing Home Sales U.S.A. 0,986
Unit Labor Costs U.S.A. 0,975
New Home Sales (MoM) U.S.A. 0,942
Consumer Credit Change U.S.A. 0,89
Existing Home Sales (MoM) U.S.A. 0,86
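The filtering rule can be sketched in Python as follows. The function name and the third indicator in the example are hypothetical; the two AMV values and the one-minute index AMV of 0.225124 are taken from Table 13 and Table 12, written with a decimal point.

```python
def select_indicators(amv_by_indicator, index_minute_amv, top_n=50, factor=3.0):
    """Keep indicators whose AMV is at least factor times the index one-minute AMV,
    ranked from highest to lowest AMV and truncated to the top_n entries."""
    threshold = factor * index_minute_amv
    eligible = [(name, amv) for name, amv in amv_by_indicator.items() if amv >= threshold]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return eligible[:top_n]


amvs = {
    "Existing Home Sales (MoM)": 0.86,
    "Fed Interest Rate Decision": 4.767,
    "Hypothetical weak indicator": 0.5,  # below 3 * 0.225124 = 0.675372
}
selected = select_indicators(amvs, index_minute_amv=0.225124)
```

With these inputs only the first two indicators pass the threshold, and they are returned in descending order of AMV.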
An analysis of the Top AMV Indicators (Table 13) shows that Employment/Unemployment,
Manufacturing, Consumer Prices, Producer Prices, Retail Sales, Consumer Confidence and
Sentiment, Personal Income and Spending, Industrial Production and GDP, in other words the
U.S. Economic Indicators Most Sensitive to Stocks according to [2] presented in Table 3, are
all represented in the table. This result is very important because it confirms that the
impact of Macroeconomic news can be evaluated using the one-minute price change and that the
size of the variation is proportional to the importance of the news. Many other important
Economic Indicators (Table 2) are also represented in this table. The Fed Interest Rate
Decision (USA) and the ECB Interest Rate Decision (EMU), despite not being present in Table 3,
play an important role because they affect the debt expenses of companies and individuals
and, in consequence, spending and revenues; their importance is highlighted in almost all
parts of book [2].
3.1.5 Correlation between Index Prices and Macroeconomic Data
Having defined the measure of the impact of Macroeconomic variables, it was considered
important to verify how this impact correlates with the index quotations. For this purpose,
correlations between the index and the impact of each variable were calculated for the period
of 3 years (from 2007 to 2010) and for each year individually. To understand whether there is
a relationship between the amplitudes of the one-minute variations and the correlations with
the index, the AMVs of each variable were also calculated for each period. The correlations
between the variables and the index and the one-minute AMVs are presented in Table 14.
Table 14 - Correlation between Variables' Impacts and Index Prices
Macroeconomic Indicator | Correl. 2007/1 - 2010/1 | Correl. 2007/1 - 2008/1 | Correl. 2008/1 - 2009/1 | Correl. 2009/1 - 2010/1 | AMV ($/min) 2007/1 - 2008/1 | AMV ($/min) 2008/1 - 2009/1 | AMV ($/min) 2009/1 - 2010/1
Fed Interest Rate Decision | 0,155 | 0,465 | -0,178 | 0,264 | 4,73 | 4,468 | 0,722
Average Hourly Earnings (YoY) | -0,023 | -0,062 | 0,127 | -0,043 | 1,91 | 6,378 | 3,185
Average Hourly Earnings (MoM) | 0,058 | -0,151 | 0,127 | -0,043 | 3,53 | 6,378 | 3,185
Average Weekly Hours | -0,033 | -0,103 | 0,127 | -0,032 | 2,107 | 6,378 | 3,154
Unemployment Rate | -0,003 | -0,256 | 0,127 | -0,043 | 2,744 | 6,378 | 3,185
Nonfarm Payrolls | -0,003 | -0,256 | 0,127 | -0,043 | 2,744 | 6,378 | 3,185
Consumer Price Index Ex Food & Energy (YoY) | 0,075 | 0,388 | -0,024 | -0,547 | 1,645 | 4,12 | 0,839
Consumer Price Index Ex Food & Energy (MoM) | 0,075 | 0,388 | -0,024 | -0,547 | 1,645 | 4,12 | 0,839
Consumer Price Index (YoY) | 0,075 | 0,388 | -0,024 | -0,547 | 1,645 | 4,12 | 0,839
Consumer Price Index (MoM) | 0,075 | 0,388 | -0,024 | -0,547 | 1,645 | 4,12 | 0,839
Real Personal Consumption Expenditures (QoQ) | -0,039 | 0,418 | -0,36 | 0,017 | 1,208 | 2,225 | 2,389
Gross Domestic Purchases Price Index | -0,229 | -0,239 | -0,3 | -0,141 | 0,989 | 1,904 | 2,321
Gross Domestic Product Annualized | -0,162 | -0,262 | -0,355 | -0,001 | 1,054 | 2,284 | 2,517
Retail Sales (MoM) | 0,172 | -0,283 | 0,261 | -0,058 | 1,486 | 2,124 | 2,107
Retail Sales ex Autos (MoM) | 0,218 | -0,03 | 0,261 | -0,058 | 1,435 | 2,124 | 2,107
Producer Price Index ex Food & Energy (YoY) | -0,005 | 0,112 | 0,307 | -0,038 | 2,002 | 1,755 | 1,942
Producer Price Index ex Food & Energy (MoM) | 0,124 | 0,171 | 0,307 | 0,045 | 1,839 | 1,755 | 1,763
Producer Price Index (MoM) | 0,124 | 0,171 | 0,307 | 0,045 | 1,839 | 1,755 | 1,763
Producer Price Index (YoY) | 0,036 | 0,112 | 0,307 | 0,045 | 2,002 | 1,755 | 1,763
Continuing Jobless Claims | -0,01 | -0,011 | -0,128 | -0,024 | 0,188 | 1,934 | 1,43
Durable Goods Orders | -0,109 | 0,446 | -0,071 | -0,135 | 0,925 | 2,464 | 1,409
Durable Goods Orders ex Transportation | -0,11 | 0,264 | -0,071 | -0,135 | 0,517 | 2,464 | 1,409
Building Permits (MoM) | 0,27 | 0,357 | 0,086 | -0,529 | 1,352 | 1,645 | 1,363
Initial Jobless Claims | -0,014 | -0,001 | -0,128 | -0,024 | 0,719 | 1,934 | 1,43
Housing Starts (MoM) | 0,222 | 0,192 | 0,066 | -0,529 | 1,099 | 1,645 | 1,363
Import Price Index (YoY) | 0,057 | -0,523 | -0,148 | -0,036 | 0,919 | 1,421 | 1,451
Import Price Index (MoM) | 0,057 | -0,523 | -0,148 | -0,036 | 0,919 | 1,421 | 1,451
ECB Interest Rate Decision | -0,116 | 0,055 | -0,354 | 0,528 | 0,158 | 2,956 | 0,427
Consumer Confidence | 0,032 | -0,668 | -0,062 | 0,252 | 1,071 | 1,948 | 0,859
Richmond Fed Manufacturing Index | -0,178 | -0,68 | 0,16 | -0,182 | 0,959 | 1,404 | 1,314
Core Personal Consumption Expenditure - Prices Index (YoY) | 0,021 | 0,236 | -0,272 | -0,317 | 1,002 | 1,715 | 0,922
Personal Income (MoM) | 0,003 | 0,324 | -0,272 | -0,317 | 1,109 | 1,715 | 0,922
Personal Consumption Expenditures (MoM) | 0,003 | 0,324 | -0,272 | -0,317 | 1,109 | 1,715 | 0,922
Core Personal Consumption Expenditure - Price Index (MoM) | 0,003 | 0,324 | -0,272 | -0,317 | 1,109 | 1,715 | 0,922
ISM Manufacturing | 0,206 | -0,009 | 0,177 | -0,078 | 1,133 | 1,159 | 1,24
Pending Home Sales (MoM) | -0,085 | 0,121 | -0,367 | 0,061 | 0,419 | 0,972 | 1,741
ADP Employment Change | 0,319 | 0,201 | 0,363 | -0,052 | 0,329 | 1,398 | 1,712
NY Empire State Manufacturing Index | 0,135 | 0,347 | 0,179 | 0,15 | 0,738 | 1,429 | 0,879
Nonfarm Productivity | 0,026 | 0,231 | -0,051 | -0,198 | 1,074 | 0,763 | 1,147
New Home Sales | -0,239 | 0,43 | -0,287 | -0,441 | 1,238 | 0,789 | 0,919
ISM Non-Manufacturing | -0,023 | 0,412 | -0,067 | 0,013 | 0,976 | 1,4 | 0,689
Housing Price Index (MoM) | -0,189 | 0,017 | 0,044 | -0,226 | 0,083 | 1,168 | 0,969
Factory Orders | 0,118 | 0,384 | 0,475 | -0,487 | 0,791 | 1,299 | 0,892
Trade Balance | 0,098 | -0,398 | -0,001 | -0,022 | 1,224 | 1,498 | 0,418
Construction Spending (MoM) | 0,067 | -0,039 | 0,177 | -0,134 | 0,124 | 1,159 | 1,29
Existing Home Sales Change | -0,26 | -0,524 | -0,419 | 0,064 | 1,038 | 0,55 | 1,271
Unit Labor Costs | -0,118 | 0,2 | -0,249 | -0,198 | 0,713 | 0,567 | 1,147
New Home Sales (MoM) | -0,055 | 0,431 | -0,287 | -0,441 | 0,638 | 0,789 | 0,919
Consumer Credit Change | 0,148 | -0,137 | 0,202 | -0,665 | 0,593 | 1,051 | 0,626
Existing Home Sales (MoM) | -0,178 | -0,084 | -0,419 | 0,064 | 0,537 | 0,55 | 1,271
The analysis of the data leads to the following conclusions:
- No direct relationship was detected between the magnitude of the impact and the
correlation;
- The correlation varies considerably over the years: it can be strongly positive in some
years and negative in others.
It was also found that the top 50 variables resulting from filtering over three years (from
2007 to 2010) are the same as those resulting from filtering over two-year periods (from 2007
to 2009 and from 2008 to 2010). When filtering is performed with different intervals, the top
12 variables always keep the same rank and there are only some small changes in the ranking
of the remaining variables.
To verify the correlation between the simultaneous contribution of several variables and the
index, 10 variables with relatively high correlation in the year 2007 were selected: Fed
Interest Rate Decision (the variable with the highest correlation, 0.465), Consumer Price
Index Ex Food & Energy (YoY), Real Personal Consumption Expenditures (QoQ), Durable Goods
Orders, Building Permits (MoM), New Home Sales, ISM Non-Manufacturing, New Home Sales (MoM),
Factory Orders and Durable Goods Orders ex Transportation. The correlation between the sum of
the impacts of these variables and the Index is 0.713, which is much higher than the
individual correlations of the variables and than the correlation of the sum of all the
impacts, which is 0.2. Thus, there is strong evidence that the market follows the cumulative
impact of the most important variables (the variables with higher correlation) for a given
period of time. By analyzing the variables involved it is also possible to conclude that these
are in fact the variables closely related to the US subprime mortgage crisis of this period
(Interest Rate, Building Permits and New Home Sales). The relationship between the sum of the
impacts and the evolution of the index can also be clearly seen in Figure 10, and it must be
noted that the sign of the derivative of the sum often anticipates the evolution of the index.
Figure 10 - MEV Impact Sum and S&P 500 Index Futures in 2007
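The comparison between the summed impacts and the index can be sketched in Python as follows. The sketch assumes that the "sum of the impacts" is a running (cumulative) sum over time of the impacts of all selected variables, and that the correlation is the Pearson coefficient; neither detail is stated explicitly in the text, and all function and variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def summed_impact_series(impacts_by_variable, dates):
    """Running sum, over the given dates, of the impacts of all selected variables.

    impacts_by_variable maps a variable name to a dict {date: impact in dollars};
    dates without a release contribute zero.
    """
    total = 0.0
    series = []
    for d in dates:
        total += sum(impacts.get(d, 0.0) for impacts in impacts_by_variable.values())
        series.append(total)
    return series


# Hypothetical impacts of two variables over three dates, and hypothetical index prices.
impacts = {"a": {1: 1.0, 3: -2.0}, "b": {2: 0.5}}
cumulative = summed_impact_series(impacts, [1, 2, 3])
correlation = pearson(cumulative, [1400.0, 1410.0, 1395.0])
```

Applying `pearson` to the cumulative series of a chosen subset of variables and to the index prices on the same dates gives one candidate value of the kind reported above.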
Thus, to meet the proposed goals, there is a need to discover the combination of
macroeconomic variables best correlated with the Index. There are 50 macroeconomic variables
and consequently there are $2^{50}$ (approximately $1.13 \times 10^{15}$) possible
combinations at each moment, so it is impractical to carry out this search using only human
capabilities.
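A short calculation makes the size of this search space concrete. It assumes that each of the 50 variables is either included in a combination or left out, and it uses a purely hypothetical evaluation rate of one million combinations per second.

```python
n_variables = 50

# Each variable is either in or out of a combination: 2^50 subsets.
n_combinations = 2 ** n_variables

# Hypothetical rate of one million evaluated combinations per second.
evaluations_per_second = 1_000_000
seconds = n_combinations / evaluations_per_second
years = seconds / (365.25 * 24 * 3600)
```

Even at this optimistic rate, an exhaustive search would take several decades for a single moment in time, which motivates a heuristic search method.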
3.1.6 Technical Data Time Series
Despite the wide range of existing Technical Indicators and their effectiveness in some cases,
only one indicator was selected and used in this work, since the intention is to give greater
emphasis to the measurement of the impact of the macroeconomic indicators. An important aspect
to take into account is the need to avoid losses in situations where the Fundamental
Indicators and the Market Trend point in opposite directions; in other words, even if the
Macroeconomic Indicators point in some direction, there is still a need to choose the right
moment to act. To achieve this, the most commonly used hourly and daily Moving Averages (MAs,
described in 2.2.2.2) of the close prices are used (see Table 15), due to their simplicity of
calculation and effectiveness.
Table 15 - Moving Averages Used
Moving Average Number of Periods (days or hours)
Daily 10, 20, 30, 50, 100, 200
Hourly 10, 20, 30, 50, 100, 200, 300
Transaction decisions should be made based on the macroeconomic situation and on the
confirmation of the trend and timing given by the MAs. When the moving average crosses below
the index (becomes lower than the index), this is considered a signal to buy. When the moving
average crosses above the index (becomes higher than the index), this is considered a signal
to sell. An example of these two situations is illustrated in Figure 11 (it should be noted
that not all buy and sell points are highlighted).
Figure 11 - Moving Average Usage Example
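The crossing rule can be sketched in Python as follows, assuming a simple (unweighted) moving average of the close prices. The function names and the sample prices are hypothetical, and the sketch only detects the crossings; it does not include the macroeconomic confirmation described above.

```python
def simple_moving_average(prices, n):
    """Simple moving average aligned with prices; None until n values are available."""
    averages = []
    window_sum = 0.0
    for i, price in enumerate(prices):
        window_sum += price
        if i >= n:
            window_sum -= prices[i - n]
        averages.append(window_sum / n if i >= n - 1 else None)
    return averages


def crossover_signals(prices, n):
    """Return (position, "BUY" or "SELL") pairs at the moving average crossings."""
    ma = simple_moving_average(prices, n)
    signals = []
    for i in range(1, len(prices)):
        if ma[i - 1] is None:
            continue
        previous_gap = ma[i - 1] - prices[i - 1]
        current_gap = ma[i] - prices[i]
        if previous_gap >= 0 and current_gap < 0:
            signals.append((i, "BUY"))   # the average became lower than the price
        elif previous_gap <= 0 and current_gap > 0:
            signals.append((i, "SELL"))  # the average became higher than the price
    return signals


signals = crossover_signals([10, 10, 10, 8, 8, 12, 12], n=2)
```

In this example the price drop at position 3 leaves the average above the price (a sell signal), and the rise at position 5 leaves it below the price (a buy signal).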
3.1.7 Volatility Measurement
Another goal of this work is to avoid losses at times when macroeconomic news has no impact
on the market and there is a climate of fear or uncertainty. For this purpose the Chicago
Board Options Exchange (CBOE) Volatility Index (VIX) is used, which shows the market's
expectation of 30-day volatility. It is constructed using the implied volatilities of a wide
range of S&P 500 index options, it is meant to be forward looking (it reflects investors'
expectations of future market volatility) and it is calculated from both calls and puts. The
VIX is a widely used measure of market risk and is often referred to as the "investor fear
gauge". The VIX daily data time series were extracted from [27] and the format of the
collected data is presented in Figure 12.
DATE HIGH LOW CLOSE VOLUME ADJ. CLOSE
Figure 12 - VIX Data Format
VIX values greater than 30 are generally associated with a large amount of volatility as a
result of investor fear or uncertainty, while values below 20 generally correspond to less
stressful, even complacent, times in the markets. An example of a climate of fear and
uncertainty is presented in Figure 13 (between 09/2008 and 04/2009), where an increase of the
VIX and a sharp fall of the S&P500 can be observed.
Figure 13 – VIX and S&P500
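The two reference levels mentioned above can be turned into a simple classification, sketched here in Python. The function name and the regime labels are hypothetical, and the treatment of the interval between 20 and 30 as a separate "neutral" regime is an assumption made for this illustration.

```python
def vix_regime(vix_close, fear_level=30.0, calm_level=20.0):
    """Classify a daily VIX close using the reference levels of 30 and 20."""
    if vix_close > fear_level:
        return "fear"
    if vix_close < calm_level:
        return "calm"
    return "neutral"


regimes = [vix_regime(v) for v in (45.0, 25.0, 15.0)]
```

A strategy could, for instance, stay out of the market on days classified as "fear".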
3.2 Conclusions
The analysis of the data led to several important conclusions, which had a great influence on
all the choices made during the implementation of the solution. First, a new way of measuring
the impact of macroeconomic news, based on one-minute price variations, was established. Then,
in order to choose the most important variables among the available data and to validate the
effectiveness of the impact measurement method, all the collected Macroeconomic Data was
rated based on the AMV of each variable. It was found that the U.S. Economic Indicators Most
Sensitive to Stocks according to [2] were among the variables with high one-minute AMV
scores, validating the idea that one-minute price changes can be used as a measure of the
impact of macroeconomic variables. It was also found that in certain periods of time there was
a strong correlation between the impacts of macroeconomic variables and the index price
movements, and that certain linear combinations of impacts have a much stronger correlation
with the index than the individual variables. However, human analysis of such a huge amount
of data and of all the possible combinations is impractical, which justifies the
implementation of the automated solution described in the following sections: a Genetic
Algorithm capable of efficiently searching for the approximate location of a globally optimal
combination of the variables.
CHAPTER 4 Solution’s Architecture and Implementation
The goal of this chapter is to describe the solution developed to forecast stock market
movements using Macroeconomic variables and Technical Indicators, based on the analysis of
data time series made in the previous chapter. It starts with the formulation of the solution
and presents the overall architecture of the developed system. Subsequently, the modules
within the system are characterized in detail and the Genetic Algorithm used is described.
Finally, the implementation of each module of the system is described in detail.
4.1 Overall Architecture
Given the problem at hand, relying on the analysis made in the previous chapter and with the
goal of creating a modular, flexible, scalable and reusable application, it was decided to use a
multitier architecture. To achieve these goals the solution was divided into two main layers:
Optimization and Simulation Layer and Application Layer.
The Optimization and Simulation Layer is the base and support layer that allows the
development of any optimization application specific to a given context. It is a reusable set
of libraries and classes that can be used in any software system, which is why it is also
designated the Optimization and Simulation Framework. This framework provides a particular set
of rules and specifications for routines, data structures and classes; in other words, it provides
an Application Programming Interface (API). It was decided to divide this layer into three parts
because optimization problems similar to the problem at hand typically differ in three key
aspects, namely the optimization algorithm, the data series and the model representation.
Given that this work uses the Genetic Algorithm for the reasons already mentioned (3.1.5),
the developed framework provides a GA that can be used in a generic and flexible way to
solve any specific optimization problem, with problems differing only in the model (hypothesis
representation) and in the data series.
The second layer is built on top of the Optimization and Simulation Framework
by implementing the interfaces and extending the classes defined in the API. It is in this layer that
the data time series and the model (the hypothesis representation) are defined, which are
later used by the genetic algorithm throughout the optimization process. It also offers a
simple interface that, at start-up, allows the simulation's settings to be confirmed and, during
execution, shows the state of evolution of the algorithm at each moment.
A detailed manual for installing and using the application is presented in
APPENDIX B - Application’s User Guide, which also describes the application’s input
and output files.
[Figure 14 shows two stacked layers. Optimization and Simulation Layer: the Optimization and Simulation Framework, which exposes the Optimization Application Programming Interface (API) with the Genetic Algorithm, Data Time Series and Model components. Application Layer: the Application, with its Application Logic, Specific Data Time Series, Specific Model and User Interface.]
Figure 14 - Solution's Overall Architecture
4.2 Implementation’s Architecture
Because of the high demand for computational resources and in order to meet the
requirements described in 4.1, the C++ programming language was chosen for the implementation
due to its characteristics: it is a general-purpose, high-performance, multi-paradigm language
that combines high-level and low-level features. The implementation was designed
using the Enterprise Architect application, a visual modelling tool for the planning, design and
construction of software systems. This tool was used to establish the structure of the application
packages and to create class diagrams in the Unified Modelling Language (UML). It
was also used to generate the base code (class structures and header files), which
was later imported into the Eclipse IDE, where the implementation of the solution was completed.
A quick use guide for Enterprise Architect can be found in APPENDIX D – Enterprise
Architect Quick Use Guide.
The program was divided into three main packages: the Optimization and Simulation
Package (Optimization and Simulation Framework and API), the problem Model Package and the Data
Time Series Package. The package diagram can be seen in Figure 15. Thus, the Optimization
and Simulation Layer is implemented in a separate package (GA package) so that it can be easily
reused. Due to the complexity of the problem, it was decided to divide the Application Layer into
three parts (two packages and the main application). To make the system scalable and to
facilitate debugging, two separate packages were created for this problem’s specific data time
series (Data package) and this problem’s model (Model package). The Data package
implements the data structure that stores the index’s data time series, the macroeconomic data
time series and the technical data time series (MAs). This package also implements
functions that filter macroeconomic data and calculate AMVs and other measures necessary to
solve the problem. The Model package, as the name suggests, models and defines the
problem representation (the GA’s hypothesis) as well as the methods that operate on this
representation (genetic operations, evaluation and other functions).
Figure 15 - Package Diagram
It should be noted that the main program is not present in the package diagram; it uses
these packages as a base. The Data and Model packages import the base Optimization
package in order to use the Optimization API. As the evaluation of the model throughout the
research process is based on the available data, the Data package must be imported into the Model
package to allow the definition of the evaluation function.
Detailed descriptions of all the layers, packages, data structures, classes and other features are
given in the following sections. The implementation of the Optimization and Simulation Layer
is described first, followed by the Application Layer.
4.3 Optimization and Simulation Layer
This section presents a summary description of the Genetic Algorithm (theoretical
description, possibilities and the options taken), followed by the flowchart of the algorithm and
the detailed description of the implementation of this layer (class diagram).
4.3.1 Genetic Algorithm
The GA is a member of the family of evolutionary algorithms. It starts from a high-level statement
of what needs to be done and solves the problem automatically, using the principles of Darwinian
natural selection and biologically inspired operations. During the search process, hypotheses
(possible solutions) are treated as individuals of a population, and their fitness, which needs to be
maximized, represents the measure of their quality. Elements of the population mate, mutate,
reproduce and evolve until some termination condition is met and an approximate solution is
found. The GA is able to adapt to the problem independently of the size and
complexity of the desired solution, which is the reason why it is used in this work. In this thesis,
the algorithm was designed based on [3], [28] and [29], more precisely on the chapters
dedicated to Genetic Algorithms and Genetic Programming. The flowchart of the algorithm, the
decisions taken and the detailed description of all the steps and functionalities are presented in
the following sections.
4.3.2 Hypotheses Representation
Hypotheses are the possible solutions of the problem, and the fundamental elements of the
hypotheses (also called individuals or chromosomes) are their genes. The genes are
problem-specific variables or functions that are used in conjunction to construct potential solutions of the
problem. The structure of the hypotheses specific to this problem is described in 4.4.1.
4.3.3 The Fitness Function
Fitness is a numerical value used to measure the appropriateness of a solution, and it can
combine two or more different elements (multiobjective). Fitness is measured in every iteration,
starting from the creation of the initial random population, and answers the question “how good (or
bad) is each hypothesis?”. The fitness function specific to this problem is described in 4.4.1.
4.3.4 The Genetic Operations
The biologically inspired genetic operations include crossover (sexual recombination), mutation
and reproduction. The techniques most commonly used to perform these operations, which are
also the ones used in this work, are described in the following sections.
4.3.4.1 Recombination Operation
Recombination is the more important of the two primary operations used to modify
structures in the Genetic Algorithm. Two selected solutions are combined (sexually or
asexually) to form two new solutions (offspring).
Reproduction is an asexual method in which a selected individual is copied into the new
population. Generally, 10% of the individuals are allowed to reproduce. Since the fitness of the
selected individual does not change (the individual does not need to be tested again because the
result is already known), reproduction has a significant effect on the total time required for the GA
search: there is a 10% reduction in the time required to test the fitness of the associated
population.
The most commonly used technique for sexual recombination is Uniform Crossover,
which is global and less biased than standard and one-point crossover. According to [30], it
outperforms one-point crossover, which in turn outperforms two-point crossover.
This method simply considers each gene position of the two parents and swaps the two
genes with a probability of 50%. The reproduction methods used to solve this problem are
described in 4.4.1, and the most commonly used probabilities for these operations are
presented in 4.3.8.
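As an illustration of the Uniform Crossover described above, a minimal C++ sketch follows. The `Chromosome` alias and the function name `uniformCrossover` are hypothetical and are not taken from the developed framework; genes are assumed to be plain integers of equal count in both parents.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical chromosome type for illustration: a fixed-length list of integer genes.
using Chromosome = std::vector<int>;

// Uniform crossover: for each gene position, the parents' genes are swapped
// with a probability of 50%. Both parents must have the same length.
std::pair<Chromosome, Chromosome> uniformCrossover(const Chromosome& a,
                                                   const Chromosome& b,
                                                   std::mt19937& rng) {
    assert(a.size() == b.size());
    Chromosome childA = a;
    Chromosome childB = b;
    std::bernoulli_distribution swapGene(0.5);
    for (std::size_t i = 0; i < childA.size(); ++i) {
        if (swapGene(rng)) {
            std::swap(childA[i], childB[i]);
        }
    }
    return std::make_pair(childA, childB);
}
```

Whatever the random draws are, each gene position of the two offspring still holds exactly the two genes of the parents at that position, which is what distinguishes this operator from mutation.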
4.3.4.2 Mutation Operation
Mutation is another important genetic operation. Different types of
mutation are possible. In this operation, the chromosomes’ genes can be replaced by randomly
generated genes (Random Resetting) or can undergo small random variations (Creep
Mutation). Mutation in a GA has two parameters:
- The probability of choosing mutation (GAs use mutation rates such that, on average,
between one gene per generation and one gene per offspring mutates).
- The probability of choosing an internal point within the parent to be mutated.
The mutation operator used to solve this problem is described in 4.4.1, and the most commonly
used probabilities for this operation are presented in 4.3.8.
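The two mutation types can be sketched as follows. This is only an illustration under stated assumptions: genes are integers bounded by `lo` and `hi`, each gene mutates independently with probability `geneRate`, and the two mutation types are chosen with equal probability. None of these names or choices come from the thesis.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Random Resetting: replace the gene with a value drawn uniformly from [lo, hi].
int randomReset(int lo, int hi, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(rng);
}

// Creep Mutation: move the gene by a small random step in [-step, step],
// keeping the result inside [lo, hi].
int creep(int gene, int step, int lo, int hi, std::mt19937& rng) {
    std::uniform_int_distribution<int> delta(-step, step);
    return std::min(hi, std::max(lo, gene + delta(rng)));
}

// Mutate each gene independently with probability geneRate (assumed scheme),
// choosing between the two mutation types with equal probability.
void mutate(std::vector<int>& genes, double geneRate, int lo, int hi,
            std::mt19937& rng) {
    assert(geneRate >= 0.0 && geneRate <= 1.0 && lo <= hi);
    std::bernoulli_distribution hit(geneRate);
    std::bernoulli_distribution useCreep(0.5);
    for (int& g : genes) {
        if (!hit(rng)) continue;
        g = useCreep(rng) ? creep(g, 1, lo, hi, rng) : randomReset(lo, hi, rng);
    }
}
```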
4.3.5 Termination Criterion
There are two ways of specifying when the GA should stop, and both can easily affect the quality and
speed of the search. In the first, a certain amount of time or number of generations is
specified. In the second, an error tolerance on the fitness is established. In both cases, the best
individual produced by the end of the execution represents the solution of the GA.
In this work, the termination criterion is based on both the number of generations and the
evolution of the fitness over time. Thus, the algorithm terminates the search when a
certain number of generations is reached or when no improvement of the fitness is achieved for a certain
number of iterations.
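The combined criterion can be sketched as a small helper class. The class name and its interface are hypothetical; the sketch only assumes that a higher fitness is better and that the criterion is checked once per generation.

```cpp
#include <cassert>
#include <limits>

// Stops when a maximum number of generations is reached or when the best
// fitness has not improved for a given number of consecutive generations.
class TerminationCriterion {
public:
    TerminationCriterion(int maxGenerations, int maxStagnantGenerations)
        : maxGenerations_(maxGenerations), maxStagnant_(maxStagnantGenerations) {
        assert(maxGenerations > 0 && maxStagnantGenerations > 0);
    }

    // Records the best fitness of the generation just evaluated and reports
    // whether the search should stop.
    bool shouldStop(double bestFitness) {
        ++generation_;
        if (bestFitness > bestSoFar_) {
            bestSoFar_ = bestFitness;
            stagnant_ = 0;
        } else {
            ++stagnant_;
        }
        return generation_ >= maxGenerations_ || stagnant_ >= maxStagnant_;
    }

private:
    int maxGenerations_;
    int maxStagnant_;
    int generation_ = 0;
    int stagnant_ = 0;
    double bestSoFar_ = -std::numeric_limits<double>::infinity();
};
```

With the values later listed in Table 16, such an object would be constructed as `TerminationCriterion(200, 50)`.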
4.3.6 Selection Function
To ensure that the GA’s search is a non-random process in which fitter solutions are
more likely to be selected, a selection function must be defined. In many studies,
selection is performed using the Fitness Proportionate Selection (FPS) function, also called
Roulette Wheel. If $f(s_i)$ is the fitness of the solution $s_i$ and $\sum_{j=1}^{M} f(s_j)$ is the total fitness
of all $M$ members of the population, then the probability that the solution $s_i$ will be copied to the
next generation is:
$P(s_i) = \frac{f(s_i)}{\sum_{j=1}^{M} f(s_j)}$ (5)
This selection function is implemented using the following steps:
(a) Order the individuals in the population by their normalised fitness (best at the top of the list).
(b) Choose a random number, $R$, between zero and one.
(c) From the top of the list, loop through every individual keeping a running total of their normalised
fitness values. As soon as this total exceeds $R$, stop the loop and select the current individual.
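Steps (a) to (c) can be sketched in C++ as follows. The function name is hypothetical; the fitness values are assumed to be non-negative and already ordered best-first, and the random number is passed in so that the behaviour is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the index of the individual selected by the roulette wheel.
// `fitness` holds non-negative fitness values ordered best-first;
// `r` is a random number in [0, 1).
std::size_t rouletteSelect(const std::vector<double>& fitness, double r) {
    assert(!fitness.empty());
    const double total = std::accumulate(fitness.begin(), fitness.end(), 0.0);
    assert(total > 0.0);
    double cumulative = 0.0;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        cumulative += fitness[i] / total;  // running total of normalised fitness
        if (cumulative > r) {
            return i;
        }
    }
    return fitness.size() - 1;  // guard against floating-point round-off
}
```

For fitness values 5, 3 and 2 the normalised values are 0.5, 0.3 and 0.2, so a random number of 0.6 falls in the second slice and selects the second individual.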
Many other selection methods are used in GAs, among which Tournament Selection and
Stochastic Universal Sampling (SUS) stand out. In the first method, the genetic program
chooses two random solutions and the solution with the higher fitness wins. This method
simulates biological mating patterns and is mostly used when the population is very large
(several thousand individuals), which is not the case here.
The second method is similar to FPS regarding the probability of selection (which is also
calculated using (5)) and the ordering of the individuals. In this method, as many equally spaced
pointers as there are individuals to be selected are placed over the line, starting at the randomly
chosen value (lower than the probability of the best individual), which allows greater variety and
avoids premature convergence. Stochastic Universal Sampling is a development of fitness
proportionate selection (an elaborate variation of FPS) that ensures that the observed
selection frequency of each individual is in line with its expected frequency. While fitness
proportionate selection chooses several solutions from the population by repeated random
sampling, SUS uses a single random value to sample all of the solutions by choosing them at
evenly spaced intervals. So, if an individual has a selection probability of 0.055
and 100 individuals are selected, that individual is expected to be selected between five and
six times, and SUS guarantees this: the individual will be selected five or six times, not ten, not
zero and not 100 times. FPS does not make this guarantee. SUS is performed using the
following steps:
(a) Order the individuals in the population by their normalised fitness (best at the top of the list).
(b) Choose a random number, $R$, between zero and $1/N$, where $N$ is the number of individuals to be
selected.
(c) For each natural number $i$ from $0$ to $N-1$, starting from the top of the list, loop through every
individual keeping a running total of their normalised fitness values. As soon as this total exceeds
$R + i/N$, stop the loop and select the current individual.
Initially, FPS was used in this work to select the individuals, but it suffered from
premature convergence, which led to the use of SUS (with the number of
individuals to be selected, $N$, equal to the population size).
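A minimal sketch of SUS is given below, under the assumption that the pointers are placed at $R$, $R + 1/N$, $R + 2/N$ and so on over the normalised fitness line. The function name is hypothetical, and the single random number is passed in as an argument.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stochastic Universal Sampling: selects n individuals using a single random
// number r in [0, 1/n). `fitness` holds non-negative values ordered best-first.
std::vector<std::size_t> susSelect(const std::vector<double>& fitness,
                                   std::size_t n, double r) {
    assert(!fitness.empty() && n > 0);
    assert(r >= 0.0 && r < 1.0 / static_cast<double>(n));
    const double total = std::accumulate(fitness.begin(), fitness.end(), 0.0);
    assert(total > 0.0);

    std::vector<std::size_t> selected;
    selected.reserve(n);
    std::size_t current = 0;
    double cumulative = fitness[0] / total;
    for (std::size_t k = 0; k < n; ++k) {
        const double pointer = r + static_cast<double>(k) / static_cast<double>(n);
        // Advance until the running total of normalised fitness exceeds the pointer.
        while (cumulative <= pointer && current + 1 < fitness.size()) {
            ++current;
            cumulative += fitness[current] / total;
        }
        selected.push_back(current);
    }
    return selected;
}
```

With fitness values 5, 3 and 2 and ten selections, the three individuals are chosen exactly five, three and two times, whatever valid value of `r` is used. This is the guarantee that FPS does not give.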
4.3.7 The Flow-Chart of the Genetic Algorithm
The GA uses the following execution steps to search for the solution:
1. Generate an initial population of individuals (chromosomes) that are random compositions
of the previously defined genes.
2. Evaluate each individual in the population and assign it a fitness value according to how well
it solves the problem.
3. Create a new population:
i) Copy individuals
ii) Create new individuals by mutation
iii) Create new individuals by recombination
4. The best individual (hypothesis) that appeared in any generation, the best-so-far solution, is
designated as the result of the GA.
During the process, once the new population is complete (with individuals formed by the two main
methods, reproduction and crossover), the old population is destroyed. This iterative process of
measuring fitness and performing the genetic operations (steps 2 and 3) is repeated over many
generations. The run of the genetic algorithm terminates when the termination criterion is satisfied.
The best individual ever encountered during the run (i.e. the best-so-far individual) is typically
designated as the result of the run. All the individuals in the initial random population and the
individuals resulting from genetic operations during a run of the GA are syntactically valid
hypotheses. The flowchart of the GA designed in this work is presented in Figure 16.
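The execution steps can be illustrated with a deliberately small, self-contained loop. This is a toy sketch and not the implemented framework: the chromosome is a bit string, the fitness simply counts the genes equal to 1, parents are picked by plain fitness-proportionate selection, and termination uses only a generation limit. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Genes = std::vector<int>;  // hypothetical bit-string chromosome

// Toy fitness for illustration only: the number of genes equal to 1.
int fitnessOf(const Genes& g) {
    return static_cast<int>(std::count(g.begin(), g.end(), 1));
}

// Fitness-proportionate pick of one parent (+1 keeps every weight positive).
const Genes& pickParent(const std::vector<Genes>& pop, std::mt19937& rng) {
    std::vector<double> weights;
    for (const Genes& g : pop) weights.push_back(fitnessOf(g) + 1.0);
    std::discrete_distribution<std::size_t> wheel(weights.begin(), weights.end());
    return pop[wheel(rng)];
}

// Runs the loop and returns the best-so-far fitness recorded at each generation.
std::vector<int> runGA(std::size_t popSize, std::size_t geneCount,
                       int generations, double pr, double pm, unsigned seed) {
    assert(popSize >= 2 && popSize % 2 == 0 && geneCount > 0);
    assert(pr >= 0.0 && pr <= 1.0 && pm >= 0.0 && pm <= 1.0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution bit(0.5), recombine(pr), mutateGene(pm);

    // Step 1: random initial population.
    std::vector<Genes> pop(popSize, Genes(geneCount));
    for (Genes& g : pop)
        for (int& x : g) x = bit(rng) ? 1 : 0;

    std::vector<int> history;
    int bestSoFar = 0;
    for (int gen = 0; gen < generations; ++gen) {
        // Step 2: evaluate the population and track the best-so-far fitness.
        for (const Genes& g : pop) bestSoFar = std::max(bestSoFar, fitnessOf(g));
        history.push_back(bestSoFar);

        // Step 3: build the new population two offspring at a time.
        std::vector<Genes> pool;
        while (pool.size() < popSize) {
            Genes a = pickParent(pop, rng);
            Genes b = pickParent(pop, rng);
            if (recombine(rng)) {  // uniform crossover, otherwise plain copy
                for (std::size_t i = 0; i < geneCount; ++i)
                    if (bit(rng)) std::swap(a[i], b[i]);
            }
            for (int& x : a) if (mutateGene(rng)) x = 1 - x;
            for (int& x : b) if (mutateGene(rng)) x = 1 - x;
            pool.push_back(a);
            pool.push_back(b);
        }
        pop.swap(pool);  // the old population is discarded
    }
    return history;  // Step 4: the last entry is the best fitness recorded
}
```

Because the best-so-far value is tracked separately from the population, the returned history can never decrease, even though the population itself is fully replaced in every generation.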
[Flowchart: Start; Generation=0; create M random individuals, evaluate their fitness and insert them into the population; while the termination criterion is not satisfied: select two individuals based on fitness, perform recombination with probability Pr and mutation with probability Pm, evaluate the fitness of the offspring that are new individuals, and insert the offspring into the intermediate pool until it holds M individuals; then Population=Intermediate Pool and Generation=Generation+1; when the criterion is satisfied, designate the result; End.]
Figure 16 - Genetic Algorithm Flowchart
4.3.8 Algorithm’s Parameters
The parameters of the genetic algorithm that give the best experimental results vary greatly
depending on the problem and can be chosen experimentally. However, according to the
analysed literature ([3], [28], [29] and the publications related to GAs and stock market
forecasting), the algorithm’s parameters must be chosen taking several factors into
account:
1. Population size: the larger the population, the greater the exploration of the problem space. The more
complex the problem, the larger the population needed. However, when the number of
individuals is very high, the search process can become too time-consuming. In financial
problems, populations of close to 100 individuals (sometimes even fewer) are often used,
which is also taken into consideration in this work.
2. Termination criterion: the maximum number of generations and the fitness are the most commonly
used criteria. Both are taken into consideration in this work: the maximum
number of generations and the fitness improvement (number of generations without improvement).
3. Probability of crossover: the proportion of the population that will undergo crossover (sexual
recombination). It is generally 90% or higher.
4. Probability of reproduction: the proportion of individuals in the population that will undergo
reproduction (asexual recombination), in other words, how many individuals can be cloned
without undergoing crossover. Generally this probability stays constant at 10%, which is also
taken into consideration in this work.
5. Probability of mutation: the proportion of individuals in the population that will undergo
mutation, in other words, how many individuals can have random changes in some of their
genes. Mutation rates between 5% and 10% (of genes) are usually used, which is also taken into
consideration in this work.
In this work, the literature’s suggestions were taken into account to find appropriate values for
these parameters and, after some tests, the following values were chosen (Table
16):
Table 16 - Genetic Algorithm’s Parameters
Input Parameter | Value
Mutation Rate | 0.5 of individuals (5% of genes)
Crossover Rate | 0.95 of individuals
Population Size | 100 and 1000
Generations | 200
Generations Same Fitness | 50
4.3.9 Optimization Package Class Diagram
The Optimization Package Class Diagram is presented in Figure 18. The package is divided into three
classes and two interfaces, which are listed and described below:
GeneticAlgorithm: the main class, which performs the optimization task using all the other
components (the specified interfaces and abstract classes).
IHypothesis: the interface that represents a generic hypothesis, offering the methods
used by a GA to reproduce, mutate, evaluate and select the hypothesis during the
optimization process.
IHypothesisFactory: the interface used by a GA to create the initial random population.
It is also used (or can be used) by the hypothesis during the genetic operations (crossover
and mutation).
DataSeriesProvider: the abstract class that represents the data time series used
by the GA and, consequently, by the hypotheses during the evaluation process.
MTRand: the class that implements the well-known Mersenne Twister pseudo-random number
generator. Besides being used by the GA, it can and should be used in user-defined code.
However, nothing prevents the use of any other random generator.
Thus, the optimization can be performed using this framework in a generic way,
independently of the problem at hand. The only condition that must be guaranteed is that the
user-defined model objects (hypotheses) generated by a user-defined hypothesis factory and
the data time series provider objects follow the rules specified in the interfaces and
abstract classes provided in this optimization framework. In other words, to guarantee the
correct functioning of this layer, the correct implementation of the interfaces and the extension
of the classes are mandatory. The flowchart of the correct implementation and usage of the
Optimization API is presented in Figure 17.
[Flowchart. Implementation: extend the DataSeriesProvider class, implement the IHypothesis interface, implement the IHypothesisFactory interface. Usage: create the HypothesisFactory and DataSeriesProvider objects, create the GA object, search for the solution until the termination criterion is met.]
Figure 17 - Implementation and Usage of the Optimization API
Figure 18 - Optimization Package Class Diagram
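The contract between the framework and the user-defined code can be sketched as follows. The type names `IHypothesis`, `IHypothesisFactory` and `DataSeriesProvider` are those of the Optimization package, but every method signature below is an assumption made for illustration, as are `createPopulation` and the three toy classes at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Assumed, illustrative signatures: the actual methods may differ.
class DataSeriesProvider {
public:
    virtual ~DataSeriesProvider() = default;
    virtual double valueAt(std::size_t index) const = 0;
};

class IHypothesis {
public:
    virtual ~IHypothesis() = default;
    virtual double evaluate(const DataSeriesProvider& data) const = 0;
    virtual void mutate() = 0;
};

class IHypothesisFactory {
public:
    virtual ~IHypothesisFactory() = default;
    virtual std::unique_ptr<IHypothesis> createRandom() = 0;
};

// What a GA object could do at start-up, using only the interfaces.
std::vector<std::unique_ptr<IHypothesis>> createPopulation(
        IHypothesisFactory& factory, std::size_t size) {
    assert(size > 0);
    std::vector<std::unique_ptr<IHypothesis>> population;
    for (std::size_t i = 0; i < size; ++i) {
        population.push_back(factory.createRandom());
    }
    return population;
}

// Toy application-layer classes, present only to exercise the contract.
class ConstantSeries : public DataSeriesProvider {
public:
    double valueAt(std::size_t) const override { return 2.0; }
};

class ScaleHypothesis : public IHypothesis {
public:
    explicit ScaleHypothesis(double scale) : scale_(scale) {}
    double evaluate(const DataSeriesProvider& data) const override {
        return scale_ * data.valueAt(0);
    }
    void mutate() override { scale_ += 1.0; }
private:
    double scale_;
};

class ScaleFactory : public IHypothesisFactory {
public:
    // A fixed value instead of a random one keeps the sketch deterministic.
    std::unique_ptr<IHypothesis> createRandom() override {
        return std::unique_ptr<IHypothesis>(new ScaleHypothesis(3.0));
    }
};
```

The point of the design is visible in `createPopulation`: the optimization code never names a concrete hypothesis or data class, so the same code can serve any problem whose model and data follow the interfaces.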
4.4 Application Layer
Having defined the structure of the optimization layer, the following sections present the
implementation of the user-defined model and data time series packages specific to the
problem at hand.
4.4.1 Problem Specific Model
In the previous sections it was established that, in some cases (certain linear combinations),
the sum of the impacts is strongly correlated with the evolution of the index. It was also determined
that, in these cases, the derivative of the impact sum frequently anticipates the evolution of the
prices. Thus, this work intends to discover the linear combinations of variables whose
sum of impacts is highly correlated with the index, in order to forecast the evolution of the prices
and allow the discovery of profitable strategies. Besides macroeconomic variables, it is
also proposed to use moving averages to help choose the right
moments to enter and exit the market. To avoid losses at times when macroeconomic
news is ignored by investors in a climate of fear and uncertainty, the
Volatility Index (VIX) is used. There is also evidence that a MEV’s impact loses intensity over time
and that the latest news has more impact than older news. Therefore, it is proposed to
model the decay of the impact’s intensity. Given that the correlation between the index and the
impacts of the variables varies over time, it was decided to determine which training intervals
are the most appropriate. It is also intended to determine the weight of each factor and to use a
voting system in which investment decisions are taken when a certain confidence threshold
is exceeded. Based on these criteria, it was decided to use hypotheses
whose parameters to be optimized have the structure presented in Figure 19.
[Hypothesis structure: MEV Collection, MEV Impact Sum Weight, MEV Impact Sum Derivative Weight, MA1, MA1 Weight, MA2, MA2 Weight, Decay, VIX Limit, Threshold.]
Figure 19 – Hypothesis Optimization Structure
The description of each parameter to be optimized and the ranges of values that these
parameters can take are presented in Table 17.
Table 17 - Parameters' Description and Ranges of Values
Parameter to be optimized | Description | Range of values
MEV Collection | Combination of different macroeconomic variables. | Any combination of the available variables (Table 13).
Training Window | Number of months considered in the evaluation of the hypothesis during GA optimization. | Specified at start-up; typically used values are 12, 18, 24, 30 and 36.
MA1 | Moving average used by the hypothesis. | Can take any value available in Table 15.
MA2 | Moving average used by the hypothesis. | Can take any value available in Table 15.
Decay | Time that the impact of a MEV takes to decay. | Integer value that can vary between 1 and 8 weeks (expressed in days or in days and hours).
VIX Limit | The limit below which the market is considered to be calm. | Can take the values 20, 25, 30, 35 and 40 for historical reasons.
Threshold | Threshold of confidence that needs to be exceeded to force investment decisions. | Multiple of 0.1 between 0 and 1.
MEV Impact Sum Weight | Importance of the MEV Impact Sum. | Integer from 0 to 10.
MEV Impact Sum Derivative Weight | Importance of the MEV Impact Sum Derivative. | Integer from 0 to 10.
MA1 Weight | Importance of the MA1. | Integer from 0 to 10.
MA2 Weight | Importance of the MA2. | Integer from 0 to 10.
To determine which action should be taken at each moment, it was decided to implement a
weighted voting system in which all decisions rely on the sum of the votes, their
weights and the threshold. First, the Maximum Voting Balance and the Voting
Balance are calculated using (6) and (7). Then the following three rules are used:
.
(6)
(7)
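A weighted voting scheme of this kind can be sketched as follows. The sketch rests on assumptions that are not confirmed by the text: each factor votes +1, -1 or 0; the Voting Balance is taken as the weighted sum of the votes; the Maximum Voting Balance is taken as the sum of the weights; and the three rules are assumed to be "buy above the threshold, sell below its negative, otherwise do nothing". The names `Vote`, `Action` and `decide` are hypothetical, and expressions (6) and (7) may differ from the forms used here.

```cpp
#include <cassert>
#include <vector>

enum class Action { Buy, Sell, Hold };

struct Vote {
    int direction;  // +1 bullish, -1 bearish, 0 abstain (assumed encoding)
    int weight;     // integer weight, e.g. from 0 to 10
};

// Assumed decision rule: compare the balance, as a fraction of its maximum,
// with the confidence threshold.
Action decide(const std::vector<Vote>& votes, double threshold) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    int balance = 0;     // assumed form of the Voting Balance
    int maxBalance = 0;  // assumed form of the Maximum Voting Balance
    for (const Vote& v : votes) {
        assert(v.direction >= -1 && v.direction <= 1 && v.weight >= 0);
        balance += v.direction * v.weight;
        maxBalance += v.weight;
    }
    if (maxBalance == 0) {
        return Action::Hold;  // no factor carries any weight
    }
    const double confidence = static_cast<double>(balance) / maxBalance;
    if (confidence > threshold) return Action::Buy;
    if (confidence < -threshold) return Action::Sell;
    return Action::Hold;
}
```

For example, votes of +1, +1 and -1 with weights 10, 5 and 5 give a balance of 10 out of a maximum of 20, so a threshold of 0.4 is exceeded while a threshold of 0.5 is not.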
In an attempt to model the human interpretation of the news, it was considered that the decay of
the impact of macroeconomic news and the way in which the impact is accounted for can take
different forms. Therefore, two additional parameters were defined, namely Decay Type and
Contribution Type. The ranges of values that these can take are shown in
Table 18.
Table 18 - Hypothesis' additional Parameters
Additional Parameter | Range of Values
Decay Type | exponential, simple, none
Contribution Type | unit, linear
Given that investors frequently qualify the news simply as good or bad, without associating
an intensity from some range of values, it was considered interesting to
analyse the forecasting capabilities of the hypotheses when each variable can
contribute +1 or -1 to the impact sum (unit contribution). In the case of linear
contribution, the MEV impact sum is calculated in the simple way. Besides
this, it is considered that the decay of the impact of the news can take three forms: exponential,
simple and none. In the case of exponential decay, the impact of each variable is calculated
using (8), i.e., the decay corresponds to the time necessary to reduce the
impact of the variable to 36.79% of its initial value. When the decay type is simple, the variable’s
contribution is valid and is accounted for until a certain time limit (the decay) has elapsed since its
publication. It should be noted that the MEV Sum Derivative Weight is set to 0 (zero) when the
Decay Type is exponential or simple, because in these situations there will always be a negative
variation due to the decay. As the name suggests, when the decay type is none
it is considered that there is no decay of the impact.
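$I(t) = I_0 \, e^{-(t - t_0)/\tau}$ (8)
where $I_0$ is the initial impact of the variable, $t_0$ is its publication time and $\tau$ is the Decay parameter, so that $I(t_0 + \tau) = I_0 e^{-1} \approx 0.3679\, I_0$.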
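The three decay types can be sketched in a single function. The enumeration and function names are hypothetical; the exponential case assumes the usual form in which the Decay parameter is the time constant, which is what the 36.79% (that is, $e^{-1}$) figure implies.

```cpp
#include <cassert>
#include <cmath>

enum class DecayType { Exponential, Simple, None };

// Remaining impact of a news item `elapsed` time units after its publication.
// `decay` is expressed in the same time unit as `elapsed`.
double decayedImpact(double initialImpact, double elapsed, double decay,
                     DecayType type) {
    assert(elapsed >= 0.0 && decay > 0.0);
    switch (type) {
        case DecayType::Exponential:
            // After exactly `decay` time units the impact is e^-1 of its initial value.
            return initialImpact * std::exp(-elapsed / decay);
        case DecayType::Simple:
            // The full impact counts until the decay time has elapsed, then nothing.
            return elapsed <= decay ? initialImpact : 0.0;
        case DecayType::None:
            return initialImpact;
    }
    return initialImpact;  // not reached; keeps compilers quiet
}
```

With a decay of 7 days, an impact of 1.0 is worth about 0.368 after 7 days under exponential decay, 1.0 under simple decay, and 0 under simple decay from the eighth day onwards.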
All scenarios resulting from the formulation of the problem will be analyzed in the next chapter.
The scenario to be optimized and simulated by the application is specified before the start of the
application in the configuration file, whose format is described at the end of this chapter.
4.4.1.1 Evaluation Function
Besides obtaining profitable strategies, this work also aims to minimize risk. The
temporal distribution of profitability is very important in measuring risk, and a sequence
of negative returns, besides leading to large losses, can act as a very negative psychological
factor. Three main measures are used to evaluate the investment strategies in this work,
namely the Profitability Index (PI), also known as the profit investment ratio (PIR), the Return on
Investment (ROI) and the Maximum Drawdown (MDD). The first two measures are used to determine
whether the cash flow stream over the holding period is higher or lower than the acquisition or
investment cost, and whether or not the investment strategy meets the return objectives. According
to [31], the Drawdown is the percentage loss that occurs from the peak of the price to its lowest (or
highest, in the case of selling) subsequent value, and it is frequently used to determine an
investment's financial risk. The Maximum Drawdown is the largest single drop from peak to bottom; it
shows how sustained one’s losses can be and is frequently used as a risk measure by money management
professionals. According to [32], it is also possible to define the Calmar Ratio (CR), a function
that uses both measures (return and drawdown) simultaneously and is frequently used to evaluate
Trading Advisors and hedge funds. The Calmar Ratio is an important statistic used in the
investment area to measure return (potential opportunity’s gain) vs. drawdown risk (potential
opportunity’s loss).
The Profitability Index (PI), Return on Investment (ROI), Maximum Drawdown (MDD) and Calmar
Ratio (CR) are calculated using (9), (10), (11) and (12) respectively, for a certain continuous
period T, with P(t) representing the price at time t.
(9)
(10)
(11)
(12)
However, when the investment is made intermittently (i.e. not always being in the
market), in consecutive time intervals and with reinvestment, the calculation of the profitability
measures can become complex. Thus, this work uses the characteristic of the PI shown in
(13). In these circumstances, the Calmar Ratio can be calculated using (14) and (15).
(13)
(14)
(15)
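The measures above can be sketched in C++ as follows. The sketch assumes the common textbook forms: PI as the ratio of final to initial price, ROI as PI minus one, MDD as the largest relative drop from a running peak for a long position, CR as ROI divided by MDD, and the PI of consecutive reinvested trades as the product of the individual PIs. The thesis's own expressions (9) to (15) may differ from these, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed form: final price divided by initial price over the period.
double profitabilityIndex(const std::vector<double>& prices) {
    assert(prices.size() >= 2 && prices.front() > 0.0);
    return prices.back() / prices.front();
}

// Assumed form: relative gain over the period.
double returnOnInvestment(const std::vector<double>& prices) {
    return profitabilityIndex(prices) - 1.0;
}

// Largest peak-to-trough loss, as a fraction of the peak, for a long position.
double maxDrawdown(const std::vector<double>& prices) {
    assert(!prices.empty() && prices.front() > 0.0);
    double peak = prices.front();
    double worst = 0.0;
    for (double p : prices) {
        peak = std::max(peak, p);
        worst = std::max(worst, (peak - p) / peak);
    }
    return worst;
}

// Assumed property: with reinvestment, the PI of a sequence of trades is the
// product of the per-trade PIs.
double compoundPI(const std::vector<double>& tradePIs) {
    double pi = 1.0;
    for (double x : tradePIs) pi *= x;
    return pi;
}

// Return per unit of drawdown risk.
double calmarRatio(double roi, double mdd) {
    assert(mdd > 0.0);
    return roi / mdd;
}
```

For the price path 100, 120, 90, 110 the sketch gives a PI of 1.1, an ROI of 10% and an MDD of 25% (the drop from 120 to 90), hence a Calmar Ratio of 0.4.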
In order to allow the evaluation of the strategies based on Profitability only and on Profitability and
Drawdown, it was decided to implement the following evaluation functions:
(16)
(17)
(18)
4.4.1.2 Crossover Operation
In this work, for the reasons already mentioned in 4.3.4, Uniform Crossover is used to
perform the sexual recombination. The Uniform Crossover applied to this specific problem is
illustrated in Figure 20. When the MEV Collections of the two parent hypotheses do
not have the same size, the MEV Collections of the resulting hypotheses (offspring) have
the mean of the parents’ sizes. For instance, if the two selected parents have 8 and 4 MEVs
respectively and there are 12 different variables, each offspring will have 6 MEVs. The
probability of choosing crossover as the genetic operation to be applied is specified before the
start of the application in the configuration file, whose format is described at the end of this
chapter.
[Two original hypotheses and the two resulting hypotheses, each with the genes MEV Collection, MEV Impact Sum Weight, MEV Impact Sum Derivative Weight, MA1, MA1 Weight, MA2, MA2 Weight, Decay, VIX Limit and Threshold.]
Figure 20 - Crossover Operation
4.4.1.3 Mutation Operation
In this work, different types of mutation are possible. In this operation, the chromosomes’ genes are
replaced by randomly generated genes (Random Resetting) or undergo small random
variations (Creep Mutation). The probability of choosing mutation as the genetic operation to be
applied is specified before the start of the application in the configuration file, whose format is
described at the end of this chapter. The main complexity in this work arises from the huge
amount of macroeconomic data. For this reason, the probability of choosing an internal
point within the parent to be mutated is not uniform: the MEV Collection has a higher
probability of mutation than the other genes. When there is no decay, the MEV Collection is
mutated 75% of the time, while the other genes are mutated 25% of the time, uniformly (all of
these genes have the same probability of being mutated; these probabilities were estimated
experimentally). When decay is considered, the probability of mutating the
decay gene is three times higher than that of the remaining genes. The
mutation performed on each gene is described in detail in Table 19.
Table 19 - Mutation Operation
Parameter to be mutated | Mutation type | Description
MEV Collection | Creep Mutation. | Random variable is added, removed or changed.
MA1 | Random Resetting. | Randomly chosen value from those available in Table 15.
MA2 | Random Resetting. | Randomly chosen value from those available in Table 15.
Decay | Random Resetting. | Randomly generated float value that can vary between 1 and 8 weeks (expressed in days and hours).
VIX Limit | Random Resetting. | Randomly chosen value from: 20, 25, 30, 35 and 40.
Threshold | Random Resetting. | Randomly generated multiple of 0.1, range specified in input file.
MEV Impact Sum Weight | Random Resetting. | Incremented, decremented or randomly generated integer, range specified in input file.
MEV Impact Sum Derivative Weight | Random Resetting. | Incremented, decremented or randomly generated integer, range specified in input file.
MA1 Weight | Random Resetting. | Incremented, decremented or randomly generated integer, range specified in input file.
MA2 Weight | Random Resetting. | Incremented, decremented or randomly generated integer, range specified in input file.
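The non-uniform choice of the gene to be mutated can be sketched with a weighted random draw. The sketch makes an assumption where the text is ambiguous: when decay is considered, the MEV Collection is assumed to keep its 75% share and the Decay gene is assumed to receive three times the share of each of the other genes within the remaining 25%. The enumeration and function names are hypothetical, and the standard library generator `std::mt19937` stands in for the MTRand class.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

enum Gene { MevCollection = 0, Ma1, Ma2, VixLimit, Threshold, MevSumWeight,
            MevDerivativeWeight, Ma1Weight, Ma2Weight, Decay, GeneCount };

// Probability of each gene being the one that is mutated.
std::vector<double> mutationWeights(bool decayEnabled) {
    // Relative shares of the 25% left for the genes other than the MEV Collection.
    std::vector<double> units(GeneCount, 1.0);
    units[MevCollection] = 0.0;
    units[Decay] = decayEnabled ? 3.0 : 0.0;  // assumed: three shares for Decay
    double totalUnits = 0.0;
    for (double u : units) totalUnits += u;

    std::vector<double> w(GeneCount, 0.0);
    w[MevCollection] = 0.75;
    for (std::size_t g = 1; g < w.size(); ++g) {
        w[g] = 0.25 * units[g] / totalUnits;
    }
    return w;
}

// Draws the gene to mutate according to the weights above.
Gene pickGeneToMutate(bool decayEnabled, std::mt19937& rng) {
    const std::vector<double> w = mutationWeights(decayEnabled);
    std::discrete_distribution<int> dist(w.begin(), w.end());
    const int g = dist(rng);
    assert(g >= 0 && g < GeneCount);
    return static_cast<Gene>(g);
}
```

Without decay, each of the eight remaining genes is chosen with probability 0.25 / 8 = 0.03125; with decay, under the stated assumption, the Decay gene is chosen with probability 0.75 / 11 and each other gene with probability 0.25 / 11.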
4.4.1.4 Model Package Class Diagram
The Model Package Class Diagram is presented in Figure 21. The package is divided into two classes
and two enumerations, which are listed and described below. It should be noted that, because of
the dimensions of the class diagram, most of the methods and attributes of the classes of this
package were omitted.
MEV_GA_Hypothesis: the application layer class that implements the IHypothesis
interface. It represents a concrete hypothesis specific to this problem and implements the
methods used by a GA to reproduce, mutate, evaluate and select the hypothesis
during the optimization process of this specific problem.
MEV_GA_HypothesisFactory: the application layer class that implements the
IHypothesisFactory interface, which is used by a GA to create the initial random population of
hypotheses specific to this problem. It is also used by this problem’s hypothesis during
the genetic operations (crossover and mutation) to generate random genes.
MEVContributionTypeEnum and MEVDecayTypeEnum: the enumerations used to define
one of the specific scenarios described at the beginning of this section (4.4.1).
The random generator used inside the MEV_GA_Hypothesis and MEV_GA_HypothesisFactory
classes is the MTRand class specified in the Optimization package.
Figure 21 - Model Package Class Diagram
4.4.2 Problem Specific Data Time Series
Because this problem involves a variety of data types, the data time series were modelled with
particular care to facilitate loading, storage and access. All the data records are classified
as events that share common characteristics such as the date and time and the name. Specific
data time series were then modelled for Technical Events (MAs and VIX), index data (S&P500
Futures) and Macroeconomic Data (all the MEVs). The events’ modelling Class Diagram is
presented in Figure 22. Because of the dimensions of this package, most of the classes are
omitted.
Internally, the data is stored in maps (associative containers that store elements formed by
the combination of a key value and a mapped value), where all the data events are indexed by
date/time and name, making access to the data as simple and intuitive as in the case of
vectors. The maps use balanced search trees (red–black trees), in which search, insert and
delete operations have O(log n) time complexity.
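The idea of indexing events by name and date/time can be sketched as follows. This is an illustration only, not the application's container: the EventSeries class and its method names are hypothetical, and because Python has no built-in red–black tree the sketch uses a sorted list with binary search, which gives O(log n) search but O(n) insertion.

```python
import bisect
from datetime import datetime


class EventSeries:
    """Events indexed by name and timestamp (hypothetical sketch)."""

    def __init__(self):
        self._times = {}   # name -> sorted list of timestamps
        self._values = {}  # name -> values aligned with self._times[name]

    def insert(self, name, when, value):
        times = self._times.setdefault(name, [])
        values = self._values.setdefault(name, [])
        i = bisect.bisect_left(times, when)
        if i < len(times) and times[i] == when:
            values[i] = value          # overwrite an existing event
        else:
            times.insert(i, when)      # keep the timestamps sorted
            values.insert(i, value)

    def at_or_before(self, name, when):
        """Most recent value at or before `when`, or None if there is none."""
        times = self._times.get(name, [])
        i = bisect.bisect_right(times, when)  # binary search, O(log n)
        return None if i == 0 else self._values[name][i - 1]


# Usage with made-up values:
series = EventSeries()
series.insert("VIX", datetime(2010, 1, 5), 19.0)
series.insert("VIX", datetime(2010, 1, 4), 20.0)
latest = series.at_or_before("VIX", datetime(2010, 1, 4, 12, 0))
```

Looking up the most recent event at or before a given instant is the access pattern a simulation needs when it steps through time, which is why an ordered index is convenient here.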
Figure 22 - Events Modeling Class Diagram
CHAPTER 5 Results
The goal of this chapter is to present and analyse the results of the optimisation and
simulation of the developed solution in the different scenarios described in the previous
chapter. The experiments are designed to discover the solutions with the best predictive
capabilities, and the analysis of the results is intended to support future work focused on
macroeconomic factors. To assess the importance of macroeconomic factors, the first
simulations focus only on the moving averages (MAs) and volatility (VIX), enabling later
comparison with the case studies where Macroeconomic Indicators are used. Subsequently,
simulations are performed using all the macroeconomic variables with different types of decay
and contribution. At the end of the chapter, the tests focus on restricted parameters in order
to discover strategies with higher profitability. Throughout the tests, several scenarios are
eliminated based on the results, and the best results are analysed in more detail in order to
discover more profitable models. The tests use the parameters presented in Table 20.
Table 20 - Case Study I Constant Parameters
Input Parameter Value
Mutation Rate 0.5
Crossover Rate 0.95
Population Size 100
Start Training Date 2007/01/01
End Training Date 2010/01/01
Start Investment Date 2010/01/01
End Investment Date 2011/09/01
Generations 200
Generations Same Fitness 50
Number Of Runs 10
The results are analysed mainly through the Profitability Index (PI) and its Maximum Drawdown
(MDD). To give a better understanding of the algorithm’s behaviour, the Maximum PI and
Minimum PI observed during the investment period are also presented. The number of
transactions is presented to show how frequently investment decisions are made. A transaction
cost of 0.1% is associated with each transaction, i.e., with each buy or sell order. In each
case study the results of the discovered strategies are compared with the benchmark, i.e., the
Buy and Hold (B&H) strategy. To offer a normalized measure of profitability, the Annualized
Profitability Index is calculated for each discovered strategy and for the B&H strategy.
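The annualized values reported in the tables of this chapter (PI/year and B&H/year) are consistent with compounding the final PI over the 20-month investment period, and the reported MDD% values are consistent with dividing the drawdown by the PI peak that precedes it. The following Python sketch reproduces these two measures. Both formulas are inferred from the reported numbers rather than quoted from a definition, and the function names are hypothetical.

```python
def annualized_pi(final_pi, months):
    """Compound the final PI to a 12-month equivalent (inferred formula)."""
    return final_pi ** (12.0 / months)


def max_drawdown(pi_series):
    """Largest drop from a running peak, absolute and relative to that peak."""
    peak = pi_series[0]
    worst_drop, worst_peak = 0.0, peak
    for value in pi_series:
        peak = max(peak, value)
        drop = peak - value
        if drop > worst_drop:
            worst_drop, worst_peak = drop, peak
    return worst_drop, worst_drop / worst_peak


# B&H over the 20-month investment period: final PI 1.0753
bh_per_year = annualized_pi(1.0753, 20)   # approximately 1.0445
```

With a final PI of 0.9162 over 20 months the same function gives approximately 0.9488, the PI/year value reported for Case Study I.I.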
5.1 Case Study I – MAs, VIX and all MEVs
In this case study several tests are performed using only MAs and VIX, and using MAs, VIX and
all the Macroeconomic Indicators simultaneously. The sub-sections of this case study cover all
the evaluation functions and the different decay types and contribution types described in
4.4.1, in order to allow subsequent, more restricted case studies. The training covers 3 years,
between 2007/01 and 2010/01, and the testing covers 20 months, between 2010/01 and 2011/09.
5.1.1 Case Study I.I – MAs and VIX
This case study focuses on MAs and VIX only. The simulation is performed using the input
parameters presented in Table 21 and the different evaluation functions. The results presented
in Table 22, Table 23 and Table 24 show that the solutions found are always the same, due to
the limited search space (MAs of 100 and 200 days and a VIX Limit of 20). In all cases the
solutions have the same form, differing only in the MAs used when the VIX is higher than the
established limit. The best strategy (PI equal to 0.9162) is discovered using the PI fitness
function, but even so the result is much worse than that of B&H, whose final PI equals 1.0753
(1.0445 annualized). No conclusions can be drawn from these results about which evaluation
function is best.
Table 21 – Application’s Parameters Case Study I.I
Input Parameter Value
Minimum MEV number 0
Maximum MEV number 0
Threshold Limits 0.5
MEV Sum. Weights 0
MEV Sum. Derivative Weights 0
MA Weights 10
Table 22 - Case Study I.I PI Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,9162 1,0874 0,1713 0,1575 26 0,9162 1,0753 0,9488 1,0445
Maximum 0,9162 1,0874 0,1713 0,1575 26 0,9162 1,0753 0,9488 1,0445
Average 0,9162 1,0874 0,1713 0,1575 26 0,9162 1,0753 0,9488 1,0445
Table 23 - Case Study I.I PIMDD Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,9054 1,0874 0,1821 0,1674 26 0,9054 1,0753 0,9421 1,0445
Maximum 0,9054 1,0874 0,1821 0,1674 26 0,9054 1,0753 0,9421 1,0445
Average 0,9054 1,0874 0,1821 0,1674 26 0,9054 1,0753 0,9421 1,0445
Table 24 - Case Study I.I CR Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,9054 1,0874 0,1821 0,1674 26 0,9054 1,0753 0,9421 1,0445
Maximum 0,9054 1,0874 0,1821 0,1674 26 0,9054 1,0753 0,9421 1,0445
Average 0,9054 1,0874 0,1821 0,1674 26 0,9054 1,0753 0,9421 1,0445
5.1.2 Case Study I.II – MAs, VIX and all MEVs with Linear Contribution
This case study focuses on MAs, VIX and all the Macroeconomic Indicators. The simulation is
performed using the input parameters presented in Table 25, which remain constant in the
following five case studies. The results of the simulation performed using these parameters,
linear contribution and the different evaluation functions are presented in Table 26, Table 27
and Table 28.
Table 25 – Application’s Parameters Case Studies I.II – I.VII
Input Parameter Value
Minimum MEV number 50
Maximum MEV number 50
Threshold Limits 0 to 1.0, multiples of 0.1
MEV Sum. Weights 0 to 10
MEV Sum. Derivative Weights 0 to 10
MA Weights 0 to 10
Table 26 - Case Study I.II PI Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7926 1,0693 0,1694 0,1599 24 0,8916 1,0753 0,9335 1,0445
Maximum 0,9047 1,1281 0,2346 0,2284 105 1,0338 1,0753 1,0202 1,0445
Average 0,851 1,09656 0,19035 0,18307 41,8 0,97963 1,0753 0,98737 1,0445
Table 27 - Case Study I.II PIMDD Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7926 1,0617 0,1694 0,1614 24 0,973 1,0753 0,9837 1,0445
Maximum 0,8597 1,1281 0,2346 0,2284 38 1,0338 1,0753 1,0202 1,0445
Average 0,81887 1,09062 0,20871 0,20265 28,4 0,99946 1,0753 0,99963 1,0445
Table 28 - Case Study I.II CR Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7926 1,0767 0,1694 0,1649 24 0,9867 1,0753 0,992 1,0445
Maximum 0,8578 1,1281 0,2346 0,2284 32 1,0508 1,0753 1,0302 1,0445
Average 0,84416 1,11703 0,18305 0,17819 25,8 1,02608 1,0753 1,01556 1,0445
This case study shows significant improvements over the case study where only MAs and VIX are
used. With every evaluation function the average profitability is higher than when only MAs
and VIX are used, which leads to the conclusion that it is possible to obtain better results
using Macroeconomic Indicators and Technical Indicators simultaneously. It is also possible to
conclude that the PI evaluation function leads to worse results than the PIMDD and CR
evaluation functions. The best results are discovered using the CR evaluation function, where
in most cases it is possible to have no losses, returns close to B&H and a smaller number of
transactions. For these reasons, the following case studies focus on the two best evaluation
functions (PIMDD and CR). The best strategies discovered in this case study use 100 and 200
days MAs, use only the derivative of the MEV impact sum with equivalent weights, and use a high
VIX Limit of 40. The use of the Macroeconomic Indicators also leads to an increase in the
number of transactions and in profitability.
5.1.3 Case Study I.III – MAs, VIX and all MEVs with Linear Contribution and
Simple Decay
In this case study the simulation is performed using the input parameters presented in Table 25,
and the MEVs’ impacts are considered to undergo the simple decay described in 4.4.1, i.e., the
impact of each Macroeconomic Variable is only considered during a certain interval of time after
its release. The simulation uses the two best evaluation functions discovered in the previous
sections (PIMDD and CR). The results of the simulation are presented in Table 29 and Table 30.
Table 29 - Case Study I.III PIMDD Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
1 0,9289 1,3734 0,2076 0,167 77 1,2587 1,0753 1,148 1,0445
2 0,9289 1,3979 0,1891 0,1521 77 1,2811 1,0753 1,1602 1,0445
3 0,9289 1,3734 0,2076 0,167 77 1,2587 1,0753 1,148 1,0445
4 0,9289 1,3866 0,1842 0,1491 80 1,2707 1,0753 1,1546 1,0445
5 0,9289 1,3979 0,1891 0,1521 77 1,2811 1,0753 1,1602 1,0445
6 0,9289 1,3734 0,2076 0,167 77 1,2587 1,0753 1,148 1,0445
7 0,9289 1,3408 0,2027 0,167 80 1,2288 1,0753 1,1316 1,0445
8 0,9289 1,3734 0,2076 0,167 77 1,2587 1,0753 1,148 1,0445
9 0,9289 1,3734 0,2076 0,167 77 1,2587 1,0753 1,148 1,0445
10 0,9289 1,3734 0,2076 0,167 77 1,2587 1,0753 1,148 1,0445
Minimum 0,9289 1,3408 0,1842 0,1491 77 1,2288 1,0753 1,1316 1,0445
Maximum 0,9289 1,3979 0,2076 0,167 80 1,2811 1,0753 1,1602 1,0445
Average 0,9289 1,37636 0,20107 0,16223 77,6 1,26139 1,0753 1,14946 1,0445
Table 30 - Case Study I.III CR Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,9289 1,2284 0,2027 0,167 48 1,1154 1,0753 1,0677 1,0445
Maximum 0,9715 1,3734 0,2443 0,2059 80 1,2587 1,0753 1,148 1,0445
Average 0,94344 1,29806 0,22656 0,18376 63,9 1,18814 1,0753 1,10875 1,0445
The results obtained in this case study are very promising. In all the runs and with both
evaluation functions the algorithm outperforms the Buy and Hold strategy. Since the results
are so positive, the simulation was also performed using the PI evaluation function, whose
results are presented in Table 31; however, this evaluation function again shows the worst
performance compared with the PIMDD and CR evaluation functions.
Table 31 - Case Study I.III PI Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,9117 1,1954 0,1847 0,1641 68 1,0809 1,0753 1,0478 1,0445
Maximum 0,9117 1,2728 0,2417 0,2022 80 1,1664 1,0753 1,0968 1,0445
Average 0,9117 1,24996 0,1922 0,16791 77,6 1,144 1,0753 1,08405 1,0445
Some very interesting conclusions can be drawn from this case study, especially for the PIMDD
evaluation function, which shows the best performance. In this case 4 different solutions were
found. Their profitability behaviour during training and test, and their comparison with B&H,
are presented in Figure 23 and Figure 24 respectively.
Figure 23 - Case Study I.III Best Strategies’ Profitability
[Plot: Profitability Index of strategies PI1 to PI4 versus date, 2007 to 2011]
Figure 24 - Case Study I.III Best Solutions vs. Buy and Hold
[Plot: Profitability Index of strategies PI1 to PI4 and of B&H]
It can be concluded that the best four strategies discovered in this case study behave very
similarly in the training and testing periods. The analysis of the best solutions reveals that
all the decisions are taken based only on the Macroeconomic Indicators’ impact measure and
that the MAs (more precisely MA1) are only used to get out of a short position when the
volatility is too high (higher than the limit established during the optimization process). The
only difference between the strategies is the decay considered by each strategy, which varies
between 18 and 20 days (Table 32). All the strategies take the investment decisions based on
the MEVs’ impact and use the 50 days MA to get out of short positions when the VIX is high.
The value of the decay suggests that the MEVs’ impacts are very useful in short-term
investment and that investors do not take older news into consideration. The behaviour of the
best strategy discovered in this case study is illustrated in Figure 25, which presents the
S&P500 Futures’ price evolution, the 50 days MA, the Macroeconomic Impact sum, the operation
(1 is long, -1 is short, 0 is out), the Volatility Index, the Volatility Index Limit, and the
Profitability Index of the employed strategy. When the impact sum is higher than 0, the market
is classified as “Bull” and the strategy goes long; when the impact sum is lower than 0, the
market is classified as “Bear” and the strategy sells short (goes short). When the VIX gets
higher than the limit, the strategy gets out of the market, closing a long position immediately
or using MA1 to get out of a short position. In order to improve the strategy discovery
process, the behaviour of the strategy was analysed in detail, as can also be observed in
Figure 25. The investment period is divided into green and red regions, corresponding to the
regions where the strategy had high and low hit rates respectively. The yellow region
corresponds to a very volatile market, where the strategy indicates that no investment should
be made or that the short position must be closed as soon as possible using MA1. Special
attention is given to regions 1 and 2, marked with circles in the top part of Figure 25.
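The decision rule described above can be summarised in a short Python sketch. It is a simplified illustration, not the application's code: the function name is hypothetical, the condition used to leave a short position through MA1 (price above MA1) is an assumption, and keeping the current position when the impact sum is exactly 0 is also an assumption.

```python
def decide(impact_sum, vix, vix_limit, position, price, ma1):
    """Return the target operation: 1 is long, -1 is short, 0 is out."""
    if vix > vix_limit:
        if position == -1:
            # Assumption: leave the short position once the price is above MA1.
            return 0 if price > ma1 else -1
        # A long position is closed immediately; when out, stay out.
        return 0
    if impact_sum > 0:
        return 1   # market classified as "Bull": go long
    if impact_sum < 0:
        return -1  # market classified as "Bear": go short
    return position  # assumption: no change when the impact sum is exactly 0


# Calm market, positive impact sum: the strategy goes long.
operation = decide(impact_sum=2.5, vix=22.0, vix_limit=40, position=0,
                   price=1100.0, ma1=1090.0)
```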
Table 32 - Case Study I.III Best Solutions’ Decay
Profitability Index Decay
1,2288 19 days 8 hours
1,2587 19 days 3 hours
1,2707 18 days 12 hours
1,2811 19 days
In the first region and at the beginning of the second, a “Bear Market” is detected and a
decision is made to enter a short position while the market is rising. Such situations could
be avoided with the use of moving averages. In the middle of the second region the impact sum
is close to zero and crosses zero several times, which generates multiple unnecessary
transactions that bring undesired costs. These situations can be avoided by establishing a
confidence limit or by ignoring very small variations, and by using short (10, 20 and 30 days)
Moving Averages.
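The effect of ignoring very small variations of the impact sum can be illustrated with a dead band around zero. The sketch below is hypothetical: the function names and the sample values are made up for illustration, and the dead band is not claimed to be how the Threshold parameter of the application works.

```python
def positions(impact_sums, dead_band=0.0):
    """Position after each observation: 1 long, -1 short, 0 before any signal."""
    position, result = 0, []
    for impact_sum in impact_sums:
        if impact_sum > dead_band:
            position = 1
        elif impact_sum < -dead_band:
            position = -1
        # Inside the dead band the previous position is kept.
        result.append(position)
    return result


def count_changes(sequence):
    """Number of position changes, i.e. points where orders would be placed."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


# Made-up impact sums that hover around zero before turning clearly negative.
sums = [2.0, 0.05, -0.05, 0.04, -0.03, -3.0]
without_band = count_changes(positions(sums))        # 3 changes
with_band = count_changes(positions(sums, 0.1))      # 1 change
```

In this made-up series the dead band removes the two reversals caused by values close to zero, and each avoided reversal also avoids the 0.1% cost charged per order.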
[Plot panels: S&P500 Futures price with MA 50 (regions 1 and 2 marked); MEV Impact Sum; VIX with VIX Limit; Operation; PI]
Figure 25 - Study I.III Best Strategy Decisions Evaluation
5.1.4 Case Study I.IV – MAs, VIX and all MEVs with Linear Contribution and
Exponential Decay
In this case study the simulation is performed using linear contribution and the input
parameters presented in Table 25. The MEVs’ impacts are considered to undergo the exponential
decay described in 4.4.1, i.e., the impact of each Macroeconomic Variable is reduced to 36.79%
of its initial value a certain interval of time after its release. The results of the
simulation are presented in Table 33 and Table 34.
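The two decay types can be compared with a small Python sketch. The value 36.79% corresponds to exp(-1), so the exponential weight below reaches that level exactly when the age of the release equals the decay interval. The function names are hypothetical, and treating the end of the simple decay interval as inclusive is an assumption.

```python
import math


def simple_decay_weight(age_days, decay_days):
    """Full impact inside the decay interval, none afterwards."""
    return 1.0 if 0.0 <= age_days <= decay_days else 0.0


def exponential_decay_weight(age_days, decay_days):
    """Impact falls to exp(-1), about 36.79%, after `decay_days`."""
    if age_days < 0.0:
        return 0.0  # the variable has not been released yet
    return math.exp(-age_days / decay_days)


# Weight of a release that is 19 days old under a 19-day decay.
simple = simple_decay_weight(19.0, 19.0)            # 1.0
exponential = exponential_decay_weight(19.0, 19.0)  # about 0.3679
```

Under the simple decay an older release stops contributing altogether, while under the exponential decay it keeps a small, shrinking contribution.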
Table 33 - Case Study I.IV PIMDD Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,8613 1,1532 0,2409 0,2038 50 0,9047 1,0753 0,9416 1,0445
Maximum 0,9413 1,2322 0,371 0,3011 81 0,9887 1,0753 0,9932 1,0445
Average 0,87457 1,2193 0,34481 0,282 55,1 0,91859 1,0753 0,95021 1,0445
Table 34 - Case Study I.IV CR Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,8853 1,1204 0,1733 0,1547 37 0,9775 1,0753 0,9864 1,0445
Maximum 0,9952 1,1946 0,1969 0,167 81 1,0449 1,0753 1,0267 1,0445
Average 0,89629 1,12782 0,17704 0,15694 41,6 0,99576 1,0753 0,99739 1,0445
Given that the strategies discovered in this case study are not profitable (average loss of
less than 1% in the best case), their detailed analysis is omitted.
5.1.5 Case Study I.V – MAs, VIX and all MEVs with Unit Contribution
This case study focuses on MAs, VIX and all the Macroeconomic Indicators and the simulation
is performed using the input parameters presented in Table 25. The results of the simulation
performed using these parameters, unit contribution and different evaluation functions are
presented in Table 35 and Table 36.
Table 35 - Case Study I.V PIMDD Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7522 1,0272 0,18 0,1721 27 0,8838 1,0753 0,9286 1,0445
Maximum 0,8659 1,1414 0,275 0,2677 38 1,046 1,0753 1,0274 1,0445
Average 0,76407 1,03862 0,265 0,25765 32,7 0,92002 1,0753 0,95094 1,0445
Table 36 – Case Study I.V CR Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7522 1,0272 0,275 0,2677 32 0,8934 1,0753 0,9346 1,0445
Maximum 0,7522 1,0272 0,275 0,2677 32 0,9254 1,0753 0,9545 1,0445
Average 0,7522 1,0272 0,275 0,2677 32 0,9094 1,0753 0,94455 1,0445
Given that the strategies discovered in this case study are not profitable (average loss of
approximately 8%), their detailed analysis is omitted.
5.1.6 Case Study I.VI – MAs, VIX and all MEVs with Unit Contribution and Simple
Decay
In this case study the simulation is performed using the input parameters presented in Table 25
and it is considered that the MEVs’ impacts suffer a simple decay. The results of the simulation
are presented in Table 37 and Table 38.
Table 37 - Case Study I.VI PIMDD Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7692 1,0272 0,1676 0,1631 24 0,8619 1,0753 0,9147 1,0445
Maximum 0,8597 1,1702 0,258 0,2512 41 1,0724 1,0753 1,0428 1,0445
Average 0,78701 1,05145 0,2402 0,23386 27,9 0,9005 1,0753 0,93833 1,0445
Table 38 - Case Study I.VI CR Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,6869 1,0272 0,1828 0,178 29 0,7509 1,0753 0,8421 1,0445
Maximum 0,9525 1,1883 0,3403 0,3313 53 1,0437 1,0753 1,026 1,0445
Average 0,82918 1,06746 0,2267 0,21519 37,6 0,90941 1,0753 0,94352 1,0445
Given that the strategies discovered in this case study are not profitable (average loss of
approximately 9% in the best case), their detailed analysis is omitted.
5.1.7 Case Study I.VII – MAs, VIX and all MEVs with Unit Contribution and
Exponential Decay
In this case study the simulation is performed using unit contribution and the input
parameters presented in Table 25, and the MEVs’ impacts are considered to undergo an
exponential decay. The results of the simulation are presented in Table 39 and Table 40.
Table 39 - Case Study I.VII PIMDD Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7968 1,0272 0,1949 0,1735 44 0,8369 1,0753 0,8987 1,0445
Maximum 0,999 1,2755 0,3162 0,2841 81 1,0662 1,0753 1,0392 1,0445
Average 0,87283 1,14946 0,2703 0,23613 53,1 0,93795 1,0753 0,96162 1,0445
Table 40 - Case Study I.VII CR Evaluation Function
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7839 1,0272 0,1676 0,1614 23 0,8855 1,0753 0,9297 1,0445
Maximum 0,873 1,1267 0,2434 0,2369 44 1,0325 1,0753 1,0194 1,0445
Average 0,82455 1,05858 0,20836 0,2008 31,4 0,94883 1,0753 0,96856 1,0445
Given that the strategies discovered in this case study are not profitable (average loss of
approximately 5% in the best case), their detailed analysis is omitted.
5.1.8 Case Study I.VIII – Case Study I.III with Restricted Parameters
In this case study, based on the analysis made in 5.1.3, the parameters were restricted in
order to force the algorithm to use MAs simultaneously with MEVs. The parameters (Table 41)
were chosen so as to avoid short selling when the market is rising and long orders when the
market is falling. The tests were performed using the PIMDD evaluation function, which showed
the best performance, and the results of the simulation are presented in Table 42. All the
solutions found use the same MAs (MAs of 100 and 200 days), and only very small variations are
observed in the decays, which are always very close to the decays found in 5.1.3. The same
weight is given to all the factors (MA1 Weight, MA2 Weight and MEV Sum Weight) in all the
strategies. The decay values suggest that the investment should be short-term, while the MAs
are too long (usually used in long-term investment). Since the results are very
unsatisfactory, it is confirmed that there are no benefits in using MEVs in long-term
investment.
Table 41 - Application’s Parameters Case Study I.VIII
Input Parameter Value
Minimum MEV number 50
Maximum MEV number 50
Threshold Limits 0.1
MEV Sum. Weights 1 and 2
MEV Sum. Derivative Weights 1 and 2
MA Weights 1 and 2
Table 42 - Case Study I.VIII PIMDD Evaluation Function
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7916 1,0547 0,1723 0,1546 35 0,9397 1,0753 0,9634 1,0445
Maximum 0,8691 1,1145 0,2357 0,2294 47 1,0045 1,0753 1,0027 1,0445
Average 0,84542 1,08245 0,18972 0,17909 42,5 0,98455 1,0753 0,99066 1,0445
Given that the strategies discovered in this case study are not profitable, their detailed
analysis is omitted.
5.1.9 Conclusions
Based on the best results (5.1.2 and 5.1.3) it can be concluded that it is possible to obtain
better results using Macroeconomic Indicators alone, or Macroeconomic Indicators together with
MAs and VIX, than using only MAs and VIX or the B&H strategy. It is also possible to conclude
that the Macroeconomic Indicators’ impact can be successfully used in short-term forecasting
(less than 1 month), despite the fact that macroeconomic analysis is usually considered to
address factors affecting the long-term level.
5.2 Case Study II – MAs, VIX and MEVs’ Optimisation with Linear
Contribution and Simple Decay
This case study builds on the best results of the previous case study, namely section 5.1.3,
where promising results were obtained using mostly MEVs without any optimization of them.
Thus, in an attempt to improve the results, this case study optimizes the MEVs. Since the
impact of macroeconomic variables varies over time, several tests are also performed using a
sliding window optimization. Case study 5.1.3 showed that the MEVs can be successfully used in
short-term investment by considering only the MEVs’ impacts from the last 18-20 days and using
short MAs (less than or equal to 50 days) only to close short positions when the volatility is
too high. For these reasons, in this case study the decay is optimized between 2 and 3 weeks
and the MAs of 100 and 200 days are excluded. Given the number of possible combinations
associated with the optimization of the variables, a larger population is used in this case
study. The constant parameters used by the application are presented in Table 43.
Table 43 - Case Study II Constant Parameters
Input Parameter Value
Mutation Rate 0.5
Crossover Rate 0.95
Population Size 1000
Start Training Date 2007/01/01
End Training Date 2010/01/01
Start Investment Date 2010/01/01
End Investment Date 2011/09/01
Minimum MEV number 1
Maximum MEV number 50
Number Of Runs 10
Generations 200
5.2.1 Case Study II.I - MEVs’ Optimization
In this case study the simulation is performed using linear contribution, simple decay and the
input parameters presented in Table 44. The results of the simulation are presented in
Table 45.
Table 44 - Application’s Parameters Case Study II.I
Input Parameter Value
Threshold Limits 0.1
MEV Sum. Weights 2
MA Weights 0, 1 and 2
Training Period 36 months
Investment Period 20 months
Generations Same Fitness 200
Table 45 - Case Study II.I
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,9583 1,5882 0,1593 0,1003 38 1,4555 1,0753 1,2526 1,0445
Maximum 0,9583 1,5882 0,1593 0,1003 38 1,4555 1,0753 1,2526 1,0445
Average 0,9583 1,5882 0,1593 0,1003 38 1,4555 1,0753 1,2526 1,0445
All the solutions found in this case study are very similar, using 50 days MAs, a VIX Limit of
40 and an 18 days decay. All the strategies also use Gross Domestic Purchases Price Index (MoM,
usually published jointly with Gross Domestic Product Annualized and Real Personal Consumption
Expenditures QoQ), Housing Starts (MoM, usually published jointly with Building Permits), ECB
Interest Rate Decision and Consumer Price Index (MoM, usually published jointly with
Consumer Price Index Ex Food & Energy). The internal state of the parameters of the best
strategy is presented in Table 46. Once again, all the investment decisions are made using only
the Macroeconomic Indicators’ impacts, and the 50 days MA is only used to close short
positions when the volatility is too high. The comparison between the best discovered strategy
and B&H is presented in Figure 26, where it can be seen that the discovered strategy
significantly outperforms the benchmark.
Table 46 - Case Study II.I Best Solution’s Parameters
Investment Period | Macroeconomic Indicators | Decay | MA | VIX Limit
2010/01/01-2011/09/01 | Gross Domestic Purchases Price Index (MoM); Housing Starts (MoM); ECB Interest Rate Decision; Consumer Price Index (MoM) | 18 days | 50 days | 40
Figure 26 - Case Study II.I Best Solution vs. B&H
[Plot: Profitability Index of the Best Strategy and of Buy and Hold]
5.2.2 Case Study II.II - MEVs’ Optimization with Sliding Window
In order to improve the results by using the most recent information, several tests are
performed in this case study assuming that training and investment are made using a sliding
window. In all the tests, the training period corresponds to 75% of data (commonly used in GA).
Since it was not possible to obtain improvements in the results using the MAs in any of the
previous case studies, it is also considered that the investment decisions are made using only
the MEVs’ impact and that the MAs are only used to close short positions when the volatility
is too high. The following sub-sections present the results of the simulations obtained using
the parameters presented in Table 47. The results of the simulations are presented in
Table 48, Table 49 and Table 50.
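The sliding window scheme can be sketched in Python as follows. The sketch assumes that each investment period is preceded by a training window of fixed length and that the window advances by one investment period at a time; the function names are hypothetical and all dates are assumed to fall on the first day of a month.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months (assumes a day that exists in every month)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def sliding_windows(start, end, train_months, invest_months):
    """List of (train_start, train_end, invest_start, invest_end) tuples."""
    windows = []
    invest_start = start
    while invest_start < end:
        # The last investment period is cut at the end of the simulation.
        invest_end = min(add_months(invest_start, invest_months), end)
        train_start = add_months(invest_start, -train_months)
        windows.append((train_start, invest_start, invest_start, invest_end))
        invest_start = invest_end
    return windows


# 3 years of training and 9 months of investment over the investment interval.
windows = sliding_windows(date(2010, 1, 1), date(2011, 9, 1), 36, 9)
```

With these inputs the sketch yields three investment periods: 2010/01/01 to 2010/10/01, 2010/10/01 to 2011/07/01 and 2011/07/01 to 2011/09/01.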
Table 47 - Application’s Parameters Case Study II.II
Input Parameter Value
Threshold Limits 0.1
MEV Sum. Weights 2
MA Weights 0
Training Period 36, 24, 12 months
Investment Periods 9, 6, 3 months
Generations 200
Generations Same Fitness 50
Table 48 - Case Study II.II – 1 year Training 3 months Investment
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7666 1,0083 0,1519 0,1292 58 0,8166 1,0753 0,8855 1,0445
Maximum 0,9335 1,3576 0,2797 0,2568 84 1,2732 1,0753 1,156 1,0445
Average 0,87456 1,15523 0,20795 0,18728 67,1 1,01506 1,0753 1,00676 1,0445
Table 49 - Case Study II.II – 2 years Training 6 months Investment
Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
Minimum 0,7654 0,999 0,0902 0,0879 49 0,8193 1,0753 0,8873 1,0445
Maximum 0,9367 1,4268 0,2407 0,2338 85 1,3052 1,0753 1,1733 1,0445
Average 0,85728 1,14206 0,20501 0,18941 67,1 1,02336 1,0753 1,01133 1,0445
Table 50 - Case Study II.II – 3 years Training 9 months Investment
Run Min.PI Max.PI PI MDD PI MDD% Transactions PI B&H PI/year B&H/year
1 0,9099 1,5081 0,141 0,0935 50 1,3825 1,0753 1,2145 1,0445
2 0,9099 1,5081 0,141 0,0935 50 1,3825 1,0753 1,2145 1,0445
3 0,9099 1,5526 0,2313 0,149 48 1,3212 1,0753 1,1819 1,0445
4 0,907 1,7886 0,2665 0,149 58 1,5221 1,0753 1,2867 1,0445
5 0,9099 1,5526 0,23 0,1481 48 1,3226 1,0753 1,1826 1,0445
6 0,9099 1,5526 0,23 0,1481 48 1,3226 1,0753 1,1826 1,0445
7 0,9099 1,5081 0,141 0,0935 50 1,3825 1,0753 1,2145 1,0445
8 0,8644 1,1885 0,1968 0,1728 64 1,0125 1,0753 1,0075 1,0445
9 0,907 1,8075 0,169 0,0935 54 1,657 1,0753 1,3539 1,0445
10 0,9679 1,7223 0,2566 0,149 54 1,4657 1,0753 1,2578 1,0445
Minimum 0,8644 1,1885 0,141 0,0935 48 1,0125 1,0753 1,0075 1,0445
Maximum 0,9679 1,8075 0,2665 0,1728 64 1,657 1,0753 1,3539 1,0445
Average 0,91057 1,5689 0,20032 0,129 52,4 1,37712 1,0753 1,20965 1,0445
The best results of this case study were obtained using the 3 years training and 9 months
investment periods (the longest periods), suggesting that the algorithm needs larger amounts
of data to detect patterns of behaviour. The evolution of the best strategy over time is
presented in Table 51; the strategy changes several times, with the VIX Limit, equal to 40,
being the only constant parameter.
Table 51 - Case Study II.II Best Solution Evolution
Investment Period | Macroeconomic Indicators | Decay | MA
2010/01/01-2010/10/01 | Consumer Price Index (MoM); Import Price Index (MoM); ECB Interest Rate Decision; Core Personal Consumption Expenditure - Prices Index (YoY); Import Price Index (YoY) | 18 days | 10 days
2010/10/01-2011/07/01 | Consumer Price Index (YoY); ECB Interest Rate Decision; Real Personal Consumption Expenditures (QoQ); Housing Starts (MoM); Gross Domestic Product Annualized; Import Price Index (MoM) | 15 days 2 hours | 10 days
2011/07/01-2011/09/01 | Consumer Price Index (YoY); ECB Interest Rate Decision; Housing Starts (MoM); Gross Domestic Product Annualized | 18 days | 20 days
Figure 27 - Case Study II.II Best Solution vs. B&H
[Plot: Profitability Index of the Best Strategy and of Buy and Hold]
5.2.3 Conclusions
In this case study, in an attempt to improve the results of the previous case studies, the
Macroeconomic Indicators were optimized. During the simulation, solutions were found that
achieved better results using sub-sets of the Macroeconomic Indicators and fewer transactions.
Using a sliding window it was possible to obtain better results, which leads to the conclusion
that the impact of macroeconomic variables varies over time and that it is possible to
discover the most important factors using GA optimization.
5.3 Summary
This section summarizes the best strategies discovered during the optimization tests
performed in this chapter. The key Macroeconomic Indicators that affect the stock price
movements are also listed.
5.3.1 Best Strategies
The best discovered strategies are presented in Figure 28, where it can be seen that all of
them outperform the B&H strategy. The best results were obtained using the MEVs optimization
with sliding window training and investment. The figure shows that all the strategies perform
better in, and anticipate, the Bear Market, and perform less well when the market is rising
(Bull Market).
Figure 28 - Best Strategies vs. B&H
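The MEVs optimization referred to above selects a subset of indicators. One common way to encode such a selection for a GA is a boolean mask with one gene per indicator. The sketch below is illustrative only: the encoding, the operator names and the mutation rate are assumptions, not the representation or operators used in this work, and the indicator list is a small sample taken from Table 51.

```python
import random

INDICATORS = [
    "Consumer Price Index (YoY)",
    "ECB Interest Rate Decision",
    "Housing Starts (MoM)",
    "Gross Domestic Product Annualized",
    "Import Price Index (MoM)",
]

def uniform_crossover(parent_a, parent_b, rng):
    """Take each gene from one of the two parents with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def bit_flip_mutation(mask, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [(not gene) if rng.random() < rate else gene for gene in mask]

def decode(mask, names):
    """List the indicators whose gene is switched on."""
    return [name for name, gene in zip(names, mask) if gene]

rng = random.Random(0)  # fixed seed so the run is repeatable
parent_a = [True, True, False, False, True]
parent_b = [False, True, True, False, True]
child = bit_flip_mutation(uniform_crossover(parent_a, parent_b, rng), rate=0.1, rng=rng)
print(decode(child, INDICATORS))
```

A fitness function would then evaluate the strategy that uses only the decoded indicators; that part depends on the trading simulation and is not sketched here.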
5.3.2 Key Macroeconomic Indicators
This section lists the key Macroeconomic Indicators discovered during the optimization,
each followed by a brief description:
Gross Domestic Product Annualized: shows the monetary value of all the goods,
services and structures produced within a country in a given period of time. It is a gross
measure of market activity because it indicates the pace at which a country's economy
is growing or decreasing.
Real Personal Consumption Expenditures (QoQ): is the average amount of money that
consumers spend in a month on durable goods, consumer products, and services. It is
considered an important indicator of inflation.
[Figure 28: Profitability Index (y-axis, 0.8 to 1.8) of the B&H, All MEVs, MEVs Optimization, and MEVs Optimization and SW strategies]
Gross Domestic Purchases Price Index: gauges the change in the prices of goods and
services. Changes in the GDP price index are followed as an indicator of inflationary
pressure that may anticipate a rise in interest rates.
Consumer Price Index (YoY): is a measure of price movements obtained by comparing
the retail prices of a representative shopping basket of goods and services.
The CPI is a key indicator to measure inflation and changes in purchasing trends.
Consumer Price Index Ex Food & Energy (MoM): is a measure of price movements obtained
by comparing the retail prices of a representative shopping basket of goods
and services. Volatile products such as food and energy are excluded in order to
obtain a more accurate calculation.
Housing Starts (MoM): is an indicator that tracks how many new single-family homes or
buildings were constructed. For the survey each house and each single apartment are
counted as one housing start. The figures include all private and publicly owned units. It
indicates movements of the US housing market.
Building Permits (MoM): shows the number of permits for new construction projects. It
implies the movement of corporate investments (US economic development).
ECB Interest Rate Decision: if the ECB is hawkish about the inflationary outlook of the
economy and raises the interest rates, it is positive, or bullish, for the EUR. Likewise, if the
ECB has a dovish view on the European economy and keeps the ongoing interest rate, or
cuts the interest rate, it is seen as negative, or bearish.
Core Personal Consumption Expenditure - Prices Index: is an average amount of
money that consumers spend in a month. "Core" excludes seasonally volatile products
such as food and energy in order to capture an accurate calculation of the expenditure.
It is a significant indicator of inflation.
Personal Consumption Expenditures: is an indicator that measures the total expenditure
by individuals. The level of spending can be used as an indicator of consumer optimism.
It is also considered a measure of economic growth. Because personal spending
stimulates inflationary pressures, it could lead to a rise in interest rates.
Personal Income: measures the total income received by individuals, from all sources
including wages and salaries, interest, dividends, rent, workers' compensation,
proprietors' earnings, and transfer payments. This figure can provide insight on the US
employment situation.
Import Price Index: reports the changes in the price of products imported into the US.
The higher the cost of imported goods, the stronger the effect they will have on inflation,
resulting in a higher probability of a rate rise.
Export Price Index: reports the changes in the price of U.S. export goods and
services. The U.S. trade represents 20 percent of total world trade. Thus, it is correlated
with the value of the USD and its volatility. A rise in prices is a threat over the mid-term,
as higher prices mean that lower demand is to be expected.
CHAPTER 6 Conclusions and Future Work
This work proposes a tool that uses Macroeconomic Indicators for Stock Market Index
forecasting. To validate the designed application, the obtained strategies were compared
against the B&H and MA based strategies in the period between 2010/01 and 2011/09 using the
S&P500 Index Futures, and they showed better performance than these strategies. The
developed application made an excellent profit in a simulation exercise. The preliminary
results are promising, and much more can be done to improve them. The following sections
focus on the key findings that can be drawn from the results and propose several features
that could improve the current solution.
6.1 Conclusion
In this work, several investment methodologies and computational techniques applied to
stock market forecasting were analysed. Based on the existing solutions and on the analysis of
the problem, a GA based application was developed that, using mainly Macroeconomic
Indicators, made an excellent profit during the simulation (an average of 25% per year). The
obtained results indicate that, using GAs and other Softcomputing methodologies, it is possible
to optimize the existing investment strategies and obtain results that are competitive with the
existing strategies (Buy & Hold, hedge funds).
From the investment point of view, the most important conclusion of this work is that the
impact of Macroeconomic News can be successfully measured using the market volatility
associated with its release, which in this work was measured with the minutely variations
of the S&P500 Index Futures' prices. The impact of the Macroeconomic Indicators, measured
this way, can be successfully used in short-term forecasting, even though Macroeconomic
analysis is usually considered to address factors affecting the long-term level.
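As an illustration of this idea, the sketch below computes one possible volatility-based impact score from minute-by-minute prices around a release. It is not the measure defined in this work: the function name, the use of the sum of absolute relative changes, and the window length are all assumptions made for the example.

```python
def release_impact(prices, release_idx, window=30):
    """Sum of absolute minute-to-minute relative price changes after a release.

    prices: prices sampled once per minute (e.g. index futures).
    release_idx: position of the minute in which the indicator was published.
    window: number of one-minute steps after the release to include
            (an assumed value, chosen only for illustration).
    """
    segment = prices[release_idx : release_idx + window + 1]
    return sum(abs(b - a) / a for a, b in zip(segment, segment[1:]))

# Hypothetical prices: flat before the release, moving afterwards.
prices = [100.0, 100.0, 101.0, 100.0, 100.0]
print(release_impact(prices, release_idx=1, window=2))
```

A release that is followed by no price movement scores zero under this definition, while larger or more frequent moves in the window raise the score.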
6.2 Future Work
Although the application was not a complete success, given that in some cases the
discovered strategies showed some undesired behaviour, the results are satisfactory and
point to new research directions. Based on the proposed approach, several directions can be
followed in order to enhance the current forecasting potential. Taking into consideration the
key findings of the proposed approach and the choices made during this work based on its
goals and inherent constraints, the following possibilities can be explored in order to
improve the results:
Focusing on the best results, test the best discovered configuration with different Indexes,
like the German DAX or the US DJIA;
Focusing on the best results, test models that also include various technical indicators
that have not been tested in this work;
Explore the different ways of modelling the decay of the impact;
Explore different ways of measuring the impact of macroeconomic news based on the
volatility, for instance using the hourly or daily variations;
Explore how to use the volatility in conjunction with the indicator value and the
estimated indicator's value in order to measure the impact;
Perform several tests in order to discover the most appropriate training and testing
periods;
Improve the existing GA by trying additional mutation, crossover and selection
procedures. Analyze the behaviour of the algorithm with other evaluation
functions that have not been used in this work.
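One of the directions listed above is to explore different ways of modelling the decay of the impact. As a starting point, two simple candidate shapes could be compared: a linear decay that reaches zero after a fixed number of days, and an exponential decay with a given half-life. The sketch below is illustrative only. The function names and both formulas are assumptions, and neither is claimed to be the decay model used in this work.

```python
def linear_decay(impact, days_since, decay_days):
    """Impact falls linearly and is zero once `decay_days` have passed."""
    remaining = max(0.0, 1.0 - days_since / decay_days)
    return impact * remaining

def exponential_decay(impact, days_since, half_life):
    """Impact halves every `half_life` days and never reaches exactly zero."""
    return impact * 0.5 ** (days_since / half_life)

# Compare both shapes for an initial impact of 1.0 (arbitrary units).
for day in (0, 6, 12, 18, 24):
    print(day, linear_decay(1.0, day, decay_days=18), exponential_decay(1.0, day, half_life=6))
```

The linear form has a hard cut-off, whereas the exponential form keeps a small residual influence from older releases. Which behaviour is preferable would have to be decided by testing.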
73
References
[1] Fernando Braga de Matos, Ganhar em Bolsa.: Dom Quixote, 2007.
[2] Bernard Baumohl, The Secrets of Economic Indicators: Hidden Clues to Future Economic
Trends and Investment Opportunities, 2nd edition.: Wharton School Publishing, 2007.
[3] Tom M. Mitchell, Machine Learning.: McGraw-Hill Science/Engineering/Math, 1997.
[4] Yang Li, Qing-Guo Wang, Tong Heng Lee Ming Hao Eng, "Forecast Forex with ANN Using
Fundamental Data," in International Conference on Information Management, Innovation
Management and Industrial Engineering, 2008, pp. 279-282.
[5] K. Asakawa T. Kimoto, "Stock Market Prediction System with Modular Neural Networks," in
IJCNN International Joint Conference on Neural Networks, vol. 1, 1990, pp. 1 - 6.
[6] M. Tomassini T. Ankenbrand, "Predicting multivariate financial time series using neural
networks: the Swiss bond case," in Computational Intelligence for Financial Engineering.
Proceedings of the IEEE/IAFE 1996 Conference, 1996, pp. 27-33.
[7] G. Finnie, and C.N.W. Tan B. Vanstone, "Applying Fundamental Analysis and Neural
Networks in the Australian Stockmarket," in International Conference on Artificial
Intelligence in Science and Technology (AISAT 2004), Hobart, Tasmania, 2004.
[8] T.K. Bandopadhyaya, S. Sharma A.U. Khan, "Classification and Identification of Stocks
using SOM and Genetic Algorithm based Backpropagation Neural Network," in Innovations
in Information Technology, 2008. IIT 2008. International Conference, 2008, pp. 292-296.
[9] N. Talaat, S. Shaheen A. Atiya, "An Efficient Stock Market Forecasting Model," in Neural
Networks, IEEE International Conference, vol. 4, 1997, pp. 2112-2115.
[10] You-Shyang Chen Ching-Hsue Cheng, "Fundamental Analysis of Stock Trading Systems
using Classification Techniques," in Machine Learning and Cybernetics, 2007 International
Conference, 2007, pp. 1377-1382.
[11] Afsaneh Ghasemi, Tazehabadi Esmaiel Abounoori, "Forecasting Stock Price Using
Macroeconomic Variables: A Hybrid ARDL, ARIMA and Artificial Neural Network," in ICIFE
'09 Proceedings of the 2009 International Conference on Information and Financial
Engineering, 2009.
[12] S. Arosha, "Automated Neural-ware System for Stock Market Predicition," in Cybernetics
74
and Intelligent Systems, 2004 IEEE Conference, 2004, pp. 1166-1171.
[13] L. C. Lee and C. F. Lee R. J. Kuo, "Integration of Artificial Neutral Networks and Fuzzy
Delphi for Stock Market Forecasting," in Systems, Man, and Cybernetics, 1996., IEEE
International Conference, vol. 2, 1996, pp. 1073-1078.
[14] Chi-Bin Cheng, Chung-Jen Fu Yu-Ru Syau, "A Neuro-Fuzzy Approach for Equity Valuation
Based on Fundamental Analysis," in Fuzzy Information Processing Society, 2006. NAFIPS
2006. Annual meeting of the North American, 2006, pp. 392-396.
[15] P. Cheng, A. Jain C. Quek, "Predicting impact of news on stock price," in An evaluation of
neuro fuzzy systems. IEEE Congress on Evolutionary Computation, 2007, pp. 1226-1233.
[16] Huanhuan Chen, Shouyang Wang Lean Yu and Kin Keung Lai, "Evolving Least Squares
Support Vector Machines for Stock Market Trend Mining," in Evolutionary Computation,
IEEE Transactions, 2009, pp. 87-102.
[17] T. L. Paez, G. Vora S. K. Kassicieh, "Investment decisions using genetic algorithms," in
System Sciences, 1997, Proceedings of the Thirtieth Hawaii International Conference, vol.
5, 1997, pp. 484-490.
[18] Lean Yu, Tao Huang, Shouyang Wang, and Kin Keung Lai Chengxiong Zhou, "Selecting
Valuable Stock Using Genetic Algorithm," in SEAL 2006, LNCS 4247, 2006, pp. 688–694.
[19] Dagang Ke , Yongjun Wang, and Lida Xu Yanxia Jiang, "Using Genetic Algorithms to
Predict Financial Performance," in Systems, Man and Cybernetics, 2007. ISIC. IEEE
International Conference, 7-10 Oct. 2007, pp. 3225-3229.
[20] Łukasz Rachwalski Paweł B. Myszkowski, "Trading rule discovery on Warsaw Stock
Exchange using revolutionary algorithms," in Computer Science and Information
Technology, IMCSIT '09. International Multiconference, 2009, pp. 81 - 88.
[21] C. G. Doherty, "Fundamental analysis using genetic programming for classification rule
induction," in in Proc. Genet. Algorithms Genet. Program. Stanford 2003, Stanford, CA:
Stanford Bookstore, 2003, pp. 45-51.
[22] Wen-Tsao Pan Chih-Hung Wen, "Construct for Investment Strategy Model through Genetic
Programming Planning," in Artificial Intelligence, 2009. JCAI '09. International Joint
Conference, 25-26 April 2009, pp. 252-255.
[23] I. E. Diakouakis and D. M. Emiris D. E. Koulouriotis, "A Fuzzy Cognitive Map-based Stock
Market Model Syntesis, analysys and Experimental Results," in Fuzzy Systems, 2001. The
75
10th IEEE International Conference, vol. 1, 2001, pp. 465-468.
[24] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Second Edition.: Morgan
Kaufmann, Elsevier, 2006.
[25] (2011) FXstreet.com. [Online]. http://www.fxstreet.com/fundamental/economic-calendar/
[26] (1999-2011) finam.ru. [Online]. http://www.finam.ru/analysis/export/default.asp
[27] (2011, Aug.) Yahoo Finance. [Online]. finance.yahoo.com
[28] P. Surekha, T. Hamsapriya S. Sumathi, Evolutionary Intelligence.: Springer, 2008.
[29] A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing, 2nd, Ed.: Springer,
Natural Computing Series, 2007.
[30] V. Salladurai S. Narmadha, "Multi-Product Inventory Optimization using Uniform Crossover
Genetic Algorithm," in (IJCSIS) International Journal of Computer Science and Information
Security, 2010.
[31] Jan Vecer Libor Pospisil, "PDE Methods for the Maximum Drawdown," in Columbia
University, Department of Statistics, New York, USA, April 1, 2008.
[32] Martin Eling Frank Schuhmacher, "Sufficient conditions for expected utility to imply
drawdown-based performance rankings," in Journal of Banking & Finance, vol. 35,
September 2011, pp. 2311-2318, you can find slides here:
http://www.intelligenthedgefundinvesting.com/pubs/rb-mm.pdf.
[33] Ching-Hsue Cheng You-Shyang Chen, "Forecasting Revenue Growth Rate Using
Fundamental Analysis: A Feature," in Fuzzy Systems and Knowledge Discovery, 2007.
FSKD 2007. Fourth International Conference, 2007, pp. 151-155.
76
77
APPENDIX A – Macroeconomic Indicators
Table 52 - EMU Macroeconomic Indicators
Macroeconomic Indicator
Description
Construction Output s.a (MoM)
Released by Eurostat, this report gives the output of the construction industry, in both the private and public sectors. It shows the strength of the construction industry, which, at the same time, hints at the investments made in this sector of the economy.
Consumer Confidence
Released by the European Commission, this is a leading index that measures the level of consumer confidence in economic activity. A high level of consumer confidence stimulates economic expansion, while a low level leads to economic downturn.
Consumer Price Index (MoM)
Released by Eurostat, it captures the changes in the price of goods and services. The CPI is a significant way to measure changes in purchasing trends and inflation in the Euro Zone.
Consumer Price Index (YoY)
Released by Eurostat, it captures the changes in the price of goods and services. The CPI is a significant way to measure changes in purchasing trends and inflation in the Euro Zone.
Consumer Price Index - Core (YoY)
Released by Eurostat, this is a measure of price movements obtained by comparing the retail prices of a representative shopping basket of goods and services, excluding volatile components like food, energy, alcohol and tobacco. The core CPI is a key indicator to measure inflation and changes in purchasing trends.
Current Account n.s.a
Released by the European Central Bank, this is the net flow of current transactions, including goods, services, and interest payments, into and out of the Euro-Zone. A current account surplus indicates that the flow of capital into the Euro-Zone exceeds the flow out of it.
ECB Interest Rate Decision
Announced by the European Central Bank. Usually, if the ECB is hawkish about the inflationary outlook of the economy and raises the interest rates, it is positive, or bullish, for the EUR. Likewise, if the ECB has a dovish view on the European economy and keeps the ongoing interest rate, or cuts the interest rate, it is seen as negative, or bearish.
Employment Change (QoQ)
Released by Eurostat, this is a measure of the change in the number of employed people in the Euro-Zone. Generally speaking, a rise in this indicator has positive implications for consumer spending, which stimulates economic growth.
Employment Change (YoY)
Released by Eurostat, this is a measure of the change in the number of employed people in the Euro-Zone. Generally speaking, a rise in this indicator has positive implications for consumer spending, which stimulates economic growth.
Gross Domestic Product s.a. (QoQ)
Released by Eurostat, this is a measure of the total value of all goods and services produced by the Eurozone. The GDP is considered a broad measure of Eurozone economic activity and health.
Gross Domestic Product s.a. (YoY)
Released by Eurostat, this is a measure of the total value of all goods and services produced by the Eurozone. The GDP is considered a broad measure of Eurozone economic activity and health.
Industrial New Orders (YoY)
Released by Eurostat, it captures the value of new contracts for goods in the manufacturing sector. An increasing number of Industrial New Orders predicts enhanced production and growth in the GDP.
Industrial New Orders s.a. (MoM)
Released by Eurostat, it captures the value of new contracts for goods in the manufacturing sector. An increasing number of Industrial New Orders predicts enhanced production and growth in the GDP.
Industrial Production s.a. (MoM)
Released by Eurostat. It shows the volume of production of industries such as factories and manufacturing. An upward trend is regarded as inflationary, which may anticipate a rise in interest rates.
Industrial Production w.d.a. (YoY)
Released by Eurostat. It shows the volume of production of industries such as factories and manufacturing. An upward trend is regarded as inflationary, which may anticipate a rise in interest rates.
Producer Price Index (MoM)
Released by Eurostat, this is an index that measures the change in prices received by domestic producers of commodities in all stages of processing (crude materials, intermediate materials, and finished goods).
Producer Price Index (YoY)
Released by Eurostat, this is an index that measures the change in prices received by domestic producers of commodities in all stages of processing (crude materials, intermediate materials, and finished goods).
Purchasing Manager Index Manufacturing
Released by Markit Economics, it captures business conditions in the manufacturing sector. As the manufacturing sector accounts for a large part of total GDP, the manufacturing PMI is an important indicator of business conditions and the overall economic condition in the Euro Zone.
Purchasing Manager Index Services
Released by Markit Economics, this is an indicator of the economic situation in the Euro Zone services sector. It captures an overview of the condition of sales and employment. It is worth noting that the European services sector does not influence the GDP, either positively or negatively, as much as the manufacturing PMI does.
Retail Sales (MoM)
Released by Eurostat, this is a measure of changes in sales of the Euro Zone retail sector. It shows the performance of the retail sector in the short term. Percent changes reflect the rate of change of such sales. The changes are widely followed as an indicator of consumer spending.
Retail Sales (YoY)
Released by Eurostat, this is a measure of changes in sales of the Euro Zone retail sector. It shows the performance of the retail sector in the short term. Percent changes reflect the rate of change of such sales. The changes are widely followed as an indicator of consumer spending.
Trade Balance n.s.a.
Released by Eurostat, this is the balance between exports and imports of total goods and services. A positive value shows a trade surplus, while a negative value shows a trade deficit. It is an event that generates some volatility for the EUR.
Trade Balance s.a.
Released by Eurostat, this is the balance between exports and imports of total goods and services. A positive value shows a trade surplus, while a negative value shows a trade deficit. It is an event that generates some volatility for the EUR.
Unemployment Rate
Released by Eurostat, this is the number of unemployed workers divided by the total civilian labor force. It is a leading indicator for the European economy. If the rate is up, it indicates a lack of expansion within the European labor market. As a result, a rise tends to weaken the European economy.
ZEW Survey - Economic Sentiment
Published by the Zentrum für Europäische Wirtschaftsforschung, it measures institutional investor sentiment, reflecting the difference between the share of investors that are optimistic and the share of analysts that are pessimistic. A positive number means that the share of optimists outweighs the share of pessimists.
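Two of the descriptions in Table 52 are stated as simple ratios: the Unemployment Rate and the ZEW balance of optimists over pessimists. A minimal Python sketch of both follows. The function names, the percentage scaling, and the handling of neutral respondents are assumptions made for illustration, not the official methodology of the publishing institutions.

```python
def unemployment_rate(unemployed, labor_force):
    """Unemployed workers as a percentage of the total civilian labor force."""
    return 100.0 * unemployed / labor_force

def sentiment_balance(optimists, pessimists, neutral=0):
    """Share of optimists minus share of pessimists, in percentage points.

    Neutral respondents count toward the total but toward neither share
    (an assumption made for this sketch).
    """
    total = optimists + pessimists + neutral
    return 100.0 * (optimists - pessimists) / total

# Hypothetical survey: 50 optimistic, 30 pessimistic, 20 neutral respondents.
print(sentiment_balance(50, 30, 20))   # positive: optimists outweigh pessimists
print(unemployment_rate(10, 100))
```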
Table 53 - German Macroeconomic Indicators
Macroeconomic Indicator
Description
Consumer Price Index (MoM)
Released by the Statistisches Bundesamt Deutschland, it measures the average price change for all goods and services purchased by households for consumption purposes. CPI is the main indicator to measure inflation and changes in purchasing trends.
Consumer Price Index (YoY)
Released by the Statistisches Bundesamt Deutschland, it measures the average price change for all goods and services purchased by households for consumption purposes. CPI is the main indicator to measure inflation and changes in purchasing trends.
Factory Orders n.s.a. (YoY)
Released by the Bundesministerium für Wirtschaft und Technologie, this is an indicator that includes shipments, inventories, and new and unfilled orders. An increase in the factory order total may indicate an expansion in the German economy and could be an inflationary factor.
Factory Orders s.a. (MoM)
Released by the Bundesministerium für Wirtschaft und Technologie, this is an indicator that includes shipments, inventories, and new and unfilled orders. An increase in the factory order total may indicate an expansion in the German economy and could be an inflationary factor. It is worth noting that German Factory Orders barely influence, either positively or negatively, the total Eurozone GDP.
Gfk Consumer Confidence Survey
The GfK Consumer Confidence is a leading index that measures the level of consumer confidence in economic activity. A high level of consumer confidence stimulates economic expansion, while a low level leads to economic downturn.
Gross Domestic Product n.s.a (YoY)
Released by the Statistisches Bundesamt Deutschland, this is a measure of the total value of all goods and services produced by Germany. The GDP is considered a broad measure of German economic activity and health.
Gross Domestic Product s.a (QoQ)
Released by the Statistisches Bundesamt Deutschland, this is a measure of the total value of all goods and services produced by Germany. The GDP is considered a broad measure of German economic activity and health.
Gross Domestic Product w.d.a (YoY)
Released by the Statistisches Bundesamt Deutschland, this is a measure of the total value of all goods and services produced by Germany. The GDP is considered a broad measure of German economic activity and health.
IFO - Business Climate
Released by the CESifo Group, it is closely watched as an early indicator of current conditions and business expectations in Germany. The Institute surveys more than 7,000 enterprises on their assessment of the business situation and their short-term planning.
IFO - Expectations
Released by the CESifo Group, it is closely watched as an early indicator of current conditions and business expectations for the next six months, where firms rate the future outlook as better, same, or worse.
Industrial Production s.a. (MoM)
Released by the Statistisches Bundesamt Deutschland, it measures the output of German factories and mines. Changes in industrial production are widely followed as a major indicator of strength in the manufacturing sector.
Industrial Production s.a. w.d.a. (YoY)
Released by the Statistisches Bundesamt Deutschland, it measures the output of German factories and mines. Changes in industrial production are widely followed as a major indicator of strength in the manufacturing sector.
Producer Price Index (MoM)
Released by the Statistisches Bundesamt Deutschland, it measures the average changes in prices in the German primary markets. Changes in the PPI are widely followed as an indicator of commodity inflation.
Producer Price Index (YoY)
Released by the Statistisches Bundesamt Deutschland, it measures the average changes in prices in the German primary markets. Changes in the PPI are widely followed as an indicator of commodity inflation.
Purchasing Manager Index Manufacturing
Released by Markit Economics, it captures business conditions in the manufacturing sector. As the manufacturing sector accounts for a large part of total GDP, the manufacturing PMI is an important indicator of business conditions and the overall economic condition in Germany.
Retail Sales (MoM)
Released by the Statistisches Bundesamt Deutschland, this is a measure of changes in sales of the German retail sector. It shows the performance of the retail sector in the short term. Percent changes reflect the rate of change of such sales. The changes are widely followed as an indicator of consumer spending.
Retail Sales (YoY)
Released by the Statistisches Bundesamt Deutschland, this is a measure of changes in sales of the German retail sector. It shows the performance of the retail sector in the short term. Percent changes reflect the rate of change of such sales. The changes are widely followed as an indicator of consumer spending.
Trade Balance
Released by the Statistisches Bundesamt Deutschland, this is the balance between exports and imports of total goods and services. A positive value shows a trade surplus, while a negative value shows a trade deficit. It is an event that generates some volatility for the EUR.
Unemployment Change
Published by the German Statistics Office, this is a measure of the change in the number of unemployed people in Germany. A rise in this indicator has negative implications for consumer spending, which discourages economic growth.
Unemployment Rate s.a.
Published by the German Statistics Office, it shows, on a percentage basis, the number of unemployed people in Germany. A decrease in this indicator has positive implications for consumer spending, which stimulates economic growth.
ZEW Survey - Current Situation
Published by the Zentrum für Europäische Wirtschaftsforschung, it measures institutional investor sentiment, reflecting the difference between the share of investors that are optimistic and the share of analysts that are pessimistic.
ZEW Survey - Economic Sentiment
Published by the Zentrum für Europäische Wirtschaftsforschung, it measures institutional investor sentiment, reflecting the difference between the share of investors that are optimistic and the share of analysts that are pessimistic.
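Many indicators in these tables carry the suffixes MoM, QoQ or YoY, which denote the percentage change against the previous month, the previous quarter or the same period one year earlier. A short Python sketch of that arithmetic is shown below. The function name and the sample values are hypothetical.

```python
def pct_change(series, lag):
    """Percentage change of each value against the value `lag` periods earlier.

    For monthly data, lag=1 gives MoM and lag=12 gives YoY;
    for quarterly data, lag=1 gives QoQ and lag=4 gives YoY.
    """
    return [
        100.0 * (series[i] - series[i - lag]) / series[i - lag]
        for i in range(lag, len(series))
    ]

# Hypothetical monthly index: flat for a year, then one higher reading.
monthly_index = [100.0] * 12 + [105.0]
print(pct_change(monthly_index, lag=1)[-1])   # MoM change of the last month
print(pct_change(monthly_index, lag=12))      # YoY change of the last month
```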
Table 54 - USA Macroeconomic Indicators
Macroeconomic Indicator
Description
ABC Washington Post Consumer Confidence
Released by ABC News and the Washington Post, it captures the level of confidence that individuals have in economic activity, reflecting respondents' evaluations of their personal financial situation. Generally, a high level of consumer confidence stimulates economic expansion, while a low level leads to economic downturn.
ADP Employment Change
Released by Automatic Data Processing, Inc, this is a measure of the change in the number of employed people in the US. Generally speaking, a rise in this indicator has positive implications for consumer spending, which stimulates economic growth.
Average Hourly Earnings (MoM)
Released by the US Department of Labor, this is a significant indicator of labor cost inflation and of the tightness of labor markets. The Federal Reserve Board pays close attention to it when setting interest rates.
Average Hourly Earnings (YoY)
Released by the US Department of Labor, this is a significant indicator of labor cost inflation and of the tightness of labor markets. The Federal Reserve Board pays close attention to it when setting interest rates.
Average Weekly Hours
Released by the US Department of Labor, this is an indicator of labor cost inflation and of the tightness of labor markets. The Federal Reserve Board pays close attention to it when setting interest rates. Excessive volatility is expected.
Building Permits (MoM)
Released by the US Census Bureau, at the Department of Commerce, it shows the number of permits for new construction projects. It implies the movement of corporate investments (US economic development). It tends to cause some volatility in the USD.
Chicago Purchasing Managers' Index
Released by Kingsbury International, it captures business conditions across Illinois, Indiana and Michigan. This index is an indicator of business trends and it is interrelated with the ISM manufacturing Index. It is widely used to indicate the overall economic condition in the US.
Construction Spending (MoM)
Released by the US Census Bureau, this is an indicator that measures the total amount of spending in the US on all types of construction. The residential construction component is useful for predicting future national new home sales and mortgage origination volume.
Consumer Confidence
Released by the Conference Board, it captures the level of confidence that individuals have in economic activity. A high level of consumer confidence stimulates economic expansion, while a low level leads to economic downturn.
Consumer Credit Change
Released by the Board of Governors of the Federal Reserve, this is the amount of money that individuals borrowed. It shows if consumers can afford large expenses, which can fuel economic growth. However, a high figure may also indicate that the economy is overheating, as consumers borrow in order to live beyond their means.
Consumer Price Index (MoM)
Released by the US Department of Labor, this is a measure of price movements obtained by comparing the retail prices of a representative shopping basket of goods and services. The purchasing power of the USD is dragged down by inflation. The CPI is a key indicator to measure inflation and changes in purchasing trends.
Consumer Price Index (YoY)
Released by the US Department of Labor, this is a measure of price movements obtained by comparing the retail prices of a representative shopping basket of goods and services. The purchasing power of the USD is dragged down by inflation. The CPI is a key indicator to measure inflation and changes in purchasing trends.
Consumer Price Index Ex Food & Energy (MoM)
Released by the US Department of Labor, this is a measure of price movements obtained by comparing the retail prices of a representative shopping basket of goods and services. Volatile products such as food and energy are excluded in order to obtain a more accurate calculation.
Consumer Price Index Ex Food & Energy (YoY)
Released by the US Department of Labor, this is a measure of price movements obtained by comparing the retail prices of a representative shopping basket of goods and services. Volatile products such as food and energy are excluded in order to obtain a more accurate calculation.
Continuing Jobless Claims
Released by the US Department of Labor, it measures the number of individuals who are unemployed and are currently receiving unemployment benefits. It reflects the strength of the labor market. A rise in this indicator has negative implications for consumer spending, which discourages economic growth.
Core Personal Consumption Expenditure - Prices Index (MoM)
Released by the US Bureau of Economic Analysis, this is an average amount of money that consumers spend in a month. "Core" excludes seasonally volatile products such as food and energy in order to capture an accurate calculation of the expenditure. It is a significant indicator of inflation.
Core Personal Consumption Expenditure - Prices Index (YoY)
Released by the US Bureau of Economic Analysis, this is an average amount of money that consumers spend in a month. "Core" excludes seasonally volatile products such as food and energy in order to capture an accurate calculation of the expenditure. It is a significant indicator of inflation.
Durable Goods Orders ex Transportation
Released by the US Census Bureau, this indicator measures the value of orders received by manufacturers for durable goods (goods expected to last three years or more), excluding the transportation sector. Because durable goods often involve large investments, they are sensitive to the US economic situation.
Durable Goods Orders
Released by the US Census Bureau, this indicator measures the value of orders received by manufacturers for durable goods (goods expected to last three years or more), such as motor vehicles and appliances. Because durable goods often involve large investments, they are sensitive to the US economic situation.
Existing Home Sales (MoM)
Released by the National Association of Realtors, this indicator provides an estimate of housing market conditions. Because the housing market is considered a sensitive factor for the US economy, it generates some volatility for the USD.
Existing Home Sales
Released by the National Association of Realtors, this indicator provides an estimate of housing market conditions. Because the housing market is considered a sensitive factor for the US economy, it generates some volatility for the USD.
Factory Orders
Released by the US Census Bureau, this indicator measures the total orders of durable and non-durable goods, together with shipments (sales) and inventories at the manufacturing level, which can offer insight into inflation and growth in the manufacturing sector.
Fed Interest Rate Decision
The Board of Governors of the Federal Reserve announces an interest rate. This interest rate affects the whole range of interest rates set by commercial banks, building societies and other institutions for their own savers and borrowers. It also tends to affect the exchange rate.
Gross Domestic Product Annualized
Released by the US Bureau of Economic Analysis, this indicator shows the monetary value of all the goods, services and structures produced within a country in a given period of time. It is a broad measure of market activity because it indicates the pace at which a country's economy is growing or shrinking.
Gross Domestic Purchases Price Index
Released by the Bureau of Economic Analysis, Department of Commerce, this index gauges the change in the prices of goods and services. Changes in the GDP price index are followed as an indicator of inflationary pressure that may foreshadow a rise in interest rates.
Housing Price Index (MoM)
Released by the Office of Federal Housing Enterprise Oversight, this index provides an estimate of housing market conditions. It is an important indicator because the housing market is considered a sensitive factor for the US economy.
Housing Starts (MoM)
Released by the US Census Bureau at the Department of Commerce, this indicator tracks how many new single-family homes or buildings were constructed. For the survey, each house and each single apartment is counted as one housing start. The figures include all privately and publicly owned units. It indicates movements in the US housing market.
IBD TIPP Economic Optimism
Released by Investor's Business Daily (IBD) and the TechnoMetrica Institute of Policy and Politics (TIPP), this index measures consumer sentiment about economic conditions. The report is based on a monthly survey in which about 1000 adults nationwide evaluate their economic outlook for the next six months, their personal financial outlook and their confidence in federal economic policies. If consumers are optimistic, they will purchase more goods and services, which increases domestic demand and stimulates the economy. A reading above 50 indicates optimism; a reading below 50 indicates pessimism.
Import Price Index (MoM)
Released by the US Department of Labor, this index reports changes in the prices of products imported into the US. The higher the cost of imported goods, the stronger their effect on inflation, resulting in a higher probability of a rate rise.
Import Price Index (YoY)
Released by the US Department of Labor, this index reports changes in the prices of products imported into the US. The higher the cost of imported goods, the stronger their effect on inflation, resulting in a higher probability of a rate rise.
Initial Jobless Claims
Released by the US Department of Labor, this indicator measures the number of people filing first-time claims for state unemployment insurance. In other words, it provides a measure of strength in the labor market. A larger than expected number indicates weakness in this market, which influences the strength and direction of the US economy.
ISM Manufacturing
The Institute for Supply Management (ISM) Manufacturing Index shows business conditions in the US manufacturing sector. It is a significant indicator of the overall economic condition in the US.
ISM Non-Manufacturing
Released by the Institute for Supply Management (ISM), this index shows business conditions in the US non-manufacturing sector. It is worth noting that the non-manufacturing sector does not influence the GDP, either positively or negatively, as much as ISM Manufacturing does.
MBA Mortgage Applications
Released by the Mortgage Bankers Association, this indicator reports mortgage application activity. It is considered a leading indicator of the US housing market. Growth in mortgage applications represents a healthy housing market, which stimulates the overall US economy.
Monthly Budget Statement
Released by the Financial Management Service, this statement summarizes the financial activities of federal entities, disbursing officers, and Federal Reserve banks.
NAHB Housing Market Index
Released by the National Association of Home Builders, this index presents home sales and expected home building in the future, indicating the housing market trend in the United States. The growth rate of the housing market affects USD volatility.
Net Long-term TIC Flows
Released by the US Department of the Treasury. TIC stands for Treasury International Capital. It shows the inflows and outflows of financial resources in the United States. The TIC flow is one of the major events in the market, as it is seen by most participants as the Government's resource for offsetting the current trade deficit.
New Home Sales (MoM)
Released by the US Census Bureau, this is an important measure of housing market conditions. House buyers spend money on furnishing and financing their homes, so the demand for goods, services and employees is stimulated.
New Home Sales
Released by the US Census Bureau, this is an important measure of housing market conditions. House buyers spend money on furnishing and financing their homes, so the demand for goods, services and employees is stimulated.
Nonfarm Payrolls
Released by the US Department of Labor, this is one of the most important data releases. The report presents the number of people on the payrolls of all non-agricultural businesses. The monthly changes in payrolls can be extremely volatile.
Nonfarm Productivity
Released by the Bureau of Labor Statistics of the US Department of Labor, this indicator shows the output per hour of labor worked. Nonfarm productivity indicates the overall business health in the US, which has an influence on GDP.
NY Empire State Manufacturing
Conducted by the Federal Reserve Bank of New York, this survey gauges business conditions for New York manufacturers.
Pending Home Sales (MoM)
Released by the National Association of Realtors, this is a leading indicator of housing market trends in the US. It captures residential housing contract activity for existing single-family homes. Because the housing market is considered a sensitive factor for the US economy, it generates some volatility for the USD.
Personal Consumption Expenditure Deflator
Price changes may cause consumers to switch from buying one good to another, and the PCE Deflator is able to account for such substitutions. This makes it the Federal Reserve's preferred measure of inflation. It is released by the Commerce Department.
Personal Consumption Expenditures (MoM)
Released by the Bureau of Economic Analysis, Department of Commerce, this indicator measures the total expenditure by individuals. The level of spending can be used as an indicator of consumer optimism. It is also considered a measure of economic growth: because personal spending stimulates inflationary pressures, it could lead to higher interest rates.
Personal Income (MoM)
Released by the Bureau of Economic Analysis, Department of Commerce, this indicator measures the total income received by individuals from all sources, including wages and salaries, interest, dividends, rent, workers' compensation, proprietors' earnings, and transfer payments. This figure can provide insight into the US employment situation.
Philadelphia Fed Manufacturing Survey
The Philadelphia Fed Survey is a spread index of manufacturing conditions (movements of manufacturing) within the district of the Federal Reserve Bank of Philadelphia. This survey, which serves as an indicator of manufacturing sector trends, is interrelated with the ISM Manufacturing Index (Institute for Supply Management) and the index of industrial production.
Producer Price Index (MoM)
Released by the Bureau of Labor Statistics, Department of Labor, this index measures the average changes in prices received in primary markets of the US by producers of commodities in all stages of processing. Changes in the PPI are widely followed as an indicator of commodity inflation.
Producer Price Index (YoY)
Released by the Bureau of Labor Statistics, Department of Labor, this index measures the average changes in prices received in primary markets of the US by producers of commodities in all stages of processing. Changes in the PPI are widely followed as an indicator of commodity inflation.
Producer Price Index ex Food & Energy (MoM)
Released by the Bureau of Labor Statistics, Department of Labor, this index measures the average changes in prices received in primary markets of the US by producers of commodities in all stages of processing. Volatile products such as food and energy are excluded in order to give a more accurate reading of the underlying trend.
Producer Price Index ex Food & Energy (YoY)
Released by the Bureau of Labor Statistics, Department of Labor, this index measures the average changes in prices received in primary markets of the US by producers of commodities in all stages of processing. Volatile products such as food and energy are excluded in order to give a more accurate reading of the underlying trend.
Real Personal Consumption Expenditures (QoQ)
Released by the US Bureau of Economic Analysis, this indicator is an average of the amount of money consumers spend in a month on durable goods, consumer products, and services. It is considered an important indicator of inflation.
Retail Sales (MoM)
Released by the US Census Bureau, this indicator measures the total receipts of retail stores. Monthly percent changes reflect the rate of change of such sales. Changes in Retail Sales are widely followed as an indicator of consumer spending.
Retail Sales ex Autos (MoM)
Released by the US Census Bureau, this monthly figure shows all goods sold by retailers, based on a sampling of retail stores of different types and sizes, excluding the automobile sector. The retail sales index is often taken as an indicator of consumer confidence. This report is the "advance" report, which can be revised fairly significantly after the final numbers are calculated.
Reuters Michigan Consumer Sentiment Index
Released by Reuters/University of Michigan, this is a survey of personal consumer confidence in economic activity. It gives a picture of whether or not consumers are willing to spend money.
Richmond Fed Manufacturing Index
Conducted by the Federal Reserve Bank of Richmond, this survey (mailed to 220 business organizations) provides information on current activity in the manufacturing sector. Industry inflation can also be seen from the survey.
S&P Case-Shiller Home Price Indices (YoY)
Released by Standard & Poor’s, this index examines changes in the value of the residential real estate market in 20 regions across the US. This report serves as an indicator of the health of the US housing market.
Total Net TIC Flows
Released by the US Department of the Treasury. TIC stands for Treasury International Capital. It shows the inflows and outflows of financial resources in the United States. The TIC flow is one of the major events in the market, as it is seen by most participants as the Government's resource for offsetting the current trade deficit.
Trade Balance
Released by the Bureau of Economic Analysis and the U.S. Census Bureau, this is the balance between exports and imports of total goods and services. A positive value shows a trade surplus, while a negative value shows a trade deficit. It is an event that generates some volatility for the USD.
Unemployment Rate
Released by the US Department of Labor, this is the number of unemployed workers divided by the total civilian labor force. A rising rate indicates a lack of expansion within the US economy.
Unit Labor Costs
Released by the Bureau of Labor Statistics, Department of Labor, this indicator shows the total cost of employing a labor force. It can serve as an indicator of trends in production costs, share prices, and inflation. A high reading is seen as positive (or bullish) for the USD, whereas a low reading is seen as negative (or bearish).
Wholesale Inventories
Released by the US Census Bureau, this indicator captures sales and inventory statistics from the second stage of the manufacturing process. The sales figures do not move the market, as they do not reflect personal consumption, while wholesale inventories may change the aggregate inventory profile, which can influence the GDP forecast.
APPENDIX B - Application’s User Guide
The formats of the configuration file and the output files were established in order to allow the
simulation and analysis of all the scenarios described in Section 4.4.1 and the comparison of the
results. The following sections present the application’s installation and user guides.
Application’s Installation
The application was developed in a Windows environment. To use it, it is necessary to
install MinGW (Minimalist GNU for Windows), which can be downloaded from
http://sourceforge.net/projects/mingw/files/. The developed application is called
“GA_Fundamental.exe” and it must be placed in the same directory as the configuration file
named “parameters.txt”.
Application’s User Interface
The application offers a simple interface that, at start-up, allows the user to confirm the
simulation’s settings and shows information about the loaded input files (data time series) and
the state of evolution of the algorithm at each moment of its execution, as shown in Figure 29,
Figure 30 and Figure 31.
Figure 29 - Setting Loading
Figure 30 - Data Time Series Loading
Figure 31 - Optimization Process Evolution
Application’s Input Parameters
All the input parameters, their descriptions and the ranges of typically used values are presented
in Table 55, followed by an example of a configuration file in Table 56. These values were
estimated throughout the development of the application and take into account the computational
resources required and the time demands (training and simulation time). The configuration file
must be named “parameters.txt” and must be located in the same folder as the application
executable file, namely “GA_Fundamental.exe”.
Table 55 - Application's Input Parameters
Input Parameter | Description | Range of Values Typically Used
Mutation Rate | Probability of applying the Mutation genetic operation. | Between 0.05 and 0.2.
Crossover Rate | Probability of applying the Crossover genetic operation. | Between 0.90 and 1.0.
Population Size | Number of individuals in the population in each generation. | Between 100 and 1000.
Start Training Date | Training start date in format yyyymmddhhmmss. | Between 20070101000000 and 20100101000000.
End Training Date | Training end date in format yyyymmddhhmmss. | At least 1 year after the Start Training Date.
Start Investment Date | Investment start date in format yyyymmddhhmmss. | Between 20090101000000 and 20100101000000.
End Investment Date | Investment end date in format yyyymmddhhmmss. | At least 1 year after the Start Training Date.
Investment Period | Investment window (months) after the training. | 3 months.
Training Period | Training window (months) before the Investment window. | 36 months.
Generations | Number of generations that must pass to meet the termination criterion. | Between 100 and 1000.
Generations Same Fitness | Number of generations that must pass without fitness improvement to meet the termination criterion. | Between 100 and 1000.
Number Of Runs | Number of times that the algorithm restarts from the beginning. | Between 10 and 100.
Data Location | Location of the main data folder containing all the input data. | Any location.
MEV data locations | Name of the folder containing all the MEV data. | Any name.
INDEX data location | Name of the folder containing all the Index and VIX data. | Any name.
Minimum MEV number | Minimum number of variables to be included in each hypothesis. | Between 0 and 5.
Maximum MEV number | Maximum number of variables to be included in each hypothesis. | Between 5 and 20.
MEV Contribution Type | MEV Contribution Type to be used in each hypothesis. | Linear or unit.
MEV Decay Type | MEV Decay Type to be used in each hypothesis. | Simple, exponential or none.
Fitness Function | The fitness function to be used during the evaluation. | PI, PIMDD or CR.
Threshold Limits | The voting threshold above which investment decisions are made. | Between 0 and 1.
MEV Sum. Weights | Weights to be used during the optimization process. | Between 1 and 10.
MEV Sum. Derivative Weights | Weights to be used during the optimization process. | Between 1 and 10.
MA Weights | Weights to be used during the optimization process. | Between 1 and 10.
Save Fitness Evolution | Indicates whether the application must save the fitness evolution results. | True or false.
Evaluation Type | Investment time scale. | Daily or hourly.
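The date parameters in Table 55 all share the 14-digit format yyyymmddhhmmss. As an illustration only, the following Python sketch shows one way such a value could be parsed and how the "at least 1 year" condition on the training dates could be checked. It is not part of the developed application; the names parse_stamp and training_window_is_valid are hypothetical, and "1 year" is taken here as 365 days, which is an assumption.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y%m%d%H%M%S"  # yyyymmddhhmmss, the format named in Table 55


def parse_stamp(stamp: str) -> datetime:
    """Parse a 14-digit yyyymmddhhmmss string into a datetime."""
    if len(stamp) != 14 or not stamp.isdigit():
        raise ValueError(f"expected 14 digits (yyyymmddhhmmss), got {stamp!r}")
    return datetime.strptime(stamp, STAMP_FORMAT)


def training_window_is_valid(start: str, end: str) -> bool:
    """Check that the end date is at least one year after the start date.

    One year is approximated as 365 days (an assumption of this sketch).
    """
    return parse_stamp(end) - parse_stamp(start) >= timedelta(days=365)


if __name__ == "__main__":
    # Training dates taken from the example configuration file (Table 56).
    print(parse_stamp("20070101000000"))
    print(training_window_is_valid("20070101000000", "20100101000000"))
```

A check of this kind could be run on a configuration file before starting a long simulation, so that a mistyped date is caught early.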
Table 56 - Example of the Configuration File
//Comments must follow this format
//Algorithm's Parameters
mutationRate-0.5
crossoverRate-0.9
populationSize-100
//Training/Investment Start and End Dates in format yyyymmddhhmmss
startTrainingDate-20070101000000
endTrainingDate-20100101000000
startInvestmentDate-20100101000000
endInvestmentDate-20110702000000
//Investment Period integer months
investmentPeriod-18
//Training Periods integer months
trainingPeriods-36
//Number of runs of the Algorithm
numberOfRuns-100
//Number of generations - termination criterion
generations-500
//Number of generations without improvement - termination criterion
generationsSameFitness-500
//MEVSumWeights-1,2,3
MEVSumWeights-1,2,3,4,5,6,7,8,9,10
//MEVSumDerivativeWeights-1,2,3
MEVSumDerivativeWeights-1,2,3,4,5,6,7,8,9,10
//MAWeights-5,6,7,8,9,10
MAWeights-1,2,3,4,5,6,7,8,9,10
//fitnessFunction-CR,PI,PIMDD
fitnessFunction-PI
//thresholdLimits-0.1,0.2,0.3,0.4
thresholdLimits-0.0,0.1,0.2,0.3,0.4,0.5
//Data Locations
//Main Folder, all MEV and Index Sub-folders must be inside
dataLocation-C:\Users\Alexander\Desktop\data
//MEV sub-folders can be more than one
//MEVdata_locations-some folder
MEVdata_locations-topMEV
//Index sub-folder
INDEXdata_location-S&P500
//Model Parameters
//Minimum number of macroeconomic variables to be used by every hypothesis
minMEVnumb-5
//Maximum number of macroeconomic variables to be used by every hypothesis
maxMEVnumb-20
//Impact Contribution type: linear or unit
MEVContributionTypeEnumVar-linear
//Impact Decay type: exponential or simple or none
MEVDecayTypeEnumVar-none
saveFitnessEvolution-false
//Evaluation Type can be: hourly or daily
evaluationType-daily
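The configuration format of Table 56 is simple enough to be read by external scripts, for example when preparing or checking many scenario files. The Python sketch below is an illustration under stated assumptions and is not the parser used by the application: it assumes that every setting is a key and a value separated by the first hyphen on the line, that lines beginning with a slash are comments, and that a comma in the value denotes a list. The function name parse_parameters and the SAMPLE text are hypothetical.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]

# A short excerpt in the style of Table 56 (hypothetical sample text).
SAMPLE = """//Algorithm's Parameters
mutationRate-0.5
populationSize-100
MAWeights-1,2,3
fitnessFunction-PI
INDEXdata_location-S&P500
"""


def parse_parameters(text: str) -> Dict[str, Value]:
    """Read 'key-value' lines, skipping blank lines and comment lines."""
    params: Dict[str, Value] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("/"):
            continue  # blank line or comment
        # Split at the FIRST hyphen only, so hyphens inside a value survive.
        key, separator, value = line.partition("-")
        if not separator or not key:
            raise ValueError(f"not a key-value line: {line!r}")
        value = value.strip()
        # Comma-separated values are returned as lists of strings.
        params[key.strip()] = value.split(",") if "," in value else value
    return params


if __name__ == "__main__":
    for name, setting in parse_parameters(SAMPLE).items():
        print(name, "=", setting)
```

All values are kept as strings, so a Windows path such as the one given for dataLocation is preserved unchanged; converting values to numbers or dates is left to the caller.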
Application’s Output
To enable debugging, detailed analysis of the discovered strategies and comparison of the
results, the application creates multiple files during execution that store all the necessary
information about the data time series, the state of the training strategies, the state of the
investment strategies and the results of the investment.
Output File | Description
All_DTS.csv | File that contains all the information about the data time series, including hourly Index prices, MEV minutely variations, MAs and VIX.
IndexAMVs.csv | File that contains all the historical mean variations (minutely, hourly and daily).
MEV_AMV_Rating.csv | File that contains the MEVs’ rating based on the minutely variations.
run1_startYYYYMMDDHHMMSS_endYYYYMMDDHHMMSS.csv | File that contains the evolution of the numerical parameters of the strategy.
run1_startYYYYMMDDHHMMSS_endYYYYMMDDHHMMSS.txt | File that contains the state of the strategy including listing of the MEVs and other parameters.
run1_startYYYYMMDDHHMMSS_endYYYYMMDDHHMMSS_train.txt | File that contains the state of the strategy discovered during the training including listing of the MEVs and other parameters and all the decisions made during the training.
FitnessEvolution.csv | File that contains the evolution of the fitness during the training.
FinalResults.csv | File that contains the final results of the simulation expressed in terms of minimum, maximum and final PIs, MDD and number of transactions.
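When the output files are post-processed by a script, the per-run file names can be rebuilt from the run number and the window dates. The Python sketch below illustrates the naming pattern listed above. It assumes that the digit after "run" is the run index and that the two time stamps are the start and end of the corresponding window; both are assumptions, and the function name run_file_names is hypothetical.

```python
from datetime import datetime
from typing import List


def run_file_names(run: int, start: datetime, end: datetime) -> List[str]:
    """Build the three per-run file names following the listed pattern."""
    stem = f"run{run}_start{start:%Y%m%d%H%M%S}_end{end:%Y%m%d%H%M%S}"
    # .csv: parameter evolution, .txt: strategy state, _train.txt: training state
    return [stem + ".csv", stem + ".txt", stem + "_train.txt"]


if __name__ == "__main__":
    for name in run_file_names(1, datetime(2010, 1, 1), datetime(2010, 4, 1)):
        print(name)
```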
APPENDIX C – Index Data Time Series Collecting Program
This program is capable not only of extracting daily, hourly and minutely data time series of
the futures but also of automatically converting the time zones (from Moscow time, GMT+3/+4
hours, to GMT+0), taking into account daylight saving time conventions and all the other
adjustment rules. It offers a simple interface, illustrated in Figure 32 and Figure 33. To download
all the index data available, it is only necessary to choose the destination folder, the index
future’s type and the time scale. When the destination location is omitted, the application
downloads all the data to the directory where it is located.
Figure 32 - Index Data Time Series Collecting Program Interface 1
Figure 33 - Index Data Time Series Collecting Program Interface 2
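The time zone conversion described above can be illustrated with a short Python sketch. It is not the code of the collecting program, and the rule it encodes is an assumption: Moscow time is taken as GMT+3 in winter and GMT+4 between 02:00 on the last Sunday of March and 03:00 on the last Sunday of October, remaining at GMT+4 from 27 March 2011 onwards. The sketch is only intended for dates up to 2011, the period covered by the data used in this work, and it ignores the ambiguous hour at the autumn transition. The function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def last_sunday(year: int, month: int) -> date:
    """Last Sunday of a 31-day month (March and October both have 31 days)."""
    day = date(year, month, 31)
    # weekday(): Monday is 0 and Sunday is 6, so step back to the Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def moscow_offset_hours(local: datetime) -> int:
    """Offset of Moscow local time from GMT under the assumed rules."""
    # Assumption: from 27 March 2011 Moscow stayed on GMT+4.
    if local >= datetime(2011, 3, 27, 2, 0):
        return 4
    summer_start = datetime.combine(last_sunday(local.year, 3), time(2, 0))
    summer_end = datetime.combine(last_sunday(local.year, 10), time(3, 0))
    return 4 if summer_start <= local < summer_end else 3


def moscow_to_gmt(local: datetime) -> datetime:
    """Convert a naive Moscow local time stamp to GMT+0."""
    return local - timedelta(hours=moscow_offset_hours(local))


if __name__ == "__main__":
    print(moscow_to_gmt(datetime(2010, 1, 15, 12, 0)))  # winter time stamp
    print(moscow_to_gmt(datetime(2010, 7, 1, 12, 0)))   # summer time stamp
```

A production tool would more likely rely on a maintained time zone database than on hand-written rules; the explicit rule is used here only to make the adjustment visible.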
APPENDIX D – Enterprise Architect Quick Use Guide
Enterprise Architect is a UML modelling application that can be integrated with other tools (like
Eclipse) for application development. The latest version licensed by IST is 7.5 Professional,
and it can be downloaded from https://delta.ist.utl.pt/software/software.php. The
Enterprise Architect software is available to all users of the IST campus under a Desktop
unlimited license.
Before creating a new project, the application should be configured to work in C++ mode by
default. This can be done by going to “Tools->Options->Source Code Engineering” and setting
C++ as the default language for Code Generation, as illustrated in Figure 34.
Figure 34 - Enterprise Architect Configuration
The structure, i.e., the packages, classes and interfaces, can be drawn directly in the work area
using the tools highlighted in Figure 35. The global structure of the project can be seen in the
project browser, also highlighted in Figure 35. The source code can be imported, exported and
synchronized with the project by clicking the right mouse button on any element of the project
and going to “Code Engineering”, as illustrated in Figure 36.
Figure 35 - Enterprise Architect Tools and Project Browser
Figure 36 - Importing, Exporting and Synchronizing the Source Code