Forecasting Stock Markets Using Machine Learning · stock prices are extremely complex to model....

Forecasting Stock Markets Using Machine

Learning

André Dinis Oliveira

Forecasting the PSI-20 index using a Machine Learning

approach

Trabalho de Projeto apresentado como requisito parcial para

obtenção do grau de Mestre em Estatística e Gestão de

Informação

I

NOVA Information Management School

Instituto Superior de Estatística e Gestão de Informação

Universidade Nova de Lisboa

FORECASTING STOCK MARKETS USING MACHINE LEARNING

por

André Dinis Oliveira

Trabalho de Projeto apresentado como requisito parcial para a obtenção do grau de Mestre em Estatística e Gestão de Informação, Especialização em Análise e Gestão de Risco

Orientador: Prof. Mauro Castelli

Novembro 2016

II

Para os meus Pais, Ana Jacinta e José Lúcio

ACKNOWLEDGEMENTS

It would be hard to do this work project without the help of Prof. Mauro Castelli

that gave me full support trough out this adventure. I am sincerely grateful for all

the material, discussion and patience provided.

I also wish to thank my friends that were always available to distracted me

during the up’s and down’s of my life and this Master project. A special thanks to

O Fred.

Por fim quero agradecer a minha Mae persistente , ao meu Pai compreendedor,

ao chato do meu irmao Adriano e a minha querida irma Helena. A vossa ajuda e

paciencia foi essencial.

A todos, muito obrigado.

III

ABSTRACT

Predicting financial markets is a task of extreme difficulty. The factors that influence

stock prices are extremely complex to model. Machine Learning algorithms have

been widely used to predict financial markets with some degree of success. This

Master’s project aims to study the application of these algorithms to the Portuguese

stock market, the PSI-20, with special emphasis on genetic programming and the

introduction of the concept of semantics in the process of evolution. Three systems

based on genetic programming were studied: STGP, GSGP and GSGP-LS. The

construction of the predictive models is based on historical information of the index

extracted through a blooberg portal. In order to analyze the quality of the models

based on genetic programming, the final results were compared with other Machine

Learning algorithms through the application of significance statistical tests. An

analysis of the quality of the results of the different algorithms is presented and

discussed.

KEYWORDS

Genetic Programming; Stock Markets; Machine Learning; Geometric Semantic Operators;

Forecasting

IV

RESUMO

Prever mercados financeiros e uma tarefa de extrema dificuldade. Os fatores que

influenciam os precos de accoes sao de natureza complexa e de difıcil modelizacao.

Algoritmos baseados em aprendizagem automatica tem sido bastante utilizados

para prever os mercados financeiros com algum grau de sucesso. Este projeto de

Mestrado tem o objetivo de estudar a aplicacao destes algoritmos ao mercado de

accoes portugues, o PSI-20, com especial destaque para a aplicacao de programacao

genetica e para a introducao do conceito de semantica no processo de evolucao. Tres

sistemas baseados em programacao genetica foram estudados: STGP, GSGP and

GSGP-LS. A construcao dos modelos preditivos baseia-se em informacao historica

do ındice extraida atraves de um portal blooberg. Para analisar a qualidade dos

modelos baseados em programacao genetica, os resultados finais foram comparados

com outros algoritmos da area de aprendizagem automatica atraves da aplicacao

de testes de significancia estatıstica. Uma analise a qualidade dos resultados dos

diferentes algoritmos e apresentada e discutida.

PALAVRAS-CHAVE

Programacao Genetica ; Mercados Financeiros; Aprendizagem de Maquina; Operadores

Geneticos Semanticos; Previsao

V

INDEX

1 Introduction 1

1.1 Forecasting Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Document Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Literature Review 5

3 Machine Learning 9

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.2 Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.2.1 Representation of Individuals . . . . . . . . . . . . . . . . . . 11

3.2.2 Initialization the Population . . . . . . . . . . . . . . . . . . . 12

3.2.3 Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.2.4 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.2.5 Genetic Operators . . . . . . . . . . . . . . . . . . . . . . . . 16

3.2.6 GP Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 18

VI

3.2.7 Termination Criterion . . . . . . . . . . . . . . . . . . . . . . 19

3.3 Geometric Semantic Genetic Programming . . . . . . . . . . . . . . . 20

3.3.1 Local Search in Geometric Semantic Operators . . . . . . . . . 22

3.4 Other ML Teqcnhiques . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.4.1 Linear Regression - LR . . . . . . . . . . . . . . . . . . . . . . 25

3.4.2 Support Vector Machines - SVM . . . . . . . . . . . . . . . . 26

3.4.3 Artificial Neural Networks - ANN . . . . . . . . . . . . . . . . 27

4 Methodology 29

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2 Machine Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . . 31

4.2.1 Software Methodology . . . . . . . . . . . . . . . . . . . . . . 32

4.3 Data Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.3.1 Data Transformation . . . . . . . . . . . . . . . . . . . . . . . 33

4.4 Experimental settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5 Results and Discussion 36

5.1 Comparison with other ML algorithms . . . . . . . . . . . . . . . . . 40

6 Conclusion and Future work 42

Bibliography 46

VII

List of Figures

3.1 General description of GP algorithm . . . . . . . . . . . . . . . . . . 11

3.2 GP Tree-based representation of 10x+ 10 . . . . . . . . . . . . . . . 12

3.3 Two examples of individuals created by full method with maximum

depth=2. 3.3a represents the tree syntax for f(x, y) = Sin(x)+(x−y)

and 3.3b the tree syntax for f(x, y) = x ∗ y + x/y . . . . . . . . . . . 13

3.4 Three examples of individuals created by grow method with depth limit=2.

3.4a represents the tree syntax for f(x, y) = x and 3.4b the tree syntax

for f(x, y) = (x+ y) + 10 and 3.4c for f(x, y) = x ∗ y + x/y . . . . . 14

3.5 Example of subtree crossover . . . . . . . . . . . . . . . . . . . . . . . 17

3.6 Example of subtree mutation . . . . . . . . . . . . . . . . . . . . . . . 18

3.7 A visual intuition of a two-dimensional semantic space that is used

to explain properties of the geometric semantic crossover presented in

Moraglio (2012). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.8 A graphical representation of (a) GSM and (b) GSM-LS . . . . . . . 23

3.9 Schematic of a single hidden layer neural network . . . . . . . . . . . 28

4.1 General Machine Learning System . . . . . . . . . . . . . . . . . . . . 31

4.2 Evolution of PSI-20 Index close prices (02/01/2014 to 06/05/2016) . 33

VIII

5.1 Comparison between the three GP systems: results obtained for the

PSI-20 index dataset. Evolution of (a) training and (b) test errors

for each technique (MAE), median over 50 independent runs. . . . . . 36

5.2 Comparison between the three GP systems: results obtained for the

PSI-20 index dataset. Evolution of (a) training and (b) test errors

for each technique, median over 50 independent runs. . . . . . . . . . 38

IX

List of Tables

3.1 Three examples of possible primitive set . . . . . . . . . . . . . . . . . 12

4.1 The Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2 Experimental Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

5.1 Comparison between the GP systems: reports the median values obtained

for the last generation . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.2 P-values given by the statistical test for the GP systems . . . . . . . . 39

5.3 Experimental comparison between different non-evolutionary techniques

and GSGP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

5.4 P-values given by the statistical test for the GP systems . . . . . . . . 41

X

ACRONYMS

ML Machine Learning: is the subfield of computer science that ”gives computers

the ability to learn without being explicitly programmed”

GP Genetic Programming: Evolutionary algorithm that mimics Darwin’s theory

of evolution of species

GSO Geometric Semantic Operators: new genetic operators that incorporate in

the process of evolution the concept of semantics.

SVM Support Vector Machines: algorithm belonging to ML field often used to

solve regression and segmentation problems. Frequency

ANN Artificial Neural Networks: algorithm belonging to ML that is bio-inspired

by the functioning of the human brain.

XI

Chapter 1

Introduction

Stock price time-series are often characterized by a chaotic and non-linear behaviour

which makes the forecast a challenging task. The factors that produces uncertainty

in this field are complex and from different nature, from economic, political and

investment decisions to unclear reasons that, somehow, produce effects and make

hard to predict how the prices will evolve. The stock market attracts investments

due to the ability of producing high revenues. However, owing to its risky nature,

there is a need for an intelligent tool that minimizes risks and, hopefully, maximizes

profits.

Predicting stock prices using historical data of the time-series to provide

an estimate of future values is the most common approach among the literature.

More recently, researchers have started to develop machine learning (ML) techniques

that resemble biological and evolutionary process to solve complex and non-linear

problems. This work contrasts the typical approach, where classical statistical

methods are employed. Examples of such ML techniques are Artificial Neural

Networks (ANN), Support Vector Machines and Genetic Programming (GP).

Genetic Programming (GP) belongs to the field of evolutionary computation

where algorithms are inspired on Darwin theory of evolution. In GP possible

solutions are usually called individuals and this population of individuals evolves

1

using genetic operators (crossover and mutation) to produce, hopefully, better individuals.

In the first description of GP made by Koza (1992), these genetic operators produce

new individuals simply by changing the syntax of the parents without taking into

account the semantics of the individuals. The concept of semantics in GP usually

refers to the vector of outputs produced by a GP individual on a set of training

data. Recently, new gentic operators called geometric semantic operators have been

proposed by Moraglio (2012). These operators have the interesting characteristic

of inducing an unimodal fitness landscape on any supervised learning problem.

However, these operators also presents a serious limitation, they produce individuals

that are much larger then their parents which makes the size of the individuals in

the population increasing exponentially over the generations.

The objective of this project is the application of genetic programming

systems to forecast the PSI-20 index, the Portuguese stock market. The approach

proposed in this work is to analyse the performance of different GP systems to

predict the next day price.

1.1 Forecasting Problem

The application of ML algorithms, more specifically GP, can be helpful in various

financial problems. It has already been applied successfully in financial forecasting,

trading strategies optimization and financial modelling.

This Master project focus on forecasting stock prices time-series using a

machine learning approach. Considering a short-term forecasting problem (one-day-

head forecast), the objective is to predict the stock price in given dayt+1 using a set

of inputs variables that represents the past stock prices up to dayt. The problem

can be described as follow: given a set of inputs variables, xt, xt−1, ..., xt−m we have:

xt+1 = f(xt, xt−1, ..., xt−n) (1.1)

where xt+1 is the next value of the time-series and f is the forecast function.

2

ML algorithms play the role of finding the best forecast function, f , through

the identification of hidden patterns and relations in the data either by parameter

optimization, creation of expressions and variable selection.

Although this project is applied to forecast financial time-series, it should

not be only of value to the financial sector. In general, ML can be used to forecast

and modelling any type of time-series.

1.2 Research Objectives

This project aims to apply ML algorithms to one of the most challenging tasks of

the financial sector: forecasting financial time-series. Financial agents can benefit

from systems based on ML to planning and monitoring their financial investments

more accurately and therefore achieving higher returns.

The main goal of this work is the application of Genetic Programming (GP)

and some new advances in the field, namely the introduction of geometric semantic

operators, to the problem of finding the best model that, given historical data, can

predict the price of the stock in the future.

In order to achieve the main goal, the following specific steps are set:

• A description of the ML algorithms with special emphasis in the field of GP

and its use in the financial sector.

• Selection of a financial time-series to develop the experimental work.

• Description of the data and the methodology used.

• Assessment of the performance of the models produced by the considered GP

systems:

– Standard Genetic Programming (STGP)

3

– Geometric Semantic Genetic Programming (GSGP)

– Geometric Semantic Genetic Programming with Local Search (GSGP-

LS).

• A comparison of the performance of the models between different ML approaches

1.3 Document Structure

The paragraphs bellow describes how the rest of the document is structured.

Chapter 2, Literature Review, summarizes the past work done on this field.

The main topics analysed in the scope of this project are Genetic Programming and

Machine Learning approaches to forecast the financial markets.

Chapter 3, Genetic Programming, presents mainly an overview of GP,

some advances made recently in the field, such the introduction of the concept

of ”semantics” in GP and a brief description of other ML algorithms used in this

field.

Chapter 4, Methodology and Data, describes the approach implemented

during this project to design a forecast system based on GP, the dataset, the

preparation and preprocessing of the data.

Chapter 5, Results and Discussion, presents the outcome of the project

and a interpretation of the results provided.

Chapter 6, Conclusion and Future Work, summarizes the project results

and proposes possible future developments.

4

Chapter 2

Literature Review

This chapter presents the literature and previous activities that are most relevant for

this work project. The literature review is done accordantly with the most relevant

subjects that were taking into account for the preparation of this project, more

specifically, Genetic Programming and Machine Learning techniques to forecast

stock prices.

Machine learning algorithms have been widely applied in many areas of

finance. More specifically, ML techniques are common accepted to predict stock

markets by means of a regression or classification problems. Usually, we have a

quantitative output measurement (such as a stock price) or categorical (such as stock

price goes up/down), that we wish to predict based on a set of variables, for example

the stock prices of previous days or other indicators that could explain the final

outcome. The use of ML algorithms allows us to build predictive models that can

explain the relation between input and output variables on an set of training data.

Considering an Supervised Learning approach, the agent is provided with known

input-output values (labelled data) and tries to formulate a function to explain such

relation. Models such Artificial Neural Networks (ANN), Support Vector Machines

(SVM), Genetic Algorithm (GA) and Genetic Programming (GP) have shown to

perform well both on regression and classification problems.

5

Some relevant articles about using ML techniques to predict the stock

markets are listed here.Atsalakis & Valavanis (2009) presents a survey on neural and

neuro-fuzzy techniques to forecast stock markets. It is shown that these techniques

are widely accepted to studying and performing stock market prediction. Shen

et al. (2012) proposes a prediction algorithm that exploits the use of temporal

correlation among global stock markets and various financial products to predict

the next-day stock trend using SVM. The author performs an empirical study on

NASDAQ, S&P500 and DJIA indexes achieving good results. Choudhry & Garg

(2008) investigate the use of an hybrid machine learning system based on Genetic

Algorithm (GA) and Support Vector Machines (SVM) for stock market forecasting.

The genetic algorithm is used to choose the set of most informative input variables

from the entire dataset. The results showed that the hybrid GA-SVM system

outperforms the stand alone SVM system.

There are numerous papers exploiting the use of Genetic Programming to

forecast stock prices. For instance, Hui (2003) presents a standard GP approach

to forecast the IBM stock prices. The paper presents an experimental study, which

aims to analyze how the GP parameters (population size, number of generations,

etc) effects the final accurracy of the GP model. Sheta et al. (2013) developed a

stock market prediction model based on GP for the S&P 500 stock market and

pointed out some unique advantages of using GP to stock market modelling. The

same work also proposed a comparison between the proposed GP model and other

prediction models, such Linear Regression and Fuzzy Logic models, showing a

very competitive performance of the GP model both on training and testing cases.

Schwaerzel & Bylander (2006) investigated the addition of high-order statistics

(mean, standard deviation, skewness and kurtosis) as well as trigonometric functions

to the function set in order to improve the results produced by the GP model in

financial time series. The paper analyses the performance improvement considering

a more sophisticated function set rather than a basic function set, as proposed by

Koza (1992). Their results indicate that the use of GP model with extended function

set outperforms ARMA models and basic GP. In Santini & Tettamanzi (2001) an

6

intelligent application based on GP was developed to forecast financial markets

which allowed the authors to win the competition organized within the CEC2000 on

”Dow Jones Prediction”. Lee & Tong (2011) proposes a hybrid forecasting model

for nonlinear time series by considering both ARIMA and GP models to improve

upon both the ANN and the ARIMA forecasting models. The results indicate

improvements in the accuracy of the proposed model against other ML approaches,

such ANN and standard GP, both on training and testing instances.

More recently, researchs have been focused on developing new variants of

GP to improve its performance. Moraglio (2012) proposes the use of geometric

semantic operators which enables the GP system to evolve and improve based on

the semantics of the solutions. The concept of semantics in GP is often intended as

the vector of outputs produced by a GP program on the training data. Although

these new operators have the interesting property of inducing a unimodal fitness

landscape, they also have the serious limitation of producing individuals much

larger than their parents which makes the size of the individuals in the population

increasing exponentially with generations and, making impossible to use them in

real life applications.

To surpass this limitation, L. Vanneschi (2013) proposed an efficient implementation

of geometric semantic operators (GSGP) which makes possible to use them in real

life applications. In Castelli & Vanneschi (2015) a detailed description of the

proposed implementation is made and performs an investigation of its efficiency

in terms of running time and memory occupation. The paper also describes the

GSGP library, written in c++, and the compilation process. The use of geometric

semantic operators has already been studied on real life applications. For instance,

L. Vanneschi (2013) studied the performance of GSGP against standard GP (STGP)

to problems in pharmacokinetics and energy load forecasting. The results obtained

in their investigation indicates that GSGP outperforms STGP on the training data

and significantly outperforms STGP on testing data. The authors were also able to

explain why those operators represents a concrete mechanism to limit overfitting.

7

The use of these geometric semantic operators to stock market forecasting

is presented by Castelli & Trujillo (2016). The paper investigates the use of different

GP systems based on geometric semantic operators (GSGP and GSGP combined

with a local search algorithm) and other machine learning techniques to forecast

stock market prices of Dow Jones index and the Istanbul stock index. This paper

presents two major contributions: it integrates the GSGP framework with an local

search optimiser, which is intended to improve the convergence speed of GSGP,

and develop an empirical study of the performance of the proposed algorithms to

challenging problems such forecasting stock markets.

8

Chapter 3

Machine Learning

3.1 Introduction

The use of machine learning techniques in the financial markets, specifically for

predicting financial time-series, have been quite successful. Nowadays, researchers

and companies are trying to develop intelligent algorithms that can capture the

hidden patterns inherent to stock markets in order to predict more efficiently the

behaviour of the stock prices. This field falls into the scope of machine learning and

predictive models.

In general, the approaches used by researchers can be divided into two

main classes:

• The econometric models developed based on statistical approaches such the

Linear Regression (LR), Autoregression Moving Average and ARIMA models.

However this models models offer a simple implementation they have the

nonrealistic assumption that the financial time-series data follows a linear

pattern and is stationary.

• Predictive models for forecasting market stock prices based on intelligent

algorithms that resemble biological processes to solve nonlinear and complex

9

problems . Examples of such algorithms are Genetic Programming (GP),

Artificial Neural Networks (ANN) and Support Vector Machines (SVM).

In the following sections will be presented the most common machine

learning algorithms used to predict the stock market, with special emphasis on

the Genetic Programming model.

3.2 Genetic Programming

GP represents one of the famous evolutionary computation techniques which seeks to

solve wide domain problems automatically. It was first described by Koza (1992) as

an automatic and domain-independent method which can create computer programs

capable of solving a large variety of problems. Moreover, GP attempts to perform

automated learning of computers programs by mimicking the process of Darwinian

evolution. In GP the process starts by randomly creating a initial population of

computer programs and with generations GP algorithm transforms the population

of programs into a new, hopefully better, population of programs through the use

of genetic operators, usually mutation and crossover. The quality of the computer

programs to solve the specific domain problem is asses by the usage of an appropriate

fitness function. A general description of the GP algorithm is described in Figure

3.1. Hence, in order to set up the GP run, there are 4 preparatory steps for solving

a given problem with GP:

1. Define the representation space. This means choosing the appropriate Terminal

set and Function Set, together they define the structure that is available for

GP to generate the computer programs (individuals).

2. Define the fitness function. Choose an indicate fitness function to evaluate the

candidate solutions.

3. Define the parameters to control the GP run. In order to run GP a few

10

parameters need to be set (population size, generations, probabilities of applying

the genetic operators, etc).

4. Define the termination criterion. Choose what triggers the GP run to terminate.

Figure 3.1: General description of GP algorithm

3.2.1 Representation of Individuals

In GP it is usual to represent the computer programs or individuals in a tree-based

notation (Koza, 1992). The syntax trees are constructed from a set of functions and

terminals, the Function Set and Terminal Set respectively. Together they form the

primitive set of the problem. The primitive set defines the search space which GP

will scan, that is, all the individuals that can be generate by the combination of

primitives in all possible ways.

The Function Set represents the internal nodes of the trees and in a simple

numeric problem the function set could consist, for instance, on the arithmetic

functions (+,-,*,/). The Terminal Set may consist of external input variables (e.g

x,y,z), function with no arguments (e.g rand() which returns random numbers) and

constants. The elements in the Function Set and Terminal Set are specified with

a number of arguments (which is usually called Arity). Table 3.1 presents three

different examples of primitive sets which will be used as examples. For instance,

considering the primitive set 1 in table 3.1, Figure 3.2 shows a tree-based syntax

representation of the candidate expression 10x+ 10.

For GP to work properly, the primitive set needs to ensure two important

11

+

*

x 10

10

Figure 3.2: GP Tree-based representation of 10x+ 10

Table 3.1: Three examples of possible primitive set

Primitive Set Function Set Arity Terminal Set Arity1 (+,-,*,/) (2,2,2,2) (x,y,10) (0,0,0)2 (+,-,*,/,sin,cos) (2,2,2,2,1,1) (x,y,rand(),10) (0,0,0,0)3 (AND, OR, NOT) (2,2,1) (x,y) (0,0)

properties known as closure and sufficiency. Sufficiency means that is possible to

create the solution to the problem using the elements available in the primitive set.

Unfortunately, ensure that the primitive set is sufficient may not be a easy task.

Having a insufficient primitive set, GP can only create programs that approximate

the desired one.

The primitive set also needs to ensure the closure property, that is, the

function set must be well define for any possible combination of expression that

may occur in the process of evolution. For example, considering the tree expression

in Fig 3.3b, the / function must be protected for the case when y equals zero. To

ensure the closure property for the divide operator, Koza (1992) introduced the

protected division, which returns 1 when the denominator equals zero.

3.2.2 Initialization the Population

After the Terminal and Function set are defined the next step is to choose the

initialization method to create the initial population. The individuals in the initial

12

population are usually randomly created. In GP there are three usual methods of

initialization:

• The full method

• The grow method

• The ramped half-and-half method

In the full method the initial individuals are created having a pre-specified maximum

depth. It is, randomly, selected functions from the Function Set to all nodes of the

tree until the maximum tree depth is reached. For the maximum depth level of the

tree only terminals can be selected. The full method creates trees which all the

leaves (terminals) have the same depth, although this does not mean that all the

trees will have the same number of nodes or the same shapes. This only happens

if all the functions in the primitive set have equal arity (R. Poli & McPhee., 2008).

For example, Figure 3.3 shows two individuals created by the full method with a

maximum tree depth equal 2. Although this method will generate all individuals

with maximum depth, the shapes may vary depending on the arity of the functions.

Individual 3.3a has a slightly different shape than 3.3b because the sin function

have arity equal to 1 and all functions of 3.3b have arity equal to 2. This method

creates all the individuals in the initial population with maximum depth which will

diminish the population diversity, at least in the initial population.

+

sin

x

-

x y

(a)

+

*

x y

/

x y

(b)

Figure 3.3: Two examples of individuals created by full method with maximumdepth=2. 3.3a represents the tree syntax for f(x, y) = Sin(x) + (x− y) and 3.3b thetree syntax for f(x, y) = x ∗ y + x/y

13

The grow method allows the creation of trees with different sizes and

shapes. In this method, nodes are selected from the function and terminal sets until

the maximum tree depth is reached. Then, when the maximum depth is reached

only terminals can be selected. This method allows to generate individuals with

different levels of depth, up to maximum depth. Figure 3.4 shows three individuals

created by grow method. Considering the primative set 2 in table 3.1 and with a

depth limit equal 2, in 3.4a the node chosen, x, belongs to the terminal set. This

prevents the tree of growing any more creating a tree with depth equal 0.

x

(a)

+

+

x y

10

(b)

+

*

x y

/

x y

(c)

Figure 3.4: Three examples of individuals created by grow method with depth limit=2.3.4a represents the tree syntax for f(x, y) = x and 3.4b the tree syntax for f(x, y) =(x+ y) + 10 and 3.4c for f(x, y) = x ∗ y + x/y

Since neither the grow or full method provide a very wide diversity of sizes

or shapes on their own in the initial population, Koza (1992) proposed a combination

of both methods called ramped half-and-half. Half the initial population is constructed

using full and half is constructed using grow. This is done using a range of depth

limits to help ensure that we generate a diverse initial population, both in structure

and in computational complexity.

14

3.2.3 Fitness Function

After determinate the search space, we need to define a measure of performance to

quantify how good a candidate solution is. This is done by the definition of a fitness

function. The fitness function is the mechanism that tells GP which candidate

solutions or regions of the search space are good. In problems such symbolic

regression, the fitness is usually based on error measures. Two error measures which

are widely used for regression problems are, for example, the MAE (Mean Absolute

Error) and the RMSE (Root Mean Squared Error) given by the formulas:

MAE =1

n

n∑i=1

|fi − yi| (3.1)

RMSE =1

n

√√√√ n∑i=1

(fi − yi)2 (3.2)

The fitness function evaluates the candidate solution by the amount of

error between its output and the desired output.

3.2.4 Selection

In GP genetic operators are applied to individuals that are selected based on fitness,

which means that better individuals are more likely to be chosen to be copy to the

next population or selected to perform crossover or mutation. The most usual

employed selection methods used in GP are: tournament selection and the fitness

proportionate selection.

When using tournament selection, a number of individuals (size of the

tournament) are randomly selected, repetitively, from the population and compared

with each other. The best individual in the tournament is chosen to be the parent of

the next generation. If crossover is applied, two selection tournaments are need to

choose two parents. The size of the tournament is a parameter choose by the user and

may effect the evolution process due to selection pressure. A system with a strong

15

selection pressure, (big tournament size), highly benefits the more fit individuals,

while a tournament with a weak selection pressure, (small tournament size) gives a

better change for less fit individuals to be chosen as parents.

In a fitness proportionate selection, individuals are chosen based on a

probability given by:

pi =fin∑

i=1

fi

(3.3)

where pi is the probability of individual i to be selected and fi is the fitness of

individual i. This method ensures that better individuals will always be more likely

to be selected as parents for the next population.

3.2.5 Genetic Operators

GP uses genetic operators to create new individuals which will be breed into the new

population. Therefore, the most commonly used operators in GP are the crossover

and mutation operators. The selection of which genetic operators should be used to

create new individuals is probabilistic. Their probability of application are called

operator rates. Usually, crossover operator is chosen with higher probability and, on

the contrary, mutation operator is less likely to be applied. The crossover rate (pc)

is normally above 90% and the mutation rate (pm) is usually much smaller, typically

being in the region of 1%. When the sum of crossover and mutation rates is less

than 100% a new operator called reproduction is used with a rate of 100%-(pc+pm).

Crossover Operator

The crossover operators produce new individuals (child’s) by mixing the structure of

their parents. The parents are chosen by a selection algorithm introduced in section

3.4. The most commonly used form of crossover is subtree crossover (Koza, 1992),

which works in the following way:

1. Select two parents using a selection method.

16

2. Select a random subtree from each parent and the root of each subtree is the

crossover point.

3. Create two new individuals (child’s) by swapping the subtrees selected in their

parents.

Figure 3.5 shows a example of subtree crossover.

Figure 3.5: Example of subtree crossover

Mutation Operator

Other genetic operator used in GP to change the syntax structure of the trees is the

Mutation operator. The most commonly used form of mutation in GP is the subtree

mutation which selects, randomly, a node in the tree and replaces that subtree

selected by a random generated tree.

Another common method of mutation used in GP is point mutation. When

using point mutation, a random node is selected and the primitive stored there is

replaced by a random primitive with the same arity available on the primitive set.

Point mutation is used in a way that each node is considered to turn and, given a

17

Figure 3.6: Example of subtree mutation

certain probability is altered by another primitive on the primitive set as explained

above. This method allows multiple nodes to be mutated in a single application of

point mutation.

Reproduction Operator

The reproduction operator simply involves the copy of the selected individual to the

next generation without any modification. Reproduction is often associated with

elitism. Elitism consist in copying the best individual, or a percentage of the fittest

individuals, of the generation to the next one, without any modification.

3.2.6 GP Parameters

In any GP applications the user need to specify the control parameters for the

run. The control parameters are very important and can effect the performance of

18

GP to solve the problem. Before any GP run, the user must specify the following

parameters:

• Population Size. The population size is an important parameter and its value

depends on the complexity of the problem. However, generally GP performs

better if the population size is ’bigger’.

• Number of Generations. This parameters indicate the maximum number of

generations available for GP to evolve.

• Probabilities of performing the genetic operators. Traditionally, Crossover is

applied with a ’big’ probability, usually 90%, and mutation is applied with a

’smaller’ probability, usually less than 5%.

• The Selection method. The most common selection method used in GP is the

tournament selection. When using this method, it is also needed to specify

another parameter which is the tournament size.

• Population initialization. In many GP applications, it is common generate

the initial population using ramped half-half with a depth range of 2-6. This

method is more commonly used because it provides a diverse initial population.

3.2.7 Termination Criterion

The termination criterion is the method that determines when the run stop. The

run may finish when the maximum number of generations, or when an enough fitted

individual is found. It is also defined the data returned from the run which usually

consists on the best-so-far individual.

19

3.3 Geometric Semantic Genetic Programming

In the earlier sections GP was presented has in Koza (1992). Genetic operators

as described above only produces new offspring’s by manipulating their syntactic

representation. Although this property allows genetic operators to remain simple

and generic search operators it becomes difficult to understand how a modification

of the syntax may affect the quality of the individual.

A recent trend in Genetic Programming is the attempt to construct genetic

operators that can take into account the semantics of the solution. The concept of

semantics in GP is often intend to mean a vector of output values obtained by a set

of input data. In order words, the semantics of a solution refers to the behaviour

of itself. Regarding on this definition, Moraglio (2012) have introduced geometric

semantic operators for GP: Geometric Semantic Crossover and Geometric Semantic

Mutation. In Moraglio (2012) is presented the formal definitions of this genetic

operators.

Definition 3.3.1. Geometric Semantic Crossover (GSC ).Given two parent functions T1,T2 : Rn → R , the geometric semantic crossoverreturns the real function TXO = (T1 ∗ TR) + ((1− TR) ∗ T2), where TR is a randomreal function whose output values range in the interval [0,1].

Moraglio (2012) formally proves that this GSC operator corresponds to

geometric crossover on the semantic space, and thus induces a unimodal fitness

landscape. In order to TR produces values in the range of [0, 1] it is usually use the

sigmoid function:

TR =1

1 + e(−TRand)(3.4)

where TRand is a generated random tree with no constrains.

Definition 3.3.2. Geometric Semantic Mutation (GSM ).Given a parent function T : Rn→ R, the geometric semantic mutation with mutationstep ms returns the real function TM = T + ms ∗ (TR1 − TR2), where TR1 and TR2

are random real functions.

20

Moraglio (2012) proves that this operator corresponds to ball mutation on

the semantic space and induces a unimodal fitness landscape. The random generated

tress TR1 and TR2 have been limited to assume values in the range of [0,1], using

the exact same method describe for TR used in GSC. The parameter ms allows a

’small’ perturbation in the individual because in centred in zero (difference between

the two random trees). Despite that, this parameter can be tuned to define a bigger

or smaller magnitude of the perturbation produced by this operator.

The use of this new genetic operators allows us to produce modifications

on the syntactic space of GP individuals that have a exact effect on their semantics.

For any supervised learning problem, where the expected output values are known

and the fitness consists on the distance in the semantic space between any individual

and the target point, these operators have a very interesting property of inducing

a uni-modal fitness landscape (error surface), such like regression and classification

problems. Other interesting property derived from the definition of this operators is

that geometric properties remains independently from the data on which individuals

are evaluated. In other words, geometric semantic crossover produces an offspring

that lies between the parents also in the semantic space induced by the test data.

This is extermely interesting because this operators can be considered a tool to help

control and limit overffing, offering a satisfatory generalization ability in the test set

(out-of sample data). This last property was first clearly presented in L. Vanneschi

(2013) trough the application of this operators to several real life applications.

Figure 3.7 shows the semantic space for geometric semantic crossover on

the training data and test data. It is easy to see that geometric semantic crossover

produces an offspring, C, that is better or equal than the worst of its parents. Figure

3.7 shows the offspring C, that stands between its parents P1 and P2 and is closer

to the desired target T than parent P2. This property holds by construction for the

test data which offers the ability to control overfitting.For instance, offspring C is

not worse than the worst of the parents not only considering the training target T

, but also considering any hypothetical test target T’.

21

Figure 3.7: A visual intuition of a two-dimensional semantic space that is used toexplain properties of the geometric semantic crossover presented in Moraglio (2012).

Although this operators have the interesting properties explained above,

they also have the strong limitation, by construction, of generating offspring’s that

are much larger than their parents, provoking an exponential growth in the size of

the individuals. In a few generations the size of the individuals in the population

becomes incredibly large which makes these operators unusable in real life applications

To overcome this limitation, in L. Vanneschi (2013) is defined a new implementation

of these operators, which allows us to use them efficiently. For a more comprehensive

description of this efficient implementation the interested reader in pointed to L. Vanneschi

(2013) and Castelli et al. (2014).

3.3.1 Local Search in Geometric Semantic Operators

One interesting strategy of improving geometric semantic operators is to integrate

a Local Search algorithm in order to optimize the search. This technique can be

integrated in the geometric operators, in particular the geometric semantic mutation

with a local search optimizer (GSM-LS) given a tree T, generates the following

individual:

22

TM = α0 + α1 ∗ T + α2 ∗ (TR1 − TR2) (3.5)

where αi ∈ R; notice that α2 replaces the mutation step parameter ms of

the geometric semantic mutation (GSM). The LS operator attempts to determine

the best linear combination between the parent tree and the random trees used to

perturb it, which is local in the sense of the linear problem posed by the GSM

operator. This strategy can be seen as fitting a linear regression model on the GSM

operator to improve

It should not be seen as a LS in the entire semantic space, since in that case

the LS would necessarily converge thorough to the desired program in the unimodal

landscape.

With a local search method integrated, the search process will become

more efficient and will improve the convergence speed of the algorithm in order to

obtain better performance with respect to the algorithm that only uses the geometric

semantic operators. Moreover, by speeding up the search process, it will be possible

to limit the construction of over-specialised solutions that could overfit the data.

Figure 3.8: A graphical representation of (a) GSM and (b) GSM-LS

Figure 3.8 illustrates the difference between the GSM and the GSM-LS.

First,(a) shows a plot of the semantic space, the space of all possible program

outputs, with the highest fitness peak at the desired program output t. Also, the

23

semantics of a single individual is represented as s, a circle around s is the area in

which the semantics x of the offspring generated by GSM will lie. GSM can, in

some cases, generate offspring with semantics that are farther away from t than the

parent, with lower fitnes and this property can slow down the convergence speed of

the search. When using a local search optimizer, GSM-LS, it will always produce

an offspring that have a better fitness than the parent, by forcing the geometric

mutation to always move towards to the desired program, as represented in Figure

(b).

24

3.4 Other ML Teqcnhiques

In the following subsections will be presented a brief description of other machine

learning techniques that are commonly used to perform stock market prediction,

such linear regression models, support vector machines and artificial neural networks.

3.4.1 Linear Regression - LR

The linear model has been present for a long time now and remains one of the most

important tools in the statistics field. A linear regression model can be represented

by the following mathematical expression:

X = (X1, X2, ..., Xp) (3.6a)

β = (β1, β2, ..., βp) (3.6b)

Y = β0 +

p∑j=1

Xjβj (3.6c)

where Xi represents the model input variables and Y is the model output variable.

The βi, i=1,2,...,p are the model parameters which need to be estimated.

The term β0 is the intercept, also known as the bias in machine learning.

Often it is convenient to include the constant variable 1 in X, include β0 in the

vector of coefficients β, and then write the linear model in vector form as an inner

product:

Y = XT β (3.7)

For the estimation of the coefficients of the model β the most common

approach is to estimate using the least squares method. In this approach β is

estimate in order to minimize the residual sum of squares:

RSS(β) =N∑j=1

(yi − xTi β)2 (3.8)

25

writing the formula in matrix notation we have:

RSS(β) = (y −Xβ)T (y −Xβ) (3.9)

where X is an N * p matrix with each row an input vector, and y is an N-vector of

the outputs in the training set. Differentiating in order of β and equal to zero we

get the equations:

XT (y −Xβ) = 0 (3.10a)

β = (XTX)−1XTy (3.10b)

3.4.2 Support Vector Machines - SVM

Suppport vector machine (SVM) have been implemented in many types of problems

such classification, recognition and regression. It was firstly on classification problems,

principle to develop binary classifications. The goal of support vector machine is to

build a hyperplane as the decision surface such the margin of separation between

labels is maximized.

For SVM regression, the inputs X are first mapped into a m-dimensional

feature space using some nonlinear relation, and then a linear model is constructed

in this feature space. Using mathematical notation, the linear model is given by

f(X,φ) =m∑j=1

(φj ∗ gj(X)) + b (3.11)

where gj(X),j=1,...m is the function representing the nonlinear transformations

and b is the ’bias’ term. In order to estimate the quality of the produced outputs is

used a loss function proposed by Vapnik.

Lε(y, f(X,φ)) =

0, if |y − f(X,φ)| <= ε

|y − f(X,φ)| − ε, othetwise

26

For a more comprehensive explanation of SVM, the reader is referred to

the Bibliography (Zhang (2001) and Smola & Scholkopf (2004)).

3.4.3 Artificial Neural Networks - ANN

Artificial neural networks (ANNs) are a bio-inspired computational model that tries

to mimic, at some rudimentary level, the behaviour of the human brain. These

models are used to estimate or approximate functions by building a system of

interconnected ’neurons’ which can compute values from input variables and fit

a function the approximate the desired output. The artificial neural networks are

usually presented by having three layers, the input layer which are compose by

the input variables used to modelling the problem, the hidden layer that receive

values from the input layer and provides them for the output layer. All the layers

are connected between each other by corresponding weights to neurons of different

layers. Them, each network is trained through receiving some examples (many pairs

of input and outputs) and as a result, weights among layers will change and update

by comparing the output of the network with the desired target until the computed

values from the networks ’match’ the desired target.

A neural network can be seen as a two-stage regression or classification

model, typically represented by a network diagram as in Figure 3.9.

In real life applications, a neural networks is often constructed having more

than one hidden layers. A multi layer perceptron (MLP) is a feed forward artificial

neural network model that maps sets of input data on to a set of appropriate outputs.

A MLP consists of multiple layers of nodes with each layer fully connected to the

next one. Except for the input nodes, each node is a neuron (or processing element)

with a non-linear activation function.

For training the network the most common approach is to use a technique

called backpropagation (BP) with non-linearly activating nodes. This technique the

most common approach among the literature. It is extremely simple to implement

27

Figure 3.9: Schematic of a single hidden layer neural network

although tends to converge slowly. To produce a non-linearly relationship between

the layers is necessary to use a nonlinear activation function such sigmoid function:

φ(s) =1

1 + e−s(3.12)

This is an example of a very well known predictive model, in the field of

supervised learning, and is carried out through backpropagation.

28

Chapter 4

Methodology

4.1 Introduction

Forecasting stock prices can be a challenging task. The process of determining

which indicators and input data will be used, and gathering enough training data to

training the system appropriately is not obvious. The input data may be raw data on

volume, price, or daily change, but also it may include derived data such as technical

indicators (moving average, trend-line indicators, etc.) or fundamental indicators

(intrinsic share value, economic environment, etc.). It is crucial to understand what

data can be useful to capture the underlying patterns and integrate into the machine

learning system.

The methodology used in this work consists on applying Machine Learning

systems, with special emphasis on Genetic Programming. GP has been considered

one of the most successful existing computational intelligence methods and capable

to obtain competitive results on a very large set of real-life application against other

methods. The experimental work is focused on applying the standard GP, such as

the introduction of geometric semantic operators, to forecast the PSI-20 index using

historical data and considering one day ahead forecasting.

The time-series chosen was the PSI-20 Index, which represents a capitalization-

weighted index of the top 20 stocks listed on the Lisbon Stock Exchange.

29

Section 4.2 introduces the problem statement as well as the software used.

Section 4.3 is presents the dataset used for this work and the data transformations

made. Finally, section 4.4 describes all the experimental settings used for each

system taken into account.

30

4.2 Machine Learning Algorithm

In the chosen approach to predict the next day price of PSI-20 index, it was used a

supervised learning approach where the input variables of the algorithm are a set of

economic indicators, directly related with the PSI-20 Index, with known time-stamp

t. The goal of the Machine Learning System is to use the input variables to predict

the values of the outputs.

Figure 4.1 shows how a supervised learning system functions.

Figure 4.1: General Machine Learning System

The whole time series, PSI-20 Index prices from 2 January 2014 to 6

May 2016, were divided into the training and the testing data sets as described

in Figure 4.2. For the training set was considered the data from 2 January 2014

to 31 December 2015 (500 data observations) and for the testing set the remaining

period.

The objective is to predict the stock price at the end of the day t+1

considering the dataset reported in Table 4.1.

31

4.2.1 Software Methodology

For the Standard Genetic Programming Application (STGP), a c++ library written

for the purpose of this work was implemented.

Regarding the Geometric Semantic Genetic Programming (GSGP) it was

used the GSGP implementation freely available at http://gsgp.sourceforge.net and

very well documented in Castelli & Vanneschi (2015). GSGP is free/open source

c++ library and it provides a robust and efficient implementation of geometric

semantic genetic operators for Genetic Programming. This library implements the

standard GP algorithm but the genetic operators have embedded the concept of

the semantic awareness as explained in section 3.3. It is easily adaptable and its

implementation is straightforward, depending on set of configuration parameters.

For the GSGP-LS implementation the same library was used with small

adaptations of the source code to include a local search optimizer in the GSM

operator.

All the results concerning other Machine Learning Techniques were obtained

by using WEKA software.

4.3 Data Description

The data for this work are a set of variables related with the PSI-20 Index from

2 January 2014 to 6 May 2016, which corresponds to 600 data observations. The

variables represent historical data relative to the index, namely daily close prices,

open price, high price and low price. In Table 4.1 is the description of the dataset

used that was transformed using this variables.

All the data used in this work was extracted from a bloomberg terminal.

32

Figure 4.2: Evolution of PSI-20 Index close prices (02/01/2014 to 06/05/2016)

4.3.1 Data Transformation

After running a few experiments on raw stock prices the GP systems often reach

to a local optimal in a small number generations by predicting a solution similar to

last day price. To overcome this problem rather than using raw stock prices, daily

changes in stock prices were used. A new variable xi is defined, which represents

the daily change in price of the time series data:

xi = Pi − Pi−1 (4.1)

This method of differencing the data is commonly used to transform non-

stationary time series into stationary ones. Given that stock prices are likely to be

near with each other considering consecutive days, GP systems will be more likely to

produce a solution that resembles on last day price data as output prediction. When

considering daily changes in stock prices the later is not likely to happen because

daily changes are not likely to be close to each other on a day-to-day period.

33

Table 4.1: The Data Set

Variable Description

(x1,...x10) Daily changes of close prices between consecutive days

x11 The change of open price between consecutive days

x12 The change of high price between consecutive days

x13 The change of low price between consecutive days

x14 The percent change of close prices between consecutive days

Target The change of close price in the next day

4.4 Experimental settings

Three different GP systems were implemented: the standard GP approach (ST-GP),

as proposed by Koza (1992); GSGP that uses geometric semantic operators, both

GSC (Geometric Semantic Crossover) and GSM (Geometric Semantic Mutation);

and GSGP with a Local Search Optimizer implemented on the GSM operator,

GSGP-LS.

All the systems were set with a population size of 200 individuals evolved

for 1000 generations with a total of 50 runs. To perform the tree initialization the

Ramped Half-and-Half method was used with a maximum initial depth equal to

6. Selection was made by the tournament selection method with a tournament size

of 10. For the STGP, the function set consisted on the arithmetic operators, (+,-

,*,/) as well as the cosine, sine, and log functions. For the others GP systems the

function set contained only the arithmetic operators. The terminal set consisted of

by 14 variables, summarized in Table 4.2. For all systems, Crossover has been used

with probability 70% and Mutation with probability 30%. With respect to of GSM

the mutation step was set randomly in each mutation event as in Vanneschi et al.

(2014). Elitism was granted to the best individual in the population for all systems.

After running some experiments the number of generations where the local search

has been used was limited to 10 to avoid overfitting on the training data. To analyse

the performance obtained the mean absolute error (MAE) was considered, for all

34

GP systems, defined as follow:

MAE =1

n

n∑i=1

|fi − yi| (4.2)

where fi is the output measure of the GP program and yi is the target value for the

instance i. All the parameters of the systems studied are summarized in Table 4.2.

In the next chapter the results obtained are reported. The experimental

results are evaluated by reporting the median error of the training and test sets. For

each run, the best individual of the generation is stored and then the median value

per generation is reported.

Table 4.2: Experimental Settings

Method

Parameters STGP GSGP GSGP-LS

Terminal set x1,...,x14 x1,...,x14 x1,...,x14

Funcion set +,-,*,/,cos ,sin,log +,-,*,/ +,-,*,/

Fitness Function MAE MAE MAE

Population 200 200 200

Generations 1000 1000 1000

Probability Crossover 0.7 0.7 0.7

Probability Mutation 0.3 0.3 0.3

Elitism best individual best individual best individual

Tournament Size 10 10 10

Max depth creation 6 6 6

No Generations using local search - - 10

35

Chapter 5

Results and Discussion

The results presented in this section were obtained using the described methodology

in subsection 4.2.1. For all the GP systems, 50 runs have been performed. Figure

5.1 reports, for the dataset taken into account, training and test error (MAE) for

the all the GP systems considered against generations.

(a) (b)

Figure 5.1: Comparison between the three GP systems: results obtained for the PSI-20 index dataset. Evolution of (a) training and (b) test errors for each technique(MAE), median over 50 independent runs.

The results obtained are reported by plotting the median error on the

training and test set. In each generation, the best individual in the population (i.e.,

the one that has the smallest training error) was chosen and the value of its error

on the training and test sets was stored. The reported plots contain the median of

all these values collected at the end of each generation. The median was preferred

36

over the mean in reported plots because of its higher robustness to outliers.

Considering STGP and GSGP, Figure 5.1 clearly show that GSGP outperforms

STGP on both training and test sets. It is possible to see that GSGP performs well

against STGP because the properties of the genetic operators defined in GSGP

allow a faster convergence on the training data and it is also possible to note on

the testing set that GSGP is able to control overfitting. Although the GSGP in the

final generation is able to converge to a lower training error than STGP, on unseen

(test) data the two GP systems achieve a comparable test error. Regarding the

GSGP-LS system performance, it shows a even faster convergence on the training

data with respect to the GSGP system on the first 10 generations. After that, the

LS optimizer stops and the performance become like a normal GSGP system. It

is possible to note that the application of the LS optimizer despite of producing a

faster convergence in the training data also produces a overfit on the testing data,

even greater than STGP for the considered dataset. The median results of 50 runs

for the last generation are shown in Table 5.1.

Table 5.1: Comparison between the GP systems: reports the median values obtained

for the last generation

Mean Absolute Error

Method Train Test

STGP 61.13 60.02

GSGP 43.28 59.49

GSGP-LS 39.1 62.02

To analyse the final results for the three GP systems, Figure 5.2 reports

a statistical study of the last generation results achieved by STGP, GSGP and

GSGP-LS for the PSI-20 data set, considering the same 50 runs. On the training

data GSGP and GSGP-LS are able to achieve better results when compared with

STGP and when considering the testing data, all the systems achieve similar median

results. It is also useful to note the ability of GSGP and GSGP-LS (after the LS

optimizer stops) to limit overfitting. Note in some runs, STGP completely overfit

37

(a) Train (b) Test

Figure 5.2: Comparison between the three GP systems: results obtained for the PSI-

20 index dataset. Evolution of (a) training and (b) test errors for each technique,

median over 50 independent runs.

the data achieving test error greater than 100.

In order to study the statistical significance of the final results, at generation

1000, it was firstly used the Shapiro Wilk test, with α=0.05, to test if the data are

normally distributed. The Shapiro Wilk test is based on the following statistic:

W =b2

n∑i=1

(x(i) − x)(5.1)

where x(i) are the ordered values, x(1) < x(2) < ... < x(n). The variable b is calculated

in the following way:

b =

n/2∑i=1

an−i+1 × (x(n−i+1) − x(i)) if n is even

(n+1)/2∑i=1

an−i+1 × (x(n−i+1) − x(i)) if n is odd

(5.2)

where an−i+1 are calculated based on statistical moments from a normal distribution.

Since the p-values for the Shapiro Wilk test reject the null hypothesis,

a rank-based statistic was used. The Wilcoxon rank-sum test is used to test if

two population are likely to derive from the same distribution (i.e., that the two

populations have the same shape). It is common among researches to interpret this

test as comparing the medians between the two populations. The Wilcoxon rank-

38

sum test, with α = 0.05, was used under the null hypothesis that the samples have

equal medians. For a more comprehensive explanation of statistical test performed

in this work, the reader is referred to Bibliography Kanji (2006). The p-values

obtained are reported in Table 5.2.

Table 5.2: P-values given by the statistical test for the GP systems

STGP GSGP GSGP-LS

Method Train Test Train Test Train Test

STGP - - < 0.001 0.0358 < 0.001 < 0.001

GSGP - - - - < 0.001 < 0.001

According the p-values, it is possible to say that GSGP produces solutions

that are significantly better (i.e., with lower error) than STGP both on training and

test data. When comparing the STGP against GSGP-LS, the latter clearly produce

significantly better solutions but only on the training set, due to some overfitting in

early generations of GSGP-LS, the former ended up producing better results on the

test set. Analysing the p-values obtained for the comparison between GSGP and

GSGP-LS it is possible to state that GSGP-LS only produces significantly better

solutions on the training data, when considering the test data GSGP-LS ended up

producing significantly worst solutions. This results were somehow expected in the

training data due to how the genetic operators are constructed on the different GP

systems. In the testing data the application of the LS optimizer was not beneficial for

the PSI-20 dataset. Despite the fact that GSGP-LS is able to achieve an incredible

faster convergence in fewer generations, in this case it also ended up producing

over-specialized solutions.

After comparing the behaviour of the three systems the GSGP was chosen,

as baseline, to compare with other ML techniques due to his overall superior performance

when analysing the training and testing errors.

39

5.1 Comparison with other ML algorithms

After comparing the GP systems against each other it is also important to compare

against other state-of-the-art ML algorithms, to understand and evaluate the how

the results obtained by GP are competitive against more common approaches.

Table 5.3 reports the values of the training and test errors (MAE) of the

solutions obtained by all the studied techniques including the GP systems.

Table 5.3: Experimental comparison between different non-evolutionary techniques

and GSGP

Mean Absolute Error

Method Train Test

Linear Regression 63.01 58.6

Isotonic Regression 62.69 57.71

Neural Nets 53.14 71.88

SVM (degree 1) 62.49 58.9

SVM (degree 2) 57.38 66.88

SVM (degree 3) 48.04 76.56

STGP 61.13 60.02

GSGP 43.28 59.49

GSGP-LS 39.1 62.02

From these results, it is possible to say that GSGP and GSGP-LS perform

better than all the other methods on training set. Considering both training and

test set it is also interesting to note that GSGP and GSGP-LS outperforms well

known ML algorithms such as Neural Networks and SVM polynomial degree 2 and

3 for this dataset. For the other cases it is possible to notice that the GP systems

is able to produce very comparable results against the other methods

To study the statistical significance of these results, the same set of tests

described in the previous section was performed. All the obtained p-values relative

to the comparison between the three GP systems and the other ML methods are

presented in Table 5.4.

40

Table 5.4: P-values given by the statistical test for the GP systems

LIN ISO NN SVM-1 SVM-2 SVM-3

Method Train Test Train Test Train Test Train Test Train Test Train Test

STGP < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 0.0011 < 0.001 < 0.001 < 0.001 < 0.001

GSGP < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001

GSGP-LS < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001 < 0.001

In table 5.4, LIN refers to linear regression, ISO stands for isotonic regression,

NN stands for neural networks, SVM-1 refers to the support vector machines with

polynomial kernel of first degree and similarly for SVM-2 and SVM-3. According to

the results reported in the table, the differences in terms of training and test fitness

between all methods are statistically significant for the PSI-20 dataset. Regarding

the GSGP method, which is the best performance on unseen instances, it produces

results that are significant better results with respect to several of the other non

evolutionary methods (NN and SVM-2 and SVM-3). When considering the training

instances, GSGP performs significant better with respect to all non evolutionary

methods STGP(according to the p-values). The only techniques that significantly

outperform the GSGP system on the test instances are LIN and ISO and SVM-1.

These results obtained for the PSI-2O index,indicates the capability of GP

systems to produce predictive models for stock markets.

41

Chapter 6

Conclusion and Future work

Predicting stock market prices is far from being a trivial task. The uncertainty

and volatility that characterize stock markets makes very hard and sometimes even

impossible to predict what will happen. Understanding what can and will happen

in financial markets is extremely important nowadays to everyone who needs to

plan investments, management of risks or allocate efficiently their resources. In

order to address this extremely hard problem, computational intelligence techniques

have been proposed and applied with some degree of success. This computational

intelligence techniques are often referred as Machine Learning and Predictive Models.

In this work project we studied the application of evolutionary algorithms, namely

Genetic Programming, in order to address this problem. A comparison between

the standard approach of GP and some recent developments on GP systems, which

incorporates the concept of semantics in the evolution process trough the definition

of new genetic operators.

This work intended to study the suitability of GP systems to forecast the

financial markets and also to perform more empirical tests to analyse some properties

and problems that arises when using geometric semantic operators (GSGP and

GSGP-LS). To validate the different systems, an extensive experimental analysis

was performed using the Portuguese stock market, the PSI-20 Index. The GP

systems were tested against each other in terms of performance and also against

42

state-of-the-art ML techniques. The usage of geometric semantic operators (GSGP)

presented some interesting theoretical and practical properties that can be exploited

to achieve better performance versus the standard GP (STGP). It was interesting to

see that the introduction of a local search optimizer within GSGP is able to produce

better results on the training set in a significantly lower number of generations

with respect of GSGP. Unfortunately, the usage of LS, in this experimental study,

ended up to early over-specialize solutions provoking an superior error on the test

set. Experimental results reported in this work have shown that GP is more than

capable of produce satisfactory results when comparing with other techniques and

in some cases is able to outperform them. These results are a clear indication that

GP is capable of generate appropriate predictive models of stock prices.

Regarding possible future work,, I intend to continue investigate the usage

of genetic programming to forecast stock prices considering the following main areas:

• Variable selection. Considering a larger dataset with more indicators can

be helpful to construct accurate predictive models.

• Parameter optimization. Normally, parameters configuration on GP has

meaningful effects on the performance of GP. It is important to build more

scientific methods to set up this parameters rather than just running a few

experiments.

• Hybrid models. It might be useful to construct hybrid predictive models

in order to take advantage of the pros inherent to different models. Mixing

different models with GP may be beneficial to model the volatility of stock

markets.

To summarise, this work project provides both an overview of the principal

ML techniques used to predict stock market prices and a empirical study about the

application of ML to predict stock markets, with special emphasis on GP and the

introduction of GSO’s in GP.

43

Bibliography

Atsalakis, George S., & Valavanis, Kimon P. 2009. Surveying stock market

forecasting techniques - Part II: Soft computing methods. Expert Systems with

Applications, 36(3 PART 2), 5932–5941.

Castelli, Mauro, & Trujillo, Leonardo. 2016. Stock index return forecasting:

semantics-based genetic programming with local search optimiser .

Castelli, Mauro, Vanneschi, Leonardo, & Silva, Sara. 2014. Prediction of the Unified

Parkinson’s Disease Rating Scale assessment using a genetic programming system

with geometric semantic genetic operators,41. Expert Systems with Applications,

4608–4616.

Castelli, M., Silva S., & Vanneschi, L. 2015. A C++ framework for geometric

semantic genetic programming, Genetic Programming and Evolvable Machines,

Vol. 16, No. 1, pp.73–81.

Choudhry, Rohit, & Garg, Kumkum. 2008. A Hybrid Machine Learning System

for Stock Market Forecasting. World Academy of Science, Engineering and

Technology, 2(15), 315–318.

Hui, Anthony. 2003. Using Genetic Programming to Perform Time-Series

Forecasting of Stock Prices.

Kanji, G.K. 2006. 100 Statistical Tests, 3Rd Ed. 1–257.

Kara, Yakup, Acar Boyacioglu, Melek, & Baykan, Omer Kaan. 2011. Predicting

direction of stock price index movement using artificial neural networks and

44

support vector machines: The sample of the Istanbul Stock Exchange. Expert

Systems with Applications,38. 5311–5319.

Keijzer, M. 2003.

Koza, JR. 1992. Genetic Programming: On the Programming of Computers by

Means of Natural Selection. MIT, Cambridge.

L. Vanneschi, M. Castelli, L. Manzoni S. Silva. 2013. A new implementation of

geometric semantic GP and its application to problems in pharmacokinetics, in

Proceedings of the 16th European Conference on Genetic Programming, EuroGP

2013, Volume 7831 of LNCS, ed. by K. Krawiec, et al. (Springer, Vienna, 2013),

pp. 205–216.

L. Vanneschi, M. Castelli, S. Silva. 2014. A survey of semantic methods in genetic

programming. Genet.Program Evolvable Mach. 15(2), 195–214.

Lee, Yi Shian, & Tong, Lee Ing. 2011. Forecasting time series using a methodology

based on autoregressive integrated moving average and genetic programming.

Knowledge-Based Systems, 24(1), 66–72.

Moraglio, A., Krawiec K. Johnson C.G. 2012. Geometric Semantic Genetic

Programming. In: Coello Coello, C.A., Cutello, V., Deb, K., Forrest, S., Nicosia,

G., Pavone, M. (eds.) PPSN XII, Part I. LNCS, vol. 7491, pp. 21–31. Springer,

Heidelberg .

R. Poli, W. Langdon, & McPhee., N. 2008. A Field Guide to Genetic Programing.

Santini, Massimo, & Tettamanzi, Andrea. 2001. Genetic Programming for Financial

Time Series Prediction. Genetic Programming, Proceedings of EuroGP’2001,

2038, 361–370.

Schwaerzel, Roy, & Bylander, Tom. 2006. Predicting Financial Time Series by

Genetic Programming with Trigonometric Functions and High-Order Statistics,4.

Library, 955–956.

45

Shen, Shunrong, Jiang, Haomiao, & Zhang, Tongda. 2012. Stock market forecasting

using machine learning algorithms. Department of Electrical Engineering,

Stanford University, 1–5.

Sheta, Alaa, Faris, Hossam, & Alkasassbeh, Mouhammd. 2013. A Genetic

Programming Model for S&P 500 Stock Market Prediction,6. International

Journal of Control and Automation, 303–314.

Smola, a J, & Scholkopf, B. 2004. A tutorial on support vector regression,14.

Statistics and Computing, 199–222.

Vanneschi, Leonardo, Silva, Sara, Castelli, Mauro, & Manzoni, Luca. 2014.

Geometric Semantic Genetic Programming for Real Life Applications,in Genetic

Programming Theory and Practice XI, Springer, New York, pp.191–209. 191–209.

Zhang, Tong. 2001. An Introduction to Support Vector Machines and Other Kernel-

Based Learning Methods,22. AI Magazine, 103.

46

Date post:	01-Aug-2020
Category:	Documents
Upload:	others
View:	10 times
Download:	0 times

Forecasting Stock Markets Using Machine Learning · stock prices are extremely complex to model....

Documents