Forecasting stock market movement direction with support vector machine

Tunisia Polytechnic School

Data analysis project

Presented by Mohamed DHAOUI (3rd year engineering student)

Academic Year: 2015-2016

Forecasting stock market movement direction with support vector machine


PLAN

• Problem statement and motivations

• How SVM works?

• Experiment design and results

• Conclusion



Financial market

• The financial market is a complex, evolutionary, and non-linear dynamical system.

• Financial forecasting is characterized by data intensity, noise, non-stationarity, an unstructured nature, a high degree of uncertainty, and hidden relationships.

• Movements in market prices are not random; rather, they behave in a highly non-linear, dynamic manner.

In this paper, we investigate the predictability of financial movement direction with SVM by forecasting the weekly movement direction of the NIKKEI 225 index.



Support Vector Machine

• A support vector machine (SVM) is a specific type of learning algorithm characterized by capacity control of the decision function, the use of kernel functions, and the sparsity of the solution.

• SVM has been shown to be very resistant to the over-fitting problem.

• Training an SVM is equivalent to solving a linearly constrained quadratic programming problem, so the solution of SVM is always unique and globally optimal.



NIKKEI 225 index

• The NIKKEI 225 Index measures the composite price performance of 225 highly capitalized stocks trading on the Tokyo Stock Exchange (TSE), representing a broad cross-section of Japanese industries.

• There are two basic reasons for the success of these index trading vehicles:
  - They provide an effective means for investors to hedge against potential market risks.
  - They create new profit-making opportunities for market speculators and arbitrageurs.


How SVM works? Linearly separable data

For a two-class, linearly separable learning task, the aim of support vector classification (SVC) is to find a hyperplane that separates the two classes of given samples with a maximal margin.

-> good classification performance on the training data
-> supports high predictive accuracy on future data

The margin is the shortest distance between the hyperplane and the closest data points.

-> This smallest distance is called the margin of separation.
-> The hyperplane is called the optimal separating hyperplane if the margin is maximized.
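
The margin of separation can be illustrated with a minimal sketch. The toy points, labels, and the two candidate hyperplanes below are hypothetical and chosen only for illustration; the function name `geometric_margin` is likewise an assumption, not something taken from the slides.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from the hyperplane w.x + b = 0 to any labeled point.

    A positive result means every point lies on its own side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    ]
    return min(distances)

# Hypothetical two-class toy data (labels are +1 / -1).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate separating hyperplanes: x1 + x2 - 2 = 0 and x1 - 1 = 0.
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)

print(margin_a, margin_b)
```

Both hyperplanes separate the toy data, but the first leaves a larger margin, so it is the one a maximal-margin classifier would prefer here.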


How SVM works? Linearly separable data

Primal problem
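
The equation on this slide did not survive extraction. As a reference point, the standard hard-margin primal problem is usually written as below. The symbols are assumptions following common SVM notation: $w$ is the normal vector of the hyperplane, $b$ the offset, $x_i$ the training inputs, $y_i \in \{-1, +1\}$ their labels, and $N$ the number of samples.

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, N
```

Under this scaling the margin equals $1 / \lVert w \rVert$, so minimizing $\lVert w \rVert^{2}$ is the same as maximizing the margin.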


How SVM works? Linearly inseparable data

Introducing a new function φ: a feature map from the input space to a (usually high-dimensional) feature space in which the data points become linearly separable.

• Σ ξ_i (the sum of the slack variables) is an upper bound on the number of training errors.

• C controls the trade-off between the complexity of the machine and the number of inseparable points.

• The slack variables ξ_i are introduced to account for the amount of violation of the classification constraints by the classifier.
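
The formula these annotations refer to was lost in extraction. A sketch of the standard soft-margin primal problem, in assumed notation, is given below: $\phi$ is the feature map, $\xi_i$ are the slack variables, and $C > 0$ is the regularization parameter; $w$, $b$, $x_i$, $y_i$ and $N$ follow common SVM conventions.

```latex
\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad
\xi_i \ge 0, \qquad i = 1, \dots, N
```

A misclassified point has $\xi_i > 1$, which is why $\sum_i \xi_i$ bounds the number of training errors from above. A larger $C$ penalizes violations more heavily, at the cost of a more complex decision function.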



-> Introducing the kernel
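
The kernel idea can be sketched as follows: a kernel returns the inner product in feature space without building the feature vectors explicitly. The degree-2 polynomial kernel and its explicit feature map are a textbook example chosen for illustration. The RBF kernel is included as a common choice only; the slides do not state which kernel was used, and the `gamma` value is an assumption.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel on 2-D inputs: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def poly2_feature_map(x):
    """Explicit 3-D feature map whose inner product reproduces poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical 2-D inputs.
x, z = (1.0, 2.0), (3.0, -1.0)

implicit = poly2_kernel(x, z)                                # kernel trick
explicit = dot(poly2_feature_map(x), poly2_feature_map(z))   # explicit mapping

print(implicit, explicit, rbf_kernel(x, z))
```

The two polynomial results agree up to rounding, which is the point of the kernel: the optimization only ever needs inner products, so the high-dimensional mapping never has to be computed.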



Input variables

• term structure of interest rates (TS)
• short-term interest rate (ST)
• long-term interest rate (LT)
• consumer price index (CPI)
• industrial production (IP)

Economic growth is closely related to Japanese exports. The largest export target for Japan is the United States of America (USA), which is the leading economy in the world. Therefore, the economic condition of the USA influences the Japanese economy.

• The S&P 500 Index is a well-known indicator of the economic condition in the USA
• The exchange rate of US Dollars against Japanese Yen (JPY)



-> The behavior of the NIKKEI 225 Index, the S&P 500 Index, and the Japanese Yen is very complex. It is impossible to give an explicit formula describing the underlying relationship between them.


Experiment design and results Data collection

• Source: index data from the finance section of Yahoo; exchange rates from the Pacific Exchange Rate Service provided by Professor Werner Antweiler, University of British Columbia, Vancouver, Canada.

• Period: from January 1, 1990 to December 31, 2002

• Number of observations: a total of 676 pairs of observations, divided into two parts:
  - The first part (640 pairs of observations) is used to determine the specifications of the models and parameters.
  - The second part (36 pairs of observations) is reserved for out-of-sample evaluation and comparison of performance among the various forecasting models.
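
A minimal sketch of how such observation pairs might be built and split chronologically is shown below. The preprocessing (log differences of weekly values, inputs lagged by one week) is an assumption made for illustration and is not stated on the slides. The function names and the toy series are hypothetical.

```python
import math

def log_returns(prices):
    """First-order differences of log prices (assumed preprocessing step)."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def make_pairs(target, *inputs):
    """Pair input changes at week t-1 with the target's direction at week t.

    Direction is 1 for an upward move and 0 otherwise.
    """
    target_r = log_returns(target)
    input_r = [log_returns(series) for series in inputs]
    pairs = []
    for t in range(1, len(target_r)):
        features = tuple(r[t - 1] for r in input_r)
        direction = 1 if target_r[t] > 0 else 0
        pairs.append((features, direction))
    return pairs

def chronological_split(pairs, n_test):
    """Keep the most recent n_test pairs (n_test > 0) for out-of-sample evaluation."""
    return pairs[:-n_test], pairs[-n_test:]

# Hypothetical weekly closing values, for illustration only.
nikkei = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]
sp500 = [50.0, 51.0, 50.0, 52.0, 53.0, 52.0]
jpy = [120.0, 121.0, 119.0, 118.0, 120.0, 122.0]

pairs = make_pairs(nikkei, sp500, jpy)
in_sample, out_of_sample = chronological_split(pairs, n_test=1)
```

With the 676 pairs described above, `chronological_split(pairs, 36)` would leave 640 pairs for model specification and 36 for out-of-sample evaluation. Splitting by time rather than at random keeps future information out of the training part.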


Experiment design and results Comparison with other forecasting methods

• To evaluate the forecasting ability of SVM, we use the random walk model (RW) as a benchmark for comparison.

• RW is a one-step-ahead forecasting method, since it uses the current actual value to predict the future value: ŷ(t+1) = y(t), where y(t) is the actual value in period t and ŷ(t+1) is the predicted value for the next period.

• We also compare SVM's forecasting performance with that of linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA).
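
The random walk benchmark can be sketched in a few lines. This is an illustration under the assumption that the forecast for the next period simply repeats the current actual value; the function name and the sample series are hypothetical.

```python
def random_walk_forecast(series):
    """One-step-ahead random walk: the forecast for period t+1 is the value at t."""
    return list(series[:-1])

# Hypothetical series of actual values.
actual = [10.0, 10.5, 10.2, 10.8]

forecast = random_walk_forecast(actual)   # forecasts for periods 1..3
errors = [a - f for a, f in zip(actual[1:], forecast)]

print(forecast)
print(errors)
```

Because the forecast carries no information beyond the last observation, RW serves as a floor: a useful model should do better than it.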



Comparison with other forecasting methods

• LDA: This method maximizes the ratio of between-class variance to within-class variance in a given data set, thereby maximizing the separability of the classes.

• QDA: It is similar to LDA, but drops the assumption of equal covariance matrices. Therefore, the boundary between two discrimination regions is allowed to be a quadratic surface.
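
The criterion that LDA maximizes can be illustrated on a single feature. The sketch below computes the ratio of between-class separation to within-class scatter for two classes in one dimension. It is a simplification: full LDA searches for the projection direction that maximizes this ratio. The function name and the sample values are hypothetical.

```python
from statistics import mean, pvariance

def fisher_ratio(class_a, class_b):
    """Squared distance between class means divided by the summed class variances."""
    between = (mean(class_a) - mean(class_b)) ** 2
    within = pvariance(class_a) + pvariance(class_b)
    return between / within

# Hypothetical 1-D feature values for "up" and "down" weeks.
up_weeks = [1.0, 2.0, 3.0]
down_weeks = [-1.0, -2.0, -3.0]
mixed_weeks = [0.0, 1.5, 3.0]

well_separated = fisher_ratio(up_weeks, down_weeks)
poorly_separated = fisher_ratio(up_weeks, mixed_weeks)

print(well_separated, poorly_separated)
```

A high ratio means the classes are far apart relative to their spread, which is the situation in which a linear boundary works well.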


Experiment design and results Combining model

A combining model integrates SVM with the other classification methods through a weighted combination of their scores, where w_i is the weight assigned to classification method i.

-> A well-performing forecasting method should be given a larger weight than the others during the score combination.
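
The combining formula itself did not survive extraction. One plausible reading, offered as an assumption rather than as the exact model used, is a normalized weighted sum of the per-method scores. The scores, weights, threshold, and function names below are hypothetical.

```python
def combine_scores(scores, weights):
    """Weighted average of per-method scores for one observation."""
    if len(scores) != len(weights):
        raise ValueError("one weight per method is required")
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

def predict_direction(scores, weights, threshold=0.5):
    """Predict 1 (up) if the combined score reaches the threshold, else 0 (down)."""
    return 1 if combine_scores(scores, weights) >= threshold else 0

# Hypothetical scores in [0, 1] from three methods (e.g. SVM, LDA, QDA).
scores = [0.8, 0.3, 0.3]

equal_weights = [1.0, 1.0, 1.0]
svm_heavy_weights = [0.6, 0.2, 0.2]   # larger weight for the stronger method

combined = combine_scores(scores, svm_heavy_weights)

print(predict_direction(scores, equal_weights))
print(predict_direction(scores, svm_heavy_weights))
```

In this toy case the prediction flips from down to up once the first method receives a larger weight, which shows how the weighting lets a stronger method dominate the combination.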



• The relative performance of the models is measured by the hit ratio, i.e. the fraction of periods for which the predicted direction matches the actual direction.

Table: Forecasting performance of different classification methods
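
The hit ratio can be computed as in the sketch below. The predicted and actual directions are made-up values for illustration and are not results from the study.

```python
def hit_ratio(predicted, actual):
    """Fraction of periods in which the predicted direction equals the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

# Hypothetical weekly directions (1 = up, 0 = down).
predicted = [1, 0, 1, 1, 0, 1, 0, 0]
actual = [1, 0, 0, 1, 0, 1, 1, 0]

ratio = hit_ratio(predicted, actual)
print(ratio)
```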



RW performs worst. Why?
• All historical information is summarized in the current value.
• Increments (positive or negative) are uncorrelated (random) -> in the long run there are as many positive as negative fluctuations, making long-term predictions other than the trend impossible.

SVM performs best. Why?
• SVM is designed to minimize the structural risk, whereas the previous techniques are usually based on minimization of the empirical risk.
• SVM is usually less vulnerable to the over-fitting problem.

QDA outperforms LDA in terms of hit ratio, because LDA assumes that all classes have equal covariance matrices, which is not consistent with the properties of the input variables belonging to different classes.


Conclusion

• This work studied the use of support vector machines to predict financial movement direction. SVM is a promising tool for financial forecasting.

• SVM is superior to the other individual classification methods in forecasting the weekly movement direction of the NIKKEI 225 Index.

• Each method has its own strengths and weaknesses

• The weakness of one method can be balanced by the strengths of another by achieving a systematic effect

The combining model performs best among all the forecasting methods.


Thank you for your attention