Tunisia Polytechnic School
Data analysis project
Presented by Mohamed DHAOUI (3rd year engineering student)
[email protected]@gmail.com Academic Year: 2015-2016
Forecasting stock market movement direction with support vector machine
PLAN
Problem statement and motivations
How does SVM work?
Experiment design and results
Conclusion
Problem statement and motivations
• The financial market is a complex, evolutionary, and non-linear dynamical system.
• Financial forecasting is characterized by data intensity, noise, non-stationarity, an unstructured nature, a high degree of uncertainty, and hidden relationships.
• Movements in market prices are not random. Rather, they behave in a highly non-linear, dynamic manner.
In this paper, we investigate the predictability of financial movement direction with SVM by forecasting the weekly movement direction of the NIKKEI 225 index.
Financial market
Problem statement and motivations
• Support vector machine (SVM) is a very specific type of learning algorithm characterized by capacity control of the decision function, the use of kernel functions, and the sparsity of the solution.
• SVM has been shown to be very resistant to the over-fitting problem.
• Training an SVM is equivalent to solving a linearly constrained quadratic programming problem, so the solution of SVM is always unique and globally optimal.
Support Vector Machine
Problem statement and motivations
• The NIKKEI 225 Index measures the composite price performance of 225 highly capitalized stocks traded on the Tokyo Stock Exchange (TSE), representing a broad cross-section of Japanese industries.
• There are two basic reasons for the success of these index trading vehicles:
- They provide an effective means for investors to hedge against potential market risks.
- They create new profit-making opportunities for market speculators and arbitrageurs.
NIKKEI 225 index
How does SVM work? Linearly separable data
For a two-class linearly separable learning task, the aim of support vector classification (SVC) is to find a hyperplane that separates the two classes of given samples with a maximal margin.
-> good classification performance
-> guarantees high predictive accuracy for future data
The margin corresponds to the shortest distance from the closest data points to the hyperplane.
-> This smallest distance is called the margin of separation
-> The hyperplane is called the optimal separating hyperplane if the margin is maximized
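The margin of separation can be checked numerically. The sketch below is illustrative only: the four points, their labels and the two candidate hyperplanes are made-up values, and `geometric_margin` is a hypothetical helper, not something from the original study. It computes the smallest signed distance from the data to a hyperplane w·x + b = 0, so two candidates can be compared.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labelled point to the hyperplane w.x + b = 0.

    A positive result means every point lies on the side matching its label.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy, linearly separable data (assumed values for illustration).
points = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]

# Two candidate separating hyperplanes through the origin.
margin_a = geometric_margin((1.0, 1.0), 0.0, points, labels)
margin_b = geometric_margin((1.0, 0.0), 0.0, points, labels)

print(f"margin of x1 + x2 = 0: {margin_a:.3f}")
print(f"margin of x1 = 0:      {margin_b:.3f}")
```

Both hyperplanes separate the two classes, but the first leaves a wider margin, so it is the one a maximal-margin classifier would prefer on this toy data.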
How does SVM work? Linearly separable data
Primal problem
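For reference, a sketch of the standard hard-margin primal problem. The notation is an assumption, not taken from the slides: N training pairs (x_i, y_i) with labels y_i in {-1, +1}, weight vector w and bias b.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, N
```

With the constraints scaled this way, the closest points lie at distance 1/‖w‖ from the hyperplane, so minimizing ‖w‖ is the same as maximizing the margin.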
How does SVM work? Linearly inseparable data
Introducing a new function φ: a feature map from the input space to a (usually high-dimensional) feature space where the data points become linearly separable.
Σ ξ_i: an upper bound on the number of training errors.
C: controls the trade-off between the complexity of the machine and the number of inseparable points.
ξ_i (slack variables): introduced to account for the amount of violation of classification by the classifier.
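The roles described above (a per-point violation, their sum bounding the training errors, and a trade-off constant) can be illustrated numerically. This is a minimal sketch assuming the usual soft-margin form, with slack ξ_i = max(0, 1 - y_i f(x_i)) and objective ½‖w‖² + C Σ ξ_i. The decision values, labels, weight vector and function names are hypothetical.

```python
def slack_variables(scores, labels):
    """xi_i = max(0, 1 - y_i * f(x_i)): how far each point violates the margin."""
    return [max(0.0, 1.0 - y * s) for s, y in zip(scores, labels)]

def soft_margin_objective(w, slacks, C):
    """0.5 * ||w||^2 + C * sum(xi_i): margin term plus penalised violations."""
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)

# Hypothetical decision values f(x_i) and true labels y_i.
scores = [2.0, 0.4, -0.3, -1.5, 0.8]
labels = [1, 1, 1, -1, -1]

slacks = slack_variables(scores, labels)

# A point is misclassified when y_i * f(x_i) <= 0, which forces xi_i >= 1.
errors = sum(1 for s, y in zip(scores, labels) if y * s <= 0)

w = (1.0, 1.0)  # assumed weight vector, only used for the objective value
low_penalty = soft_margin_objective(w, slacks, C=1.0)
high_penalty = soft_margin_objective(w, slacks, C=10.0)

print("slacks:", slacks)
print("training errors:", errors, "<= sum of slacks:", sum(slacks))
```

Every misclassified point contributes a slack of at least 1, which is why the sum of the slacks bounds the number of training errors from above. A larger C makes the same violations cost more, pushing the solution toward fewer inseparable points at the expense of a more complex machine.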
How does SVM work? Linearly inseparable data
-> Introducing the kernel
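A kernel returns the inner product of two points in the feature space without computing the feature map explicitly. The sketch below uses a homogeneous degree-2 polynomial kernel in two dimensions purely as an illustration. The kernel choice, the explicit map `phi` and the sample points are assumptions, and the slides do not say which kernel the study used.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated directly in the input space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # inner product computed in the feature space
implicit = poly_kernel(x, z)     # same quantity via the kernel

print(explicit, implicit)
```

The two values agree up to rounding, which is the point of the kernel: the classifier only ever needs inner products, so the high-dimensional map never has to be built.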
Experiment design and results
• term structure of interest rates (TS)
• short-term interest rate (ST)
• long-term interest rate (LT)
• consumer price index (CPI)
• industrial production (IP)
Economic growth in Japan is closely tied to Japanese exports. Japan's largest export market is the United States of America (USA), the leading economy in the world. Therefore, the economic condition of the USA influences the Japanese economy.
• The S&P 500 Index is a well-known indicator of the economic condition of the USA
• The exchange rate of US Dollars against Japanese Yen (JPY)
Input variables
Experiment design and results
-> The behaviors of the NIKKEI 225 Index, the S&P 500 Index and the Japanese Yen are very complex. It is impossible to give an explicit formula describing the underlying relationship between them.
Experiment design and results Data collection
• Sources: the finance section of Yahoo (index data) and the Pacific Exchange Rate Service provided by Professor Werner Antweiler, University of British Columbia, Vancouver, Canada (exchange rate data).
• Period: from January 1, 1990 to December 31, 2002
• Number of observations: 676 pairs in total, split into two parts:
- The first part (640 pairs of observations) is used to determine the specifications of the models and parameters.
- The second part (36 pairs of observations) is reserved for out-of-sample evaluation and comparison of performance among the various forecasting models.
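The 640 / 36 split can be sketched as follows. This assumes the split is chronological, with the most recent observations held out. The slides only call it "the second part", so the ordering is an assumption, and `chronological_split` and the placeholder data are hypothetical.

```python
def chronological_split(observations, n_test):
    """Keep the last n_test observations for out-of-sample evaluation."""
    if not 0 < n_test < len(observations):
        raise ValueError("n_test must be between 1 and len(observations) - 1")
    return observations[:-n_test], observations[-n_test:]

# Placeholder indices standing in for the 676 weekly pairs of observations.
weekly_pairs = list(range(676))

in_sample, out_of_sample = chronological_split(weekly_pairs, n_test=36)

print(len(in_sample), len(out_of_sample))
```

Holding out the final weeks, rather than a random subset, keeps the evaluation free of information from the future, which matters for time-ordered financial data.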
Experiment design and results Comparison with other forecasting methods
• To evaluate the forecasting ability of SVM, we use the random walk model (RW) as a benchmark for comparison
• RW is a one-step-ahead forecasting method: it uses the current actual value as the prediction of the next value, i.e. $\hat{y}_{t+1} = y_t$
• We also compare the SVM's forecasting performance with that of linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)
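The random walk benchmark simply carries the current value forward one step. A minimal sketch, assuming it is applied to a series of weekly up (+1) / down (-1) labels. The series below is made up and `random_walk_forecast` is a hypothetical helper.

```python
def random_walk_forecast(actual):
    """One-step-ahead RW forecasts: the prediction for period t+1 is the value at t.

    Returns forecasts for periods 1..n-1 (the first period has no forecast).
    """
    return list(actual[:-1])

# Hypothetical weekly movement directions: +1 = up, -1 = down.
weekly_directions = [1, 1, -1, 1, -1, -1, 1]

forecasts = random_walk_forecast(weekly_directions)
targets = weekly_directions[1:]  # the values the forecasts are compared against

for forecast, target in zip(forecasts, targets):
    print(f"forecast {forecast:+d}  actual {target:+d}")
```

Because the forecast uses no information beyond the latest observation, any model that extracts real structure from the inputs should beat it.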
Experiment design and results
• LDA: This method maximizes the ratio of between-class variance to the within-class variance in any particular data set, thereby guaranteeing maximal separability.
• QDA: It is similar to LDA, only dropping the assumption of equal covariance matrices. Therefore, the boundary between two discrimination regions is allowed to be a quadratic surface
Comparison with other forecasting methods
Experiment design and results Combining model
A combining model is built by integrating SVM with other classification methods as a weighted combination of their outputs,
where w_i is the weight assigned to classification method i.
-> A forecasting method that performs well should be given a larger weight than the others during the score combination
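One way to picture the score combination is a weighted vote over the individual direction forecasts. The sketch below is only an assumption about the form: the slides do not show the exact combining formula or how the weights were chosen, so the sign threshold, the weights and the `combine` helper are all hypothetical.

```python
def combine(predictions, weights):
    """Sign of the weighted sum of per-method direction predictions (+1 / -1)."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1

# Hypothetical forecasts for one week, in the order SVM, LDA, QDA.
predictions = [1, -1, -1]

# Assumed weights summing to 1; the better-performing method gets more weight.
performance_weights = [0.6, 0.15, 0.25]
equal_weights = [1 / 3, 1 / 3, 1 / 3]

weighted_call = combine(predictions, performance_weights)
majority_call = combine(predictions, equal_weights)

print(weighted_call, majority_call)
```

With equal weights the two weaker methods outvote SVM, while performance-based weights let the stronger method prevail, which is the behaviour the slide argues for.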
Experiment design and results
• The relative performance of the models is measured by the hit ratio
Table: Forecasting performance of different classification methods
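The hit ratio is the fraction of periods in which the predicted direction matches the actual one. A minimal sketch with made-up predictions (`hit_ratio` is a hypothetical helper, and the values are not results from the study):

```python
def hit_ratio(predicted, actual):
    """Share of periods where the predicted direction equals the actual direction."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

# Hypothetical directions: +1 = up, -1 = down.
predicted = [1, -1, 1, 1]
actual = [1, 1, 1, -1]

ratio = hit_ratio(predicted, actual)
print(f"hit ratio: {ratio:.0%}")
```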
Experiment design and results
RW performs worst. Why?
• All historical information is summarized in the current value
• Increments, positive or negative, are uncorrelated (random) -> in the long run there are as many positive as negative fluctuations, making long-term predictions other than the trend impossible
SVM performs best. Why?
• SVM is designed to minimize the structural risk, whereas the previous techniques are usually based on minimization of the empirical risk
• SVM is usually less vulnerable to the over-fitting problem
QDA outperforms LDA in terms of hit ratio, because LDA assumes that all classes have equal covariance matrices, which is not consistent with the properties of the input variables belonging to different classes.
Conclusion
• This work examines the use of support vector machines to predict financial movement direction. SVM is a promising type of tool for financial forecasting
• SVM is superior to the other individual classification methods in forecasting the weekly movement direction of the NIKKEI 225 Index
• Each method has its own strengths and weaknesses
• The weakness of one method can be balanced by the strengths of another by achieving a systematic effect
The combining model performs best among all the forecasting methods.
Thank you for your attention