Page 1: Preprocessing Data: A Study on Testing Transformations for …kth.diva-portal.org/smash/get/diva2:1334843/FULLTEXT01.pdf · 2019-07-03 · EXAMENSARBETE INOM TEKNIK, GRUNDNIVÅ, 15

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS, STOCKHOLM, SWEDEN 2019

Preprocessing Data: A Study on Testing Transformations for Stationarity of Financial Data

SARA BARWARY

TINA ABAZARI

KTH SCHOOL OF ENGINEERING SCIENCES



Degree Projects in Applied Mathematics and Industrial Economics (15 hp)
Degree Programme in Industrial Engineering and Management (300 hp)
KTH Royal Institute of Technology, 2019
Supervisors at Handelsbanken: Rickard Henricsson, Peyman Dabiri & Cecilia Pettersson
Supervisors at KTH: Camilla Landén, Per Jörgen Säve-Söderbergh & Julia Liljegren
Examiner at KTH: Per Jörgen Säve-Söderbergh


TRITA-SCI-GRU 2019:270 MAT-K 2019:29

Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci


Abstract

In this thesis in Industrial Economics and Applied Mathematics, carried out in cooperation with Svenska Handelsbanken, a set of given transformations was examined in order to assess their ability to make a given time series stationary. In addition, a parameter α belonging to each of the transformation formulas was to be determined. To do this, an extensive study of previous research was conducted, and two different hypothesis tests were used to confirm the output. As a result, a value or an interval for α was chosen for each transformation. Moreover, the first difference transformation was found to have a positive effect on the stationarity of financial data.

Summary

This bachelor’s thesis in Industrial Economics and Applied Mathematics, carried out in cooperation with Handelsbanken, examines given transformations in order to assess their ability to make given time series stationary. In addition, a parameter α belonging to each transformation’s formula was to be determined. To do this, an extensive study of previous research was carried out, and two different hypothesis tests were performed to confirm the output. A result was compiled in which a value or an interval for α was chosen for each transformation. Furthermore, the first difference transformation turned out to be good for the stationarity of financial data.

Keywords

Bachelor Thesis, financial outcome, transformations, stationarity, hypothesis tests, EWMA


1 Preface

This Bachelor’s thesis was written in the spring of 2019 by Sara Barwary and Tina Abazari as part of a five-year Master’s programme in Industrial Engineering and Management at KTH Royal Institute of Technology. The thesis applies theory from mathematical statistics as well as from the field of industrial economics. We would like to thank Cecilia Pettersson, Rickard Henricsson and Peyman Dabiri at Handelsbanken for contributing to the work and providing the resources needed. We would also like to express our appreciation to our supervisor Camilla Landén, and to Per Jörgen Säve-Söderbergh at KTH, for their help and support with the problems we faced throughout the work. Julia Liljegren at the Department of Industrial Engineering and Management also provided valuable input and guidance to the project.


Contents

1 Preface
2 Introduction
2.1 Background
2.2 Research Question
2.3 Goal and Purpose
2.4 Scope and Limitations
3 Economic Theory
3.1 Terminology
3.1.1 Securities
3.1.2 Market Index
3.1.3 Exchange Rates
3.1.4 Commodities
3.1.5 Volatility
3.1.6 Bonds
3.2 Timing of Entry Framework
3.2.1 First Mover Advantages
3.2.2 First Mover Disadvantages
3.3 Porter’s Five Forces
4 Mathematical Theory
4.1 Time Series
4.1.1 The Objectives of Time Series Analysis
4.1.2 Time Series Decomposition
4.1.3 Trends
4.1.4 Seasonality
4.1.5 Stationarity
4.2 Stationarity Hypothesis Testing
4.2.1 Dickey-Fuller Test
4.2.2 Kwiatkowski–Phillips–Schmidt–Shin (KPSS)-Test
4.3 Transformations
4.3.1 Level Transformation
4.3.2 First Difference Transformation
4.3.3 Mean EWMA-transformation
4.3.4 Variance-EWMA Transformation
4.3.5 Skewness EWMA Transformation
4.3.6 Kurtosis-EWMA Transformation
4.3.7 Autocorrelation Transformation
4.3.8 Correlation-EWMA Transformation
5 Methodology
5.1 Data Collection
5.2 Data and Notations
5.2.1 Exchange Rates (FX data)
5.2.2 US Sectors Data
5.2.3 Countries- Stock Index Data
5.2.4 Commodities Data
5.2.5 VIX- Market Volatility Index Data
5.2.6 Bond (IR) Data
5.2.7 Transformations
5.3 Selection of Transformations and Hypothesis Tests
5.4 Selection of Market Entry Frameworks
5.5 Literature Study
5.6 Procedure of Work
6 Results
6.1 First Trial: Plots for Currency Rates, with Fixed α
6.1.1 Statistics for First Trial
6.2 Second Trial: Plots for Commodity, with a Fixed α
6.2.1 Statistics for Second Trial
6.3 Third Trial: Plots for Commodity Prices, with a Fixed α
6.3.1 Statistics with trial 3
6.4 Seasonality and Trends
6.5 Skewness and Kurtosis
6.6 First Differences on all Data
6.7 Finding the Optimal α
6.7.1 Currencies
6.7.2 US-Sectors
6.7.3 Countries Index
6.7.4 Commodities
6.7.5 VIX (Market Volatility)
6.7.6 IR (Bonds)
6.7.7 Aggregated α
7 Conclusions
7.1 Interpretation and Impact
7.1.1 Trial 1
7.1.2 Trial 2
7.1.3 Trial 3
7.1.4 Skewness and Kurtosis
7.1.5 First Difference as a Transformation
7.1.6 Finding the Optimal α
7.2 Analysis of Timing of Entry and Competitive Rivalry
7.3 Future Work
7.4 Benefits for SHB and its Stakeholders
7.5 Final Words

2 Introduction

2.1 Background

Over the last couple of years, machine learning based forecasting has gained increasing attention and become more established. Moreover, using machine learning to predict financial outcomes has become attractive to financial institutions and private investors.1 There is ongoing discussion and research on how to improve these prediction models, as well as on how to pre-process input data in order to obtain highly accurate predictions.2 Machine learning has become essential in this context because it combines computer science and mathematics to develop models aimed at maximal predictive precision.

Predictions of financial outcomes, for example security prices or market indices, involve a time component, since future price movements may depend on past values. Thus, the time dimension needs to be taken into account when using a machine learning based prediction model. Prices in the financial market can be seen as observations at points in time, so financial prices over a time period can be described as a time series. As mentioned, the interest in using machine learning to predict price movements in the financial market has grown. Consequently, time series forecasting has become an increasingly important area of machine learning.3

The underlying assumption in time series forecasting and in the related machine learning methods is that the input data is a stationary process, that is, that statistical properties such as the mean, variance and autocorrelation of the time series do not change over time.4 However, most data is not stationary.
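The definition can be made concrete with a small, informal check. The Python sketch below is an illustration only, not one of the hypothesis tests used in the thesis: it simulates a white-noise series (stationary) and a random walk built from the same shocks (non-stationary), and compares the sample mean, variance and lag-1 autocorrelation of the first and second half of each series. The simulated data, the seed and the split-half heuristic are all assumptions chosen for illustration.

```python
import random
import statistics


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation: lagged cross-products over the sum of squares."""
    mean = statistics.fmean(xs)
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, len(xs)))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def split_half_summary(xs):
    """(mean, variance, lag-1 autocorrelation) for the first and second half of xs."""
    half = len(xs) // 2
    return [
        (statistics.fmean(part), statistics.variance(part), lag1_autocorr(part))
        for part in (xs[:half], xs[half:])
    ]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]  # i.i.d. shocks: stationary

walk = []  # cumulative sum of the same shocks: a random walk, not stationary
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

for name, series in (("white noise", noise), ("random walk", walk)):
    for label, (m, v, r1) in zip(("first half", "second half"), split_half_summary(series)):
        print(f"{name:11s} {label:11s} mean={m:8.2f} var={v:8.2f} lag1={r1:5.2f}")
```

For the white noise the three statistics stay close across the two halves; for the random walk the mean and variance typically drift and the lag-1 autocorrelation sits near one, which is the kind of behaviour the formal tests in Section 4.2 are designed to detect.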

1 Sarlin, Peter; Björk, Kaj-Mikael. ”Machine learning in finance”. Neurocomputing, Vol. 264, 2017, pp. 1-88. (Retrieved 2019-02-02)

2 Palaniappan, Vivek. ”Using Machine Learning to Predict Stock Prices”, 2018-10-31. https://medium.com/analytics-vidhya/using-machine-learning-to-predict-stock-prices-c4d0b23b029a (Retrieved 2019-02-02)

3 Brownlee, Jason. ”What is Time Series Forecasting?”. Machine Learning Mastery, 2016-12-2. https://machinelearningmastery.com/time-series-forecasting/?fbclid=IwAR1Zpv80x-4EEN-IIo-h1HL5fGHF6fD-OZYpknScLWdmU-p3uJ803ZF9Ag (Retrieved 2019-05-01)

4 Lindgren, George. ”Stationary stochastic processes”, pp. 13-16. http://www.math.chalmers.se/ rootzen/fintid/stationary120312.pdf (Retrieved 2019-02-02)


The longer the time span of historical observations, the greater the probability that the time series shows non-stationary characteristics.5

For many machine learning methods, non-stationary data sets are a challenge, since they can increase the risk of obtaining predictions that differ significantly from the real outcomes. Non-stationarity arises when data show trends, seasonal effects, cycles and other structures that depend on the time of observation. Such series therefore cannot be analyzed with traditional techniques that assume stationarity, and forecasting them may require more complex models. To obtain more reliable output from a prediction model, effects such as seasonal components and trends may need to be removed from the input data set.6 It is possible to make data stationary, or at least approximately stationary, by using mathematical transformations.
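As a minimal sketch of the simplest such transformation, the first difference y_t = x_t - x_{t-1}, the Python code below shows two textbook cases on simulated data: differencing a random walk recovers its independent shocks, and differencing a linear trend leaves a constant. The data and function name are illustrative assumptions; the thesis’s own definition of the transformation is given in Section 4.3.2.

```python
import random


def first_difference(xs):
    """Return y_t = x_t - x_{t-1}; the result is one observation shorter than xs."""
    return [curr - prev for prev, curr in zip(xs, xs[1:])]


random.seed(1)
steps = [random.gauss(0.0, 1.0) for _ in range(500)]  # simulated i.i.d. shocks e_t

walk = [0.0]
for e in steps:
    walk.append(walk[-1] + e)  # x_t = x_{t-1} + e_t: a random walk

diffs = first_difference(walk)
# Differencing undoes the cumulative sum, so diffs matches steps up to float rounding.
print(max(abs(d - e) for d, e in zip(diffs, steps)))

trend = [3.0 + 0.5 * t for t in range(10)]  # deterministic linear trend
print(first_difference(trend))  # nine copies of 0.5: the trend becomes a constant
```

Differencing removes a stochastic trend (the random walk) and a linear deterministic trend, but it does not by itself remove seasonality or changing variance, which is one reason the thesis also considers EWMA-based transformations.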

In recent months, Svenska Handelsbanken AB (SHB) has been discussing a market entry with new financial products. The idea is to predict the returns of securities with a machine learning based model on which future products can be based.

Rickard Henricsson at SHB conducted research ten years ago on mathematical transformations and their ability to generate stationary financial data. While considering this potential business idea, SHB has raised the question of whether the transformations are still applicable to today’s data. Henricsson identified several transformations, including both established ones and his own approximations, which were derived with the aim of reducing the complexity of some of the transformations. His studies resulted in seven chosen transformations:

• Differencing (First order)
• Exponentially weighted moving average: Mean
• Exponentially weighted moving average: Variance
• Exponentially weighted moving average: Skewness
• Exponentially weighted moving average: Kurtosis
• Exponentially weighted moving average: Autocorrelation
• Exponentially weighted moving average: Correlation

5 Adhikari, Ratnadip et al. ”An Introductory Study on Time Series Modeling and Forecasting”, pp. 16-19. https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-02-19)

6 Kang, Eugine. ”Time Series: Check Stationarity”, 2018-08-26. https://medium.com/@kangeugine/time-series-check-stationarity-1bee9085da05 (Retrieved 2019-02-23)

These transformations are defined and explained in more detail in the theoretical background, Section 4. Except for the first difference transformation, all of them depend on an unknown constant α. Changing the value of α changes the output of each transformation, so the choice of α may affect whether the data can be made stationary. This has led SHB to take an interest in which specific values of α can potentially make financial time series stationary.
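To see how α changes the output, the sketch below uses one common form of the exponentially weighted moving average, m_t = α x_t + (1 - α) m_{t-1} with m_0 = x_0. This parameterisation is an assumption for illustration: the exact formulas, including whether α weights the newest or the oldest information, are those in Section 4.3 and may differ. The variance, skewness and other EWMA transformations apply the same kind of recursive weighting to other statistics. Applied to a hypothetical series that jumps from 0 to 10, a large α follows the jump almost immediately, while a small α reacts slowly.

```python
def ewma_mean(xs, alpha):
    """EWMA with m_0 = x_0 and m_t = alpha * x_t + (1 - alpha) * m_{t-1}.

    Assumed convention: 0 < alpha <= 1, larger alpha = more weight on the newest value.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    out = [xs[0]]
    for x in xs[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


step = [0.0] * 5 + [10.0] * 5  # hypothetical series with a level shift at t = 5

for alpha in (1.0, 0.5, 0.1):
    print(alpha, [round(v, 3) for v in ewma_mean(step, alpha)])
# alpha = 1.0 reproduces the series; alpha = 0.5 gives 5.0, 7.5, 8.75, ... after the jump
```

Because each choice of α yields a different transformed series, the stationarity tests have to be repeated for every candidate α, which is why the thesis searches for a value or an interval of α per transformation.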

Furthermore, SHB is one of the biggest banks in the Nordic countries. In the Nordic financial sector, few commercial players currently provide financial products based on machine learning predictions of financial outcomes. Consequently, SHB has the potential to be among the first players in this area, and it is therefore of interest for SHB to understand how the timing of market entry can affect its business.

2.2 Research Question

The work of this thesis was done in cooperation with SHB. The main aim is to examine whether financial data can be transformed to become stationary, and for what value or values of the parameter α stationarity is achieved. The time span of all the data sets is 2001-01-01 to 2018-12-31. The main research questions to be answered are consequently the following:

1. Are the given transformations sufficient to make the data stationary?

2. Which value or values of the parameter α for each transformation can potentially make the data stationary?

In addition, the effects of the timing of entry into a new market will be discussed. More precisely: given that the transformations can make financial data stationary and that SHB can develop financial products based on machine learning predictions of financial outcomes, how will the timing of potential new product launches affect SHB’s competitive advantage?

2.3 Goal and Purpose

Predicting financial outcomes for securities is relevant for private investors as well as for financial institutions. The goal of this thesis is to examine how financial data may be pre-processed in order to make it useful as input data to a prediction model.

The main goal for SHB is to separately develop, based on the results of this thesis, a machine learning based forecasting model for predicting financial outcomes. More precisely, the model will indicate future price movements in the financial market, mainly for stocks in well-developed countries such as the US, for example US stock indices such as the Dow Jones Industrial Average or the S&P 500.

Since SHB is currently in the initial phase of model development, it is important for them to know whether the input data for the future model can be made stationary, and this research aims to provide insight into that question. If the data cannot be made stationary, SHB may need to consider further research on building a model on non-stationary data. Alternatively, this thesis can indicate whether SHB needs to conduct further research on how to make data stationary. The broader purpose of this study is therefore to give direction to SHB’s future work.

In the market entry discussion, the market considered for SHB is limited to banking institutions in Sweden and the Nordics, since SHB’s main business activity lies within this area.


2.4 Scope and Limitations

The scope of this thesis is limited to examining the transformations given by SHB. The data is also provided by SHB and relates mainly to the financial markets of the US and other well-developed countries. Moreover, it is necessary to determine what qualifies as stationarity, since there exist both a strict (strong) and a weak form. For this research, it was decided that it is sufficient for a time series to fulfil the requirements of weak stationarity, since verifying strict stationarity for a whole data set is complex.7 The difference between these types of stationarity is explained in Section 4.1.5. Finally, the project is limited to two hypothesis tests, both chosen by SHB. They were chosen because they are based on different model assumptions and hypotheses, and may therefore give a broader perspective in the analysis of the results.
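For reference, the two notions can be written in the standard textbook form below for a process {X_t}; the notation is assumed here and may differ from that used in Section 4.1.5.

```latex
\textbf{Strict stationarity:}\quad
(X_{t_1}, \dots, X_{t_n}) \overset{d}{=} (X_{t_1+h}, \dots, X_{t_n+h})
\quad \text{for all } n,\ t_1, \dots, t_n,\ h.

\textbf{Weak stationarity:}\quad
\mathbb{E}[X_t] = \mu, \qquad
\mathbb{E}[X_t^2] < \infty, \qquad
\operatorname{Cov}(X_t, X_{t+h}) = \gamma(h) \quad \text{for all } t, h.
```

Weak stationarity only constrains the first two moments, which is why it is easier to assess on a finite data set than strict stationarity, which concerns the full joint distribution.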

7 ”Stationarity Differencing”. https://www.statisticshowto.datasciencecentral.com/stationarity/ (Retrieved 2019-03-02)


3 Economic Theory

This section explains terminology related to the financial market that is mentioned in the thesis or used as input data in the research, in order to make the content of the thesis easier to understand. Theory regarding stocks, bonds and other financial assets is provided to explain why they are important to look at when studying an economy. Moreover, Porter’s Five Forces model is introduced and discussed, as well as the advantages and disadvantages of being a first mover in a market.

3.1 Terminology

3.1.1 Securities

A security is a financial asset that can be traded. There are several types of securities, and these are generally classified as equity securities, debt securities and derivatives.

Equity securities represent ownership in an entity. The most common equity security is a stock, which represents ownership of a share in a company.8

The issuer of a debt security borrows money that must later be repaid to the holder. When a debt security is issued, its terms are specified, for example the size of the loan, the maturity date and the interest rate. Corporate bonds and government bonds are two common examples of debt securities.9

Derivatives are contracts between two or more parties whose value is based on an underlying asset, such as a stock, an interest rate or a market index. There are various types of derivatives, such as options and futures.10

8 Kenton, Will. ”Security”, 2019-05-20. https://www.investopedia.com/terms/s/security.asp (Retrieved 2019-05-22)

9 Chen, James. ”Debt Security”, 2019-03-23. https://www.investopedia.com/terms/d/debtsecurity.asp (Retrieved 2019-05-20)

10 Chen, James. ”What is a Derivative?”, 2019-05-19. https://www.investopedia.com/ask/answers/12/derivative.asp (Retrieved 2019-05-22)


3.1.2 Market Index

A market index is a measurement of a segment of the financial market. More precisely, the index shows the performance of the securities within the chosen segment. It is computed from the prices of those securities, and there are several weighting methods for determining the impact of each price.11
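Two common weighting methods are price weighting, used for example by the Dow Jones Industrial Average, and market-capitalisation weighting, used for example by the S&P 500. The Python sketch below contrasts them on two hypothetical stocks; the tickers, prices, share counts, divisor and base value are all made up for illustration, and real indices add adjustments (divisor changes, free-float factors) that are omitted here.

```python
def price_weighted(prices, divisor):
    """Price-weighted level: sum of constituent prices divided by a divisor."""
    return sum(prices.values()) / divisor


def cap_weighted(prices, shares, base_cap, base_level=100.0):
    """Cap-weighted level: total market value relative to a base-period market value."""
    market_cap = sum(prices[ticker] * shares[ticker] for ticker in prices)
    return base_level * market_cap / base_cap


shares = {"AAA": 1_000, "BBB": 50_000}  # hypothetical shares outstanding
before = {"AAA": 100.0, "BBB": 20.0}    # hypothetical prices
after = {"AAA": 110.0, "BBB": 20.0}     # AAA rises 10%, BBB unchanged
base_cap = 1_000_000.0                  # hypothetical base-period market value

for label, prices in (("before", before), ("after", after)):
    print(label, price_weighted(prices, divisor=2.0), cap_weighted(prices, shares, base_cap))
# before 60.0 110.0
# after 65.0 111.0
```

The same 10% move in AAA lifts the price-weighted level by about 8.3% but the cap-weighted level by only about 0.9%, because AAA has a high share price but a small total market value.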

3.1.3 Exchange Rates

An exchange rate is the value of one country’s or economic zone’s currency expressed in terms of the currency of another country or economic zone. The exchange rate is one of the most important indicators of a country’s economic health relative to others, and it is vital to a country’s level of trade and to its financial flows.12 Movements in the exchange rate influence the decisions of businesses, governments and individuals. Collectively, this may affect activity in the financial markets, for example how people trade and how securities are valued.13

3.1.4 Commodities

Commodities are basic goods used in commerce and as inputs in the production of other goods and services. Their prices are usually determined by the market as a whole. A commodity can be anything from raw materials to chemicals. Commodities are most commonly bought and sold through futures contracts that standardize the quantity and minimum quality of the commodity being traded. The commodities market is important since it offers a marketplace where members can transact business, and it establishes regulated trading with rules and regulations. Moreover, it serves as a place for collecting and disseminating market information and for grading commodities according to quality.14

11 Young, Julie. ”Market Index”, 2019-05-02. https://www.investopedia.com/terms/m/marketindex.asp (Retrieved 2019-05-22)

12 Twin, Alexandra. ”6 Factors that Influence Exchange Rate”, 2019-05-20. https://www.investopedia.com/trading/factors-influence-exchange-rates/ (Retrieved 2019-05-20)

13 Hamilton, Adam. ”Understanding Exchange Rates and Why They Are Important”, 2018. https://www.rba.gov.au/publications/bulletin/2018/dec/pdf/understanding-exchange-rates-and-why-they-are-important.pdf (Retrieved 2019-05-20)

One example used in the thesis is the spot price of crude oil, which is considered one of the most important commodities in the world. Since today’s society and economy depend on non-renewable fossil fuels, crude oil plays an important role in the commodities market. The price of a barrel of crude oil is determined by the global market, more precisely by supply and demand. For example, if demand for crude oil is high and supply is low, oil prices will rise. Because oil prices are volatile, predicting them is important for economists and other experts. The price of oil can affect the costs of goods and services in the economy, directly or indirectly through multiple steps, which can result in inflation. West Texas Intermediate (WTI) crude oil is considered one of the major crude oil benchmarks.15

3.1.5 Volatility

Volatility is the standard deviation of the return of an asset. The standard deviation is the square root of the variance, and both measure the variability of a return.
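A minimal sketch of this definition, assuming simple returns r_t = P_t / P_{t-1} - 1 on a hypothetical price series. Log returns are a common alternative, and the scaling by the square root of 252 trading days per year is a market convention assumed here, not something stated in the thesis.

```python
import math
import statistics


def simple_returns(prices):
    """r_t = P_t / P_{t-1} - 1 for each pair of consecutive prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def volatility(prices, periods_per_year=None):
    """Sample standard deviation of returns; optionally scaled by sqrt(periods_per_year)."""
    sd = statistics.stdev(simple_returns(prices))  # square root of the sample variance
    if periods_per_year is None:
        return sd
    return sd * math.sqrt(periods_per_year)


prices = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0]  # hypothetical daily closing prices
print(f"per-period volatility: {volatility(prices):.4f}")
print(f"annualised (assumed 252 periods): {volatility(prices, periods_per_year=252):.4f}")
```

The square-root scaling assumes returns are uncorrelated across periods, so that variances add over time.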

Volatility is an indicator of the risk level of an asset, for instance a security, a portfolio or a market. The price of a highly volatile asset is expected to be more challenging to predict, and volatile assets are consequently viewed as riskier than less volatile ones. In short, volatility is considered the risk related to changes in the asset’s price.

The VIX Index is an example of a market volatility measure; it reflects the market’s expectation of 30-day volatility of the S&P 500, derived from index option prices. Before making an investment decision, investors often look at VIX values to gain insight into market risk.16

14 Lioudis, Nick. ”Commodities Trading: An Overview”, 2018-05-18. https://www.investopedia.com/investing/commodities-trading-overview/ (Retrieved 2019-05-20)

15 Premkumar, Divya. ”How do oil prices affect stock market”, 2019-01-08. https://www.tradebrains.in/how-do-oil-prices-affect-the-stock-market/ (Retrieved 2019-05-01)

16 Kuepper, Justin. ”Volatility Definition”, 2019-04-18. https://www.investopedia.com/terms/v/volatility.asp (Retrieved 2019-05-01)


3.1.6 Bonds

A bond is a fixed income instrument that represents a loan made by an investor to a borrower. When companies, governments or other institutions need to finance new projects or ongoing operations, they can issue bonds directly to investors. The borrower, that is, the issuer of the bond, specifies the terms of the loan, for example the interest payments and the maturity date. The interest payment, called the coupon, is the bondholders’ compensation for lending their funds, and the interest rate that determines the payment is called the coupon rate. A government bond is a bond issued by a government. The Treasury yield is the return on investment on the U.S. government’s debt obligations. It is important when analysing stocks since it tends to signal investor confidence: when confidence is high, bond prices drop and yields increase, since investors believe they can find investments with higher returns elsewhere. When confidence is low, the opposite occurs.
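The inverse relation between a bond’s price and its yield follows from discounting: the price is the present value of the coupons and the face value, discounted at the yield. A minimal sketch, assuming a plain fixed-coupon bond with annual payments, no default risk and no accrued interest; all figures are hypothetical.

```python
def bond_price(face, coupon_rate, yield_rate, years):
    """Price of an annual-coupon bond: coupons and face value discounted at the yield."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1.0 + yield_rate) ** k for k in range(1, years + 1))
    pv_face = face / (1.0 + yield_rate) ** years
    return pv_coupons + pv_face


for y in (0.04, 0.05, 0.06):
    print(f"yield {y:.0%}: price {bond_price(100.0, 0.05, y, 10):.2f}")
```

With a 5% coupon the bond is priced at par when the yield is 5%, above par at 4% and below par at 6%: a rising yield means a falling price.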

Bonds affect the amount of liquidity in an economy, since they influence how easy or difficult it is, for example, to take loans and buy on credit. Because bonds are so strongly related to the economy, they are important for forecasting: bond yields indicate what investors expect the economy to do.17

3.2 Timing of Entry Framework

When firms are about to enter a new market, either by launching a new product or by expanding to new regions, one main concern is when to enter. Entrants are usually divided into three categories depending on their time of entry: first movers, early followers and late entrants. Earlier research has given contradictory answers to the question of which entry timing strategy is optimal, and why.

First movers are the first to bring a new good or service to the market and sell it. Early followers are relatively early to the market, even though they are not the first to enter. Lastly, late entrants enter when a product is becoming, or has become, commercialized, in other words when it gains mass-market penetration.

17 Amadeo, Kimberly. ”How Bonds Affect the U.S. Economy”, 2019-01-20. https://www.thebalance.com/how-do-bonds-affect-the-us-economy-3305601 (Retrieved 2019-05-01)

3.2.1 First Mover Advantages

The theory of timing of entry also covers the advantages and disadvantages of being the first mover. According to theory, the first mover can gain brand loyalty and technological leadership. Additionally, first movers have more time in the market, enabling them to gain more market share. This could eventually result in a winner-takes-all market, since the company may be perceived as a technological innovator and gain a reputation as a leader. Being first also enables the player to shape the characteristics of the technology, for instance its features and functionality, as well as its pricing.

Firms that enter the market early can capture important resources such as key locations, government permits, patents on the technology and access to distribution channels, and they can develop relationships with suppliers. Another advantage of being early is the ability to exploit buyer switching costs. If buyers have invested time in a technology and face switching costs when changing to a superior one, the first mover that captures them may be able to keep them as customers. If the industry encourages the adoption of a dominant design, the timing of entry can be critical to a firm's likelihood of success.

3.2.2 First Mover Disadvantages

Studies have shown that many first movers face higher costs, which reduce the profits of their businesses. Becoming the first mover may require considerable investment in research and development. Late entrants, on the other hand, can use the existing work, technology and knowledge developed by the first mover to create a similar product. They can also adapt their product or service development to customers' preferences instead of facing uncertainty about customer requirements. As a result, they can avoid high development expenses.


Another disadvantage is that newly developed technologies may require other technologies or components produced by other firms. This makes the first mover dependent on the efforts of those firms, so first movers cannot rely on enabling technologies being available. Moreover, when firms introduce new technologies and innovations, appropriate suppliers or distributors often do not yet exist. The firm may then have to assist suppliers or even develop its own, which is a time- and resource-demanding task.

3.3 Porter’s Five Forces

Porter's Five Forces Framework, developed by Michael Porter, is a tool for analyzing the market dynamics and competition facing a business. The purpose of the model is to identify and analyze five competitive forces that shape every industry and help determine an industry's weaknesses and strengths. The insights are often used to assess whether new product or service offerings can be profitable. The model may also be used to answer strategic questions such as how, where and when a market entry should be made. The five forces are:

- the threat of new entrants,
- the bargaining power of suppliers,
- the bargaining power of customers,
- the threat of substitute products, and
- competitive rivalry.

Together, the first four forces shape the fifth, competitive rivalry.


Figure 3.1: Porter's five forces model and important questions to answer during the analysis


4 Mathematical Theory

This section presents the mathematical theory and models used in the thesis, together with the assumptions on which the models are based.

4.1 Time Series

A time series is a series of data points measured over a time period and indexed in time order. In other words, it consists of the values taken by a variable over time, in chronological order.18 The time series is denoted $\{X_t\}$, $t = 0, 1, 2, \dots$, where $t$ represents time and each $X_t$ is seen as a random variable. Time series can be discrete, as here, or continuous, with $t \in [0, \infty)$.

4.1.1 The Objectives of Time Series Analysis

The primary objective of time series analysis is to develop mathematical models that describe the data sample, in order to extract meaningful statistics and characteristics of the data. In general, time series analysis has two main goals:

1. Identifying the nature of the phenomenon: what does the series contain?

2. Forecasting, in other words predicting future values of the time series variable.

Both goals require identifying the pattern observed in the time series. Once identified, the pattern can be interpreted and integrated with other data in a forecasting model.19

18"Time Series" http://www.businessdictionary.com/definition/time-series.html (Retrieved 2019-01-30)

19”Time Series” https://www.stat.ncsu.edu/people/bloomfield/courses/st730/slides/SnS-01-2.pdf (Retrieved 2019-02-02)


4.1.2 Time Series Decomposition

Within time series analysis, one can decompose a time series into several components. Let $\{X_t\}$ be a sequence of random variables. The time series can then be decomposed either additively as

$$X_t = T_t + S_t + \epsilon_t$$

or multiplicatively as

$$X_t = T_t \cdot S_t \cdot \epsilon_t$$

where $T_t$ is the trend component at time $t$, $S_t$ is the seasonal component at time $t$ and $\epsilon_t$ is an irregular component at time $t$.20
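The additive model can be illustrated with a classical decomposition in plain Python. This is a minimal sketch under stated assumptions, not the procedure used in the thesis. It assumes an odd seasonal period $p$ and estimates $T_t$ with a centred moving average of length $p$. It then estimates $S_t$ as the average detrended value at each position of the seasonal cycle, shifted to sum to zero, and takes $\epsilon_t$ as the remainder. The function name `additive_decompose` is hypothetical. Libraries such as statsmodels (`seasonal_decompose`) offer comparable, more complete routines.

```python
from statistics import fmean
from typing import List, Optional, Tuple


def additive_decompose(
    x: List[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Classical additive decomposition X_t = T_t + S_t + eps_t.

    Sketch assumptions: odd period (so the centred moving average is
    symmetric) and at least 2 * period - 1 observations. Trend and residual
    are None where the moving-average window does not fit.
    """
    if period < 3 or period % 2 == 0:
        raise ValueError("this sketch assumes an odd period >= 3")
    n, half = len(x), period // 2
    if n < 2 * period - 1:
        raise ValueError("need at least 2 * period - 1 observations")

    # Trend T_t: centred moving average over one full seasonal cycle.
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        trend[t] = fmean(x[t - half : t + half + 1])

    # Seasonal S_t: mean detrended value at each position in the cycle,
    # shifted so that the seasonal effects sum to zero over one period.
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t, tr in enumerate(trend):
        if tr is not None:
            buckets[t % period].append(x[t] - tr)
    raw = [fmean(b) for b in buckets]
    offset = fmean(raw)
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(n)]

    # Irregular eps_t: what remains after removing trend and seasonality.
    resid: List[Optional[float]] = [
        None if tr is None else x[t] - tr - seasonal[t]
        for t, tr in enumerate(trend)
    ]
    return trend, seasonal, resid
```

For a noise-free series built as a linear trend plus a zero-sum seasonal pattern, the sketch recovers both components and leaves residuals of essentially zero.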

Over a long period, a time series may show a general tendency to decrease, increase or stagnate. This is represented by the trend component of a decomposition. The seasonal component exhibits patterns driven by seasonal factors such as the day of the week or the quarter of the year. The period of the seasonality is fixed and known. The irregular component captures events that do not occur regularly and are unpredictable in character.21 It corresponds to the residual obtained after the trend and seasonality have been removed; that is, $\epsilon_t$ is a random noise component. Additionally, $\epsilon_t$ is stationary, at least in the weak sense described in Section 4.1.5.22

4.1.3 Trends

Usually one wants to know whether there is a trend in the time series, to support future forecasting. In some cases a trend is seen as the accumulated effect of certain factors. In other cases it indicates an influence that needs further investigation. The trend could, for example, be linear, exponential or a mix of different types.23

20Adhikari, Ratnadip et al. "An Introductory Study on Time Series Modeling and Forecasting" https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-02-16)

21Adhikari, Ratnadip et al. "An Introductory Study on Time Series Modeling and Forecasting" p. 12-18 https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-03-23)

22Brockwell, J Peter. Davis, A Richard. "Introduction to Time Series and Forecasting", p. 20. Third ed, Springer

4.1.4 Seasonality

In time series data, seasonality is the presence of variations that recur at specific regular intervals, for example every autumn. Identifying or removing seasonal components can give a clearer relationship between input and output variables. It can also provide information that helps improve model performance.24

4.1.5 Stationarity

A stationarity assumption is equivalent to saying that the generating mechanism of the process is itself time-invariant, so that neither the form nor the parameter values of the generating procedure change over time. A process $\{X_t\}$, $t \in \mathbb{Z}$ (where $\mathbb{Z}$ is the set of integers), is defined to be weakly stationary if it satisfies

1. $E[X_t] = \mu$ for all $t$,

2. $\mathrm{Var}[X_t] = \sigma_X^2 < \infty$ for all $t$,

3. $\gamma_X(s, t) = \gamma_X(s + h, t + h)$ for all $s, t, h \in \mathbb{Z}$, where $\gamma_X$ is the autocovariance function.

In other words, a stationary stochastic process has a mean and variance that do not change over time. Moreover, the autocovariance depends only on the distance between the time points and not on time itself. The autocovariance is the covariance between the values of the process at two points in time.25 There is also a more restrictive definition of stationarity. A time series $\{X_t, t = 0, \pm 1, \pm 2, \dots\}$ is strictly stationary if $(X_{t_1}, \dots, X_{t_n})$ has the same joint probability distribution as $(X_{t_1+h}, \dots, X_{t_n+h})$, that is,

$$(X_{t_1}, \dots, X_{t_n}) \overset{d}{=} (X_{t_1+h}, \dots, X_{t_n+h})$$

for all integers $h$ and $n > 0$ and all time points $t_1, \dots, t_n$.26

23Deshpande, Bala. 2014-03-12 "Time series forecasting: understanding trend and seasonality" http://www.simafore.com/blog/bid/205420/Time-series-forecasting-understanding-trend-and-seasonality (Retrieved 2019-05-01)

24Brownlee, Jason. 2016-12-23 "How to Identify and Remove Seasonality from Time Series Data with Python" https://machinelearningmastery.com/time-series-seasonality-with-python/ (Retrieved 2019-04-14)

25A. Lincoln. Introduction to the theory of time series, Chapter 1 p.4-6

Stationarity is of great importance. If a time series is non-stationary, this can strongly influence both its behaviour and its properties, so conclusions drawn from a regression on the data points are hard to justify. Also, if the variables in a regression model are not stationary, the assumptions underlying the asymptotic analysis may not be valid.27 Non-stationary time series exhibit trends, seasonal effects and other structures that depend on the time of observation.28 A time series is usually non-deterministic, so what occurs in the future cannot be predicted with certainty. Assuming stationarity therefore reduces the complexity of forecasting the future.29

There are several approaches for checking stationarity. The most common are examining plots and using statistical tests.30 One can produce a sequence of plots and examine them for obvious trends or seasonal effects. Summary statistics, which summarize a set of observations while communicating as much of the information as possible, can also be used. The data is partitioned into intervals, and one checks whether there are obvious or significant differences in the summary statistics between them. Statistical tests provide a method for making quantitative decisions about a particular sample.
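The partitioning check described above can be sketched in plain Python. The sketch splits the series into $k$ consecutive intervals of roughly equal length and reports the mean and variance of each. The function name `interval_summary` and the use of the population variance are assumptions made for illustration. This is an informal heuristic, not a formal test. A simulated random walk will typically show drifting interval means, whereas its first difference will not.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def interval_summary(x: List[float], k: int) -> List[Tuple[float, float]]:
    """Split x into k consecutive intervals and return (mean, variance) for each.

    Large differences between intervals hint at non-stationarity; this is an
    informal heuristic check, not a formal hypothesis test.
    """
    if not 1 <= k <= len(x):
        raise ValueError("k must be between 1 and len(x)")
    size = len(x) // k
    stats = []
    for j in range(k):
        # The last interval absorbs any leftover observations.
        chunk = x[j * size : (j + 1) * size] if j < k - 1 else x[j * size :]
        stats.append((fmean(chunk), pvariance(chunk)))
    return stats


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    walk, level = [], 0.0
    for _ in range(300):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    diffs = [b - a for a, b in zip(walk, walk[1:])]
    print("random walk:", interval_summary(walk, 3))
    print("first difference:", interval_summary(diffs, 3))
```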

26Brockwell, J Peter. Davis, A Richard. "Introduction to Time Series and Forecasting", p.13. Third ed, Springer

27Ryabko, Daniil. "Asymptotic Nonparametric Statistical Analysis of Stationary Time Series", 2019-03-30 https://arxiv.org/abs/1904.00173 (Retrieved 2019-05-01)

28Kang, Eugine. "Time Series: Check Stationarity", 2018-08-26. https://medium.com/@kangeugine/time-series-check-stationarity-1bee9085da05 (Retrieved 2019-02-23)

29Adhikari, Ratnadip et al. "An Introductory Study on Time Series Modeling and Forecasting" p. 12-18 https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf (Retrieved 2019-03-23)

30"Tests of Stationarity" https://people.maths.bris.ac.uk/ magpn/Research/LSTS/TOS.html (Retrieved 2019-02-12)


Figure 4.1: A non-stationary time series: a random walk that has not been adjusted.

Figure 4.2: The same data after stationarity has been obtained with the first difference transformation. The differenced series fluctuates around a constant level, indicating stationarity.


4.2 Stationarity Hypothesis Testing

As mentioned in the limitations of this project, we only use two stationarity tests. These hypothesis tests give an indication of whether a time series is stationary, but they cannot be used as proof of stationarity. Failing to reject the null hypothesis does not confirm it. A non-significant result only means that the alternative hypothesis is not a strong competitor to the null hypothesis. Moreover, there can in general be many other null hypotheses that would not have been rejected either.31

4.2.1 Dickey-Fuller Test

A commonly used method for checking for the existence of a unit root is the Dickey-Fuller test, developed by David Dickey and Wayne Fuller (1979). The Dickey-Fuller hypothesis test gives an indication of whether a process is stationary or not.32 It does so by testing whether the process follows a unit root process. The original Dickey-Fuller (DF) test is only valid for AR(1) processes. An AR(1) process is an autoregressive process of the first order, in which the current value is based on the immediately preceding value.33 The augmented Dickey-Fuller (ADF) test extends the DF test to higher-order correlation by including lagged differences of the series in the test regression. Like the original DF test, the ADF test tests for a unit root in a time series sample. The primary difference is that the ADF test can be used for larger and more complicated time series models.34 If there is higher-order correlation rather than only an AR(1) process, the augmented version must be used.

The test evaluates the null hypothesis that a unit root is present against the alternative hypothesis that there is no unit root, which indicates that the data is stationary.

31"Hypotesprövning" (Hypothesis testing) http://gauss.stat.su.se/gu/sg/2012VT/Kompendium/KAP17new.pdf (Retrieved 2019-05-03)

32 ”ADF — Augmented DickeyFuller Test ” https://www.statisticshowto.datasciencecentral.com/adf-augmented-dickey-fuller-test/ (Retrieved 2019-03-15)

33Pantelis, Anastasios. 2008. "Testing for unit roots in the presence of structural change" http://lup.lub.lu.se/luur/download?func=downloadFilerecordOId=1338330fileOId=1646631 (Retrieved 2019-03-09)

34”The Augmented Dickey-Fuller Test” https://www.thoughtco.com/the-augmented-dickey-fuller-test-1145985 (Retrieved 2019-02-27)


Consider the first-order autoregressive model

$$X_t = \delta + \theta X_{t-1} + \epsilon_t,$$

where $\theta = 1$ corresponds to a unit root and $\epsilon_t$ is a white noise process with zero mean and constant variance. In a stationary AR(1) process, the constant term can be expressed as $\delta = (1 - \theta)\mu$, where $\mu$ is the mean of the series.

The null hypothesis of a unit root is $\theta = 1$, which, through $\delta = (1 - \theta)\mu$, also implies $\delta = 0$. Testing the null hypothesis therefore involves both $\theta = 1$ and $\delta = 0$. Since this is difficult to test directly, the model is rewritten by subtracting $X_{t-1}$ from both sides:

$$\Delta X_t = \delta + (\theta - 1)X_{t-1} + \epsilon_t = \delta + \pi X_{t-1} + \epsilon_t, \qquad \pi = \theta - 1.$$

The null hypothesis $\theta - 1 = 0$ is thus equivalent to $\pi = 0$, and the hypotheses are formulated as

$$H_0: \pi = 0 \qquad H_1: \pi < 0$$

Once the hypotheses are established, the Dickey-Fuller test performs a t-type test of $H_0$ with the test statistic

$$\hat{\tau} = \frac{\hat{\theta} - 1}{SE(\hat{\theta})} = \frac{\hat{\pi}}{SE(\hat{\pi})}.$$

The statistic is compared with a critical value, which is a point in the test distribution. Under $H_0$ the statistic does not follow the standard t-distribution, so the critical values are taken from the Dickey-Fuller distribution. $H_0$ is rejected when $\hat{\tau}$ is below the critical value.35

When performing the ADF test, a p-value < 0.05 indicates strong evidence against the null hypothesis of a unit root. The unit root is therefore rejected in favour of stationarity. If instead the p-value ≥ 0.05, the evidence against the null hypothesis is weak. The unit root then cannot be rejected, and the time series is treated as non-stationary.

35Verbeek, Marno. "A Guide to Modern Econometrics" 2014, 2nd Edition, p.265-268
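The regression behind the test can be made concrete with a short sketch. It estimates $\Delta X_t = \delta + \pi X_{t-1} + \epsilon_t$ by ordinary least squares and returns $\hat{\tau} = \hat{\pi} / SE(\hat{\pi})$. This is the plain DF regression without lagged differences, so it is not a full ADF test. The function name is hypothetical. The comparison value of about -2.86, the asymptotic 5% Dickey-Fuller critical value for a regression with a constant, is used for illustration only. In practice, `adfuller` in `statsmodels.tsa.stattools` computes the augmented test together with p-values.

```python
import math
import random
from typing import List


def dickey_fuller_tau(x: List[float]) -> float:
    """DF t-statistic for pi in  dX_t = delta + pi * X_{t-1} + eps_t  (OLS)."""
    if len(x) < 4:
        raise ValueError("need at least 4 observations")
    y = [b - a for a, b in zip(x, x[1:])]  # Delta X_t
    z = x[:-1]                             # X_{t-1}
    n = len(y)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    pi_hat = szy / szz                     # OLS slope, i.e. theta_hat - 1
    delta_hat = y_bar - pi_hat * z_bar     # OLS intercept
    # Residual variance with n - 2 degrees of freedom (two estimated parameters).
    rss = sum((yi - delta_hat - pi_hat * zi) ** 2 for zi, yi in zip(z, y))
    se_pi = math.sqrt(rss / (n - 2) / szz)
    return pi_hat / se_pi


if __name__ == "__main__":
    rng = random.Random(42)
    noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
    walk, level = [], 0.0
    for e in noise:
        level += e
        walk.append(level)
    for name, series in [("white noise", noise), ("random walk", walk)]:
        tau = dickey_fuller_tau(series)
        verdict = "reject unit root" if tau < -2.86 else "cannot reject unit root"
        print(f"{name}: tau = {tau:.2f} -> {verdict}")
```

White noise should give a strongly negative $\hat{\tau}$. A random walk will usually stay above the critical value, so its unit root is not rejected.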

4.2.2 Kwiatkowski–Phillips–Schmidt–Shin (KPSS)-Test

The KPSS test is a test of the stationarity hypothesis proposed by Kwiatkowski, Phillips, Schmidt and Shin (1990). Like the Dickey-Fuller test, the KPSS test gives an indication of whether a unit root exists or the process is stationary.36 Unlike the Dickey-Fuller test, however, it takes stationarity as the null hypothesis and a unit root as the alternative.

Let $X_t$, $t = 1, 2, \dots, T$, be a time series of observed values. Assume that the series can be decomposed into a deterministic trend, a random walk and a stationary error. The data generating process (DGP) of $X_t$ in the KPSS test can then be defined as

$$X_t = Y_t + \epsilon_t + \xi_t,$$

where $Y_t$ is the deterministic trend term, $\epsilon_t$ is the error term and $\xi_t$ is the random walk term, so that

$$\xi_t = \xi_{t-1} + \eta_t.$$

By definition of the random walk, $\eta_t \sim \text{iid}(0, \sigma^2)$.37 If $\sigma^2 = 0$, meaning that the variance of $\eta_t$ is zero, then

$$\xi_t = \xi_{t-1}.$$

That is, the random walk reduces to a constant term and $X_t$ becomes trend-stationary, meaning that the series fluctuates around the deterministic trend.

36"What is a Critical Value?", 2019. https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/basic-statistics/inference/supporting-topics/basics/what-is-a-critical-value/ (Retrieved 2019-05-04)

37Nabeya, Seiji et al. "Asymptotic Theory of a Test for the Constancy of Regression Coefficients Against the Random Walk alternative" 1987. https://projecteuclid.org/download/pdf1/euclid.aos/1176350701?fbclid =IwAR2Rt2XpMITexA880DiEC4qzo8V EjzmA7HjMKNyp3mKSoKSAXhOaY Ff85c (Retrieved 2019-04-30)

Consequently, the hypotheses can be formulated as

$$H_0: \sigma^2 = 0 \qquad H_1: \sigma^2 > 0$$

Under the null hypothesis the process is trend-stationary. The alternative hypothesis implies that $X_t$, $t = 1, 2, \dots, T$, is a unit root process.38 To reduce complexity, the deterministic trend of the series may also be removed, $Y_t = 0$. In this special case the null hypothesis is that $X_t$ is level-stationary, that is, stationary around a level or mean ($\xi_0$) instead of around a trend. The mean then no longer depends on $t$.39 A statistic that can be used to test the null hypothesis is the LM statistic, defined as

$$LM = \sum_{t=1}^{T} \frac{S_t^2}{\hat{\sigma}^2}, \qquad S_t = \sum_{i=1}^{t} e_i.$$

That is, $S_t^2$ is the squared partial sum of the residuals $e_i$ from a regression of $X$ on the deterministic term, and $\hat{\sigma}^2$ is the estimated error variance from that regression. When testing for trend stationarity, $e_t$, $t = 1, 2, \dots, T$, are the residuals from a regression of $X$ on a time trend and an intercept. When testing for level stationarity, the residual is instead defined as

$$e_i = X_i - \bar{X},$$

which corresponds to a regression of $X$ only on an intercept.40

38Cappuccio, Nunzio et al. "The Fragility of the KPSS Stationarity Test" 2009. http://leonardo3.dse.univr.it/home/workingpapers/fragilitykpss.pdf?fbclid =IwAR0snLcQCpmgyNCMq0eR9JgXXwFW3hnIZykKcv72IbZO7t57goM9d1W4xGI (Retrieved 2019-04-30)

39Journal of Econometrics "Testing the null hypothesis of stationarity against the alternative of a unit root" 1991. http://debis.deu.edu.tr/userweb//onder.hanedar/dosyalar/kpss.pdf?fbclid=IwAR3uwIVD3WTB1T865Kv3ZotZ3iBaM9nEuq44dIpRr1ULvrVTgvHefVQqwG8 (Retrieved 2019-04-30)
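A minimal sketch of the level-stationarity version follows. It computes the residuals $e_i = X_i - \bar{X}$, the partial sums $S_t$, and $\hat{\sigma}^2$ taken as the mean squared residual. It returns $LM / T^2$, the scaling under which KPSS tabulate their critical values (about 0.463 at the 5% level for level stationarity). Unlike the full KPSS test, it applies no long-run, autocorrelation-robust variance correction, so treat it as illustrative. The function name is hypothetical. In practice, `kpss` in `statsmodels.tsa.stattools` implements the complete test.

```python
from typing import List


def kpss_level_stat(x: List[float]) -> float:
    """KPSS-type statistic for level stationarity without a long-run variance
    correction: (1 / T^2) * sum_t S_t^2 / sigma2, with S_t the partial sums of
    e_i = X_i - mean(X) and sigma2 the mean squared residual.
    """
    T = len(x)
    if T < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(x) / T
    e = [xi - mean for xi in x]           # residuals from regression on intercept
    sigma2 = sum(ei * ei for ei in e) / T
    if sigma2 == 0:
        raise ValueError("series is constant")
    partial, lm = 0.0, 0.0
    for ei in e:
        partial += ei                     # S_t = e_1 + ... + e_t
        lm += partial * partial           # accumulate S_t^2
    lm /= sigma2                          # LM statistic as defined in the text
    return lm / T**2
```

An alternating series such as `[1.0, -1.0] * 50` gives 0.005, far below 0.463, so level stationarity is not rejected. The linear trend `1, 2, ..., 100` gives about 10, so level stationarity is clearly rejected.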

4.3 Transformations

This section presents the theory behind the transformations that Henricsson found to be relevant in his research, and discusses their purpose. Data transformation is a process in which information or data is converted from one format to another. Here, the goal is to transform non-stationary data into stationary data. The following variables are introduced to describe the equations.

The data is measured on the range $(t_0, \dots, t, \dots, t_{max})$ and consists of $T$ elements. The dataset $X$ is an $N \times T$ matrix containing the $N$ variable vectors $(x_1, x_2, \dots, x_N)$, where $x_i = (x_{i,t_0}, \dots, x_{i,t}, \dots, x_{i,t_{max}})$. For a given point in time $t$ and a specific variable $i$, a number of approximations of transformations are presented below.

Most of the transformations depend on a rate of decay $\alpha$. It can be varied to produce a suitable number of variants of each transformation, and it may need to be estimated. In general, the new forecast after the transformation follows the pattern

$$\text{NewForecast} = \alpha \cdot \text{NewData} + (1 - \alpha) \cdot \text{MostRecentForecast}$$

The choice of $\alpha$ thus determines how much weight the new forecast places on new data and how much on the past.41 Previous studies have suggested that $\alpha$ should be below 0.3 for a smoothing result.42
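Unrolling this recursion shows that the observation $k$ steps in the past receives weight $\alpha(1-\alpha)^k$, so $\alpha$ determines how quickly the past is forgotten. The sketch below computes these weights and the half-life $\log(0.5)/\log(1-\alpha)$, the number of steps after which an observation's weight has halved. The function names are hypothetical, and this is an illustration rather than part of the thesis method. At $\alpha = 0.3$, for example, the half-life is roughly two steps.

```python
import math
from typing import List


def ewma_weights(alpha: float, n_lags: int) -> List[float]:
    """Weight alpha * (1 - alpha)**k that the recursion
    NewForecast = alpha * NewData + (1 - alpha) * MostRecentForecast
    implicitly places on the observation k steps in the past."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return [alpha * (1.0 - alpha) ** k for k in range(n_lags)]


def half_life(alpha: float) -> float:
    """Number of steps after which an observation's weight has halved."""
    return math.log(0.5) / math.log(1.0 - alpha)


if __name__ == "__main__":
    for a in (0.1, 0.3, 0.8):
        w = ewma_weights(a, 5)
        print(f"alpha={a}: first weights {[round(v, 3) for v in w]}, "
              f"half-life {half_life(a):.2f} steps")
```

The weights sum to one as the number of lags grows, which is why the recursion behaves like a weighted average of past data.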

40Journal of Econometrics "Testing the null hypothesis of stationarity against the alternative of a unit root" 1991. http://debis.deu.edu.tr/userweb//onder.hanedar/dosyalar/kpss.pdf?fbclid=IwAR3uwIVD3WTB1T865Kv3ZotZ3iBaM9nEuq44dIpRr1ULvrVTgvHefVQqwG8 (Retrieved 2019-04-30)

41Ragnarstrom, Elsa. "How to calculate forecast accuracy for stocked items with a lumpy demands", 2015. https://www.diva-portal.org/smash/get/diva2:901177/FULLTEXT01.pdf (Retrieved 2019-05-03)

42”How To Identify Patterns in Time Series Data: Time Series Analysis”


4.3.1 Level Transformation

Let $\{X_t, t = 0, 1, 2, \dots\}$ be a time series. The level transformation is then defined as

$$F^1_{i,t} = X_{i,\bar{t}}, \qquad \bar{t} = \max\{t_j : t_j \le t\},$$

where $\bar{t}$ is the largest observed time point in the sample up to time $t$, that is, the time of the latest observation. In other words, if there are missing values, the most recent available value is used.
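In code, the level transformation amounts to a forward fill: each time point takes the most recent observed value at or before it. The sketch below assumes that missing observations are represented as `None`. It leaves leading gaps, before any observation has arrived, as `None`. Both conventions and the function name are illustrative assumptions. With pandas, `Series.ffill()` performs the same operation on `NaN` values.

```python
from typing import List, Optional


def level_transform(x: List[Optional[float]]) -> List[Optional[float]]:
    """F1_{i,t} = X_{i, t_bar} with t_bar = max(t_j <= t): carry the latest
    observed value forward over missing (None) entries."""
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for value in x:
        if value is not None:
            last = value      # a new observation becomes the latest one
        out.append(last)      # stays None until the first observation arrives
    return out
```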

4.3.2 First Difference Transformation

The first difference at time $t$, $F^2_{i,t}$, is obtained from the change between the observation at time $t$ and the observation at the previous time step, $t-1$, in the original series.43 The first difference transformation is defined as

$$F^2_{i,t} = X_{i,\bar{t}} - X_{i,\bar{t}-1}$$

A commonly encountered non-stationary behaviour is that the level of the process changes while its variability remains homogeneous. In such cases, taking the first difference may lead to stationarity.44 In time series analysis, differencing is frequently used to remove the dependence on time, which may include structures such as trend and seasonality.
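The first difference transformation can be sketched on data that has already been level-transformed (forward-filled). The function returns $X_t - X_{t-1}$, so the output is one element shorter than the input, and leading gaps stay `None`. The helper `random_walk` is a hypothetical simulation in the spirit of Figures 4.1 and 4.2. Differencing it recovers the increments up to floating-point rounding. All names are illustrative assumptions.

```python
import random
from typing import List, Optional


def first_difference(levels: List[Optional[float]]) -> List[Optional[float]]:
    """F2_{i,t} = X_{i,t_bar} - X_{i,t_bar-1} on an already level-transformed
    series; the result has len(levels) - 1 elements."""
    return [
        None if prev is None or cur is None else cur - prev
        for prev, cur in zip(levels, levels[1:])
    ]


def random_walk(n: int, seed: int = 0) -> List[float]:
    """Simulate X_t = X_{t-1} + e_t with standard normal e_t (illustration)."""
    rng = random.Random(seed)
    walk, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk
```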

http://www.statsoft.com/Textbook/Time-Series-Analysis (Retrieved 2019-05-03)

43Kulahci, Murat et al. "Time Series Analysis and Forecasting by Example", 2011. p. 90

44Bisgaard, S., Kulahci, M. "Time Series Analysis and Forecasting", 2017-06-22. https://www.vividcortex.com/blog/exponential-smoothing-for-time-series-forecasting?fbclid=IwAR2XCtbMASHciBFEIRrpRkVvJda6ziKVJ3qCirAQJ3Oc3GsNBk5VZ4xLd0Q (Retrieved 2019-02-18)


4.3.3 Mean EWMA-transformation

An exponentially weighted moving average (EWMA) is a type of moving average that places greater weight and significance on the most recent data points. For example, it can be assumed that a security's price depends mostly on recent prices rather than on historical data from long ago. The previous value of the EWMA is taken into account when calculating the next value, and, as the name indicates, the weights are based on the exponential function.45 This is a very popular scheme for producing a smoothed time series.46 In general, for a time series $\{X_t\}$, the smoothed version is

$$S_t = \alpha X_t + (1 - \alpha) S_{t-1}$$

The EWMA mean transformation is in this case defined as

$$F^3_{i,t} = (1 - \alpha) F^3_{i,t-1} + \alpha F^2_{i,t}$$
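The EWMA mean transformation can be sketched as a single pass over the first differences $F^2$, where `alpha` is the rate of decay discussed above. The starting value $F^3_0 = F^2_0$ and the function name are assumptions, since the text does not specify an initialisation; starting from zero is another common choice. For comparison, pandas' `Series.ewm(alpha=..., adjust=False).mean()` uses the same recursion and the same starting value.

```python
from typing import List


def ewma_mean(f2: List[float], alpha: float) -> List[float]:
    """F3_t = (1 - alpha) * F3_{t-1} + alpha * F2_t, initialised with F3_0 = F2_0
    (an assumed starting value)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    for t, change in enumerate(f2):
        out.append(change if t == 0 else (1.0 - alpha) * out[-1] + alpha * change)
    return out
```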

4.3.4 Variance-EWMA Transformation

As mentioned, exponentially weighted moving averages are often used to smooth irregular fluctuations in a time series, so that patterns over a specific time period are easier to find. Applying the same exponential weighting to the squared deviation of the first difference from its EWMA mean gives the EWMA variance transformation:

$$F^4_{i,t} = (1 - \alpha) F^4_{i,t-1} + \alpha \left(F^2_{i,t} - F^3_{i,t}\right)^2$$

With the EWMA variance, a future variance is estimated as a weighted average of past variances.47

45"Exponentially Weighted Moving Average" https://www.value-at-risk.net/exponentially-weighted-moving-average-ewma (Retrieved 2019-03-02)

46Jinka, Preetam. "Exponential Smoothing for Time Series Forecasting", 2017-06-22. https://www.vividcortex.com/blog/exponential-smoothing-for-time-series-forecasting?fbclid=IwAR2XCtbMASHciBFEIRrpRkVvJda6ziKVJ3qCirAQJ3Oc3GsNBk5VZ4xLd0Q (Retrieved 2019-02-18)

4.3.5 Skewness EWMA Transformation

This transformation measures the skewness of the change in the variable and uses it to transform the data. The formula used is

$$F^5_{i,t} = (1 - \alpha) F^5_{i,t-1} + \alpha \left(F^2_{i,t} - F^3_{i,t}\right)^3$$

4.3.6 Kurtosis-EWMA Transformation

This transformation measures the kurtosis of the change in the variable:

$$F^6_{i,t} = (1 - \alpha) F^6_{i,t-1} + \alpha \left(F^2_{i,t} - F^3_{i,t}\right)^4$$
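The variance, skewness and kurtosis transformations share one recursion with powers 2, 3 and 4 of the deviation $F^2_{i,t} - F^3_{i,t}$. The sketch below takes the first differences `f2` and the EWMA means `f3` as inputs and updates all three recursions together. Starting every recursion at zero is an assumption, since the text does not state initial values, and the function name is hypothetical. As defined, $F^5$ and $F^6$ are EWMA estimates of the third and fourth central moments, not skewness and kurtosis coefficients normalised by the variance.

```python
from typing import Dict, List


def ewma_central_moments(
    f2: List[float], f3: List[float], alpha: float
) -> Dict[str, List[float]]:
    """F4 (variance), F5 (skewness) and F6 (kurtosis) EWMA transformations:
    F_k,t = (1 - alpha) * F_k,t-1 + alpha * (F2_t - F3_t) ** p, with p = 2, 3, 4.
    All recursions start from 0 before the first observation (an assumption).
    """
    if len(f2) != len(f3):
        raise ValueError("f2 and f3 must have the same length")
    powers = {"F4": 2, "F5": 3, "F6": 4}
    out: Dict[str, List[float]] = {name: [] for name in powers}
    prev = {name: 0.0 for name in powers}
    for change, mean in zip(f2, f3):
        dev = change - mean               # deviation from the EWMA mean
        for name, p in powers.items():
            prev[name] = (1.0 - alpha) * prev[name] + alpha * dev**p
            out[name].append(prev[name])
    return out
```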

4.3.7 Autocorrelation Transformation

In probability theory and statistics, the autocorrelation of a stochastic process is a number that represents the similarity between a time series and a lagged version of itself over successive time intervals. In other words, it is calculated like the correlation between two different time series, here the current values of the series and its past values. It takes values between -1 and 1. A positive autocorrelation means that above-average values tend to be followed by above-average values.48 First, the EWMA autocovariance is calculated by the following formula

47Breaking Down Finance. "EXPONENTIALLY MOVING AVERAGE VOLATILITY (EWMA)". https://breakingdownfinance.com/finance-topics/risk-management/ewma/ (Retrieved 2019-05-03)

48Kenton, Will. "Autocorrelation", 2019-03-31. https://www.investopedia.com/terms/a/autocorrelation.asp (Retrieved 2019-04-13)


$$F^7_{i,t} = (1 - \alpha) F^7_{i,t-1} + \alpha \left(F^2_{i,t} - F^3_{i,t}\right)\left(F^2_{i,t-1} - F^3_{i,t-1}\right)$$

Normally, the autocovariance function between times $t_1$ and $t_2$ for $X_t$ is defined as

$$\gamma_X(t_1, t_2) = \mathrm{Cov}(X_{t_1}, X_{t_2})$$

and the autocorrelation is defined as

$$\varphi_{X,X}(t_1, t_2) = \frac{\gamma_X(t_1, t_2)}{\sigma_{t_1} \sigma_{t_2}}$$

where $\sigma_t^2$ is the variance at time $t$.49 To obtain the EWMA autocorrelation between $t_1 = t$ and $t_2 = t - 1$, the standard variances are replaced with the corresponding EWMA variances and the EWMA autocovariance is used. The formula is hence

$$\text{EWMA autocorr}_t = \frac{F^7_{i,t}}{\sqrt{F^4_{i,t}} \sqrt{F^4_{i,t-1}}}$$
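The EWMA autocorrelation can be sketched in plain Python from precomputed sequences `f2`, `f3` and `f4`, which hold the first differences, EWMA means and EWMA variances. The function updates $F^7$ with the formula above and divides by the EWMA standard deviations at $t$ and $t-1$. Several choices are assumptions rather than part of the text: $F^7$ starts at zero, entry 0 is `None` because no lagged value exists yet, and entries with zero EWMA variance are also `None`. The function name is likewise an assumption.

```python
import math
from typing import List, Optional


def ewma_autocorrelation(
    f2: List[float], f3: List[float], f4: List[float], alpha: float
) -> List[Optional[float]]:
    """EWMA lag-1 autocorrelation F7_t / (sqrt(F4_t) * sqrt(F4_{t-1})), where
    F7_t = (1 - alpha) * F7_{t-1} + alpha * (F2_t - F3_t) * (F2_{t-1} - F3_{t-1}).
    F7 starts at 0 and entry 0 is None (both assumptions).
    """
    if not len(f2) == len(f3) == len(f4):
        raise ValueError("inputs must have the same length")
    out: List[Optional[float]] = [None] if f2 else []
    f7 = 0.0
    for t in range(1, len(f2)):
        cross = (f2[t] - f3[t]) * (f2[t - 1] - f3[t - 1])
        f7 = (1.0 - alpha) * f7 + alpha * cross       # EWMA autocovariance
        denom = math.sqrt(f4[t]) * math.sqrt(f4[t - 1])
        out.append(f7 / denom if denom > 0 else None)
    return out
```

An alternating sequence of changes yields a negative autocorrelation, as expected.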

4.3.8 Correlation-EWMA Transformation

In probability theory, correlation measures the degree to which two time series move in relation to each other. As in the autocorrelation case, a positive correlation indicates that when one series moves up, the other tends to follow.50 Let $\{X_t, t = 0, 1, 2, \dots\}$ be a time series representing one set of observed data, and let $\{Y_t, t = 0, 1, 2, \dots\}$ be another time series representing another set of observed data.

To begin with, the EWMA covariance is calculated by the formula

$$F^8_{i,j,t} = (1 - \alpha) F^8_{i,j,t-1} + \alpha \left(F^2_{i,t} - F^3_{i,t}\right)\left(F^2_{j,t-1} - F^3_{j,t-1}\right)$$

49Kulahci, Murat et al. "Time Series Analysis and Forecasting by Example", 2011 p.62

50Hayes, Adam. "Correlation", 2019-04-30. https://www.investopedia.com/terms/c/correlation.asp (Retrieved 2019-05-01)


where the indices $i$ and $j$ correspond to $X_t$ and $Y_t$, respectively. In general, the covariance between two random variables $X$ and $Y$ is denoted $\mathrm{Cov}(X, Y)$, and the correlation between them is defined as

$$\varphi_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X^2$ is the variance of $X$ and $\sigma_Y^2$ is the variance of $Y$.51 Using the EWMA covariance and replacing the standard variances with the corresponding EWMA variances, the EWMA correlation is formulated as

$$\text{EWMA corr}_t = \frac{F^8_{i,j,t}}{\sqrt{F^4_{i,t}} \sqrt{F^4_{j,t}}}$$
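The EWMA correlation between two series $i$ and $j$ can be sketched the same way, given precomputed $F^2$, $F^3$ and $F^4$ sequences for each series. The sketch follows the formulas in this section exactly as written, including the one-step lag on series $j$ in the covariance term. If a contemporaneous correlation were intended, `f2_j[t - 1] - f3_j[t - 1]` would become `f2_j[t] - f3_j[t]`. The zero starting value for $F^8$, returning `None` at $t = 0$ and when a variance is zero, and the function name are assumptions.

```python
import math
from typing import List, Optional


def ewma_correlation(
    f2_i: List[float], f3_i: List[float], f4_i: List[float],
    f2_j: List[float], f3_j: List[float], f4_j: List[float],
    alpha: float,
) -> List[Optional[float]]:
    """EWMA correlation F8_{i,j,t} / (sqrt(F4_{i,t}) * sqrt(F4_{j,t})), with
    F8_{i,j,t} = (1 - alpha) * F8_{i,j,t-1}
                 + alpha * (F2_{i,t} - F3_{i,t}) * (F2_{j,t-1} - F3_{j,t-1}).
    """
    n = len(f2_i)
    if any(len(s) != n for s in (f3_i, f4_i, f2_j, f3_j, f4_j)):
        raise ValueError("all inputs must have the same length")
    out: List[Optional[float]] = [None] if n else []
    f8 = 0.0  # assumed starting value
    for t in range(1, n):
        cross = (f2_i[t] - f3_i[t]) * (f2_j[t - 1] - f3_j[t - 1])
        f8 = (1.0 - alpha) * f8 + alpha * cross          # EWMA covariance
        denom = math.sqrt(f4_i[t]) * math.sqrt(f4_j[t])
        out.append(f8 / denom if denom > 0 else None)
    return out
```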

51Kulahci, Murat et al. ”Time Series Analysis and Forecasting by Example”, 2011 p.62


5 Methodology

The project was limited to two tools: the programming language Python and the spreadsheet program Microsoft Excel. These tools were chosen because they handle time series data easily and can perform all the required hypothesis tests and transformations.52

5.1 Data Collection

The data was provided by SHB and consisted of various security prices and indices. It covered the period from 2001-01-01 to 2018-12-31 and was recorded daily, in order to capture real trends and seasonality in the time series. The data concerned US-related securities, such as US sector stock indices, US Treasury bonds, exchange rates against the US dollar and more. Processing this type of data may form the basis for SHB to predict future outcomes of the US stock market. For example, future values of US stock market indices such as the Dow Jones Index or the S&P 500 may potentially be forecast by a prediction model once the data has been preprocessed. This was an area of interest for SHB.

The data is considered quantitative since it contains only numbers.53 Qualitative data was also used, in the form of discussions with professionals experienced in data preprocessing. These covered, for example, how to interpret results and how to better understand the chosen data.

5.2 Data and Notations

This section describes the data and notation used in this thesis.

52Brownlee, Jason. ”How to Check if Time Series Data is Stationary with Python”, 2016-12-30 https://machinelearningmastery.com/time-series-data-stationary-python/ (Retrieved 2019-03-09)

53”Collecting Data” http://betterthesis.dk/research-methods/lesson-1different-approaches-to-research/collecting-data (Retrieved 2019-02-09)


5.2.1 Exchange Rates (FX data)

An exchange rate shows the value of one unit of a currency relative to another currency in the foreign exchange market.54 In the rest of this report, a currency pair Currency1Currency2 represents the price, in currency 2, of one unit of currency 1. The currency pairs used as FX data are EURUSD, GBPUSD, AUDUSD, NZDUSD, USDCAD, USDCHF, USDJPY, USDNOK and USDSEK.
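Under this quoting convention, the reciprocal of a quote gives the inverted pair, and two USD quotes can be combined into a cross rate. The sketch below illustrates this arithmetic with made-up numbers, not data from the thesis. The helper names are hypothetical, and bid/ask spreads are ignored.

```python
def invert(rate: float) -> float:
    """Price of currency 1 in units of currency 2 -> the inverted pair,
    e.g. USDSEK (SEK per USD) -> SEKUSD (USD per SEK)."""
    if rate <= 0:
        raise ValueError("exchange rates must be positive")
    return 1.0 / rate


def cross_via_usd(base_usd: float, usd_quote: float) -> float:
    """Cross rate from BASEUSD and USDQUOTE, e.g.
    EURSEK = EURUSD * USDSEK (SEK per EUR = USD per EUR * SEK per USD)."""
    return base_usd * usd_quote


if __name__ == "__main__":
    eurusd, usdsek = 1.10, 9.50     # illustrative values only
    print("SEKUSD:", round(invert(usdsek), 4))
    print("EURSEK:", round(cross_via_usd(eurusd, usdsek), 4))
```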

5.2.2 US Sectors Data

The sector data consists of indices, each describing the performance of a chosen sector in the United States. The indices are designed by Morgan Stanley Capital International (MSCI) and cover securities in the large and mid cap segment within each sector. MSCI is a provider of security indices and performance analytics.55 The classification of the securities follows the Global Industry Classification Standard (GICS®).56 The notations for the sectors are MXUS0EN (Energy), MXUS0MT (Materials), MXUS0IN (Industrials), MXUS0CD (Consumer Discretionary), MXUS0CS (Consumer Staples), MXUS0HC (Health Care), MXUS0FN (Financials), MXUS0IT (Information Technology), MXUS0TC (Telecom Services) and MXUS0UT (Utilities).

5.2.3 Countries- Stock Index Data

The country (and region) indices used are MXDE (Germany), MXEU (Europe), MXGB (United Kingdom), MXFR (France), MXCH (Switzerland), MXES (Spain), MXIT (Italy) and MXUS (the United States). Each index is developed by MSCI and is used as a measure of stock market performance for the large and mid cap stocks of a particular country or region.57

54Investopedia, ”Currency Pair Definition”. https://www.investopedia.com/terms/c/currencypair.asp (Retrieved 2019-05-04)

55”Index solutions”. MSCI. https://www.msci.com/index-solutions (Retrieved 2019-05-18)

56”MSCI USA MATERIALS INDEX”. MSCI, 2019-04-30. https://www.msci.com/documents/10199/6ce4617e-9127-480f-8f3b-1fdf4c0c8962 (Retrieved 2019-05-03)

5.2.4 Commodities Data

The commodity data consists of WTI crude oil prices (denoted CL1 Comdty) and the spot price of one troy ounce of gold in US dollars (denoted XAUUSD).58

5.2.5 VIX- Market Volatility Index Data

The two market volatility data sets used are denoted VIX Index and V2X Index. The VIX Index, also known as the Chicago Board Options Exchange (CBOE) Volatility Index, is a real-time market index reflecting the market's expectation of volatility: it is a 30-day forward-looking volatility measure based on S&P 500 index options.59 The V2X Index is based on real-time prices of EURO STOXX 50 Index options and corresponds to the market's expectation of the two-month forward-looking volatility.60

5.2.6 Bond (IR) Data

The indices used were USGG30YR Index, USGG10YR Index, USGG2YR Index and CSI BARC Index. A USGGXYR Index, where X is a number, denotes a United States government bond index with a maturity of X years from issuance. It measures the generic X-year government yield for US Treasury issues; for example, USGG10YR represents the index of the 10-year US government bond.61

57”MSCI US Index”. MSCI, 2019-04-30. https://www.msci.com/documents/10199/67a768a1-71d0-4bd0-8d7e-f7b53e8d0d9f (Retrieved 2019-05-03)

58”XAUUSD”. Traders Trust. https://traders-trust.com/instrument/xauusd-gold-spot-vs-us-dollar/ (Retrieved 2019-05-03)

59”CBOE Volatility Index (VIX) Definition”. Investopedia. https://www.investopedia.com/terms/v/vix.asp (Retrieved 2019-05-03)

60”EURO STOXX® 50 VOLATILITY (VSTOXX) INDEX”. STOXX, 2019-03-29. https://www.stoxx.com/document/Bookmarks/CurrentFactsheets/V2TX.pdf (Retrieved 2019-05-03)

61Investment & Finance. ”USGG10YR”, 2014-02-07. https://www.investment-and-finance.net/finance/u/usgg10yr.html (Retrieved 2019-05-02)


The CSI BARC Index represents the spread between the yield-to-worst (YTW) of the Barclays Capital US Corporate High Yield Bond Index and the 10-year Treasury yield.62

5.2.7 Transformations

To keep the results and conclusions short and concise, the following notation will be used for the transformations:

• First difference transformation: Transformation 1

• EWMA-mean transformation: Transformation 2

• EWMA-variance transformation: Transformation 3

• Autocorrelation transformation: Transformation 4

• Correlation-EWMA transformation: Transformation 5
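The first three transformations can be sketched in Python as below. This is a minimal sketch that assumes the textbook EWMA recursions m_t = α·d_t + (1−α)·m_{t−1} and v_t = α·(d_t − m_t)² + (1−α)·v_{t−1}, applied to the first differences d_t; the approximated formulas actually used in the thesis are those of the theory section and may differ. The function names are hypothetical.

```python
from typing import List, Sequence


def first_difference(x: Sequence[float]) -> List[float]:
    """Transformation 1: d_t = x_t - x_{t-1}; the output is one value shorter."""
    return [b - a for a, b in zip(x[:-1], x[1:])]


def ewma_mean(d: Sequence[float], alpha: float, m0: float) -> List[float]:
    """Transformation 2 (assumed form): m_t = alpha*d_t + (1 - alpha)*m_{t-1}."""
    out, m = [], m0
    for value in d:
        m = alpha * value + (1.0 - alpha) * m
        out.append(m)
    return out


def ewma_variance(d: Sequence[float], means: Sequence[float],
                  alpha: float, v0: float) -> List[float]:
    """Transformation 3 (assumed form):
    v_t = alpha*(d_t - m_t)**2 + (1 - alpha)*v_{t-1}."""
    out, v = [], v0
    for value, m in zip(d, means):
        v = alpha * (value - m) ** 2 + (1.0 - alpha) * v
        out.append(v)
    return out


d = first_difference([1.0, 1.5, 1.25, 2.0])   # [0.5, -0.25, 0.75]
m = ewma_mean(d, alpha=0.5, m0=0.0)             # [0.25, 0.0, 0.375]
v = ewma_variance(d, m, alpha=0.5, v0=0.0)      # [0.03125, 0.046875, 0.09375]
print(d, m, v)
```

The starting values m0 = 0 and v0 = 0 are placeholders chosen to keep the arithmetic readable; the procedure chapter instead estimates them from the first 19 first differences.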

5.3 Selection of Transformations and Hypothesis Tests

As mentioned, the transformations and hypothesis tests were selected by SHB, based on their knowledge and experience of data pre-processing and of commonly used transformations. Both established transformations and approximated formulas derived by Rickard Henricsson were chosen. The two hypothesis tests were chosen because they are formulated with opposite null hypotheses. This gives a wider perspective on the results than two tests whose null hypotheses point to the same outcome would.

5.4 Selection of Market Entry Frameworks

The timing-of-entry framework was chosen to get a comprehensive view of how SHB can be affected by the decision of when to commercialize its machine learning based prediction model, for example by launching new financial products based on the technology.

62BlackRock. ”ETF Landscape: Industry Highlights”, 2012-04-12 (Retrieved 2019-05-02)


The benefits of entering at different times also depend on the current market dynamics. Porter's Five Forces is a well-established model for understanding the characteristics and dynamics of a market, and it is therefore included as a complement to the analysis of potential market entry. However, since Porter's Five Forces is an extensive model covering many perspectives on the competitive environment of an industry or market, not every aspect was considered relevant to the discussion of market entry, and only the relevant aspects were included in the analysis.

5.5 Literature Study

Before focusing on the specific scope of the thesis, a comprehensive review of existing knowledge was performed. Literature and journal articles were both collected online and provided by SHB. The first part of the literature study concerned the economics and management aspects of this thesis. It was important to understand the broader purpose of the thesis for SHB and how different economic theories can be applied to the long-term project. Two important books included in the study were Porter's Five Forces by Newton and Bristoll and Schilling's Strategic Management of Technological Innovation (2017).

The literature selected for the mathematical theory was used to understand the concepts and terms of time series analysis. The research question of this thesis requires an understanding of the different components of a time series, as well as of how and why time series can be processed. Moreover, studies on similar topics were systematically collected and summarized to get an indication of what could be considered a reasonable result. The theory for the mathematical section was mainly taken from Introduction to Time Series and Forecasting by Brockwell and Davis (2002) and Time Series Analysis and Forecasting by Example by Bisgaard and Kulahci (2011), supported by knowledge from Rickard Henricsson and Peyman Dabiri at SHB.


5.6 Procedure of Work

As mentioned, the first step of the thesis was a comprehensive literature study to gain a deep understanding of SHB's long-term purpose in terms of economics and management. It was important to analyze possibilities, competitors, advantages and potential outcomes. For instance, extensive research was conducted on which Nordic banks currently offer products based on machine learning predictions of security prices.

After this study, the focus shifted to understanding the mathematical theory. For the time series analysis, the additive model presented in the theory section was chosen, since this general model has been reported to fit data smoothly and to be flexible without adding much complexity or variance to the process.63 To check this choice, a few data sets were plotted to identify graphically whether the series were additive or multiplicative. In an additive series the amplitude of the fluctuations stays roughly constant over time, whereas in a multiplicative series it grows with the level of the series (compare Figures 5.1 and 5.2). By inspecting how the amplitude evolved, it was decided that the time series were most likely additive.
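The amplitude inspection described above was done visually. As a hypothetical numeric stand-in (not the thesis's method), one can compare the spread of the one-step changes in the first and last blocks of a series: roughly equal spread points to an additive model, a clearly larger spread at higher levels to a multiplicative one. The block size and the 25 % tolerance below are arbitrary assumptions.

```python
import math
from statistics import pstdev
from typing import List, Sequence


def block_amplitudes(x: Sequence[float], block: int) -> List[float]:
    """Population std of the one-step changes within consecutive,
    non-overlapping blocks of `block` changes."""
    diffs = [b - a for a, b in zip(x[:-1], x[1:])]
    return [pstdev(diffs[i:i + block])
            for i in range(0, len(diffs) - block + 1, block)]


def looks_multiplicative(x: Sequence[float], block: int, tol: float = 0.25) -> bool:
    """Heuristic: flag the series as multiplicative if the amplitude in the
    last block exceeds the amplitude in the first block by more than `tol`."""
    amps = block_amplitudes(x, block)
    return amps[-1] / amps[0] > 1.0 + tol


# Synthetic series with period 12 and an upward trend.
t = range(121)
additive = [10 + 0.5 * k + 2 * math.sin(2 * math.pi * k / 12) for k in t]
multiplicative = [(10 + 0.5 * k) * (1 + 0.2 * math.sin(2 * math.pi * k / 12)) for k in t]
print(looks_multiplicative(additive, 12))        # False
print(looks_multiplicative(multiplicative, 12))  # True
```

Using changes rather than levels removes a linear trend, and choosing the block length as a multiple of the seasonal period keeps each block's seasonal pattern comparable.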

The task was to investigate whether the given transformations could make the given data stationary. First, all the data was imported and the transformations were coded, as were the two hypothesis tests, using Python functions from the ”stattools” module of the statsmodels package. Note that the two tests have opposite null hypotheses: the ADF function tests the null hypothesis that the series has a unit root (is non-stationary), whereas the KPSS function tests the null hypothesis that the series is level or trend stationary.64 These null hypotheses are built into the two functions.

63”Generalized Additive Models”, 2017-07-06 https://machinelearningmastery.com/time-series-data-stationary-python/ (Retrieved 2019-05-06)

64”statsmodels.tsa.stattools.kpss” https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.kpss.html (Retrieved 2019-05-04)

Figure 5.1: Figure showing an additive time series. As one can see, the amplitude of the series seems to remain constant over different time steps, which is an indication of an additive model.

Figure 5.2: Figure showing a multiplicative time series. As one can see, the amplitude of the series seems to increase by a factor over different time steps, which is an indication of a multiplicative model.

A relevant concern for this project was missing values, since the value of a transformation at time t depends on its value at time t-1. An important first step was therefore to establish a method for handling potential missing values in the data sets. It was agreed to replace a missing value with the most recent value before it, as an approximation. It was later noticed that none of the given data sets contained missing values, so this treatment was not used.
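The agreed rule for missing values (use the most recent earlier value) is a forward fill. A minimal pure-Python sketch follows; for data held in pandas, `Series.ffill()` does the same. The function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def forward_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None or NaN) with the most recent observed
    value; leading gaps stay None because nothing precedes them."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            filled.append(last)
        else:
            last = v
            filled.append(v)
    return filled


print(forward_fill([None, 1.10, None, float("nan"), 1.12]))
# [None, 1.1, 1.1, 1.1, 1.12]
```

A gap at the very start cannot be filled this way, which is one reason the initialization window discussed next starts from observed values.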

As mentioned, for most of the transformations the calculation at time t depends on the transformed value at the previous time step. In other words, the calculations are recursive, and an initial value is needed to start the recursion for every transformation except the first difference, which does not depend on its own previous value. In all cases n=19 was chosen, that is, the first 19 values of a time series. These values were afterwards removed from the time series and not included when applying the transformations. This seemed a reasonable amount to give a good starting estimate without removing too much of the data: the sample is large enough to dampen the effect of potential extreme values, and since the transformed series becomes shorter as n increases, a relatively small sample was desired.

Starting with the EWMA-mean, the first transformed value was estimated as the average of a sample of n first differences. The first differences are used as the base data for all the initial approximations, since differencing in general may remove some trends or seasonal effects; compared to the original series, a differenced series may therefore behave more like a stationary series. More precisely, the initial EWMA-mean is65

$$
F^{3}_{i,1} = \bar{F}^{2}_{i,t} = \frac{1}{n}\sum_{t=1}^{n} F^{2}_{i,t}
$$

The initial EWMA-variance was calculated as the sample variance of the differences:66

$$
F^{4}_{i,1} = \frac{1}{n-1}\sum_{t=1}^{n}\left(F^{2}_{i,t} - \bar{F}^{2}_{i,t}\right)^{2}
$$
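The initialization for the EWMA-mean and EWMA-variance (sample mean and sample variance of the first n = 19 first differences, after which those values are dropped) can be sketched as follows. The helper names are hypothetical; `statistics.variance` uses the same n − 1 denominator as the formula above.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple

N_INIT = 19  # size of the initialization window used in the thesis


def initial_mean_variance(diffs: Sequence[float], n: int = N_INIT) -> Tuple[float, float]:
    """Initial EWMA-mean (sample mean) and EWMA-variance (sample variance,
    n - 1 denominator) computed from the first n first differences."""
    if len(diffs) < n:
        raise ValueError(f"need at least {n} differences, got {len(diffs)}")
    sample = list(diffs[:n])
    return mean(sample), variance(sample)


def drop_burn_in(diffs: Sequence[float], n: int = N_INIT) -> List[float]:
    """Values left for the recursions once the n initialization values are removed."""
    return list(diffs[n:])


print(initial_mean_variance([1.0, 2.0, 3.0, 4.0, 5.0], n=5))  # (3.0, 2.5)
```

Note that the lag-1 autocorrelation initialization needs one extra value (the 2nd to the 20th difference), so a series must have at least n + 1 differences before any transformation can start.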

65”Sample Mean”. https://www.statisticshowto.datasciencecentral.com/sample-mean/ (Retrieved 2019-04-03)

66”Sample Variance: Simple Definition, How to Find it in Easy Steps”. https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/descriptive-statistics/sample-variance/ (Retrieved 2019-04-03)

Likewise, the initial skewness value was obtained by calculating the skewness of the sample, that is

$$
F^{5}_{i,1} = \frac{n}{(n-1)(n-2)}\sum_{t=1}^{n}\frac{\left(F^{2}_{i,t} - \bar{F}^{2}_{i,t}\right)^{3}}{s^{3}}
$$

where s is the sample standard deviation of the first 19 differences.67

For the EWMA-kurtosis, the initial value was approximated by the kurtosis of the sample:

$$
F^{6}_{i,1} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{t=1}^{n}\frac{\left(F^{2}_{i,t} - \bar{F}^{2}_{i,t}\right)^{4}}{s^{4}} - \frac{3(n-1)^{2}}{(n-2)(n-3)}
$$

where s is the sample standard deviation.68
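The two initial values above can be checked with a short pure-Python sketch. It implements the skewness and kurtosis formulas as written here, which coincide with the Excel SKEW and KURT functions cited in the footnotes; KURT reports excess kurtosis, so normally distributed data gives values near 0. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _centre_and_scale(x: Sequence[float]) -> Tuple[int, float, float]:
    """Return (n, sample mean, sample standard deviation with n - 1 denominator)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return n, m, s


def sample_skewness(x: Sequence[float]) -> float:
    """n/((n-1)(n-2)) * sum(((x_t - mean)/s)**3); s must be non-zero."""
    if len(x) < 3:
        raise ValueError("skewness needs at least 3 values")
    n, m, s = _centre_and_scale(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def sample_excess_kurtosis(x: Sequence[float]) -> float:
    """n(n+1)/((n-1)(n-2)(n-3)) * sum(((x_t - mean)/s)**4) - 3(n-1)**2/((n-2)(n-3))."""
    if len(x) < 4:
        raise ValueError("kurtosis needs at least 4 values")
    n, m, s = _centre_and_scale(x)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(((v - m) / s) ** 4 for v in x) - correction


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_skewness(sample), sample_excess_kurtosis(sample))
# skewness close to 0 (symmetric data), excess kurtosis close to -1.2
```

In the thesis setting, `x` would be the first 19 first differences of a series.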

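The initial value for the EWMA-autocorrelation was calculated by69

$$
F^{7}_{i,1} = \frac{1}{n-1}\cdot\frac{\sum_{t=1}^{n}\left(F^{2}_{i,t} - \bar{F}^{2}_{i,t}\right)\left(F^{2}_{i,t+1} - \bar{F}^{2}_{i,t+1}\right)}{F^{4}_{i,1}}
$$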

Instead of the variance suggested by the standard autocorrelation formula, the EWMA-variance was used. This is because the EWMA-variance should, in theory, be more stationary than the standard variance, since it is exponentially weighted. Using the transformed values should therefore give a more stationary initial value for the autocorrelation.

The autocorrelation was calculated by taking the 1st to the 19th differenced values as one set and the 2nd to the 20th values as the other set, each centred on its own mean. In other words, lag 1 was used for the autocorrelation function. Lag 1 was chosen since the next EWMA-autocorrelation calculation after time t is the one at time t+1.
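A minimal sketch of this lag-1 initial value, following the formula and the two-set description above (values 1 to 19 against values 2 to 20, divided by the initial EWMA-variance), is given below. The function name is hypothetical. Perfectly alternating differences are a convenient check, since they should give an autocorrelation of -1.

```python
from statistics import variance
from typing import Sequence


def initial_autocorrelation(diffs: Sequence[float], f4_init: float, n: int = 19) -> float:
    """Lag-1 initial value: covariance between differences 1..n and 2..n+1,
    each centred on its own mean, divided by the initial EWMA-variance."""
    if len(diffs) < n + 1:
        raise ValueError(f"need at least {n + 1} differences")
    first = list(diffs[:n])          # 1st .. n-th difference
    second = list(diffs[1:n + 1])    # 2nd .. (n+1)-th difference
    m1 = sum(first) / n
    m2 = sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / (n - 1)
    return cov / f4_init


alternating = [1.0 if k % 2 == 0 else -1.0 for k in range(20)]
f4 = variance(alternating[:19])   # initial EWMA-variance = sample variance of first 19
print(initial_autocorrelation(alternating, f4))  # approximately -1.0
```

Since the initial EWMA-variance is itself the sample variance of the first 19 differences, this initial value coincides with an ordinary lag-1 sample autocorrelation apart from the separate centring of the two sets.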

67”SKEW function”. Microsoft, 2019. https://support.office.com/en-ie/article/skew-function-bdf49d86-b1ef-4804-a046-28eaea69c9fa (Retrieved 2019-04-03)

68”KURT function”. Microsoft, 2018. https://support.office.com/en-us/article/kurt-function-bc3a265c-5da4-4dcb-b7fd-c237789095ab (Retrieved 2019-04-03)

69”Autocorrelation function”. http://www.real-statistics.com/time-series-analysis/stochastic-processes/autocorrelation-function/ (Retrieved 2019-04-03)


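The initial correlation value is calculated as

$$
F^{8}_{i,1} = \frac{1}{n-1}\cdot\frac{\sum_{t=1}^{n}\left(F^{2}_{i,t} - \bar{F}^{2}_{i,t}\right)\left(F^{2}_{j,t} - \bar{F}^{2}_{j,t}\right)}{\sqrt{F^{4}_{i,1}}\,\sqrt{F^{4}_{j,1}}}
$$

where i and j denote two different time series.70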

As for the autocorrelation, the square roots of the EWMA-variances of the two time series are used instead of the standard deviations in the standard formula. Again, this is because the EWMA-variances should be more stationary than the standard deviations of the original data, which should give a more stationary initial value.
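The initial correlation between two series i and j can be sketched the same way. The function name and the 19 sample differences below are hypothetical illustration values; a positive linear relationship between the series should give a correlation of 1.

```python
import math
from statistics import variance
from typing import Sequence


def initial_correlation(di: Sequence[float], dj: Sequence[float],
                        f4_i: float, f4_j: float, n: int = 19) -> float:
    """Covariance of the first n differences of series i and j, divided by
    the product of the square roots of their initial EWMA-variances."""
    if min(len(di), len(dj)) < n:
        raise ValueError(f"need at least {n} differences in each series")
    xi, xj = list(di[:n]), list(dj[:n])
    mi, mj = sum(xi) / n, sum(xj) / n
    cov = sum((a - mi) * (b - mj) for a, b in zip(xi, xj)) / (n - 1)
    return cov / (math.sqrt(f4_i) * math.sqrt(f4_j))


# Hypothetical differences for series i; series j is a linear function of i.
di = [0.3, -0.1, 0.4, 0.0, -0.2, 0.5, 0.1, -0.3, 0.2, 0.0,
      0.1, -0.4, 0.3, 0.2, -0.1, 0.0, 0.2, -0.2, 0.1]
dj = [2 * v + 1 for v in di]
print(initial_correlation(di, dj, variance(di), variance(dj)))  # approximately 1.0
```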

With these initial values, the transformation formulas could be applied to the time series. Before running the program on all data sets and transformations to check for stationarity and to find optimal values of α, a few data sets were tested. The original data set and the output of every transformation were plotted, since graphical inspection is a common way of assessing stationarity in time series analysis. The aim was to gain initial insight into how a transformation might affect a financial time series; for this purpose, only one fixed value of α was used.

To begin with, two currency pairs were chosen for the initial assessment. Exchange rates carry important information about a country's relative economic health and are therefore important data for the prediction model.71 Exchange rates are related not only to interest rates but also to inflation, and a country with clearly positive attributes will attract more investment. This made FX data a natural first data set for understanding how such data affects different prediction models. Moreover, exchange rates are generally relatively stable, so FX data was expected to be one of the easiest sectors to make stationary. As a second choice, commodities such as crude oil (CL1) were plotted. Since oil prices are generally considered to depend on seasonality, the aim was to see how well the transformations would handle the seasonal component.72

70”CORREL function”. Microsoft, 2019. https://support.office.com/en-gb/article/correl-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92 (Retrieved 2019-04-03)

71”6 Factors that Influence Exchange Rates”. https://www.investopedia.com/trading/factors-influence-exchange-rates/ (Retrieved 2019-05-01)

For these initial plots, a fixed α=0.5 was used for each transformation. As mentioned, the plots were only for our own understanding, and α=0.5 seemed a neutral choice since it gives as much weight to the current value as to the past. Before searching for an optimal α, a fixed α was again used, this time to compare each transformation formula with a moving-average version of the same statistic. This shows whether the approximated transformation formulas are good estimates. In the moving-average version, the remaining values are not obtained from the initial value by the transformation formula; instead, the formulas for the initial values presented earlier in this section are applied to a window that moves forward one step for each new point. For example, the first value is computed from observations t=1 to 19, the next from t=2 to 20, and so on, without using the approximated transformation formulas. The moving-average values and the transformation values are then plotted together to see whether they show a similar pattern. If they differ considerably, it can be concluded that the transformation formulas might not be suitable for this type of data, and such transformations are not used further when searching for an optimal α.
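The moving-window comparison can be sketched for the EWMA-mean as follows. The thesis compared the two curves visually; the maximum absolute gap returned here is only a hypothetical numeric aid, and the recursion m_t = α·x_t + (1−α)·m_{t−1} is an assumed textbook form rather than the thesis's own approximated formula. The function names are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def rolling_statistic(values: Sequence[float], n: int,
                      stat: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply `stat` to the windows values[0:n], values[1:n+1], ...
    (observations 1-19, 2-20, ... when n = 19)."""
    return [stat(values[k:k + n]) for k in range(len(values) - n + 1)]


def ewma_path(values: Sequence[float], alpha: float, start: float) -> List[float]:
    """Assumed EWMA-mean recursion m_t = alpha*x_t + (1 - alpha)*m_{t-1}."""
    out, m = [], start
    for v in values:
        m = alpha * v + (1.0 - alpha) * m
        out.append(m)
    return out


def compare_to_moving_average(diffs: Sequence[float], alpha: float = 0.5,
                              n: int = 19) -> Tuple[List[Tuple[float, float]], float]:
    """Pair each recursive EWMA value with the moving-window mean that ends at
    the same time step, and report the largest absolute gap between them."""
    if len(diffs) <= n:
        raise ValueError(f"need more than {n} values")
    ma = rolling_statistic(diffs, n, mean)
    ew = ewma_path(diffs[n:], alpha, start=ma[0])  # recursion starts after the window
    pairs = list(zip(ma[1:], ew))                  # window ending at step n+k <-> EWMA at step n+k
    return pairs, max(abs(a - b) for a, b in pairs)


pairs, gap = compare_to_moving_average([0.5] * 40)
print(len(pairs), gap)  # 21 0.0
```

Passing a different `stat` (for example `statistics.variance`) gives the moving-window counterpart of another transformation.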

Figure 5.3: Figure explaining how one can compare the pattern of the moving average with the actual transformation outcome. As seen, the pattern of the moving average graph matches the transformation. This indicates that this transformation is a good and valid approximation.

72Journal of International Studies. ”Seasonal patterns in oil prices and their implications for investors”, 2018. (Retrieved 2019-05-01)


After the formulas had been confirmed, one of the main steps was to examine the first difference transformation. Since this transformation does not depend on α, it was not tried for different values of α. The data sets were run through this transformation separately and then tested with the hypothesis tests, to see whether the first difference transformation alone would be enough.

The final step of the project was to search for an optimal value, or an optimal interval, of α for each EWMA transformation. A step length of 0.02 was used, since smaller steps were found to take too long to run for all the given data sets. This was done for every transformation that depends on α and whose formula had proved to be a good approximation in the moving-average test. A data set was defined as passing, that is, as stationary, only if it passed both hypothesis tests: the ADF test had to reject its null hypothesis of a unit root, and the KPSS test had to fail to reject its null hypothesis of stationarity.
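The α search can be sketched as a grid with step 0.02 over an assumed range (0, 1], keeping every α whose transformed series passes a supplied stationarity check (for example the combined ADF/KPSS rule described above) and merging neighbouring passing values into intervals. All names are hypothetical, and the transformation and check are passed in as functions.

```python
from typing import Callable, List, Sequence, Tuple


def alpha_grid(step: float = 0.02) -> List[float]:
    """Candidate alphas step, 2*step, ..., 1.0 (assumed search range (0, 1])."""
    count = round(1.0 / step)
    return [round(k * step, 10) for k in range(1, count + 1)]


def passing_alphas(series: Sequence[float],
                   transform: Callable[[Sequence[float], float], Sequence[float]],
                   is_stationary: Callable[[Sequence[float]], bool],
                   grid: Sequence[float]) -> List[float]:
    """Keep every alpha for which the transformed series passes the check."""
    return [a for a in grid if is_stationary(transform(series, a))]


def to_intervals(alphas: Sequence[float], step: float = 0.02) -> List[Tuple[float, float]]:
    """Merge consecutive grid points into closed intervals (low, high)."""
    intervals: List[List[float]] = []
    for a in alphas:
        if intervals and abs(a - intervals[-1][1] - step) < 1e-9:
            intervals[-1][1] = a      # extends the current run
        else:
            intervals.append([a, a])  # starts a new run
    return [(low, high) for low, high in intervals]


print(len(alpha_grid()), alpha_grid()[:3])                 # 50 [0.02, 0.04, 0.06]
print(to_intervals([0.02, 0.04, 0.06, 0.5, 0.52]))          # [(0.02, 0.06), (0.5, 0.52)]
```

The grid has 50 points per transformation and data set, each requiring both tests, which is why a finer step quickly becomes expensive.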


6 Results

Initially, three trials were conducted with a fixed α, using data from two currency pairs and two commodities (crude oil and gold). Each data set was transformed separately with each transformation. To visualize the result, the transformed values were plotted after a data set had been run through a transformation. Afterwards, the hypothesis tests were performed to check whether the conclusions suggested by the graphs held. The final step was to test different α-values for each transformation in order to find the optimal value or values of α, in the sense of making the input data stationary.

6.1 First Trial: Plots for Currency Rates, with Fixed α

For the first trial, the currency pairs EURUSD and USDCHF were chosen arbitrarily among the available pairs. As mentioned, the intention was to get an overview of how exchange rate time series behave before and after each transformation. To check for stationarity, the output of each transformation was plotted and examined, and the transformed series were also checked with the two hypothesis tests, KPSS and ADF. For all transformations depending on the weighting factor α, a fixed value α = 0.5 was used. The aim was to see both the plots and whether they agreed with the test statistics for each currency pair.

Figure 6.1 shows the original time series of EURUSD prices from 2001-01-01 to 2018-12-31. The data does not seem to have a constant mean or a constant variance and may therefore be non-stationary. However, there are no distinguishable seasonal patterns or extremely large deviations, which may indicate that the process is relatively stable. This behavior was expected, since it is common for FX data.

Figure 6.2 shows the original time series of the USDCHF exchange rate. The series does not look stationary; rather, there seems to be a general decrease, that is, a trend component.

Figure 6.1: EURUSD graph

Figure 6.2: USDCHF graph

Figure 6.3 shows the EURUSD time series transformed with the first difference transformation. The transformed series seems to move regularly around a fixed value (a straight horizontal line); specifically, it seems to have a constant mean and variance, similar to a stationary time series. Since the first difference transformation is known to be simple yet very useful, these positive results were expected.

Figure 6.4 shows the USDCHF time series transformed with the first difference transformation. The series seems to become more stationary, even though there are still peaks.

Figure 6.3: EURUSD graph- transformation 1

Figure 6.4: USDCHF graph- transformation 1

Figure 6.5 shows the EURUSD time series transformed with the EWMA-mean transformation. The series seems to become even more stationary, moving regularly around zero. The theory describes the EWMA-mean transformation as a smoothing transformation, and it does seem to smooth the pattern of the graph. Some peaks can still be found, which may reflect extreme movements in the original series (a sharp increase or decrease in prices over a short period). Unusual events, such as the 2008 financial crisis, may contribute to this kind of behaviour in a financial time series.

Figure 6.6 shows the USDCHF time series transformed with the EWMA-mean transformation. This series also seems to become more stationary.


Figure 6.5: EURUSD graph- transformation 2

Figure 6.6: USDCHF graph- transformation 2

Figure 6.7 shows the EURUSD time series transformed with the EWMA-variance transformation. The series seems to become more stationary in the sense that the y-axis scale decreases, meaning that the peaks become smaller.

Figure 6.8 shows the USDCHF time series transformed with the EWMA-variance transformation. Visually, there is barely any difference compared to the previous transformation.

Figure 6.7: EURUSD graph- transformation 3

Figure 6.8: USDCHF graph- transformation 3

Figure 6.9 shows the EURUSD time series transformed with the EWMA-autocorrelation transformation. The autocorrelation seems stable and stays between -1 and 1, which is the range an autocorrelation should lie in.

Figure 6.10 shows the USDCHF time series transformed with the EWMA-autocorrelation transformation. As for EURUSD, it seems stable and varies between -1 and 1.

Figure 6.9: EURUSD graph- transformation 4

Figure 6.10: USDCHF graph- transformation 4

Figure 6.11 shows the correlation between the EURUSD and USDCHF time series. The plot suggests that there might be some dependency between the two currency pairs. This was expected, since both pairs depend on the US dollar.

6.1.1 Statistics for First Trial

The following p-values were obtained from the Augmented Dickey-Fuller (ADF) test and the KPSS test for the two currency pairs.

For the EURUSD currency pair:

| P-value / Transformation | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| ADF | 1.84·10⁻²¹ | 4.37·10⁻²¹ | 0.0002 | 8.02·10⁻²¹ | 2.50·10⁻²⁰ |
| KPSS | 0.1 | 0.1 | 0.1 | 0.0204 | 0.0333 |

Table 6.1: Test statistics (p-values) obtained for EURUSD, trial 1


Figure 6.11: EURUSD and USDCHF graph- transformation 5

For the USDCHF currency pair:

| P-value / Transformation | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| ADF | 2.58·10⁻²⁴ | 6.47·10⁻²⁴ | 6.27·10⁻¹⁶ | 6.05·10⁻²² | 2.50·10⁻²⁰ |
| KPSS | 0.1 | 0.1 | 0.01 | 0.0877 | 0.0333 |

Table 6.2: Test statistics (p-values) obtained for USDCHF, trial 1

Table 6.1: The ADF p-values are low for all transformations, so according to the ADF test every transformation makes the series stationary. The KPSS p-values are large for most transformations but small for transformations 4 and 5, so the KPSS test does not agree that transformations 4 and 5 make the data stationary.

Table 6.2: The ADF p-values are again low, which indicates stationarity. The KPSS p-values are large for transformations 1, 2 and 4 but small for transformations 3 and 5, so for these two transformations the KPSS test does not reach the same conclusion as the ADF test.


6.2 Second Trial: Plots for Commodity, with a Fixed α

In the second trial, a commodity, crude oil, was examined with α = 0.5 for all transformations.

Figure 6.12: Commodity: Original Time Series

Figure 6.12 shows the original time series of the commodity. The series varies a lot, with several peaks. The y-axis range is much larger than for the currencies, which is consistent with currencies being more stable than commodities, although part of the difference simply reflects the different price levels. As mentioned, many commodity prices are affected by seasonal fluctuations.

Figure 6.13: Commodity with the first difference transformation

In Figure 6.13, the differenced series seems to have become more stationary, but not completely, since some values are remarkably larger or smaller than the rest. In Figure 6.12 (the original series) there is a sudden drop in the crude oil price around 2008, the year of the financial crisis. This is also reflected in the transformed series, where extreme values can be found in the same period. In general, the first difference transformation seems to work in accordance with theory, but it might not be as effective at removing the impact of such extreme events.

Figure 6.14: Commodities with the EWMA-mean transformation

In Figure 6.14, the EWMA-mean transformation seems to narrow the range within which the commodity series varies. For the difference transformation in Figure 6.13, the values vary within a range of approximately -15 to 15 (apart from the extreme values). After the EWMA-mean transformation, the graph shows that this range has narrowed to around -8 to 8. In accordance with theory, the transformation has a smoothing effect on the series. Like the differencing transformation, the EWMA-mean does not seem to remove the effect of extreme events such as the 2008 financial crisis.
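The size of this narrowing is roughly what a standard EWMA would predict under an idealized assumption (ours, not tested in the thesis): that the differences d_t are uncorrelated with a common variance σ², and that the EWMA-mean follows the textbook recursion m_t = α d_t + (1−α) m_{t−1}. Since m_{t−1} depends only on earlier differences, the two terms are uncorrelated, and in steady state

$$
\operatorname{Var}(m) = \alpha^{2}\sigma^{2} + (1-\alpha)^{2}\operatorname{Var}(m)
\quad\Longrightarrow\quad
\operatorname{Var}(m) = \frac{\alpha^{2}}{1-(1-\alpha)^{2}}\,\sigma^{2} = \frac{\alpha}{2-\alpha}\,\sigma^{2}.
$$

With α = 0.5 this gives Var(m) = σ²/3, so the spread shrinks by a factor 1/√3 ≈ 0.58 and a range of about ±15 would become roughly ±8.7, close to the observed ±8. Real price changes are neither uncorrelated nor of constant variance, so this is only an order-of-magnitude check.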

Figure 6.15: EWMA-variance transformation of CL1

Figure 6.15 shows the EWMA-variance transformation of CL1. The transformation does not seem to have a beneficial effect on the time series. The EWMA-variance depends on the values from the difference and EWMA-mean transformations, so one might expect that if the differenced values and the EWMA-mean are stationary, the EWMA-variance would be stationary too. However, the plot suggests that the transformation in fact makes the data less stationary and more variable. Especially for short periods with extreme movements and fluctuations in the original data (such as around 2008), the EWMA-variance values become very deviant. This indicates that the EWMA-variance is highly sensitive to variations in the input data. Normally, if the first difference and EWMA-mean transformations make the data set stationary while the EWMA-variance has a negative impact, one would not apply the EWMA-variance or the subsequent transformations; since the aim here is to understand all the transformations, they are nevertheless all plotted.

Figure 6.16: EWMA autocorrelation transformation


Figure 6.16 shows that the autocorrelation varies between -1 and 1 and graphically seems stable.

6.2.1 Statistics for Second Trial

P-value/Transformation   1             2             3            4
ADF                      8.78×10⁻¹⁸    5.83×10⁻¹⁷    2.16×10⁻⁴    8.45×10⁻²⁰
KPSS                     0.1           0.1           0.01         0.01

Table 6.3: Test Statistics obtained for Commodity trial 2

As seen in Table 6.3, the ADF p-values are low for all transformations, whereas the KPSS p-values are high for transformations 1 and 2 but low for the rest. As for the currencies, the ADF test thus indicates stationarity for all the transformations, whereas the KPSS test only fails to reject stationarity for transformations 1 and 2.
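The decision rule used to read Tables 6.3 and 6.4 can be written down explicitly. The sketch below assumes a 5% significance level, as in the text, and labels the two cases where the tests disagree using a commonly used reading (trend-stationary versus difference-stationary); those labels are an interpretation added here, not a claim from the thesis. The helper `p_values` shows how the p-values could be obtained with statsmodels (`adfuller` and `kpss`), but the thesis does not state which library or settings were used. Note that statsmodels' `kpss` interpolates its p-value from a table and returns the boundary value 0.01 or 0.1 (with an `InterpolationWarning`) when the statistic falls outside it; if that function was used, the KPSS entries 0.01 and 0.1 mean "at most 0.01" and "at least 0.1".

```python
def stationarity_verdict(adf_p: float, kpss_p: float, level: float = 0.05) -> str:
    """Combine ADF and KPSS p-values at significance `level`.

    ADF:  H0 = unit root (non-stationary); a small p-value rejects H0 -> stationary.
    KPSS: H0 = stationarity;               a small p-value rejects H0 -> non-stationary.
    """
    adf_says_stationary = adf_p < level
    kpss_says_stationary = kpss_p > level
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"              # passes both tests
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"          # fails both tests
    if kpss_says_stationary:
        return "trend-stationary"        # KPSS passes, ADF cannot reject a unit root
    return "difference-stationary"       # ADF passes, KPSS rejects stationarity


def p_values(series):
    """p-values for a real series via statsmodels (imported lazily; not needed above)."""
    from statsmodels.tsa.stattools import adfuller, kpss

    adf_p = adfuller(series)[1]
    kpss_p = kpss(series, regression="c", nlags="auto")[1]
    return adf_p, kpss_p


# p-values from Table 6.3 (commodity, trial 2): transformation -> (ADF, KPSS).
table_6_3 = {1: (8.78e-18, 0.1), 2: (5.83e-17, 0.1), 3: (2.16e-4, 0.01), 4: (8.45e-20, 0.01)}
for transformation, (adf_p, kpss_p) in table_6_3.items():
    print(transformation, stationarity_verdict(adf_p, kpss_p))
```

Only series labelled "stationary" count as passing in the sense used in this thesis, since passing requires both tests to indicate stationarity.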


6.3 Third Trial: Plots for Commodity Prices, with a Fixed α

For the third trial, another commodity price series was tried, this time gold prices expressed in US dollars (XAUUSD). As in the previous trials, a fixed value α = 0.5 was used as the weighting factor for the transformations.

Figure 6.17: Original data of XAUUSD currency

Figure 6.18: First Difference transformation of XAUUSD currency

Figure 6.18 shows that the first difference seems to have a positive impact on the stationarity of the time series, similar to the results from the previous trials.


Figure 6.19: EWMA-mean transformation of XAUUSD currency

Figure 6.19 shows that the EWMA-mean transformation seems to behave similarly to the differenced values in Figure 6.18. More precisely, both transformations seem to move around the same level, and their variations resemble each other in the sense that they have deviating values at the same points in time. As in the previous trials, the EWMA-mean values vary within a narrower range than the first-order differenced values. The EWMA-mean transformation is computed from the differenced values rather than from the original series, and according to theory the EWMA-mean acts as a smoothing technique. One interpretation of the results is therefore that trends are first removed by differencing and the data set is then smoothed by the EWMA-mean, which makes the graph of the resulting series appear more stationary.
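The interpretation that differencing removes the trend and the EWMA-mean then narrows the range can be checked on synthetic data. In this sketch the EWMA-mean is assumed to be the recursion s_t = (1 - α)s_{t-1} + αx_t applied to the first differences and started at the first difference; the trending price series is simulated and is not the XAUUSD data.

```python
import numpy as np


def ewma(values: np.ndarray, alpha: float) -> np.ndarray:
    """EWMA recursion s_t = (1 - alpha) * s_{t-1} + alpha * x_t, started at s_0 = x_0."""
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = (1 - alpha) * out[t - 1] + alpha * values[t]
    return out


rng = np.random.default_rng(1)
t = np.arange(1000)
# Synthetic "price": linear upward trend plus a random walk.
prices = 50 + 0.05 * t + np.cumsum(rng.normal(0.0, 1.0, size=t.size))

diffs = np.diff(prices)       # transformation 1: the trend becomes a constant mean of 0.05
smoothed = ewma(diffs, 0.5)   # transformation 2 (assumed form): smooths the differences

range_diffs = diffs.max() - diffs.min()
range_smoothed = smoothed.max() - smoothed.min()
print(f"range of first differences: {range_diffs:.2f}")
print(f"range of EWMA-mean:         {range_smoothed:.2f}")
```

Every EWMA value is a weighted average of the differences observed so far, with positive weights summing to one, so it always lies within the range of the differences; the narrower range therefore follows directly from the construction.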

Figure 6.20: EWMA-variance transformation on XAUUSD currency

In Figure 6.20, the transformation does not seem to have a positive impact on the time series. The EWMA-variance transformation takes both the EWMA-mean values and the differenced values as input. Figure 6.18 (first difference of the XAUUSD data) and Figure 6.19 (EWMA-mean of the XAUUSD data) show that both transformations exhibit extreme values at the same points in time; for example, one is found between 2013 and 2014, and another just before 2012. These deviations seem to increase remarkably after the EWMA-variance transformation. As in the previous trials, the transformation shows a marked sensitivity to sharp fluctuations in the input time series.

Figure 6.21: EWMA-autocorrelation transformation on XAUUSD currency

The transformed data looks stable in the sense that the series seems to move around a certain mean value with a constant variance. More specifically, the values vary from approximately -1 to 1. This result is expected, since autocorrelation values in general range from -1 to 1.


Figure 6.22: Correlation between commodity and XAUUSD currency

Most of the correlation values vary between -1 and 1, but some of the earlier values slightly exceed these limits.

6.3.1 Statistics for Third Trial

P-value/Transformation   1             2             3            4             5
ADF                      5.16×10⁻²⁵    1.19×10⁻²⁴    1.51×10⁻⁸    1.28×10⁻²⁰    8.71×10⁻¹¹
KPSS                     0.1           0.1           0.01         0.1           0.01

Table 6.4: Test Statistics obtained for commodity XAUUSD

As seen in Table 6.4, all ADF p-values are well below 0.05, which indicates stationarity, whereas the KPSS p-values only exceed 0.05 for transformations 1, 2 and 4. That is, the KPSS test only indicates stationarity after these three transformations. Also, transformation 4 seems to work better for gold than for oil (compare Table 6.3, where its KPSS p-value is 0.01).


6.4 Seasonality and Trends

The following graphs were obtained using Python's decompose function. This was done to "confirm", and to see visually, that the time series do indeed consist of a trend component and a seasonal component, even after transformation.

Figure 6.23: Graph showing the original data, the trend of it, the seasonality and the residual decomposed


Figure 6.24: Graph showing the data transformed with first difference, the trend of it, the seasonality and the residual decomposed
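The exact decompose function is not named in the text; statsmodels' `seasonal_decompose(x, model="additive", period=...)` is one common choice. As an illustration of what such a classical additive decomposition does, the following numpy sketch estimates the trend with a centred moving average over one season, the seasonal component as the average detrended value at each position in the cycle, and the residual as what remains. The seasonal period must be supplied by the user, and the function name `additive_decompose` is hypothetical.

```python
import numpy as np


def additive_decompose(x: np.ndarray, period: int):
    """Classical additive decomposition x = trend + seasonal + residual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Trend: centred moving average over one full cycle (a 2 x period MA if period is even).
    if period % 2 == 1:
        weights = np.ones(period) / period
    else:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")

    # Seasonal: mean detrended value at each position in the cycle, centred to sum to 0.
    detrended = x - trend
    phase_means = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.resize(phase_means, n)

    # Residual: what is left (NaN at the ends, where the trend is undefined).
    residual = x - trend - seasonal
    return trend, seasonal, residual


t = np.arange(120)
pattern = np.array([2.0, -1.0, 0.5, -1.5])        # zero-mean seasonal cycle of length 4
series = 3 + 0.1 * t + np.resize(pattern, t.size)  # linear trend + seasonality
trend, seasonal, residual = additive_decompose(series, period=4)
print(seasonal[:4], np.nanmax(np.abs(residual)))
```

On this noise-free example the trend, the seasonal pattern and a near-zero residual are recovered; on real data the residual is where the additive assumption can be inspected.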

6.5 Skewness and Kurtosis

As mentioned in the procedure of work (Section 4.5), before using the transformation formulas implemented in the Python code, a simple moving average corresponding to each EWMA transformation was plotted in Excel, and the values of the EWMA transformation were plotted in the same graph. This was done for each EWMA transformation that Henricsson at SHB had approximated, namely skewness, kurtosis, autocorrelation and correlation. The difference between a simple moving average and an exponentially weighted moving average is that the EWMA gives more weight to recent values than to past values. Nevertheless, both average the same underlying quantity, so their graphs should approximately follow the same pattern. Consequently, if an EWMA transformation followed the pattern of its corresponding moving average, it was assumed to be well approximated. The aim was to see whether the given formulas would serve as accurate approximations; if not, they were not examined further, i.e. they were excluded from the research question. All the approximated EWMA transformations seemed to follow a similar pattern to the moving average, except for the skewness and kurtosis transformations.
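The visual comparison in Excel could also be quantified. The sketch below computes a simple moving average and an EWMA of the same series and reports the correlation between the two curves as a hypothetical agreement score. The score, the window of 20 and the mapping α = 2/(N + 1) (a common convention linking an EWMA to an N-period window) are assumptions, and the skewness and kurtosis formulas from SHB are not reproduced.

```python
import numpy as np


def simple_moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Unweighted mean of the last `window` values; NaN until enough data exist."""
    out = np.full(len(x), np.nan)
    out[window - 1:] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out


def ewma(x: np.ndarray, alpha: float) -> np.ndarray:
    """s_t = (1 - alpha) * s_{t-1} + alpha * x_t, started at s_0 = x_0."""
    out = np.empty(len(x))
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = (1 - alpha) * out[t - 1] + alpha * x[t]
    return out


def pattern_agreement(x: np.ndarray, window: int, alpha: float) -> float:
    """Hypothetical score: Pearson correlation of the SMA and EWMA where both exist."""
    sma = simple_moving_average(x, window)
    smoothed = ewma(x, alpha)
    mask = ~np.isnan(sma)
    return float(np.corrcoef(sma[mask], smoothed[mask])[0, 1])


rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(size=1000))   # synthetic random-walk series
window = 20
alpha = 2 / (window + 1)                    # common convention linking alpha to a window
agreement = pattern_agreement(series, window, alpha)
print(f"agreement between SMA and EWMA: {agreement:.3f}")
```

The same score could be computed for a rolling skewness or kurtosis and its EWMA counterpart; a low value would flag the kind of mismatch seen in Figures 6.25 to 6.28.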


Figure 6.25: Graph showing the moving average of the skewness versus the EWMA-skewness obtained by the given transformation formula. As can be seen, the blue curve representing the EWMA-skewness does not match the moving average (orange curve), and there is an evident difference between their values.

Figure 6.26: Graph showing the EWMA-skewness up close since it was not properly seen behind the moving average of the skewness.


Figure 6.27: Graph showing the moving average of the kurtosis versus the kurtosis obtained by the given transformation. Once again, the blue curve of the kurtosis (obtained by the transformation) does not match the moving average of the kurtosis.

Figure 6.28: Graph showing the kurtosis up close since it was not properly visible behind the moving average of the kurtosis.


6.6 First Differences on all Data

The first difference transformation does not depend on the value of α and was therefore treated separately. The results of the first difference transformation for all data are presented below, where X represents passing both stationarity tests and 0 represents failing at least one of them.

Currencies

Figure 6.29: Table showing the currencies that passed transformation 1

US Sectors

Figure 6.30: Table showing the US sectors that passed transformation 1

Countries

Figure 6.31: Table showing the countries that passed transformation 1

Commodities

Figure 6.32: Table showing the commodities that passed transformation 1

VIX

Figure 6.33: Table showing the VIX data (market volatility data) that passed transformation 1


Bonds

Figure 6.34: Table showing the IR data (bond data) that passed transformation 1

6.7 Finding the Optimal α

6.7.1 Currencies

To find the optimal values of α between 0 and 1, the parameter was looped over with a step length of 0.02. Each data category was examined in turn, starting with the FX data. The transformations were tested separately: for a chosen transformation, the goal was to find a value of α that made the largest number of FX data sets stationary.
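The grid search described above can be sketched as follows. The predicate `passes(series, alpha)` is hypothetical: in practice it would apply the transformation with the given α and require that both the ADF and the KPSS test indicate stationarity. The grid start of 0.001 is an assumption that is consistent with reported endpoints such as 0.381 and 0.981, and `to_intervals` groups the winning values into contiguous intervals, which is how results such as two separate intervals for one transformation can arise.

```python
from typing import Callable, List, Sequence, Tuple


def alpha_grid(start: float = 0.001, step: float = 0.02, stop: float = 1.0) -> List[float]:
    """Candidate alphas start, start + step, ... <= stop, rounded to avoid float drift."""
    grid = []
    k = 0
    while start + k * step <= stop + 1e-12:
        grid.append(round(start + k * step, 3))
        k += 1
    return grid


def best_alphas(series: Sequence, passes: Callable[[object, float], bool],
                grid: List[float]) -> Tuple[List[float], int]:
    """Alphas for which the largest number of series pass, and that number."""
    counts = {a: sum(bool(passes(s, a)) for s in series) for a in grid}
    best = max(counts.values())
    return [a for a in grid if counts[a] == best], best


def to_intervals(alphas: List[float], grid: List[float]) -> List[Tuple[float, float]]:
    """Merge winning alphas that are neighbours on the grid into (low, high) intervals."""
    position = {a: i for i, a in enumerate(grid)}
    intervals: List[Tuple[float, float]] = []
    for a in alphas:
        if intervals and position[a] == position[intervals[-1][1]] + 1:
            intervals[-1] = (intervals[-1][0], a)
        else:
            intervals.append((a, a))
    return intervals


# Toy usage: a hypothetical predicate standing in for "passes both ADF and KPSS".
def toy_passes(series_id, alpha):
    return 0.181 <= alpha <= 0.421 or 0.921 <= alpha <= 0.981


grid = alpha_grid()
winners, n_passing = best_alphas(["A", "B", "C"], toy_passes, grid)
print(n_passing, to_intervals(winners, grid))
```

With a real predicate, each call runs two hypothesis tests, so the cost grows linearly with the number of grid points and series.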

For a chosen transformation, the value or values of α that made the transformed series pass both the ADF and the KPSS test for the largest number of currencies were chosen as the optimal value or interval. In the table, the symbol X means that the currency passed both hypothesis tests for the transformation; the corresponding value or interval of α is presented after the table. The symbol 0 means that the currency did not pass the tests, failing either one or both of them.

Currency/Transformation   2   3   4
EURUSD                    X   X   X
GBPUSD                    X   X   X
AUDUSD                    X   X   X
NZDUSD                    X   X   X
USDCAD                    X   X   X
USDCHF                    X   X   X
USDJPY                    X   X   X
USDNOK                    X   X   X
USDSEK                    X   X   X

Table 6.5: Table Obtained for Currencies

The optimal values of α for transformation 2 are 0.381 ≤ α ≤ 1.

The optimal value is α = 0.981 for transformation 3.


The optimal interval for transformation 4 is 0.701 ≤ α ≤ 0.981.

The optimal interval for transformation 5 is 0.141 ≤ α ≤ 1. Here, all combinations of two currency pairs were used to calculate different correlations. For example, the correlation between EURUSD and GBPUSD was tried, thereafter between EURUSD and AUDUSD, and so on.
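Transformation 5 is computed for every unordered pair of series. The sketch below enumerates the pairs with `itertools.combinations` and uses an EWMA correlation in which the means, variances and covariance share one recursion; this construction is an assumption and not necessarily the formula approximated at SHB. The FX data are replaced by correlated synthetic series.

```python
from itertools import combinations

import numpy as np


def ewma_correlation(x: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """EWMA correlation of x and y (assumed construction, not the thesis formula).

    Means, variances and the covariance all use the same recursion and weights,
    so by the Cauchy-Schwarz inequality the result stays within [-1, 1].
    """
    mx, my = x[0], y[0]
    vx = vy = cxy = 0.0
    rho = np.full(len(x), np.nan)
    for t in range(1, len(x)):
        mx = (1 - alpha) * mx + alpha * x[t]
        my = (1 - alpha) * my + alpha * y[t]
        dx, dy = x[t] - mx, y[t] - my
        vx = (1 - alpha) * vx + alpha * dx * dx
        vy = (1 - alpha) * vy + alpha * dy * dy
        cxy = (1 - alpha) * cxy + alpha * dx * dy
        if vx > 0 and vy > 0:
            rho[t] = cxy / np.sqrt(vx * vy)
    return rho


names = ["EURUSD", "GBPUSD", "AUDUSD", "NZDUSD", "USDCAD",
         "USDCHF", "USDJPY", "USDNOK", "USDSEK"]
rng = np.random.default_rng(3)
common = rng.normal(size=500)   # shared factor so the synthetic series are correlated
data = {name: 0.6 * common + rng.normal(size=500) for name in names}

# Transformation 5 for every unordered pair (EURUSD-GBPUSD, EURUSD-AUDUSD, ...).
correlations = {(a, b): ewma_correlation(data[a], data[b], alpha=0.5)
                for a, b in combinations(names, 2)}
print(len(correlations))  # 9 currencies give 36 pairs
```

Under the same assumption, a lag-1 EWMA autocorrelation of a differenced series `d` could be obtained as `ewma_correlation(d[1:], d[:-1], alpha)`.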

6.7.2 US-Sectors

The same procedure was applied to the US sector data, and thereafter to the remaining data categories.

US sector Index/Transformation   2   3   4
MXUS0EN                          X   X   X
MXUS0MT                          X   X   X
MXUS0IN                          X   X   X
MXUS0CD                          X   X   X
MXUS0CS                          X   X   X
MXUS0HC                          0   X   X
MXUS0FN                          X   X   X
MXUS0IT                          0   X   X
MXUS0TC                          X   X   X
MXUS0UT                          X   X   X
MXUS                             X   X   X

Table 6.6: Table Obtained for US Sectors

For transformation 2, there is no α for which all the data pass both tests.

The optimal interval for transformation 3 is 0.941 ≤ α ≤ 0.981.

The optimal intervals for transformation 4 are 0.061 ≤ α ≤ 0.521 and 0.861 ≤ α ≤ 0.981.

The optimal interval for transformation 5 is 0.261 ≤ α ≤ 1, where correlations were calculated for various combinations of the US sectors. For example, the correlation between MXUS0CD (Consumer Discretionary) and MXUS0CS (Consumer Staples) was tried, thereafter between MXUS0CS and MXUS0UT (Utilities), and so on.


6.7.3 Countries Index

Countries Index/Transformation   2   3   4
MXUS                             X   X   X
MXEU                             X   X   X
MXGB                             X   X   X
MXFR                             X   X   X
MXCH                             X   X   X
MXES                             X   X   X
MXIT                             X   X   X
MXDE                             X   X   X

Table 6.7: Table Obtained for Countries Index

The optimal interval for transformation 2 is 0.101 ≤ α ≤ 1.

The optimal interval for transformation 3 is 0.961 ≤ α ≤ 0.981.

The optimal interval for transformation 4 is 0.181 ≤ α ≤ 0.981.

The optimal interval for transformation 5 is 0.041 ≤ α ≤ 1, where correlations were calculated as before.

6.7.4 Commodities

Commodity/Transformation   2   3   4
CL1                        X   X   X
XAUUSD                     X   0   X

Table 6.8: Table Obtained for Commodities

The optimal interval for transformation 2 is 0.161 ≤ α ≤ 1.

For transformation 3, there was no optimal value of α.

The optimal interval for transformation 4 is 0.181 ≤ α ≤ 0.981.

For transformation 5, correlations were calculated as for the previous financial variables; in other words, correlations between the different commodities were tried. There was no value of α that made the data pass both tests.


6.7.5 VIX (Market Volatility)

Index/Transformation   2   3   4
VIX                    X   X   X
V2X                    0   0   X

Table 6.9: Table Obtained for VIX

For transformation 2, there is no value of α that makes the data pass both hypothesis tests.

For transformation 3, there is likewise no value of α that makes the data pass both hypothesis tests.

The optimal intervals for transformation 4 are 0.181 ≤ α ≤ 0.421 and 0.921 ≤ α ≤ 0.981.

For transformation 5, correlations between all the different combinations of VIX indices were tried, and the interval 0.101 ≤ α ≤ 0.981 is optimal.

6.7.6 IR (Bonds)

IR Index/Transformation   2   3   4
USGG30YR                  X   X   X
USGG10YR                  X   X   X
USGG2YR                   X   X   X
CSI BARC                  X   X   X

Table 6.10: Table Obtained for IR Indexes

The optimal interval for transformation 2 is 0.121 ≤ α ≤ 1.

The optimal interval for transformation 3 is 0.921 ≤ α ≤ 0.981.

The optimal interval for transformation 4 is 0.961 ≤ α ≤ 0.981.

For transformation 5, correlations between all combinations of the different bond time series were tried, and there was no α that made all the data pass both tests.


6.7.7 Aggregated α

To obtain an aggregated α for each transformation, the values or intervals of α that made the majority of the time series pass the hypothesis tests were aggregated into one. As a result, the following values of α were obtained:

• Transformation 2: 0.381 ≤ α ≤ 1
• Transformation 3: α = 0.981
• Transformation 4: two different intervals were obtained, one in a lower range and one in a higher range, that potentially could make the data stationary: 0.181 ≤ α ≤ 0.421 and 0.921 ≤ α ≤ 0.981.
• Transformation 5: 0.141 ≤ α ≤ 1

Accordingly, for each transformation, the values of α presented above potentially make financial data stationary.


7 Conclusions

7.1 Interpretation and Impact

7.1.1 Trial 1

For the first trial, most transformed series appear stationary when examining the graphs. It is also seen that the correlation between the variables exceeds the limits -1 and 1, which is attributed to the use of real data. To stay within the limits, one could scale the data with a common factor. To confirm the graphical conclusions, two hypothesis tests were performed on each of the transformed data sets to see which ones indicate stationarity and which do not.

For the EURUSD exchange rate, all the p-values from the ADF tests are small, so this test indicates that all the transformations make the data stationary. The KPSS test, on the other hand, does not lead to the same conclusion: transformations 4 and 5 have low p-values, so the null hypothesis that the data are stationary is rejected. For USDCHF, the ADF test gives the same results, but the KPSS test indicates that transformations 3, 4 and 5 make the data stationary.

From the first trial, it is concluded that even though all the graphs visually seemed to become more stable, this does not necessarily coincide with the results of the hypothesis tests. It is also concluded that the ADF and KPSS tests need not agree and can contradict each other regarding the validity of the transformations.

7.1.2 Trial 2

Just as for trial 1, the transformations graphically seem to improve the data, except for the EWMA-variance transformation, which seems to amplify the peaks of the time series. To confirm stationarity, the hypothesis tests were performed on the commodity data. In this case, all the p-values from the ADF tests are low, which indicates stationarity. In contrast, the KPSS p-values for transformations 3 and 4 are also low, which suggests that these transformed series are non-stationary.

It is concluded from the KPSS test that the EWMA-variance makes already stationary data non-stationary, which was also seen graphically. Therefore, when applying these transformations to real data for actual prediction, one should stop once a transformation has made the data stationary rather than continuing, since more transformations do not necessarily result in more stationary data.
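The recommendation to stop once a transformation has produced stationary data can be expressed as a small pipeline. In this sketch, `first_stationary` and `crude_check` are hypothetical: `crude_check` merely compares the means of the two halves of the series and stands in for the combined ADF and KPSS criterion, and each transformation is applied to the original series (a chained transformation can be wrapped in a single function).

```python
from typing import Callable, Optional, Sequence, Tuple

import numpy as np

Transform = Callable[[np.ndarray], np.ndarray]


def first_stationary(x: np.ndarray,
                     transformations: Sequence[Tuple[str, Transform]],
                     is_stationary: Callable[[np.ndarray], bool]
                     ) -> Tuple[Optional[str], Optional[np.ndarray]]:
    """Try the original series, then each transformation in order, and stop at
    the first result judged stationary. Returns (None, None) if none qualifies."""
    if is_stationary(x):
        return "original", x
    for name, transform in transformations:
        y = transform(x)
        if is_stationary(y):
            return name, y
    return None, None


def crude_check(y: np.ndarray) -> bool:
    """Hypothetical stand-in for 'passes both ADF and KPSS': similar means in both halves."""
    half = len(y) // 2
    return abs(y[:half].mean() - y[half:].mean()) < 0.5


rng = np.random.default_rng(4)
x = 0.1 * np.arange(400) + rng.normal(size=400)   # trending series
candidates = [
    ("first difference", np.diff),
    ("second difference", lambda s: np.diff(s, n=2)),
]
name, y = first_stationary(x, candidates, crude_check)
print(name, len(y))
```

Ordering the candidates from simplest to most complex means that the least invasive transformation that achieves stationarity is the one kept.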

7.1.3 Trial 3

For the third trial, the gold price data was tried. As in trial 2, all transformations except EWMA-variance seem to have a stabilising effect. The p-values from the ADF tests all indicated stationarity. For the KPSS test, transformations 3 and 5 indicated a non-stationary data set. Trial 3 therefore provides a further indication that EWMA-variance is not a beneficial transformation.

7.1.4 Skewness and Kurtosis

As seen in the plotted moving averages, the approximated EWMAs of skewness and kurtosis are clearly not good enough estimates, since they do not resemble the moving average patterns. For the skewness, for example, there is a large difference between the values calculated with its mathematical formula and the values produced by the skewness transformation. Because of this, the skewness and kurtosis transformations were not used further, and it was concluded that better estimates are required for these transformations to be valid. This is an example of an area where more research could find better formulas and approximations for these characteristics.

7.1.5 First Difference as a Transformation

In accordance with theory, the first difference proves to be a good transformation. This was seen both graphically when plotting the transformed data and when applying the transformation to all the given data. Most of the data transformed with the first difference passes both hypothesis tests performed. Therefore, it is concluded that this will be a very useful transformation when pre-processing data for future regression models.

One important aspect to mention is that, in this project, a time series that had been made stationary by the first difference transformation was still tried with the remaining transformations to see their effects. To extend the scope of future projects, it could be relevant to ensure that data which is already stationary after the first difference transformation is not further treated and analysed when searching for optimal α values. This could provide another perspective on the results.

Also, the first difference was performed with lag 1. For seasonal series, whose season lengths may differ, a better result might have been obtained if the lag had been set to the length of the season. Hence, in future work, the seasonal component and its period should be examined before differencing a time series, in order to find an appropriate lag and thereby increase the probability of making the series stationary.

Additionally, some data may require a higher order of differencing to become stationary. Further research could examine whether first-order differencing alone is sufficient to make financial data stationary.
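Both suggestions, a seasonal lag and a higher order of differencing, fit in one helper. The sketch below applies `order` rounds of lag-`lag` differencing (the function name `difference` is hypothetical) to a synthetic series with a linear trend and a season of length 4, and to a quadratic trend.

```python
import numpy as np


def difference(x: np.ndarray, lag: int = 1, order: int = 1) -> np.ndarray:
    """Apply `order` rounds of lag-`lag` differencing: y_t = x_t - x_{t-lag}."""
    y = np.asarray(x, dtype=float)
    for _ in range(order):
        y = y[lag:] - y[:-lag]
    return y


t = np.arange(48)
season = np.resize([3.0, -1.0, 0.0, -2.0], t.size)   # period-4 seasonal pattern
x = 10 + 0.5 * t + season                              # linear trend + seasonality
quad = 0.1 * t**2                                      # quadratic trend

print(difference(x, lag=1)[:4])        # lag 1 leaves the seasonal pattern in place
print(difference(x, lag=4)[:4])        # lag = season length: constant 4 * 0.5 = 2.0
print(difference(quad, order=2)[:3])   # second order reduces a quadratic trend to ~0.2
```

Differencing at the season length removes both the seasonal pattern and the linear trend, while lag-1 differencing leaves the seasonal pattern in the series; second-order differencing turns a quadratic trend into a constant.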

7.1.6 Finding the Optimal α

As mentioned in the results, a value or intervals of α were found for each transformation. The results in Section 6.7.7 show that the value differs considerably between transformations, which makes it difficult to aggregate the values into a single α for all financial data. Compared with the theory stating that α below 0.3 is a good value, many lower values are indeed included in the obtained intervals. However, the intervals are wide, since many values of α made the data pass the hypothesis tests. This means that this project gives no indication that values of α below 0.3 are more useful than values above it. Nevertheless, the lower the value, the more past data is taken into consideration, which is desirable since past data is necessary for predicting future values.

For future projects, it is essential to include more data in order to obtain a clearer indication of which values of α are adequate for making financial time series stationary. The values of α obtained might only be suitable for the data used in this research and not for other input data. A wide interval does not necessarily indicate a bad result, since it could mean that the data used is simply easy to transform. As mentioned before, a future goal for SHB is to be able to predict a US stock market index. Among other factors, the index is affected by macroeconomic factors such as interest rates, unemployment rates and GDP figures.73 Consequently, this type of data might need to be pre-processed too. To broaden this research, the transformations can be tried on various types of macro data (or other relevant data affecting the stock market) to find optimal values of α that would make these data sets stationary.
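The statement that a lower α takes more past data into account follows from unrolling the recursion used in this thesis, in which α weights the newest value: an EWMA puts weight α(1 - α)^k on the observation k steps back, so an observation's weight halves after ln 0.5 / ln(1 - α) steps. The short Python sketch below evaluates this for a few values of α, including the 0.3 threshold mentioned above and the value 0.981 found for transformation 3; the function names are hypothetical, and α must lie strictly between 0 and 1.

```python
import math


def ewma_weight(alpha: float, k: int) -> float:
    """Weight that s_t = (1 - alpha) * s_{t-1} + alpha * x_t puts on x_{t-k}."""
    return alpha * (1 - alpha) ** k


def half_life(alpha: float) -> float:
    """Number of steps after which an observation's weight has halved (0 < alpha < 1)."""
    return math.log(0.5) / math.log(1 - alpha)


for alpha in (0.1, 0.3, 0.5, 0.981):
    print(f"alpha={alpha}: weight on x(t-1) = {ewma_weight(alpha, 1):.4f}, "
          f"half-life = {half_life(alpha):.2f} steps")
```

For α = 0.981 the half-life is well below one step, since the previous EWMA value is multiplied by only 1 - α = 0.019; for α = 0.1 an observation keeps half its weight for roughly six and a half steps.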

Using more hypothesis tests to make the conditions for "passing" stricter could also be necessary to narrow down the value or interval of α. In general, the majority of the transformations seem to make data stationary, at least for some values of α. EWMA-variance was an exception, since it tends to sometimes make data even more unstable and non-stationary. More precisely, this could be seen when the input data had outlying values or sharp fluctuations within a short period of time. These deviations become even more apparent after the transformation, and hence it is concluded that EWMA-variance is possibly sensitive to extreme values and deviations in the input data. This was seen both in the graphs and when looping to find an optimal α, since only values of α close to 0.98 worked. To understand why EWMA-variance is a poor transformation, its formula

$$F^{(4)}_{i,t} = (1-\alpha)\,F^{(4)}_{i,t-1} + \alpha\left(F^{(2)}_{i,t} - F^{(3)}_{i,t}\right)^{2}$$

is investigated further. If α is close to 1, the EWMA-variance from time t-1 has little influence. Most of the values will therefore depend almost solely on transformation 1 (first difference) and transformation 2 (EWMA-mean) at time t, i.e. on the terms $F^{(2)}_{i,t}$ and $F^{(3)}_{i,t}$. Since the previous values have so little influence, predictions of future values may be misleading. As mentioned, the results show that EWMA-variance does not handle deviations in the data well. This transformation might be more suitable for non-volatile data sets. However, to draw a conclusion, the transformation has to be tried on much more data, both highly fluctuating and relatively stable series, and the results compared. Another time weighting could also be tried in further research, for instance a log-space exponential moving average, since this has been reported to give more accurate results for very long-term and highly volatile data in other experiments.74

73 "Trading the Dow Jones industrial average". UFX. https://www.ufx.com/en-gb/assets/indices/dow-jones/ (Retrieved 2019-05-10)

7.2 Analysis of Timing of Entry and Competitive Rivalry

Machine-learning-based prediction of financial security prices is a fairly new topic, and it is therefore of interest to discuss the advantages and disadvantages of being among the first to commercialise the concept.

In the market for predicting securities' returns with machine learning, being the first mover can create many advantages for a firm. It could provide the opportunity to establish its model as the standard in the banking industry. This could make a strong impression on both customers and investors, and SHB may be perceived as an innovator or technological leader. In the long term, this may also result in increased customer loyalty.

Although there are many first-mover advantages for this project if SHB manages to be first, there are also disadvantages, such as excessive expenditure and resources spent on R&D for this specific project. During the project, it was noticed that a lot of resources had been used in order to stay at the forefront of this development. For example, this entire project was spent on validating specific data and transformations. In the case of kurtosis and skewness, an extensive amount of time was spent on transformations that later proved to be poor approximations and could not be used for obtaining stationarity of data. This could indicate that it may be more beneficial for SHB to be, for example, the second to enter the market. By then, methods for developing a prediction model and pre-processing data might already have been examined, and there could also be an existing prediction model to imitate.

74 "Log-space Exponential Moving Average", 2017-11-22. https://www.tradingview.com/script/cyfV1gLU-Log-space-Moving-Average/ (Retrieved 2019-05-07)

Since the long-term goal of this project is to develop a prediction model, it was interesting to analyse its potential outcome through Porter's Five Forces. The bargaining power of customers is not expected to be very high, since no banks in Sweden offer these opportunities to investors and customers based on machine learning. This indicates that being the first mover may strengthen SHB's bargaining power over its customers; SHB could, for instance, set the market prices for its offerings.

Since this area of machine learning is relatively new, there are not many competitors in this specific field among banks. However, this does not mean that there will be none in the future. Many other banks could be investing in this type of R&D in the hope of entering the market in the near future. In this case, being first to the market can be beneficial, since it gives additional time to gain market share.

However, even after applying these models, some relevant aspects are not covered in the analysis. For instance, there is no perspective on government and on how future regulations and laws could affect the market or limit and change the structure of banks. Banks have to follow regulations and laws not only for legal reasons but also to demonstrate credibility to customers. This means that if laws change, SHB has to adjust its business and models quickly.

Most importantly, there is currently no information about the end product or service to be offered in the future. Consequently, no final conclusion can be drawn regarding the strategy for entering a market. If the product were a mutual fund whose investment decisions are based on the forecasts of the future prediction model, it may be beneficial to be early in order to attract investors to the fund quickly. This is especially important for mutual funds, since one of their advantages is that they provide economies of scale (reducing transaction costs for investors). In this specific case, being the first mover could be an advantage, since it gives more time than competitors to gain these customers.75

7.3 Future Work

This thesis could be used in future work on more precise prediction of different time series, for example by making it possible to handle different types of data with common knowledge of how to make them stationary. The focus of this research was on specific financial data, but the scope could be widened. Moreover, data capturing social aspects could give an even better estimate. For example, social media data could have been used, such as Twitter hashtags or other factors that may affect how people behave on the financial markets. Recent research has shown that social media sentiment may have an impact on the stock market.76 Compared to ten years ago, the influence of apps and social media has increased significantly, which should be taken into consideration when predicting financial outcomes. This inherently raises the question of how this type of data can be pre-processed before being used as input to a prediction model. For instance, one question is whether it is possible to quantify social media data and thereafter use the transformations assessed in this research to make the data stationary. In that case, it is also interesting to examine for which values of the parameter α the data becomes stationary. Other macro data could also be investigated further, to see whether data with a lower frequency behave differently from the financial data used here, which has approximately daily frequency.

75 Segal, Troy. "Mutual Fund", 2019-05-20. https://www.investopedia.com/terms/m/mutualfund.asp (Retrieved 2019-05-23)

76 Chousa, P. Ramon J. "Influencing of social media over the stock market". Psychology and Marketing, 2017. Vol. 34(1), pp. 101-108

It could also be useful to investigate further how strongly a financial time series depends on a trend component and on a seasonal component, respectively, before transforming the series. In this way, it may be possible to identify transformations, or specific values of the weighting factor α, that suit certain types of time series. If such research is conducted in the future, one possible finding could be that one of the transformations is suitable for making cyclical time series stationary. Such findings would make it possible to adjust the transformation according to the type of series used as input.
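One way to quantify how much a series depends on its trend and seasonal components, sketched here under our own assumptions, is to decompose it additively and compute the strength measures F_T = max(0, 1 - Var(R)/Var(T + R)) and F_S = max(0, 1 - Var(R)/Var(S + R)) described by Hyndman and Athanasopoulos in Forecasting: Principles and Practice, where T, S and R are the trend, seasonal and remainder components. Values near 1 indicate a strong component and values near 0 a weak one. The sketch uses a classical moving-average decomposition in NumPy rather than the STL decomposition used in that book, and the seasonal period is an input that would have to be chosen for each financial series (for instance 5 for a trading week); neither choice comes from this study.

```python
import numpy as np


def classical_additive_decomposition(y, period):
    """Split y into trend + seasonal + remainder (classical additive method).

    The trend is a centred moving average (a 2 x period MA for an even
    period), so the first and last period // 2 trend values are NaN.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Average the detrended values at each position in the cycle and centre
    # the resulting seasonal indices so that they sum to zero.
    detrended = y - trend
    indices = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    indices -= indices.mean()
    seasonal = np.resize(indices, n)
    remainder = y - trend - seasonal
    return trend, seasonal, remainder


def component_strength(trend, seasonal, remainder):
    """Strength of trend and of seasonality, each in [0, 1]."""
    ok = ~np.isnan(trend)  # drop the ends where the moving average is undefined
    tr, sea, rem = trend[ok], seasonal[ok], remainder[ok]
    f_trend = max(0.0, 1.0 - np.var(rem) / np.var(tr + rem))
    f_season = max(0.0, 1.0 - np.var(rem) / np.var(sea + rem))
    return float(f_trend), float(f_season)


# Synthetic example: linear trend + period-12 cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(240)
y = 0.5 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)
f_t, f_s = component_strength(*classical_additive_decomposition(y, period=12))
print(f"trend strength {f_t:.2f}, seasonal strength {f_s:.2f}")
```

Computing these two numbers for each series before transforming it would make it possible to group series by how trend- or season-dominated they are, and then compare which transformation or α works best within each group.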

Regarding the results, further research into the differences between the ADF-test and the KPSS-test would be valuable, since their results are sometimes contradictory. The most common stationarity tests have either the same null hypothesis as the ADF-test (a unit root) or the opposite one (stationarity, as in the KPSS-test). Therefore, it was considered sufficient in this project to use only these two tests. With more time, further hypothesis tests based on other hypotheses, which could provide additional evidence for or against stationarity, could be used. Moreover, more transformations and more data could be tried in the future to obtain even more reliable results.
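Because the two tests have opposite null hypotheses, their results can be combined into four cases. The small helper below is a sketch of one commonly cited reading of those cases; the function name, the 5 % significance level and the wording of the conflicting cases are our assumptions, and a conflict may equally reflect low power in either test rather than a specific type of non-stationarity.

```python
def combine_adf_kpss(adf_pvalue: float, kpss_pvalue: float, level: float = 0.05) -> str:
    """Read an ADF and a KPSS p-value together (hypothetical helper).

    ADF:  H0 = unit root. Rejecting H0 (small p-value) supports stationarity.
    KPSS: H0 = stationarity. Rejecting H0 (small p-value) indicates non-stationarity.
    The 5 % significance level is an assumption.
    """
    adf_stationary = adf_pvalue < level
    kpss_stationary = kpss_pvalue >= level
    if adf_stationary and kpss_stationary:
        return "stationary (both tests agree)"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary (both tests agree)"
    if kpss_stationary:
        # ADF cannot reject a unit root, KPSS cannot reject stationarity.
        return "conflict: possibly trend-stationary, or one test lacks power"
    # ADF rejects a unit root, but KPSS rejects stationarity.
    return "conflict: possibly difference-stationary, or one test lacks power"


print(combine_adf_kpss(adf_pvalue=0.40, kpss_pvalue=0.10))
```

In statsmodels, the p-values can be taken from `adfuller(x)[1]` and `kpss(x, regression="c", nlags="auto")[1]` in `statsmodels.tsa.stattools`. Note that statsmodels interpolates the KPSS p-value from a table of critical values and returns the boundary value when the p-value lies outside the interval from 0.01 to 0.1, so the KPSS p-value is coarse.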

We believe that the KPSS-test and the ADF-test show different results because they are based on different underlying assumptions. Moreover, both hypothesis tests are based on the additive model for time series. As explained, the additive model was chosen because the plots indicated an additive structure: the amplitude of the graphs did not seem to grow by a multiplicative factor. It was also chosen for its simplicity. To improve this project, other tests of the relationship between the level of a time series and its amplitude could be used to gain further evidence for the choice of model type. Since not all of the data was plotted from the start, the choice of model is a clear assumption in this project and should be examined further in the future. It was also discovered later that one can check whether a time series is additive or multiplicative by decomposing the data and analysing the residuals, for example with Python programs.77 This could be done for greater accuracy.
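A simple check of whether the amplitude grows with the level, in the spirit of a range-mean plot, is to split the series into full seasonal cycles and regress the logarithm of each cycle's range on the logarithm of its mean. A slope close to 0 points towards an additive model and a slope close to 1 towards a multiplicative one. This is a rough heuristic of our own choosing, not the decomposition-and-residuals method referred to in the text; the function name and the cycle length are assumptions, and the series must be strictly positive.

```python
import numpy as np


def range_mean_slope(y, period):
    """Slope of log(range) on log(mean) across consecutive full cycles.

    Rough heuristic: ~0 suggests an additive model (swing independent of
    level), ~1 suggests a multiplicative model (swing proportional to level).
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the heuristic needs strictly positive data")
    n_cycles = len(y) // period
    if n_cycles < 2:
        raise ValueError("need at least two full cycles")
    cycles = y[: n_cycles * period].reshape(n_cycles, period)
    means = cycles.mean(axis=1)
    ranges = cycles.max(axis=1) - cycles.min(axis=1)
    if np.any(ranges <= 0):
        raise ValueError("every cycle must have a positive range")
    slope, _intercept = np.polyfit(np.log(means), np.log(ranges), 1)
    return float(slope)


# Synthetic series: the same trend and cycle combined additively and multiplicatively.
t = np.arange(120)
additive = 50 + t + 5 * np.sin(2 * np.pi * t / 12)
multiplicative = (10 + t) * (1 + 0.3 * np.sin(2 * np.pi * t / 12))
print(f"additive: {range_mean_slope(additive, 12):.2f}")
print(f"multiplicative: {range_mean_slope(multiplicative, 12):.2f}")
```

For a series without a clear seasonal cycle, fixed-length windows (for example 20 trading days) could be used in place of `period`; the trend inside each window then inflates the ranges, so the slope should be read as indicative only.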

Another, more technical, improvement would have been to loop through α with smaller steps, for example 10^-7. As mentioned, this was not done since it requires a lot of computing capacity and takes too much time. With greater resources, this could have been done to see how sensitive the value of α is to minor changes. Also, this was an extensive project carried out over a short period, so there was not enough time for further analysis of the data beyond the subject of stationarity. With more time, more analysis could be done to find outliers in the data and, for example, treat them with other diagnostics.
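If the stationarity criterion can be assumed to switch only once as α increases (non-stationary below some threshold, stationary above it), the threshold can be located to a precision of 10^-7 by bisection in about 24 evaluations, instead of the roughly 10^7 evaluations a grid with that step size would need on [0, 1]. The sketch below assumes exactly that monotonicity; `is_stationary` is a hypothetical callback that would apply the transformation with a given α and then run, for example, the ADF-test and the KPSS-test.

```python
from typing import Callable, Tuple


def find_alpha_threshold(
    is_stationary: Callable[[float], bool],
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-7,
) -> Tuple[float, int]:
    """Bisection for the smallest alpha (within tol) that passes the criterion.

    Assumes the criterion is monotone in alpha: False at `lo`, True at `hi`,
    and it never switches back to False once it has become True.
    Returns (threshold, number of criterion evaluations).
    """
    if is_stationary(lo):
        return lo, 1
    if not is_stationary(hi):
        raise ValueError("criterion not met even at the upper bound")
    evaluations = 2
    # Each step halves the interval [lo, hi] that contains the switch point.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        evaluations += 1
        if is_stationary(mid):
            hi = mid
        else:
            lo = mid
    return hi, evaluations


# Toy stand-in for "transform the series with alpha, then run the tests".
threshold, n_evals = find_alpha_threshold(lambda a: a >= 0.3141592)
print(f"threshold ~ {threshold:.7f} after {n_evals} evaluations")  # 26 evaluations
```

If the criterion is not monotone in α, a coarse grid could first locate the intervals where the test outcome changes, and bisection could then refine each of them.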

77 "Is my time series additive or multiplicative?", 2017. https://www.r-bloggers.com/is-my-time-series-additive-or-multiplicative/ (Retrieved 2019-05-06)


7.4 Benefits for SHB and its Stakeholders

SHB has many stakeholders affecting its business, internally as well as externally. The main interest groups are the employees, customers, society, owners and investors. When proceeding with a project, it is important to identify these groups and decide how to handle their different interests. This project was limited to the pre-processing of the initial data, which means that this part has no direct link to the different stakeholders, since the machine learning model has not yet been built and the work is kept internally within the project group. However, as mentioned, the broader hope is that SHB will now be able to build a prediction model for, for example, different stocks and indices. Such a model would have a significant impact on investors, customers and society in general. With an established machine-learning method of this kind, better-informed investment decisions and recommendations could be provided. This would increase not only SHB's profit but also the strength of its brand, which has a clear link to investors, customers and, hopefully, employees as stakeholders. SHB is also outspoken about its will to contribute to the Sustainable Development Goals (SDGs). Since society and its people are also vital stakeholders, the ways in which the project could add value to them should be discussed in terms of its impact on the SDGs.78

One of the SDGs in focus is to promote sustained economic growth, higher levels of productivity and technological innovation. In the long term, this leads to better conditions for jobs and encourages entrepreneurship. This project could contribute to this goal if machine learning for predicting financial outcomes is focused more precisely on green investment opportunities or infrastructure stocks. If data of this kind is used for the model, this would also contribute to sustainable cities and communities as well as to climate action, two more very important development goals. Therefore, depending on the model developed, stakeholders such as society in general can also be affected in a positive way. To summarize, we see this not only as a financially beneficial project but also as one that, depending on the future model development, could help SHB, its brand and society overall, even though this bachelor thesis covers only the data-handling part of the process.

78 Handelsbanken. "Sustainability Report", 2017. https://www.industrivarden.se/globalassets/innehavsbolagen/hallbarhetsred_2017_eng.pdf (Retrieved 2019-05-06)

7.5 Final Words

We are very thankful for the opportunity to participate in a project created and directed by Handelsbanken. We have learned a lot about working within a specialized project and about how to use the tools provided. Handling all the resources as effectively as possible was a valuable learning experience, and we hope that this project will contribute to new research or applications.

We received support throughout the project, both from the school and from Handelsbanken, which we value greatly. The project has given us a deep understanding of time series analysis and of treatments for stationarity.


References

[1] ADF - Augmented Dickey Fuller Test. https://www.statisticshowto.datasciencecentral.com/adf-augmented-dickey-fuller-test/ Retrieved 2019-03-15

[2] Adhikari, Ratnadip et al. An Introductory Study on Time Series Modeling and Forecasting, pp. 16-19. https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf Retrieved 2019-02-19

[3] Autocorrelation function. http://www.real-statistics.com/time-series-analysis/stochastic-processes/autocorrelation-function/ Retrieved 2019-04-03

[4] BlackRock. ETF Landscape: Industry Highlights, 2012-04-12. https://www.fondsprofessionell.at/upload/attach/1336476343.pdf Retrieved 2019-05-02

[5] Breaking Down Finance. EXPONENTIALLY MOVING AVERAGE VOLATILITY (EWMA). https://breakingdownfinance.com/finance-topics/risk-management/ewma/ Retrieved 2019-05-03

[6] Brownlee, Jason. How to Check if Time Series Data is Stationary with Python, 2016-12-30. https://machinelearningmastery.com/time-series-data-stationary-python/ Retrieved 2019-03-09

[7] Brownlee, Jason. Time Series Forecasting as Supervised Learning, 2015-12-05. https://machinelearningmastery.com/time-series-forecasting-supervised-learning/ Retrieved 2019-02-02

[8] Brownlee, Jason. How to Identify and Remove Seasonality from Time Series Data with Python, 2016-12-23. https://machinelearningmastery.com/time-series-seasonality-with-python/ Retrieved 2019-04-14


[9] Brownlee, Jason. What is Time Series Forecasting? Machine Learning Mastery, 2016-12-02. https://machinelearningmastery.com/time-series-forecasting/?fbclid=IwAR1Zpv80x-4EEN-IIo-h1HL5fGHF6fD-OZYpknScLWdmU-p3uJ8_03ZF9Ag Retrieved 2019-05-01

[10] Cappuccio, Nunzio et al. The Fragility of the KPSS Stationarity Test, 2009. http://leonardo3.dse.univr.it/home/workingpapers/fragility_kpss.pdf?fbclid=IwAR0snLcQCpmgyNCMq0eR9JgXXwFW3hnIZykKcv72IbZO7t57goM9d1W4xGI Retrieved 2019-04-30

[11] Investopedia. CBOE Volatility Index (VIX) Definition. https://www.investopedia.com/terms/v/vix.asp Retrieved 2019-05-03

[12] Collecting Data. http://betterthesis.dk/research-methods/lesson-1different-approaches-to-research/collecting-data Retrieved 2019-02-09

[13] CORREL function. https://support.office.com/en-gb/article/correl-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92 Retrieved 2019-04-03

[14] Chousa, P. Ramon J. Influencing of social media over the stock market. Psychology and Marketing, 2017. Vol. 34(1), pp. 101-108

[15] Deshpande, Bala. Time series forecasting: understanding trend and seasonality, 2014-03-12. http://www.simafore.com/blog/bid/205420/Time-series-forecasting-understanding-trend-and-seasonality Retrieved 2019-05-01

[16] EURO STOXX® 50 VOLATILITY (VSTOXX) INDEX, 2019-03-29. https://www.stoxx.com/document/Bookmarks/CurrentFactsheets/V2T.pdf Retrieved 2019-05-03

[17] Exponentially Weighted Moving Average. https://www.value-at-risk.net/exponentially-weighted-moving-average-ewma Retrieved 2019-03-02


[18] Generalized Additive Models, 2017-07-06. https://machinelearningmastery.com/time-series-data-stationary-python/ Retrieved 2019-05-06

[19] Hayes, Adam. Correlation, 2019-04-30. https://www.investopedia.com/terms/c/correlation.asp Retrieved 2019-05-01

[20] Handelsbanken. Sustainability Report, 2017. https://www.industrivarden.se/globalassets/innehavsbolagen/hallbarhetsred_2017_eng.pdf Retrieved 2019-05-06

[21] Hypotesprövning [Hypothesis testing]. http://gauss.stat.su.se/gu/sg/2012VT/Kompendium/KAP17new.pdf Retrieved 2019-05-03

[22] Investopedia. Currency Pair Definition. https://www.investopedia.com/terms/c/currencypair.asp Retrieved 2019-05-04

[23] Investopedia. CBOE Volatility Index (VIX) Definition. https://www.investopedia.com/terms/v/vix.asp Retrieved 2019-05-03

[24] Investment & Finance. USGG10YR, 2014-02-07. https://www.investment-and-finance.net/finance/u/usgg10yr.html Retrieved 2019-05-02

[25] Jinka, Preetam. Exponential Smoothing for Time Series Forecasting, 2017-06-22. https://www.vividcortex.com/blog/exponential-smoothing-for-time-series-forecasting?fbclid=IwAR2XCtbMASHciBFEIRrpRkVvJda6ziKVJ3qCirAQJ3Oc3GsNBk5VZ4xLd0Q Retrieved 2019-02-18

[26] Journal of Econometrics. Testing the null hypothesis of stationarity against the alternative of a unit root, 1991. http://debis.deu.edu.tr/userweb//onder.hanedar/dosyalar/kpss.pdf?fbclid=IwAR3uwIVD3WTB1T865Kv3ZotZ3iBaM9nEuq44dIpRr1ULvrVTgvHefVQqwG8 Retrieved 2019-04-30

[27] Journal of International Studies. Seasonal patterns in oil prices and their implications for investors, 2018.


[28] Kang, Eugine. Time Series, Check Stationarity, 2018-08-26. https://medium.com/@kangeugine/time-series-check-stationarity-1bee9085da05 Retrieved 2019-02-23

[29] Kenton, Will. Autocorrelation, 2019-03-31. https://www.investopedia.com/terms/a/autocorrelation.asp Retrieved 2019-04-03

[30] KURT function. https://support.office.com/en-us/article/kurt-function-bc3a265c-5da4-4dcb-b7fd-c237789095ab Retrieved 2019-04-03

[31] Kulahci, Murat et al. Time Series Analysis and Forecasting by Example, 2011, p. 90

[32] Kuepper, Justin. Volatility Definition, 2019-04-18. https://www.investopedia.com/terms/v/volatility.asp Retrieved 2019-05-01

[33] Lindgren, George. Stationary stochastic processes, pp. 13-16. http://www.math.chalmers.se/~rootzen/fintid/stationary120312.pdf Retrieved 2019-02-02

[34] Log-space Exponential Moving Average, 2017-11-22. https://www.tradingview.com/script/cyfV1gLU-Log-space-Moving-Average/ Retrieved 2019-05-07

[35] MSCI. MSCI US Index, 2019-04-30. https://www.msci.com/documents/10199/67a768a1-71d0-4bd0-8d7e-f7b53e8d0d9f Retrieved 2019-05-03

[36] MSCI. MSCI USA MATERIALS INDEX, 2019-04-30. https://www.msci.com/documents/10199/6ce4617e-9127-480f-8f3b-1fdf4c0c8962 Retrieved 2019-05-03

[37] MSCI. Index solutions. https://www.msci.com/index-solutions Retrieved 2019-05-18

[38] Nabeya, Seiji et al. Asymptotic Theory of a Test for the Constancy of Regression Coefficients Against the Random Walk Alternative, 1987. https://projecteuclid.org/download/pdf_1/euclid.aos/1176350701?fbclid=IwAR2Rt2XpMITe_xA880DiEC4qzo8VEjzmA7HjMKNyp3mKSoKSAXhOaYFf85c Retrieved 2019-04-30


[39] Palaniappan, Vivek. Using Machine Learning to Predict Stock Prices, 2018-10-31. https://medium.com/analytics-vidhya/using-machine-learning-to-predict-stock-prices-c4d0b23b029a Retrieved 2019-02-02

[40] Pantelis, Anastasios. Testing for unit roots in the presence of structural changes, 2008. http://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=1338330&fileOId=1646631 Retrieved 2019-03-09

[41] Newton, Paul and Bristoll, Helen. Porter's Five Forces, pp. 20-25

[42] Ragnarstrom, Elsa. How to calculate forecast accuracy for stocked items with a lumpy demands, 2015. https://www.diva-portal.org/smash/get/diva2:901177/FULLTEXT01.pdf Retrieved 2019-05-03

[43] Ryabko, Daniil. Asymptotic Nonparametric Statistical Analysis of Stationary Time Series, 2019-03-30. https://arxiv.org/abs/1904.00173 Retrieved 2019-05-01

[44] Sample Mean. https://www.statisticshowto.datasciencecentral.com/sample-mean/ Retrieved 2019-04-03

[45] Sample Variance: Simple Definition, How to Find it in Easy Steps. https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/descriptive-statistics/sample-variance/ Retrieved 2019-04-03

[46] Sarlin, Peter and Björk, Kaj-Mikael. Machine learning in finance. Neurocomputing, Vol. 264, 2017: 1-88

[47] Schilling, Melissa. Strategic Management of Technological Innovation, 5th edition, 2017, pp. 93-97

[48] SKEW function. https://support.office.com/en-ie/article/skew-function-bdf49d86-b1ef-4804-a046-28eaea69c9fa Retrieved 2019-04-03


[49] Stationarity and Differencing. https://www.statisticshowto.datasciencecentral.com/stationarity/ Retrieved 2019-03-02

[50] statsmodels.tsa.stattools.kpss. https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.kpss.html Retrieved 2019-05-04

[51] Segal, Troy. Mutual Fund, 2019-05-20. https://www.investopedia.com/terms/m/mutualfund.asp Retrieved 2019-05-23

[52] Tests of Stationarity. https://people.maths.bris.ac.uk/~magpn/Research/LSTS/TOS.html Retrieved 2019-02-12

[53] The Augmented Dickey-Fuller Test. https://www.thoughtco.com/the-augmented-dickey-fuller-test-1145985 Retrieved 2019-02-27

[54] Time Series. http://www.businessdictionary.com/definition/time-series.html Retrieved 2019-01-30

[55] Time Series Analysis: Building a Model on Non-stationary Time Series, 2018-01-30. https://datascienceplus.com/time-series-analysis-building-a-model-on-non-stationary-time-series/ Retrieved 2019-03-23

[56] UFX. Trading the Dow Jones industrial average, 2018-01-30. https://www.ufx.com/en-gb/assets/indices/dow-jones/ Retrieved 2019-05-10

[57] Verbeek, Marno. A Guide to Modern Econometrics, 2nd Edition, 2014, pp. 265-268

[58] 6 Factors that Influence Exchange Rates. https://www.investopedia.com/trading/factors-influence-exchange-rates/ Retrieved 2019-05-01
