Modeling Implied Volatility
Rongjiao Ji
Thesis to obtain the Master of Science Degree in
Mathematics and Applications
Supervisors: Prof. Cláudia Rita Ribeiro Coelho Nunes Philippart and Prof. Maria do Rosário de Oliveira Silva
Examination Committee
Chairperson: Prof. António Manuel Pacheco Pires
Supervisor: Prof. Cláudia Rita Ribeiro Coelho Nunes Philippart
Member of the Committee: Prof. Maria da Conceição Esperança Amado
November 2017
Acknowledgments
I would like to express my great appreciation to my supervisors, Prof. Cláudia Nunes Philippart and Prof. Maria do Rosário de Oliveira Silva, for their guidance and suggestions. I am very grateful to my family for their love, support and understanding. I deeply appreciate the help, company and encouragement from Ana Galhoz and Chenshan Xu.
Resumo
Relativamente à questão de determinação de preços de contratos de produtos derivados, a volatilidade do preço da acção ao longo do tempo é frequentemente desconhecida. Volatilidade é uma medida da aleatoriedade, permitindo avaliar quão incerto é o movimento do preço no futuro.
Neste trabalho deriva-se a volatilidade implícita em cada contrato, usando a fórmula de Black-Scholes. Como não é possível determinar analiticamente este valor, sendo necessário recorrer a métodos numéricos, recorre-se ao método da bissecção.
Discute-se a determinação da volatilidade implícita usando como valor de entrada na fórmula de Black-Scholes o preço dos futuros e o preço das acções envolvidas. Adicionalmente apresentam-se dois métodos de cálculo, de forma a aumentar a precisão das estimativas obtidas.
Apresentam-se vários modelos para ajustamento dos valores obtidos, nomeadamente modelos baseados em regressão quantílica linear e florestas aleatórias. Usando estes modelos, é feita previsão de volatilidade, a qual poderá ser utilizada para prever o preço de uma acção no futuro. Desta forma os investidores poderão ter mais informação referente às suas decisões de investimento, nomeadamente se deverão comprar ou vender opções.
Palavras-chave: volatilidade implícita, fórmula de Black-Scholes, regressão quantílica, florestas aleatórias
Abstract
With respect to the valuation of derivative contracts in finance, the volatility of the price of the underlying asset is often unknown. Volatility is a measure of randomness, allowing us to assess how uncertain the future price movement is.
In this work we first derive the implied volatility for each contract, using the Black-Scholes formula.
Since it is not possible to determine the implied volatility analytically, one needs to resort to numerical
methods. Here we propose to use the bisection method to compute the estimated value of the implied
volatility.
The determination of implied volatility is then discussed, using either the future price or the asset price as input to the Black-Scholes formula. In addition, two calculation methods are presented in order to increase the accuracy of the estimates obtained.
Several models are presented to fit the obtained values, namely models based on linear quantile regression and on random forests. Using these models, we may forecast the implied volatility and then use these forecasts to predict the future price of an option contract. In this way, investors are able to gather more information for their investment decisions; in particular, they may decide whether to buy or sell options, according to their expectations regarding the future behavior of the market.
Keywords: implied volatility, Black-Scholes formula, quantile regression, random forests.
Contents
1 Introduction
  1.1 Motivation
  1.2 Dataset Description
    1.2.1 Combination of options
    1.2.2 EURO STOXX50 index price
  1.3 Objectives
  1.4 Thesis Outline
2 Financial Theoretical Overview
  2.1 Assumptions On The Assets
  2.2 Financial Derivatives
    2.2.1 Interest Rate / Discounts
    2.2.2 Index
    2.2.3 Forward and Future Contract
    2.2.4 Option
  2.3 Implied Volatility
  2.4 Put-Call Parity
  2.5 Black-Scholes Formula
3 Statistical Theoretical Overview
  3.1 Bisection Method
  3.2 Pre-processing Methods
    3.2.1 The Formula of Box-Cox Transformation
    3.2.2 Mahalanobis Distance
  3.3 Quantile Regression Method
    3.3.1 Linear regression and its quantile application
    3.3.2 Tree-based regressors and their quantile application
4 Computation and Analysis of Implied Volatility
  4.1 Calculation Processes
  4.2 Analysis of the computed implied volatility
    4.2.1 Analysis and comparison based on the example
    4.2.2 Analysis on the entire IVF
5 Modeling and Predicting the Implied Volatility
  5.1 The process of choosing a subset
  5.2 Explanatory Analysis
  5.3 Regression Modeling
    5.3.1 Pre-processing process
    5.3.2 Linear Regression
    5.3.3 Quantile Random Forest
    5.3.4 A comparison in test set
6 Conclusions
  6.1 Achievements
  6.2 Directions for Future Work
Bibliography
List of Tables
1.1 The distribution of trading dates and maturities on weekdays for call and put options. Almost all the maturities concentrate on Friday.
1.2 The distribution of trading dates and maturities in months for call and put options. The months with more maturities are March, June, September and December.
1.3 The descriptions of relevant terms provided for the three types of derivatives (option, future and discount). The star (*) indicates that the marked term was added. The ’Example Set’, which contains the trading information for a fixed date (January 03, 2014) and maturity (March 21, 2014), is shown at the far right.
1.4 Some call and put options valid from Jan 03, 2014 to Mar 21, 2014 with strikes ranging from 3000 to 3200. These are the options whose prices are close to the crosspoint of call and put option prices. The values in brackets are ask prices, while the values outside are bid prices.
3.1 Parameter λ’s values and corresponding transformations.
3.2 Comparison between linear, median, and quantile regression methods.
4.1 The contracts with stable estimated implied volatilities IV_F.
5.1 Statistical description for options whose maturities are at 2015-09-18.
5.2 An overview of the response variable together with the original covariates and the ones generated afterwards for the regression models. Both notations and descriptions are displayed.
5.3 Summary of covariates in the regression functions.
5.4 The coefficients of the five covariates and their significance in the robust linear regression.
5.5 The summary of coefficients for different quantiles.
5.6 Summary of the two regression models’ results.
List of Figures
1.1 The tendencies of the prices of call and corresponding put options against the strike price in the Example Set. An enlargement of the crosspoint of the lines is shown in the top right, where the differences between bid prices and ask prices, overlapped at the original scale, are clearly visible. As the strike rises, the price of the call options declines, while the price of the put options increases.
1.2 Four types of call and put options’ prices for different maturities.
4.1 Framework of computation Method 1 (m1) and Method 2 (m2).
4.2 The framework of how we organize the related datasets.
4.3 Different types of implied volatility based on the Example Set. Implied volatilities computed from the future price, the asset price and the constructed price with the two computation methods are shown in different colors. The results computed by method 1 (denoted as m1 in the legend) are marked ’o’, while the ones from method 2 (denoted as m2 in the legend) are marked ’+’.
4.4 Comparison of the implied volatility derived by both methods.
4.5 Option (ask) prices of the call and put options for the contracts with a future price involved.
5.1 Boxplot of trading dates for every maturity that appears.
5.2 Sample sizes and rates of each maturity. The red numbers at the top of the columns give the proportion of the sample size of each maturity in the total number of observations.
5.3 Pair plot for options whose maturities are at 2015-09-18. Here we plot the seven variables, namely time to maturity, strike, constructed price, IV (m1), IV (ask), IV (bid) and IV (m2).
5.4 3D plots from the side of both time to maturity and strike to check the distribution of the implied volatility.
5.5 The distribution of the response variable before and after the Box-Cox transformation.
5.6 Plots with extreme values detected by robust Mahalanobis distance. Points whose robust Mahalanobis distances are among the largest 1% in the dataset are treated as extreme values and marked green.
5.7 General plots of regression Model 3. The heavy-tail problem indicates that the fit can hardly be treated as a good one.
5.8 Plots of the coefficients in quantile linear regression. Each black dot is the corresponding variable’s coefficient for the quantile τ chosen in the set {0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95}. The red lines are the ordinary least squares estimate and its confidence interval.
5.9 Quantile linear regressions for the transformed implied volatility against strike and time to maturity. The quantile fitted values of the response variable in the lower quantile, median and higher quantile are shown.
5.10 90% prediction interval for the test set in quantile linear regression.
5.11 90% prediction interval for the test set in Quantile Random Forest.
5.12 Quantile Random Forests for the transformed implied volatility against strike and time to maturity. The quantile fitted values of the response variable in the lower quantile, median and higher quantile are shown.
5.13 Two methods with median prediction and quantile boundaries.
Chapter 1
Introduction
1.1 Motivation
The famous Black-Scholes formula [1] was the first formal reply to the million-dollar question: how to value an option contract. Even though this question had been studied for more than a hundred years, Black and Scholes [1] and Merton [2] provided a breakthrough result in 1973, and it remains the cornerstone of modern option pricing studies. An outsider may have the idea that the main goal is to capture the empirical properties of option prices. The logic and purpose behind it is that an option pricing model can be used as a tool to obtain the features of option prices, such as implied volatility, by relating the option price to the price of the underlying asset in an arbitrage-free manner (i.e. with no way to make a risk-free profit) [3]. The main thrust behind the Black-Scholes model is to find such a way to fit the market prices of options. The Black-Scholes formula is not used as a pricing model for vanilla options (options without any special feature), but is applied to transform the market prices into an expression in terms of implied volatility [4].
As is well known, volatility is a measure of randomness and an evaluation of how uncertain the future price movement is, specifically the movement of the price of an option’s underlying asset [5]. A higher volatility indicates a greater expected fluctuation of the underlying asset’s price [6]. Although the volatility associated with the underlying asset(s) is assumed to be constant and known in the Black-Scholes formula, in reality it changes over time and is mostly unknown. Rather than studying the historical volatility of the asset’s price, the focus nowadays has shifted to the implied volatility [7]. Prices in option markets are in fact commonly quoted in terms of Black-Scholes implied volatility. The value of a call option generated by the Black-Scholes formula is a strictly increasing function of the volatility parameter [3]. Thus, given observed option prices in the market, the value of the implied volatility can be deduced by matching the corresponding Black-Scholes price with the market price [8]. The implied volatility observed in the real market shows a pattern named the volatility smile, influenced by the time to maturity and the strike price of the option. For this reason, several advanced parametric models have been proposed for implied volatility.
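Since the Black-Scholes call value is strictly increasing in the volatility parameter, backing out the implied volatility amounts to a one-dimensional root search. The following is a minimal sketch of that matching step, not the thesis’s actual computation (which is developed in Chapter 4 with future and constructed prices as inputs); the function names and the bracketing interval are choices made here for illustration.

```python
import math

def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, r, T, lo=1e-4, hi=5.0, tol=1e-8):
    """Bisection: valid because the call price is strictly increasing in sigma."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, T, mid) < price:
            lo = mid  # mid-volatility price too low: root lies above
        else:
            hi = mid  # price too high: root lies below
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For instance, pricing an at-the-money call with a known volatility and feeding the resulting price back into `implied_vol` recovers that volatility up to the tolerance.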
With the rapid development of artificial intelligence and the rapidly growing computational power of hardware over the last two decades, machine learning has become a hot topic and brings brand new perspectives for understanding the contemporary data-driven society and its industries. Finance, in particular, is a representative industry, built by countless buyers and sellers and by the tremendous amount of data generated through trading. In contrast to theoretical parametric models, such as stochastic volatility models [9] and jump-diffusion models [10], non-parametric models based on machine learning techniques, which usually impose weaker restrictions than the former, have been developed for pricing options [11].
Quantile methods [12] have been studied and used in multiple fields for regression problems. Commonly, statisticians and engineers analyze the conditional mean of the response variable by minimizing the expected squared error loss, neglecting every aspect of the conditional distribution other than the mean. Quantile methods, going beyond the conditional mean, give more complete information about the distribution of the response variable as a function of the explanatory variables, rather than a single value alone. Not only can a quantile method build prediction intervals to judge the reliability of predictions, it can also detect extreme values of the response variable, for example when an observation lies far from the conditional median. While the conditional mean minimizes the expected squared error loss, the conditional quantiles minimize the expected weighted absolute deviation loss. Takeuchi et al. [13] presented a nonparametric version of a quantile estimator with uniform convergence and bounds on the quantile property. Meinshausen [14] inferred conditional quantiles with quantile regression forests, a generalization of random forests. Quantile methods have begun to be used in financial studies as well. Žikeš and Baruník [15] investigated how the conditional quantiles of future returns and volatility of financial assets vary with the variation of the asset prices, through simple linear quantile autoregressions.
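The weighted absolute deviation loss mentioned above can be made concrete: the τ-quantile is exactly the minimizer of the so-called pinball (check) loss. A small illustrative sketch follows; the function names, sample and grid are choices made here, not taken from the thesis.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball loss of candidate q; minimized by the tau-quantile of y."""
    u = y - q
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))

# Minimizing the pinball loss over a grid recovers the empirical quantile:
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)           # a standard normal sample
grid = np.linspace(-3.0, 3.0, 601)    # candidate values of q
best = grid[np.argmin([pinball_loss(y, q, 0.9) for q in grid])]
# 'best' should sit close to the empirical 0.9-quantile of the sample
```

Replacing the squared error loss by this loss inside a linear model or a forest is precisely what turns them into the quantile regressors discussed in Chapter 3.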
In this work, we use some of these methods to predict future volatility. For that purpose, we use data
(that will be discussed in the next section) from which we can compute the implied volatility and perform
a further study.
1.2 Dataset Description
The dataset used in this work was provided by the bank BNP Paribas and summarizes information over three types of derivatives (option contracts, discount values and future contracts). These three derivatives are all based on the same underlying asset, the EURO STOXX50 index, which is Europe’s leading blue-chip index for the Eurozone. The original dataset contains 312339 option contracts (of which 147416 are call options and 164923 are put options), 7255 discount values, and 1153 future contracts.
Specifically, as the main object, the option contracts include trades on 424 different dates, ranging from January 02, 2014 to October 29, 2015. Tables 1.1 and 1.2 show the distribution of dates and maturities for call and put options over weekdays and months. A simple observation of these tables shows that the trading dates of the option contracts are almost uniformly distributed amongst weekdays, with slightly more falling in summer and autumn and fewer in winter. For the maturities, however, almost all concentrate on Friday, and the months with more maturities are March, June, September and December.
Table 1.1: The distribution of trading dates and maturities on weekdays for call and put options. Almost all the maturities concentrate on Friday.

               Monday   Tuesday  Wednesday  Thursday  Friday
Call  Date     20.30%   20.80%   20.50%     19.50%    18.80%
      Maturity  0.13%    0.08%    0.00%      0.75%    99.00%
Put   Date     20.10%   20.60%   20.50%     19.90%    18.80%
      Maturity  0.13%    0.07%    0.00%      0.92%    98.90%
Table 1.2: The distribution of trading dates and maturities in months for call and put options. The months with more maturities are March, June, September and December.

               Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
Call  Date     8.84%  8.75%  8.29%  5.18%  7.26%  9.32% 10.80% 11.00% 13.00% 11.30%  3.14%  3.19%
      Maturity 1.93%  2.25%  8.80%  1.68%  1.79% 24.60%  3.15%  3.61%  9.91%  5.22%  3.21% 33.90%
Put   Date     8.70%  9.22%  8.84%  5.76%  7.31%  9.89% 10.50% 10.60% 12.10% 10.80%  3.06%  3.17%
      Maturity 2.06%  2.46%  8.62%  2.72%  1.65% 25.40%  4.16%  4.53%  9.65%  5.42%  4.10% 29.20%
After analyzing the future contracts, we find that there are only ten distinct maturities, spread over the third Friday of every quarter from 2014 to mid-2016. As mentioned in Section 2.2.3, we denote these maturities as ’major maturities’ for simplification in the following work.
In Table 1.3 we present descriptions of the relevant terms. Because of the intrinsic differences between the products, we separate the descriptions by type of contract (option, future or discount). We remark that the dates presented here are not in exactly the format initially provided by the bank BNP. For each date and maturity that appeared in the original dataset, we transformed the original numerical integers into the current recognizable categorical date format. This operation helps us understand the patterns behind the trading dates. Afterwards, some non-relevant information was discarded from this original dataset (such as the lot size, bid size and ask size). Moreover, we also included four variables, marked by a star and relevant for the rest of the study, namely:
• Time to maturity (the life length of an option contract from its trading date to its maturity). Note that in this work we count the entire natural days in this period, instead of the actual financial trading days.
• Moneyness (calculated based on Equation 2.4 in Chapter 2, which we will explain later in more detail).
• Constructed price (the constructed price S of the underlying asset, as opposed to its true market price, deduced from Equation 2.2. The difference between this constructed price and the asset price will be explained later in this section. It is, by the way, a non-standard terminology, used in this thesis.)
• Interest rate (deduced from Equation 2.1 and mentioned in Subsection 2.2.1.)
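The three starred quantities above are recoverable directly from the raw future price F and discount value D. The following is a hypothetical sketch under the thesis’s Equations 2.1, 2.2 and 2.4 (the function names and the time-unit convention are ours, not the thesis’s):

```python
import math

def constructed_price(F, D):
    """Constructed price of the underlying asset, S = F * D (Equation 2.2)."""
    return F * D

def interest_rate(D, tau):
    """Invert D = exp(-r * tau) to recover r; tau in whatever unit r is quoted in."""
    return -math.log(D) / tau

def moneyness(K, D, S):
    """Moneyness M = K * D / S = K / F (Equation 2.4), used to label options."""
    return K * D / S
```

For the Example Set values D = 0.9996 and τ = 77, `interest_rate` gives roughly 5.2 × 10⁻⁶ per day, the same order as the 5.5 × 10⁻⁶ quoted in Table 1.3, which suggests that T − t is measured there in days; this reading is our inference, not a statement from the thesis.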
Table 1.3: The descriptions of relevant terms provided for the three types of derivatives (option, future and discount). The star (*) indicates that the marked term was added. The ’Example Set’, which contains the trading information for a fixed date (January 03, 2014) and maturity (March 21, 2014), is shown in the rightmost column.

Term                   Description                                                           Example Set
Date t                 The beginning date of a derivative contract.                          2014-01-03
Maturity T             The expiration date of a derivative contract.                         2014-03-21

Option Contracts
Type                   Call options C or put options P.                                      Both
Strike K               The price paid for the asset if the option is exercised.              The strike, bid price,
Bid Price              The highest price a buyer is willing to pay for the option.           ask price and label are
Ask Price              The lowest price a seller is willing to accept for the option.        shown in Table 1.4
*Moneyness M           Deduced by M = KD/S = K/F and used to label options.
*Time To Maturity τ    The difference between maturity and date, τ = T − t.                  77
*Constructed Price S   The price of the underlying asset constructed through S = FD.         3062.73

Future Contracts
Bid Price              The highest price a buyer is willing to pay for a future contract.    3061
Ask Price              The lowest price a seller is willing to accept for a future contract. 3062

Discount Value
Discount Value D       Discounts a future value back to the current time.                    0.9996
Interest Rate r        Deduced by D = e^(−r(T−t)).                                           5.5 × 10^(−6)
In order to give readers an understandable perspective on these financial terms and on several financial criteria mentioned below, we display the trades (marked as the ’Example Set’) with trading date January 03, 2014 and maturity March 21, 2014, as an example to explain the micro structure of the dataset and the following operations. The detailed information is shown on the right side of Table 1.3, and details of some option contracts are shown in Table 1.4.
1.2.1 Combination of options
Note that there are 147416 call options and 164923 put options. Due to Put-Call Parity, the volatility is the same for a call option and a put option with the same combination of date, maturity and strike (shown in Section 2.4). To avoid calculating the same thing repeatedly, it is vital to combine each call with the corresponding put option. As a result, we obtain 107571 pairs of call and put options.
Following our ’Example Set’, where we extract a set of contracts with trading date January 03, 2014 and maturity March 21, 2014, the only free variable for pricing the options is the strike price. Table 1.4, together with Figure 1.1 and its enlargement, gives a clear idea of how the options are combined, how the call and put options are related, and the tendency of the price change. From Figure 1.1 and Table 1.4 we can see that, for call options, when the strike goes up, the price of the call option declines; oppositely, the price of the put option increases. However, the range of call option prices is larger than the range of put option prices. One possible reason might be that traders believe the market price will go up. Moreover, when the call option and put option prices intersect, it normally means that the ’at-the-money’ case was reached. In that case, the value is
Figure 1.1: The tendencies of the prices of call and corresponding put options against the strike price in the Example Set. An enlargement of the crosspoint of the lines is shown in the top right, where the differences between bid prices and ask prices, overlapped at the original scale, are clearly visible. As the strike rises, the price of the call options declines, while the price of the put options increases.
Table 1.4: Some call and put options valid from Jan 03, 2014 to Mar 21, 2014 with strikes ranging from 3000 to 3200. These are the options whose prices are close to the crosspoint of call and put option prices. The values in brackets are ask prices, while the values outside are bid prices.

Index  Strike  Call Option Price  Put Option Price  Label
28     3000    135.3 (137)         63.1 (64)        in-the-money for call
29     3025    119.3 (120.6)       71.8 (72.7)      and
30     3050    104   (105)         81.6 (82.5)      out-of-the-money for put
       constructed price: 3062.73                   at-the-money
31     3075     90   (90.8)        92.4 (93.3)
32     3100     77   (77.7)       104.4 (105.2)     out-of-the-money for call
33     3125     65.2 (65.6)       117.6 (118.4)     and
34     3150     54.7 (55.3)       131.9 (133.4)     in-the-money for put
35     3175     45.3 (46)         147.4 (149.1)
36     3200     37.2 (37.9)       164.2 (166)
supposed to be close to the ’constructed price’. According to the definition introduced in Section 2.2.4, the options can be divided into two types by comparing their strike prices with the asset price: ’in-the-money’ call (put) options when strikes are smaller (larger) than the asset price, and ’out-of-the-money’ call (put) options when strikes are larger (smaller). This opposite relationship is as expected, following the definitions and the opposite sides of the trading movement.
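The labelling rule behind Table 1.4 can be written down directly. This is an illustrative sketch (the function name and tolerance are ours), comparing each strike with the constructed price:

```python
def label_pair(K, S, tol=1e-9):
    """Label a call/put pair by comparing its strike K with the asset price S."""
    if abs(K - S) <= tol:
        return "at-the-money"
    if K < S:
        # strike below the asset price favors exercising the call
        return "in-the-money for call, out-of-the-money for put"
    # strike above the asset price favors exercising the put
    return "out-of-the-money for call, in-the-money for put"

S = 3062.73  # constructed price of the Example Set
labels = {K: label_pair(K, S) for K in (3000, 3050, 3075, 3200)}
```

Applied to the strikes of Table 1.4, strikes below 3062.73 come out in-the-money for the call, and strikes above it in-the-money for the put, matching the table’s label column.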
1.2.2 EURO STOXX50 index price
As we mentioned at the beginning of this section, the EURO STOXX50 index is namely the underlying
asset under study, i.e. the objective financial asset that our three derivatives associated with. Besides
the three original data sets containing information about options, future and discount, the information
of historical EURO STOXX50 index price (denoted as Smarket) can be obtained easily from the website
’Yahoo Finance’ 1. Note that the newly obtained index prices were captured at 18:00 CET in each trading
day while the options contracts were gathered at 17:15 CET. Thus the index price obtained later, is not
the real-time asset price at 17:15 CET and can variate a bit from the constructed price deduced from
the original dataset, within 45 minutes at very active trading days, in particular at major maturities of
future contracts. Following the ’Example Set’, the value of the EURO STOXX50E index in Jan 03, 2014
is Smarket = 3074.43, while the constructed price S = 3062.73.
According to theoretical inference, for a current date t and a fixed maturity T, when the put option and the call option have the same price C(t, T, K) = P(t, T, K) at strike K, Put-Call Parity in Equation 2.5 (and Equation 2.6 in another form) yields St = Ke^(−r(T−t)) (and F(t, T) = K) for that same strike K. Thus the true value St is supposed to be equal to S, i.e. the constructed price should be worth the same as the current true price of the underlying asset. We then have two different situations, depending on the maturity:
1. Options whose maturities also belong to the major maturities of the future contracts. In this situation, their crosspoints shift a bit to the left of the line of the current index price, as in Figure 1.2(d) for the maturity 2014-12-19.
The shifts might be caused by the Triple Witching Phenomenon mentioned in Chapter 2, when the contracts for stock index futures, stock index options and stock options expire on the same day (four times a year, on the third Friday of March, June, September and December). Triple witching days generate trading activity and volatility, because contracts that are allowed to expire may necessitate the purchase or sale of the underlying security. While some derivative contracts are opened with the intention of buying or selling the underlying security, traders seeking derivative exposure only must close, roll out or offset their open positions prior to the close of trading on triple witching days. This is probably why the shifts happen.
Moreover, Figure 1.2(c) corresponds to a case where the information is incomplete, as the number of records is insufficient (fewer than 10).
2. Options whose maturities do not belong to the major maturities of the future contracts, or whose maturities do belong to the major maturities but for which the influence of the Triple Witching Phenomenon is limited.
In this situation, simple observation shows that the index price normally passes through the crosspoint of the prices of the call and put options (as in Figures 1.2(a) and 1.2(b)). As the price of the EURO STOXX50E index is equal to the strike price when C(t, T, K) = P(t, T, K) in most cases, those figures show
1 From the website ’Yahoo Finance’ https://finance.yahoo.com/quote/%5ESTOXX50E/history?period1=1388620800&
period2=1446076800&interval=1d&filter=history&frequency=1d
(a) Maturity 2014-01-17 (b) Maturity 2014-03-21
(c) Maturity 2014-09-19 (d) Maturity 2014-12-19
Figure 1.2: Four types of call and put options’ prices for different maturities.
that the price of the underlying asset is very similar to the index price in these cases, even though some of the maturities belong to the major maturities in the future contracts market. Thus the index price can be used as a supplement for those options which have no information about the relevant future price. It can give the investors a rough direction, although its accuracy is not assured.
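The crosspoint discussed above can be located numerically from quoted prices. The sketch below linearly interpolates C − P between consecutive strikes to find where it vanishes, using bid/ask midpoints read from Table 1.4; the interpolation step and the function name are choices made here, not a procedure taken from the thesis.

```python
def crosspoint_strike(strikes, call_mids, put_mids):
    """Strike where C - P crosses zero, by linear interpolation between quotes.
    Assumes strikes are sorted and C - P is decreasing in the strike."""
    diffs = [c - p for c, p in zip(call_mids, put_mids)]
    for i in range(len(strikes) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 >= 0.0 >= d1:  # sign change between consecutive strikes
            w = d0 / (d0 - d1)
            return strikes[i] + w * (strikes[i + 1] - strikes[i])
    return None  # no crossing inside the quoted strike range

# Bid/ask midpoints from Table 1.4 around the crossing (strikes 3050 and 3075):
K_star = crosspoint_strike([3050.0, 3075.0], [104.5, 90.4], [82.05, 92.85])
```

With these two quotes the crossing lands between 3050 and 3075, in the neighbourhood of the constructed price of the Example Set; bid-ask spreads and quote timing explain why it need not coincide with it exactly.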
1.3 Objectives
This work focuses on applying, firstly, several financial criteria to a dataset composed of option contracts, discount values and future contracts, and, secondly, quantile methods to a dataset of the computed implied volatilities, in order to gain a clear understanding of the real option trading process and to uncover latent patterns of the financial option market, based on the implied volatility of the price of the underlying asset through the Black-Scholes formula.
This work first derives the implied volatility of each contract implicitly, by equating the estimated option price (as a function of the implied volatility, deduced from the Black-Scholes formulas) with the option price observed in the market, through the bisection method. The Black-Scholes formula with i) the future price involved and ii) the asset price involved, together with two calculation methods for the boundary condition, are discussed with particular emphasis on the accuracy of the results. Then, using the obtained implied volatility, we fit several prediction models, mainly quantile methods on linear regression and on random forests. The goal is then, using the selected models (based on the quality of fit), to forecast the tendency of the implied volatility.
As the dataset has many observations, it is hard to visualize its distribution and significant characteristics. Therefore we focus on a subset with some interesting features for further study.
1.4 Thesis Outline
Chapter 2 introduces the essential financial knowledge needed to understand and analyze the option contracts dataset from the bank. The first part of this chapter explains the financial market assumptions and five common financial concepts and derivatives, namely the interest rate, index, option, forward and future, followed by an introduction to the implied volatility of the prices of the underlying asset. Finally, the most important financial theorems in this work, the Put-Call Parity (which relates put and call option prices) and the Black-Scholes formula (which estimates option prices), are discussed.
Chapter 3 describes the root-finding bisection method and the statistical methods used in the pre-processing steps, and explains the principle of quantile regression, which takes into account more general perspectives of the distribution of the response variable. It ends by combining machine learning algorithms such as Decision Trees and Random Forests with quantile methods.
Chapter 4 demonstrates how to calculate the implied volatility, the main object of this work. Instead of estimating the option prices as specific numbers, we express the estimated prices as functions of the implied volatility, deduced from two forms of the Black-Scholes formula. We equate these estimates with the option prices observed in the market, and then derive the implied volatility implicitly, via the Bisection method, for each contract. Since the true market prices are given as a bid-ask spread rather than a single value, two methods of calculation are proposed and compared.
Chapter 5 models the implied volatility dataset selected and derived in the previous chapter. After pre-processing, this chapter focuses on using quantile methods, together with linear regression and random forests, to fit the dataset. Since the estimated implied volatility dataset is relatively large and its observations overlap heavily, it is hard to visualize its distribution and significant characteristics, and even harder to fit a proper model; therefore this chapter focuses on a subset containing some significant features for further study. Finally, this chapter displays the results of the different models on the selected subset and compares their performance.
The thesis concludes with Chapter 6, where we also point out possible future work.
Chapter 2
Financial Theoretical Overview
The first goal of this work is to obtain a feature of option prices, the implied volatility, by using the Black-Scholes formula as the option pricing model. To understand the financial movements clearly and achieve this goal, some knowledge of the financial market is needed. Thus in this chapter we first present some financial concepts which will be used in the rest of the thesis. Secondly, we address the main subject of this thesis, the implied volatility, providing its definition and introducing the pattern named volatility smile. The chapter ends with a presentation of the Put-Call Parity, which connects call option prices with put option prices, and of the Black-Scholes formula.
2.1 Assumptions On The Assets
Since modern financial markets trade various types of financial derivatives, with correspondingly different transaction systems and rules, the academic community has reached a consensus and usually adopts several simplified ’ideal conditions’ concerning the financial market and the underlying assets (for instance, stocks). The main assumptions include [1]:
1. The rate of return on the riskless asset is constant through time.
2. The price of the stock follows a Geometric Brownian motion, with drift and volatility assumed to be
constant.
3. The stock pays no dividends or other distributions.
4. The option based on the stock is European, meaning that only operation at maturity is allowed.
5. There is no arbitrage opportunity, i.e., there is no way to make a riskless profit.
6. It is possible to borrow and lend any amount, even fractional, of the price of a security, at the
short-term interest rate.
7. It is possible to buy and sell any amount, even fractional, of the stock (this includes short selling).
8. There are no transaction costs in buying or selling the stock, i.e., a frictionless market.
2.2 Financial Derivatives
In finance, a derivative is a contract between two or more parties that derives its value from an underlying financial asset (such as a stock, commodity, index or interest rate). Common derivatives include future contracts, forward contracts and options. Here we briefly present the relevant underlying assets (interest rate and stock index) and mainly focus on the derivatives (put and call options, future and forward contracts).
2.2.1 Interest Rate / Discounts
The interest rate is the amount of interest due per period (day, month, year, etc.), as a proportion of the amount that is lent, deposited or borrowed over the same period [16]. If t denotes the current date and T denotes the maturity, then the value today of 1 monetary unit received at time T, given that the interest rate is r, is:
D(t, T ) = e−r(T−t), (2.1)
in case the interest rate is continuously compounded (continuous compounding means that the interest is added to the account’s balance at every infinitesimal time step). There are other compounding schemes, but here we only assume this one. Simple no-arbitrage arguments show that call option prices increase with the interest rate, whereas the corresponding put prices decrease [17].
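Equation 2.1 can be sketched in a few lines of Python; the rate and dates below are hypothetical illustrative values, not taken from the dataset:

```python
import math

def discount(r: float, t: float, T: float) -> float:
    """Value at time t of 1 monetary unit paid at time T, with a
    continuously compounded interest rate r (Equation 2.1)."""
    return math.exp(-r * (T - t))

# Hypothetical example: r = 2% per year, one year to maturity.
d = discount(0.02, 0.0, 1.0)
# One unit received in a year is worth slightly less than one today.
assert 0.98 < d < 0.99
```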
2.2.2 Index
The stock index is created in order to measure how the stock market is behaving in general. As a type of financial asset, an index is calculated as the weighted sum of a basket of representative stocks [18].
For instance, the EURO STOXX 50 is a stock index of Eurozone stocks designed by STOXX, an index provider owned by the Deutsche Börse Group. Introduced on 26 February 1998, it is made up of fifty of the largest and most liquid stocks. The index futures and options on the EURO STOXX 50, traded on Eurex, are among the most liquid such products in Europe and the world.
The calculation of the index employs the Laspeyres formula, which measures price changes against a fixed base quantity weight. For further details on the construction of this index, we refer to the STOXX Calculation Guide1.
2.2.3 Forward and Future Contract
A forward contract is a contract in which the buyer promises, and is obliged, to buy an asset (a stock, etc.) from the seller at some specified price F (t, T ) and time T in the future, given today’s date t [19]. No
1 STOXX Calculation Guide http://www.stoxx.com/indices/rulebooks.html
money changes hands until the maturity of the contract. In other words, given that the current market price of the asset (the spot price) is St, at maturity the buyer will hand over the amount F (t, T ) and receive the asset, which by then will be worth ST .
In order to establish the relationship between F (t, T ) (the forward price when the forward contract
was initiated at time t) and St (the spot price at the same time t), we construct a portfolio as following:
First we enter into a forward contract at price F (t, T ). In the meanwhile, we short-sell the asset and
invest the money (St) in the bank. Therefore we start this strategy without any investment. At the
maturity T , we get Ster(T−t) from the bank, but we are still obliged to pay F (t, T ) for the asset, as settled
in the forward contract, and then return the asset back to the short-seller. In a nutshell, at the maturity we
hold Ster(T−t)−F (t, T ). Based on no-arbitrage principle [20], which means no ’free lunches’, a portfolio
that would have nonnegative payoffs must have a nonnegative cost. Since we begin this portfolio with
zero investment, we should finish these trading with zero profit and loss, which leads to:
F (t, T ) = St e^{r(T−t)} = St / D(t, T ). (2.2)
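The two forms of Equation 2.2 can be checked numerically; the following sketch uses hypothetical values for the spot price and the interest rate:

```python
import math

def forward_price(S_t: float, r: float, t: float, T: float) -> float:
    """No-arbitrage forward price of Equation 2.2:
    F(t, T) = S_t e^{r(T-t)} = S_t / D(t, T)."""
    return S_t * math.exp(r * (T - t))

# Hypothetical values: spot 100, rate 2%, one year to maturity.
F = forward_price(100.0, 0.02, 0.0, 1.0)
D = math.exp(-0.02 * 1.0)          # discount factor D(t, T), Equation 2.1
assert abs(F - 100.0 / D) < 1e-9   # both forms of Equation 2.2 agree
```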
A future contract is very similar to a forward contract in many ways; the main difference is that the profit or loss of both buyer and seller is computed daily, and the difference between the spot price and the initial futures price is paid gradually from one party to the other during the life of the contract. As a consequence, the future price is always equal to the spot price at the maturity.
Although the continual resettlement feature of futures contracts makes it difficult to determine an
equilibrium futures price in terms of its underlying variables, if interest rates are non-stochastic and
there are no arbitrage opportunities, it can be shown that futures prices are equal to forward prices.
Consequently, the valuation formulas given for forward prices will then also hold for futures prices [21].
One more thing is worth mentioning. Future contracts, especially those on stock indices, typically have four or more maturities throughout the year, at which times they are traded most heavily by day traders. For example, in 2015 there were future contract maturities on March 20, June 19, September 18, and December 18. In the remainder of this work, we refer to these maturities (the third Friday of every quarter of each year) as major maturities, for simplicity.
The Triple Witching Phenomenon happens exactly at these four times of each year: it occurs when the contracts for stock index futures, stock index options and stock options all expire on the same day. Triple witching days generate trading activity and volatility, because contracts that are allowed to expire may necessitate the purchase or sale of the underlying security. While some derivative contracts are opened with the intention of buying or selling the underlying security, traders seeking derivative exposure only must close, roll out or offset their open positions prior to the close of trading on triple witching days.
2.2.4 Option
Unlike the holder of a future or forward contract, who is obliged to trade the asset at the maturity, the holder of an option has the right, but not the obligation, to buy or sell the underlying asset at an agreed price within a specified period of time, under certain conditions.
When the option is European, this right may only be exercised on a chosen future date, called the
maturity. The price that is paid for the asset when the option is exercised is called the strike price [1].
Throughout this thesis we always assume European options and, for this reason, wherever no confusion arises, we omit the word ’European’.
For a call option, the buyer (of the option) has the right to buy the stock, at the maturity, paying a
strike price, whereas in case of a put, he/she has the right to sell the stock [1]. It follows, trivially, that in
case of a call option, the payoff at maturity (i.e., the return of the product) is
max(ST −K, 0),
whereas in the case of a put option it is
max(K − ST , 0),
where K is the strike price [18].
In a liquid market, the ’current’ price is an uncertain term: it only represents the trading price of the most recent trade on which a seller and a buyer agreed. Since the bid price is the highest price that a prospective buyer is willing to pay for the asset at a given moment, and the ask price is the lowest price acceptable to a prospective seller of the same asset at the same moment, a trade will probably be executed when the trading price lies between the bid price and the ask price. The difference between the two prices is called the ”bid-ask spread”. It is a key indicator of the liquidity of the asset: generally speaking, the smaller the spread, the better the liquidity. The bid-ask spread can widen dramatically during periods of illiquidity or market turmoil, since buyers will not be willing to pay a price beyond a certain threshold, while sellers may not be willing to accept prices below a certain level [21].
Moneyness and the types of options
In options trading, the difference between ”in the money” and ”out of the money” is a matter of the
strike price’s position relative to the market value of the underlying asset. Moneyness, here denoted by
M , is defined as [19]:
M = K D(t, T ) / St, (2.3)
where, for simplicity, we presently omit T from the notation.
Besides, based on the relationship between the asset price and the future price given in Equation 2.2, the moneyness can also be written as

M = K / F (t, T ). (2.4)
Therefore, for an option, if its moneyness satisfies
• M < 1, the option is an in-the-money call option or an out-of-the-money put option (depending on which contract we are addressing);
• M = 1, the option is an at-the-money option, for both call and put options;
• M > 1, the option is an out-of-the-money call option or an in-the-money put option.
If the difference between St and K is large, then the option is called a deep-in-the-money or deep-out-of-the-money option.
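The moneyness of Equation 2.3 can be computed directly; the contract below is hypothetical and only illustrates the case of a call with the strike below the spot price:

```python
def moneyness(K: float, S_t: float, D: float) -> float:
    """Moneyness M = K * D(t, T) / S_t (Equation 2.3)."""
    return K * D / S_t

# Hypothetical contract: strike 95, spot 100, discount factor 0.98.
M = moneyness(95.0, 100.0, 0.98)
# The strike is below the spot price, so this call option is in the money.
assert M < 1.0
```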
Given that a call option is an investment chosen by those who believe the underlying asset price will continue to rise, an in-the-money call option is a call option whose strike price is lower than the current stock price. In this case, the option holder is more likely to earn a profit.
Following the same idea, put options are purchased by investors who believe the asset price will go down; in-the-money put options are those whose strike prices are above the current stock price.
2.3 Implied Volatility
Volatility is a measure of randomness and uncertainty, specifically of the variability of the price of an option’s underlying security. The volatility of a stock or index price is an evaluation of how uncertain the price movement is in the future [5]. The higher the volatility, the greater the expected fluctuations of the underlying asset’s price, and the higher the values of both call options and put options [6].
The unknown input when computing the price of an option is the expected volatility over the life of the option. In a market economy with actively traded option contracts, which express the market’s view of the relevant prices for those contracts, one can solve for the volatility that equates the observed market price of the option contract with the price given by the chosen option pricing formula. This yields the implied volatility [22]. Therefore, the implied volatility is defined as the parameter σ of the option pricing model (here, the Black-Scholes formula) that reproduces the actually observed market price of a particular option, as we will see later on.
Studies of option pricing used to focus on the historical volatility, but the focus has now shifted to the implied volatility, which indicates the expected future volatility according to the currently observed option prices in the market, given the other known option pricing variables (such as the asset price and the time to maturity) and parameters (such as the interest rate and the strike price). Given the historical level of the underlying asset’s implied volatility, a trader can decide whether to buy an option (when the extrinsic value is on the low end) or whether to sell an option (when the extrinsic value is on the high end).
Since the implied volatility calculated from the observed option prices, when plotted against the strike price, shows a non-flat shape similar to a ’smile’, this characteristic is named the ’volatility smile’.
2.4 Put-Call Parity
Here we present a popular formula that relates the prices of a call and a put option on the same underlying asset, with the same maturity and strike price. For a fixed date t, consider a European call option and a European put option with asset price St, maturity T and strike price K, and the portfolio that is long the call and short the put. The payoff of this portfolio at maturity is

max(ST −K, 0)−max(K − ST , 0) = ST −K.
Note that the previous formula shows that the payoff of a call option minus the payoff of a put option is the same as that of the following portfolio: borrow Ke−r(T−t) and buy a stock, paying St. At the maturity T , the return from such a portfolio is

ST −K,

i.e., the value of the stock at the maturity minus the amount of money that must be returned to the bank.
Therefore, since at the maturity the payoff of owning a call and selling a put equals ST −K, the price of the call at time t, C(T,K), minus the price of the corresponding put, P (T,K), has to equal the initial investment of the portfolio consisting of one stock and borrowed money, by non-arbitrage arguments, therefore [18]:

C(T,K)− P (T,K) = St −Ke−r(T−t). (2.5)
This very important financial theorem is called the Put-Call Parity; the equality is independent of the behavior of the asset in the future and holds at any time to maturity.
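The payoff identity underlying the Put-Call Parity can be verified for any terminal price; the strike and terminal prices in this small sketch are hypothetical:

```python
def call_payoff(S_T: float, K: float) -> float:
    """Payoff of a European call at maturity."""
    return max(S_T - K, 0.0)

def put_payoff(S_T: float, K: float) -> float:
    """Payoff of a European put at maturity."""
    return max(K - S_T, 0.0)

# max(S_T - K, 0) - max(K - S_T, 0) = S_T - K for every terminal price,
# which is the payoff side of the Put-Call Parity.
K = 100.0
for S_T in (80.0, 100.0, 123.0):
    assert call_payoff(S_T, K) - put_payoff(S_T, K) == S_T - K
```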
If traders have long positions, they actually hold the traded asset and are concerned when the price of the asset falls. Conversely, a short position indicates that a trader first borrows an asset and subsequently sells it on the market. In this way, if the price of the asset falls, the trader can buy it back at a lower price and return the asset to the lender [23].
Recalling that the forward price is given by F (t, T ) = St e^{r(T−t)}, by replacing the stock price St with the forward price F (t, T ) we obtain one more way to express the same payoff:

C(T,K)− P (T,K) = F (t, T )e−r(T−t) −Ke−r(T−t) = [F (t, T )−K]D(t, T ). (2.6)
The equation holds because if the stock price at maturity is above the strike price, the call option will
be exercised, while if it is below, the put will be exercised, and thus in either case one unit of the asset
will be purchased for the strike price, exactly as in a forward (future) contract.
Note that the Put-Call Parity (Equation 2.5) holds under, in some cases, non-realistic conditions, in particular the absence of all frictions and incompleteness of the market. In practice, bid-ask spreads and liquidity issues imply that the observable prices of European options do not necessarily align with the theory.
Rearranging Equation 2.5 as C(T,K) + Ke−r(T−t) = P (T,K) + St, the left-hand side is a fiduciary call, i.e., a long call plus enough cash (or bonds) to pay the strike price if the call is exercised, while the right-hand side is a protective put, i.e., a long put plus the asset, so that the asset can be sold for the strike price if the stock price is below the strike at expiry. Both sides have payoff max(ST ,K) at maturity (i.e., at least the strike price, or the value of the asset if it is worth more), which gives another way of proving or interpreting the Put-Call Parity.
2.5 Black-Scholes Formula
Black and Scholes [1] and Merton [2] provided, in 1973, a breakthrough result in modern option pricing studies. At first sight, one may have the idea that the main goal of an option pricing model is to capture the empirical properties of option prices.
An option pricing model can, however, also be used as a tool to obtain features of option prices, such as the implied volatility, by relating the option price with the price of the underlying asset under the arbitrage-free assumption that keeps the market fair [3]. The main thrust behind the Black-Scholes model is to find a way to transform market prices into an expression in terms of implied volatility [4].
Besides the assumptions for the market and the underlying asset stated at the beginning of this chapter, there are other assumptions for the Black-Scholes model to hold [18], notably the following:
• The underlying stock price follows a log-normal distribution (also known as the Geometric Brownian
Motion);
• The interest rate is fixed or a known function of time;
• There are no dividends on the underlying stock.
Following the notation of the previous sections, the Black-Scholes formula [1] gives the estimated value of the call option C(t, T,K, St, σ) as

C(t, T,K, St, σ) = StΦ(d1)−Ke−r(T−t)Φ(d2), (2.7)

d1 = [ln(St/K) + (r + 0.5σ²)(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t),

where Φ(·) is the cumulative distribution function of the standard normal distribution, and σ is the volatility of the returns of the underlying asset; it is the square root of the quadratic variation of the stock’s log-price process.
In view of the Put-Call Parity (Equation 2.5), it also follows from Equation 2.7 that the corresponding put option price P (t, T,K, St, σ) is:

P (t, T,K, St, σ) = −St +Ke−r(T−t) + StΦ(d1)−Ke−r(T−t)Φ(d2)
= −St[1− Φ(d1)] +Ke−r(T−t)[1− Φ(d2)]
= −StΦ(−d1) +Ke−r(T−t)Φ(−d2). (2.8)
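Equations 2.7 and 2.8 can be sketched directly in Python; the contract parameters below are hypothetical, and Φ is obtained from the standard library’s error function:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S_t, K, r, tau, sigma):
    """Black-Scholes call price (Equation 2.7), with tau = T - t."""
    d1 = (math.log(S_t / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S_t * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def bs_put(S_t, K, r, tau, sigma):
    """Put price via the Put-Call Parity (Equations 2.5 and 2.8)."""
    return bs_call(S_t, K, r, tau, sigma) - S_t + K * math.exp(-r * tau)

# Hypothetical at-the-money contract: the parity must hold exactly.
c = bs_call(100.0, 100.0, 0.02, 1.0, 0.2)
p = bs_put(100.0, 100.0, 0.02, 1.0, 0.2)
assert abs((c - p) - (100.0 - 100.0 * math.exp(-0.02))) < 1e-12
```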
Another common form of the Black-Scholes formula, based on the future price and the discount value, gives the call option price C(t, T,K, F (t, T ), D(t, T )) as:

C(t, T,K, F (t, T ), D(t, T )) = D(t, T )(F (t, T )Φ(d1)−KΦ(d2)), (2.9)

d1 = [ln(F (t, T )/K) + 0.5σ²(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t),

where D(t, T ) = e−r(T−t) is the discount factor and F (t, T ) = St e^{r(T−t)} = St/D(t, T ) is the forward price of the underlying asset.
Similarly, combining the Put-Call Parity (2.6) with the call option price (2.9), we can deduce the price of the corresponding put option P (t, T,K, F (t, T ), D(t, T )):

P (t, T,K, F (t, T ), D(t, T )) = −[F (t, T )−K]D(t, T ) +D(t, T )(F (t, T )Φ(d1)−KΦ(d2))
= −D(t, T )F (t, T )[1− Φ(d1)] +KD(t, T )[1− Φ(d2)]
= D(t, T )[Φ(−d2)K − Φ(−d1)F (t, T )]. (2.10)
The estimated price of a call option under Black-Scholes is a function of the volatility parameter, and it is strictly increasing from ]0,+∞[ onto ]max{St − Ke−r(T−t), 0}, St[ [3]. If an observed market price Cmarket(t, T ) within this range is given, it is possible to compute the estimate of the volatility parameter σt(T,K) such that the corresponding Black-Scholes price matches the market price:

∃! σt(T,K) > 0 s.t. C(t, T,K, St, σt(T,K)) = Cmarket(t, T ).
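Since the call price is strictly increasing in σ, any bracketing root-finder recovers the implied volatility; the following round-trip sketch (hypothetical contract values, bisection bounds chosen arbitrarily) anticipates the Bisection method of Chapter 3:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S_t, K, r, tau, sigma):
    """Black-Scholes call price (Equation 2.7), with tau = T - t."""
    d1 = (math.log(S_t / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S_t * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def implied_vol(C_market, S_t, K, r, tau, lo=1e-6, hi=5.0, n_iter=100):
    """Bisection on sigma: bs_call is strictly increasing in sigma, so a
    market price in the valid range pins down a unique volatility."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S_t, K, r, tau, mid) < C_market:
            lo = mid          # model price too low: volatility is higher
        else:
            hi = mid          # model price too high: volatility is lower
    return 0.5 * (lo + hi)

# Round trip: price a hypothetical contract at sigma = 0.25, then
# recover that sigma from the price alone.
price = bs_call(100.0, 105.0, 0.02, 0.5, 0.25)
assert abs(implied_vol(price, 100.0, 105.0, 0.02, 0.5) - 0.25) < 1e-6
```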
Chapter 3
Statistical Theoretical Overview
In this chapter we give a brief overview of the statistical methods used in this thesis, following the sequence in which the methods are applied throughout the work.
Firstly, we introduce the Bisection method, which is used to compute the implied volatility as the root of the non-linear function we generate based on the Black-Scholes formula. After obtaining the implied volatility, we need some pre-processing methods, such as the Box-Cox transformation and the Mahalanobis distance, which are discussed in the second section of this chapter. Next, we introduce the idea of quantile regression in the third section; it generalizes linear regression methods and tree-based regressors through their combination with the quantile method, namely quantile linear regression and quantile random forests, which are our main models.
3.1 Bisection Method
The bisection method (also called the interval halving method) is one of the simplest root-finding algorithms used to find zeros of continuous non-linear functions. Its advantages are two-fold: it is very robust and always converges to a solution if the function values have different signs at the borders of the chosen initial interval, and it can be applied to non-differentiable continuous functions [24]. However, the speed of convergence is relatively slow: the length of the interval is reduced by 50% in each step, so the method gains only one binary digit of accuracy per iteration, leading to a strictly monotone linear convergence.
It is used to find a root of a nonlinear real-valued scalar function f , i.e., to solve f(x) = 0 for f : R→ R. The algorithm works as follows:
1. An interval [x^a, x^b] inside the domain of definition is chosen initially. The signs of f(x^a) and f(x^b) must be different, which guarantees, by the Intermediate Value Theorem, that the function has at least one root inside the interval.
2. At the beginning of the t-th iteration, the midpoint of the interval, x^mid_t = (x^a_t + x^b_t)/2, is computed and the function value f(x^mid_t) is evaluated.
• If f(x^mid_t) × f(x^a_t) > 0, then the root is inside the interval [x^mid_t, x^b_t]. Thus we define the new lower border x^a_{t+1} as x^mid_t and continue the process;
• If f(x^mid_t) × f(x^a_t) < 0, then the root is inside the interval [x^a_t, x^mid_t]. Thus we define the new upper border x^b_{t+1} as x^mid_t and continue the process;
• If f(x^mid_t) × f(x^a_t) = 0, then x^mid_t is the root and the iteration is complete.
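The steps above can be sketched as a small routine; the function and interval in the example are arbitrary choices for illustration:

```python
def bisect(f, a, b, tol=1e-10, max_iter=200):
    """Find a root of a continuous f on [a, b], assuming f(a) and f(b)
    have opposite signs (Intermediate Value Theorem)."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == 0.0 or 0.5 * (b - a) < tol:
            return mid
        if fm * fa > 0:        # same sign as f(a): root in the upper half
            a, fa = mid, fm
        else:                  # opposite sign: root in the lower half
            b = mid
    return 0.5 * (a + b)

# Example: the positive root of x^2 - 2 on [0, 2] is sqrt(2).
root = bisect(lambda x: x * x - 2.0, 0.0, 2.0)
assert abs(root - 2.0 ** 0.5) < 1e-9
```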
3.2 Pre-processing Methods
Many statistical methods and analyses rely on the assumption that the population under study is normally distributed, with a common variance and an additive error structure. When this theoretical assumption is seriously violated, one practical measure is to design a new model which retains the important aspects of the original model while satisfying the assumption, for example by applying a proper transformation to the dataset [25]. In this work, a parametric power transformation, the Box-Cox transformation, is applied to the implied volatility to satisfy the linear regression assumptions in Section 5.3.1. Afterwards, in order to detect multivariate outliers, the Mahalanobis distance, proposed in 1930 [26], is applied; it indicates how far one observation is from the center of the data bulk with respect to the covariance structure.
3.2.1 The Formula of Box-Cox Transformation
While some traditional transformations (for example, square root, log, inverse) are better known for improving normality, the Box-Cox transformation (put forward in 1964 by Box and Cox [27]) represents a family of power transformations which integrates and extends the traditional measures in order to find the optimal normalizing transformation [28]. The transformed values can be regarded as realizations of a normal distribution. The Box-Cox transformation is defined as:

y_i^(λ) = (y_i^λ − 1)/λ, if λ ≠ 0; ln(yi), if λ = 0, (3.1)

for yi > 0, with transformation parameter λ [25].
Note that the transformation in Equation 3.1 is valid only for yi > 0. Modifications have been made for variables assuming negative values, and Box and Cox proposed the shifted power transformation as an alternative:

y_i^(λ) = ((yi + c)^λ − 1)/λ, if λ ≠ 0; ln(yi + c), if λ = 0, (3.2)

where λ is still the transformation parameter and c is a constant satisfying yi + c > 0.
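Equations 3.1 and 3.2 translate directly into code; the numerical checks below use arbitrary sample values:

```python
import math

def box_cox(y: float, lam: float, c: float = 0.0) -> float:
    """Shifted Box-Cox transform (Equations 3.1 and 3.2); c = 0
    recovers the plain transform, and y + c must be positive."""
    if y + c <= 0:
        raise ValueError("y + c must be positive")
    if lam == 0:
        return math.log(y + c)
    return ((y + c) ** lam - 1.0) / lam

# lambda = 1 only shifts the data by -1, and lambda = 0.5 is a scaled,
# shifted square root, matching the traditional transformations.
assert box_cox(4.0, 1.0) == 3.0
assert abs(box_cox(4.0, 0.5) - (math.sqrt(4.0) - 1.0) / 0.5) < 1e-12
assert abs(box_cox(4.0, 0.0) - math.log(4.0)) < 1e-12
```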
The Box-Cox transformation parameter λ can be computed automatically by any statistical package, for instance R, which is used in this work. As mentioned above, this family of transformations incorporates many traditional transformation measures [28]; some examples are shown in Table 3.1:
Table 3.1: Parameter λ’s values and corresponding transformations

λ      Transformation
1.00   identical to the original data
0.50   square root transformation
0.33   cube root transformation
0.25   fourth root transformation
0.00   natural log transformation
−0.50  reciprocal square root transformation
−1.00  reciprocal (inverse) transformation
3.2.2 Mahalanobis Distance
Unlike univariate outliers, multivariate outliers do not necessarily deviate from the majority of observations in any single coordinate. The Mahalanobis distance, introduced by Mahalanobis [29], is a measure of distance originally designed to compare groups, used here for multivariate outlier detection. The Mahalanobis distance is unitless and scale-invariant, and it takes into account the associations between the original variables.
Formally, if xi = (xi,1, . . . , xi,p)> is a realization of a p-dimensional random vector in a multivariate sample of n observations, with sample mean µ = (µ1, . . . , µp)> and sample covariance matrix Σ, then the Mahalanobis distance between xi and µ is defined as [30]:

DM (xi, µ) = √( (xi − µ)> Σ−1 (xi − µ) ), i = 1, . . . , n. (3.3)

If xi is normally distributed, then D²M (xi, µ) follows a Chi-square distribution with p degrees of freedom, i.e.

D²M (xi, µ) ∼ χ²(p).
A certain cut-off value, e.g. the 99% quantile of χ²(p), can be used as an indication of extremeness: a point can be considered a potential outlier if its Mahalanobis distance exceeds the cut-off value [31].
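Equation 3.3 can be sketched for p = 2 without any matrix library, inverting the 2 × 2 covariance matrix in closed form; the data below are arbitrary illustrative values:

```python
import math

def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance (Equation 3.3) for p = 2, with the 2x2
    covariance matrix inverted explicitly."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

# With the identity covariance it reduces to the Euclidean distance.
d = mahalanobis_2d((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
assert abs(d - 5.0) < 1e-12

# 99% cut-off for p = 2: the chi-square quantile is -2 ln(0.01) ~ 9.21,
# so this point's squared distance (25) flags it as a potential outlier.
cutoff = -2.0 * math.log(0.01)
assert d ** 2 > cutoff
```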
Robust Mahalanobis Distance
Both the sample mean and the sample covariance matrix are very sensitive to outliers. Therefore, the classical outlier identification method does not always find the outliers, since the estimators are themselves affected by them. The minimum covariance determinant (MCD) estimator is a method based on very robust estimators: it searches for the subset containing half of the data whose covariance matrix has the smallest determinant (since this work does not focus on the MCD algorithm, see [32] for more details). The resulting estimators lead to robust estimates of the Mahalanobis distance, and this robust distance is better suited to exposing outliers. Studies show that the same cut-off value chosen from the χ²(p) distribution is still suitable in the robust case [31].
3.3 Quantile Regression Method
Koenker and Bassett Jr [12] proposed quantile regression, expanding the scope of the ’median regression’ described by Farebrother [33], to remedy situations where, in practice, the dataset cannot satisfy the assumptions of the Ordinary Least Squares (OLS) method. Quantile regression is a generalization of linear regression in which a quantile of interest is modeled as a linear function of the explanatory variables. It has the advantage of being more robust than ordinary least squares, and it has been shown to lead to good results when there are complex relations between the variables.
Definition of quantile
Quantile regression focuses on the conditional quantiles of Y given X = x rather than on the conditional mean, of which the median of Y given X = x is one particular case. Assume the distribution function of a real-valued response variable Y is

FY(y) = P (Y ≤ y);

then the τ -th quantile of Y is defined as the minimum value of y which satisfies FY(y) ≥ τ , i.e.,

QY(τ) = FY^{−1}(τ) = inf{y : FY(y) ≥ τ}, 0 < τ < 1. (3.4)
Definition of quantile loss function
Given a realization of a random sample y = {y1, y2, . . . , yn} of Y and a quantile τ ∈ (0, 1), the L1-norm quantile regression minimizes the loss function [34]:

Lτ (y, ŷ) = Σ_{i: yi ≥ ŷi} τ |yi − ŷi| + Σ_{i: yi < ŷi} (1 − τ)|yi − ŷi| = Σ_{i=1}^{n} ρτ (yi − ŷi), (3.5)

where ŷi are the estimated values and ρτ (yi − ŷi) (named the check function [12]) is defined as:

ρτ (yi − ŷi) = τ(yi − ŷi), if yi − ŷi > 0; −(1 − τ)(yi − ŷi), otherwise. (3.6)
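The check function and the quantile loss of Equations 3.5 and 3.6 are easy to implement, and a small numerical check (with an arbitrary sample) confirms that the constant prediction minimizing the τ = 0.5 loss is a median:

```python
def pinball(residual: float, tau: float) -> float:
    """Check function rho_tau of Equation 3.6, with residual = y - y_hat."""
    return tau * residual if residual > 0 else (tau - 1.0) * residual

def quantile_loss(y, y_hat, tau):
    """Quantile loss of Equation 3.5 over a sample."""
    return sum(pinball(yi - yh, tau) for yi, yh in zip(y, y_hat))

# Among the sample values, the tau = 0.5 loss is minimized at a median
# of the (arbitrary) sample below.
y = [1.0, 2.0, 3.0, 10.0]
losses = {q: quantile_loss(y, [q] * len(y), 0.5) for q in y}
assert min(losses, key=losses.get) in (2.0, 3.0)
```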
The quantile regression is going to be discussed in more detail in Section 3.3.1, and in Section 3.3.2
estimation methods based on decision trees are briefly reviewed.
3.3.1 Linear regression and its quantile application
Suppose there is a dataset with n observations {yi,xi}, i = 1, . . . , n, where yi is the response variable and xi = (xi1, . . . , xip) are the explanatory variables. The linear regression model is:

yi = β0 + β1xi1 + · · ·+ βpxip + εi = x>i β + εi, i = 1, . . . , n, (3.7)

where x>i β is the inner product between the vectors xi = (1, xi1, . . . , xip) and β = (β0, β1, . . . , βp)>, and εi is a random variable called the error term. Sometimes one of the explanatory variables can be a non-linear function of another one, as in polynomial regression, but the model remains linear because it is still a linear function of the parameter vector β. The error term εi captures all influences on the response variable yi other than xi.
In linear regression, we assume
E(Y|X = x) = x>β.
The major assumptions made by standard linear regression models with standard estimation tech-
niques (e.g. ordinary least squares) are [35]:
• Linearity: the mean of the response variable is a linear combination of the explanatory variables.
• Constant variance (homoscedasticity): different values of the response variable are assumed to
have the same variance, regardless of the values of the explanatory variables, i.e. var(Y_i|X_i =
x_i) = σ².
• Independence of errors: the errors of the response variables are assumed to be uncorrelated with
each other, i.e. cov(ε_i, ε_j) = 0, ∀ i ≠ j.
• No multicollinearity in the explanatory variables: multicollinearity occurs when the explanatory
variables are too highly correlated with each other.
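For a single explanatory variable, the OLS estimates of the model above have a well-known closed
form. A minimal sketch (the function name is ours), useful as a baseline before the quantile variants:

```python
def ols_simple(x, y):
    """OLS for y_i = b0 + b1*x_i + e_i: minimize the sum of squared residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # slope: cov(x, y) / var(x)
    b0 = ybar - b1 * xbar   # intercept: line passes through the means
    return b0, b1

# An exact line y = 1 + 2x is recovered exactly:
# ols_simple([0, 1, 2, 3], [1, 3, 5, 7]) -> (1.0, 2.0)
```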
Linear quantile regression fits a conditional quantile of the response variable by a linear function x>β.
In Table 3.2 we compare the general quantile linear regression with the most common and widely used
alternatives: linear and L1 regression methods [36].
Table 3.2: Comparison between linear, median, and quantile regression methods.

Ordinary Least Squares (OLS)
  Conditional mean function: E(Y|X = x) = x^T β.
  Loss function: Σ_{i=1}^{n} (y_i − ŷ_i)², the sum of squared residuals.
  Estimates: β̂ = arg min_{β ∈ R^{p+1}} Σ_{i=1}^{n} (y_i − x_i^T β)².

Least Absolute Deviation (LAD)
  Conditional median function: Q_{0.5}(Y|X = x) = x^T β(0.5).
  Loss function: Σ_{i=1}^{n} |y_i − ŷ_i|, the sum of absolute residuals.
  Estimates: β̂(0.5) = arg min_{β ∈ R^{p+1}} Σ_{i=1}^{n} |y_i − x_i^T β|.

τ-Quantile
  Conditional quantile function: Q_τ(Y|X = x) = x^T β(τ).
  Loss function: Σ_{i=1}^{n} ρ_τ(y_i − x_i^T β), the sum of weighted absolute residuals.
  Estimates: β̂(τ) = arg min_{β ∈ R^{p+1}} Σ_{i=1}^{n} ρ_τ(y_i − x_i^T β).
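The three estimators in the table differ only in the loss they minimize. A small numerical check,
restricted to a model with an intercept only (definitions and names are ours), illustrates that
minimizing the τ-quantile loss recovers the empirical τ-quantile of the sample, just as minimizing the
squared loss recovers the mean:

```python
def rho(u, tau):
    """Check function of Equation 3.6."""
    return tau * u if u > 0 else -(1.0 - tau) * u

def best_constant(y, tau):
    """Among the observed values, the constant c minimizing sum_i rho_tau(y_i - c)."""
    return min(y, key=lambda c: sum(rho(yi - c, tau) for yi in y))

y = list(range(1, 10))  # 1, 2, ..., 9
# tau = 0.5 yields the sample median; tau = 0.9 yields a high sample quantile.
```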
3.3.2 Tree-based regressors and their quantile application
Tree-based methods are simple to visualize and to interpret, but in complex cases single decision trees
may not give competitive prediction results. Ensemble methods therefore generate a large number of
trees whose predictions are combined into one, usually yielding substantial improvements in prediction.
This section begins with the introduction of a single decision tree, followed by the mechanism of the
Random Forest and the corresponding quantile extension.
Elementary tree-based model: Decision Tree
A decision tree is a non-parametric supervised learning method in the form of a tree structure, used for
both regression and classification; in this work we focus on regression. It splits a dataset from the entire
space (denoted as R) into several regions (denoted as R_j) from the top of the tree to each leaf at the
bottom by a series of binary if-then rules. These rules identify distinct regions in which the observations
share the most homogeneous responses to the predictors. At each internal node, the dataset is split and
the predictors are evaluated so as to minimize the prediction error. The leaves, named terminal nodes,
represent the final division into regions. At a leaf node, the mean of the response values assigned to
that node is the predicted value returned by the decision tree.
Following the notation of Breiman [37], we define θ as the random vector for the entire tree, to record
how explanatory variables are split at each node and the corresponding tree is represented by T (θ).
Every leaf j = 1, . . . , J of a tree T (θ) corresponds to a rectangular subspace denoted as Rj , j = 1, . . . , J .
For every x, there is one and only one leaf j such that x ∈ Rj (corresponding to the leaf that is obtained
when dropping x down the tree). Denote this leaf by j(x, θ) for tree T (θ). The goal is to construct these
regions R1, R2, . . . , RJ minimizing

Σ_{j=1}^{J} Σ_{i: x_i ∈ R_j} (y_i − ȳ_{R_j})²,

where ȳ_{R_j} represents the mean response over all observations in region R_j, j = 1, . . . , J.
The process of building a decision tree T (θ) is:
1. First, a binary split is applied to the current subset. Since it is infeasible to take every possible
partition of the dataset into J rectangles into account, we apply a top-down, greedy recursive binary
splitting approach, splitting successively further down towards the leaves and choosing the best split
at each step of the tree-building process. The choice of splits is vital for the accuracy of prediction:
when creating two sub-regions, the criterion ensures that the homogeneity within the sub-regions
increases. This work uses the CART (Classification and Regression Tree) algorithm. For a predictor
X_j there is a cutpoint s splitting the space into two regions, one where X_j is smaller than s and one
where X_j is greater than or equal to s. The best cutpoint for each predictor X_j, j = 1, . . . , p, is
chosen such that the tree has the lowest sum of squared residuals, i.e., defining

R1(j, s) = {X : X_j < s} and R2(j, s) = {X : X_j ≥ s},

we want to find the pair (j, s) minimizing

Σ_{i: x_i ∈ R1(j,s)} (y_i − ȳ_{R1})² + Σ_{i: x_i ∈ R2(j,s)} (y_i − ȳ_{R2})², (3.8)

where ȳ_{R1} and ȳ_{R2} represent the mean response over all observations in R1(j, s) and in R2(j, s),
respectively.
2. Afterwards the process is repeated at the next node, splitting one of the previously identified
regions, until a user-defined stopping criterion is reached. The best predictor and cutpoint for each
further node are chosen to minimize Equation 3.8, and the distinct regions are generated gradually.
A fully grown tree fits the training dataset with 100% accuracy.
3. Finally, it is known that decision trees can over-fit if the tree is too complex and full of details,
leading to bad performance when predicting new observations. It is then necessary to set constraints
on the tree size or to prune the grown tree. (Note that in this work we mainly use decision trees within
the random forest method presented in the next subsection, so pruning is not necessary and its
introduction is omitted here; readers interested in pruning may consult Mingers [38], Mehta et al. [39]
and Kearns and Mansour [40].)
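The exhaustive search for the best pair (j, s) of Equation 3.8 can be sketched directly; this is a
didactic illustration with our own function names, not the optimized CART implementation:

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty region)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, y):
    """Exhaustive search over (feature j, cutpoint s) minimizing Equation 3.8.
    X is a list of feature vectors and y the list of responses."""
    best = None  # (total SSE, j, s)
    for j in range(len(X[0])):
        for s in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] < s]
            right = [yi for x, yi in zip(X, y) if x[j] >= s]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best

# Toy data: the response jumps when the first feature crosses 10,
# so the best split is on feature 0 at cutpoint 11.
X = [[1, 0], [2, 5], [3, 1], [11, 2], [12, 7], [13, 3]]
y = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
```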
Prediction
A decision tree works in two steps: first, divide the training set into J distinct and non-overlapping
regions R1, R2, . . . , RJ. The predicted value associated with a given leaf is the mean response of the
training observations falling in that region. Thus, if a new observation falls in one of the regions, its
predicted value will be the mean response value of the training observations belonging to that region.
Specifically, the prediction of a single tree T(θ) for a new point X = x is the average over the observed
values in leaf j(x, θ). The weight ω_i(x, θ) is a positive constant if observation x_i belongs to the leaf
j(x, θ), and 0 otherwise. The weights sum to one, and thus

ω_i(x, θ) = 1{X_i ∈ R_{j(x,θ)}} / #{m : X_m ∈ R_{j(x,θ)}}. (3.9)

Given X = x, the prediction of a single tree is then the weighted mean value of the original
observations y_i, i = 1, . . . , n:

single tree: μ̂(x, θ) = Σ_{i=1}^{n} ω_i(x, θ) y_i.
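The weights of Equation 3.9 and the resulting single-tree prediction can be written compactly. In this
sketch (names ours), the assignment of the training points and of the query point to leaves is assumed
to be given as pre-computed leaf indices obtained by dropping each point down a fitted tree:

```python
def tree_weights(train_leaves, query_leaf):
    """Weights omega_i(x, theta) of Equation 3.9: 1/(leaf size) for the training
    points sharing the query's leaf, and 0 for all others."""
    count = train_leaves.count(query_leaf)
    return [1.0 / count if leaf == query_leaf else 0.0 for leaf in train_leaves]

def tree_predict(train_leaves, y, query_leaf):
    """Single-tree prediction: the weighted mean of the training responses."""
    w = tree_weights(train_leaves, query_leaf)
    return sum(wi * yi for wi, yi in zip(w, y))

# Three training points fall in leaf 0 and two in leaf 1:
leaves = [0, 0, 0, 1, 1]
y = [1.0, 2.0, 3.0, 10.0, 20.0]
# A query landing in leaf 1 is predicted by the mean of 10 and 20.
```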
Decision trees are popular because:
• Small trees can be visualized and are relatively easy to understand and interpret; a leaf can be
explained by Boolean logic.
• They can use categorical and continuous variables simultaneously.
• The tree is unaffected by monotone transformations and differing scales of the explanatory
variables. Thus, the model allows more general applications without the need to pre-process the
data.
• The computational cost of using the tree to predict a response is logarithmic in the number of
objects in the training set and thus relatively fast, even though training the tree may be a complex
and demanding process.
• Trees are less sensitive to outliers in the explanatory variables.
Despite these benefits, decision trees have several weaknesses as well. They have difficulty in
modeling smooth functions, and their structure depends on the sample of data: small changes in the
training set can result in very different splits. What is more, decision trees are built by a greedy
algorithm in which locally optimal decisions are made at each node, rather than returning the globally
optimal result [41].
However, by combining many decision trees through ensemble methods, these problems can be
reduced and the prediction performance considerably improved. Random forest, a representative
ensemble method, is introduced in the next section.
Ensemble method: Random Forest
Random forest is an ensemble learning method built from many decision trees grown on selected
training samples. Like decision trees, it is capable of solving both regression and classification
problems. Random forests employ randomness each time they select a subset of the training set for
building each tree, and a subset of input attributes from which the best attribute and split are chosen
when generating each node.
In bagging, each tree is independently constructed using a bootstrap sample of the data set, and in
the end a simple average is taken for prediction. Breiman [37] proposed random forests, which add an
additional layer of randomness to bagging. In addition to constructing each tree using a different
bootstrap sample of the data, random forests change how the regression trees are constructed: in
standard trees, each node is split using the best split among all variables, whereas in a random forest
each node is split using the best variable among a subset of variables randomly chosen at that node.
This strategy turns out to perform very well and is robust against over-fitting [37].
The algorithm for generating a random forest regression model is as follows [42]:
1. Before building each tree, first draw a bootstrap sample (resampling with replacement) from the
original training data. We denote the number of such samples by ntree, indicating that ntree trees
will be built.
2. For each selected sample, grow a regression decision tree without pruning. At each node of a
decision tree, a random subset of mtry variables is chosen out of the entire set of p variables as the
candidates for the split at this node, rather than choosing the best split among all variables. Only one
of these mtry variables is used to generate the best split rule and the corresponding subregions at
this node.
Thus, for the training sets selected in step 1, ntree trees are fully grown and combined into a random
forest.
3. Predict new observations by aggregating the predictions of these ntree trees. (The way of
aggregation is discussed further below.)
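The three steps above can be sketched as follows. This is a heavily simplified illustration under our
own assumptions: each "tree" is only a depth-1 stump rather than a fully grown tree, and with
mtry = p the procedure reduces to plain bagging, while mtry < p adds the extra layer of randomness
described in step 2:

```python
import random

random.seed(0)  # for reproducibility of this illustration

def fit_stump(X, y, mtry):
    """Grow a depth-1 tree: best split among a random subset of mtry features.
    (A drastic simplification of step 2; real forests grow full trees.)"""
    feats = random.sample(range(len(X[0])), mtry)
    best = None  # (error, feature j, cutpoint s, left mean, right mean)
    for j in sorted(feats):  # fixed order so ties resolve deterministically
        for s in {x[j] for x in X}:
            left = [yi for x, yi in zip(X, y) if x[j] < s]
            right = [yi for x, yi in zip(X, y) if x[j] >= s]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
            if best is None or err < best[0]:
                best = (err, j, s, ml, mr)
    if best is None:  # degenerate bootstrap sample: fall back to the mean
        m = sum(y) / len(y)
        return lambda x: m
    _, j, s, ml, mr = best
    return lambda x: ml if x[j] < s else mr

def random_forest(X, y, ntree=50, mtry=1):
    """Steps 1-2: ntree stumps, each grown on a bootstrap resample of the data."""
    trees = []
    for _ in range(ntree):
        idx = [random.randrange(len(X)) for _ in range(len(X))]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], mtry))
    # Step 3: aggregate by averaging the individual tree predictions.
    return lambda x: sum(t(x) for t in trees) / len(trees)

# Toy data: the response jumps when the first feature crosses 10.
X = [[1, 7], [2, 3], [3, 9], [4, 1], [11, 8], [12, 2], [13, 6], [14, 4]]
y = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
predict = random_forest(X, y, ntree=50, mtry=2)  # mtry = p here: plain bagging
```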
Advantages of Random Forests:
• As long as the number of trees is large enough, neither the computational burden nor over-fitting
is a major concern, because not all variables are considered at each node-building step.
• Random forest also reduces the influence of a few very strong variables that would otherwise
dominate each tree, thereby allowing other variables to contribute.
• Random forest runs efficiently on large datasets and learns fast.
• Random forest can handle a large set of input variables, so there is no need to perform variable
selection.
• Random forest offers an experimental method for detecting variable interactions.
For regression, the prediction of random forests for a new data point X = x is the averaged re-
sponse of all the trees. Using random forests, the conditional mean E(Y|X = x) is approximated by
the averaged prediction of ntree single trees, each constructed with an independent and identically
distributed vector θ_t, t = 1, . . . , ntree. Let ω̄_i(x) be the average of ω_i(x, θ_t) (defined in Equation 3.9)
over this collection of trees,

ω̄_i(x) = (1/ntree) Σ_{t=1}^{ntree} ω_i(x, θ_t). (3.10)

The prediction of random forests is then

Ê(Y|X = x) = μ̂_{Y|X=x}(x) = Σ_{i=1}^{n} ω̄_i(x) y_i.
Quantile Random Forest
It was shown above that random forests approximate the conditional mean E(Y|X = x) by a
weighted mean over the observations of the response variable Y. Moreover, the weighted
observations can reveal the full conditional distribution. The conditional distribution function of Y
given X = x is written as

F(y|X = x) = P(Y ≤ y|X = x) = E(1{Y ≤ y}|X = x).

The last expression inspires an analogy with the random forest approximation of the conditional
mean E(Y|X = x). Just as E(Y|X = x) is approximated by a weighted mean over the observations
of the response variable, we can define an approximation of E(1{Y ≤ y}|X = x) by the weighted
mean over the observations of 1{Y ≤ y} [14],

F̂(y|X = x) = Σ_{i=1}^{n} ω̄_i(x) 1{y_i ≤ y},
using the same weights ωi(x) (defined in equation 3.10) as in random forests. This approximation is at
the heart of the quantile regression forests algorithm.
The estimates Q̂_{Y|X=x}(τ) of the conditional quantiles

Q_{Y|X=x}(τ) = F^{-1}_{Y|X=x}(τ)

are obtained by plugging F̂(y|X = x) instead of F_Y(y) into Equation 3.4. Other approaches for
estimating quantiles from empirical distribution functions are discussed in Hyndman and Fan (1996).
The key difference between quantile regression forests and random forests is the following: in each
tree, a random forest keeps only the mean of the response values of the observations that fall into
each leaf node and neglects all other information. In contrast, a quantile regression forest keeps the
value of every observation in every leaf of all trees, not just their mean, and assesses the conditional
distribution based on this broader and more comprehensive information.
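Given the forest weights ω̄_i(x) of Equation 3.10, the estimated conditional distribution function and
its τ-quantile follow directly. A sketch with hypothetical pre-computed weights (names ours):

```python
def conditional_cdf(y_train, weights, y):
    """F_hat(y | X = x) = sum_i w_i(x) * 1{y_i <= y}."""
    return sum(w for yi, w in zip(y_train, weights) if yi <= y)

def conditional_quantile(y_train, weights, tau):
    """Q_hat(tau) = inf{y : F_hat(y|x) >= tau}, searched over the observed values."""
    for y in sorted(y_train):
        if conditional_cdf(y_train, weights, y) >= tau:
            return y
    return None  # can only happen if the weights sum to less than tau

y_train = [1.0, 2.0, 3.0, 4.0]
w = [0.1, 0.2, 0.3, 0.4]  # hypothetical forest weights, summing to one
```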
Chapter 4
Computation and Analysis of Implied
Volatility
The main goal of this chapter is to demonstrate how to calculate the implied volatility from the dataset
introduced in Section 1.2 and analyze the results. In order to compute the implied volatility from the
options price, we apply the Black-Scholes formula. In many cases, the Black-Scholes formula, as derived
by Black and Scholes [1], is used to compute the price of a certain option, provided with the data
concerning interest rate, asset price, maturity, strike and volatility. But in our context the volatility is
unknown, and the option price is provided (or at least an approximate value of it is available).
Therefore, we invert the Black-Scholes formula in order to obtain estimates of the implied volatility.
As we do not have the option price explicitly, but instead the bid and ask price of each contract,
we propose to compute the implied volatility in two ways: either we compute the average of the bid
and the ask price and use the obtained value as input for the Black-Scholes formula, or we input the
bid and then the ask price into the Black-Scholes formula individually and compute the average of the
resulting implied volatilities. For reasons that will become clear in the following sections, we resort
to a numerical method (namely the bisection method) to compute the implied volatilities.
As a result, four sets of implied volatility are generated through the operations above. We display and
compare the implied volatilities computed for the example shown in Section 1.2 specifically, and
afterwards give a general description of the main type of implied volatility, the one deduced from the
future price.
4.1 Calculation Processes
Thanks to the put-call parity mentioned in Section 2.4, we can relate call option prices with put option
prices with the same underlying asset, strike, date and maturity. Based on the combination of call and
put options made in Section 1.2, we have decided to focus on call option prices, and thus the implied
volatility mentioned hereafter refers to call options.
If we denote by F_BS(σ) the call option price generated by the Black-Scholes formula, with the implied
volatility σ as the only unknown parameter, it should be equal to the true market value C_market of
the call option contract with the same asset, date, maturity and strike price:

F_BS(σ) ≈ C_market. (4.1)
To solve Equation 4.1 and obtain the implied volatility σ, we face two challenges:
1. The market price C_market of an option contract is not provided in the dataset. We have
information concerning the bid and ask prices, hereby denoted by C_bid and C_ask respectively,
which bound C_market (the bid-ask spread [C_bid, C_ask]). The true price of the option, C_market,
may not fall exactly in this range, although most likely it will lie inside. Therefore we propose the
following two alternative methods, sketched in Figure 4.1:
Method 1:
Assume that

C_market = (C_bid + C_ask) / 2,

and compute the implied volatility from this new value.
Method 2:
First, compute the implied volatility using C_bid as the input variable, and name it σ_bid.
Second, compute the implied volatility using C_ask as the input variable, and name it σ_ask.
Third, compute the resulting implied volatility as the average of σ_bid and σ_ask.
Figure 4.1: Framework of computation Method 1 (m1) and Method 2 (m2).
2. Whichever input values we decide to use to compute the implied volatility, we need to invert the
Black-Scholes formula. Unfortunately, this formula is a non-linear function with no closed-form
solution for the implied volatility, so one needs to resort to numerical approximations. As we cannot
find explicitly

σ : F_BS(σ) = C_market,

we need to estimate

σ̂ = arg min_σ |F_BS(σ) − C_market|. (4.2)
For the minimization, we consider the bisection method presented in Section 3.1, a root-finding
algorithm for continuous non-linear, and in particular non-differentiable, functions. The bisection
method is used to find the root, the implied volatility σ, of the non-linear non-differentiable equation
F_BS(σ) − C_market = 0. The process is as follows:
• Firstly, initialize the interval as [0.001, 1], and choose the first midpoint as 0.2 to accelerate
the computation, since, according to our knowledge, the implied volatility for this kind of
option is normally small.
• Secondly, in each iteration, compute the midpoint and its corresponding function value. Follow
the second step of the algorithm mentioned in Section 3.1 to compare the signs and reset the
interval.
• Finally, repeat the last step until the absolute value of the midpoint's function value (i.e. the error
between the estimated and the true market option price) is less than 0.00001, or the number of
iterations reaches the maximum of 1000. In the former case, the estimated implied volatility is the
value of the midpoint in the last iteration. The latter indicates that the computation does not
converge, regardless of the number of iterations: there is no improvement even if the maximum
number of iterations is raised to 3000.
As mentioned in Section 2.5, for different configurations of the portfolio the Black-Scholes formula
has two forms, Equation 2.7 and Equation 2.9, depending on whether the asset price or the future
price is involved.
By using the future price:

F^F_BS(σ) = C(t, T, K, F(t, T), D(t, T)) = D(t, T)(F(t, T)Φ(d1) − KΦ(d2)),

d1 = [ln(F(t, T)/K) + 0.5σ²(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t).

By using the asset price:

F^S_BS(σ) = C(t, T, K, S_t, σ) = S_t Φ(d1) − K e^{−r(T−t)} Φ(d2),

d1 = [ln(S_t/K) + (r + 0.5σ²)(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t).
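The future-price formula together with the bisection scheme described above can be sketched as
follows (a standard bisection; unlike the procedure above, this sketch does not force the first midpoint
to 0.2, and the function names are ours):

```python
import math

def norm_cdf(x):
    """Standard normal CDF, via the error function from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_future(sigma, F, K, D, T):
    """Black-Scholes call on a future: C = D*(F*Phi(d1) - K*Phi(d2)); T is T - t."""
    d1 = (math.log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return D * (F * norm_cdf(d1) - K * norm_cdf(d2))

def implied_vol(c_market, F, K, D, T, lo=0.001, hi=1.0, tol=1e-5, max_iter=1000):
    """Bisection on sigma -> bs_call_future(...) - c_market."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        diff = bs_call_future(mid, F, K, D, T) - c_market
        if abs(diff) < tol:
            return mid
        if diff > 0:   # the call price increases with sigma: root lies below mid
            hi = mid
        else:
            lo = mid
    return None        # no convergence within max_iter iterations
```

Round-tripping a price generated with a known σ recovers that σ, a convenient sanity check for the
solver.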
In Figure 4.2 we show the framework of how we organize the related datasets. As a result, there
are in total four sets of implied volatility, denoted IV_F^{m1}, IV_F^{m2}, IV_S^{m1} and IV_S^{m2},
where the subscript F (S) means that we use the future price (asset price), and the superscript m1
(m2) means that we have used method 1 (method 2).
Figure 4.2: The framework of how we organize the related datasets.
4.2 Analysis of the computed implied volatility
We start our analysis by showing the implied volatilities computed from the contracts in the subset
mentioned in Section 1.2, named the 'Example Set'. Afterwards, we focus on the most reliable result,
IV_F, and present some plots and descriptive statistics.
4.2.1 Analysis and comparison based on the example
To keep the plots readable, instead of plotting the computed implied volatilities for the whole dataset,
we follow the 'Example Set' mentioned in Section 1.2, which contains 42 option contracts from
January 3, 2014 to March 1, 2014, and present the implied volatilities computed from these 42
observations in Figure 4.3.
In Figure 4.3, the results computed by method 1 (denoted as m1 in the legend) are marked 'o', while
the ones computed by method 2 (denoted as m2 in the legend) are marked '+'. We can see that the
computation methods do not cause much difference, except that in the index price case several
values are missing, at which points the computation cannot converge.
The three curves in Figure 4.3 correspond to the three input variables of the Black-Scholes formula:
the future price (the curve on top), the index price (the curve at the bottom) and, additionally, the
constructed price (the curve in the middle, closest to the one on top). Note that although the
constructed price is deduced from future prices, here it is used as input in the same 'asset price
involved' formula as the index price. The figure illustrates that:
• The implied volatilities based on the future price and the constructed price are similar, especially
for the last 20 observations, whose strike prices are closer to the daily index price than those of the
first 22 observations. A smaller difference between an option's strike price and the daily index price
thus means that the choice of input variable of the Black-Scholes formula has less influence on the
computed volatility.

Figure 4.3: Different types of the implied volatility based on the Example set. Implied volatility
computed by future price, asset price and constructed price in the two computation methods are
shown in different colors. The results computed by method 1 (denoted as m1 in the legend) are
marked 'o', while the ones by method 2 (denoted as m2 in the legend) are marked '+'.
• The implied volatility based on the index price is not accurate at the major maturities. As
mentioned in Section 1.2.2, the deduced constructed price can deviate slightly from the index price at
the major maturities. In this example, the maturity March 21, 2014 is one of the major maturities, and
the daily index price is 3074.43, different from the constructed price of 3062.73.
In a nutshell, the implied volatility based on the future price is the most accurate and reliable result.
Only when the future price is not available, i.e. when the maturity of an option is not one of the major
maturities, can the implied volatility based on the index price be considered as a supplement. What is
more, outside the major maturities the market is not as active and variable within the last 45 minutes
of trading, so the index price tends to be closer to the constructed price, which favours producing a
reliable implied volatility.
4.2.2 Analysis on the entire IV_F
In the last subsection we displayed a small subset of the implied volatilities, giving the reader a clear
view of the specific value for each contract, and we saw that the two computation methods give very
similar results. Now it is time to take a general look at the differences over the whole set of the two
resulting implied volatilities based on the future price (IV_F).
Here we discuss and compare the differences between the implied volatilities derived by method 1
and method 2 in the dataset IV_F. Plotting the results of method 1 against those of method 2 in
Figure 4.4, we can see that most of the points, marked black, fall around the diagonal line, meaning
that the two estimated implied volatilities are very similar. These points are called stable because the
estimates remain stable regardless of the calculation method. Only a small set of option contracts
produces differences between the two methods; these are marked red. The red points, with
differences between the estimates larger than 0.001, constitute only 0.65% of the whole dataset;
they are not as reliable as the black ones because they cannot withstand the influence of the
calculation method. We name the red part unstable. In a nutshell, we define the 'Stable' and
'Unstable' sets according to whether the observations inside are sensitive to the computation
method, i.e. whether the computed implied volatility remains relatively the same when the
computation method changes from method 1 to method 2. Here the cut-off point 0.001 is chosen
manually by visualization; other criteria may be considered in the future.
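The stable/unstable split described above amounts to thresholding the absolute difference between
the two estimates; a small sketch with hypothetical values (the function name is ours, and the 0.001
cut-off is the one chosen by visualization):

```python
def split_stable(iv_m1, iv_m2, cutoff=0.001):
    """Return the indices of 'stable' and 'unstable' contracts: a contract is
    stable when the two estimates agree within the cut-off."""
    stable, unstable = [], []
    for i, (a, b) in enumerate(zip(iv_m1, iv_m2)):
        (stable if abs(a - b) <= cutoff else unstable).append(i)
    return stable, unstable

# Hypothetical estimates: only contract 2 reacts to the change of method.
iv_m1 = [0.0101, 0.0123, 0.0300, 0.0115]
iv_m2 = [0.0102, 0.0123, 0.0342, 0.0114]
```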
Figure 4.4: Comparison of implied volatility derived by both methods.
Next we try to figure out which features of the unstable set make the results differ between the two
computation methods. The most prominent feature, shown in Figure 4.5, is that the observations
inside all belong to deep 'in-the-money' or deep 'out-of-the-money' call options. As defined in
Section 2.2.4, these two kinds of options normally have a strike far away from the daily asset price.
What is more, their trading sizes are normally quite small: deep in-the-money options are more
expensive than their potential profit, and thus unattractive to buyers, while deep out-of-the-money
options have little chance of making money in spite of their cheap prices.
(a) Stable set. (b) Unstable set.
Figure 4.5: Option (ask) prices of their call and put options for the contracts with future price involved.
Therefore, the observations in the unstable set represent an extreme case in which the option
contracts are unlikely to be exercised successfully. Hence, we do not focus on the unstable set any
further, and we continue our study based on the stable set. The dataset mentioned hereafter refers to
the stable part of the implied volatilities computed with the future price as input, denoted by
IV_F^{stable}.
Table 4.1: Descriptive statistics for the contracts with estimated implied volatilities IV_F^{stable}.

                     mean      sd   median  trimmed    mad     min     max    range  skew  kurtosis    se
Common variables
time to maturity   129.74   72.88   127.00   128.45   84.51    1.00  270.00   269.00  0.13     -1.02  0.41
strike            3282.91  403.15  3300.00  3283.72  407.71 1900.00 4500.00  2600.00 -0.04     -0.24  2.26
constructed price 3310.65  223.64  3253.37  3306.92  262.18 2777.45 3760.12   982.67  0.23     -1.17  1.25
Estimated Implied Volatilities
IV(m1)               0.01    0.00     0.01     0.01    0.00    0.00    0.04     0.03  1.44      5.93  0.00
IV(bid)              0.01    0.00     0.01     0.01    0.00    0.00    0.03     0.03  1.29      5.03  0.00
IV(ask)              0.01    0.00     0.01     0.01    0.00    0.00    0.04     0.04  1.55      6.58  0.00
IV(m2)               0.01    0.00     0.01     0.01    0.00    0.00    0.04     0.03  1.41      5.81  0.00
Table 4.1 displays a statistical summary of the current dataset IV_F^{stable}. We can see that in this
dataset both the mean and the median of the time to maturity are almost four months, as in the entire
IV_F. Further study and analysis of the dataset IV_F^{stable} are presented in the next chapter.
Chapter 5
Modeling and Predicting the Implied
Volatility
Now that we have estimated the implied volatility, in this chapter we move forward and try to fit
regression models to the dataset and give some interpretations. In view of the large size of the
dataset IV_F^{stable} (31896 observations in total), we focus on some specific subsets. Therefore, the
first section of this chapter explains how we choose the subsets, the second section presents the
exploratory analysis, and the third section describes how we fit the models.
5.1 The process of choosing a subset
As mentioned at the end of the last chapter, the dataset we now focus on is IV_F^{stable}, namely the
stable part of the implied volatilities computed with the future price as input. It contains 31896
observations, accounting for 99.35% of the dataset IV_F.
From the description of the dataset IV_F in Section 4.2, we have a rough idea that the distribution of
the time to maturity is relatively even, fluctuating around a length of four months. One of the biggest
challenges of this work is that the observations are highly mixed and overlapping, so clear patterns in
the implied volatility are hard to obtain. However, according to our knowledge of financial activity, the
movements of the market and the actions of the traders tend to show periodic variations. This raises
the question of whether it is possible to extract subsets that carry the significant seasonal
characteristics of the implied volatility variation.
If we fix a maturity and study the contracts expiring on that day, the descriptive analysis of
IV_F^{stable} in Table 4.1 shows that the earliest trading among these contracts happens at most 270
days prior to the maturity, and the latest one happens at least one day before it.
We first want to know more about the date and the maturity, i.e. the beginning and ending days of
the life of an option contract. Figure 5.1 shows the range of the trading dates for the contracts ending
at each of the ten distinct major maturities. Note that the information on the options was gathered
from January 2, 2014 to October 29, 2015.

Figure 5.1: Boxplot of trading dates for every maturity appearing in the dataset.

It is worth highlighting that records are missing from the original dataset for the period from March 21,
2014 to May 16, 2015, for unclear reasons. The lack
of information in this period actually reduces the number of contracts traded in June 2014 and
September 2014. What is more, the first maturity appearing (March 21, 2014) and the last three
maturities shown in this study (December 18, 2015, March 18, 2016 and June 17, 2016) are affected
by the beginning and ending days of the records. The box plots for the middle four maturities
(December 19, 2014, March 20, 2015, June 19, 2015 and September 18, 2015) seem to carry more
complete information.
Figure 5.2: Sample sizes and rates of each maturity. The red numbers at the top of the columns explainthe proportion of the sample size of each maturity in the total amount of the observations.
Figure 5.2 displays the sample sizes of the contracts related to each maturity. The red numbers
at the top of the columns give the proportion of the sample size of each maturity in the total number
of observations. The contracts with maturity on September 18, 2015 are considered a good choice
as our current target. First, they contain relatively complete information, holding 17.7% of the total
contracts. Secondly, this is the last maturity before the end of the records, so it supposedly carries the
most comprehensive information over the recording period. Further study is needed to check whether
its tendency is similar enough to that of the entire dataset IV_F to represent the latter.
5.2 Exploratory Analysis
For the chosen dataset, composed of the options with maturity on September 18, 2015, we display a
brief analysis to obtain information that could not be noticed in the previous dataset, where
observations of many maturities are mingled together.
Table 5.1: Statistical description for options whose maturities are at 2015-09-18.

                     mean      sd   median  trimmed    mad     min     max    range  skew  kurtosis    se
Common variables
time to maturity   129.14   72.50   133.00   129.31   90.44    1.00  270.00   269.00 -0.02     -1.16  0.97
strike            3466.14  379.90  3500.00  3484.44  370.65 2200.00 4350.00  2150.00 -0.43     -0.03  5.06
discount             1.00    0.00     1.00     1.00    0.00    1.00    1.00     0.00 -0.56     -0.60  0.00
constructed price 3475.59  186.65  3526.46  3496.08  169.78 2938.49 3753.79   815.30 -0.90      0.06  2.48
Input using future prices
future(bid)       3475.45  186.90  3526.00  3495.96  170.50 2938.00 3754.00   816.00 -0.90      0.06  2.49
future(ask)       3477.25  186.78  3529.00  3497.74  167.53 2940.00 3756.00   816.00 -0.90      0.06  2.49
Estimated Implied Volatilities
IV(m1)               0.01    0.00     0.01     0.01    0.00    0.01    0.04     0.03  2.44      8.94  0.00
IV(ask)              0.01    0.00     0.01     0.01    0.00    0.01    0.04     0.03  2.48      9.46  0.00
IV(bid)              0.01    0.00     0.01     0.01    0.00    0.01    0.03     0.03  2.37      8.19  0.00
IV(m2)               0.01    0.00     0.01     0.01    0.00    0.01    0.04     0.03  2.42      8.79  0.00
Market prices for options
call(bid)          212.67  200.78   155.20   181.66  172.72    0.10 1300.60  1300.50  1.42      2.19  2.67
call(ask)          217.08  205.12   157.95   185.31  175.69    0.20 1306.60  1306.40  1.42      2.13  2.73
put(bid)           201.70  162.41   155.40   180.19  145.00    0.00  779.20   779.20  1.07      0.55  2.16
put(ask)           204.66  166.24   156.55   182.49  147.30    0.00  797.20   797.20  1.08      0.58  2.21
We do a brief statistical analysis for options whose maturities are at 2015-09-18. Table 5.1 and
Figure 5.3 indicate that:
• Both the mean and the median of the time to maturity in the current dataset are around four months.
• For the implied volatility calculated by method 1 and method 2, the only differences occur in the
skew and kurtosis. Both values are slightly larger in the current dataset than in the entire set with
all maturities, indicating that although the distribution of implied volatility is still positively skewed
with a right tail, the observations are more concentrated.
• For the distribution of the constructed price, the skew is -0.90 (a heavy left tail), while the kurtosis
is 0.06, indicating that the distribution is much more concentrated in the current dataset than in the
entire set with all maturities. Even so, it is hard to claim that the constructed price follows the
log-normal distribution assumed by the Black-Scholes formula.
Figure 5.3: Pair plot for options whose maturities are at 2015-09-18. The seven variables plotted are
time to maturity, strike, constructed price, IV(m1), IV(ask), IV(bid) and IV(m2).
This preliminary analysis shows that the current dataset concentrates the information in the observations
and reinforces the correlation between our response variable (implied volatility) and the predictors (time
to maturity and strike).
(a) Perspective 1: from the side of time to maturity (b) Perspective 2: from the side of strike
Figure 5.4: 3D plots from the perspectives of both time to maturity and strike, used to check the
distribution of implied volatility.
Figure 5.4 shows the data from two perspectives in 3D. There are two types of 'smile'. The first type,
for small values of time to maturity, is relatively complete: the curves show inflection points, with the
implied volatility first decreasing and then bouncing back up as the strike increases. Most observations
belong to the second type, where the curves remain monotonically decreasing as the strike rises.
5.3 Regression Modeling
Here we propose several linear regression models to fit the dataset; a random forest model is also
applied as a non-linear complement, and its results are compared with those of the linear models.
5.3.1 Pre-processing
In pre-processing, we first randomly split the dataset into a train set (75% of the observations) and a
test set (25%). The response variable (implied volatility) is positively skewed in both sets, so, to fit the
following regression models and satisfy their assumptions, we apply the Box-Cox method (introduced
in Section 3.2.1) to the train set to bring the distribution of the implied volatility closer to normality.
(a) train (b) test
Figure 5.5: The distribution of response variable before and after Box-Cox transformation.
The parameter λ = −1.59 estimated on the train set is applied to both the train and test sets so that
they stay on the same scale. The implied volatility in this case ranges from 0.0072 to 0.0379, so the
transformed implied volatility ranges between -1604.2571 and -113.8081. Figure 5.5 shows that the
method works well: the transformed response variable (denoted 'bcIV ') approximately follows a normal
distribution. Since the Box-Cox transformation already brings the order of magnitude of the response
variable to levels similar to those of strike and time to maturity, no further scaling is necessary.
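The transformation step can be sketched in Python. This is a minimal illustration with made-up IV values; only λ = −1.59 and the 0.0072–0.0379 range come from the text, and the function simply mirrors the standard Box-Cox definition:

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) for lam == 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

# lambda is estimated on the train set only, then reused on the test set
lam = -1.59
iv = np.array([0.0072, 0.0150, 0.0379])  # illustrative implied volatilities
bc_iv = boxcox(iv, lam)                  # large negative values, roughly -1607 to -114
```

With a negative λ the transform is still monotonically increasing, so the ordering of the implied volatilities is preserved after transformation.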
Next we compute the Mahalanobis distance for the set of three variables (time to maturity, strike and
bcIV, the estimated implied volatility after the Box-Cox transformation) to find extreme values relatively
far away from the main cluster. Points whose robust Mahalanobis distances are in the largest 1% of the
dataset are treated as extreme values and marked green in the correlation pair plot and 3D plot in
Figure 5.6, although those points cannot necessarily be treated as outliers from the cluster.
(a) Pairs plot. (b) 3D plot.
Figure 5.6: Plots with extreme values detected by the robust Mahalanobis distance. Points whose robust
Mahalanobis distances are in the largest 1% of the dataset are treated as extreme values and marked
green.
As shown in Figure 5.6, instead of selecting several complete 'smile' curves (for example, the first
type of smile shown in Figure 5.4), the Mahalanobis distance picks out several observations with high
volatility. As mentioned in Section 3.2.2, the Mahalanobis distance is calculated along each principal
component axis and is therefore scale-invariant, so the influence of the different scales of the three
variables is eliminated. However, looking back at the first type of curves, with inflection points in the
3D plots, these observations probably carry significant information about how the implied volatility is
distributed when the time to maturity is small. Thus the Mahalanobis distance does not give convincing
evidence for outlier detection, and we keep all observations in the further study. The extreme values
selected here remain marked in green, and we explain later whether to keep them or remove them as
outliers.
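The detection step can be sketched as follows. This is a plain-covariance illustration in Python on synthetic stand-in data; the thesis uses a robust (e.g. MCD-based) estimate of location and scatter, which replaces the sample mean and covariance below:

```python
import numpy as np

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row of X to the sample mean.
    The thesis uses a robust (MCD-based) estimate; here we use the plain
    sample mean/covariance for brevity."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))           # stand-in for (maturity, strike, bcIV)
d2 = mahalanobis_d2(X)
extreme = d2 >= np.quantile(d2, 0.99)    # flag the largest 1% as extreme values
```

Because the distance is computed in the metric of the (inverse) covariance, the different scales of the three variables cancel out, which is the scale-invariance mentioned above.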
5.3.2 Linear Regression
After pre-processing, we now fit linear models. Market implied volatility shows a pattern known as the
volatility smile, driven by the time to maturity and the strike price of the option. Our goal is therefore
to build parametric models for implied volatility based on these two variables. In this section, we first
construct three ordinary least squares models of increasing complexity, and then apply both least
absolute deviation regression and quantile regression based on the third model. Table 5.2 gives an
overview of the response variable, the original covariates, and the covariates derived from them. Note
that in both the linear regressions and the random forests we use the Box-Cox-transformed implied
volatility as the response variable, so that all models and predictions stay on the same scale, even
though random forests do not require the response variable to be normally distributed as linear
regression does.
Table 5.2: Overview of the response variable, the original covariates and the covariates generated from
them for the regression models, with notation and descriptions.
Variables Descriptions
Response Y Transformed implied volatility (by Box-Cox)
Covariates
X1 Strike
X2 Time to maturity
X3 Time to maturity × strike
X4 1 / time to maturity
X5 Strike²
The models are compared in terms of goodness of fit. The coefficient of determination (R2) is used as
the performance statistic:

R2 = SS_Regression / SS_Total = 1 − SS_Error / SS_Total .
Model 1 : Yi = β0 + β1X1,i + β2X2,i + εi, (5.1)
Model 2 : Yi = β0 + β1X1,i + β2X2,i + β3X3,i + εi, (5.2)
Model 3 : Yi = β0 + β1X1,i + β2X2,i + β3X3,i + β4X4,i + β5X5,i + εi, (5.3)
where εi is the error term.
To avoid repeating similar derivations and to give a clear overview, the general expressions of the three
linear models are shown in Equations 5.1-5.3, with covariates as defined in Table 5.2. The models are
built in the following steps:
1. The initial model, Model 1, regresses the transformed implied volatility on a simple additive
combination of our two original covariates, time to maturity and strike.
2. In Model 2, an interaction term is added. From the relationship between the two covariates visible
in Figure 5.6(a), strike and time to maturity seem to have a non-linear relationship. This suggests
that their interaction might be important, so we add the interaction term of time to maturity and
strike to the initial model.
3. Two further terms are added in Model 3: the reciprocal of time to maturity and the square of strike.
Figure 5.6(a) shows that time to maturity varies inversely with implied volatility. Moreover, the
'volatility smile' phenomenon is visible, suggesting a relationship between the square of strike and
implied volatility.
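The step-wise construction can be illustrated with a small OLS fit of the Model 3 design on synthetic data. The coefficients below are made up for illustration and are not the thesis estimates; the covariate layout follows Table 5.2:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
maturity = rng.uniform(1, 270, n)          # X2, days to maturity
strike = rng.uniform(2200, 4350, n)        # X1

# Model 3 design matrix: intercept, X1, X2, X1*X2, 1/X2, X1^2
X = np.column_stack([np.ones(n), strike, maturity,
                     strike * maturity, 1.0 / maturity, strike**2])

# synthetic response with known (illustrative) coefficients plus noise
beta_true = np.array([1400.0, -0.4, -6.5, 0.0016, 450.0, -0.0001])
y = X @ beta_true + rng.normal(scale=5.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate
resid = y - X @ beta_hat
r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
```

On such synthetic data the fit recovers the generating signal almost exactly; on the real dataset the same design yields the R2 values reported in Table 5.3.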
Table 5.3 illustrates that, for the train set, the adjusted R2 values of the three models increase gradually
from 0.8185 and 0.8540 to 0.8668, so the covariates explain at most 86.68% of the variance of the
response variable,
Table 5.3: Summary of covariates on regression functions
Estimate Std. Error t value Pr(>|t|)
Model 1
β0 1394.7363 16.2883 85.63 0.0000
β1 -0.5904 0.0045 -131.42 0.0000
β2 -1.3634 0.0236 -57.68 0.0000
Multiple R2: 0.8186, Adjusted R2: 0.8185
Model 2
β0 2395.9840 34.4390 69.57 0.0000
β1 -0.8872 0.0101 -87.97 0.0000
β2 -7.5530 0.1940 -38.94 0.0000
β3 0.0019 0.0001 32.10 0.0000
Multiple R2: 0.8541, Adjusted R2: 0.8540
Model 3
β0 1424.2046 107.3273 13.27 0.0000
β1 -0.3830 0.0587 -6.53 0.0000
β2 -6.4615 0.1970 -32.79 0.0000
β3 0.0016 0.0001 27.58 0.0000
β4 447.5932 23.7503 18.85 0.0000
β5 -0.0001 0.0000 -8.14 0.0000
Multiple R2: 0.8670, Adjusted R2: 0.8668
and the model efficiency improves at each step. All covariates in the three models are significant, as
indicated by very small p-values. This may, however, partly reflect an underestimation problem: for such
a variable response, we only have two sources of information (time to maturity and strike). Besides the
models above, we also tried more complicated combinations of time to maturity and strike, such as
higher-order polynomial regressions. However, the resulting R2 values showed no significant
improvement and the coefficients became harder to interpret; it is a trade-off between complexity and
conciseness. Since we are going to apply both robust regression and quantile regression, which bring
further perspectives on the relationship between covariates and response variable, we prefer a
relatively simple and efficient model and keep Model 3 as the best one so far.
The normal quantile-quantile plot is shown at the top right of Figure 5.7, displaying the standardized
residuals in ascending order from left to right. The plotted points should lie close to a straight line if the
residuals follow a normal distribution. Here we see a clear and large deviation from this theoretical line
for small and large residuals, indicating a heavy-tailed distribution, i.e., a distribution with a higher
probability of events occurring in its tails [43]. The heavy-tail problem is slightly reduced compared with
Model 1 and Model 2, but it is still obvious, so the fit can hardly be considered a good one.
Figure 5.7: Diagnostic plots of regression Model 3. The heavy-tail problem indicates that the fit can
hardly be considered a good one.
It is worth mentioning that we also fitted the three models above to the dataset with the extreme values
found by the Mahalanobis distance excluded. However, the test set cannot be put on the same scale as
the train set, because the Mahalanobis distance uses each sample's own mean and covariance. If a
model is trained on the train set without extreme values, the R2 on the train set improves, but the
performance of the new model on the untouched test set can be slightly worse than that of the model
trained with the extreme values included. A solution might be to calculate the Mahalanobis distance for
the test set using the mean and covariance of the train set; since the R package we used does not
allow the mean and covariance to be set, this approach is left for future work due to time limits. We
therefore use all observations in the train set, including the extreme values, and next explore robust
regression and quantile regression, which are less influenced by the latent 'outliers', based on the
currently best model, Model 3.
Robust linear regression
Robust regression analysis provides an alternative to least squares regression when fundamental
assumptions are unfulfilled by the nature of the data [44]. As mentioned in Section 3.3.1, L1-norm
robust regression is an alternative to classical regression based on ordinary least squares when the
data are contaminated with outliers or influential observations.
Table 5.4: The coefficients of the five covariates and their significance in the robust linear regression.
Estimate Std. Error t value Pr(>|t|)
β0 1337.53410 73.01018 18.31983 < 2.2e-16
β1 -0.37826 0.03993 -9.47298 < 2.2e-16
β2 -5.68856 0.13404 -42.43847 < 2.2e-16
β3 0.00134 0.00004 34.07648 < 2.2e-16
β4 389.05375 16.15629 24.08064 < 2.2e-16
β5 -0.00006 0.00001 -10.59057 < 2.2e-16
R2 is 0.8640, R2 on test is 0.8669
Note that the robust regression is one specific case of quantile regression when the quantile required
is equal to 0.5. Thus before showing the whole picture of fitted values from quantile regression, we first
provide an analysis of the robust linear model based on the median of the response variable. Table 5.4
describes the coefficients of the five covariates and their significance in this robust regression.
Compared with the same covariates in the classical linear model, Model 3 in Table 5.3, the coefficients
and their standard errors change only slightly. The R2 is 0.8640 for the train set and 0.8669 for the test
set, about 0.2% smaller than for classical linear regression, suggesting that robust linear regression
does not handle the outliers better than classical linear regression in this case.
Quantile linear regression
After the robust regression, we extend our study to focus not only on the median but also on the set of
quantiles {0.05, 0.10, 0.25, 0.75, 0.90, 0.95} of the response variable.
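For a quantile τ, quantile regression minimizes the pinball (check) loss, which can be written as a linear program. The following is our own minimal Python/scipy sketch for illustration; the thesis uses an R routine, and the data here are synthetic:

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg(X, y, tau):
    """Fit beta minimizing sum_i rho_tau(y_i - x_i'beta) as a linear program.
    Variables: beta = b_pos - b_neg (free), residual split r = u - v, u,v >= 0.
    Objective: tau*sum(u) + (1-tau)*sum(v)."""
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * p + 2 * n), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]          # recover the free coefficients

# example: median (tau = 0.5) regression on a noisy line
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
beta_med = quantile_reg(X, y, 0.5)
```

Setting τ = 0.5 gives exactly the L1-norm (least absolute deviation) regression of the previous subsection; other values of τ shift the fitted line towards the corresponding conditional quantile.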
Table 5.5: The summary of coefficients for different quantiles.
     tau=0.05  tau=0.10  tau=0.25  tau=0.50  tau=0.75  tau=0.90  tau=0.95
β0   764.27    1120.48   1582.73   1276.90   934.43    508.14    576.51
β1   0.15      -0.12     -0.50     -0.35     -0.20     -0.05     -0.09
β2   -7.99     -7.52     -6.38     -5.42     -4.13     -2.12     -2.23
β3   0.0022    0.0021    0.0016    0.0013    0.0008    0.0001    0.0001
β4   466.89    490.85    418.87    351.05    361.74    373.10    388.88
β5   -0.00018  -0.00013  -0.00005  -0.00006  -0.00007  -0.00007  -0.00006
Figure 5.8 visualizes the change in the quantile coefficients reported in Table 5.5, along with
confidence intervals for all coefficients. Each black dot is the corresponding variable's coefficient for a
quantile τ in the set {0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95}. The red lines are the ordinary least
squares estimate and its confidence interval.
We find that the absolute value of the coefficient of 'time to maturity' is smaller (i.e., its negative
influence on the response variable is weaker) for larger quantiles of implied volatility. The coefficient of
'time to maturity × strike' behaves the same way, but with a positive rather than negative sign. The
covariate 'strike²' has a more significant impact on the lower quantiles of implied volatility than on the
upper quantiles.
Figure 5.8: Plots of the coefficients in the quantile linear regression. Each black dot is the
corresponding variable's coefficient for a quantile τ in the set {0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95}.
The red lines are the ordinary least squares estimate and its confidence interval.
To display the effect and goodness of fit of the quantile regression intuitively, Figure 5.9 shows the
fitted values of the response variable at the lower quantile (5%), the median and the higher quantile
(95%). The fitted values of the lower and higher quantile regressions basically cover the boundaries of
the entire distribution of the response variable, and the median regression captures its major features
as well.
In fact, prediction intervals provide a more robust way of assessing the performance of the linear
models, since they do not require the assumption of normally distributed residuals. In particular, to
explain and visualize the results more directly, we define a prediction interval whose boundaries are
given by lower and higher quantiles of the response variable. In this case, shown in Figure 5.10, we
choose the quantiles 0.05 and 0.95 to create a 90% prediction interval, computing the fitted values at
the 5% and 95% quantiles on the train set as the two boundaries. We then check whether the true
values of the test observations are contained in the prediction interval.
In this way, 83.27% of the test observations are covered correctly by the prediction interval, i.e. the
range between the predictions of the 5% and 95% quantile regressions. This is a fairly good result, and
it will be compared with the prediction accuracy of the next model, random forests.
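The correct rate used here is simply the empirical coverage of the interval on the test set; a sketch with hypothetical inputs (not the thesis data):

```python
import numpy as np

def coverage_rate(y_true, lo_pred, hi_pred):
    """Fraction of observations whose true value lies inside [lo, hi].
    In the text, lo/hi are the 5% and 95% quantile-regression predictions."""
    y_true, lo_pred, hi_pred = map(np.asarray, (y_true, lo_pred, hi_pred))
    inside = (y_true >= lo_pred) & (y_true <= hi_pred)
    return inside.mean()

# toy check: an interval built from the sample's own 5%/95% quantiles
# should cover about 90% of the points
rng = np.random.default_rng(4)
y = rng.normal(size=10000)
lo, hi = np.quantile(y, 0.05), np.quantile(y, 0.95)
rate = coverage_rate(y, np.full_like(y, lo), np.full_like(y, hi))
```

In the thesis the bounds vary per observation (they come from the fitted quantile models), but the coverage computation is the same.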
Figure 5.9: Quantile linear regressions for the transformed implied volatility against strike and time to
maturity. The fitted values of the response variable are shown at the lower quantile (panels (a), (d)),
the median (panels (b), (e)) and the higher quantile (panels (c), (f)).
Figure 5.10: 90% prediction interval for the test set in quantile linear regression.
5.3.3 Quantile Random Forest
For datasets in which the relationship between the response variable and the covariates follows a
linear structure, linear regression is usually more efficient and accurate. If, instead, the relationship is
highly non-linear or complicated, tree-based models may give better results and explanations. The
linear regression above shows an acceptable performance, with 83.27% of the test observations
covered correctly by the prediction interval, so our next task is to assess the fit of quantile random
forests.
As described in Section 3.3.2, the logic of quantile regression forests is simple: the model records not
only the mean of the response values but all observed response values in every leaf of each tree in the
forest. Based on the full information in each leaf, we can therefore obtain the full conditional distribution
of the response for every observation and define a prediction interval as well. In this case, we take the
5% and 95% quantiles in each leaf, based on the train set, as the two boundaries. After training, each
test observation passes through every tree, producing as many predictions as there are trees. A test
observation falling into a leaf receives a prediction in the range of the boundaries with 90% probability,
and we can easily check afterwards whether the range contains the true value. Note that other
summaries of the distribution, such as the median, can also be calculated by the quantile random
forest. Figure 5.11 shows the true value and the prediction interval for each observation in the test set,
and Figure 5.12 shows the fitted values of the response variable at the lower quantile (5%), median
(50%) and higher quantile (95%) for the random forests.
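The leaf-pooling idea can be sketched with a deliberately tiny "forest" of one-split trees. This is our own stub for illustration, not the implementation used in the thesis; real quantile regression forests grow deep trees and sample covariates at each node:

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_stump(X, y):
    """One-split 'tree' that keeps ALL training responses in each leaf,
    which is the key idea of quantile regression forests."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.sum() < 2 or (~left).sum() < 2:
                continue
            sse = ((y[left] - y[left].mean())**2).sum() \
                + ((y[~left] - y[~left].mean())**2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, y[left].copy(), y[~left].copy())
    return best[1:]

def forest_quantiles(X, y, x_new, n_trees=200, qs=(0.05, 0.5, 0.95)):
    """Pool the leaf responses that x_new falls into across bootstrap trees,
    then read off empirical quantiles of the pooled conditional sample."""
    pooled = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))        # bootstrap resample
        j, t, y_left, y_right = grow_stump(X[idx], y[idx])
        pooled.append(y_left if x_new[j] <= t else y_right)
    return np.quantile(np.concatenate(pooled), qs)

x = rng.uniform(0, 10, 300)
y = x + rng.normal(scale=0.3, size=300)              # toy data: y ≈ x
q05, q50, q95 = forest_quantiles(x.reshape(-1, 1), y, np.array([9.0]))
```

With deep trees the leaves become small and localized, so the pooled conditional sample, and hence the interval [q05, q95], adapts to each observation, which is what Figure 5.11 displays.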
As mentioned in Section 3.3.2, one advantage of random forests is that, because not all covariates are
considered when building each node, the model can avoid over-fitting as long as the number of trees is
large enough. We therefore build 1000 trees without a depth limit, using mean squared error as the
split criterion and bootstrap resampling. As discussed above, the predicted values at the 5% and 95%
quantiles form the lower and upper bounds of the prediction interval.
After training, the R2 calculated from the median predictions of this random forest is a surprising
0.9907, meaning that the covariates explain approximately 99.07% of the variance of the response
variable. Note, however, that since different observations contribute different amounts of information,
R2 may not be the best way to compare the results of the linear regression (specifically, the
median-based L1-norm regression) with those of the random forest (specifically, its median prediction).
We therefore use the rate at which the true values fall inside the 90% prediction interval (the correct
rate) to compare the two types of models. For the random forest, the correct rate is 0.8284, meaning
that 82.84% of the test observations are covered by their intervals. This rate is lower than the 0.8327
obtained with the linear regression. As expected, the random forest's prediction intervals describe the
upper and lower bounds of each individual observation, so the intervals are tighter: more accurate for
well-fitted observations, but covering slightly fewer true values overall.
Figure 5.11: 90% prediction interval for the test set in the quantile random forest.
Figure 5.12: Quantile random forests for the transformed implied volatility against strike and time to
maturity. The fitted values of the response variable are shown at the lower quantile (panels (a), (d)),
the median (panels (b), (e)) and the higher quantile (panels (c), (f)).
5.3.4 A comparison in test set
To compare how many test observations fall correctly into the 90% prediction interval generated from
the train set (the correct rate) for the two regression models, Table 5.6 gives a brief summary.
Table 5.6: Summary of two regression models’ results
Model Structure Correct Rate
Quantile Linear Regression Model 3 83.27%
Quantile Random Forests 1000 trees 82.84%
Moreover, because the test set contains 1411 observations, the figures above only give a general view
of the fit and of the tendency of the implied volatility together with its prediction interval. For clearer
visualization, we show the predictions for the first 20 observations.
Figure 5.13(a) shows that the quantile linear regression gives prediction intervals of approximately
equal width for all observations, while in Figure 5.13(b) the quantile random forest produces wider
intervals precisely where the differences between true values and median predictions are larger.
Overall, the prediction intervals of the quantile random forest are tighter than those of the quantile
linear regression.
This is probably why the quantile random forest has a lower correct rate than the quantile linear
regression. Even though five points fail to fall into the random forest's prediction intervals, four of them
are very close to the boundaries; in the quantile linear regression, four points miss the interval, but only
two are close to the boundaries. This indicates that the quantile random forest gives accurate
predictions and tight prediction intervals for well-fitted observations, while for badly-fitted points it
sacrifices precision (i.e., widens the prediction range) to gain more room to make a correct decision
(i.e., to cover the true value with 90% probability).
(a) Linear regression.
(b) Random Forests.
Figure 5.13: Two methods with median prediction and quantile boundaries.
Chapter 6
Conclusions
6.1 Achievements
The motivation for this work was to develop understanding in the field of options trading and to
propose a way to compute and estimate an important parameter, the implied volatility. Three main
steps were implemented.
Firstly, we reorganized the original dataset and prepared the environment for the calculation of implied
volatility. By working out the relationships between the options, the futures contracts, the discount
values and the price of the underlying index, we obtained a clear picture of the evolution of the option
prices, and combined the put options with the call options using Put-Call Parity.
Secondly, with respect to the valuation of derivative contracts in finance, the volatility of the price of the
underlying asset is unknown. We derived the implied volatility for each contract through the
Black-Scholes formula and used the bisection method to compute its estimate. After calculating the
implied volatility, we analyzed the stability of the computation and compared the differences between
the results. In total we generated four types of implied volatility from two forms of the Black-Scholes
formula, based on the two inputs (futures prices or index prices) and two calculation methods for the
range of option prices.
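The calculation can be sketched as follows. This is a minimal Python illustration of the index-price form of Black-Scholes plus bisection; the futures-price form differs only in the pricing function, and the numerical inputs are made up:

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on an index at price S."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection: the Black-Scholes price is increasing in sigma,
    so the implied volatility is the unique root of bs_call(sigma) = price."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Because the pricing formula has no closed-form inverse in sigma, a bracketing method like this is the standard choice; monotonicity guarantees convergence.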
Thirdly, we selected a subset that exhibits some periodic features and has relatively complete trading
information, and used it to estimate and predict the implied volatility with linear and tree-based
regressions. Not only was median regression considered, but quantile methods were applied to
establish prediction intervals and a more general view of the dataset. Through the prediction intervals
built for both the linear and the random forest regressions, we compared the results and analyzed the
advantages and disadvantages of both models. Based on our analysis, the models can explain most of
the observations and give reasonable predictions.
From the data point of view, we studied two kinds of dataset in this work: one containing market option
trading information, used to derive the implied volatility, and the other containing information on the
volatility smile phenomenon, used to model the implied volatility as a function of time to maturity and
strike. Due to the large size of the datasets, for each kind we extracted one subset to capture the latent
patterns and display the results. The 'Example Set' represented the first kind, containing information
from Jan 03, 2014 to Mar 21, 2014, while the 'maturity at Sept 18, 2015' set gathered the options
expiring at that maturity.
6.2 Directions for Future Work
Further research building on this analysis of real option trades and their implied volatility is expected,
to exploit the full information in the option contracts and reduce the uncertainty about the behavior of
implied volatility for traders.
As for fitting models to the dataset, we have already benefited from studying a subset with significant
features instead of the entire dataset. In the future, the subsets can be chosen more specifically, for
example, options with one-day time to maturity, options that are only in-the-money or only
out-of-the-money, options whose strikes are very close to the current index price, etc.
We briefly explored Gradient Boosting Regression Trees, but, given time constraints and the
computational limits of the machine, calibrating the parameters through cross-validation was not easy.
Recall that the ensemble technique we applied, bagging (including random forests), first creates
several subsets of the original training dataset through bootstrap resampling, then fits each decision
tree independently using all or a subset of the predictors, and finally combines all trees into a single
predictive model by simple averaging. Boosting works in a similar way, but unlike random forests it
focuses on fitting the residuals left by the previously generated trees. It is a forward, stagewise
procedure: at each iteration, a new base learner is added that emphasizes the observations fitted
poorly by the previous trees and is trained to reduce the error of the current ensemble. By using the
quantile loss function, a general distribution of the response variable can be captured: setting a specific
quantile for the loss function yields a Gradient Boosting Regression Tree that explains the distribution
of the response at that quantile, so its conditional quantile interval can be predicted as well. Applying
Gradient Boosting Regression Trees could therefore improve the prediction accuracy in further studies.
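The stagewise idea can be sketched with least-squares stumps. This is our own minimal illustration on toy data; swapping the squared-loss residual for the pinball-loss gradient gives the quantile version described above:

```python
import numpy as np

def fit_stump(X, y):
    """Least-squares decision stump: best (feature, threshold, left/right mean)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            pl, pr = y[left].mean(), y[~left].mean()
            sse = ((y[left] - pl)**2).sum() + ((y[~left] - pr)**2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, pl, pr)
    return best[1:]

def predict_stump(stump, X):
    j, t, pl, pr = stump
    return np.where(X[:, j] <= t, pl, pr)

def boost(X, y, n_rounds=50, lr=0.1):
    """Forward stagewise boosting with squared loss: each new stump fits the
    residuals (the negative gradient) of the current ensemble.  Replacing the
    residual with the pinball-loss gradient yields the quantile variant."""
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)
        pred += lr * predict_stump(stump, X)
    return pred

rng = np.random.default_rng(5)
x = rng.uniform(0, 6, 300)
y = np.sin(x) + rng.normal(scale=0.1, size=300)   # toy non-linear signal
pred = boost(x.reshape(-1, 1), y)
```

The shrinkage factor `lr` is one of the parameters that would need cross-validation in practice, which is exactly the calibration cost mentioned above.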
As noted earlier, if a model is trained on a train set without the extreme values, the R2 on the train set
improves but the performance on the untouched test set can be slightly worse than for the model
trained with the extreme values included. A solution might be to calculate the Mahalanobis distance for
the test set using the mean and covariance of the train set; this approach is left for future work.
In Chapter 4 we explained the computation of the implied volatility with the asset price as input, IVS.
Due to time limitations, we have not found a straightforward way to compare the details of the two
datasets beyond a basic statistical description. The two datasets are supposed to complement each
other, as explained at the beginning of the thesis, and this latent relationship deserves further attention.
In a nutshell, the work developed in this thesis on analyzing a dataset of real option trading has
established a framework that has so far been explored on only one branch of interesting subsets and
its properties. If desired, it can be extended to give more realistic suggestions in the future.
Bibliography
[1] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Econ-
omy, 81(3):637–654, 1973.
[2] R. C. Merton. Theory of rational option pricing. The Bell Journal of Economics and Management
Science, pages 141–183, 1973.
[3] P. Tankov. Financial modelling with jump processes, volume 2. CRC press, 2003.
[4] J. D. MacBeth and L. J. Merville. An empirical examination of the Black-Scholes call option pricing
model. The Journal of Finance, 34(5):1173–1186, 1979.
[5] G. N. Gregoriou. Stock market volatility. CRC press, 2009.
[6] G. W. Schwert. Stock market volatility. Financial Analysts Journal, 46(3):23–34, 1990.
[7] S. J. Koopman, B. Jungbacker, and E. Hol. Forecasting daily variability of the S&P 100 stock index
using historical, realised and implied volatility measurements. Journal of Empirical Finance, 12(3):
445–475, 2005.
[8] T. E. Day and C. M. Lewis. Stock market volatility and the information content of stock index options.
Journal of Econometrics, 52(1-2):267–287, 1992.
[9] S. L. Heston. A closed-form solution for options with stochastic volatility with applications to bond
and currency options. The Review of Financial Studies, 6(2):327–343, 1993.
[10] R. C. Merton. Option pricing when underlying stock returns are discontinuous. Journal of Financial
Economics, 3(1-2):125–144, 1976.
[11] H. Park, N. Kim, and J. Lee. Parametric models and non-parametric machine learning models for
predicting option prices: Empirical comparison study over KOSPI 200 index options. Expert Systems
with Applications, 41(11):5227–5237, 2014.
[12] R. Koenker and G. Bassett Jr. Regression quantiles. Econometrica: journal of the Econometric
Society, pages 33–50, 1978.
[13] I. Takeuchi, Q. V. Le, T. D. Sears, and A. J. Smola. Nonparametric quantile estimation. Journal of
Machine Learning Research, 7(Jul):1231–1264, 2006.
[14] N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7(Jun):
983–999, 2006.
[15] F. Zikes and J. Baruník. Semi-parametric conditional quantile models for financial returns and
realized volatility. Journal of Financial Econometrics, 14(1):185–226, 2014.
[16] D. Brigo and F. Mercurio. Interest rate models-theory and practice: with smile, inflation and credit.
Springer Science & Business Media, 2007.
[17] J. C. Cox, J. E. Ingersoll Jr, and S. A. Ross. A theory of the term structure of interest rates. In
Theory of Valuation, pages 129–164. World Scientific, 2005.
[18] P. Wilmott. Paul Wilmott introduces quantitative finance. John Wiley & Sons, 2007.
[19] J. Voit. The statistical mechanics of financial markets. Springer Science & Business Media, 2013.
[20] H. R. Varian. The arbitrage principle in financial economics. The Journal of Economic Perspectives,
1(2):55–72, 1987.
[21] J. C. Cox and M. Rubinstein. Options markets. Prentice Hall, 1985.
[22] P. Giot. Relationships between implied volatility indexes and stock index returns. The Journal of
Portfolio Management, 31(3):92–100, 2005.
[23] W. F. Sharpe, G. J. Alexander, and J. V. Bailey. Investments, volume 6. Prentice-Hall Upper Saddle
River, NJ, 1999.
[24] D. Bachrathy and G. Stepan. Bisection method in higher dimensions and the efficiency number. Periodica Polytechnica Mechanical Engineering, 56(2):81, 2012.
[25] R. Sakia. The Box-Cox transformation technique: a review. The Statistician, pages 169–178, 1992.
[26] P. C. Mahalanobis. On tests and measures of group divergence, Part I: Theoretical formulae. 1930.
[27] G. E. Box and D. R. Cox. An analysis of transformations. Journal of the Royal Statistical Society.
Series B (Methodological), pages 211–252, 1964.
[28] J. W. Osborne. Improving your data transformations: Applying the Box-Cox transformation. Practical Assessment, Research & Evaluation, 15(12):1–9, 2010.
[29] P. C. Mahalanobis. On the generalised distance in statistics. Proceedings of the National Institute of Sciences of India, pages 49–55, 1936.
[30] P. Filzmoser and K. Hron. Outlier detection for compositional data using robust methods. Mathe-
matical Geosciences, 40(3):233–248, 2008.
[31] P. J. Rousseeuw and B. C. Van Zomeren. Unmasking multivariate outliers and leverage points.
Journal of the American Statistical Association, 85(411):633–639, 1990.
[32] A. S. Hadi. Identifying multiple outliers in multivariate data. Journal of the Royal Statistical Society.
Series B (Methodological), pages 761–771, 1992.
[33] R. Farebrother. The historical development of the L1 and L∞ estimation procedures 1793–1930. Statistical Data Analysis Based on the L1 Norm and Related Methods, North-Holland, Amsterdam, pages 37–63, 1987.
[34] Y. Li and J. Zhu. L1-norm quantile regression. Journal of Computational and Graphical Statistics, 17(1):163–185, 2008.
[35] G. A. Seber and A. J. Lee. Linear regression analysis, volume 936. John Wiley & Sons, 2012.
[36] D. Pollard. Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7
(2):186–199, 1991.
[37] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[38] J. Mingers. An empirical comparison of pruning methods for decision tree induction. Machine
Learning, 4(2):227–243, 1989.
[39] M. Mehta, J. Rissanen, R. Agrawal, et al. MDL-based decision tree pruning. In KDD, volume 21, pages 216–221, 1995.
[40] M. J. Kearns and Y. Mansour. A fast, bottom-up decision tree pruning algorithm with near-optimal
generalization. In ICML, volume 98, pages 269–277, 1998.
[41] G. James, D. Witten, T. Hastie, and R. Tibshirani. An introduction to statistical learning, volume
112. Springer, 2013.
[42] A. Liaw, M. Wiener, et al. Classification and regression by randomForest. R News, 2(3):18–22, 2002.
[43] P. W. Holland and R. E. Welsch. Robust regression using iteratively reweighted least-squares. Communications in Statistics - Theory and Methods, 6(9):813–827, 1977.
[44] R. Maronna, R. D. Martin, and V. Yohai. Robust statistics. John Wiley & Sons, Chichester, 2006.