Modeling Implied Volatility
Rongjiao Ji
Thesis to obtain the Master of Science Degree in
Mathematics and Applications
Supervisors: Prof. Cláudia Rita Ribeiro Coelho Nunes Philippart and Prof. Maria do Rosário de Oliveira Silva
Examination Committee
Chairperson: Prof. António Manuel Pacheco Pires
Supervisor: Prof. Cláudia Rita Ribeiro Coelho Nunes Philippart
Member of the Committee: Prof. Maria da Conceição Esperança Amado
November 2017
Acknowledgments
I would like to express my great appreciation to my supervisors, Prof. Cláudia Nunes Philippart and Prof. Maria do Rosário de Oliveira Silva, for their guidance and suggestions. I am very grateful to my family for their love, support and understanding. I deeply appreciate the help, company and encouragement from Ana Galhoz and Chenshan Xu.
Resumo
Relativamente à questão de determinação de preços de contratos de produtos derivados, a volatilidade do preço da acção ao longo do tempo é frequentemente desconhecida. Volatilidade é uma medida da aleatoriedade, permitindo avaliar quão incerto é o movimento do preço no futuro.
Neste trabalho deriva-se a volatilidade implícita em cada contrato, usando a fórmula de Black-Scholes. Como não é possível determinar analiticamente este valor, sendo necessário recorrer a métodos numéricos, recorre-se ao método da bissecção.
Discute-se a determinação da volatilidade implícita usando como valor de entrada na fórmula de Black-Scholes o preço dos futuros e o preço das acções envolvidas. Adicionalmente apresentam-se dois métodos de cálculo, de forma a aumentar a precisão das estimativas obtidas.
Apresentam-se vários modelos para ajustamento dos valores obtidos, nomeadamente modelos baseados em regressão quantílica linear e florestas aleatórias. Usando estes modelos, é feita previsão de volatilidade, a qual poderá ser utilizada para prever o preço de uma acção no futuro. Desta forma os investidores poderão ter mais informação referente às suas decisões de investimento, nomeadamente se deverão comprar ou vender opções.
Palavras-chave: volatilidade implícita, fórmula de Black-Scholes, regressão quantílica, florestas aleatórias
Abstract
With respect to the valuation of derivative contracts in finance, the volatility of the price of the underlying asset is often unknown. Volatility is a measure of randomness, allowing us to assess how uncertain the future price movement is.
In this work we first derive the implied volatility for each contract, using the Black-Scholes formula.
Since it is not possible to determine the implied volatility analytically, one needs to resort to numerical
methods. Here we propose to use the bisection method to compute the estimated value of the implied
volatility.
The determination of implied volatility is then discussed, using either the future price or the asset price as input to the Black-Scholes formula. In addition, two calculation methods are presented in order to increase the accuracy of the estimates obtained.
Several models are presented to fit the obtained values, namely models based on linear quantile regression and on random forests. Using these models, we may forecast the implied volatility and then use these forecasts to predict the future price of an option contract. In this way, investors are able to gather more information for their investment decisions; in particular, they may decide whether to buy or sell options, according to their expectations regarding the future behavior of the market.
Keywords: implied volatility, Black-Scholes formula, quantile regression, random forests.
Contents
1 Introduction
  1.1 Motivation
  1.2 Dataset Description
    1.2.1 Combination of options
    1.2.2 EURO STOXX50 index price
  1.3 Objectives
  1.4 Thesis Outline
2 Financial Theoretical Overview
  2.1 Assumptions On The Assets
  2.2 Financial Derivatives
    2.2.1 Interest Rate / Discounts
    2.2.2 Index
    2.2.3 Forward and Future Contract
    2.2.4 Option
  2.3 Implied Volatility
  2.4 Put-Call Parity
  2.5 Black-Scholes Formula
3 Statistical Theoretical Overview
  3.1 Bisection Method
  3.2 Pre-processing Methods
    3.2.1 The Formula of Box-Cox Transformation
    3.2.2 Mahalanobis Distance
  3.3 Quantile Regression Method
    3.3.1 Linear regression and its quantile application
    3.3.2 Tree-based regressors and their quantile application
4 Computation and Analysis of Implied Volatility
  4.1 Calculation Processes
  4.2 Analysis of the computed implied volatility
    4.2.1 Analysis and comparison based on the example
    4.2.2 Analysis on the entire IVF
5 Modeling and Predicting the Implied Volatility
  5.1 The process of choosing a subset
  5.2 Explanatory Analysis
  5.3 Regression Modeling
    5.3.1 Pre-processing process
    5.3.2 Linear Regression
    5.3.3 Quantile Random Forest
    5.3.4 A comparison in test set
6 Conclusions
  6.1 Achievements
  6.2 Directions for Future Work
Bibliography
List of Tables
1.1 The distribution of trading dates and maturities on weekdays for call and put options. Almost all the maturities concentrate on Friday.
1.2 The distribution of trading dates and maturities in months for call and put options. The months with more maturities are March, June, September and December.
1.3 The descriptions of relevant terms provided for the three types of derivatives (option, future and discount). The star (*) indicates that the marked term was added. The ’Example Set’, which contains the trading information for a fixed date (January 03, 2014) and maturity (March 21, 2014), is shown at the far right.
1.4 Some call and put options valid from Jan 03, 2014 to Mar 21, 2014 with strikes ranging from 3000 to 3200. These are the options whose prices are close to the crosspoint of call and put option prices. The values in brackets are ask prices, while the values outside are bid prices.
3.1 Parameter λ’s values and corresponding transformations.
3.2 Comparison between linear, median, and quantile regression methods.
4.1 The contracts with stable estimated implied volatilities IV_F.
5.1 Statistical description for options whose maturities are at 2015-09-18.
5.2 An overview of the response variable together with the original covariates and the ones generated afterwards for the regression models. Both notations and descriptions are displayed.
5.3 Summary of covariates in the regression functions.
5.4 The coefficients of the five covariates and their significance in the robust linear regression.
5.5 The summary of coefficients for different quantiles.
5.6 Summary of the two regression models’ results.
List of Figures
1.1 The tendencies of the prices of call and corresponding put options against the strike price in the Example Set. An enlargement of the crosspoint of the lines is shown in the top right, where the differences between bid prices and ask prices, overlapped at the original scale, are clearly visible. As the strike rises, the price of the call options declines, while the price of the put options increases.
1.2 Four types of call and put options’ prices for different maturities.
4.1 Framework of computation Method 1 (m1) and Method 2 (m2).
4.2 The framework of how we organize the related datasets.
4.3 Different types of implied volatility based on the Example Set. Implied volatilities computed from the future price, the asset price and the constructed price with the two computation methods are shown in different colors. The results computed by method 1 (denoted as m1 in the legend) are marked ’o’, while the ones from method 2 (denoted as m2 in the legend) are marked ’+’.
4.4 Comparison of the implied volatility derived by both methods.
4.5 Option (ask) prices of the call and put options for the contracts with a future price involved.
5.1 Boxplot of trading dates for every maturity that appears.
5.2 Sample sizes and rates of each maturity. The red numbers at the top of the columns give the proportion of the sample size of each maturity in the total number of observations.
5.3 Pair plot for options whose maturities are at 2015-09-18. Here we plot the seven variables, namely time to maturity, strike, constructed price, IV (m1), IV (ask), IV (bid) and IV (m2).
5.4 3D plots from the side of both time to maturity and strike to check the distribution of the implied volatility.
5.5 The distribution of the response variable before and after the Box-Cox transformation.
5.6 Plots with extreme values detected by robust Mahalanobis distance. Points whose robust Mahalanobis distances are among the largest 1% in the dataset are treated as extreme values and marked green.
5.7 General plots of regression Model 3. The heavy-tail problem indicates that the fit can hardly be treated as a good one.
5.8 Plots of the coefficients in quantile linear regression. Each black dot is the corresponding variable’s coefficient for the quantile τ chosen in the set {0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95}. The red lines are the ordinary least squares estimate and its confidence interval.
5.9 Quantile linear regressions for the transformed implied volatility against strike and time to maturity. The quantile fitted values of the response variable in the lower quantile, median and higher quantile are shown.
5.10 90% prediction interval for the test set in quantile linear regression.
5.11 90% prediction interval for the test set in Quantile Random Forest.
5.12 Quantile Random Forests for the transformed implied volatility against strike and time to maturity. The quantile fitted values of the response variable in the lower quantile, median and higher quantile are shown.
5.13 Two methods with median prediction and quantile boundaries.
Chapter 1
Introduction
1.1 Motivation
The famous Black-Scholes formula [1] was the first formal reply to the million-dollar question: how to value an option contract. Even though this question had been studied for more than a hundred years, Black and Scholes [1] and Merton [2] provided a breakthrough result in 1973, and it remains the cornerstone of modern option pricing studies. An outsider may have the idea that the main goal is to capture the empirical properties of option prices. The logic and purpose behind it is that an option pricing model can be used as a tool to obtain the features of option prices, such as implied volatility, by relating the option price to the price of the underlying asset in an arbitrage-free manner (i.e. with no way to make a risk-free profit) [3]. The main thrust behind the Black-Scholes model is to find such a way to fit the market prices of options. The Black-Scholes formula is not used as a pricing model for vanilla options (options without any special feature), but is applied to transform the market prices into an expression in terms of implied volatility [4].
As is well known, volatility is a measure of randomness and an evaluation of how uncertain the future price movement is, specifically the movement of the price of an option’s underlying asset [5]. A higher volatility indicates a greater expected fluctuation of the underlying asset’s price [6]. Although the volatility associated with the underlying asset(s) is assumed to be constant and known in the Black-Scholes formula, in reality it changes over time and is mostly unknown. Rather than studying the historical volatility of the asset’s price, the focus nowadays has shifted to the implied volatility [7]. Prices in option markets are in fact commonly quoted in terms of Black-Scholes implied volatility. The value of a call option generated by the Black-Scholes formula is a strictly increasing function of the volatility parameter [3]. Thus, given observed option prices in the market, the value of the implied volatility can be deduced by matching the corresponding Black-Scholes price with the market price [8]. The implied volatility observed in the real market shows a pattern named the volatility smile, influenced by the time to maturity and the strike price of the option. For this reason, several advanced parametric models have been proposed for implied volatility.
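Since the Black-Scholes call value is strictly increasing in the volatility parameter, backing out the implied volatility amounts to a one-dimensional root search. The following is a minimal sketch of that matching step, not the thesis’s actual computation (which is developed in Chapter 4 with future and constructed prices as inputs); the function names and the bracketing interval are choices made here for illustration.

```python
import math

def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, r, T, lo=1e-4, hi=5.0, tol=1e-8):
    """Bisection: valid because the call price is strictly increasing in sigma."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, T, mid) < price:
            lo = mid  # mid-volatility price too low: root lies above
        else:
            hi = mid  # price too high: root lies below
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For instance, pricing an at-the-money call with a known volatility and feeding the resulting price back into `implied_vol` recovers that volatility up to the tolerance.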
With the rapid development of artificial intelligence and the rapidly growing computational power of hardware over the last two decades, machine learning has become a hot topic and brings brand new perspectives for understanding the contemporary data-driven society and its industries. Finance, in particular, is a representative industry, built by countless buyers and sellers and by the tremendous amount of data generated through trading. In contrast to theoretical parametric models, such as stochastic volatility models [9] and jump-diffusion models [10], non-parametric models based on machine learning techniques, which usually impose weaker restrictions than the former, have been developed for pricing options [11].
Quantile methods [12] have been studied and used in multiple fields for regression problems. Commonly, statisticians and engineers analyze the conditional mean of the response variable by minimizing the expected squared error loss, neglecting every aspect of the conditional distribution other than the mean. Quantile methods, going beyond the conditional mean, give more complete information about the distribution of the response variable as a function of the explanatory variables, rather than a single value alone. Not only can a quantile method build prediction intervals to judge the reliability of predictions, it can also detect extreme values of the response variable, for example when an observation lies far from the conditional median. While the conditional mean minimizes the expected squared error loss, the conditional quantiles minimize the expected weighted absolute deviation loss. Takeuchi et al. [13] presented a nonparametric version of a quantile estimator with uniform convergence and bounds on the quantile property. Meinshausen [14] inferred conditional quantiles with quantile regression forests, a generalization of random forests. Quantile methods have begun to be used in financial studies as well. Žikeš and Baruník [15] investigated how the conditional quantiles of future returns and volatility of financial assets vary with the variation of the asset prices, through simple linear quantile autoregressions.
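The weighted absolute deviation loss mentioned above can be made concrete: the τ-quantile is exactly the minimizer of the so-called pinball (check) loss. A small illustrative sketch follows; the function names, sample and grid are choices made here, not taken from the thesis.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball loss of candidate q; minimized by the tau-quantile of y."""
    u = y - q
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))

# Minimizing the pinball loss over a grid recovers the empirical quantile:
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)           # a standard normal sample
grid = np.linspace(-3.0, 3.0, 601)    # candidate values of q
best = grid[np.argmin([pinball_loss(y, q, 0.9) for q in grid])]
# 'best' should sit close to the empirical 0.9-quantile of the sample
```

Replacing the squared error loss by this loss inside a linear model or a forest is precisely what turns them into the quantile regressors discussed in Chapter 3.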
In this work, we use some of these methods to predict future volatility. For that purpose, we use data
(that will be discussed in the next section) from which we can compute the implied volatility and perform
a further study.
1.2 Dataset Description
The dataset used in this work was provided by the bank BNP Paribas and summarizes information over three types of derivatives (option contracts, discount values and future contracts). These three derivatives are all based on the same underlying asset, the EURO STOXX50 index, which is Europe’s leading blue-chip index for the Eurozone. The original dataset contains 312339 option contracts (of which 147416 are call options and 164923 are put options), 7255 discount values, and 1153 future contracts.
Specifically, as the main object, the option contracts include trades on 424 different dates, ranging from January 02, 2014 to October 29, 2015. Tables 1.1 and 1.2 show the distribution of dates and maturities for call and put options over weekdays and months. A simple observation of these tables shows that the trading dates of the option contracts are almost uniformly distributed amongst weekdays, with slightly more falling in summer and autumn and fewer in winter. For the maturities, however, almost all concentrate on Friday, and the months with more maturities are March, June, September and December.
Table 1.1: The distribution of trading dates and maturities on weekdays for call and put options. Almost all the maturities concentrate on Friday.

               Monday   Tuesday  Wednesday  Thursday  Friday
Call  Date     20.30%   20.80%   20.50%     19.50%    18.80%
      Maturity  0.13%    0.08%    0.00%      0.75%    99.00%
Put   Date     20.10%   20.60%   20.50%     19.90%    18.80%
      Maturity  0.13%    0.07%    0.00%      0.92%    98.90%
Table 1.2: The distribution of trading dates and maturities in months for call and put options. The months with more maturities are March, June, September and December.

               Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
Call  Date     8.84%  8.75%  8.29%  5.18%  7.26%  9.32% 10.80% 11.00% 13.00% 11.30%  3.14%  3.19%
      Maturity 1.93%  2.25%  8.80%  1.68%  1.79% 24.60%  3.15%  3.61%  9.91%  5.22%  3.21% 33.90%
Put   Date     8.70%  9.22%  8.84%  5.76%  7.31%  9.89% 10.50% 10.60% 12.10% 10.80%  3.06%  3.17%
      Maturity 2.06%  2.46%  8.62%  2.72%  1.65% 25.40%  4.16%  4.53%  9.65%  5.42%  4.10% 29.20%
After analyzing the future contracts, we find that there are only ten distinct maturities, spread over the third Friday of every quarter from 2014 to mid-2016. As mentioned in Section 2.2.3, we denote these maturities as ’major maturities’ for simplification in the following work.
In Table 1.3 we present descriptions of the relevant terms. Because of the intrinsic differences between the products, we separate the descriptions by type of contract (option, future or discount). We remark that the dates presented here are not in exactly the format initially provided by the bank BNP. For each date and maturity that appeared in the original dataset, we transformed the original numerical integers into the current recognizable categorical date format. This operation helps us understand the patterns behind the trading dates. Afterwards, some non-relevant information was discarded from this original dataset (such as the lot size, bid size and ask size). Moreover, we also included four variables, marked by a star and relevant for the rest of the study, namely:
• Time to maturity (the life length of an option contract from its trading date to its maturity). Note that in this work we count the entire natural days in this period, instead of the actual financial trading days.
• Moneyness (calculated based on Equation 2.4 in Chapter 2, which we will explain later in more detail).
• Constructed price (the constructed price S of the underlying asset, as opposed to its true market price, deduced from Equation 2.2. The difference between this constructed price and the asset price will be explained later in this section. It is, by the way, a non-standard terminology, used in this thesis.)
• Interest rate (deduced from Equation 2.1 and mentioned in Subsection 2.2.1.)
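The three starred quantities above are recoverable directly from the raw future price F and discount value D. The following is a hypothetical sketch under the thesis’s Equations 2.1, 2.2 and 2.4 (the function names and the time-unit convention are ours, not the thesis’s):

```python
import math

def constructed_price(F, D):
    """Constructed price of the underlying asset, S = F * D (Equation 2.2)."""
    return F * D

def interest_rate(D, tau):
    """Invert D = exp(-r * tau) to recover r; tau in whatever unit r is quoted in."""
    return -math.log(D) / tau

def moneyness(K, D, S):
    """Moneyness M = K * D / S = K / F (Equation 2.4), used to label options."""
    return K * D / S
```

For the Example Set values D = 0.9996 and τ = 77, `interest_rate` gives roughly 5.2 × 10⁻⁶ per day, the same order as the 5.5 × 10⁻⁶ quoted in Table 1.3, which suggests that T − t is measured there in days; this reading is our inference, not a statement from the thesis.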
Table 1.3: The descriptions of relevant terms provided for the three types of derivatives (option, future and discount). The star (*) indicates that the marked term was added. The ’Example Set’, which contains the trading information for a fixed date (January 03, 2014) and maturity (March 21, 2014), is shown in the rightmost column.

Term                   Description                                                           Example Set
Date t                 The beginning date of a derivative contract.                          2014-01-03
Maturity T             The expiration date of a derivative contract.                         2014-03-21

Option Contracts
Type                   Call options C or put options P.                                      Both
Strike K               The price paid for the asset if the option is exercised.              The strike, bid price,
Bid Price              The highest price a buyer is willing to pay for the option.           ask price and label are
Ask Price              The lowest price a seller is willing to accept for the option.        shown in Table 1.4
*Moneyness M           Deduced by M = KD/S = K/F and used to label options.
*Time To Maturity τ    The difference between maturity and date, τ = T − t.                  77
*Constructed Price S   The price of the underlying asset constructed through S = FD.         3062.73

Future Contracts
Bid Price              The highest price a buyer is willing to pay for a future contract.    3061
Ask Price              The lowest price a seller is willing to accept for a future contract. 3062

Discount Value
Discount Value D       Discounts a future value back to the current time.                    0.9996
Interest Rate r        Deduced by D = e^(−r(T−t)).                                           5.5 × 10^(−6)
In order to give readers an understandable perspective on these financial terms and on several financial criteria mentioned below, we display the trades (marked as the ’Example Set’) with trading date January 03, 2014 and maturity March 21, 2014, as an example to explain the micro structure of the dataset and the following operations. The detailed information is shown on the right side of Table 1.3, and details of some option contracts are shown in Table 1.4.
1.2.1 Combination of options
Note that there are 147416 call options and 164923 put options. Due to Put-Call Parity, the volatility is the same for a call option and a put option with the same combination of date, maturity and strike (shown in Section 2.4). To avoid calculating the same thing repeatedly, it is vital to combine each call with the corresponding put option. As a result, we obtain 107571 pairs of call and put options.
Following our ’Example Set’, where we extract a set of contracts with trading date January 03, 2014 and maturity March 21, 2014, the only free variable for pricing the options is the strike price. Table 1.4, together with Figure 1.1 and its enlargement, gives a clear idea of how the options are combined, how the call and put options are related, and the tendency of the price change. From Figure 1.1 and Table 1.4 we can see that, for call options, when the strike goes up, the price of the call option declines; oppositely, the price of the put option increases. However, the range of call option prices is larger than the range of put option prices. One possible reason might be that traders believe the market price will go up. Moreover, when the call option and put option prices intersect, it normally means that the ’at-the-money’ case was reached. In that case, the value is
Figure 1.1: The tendencies of the prices of call and corresponding put options against the strike price in the Example Set. An enlargement of the crosspoint of the lines is shown in the top right, where the differences between bid prices and ask prices, overlapped at the original scale, are clearly visible. As the strike rises, the price of the call options declines, while the price of the put options increases.
Table 1.4: Some call and put options valid from Jan 03, 2014 to Mar 21, 2014 with strikes ranging from 3000 to 3200. These are the options whose prices are close to the crosspoint of call and put option prices. The values in brackets are ask prices, while the values outside are bid prices.

Index  Strike  Call Option Price  Put Option Price  Label
28     3000    135.3 (137)         63.1 (64)        in-the-money for call
29     3025    119.3 (120.6)       71.8 (72.7)      and
30     3050    104   (105)         81.6 (82.5)      out-of-the-money for put
       constructed price: 3062.73                   at-the-money
31     3075     90   (90.8)        92.4 (93.3)
32     3100     77   (77.7)       104.4 (105.2)     out-of-the-money for call
33     3125     65.2 (65.6)       117.6 (118.4)     and
34     3150     54.7 (55.3)       131.9 (133.4)     in-the-money for put
35     3175     45.3 (46)         147.4 (149.1)
36     3200     37.2 (37.9)       164.2 (166)
supposed to be close to the ’constructed price’. According to the definition introduced in Section 2.2.4, the options can be divided into two types by comparing their strike prices with the asset price: ’in-the-money’ call (put) options when strikes are smaller (larger) than the asset price, and ’out-of-the-money’ call (put) options when strikes are larger (smaller). This opposite relationship is as expected, following the definitions and the opposite sides of the trading movement.
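The labelling rule behind Table 1.4 can be written down directly. This is an illustrative sketch (the function name and tolerance are ours), comparing each strike with the constructed price:

```python
def label_pair(K, S, tol=1e-9):
    """Label a call/put pair by comparing its strike K with the asset price S."""
    if abs(K - S) <= tol:
        return "at-the-money"
    if K < S:
        # strike below the asset price favors exercising the call
        return "in-the-money for call, out-of-the-money for put"
    # strike above the asset price favors exercising the put
    return "out-of-the-money for call, in-the-money for put"

S = 3062.73  # constructed price of the Example Set
labels = {K: label_pair(K, S) for K in (3000, 3050, 3075, 3200)}
```

Applied to the strikes of Table 1.4, strikes below 3062.73 come out in-the-money for the call, and strikes above it in-the-money for the put, matching the table’s label column.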
1.2.2 EURO STOXX50 index price
As we mentioned at the beginning of this section, the EURO STOXX50 index is namely the underlying
asset under study, i.e. the objective financial asset that our three derivatives associated with. Besides
the three original data sets containing information about options, future and discount, the information
of historical EURO STOXX50 index price (denoted as Smarket) can be obtained easily from the website
’Yahoo Finance’ 1. Note that the newly obtained index prices were captured at 18:00 CET in each trading
day while the options contracts were gathered at 17:15 CET. Thus the index price obtained later, is not
the real-time asset price at 17:15 CET and can variate a bit from the constructed price deduced from
the original dataset, within 45 minutes at very active trading days, in particular at major maturities of
future contracts. Following the ’Example Set’, the value of the EURO STOXX50E index in Jan 03, 2014
is Smarket = 3074.43, while the constructed price S = 3062.73.
According to theoretical inference, for a current date t and a fixed maturity T, when the put option and the call option have the same price C(t, T, K) = P(t, T, K) at strike K, Put-Call Parity in Equation 2.5 (and Equation 2.6 in another form) yields St = Ke^(−r(T−t)) (and F(t, T) = K) for that same strike K. Thus the true value St is supposed to be equal to S, i.e. the constructed price should be worth the same as the current true price of the underlying asset. We then have two different situations, depending on the maturity:
1. Options whose maturities also belong to the major maturities of the future contracts. In this situation, their crosspoints shift a bit to the left of the line of the current index price, as in Figure 1.2(d) for the maturity 2014-12-19.
The shifts might be caused by the Triple Witching Phenomenon mentioned in Chapter 2, when the contracts for stock index futures, stock index options and stock options expire on the same day (four times a year, on the third Friday of March, June, September and December). Triple witching days generate trading activity and volatility, because contracts that are allowed to expire may necessitate the purchase or sale of the underlying security. While some derivative contracts are opened with the intention of buying or selling the underlying security, traders seeking derivative exposure only must close, roll out or offset their open positions prior to the close of trading on triple witching days. This is probably why the shifts happen.
Moreover, Figure 1.2(c) corresponds to a case where the information is incomplete, as the number of records is insufficient (fewer than 10).
2. Options whose maturities do not belong to the major maturities of the future contracts, or whose maturities do belong to the major maturities but for which the influence of the Triple Witching Phenomenon is limited.
In this situation, simple observation shows that the index price normally passes through the crosspoint of the prices of the call and put options (as in Figures 1.2(a) and 1.2(b)). As the price of the EURO STOXX50E index is equal to the strike price when C(t, T, K) = P(t, T, K) in most cases, those figures show
1 From the website ’Yahoo Finance’ https://finance.yahoo.com/quote/%5ESTOXX50E/history?period1=1388620800&
period2=1446076800&interval=1d&filter=history&frequency=1d
(a) Maturity 2014-01-17 (b) Maturity 2014-03-21
(c) Maturity 2014-09-19 (d) Maturity 2014-12-19
Figure 1.2: Four types of call and put options’ prices for different maturities.
that the price of the underlying asset is very similar to the index price in these cases, even though some of the maturities belong to the major maturities in the future contracts market. Thus the index price can be used as a supplement for those options which have no information about the relevant future price. It can give the investors a rough direction, although its accuracy is not assured.
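The crosspoint discussed above can be located numerically from quoted prices. The sketch below linearly interpolates C − P between consecutive strikes to find where it vanishes, using bid/ask midpoints read from Table 1.4; the interpolation step and the function name are choices made here, not a procedure taken from the thesis.

```python
def crosspoint_strike(strikes, call_mids, put_mids):
    """Strike where C - P crosses zero, by linear interpolation between quotes.
    Assumes strikes are sorted and C - P is decreasing in the strike."""
    diffs = [c - p for c, p in zip(call_mids, put_mids)]
    for i in range(len(strikes) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 >= 0.0 >= d1:  # sign change between consecutive strikes
            w = d0 / (d0 - d1)
            return strikes[i] + w * (strikes[i + 1] - strikes[i])
    return None  # no crossing inside the quoted strike range

# Bid/ask midpoints from Table 1.4 around the crossing (strikes 3050 and 3075):
K_star = crosspoint_strike([3050.0, 3075.0], [104.5, 90.4], [82.05, 92.85])
```

With these two quotes the crossing lands between 3050 and 3075, in the neighbourhood of the constructed price of the Example Set; bid-ask spreads and quote timing explain why it need not coincide with it exactly.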
1.3 Objectives
This work focuses on applying, firstly, several financial criteria to a dataset composed of option contracts, discount values and future contracts, and, secondly, quantile methods to a dataset of the computed implied volatilities, in order to gain a clear understanding of the real option trading process and to uncover latent patterns of the financial option market, based on the implied volatility of the price of the underlying asset through the Black-Scholes formula.
This work first derives the implied volatility of each contract implicitly, by equating the estimated option price (as a function of the implied volatility, deduced from the Black-Scholes formulas) with the option price observed in the market, through the bisection method. The Black-Scholes formula with i) the future price involved and ii) the asset price involved, together with two calculation methods for the boundary condition, are discussed with particular emphasis on the accuracy of the results. Then, using the obtained implied volatility, we fit several prediction models, mainly quantile methods on linear regression and on random forests. The goal is then, using the selected models (based on the quality of fit), to forecast the tendency of the implied volatility.
As the dataset has many observations, it is hard to visualize its distribution and significant characteristics. Therefore we focus on a subset with some interesting features for further study.
1.4 Thesis Outline
Chapter 2 introduces the essential financial knowledge needed to understand and analyze the option contracts dataset from the bank. The first part of this chapter explains the financial market assumptions and five common financial concepts and derivatives, namely the interest rate, index, option, forward and future, followed by an introduction to the implied volatility of the prices of the underlying asset. Finally, the most important financial theorems in this work, the Put-Call Parity (which relates put and call option prices) and the Black-Scholes formula (which estimates option prices), are discussed.
Chapter 3 describes the root-finding bisection method and the statistical methods used in the pre-processing steps, and explains the principle of quantile regression, which takes into account more general perspectives of the distribution of the response variable. It ends by combining machine learning algorithms such as Decision Trees and Random Forests with quantile methods.
Chapter 4 demonstrates how to calculate the implied volatility, the main object of this work. Instead of estimating the option prices as specific numbers, we express the estimated prices as functions of the implied volatility, deduced from two forms of the Black-Scholes formula. We equate these estimates with the option prices observed in the market, and then derive the implied volatility implicitly, via the Bisection method, for each contract. Since the true market prices are given as a bid-ask spread rather than a single value, two methods of calculation are proposed and compared.
Chapter 5 models the implied volatility dataset selected and derived in the previous chapter. After pre-processing, this chapter focuses on using quantile methods, together with linear regression and random forests, to fit the dataset. Since the estimated implied volatility dataset is relatively large and its observations overlap heavily, it is hard to visualize its distribution and significant characteristics, and even harder to fit a proper model; therefore this chapter focuses on a subset containing some significant features for further study. Finally, this chapter displays the results of the different models on the selected subset and compares their performance.
The thesis concludes with Chapter 6, where we also point out possible future work.
Chapter 2
Financial Theoretical Overview
The first goal of this work is to obtain a feature of option prices, the implied volatility, by using the Black-Scholes formula as the option pricing model. To understand the financial movements clearly and achieve this goal, some knowledge of the financial market is needed. Thus in this chapter we first present some financial concepts which will be used in the rest of the thesis. Secondly, we address the main subject of this thesis, the implied volatility, providing its definition and introducing the pattern named volatility smile. The chapter ends with a presentation of the Put-Call Parity, which connects call option prices with put option prices, and of the Black-Scholes formula.
2.1 Assumptions On The Assets
Since modern financial markets trade various types of financial derivatives, with correspondingly different transaction systems and rules, the academic community has reached a consensus and usually adopts several simplified ’ideal conditions’ concerning the financial market and the underlying assets (for instance, stocks). The main assumptions include [1]:
1. The rate of return on the riskless asset is constant through time.
2. The price of the stock follows a Geometric Brownian motion, with drift and volatility assumed to be
constant.
3. The stock pays no dividends or other distributions.
4. The option based on the stock is European, meaning that only operation at maturity is allowed.
5. There is no arbitrage opportunity, i.e., there is no way to make a riskless profit.
6. It is possible to borrow and lend any amount, even fractional, of the price of a security, at the
short-term interest rate.
7. It is possible to buy and sell any amount, even fractional, of the stock (this includes short selling).
8. There are no transaction costs in buying or selling the stock, i.e., a frictionless market.
2.2 Financial Derivatives
In finance, a derivative is a contract between two or more parties that derives its value from an underlying financial asset (such as a stock, commodity, index or interest rate). Common derivatives include future contracts, forward contracts and options. Here we briefly present the relevant underlying assets (interest rate and stock index) and mainly focus on the derivatives (put and call options, future and forward contracts).
2.2.1 Interest Rate / Discounts
The interest rate is the amount of interest due per period (day, month, year, etc.), as a proportion of the amount that is lent, deposited or borrowed over the same period [16]. If t denotes the current date and T denotes the maturity, then the value today of 1 monetary unit received at time T, given that the interest rate is r, is:
D(t, T ) = e−r(T−t), (2.1)
in case the interest rate is continuously compounded (continuous compounding means that the interest is added to the account’s balance at every infinitesimal time step). There are other compounding schemes, but here we only assume this one. Simple no-arbitrage arguments show that call option prices increase with the interest rate, whereas the corresponding put prices decrease [17].
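Equation 2.1 can be sketched in a few lines of Python; the rate and dates below are hypothetical illustrative values, not taken from the dataset:

```python
import math

def discount(r: float, t: float, T: float) -> float:
    """Value at time t of 1 monetary unit paid at time T, with a
    continuously compounded interest rate r (Equation 2.1)."""
    return math.exp(-r * (T - t))

# Hypothetical example: r = 2% per year, one year to maturity.
d = discount(0.02, 0.0, 1.0)
# One unit received in a year is worth slightly less than one today.
assert 0.98 < d < 0.99
```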
2.2.2 Index
The stock index is created in order to measure how the stock market is behaving in general. As a type of financial asset, an index is calculated as the weighted sum of a basket of representative stocks [18].
For instance, the EURO STOXX 50 is a stock index of Eurozone stocks designed by STOXX, an index provider owned by the Deutsche Börse Group. Introduced on 26 February 1998, it is made up of fifty of the largest and most liquid stocks. The index futures and options on the EURO STOXX 50, traded on Eurex, are among the most liquid such products in Europe and the world.
The calculation of the index employs the Laspeyres formula, which measures price changes against a fixed base quantity weight. For further details on the construction of this index, we refer to the STOXX Calculation Guide1.
2.2.3 Forward and Future Contract
A forward contract is a contract in which the buyer promises, and is obliged, to buy an asset (a stock, etc.) from the seller at some specified price F (t, T ) and time T in the future, given today’s date t [19]. No
1 STOXX Calculation Guide http://www.stoxx.com/indices/rulebooks.html
money changes hands until the maturity of the contract. In other words, given that the current market price of the asset (the spot price) is St, at maturity the buyer will hand over the amount F (t, T ) and receive the asset, which by then will be worth ST .
In order to establish the relationship between F (t, T ) (the forward price when the forward contract
was initiated at time t) and St (the spot price at the same time t), we construct a portfolio as following:
First we enter into a forward contract at price F (t, T ). In the meanwhile, we short-sell the asset and
invest the money (St) in the bank. Therefore we start this strategy without any investment. At the
maturity T , we get Ster(T−t) from the bank, but we are still obliged to pay F (t, T ) for the asset, as settled
in the forward contract, and then return the asset back to the short-seller. In a nutshell, at the maturity we
hold Ster(T−t)−F (t, T ). Based on no-arbitrage principle [20], which means no ’free lunches’, a portfolio
that would have nonnegative payoffs must have a nonnegative cost. Since we begin this portfolio with
zero investment, we should finish these trading with zero profit and loss, which leads to:
F (t, T ) = St e^{r(T−t)} = St / D(t, T ). (2.2)
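The two forms of Equation 2.2 can be checked numerically; the following sketch uses hypothetical values for the spot price and the interest rate:

```python
import math

def forward_price(S_t: float, r: float, t: float, T: float) -> float:
    """No-arbitrage forward price of Equation 2.2:
    F(t, T) = S_t e^{r(T-t)} = S_t / D(t, T)."""
    return S_t * math.exp(r * (T - t))

# Hypothetical values: spot 100, rate 2%, one year to maturity.
F = forward_price(100.0, 0.02, 0.0, 1.0)
D = math.exp(-0.02 * 1.0)          # discount factor D(t, T), Equation 2.1
assert abs(F - 100.0 / D) < 1e-9   # both forms of Equation 2.2 agree
```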
A future contract is very similar to a forward contract in many ways; the main difference is that the profit or loss of both buyer and seller is computed daily, and the difference between the spot price and the initial futures price is paid gradually from one party to the other during the life of the contract. As a consequence, the future price is always equal to the spot price at the maturity.
Although the continual resettlement feature of futures contracts makes it difficult to determine an
equilibrium futures price in terms of its underlying variables, if interest rates are non-stochastic and
there are no arbitrage opportunities, it can be shown that futures prices are equal to forward prices.
Consequently, the valuation formulas given for forward prices will then also hold for futures prices [21].
One more thing is worth mentioning. Future contracts, especially those on stock indices, typically have four or more maturities throughout the year, at which times they are traded most heavily by day traders. For example, in 2015 there were future contract maturities on March 20, June 19, September 18, and December 18. In the remainder of this work, we refer to these maturities (the third Friday of every quarter of each year) as major maturities, for simplicity.
The Triple Witching Phenomenon happens exactly at these four times of each year: it occurs when the contracts for stock index futures, stock index options and stock options all expire on the same day. Triple witching days generate trading activity and volatility, because contracts that are allowed to expire may necessitate the purchase or sale of the underlying security. While some derivative contracts are opened with the intention of buying or selling the underlying security, traders seeking derivative exposure only must close, roll out or offset their open positions prior to the close of trading on triple witching days.
2.2.4 Option
Unlike the holder of a future or forward contract, who is obliged to trade the asset at the maturity, the holder of an option has the right, but not the obligation, to buy or sell the underlying asset at an agreed price within a specified period of time, under certain conditions.
When the option is European, this right may only be exercised on a chosen future date, called the
maturity. The price that is paid for the asset when the option is exercised is called the strike price [1].
Throughout this thesis we always assume European options and, for this reason, wherever no confusion arises, we omit the word ’European’.
For a call option, the buyer (of the option) has the right to buy the stock, at the maturity, paying a
strike price, whereas in case of a put, he/she has the right to sell the stock [1]. It follows, trivially, that in
case of a call option, the payoff at maturity (i.e., the return of the product) is
max(ST −K, 0),
whereas in the case of a put option it is
max(K − ST , 0),
where K is the strike price [18].
In a liquid market, the ’current’ price is an uncertain term: it only represents the trading price of the most recent trade on which a seller and a buyer agreed. Since the bid price is the highest price that a prospective buyer is willing to pay for the asset at a given moment, and the ask price is the lowest price acceptable to a prospective seller of the same asset at the same moment, a trade will probably be executed when the trading price lies between the bid price and the ask price. The difference between the two prices is called the ”bid-ask spread”. It is a key indicator of the liquidity of the asset: generally speaking, the smaller the spread, the better the liquidity. The bid-ask spread can widen dramatically during periods of illiquidity or market turmoil, since buyers will not be willing to pay a price beyond a certain threshold, while sellers may not be willing to accept prices below a certain level [21].
Moneyness and the types of options
In options trading, the difference between ”in the money” and ”out of the money” is a matter of the
strike price’s position relative to the market value of the underlying asset. Moneyness, here denoted by
M , is defined as [19]:
M = K D(t, T ) / St, (2.3)
where, for simplicity, we presently omit T from the notation.
Besides, based on the relationship between the asset price and the future price given in Equation 2.2, the moneyness can also be written as

M = K / F (t, T ). (2.4)
Therefore, for an option, if its moneyness satisfies
• M < 1, the option is an in-the-money call option or an out-of-the-money put option (depending on which contract we are addressing);
• M = 1, the option is an at-the-money option, for both call and put options;
• M > 1, the option is an out-of-the-money call option or an in-the-money put option.
If the difference between St and K is large, then the option is called a deep-in-the-money or deep-out-of-the-money option.
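The moneyness of Equation 2.3 can be computed directly; the contract below is hypothetical and only illustrates the case of a call with the strike below the spot price:

```python
def moneyness(K: float, S_t: float, D: float) -> float:
    """Moneyness M = K * D(t, T) / S_t (Equation 2.3)."""
    return K * D / S_t

# Hypothetical contract: strike 95, spot 100, discount factor 0.98.
M = moneyness(95.0, 100.0, 0.98)
# The strike is below the spot price, so this call option is in the money.
assert M < 1.0
```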
Given that a call option is an investment chosen by those who believe the underlying asset price will continue to rise, an in-the-money call option is a call option whose strike price is lower than the current stock price. In this case, the option holder is more likely to earn a profit.
Following the same idea, put options are purchased by investors who believe the asset price will go down; in-the-money put options are those whose strike prices are above the current stock price.
2.3 Implied Volatility
Volatility is a measure of randomness and uncertainty, specifically of the variability of the price of an option’s underlying security. The volatility of a stock or index price is an evaluation of how uncertain the price movement is in the future [5]. The higher the volatility, the greater the expected fluctuations of the underlying asset’s price, and the higher the values of both call options and put options [6].
The unknown input when computing the price of an option is the expected volatility over the life of the option. In a market economy with actively traded option contracts, which express the market’s view of the relevant prices for those contracts, one can solve for the volatility that equates the observed market price of the option contract with the price given by the chosen option pricing formula. This yields the implied volatility [22]. Therefore, the implied volatility is defined as the parameter σ of the option pricing model (here, the Black-Scholes formula) that reproduces the actually observed market price of a particular option, as we will see later on.
Studies of option pricing used to focus on the historical volatility, but the focus has now shifted to the implied volatility, which indicates the expected future volatility according to the currently observed option prices in the market, given the other known option pricing variables (such as the asset price and the time to maturity) and parameters (such as the interest rate and the strike price). Given the historical level of the underlying asset’s implied volatility, a trader can decide whether to buy an option (when the extrinsic value is on the low end) or whether to sell an option (when the extrinsic value is on the high end).
Since the implied volatility calculated from the observed option prices, when plotted against the strike price, shows a non-flat shape similar to a ’smile’, this characteristic is named the ’volatility smile’.
2.4 Put-Call Parity
Here we present a popular formula that relates the prices of a call and a put option on the same underlying asset, with the same maturity and strike price. For a fixed date t, consider a European call option and a European put option with asset price St, maturity T and strike price K, and the portfolio that is long the call and short the put. The payoff of this portfolio at maturity is

max(ST −K, 0)−max(K − ST , 0) = ST −K.
Note that the previous formula shows that the payoff of a call option minus the payoff of a put option is the same as that of the following portfolio: borrow Ke−r(T−t) and buy a stock, paying St. At the maturity T , the return from such a portfolio is

ST −K,

i.e., the value of the stock at the maturity minus the amount of money that must be returned to the bank.
Therefore, since at the maturity the payoff of owning a call and selling a put equals ST −K, the price of the call at time t, C(T,K), minus the price of the corresponding put, P (T,K), has to equal the initial investment of the portfolio consisting of one stock and borrowed money, by non-arbitrage arguments, therefore [18]:

C(T,K)− P (T,K) = St −Ke−r(T−t). (2.5)
This very important financial theorem is called the Put-Call Parity; the equality is independent of the behavior of the asset in the future and holds at any time to maturity.
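The payoff identity underlying the Put-Call Parity can be verified for any terminal price; the strike and terminal prices in this small sketch are hypothetical:

```python
def call_payoff(S_T: float, K: float) -> float:
    """Payoff of a European call at maturity."""
    return max(S_T - K, 0.0)

def put_payoff(S_T: float, K: float) -> float:
    """Payoff of a European put at maturity."""
    return max(K - S_T, 0.0)

# max(S_T - K, 0) - max(K - S_T, 0) = S_T - K for every terminal price,
# which is the payoff side of the Put-Call Parity.
K = 100.0
for S_T in (80.0, 100.0, 123.0):
    assert call_payoff(S_T, K) - put_payoff(S_T, K) == S_T - K
```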
If traders have long positions, they actually hold the traded asset and are concerned when the price of the asset falls. Conversely, a short position indicates that a trader first borrows an asset and subsequently sells it on the market. In this way, if the price of the asset falls, the trader can buy it back at a lower price and return the asset to the lender [23].
Recalling that the forward price is given by F (t, T ) = St e^{r(T−t)}, by replacing the stock price St with the forward price F (t, T ) we obtain one more way to express the same payoff:

C(T,K)− P (T,K) = F (t, T )e−r(T−t) −Ke−r(T−t) = [F (t, T )−K]D(t, T ). (2.6)
The equation holds because if the stock price at maturity is above the strike price, the call option will
be exercised, while if it is below, the put will be exercised, and thus in either case one unit of the asset
will be purchased for the strike price, exactly as in a forward (future) contract.
Note that the Put-Call Parity (Equation 2.5) holds under, in some cases, non-realistic conditions, in particular the absence of all frictions and incompleteness of the market. In practice, bid-ask spreads and liquidity issues imply that the observable prices of European options do not necessarily align with the theory.
Rearranging Equation 2.5 as C(T,K) + Ke−r(T−t) = P (T,K) + St, the left-hand side is a fiduciary call, i.e., a long call plus enough cash (or bonds) to pay the strike price if the call is exercised, while the right-hand side is a protective put, i.e., a long put plus the asset, so that the asset can be sold for the strike price if the stock price is below the strike at expiry. Both sides have payoff max(ST ,K) at maturity (i.e., at least the strike price, or the value of the asset if it is worth more), which gives another way of proving or interpreting the Put-Call Parity.
2.5 Black-Scholes Formula
Black and Scholes [1] and Merton [2] provided, in 1973, a breakthrough result in modern option pricing studies. At first sight, one may have the idea that the main goal of an option pricing model is to capture the empirical properties of option prices.
An option pricing model can, however, also be used as a tool to obtain features of option prices, such as the implied volatility, by relating the option price with the price of the underlying asset under the arbitrage-free assumption that keeps the market fair [3]. The main thrust behind the Black-Scholes model is to find a way to transform market prices into an expression in terms of implied volatility [4].
Besides the assumptions for the market and the underlying asset stated at the beginning of this chapter, there are other assumptions for the Black-Scholes model to hold [18], notably the following:
• The underlying stock price follows a log-normal distribution (also known as the Geometric Brownian
Motion);
• The interest rate is fixed or a known function of time;
• There are no dividends on the underlying stock.
Following the notation of the previous sections, the Black-Scholes formula [1] gives the estimated value of the call option C(t, T,K, St, σ) as

C(t, T,K, St, σ) = StΦ(d1)−Ke−r(T−t)Φ(d2), (2.7)

d1 = [ln(St/K) + (r + 0.5σ²)(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t),

where Φ(·) is the cumulative distribution function of the standard normal distribution, and σ is the volatility of the returns of the underlying asset; it is the square root of the quadratic variation of the stock’s log-price process.
In view of the Put-Call Parity (Equation 2.5), it also follows from Equation 2.7 that the corresponding put option price P (t, T,K, St, σ) is:

P (t, T,K, St, σ) = −St +Ke−r(T−t) + StΦ(d1)−Ke−r(T−t)Φ(d2)
= −St[1− Φ(d1)] +Ke−r(T−t)[1− Φ(d2)]
= −StΦ(−d1) +Ke−r(T−t)Φ(−d2). (2.8)
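Equations 2.7 and 2.8 can be sketched directly in Python; the contract parameters below are hypothetical, and Φ is obtained from the standard library’s error function:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S_t, K, r, tau, sigma):
    """Black-Scholes call price (Equation 2.7), with tau = T - t."""
    d1 = (math.log(S_t / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S_t * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def bs_put(S_t, K, r, tau, sigma):
    """Put price via the Put-Call Parity (Equations 2.5 and 2.8)."""
    return bs_call(S_t, K, r, tau, sigma) - S_t + K * math.exp(-r * tau)

# Hypothetical at-the-money contract: the parity must hold exactly.
c = bs_call(100.0, 100.0, 0.02, 1.0, 0.2)
p = bs_put(100.0, 100.0, 0.02, 1.0, 0.2)
assert abs((c - p) - (100.0 - 100.0 * math.exp(-0.02))) < 1e-12
```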
Another common form of the Black-Scholes formula, based on the future price and the discount value, gives the call option price C(t, T,K, F (t, T ), D(t, T )) as:

C(t, T,K, F (t, T ), D(t, T )) = D(t, T )(F (t, T )Φ(d1)−KΦ(d2)), (2.9)

d1 = [ln(F (t, T )/K) + 0.5σ²(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t),

where D(t, T ) = e−r(T−t) is the discount factor and F (t, T ) = St e^{r(T−t)} = St/D(t, T ) is the forward price of the underlying asset.
Similarly, combining the Put-Call Parity (2.6) with the call option price (2.9), we can deduce the price of the corresponding put option P (t, T,K, F (t, T ), D(t, T )):

P (t, T,K, F (t, T ), D(t, T )) = −[F (t, T )−K]D(t, T ) +D(t, T )(F (t, T )Φ(d1)−KΦ(d2))
= −D(t, T )F (t, T )[1− Φ(d1)] +KD(t, T )[1− Φ(d2)]
= D(t, T )[Φ(−d2)K − Φ(−d1)F (t, T )]. (2.10)
The estimated price of a call option under Black-Scholes is a function of the volatility parameter, and it is strictly increasing from ]0,+∞[ onto ]max{St − Ke−r(T−t), 0}, St[ [3]. If an observed market price Cmarket(t, T ) within this range is given, it is possible to compute the estimate of the volatility parameter σt(T,K) such that the corresponding Black-Scholes price matches the market price:

∃! σt(T,K) > 0 s.t. C(t, T,K, St, σt(T,K)) = Cmarket(t, T ).
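Since the call price is strictly increasing in σ, any bracketing root-finder recovers the implied volatility; the following round-trip sketch (hypothetical contract values, bisection bounds chosen arbitrarily) anticipates the Bisection method of Chapter 3:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S_t, K, r, tau, sigma):
    """Black-Scholes call price (Equation 2.7), with tau = T - t."""
    d1 = (math.log(S_t / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S_t * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def implied_vol(C_market, S_t, K, r, tau, lo=1e-6, hi=5.0, n_iter=100):
    """Bisection on sigma: bs_call is strictly increasing in sigma, so a
    market price in the valid range pins down a unique volatility."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S_t, K, r, tau, mid) < C_market:
            lo = mid          # model price too low: volatility is higher
        else:
            hi = mid          # model price too high: volatility is lower
    return 0.5 * (lo + hi)

# Round trip: price a hypothetical contract at sigma = 0.25, then
# recover that sigma from the price alone.
price = bs_call(100.0, 105.0, 0.02, 0.5, 0.25)
assert abs(implied_vol(price, 100.0, 105.0, 0.02, 0.5) - 0.25) < 1e-6
```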
Chapter 3
Statistical Theoretical Overview
In this chapter we give a brief overview of the statistical methods used in this thesis, following the sequence in which the methods are applied throughout the work.
Firstly, we introduce the Bisection method, which is used to compute the implied volatility as the root of the non-linear function we generate based on the Black-Scholes formula. After obtaining the implied volatility, we need some pre-processing methods, such as the Box-Cox transformation and the Mahalanobis distance, which are discussed in the second section of this chapter. Next, we introduce the idea of quantile regression in the third section; it generalizes linear regression methods and tree-based regressors through their combination with the quantile method, namely quantile linear regression and quantile random forests, which are our main models.
3.1 Bisection Method
The bisection method (also called the interval halving method) is one of the simplest root-finding algorithms used to find zeros of continuous non-linear functions. Its advantages are two-fold: it is very robust and always converges to a solution if the function values have different signs at the borders of the chosen initial interval, and it can be applied to non-differentiable continuous functions [24]. However, the speed of convergence is relatively slow: the length of the interval is reduced by 50% in each step, so the method gains only one binary digit of accuracy per iteration, leading to a strictly monotone linear convergence.
It is used to find a root of a nonlinear real-valued scalar function f , i.e., to solve f(x) = 0 for f : R→ R. The algorithm works as follows:
1. An interval [x^a, x^b] inside the domain of definition is chosen initially. The signs of f(x^a) and f(x^b) must be different, which guarantees, by the Intermediate Value Theorem, that the function has at least one root inside the interval.
2. At the beginning of the t-th iteration, the midpoint of the interval, x^mid_t = (x^a_t + x^b_t)/2, is computed and the function value f(x^mid_t) is evaluated.
• If f(x^mid_t) × f(x^a_t) > 0, then the root is inside the interval [x^mid_t, x^b_t]. Thus we define the new lower border x^a_{t+1} as x^mid_t and continue the process;
• If f(x^mid_t) × f(x^a_t) < 0, then the root is inside the interval [x^a_t, x^mid_t]. Thus we define the new upper border x^b_{t+1} as x^mid_t and continue the process;
• If f(x^mid_t) × f(x^a_t) = 0, then x^mid_t is the root and the iteration is complete.
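The steps above can be sketched as a small routine; the function and interval in the example are arbitrary choices for illustration:

```python
def bisect(f, a, b, tol=1e-10, max_iter=200):
    """Find a root of a continuous f on [a, b], assuming f(a) and f(b)
    have opposite signs (Intermediate Value Theorem)."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == 0.0 or 0.5 * (b - a) < tol:
            return mid
        if fm * fa > 0:        # same sign as f(a): root in the upper half
            a, fa = mid, fm
        else:                  # opposite sign: root in the lower half
            b = mid
    return 0.5 * (a + b)

# Example: the positive root of x^2 - 2 on [0, 2] is sqrt(2).
root = bisect(lambda x: x * x - 2.0, 0.0, 2.0)
assert abs(root - 2.0 ** 0.5) < 1e-9
```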
3.2 Pre-processing Methods
Many statistical methods and analyses rely on the assumption that the population under study is normally distributed, with a common variance and an additive error structure. When this theoretical assumption is seriously violated, one practical measure is to design a new model which retains the important aspects of the original model while satisfying the assumption, for example by applying a proper transformation to the dataset [25]. In this work, a parametric power transformation, the Box-Cox transformation, is applied to the implied volatility to satisfy the linear regression assumptions in Section 5.3.1. Afterwards, in order to detect multivariate outliers, the Mahalanobis distance, proposed in 1930 [26], is applied; it indicates how far one observation is from the center of the data bulk with respect to the covariance structure.
3.2.1 The Formula of Box-Cox Transformation
While some traditional transformations (for example, square root, log, inverse) are better known for improving normality, the Box-Cox transformation (put forward in 1964 by Box and Cox [27]) represents a family of power transformations which integrates and extends the traditional measures in order to find the optimal normalizing transformation [28]. The transformed values can be regarded as realizations of a normal distribution. The Box-Cox transformation is defined as:

y_i^(λ) = (y_i^λ − 1)/λ, if λ ≠ 0; ln(yi), if λ = 0, (3.1)

for yi > 0, with transformation parameter λ [25].
Note that the transformation in Equation 3.1 is valid only for yi > 0. Modifications have been made for variables assuming negative values, and Box and Cox proposed the shifted power transformation as an alternative:

y_i^(λ) = ((yi + c)^λ − 1)/λ, if λ ≠ 0; ln(yi + c), if λ = 0, (3.2)

where λ is still the transformation parameter and c is a constant satisfying yi + c > 0.
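Equations 3.1 and 3.2 translate directly into code; the numerical checks below use arbitrary sample values:

```python
import math

def box_cox(y: float, lam: float, c: float = 0.0) -> float:
    """Shifted Box-Cox transform (Equations 3.1 and 3.2); c = 0
    recovers the plain transform, and y + c must be positive."""
    if y + c <= 0:
        raise ValueError("y + c must be positive")
    if lam == 0:
        return math.log(y + c)
    return ((y + c) ** lam - 1.0) / lam

# lambda = 1 only shifts the data by -1, and lambda = 0.5 is a scaled,
# shifted square root, matching the traditional transformations.
assert box_cox(4.0, 1.0) == 3.0
assert abs(box_cox(4.0, 0.5) - (math.sqrt(4.0) - 1.0) / 0.5) < 1e-12
assert abs(box_cox(4.0, 0.0) - math.log(4.0)) < 1e-12
```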
The Box-Cox transformation parameter λ can be computed automatically by any statistical package, for instance R, which is used in this work. As mentioned above, this family of transformations incorporates many traditional transformation measures [28]; some examples are shown in Table 3.1:
Table 3.1: Parameter λ’s values and corresponding transformations

λ      Transformation
1.00   identical to the original data
0.50   square root transformation
0.33   cube root transformation
0.25   fourth root transformation
0.00   natural log transformation
−0.50  reciprocal square root transformation
−1.00  reciprocal (inverse) transformation
3.2.2 Mahalanobis Distance
Unlike univariate outliers, multivariate outliers do not necessarily deviate from the majority of observations in any single coordinate. The Mahalanobis distance, introduced by Mahalanobis [29], is a measure of distance originally designed to compare groups, used here for multivariate outlier detection. The Mahalanobis distance is unitless and scale-invariant, and it takes into account the associations between the original variables.
Formally, if xi = (xi,1, . . . , xi,p)> is a realization of a p-dimensional random vector in a multivariate sample of n observations, with sample mean µ = (µ1, . . . , µp)> and sample covariance matrix Σ, then the Mahalanobis distance between xi and µ is defined as [30]:

DM (xi, µ) = √( (xi − µ)> Σ−1 (xi − µ) ), i = 1, . . . , n. (3.3)

If xi is normally distributed, then D²M (xi, µ) follows a Chi-square distribution with p degrees of freedom, i.e.

D²M (xi, µ) ∼ χ²(p).
A certain cut-off value, e.g. the 99% quantile of χ²(p), can be used as an indication of extremeness: a point can be considered a potential outlier if its Mahalanobis distance exceeds the cut-off value [31].
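Equation 3.3 can be sketched for p = 2 without any matrix library, inverting the 2 × 2 covariance matrix in closed form; the data below are arbitrary illustrative values:

```python
import math

def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance (Equation 3.3) for p = 2, with the 2x2
    covariance matrix inverted explicitly."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

# With the identity covariance it reduces to the Euclidean distance.
d = mahalanobis_2d((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
assert abs(d - 5.0) < 1e-12

# 99% cut-off for p = 2: the chi-square quantile is -2 ln(0.01) ~ 9.21,
# so this point's squared distance (25) flags it as a potential outlier.
cutoff = -2.0 * math.log(0.01)
assert d ** 2 > cutoff
```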
Robust Mahalanobis Distance
Both the sample mean and the sample covariance matrix are very sensitive to outliers. Therefore, the classical outlier identification method does not always find the outliers, since the estimators are themselves affected by them. The minimum covariance determinant (MCD) estimator is a method based on very robust estimators: it searches for the subset containing half of the data whose covariance matrix has the smallest determinant (since this work does not focus on the MCD algorithm, see [32] for more details). The resulting estimators lead to robust estimates of the Mahalanobis distance, and this robust distance is better suited to exposing outliers. Studies show that the same cut-off value chosen from the χ²(p) distribution is still suitable in the robust case [31].
3.3 Quantile Regression Method
Koenker and Bassett Jr [12] proposed quantile regression, expanding the scope of the ’median regression’ described by Farebrother [33], to remedy situations where, in practice, the dataset cannot satisfy the assumptions of the Ordinary Least Squares (OLS) method. Quantile regression is a generalization of linear regression in which a quantile of interest is modeled as a linear function of the explanatory variables. It has the advantage of being more robust than ordinary least squares, and it has been shown to lead to good results when there are complex relations between the variables.
Definition of quantile
Quantile regression focuses on the conditional quantiles of Y given X = x rather than on the conditional mean, of which the median of Y given X = x is one particular case. Assume the distribution function of a real-valued response variable Y is

FY(y) = P (Y ≤ y);

then the τ -th quantile of Y is defined as the minimum value of y which satisfies FY(y) ≥ τ , i.e.,

QY(τ) = FY^{−1}(τ) = inf{y : FY(y) ≥ τ}, 0 < τ < 1. (3.4)
Definition of quantile loss function
Given a realization of a random sample y = {y1, y2, . . . , yn} of Y and a quantile τ ∈ (0, 1), the L1-norm quantile regression minimizes the loss function [34]:

Lτ (y, ŷ) = Σ_{i: yi ≥ ŷi} τ |yi − ŷi| + Σ_{i: yi < ŷi} (1 − τ)|yi − ŷi| = Σ_{i=1}^{n} ρτ (yi − ŷi), (3.5)

where ŷi are the estimated values and ρτ (yi − ŷi) (named the check function [12]) is defined as:

ρτ (yi − ŷi) = τ(yi − ŷi), if yi − ŷi > 0; −(1 − τ)(yi − ŷi), otherwise. (3.6)
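The check function and the quantile loss of Equations 3.5 and 3.6 are easy to implement, and a small numerical check (with an arbitrary sample) confirms that the constant prediction minimizing the τ = 0.5 loss is a median:

```python
def pinball(residual: float, tau: float) -> float:
    """Check function rho_tau of Equation 3.6, with residual = y - y_hat."""
    return tau * residual if residual > 0 else (tau - 1.0) * residual

def quantile_loss(y, y_hat, tau):
    """Quantile loss of Equation 3.5 over a sample."""
    return sum(pinball(yi - yh, tau) for yi, yh in zip(y, y_hat))

# Among the sample values, the tau = 0.5 loss is minimized at a median
# of the (arbitrary) sample below.
y = [1.0, 2.0, 3.0, 10.0]
losses = {q: quantile_loss(y, [q] * len(y), 0.5) for q in y}
assert min(losses, key=losses.get) in (2.0, 3.0)
```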
The quantile regression is going to be discussed in more detail in Section 3.3.1, and in Section 3.3.2
estimation methods based on decision trees are briefly reviewed.
3.3.1 Linear regression and its quantile application
Suppose there is a dataset with n observations {yi,xi}, i = 1, . . . , n, where yi is the response variable and xi = (xi1, . . . , xip) are the explanatory variables. The linear regression model is:

yi = β0 + β1xi1 + · · ·+ βpxip + εi = x>i β + εi, i = 1, . . . , n, (3.7)

where x>i β is the inner product between the vectors xi = (1, xi1, . . . , xip) and β = (β0, β1, . . . , βp)>, and εi is a random variable called the error term. Sometimes one of the explanatory variables can be a non-linear function of another one, as in polynomial regression, but the model remains linear because it is still a linear function of the parameter vector β. The error term εi captures all influences on the response variable yi other than xi.
In linear regression, we assume
E(Y|X = x) = x>β.
The major assumptions made by standard linear regression models with standard estimation tech-
niques (e.g. ordinary least squares) are [35]:
• Linearity: the mean of the response variable is a linear combination of the explanatory variables.
• Constant variance (homoscedasticity): different values of the response variable are assumed to
have the same variance, regardless of the values of the explanatory variables, i.e. var(Y_i|X_i =
x_i) = σ².
• Independence of errors: the errors of the response variables are assumed to be uncorrelated with
each other, i.e. cov(ε_i, ε_j) = 0, ∀ i ≠ j.
• No multicollinearity in the explanatory variables: multicollinearity occurs when the explanatory
variables are too highly correlated with each other.
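For a single explanatory variable, the OLS estimates of the model above have a well-known closed
form. A minimal sketch (the function name is ours), useful as a baseline before the quantile variants:

```python
def ols_simple(x, y):
    """OLS for y_i = b0 + b1*x_i + e_i: minimize the sum of squared residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # slope: cov(x, y) / var(x)
    b0 = ybar - b1 * xbar   # intercept: line passes through the means
    return b0, b1

# An exact line y = 1 + 2x is recovered exactly:
# ols_simple([0, 1, 2, 3], [1, 3, 5, 7]) -> (1.0, 2.0)
```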
Linear quantile regression fits a conditional quantile of the response variable by a linear function x>β.
In Table 3.2 we compare the general quantile linear regression with the most common and widely used
alternatives: linear and L1 regression methods [36].
Table 3.2: Comparison between linear, median, and quantile regression methods.

Ordinary Least Squares (OLS)
  Conditional mean function: E(Y|X = x) = x^T β.
  Loss function: Σ_{i=1}^{n} (y_i − ŷ_i)², the sum of squared residuals.
  Estimates: β̂ = arg min_{β ∈ R^{p+1}} Σ_{i=1}^{n} (y_i − x_i^T β)².

Least Absolute Deviation (LAD)
  Conditional median function: Q_{0.5}(Y|X = x) = x^T β(0.5).
  Loss function: Σ_{i=1}^{n} |y_i − ŷ_i|, the sum of absolute residuals.
  Estimates: β̂(0.5) = arg min_{β ∈ R^{p+1}} Σ_{i=1}^{n} |y_i − x_i^T β|.

τ-Quantile
  Conditional quantile function: Q_τ(Y|X = x) = x^T β(τ).
  Loss function: Σ_{i=1}^{n} ρ_τ(y_i − x_i^T β), the sum of weighted absolute residuals.
  Estimates: β̂(τ) = arg min_{β ∈ R^{p+1}} Σ_{i=1}^{n} ρ_τ(y_i − x_i^T β).
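The three estimators in the table differ only in the loss they minimize. A small numerical check,
restricted to a model with an intercept only (definitions and names are ours), illustrates that
minimizing the τ-quantile loss recovers the empirical τ-quantile of the sample, just as minimizing the
squared loss recovers the mean:

```python
def rho(u, tau):
    """Check function of Equation 3.6."""
    return tau * u if u > 0 else -(1.0 - tau) * u

def best_constant(y, tau):
    """Among the observed values, the constant c minimizing sum_i rho_tau(y_i - c)."""
    return min(y, key=lambda c: sum(rho(yi - c, tau) for yi in y))

y = list(range(1, 10))  # 1, 2, ..., 9
# tau = 0.5 yields the sample median; tau = 0.9 yields a high sample quantile.
```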
3.3.2 Tree-based regressors and their quantile application
Tree-based methods are simple to visualize and to interpret, but in complex cases single decision trees
may not give competitive prediction results. Ensemble methods therefore generate a large number of
trees whose predictions are combined into one, usually yielding substantial improvements in prediction.
This section begins with the introduction of a single decision tree, followed by the mechanism of the
Random Forest and the corresponding quantile extension.
Elementary tree-based model: Decision Tree
A decision tree is a non-parametric supervised learning method in the form of a tree structure, used for
both regression and classification; in this work we focus on regression. It splits a dataset from the entire
space (denoted as R) into several regions (denoted as R_j) from the top of the tree to each leaf at the
bottom by a series of binary if-then rules. These rules identify distinct regions in which the observations
share the most homogeneous responses to the predictors. At each internal node, the dataset is split and
the predictors are evaluated so as to minimize the prediction error. The leaves, named terminal nodes,
represent the final division into regions. At a leaf node, the mean of the response values assigned to
that node is the predicted value returned by the decision tree.
Following the notation of Breiman [37], we define θ as the random vector for the entire tree, to record
how explanatory variables are split at each node and the corresponding tree is represented by T (θ).
Every leaf j = 1, . . . , J of a tree T (θ) corresponds to a rectangular subspace denoted as Rj , j = 1, . . . , J .
For every x, there is one and only one leaf j such that x ∈ Rj (corresponding to the leaf that is obtained
when dropping x down the tree). Denote this leaf by j(x, θ) for tree T (θ). The goal is to construct these
regions R1, R2, . . . , RJ minimizing

Σ_{j=1}^{J} Σ_{i: x_i ∈ R_j} (y_i − ȳ_{R_j})²,

where ȳ_{R_j} represents the mean response over all observations in region R_j, j = 1, . . . , J.
The process of building a decision tree T (θ) is:
1. First, a binary split is applied to the current subset. Since it is infeasible to take every possible
partition of the dataset into J rectangles into account, we apply a top-down, greedy recursive binary
splitting approach, splitting successively further down towards the leaves and choosing the best split
at each step of the tree-building process. The choice of splits is vital for the accuracy of prediction:
when creating two sub-regions, the criterion ensures that the homogeneity within the sub-regions
increases. This work uses the CART (Classification and Regression Tree) algorithm. For a predictor
X_j there is a cutpoint s splitting the space into two regions, one where X_j is smaller than s and one
where X_j is greater than or equal to s. The best cutpoint for each predictor X_j, j = 1, . . . , p, is
chosen such that the tree has the lowest sum of squared residuals, i.e., defining

R1(j, s) = {X : X_j < s} and R2(j, s) = {X : X_j ≥ s},

we want to find the pair (j, s) minimizing

Σ_{i: x_i ∈ R1(j,s)} (y_i − ȳ_{R1})² + Σ_{i: x_i ∈ R2(j,s)} (y_i − ȳ_{R2})², (3.8)

where ȳ_{R1} and ȳ_{R2} represent the mean response over all observations in R1(j, s) and in R2(j, s),
respectively.
2. Afterwards the process is repeated at the next node, splitting one of the previously identified
regions, until a user-defined stopping criterion is reached. The best predictor and cutpoint for each
further node are chosen to minimize Equation 3.8, and the distinct regions are generated gradually.
A fully grown tree fits the training dataset with 100% accuracy.
3. Finally, it is known that decision trees can over-fit if the tree is too complex and full of details,
leading to bad performance when predicting new observations. It is then necessary to set constraints
on the tree size or to prune the grown tree. (Note that in this work we mainly use decision trees within
the random forest method presented in the next subsection, so pruning is not necessary and its
introduction is omitted here; readers interested in pruning may consult Mingers [38], Mehta et al. [39]
and Kearns and Mansour [40].)
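The exhaustive search for the best pair (j, s) of Equation 3.8 can be sketched directly; this is a
didactic illustration with our own function names, not the optimized CART implementation:

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty region)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, y):
    """Exhaustive search over (feature j, cutpoint s) minimizing Equation 3.8.
    X is a list of feature vectors and y the list of responses."""
    best = None  # (total SSE, j, s)
    for j in range(len(X[0])):
        for s in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] < s]
            right = [yi for x, yi in zip(X, y) if x[j] >= s]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best

# Toy data: the response jumps when the first feature crosses 10,
# so the best split is on feature 0 at cutpoint 11.
X = [[1, 0], [2, 5], [3, 1], [11, 2], [12, 7], [13, 3]]
y = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
```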
Prediction
A decision tree works in two steps: first, divide the training set into J distinct and non-overlapping
regions R1, R2, . . . , RJ. The predicted value associated with a given leaf is the mean response of the
training observations falling in that region. Thus, if a new observation falls in one of the regions, its
predicted value will be the mean response value of the training observations belonging to that region.
Specifically, the prediction of a single tree T(θ) for a new point X = x is the average over the observed
values in leaf j(x, θ). The weight ω_i(x, θ) is a positive constant if observation x_i belongs to the leaf
j(x, θ), and 0 otherwise. The weights sum to one, and thus

ω_i(x, θ) = 1{X_i ∈ R_{j(x,θ)}} / #{m : X_m ∈ R_{j(x,θ)}}. (3.9)

Given X = x, the prediction of a single tree is then the weighted mean value of the original
observations y_i, i = 1, . . . , n:

single tree: μ̂(x, θ) = Σ_{i=1}^{n} ω_i(x, θ) y_i.
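The weights of Equation 3.9 and the resulting single-tree prediction can be written compactly. In this
sketch (names ours), the assignment of the training points and of the query point to leaves is assumed
to be given as pre-computed leaf indices obtained by dropping each point down a fitted tree:

```python
def tree_weights(train_leaves, query_leaf):
    """Weights omega_i(x, theta) of Equation 3.9: 1/(leaf size) for the training
    points sharing the query's leaf, and 0 for all others."""
    count = train_leaves.count(query_leaf)
    return [1.0 / count if leaf == query_leaf else 0.0 for leaf in train_leaves]

def tree_predict(train_leaves, y, query_leaf):
    """Single-tree prediction: the weighted mean of the training responses."""
    w = tree_weights(train_leaves, query_leaf)
    return sum(wi * yi for wi, yi in zip(w, y))

# Three training points fall in leaf 0 and two in leaf 1:
leaves = [0, 0, 0, 1, 1]
y = [1.0, 2.0, 3.0, 10.0, 20.0]
# A query landing in leaf 1 is predicted by the mean of 10 and 20.
```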
Decision trees are popular because:
• Small trees can be visualized and are relatively easy to understand and interpret; a leaf can be
explained by Boolean logic.
• They can use categorical and continuous variables simultaneously.
• The tree is unaffected by monotone transformations and differing scales of the explanatory
variables. Thus, the model allows more general applications without the need to pre-process the
data.
• The computational cost of using the tree to predict a response is logarithmic in the number of
objects in the training set and thus relatively fast, even though training the tree may be a complex
and demanding process.
• Trees are less sensitive to outliers in the explanatory variables.
Despite these benefits, decision trees have several weaknesses as well. They have difficulty in
modeling smooth functions, and their structure depends on the sample of data: small changes in the
training set can result in very different splits. What is more, decision trees are built by a greedy
algorithm in which locally optimal decisions are made at each node, rather than returning the globally
optimal result [41].
However, by combining many decision trees through ensemble methods, these problems can be
reduced and the prediction performance considerably improved. Random forest, a representative
ensemble method, is introduced in the next section.
Ensemble method: Random Forest
Random forest is an ensemble learning method built from many decision trees grown on selected
training samples. Like decision trees, it is capable of solving both regression and classification
problems. Random forests employ randomness each time they select a subset of the training set for
building each tree, and a subset of input attributes from which the best attribute and split are chosen
when generating each node.
In bagging, each tree is independently constructed using a bootstrap sample of the data set, and in
the end a simple average is taken for prediction. Breiman [37] proposed random forests, which add an
additional layer of randomness to bagging. In addition to constructing each tree using a different
bootstrap sample of the data, random forests change how the regression trees are constructed: in
standard trees, each node is split using the best split among all variables, whereas in a random forest
each node is split using the best variable among a subset of variables randomly chosen at that node.
This strategy turns out to perform very well and is robust against over-fitting [37].
The algorithm for generating a random forest regression model is as follows [42]:
1. Before building each tree, first draw a bootstrap sample (resampling with replacement) from the
original training data. We denote the number of such samples by ntree, indicating that ntree trees
will be built.
2. For each selected sample, grow a regression decision tree without pruning. At each node of a
decision tree, a random subset of mtry variables is chosen out of the entire set of p variables as the
candidates for the split at this node, rather than choosing the best split among all variables. Only one
of these mtry variables is used to generate the best split rule and the corresponding subregions at
this node.
Thus, for the training sets selected in step 1, ntree trees are fully grown and combined into a random
forest.
3. Predict new observations by aggregating the predictions of these ntree trees. (The way of
aggregation is discussed further below.)
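The three steps above can be sketched as follows. This is a heavily simplified illustration under our
own assumptions: each "tree" is only a depth-1 stump rather than a fully grown tree, and with
mtry = p the procedure reduces to plain bagging, while mtry < p adds the extra layer of randomness
described in step 2:

```python
import random

random.seed(0)  # for reproducibility of this illustration

def fit_stump(X, y, mtry):
    """Grow a depth-1 tree: best split among a random subset of mtry features.
    (A drastic simplification of step 2; real forests grow full trees.)"""
    feats = random.sample(range(len(X[0])), mtry)
    best = None  # (error, feature j, cutpoint s, left mean, right mean)
    for j in sorted(feats):  # fixed order so ties resolve deterministically
        for s in {x[j] for x in X}:
            left = [yi for x, yi in zip(X, y) if x[j] < s]
            right = [yi for x, yi in zip(X, y) if x[j] >= s]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
            if best is None or err < best[0]:
                best = (err, j, s, ml, mr)
    if best is None:  # degenerate bootstrap sample: fall back to the mean
        m = sum(y) / len(y)
        return lambda x: m
    _, j, s, ml, mr = best
    return lambda x: ml if x[j] < s else mr

def random_forest(X, y, ntree=50, mtry=1):
    """Steps 1-2: ntree stumps, each grown on a bootstrap resample of the data."""
    trees = []
    for _ in range(ntree):
        idx = [random.randrange(len(X)) for _ in range(len(X))]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], mtry))
    # Step 3: aggregate by averaging the individual tree predictions.
    return lambda x: sum(t(x) for t in trees) / len(trees)

# Toy data: the response jumps when the first feature crosses 10.
X = [[1, 7], [2, 3], [3, 9], [4, 1], [11, 8], [12, 2], [13, 6], [14, 4]]
y = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
predict = random_forest(X, y, ntree=50, mtry=2)  # mtry = p here: plain bagging
```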
Advantages of Random Forests:
• As long as the number of trees is large enough, neither the computational burden nor over-fitting
is a major concern, because not all variables are considered at each node-building step.
• Random forest also reduces the influence of a few very strong variables that would otherwise
dominate each tree, thereby allowing other variables to contribute.
• Random forest runs efficiently on large datasets and learns fast.
• Random forest can handle a large set of input variables, so there is no need to perform variable
selection.
• Random forest offers an experimental method for detecting variable interactions.
For regression, the prediction of random forests for a new data point X = x is the averaged re-
sponse of all the trees. Using random forests, the conditional mean E(Y|X = x) is approximated by
the averaged prediction of ntree single trees, each constructed with an independent and identically
distributed vector θ_t, t = 1, . . . , ntree. Let ω̄_i(x) be the average of ω_i(x, θ_t) (defined in Equation 3.9)
over this collection of trees,

ω̄_i(x) = (1/ntree) Σ_{t=1}^{ntree} ω_i(x, θ_t). (3.10)

The prediction of random forests is then

Ê(Y|X = x) = μ̂_{Y|X=x}(x) = Σ_{i=1}^{n} ω̄_i(x) y_i.
Quantile Random Forest
It was shown above that random forests approximate the conditional mean E(Y|X = x) by a
weighted mean over the observations of the response variable Y. Moreover, the weighted
observations can reveal the full conditional distribution. The conditional distribution function of Y
given X = x is written as

F(y|X = x) = P(Y ≤ y|X = x) = E(1{Y ≤ y}|X = x).

The last expression inspires an analogy with the random forest approximation of the conditional
mean E(Y|X = x). Just as E(Y|X = x) is approximated by a weighted mean over the observations
of the response variable, we can define an approximation of E(1{Y ≤ y}|X = x) by the weighted
mean over the observations of 1{Y ≤ y} [14],

F̂(y|X = x) = Σ_{i=1}^{n} ω̄_i(x) 1{y_i ≤ y},
using the same weights ωi(x) (defined in equation 3.10) as in random forests. This approximation is at
the heart of the quantile regression forests algorithm.
The estimates Q̂_{Y|X=x}(τ) of the conditional quantiles

Q_{Y|X=x}(τ) = F^{-1}_{Y|X=x}(τ)

are obtained by plugging F̂(y|X = x) instead of F_Y(y) into Equation 3.4. Other approaches for
estimating quantiles from empirical distribution functions are discussed in Hyndman and Fan (1996).
The key difference between quantile regression forests and random forests is the following: in each
tree, a random forest keeps only the mean of the response values of the observations that fall into
each leaf node and neglects all other information. In contrast, a quantile regression forest keeps the
value of every observation in every leaf of all trees, not just their mean, and assesses the conditional
distribution based on this broader and more comprehensive information.
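Given the forest weights ω̄_i(x) of Equation 3.10, the estimated conditional distribution function and
its τ-quantile follow directly. A sketch with hypothetical pre-computed weights (names ours):

```python
def conditional_cdf(y_train, weights, y):
    """F_hat(y | X = x) = sum_i w_i(x) * 1{y_i <= y}."""
    return sum(w for yi, w in zip(y_train, weights) if yi <= y)

def conditional_quantile(y_train, weights, tau):
    """Q_hat(tau) = inf{y : F_hat(y|x) >= tau}, searched over the observed values."""
    for y in sorted(y_train):
        if conditional_cdf(y_train, weights, y) >= tau:
            return y
    return None  # can only happen if the weights sum to less than tau

y_train = [1.0, 2.0, 3.0, 4.0]
w = [0.1, 0.2, 0.3, 0.4]  # hypothetical forest weights, summing to one
```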
Chapter 4
Computation and Analysis of Implied
Volatility
The main goal of this chapter is to demonstrate how to calculate the implied volatility from the dataset
introduced in Section 1.2 and analyze the results. In order to compute the implied volatility from the
options price, we apply the Black-Scholes formula. In many cases, the Black-Scholes formula, as derived
by Black and Scholes [1], is used to compute the price of a certain option, provided with the data
concerning interest rate, asset price, maturity, strike and volatility. But in our context the volatility is
unknown, and the option price is provided (or at least an approximate value of it is available).
Therefore, we invert the Black-Scholes formula in order to obtain estimates of the implied volatility.
As we do not have the option price explicitly, but instead the bid and ask price of each contract,
we propose to compute the implied volatility in two ways: either we compute the average of the bid
and the ask price and use the obtained value as input for the Black-Scholes formula, or we input the
bid and then the ask price into the Black-Scholes formula individually and compute the average of the
resulting implied volatilities. For reasons that will become clear in the following sections, we resort
to a numerical method (namely the bisection method) to compute the implied volatilities.
As a result, four sets of implied volatility are generated through the operations above. We display and
compare the implied volatilities computed for the example shown in Section 1.2 specifically, and
afterwards give a general description of the main type of implied volatility, the one deduced from the
future price.
4.1 Calculation Processes
Thanks to the put-call parity mentioned in Section 2.4, we can relate call option prices with put option
prices with the same underlying asset, strike, date and maturity. Based on the combination of call and
put options made in Section 1.2, we have decided to focus on call option prices, and thus the implied
volatility mentioned hereafter refers to call options.
If we denote by F_BS(σ) the call option price generated by the Black-Scholes formula, with the implied
volatility σ as the only unknown parameter, it should be equal to the true market value C_market of
the call option contract with the same asset, date, maturity and strike price:

F_BS(σ) ≈ C_market. (4.1)
To solve Equation 4.1 and obtain the implied volatility σ, we face two challenges:
1. The market price C_market of an option contract is not provided in the dataset. We have
information concerning the bid and ask prices, hereby denoted by C_bid and C_ask respectively,
which bound C_market (the bid-ask spread [C_bid, C_ask]). The true price of the option, C_market,
may not fall exactly in this range, although most likely it will lie inside. Therefore we propose the
following two alternative methods, sketched in Figure 4.1:
Method 1:
Assume that

C_market = (C_bid + C_ask) / 2,

and compute the implied volatility from this new value.
Method 2:
First, compute the implied volatility using C_bid as the input variable, and name it σ_bid.
Second, compute the implied volatility using C_ask as the input variable, and name it σ_ask.
Third, compute the resulting implied volatility as the average of σ_bid and σ_ask.
Figure 4.1: Framework of computation Method 1 (m1) and Method 2 (m2).
2. Whichever input values we decide to use to compute the implied volatility, we need to invert the
Black-Scholes formula. Unfortunately, this formula is a non-linear function with no closed-form
solution for the implied volatility, so one needs to resort to numerical approximations. As we cannot
find explicitly

σ : F_BS(σ) = C_market,

we need to estimate

σ̂ = arg min_σ |F_BS(σ) − C_market|. (4.2)
For the minimization, we consider the bisection method presented in Section 3.1, a root-finding
algorithm for continuous non-linear, and in particular non-differentiable, functions. The bisection
method is used to find the root, the implied volatility σ, of the non-linear non-differentiable equation
F_BS(σ) − C_market = 0. The process is as follows:
• Firstly, initialize the interval as [0.001, 1], and choose the first midpoint as 0.2 to accelerate
the computation, since, according to our knowledge, the implied volatility for this kind of
option is normally small.
• Secondly, in each iteration, compute the midpoint and its corresponding function value. Follow
the second step of the algorithm mentioned in Section 3.1 to compare the signs and reset the
interval.
• Finally, repeat the last step until the absolute value of the midpoint's function value (i.e. the error
between the estimated and the true market option price) is less than 0.00001, or the number of
iterations reaches the maximum of 1000. In the former case, the estimated implied volatility is the
value of the midpoint in the last iteration. The latter indicates that the computation does not
converge, regardless of the number of iterations: there is no improvement even if the maximum
number of iterations is raised to 3000.
As mentioned in Section 2.5, for different configurations of the portfolio the Black-Scholes formula
has two forms, Equation 2.7 and Equation 2.9, depending on whether the asset price or the future
price is involved.
By using the future price:

F^F_BS(σ) = C(t, T, K, F(t, T), D(t, T)) = D(t, T)(F(t, T)Φ(d1) − KΦ(d2)),

d1 = [ln(F(t, T)/K) + 0.5σ²(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t).

By using the asset price:

F^S_BS(σ) = C(t, T, K, S_t, σ) = S_t Φ(d1) − K e^{−r(T−t)} Φ(d2),

d1 = [ln(S_t/K) + (r + 0.5σ²)(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t).
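The future-price formula together with the bisection scheme described above can be sketched as
follows (a standard bisection; unlike the procedure above, this sketch does not force the first midpoint
to 0.2, and the function names are ours):

```python
import math

def norm_cdf(x):
    """Standard normal CDF, via the error function from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_future(sigma, F, K, D, T):
    """Black-Scholes call on a future: C = D*(F*Phi(d1) - K*Phi(d2)); T is T - t."""
    d1 = (math.log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return D * (F * norm_cdf(d1) - K * norm_cdf(d2))

def implied_vol(c_market, F, K, D, T, lo=0.001, hi=1.0, tol=1e-5, max_iter=1000):
    """Bisection on sigma -> bs_call_future(...) - c_market."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        diff = bs_call_future(mid, F, K, D, T) - c_market
        if abs(diff) < tol:
            return mid
        if diff > 0:   # the call price increases with sigma: root lies below mid
            hi = mid
        else:
            lo = mid
    return None        # no convergence within max_iter iterations
```

Round-tripping a price generated with a known σ recovers that σ, a convenient sanity check for the
solver.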
In Figure 4.2 we show the framework of how we organize the related datasets. As a result, there
are in total four sets of implied volatility, denoted IV_F^{m1}, IV_F^{m2}, IV_S^{m1} and IV_S^{m2},
where the subscript F (S) means that we use the future price (asset price), and the superscript m1
(m2) means that we have used method 1 (method 2).
Figure 4.2: The framework of how we organize the related datasets.
4.2 Analysis of the computed implied volatility
We start our analysis by showing the implied volatilities computed from the contracts in the subset
mentioned in Section 1.2, named the 'Example Set'. Afterwards, we focus on the most reliable result,
IV_F, and present some plots and descriptive statistics.
4.2.1 Analysis and comparison based on the example
To keep the plots readable, instead of plotting the computed implied volatilities for the whole dataset,
we follow the 'Example Set' mentioned in Section 1.2, which contains 42 option contracts from
January 3, 2014 to March 1, 2014, and present the implied volatilities computed from these 42
observations in Figure 4.3.
In Figure 4.3, the results computed by method 1 (denoted as m1 in the legend) are marked 'o', while
the ones computed by method 2 (denoted as m2 in the legend) are marked '+'. We can see that the
computation methods do not cause much difference, except that in the index price case several
values are missing, at which points the computation cannot converge.
The three curves in Figure 4.3 correspond to the three input variables of the Black-Scholes formula:
the future price (the curve on top), the index price (the curve at the bottom) and, additionally, the
constructed price (the curve in the middle, closest to the one on top). Note that although the
constructed price is deduced from future prices, here it is used as input in the same 'asset price
involved' formula as the index price. The figure illustrates that:
• The implied volatilities based on the future price and the constructed price are similar, especially
for the last 20 observations, whose strike prices are closer to the daily index price than those of the
first 22 observations. A smaller difference between an option's strike price and the daily index price
thus means that the choice of input variable of the Black-Scholes formula has less influence on the
computed volatility.

Figure 4.3: Different types of the implied volatility based on the Example set. Implied volatility
computed by future price, asset price and constructed price in the two computation methods are
shown in different colors. The results computed by method 1 (denoted as m1 in the legend) are
marked 'o', while the ones by method 2 (denoted as m2 in the legend) are marked '+'.
• The implied volatility based on the index price is not accurate at the major maturities. As
mentioned in Section 1.2.2, the deduced constructed price can deviate slightly from the index price at
the major maturities. In this example, the maturity March 21, 2014 is one of the major maturities, and
the daily index price is 3074.43, different from the constructed price of 3062.73.
In a nutshell, the implied volatility based on the future price is the most accurate and reliable result.
Only when the future price is not available, i.e. when the maturity of an option is not one of the major
maturities, can the implied volatility based on the index price be considered as a supplement. What is
more, outside the major maturities the market is not as active and variable within the last 45 minutes
of trading, so the index price tends to be closer to the constructed price, which favours producing a
reliable implied volatility.
4.2.2 Analysis on the entire IV_F
In the last subsection we displayed a small subset of the implied volatilities, giving the reader a clear
view of the specific value for each contract, and we saw that the two computation methods give very
similar results. Now it is time to take a general look at the differences over the whole set of the two
resulting implied volatilities based on the future price (IV_F).
Here we discuss and compare the differences between the implied volatilities derived by method 1
and method 2 in the dataset IV_F. Plotting the results of method 1 against those of method 2 in
Figure 4.4, we can see that most of the points, marked black, fall around the diagonal line, meaning
that the two estimated implied volatilities are very similar. These points are called stable because the
estimates remain stable regardless of the calculation method. Only a small set of option contracts
produces differences between the two methods; these are marked red. The red points, with
differences between the estimates larger than 0.001, constitute only 0.65% of the whole dataset;
they are not as reliable as the black ones because they cannot withstand the influence of the
calculation method. We name the red part unstable. In a nutshell, we define the 'Stable' and
'Unstable' sets according to whether the observations inside are sensitive to the computation
method, i.e. whether the computed implied volatility remains relatively the same when the
computation method changes from method 1 to method 2. Here the cut-off point 0.001 is chosen
manually by visualization; other criteria may be considered in the future.
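The stable/unstable split described above amounts to thresholding the absolute difference between
the two estimates; a small sketch with hypothetical values (the function name is ours, and the 0.001
cut-off is the one chosen by visualization):

```python
def split_stable(iv_m1, iv_m2, cutoff=0.001):
    """Return the indices of 'stable' and 'unstable' contracts: a contract is
    stable when the two estimates agree within the cut-off."""
    stable, unstable = [], []
    for i, (a, b) in enumerate(zip(iv_m1, iv_m2)):
        (stable if abs(a - b) <= cutoff else unstable).append(i)
    return stable, unstable

# Hypothetical estimates: only contract 2 reacts to the change of method.
iv_m1 = [0.0101, 0.0123, 0.0300, 0.0115]
iv_m2 = [0.0102, 0.0123, 0.0342, 0.0114]
```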
Figure 4.4: Comparison of implied volatility derived by both methods.
Next we try to figure out which features of the unstable set make the results differ between the two
computation methods. The most prominent feature, shown in Figure 4.5, is that the observations
inside all belong to deep 'in-the-money' or deep 'out-of-the-money' call options. As defined in
Section 2.2.4, these two kinds of options normally have a strike far away from the daily asset price.
What is more, their trading sizes are normally quite small: deep in-the-money options are more
expensive than their potential profit, and thus unattractive to buyers, while deep out-of-the-money
options have little chance of making money in spite of their cheap prices.
(a) Stable set. (b) Unstable set.
Figure 4.5: Option (ask) prices of their call and put options for the contracts with future price involved.
Therefore, the observations in the unstable set represent an extreme case in which the option
contracts are unlikely to be exercised successfully. Hence, we do not focus on the unstable set any
further, and we continue our study based on the stable set. The dataset mentioned hereafter refers to
the stable part of the implied volatilities computed with the future price as input, denoted by
IV_F^{stable}.
Table 4.1: Descriptive statistics for the contracts with estimated implied volatilities IV_F^{stable}.

                     mean      sd   median  trimmed    mad     min     max    range  skew  kurtosis    se
Common variables
time to maturity   129.74   72.88   127.00   128.45   84.51    1.00  270.00   269.00  0.13     -1.02  0.41
strike            3282.91  403.15  3300.00  3283.72  407.71 1900.00 4500.00  2600.00 -0.04     -0.24  2.26
constructed price 3310.65  223.64  3253.37  3306.92  262.18 2777.45 3760.12   982.67  0.23     -1.17  1.25
Estimated Implied Volatilities
IV(m1)               0.01    0.00     0.01     0.01    0.00    0.00    0.04     0.03  1.44      5.93  0.00
IV(bid)              0.01    0.00     0.01     0.01    0.00    0.00    0.03     0.03  1.29      5.03  0.00
IV(ask)              0.01    0.00     0.01     0.01    0.00    0.00    0.04     0.04  1.55      6.58  0.00
IV(m2)               0.01    0.00     0.01     0.01    0.00    0.00    0.04     0.03  1.41      5.81  0.00
Table 4.1 displays a statistical summary of the current dataset IV_F^{stable}. We can see that in this
dataset both the mean and the median of the time to maturity are almost four months, as in the entire
IV_F. Further study and analysis of the dataset IV_F^{stable} are presented in the next chapter.
Chapter 5
Modeling and Predicting the Implied
Volatility
Now that we have estimated the implied volatility, in this chapter we move forward and try to fit
regression models to the dataset and give some interpretations. In view of the large size of the
dataset IV_F^{stable} (31896 observations in total), we focus on some specific subsets. Therefore, the
first section of this chapter explains how we choose the subsets, the second section presents the
exploratory analysis, and the third section describes how we fit the models.
5.1 The process of choosing a subset
As mentioned at the end of the last chapter, the dataset we now focus on is IV_F^{stable}, namely the
stable part of the implied volatilities computed with the future price as input. It contains 31896
observations, accounting for 99.35% of the dataset IV_F.
From the description of the dataset IV_F in Section 4.2, we have a rough idea that the distribution of
the time to maturity is relatively even, fluctuating around a length of four months. One of the biggest
challenges of this work is that the observations are highly mixed and overlapping, so clear patterns in
the implied volatility are hard to obtain. However, according to our knowledge of financial activity, the
movements of the market and the actions of the traders tend to show periodic variations. This raises
the question of whether it is possible to extract subsets that carry the significant seasonal
characteristics of the implied volatility variation.
If we fix a maturity and study the contracts expiring on that day, the descriptive analysis of
IV_F^{stable} in Table 4.1 shows that the earliest trading among these contracts happens at most 270
days prior to the maturity, and the latest one happens at least one day before it.
We first want to know more about the date and the maturity, i.e. the beginning and ending days of
the life of an option contract. Figure 5.1 shows the range of the trading dates for the contracts ending
at each of the ten distinct major maturities. Note that the information on the options was gathered
from January 2, 2014 to October 29, 2015.

Figure 5.1: Boxplot of trading dates for every maturity appearing in the dataset.

It is worth highlighting that records are missing from the original dataset for the period from March 21,
2014 to May 16, 2015, for unclear reasons. The lack
of information in this period actually reduces the number of contracts traded in June 2014 and
September 2014. What is more, the first maturity appearing (March 21, 2014) and the last three
maturities shown in this study (December 18, 2015, March 18, 2016 and June 17, 2016) are affected
by the beginning and ending days of the records. The box plots for the middle four maturities
(December 19, 2014, March 20, 2015, June 19, 2015 and September 18, 2015) seem to carry more
complete information.
Figure 5.2: Sample sizes and rates of each maturity. The red numbers at the top of the columns explainthe proportion of the sample size of each maturity in the total amount of the observations.
Figure 5.2 displays the sample sizes of the contracts related to each maturity. The red numbers
at the top of the columns give the proportion of the sample size of each maturity in the total number
of observations. The contracts with maturity on September 18, 2015 are considered a good choice
as our current target. First, they contain relatively complete information, holding 17.7% of the total
contracts. Secondly, this is the last maturity before the end of the records, so it supposedly carries the
most comprehensive information over the recording period. Further study is needed to check whether
its tendency is similar enough to that of the entire dataset IV_F to represent the latter.
5.2 Exploratory Analysis
For the chosen dataset, composed of the options with maturity on September 18, 2015, we display a
brief analysis to obtain information that could not be noticed in the previous dataset, where
observations of many maturities are mingled together.
Table 5.1: Statistical description for options whose maturities are at 2015-09-18.

                     mean      sd   median  trimmed    mad     min     max    range  skew  kurtosis    se
Common variables
time to maturity   129.14   72.50   133.00   129.31   90.44    1.00  270.00   269.00 -0.02     -1.16  0.97
strike            3466.14  379.90  3500.00  3484.44  370.65 2200.00 4350.00  2150.00 -0.43     -0.03  5.06
discount             1.00    0.00     1.00     1.00    0.00    1.00    1.00     0.00 -0.56     -0.60  0.00
constructed price 3475.59  186.65  3526.46  3496.08  169.78 2938.49 3753.79   815.30 -0.90      0.06  2.48
Input using future prices
future(bid)       3475.45  186.90  3526.00  3495.96  170.50 2938.00 3754.00   816.00 -0.90      0.06  2.49
future(ask)       3477.25  186.78  3529.00  3497.74  167.53 2940.00 3756.00   816.00 -0.90      0.06  2.49
Estimated Implied Volatilities
IV(m1)               0.01    0.00     0.01     0.01    0.00    0.01    0.04     0.03  2.44      8.94  0.00
IV(ask)              0.01    0.00     0.01     0.01    0.00    0.01    0.04     0.03  2.48      9.46  0.00
IV(bid)              0.01    0.00     0.01     0.01    0.00    0.01    0.03     0.03  2.37      8.19  0.00
IV(m2)               0.01    0.00     0.01     0.01    0.00    0.01    0.04     0.03  2.42      8.79  0.00
Market prices for options
call(bid)          212.67  200.78   155.20   181.66  172.72    0.10 1300.60  1300.50  1.42      2.19  2.67
call(ask)          217.08  205.12   157.95   185.31  175.69    0.20 1306.60  1306.40  1.42      2.13  2.73
put(bid)           201.70  162.41   155.40   180.19  145.00    0.00  779.20   779.20  1.07      0.55  2.16
put(ask)           204.66  166.24   156.55   182.49  147.30    0.00  797.20   797.20  1.08      0.58  2.21
We do a brief statistical analysis for options whose maturities are at 2015-09-18. Table 5.1 and
Figure 5.3 indicate that:
• Both the mean and the median of the time to maturity in the current dataset are around four months.
• For the implied volatility calculated by method 1 and method 2, the only differences occur in the
skew and kurtosis. Both values are slightly larger in the current dataset than in the entire set with
all maturities, indicating that although the distribution of implied volatility is still positively skewed
with a right tail, the observations are more concentrated.
• For the distribution of the constructed price, the skew is -0.90 (a heavy left tail), while the kurtosis
is 0.06, indicating that the distribution is much more concentrated in the current dataset than in the
entire set with all maturities. Even so, it is hard to claim that the constructed price follows the
log-normal distribution assumed by the Black-Scholes formula.
Figure 5.3: Pair plot for options whose maturities are at 2015-09-18. The seven variables plotted are
time to maturity, strike, constructed price, IV(m1), IV(ask), IV(bid) and IV(m2).
This preliminary analysis shows that the current dataset concentrates the information in the observations
and reinforces the correlation between our response variable (implied volatility) and the predictors (time
to maturity and strike).
(a) Perspective 1: from the side of time to maturity (b) Perspective 2: from the side of strike
Figure 5.4: 3D plots from the perspectives of both time to maturity and strike, used to check the
distribution of implied volatility.
Figure 5.4 shows the data from two perspectives in 3D. There are two types of 'smile'. The first type,
for small values of time to maturity, is relatively complete: the curves show inflection points, with the
implied volatility first decreasing and then bouncing back up as the strike increases. Most observations
belong to the second type, where the curves remain monotonically decreasing as the strike rises.
5.3 Regression Modeling
Here we propose several linear regression models to fit the dataset; a random forest model is also
applied as a non-linear complement, and its results are compared with those of the linear models.
5.3.1 Pre-processing
In pre-processing, we first randomly split the dataset into a train set (75% of the observations) and a
test set (25%). The response variable (implied volatility) is positively skewed in both sets, so, to fit the
following regression models and satisfy their assumptions, we apply the Box-Cox method (introduced
in Section 3.2.1) to the train set to bring the distribution of the implied volatility closer to normality.
(a) train (b) test
Figure 5.5: The distribution of response variable before and after Box-Cox transformation.
The parameter λ = −1.59 estimated on the train set is applied to both the train and test sets so that
they stay on the same scale. The implied volatility in this case ranges from 0.0072 to 0.0379, so the
transformed implied volatility ranges between -1604.2571 and -113.8081. Figure 5.5 shows that the
method works well: the transformed response variable (denoted 'bcIV ') approximately follows a normal
distribution. Since the Box-Cox transformation already brings the order of magnitude of the response
variable to levels similar to those of strike and time to maturity, no further scaling is necessary.
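The transformation step can be sketched in Python. This is a minimal illustration with made-up IV values; only λ = −1.59 and the 0.0072–0.0379 range come from the text, and the function simply mirrors the standard Box-Cox definition:

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) for lam == 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

# lambda is estimated on the train set only, then reused on the test set
lam = -1.59
iv = np.array([0.0072, 0.0150, 0.0379])  # illustrative implied volatilities
bc_iv = boxcox(iv, lam)                  # large negative values, roughly -1607 to -114
```

With a negative λ the transform is still monotonically increasing, so the ordering of the implied volatilities is preserved after transformation.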
Next we compute the Mahalanobis distance for the set of three variables (time to maturity, strike and
bcIV, the estimated implied volatility after the Box-Cox transformation) to find extreme values relatively
far away from the main cluster. Points whose robust Mahalanobis distances are in the largest 1% of the
dataset are treated as extreme values and marked green in the correlation pair plot and 3D plot in
Figure 5.6, although those points cannot necessarily be treated as outliers from the cluster.
(a) Pairs plot. (b) 3D plot.
Figure 5.6: Plots with extreme values detected by the robust Mahalanobis distance. Points whose robust
Mahalanobis distances are in the largest 1% of the dataset are treated as extreme values and marked
green.
As shown in Figure 5.6, instead of selecting several complete 'smile' curves (for example, the first
type of smile shown in Figure 5.4), the Mahalanobis distance picks out several observations with high
volatility. As mentioned in Section 3.2.2, the Mahalanobis distance is calculated along each principal
component axis and is therefore scale-invariant, so the influence of the different scales of the three
variables is eliminated. However, looking back at the first type of curves, with inflection points in the
3D plots, these observations probably carry significant information about how the implied volatility is
distributed when the time to maturity is small. Thus the Mahalanobis distance does not give convincing
evidence for outlier detection, and we keep all observations in the further study. The extreme values
selected here remain marked in green, and we explain later whether to keep them or remove them as
outliers.
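The detection step can be sketched as follows. This is a plain-covariance illustration in Python on synthetic stand-in data; the thesis uses a robust (e.g. MCD-based) estimate of location and scatter, which replaces the sample mean and covariance below:

```python
import numpy as np

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row of X to the sample mean.
    The thesis uses a robust (MCD-based) estimate; here we use the plain
    sample mean/covariance for brevity."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))           # stand-in for (maturity, strike, bcIV)
d2 = mahalanobis_d2(X)
extreme = d2 >= np.quantile(d2, 0.99)    # flag the largest 1% as extreme values
```

Because the distance is computed in the metric of the (inverse) covariance, the different scales of the three variables cancel out, which is the scale-invariance mentioned above.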
5.3.2 Linear Regression
After pre-processing, we now fit linear models. Market implied volatility shows a pattern known as the
volatility smile, driven by the time to maturity and the strike price of the option. Our goal is therefore
to build parametric models for implied volatility based on these two variables. In this section, we first
construct three ordinary least squares models of increasing complexity, and then apply both least
absolute deviation regression and quantile regression based on the third model. Table 5.2 gives an
overview of the response variable, the original covariates, and the covariates derived from them. Note
that in both the linear regressions and the random forests we use the Box-Cox-transformed implied
volatility as the response variable, so that all models and predictions stay on the same scale, even
though random forests do not require the response variable to be normally distributed as linear
regression does.
Table 5.2: Overview of the response variable, the original covariates and the covariates generated from
them for the regression models, with notation and descriptions.
Variables Descriptions
Response Y Transformed implied volatility (by Box-Cox)
Covariates
X1 Strike
X2 Time to maturity
X3 Time to maturity × strike
X4 1 / time to maturity
X5 Strike²
The models are compared in terms of goodness of fit. The coefficient of determination (R2) is used as
the performance statistic:

R2 = SS_Regression / SS_Total = 1 − SS_Error / SS_Total .
Model 1 : Yi = β0 + β1X1,i + β2X2,i + εi, (5.1)
Model 2 : Yi = β0 + β1X1,i + β2X2,i + β3X3,i + εi, (5.2)
Model 3 : Yi = β0 + β1X1,i + β2X2,i + β3X3,i + β4X4,i + β5X5,i + εi, (5.3)
where εi is the error term.
To avoid repeating similar derivations and to give a clear overview, the general expressions of the three
linear models are shown in Equations 5.1-5.3, with covariates as defined in Table 5.2. The models are
built in the following steps:
1. The initial model, Model 1, regresses the transformed implied volatility on a simple additive
combination of our two original covariates, time to maturity and strike.
2. In Model 2, an interaction term is added. From the relationship between the two covariates visible
in Figure 5.6(a), strike and time to maturity seem to have a non-linear relationship. This suggests
that their interaction might be important, so we add the interaction term of time to maturity and
strike to the initial model.
3. Two further terms are added in Model 3: the reciprocal of time to maturity and the square of strike.
Figure 5.6(a) shows that time to maturity varies inversely with implied volatility. Moreover, the
'volatility smile' phenomenon is visible, suggesting a relationship between the square of strike and
implied volatility.
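The step-wise construction can be illustrated with a small OLS fit of the Model 3 design on synthetic data. The coefficients below are made up for illustration and are not the thesis estimates; the covariate layout follows Table 5.2:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
maturity = rng.uniform(1, 270, n)          # X2, days to maturity
strike = rng.uniform(2200, 4350, n)        # X1

# Model 3 design matrix: intercept, X1, X2, X1*X2, 1/X2, X1^2
X = np.column_stack([np.ones(n), strike, maturity,
                     strike * maturity, 1.0 / maturity, strike**2])

# synthetic response with known (illustrative) coefficients plus noise
beta_true = np.array([1400.0, -0.4, -6.5, 0.0016, 450.0, -0.0001])
y = X @ beta_true + rng.normal(scale=5.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate
resid = y - X @ beta_hat
r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
```

On such synthetic data the fit recovers the generating signal almost exactly; on the real dataset the same design yields the R2 values reported in Table 5.3.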
Table 5.3 illustrates that, for the train set, the adjusted R2 values of the three models increase gradually
from 0.8185 and 0.8540 to 0.8668, so the covariates explain at most 86.68% of the variance of the
response variable,
Table 5.3: Summary of covariates on regression functions
Estimate Std. Error t value Pr(>|t|)
Model 1
β0 1394.7363 16.2883 85.63 0.0000
β1 -0.5904 0.0045 -131.42 0.0000
β2 -1.3634 0.0236 -57.68 0.0000
Multiple R2: 0.8186, Adjusted R2: 0.8185
Model 2
β0 2395.9840 34.4390 69.57 0.0000
β1 -0.8872 0.0101 -87.97 0.0000
β2 -7.5530 0.1940 -38.94 0.0000
β3 0.0019 0.0001 32.10 0.0000
Multiple R2: 0.8541, Adjusted R2: 0.8540
Model 3
β0 1424.2046 107.3273 13.27 0.0000
β1 -0.3830 0.0587 -6.53 0.0000
β2 -6.4615 0.1970 -32.79 0.0000
β3 0.0016 0.0001 27.58 0.0000
β4 447.5932 23.7503 18.85 0.0000
β5 -0.0001 0.0000 -8.14 0.0000
Multiple R2: 0.8670, Adjusted R2: 0.8668
and the model efficiency improves at each step. All covariates in the three models are significant, as
indicated by very small p-values. This may, however, partly reflect an underestimation problem: for such
a variable response, we only have two sources of information (time to maturity and strike). Besides the
models above, we also tried more complicated combinations of time to maturity and strike, such as
higher-order polynomial regressions. However, the resulting R2 values showed no significant
improvement and the coefficients became harder to interpret; it is a trade-off between complexity and
conciseness. Since we are going to apply both robust regression and quantile regression, which bring
further perspectives on the relationship between covariates and response variable, we prefer a
relatively simple and efficient model and keep Model 3 as the best one so far.
The normal quantile-quantile plot is shown at the top right of Figure 5.7, displaying the standardized
residuals in ascending order from left to right. The plotted points should lie close to a straight line if the
residuals follow a normal distribution. Here we see a clear and large deviation from this theoretical line
for small and large residuals, indicating a heavy-tailed distribution, i.e., a distribution with a higher
probability of events occurring in its tails [43]. The heavy-tail problem is slightly reduced compared with
Model 1 and Model 2, but it is still obvious, so the fit can hardly be considered a good one.
Figure 5.7: Diagnostic plots of regression Model 3. The heavy-tail problem indicates that the fit can
hardly be considered a good one.
It is worth mentioning that we also fitted the three models above to the dataset with the extreme values
found by the Mahalanobis distance excluded. However, the test set cannot be put on the same scale as
the train set, because the Mahalanobis distance uses each sample's own mean and covariance. If a
model is trained on the train set without extreme values, the R2 on the train set improves, but the
performance of the new model on the untouched test set can be slightly worse than that of the model
trained with the extreme values included. A solution might be to calculate the Mahalanobis distance for
the test set using the mean and covariance of the train set; since the R package we used does not
allow the mean and covariance to be set, this approach is left for future work due to time limits. We
therefore use all observations in the train set, including the extreme values, and next explore robust
regression and quantile regression, which are less influenced by the latent 'outliers', based on the
currently best model, Model 3.
Robust linear regression
Robust regression analysis provides an alternative to least squares regression when fundamental
assumptions are unfulfilled by the nature of the data [44]. As mentioned in Section 3.3.1, L1-norm
robust regression is an alternative to classical regression based on ordinary least squares when the
data are contaminated with outliers or influential observations.
Table 5.4: The coefficients of the five covariates and their significance in the robust linear regression.
Estimate Std. Error t value Pr(>|t|)
β0 1337.53410 73.01018 18.31983 < 2.2e-16
β1 -0.37826 0.03993 -9.47298 < 2.2e-16
β2 -5.68856 0.13404 -42.43847 < 2.2e-16
β3 0.00134 0.00004 34.07648 < 2.2e-16
β4 389.05375 16.15629 24.08064 < 2.2e-16
β5 -0.00006 0.00001 -10.59057 < 2.2e-16
R2 is 0.8640, R2 on test is 0.8669
Note that the robust regression is one specific case of quantile regression when the quantile required
is equal to 0.5. Thus before showing the whole picture of fitted values from quantile regression, we first
provide an analysis of the robust linear model based on the median of the response variable. Table 5.4
describes the coefficients of the five covariates and their significance in this robust regression.
Compared with the same covariates in the classical linear model, Model 3 in Table 5.3, the coefficients
and their standard errors change only slightly. The R2 is 0.8640 for the train set and 0.8669 for the test
set, about 0.2% smaller than for classical linear regression, suggesting that robust linear regression
does not handle the outliers better than classical linear regression in this case.
Quantile linear regression
After the robust regression, we extend our study to focus not only on the median but also on the set of
quantiles {0.05, 0.10, 0.25, 0.75, 0.90, 0.95} of the response variable.
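For a quantile τ, quantile regression minimizes the pinball (check) loss, which can be written as a linear program. The following is our own minimal Python/scipy sketch for illustration; the thesis uses an R routine, and the data here are synthetic:

```python
import numpy as np
from scipy.optimize import linprog

def quantile_reg(X, y, tau):
    """Fit beta minimizing sum_i rho_tau(y_i - x_i'beta) as a linear program.
    Variables: beta = b_pos - b_neg (free), residual split r = u - v, u,v >= 0.
    Objective: tau*sum(u) + (1-tau)*sum(v)."""
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * p + 2 * n), method="highs")
    z = res.x
    return z[:p] - z[p:2 * p]          # recover the free coefficients

# example: median (tau = 0.5) regression on a noisy line
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
beta_med = quantile_reg(X, y, 0.5)
```

Setting τ = 0.5 gives exactly the L1-norm (least absolute deviation) regression of the previous subsection; other values of τ shift the fitted line towards the corresponding conditional quantile.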
Table 5.5: The summary of coefficients for different quantiles.
     tau=0.05  tau=0.10  tau=0.25  tau=0.50  tau=0.75  tau=0.90  tau=0.95
β0   764.27    1120.48   1582.73   1276.90   934.43    508.14    576.51
β1   0.15      -0.12     -0.50     -0.35     -0.20     -0.05     -0.09
β2   -7.99     -7.52     -6.38     -5.42     -4.13     -2.12     -2.23
β3   0.0022    0.0021    0.0016    0.0013    0.0008    0.0001    0.0001
β4   466.89    490.85    418.87    351.05    361.74    373.10    388.88
β5   -0.00018  -0.00013  -0.00005  -0.00006  -0.00007  -0.00007  -0.00006
Figure 5.8 visualizes the change in the quantile coefficients reported in Table 5.5, along with
confidence intervals for all coefficients. Each black dot is the corresponding variable's coefficient for a
quantile τ in the set {0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95}. The red lines are the ordinary least
squares estimate and its confidence interval.
We find that the absolute value of the coefficient of 'time to maturity' is smaller (i.e., its negative
influence on the response variable is weaker) for larger quantiles of implied volatility. The coefficient of
'time to maturity × strike' behaves the same way, but with a positive rather than negative sign. The
covariate 'strike²' has a more significant impact on the lower quantiles of implied volatility than on the
upper quantiles.
Figure 5.8: Plots of the coefficients in the quantile linear regression. Each black dot is the
corresponding variable's coefficient for a quantile τ in the set {0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95}.
The red lines are the ordinary least squares estimate and its confidence interval.
To display the effect and goodness of fit of the quantile regression intuitively, Figure 5.9 shows the
fitted values of the response variable at the lower quantile (5%), the median and the higher quantile
(95%). The fitted values of the lower and higher quantile regressions basically cover the boundaries of
the entire distribution of the response variable, and the median regression captures its major features
as well.
In fact, prediction intervals provide a more robust way of assessing the performance of the linear
models, since they do not require the assumption of normally distributed residuals. In particular, to
explain and visualize the results more directly, we define a prediction interval whose boundaries are
given by lower and higher quantiles of the response variable. In this case, shown in Figure 5.10, we
choose the quantiles 0.05 and 0.95 to create a 90% prediction interval, computing the fitted values at
the 5% and 95% quantiles on the train set as the two boundaries. We then check whether the true
values of the test observations are contained in the prediction interval.
In this way, 83.27% of the test observations are covered correctly by the prediction interval, i.e. the
range between the predictions of the 5% and 95% quantile regressions. This is a fairly good result, and
it will be compared with the prediction accuracy of the next model, random forests.
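The correct rate used here is simply the empirical coverage of the interval on the test set; a sketch with hypothetical inputs (not the thesis data):

```python
import numpy as np

def coverage_rate(y_true, lo_pred, hi_pred):
    """Fraction of observations whose true value lies inside [lo, hi].
    In the text, lo/hi are the 5% and 95% quantile-regression predictions."""
    y_true, lo_pred, hi_pred = map(np.asarray, (y_true, lo_pred, hi_pred))
    inside = (y_true >= lo_pred) & (y_true <= hi_pred)
    return inside.mean()

# toy check: an interval built from the sample's own 5%/95% quantiles
# should cover about 90% of the points
rng = np.random.default_rng(4)
y = rng.normal(size=10000)
lo, hi = np.quantile(y, 0.05), np.quantile(y, 0.95)
rate = coverage_rate(y, np.full_like(y, lo), np.full_like(y, hi))
```

In the thesis the bounds vary per observation (they come from the fitted quantile models), but the coverage computation is the same.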
Figure 5.9: Quantile linear regressions for the transformed implied volatility against strike and time to
maturity. The fitted values of the response variable are shown at the lower quantile (panels (a), (d)),
the median (panels (b), (e)) and the higher quantile (panels (c), (f)).
Figure 5.10: 90% prediction interval for the test set in quantile linear regression.
5.3.3 Quantile Random Forest
For datasets in which the relationship between the response variable and the covariates follows a
linear structure, linear regression is usually more efficient and accurate. If, instead, the relationship is
highly non-linear or complicated, tree-based models may give better results and explanations. The
linear regression above shows an acceptable performance, with 83.27% of the test observations
covered correctly by the prediction interval, so our next task is to assess the fit of quantile random
forests.
As described in Section 3.3.2, the logic of quantile regression forests is simple: the model records not
only the mean of the response values but all observed response values in every leaf of each tree in the
forest. Based on the full information in each leaf, we can therefore obtain the full conditional distribution
of the response for every observation and define a prediction interval as well. In this case, we take the
5% and 95% quantiles in each leaf, based on the train set, as the two boundaries. After training, each
test observation passes through every tree, producing as many predictions as there are trees. A test
observation falling into a leaf receives a prediction in the range of the boundaries with 90% probability,
and we can easily check afterwards whether the range contains the true value. Note that other
summaries of the distribution, such as the median, can also be calculated by the quantile random
forest. Figure 5.11 shows the true value and the prediction interval for each observation in the test set,
and Figure 5.12 shows the fitted values of the response variable at the lower quantile (5%), median
(50%) and higher quantile (95%) for the random forests.
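The leaf-pooling idea can be sketched with a deliberately tiny "forest" of one-split trees. This is our own stub for illustration, not the implementation used in the thesis; real quantile regression forests grow deep trees and sample covariates at each node:

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_stump(X, y):
    """One-split 'tree' that keeps ALL training responses in each leaf,
    which is the key idea of quantile regression forests."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.sum() < 2 or (~left).sum() < 2:
                continue
            sse = ((y[left] - y[left].mean())**2).sum() \
                + ((y[~left] - y[~left].mean())**2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, y[left].copy(), y[~left].copy())
    return best[1:]

def forest_quantiles(X, y, x_new, n_trees=200, qs=(0.05, 0.5, 0.95)):
    """Pool the leaf responses that x_new falls into across bootstrap trees,
    then read off empirical quantiles of the pooled conditional sample."""
    pooled = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))        # bootstrap resample
        j, t, y_left, y_right = grow_stump(X[idx], y[idx])
        pooled.append(y_left if x_new[j] <= t else y_right)
    return np.quantile(np.concatenate(pooled), qs)

x = rng.uniform(0, 10, 300)
y = x + rng.normal(scale=0.3, size=300)              # toy data: y ≈ x
q05, q50, q95 = forest_quantiles(x.reshape(-1, 1), y, np.array([9.0]))
```

With deep trees the leaves become small and localized, so the pooled conditional sample, and hence the interval [q05, q95], adapts to each observation, which is what Figure 5.11 displays.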
As mentioned in Section 3.3.2, one advantage of random forests is that, because not all covariates are
considered when building each node, the model can avoid over-fitting as long as the number of trees is
large enough. We therefore build 1000 trees without a depth limit, using mean squared error as the
split criterion and bootstrap resampling. As discussed above, the predicted values at the 5% and 95%
quantiles form the lower and upper bounds of the prediction interval.
After training, the R2 calculated from the median predictions of this random forest is a surprising
0.9907, meaning that the covariates explain approximately 99.07% of the variance of the response
variable. Note, however, that since different observations contribute different amounts of information,
R2 may not be the best way to compare the results of the linear regression (specifically, the
median-based L1-norm regression) with those of the random forest (specifically, its median prediction).
We therefore use the rate at which the true values fall inside the 90% prediction interval (the correct
rate) to compare the two types of models. For the random forest, the correct rate is 0.8284, meaning
that 82.84% of the test observations are covered by their intervals. This rate is lower than the 0.8327
obtained with the linear regression. As expected, the random forest's prediction intervals describe the
upper and lower bounds of each individual observation, so the intervals are tighter: more accurate for
well-fitted observations, but covering slightly fewer true values overall.
Figure 5.11: 90% prediction interval for the test set in the quantile random forest.
Figure 5.12: Quantile random forests for the transformed implied volatility against strike and time to
maturity. The fitted values of the response variable are shown at the lower quantile (panels (a), (d)),
the median (panels (b), (e)) and the higher quantile (panels (c), (f)).
5.3.4 A comparison in test set
To compare how many test observations fall correctly into the 90% prediction interval generated from
the train set (the correct rate) for the two regression models, Table 5.6 gives a brief summary.
Table 5.6: Summary of two regression models’ results
Model Structure Correct Rate
Quantile Linear Regression Model 3 83.27%
Quantile Random Forests 1000 trees 82.84%
Moreover, because the test set contains 1411 observations, the figures above only give a general view
of the fit and of the tendency of the implied volatility together with its prediction interval. For clearer
visualization, we show the predictions for the first 20 observations.
Figure 5.13(a) shows that the quantile linear regression gives prediction intervals of approximately
equal width for all observations, while in Figure 5.13(b) the quantile random forest produces wider
intervals precisely where the differences between true values and median predictions are larger.
Overall, the prediction intervals of the quantile random forest are tighter than those of the quantile
linear regression.
This is probably why the quantile random forest has a lower correct rate than the quantile linear
regression. Even though five points fail to fall into the random forest's prediction intervals, four of them
are very close to the boundaries; in the quantile linear regression, four points miss the interval, but only
two are close to the boundaries. This indicates that the quantile random forest gives accurate
predictions and tight prediction intervals for well-fitted observations, while for badly-fitted points it
sacrifices precision (i.e., widens the prediction range) to gain more room to make a correct decision
(i.e., to cover the true value with 90% probability).
(a) Linear regression.
(b) Random Forests.
Figure 5.13: Two methods with median prediction and quantile boundaries.
Chapter 6
Conclusions
6.1 Achievements
The motivation for this work was to develop understanding in the field of options trading and to
propose a way to compute and estimate an important parameter, the implied volatility. Three main
steps were implemented.
Firstly, we reorganized the original dataset and prepared the environment for the calculation of implied
volatility. By working out the relationships between the options, the futures contracts, the discount
values and the price of the underlying index, we obtained a clear picture of the evolution of the option
prices, and combined the put options with the call options using Put-Call Parity.
Secondly, with respect to the valuation of derivative contracts in finance, the volatility of the price of the
underlying asset is unknown. We derived the implied volatility for each contract through the
Black-Scholes formula and used the bisection method to compute its estimate. After calculating the
implied volatility, we analyzed the stability of the computation and compared the differences between
the results. In total we generated four types of implied volatility from two forms of the Black-Scholes
formula, based on the two inputs (futures prices or index prices) and two calculation methods for the
range of option prices.
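The calculation can be sketched as follows. This is a minimal Python illustration of the index-price form of Black-Scholes plus bisection; the futures-price form differs only in the pricing function, and the numerical inputs are made up:

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on an index at price S."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection: the Black-Scholes price is increasing in sigma,
    so the implied volatility is the unique root of bs_call(sigma) = price."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Because the pricing formula has no closed-form inverse in sigma, a bracketing method like this is the standard choice; monotonicity guarantees convergence.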
Thirdly, we selected a subset that exhibits some periodic features and has relatively complete trading
information, and used it to estimate and predict the implied volatility with linear and tree-based
regressions. Not only was median regression considered, but quantile methods were applied to
establish prediction intervals and a more general view of the dataset. Through the prediction intervals
built for both the linear and the random forest regressions, we compared the results and analyzed the
advantages and disadvantages of both models. Based on our analysis, the models can explain most of
the observations and give reasonable predictions.
From the data point of view, we studied two kinds of dataset in this work: one containing market option
trading information, used to derive the implied volatility, and the other containing information on the
volatility smile phenomenon, used to model the implied volatility as a function of time to maturity and
strike. Due to the large size of the datasets, for each kind we extracted one subset to capture the latent
patterns and display the results. The 'Example Set' represented the first kind, containing information
from Jan 03, 2014 to Mar 21, 2014, while the 'maturity at Sept 18, 2015' set gathered the options
expiring at that maturity.
6.2 Directions for Future Work
Further research building on this analysis of real option trades and their implied volatility is expected,
to exploit the full information in the option contracts and reduce the uncertainty about the behavior of
implied volatility for traders.
As for fitting models to the dataset, we have already benefited from studying a subset with significant
features instead of the entire dataset. In the future, the subsets can be chosen more specifically, for
example, options with one-day time to maturity, options that are only in-the-money or only
out-of-the-money, options whose strikes are very close to the current index price, etc.
We briefly explored Gradient Boosting Regression Trees, but, given time constraints and the
computational limits of the machine, calibrating the parameters through cross-validation was not easy.
Recall that the ensemble technique we applied, bagging (including random forests), first creates
several subsets of the original training dataset through bootstrap resampling, then fits each decision
tree independently using all or a subset of the predictors, and finally combines all trees into a single
predictive model by simple averaging. Boosting works in a similar way, but unlike random forests it
focuses on fitting the residuals left by the previously generated trees. It is a forward, stagewise
procedure: at each iteration, a new base learner is added that emphasizes the observations fitted
poorly by the previous trees and is trained to reduce the error of the current ensemble. By using the
quantile loss function, a general distribution of the response variable can be captured: setting a specific
quantile for the loss function yields a Gradient Boosting Regression Tree that explains the distribution
of the response at that quantile, so its conditional quantile interval can be predicted as well. Applying
Gradient Boosting Regression Trees could therefore improve the prediction accuracy in further studies.
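The stagewise idea can be sketched with least-squares stumps. This is our own minimal illustration on toy data; swapping the squared-loss residual for the pinball-loss gradient gives the quantile version described above:

```python
import numpy as np

def fit_stump(X, y):
    """Least-squares decision stump: best (feature, threshold, left/right mean)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            pl, pr = y[left].mean(), y[~left].mean()
            sse = ((y[left] - pl)**2).sum() + ((y[~left] - pr)**2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, pl, pr)
    return best[1:]

def predict_stump(stump, X):
    j, t, pl, pr = stump
    return np.where(X[:, j] <= t, pl, pr)

def boost(X, y, n_rounds=50, lr=0.1):
    """Forward stagewise boosting with squared loss: each new stump fits the
    residuals (the negative gradient) of the current ensemble.  Replacing the
    residual with the pinball-loss gradient yields the quantile variant."""
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)
        pred += lr * predict_stump(stump, X)
    return pred

rng = np.random.default_rng(5)
x = rng.uniform(0, 6, 300)
y = np.sin(x) + rng.normal(scale=0.1, size=300)   # toy non-linear signal
pred = boost(x.reshape(-1, 1), y)
```

The shrinkage factor `lr` is one of the parameters that would need cross-validation in practice, which is exactly the calibration cost mentioned above.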
As noted earlier, if a model is trained on a train set without the extreme values, the R2 on the train set
improves but the performance on the untouched test set can be slightly worse than for the model
trained with the extreme values included. A solution might be to calculate the Mahalanobis distance for
the test set using the mean and covariance of the train set; this approach is left for future work.
In Chapter 4 we explained the computation of the implied volatility with the asset price as input, IVS.
Due to time limitations, we have not found a straightforward way to compare the details of the two
datasets beyond a basic statistical description. The two datasets are supposed to complement each
other, as explained at the beginning of the thesis, and this latent relationship deserves further attention.
In a nutshell, the work developed in this thesis on analyzing a dataset of real option trading has
established a framework that has so far been explored on only one branch of interesting subsets and
its properties. If desired, it can be extended to give more realistic suggestions in the future.
Bibliography
[1] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Econ-
omy, 81(3):637–654, 1973.
[2] R. C. Merton. Theory of rational option pricing. The Bell Journal of Economics and Management
Science, pages 141–183, 1973.
[3] P. Tankov. Financial modelling with jump processes, volume 2. CRC press, 2003.
[4] J. D. MacBeth and L. J. Merville. An empirical examination of the Black-Scholes call option pricing
model. The Journal of Finance, 34(5):1173–1186, 1979.
[5] G. N. Gregoriou. Stock market volatility. CRC press, 2009.
[6] G. W. Schwert. Stock market volatility. Financial Analysts Journal, 46(3):23–34, 1990.
[7] S. J. Koopman, B. Jungbacker, and E. Hol. Forecasting daily variability of the S&P 100 stock index
using historical, realised and implied volatility measurements. Journal of Empirical Finance, 12(3):
445–475, 2005.
[8] T. E. Day and C. M. Lewis. Stock market volatility and the information content of stock index options.
Journal of Econometrics, 52(1-2):267–287, 1992.
[9] S. L. Heston. A closed-form solution for options with stochastic volatility with applications to bond
and currency options. The Review of Financial Studies, 6(2):327–343, 1993.
[10] R. C. Merton. Option pricing when underlying stock returns are discontinuous. Journal of Financial
Economics, 3(1-2):125–144, 1976.
[11] H. Park, N. Kim, and J. Lee. Parametric models and non-parametric machine learning models for
predicting option prices: Empirical comparison study over KOSPI 200 index options. Expert Systems
with Applications, 41(11):5227–5237, 2014.
[12] R. Koenker and G. Bassett Jr. Regression quantiles. Econometrica: journal of the Econometric
Society, pages 33–50, 1978.
[13] I. Takeuchi, Q. V. Le, T. D. Sears, and A. J. Smola. Nonparametric quantile estimation. Journal of
Machine Learning Research, 7(Jul):1231–1264, 2006.
[14] N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7(Jun):
983–999, 2006.
[15] F. Zikes and J. Baruník. Semi-parametric conditional quantile models for financial returns and
realized volatility. Journal of Financial Econometrics, 14(1):185–226, 2014.
[16] D. Brigo and F. Mercurio. Interest rate models-theory and practice: with smile, inflation and credit.
Springer Science & Business Media, 2007.
[17] J. C. Cox, J. E. Ingersoll Jr, and S. A. Ross. A theory of the term structure of interest rates. In
Theory of Valuation, pages 129–164. World Scientific, 2005.
[18] P. Wilmott. Paul Wilmott introduces quantitative finance. John Wiley & Sons, 2007.
[19] J. Voit. The statistical mechanics of financial markets. Springer Science & Business Media, 2013.
[20] H. R. Varian. The arbitrage principle in financial economics. The Journal of Economic Perspectives,
1(2):55–72, 1987.
[21] J. C. Cox and M. Rubinstein. Options markets. Prentice Hall, 1985.
[22] P. Giot. Relationships between implied volatility indexes and stock index returns. The Journal of
Portfolio Management, 31(3):92–100, 2005.
[23] W. F. Sharpe, G. J. Alexander, and J. V. Bailey. Investments, volume 6. Prentice-Hall Upper Saddle
River, NJ, 1999.
[24] D. Bachrathy and G. Stepan. Bisection method in higher dimensions and the efficiency number. Periodica Polytechnica Mechanical Engineering, 56(2):81, 2012.
[25] R. Sakia. The Box-Cox transformation technique: a review. The Statistician, pages 169–178, 1992.
[26] P. C. Mahalanobis. On tests and measures of group divergence, Part I: Theoretical formulae. 1930.
[27] G. E. Box and D. R. Cox. An analysis of transformations. Journal of the Royal Statistical Society.
Series B (Methodological), pages 211–252, 1964.
[28] J. W. Osborne. Improving your data transformations: Applying the Box-Cox transformation. Practical Assessment, Research & Evaluation, 15(12):1–9, 2010.
[29] P. C. Mahalanobis. On the generalised distance in statistics. Proceedings of the National Institute of Sciences of India, pages 49–55, 1936.
[30] P. Filzmoser and K. Hron. Outlier detection for compositional data using robust methods. Mathe-
matical Geosciences, 40(3):233–248, 2008.
[31] P. J. Rousseeuw and B. C. Van Zomeren. Unmasking multivariate outliers and leverage points.
Journal of the American Statistical Association, 85(411):633–639, 1990.
[32] A. S. Hadi. Identifying multiple outliers in multivariate data. Journal of the Royal Statistical Society.
Series B (Methodological), pages 761–771, 1992.
[33] R. Farebrother. The historical development of the L1 and L∞ estimation procedures 1793–1930. Statistical Data Analysis Based on the L1 Norm and Related Methods, North-Holland, Amsterdam, pages 37–63, 1987.
[34] Y. Li and J. Zhu. L1-norm quantile regression. Journal of Computational and Graphical Statistics, 17(1):163–185, 2008.
[35] G. A. Seber and A. J. Lee. Linear regression analysis, volume 936. John Wiley & Sons, 2012.
[36] D. Pollard. Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7
(2):186–199, 1991.
[37] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[38] J. Mingers. An empirical comparison of pruning methods for decision tree induction. Machine
Learning, 4(2):227–243, 1989.
[39] M. Mehta, J. Rissanen, R. Agrawal, et al. MDL-based decision tree pruning. In KDD, volume 21, pages 216–221, 1995.
[40] M. J. Kearns and Y. Mansour. A fast, bottom-up decision tree pruning algorithm with near-optimal
generalization. In ICML, volume 98, pages 269–277, 1998.
[41] G. James, D. Witten, T. Hastie, and R. Tibshirani. An introduction to statistical learning, volume
112. Springer, 2013.
[42] A. Liaw, M. Wiener, et al. Classification and regression by randomForest. R News, 2(3):18–22, 2002.
[43] P. W. Holland and R. E. Welsch. Robust regression using iteratively reweighted least-squares. Communications in Statistics - Theory and Methods, 6(9):813–827, 1977.
[44] R. Maronna, R. D. Martin, and V. Yohai. Robust statistics. John Wiley & Sons, Chichester, 2006.