
Handouts for the course Financial Econometrics and Empirical Finance I

    Francesco Corielli

    October 10, 2011

    Introduction

The course aims to offer students a selection of probabilistic and statistical applications compiled according to a twofold criterion: they should require the introduction of only a few new statistical tools and make use, as much as possible, of what the student should already know; and they should be, as far as possible given the introductory level of the course, "real world" tools, that is, simplified versions of tools really, and sometimes heavily, applied in the markets.

The course also aims at a much more difficult task: trying to show how probabilistic and statistical thinking can actually be useful for understanding and surviving markets.

Historical experience tells us that the first aim is achieved for most students, and this is enough for getting a good grade at the end of the course. History is still too short for deciding anything about the accomplishment of the second, much more relevant, task.

At the beginning of a course like this, students are concerned with prerequisites. The statistical and probabilistic results which are considered known at the beginning of this course are described in detail in the syllabus of the preliminary course, available on the course workspace in Learning Space.

The syllabus, which roughly corresponds to a standard introductory undergraduate course in probability and statistics (so that it can be taken for granted), should be read with care and compared with one's own knowledge of the topic. The contents of the syllabus shall not be further discussed during the course (there is a preliminary course dedicated to this). The teachers of this course are fully available to help any student with suggestions for readings useful to complete preliminary knowledge. Experience tells us that sometimes students do not remember at once what they did indeed study (and maybe got a good grade for), but that memory of past knowledge comes back easily when asked for.


Standard tools of analysis and matrix theory are also required.

For an even better understanding of what is taken for granted (statistics- and probability-wise), a summary of the relevant definitions and theory is added as an appendix to these handouts. This is not to be intended as a standalone text in basic statistics and probability: all students should read it in order to check their level of knowledge of basic statistics and to ask the teachers for help in case of problems.

The main theoretical tools introduced in this course, and perhaps new for most students, are:

    some nonparametric statistics useful for value at risk computations

    an introductory but rather complete treatment of the multivariate linear model

principal component analysis

Using these tools, and more basic notions of probability and statistics, the course describes applications in the fields of:

    value at risk computation

    factorial risk models

    style analysis

    efficient portfolio analysis

    performance evaluation

    Black and Litterman style asset allocation

    A note on these handouts

This is a completely revised and greatly extended version of the course handouts as they were up to the academic year 2009/2010. While the topics considered in the course did not change, a big effort was made to improve and complete the exposition and to integrate into the handouts a lot of study materials which were previously given separately. Among these are the appendix which summarises the required knowledge of probability and statistics, a cross table of past exam exercises divided by topic, and links to relevant examples.

Other appendixes have been and are being added which, I hope, shall be useful in integrating the text and improving its understanding.

The work is still in progress, many points must still be squared out, and a lot of errors are undoubtedly still there. In particular, a lot of links in the document are still missing or inactive: the reason for this is that the only reasonable way to use links is not to require files to be in known directories with unchanged names but to point directly to Internet addresses. This requires the relevant documents to be available on the Internet and, for the moment, this is so only insofar as they are available in Learning Space (which is linkable, but in a difficult way). This shall be corrected in the next few months.

I'll be grateful for any suggestion, correction, or comment. The version of this document available on Learning Space shall be frequently updated during the course, so check it from time to time.

    Probability, Statistics and Finance

There is not much need for justifying the presence of (several) courses in applied probability and statistics within a graduate curriculum in finance.

If required, a justification could follow a number of leads. A very direct one could be a visit to any trading floor, participation in meetings for an M&A operation, spending some time in the risk management office of a bank, or simply a reading of the laws and regulations concerning the management of financial companies.

Another possible way could be browsing through the program of the institutional exams required in order to deal with clients in international markets.

But the simplest and most direct way, at least in our opinion, is to point out the fact that most of finance has to do with deciding today the prices of economic entities whose precise future value is unknown. In such a field, the availability of a language for speaking about uncertain events which, at least in principle, satisfies simple requirements of non-contradiction (usually tagged as "coherence" or "no arbitrage") is obviously a necessity.

Up to the present time, the most successful language devised for such a purpose is probability theory (competitors exist but lag far behind both in popularity and practical effectiveness).

It is interesting to notice that the language of probability theory, intimately conjugated with the statement of prices for betting on uncertain events, is more directly suited to dealing with uncertainty in the field of finance, where bets are actually made, than, say, in physics, where the problem is not (at least at a prima facie level) that of betting on uncertain results but, maybe, that of describing the long-term frequencies (a very unempirical concept) of experimental results¹.

¹ See the appendix on page 109 for a summary of a definition of probability based on betting systems and its connection with frequencies.


    Probability and Statistics

The use of probability in finance can be seen as a direct offspring of the classical origin of probability in the context of gambling. However, gambling problems are usually much easier to deal with than, say, security pricing problems. The reason for this is that in most "games of chance" two elements are usually agreed upon: first, the nature of the game is such that the probabilities of its results are agreed upon by the vast majority of participants; second, in typical situations the betting decisions of players do not change the probabilities of the results. (We are speaking about games of chance the like of roulette, trente-et-quarante, rouge et noir, dice games and the like. In most card games the element of chance in the card dealing is then mediated by a strategic element in the card play phase, and this makes things much more complex.)

The consequence of the first point is that in typical games of chance statistics, as a tool for choosing probabilities, is not required (while it could be required in other betting settings, the like, e.g., of horse racing).

Probability theory has nothing to say about the "right" probabilities to assign to possible events. Its field is the consistency (no arbitrage) among probability statements whose numerical values do not originate in probability theory itself (except for obvious cases like the probability of the sure or the impossible event) and, to a lesser degree, the interpretation of such statements.

It happens that the basic inputs required by probability theory, namely the probabilities of simple events, are agreed upon in most games of chance, as almost everybody agrees on simple symmetry arguments from which the numerical values of probabilities are derived². Maybe these symmetry arguments are justified by some putative set of past observations; however, the agreement is so widespread that it could be possible, if still wrong, to tag as "wrong" probability assessments which disagree with the majority's. In this sense an inference tool for deriving probability estimates from, say, past data is not directly required for gambling³.

In the words we are used to when considering financial risk management, we could say that in games of chance there is no model or estimation risk: the probabilities of possible events can well be considered as given and, in some difficult-to-specify sense, correct.

Let us pass to the second point. The fact that, say, the future price of an asset directly depends on the bets made on it (maybe in an irrational way) by traders is a mainstay of finance (and economics). It embodies the complex interaction between judgments of value and expectations of prices which, through the concept of equilibrium, is both the stuff of economic theory and of day-by-day work in the markets.

² A great probabilist, Laplace, believed that this way of computing probability by symmetry was, or should be, possible in any sensible application of the concept of probability, excluding in this way, for instance, the application of probability to horse race betting.

³ It must be said that some statistics is actually used for the periodic checking of the "randomness" of the chance-generating engines used in gambling (the like of roulettes, fortune wheels, or dice).


This interaction by itself contributes to determining the probabilities of market events, entities which cannot at any time be considered as "given" in the way those of typical games of chance are. This interaction is very complex but also subject to change and usually does not satisfy symmetry arguments, so that it cannot, with the exception of very simple contexts, be ignored (as we ignore, e.g., in the rolling of a die, the complex but stable and symmetry-justifiable physical model of its chaotic rebounds on a hard surface) and exchanged for a simple symmetry-induced probability model (for the die: each side has probability 1/6 of turning up).

For one: human beings are not identical, even when they behave in a panic-ridden way, and their differences are not so nicely constrained that the result of their behaviour can realistically be described as if they were identical (as in representative agent models).

We are not sure about, or do not even agree on, the models of these interactions and, provided at least enough stability can be assumed, for each model we need to estimate the relevant parameters.

We say that in finance we have both model risk (disagreement on models) and estimation risk (unknown parameters are estimated, and estimates are affected by sampling error).

In principle it is possible to separate the two aspects that make financial markets and gambling casinos different. It is possible, and, in fact, it has been done in the past (for instance by Yahoo Finance), to create fictional markets where "stocks" with absolutely no economic meaning are "traded" between agents and the future prices of these stocks are determined (typically through an auction system) by the amounts "invested" in them by players. It is interesting to notice that in such contexts, where the "true value" of each share is in fact known to be 0 and the aim of the game is only that of moving ahead of the flock, prices follow paths which are very similar, qualitatively, to those observed in real financial markets. This should be instructive for understanding how, even when the traded securities have a real (if numerically not known) economic meaning and value, the simple interaction of agents in the market can create an evolution of prices partially independent of such value⁴.

⁴ Economists hope that such partial independence is not too strong. In fact, financial markets have the relevant role of allocating investments among different economic endeavours in some efficient way, where "efficient" should mean, roughly, that investments with better prospects should receive, at least on average, more money. The question is whether markets are a setting in which this happens or whether the market-induced noise can overwhelm any value signal. The history of market crises contains a rich set of clues about an answer to this question, and we can say that, at least in some cases, the answer may be in the negative (but we are at a loss if we are asked for some system different from the market and able to be right at least as frequently).

Notice that even in casino games, where the value of each game (the probability of each possible result times the payoff of each result) is known, the frequencies of bets fluctuate, together with players' whims, and it may be possible that, on some gambling tables and for some numbers or colors, we observe a concentration of bets which is totally unjustified by any anomalous probability of the coveted result but that may, all the same, last even for a considerable time. (The reader should think about the huge literature on late numbers in bingo, lotto, and similar games.)


Ultimately, the decision of how much to bet on a given future scenario requires both an assessment of economic value AND an evaluation of the consequences of the interacting opinions of agents. This is a very difficult task which, as we said above, cannot rely on the simple symmetry arguments of the kind used for "sharing probabilities" in standard games of chance.

Tools are required for economic evaluation, and tools are required for connecting past observations of market and, more generally, economic events to the statement of probabilities useful for deciding future actions. In the financial field this makes probability intimately connected with statistics and, more generally, with economics.

A caveat. From what we wrote here it could be deduced that the business of probability and statistics in finance is that of forecasting future prices. If by this is meant forecasting the exact value of a future price, this would be a wrong deduction.

Instead, if by the term "forecasting" we intend the assessment of probability distributions for future prices, we get a clearer picture of what we intend to do. Standard introductions to finance theory stress this point by describing, in a very simplified way, an investment decision as a choice among risk/expected return pairs. More advanced analyses describe investment as a choice among probability distributions of future returns.

Antidotes to delusions

There is another relevant reason for the study of probability and statistics during financially oriented training. Financial markets are full of intrinsic randomness, in the sense that, since we do not possess, and in all probability shall never possess, tools which allow us to forecast the future with precision, we must learn to live in an environment of unresolvable uncertainty.

The human mind does not seem to adapt well to environments of this kind.

As we wrote above, financial markets are casinos with the added twist that the probabilities of outcomes are not known and that the outcomes themselves depend on the opinions and hopes of players. The resulting mess is, then, fully understandable; we do not like this but, alas, a better tool (or, at least, one not worse) for allocating investments among uncertain prospects is still to be discovered! To conclude this footnote on a positive tone, we must say that the study of modern financial markets has an advantage with respect to the study of modern economic systems. While irrational, arbitrage-ridden behaviour is possible in both settings, at least in normal times modern financial markets tend to punish arbitrage-allowing investors in such a quick and (financially) harmful way that a propensity for a coherent assessment of one's own personal bets is a strong point (we repeat: in normal times) of most big investors. In more general economic situations such punishment is not so quick; it can typically be made to burden the decision maker's offspring or other people, hence it does not bind decisions too much. In other words, the financial setting is a privileged setting in economics, at least because we can assume that most of the time agent behaviour may be stupid but not irrational.


Each time the future value of variables is relevant for us but we are not able to determine it, either by forecasting or by direct intervention, our brain, which craves stable patterns, shall, if uncontrolled, tend to create such patterns out of nothing and be fooled into believing in illusions (there exists an immense literature on gambling behavior which substantiates this statement).

This explains at least a subset of the observed irrational behaviours of investors. Statistics and probability are relevant also because they can be seen as antidotes to such delusions⁵. They may not make us right most of the time, and it may well be that some lucky dumbass shall have better results than ours. But, at least, they may help us not to be upset if something which a priori we considered unlikely does indeed happen, or prevent us from changing our decision rule after events which in fact confirm the optimality of such a rule. Maybe it is not much, but in the not too long run it counts for much. By the way, they help us in understanding that, given the amount of variance in the market and the huge number of dumbass investors, the fact that some investor of this kind shall be better off than us is so likely as to be almost sure and, for this reason, this fact should not upset us or induce us into dumbassing.

    1 Returns

1.1 Return definitions

There is a love story with returns in finance: while prices are the financially relevant quantities (what we pay and what we get), we often speak and write models about returns. It is true that for one-period models there is substantially no difference between considering a change in price and a return. However, returns, while useful, can be tricky (the more so in multiperiod models) and must be understood well.

Let $P_{it}$ be the price of the $i$-th stock at time $t$. The linear or simple return between times $t_{j-1}$ and $t_j$ is defined as:

$$r_{it_j} = \frac{P_{it_j}}{P_{it_{j-1}}} - 1$$

The log return is defined as:

$$r^*_{it_j} = \ln\left(\frac{P_{it_j}}{P_{it_{j-1}}}\right)$$

In both these definitions of return we do not consider possible dividends. There exist corresponding definitions of total return where, in case a dividend $D_j$ accrues between times $t_{j-1}$ and $t_j$, the numerator of both ratios becomes $P_{t_j} + D_j$.
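(As a minimal illustration, here is a small Python sketch of the two definitions; the price series is made up and the use of NumPy is an assumption of the sketch, not of the text.)

```python
import numpy as np

# Hypothetical price series for a single stock (made-up numbers).
prices = np.array([100.0, 101.5, 99.8, 103.2, 102.7])

# Linear (simple) returns: r_t = P_t / P_{t-1} - 1
linear_returns = prices[1:] / prices[:-1] - 1

# Log returns: r*_t = ln(P_t / P_{t-1})
log_returns = np.log(prices[1:] / prices[:-1])

print(linear_returns)
print(log_returns)   # close to the linear returns when the ratios are near 1
```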

⁵ Only partial antidotes: at some time in the future everybody shall fall for the lure of randomness deciphering. Anecdotes where the best statisticians fell for it pepper introductory and expository books on statistics and probability.


Moreover, here we do not apply any accrual convention to our returns; that is, we just consider period returns and do not transform them to, say, a yearly basis.

It is to be noticed that, while $P_{t_j}$ means the price at time $t_j$, $r_{t_j}$ is a shorthand for the return between times $t_{j-1}$ and $t_j$, so that the notation is not really complete and its interpretation depends on the context. When needed for clarity, we shall index returns by the start and end points of the interval over which they are computed, as, for instance, in $r_{t_{j-1};t_j}$.

The two definitions of return yield different numbers when the ratio between consecutive prices is far from 1.

Consider the Taylor formula for $\ln(x)$ for $x$ near 1:

$$\ln(x) = \ln(1) + \frac{x-1}{1} - \frac{(x-1)^2}{2} + \dots$$

If we truncate the series at the first-order term we have:

$$\ln(x) \approx 0 + (x - 1)$$

so that, if $x$ is the ratio between consecutive prices, for $x$ near one the two definitions give similar values⁶.
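(A quick numerical check of this approximation, and of footnote 6 below; a Python sketch with arbitrary price ratios.)

```python
import numpy as np

# Arbitrary price ratios, near to and far from 1.
x = np.array([0.80, 0.95, 1.00, 1.05, 1.20, 1.60])

lin = x - 1        # linear return
log = np.log(x)    # log return

# The error lin - log is always >= 0 (since ln(x) <= x - 1) and grows
# as the ratio moves away from 1.
print(np.round(lin - log, 4))
```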

In finance the ratio of consecutive prices (maybe corrected by taking accruals into account) is often modeled as a random variable with an expected value very near to 1. This implies that the two definitions shall give different values with sizable probability only when the variance (or, more generally, the dispersion) of the price ratio distribution is non-negligible, so that observations far from the expected value have non-negligible probability. Since standard models in finance assume that the variance of returns increases as the time between returns increases, this implies that the two definitions shall more likely yield different values when applied to long-term returns.

Why two definitions? The point is that each one is useful in different cases. From now on, for simplicity, let us only consider times $t$ and $t-1$. Let the value of a buy-and-hold portfolio at time $t$ be:

$$\sum_{i=1,\dots,k} n_i P_{it}$$

It is easy to see that the linear return of the portfolio shall be a linear function of the returns of each stock:

$$r_t = \frac{\sum_{i=1,\dots,k} n_i P_{it}}{\sum_{j=1,\dots,k} n_j P_{jt-1}} - 1 = \sum_{i=1,\dots,k} \frac{n_i P_{it}}{\sum_{j=1,\dots,k} n_j P_{jt-1}} - 1 = \sum_{i=1,\dots,k} \frac{n_i P_{it-1}}{\sum_{j=1,\dots,k} n_j P_{jt-1}} \, \frac{P_{it}}{P_{it-1}} - 1 = \sum_{i=1,\dots,k} w_{it} r_{it}$$

where $w_{it} = n_i P_{it-1} / \sum_{j=1,\dots,k} n_j P_{jt-1}$ are non-negative "weights" summing to 1 which represent the percentage of the portfolio invested in the $i$-th stock at time $t-1$.

⁶ It is clear that $\ln(x) \leq x - 1$: in fact $x - 1$ is equal and tangent to $\ln(x)$ at $x = 1$ and lies above it elsewhere (notice that the second derivative of $\ln(x)$ is negative). This implies that if one kind of return is used in place of the other, the approximation errors shall all be of the same sign.

This simple result is very useful. Suppose, for instance, that you know at time $t-1$ the expected values of the returns between times $t-1$ and $t$. Since the expected value is a linear operator and the weights $w_{it}$ are known at time $t-1$, we can easily compute the expected return of the portfolio as:

$$E(r_t) = \sum_{i=1,\dots,k} w_{it} E(r_{it})$$

Moreover, if we know all the covariances between $r_{it}$ and $r_{jt}$ (if $i = j$ we simply have a variance), we can find the variance of the portfolio return as:

$$V(r_t) = \sum_{i=1,\dots,k} \sum_{j=1,\dots,k} w_{it} w_{jt} \mathrm{Cov}(r_{it}; r_{jt})$$

For log returns this is not so easy. In fact we have:

$$r^*_t = \ln\left(\frac{\sum_{i=1,\dots,k} n_i P_{it}}{\sum_{j=1,\dots,k} n_j P_{jt-1}}\right) = \ln\left(\sum_{i=1,\dots,k} \frac{n_i P_{it-1}}{\sum_{j=1,\dots,k} n_j P_{jt-1}} \, \frac{P_{it}}{P_{it-1}}\right) = \ln\left(\sum_{i=1,\dots,k} w_{it} \exp(r^*_{it})\right)$$

The log return of the portfolio is not a linear function of the log (nor of the linear) returns of the components. In this case assumptions on the expected values and covariances of the components cannot be translated into assumptions on the expected value and the variance of the portfolio by simple use of the basic expected-value-of-the-sum and variance-of-the-sum formulas. Think how difficult this would make any standard portfolio optimization procedure, as, for instance, the Markowitz model.
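(The contrast can be checked numerically; a Python sketch with a hypothetical two-stock portfolio, all numbers made up.)

```python
import numpy as np

n = np.array([10.0, 5.0])     # shares held (buy and hold)
P0 = np.array([50.0, 20.0])   # prices at t-1
P1 = np.array([52.0, 19.0])   # prices at t

w = n * P0 / np.sum(n * P0)   # portfolio weights at t-1
r = P1 / P0 - 1               # linear returns of the stocks

# The portfolio linear return IS the weighted sum of the stock returns...
r_port = np.sum(n * P1) / np.sum(n * P0) - 1
assert np.isclose(r_port, np.dot(w, r))

# ...but the portfolio log return is NOT the weighted sum of stock log returns.
print(np.log(1 + r_port), np.dot(w, np.log(P1 / P0)))  # close, not equal
```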

On the other hand, log returns are much easier to use than linear returns when we consider a single stock over time.

Suppose we observe the prices $P_{t_i}$ at times $t_1, \dots, t_n$; the log return between $t_1$ and $t_n$ shall be:

$$r^*_{t_1,t_n} = \ln\frac{P_{t_n}}{P_{t_1}} = \ln\left(\frac{P_{t_n}}{P_{t_{n-1}}}\,\frac{P_{t_{n-1}}}{P_{t_1}}\right) = \dots = \ln\prod_{i=2,\dots,n}\frac{P_{t_i}}{P_{t_{i-1}}} = \sum_{i=2,\dots,n} r^*_{t_i}$$

It is then easy, for instance, given the expected values and the covariances of the sub-period returns, to compute the expected value and the variance of the full-period return. On the other hand, this does not happen for the linear return. We have:

$$r_{t_1,t_n} = \frac{P_{t_n}}{P_{t_1}} - 1 = \frac{P_{t_n}}{P_{t_{n-1}}}\,\frac{P_{t_{n-1}}}{P_{t_1}} - 1 = \dots = \prod_{i=2,\dots,n}\frac{P_{t_i}}{P_{t_{i-1}}} - 1 = \prod_{i=2,\dots,n}(r_{t_i} + 1) - 1$$
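(Again a quick check: log returns add up over time, linear returns compound; a Python sketch with made-up prices.)

```python
import numpy as np

P = np.array([100.0, 102.0, 101.0, 104.0, 103.5])  # hypothetical prices

log_r = np.log(P[1:] / P[:-1])   # sub-period log returns
lin_r = P[1:] / P[:-1] - 1       # sub-period linear returns

# Log returns simply sum over time...
assert np.isclose(np.sum(log_r), np.log(P[-1] / P[0]))

# ...while linear returns compound multiplicatively.
assert np.isclose(np.prod(1 + lin_r) - 1, P[-1] / P[0] - 1)
```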


In general, the expected value of a product is difficult to evaluate and does not depend only on the expected values of the terms. A notable special case is that of non-correlation among the terms. For the computation of the variance, the problem is even bigger.

It is clear that, when problems involving the modeling of portfolio evolution over time are considered, we shall often see promiscuous and inexact use of the two definitions. You should keep in mind that standard "introductory" portfolio allocation models are one-period models.

To sum up: the two definitions of return yield different values when the ratio between consecutive prices is not near 1. The linear definition works very well for portfolios over single periods; the log definition works very well for single securities over time.

We conclude this section with two warnings. These should be obvious, but experience teaches the opposite.

First. Many other definitions of return exist, and each one originates either from traditional accounting practice (and typically is connected with some specific asset class) or from specific computational needs.

Second. No single definition is the correct or the wrong one; in fact, such a statement has no meaning. The correctness in the use of a definition depends on the context in which it is applied (accounting uses are to be satisfied) and, obviously, on avoiding naive errors the like of exponentiating linear returns for deriving prices, or summing exponential (log) returns over different securities in order to get portfolio returns.

For instance: the fact that, for a price ratio near to 1, the two definitions give similar values should not induce the reader into the following consideration: "if I break a sizable period of time into many short sub-periods, such that prices at consecutive times are likely to be very similar, I am going to make a very small error if I use, say, the linear return in the accrual formula for the log return". This is wrong: in any single sub-period the error is going to be small but, as seen above, it is always equal in sign, so that it shall sum up and not cancel, and over the full time period the total error shall be the same no matter into how many sub-periods we divide it.

1.2 Price and return data

Finance is full of numbers; price data and related statistics are gathered for commercial and institutional reasons and are readily available in free and commercial databases. This has been true for many years and, for some relevant markets, databases have been reconstructed back to the nineteenth century and in some cases even before.

As in any field where data are so overwhelmingly available, any researcher must be cautious before using them and follow at least some very simple rules, which could be summarized in the sentence: KNOW YOUR DATA BEFORE USING IT!

[Figure: $r$ and $r^*$ as functions of $P_t/P_{t-1}$.]

What does the number mean? How was it recorded? Did it always mean the same thing? These are three very simple questions which should get an answer before any analysis is attempted. Failure to do so could taint results in such a way as to make them irrelevant or even ridiculous. This is not the place for a detailed discussion, but it could be useful to analyze a very simple example.

Suppose you wish to answer the following question: how did the US stock market behave during its history?

You go to the Internet and start a search for literature on the topic. You find a whole jungle of academic and non-academic references, among which you choose two frequently quoted expository books by famous academics: Irrational Exuberance by Robert J. Shiller (of Yale) and Stocks for the Long Run by Jeremy J. Siegel (of Wharton)⁷. You browse through the first chapter of both and find Figure 1-1 of Siegel, which tells you that 1 dollar invested in stocks in 1802 would have become 7,500,000 dollars by 1997, where (this is in the text) 1 dollar of 1802 is equivalent (according to Siegel) to 12 dollars in 1997. The real return would thus have been about 625,000 times in real terms (62,500,000%: quite unreadable!).

On the other hand, Figure 1.1 of Shiller's book gives the following information: between 1871 and 2000 the S&P composite index corrected for inflation grew from (roughly) 70 to (roughly) 1400, a real return of roughly 20 times (2000%). Two big numbers, but quite different.

You are puzzled; surely a part of the difference is due to the different time basis. Looking at Siegel's picture you see that the dollar value of the investment around 1870 was about 200. Even exaggerating inflation, attributing the full 12-times devaluation to the 1870-2000 period, and assessing this 200 to be worth 2400 1997 dollars, we would have a real increase of 3125 times, which is still more than 150 times Shiller's number. This obviously cannot come from the difference in the terminal years of the sample, as the period 1997-2000 was a bull market period and should reduce, not increase, the difference.

Now, both authors are famous finance professors and at least one of them (Shiller) is one of the gurus of the present crisis. So the problem must be in the reader (us). Let us try to improve our understanding by reading the details. First we notice that Siegel quotes as sources for the raw data the Cowles series, as reprinted in Shiller's book Market Volatility, for the 1871-1926 period and the CRSP data for the following period, while Shiller speaks about the S&P composite index. But reading with care we see another difference: Shiller speaks about a price index while Siegel speaks about a total return index with reinvested dividends. Is this the trick? Browsing the Internet we see that Shiller's data are actually available for downloading (http://www.econ.yale.edu/~shiller/data.htm). We can compute the total return for Shiller's data between 1871 and 1997: the real increase now is from 1 dollar to 3654 dollars in real terms.

We also see that the CPI passed from 12 to 154 in the same time interval, so that the 12-times rule for the value of the dollar used by Siegel seems a good approximation.

⁷ The connection between the two authors and the two books is clearly stated by Shiller in his Acknowledgments.


There is still some disagreement between the numbers (Siegel 3125, but with exaggerated inflation, and Shiller 3654), but we think that, at least for answering our question, we have enough understanding.
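(To make the price-index vs. total-return distinction concrete, here is a minimal Python sketch; the annual price, dividend, and CPI numbers are made up, merely patterned on the kind of series Shiller distributes.)

```python
import numpy as np

price = np.array([70.0, 75.0, 72.0, 80.0])  # hypothetical annual price index
div   = np.array([ 3.0,  3.1,  3.2,  3.3])  # hypothetical dividends paid
cpi   = np.array([12.0, 12.5, 12.4, 13.0])  # hypothetical CPI

# Shiller-style: real growth of the price index alone.
real_price_growth = (price[-1] / price[0]) / (cpi[-1] / cpi[0])

# Siegel-style: total return index, reinvesting each year's dividend.
tr = 1.0
for t in range(1, len(price)):
    tr *= (price[t] + div[t]) / price[t - 1]
real_tr_growth = tr / (cpi[-1] / cpi[0])

print(real_price_growth, real_tr_growth)  # the total return index grows faster
```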

In this very short and summary analysis we did learn some important things.

First: understand your question. "How did the US market behave during its history" is, we now understand, quite a badly specified question. Do we wish for a summary of the history of prices, or for the history of one dollar invested in the market? The two different questions have two different answers and require different data.

Second: understand your data. Price data? Total return data? Raw or inflation corrected?

There are many subtle but relevant points that should be made; we only mention the survivorship bias problem, which taints the ex post use of financial series. But we stop here for the moment and do not mention the fact that a lot of discussion has run about the relevance of the questions and of the answers and their interpretation. The fact is: Siegel and Shiller start with similar data but reach quite different conclusions (at least, this is their opinion of their own work).

Examples

Exercise 1a - returns.xls
Exercise 1b - returns.xls

    2 Logarithmic random walk

The (naive) log random walk (LRW) hypothesis on the evolution of prices states that, if we abstract from dividends and accruals, prices evolve approximately according to the stochastic difference equation:

$$\ln P_t = \ln P_{t-\Delta} + \epsilon_t$$

where the innovations $\epsilon_t$ are assumed to be uncorrelated across time ($\mathrm{cov}(\epsilon_t; \epsilon_{t'}) = 0 \;\; \forall t \neq t'$), with constant expected value $\mu$ and constant variance $\sigma^2$. Sometimes a further hypothesis is added and the $\epsilon_t$ are assumed to be jointly normally distributed. In this case the assumption of non-correlation becomes equivalent to the assumption of independence.

Since $\ln P_t - \ln P_{t-\Delta} = r^*_{t-\Delta;t}$, the LRW is obviously equivalent to the assumption that log returns are uncorrelated random variables with constant expected value and variance.

A linear random walk in prices was sometimes considered, in the earliest times of quantitative financial research, but it does not seem a good model for prices, since a sequence of negative innovations may result in negative prices.

[FIGURE 1-1: Total Nominal Return Indexes, 1802-1997]

Moreover, while the hypothesis of constant variance for returns may be a good first-order approximation of what we observe in markets, the same hypothesis for prices is not empirically sound: in general, price changes tend to have a variance which is an increasing function of their level.

A couple of points to stress.

First: $\Delta$ is the fraction of time over which the return is defined. It may be expressed in any unit of time measurement: $\Delta = 1$ may mean one year, one month, or one day, at the choice of the user. However, care must be taken so that $\mu$ and $\sigma^2$ are assigned consistently with the chosen unit of measurement of $\Delta$. In fact $\mu$ and $\sigma^2$ represent the return's expected value and variance over a horizon of length $\Delta = 1$, and they shall be completely different if 1 means, say, one year or one day (see below for a particular convention for translating the values of $\mu$ and $\sigma^2$ between different units of measurement of time).

Second: suppose the model is valid for a time interval of $\Delta$ and consider what happens over a time span of, say, $2\Delta$.

By simply composing the model twice we have:

$$\ln P_t = \ln P_{t-2\Delta} + \epsilon_{t-\Delta} + \epsilon_t = \ln P_{t-2\Delta} + u_t$$

having set $u_t = \epsilon_{t-\Delta} + \epsilon_t$. The model appears similar to the single-period one, and in fact it is, but it must be noticed that the $u_t$, while uncorrelated on a time span of $2\Delta$ (due to the hypothesis on the $\epsilon_t$), shall indeed be correlated on a time span of $\Delta$. This means, roughly, that the log random walk model can be aggregated over time only if we drop the observations in between each aggregated interval (in our example the model shall be valid if we drop every other original observation).
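(A small simulation makes the aggregation point concrete; a Python sketch, with standard normal innovations as an assumption.)

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(0.0, 1.0, size=100_000)  # iid innovations, one per unit span

# Two-period aggregated innovations u_t = eps_{t-1} + eps_t.
u = eps[1:] + eps[:-1]

# Formed every period, consecutive u's overlap and are correlated...
print(np.corrcoef(u[:-1], u[1:])[0, 1])   # about 0.5

# ...formed every other period (dropping observations), they are not.
v = u[::2]
print(np.corrcoef(v[:-1], v[1:])[0, 1])   # about 0.0
```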

This is going to be relevant in what follows.

The LRW was a traditional standard model for the evolution of stock prices. It is obviously a wrong model: prices are not dictated by chance, and it can be considered a descriptive model in the sense that its success depends not on its consistency with the actual process of price creation (there it would fail miserably) but on its consistency with observed large-scale statistical properties of prices, where consistency is measured by comparing the probabilities of events as given by the model with the observed frequencies of said events.

From this point of view, while the model is not dramatically wrong and is still useful for introductory and simple purposes, the weight of empirical analysis during the last twenty years has led most researchers to abandon this hypothesis as a complete and satisfactory description of stock price behavior.

While no consensus has been reached on an alternative standard model, there is general agreement about the fact that some sort of (very weak) dependence of today's returns on the full, or at least recent, history of returns exists. Moreover, the constancy of the expected value and variance of the innovation term has been questioned.


In any case, the LRW still underlies many conventions regarding the presentation of market statistics. Moreover, the LRW is perhaps the most important justification for the commonly held equivalence between the intuitive term "volatility" and the statistical entity "variance" (or better, "standard deviation").

An important example of this concerns the annualization of expected value and variance.

We are used to the fact that, often, the rate of return of an investment over a given time period is reported in an annualized way. The precise conversion from a period rate to a yearly rate depends on accrual conventions. For instance, for an investment of less than one year in length, the most frequent convention is to multiply the period rate by the ratio between the (properly measured according to the relevant accrual conventions) length of one year and the length of the investment. So, for instance, if we have an investment which lasts three months and yields a rate of 1% in these three months, the rate on a yearly basis shall be 4%.

It is clear that this is just a convention: the rate for an investment of one year in length shall NOT, in general, be equal to 4%; this is just the annualized rate for our three-month investment. It would be true, for instance, if the term structure of interest rates were constant. However, such a convention can be useful for comparisons across investment horizons.

In a similar way, when we speak of the expected return or the standard deviation/variance of an investment, it is common to report the number in an annualized way even if we speak of returns for periods of less or more than one year. The actual annualization procedure is based on a convention which is very similar to the one used in the case of interest rates. As in that case, the convention is true (that is, annualized values of expected value and variance correspond to per annum expected values and variances) only in particular cases. The specific case on which the convention used in practice is based is the LRW hypothesis.

If we assume the LRW and consider a sequence of $n$ log returns $r_t$ at times $t, t-1, t-2, \dots, t-n+1$ (just for the sake of simplicity in notation we suppose each time interval to be of length 1 and drop the generic $\Delta$), we have that:

$$E(r_{t-n,t}) = E\Big(\sum_{i=0,\dots,n-1} r_{t-i}\Big) = \sum_{i=0,\dots,n-1} E(r_{t-i}) = n\mu$$

$$Var(r_{t-n,t}) = Var\Big(\sum_{i=0,\dots,n-1} r_{t-i}\Big) = \sum_{i=0,\dots,n-1} Var(r_{t-i}) = n\sigma^2$$

This obvious result, which is a direct consequence of the assumption of constant expected value and variance and of the non-correlation of innovations at different times, is typically applied for annualization purposes also when the LRW is not considered to be valid.

So, for instance, given an estimate of $\sigma^2$ on daily data, this estimate is annualized by multiplying it by, say, 256 (or any number representing open market days; different ones exist); it is put on a monthly basis by multiplying it by, say, 25, and on a weekly basis by multiplying it by, usually, 5.
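(Under this convention, annualization is a one-line scaling; a Python sketch with made-up daily log returns and a 256-day year.)

```python
import numpy as np

daily_r = np.array([0.001, -0.004, 0.002, 0.003, -0.001, 0.002])  # made up

mu_daily = daily_r.mean()
var_daily = daily_r.var(ddof=1)

# LRW-based convention: expected value and variance scale linearly with
# the number of periods (256 trading days per year here)...
mu_annual = 256 * mu_daily
var_annual = 256 * var_daily

# ...so the standard deviation ("volatility") scales with the square root.
vol_annual = np.sqrt(256) * np.sqrt(var_daily)
print(mu_annual, var_annual, vol_annual)
```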

As we stressed before, this is not just a convention but the correct procedure if the LRW model holds. In this case, in fact, the variance over $n$ time periods is equal to $n$ times the variance over one time period. If the LRW model is not believed to hold, for instance if the expected value and/or the variance of returns is not constant over time, or if we have correlation among the $\epsilon_t$, this procedure may still be applied, but just as a convention.⁸

The fact that, under the LRW, the expected value grows linearly with the length of the time period while the standard deviation (square root of the variance) grows with the square root of the number of observations has created a lot of discussion about the existence of some time horizon beyond which it is always proper to hold a stock portfolio. This problem, conventionally called "time diversification" and, more popularly, "stocks for the long run", has been discussed at length both on the positive side (commonly sustained by fund managers) and on the negative side (more rooted in academia: Paul Samuelson is a non-negligible opponent of the idea).

To have an idea of the empirical implications of the LRW hypothesis (plus a Gaussian⁹ distribution), we plot in the following figures an aggregated index of the US stock market in the 20th century together with 100 simulations describing possible alternate histories of the US market in the same period, under the hypothesis that the index evolution follows a LRW with yearly expected value and standard deviation of log return identical to the historical average and standard deviation: resp. 5.36% and 18.1%. Data are presented both in price scale (starting value 100) and in log price scale. The reason is simple. Consider the distribution of the log return after 100 years under our hypothesis. This is going to be the distribution of the sum of 100 iid Gaussian RVs, each with expected value 5.36% and standard deviation 18.1%. Using known results, this distribution shall be Gaussian with expected value 536% and standard deviation 181%. So, a standard $2\sigma$ interval for the terminal value of this sum is $536\% \pm 362\%$ or, in price terms, $100\,e^{5.36 \pm 3.62}$, that is, an interval with lower extreme 569 and upper extreme 794,263. This means that under our hypotheses the possible histories can be quite different. To have an idea, the actual evolution of the market as measured by our index gave a final value equal to about 21,000, which corresponds, as said, to a sum of log returns of 536%. This is, by construction, smack in the middle of the distribution of the summed log returns and is the median of the price distribution. However, due to the exponentiation or, if you prefer, due to the power of compound interest, the distribution of final values is highly asymmetric (it is lognormal), so that the range of possible values above the median of prices is much bigger than that below it.

⁸ Empirical computations of variances over different time intervals typically result in sequences which increase less than linearly with respect to the increase of the time interval between consecutive observations. This could be interpreted as the existence of (small) on-average negative correlations between returns.

⁹ Sometimes also called Normal.


We only simulated 100 possible histories. Even with such a limited sample, we have a top terminal price of more than 2,000,000 (in a very lucky world, for long investors; we wonder what studying finance would be like in such a world...) and a bottom terminal price below 100 (again: in a world so unlucky that, had we lived in it, we likely would not talk about the stock market)¹⁰.

This range of terminal prices is so wide that, in order to make our real price history visible, we had to cut a slice from the range of prices.
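(A minimal sketch of this kind of simulation in Python; the mean and standard deviation are the values quoted above, everything else is an assumption of the sketch.)

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 0.0536, 0.181   # yearly mean and st. dev. of log returns (as above)
n_years, n_paths = 100, 100
P0 = 100.0

# Each path is the exponential of a cumulative sum of iid Gaussian log returns.
log_r = rng.normal(mu, sigma, size=(n_paths, n_years))
prices = P0 * np.exp(np.cumsum(log_r, axis=1))

# Terminal prices are lognormal: the median is near 100*exp(100*mu) ~ 21,000,
# with a huge spread above it and a compressed one below it.
final = prices[:, -1]
print(np.median(final), final.min(), final.max())
```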

    2.1 "Stocks for the long run" and time diversicationThese are very interesting and popular topics, part of the lore of the nancial milieu. Ashort discussion shall be useful to clarify some issues connected with the LRW hypoth-

    esis together with some implicit assumption underlying much nancial advertising.They come in three avors. The rst and the second are a priori arguments de-pending on the log random walk hypothesis or something equivalent to it, the third isan a posteriori argument based on historical data.

The basic idea of the first version of the argument can be sketched as follows. Suppose single-period (log) returns have (positive) expected value $\mu$ and variance $\sigma^2$. Moreover, suppose for simplicity that the investor requires a Sharpe ratio of, say, $S$ out of his or her investment. Under the above hypotheses, plus the log random walk hypothesis, the Sharpe ratio over $n$ time periods is given by

$$\frac{n\mu}{\sqrt{n}\,\sigma} = \sqrt{n}\,\frac{\mu}{\sigma}$$

¹⁰ Compare this with the Siegel-Shiller data we discussed in section 1, then think about the result of our simulation in such extreme worlds. For instance, with the historical mean and standard deviation of the extremely depressed version of the 20th century, the simulation I would show you in this possible world, provided you and I were still interested in this topic, would be quite different from what you see here. And all the same, this possible story is a result totally compatible (under the Gaussian LRW) with what we did actually see in our real history. Spend a little time thinking about this point. It could be illuminating.

Think also of the economic sustainability of such extreme worlds: such extreme market behaviours cannot happen by themselves (this is not the plot of some lucky or unlucky casino guy; it is the market value of an economy, which should sustain such values, provided investors are not totally dumb), and how they could be so absurd just because they underline the possibly absurd extreme conclusions we can derive from a simple LRW model. Last but not least, remember that all this comes from the analysis of the stock market in a, up to now, very successful country: the USA. But we analyze it so much also because it was successful (and so, for instance, most finance schools, journals and researchers are USA based). This biases our conclusions if we wish to apply them to the rest of the world or, even, to the future of the USA. Maybe a more balanced view could be gained by comparing this result with the evolution of stock markets all around the world (this is not a new idea; Robert J. Barro, for instance, did this in "Rare Disasters and Asset Markets in the Twentieth Century", Quarterly Journal of Economics, 121(3): 823-866, 2006).


[Figure: 100 years of simulated log random walk data, 100 simulated paths (mean log return 5.35%, st. dev. 18.1%); price scale.]

[Figure: 100 years of simulated log random walk data (range subset) compared with the USA stock market in the 20th century (mean log return 5.35%, st. dev. 18.1%).]

[Figure: 100 years of simulated log random walk data, log scale, compared with the USA stock market in the 20th century (mean log return 5.35%, st. dev. 18.1%).]

so that, if $n$ is big enough, any required value can be reached. Another way of phrasing the same argument, when we add the hypothesis of normality of returns, is that, if we choose any probability $\alpha$, the probability for the investment to yield an $n$-period return greater than

$$n\mu - \sqrt{n}\,\sigma z_{1-\alpha}$$

is equal to $1-\alpha$. But this, for

$$\sqrt{n} > \frac{1}{2}\,\frac{\sigma}{\mu}\,z_{1-\alpha},$$

is an increasing function of $n$, so that for any $\alpha$ and any chosen value $C$ there exists an $n$ such that, from that $n$ onward, the probability of an $n$-period return less than $C$ is less than $\alpha$.

The investment suggestion could be: if your time horizon is an undetermined number $n$ of years, then choose the investment that has the highest expected return per unit of standard deviation, even if the standard deviation is very high. Even if this investment may seem too risky in the "short run", there is always a time horizon such that, for that horizon, the probability of any given loss is as small as you like or, which is the same, the Sharpe ratio is as big as you like. Typically, such high-return (and high-volatility) investments are stocks, so: "stocks for the long run".

Notice, however, that the value of $n$ for which this lower bound crosses a given level $C$ is the solution of

$$n\mu - \sqrt{n}\,\sigma z_{1-\alpha} \geq C$$

In particular, for $C = 0$ the solution is

$$\sqrt{n} \geq \frac{\sigma}{\mu}\,z_{1-\alpha}$$

With a typical stock, the $\sigma/\mu$ ratio for one year is of the order of about 6. So, even allowing for a big $\alpha$, so that $z_{1-\alpha}$ is near one (check by yourself the corresponding $\alpha$), the required $n$ shall be in the range of 36 years, which is only slightly shorter than the average working life.
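(A quick numerical check of this horizon computation; a Python sketch where the $\sigma/\mu$ ratio of 6 is the illustrative value from the text and SciPy supplies the normal quantile.)

```python
from scipy.stats import norm

mu, sigma = 0.03, 0.18   # illustrative yearly values with sigma/mu = 6
alpha = 0.16             # a big alpha, so that z_{1-alpha} is near one

z = norm.ppf(1 - alpha)          # normal quantile z_{1-alpha}
n_star = (sigma / mu * z) ** 2   # smallest n with n*mu - sqrt(n)*sigma*z >= 0

print(z, n_star)                 # z close to 1, n_star close to 36 years
```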

The properness of such an investment suggestion depends on the investor's criterion of choice: this, for instance, could be the full-period expected return given some probability of a given loss, or the Sharpe ratio for the full $n$ periods or, for instance, the per-period Sharpe ratio (which obviously is a constant) or, again, the absolute volatility over the full period of investment (which obviously increases without bounds), and so on.

For instance, a typical doubt is phrased like this: "Why should we consider as proper a given investment for $n$ time periods if we do not consider it proper for each single one of those periods?" This critique is correct if we believe that the investor takes into account the per-period Sharpe ratio or some measure of probable loss and expected return per period. In other words, the critique is correct if, very reasonably,


we believe the investor does not consider equivalent investments with identical Sharpe ratios but over different time spans.

Another frequent critique is: "It is true: the expected value of the investment increases without bounds, but so does its volatility, so, in the end, over the long run I am, in absolute terms, much more uncertain about my investment result" (the mean-to-standard-deviation ratio goes up only because the numerator grows faster than the denominator). This is reasonable as a critique if we believe the investor decides on the basis of the absolute volatility of the investment over the full time period.

We should also point out that choosing a single asset class only because, by itself, it has the highest Sharpe ratio should always be criticized on the basis of diversification arguments.

In the end, acceptance or refusal, on an a priori basis, of this argument depends on the model we choose for the investor's decision making. However, there may be a point in it, at least if you are a very peculiar kind of investor.

The second version of the argument, again based on the log random walk hypothesis, is a real fallacy (that is: it is impossible to justify it in any reasonable way); it is the "time diversification" argument.

There is an enticing similitude, under the log random walk hypothesis, between an investment for one year in, say, 10 uncorrelated securities with identical expected returns and volatilities (this last is just for simplicity: the argument can be extended to different expected returns and volatilities) and a 10-year investment in a single security with the same expected value and volatility.

To be precise, in order for the result to hold we must forget the difference between linear and log returns; moreover, the comparison implicitly requires zero interest rates. But let's do it (such an approximate way of thinking is very common in any field where some mathematics is used for practical purposes, and it is a sound way to proceed provided the user is able to understand the cases where his or her approximations do not work).

In this case, the expected return and standard deviation for the return corresponding to the first strategy (which could be tagged as the "average per security" return) are $\mu$ and $\sigma/\sqrt{n}$, just the same as the expected value and standard deviation of the "average per year" return of the second strategy.

We should be wary from the beginning about accepting such comparisons: in fact, the investments cannot be directly compared, since they are investments of the same amount but over different time periods.

Moreover, and this is not independent from the previous comment, the comparison is based on the flawed idea that the expected return and variance of the first investment can be compared with the average per-year expected return and variance of the second investment. In fact, while the expected return and variance of the first investment are properties of an effective return distribution (that is, the distribution of a return which I could effectively derive from an investment), the average expected return and variance

of the second investment are not properties of a return which I could derive from the second investment.

All that I can derive from the second investment is the distribution of returns over the ten-year period which, obviously, has ten times the expected value and one hundred times the variance of the distribution of the average return (which, we stress again, is not the return I could get from the investment).

So no time diversification exists, but only a wrong comparison between different investments using different notions of returns.

Comparable investments could be a ten-year investment in the diversified portfolio and a ten-year investment in the single security, and a possible correct comparison criterion could be the comparison between the ten-year expected returns and return variances of the two investments. However, in this case the diversified investment is seen to yield the same expected value as the undiversified investment but with one tenth of the variance, so that these two investments, now comparable, are by no means equivalent, and the single-security investment is seen, in the mean-variance sense, as an inferior investment.

Analogously, we could ask which investment in a single security over ten years has the same return mean and variance as the one-year diversified investment. The obvious answer is an investment of one tenth the size of the diversified investment. In other words: in order to have the same effective (that is: you can get it from an investment) return distribution, the two investments must be not only over different time periods but also of different sizes.
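(The size adjustment can be verified with one line of arithmetic; a Python sketch where $\mu$, $\sigma^2$, and the 10 periods are illustrative values.)

```python
import math

mu, sigma2, n = 0.05, 0.04, 10   # illustrative per-period mean and variance

# One-period investment diversified over n uncorrelated securities:
div_mean, div_var = mu, sigma2 / n

# n-period investment in one security, scaled to 1/n of the size:
single_mean = (1 / n) * (n * mu)
single_var = (1 / n) ** 2 * (n * sigma2)

# The two effective return distributions match in mean and variance.
print(math.isclose(div_mean, single_mean), math.isclose(div_var, single_var))
```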

While the first version of the argument could be argued for, at least under some hypothetical, maybe unlikely but coherent, setting, this second version of the argument is a true fallacy.

The third version of the "stocks for the long run" argument is the soundest, as it can be argued for without relying on unlikely assumptions or even blatant logical errors.

It is to be noticed that this third version is not an a priori argument based on assumptions concerning the stochastic behavior of prices and the decision model of agents (and, maybe, some logical error). Instead, it is an "a posteriori" or "historical" version of the argument. As such, its acceptance or rejection entirely depends on the way we study historical data.

In short, this argument states that, based on the analysis of historical prices, stocks were always, or at least quite frequently, a good long-run investment.

Being a historical argument, even if true (and here is not the place to argue for or against this point), it does not imply that the past behavior should replicate itself in the future.

While apparently held by the majority of financial journalists (provided they do not weight too much, say, the last 30 years of prices in Japan or the last 10 to 15 years for most of the rest of the world), and broadly popular in trouble-free times (at least as popular as the, historically false, argument about real estate as the surest, if not the best, investment), and so quite popular for most time periods, at least in the USA and during the first thirty and the last fifty years of the past century, this argument is quite controversial among researchers.

The two very famous and quite readable books we quoted in the chapter about returns, Robert Shiller's "Irrational Exuberance" and Jeremy Siegel's "Stocks for the Long Run", share (sic!) opposite views on the topic (derived, as we hinted at but do not have the time to fully discuss, from different readings of the same data).

While this is not the place for discussing the point, we would suggest that the reader, just for the sake of amusement, consider a basic fault of such "in the long run it was ..." arguments.

We have a typical example of the case where the fact itself of considering the argument, or even the phenomenon itself to which the argument applies, depends on the fact that the phenomenon happened, that is: something "was good in the long run".

In fact, we could doubt the possibility for an institution (the stock market) which has existed in its modern form, at least in the USA, since, say, the second half of the nineteenth century, to survive up to today without at least giving a sustainable impression of offering some opportunities.

Such arguments, if not accompanied by something else to sustain them, become somewhat empty: they could be the analogue of being surprised when observing that the dish I most frequently eat is also among those I like the most or, more extreme, that old people did not die young.

Sometimes, however, the "opportunity" of some institution, and how to connect this with its survival, can manifest itself in strange, revealing ways. For instance, games of chance have existed from time immemorial with the only "long run" property of making the bank holder richer, together with the occasional random lucky player, while the population of players is made, as a whole, poorer. So, while it is clear here what the "opportunity" of this institution is (both the, usually, steady enrichment of the bank holder and the available, albeit unlikely, hope of a quick enrichment), the survival of such an institution based on such opportunities tells us something interesting about man's mind.

We shall get into this topic time and again in what follows (while we won't be able to analyze it in full). This should not puzzle the reader, as it is the bitter bread and butter of any research field where we decide to use probability and statistics for writing and testing models but only observational data are available and no (relevant) experiments are possible. Let us mention some of these fields: evolutionary biology, cosmology, astronomy. A possible alternative, actually chosen by similar fields like history, is to abandon, or not even consider a serious possibility, the writing of models in probability language and the testing of these with statistics. In such fields statistics is still used, not as a tool for testing models but as a tool for describing historical data.


Fields like political science and sociology are divided in their attitude.

If we like fringe movements, there exists a minority of historians (who mostly publish in economics journals and are not very well considered by mainstream historians, but this is the gossip side of academe), mostly inspired by the Chicago area "new economic history" or cliometrics, who try dealing with historical problems using probability and statistics (mostly adapting models from economics). On the other side, a not small number of economists believe that the mainstream attitude to economics shows an excess in the use of such tools, and state that there exists useful economic knowledge which cannot be expressed in any available mathematical/probabilistic language. In extreme cases the statement is even made according to which only irrelevant points of economics can be described with such tools¹¹.

¹¹ For some short remarks about the debate on mathematics and economics see the appendix on page 113.

Examples

Exercise 2 - IBM random walk.xls

    3 Volatility estimation

In applied finance the term "volatility" has many connected meanings. We mention here just the main three:

1. Volatility may simply mean the attitude of market prices, rates, returns etc. to change in an unpredictable and unjustified manner, without connection to any formal definition of "change", "unpredictable" or "unjustified". Here volatility is tantamount to chance, luck, destiny, etc. Usually the term has a negative undertone and is mainly used in bear markets. In bullish markets the term is not frequently used and is typically replaced by more positive synonyms: a volatile bull market is "exuberant", "tonic" or "lively".

2. More formally, and mostly for risk managers, volatility has something to do with the standard deviation of returns and, sometimes, is estimated using historical data (hence the name "Historical Volatility").

3. For derivative traders, and frequently for risk managers, volatility is the name of one (or more) parameters in derivative models which, under the hypotheses that make the models true, are connected with the standard deviation of the underlying variables. However, in the understanding that these hypotheses are never valid in practice, such parameters are not estimated from historical data on the underlying variables (say, using time series of stock returns) but are directly backed out from quoted prices of derivatives, using the pricing model as a fitting formula. This is in accord with the strange, but widely held and, in fact, formally justifiable, notion that models may be useful even if the hypotheses underlying them are false. This is "Implied Volatility".

In what follows we shall introduce a standard and widely applied method for estimating volatility on the basis of historical data on returns, that is, we consider the second meaning of volatility.

Under the LRW hypothesis a sensible estimate of σ² is:

$$\hat{\sigma}^2 = \frac{\sum_{i=0,\dots,n}\left(r_{t-i} - \bar{r}\right)^2}{n}$$

where $\bar{r}$ is the sample mean.

This is the standard unbiased estimate for the variance of uncorrelated random variables with identical expected values and variances (the simple empirical variance of the data, where the denominator is taken as the actual number of observations n + 1, could be used without problems, as in standard applications the sample size is quite big).

Notice that each data point is given the same weight: the hypothesis is such that any new observation should improve the estimate in the same way.

The log random walk would justify such an estimate.

In practice, nobody uses this estimate and a common choice is the exponential smoothing estimate; while already quite old when suggested by J. P. Morgan in the Riskmetrics context, this is commonly known in the field as the Riskmetrics estimate:

$$V_t = \frac{\sum_{i=0,\dots,n} \lambda^i r_{t-i}^2}{\sum_{i=0,\dots,n} \lambda^i}$$

From a statistician's point of view this is an exponentially smoothed estimate with a smoothing parameter λ: 0 < λ < 1.

Common values of the smoothing parameter are around 0.95.

Users of such an estimate do not consider it sensible to treat each data point as equally relevant: old observations are less relevant than new ones.

Implicitly, then, while we believe the log random walk when annualizing volatility, we do not believe it when estimating volatility.

Moreover, it shall be noticed that, in this estimate, the sampling mean of returns does not appear. This choice can be justified in two ways. First, we can assume the expected return over a small time interval to be very small: with a non negligible variance, it is quite likely that an estimate of the expected value of returns could show a higher sampling variability than its likely size, and so it could create problems for the statistical stability of the variance estimate¹². Second, an estimate of the variance where the expected value is set to 0 tends to overestimate, not to underestimate, the variance (remember that the variance equals the mean of squares less the squared mean). For institutional investors, traditionally long the market, this could be seen as a conservative estimate. Obviously it is not a reasonable choice for hedged investors and derivative traders.

¹² A simple back of the envelope computation: say the standard deviation of stock returns over one year is in the range of 30%. Even in the simple case where data on returns are i.i.d., if we estimate the expected return over one year with the sample mean, we need about 30 observations (years!) in order to reduce the sampling standard deviation of the mean to about 5.5%, so as to be able to estimate reliably risk premia (this is financial jargon: the expected value of return is commonly called "risk premium", implying some kind of APT, even if it also contains the risk free rate) of a size of at least (usual 2σ rule) 8%-10% per year (quite big indeed!). Notice that things do not improve if we use monthly or weekly or daily data (why?). It is clear that any direct approach to the estimation of risk premia is doomed to failure. A connected argument shall be considered at the end of this chapter.

The apparent truncation at n should be briefly commented upon. As we have just seen, the standard estimate should be based on the full set of available observations. This could be applied as a convention also to the Riskmetrics estimate. On the other hand, consider the fact that, e.g., a λ = 0.95 raised to the power of 256 (conventionally one year of daily data) is less than 0.000002. So, at least with daily data, truncating after one year of data (or even before) is substantially the same as considering the full data set.
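As a quick sanity check of this claim, a two line Python computation (illustrative only; the code sketches in this chapter are ours, not part of any named library) reproduces the order of magnitude:

```python
# Weight given to a one-year-old daily observation in the Riskmetrics estimate
lam = 0.95
print(lam ** 256)  # about 2e-06: daily data older than a year are negligible
```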

As is well known:

$$\sum_{i=0,\dots,\infty} \lambda^i = \frac{1}{1-\lambda}$$

(for 0 < λ < 1), so that we can approximate the $V_t$ estimate as:

$$V_t = (1-\lambda)\sum_{i=0,\dots,n} \lambda^i r_{t-i}^2$$

In order to understand the meaning of this estimate it is useful to write it in a recursive form (this is also useful for computational purposes). We can directly check that:

$$V_t = \lambda V_{t-1} + \frac{r_t^2}{\sum_{i=0,\dots,n}\lambda^i} - \frac{\lambda^{n+1}\, r_{t-n-1}^2}{\sum_{i=0,\dots,n}\lambda^i}$$

In fact, since

$$V_{t-1} = \frac{\sum_{i=0,\dots,n}\lambda^i r_{t-1-i}^2}{\sum_{i=0,\dots,n}\lambda^i}$$

we have



$$\lambda V_{t-1} + \frac{r_t^2}{\sum_{i=0,\dots,n}\lambda^i} - \frac{\lambda^{n+1} r_{t-n-1}^2}{\sum_{i=0,\dots,n}\lambda^i} = \frac{\sum_{i=0,\dots,n}\lambda^{i+1} r_{t-1-i}^2}{\sum_{i=0,\dots,n}\lambda^i} + \frac{r_t^2}{\sum_{i=0,\dots,n}\lambda^i} - \frac{\lambda^{n+1} r_{t-n-1}^2}{\sum_{i=0,\dots,n}\lambda^i} =$$

$$= \frac{r_t^2 + \sum_{i=0,\dots,n-1}\lambda^{i+1} r_{t-1-i}^2}{\sum_{i=0,\dots,n}\lambda^i} = \frac{\sum_{i=0,\dots,n}\lambda^i r_{t-i}^2}{\sum_{i=0,\dots,n}\lambda^i}$$

which is the definition of $V_t$. For the standard range of values of λ and n the last term can be approximated with 0. Using the approximate value of the denominator we have:

$$V_t = \lambda V_{t-1} + (1-\lambda)\, r_t^2$$

In practice the new estimate $V_t$ is a weighted mean of the old estimate $V_{t-1}$ (weight λ, usually big) and of the latest squared log return (weight 1−λ, usually small).

A simple consequence of this (and of the fact that the estimate does not consider the mean return) is the following. Since the squared return is always non negative and λ is usually near one, this formula implies that, even if the new return is 0, $V_t$ is still going to be equal to $\lambda V_{t-1}$, so that the estimated variance can decrease at most by a percentage 1−λ at each step. On the other hand, it can increase, in principle, by any amount when abnormally big squared returns are observed. This implies an asymmetric behavior: following any shock, which introduces an abrupt jump in $V_t$, a sequence of "normal" values for returns shall reduce the estimated value in a smoothed way, the faster the smaller is λ. The reader should remember that this behavior of the estimated volatility is purely a feature of the formula used for the estimate.
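The recursive form makes the estimate trivial to compute. The following Python sketch (a minimal illustration; the function name, the starting value convention and the simulated data are our own choices) implements the recursion $V_t = \lambda V_{t-1} + (1-\lambda)r_t^2$:

```python
import numpy as np

def riskmetrics_variance(returns, lam=0.95):
    """Exponentially smoothed variance: V_t = lam * V_{t-1} + (1 - lam) * r_t**2.

    `returns` is a 1-d array of log returns, oldest first; the recursion is
    started, arbitrarily, from the first squared return.
    """
    v = np.empty(len(returns))
    v[0] = returns[0] ** 2
    for t in range(1, len(returns)):
        v[t] = lam * v[t - 1] + (1 - lam) * returns[t] ** 2
    return v

# Example on simulated daily log returns with a 1.64% standard deviation
rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.0164, size=1000)
v = riskmetrics_variance(r)
print("last estimated daily volatility:", np.sqrt(v[-1]))
```

Note how, after a large squared return, the estimate jumps up by $(1-\lambda)r_t^2$ and then decays by a factor λ per day: exactly the asymmetric behavior described above.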

The use of such an estimate of σ² implies a disagreement with the standard version of the LRW hypothesis, as described above, as it implies a time evolution of the variance of returns. The recursive formula:

$$V_t = \lambda V_{t-1} + (1-\lambda)\, r_t^2$$

is the empirical analogue of an autoregressive model for the variance of returns of the form:

$$\sigma_t^2 = \lambda\, \sigma_{t-1}^2 + (1-\lambda)\, \varepsilon_t^2$$

which is a particular case of a class of dynamic models for conditional volatility (ARCH: Auto Regressive Conditional Heteroskedastic) of considerable fortune in the econometric literature.


3.1 Is it easier to estimate μ or σ²?

It is useful to end this small chapter discussing a widely held belief, supported by some empirical results, according to which the estimation of variances (and to a lesser degree of covariances) is an easier task than the estimation of expected returns, at least in the sense that the percentage error in the estimate shall be smaller than in the case of expected return estimation.

The educated heuristics underlying such a belief are as follows.

Consider log returns from a typical stock; let them be i.i.d. with expected value (on a yearly basis) of .07 and standard deviation .3. The usual estimate of the expected value, that is the arithmetic mean, shall be unbiased and with a sampling standard deviation of $.3/\sqrt{n}$, where n is the number of years used in the estimation (and this shall be independent of the frequency of observations we are going to use). Hence, the ratio between the true expected value and its sampling standard deviation shall be smaller than one even with ten years of data, or, in other words, the error could easily be bigger than the value you wish to estimate even with 10 years of data.

Now, consider the estimate of the variance and, to keep things simple, suppose we use the standard and not the exponentially smoothed estimate and suppose the expected value to be known (the general case, with smoothed estimate and unknown expected value, yields more difficult computations but a similar result). In this case we need only an estimate of the second moment (our estimate is going to be the second empirical moment minus the square of the expected value) and the best (in our setting) unbiased estimate is going to be the mean of squared returns.

Let us now compute the sampling variance of our variance estimate, and let $x_i$ be the period return:

$$V\left(\frac{\sum_i x_i^2}{n} - \mu^2\right) = V\left(\frac{\sum_i x_i^2}{n}\right) = \frac{1}{n}\,V(x^2) = \frac{1}{n}\left(E(x^4) - E(x^2)^2\right) = \frac{1}{n}\left(E(x^4) - (\sigma^2 + \mu^2)^2\right)$$

In our setting the only unknown is the fourth moment. If we suppose our data to be Gaussian (or not very un-Gaussian) the fourth moment is a function of the first and second moments and we have

$$E(x^4) = 3\sigma^4 + 6\sigma^2\mu^2 + \mu^4$$

so that the sampling variance of the variance estimate is:

$$V\left(\frac{\sum_i x_i^2}{n} - \mu^2\right) = \frac{2}{n}\,\sigma^2\left(\sigma^2 + 2\mu^2\right)$$

and the sampling standard deviation of the estimated variance shall be, with our numbers, $.13403/\sqrt{n}$.


The return variance is .09, so that, using 10 yearly data points, we get a ratio of more than 2 between the value of the parameter and its sampling standard deviation. Over 10 years we are, in a somewhat pictorial language, more than twice better off in estimating the variance than the expected value.

But there is more to this: suppose we use, say, monthly data in place of yearly data.

Under the i.i.d. hypotheses the expected value of monthly returns (let us call them again $x_i$) shall be μ/12 and the corresponding variance σ²/12.

The estimate of the yearly expected value based on the monthly data shall be given by the monthly average return times 12. The variance of this estimate, over n years of data, that is 12n observations, shall then be

$$V\left(12\,\frac{\sum_{i=1}^{12n} x_i}{12n}\right) = V\left(\frac{\sum_{i=1}^{12n} 12\,x_i}{12n}\right) = V\left(\frac{\sum_{i=1}^{12n} x_i}{n}\right) = \frac{12n}{n^2}\,V(x) = \frac{12}{n}\,\frac{\sigma^2}{12} = \frac{\sigma^2}{n}$$

Nothing changes: there is no advantage in using more frequent observations (well, this should be obvious: using log returns, if we begin and end with the same prices, the mean yearly return shall always be equal to the log ratio of these two prices divided by the number of years (maybe containing decimals) between the prices, independently of how many times we divide the time period).

For the variance things are different: again, the estimate of the yearly variance on the basis of monthly data shall be equal to 12 times the monthly empirical variance.

In our setting we have to compute

$$V\left(12\,\frac{\sum_{i=1}^{12n} x_i^2}{12n}\right) = V\left(\frac{\sum_{i=1}^{12n} 12\,x_i^2}{12n}\right) = \frac{12n}{n^2}\,V(x^2) = \frac{12}{n}\,V(x^2)$$

Up to now everything goes as in the case of the mean, but remember that now returns have an expected value and a variance divided by 12, so that

$$V(x^2) = E(x^4) - E(x^2)^2 = 3\frac{\sigma^4}{144} + 6\frac{\mu^2}{144}\,\frac{\sigma^2}{12} + \frac{\mu^4}{144^2} - \left(\frac{\sigma^2}{12} + \frac{\mu^2}{144}\right)^2 =$$

$$= 2\frac{\sigma^4}{144} + 4\frac{\mu^2}{144}\,\frac{\sigma^2}{12} = 2\frac{\sigma^2}{12}\left(\frac{\sigma^2}{12} + 2\frac{\mu^2}{144}\right)$$

and the above formula for the sampling variance of the variance becomes

$$\frac{12}{n}\,V(x^2) = \frac{12}{n}\,2\frac{\sigma^2}{12}\left(\frac{\sigma^2}{12} + 2\frac{\mu^2}{144}\right) = \frac{2}{n}\,\sigma^2\left(\frac{\sigma^2}{12} + 2\frac{\mu^2}{144}\right)$$

This is NOT what we had before: it is smaller. In fact, if we ignore the terms in μ (which, being squared and divided by 144, amount to almost nothing), this result is going to be 12 times smaller than before, so that the sampling standard deviation of the variance shall be roughly 3.46 (the square root of 12 to 2 decimal points) times smaller, and the ratio of the yearly variance to the sampling standard deviation of its estimate shall be 3.46 times bigger than before¹³.

¹³ The exact values for the sampling standard deviations over 10 years are .04238 with yearly data and .01167 with monthly data, with a ratio of 3.63.

Frequency of data matters when estimating the variance, and we can expect analogous improvements when using weekly or daily data.

All this is true under ideal hypotheses: independence and identical Gaussian distribution. A similar argument holds for dependent observations and non Gaussian but "reasonable" distributions of returns (for instance, the argument breaks down if the fourth moment does not exist).
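These back of the envelope numbers are easy to verify by simulation. The following Python sketch (our own illustration, under the ideal i.i.d. Gaussian hypotheses of the argument, with the parameter values used above) reproduces the sampling standard deviations of footnote 13:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, years, n_sims = 0.07, 0.30, 10, 20000

# Yearly data: sampling sd of the mean and of the variance estimate (known mu)
yearly = rng.normal(mu, sigma, size=(n_sims, years))
print(yearly.mean(axis=1).std())            # ~0.095 = 0.3/sqrt(10), larger than mu!
print((yearly ** 2).mean(axis=1).std())     # ~0.0424 (footnote 13, yearly data)

# Monthly data: yearly variance estimated as 12 times the monthly second moment
monthly = rng.normal(mu / 12, sigma / np.sqrt(12), size=(n_sims, 12 * years))
print((12 * (monthly ** 2).mean(axis=1)).std())  # ~0.0117 (footnote 13, monthly data)
```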

Examples

Exercise 2 - volatility.xls
Exercise 3 - risk premium.xls
Exercise 3a - exp smoothing.xls
Exercise 3b - historical and implied volatility.xls
Exercise 3c - volatility.xls

    4 Non Gaussian returns

It can be argued that a reasonable decision maker should be interested in the full probability distribution of returns implied by the chosen strategy. This should be true even if, in common academic analyses of decision under uncertainty, the use of polynomial utility functions tends to overweight the role of the moments of the return distribution, in particular of the expected value and variance¹⁴.

¹⁴ Due to the linearity of the expected value, the expected value of a polynomial utility function (that is, a linear combination of powers of the relevant variable) is a weighted sum of moments: $E(\sum_i \alpha_i X^i) = \sum_i \alpha_i E(X^i)$.

In some cases, as for instance in the Gaussian case, the simple knowledge of expected value and variance is equivalent to the knowledge of the full probability distribution. In this case the expected value of any utility function shall only depend on the expected value and the variance of the distribution, as these two parameters fully specify a Gaussian distribution. Another way to say the same is that, in this case, if we are interested in the probability with which a random variable X can show values less than or equal to a given value k, it is enough to possess the tables of the standard Gaussian cumulative distribution function Φ(.) and compute:

$$\Phi\!\left(\frac{k-\mu}{\sigma}\right)$$



But this is not generally true. For instance, it is possible to conceive a non standardized version of the T distribution where the knowledge of the first and second moments is not enough in order to compute probability intervals. In this case a new parameter is necessary (the degrees of freedom).

It is then of real interest to find good distribution models for stock returns and, in particular, to evaluate whether the simplest and most tractable model, the Gaussian distribution, can do the job.

A better understanding of the problem can be achieved if we consider that, in most applications, we are not interested in the overall fit of the Gaussian distribution to observed returns but only in the quality of the fit in the "hot spots" of the distribution, mainly the tails. In finance the biggest losses are usually connected to extreme, negative, observations (for an unhedged institutional investor). We shall see that the Gaussian distribution, while being, overall, not such a bad approximation of the underlying return distribution, is not so for the extreme, say 2%, tails¹⁵.

¹⁵ The Gaussian distribution can be a good approximation of many different distributions if we are interested (as is true in many applications of statistics) in the behaviour of a random variable near its median. For modeling extreme events, having to do with system failures, breakdowns, crises and similar phenomena, a totally different kind of distribution may be required.

When studying stock returns, we observe extreme events, mainly negative, of the order of −5σ and beyond, with a frequency which is incompatible with the probability of such or more negative events under the hypothesis of Gaussianity. While quite infrequent (do not be fooled by the fact that extreme events always make the news and so become memorable), they are much more frequent than would be compatible with a Gaussian calibrated on the expected value and variance of the observed data. For instance, the probability of a −5σ or more negative observation in a Gaussian is less than 0.00000028.

Let us consider an example based on I.B.M. daily returns.

Between Jan 2nd 1962 and Dec 29th 2005 the IBM daily return shows a standard deviation of 0.0164. In this time period the return was below −5σ 14 times (supposing a μ of 0). The number of observations is 11013, so the observed frequency of a −5σ event is 0.00127, that is: more than 4500 times the probability of such observations for a Gaussian with the same standard deviation!
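This kind of comparison is immediate to reproduce once a series of daily returns is available. A Python sketch (the fat tailed simulated series stands in for the IBM data, which we do not distribute with these notes):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
returns = 0.01 * rng.standard_t(df=3, size=11013)   # fat tailed toy daily returns

sigma = returns.std()
observed_freq = (returns < -5 * sigma).mean()   # empirical frequency of -5 sigma events
gaussian_prob = norm.cdf(-5)                    # about 2.9e-07 for any Gaussian

print(observed_freq, gaussian_prob, observed_freq / gaussian_prob)
```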

This is true for a very mature and conservative stock like I.B.M.

Obviously, a frequency of 0.00127 is very small, but the events on which it is computed (big crashes) are those which are remembered in the history of the market. It is quite clear that a Gaussian distribution hypothesis could imply a gross underestimation of the probability of such events.

The behaviour of the empirical distribution of returns can be summarized in the motto: fat tails, thin shoulders, tall head. In other words, given a set of (typically daily) returns over a long enough time period (we need to estimate tails and this requires lots of data), we can plot the histogram of our data against the density of a Gaussian distribution with the same mean and variance.



[Figure: Empirical vs Gaussian density, I.B.M. data. Histogram of relative frequencies plotted against the standard Gaussian density.]

[Figure: Left tail, empirical vs Gaussian CDF, I.B.M. data. Empirical CDF plotted against the Gaussian CDF for standardized returns between -6 and 0.]

[Figure: Quantile-Quantile plot, I.B.M. data. Standardized sorted returns plotted against standard Gaussian equivalent observations.]

Since the data are standardized, the scale of the plot is in terms of number of standard deviations. We see that, on the left tail, we even observe data near and beyond −6 times the standard deviation. The tail from minus infinity to −6 times the standard deviation contains a probability of the order of 5 divided by one billion for the standard Gaussian distribution. We also observe 10 data points in the leftmost −5σ tail. Since our dataset is based on 10 years of data, roughly 2600 observations, if we read our data as the result of independent extractions from the same Gaussian, these observations, while possible, are by no means expected, as the probability of observing 10 times, in 2600 independent draws, something which has in each draw a probability of 0.00000028 of being observed is virtually 0¹⁶.

¹⁶ Use the binomial distribution. Question: suppose the probability of observing a −5σ event in each of 2600 independent draws is 0.00000028. What is the probability of observing 10 such events? The answer, computed with Excel, is $\binom{2600}{10}\, 0.00000028^{10}\, (1-0.00000028)^{2590} = 0.0000\dots$, meaning that, at the precision level of Excel, we have a 0! While the exact number is not 0, this means that, at least in Excel, the actual rounding error could be quite a bit bigger than the result. For all purposes the answer is 0. Question: in this section we evaluated the unlikelihood of −5σ results in two different ways: first with a ratio between a frequency and a Gaussian based probability, then using the binomial distribution and, again, the Gaussian based probability. What is the connection between these two, different, arguments?

In the following section we shall consider the relevance of these empirical facts from the point of view of VaR estimation.

Examples

Exercise 4 - Non normal returns.xls
Exercise 4b - Non normal returns.xls

5 Four different ways for computing the VaR

First it is necessary to define the VaR¹⁷.

¹⁷ For this section I implicitly refer to the worksheets "parametric and non parametric var estimate" and "Gaussian mixture model", available on Learning Space.

Basically, the VaR is a quantile times a sum of money. We begin by defining a time interval and a confidence coefficient α of interest (typical values for the time interval: 1 day, 1 week, sometimes 1 month, not much more; typical values for α: .05, .01). Then we define R as the random variable "return of our portfolio in the period of interest". We then compute the quantile $r_\alpha$ defined as:

$$r_\alpha = \inf\{r : P(R \le r) \ge \alpha\}$$

If the CDF of R is continuous the above definition is equivalent to:

$$r_\alpha = \{r : P(R \le r) = \alpha\}$$

In the end, the VaR is simply $r_\alpha$ times the amount of money invested in the portfolio of which R is the period return.

Just as an aside: the VaR has been widely criticized as a measure of risk. Most of the critiques depend on the choice of strange return distributions, quite inappropriate at our level (we only study stock returns). Anyway, a strong point against the VaR is that it tells nothing about what happens to its left, that is: the shape of the left tail beyond the VaR quantile is irrelevant for its computation. On the other hand, it could be argued that this shape should be relevant for the actual investment decision.

We see that the computation of a VaR boils down to the computation of a quantile of the return distribution. The problem is that the quantile is a rather extreme one, in the left tail of the return distribution. We (hopefully) are not going to observe this tail too often; consequently, the quality of our estimate could be unsatisfactory due to lack of data, if we use a model that does not rely on strong assumptions, or due to the incorrectness of our hypotheses, if we rely on strong modeling assumptions.

In this section we shall consider four different estimates of the VaR which rely on different sets of hypotheses. Each estimate shall be presented in a very simple form; the reader is warned that actual implementations of any of these estimates (they are all actually used by practitioners) can imply any amount of additional sophistication.

5.1 Gaussian VaR

5.1.1 Point estimate of the Gaussian VaR

This is the most restrictive setting used in practice. We suppose that R is distributed according to a Gaussian density with expected value and variance (μ, σ²), which are either known or estimated in such a way as to minimize sampling error problems. A typical attitude is that of setting μ = 0 and estimating σ², for instance, using the smoothed estimate described above.

The important point to remember with the Gaussian density is that, under this hypothesis, knowledge of the mean and variance is equivalent to knowledge of any quantile. The reader may remember having been exposed to a number of beginners' models for decision under uncertainty, usually based on some mean-variance comparison. This could have been puzzling: after all, why should the decision maker be interested in these moments and not in the shape of the probability distribution of returns? A possible justification of this attitude is that, if the return distribution is believed to be Gaussian, mean and variance are everything that matters.

Under the Gaussian hypothesis the CDF is continuous, so we can find a quantile with exactly probability α on its left for any α.


The procedure is simple: we must find $r_\alpha$ such that:

$$P(R \le r_\alpha) = \alpha$$

Proceeding with the usual argument, already well known from confidence interval theory, we get:

$$P(R \le r_\alpha) = \alpha = P\!\left(\frac{R-\mu}{\sigma} \le \frac{r_\alpha-\mu}{\sigma}\right) = \Phi\!\left(\frac{r_\alpha-\mu}{\sigma}\right) = \Phi(z_\alpha)$$

where $z_\alpha$ is the usual α quantile of the standard Gaussian CDF Φ(.). We have, then:

$$\frac{r_\alpha-\mu}{\sigma} = z_\alpha \qquad\Longrightarrow\qquad r_\alpha = \mu + z_\alpha\,\sigma$$

A really easy procedure. The problem is that, for small values of α, we are considering quantiles very far out on the left tail, and our previous empirical analysis has shown how the Gaussian hypothesis for returns (overall not so bad) is inadequate for extreme tails.

Typically, the problem of fat tails shall imply an undervaluation of the VaR.
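A minimal Python sketch of this point estimate (the function name, the invested amount and the Riskmetrics convention μ = 0 are our illustrative choices):

```python
from scipy.stats import norm

def gaussian_var(sigma, alpha=0.01, amount=1_000_000, mu=0.0):
    """Gaussian VaR: the alpha quantile of returns, mu + z*sigma, times the amount."""
    return (mu + norm.ppf(alpha) * sigma) * amount

# With the IBM-like daily volatility used in these notes:
print(gaussian_var(sigma=0.0164, alpha=0.01))   # about -38,000 per million invested
```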

5.1.2 Approximate confidence interval for the VaR

Now a problem: we know neither μ nor σ. We must estimate them. The usual Riskmetrics procedure sets μ = 0 and estimates σ with the smoothed estimate seen above. In the end we get an estimate of the VaR quantile, namely $\hat r_\alpha = z_\alpha\hat\sigma$. According to sound statistical practice, we should complement this with, at least, a standard deviation. Here we show a possible approximate and simple way to do so.

Under the assumptions of uncorrelated observations with constant variance and zero expected value it is easy to compute the variance of $\hat r_\alpha^2 = \hat\sigma^2 z_\alpha^2$. In fact we have:

$$V(\hat r_\alpha^2) = z_\alpha^4\, V\!\left(\frac{\sum_{i=0,\dots,n}\lambda^i r_{t-i}^2}{\sum_{i=0,\dots,n}\lambda^i}\right) = z_\alpha^4\, \frac{\sum_{i=0,\dots,n}\lambda^{2i}\, V(r_{t-i}^2)}{\left(\sum_{i=0,\dots,n}\lambda^i\right)^2} = z_\alpha^4\, \frac{\sum_{i=0,\dots,n}\lambda^{2i}}{\left(\sum_{i=0,\dots,n}\lambda^i\right)^2}\, 2\sigma^4$$

where the last equality uses $V(r^2) = E(r^4) - \sigma^4 = 2\sigma^4$, which holds for zero mean Gaussian returns.
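To fix ideas, here is one possible way to turn the formula above into an approximate band in Python. The delta method step from the variance of $\hat\sigma^2$ to that of $\hat\sigma$, and the two sigma multiplier, are our own shortcuts for this sketch, not a prescription of the Riskmetrics methodology:

```python
import numpy as np
from scipy.stats import norm

def approx_var_interval(returns, lam=0.95, alpha=0.01):
    """VaR quantile estimate z*sigma-hat with a rough two-sigma band."""
    w = lam ** np.arange(len(returns))            # weight lambda^i, newest return first
    sigma2 = np.sum(w * returns[::-1] ** 2) / np.sum(w)
    # Sampling variance of sigma2-hat using V(r^2) = 2*sigma^4 (zero-mean Gaussian)
    v_sigma2 = np.sum(w ** 2) / np.sum(w) ** 2 * 2 * sigma2 ** 2
    sd_sigma = np.sqrt(v_sigma2) / (2 * np.sqrt(sigma2))   # delta method for sigma-hat
    z = norm.ppf(alpha)
    r_hat = z * np.sqrt(sigma2)
    half = 2 * abs(z) * sd_sigma
    return r_hat, r_hat - half, r_hat + half

rng = np.random.default_rng(2)
print(approx_var_interval(rng.normal(0.0, 0.0164, size=256)))
```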


    5.1.3 An exact interval under stronger hypotheses (not for the exam)

For those interested in exact confidence intervals, we can derive a formally stronger result using the following theorem:

Theorem 5.1. If $\{X_1, X_2, \dots, X_n\}$ are i.i.d. Gaussian random variables with expected value μ and standard deviation σ, then

$$S^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$$

is distributed according to a Chi square distribution with n degrees of freedom.

This implies that, if we estimate the variance with the non smoothed sample variance (with μ = 0):

$$\hat V(r) = \frac{\sum_{i=1}^{n} r_i^2}{n}$$

we have that

$$\frac{n\,\hat V(r)}{\sigma^2} = \sum_{i=1}^{n} \frac{r_i^2}{\sigma^2} \sim \chi^2_n$$

so that

$$P\!\left(\frac{n\,\hat V(r)}{\sigma^2} \ge \chi^2_{n,1-\alpha}\right) = P\!\left(\frac{n\,\hat V(r)}{\chi^2_{n,1-\alpha}} \ge \sigma^2\right) = 1-\alpha$$

where $\chi^2_{n,1-\alpha}$ is the quantile which a $\chi^2_n$ random variable exceeds with probability 1−α. From this we see that a confidence interval lower extreme for the quantile $r_\alpha$ is given by

$$\hat L = \sqrt{\frac{n\,\hat V(r)}{\chi^2_{n,1-\alpha}}}\; z_\alpha$$

With the same numbers we used above, and using α = .025, so that $\chi^2_{n,1-\alpha} = \chi^2_{256,.975} = 213.5747$, and a $z_\alpha$ with α = .025, this becomes

$$\hat L = \sqrt{\frac{256 \times 0.0164^2}{213.5747}} \times 1.96 = .0352$$

against a point estimate of .0321.

We see that our two sigma approximation (which is valid even if we use the smoothed estimate) is more conservative than the exact result (based on stronger hypotheses).
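The exact computation is a few lines with any statistical library; the following Python sketch reproduces the numbers of the example (scipy's chi2.ppf(0.025, 256) returns the 213.57 value used in the text):

```python
import numpy as np
from scipy.stats import chi2, norm

n, sigma_hat, alpha = 256, 0.0164, 0.025

chi2_value = chi2.ppf(alpha, df=n)                   # 213.57: chi2_256 lower quantile
sigma_upper = np.sqrt(n * sigma_hat ** 2 / chi2_value)

print(abs(norm.ppf(alpha)) * sigma_hat)     # point estimate: about 0.0321
print(abs(norm.ppf(alpha)) * sigma_upper)   # exact confidence bound: about 0.0352
```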

Note for students: you should see these computations as an example of the standard back of the envelope computations quite common in any field where statistics is applied for practical purposes. Follow the passages and ask if you do not understand something.


    5.2 Nonparametric VaR

5.2.1 Point estimate

The nonparametric VaR estimate stands, in some sense, at the opposite end from the Gaussian VaR. In the nonparametric case we suppose only that returns are i.i.d., but we avoid making any hypothesis on the underlying distribution.

In order to find the VaR we need an estimate of the theoretical distribution, which is unknown. The starting point of all nonparametric procedures is to estimate the theoretical distribution using the empirical distribution function. Suppose we have a sample of n i.i.d. returns which yield observed values $\{r_1, r_2, \dots, r_n\}$; then our estimate of P(.) shall be:

$$\hat P(R \le r) = \hat F_R(r) = \frac{\#\, r_i \le r}{n}$$

where $\#\, r_i \le r$ means the number of observed returns less than or equal to r.
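A minimal Python sketch of the nonparametric point estimate (function and variable names are ours; the simulated fat tailed series again stands in for real data):

```python
import numpy as np

def nonparametric_var(returns, alpha=0.01, amount=1_000_000):
    """Empirical quantile VaR: smallest r with F_hat(r) >= alpha, times the amount."""
    sorted_r = np.sort(returns)
    k = int(np.ceil(alpha * len(returns)))   # smallest k with k/n >= alpha
    return sorted_r[k - 1] * amount

rng = np.random.default_rng(3)
returns = 0.01 * rng.standard_t(df=3, size=2500)
print(nonparametric_var(returns, alpha=0.01))
```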

