Preface
These are sample lecture notes for a graduate-level financial econometrics course. Students are
assumed to have completed an intermediate econometrics course and an introductory finance
course, or the equivalent. The course aims to help students establish a solid background
in both theoretical and empirical financial econometrics. The lecture notes contain
enough material for a two-semester course, or they can be condensed
to fit a one-semester course, depending on the course design. The topics
start at a moderate level of difficulty, which gradually increases as the course
proceeds.
The lecture notes were developed while I was completing my PhD in the Queen’s
University economics department. They were created based on the following
textbooks:
• “The Econometrics of Financial Markets” by John Y. Campbell, Andrew W. Lo and
A. Craig MacKinlay
• “Finance Theory and Asset Pricing” by Frank Milne
• “Econometric Theory and Methods” by Russell Davidson and James G. MacKinnon
I used these lecture notes while working as a lecturer, TA, and private tutor for both
undergraduate and graduate students. The notes have received positive feedback
and strong reviews from my students, as documented in the enclosed “2012 Winter
Term Evaluation Form”.
Contents
1 Introduction
2 The Predictability of Asset Returns
2.1 The Random Walk Hypothesis
2.1.1 How to Test the Random Walk Hypothesis?
2.2 Tests for Long-Range Dependence
2.3 Unit Root Test
2.4 Recent Empirical Evidence
3 Market Microstructure
3.1 Nonsynchronous Trading
3.1.1 Duration of Nontrading
3.1.2 Time Aggregation
3.1.3 Extensions and Generalizations
3.2 The Bid-Ask Spread
3.2.1 Bid-Ask Bounce
3.3 Modeling Transactions Data
4 Event-Study Analysis
4.1 Outline of an Event Study
4.2 Models for Measuring Normal Performance
4.3 Measuring and Analyzing Abnormal Returns
5 The Capital Asset Pricing Model
5.1 Review of the CAPM
5.2 Statistical Framework for Estimation and Testing
5.3 Implementation of Tests
6 Multifactor Pricing Models
6.1 The Arbitrage Pricing Theory (APT)
6.2 Estimation and Testing
6.2.1 Portfolios as Factors with a Riskfree Asset
6.2.2 Portfolios as Factors without a Riskfree Asset
6.3 Estimation of Risk Premia and Expected Returns
6.4 Selection of Factors
6.5 Interpreting Deviations from Exact Factor Pricing
7 Present-Value Relations
7.1 The Relation between Prices, Dividends, and Returns
7.1.1 Rational Bubbles
7.1.2 An Approximate Present-Value Relation with Time-Varying Expected Returns
7.2 Present-Value Relations and US Stock Price Behavior
7.2.1 Volatility Tests
7.2.2 Vector Autoregressive Methods
8 Intertemporal Equilibrium Models
8.1 The Stochastic Discount Factor
8.2 Consumption-Based Asset Pricing with Power Utility
8.3 Market Frictions
8.4 More General Utility Functions
9 Derivative Pricing Models
9.1 A Brief Review of Derivative Pricing Methods
9.2 Implementing Parametric Option Pricing Models
9.2.1 Implied Volatility Estimators
9.3 Pricing Path-Dependent Derivatives Via Monte Carlo Simulation
10 Fixed-Income Securities
10.1 Basic Concepts
10.1.1 Discount Bond
10.1.2 Coupon Bonds
10.2 Interpreting the Term Structure of Interest Rates
11 Term-Structure Models
11.1 Affine-Yield Models
11.2 Fitting Term-Structure Models to the Data
11.3 Pricing Fixed-Income Derivative Securities
12 Nonlinearities in Financial Data
12.1 ARCH, GARCH
12.2 Nonparametric Estimation
12.2.1 Kernel Estimation
12.2.2 Nonparametric Regression
12.3 Artificial Neural Networks
Chapter 1
Introduction
The authors trace the Efficient Market Hypothesis (EMH) back at
least as far as the pioneering theoretical contribution of Bachelier (1900). Fama (1970)
summarizes the idea very neatly: “A market in which prices always ‘fully reflect’
available information is called ‘efficient’.”
There are three forms of efficiencies:
• Weak-form Efficiency: The information set includes only the history of prices or
returns themselves.
• Semistrong-Form Efficiency: The information set includes all information known
to all market participants (publicly available information).
• Strong-Form Efficiency: The information set includes all information known to
any market participant (private information).
One critical question one may ask is: is market efficiency testable? The answer is a
little bit tricky. Although the empirical methodology is well established, interpreting its
results raises serious difficulties. First, any test of efficiency must assume
an equilibrium model that defines normal security returns, so a rejection can reflect either
market inefficiency or a poorly specified model. Second, perfect efficiency is an unrealistic
benchmark that is unlikely to hold in practice. Therefore, the notion of relative efficiency
is perhaps more appropriate.
Chapter 2
The Predictability of Asset Returns
The authors consider the problem of forecasting future price changes, using only past price
changes to construct forecasts.
2.1 The Random Walk Hypothesis
Three versions of the random walk hypothesis:
1. IID Increments (RW1): perhaps the simplest case. The model can be written as

Pt = µ + Pt−1 + εt,  εt ∼ IID(0, σ²).

2. Independent Increments (RW2): allows for heteroskedasticity.

3. Uncorrelated Increments (RW3): an even more general version of the random walk hypothesis. Note that independence implies uncorrelatedness, but the converse does not hold. For example, a process with Cov(εt, εt−k) = 0 for all k ≠ 0 but Cov(εt², εt−k²) ≠ 0 for some k ≠ 0 has uncorrelated increments, but is clearly not independent.
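The distinction between RW1 and RW3 can be made concrete by simulating increments that are serially uncorrelated but not independent. The ARCH-style recursion below is an illustrative sketch of my own, not a model from the notes; the parameter values are arbitrary.

```python
import numpy as np

# Sketch (not from the notes): ARCH(1)-style increments are serially
# uncorrelated, yet their squares are autocorrelated, so the process
# satisfies RW3 but violates RW1 (and independence more generally).
rng = np.random.default_rng(0)
n = 100_000
z = rng.standard_normal(n)
eps = np.empty(n)
eps[0] = z[0]
for t in range(1, n):
    # conditional variance depends on the previous shock
    eps[t] = z[t] * np.sqrt(0.2 + 0.5 * eps[t - 1] ** 2)

def acorr1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

print(round(acorr1(eps), 3))       # near zero: uncorrelated increments
print(round(acorr1(eps ** 2), 3))  # clearly positive: not independent
```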
2.1.1 How to Test the Random Walk Hypothesis?
The authors mentioned several traditional statistical tests: sequences, reversals, and runs
for RW1; filter rules, technical analysis for RW2; serial correlation test for RW3.
2.2 Tests for Long-Range Dependence
One departure from the random walk hypothesis is the phenomenon of long-range
dependence (or long memory, in the time series literature). A classic example is

(1 − L)^d pt = εt,  εt ∼ IID(0, σε²),

where L is the lag operator. If d = 1, the above process is a random walk with no drift.
If d is extended to non-integer values, the result is a well-defined time series that exhibits
long-range dependence. The series pt is stationary and invertible for d ∈ (−1/2, 1/2), but
its autocorrelations die out far more slowly than those of an ordinary stationary time series.
The authors discuss the Hurst-Mandelbrot Rescaled Range Statistic, or R/S
statistic: let r̄n denote the sample mean of r1, r2, ..., rn; then the R/S statistic is

Qn = (1/sn)[max_{1≤k≤n} Σ_{j=1}^{k} (rj − r̄n) − min_{1≤k≤n} Σ_{j=1}^{k} (rj − r̄n)],

where sn is the usual (maximum likelihood) standard deviation estimator. The asymptotic
distribution of Qn, suitably normalized, is given by the range of a Brownian bridge. Of course, modern time series
analysis has produced many more versatile and robust test statistics, such as the augmented Dickey-Fuller test.
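The R/S statistic can be computed directly from its definition above. The sketch below is my own minimal implementation, assuming the returns arrive as a NumPy array; the function name is illustrative.

```python
import numpy as np

# Minimal sketch of the Hurst-Mandelbrot rescaled-range (R/S) statistic
# as defined above; s_n is the ML (divide-by-n) standard deviation.
def rescaled_range(r):
    r = np.asarray(r, dtype=float)
    dev = np.cumsum(r - r.mean())      # partial sums of deviations from the mean
    s_n = r.std(ddof=0)                # maximum likelihood standard deviation
    return (dev.max() - dev.min()) / s_n

rng = np.random.default_rng(1)
q_n = rescaled_range(rng.standard_normal(1000))
# For IID data, Q_n normalized by sqrt(n) converges to the range of a
# Brownian bridge, so Q_n grows roughly like sqrt(n).
print(q_n)
```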
2.3 Unit Root Test
Note that testing for a unit root is not the same as testing the random walk hypothesis. The simplest and
most widely-used tests are variants of the Dickey-Fuller tests, or DF tests. Consider the
model

yt = βyt−1 + et,  et ∼ IID(0, σ²).

When β = 1, this model has a unit root. If we subtract yt−1 from both sides, we obtain

∆yt = (β − 1)yt−1 + et.

The obvious way to test the unit root hypothesis is to use the t statistic for the hypothesis
β − 1 = 0 against the alternative that this quantity is negative. This statistic is usually
referred to as a τ statistic. Another possible test statistic is n times the OLS estimate of
β − 1; this statistic is called a z statistic. If we wish to test for a unit root in a model where
the random walk has a drift, the appropriate test regression is

∆yt = γ0 + γ1t + (β − 1)yt−1 + et,

and if we wish to test for a unit root where the random walk has both a drift and a trend, the
appropriate test regression is

∆yt = γ0 + γ1t + γ2t² + (β − 1)yt−1 + et.
The asymptotic distributions of the Dickey-Fuller test statistics are referred to as nonstandard distributions or as Dickey-Fuller distributions.
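The τ statistic described above is just an ordinary t statistic in the test regression. The helper below is a hedged sketch for the no-constant case, with simulated series standing in for actual data; the function name is my own.

```python
import numpy as np

# Hedged sketch of the Dickey-Fuller tau statistic from the regression
# dy_t = (beta - 1) y_{t-1} + e_t (no constant): the ordinary t statistic
# on (beta - 1), computed with plain OLS.
def df_tau(y):
    y = np.asarray(y, dtype=float)
    dy, ylag = np.diff(y), y[:-1]
    coef = np.dot(ylag, dy) / np.dot(ylag, ylag)   # OLS estimate of beta - 1
    resid = dy - coef * ylag
    s2 = np.dot(resid, resid) / (len(dy) - 1)      # residual variance
    se = np.sqrt(s2 / np.dot(ylag, ylag))
    return coef / se

rng = np.random.default_rng(2)
e = rng.standard_normal(500)
rw = np.cumsum(e)                                  # true unit root: beta = 1
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + e[t]                 # stationary AR(1): beta = 0.5

print(df_tau(rw))   # moderate value: cannot reject the unit root
print(df_tau(ar))   # large negative value: reject the unit root
```

Recall from the text that under the unit root the τ statistic follows a nonstandard Dickey-Fuller distribution, so the usual normal critical values do not apply.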
2.4 Recent Empirical Evidence
Recent econometric advances and empirical evidence seem to suggest that financial asset
returns are predictable to some degree. The fine structure of securities markets and frictions
in the trading process can generate predictability. Time-varying expected returns due to
changing business conditions can generate predictability. A certain degree of predictability
may be necessary to reward investors for bearing certain dynamic risks.
Chapter 3
Market Microstructure
For some purposes, the market’s microstructure can be safely ignored, particularly when
longer investment horizons are involved. However, for other purposes – the measurement of
execution costs and market liquidity, the comparison of alternative market-making mechanisms,
the impact of competition and the potential for collusion among market makers –
market microstructure is central.
3.1 Nonsynchronous Trading
The nonsynchronous trading or nontrading effect arises when time series, usually asset
prices, are taken to be recorded at time intervals of one length when in fact they are recorded
at time intervals of other, possibly irregular, lengths.
For each security i in each period t, there is an unobserved or virtual continuously
compounded return rit. Assume there is some probability πi that security i does not trade, and
that whether the security trades or not is independent of rit. The observed return of security i,
r⁰it, depends on whether security i trades in period t. If security i does not trade in period t,
its observed return is zero: with no trades, the closing price is set to the previous period’s
closing price, and hence r⁰it = log(pit/pit−1) = log 1 = 0. If security i does trade in period t,
its observed return is the sum of the virtual returns in period t and in all prior consecutive
periods in which i did not trade.
Suppose that virtual returns are governed by a one-factor linear model:

rit = µi + βift + εit,  i = 1, ..., N,

where ft is some zero-mean common factor. We also assume that ft is IID and is independent
of εit−k for all i, t, and k. We then introduce two related random variables:

δit = 1 (no trade) with probability πi,  δit = 0 (trade) with probability 1 − πi;

Xit(k) ≡ (1 − δit)δit−1δit−2···δit−k,  k > 0,

so that Xit(k) = 1 with probability (1 − πi)πi^k and Xit(k) = 0 otherwise. Xit(k) is an
indicator variable that takes on the value one when security i trades in period t
but has not traded in any of the k previous consecutive periods, and is zero otherwise. An
explicit expression for observed returns r⁰it is then

r⁰it = Σ_{k=0}^{∞} Xit(k)rit−k,  i = 1, ..., N.
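The observed-return construction above can be illustrated by simulation: each untraded period's virtual return is deferred until the next trade. This is a hedged sketch of my own with arbitrary parameter values, not code from the notes.

```python
import numpy as np

# Illustrative simulation of the nontrading model: when security i does not
# trade, its observed return is 0 and the virtual return accrues to the
# next period in which it trades.
rng = np.random.default_rng(3)
n, pi, mu = 200_000, 0.4, 0.05                 # arbitrary illustration values
virtual = mu + rng.standard_normal(n) * 0.1    # virtual returns r_it
trades = rng.random(n) >= pi                   # True means delta_it = 0 (trade)

observed = np.zeros(n)                         # zero in nontrading periods
pending = 0.0                                  # accumulated untraded returns
for t in range(n):
    pending += virtual[t]
    if trades[t]:
        observed[t] = pending                  # sum of deferred virtual returns
        pending = 0.0

print(observed.mean(), virtual.mean())  # means nearly coincide
print(observed.var(), virtual.var())    # observed variance is larger (mu != 0)
```

The printout previews the result stated in the next subsection: nontrading leaves the mean of observed returns unchanged but inflates their variance when the expected return is nonzero.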
3.1.1 Duration of Nontrading
The duration of nontrading may be expressed as

kt ≡ Σ_{k=1}^{∞} Π_{j=1}^{k} δit−j.

Note that kt should not be confused with k. A more intuitive definition of kt is given by

r⁰it = Σ_{k=0}^{kt} rit−k,  i = 1, ..., N.

The mean and variance of kt are

E(kt) = πi/(1 − πi),  Var(kt) = πi/(1 − πi)².

Nontrading does not affect the mean of observed returns, but it does increase their variance if
the security has a nonzero expected return.
Denote by r⁰t the vector of observed returns of the N securities and define the autocovariance
matrix Γn as

Γn = E[(r⁰t − µ)(r⁰t+n − µ)ᵀ],  µ ≡ E[r⁰t].

Denoting the (i, j)th element of Γn by γij(n), we have

γij(n) = [(1 − πi)(1 − πj)/(1 − πiπj)] βiβjσf²πj^n.

If the nontrading probabilities πi differ across securities, Γn is asymmetric:

γij(n)/γji(n) = (πj/πi)^n.
3.1.2 Time Aggregation
Denote by r⁰iτ(q) the observed return of security i at time τ, where one unit of τ-time is
equivalent to q units of t-time; thus

r⁰iτ(q) ≡ Σ_{t=(τ−1)q+1}^{τq} r⁰it.

The first and second moments are:

E[r⁰iτ(q)] = qµi,

Var[r⁰iτ(q)] = qσi² + [2πi(1 − πi^q)/(1 − πi)²]µi²,

Cov[r⁰iτ(q), r⁰iτ+n(q)] = −µi²πi^((n−1)q+1)[(1 − πi^q)/(1 − πi)]²,  n > 0.

Expected returns time-aggregate linearly, but variances and covariances do not.
3.1.3 Extensions and Generalizations
The framework can be extended and generalized in many directions with little difficulty; for
example, by allowing ft to be a stationary AR(1) process. Dependence can be built into the
nontrading process itself by assuming that the δit’s form a Markov chain, so that the conditional
probability of trading tomorrow depends on whether or not a trade occurs today. Another
direction for further investigation is the possibility of dependence between the
nontrading and virtual returns processes.
3.2 The Bid-Ask Spread
One of the most important characteristics that investors look for in an organized financial
market is liquidity, the ability to buy or sell significant quantities of a security quickly,
anonymously, and with relatively little price impact. To maintain liquidity, many organized
exchanges use market makers, individuals who buy at the bid price Pb and sell at a higher
ask price Pa. The difference Pa − Pb is the bid-ask spread. The presence of the bid-ask
spread complicates matters in many ways: there are multiple prices (bid, ask, transaction),
and the spread creates spurious volatility and serial correlation in returns, as prices can bounce
back and forth between the ask and bid prices.
3.2.1 Bid-Ask Bounce
Roll (1984) proposes the following simple model. Denote by P*t the time-t fundamental
value of a security in a frictionless economy, and denote by s the bid-ask spread. Then the
observed market price Pt may be written as

Pt = P*t + It s/2,

where It is IID with

It = +1 with probability 1/2 (buyer-initiated trade),
It = −1 with probability 1/2 (seller-initiated trade).

Suppose P*t = P* is fixed through time. Then

∆Pt = ∆P*t + (It − It−1)s/2 = (It − It−1)s/2,

and

Var[∆Pt] = s²/2,  Cov[∆Pt−1, ∆Pt] = −s²/4,

which indicates that the wider the spread, the higher the variance (or volatility) of observed price changes.
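Roll's model is easy to verify by simulation. The sketch below is my own, with arbitrary values for P* and s: it draws the trade-direction indicator It and checks that the variance and first-order autocovariance of ∆Pt come out near s²/2 and −s²/4.

```python
import numpy as np

# Sketch of Roll's (1984) bid-ask bounce with a constant fundamental value:
# observed prices bounce between P* + s/2 and P* - s/2.
rng = np.random.default_rng(4)
n, p_star, s = 500_000, 100.0, 0.5           # arbitrary illustration values
i_t = rng.choice([1.0, -1.0], size=n)        # buyer- vs seller-initiated
p = p_star + i_t * s / 2                     # observed prices
dp = np.diff(p)                              # observed price changes

print(dp.var())                              # about s**2 / 2
print(np.cov(dp[:-1], dp[1:])[0, 1])         # about -s**2 / 4
```

The negative lag-one autocovariance is the "bid-ask bounce": spurious serial correlation even though the fundamental value never moves.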
Roll (1984) takes s as given; it is therefore not a complete theory of the economic
determinants and dynamics of the spread. There are three primary economic sources of
the bid-ask spread: order-processing costs, inventory costs, and adverse-selection costs. The
last component has received much recent attention.
3.3 Modeling Transactions Data
One of the most exciting recent developments in empirical finance is the availability of
low-cost transactions databases: historical prices, quantities, bid-ask quotes and sizes, and
associated market conditions, transaction by transaction and time-stamped to the nearest
second. For example, the NYSE’s Trades and Quotes (TAQ) database contains all equity
transactions reported on the Consolidated Tape from 1992 to the present, which includes all
transactions on the NYSE, AMEX, NASDAQ, and the regional exchanges. The Berkeley
Options Database provides similar data for options transactions, and transactions databases
exist for many other securities as well.
Transactions data pose a number of unique econometric challenges that do not easily fit
into the traditional econometric framework. For example, transactions data are sampled at
irregularly spaced random intervals – whenever trades occur – and this presents a number
of problems for standard econometric models: observations are unlikely to be identically
distributed.
Several models of price discreteness have been proposed to capture and explain the discrete
nature of transactions data. Most of them begin with a “true” but unobserved continuous-state
price process Pt and obtain the observed price process P⁰t by discretizing Pt in some
fashion:

1. Rounding Model: P⁰t = ⌊Pt/d⌋d or P⁰t = ⌈Pt/d⌉d, where d is the tick size.

2. Barrier Model: the continuous-state “true” price process Pt is also a continuous-time
process, and trades are observed whenever Pt reaches certain levels or barriers.

3. The Ordered Probit Model (probably the most up-to-date of the three).
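The rounding model can be illustrated in a few lines. This is a toy sketch; the one-eighth tick and the price values are arbitrary choices for the example.

```python
import numpy as np

# Toy illustration of the rounding model for price discreteness: the
# observed price is the true price rounded down (or up) to the tick size d.
d = 0.125                                     # an arbitrary one-eighth tick
true_prices = np.array([10.03, 10.14, 10.26])
floor_prices = np.floor(true_prices / d) * d  # P0 = floor(P/d) * d
ceil_prices = np.ceil(true_prices / d) * d    # P0 = ceil(P/d) * d

print(floor_prices)   # each price rounded down to the nearest tick
print(ceil_prices)    # each price rounded up to the nearest tick
```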
Chapter 4
Event-Study Analysis
Economists are frequently asked to measure the effect of an economic event on the value of a
firm. On the surface this seems like a difficult task, but a measure can be constructed easily
using financial market data in an event study. The usefulness of such a study comes from
the fact that, given rationality in the marketplace, the effect of an event will be reflected
immediately in asset prices. Thus the event’s economic impact can be measured using asset
prices observed over a relatively short time period. In contrast, direct measures may require
many months or even years of observation.
4.1 Outline of an Event Study
At the outset it is useful to give a brief outline of the structure of an event study. While
there is no unique structure, the analysis can be viewed as having seven steps:
1. Event Definition.
2. Selection Criteria.
3. Normal and Abnormal Returns.
4. Estimation Procedure.
5. Testing Procedure.
6. Empirical Results.
7. Interpretation and Conclusions.
4.2 Models for Measuring Normal Performance
For the statistical models, it is conventional to assume that asset returns are jointly
multivariate normal and independently and identically distributed through time. Formally, we
have
Assumption 1. Let Rt be an (N × 1) vector of asset returns for calendar time period t. Rt
is independently multivariate normally distributed with mean µ and covariance matrix Ω for
all t.
The Constant-Mean-Return model uses the following procedure. Let µi, the ith
element of µ, be the mean return for asset i. Then the constant-mean-return model is
Rit = µi + ξit,  E(ξit) = 0,  Var(ξit) = σξi²,
where Rit, the ith element of Rt, is the period-t return on security i, ξit is the disturbance
term. Although this model is perhaps the simplest model, it often yields results similar to
those of more sophisticated models.
The market model is a statistical model which relates the return of any given security
to the return of the market portfolio. For any security i we have
Rit = αi + βiRmt + εit,  E(εit) = 0,  Var(εit) = σεi².
This model represents a potential improvement over the constant-mean-return model.
Economic models restrict the parameters of statistical models to provide more constrained
normal return models. Two common economic models which provide restrictions are the
Capital Asset Pricing Model (CAPM) and exact versions of the Arbitrage Pricing
Theory (APT). The CAPM is an equilibrium theory where the expected return of a given
asset is a linear function of its covariance with the return of the market portfolio. The APT
is an asset pricing theory where in the absence of asymptotic arbitrage the expected return
of a given asset is determined by its covariances with multiple factors.
4.3 Measuring and Analyzing Abnormal Returns
The whole procedure is straightforward:

1. Run OLS on the normal-performance model over the estimation window.

2. Subtract the fitted normal returns over the event window, of length say T, to obtain the abnormal returns.

3. Compute the standardized cumulative abnormal return (CAR), which is basically a
linear combination of standardized residuals.

4. Inference about the CAR can then be drawn, since it is asymptotically normal.
The above analysis is based on the assumption that the abnormal returns on individual
securities are uncorrelated in the cross section. If we relax this restriction, the analysis
extends to clustering. We can either aggregate the abnormal returns into a portfolio dated
using event time, so that the original procedure still applies, or we can analyze the
abnormal returns without aggregation and use a multivariate regression model with dummy
variables for the event date.
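The steps above can be sketched for a single security with simulated data. The window lengths and the injected event effect below are arbitrary choices for illustration, not values from the notes.

```python
import numpy as np

# Hedged sketch of the event-study steps for one security: estimate the
# market model outside the event window, then cumulate the abnormal
# returns inside it. All numbers are simulated.
rng = np.random.default_rng(5)
r_m = rng.standard_normal(250) * 0.01                      # market returns
r_i = 0.001 + 1.2 * r_m + rng.standard_normal(250) * 0.005
r_i[200:210] += 0.01                                       # injected event effect

est, event = slice(0, 200), slice(200, 210)                # windows
X = np.column_stack([np.ones(200), r_m[est]])
alpha, beta = np.linalg.lstsq(X, r_i[est], rcond=None)[0]  # step 1: OLS

abnormal = r_i[event] - (alpha + beta * r_m[event])        # step 2
car = abnormal.sum()                                       # cumulative abnormal return
print(car)   # close to the injected 10 * 0.01 = 0.10 effect
```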
Chapter 5
The Capital Asset Pricing Model
One of the important problems of modern financial economics is the quantification of the
tradeoff between risk and expected return. Economists were not able to quantify risk before
the development of the CAPM.
5.1 Review of the CAPM
The Sharpe and Lintner derivations of the CAPM assume the existence of lending and
borrowing at a riskfree rate of interest. For this version of the CAPM we have, for the
expected return of asset i,

E(Ri) = Rf + βim[E(Rm) − Rf],  βim = Cov(Ri, Rm)/Var(Rm),
where Rm is the return on the market portfolio, and Rf is the return on the riskfree asset.
This version can be most compactly expressed in terms of returns in excess of this riskfree
rate, or excess returns. Let Zi represent the return on the ith asset in excess of
the riskfree rate, Zi ≡ Ri − Rf. Then for the Sharpe-Lintner CAPM we have

E(Zi) = βimE(Zm),  βim = Cov(Zi, Zm)/Var(Zm),
where Zm is the excess return on the market portfolio of assets.
In the absence of a riskfree asset, Black (1972) derived a more general version of the
CAPM. In this version, the expected return of asset i in excess of the zero-beta return is
linearly related to its beta. Specifically, for the expected return of asset i, E(Ri), we have
E(Ri) = E(R0m) + βim(E(Rm)− E(R0m)).
Rm is the return on the market portfolio, and R0m is the return on the zero-beta portfolio
associated with m. Econometric analysis of the Black version of the CAPM treats the zero-
beta portfolio return as an unobserved quantity, making the analysis more complicated than
that of the SL version. The Black version can be tested as a restriction on the real-return
market model. For the real-return market model we have
E(Ri) = αim + βimE(Rm),
and the implication of the Black version is
αim = E(R0m)(1− βim) ∀i
Implementation of the model requires three inputs: the stock’s beta, the market risk
premium, and the riskfree return. The beta is usually estimated from the regression

Zit = αim + βimZmt + εit,

where i denotes the asset and t denotes the time period, t = 1, ..., T. Typically the Standard
and Poor’s 500 Index serves as a proxy for the market portfolio, and the US Treasury bill
rate proxies for the riskfree return.
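The beta regression above can be sketched with ordinary least squares. The data below are simulated rather than actual S&P 500 and Treasury bill series, and the parameter values are arbitrary.

```python
import numpy as np

# Minimal sketch of estimating beta from the excess-return regression
# Z_it = alpha_im + beta_im Z_mt + eps_it, using simulated data.
rng = np.random.default_rng(6)
z_m = rng.standard_normal(1000) * 0.04               # market excess returns
z_i = 0.8 * z_m + rng.standard_normal(1000) * 0.02   # SL CAPM holds: alpha = 0

X = np.column_stack([np.ones_like(z_m), z_m])
alpha_hat, beta_hat = np.linalg.lstsq(X, z_i, rcond=None)[0]
print(beta_hat)    # close to the true beta of 0.8
print(alpha_hat)   # close to zero, as the SL CAPM implies
```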
5.2 Statistical Framework for Estimation and Testing
Initially we use the assumption that investors can borrow and lend at a riskfree rate of
return, and we consider the SL version of the CAPM. Define Zt as an N ×1 vector of excess
returns for N assets. The excess returns can be described using the excess-return market
model:
Zt = α+ βZmt + εt,
where Zmt is the time period t market portfolio excess return, and α and εt are N × 1 vectors
of asset return intercepts and disturbances, respectively.
If we eliminate this assumption, we analyze the Black version. Define Rt as an N × 1
vector of real returns for N assets. For these N assets, the real-return market model is
Rt = α+ βRmt + εt,
where Rmt is the time period t market portfolio return.
5.3 Implementation of Tests
An enormous amount of literature presenting empirical evidence on the CAPM has evolved
since the development of the model in the 1960s. The early evidence was largely positive.
But in the late 1970s less favorable evidence for the CAPM began to appear in the so-
called anomalies literature. Early anomalies included the price-earnings-ratio effect and the
size effect. A number of other anomalies have been discovered more recently. For example,
Fama and French (1992, 1993) find that beta cannot explain the difference in return between
portfolios formed on the basis of the ratio of book value of equity to market value of equity.
Although the results in the anomalies literature may signal economically important
deviations from the CAPM, there is little theoretical motivation for the firm characteristics
studied in this literature. This opens up the possibility that the evidence against the CAPM
is overstated because of data-snooping and sample selection biases.
Chapter 6
Multifactor Pricing Models
Empirical evidence indicates that the CAPM beta does not completely explain the cross
section of expected asset returns. This evidence suggests that one or more additional factors
may be required to characterize the behavior of expected returns and naturally leads to
consideration of multifactor pricing models.
6.1 The Arbitrage Pricing Theory (APT)
The APT was introduced by Ross (1976) as an alternative to the CAPM. The APT can be
more general than the CAPM in that it allows for multiple risk factors. Also, unlike the
CAPM, the APT does not require the identification of the market portfolio.
The APT assumes that markets are competitive and frictionless and that the
return-generating process for the asset returns being considered is

Ri = ai + biᵀf + εi,
E[εi | f] = 0,
E[εi²] = σi² ≤ σ̄² < ∞,

where Ri is the return for asset i, ai is the intercept of the factor model, bi is a vector of
factor sensitivities for asset i, and f is a vector of common factor realizations. The whole system
can be generalized to contain N assets.
Given this structure, Ross (1976) shows that the absence of arbitrage in large economies
implies that
µ ≈ ιλ0 +BλK ,
where µ is the N × 1 expected return vector, λ0 is the model zero-beta parameter and is
equal to the riskfree return if such an asset exists, and λK is a K × 1 vector of factor risk
premia. For models that have exact factor pricing, we have
µ = ιλ0 +BλK .
There is some flexibility in the specification of the factors. Most empirical implementations
choose a proxy for the market portfolio as one factor.
6.2 Estimation and Testing
6.2.1 Portfolios as Factors with a Riskfree Asset
We first consider the case where the factors are traded portfolios and there exists a riskfree
asset. Define Zt as an N × 1 vector of excess returns for N assets. For excess returns, the
K-factor linear model is:
Zt = a+BZKt + εt.
B is the matrix of factor sensitivities, ZKt is the K × 1 vector of factor portfolio excess
returns, and a and εt are N × 1 vectors of asset return intercepts and disturbances, respectively.
For the unconstrained model, the ML estimator is just the OLS estimator.
For the constrained model, with a constrained to be zero, the maximum likelihood
estimators are

B* = [Σ_{t=1}^{T} ZtZKtᵀ][Σ_{t=1}^{T} ZKtZKtᵀ]⁻¹,

Σ* = (1/T) Σ_{t=1}^{T} (Zt − B*ZKt)(Zt − B*ZKt)ᵀ.
The null hypothesis a equals zero can be tested using the likelihood ratio statistic.
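The constrained estimators above translate directly into matrix code. The sketch below simulates a small system (N = 3 assets, K = 2 factor portfolios, arbitrary loadings) and recovers B* and Σ* from the formulas.

```python
import numpy as np

# Sketch of the constrained (a = 0) ML estimators: B* regresses asset
# excess returns on factor portfolio excess returns with no intercept.
# All data are simulated; the loadings B_true are arbitrary.
rng = np.random.default_rng(7)
T, N, K = 2000, 3, 2
Z_K = rng.standard_normal((T, K)) * 0.05                 # factor excess returns
B_true = np.array([[1.0, 0.2], [0.5, 0.8], [0.9, -0.3]])
Z = Z_K @ B_true.T + rng.standard_normal((T, N)) * 0.01  # asset excess returns

# B* = [sum_t Z_t Z_Kt'][sum_t Z_Kt Z_Kt']^{-1}
B_star = (Z.T @ Z_K) @ np.linalg.inv(Z_K.T @ Z_K)
resid = Z - Z_K @ B_star.T
Sigma_star = resid.T @ resid / T                         # ML covariance estimator
print(np.round(B_star, 2))   # close to B_true
```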
6.2.2 Portfolios as Factors without a Riskfree Asset
In the absence of a riskfree asset, there is a zero-beta model that is a multifactor equivalent
of the Black version of the CAPM. In a multifactor context, the zero-beta portfolio is a
portfolio with no sensitivity to any of the factors, and expected returns in excess of the
zero-beta return are linearly related to the columns of the matrix of factor sensitivities. The
factors are assumed to be portfolio returns in excess of the zero-beta return.
Define Rt as an N × 1 vector of real returns for N assets. For the unconstrained model, we
have a K-factor linear model:
Rt = a+BRKt + εt,
where B is the matrix of factor sensitivities and RKt is the vector of factor portfolio real returns.
For the unconstrained model, we estimate using ML in the usual way. For the constrained
model, real returns enter in excess of the expected zero-beta portfolio return γ0. We have
Rt = ιγ0 + B(RKt − ιγ0) + εt = (ι − Bι)γ0 + BRKt + εt.
6.3 Estimation of Risk Premia and Expected Returns
All the exact factor pricing models allow one to estimate the expected return on a given
asset. Since the expected return relation is µ = ιλ0 + BλK , one needs measures of the
factor sensitivity matrix B, the riskfree rate or the zero-beta expected return λ0, and the
factor risk premia λK . Obtaining measures of B and the riskfree rate or the expected zero-
beta return is straightforward. Further estimation is necessary to form estimates of the
factor risk premia.
In the case where the factors are the excess returns on traded portfolios, the risk premia
can be estimated directly from the sample means of the excess returns on the portfolios.
λ̂K = µ̂K = (1/T) Σ_{t=1}^{T} ZKt.
In the case where portfolios are factors but there is no riskfree asset, the factor risk premia
can be estimated using the difference between the sample mean of the factor portfolios and
the estimated zero-beta return:
λ̂K = µ̂K − ιγ̂0.
6.4 Selection of Factors
The selection of factors falls into two basic categories: statistical and theoretical. The
statistical approaches, largely motivated by the APT, involve building factors from a comprehensive
set of asset returns. The theoretical approaches involve specifying factors based on arguments
that the factors capture economy-wide systematic risks.
For the statistical approaches, there are two major methods: factor analysis and principal
components. Estimation using factor analysis involves a two-step procedure. First the factor
sensitivity matrix B and the disturbance covariance matrix Σ are estimated and then these
estimates are used to construct measures of the factor realizations. The second step in the
estimation procedure is to estimate the factors given B and Σ. An alternative approach is
principal components analysis. PC is a technique to reduce the number of variables being
studied without losing too much information in the covariance matrix.
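The principal-components approach can be sketched in a few lines. The simulation below is my own, with a known two-factor structure and arbitrary dimensions: the eigenvectors of the sample covariance matrix supply the portfolio weights that define the factor realizations.

```python
import numpy as np

# Illustrative sketch (simulated data) of extracting factors by principal
# components: the top eigenvectors of the return covariance matrix define
# portfolio weights whose returns serve as factor realizations.
rng = np.random.default_rng(8)
T, N, K = 1000, 10, 2
f = rng.standard_normal((T, K))                      # latent factors
B = rng.standard_normal((N, K))                      # arbitrary loadings
R = f @ B.T + rng.standard_normal((T, N)) * 0.1      # factor-structure returns

cov = np.cov(R, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues
weights = eigvecs[:, -K:]                            # top-K principal components
factors = R @ weights                                # estimated factor realizations
share = eigvals[-K:].sum() / eigvals.sum()
print(share)   # the K components capture most of the return variance
```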
The underlying theory of the multifactor models does not specify the number of factors
that are required, that is, the value of K. To select the number of factors, one approach is
to repeat the estimation and testing of the model for a variety of values of K and observe
if the tests are sensitive to increasing the number of factors. A second approach is to test
explicitly for the adequacy of K factors. An asymptotic likelihood ratio test of the adequacy
of K factors can be constructed using -2 times the difference of the value of the log-likelihood
function of the covariance matrix evaluated at the constrained and unconstrained estimators.
Theoretically based approaches for selecting factors fall into two main categories. One
approach is to specify macroeconomic and financial market variables that are thought to
capture the systematic risks of the economy. A second approach is to specify characteristics
of firms which are likely to explain differential sensitivity to the systematic risks and then
form portfolios of stocks based on the characteristics.
6.5 Interpreting Deviations from Exact Factor Pricing
Let Zt represent the N vector of excess returns for period t. Assume Zt is stationary and
ergodic with mean µ and covariance matrix Ω that is full rank. We also take as given a set
of K factor portfolios and analyze the deviations from exact factor pricing. For the factor
model, we have
Zt = a+BZKt + εt.
Here B is the matrix of factor loadings, ZKt is the vector of time-t factor portfolio excess
returns. Now consider the case where we do not have exact factor pricing, so the tangency
portfolio cannot be formed from a linear combination of the factor portfolios. To develop the relation between the deviations from the asset pricing model and the residual covariance matrix, we define the optimal orthogonal portfolio: the unique portfolio that can be combined with the K factor portfolios to form the tangency portfolio and that is orthogonal to the factor portfolios.
Chapter 7
Present-Value Relations
The basic framework for our analysis is the discounted-cash-flow or present-value model.
This model relates the price of a stock to its expected future cash-flows – its dividends –
discounted to the present using a constant or time-varying discount rate.
7.1 The Relation between Prices, Dividends, and Returns
The net simple return on a stock is
R_{t+1} \equiv \frac{P_{t+1} + D_{t+1}}{P_t} - 1.
An alternative measure of return is the log or continuously compounded return
rt+1 ≡ log(1 +Rt+1).
Throughout this chapter, we use lowercase letters to denote log variables. Under
the assumption that the expected stock return is equal to a constant R:
Et[Rt+1] = R,
we obtain an equation relating the current stock price to the next period’s expected stock
price and dividend:
P_t = E_t\left[\frac{P_{t+1} + D_{t+1}}{1+R}\right].
After solving forward K periods we have
P_t = E_t\left[\sum_{i=1}^{K}\left(\frac{1}{1+R}\right)^i D_{t+i}\right] + E_t\left[\left(\frac{1}{1+R}\right)^K P_{t+K}\right].
For now, we assume that the last term of the above equation shrinks to zero as the horizon K
increases:
\lim_{K\to\infty} E_t\left[\left(\frac{1}{1+R}\right)^K P_{t+K}\right] = 0.
For future convenience we write the expected present value as P_{Dt}:
P_t = P_{Dt} \equiv E_t\left[\sum_{i=1}^{\infty}\left(\frac{1}{1+R}\right)^i D_{t+i}\right].
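As a quick numerical sanity check (the discount rate and the constant dividend stream below are made-up values), a long truncated sum approximates the infinite-horizon formula and, for a constant dividend, matches the perpetuity value D/R:

```python
# Numerical check of the expected-present-value formula with a constant
# dividend stream and constant discount rate: a long truncated sum
# approximates the infinite sum and matches the perpetuity value D/R.
R = 0.05
D = 1.0

K = 10_000
price = sum((1.0 / (1.0 + R)) ** i * D for i in range(1, K + 1))

print(price, D / R)   # both approximately 20
```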
It is important to avoid two common errors in interpreting these formulas. First, note that
we have made no assumptions about equity repurchases by firms. Second, the hypothesis that the expected stock return is constant through time is sometimes known as the martingale model of stock prices, but a constant expected stock return does not imply a martingale for the stock price itself. Recall that a martingale for the price requires E_t(P_{t+1}) = P_t, whereas our equations above imply
Et(Pt+1) = (1 +R)Pt − Et(Dt+1).
Even though the stock price Pt is not generally a martingale, it will follow a linear process
with a unit root if the dividend Dt follows a linear process with a unit root. In this case
the expected present-value formula relates two unit-root processes for Pt and Dt. It can be
transformed to a relation between stationary variables, however, by subtracting a multiple of the dividend from both sides of the equation. We get
P_t - \frac{D_t}{R} = \frac{1}{R}\,E_t\left[\sum_{i=0}^{\infty}\left(\frac{1}{1+R}\right)^i \Delta D_{t+1+i}\right].
7.1.1 Rational Bubbles
The argument we showed above relied on the assumption that the expected discounted stock
price, K periods in the future, converges to zero as the horizon K increases. If we remove
this assumption, there is an infinite number of solutions, which can be written in the form
Pt = PDt +Bt,
where
B_t = E_t\left[\frac{B_{t+1}}{1+R}\right].
The term P_{Dt} is sometimes called the fundamental value, and the term B_t is often called a rational
bubble. The word “rational” is used because the presence of Bt is entirely consistent with
rational expectations and constant expected returns. Blanchard and Watson (1982) suggest
a bubble of the form
B_{t+1} = \begin{cases} \dfrac{1+R}{\pi}\,B_t + \varsigma_{t+1}, & \text{with probability } \pi; \\ \varsigma_{t+1}, & \text{with probability } 1-\pi. \end{cases}
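A minimal simulation of this bubble process (all parameter values are hypothetical) shows how it grows at rate (1+R)/π while it survives and collapses with probability 1 − π, while still satisfying E_t[B_{t+1}] = (1+R)B_t when the innovation has mean zero:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate the bubble: while it survives it grows at rate (1+R)/pi; with
# probability 1 - pi it collapses to the mean-zero innovation alone.
R, pi = 0.05, 0.95
T = 200
B = np.zeros(T)
B[0] = 1.0
for t in range(T - 1):
    zeta = 0.1 * rng.standard_normal()      # mean-zero innovation
    if rng.random() < pi:
        B[t + 1] = ((1.0 + R) / pi) * B[t] + zeta
    else:
        B[t + 1] = zeta

# Monte Carlo check that E_t[B_{t+1}] = (1 + R) B_t, starting from B_t = 1.
draws = np.where(rng.random(100_000) < pi, (1.0 + R) / pi, 0.0)
print(draws.mean())   # approximately 1 + R = 1.05
```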
7.1.2 An Approximate Present-Value Relation with Time-Varying Expected Returns
It is much more difficult to work with present-value relations when expected stock returns
are time-varying, for then the relation between prices and returns becomes nonlinear. One
approach is to use a loglinear approximation, as suggested by Campbell and Shiller (1988).
The loglinear relation between prices, dividends, and returns provides an accounting frame-
work: High prices must eventually be followed by high future dividends, low future returns,
or some combination of the two, and investors’ expectations must be consistent with this,
so high prices must be associated with high expected future dividends, low expected future
returns, or some combination of the two. Thus the loglinear framework enables us to calcu-
late asset price behavior under any model of expected returns, rather than just the model
with constant expected returns.
We have
rt+1 ≡ log(Pt+1 +Dt+1)− log(Pt)
= pt+1 − pt + log(1 + exp(dt+1 − pt+1)).
The last term is a nonlinear function of the log dividend-price ratio, f(dt+1 − pt+1), which
we can expand around the mean using a first-order Taylor expansion.
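The expansion can be sketched numerically (the 4% average dividend-price ratio is an assumption made for illustration): with ρ ≡ 1/(1 + exp(d̄ − p̄)), the linear approximation k + (1 − ρ)x is exact at the expansion point and accurate nearby.

```python
import math

# First-order Taylor expansion of f(x) = log(1 + exp(x)) around the mean log
# dividend-price ratio xbar = E[d - p]; a 4% average dividend-price ratio is
# assumed. The linearization coefficient is rho = 1/(1 + exp(xbar)).
xbar = math.log(0.04)
rho = 1.0 / (1.0 + math.exp(xbar))          # about 0.96 here
k = math.log(1.0 + math.exp(xbar)) - (1.0 - rho) * xbar   # intercept of the expansion

def f(x):
    return math.log(1.0 + math.exp(x))

def f_approx(x):
    # f(x) is approximated by k + (1 - rho) * x near x = xbar
    return k + (1.0 - rho) * x

x = math.log(0.05)   # a nearby dividend-price ratio
print(rho, abs(f(x) - f_approx(x)))   # the approximation error is small
```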
7.2 Present-Value Relations and US Stock Price Behavior
Popular forecasting variables include ratios of price to dividends or earnings, and various interest rate measures such as the yield spread between long- and short-term rates, the quality yield spread between low- and high-grade corporate bonds or commercial paper, and measures of recent changes in the level of short rates.
The interest rate variable is a transformation of the one-month nominal US Treasury bill
rate motivated by the fact that unit-root tests often fail to reject the hypothesis that the
bill rate has a unit root. However, the regression results are rather unimpressive.
7.2.1 Volatility Tests
The early papers in the volatility literature used levels of stock prices and dividends, but the current literature prefers the logarithmic form. We define a log perfect-foresight stock
price,
p^*_t \equiv \sum_{j=0}^{\infty} \rho^j\left[(1-\rho)d_{t+1+j} + k - r\right].
The difference between p∗t and pt is just a discounted sum of future demeaned stock returns.
p^*_t - p_t = \sum_{j=0}^{\infty} \rho^j\,(r_{t+1+j} - r).
The constant-expected-return hypothesis implies that p∗t −pt is a forecast error uncorrelated
with information known at time t. Equivalently, it implies that the stock price is a rational
expectation of the perfect-foresight stock price:
p_t = E_t(p^*_t).
The above equation implies that p^*_t − p_t is orthogonal to information variables known at time t. An orthogonality test regresses p^*_t − p_t onto information variables and tests for zero coefficients. Instead of testing orthogonality directly, much of the literature tests the
implications of orthogonality for the volatility of stock prices. The most famous one is the
variance inequality for the stock price.
\mathrm{Var}[p^*_t] = \mathrm{Var}[p_t] + \mathrm{Var}[p^*_t - p_t] \ge \mathrm{Var}[p_t].
With constant expected returns the stock price forecasts only the present value of future
dividends, so it cannot be more variable than the realized present value of future dividends.
Tests of this and related propositions are known as variance-bounds tests.
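A simulated illustration of the decomposition behind these tests (all series below are artificial stand-ins, not data): when the forecast error p* − p is uncorrelated with p, the variance of p* equals Var[p] + Var[p* − p], and the inequality follows.

```python
import numpy as np

rng = np.random.default_rng(3)

# Under the null of constant expected returns the forecast error p* - p is
# uncorrelated with p, so Var[p*] = Var[p] + Var[p* - p] >= Var[p].
T = 200_000
p = rng.standard_normal(T)              # stand-in for the log stock price
err = 0.5 * rng.standard_normal(T)      # forecast error, independent of p
p_star = p + err                        # perfect-foresight price under the null

lhs = p_star.var()
rhs = p.var() + (p_star - p).var()
print(lhs, rhs)   # nearly equal, and both exceed Var[p]
```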
7.2.2 Vector Autoregressive Methods
Any VAR model can be written in first-order form by augmenting the state vector with
suitable lags of the original variables, so without loss of generality we write:
xt+1 = Axt + εt+1.
Here A is a matrix of VAR coefficients. The VAR approach strongly suggests that the stock
market is too volatile to be consistent with the view that stock prices are optimal forecasts
of future dividends discounted at a constant rate. Some VAR systems suggest that the
optimal divided forecast is close to the current dividend, others that the optimal dividend
forecast is even smoother than the current dividend; neither type of system can account for
the tendency of stock prices to move more than one-for-one with dividends.
Chapter 8
Intertemporal Equilibrium Models
This chapter relates asset prices to the consumption and savings decisions of investors. In
the real world investors consider many periods in making their portfolio decisions, and in
the intertemporal setting one must model consumption and portfolio choices simultaneously.
8.1 The Stochastic Discount Factor
We consider the intertemporal choice problem of an investor who can trade freely in asset i
and who maximizes the expectation of a time-separable utility function:
\max\; E_t\left[\sum_{j=0}^{\infty} \delta^j U(C_{t+j})\right],
where δ is the time discount factor, C_{t+j} is the investor's consumption in period t + j, and U(C_{t+j}) is the period utility of consumption at t + j. One of the first-order conditions or Euler equations describing the investor's optimal consumption and portfolio plan is
U'(C_t) = \delta\, E_t\left[(1+R_{i,t+1})\,U'(C_{t+1})\right]
or
1 = Et[(1 +Ri,t+1)Mt+1],
where Mt+1 = δU ′(Ct+1)/U′(Ct). The variable Mt+1 is known as the stochastic discount
factor, or pricing kernel. The unconditional version of the above equation is
1 = E[(1 +Rit)Mt],
therefore
E[1+R_{it}] = \frac{1}{E[M_t]}\left(1 - \mathrm{Cov}[R_{it}, M_t]\right).
If Ms is small, then state s is cheap in the sense that investors are unwilling to pay a high
price to receive wealth in state s. An asset that tends to deliver wealth in cheap states has a
return that covaries negatively with M . Such an asset is itself cheap and has a high return
on average.
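A toy numerical check of the unconditional pricing equation (the lognormal consumption-growth process and all parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Check 1 = E[(1 + R) M] for the riskless asset, with a made-up lognormal
# pricing kernel M = delta * g^(-gamma).
delta, gamma = 0.98, 2.0
g = np.exp(0.02 + 0.02 * rng.standard_normal(1_000_000))   # gross consumption growth
M = delta * g ** (-gamma)                                  # stochastic discount factor

Rf = 1.0 / M.mean() - 1.0            # riskless rate implied by 1 = (1 + Rf) E[M]
moment = ((1.0 + Rf) * M).mean()     # equals 1 by construction
print(Rf, moment)
```

For a risky asset the same moment condition holds, and the covariance term in the equation above is what generates a risk premium.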
8.2 Consumption-Based Asset Pricing with Power Utility
We begin by assuming that there is a representative agent who maximizes a time-separable
power utility function, so that
U(C_t) = \frac{C_t^{1-\gamma} - 1}{1-\gamma},
where γ is the coefficient of relative risk aversion. Substituting the implied marginal utility U'(C_t) = C_t^{-\gamma} into the Euler equation, we have
1 = E_t\left[(1+R_{i,t+1})\,\delta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right].
When a random variable X is conditionally lognormally distributed, it has the convenient
property that
\log E_t[X] = E_t[\log X] + \frac{1}{2}\mathrm{Var}_t[\log X],
where Vart[logX] ≡ Et[(logX − Et[logX])2]. Thus with joint conditional lognormality and
homoskedasticity of asset returns and consumption, we obtain
0 = E_t[r_{i,t+1}] + \log\delta - \gamma E_t[\Delta c_{t+1}] + \frac{1}{2}\left[\sigma_i^2 + \gamma^2\sigma_c^2 - 2\gamma\sigma_{ic}\right].
This equation has both time-series and cross-sectional implications. In the time series, the
riskless real interest rate obeys
r_{f,t+1} = -\log\delta - \frac{\gamma^2\sigma_c^2}{2} + \gamma E_t[\Delta c_{t+1}].
The riskless real rate is linear in expected consumption growth, with slope coefficient equal to the coefficient of relative risk aversion. This equation can be reversed to express expected consumption growth as a linear function of the riskless real interest rate.
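The riskless-rate equation can be verified numerically (all parameter values are illustrative) against the exact lognormal calculation r_f = −log E_t[M_{t+1}]:

```python
import math

# With log consumption growth dc ~ N(g, sigma_c^2) and M = delta*exp(-gamma*dc),
# log E[M] = log(delta) - gamma*g + gamma^2*sigma_c^2/2, and r_f = -log E[M].
delta, gamma = 0.98, 5.0
g, sigma_c = 0.02, 0.01

rf_formula = -math.log(delta) - gamma**2 * sigma_c**2 / 2 + gamma * g
rf_exact = -(math.log(delta) - gamma * g + gamma**2 * sigma_c**2 / 2)
print(rf_formula, rf_exact)   # identical
```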
8.3 Market Frictions
If investors face transactions costs or limits on their ability to borrow or sell assets short,
then they may have only a limited ability to exploit the empirical patterns in returns.
If asset i cannot be sold short, then the standard equality restriction E[(1 +Rit)Mt] = 1
must be replaced by an inequality restriction
E[(1 +Rit)Mt] ≤ 1.
If the inequality is strict, then an investor would like to sell the asset but is prevented from doing so by the shortsales constraint. Instead, the investor holds a zero position in the asset. Investors may also face borrowing constraints that limit their ability to sell assets to finance
consumption today. In the presence of shortsales constraints, the vector equality ι = E[(ι + R_t)M_t] is replaced by
θ = E[(ι+Rt)Mt],
where θ is an unknown vector. The model implies various restrictions on θ such as the
restriction that θi ≤ 1 for all i.
The same sorts of frictions may make aggregate consumption an inadequate proxy for
the consumption of stock market investors.
8.4 More General Utility Functions
One straightforward response to the difficulties of the standard consumption CAPM is to
generalize the utility function. For example, the utility function may be nonseparable in
consumption and some other good. This is easy to handle in a loglinear model if utility is
Cobb-Douglas.
Constantinides (1990) and Sundaresan (1989) have argued for the importance of habit
formation, a positive effect of today’s consumption on tomorrow’s marginal utility of con-
sumption. The first issue is the form of the utility function. Abel (1990, 1996) has proposed
that U(·) should be a power function of the ratio Ct/Xt, while some others have used a power
function of the difference Ct −Xt. The second issue is the effect of an agent’s own decisions
on future levels of habit. In standard internal-habit models, habit depends on an agent’s
own consumption and the agent takes account of this when choosing how much to consume.
In external-habit models, habit depends on aggregate consumption which is unaffected by
any one agent’s decisions. Abel calls this catching up with the Joneses. The third issue is
the speed with which habit reacts to individual or aggregate consumption. Some make habit
depend on one lag of consumption and some make habit react only gradually to changes in
consumption.
Chapter 9
Derivative Pricing Models
The pricing of options, warrants, and other derivative securities (financial securities whose payoffs depend on the prices of other securities) is one of the great successes of modern financial economics. Based on the well-known Law of One Price or no-arbitrage condition, the option pricing models of Black and Scholes (1973) and Merton (1973b) gained an almost immediate acceptance among academics and investment professionals that is unparalleled in the history of economic science.
Ironically, although pricing derivative securities is often highly computation-intensive,
in principle it leaves very little room for traditional statistical inference since, by the very
nature of the no-arbitrage pricing paradigm, there exists no “error term” to be minimized
and no corresponding statistical inference.
However, there are at least two aspects of the implementation of derivative pricing models that do involve statistical inference: first, the problem of estimating the parameters of continuous-time price processes which serve as inputs for parametric derivative pricing formulas; second, the pricing of path-dependent derivatives by Monte Carlo simulation.
9.1 A Brief Review of Derivative Pricing Methods
Denote by G(P (t), t) the price at time t of a European call option with strike price X and
expiration date T > t on a stock with price P (t) at time t. In addition, BS (1973) make the
following assumptions:
A1. There are no market imperfections, e.g., taxes, transactions costs, shortsales con-
straints, and trading is continuous and frictionless.
A2. There is unlimited riskless borrowing and lending at the continuously compounded rate
of return r. Alternatively, if D(t) is the date t price of a discount bond maturing at
date T with face value $1, then for t ∈ [0, T ] the bond price dynamics are given by
dD(t) = rD(t)dt.
A3. The stock price dynamics are given by a geometric Brownian motion, the solution to
the following Ito stochastic differential equation on t ∈ [0, T ]:
dP(t) = \mu P(t)\,dt + \sigma P(t)\,dB(t), \qquad P(0) = P_0 > 0,
where B(t) is a standard Brownian motion, and at least one investor observes σ without error.
A4. There is no arbitrage.
The goal is to narrow down the possible expressions for G, with the hope of obtaining a
specific formula for it. We first derive the dynamics of the option price, assuming that G is a function only of the current stock price P and time t. Applying Ito's Lemma to the function G, we have
dG = \mu_g G\,dt + \sigma_g G\,dB(t),
where
\mu_g \equiv \frac{1}{G}\left[\mu P\frac{\partial G}{\partial P} + \frac{\partial G}{\partial t} + \frac{\sigma^2 P^2}{2}\frac{\partial^2 G}{\partial P^2}\right], \qquad \sigma_g \equiv \frac{1}{G}\left[\sigma P\frac{\partial G}{\partial P}\right].
Then we set µ_g equal to some required rate of return r_0. Once such an r_0 is identified, the condition reduces to a PDE which, under some regularity and boundary conditions, possesses a unique solution.
9.2 Implementing Parametric Option Pricing Models
Because there are so many different types of options and other derivative securities, it is
virtually impossible to describe a completely general method for implementing all derivative-
pricing formulas.
Let us assume that the specific form of the stock price process P(t) is known up to a vector of unknown parameters θ which lies in some parameter space, and that it satisfies the following stochastic differential equation:
dP(t) = a(P,t;\alpha)\,dt + b(P,t;\beta)\,dB(t), \qquad t \in [0,T],
where B is a standard Wiener process and \theta \equiv [\alpha^\top\ \beta^\top]^\top is a vector of unknown parameters.
The functions a and b are called the drift and diffusion functions. For option-pricing purposes,
what concerns us is estimating θ, since pricing formulas for options on P (t) will invariably
be functions of some or all of the parameters in θ. We can use MLE or GMM to estimate θ.
9.2.1 Implied Volatility Estimators
Because implied volatilities are linked directly to current market prices, some investment
professionals have argued that they are better estimators of volatility than estimators based
on historical data, such as the sample variance of past returns. Implied volatilities are often said to be “forward looking”. However, such an argument overlooks the fact that an implied volatility is intimately related to a specific parametric option-pricing model (typically the BS model) which, in turn, is intimately related to a particular set of dynamics for the underlying stock price. Using
implied volatility of one option to obtain a more accurate forecast of volatility to be used in
pricing other options is either unnecessary or logically inconsistent.
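To make the notion of implied volatility concrete, here is a minimal sketch that inverts the Black-Scholes call formula by bisection (all inputs are hypothetical; a production implementation would typically use a faster root finder such as Newton's method):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(P, X, r, sigma, T):
    """Black-Scholes price of a European call."""
    d1 = (math.log(P / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return P * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(price, P, X, r, T, lo=1e-6, hi=5.0, tol=1e-10):
    """Invert the call price for sigma by bisection (price is increasing in sigma)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(P, X, r, mid, T) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price an at-the-money call at sigma = 0.2, then recover sigma.
price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
iv = implied_vol(price, 100.0, 100.0, 0.05, 1.0)
print(iv)   # close to 0.2
```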
9.3 Pricing Path-Dependent Derivatives Via Monte Carlo
Simulation
A contract whose strike price depends on the path that the stock price takes from 0 to T, and not just on the terminal stock price P(T), is called a path-dependent option. Path-dependent options have become increasingly popular as the
hedging needs of investors become ever more complex. Path-dependent options may be
priced by the dynamic-hedging approach, but the resulting PDE is often intractable. The
risk-neutral pricing method offers a considerably simpler alternative in which the power of
high-speed digital computers may be exploited. If P (t) denotes the date t stock price and
H(0) is the initial value of this put, we have
H(0) = e^{-rT}E^*\left[\max_{0\le t\le T} P(t) - P(T)\right] = e^{-rT}E^*\left[\max_{0\le t\le T} P(t)\right] - P(0).
To evaluate the above equation via Monte Carlo simulation, we simulate many sample paths of P(t), find the maximum value along each sample path, and average the present discounted value of the maxima over all replications to estimate the expected value.
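The procedure just described can be sketched as follows (risk-neutral GBM dynamics with illustrative parameters; note that taking the maximum over a finite daily grid introduces a small discretization bias):

```python
import numpy as np

rng = np.random.default_rng(5)

# Price the contract with payoff max_{0<=t<=T} P(t) - P(T) by simulating
# risk-neutral geometric Brownian motion paths and averaging the discounted
# payoffs over replications.
P0, r, sigma, T = 100.0, 0.05, 0.2, 1.0
n_steps, n_paths = 252, 20_000
dt = T / n_steps

# Exact log-price increments under the risk-neutral measure.
z = rng.standard_normal((n_paths, n_steps))
log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
paths = P0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

payoff = paths.max(axis=1) - paths[:, -1]     # path-dependent payoff per path
H0 = np.exp(-r * T) * payoff.mean()           # discounted average over replications
print(H0)
```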
The Monte Carlo approach to pricing path-dependent options is quite general and may be
applied to virtually any European derivative security. However, there are several important
limitations to this approach that should be emphasized. First, the Monte Carlo approach
may only be applied to European options, options that cannot be exercised early. Second,
to apply the Cox-Ross technique to a given derivative security, we must first prove that the
security can be priced by arbitrage considerations alone. Also, there are situations where the
derivative security cannot be replicated by any dynamic strategy involving existing securities.
For example, if we assume that the diffusion parameter σ is stochastic, then it may be shown
that without further restrictions on σ there exists no nondegenerate dynamic trading strategy
involving stocks, bonds, and options that is riskless. Therefore, before we can apply the risk-
neutral pricing methods to a particular derivative security, we must first check that it is
spanned by other traded assets.
Chapter 10
Fixed-Income Securities
We study bonds that have no call provisions or default risk, so that their payments are fully specified in advance. Such bonds truly deserve the name fixed-income securities, a term that is often used more loosely to describe bonds whose future payments are in fact uncertain. In the US markets, almost all true fixed-income securities are issued by the US Treasury.
10.1 Basic Concepts
In principle a fixed-income security can promise a stream of future payments of any form,
but there are two classic cases. Zero-coupon bonds, also called discount bonds, make a
single payment at a date in the future known as the maturity date. The size of this payment
is the face value of the bond. The length of time to the maturity date is the maturity of the
bond. US Treasury bills take this form. Coupon bonds make coupon payments of a given
fraction of face value at equally spaced dates up to and including the maturity date, when
the face value is also paid. US Treasury notes and bonds take this form. Coupon payments
on Treasury notes and bonds are made every six months, but the coupon rates for these
instruments are normally quoted at an annual rate.
10.1.1 Discount Bond
For a discount bond, the yield to maturity is the discount rate which equates the present value of the bond's payments to its price. If P_{nt} is the time t price of a discount bond that makes a single payment of $1 at time t + n, and Y_{nt} is the bond's yield to maturity, we have
P_{nt} = \frac{1}{(1+Y_{nt})^n}.
It is common in the empirical finance literature to work with log variables,
y_{nt} = -\frac{1}{n}\,p_{nt}.
The yield spread S_{nt} = Y_{nt} - Y_{1t} (or s_{nt} = y_{nt} - y_{1t} in log terms) is the difference between the yield on an n-period bond and the yield on a one-period bond, a measure of the shape of the term structure. The yield curve is a plot of the term structure, that is, a plot of Y_{nt} or y_{nt} against n on some particular date t.
The holding-period return on a bond is the return over some holding period less than
the bond’s maturity. We define Rn,t+1 as the one-period holding-period return on an n-period
bond purchased at time t and sold at time t + 1. Since the bond will be an (n − 1) period
bond when it is sold, the sale price is Pn−1,t+1 and the holding-period return is
(1+R_{n,t+1}) = \frac{P_{n-1,t+1}}{P_{nt}} = \frac{(1+Y_{nt})^n}{(1+Y_{n-1,t+1})^{n-1}}.
Bonds of different maturities can be combined to guarantee an interest rate on a fixed-
income investment to be made in the future; the interest rate on this investment is called a
forward rate. The forward rate is defined to be the return on the time t+ n investment of
Pn+1,t/Pnt:
(1+F_{nt}) = \frac{1}{P_{n+1,t}/P_{nt}} = \frac{(1+Y_{n+1,t})^{n+1}}{(1+Y_{nt})^n}.
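The discount-bond relations above amount to a few lines of arithmetic (the spot yields below are hypothetical):

```python
# Discount-bond arithmetic: the price-yield relation, the yield spread, and
# the implied one-period forward rate.
def discount_price(y, n):
    # P_nt = 1 / (1 + Y_nt)^n
    return 1.0 / (1.0 + y) ** n

Y1, Y2 = 0.03, 0.035
P1, P2 = discount_price(Y1, 1), discount_price(Y2, 2)

spread = Y2 - Y1            # S_2t, a measure of the slope of the term structure
F1 = P1 / P2 - 1.0          # (1 + F_1t) = P_1t / P_2t = (1 + Y_2t)^2 / (1 + Y_1t)
print(spread, F1)
```

With an upward-sloping curve the forward rate lies above the long spot yield, as it must for the compounding identity to hold.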
10.1.2 Coupon Bonds
Let C be the coupon rate per period. The per-period yield to maturity on a coupon bond, Y_{cnt}, is defined as the discount rate which equates the present value of the bond's payments
to its price Pcnt, so we have
P_{cnt} = \frac{C}{1+Y_{cnt}} + \frac{C}{(1+Y_{cnt})^2} + \cdots + \frac{1+C}{(1+Y_{cnt})^n}.
For coupon bonds, maturity is an imperfect measure of the length of time for which an investor's money is tied up, because much of a coupon bond's value comes from payments that are made before maturity. Macaulay's duration is intended to be a better measure.
D_{cnt} = \frac{C\sum_{i=1}^{n} \dfrac{i}{(1+Y_{cnt})^i} + \dfrac{n}{(1+Y_{cnt})^n}}{P_{cnt}}.
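A short sketch of these two formulas with illustrative inputs (face value normalized to 1); the standard special cases provide a check: a par bond (C = Y_{cnt}) prices at 1, and a zero-coupon bond has duration equal to its maturity.

```python
# Coupon-bond price and Macaulay duration (face value 1).
def coupon_price(C, y, n):
    # P = sum_{i=1}^n C/(1+y)^i + 1/(1+y)^n
    return sum(C / (1.0 + y) ** i for i in range(1, n + 1)) + 1.0 / (1.0 + y) ** n

def macaulay_duration(C, y, n):
    P = coupon_price(C, y, n)
    weighted = sum(i * C / (1.0 + y) ** i for i in range(1, n + 1)) + n / (1.0 + y) ** n
    return weighted / P

C, y, n = 0.05, 0.05, 10
print(coupon_price(C, y, n), macaulay_duration(C, y, n))
```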
Many financial intermediaries have long-term zero-coupon liabilities, such as pension
obligations, and they may wish to match or immunize these liabilities with coupon-bearing
Treasury bonds. The classic immunization problem is that of finding a coupon bond or port-
folio of coupon bonds whose return has the same sensitivity to small interest-rate movements
as the return on a given zero-coupon bond. Alternatively, one can try to find a portfolio of
coupon bonds whose cash flows exactly match those of a given zero-coupon bond.
If a complete coupon term structure (the prices of coupon bonds P_{c1}, \dots, P_{cn} maturing at each coupon date) is available, we can find P_n as
P_n = \frac{P_{cn} - CP_{n-1} - \cdots - CP_1}{1+C}.
Sometimes the coupon term structure may be more-than-complete in the sense that at least
one coupon bond matures on each coupon date and several coupon bonds mature on some
coupon dates. In that case, it makes sense to add a bond-specific error term and to estimate the discount prices by a cross-sectional regression over all the bonds outstanding at a particular date. If these bonds are indexed i = 1, \dots, I, then the regression is
P_{cn_i} = P_1 C_i + P_2 C_i + \cdots + P_{n_i}(1 + C_i) + u_i, \qquad i = 1, \dots, I,
where Ci is the coupon on the ith bond and ni is the maturity of the ith bond.
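The exactly-complete case can be illustrated by a small bootstrap (the discount prices below are hypothetical; the coupon-bond prices are constructed from them, so the recursion should recover the discount curve exactly):

```python
# Bootstrap discount-bond prices from a complete coupon term structure using
# P_n = (P_cn - C*P_1 - ... - C*P_{n-1}) / (1 + C).
true_P = [0.97, 0.93, 0.88]        # discount prices P_1, P_2, P_3
C = 0.05                           # common coupon rate, face value 1

# Coupon-bond prices implied by the discount curve:
# P_cn = C*(P_1 + ... + P_n) + P_n.
coupon_prices = [C * sum(true_P[:n]) + true_P[n - 1] for n in (1, 2, 3)]

# Recover the discount curve by the recursion.
P = []
for Pc in coupon_prices:
    Pn = (Pc - C * sum(P)) / (1.0 + C)
    P.append(Pn)
print(P)
```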
10.2 Interpreting the Term Structure of Interest Rates
There is a large empirical literature which tests statements about expected-return relation-
ships among bonds without deriving these statements from a fully specified equilibrium
model. The most popular simple model of the term structure is known as the expectations
hypothesis. We distinguish the pure expectations hypothesis (PEH), which says that
expected excess returns on long-term over short-term bonds are zero, from the expectations
hypothesis (EH), which says that expected excess returns are constant over time.
A first form of the PEH equates the one-period expected returns on one-period and
n-period bonds.
(1+Y_{1t}) = E_t[1+R_{n,t+1}] = (1+Y_{nt})^n\, E_t\left[(1+Y_{n-1,t+1})^{-(n-1)}\right].
A second form of the PEH equates the n-period expected returns on one-period and n-period
bonds:
(1 + Y1t)n = Et[(1 + Y1t)(1 + Y1,t+1)...(1 + Y1,t+n−1)].
Most empirical research uses neither the one-period form of the PEH nor the n-period form,
but a log form of the PEH that equates the expected log returns on bonds of all maturities:
E[rn,t+1 − y1t] = 0.
The EH is more general than the PEH in that it allows the expected returns on bonds of
different maturities to differ by constants, which can depend on maturity but not on time.
The differences between expected returns on bonds of different maturities are sometimes
called term premia. The PEH says that term premia are zero, while the EH says that they
are constant through time.
Chapter 11
Term-Structure Models
This chapter explores the large modern literature on fully specified general-equilibrium mod-
els of the term structure of interest rates.
11.1 Affine-Yield Models
To keep matters simple, we assume throughout this section that the distribution of the
stochastic discount factor Mt+1 is conditionally lognormal. We specify models in which
bond prices are jointly lognormal with Mt+1. We obtain
pnt = Et[mt+1 + pn−1,t+1] + (1/2)Vart[mt+1 + pn−1,t+1].
We first consider the case in which m_{t+1} is homoskedastic; that is,
-m_{t+1} = x_t + \varepsilon_{t+1}.
We assume that ε is normally distributed with constant variance. Next we assume that xt+1
follows the simplest interesting time-series process, a univariate AR(1) process with mean µ
and persistence φ. The shock to xt+1 is written ξt+1:
xt+1 = (1− φ)µ+ φxt + ξt+1.
The innovations may be correlated:
εt+1 = βξt+1 + ηt+1.
The presence of the uncorrelated shock ηt+1 only affects the average level of the term structure
and not its average slope or its time-series behavior.
We now guess that the form of the price function for an n-period bond is
−pnt = An +Bnxt.
Since the n-period bond yield ynt = −pnt/n, we are guessing that the yield on a bond of
any maturity is linear or affine in the state variable xt. We then formalize the model using
the guess and verify procedure. The coefficient Bn measures the fall in the log price of an
n-period bond when there is an increase in the state variable xt or equivalently in the one-
period interest rate y1t. It therefore measures the sensitivity of the n-period bond return to
the one-period interest rate. A second implication of the model is that the expected log excess return on an n-period bond over a one-period bond, E_t(r_{n,t+1}) - y_{1t} = E_t[p_{n-1,t+1}] - p_{nt} + p_{1t}, is given by
E_t(r_{n,t+1}) - y_{1t} = -B_{n-1}\beta\sigma^2 - B_{n-1}^2\sigma^2/2.
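A small numerical sketch of these coefficients (the recursion B_n = 1 + φB_{n−1} with B_0 = 0, equivalently B_n = (1 − φ^n)/(1 − φ), is assumed here as the outcome of the guess-and-verify step; the values of φ, β, and σ² are illustrative):

```python
import numpy as np

# Sensitivity coefficients B_n and expected log excess returns in the
# homoskedastic single-factor model.
phi, beta, sigma2 = 0.9, -1.0, 0.0001

N = 10
B = np.zeros(N + 1)
for n in range(1, N + 1):
    B[n] = 1.0 + phi * B[n - 1]          # assumed recursion B_n = 1 + phi*B_{n-1}

B_closed = (1.0 - phi ** np.arange(N + 1)) / (1.0 - phi)   # closed-form check

# Expected excess return formula from the text, for maturities n = 1, ..., N:
# E_t[r_{n,t+1}] - y_{1t} = -B_{n-1}*beta*sigma^2 - B_{n-1}^2*sigma^2/2.
excess = -B[:-1] * beta * sigma2 - B[:-1] ** 2 * sigma2 / 2.0
print(B[N], excess[-1])
```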
The homoskedastic bond pricing model also has implications for the pattern of forward rates,
and hence for the shape of the yield curve.
fnt = pnt − pn+1,t
= y1t + (Et[rn+1,t+1]− y1t)− (Et[pn,t+1]− pnt).
The homoskedastic model is appealing because of its simplicity, but it has several unattrac-
tive features. First, it assumes that interest rate changes have constant variance. Second,
the model allows interest rates to go negative. This makes it applicable to real interest
rates, but less appropriate for nominal interest rates. Third, it implies that risk premia are
constant over time. The square-root model instead specifies
-m_{t+1} = x_t + x_t^{1/2}\beta\xi_{t+1}.
The new element here is that the shock ξ_{t+1} is multiplied by x_t^{1/2}, so the conditional variances are proportional to the state variable x_t, which generates a heteroskedastic model.
So far we have only considered single-factor models. Such models imply that all bond
returns are perfectly correlated. While bond returns do tend to be highly correlated, their
correlations are certainly not one and so it is natural to ask how this implication can be
avoided. We now present a simple model in which there are two factors rather than one, so
that bond returns are no longer perfectly correlated.
-m_{t+1} = x_{1t} + x_{2t} + x_{1t}^{1/2}\varepsilon_{t+1},
where
x_{1,t+1} = (1-\phi_1)\mu_1 + \phi_1 x_{1t} + x_{1t}^{1/2}\xi_{1,t+1},
x_{2,t+1} = (1-\phi_2)\mu_2 + \phi_2 x_{2t} + x_{2t}^{1/2}\xi_{2,t+1}.
The relation between the shocks is
εt+1 = βξ1,t+1,
and the shocks ξ1,t+1 and ξ2,t+1 are uncorrelated with each other. We can analyze this model
in the usual way.
11.2 Fitting Term-Structure Models to the Data
All the models we have discussed so far need additional error terms if they are
to fit the data.
11.3 Pricing Fixed-Income Derivative Securities
One of the main reasons for the explosion of interest in term-structure models is the practical
need to price and hedge fixed-income derivative securities.
In pricing fixed-income derivative securities it may be desirable to have a model that
does fit the current term structure exactly. A simple approach is to break observed forward
rates fnt into two components:
f_{nt} = f^a_{nt} + f^b_{nt},
where f^a_{nt} is the forward rate implied by a standard tractable model and f^b_{nt} is the residual.
Although this procedure works well in any one period, there is nothing to ensure that it will
be consistent from period to period. It is also important to understand that fitting one set
of asset prices exactly does not guarantee that a model will fit other asset prices accurately.
A particularly simple kind of derivative security is a forward contract. An n-period forward contract, negotiated at time t on an underlying security with price S_{t+n} at time t + n, specifies a price at which the security will be purchased at time t + n. Thus the forward price, which we write G_{nt}, is determined at time t, but no money changes hands until
time t+ n. Cox, Ingersoll, and Ross show that the forward price Gnt is the time t price of a
claim to a payoff of St+n/Pnt at time t + n. They establish this proposition using a simple
arbitrage argument. They consider the following investment strategy: At time t, take a long
position in 1/Pnt forward contracts and put Gnt into n-period bonds. By doing this one can
purchase Gnt/Pnt bonds. The payoff from this strategy at time t+ n is
\frac{1}{P_{nt}}\left[S_{t+n} - G_{nt}\right] + \frac{G_{nt}}{P_{nt}} = \frac{S_{t+n}}{P_{nt}},
where the first term is the profit or loss on the forward contracts and the second term is the
payoff on the bonds. It can also be stated using stochastic-discount-factor notation as
Gnt = Et[Mn,t+nSt+n/Pnt],
where the n-period stochastic discount factor M_{n,t+n} is the product of n successive one-period stochastic discount factors: M_{n,t+n} \equiv M_{t+1}M_{t+2}\cdots M_{t+n}.
A futures contract differs from a forward contract in one important respect: it is
marked to market each period during the life of the contract, so that the purchaser of a
futures contract receives the futures price increase or pays the futures price decrease each
period. If we write the price of an n-period futures contract as H_nt, then we have

H_nt = E_t[M_{t+1} H_{n−1,t+1}/P_{1t}].
Consider the following investment strategy: At time t, take a long position in 1/P_{1t} futures
contracts and put H_nt into one-period bonds. By doing this one can purchase H_nt/P_{1t} bonds.
At time t + 1, liquidate the futures contracts. The payoff from this strategy at time t + 1 is

(1/P_{1t})[H_{n−1,t+1} − H_nt] + H_nt/P_{1t} = H_{n−1,t+1}/P_{1t}.
Suppose one wants to price a European call option written on an underlying security with
price S_t. If the option has n periods to expiration and exercise price X, then its terminal
payoff is max(S_{t+n} − X, 0). Writing the option price as C_nt(X), we have

C_nt(X) = E_t[M_{n,t+n} S_{t+n} | S_{t+n} ≥ X] − X E_t[M_{n,t+n} | S_{t+n} ≥ X].
In general, we can evaluate the above equation using numerical methods.
Chapter 12
Nonlinearities in Financial Data
The econometric methods we discuss in this text are almost all designed to detect linear
structure in financial data. However, many aspects of economic behavior may not be linear.
12.1 ARCH, GARCH
The concept of autoregressive conditional heteroskedasticity, or ARCH, was intro-
duced by Engle (1982). The basic idea of ARCH models is that the variance of the error
term at time t depends on the realized values of the squared error terms in previous time
periods. Let u_t denote the error term and let Ω_{t−1} denote an information set consisting of
the data observed through period t − 1. An ARCH(q) process can then be written as

u_t = σ_t ε_t,   σ²_t ≡ E(u²_t | Ω_{t−1}) = α_0 + Σ_{i=1}^q α_i u²_{t−i},

where α_0 > 0, α_i ≥ 0 for i = 1, ..., q, and ε_t is white noise with variance 1. The conditional
variance function is clearly autoregressive, and since it depends on t, the model is also
heteroskedastic. Moreover, the variance of u_t is a function of u_{t−1} through σ_t, which means
the variance of u_t is conditional on the past of the process; that is where the term conditional
comes from. The error terms u_t and u_{t−1} are clearly dependent. They are, however,
uncorrelated. Thus an ARCH process involves only heteroskedasticity, not serial correlation.
The original ARCH process has not proven very satisfactory in applied work.
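The uncorrelated-but-dependent property is easy to see in a simulation. The sketch below (parameter values are arbitrary) simulates an ARCH(1) process and compares the first-order sample autocorrelations of u_t and u²_t: the former is near zero, while the latter is clearly positive.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate u_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * u_{t-1}^2."""
    rng = np.random.default_rng(seed)
    u = np.empty(n)
    sigma2 = alpha0 / (1.0 - alpha1)  # start at the unconditional variance
    for t in range(n):
        u[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = alpha0 + alpha1 * u[t] ** 2
    return u

def acf1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return np.dot(x[1:], x[:-1]) / np.dot(x, x)

u = simulate_arch1(alpha0=0.5, alpha1=0.5, n=50_000)
print(acf1(u))       # near 0: no serial correlation in u_t
print(acf1(u ** 2))  # clearly positive: the squares are autocorrelated
```

With α_1 = 0.5 the fourth moment exists (3α_1² < 1), and the squared process behaves like an AR(1) with coefficient α_1, which is why the second autocorrelation is sizable.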
In fact, the ARCH model became famous because of its descendant: the generalized
ARCH model, which was proposed by Bollerslev (1986). We may write a GARCH(p, q)
process as

u_t = σ_t ε_t,   σ²_t ≡ E(u²_t | Ω_{t−1}) = α_0 + Σ_{i=1}^q α_i u²_{t−i} + Σ_{j=1}^p δ_j σ²_{t−j}.

The conditional variance here can be written more compactly as

σ²_t = α_0 + α(L) u²_t + δ(L) σ²_t.
The simplest and by far the most popular GARCH model is the GARCH(1, 1) process, for
which the conditional variance can be written as

σ²_t = α_0 + α_1 u²_{t−1} + δ_1 σ²_{t−1}.

Unlike the original ARCH model, the GARCH(1, 1) process generally seems to work quite
well in practice. More precisely, in many cases GARCH(1, 1) cannot be rejected against any
more general GARCH(p, q) process.
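As an illustration, a GARCH(1, 1) process can be simulated directly from its definition. The parameter values below are arbitrary, chosen so that α_1 + δ_1 < 1 and the unconditional variance α_0/(1 − α_1 − δ_1) exists:

```python
import numpy as np

def simulate_garch11(alpha0, alpha1, delta1, n, seed=1):
    """Simulate u_t = sigma_t * eps_t, where
    sigma_t^2 = alpha0 + alpha1 * u_{t-1}^2 + delta1 * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    u = np.empty(n)
    sigma2 = alpha0 / (1.0 - alpha1 - delta1)  # start at the unconditional variance
    for t in range(n):
        u[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = alpha0 + alpha1 * u[t] ** 2 + delta1 * sigma2
    return u

u = simulate_garch11(alpha0=0.05, alpha1=0.10, delta1=0.85, n=100_000)
# The sample variance should be near the unconditional variance
# alpha0 / (1 - alpha1 - delta1) = 0.05 / 0.05 = 1.
print(u.var())
```

A plot of the simulated series would show the characteristic volatility clustering: quiet stretches punctuated by bursts of large shocks.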
There are two possible methods for estimating ARCH and GARCH models:

(1) Feasible GLS: since ARCH and GARCH processes induce heteroskedasticity, it might
seem natural to use feasible GLS. However, this approach is rarely used, because it is
not asymptotically efficient. In the case of a GARCH(1, 1) model, σ²_t depends on
u²_{t−1}, which in turn depends on the estimates of the regression function. Because of
this, estimating the following equations jointly yields more efficient estimates:

y_t = X_t β + u_t,
σ²_t = α_0 + α_1 u²_{t−1} + δ_1 σ²_{t−1}.
(2) MLE: the most popular way to estimate GARCH models is to assume that the error
terms are normally distributed and use the ML method. To do so, we first write a linear
regression model with GARCH errors, defined in terms of a normal innovation process,
as

(y_t − X_t β)/σ_t(β, θ) = ε_t,   ε_t ∼ N(0, 1).

The density of y_t conditional on Ω_{t−1} is then

(1/σ_t(β, θ)) φ((y_t − X_t β)/σ_t(β, θ)),

where φ(·) denotes the standard normal density. Therefore, the contribution to the
loglikelihood function made by the t-th observation is

ℓ_t(β, θ) = −(1/2) log 2π − (1/2) log σ²_t(β, θ) − (1/2)(y_t − X_t β)²/σ²_t(β, θ).
This function is not easy to calculate, because the skedastic function σ²_t(β, θ) is
defined implicitly by the recursion

σ²_t = α_0 + α_1 u²_{t−1} + δ_1 σ²_{t−1},

and there are no good starting values for σ²_{t−1}. An ARCH(q) model does not have
the lagged σ²_t term and therefore does not have this problem: we can simply use the
first q observations to compute the squared residuals needed to form the skedastic
function σ²_t(β, θ). For the starting values of the lagged σ²_t, there are some popular
ad hoc procedures:
(a) Set all unknown pre-sample values of u2t and σ2t to zero.
(b) Replace them by an estimate of their common unconditional expectation: an
appropriate function of the θ parameters, or the SSR/n from OLS estimation.
(c) Treat the unknown starting values as extra parameters.
In any case, different procedures can produce very different results. When using STATA
or any other black-box program, users should know what the package is actually doing.
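A minimal sketch of the loglikelihood computation, assuming a linear regression y_t = X_t β + u_t with GARCH(1, 1) errors and using ad hoc procedure (b) (both pre-sample values set to SSR/n from the residuals). In practice one would maximize this function with a numerical optimizer:

```python
import numpy as np

def garch11_loglik(beta, alpha0, alpha1, delta1, y, X):
    """Gaussian loglikelihood of a linear regression with GARCH(1,1) errors.
    Pre-sample u_0^2 and sigma_0^2 are both set to SSR/n (procedure (b))."""
    u = y - X @ beta
    n = len(y)
    u2_prev = sigma2_prev = np.dot(u, u) / n  # ad hoc starting value
    ll = 0.0
    for t in range(n):
        # the recursion defining the skedastic function
        sigma2 = alpha0 + alpha1 * u2_prev + delta1 * sigma2_prev
        ll -= 0.5 * (np.log(2.0 * np.pi) + np.log(sigma2) + u[t] ** 2 / sigma2)
        u2_prev, sigma2_prev = u[t] ** 2, sigma2
    return ll
```

A useful sanity check: with α_1 = δ_1 = 0 the recursion collapses to a constant variance α_0, so the function reduces to the ordinary normal loglikelihood.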
12.2 Nonparametric Estimation
Nonparametric regression is a form of regression analysis in which the predictor does
not take a predetermined form but is constructed according to information derived from
the data. Nonparametric regression requires larger sample sizes than regression based on
parametric models because the data must supply the model structure as well as the model
estimates.
12.2.1 Kernel Estimation
One traditional way of estimating a PDF is to form a histogram. Given a sample x_t,
t = 1, ..., n, of independent realizations of a random variable X, for any arbitrary argument
x, the empirical distribution function (EDF) is

F(x) = (1/n) Σ_{t=1}^n I(x_t ≤ x).
The indicator function I is clearly discontinuous, which makes the EDF above discontinuous.
In both theory and practice, we generally prefer a smooth estimate for various reasons, for
example so that it can be differentiated. We therefore replace I with a continuous CDF, K(z),
with mean 0. This function is called a cumulative kernel. It is convenient to be able to
control the degree of smoothness of the estimate. Accordingly, we introduce the bandwidth
parameter h as a scaling parameter for the smoothing distribution. This gives the
kernel CDF estimator

F_h(x) = (1/n) Σ_{t=1}^n K((x − x_t)/h).   (12.1)
There are many kernels to choose from; a popular one is the standard normal distribution,
the so-called Gaussian kernel. If we differentiate equation (12.1) with respect to x, we
obtain the kernel density estimator

f_h(x) = (1/(nh)) Σ_{t=1}^n k((x − x_t)/h).
This estimator is very sensitive to the value of the bandwidth h. Two popular choices for h
are h = 1.059 s n^{−1/5} and h = 0.785 (q̂_{.75} − q̂_{.25}) n^{−1/5}, where s is the standard deviation of the x_t
and q̂_{.75} − q̂_{.25} is the difference between the estimated .75 and .25 quantiles of the data.
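A sketch of the kernel density estimator with the Gaussian kernel and the first rule-of-thumb bandwidth, tried out on simulated N(0, 1) data so that the true density is known:

```python
import numpy as np

def kernel_density(x_grid, data, h=None):
    """f_h(x) = (1/(nh)) * sum_t k((x - x_t)/h) with a Gaussian kernel k.
    If h is not supplied, use the rule of thumb h = 1.059 * s * n^(-1/5)."""
    data = np.asarray(data)
    n = data.size
    if h is None:
        h = 1.059 * data.std(ddof=1) * n ** (-0.2)
    z = (x_grid[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # standard normal density
    return k.sum(axis=1) / (n * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(2_000)      # true density is N(0, 1)
grid = np.linspace(-3.0, 3.0, 61)
f_hat = kernel_density(grid, sample)
print(f_hat[30])  # estimate at x = 0; the true value is 1/sqrt(2*pi), about 0.399
```

Re-running with h ten times larger or smaller shows how the estimate oversmooths or becomes jagged, which is the sensitivity the text refers to.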
12.2.2 Nonparametric Regression
Nonparametric regression estimates E(y_t | x_t) directly, without making any assumptions
about functional form. We suppose that two random variables Y and X are jointly dis-
tributed, and we wish to estimate the conditional expectation μ(x) ≡ E(Y | x) as a function
of x, using a sample of paired observations (y_t, x_t) for t = 1, ..., n. For given x, we define

g(x) ≡ ∫_{−∞}^{∞} y f(y, x) dy = f(x) ∫_{−∞}^{∞} y f(y | x) dy = f(x) E(Y | x),

where f(x) is the marginal density of X and f(y | x) is the density of Y conditional on X = x.
Then

μ(x) = g(x)/f(x) = ∫_{−∞}^{∞} y f(x, y)/f(x) dy.
We use kernel density estimation for the joint density f(x, y) and the marginal density f(x),
with a kernel k:

f(x, y) = (1/(n h h_y)) Σ_{i=1}^n k((x − x_i)/h) k((y − y_i)/h_y),   f(x) = (1/(nh)) Σ_{i=1}^n k((x − x_i)/h).

Therefore,

∫_{−∞}^{∞} y f(x, y) dy = (1/(n h h_y)) Σ_{i=1}^n k((x − x_i)/h) ∫_{−∞}^{∞} y k((y − y_i)/h_y) dy
= (1/(n h h_y)) Σ_{i=1}^n k((x − x_i)/h) ∫_{−∞}^{∞} (y_i + h_y v) k(v) h_y dv = (1/(nh)) Σ_{i=1}^n k((x − x_i)/h) y_i,

where the second equality uses the substitution y = y_i + h_y v, and the last equality follows
because k integrates to 1 and has mean 0.
Finally, we obtain the so-called Nadaraya-Watson estimator,

μ(x) = Σ_{t=1}^n y_t k_t / Σ_{t=1}^n k_t,   k_t ≡ k((x − x_t)/h).
12.3 Artificial Neural Networks
An alternative to nonparametric regression that has received much recent attention in the
engineering and business communities is the artificial neural network (ANN). ANNs may be
viewed as a nonparametric technique. However, because they initially drew their motivation
from biological phenomena, in particular the physiology of nerve cells, they have become
part of a separate, distinct, and burgeoning literature.
The simplest example of an artificial neural network is the binary threshold model, in
which an output variable Y taking on only the values zero and one is nonlinearly related to
a collection of J input variables X_j, j = 1, ..., J, in the following way:

Y = g(Σ_{j=1}^J β_j X_j − μ),   g(u) = 1 if u ≥ 0, 0 if u < 0.
Each input Xj is weighted by a coefficient βj, called the connection strength, and then
summed across all inputs. Generalizations of the binary threshold model form the basis of
most current applications of artificial neural network models.
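A minimal sketch of a single binary threshold unit; with connection strengths β = (1, 1) and threshold μ = 2, it computes the logical AND of two binary inputs:

```python
import numpy as np

def binary_threshold(X, beta, mu):
    """Binary threshold unit: Y = 1 if sum_j beta_j * X_j >= mu, else 0."""
    return (X @ beta >= mu).astype(int)

# All four input patterns for two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(binary_threshold(X, beta=np.array([1.0, 1.0]), mu=2.0))  # [0 0 0 1]
```

Changing μ to 1 turns the same unit into a logical OR, which illustrates how the weights and threshold jointly determine the decision rule.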
Despite the many advantages that learning networks possess for approximating nonlinear
functions, they have several important limitations. In particular, there are currently no
widely accepted procedures for determining the network architecture in a given application.
Difficulties also arise in training the network. Finally, traditional techniques of statistical
inference such as significance testing cannot always be applied to network models because
of the nesting of layers.
Nonlinearities are clearly playing a more prominent role in financial applications, thanks
to increases in computing power and the availability of large datasets. Despite the flexibility
of the nonlinear models we have considered, they do have some serious limitations. They are
typically more difficult to estimate precisely, more sensitive to outliers, numerically less sta-
ble, and more prone to overfitting and data-snooping biases than comparable linear models.
Contrary to popular belief, nonlinear models require more economic structure and a priori
considerations, not less. However, nonlinearities are often a fact of economic life, and for
many financial applications the sources and nature of nonlinearity can be readily identified
or, at the very least, characterized in some fashion. In such situations, the techniques described
in this chapter are powerful additions to the armory of the financial econometrician.