Bootstrap Event Study Tests
Peter Westfall, ISQS Dept.
Joint work with Scott Hein, Finance

Posted on 18-Jan-2016



An Example of an "Event": Daily DJIA Returns

[Figure: daily % change of the DJIA plotted against date, 29-Apr-01 through 26-Sep-01; the vertical axis runs from −8 to 4.]

Event (Outlier) Detection

• Main Idea: y0 is an "outlier" if it is unusual with respect to "typical circumstances".

• Definitions:
  – Critical value: the threshold c that y0 must exceed to be called an outlier
  – α-level: the probability that Y0 exceeds c under typical circumstances
  – p-value: the probability that Y0 exceeds the particular observed value y0 under typical circumstances

Case 1: Normal distribution, known mean (μ), known variance (σ²).

Let Z = (Y0 − μ)/σ.

Y0 is associated with an "event" if |Z| is large.

Critical and p-values are from the standard normal (Z) distribution.

Ex: y0 = −7.13, μ = −0.15, σ = 1.0, so z = −6.98.

α = .05 critical value: zα/2 = 1.96.

p-value = 2 P(Z < −6.98) = 3E-12
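The Case 1 arithmetic is easy to check directly; a minimal Python sketch using only the standard library (the function name z_event_test is illustrative, not from the authors' software):

```python
from math import erf, sqrt

def z_event_test(y0, mu, sigma):
    """Case 1: standardize y0 and compute a two-sided normal p-value."""
    z = (y0 - mu) / sigma
    phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal CDF
    p = 2.0 * min(phi(z), 1.0 - phi(z))
    return z, p

# Values from the example above
z, p = z_event_test(-7.13, mu=-0.15, sigma=1.0)
print(round(z, 2))  # -6.98
```

The resulting p-value is on the order of 3E-12, matching the slide.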

Case 2: Normal distribution, unknown μ, known σ².

Let Y1,…,Yn denote an i.i.d. sample under typical circumstances (excluding Y0). Then

Var(Y0 − Ȳ) = σ²(1 + 1/n),

so

Z = (Y0 − Ȳ) / (σ √(1 + 1/n)).

Case 3: Normal distribution, unknown μ, unknown σ².

Let Y1,…,Yn denote an i.i.d. sample under typical circumstances (excluding Y0). Then

T = (Y0 − Ȳ) / (s √(1 + 1/n)), where s² = (1/(n − 1)) Σi=1..n (Yi − Ȳ)².

Critical and p-values are from the t distribution with n − 1 degrees of freedom.

Example: n = 87, y0 = −7.13, ȳ = −0.14, s = 1.013, so t = −6.86.

α = .05 critical value: t86, α/2 = 1.99.

p-value = 2 P(T86 < −6.86) = 1E-9
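Plugging the example's summary statistics into the Case 3 formula (a quick check in Python; the numbers are the ones quoted above):

```python
from math import sqrt

# Case 3 statistic from the slide's example: n = 87, y0 = -7.13, ybar = -0.14, s = 1.013
n, y0, ybar, s = 87, -7.13, -0.14, 1.013
T = (y0 - ybar) / (s * sqrt(1.0 + 1.0 / n))
print(round(T, 2))  # -6.86; compare to the t distribution with n - 1 = 86 df
```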

Regression Method for Event (outlier) Detection

Model: Yi = β0 + β1 Xi + εi, for i = 0, 1, 2, …, n, where Xi = 1 for i = 0 and Xi = 0 for i ≠ 0.

Matrix form of model: Y = Xβ + ε, where Y = (Y0, Y1, …, Yn)′, X = [1 | x] with x = (1, 0, …, 0)′, and β = (β0, β1)′.

Least Squares Estimates: β̂ = (X′X)⁻¹ X′Y. Here

X′X = [[n+1, 1], [1, 1]],  X′Y = (Σi=0..n Yi, Y0)′,

so that

β̂0 = Ȳ = (1/n) Σi=1..n Yi  and  β̂1 = Y0 − Ȳ.

Regression-Based Test for Event (outlier)

Goal: Test H0: β1 = 0.

Assuming ε0, ε1, …, εn are i.i.d. N(0, σ²), then Cov(β̂) = σ²(X′X)⁻¹ = (σ²/n)[[1, −1], [−1, n+1]], implying that Var(β̂1) = σ²(n+1)/n = σ²(1 + 1/n).

Further,

MSE = (1/((n+1) − 2)) Σi=0..n (Yi − Ŷi)² = (1/(n−1)) Σi=1..n (Yi − Ȳ)² = s²,

since Ŷ0 = β̂0 + β̂1 = Y0 (the event-day residual is identically zero) and Ŷi = Ȳ for i ≥ 1.

Thus the regression t-statistic is identical, and the degrees of freedom are identical (n−1), to the case of testing with normal distribution, unknown mean, and unknown variance (case 3).
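The equivalence can also be checked numerically; a sketch with simulated data, using numpy's lstsq as a stand-in for any OLS routine (the data here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
y_typ = rng.normal(size=n)          # "typical" observations Y1..Yn
y0 = -4.0                           # candidate event observation
y = np.concatenate(([y0], y_typ))   # i = 0, 1, ..., n

# Regression with event dummy X (1 for i = 0, else 0)
X = np.column_stack([np.ones(n + 1), np.concatenate(([1.0], np.zeros(n)))])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
mse = resid @ resid / (n + 1 - 2)                  # error df = n - 1
t_reg = beta[1] / np.sqrt(mse * np.linalg.inv(X.T @ X)[1, 1])

# Direct case-3 statistic
ybar, s = y_typ.mean(), y_typ.std(ddof=1)
T = (y0 - ybar) / (s * np.sqrt(1.0 + 1.0 / n))

print(np.isclose(t_reg, T))  # True: the two statistics coincide
```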

Notes

• The method is essentially asking, "how far into the tail of the typical distribution is y0?"
  (Estimation of the mean just gives a minor correction: the factor (1 + 1/n) in the variance formula;
  estimation of the variance gives another minor correction: t with n − 1 df instead of Z critical and p-values.)

• The central limit theorem does not apply, since we are concerned with the distribution of Y0, not the distribution of Ȳ.

The Distribution of (Y0 − μ)/σ

[Figure: densities of (Y0 − μ)/σ over the range −3 to 3 for two non-normal cases.
Uniform distribution: Lower(.05/2) = −1.645, Upper(.05/2) = 1.645.
Exponential distribution: Lower(.05/2) = −0.97, Upper(.05/2) = 2.67.]

Case 1A: Known Distribution

• Exact critical values for Z are
  cL = {α/2 quantile of the distribution of Z}
  cU = {1 − α/2 quantile of the distribution of Z}

• Exact p-value: p-value = 2 min{ P(Z ≤ z), P(Z ≥ z) }

A Simulation-Based Approach

• Simulate "many" (1,000s of) Z's at random from the pdf.

• Critical values:
  – cL is the 100(α/2) percentile of the simulated data
  – cU is the 100(1 − α/2) percentile of the simulated data

• P-value:
  – pL = proportion of simulated Z's that are smaller than z
  – pU = proportion of simulated Z's that are larger than z
  – p-value = 2 min(pL, pU)
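The percentile and proportion rules above are easy to express in code; a sketch assuming numpy, with a standard normal standing in for the known pdf:

```python
import numpy as np

def sim_critical_and_p(z_sim, z_obs, alpha=0.05):
    """Critical values and two-sided p-value from simulated draws of Z."""
    c_lo = np.quantile(z_sim, alpha / 2)            # cL
    c_hi = np.quantile(z_sim, 1 - alpha / 2)        # cU
    p_lo = np.mean(z_sim <= z_obs)
    p_hi = np.mean(z_sim >= z_obs)
    return c_lo, c_hi, 2 * min(p_lo, p_hi)

rng = np.random.default_rng(1)
z_sim = rng.standard_normal(100_000)        # "many" draws from the (known) pdf
c_lo, c_hi, p = sim_critical_and_p(z_sim, z_obs=2.5)
print(round(c_lo, 1), round(c_hi, 1))       # close to the exact values -1.96 and 1.96
```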

Case 1B: Unknown Distribution

• Let Y1,…,Yn denote an i.i.d. sample under typical circumstances (excluding Y0). Then the empirical distribution approximates the true distribution if n is large (Glivenko-Cantelli Theorem).

• Thus, approximate critical and p-values can be obtained by using the empirical distribution.

• This is the essential nature of the “bootstrap.”

Case 1B.i: Simulation-Based Approach with known μ, σ

Simulate 1000's of values of Z = (Y0 − μ)/σ as follows:

1. Select a value Y01 at random from the observed data Y1,…,Yn; let Z1 = (Y01 − μ)/σ.

2. Select a value Y02 at random from the observed data Y1,…,Yn; let Z2 = (Y02 − μ)/σ.

⋮

B. Select a value Y0B at random from the observed data Y1,…,Yn; let ZB = (Y0B − μ)/σ.

Use the simulated data Z1,…,ZB to determine critical and p-values.
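A sketch of Case 1B.i, assuming numpy; the data below are simulated for illustration, and the function name bootstrap_z is made up:

```python
import numpy as np

def bootstrap_z(y, y0, mu, sigma, B=10_000, alpha=0.05, seed=0):
    """Case 1B.i: resample Y0* from the observed typical data; Z* = (Y0* - mu)/sigma."""
    rng = np.random.default_rng(seed)
    z_star = (rng.choice(y, size=B, replace=True) - mu) / sigma
    z_obs = (y0 - mu) / sigma
    p = 2 * min(np.mean(z_star <= z_obs), np.mean(z_star >= z_obs))
    c_lo, c_hi = np.quantile(z_star, [alpha / 2, 1 - alpha / 2])
    return z_obs, (c_lo, c_hi), p

rng = np.random.default_rng(2)
y = rng.normal(-0.15, 1.0, size=87)          # illustrative "typical" returns
z_obs, (c_lo, c_hi), p = bootstrap_z(y, y0=-7.13, mu=-0.15, sigma=1.0)
print(round(z_obs, 2), p)  # -6.98 0.0: no resampled value is nearly that extreme
```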

Case 1B.ii: Unknown μ, σ

• Use the statistic

  T = (Y0 − Ȳ) / (s √(1 + 1/n)), where s² = (1/(n − 1)) Σi=1..n (Yi − Ȳ)².

• The distribution of the statistic depends on the randomness inherent in Ȳ and s.

Case 1B.ii: Simulation-Based Approach

Simulate 1000's of values of T = (Y0 − Ȳ)/(s √(1 + 1/n)) as follows:

1. Select a sample Y11,…,Yn1, Y01 at random from the observed data Y1,…,Yn;
   let T1 = (Y01 − Ȳ1)/(s1 √(1 + 1/n)), where Ȳ1, s1 are computed from Y11,…,Yn1.

2. Select a sample Y12,…,Yn2, Y02 at random from the observed data Y1,…,Yn;
   let T2 = (Y02 − Ȳ2)/(s2 √(1 + 1/n)), where Ȳ2, s2 are computed from Y12,…,Yn2.

⋮

B. Select a sample Y1B,…,YnB, Y0B at random from the observed data Y1,…,Yn;
   let TB = (Y0B − ȲB)/(sB √(1 + 1/n)), where ȲB, sB are computed from Y1B,…,YnB.

Use the simulated data T1,…,TB to determine critical and p-values.
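The same scheme in a vectorized numpy sketch (illustrative only; the last resampled column plays the role of Y0*):

```python
import numpy as np

def bootstrap_t(y, y0, B=5000, seed=0):
    """Case 1B.ii: resample (Y1*,...,Yn*, Y0*) with replacement; T* uses the resampled mean/sd."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.integers(0, n, size=(B, n + 1))   # column n plays the role of Y0*
    samples = y[idx]
    ystar, y0star = samples[:, :n], samples[:, n]
    m = ystar.mean(axis=1)
    s = ystar.std(axis=1, ddof=1)
    t_star = (y0star - m) / (s * np.sqrt(1 + 1 / n))
    t_obs = (y0 - y.mean()) / (y.std(ddof=1) * np.sqrt(1 + 1 / n))
    p = 2 * min(np.mean(t_star <= t_obs), np.mean(t_star >= t_obs))
    return t_obs, p

rng = np.random.default_rng(3)
y = rng.normal(-0.14, 1.0, size=87)   # illustrative "typical" returns
t_obs, p = bootstrap_t(y, y0=-7.13)
print(t_obs, p)
```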

Regression Method for Event (outlier) Detection

Let e0, e1,…,en be the sample residuals from the regression model

Yi = β0 + β1 Xi + εi, for i = 0, 1, 2, …, n, where Xi = 1 for i = 0 and Xi = 0 for i ≠ 0

(i.e., ei = Yi − β̂0 − β̂1 Xi).

1. Select Y11,…,Yn1, Y01 at random from the observed residuals e1,…,en;
   let T1 be the regression test for H0: β1 = 0 using these resampled data.

2. Select Y12,…,Yn2, Y02 at random from the observed residuals e1,…,en;
   let T2 be the regression test for H0: β1 = 0 using these resampled data.

⋮

B. Select Y1B,…,YnB, Y0B at random from the observed residuals e1,…,en;
   let TB be the regression test for H0: β1 = 0 using these resampled data.

Use the simulated data T1,…,TB to determine critical and p-values.

Extension: Market Model

Let e0, e1,…,en be the sample residuals from the regression model

Yi = β0 + β1 Xi + β2 Mi + εi, for i = 0, 1, 2, …, n, where Xi = 1 for i = 0 and Xi = 0 for i ≠ 0,

and where Mi is a market measure at time i.

1. Select Y11,…,Yn1, Y01 at random from the observed residuals e1,…,en;
   let T1 be the regression test for H0: β1 = 0 using these resampled data.

2. Select Y12,…,Yn2, Y02 at random from the observed residuals e1,…,en;
   let T2 be the regression test for H0: β1 = 0 using these resampled data.

⋮

B. Select Y1B,…,YnB, Y0B at random from the observed residuals e1,…,en;
   let TB be the regression test for H0: β1 = 0 using these resampled data.

Use the simulated data T1,…,TB to determine critical and p-values.
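A univariate sketch of this residual-resampling scheme, assuming numpy; function names and the simulated data are illustrative, not the authors' SAS implementation:

```python
import numpy as np

def dummy_t(Y, D, M):
    """t-statistic on the event-dummy coefficient in Y = b0 + b1*D + b2*M + e."""
    X = np.column_stack([np.ones_like(Y), D, M])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    mse = resid @ resid / (len(Y) - X.shape[1])
    return beta[1] / np.sqrt(mse * np.linalg.inv(X.T @ X)[1, 1])

def market_model_bootstrap(Y, D, M, B=2000, seed=0):
    """Resample the non-event residuals as pseudo-returns; re-run the dummy test each time."""
    rng = np.random.default_rng(seed)
    t_obs = dummy_t(Y, D, M)
    X = np.column_stack([np.ones_like(Y), D, M])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = (Y - X @ beta)[D == 0]      # the event-day residual is identically zero; drop it
    t_star = np.array([dummy_t(rng.choice(e, size=len(Y), replace=True), D, M)
                       for _ in range(B)])
    p = 2 * min(np.mean(t_star <= t_obs), np.mean(t_star >= t_obs))
    return t_obs, p

# Illustrative data: 100 typical days plus one event day with a -5 abnormal return
rng = np.random.default_rng(4)
M = rng.standard_normal(101)
D = np.zeros(101); D[0] = 1.0
Y = 0.1 + 0.5 * M + rng.standard_normal(101)
Y[0] -= 5.0
t_obs, p = market_model_bootstrap(Y, D, M)
print(round(t_obs, 2), p)
```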

Extension: Multivariate Market Model

The MVRM models may be expressed as

Ri = Xi βi + Di γi + εi, for i = 1,…,g (firms or portfolios).

Observations within a row of [ε1 | … | εg] are correlated; this is called "cross-sectional" correlation.

Observations on [ε1 | … | εg] between rows 1,…,n are assumed to be independent in the classical MVRM model.

Null hypothesis: H0: [γ1 | … | γg] = [0 | … | 0]

This multivariate test is computed easily and automatically using standard statistical software packages, using exact (under normality) F-tests. The test is based on Wilks’ Lambda likelihood ratio criterion.

Hein, Westfall, Zhang Bootstrap Method

1. Fit the MVRM model. Obtain the F-statistic for testing H0 using the traditional method (assuming normality). Obtain also the (n+1) × g sample residual matrix e = [e1 | … | eg].

2. Exclude the row corresponding to the event from e, leaving the n × g matrix e−.

3. Sample (n+1) row vectors, one at a time and with replacement, from e−. This gives an (n+1) × g matrix [R1* | … | Rg*].

4. Fit the model Ri* = Xi βi + Di γi + εi, i = 1, …, g, and obtain the test statistic F* using the same technique used to obtain the F-statistic from the original sample.

5. Repeat 3 and 4 NBOOT times. The bootstrap p-value of the test is the proportion of the NBOOT samples yielding an F*-statistic that is greater than or equal to the original F-statistic from step 1.
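A simplified numpy sketch of steps 2-5. Since the exact F is a decreasing function of Wilks' Lambda when the hypothesis has one degree of freedom, comparing Lambda* with Lambda gives the same bootstrap p-value as comparing F* with F; everything here (names, simulated data) is illustrative:

```python
import numpy as np

def wilks_lambda(Y, X_full, X_red):
    """Wilks' Lambda for dropping the event dummy: det(E'E, full) / det(E'E, reduced)."""
    def err_cp(Y, X):
        Bhat, *_ = np.linalg.lstsq(X, Y, rcond=None)
        E = Y - X @ Bhat
        return E.T @ E
    return np.linalg.det(err_cp(Y, X_full)) / np.linalg.det(err_cp(Y, X_red))

def hwz_bootstrap(Y, X_full, X_red, event_row, B=500, seed=0):
    """Steps 2-5: resample rows of the event-free residual matrix, recompute the test.
    Smaller Lambda = stronger evidence, so the bootstrap p-value is P(Lambda* <= Lambda)."""
    rng = np.random.default_rng(seed)
    lam_obs = wilks_lambda(Y, X_full, X_red)
    Bhat, *_ = np.linalg.lstsq(X_full, Y, rcond=None)
    E_minus = np.delete(Y - X_full @ Bhat, event_row, axis=0)       # step 2
    n1 = Y.shape[0]
    lam_star = np.array([
        wilks_lambda(E_minus[rng.integers(0, len(E_minus), size=n1)], X_full, X_red)
        for _ in range(B)])                                          # steps 3-4
    return np.mean(lam_star <= lam_obs)                              # step 5

# Illustrative data: g = 3 firms, 100 typical days, one event day with a -4 shock to each firm
rng = np.random.default_rng(5)
n1 = 101
M = rng.standard_normal(n1)
D = np.zeros(n1); D[0] = 1.0
X_full = np.column_stack([np.ones(n1), M, D])
X_red = X_full[:, :2]
Y = 0.5 * M[:, None] + rng.standard_normal((n1, 3))
Y[0] -= 4.0
p = hwz_bootstrap(Y, X_full, X_red, event_row=0)
print(p)
```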

Simulation Study: True Type I Error Rates

Figure 1: True type I error rates for bootstrap and traditional tests for events when T=200.
[Six panels: Traditional and Bootstrap tests at α = .10, .05, and .01; each panel plots the type I error rate against the number of firms (or portfolios) (1, 2, 4, 8) for the statistics T1, T2, T4, T8, and Z.]

Simulation Study: True Type I Error Rates

Figure 2: True type I error rates for bootstrap and traditional tests for events when T=50.
[Six panels: Traditional and Bootstrap tests at α = .10, .05, and .01; each panel plots the type I error rate against the number of firms (or portfolios) (1, 2, 4, 8) for the statistics T1, T2, T4, T8, and Z.]

Alternative Method (Kramer, 2001)

Test statistic is Z = Σ ti / (g^(1/2) st), where ti is the t-statistic from the univariate dummy-variable-based regression model for firm i, and st is the sample standard deviation of the g t-statistics.

Algorithm:

(i) create a pseudo-population of t-statistics ti* = ti − t̄, reflecting the null-hypothesis case where the true mean of the t-statistics is zero;

(ii) sample g values with replacement from the pseudo-population and compute Z* from these pseudo-values;

(iii) repeat (ii) NBOOT times, obtaining Z1*, …, ZB*. The p-value for the test is then 2 min(pU, pL), where pL is the proportion of the NBOOT bootstrap samples yielding Zi* ≤ Z, and pU is the proportion of the NBOOT samples yielding Zi* ≥ Z.

Assumption: The t-statistics are cross-sectionally independent.
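A sketch of Kramer's algorithm, assuming numpy; the zero-variance guard mirrors the macro output shown later, which reports the percentage of zero-variance bootstrap samples:

```python
import numpy as np

def kramer_bootstrap_p(t, B=10_000, seed=0):
    """Kramer's nonparametric bootstrap for Z = sum(t_i) / (g**0.5 * s_t)."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t, float)
    g = len(t)
    z_obs = t.sum() / (np.sqrt(g) * t.std(ddof=1))
    t0 = t - t.mean()                                # (i) center so the null holds
    samples = t0[rng.integers(0, g, size=(B, g))]    # (ii) resample g values, B times
    s = samples.std(axis=1, ddof=1)
    ok = s > 0                                       # guard: drop zero-variance resamples
    z_star = samples[ok].sum(axis=1) / (np.sqrt(g) * s[ok])
    return 2 * min(np.mean(z_star <= z_obs), np.mean(z_star >= z_obs))  # (iii)

# Illustrative: t-statistics clustered near 3 suggest a common event effect
t = [2.5, 3.0, 2.8, 3.2, 2.7, 2.9, 3.1, 2.6, 3.3, 2.9]
p_val = kramer_bootstrap_p(t)
print(p_val)
```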

Modified Kramer Method

• Model-based bootstrap Kramer: Bootstrap Kramer's Z = Σ ti / (g^(1/2) st), but by resampling MVRM residual vectors as in HWZ.

• Model-based sum t: Bootstrap St = Σ ti by resampling MVRM residual vectors as in HWZ.
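The model-based sum-t variant can be sketched by resampling MVRM residual rows, as in HWZ, and recomputing the per-firm dummy t-statistics (illustrative numpy code; the column layout of X, with the event dummy in column 2, is an assumption):

```python
import numpy as np

def firm_dummy_ts(Y, X, dcol):
    """Per-firm t-statistics on the event-dummy column `dcol` of the design matrix X."""
    Bhat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ Bhat
    mse = (E * E).sum(axis=0) / (Y.shape[0] - X.shape[1])
    c = np.linalg.inv(X.T @ X)[dcol, dcol]
    return Bhat[dcol] / np.sqrt(mse * c)

def sum_t_bootstrap(Y, X, dcol, event_row, B=2000, seed=0):
    """Model-based sum t: bootstrap S_t = sum(t_i) by resampling residual rows."""
    rng = np.random.default_rng(seed)
    s_obs = firm_dummy_ts(Y, X, dcol).sum()
    Bhat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E_minus = np.delete(Y - X @ Bhat, event_row, axis=0)
    n1 = Y.shape[0]
    s_star = np.array([
        firm_dummy_ts(E_minus[rng.integers(0, len(E_minus), size=n1)], X, dcol).sum()
        for _ in range(B)])
    return 2 * min(np.mean(s_star <= s_obs), np.mean(s_star >= s_obs))

# Illustrative data: g = 4 firms, 100 typical days, one event day with a -3 shock to each firm
rng = np.random.default_rng(6)
n1 = 101
M = rng.standard_normal(n1)
D = np.zeros(n1); D[0] = 1.0
X = np.column_stack([np.ones(n1), M, D])
Y = 0.5 * M[:, None] + rng.standard_normal((n1, 4))
Y[0] -= 3.0
p = sum_t_bootstrap(Y, X, dcol=2, event_row=0)
print(p)
```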

Table 1. Simulated Type I error rates as a function of cross-sectional correlation (columns ordered by increasing correlation, left to right).

Panel A: g = 5
HWZ  0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059
BT   0.055 0.057 0.055 0.057 0.056 0.058 0.056 0.057 0.060 0.056
BK   0.053 0.056 0.049 0.053 0.050 0.042 0.041 0.045 0.050 0.049
K    0.057 0.078 0.113 0.168 0.220 0.275 0.335 0.418 0.500 0.624

Panel B: g = 30
HWZ  0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
BT   0.070 0.057 0.056 0.059 0.062 0.064 0.051 0.058 0.056 0.049
BK   0.073 0.057 0.051 0.055 0.056 0.053 0.055 0.058 0.053 0.050
K    0.056 0.366 0.516 0.590 0.652 0.702 0.718 0.813 0.851 0.885

Table 2. Simulated power as a function of the event effect (columns ordered by increasing effect size, left to right).

Panel A: g = 5
HWZ  0.05 0.04 0.09 0.19 0.28 0.74 0.95 1    1
BT   0.02 0.09 0.2  0.42 0.6  0.92 0.99 1    1
BK   0.05 0.07 0.17 0.34 0.58 0.78 0.92 0.97 1
K    0.06 0.08 0.16 0.27 0.42 0.65 0.75 0.88 0.93

Panel B: g = 30

HWZ BT BK K

Table 3. Simulated Type I error rates as a function of serial correlation (columns ordered by increasing serial correlation, left to right).

Panel A: Zero cross-sectional correlation
HWZ  0.052 0.054 0.056 0.057 0.057 0.059 0.072 0.085 0.113 0.176
BT   0.062 0.062 0.067 0.066 0.068 0.07  0.067 0.071 0.079 0.093
BK   0.062 0.06  0.064 0.065 0.067 0.064 0.062 0.067 0.067 0.061
K    0.053 0.052 0.047 0.048 0.045 0.048 0.046 0.046 0.052 0.045

Panel B: Cross-sectional correlation = 0.5
HWZ  0.059 0.054 0.056 0.057 0.057 0.059 0.072 0.085 0.113 0.176
BT   0.058 0.045 0.047 0.048 0.050 0.054 0.054 0.056 0.064 0.075
BK   0.042 0.048 0.052 0.049 0.045 0.047 0.049 0.048 0.051 0.045
K    0.275 0.246 0.245 0.245 0.251 0.251 0.254 0.246 0.251 0.238

/*--------------------------------------------------------------*/
/* Name:    bootevnt                                            */
/* Title:   Macro to calculate bootstrap p-values for event     */
/*          studies                                             */
/* Author:  Peter H. Westfall, westfall@ttu.edu                 */
/* Release: SAS Version 6.12 or higher, requires SAS/IML        */
/*--------------------------------------------------------------*/
/* Inputs:                                                      */
/*                                                              */
/* DATASET  = Data set to be analyzed (required)                */
/*                                                              */
/* YVARS    = List of y variables used in the multivariate      */
/*            regression model, separated by blanks (required)  */
/*                                                              */
/* XVARS    = List of x variables used in the multivariate      */
/*            regression model, separated by blanks (required)  */
/*                                                              */
/* EVENT    = Name of dummy variable indicating event           */
/*            observation (e.g., day). This is required.        */
/*                                                              */
/* EXCLUDE  = Name of dummy variable indicating days that       */
/*            should be excluded from the resampling. If there  */
/*            are multiple event days in the model, then all    */
/*            those days should be excluded because the         */
/*            residuals are mathematically zero. If there are   */
/*            not multiple event days, then the EXCLUDE         */
/*            variable should be identical to the EVENT         */
/*            variable.                                         */
/*                                                              */
/* NBOOT    = Number of bootstrap samples. This input is        */
/*            required. Pick a number as large as possible      */
/*            subject to time constraints. Start with 100       */
/*            and work your way up, noting the accuracy as      */
/*            given by the confidence interval in the output.   */
/*                                                              */
/* MODELBOOT = 1 for requesting model-based bootstrap tests,    */
/*             = 0 to exclude them.                             */
/*                                                              */
/* NPBOOT   = 1 to request Kramer's nonparametric bootstrap     */
/*            tests, = 0 to exclude them.                       */
/*                                                              */
/* SEED     = Seed value for random numbers (0 default)         */
/*                                                              */
/*--------------------------------------------------------------*/
/* Output: This macro computes normality-assuming exact p-      */
/* values and bootstrap approximate p-values that do not        */
/* require the normality assumption. A 95% confidence interval  */
/* for the true bootstrap p-value (which itself is approximate  */
/* because it uses the empirical, not the true, residual        */
/* distribution) also is given.                                 */
/*--------------------------------------------------------------*/

Invocation of Macro

libname fin "c:\research\coba";
data sinkey; set fin.sinkey;
run;

%bootevnt(dataset=sinkey, yvars=pr1 pr2 pr3 pr4, xvars=ds m1 m2 m3 dsm d2 d3 d4 d5 d6, event=d1, exclude=exclude, nboot=1000, modelboot=1, npboot=1, seed=182161);

Normality-Assuming Tests for Event

TSQ F NDF DDF PVAL

15.025505 3.6957895 4 183 0.0064153


Model-based bootstrap Binder p-value, using 20000 samples

with 95% confidence limits on the true bootstrap p-value

BOOTP LCL UCL

0.01115 0.0096947 0.0126053

Model-based bootstrap Kramer p-value, using 20000 samples

with 95% confidence limits on the true bootstrap p-value

BOOTKP LCLK UCLK

0.0609 0.0561373 0.0656627


Model-based bootstrap Sum t p-value, using 20000 samples

with 95% confidence limits on the true bootstrap p-value

BOOTTSUMP LCLSUMT UCLSUMT

0.0001 -0.000096 0.000296

1.55 % of the bootstrap samples had 0 variance


Nonparametric bootstrap Kramer p-value, using 20000 samples

with 95% confidence limits on the true bootstrap p-value

BOOTTNP LCLNP UCLNP

0.1404 0.1333184 0.1452147

Robustness of Bootstrap to Serial Correlation

• Recall that the method is essentially a comparison of Y0 to the distribution of Y1,…,Yn.

• If the empirical distribution of Y1,…,Yn converges to F, then the unconditional null probability of an "event" also converges to α = F(cL) + (1 − F(cU)), where cL and cU are the α/2 and 1 − α/2 quantiles.

• Such convergence occurs for typical stationary time series processes.

Conclusions

• We use t, not z, even when n is large. Why? Because t is generally more accurate.

• We should use bootstrap tests instead of traditional tests for precisely the same reason.

• We must account for cross-sectional correlation in the analysis.

• The recommended method is our bootstrap with a modification of Kramer's Z (the model-based sum t method).

Software is available from westfall@ba.ttu.edu