
Time Series Analysis
1. Stationary ARMA models

Andrew Lesniewski

Baruch College, New York

Fall 2019


Outline

1 Basic concepts

2 Autoregressive models

3 Moving average models


Time series

A time series is a sequence of data points X_t indexed by a discrete set of (ordered) dates t, where −∞ < t < ∞.

Each X_t can be a simple number or a complex multi-dimensional object (vector, matrix, higher-dimensional array, or more general structure).

We will be assuming that the times t are equally spaced throughout, and denote the time increment by h (e.g. second, day, month). Unless specified otherwise, we will choose the units of time so that h = 1.

Typically, time series exhibit significant irregularities, which may have their origin either in the nature of the underlying quantity or in imprecision of observation (or both).

Examples of time series commonly encountered in finance include:
(i) prices,
(ii) returns,
(iii) index levels,
(iv) trading volumes,
(v) open interests,
(vi) macroeconomic data (inflation, new payrolls, unemployment, GDP, housing prices, ...).


For modeling purposes, we assume that the elements of a time series are random variables on some underlying probability space.

Time series analysis is a set of mathematical methodologies for analyzing observed time series, whose purpose is to extract useful characteristics of the data.

These methodologies fall into two broad categories:
(i) non-parametric, where the stochastic law of the time series is not explicitly specified;
(ii) parametric, where the stochastic law of the time series is assumed to be given by a model with a finite (and preferably tractable) number of parameters.

The results of time series analysis are used for various purposes such as
(i) data interpretation,
(ii) forecasting,
(iii) smoothing,
(iv) back filling, ...

We begin with stationary time series.


Stationarity and ergodicity

A time series (model) is stationary if, for any times t_1 < ... < t_k and any τ, the joint probability distribution of (X_{t_1+τ}, ..., X_{t_k+τ}) is identical with the joint probability distribution of (X_{t_1}, ..., X_{t_k}).

In other words, the joint probability distribution of (X_{t_1}, ..., X_{t_k}) remains the same if each observation time t_i is shifted by the same amount (time translation invariance).

For a stationary time series, the expected value E(X_t) is independent of t and is called the (ensemble) mean of X_t. We will denote its value by µ.

A stationary time series model is ergodic if
\[
\lim_{T\to\infty} \frac{1}{T} \sum_{1\le k\le T} X_{t+k} = \mu, \tag{1}
\]
i.e. if the time average of X_t is equal to its mean.

The limit in (1) is usually understood in the sense of mean square convergence.


Ergodicity is a desired property of a financial time series, as we are always faced with a single realization of a process rather than an ensemble of alternative outcomes.

The notions of stationarity and ergodicity are hard to verify in practice. In particular, there is no practical statistical test for ergodicity.

For this reason, a weaker but more practical concept of stationarity has been introduced.


Autocovariance and stationarity

A time series is covariance-stationary (a.k.a. weakly stationary) if:
(i) E(X_t) = µ is a constant,
(ii) for any τ, the autocovariance Cov(X_s, X_t) is time translation invariant,
\[
\operatorname{Cov}(X_{s+\tau}, X_{t+\tau}) = \operatorname{Cov}(X_s, X_t), \tag{2}
\]
i.e. Cov(X_s, X_t) depends only on the difference t − s. We will denote it by Γ_{t−s}.

For a covariance-stationary series, Γ_{−t} = Γ_t (show it!).

Notice that Γ_0 = Var(X_t).


The autocorrelation function (ACF) of a time series is defined as
\[
R_{s,t} = \frac{\operatorname{Cov}(X_s, X_t)}{\sqrt{\operatorname{Var}(X_s)}\,\sqrt{\operatorname{Var}(X_t)}}. \tag{3}
\]

For a covariance-stationary time series, R_{s,t} = R_{s−t,0}, i.e. the ACF is a function of the difference s − t only.

We will write R_t = R_{t,0}, and note that
\[
R_t = \frac{\Gamma_t}{\Gamma_0}. \tag{4}
\]


Note that µ, Γ, and R are usually unknown, and are estimated from sample data. The estimated sample mean µ̂, autocovariance Γ̂, and autocorrelation R̂ are calculated as follows.

Consider a finite sample x_1, ..., x_T. Then
\[
\begin{aligned}
\hat\mu &= \frac{1}{T} \sum_{t=1}^{T} x_t,\\
\hat\Gamma_t &= \frac{1}{T} \sum_{j=t+1}^{T} (x_j - \hat\mu)(x_{j-t} - \hat\mu), \quad \text{for } t = 0, 1, \ldots, T-1,\\
\hat\Gamma_t &= \hat\Gamma_{-t}, \quad \text{for } t = -1, \ldots, -(T-1),\\
\hat R_t &= \frac{\hat\Gamma_t}{\hat\Gamma_0}.
\end{aligned} \tag{5}
\]

These quantities are called the sample mean, sample autocovariance, and sample ACF, respectively.


Usually, R̂_t is a biased estimator of R_t, with the bias going to zero as 1/T for T → ∞.

Notice that this method allows us to compute up to T − 1 estimated sample autocorrelations.

One can use the above estimators to test the hypothesis H_0: R_t = 0 versus H_a: R_t ≠ 0.

The relevant t-stat is
\[
r = \frac{\hat R_t}{\sqrt{\frac{1}{T}\big(1 + 2\sum_{i=1}^{t-1} \hat R_i^2\big)}}.
\]
If X_t is a stationary Gaussian time series with R_s = 0 for s > t, this t-stat is normally distributed, asymptotically as T → ∞.

We thus reject H_0 with confidence 1 − α if |r| > z_{α/2}, where z_{α/2} is the 1 − α/2 percentile of the standard normal distribution.
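For concreteness, here is a minimal Python sketch of this test; the data array x is assumed given, the helper names are ours, and scipy is used for the normal percentile:

import numpy as np
from scipy.stats import norm

def sample_acf(x, nlags):
    # sample autocovariances and autocorrelations, as in (5)
    T = len(x)
    mu = x.mean()
    gamma = np.array([((x[t:] - mu) * (x[:T - t] - mu)).sum() / T
                      for t in range(nlags + 1)])
    return gamma / gamma[0]

def acf_test(x, t, alpha=0.05):
    # t-stat for H0: R_t = 0, and the rejection decision
    T = len(x)
    R = sample_acf(x, t)
    r = R[t] / np.sqrt((1 + 2 * np.sum(R[1:t] ** 2)) / T)
    return r, abs(r) > norm.ppf(1 - alpha / 2)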


Another test, the Portmanteau test, allows us to test jointly for the presence of several autocorrelations, i.e. H_0: R_1 = ... = R_k = 0 versus H_a: R_i ≠ 0 for some 1 ≤ i ≤ k.

The relevant t-stat is defined as
\[
Q^*(k) = T \sum_{i=1}^{k} \hat R_i^2.
\]

Under the assumption that X_t is i.i.d., Q^*(k) is asymptotically distributed according to χ²(k).

The power of the test is increased if we replace the statistic above with the Ljung-Box stat:
\[
Q(k) = T(T+2) \sum_{i=1}^{k} \frac{\hat R_i^2}{T - i}.
\]

H_0 is rejected if Q(k) is greater than the 1 − α percentile of the χ²(k) distribution.
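In practice, this test rarely needs to be coded by hand; a minimal sketch using the Ljung-Box implementation shipped with statsmodels (the series x is assumed given):

from statsmodels.stats.diagnostic import acorr_ljungbox

# Q(k) statistics and p-values for k = 1, ..., 10; H0 is rejected at
# level alpha for the lags whose p-value falls below alpha
lb = acorr_ljungbox(x, lags=10, return_df=True)
print(lb[['lb_stat', 'lb_pvalue']])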


Models of time series

For practical applications, it is convenient to model a time series as a discrete-time stochastic process with a small number of parameters.

Time series models typically have the following structure:
\[
X_t = p_t + m_t + \epsilon_t, \tag{6}
\]
where the three components on the RHS have the following meaning:
p_t is a periodic function called the seasonality,
m_t is a slowly varying process called the trend,
ε_t is a stochastic component called the error or disturbance.

Classic linear time series models fall into three broad categories:
autoregressive,
moving average,
integrated,
and their combinations.


White noise

The source of randomness in the models discussed in these lectures is white noise. It is a process specified as follows:
\[
X_t = \epsilon_t, \tag{7}
\]
where ε_t ∼ N(0, σ²) are i.i.d. (= independent, identically distributed) normal random variables.

Note that
\[
E(\epsilon_t) = 0, \qquad
\operatorname{Cov}(\epsilon_s, \epsilon_t) =
\begin{cases}
\sigma^2, & \text{if } s = t,\\
0, & \text{otherwise.}
\end{cases} \tag{8}
\]

The white noise process is stationary and ergodic (show it!).

The white noise process with linear drift
\[
X_t = at + b + \epsilon_t, \quad a \ne 0, \tag{9}
\]
is not stationary, as E(X_t) = at + b.


Autoregressive model AR(1)

The first class of models that we consider are the autoregressive models AR(p). Their key characteristic is that the current observation is directly correlated with the previous p observations.

The simplest among them is AR(1), the autoregressive model with a single lag.

The model is specified as follows:
\[
X_t = \alpha + \beta X_{t-1} + \epsilon_t. \tag{10}
\]
Here, α, β ∈ R, and ε_t ∼ N(0, σ²) is a white noise.

A particular case of the AR(1) model is the random walk model, namely
\[
X_t = X_{t-1} + \epsilon_t,
\]
in which the current value of X is the previous value plus a "white noise" disturbance.


The graph below shows a simulated AR(1) time series with the following choice of parameters: α = 0.1, β = 0.3, σ = 0.005.


Here is the code snippet used to generate this graph in Python:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARMA

alpha = 0.1
beta = 0.3
sigma = 0.005

# Simulate AR(1)
T = 250
x0 = alpha / (1 - beta)
x = np.zeros(T + 1)
x[0] = x0
eps = np.random.normal(0.0, sigma, T)
for i in range(1, T + 1):
    x[i] = alpha + beta * x[i - 1] + eps[i - 1]

# Take a look at the simulated time series
plt.plot(x)
plt.show()


Let us investigate the circumstances under which an AR(1) process is covariance-stationary.

For µ = E(X_t) to be independent of t, we must have from (10):
\[
\mu = \alpha + \beta\mu.
\]
This equation has a solution iff β ≠ 1 (except for the random walk case corresponding to α = 0, β = 1). In this case,
\[
\mu = \frac{\alpha}{1-\beta}. \tag{11}
\]

Let us now compute the autocovariance. To this end, we rewrite (10) as
\[
X_t - \mu = \beta(X_{t-1} - \mu) + \epsilon_t. \tag{12}
\]
Notice that the two terms on the RHS of this equation are independent of each other.


For Γ_0 = Var(X_t) to be independent of t, (12) implies that
\[
\Gamma_0 = \beta^2\Gamma_0 + \sigma^2,
\]
and so
\[
\Gamma_0 = \frac{\sigma^2}{1-\beta^2}. \tag{13}
\]

Since Γ_0 > 0, this equation implies that |β| < 1.

Multiplying (12) by X_{t−1} − µ, we find that Γ_1 = βΓ_0. Iterating, we find that
\[
\Gamma_k = \beta^k \Gamma_0, \tag{14}
\]
with Γ_0 given by (13). The autocorrelation function thus decays exponentially fast as a function of the lag between two observations.

In conclusion, the condition for an AR(1) process to be covariance-stationary is that |β| < 1.
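As a sanity check, these moments are easy to verify against a long simulated path; a minimal sketch, reusing the parameter values of the simulation snippet above:

import numpy as np

alpha, beta, sigma = 0.1, 0.3, 0.005
T = 100000  # long sample, so sample moments approximate ensemble ones

eps = np.random.normal(0.0, sigma, T)
x = np.zeros(T)
x[0] = alpha / (1 - beta)
for i in range(1, T):
    x[i] = alpha + beta * x[i - 1] + eps[i]

print(x.var(), sigma**2 / (1 - beta**2))       # Gamma_0, cf. (13)
print(np.corrcoef(x[1:], x[:-1])[0, 1], beta)  # R_1 = beta, cf. (14)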


The AR(1) model with |β| < 1 has a natural interpretation that can be gleaned from the following "explicit" representation of X_t. Namely, iterating (10) we find that:
\[
\begin{aligned}
X_t &= \alpha + \beta X_{t-1} + \epsilon_t\\
&= \alpha(1+\beta) + \beta^2 X_{t-2} + \epsilon_t + \beta\epsilon_{t-1}\\
&= \ldots\\
&= \alpha(1+\beta+\ldots+\beta^{L-1}) + \beta^L X_{t-L} + \epsilon_t + \beta\epsilon_{t-1} + \ldots + \beta^{L-1}\epsilon_{t-L+1}\\
&= \mu(1-\beta^L) + \beta^L X_{t-L} + \sqrt{\Gamma_0(1-\beta^{2L})}\,\xi_t,
\end{aligned} \tag{15}
\]
where ξ_t ∼ N(0, 1).

This implies that
\[
\begin{aligned}
E(X_t|X_{t-L}) &= \mu(1-\beta^L) + \beta^L X_{t-L},\\
\operatorname{Var}(X_t|X_{t-L}) &= \Gamma_0(1-\beta^{2L}).
\end{aligned} \tag{16}
\]


Since β^L → 0 exponentially fast, for large L we have
\[
X_t \approx \mu + \sqrt{\Gamma_0}\,\xi_t. \tag{17}
\]

In other words, the AR(1) model describes a mean-reverting time series. After a large number of observations, X_t takes the form (17), i.e. it is equal to its mean value plus a Gaussian noise.

The rate of convergence to this limit is given by |β|: the smaller this value, the faster X_t reaches its limit behavior.

The next question is: given a set of observations, how do we determine the values of the parameters α, β, and σ in (10)?


Maximum likelihood estimation

Maximum likelihood estimation (MLE) is a commonly used method of estimating the parameters of a statistical model given a set of observations.

It is based on the premise that the best choice of the parameter values should maximize the likelihood of making the observations given these parameters.

Given a statistical model with parameters θ = (θ_1, ..., θ_d), and a set of data y = (y_1, ..., y_N), we construct the likelihood function L(θ|y), which links the model with the data in such a way as if the data were drawn from the assumed model.

In practice, L(θ|y) is the joint probability density function (PDF) p(y|θ) under the model, evaluated at the observed values.

In particular, if the observations y_i are independent, then
\[
L(\theta|y) = \prod_{i=1}^{N} p(y_i|\theta), \tag{18}
\]
where p(y_i|θ) denotes the PDF of a single observation.


The value θ* that maximizes L(θ|y) serves as the best fit between the model specification and the data.

It is usually more convenient to consider the log likelihood function (LLF) −log L(θ|y). Then, θ* is the value at which the LLF attains its minimum.

As an illustration, consider a sample y = (y_1, ..., y_N) drawn from the normal distribution N(µ, σ²). Its likelihood function is given by
\[
L(\theta|y) = (2\pi\sigma^2)^{-N/2} \prod_{i=1}^{N} \exp\Big(-\frac{(y_i-\mu)^2}{2\sigma^2}\Big), \tag{19}
\]
and the LLF is
\[
-\log L(\theta|y) = \frac{N}{2}\log\sigma^2 + \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i-\mu)^2 + \text{const}. \tag{20}
\]


Taking the µ and σ derivatives and setting them to 0, we readily find that the MLE estimates of µ and σ are
\[
\mu^* = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad
(\sigma^*)^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu^*)^2, \tag{21}
\]
respectively.

Note that, while µ* is unbiased, the estimator σ* is biased (N in the denominator above, rather than the usual N − 1).

The fact that the MLE estimator of a parameter is biased is a common occurrence. One can show, however, that MLE estimators are consistent, i.e. in the limit N → ∞ they converge to the appropriate value.

Going forward, we will use the notation θ rather than θ∗ for the MLE estimators.
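As a minimal sketch, the closed-form estimates (21) in NumPy, on a synthetic sample chosen purely for illustration:

import numpy as np

y = np.random.normal(1.5, 2.0, 10000)  # synthetic N(1.5, 2.0^2) sample

muMLE = y.mean()                       # (1/N) sum of y_i
sigma2MLE = ((y - muMLE) ** 2).mean()  # biased: divides by N, not N - 1
print(muMLE, np.sqrt(sigma2MLE))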


MLE for AR(1)

Consider now the AR(1) model and a time series of data x_0, ..., x_T, believed to be drawn from this model. The easiest way to construct the likelihood function is to focus on the conditional PDF p(x_1, ..., x_T | x_0, θ). This leads to the conditional MLE method.

Let
\[
\epsilon_t = x_t - \alpha - \beta x_{t-1}, \tag{22}
\]
for t = 1, ..., T, be the disturbances implied from the data. According to the model specification, each ε_t is independently drawn from N(0, σ²), and thus

\[
\begin{aligned}
p(x_1,\ldots,x_T|x_0,\theta) &= \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\Big(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}\epsilon_t^2\Big)\\
&= \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\Big(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}(x_t-\alpha-\beta x_{t-1})^2\Big).
\end{aligned} \tag{23}
\]

Hence the LLF is given by
\[
-\log L(\theta|x) = \frac{T}{2}\log\sigma^2 + \frac{1}{2\sigma^2}\sum_{t=0}^{T-1}(x_{t+1}-\alpha-\beta x_t)^2 + \text{const}. \tag{24}
\]


Minimizing this function yields:
\[
\begin{pmatrix} \alpha\\ \beta \end{pmatrix} =
\begin{pmatrix} T & \sum_{t=0}^{T-1} x_t\\ \sum_{t=0}^{T-1} x_t & \sum_{t=0}^{T-1} x_t^2 \end{pmatrix}^{-1}
\begin{pmatrix} \sum_{t=0}^{T-1} x_{t+1}\\ \sum_{t=0}^{T-1} x_t x_{t+1} \end{pmatrix},
\qquad
\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t-\alpha-\beta x_{t-1})^2. \tag{25}
\]

This can also be explicitly rewritten as
\[
\beta = \frac{\sum_{t=0}^{T-1}(x_t-\bar x)(x_{t+1}-\bar x_+)}{\sum_{t=0}^{T-1}(x_t-\bar x)^2},
\qquad
\alpha = \bar x_+ - \beta\bar x, \tag{26}
\]
where
\[
\bar x = \frac{1}{T}\sum_{t=0}^{T-1} x_t, \qquad \bar x_+ = \frac{1}{T}\sum_{t=0}^{T-1} x_{t+1}. \tag{27}
\]


The exact MLE method attempts to infer the likelihood of x_0 from the probability distribution. Since x_0 ∼ N(µ, Γ_0),
\[
p(x_0|\theta) = \sqrt{\frac{1-\beta^2}{2\pi\sigma^2}} \exp\Big(-\frac{(x_0-\alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)}\Big). \tag{28}
\]

On the other hand, for t = 1, ..., T,
\[
p(x_t|x_{t-1},\ldots,x_1,\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(x_t-\alpha-\beta x_{t-1})^2}{2\sigma^2}\Big). \tag{29}
\]

From the definition of conditional probability we have the following identity:
\[
p(x_0,x_1,\ldots,x_T|\theta) = p(x_0|\theta) \prod_{t=1}^{T} p(x_t|x_{t-1},\ldots,x_1,\theta). \tag{30}
\]


Therefore, the LLF is given by
\[
\begin{aligned}
-\log L(\theta|x) = {}& \frac{1}{2}\log\frac{\sigma^2}{1-\beta^2} + \frac{T}{2}\log\sigma^2\\
&+ \frac{(x_0-\alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)} + \frac{1}{2\sigma^2}\sum_{t=1}^{T}(x_t-\alpha-\beta x_{t-1})^2 + \text{const}.
\end{aligned} \tag{31}
\]

Unlike the conditional case, the minimum of the exact LLF cannot be calculated in closed form, and the calculation has to be done by means of a numerical search.
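A minimal sketch of such a search using scipy.optimize.minimize; the data array x (of length T + 1, e.g. from the simulation above) is assumed given:

import numpy as np
from scipy.optimize import minimize

def exact_nll(params, x):
    # exact negative log-likelihood (31) for AR(1)
    alpha, beta, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)  # parametrized so that sigma^2 > 0
    if abs(beta) >= 1:
        return np.inf  # (28) requires covariance-stationarity
    eps = x[1:] - alpha - beta * x[:-1]
    T = len(eps)
    nll = 0.5 * np.log(sigma2 / (1 - beta**2)) + 0.5 * T * np.log(sigma2)
    nll += (x[0] - alpha / (1 - beta))**2 / (2 * sigma2 / (1 - beta**2))
    nll += np.sum(eps**2) / (2 * sigma2)
    return nll

res = minimize(exact_nll, x0=[0.0, 0.5, np.log(1e-4)], args=(x,),
               method='Nelder-Mead')
alphaEst, betaEst = res.x[0], res.x[1]
sigmaEst = np.sqrt(np.exp(res.x[2]))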


Here is a Python code snippet implementing the MLE for AR(1):

#Conditional MLE estimate
y = x[0:T]
yp = x[1:(T + 1)]
m = np.sum(y) / T
mp = np.sum(yp) / T
betaCMLE = np.inner(y - m, yp - mp) / np.inner(y - m, y - m)
alphaCMLE = mp - betaCMLE * m
sigmaCMLE = np.sqrt(np.inner(yp - betaCMLE * y - alphaCMLE,
                             yp - betaCMLE * y - alphaCMLE) / T)

Alternatively, one can use statsmodels functions:

#MLE estimate with statsmodels
model = ARMA(x, order=(1, 0)).fit(method='mle')
# note: statsmodels reports the process mean as params[0],
# so strictly alpha = params[0] * (1 - beta)
alphaMLE = model.params[0]
betaMLE = model.params[1]
sigmaMLE = np.std(model.resid)


Second order autoregressive model AR(2)

A second-order autoregressive model AR(2) is specified as follows:
\[
X_t = \alpha + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \epsilon_t, \tag{32}
\]
where α, β_1, β_2 ∈ R, and ε_t ∼ N(0, σ²) is a white noise.

Under this specification, the state variable depends on its two lags (rather than one lag, as in AR(1)).

Let us determine the conditions under which the model is covariance-stationary.

From the requirement that E(X_t) = µ,
\[
\mu = \frac{\alpha}{1-\beta_1-\beta_2}, \tag{33}
\]
and so we can rewrite (32) in the following form:
\[
X_t - \mu = \beta_1(X_{t-1}-\mu) + \beta_2(X_{t-2}-\mu) + \epsilon_t. \tag{34}
\]


Multiplying (34) by X_{t−j} − µ, for j = 0, 1, 2, and calculating expectations, we find that
\[
\Gamma_k =
\begin{cases}
\beta_1\Gamma_1 + \beta_2\Gamma_2 + \sigma^2, & \text{if } k = 0,\\
\beta_1\Gamma_{k-1} + \beta_2\Gamma_{k-2}, & \text{if } k = 1, 2.
\end{cases} \tag{35}
\]
This identity is called the Yule-Walker equation for the autocovariance.

Dividing (35) by Γ_0 yields the Yule-Walker equation for the autocorrelation:
\[
R_k = \beta_1 R_{k-1} + \beta_2 R_{k-2}, \tag{36}
\]
for k = 1, 2.

This equation allows us to calculate the ACF for AR(2) explicitly.

Namely, plugging in k = 1 and remembering that R_{−1} = R_1 yields R_1 = β_1 + β_2 R_1, or
\[
R_1 = \frac{\beta_1}{1-\beta_2}. \tag{37}
\]


Plugging in k = 2 yields R_2 = β_1 R_1 + β_2, or
\[
R_2 = \beta_2 + \frac{\beta_1^2}{1-\beta_2}. \tag{38}
\]

Finally, substituting k = 0 in (35) yields
\[
\Gamma_0 = (\beta_1 R_1 + \beta_2 R_2)\Gamma_0 + \sigma^2. \tag{39}
\]

Solving this, we obtain
\[
\Gamma_0 = \frac{(1-\beta_2)\sigma^2}{(1+\beta_2)\big((1-\beta_2)^2 - \beta_1^2\big)}. \tag{40}
\]
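A sketch that checks (37)-(40) numerically and iterates the Yule-Walker recursion (36) for higher lags, with parameter values chosen purely for illustration:

import numpy as np

beta1, beta2, sigma = 0.5, 0.3, 1.0  # a covariance-stationary AR(2)

R1 = beta1 / (1 - beta2)                        # eq. (37)
R2 = beta2 + beta1**2 / (1 - beta2)             # eq. (38)
Gamma0 = (1 - beta2) * sigma**2 / (
    (1 + beta2) * ((1 - beta2)**2 - beta1**2))  # eq. (40)

# higher-order autocorrelations from the recursion (36)
R = [1.0, R1, R2]
for k in range(3, 11):
    R.append(beta1 * R[-1] + beta2 * R[-2])
print(Gamma0, R)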


Lag operators and characteristic roots

We have not yet addressed the question under what conditions an AR(2) time series is covariance-stationary. We will now introduce the concepts that settle this issue and allow us to formulate stationarity criteria for more general models.

Let us define the lag operator L as a (linear) mapping:
\[
L X_t = X_{t-1}. \tag{41}
\]

In other words, the lag operator shifts the time index back by one unit.

Applying the lag operator k times shifts the time index back by k units:
\[
L^k X_t = X_{t-k}. \tag{42}
\]

We refer to L^k as the k-th power of L.

Finally, if ψ(z) = ψ_0 + ψ_1 z + ... + ψ_n z^n is a polynomial in z, we associate with it an operator ψ(L) defined by
\[
\psi(L) = \psi_0 + \psi_1 L + \ldots + \psi_n L^n. \tag{43}
\]


Notice that equation (32) can be stated as
\[
\psi(L) X_t = \alpha + \epsilon_t, \tag{44}
\]
where ψ(z) = 1 − β_1 z − β_2 z².

Solving this equation amounts to finding the inverse ψ(L)^{−1} of ψ(L):
\[
X_t = \frac{\alpha}{\psi(1)} + \psi(L)^{-1}\epsilon_t. \tag{45}
\]

Suppose that we can write ψ(L)^{−1} as an infinite series
\[
\psi(L)^{-1} = \sum_{j=0}^{\infty} \gamma_j L^j, \tag{46}
\]
with
\[
\sum_{j=0}^{\infty} |\gamma_j| < \infty. \tag{47}
\]


Then
\[
X_t = \frac{\alpha}{\psi(1)} + \sum_{j=0}^{\infty} \gamma_j \epsilon_{t-j}, \tag{48}
\]
with
\[
E(X_t) = \frac{\alpha}{\psi(1)}, \tag{49}
\]
and
\[
\operatorname{Cov}(X_t, X_{t+k}) = \sigma^2 \sum_{j=0}^{\infty} \gamma_j \gamma_{j+k}, \quad \text{for } k \ge 0, \tag{50}
\]
independently of t. The series is thus covariance-stationary.

In the case of AR(1), ψ(L) = 1 − βL, and it is clear that the geometric series does the job:
\[
(1-\beta L)^{-1} = \sum_{j=0}^{\infty} \beta^j L^j. \tag{51}
\]
Condition (47) holds as long as |β| < 1. Another way of saying this is that the root z_1 = 1/β of 1 − βz lies outside of the unit circle.


Suppose now that ψ(z) is a polynomial with non-zero roots z_1, ..., z_n. Then
\[
\psi(L) = c \prod_{j=1}^{n} (1 - z_j^{-1} L), \tag{52}
\]
where c is the constant c = (−1)^n ψ_n ∏_{j=1}^{n} z_j.

If each of the roots z_j (they may be complex) lies outside of the unit circle, i.e. |z_j^{−1}| < 1, then we can invert ψ(L) by applying (51) to each factor in (52).

It is not hard to verify that the convergence criterion (47) then holds, and thus the time series is stationary.

We can summarize these arguments by stating that a time series model given by the lag form equation (44) is covariance-stationary if the roots of the polynomial ψ(z) lie outside of the unit circle.
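This criterion is straightforward to check numerically; a minimal sketch using np.roots (the helper name is ours):

import numpy as np

def is_covariance_stationary(betas):
    # psi(z) = 1 - beta_1 z - ... - beta_p z^p;
    # np.roots expects the highest-degree coefficient first
    coeffs = [-b for b in reversed(betas)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_covariance_stationary([0.3]))       # AR(1), |beta| < 1: True
print(is_covariance_stationary([0.5, 0.3]))  # AR(2) example: True
print(is_covariance_stationary([1.0]))       # random walk: False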


General autoregressive model AR(p)

The p-th order autoregressive model AR(p) is specified as follows:
\[
X_t = \alpha + \beta_1 X_{t-1} + \ldots + \beta_p X_{t-p} + \epsilon_t, \tag{53}
\]
where α, β_j ∈ R, and ε_t ∼ N(0, σ²) is a white noise.

For covariance-stationarity, the requirement that E(X_t) = µ yields
\[
\mu = \frac{\alpha}{1-\beta_1-\ldots-\beta_p}. \tag{54}
\]

Furthermore, we require that the roots of the characteristic polynomial ψ(z) = 1 − β_1 z − ... − β_p z^p lie outside of the unit circle.

We can rewrite (53) in the following form:
\[
X_t - \mu = \beta_1(X_{t-1}-\mu) + \ldots + \beta_p(X_{t-p}-\mu) + \epsilon_t. \tag{55}
\]


Multiplying this equation by X_{t−j} − µ, for j = 0, ..., p, and calculating expectations yields the Yule-Walker equation for the autocovariance:
\[
\Gamma_k =
\begin{cases}
\beta_1\Gamma_1 + \cdots + \beta_p\Gamma_p + \sigma^2, & \text{if } k = 0,\\
\beta_1\Gamma_{k-1} + \ldots + \beta_p\Gamma_{k-p}, & \text{if } k = 1, \ldots, p.
\end{cases} \tag{56}
\]

Dividing (56) by Γ_0 yields the Yule-Walker equation for the autocorrelation:
\[
R_k = \beta_1 R_{k-1} + \ldots + \beta_p R_{k-p}, \tag{57}
\]
for k = 1, ..., p.

Note that the autocorrelations satisfy essentially the same equation as the process defining X_t.

The ACF values R_k can be found as the solution to the Yule-Walker equation, and are expressed in terms of the roots of the characteristic polynomial.
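The Yule-Walker equations also yield a practical estimator: plug the sample autocovariances into (56) and solve for the β's. statsmodels implements this; a minimal sketch (the series x is assumed given):

from statsmodels.regression.linear_model import yule_walker

# AR(p) coefficients and noise scale from the Yule-Walker equations,
# with sample autocovariances plugged in
rho, sigma = yule_walker(x, order=2, method='mle')
print(rho, sigma)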


Choosing the number of lags in AR(p)

In practice, the number of lags p is unknown, and has to be determined empirically.

This can be done by regressing the variable on its lagged values with p = 1, 2, ..., and assessing the impact of each added lag on the fit.

It is important not to overfit the model ("torture it until it confesses") by adding too many lags.

Useful quantitative guides for model selection are various information criteria.

The Akaike information criterion is defined as follows:
\[
\text{AIC} = 2k - 2\log L(\theta|x). \tag{58}
\]
Here k = #θ is the number of model parameters, and −log L(θ|x) denotes the optimized value of the LLF.

According to this criterion, among the candidate models the model with the lowest value of AIC is the preferred one.


This is in contrast with picking the model whose optimized LLF is the lowest: this may be the result of overfitting. The AIC criterion penalizes the number of parameters, and thus discourages overfitting.

Another popular information criterion is the Bayesian information criterion (a.k.a. the Schwarz criterion), which is defined as follows:
\[
\text{BIC} = \log(N)\,k - 2\log L(\theta|x), \tag{59}
\]
where N = #x is the number of data points.

According to this criterion, the model with the smallest value of BIC is the preferred model.
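A sketch of lag selection in code, looping over candidate orders with the same (since deprecated) statsmodels ARMA interface used elsewhere in these notes; the series x is assumed given:

from statsmodels.tsa.arima_model import ARMA

# fit AR(p) for a range of p and compare information criteria
for p in range(1, 6):
    fit = ARMA(x, order=(p, 0)).fit(method='mle', disp=0)
    print(p, fit.aic, fit.bic)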


Moving average model MA(1)

The moving average model MA(1) is specified as follows:
\[
X_t = \mu + \epsilon_t + \theta\epsilon_{t-1}, \tag{60}
\]
where µ and θ are constants, and ε_t is white noise.

The key feature of the MA(1) model is that its disturbances are autocorrelated with lag 1.

The expected value of X_t is
\[
E(X_t) = \mu, \tag{61}
\]
as E(ε_t) = 0 for all t.

Its variance is
\[
\begin{aligned}
E\big((X_t-\mu)^2\big) &= E\big((\epsilon_t+\theta\epsilon_{t-1})^2\big)\\
&= E(\epsilon_t^2) + 2\theta E(\epsilon_t\epsilon_{t-1}) + \theta^2 E(\epsilon_{t-1}^2)\\
&= (1+\theta^2)\sigma^2.
\end{aligned}
\]


For the first autocovariance, we have
\[
\begin{aligned}
E\big((X_t-\mu)(X_{t-1}-\mu)\big) &= E\big((\epsilon_t+\theta\epsilon_{t-1})(\epsilon_{t-1}+\theta\epsilon_{t-2})\big)\\
&= \theta\sigma^2.
\end{aligned}
\]

All autocovariances with lag ≥ 2 are zero (show it!).

As a result, MA(1) is (unlike AR(1)) always covariance-stationary, with
\[
\Gamma_t =
\begin{cases}
(1+\theta^2)\sigma^2, & \text{if } t = 0,\\
\theta\sigma^2, & \text{if } |t| = 1,\\
0, & \text{if } |t| \ge 2.
\end{cases} \tag{62}
\]

Consequently, the first autocorrelation R_1 = Γ_1/Γ_0 is given by
\[
R_1 = \frac{\theta}{1+\theta^2}, \tag{63}
\]
with all higher order autocorrelations equal to zero.


The graph below shows a simulated MA(1) time series with the following choice of parameters: µ = 1.1, θ = 0.6, σ = 0.5.


Here is the code snippet used to generate this graph in Python:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARMA

mu = 1.1
theta = 0.6
sigma = 0.5

# Simulate MA(1)
T = 250
x0 = mu
x = np.zeros(T + 1)
x[0] = x0
eps = np.random.normal(0.0, sigma, T + 1)
for i in range(1, T + 1):
    x[i] = mu + eps[i] + theta * eps[i - 1]

# Take a look at the simulated time series
plt.plot(x)
plt.show()


MLE for MA(1)

As in the case of AR(1), there are two natural approaches to MLE of an MA(1) model: conditional on the initial value of ε, and exact.

We begin with the conditional MLE method, which is somewhat easier.

Since the value of ε_0 cannot be calculated from the observed data, we are free to set it arbitrarily; we choose ε_0 = 0. All the probabilities calculated below are conditional on this choice.

We then have, for t = 1, ..., T,
\[
\epsilon_t = x_t - \mu - \theta\epsilon_{t-1}, \tag{64}
\]
and so the conditional PDF of x_t is
\[
p(x_t|x_{t-1},\ldots,x_1,\epsilon_0=0,\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{\epsilon_t^2}{2\sigma^2}\Big). \tag{65}
\]
This expression is deceptively simple: in reality, ε_t is a nested function of all x_s with s ≤ t.

The likelihood function of the sample x_1, ..., x_T is given by the product of the probabilities above, and so
\[
L(\theta|x,\epsilon_0=0) = \prod_{t=1}^{T} p(x_t|x_{t-1},\ldots,x_1,\epsilon_0=0,\theta). \tag{66}
\]


The log likelihood thus has the following form:
\[
-\log L(\theta|x,\epsilon_0=0) = \frac{T}{2}\log\sigma^2 + \frac{1}{2\sigma^2}\sum_{t=1}^{T}\epsilon_t^2 + \text{const}. \tag{67}
\]
This is a quadratic function of the x_t's. It is cumbersome to write down explicitly, but easy to code in a programming language. Its minimum is easiest to find by means of a numerical search.

In the case |θ| < 1, the impact of the choice ε_0 = 0 phases out as we iterate through the time steps. For |θ| > 1, the impact of this choice accumulates, and the method cannot be used.

For the exact MLE method, we notice that the joint PDF of x is given by
\[
p(x|\theta) = \frac{1}{(2\pi)^{T/2}\det(\Omega)^{1/2}} \exp\Big(-\frac{1}{2}(x-\mu)^{\mathsf T}\Omega^{-1}(x-\mu)\Big), \tag{68}
\]
and thus
\[
-\log L(\theta|x) = \frac{1}{2}\log\det(\Omega) + \frac{1}{2}(x-\mu)^{\mathsf T}\Omega^{-1}(x-\mu) + \text{const}. \tag{69}
\]


Here, Ω is a band diagonal matrix:
\[
\Omega = \sigma^2
\begin{pmatrix}
1+\theta^2 & \theta & 0 & \ldots & 0\\
\theta & 1+\theta^2 & \theta & \ldots & 0\\
0 & \theta & 1+\theta^2 & \ldots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \ldots & 1+\theta^2
\end{pmatrix}. \tag{70}
\]

The numerics of minimizing (69) can be handled either by (i) a clever triangular factorization of Ω, or (ii) by the Kalman filter method (we will discuss Kalman filters later in this course).

Unlike the conditional MLE method, the exact method does not suffer from instabilities if |θ| ≥ 1.
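A brute-force sketch of (69), building Ω explicitly with NumPy; this is O(T³) and only sensible for moderate T (the observations x are assumed given), but it makes the structure clear:

import numpy as np

def exact_nll_ma1(params, x):
    # exact negative log-likelihood (69) for MA(1), built from the
    # band diagonal covariance matrix (70)
    mu, theta, sigma2 = params
    T = len(x)
    Omega = sigma2 * ((1 + theta**2) * np.eye(T)
                      + theta * (np.eye(T, k=1) + np.eye(T, k=-1)))
    e = x - mu
    sign, logdet = np.linalg.slogdet(Omega)
    return 0.5 * logdet + 0.5 * e @ np.linalg.solve(Omega, e)

# evaluate at trial parameter values (mu, theta, sigma^2)
print(exact_nll_ma1([1.1, 0.6, 0.25], x))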


Here is the Python code snippet implementing the MLE for MA(1) using statsmodels:

#MLE estimate with statsmodels
model = ARMA(x, order=(0, 1)).fit(method='mle')
muMLE = model.params[0]
thetaMLE = model.params[1]
sigmaMLE = np.std(model.resid)


General moving average model MA(q)

A q-th order moving average model MA(q) is specified as follows:
\[
X_t = \mu + \epsilon_t + \theta_1\epsilon_{t-1} + \ldots + \theta_q\epsilon_{t-q}, \tag{71}
\]
where µ and θ_j are constants, and ε_t is white noise.

In other words, the MA(q) model fluctuates around µ with disturbances which are autocorrelated up to lag q.

The expected value of X_t is
\[
E(X_t) = \mu, \tag{72}
\]
while its autocovariance is
\[
\Gamma_j =
\begin{cases}
(1+\theta_1^2+\ldots+\theta_q^2)\sigma^2, & \text{if } j = 0,\\
(\theta_j + \theta_{j+1}\theta_1 + \ldots + \theta_q\theta_{q-j})\sigma^2, & \text{if } j = 1, \ldots, q,\\
0, & \text{if } j > q.
\end{cases} \tag{73}
\]


ARMA(p,q) model

A mixed autoregressive moving average model ARMA(p, q) is specified as follows:
\[
X_t = \alpha + \beta_1 X_{t-1} + \ldots + \beta_p X_{t-p} + \epsilon_t + \theta_1\epsilon_{t-1} + \ldots + \theta_q\epsilon_{t-q}, \tag{74}
\]
where α and β_j, θ_k are constants, and ε_t is white noise.

The equation above has the following lag operator representation:
\[
\psi(L) X_t = \alpha + \varphi(L)\epsilon_t, \tag{75}
\]
where
\[
\psi(z) = 1 - \beta_1 z - \ldots - \beta_p z^p, \qquad
\varphi(z) = 1 + \theta_1 z + \ldots + \theta_q z^q. \tag{76}
\]

The process (74) is covariance-stationary if the roots of ψ lie outside of the unit circle.


In this case, we can write the model in the form
\[
X_t = \mu + \gamma(L)\epsilon_t, \tag{77}
\]
where µ = α/ψ(1) and γ(L) = ψ(L)^{−1}φ(L). Explicitly, γ(L) is an infinite series:
\[
\gamma(L) = \sum_{j=0}^{\infty} \gamma_j L^j, \tag{78}
\]
with
\[
\sum_{j=0}^{\infty} |\gamma_j|^2 < \infty. \tag{79}
\]
This form of the model specification is called the moving average form.

The parameters of ARMA models are estimated by means of the MLE method. The complexity of the computation required to minimize the LLF increases with the number of parameters.

Information criteria, such as AIC or BIC, remain useful quantitative guides for model selection.
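The coefficients γ_j of the moving average form can be computed numerically; a minimal sketch using statsmodels' arma2ma, for an illustrative ARMA(1,1) with β_1 = 0.5 and θ_1 = 0.4:

import numpy as np
from statsmodels.tsa.arima_process import arma2ma

# arma2ma's coefficient convention: zero-lag terms included and
# AR coefficients negated, i.e. ar = [1, -beta_1], ma = [1, theta_1]
ar = np.array([1.0, -0.5])
ma = np.array([1.0, 0.4])
gamma = arma2ma(ar, ma, lags=10)  # gamma_0, gamma_1, ..., gamma_9
print(gamma)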


Forecasting time series with ARMA(p,q)

An important function of time series analysis is making predictions about future values of the observed data, i.e. forecasting.

The data-based forecasting problem can be formulated as follows: given the observations X_{1:t} = X_1, ..., X_t, what is the best forecast X*_{t+1|1:t} of X_{t+1}?

In mathematical terms, the problem requires minimizing a suitable loss function. We choose to minimize the mean squared error (MSE) given by
\[
E\big((X_{t+1} - X^*_{t+1|1:t})^2\big). \tag{80}
\]

We claim that X*_{t+1|1:t} is, indeed, given by the conditional expected value:
\[
X^*_{t+1|1:t} = E_t(X_{t+1}). \tag{81}
\]

Here E_t denotes expectation, conditional on the information up to time t,
\[
E_t(\,\cdot\,) = E(\,\cdot\,|X_{1:t}). \tag{82}
\]


Indeed, if Z is any random variable measurable with respect to the information set generated by X_{1:t}, then
\[
\begin{aligned}
E\big((X_{t+1}-Z)^2\big) = {}& E\big((X_{t+1}-E_t(X_{t+1}) + E_t(X_{t+1})-Z)^2\big)\\
= {}& E\big((X_{t+1}-E_t(X_{t+1}))^2\big) + E\big((E_t(X_{t+1})-Z)^2\big)\\
&+ 2E\big((X_{t+1}-E_t(X_{t+1}))(E_t(X_{t+1})-Z)\big).
\end{aligned}
\]

We argue that the cross term above is zero. Indeed,
\[
\begin{aligned}
E_t\big((X_{t+1}-E_t(X_{t+1}))(E_t(X_{t+1})-Z)\big)
&= E_t\big(X_{t+1}-E_t(X_{t+1})\big)\big(E_t(X_{t+1})-Z\big)\\
&= \big(E_t(X_{t+1})-E_t(X_{t+1})\big)\big(E_t(X_{t+1})-Z\big)\\
&= 0.
\end{aligned}
\]
Since E(·) = E(E_t(·)) (the tower property), the claim follows.


As a result,
\[
E\big((X_{t+1}-Z)^2\big) = E\big((X_{t+1}-E_t(X_{t+1}))^2\big) + E\big((E_t(X_{t+1})-Z)^2\big),
\]
which has its minimum at Z = E_t(X_{t+1}). This proves (81).

The argument above is, in fact, quite general, and it easily extends to general k-period forecasts X*_{t+k|1:t}. Minimizing the corresponding MSE yields:
\[
X^*_{t+k|1:t} = E_t(X_{t+k}). \tag{83}
\]

Later we will generalize this method to time series models with more complex structure.


As an example, a single period forecast in an AR(1) model is
\[
\begin{aligned}
X^*_{t+1|1:t} &= E_t(X_{t+1})\\
&= E_t(\alpha + \beta X_t + \epsilon_{t+1})\\
&= \alpha + \beta X_t.
\end{aligned} \tag{84}
\]

The forecast error is ε_{t+1}, and so the variance of the forecast error is σ².

Likewise, a single period forecast in an AR(p) model is
\[
X^*_{t+1|1:t} = \alpha + \beta_1 X_t + \ldots + \beta_p X_{t-p+1}, \tag{85}
\]
with forecast error ε_{t+1}, whose variance is σ².


A two-period forecast in an AR(1) model is given by
\[
\begin{aligned}
X^*_{t+2|1:t} &= E_t(X_{t+2})\\
&= E_t(\alpha + \beta X_{t+1} + \epsilon_{t+2})\\
&= (1+\beta)\alpha + \beta^2 X_t.
\end{aligned} \tag{86}
\]

The error of the two-period forecast is ε_{t+2} + βε_{t+1}; its variance is (1 + β²)σ².

A one-period forecast in an MA(1) model is
\[
\begin{aligned}
X^*_{t+1|1:t} &= E_t(X_{t+1})\\
&= E_t(\mu + \epsilon_{t+1} + \theta\epsilon_t)\\
&= \mu + \theta\epsilon_t.
\end{aligned} \tag{87}
\]

The forecast error is ε_{t+1}, and its variance is σ².
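A sketch of these one-step forecasts in code, reusing (as an assumption) the fitted values alphaCMLE, betaCMLE, muMLE, thetaMLE, and the fitted MA(1) model object from the snippets above:

# one-period AR(1) forecast (84), using the conditional MLE fit
forecastAR1 = alphaCMLE + betaCMLE * x[-1]

# one-period MA(1) forecast (87): mu + theta * eps_t, with eps_t
# approximated by the last model residual
forecastMA1 = muMLE + thetaMLE * model.resid[-1]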

These calculations can be generalized to produce a general formula for a multi-period forecast in an ARMA(p, q) model. This result is known as the Wiener-Kolmogorov prediction formula, and its discussion can be found in [1].


References

[1] Hamilton, J. D.: Time Series Analysis, Princeton University Press (1994).

[2] Tsay, R. S.: Analysis of Financial Time Series, Wiley (2010).
