Stochastic Processes (Master degree in Engineering)

Franco Flandoli


Contents

Preface

Chapter 1. Preliminaries of Probability
1. Transformation of densities
2. About covariance matrices
3. Gaussian vectors

Chapter 2. Stochastic processes. Generalities
1. Discrete time stochastic process
2. Stationary processes
3. Time series and empirical quantities
4. Gaussian processes
5. Discrete time Fourier transform
6. Power spectral density
7. Fundamental theorem on PSD
8. Signal to noise ratio
9. An ergodic theorem

Chapter 3. ARIMA models
1. Definitions
2. Stationarity, ARMA and ARIMA processes
3. Correlation function
4. Power spectral density


Preface

These notes are planned to be the last part of a course of Probability and Stochastic Processes. The first part is devoted to the introduction to the following topics, taken for instance from the book of Baldi (Italian language) or Billingsley (in English):

Probability space $(\Omega, \mathcal{F}, P)$
Conditional probability and independence of events
Factorization formula and Bayes formula
Concept of random variable $X$, random vector $X = (X_1, \ldots, X_n)$
Law of a r.v., probability density (discrete and continuous)
Distribution function and quantiles
Joint law of a vector and marginal laws, relations
(Transformation of densities and moments) (see complements below)
Expectation, properties
Moments, variance, standard deviation, properties
Covariance and correlation coefficient, covariance matrix
Generating function and characteristic function
(Discrete r.v.: Bernoulli, binomial, Poisson, geometric)
Continuous r.v.: uniform, exponential, Gaussian, Weibull, Gamma
Notions of convergence of r.v.
(Limit theorems: LLN, CLT; Chebyshev inequality.)

Since we need some more specialized material, Chapter 1 is a complement to this list of items.


CHAPTER 1

Preliminaries of Probability

1. Transformation of densities

Exercise 1. If $X$ has cdf $F_X(x)$ and $g$ is increasing and continuous, then $Y = g(X)$ has cdf
$$F_Y(y) = F_X\big(g^{-1}(y)\big)$$
for all $y$ in the image of $g$. If $g$ is decreasing and continuous, the formula is
$$F_Y(y) = 1 - F_X\big(g^{-1}(y)\big).$$

Exercise 2. If $X$ has continuous pdf $f_X(x)$ and $g$ is increasing and differentiable, then $Y = g(X)$ has pdf
$$f_Y(y) = \frac{f_X\big(g^{-1}(y)\big)}{g'\big(g^{-1}(y)\big)} = \left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}$$
for all $y$ in the image of $g$. If $g$ is decreasing and differentiable, the formula is
$$f_Y(y) = -\left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}.$$

Thus, in general, we have the following result.

Proposition 1. If $g$ is monotone and differentiable, the transformation of densities is given by
$$f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}.$$

Remark 1. Under proper assumptions, when $g$ is not injective the formula generalizes to
$$f_Y(y) = \sum_{x:\, y = g(x)} \frac{f_X(x)}{|g'(x)|}.$$
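As a quick numerical check of Proposition 1 (a minimal R sketch, not part of the original notes), one may simulate $Y = g(X)$ and compare the histogram of the simulated values with the transformed density; here $X$ is standard normal and $g(x) = e^x$, so that $f_Y(y) = f_X(\log y)/y$:

set.seed(1)
x <- rnorm(100000)
y <- exp(x)                                    # Y = g(X)
hist(y, breaks = 200, freq = FALSE, xlim = c(0, 8), xlab = "y", main = "")
yy <- seq(0.01, 8, by = 0.01)
lines(yy, dnorm(log(yy)) / yy, col = "red")    # f_X(g^{-1}(y)) / |g'(g^{-1}(y))|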

Remark 2. A second proof of the previous formula comes from the following characterization of the density: $f$ is the density of $X$ if and only if
$$E[h(X)] = \int_{\mathbb{R}} h(x) f(x)\, dx$$
for all continuous bounded functions $h$. Let us use this fact to prove that $f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ is the density of $Y = g(X)$. Let us compute $E[h(Y)]$ for a generic continuous bounded function $h$. We have, from the definition of $Y$ and from the characterization applied to $X$,
$$E[h(Y)] = E[h(g(X))] = \int_{\mathbb{R}} h(g(x)) f(x)\, dx.$$
Let us change variable $y = g(x)$, under the assumption that $g$ is monotone, bijective and differentiable. We have $x = g^{-1}(y)$, $dx = \frac{1}{|g'(g^{-1}(y))|}\, dy$ (we put the absolute value since we do not change the extremes of integration, but just rewrite $\int_{\mathbb{R}}$), so that
$$\int_{\mathbb{R}} h(g(x)) f(x)\, dx = \int_{\mathbb{R}} h(y)\, f\big(g^{-1}(y)\big)\, \frac{1}{|g'(g^{-1}(y))|}\, dy.$$
If we set $f_Y(y) := \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ we have proved that
$$E[h(Y)] = \int_{\mathbb{R}} h(y) f_Y(y)\, dy$$
for every continuous bounded function $h$. By the characterization, this implies that $f_Y(y)$ is the density of $Y$. This proof is thus based on the change of variable formula.

Remark 3. The same proof works in the multidimensional case, using the change of variable formula for multiple integrals. Recall that in place of $dy = g'(x)\, dx$ one has to use $dy = |\det Dg(x)|\, dx$, where $Dg$ is the Jacobian (the matrix of first derivatives) of the transformation $g : \mathbb{R}^n \to \mathbb{R}^n$. In fact we need the inverse transformation, so we use the corresponding formula
$$dx = \left|\det Dg^{-1}(y)\right| dy = \frac{1}{\left|\det Dg\big(g^{-1}(y)\big)\right|}\, dy.$$
With the same passages performed above, one gets the following result.

Proposition 2. If $g$ is a differentiable bijection and $Y = g(X)$, then
$$f_Y(y) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{y=g(x)}.$$

Exercise 3. If $X$ (in $\mathbb{R}^n$) has density $f_X(x)$ and $Y = UX$, where $U$ is an orthogonal linear transformation of $\mathbb{R}^n$ (it means that $U^{-1} = U^T$), then $Y$ has density
$$f_Y(y) = f_X\big(U^T y\big).$$

1.1. Linear transformation of moments. The solution of the following exercises is based onthe linearity of expected value (and thus of covariance in each argument).

Exercise 4. Let $X = (X_1, \ldots, X_n)$ be a random vector, $A$ a $d \times n$ matrix, and $Y = AX$. Let $\mu^X = \big(\mu^X_1, \ldots, \mu^X_n\big)$ be the vector of mean values of $X$, namely $\mu^X_i = E[X_i]$. Then
$$\mu^Y := A \mu^X$$
is the vector of mean values of $Y$, namely $\mu^Y_i = E[Y_i]$.

Exercise 5. Under the same assumptions, if $Q_X$ and $Q_Y$ are the covariance matrices of $X$ and $Y$, then
$$Q_Y = A Q_X A^T.$$
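A small simulation check of Exercises 4 and 5 (an illustrative R sketch, not part of the original notes): for $Y = AX$, the sample mean and sample covariance of $Y$ should approximate $A\mu^X$ and $A Q_X A^T$.

set.seed(2)
n <- 100000
X <- cbind(rnorm(n, mean = 1), rnorm(n, mean = -2))    # mu_X = (1, -2), Q_X close to the identity
A <- matrix(c(2, 0, 1, 1, -1, 3), nrow = 3, ncol = 2)  # a 3 x 2 matrix
Y <- X %*% t(A)                                        # each row is A applied to a sample of X
colMeans(Y)                                            # close to A %*% c(1, -2)
cov(Y)                                                 # close to A %*% cov(X) %*% t(A)
A %*% cov(X) %*% t(A)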


2. About covariance matrices

The covariance matrix $Q$ of a vector $X = (X_1, \ldots, X_n)$, defined as $Q_{ij} = \mathrm{Cov}(X_i, X_j)$, is symmetric:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = Q_{ji}$$
and non-negative definite:
$$x^T Q x = \sum_{i,j=1}^{n} Q_{ij} x_i x_j = \sum_{i,j=1}^{n} \mathrm{Cov}(X_i, X_j)\, x_i x_j = \sum_{i,j=1}^{n} \mathrm{Cov}(x_i X_i, x_j X_j) = \mathrm{Cov}\Big(\sum_{i=1}^{n} x_i X_i, \sum_{j=1}^{n} x_j X_j\Big) = \mathrm{Var}[W] \geq 0$$
where $W = \sum_{i=1}^{n} x_i X_i$.

The spectral theorem states that any symmetric matrix $Q$ can be diagonalized, namely there exists an orthonormal basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$ in which $Q$ takes the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. Since the covariance matrix $Q$ is also non-negative definite, we have
$$\lambda_i \geq 0, \qquad i = 1, \ldots, n.$$
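In R the spectral decomposition is provided by eigen(); a minimal illustration (not part of the original notes):

Q <- matrix(c(4, 1, 1, 3), nrow = 2)   # a symmetric (positive definite) matrix
dec <- eigen(Q)
dec$values                             # the eigenvalues lambda_1, lambda_2 (both positive here)
E <- dec$vectors                       # columns form an orthonormal basis of eigenvectors e_1, e_2
round(t(E) %*% Q %*% E, 10)            # the diagonal matrix Q_e = diag(lambda_1, lambda_2)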

Remark 4. To understand this theorem better, recall a few facts of linear algebra. $\mathbb{R}^n$ is a vector space with a scalar product $\langle \cdot, \cdot \rangle$, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers with respect to a given basis. A vector $x \in \mathbb{R}^n$ is an intrinsic object; but we can write it as a sequence of numbers $(x_1, \ldots, x_n)$ in infinitely many ways, depending on the basis we choose. Given an orthonormal basis $u_1, \ldots, u_n$, the components of a vector $x \in \mathbb{R}^n$ in this basis are the numbers $\langle x, u_j \rangle$, $j = 1, \ldots, n$. A linear map $L$ in $\mathbb{R}^n$, given the basis $u_1, \ldots, u_n$, can be represented by the matrix of components $\langle L u_i, u_j \rangle$. We shall write $y^T x$ for $\langle x, y \rangle$ (or $\langle y, x \rangle$).

Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of $\mathbb{R}^n$, which we shall denote by $u_1, \ldots, u_n$, the matrix $Q$ defines a linear transformation $L$ from $\mathbb{R}^n$ to $\mathbb{R}^n$. The spectral theorem states that there is a new orthonormal basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$ such that, if $Q_e$ represents the linear transformation $L$ in this new basis, then $Q_e$ is diagonal.

Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis $u_1, \ldots, u_n$, that we call canonical or original basis. Let $e_1, \ldots, e_n$ be another orthonormal basis. The vector $u_1$, in the canonical basis, has components
$$u_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on for the other vectors. Each vector $e_j$ has certain components in the canonical basis. Denote by $U$ the matrix whose rows are these components, namely $U_{ij} = e_i^T u_j$; equivalently, the columns of $U^T$ are $e_1, \ldots, e_n$, and we could write $U^T = (e_1, \ldots, e_n)$. Then
$$U^T \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = e_1$$
and so on, namely $U^T$ represents the linear map which maps the canonical (original) basis of $\mathbb{R}^n$ into $e_1, \ldots, e_n$, while $U$ maps $e_1, \ldots, e_n$ back into the canonical basis. This is an orthogonal transformation:
$$U^{-1} = U^T.$$
Indeed, $U^{-1}$ maps the canonical basis into $e_1, \ldots, e_n$ (by the above property of $U$), and $U^T$ does the same; equivalently,
$$U e_1 = \begin{pmatrix} e_1^T e_1 \\ e_2^T e_1 \\ \vdots \\ e_n^T e_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on.

Remark 7. Let us now go back to the covariance matrix $Q$ and the matrix $Q_e$ given by the spectral theorem: $Q_e$ is a diagonal matrix which represents the same linear transformation $L$ in the new basis $e_1, \ldots, e_n$. Assume we do not know anything else, except that they describe the same map $L$ and that $Q_e$ is diagonal, namely of the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Let us deduce a number of facts:

i) $Q_e = U Q U^T$;
ii) the diagonal elements $\lambda_j$ are eigenvalues of $L$, with eigenvectors $e_j$;
iii) $\lambda_j \geq 0$, $j = 1, \ldots, n$.

To prove (i), recall from above that
$$(Q_e)_{ij} = e_j^T L e_i \qquad \text{and} \qquad Q_{ij} = u_j^T L u_i.$$
Moreover, $U_{ij} = e_i^T u_j$, hence $e_j = \sum_{k=1}^{n} U_{jk} u_k$, and thus
$$(Q_e)_{ij} = e_j^T L e_i = \sum_{k,k'=1}^{n} U_{ik} U_{jk'}\, u_{k'}^T L u_k = \sum_{k,k'=1}^{n} U_{ik} Q_{kk'} U_{jk'} = \big(U Q U^T\big)_{ij}.$$
To prove (ii), let us write the vector $L e_1$ in the basis $e_1, \ldots, e_n$: in this basis $e_1$ is the vector $(1, 0, \ldots, 0)^T$ and the map $L$ is represented by $Q_e$, hence $L e_1$ is equal to
$$Q_e \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
which is $\lambda_1 e_1$ in the basis $e_1, \ldots, e_n$. We have checked that $L e_1 = \lambda_1 e_1$, namely that $\lambda_1$ is an eigenvalue and $e_1$ is a corresponding eigenvector. The proof for $\lambda_2$, etc., is the same. To prove (iii), just observe that the $j$-th diagonal element of $Q_e$ is
$$\lambda_j = u_j^T Q_e u_j = u_j^T U Q U^T u_j = v^T Q v \geq 0$$
where $v = U^T u_j = e_j$, having used the property that $Q$ is non-negative definite. Hence $\lambda_j \geq 0$.

3. Gaussian vectors

Recall that a Gaussian, or Normal, r.v. $N(\mu, \sigma^2)$ is a r.v. with probability density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{|x - \mu|^2}{2\sigma^2} \right).$$
We have shown that $\mu$ is the mean value and $\sigma^2$ the variance. The standard Normal is the case $\mu = 0$, $\sigma^2 = 1$. If $Z$ is a standard normal r.v., then $\mu + \sigma Z$ is $N(\mu, \sigma^2)$.

We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. Let us start with a lemma.

Lemma 1. Given a vector $\mu = (\mu_1, \ldots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$ (namely $v^T Q v > 0$ for all $v \neq 0$), consider the function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left( -\frac{(x - \mu)^T Q^{-1} (x - \mu)}{2} \right)$$
where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Notice that the inverse $Q^{-1}$ is well defined for positive definite matrices, $(x - \mu)^T Q^{-1} (x - \mu)$ is a positive quantity, and $\det(Q)$ is a positive number. Then:

i) $f(x)$ is a probability density;

ii) if $X = (X_1, \ldots, X_n)$ is a random vector with such joint probability density, then $\mu$ is the vector of mean values, namely
$$\mu_i = E[X_i],$$
and $Q$ is the covariance matrix:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j).$$

Proof. Step 1. In this step we explain the meaning of the expression $f(x)$. We have recalled above that any symmetric matrix $Q$ can be diagonalized, namely there exists an orthonormal basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$ in which $Q$ takes the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. See above for more details. Let $U$ be the matrix introduced there, such that $U^{-1} = U^T$. Recall the relation $Q_e = U Q U^T$.

Since $v^T Q v > 0$ for all $v \neq 0$, we deduce
$$v^T Q_e v = \big(v^T U\big)\, Q\, \big(U^T v\big) > 0$$
for all $v \neq 0$ (since $U^T v \neq 0$). Taking for $v$ the $i$-th canonical basis vector, we get $\lambda_i > 0$. Therefore the matrix $Q_e$ is invertible, with inverse given by
$$Q_e^{-1} = \begin{pmatrix} \lambda_1^{-1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n^{-1} \end{pmatrix}.$$
It follows that also $Q$, being equal to $U^T Q_e U$ (the relation $Q = U^T Q_e U$ comes from $Q_e = U Q U^T$), is invertible, with inverse $Q^{-1} = U^T Q_e^{-1} U$. Easily one gets $(x - \mu)^T Q^{-1} (x - \mu) > 0$ for $x \neq \mu$. Moreover,
$$\det(Q) = \det\big(U^T\big) \det(Q_e) \det(U) = \lambda_1 \cdots \lambda_n$$
because
$$\det(Q_e) = \lambda_1 \cdots \lambda_n$$
and $\det(U) = \pm 1$. The latter property comes from
$$1 = \det I = \det\big(U^T U\big) = \det\big(U^T\big) \det(U) = \det(U)^2$$
(to be used in Exercise 3). Therefore $\det(Q) > 0$. The formula for $f(x)$ is meaningful and defines a positive function.

Step 2. Let us prove that $f(x)$ is a density. By the theorem of change of variables in multidimensional integrals, with the change of variables $x = U^T y$,
$$\int_{\mathbb{R}^n} f(x)\, dx = \int_{\mathbb{R}^n} f\big(U^T y\big)\, dy$$
because $\big|\det U^T\big| = 1$ (and the Jacobian of a linear transformation is the linear map itself). Now, since $U Q^{-1} U^T = Q_e^{-1}$, $f\big(U^T y\big)$ is equal to the following function:
$$f_e(y) = \frac{1}{\sqrt{(2\pi)^n \det(Q_e)}} \exp\left( -\frac{(y - \mu_e)^T Q_e^{-1} (y - \mu_e)}{2} \right)$$
where
$$\mu_e = U \mu.$$
Since
$$(y - \mu_e)^T Q_e^{-1} (y - \mu_e) = \sum_{i=1}^{n} \frac{(y_i - (\mu_e)_i)^2}{\lambda_i}$$
and $\det(Q_e) = \lambda_1 \cdots \lambda_n$, we get
$$f_e(y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i} \right).$$
Namely, $f_e(y)$ is the product of $n$ Gaussian densities $N((\mu_e)_i, \lambda_i)$. We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence $f_e(y)$ is a probability density. Therefore $\int_{\mathbb{R}^n} f_e(y)\, dy = 1$. This proves $\int_{\mathbb{R}^n} f(x)\, dx = 1$, so that $f$ is a probability density.

Step 3. Let $X = (X_1, \ldots, X_n)$ be a random vector with joint probability density $f$, when written in the original basis. Let $Y = UX$. Then (Exercise 3) $Y$ has density $f_Y(y)$ given by $f_Y(y) = f\big(U^T y\big)$. Thus
$$f_Y(y) = f_e(y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i} \right).$$
Thus $(Y_1, \ldots, Y_n)$ are independent $N((\mu_e)_i, \lambda_i)$ r.v.'s and therefore
$$E[Y_i] = (\mu_e)_i, \qquad \mathrm{Cov}(Y_i, Y_j) = \delta_{ij} \lambda_i.$$
From Exercises 4 and 5 we deduce that $X = U^T Y$ has mean
$$\mu^X = U^T \mu^Y$$
and covariance
$$Q_X = U^T Q_Y U.$$
Since $\mu^Y = \mu_e$ and $\mu_e = U\mu$ we readily deduce $\mu^X = U^T U \mu = \mu$. Since $Q_Y = Q_e$ and $Q = U^T Q_e U$ we get $Q_X = Q$. The proof is complete. $\square$

Definition 1. Given a vector $\mu = (\mu_1, \ldots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$, we call Gaussian vector of mean $\mu$ and covariance $Q$ a random vector $X = (X_1, \ldots, X_n)$ having joint probability density function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left( -\frac{(x - \mu)^T Q^{-1} (x - \mu)}{2} \right)$$
where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. We write $X \sim N(\mu, Q)$.


The only drawback of this definition is the restriction to strictly positive definite matrices $Q$. It is sometimes useful to have the notion of Gaussian vector also in the case when $Q$ is only non-negative definite (sometimes called the degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the 1-dimensional case: affine transformations of Gaussian r.v.'s are Gaussian.

Definition 2. i) The standard $d$-dimensional Gaussian vector is the random vector $Z = (Z_1, \ldots, Z_d)$ with joint probability density $f(z_1, \ldots, z_d) = \prod_{i=1}^{d} p(z_i)$, where $p(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$.

ii) All other Gaussian vectors $X = (X_1, \ldots, X_n)$ (in any dimension $n$) are obtained from standard ones by affine transformations:
$$X = AZ + b$$
where $A$ is a matrix and $b$ is a vector. If $X$ has dimension $n$, we require $A$ to be $n \times d$ and $b$ to have dimension $n$ (but $n$ can be different from $d$).

The graph of a standard 2-dimensional Gaussian vector is the following:

[Figure: surface plot of the standard two-dimensional Gaussian density over the $xy$-plane, with peak value about 0.15 at the origin.]

and the graph of the other Gaussian vectors can be guessed by linear deformations of the base plane $xy$ (deformations defined by $A$) and a shift (by $b$). For instance, if
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},$$
the matrix which enlarges the $x$ axis by a factor 2, we get the graph below.

[Figure: surface plot of the corresponding Gaussian density, stretched by a factor 2 along the $x$ axis.]

First, let us compute the mean and covariance matrix of a vector of the form $X = AZ + b$, with $Z$ of standard type. From Exercises 4 and 5 we readily have:

Proposition 3. The mean $\mu$ and covariance matrix $Q$ of a vector $X$ of the previous form are given by
$$\mu = b, \qquad Q = A A^T.$$

When two different definitions are given for the same object, one has to prove their equivalence. If $Q$ is positive definite, the two definitions aim to describe the same object; but for $Q$ non-negative definite and not strictly positive definite, we have only the latter definition, so we do not have to check any compatibility.

Proposition 4. If $Q$ is positive definite, then Definitions 1 and 2 are equivalent. More precisely, if $X = (X_1, \ldots, X_n)$ is a Gaussian random vector with mean $\mu$ and covariance $Q$ in the sense of Definition 1, then there exist a standard Gaussian random vector $Z = (Z_1, \ldots, Z_n)$ and an $n \times n$ matrix $A$ such that
$$X = AZ + \mu.$$
One can take $A = \sqrt{Q}$, as described in the proof. Vice versa, if $X = (X_1, \ldots, X_n)$ is a Gaussian random vector in the sense of Definition 2, of the form $X = AZ + b$, then $X$ is Gaussian in the sense of Definition 1, with mean $\mu$ and covariance $Q$ given by the previous proposition.

Proof. Let us prove the first claim. Let us define
$$\sqrt{Q} = U^T \sqrt{Q_e}\, U$$
where $\sqrt{Q_e}$ is simply defined as
$$\sqrt{Q_e} = \begin{pmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sqrt{\lambda_n} \end{pmatrix}.$$
We have
$$\big(\sqrt{Q}\big)^T = U^T \big(\sqrt{Q_e}\big)^T U = U^T \sqrt{Q_e}\, U = \sqrt{Q}$$
and
$$\big(\sqrt{Q}\big)^2 = U^T \sqrt{Q_e}\, U\, U^T \sqrt{Q_e}\, U = U^T \sqrt{Q_e}\, \sqrt{Q_e}\, U = U^T Q_e U = Q$$
because $\sqrt{Q_e}\, \sqrt{Q_e} = Q_e$. Set
$$Z = \big(\sqrt{Q}\big)^{-1} (X - \mu),$$
where we notice that $\sqrt{Q}$ is invertible, from its definition and the strict positivity of the $\lambda_i$. Then $Z$ is Gaussian. Indeed, from the formula for the transformation of densities,
$$f_Z(z) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{z=g(x)}$$
where $g(x) = \big(\sqrt{Q}\big)^{-1} (x - \mu)$; hence $\det Dg(x) = \det\big(\sqrt{Q}\big)^{-1} = \frac{1}{\sqrt{\lambda_1} \cdots \sqrt{\lambda_n}}$; therefore
$$f_Z(z) = \prod_{i=1}^{n} \sqrt{\lambda_i} \cdot \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left( -\frac{\big(\sqrt{Q} z + \mu - \mu\big)^T Q^{-1} \big(\sqrt{Q} z + \mu - \mu\big)}{2} \right)$$
$$= \frac{1}{\sqrt{(2\pi)^n}} \exp\left( -\frac{\big(\sqrt{Q} z\big)^T Q^{-1} \big(\sqrt{Q} z\big)}{2} \right) = \frac{1}{\sqrt{(2\pi)^n}} \exp\left( -\frac{z^T z}{2} \right),$$
which is the density of a standard Gaussian vector. From the definition of $Z$ we get $X = \sqrt{Q}\, Z + \mu$, so the first claim is proved.

The proof of the second claim is a particular case of the next exercise, which we leave to the reader. $\square$

Exercise 6. Let $X = (X_1, \ldots, X_n)$ be a Gaussian random vector, $B$ an $m \times n$ matrix, $c$ a vector of $\mathbb{R}^m$. Then
$$Y = BX + c$$
is a Gaussian random vector of dimension $m$. The relations between means and covariances are
$$\mu^Y = B \mu^X + c$$
and
$$Q_Y = B Q_X B^T.$$

Remark 8. We see from the exercise that we may start with a non-degenerate vector $X$ and get a degenerate one $Y$, if $B$ is not a bijection. This always happens when $m > n$.

Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. This fundamental fact will be used below when we study stochastic processes.

Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have prescribed the mean $\mu$ and the covariance $Q$, $n$-dimensional, and want to generate a random sample $(x_1, \ldots, x_n)$ from such an $N(\mu, Q)$. Then we may generate $n$ independent samples $z_1, \ldots, z_n$ from the standard one-dimensional Gaussian law and compute
$$\sqrt{Q}\, z + \mu$$
where $z = (z_1, \ldots, z_n)$. In order to have the entries of the matrix $\sqrt{Q}$, if the software does not provide them (some software does), we may use the formula $\sqrt{Q} = U^T \sqrt{Q_e}\, U$. The matrix $\sqrt{Q_e}$ is obvious. In order to get the matrix $U$, recall that its rows are the vectors $e_1, \ldots, e_n$ written in the original basis; and such vectors are an orthonormal basis of eigenvectors of $Q$. Thus one has to use at least a software package that computes the spectral decomposition of a matrix, to get $e_1, \ldots, e_n$.
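A minimal R sketch of this recipe (illustrative, not part of the original notes): build $\sqrt{Q}$ from the spectral decomposition returned by eigen() and transform standard Gaussian samples.

mu <- c(1, -2)
Q  <- matrix(c(2, 0.8, 0.8, 1), nrow = 2)       # prescribed covariance (symmetric, positive definite)
dec <- eigen(Q)
sqrtQ <- dec$vectors %*% diag(sqrt(dec$values)) %*% t(dec$vectors)   # so that sqrtQ %*% sqrtQ = Q
N <- 10000
Z <- matrix(rnorm(2 * N), nrow = 2)             # each column is a standard Gaussian vector z
X <- sqrtQ %*% Z + mu                           # each column is a sample from N(mu, Q)
rowMeans(X)                                     # close to mu
cov(t(X))                                       # close to Q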


CHAPTER 2

Stochastic processes. Generalities

1. Discrete time stochastic process

We call discrete time stochastic process any sequence $X_0, X_1, X_2, \ldots, X_n, \ldots$ of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, taking values in $\mathbb{R}$. This definition is not so rigid with respect to small details: the same name is given to sequences $X_1, X_2, \ldots, X_n, \ldots$, or to the case when the r.v. $X_n$ take values in a space different from $\mathbb{R}$. We shall also describe below the case when the time index takes negative values.

The main objects attached to a r.v. are its law, its first and second moments (and possibly higher order moments and the characteristic or generating function, and the distribution function). We do the same for a process $(X_n)_{n \geq 0}$: the probability density of the r.v. $X_n$, when it exists, will be denoted by $f_n(x)$, the mean by $\mu_n$, the standard deviation by $\sigma_n$. Often, we shall write $t$ in place of $n$, but nevertheless here $t$ will always be a non-negative integer. So, our first concepts are:

i) the mean function and the variance function:
$$\mu_t = E[X_t], \qquad \sigma_t^2 = \mathrm{Var}[X_t], \qquad t = 0, 1, 2, \ldots$$

In addition, the time-correlation is very important. We introduce three functions:

ii) the autocovariance function $C(t,s)$, $t, s = 0, 1, 2, \ldots$:
$$C(t,s) = E[(X_t - \mu_t)(X_s - \mu_s)]$$
and the function
$$R(t,s) = E[X_t X_s]$$
(the name will be discussed below). They are symmetric ($R(t,s) = R(s,t)$, and the same for $C(t,s)$), so it is sufficient to know them for $t \geq s$. We have
$$C(t,s) = R(t,s) - \mu_t \mu_s, \qquad C(t,t) = \sigma_t^2.$$
In particular, when $\mu_t \equiv 0$ (which is often the case), $C(t,s) = R(t,s)$. Most of the importance will be given to $\mu_t$ and $R(t,s)$. In addition, let us introduce:

iii) the autocorrelation function
$$\rho(t,s) = \frac{C(t,s)}{\sigma_t \sigma_s}.$$
We have
$$\rho(t,t) = 1, \qquad |\rho(t,s)| \leq 1.$$
The functions $C(t,s)$, $R(t,s)$, $\rho(t,s)$ are used to detect repetitions in the process, self-similarities under time shift. For instance, if $(X_n)_{n \geq 0}$ is roughly periodic of period $P$, $\rho(t+P, t)$ will be significantly higher than the other values of $\rho(t,s)$ (except $\rho(t,t)$, which is always equal to 1). Also a trend is a form of repetition, self-similarity under time shift, and indeed when there is a trend all values of $\rho(t,s)$ are quite high, compared to the cases without trend. See the numerical example below.

Other objects (when defined) related to the time structure are:

iv) the joint probability density
$$f_{t_1, \ldots, t_n}(x_1, \ldots, x_n), \qquad t_n \geq \ldots \geq t_1,$$
of the vector $(X_{t_1}, \ldots, X_{t_n})$ and

v) the conditional density
$$f_{t|s}(x|y) = \frac{f_{t,s}(x,y)}{f_s(y)}, \qquad t > s.$$

Now, a remark about the name of $R(t,s)$. In Statistics and Time Series Analysis, the name autocorrelation function is given to $\rho(t,s)$, as we said above. But in certain disciplines related to signal processing, $R(t,s)$ is called the autocorrelation function. There is no special reason except the fact that $R(t,s)$ is the fundamental quantity to be understood and investigated, the others ($C(t,s)$ and $\rho(t,s)$) being simple transformations of $R(t,s)$. Thus $R(t,s)$ is given the name which mostly reminds one of the concept of self-relation between values of the process at different times. In the sequel we shall use both languages and sometimes we shall call $\rho(t,s)$ the autocorrelation coefficient.

The last object we introduce is concerned with two processes simultaneously, $(X_n)_{n \geq 0}$ and $(Y_n)_{n \geq 0}$. It is called:

vi) the cross-correlation function
$$C_{X,Y}(t,s) = E[(X_t - E[X_t])(Y_s - E[Y_s])].$$
This function is a measure of the similarity between two processes, shifted in time. For instance, it can be used for the following purpose: one of the two processes, say $Y$, is known, has a known shape of interest for us, while the other process, $X$, is the process under investigation, and we would like to detect portions of $X$ which have a shape similar to $Y$. Hence we shift $X$ in all possible ways and compute the correlation with $Y$.

When more than one process is investigated, it may be better to write $R_X(t,s)$, $C_X(t,s)$ and so on for the quantities associated to the process $X$.

1.1. Example 1: white noise. The white noise with intensity $\sigma^2$ is the process $(X_n)_{n \geq 0}$ with the following properties:

i) $X_0, X_1, X_2, \ldots, X_n, \ldots$ are independent r.v.'s;
ii) $X_n \sim N(0, \sigma^2)$.

It is a very elementary process, with a trivial time-structure, but it will be used as a building block for other classes of processes, or as a comparison object to understand the features of more complex cases. The following picture has been obtained with the R software by the commands x<-rnorm(1000); ts.plot(x).


Let us compute all its relevant quantities (the check is left as an exercise):
$$\mu_t = 0, \qquad \sigma_t^2 = \sigma^2,$$
$$R(t,s) = C(t,s) = \sigma^2 \cdot \delta(t-s),$$
where the symbol $\delta(t-s)$ denotes 0 for $t \neq s$ and 1 for $t = s$,
$$\rho(t,s) = \delta(t-s),$$
$$f_{t_1, \ldots, t_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i) \quad \text{where } p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}},$$
$$f_{t|s}(x|y) = p(x).$$

1.2. Example 2: random walk. Let $(W_n)_{n \geq 0}$ be a white noise (or, more generally, a process with independent identically distributed $W_0, W_1, W_2, \ldots$). Set
$$X_0 = 0,$$
$$X_{n+1} = X_n + W_n, \qquad n \geq 0.$$
This is a random walk. White noise has been used as a building block: $(X_n)_{n \geq 0}$ is the solution of a recursive linear equation, driven by white noise (we shall see more general examples later on). The following picture has been obtained with the R software by the commands x<-rnorm(1000); y<-cumsum(x); ts.plot(y).

The random variables $X_n$ are not independent ($X_{n+1}$ obviously depends on $X_n$). One has
$$X_{n+1} = \sum_{i=0}^{n} W_i.$$
We have the following facts. We prove them by means of the iterative relation (this generalizes better to more complex discrete linear equations). First,
$$\mu_0 = 0, \qquad \mu_{n+1} = \mu_n, \quad n \geq 0,$$
hence $\mu_n = 0$ for every $n \geq 0$.

By induction, $X_n$ and $W_n$ are independent for every $n$, hence:


Exercise 7. Denote by $\sigma^2$ the intensity of the white noise; find a relation between $\sigma^2_{n+1}$ and $\sigma^2_n$ and prove that
$$\sigma_n = \sqrt{n}\, \sigma, \qquad n \geq 0.$$
An intuitive interpretation of the result of the exercise is that $X_n$ behaves like $\sqrt{n}$, in a very rough way.

As to the time-dependent structure, $C(t,s) = R(t,s)$, and:

Exercise 8. Prove that $R(m,n) = n\sigma^2$ for all $m \geq n$ (prove it for $m = n$, $m = n+1$, $m = n+2$ and extend). Then prove that
$$\rho(m,n) = \sqrt{\frac{n}{m}}.$$
The result of this exercise implies that
$$\rho(m,1) \to 0 \quad \text{as } m \to \infty.$$
We may interpret this result by saying that the random walk loses memory of the initial position.
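A brief simulation check of Exercise 8 (an R sketch, not part of the original notes): generate many independent random walk paths and compare the empirical correlation of $X_m$ and $X_n$ with $\sqrt{n/m}$.

set.seed(3)
n.paths <- 5000
W <- matrix(rnorm(n.paths * 100), nrow = n.paths)  # each row: 100 white noise increments
X <- t(apply(W, 1, cumsum))                        # each row: one random walk path X_1, ..., X_100
m <- 80; n <- 20
cor(X[, m], X[, n])                                # empirical correlation
sqrt(n / m)                                        # theoretical value rho(m, n) = 0.5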

2. Stationary processes

A process is called wide-sense stationary if $\mu_t$ and $R(t+n, t)$ are independent of $t$. It follows that also $\sigma_t$, $C(t+n, t)$ and $\rho(t+n, t)$ are independent of $t$. Thus we speak of:

i) the mean $\mu$;
ii) the standard deviation $\sigma$;
iii) the covariance function $C(n) := C(n, 0)$;
iv) the autocorrelation function (in the improper sense described above)
$$R(n) := R(n, 0);$$
v) the autocorrelation coefficient (or also autocorrelation function, in the language of Statistics)
$$\rho(n) := \rho(n, 0).$$


A process is called strongly stationary if the law of the generic vector $(X_{n_1+t}, \ldots, X_{n_k+t})$ is independent of $t$. This implies wide-sense stationarity. The converse is not true in general, but it is true for Gaussian processes (see below).

2.1. Example: white noise. We have
$$R(t,s) = \sigma^2 \cdot \delta(t-s),$$
hence
$$R(n) = \sigma^2 \cdot \delta(n).$$

2.2. Example: linear equation with damping. Consider the recurrence relation
$$X_{n+1} = \alpha X_n + W_n, \qquad n \geq 0,$$
where $(W_n)_{n \geq 0}$ is a white noise with intensity $\sigma^2$ and
$$\alpha \in (-1, 1).$$
The following picture has been obtained with the R software by the commands ($\alpha = 0.9$, $X_0 = 0$):

w <- rnorm(1000)
x <- rnorm(1000)
x[1] <- 0
for (i in 1:999) {
  x[i+1] <- 0.9*x[i] + w[i]
}
ts.plot(x)

It has some features similar to white noise, but it is less random, more persistent in the direction where it moves.

Let $X_0$ be a r.v. independent of the white noise, with zero average and variance $\widetilde{\sigma}^2$. Let us show that $(X_n)_{n \geq 0}$ is stationary (in the wide sense) if $\widetilde{\sigma}^2$ is properly chosen with respect to $\sigma^2$.


First we have
$$\mu_0 = 0, \qquad \mu_{n+1} = \alpha \mu_n, \quad n \geq 0,$$
hence $\mu_n = 0$ for every $n \geq 0$. The mean function is constant.

As a preliminary computation, let us impose that the variance function is constant. By induction, $X_n$ and $W_n$ are independent for every $n$, hence
$$\sigma^2_{n+1} = \alpha^2 \sigma^2_n + \sigma^2, \qquad n \geq 0.$$
If we want $\sigma^2_{n+1} = \sigma^2_n$ for every $n \geq 0$, we need
$$\sigma^2_n = \alpha^2 \sigma^2_n + \sigma^2, \qquad n \geq 0,$$
namely
$$\sigma^2_n = \frac{\sigma^2}{1 - \alpha^2}, \qquad n \geq 0.$$
In particular, this implies the relation
$$\widetilde{\sigma}^2 = \frac{\sigma^2}{1 - \alpha^2}.$$
It is here that we first see the importance of the condition $|\alpha| < 1$. If we assume this condition on the law of $X_0$, then we find
$$\sigma^2_1 = \alpha^2 \frac{\sigma^2}{1 - \alpha^2} + \sigma^2 = \frac{\sigma^2}{1 - \alpha^2} = \sigma^2_0,$$
and so on, $\sigma^2_{n+1} = \sigma^2_n$ for every $n \geq 0$. Thus the variance function is constant.

Finally, we have to show that $R(t+n, t)$ is independent of $t$. We have
$$R(t+1, t) = E[(\alpha X_t + W_t) X_t] = \alpha \sigma^2_t = \frac{\alpha \sigma^2}{1 - \alpha^2},$$
which is independent of $t$;
$$R(t+2, t) = E[(\alpha X_{t+1} + W_{t+1}) X_t] = \alpha R(t+1, t) = \frac{\alpha^2 \sigma^2}{1 - \alpha^2}$$
and so on,
$$R(t+n, t) = E[(\alpha X_{t+n-1} + W_{t+n-1}) X_t] = \alpha R(t+n-1, t) = \ldots = \alpha^n R(t,t) = \frac{\alpha^n \sigma^2}{1 - \alpha^2},$$
which is independent of $t$. The process is stationary. We have
$$R(n) = \frac{\alpha^n \sigma^2}{1 - \alpha^2}.$$
It also follows that
$$\rho(n) = \alpha^n.$$
The autocorrelation coefficient (as well as the autocovariance function) decays exponentially in time.
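A quick empirical confirmation (an R sketch, not part of the original notes): simulate the damped linear equation with $\alpha = 0.9$, starting from the stationary law, and compare the sample autocorrelation with the theoretical $\rho(n) = \alpha^n$.

set.seed(4)
alpha <- 0.9
w <- rnorm(10000)
x <- numeric(10000)
x[1] <- rnorm(1, sd = sqrt(1/(1 - alpha^2)))     # X_0 drawn from the stationary law
for (i in 1:9999) x[i+1] <- alpha*x[i] + w[i]
emp <- acf(x, lag.max = 20, plot = FALSE)$acf[,,1]
round(cbind(empirical = emp[1:10], theoretical = alpha^(0:9)), 3)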


2.3. Processes defined also for negative times. We may extend the previous definitions a little bit and call discrete time stochastic process also the two-sided sequences $(X_n)_{n \in \mathbb{Z}}$ of random variables. Such processes are thus defined also for negative times. The idea is that the physical process they represent started in the far past and continues in the future.

This notion is particularly natural in the case of stationary processes. The function $R(n)$ (similarly for $C(n)$ and $\rho(n)$) is thus defined also for negative $n$:
$$R(n) = E[X_n X_0], \qquad n \in \mathbb{Z}.$$
By stationarity,
$$R(-n) = R(n)$$
because $R(-n) = E[X_{-n} X_0] = E[X_{-n+n} X_{0+n}] = E[X_0 X_n] = R(n)$. Therefore we see that this extension does not contain much new information; however it is useful, or at least it simplifies some computations.

3. Time series and empirical quantities

A time series is a sequence of real numbers, $x_1, \ldots, x_n$. Empirical samples also have the same form. The name time series is appropriate when the index $i$ of $x_i$ has the meaning of time.

A finite realization of a stochastic process is a time series. Ideally, when we have an experimental time series, we think that there is a stochastic process behind it. Thus we try to apply the theory of stochastic processes.

Recall from elementary statistics that empirical estimates of mean values of a single r.v. $X$ are computed from an empirical sample $x_1, \ldots, x_n$ of that r.v.; the higher $n$ is, the better the estimate. A single sample $x_1$ is not sufficient to estimate moments of $X$.

Similarly, we may hope to compute empirical estimates of $R(t,s)$ etc. from time series. But here, when the stochastic process has special properties (stationary and ergodic; see below the concept of ergodicity), one sample is sufficient! By "one sample" we mean one time series (which is one realization of the process, just as the single $x_1$ is one realization of the r.v. $X$). Again, the higher $n$ is, the better the estimate, but here $n$ refers to the length of the time series.

Consider a time series $x_1, \ldots, x_n$. In the sequel, $t$ and $n_t$ are such that
$$t + n_t = n.$$
Let us define
$$\overline{x}_t = \frac{1}{n_t} \sum_{i=1}^{n_t} x_{i+t}, \qquad \widehat{\sigma}^2_t = \frac{1}{n_t} \sum_{i=1}^{n_t} (x_{i+t} - \overline{x}_t)^2,$$
$$\widehat{R}(t) = \frac{1}{n_t} \sum_{i=1}^{n_t} x_i\, x_{i+t},$$
$$\widehat{C}(t) = \frac{1}{n_t} \sum_{i=1}^{n_t} (x_i - \overline{x}_0)(x_{i+t} - \overline{x}_t),$$
$$\widehat{\rho}(t) = \frac{\widehat{C}(t)}{\widehat{\sigma}_0\, \widehat{\sigma}_t} = \frac{\sum_{i=1}^{n_t} (x_i - \overline{x}_0)(x_{i+t} - \overline{x}_t)}{\sqrt{\sum_{i=1}^{n_t} (x_i - \overline{x}_0)^2\, \sum_{i=1}^{n_t} (x_{i+t} - \overline{x}_t)^2}}.$$


These quantities are taken as approximations of
$$\mu_t, \quad \sigma^2_t, \quad R(t,0), \quad C(t,0), \quad \rho(t,0),$$
respectively. In the case of stationary processes, they are approximations of
$$\mu, \quad \sigma^2, \quad R(t), \quad C(t), \quad \rho(t).$$
In the section on ergodic theorems we shall see rigorous relations between these empirical and theoretical functions.
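A small R sketch of these empirical quantities (illustrative, not part of the original notes; the helper emp.quantities below is ours), computing $\widehat{R}(t)$, $\widehat{C}(t)$ and $\widehat{\rho}(t)$ for a given lag $t$ as defined above:

emp.quantities <- function(x, t) {
  n  <- length(x)
  nt <- n - t
  xt.bar <- mean(x[(1+t):n])                     # bar{x}_t
  x0.bar <- mean(x)                              # bar{x}_0
  R.hat  <- mean(x[1:nt] * x[(1+t):n])
  C.hat  <- mean((x[1:nt] - x0.bar) * (x[(1+t):n] - xt.bar))
  sig0   <- sqrt(mean((x - x0.bar)^2))           # hat{sigma}_0
  sigt   <- sqrt(mean((x[(1+t):n] - xt.bar)^2))  # hat{sigma}_t
  c(R = R.hat, C = C.hat, rho = C.hat / (sig0 * sigt))
}
x <- as.numeric(arima.sim(list(ar = 0.9), n = 2000))   # a test series
emp.quantities(x, t = 5)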

The empirical correlation coefficient
$$\widehat{\rho}_{X,Y} = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2\, \sum_{i=1}^{n} (y_i - \overline{y})^2}}$$
between two sequences $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ is a measure of their linear similarity. If there are coefficients $a$ and $b$ such that the residuals
$$\varepsilon_i = y_i - (a x_i + b)$$
are small, then $|\widehat{\rho}_{X,Y}|$ is close to 1; precisely, $\widehat{\rho}_{X,Y}$ is close to 1 if $a > 0$, close to $-1$ if $a < 0$. A value of $\widehat{\rho}_{X,Y}$ close to 0 means that no such linear relation is really good (in the sense of small residuals). Precisely, the smallness of the residuals must be understood relative to the empirical variance $\widehat{\sigma}^2_Y$ of $y_1, \ldots, y_n$: one can prove that
$$\widehat{\rho}^2_{X,Y} = 1 - \frac{\widehat{\sigma}^2_\varepsilon}{\widehat{\sigma}^2_Y}$$
(the so-called explained variance, the proportion of variance which has been explained by the linear model). After these remarks, the intuitive meaning of $\widehat{R}(t)$, $\widehat{C}(t)$ and $\widehat{\rho}(t)$ should be clear: they measure the linear similarity between the time series and its $t$-translation. It is useful to detect repetitions, periodicity, trend.

Example 1. Consider the following time series, taken from the EUROSTAT database. It collects export data concerning motor vehicle accessories, from January 1995 to December 2008.

Its autocorrelation function $\widehat{\rho}(t)$ is given by the next figure.


We see high values (the values of $\widehat{\rho}(t)$ are always smaller than 1 in absolute value) for all time lags $t$. The reason is the trend of the original time series (highly non-stationary).

Example 2. If we consider only the last few years of the same time series, precisely January 2005 - December 2008, the data are much more stationary, the trend is less strong. The autocorrelation function $\widehat{\rho}(t)$, shown in the next figure, now displays a moderate annual periodicity.

4. Gaussian processes

If the generic vector $(X_{t_1}, \ldots, X_{t_n})$ is jointly Gaussian, we say that the process is Gaussian. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. Hence the laws of the marginals of a Gaussian process are determined by the mean function $\mu_t$ and the autocorrelation function $R(t,s)$.

Proposition 5. For Gaussian processes, stationarity in the wide and in the strong sense are equivalent.

Proof. Given a Gaussian process $(X_n)_{n \in \mathbb{N}}$, the generic vector $(X_{t_1+s}, \ldots, X_{t_n+s})$ is Gaussian, hence its law is determined by the mean vector of components
$$E[X_{t_i+s}] = \mu_{t_i+s}$$
and the covariance matrix of components
$$\mathrm{Cov}\big(X_{t_i+s}, X_{t_j+s}\big) = R(t_i+s, t_j+s) - \mu_{t_i+s}\, \mu_{t_j+s}.$$
If the process is stationary in the wide sense, then $\mu_{t_i+s} = \mu$ and
$$R(t_i+s, t_j+s) - \mu_{t_i+s}\, \mu_{t_j+s} = R(t_i - t_j) - \mu^2$$
do not depend on $s$. Then the law of $(X_{t_1+s}, \ldots, X_{t_n+s})$ does not depend on $s$. This means that the process is stationary in the strict sense. The converse is a general fact. The proof is complete. $\square$

Most of the models in these notes are obtained by linear transformations of white noise. White noise is a Gaussian process. Linear transformations preserve Gaussianity. Hence the resulting processes are Gaussian. Since we deal very often with processes that are stationary in the wide sense, being Gaussian they are also strictly stationary.

5. Discrete time Fourier transform

Given a sequence $(x_n)_{n \in \mathbb{Z}}$ of real or complex numbers such that $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$, we denote by $\widehat{x}(\omega)$ or by $\mathcal{F}[x](\omega)$ the discrete time Fourier transform (DTFT) defined as
$$\widehat{x}(\omega) = \mathcal{F}[x](\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} x_n, \qquad \omega \in [0, 2\pi].$$
The function can be considered for all $\omega \in \mathbb{R}$, but it is $2\pi$-periodic. Sometimes the factor $\frac{1}{\sqrt{2\pi}}$ is not included in the definition; sometimes, it is preferable to use the variant
$$\widehat{x}(f) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-2\pi i f n} x_n, \qquad f \in [0, 1].$$
We make the choice above, independently of the fact that in certain applications it is customary or convenient to make other choices. The factor $\frac{1}{\sqrt{2\pi}}$ is included for symmetry with the inverse transform or the Plancherel formula (without $\frac{1}{\sqrt{2\pi}}$, a factor $\frac{1}{2\pi}$ appears in one of them).

The $L^2$-theory of Fourier series guarantees that the series $\sum_{n \in \mathbb{Z}} e^{-i\omega n} x_n$ converges in mean square with respect to $\omega$, namely, there exists a square integrable function $\widehat{x}(\omega)$ such that
$$\lim_{N \to \infty} \int_0^{2\pi} \Bigg| \sum_{|n| \leq N} e^{-i\omega n} x_n - \widehat{x}(\omega) \Bigg|^2 d\omega = 0.$$
The sequence $x_n$ can be reconstructed from its Fourier transform by means of the inverse Fourier transform
$$x_n = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} e^{i\omega n}\, \widehat{x}(\omega)\, d\omega.$$

Among other properties, let us mention the Plancherel formula
$$\sum_{n \in \mathbb{Z}} |x_n|^2 = \int_0^{2\pi} |\widehat{x}(\omega)|^2\, d\omega$$
and the fact that under the Fourier transform the convolution corresponds to the product:
$$\mathcal{F}\Bigg[\sum_{n \in \mathbb{Z}} f(\cdot - n)\, g(n)\Bigg](\omega) = \widehat{f}(\omega)\, \widehat{g}(\omega).$$

When
$$\sum_{n \in \mathbb{Z}} |x_n| < \infty,$$
the series $\sum_{n \in \mathbb{Z}} e^{-i\omega n} x_n$ is absolutely convergent, uniformly in $\omega \in [0, 2\pi]$, simply because
$$\sum_{n \in \mathbb{Z}} \sup_{\omega \in [0, 2\pi]} \big| e^{-i\omega n} x_n \big| = \sum_{n \in \mathbb{Z}} \sup_{\omega \in [0, 2\pi]} \big| e^{-i\omega n} \big|\, |x_n| = \sum_{n \in \mathbb{Z}} |x_n| < \infty.$$
In this case, we may also say that $\widehat{x}(\omega)$ is a bounded continuous function, not only square integrable. Notice that the assumption $\sum_{n \in \mathbb{Z}} |x_n| < \infty$ implies $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$, because $\sum_{n \in \mathbb{Z}} |x_n|^2 \leq \sup_{n \in \mathbb{Z}} |x_n| \sum_{n \in \mathbb{Z}} |x_n|$ and $\sup_{n \in \mathbb{Z}} |x_n|$ is bounded when $\sum_{n \in \mathbb{Z}} |x_n|$ converges.

One can define the DTFT also for sequences which do not satisfy the assumption $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$, in special cases. Consider for instance the sequence
$$x_n = a \sin(\omega_1 n).$$
Compute the truncation
$$\widehat{x}_{2N}(\omega) = \frac{1}{\sqrt{2\pi}} \sum_{|n| \leq N} e^{-i\omega n}\, a \sin(\omega_1 n).$$
Recall that
$$\sin t = \frac{e^{it} - e^{-it}}{2i}.$$
Hence $\sin(\omega_1 n) = \frac{e^{i\omega_1 n} - e^{-i\omega_1 n}}{2i}$ and
$$\sum_{|n| \leq N} e^{-i\omega n}\, a \sin(\omega_1 n) = \frac{a}{2i} \sum_{|n| \leq N} e^{-i(\omega - \omega_1) n} - \frac{a}{2i} \sum_{|n| \leq N} e^{-i(\omega + \omega_1) n}.$$

The next lemma makes use of the concept of generalized function, or distribution, which is outside the scope of these notes. We still give the result, to be understood in some intuitive sense. We use the generalized function $\delta(t)$, called the Dirac delta, which is characterized by the property
$$\int_{-\infty}^{\infty} \delta(t - t_0) f(t)\, dt = f(t_0) \tag{5.1}$$
for all continuous compactly supported functions $f$. No usual function has this property. A way to get intuition is the following one. Consider a function $\delta_n(t)$ which is equal to zero for $t$ outside $\left(-\frac{1}{2n}, \frac{1}{2n}\right)$, an interval of length $\frac{1}{n}$ around the origin, and equal to $n$ in $\left(-\frac{1}{2n}, \frac{1}{2n}\right)$. Hence $\delta_n(t - t_0)$ is equal to zero for $t$ outside $\left(t_0 - \frac{1}{2n}, t_0 + \frac{1}{2n}\right)$ and equal to $n$ in $\left(t_0 - \frac{1}{2n}, t_0 + \frac{1}{2n}\right)$. We have
$$\int_{-\infty}^{\infty} \delta_n(t)\, dt = 1.$$
Now,
$$\int_{-\infty}^{\infty} \delta_n(t - t_0) f(t)\, dt = n \int_{t_0 - \frac{1}{2n}}^{t_0 + \frac{1}{2n}} f(t)\, dt,$$
which is the average of $f$ around $t_0$. As $n \to \infty$, this average converges to $f(t_0)$ when $f$ is continuous. Namely, we have
$$\lim_{n \to \infty} \int_{-\infty}^{\infty} \delta_n(t - t_0) f(t)\, dt = f(t_0),$$
which is the analog of identity (5.1), but expressed by means of traditional concepts. In a sense, thus, the generalized function $\delta(t)$ is the limit of the traditional functions $\delta_n(t)$. But we see that $\delta_n(t)$ converges to zero for all $t \neq 0$, and to $\infty$ for $t = 0$. So, in a sense, $\delta(t)$ is equal to zero for $t \neq 0$, and to $\infty$ for $t = 0$; but this is very poor information, because it does not allow us to deduce identity (5.1) (the way $\delta_n(t)$ goes to infinity is essential, not only the fact that $\delta(t)$ is $\infty$ for $t = 0$).


Lemma 2. Denote by $\delta(t)$ the generalized function such that
$$\int_{-\infty}^{\infty} \delta(t - t_0) f(t)\, dt = f(t_0)$$
for all continuous compactly supported functions $f$ (it is called the Dirac delta distribution). Then
$$\lim_{N \to \infty} \sum_{|n| \leq N} e^{-itn} = 2\pi\, \delta(t).$$

From this lemma it follows that
$$\lim_{N \to \infty} \sum_{|n| \leq N} e^{-i\omega n}\, a \sin(\omega_1 n) = \frac{a\pi}{i}\, \delta(\omega - \omega_1) - \frac{a\pi}{i}\, \delta(\omega + \omega_1).$$
In other words:

Corollary 1. The sequence
$$x_n = a \sin(\omega_1 n)$$
has a generalized DTFT
$$\widehat{x}(\omega) = \lim_{N \to \infty} \widehat{x}_{2N}(\omega) = \frac{a\sqrt{\pi}}{\sqrt{2}\, i} \big( \delta(\omega - \omega_1) - \delta(\omega + \omega_1) \big).$$

This is only one example of the possibility of extending the definition and meaning of the DTFT outside the assumption $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$. It is also very interesting for the interpretation of the concept of DTFT. If the signal $x_n$ has a periodic component (notice that the DTFT is linear) with angular frequency $\omega_1$, then its DTFT has two symmetric peaks (Dirac delta components) at $\pm\omega_1$. This way, the DTFT reveals the periodic components of the signal.

Exercise 9. Prove that the sequence
$$x_n = a \cos(\omega_1 n)$$
has a generalized DTFT
$$\widehat{x}(\omega) = \lim_{N \to \infty} \widehat{x}_{2N}(\omega) = \frac{a\sqrt{\pi}}{\sqrt{2}} \big( \delta(\omega - \omega_1) + \delta(\omega + \omega_1) \big).$$

6. Power spectral density

Given a stationary process $(X_n)_{n \in \mathbb{Z}}$ with correlation function $R(n) = E[X_n X_0]$, $n \in \mathbb{Z}$, we call power spectral density (PSD) the function
$$S(\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} R(n), \qquad \omega \in [0, 2\pi].$$
Alternatively, one can use the expression
$$S(f) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-2\pi i f n} R(n), \qquad f \in [0, 1],$$
which produces easier visualizations because we catch more easily the fractions of the interval $[0, 1]$.


Remark 11. In principle, to be defined, this series requires $\sum_{n \in \mathbb{Z}} |R(n)| < \infty$ or at least $\sum_{n \in \mathbb{Z}} |R(n)|^2 < \infty$. In practice, on one side the convergence may happen also in unexpected cases due to cancellations; on the other side it may be acceptable to use a finite-time variant, something like $\sum_{|n| \leq N} e^{-i\omega n} R(n)$, for practical purposes or from the computational viewpoint.

A priori, one may think that $S(f)$ may not be real valued. However, the function $R(n)$ is non-negative definite (this means $\sum_{i,j=1}^{n} R(t_i - t_j)\, a_i a_j \geq 0$ for all $t_1, \ldots, t_n$ and $a_1, \ldots, a_n$), and a theorem states that the Fourier transform of a non-negative definite function is a non-negative function. Thus, at the end, it turns out that $S(f)$ is real and also non-negative. We do not give the details of this fact here because it will be a consequence of the fundamental theorem below.

6.1. Example: white noise. We have
$$R(n) = \sigma^2 \cdot \delta(n),$$
hence
$$S(\omega) = \frac{\sigma^2}{\sqrt{2\pi}}, \qquad \omega \in \mathbb{R}.$$
The spectral density is constant. This is the origin of the name, white noise.

6.2. Example: perturbed periodic time series. This example is numeric only. Produce with the R software the following time series:

t <- 1:100
y <- sin(t/3) + 0.3*rnorm(100)
ts.plot(y)

The empirical autocorrelation function, obtained by acf(y), and the power spectral density, suitably smoothed, obtained by spectrum(y, span=c(2,3)), are shown in the following figures.


6.3. Pink, Brown, Blue, Violet noise. In certain applications one meets PSDs of special type which have been given names, similarly to white noise. Recall that white noise has a constant PSD. Pink noise has a PSD of the form
$$S(f) \sim \frac{1}{f}.$$
Brown noise:
$$S(f) \sim \frac{1}{f^2}.$$
Blue noise:
$$S(f) \sim f.$$
Violet noise:
$$S(f) \sim f^2.$$

7. Fundamental theorem on PSD

The following theorem is often stated without assumptions in the applied literature. One of the reasons is that it can be proved at various levels of generality, with different meanings of the limit operation (it is a limit of functions). We shall give a rigorous statement under a very precise assumption on the autocorrelation function $R(n)$; the convergence we prove is rather strong. The assumption is a little bit strange, but satisfied in all our examples. The assumption is that there exists a sequence $(\varepsilon_n)_{n \in \mathbb{N}}$ of positive numbers such that
$$\lim_{n \to \infty} \varepsilon_n = 0, \qquad \sum_{n \in \mathbb{N}} \frac{|R(n)|}{\varepsilon_n} < \infty. \tag{7.1}$$
This is just a little bit more restrictive than the condition $\sum_{n \in \mathbb{N}} |R(n)| < \infty$, which is natural to impose if we want uniform convergence of $\frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} R(n)$ to $S(\omega)$. Any example of $R(n)$ satisfying $\sum_{n \in \mathbb{N}} |R(n)| < \infty$ that the reader may have in mind presumably satisfies assumption (7.1) in an easy way.


Theorem 1 (Wiener-Khinchin). If $(X(n))_{n \in \mathbb{Z}}$ is a wide-sense stationary process satisfying assumption (7.1), then
$$S(\omega) = \lim_{N \to \infty} \frac{1}{2N+1}\, E\Big[ \big| \widehat{X}_{2N}(\omega) \big|^2 \Big].$$
The limit is uniform in $\omega \in [0, 2\pi]$. Here $X_{2N}$ is the truncated process $X \cdot 1_{[-N,N]}$. In particular, it follows that $S(\omega)$ is real and non-negative.

Proof. Step 1. Let us prove the following main identity:
$$S(\omega) = \frac{1}{2N+1}\, E\Big[ \big| \widehat{X}_{2N}(\omega) \big|^2 \Big] + r_N(\omega), \tag{7.2}$$
where the remainder $r_N$ is given by
$$r_N(\omega) = \frac{1}{2N+1}\, \mathcal{F}\Bigg[ \sum_{n \in \Theta(N, \cdot)} E[X(\cdot + n)\, X(n)] \Bigg](\omega)$$
with
$$\Theta(N, t) = \big[ -N, N^-_t \big) \cup \big( N^+_t, N \big],$$
$$N^+_t = \begin{cases} N & \text{if } t \leq 0 \\ N - t & \text{if } 0 < t \leq N \\ 0 & \text{if } t > N \end{cases}
\qquad
N^-_t = \begin{cases} -N & \text{if } t \geq 0 \\ -N - t & \text{if } -N \leq t < 0 \\ 0 & \text{if } t < -N. \end{cases}$$

Since $R(t) = E[X(t+n)\, X(n)]$ for all $n$, we obviously have, for every $N > 0$,
$$R(t) = \frac{1}{2N+1} \sum_{|n| \leq N} E[X(t+n)\, X(n)].$$
Thus
$$S(\omega) = \widehat{R}(\omega) = \frac{1}{2N+1}\, \mathcal{F}\Bigg[ \sum_{|n| \leq N} E[X(\cdot + n)\, X(n)] \Bigg](\omega). \tag{7.3}$$

Then recall that
$$\mathcal{F}\Bigg[ \sum_{n \in \mathbb{Z}} f(\cdot - n)\, g(n) \Bigg](\omega) = \widehat{f}(\omega)\, \widehat{g}(\omega),$$
hence
$$\mathcal{F}\Bigg[ \sum_{n \in \mathbb{Z}} f(\cdot + n)\, g(n) \Bigg](\omega) = \mathcal{F}\Bigg[ \sum_{n \in \mathbb{Z}} f(\cdot - n)\, g(-n) \Bigg](\omega) = \widehat{f}(\omega)\, \widehat{g}(-\omega),$$
because
$$\mathcal{F}[g(-\cdot)](\omega) = \widehat{g}(-\omega).$$
Moreover, if the input function $g$ is real, then $\widehat{g}(-\omega) = \widehat{g}^*(\omega)$, so we get
$$\mathcal{F}\Bigg[ \sum_{n \in \mathbb{Z}} f(\cdot + n)\, g(n) \Bigg](\omega) = \widehat{f}(\omega)\, \widehat{g}^*(\omega).$$

If $f(n) = g(n) = X(n)\, 1_{[-N,N]}(n) = X_{2N}(n)$, then, for $t \geq 0$,
$$\sum_{n \in \mathbb{Z}} f(t+n)\, g(n) = \sum_{n=-N}^{N - (t \wedge N)} X(t+n)\, X(n).$$
For $t < 0$ we have
$$\sum_{n \in \mathbb{Z}} f(t+n)\, g(n) = \sum_{n=-N + ((-t) \wedge N)}^{N} X(t+n)\, X(n).$$
In general,
$$\sum_{n \in \mathbb{Z}} f(t+n)\, g(n) = \sum_{n=N^-_t}^{N^+_t} X(t+n)\, X(n).$$
Therefore
$$\mathcal{F}\Bigg[ \sum_{n=N^-_\cdot}^{N^+_\cdot} X(\cdot + n)\, X(n) \Bigg](\omega) = \widehat{X}_{2N}(\omega)\, \widehat{X}^*_{2N}(\omega) = \big| \widehat{X}_{2N}(\omega) \big|^2,$$
and thus
$$\mathcal{F}\Bigg[ \sum_{n=N^-_\cdot}^{N^+_\cdot} E[X(\cdot + n)\, X(n)] \Bigg](\omega) = E\Big[ \big| \widehat{X}_{2N}(\omega) \big|^2 \Big].$$
From (7.3), we now get (7.2).

Step 2. The proof is complete if we show that $\lim_{N \to \infty} r_N(\omega) = 0$ uniformly in $\omega \in [0, 2\pi]$. But
$$\sum_{n \in \Theta(N,t)} E[X(t+n)\, X(n)] = \sum_{n \in \Theta(N,t)} R(t) = \varepsilon_t\, \frac{R(t)}{\varepsilon_t}\, |\Theta(N,t)|,$$
where $|\Theta(N,t)|$ denotes the cardinality of $\Theta(N,t)$. We have
$$|\Theta(N,t)| \leq 2 (N \wedge t),$$
hence
$$\frac{1}{2N+1} \Bigg| \sum_{n \in \Theta(N,t)} E[X(t+n)\, X(n)] \Bigg| \leq \frac{|R(t)|}{\varepsilon_t} \cdot \frac{2 (N \wedge t)\, \varepsilon_t}{2N+1}.$$
Given $\delta > 0$, let $t_0$ be such that $\varepsilon_t \leq \delta$ for all $t \geq t_0$. Then take $N_0 \geq t_0$ such that $\frac{2 t_0}{2N+1} \leq \delta$ for all $N \geq N_0$. It is not restrictive to assume $\varepsilon_t \leq 1$ for all $t$. Then, for $N \geq N_0$, if $t \leq t_0$ then
$$\frac{2 (N \wedge t)\, \varepsilon_t}{2N+1} \leq \frac{2 t_0\, \varepsilon_t}{2N+1} \leq \frac{2 t_0}{2N+1} \leq \delta,$$
and if $t \geq t_0$ then
$$\frac{2 (N \wedge t)\, \varepsilon_t}{2N+1} \leq \frac{2 (N \wedge t)}{2N+1}\, \delta \leq \delta.$$

We have proved the following statement: for all $\delta > 0$ there exists $N_0$ such that
$$\frac{2 (N \wedge t)\, \varepsilon_t}{2N+1} \leq \delta$$
for all $N \geq N_0$, uniformly in $t$. Then also
$$\frac{1}{2N+1} \Bigg| \sum_{n \in \Theta(N,t)} E[X(t+n)\, X(n)] \Bigg| \leq \frac{|R(t)|}{\varepsilon_t}\, \delta$$
for all $N \geq N_0$, uniformly in $t$. Therefore
$$|r_N(\omega)| = \Bigg| \frac{1}{2N+1} \frac{1}{\sqrt{2\pi}} \sum_{t \in \mathbb{Z}} e^{-i\omega t} \Bigg[ \sum_{n \in \Theta(N,t)} E[X(t+n)\, X(n)] \Bigg] \Bigg|$$
$$\leq \frac{1}{2N+1} \frac{1}{\sqrt{2\pi}} \sum_{t \in \mathbb{Z}} \Bigg| \sum_{n \in \Theta(N,t)} E[X(t+n)\, X(n)] \Bigg| \leq \frac{1}{\sqrt{2\pi}} \sum_{t \in \mathbb{Z}} \frac{|R(t)|}{\varepsilon_t}\, \delta = \frac{C}{\sqrt{2\pi}}\, \delta,$$
where $C = \sum_{t \in \mathbb{Z}} \frac{|R(t)|}{\varepsilon_t} < \infty$. This is the definition of $\lim_{N \to \infty} r_N(\omega) = 0$ uniformly in $\omega \in [0, 2\pi]$. The proof is complete. $\square$

This theorem gives us the interpretation of the PSD. The Fourier transform $\widehat{X}_T(\omega)$ identifies the frequency structure of the signal. The square $\big| \widehat{X}_T(\omega) \big|^2$ drops the information about the phase and keeps the information about the amplitude, but in the sense of energy (a square). It gives us the energy spectrum, in a sense. So the PSD is the average amplitude of the oscillatory component at frequency $f = \frac{\omega}{2\pi}$.

Thus the PSD is a very useful tool if you want to identify oscillatory signals in your time series data and want to know their amplitude. By means of the PSD, one can get a "feel" of the data at an early stage of time series analysis. The PSD tells us at which frequency ranges variations are strong.
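A minimal R sketch related to the theorem (illustrative, not part of the original notes; constants coming from the $\frac{1}{\sqrt{2\pi}}$ normalization are ignored here): for the damped linear equation of Section 2.2, the periodogram $\frac{1}{2N+1}\big|\sum_n e^{-i\omega n} X(n)\big|^2$, averaged over many realizations, approaches $\sum_n e^{-i\omega n} R(n)$, i.e. the PSD up to the normalizing factor.

alpha <- 0.7
N <- 256; n.rep <- 500
omega <- 2*pi*(0:(2*N))/(2*N + 1)
pgram <- replicate(n.rep, {
  x <- as.numeric(arima.sim(list(ar = alpha), n = 2*N + 1))   # one realization X(-N), ..., X(N)
  Mod(fft(x))^2 / (2*N + 1)                                   # |sum_n e^{-i omega n} X(n)|^2 / (2N+1)
})
S.emp <- rowMeans(pgram)                                      # average over realizations
R <- function(n) alpha^abs(n) / (1 - alpha^2)                 # R(n) for this example, sigma = 1
S.th <- sapply(omega, function(w) Re(sum(exp(-1i*w*(-300:300)) * R(-300:300))))
plot(omega, S.emp, type = "l", xlab = "omega", ylab = "")
lines(omega, S.th, col = "red")                               # theoretical sum_n e^{-i omega n} R(n)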

Remark 12. A priori one could think that it would be more natural to compute the Fourier transform $\widehat{X}(\omega) = \sum_{n \in \mathbb{Z}} e^{-i\omega n} X_n$ without a cut-off of size $T$. But the process $(X_n)$ is stationary. Therefore, it does not satisfy the assumption $\sum_{n \in \mathbb{Z}} X_n^2 < \infty$ or similar ones which require a decay at infinity. Stationarity is in contradiction with a decay at infinity (it can be proved, but we leave it at the obvious intuitive level).

Remark 13. Under more assumptions (in particular a strong ergodicity assumption) it is possible to prove that
$$S(\omega) = \lim_{T \to \infty} \frac{1}{T} \big| \widehat{X}_T(\omega) \big|^2$$
without the expectation. Notice that $\frac{1}{T} \big| \widehat{X}_T(\omega) \big|^2$ is a random quantity, but the limit is deterministic.


8. Signal to noise ratio

Assume the process $(X_n)_{n \in \mathbb{Z}}$ we observe is the superposition of a white noise $(W_n)_{n \in \mathbb{Z}}$ and a signal $(f_n)_{n \in \mathbb{Z}}$, namely a process (maybe deterministic) which contains information and which we would like to detect in spite of the noise corruption. The final problem is noise filtering, namely the reconstruction of a signal $\big(\widetilde{f}_n\big)_{n \in \mathbb{Z}}$ as close as possible to $(f_n)_{n \in \mathbb{Z}}$ (the meaning of closeness may be different; for instance we could be interested only in distinguishing between two a priori known signals). Let us make only preliminary comments on the size of the signal inside the noise.

Assume
$$X_n = W_n + f_n$$
with $W$ and $f$ independent of each other and, for the sake of simplicity, assume $f$ stationary. Then
$$R_X(n) = E[X_n X_0] = E[W_n W_0] + E[W_n f_0] + E[f_n W_0] + E[f_n f_0] = \sigma^2 \delta(n) + R_f(n).$$

So
$$\rho_X(n) = \frac{R_X(n)}{R_X(0)} = \frac{\sigma^2 \delta(n) + R_f(n)}{\sigma^2 + R_f(0)} = \frac{1}{1 + \mathrm{SNR}}\, \delta(n) + \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}\, \rho_f(n),$$
where
$$\mathrm{SNR} := \frac{R_f(0)}{\sigma^2} = \frac{\sigma^2_f}{\sigma^2_W}$$
is the so-called signal-to-noise ratio. We see that we appreciate the shape of $\rho_f(n)$ in $\rho_X(n)$ only if the SNR is sufficiently large. [One should be more precise. Indeed, theoretically, since $\delta(n)$ is equal to zero for $n \neq 0$, we always see $\frac{\mathrm{SNR}}{1 + \mathrm{SNR}}\, \rho_f(n)$ with infinite precision. The problem is that the measured $\rho_W(n)$ is not $\delta(n)$ but something close to 1 at $n = 0$ and close to zero, but not equal to zero, for $n \neq 0$. However, the closeness to zero of $\rho_W(n)$ is not just measured by $\sigma^2$: it depends on the number of observed points, the whiteness of the noise, and so on, so we cannot write a simple formula.]

Second,
$$S_X(\omega) = \frac{\sigma^2}{\sqrt{2\pi}} + S_f(\omega),$$
where
$$S_f(\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} R_f(n) = \frac{R_f(0)}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} \rho_f(n).$$
Thus again we see
$$\frac{\sqrt{2\pi}\, S_X(\omega)}{\sigma^2} = 1 + \mathrm{SNR} \cdot \sum_{n \in \mathbb{Z}} e^{-i\omega n} \rho_f(n).$$
The contribution of the signal, $\sum_{n \in \mathbb{Z}} e^{-i\omega n} \rho_f(n)$, is visible only if the SNR is not too small. [Here also we could say that we can always reconstruct $\sum_{n \in \mathbb{Z}} e^{-i\omega n} \rho_f(n)$ exactly, just by taking $\frac{\sqrt{2\pi}\, S_X(\omega)}{\sigma^2} - 1$; however, the term $-1$ is only theoretical: in practice it is a moderately flat function, with fluctuations, and usually with a cut-off at large distances, again all facts depending on the size of the sample and the whiteness of the noise.]


9. An ergodic theorem

There exist several versions of ergodic theorems. The simplest one is the Law of Large Numbers. Let us recall it in its simplest version, with convergence in mean square.

Proposition 6. If $(X_n)_{n \geq 1}$ is a sequence of uncorrelated r.v.'s ($\mathrm{Cov}(X_i, X_j) = 0$ for all $i \neq j$), with finite and equal mean $\mu$ and variance $\sigma^2$, then $\frac{1}{n} \sum_{i=1}^{n} X_i$ converges to $\mu$ in mean square:
$$\lim_{n \to \infty} E\Bigg[ \bigg| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \bigg|^2 \Bigg] = 0.$$
It also converges in probability.

Proof. We have
$$\frac{1}{n} \sum_{i=1}^{n} X_i - \mu = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu),$$
hence
$$\bigg| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \bigg|^2 = \frac{1}{n^2} \sum_{i,j=1}^{n} (X_i - \mu)(X_j - \mu),$$
$$E\Bigg[ \bigg| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \bigg|^2 \Bigg] = \frac{1}{n^2} \sum_{i,j=1}^{n} E[(X_i - \mu)(X_j - \mu)] = \frac{1}{n^2} \sum_{i,j=1}^{n} \mathrm{Cov}(X_i, X_j) = \frac{1}{n^2} \sum_{i,j=1}^{n} \sigma^2 \delta_{ij} = \frac{\sigma^2}{n} \to 0. \qquad \square$$

�Recall that Chebyshev inequality states (in this particular case)

P

����� 1nnXi=1

Xi � ������ > "

!�Eh�� 1n

Pni=1Xi � �

��2i"2

for every " > 0. Hence, from the computation of the previous proof we deduce

P

����� 1nnXi=1

Xi � ������ > "

!� �2

"2 � n:

In itself, this is an interesting estimate on the probability that the sample average 1n

Pni=1Xi di¤ers

from � more than ". It follows that

limn!1

P

����� 1nnXi=1

Xi � ������ > "

!= 0

for every " > 0. This is the convergence in probability of 1nPni=1Xi to �.
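A quick Monte Carlo check of this $\sigma^2/n$ behaviour (a Python sketch, not part of the original notes; the values of $\mu$, $\sigma$ and the sample sizes are arbitrary): the mean square error of the sample average is estimated over many independent repetitions and compared with $\sigma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 3.0
for n in (10, 100, 1000, 10_000):
    # 500 independent samples of the average of n i.i.d. N(mu, sigma^2) variables
    sample_means = rng.normal(mu, sigma, size=(500, n)).mean(axis=1)
    mse = np.mean((sample_means - mu) ** 2)          # estimate of E[|X_bar_n - mu|^2]
    print(f"n = {n:6d}   empirical MSE = {mse:.5f}   sigma^2/n = {sigma**2 / n:.5f}")
```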


Remark 14. Often this theorem is stated only in the particular case when the r.v. $X_i$ are independent and identically distributed, with finite second moment. We see that the proof is very easy under much more general assumptions.

We have written out this very classical proof in such a way that the proof of the following lemma becomes obvious.

Lemma 3. Let $(X_n)_{n\ge 1}$ be a sequence of r.v. with finite second moments and equal mean $\mu$. Assume that
$$(9.1)\qquad \lim_{n\to\infty} \frac{1}{n^2}\sum_{i,j=1}^{n}\mathrm{Cov}(X_i,X_j) = 0.$$
Then $\frac{1}{n}\sum_{i=1}^{n}X_i$ converges to $\mu$ in mean square and in probability.

The lemma becomes useful once we identify interesting sufficient conditions for (9.1). Here is our main ergodic theorem. Usually, by the name ergodic theorem one means a theorem stating that the time averages of a process converge to a deterministic value (the mean of the process, in the stationary case).

Theorem 2. Assume that $(X_n)_{n\ge 1}$ is a wide sense stationary process (this ensures in particular that $(X_n)_{n\ge 1}$ is a sequence of r.v. with finite second moments and equal mean $\mu$). If
$$\lim_{n\to\infty} R(n) = 0,$$
then $\frac{1}{n}\sum_{i=1}^{n}X_i$ converges to $\mu$ in mean square and in probability.

Proof. Since $\mathrm{Cov}(X_i,X_j) = \mathrm{Cov}(X_j,X_i)$, we have
$$\left|\sum_{i,j=1}^{n}\mathrm{Cov}(X_i,X_j)\right| \le \sum_{i,j=1}^{n}|\mathrm{Cov}(X_i,X_j)| \le 2\sum_{i=1}^{n}\sum_{j=1}^{i}|\mathrm{Cov}(X_i,X_j)|,$$
so it is sufficient to prove that
$$\lim_{n\to\infty}\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{i}|\mathrm{Cov}(X_i,X_j)| = 0.$$
Since the process is stationary, $\mathrm{Cov}(X_i,X_j) = R(i-j)$, so we have to prove $\lim_{n\to\infty}\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{i}|R(i-j)| = 0$. But
$$\sum_{i=1}^{n}\sum_{j=1}^{i}|R(i-j)| = \sum_{i=1}^{n}\sum_{k=0}^{i-1}|R(k)|$$
$$= |R(0)| + \bigl(|R(0)|+|R(1)|\bigr) + \bigl(|R(0)|+|R(1)|+|R(2)|\bigr) + \dots + \bigl(|R(0)|+\dots+|R(n-1)|\bigr)$$
$$= n|R(0)| + (n-1)|R(1)| + (n-2)|R(2)| + \dots + |R(n-1)|$$
$$= \sum_{k=0}^{n-1}(n-k)|R(k)| \le n\sum_{k=0}^{n-1}|R(k)|.$$


Therefore it is sufficient to prove $\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|R(k)| = 0$. If $\lim_{n\to\infty}R(n) = 0$, then for every $\varepsilon > 0$ there is $n_0$ such that for all $n \ge n_0$ we have $|R(n)| \le \varepsilon$. Hence, for $n \ge n_0$,
$$\frac{1}{n}\sum_{k=0}^{n-1}|R(k)| \le \frac{1}{n}\sum_{k=0}^{n_0-1}|R(k)| + \frac{1}{n}\sum_{k=n_0}^{n-1}\varepsilon \le \frac{1}{n}\sum_{k=0}^{n_0-1}|R(k)| + \varepsilon.$$
Since $\sum_{k=0}^{n_0-1}|R(k)|$ is independent of $n$, there is $n_1 \ge n_0$ such that for all $n \ge n_1$
$$\frac{1}{n}\sum_{k=0}^{n_0-1}|R(k)| \le \varepsilon.$$
Therefore, for all $n \ge n_1$,
$$\frac{1}{n}\sum_{k=0}^{n-1}|R(k)| \le 2\varepsilon.$$
This means that $\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|R(k)| = 0$. The proof is complete. $\square$

9.1. Rate of convergence. Concerning the rate of convergence, recall from the proof of the LLN that
$$E\left[\left|\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right|^2\right] \le \frac{\sigma^2}{n}.$$

We can obtain a similar bound in the case of the ergodic theorem, under a suitable additional assumption.

Proposition 7. If $(X_n)_{n\ge 1}$ is a wide sense stationary process such that
$$\tau := \sum_{k=0}^{\infty}|R(k)| < \infty$$
(this implies $\lim_{n\to\infty}R(n) = 0$), then
$$E\left[\left|\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right|^2\right] \le \frac{2\tau}{n}.$$

Proof. It is sufficient to put together several steps of the previous proof:
$$E\left[\left|\frac{1}{n}\sum_{i=1}^{n}X_i - \mu\right|^2\right] = \frac{1}{n^2}\sum_{i,j=1}^{n}\mathrm{Cov}(X_i,X_j) \le \frac{2}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{i}|\mathrm{Cov}(X_i,X_j)| \le \frac{2}{n}\sum_{k=0}^{n-1}|R(k)| \le \frac{2\tau}{n}.$$
The proof is complete. $\square$

Notice that the assumptions of these two ergodic results (especially of the ergodic theorem) are very general and always satisfied in our examples.
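For instance, for a stationary AR(1) process $X_{n+1} = aX_n + \varepsilon_{n+1}$ with $|a| < 1$ one has $R(k) = \sigma^2 a^k/(1-a^2)$, so $\tau = \sigma^2/((1-a^2)(1-a)) < \infty$. The following sketch (Python, not part of the original notes; the parameter values are arbitrary) compares the empirical mean square error of the time average with the bound $2\tau/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
a, sigma, n, n_paths = 0.8, 1.0, 2000, 400
tau = sigma**2 / ((1 - a**2) * (1 - a))      # tau = sum_{k>=0} |R(k)| for this AR(1)

sq_errors = np.empty(n_paths)
for p in range(n_paths):
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1 - a**2))   # start from the stationary law
    for t in range(1, n):
        x[t] = a * x[t - 1] + sigma * rng.standard_normal()
    sq_errors[p] = x.mean() ** 2                         # |time average - mu|^2, with mu = 0

print("empirical MSE of the time average:", sq_errors.mean())
print("bound 2*tau/n                    :", 2 * tau / n)
```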


9.2. Empirical autocorrelation function. Very often we need the convergence of time averages of certain functions of the process: we would like to have
$$\frac{1}{n}\sum_{i=1}^{n}g(X_i) \to \overline{g}$$
in mean square, for certain functions $g$. We need to check the assumptions of the ergodic theorem for the sequence $(g(X_n))_{n\ge 1}$. Here is a simple example.

Proposition 8. Let $(X_n)_{n\ge 0}$ be a wide sense stationary process, with finite fourth moments, such that $E\left[X_n^2X_{n+k}^2\right]$ is independent of $n$ and
$$\lim_{k\to\infty} E\left[X_0^2X_k^2\right] = 0.$$
Then $\frac{1}{n}\sum_{i=1}^{n}X_i^2$ converges to $E\left[X_1^2\right]$ in mean square and in probability.

Proof. Consider the process $Y_n = X_n^2$. The mean function of $(Y_n)$ is $E\left[X_n^2\right]$, which is independent of $n$ by the wide-sense stationarity of $(X_n)$. For the autocorrelation function
$$R_Y(n,n+k) = E[Y_nY_{n+k}] = E\left[X_n^2X_{n+k}^2\right]$$
we need the new assumption of the proposition. Thus $(Y_n)$ is wide-sense stationary. Finally, from the assumption $\lim_{k\to\infty}E\left[X_0^2X_k^2\right] = 0$, which means $\lim_{k\to\infty}R_Y(k) = 0$ where $R_Y(k)$ is the autocorrelation function of $(Y_n)$, we can apply the ergodic theorem. The proof is complete. $\square$

More remarkable is the following result, related to the estimation of $R(n)$ by the sample path autocorrelation function. Given a process $(X_n)_{n\ge 1}$, call sample path (or empirical) autocorrelation function the process
$$\frac{1}{n}\sum_{i=1}^{n}X_iX_{i+k}.$$

Theorem 3. Let $(X_n)_{n\ge 0}$ be a wide sense stationary process, with finite fourth moments, such that $E[X_nX_{n+k}X_{n+j}X_{n+j+k}]$ is independent of $n$ and
$$\lim_{j\to\infty} E[X_0X_kX_jX_{j+k}] = 0.$$
Then the sample path autocorrelation function $\frac{1}{n}\sum_{i=1}^{n}X_iX_{i+k}$ converges to $R(k)$ as $n\to\infty$, in mean square and in probability. Precisely, for every $k\in\mathbb{N}$ we have
$$\lim_{n\to\infty} E\left[\left|\frac{1}{n}\sum_{i=1}^{n}X_iX_{i+k} - R(k)\right|^2\right] = 0,$$
and similarly for the convergence in probability.

Proof. Given $k\in\mathbb{N}$, consider the new process $Y_n = X_nX_{n+k}$. Its mean function is constant in $n$ because of the wide-sense stationarity of $(X_n)$. For the autocorrelation function,
$$R_Y(n,n+j) = E[Y_nY_{n+j}] = E[X_nX_{n+k}X_{n+j}X_{n+j+k}],$$


which is independent of $n$ by assumption. Moreover, $R_Y(j)$ converges to zero. Thus it is sufficient to apply the ergodic theorem, with the remark that $E[Y_0] = R(k)$. The proof is complete. $\square$
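As an illustration (a Python sketch, not part of the original notes; the parameters are arbitrary), consider a Gaussian AR(1) process: being Gaussian and wide-sense stationary it is strictly stationary, so the assumptions of the theorem hold, and $R(k) = \sigma^2 a^k/(1-a^2)$. The sample path autocorrelation computed from one long trajectory is close to $R(k)$:

```python
import numpy as np

rng = np.random.default_rng(3)
a, sigma, n = 0.7, 1.0, 100_000

# simulate a Gaussian AR(1) path started from its stationary law
x = np.empty(n)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - a**2))
for t in range(1, n):
    x[t] = a * x[t - 1] + sigma * rng.standard_normal()

for k in range(5):
    empirical = np.mean(x[: n - k] * x[k:])      # (1/n) sum_i X_i X_{i+k}, up to edge effects
    theoretical = sigma**2 * a**k / (1 - a**2)
    print(f"k = {k}   empirical = {empirical:.4f}   theoretical R(k) = {theoretical:.4f}")
```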

With similar proofs one can obtain other results of the same type. Notice that the additional assumptions that $E\left[X_n^2X_{n+k}^2\right]$ and $E[X_nX_{n+k}X_{n+j}X_{n+j+k}]$ are independent of $n$ are a consequence of the assumption that $(X_n)$ is stationary in the strict sense. Thus, stationarity in the strict sense can be put as an assumption in several ergodic statements. Recall that wide-sense stationarity plus Gaussianity implies strict-sense stationarity.

Remark 15. The Hölder inequality implies
$$E[X_0X_kX_jX_{j+k}] \le E\left[X_0^2X_j^2\right]^{1/2}E\left[X_k^2X_{j+k}^2\right]^{1/2}.$$
Under the assumptions of Proposition 8, in particular
$$\lim_{k\to\infty} E\left[X_0^2X_k^2\right] = 0,$$
we deduce $\lim_{j\to\infty}E[X_0X_kX_jX_{j+k}] = 0$. The assumption $\lim_{k\to\infty}E\left[X_0^2X_k^2\right] = 0$ is very close to the assumption $\lim_{k\to\infty}E[X_0X_k] = 0$ and is satisfied in all our examples.


CHAPTER 3

ARIMA models

1. Definitions

1.1. AR models. An AR($p$) (AutoRegressive of order $p$) model is a discrete time linear equation with noise, of the form
$$X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + \varepsilon_t.$$
Here $p$ is the order, $\alpha_1,\dots,\alpha_p$ are the parameters or coefficients (real numbers), and $\varepsilon_t$ is an error term, usually a white noise with intensity $\sigma^2$. The model is considered either on the integers $t\in\mathbb{Z}$, thus without initial conditions, or on the non-negative integers $t\in\mathbb{N}$. In the latter case, the relation above starts from $t = p$ and some initial conditions $X_0,\dots,X_{p-1}$ must be specified.

Example 3. We have seen above the simplest case of an AR(1) model,
$$X_t = \alpha X_{t-1} + \varepsilon_t.$$
With $|\alpha| < 1$ and $\mathrm{Var}[X_t] = \frac{\sigma^2}{1-\alpha^2}$, it is a wide sense stationary process (in fact strict sense, since it is Gaussian; see also below). The autocorrelation coefficient decays exponentially:
$$\rho(n) = \alpha^n.$$
Even if the formula is not so simple, one can prove the same result for any AR model.

In order to model more general situations, it may be convenient to introduce models with non-zero average, namely of the form
$$(X_t - \mu) = \alpha_1(X_{t-1} - \mu) + \dots + \alpha_p(X_{t-p} - \mu) + \varepsilon_t.$$
When $\mu = 0$, if we take an initial condition having zero average (this is needed if we want stationarity), then $E[X_t] = 0$ for all $t$. We may escape this restriction by taking $\mu \neq 0$. The new process $Z_t = X_t - \mu$ has zero average and satisfies the usual equation
$$Z_t = \alpha_1 Z_{t-1} + \dots + \alpha_p Z_{t-p} + \varepsilon_t.$$
But $X_t$ satisfies
$$X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + \varepsilon_t + (\mu - \alpha_1\mu - \dots - \alpha_p\mu) = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + \varepsilon_t + \widetilde{\mu}.$$


1.2. Time lag operator. Let $S$ be the space of all sequences $(x_t)_{t\in\mathbb{Z}}$ of real numbers. Let us define an operator $L: S\to S$, a map which transforms sequences into sequences. It is defined as
$$Lx_t = x_{t-1}, \qquad \text{for all } t\in\mathbb{Z}.$$
We should write $(Lx)_t = x_{t-1}$, with the meaning that, given a sequence $x = (x_t)_{t\in\mathbb{Z}}\in S$, we introduce a new sequence $Lx\in S$ which at time $t$ is equal to the original sequence at time $t-1$, hence the notation $(Lx)_t = x_{t-1}$. For shortness, we drop the brackets and write $Lx_t = x_{t-1}$, but it is clear that $L$ does not operate on the single value $x_t$ (in such a case it could produce any function of $x_t$, but not $x_{t-1}$).

The map $L$ is called the time lag operator, or backward shift, because the result of $L$ is a shift, a translation, of the sequence, backwards (in the sense that we observe the same sequence but from a position shifted one step to the left).

If we work on the space $S_+$ of sequences $(x_t)_{t\in\mathbb{N}}$ defined only for non-negative times, we cannot define this operator since, given $(x_t)_{t\in\mathbb{N}}$, its first value is $x_0$, while the first value of $Lx$ should be $x_{-1}$, which does not exist. Nevertheless, if we forget the first value, the operation of backward shift is meaningful also here. Hence the notation $Lx_t = x_{t-1}$ is used also for sequences $(x_t)_{t\in\mathbb{N}}$, with the understanding that one cannot take $t = 0$ in it.
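On a finite sample the lag operator simply shifts the observed values; a minimal sketch (Python, not part of the original notes; the function `lag` is ad hoc), where the undefined initial values are marked as NaN:

```python
import numpy as np

def lag(x, k=1):
    """Backward shift: return the sequence (L^k x)_t = x_{t-k}, for k >= 1.
    The first k entries of the result are undefined and are set to NaN."""
    y = np.full(len(x), np.nan)
    y[k:] = x[:-k]
    return y

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(lag(x))        # [nan  0.  1.  2.  3.]
print(lag(x, 2))     # [nan nan  0.  1.  2.]
```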

Exercise 10. The time lag operator is a linear operator.

The powers, positive and negative, of the lag operator are denoted by $L^k$:
$$L^k x_t = x_{t-k}, \qquad \text{for } t\in\mathbb{Z}$$
(or for $t \ge k \vee 0$ for sequences $(x_t)_{t\in\mathbb{N}}$). With this notation, the AR model reads
$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)X_t = \varepsilon_t.$$

1.3. MA models. An MA($q$) (Moving Average of order $q$) model is an explicit formula for $X_t$ in terms of the noise, of the form
$$X_t = \varepsilon_t + \beta_1\varepsilon_{t-1} + \dots + \beta_q\varepsilon_{t-q}.$$
The process is given by a (weighted) average of the noise, but not an average from time zero to the present time $t$; instead, an average moving with $t$ is taken, using only the last $q+1$ times.

Using time lags we can write
$$X_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t.$$


1.4. ARMA models. An ARMA($p,q$) (AutoRegressive Moving Average with orders $p$ and $q$) model is a discrete time linear equation with noise, of the form
$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)X_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t,$$
or explicitly
$$X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + \varepsilon_t + \beta_1\varepsilon_{t-1} + \dots + \beta_q\varepsilon_{t-q}.$$

We may incorporate a non-zero average in this model. If we want $X_t$ to have average $\mu$, the natural procedure is to take a zero-average solution $Z_t$ of
$$Z_t = \alpha_1 Z_{t-1} + \dots + \alpha_p Z_{t-p} + \varepsilon_t + \beta_1\varepsilon_{t-1} + \dots + \beta_q\varepsilon_{t-q}$$
and set $X_t = Z_t + \mu$, hence a solution of
$$X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + \varepsilon_t + \beta_1\varepsilon_{t-1} + \dots + \beta_q\varepsilon_{t-q} + \widetilde{\mu}$$
with $\widetilde{\mu} = \mu - \alpha_1\mu - \dots - \alpha_p\mu$.

1.5. Difference operator. Integration. The first difference operator, $\Delta$, is defined as
$$\Delta X_t = X_t - X_{t-1} = (1-L)X_t.$$
If we call
$$Y_t = (1-L)X_t,$$
then we may reconstruct $X_t$ from $Y_t$ by integration:
$$X_t = Y_t + X_{t-1} = Y_t + Y_{t-1} + X_{t-2} = \dots = Y_t + \dots + Y_1 + X_0,$$
having the initial condition $X_0$.

The second difference operator, $\Delta^2$, is defined as
$$\Delta^2 X_t = (1-L)^2X_t.$$
Assume we have
$$Y_t = (1-L)^2X_t.$$
Then
$$Y_t = (1-L)Z_t, \qquad Z_t = (1-L)X_t,$$
so we may first reconstruct $Z_t$ from $Y_t$:
$$Z_t = Y_t + \dots + Y_2 + Z_1,$$
where
$$Z_1 = (1-L)X_1 = X_1 - X_0$$
(thus we need $X_1$ and $X_0$); then we reconstruct $X_t$ from $Z_t$:
$$X_t = Z_t + \dots + Z_1 + X_0.$$
All this can be generalized to $\Delta^d$, for any positive integer $d$.
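A sketch of this two-step reconstruction (Python, not part of the original notes): the second difference is computed with `np.diff`, and the series is then recovered exactly from the differenced values together with the stored initial conditions $X_0$ and $X_1$.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.cumsum(np.cumsum(rng.standard_normal(20)))   # an arbitrary twice-integrated series

y = np.diff(x, n=2)                                 # Y_t = (1 - L)^2 X_t, for t = 2, ..., 19

# step 1: recover Z_t = (1 - L) X_t from Y_t and Z_1 = X_1 - X_0
z1 = x[1] - x[0]
z = np.concatenate(([z1], z1 + np.cumsum(y)))       # Z_1, Z_2, ..., Z_19

# step 2: recover X_t from Z_t and X_0
x_rec = np.concatenate(([x[0]], x[0] + np.cumsum(z)))

print(np.allclose(x_rec, x))                        # True: the reconstruction is exact
```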


1.6. ARIMA models. An ARIMA($p,d,q$) (AutoRegressive Integrated Moving Average with orders $p$, $d$, $q$) model is a discrete time linear equation with noise, of the form
$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)(1-L)^d X_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t.$$
It is a particular case of ARMA models, but with a special structure. Set $Y_t := (1-L)^d X_t$. Then $Y_t$ is an ARMA($p,q$) model,
$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)Y_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t,$$
and $X_t$ is obtained from $Y_t$ by $d$ successive integrations. The number $d$ is thus the order of integration.

Example 4. The random walk is ARIMA$(0,1,0)$.

We may incorporate a non-zero average in the auxiliary process $Y_t$ and consider the equation
$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)(1-L)^d X_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t + \widetilde{\mu}$$
with $\widetilde{\mu} = \mu - \alpha_1\mu - \dots - \alpha_p\mu$.

2. Stationarity, ARMA and ARIMA processes

Under suitable conditions on the parameters, there are stationary solutions to ARMA models, called ARMA processes.

In the simplest case of AR(1) models, we have proved stationarity (with a suitable variance of the initial condition) when the parameter $\alpha$ satisfies $|\alpha| < 1$. In general there are conditions, but they are quite technical and we refer the interested reader to the specialized literature. In the sequel we shall always use sentences of the form "consider a stationary solution of the following ARMA model", meaning implicitly that such a solution exists, namely that we are in the framework of such conditions. Our statements will therefore hold only in that case; otherwise they are just empty statements.

Integration breaks stationarity. Solutions to ARIMA models are always non-stationary if we take $Y_t$ stationary (in this case the corresponding $X_t$ is called an ARIMA process). For instance, the random walk is not stationary. The growth of such processes is not trivial. But if we include a non-zero average, namely we consider the case

$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)(1-L)^d X_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t + \widetilde{\mu},$$
then we have the following: if $d = 1$, $X_t$ has a linear trend; if $d = 2$, a quadratic trend, and so on. Indeed, at a very intuitive level, if $Y_t$ is a stationary solution of the associated ARMA model, with mean $\mu$, then its integration produces a trend: a single step integration gives us
$$X_t = Y_t + \dots + Y_1 + X_0,$$


so the stationary values of $Y$ accumulate linearly; a two step integration produces a quadratic accumulation, and so on. When $\mu = 0$, the sum $Y_t + \dots + Y_1$ has a lot of cancellations, so the trend is sublinear (roughly it behaves as a square root). But the cancellations are no longer statistically significant when $\mu \neq 0$. If $\mu > 0$ and $d = 1$ we observe an average linear growth; if $\mu < 0$ and $d = 1$ we observe an average linear decay. This is also related to the ergodic theorem: since $Y_t$ is stationary and its autocorrelation decays at infinity, we may apply the ergodic theorem and obtain
$$\frac{Y_t + \dots + Y_1}{t} \to E[Y_1] = \mu$$
(in mean square). Hence
$$Y_t + \dots + Y_1 \approx \mu\, t.$$
There are fluctuations, roughly of the order of a square root, around this linear trend.
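A small simulation (Python, not part of the original notes; the coefficient values are arbitrary) comparing one-step integration ($d = 1$) of a stationary series with mean zero and of the same series shifted to mean $\mu \neq 0$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, a, sigma, mu = 5000, 0.5, 1.0, 0.3

# a stationary AR(1) series Y_t with mean zero
y = np.zeros(n)
for t in range(1, n):
    y[t] = a * y[t - 1] + sigma * rng.standard_normal()

x_no_drift = np.cumsum(y)          # integration with mu = 0: sublinear, diffusive growth
x_drift = np.cumsum(y + mu)        # integration with mu != 0: linear trend of slope mu

print("X_n / n with mu = 0  :", x_no_drift[-1] / n)   # close to 0
print("X_n / n with mu = 0.3:", x_drift[-1] / n)      # close to mu
```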

3. Correlation function

3.1. First results. Assume we have a stationary, mean zero, ARMA process. Set $\gamma_n := R(n) = E[X_nX_0]$. Recall that $R(-n) = R(n)$. Notice that
$$E\left[X_{t-j}L^kX_t\right] = E[X_{t-j}X_{t-k}] = \gamma_{k-j}.$$

Then
$$\gamma_j - \sum_{k=1}^{p}\alpha_k\,\gamma_{j-k} = E\left[X_{t-j}\left(1+\sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t\right].$$
In the case $j > q$ (possibly $q = 0$), $L^k\varepsilon_t$ is independent of $X_{t-j}$ for $k \le q$, hence
$$\gamma_j - \sum_{k=1}^{p}\alpha_k\,\gamma_{j-k} = 0.$$

This formula, taken for every $j > q$, allows us to compute the autocorrelation function when $q = 0$ (AR processes).

Example 5. Consider the simple case
$$X_t = \alpha X_{t-1} + \varepsilon_t.$$
We get
$$\gamma_j - \alpha\gamma_{j-1} = 0$$
for every $j > 0$, namely
$$\gamma_1 = \alpha\gamma_0, \qquad \gamma_2 = \alpha\gamma_1, \qquad \dots$$
where $\gamma_0 = E\left[X_0^2\right]$. On its own,
$$\mathrm{Var}[X_t] = \alpha^2\,\mathrm{Var}[X_{t-1}] + \mathrm{Var}[\varepsilon_t],$$


hence
$$\gamma_0 = \alpha^2\gamma_0 + \sigma^2,$$
which gives us $\gamma_0$.

Example 6. Consider the next case,
$$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \varepsilon_t.$$
We get
$$\gamma_j = \alpha_1\gamma_{j-1} + \alpha_2\gamma_{j-2}$$
for every $j > 0$, namely
$$\gamma_1 = \alpha_1\gamma_0 + \alpha_2\gamma_{-1}, \qquad \gamma_2 = \alpha_1\gamma_1 + \alpha_2\gamma_0, \qquad \dots$$
The first equation, in view of $\gamma_{-1} = \gamma_1$, gives us
$$\gamma_1 = \frac{\alpha_1}{1-\alpha_2}\,\gamma_0.$$

Hence again we just need to find $\gamma_0$. We have
$$\mathrm{Var}[X_t] = \alpha_1^2\,\mathrm{Var}[X_{t-1}] + \alpha_2^2\,\mathrm{Var}[X_{t-2}] + \sigma^2 + 2\alpha_1\alpha_2\,\mathrm{Cov}(X_{t-1},X_{t-2}),$$
hence
$$\gamma_0 = \alpha_1^2\gamma_0 + \alpha_2^2\gamma_0 + \sigma^2 + 2\alpha_1\alpha_2\gamma_1.$$
This is a second relation between $\gamma_0$ and $\gamma_1$; together, they give us both quantities.
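The two relations just obtained form a linear system in $\gamma_0$ and $\gamma_1$. As a numerical check (a Python sketch, not part of the original notes; the coefficients are arbitrary but chosen inside the stationarity region), we solve the system and compare with sample estimates from a long simulated path:

```python
import numpy as np

a1, a2, sigma = 0.5, 0.3, 1.0

# linear system from Example 6, with unknowns (gamma_0, gamma_1):
#   gamma_1 = a1*gamma_0 + a2*gamma_1
#   gamma_0 = a1^2*gamma_0 + a2^2*gamma_0 + sigma^2 + 2*a1*a2*gamma_1
A = np.array([[-a1, 1.0 - a2],
              [1.0 - a1**2 - a2**2, -2.0 * a1 * a2]])
b = np.array([0.0, sigma**2])
gamma0, gamma1 = np.linalg.solve(A, b)

rng = np.random.default_rng(6)
n = 100_000
x = np.zeros(n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + sigma * rng.standard_normal()

print("gamma_0 =", gamma0, "  sample:", np.mean(x * x))
print("gamma_1 =", gamma1, "  sample:", np.mean(x[:-1] * x[1:]))
```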

3.2. ARMA as infinite MA. Assume always that we have a stationary, mean zero, ARMA process. Assume it is defined for all integers (in particular, we use the noise also at negative integers). From
$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)X_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t$$
we get, when proper convergence conditions are satisfied,
$$X_t = \sum_{j=0}^{\infty}\varphi_j L^j\varepsilon_t,$$
where $\sum_{j=0}^{\infty}\varphi_j x^j$ is the Taylor expansion of the function
$$g(x) = \frac{1 + \sum_{k=1}^{q}\beta_k x^k}{1 - \sum_{k=1}^{p}\alpha_k x^k}.$$

Indeed, assume this function $g$ has the Taylor development $g(x) = \sum_{j=0}^{\infty}\varphi_j x^j$ in a neighborhood $U$ of the origin. Then, for each $x\in U$,
$$\left(1 - \sum_{k=1}^{p}\alpha_k x^k\right)\sum_{j=0}^{\infty}\varphi_j x^j = \left(1 + \sum_{k=1}^{q}\beta_k x^k\right).$$


Assume
$$\sum_{j=0}^{\infty}\varphi_j^2 < \infty.$$
One can prove that the series $X_t := \sum_{j=0}^{\infty}\varphi_j L^j\varepsilon_t$ converges in mean square, and so does $\sum_{j=0}^{\infty}\varphi_j L^j\varepsilon_{t-k}$ for every $k$; thus we have
$$\left(1 - \sum_{k=1}^{p}\alpha_k L^k\right)\sum_{j=0}^{\infty}\varphi_j L^j\varepsilon_t = \left(1 + \sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t.$$
Therefore $X_t := \sum_{j=0}^{\infty}\varphi_j L^j\varepsilon_t$ solves the equation which defines the ARMA process.

Example 7. Consider the simple case
$$X_t = \alpha X_{t-1} + \varepsilon_t.$$
We have
$$g(x) = \frac{1}{1-\alpha x},$$
hence
$$g(x) = \sum_{j=0}^{\infty}(\alpha x)^j$$
(recall the geometric series). The series converges for $|\alpha x| < 1$. We need $|\alpha| < 1$ to have $\sum_{j=0}^{\infty}\varphi_j^2 < \infty$. Therefore the ARMA process is
$$X_t = \sum_{j=0}^{\infty}\alpha^j L^j\varepsilon_t.$$
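A numerical check of this MA($\infty$) representation (a Python sketch, not part of the original notes): the truncated sum $\sum_{j=0}^{J}\alpha^j\varepsilon_{t-j}$ essentially reproduces the value given by the recursion, since the neglected terms are of order $\alpha^J$.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, n, J = 0.6, 500, 60
eps = rng.standard_normal(n)

# recursive definition X_t = alpha * X_{t-1} + eps_t, started at X_0 = eps_0
x = np.zeros(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = alpha * x[t - 1] + eps[t]

# truncated MA(infinity) representation at the last time
t = n - 1
x_ma = sum(alpha**j * eps[t - j] for j in range(J + 1))

print(x[t], x_ma)      # nearly identical: alpha^J is negligible here
```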

3.3. Correlation function, part 2. Assume always that we have a stationary, mean zero, ARMA process and set $\gamma_n := R(n) = E[X_nX_0]$. Under proper conditions,
$$X_t = \sum_{i=0}^{\infty}\varphi_i L^i\varepsilon_t.$$
Hence we may compute $E\left[X_{t-j}\left(1+\sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t\right]$, which was left aside above, in part 1. We have (we set $\beta_0 = 1$)
$$E\left[X_{t-j}\left(1+\sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t\right] = \sum_{i=0}^{\infty}\sum_{k=0}^{q}\varphi_i\beta_k\,E\left[L^i\varepsilon_{t-j}\,L^k\varepsilon_t\right] = \sum_{i=0}^{\infty}\sum_{k=0}^{q}\varphi_i\beta_k\,\delta_{i+j,k}\,\sigma^2 = \sum_{i=0}^{\infty}\varphi_i\beta_{i+j}\,\sigma^2\,\mathbf{1}_{i+j\in\{0,\dots,q\}}.$$
Thus we get
$$\gamma_j - \sum_{k=1}^{p}\alpha_k\gamma_{j-k} = \sum_{i=0}^{\infty}\varphi_i\beta_{i+j}\,\sigma^2\,\mathbf{1}_{i+j\in\{0,\dots,q\}}.$$


Notice that it reduces to $\gamma_j - \sum_{k=1}^{p}\alpha_k\gamma_{j-k} = 0$ for $j > q$, because in such a case $i+j\in\{0,\dots,q\}$ is never satisfied.

This is a general formula. A more direct approach to compute $E\left[X_{t-j}\left(1+\sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t\right]$, even if $j \le q$, consists in substituting the equation for $X_{t-j}$:
$$E\left[X_{t-j}\left(1+\sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t\right] = E\left[\left(\sum_{k=1}^{p}\alpha_k L^kX_{t-j} + \left(1+\sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_{t-j}\right)\left(1+\sum_{k=1}^{q}\beta_k L^k\right)\varepsilon_t\right].$$
The products involving $L^k\varepsilon_{t-j}$ and $L^{k'}\varepsilon_t$ are easily computed. Then we have products of the form
$$E\left[L^kX_{t-j}\,L^{k'}\varepsilon_t\right],$$
the worst of which is
$$E\left[L^1X_{t-j}\,L^q\varepsilon_t\right].$$
If $j \ge q$, it is zero; otherwise not, but we may repeat the trick and go backward step by step. In simple examples we may compute all $\gamma_j$ by this strategy.

Example 8. Consider the simple case
$$X_t = \alpha X_{t-1} + \varepsilon_t + \beta\varepsilon_{t-1}.$$
We get
$$\gamma_j - \alpha\gamma_{j-1} = 0$$
for every $j > 1$, namely
$$\gamma_2 = \alpha\gamma_1, \qquad \gamma_3 = \alpha\gamma_2, \qquad \dots$$
but from these relations we miss $\gamma_1$ and $\gamma_0$. About $\gamma_1$, we have
$$\gamma_1 - \alpha\gamma_0 = E[X_{t-1}(1+\beta L)\varepsilon_t] = \beta E[X_{t-1}\varepsilon_{t-1}] = \beta E[(\alpha X_{t-2} + \varepsilon_{t-1} + \beta\varepsilon_{t-2})\varepsilon_{t-1}] = \beta\sigma^2.$$
Therefore $\gamma_1$ is given in terms of $\gamma_0$ (and then, iteratively, also $\gamma_2$, $\gamma_3$ and so on). On its own,
$$\mathrm{Var}[X_t] = \alpha^2\,\mathrm{Var}[X_{t-1}] + \sigma^2 + \beta^2\sigma^2 + 2\alpha\beta\,\mathrm{Cov}(X_{t-1},\varepsilon_{t-1}),$$
hence
$$\gamma_0 = \alpha^2\gamma_0 + \sigma^2 + \beta^2\sigma^2 + 2\alpha\beta\,\mathrm{Cov}(X_{t-1},\varepsilon_{t-1}).$$
Moreover,
$$\mathrm{Cov}(X_{t-1},\varepsilon_{t-1}) = \mathrm{Cov}(\alpha X_{t-2} + \varepsilon_{t-1} + \beta\varepsilon_{t-2},\ \varepsilon_{t-1}) = \sigma^2,$$
hence
$$\gamma_0 = \alpha^2\gamma_0 + \sigma^2 + \beta^2\sigma^2 + 2\alpha\beta\sigma^2.$$
This gives us $\gamma_0$.
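As a check of these computations (a Python sketch, not part of the original notes; the coefficient values are arbitrary), the relations above give the closed forms $\gamma_0 = \sigma^2(1+\beta^2+2\alpha\beta)/(1-\alpha^2)$ and $\gamma_1 = \alpha\gamma_0 + \beta\sigma^2$, which are compared with sample estimates from a long simulated path:

```python
import numpy as np

alpha, beta, sigma = 0.5, 0.4, 1.0

# closed formulas from Example 8
gamma0 = sigma**2 * (1 + beta**2 + 2 * alpha * beta) / (1 - alpha**2)
gamma1 = alpha * gamma0 + beta * sigma**2

rng = np.random.default_rng(8)
n = 100_000
eps = sigma * rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + eps[t] + beta * eps[t - 1]

print("gamma_0 =", gamma0, "  sample:", np.mean(x * x))
print("gamma_1 =", gamma1, "  sample:", np.mean(x[:-1] * x[1:]))
```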


4. Power spectral density

Theorem 4.
$$S(\omega) = \frac{\sigma^2}{2\pi}\left|\frac{1 + \sum_{k=1}^{q}\beta_k e^{-ik\omega}}{1 - \sum_{k=1}^{p}\alpha_k e^{-ik\omega}}\right|^2.$$

Proof. We have
$$\widehat{X}_T(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{n\in Z_T}e^{-i\omega n}X_n = \frac{1}{\sqrt{2\pi}}\sum_{n\in Z_T}\sum_{j=0}^{\infty}\varphi_j e^{-i\omega n}\varepsilon_{n-j},$$
$$\widehat{X}_T^*(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{n'\in Z_T}\sum_{j'=0}^{\infty}\varphi_{j'} e^{i\omega n'}\varepsilon_{n'-j'},$$
$$E\left[\widehat{X}_T(\omega)\widehat{X}_T^*(\omega)\right] = \frac{1}{2\pi}\sum_{n\in Z_T}\sum_{n'\in Z_T}\sum_{j=0}^{\infty}\sum_{j'=0}^{\infty}\varphi_j\varphi_{j'} e^{-i\omega n}e^{i\omega n'}E\left[\varepsilon_{n-j}\varepsilon_{n'-j'}\right]$$
$$= \frac{\sigma^2}{2\pi}\sum_{n\in Z_T}\sum_{j=0}^{\infty}\sum_{j'=0}^{\infty}\varphi_j\varphi_{j'} e^{-i\omega n}e^{i\omega(n-j+j')} = T\,\frac{\sigma^2}{2\pi}\sum_{j=0}^{\infty}\sum_{j'=0}^{\infty}\varphi_j e^{-i\omega j}\varphi_{j'}e^{i\omega j'} = T\,\frac{\sigma^2}{2\pi}\left|\sum_{n=0}^{\infty}\varphi_n e^{-i\omega n}\right|^2.$$
Since $\sum_{n=0}^{\infty}\varphi_n e^{-i\omega n} = g\left(e^{-i\omega}\right)$, the claimed formula for $S(\omega)$ follows. $\square$

Remark 16. Consider the case $q = 0$ and write the formula with $\omega = 2\pi f$:
$$S(f) = \frac{\sigma^2}{2\pi}\,\frac{1}{\left|1 - \sum_{k=1}^{p}\alpha_k e^{-2\pi ikf}\right|^2}.$$
Assume there is only the term $k = p$:
$$S(f) = \frac{\sigma^2}{2\pi}\,\frac{1}{\left|1 - \alpha_p e^{-2\pi ipf}\right|^2}.$$
The maxima are at $pf\in\mathbb{Z}$, namely at the multiples of $f = \frac{1}{p}$. The function $S(f)$ immediately shows the periodicity in the recursion
$$X_t = \alpha_p X_{t-p} + \varepsilon_t.$$

4.1. Example. Consider
$$X_t = 0.8\,X_{t-12} + \varepsilon_t,$$
$$S(f) = \frac{\sigma^2}{2\pi}\left|\frac{1}{1 - 0.8\,e^{-2\pi i\cdot 12 f}}\right|^2.$$


[Figure: graph of $S(f)$ for $0 \le f \le 0.3$ (here with $\sigma = 1$), showing peaks at the multiples of $1/12$.]
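The plot can be reproduced with a few lines (a Python sketch, not part of the original notes), evaluating the formula of Theorem 4 with $p = 12$, $\alpha_{12} = 0.8$, all other coefficients zero and $\sigma = 1$:

```python
import numpy as np
import matplotlib.pyplot as plt

sigma, alpha, p = 1.0, 0.8, 12
f = np.linspace(0.0, 0.3, 2000)
S = sigma**2 / (2 * np.pi) / np.abs(1 - alpha * np.exp(-2j * np.pi * p * f)) ** 2

plt.plot(f, S)
plt.xlabel("f")
plt.ylabel("S(f)")
plt.show()          # sharp peaks at f = 0, 1/12, 2/12, 3/12, ...
```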
