Chapter 4 (2W) - Signal Modeling - Statistical Digital Signal Processing and Modeling


Statistical Digital Signal Processing and Modeling

Prof. Dr. Guido Schuster

University of Applied Sciences of Eastern Switzerland in Rapperswil (HSR)

Chapter 4 – Signal Modeling

Slides closely follow the book "Statistical Digital Signal Processing and Modeling" by Monson H. Hayes; most of the figures and formulas are taken from there

1

Introduction

•  The goal of signal modeling is to obtain a parametric description of the signal

•  This can be used for filter design and/or interpolation and/or extrapolation and/or compression

•  We always use the same model, which is the output of a causal linear shift-invariant filter that has a rational system function

•  The filter input is typically a discrete impulse δ(n)

2

Direct Least Squares Method

•  The modeling error is the difference between the unit sample response h(n) and the desired signal x(n)

•  The goal is to minimize the squared error

•  Which implies

•  This results in a set of nonlinear equations

–  These are hard to solve and hence the direct least squares method is not used in practice

3

Exact Signal Matching

•  Let x(n) be a signal that we want to model with the unit sample response of a causal first-order all-pole filter

•  This unit sample response has two degrees of freedom b(0) and a(1)

•  Setting h(0)=x(0) and h(1)=x(1) implies

•  Assuming x(0) is not zero this results in the following model

4

Exact Signal Matching

•  If we increase the order of the model to include one pole and one zero, the problem becomes a bit more complicated

•  This unit sample response has three degrees of freedom b(0), b(1) and a(1)

•  Setting h(0)=x(0),h(1)=x(1) and h(2)=x(2) implies

•  These equations are nonlinear, but in this case can still easily be solved

5

Padé Approximation

•  The previous example showed that matching a number of unit sample response h(n) values equal to the degrees of freedom (p+q+1) in the system function H(z) can result in a set of nonlinear equations

•  This is true in general, but there is an elegant trick to avoid these nonlinear equations and still match a given number of x(n) values with the unit sample response of a linear time-invariant filter

6

Padé Approximation

•  Instead of working with the system function directly

•  We use a little trick

•  In the time domain, this becomes a convolution

•  For n>q, the right hand side is zero

•  Or written in matrix notation

7

Padé Approximation

•  This equation is solved in two steps: first for the poles, then, with the now known poles, for the zeros

•  Hence the ap(k) parameters can be found by solving this set of linear equations

8

Padé Approximation

•  There are three cases which have to be handled

–  Case I: Xq is non-singular

•  Hence the inverse exists and the coefficients of Ap(z) are unique

–  Case II: Xq is singular and a solution exists

•  The coefficients of Ap(z) are not unique, since there are non-zero solutions z to the homogeneous equation. Hence, adding any such z to a solution yields another solution. In this case the solution where the most coefficients are zero might be a good choice

–  Case III: Xq is singular and no solution exists

•  Hence the assumption that a(0) is not zero was incorrect. So we set a(0) to zero and solve the resulting homogeneous set of equations instead. Since Xq is singular, there is a nonzero solution to this system

9

Padé Approximation

•  Having found the coefficients of Ap(z), the second step is to find the coefficients of Bq(z) using the first q+1 equations

•  Or in matrix notation

•  Or using the convolution formula directly

10
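A minimal numeric sketch of this two-step Padé procedure (Python/NumPy; assumes Case I, i.e., a nonsingular data matrix, and a(0)=1 — the function name and the sample data below are made up for illustration):

```python
import numpy as np

def pade(x, p, q):
    """Pade approximation: match x(0)..x(p+q) exactly with H(z) = Bq(z)/Ap(z)."""
    x = np.asarray(x, dtype=float)
    xv = lambda n: x[n] if n >= 0 else 0.0          # x(n) = 0 for n < 0
    # Step 1: poles from the last p equations (n = q+1 .. q+p)
    Xq  = np.array([[xv(q + i - k) for k in range(1, p + 1)] for i in range(1, p + 1)])
    rhs = -np.array([xv(q + i) for i in range(1, p + 1)])
    a = np.concatenate(([1.0], np.linalg.solve(Xq, rhs))) if p > 0 else np.array([1.0])
    # Step 2: zeros from the first q+1 equations, b(n) = sum_k a(k) x(n-k)
    b = np.array([sum(a[k] * xv(n - k) for k in range(p + 1)) for n in range(q + 1)])
    return a, b

# e.g. an IIR (p=1, q=1) Pade model of some arbitrary, illustrative data
a, b = pade([1.0, 1.5, 0.75, 0.375], p=1, q=1)
```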

Padé Approximation

11

All-Pole Model Padé Approximation

•  Assume that H(z) is an all-pole model, i.e., q=0

•  Hence the Padé equations become

•  Or in matrix notation

•  Since X0 is a lower triangular Toeplitz matrix the denominator coefficients may be found easily by back substitution

12

Padé Approximation Example

•  The first six values of a signal are given

•  The goal is to find three different Padé approximations, each with three degrees of freedom

–  All pole (q=0, p=2)

–  FIR (q=2, p=0)

–  IIR (q=1, p=1)

•  All pole model (q=0, p=2)

13


Padé Approximation Example

•  The last two equations result in

•  And solving for a(1) and a(2)

•  From the first equation follows

•  Hence the model for x(n) is

•  Note that the model is not stable, since the poles are outside the unit circle, even though the unit sample response matches x(0), x(1) and x(2)

14

[Figure: Impulse Response plots, Amplitude vs. n (samples)]

Padé Approximation Example

•  FIR model (q=2, p=0); the result is trivial

•  Hence the model is simply and the first three values of the unit sample response are matched and all other values are zero

15

[Figure: Impulse Response, Amplitude vs. n (samples)]

Padé Approximation Example

•  One pole and one zero IIR model (q=1, p=1)

•  The Padé equations are

•  The pole can be found from the last equation

•  This results in

•  Knowing the pole allows us to calculate the zeros with the top two equations

•  Hence the model has this unit sample response, identical to x(n) (special case!)

16

Padé Approximation Example

•  How can the Padé equations be solved if Xq is singular?

•  Let x(n) be approximated with an IIR model (p=2, q=2), i.e., 5 degrees of freedom

•  The last two equations are used for finding the poles. Clearly this is a singular set of equations and no non-zero a(1), a(2) combination can be found

•  This implies that the assumption a(0)=1 is incorrect.

17

Padé Approximation Example

•  Hence setting a(0) to 0 results in a new set of equations

•  These have a non-trivial solution

•  Using the first three equations to determine the zeros results in

•  And the model becomes with a unit sample response

•  Hence the first 5 values are which only matches the first 4 values of x(n)!

18

Padé Approximation Example

•  The Padé approximation can also be used for filter design

•  The goal is an ideal halfband lowpass with a yet unspecified filter delay of nd

•  The unit sample response is

•  This response should now be approximated using the Padé approach with 11 degrees of freedom

–  Once with FIR (p=0, q=10)

–  Once with IIR (p=5, q=5)

19

Padé Approximation Example

•  Since we can exactly match 11 values of i(n), it makes sense to set nd to 5, so that we can capture most of the energy with the 11 degrees of freedom

•  As always, the FIR is easy

–  Note that this means that we simply multiply the ideal filter unit sample response with a rectangular window (window method)

•  The IIR model follows the steps we used before

20
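A short sketch of the two designs on this slide (Python/NumPy, reusing the hypothetical pade() function from the earlier sketch; the ideal delayed halfband lowpass response sin(π(n−nd)/2)/(π(n−nd)) is an assumption here, since the slide's formula is not reproduced in this transcript):

```python
import numpy as np

nd = 5
n = np.arange(0, 50)
i = np.sinc((n - nd) / 2) / 2      # assumed ideal halfband lowpass, delayed by nd; i(nd) = 0.5

# FIR (p=0, q=10): Pade reduces to keeping i(0)..i(10) (rectangular window method)
b_fir = i[:11]

# IIR (p=5, q=5): match i(0)..i(10) exactly with the pade() sketch from above
a_iir, b_iir = pade(i[:11], p=5, q=5)
```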

Padé Approximation Example

21

Prony’s Method

•  Padé matches perfectly the number of samples in x(n) that corresponds to the degrees of freedom (p+q+1)

•  What happens after p+q+1 is of no concern in the Padé method

•  Prony does not match p+q+1 samples perfectly, but tries to use its degrees of freedom such that an overall mean squared error is minimized

22

[Figure: Impulse Response plots, Amplitude vs. n (samples)]

Prony’s Method

•  Prony uses the same trick as Padé, but is also concerned about values beyond p+q+1, where the equal sign does not hold anymore

•  The way the error is defined results in a linear problem, which is much easier to solve than the nonlinear equations of the direct least squares method

23

Prony’s Method

•  Since bq(n)=0 for n>q the error can be written explicitly (a(0)=1)

•  Instead of setting e(n)=0 for n=0,..,p+q as in the Padé approximation, Prony’s method begins by finding the coefficients ap(k) that minimize the squared error

•  As with Padé, since we focus on samples n>q, the error depends only on the coefficients ap(k)

24

Prony’s Method

•  These coefficients can be found by setting the partial derivatives to zero

•  Since the partial derivative of e*(n) with respect to ap*(k) is x*(n-k) this equation leads to the orthogonality principle

•  Substituting the error expression into this leads to

•  Or equivalently

25

Prony’s Method

•  This can be simplified by using the following definition, which is very similar to the sample autocorrelation sequence (here we are not dividing by the number of samples N, and the sum does not go over all samples, it starts at q+1)

•  This is now a set of p linear equations in the p unknowns ap(1),…, ap(p) referred to as the Prony normal equations

•  Or in matrix notation

26

Prony’s Method

•  The Prony normal equations can be expressed using the data matrix Xq containing p infinite-dimensional column vectors

•  The autocorrelation matrix Rx may be written in terms of Xq as follows

•  The vector of autocorrelations rx may also be expressed in terms of Xq as follows

•  Hence this is an equivalent form of the Prony normal equations

27

Prony’s Method

•  If Rx is nonsingular, then the coefficients ap(k) that minimize the MSE are

•  Or equivalently; note that this matrix is also called the pseudo-inverse

28

Prony’s Method

•  Now the value for the modeling error can be determined

•  It follows from the orthogonality principle that the second term is zero, therefore the minimum modeling error is

•  Which can be written in terms of the autocorrelation sequence

29

Prony’s Method

•  The normal equations can be written slightly differently, which will come in handy later on as the so-called augmented normal equations

•  Or in matrix notation

30

Prony’s Method

•  Once the coefficients ap(k) have been found, the coefficients bq(k) are found in the same fashion as with the Padé approximation.

•  In other words, the error is set to 0 for n=0,…,q.

•  This can be done using the convolution formula directly

•  Or with a matrix multiplication

31

Prony’s Method

32
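A minimal sketch of Prony's method (Python/NumPy), written in the overdetermined least-squares form discussed a few slides further on; a finite data record stands in for the infinite set of equations for n > q, a(0)=1, and the zeros are then found exactly as in the Padé approximation:

```python
import numpy as np

def prony(x, p, q):
    """Prony pole-zero model of x(0)..x(N), with a(0) = 1."""
    x = np.asarray(x, dtype=float)
    N = len(x) - 1
    xv = lambda n: x[n] if n >= 0 else 0.0
    # Poles: least squares solution of x(n) + sum_k a(k) x(n-k) = 0 for n = q+1..N
    A = np.array([[xv(n - k) for k in range(1, p + 1)] for n in range(q + 1, N + 1)])
    a = np.concatenate(([1.0], np.linalg.lstsq(A, -x[q + 1:], rcond=None)[0]))
    # Zeros: force the error to zero for n = 0..q (same step as in Pade)
    b = np.array([sum(a[k] * xv(n - k) for k in range(p + 1)) for n in range(q + 1)])
    return a, b

# e.g. the IIR (p=1, q=1) model of a length-21 pulse from the following example slides;
# trailing zeros are appended so that the finite record emulates the infinite error sum
x = np.concatenate((np.ones(21), np.zeros(5)))
a, b = prony(x, p=1, q=1)
```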

Prony’s Method Example •  Given a pulse x(n)

•  Find the IIR (p=1,q=1) model using Prony’s method

•  Prony’s normal equations

•  For p=1

•  In general

•  Hence

33

Prony’s Method Example •  This results in

•  The denominator of H(z) becomes

•  In general, the numerator coefficients are found with a matrix multiplication

•  In particular

•  Thus the model for x(n) is

34

Prony’s Method Example

•  For the minimum squared error we have

•  Since

•  This becomes

•  For example, if N=21

•  The error corresponds to a minimum squared error

•  The “true” error results in a squared true error of (this is the error the direct method tries to minimize)

35

Prony’s Method Example

36

[Figure: Impulse Response plots, Amplitude vs. n (samples)]

Prony’s Method Example

•  Comparing Prony’s solution with the solution Padé would produce

•  Using the last equation to find a(1) results in a(1)=-1 and using the top two equations to find b(0) and b(1) results in b(0)=1 and b(1)=0

•  The unit sample response is

•  The model error e(n) is

•  And the true error is 0 for n=0,…,21 and 1 for all other n!

37

Prony’s Method Example

•  The same lowpass as with the Padé approximation should be designed

•  Again we use an IIR (p=5,q=5) model and nd=5

•  Solving Prony’s normal equations for p=5

•  Using these coefficients to find the bq(k) coefficients

38

Prony’s Method Example

39

Prony’s Method

•  We may also formulate Prony’s method in terms of finding the least squares solution to a set of (infinitely many) overdetermined linear equations, all of which want the error to be zero; this is not possible, since we have only p+q+1 degrees of freedom

•  In matrix notation

•  For such an overdetermined set of linear equations the least squares solution can be found using the pseudo-inverse

40

Shanks’ Method

•  In Prony’s method the numerator coefficients are found by setting the error to zero for n=0,…,q

•  This forces the model to be exact for those values but does not take into account values greater than q

•  A better approach is to perform a least squares minimization of the model error over the entire length of the data record

41

Shanks’ Method

•  Note that H(z) can be interpreted as a cascade of two filters

•  Once Ap(z) has been determined, the unit sample response can be computed

•  Instead of forcing e’(n) to zero for the first q+1 values of n as in Prony’s method, Shanks’ method minimizes the squared error

42

Shanks’ Method

•  Note that this is the same error as with the direct method, but since the poles are already determined, solving for the zeros is a linear problem and hence much simpler

•  As we did with Prony’s method, we use the deterministic auto- and cross-correlation sequences

–  Note that here the lower limit starts at 0, while for Prony’s method the lower limit starts at q+1

43

Shanks’ Method

•  In matrix form, these equations are

•  These can be simplified by noticing

•  Since g(n)=0 for n<0 (causal filter) the last term is zero for k>=0 or l>=0 hence

•  Therefore each term rg(k,l) depends only on the difference between k and l

44

Shanks’ Method •  Therefore

•  Becomes now

•  Since rg(k-l)=rg*(l-k) (*)

•  The above equation can be written in matrix form

•  Or more compactly

(*) Note that there is an error in the book: rx(k-l)=rx*(l-k)

45

Shanks’ Method

•  The minimum squared error can now be found

•  Now e’(n) and g(n) are orthogonal, hence the second term is zero. Therefore the minimum error is

•  Or in terms of rx(k) and rxg(k)

•  Since rgx(-k)=rxg*(k)

46

Shanks’ Method

•  Shanks’ method can also be interpreted as finding a least squares solution to an overdetermined set of linear equations, by letting

•  Now writing this convolution as an overdetermined set of linear equations results in

•  Or equivalently

•  The pseudo-inverse will then find the least squares solution (Matlab: bq=G0\x0 )

47
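A minimal sketch of Shanks' step for the numerator (Python/SciPy), in the overdetermined form of this slide (the Matlab one-liner bq=G0\x0); the denominator coefficients a are assumed to come from Prony's method, with a(0)=1:

```python
import numpy as np
from scipy.signal import lfilter

def shanks_numerator(x, q, a):
    """Numerator bq(k) by a least squares fit of bq(n)*g(n) to x(n) over the whole record."""
    x = np.asarray(x, dtype=float)
    N = len(x) - 1
    # g(n): unit sample response of 1/Ap(z), truncated to the length of the data record
    delta = np.zeros(N + 1)
    delta[0] = 1.0
    g = lfilter([1.0], a, delta)
    # G0[n, k] = g(n - k)  (convolution matrix), n = 0..N, k = 0..q
    G0 = np.array([[g[n - k] if n - k >= 0 else 0.0 for k in range(q + 1)]
                   for n in range(N + 1)])
    b, *_ = np.linalg.lstsq(G0, x, rcond=None)   # pseudo-inverse solution
    return b

# e.g., with x and a as in the Prony sketch above:  b = shanks_numerator(x, q=1, a=a)
```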


Shanks’ Method

48

Shanks’ Method Example

•  Back to the unit pulse of length N

•  For an IIR model (p=1, q=1) Prony’s method (and also Shanks’ method) results in this denominator polynomial

•  Shanks’ method is now used to find the numerator polynomial

•  We need to know g(n)

49

Shanks’ Method Example

•  Finding g(n) by using the inverse z-transform results in

•  Now the auto- and cross-correlation sequences are needed

50

Shanks’ Method Example •  Therefore

•  And

•  Solving

•  For bq(0) and bq(1) results in

•  And for N=21 the model becomes with a true error of

51


All-Pole Modeling

•  The main advantage of all-pole models is that there are fast algorithms (Levinson-Durbin recursion) to solve the Prony normal equations. Another reason is that many physical processes, such as speech, can be well modeled with an all-pole model

•  The error that we are concerned with in Prony’s method for finding the ap(k) coefficients is (q=0)

52

All-Pole Modeling

•  Since x(n)=0 for n&lt;0, the error at time n=0 is equal to x(0)-b(0) and hence does not depend on the a coefficients. We can therefore include it in the sum of the squared errors and the minimization will still result in the same a coefficients

•  Following the steps of the Prony derivation we arrive at

•  Where now

•  Note for Prony the sum would start at q+1=1, but since we minimize the above error it starts at 0

53

All-Pole Modeling

•  Again this can be simplified: since x(n)=0 for n&lt;0, the term on the right is zero for k>=0 or l>=0. This means that rx(k,l) depends only on the difference between k and l

•  We define

•  Hence the all-pole normal equations are

•  Or in matrix form, using the conjugate symmetry

54

All-Pole Modeling •  The modeling error is given by

•  We still need to find b(0). The obvious choice would be

•  Since the entire unit sample response is scaled by b(0), it might be better to select b(0) such that the overall energy in x(n), rx(0), is equal to the overall energy of the unit sample response h(n), rh(0)

•  Without proof, this can be achieved by setting

55

All-Pole Modeling

•  As with the Padé approximation and Prony’s method, the all-pole modeling can be interpreted as finding the least squares solution to the following set of overdetermined linear equations

56
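A minimal sketch of this least-squares view for the all-pole case (Python/NumPy; a finite data record stands in for the infinite set of equations, which anticipates the autocorrelation method discussed later, and a(0)=1):

```python
import numpy as np

def allpole_ls(x, p):
    """All-pole coefficients from e(n) = x(n) + sum_k ap(k) x(n-k) = 0, n >= 1, in the LS sense."""
    x = np.asarray(x, dtype=float)
    N = len(x) - 1
    X = np.array([[x[n - k] if n - k >= 0 else 0.0 for k in range(1, p + 1)]
                  for n in range(1, N + 1)])
    a, *_ = np.linalg.lstsq(X, -x[1:], rcond=None)   # pseudo-inverse solution
    return np.concatenate(([1.0], a))
```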

All-Pole Modeling

57

All-Pole Modeling Example

•  Let us find a first-order all-pole model of the form

•  For a given signal x(n)

•  The autocorrelation sequence is

•  The all-pole normal equations are

•  Which for a first-order model results in

58

All-Pole Modeling Example

•  Therefore

•  And the modeling error is

•  Selecting b(0) to match the energy

•  The model becomes

59

All-Pole Modeling Example

•  Let us find a second-order all-pole model for the same signal x(n)

•  Which for a second-order model results in

•  In this example

•  Resulting in

•  The modeling error becomes (Note since rx(n>1)=0, the modeling error only depends on a(1))

•  Selecting b(0) to match the energy

•  Resulting in the model

60

All-Pole Modeling

•  The all-pole normal equations can again be brought into a special form, which contains the modeling error

61

Linear Prediction

•  We now establish the equivalence between all-pole signal modeling and linear prediction

•  Recall that Prony’s method finds the set of all-pole parameters that minimize the sum of the squared errors

62

FIR Least Squares Inverse Filters

•  Let us find an inverse filter for the filter g(n)

•  Or in the z-domain

•  In other words, we are looking for an equalizer H(z) such that G(z)H(z) = 1. Note that most of the time, equality is not possible, since equalizing lost frequency bands would require infinite amplification.

•  Furthermore, as we will see later on, noise amplification is also a problem of inverse filters

63

FIR Least Squares Inverse Filters

•  To be sure that the inverse filter is stable, we constrain it to be an FIR filter of length N

•  The error of the approximation is

•  Now the goal is to minimize the sum of the squared error

•  This is the same as Shanks’ method

64

FIR Least Squares Inverse Filters

•  Since this is Shanks’ method, the solution for the optimum least squares inverse filter is

•  Where

•  And

•  Since d(n)=δ(n) and g(n<0)=0

•  Hence the matrix equation becomes

•  Or compactly written

65

FIR Least Squares Inverse Filters •  The minimum squared error is

•  Since

•  Results in

•  Again, this can be formulated as finding the least squares solution to a set of overdetermined linear equations by setting e(n>=0)=0

•  This results in

•  Which can be solved with the pseudo-inverse

66

FIR Least Squares Inverse Example

•  The goal is to find the FIR least squares inverse for the system having a unit sample response (|α|<1)

•  FIR inverse of length N=2

•  Autocorrelation sequence of g(n)

67

FIR Least Squares Inverse Example

•  In general the normal equations are

•  In this example

•  Hence the solution for h(0) and h(1) is

•  With a squared error of

•  And a system function of

68

FIR Least Squares Inverse Filters with Delay

•  In many cases, constraining the least squares inverse filter to minimize the difference between hN(n)*g(n) and δ(n) is overly restrictive. Often a delay is tolerable, i.e., d(n)= δ(n-n0)

•  Clearly we need to solve

•  where rdg(k) has now changed to

•  Hence the equations that define the coefficients of the FIR least squares inverse filter are

•  And the minimum squared error is

69

FIR Least Squares Inverse Filters with Delay

•  Again, this can be formulated as finding the least squares solution to a set of overdetermined linear equations by setting e(n>=0)=0

•  This results in where the 1 on the right hand side is now at position n0+1 (in this example n0=2)

•  Which can be solved with the pseudo-inverse

70
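A minimal sketch of the FIR least squares inverse filter with delay (Python/SciPy; solve_toeplitz handles the Toeplitz normal equations, and n0=0 recovers the no-delay case of the earlier slides; the example g(n) at the end is purely illustrative):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def ls_inverse(g, N, n0=0):
    """Length-N FIR least squares inverse of g(n) for the target d(n) = delta(n - n0)."""
    g = np.asarray(g, dtype=float)
    L = len(g)
    # deterministic autocorrelation rg(k), k = 0..N-1
    rg = np.array([np.dot(g[k:], g[:L - k]) if k < L else 0.0 for k in range(N)])
    # rdg(k) = sum_n d(n) g(n - k) = g(n0 - k), zero outside the support of g
    rdg = np.array([g[n0 - k] if 0 <= n0 - k < L else 0.0 for k in range(N)])
    return solve_toeplitz((rg, rg), rdg)

# e.g. an N = 11 inverse with delay n0 = 5 for some illustrative length-3 g(n)
h = ls_inverse([1.0, 1.7, 0.72], N=11, n0=5)
```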

FIR Least Squares Inverse Filters with Delay Example

•  The goal is to find the FIR least squares inverse filter of length N=11 for the system

•  Since h(n) is of length 11 and g(n) is of length 3, hN(n)*g(n) is of length 13. Hence the delay n0 must be between 0 and 12

71

FIR Least Squares Inverse Filters

•  So far the goal was to find an FIR inverse filter which results in a pure delay

•  Sometimes it might be interesting to find a filter which results in something other than a pure delay

•  Since there were no assumptions about d(n) in the derivation, the same result holds

•  In the interpretation as an overdetermined set of linear equations, the right hand side simply becomes a vector containing d(n)

72

Finite Data Records

•  So far we have assumed that the data x(n) is available for all times

•  In reality, this is never the case and hence we need to deal with the fact that the data is only known in the interval [0, N]

•  There are two distinct approaches to this problem, the autocorrelation method and the covariance method.

•  Since they are most often used in the context of all-pole modeling, this is the focus of this section

73

The Autocorrelation Method

•  Assume x(n) is only known over the finite interval [0,N] and should be approximated using an all-pole model

•  Using Prony’s method, the coefficients ap(k) are found that minimize where

•  If x(n) is unknown outside [0,N], then e(n) cannot be evaluated for n<p or for n>N

•  So the autocorrelation method assumes that all values outside the interval [0,N] are zero

74

The Autocorrelation Method

•  Now we can use the Prony normal equations to find an all-pole model for the windowed signal

•  The main difference is that for the summation of the autocorrelation sequence, the lower limit is now k and the upper limit N

•  The Toeplitz structure of the normal equations is preserved, which means that the Levinson-Durbin recursion can be used to efficiently find the solution

75
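A minimal sketch of the autocorrelation method (Python/SciPy; solve_toeplitz stands in for the Levinson-Durbin recursion, and b(0) is set to the square root of the modeling error as one way to realize the energy-matching choice mentioned earlier — an assumption here, since that slide's formula is not reproduced in this transcript):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def autocorrelation_method(x, p):
    """All-pole model of x(0)..x(N) via the autocorrelation method (rectangular window)."""
    x = np.asarray(x, dtype=float)
    N = len(x) - 1
    # windowed (deterministic) autocorrelation estimate rx(k), k = 0..p
    rx = np.array([np.dot(x[k:], x[:N + 1 - k]) for k in range(p + 1)])
    # Toeplitz normal equations (solve_toeplitz plays the role of Levinson-Durbin)
    a = np.concatenate(([1.0], solve_toeplitz((rx[:p], rx[:p]), -rx[1:])))
    eps = rx[0] + np.dot(a[1:], rx[1:])     # minimum modeling error
    b0 = np.sqrt(eps)                       # energy-matching choice for b(0) (assumed)
    return a, b0, eps
```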

The Autocorrelation Method

76

Autocorrelation Method Example

•  Given the signal x(n), whose first N+1 values are known

•  Use the autocorrelation method to find a first-order all-pole model of the form

•  The normal equations are in general

•  And in this example (first-order model) it follows

77

Autocorrelation Method Example •  Since

and rx(k)=0 for k>N

•  Therefore

•  Hence

78

The Autocorrelation Method

•  Applying the window to x(n) forces the signal to zero outside the interval [0,N]. If this is a bad assumption, then the model will not be very good.

•  The signal in the previous example is the unit sample response of the all-pole filter, but clearly the solution is biased

•  Without proving it, a nice property of this windowing is that the all-pole model will be stable

79

The Autocorrelation Method

•  Again, these results can also be obtained as the least squares solution to an overdetermined set of linear equations using the pseudo-inverse

•  Specifically, in the autocorrelation method we would like to find a set of coefficients ap(k) so that the error e(n) is equal to zero for n>0

•  Or in matrix form

80

The Autocorrelation Method

•  Or in matrix form

•  And the solution can be found with the pseudo-inverse

81


The Autocorrelation Method

•  Clearly the autocorrelation method can also be used for pole-zero modeling

•  Simply find the poles with the windowed data and then use these poles and the windowed data to find the zeros

•  Since q is then no longer 0, the normal equations are no longer Toeplitz, and hence one of the main advantages of the autocorrelation method, the fast Levinson-Durbin recursion, is lost

82

The Autocorrelation Method

•  Besides just using a rectangular window, the autocorrelation method can also be used with other windows

•  One reason for this is that for n=0,1,..,p-1 the prediction is based on fewer than p previous values

•  By using, for example, a Hann window, the beginning and the end of the data are scaled down and hence it is easier to predict with fewer values

83


The Covariance Method

•  The covariance method does not make any assumptions about the values outside the interval [0,N]

•  This usually results in better models, but both advantages of the autocorrelation method are lost: the fast recursion (Toeplitz) and the guaranteed stability

•  Since the error in Prony’s method is defined as

•  It is clear that if x(n) is only known in the interval [0,N], the error can only be calculated in the interval [p,N]

84

The Covariance Method

•  This can be seen in the figure on the right

•  Hence it makes sense to define the sum of the squared errors such that only those e(n) which can actually be calculated are used

85

The Covariance Method

•  Since no data outside of the original interval is used, no assumptions about the signal outside of the original interval are necessary

•  Since the only difference to Prony’s method is the definition of the sum of the squared errors, the same normal equations still hold

•  The calculation of the autocorrelation sequence needs to reflect the fact that only data inside the original interval should be used

86

The Covariance Method

•  As with the autocorrelation method, the covariance method may be formulated as a least squares approximation problem

•  If we set the error equal to zero for n in the interval [p,N] we have a set of (N-p+1) linear equations for which the least squares solution can again be found using the pseudo-inverse

•  Note that these equations are a subset of the equations used for the autocorrelation method

87

The Covariance Method

88
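A minimal sketch of the covariance method in this least-squares form (Python/NumPy); note that, unlike the autocorrelation method, no samples outside [0,N] are touched and nothing is zero-padded:

```python
import numpy as np

def covariance_method(x, p):
    """All-pole model by setting e(n) = 0 for n = p..N in the least squares sense (a(0)=1)."""
    x = np.asarray(x, dtype=float)
    N = len(x) - 1
    # one equation per n = p..N, using only data inside [0,N]
    X = np.array([[x[n - k] for k in range(1, p + 1)] for n in range(p, N + 1)])
    a, *_ = np.linalg.lstsq(X, -x[p:], rcond=None)
    return np.concatenate(([1.0], a))
```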

Covariance Method Example

•  Let us revisit the problem where x(n) in the interval [0,N] is given by

•  The goal is to find a first order all-pole model using the covariance method

•  In general

•  In this example

•  Where

89

Covariance Method Example •  Again

•  Hence

•  Therefore

•  With

•  The model becomes

90

Comparison Example

•  The goal is to model x(n) as the unit sample response of a second-order all-pole filter

•  The first 20 values are

•  The autocorrelation method uses a windowed signal and then applies Prony’s method

•  The normal equations are

•  Where

91

Comparison Example

•  Evaluating this sum at k=0, 1 and 2 results in

•  Hence the normal equations become

•  Solving for a(1) and a(2) we get

•  Hence the denominator polynomial is

•  The modeling error is

•  Setting b(0) to match the energy

92

Comparison Example

•  The goal is to model x(n) as the unit sample response of a second-order all-pole filter

•  The first 20 values are

•  The covariance method uses a different definition of the error and then applies Prony’s method

•  The normal equations are

•  Where

93

Comparison Example •  Evaluating this sum we find

•  Hence the normal equations become singular! A lower order model is therefore possible, i.e., a(2)=0

•  Solving for a(1)

•  Hence the denominator polynomial is

•  Setting b(0) to 1 the model is

•  Matching the data perfectly for n=0,1,…,N

94

Stochastic ARMA Models

•  So far we have modeled deterministic signals as the unit sample response of an LTI system

•  In this section, the goal is to model stochastic signals as the output of a causal linear shift-invariant filter whose input is a unit variance white noise sequence

•  An ARMA(p,q) process may be generated by filtering a unit variance white noise v(n) with a causal linear shift-invariant filter having p poles and q zeros

95

Stochastic ARMA Models

•  The goal is now to find the missing coefficients by minimizing the mean square error

–  This though will result in a nonlinear problem

•  The autocorrelation sequence of an ARMA(p,q) process satisfies the Yule-Walker equations (Eq. 3.115)

•  Where cq(k) is the convolution of bq(k) and h*(-k)

•  And rx(k) is a statistical autocorrelation

96

Stochastic ARMA Models

•  Since h(n) is causal, cq(k)=0 for k>q and the Yule-Walker equations for k>q are a function only of the coefficients ap(k)

•  This can be expressed in matrix form for k=q+1,…,q+p which is a set of p linear equations with p unknowns ap(k)

–  These equations are called the Modified Yule-Walker equations

–  Hence this approach is called the Modified Yule-Walker Equation (MYWE) method

–  If the autocorrelation is unknown, then an estimate is used

97
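A minimal sketch of the MYWE step (Python/NumPy; a real-valued process is assumed, so rx(-k)=rx(k), and rx may be an estimated autocorrelation sequence):

```python
import numpy as np

def modified_yule_walker(rx, p, q):
    """AR part ap(k) of an ARMA(p,q) model from rx(k), k = 0..q+p (real-valued process)."""
    r = lambda k: rx[abs(k)]      # rx(-k) = rx(k) for a real process
    # equations rx(q+i) + sum_k ap(k) rx(q+i-k) = 0 for i = 1..p
    R   = np.array([[r(q + i - k) for k in range(1, p + 1)] for i in range(1, p + 1)])
    rhs = -np.array([r(q + i) for i in range(1, p + 1)])
    return np.concatenate(([1.0], np.linalg.solve(R, rhs)))
```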

Stochastic ARMA Models

•  After the coefficients ap(k) have been determined, the next step is to find the MA coefficients bq(k)

•  One approach is filtering x(n) with Ap(z) resulting in y(n) with power spectrum

•  Now the moving average coefficients bq(k) may be estimated from y(n) using one of the moving average techniques presented later on

98

Stochastic ARMA Models

•  A direct approach for finding the coefficients bq(k) uses the Yule-Walker equations for k=0,…,q

•  Since cq(k)=0 for k>q the sequence cq(k) is then known for all k>=0 (but not for k<0)

•  We denote the z-transform of the causal part of cq(k)

•  Similarly the anti-causal part is

99

Stochastic ARMA Models •  Recall

•  Hence

•  Multiplying Cq(z) by Ap*(1/z*) results in the power spectrum of the MA(q) process

•  Since ap(k)=0 for k&lt;0, Ap*(1/z*) only contains positive powers of z. The causal part of Py(z) is therefore

100

Stochastic ARMA Models

•  Although cq(k) is unknown for k&lt;0, the causal part of Py(z) may be computed from the causal part of cq(k) and the coefficients ap(k)

•  Using the conjugate symmetry of Py(z) we may then determine Py(z) for all z

•  Finally, performing a spectral factorization of Py(z) produces the polynomial Bq(z)

101

Stochastic ARMA Models Example

•  The goal is to find an ARMA(1,1) model for a real-valued random process with

•  The modified Yule-Walker equations are in general

•  In this example

•  Resulting in

•  This was the easy part, now let’s find the MA coefficients

102

Stochastic ARMA Models Example •  In general

•  In this example

•  Resulting in

•  Hence

•  Multiplying with

•  Results in

•  Hence the causal part of Py(z)

103

Stochastic ARMA Models Example •  Using the symmetry of Py(z)

•  Which, after spectral factorization results in

•  Which in turn leads to the desired ARMA(1,1) model

104

Stochastic ARMA Models

•  Just like the Padé approximation, we only used the autocorrelation sequence between q and q+p to estimate ap(k)

•  If the autocorrelation sequence is known for values larger than q+p, then this knowledge can also be used in an extension quite similar to Prony’s method

105

Stochastic ARMA Models

•  From the last L-q equations, we get

•  Which is a set of overdetermined linear equations in the unknowns ap(k)

•  Hence the least squares solution can be found using the pseudo-inverse

106

Stochastic AR Models

•  This is clearly a special case of an ARMA(p,q=0) model

•  Hence its autocorrelation sequence must satisfy the Yule-Walker equations

•  Writing these in matrix form for k>0 using the conjugate symmetry of rx(k)

•  Solving these p equations for the p unknowns ap(k) is called the Yule-Walker method

107

Stochastic AR Models

•  Note that these equations are equivalent to the normal equations for all-pole modeling using Prony’s method. The only difference is in the definition of the autocorrelation sequence

•  Statistical definition for the Yule-Walker method

•  Deterministic definition for Prony’s method

•  But what if we need to estimate the autocorrelation sequence for the Yule-Walker method?

108

Stochastic MA Models

•  This is clearly a special case of an ARMA(p=0,q) model

•  The Yule-Walker equations (which are nonlinear) relating the autocorrelation sequence to the filter coefficients bq(k) are

•  Instead of solving these directly, one approach uses spectral factorization. Since the autocorrelation sequence of an MA(q) process is zero for |k|>q, the power spectrum has the following form

109

Stochastic MA Models

•  Using the spectral factorization given in Eq. 3.102, where Q(z) is a minimum phase monic (q(0)=1) polynomial of degree q, i.e., |αk|&lt;=1

•  Hence

•  And Q(z) is the minimum phase version of Bq(z) that is formed by replacing each zero of Bq(z) that lies outside the unit circle with one that lies inside the unit circle at the conjugate reciprocal location

110

Stochastic MA Models

•  In summary, given the autocorrelation sequence, we get the power spectrum, which is then factored into a minimum phase polynomial Q(z) and a maximum phase polynomial Q*(1/z*)

•  Hence the process can now be modeled as the output of a minimum phase FIR filter, driven by a unit variance white noise

•  Note the model is not unique. Any one (1-αkz^-1) factor in Q(z) can be replaced by (1-αk*z^-1)

111
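A minimal sketch of this factorization route (Python/NumPy; a real-valued process and no zeros of the power spectrum on the unit circle are assumed):

```python
import numpy as np

def ma_from_autocorrelation(rx):
    """MA(q) coefficients bq(n) from rx(0)..rx(q) by spectral factorization."""
    rx = np.asarray(rx, dtype=float)
    c = np.concatenate((rx[:0:-1], rx))            # coefficients of z^q * Px(z)
    zeros = np.roots(c)                            # zeros come in reciprocal pairs
    alphas = zeros[np.abs(zeros) < 1.0]            # keep the minimum-phase zeros
    bq = np.real(np.poly(alphas))                  # monic Q(z), i.e. q(0) = 1
    return bq * np.sqrt(rx[0] / np.dot(bq, bq))    # scale so that sum_n bq(n)^2 = rx(0)
```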

Stochastic MA Models Example

•  For a given autocorrelation sequence, find an MA(1) model

•  The power spectrum is

•  Performing spectral factorization

•  In general

•  In particular σ0=4 and αk=-1/4, hence as a minimum phase FIR filter or as a maximum phase FIR filter

112

Power Spectrum Estimation

•  Spectrum estimation is an important problem in stochastic modeling

•  Since the autocorrelation is usually unknown, Px must be estimated from a sample realization

•  A direct approach is to estimate the autocorrelation sequence and then transform. But with N+1 values of x(n) we can only estimate rx(-N<=k<=N)

113

Power Spectrum Estimation

•  This estimate is limited by two factors

–  Since the autocorrelation was estimated, all errors in this estimation will directly be in the estimation of Px

–  Since the estimate of rx is of limited length 2N+1, the frequency resolution of Px is limited too

•  The estimate can be improved by including prior knowledge about the process, for example, that the process x(n) is an AR(p) process

•  Now the Yule-Walker method with estimated autocorrelations could be used to estimate the missing parameters

114
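A minimal sketch of this model-based AR spectrum estimate (Python/SciPy; the biased autocorrelation estimate and the b(0) choice are assumptions about the exact estimator used on the slides):

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz

def ar_spectrum(x, p, nfft=512):
    """AR(p) power spectrum estimate via the Yule-Walker method with estimated rx(k)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    rx = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(p + 1)])  # biased estimate
    a = np.concatenate(([1.0], solve_toeplitz((rx[:p], rx[:p]), -rx[1:])))
    b0_sq = rx[0] + np.dot(a[1:], rx[1:])       # |b(0)|^2 from the modeling error
    w, Ainv = freqz([1.0], a, worN=nfft)        # 1 / Ap(e^jw)
    return w, b0_sq * np.abs(Ainv) ** 2
```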

Spectrum Estimation Example

•  Given 64 samples of an AR(4) process generated by filtering unit variance zero mean white noise through the following model

•  The model coefficients are

•  Which corresponds to a filter having a pair of poles at and at

115

Spectrum Estimation Example

•  Estimating the autocorrelation sequence for |k|&lt;N using the following equation and substituting the resulting values, we obtain the power spectrum estimate shown in (a)

•  On the other hand, using the estimates of the autocorrelation in the Yule-Walker equations we get

•  Using these coefficients in results in the power spectrum estimate in (b)

116

Spectrum Estimation Example

117

Exercises

118

Exercise

119

Solution

120

Exercise

121

Solution

122

Solution

123

Solution

124

Exercise

125

Solution

126

Exercise

127

Solution

128

Exercise

129

Solution

130

Exercise

131

Solution

132

Exercise

133

Solution

134

Exercise

135

Solution

136

Solution

137