
Test (2009) 18:529–545
DOI 10.1007/s11749-008-0112-z

ORIGINAL PAPER

Beta autoregressive moving average models

Andréa V. Rocha · Francisco Cribari-Neto

Received: 1 November 2007 / Accepted: 26 May 2008 / Published online: 13 June 2008
© Sociedad de Estadística e Investigación Operativa 2008

Abstract We build upon the class of beta regressions introduced by Ferrari and Cribari-Neto (J. Appl. Stat. 31:799–815, 2004) to propose a dynamic model for continuous random variates that assume values in the standard unit interval (0, 1). The proposed βARMA model includes both autoregressive and moving average dynamics, and also includes a set of regressors. We discuss parameter estimation, hypothesis testing, goodness-of-fit assessment and forecasting. In particular, we give closed-form expressions for the score function and for Fisher's information matrix. An application that uses real data is presented and discussed.

Keywords ARMA · Beta distribution · Beta ARMA · Forecasts

Mathematics Subject Classification (2000) 62M10 · 91B84

1 Introduction

The beta distribution is commonly used for modeling experiments in which the variable of interest is continuously distributed in the interval (a, b), where a and b are known scalars and a < b, since its density can assume quite different shapes depending on the values of the two parameters that index the distribution. A particularly useful situation occurs when a = 0 and b = 1, so that the random variable assumes values in the standard unit interval, (0, 1); this is the case, e.g., of rates or proportions.

The beta probability density function is given by

\[
\pi(y; p, q) = \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\, y^{p-1} (1-y)^{q-1}, \qquad 0 < y < 1, \qquad (1)
\]

A.V. Rocha (✉) · F. Cribari-Neto
Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, Recife, PE, 50740-540, Brazil
e-mail: [email protected]


where p > 0, q > 0, and Γ(·) is the gamma function. The mean and variance of y are, respectively,

\[
\mathrm{E}(y) = \frac{p}{p+q} \quad\text{and}\quad \mathrm{Var}(y) = \frac{pq}{(p+q)^2 (p+q+1)}.
\]

The mode of the distribution exists when both p and q are greater than one, in which case mode(y) = (p − 1)/(p + q − 2). The uniform distribution is a special case of (1) when p = q = 1. Estimation of p and q can be carried out by maximum likelihood. Small sample bias adjustments to the maximum likelihood estimators of p and q were obtained by Cribari-Neto and Vasconcellos (2002).

Ferrari and Cribari-Neto (2004) proposed a regression model in which the dependent variable is beta distributed. Their parameterization is as follows.¹ Let μ = p/(p + q) and φ = p + q, i.e., p = μφ and q = (1 − μ)φ; here, 0 < μ < 1 and φ > 0. It then follows that the mean and the variance of y are, respectively,

\[
\mathrm{E}(y) = \mu \quad\text{and}\quad \mathrm{Var}(y) = \frac{V(\mu)}{1+\phi},
\]

where V(μ) = μ(1 − μ). Note that φ can be interpreted as a precision parameter in the sense that, for a given value of μ, the larger the value of φ, the smaller the variance of y.
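To make the reparameterization concrete, the short R sketch below (R being the software used later in Sect. 5) evaluates the beta density under both parameterizations and reports the implied mean and variance; the numerical values of mu and phi are illustrative only and do not come from the paper.

## Illustrative values only; any 0 < mu < 1 and phi > 0 would do.
mu  <- 0.3    # mean parameter
phi <- 20     # precision parameter

p <- mu * phi          # shape1 of the standard (p, q) parameterization
q <- (1 - mu) * phi    # shape2

y <- 0.25
dbeta(y, shape1 = p, shape2 = q)                     # density via the (p, q) form
exp(lgamma(phi) - lgamma(mu * phi) - lgamma((1 - mu) * phi) +
    (mu * phi - 1) * log(y) + ((1 - mu) * phi - 1) * log(1 - y))   # same value, (mu, phi) form

c(mean = mu, variance = mu * (1 - mu) / (1 + phi))   # E(y) and Var(y)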

Using this parameterization, Ferrari and Cribari-Neto (2004) defined a regression model which in many aspects resembles the class of generalized linear models (see, for example, Nelder and Wedderburn 1972, and McCullagh and Nelder 1989). Their model, however, is not a generalized linear model (GLM).

Our chief goal in this paper is to propose a time series model for random variables that assume values in the standard unit interval. The approach is based on the class of beta regression models of Ferrari and Cribari-Neto (2004). Our approach is also similar to those of Benjamin et al. (2003) and Shephard (1995) (see also Li 1994, and Fokianos and Kedem 2004), who have developed dynamic models for random variables in the exponential family. We note that Zeger and Qaqish (1988) proposed the so-called Markov regression models (which extend the class of GLMs) and that Li (1991) developed goodness-of-fit tests for such models. In this paper, we propose the beta autoregressive moving average model (βARMA). It can be used to model and forecast variates that assume values in the standard unit interval, such as rates and proportions. The use of the βARMA model avoids the need to transform the data prior to modeling. Moreover, the distributions of rates and proportions are typically asymmetric and, hence, Gaussian-based inference is not appropriate. The βARMA model naturally accommodates asymmetries and also non-constant dispersion.

The paper unfolds as follows. Section 2 introduces the proposed model, Sect. 3 focuses on parameter estimation, Sect. 4 considers further inference strategies and prediction, and Sect. 5 illustrates the methodology by applying the model to real data. Finally, concluding remarks are given in Sect. 6.

¹ For an alternative formulation of the class of beta regressions, see Vasconcellos and Cribari-Neto (2005).


2 The model

Our goal is to define a dynamic model for beta distributed random variables observed over time. For both regression and time series analysis it is typically more convenient to work with the mean response and also with a precision (or dispersion) parameter. Therefore, we shall employ the beta parameterization given in Ferrari and Cribari-Neto (2004).

We shall assume that the response is continuous and takes values in the standard unit interval (0, 1). We note, however, that the proposed model is also useful in situations where the response is restricted to the interval (a, b), where a and b are known scalars (a < b). In this case, one can model (y − a)/(b − a) instead of modeling y directly. We shall also assume that the covariates xt, t = 1, . . . , n, where xt = (xt1, . . . , xtk)′, are non-random. Here, n denotes the sample size and k < n.

Let yt, t = 1, . . . , n, be random variables and assume that the conditional distribution of each yt, given the previous information set Ft−1 (i.e., the smallest σ-algebra such that the variables y1, . . . , yt−1 are measurable), follows the beta distribution. That is, the conditional density of yt given Ft−1 is

\[
f(y_t \mid \mathcal{F}_{t-1}) = \frac{\Gamma(\phi)}{\Gamma(\mu_t\phi)\,\Gamma((1-\mu_t)\phi)}\, y_t^{\mu_t\phi-1} (1-y_t)^{(1-\mu_t)\phi-1}, \qquad 0 < y_t < 1, \qquad (2)
\]

where E(yt | Ft−1) = μt and Var(yt | Ft−1) = V(μt)/(1 + φ) are, respectively, the conditional mean and the conditional variance of yt; here, V(μt) = μt(1 − μt).

In the class of beta regression models (see Ferrari and Cribari-Neto 2004), μt is related to a linear predictor, ηt, through a twice differentiable strictly monotonic link function g : (0, 1) → R. The most commonly used link functions are the logit, probit, and complementary log–log links. Unlike the linear predictor of the beta regression model, in the systematic component of the βARMA specification there is an additional component, τt, which allows autoregressive and moving average terms to be included additively. Thus, a general model for μt is given by

\[
g(\mu_t) = \eta_t = x_t'\beta + \tau_t,
\]

where β = (β1, . . . , βk)′ is a set of unknown linear parameters and τt is an ARMA component which shall be described below and is similar to what is given in Benjamin et al. (2003).

We shall now motivate the definition of the ARMA component τt. Consider an ARMA(p, q) model initially as a function of a term ξt, such that ξt = g(yt) − x′tβ. Then,

\[
\xi_t = \alpha + \sum_{i=1}^{p} \varphi_i \xi_{t-i} + \sum_{j=1}^{q} \theta_j r_{t-j} + r_t, \qquad (3)
\]

where rt denotes a random error and α ∈ R is a constant. Although we have not defined rt, it is assumed that E(rt | Ft−1) = 0. Taking conditional expectations with respect to the σ-algebra Ft−1 in (3), we obtain the approximate model

\[
\tau_t = \alpha + \sum_{i=1}^{p} \varphi_i \xi_{t-i} + \sum_{j=1}^{q} \theta_j r_{t-j}.
\]


Note that ξt−i with i > 0 is Ft−1-measurable, and E(ξt | Ft−1) ≈ τt. Therefore, we obtain the following expression for τt:

\[
\tau_t = \alpha + \sum_{i=1}^{p} \varphi_i \bigl\{ g(y_{t-i}) - x_{t-i}'\beta \bigr\} + \sum_{j=1}^{q} \theta_j r_{t-j},
\]

where xt ∈ Rk, β = (β1, . . . , βk)′, k < n, and p, q ∈ N are, respectively, the autoregressive and moving average orders. The ϕ's and the θ's are the autoregressive and moving average parameters, respectively, and rt is an error. Finally, since τt = g(μt) − x′tβ, we propose the following general model for the mean μt:

\[
g(\mu_t) = \alpha + x_t'\beta + \sum_{i=1}^{p} \varphi_i \bigl\{ g(y_{t-i}) - x_{t-i}'\beta \bigr\} + \sum_{j=1}^{q} \theta_j r_{t-j}. \qquad (4)
\]

The βARMA(p, q) model is defined by (2) and (4). It is noteworthy that both the fitted values and the out-of-sample forecasts obtained using the βARMA model will belong to the standard unit interval. There are several choices for the moving average error terms; for example, errors measured on the original scale (i.e., yt − μt), on the predictor scale (i.e., g(yt) − ηt), etc. What is required is that the error rt be measurable with respect to Ft.
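As an illustration of how (4) generates the conditional means, the R sketch below (our own, not the authors' code) computes μt recursively for given parameter values, taking the moving average error on the original scale, rt = yt − μt (one of the choices mentioned above), using the logit link, and setting the first max(p, q) errors to zero as in the conditional approach of Sect. 3; all object names are illustrative.

## A minimal sketch: builds mu_t from (4) for a given parameter configuration.
betaarma_mu <- function(y, X, alpha, beta, ar, ma) {
  n <- length(y); p <- length(ar); q <- length(ma); m <- max(p, q)
  eta <- mu <- r <- rep(0, n)                      # first m errors set to zero
  for (t in (m + 1):n) {
    ar_term <- sum(ar * (qlogis(y[t - seq_len(p)]) -
                         drop(X[t - seq_len(p), , drop = FALSE] %*% beta)))
    ma_term <- if (q > 0) sum(ma * r[t - seq_len(q)]) else 0
    eta[t] <- alpha + drop(X[t, ] %*% beta) + ar_term + ma_term
    mu[t]  <- plogis(eta[t])                       # g^{-1} for the logit link
    r[t]   <- y[t] - mu[t]                         # moving average error
  }
  list(mu = mu, r = r)
}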

Let us obtain the mean and variance of two errors. For yt − μt, we have

\[
\mathrm{E}(y_t - \mu_t \mid \mathcal{F}_{t-1}) = 0 \quad\text{and}\quad \mathrm{Var}(y_t - \mu_t \mid \mathcal{F}_{t-1}) = \frac{V(\mu_t)}{1+\phi},
\]

where V(μt) = μt(1 − μt). In particular, E(yt − μt) = 0 and Var(yt − μt) = V(μt)/(1 + φ). Note that the errors are orthogonal, since for i < j

\[
\mathrm{E}\bigl( (y_i - \mu_i)(y_j - \mu_j) \bigr) = \mathrm{E}\bigl( (y_i - \mu_i)\, \mathrm{E}(y_j - \mu_j \mid \mathcal{F}_{j-1}) \bigr) = 0. \qquad (5)
\]

Since g(·) is continuously differentiable, we can Taylor-expand it as

\[
g(y_t) \approx g(\mu_t) + g'(\mu_t)(y_t - \mu_t) \;\Rightarrow\; g(y_t) - g(\mu_t) \approx g'(\mu_t)(y_t - \mu_t).
\]

Moreover, ηt = g(μt); then, for the error g(yt) − ηt,

\[
\mathrm{E}\bigl( g(y_t) - \eta_t \mid \mathcal{F}_{t-1} \bigr) \approx \mathrm{E}\bigl( g'(\mu_t)(y_t - \mu_t) \mid \mathcal{F}_{t-1} \bigr) = 0.
\]

Given that g(·) is twice differentiable, it follows from the delta method that

\[
\mathrm{Var}\bigl( g(y_t) - \eta_t \mid \mathcal{F}_{t-1} \bigr) \approx \bigl( g'(\mu_t) \bigr)^2 \frac{V(\mu_t)}{1+\phi}.
\]

In particular, E(g(yt) − ηt) ≈ 0 and Var(g(yt) − ηt) ≈ (g′(μt))²V(μt)/(1 + φ). With an analogous argument to the one used with (5), we conclude that these errors are also approximately orthogonal.


3 Parameter estimation

The estimation of the parameters that index the βARMA model can be carried out by maximum likelihood. Let us denote the vector of parameters as γ = (α, β′, φ, ϕ′, θ′)′, where ϕ = (ϕ1, . . . , ϕp)′ and θ = (θ1, . . . , θq)′. As noted earlier, we assume that the covariates xt are non-stochastic.

The log-likelihood function for the parameter vector γ, conditional on the first m observations, where m = max{p, q}, is ℓ = Σ_{t=m+1}^{n} log f(yt | Ft−1), with f(yt | Ft−1) given in (2). Expectations are also taken in conditional fashion. Note that, conditional on Fm, the first m errors are zero (or approximately zero). Thus, in the construction of the conditional log-likelihood function, the first q errors are assumed to equal zero.

3.1 Score vector

Let log f(yt | Ft−1) = ℓt(μt, φ). Then

\[
\ell_t(\mu_t, \phi) = \log\Gamma(\phi) - \log\Gamma(\mu_t\phi) - \log\Gamma\bigl((1-\mu_t)\phi\bigr) + (\mu_t\phi - 1)\log y_t + \bigl\{(1-\mu_t)\phi - 1\bigr\}\log(1-y_t).
\]

Therefore, the conditional log-likelihood function is

\[
\ell = \sum_{t=m+1}^{n} \ell_t(\mu_t, \phi).
\]

Thus,

\[
\frac{\partial \ell}{\partial \alpha} = \sum_{t=m+1}^{n} \frac{\partial \ell_t(\mu_t, \phi)}{\partial \mu_t}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \alpha}.
\]

Note that dμt/dηt = 1/g′(μt). We also have that

\[
\frac{\partial \ell_t(\mu_t, \phi)}{\partial \mu_t} = \phi \left[ \log\frac{y_t}{1-y_t} - \bigl\{ \psi(\mu_t\phi) - \psi\bigl((1-\mu_t)\phi\bigr) \bigr\} \right], \qquad (6)
\]

where ψ(·) is the digamma function, i.e., ψ(z) = d log Γ(z)/dz for z > 0. Let y∗t = log{yt/(1 − yt)} and μ∗t = ψ(μtφ) − ψ((1 − μt)φ). Then,

\[
\frac{\partial \ell}{\partial \alpha} = \phi \sum_{t=m+1}^{n} \bigl( y_t^{*} - \mu_t^{*} \bigr) \frac{1}{g'(\mu_t)}.
\]

Additionally, for l = 1, . . . , k,

\[
\frac{\partial \ell}{\partial \beta_l} = \sum_{t=m+1}^{n} \frac{\partial \ell_t(\mu_t, \phi)}{\partial \mu_t}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \beta_l}.
\]


Then,

\[
\frac{\partial \ell}{\partial \beta_l} = \phi \sum_{t=m+1}^{n} \bigl( y_t^{*} - \mu_t^{*} \bigr) \frac{1}{g'(\mu_t)} \left( x_{tl} - \sum_{i=1}^{p} \varphi_i x_{(t-i)l} \right).
\]

Furthermore,

\[
\frac{\partial \ell}{\partial \phi} = \sum_{t=m+1}^{n} \Bigl\{ \mu_t \bigl( y_t^{*} - \mu_t^{*} \bigr) + \log(1-y_t) - \psi\bigl((1-\mu_t)\phi\bigr) + \psi(\phi) \Bigr\}.
\]

Note also that, for i = 1, . . . , p,

\[
\frac{\partial \ell}{\partial \varphi_i} = \sum_{t=m+1}^{n} \frac{\partial \ell_t(\mu_t, \phi)}{\partial \mu_t}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \varphi_i},
\]

which yields

\[
\frac{\partial \ell}{\partial \varphi_i} = \phi \sum_{t=m+1}^{n} \bigl( y_t^{*} - \mu_t^{*} \bigr) \frac{1}{g'(\mu_t)} \bigl( g(y_{t-i}) - x_{t-i}'\beta \bigr).
\]

Finally, for j = 1, . . . , q,

\[
\frac{\partial \ell}{\partial \theta_j} = \sum_{t=m+1}^{n} \frac{\partial \ell_t(\mu_t, \phi)}{\partial \mu_t}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \theta_j}.
\]

Therefore,

\[
\frac{\partial \ell}{\partial \theta_j} = \phi \sum_{t=m+1}^{n} \bigl( y_t^{*} - \mu_t^{*} \bigr) \frac{1}{g'(\mu_t)}\, r_{t-j}.
\]

It is now possible to obtain the score vector U(γ). Let y∗ = (y∗m+1, . . . , y∗n)′, μ∗ = (μ∗m+1, . . . , μ∗n)′ and T = diag{1/g′(μm+1), . . . , 1/g′(μn)}. Let also 1 be an (n − m) × 1 vector of ones, M be the (n − m) × k matrix with (i, j)th element given by x(i+m)j − Σ_{l=1}^{p} ϕl x(i+m−l)j, P be the (n − m) × p matrix whose (i, j)th element equals g(yi+m−j) − x′i+m−jβ, and R be the (n − m) × q matrix with (i, j)th element given by ri+m−j. Hence,

\[
U_\alpha(\gamma) = \phi\, \mathbf{1}' T (y^{*} - \mu^{*}), \qquad
U_\beta(\gamma) = \phi\, M' T (y^{*} - \mu^{*}),
\]
\[
U_\phi(\gamma) = \sum_{t=m+1}^{n} \Bigl\{ \mu_t \bigl( y_t^{*} - \mu_t^{*} \bigr) + \log(1-y_t) - \psi\bigl((1-\mu_t)\phi\bigr) + \psi(\phi) \Bigr\},
\]

\[
U_\varphi(\gamma) = \phi\, P' T (y^{*} - \mu^{*}),
\]

and

\[
U_\theta(\gamma) = \phi\, R' T (y^{*} - \mu^{*}).
\]


Therefore, the score vector is

\[
U(\gamma) = \bigl( U_\alpha(\gamma),\; U_\beta(\gamma)',\; U_\phi(\gamma),\; U_\varphi(\gamma)',\; U_\theta(\gamma)' \bigr)',
\]

which is of dimension (k + p + q + 2) × 1. The conditional maximum likelihood estimator (CMLE) of γ is obtained as the solution of the system of equations given by U(γ) = 0. Note that it does not have closed form. Hence, it has to be numerically obtained by maximizing the conditional log-likelihood function using a nonlinear optimization algorithm, such as a Newton or quasi-Newton algorithm (see Nocedal and Wright 1999).
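As an illustration of this numerical optimization step, the R sketch below (our own, not the authors' implementation) writes the conditional log-likelihood of a covariate-free βAR(p) model with logit link and maximizes it with optim() using BFGS; the precision parameter is optimized on the log scale so that φ > 0 is respected, and all function names are ours.

## Negative conditional log-likelihood of a betaAR(p) model (no covariates, logit link).
betaar_loglik <- function(par, y, p) {
  alpha <- par[1]; ar <- par[2:(p + 1)]; phi <- exp(par[p + 2])   # phi > 0 via exp()
  ll <- 0
  for (t in (p + 1):length(y)) {
    mu <- plogis(alpha + sum(ar * qlogis(y[t - seq_len(p)])))
    ll <- ll + dbeta(y[t], shape1 = mu * phi, shape2 = (1 - mu) * phi, log = TRUE)
  }
  -ll                                              # optim() minimizes
}

betaar_fit <- function(y, p) {
  start <- c(qlogis(mean(y)), rep(0, p), log(10))  # crude starting values
  opt <- optim(start, betaar_loglik, y = y, p = p, method = "BFGS", hessian = TRUE)
  list(par = opt$par,                              # optimizer scale (precision is log(phi))
       estimates = c(opt$par[1:(p + 1)], exp(opt$par[p + 2])),
       logLik = -opt$value, hessian = opt$hessian)
}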

3.2 Conditional Fisher’s information matrix

In what follows, λi and δi will be used as surrogates for βi, ϕi or θi. We have

\[
\frac{\partial^2 \ell}{\partial \lambda_i\, \partial \delta_j}
= \sum_{t=m+1}^{n} \frac{\partial}{\partial \mu_t} \!\left( \frac{\partial \ell_t(\mu_t,\phi)}{\partial \mu_t}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \delta_j} \right) \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \lambda_i}
= \sum_{t=m+1}^{n} \left[ \frac{\partial^2 \ell_t(\mu_t,\phi)}{\partial \mu_t^2}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \delta_j}
+ \frac{\partial \ell_t(\mu_t,\phi)}{\partial \mu_t}\, \frac{\partial}{\partial \mu_t}\!\left( \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \delta_j} \right) \right] \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \lambda_i}.
\]

Since we are working with the conditional likelihood, we know from the regularity conditions that E(∂ℓt(μt, φ)/∂μt | Ft−1) = 0; in particular, we have that E(∂ℓt(μt, φ)/∂μt) = 0.

We also note that ∂ηt/∂βl = xtl − Σ_{i=1}^{p} ϕi x(t−i)l, ∂ηt/∂ϕi = g(yt−i) − x′t−iβ and ∂ηt/∂θj = rt−j are Ft−1-measurable (since Ft is a filtration). Thus, it follows from the regularity conditions that

\[
\mathrm{E}\!\left( \frac{\partial^2 \ell}{\partial \lambda_i\, \partial \delta_j} \,\Big|\, \mathcal{F}_{t-1} \right)
= \sum_{t=m+1}^{n} \mathrm{E}\!\left( \frac{\partial^2 \ell_t(\mu_t,\phi)}{\partial \mu_t^2} \,\Big|\, \mathcal{F}_{t-1} \right) \left( \frac{d\mu_t}{d\eta_t} \right)^{\!2} \frac{\partial \eta_t}{\partial \delta_j}\, \frac{\partial \eta_t}{\partial \lambda_i}.
\]

From (6) we obtain

\[
\frac{\partial^2 \ell_t(\mu_t,\phi)}{\partial \mu_t^2} = -\phi^2 \bigl\{ \psi'(\mu_t\phi) + \psi'\bigl((1-\mu_t)\phi\bigr) \bigr\}.
\]

Furthermore,

\[
\mathrm{E}\!\left( \frac{\partial^2 \ell}{\partial \lambda_i\, \partial \delta_j} \,\Big|\, \mathcal{F}_{t-1} \right)
= -\phi^2 \sum_{t=m+1}^{n} \frac{\psi'(\mu_t\phi) + \psi'\bigl((1-\mu_t)\phi\bigr)}{g'(\mu_t)^2}\, \frac{\partial \eta_t}{\partial \delta_j}\, \frac{\partial \eta_t}{\partial \lambda_i}.
\]


Note that

\[
\frac{\partial^2 \ell}{\partial \lambda_i\, \partial \alpha}
= \sum_{t=m+1}^{n} \left[ \frac{\partial^2 \ell_t(\mu_t,\phi)}{\partial \mu_t^2}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \lambda_i}
+ \frac{\partial \ell_t(\mu_t,\phi)}{\partial \mu_t}\, \frac{\partial}{\partial \mu_t}\!\left( \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \lambda_i} \right) \right] \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \alpha}.
\]

Hence,

\[
\mathrm{E}\!\left( \frac{\partial^2 \ell}{\partial \lambda_i\, \partial \alpha} \,\Big|\, \mathcal{F}_{t-1} \right)
= -\phi^2 \sum_{t=m+1}^{n} \frac{\psi'(\mu_t\phi) + \psi'\bigl((1-\mu_t)\phi\bigr)}{g'(\mu_t)^2}\, \frac{\partial \eta_t}{\partial \lambda_i}.
\]

Moreover,

\[
\frac{\partial^2 \ell}{\partial \alpha^2}
= \sum_{t=m+1}^{n} \left[ \frac{\partial^2 \ell_t(\mu_t,\phi)}{\partial \mu_t^2}\, \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \alpha}
+ \frac{\partial \ell_t(\mu_t,\phi)}{\partial \mu_t}\, \frac{\partial}{\partial \mu_t}\!\left( \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \alpha} \right) \right] \frac{d\mu_t}{d\eta_t}\, \frac{\partial \eta_t}{\partial \alpha}.
\]

Thus,

\[
\mathrm{E}\!\left( \frac{\partial^2 \ell}{\partial \alpha^2} \,\Big|\, \mathcal{F}_{t-1} \right)
= -\phi^2 \sum_{t=m+1}^{n} \frac{\psi'(\mu_t\phi) + \psi'\bigl((1-\mu_t)\phi\bigr)}{g'(\mu_t)^2}.
\]

We have that

\[
\frac{\partial \ell}{\partial \lambda_j} = \phi \sum_{t=m+1}^{n} \bigl( y_t^{*} - \mu_t^{*} \bigr) \frac{1}{g'(\mu_t)}\, \frac{\partial \eta_t}{\partial \lambda_j}.
\]

Therefore,

\[
\frac{\partial^2 \ell}{\partial \lambda_i\, \partial \phi}
= \sum_{t=m+1}^{n} \left[ \bigl( y_t^{*} - \mu_t^{*} \bigr) - \phi\, \frac{\partial \mu_t^{*}}{\partial \phi} \right] \frac{1}{g'(\mu_t)}\, \frac{\partial \eta_t}{\partial \lambda_i}.
\]

It also follows from the regularity conditions that E(y∗t | Ft−1) = μ∗t. Given that ∂μ∗t/∂φ = ψ′(μtφ)μt − ψ′((1 − μt)φ)(1 − μt), we have

\[
\mathrm{E}\!\left( \frac{\partial^2 \ell}{\partial \lambda_i\, \partial \phi} \,\Big|\, \mathcal{F}_{t-1} \right)
= -\phi \sum_{t=m+1}^{n} \frac{\psi'(\mu_t\phi)\mu_t - \psi'\bigl((1-\mu_t)\phi\bigr)(1-\mu_t)}{g'(\mu_t)}\, \frac{\partial \eta_t}{\partial \lambda_i}.
\]

We also have that

\[
\frac{\partial^2 \ell}{\partial \alpha\, \partial \phi}
= \sum_{t=m+1}^{n} \left[ \bigl( y_t^{*} - \mu_t^{*} \bigr) - \phi\, \frac{\partial \mu_t^{*}}{\partial \phi} \right] \frac{1}{g'(\mu_t)}\, \frac{\partial \eta_t}{\partial \alpha},
\]

which yields

\[
\mathrm{E}\!\left( \frac{\partial^2 \ell}{\partial \alpha\, \partial \phi} \,\Big|\, \mathcal{F}_{t-1} \right)
= -\phi \sum_{t=m+1}^{n} \frac{\psi'(\mu_t\phi)\mu_t - \psi'\bigl((1-\mu_t)\phi\bigr)(1-\mu_t)}{g'(\mu_t)}.
\]


Finally, ∂²ℓ/∂φ² follows from the differentiation of Uφ(γ) with respect to φ. We obtain

\[
\mathrm{E}\!\left( \frac{\partial^2 \ell}{\partial \phi^2} \,\Big|\, \mathcal{F}_{t-1} \right)
= -\sum_{t=m+1}^{n} \Bigl( \psi'(\mu_t\phi)\mu_t^2 + \psi'\bigl((1-\mu_t)\phi\bigr)(1-\mu_t)^2 - \psi'(\phi) \Bigr).
\]

Using

\[
\frac{\partial \eta_t}{\partial \beta_l} = x_{tl} - \sum_{i=1}^{p} \varphi_i x_{(t-i)l}, \qquad
\frac{\partial \eta_t}{\partial \varphi_i} = g(y_{t-i}) - x_{t-i}'\beta, \qquad\text{and}\qquad
\frac{\partial \eta_t}{\partial \theta_j} = r_{t-j},
\]

we can obtain Fisher's information matrix for γ. Let W = diag{wm+1, . . . , wn}, with

\[
w_t = \phi\, \frac{\psi'(\mu_t\phi) + \psi'\bigl((1-\mu_t)\phi\bigr)}{g'(\mu_t)^2},
\]

c = (cm+1, . . . , cn)′, with ct = φ{ψ′(μtφ)μt − ψ′((1 − μt)φ)(1 − μt)}, and D = diag{dm+1, . . . , dn}, with dt = ψ′(μtφ)μt² + ψ′((1 − μt)φ)(1 − μt)² − ψ′(φ). Thus,

\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \alpha^2}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi\,\mathrm{tr}(W), \qquad
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \beta\,\partial \alpha}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi M'W\mathbf{1},
\]
\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \alpha\,\partial \phi}\,\Big|\,\mathcal{F}_{t-1}\right) = -\mathbf{1}'Tc, \qquad
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \varphi\,\partial \alpha}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi P'W\mathbf{1},
\]
\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \beta\,\partial \beta'}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi M'WM, \qquad
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \beta\,\partial \phi}\,\Big|\,\mathcal{F}_{t-1}\right) = -M'Tc,
\]
\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \phi^2}\,\Big|\,\mathcal{F}_{t-1}\right) = -\mathrm{tr}(D), \qquad
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \varphi\,\partial \varphi'}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi P'WP,
\]
\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \varphi\,\partial \phi}\,\Big|\,\mathcal{F}_{t-1}\right) = -P'Tc, \qquad
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \theta\,\partial \theta'}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi R'WR,
\]
\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \theta\,\partial \phi}\,\Big|\,\mathcal{F}_{t-1}\right) = -R'Tc, \qquad
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \beta\,\partial \varphi'}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi M'WP,
\]
\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \beta\,\partial \theta'}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi M'WR, \qquad
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \varphi\,\partial \theta'}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi P'WR,
\]

and

\[
\mathrm{E}\!\left(\frac{\partial^2 \ell}{\partial \theta\,\partial \alpha}\,\Big|\,\mathcal{F}_{t-1}\right) = -\phi R'W\mathbf{1}.
\]


Therefore, Fisher's information matrix can be expressed as

\[
K = K(\gamma) =
\begin{bmatrix}
K_{\alpha\alpha} & K_{\alpha\beta} & K_{\alpha\phi} & K_{\alpha\varphi} & K_{\alpha\theta} \\
K_{\beta\alpha} & K_{\beta\beta} & K_{\beta\phi} & K_{\beta\varphi} & K_{\beta\theta} \\
K_{\phi\alpha} & K_{\phi\beta} & K_{\phi\phi} & K_{\phi\varphi} & K_{\phi\theta} \\
K_{\varphi\alpha} & K_{\varphi\beta} & K_{\varphi\phi} & K_{\varphi\varphi} & K_{\varphi\theta} \\
K_{\theta\alpha} & K_{\theta\beta} & K_{\theta\phi} & K_{\theta\varphi} & K_{\theta\theta}
\end{bmatrix},
\]

where Kαα = φ tr(W), Kβα = K′αβ = φM′W1, Kαφ = Kφα = 1′Tc, Kϕα = K′αϕ = φP′W1, Kθα = K′αθ = φR′W1, Kββ = φM′WM, Kβφ = K′φβ = M′Tc, Kφφ = tr(D), Kβϕ = K′ϕβ = φM′WP, Kβθ = K′θβ = φM′WR, Kϕϕ = φP′WP, Kϕφ = K′φϕ = P′Tc, Kθθ = φR′WR, Kθφ = K′φθ = R′Tc, and Kϕθ = K′θϕ = φP′WR.

Note that Fisher's information matrix is not block-diagonal, which implies that our model is not a dynamic GLM. Under the usual regularity conditions for maximum likelihood estimation and when the sample size is large,

\[
\begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \\ \hat{\phi} \\ \hat{\varphi} \\ \hat{\theta} \end{pmatrix}
\sim N_{(k+p+q+2)}\!\left( \begin{pmatrix} \alpha \\ \beta \\ \phi \\ \varphi \\ \theta \end{pmatrix},\; K^{-1} \right)
\]

approximately, where Nr denotes the r-dimensional normal distribution, and α̂, β̂, φ̂, ϕ̂, and θ̂ are the CMLEs of α, β, φ, ϕ, and θ, respectively.
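The large-sample normality above is what justifies the usual Wald-type standard errors. A minimal R sketch of this step: instead of assembling K in closed form from W, c and D, one may use the numerically obtained Hessian of the negative conditional log-likelihood (e.g., from the optim() call in the sketch of Sect. 3) as a stand-in for K; standard errors are then the square roots of the diagonal of its inverse, reported on the optimizer's scale.

## Approximate standard errors of the CMLE from an information-matrix estimate.
cml_se <- function(information) {
  sqrt(diag(solve(information)))      # inverse information ~ covariance of the CMLE
}

## Usage with the earlier fitting sketch (the precision parameter is on the log scale):
## fit <- betaar_fit(y, p = 1)
## cbind(estimate = fit$par, std.error = cml_se(fit$hessian))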

4 Hypothesis testing and prediction

Consider the following null and alternative hypotheses:

\[
H_0 : t\gamma = 0 \quad\text{and}\quad H_1 : t\gamma \neq 0, \qquad (7)
\]

where t is an r × (k + p + q + 2) matrix (r < k + p + q + 2) of rank r. For instance, consider the following partition of the (k + p + q + 2) × 1 parameter vector: γ = (γ′1, γ′2)′, where γ2 is r × 1 (r < k + p + q + 2). Note that by letting tγ = γ2 in (7) one can test whether γ2 equals zero.

Let γ̃ be the CMLE of γ under the null hypothesis in (7) and let γ̂ be the unrestricted CMLE of γ. The test statistic commonly used to test H0 : tγ = 0 is the conditional log-likelihood ratio statistic (CLR):

\[
\lambda_n = 2\bigl\{ \ell(\hat{\gamma}) - \ell(\tilde{\gamma}) \bigr\},
\]

where ℓ(·) is the conditional log-likelihood function. Under mild regularity conditions and under H0, λn converges in distribution to χ²r, so


that the test can be performed using approximate critical values from the limiting χ²r null distribution. One can also base the testing inference on the square root of the CLR statistic, with the sign taken to be that of (γ̂ − γ̃), which is asymptotically standard normal under the null hypothesis. It is also noteworthy that, by using the asymptotic normality of the CMLE of γ, γ̂, one can easily construct approximate confidence intervals for the elements of γ.
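As an illustration, the R sketch below computes the CLR statistic for nested βAR models using the fitting sketch from Sect. 3; it is a simplified example (for strict comparability both fits should condition on the same number of initial observations, a refinement omitted here), and all names are ours.

## Conditional likelihood ratio test of a betaAR(p_null) null against a betaAR(p_full)
## alternative; 'y' is the observed series in (0, 1).
clr_test <- function(y, p_full, p_null) {
  lambda <- 2 * (betaar_fit(y, p_full)$logLik - betaar_fit(y, p_null)$logLik)
  df <- p_full - p_null
  c(statistic = lambda, df = df,
    p.value = pchisq(lambda, df = df, lower.tail = FALSE))
}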

We shall now consider tests of model adequacy. Li (1991) proposed portmanteau and score statistics for Markov regression models. We shall now follow his approach to provide portmanteau and score statistics for the βARMA model. At the outset, consider the standardized score errors defined as

\[
a_t = \frac{y_t^{*} - \mu_t^{*}}{\sqrt{\psi'(\mu_t\phi) + \psi'\bigl((1-\mu_t)\phi\bigr)}},
\]

and note that it follows from the regularity conditions that E(at | Ft−1) = 0 and Var(at | Ft−1) = 1/φ; also, E(ai aj) = 0 whenever i ≠ j. Then, the lag k innovation autocorrelation of at is

\[
C_k = \frac{1}{n} \sum_{t=k+1}^{n} \phi\, a_t a_{t-k}.
\]

The corresponding kth residual autocorrelation, Ĉk, can be written as

\[
\hat{C}_k = \frac{1}{n} \sum_{t=k+1}^{n} \hat{\phi}\, \hat{a}_t \hat{a}_{t-k}.
\]

We shall work with the subsets of the score vector and Fisher's information matrix relative to β, φ and θ. The following quantity will be useful:

\[
V = \lim_{n\to\infty} \frac{1}{n} K,
\]

where K denotes Fisher's information matrix. Consider the following partition of V⁻¹, written in terms of the blocks of K⁻¹ (denoted with superscripts):

\[
V^{-1} = \lim_{n\to\infty} n
\begin{bmatrix}
K^{\alpha\alpha} & K^{\alpha\beta} & K^{\alpha\phi} & K^{\alpha\varphi} & K^{\alpha\theta} \\
K^{\beta\alpha} & K^{\beta\beta} & K^{\beta\phi} & K^{\beta\varphi} & K^{\beta\theta} \\
K^{\phi\alpha} & K^{\phi\beta} & K^{\phi\phi} & K^{\phi\varphi} & K^{\phi\theta} \\
K^{\varphi\alpha} & K^{\varphi\beta} & K^{\varphi\phi} & K^{\varphi\varphi} & K^{\varphi\theta} \\
K^{\theta\alpha} & K^{\theta\beta} & K^{\theta\phi} & K^{\theta\varphi} & K^{\theta\theta}
\end{bmatrix},
\]

and let V^{αβϕθ} be the block of V⁻¹ which corresponds to α, β, ϕ, and θ:

\[
V^{\alpha\beta\varphi\theta} = \lim_{n\to\infty} n
\begin{bmatrix}
K^{\alpha\alpha} & K^{\alpha\beta} & K^{\alpha\varphi} & K^{\alpha\theta} \\
K^{\beta\alpha} & K^{\beta\beta} & K^{\beta\varphi} & K^{\beta\theta} \\
K^{\varphi\alpha} & K^{\varphi\beta} & K^{\varphi\varphi} & K^{\varphi\theta} \\
K^{\theta\alpha} & K^{\theta\beta} & K^{\theta\varphi} & K^{\theta\theta}
\end{bmatrix}.
\]


Let Ĉ = (Ĉ1, . . . , Ĉm)′ for some m > 0. Then, following Sect. 2 and the Appendix of Li (1991), it can be shown that, under correct model specification, √n Ĉ is asymptotically normally distributed with mean zero and variance Im − φX′V^{αβϕθ}X, where Im is the m × m identity matrix and

\[
X = \lim_{n\to\infty} \frac{1}{n}
\begin{bmatrix}
\sum h_t a_{t-1} & \cdots & \sum h_t a_{t-m} \\
\sum \bigl( x_{t1} - \sum_{i=1}^{p} \varphi_i x_{(t-i)1} \bigr) h_t a_{t-1} & \cdots & \sum \bigl( x_{t1} - \sum_{i=1}^{p} \varphi_i x_{(t-i)1} \bigr) h_t a_{t-m} \\
\vdots & \ddots & \vdots \\
\sum \bigl( x_{tk} - \sum_{i=1}^{p} \varphi_i x_{(t-i)k} \bigr) h_t a_{t-1} & \cdots & \sum \bigl( x_{tk} - \sum_{i=1}^{p} \varphi_i x_{(t-i)k} \bigr) h_t a_{t-m} \\
\sum \bigl( g(y_{t-1}) - x_{t-1}'\beta \bigr) h_t a_{t-1} & \cdots & \sum \bigl( g(y_{t-1}) - x_{t-1}'\beta \bigr) h_t a_{t-m} \\
\vdots & \ddots & \vdots \\
\sum \bigl( g(y_{t-p}) - x_{t-p}'\beta \bigr) h_t a_{t-1} & \cdots & \sum \bigl( g(y_{t-p}) - x_{t-p}'\beta \bigr) h_t a_{t-m} \\
\sum r_{t-1} h_t a_{t-1} & \cdots & \sum r_{t-1} h_t a_{t-m} \\
\vdots & \ddots & \vdots \\
\sum r_{t-q} h_t a_{t-1} & \cdots & \sum r_{t-q} h_t a_{t-m}
\end{bmatrix},
\]

with

\[
h_t = \frac{\bigl( \psi'(\mu_t\phi) + \psi'\bigl((1-\mu_t)\phi\bigr) \bigr)^{1/2}}{g'(\mu_t)}.
\]

Hence, a test for the joint significance of the first m autocorrelations can be based on

\[
n\, \hat{C}' \bigl( I_m - \hat{\phi}\, \hat{X}' \hat{V}^{\alpha\beta\varphi\theta} \hat{X} \bigr)^{-1} \hat{C},
\]

which is asymptotically χ²m under the null hypothesis of no serial correlation.
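The main ingredients of this diagnostic are easy to compute once a model has been fitted. The R sketch below obtains the standardized score errors at and the residual autocorrelations Ĉk from fitted means and a fitted precision; it deliberately omits the correction term Im − φX′V^{αβϕθ}X, so it yields only the uncorrected building block of the test, and all names are ours.

## Standardized score errors a_t and residual autocorrelations C_k (uncorrected).
## mu_hat: fitted conditional means; phi_hat: fitted precision; skip: number of
## initial observations used only for conditioning (m = max(p, q)).
score_residual_acf <- function(y, mu_hat, phi_hat, max_lag, skip = 0) {
  idx <- (skip + 1):length(y)
  y_star  <- log(y[idx] / (1 - y[idx]))
  mu_star <- digamma(mu_hat[idx] * phi_hat) - digamma((1 - mu_hat[idx]) * phi_hat)
  a <- (y_star - mu_star) /
       sqrt(trigamma(mu_hat[idx] * phi_hat) + trigamma((1 - mu_hat[idx]) * phi_hat))
  n <- length(a)
  sapply(seq_len(max_lag), function(k) sum(phi_hat * a[(k + 1):n] * a[1:(n - k)]) / n)
}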

Score tests on the parameter vector can be performed using the approach proposed by Li (1991). Let

\[
\beta = (\beta_1', \beta_2')', \qquad \varphi = (\varphi_1', \varphi_2')', \qquad \theta = (\theta_1', \theta_2')', \qquad\text{and}\qquad \tau = (\beta_2', \varphi_2', \theta_2')'.
\]

The null hypothesis is τ = 0, which is to be tested against the alternative that the number of parameters is 1 + k1 + k2 + p1 + p2 + q1 + q2, where ki, pi, and qi are the numbers of parameters in βi, ϕi, and θi, respectively, for i = 1, 2.² The corresponding score function is

\[
U(\gamma) = \bigl( U_\alpha(\gamma),\; U_\beta(\gamma)',\; U_\varphi(\gamma)',\; U_\theta(\gamma)' \bigr)',
\]

where γ = (α, β, φ, ϕ, θ). It is possible to show that U/√n is asymptotically normally distributed with mean zero and variance Vαβϕθ when the null hypothesis is true, where Vαβϕθ is the part of V that corresponds to α, β, ϕ, and θ.

² Note that we do not include α in τ; one can, however, consider the case where τ = (α, β′2, ϕ′2, θ′2)′ when the null hypothesis also imposes α = 0.


Then, following Li (1991), and noting that under the null hypothesis U(γ1) = 0, where γ1 = (α, β′1, φ, ϕ′1, θ′1)′, a score test statistic is

\[
Q = n^{-1}\, U(\gamma)'\, V^{\alpha\beta\varphi\theta}\, U(\gamma),
\]

where the estimates are obtained under the null hypothesis. Let a = (a1, . . . , an)′, St = htZt, S = (S1, . . . , Sn)′ and

\[
Z_t = \Bigl( 1,\; x_{t1} - \textstyle\sum_{i=1}^{p} \varphi_i x_{(t-i)1},\; \ldots,\; x_{tk} - \sum_{i=1}^{p} \varphi_i x_{(t-i)k},\; g(y_{t-1}) - x_{t-1}'\beta,\; \ldots,\; g(y_{t-p}) - x_{t-p}'\beta,\; r_{t-1},\; \ldots,\; r_{t-q} \Bigr)'.
\]

We can rewrite Q as

\[
Q = \frac{\phi^2\, a' S\, V^{\alpha\beta\varphi\theta} S' a}{n}.
\]

An asymptotically equivalent statistic is

\[
Q = \phi^2\, a' S\, K^{\alpha\beta\varphi\theta} S' a,
\]

where K^{αβϕθ} is the block of Fisher's information matrix inverse which corresponds to α, β, ϕ, and θ:

\[
K^{\alpha\beta\varphi\theta} =
\begin{bmatrix}
K^{\alpha\alpha} & K^{\alpha\beta} & K^{\alpha\varphi} & K^{\alpha\theta} \\
K^{\beta\alpha} & K^{\beta\beta} & K^{\beta\varphi} & K^{\beta\theta} \\
K^{\varphi\alpha} & K^{\varphi\beta} & K^{\varphi\varphi} & K^{\varphi\theta} \\
K^{\theta\alpha} & K^{\theta\beta} & K^{\theta\varphi} & K^{\theta\theta}
\end{bmatrix}.
\]

Under the null hypothesis, Q is asymptotically distributed as χ² with k2 + p2 + q2 degrees of freedom. We note that the conditional score test that we just developed only requires the estimation of the null model.

In order to produce forecasts, the CMLE of γ, γ̂, must be used to obtain estimates of μt, t = m + 1, . . . , n, say μ̂t. By using μ̂t one can obtain the estimates of rt, r̂t, for t = m + 1, . . . , n (based on the functional structure of the error). For N > n, the forecast of the error rN equals zero. Thus, to predict the mean value of the process at T > n, one should use the CMLE of γ, γ̂; the estimates of μt, t = m + 1, . . . , n; the estimates of rt, t = m + 1, . . . , n; replace rt by zero if t > n (these suffice to obtain


μ̂n+1, and one can then proceed analogously to obtain μ̂n+2, and so on); and replace yt by μ̂t if n < t < T. For instance, the mean response estimate at n + 1 is

\[
\hat{\mu}_{n+1} = g^{-1}\!\left( \hat{\alpha} + x_{n+1}'\hat{\beta} + \sum_{i=1}^{p} \hat{\varphi}_i \bigl\{ g(y_{n+1-i}) - x_{n+1-i}'\hat{\beta} \bigr\} + \sum_{j=1}^{q} \hat{\theta}_j \hat{r}_{n+1-j} \right).
\]

At time n + 2, we obtain

\[
\hat{\mu}_{n+2} = g^{-1}\!\left( \hat{\alpha} + x_{n+2}'\hat{\beta} + \sum_{i=2}^{p} \hat{\varphi}_i \bigl\{ g(y_{n+2-i}) - x_{n+2-i}'\hat{\beta} \bigr\} + \hat{\varphi}_1 \bigl\{ g(\hat{\mu}_{n+1}) - x_{n+1}'\hat{\beta} \bigr\} + \sum_{j=2}^{q} \hat{\theta}_j \hat{r}_{n+2-j} \right),
\]

and so on.

Finally, we note that model selection can be performed using the Akaike information criterion (AIC) introduced by Akaike (1973, 1974) or, alternatively, the Bayesian information criterion (BIC) of Schwarz (1978). For a detailed discussion of information criteria and their properties, see Choi (1992).
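For the covariate-free βAR(p) case of the fitting sketch in Sect. 3, the forecasting recursion just described reduces to replacing unobserved values of yt by their forecasts; a minimal R sketch (our own names and simplifications):

## h-step-ahead forecasts of the conditional mean for a pure betaAR(p) model
## (logit link, no moving average part, so no future errors are involved).
betaar_forecast <- function(y, alpha, ar, h) {
  p <- length(ar)
  z <- y                                      # observed series, extended by forecasts
  for (step in seq_len(h)) {
    t_new <- length(z) + 1
    z <- c(z, plogis(alpha + sum(ar * qlogis(z[t_new - seq_len(p)]))))
  }
  tail(z, h)                                  # the h out-of-sample mean forecasts
}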

5 An application

This section contains an application of the βARMA model proposed in Sect. 2. The estimations and computations were carried out using the free statistical software R; see http://www.r-project.org. We used the quasi-Newton algorithm known as BFGS to maximize the conditional log-likelihood function. The data refer to the rate of hidden unemployment due to substandard work conditions in São Paulo, Brazil (TDOP-RMSP). Hidden unemployment due to substandard work conditions relates to people who work illegally, who perform unpaid work for relatives, and also to those who have been seeking employment for the past 12 months. The data were obtained from the database of the Applied Economic Research Institute (IPEA) of the Brazilian Federal Government³ and cover a period of 179 months (January 1991 through November 2005). The maximum and minimum values are 0.057 and 0.024, respectively, and the average unemployment rate equals 0.044. A time series plot of the data is given in Fig. 1.

We shall consider four βAR models (p = 1, . . . , 4); see Table 1. The link function is logit and model selection is carried out using the AIC (Akaike information criterion) and the BIC (Bayesian information criterion):

\[
\mathrm{AIC} = -2\ell + 2p \quad\text{and}\quad \mathrm{BIC} = -2\ell + p\log(n),
\]

where ℓ denotes the log-likelihood function evaluated at the maximum likelihood estimates, p is the number of autoregressive parameters, and n is the sample size.

³ See http://www.ipeadata.gov.br or obtain directly from http://beta.arma.googlepages.com/beta-arma-data.txt.
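A minimal sketch of this model comparison, using the βAR fitting function sketched in Sect. 3 (following the definitions above, the penalty counts only the autoregressive parameters):

## AIC and BIC for candidate betaAR(p) models; 'y' is the series being modeled.
compare_betaar <- function(y, orders = 1:4) {
  t(sapply(orders, function(p) {
    fit <- betaar_fit(y, p)
    c(p = p,
      AIC = -2 * fit$logLik + 2 * p,
      BIC = -2 * fit$logLik + p * log(length(y)))
  }))
}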


Fig. 1 Rate of hidden unemployment in São Paulo, Brazil

Table 1 βAR models

Model 1: ηt = α + ϕ1 g(yt−1)
Model 2: ηt = α + ϕ1 g(yt−1) + ϕ2 g(yt−2)
Model 3: ηt = α + ϕ1 g(yt−1) + ϕ2 g(yt−2) + ϕ3 g(yt−3)
Model 4: ηt = α + ϕ1 g(yt−1) + ϕ2 g(yt−2) + ϕ3 g(yt−3) + ϕ4 g(yt−4)

Fig. 2 Sample autocorrelation functions of the standardized residuals obtained from Models 1 and 5

The AIC selects Model 4 whereas the BIC picks Model 1. We note that the coefficient of g(yt−2) (Model 4) is not statistically significant at the usual significance levels, since the corresponding p-value equals 0.898. We thus consider a new model, namely the model with AR terms g(yt−1), g(yt−3), and g(yt−4) (Model 5). Figure 2 shows the residual correlograms corresponding to Models 1 and 5. It is clear that the residuals from Model 1 are serially correlated, unlike the residuals obtained using Model 5. We thus select Model 5 as the best model.


The estimated model is

\[
\hat{\mu}_t = \frac{\exp\{\hat{\alpha} + \hat{\varphi}_1 g(y_{t-1}) + \hat{\varphi}_3 g(y_{t-3}) + \hat{\varphi}_4 g(y_{t-4})\}}{1 + \exp\{\hat{\alpha} + \hat{\varphi}_1 g(y_{t-1}) + \hat{\varphi}_3 g(y_{t-3}) + \hat{\varphi}_4 g(y_{t-4})\}},
\]

where (α̂, ϕ̂1, ϕ̂3, ϕ̂4) = (−0.16726, 1.18317, −0.57566, 0.33718), and the respective asymptotic standard errors obtained from the inverse of Fisher's information matrix are (0.0611, 0.0479, 0.0918, 0.0692).
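For concreteness, the fitted conditional means implied by these estimates can be reproduced as follows (a sketch using the reported point estimates; y stands for the TDOP-RMSP series, which is not reproduced here):

## Fitted means of Model 5 from the reported estimates (logit link).
alpha_hat <- -0.16726
coef_hat  <- c(1.18317, -0.57566, 0.33718)   # coefficients of g(y[t-1]), g(y[t-3]), g(y[t-4])
lags      <- c(1, 3, 4)

fitted_mu5 <- function(y) {
  sapply(5:length(y), function(t) plogis(alpha_hat + sum(coef_hat * qlogis(y[t - lags]))))
}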

As a final step in the analysis, we turn to forecasting. We remove the final six observations from the series, fit the model (Model 5) and produce six out-of-sample forecasts. The observed values are 0.051, 0.052, 0.050, 0.049, 0.046, and 0.046, and the corresponding forecasts are 0.052, 0.052, 0.051, 0.050, 0.050, and 0.049. The βAR forecasts are, overall, quite accurate.

6 Concluding remarks

In this paper we proposed a dynamic beta regression model: the βARMA model. It can be used to model random variates that are continuous, assume values in the standard unit interval (0, 1) and are observed over time. The proposed model is particularly useful for the time series modeling of rates and proportions. The model is built upon the assumption that the conditional distribution of the variable of interest given its past behavior is beta. As is well known, the beta distribution is very flexible for modeling data that are restricted to the standard unit interval, since the beta density can display quite different shapes depending on the values of the parameters that index the distribution. Parameter estimation is performed by maximum likelihood, and we derived closed-form expressions for the score function and Fisher's information matrix. Hypothesis testing inference can be carried out using standard asymptotic tests. The proposed βARMA model yields fitted values and out-of-sample forecasts which belong to the standard unit interval, unlike the standard ARMA model fitted to rates and proportions time series data.

Acknowledgements We thank two referees for their comments and suggestions. We also gratefully acknowledge partial financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

References

Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F (eds) Second international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Control AC-19:716–723
Benjamin MA, Rigby RA, Stasinopoulos M (2003) Generalized autoregressive moving average models. J Am Stat Assoc 98:214–223
Choi B (1992) ARMA model identification. Springer, New York
Cribari-Neto F, Vasconcellos KLP (2002) Nearly unbiased maximum likelihood estimation for the beta distribution. J Stat Comput Simul 72:107–118
Ferrari SLP, Cribari-Neto F (2004) Beta regression for modelling rates and proportions. J Appl Stat 31:799–815
Fokianos K, Kedem B (2004) Partial likelihood for time series following generalized linear models. J Time Ser Anal 25:173–197
Li WK (1991) Testing model adequacy for some Markov regression models for time series. Biometrika 78:83–89
Li WK (1994) Time series models based on generalized linear models: some further results. Biometrics 50:506–511
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc A 135:370–384
Nocedal J, Wright SJ (1999) Numerical optimization. Springer, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Shephard N (1995) Generalized linear autoregressions. Technical report, Nuffield College, Oxford University. Manuscript available at http://www.nu.ox.ac.uk/economics/papers/1996/w8/glar.ps
Vasconcellos KLP, Cribari-Neto F (2005) Improved maximum likelihood estimation in a new class of beta regression models. Braz J Probab Stat 19:13–31
Zeger SL, Qaqish B (1988) Markov regression models for time series: a quasi-likelihood approach. Biometrics 44:1019–1031

