
Censored Gamma Regression Models for Limited Dependent Variables with an Application to Loss Given Default

Fabio Sigrist, Werner A. Stahel∗

Seminar for Statistics, ETH Zurich

November 9, 2010

Abstract

Regression models for limited continuous dependent variables having a non-negligible probability of attaining exactly their limits are presented. The models differ in the number of parameters and in their flexibility. It is shown how to fit these models, and they are applied to a Loss Given Default dataset from insurance, to which they provide a good fit.

Keywords: Censored models, Tobit models, limited dependent variables, generalized linear models, Loss Given Default

1 Introduction

In insurance, losses are frequently restricted to be non-negative and below an upper bound defined by a contract. We analyze a loss given default dataset from an insurance category called “Surety”. In this example, claims cannot exceed a prespecified insured maximum, i.e., the ratio of loss over maximum is bounded by 1. On the other hand, for several reasons, claims often do not lead to ultimate losses. The interest is in relating this variable to a set of explanatory variables by a regression model.

A special feature of such data is the non-zero probability of the boundary values. Censored random variables are therefore natural models. They occur in different fields of application. In economics, analyzing household expenditure on durable goods, Tobin (1958) first introduced such a model, which Goldberger (1964) later named the Tobit model. In climate science, precipitation can be modelled using censored distributions (see, e.g., Bardossy and Plate (1992) or Sanso and Guenni (2004)).

We present a regression model for a variable which takes values between a lower and an upper limit and which attains each of these two limits with positive probability. In between the two limits, the variable is assumed to have a continuous distribution. The distribution of this limited dependent variable is later specified to be a censored, shifted gamma distribution. The model determines the distribution of the values between the limits and the frequency of the limiting values in a parsimonious way, using only three parameters.

∗We thank Hans-Rudolf Künsch for helpful comments and discussions.

However, this might lead to a poor fit for certain data. We therefore also introduce two extensions of the model to cover cases in which the frequencies of the limits do not follow this parsimonious description. For instance, in our example there may even be administrative reasons for an excessive number of zero losses, due to incentives to file a claim with little justification. Such preventive filing may result in a large number of “additional zeros”. This idea suggests a mixture model, consisting of a censored part, as introduced above, and a model for the additional zeros. Alternatively, the probabilities of attaining the boundary value(s) can be modelled separately from the continuous part in between them.

The rest of the paper is organized as follows. In Section 2, we introduce the censored gamma model, show how it can be interpreted, and derive an estimation procedure for it. In Section 3, two possible generalizations are presented. In Section 4, we illustrate an application of the models to the dataset mentioned above.

2 The Censored Gamma Model

In order to establish ideas, consider the Tobit model in its two-sided version as developed by Rosett and Nelson (1975). It is assumed that there exists a latent variable $Y^*$ which is, conditional on some covariates $X = (X_1, \ldots, X_p) \in \mathbb{R}^p$, normally distributed. This variable is observed only if it lies in the interval $[0, 1]$. Otherwise, we observe 0 or 1, depending on whether the latent variable is smaller than 0 or greater than 1, respectively. If $Y$ denotes the observed variable, this can be expressed as

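$$Y^* \mid X \sim \mathcal{N}(\mu, \sigma^2) \tag{1}$$

and

$$Y = \begin{cases} 0, & \text{if } Y^* \le 0, \\ Y^*, & \text{if } 0 < Y^* < 1, \\ 1, & \text{if } Y^* \ge 1. \end{cases} \tag{2}$$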

Furthermore, the expectation $\mu$ of the latent variable $Y^*$ is related to the covariates $X$ through

$$\mu = X'\beta, \quad \beta \in \mathbb{R}^p.$$

For more details, e.g., on inference, we refer to Maddala (1983), Chapter 6, and Amemiya (1985), Chapter 10. Furthermore, Breen (1996) and Long (1997) give overviews of models for limited dependent variables.


Clearly, the assumption of a normal distribution for $Y^*$ is not adequate for all data. It is well known that the Tobit model is sensitive to distributional assumptions (see, e.g., Arabmazar and Schmidt (1982) or Maddala and Nelson (1975)). A natural extension is to replace the normal distribution by another one. We choose a shifted gamma distribution since it is a flexible distribution that is applied in many areas and it provides a good fit to the dataset of insurance claims mentioned above. We note that semiparametric models (see, e.g., Khan and Powell (2001) or Chen and Khan (2001)) are another potential extension. However, it is not straightforward to use such models for assessing the whole distribution of the variable given specific values of the explanatory variables, and they are prone to complications and efficiency losses when the proportion of boundary values is high, as is the case for our data, which include more than 50% zeros. Furthermore, a parametric model has the advantage that it can be readily extended to allow for additional zeros (see Section 3).

To avoid unnecessary inflation of notation, we let the boundaries of the observed variable be 0 and 1. The model is easily generalized, though, to variables whose range of values is any interval $[y_l, y_u]$ with $y_l < y_u$. This can be done either by first applying a linear transformation to the respective variable or by reformulating the model. The case where the observations are only bounded from below is included by letting $y_u \to \infty$.
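For the first option, a minimal Python sketch of the affine map $y \mapsto (y - y_l)/(y_u - y_l)$ and its inverse follows. The function names are our own, and the sketch covers finite limits only; the case $y_u \to \infty$ calls for reformulating the model rather than rescaling.

```python
def to_unit_interval(y, y_lower, y_upper):
    """Map y from [y_lower, y_upper] onto [0, 1] by an affine transformation."""
    if not y_lower < y_upper:
        raise ValueError("require y_lower < y_upper")
    return (y - y_lower) / (y_upper - y_lower)


def from_unit_interval(u, y_lower, y_upper):
    """Inverse transformation from [0, 1] back to [y_lower, y_upper]."""
    return y_lower + u * (y_upper - y_lower)


# A loss of 250 under hypothetical contract limits [0, 1000] maps to 0.25.
u = to_unit_interval(250.0, 0.0, 1000.0)
print(u)  # 0.25
```

A model fitted on the unit scale is transformed back with `from_unit_interval`; the point masses at the limits are unaffected by the rescaling.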

2.1 The model

Generalizing the Tobit model specified in (1) and (2), it is assumed that there exists a latent variable $Y^*$ which has, conditional on $x$, a distribution with density $f^*_{\theta^*}(y^*)$ and cumulative distribution function $F^*_{\theta^*}(y^*)$, $\theta^*$ being a vector of parameters. The observed dependent variable $Y$ then depends on the latent variable as in (2).

It follows that the distribution of such a censored variable $Y$ can be characterized by

$$\begin{aligned} P[Y = 0] &= F^*_{\theta^*}(0), \\ P[Y \in (y, y + dy)] &= f^*_{\theta^*}(y)\,dy, \quad 0 < y < 1, \\ P[Y = 1] &= 1 - F^*_{\theta^*}(1). \end{aligned} \tag{3}$$

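Consequently, the density of the observed variable $Y$ can be written as

$$f_{\theta^*}(y) = F^*_{\theta^*}(0)\,\delta_0(y) + f^*_{\theta^*}(y)\,\mathbf{1}_{\{0<y<1\}}(y) + \bigl(1 - F^*_{\theta^*}(1)\bigr)\,\delta_1(y), \quad 0 \le y \le 1, \tag{4}$$

where $\delta_0(y)$ and $\delta_1(y)$ are Dirac measures at 0 and 1 and where $\mathbf{1}_{\{0<y<1\}}(y)$ denotes the indicator function equaling 1 if $0 < y < 1$ and 0 otherwise.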
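To make (3) concrete, the following Python sketch (our own illustration, standard library only) simulates the two-sided Tobit model (1) and (2) with hypothetical values $\mu = 0.3$ and $\sigma = 0.5$ and compares the simulated frequencies of the two limits with $F^*(0)$ and $1 - F^*(1)$.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censor_unit(y_star):
    """Observation rule (2): values outside [0, 1] are set to the nearest limit."""
    return min(max(y_star, 0.0), 1.0)


def simulate_tobit(mu, sigma, n, seed=1):
    """Draw n latent values Y* ~ N(mu, sigma^2) and return the censored observations Y."""
    rng = random.Random(seed)
    return [censor_unit(rng.gauss(mu, sigma)) for _ in range(n)]


mu, sigma = 0.3, 0.5                      # hypothetical latent mean and standard deviation
ys = simulate_tobit(mu, sigma, n=100_000)
p0_emp = sum(y == 0.0 for y in ys) / len(ys)
p1_emp = sum(y == 1.0 for y in ys) / len(ys)
# Point masses from (3): F*(0) and 1 - F*(1) for the normal latent distribution.
p0_theo = normal_cdf((0.0 - mu) / sigma)
p1_theo = 1.0 - normal_cdf((1.0 - mu) / sigma)
print(f"P[Y=0]: simulated {p0_emp:.4f}, from (3) {p0_theo:.4f}")
print(f"P[Y=1]: simulated {p1_emp:.4f}, from (3) {p1_theo:.4f}")
```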

In order to extend the model to the regression case, we relate the covariates $x$ to the distribution of $Y^*$. This is done by assuming that the main parameter $\vartheta$ of the distribution of $Y^*$, which might be the mean or a scale parameter, is related through a link function $g$ to the covariates,

$$g(\vartheta) = x'\beta. \tag{5}$$


In the following, we will mostly focus on the case where the distribution of $Y^*$ is specified as a gamma distribution with a shifted origin. The density and the distribution function of a gamma distributed variable with shape parameter $\alpha$ and scale parameter $\vartheta$ are denoted by $g_{\alpha,\vartheta}(y)$ and $G_{\alpha,\vartheta}(y)$, respectively. The density of a shifted gamma distribution is then

$$g_{\alpha,\vartheta}(y^* + \xi) = \frac{1}{\vartheta^\alpha \Gamma(\alpha)} (y^* + \xi)^{\alpha-1} e^{-(y^*+\xi)/\vartheta}, \quad y^* > -\xi,$$

where $\xi, \vartheta, \alpha > 0$, and its distribution function is $G_{\alpha,\vartheta}(y^* + \xi)$. The density of the observed $Y$ can then be expressed as

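$$f_{\alpha,\vartheta,\xi}(y) = G_{\alpha,\vartheta}(\xi)\,\delta_0(y) + g_{\alpha,\vartheta}(y + \xi)\,\mathbf{1}_{\{0<y<1\}}(y) + \bigl(1 - G_{\alpha,\vartheta}(1 + \xi)\bigr)\,\delta_1(y), \quad 0 \le y \le 1. \tag{6}$$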

The use of a gamma distribution with a shifted origin, instead of a standard gamma distribution, is motivated by the fact that the lower censoring occurs at zero. An unshifted gamma variable is positive with probability one, so the shift $\xi$ is needed to obtain a positive probability of $Y = 0$.
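The censored gamma distribution (6) can be evaluated with a few lines of Python. The sketch below is our own: it computes $G_{\alpha,\vartheta}$ from the power series of the regularized incomplete gamma function (adequate for the moderate arguments used here, but not a production routine), uses illustrative parameter values rather than estimates, and checks numerically that the two point masses and the continuous part add up to one.

```python
import math


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x): power series of the regularized lower incomplete gamma function.

    Adequate for the moderate arguments used here; not a production routine.
    """
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def gamma_pdf(x, shape, scale=1.0):
    """g_{shape,scale}(x): gamma density with shape and scale parameters."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def censored_gamma_masses(alpha, theta, xi):
    """Point masses of (6): P[Y = 0] = G(xi) and P[Y = 1] = 1 - G(1 + xi)."""
    return gamma_cdf(xi, alpha, theta), 1.0 - gamma_cdf(1.0 + xi, alpha, theta)


def censored_gamma_density(y, alpha, theta, xi):
    """Continuous part of (6): g(y + xi) for 0 < y < 1, and 0 elsewhere."""
    return gamma_pdf(y + xi, alpha, theta) if 0.0 < y < 1.0 else 0.0


def continuous_mass(alpha, theta, xi, m=20_000):
    """P[0 < Y < 1], integrating the continuous part with the midpoint rule."""
    h = 1.0 / m
    return h * sum(censored_gamma_density((k + 0.5) * h, alpha, theta, xi) for k in range(m))


alpha, theta, xi = 0.2, 1.5, 0.08         # illustrative values, not estimates
p0, p1 = censored_gamma_masses(alpha, theta, xi)
p_cont = continuous_mass(alpha, theta, xi)
print(f"P[Y=0]={p0:.4f}  P[0<Y<1]={p_cont:.4f}  P[Y=1]={p1:.4f}  total={p0 + p_cont + p1:.6f}")
```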

For the regression case, we assume that the scale parameter $\vartheta$ is related to the covariates via the logarithmic link function

$$\log(\vartheta) = x'\beta. \tag{7}$$

Henceforth, unless otherwise stated, we assume that $Y^*$ (and $Y$) follow a (censored) shifted gamma distribution. We will refer to this model as the “censored gamma model”.

Note that if no censoring occurred and $\xi$ were set to zero, the censored gamma model would be a generalized linear model (McCullagh and Nelder 1983) for a gamma distributed variable with a logarithmic link function.

2.2 Interpretation

Regression models generally describe how the expectation of a target variable depends on predictors. If the focus lies on the latent response variable $Y^*$, the interpretation is straightforward. Since

$$E[Y^* \mid x] = \alpha\vartheta - \xi, \tag{8}$$

the marginal effect of a continuous predictor $x_j$ on $Y^*$ is

$$\frac{\partial E[Y^* \mid x]}{\partial x_j} = \alpha\vartheta\beta_j. \tag{9}$$

On the other hand, one might be primarily interested in the observed variable $Y$, rather than the latent variable $Y^*$. Its mean and the corresponding marginal effects are calculated in the following lemma.


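Lemma 2.1 The following holds true:

$$E[Y \mid x] = \alpha\vartheta\bigl(G_{\alpha+1,\vartheta}(1 + \xi) - G_{\alpha+1,\vartheta}(\xi)\bigr) + (1 + \xi)\bigl(1 - G_{\alpha,\vartheta}(1 + \xi)\bigr) - \xi\bigl(1 - G_{\alpha,\vartheta}(\xi)\bigr), \tag{10}$$

and, for a continuous covariate $x_j$,

$$\frac{\partial E[Y \mid x]}{\partial x_j} = \alpha\vartheta\bigl(G_{\alpha+1,\vartheta}(1 + \xi) - G_{\alpha+1,\vartheta}(\xi)\bigr)\beta_j. \tag{11}$$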

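The derivation of these two equations is shown in Appendix A.1. We note that the marginal effect of $x_j$ on $E[Y \mid x]$ is a scaled version of the effect (9) on $E[Y^* \mid x]$: the scaling factor $G_{\alpha+1,\vartheta}(1 + \xi) - G_{\alpha+1,\vartheta}(\xi)$ lies between 0 and 1 and depends nonlinearly on the covariates through $\vartheta$.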
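Lemma 2.1 can be checked numerically. In the following Python sketch, which assumes a hypothetical linear predictor $\log(\vartheta) = b_0 + b_1 x_1$ with illustrative coefficients, (10) is compared with a direct midpoint-rule evaluation of $\int_0^1 y\, g_{\alpha,\vartheta}(y + \xi)\, dy + P[Y = 1]$, and (11) with a central finite difference of (10) in $x_1$. The incomplete gamma function is again computed from its power series, which suffices for these moderate arguments.

```python
import math


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def gamma_pdf(x, shape, scale=1.0):
    """g_{shape,scale}(x), the gamma density with shape and scale parameters."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def mean_observed(alpha, theta, xi):
    """E[Y | x] according to (10)."""
    inner = gamma_cdf(1.0 + xi, alpha + 1.0, theta) - gamma_cdf(xi, alpha + 1.0, theta)
    return (alpha * theta * inner
            + (1.0 + xi) * (1.0 - gamma_cdf(1.0 + xi, alpha, theta))
            - xi * (1.0 - gamma_cdf(xi, alpha, theta)))


def mean_observed_numeric(alpha, theta, xi, m=20_000):
    """E[Y | x] = int_0^1 y g(y + xi) dy + P[Y = 1], integral by the midpoint rule."""
    h = 1.0 / m
    integral = h * sum((k + 0.5) * h * gamma_pdf((k + 0.5) * h + xi, alpha, theta)
                       for k in range(m))
    return integral + 1.0 - gamma_cdf(1.0 + xi, alpha, theta)


def marginal_effect(alpha, theta, xi, beta_j):
    """dE[Y | x] / dx_j according to (11)."""
    inner = gamma_cdf(1.0 + xi, alpha + 1.0, theta) - gamma_cdf(xi, alpha + 1.0, theta)
    return alpha * theta * inner * beta_j


# Hypothetical predictor log(theta) = b0 + b1 * x1 with illustrative values.
alpha, xi, b0, b1, x1 = 0.2, 0.08, 0.4, -0.7, 0.5


def theta_at(x):
    return math.exp(b0 + b1 * x)


eps = 1e-5
fd = (mean_observed(alpha, theta_at(x1 + eps), xi)
      - mean_observed(alpha, theta_at(x1 - eps), xi)) / (2.0 * eps)
print(mean_observed(alpha, theta_at(x1), xi), mean_observed_numeric(alpha, theta_at(x1), xi))
print(fd, marginal_effect(alpha, theta_at(x1), xi, b1))
```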

2.3 Estimation

In this section, it is shown how to perform maximum likelihood estimation for the censored gamma model using a variant of the Newton-Raphson method known as Fisher's scoring algorithm (see, e.g., Fahrmeir and Tutz 2001).

Denoting generically by $\theta$ all parameters that are to be estimated and by $\ell(\theta)$ the log-likelihood, Fisher's scoring algorithm starts with an initial estimate $\theta^{(0)}$ and iteratively calculates (until convergence is achieved)

$$\theta^{(k+1)} = \theta^{(k)} + I(\theta^{(k)})^{-1} s(\theta^{(k)}), \quad k = 0, 1, 2, \ldots,$$

where

$$s(\theta) = \frac{\partial \ell(\theta)}{\partial \theta}$$

denotes the score function, i.e., the first derivative of the log-likelihood, and

$$I(\theta) = E_\theta\bigl[s(\theta)\, s(\theta)^T\bigr]$$

is the Fisher Information Matrix. How these two quantities are calculated for the censored gamma model is shown in the following.

First, we reparametrize the shape parameter $\alpha$ through

$$\alpha' = \log(\alpha) \tag{12}$$

to ensure that $\alpha$ attains only positive values. The parameters that are to be estimated, therefore, consist of $\theta = (\alpha', \beta, \xi)$.

Assuming that we have independent data $y_1, \ldots, y_n$ with covariates $x_1, \ldots, x_n$, the log-likelihood function can be written as

$$\ell(\theta) = \sum_{i=1}^n \ell_i(\theta).$$


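Lemma 2.2 The following relations hold true:

$$\begin{aligned}
\frac{\partial \ell_i(\theta)}{\partial \alpha'} ={}& \frac{\alpha}{G_{\alpha,\vartheta_i}(\xi)}\Bigl(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\Bigl(0, \frac{\xi}{\vartheta_i}\Bigr)\Bigr)\mathbf{1}_{\{y_i=0\}} \\
&+ \alpha\bigl(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i + \xi)\bigr)\mathbf{1}_{\{0<y_i<1\}} \\
&- \frac{\alpha}{1 - G_{\alpha,\vartheta_i}(1 + \xi)}\Bigl(-\psi(\alpha)G_{\alpha,\vartheta_i}(1 + \xi) + H^{(1)}_\alpha\Bigl(0, \frac{1 + \xi}{\vartheta_i}\Bigr)\Bigr)\mathbf{1}_{\{y_i=1\}},
\end{aligned} \tag{13}$$

$$\begin{aligned}
\frac{\partial \ell_i(\theta)}{\partial \beta_k} ={}& -x_{ik}\,\xi\,\frac{g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\mathbf{1}_{\{y_i=0\}} + x_{ik}\Bigl(-\alpha + \frac{y_i + \xi}{\vartheta_i}\Bigr)\mathbf{1}_{\{0<y_i<1\}} \\
&+ x_{ik}(1 + \xi)\frac{g_{\alpha,\vartheta_i}(1 + \xi)}{1 - G_{\alpha,\vartheta_i}(1 + \xi)}\mathbf{1}_{\{y_i=1\}},
\end{aligned} \tag{14}$$

$$\frac{\partial \ell_i(\theta)}{\partial \xi} = \frac{g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\mathbf{1}_{\{y_i=0\}} + \Bigl(\frac{\alpha - 1}{y_i + \xi} - \frac{1}{\vartheta_i}\Bigr)\mathbf{1}_{\{0<y_i<1\}} - \frac{g_{\alpha,\vartheta_i}(1 + \xi)}{1 - G_{\alpha,\vartheta_i}(1 + \xi)}\mathbf{1}_{\{y_i=1\}}, \tag{15}$$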

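where

$$\psi(\alpha) = \frac{d \log(\Gamma(\alpha))}{d\alpha}$$

denotes the digamma function (see Abramowitz and Stegun 1964) and the functions $H^{(1)}_\alpha$ and $H^{(2)}_\alpha$ are defined as¹

$$H^{(1)}_\alpha(l, u) := \frac{1}{\Gamma(\alpha)} \int_l^u \log(y)\, y^{\alpha-1} \exp(-y)\, dy \tag{16}$$

and

$$H^{(2)}_\alpha(l, u) := \frac{1}{\Gamma(\alpha)} \int_l^u \log(y)^2\, y^{\alpha-1} \exp(-y)\, dy. \tag{17}$$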

In the following, we show how the score functions in (13), (14), and (15) are derived.

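First, we infer from (3) that the likelihood function of an interval-censored gamma distribution can be written as

$$L_y(\alpha, \vartheta, \xi) = G_{\alpha,\vartheta}(\xi)\mathbf{1}_{\{y=0\}} + g_{\alpha,\vartheta}(y + \xi)\mathbf{1}_{\{0<y<1\}} + \bigl(1 - G_{\alpha,\vartheta}(1 + \xi)\bigr)\mathbf{1}_{\{y=1\}}, \tag{18}$$

which is equivalent to writing

$$L_y(\alpha, \vartheta, \xi) = G_{\alpha,\vartheta}(\xi)^{\mathbf{1}_{\{y=0\}}} \cdot g_{\alpha,\vartheta}(y + \xi)^{\mathbf{1}_{\{0<y<1\}}} \cdot \bigl(1 - G_{\alpha,\vartheta}(1 + \xi)\bigr)^{\mathbf{1}_{\{y=1\}}}. \tag{19}$$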

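It follows that we can write the log-likelihood function $\ell_i(\theta)$ of an observation $y_i$ as

$$\begin{aligned}
\ell_i(\theta) ={}& \log\bigl(G_{\alpha,\vartheta_i}(\xi)\bigr)\mathbf{1}_{\{y_i=0\}} + \log\bigl(g_{\alpha,\vartheta_i}(y_i + \xi)\bigr)\mathbf{1}_{\{0<y_i<1\}} + \log\bigl(1 - G_{\alpha,\vartheta_i}(1 + \xi)\bigr)\mathbf{1}_{\{y_i=1\}} \\
={}& \log\bigl(G_{\alpha,\vartheta_i}(\xi)\bigr)\mathbf{1}_{\{y_i=0\}} \\
&+ \Bigl(-\alpha\log(\vartheta_i) - \log(\Gamma(\alpha)) + (\alpha - 1)\log(y_i + \xi) - \frac{y_i + \xi}{\vartheta_i}\Bigr)\mathbf{1}_{\{0<y_i<1\}} \\
&+ \log\bigl(1 - G_{\alpha,\vartheta_i}(1 + \xi)\bigr)\mathbf{1}_{\{y_i=1\}},
\end{aligned}$$

where $\vartheta_i = \exp(x_i'\beta)$ and $\alpha = \exp(\alpha')$.

¹We note that the functions $H^{(1)}_\alpha(l, u)$ and $H^{(2)}_\alpha(l, u)$ can be calculated using numerical integration. In our application, we did this by adaptive quadrature using the QUADPACK routines ‘dqags’ and ‘dqagi’ (Piessens, de Doncker-Kapenga, Überhuber and Kahaner 1983) available from Netlib.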

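The derivative of $\ell_i$ with respect to the parameter $\alpha'$ in (13) is calculated using the following identity:

$$\begin{aligned}
\frac{\partial G_{\alpha,\vartheta}(\xi)}{\partial \alpha} &= \frac{\partial G_{\alpha,1}(\xi/\vartheta)}{\partial \alpha} = \frac{\partial}{\partial \alpha}\Bigl(\frac{1}{\Gamma(\alpha)} \int_0^{\xi/\vartheta} y^{\alpha-1} \exp(-y)\, dy\Bigr) \\
&= -\frac{\Gamma'(\alpha)}{\Gamma(\alpha)^2} \int_0^{\xi/\vartheta} y^{\alpha-1} \exp(-y)\, dy + \frac{1}{\Gamma(\alpha)} \int_0^{\xi/\vartheta} \log(y)\, y^{\alpha-1} \exp(-y)\, dy \\
&= -\psi(\alpha)\, G_{\alpha,\vartheta}(\xi) + H^{(1)}_\alpha\Bigl(0, \frac{\xi}{\vartheta}\Bigr).
\end{aligned} \tag{20}$$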
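Identity (20) can be verified numerically. The Python sketch below is our own: it evaluates $H^{(1)}_\alpha(0, u)$ by integrating the series of $\exp(-y)$ term by term, a substitute for the QUADPACK quadrature mentioned in the footnote that is adequate for moderate $u$, approximates $\psi$ by a central difference of $\log\Gamma$, and compares the right-hand side of (20) with a finite difference of $G_{\alpha,1}(\xi/\vartheta)$ in $\alpha$ at illustrative values.

```python
import math


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def h1_lower(a, u):
    """H^(1)_a(0, u) of (16) via termwise integration of exp(-y) = sum_n (-y)^n / n!.

    Uses int_0^u log(y) y^(c-1) dy = u^c (log(u)/c - 1/c^2) with c = a + n.
    """
    log_u = math.log(u)
    total, coef, n = 0.0, 1.0, 0          # coef holds (-u)^n / n!
    while abs(coef) >= 1e-17:
        c = a + n
        total += coef * (log_u / c - 1.0 / (c * c))
        n += 1
        coef *= -u / n
    return u ** a * total / math.gamma(a)


def digamma(a, h=1e-5):
    """psi(a) from a central difference of log Gamma; accurate enough for this check."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2.0 * h)


def dG_dalpha_numeric(a, z, eps=1e-6):
    """Left-hand side of (20), with z = xi / theta, by a central difference in alpha."""
    return (gamma_cdf(z, a + eps) - gamma_cdf(z, a - eps)) / (2.0 * eps)


def dG_dalpha_identity(a, z):
    """Right-hand side of (20): -psi(alpha) G + H^(1)_alpha(0, z)."""
    return -digamma(a) * gamma_cdf(z, a) + h1_lower(a, z)


for a, z in [(0.2, 0.08 / 1.5), (2.5, 0.9)]:   # illustrative (alpha, xi/theta) pairs
    print(f"alpha={a}: finite difference {dG_dalpha_numeric(a, z):.8f}, "
          f"identity (20) {dG_dalpha_identity(a, z):.8f}")
```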

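Furthermore, using

$$\frac{\partial \ell_i(\theta)}{\partial \beta_k} = \frac{\partial \ell_i(\theta)}{\partial \vartheta_i}\frac{\partial \vartheta_i}{\partial \beta_k} = \frac{\partial \ell_i(\theta)}{\partial \vartheta_i}\vartheta_i x_{ik}$$

and (32), differentiating $\ell_i(\theta)$ with respect to $\beta_k$ gives the result in (14). Finally, the calculation of the derivative with respect to $\xi$ in (15) is straightforward.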
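The scores (14) and (15) can be checked against central finite differences of $\ell_i(\theta)$. The following Python sketch does this for one observation of each type ($y_i = 0$, $0 < y_i < 1$, $y_i = 1$) at illustrative parameter values. The helper names are hypothetical, and the $\alpha'$ score (13) is omitted because it additionally requires $H^{(1)}_\alpha$.

```python
import math


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def gamma_pdf(x, shape, scale=1.0):
    """g_{shape,scale}(x), the gamma density with shape and scale parameters."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def theta_of(beta, x):
    """theta_i = exp(x_i' beta)."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def loglik_i(beta, xi, alpha, x, y):
    """Log-likelihood contribution of one observation in the censored gamma model."""
    theta = theta_of(beta, x)
    if y == 0.0:
        return math.log(gamma_cdf(xi, alpha, theta))
    if y == 1.0:
        return math.log(1.0 - gamma_cdf(1.0 + xi, alpha, theta))
    return math.log(gamma_pdf(y + xi, alpha, theta))


def score_beta_i(beta, xi, alpha, x, y):
    """Score (14): the vector of derivatives with respect to beta_k."""
    theta = theta_of(beta, x)
    if y == 0.0:
        f = -xi * gamma_pdf(xi, alpha, theta) / gamma_cdf(xi, alpha, theta)
    elif y == 1.0:
        f = (1.0 + xi) * gamma_pdf(1.0 + xi, alpha, theta) / (1.0 - gamma_cdf(1.0 + xi, alpha, theta))
    else:
        f = -alpha + (y + xi) / theta
    return [v * f for v in x]


def score_xi_i(beta, xi, alpha, x, y):
    """Score (15): derivative with respect to the shift xi."""
    theta = theta_of(beta, x)
    if y == 0.0:
        return gamma_pdf(xi, alpha, theta) / gamma_cdf(xi, alpha, theta)
    if y == 1.0:
        return -gamma_pdf(1.0 + xi, alpha, theta) / (1.0 - gamma_cdf(1.0 + xi, alpha, theta))
    return (alpha - 1.0) / (y + xi) - 1.0 / theta


def max_abs_error(beta, xi, alpha, x, y, eps=1e-6):
    """Largest gap between the analytic scores and central finite differences."""
    analytic = score_beta_i(beta, xi, alpha, x, y)
    errs = []
    for k in range(len(beta)):
        up = [b + (eps if j == k else 0.0) for j, b in enumerate(beta)]
        dn = [b - (eps if j == k else 0.0) for j, b in enumerate(beta)]
        fd = (loglik_i(up, xi, alpha, x, y) - loglik_i(dn, xi, alpha, x, y)) / (2.0 * eps)
        errs.append(abs(fd - analytic[k]))
    fd_xi = (loglik_i(beta, xi + eps, alpha, x, y) - loglik_i(beta, xi - eps, alpha, x, y)) / (2.0 * eps)
    errs.append(abs(fd_xi - score_xi_i(beta, xi, alpha, x, y)))
    return max(errs)


beta, xi, alpha, x = [0.4, -0.7], 0.08, 0.2, [1.0, 0.5]   # illustrative values
for y in (0.0, 0.37, 1.0):
    print(f"y={y}: max |analytic - numeric| = {max_abs_error(beta, xi, alpha, x, y):.2e}")
```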

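For the Fisher scoring algorithm and for asymptotic inference, we calculate the Fisher Information Matrix

$$I(\theta)_{k,l} = E_\theta\Bigl[\frac{\partial \ell(\theta)}{\partial \theta_k}\frac{\partial \ell(\theta)}{\partial \theta_l}\Bigr], \quad 1 \le k, l \le 2 + p.$$

Because the observations are independent and each score contribution has expectation zero, the cross terms with $i \ne j$ vanish, so this can be written as

$$I(\theta)_{k,l} = E_\theta\Bigl[\Bigl(\sum_{i=1}^n \frac{\partial \ell_i(\theta)}{\partial \theta_k}\Bigr)\Bigl(\sum_{i=1}^n \frac{\partial \ell_i(\theta)}{\partial \theta_l}\Bigr)\Bigr] = \sum_{i=1}^n E_\theta\Bigl[\frac{\partial \ell_i(\theta)}{\partial \theta_k}\frac{\partial \ell_i(\theta)}{\partial \theta_l}\Bigr].$$

The specific calculations of the entries $E_\theta\bigl[\frac{\partial \ell_i(\theta)}{\partial \theta_k}\frac{\partial \ell_i(\theta)}{\partial \theta_l}\bigr]$ are shown in Appendix A.2.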
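For a single observation, the expectation defining $I(\theta)$ runs over a mixed distribution: two point masses and a continuous part. The Python sketch below (our own, for an intercept-only model with illustrative parameters) evaluates $E_\theta[s]$ and $E_\theta[s^2]$ for the $\beta$ score (14) in this way, confirms that the score has mean zero, and compares the information with a Monte Carlo average over simulated censored observations.

```python
import math
import random


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def gamma_pdf(x, shape, scale=1.0):
    """g_{shape,scale}(x), the gamma density with shape and scale parameters."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def score_beta(y, alpha, theta, xi):
    """Per-observation score (14) for an intercept-only model (x_i1 = 1)."""
    if y == 0.0:
        return -xi * gamma_pdf(xi, alpha, theta) / gamma_cdf(xi, alpha, theta)
    if y == 1.0:
        return (1.0 + xi) * gamma_pdf(1.0 + xi, alpha, theta) / (1.0 - gamma_cdf(1.0 + xi, alpha, theta))
    return -alpha + (y + xi) / theta


def expected_power_of_score(power, alpha, theta, xi, m=20_000):
    """E[s^power] under (6): two point masses plus a midpoint-rule integral over (0, 1)."""
    p0 = gamma_cdf(xi, alpha, theta)
    p1 = 1.0 - gamma_cdf(1.0 + xi, alpha, theta)
    h = 1.0 / m
    cont = h * sum(score_beta((k + 0.5) * h, alpha, theta, xi) ** power
                   * gamma_pdf((k + 0.5) * h + xi, alpha, theta) for k in range(m))
    return (p0 * score_beta(0.0, alpha, theta, xi) ** power + cont
            + p1 * score_beta(1.0, alpha, theta, xi) ** power)


def monte_carlo_information(alpha, theta, xi, n=100_000, seed=7):
    """Average squared score over simulated observations Y = min(max(U - xi, 0), 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = min(max(rng.gammavariate(alpha, theta) - xi, 0.0), 1.0)
        total += score_beta(y, alpha, theta, xi) ** 2
    return total / n


alpha, theta, xi = 0.2, 1.5, 0.08   # illustrative values
e_s = expected_power_of_score(1, alpha, theta, xi)
info = expected_power_of_score(2, alpha, theta, xi)
mc = monte_carlo_information(alpha, theta, xi)
print(f"E[s] = {e_s:.2e}, I_beta = {info:.5f}, Monte Carlo E[s^2] = {mc:.5f}")
```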


As mentioned before, the Fisher Information Matrix $I(\theta)$ is used, on the one hand, in the Fisher scoring algorithm for fitting the model and, on the other hand, for asymptotic inference, in particular to estimate the standard errors of the coefficients $\beta$ from its inverse evaluated at the estimates.
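As an illustration of both uses of $I(\theta)$, the following Python sketch applies Fisher scoring to the special case noted at the end of Section 2.1: no censoring and $\xi = 0$, so that the model is a gamma GLM with logarithmic link. We additionally assume, for simplicity, that the shape $\alpha$ is known and that there is a single covariate. The score is then $\sum_i x_i (y_i/\vartheta_i - \alpha)$ and the information is $\alpha X'X$. The data are simulated with hypothetical coefficients, and the standard errors are taken from the inverse information.

```python
import math
import random


def fisher_scoring_gamma_glm(xs, ys, alpha, max_iter=50, tol=1e-10):
    """Fisher scoring for log(theta_i) = b0 + b1 * x_i with Y_i ~ Gamma(shape=alpha, scale=theta_i).

    Simplified special case (our assumption): no censoring, xi = 0 and a known alpha.
    Returns the estimates and asymptotic standard errors from the inverse information.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = alpha * alpha * (n * sxx - sx * sx)          # determinant of alpha * X'X
    inv = [[alpha * sxx / det, -alpha * sx / det],      # inverse of alpha * X'X
           [-alpha * sx / det, alpha * n / det]]
    b = [math.log(sum(ys) / (n * alpha)), 0.0]          # start from the intercept-only fit
    for _ in range(max_iter):
        r = [y / math.exp(b[0] + b[1] * x) - alpha for x, y in zip(xs, ys)]
        s = [sum(r), sum(x * ri for x, ri in zip(xs, r))]          # score vector
        step = [inv[0][0] * s[0] + inv[0][1] * s[1],
                inv[1][0] * s[0] + inv[1][1] * s[1]]               # I^{-1} s
        b = [b[0] + step[0], b[1] + step[1]]
        if abs(step[0]) + abs(step[1]) < tol:
            break
    se = [math.sqrt(inv[0][0]), math.sqrt(inv[1][1])]
    return b, se


rng = random.Random(42)
alpha, true_b = 2.0, (0.5, 0.8)                         # hypothetical simulation settings
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
ys = [rng.gammavariate(alpha, math.exp(true_b[0] + true_b[1] * x)) for x in xs]
b_hat, se = fisher_scoring_gamma_glm(xs, ys, alpha)
print("estimates:", [round(v, 3) for v in b_hat], "std. errors:", [round(v, 3) for v in se])
```

In this special case the information does not depend on $\beta$ and is inverted only once; in the censored gamma model it depends on the current estimate and has to be recomputed in every iteration.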

3 Two Extensions of the Model

A striking feature of the model defined in (3) is the assumption that the same parameters govern both the behaviour of the uncensored values and the probabilities of being censored from below or above.

The same holds true for the Tobit model. In order to relax this assumption, various extensions have been proposed. Sample selection models, first introduced by Heckman (1976), are one approach. Cragg (1971) put forward another proposal that relaxes the aforementioned assumption of a single set of parameters governing the entire model.

For count data, similar problems can arise: there may be more zeros than expected by a simple model which would otherwise fit well. Basically, two different kinds of solutions have been put forward there.

Aitchison (1955) first proposed to model the zeros and the values greater than zero separately. Mullahy (1986) used a mixture consisting of a distribution for the whole range of data, including zeros, and a point mass at zero to capture extra zeros. These two types of models have been extensively applied in different areas of research, including manufacturing defects (Lambert 1992), patent applications (Crepon and Duguet 1997), road safety (Miaou 1994), species abundance (Welsh, Cunningham, Donnelly and Lindenmayer 1996), medical consultations (Gurmu 1997), use of recreational facilities (Gurmu and Trivedi 1996; Shonkwiler and Shaw 1996), and sexual behaviour (Heilbron 1994). Ridout, Demétrio and Hinde (1998) give an overview of these models.

Our two extensions are based on similar ideas. The main difference is the way in which the zeros are modeled. In the first extension, the zeros and the non-zero values are modeled separately, assuming that the mechanisms that govern the probability of $Y$ being zero and the non-zero part are different. In the second extension, the zeros are modelled as a mixture of two mechanisms. One is responsible for artificial or extra zeros, whereas the other part is the censored gamma model introduced in Section 2.

3.1 The Two-tiered Gamma Model

Inspired by the approach of Cragg (1971), we extend the model in (3) by allowing for two different sets of parameters, one governing the probability of $Y$ being zero, and the other the behaviour for $0 < Y \le 1$.

Alternatively, the model could also be extended by allowing for a different set of parameters governing the probability of $Y$ being one. The extension presented here, which we call the two-tiered gamma model, is mainly motivated by the presumption that zeros are generated by a different mechanism from the one that governs the rest of the data. We remark that the extension to a “three-tiered” model including a different set of parameters for governing the probability of $Y$ being one is straightforward.

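More specifically, in the two-tiered gamma model, it is assumed that there exist two latent variables

$$Y_1^* \sim G_{\alpha,\tilde\vartheta}(y_1^* + \xi), \quad \text{with } \tilde\vartheta = \exp(x'\gamma), \ \gamma \in \mathbb{R}^p,$$

and

$$Y_2^* \sim G_{\alpha,\vartheta}(y_2^* + \xi) \text{ truncated at } 0, \quad \text{with } \vartheta = \exp(x'\beta), \ \beta \in \mathbb{R}^p.$$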

The first latent variable $Y_1^*$ again follows a shifted gamma distribution, whereas the second variable $Y_2^*$ has a shifted gamma distribution that is lower truncated at zero. These two latent variables are then related to $Y$ through

$$Y = \begin{cases} 0, & \text{if } Y_1^* \le 0, \\ Y_2^*, & \text{if } 0 < Y_1^* \text{ and } Y_2^* < 1, \\ 1, & \text{if } 0 < Y_1^* \text{ and } 1 \le Y_2^*. \end{cases}$$

In other words, the two-tiered gamma model first decides whether $Y$ is zero or not. This is modeled in the style of a probit model, using, however, a cumulative gamma distribution function instead of a normal one. It is then assumed that, conditional on $Y > 0$, $Y$ has a lower truncated and upper censored shifted gamma distribution on $(0, 1]$.

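The distribution of $Y$ can then be characterized as follows:

$$\begin{aligned}
P[Y = 0] &= G_{\alpha,\tilde\vartheta}(\xi), \\
P[Y \in (y, y + dy)] &= g_{\alpha,\vartheta}(y + \xi)\,\frac{1 - G_{\alpha,\tilde\vartheta}(\xi)}{1 - G_{\alpha,\vartheta}(\xi)}\,dy, \quad 0 < y < 1, \\
P[Y = 1] &= \bigl(1 - G_{\alpha,\vartheta}(1 + \xi)\bigr)\,\frac{1 - G_{\alpha,\tilde\vartheta}(\xi)}{1 - G_{\alpha,\vartheta}(\xi)},
\end{aligned} \tag{21}$$

with $\vartheta = \exp(x'\beta)$, $\tilde\vartheta = \exp(x'\gamma)$, $\beta, \gamma \in \mathbb{R}^p$, $\alpha, \xi > 0$.

Again, $g_{\alpha,\vartheta}(y)$ denotes the density of a Gamma$(\alpha, \vartheta)$ distributed variable and $G_{\alpha,\vartheta}(y)$ is the corresponding distribution function.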

We remark that the distributions in both parts of the two-tiered model, i.e., the part modelling the probability of $Y$ being zero and the part governing the behaviour of $0 < Y \le 1$, are assumed to have the same shape parameter $\alpha$ and the same location parameter $\xi$. Consequently, if $\beta = \gamma$, i.e., $\tilde\vartheta = \vartheta$, the two-tiered gamma model presented here and the aforementioned censored gamma model coincide, which means that these two models are nested. This is convenient for model comparison, since it allows the two models to be compared with a likelihood ratio test.
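The following Python sketch (our own, with illustrative parameter values) evaluates (21), checks that the two tiers together carry total probability one for an arbitrary $\tilde\vartheta$, and shows that setting $\tilde\vartheta = \vartheta$ recovers the point masses of the censored gamma model (6), which is the nesting the likelihood ratio test relies on.

```python
import math


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def gamma_pdf(x, shape, scale=1.0):
    """g_{shape,scale}(x), the gamma density with shape and scale parameters."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def two_tiered(alpha, theta, theta_tilde, xi):
    """Distribution (21): returns (P[Y = 0], density on (0, 1), P[Y = 1]).

    theta = exp(x'beta) governs 0 < Y <= 1; theta_tilde = exp(x'gamma) governs P[Y = 0].
    """
    p0 = gamma_cdf(xi, alpha, theta_tilde)
    ratio = (1.0 - p0) / (1.0 - gamma_cdf(xi, alpha, theta))   # (1 - G~(xi)) / (1 - G(xi))

    def density(y):
        return gamma_pdf(y + xi, alpha, theta) * ratio if 0.0 < y < 1.0 else 0.0

    p1 = (1.0 - gamma_cdf(1.0 + xi, alpha, theta)) * ratio
    return p0, density, p1


def total_mass(alpha, theta, theta_tilde, xi, m=20_000):
    """P[Y = 0] + P[0 < Y < 1] + P[Y = 1], the middle term by the midpoint rule."""
    p0, density, p1 = two_tiered(alpha, theta, theta_tilde, xi)
    h = 1.0 / m
    return p0 + h * sum(density((k + 0.5) * h) for k in range(m)) + p1


alpha, xi = 0.2, 0.08                        # illustrative values
theta, theta_tilde = 1.5, 0.4                # exp(x'beta) and exp(x'gamma) at some x
mass = total_mass(alpha, theta, theta_tilde, xi)
p0_nested, _, p1_nested = two_tiered(alpha, theta, theta, xi)
print("total mass:", round(mass, 6))
print("nested case:", p0_nested, p1_nested, "censored gamma (6):",
      gamma_cdf(xi, alpha, theta), 1.0 - gamma_cdf(1.0 + xi, alpha, theta))
```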


3.2 Estimation of the Two-tiered Gamma Model

Having in mind that the censored gamma model is nested in the two-tiered gamma model, we restrict ourselves to estimating the coefficients $\beta$ and $\gamma$ of the two linear predictors using Fisher's scoring algorithm. The shape parameter $\alpha$ and the location parameter $\xi$ could be estimated via numerical optimization in an outer loop, or they could be obtained from first fitting a censored gamma model.

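With $\theta = (\beta, \gamma)$, the log-likelihood function of the model can be written as $\ell(\theta) = \sum_{i=1}^n \ell_i(\theta)$ with

$$\begin{aligned}
\ell_i(\theta) ={}& \log\bigl(G_{\alpha,\tilde\vartheta_i}(\xi)\bigr)\mathbf{1}_{\{y_i=0\}} \\
&+ \Bigl(\log\bigl(g_{\alpha,\vartheta_i}(y_i + \xi)\bigr) + \log\bigl(1 - G_{\alpha,\tilde\vartheta_i}(\xi)\bigr) - \log\bigl(1 - G_{\alpha,\vartheta_i}(\xi)\bigr)\Bigr)\mathbf{1}_{\{0<y_i<1\}} \\
&+ \Bigl(\log\bigl(1 - G_{\alpha,\vartheta_i}(1 + \xi)\bigr) + \log\bigl(1 - G_{\alpha,\tilde\vartheta_i}(\xi)\bigr) - \log\bigl(1 - G_{\alpha,\vartheta_i}(\xi)\bigr)\Bigr)\mathbf{1}_{\{y_i=1\}},
\end{aligned}$$

where $\vartheta_i = \exp(x_i'\beta)$ and $\tilde\vartheta_i = \exp(x_i'\gamma)$.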

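The score functions are

$$\begin{aligned}
\frac{\partial \ell_i(\theta)}{\partial \beta_k} ={}& x_{ik}\Bigl(\frac{y_i + \xi}{\vartheta_i} - \alpha - \frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{1 - G_{\alpha,\vartheta_i}(\xi)}\Bigr)\mathbf{1}_{\{0<y_i<1\}} \\
&+ x_{ik}\Bigl(\frac{(1 + \xi)\, g_{\alpha,\vartheta_i}(1 + \xi)}{1 - G_{\alpha,\vartheta_i}(1 + \xi)} - \frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{1 - G_{\alpha,\vartheta_i}(\xi)}\Bigr)\mathbf{1}_{\{y_i=1\}}
\end{aligned} \tag{22}$$

and

$$\frac{\partial \ell_i(\theta)}{\partial \gamma_k} = -x_{ik}\frac{\xi\, g_{\alpha,\tilde\vartheta_i}(\xi)}{G_{\alpha,\tilde\vartheta_i}(\xi)}\mathbf{1}_{\{y_i=0\}} + x_{ik}\frac{\xi\, g_{\alpha,\tilde\vartheta_i}(\xi)}{1 - G_{\alpha,\tilde\vartheta_i}(\xi)}\bigl(\mathbf{1}_{\{0<y_i<1\}} + \mathbf{1}_{\{y_i=1\}}\bigr). \tag{23}$$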

The entries of the Fisher Information Matrix $I(\theta)$ are presented in Appendix A.3.
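As for the censored gamma model, the two-tiered scores can be checked against finite differences of $\ell_i(\theta)$. The Python sketch below does this for (22) and (23) with hypothetical coefficient vectors (called `beta` and `gam` in the code), keeping $\alpha$ and $\xi$ fixed as in Section 3.2.

```python
import math


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def gamma_pdf(x, shape, scale=1.0):
    """g_{shape,scale}(x), the gamma density with shape and scale parameters."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def predictors(beta, gam, x):
    """theta_i = exp(x'beta) and theta~_i = exp(x'gamma)."""
    return (math.exp(sum(b * v for b, v in zip(beta, x))),
            math.exp(sum(g * v for g, v in zip(gam, x))))


def loglik_i(beta, gam, alpha, xi, x, y):
    """Two-tiered log-likelihood contribution of one observation."""
    theta, theta_t = predictors(beta, gam, x)
    if y == 0.0:
        return math.log(gamma_cdf(xi, alpha, theta_t))
    # log(1 - G~(xi)) - log(1 - G(xi)): the rescaling of the truncated second tier
    shift = math.log(1.0 - gamma_cdf(xi, alpha, theta_t)) - math.log(1.0 - gamma_cdf(xi, alpha, theta))
    if y == 1.0:
        return math.log(1.0 - gamma_cdf(1.0 + xi, alpha, theta)) + shift
    return math.log(gamma_pdf(y + xi, alpha, theta)) + shift


def scores_i(beta, gam, alpha, xi, x, y):
    """Analytic scores (22) for beta and (23) for gamma."""
    theta, theta_t = predictors(beta, gam, x)
    trunc = xi * gamma_pdf(xi, alpha, theta) / (1.0 - gamma_cdf(xi, alpha, theta))
    if y == 0.0:
        f_beta = 0.0
        f_gam = -xi * gamma_pdf(xi, alpha, theta_t) / gamma_cdf(xi, alpha, theta_t)
    else:
        f_gam = xi * gamma_pdf(xi, alpha, theta_t) / (1.0 - gamma_cdf(xi, alpha, theta_t))
        if y == 1.0:
            f_beta = ((1.0 + xi) * gamma_pdf(1.0 + xi, alpha, theta)
                      / (1.0 - gamma_cdf(1.0 + xi, alpha, theta)) - trunc)
        else:
            f_beta = (y + xi) / theta - alpha - trunc
    return [v * f_beta for v in x], [v * f_gam for v in x]


def max_abs_error(beta, gam, alpha, xi, x, y, eps=1e-6):
    """Largest gap between (22)/(23) and central finite differences of loglik_i."""
    s_beta, s_gam = scores_i(beta, gam, alpha, xi, x, y)
    errs = []
    for k in range(len(x)):
        bump = [eps if j == k else 0.0 for j in range(len(x))]
        up_b = [b + d for b, d in zip(beta, bump)]
        dn_b = [b - d for b, d in zip(beta, bump)]
        up_g = [g + d for g, d in zip(gam, bump)]
        dn_g = [g - d for g, d in zip(gam, bump)]
        fd_b = (loglik_i(up_b, gam, alpha, xi, x, y) - loglik_i(dn_b, gam, alpha, xi, x, y)) / (2.0 * eps)
        fd_g = (loglik_i(beta, up_g, alpha, xi, x, y) - loglik_i(beta, dn_g, alpha, xi, x, y)) / (2.0 * eps)
        errs += [abs(fd_b - s_beta[k]), abs(fd_g - s_gam[k])]
    return max(errs)


beta, gam, alpha, xi, x = [0.4, -0.7], [-0.3, 0.5], 0.2, 0.08, [1.0, 0.5]   # illustrative values
for y in (0.0, 0.37, 1.0):
    print(f"y={y}: max |analytic - numeric| = {max_abs_error(beta, gam, alpha, xi, x, y):.2e}")
```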

3.3 The Zero-Inflated Gamma Model

The extension presented in this section is motivated by the following idea. Assume that our quantity of interest indeed follows a censored, shifted gamma distribution. However, additional, artificial zeros occur by some other mechanism, and thus there are more zeros than expected. Recently, a zero-inflated model for censored continuous data has also been presented by Couturier and Victoria-Feser (to appear).

These additional zeros are now allowed to follow their own model, in contrast to the two-tiered model where all zeros were described together. This view may make sense in specific applications like insurance, where some of the claims that result in zero loss may be cases which were filed in order not to miss a formal deadline or for similar artificial reasons.


In the zero-inflated model, the existence of two latent variables is again assumed,

$$Y_1^* \sim \mathcal{N}(\mu, 1) \quad \text{and} \quad Y_2^* \sim G_{\alpha,\vartheta}(y_2^* + \xi),$$

with $\mu = -x'\gamma$ and $\vartheta = \exp(x'\beta)$. Concerning the first variable $Y_1^*$, there is no longer any need for the shifted gamma distribution. Again, the censored gamma model is nested in the zero-inflated model. However, the zero-inflated model coincides with the censored gamma model only at the boundary of its parameter space, namely if $\mu \to \infty$, i.e., if the probability $P[Y_1^* \le 0] = \Phi(-\mu)$ of an extra zero tends to 0. For simplicity, we opt for the normal distribution, i.e., the extra zeros are modelled using a probit model. Alternatively, one could use the logistic distribution, i.e., a logit model.

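These two variables are then related to $Y$ through

$$Y = \begin{cases} 0, & \text{if } Y_1^* \le 0, \text{ or if } 0 < Y_1^* \text{ and } Y_2^* \le 0, \\ Y_2^*, & \text{if } 0 < Y_1^* \text{ and } 0 < Y_2^* < 1, \\ 1, & \text{if } 0 < Y_1^* \text{ and } 1 \le Y_2^*. \end{cases}$$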

The variable $Y_1^*$ first decides whether the observed response variable $Y$ is zero, i.e., if $Y_1^* \le 0$ it follows that $Y = 0$. Next, conditional on $Y_1^* > 0$, $Y$ is distributed according to a censored, shifted gamma distribution.

This means that the zeros are governed by two different components of the model. First, zeros arise if $Y_1^*$ is smaller than or equal to zero. Second, they occur if, conditional on $Y_1^* > 0$, $Y_2^*$ is smaller than or equal to zero. Metaphorically speaking, we add extra mass at zero to the censored gamma distribution, which can account for potential extra zeros. This approach allows us to distinguish structural and extra zeros.

Note that the main distinctive feature of this model, in contrast to the two-tiered model presented in the previous section, is that the distribution of the second tier of the model is lower censored instead of lower truncated.

As stated above, we choose to model the extra zeros using a probit model, i.e.,

$$p_0 := P[Y_1^* \le 0] = \Phi(x'\gamma), \quad \gamma \in \mathbb{R}^p. \tag{24}$$

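Consequently, the distribution of $Y$ can be characterized by

$$\begin{aligned}
P[Y = 0] &= p_0 + (1 - p_0)\, G_{\alpha,\vartheta}(\xi), \\
P[Y \in (y, y + dy)] &= (1 - p_0)\, g_{\alpha,\vartheta}(y + \xi)\, dy, \quad 0 < y < 1, \\
P[Y = 1] &= (1 - p_0)\bigl(1 - G_{\alpha,\vartheta}(1 + \xi)\bigr),
\end{aligned} \tag{25}$$

where $p_0 = \Phi(x'\gamma)$, $\vartheta = \exp(x'\beta)$, $\gamma, \beta \in \mathbb{R}^p$, $\alpha, \xi > 0$.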
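The following Python sketch (our own, with illustrative values and a hypothetical name `eta` for the probit predictor $x'\gamma$) evaluates the point masses in (25), compares them with data simulated from the two latent variables, and reports the share $p_0 / P[Y = 0]$ of extra zeros among all zeros, which is the quantity that separates structural and extra zeros.

```python
import math
import random


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def normal_cdf(z):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def zero_inflated_masses(eta, alpha, theta, xi):
    """P[Y = 0] and P[Y = 1] from (25), plus the share of extra zeros among all zeros.

    eta is the probit predictor x'gamma, so p0 = Phi(eta) as in (24).
    """
    p0 = normal_cdf(eta)
    prob_zero = p0 + (1.0 - p0) * gamma_cdf(xi, alpha, theta)
    prob_one = (1.0 - p0) * (1.0 - gamma_cdf(1.0 + xi, alpha, theta))
    return prob_zero, prob_one, p0 / prob_zero


def simulate(eta, alpha, theta, xi, n, seed=3):
    """Draw Y from Y*_1 ~ N(-eta, 1) and Y*_2 + xi ~ Gamma(alpha, theta)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        if rng.gauss(-eta, 1.0) <= 0.0:      # extra zero
            ys.append(0.0)
        else:                                # censored, shifted gamma part
            ys.append(min(max(rng.gammavariate(alpha, theta) - xi, 0.0), 1.0))
    return ys


eta, alpha, theta, xi = -0.5, 0.8, 0.6, 0.1   # illustrative values
prob_zero, prob_one, extra_share = zero_inflated_masses(eta, alpha, theta, xi)
ys = simulate(eta, alpha, theta, xi, n=100_000)
print(f"P[Y=0]: formula {prob_zero:.4f}, simulated {sum(y == 0.0 for y in ys) / len(ys):.4f}")
print(f"P[Y=1]: formula {prob_one:.4f}, simulated {sum(y == 1.0 for y in ys) / len(ys):.4f}")
print(f"share of extra zeros among zeros: {extra_share:.3f}")
```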


3.4 Estimation of the Zero-Inflated Gamma Model

Since the EM algorithm (Dempster, Laird and Rubin 1977) lends itself naturally to fitting mixtures of distributions, and because calculating the scores and the Fisher Information Matrix would be overly complicated, we use the EM algorithm here. Generic optimization algorithms, such as the Nelder-Mead method or a quasi-Newton method with numerically approximated gradients, ran into convergence problems.

The EM algorithm presented in the following finds the maximum likelihood estimators of the parameters $\theta = (\alpha, \beta, \gamma)$. The location parameter $\xi$ is fixed and assumed to be known. Again, $\xi$ could be obtained from first fitting the censored gamma model, or it could be estimated through numerical optimization in an outer loop.

With regard to the EM algorithm, we introduce two latent data variables $Z$ and $Y^*$. For each $i$, $Z_i$ indicates whether the observation belongs to the extra zero part of the model ($Z_i = 0$) or to the censored gamma distribution ($Z_i = 1$). The second missing data variable $Y_i^*$ is for the censored gamma part of the model. It denotes the value of the underlying latent variable $Y_i^*$, which is then censored at zero and one. The complete data $W$ therefore consist of $(Z_1, Y_1^*), \ldots, (Z_n, Y_n^*)$.

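Using this, the complete-data likelihood can be written as

$$L_W(\theta) = \prod_{i=1}^n \bigl(\Phi(x_i'\gamma)\bigr)^{1 - Z_i} \cdot \bigl((1 - \Phi(x_i'\gamma)) \cdot g_{\alpha,\vartheta_i}(Y_i^* + \xi)\bigr)^{Z_i}, \tag{26}$$

where $\log(\vartheta_i) = x_i'\beta$ and $\theta = (\alpha, \beta, \gamma)$, and the complete-data log-likelihood is

$$\begin{aligned}
\ell_W(\theta) ={}& \sum_{i=1}^n (1 - Z_i)\log\bigl(\Phi(x_i'\gamma)\bigr) + Z_i \log\bigl((1 - \Phi(x_i'\gamma)) \cdot g_{\alpha,\vartheta_i}(Y_i^* + \xi)\bigr) \\
={}& \sum_{i=1}^n \Bigl((1 - Z_i)\log\bigl(\Phi(x_i'\gamma)\bigr) + Z_i \log\bigl(1 - \Phi(x_i'\gamma)\bigr)\Bigr) \\
&+ \sum_{i=1}^n Z_i \Bigl(-\alpha\log(\vartheta_i) - \log(\Gamma(\alpha)) + (\alpha - 1)\log(Y_i^* + \xi) - \frac{Y_i^* + \xi}{\vartheta_i}\Bigr).
\end{aligned} \tag{27}$$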

The EM algorithm produces a sequence of estimates $\{\theta^{(t)}, t = 0, 1, 2, \ldots\}$ by alternately applying two steps:

E-step. Compute the expected value of the log-likelihood with respect to the conditional distribution of $W$ given $y$ under the current estimate of the parameters $\theta^{(t)}$:

$$Q^{(t+1)}(\theta) = E_{\theta^{(t)}}\bigl[\ell_W(\theta) \mid y\bigr].$$

M-step. Update the parameter estimates according to

$$\theta^{(t+1)} = \arg\max_\theta Q^{(t+1)}(\theta).$$


From (27), we infer that in the E-step three different expectations have to be calculated: $E_{\theta^{(t)}}[Z_i \mid y]$, $E_{\theta^{(t)}}[Y_i^* \mid y]$, and $E_{\theta^{(t)}}[\log(Y_i^* + \xi) \mid y]$. For the sake of notational brevity, we introduce the following two abbreviations:

$$A_i^{(t)} = \Phi\bigl(x_i'\gamma^{(t)}\bigr) \quad \text{and} \quad B_i^{(t)}(\xi) = G_{\alpha^{(t)},\vartheta_i^{(t)}}(\xi).$$

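The three expectations are then calculated as follows:

$$E_{\theta^{(t)}}[Z_i \mid y] = \begin{cases} \dfrac{(1 - A_i^{(t)})\, B_i^{(t)}(\xi)}{A_i^{(t)} + (1 - A_i^{(t)})\, B_i^{(t)}(\xi)}, & \text{if } y_i = 0, \\ 1, & \text{if } y_i > 0, \end{cases} \tag{28}$$

$$E_{\theta^{(t)}}[Y_i^* + \xi \mid y] = \begin{cases} \alpha^{(t)}\vartheta_i^{(t)}\, \dfrac{G_{\alpha^{(t)}+1,\vartheta_i^{(t)}}(\xi)}{B_i^{(t)}(\xi)}, & \text{if } y_i = 0, \\ y_i + \xi, & \text{if } 0 < y_i < 1, \\ \alpha^{(t)}\vartheta_i^{(t)}\, \dfrac{1 - G_{\alpha^{(t)}+1,\vartheta_i^{(t)}}(1 + \xi)}{1 - B_i^{(t)}(1 + \xi)}, & \text{if } y_i = 1, \end{cases}$$

and

$$E_{\theta^{(t)}}[\log(Y_i^* + \xi) \mid y] = \begin{cases} \log(\vartheta_i^{(t)}) + \dfrac{H^{(1)}_{\alpha^{(t)}}\bigl(0, \xi/\vartheta_i^{(t)}\bigr)}{B_i^{(t)}(\xi)}, & \text{if } y_i = 0, \\ \log(y_i + \xi), & \text{if } 0 < y_i < 1, \\ \log(\vartheta_i^{(t)}) + \dfrac{H^{(1)}_{\alpha^{(t)}}\bigl((1 + \xi)/\vartheta_i^{(t)}, \infty\bigr)}{1 - B_i^{(t)}(1 + \xi)}, & \text{if } y_i = 1. \end{cases} \tag{29}$$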

Concerning the M-step, we note that the complete-data log-likelihood in (27), and hence its conditional expectation $Q^{(t+1)}$, splits into two terms which can be maximized separately. The first term contains the parameters of the extra zero model part ($\gamma$) and the other contains the parameters of the censored gamma distribution ($\alpha$ and $\beta$).
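Two of the E-step quantities are sketched in Python below: the responsibility $E_{\theta^{(t)}}[Z_i \mid y]$, which for $y_i = 0$ is $(1 - A_i)B_i / (A_i + (1 - A_i)B_i)$ and for $y_i > 0$ equals 1 because a positive observation cannot be an extra zero, and the truncated gamma mean $E_{\theta^{(t)}}[Y_i^* + \xi \mid y_i = 0] = \alpha\vartheta_i G_{\alpha+1,\vartheta_i}(\xi)/G_{\alpha,\vartheta_i}(\xi)$, the latter checked by rejection sampling. The "current" parameter values are illustrative, and the sketch omits the logarithmic expectation (29) and the M-step.

```python
import math
import random


def gamma_cdf(x, shape, scale=1.0):
    """G_{shape,scale}(x) via the power series of the regularized lower incomplete gamma function."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def normal_cdf(z):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def responsibility(y, a_i, b_i):
    """E[Z_i | y]: probability that observation i belongs to the censored gamma part.

    a_i = Phi(x_i' gamma) is the extra-zero probability, b_i = G_{alpha,theta_i}(xi).
    """
    if y > 0.0:
        return 1.0
    return (1.0 - a_i) * b_i / (a_i + (1.0 - a_i) * b_i)


def latent_mean_given_zero(alpha, theta, xi):
    """E[Y*_i + xi | y_i = 0] under the gamma part: mean of Gamma(alpha, theta) restricted to (0, xi]."""
    return alpha * theta * gamma_cdf(xi, alpha + 1.0, theta) / gamma_cdf(xi, alpha, theta)


alpha, theta, xi, eta = 0.8, 0.6, 0.1, -0.5     # illustrative current values
a_i, b_i = normal_cdf(eta), gamma_cdf(xi, alpha, theta)
print("E[Z | y=0] =", round(responsibility(0.0, a_i, b_i), 4),
      " E[Z | y=0.3] =", responsibility(0.3, a_i, b_i))

# Rejection sampling: keep draws U ~ Gamma(alpha, theta) with U <= xi, i.e. Y* <= 0.
rng = random.Random(11)
draws = [u for u in (rng.gammavariate(alpha, theta) for _ in range(200_000)) if u <= xi]
print("truncated mean:", latent_mean_given_zero(alpha, theta, xi), "Monte Carlo:", sum(draws) / len(draws))
```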

4 An Application

We apply the models presented above to a dataset from insurance. A surety bond is a contractual agreement among three parties: the contractor, who performs a contractual obligation; the obligee, who receives the obligation; and the surety, in our case the insurance company, who covers the risk that the contractor fails to fulfill the obligation.

Our dataset consists of European surety bonds that resulted in a claim. The ultimate loss for these claims is called “Loss Given Default” (LGD). For each bond, the maximal amount that is covered by the insurance company, a quantity called “face value” (FV), is determined a priori. This allows us to standardize the LGD by dividing it by the face value, such that our variable of interest lies between 0 and 1:

$$0 \le \frac{\text{LGD}}{\text{FV}} \le 1. \tag{30}$$

We have worked with the original dataset, but for confidentiality reasons the results presented here are obtained on the basis of a subsample of the original set. The subsample, consisting of more than 5000 bonds, is obtained by using a random selection mechanism, with selection probabilities that depend on certain characteristics of the respective bonds, so that the value of the average standardized loss LGD/FV is altered in order not to reveal the true average. As a consequence, the results presented in this paper are not the real ones, but they are close enough to the real ones to reflect the major phenomena observed in reality. Even though the real results cannot be disclosed, we can assure the reader that the models fit the original data at least as well as they fit the subsample.

For an illustration of the data, see Figure 1, which contains a histogram of the standardized losses. Since the insurance company can often recover costs, observations with no ultimate loss at all are frequent. In fact, about 53% of all bonds in the subsample have no loss. On the other hand, there is a substantial proportion (15%) of bonds that have full loss, i.e., an LGD/FV equal to 1.

Figure 1: Histogram of LGD/FV and fitted censored gamma model with no covariates. The numbers above the blue arrow and bar represent the percentage of LGD/FV's being exactly zero or one, respectively. In brackets are the corresponding numbers as predicted by the censored gamma model. The dashed red line represents the fitted model.

Apart from providing a probabilistic model for the surety LGD, the purpose is also to explore the relation of the losses to certain covariates, which are briefly described in the following.

The relative default time (RDT) of a bond is the proportion of its total life span that has passed between its issuance and the default. This quantity allows us to explore the time development of the losses from the issuing date to the end date (maturity). Experience and size are two categorical variables, each attaining three different levels, which represent the experience (low, mid, high) and the size (small, medium, large) of the contractor. Furthermore, the face value is included as a covariate. Our dataset basically consists of three different types of surety bonds, which are called maintenance, performance, and advance payment bonds. In some cases, it is not possible to distinguish between maintenance and performance bonds, which is why there is an additional category for these hybrid bonds. Usually, European surety bonds do not cover the whole amount of an underlying contract but only a certain fraction. We have information about this percentage and include it as an additional covariate.

Figure 2: Scatter plot of face value (on a logarithmic scale) vs. LGD/FV. The jittered points in the bars below 0.0 and above 1.0 represent bonds with LGD/FV being exactly zero and one, respectively. The colored continuous lines are non-parametrically fitted quantiles and mean, whereas the dashed lines represent the corresponding quantiles and mean of the fitted censored gamma model with logarithmic and squared logarithmic face value as covariates.

We first estimate the censored gamma model with no covariates and illustrate its fit in Figure 1. The dashed red line represents the fitted model. The numbers in parentheses above the bars show the fitted probabilities of being zero and one. Evidently, the plain model with no covariates fits the data well. The observed and the modeled probabilities of being zero or one are very similar, and the continuous part of the model accurately fits the histogram.²

Next, we fit a univariate model including the face value, more specifically the logarithm and the squared logarithm of the face value, as covariates. We illustrate the fitted model in Figure 2. The colored continuous lines are non-parametrically fitted quantiles (see Koenker 2005) and mean (calculated using local polynomial regression, see Chambers and Hastie 1992, Chapter 8). The dashed lines represent the corresponding quantiles and mean of the fitted model calculated using (31). The non-parametrically fitted mean and the mean of the fitted censored gamma model are close together. Moreover, the non-parametrically estimated quantiles and the quantiles from the fitted model match well.

Finally, in Table 1, the fitted censored gamma, two-tiered and zero-inflated models including all covariates are presented. We remark that for the two ordinal factor variables experience and size, we use orthogonal polynomial contrasts. Concerning the factor Type, we use treatment contrasts with the baseline level equaling maintenance. In the two-tiered and zero-inflated models, the shape parameter $\alpha$ and the location parameter $\xi$ are considered as fixed nuisance parameters. Their values are set equal to the ones in the censored gamma model. For the censored gamma and two-tiered models, standard errors are estimated using the Fisher information; for the zero-inflated model, they are obtained by numerically approximating the Fisher Information Matrix at the optimum. The log-likelihoods of both the two-tiered and the zero-inflated models are considerably higher than that of the censored gamma model, the two-tiered model having the highest log-likelihood. We interpret this as an indication that the zeros are indeed governed by a separate mechanism.
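The comparison can be made formal with the likelihood ratio test mentioned in Section 3.1, since the censored gamma model is the two-tiered model with $\gamma = \beta$. The Python sketch below computes the statistic from the log-likelihoods in Table 1, taking 14 degrees of freedom (our count of the $\gamma$ coefficients in the table, which is an assumption) and using the closed-form $\chi^2$ tail for even degrees of freedom. The resulting p-value is only nominal: $\alpha$ and $\xi$ were fixed at their censored gamma estimates rather than re-estimated, and for the zero-inflated model, which is nested only at the boundary of its parameter space, the standard $\chi^2$ reference does not apply.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail P[chi^2_df > x] for even df: exp(-x/2) * sum_{n < df/2} (x/2)^n / n!."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for n in range(1, df // 2):
        term *= half / n
        total += term
    return math.exp(-half) * total


loglik_censored = -7643.6     # Table 1, censored gamma model
loglik_two_tiered = -7468.4   # Table 1, two-tiered model
df = 14                       # our count of gamma coefficients in Table 1 (assumption)
lr = 2.0 * (loglik_two_tiered - loglik_censored)
print(f"LR statistic: {lr:.1f}, nominal p-value: {chi2_sf_even_df(lr, df):.1e}")
```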

5 Conclusion

Regression models were presented for limited dependent variables whose range of values is bounded and whose boundary values occur frequently. The first model determines the distribution of the values between the limits and the frequency of the limiting values in a parsimonious way, requiring few parameters. Two extensions of this model, covering cases in which the frequencies of the limits do not follow this parsimonious description, were introduced as well.

The first model was applied to an LGD dataset from insurance, to which it provided a good fit.

A crucial assumption when fitting the censored gamma model was the independence of different observations. To relax this assumption, the model can be extended to allow for dependence. This can be done, for instance, using random effects in the model specification, as proposed by Min and Agresti (2005).

² Due to the large number of observations, a chi-square goodness-of-fit test still shows significant deviations.


Model              Censored           Two-Tiered                            Zero-Inflated
Covariate          Coef (Std.Err.)    Coef β (Std.Err.)  Coef γ (Std.Err.)  Coef β (Std.Err.)  Coef γ (Std.Err.)
Gamma par. log(α)  -1.58 (0.05) ***
Gamma par. log(ξ)  -2.53 (0.09) ***
Intercept          3.86 (0.35) ***    6.11 (0.4) ***     -0.02 (0.47)       5.76 (0.37) ***    6.18 (1.39) ***
RDTL Lin           -0.07 (0.11)       0.37 (0.15) *      -0.59 (0.17) ***   0.03 (0.13)        0.27 (0.2)
RDTL Quad          0.81 (0.38) *      2.31 (0.51) ***    -0.7 (0.57)        1.53 (0.46) ***    1.97 (0.65) **
Experience Lin     -0.95 (0.08) ***   -0.51 (0.09) ***   -1.81 (0.14) ***   -0.78 (0.09) ***   1.6 (0.66) *
Experience Quad    0.16 (0.06) **     0.09 (0.06)        0.33 (0.1) ***     0.1 (0.06) ·       -0.63 (0.37) ·
Size Lin           -0.12 (0.1)        0.22 (0.14)        -0.62 (0.15) ***   0.09 (0.13)        0.68 (0.19) ***
Size Quad          0.31 (0.09) ***    -0.1 (0.14)        0.82 (0.13) ***    -0.02 (0.12)       -0.92 (0.15) ***
Face Value Lin     -0.82 (0.08) ***   -1.46 (0.09) ***   0.26 (0.11) *      -1.27 (0.09) ***   -2.12 (0.39) ***
Face Value Quad    0.58 (0.08) ***    0.48 (0.09) ***    0.91 (0.15) ***    0.61 (0.09) ***    -1.67 (0.53) **
Type Hybrid        -0.19 (1.48)       1.9 (2.24)         -1.89 (2.26)       0.55 (1.85)        3.21 (2.85)
Type Performance   0.26 (0.14) ·      0.56 (0.18) **     -0.13 (0.23)       0.5 (0.16) **      0.84 (0.26) **
Type Other         0.45 (0.19) *      1.16 (0.31) ***    -0.18 (0.27)       0.68 (0.23) **     0.49 (0.32)
Adv. Paym.         4.8 (2.72) ·       2.32 (3.08)        7.57 (4.38) ·      3.64 (2.75)        -5.21 (8.14)
Ins. Frac.         1.5 (0.55) **      1.88 (0.83) *      0.27 (0.78)        1.65 (0.65) *      1.4 (0.83) ·
Log-Likelihood     -7643.6            -7468.4                               -7526

Table 1: Fitted censored, two-tiered, and zero-inflated gamma models including all covariates. In the two-tiered and zero-inflated models, the values of α and ξ are set equal to the ones in the censored gamma model. Standard errors are given in parentheses.


A Proof of Lemma 2.1 and Derivation of Fisher Information Matrices

A.1 Proof of Lemma 2.1

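Firstly, a censored gamma distribution with density as in (6) has expectation

$$
\begin{aligned}
E[Y \mid x] &= 0 \cdot G_{\alpha,\vartheta}(\xi) + \int_0^1 y\, g_{\alpha,\vartheta}(y+\xi)\,dy + 1 \cdot \big(1 - G_{\alpha,\vartheta}(1+\xi)\big) \\
&= \int_\xi^{1+\xi} (z-\xi)\, g_{\alpha,\vartheta}(z)\,dz + \big(1 - G_{\alpha,\vartheta}(1+\xi)\big) \\
&= \alpha\vartheta\big(G_{\alpha+1,\vartheta}(1+\xi) - G_{\alpha+1,\vartheta}(\xi)\big) + \xi G_{\alpha,\vartheta}(\xi) - \xi G_{\alpha,\vartheta}(1+\xi) + \big(1 - G_{\alpha,\vartheta}(1+\xi)\big) \\
&= \alpha\vartheta\big(G_{\alpha+1,\vartheta}(1+\xi) - G_{\alpha+1,\vartheta}(\xi)\big) + (1+\xi)\big(1 - G_{\alpha,\vartheta}(1+\xi)\big) - \xi\big(1 - G_{\alpha,\vartheta}(\xi)\big),
\end{aligned} \tag{31}
$$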

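where in the third line we have used identity (37). Secondly, for a continuous $x_j$, using

$$
\frac{\partial G_{\alpha,\vartheta}(\xi)}{\partial\vartheta} = \frac{\partial G_{\alpha,1}(\xi/\vartheta)}{\partial\vartheta} = -\frac{\xi}{\vartheta^2}\, g_{\alpha,1}\Big(\frac{\xi}{\vartheta}\Big) = -\frac{\xi}{\vartheta}\, g_{\alpha,\vartheta}(\xi), \tag{32}
$$

or

$$
\frac{\partial G_{\alpha,\vartheta}(\xi)}{\partial\vartheta} = -\alpha\, g_{\alpha+1,\vartheta}(\xi), \tag{33}
$$

and the fact that $\partial\vartheta/\partial x_j = \vartheta\beta_j$, we can compute the partial derivatives of $E[Y \mid x]$ with respect to $x_j$ as

$$
\begin{aligned}
\frac{\partial E[Y \mid x]}{\partial x_j} ={}& -\xi\alpha\, g_{\alpha+1,\vartheta}(\xi)\,\vartheta\beta_j + \alpha\vartheta\beta_j\big(G_{\alpha+1,\vartheta}(1+\xi) - G_{\alpha+1,\vartheta}(\xi)\big) \\
& - \alpha\vartheta\,\frac{1+\xi}{\vartheta}\, g_{\alpha+1,\vartheta}(1+\xi)\,\vartheta\beta_j + \alpha\vartheta\,\frac{\xi}{\vartheta}\, g_{\alpha+1,\vartheta}(\xi)\,\vartheta\beta_j \\
& + (1+\xi)\,\alpha\, g_{\alpha+1,\vartheta}(1+\xi)\,\vartheta\beta_j \\
={}& \alpha\vartheta\big(G_{\alpha+1,\vartheta}(1+\xi) - G_{\alpha+1,\vartheta}(\xi)\big)\beta_j.
\end{aligned} \tag{34}
$$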
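Both closed forms are easy to check numerically. The sketch below is our own plain-Python illustration, not the authors' code: it compares (31) with direct quadrature of the defining expectation and (34) with a central finite difference in a continuous covariate, assuming the log link ϑ = exp(β0 + β_j x_j) implied by ∂ϑ/∂x_j = ϑβ_j. The helpers (series-based gamma CDF, Simpson's rule) and all parameter values are illustrative choices.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density g_{shape,scale}(x) (scale parameterisation)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale))

def gamma_cdf(x, shape, scale):
    """Gamma CDF G_{shape,scale}(x) via the incomplete-gamma power series (moderate x/scale only)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = shape
    while term > total * 1e-16:
        k += 1.0
        term *= z / k
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    return h / 3 * (f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n)))

def mean_closed(alpha, theta, xi):
    """E[Y | x] according to (31)."""
    G = gamma_cdf
    return (alpha * theta * (G(1 + xi, alpha + 1, theta) - G(xi, alpha + 1, theta))
            + (1 + xi) * (1 - G(1 + xi, alpha, theta)) - xi * (1 - G(xi, alpha, theta)))

def mean_quadrature(alpha, theta, xi):
    """0 * P(Y = 0) + integral of y g(y + xi) over (0, 1) + 1 * P(Y = 1)."""
    interior = simpson(lambda y: y * gamma_pdf(y + xi, alpha, theta), 0.0, 1.0)
    return interior + (1 - gamma_cdf(1 + xi, alpha, theta))

def marginal_effect(alpha, theta, xi, beta_j):
    """dE[Y | x]/dx_j according to (34)."""
    return alpha * theta * (gamma_cdf(1 + xi, alpha + 1, theta) - gamma_cdf(xi, alpha + 1, theta)) * beta_j

alpha, xi, beta0, beta_j, x_j = 1.7, 0.15, -0.8, 0.6, 0.4      # arbitrary illustrative values
theta = math.exp(beta0 + beta_j * x_j)
h = 1e-5
fd = (mean_closed(alpha, math.exp(beta0 + beta_j * (x_j + h)), xi)
      - mean_closed(alpha, math.exp(beta0 + beta_j * (x_j - h)), xi)) / (2 * h)
print(mean_closed(alpha, theta, xi), mean_quadrature(alpha, theta, xi))
print(marginal_effect(alpha, theta, xi, beta_j), fd)
```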


A.2 Fisher Information Matrix for the Censored Gamma Model

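With (13), it follows that

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\alpha'}\frac{\partial\ell}{\partial\alpha'}\Big]
={}& E_\theta\Big[\Big(\frac{\alpha}{G_{\alpha,\vartheta_i}(\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\Big(0,\frac{\xi}{\vartheta_i}\Big)\Big)\mathbb{1}_{\{y_i=0\}}\Big)^2\Big] \\
& + E_\theta\Big[\Big(\alpha\big(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i+\xi)\big)\mathbb{1}_{\{0<y_i<1\}}\Big)^2\Big] \\
& + E_\theta\Big[\Big(-\frac{\alpha}{1-G_{\alpha,\vartheta_i}(1+\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(1+\xi) + H^{(1)}_\alpha\Big(0,\frac{1+\xi}{\vartheta_i}\Big)\Big)\mathbb{1}_{\{y_i=1\}}\Big)^2\Big] \\
={}& \Big(\frac{\alpha}{G_{\alpha,\vartheta_i}(\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\Big(0,\frac{\xi}{\vartheta_i}\Big)\Big)\Big)^2 \cdot G_{\alpha,\vartheta_i}(\xi) \\
& + \int_0^1 \big(\alpha(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i+\xi))\big)^2 g_{\alpha,\vartheta_i}(y_i+\xi)\,dy_i \\
& + \Big(\frac{\alpha}{1-G_{\alpha,\vartheta_i}(1+\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(1+\xi) + H^{(1)}_\alpha\Big(0,\frac{1+\xi}{\vartheta_i}\Big)\Big)\Big)^2 \cdot \big(1-G_{\alpha,\vartheta_i}(1+\xi)\big).
\end{aligned}
$$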

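Using (41) and (42), the middle summand of this expression is calculated as

$$
\begin{aligned}
\int_0^1 &\big(\alpha(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i+\xi))\big)^2 g_{\alpha,\vartheta_i}(y_i+\xi)\,dy_i \\
={}& \alpha^2\big(\log(\vartheta_i)+\psi(\alpha)\big)^2\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) \\
& - 2\alpha^2\big(\log(\vartheta_i)+\psi(\alpha)\big)\Big(\log(\vartheta_i)\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) + H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big) \\
& + \alpha^2\log(\vartheta_i)^2\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) + 2\alpha^2\log(\vartheta_i)H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) + \alpha^2 H^{(2)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) \\
={}& \alpha^2\psi(\alpha)^2\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) - 2\alpha^2\psi(\alpha)H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) + \alpha^2 H^{(2)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big).
\end{aligned}
$$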


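From this it follows that

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\alpha'}\frac{\partial\ell}{\partial\alpha'}\Big]
={}& \frac{\alpha^2}{G_{\alpha,\vartheta_i}(\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\Big(0,\frac{\xi}{\vartheta_i}\Big)\Big)^2 \\
& + \alpha^2\Big(\psi(\alpha)^2\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) - 2\psi(\alpha)H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) + H^{(2)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big) \\
& + \frac{\alpha^2}{1-G_{\alpha,\vartheta_i}(1+\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(1+\xi) + H^{(1)}_\alpha\Big(0,\frac{1+\xi}{\vartheta_i}\Big)\Big)^2.
\end{aligned}
$$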

For the remaining entries of the Fisher information matrix, the calculation proceeds in the same way. The computation of each expectation splits into three terms, of which the middle term, corresponding to the non-censored part of the model, requires the most effort. In the following, we therefore first calculate the corresponding middle term in each case.
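The three-part split can also be carried out numerically, which gives an independent cross-check of the closed-form entries that follow. The sketch below is an illustration of ours, not the authors' implementation. It assumes an intercept-only model with ϑ = exp(β0), uses the parameters (log α, β0, ξ), and obtains the scores by central finite differences; the helper names are hypothetical. Besides the information matrix it returns the expected score, which should vanish up to quadrature error.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density g_{shape,scale}(x) (scale parameterisation)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale))

def gamma_cdf(x, shape, scale):
    """Gamma CDF G_{shape,scale}(x) via the incomplete-gamma power series (moderate x/scale only)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = shape
    while term > total * 1e-16:
        k += 1.0
        term *= z / k
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def loglik(params, y, part):
    """Log-likelihood of one observation, params = (log alpha, beta0, xi), theta = exp(beta0).

    part is "zero" (y = 0), "one" (y = 1) or "interior" (0 < y < 1).
    """
    alpha, theta, xi = math.exp(params[0]), math.exp(params[1]), params[2]
    if part == "zero":
        return math.log(gamma_cdf(xi, alpha, theta))
    if part == "one":
        return math.log(1.0 - gamma_cdf(1.0 + xi, alpha, theta))
    return math.log(gamma_pdf(y + xi, alpha, theta))

def score(params, y, part, h=1e-5):
    """Central finite-difference gradient of loglik with respect to params."""
    grad = []
    for j in range(len(params)):
        up, dn = list(params), list(params)
        up[j] += h
        dn[j] -= h
        grad.append((loglik(up, y, part) - loglik(dn, y, part)) / (2 * h))
    return grad

def fisher_information(params, n=1000):
    """Return (E[s s^T], E[s]) summed over the mass at 0, the density on (0, 1) and the mass at 1."""
    alpha, theta, xi = math.exp(params[0]), math.exp(params[1]), params[2]
    weighted = [(gamma_cdf(xi, alpha, theta), score(params, 0.0, "zero")),
                (1.0 - gamma_cdf(1.0 + xi, alpha, theta), score(params, 1.0, "one"))]
    step = 1.0 / n                                   # composite Simpson nodes on [0, 1]
    for i in range(n + 1):
        w = (1 if i in (0, n) else 4 if i % 2 else 2) * step / 3
        y = i * step
        weighted.append((w * gamma_pdf(y + xi, alpha, theta), score(params, y, "interior")))
    m = len(params)
    info = [[sum(p * s[j] * s[k] for p, s in weighted) for k in range(m)] for j in range(m)]
    mean_score = [sum(p * s[j] for p, s in weighted) for j in range(m)]
    return info, mean_score

params = (math.log(2.0), math.log(0.5), 0.1)        # alpha = 2, theta = 0.5, xi = 0.1 (arbitrary)
info, mean_score = fisher_information(params)
for row in info:
    print([round(v, 4) for v in row])
```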

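With (37), (41), (43), and (35), we calculate

$$
\begin{aligned}
E_\theta\Big[&\alpha\big(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i+\xi)\big)\,x_{ik}\Big(-\alpha + \frac{y_i+\xi}{\vartheta_i}\Big)\mathbb{1}_{\{0<y_i<1\}}\Big] \\
={}& \alpha^2 x_{ik}\log(\vartheta_i)\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) + \alpha^2 x_{ik}\psi(\alpha)\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) \\
& - \alpha^2 x_{ik}\log(\vartheta_i)\big(G_{\alpha+1,\vartheta_i}(1+\xi) - G_{\alpha+1,\vartheta_i}(\xi)\big) - \alpha^2 x_{ik}\psi(\alpha)\big(G_{\alpha+1,\vartheta_i}(1+\xi) - G_{\alpha+1,\vartheta_i}(\xi)\big) \\
& - \alpha^2 x_{ik}\log(\vartheta_i)\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) - \alpha^2 x_{ik}H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) \\
& + \alpha^2 x_{ik}\log(\vartheta_i)\big(G_{\alpha+1,\vartheta_i}(1+\xi) - G_{\alpha+1,\vartheta_i}(\xi)\big) + \alpha^2 x_{ik}H^{(1)}_{\alpha+1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) \\
={}& \alpha^2 x_{ik}\psi(\alpha)\big(G_{\alpha+1,\vartheta_i}(\xi) - G_{\alpha,\vartheta_i}(\xi) - G_{\alpha+1,\vartheta_i}(1+\xi) + G_{\alpha,\vartheta_i}(1+\xi)\big) \\
& + \alpha^2 x_{ik}\Big(-H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) + H^{(1)}_{\alpha+1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big) \\
={}& \alpha^2 x_{ik}\big(\psi(\alpha)\vartheta_i g_{\alpha+1,\vartheta_i}(1+\xi) - \psi(\alpha)\vartheta_i g_{\alpha+1,\vartheta_i}(\xi)\big) \\
& - \alpha^2 x_{ik}\Big(H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) - H^{(1)}_{\alpha+1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big).
\end{aligned}
$$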


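Using this result, (13), and (14), we get

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\alpha'}\frac{\partial\ell}{\partial\beta_k}\Big]
={}& E_\theta\Big[\frac{\alpha}{G_{\alpha,\vartheta_i}(\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\Big(0,\frac{\xi}{\vartheta_i}\Big)\Big)\Big(-x_{ik}\frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\Big)\mathbb{1}_{\{y_i=0\}}\Big] \\
& + E_\theta\Big[\alpha\big(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i+\xi)\big)\,x_{ik}\Big(-\alpha + \frac{y_i+\xi}{\vartheta_i}\Big)\mathbb{1}_{\{0<y_i<1\}}\Big] \\
& + E_\theta\Big[\frac{-\alpha\big(-\psi(\alpha)G_{\alpha,\vartheta_i}(1+\xi) + H^{(1)}_\alpha\big(0,\frac{1+\xi}{\vartheta_i}\big)\big)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\, x_{ik}\frac{(1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\mathbb{1}_{\{y_i=1\}}\Big] \\
={}& -x_{ik}\frac{\alpha\xi\, g_{\alpha,\vartheta_i}(\xi)\big(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\big(0,\frac{\xi}{\vartheta_i}\big)\big)}{G_{\alpha,\vartheta_i}(\xi)} \\
& + x_{ik}\alpha^2\big(\psi(\alpha)\vartheta_i g_{\alpha+1,\vartheta_i}(1+\xi) - \psi(\alpha)\vartheta_i g_{\alpha+1,\vartheta_i}(\xi)\big) \\
& - x_{ik}\alpha^2\Big(H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) - H^{(1)}_{\alpha+1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big) \\
& - x_{ik}\frac{\alpha(1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)\big(-\psi(\alpha)G_{\alpha,\vartheta_i}(1+\xi) + H^{(1)}_\alpha\big(0,\frac{1+\xi}{\vartheta_i}\big)\big)}{1-G_{\alpha,\vartheta_i}(1+\xi)}.
\end{aligned}
$$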

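Next, with (37), (38), and (35), we calculate

$$
\begin{aligned}
E_\theta\Big[x_{ik}x_{il}\Big(-\alpha + \frac{y_i+\xi}{\vartheta_i}\Big)^2\mathbb{1}_{\{0<y_i<1\}}\Big]
={}& x_{ik}x_{il}\alpha^2\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) - 2\alpha^2 x_{ik}x_{il}\big(G_{\alpha+1,\vartheta_i}(1+\xi) - G_{\alpha+1,\vartheta_i}(\xi)\big) \\
& + \alpha(\alpha+1)x_{ik}x_{il}\big(G_{\alpha+2,\vartheta_i}(1+\xi) - G_{\alpha+2,\vartheta_i}(\xi)\big) \\
={}& \alpha^2 x_{ik}x_{il}\vartheta_i\big(g_{\alpha+1,\vartheta_i}(1+\xi) - g_{\alpha+1,\vartheta_i}(\xi) - g_{\alpha+2,\vartheta_i}(1+\xi) + g_{\alpha+2,\vartheta_i}(\xi)\big) \\
& + \alpha x_{ik}x_{il}\big(G_{\alpha+2,\vartheta_i}(1+\xi) - G_{\alpha+2,\vartheta_i}(\xi)\big).
\end{aligned}
$$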


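Using this result and (14), we see that

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\beta_k}\frac{\partial\ell}{\partial\beta_l}\Big]
={}& E_\theta\Big[x_{ik}x_{il}\Big(\frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\Big)^2\mathbb{1}_{\{y_i=0\}}\Big] + E_\theta\Big[x_{ik}x_{il}\Big(-\alpha + \frac{y_i+\xi}{\vartheta_i}\Big)^2\mathbb{1}_{\{0<y_i<1\}}\Big] \\
& + E_\theta\Big[x_{ik}x_{il}\Big(\frac{(1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\Big)^2\mathbb{1}_{\{y_i=1\}}\Big] \\
={}& \alpha^2 x_{ik}x_{il}\vartheta_i\big(g_{\alpha+1,\vartheta_i}(1+\xi) - g_{\alpha+1,\vartheta_i}(\xi) - g_{\alpha+2,\vartheta_i}(1+\xi) + g_{\alpha+2,\vartheta_i}(\xi)\big) \\
& + x_{ik}x_{il}\Big(\alpha\big(G_{\alpha+2,\vartheta_i}(1+\xi) - G_{\alpha+2,\vartheta_i}(\xi)\big) + \frac{\big(\xi\, g_{\alpha,\vartheta_i}(\xi)\big)^2}{G_{\alpha,\vartheta_i}(\xi)} + \frac{\big((1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)\big)^2}{1-G_{\alpha,\vartheta_i}(1+\xi)}\Big).
\end{aligned}
$$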
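As a check on this entry, the sketch below evaluates the closed form for an intercept (x_ik = x_il = 1) and compares it with direct quadrature of the same three-part expectation. It uses the β-scores −ξg(ξ)/G(ξ), (y+ξ)/ϑ − α and (1+ξ)g(1+ξ)/(1 − G(1+ξ)) appearing in the derivation. Parameter values are arbitrary, and the helpers (series-based gamma CDF, Simpson's rule) are plain-Python stand-ins of our own.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density g_{shape,scale}(x) (scale parameterisation)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale))

def gamma_cdf(x, shape, scale):
    """Gamma CDF G_{shape,scale}(x) via the incomplete-gamma power series (moderate x/scale only)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = shape
    while term > total * 1e-16:
        k += 1.0
        term *= z / k
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    return h / 3 * (f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n)))

def info_beta_beta_closed(alpha, theta, xi):
    """Closed-form (beta_k, beta_l) entry for x_ik = x_il = 1."""
    g, G = gamma_pdf, gamma_cdf
    return (alpha ** 2 * theta * (g(1 + xi, alpha + 1, theta) - g(xi, alpha + 1, theta)
                                  - g(1 + xi, alpha + 2, theta) + g(xi, alpha + 2, theta))
            + alpha * (G(1 + xi, alpha + 2, theta) - G(xi, alpha + 2, theta))
            + (xi * g(xi, alpha, theta)) ** 2 / G(xi, alpha, theta)
            + ((1 + xi) * g(1 + xi, alpha, theta)) ** 2 / (1 - G(1 + xi, alpha, theta)))

def info_beta_beta_numeric(alpha, theta, xi):
    """Same expectation: point masses plus quadrature over the continuous part."""
    g, G = gamma_pdf, gamma_cdf
    s0 = -xi * g(xi, alpha, theta) / G(xi, alpha, theta)                     # score, y = 0
    s1 = (1 + xi) * g(1 + xi, alpha, theta) / (1 - G(1 + xi, alpha, theta))  # score, y = 1
    middle = simpson(lambda y: ((y + xi) / theta - alpha) ** 2 * g(y + xi, alpha, theta), 0.0, 1.0)
    return s0 ** 2 * G(xi, alpha, theta) + middle + s1 ** 2 * (1 - G(1 + xi, alpha, theta))

for args in [(2.0, 0.5, 0.1), (0.7, 1.3, 0.05), (4.0, 0.2, 0.3)]:
    print(args, info_beta_beta_closed(*args), info_beta_beta_numeric(*args))
```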

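Moreover, with (39), (41), (44), and (35), we get

$$
\begin{aligned}
E_\theta\Big[&\alpha\big(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i+\xi)\big)\Big(\frac{\alpha-1}{y_i+\xi} - \frac{1}{\vartheta_i}\Big)\mathbb{1}_{\{0<y_i<1\}}\Big] \\
={}& -\frac{\alpha\log(\vartheta_i)}{\vartheta_i}\big(G_{\alpha-1,\vartheta_i}(1+\xi) - G_{\alpha-1,\vartheta_i}(\xi)\big) - \frac{\alpha\psi(\alpha)}{\vartheta_i}\big(G_{\alpha-1,\vartheta_i}(1+\xi) - G_{\alpha-1,\vartheta_i}(\xi)\big) \\
& + \frac{\alpha\log(\vartheta_i)}{\vartheta_i}\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) + \frac{\alpha\psi(\alpha)}{\vartheta_i}\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) \\
& + \frac{\alpha\log(\vartheta_i)}{\vartheta_i}\big(G_{\alpha-1,\vartheta_i}(1+\xi) - G_{\alpha-1,\vartheta_i}(\xi)\big) + \frac{\alpha}{\vartheta_i}H^{(1)}_{\alpha-1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) \\
& - \frac{\alpha\log(\vartheta_i)}{\vartheta_i}\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big) - \frac{\alpha}{\vartheta_i}H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) \\
={}& \frac{\alpha\psi(\alpha)}{\vartheta_i}\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha-1,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi) + G_{\alpha-1,\vartheta_i}(\xi)\big) \\
& + \frac{\alpha}{\vartheta_i}\Big(H^{(1)}_{\alpha-1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) - H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big) \\
={}& \alpha\psi(\alpha)\big(-g_{\alpha,\vartheta_i}(\xi+1) + g_{\alpha,\vartheta_i}(\xi)\big) + \frac{\alpha}{\vartheta_i}\Big(H^{(1)}_{\alpha-1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) - H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big).
\end{aligned}
$$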


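With this equation, (13), and (15), we calculate

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\alpha'}\frac{\partial\ell}{\partial\xi}\Big]
={}& E_\theta\Big[\frac{\alpha}{G_{\alpha,\vartheta_i}(\xi)}\Big(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\Big(0,\frac{\xi}{\vartheta_i}\Big)\Big)\frac{g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\mathbb{1}_{\{y_i=0\}}\Big] \\
& + E_\theta\Big[\alpha\big(-\log(\vartheta_i) - \psi(\alpha) + \log(y_i+\xi)\big)\Big(\frac{\alpha-1}{y_i+\xi} - \frac{1}{\vartheta_i}\Big)\mathbb{1}_{\{0<y_i<1\}}\Big] \\
& + E_\theta\Big[\frac{-\alpha\big(-\psi(\alpha)G_{\alpha,\vartheta_i}(1+\xi) + H^{(1)}_\alpha\big(0,\frac{1+\xi}{\vartheta_i}\big)\big)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\cdot\frac{-g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\mathbb{1}_{\{y_i=1\}}\Big] \\
={}& \frac{\alpha\big(-\psi(\alpha)G_{\alpha,\vartheta_i}(\xi) + H^{(1)}_\alpha\big(0,\frac{\xi}{\vartheta_i}\big)\big)g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)} + \alpha\psi(\alpha)\big(-g_{\alpha,\vartheta_i}(\xi+1) + g_{\alpha,\vartheta_i}(\xi)\big) \\
& + \frac{\alpha}{\vartheta_i}\Big(H^{(1)}_{\alpha-1}\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big) - H^{(1)}_\alpha\Big(\frac{\xi}{\vartheta_i},\frac{1+\xi}{\vartheta_i}\Big)\Big) \\
& + \frac{\alpha\big(-\psi(\alpha)G_{\alpha,\vartheta_i}(1+\xi) + H^{(1)}_\alpha\big(0,\frac{1+\xi}{\vartheta_i}\big)\big)g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)}.
\end{aligned}
$$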

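With (37), (39), and (35), we calculate

$$
\begin{aligned}
E_\theta\Big[&x_{ik}\Big(-\alpha + \frac{y_i+\xi}{\vartheta_i}\Big)\Big(\frac{\alpha-1}{y_i+\xi} - \frac{1}{\vartheta_i}\Big)\mathbb{1}_{\{0<y_i<1\}}\Big] \\
={}& -\frac{\alpha x_{ik}\big(G_{\alpha-1,\vartheta_i}(1+\xi) - G_{\alpha-1,\vartheta_i}(\xi)\big)}{\vartheta_i} + \frac{\alpha x_{ik}\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big)}{\vartheta_i} \\
& + \frac{(\alpha-1)x_{ik}\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big)}{\vartheta_i} - \frac{\alpha x_{ik}\big(G_{\alpha+1,\vartheta_i}(1+\xi) - G_{\alpha+1,\vartheta_i}(\xi)\big)}{\vartheta_i} \\
={}& x_{ik}\alpha\big(-g_{\alpha,\vartheta_i}(1+\xi) + g_{\alpha,\vartheta_i}(\xi) + g_{\alpha+1,\vartheta_i}(1+\xi) - g_{\alpha+1,\vartheta_i}(\xi)\big) - \frac{x_{ik}\big(G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)\big)}{\vartheta_i}.
\end{aligned}
$$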


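Using the above result, we have

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\beta_k}\frac{\partial\ell}{\partial\xi}\Big]
={}& E_\theta\Big[-x_{ik}\frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\cdot\frac{g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\mathbb{1}_{\{y_i=0\}}\Big] + E_\theta\Big[x_{ik}\Big(-\alpha + \frac{y_i+\xi}{\vartheta_i}\Big)\Big(\frac{\alpha-1}{y_i+\xi} - \frac{1}{\vartheta_i}\Big)\mathbb{1}_{\{0<y_i<1\}}\Big] \\
& + E_\theta\Big[x_{ik}\frac{(1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\cdot\frac{-g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\mathbb{1}_{\{y_i=1\}}\Big] \\
={}& x_{ik}\alpha\big(-g_{\alpha,\vartheta_i}(1+\xi) + g_{\alpha,\vartheta_i}(\xi) + g_{\alpha+1,\vartheta_i}(1+\xi) - g_{\alpha+1,\vartheta_i}(\xi)\big) \\
& + x_{ik}\Big(-\frac{G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)}{\vartheta_i} - \frac{\xi\, g_{\alpha,\vartheta_i}(\xi)^2}{G_{\alpha,\vartheta_i}(\xi)} - \frac{(1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)^2}{1-G_{\alpha,\vartheta_i}(1+\xi)}\Big).
\end{aligned}
$$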

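Next, with (39), (40), and (35), we calculate

$$
\begin{aligned}
E_\theta\Big[\Big(\frac{\alpha-1}{y_i+\xi} - \frac{1}{\vartheta_i}\Big)^2\mathbb{1}_{\{0<y_i<1\}}\Big]
={}& \frac{(\alpha-1)\big(G_{\alpha-2,\vartheta_i}(1+\xi) - G_{\alpha-2,\vartheta_i}(\xi)\big)}{(\alpha-2)\vartheta_i^2} - 2\,\frac{G_{\alpha-1,\vartheta_i}(1+\xi) - G_{\alpha-1,\vartheta_i}(\xi)}{\vartheta_i^2} + \frac{G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)}{\vartheta_i^2} \\
={}& \Big(\frac{\vartheta_i(\alpha-1)^2 - (\xi+1)(\alpha-3)}{\vartheta_i(\alpha-2)(\xi+1)}\Big)g_{\alpha,\vartheta_i}(\xi+1) - \Big(\frac{\vartheta_i(\alpha-1)^2 - \xi(\alpha-3)}{\vartheta_i(\alpha-2)\xi}\Big)g_{\alpha,\vartheta_i}(\xi) \\
& + \frac{G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)}{(\alpha-2)\vartheta_i^2}.
\end{aligned}
$$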

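Finally, using this result, we have

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\xi}\frac{\partial\ell}{\partial\xi}\Big]
={}& E_\theta\Big[\Big(\frac{g_{\alpha,\vartheta_i}(\xi)}{G_{\alpha,\vartheta_i}(\xi)}\Big)^2\mathbb{1}_{\{y_i=0\}}\Big] + E_\theta\Big[\Big(\frac{\alpha-1}{y_i+\xi} - \frac{1}{\vartheta_i}\Big)^2\mathbb{1}_{\{0<y_i<1\}}\Big] \\
& + E_\theta\Big[\Big(\frac{-g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)}\Big)^2\mathbb{1}_{\{y_i=1\}}\Big] \\
={}& \frac{g_{\alpha,\vartheta_i}(\xi)^2}{G_{\alpha,\vartheta_i}(\xi)} + \Big(\frac{\vartheta_i(\alpha-1)^2 - (\xi+1)(\alpha-3)}{\vartheta_i(\alpha-2)(\xi+1)}\Big)g_{\alpha,\vartheta_i}(\xi+1) - \Big(\frac{\vartheta_i(\alpha-1)^2 - \xi(\alpha-3)}{\vartheta_i(\alpha-2)\xi}\Big)g_{\alpha,\vartheta_i}(\xi) \\
& + \frac{G_{\alpha,\vartheta_i}(1+\xi) - G_{\alpha,\vartheta_i}(\xi)}{(\alpha-2)\vartheta_i^2} + \frac{g_{\alpha,\vartheta_i}(1+\xi)^2}{1-G_{\alpha,\vartheta_i}(1+\xi)}.
\end{aligned}
$$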
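The ξξ entry can be checked in the same way. Because the closed form relies on (39) and (40), the sketch uses α > 2, where G_{α−2,ϑ} is a proper gamma distribution function. The ξ-scores g(ξ)/G(ξ), (α−1)/(y+ξ) − 1/ϑ and −g(1+ξ)/(1 − G(1+ξ)) are taken from the derivation above; parameter values are arbitrary and the helpers are our own plain-Python stand-ins.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density g_{shape,scale}(x) (scale parameterisation)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale))

def gamma_cdf(x, shape, scale):
    """Gamma CDF G_{shape,scale}(x) via the incomplete-gamma power series (moderate x/scale only)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = shape
    while term > total * 1e-16:
        k += 1.0
        term *= z / k
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    return h / 3 * (f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n)))

def info_xi_xi_closed(alpha, theta, xi):
    """Closed-form E[(dl/dxi)^2]; valid for alpha > 2."""
    g, G = gamma_pdf, gamma_cdf
    def coef(z):
        return (theta * (alpha - 1) ** 2 - z * (alpha - 3)) / (theta * (alpha - 2) * z)
    return (g(xi, alpha, theta) ** 2 / G(xi, alpha, theta)
            + coef(1 + xi) * g(1 + xi, alpha, theta) - coef(xi) * g(xi, alpha, theta)
            + (G(1 + xi, alpha, theta) - G(xi, alpha, theta)) / ((alpha - 2) * theta ** 2)
            + g(1 + xi, alpha, theta) ** 2 / (1 - G(1 + xi, alpha, theta)))

def info_xi_xi_numeric(alpha, theta, xi):
    """Same expectation: point masses plus quadrature over the continuous part."""
    g, G = gamma_pdf, gamma_cdf
    middle = simpson(lambda y: ((alpha - 1) / (y + xi) - 1 / theta) ** 2 * g(y + xi, alpha, theta), 0.0, 1.0)
    return (g(xi, alpha, theta) ** 2 / G(xi, alpha, theta) + middle
            + g(1 + xi, alpha, theta) ** 2 / (1 - G(1 + xi, alpha, theta)))

for args in [(3.5, 0.3, 0.2), (5.0, 0.15, 0.1)]:
    print(args, info_xi_xi_closed(*args), info_xi_xi_numeric(*args))
```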


A.3 Fisher Information Matrix for the Two-tiered Gamma Model

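First, with (37) and (38), we get

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\beta_k}\frac{\partial\ell}{\partial\beta_l}\Big]
={}& E_\theta\Big[x_{ik}x_{il}\Big(\frac{y_i+\xi}{\vartheta_i} - \alpha - \frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{1-G_{\alpha,\vartheta_i}(\xi)}\Big)^2\mathbb{1}_{\{0<y_i<1\}}\Big] \\
& + E_\theta\Big[x_{ik}x_{il}\Big(\frac{(1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)} - \frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{1-G_{\alpha,\vartheta_i}(\xi)}\Big)^2\mathbb{1}_{\{y_i=1\}}\Big] \\
={}& x_{ik}x_{il}\frac{\alpha\big(G_{\alpha+2,\vartheta_i}(1+\xi) - G_{\alpha+2,\vartheta_i}(\xi)\big)\big(1-G_{\alpha,\tilde\vartheta_i}(\xi)\big)}{1-G_{\alpha,\vartheta_i}(\xi)} \\
& + x_{ik}x_{il}\frac{\alpha^2\big(\xi\, g_{\alpha+1,\vartheta_i}(\xi) - (1+\xi)\, g_{\alpha+1,\vartheta_i}(1+\xi)\big)\big(1-G_{\alpha,\tilde\vartheta_i}(\xi)\big)}{(\alpha+1)\big(1-G_{\alpha,\vartheta_i}(\xi)\big)} \\
& + x_{ik}x_{il}\frac{\alpha\big((1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi) - \xi\, g_{\alpha,\vartheta_i}(\xi)\big)\big(1-G_{\alpha,\tilde\vartheta_i}(\xi)\big)}{1-G_{\alpha,\vartheta_i}(\xi)} \\
& - x_{ik}x_{il}\frac{\xi^2 g_{\alpha,\vartheta_i}(\xi)^2\big(1-G_{\alpha,\tilde\vartheta_i}(\xi)\big)}{\big(1-G_{\alpha,\vartheta_i}(\xi)\big)^2} + x_{ik}x_{il}\frac{(1+\xi)^2 g_{\alpha,\vartheta_i}(1+\xi)^2\big(1-G_{\alpha,\tilde\vartheta_i}(\xi)\big)}{\big(1-G_{\alpha,\vartheta_i}(\xi)\big)\big(1-G_{\alpha,\vartheta_i}(1+\xi)\big)}.
\end{aligned}
$$

Here $G_{\alpha,\tilde\vartheta_i}(\xi)$ denotes the probability of a zero, governed by the coefficients $\gamma$, while $G_{\alpha,\vartheta_i}$ and $g_{\alpha,\vartheta_i}$ refer to the gamma part governed by $\beta$.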
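The (β_k, β_l) entry of the two-tiered model can be checked numerically under an assumption that we state explicitly, since it is inferred from the structure of the formulas rather than restated here: a zero occurs with probability G_{α,ϑ̃}(ξ), and positive outcomes follow the censored gamma law conditioned on Y* > ξ. The parameter `theta0` is a hypothetical name for ϑ̃; values are arbitrary, x_ik = x_il = 1, and the helpers are our own plain-Python stand-ins.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density g_{shape,scale}(x) (scale parameterisation)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale))

def gamma_cdf(x, shape, scale):
    """Gamma CDF G_{shape,scale}(x) via the incomplete-gamma power series (moderate x/scale only)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = shape
    while term > total * 1e-16:
        k += 1.0
        term *= z / k
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    return h / 3 * (f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n)))

def tt_info_beta_beta_closed(alpha, theta, theta0, xi):
    """Closed form above for x_ik = x_il = 1; theta0 (hypothetical) drives P(Y = 0)."""
    g, G = gamma_pdf, gamma_cdf
    a, b = xi, 1 + xi
    w = 1 - G(a, alpha, theta0)          # P(Y > 0) under the assumed zero part
    S = 1 - G(a, alpha, theta)           # P(Y* > xi) under the gamma part
    return w / S * (alpha * (G(b, alpha + 2, theta) - G(a, alpha + 2, theta))
                    + alpha ** 2 * (a * g(a, alpha + 1, theta) - b * g(b, alpha + 1, theta)) / (alpha + 1)
                    + alpha * (b * g(b, alpha, theta) - a * g(a, alpha, theta))
                    - (a * g(a, alpha, theta)) ** 2 / S
                    + (b * g(b, alpha, theta)) ** 2 / (1 - G(b, alpha, theta)))

def tt_info_beta_beta_numeric(alpha, theta, theta0, xi):
    """Same entry by quadrature over the assumed two-tiered likelihood."""
    g, G = gamma_pdf, gamma_cdf
    w, S = 1 - G(xi, alpha, theta0), 1 - G(xi, alpha, theta)
    c = xi * g(xi, alpha, theta) / S                        # truncation term of the beta-score
    middle = simpson(lambda y: ((y + xi) / theta - alpha - c) ** 2 * g(y + xi, alpha, theta), 0.0, 1.0)
    tail = 1 - G(1 + xi, alpha, theta)
    s1 = (1 + xi) * g(1 + xi, alpha, theta) / tail - c      # beta-score for y = 1
    return w / S * (middle + tail * s1 ** 2)                # zeros carry no beta-score

for args in [(2.0, 0.5, 0.8, 0.1), (0.8, 1.2, 0.3, 0.05)]:
    print(args, tt_info_beta_beta_closed(*args), tt_info_beta_beta_numeric(*args))
```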

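Next, with (37) and the identity in (36), we get

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\beta_k}\frac{\partial\ell}{\partial\gamma_l}\Big]
={}& E_\theta\Big[x_{ik}x_{il}\Big(\frac{y_i+\xi}{\vartheta_i} - \alpha - \frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{1-G_{\alpha,\vartheta_i}(\xi)}\Big)\frac{\xi\, g_{\alpha,\tilde\vartheta_i}(\xi)}{1-G_{\alpha,\tilde\vartheta_i}(\xi)}\mathbb{1}_{\{0<y_i<1\}}\Big] \\
& + E_\theta\Big[x_{ik}x_{il}\Big(\frac{(1+\xi)\, g_{\alpha,\vartheta_i}(1+\xi)}{1-G_{\alpha,\vartheta_i}(1+\xi)} - \frac{\xi\, g_{\alpha,\vartheta_i}(\xi)}{1-G_{\alpha,\vartheta_i}(\xi)}\Big)\frac{\xi\, g_{\alpha,\tilde\vartheta_i}(\xi)}{1-G_{\alpha,\tilde\vartheta_i}(\xi)}\mathbb{1}_{\{y_i=1\}}\Big] = 0.
\end{aligned}
$$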

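Finally, we calculate

$$
\begin{aligned}
E_\theta\Big[\frac{\partial\ell}{\partial\gamma_k}\frac{\partial\ell}{\partial\gamma_l}\Big]
={}& E_\theta\Big[x_{ik}x_{il}\Big(\frac{\xi\, g_{\alpha,\tilde\vartheta_i}(\xi)}{G_{\alpha,\tilde\vartheta_i}(\xi)}\Big)^2\mathbb{1}_{\{y_i=0\}}\Big] + E_\theta\Big[x_{ik}x_{il}\Big(\frac{\xi\, g_{\alpha,\tilde\vartheta_i}(\xi)}{1-G_{\alpha,\tilde\vartheta_i}(\xi)}\Big)^2\big(\mathbb{1}_{\{0<y_i<1\}} + \mathbb{1}_{\{y_i=1\}}\big)\Big] \\
={}& x_{ik}x_{il}\frac{\xi^2 g_{\alpha,\tilde\vartheta_i}(\xi)^2}{G_{\alpha,\tilde\vartheta_i}(\xi)} + x_{ik}x_{il}\frac{\xi^2 g_{\alpha,\tilde\vartheta_i}(\xi)^2}{1-G_{\alpha,\tilde\vartheta_i}(\xi)} \\
={}& x_{ik}x_{il}\frac{\xi^2 g_{\alpha,\tilde\vartheta_i}(\xi)^2}{G_{\alpha,\tilde\vartheta_i}(\xi)\big(1-G_{\alpha,\tilde\vartheta_i}(\xi)\big)}.
\end{aligned}
$$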


A.4 Useful Identities and Integrals

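By partial integration, we calculate

$$
\begin{aligned}
G_{\alpha+1,\vartheta}(\xi) &= \frac{1}{\vartheta^{\alpha+1}\Gamma(\alpha+1)}\int_0^\xi y^\alpha\exp(-y/\vartheta)\,dy \\
&= \frac{1}{\vartheta^{\alpha+1}\Gamma(\alpha+1)}\big(-\xi^\alpha\vartheta\exp(-\xi/\vartheta)\big) + \frac{1}{\vartheta^{\alpha+1}\Gamma(\alpha+1)}\int_0^\xi \alpha y^{\alpha-1}\vartheta\exp(-y/\vartheta)\,dy \\
&= -\frac{1}{\Gamma(\alpha+1)}\Big(\frac{\xi}{\vartheta}\Big)^\alpha\exp(-\xi/\vartheta) + \frac{1}{\vartheta^\alpha\Gamma(\alpha)}\int_0^\xi y^{\alpha-1}\exp(-y/\vartheta)\,dy \\
&= -\vartheta g_{\alpha+1,\vartheta}(\xi) + G_{\alpha,\vartheta}(\xi).
\end{aligned}
$$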

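It follows that

$$
G_{\alpha+1,\vartheta}(\xi) - G_{\alpha,\vartheta}(\xi) = -\vartheta g_{\alpha+1,\vartheta}(\xi) \tag{35}
$$

or

$$
G_{\alpha+1,\vartheta}(\xi) - G_{\alpha,\vartheta}(\xi) = -\frac{\xi}{\alpha}\, g_{\alpha,\vartheta}(\xi). \tag{36}
$$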
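A quick numerical check of (35) and (36) at a few points, using a plain-Python gamma density and a series-based gamma CDF (our own stand-ins, valid for moderate arguments); α, ϑ and the evaluation points are arbitrary.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density g_{shape,scale}(x) (scale parameterisation)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale))

def gamma_cdf(x, shape, scale):
    """Gamma CDF G_{shape,scale}(x) via the incomplete-gamma power series (moderate x/scale only)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = shape
    while term > total * 1e-16:
        k += 1.0
        term *= z / k
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def identity_gaps(x, alpha, theta):
    """Differences between G_{alpha+1} - G_alpha and the right-hand sides of (35) and (36)."""
    lhs = gamma_cdf(x, alpha + 1, theta) - gamma_cdf(x, alpha, theta)
    rhs35 = -theta * gamma_pdf(x, alpha + 1, theta)
    rhs36 = -x / alpha * gamma_pdf(x, alpha, theta)
    return lhs - rhs35, lhs - rhs36

for x in (0.05, 0.4, 1.5, 3.0):
    print(x, identity_gaps(x, 1.3, 0.7))
```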

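For $0 \le l < u$, the following equations hold; (39) and (44) require α > 1 and (40) requires α > 2, so that all shape parameters on the right-hand side are positive.

$$
\int_l^u y\, g_{\alpha,\vartheta}(y)\,dy = \alpha\vartheta\big(G_{\alpha+1,\vartheta}(u) - G_{\alpha+1,\vartheta}(l)\big). \tag{37}
$$

$$
\int_l^u y^2 g_{\alpha,\vartheta}(y)\,dy = \vartheta^2\alpha(\alpha+1)\big(G_{\alpha+2,\vartheta}(u) - G_{\alpha+2,\vartheta}(l)\big). \tag{38}
$$

$$
\int_l^u \frac{1}{y}\, g_{\alpha,\vartheta}(y)\,dy = \frac{1}{(\alpha-1)\vartheta}\big(G_{\alpha-1,\vartheta}(u) - G_{\alpha-1,\vartheta}(l)\big). \tag{39}
$$

$$
\int_l^u \frac{1}{y^2}\, g_{\alpha,\vartheta}(y)\,dy = \frac{1}{(\alpha-1)(\alpha-2)\vartheta^2}\big(G_{\alpha-2,\vartheta}(u) - G_{\alpha-2,\vartheta}(l)\big). \tag{40}
$$

$$
\int_l^u \log(y)\, g_{\alpha,\vartheta}(y)\,dy = \log(\vartheta)\big(G_{\alpha,\vartheta}(u) - G_{\alpha,\vartheta}(l)\big) + H^{(1)}_\alpha\Big(\frac{l}{\vartheta},\frac{u}{\vartheta}\Big). \tag{41}
$$

$$
\int_l^u \log(y)^2 g_{\alpha,\vartheta}(y)\,dy = \log(\vartheta)^2\big(G_{\alpha,\vartheta}(u) - G_{\alpha,\vartheta}(l)\big) + 2\log(\vartheta)H^{(1)}_\alpha\Big(\frac{l}{\vartheta},\frac{u}{\vartheta}\Big) + H^{(2)}_\alpha\Big(\frac{l}{\vartheta},\frac{u}{\vartheta}\Big). \tag{42}
$$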


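$$
\int_l^u y\log(y)\, g_{\alpha,\vartheta}(y)\,dy = \alpha\vartheta\log(\vartheta)\big(G_{\alpha+1,\vartheta}(u) - G_{\alpha+1,\vartheta}(l)\big) + \alpha\vartheta H^{(1)}_{\alpha+1}\Big(\frac{l}{\vartheta},\frac{u}{\vartheta}\Big). \tag{43}
$$

$$
\int_l^u \frac{\log(y)}{y}\, g_{\alpha,\vartheta}(y)\,dy = \frac{\log(\vartheta)}{(\alpha-1)\vartheta}\big(G_{\alpha-1,\vartheta}(u) - G_{\alpha-1,\vartheta}(l)\big) + \frac{1}{(\alpha-1)\vartheta}H^{(1)}_{\alpha-1}\Big(\frac{l}{\vartheta},\frac{u}{\vartheta}\Big). \tag{44}
$$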
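Identities (37) to (40) can be confirmed by quadrature. The sketch below uses α > 2 so that (39) and (40) apply; the interval [l, u], α and ϑ are arbitrary, and the gamma helpers and Simpson's rule are our own plain-Python stand-ins rather than anything from the paper.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density g_{shape,scale}(x) (scale parameterisation)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale))

def gamma_cdf(x, shape, scale):
    """Gamma CDF G_{shape,scale}(x) via the incomplete-gamma power series (moderate x/scale only)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = shape
    while term > total * 1e-16:
        k += 1.0
        term *= z / k
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    return h / 3 * (f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n)))

alpha, theta, l, u = 3.2, 0.6, 0.2, 1.4       # alpha > 2 so that (39) and (40) apply

def dG(shape):
    """G_{shape,theta}(u) - G_{shape,theta}(l)."""
    return gamma_cdf(u, shape, theta) - gamma_cdf(l, shape, theta)

def g(y):
    return gamma_pdf(y, alpha, theta)

checks = {  # name: (quadrature of the left-hand side, closed-form right-hand side)
    "(37)": (simpson(lambda y: y * g(y), l, u), alpha * theta * dG(alpha + 1)),
    "(38)": (simpson(lambda y: y ** 2 * g(y), l, u), theta ** 2 * alpha * (alpha + 1) * dG(alpha + 2)),
    "(39)": (simpson(lambda y: g(y) / y, l, u), dG(alpha - 1) / ((alpha - 1) * theta)),
    "(40)": (simpson(lambda y: g(y) / y ** 2, l, u), dG(alpha - 2) / ((alpha - 1) * (alpha - 2) * theta ** 2)),
}
for name, (quad, closed) in checks.items():
    print(name, quad, closed)
```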

References

Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions, Dover Publications, New York.

Aitchison, J. (1955). On the distribution of a positive random variable having a discrete probability mass at the origin, J. Amer. Statist. Assoc. 50: 901–908.

Amemiya, T. (1985). Advanced Econometrics, Harvard University Press, Cambridge, Massachusetts.

Arabmazar, A. and Schmidt, P. (1982). An investigation of the robustness of the Tobit estimator to non-normality, Econometrica 50(4): 1055–1063.

Bardossy, A. and Plate, E. (1992). Space-time model for daily rainfall using atmospheric circulation patterns, Water Resources Research 28(5): 1247–1259.

Breen, R. (1996). Regression Models: Censored, Sample Selected, or Truncated Data, Sage Publications, Thousand Oaks.

Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S, Wadsworth & Brooks/Cole.

Chen, S. and Khan, S. (2001). Semiparametric estimation of a partially linear censored regression model, Econometric Theory 17(3): 567–590.

Couturier, D. L. and Victoria-Feser, M.-P. (to appear). Zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data, The Annals of Applied Statistics XX: XX–XX.

Cragg, J. G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods, Econometrica 39(5): 829–44.


Crepon, B. and Duguet, E. (1997). Research and development, competition and innovation: pseudo-maximum likelihood and simulated maximum likelihood methods applied to count data models with heterogeneity, Journal of Econometrics 79(2): 355–378.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39(1): 1–38.

Fahrmeir, L. and Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models, Springer Series in Statistics, Springer-Verlag, New York.

Goldberger, A. S. (1964). Economic Theory, Wiley, New York.

Gurmu, S. (1997). Semi-parametric estimation of hurdle regression models with an application to Medicaid utilization, Journal of Applied Econometrics 12: 225–242.

Gurmu, S. and Trivedi, P. K. (1996). Excess zeros in count models for recreational trips, Journal of Business & Economic Statistics 14(4): 469–77.

Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement 5(4): 120–137.

Heilbron, D. C. (1994). Zero-altered and other regression models for count data with added zeros, Biometrical Journal 36: 531–547.

Khan, S. and Powell, J. L. (2001). Two-step estimation of semiparametric censored regression models, Journal of Econometrics 103(1–2): 73–110.

Koenker, R. (2005). Quantile Regression, Vol. 1, Cambridge University Press, New York.

Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing, Technometrics 34: 1–14.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables, Advanced Quantitative Techniques in the Social Sciences, Vol. 7, SAGE Publications, Thousand Oaks, California.

Maddala, G. and Nelson, F. D. (1975). Specification errors in limited dependent variable models, NBER Working Papers 0096, National Bureau of Economic Research, Inc.

Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics, Vol. 3 of Econometric Society Monographs in Quantitative Economics, Cambridge University Press, Cambridge.


McCullagh, P. and Nelder, J. A. (1983). Generalized linear models, Monographs on Statistics and Applied Probability, Chapman & Hall, London.

Miaou, S.-P. (1994). The relationship between truck accidents and geometric design of road sections: Poisson versus negative binomial regressions, Accident Analysis & Prevention 26: 471–482.

Min, Y. and Agresti, A. (2005). Random effect models for repeated measures of zero-inflated count data, Statistical Modelling 5(1): 1–19.

Mullahy, J. (1986). Specification and testing of some modified count data models, Journal of Econometrics 33(3): 341–365.

Piessens, R., de Doncker-Kapenga, E., Überhuber, C. and Kahaner, D. (1983). QUADPACK: A subroutine package for automatic integration, Springer Series in Computational Mathematics, Volume 1, Springer-Verlag, New York.

Ridout, M., Demetrio, C. G. and Hinde, J. (1998). Models for count data with many zeros, Proceedings of the XIXth International Biometrics Conference, Cape Town, pp. 179–190.

Rosett, R. N. and Nelson, F. D. (1975). Estimation of the two-limit probit regression model, Econometrica 43(1): 141–46.

Sanso, B. and Guenni, L. (2004). A Bayesian approach to compare observed rainfall data to deterministic simulations, Environmetrics 15(6): 597–612.

Shonkwiler, J. and Shaw, W. D. (1996). Hurdle count-data models in recreation demand analysis, Journal of Agricultural and Resource Economics 21(2): 210–219.

Tobin, J. (1958). Estimation of relationships for limited dependent variables, Econometrica 26: 24–36.

Welsh, A. H., Cunningham, R. B., Donnelly, C. F. and Lindenmayer, D. B. (1996). Modelling the abundance of rare species: statistical models for counts with extra zeros, Ecological Modelling 88(1–3): 297–308.

