
Exploring the Use of Asymmetric Maximum Likelihood, Quantile & M-Quantile Regression for Small Area Estimation of Counts

Nikos Tzavidis

Southampton Statistical Sciences Research Institute, University of Southampton

Joint work with M.G. Ranalli¹, N. Salvati², E. Dreassi³, R. Chambers⁴

Graybill 2013, Colorado State University

¹University of Perugia, ²University of Pisa, ³University of Florence, ⁴University of Wollongong

Outline

• SAE for continuous outcomes

• Asymmetric Maximum Likelihood for counts

• Quantile regression for counts

• M-Quantile regression for counts: extending Cantoni & Ronchetti (2001, JASA)

• SAE for counts

• Empirical study

• Final remarks

Small Area Estimation: Preliminaries and Notation

• Units indexed by j, areas indexed by d

• Variable of interest y, for now continuous

• Linear Mixed Model (LMM) approach to estimation, the industry standard:

$$ y_{jd} = \mathbf{x}_{jd}^T\boldsymbol{\beta} + u_d + \varepsilon_{jd}, \qquad j = 1,\dots,n_d, \quad d = 1,\dots,D $$

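• Small area estimator of the area mean

$$ \hat{\bar{Y}}_d^{LMM} = N_d^{-1}\Big[\sum_{j\in s_d} y_{jd} + \sum_{k\in r_d}\big(\mathbf{x}_{kd}^T\hat{\boldsymbol{\beta}} + \hat{u}_d\big)\Big] $$

where $s_d$ and $r_d$ are the sampled and non-sampled units of area $d$, and $N_d$ is the area population size.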
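Once $\hat{\boldsymbol\beta}$ and $\hat u_d$ are available, the predictor is a plug-in computation. A minimal sketch, assuming the LMM has already been fitted elsewhere (the fitting step is not shown) and that covariates are known for every non-sampled unit; the function name `lmm_area_mean` is hypothetical:

```python
import numpy as np

def lmm_area_mean(y_sample, X_nonsample, beta_hat, u_hat_d):
    """Plug-in LMM predictor of the mean of area d (sketch).

    y_sample    : observed y_jd for the n_d sampled units (j in s_d)
    X_nonsample : (N_d - n_d, p) covariates of the non-sampled units (k in r_d)
    beta_hat    : estimated fixed effects
    u_hat_d     : predicted random effect of area d
    """
    y_sample = np.asarray(y_sample, dtype=float)
    X_nonsample = np.asarray(X_nonsample, dtype=float)
    # Predictions for non-sampled units: x_kd^T beta_hat + u_hat_d
    y_pred = X_nonsample @ np.asarray(beta_hat, dtype=float) + u_hat_d
    N_d = y_sample.size + y_pred.size
    return (y_sample.sum() + y_pred.sum()) / N_d

# Two sampled and two non-sampled units
print(lmm_area_mean([3.0, 5.0], [[1, 2], [1, 4]], [1.0, 0.5], 0.5))
```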

SAE: Relevant Literature - Continuous Case

• Robust estimation

  • Ghosh et al. (2008, Bioka)
  • Sinha & Rao (2009, CJS)
  • Chambers & Tzavidis (2006, Bioka)
  • Chambers et al. (2013, JRSS B)
  • Dongmo Jiongo et al. (2013, Bioka)

• Empirical Best Predictor (EBP, Molina & Rao, 2010, CJS)

M-Quantile Regression for Continuous Data: A Review

• Regression: model for the mean of y given x, $E(y|\mathbf{x}) = \mathbf{x}^T\boldsymbol{\beta}$.

• Quantile regression: model for the quantiles of y given x, $Q_y(q|\mathbf{x}) = \mathbf{x}^T\boldsymbol{\beta}(q)$ (Koenker and Bassett, 1978).

• M-Quantile regression: $Q_y(q|\mathbf{x};\psi) = \mathbf{x}^T\boldsymbol{\beta}(q)$ (Breckling and Chambers, 1988).

• An M-type generalization of quantile regression

M-Quantile Estimating Equation

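For fixed q and influence function ψ, compute β(q) by solving

$$ \sum_{j=1}^{n} \psi_q(r_{jq})\,\mathbf{x}_j = \mathbf{0} $$

• residuals: $r_{jq} = \{y_j - Q_y(q|\mathbf{x}_j;\psi)\}/\sigma_{q\psi}$;

• $\sigma_{q\psi}$: scale parameter;

• $\psi_q(t) = 2\psi(t)\{q\,I(t > 0) + (1-q)\,I(t \le 0)\}$;

• solved via Iterative Weighted Least Squares (IWLS).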
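The IWLS idea can be sketched as follows for continuous y. Each iteration uses the weights $\psi_q(t)/t$, so a fixed point of the weighted least-squares step solves $\sum_j \psi_q(r_{jq})\mathbf{x}_j = \mathbf{0}$. This is a minimal sketch under assumptions not stated on the slide: Huber's ψ with c = 1.345, and the MAD-type scale estimate median|r|/0.6745 for $\sigma_{q\psi}$ (both common choices). The names `mq_fit`, `mq_score` and `huber_psi` are hypothetical.

```python
import numpy as np

def huber_psi(t, c=1.345):
    # Huber influence function: identity on [-c, c], clipped outside
    return np.clip(t, -c, c)

def mq_fit(X, y, q=0.5, c=1.345, tol=1e-8, max_iter=200):
    """M-quantile regression coefficients beta(q) via IWLS (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745              # MAD-type scale (assumed)
        t = r / s
        # IWLS weight psi_q(t)/t = 2 min(1, c/|t|) {q I(t > 0) + (1 - q) I(t <= 0)}
        w = 2 * np.minimum(1.0, c / np.maximum(np.abs(t), 1e-12)) * np.where(t > 0, q, 1 - q)
        XtW = X.T * w                                   # X^T W without forming W
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)    # weighted least-squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def mq_score(X, y, beta, q=0.5, c=1.345):
    # Left-hand side of the estimating equation: sum_j psi_q(r_jq) x_j
    r = y - X @ beta
    t = r / (np.median(np.abs(r)) / 0.6745)
    return X.T @ (2 * huber_psi(t, c) * np.where(t > 0, q, 1 - q))
```

Fitting on a grid of q-values (e.g. 0.01, ..., 0.99) gives the family of M-quantile lines used later for small area estimation.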

Small Area Estimation with M-Quantile Regression

Main idea of SAE with M-Quantile regression

• Quantiles/M-Quantiles are used to describe group (domain) heterogeneity

• Similar role to random effects, but estimation is semiparametric

• If a hierarchical structure does explain part of the variability in the data, units within the same domain will be clustered in the same part of f(y|x)

Small Area Estimation with M-Quantile Regression

[Figure: scatter plot of y against x with fitted M-quantile regression lines at q = 0.10, 0.50, 0.75, 0.90 and 0.99]

Estimation

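• Define $q_{jd}$ such that $y_{jd} = \mathbf{x}_{jd}^T\boldsymbol{\beta}(q_{jd})$

• Estimate the individual M-Quantile coefficient $q_{jd}$ by solving $y_{jd} = \mathbf{x}_{jd}^T\hat{\boldsymbol{\beta}}(q_{jd})$

• Estimate the area d M-Quantile coefficient $\theta_d = E[q_{jd}|d]$ by the average of the $\hat{q}_{jd}$ over the sampled units of area d

• If there are no sample observations in area d, set $\hat\theta_d = 0.5$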
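These steps can be sketched as follows, assuming the model has already been fitted on an increasing grid of q-values and that the fitted M-quantile lines do not cross, so that $q \mapsto \mathbf{x}_j^T\hat{\boldsymbol\beta}(q)$ is increasing for every unit. Inverting that line by linear interpolation is a common practical choice but is an assumption here, as is the name `mq_coefficients`.

```python
import numpy as np

def mq_coefficients(y, Q_fit, q_grid, area, areas_all):
    """Unit-level q_jd by grid inversion, area-level theta_d by averaging (sketch).

    y         : observed outcomes of the sampled units
    Q_fit     : (n, G) fitted M-quantiles x_j^T beta(q_g) on the grid q_grid
    q_grid    : increasing grid of q-values
    area      : area label of each sampled unit
    areas_all : labels of all areas, including those with no sample
    """
    y = np.asarray(y, dtype=float)
    Q_fit = np.asarray(Q_fit, dtype=float)
    # Invert q -> x_j^T beta(q) at y_j; np.interp clamps to the grid ends
    q_j = np.array([np.interp(y[j], Q_fit[j], q_grid) for j in range(y.size)])
    area = np.asarray(area)
    theta = {}
    for d in areas_all:
        in_d = area == d
        # theta_d = average q_jd in area d; 0.5 for areas with no sample
        theta[d] = float(q_j[in_d].mean()) if in_d.any() else 0.5
    return q_j, theta

q_j, theta = mq_coefficients([1.5, 10.0], [[0, 1, 2], [10, 20, 30]],
                             [0.1, 0.5, 0.9], ["a", "a"], ["a", "b"])
print(q_j, theta)
```

Clamping to the grid ends means units outside the fitted range get the smallest or largest grid q-value; a finer grid reduces this effect.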

Estimation

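• The M-Quantile predictor of $\bar{Y}_d$ is given by

$$ \hat{\bar{Y}}_d^{LMQ} = N_d^{-1}\Big[\sum_{j\in s_d} y_{jd} + \sum_{k\in r_d} \hat{Q}_y(\hat\theta_d|\mathbf{x}_{kd})\Big], \qquad \hat{Q}_y(\hat\theta_d|\mathbf{x}_{kd}) = \mathbf{x}_{kd}^T\hat{\boldsymbol{\beta}}(\hat\theta_d) $$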
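A sketch of the predictor, assuming coefficients $\hat{\boldsymbol\beta}(q)$ are stored for a grid of q-values and that $\hat{\boldsymbol\beta}(\hat\theta_d)$ is obtained by linear interpolation between grid points. That interpolation is an assumption of this sketch; refitting the model directly at $q = \hat\theta_d$ is an alternative. The name `mq_area_mean` is hypothetical.

```python
import numpy as np

def mq_area_mean(y_sample, X_nonsample, theta_d, q_grid, beta_grid):
    """M-quantile predictor of the mean of area d (sketch).

    beta_grid : (G, p) coefficients beta(q_g) fitted on the increasing q_grid
    """
    beta_grid = np.asarray(beta_grid, dtype=float)
    # beta(theta_d): interpolate each coefficient linearly across the q-grid
    beta_theta = np.array([np.interp(theta_d, q_grid, beta_grid[:, k])
                           for k in range(beta_grid.shape[1])])
    # Non-sampled units are predicted by x_kd^T beta(theta_d)
    y_pred = np.asarray(X_nonsample, dtype=float) @ beta_theta
    y_sample = np.asarray(y_sample, dtype=float)
    return (y_sample.sum() + y_pred.sum()) / (y_sample.size + y_pred.size)

print(mq_area_mean([2.0], [[1, 1], [1, 3]], 0.6, [0.25, 0.5, 0.75],
                   [[0, 1], [1, 1], [2, 1]]))
```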

An Alternative View of Quantile Regression

Quantile Regression - A Parametric Link

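A continuous random variable y follows an asymmetric Laplace distribution, $y \sim ALD(\mu, \sigma, q)$, with pdf

$$ p(y|\mu,\sigma,q) = \frac{q(1-q)}{\sigma}\exp\Big\{-\rho_q\Big(\frac{y-\mu}{\sigma}\Big)\Big\}, \qquad \rho_q(u) = u\{q - I(u<0)\} $$

Maximizing this likelihood in µ (for fixed σ) is equivalent to minimizing $\sum_j \rho_q(y_j - \mu_j)$, the quantile regression objective.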
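The parametric link can be checked numerically. For fixed σ the ALD log-likelihood in µ equals a constant minus $\sum_j \rho_q(y_j-\mu)/\sigma$, which is piecewise linear and convex, so its maximizer over µ is attained at a data point: the sample q-quantile $y_{(\lceil nq \rceil)}$ when nq is not an integer. A minimal sketch with hypothetical names, for an i.i.d. sample with a common location µ:

```python
import numpy as np

def check_loss(u, q):
    # rho_q(u) = u {q - I(u < 0)}
    return u * (q - (u < 0).astype(float))

def ald_loglik(mu, y, sigma, q):
    # Log-likelihood of an i.i.d. sample from ALD(mu, sigma, q)
    y = np.asarray(y, dtype=float)
    return y.size * np.log(q * (1 - q) / sigma) - check_loss((y - mu) / sigma, q).sum()

rng = np.random.default_rng(3)
y = rng.normal(size=101)
q = 0.25
# Piecewise-linear objective: a maximizer lies at one of the data points
candidates = np.sort(y)
mu_hat = candidates[np.argmax([ald_loglik(m, y, 1.0, q) for m in candidates])]
print(mu_hat == candidates[int(np.ceil(q * y.size)) - 1])   # True
```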

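• Geraci and Bottai (2007): quantile random effects regression

• $p(y, v|\boldsymbol\beta, \sigma, \Gamma) = p(y|\boldsymbol\beta, \sigma, v)\,p(v|\Gamma)$

• $y|v \sim ALD(X\boldsymbol\beta + v, \sigma, q)$

• $p(v|\Gamma)$: Normal random effects, or

• $p(v|\Gamma)$: robust random effects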

Quantile/M-Quantile Regression for Counts

1. Machado & Santos Silva (JASA, 2005; quantile regression; frequentist paradigm)

2. Efron (JASA, 1992; expectile regression via Asymmetric Maximum Likelihood)

3. Our proposal, M-Quantile regression: extending Cantoni & Ronchetti (JASA, 2001)

Quantile Regression for Counts

Machado & Santos Silva (JASA, 2005); Lee & Neocleous (JRSS C, 2010)

The problem with estimating conditional quantiles of counts is caused by the combination of a non-differentiable sample objective function with a discrete dependent variable.

• Jittering: artificial smoothness obtained by adding Uniform(0, 1) noise to the count

• One-to-one relationship between the conditional quantiles of the count and those of the jittered outcome

• Linear model (using the Asymmetric Laplace Distribution) for the quantiles of the jittered outcome
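The one-to-one relationship can be illustrated by simulation. With $z = y + U$, $U \sim$ Uniform(0, 1), Machado & Santos Silva (2005) show that $Q_y(\alpha|\mathbf{x}) = \lceil Q_z(\alpha|\mathbf{x}) - 1 \rceil$. The sketch below checks this with sample quantiles at a single covariate value (a Poisson mean of 3 is an arbitrary choice); no regression is fitted, and the names are hypothetical.

```python
import numpy as np

def sample_quantile(v, a):
    # Inverse of the empirical CDF: smallest order statistic with F_n >= a
    v = np.sort(np.asarray(v))
    return v[int(np.ceil(a * v.size)) - 1]

def jitter_check(mu=3.0, alphas=(0.25, 0.5, 0.75), n=200_000, seed=1):
    """Compare count quantiles with those recovered from the jittered outcome."""
    rng = np.random.default_rng(seed)
    y = rng.poisson(mu, n)                  # counts at one fixed covariate value
    z = y + rng.uniform(0.0, 1.0, n)        # jittered, continuous outcome
    rows = []
    for a in alphas:
        q_z = sample_quantile(z, a)
        # Recover the count quantile: Q_y(a) = ceil(Q_z(a) - 1)
        rows.append((a, int(np.ceil(q_z - 1)), int(sample_quantile(y, a))))
    return rows

for a, from_jitter, direct in jitter_check():
    print(f"a={a}: from jittered z -> {from_jitter}, directly from y -> {direct}")
```

Because z is continuous, its quantiles can be modelled with standard (smooth) quantile regression, and the count quantiles are then recovered through the ceiling transformation.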

Asymmetric Maximum Likelihood - Expectile Regression

Efron (JASA, 1992)

Asymmetric Maximum Likelihood (AML) estimation can be seen as the result of smoothing the objective function used to define the quantile regression estimator. Denote by D the deviance under a population model, and define the asymmetric deviance

$$ D_w(y,\mu) = \begin{cases} D(y,\mu), & y \le \mu \\ w\,D(y,\mu), & y > \mu \end{cases} $$

AML Details

Solution: minimize $\sum_{j=1}^{n} D_w(y_j, \mu_j)$

• Leads to expectile regression (Newey & Powell, 1987) for counts

Robust Estimation for Generalized Linear Models

Cantoni & Ronchetti (JASA, 2001)

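• $y_j$ from the Exponential Family

• $E(y_j) = \mu_j$; $V(y_j) = V(\mu_j)$; $g(\mu_j) = \mathbf{x}_j^T\boldsymbol\beta$

• Quasi-likelihood estimating equations: $\sum_{j=1}^{n} \frac{y_j-\mu_j}{V(\mu_j)}\frac{\partial\mu_j}{\partial\boldsymbol\beta} = \mathbf{0}$

• Large deviations of $y_j$ from $\mu_j$, or leverage points, have unbounded influence on these estimates

• Huber quasi-likelihood:

$$ \sum_{j=1}^{n}\Big[\psi(r_j)\,w(\mathbf{x}_j)\,\frac{1}{V^{1/2}(\mu_j)}\frac{\partial\mu_j}{\partial\boldsymbol\beta} - a(\boldsymbol\beta)\Big] = \mathbf{0} $$

• $r_j = (y_j-\mu_j)/V^{1/2}(\mu_j)$ are Pearson residuals; $w(\mathbf{x}_j)$ controls leverage points; $a(\boldsymbol\beta) = n^{-1}\sum_{j} E[\psi(r_j)]\,w(\mathbf{x}_j)\,V^{-1/2}(\mu_j)\,\partial\mu_j/\partial\boldsymbol\beta$ is the correction term that makes the estimator Fisher consistent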
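The correction term needs $E[\psi(r_j)]$ under the assumed model. For a Poisson response with Huber's $\psi_c$, this expectation can be computed by summing over the probability mass function. A sketch with a hypothetical helper name; truncating the support far in the upper tail is an approximation chosen here.

```python
import math
import numpy as np

def expected_huber_poisson(mu, c):
    """E[psi_c((Y - mu) / sqrt(mu))] for Y ~ Poisson(mu) (sketch)."""
    # Truncate the support far in the upper tail; the omitted mass is negligible
    y_max = int(mu + 20 * math.sqrt(mu) + 50)
    y = np.arange(y_max + 1)
    log_pmf = y * math.log(mu) - mu - np.array([math.lgamma(k + 1) for k in y])
    pmf = np.exp(log_pmf)
    r = (y - mu) / math.sqrt(mu)               # Pearson residuals
    return float(np.sum(np.clip(r, -c, c) * pmf))

print(expected_huber_poisson(1.0, 1.0))        # 1 - 3/e, about -0.1036
print(expected_huber_poisson(1.0, 1e6))        # about 0: no clipping, E[r] = 0
```

The expectation is non-zero when clipping is active because the Poisson distribution is skewed: clipping removes more of the long right tail than of the left, so without the correction the robust equation would be biased.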

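M-Quantiles for Count Data: An Estimating Equation Approach

Let $Q_y(q|\mathbf{x}_j;\psi) = \exp(\mathbf{x}_j^T\boldsymbol\beta(q)) = \mu_j(\boldsymbol\beta(q))$. Estimate $\boldsymbol\beta(q)$ by solving the following estimating equation:

$$ \sum_{j=1}^{n}\Big[\psi_q\Big\{\frac{y_j-\mu_j(\boldsymbol\beta(q))}{V^{1/2}(\mu_j(\boldsymbol\beta(q)))}\Big\}\,w(\mathbf{x}_j)\,\frac{1}{V^{1/2}(\mu_j(\boldsymbol\beta(q)))}\,\frac{\partial\mu_j(\boldsymbol\beta(q))}{\partial\boldsymbol\beta(q)} - a(\boldsymbol\beta(q))\Big] = \mathbf{0} $$

• Estimation: Fisher scoring algorithm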
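A sketch of the count estimating equation for a Poisson working model with log link and $w(\mathbf{x}_j) = 1$. As simplifying assumptions, the correction term $a(\boldsymbol\beta(q))$ is omitted, and instead of full Fisher scoring the update is an IRLS-type step that holds $W_j = \psi_q(r_j)/r_j$ fixed within each iteration. With $V(\mu) = \mu$ and $\partial\mu_j/\partial\boldsymbol\beta = \mu_j\mathbf{x}_j$, the score becomes $\sum_j W_j (y_j-\mu_j)\mathbf{x}_j$. The name `mq_poisson_fit` and its defaults are hypothetical.

```python
import numpy as np

def mq_poisson_fit(X, y, q=0.5, c=1.345, tol=1e-10, max_iter=500):
    """M-quantile Poisson regression, log link, w(x) = 1, no a(beta(q)) term (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(y.mean(), 0.5))    # assumes column 0 of X is the intercept
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        r = (y - mu) / np.sqrt(mu)          # Pearson residuals, V(mu) = mu
        # W_j = psi_q(r_j) / r_j for Huber's psi with tuning constant c
        W = 2 * np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12)) * np.where(r > 0, q, 1 - q)
        score = X.T @ (W * (y - mu))        # = sum_j psi_q(r_j) mu_j^{1/2} x_j
        info = (X.T * (W * mu)) @ X         # working information with W held fixed
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With a very large c no Pearson residual is clipped for moderate means, so W takes only the values 2q and 2(1 − q) and the fit solves the asymmetric (expectile-type) equation of Efron's AML with w = q/(1 − q).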

Linking AML with the M-Quantile Extension of Cantoni & Ronchetti

AML Details

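Start with Efron (1992): minimize

$$ D_w(\boldsymbol\beta_w) = 2\sum_{j=1}^{n}\Big[y_j\log\frac{y_j}{\mu_j(\boldsymbol\beta_w)} - \{y_j-\mu_j(\boldsymbol\beta_w)\}\Big]\,w^{I(y_j>\mu_j(\boldsymbol\beta_w))} $$

With the log link, $\partial\mu_j/\partial\boldsymbol\beta_w = \mu_j\mathbf{x}_j$, so setting the gradient to zero gives

$$ \frac{\partial D_w(\boldsymbol\beta_w)}{\partial\boldsymbol\beta_w} = -2\sum_{j=1}^{n}\{y_j-\mu_j(\boldsymbol\beta_w)\}\,\mathbf{x}_j\,w^{I(y_j>\mu_j(\boldsymbol\beta_w))} = \mathbf{0} $$

w = 1 corresponds to the MLE.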
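The gradient formula can be checked against finite differences. The weight switches at $y_j = \mu_j$, but each unit deviance and its derivative in µ vanish there, so $D_w$ stays continuously differentiable and the check is meaningful. A sketch with hypothetical names; the simulated data reuse the Poisson model η = 0.8 + 0.1x from the example below, with x drawn from Uniform(0, 10) as an assumption.

```python
import numpy as np

def poisson_unit_deviance(y, mu):
    # 2[y log(y/mu) - (y - mu)], using the convention 0 * log(0) = 0
    safe_y = np.where(y > 0, y, 1.0)
    ylog = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2 * (ylog - (y - mu))

def aml_deviance(beta, X, y, w):
    mu = np.exp(X @ beta)
    weight = np.where(y > mu, w, 1.0)       # w^{I(y_j > mu_j)}
    return np.sum(weight * poisson_unit_deviance(y, mu))

def aml_gradient(beta, X, y, w):
    # Analytic gradient: -2 sum_j (y_j - mu_j) x_j w^{I(y_j > mu_j)}
    mu = np.exp(X @ beta)
    weight = np.where(y > mu, w, 1.0)
    return -2 * X.T @ (weight * (y - mu))

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.8 + 0.1 * x)).astype(float)
beta, w, h = np.array([0.7, 0.12]), 3.0, 1e-6
numeric = np.array([(aml_deviance(beta + h * e, X, y, w) - aml_deviance(beta - h * e, X, y, w)) / (2 * h)
                    for e in np.eye(2)])
print(np.allclose(aml_gradient(beta, X, y, w), numeric, rtol=1e-4, atol=1e-3))
```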

Linking AML with the M-Quantile Extension of Cantoni & Ronchetti

Extension of Cantoni & Ronchetti

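Consider the extension of Cantoni & Ronchetti:

$$ \sum_{j=1}^{n}\Big[\psi_q\Big\{\frac{y_j-\mu_j(\boldsymbol\beta(q))}{V^{1/2}(\mu_j(\boldsymbol\beta(q)))}\Big\}\,w(\mathbf{x}_j)\,\frac{1}{V^{1/2}(\mu_j(\boldsymbol\beta(q)))}\,\frac{\partial\mu_j(\boldsymbol\beta(q))}{\partial\boldsymbol\beta(q)} - a(\boldsymbol\beta(q))\Big] = \mathbf{0} $$

With a large tuning constant in the Huber influence function, this reduces to

$$ \sum_{j=1}^{n}\{y_j-\mu_j(\boldsymbol\beta(q))\}\,w_{qj}\,\mathbf{x}_j = \mathbf{0}, \qquad w_{qj} = \frac{q}{1-q}\,I(y_j > \mu_j(\boldsymbol\beta(q))) + I(y_j \le \mu_j(\boldsymbol\beta(q))) $$

which corresponds to Efron (1992) with $w = q/(1-q)$.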
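The reduction can be checked term by term. The derivation below assumes a Poisson working model with log link ($V(\mu_j) = \mu_j$, $\partial\mu_j/\partial\boldsymbol\beta(q) = \mu_j\mathbf{x}_j$), $w(\mathbf{x}_j) = 1$, and that the correction term $a(\boldsymbol\beta(q))$ is dropped, as the reduced equation implies; these are assumptions of this sketch. For large c, $\psi(t) = t$ over the observed residuals, and $r_j > 0$ exactly when $y_j > \mu_j$:

$$
\begin{aligned}
\psi_q(r_j)\,\frac{1}{\mu_j^{1/2}}\,\mu_j\mathbf{x}_j
&= 2\,\frac{y_j-\mu_j}{\mu_j^{1/2}}\,\{q\,I(y_j>\mu_j) + (1-q)\,I(y_j\le\mu_j)\}\,\mu_j^{1/2}\,\mathbf{x}_j\\
&= 2(1-q)\Big\{\frac{q}{1-q}\,I(y_j>\mu_j) + I(y_j\le\mu_j)\Big\}(y_j-\mu_j)\,\mathbf{x}_j\\
&= 2(1-q)\,w_{qj}\,(y_j-\mu_j)\,\mathbf{x}_j .
\end{aligned}
$$

Summing over j and dividing by $2(1-q) > 0$ gives $\sum_j (y_j-\mu_j)\,w_{qj}\,\mathbf{x}_j = \mathbf{0}$, which is Efron's gradient equation with $w = q/(1-q)$.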

Comparing Different Approaches to Quantile Regression for Counts: An Example

• Generate data under a Poisson model

• η = 0.8 + 0.1x; y ∼ Poisson(exp(η))

• Fit M-Quantile regression, AML, and quantile regression (using jittering)

Comparing Different Approaches to Quantile Regression for Counts: An Example

Table: Parameter estimates: comparing different approaches to quantile regression for counts

| Estimate | MQ (c = 0.8) | QR | MQ (c = 100) | AML |
|---|---|---|---|---|
| β0, q = 0.25 | 0.51 | 0.43 | 0.57 | 0.57 |
| β1, q = 0.25 | 0.08 | 0.09 | 0.08 | 0.08 |
| β0, q = 0.50 | 0.77 | 0.62 | 0.83 | 0.83 |
| β1, q = 0.50 | 0.09 | 0.12 | 0.08 | 0.08 |
| β0, q = 0.75 | 1.10 | 1.08 | 1.10 | 1.10 |
| β1, q = 0.75 | 0.08 | 0.08 | 0.07 | 0.07 |

GLMMs for Counts

$$ y_{jd}|u_d \sim \text{Poisson}(\mu_{jd}), \qquad \log(\mu_{jd}) = \eta_{jd} = \mathbf{x}_{jd}^T\boldsymbol\beta + u_d, \qquad u_d \sim N(0, \Sigma_u) $$

Estimation

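The Empirical Conditional Mean Predictor (ECMP) of $\bar{Y}_d$ is

$$ \hat{\bar{Y}}_d = N_d^{-1}\Big[\sum_{j\in s_d} y_{jd} + \sum_{k\in r_d} \hat{y}_{kd}\Big], $$

where $\hat{y}_{kd} = \exp\{\hat\eta_{kd}\}$ and $\hat\eta_{kd} = \mathbf{x}_{kd}^T\hat{\boldsymbol\beta} + \hat{u}_d$.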

Notes on the use of GLMMs in SAE

• Standard methods for fitting GLMMs can be sensitive to outliers

• Prediction of the random effects with GLMMs is computationally complicated

SAE for Counts Using M-Quantile Regression

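$$ E[y_{jd}|\mathbf{x}_{jd}, d] = \exp\{\mathbf{x}_{jd}^T\boldsymbol\beta(\theta_d)\} $$

• with $\theta_d = E[q_{jd}|d]$,

• and $q_{jd}$ random variables such that $y_{jd} = \exp\{\mathbf{x}_{jd}^T\boldsymbol\beta(q_{jd})\}$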

Estimation

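• Estimate the individual M-Quantile coefficient $q_{jd}$ by solving $y_{jd} = \exp\{\mathbf{x}_{jd}^T\hat{\boldsymbol\beta}(q_{jd})\}$

• Estimate the area d M-Quantile coefficient $\theta_d = E[q_{jd}|d]$

• The M-Quantile estimator of $\bar{Y}_d$ is given by

$$ \hat{\bar{Y}}_d^{MQ} = N_d^{-1}\Big[\sum_{j\in s_d} y_{jd} + \sum_{k\in r_d} \hat{y}_{kd}\Big], \qquad \hat{y}_{kd} = \exp\{\mathbf{x}_{kd}^T\hat{\boldsymbol\beta}(\hat\theta_d)\} $$

• MSE estimation via bootstrap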

Estimation of qjd

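For count data, $y_{jd} = Q_y(q_{jd}|\mathbf{x}_{jd})$ has no solution when $y_{jd} = 0$, because $Q_y(q|\mathbf{x}) = \exp\{\mathbf{x}^T\boldsymbol\beta(q)\} > 0$. We therefore define $q_{jd}$ as the solution to

$$ Q_y(q_{jd}|\mathbf{x}_{jd}) = \begin{cases} k(\mathbf{x}_{jd}), & y_{jd} = 0 \\ y_{jd}, & y_{jd} = 1, 2, \dots \end{cases} $$

where

✗ $k(\mathbf{x}_{jd}) = Q_y(q_{min}|\mathbf{x}_{jd})$, where $q_{min}$ denotes the smallest q-value in the grid of q-values used. However, this implies that $q_{jd} = q_{min}$ whenever $y_{jd} = 0$, irrespective of the value of $\mathbf{x}_{jd}$.

✓ We want an observation with value $y_1 = 0$ to correspond to a smaller q-value than another with value $y_2 = 0$ when $Q_y(0.5|\mathbf{x}_1) > Q_y(0.5|\mathbf{x}_2)$. One way to achieve this is to set

$$ k(\mathbf{x}_{jd}) = \min\{1-\varepsilon,\; [Q_y(0.5|\mathbf{x}_{jd})]^{-1}\}, \qquad \varepsilon > 0. $$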
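The right-hand side of this definition can be sketched as a small helper that builds the target values to be inverted on the q-grid. The fitted medians $Q_y(0.5|\mathbf{x}_{jd})$ are assumed to come from an M-quantile fit at q = 0.5, and ε = 10⁻³ is an arbitrary illustrative value; the name `q_target` is hypothetical.

```python
import numpy as np

def q_target(y, median_fit, eps=1e-3):
    """Value that Q_y(q_jd | x_jd) is matched to when solving for q_jd (sketch).

    median_fit : fitted Q_y(0.5 | x_jd) = exp(x_jd^T beta(0.5))
    eps        : epsilon > 0 in k(x) = min{1 - eps, 1 / Q_y(0.5 | x)}
    """
    y = np.asarray(y, dtype=float)
    k = np.minimum(1 - eps, 1.0 / np.asarray(median_fit, dtype=float))
    # Zero counts are mapped to k(x) in (0, 1); positive counts are kept as-is
    return np.where(y == 0, k, y)

print(q_target([0, 0, 3], [4.0, 2.0, 5.0]))
```

Because k(x) decreases as the fitted median grows, a zero observed where the median is large receives a smaller target, which is intended to give it a smaller $q_{jd}$ once the targets are inverted on the q-grid.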

Illustration

• Generate data under the Poisson GLMM model

• u ∼ N(0, 1); η = 0.3 + 0.5x + u; y ∼ Poisson(exp(η))

• Fit MQ regression, estimate area effects

[Figure: simulated counts y against x with fitted M-quantile regression lines at q = 0.10, 0.50, 0.75, 0.95 and 0.99]

Estimation of θd

[Figure: Estimating M-quantile coefficients. Counts y against x for two areas, with M-quantile lines at q = 0.10, 0.25, 0.50, 0.75 and 0.95; estimated area coefficients θ1 = 0.11 and θ2 = 0.6, with corresponding random effects u1 = −0.5 and u2 = 0.59]

Prediction for Small Areas

[Figure: Predictions for two areas using alternative models for counts: y against x, showing GLMM predictions, MQ predictions (c = 100), MQ predictions (c = 0.8) and ALD model predictions]

Model-Based Simulations

• Compare: M-Quantile, ECMP & Direct Predictors

• Scenarios follow Ronchetti & Lo (2009, JMA)

Simulation Specifications

• u ∼ N(0, 0.5); x ∼ N(0, 1)

• $y_{jd} \sim \text{Poisson}(\mu_{jd})$, $\mu_{jd} = \exp(x_{jd} + u_d)$

• D = 50 areas, $n_d$ = 10 units per area, MC = 500 Monte Carlo replications

• Scenario 1: no contamination

• Scenario 2: contamination at 2%, 5% and 10%

• Contamination mechanism (Ronchetti & Lo, 2009): y = (1 − α)Poisson(µ) + αPoisson(5µ)
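One Monte Carlo sample under these specifications can be sketched as follows. Two readings are assumptions of this sketch: N(0, 0.5) is taken to give the variance, and y = (1 − α)Poisson(µ) + αPoisson(5µ) is read as a two-component mixture in which each unit is drawn from Poisson(5µ) with probability α. Only the sampled units are generated, since the area population sizes are not stated; the name `simulate_sample` is hypothetical.

```python
import numpy as np

def simulate_sample(alpha, D=50, n_d=10, sigma2_u=0.5, rng=None):
    """One sample under the (possibly contaminated) Poisson model (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(0.0, np.sqrt(sigma2_u), D)           # area effects u_d
    area = np.repeat(np.arange(D), n_d)                   # area label of each unit
    x = rng.normal(0.0, 1.0, D * n_d)
    mu = np.exp(x + u[area])                              # mu_jd = exp(x_jd + u_d)
    contaminated = rng.uniform(size=mu.size) < alpha      # contaminated with prob. alpha
    y = rng.poisson(np.where(contaminated, 5 * mu, mu))
    return area, x, y, contaminated

for alpha in (0.0, 0.02, 0.05, 0.10):
    area, x, y, cont = simulate_sample(alpha, rng=np.random.default_rng(0))
    print(alpha, cont.mean(), y.mean())
```

Repeating this MC = 500 times per scenario and applying each predictor would give the bias and MSE summaries reported below.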

Model-Based Simulations Results

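Table: Model-based simulation results, y = (1 − α)Poisson(µ) + αPoisson(5µ). Columns give the contamination level α.

Mean values of bias

| Predictor | 0 | 2% | 5% | 10% |
|---|---|---|---|---|
| ECMP | 0.010 | 0.019 | 0.004 | 0.007 |
| MQ | 0.011 | 0.042 | -0.060 | -0.145 |
| Direct | 0.005 | 0.003 | 0.002 | 0.002 |

Mean values of MSE

| Predictor | 0 | 2% | 5% | 10% |
|---|---|---|---|---|
| ECMP | 0.139 | 0.340 | 0.664 | 1.190 |
| MQ | 0.202 | 0.327 | 0.556 | 0.824 |
| Direct | 0.719 | 1.122 | 1.749 | 2.820 |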

Final Remarks

1. Explore robust prediction for binary outcomes

2. Extensions to aggregate data available: disease mapping (Chambers, Salvati, & Dreassi, 2013)

3. Define bias-corrected predictors (extension of Chambers et al., 2013, JRSS B)

4. Robust prediction using GLMMs (Maiti, 2001, JSPI; Sinha, 2004, CJS)

5. More work is needed on the numerical stability of estimating $q_{jd}$

6. Alternative approach to robust prediction for counts via the ALD, using jittering