Exploring the Use of Asymmetric Maximum Likelihood, Quantile & M-Quantile Regression for Small Area Estimation of Counts
Nikos Tzavidis
Southampton Statistical Sciences Research Institute, University of Southampton
Joint work with M.G. Ranalli (1), N. Salvati (2), E. Dreassi (3), R. Chambers (4)
Graybill 2013, Colorado State University
(1) University of Perugia, (2) University of Pisa, (3) University of Florence, (4) University of Wollongong
Outline
• SAE for continuous outcomes
• Asymmetric Maximum Likelihood for counts
• Quantile regression for counts
• M-Quantile regression for counts - Extending Cantoni & Ronchetti (2001, JASA)
• SAE for counts
• Empirical study
• Final remarks
Small Area Estimation: Preliminaries and Notation
• Units indexed by j, Areas indexed by d
• Variable of interest y, for now continuous
• Linear Mixed Model (LMM) approach to estimation, the industry standard:
$$y_{jd} = x_{jd}^T \beta + u_d + \varepsilon_{jd}, \quad j = 1, \dots, n_d, \quad d = 1, \dots, D$$
• Small area estimator of area mean
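$$\hat{Y}^{LMM}_d = N_d^{-1} \Big[ \sum_{j \in s_d} y_{jd} + \sum_{k \in r_d} \big( x_{kd}^T \beta + u_d \big) \Big]$$
where $s_d$ and $r_d$ denote the sampled and non-sampled units of area d.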
SAE: Relevant Literature - Continuous Case
• Robust Estimation
  • Ghosh et al. (2008, Bioka)
  • Sinha & Rao (2009, CJS)
  • Chambers & Tzavidis (2006, Bioka)
  • Chambers et al. (2013, JRSS B)
  • Dongmo Jiongo et al. (2013, Bioka)
• Empirical Best Predictor (EBP, Molina & Rao, 2010, CJS)
M-Quantile Regression for Continuous Data: A Review
• Regression: model for the mean of y given x → $E(y|x) = x^T\beta$.
• Quantile regression: model for the quantiles of y given x → $Q_y(q|x) = x^T\beta(q)$ (Koenker and Bassett, 1978).
• M-Quantile regression: $Q_y(q|x;\psi) = x^T\beta(q)$ (Breckling and Chambers, 1988).
  • An M-type generalization of quantile regression
M-Quantile Estimating Equation
For fixed q and influence function ψ, compute β(q) by solving
$$\sum_{j=1}^{n} \psi_q(r_{jq})\, x_j = 0$$
• residuals: $r_{jq} = (y_j - Q_y(q|x_j;\psi)) / \sigma_{q\psi}$;
• $\sigma_{q\psi}$: scale parameter;
• $\psi_q(t) = 2\psi(t)\{q\,I(t > 0) + (1-q)\,I(t \le 0)\}$.
• Solved via Iterative Weighted Least Squares (IWLS)
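The IWLS step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Huber ψ with tuning constant `c`, the MAD-based scale estimate, the OLS starting value and the function names are all assumptions made here.

```python
import numpy as np

def huber_psi(t, c=1.345):
    """Huber influence function: identity on [-c, c], clipped outside."""
    return np.clip(t, -c, c)

def mquantile_fit(X, y, q=0.5, c=1.345, tol=1e-8, max_iter=200):
    """Solve sum_j psi_q(r_jq) x_j = 0 by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS starting value (assumption)
    for _ in range(max_iter):
        resid = y - X @ beta
        # MAD-based estimate of the scale sigma (an assumed, common choice)
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)
        r = resid / scale
        # psi_q(r) = 2 psi(r) {q I(r > 0) + (1 - q) I(r <= 0)}
        psi_q = 2.0 * huber_psi(r, c) * np.where(r > 0, q, 1.0 - q)
        # IWLS weights psi_q(r) / r, using the limit 2(1 - q) at r = 0
        safe_r = np.where(r == 0, 1.0, r)
        w = np.where(r == 0, 2.0 * (1.0 - q), psi_q / safe_r)
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)      # weighted least squares update
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

With q = 0.5 and a very large `c` the weights are all equal to one, so the fit coincides with ordinary least squares; a smaller `c` downweights large residuals.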
Small Area Estimation with M-Quantile Regression
Main idea of SAE with M-Quantile regression
• Quantiles/M-Quantiles used for describing group (domain) heterogeneity
• Similar role to random effects BUT
• Estimation is semiparametric
• If a hierarchical structure does explain part of the variability in the data, units within the same domain will be clustered in the same part of f(y|x)
Small Area Estimation with M-Quantile Regression
[Figure: scatter plot of y against x with fitted M-quantile lines at q = 0.10, 0.50, 0.75, 0.90 and 0.99]
Estimation
• Define $q_{jd}$ such that $y_{jd} = x_{jd}^T\beta(q_{jd})$
• Estimate the individual M-Quantile coefficient $q_{jd}$ by solving $y_{jd} = x_{jd}^T\beta(q_{jd})$
• Estimate the area d M-Quantile coefficient $\theta_d = E[q_{jd}|d]$.
• If there are no sample observations in area d, then set $\theta_d = 0.5$
Estimation
• The M-Quantile predictor of the area mean is given by
$$\hat{Y}^{LMQ}_d = N_d^{-1} \Big[ \sum_{j \in s_d} y_{jd} + \sum_{k \in r_d} Q_y(\theta_d|x_{kd}) \Big],$$
where $Q_y(\theta_d|x_{kd}) = x_{kd}^T\beta(\theta_d)$
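The estimation steps can be sketched in code. This is a hedged illustration only: it assumes that M-quantile fits are available on a grid of q-values, that the fitted values increase with q for every unit, and that $q_{jd}$ is found by linear interpolation on that grid. The function names and argument layout are hypothetical.

```python
import numpy as np

def unit_q(y, fitted, q_grid):
    """q_jd: the q at which the fitted M-quantile equals y_jd (linear interpolation).

    fitted[g, j] = x_j^T beta(q_grid[g]); assumed increasing in g for each unit j.
    Values of y outside the fitted range are mapped to the ends of the grid.
    """
    return np.array([np.interp(y[j], fitted[:, j], q_grid) for j in range(len(y))])

def area_theta(q_units, area, areas):
    """theta_d: average of q_jd over the sample in area d; 0.5 if the area has no sample."""
    return {d: (q_units[area == d].mean() if np.any(area == d) else 0.5) for d in areas}

def mq_predictor(y_s, x_r, beta_theta, N_d):
    """N_d^{-1} [ sum of sampled y + sum over non-sampled units of x_k^T beta(theta_d) ]."""
    return (y_s.sum() + (x_r @ beta_theta).sum()) / N_d
```

Here `beta_theta` stands for the coefficient vector β(θ_d), obtained by refitting (or interpolating) the M-quantile model at q = θ_d.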
An Alternative View of Quantile Regression
Quantile Regression - A Parametric Link
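A continuous random variable y follows an asymmetric Laplace distribution, $y \sim ALD(\mu, \sigma, q)$, with pdf
$$p(y|\mu, \sigma, q) = \frac{q(1-q)}{\sigma} \exp\Big\{ -\rho_q\Big( \frac{y-\mu}{\sigma} \Big) \Big\}, \qquad \rho_q(u) = u\,\{q - I(u < 0)\}$$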
• Geraci and Bottai (2007): Quantile random effects regression
• p(y, v|β, σ,Γ) = p(y|β, σ, v)p(v|Γ)
• y|v ∼ ALD(Xβ + v, σ)
• p(v|Γ), Normal random effects
• p(v|Γ), Robust random effects
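The parametric link can be illustrated numerically: for fixed σ, maximizing the ALD likelihood in µ is the same as minimizing the quantile regression check loss. The sketch below assumes the standard ALD parameterization with check function ρ_q(u) = u{q − I(u < 0)}; the function names are hypothetical.

```python
import numpy as np

def check_loss(u, q):
    """Quantile regression check function rho_q(u) = u * (q - I(u < 0))."""
    return u * (q - (u < 0))

def ald_logpdf(y, mu, sigma, q):
    """Log-density of ALD(mu, sigma, q): log{q(1-q)/sigma} - rho_q((y - mu)/sigma)."""
    return np.log(q * (1.0 - q) / sigma) - check_loss((y - mu) / sigma, q)

def ald_location_mle(y, q, sigma=1.0):
    """Maximize the ALD log-likelihood in mu over the observed values (sigma held fixed)."""
    candidates = np.sort(y)
    loglik = np.array([ald_logpdf(y, m, sigma, q).sum() for m in candidates])
    return candidates[np.argmax(loglik)]
```

For a sample without ties, `ald_location_mle(y, q)` returns a sample q-quantile of `y`, which is why the ALD serves as a working likelihood for quantile regression.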
Quantile/M-Quantile Regression for Counts
1. Machado & Santos Silva (JASA, 2005; quantile regression; frequentist paradigm)
2. Efron (JASA, 1992; expectile regression via Asymmetric Maximum Likelihood)
3. Our proposal - M-Quantile regression: extending Cantoni & Ronchetti (JASA, 2001)
Quantile Regression for Counts
Machado & Santos Silva (JASA, 2005); Lee & Neocleous (JRSS C, 2010)
The problem with estimating conditional quantiles of counts is caused by the combination of a non-differentiable sample objective function with a discrete dependent variable.
• Jittering: artificial smoothness is introduced by adding Uniform(0, 1) noise to the count
• One-to-one relationship between the conditional quantiles of the count and those of the jittered outcome
• Linear model (using the Asymmetric Laplace Distribution) for the quantiles of the jittered outcome
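The one-to-one relationship can be checked on simulated data without covariates: if z = y + U with U ~ Uniform(0, 1), then Q_y(q) = ⌈Q_z(q) − 1⌉. The sketch below is an illustration under assumed settings (Poisson(3) counts, an arbitrary seed, quantiles taken as order statistics) and does not fit the regression model itself.

```python
import numpy as np

rng = np.random.default_rng(2013)             # arbitrary seed
y = rng.poisson(lam=3.0, size=1001)           # counts; Poisson(3) is an illustrative choice
z = y + rng.uniform(0.0, 1.0, size=y.size)    # jittered, continuous outcome

def order_quantile(v, q):
    """Empirical q-quantile taken as the ceil(n q)-th order statistic."""
    k = int(np.ceil(q * v.size)) - 1
    return np.sort(v)[k]

def count_quantile_from_jittered(qz):
    """Map a quantile of the jittered outcome back to the count scale."""
    return int(np.ceil(qz - 1.0))

# Quantiles of the count recovered from the quantiles of the jittered outcome
recovered = {q: count_quantile_from_jittered(order_quantile(z, q))
             for q in (0.1, 0.25, 0.5, 0.75, 0.9)}
```

Because jittering preserves the ordering of the counts, each entry of `recovered` agrees with the corresponding order-statistic quantile of `y`.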
Asymmetric Maximum Likelihood - Expectile Regression
Efron (JASA, 1992)
Asymmetric Maximum Likelihood Estimation: can be seen as the result of smoothing the objective function used to define the quantile regression estimator. Denote by D the deviance under a population model, and define
$$D_w(y, \mu) = \begin{cases} D(y, \mu), & y \le \mu \\ w\,D(y, \mu), & y > \mu \end{cases}$$
AML Details
Solution: minimize $\sum_{i=1}^{n} D_w(y_i, \mu_i)$
• Leads to expectile regression (Newey & Powell, 1987) for counts
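A minimal sketch of AML for counts, assuming the Poisson deviance and a log link, so that the estimating equation is Σ (y_j − µ_j) x_j w^{I(y_j > µ_j)} = 0. The Fisher-scoring-type update, the starting value and the function name are assumptions made for illustration.

```python
import numpy as np

def aml_poisson(X, y, w=1.0, tol=1e-10, max_iter=100):
    """Asymmetric maximum likelihood for a Poisson log-linear model.

    Solves sum_j (y_j - mu_j) x_j w^{I(y_j > mu_j)} = 0 with mu_j = exp(x_j^T beta).
    Column 0 of X is assumed to be the intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(y.mean(), 1e-8))             # start at the intercept-only fit
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        a = np.where(y > mu, w, 1.0)                  # weight w above the fit, 1 below
        score = X.T @ (a * (y - mu))                  # asymmetric score
        info = (X.T * (a * mu)) @ X                   # weighted information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Setting `w = 1` gives the usual Poisson maximum likelihood fit; `w > 1` moves the fitted curve up and `w < 1` moves it down.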
Robust Estimation for Generalized Linear Models
Cantoni & Ronchetti (JASA, 2001)
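• $y_j$ from the Exponential Family
• $E(y_j) = \mu_j$; $V(y_j) = V(\mu_j)$; $g(\mu_j) = x_j^T\beta$
• $\sum_{j=1}^{n} \frac{(y_j - \mu_j)}{V(\mu_j)} \frac{\partial \mu_j}{\partial \beta} = 0$
• Large deviations of $y_j$ from $\mu_j$ or leverage points → influence
• $\sum_{j=1}^{n} \psi(r_j)\, w(x_j)\, \frac{1}{V^{1/2}(\mu_j)} \frac{\partial \mu_j}{\partial \beta} - \alpha(\beta) = 0$ (Huber quasi-likelihood)
• $r_j$: Pearson residuals; $w(x_j)$ controls leverage points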
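M-Quantiles for Count Data: An Estimating Equation Approach
Let $Q_y(q|x_j;\psi) = \exp(x_j^T\beta(q)) = \mu_j(\beta(q))$. Estimate β(q) by solving the estimating equation
$$\sum_{j=1}^{n} \Big[ \psi_q\Big\{ \frac{y_j - \mu_j(\beta(q))}{V^{1/2}(\mu_j(\beta(q)))} \Big\}\, w(x_j)\, \frac{1}{V^{1/2}(\mu_j(\beta(q)))} \Big( \frac{\partial \mu_j(\beta(q))}{\partial \beta(q)} \Big) - a(\beta(q)) \Big] = 0$$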
• Estimation: Fisher scoring algorithm
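Linking AML with the M-Quantile Extension of Cantoni & Ronchetti
AML Details
Start with Efron (1992): minimize
$$D_w(\beta_w) = 2 \sum_{j=1}^{n} \big[ y_j \log(y_j / \mu_j(\beta_w)) - (y_j - \mu_j(\beta_w)) \big]\, w^{I(y_j > \mu_j(\beta_w))}$$
For the log link, $\mu_j(\beta_w) = \exp(x_j^T\beta_w)$, setting the derivative to zero gives
$$\frac{\partial D_w(\beta_w)}{\partial \beta_w} = -2 \sum_{j=1}^{n} \big[ (y_j - \mu_j(\beta_w))\, x_j \big]\, w^{I(y_j > \mu_j(\beta_w))} = 0$$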
w = 1 → corresponds to MLE
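Linking AML with the M-Quantile Extension of Cantoni & Ronchetti
Extension of Cantoni & Ronchetti
Consider the extension of Cantoni & Ronchetti:
$$\sum_{j=1}^{n} \Big[ \psi_q\Big\{ \frac{y_j - \mu_j(\beta(q))}{V^{1/2}(\mu_j(\beta(q)))} \Big\}\, w(x_j)\, \frac{1}{V^{1/2}(\mu_j(\beta(q)))} \Big( \frac{\partial \mu_j(\beta(q))}{\partial \beta(q)} \Big) - a(\beta(q)) \Big] = 0$$
With a large tuning constant in the Huber influence function, this becomes
$$\sum_{j=1}^{n} \big[ (y_j - \mu_j(\beta(q)))\, w_{qj}\, x_j \big] = 0$$
with
$$w_{qj} = \Big[ \Big( \frac{q}{1-q} \Big) I(y_j > \mu_j(\beta(q))) + I(y_j \le \mu_j(\beta(q))) \Big],$$
which corresponds to Efron (1992) with $w = q/(1-q)$.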
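The reduction can be followed term by term. The lines below are a sketch for the Poisson case with log link, assuming $V(\mu) = \mu$, $w(x_j) = 1$, and that the correction term $a(\beta(q))$ can be neglected; these simplifications are assumptions made here to reach the displayed result, not statements taken from the slides. For brevity, $\mu_j$ stands for $\mu_j(\beta(q))$.

```latex
% Poisson with log link: V(\mu_j) = \mu_j and \partial\mu_j / \partial\beta(q) = \mu_j x_j.
% Huber \psi with tuning constant c \to \infty: \psi(t) = t, hence
\psi_q(r_j) = 2\, r_j \{ q\, I(r_j > 0) + (1-q)\, I(r_j \le 0) \},
\qquad r_j = \frac{y_j - \mu_j}{\mu_j^{1/2}} .

% Summand of the estimating equation (w(x_j) = 1, a(\beta(q)) omitted):
\psi_q(r_j)\, \frac{1}{\mu_j^{1/2}}\, \mu_j x_j
  = 2\, (y_j - \mu_j)\, x_j \{ q\, I(y_j > \mu_j) + (1-q)\, I(y_j \le \mu_j) \} .

% Dividing the sum by 2(1-q):
\sum_{j=1}^{n} (y_j - \mu_j)\, w_{qj}\, x_j = 0,
\qquad w_{qj} = \frac{q}{1-q}\, I(y_j > \mu_j) + I(y_j \le \mu_j) .
```

This is the AML score with $w = q/(1-q)$; at $q = 0.5$ the weight is $w = 1$, which is the Poisson MLE.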
Comparing Different Approaches to Quantile Regression for Counts: An Example
• Generate data under a Poisson model
• η = 0.8 + 0.1x; y ∼ Poisson(exp(η))
• Fit M-Quantile regression, AML, Quantile regression (using jittering)
Comparing Different Approaches to Quantile Regression for Counts: An Example
Table: Parameter estimates: comparing different approaches to quantile regression for counts

Est.            MQ (c = 0.8)   QR     MQ (c = 100)   AML
β0 (q = 0.25)   0.51           0.43   0.57           0.57
β1 (q = 0.25)   0.08           0.09   0.08           0.08
β0 (q = 0.50)   0.77           0.62   0.83           0.83
β1 (q = 0.50)   0.09           0.12   0.08           0.08
β0 (q = 0.75)   1.10           1.08   1.10           1.10
β1 (q = 0.75)   0.08           0.08   0.07           0.07
GLMMs for Counts
$$y_{jd}|u_d \sim \text{Poisson}(\mu_{jd})$$
with
$$\log(\mu_{jd}) = \eta_{jd} = x_{jd}^T\beta + u_d, \qquad u_d \sim N(0, \Sigma_u)$$
Estimation
The Empirical Conditional Mean Predictor (ECMP) of $Y_d$ is
$$\hat{Y}_d = N_d^{-1} \Big[ \sum_{j \in s_d} y_{jd} + \sum_{k \in r_d} y_{kd} \Big],$$
where $y_{kd} = \exp\{\eta_{kd}\}$ and $\eta_{kd} = x_{kd}^T\beta + u_d$
Notes on the use of GLMMs in SAE
• Standard methods for fitting GLMMs can be sensitive to outliers
• Prediction of the random effects with GLMMs is computationally complicated
SAE for Counts Using M-Quantile Regression
$$E[y|x, d] = \exp\{x_{jd}^T\beta(\theta_d)\}$$
• with $\theta_d = E[q_{jd}|d]$,
• and $q_{jd}$ random variables such that $y_{jd} = \exp\{x_{jd}^T\beta(q_{jd})\}$
Estimation
• Estimate the individual M-Quantile coefficient $q_{jd}$ by solving $y_{jd} = \exp\{x_{jd}^T\beta(q_{jd})\}$
• Estimate the area d M-Quantile coefficient $\theta_d = E[q_{jd}|d]$.
• The M-Quantile estimator of $Y_d$ is given by
$$\hat{Y}^{MQ}_d = N_d^{-1} \Big[ \sum_{j \in s_d} y_{jd} + \sum_{k \in r_d} y_{kd} \Big], \qquad y_{kd} = \exp\{x_{kd}^T\beta(\theta_d)\}$$
• MSE estimation with bootstrap
Estimation of qjd
For count data, $y_{jd} = Q_y(q_{jd}|x_{jd})$ does not have a solution when $y_{jd} = 0$. We define $q_{jd}$ as the solution to
$$Q_y(q_{jd}|x_{jd}) = \begin{cases} k(x_{jd}), & y_{jd} = 0 \\ y_{jd}, & y_{jd} = 1, 2, \dots \end{cases}$$
where
✗ $k(x_{jd}) = Q_y(q_{min}|x_{jd})$, with $q_{min}$ denoting the smallest q-value in the grid of q-values used. However, this implies that $q_{jd} = q_{min}$ whenever $y_{jd} = 0$, irrespective of the value of $x_{jd}$.
✓ We want an observation with value $y_1 = 0$ to correspond to a smaller q-value than another with value $y_2 = 0$ when $Q_y(0.5|x_1) > Q_y(0.5|x_2)$. A way to achieve this is by setting
$$k(x_{jd}) = \min\{1 - \varepsilon, [Q_y(0.5|x_{jd})]^{-1}\}, \quad \varepsilon > 0.$$
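The rule for zero counts is simple to express in code. The sketch below assumes the fitted medians Q_y(0.5|x) are already available and strictly positive (they are exponentials of a linear predictor); the value of ε and the function names are illustrative choices.

```python
import numpy as np

def zero_count_target(q50, eps=1e-3):
    """k(x) = min{1 - eps, 1 / Q_y(0.5 | x)}, used for units with y = 0."""
    return np.minimum(1.0 - eps, 1.0 / np.asarray(q50, dtype=float))

def target_value(y, q50, eps=1e-3):
    """Value matched to Q_y(q | x): k(x) when y = 0, y itself when y = 1, 2, ..."""
    y = np.asarray(y, dtype=float)
    return np.where(y == 0, zero_count_target(q50, eps), y)
```

A zero observed where the fitted median is large receives a smaller target, and therefore a smaller q-value, than a zero observed where the fitted median is small.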
Illustration
• Generate data under the Poisson GLMM model
• u ∼ N(0, 1); η = 0.3 + 0.5x + u; y ∼ Poisson(exp(η))
• Fit MQ regression, estimate area effects
[Figure: scatter plot of y against x with fitted M-quantile curves at q = 0.10, 0.50, 0.75, 0.95 and 0.99]
Estimation of θd
[Figure: "Estimating M-quantile coefficients". Scatter plot of y against x for two areas with M-quantile curves at q = 0.10, 0.25, 0.50, 0.75 and 0.95. Annotations: θ1 = 0.11, u1 = −0.5; θ2 = 0.6, u2 = 0.59]
Prediction for Small Areas
[Figure: "Predictions for two areas using alternative models for counts". Plot of y against x. Legend: GLMM predictions; MQ predictions, c = 100; MQ predictions, c = 0.8; ALD model predictions]
Model-Based Simulations
• Compare: M-Quantile, ECMP & Direct Predictors
• Scenarios follow Ronchetti & Lo (2009, JMA)
Simulation Specifications
• u ∼ N(0, 0.5); x ∼ N(0, 1)
• $y_{jd} \sim \text{Poisson}(\mu_{jd})$; $\mu_{jd} = \exp(x + u_d)$; D = 50, $n_d$ = 10, MC = 500
• Scenario 1: No contamination
• Scenario 2: Contamination at 2%, 5%, 10%
• Contamination mechanism, Ronchetti & Lo (2009): y = (1 − α)Poisson(µ) + αPoisson(5µ)
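The data-generating mechanism can be sketched as below. Two readings are assumed and should be checked against the original study: the contamination formula is treated as a mixture (each unit is drawn from Poisson(5µ) with probability α), and 0.5 in N(0, 0.5) is treated as the variance of u. Function names and the seed are hypothetical.

```python
import numpy as np

def contaminated_poisson(mu, alpha, rng):
    """Draw y from (1 - alpha) Poisson(mu) + alpha Poisson(5 mu), read as a mixture."""
    mu = np.asarray(mu, dtype=float)
    outlier = rng.random(mu.shape) < alpha             # contaminated units
    y = rng.poisson(np.where(outlier, 5.0 * mu, mu))
    return y, outlier

def simulate_sample(D=50, n_d=10, alpha=0.0, seed=0):
    """One Monte Carlo sample: D areas with n_d units each."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(0.5), size=D)          # assumes 0.5 is the variance of u_d
    x = rng.normal(0.0, 1.0, size=(D, n_d))
    mu = np.exp(x + u[:, None])                        # mu_jd = exp(x_jd + u_d)
    y, outlier = contaminated_poisson(mu, alpha, rng)
    return x, y, outlier
```

Calling `simulate_sample(alpha=0.05, seed=m)` for m = 1, ..., 500 would give the replicates for the 5% contamination scenario under these assumptions.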
Model-Based Simulations Results
Table: Model-based simulation results, y = (1 − α)Poisson(µ) + αPoisson(5µ)

            Scenario (contamination)
Predictor   0        2%       5%       10%
Mean values of bias
ECMP        0.010    0.019    0.004    0.007
MQ          0.011    0.042   -0.060   -0.145
Direct      0.005    0.003    0.002    0.002
Mean values of MSE
ECMP        0.139    0.340    0.664    1.190
MQ          0.202    0.327    0.556    0.824
Direct      0.719    1.122    1.749    2.820
Final Remarks
1. Explore robust prediction for binary outcomes
2. Extensions to the case where only aggregate data are available - Disease mapping (Chambers, Salvati, & Dreassi, 2013)
3. Define bias-corrected predictors (extension of Chambers et al., 2013, JRSS B)
4. Robust prediction using GLMMs (Maiti, 2001, JSPI; Sinha, 2004, CJS)
5. More work on numerical stability of estimating qjd is needed
6. Alternative approach to robust prediction for counts via ALD using jittering