  • Randomized Quantile Residuals: an Omnibus Model Diagnostic Tool with Unified Reference Distribution

    Longhai Li

    Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, CANADA

    Presented on 19 June 2017, Xiamen University, China

  • Acknowledgements

    Joint work with Alireza Sadeghpour and Cindy X. Feng.

    The work was supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada Foundation for Innovation (CFI).

  • Outline

    1 Introduction

    2 Non-normal Regression Models

    3 Pearson and Deviance Residuals

    4 Randomized Quantile Residual

    5 Simulation Studies
        Detection of Non-linearity
        Detection of Zero-Inflation

    6 Application to a Big Health Dataset

    7 Conclusion and Discussion

  • Section 1

    Introduction

  • Introduction

    Examining residuals, such as Pearson and deviance residuals, is a primary method for identifying discrepancies between models and data and for assessing the overall goodness-of-fit of a model. In normal linear regression, these two residuals coincide and are normally distributed.

    However, in non-normal regression models, the residuals are far from normal and align along nearly parallel curves according to the distinct response values, which poses great challenges for visual inspection.

    The randomized quantile residual was proposed by Dunn and Smyth (1996) to circumvent these problems with traditional residuals.

    The randomized quantile residual is still rarely applied in practice. We will demonstrate how well it performs.

    1. Introduction/ 1/36

  • A First Look at Three Residuals

    [Figure: three scatterplot panels of residuals against x, titled Pearson, Deviance, and Randomized Quantile.]

    A simulated dataset is checked against the correct model. However, the Pearson and deviance residuals exhibit trends and cluster along lines.

    In addition, the often-used $\chi^2$ tests are not well calibrated.

    Randomized quantile residuals can be checked in the same way as traditional residuals for normal regression.


  • Section 2

    Non-normal Regression Models

  • Generalized Linear Model

    Generalized Linear Model (GLM): The GLM assumes that a response variable $y_i$ given $x_i$ follows a non-normal distribution, such as the Poisson or gamma. A unified form for the PDF is:

    $$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\}. \tag{1}$$

    A link function connects the conditional expected value of the response variable, $\mu_i = E(y_i \mid x_i)$, to a linear combination of the covariates and regression parameters:

    $$g(\mu_i) = \eta_i = x_i\beta. \tag{2}$$
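
    As a small illustration of the unified form (1), the sketch below writes the Poisson PMF in exponential-family form and compares it with the usual formula. The choices $\theta = \log\mu$, $b(\theta) = e^{\theta}$, $a(\phi) = 1$ and $c(y, \phi) = -\log y!$ are the standard Poisson ingredients; the function names and the numeric values of the coefficients are assumptions made only for this example.

```python
import math


def exp_family_pdf(y, theta, phi, a, b, c):
    """Evaluate the unified GLM density of equation (1)."""
    return math.exp((y * theta - b(theta)) / a(phi) + c(y, phi))


def poisson_via_exp_family(y, mu):
    """Poisson PMF written in the form of equation (1)."""
    return exp_family_pdf(
        y,
        theta=math.log(mu),                      # canonical parameter
        phi=1.0,                                 # no free dispersion for Poisson
        a=lambda phi: 1.0,
        b=math.exp,                              # b(theta) = exp(theta)
        c=lambda y, phi: -math.lgamma(y + 1),    # c(y, phi) = -log(y!)
    )


def poisson_pmf(y, mu):
    """Textbook Poisson PMF, used only as a cross-check."""
    return math.exp(-mu) * mu**y / math.factorial(y)


# Log link, equation (2): eta = beta0 + beta1 * x, mu = exp(eta).
beta0, beta1, x = 0.5, 0.3, 2.0   # hypothetical values
mu = math.exp(beta0 + beta1 * x)

for y in range(4):
    print(y, poisson_via_exp_family(y, mu), poisson_pmf(y, mu))
```

    The two columns printed for each y should agree up to floating-point error.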


  • Zero-Inflated Models: I

    Zero-Inflated Model: In practice, count data very often contain excess zeros that a conventional GLM cannot capture. One popular approach to modelling such data is a mixture of a point mass at zero, representing the non-risk group (structural zeros), and a GLM, representing the at-risk group.

    Example: The zero-inflated Poisson response variable with parameters $\lambda_i$ and $p_i$, denoted by $\mathrm{ZIP}(\lambda_i, p_i)$, is defined as:

    $$y_i \sim \begin{cases} \delta_0 & \text{with probability } p_i, \\ \mathrm{Poisson}(\lambda_i) & \text{with probability } 1 - p_i. \end{cases} \tag{3}$$


  • Zero-Inflated Models: II

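    The PMF of the ZIP distribution:

    $$d_{\mathrm{zip}}(y_i = 0) = p_i + (1 - p_i)e^{-\lambda_i}, \tag{4}$$

    $$d_{\mathrm{zip}}(y_i = j) = (1 - p_i)\frac{e^{-\lambda_i}\lambda_i^{j}}{j!}, \quad j = 1, 2, \cdots. \tag{5}$$

    The CDF of the ZIP distribution:

    $$p_{\mathrm{zip}}(y_i \le J; \lambda_i, p_i) = \sum_{j=0}^{J} d_{\mathrm{zip}}(y_i = j) = p_i + (1 - p_i)\,\mathrm{ppois}(J, \lambda_i), \tag{6}$$

    where $\mathrm{ppois}(J, \lambda_i)$ is the Poisson CDF evaluated at $J$.

    Links of $p_i$ and $\lambda_i$ to covariates:

    $$\mathrm{logit}(p_i) = z_i\gamma \quad \text{and} \quad \log(\lambda_i) = x_i\beta, \tag{7}$$

    where $z_i$ and $x_i$ are vectors of explanatory variables for $p_i$ and $\lambda_i$, with $\gamma$ and $\beta$ the corresponding parameter vectors.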
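
    The ZIP formulas (4) to (7) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names mirror the slide notation (dzip, pzip), while the linear predictor values are made up for illustration.

```python
import math


def poisson_pmf(j, lam):
    return math.exp(-lam) * lam**j / math.factorial(j)


def poisson_cdf(J, lam):
    return sum(poisson_pmf(j, lam) for j in range(J + 1))


def dzip(j, lam, p):
    """ZIP PMF, equations (4) and (5)."""
    base = (1.0 - p) * poisson_pmf(j, lam)
    return p + base if j == 0 else base


def pzip(J, lam, p):
    """ZIP CDF, closed form on the right-hand side of equation (6)."""
    return p + (1.0 - p) * poisson_cdf(J, lam)


def zip_parameters(z_gamma, x_beta):
    """Invert the links in equation (7) for given linear predictors."""
    p = 1.0 / (1.0 + math.exp(-z_gamma))   # inverse logit
    lam = math.exp(x_beta)                 # inverse log
    return lam, p


# Hypothetical linear predictor values z_i * gamma and x_i * beta.
lam, p = zip_parameters(z_gamma=-1.0, x_beta=0.7)

cdf_by_summation = sum(dzip(j, lam, p) for j in range(4))   # sum over j = 0..3
cdf_closed_form = pzip(3, lam, p)
print(cdf_by_summation, cdf_closed_form)
```

    Summing the PMF and using the closed form should give the same value, which is the identity stated in (6).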


  • Section 3

    Pearson and Deviance Residuals

  • Pearson and Deviance Residuals

    The Pearson residual is defined as

    $$r_i = \frac{y_i - \hat\mu_i}{\sqrt{\hat V(y_i)}}. \tag{8}$$

    The deviance residual for the $i$th observation is defined as the signed square root of the corresponding component of $D(y; \hat\mu)$, i.e.

    $$d_i = \mathrm{sign}(y_i - \hat\mu_i)\sqrt{2\left\{\omega_i\left[y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\right]\right\}}, \tag{9}$$

    where $\tilde\theta_i$ and $\hat\theta_i$ denote the parameters in the saturated and fitted models, respectively.
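
    For a concrete case, the sketch below evaluates (8) and (9) for a Poisson model, assuming prior weight $\omega_i = 1$, $\hat V(y_i) = \hat\mu_i$, $\tilde\theta_i = \log y_i$, $\hat\theta_i = \log\hat\mu_i$ and $b(\theta) = e^{\theta}$. The fitted mean used in the loop is a hypothetical value, not a result from the talk.

```python
import math


def pearson_residual(y, mu):
    """Equation (8) with the Poisson variance V(y) = mu."""
    return (y - mu) / math.sqrt(mu)


def deviance_residual(y, mu):
    """Equation (9) for the Poisson case with omega_i = 1.

    The bracket reduces to y*log(y/mu) - (y - mu), where y*log(y/mu)
    is taken to be 0 when y = 0.
    """
    term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (term - (y - mu))
    sign = (y > mu) - (y < mu)
    return sign * math.sqrt(max(unit_deviance, 0.0))


fitted_mean = 1.2  # hypothetical fitted value for one observation
for y in range(5):
    print(y,
          round(pearson_residual(y, fitted_mean), 3),
          round(deviance_residual(y, fitted_mean), 3))
```

    Because y takes only integer values, each value of y traces a single curve in the fitted mean, which is one way to see why these residuals line up along nearly parallel curves.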


  • Problems with Traditional Residuals

    In regression models for discrete outcomes, the residuals are far from normal and align along nearly parallel curves according to the distinct response values, which poses great challenges for visual inspection. As a result, residual plots for models of discrete outcome variables give very limited information for model diagnosis and are of little practical use.

    The Pearson $\chi^2$ statistic is written as $X^2 = \sum_{i=1}^{n} r_i^2$, and the deviance ($\chi^2$ statistic) is written as $D = \sum_{i=1}^{n} d_i^2$. The asymptotic distribution of $D$ and $X^2$ under the true model is often assumed to be $\chi^2_{n-p}$, where $n$ is the sample size and $p$ is the number of parameters. However, the use of this asymptotic distribution for both $X^2$ and $D$ appears to lack theoretical underpinning.
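
    One way to see why the $\chi^2$ reference can be unreliable for counts is to look at a single squared Pearson residual under a simplifying assumption that is not made in the talk: the Poisson mean is treated as known rather than estimated. In that case $(Y - \mu)^2/\mu$ has mean 1 but variance $2 + 1/\mu$, whereas a $\chi^2_1$ variable has variance 2. The sketch below computes these moments by direct summation over the PMF.

```python
import math


def poisson_pmf(j, mu):
    return math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))


def pearson_term_moments(mu, upper=200):
    """Mean and variance of r^2 = (Y - mu)^2 / mu for Y ~ Poisson(mu).

    The sum is truncated at `upper`, which is far in the tail for the
    small means used below.
    """
    values = [((j - mu) ** 2 / mu, poisson_pmf(j, mu)) for j in range(upper)]
    mean = sum(v * w for v, w in values)
    var = sum((v - mean) ** 2 * w for v, w in values)
    return mean, var


for mu in (0.2, 1.0, 5.0):
    mean, var = pearson_term_moments(mu)
    print(f"mu={mu}: mean={mean:.4f}, variance={var:.4f}, 2 + 1/mu={2 + 1 / mu:.4f}")
```

    For small means the variance is well above 2, so a sum of such terms is more dispersed than the corresponding $\chi^2$ distribution. This is only an illustration under the known-mean assumption, not a statement about the fitted-model case.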


  • Section 4

    Randomized Quantile Residual

  • Definition of Randomized Quantile Residual

    Predictive p-value for continuous $y_i$:

    $$F(y_i; \hat\mu_i, \hat\phi) = P(Y_i \le y_i \mid \hat\mu_i, \hat\phi)$$

    Randomized predictive p-value: If $F$ is discrete, the estimated lower tail probability is randomized into a uniform random number:

    $$F^*(y_i; \hat\mu_i, \hat\phi, u_i) = F(y_i-; \hat\mu_i, \hat\phi) + u_i\, d(y_i; \hat\mu_i, \hat\phi), \tag{10}$$

    where $u_i$ is drawn from the uniform distribution on $(0, 1]$ and $F(y_i-; \hat\mu_i, \hat\phi)$ is the lower limit of $F$ at $y_i$, i.e., $\sup_{y < y_i} F(y; \hat\mu_i, \hat\phi)$.

  • Illustrative Example #1: I

    The true model for y_i has the PMF:

    y_i        0      1      2
    d_0(y_i)   0.25   0.5    0.25      (12)

    We compare it with a wrong model with PMF d_1:

    y_i        0      1      2
    d_1(y_i)   0.1    0.8    0.1       (13)
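
    The effect of the wrong model on the randomized predictive p-value can be worked out exactly for this example. Given $Y = k$, $F^*$ is uniform on the interval from $F(k-)$ to $F(k)$ of the assumed model, so $P(F^* \le t)$ is a weighted sum of covered fractions. The sketch below is one way to compute it; the function name is an assumption for illustration.

```python
def randomized_pvalue_cdf(t, true_pmf, model_pmf):
    """P(F* <= t) when Y follows true_pmf but F* is built from model_pmf.

    Given Y = k, F* is uniform on (F(k-), F(k)] under the assumed model,
    so the conditional probability is the covered fraction of that interval.
    """
    total, lower = 0.0, 0.0
    for k in sorted(model_pmf):
        width = model_pmf[k]
        covered = min(max((t - lower) / width, 0.0), 1.0)
        total += true_pmf[k] * covered
        lower += width
    return total


d0 = {0: 0.25, 1: 0.5, 2: 0.25}  # true model, table (12)
d1 = {0: 0.1, 1: 0.8, 2: 0.1}    # wrong model, table (13)

for t in (0.1, 0.3, 0.5, 0.9):
    print(t,
          randomized_pvalue_cdf(t, d0, d0),   # true model: equals t
          randomized_pvalue_cdf(t, d0, d1))   # wrong model: deviates from t
```

    Under the true model the result equals t, i.e. $F^*$ is uniform. Under the wrong model, for instance, $P(F^* \le 0.1) = 0.25$, so too much mass falls in the tails.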


  • Illustrative Example #1: II

    Figure 2: The randomized predictive p-value for the true model and the wrong model.

    (a) $F^*$ for the true model (b) $\tilde F^*$ for the wrong model


  • Illustrative Example #2: I

    The true model: We simulate a response variable of size $n = 1000$ from a Poisson model with

    $$\log(\mu_i) = -1 + 2\sin(2x_i),$$

    where $\mu_i$ is the expected count for the $i$th subject and $x_i \sim \mathrm{Uniform}(0, 2\pi)$, $i = 1, \cdots, n$.

    A wrong model: a Poisson model with mean structure

    $$\log(\mu_i) = \beta_0 + \beta_1 x_i,$$

    with $x_i$ as a predictor with a linear effect.
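
    The setup of this example can be reproduced roughly as follows. This is a sketch under several assumptions that are not from the talk: the Poisson sampler, the hand-written Newton-Raphson fit, the random seed, and the comparison of residual means near the peaks and troughs of $\sin(2x)$ are all illustrative choices, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist, mean


def sample_poisson(mu, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_pmf(j, mu):
    return math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))


def fit_poisson_linear(x, y, iterations=50):
    """Newton-Raphson for the wrong model log(mu_i) = b0 + b1 * x_i."""
    b0, b1 = math.log(mean(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))               # score for b0
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x)) # score for b1
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def rqr(y, mu, rng):
    """Randomized quantile residual for a Poisson model with mean mu."""
    lower = sum(poisson_pmf(j, mu) for j in range(y))   # F(y-)
    u = 1.0 - rng.random()                              # uniform on (0, 1]
    p = lower + u * poisson_pmf(y, mu)
    p = min(max(p, 1e-12), 1.0 - 1e-12)                 # keep inv_cdf finite
    return NormalDist().inv_cdf(p)


rng = random.Random(2017)
n = 1000
x = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
y = [sample_poisson(math.exp(-1.0 + 2.0 * math.sin(2.0 * xi)), rng) for xi in x]

b0, b1 = fit_poisson_linear(x, y)
residuals = [rqr(yi, math.exp(b0 + b1 * xi), rng) for xi, yi in zip(x, y)]

# Compare residuals where the true mean is high with those where it is low.
peak = mean([r for xi, r in zip(x, residuals) if math.sin(2.0 * xi) > 0.5])
trough = mean([r for xi, r in zip(x, residuals) if math.sin(2.0 * xi) < -0.5])
print(f"b0={b0:.3f}, b1={b1:.3f}, peak mean={peak:.3f}, trough mean={trough:.3f}")
```

    Under the wrong linear model the residuals should sit above zero near the peaks of the true mean and below zero near the troughs, which is the kind of systematic trend a residual plot against x is meant to reveal.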


  • Illustrative Example #2: II

    CDF lines: The CDF of the response variable $Y_i$ given $x_i$ (under the model being considered, with parameters estimated from the sample) is denoted by $F(k \mid x_i) = P(Y_i \le k \mid x_i)$, for $k = 0, 1, \cdots$.

    [Figure: an illustrative picture of the CDF lines.]


  • Illustrative Example #2: III


  • Normality of Randomized Quantile Residual (RQR)

    Theorem

    Suppose a continuous random variable $Y$ has the CDF $F(y)$; then $F(Y)$ is uniformly distributed on $(0, 1]$.

    Theorem

    Suppose the true distribution of $Y_i$ given $X_i$ has the CDF $F(y_i; \mu_i, \phi)$ and PMF $d(y_i; \mu_i, \phi)$, where $\mu_i$ is a function of $X_i$ involving the model parameters. The randomized lower tail probability $F^*(y_i; \mu_i, \phi, u_i)$ is defined as $F(y_i-; \mu_i, \phi) + u_i\, d(y_i; \mu_i, \phi)$, as in (10). Suppose $U_i$ is uniformly distributed on $(0, 1]$. Then, we have

    $$F^*(Y_i; \mu_i, \phi, U_i) \sim \mathrm{Uniform}(0, 1], \tag{14}$$

    and

    $$q_i = \Phi^{-1}(F^*(Y_i; \mu_i, \phi, U_i)) \sim N(0, 1), \tag{15}$$

    where $\Phi^{-1}$ is the quantile function of the standard normal distribution.
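
    The statement of the theorem can be checked by simulation. The sketch below assumes a Poisson model whose true means are known and vary across observations (the three mean values, the seed and the sample size are arbitrary choices for this example), computes the randomized quantile residuals from (10) and (15), and prints their sample mean and variance.

```python
import math
import random
from statistics import NormalDist, mean, pvariance


def poisson_pmf(j, mu):
    return math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))


def poisson_cdf(j, mu):
    """P(Y <= j); returns 0 for j = -1 because the range is empty."""
    return sum(poisson_pmf(i, mu) for i in range(j + 1))


def sample_poisson(mu, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def randomized_quantile_residual(y, mu, u):
    """Equations (10) and (15) for a Poisson model with mean mu."""
    f_star = poisson_cdf(y - 1, mu) + u * poisson_pmf(y, mu)
    f_star = min(max(f_star, 1e-12), 1.0 - 1e-12)  # guard inv_cdf against 0 or 1
    return NormalDist().inv_cdf(f_star)


rng = random.Random(1996)
true_means = [0.3, 1.0, 4.0]   # hypothetical means, cycled over observations
residuals = []
for i in range(6000):
    mu = true_means[i % 3]
    y = sample_poisson(mu, rng)
    u = 1.0 - rng.random()     # uniform on (0, 1]
    residuals.append(randomized_quantile_residual(y, mu, u))

print("mean:", round(mean(residuals), 3), "variance:", round(pvariance(residuals), 3))
```

    Although the observations come from Poisson distributions with different means, the residuals share one reference distribution, so the printed mean and variance should be close to 0 and 1.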


  • Proof of Normality of RQR

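    Let $k^{(j)}$ denote the $j$th possible value of $Y_i$, let $p^{(j)} = P(Y_i = k^{(j)})$, and let $F^{(j)}$ be the interval of length $p^{(j)}$ in which $F^*$ falls when $Y_i = k^{(j)}$. For any interval $B \subseteq (0, 1]$,

    $$P(F^*(Y_i; \mu_i, \phi, U_i) \in B \mid Y_i = k^{(j)}) = \frac{\mathrm{length}(F^{(j)} \cap B)}{p^{(j)}},$$

    where $\mathrm{length}(\cdot)$ is the length of an interval. By the law of total probability,

    \begin{align}
    & P(F^*(Y_i; \mu_i, \phi, U_i) \in B) \tag{16} \\
    &= \sum_{j=1}^{\infty} P(F^*(Y_i; \mu_i, \phi, U_i) \in B \mid Y_i = k^{(j)}) \times P(Y_i = k^{(j)}) \tag{17} \\
    &= \sum_{j=1}^{\infty} \frac{\mathrm{length}(F^{(j)} \cap B)}{p^{(j)}} \times p^{(j)} \tag{18} \\
    &= \sum_{j=1}^{\infty} \mathrm{length}(F^{(j)} \cap B) \tag{19} \\
    &= \mathrm{length}\Big(\big(\cup_{j=1}^{\infty} F^{(j)}\big) \cap B\Big) = \mathrm{length}(B). \tag{20}
    \end{align}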


  • Section 5

    Simulation Studies

  • Subsection 1

    Detection of Non-linearity

  • Simulation Setup

    We simulate a covariate $x \sim \mathrm{Uniform}(-1.5, 1.5)$ of size $n = 1000$. The response variable is simulated from a negative binomial regression model

    $$\log(\mu_i) = \beta_0 + \beta_1 x_i^2,$$

    where $\mu_i$ is the expected count for the $i$th subject.

    Then, we consider fitting a wrong model assuming

    $$\log(\mu_i) = \beta_0 + \beta_1 x_i.$$

    We set $\beta_0 = 0$, $\beta_1 = 1$.

    The reciprocal of the dispersion parameter of the negative binomial distribution is set to $k = 2$.
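
    A dataset of this kind can be generated as sketched below. The sketch assumes the common parameterization in which the negative binomial has mean $\mu_i$ and variance $\mu_i + \mu_i^2/k$, obtained as a gamma-Poisson mixture. This parameterization, the samplers and the seed are assumptions made for illustration and are not taken from the talk.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates here."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def sample_negative_binomial(mu, k, rng):
    """Gamma-Poisson mixture with mean mu and variance mu + mu^2 / k."""
    lam = rng.gammavariate(k, mu / k)   # shape k, scale mu / k, so E[lam] = mu
    return sample_poisson(lam, rng)


rng = random.Random(36)
n, beta0, beta1, k = 1000, 0.0, 1.0, 2.0

x = [rng.uniform(-1.5, 1.5) for _ in range(n)]
y = [sample_negative_binomial(math.exp(beta0 + beta1 * xi**2), k, rng) for xi in x]

print("sample mean:", round(mean(y), 3), "sample variance:", round(pvariance(y), 3))
```

    The sample variance should be clearly larger than the sample mean, reflecting both the overdispersion of the negative binomial and the variation of $\mu_i$ with $x_i^2$.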


  • Scatterplots of Residuals for a Single Dataset

    [Figure: two rows of scatterplot panels of residuals against x; panel titles: Pearson, Deviance, Randomized Quantile.]
