
Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=uasa20

Journal of the American Statistical Association

ISSN: 0162-1459 (Print) 1537-274X (Online) Journal homepage: http://www.tandfonline.com/loi/uasa20

Generalized Fiducial Inference: A Review and New Results

Jan Hannig, Hari Iyer, Randy C. S. Lai & Thomas C. M. Lee

To cite this article: Jan Hannig, Hari Iyer, Randy C. S. Lai & Thomas C. M. Lee (2016) Generalized Fiducial Inference: A Review and New Results, Journal of the American Statistical Association, 111:515, 1346-1361, DOI: 10.1080/01621459.2016.1165102

To link to this article: https://doi.org/10.1080/01621459.2016.1165102


Accepted author version posted online: 06 Apr 2016. Published online: 18 Oct 2016.



JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, VOL. 111, NO. 515, 1346-1361, Review
http://dx.doi.org/10.1080/01621459.2016.1165102

REVIEW

Generalized Fiducial Inference: A Review and New Results

Jan Hannig (a), Hari Iyer (b), Randy C. S. Lai (c,d), and Thomas C. M. Lee (c)

(a) Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA; (b) Statistical Engineering Division, National Institute of Standards and Technology, Gaithersburg, MD, USA; (c) Department of Statistics, University of California, Davis, Davis, CA, USA; (d) Department of Mathematics & Statistics, University of Maine, Orono, ME, USA

ARTICLE HISTORY: Received February; Revised January

KEYWORDS: Approximate Bayesian computations; Data-generating equation; Fiducial inference; Jacobian calculation; Model selection; Uncertainty quantification

ABSTRACT
R. A. Fisher, the father of modern statistics, proposed the idea of fiducial inference during the first half of the 20th century. While his proposal led to interesting methods for quantifying uncertainty, other prominent statisticians of the time did not accept Fisher's approach as it became apparent that some of Fisher's bold claims about the properties of fiducial distribution did not hold up for multi-parameter problems. Beginning around the year 2000, the authors and collaborators started to reinvestigate the idea of fiducial inference and discovered that Fisher's approach, when properly generalized, would open doors to solve many important and difficult inference problems. They termed their generalization of Fisher's idea generalized fiducial inference (GFI). The main idea of GFI is to carefully transfer randomness from the data to the parameter space using an inverse of a data-generating equation, without the use of Bayes' theorem. The resulting generalized fiducial distribution (GFD) can then be used for inference. After more than a decade of investigations, the authors and collaborators have developed a unifying theory for GFI, and provided GFI solutions to many challenging practical problems in different fields of science and industry. Overall, they have demonstrated that GFI is a valid, useful, and promising approach for conducting statistical inference. The goal of this article is to deliver a timely and concise introduction to GFI, to present some of the latest results, as well as to list some related open research problems. It is the authors' hope that their contributions to GFI will stimulate the growth and usage of this exciting approach for statistical inference. Supplementary materials for this article are available online.

1. Introduction

The origin of fiducial inference can be traced back to R. A. Fisher (1922, 1925, 1930, 1933, 1935), who introduced the concept of a fiducial distribution for a parameter, and proposed the use of this fiducial distribution, in place of the Bayesian posterior distribution, for interval estimation of the parameter. In simple situations, especially in one-parameter families of distributions, Fisher's fiducial intervals turned out to coincide with classical confidence intervals. For multi-parameter families of distributions, the fiducial approach led to confidence sets whose frequentist coverage probabilities were close to the claimed confidence levels, but they were not exact in the repeated-sampling frequentist sense. Fisher's proposal led to major discussions among the prominent statisticians of the mid-20th century (e.g., Jeffreys 1940; Stevens 1950; Tukey 1957; Lindley 1958; Fraser 1961a,b, 1966, 1968; Dempster 1966, 1968). Many of these discussions focused on the nonexactness of the confidence sets and also the nonuniqueness of fiducial distributions. The latter part of the 20th century saw only a handful of publications (Dawid, Stone, and Zidek 1973; Wilkinson 1977; Dawid and Stone 1982; Barnard 1995; Salome 1998) as the fiducial approach fell into disfavor and became a topic of historical interest only.

Since the mid-2000s, there has been a revival of interest in modern modifications of fiducial inference. This increase of interest has demonstrated itself in both the number of different approaches to the problem and the number of researchers working on these problems, and is leading to an increasing number of publications in premier journals. The common thread for these approaches is a definition of inferentially meaningful probability statements about subsets of the parameter space without the need for subjective prior information.

These modern approaches include the Dempster-Shafer theory (Dempster 2008; Edlefsen, Liu, and Dempster 2009) and a recent (since 2010) related approach called inferential models (Martin, Zhang, and Liu 2010; Zhang and Liu 2011; Martin and Liu 2013, 2015a,c), which aims at provably conservative and efficient inference. While their philosophical approach to inference is different from ours, the resulting solutions are often mathematically closely related to the fiducial solutions presented here. Interested readers can learn about the inferential models approach to inference from the book by Martin and Liu (2015b). A somewhat different approach, termed confidence distributions, looks at the problem of obtaining an inferentially meaningful distribution on the parameter space from a purely frequentist point of view (Xie and Singh 2013). One of the main contributions of this approach is fusion learning: its ability to combine information from disparate sources, with deep implications for meta-analysis (Schweder and Hjort 2002; Singh, Xie, and Strawderman 2005; Xie, Singh, and Strawderman 2011; Hannig and Xie 2012; Xie et al. 2013). Another related approach is based on higher order likelihood expansions and implied data-dependent priors (Fraser 2004, 2011; Fraser, Reid, and Wong 2005; Fraser and Naderi 2008; Fraser, Fraser, and Staicu 2009; Fraser et al. 2010). Objective Bayesian inference, which aims at finding nonsubjective model-based priors, is also part of this effort. Examples of recent breakthroughs related to reference priors and model selection are Bayarri et al. (2012), Berger (1992), Berger and Sun (2008), Berger, Bernardo, and Sun (2009), and Berger, Bernardo, and Sun (2012). Objective Bayesian inference is a very well-developed field, but there is room for fiducial inference for many reasons. It often provides a good alternative both in terms of performance and ease of use, and the generalized fiducial distribution is never improper. Moreover, generalized fiducial inference and its various cousins are rapidly evolving and have the potential for uncovering deep and fundamental insights behind statistical inference. Finally, there are several other recent fiducial-related works, including Wang (2000), Xu and Li (2006), Veronese and Melilli (2015), and Taraldsen and Lindqvist (2013), who show how fiducial distributions naturally arise within a decision-theoretic framework.

Arguably, generalized fiducial inference (GFI) has been on the forefront of the modern fiducial revival. It is motivated by the work of Tsui and Weerahandi (1989, 1991) and Weerahandi (1993, 1994, 1995) on generalized confidence intervals, and the work of Chiang (2001) on the surrogate variable method for obtaining confidence intervals for variance components. The main spark came from the realization that there was a connection between these new procedures and fiducial inference. This realization evolved through a series of works (Iyer, Wang, and Mathew 2004; Patterson, Hannig, and Iyer 2004; Hannig, Iyer, and Patterson 2006b; Hannig 2009).

GFI defines a data-dependent measure on the parameter space by carefully using an inverse of a deterministic data-generating equation, without the use of Bayes' theorem. The resulting generalized fiducial distribution (GFD) is a data-dependent distribution on the parameter space. The GFD can be viewed as a distribution estimator (as opposed to a point or interval estimator) of the unknown parameter of interest. The resulting GFD, when used to define approximate confidence sets, is often shown in simulations to have very desirable properties, for example, conservative coverage but shorter expected lengths than competing procedures (E, Hannig, and Iyer 2008).

The strengths and limitations of the generalized fiducial approach are becoming better understood; see, especially, Hannig (2009, 2013). In particular, the asymptotic exactness of fiducial confidence sets, under fairly general conditions, was established in Hannig (2013), Hannig, Iyer, and Patterson (2006b), and Sonderegger and Hannig (2014). Higher order asymptotics of GFI were studied in Majumder and Hannig (2015). GFI has also been extended to prediction problems in Wang, Hannig, and Iyer (2012a). Model selection was introduced into the GFI paradigm in Hannig and Lee (2009). This idea was then further explored in the classical setting in Wandler and Hannig (2011) and in ultra-high-dimensional regression in Lai, Hannig, and Lee (2015).

GFI has proven useful in many practical applications. Earlier examples include bioequivalence (McNally, Iyer, and Mathew 2003; Hannig et al. 2006a), problems of metrology (Hannig, Wang, and Iyer 2003; Wang and Iyer 2005, 2006a,b; Hannig, Iyer, and Wang 2007; Wang, Hannig, and Iyer 2012b), and interlaboratory experiments and international key comparison experiments (Iyer, Wang, and Mathew 2004). It has also been applied to derive confidence procedures in many important statistical problems, such as variance components (E, Hannig, and Iyer 2008; Cisewski and Hannig 2012), maximum mean of a multivariate normal distribution (Wandler and Hannig 2011), multiple comparisons (Wandler and Hannig 2012a), extreme value estimation (Wandler and Hannig 2012b), mixtures of normal and Cauchy distributions (Glagovskiy 2006), wavelet regression (Hannig and Lee 2009), and logistic regression and binary response models (Liu and Hannig 2016).

One main goal of this article is to deliver a concise introduction to GFI. Our intention is to provide a single location where the various developments of the last decade can be found. As a second goal of this article, some original work and refined results on GFI are also presented. Specifically, these are Definition 1 and Theorems 1, 3, and 4.

The rest of this article is organized as follows. Starting from Fisher's fiducial argument, Section 2 provides a complete description of GFI, including some new results. The issue of model selection within the GFI framework is discussed in Section 3. Section 4 concerns the use of GFI for discrete and discretized data, and Section 5 offers some practical advice on how to handle common computational challenges when applying GFI. Lastly, Section 5.1 provides some concluding remarks, while technical details are relegated to the online appendix. The website http://anson.ucdavis.edu/~tcmlee/GFiducial.html contains computer code for many of the methods in this review.

2. The Switching Principle: Fisher's "Fiducial Argument" Extended

The idea underlying GFI is motivated by our understanding of Fisher's fiducial argument. GFI begins with expressing the relationship between the data, Y, and the parameters, θ, as

Y = G(U , θ), (1)

where G(·, ·) is a deterministic function termed the data-generating equation, and U is the random component of this data-generating equation, whose distribution is independent of the parameters and completely known.

The data Y are assumed to be created by generating a random variable U and plugging it into the data-generating Equation (1). For example, a single observation from the N(μ, 1) distribution can be written as Y = μ + U, where θ = μ and U is a N(0, 1) random variable.

For simplicity, this subsection only considers the simple case where the data-generating Equation (1) can be inverted, so that the inverse Q_y(u) = θ exists for any observed y and any arbitrary u. Fisher's fiducial argument leads one to define the fiducial distribution for θ as the distribution of Q_y(U*), where U* is an independent copy of U. Equivalently, a sample from the fiducial distribution of θ can be obtained by generating U*_i, i = 1, ..., N, and using θ*_i = Q_y(U*_i). Estimates and confidence intervals for θ can be obtained based on this sample. In the N(μ, 1) example, Q_y(u) = y − u and the fiducial distribution is therefore the distribution of y − U* ∼ N(y, 1).
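As a concrete illustration, here is a minimal numpy sketch of this sampling recipe for the N(μ, 1) case; the observed value y and the Monte Carlo size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

y = 1.3         # hypothetical observed value
N = 100_000     # fiducial Monte Carlo sample size

# Generate independent copies U*_i ~ N(0, 1) and apply the inverse
# Q_y(u) = y - u; the resulting fiducial sample is distributed N(y, 1).
u_star = rng.standard_normal(N)
mu_star = y - u_star

# Equal-tailed 95% fiducial interval, matching the classical y -/+ 1.96.
lo, hi = np.quantile(mu_star, [0.025, 0.975])
```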

Example 1. Consider the mean and sample variance Y = (Ȳ, S²) computed from n independent N(μ, σ²) random variables, where μ and σ² are parameters to be estimated. A natural data-generating equation for Y is

Ȳ = μ + σU₁ and S² = σ²U₂,

where U₁, U₂ are independent with U₁ ∼ N(0, n⁻¹) and U₂ ∼ Gamma((n − 1)/2, (n − 1)/2).

The inverse is Q_y(u) = (ȳ − s u₁/√u₂, s²/u₂). Consequently, for any observed values ȳ and s, and an independent copy of U, denoted U*, the distribution of μ* = ȳ − sU₁*/√U₂* is the marginal fiducial distribution of μ. The equal-tailed set of 95% fiducial probability is (ȳ − ts/√n, ȳ + ts/√n), where t is the 0.025 critical value of the t-distribution with n − 1 degrees of freedom; this is the classical 95% confidence interval for μ.

Remark 1. The generalized fiducial distribution is a data-dependent measure on the parameter space. It is mathematically very similar to a Bayesian posterior and can be used in practice in a similar fashion. For example, the median (or mean) of the GFD can be used as a point estimator. More importantly, certain sets of fiducial probability 1 − α can be used as approximate (1 − α)100% confidence sets; see Theorem 3. A nested collection of such approximate confidence sets at all confidence levels can also be inverted for use as an approximate p-value (Hannig 2009; Xie and Singh 2013).

A useful graphical tool for visualizing GFDs is the confidence curve of Birnbaum (1961). If R(θ | x) is the distribution (or survival) function of a marginal fiducial distribution, the confidence curve is defined as CV(θ) = 2|R(θ | x) − 0.5|. On a plot of CV(θ) versus θ, a horizontal line at height α (on the y-axis), for any 0 < α < 1, intersects the confidence curve at two points, and these two points correspond (on the x-axis) to an α-level, equal-tailed, two-sided confidence interval for θ. Thus, a confidence curve is a graphical device that shows confidence intervals of all levels. The minimum of a confidence curve is the median of the fiducial distribution, which is the recommended point estimator. Figure 1 shows an example of a confidence curve for a dataset generated from the U(θ, θ²) distribution of Example 4.
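A minimal sketch (numpy assumed; the fiducial sample here is the N(y, 1) toy case with hypothetical y = 0) of computing a confidence curve from an empirical marginal fiducial CDF:

```python
import numpy as np

def confidence_curve(theta_grid, fiducial_sample):
    """CV(theta) = 2 * |R(theta) - 0.5|, with R the marginal fiducial CDF
    estimated empirically from a Monte Carlo fiducial sample."""
    sample = np.sort(np.asarray(fiducial_sample))
    R = np.searchsorted(sample, theta_grid, side="right") / sample.size
    return 2.0 * np.abs(R - 0.5)

# Toy usage with a N(y, 1) fiducial sample (hypothetical y = 0):
rng = np.random.default_rng(2)
fid = rng.normal(0.0, 1.0, size=50_000)
grid = np.linspace(-4, 4, 401)
cv = confidence_curve(grid, fid)
# The curve bottoms out near the fiducial median (here, near 0).
```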

Remark 2. We have made a conscious choice to eschew philosophical controversies throughout the development of GFI. However, we find it inevitable to make at least some philosophical comments at this point:

1. The idea behind the GFD is very similar to the idea behind the likelihood function: what is the chance of observing my data if any given parameter were true? The added value of the GFD is that it equips the likelihood function with an appropriate Jacobian, yielding a proper probability distribution on the parameter space; see (4) below.

2. GFD does not presume that the parameter is random. Instead, it should be viewed as a distribution estimator (rather than a point or interval estimator) of the fixed true parameter. To validate this distribution estimator in a specific example, we then typically demonstrate good small-sample performance by simulation and prove good large-sample properties by asymptotic theorems.

3. From a Bayesian point of view, Bayes' theorem updates the distribution of U after the data are observed. However, when no prior information is present, changing the distribution of U only by restricting it to the set "there is at least one θ solving the equation y = G(U, θ)" seems to us a reasonable choice (see the next section). Arguably, this so-called "continuing to regard" assumption has been behind most of the philosophical controversies surrounding fiducial inference in the past.

2.1. A Refined Definition of Generalized Fiducial Distribution

The inverse to Equation (1) may fail to exist for two possible reasons: either there is more than one θ for some value of y and u, or there is no θ satisfying y = G(u, θ). The first situation can be dealt with by using the mechanics of the Dempster-Shafer calculus (Dempster 2008). A more practical solution is to select one of the several solutions using a possibly random mechanism. In Section 4, we will review theoretical results showing that the uncertainty due to multiple solutions has, in many parametric problems, only a second-order effect on statistical inference.

For the second situation, Hannig (2009) suggested removing the values of u for which there is no solution from the sample space and then renormalizing the probabilities, that is, using the distribution of U conditional on the event U_y = {u : y = G(u, θ) for some θ}. The rationale for this choice is that we know the observed data y were generated using some fixed unknown θ₀ and u₀, that is, y = G(u₀, θ₀). The values of u for which y = G(u, ·) does not have a solution could not be the true u₀, hence only the values of u for which a solution exists should be considered in the definition of the generalized fiducial distribution. (An exception to this suggestion arises when the parameter space is in some way constrained. In this case, it is often beneficial to extend the parameter space, perform the inversion in the extended space, and then project to the boundary of the constrained parameter space. A good example of such a situation is the variance component model, where variances are constrained to be greater than or equal to zero (E, Hannig, and Iyer 2009; Cisewski and Hannig 2012).) However, U_y, the set of u for which the solution exists, has probability zero for most problems involving absolutely continuous random variables. Conditioning on such a set of probability zero will therefore lead to nonuniqueness due to the Borel paradox (Casella and Berger 2002, sec. 4.9.3).

Hannig (2013) proposed an attractive interpretation of this conditional distribution as a limit of discretizations. Here, we generalize this approach slightly. Throughout this manuscript, U* denotes an independent copy of U and θ* denotes a random variable taking values in the parameter space Θ.

To define GFD, we need to interpret the ill-defined conditional distribution of U* | U* ∈ U_y. To do that, we "fatten up" the manifold U_y by ε so that the enlarged set U_{y,ε} = {u : ‖y − G(u, θ)‖ ≤ ε, for some θ} has positive probability and the conditional distribution of U* | U* ∈ U_{y,ε} is well defined. Finally, the fattening needs to be done in a consistent way so that the limit of conditional distributions as ε → 0 is well defined. This leads to the following definition:

Definition 1. A probability measure on the parameter space Θ is called a generalized fiducial distribution (GFD) if it can be obtained as the weak limit

lim_{ε→0} [ argmin_{θ*} ‖y − G(U*, θ*)‖ | min_{θ*} ‖y − G(U*, θ*)‖ ≤ ε ].   (2)

If there are multiple minimizers argmin_{θ*} ‖y − G(U*, θ*)‖, one selects one of them (potentially at random). Notice that the conditioning in (2) modifies the distribution of U* to consider only values for which an approximate inverse to G exists.

Remark 3. Definition 1 illuminates the relationship between GFD and Approximate Bayesian Computation (ABC; Beaumont, Zhang, and Balding 2002). In an idealized ABC, one first generates an observation θ* from the prior, then generates a new sample using a data-generating equation y* = G(U*, θ*), and compares the generated data with the observed data y. If the observed and generated datasets are close (e.g., ‖y − y*‖ ≤ ε), the generated θ* is accepted; otherwise it is rejected and the procedure is repeated. If the measure of closeness is a norm, it is easy to see that as ε → 0 the weak limit of the ABC distribution is the posterior distribution.

On the other hand, when defining GFD one generates U*, finds a best-fitting θ* = argmin_{θ} ‖y − G(U*, θ)‖, computes y* = G(U*, θ*), again accepts θ* if ‖y − y*‖ ≤ ε, and rejects otherwise.

In either approach, an artificial dataset y* = G(U*, θ*) is generated and compared to the observed data. The main difference is that the Bayes posterior simulates the parameter θ* from the prior, while GFD uses the best-fitting parameter.
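A toy rejection sampler in the spirit of this remark (numpy assumed; the data and tolerance ε are illustrative). For n i.i.d. N(μ, 1) observations, the best-fitting parameter has the closed form θ* = mean(y − U*), and the accepted values of θ* approximate the GFD N(ȳ, 1/n):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed data, modeled as n i.i.d. N(mu, 1) draws;
# the GFD for mu in this model is N(ybar, 1/n).
y = np.array([1.2, 0.4, 2.1, 1.7, 0.9])
n = y.size
eps = 1.0            # acceptance tolerance (illustrative)
accepted = []

for _ in range(50_000):
    u = rng.standard_normal(n)             # independent copy U*
    theta = np.mean(y - u)                 # argmin_theta ||y - (theta + u)||
    y_star = theta + u                     # regenerate data at the best fit
    if np.linalg.norm(y - y_star) <= eps:  # keep near-inverses only
        accepted.append(theta)

mu_fid = np.array(accepted)                # approximates N(ybar, 1/n)
```

Because the Gaussian residual is independent of the sample mean of U*, the acceptance step here does not distort the fiducial sample; only its size depends on ε.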

Remark 4. The GFD defined in (2) is not unique, as it depends on the data-generating Equation (1), the norm used in (2), and the minimizer θ* chosen. Let U* be an independent copy of U and, for any measurable set A, let V[A] be a rule selecting a possibly random element of the closure of the set A. When the probability P(∃ θ*, y = G(U*, θ*)) > 0, the limit (2) is the conditional distribution

V[{θ* : y = G(U*, θ*)}] | {∃ θ*, y = G(U*, θ*)}.

This is an older definition of the GFD that can be found in Hannig (2009, 2013).

The next subsection offers a useful computational formula for evaluating (2).

2.2. A User-Friendly Formula for Generalized Fiducial Distribution

While Definition (2) for GFD is conceptually appealing and very general, it is not immediately clear how to compute the limit in many practical situations. In a less general setup using the l∞ norm, Hannig (2013) derived a closed form of the limit in (2) applicable to many practical situations. Here, we provide a generalization of this result, which is applicable in most situations where the data follow a continuous distribution.

Assume that the parameter θ ∈ Θ ⊂ R^p is p-dimensional and the data y ∈ R^n are n-dimensional. The following theorem provides a useful computational formula.

Theorem 1. Suppose Assumptions A.1 to A.3 stated in Appendix A hold. Then the limiting distribution in (2) has a density

r(θ | y) = f(y, θ) J(y, θ) / ∫_Θ f(y, θ′) J(y, θ′) dθ′,   (3)

where f(y, θ) is the likelihood and the function

J(y, θ) = D( (d/dθ) G(u, θ) |_{u=G⁻¹(y,θ)} ).   (4)

If (i) n = p, then D(A) = |det A|. Otherwise the function D(A) depends on the norm used: (ii) the l∞ norm gives D(A) = Σ_{i=(i₁,…,i_p)} |det (A)_i|; (iii) under an additional Assumption A.4, the l2 norm gives D(A) = (det AᵀA)^{1/2}.

Figure 1. Confidence curve of a GFD for the parameter θ, based on a sample from U(θ, θ²) with θ = 100. The interval between the two points where the dotted line intersects the confidence curve, (98.49, 105.92), is the approximate equal-tailed confidence interval. The minimum of the confidence curve is the median of the generalized fiducial distribution and provides a natural point estimator.


In (ii), the sum spans over the (n choose p) p-tuples of indices i = (1 ≤ i₁ < ⋯ < i_p ≤ n). For any n × p matrix A, the submatrix (A)_i is the p × p matrix containing the rows i = (i₁, …, i_p) of A.

There is a slight abuse of notation in (3), as r(θ | y) is not a conditional density in the usual sense. Instead, we use this notation to remind the reader that the fiducial density depends on the fixed observed data.

Cases (i) and (ii) are a simple consequence of results in Hannig (2013). The formula in (iii) was independently proposed in Fraser et al. (2010), based on arguments related to tangent exponential families, without being recognized as a fiducial distribution. The proof is in Appendix A. The ease of use of (4) will be demonstrated on several examples in the next subsection. The rest of this subsection discusses the effects of various transformations.
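The function D of Theorem 1 is straightforward to implement; a sketch (numpy assumed) of the three cases:

```python
import itertools
import numpy as np

def D_det(A):
    """Case (i): square A (n = p), D(A) = |det A|."""
    return abs(np.linalg.det(A))

def D_linf(A):
    """Case (ii), l-infinity norm: sum of |det| over all p-row submatrices."""
    n, p = A.shape
    return sum(abs(np.linalg.det(A[list(rows), :]))
               for rows in itertools.combinations(range(n), p))

def D_l2(A):
    """Case (iii), l-2 norm: sqrt(det(A^T A))."""
    return np.sqrt(np.linalg.det(A.T @ A))
```

For a square matrix all three coincide with |det A|; by the Cauchy-Binet formula, D_l2(A) never exceeds D_linf(A).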

Remark 5. Just like a posterior computed using Jeffreys' prior, the GFD is invariant under smooth reparameterizations.

This assertion has been shown for smooth transformations by the chain rule in Hannig (2013). However, this property is general and follows directly from (2), since for an appropriate selection of minimizers and any one-to-one function θ = φ(η),

φ( argmin_{η*} ‖y − G(U*, φ(η*))‖ ) = argmin_{θ*} ‖y − G(U*, θ*)‖.

Remark 6. GFD could change with transformations of the data-generating equation.

Assume that the observed dataset has been transformed with a one-to-one smooth transformation Z = T(Y). Using the chain rule, we see that the GFD based on this new data-generating equation, with observed data z = T(y), is the density (3) with the Jacobian function (4) replaced by

J_T(z, θ) = D( (d/dy) T(y) · (d/dθ) G(u, θ) |_{u=G⁻¹(y,θ)} ).   (5)

Notice that for simplicity we write y instead of T⁻¹(z) in (5).

For completeness, we recall the well-known fact that the likelihood based on z = T(y) satisfies

f_T(z | θ) = f(y | θ) |det(dT/dy)|⁻¹.   (6)

The second term on the right-hand side of (6) is a constant and does not affect the GFD, with the exception of the model selection considerations in Section 3.

As can be seen from the above calculation, the GFD will usually change with transformations of the data. An important exception is when the number of observations and the number of parameters are equal, that is, n = p. Indeed, by careful evaluation of (4), (5), and (6), we see that for z = T(y) we have J(y, θ) f_Y(y | θ) = J_T(z, θ) f_T(z | θ), and the GFD is unchanged.

Example 2. Consider the following important transformation. Let Z = (S, A) be a one-to-one smooth transformation, where S is a p-dimensional statistic and A is an ancillary statistic. Let s = S(y) and a = A(y) be the observed values. Since dA/dθ = 0, the function D in (5) is the absolute value of the determinant of the p × p nonzero submatrix:

J(z, θ) = |det( (d/dθ) S(G(u, θ)) |_{u=G⁻¹(y,θ)} )|.   (7)

Next, denote the solution of the equation s = S(G(u, θ)) by Q_s(u) = θ. A straightforward calculation shows that the fiducial density (3) with (7) is the conditional distribution of Q_s(U*) | A(U*) = a, the GFD based on S conditional on the observed ancillary A = a; see Birnbaum (1962) and Iyer and Patterson (2002).

2.3. Two Examples

In this section, we will consider two examples: linear regression and the uniform distribution. In the first case, the GFD is the same as the Bayes posterior with respect to the independence Jeffreys' prior, while in the second the GFD is not a Bayes posterior with respect to any prior (that is not data dependent).

Example 3 (Linear Regression). Express linear regression usingthe data-generating equation

Y = G(U , θ) = Xβ + σU ,

where Y is the vector of dependent variables, X is the design matrix, θ = (β, σ) are the unknown parameters, and U is a random vector with known density f(u) independent of any parameters.

To compute the GFD, simply notice that (d/dθ)G(U, θ) = (X, U), with U = σ⁻¹(y − Xβ). From here, the Jacobian in (4) using the l∞ norm simplifies to

J∞(y, θ) = σ⁻¹ Σ_{i=(i₁,…,i_p), 1≤i₁<⋯<i_p≤n} |det (X, y)_i|,

and the density of the GFD is

r(β, σ | y) ∝ σ⁻ⁿ⁻¹ f(σ⁻¹(y − Xβ)).

This coincides with the Bayesian solution using the independence Jeffreys' prior (Yang and Berger 1997).

The J function has a more compact form when using the l2 norm. In particular, by the Cauchy-Binet formula, we see that det((X, y − Xβ)ᵀ(X, y − Xβ)) is invariant in β. By selecting β = (XᵀX)⁻¹Xᵀy, we immediately obtain

J₂(y, θ) = σ⁻¹ |det(XᵀX)|^{1/2} RSS^{1/2},

where RSS is the residual sum of squares. As the two Jacobian functions differ only by a constant, the GFD is unchanged.
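A quick numerical check (numpy assumed; X, y, and σ are illustrative) that the l2 Jacobian D(A) = (det AᵀA)^{1/2} reduces to the closed form σ⁻¹|det(XᵀX)|^{1/2} RSS^{1/2}, for an arbitrary β:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative design and data (assumptions, not from the paper).
n, p = 20, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
sigma, beta = 1.5, np.zeros(p)

# Direct l2 Jacobian: D(A) = sqrt(det(A^T A)) with A = d/d(theta) G = (X, U),
# where U = (y - X beta) / sigma; the value does not depend on beta.
A = np.column_stack([X, (y - X @ beta) / sigma])
D_l2 = np.sqrt(np.linalg.det(A.T @ A))

# Closed form from the example: J2 = sigma^{-1} |det(X^T X)|^{1/2} RSS^{1/2}.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ beta_hat) ** 2)
J2 = np.sqrt(np.linalg.det(X.T @ X) * rss) / sigma
```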

As a special case, for the location-scale model (X = 1), the l∞ Jacobian is J∞(y, θ) = σ⁻¹ Σ_{i<j} |y_i − y_j|, while the l2 Jacobian becomes J₂(y, θ) = σ⁻¹ n σ̂_n, where σ̂_n is the maximum likelihood estimator of σ.

Example 4 (Uniform U(a(θ) − b(θ), a(θ) + b(θ))). As a second example, we will study a very irregular model. The reference prior for this model is complicated and has been obtained as Theorem 8 in Berger, Bernardo, and Sun (2009).

Express the observed data using the following data-generating equation

Y_i = a(θ) + b(θ)U_i,   U_i iid ∼ U(−1, 1).


Simple computations give ddθG(u, θ ) = a′(θ )+ b′(θ )U with

U = b−1(θ )(Y − a(θ )). If a′(θ ) > |b′(θ )|, (4) simplifies to

J1(y, θ ) =n∑

i=1

∣∣a′(θ )+ {log b(θ )}′{yi − a(θ )}∣∣= n[a′(θ )− a(θ ){log b(θ )}′ + yn{log b(θ )}′]. (8)

We used a′(θ ) > |b′(θ )| only to show that the terms inside theabsolute values below are all positive. However, we remark thatunder this assumption both a(θ )− b(θ ) and a(θ )+ b(θ ) arestrictly increasing, continuous functions of θ .

With the above, the GFD is then

r(θ | y) ∝ [a′(θ) − a(θ){log b(θ)}′ + ȳ_n{log b(θ)}′] b(θ)^{−n} × I{a(θ) − b(θ) < y_(1) and a(θ) + b(θ) > y_(n)}.   (9)

As an alternative fiducial solution, consider a transformation to the minimal sufficient and ancillary statistics inspired by Example 2. Let Z = {h_1(Y_(1)), h_2(Y_(n)), (Y − Y_(1))/(Y_(n) − Y_(1))}. We selected the transformations h_i so that their inverses satisfy h_1^{−1}(θ) = EY_(1) = a(θ) − b(θ)(n − 1)/(n + 1) and h_2^{−1}(θ) = EY_(n) = a(θ) + b(θ)(n − 1)/(n + 1). There are only two nonzero terms in (5) and consequently

J_2(y, θ) = (w_1 + w_2) [ a′(θ) − a(θ){log b(θ)}′ + {(w_1 y_(1) + w_2 y_(n))/(w_1 + w_2)} {log b(θ)}′ ],   (10)

where w_1 = h_1′(y_(1)) and w_2 = h_2′(y_(n)).

We performed a simulation study for the particular case of U(θ, θ²); that is, a(θ) = θ and b(θ) = θ² − θ. For this model, the likelihood is

f(y | θ) = {θ(θ − 1)}^{−n} I_{(y_(n)^{1/2}, y_(1))}(θ)

and the Jacobians are

J_1(y, θ) = n{ȳ(2θ − 1) − θ²} / {θ(θ − 1)}

and

J_2(y, θ) = {(w_1 y_(1) + w_2 y_(n))(2θ − 1) − (w_1 + w_2)θ²} / {θ(θ − 1)},

where

w_1 = (1 + n) / √{n² + 4(1 + n)y_(1)},  w_2 = (1 + n) / √{1 + 4n(1 + n)y_(n)}.

An example of a confidence curve for a GFD based on (10) is shown in Figure 1.

We compared the performance of the two fiducial distributions to the Bayesian posteriors with the reference prior π(θ) ∝ {(2θ − 1)/θ(θ − 1)} e^{ψ(2θ/(2θ−1))} (Berger, Bernardo, and Sun 2009), where ψ is the digamma function defined by ψ(z) = (d/dz) log Γ(z) for z > 0, and with the flat prior π(θ) = 1.

For all the combinations of n = 1, 2, 3, 4, 5, 10, 20, 100, 250 and θ = 1.5, 2, 5, 10, 25, 50, 100, 250 we analyzed 16,000 independent datasets. Based on these, we found the empirical coverage of the 2.5%, 5%, 50%, 95%, and 97.5% upper confidence bounds. The results are summarized in Figure 2. We observed that the simple GFD (denoted by F1 in the figures), the alternative GFD based on minimal sufficient statistics (F2), and the reference prior Bayes posterior (BR) maintain stated coverage for all parameter settings. However, the flat prior Bayes posterior (B1) does not have satisfactory coverage, with the worst departures from stated coverage observed for small n and large θ.
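The simulation just described can be reproduced in miniature. The sketch below is our own illustration (assuming numpy; the grid resolution, θ₀ = 2, sample size, and replication count are arbitrary choices): it evaluates the unnormalized GFD based on J₁ on a grid over the support (√y₍ₙ₎, y₍₁₎) and checks the frequentist coverage of the resulting 95% upper confidence bound:

```python
import numpy as np

rng = np.random.default_rng(1)

def gfd_upper_bound(y, level=0.95, m=4000):
    # Grid evaluation of the unnormalized GFD for the U(theta, theta^2) model:
    # r(theta|y) is proportional to J1(y,theta) * {theta(theta-1)}^{-n}
    # on the support interval (sqrt(y_(n)), y_(1)).
    n, ybar = len(y), y.mean()
    lo, hi = np.sqrt(y.max()), y.min()
    theta = np.linspace(lo, hi, m + 2)[1:-1]       # open interval
    J1 = n * (ybar * (2 * theta - 1) - theta ** 2) / (theta * (theta - 1))
    dens = J1 * np.exp(-n * np.log(theta ** 2 - theta))
    cdf = np.cumsum(dens)
    return theta[np.searchsorted(cdf / cdf[-1], level)]

theta0, n_obs, reps = 2.0, 10, 400
hits = sum(theta0 <= gfd_upper_bound(rng.uniform(theta0, theta0 ** 2, n_obs))
           for _ in range(reps))
print(hits / reps)   # empirical coverage, typically close to the nominal 0.95
```

The Riemann-sum normalization is crude but adequate here because the support interval is short and the density is smooth on it.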

Figure 2. Boxplots of empirical coverages of the 2.5%, 5%, 50%, 95%, and 97.5% upper confidence bounds for the simple GFD (F1), the GFD based on minimal sufficient statistics (F2), reference Bayes (BR), and flat prior Bayes (B1) for all parameter settings. The broken lines provide bounds on random fluctuations of the empirical coverages, showing that F1, F2, and BR maintain stated coverage while B1 does not.


Figure 3. Boxplots of log10(MAD) of the GFD based on minimal sufficient statistics (F2), reference Bayes (BR), and flat prior Bayes (B1), minus the log10(MAD) of the simple fiducial (F1), averaged over 16,000 simulated datasets. Each parameter combination provides one data point for the boxplots. Positive values mean that the simple GFD F1 concentrates closer to the truth; consequently, F2 is the best in this metric.

For each dataset, we have also measured the mean absolute deviation from the true parameter (MAD) of each of the GFDs and posteriors, that is, MAD = ∫ |θ − θ_0| r(θ|y) dθ. To aid in comparison, we compute the difference of the log10(MAD) of F2, BR, and B1 minus the log10(MAD) of F1 on each dataset. A positive (resp., negative) value of the difference signifies that the posterior is concentrated further from (resp., closer to) the truth than the simple fiducial. These relative MADs are then averaged across the 16,000 simulated datasets for all the parameter settings and reported in Figure 3. We observe that F2 is better than F1, which is better than BR, though the absolute value of the difference is relatively small. B1 is not competitive.

2.4. Theoretical Results

This section discusses asymptotic properties for GFI. We hope that the material included here will be useful for the study of GFD in future practical problems.

First, we present a Bernstein–von Mises theorem for GFD, which provides theoretical guarantees of asymptotic normality and asymptotic efficiency. In conjunction with Theorem 3, it also guarantees that appropriate sets of fiducial probability 1 − α are indeed approximate 1 − α confidence sets. An early version of this theorem can be found in Hannig (2009). Here, we state a more general result due to Sonderegger and Hannig (2014).

Assume that we are given a random sample of independent observations Y_1, …, Y_n with data-generating equation Y_i = G(θ, U_i), with U_i iid U(0, 1). This data-generating equation leads to the Jacobian function (4), which is a U-statistic. This realization makes the GFD amenable to theoretical study.

Asymptotic normality of statistical estimators usually relies on a set of technical assumptions, and GFD is no exception. To succinctly state the theorem, we denote the rescaled density of the GFD by r*(s|y) = n^{−1/2} r(n^{−1/2}s + θ̂ | y), where θ̂ is the consistent maximum likelihood estimator (MLE).

Theorem 2 (Sonderegger and Hannig 2014). Under Assumptions B.1 to B.4 in Appendix B,

∫_{R^p} | r*(s | y) − (det I(θ_0))^{1/2} (2π)^{−p/2} e^{−s^⊤ I(θ_0) s / 2} | ds →^{P_{θ_0}} 0.

One application of GFD is to take sets of 1 − α fiducial probability and use them as approximate confidence intervals. Next we state conditions under which this is valid.

Assumption 1. Let us consider a sequence of datasets Y_n generated using fixed parameters θ_{n,0} ∈ Θ_n, with corresponding data-dependent measures R_{n,Y_n} on the parameter space (such data-dependent measures can be, for example, GFDs, Bayes posteriors, or confidence distributions). We will assume that these converge to a limiting fiducial model in the following way:

1. There is a sequence of measurable functions t_n of Y_n so that t_n(Y_n) converges in distribution to some random variable T.

2. (a) The T from Part 1 can be decomposed into T = (T_1, T_2), and there is a limiting data-generating equation T_1 = H_1(V_1, ξ), T_2 = H_2(V_2), where V = (V_1, V_2) has a fully known distribution independent of the parameter ξ ∈ Ξ. The distribution of T is obtained from the limiting data-generating equation using ξ_0.

(b) The equation H_1 is one-to-one if viewed as a function (possibly implicit) for any combination of ξ, v_1, t_1, where one is held fixed, one is taken as a dependent, and one as an independent variable.


The equation H_2 is one-to-one. Consequently, the limiting GFD defined by (2) is the conditional distribution of Q_{t_1}(V*_1) given H_2(V*_2) = t_2, where Q_{t_1}(v_1) = ξ is the solution of t_1 = H_1(v_1, ξ). Denote this conditional measure by R_t.

(c) For any open set C ⊂ Ξ and limiting data t, the limiting fiducial probability of the boundary satisfies R_t(∂C) = 0.

3. There are homeomorphic injective mappings Π_n from Θ_n into Ξ so that
(a) Π_n(θ_{n,0}) = ξ_0;
(b) for any sequence of data t_n(y_n) → t, the transformed fiducial distribution measures converge weakly, R_{n,y_n} ∘ Π_n^{−1} →_W R_t.

Theorem 3. Suppose Assumption 1 holds. Fix a desired coverage 0 < α < 1. For any observed data y_n, select an open set C_n(y_n) satisfying: (i) R_{n,y_n}(C_n(y_n)) = α; (ii) t_n(y_n) → t implies Π_n(C_n(y_n)) → C(t); (iii) the set V_{t_2} = {(v_1, v_2) : Q_{t_1}(v_1) ∈ C(t) and t_2 = H_2(v_2)} is invariant in t_1.

Then the sets C_n(y_n) are α asymptotic confidence sets.

The theorem provides a condition on how various sets of a fixed fiducial probability need to be linked together across different observed datasets to make up a valid confidence set. To understand the key Condition (iii), notice that it assumes the sets C(t) are obtained by de-pivoting a common set V_{t_2}. In particular, if the limiting data-generating equation T_1 = H_1(V_1, ξ) has a group structure, Condition (iii) is equivalent to assuming the sets C(t) are group invariant in t_1. The conditions on the limiting data-generating equation were partially inspired by results for Inferential Models of Martin and Liu (2015a). The proof of Theorem 3 is in Appendix C. Also, the following corollary follows immediately:

Corollary 1. Any model that satisfies the assumptions of Theorem 2 satisfies Assumption 1. In particular, for any fixed interior point θ_0 of Θ, the limiting data-generating equation is T = ξ + V, where the random vector V ∼ N(0, I(θ_0)^{−1}). The transformations are t_n(y_n) = n^{1/2}(θ̂_n − θ_0), Π_n(θ) = n^{1/2}(θ − θ_0), and ξ_0 = 0. Any collection of sets C_n(y_n) that in the limit becomes location invariant will form asymptotic confidence intervals.

Most of the theoretical results for GFI in the literature were derived in regular statistical problems and are covered by Corollary 1. Notice that in the regular case the limiting data-generating equation has no ancillary part. The next example shows that the ancillary part in Theorem 3 is needed in some nonregular cases.

Example 5 (Example 4 continued). Recall that Y_i = a(θ) + b(θ)U_i, i = 1, …, n, where the U_i are iid U(−1, 1). We assume that a′(θ) > |b′(θ)| for θ ∈ Θ, so that the GFD R_{n,y_n} has a density given by (9). Fix an interior point θ_0 of Θ. To verify the conditions of Theorem 3, we need to define the limiting data-generating equation and the transformations t_n and Π_n. We start with the limiting data-generating process:

T_1 = ξ + V_1,  T_2 = V_2,

where V_1 = (E_1 − E_2)/2 and V_2 = (E_1 + E_2)/2, with E_1, E_2 independent, E_1 ∼ Exp[{a′(θ_0) − b′(θ_0)}/{2b(θ_0)}] and E_2 ∼ Exp[{a′(θ_0) + b′(θ_0)}/{2b(θ_0)}]. The density of the limiting GFD is therefore proportional to

r(ξ | t) ∝ e^{−ξ{log b(θ_0)}′} I_{(T_1−T_2, T_1+T_2)}(ξ).

The fact that Assumption 1, Part 2, is satisfied follows immediately.

Next, define the transformations

t_n(y) = n \begin{pmatrix} 1/2 & −1/2 \\ 1/2 & 1/2 \end{pmatrix} \begin{pmatrix} \frac{y_{(1)} − \{a(θ_0) − b(θ_0)\}}{a′(θ_0) − b′(θ_0)} \\ \frac{a(θ_0) + b(θ_0) − y_{(n)}}{a′(θ_0) + b′(θ_0)} \end{pmatrix},

Π_n(θ) = n(θ − θ_0).

Simple calculations show that Assumption 1, Parts 1 and 3, are satisfied with ξ_0 = 0.

Finally, notice that any collection of sets of fiducial probability α that in the limit becomes location invariant in t_1 (such as one-sided or equal-tailed intervals) forms asymptotic α confidence intervals.

2.5. Practical Use of GFI

From a practical point of view, GFI is used in a way similar to the use of a posterior computed using a default (objective) prior, such as a probability matching, reference, or flat prior. The main technical difference is that the objective prior is replaced by a data-dependent Jacobian (4). This data dependence can in some examples lead to the existence of second-order matching GFDs even when only first-order matching is available with non-data-dependent priors (Majumder and Hannig 2015). Some have argued (Welch and Peers 1963; Martin and Walker 2014) that data-dependent priors are essential in achieving superior frequentist properties in complex statistical problems.

First, we suggest using a set of fiducial probability 1 − α and of a good shape (such as one-sided or equal-tailed) as an approximate 1 − α confidence interval; see Theorem 3. Next, the mean or median of the GFD can be used for point estimation.

GFDs can also be used for predicting future observations. This is done by plugging a random variable having the GFD (2) for the parameter into the data-generating equation for the new observations. This approach produces a predictive distribution that accommodates, in a natural way, both the uncertainty in the parameter estimation and the randomness of the future data. More details are in Wang, Hannig, and Iyer (2012a).
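A minimal sketch of this plug-in predictive recipe, for the toy location model Y_i = μ + U_i with U_i ∼ N(0, 1) (our example, not from Wang, Hannig, and Iyer 2012a; the known unit variance and the data are assumptions made for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, size=25)     # toy data; sigma = 1 assumed known
n = len(y)

# For Y_i = mu + U_i with U_i ~ N(0,1), the GFD for mu is N(ybar, 1/n).
# Plug fiducial draws of mu back into the data-generating equation.
mu_star = rng.normal(y.mean(), 1 / np.sqrt(n), size=100_000)
y_new = mu_star + rng.normal(size=mu_star.size)

# The predictive spread combines parameter uncertainty (1/n) and noise (1).
print(np.isclose(y_new.var(), 1 + 1 / n, atol=0.05))
```

The check illustrates the point in the text: the predictive distribution is wider than the noise distribution alone, by exactly the fiducial uncertainty about μ.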

GFDs are rarely available in closed form. Therefore, we often need to use a Markov chain Monte Carlo (MCMC) method, such as a Metropolis–Hastings or Gibbs sampler, to obtain a sample from the GFD. While the basic issues facing implementation of MCMC procedures are similar for both Bayesian and generalized fiducial problems, there are specific challenges related to generalized fiducial procedures. We discuss some computational issues in Section 5.
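For illustration, a random-walk Metropolis sampler targeting the GFD of the normal location-scale model from Section 2.3 — r(μ, σ|y) ∝ σ^{−n−1} exp{−∑(y_i − μ)²/(2σ²)} once the data-dependent constant in the Jacobian is dropped — can be sketched as follows (our code; the step size, chain length, and burn-in are arbitrary tuning choices):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(10.0, 2.0, size=30)
n = len(y)

def log_target(mu, s):
    # log GFD of (mu, log sigma): r(mu, sigma|y) ~ sigma^{-n-1} exp{-RSS/(2 sigma^2)};
    # the change of variables to s = log(sigma) contributes a factor e^s
    return -n * s - np.sum((y - mu) ** 2) / (2.0 * np.exp(2.0 * s))

state = np.array([y.mean(), np.log(y.std())])
lp, draws = log_target(*state), []
for _ in range(20_000):
    prop = state + 0.3 * rng.normal(size=2)     # random-walk proposal
    lp_prop = log_target(*prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
        state, lp = prop, lp_prop
    draws.append(state)
mu_mean = np.array(draws)[5_000:, 0].mean()     # discard burn-in
print(abs(mu_mean - y.mean()) < 0.25)
```

By symmetry the GFD of μ is centered at ȳ, so the posterior-mean check above is a cheap sanity test on the chain.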


3. Model Selection in GFI

Hannig and Lee (2009) introduced model selection into the GFI paradigm in the context of wavelet regression. The presentation here is reexpressed using definition (2). There are two main ingredients needed for effective fiducial model selection: the first is to include the model as one of the parameters, and the second is to include penalization in the data-generating equation.

Consider a finite collection of models M. The data-generating equation is

Y = G(M, θ_M, U),  M ∈ M,  θ_M ∈ Θ_M,   (11)

where Y is the observations, M is the model considered, θ_M are the parameters associated with model M, and U is a random vector of fully known distribution independent of any parameters.

Denote the number of parameters in the model M by |M|. Similar to the MLE, an important issue needing to be solved is that GFI tends to favor models with more parameters over ones with fewer parameters. Therefore, an outside penalty accounting for our preference toward parsimony needs to be incorporated into the model. See Appendix D for more details.

In Hannig and Lee (2009), a novel way of adding a penalty into the GFI framework is proposed. In particular, for each model M they proposed augmenting the data-generating Equation (11) by

0 = P_k,  k = 1, …, min(|M|, n),   (12)

where the P_k are iid continuous random variables with f_P(0) = q, independent of U, and q is a constant determined by the penalty. (Based on ideas from the minimum description length principle, Hannig and Lee (2009) recommended using q = n^{−1/2} as the default penalty.) Notice that the number of additional equations is the same as the number of unknown parameters in the model.

For the augmented data-generating equation, we have the following theorem. This theorem has never been published before, but it does implicitly appear in Hannig and Lee (2009) and Lai, Hannig, and Lee (2015). For completeness, we provide a proof in Appendix D.

Theorem 4. Let us suppose the identifiability Assumption D.1 in Appendix D holds and that each of the models satisfies the assumptions of Theorem 1 (in particular |M| ≤ n). Then the marginal generalized fiducial probability of model M is

r(M | y) = [ q^{|M|} ∫_{Θ_M} f_M(y, θ_M) J_M(y, θ_M) dθ_M ] / [ ∑_{M′∈M} q^{|M′|} ∫_{Θ_{M′}} f_{M′}(y, θ_{M′}) J_{M′}(y, θ_{M′}) dθ_{M′} ],   (13)

where f_M(y, θ_M) is the likelihood and J_M(y, θ_M) is the Jacobian function computed using (4) for each fixed model M.

Remark 7. The quantity r(M|y) can be used for inference in the usual way. For example, the fiducial factor, that is, the ratio r(M_1|y)/r(M_2|y), can be used in the same way as a Bayes factor. As discussed in Berger and Pericchi (2001), one of the issues with the use of improper priors in Bayesian model selection is the presence of an arbitrary scaling constant. While this is not a problem when a single model is considered, because the arbitrary constant cancels, it becomes a problem for model selection. An advantage of GFD is that the Jacobian function (4) comes with a scaling constant attached to it. In fact, the fiducial factors are closely related to the intrinsic factors of Berger and Pericchi (1996, 2001). This can be seen from the fact that for the minimal training sample (n = |M|), we usually have ∫_{Θ_M} f_M(y, θ_M) J_M(y, θ_M) dθ_M = 1.

Similarly, the quantity r(M|y) can also be used for fiducial model averaging, much akin to Bayesian model averaging (Hoeting et al. 1999).

We illustrate the use of this model selection on two examples: wavelet regression (Hannig and Lee 2009) and ultra high-dimensional regression (Lai, Hannig, and Lee 2015).

3.1. Wavelet Regression

Suppose n observed equispaced data points {x_i}_{i=1}^n satisfy the following model:

X_i = g_i + ε_i,

where g = (g_1, …, g_n) is the true unknown regression function, the ε_i are independent normal random variables with mean 0 and variance σ², and n = 2^{J+1} is an integer power of 2.

Most wavelet regression methods consist of three steps. The first step is to apply a forward wavelet transform to the data x and obtain the empirical wavelet coefficients y = Hx. Here, H is the discrete wavelet transform matrix. The second step is to apply a shrinkage operation to y to obtain an estimate d̂ of the true wavelet coefficients d = Hg. Lastly, the regression estimate ĝ = (ĝ_1, …, ĝ_n) of g is computed via the inverse discrete wavelet transform: ĝ = H^⊤d̂. The second step of wavelet shrinkage is important because it is the step where statistical estimation is performed. Hannig and Lee (2009) used GFI to perform this second step. Apparently, this is the first published work where Fisher's fiducial idea is applied to a nonparametric problem.

Due to the orthonormality of the discrete wavelet transform matrix H, a model for the empirical wavelet coefficients is Y = d + σU, with U an n-dimensional vector of independent N(0, 1) random variables. The assumption of sparsity implies that many of the entries of the vector d are zero. This allows us to cast this as a model selection problem, where the model M is the list of nonzero entries. The data-generating Equation (11) becomes

Y_k = d_k + σU_k for k ∈ M,  and  Y_k = σU_k for k ∉ M.

Notice that θ_M = {σ², d_k, k ∈ M}. As discussed above, we augment the data-generating equations by (12) with q = n^{−1/2}.

It follows from Theorem 4 that the GFD has generalized density proportional to

r(σ², d, M) ∝ (σ^{−2})^{n/2+1} · {∑_{j∉M} |y_j| / (n − |M|)} × exp[ −(|M| log n)/2 − {∑_{k∈M}(d_k − y_k)² + ∑_{i∉M} y_i²}/(2σ²) ] × ∏_{i∉M} δ_0(d_i),   (14)


where δ_0(s) is the Dirac function, that is, ∫_A δ_0(s) ds = 1 if 0 ∈ A and 0 otherwise. The term 1/(n − |M|) is an additional normalization term introduced to account for the number of elements in the sum above it.

The normalizing constant in (14) cannot be computed in closed form, so a sample from r(σ², d, M) has to be simulated using MCMC techniques. Note that the GFD is defined in the wavelet domain. Hannig and Lee (2009) used the inverse wavelet transform to define a GFD on the function domain.

Additionally, Hannig and Lee (2009) also assumed that M satisfies a tree condition (Lee 2002). This condition states that if a coefficient is thresholded, all its descendants have to be thresholded too; the exact formulation is in Hannig and Lee (2009). This constraint greatly reduces the search space and allows for both efficient calculations and clean theoretical results. In the article, they reported a simulation study showing small-sample performance superior to the alternative methods considered, and proved an asymptotic theorem guaranteeing consistency of the fiducial model selection.

3.2. Ultra High-Dimensional Regression

Lai, Hannig, and Lee (2015) extended the ideas of fiducial model selection to the ultra high-dimensional regression setting. The most natural data-generating equation for this model is

Y = G(M, β_M, σ², Z) = X_M β_M + σZ,

where Y represents the observations, M is the model considered (the collection of parameters that are nonzero), X_M is the design matrix for model M, β_M ∈ R^{|M|} and σ > 0 are parameters, and Z is a vector of iid standard normal random variables. For computational expediency, they suggested using a sufficient-ancillary transformation that yields the same Jacobian function as the l_2 Jacobian discussed in Section 2.3. The Jacobian function used is

J_M(y, θ_M) = σ^{−1} |det(X_M^⊤ X_M)|^{1/2} RSS_M^{1/2}.

The standard minimum description length (MDL) penalty n^{−|M|/2} was not designed to handle ultra high-dimensional problems. Inspired by the Extended Bayesian Information Criterion (EBIC) penalty of Chen and Chen (2008), Lai, Hannig, and Lee (2015) proposed extending the penalty by modifying (12) to

0 = B_{|M|},  0 = P_k,  k = 1, …, |M|,

where |M| is the dimension of M, B_m is a Bernoulli(1 − r_m) random variable that penalizes for the number of models having the same size m, and the P_k are iid continuous random variables with f_P(0) = q, independent of B_m, that penalize for the size of the model. Following the recommendation of Hannig and Lee (2009), we select q = n^{−1/2}. Additionally, we select r_m = \binom{p}{m}^{−γ}, where p is the number of parameters in the full model; this second choice penalizes for the fact that there is a large number of models that all have the same size. The most natural choice is γ = 1, for which r_m is the probability of randomly selecting a model M from all models of size m. However, to match the EBIC penalty of Chen and Chen (2008), we allow for other choices of γ.

We assume that for any size m, the residual vectors {I − X_M(X_M^⊤ X_M)^{−1} X_M^⊤} y / RSS_M are distinct for all the models M ∈ M′ of size m, so that the identifiability Assumption D.1 is satisfied. Theorem 4 implies

r(M | y) ∝ R_γ(M) = Γ((n − |M|)/2) (π RSS_M)^{−(n−|M|−1)/2} n^{−(|M|+1)/2} \binom{p}{|M|}^{−γ}.   (15)
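As a hypothetical illustration (our sketch, assuming numpy; the design, coefficients, and candidate set are made up and much smaller than the ultra high-dimensional setting the method targets), the quantity R_γ(M) of (15) can be computed directly on a small simulated regression and used to rank candidate models:

```python
import numpy as np
from itertools import combinations
from math import comb, lgamma, log, pi

rng = np.random.default_rng(4)
n, p = 50, 6
beta = np.zeros(p)
beta[[0, 2]] = [2.0, -3.0]                    # true model: variables {0, 2}
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

def log_R(model, gamma=1.0):
    # log of (15): Gamma((n-|M|)/2) (pi RSS_M)^{-(n-|M|-1)/2} n^{-(|M|+1)/2} C(p,|M|)^{-gamma}
    k = len(model)
    XM = X[:, list(model)]
    coef, *_ = np.linalg.lstsq(XM, y, rcond=None)
    rss = float(np.sum((y - XM @ coef) ** 2))
    return (lgamma((n - k) / 2) - (n - k - 1) / 2 * log(pi * rss)
            - (k + 1) / 2 * log(n) - gamma * log(comb(p, k)))

candidates = [m for k in (1, 2, 3) for m in combinations(range(p), k)]
best = max(candidates, key=log_R)
print(best)   # contains the true variables 0 and 2
```

Working on the log scale avoids overflow in the gamma function and keeps the comparison stable; normalizing across candidates would give the fiducial probabilities of (13).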

Similar to the tree constraint of the previous subsection, Lai, Hannig, and Lee (2015) additionally reduced the number of models by constructing a class of candidate models, denoted M′. This M′ should satisfy the following two properties: the number of models in M′ is small, and it contains the true model and the models that have nonnegligible values of R_γ(M). To construct M′, they first apply the sure independence screening (SIS) procedure of Fan and Lv (2008), and then apply LASSO and/or SCAD to those p′ predictors that survived SIS, taking all the models that lie on the solution path as M′. Note that constructing M′ in this way will ensure the true model is captured in M′ with high probability (Fan and Lv 2008).

Lai, Hannig, and Lee (2015) showed good properties of the GFI solution both by simulation and by theoretical considerations. In particular, they proved a consistency theorem under the following conditions.

Let M be any model, M_0 be the true model, and H_M be the projection matrix of X_M, that is, H_M = X_M(X_M^⊤ X_M)^{−1} X_M^⊤. Define Δ_M = ‖μ − H_M μ‖², where μ = E(Y) = X_{M_0} β_{M_0}. Throughout this subsection, we assume the following identifiability condition holds:

lim_{n→∞} min{ Δ_M / (|M_0| log p) : M_0 ⊄ M, |M| ≤ k|M_0| } = ∞   (16)

for some fixed k > 1. Condition (16) is closely related to the sparse Riesz condition (Zhang and Huang 2008).

Let M be the collection of models such that M = {M : |M| ≤ k|M_0|} for some fixed k. The restriction |M| ≤ k|M_0| is imposed because in practice we only consider models with size comparable to that of the true model.

If p is large, a variable screening procedure to reduce the size is still needed. This variable screening procedure should result in a class of candidate models M′ which satisfies

P(M_0 ∈ M′) → 1  and  log(m′_j) = o(j log n),   (17)

where M′_j contains all models in M′ that are of size j, and m′_j is the number of models in M′_j. The first condition in (17) guarantees that the model class contains the true model, at least asymptotically. The second condition in (17) ensures that the size of the model class is not too large. The authors report small-sample performance preferable to competing methods, as determined by a simulation study, and prove asymptotic consistency of the fiducial model selection algorithm.

4. GFI for Discrete and Interval Data

Most of the material presented in Sections 2 and 3 was developed for exactly observed continuous distributions. This section discusses discrete and discretized observations.


When the observations are discrete, there is no problem with the Borel paradox and the limiting distribution in (2) can be easily computed; see Remark 4. In particular, if we define Q_y(u) = {θ : y = G(u, θ)}, the GFD is the conditional distribution

V[Q_y(U*)] | {Q_y(U*) ≠ ∅},   (18)

where V[A] selects a (possibly random) element of the closure of the set A, and U* is an independent copy of U. If A = (a, b) is a finite interval, then we recommend a rule that selects one of the endpoints a or b at random, independently of U* (Hannig 2009). This selection maximizes the variance of the GFD, has also been called "half correction" (Efron 1998; Schweder and Hjort 2002; Hannig and Xie 2012), and is closely related to the well-known continuity correction used in normal approximations.

4.1. Some Common Discrete Distributions

In this subsection, we compute the GFDs for parameters of several popular discrete distributions.

Example 6. Let X be a random variable with distribution function F(y|θ). Assume there is a set Y so that P_θ(Y ∈ Y) = 1 for all θ, and for each fixed y ∈ Y the distribution function is either a nonincreasing function of θ spanning the whole interval (0, 1), or a constant equal to 1. Similarly, the left limit F(y−|θ) is also either a nonincreasing function of θ spanning the whole interval (0, 1), or a constant equal to 0.

Define the near inverse F^−(a|θ) = inf{y : F(y|θ) ≥ a}. It is well known (Casella and Berger 2002) that if U ∼ U(0, 1), then Y = F^−(U|θ) has the correct distribution, and we use this association as a data-generating equation.

Next, it follows that both Q_y^+(u) = sup{θ : F(y|θ) = u} and Q_y^−(u) = inf{θ : F(y−|θ) = u} exist and satisfy F(y|Q_y^+(u)) = u and F(y−|Q_y^−(u)) = u. Consequently, if U* is an independent copy of U,

P(Q_y^+(U*) ≤ t) = 1 − F(y|t)  and  P(Q_y^−(U*) ≤ t) = 1 − F(y−|t).

Finally, notice that for all u ∈ (0, 1) the function F^−(u|θ) is nondecreasing in θ, and the closure of the inverse image is Q̄_y(u) = {Q_y^−(u), Q_y^+(u)}. Since the condition in (18) has probability 1, there is no conditioning, and the half-corrected GFD has distribution function

R(θ | y) = 1 − {F(y|θ) + F(y−|θ)}/2.

If either of the distribution functions is constant, we interpret it as a point mass at the appropriate boundary of the parameter space.

An analogous argument shows that if the distribution function and its left limit were increasing in θ, then the half-corrected GFD would have distribution function

R(θ | y) = {F(y|θ) + F(y−|θ)}/2.

Using this result, we provide a list of the half-corrected GFDs for three well-known discrete distributions. Here, we understand Beta(0, n + 1) and Beta(x + 1, 0) as the degenerate distributions (Dirac measures) at 0 and 1, respectively. Similarly, we understand Γ(0, 1) as the degenerate distribution (Dirac measure) at 0.

• X ∼ Binomial(n, p) with n known. The GFD is the 50–50 mixture of Beta(x + 1, n − x) and Beta(x, n − x + 1) distributions; see Hannig (2009).

• X ∼ Poisson(λ). The GFD is the 50–50 mixture of Gamma(x + 1, 1) and Gamma(x, 1) distributions; see Dempster (2008).

• X ∼ Negative Binomial(r, p) with r known. The GFD is the 50–50 mixture of Beta(r, x − r + 1) and Beta(r, x − r) distributions; see Hannig (2014).

Example 7. Next we consider Y ∼ Multinomial(n, p_1, …, p_k), where n is known and the p_i ≥ 0, with ∑_{i=1}^k p_i = 1, are unknown.

When the categories of the multinomial have a natural ordering, Hannig (2009) suggested writing Y = ∑_{i=1}^n X_i and q_l = ∑_{i=1}^l p_i, and modeling each X_i through the data-generating equation

X_i = ( I_{(0,q_1)}(U_i), I_{[q_1,q_2)}(U_i), …, I_{[q_{k−1},1)}(U_i) )^⊤,  i = 1, …, n,

where U_1, …, U_n are iid U(0, 1) random variables. Denote the first quadrant Q = {q : 0 ≤ q_1 ≤ ⋯ ≤ q_{k−1} ≤ 1}. Hannig (2009) showed that the GFD (18) for q is given by

V[ {q* ∈ Q : U*_{(∑_{j=1}^i y_j)} ≤ q*_i ≤ U*_{(1 + ∑_{j=1}^i y_j)}, i = 1, …, k − 1} ],

where y_i is the ith component of the observed y and U*_{(j)} is the jth order statistic of U*_1, …, U*_n, an independent copy of U. The GFD for p is then obtained by a simple transformation. Hannig (2009) showed good asymptotic and small-sample properties of this GFD.

A drawback of the solution above is its dependence on the ordering of the categories. Lawrence et al. (2009) provided a solution that does not rely on a potentially arbitrary ordering of the categories. Their approach starts from analyzing each coordinate of Y individually.

As can be seen in Example 6, the fiducial inversion of each coordinate, ignoring the others, gives the relationship U_i ≤ p_i ≤ 1, where the U_i ∼ Beta(y_i, 1) are independent. Additionally, the fact that ∑_{i=1}^k p_i = 1 imposes the condition ∑_{i=1}^k U_i ≤ 1. Consider the following random vector, with its distribution taken as the conditional distribution

(W*_0, W*_1, …, W*_k) ∼ (1 − U_1 − ⋯ − U_k, U_1, …, U_k) | {U_1 + ⋯ + U_k ≤ 1}.

A straightforward calculation shows that the vector W* follows a Dirichlet(1, y_1, …, y_k) distribution. Writing Q_y(w) = {p : w_i ≤ p_i, i = 1, …, k}, the GFD is V[Q_y(W*)].

Denote by e_i, i = 1, …, k, the coordinate unit vectors in R^k. Notice that the set Q_y(w) is a simplex with vertices {(w_1, …, w_k) + e_i w_0, i = 1, …, k}. The selection rule V analogous to the half correction selects each vertex with equal probability, and the GFD is an equal-probability (1/k) mixture of Dirichlet(y_1 + 1, y_2, …, y_k), …, Dirichlet(y_1, y_2, …, y_k + 1).
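Sampling from this equal-probability Dirichlet mixture is straightforward via the gamma representation of the Dirichlet distribution. The following sketch (ours, assuming numpy; the counts are made up) checks the mixture mean against its closed form (y + 1/k)/(n + 1):

```python
import numpy as np

rng = np.random.default_rng(6)
y = np.array([5, 9, 6])          # observed multinomial counts, n = 20, k = 3
k, size = len(y), 100_000

# draw from the equal-probability mixture of Dirichlet(y + e_i), i = 1,...,k
i = rng.integers(k, size=size)
alpha = y + np.eye(k)[i]         # add 1 to one coordinate chosen at random
g = rng.gamma(alpha)             # standard gammas, shape (size, k)
p = g / g.sum(axis=1, keepdims=True)   # normalized rows are Dirichlet draws

print(np.round(p.mean(axis=0), 3))     # close to (y + 1/k) / (n + 1)
```

Averaging the component means (y + e_i)/(n + 1) over the k equally likely components gives the closed-form mean used in the check.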

4.2. Median Lethal Dose (LD50)

Consider an experiment involving k dose levels x1, x2, . . . , xk.Each dose level xi is administered to ni subjects with yi positiveresponses, i = 1, 2, . . . , k. Assume that the relationship between


dose level x_i and the probability p_i of a positive response can be represented by the logistic-linear model

logit(p_i) = β_1 x_i + β_0 = β_1(x_i − μ),

where μ = −β_0/β_1 represents the median lethal dose (LD50) and logit(p_i) = log{p_i/(1 − p_i)}. The LD50 is frequently of interest in many applied fields; examples include a measure of toxicity of a compound in a species in quantal bioassay experiments and a measure of difficulty in item response models.

There are three classical methods for estimating LD50: the delta method, Fieller's method, and the likelihood ratio method. If the dose–response curve is steep relative to the spread of doses, then there may be no dose groups, or at most one dose group, with observed mortalities strictly between 0% and 100%. In such cases, the maximum likelihood estimator of β_1 is not calculable, and the delta method and Fieller's method fail to provide a confidence set. Furthermore, when the standard Wald test does not reject the null hypothesis β_1 = 0, Fieller's confidence sets are either the entire real line or unions of disjoint intervals. Likewise, if the null hypothesis cannot be rejected by the likelihood ratio test, the likelihood ratio confidence sets are either the entire real line or unions of disjoint intervals.

E, Hannig, and Iyer (2009) proposed a generalized fiducial solution that does not suffer from these issues. They based their inference on the following data-generating equation. Let Y_ij, i = 1, …, k, j = 1, …, n_i, denote the jth subject's response to dose level x_i. Since Y_ij follows a Bernoulli distribution with success probability p_i = antilogit(β_0 + β_1 x_i),

Y_ij = I_{(0, antilogit(β_0 + β_1 x_i))}(U_ij),  j = 1, …, n_i,  i = 1, …, k.

Here (β_0, β_1) are unknown parameters and the U_ij are independent standard uniform random variables.

The GFD is well defined using (18), and E, Hannig, and Iyer (2009) proposed using a Gibbs sampler to implement it. They performed a thorough simulation study showing that the generalized fiducial method compares favorably to the classical methods in terms of coverage and median length of the confidence interval for LD50. Moreover, the generalized fiducial method performed well even in situations where the classical methods fail. They also proved that the fiducial CIs give asymptotically correct coverage, and that the effect of discretization is negligible in the limit.

4.3. Discretized Observations

In practice, most datasets are rounded off in some manner, say, by a measuring instrument or by storage on a computer. Mathematically speaking, we do not know the exact realized value Y = y. Instead, we only observe an occurrence of an event {Y ∈ Ay}, for some multivariate interval Ay = [a, b) containing y and satisfying Pθ0(Y⋆ ∈ Ay) > 0, where Y⋆ = G(U⋆, θ0) is an independent copy of Y.

For example, if the exact value of the random vector Y was y = (π, e, 1.28) and, due to instrument precision, all the values were rounded to one decimal place, our observation would be the event Ay = [3.1, 3.2) × [2.7, 2.8) × [1.2, 1.3).
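In code, such a rounding interval is easy to compute. The sketch below is ours and assumes truncation to one decimal place, matching the intervals in the example above; the helper name is hypothetical.

```python
import numpy as np

def rounding_interval(y, decimals=1):
    # Return the multivariate interval A_y = [a, b) of all values that
    # truncate to the same `decimals` decimal places as y.
    width = 10.0 ** (-decimals)
    a = np.floor(np.asarray(y) / width) * width
    return a, a + width

y_exact = np.array([np.pi, np.e, 1.28])
a, b = rounding_interval(y_exact)  # A_y = [3.1, 3.2) x [2.7, 2.8) x [1.2, 1.3)
```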

Since Pθ0(Y⋆ ∈ Ay) > 0, the arguments in Remark 4 still apply and formula (18) remains valid with Qy(u) = {θ : G(u, θ) ∈ Āy}, where Āy is the closure of Ay.

Hannig (2013) proved a fiducial Bernstein–von Mises theorem for discretized data. He assumed that we observe discretized iid observations with distribution function F(y|θ). He set F−(a|θ) = inf{y : F(y|θ) ≥ a} and assumed the data-generating equation

Yi = F−(Ui | θ), i = 1, . . . , n,

where the Yi are random variables, θ ∈ Θ is a p-dimensional parameter, and the Ui are iid U(0, 1).
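The generalized inverse F−(u|θ) is straightforward to implement for a discrete distribution. The sketch below is our own toy example (a geometric-type cdf over a finite support); the helper names are hypothetical.

```python
import numpy as np

def F_minus(u, theta, support, cdf):
    # F^-(u | theta) = inf{y : F(y | theta) >= u}, evaluated over a discrete support.
    probs = cdf(support, theta)
    # searchsorted (side='left') returns the first index with probs[idx] >= u.
    return support[np.searchsorted(probs, u)]

# Toy example: F(y | theta) = 1 - (1 - theta)^(y + 1), y = 0, 1, 2, ...
support = np.arange(0, 1000)
cdf = lambda y, theta: 1.0 - (1.0 - theta) ** (y + 1)

rng = np.random.default_rng(1)
u = rng.uniform(size=5)
y = F_minus(u, 0.3, support, cdf)  # discretized observations Y_i
```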

We restate the main theorem of Hannig (2013) in the language of this review article:

Theorem 5 (Hannig 2013). Suppose Assumption E.1 in Appendix E holds. Then the GFD defined by (18) has the same asymptotically normal distribution and satisfies Assumption 1 regardless of the choice of V[·]. Consequently, any collection of sets Cn(yn) that in the limit becomes location invariant will form asymptotically correct confidence intervals.

4.4. Linear Mixed Models

Despite the long history of inference procedures for normal linear mixed models, a well-performing, unified inference method is lacking. Analysis of variance (ANOVA)-based methods offer what tend to be model-specific solutions. Bayesian methods allow for solutions to very complex models, but determining an appropriate prior distribution can be challenging.

Cisewski and Hannig (2012) proposed the use of GFI for discretized linear mixed models that avoids the issues mentioned above. They started with the following data-generating equation:

Y = Xβ + ∑_{i=1}^{r} σi ∑_{j=1}^{li} Vi,j Ui,j,

where X is a known n × p fixed-effects design matrix, β is the p × 1 vector of fixed effects, Vi,j is the n × 1 design vector for level j of random effect i, li is the number of levels of random effect i, σi² is the variance of random effect i, and the Ui,j are independent and identically distributed standard normal random variables.
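A sketch of this data-generating equation in code, under the common assumption that the Vi,j are indicator (one-hot) vectors of the levels of each random effect; the function and variable names are ours.

```python
import numpy as np

def generate_lmm(X, beta, sigmas, levels, rng):
    # Y = X beta + sum_i sigma_i sum_j V_{i,j} U_{i,j}, where V_{i,j} is the
    # indicator design vector of level j of random effect i.
    y = X @ beta
    for sigma, groups in zip(sigmas, levels):
        u = rng.standard_normal(np.max(groups) + 1)  # U_{i,j} iid N(0, 1)
        y = y + sigma * u[groups]  # observation k receives sigma_i * U_{i, groups[k]}
    return y

rng = np.random.default_rng(2)
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])
groups = np.array([0, 0, 1, 1, 2, 2])  # one random effect with l_1 = 3 levels
y = generate_lmm(X, beta, sigmas=[2.0], levels=[groups], rng=rng)
```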

To compute the GFD in (18), Cisewski and Hannig (2012) designed a computationally efficient modification of the sequential Monte Carlo (SMC) algorithm (Doucet, De Freitas, and Gordon 2001; Del Moral, Doucet, and Jasra 2006; Douc and Moulines 2008). The fiducial implementation includes a custom-designed resampling and modification step that greatly improves the efficiency of the SMC algorithm for this model.

Cisewski and Hannig (2012) performed a thorough simulation study showing that the proposed method yields confidence interval estimation for all parameters of balanced and unbalanced normal linear mixed models. The fiducial intervals were as good as or better than the best tailor-made ANOVA-based solutions for the simulation scenarios covered. In addition, for the models considered by Cisewski and Hannig (2012)


and for the prior selected based on recommendations in the literature, the Bayesian interval lengths were not generally competitive with the other methods used in the study.

The authors point out that even though more variation was incorporated into the data for the generalized fiducial method due to the use of discretized data, the generalized fiducial method tended to maintain stated coverage (or be conservative) while having average interval lengths comparable to or shorter than other methods, even though the competing methods assumed the data are observed exactly.

5. Computational Issues

This section presents some computational challenges that arise when applying GFI in practice, along with some possible solutions.

For any given model, we recall that the GFD is defined as the weak limit in (2), and under fairly general conditions the weak limit has a density r(θ|y) given in (3). This density can often be used directly to form estimates and asymptotic confidence intervals for the model parameters, in a similar manner as the density of the posterior distribution in the Bayesian paradigm. Standard sampling techniques such as MCMC, importance sampling, or sequential Monte Carlo have been successfully implemented; see, for example, Hannig et al. (2006a), Hannig (2009), Hannig and Lee (2009), Wandler and Hannig (2012b), and Cisewski and Hannig (2012).

The exact form of the generalized fiducial density can be hard to compute. For this reason, Hannig, Lai, and Lee (2014) presented a computationally tractable solution for conducting generalized fiducial inference without knowing the exact closed form of the generalized fiducial distribution.

5.1. Evaluating the Generalized Fiducial Density via Subsampling

In some situations, even the denominator of the density r(θ|y) becomes too complicated to evaluate directly, particularly so when the l∞ norm is used in (2). In such situations, the function D(·) in (4) is a sum over all possible tuples of length p, that is, D(A) = ∑_{i=(i1,...,ip)} |det(A)_i|. If we have n observations, there are in total (n choose p) possible tuples. If the sum cannot be simplified analytically, one is obliged to compute all (n choose p) terms. Such computations can become prohibitively expensive even for moderate n and p. Appropriate approximations are required to evaluate the density efficiently.

If the observations are iid and the l∞ norm is used, D((d/dθ)G(u, θ)|u=G−1(y,θ)) is a U-statistic. Given the strong dependency of the terms in D(·), it seems possible to use far fewer than (n choose p) terms for the approximation without loss of accuracy. Blom (1976) showed that an incomplete U-statistic based on a random selection of K subsamples behaves very similarly to the complete U-statistic when n and K are large. On the basis of this result, Hannig (2009) and its follow-up articles suggested replacing D(·) by

D(A; IK) = ∑_{i∈IK} |det(A)_i|,

where IK is a random selection of K distinct p-tuples. Numerical simulations confirm that this approximation is very promising for a wide range of applications. In practice, a common choice of K is on the order of hundreds. One should choose K keeping in mind that a small K may fail to yield a good enough approximation, while an unnecessarily large K wastes computation.
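The incomplete-U-statistic approximation can be sketched as follows. This is our own toy implementation, not the authors' code; for small n it can be checked against the complete sum.

```python
import numpy as np
from itertools import combinations

def D_complete(A):
    # D(A) = sum over all p-tuples i of |det(A_i)|, where A_i is the
    # p x p submatrix of A formed by the rows indexed by i.
    n, p = A.shape
    return sum(abs(np.linalg.det(A[list(i), :])) for i in combinations(range(n), p))

def D_incomplete(A, K, rng):
    # Incomplete version: sum over a random selection I_K of K distinct p-tuples.
    n, p = A.shape
    all_tuples = list(combinations(range(n), p))
    chosen = rng.choice(len(all_tuples), size=K, replace=False)
    return sum(abs(np.linalg.det(A[list(all_tuples[i]), :])) for i in chosen)

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 2))          # n = 30 observations, p = 2 parameters
approx = D_incomplete(A, K=200, rng=rng)  # uses 200 of the 435 possible 2-tuples
```

In a real sampler one would draw the tuples without enumerating all (n choose p) of them, and one would fix the selected set I_K once and reuse it across evaluations.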

In most algorithms, such as an MCMC sampler, the density is repeatedly evaluated for different values of θ. We recommend keeping the same choice of IK across different values of θ to improve the stability of the algorithm.

The above discussion also applies to the generalized fiducial density (13) when model selection is involved.

6. Concluding Remarks and Open Problems

After many years of investigation, the authors and collaborators have demonstrated that GFI is a useful and promising approach for conducting statistical inference. GFI has been validated by asymptotic theory and by simulation in numerous small-sample problems. In this article, we have summarized the latest theoretical and methodological developments and applications of GFI. To conclude, we list some important open research problems concerning GFI.

1. As mentioned earlier, the choice of data-generating equation G in (1) is not unique for many problems. Based on our practical experience gained from simulations, GFD-based intervals are usually conservative and often quite short compared to competing methods for small sample sizes. This property is not well understood, as traditional asymptotic tools (including higher-order asymptotics) do not explain it. Understanding this nonasymptotic phenomenon will likely help both with a deeper understanding of GFI and with the optimal choice of G. Although our numerical experience suggests that different choices of G lead only to small differences in practical performance, it would still be important to develop an objective method for choosing G.

2. As an interesting alternative, one could modify the GFD definition (2) by adding a penalty term p(·) on θ to encourage sparse solutions:

lim_{ε→0} [ arg min_{θ⋆} { ‖y − G(U⋆, θ⋆)‖ + p(θ⋆) } | ‖y − G(U⋆, θ⋆)‖ ≤ ε ].  (19)

For example, in the context of linear regression with an l1 penalty p(·), just as the lasso (Tibshirani 1996) and Dantzig selector (Candes and Tao 2007) do, (19) will lead to sparse solutions. We stress that while obtaining sparse point estimators through a minimization problem has become a standard technique, (19) produces sparse distributions on the parameter space, also as a result of optimization. This is different from sparse posterior distributions obtained as a result of sparsity priors. The hope


is that this approach will lead to computationally efficient ways of quantifying uncertainty in model selection procedures.

3. One possible way to gain a deeper philosophical understanding of GFI is to find a general set of conditions under which GFI is, in some sense, an optimal data-dependent distribution on the parameter space (assuming such a set exists). The work of Taraldsen and Lindqvist (2013), which provides an initial result on a connection between decision theory and fiducial inference, would be a good starting point.

4. It would be interesting to investigate the performance of GFI when the data-generating equation is misspecified. For example, what would happen to the empirical confidence interval coverage if N(0, 1) is used as the random component when the truth is in fact t with 3 degrees of freedom?

Lastly, we hope that our contributions to GFI will stimulate growth of, usage of, and interest in this exciting approach to statistical inference in various research and application communities.

Supplementary Materials

The online supplementary materials contain the appendices for the article, and code for many of the methods in this review.

Acknowledgment

The authors are thankful to Yifan Cui, who found numerous typos in an earlier version of this article. They are also most grateful to the reviewers and the associate editor for their constructive and insightful comments and suggestions.

Funding

Hannig was supported in part by the National Science Foundation under Grant Nos. 1016441, 1633074, and 1512893. Lee was supported in part by the National Science Foundation under Grant Nos. 1209226, 1209232, and 1512945.

References

Barnard, G. A. (1995), "Pivotal Models and the Fiducial Argument," International Statistical Review, 63, 309–323.

Bayarri, M. J., Berger, J. O., Forte, A., and García-Donato, G. (2012), "Criteria for Bayesian Model Choice With Application to Variable Selection," The Annals of Statistics, 40, 1550–1577.

Beaumont, M. A., Zhang, W., and Balding, D. J. (2002), "Approximate Bayesian Computation in Population Genetics," Genetics, 162, 2025–2035.

Berger, J. O. (1992), "On the Development of Reference Priors" (with discussion), Bayesian Statistics, 4, 35–60.

Berger, J. O., Bernardo, J. M., and Sun, D. (2009), "The Formal Definition of Reference Priors," The Annals of Statistics, 37, 905–938.

——— (2012), "Objective Priors for Discrete Parameter Spaces," Journal of the American Statistical Association, 107, 636–648.

Berger, J. O., and Pericchi, L. R. (1996), "The Intrinsic Bayes Factor for Model Selection and Prediction," Journal of the American Statistical Association, 91, 109–122.

——— (2001), "Objective Bayesian Methods for Model Selection: Introduction and Comparison," in Model Selection (IMS Lecture Notes Monograph Series, Vol. 38), ed. P. Lahiri, Beachwood, OH: Institute of Mathematical Statistics, pp. 135–207.

Berger, J. O., and Sun, D. (2008), "Objective Priors for the Bivariate Normal Model," The Annals of Statistics, 36, 963–982.

Birnbaum, A. (1961), "On the Foundations of Statistical Inference: Binary Experiments," The Annals of Mathematical Statistics, 32, 414–435.

——— (1962), "On the Foundations of Statistical Inference," Journal of the American Statistical Association, 57, 269–326.

Blom, G. (1976), "Some Properties of Incomplete U-Statistics," Biometrika, 63, 573–580.

Candes, E., and Tao, T. (2007), "The Dantzig Selector: Statistical Estimation When p is Much Larger Than n," The Annals of Statistics, 35, 2313–2351.

Casella, G., and Berger, R. L. (2002), Statistical Inference (2nd ed.), Pacific Grove, CA: Wadsworth and Brooks/Cole Advanced Books and Software.

Chen, J., and Chen, Z. (2008), "Extended Bayesian Information Criteria for Model Selection With Large Model Spaces," Biometrika, 95, 759–771.

Chiang, A. K. L. (2001), "A Simple General Method for Constructing Confidence Intervals for Functions of Variance Components," Technometrics, 43, 356–367.

Cisewski, J., and Hannig, J. (2012), "Generalized Fiducial Inference for Normal Linear Mixed Models," The Annals of Statistics, 40, 2102–2127.

Dawid, A. P., and Stone, M. (1982), "The Functional-Model Basis of Fiducial Inference," The Annals of Statistics, 10, 1054–1074.

Dawid, A. P., Stone, M., and Zidek, J. V. (1973), "Marginalization Paradoxes in Bayesian and Structural Inference," Journal of the Royal Statistical Society, Series B, 35, 189–233.

Del Moral, P., Doucet, A., and Jasra, A. (2006), "Sequential Monte Carlo Samplers," Journal of the Royal Statistical Society, Series B, 68, 411–436.

Dempster, A. P. (1966), "New Methods for Reasoning Towards Posterior Distributions Based on Sample Data," The Annals of Mathematical Statistics, 37, 355–374.

——— (1968), "A Generalization of Bayesian Inference" (with discussion), Journal of the Royal Statistical Society, Series B, 30, 205–247.

——— (2008), "The Dempster-Shafer Calculus for Statisticians," International Journal of Approximate Reasoning, 48, 365–377.

Douc, R., and Moulines, E. (2008), "Limit Theorems for Weighted Samples With Applications to Sequential Monte Carlo Methods," The Annals of Statistics, 36, 2344–2376.

Doucet, A., De Freitas, N., and Gordon, N. (2001), Sequential Monte Carlo Methods in Practice, New York: Springer.

E, L., Hannig, J., and Iyer, H. K. (2008), "Fiducial Intervals for Variance Components in an Unbalanced Two-Component Normal Mixed Linear Model," Journal of the American Statistical Association, 103, 854–865.

——— (2009), "Applications of Generalized Fiducial Inference," Ph.D. Dissertation, Colorado State University, Fort Collins, CO.

Edlefsen, P. T., Liu, C., and Dempster, A. P. (2009), "Estimating Limits From Poisson Counting Data Using Dempster–Shafer Analysis," The Annals of Applied Statistics, 3, 764–790.

Efron, B. (1998), "R.A. Fisher in the 21st Century," Statistical Science, 13, 95–122.

Fan, J., and Lv, J. (2008), "Sure Independence Screening for Ultrahigh Dimensional Feature Space," Journal of the Royal Statistical Society, Series B, 70, 849–911.

Fisher, R. A. (1922), "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Series A, 222, 309–368.

——— (1925), "Theory of Statistical Estimation," Proceedings of the Cambridge Philosophical Society, 22, 700–725.

——— (1930), "Inverse Probability," Proceedings of the Cambridge Philosophical Society, xxvi, 528–535.

——— (1933), "The Concepts of Inverse Probability and Fiducial Probability Referring to Unknown Parameters," Proceedings of the Royal Society of London, Series A, 139, 343–348.


——— (1935), "The Fiducial Argument in Statistical Inference," The Annals of Eugenics, VI, 91–98.

Fraser, A. M., Fraser, D. A. S., and Staicu, A.-M. (2009), "The Second Order Ancillary: A Differential View With Continuity," Bernoulli, 16, 1208–1223.

Fraser, D., and Naderi, A. (2008), "Exponential Models: Approximations for Probabilities," Biometrika, 94, 1–9.

Fraser, D., Reid, N., and Wong, A. (2005), "What a Model With Data Says About Theta," International Journal of Statistical Science, 3, 163–178.

Fraser, D. A. S. (1961a), "On Fiducial Inference," The Annals of Mathematical Statistics, 32, 661–676.

——— (1961b), "The Fiducial Method and Invariance," Biometrika, 48, 261–280.

——— (1966), "Structural Probability and a Generalization," Biometrika, 53, 1–9.

——— (1968), The Structure of Inference, New York-London-Sydney: Wiley.

——— (2004), "Ancillaries and Conditional Inference," Statistical Science, 19, 333–369.

——— (2011), "Is Bayes Posterior Just Quick and Dirty Confidence?" Statistical Science, 26, 299–316.

Fraser, D. A. S., Reid, N., Marras, E., and Yi, G. Y. (2010), "Default Priors for Bayesian and Frequentist Inference," Journal of the Royal Statistical Society, Series B, 72.

Glagovskiy, Y. S. (2006), "Construction of Fiducial Confidence Intervals For the Mixture of Cauchy and Normal Distributions," Master's Thesis, Department of Statistics, Colorado State University, Fort Collins, CO.

Hannig, J. (2009), "On Generalized Fiducial Inference," Statistica Sinica, 19, 491–544.

——— (2013), "Generalized Fiducial Inference via Discretization," Statistica Sinica, 23, 489–514.

——— (2014), Discussion of "On the Birnbaum Argument for the Strong Likelihood Principle" by D. G. Mayo, Statistical Science, 29, 254–258.

Hannig, J., E, L., Abdel-Karim, A., and Iyer, H. K. (2006a), "Simultaneous Fiducial Generalized Confidence Intervals for Ratios of Means of Lognormal Distributions," Austrian Journal of Statistics, 35, 261–269.

Hannig, J., Iyer, H. K., and Patterson, P. (2006b), "Fiducial Generalized Confidence Intervals," Journal of the American Statistical Association, 101, 254–269.

Hannig, J., Iyer, H. K., and Wang, J. C.-M. (2007), "Fiducial Approach to Uncertainty Assessment: Accounting for Error Due to Instrument Resolution," Metrologia, 44, 476–483.

Hannig, J., Lai, R. C. S., and Lee, T. C. M. (2014), "Computational Issues of Generalized Fiducial Inference," Computational Statistics and Data Analysis, 71, 849–858.

Hannig, J., and Lee, T. C. M. (2009), "Generalized Fiducial Inference for Wavelet Regression," Biometrika, 96, 847–860.

Hannig, J., Wang, C. M., and Iyer, H. K. (2003), "Uncertainty Calculation for the Ratio of Dependent Measurements," Metrologia, 4, 177–186.

Hannig, J., and Xie, M. (2012), "A Note on Dempster-Shafer Recombinations of Confidence Distributions," Electronic Journal of Statistics, 6, 1943–1966.

Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999), "Bayesian Model Averaging: A Tutorial" (with discussion), Statistical Science, 14, 382–417; corrected version available at http://www.stat.washington.edu/www/research/online/hoetingl999.pdf.

Iyer, H. K., and Patterson, P. (2002), "A Recipe for Constructing Generalized Pivotal Quantities and Generalized Confidence Intervals," Technical Report 10, Department of Statistics, Colorado State University, Fort Collins, CO.

Iyer, H. K., Wang, C. M. J., and Mathew, T. (2004), "Models and Confidence Intervals for True Values in Interlaboratory Trials," Journal of the American Statistical Association, 99, 1060–1071.

Jeffreys, H. (1940), "Note on the Behrens-Fisher Formula," The Annals of Eugenics, 10, 48–51.

Lai, R. C. S., Hannig, J., and Lee, T. C. M. (2015), "Generalized Fiducial Inference for Ultra-High Dimensional Regression," Journal of the American Statistical Association, 110, 760–772.

Lawrence, E., Liu, C., Vander Wiel, S., and Zhang, J. (2009), "A New Method for Multinomial Inference Using Dempster-Shafer Theory."

Lee, T. C. M. (2002), "Tree-Based Wavelet Regression for Correlated Data Using the Minimum Description Length Principle," Australian and New Zealand Journal of Statistics, 44, 23–39.

Lindley, D. V. (1958), "Fiducial Distributions and Bayes' Theorem," Journal of the Royal Statistical Society, Series B, 20, 102–107.

Liu, Y., and Hannig, J. (2016), "Generalized Fiducial Inference for Binary Logistic Item Response Models," Psychometrika, 81, 290–324.

Majumder, P. A., and Hannig, J. (2015), "Higher Order Asymptotics for Generalized Fiducial Inference," unpublished manuscript.

Martin, R., and Liu, C. (2013), "Inferential Models: A Framework for Prior-Free Posterior Probabilistic Inference," Journal of the American Statistical Association, 108, 301–313.

——— (2015a), "Conditional Inferential Models: Combining Information for Prior-Free Probabilistic Inference," Journal of the Royal Statistical Society, Series B, 77, 195–217.

——— (2015b), Inferential Models: Reasoning With Uncertainty (Vol. 145), Boca Raton, FL: CRC Press.

——— (2015c), "Marginal Inferential Models: Prior-Free Probabilistic Inference on Interest Parameters," Journal of the American Statistical Association, 110, 1621–1631.

Martin, R., and Walker, S. G. (2014), "Asymptotically Minimax Empirical Bayes Estimation of a Sparse Normal Mean Vector," Electronic Journal of Statistics, 8, 2188–2206.

Martin, R., Zhang, J., and Liu, C. (2010), "Dempster-Shafer Theory and Statistical Inference With Weak Beliefs," Statistical Science, 25, 72–87.

McNally, R. J., Iyer, H. K., and Mathew, T. (2003), "Tests for Individual and Population Bioequivalence Based on Generalized p-Values," Statistics in Medicine, 22, 31–53.

Patterson, P., Hannig, J., and Iyer, H. K. (2004), "Fiducial Generalized Confidence Intervals for Proportion of Conformance," Technical Report 2004/11, Colorado State University, Fort Collins, CO.

Salome, D. (1998), "Statistical Inference via Fiducial Methods," Ph.D. Dissertation, University of Groningen, Groningen, The Netherlands.

Schweder, T., and Hjort, N. L. (2002), "Confidence and Likelihood," Scandinavian Journal of Statistics, 29, 309–332.

Singh, K., Xie, M., and Strawderman, W. E. (2005), "Combining Information From Independent Sources Through Confidence Distributions," The Annals of Statistics, 33, 159–183.

Sonderegger, D., and Hannig, J. (2014), "Fiducial Theory for Free-Knot Splines," in Contemporary Developments in Statistical Theory, a Festschrift in Honor of Professor Hira L. Koul, eds. S. Lahiri, A. Schick, A. SenGupta, and T. N. Sriram, New York: Springer, pp. 155–189.

Stevens, W. L. (1950), "Fiducial Limits of the Parameter of a Discontinuous Distribution," Biometrika, 37, 117–129.

Taraldsen, G., and Lindqvist, B. H. (2013), "Fiducial Theory and Optimal Inference," The Annals of Statistics, 41, 323–341.

Tibshirani, R. (1996), "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B, 58, 267–288.

Tsui, K.-W., and Weerahandi, S. (1989), "Generalized p-Values in Significance Testing of Hypotheses in the Presence of Nuisance Parameters," Journal of the American Statistical Association, 84, 602–607.

——— (1991), Correction: "Generalized p-Values in Significance Testing of Hypotheses in the Presence of Nuisance Parameters," Journal of the American Statistical Association, 86, 256.

Tukey, J. W. (1957), "Some Examples With Fiducial Relevance," The Annals of Mathematical Statistics, 28, 687–695.

Veronese, P., and Melilli, E. (2015), "Fiducial and Confidence Distributions for Real Exponential Families," Scandinavian Journal of Statistics, 42, 471–484.


Wandler, D. V., and Hannig, J. (2011), "Fiducial Inference on the Maximum Mean of a Multivariate Normal Distribution," Journal of Multivariate Analysis, 102, 87–104.

——— (2012a), "A Fiducial Approach to Multiple Comparisons," Journal of Statistical Planning and Inference, 142, 878–895.

——— (2012b), "Generalized Fiducial Confidence Intervals for Extremes," Extremes, 15, 67–87.

Wang, J. C.-M., Hannig, J., and Iyer, H. K. (2012a), "Fiducial Prediction Intervals," Journal of Statistical Planning and Inference, 142, 1980–1990.

——— (2012b), "Pivotal Methods in the Propagation of Distributions," Metrologia, 49, 382–389.

Wang, J. C.-M., and Iyer, H. K. (2005), "Propagation of Uncertainties in Measurements Using Generalized Inference," Metrologia, 42, 145–153.

——— (2006a), "A Generalized Confidence Interval for a Measurand in the Presence of Type-A and Type-B Uncertainties," Measurement, 39, 856–863.

——— (2006b), "Uncertainty Analysis for Vector Measurands Using Fiducial Inference," Metrologia, 43, 486–494.

Wang, Y. H. (2000), "Fiducial Intervals: What Are They?" The American Statistician, 54, 105–111.

Weerahandi, S. (1993), "Generalized Confidence Intervals," Journal of the American Statistical Association, 88, 899–905.

——— (1994), Correction: "Generalized Confidence Intervals," Journal of the American Statistical Association, 89, 726.

——— (1995), Exact Statistical Methods for Data Analysis, Springer Series in Statistics, New York: Springer-Verlag.

Welch, B. L., and Peers, H. W. (1963), "On Formulae for Confidence Points Based on Integrals of Weighted Likelihoods," Journal of the Royal Statistical Society, Series B, 25, 318–329.

Wilkinson, G. N. (1977), "On Resolving the Controversy in Statistical Inference," Journal of the Royal Statistical Society, Series B, 39, 119–171.

Xie, M., Liu, R. Y., Damaraju, C. V., and Olson, W. H. (2013), "Incorporating External Information in Analyses of Clinical Trials With Binary Outcomes," The Annals of Applied Statistics, 7, 342–368.

Xie, M., and Singh, K. (2013), "Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review," International Statistical Review, 81, 3–39.

Xie, M., Singh, K., and Strawderman, W. E. (2011), "Confidence Distributions and a Unified Framework for Meta-Analysis," Journal of the American Statistical Association, 106, 320–333.

Xu, X., and Li, G. (2006), "Fiducial Inference in the Pivotal Family of Distributions," Science in China, Series A, 49, 410–432.

Yang, R., and Berger, J. O. (1997), "A Catalogue of Noninformative Priors," Technical Report ISDS 97-42, Duke University, Durham, NC.

Zhang, C., and Huang, J. (2008), "The Sparsity and Bias of the Lasso Selection in High-Dimensional Linear Regression," The Annals of Statistics, 36, 1567–1594.

Zhang, J., and Liu, C. (2011), "Dempster-Shafer Inference With Weak Beliefs," Statistica Sinica, 21, 475–494.

