
On Fiducial Inference – the good, the bad and the ugly

Jan Hannig∗

Department of Statistics
Colorado State University

February 23, 2006

Abstract

R. A. Fisher’s fiducial inference has been the subject of many discussions and controversies ever since he introduced the idea during the 1930’s. The idea experienced a bumpy ride, to say the least, during its early years and one can safely say that it eventually fell into disfavor among mainstream statisticians. However, it appears to have made a resurgence recently under the label of generalized inference. In this new guise fiducial inference has proved to be a useful tool for deriving statistical procedures for problems where frequentist methods with good properties were previously unavailable. Therefore we believe that the fiducial argument of R. A. Fisher deserves a fresh look from a new angle.

In this paper we first generalize Fisher’s fiducial argument and obtain a fiducial recipe applicable in virtually any situation. We demonstrate this fiducial recipe on many examples of varying complexity. We also investigate, by simulation and by theoretical considerations, some properties of the statistical procedures derived by the fiducial recipe. In particular, we compare the properties of fiducial inference to the properties of Bayesian inference and observe that the two share many common strengths and weaknesses.

∗Jan Hannig’s research is supported in part by the National Science Foundation under grant DMS-0504737.


In addition to the theoretical considerations mentioned above we also derive the fiducial distribution and verify its viability by simulations for several examples that are of independent interest. In particular we derive fiducial distributions for the parameters of a multinomial distribution, for the means, variances, and the mixing probability of a mixture of two normal distributions, and for the variance components in a simple one-way random linear model.

Key words: Fiducial inference, structural inference, generalized inference, asymptotics, multinomial distribution, mixture of normal distributions, MCMC.

1 Introduction

R. A. Fisher introduced the idea of fiducial probability and fiducial inference (Fisher 1930) in an attempt to overcome what he saw as a serious deficiency of the Bayesian approach to inference – use of a prior distribution on model parameters even when no information was available regarding their values. Although he discussed fiducial inference in several subsequent papers, there appears to be no rigorous definition of a fiducial distribution for a vector parameter θ based on sample observations. In the case of a one-parameter family of distributions, Fisher gave the following definition for a fiducial density f(θ|x) of the parameter based on a single observation x for the case where the cdf F(x|θ) is a monotonic decreasing function of θ:

$$f(\theta \mid x) \propto -\frac{\partial F(x \mid \theta)}{\partial \theta}. \qquad (1)$$

Fisher illustrated the application of fiducial probabilities by means of a numerical example consisting of four pairs of observations from a bivariate normal distribution with unknown mean vector and covariance matrix. For this example he derived fiducial limits (one-sided interval estimates) for the population correlation coefficient ρ. Fisher proceeded to refine the concept of fiducial inference in several subsequent papers (Fisher 1935a). In his 1935 paper titled “The Fiducial Argument in Statistical Inference” Fisher explained the notion of fiducial inference for µ based on a random sample from a N(µ, σ²) distribution where σ is unknown. The process of obtaining a fiducial distribution for µ was based on the availability of the Student’s t-statistic that served as a pivotal quantity for µ. In this same 1935 paper, Fisher discussed the notion of a fiducial distribution for a single future observation x from the same N(µ, σ²) distribution based on a random sample x1, . . . , xn. For this he used the fact that

$$T = \frac{x - \bar{x}}{s/\sqrt{n}}$$

is a pivotal quantity. He then proceeded to consider the fiducial distribution for x̄′ and s′, the mean and the standard deviation, respectively, of m future observations xn+1, . . . , xn+m. By letting m tend to infinity, he obtained a simultaneous fiducial distribution for µ and σ. He also stated “In general, it appears that if statistics T1, T2, T3, . . . contain jointly the whole of the information available respecting parameters θ1, θ2, θ3, . . ., and if functions t1, t2, t3, . . . of the T’s and θ’s can be found, the simultaneous distribution of which is independent of θ1, θ2, θ3, . . ., then the fiducial distribution of θ1, θ2, θ3, . . . simultaneously may be found by substitution.” In essence Fisher had proposed a recipe for constructing simultaneous fiducial distributions for vector parameters. He applied this recipe to the problem of interval estimation of µ1 − µ2 based on independent samples from two normal distributions N(µ1, σ1²) and N(µ2, σ2²) with unknown means and variances.

This is the celebrated Behrens-Fisher problem. Fisher noted that the resulting inference regarding µ1 − µ2 coincided with the approach proposed much earlier by Behrens (1929). He alluded to the test of the null hypothesis of no difference, based on the fiducial distribution of µ1 − µ2, as an exact test. This resulted in much controversy as it was noted by Fisher’s contemporaries that the Behrens-Fisher test was not an exact test in the usual frequentist sense. Moreover, this same test had been obtained by Jeffreys (1940) using a Bayesian argument with noninformative priors (now known as Jeffreys priors). Fisher argued that, while Jeffreys’ approach gave the same answer as the fiducial approach, the logic behind Jeffreys’ derivation was unacceptable because of the use of an unjustified prior distribution on the parameters. Fisher particularly objected to the practice of using uniform priors to model ignorance. This led to further controversy, especially between Fisher and Jeffreys.

In the same 1935 paper, Fisher gave a second example of application of his recipe by deriving a fiducial distribution for φ in the balanced one-way random effects model

Yij = µ + ai + eij, i = 1, . . . , n1; j = 1, . . . , n2

where ai ∼ N(0, φ), eij ∼ N(0, θ), and all random variables are independent. An issue that arose from his treatment of this problem is that the fiducial distribution assigned a positive probability to the event φ < 0 in spite of the fact that φ is a variance.

Fisher’s 1935 paper resulted in a flurry of activity in fiducial inference. Most of this activity was directed towards finding deficiencies in fiducial inference and philosophical concerns regarding the interpretation of fiducial probability. The controversy seems to have arisen once Fisher’s contemporaries realized that, unlike the case in early simple applications involving a single parameter, fiducial inference often led to procedures that were not exact in the frequentist sense. For a detailed discussion of the controversies concerning fiducial inference, the reader is referred to Zabell (1992). Fraser, in a series of articles (Fraser (1961), Fraser (1966)), attempted to provide a rigorous framework for making inferences along the lines of Fisher’s fiducial inference. He called his approach structural inference. Wilkinson (1977) attempted to explain and/or resolve some of the controversies regarding fiducial inference. Dawid & Stone (1982) provided further insight by, among other things, studying situations where fiducial inference led to exact confidence statements. A wealth of additional references on fiducial inference can be found in Salome (1998). Nevertheless, it is fair to say that fiducial inference failed to secure a place in mainstream statistics.

In Tsui & Weerahandi (1989), a new approach was proposed for constructing hypothesis tests using the concept of generalized P values, and this idea was later extended to a method of constructing generalized confidence intervals using generalized pivotal quantities (Weerahandi 1993). Several papers have appeared since, in leading statistical journals, where confidence intervals have been constructed using generalized pivotal quantities in problems where exact frequentist solutions are unavailable. For a thorough exposition of generalized inference see Weerahandi (2004). Iyer & Patterson (2002) and Hannig, Iyer & Patterson (2006b) noted that every published generalized confidence interval was obtainable using the fiducial/structural arguments. In fact, Hannig et al. (2006b) not only established a clear connection between fiducial intervals and generalized confidence intervals, but also proved the asymptotic frequentist correctness of such intervals. They further provided some general methods for constructing GPQs. In particular, they showed that a special class of GPQs called fiducial GPQs (FGPQs) provides a direct frequentist interpretation to fiducial inference. However, their article focused on continuous distributions and did not address discrete distributions.

It is interesting to note that not much has been written about fiducial inference for parameters of a discrete distribution. Even for the single parameter case, such as the binomial distribution, Fisher was aware that there were difficulties with defining a unique fiducial density for the unknown binomial parameter π. In his 1935 paper (Fisher 1935b) titled “The Logic of Inductive Inference”, Fisher gives an example where he suggests a clever device for “turning a discontinuous distribution, leading to statements of fiducial inequality, into a continuous distribution, capable of yielding exact fiducial statements, by means of a modification of experimental procedure.” His device was to introduce randomization into the experimental procedure and is akin to randomized decision procedures. Inspired by Fisher’s example, Stevens (1950) gave a more formal treatment of this problem where he used a supplementary random variable in an attempt to define a unique fiducial density for a parameter of a discrete distribution. He discussed his approach in great detail using the binomial distribution as an illustration. Unfortunately, this idea seems to have been lost and subsequent researchers mostly focused on fiducial inference for continuous distributions. In 1996, in his Fisher Memorial Lecture at the American Statistical Association annual meetings, Efron gave a brief discussion of fiducial inference with the backdrop of the binomial distribution. He said, “Fisher was uncomfortable applying fiducial arguments to discrete distributions because of the ad hoc continuity corrections required, but the difficulties caused are more theoretical than practical.” See Efron (1998). In fact, Efron’s suggestion for how to handle discrete distributions is a special case of Stevens’ (1950) proposal.

In this paper we provide a general definition for fiducial distributions for parameters that applies equally well to continuous as well as discrete parent distributions. The resulting inference is termed weak fiducial inference, rather than fiducial inference, to emphasize the fact that multiple fiducial distributions can be defined for the same parameter. However, the resulting interval estimates have, under certain regularity conditions, asymptotic frequentist exactness.

We close this section with some quotes. Zabell (1992) begins his Statistical Science paper with the statement “Fiducial inference stands as R. A. Fisher’s one great failure.” On the other hand, Efron, in his 1998 Statistical Science paper (based on his Fisher Memorial Lecture of 1996), in the section dealing with fiducial inference, has said “I am going to begin with the fiducial distribution, generally considered to be Fisher’s biggest blunder.” However, in the closing paragraph of the same section (Section 8), he says “Maybe Fisher’s biggest blunder will become a big hit in the 21st century!”


2 The Fiducial Argument

The main aim of fiducial inference is to devise a distribution for parameters of interest that captures all of the information that the data contains about these parameters. This fiducial distribution can later be used for devising inference procedures such as confidence sets. In this sense, a fiducial distribution is much like a Bayesian posterior distribution. Fisher wanted to accomplish this without assuming a prior distribution on the parameters.

While our understanding of the fiducial argument cannot be entirely new given the large number of great minds who have thought about this problem, we are unaware of any prior work that formulates it in exactly the same way. The idea behind a fiducial distribution, as we understand it, can be explained using the following simple example. Consider a random variable X from a normal distribution with unknown mean µ and variance 1, i.e., X = µ + Z where Z is standard normal. If x is a realized value of X corresponding to the realized value z of Z, then we have µ = x − z. Of course the value z is not observed. However, a contemplated value µ0 of µ corresponds to the value x − µ0 of z. Knowing that z is a realization from the N(0, 1) distribution, we can evaluate the likelihood of Z taking on the value x − µ0. Speaking informally, one can say that the “plausibility” of the parameter µ taking on the value µ0 “is the same” as the plausibility of the random variable Z taking on the value x − µ0. Using this rationale, we write µ = x − Z where x is regarded as fixed but Z is still considered a N(0, 1) random variable. This step, namely, shifting from the true relationship µ = x − z (z unobserved) to the relationship µ = x − Z, is what constitutes the fiducial argument. We can use the relation µ = x − Z to define a probability distribution for µ. This distribution is called the “fiducial distribution” of µ. In particular, a random variable M carrying the fiducial probability distribution of µ can be defined based on the probabilities of observing the value of Z needed to get the desired value of µ, i.e., define M so that

$$P(M \in (a, b)) = P(x - Z \in (a, b)) = P(Z \in (x - b, x - a)). \qquad (2)$$

It will be useful to consider the random variable M* = x − Z*, where Z* is a standard normal random variable independent of Z. This random variable has the same distribution as M, the fiducial distribution for µ.

In conclusion, notice that to obtain a random variable that has the distribution described in (2) we had to take the structural equation X = µ + Z, solve for µ = X − Z, and set M* = x − Z*, where x is the observed value of X and Z* is a random variable independent of Z having the same distribution as Z. We will generalize this idea in Section 3.
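The fiducial argument in this simple example is easy to check by simulation. The following is a minimal sketch (in Python, with a hypothetical observed value x) that samples M* = x − Z* and reads off a fiducial interval for µ; it is an illustration we added, not part of the original development.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: a single draw X = mu + Z from N(mu, 1).
x = 1.7  # hypothetical observed value

# Fiducial argument: treat x as fixed and re-randomize Z.
# M* = x - Z* carries the fiducial distribution of mu, here N(x, 1).
z_star = rng.standard_normal(100_000)
m_star = x - z_star

# A 95% fiducial interval for mu from the quantiles of M*.
lo, hi = np.quantile(m_star, [0.025, 0.975])
print(f"fiducial mean ~ {m_star.mean():.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```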

There has been a lot of controversy surrounding the fiducial argument. For example Le Cam & Yang (2000) call it a “logically erroneous” argument. The main controversy was related to the philosophical and mathematical foundations of the procedure and some non-uniqueness paradoxes. In this paper we look at fiducial inference differently. We approach the fiducial argument as a tool for deriving inference procedures (much like the maximum likelihood principle). We then apply it to several examples and study its properties both analytically and through simulations. In general, like Bayesian inference, fiducial inference often leads to procedures with very good frequentist properties. In fact, we believe that if computer simulations had been feasible when Fisher introduced his fiducial argument, fiducial inference might not have been dismissed by mainstream statisticians. Fiducial inference is often asymptotically correct for much the same reasons as Bayesian inference is (see Section 5). Bayesian inference suffers from non-uniqueness due to the choice of prior. We will show that fiducial inference, as we present it, also suffers from a similar form of non-uniqueness (see Section 9).

3 Fiducial Recipe

We will now generalize the idea described in Section 2 to arbitrary statistical models. Let X be a (possibly discrete) random vector with a distribution indexed by a parameter ξ ∈ Ξ. Assume that X could be expressed in the following form

X = G(U, ξ), (3)

where G is a jointly measurable function and U is a uniform(0, 1) random variable. We define a set-valued function

Q(x, u) = {ξ : x = G(u, ξ)}. (4)

The function Q(X, U) could be understood as an inverse of the function G: here x is regarded as fixed and the equation x = G(u, ξ) implicitly defines, for each u, the set of compatible parameter values ξ.

Finally, assume that for any measurable set S there is a random element V(S) with support S̄, where S̄ is the closure of S. We define a weak fiducial distribution of ξ as the conditional distribution of

$$V(Q(x, U^*)) \mid \{Q(x, U^*) \neq \emptyset\}. \qquad (5)$$


Here x is the observed value of X and U* is an independent copy of U.
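As a concrete illustration of the recipe (3)–(5), consider a single Bernoulli trial with structural equation X = I(U < p). Here Q(x, u) = (u, 1] if x = 1 and [0, u] if x = 0, so the set is never empty and no conditioning is needed. The sketch below (an illustration we added, with V taken uniform on the closure of Q) samples the resulting weak fiducial distribution; other choices of V give different weak fiducial distributions, as discussed in Remark 7 below.

```python
import numpy as np

rng = np.random.default_rng(1)

def fiducial_sample_bernoulli(x, n_draws=100_000):
    """Weak fiducial draws for p from one Bernoulli trial x, using the
    structural equation X = I(U < p): Q(x, u) = (u, 1] if x = 1 and
    [0, u] if x = 0, with V(S) taken uniform on the closure of S."""
    u_star = rng.uniform(size=n_draws)      # independent copy U*
    v = rng.uniform(size=n_draws)           # uniform choice of V within Q
    if x == 1:
        return u_star + v * (1.0 - u_star)  # uniform on (u*, 1]
    return v * u_star                       # uniform on [0, u*]

draws = fiducial_sample_bernoulli(x=1)
print(draws.mean())  # approx. 0.75 for this particular choice of V
```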

Remark 1. In Equation (3), without loss of generality, U could be taken as any random variable or random vector whose distribution is free of unknown parameters, since any such distribution can be generated starting from a uniform [0, 1] variate. We will take advantage of this fact whenever convenient without further comment.

Remark 2. Notice that under Fisher’s assumptions his fiducial density is a special case of our definition, as seen in Remark 18 in Section 9. Our form of the fiducial distribution (5) is influenced by Fraser’s structural inference – see Appendix 3 of Dawid, Stone & Zidek (1973) for a very concise description of the structural inference idea. The main difference is that we do not assume a group structure, which is in our opinion unnecessary and in fact conceals the main issues. See also Remark 15 in Section 8.

Remark 3. The choice of a particular form of the structural equation (3) could influence the fiducial distribution. In the remainder of this paper we will regard data represented by a different structural equation as a different statistical problem even if they have the same distribution, c.f., (Fraser 1968).

Remark 4. This definition could be applied, at least in principle, to semi-parametric problems. Of course then Q(X, U) will be a very large set and the choice of V(·) would influence the properties of the procedure to a great extent. This is similar to the big influence the choice of a prior has in Bayesian nonparametric problems.

The following examples provide simple illustrations of the definition of a weak fiducial distribution.

Example 1. Suppose U = (E1, E2) where the Ei are i.i.d. N(0, 1) and

X = (X1, X2) = G(µ, U) = (µ + E1, µ + E2)

for some µ ∈ R. So the Xi are i.i.d. N(µ, 1). Given a realization x = (x1, x2) of X, the set-valued function Q maps u = (e1, e2) ∈ R² to a subset of R and is given by

$$Q(x, u) = \begin{cases} \{x_1 - e_1\} & \text{if } x_1 - x_2 = e_1 - e_2,\\ \emptyset & \text{if } x_1 - x_2 \neq e_1 - e_2. \end{cases}$$

By definition, a weak fiducial distribution for µ is the distribution of x1 − E*1 conditional on E*1 − E*2 = x1 − x2, where U* = (E*1, E*2) is an independent copy of U. Hence a weak fiducial distribution for µ is N(x̄, 1/2) where x̄ = (x1 + x2)/2.
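Since the conditioning event in this example has probability zero, it can be approximated numerically by rejection sampling with a small tolerance. The sketch below (our illustration, with hypothetical data x1, x2) checks that the resulting draws indeed match N(x̄, 1/2).

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = 0.3, 1.1          # hypothetical observed pair
eps = 1e-2                 # tolerance approximating the measure-zero condition

e1 = rng.standard_normal(2_000_000)
e2 = rng.standard_normal(2_000_000)
keep = np.abs((e1 - e2) - (x1 - x2)) < eps   # condition E*1 - E*2 = x1 - x2
mu_draws = x1 - e1[keep]                     # draws of x1 - E*1 given the event

xbar = (x1 + x2) / 2
print(mu_draws.mean(), mu_draws.var())       # should be close to xbar and 1/2
print(xbar, 0.5)
```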


Example 2. Suppose U = (U1, . . . , Un) is a vector of i.i.d. uniform (0, 1) random variables Ui. Let p ∈ [0, 1]. Let X = (X1, . . . , Xn) be defined by Xi = I(Ui < p). So the Xi are i.i.d. Bernoulli random variables with success probability p. Suppose x = (x1, . . . , xn) is a realization of X. Let s = x1 + · · · + xn be the observed number of 1’s. The mapping Q : [0, 1]ⁿ → [0, 1] is given by

$$Q(x, u) = \begin{cases} [0,\, u_{1:n}] & \text{if } s = 0,\\ (u_{n:n},\, 1] & \text{if } s = n,\\ (u_{s:n},\, u_{s+1:n}] & \text{if } s = 1, \dots, n-1 \text{ and } \sum_{i=1}^{n} I(x_i = 1)\, I(u_i \le u_{s:n}) = s,\\ \emptyset & \text{otherwise.} \end{cases}$$

Here u_{r:n} denotes the rth order statistic among u1, . . . , un. So a weak fiducial distribution for p is given by the distribution of V(Q(x, U*)) conditional on the event Q(x, U*) ≠ ∅, where V(Q(x, U*)) is any random variable whose support is contained in Q(x, U*). By the exchangeability of U*1, . . . , U*n it follows that the stated conditional distribution of V(Q(x, U*)) is the same as the distribution of V([0, U*1:n]) when s = 0, V((U*s:n, U*s+1:n]) for 0 < s < n, and V((U*n:n, 1]) for s = n.
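By the exchangeability argument just given, the conditioning drops out and the weak fiducial distribution for p can be sampled directly from the order statistics of an independent uniform sample. A minimal sketch (our illustration; the uniform choice of V shown here is only one possibility):

```python
import numpy as np

rng = np.random.default_rng(3)

def weak_fiducial_p(n, s, n_draws=100_000):
    """Draws from a weak fiducial distribution for the Bernoulli p given
    s successes in n trials: the interval (U*_{s:n}, U*_{s+1:n}] with the
    conventions U*_{0:n} = 0 and U*_{n+1:n} = 1, and V uniform on it."""
    u = np.sort(rng.uniform(size=(n_draws, n)), axis=1)
    padded = np.hstack([np.zeros((n_draws, 1)), u, np.ones((n_draws, 1))])
    lo, hi = padded[:, s], padded[:, s + 1]   # U*_{s:n} and U*_{s+1:n}
    d = rng.uniform(size=n_draws)             # one choice of V; others exist
    return lo + d * (hi - lo)

draws = weak_fiducial_p(n=10, s=3)
print(draws.mean())
```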

It will be useful to denote a random variable having the distribution described in (5) by Rξ(x). We will call this random variable a Fiducial Quantity (FQ). Notice that

$$R_\xi(x) \stackrel{D}{=} \big(R_\xi(X) \mid X = x\big)$$

and the distribution of Rξ(x) does not depend on the parameter ξ.

Remark 5. We are often interested in estimating θ = π(ξ) ∈ Rq. We can then define

$$R_\theta(x) = \pi(R_\xi(x)). \qquad (6)$$

In some cases this does not lead to satisfactory results. This happens when the function π has a zero derivative at the true value of ξ. In this case one can sometimes obtain an alternate solution by finding Y = η(X) sufficient for θ with distribution depending only on θ, and base the fiducial distribution of θ on Y instead of X. See Hannig et al. (2006b) for an example.

Remark 6. Since the distribution of Rθ(x) for each observed x is known (or at least accessible through simulations), we can use it to set up confidence sets. The idea is that any confidence set based on the distribution of Rθ should be a reasonably good confidence set for θ. This is often true at least asymptotically and is confirmed by simulations for small samples in the examples we have considered.

Remark 7. The definition in (5) does not lead to a unique distribution. In fact there are two sources of non-uniqueness. The first source of non-uniqueness is the choice of the random variable V(Q(x, u)) if the set Q(x, u) has more than one element. This typically happens if we deal with discrete random variables. In this case the choice of V(Q(x, u)) is necessarily subjective.

The second source of non-uniqueness comes from the fact that in some situations P(Q(x, U*) ≠ ∅) = 0. This situation typically arises if we deal with continuous distributions. The non-uniqueness is caused by the fact that the event {Q(x, U*) ≠ ∅} could be expressed using many different equation representations, each leading to a different conditional distribution. This is related to Borel’s paradox, described for example in Casella & Berger (2002), Section 4.9.3. We believe that this issue is actually more serious than the first. We will discuss these issues in much greater detail in Section 8 below.

Remark 8. Consider a function F(X*, ξ) such that U* = F(X*, ξ) has a uniform distribution on (0, 1). Such a function always exists if we allow for a possible additional randomization. For example, this additional randomization is needed if X* is discrete.

For any value of X and U*, if we have |Q(X, U*)| = 1 then F(X*, ξ) exists without any additional randomization and

Q(X, F(X*, ξ))

is a generalized pivot (Weerahandi 1993). In fact this is the basic idea of the construction of Iyer & Patterson (2002). If |Q(X, U*)| ≤ 1 one can still define a generalized pivot using conditional distribution functions. This construction can be found in Hannig et al. (2006b). In the general case where |Q(X, U*)| > 1, the construction of Hannig et al. (2006b) still applies but one will need to use additional randomization to derive a slightly more “generalized” version of a generalized pivot. We do not further discuss this general case here. Finally we reiterate the observation of Hannig et al. (2006b) that all published generalized inference results are identical to corresponding fiducial results. These observations suggest that generalized inference could be viewed as yet another attempt at defining fiducial distributions.


Remark 9. Our definition of a fiducial distribution accommodates, in a very natural way, problems where the parameter space is constrained to a smaller set Ξ0, e.g., N(µ, σ²) with µ > 0. All we have to do to incorporate this additional information into the weak fiducial distribution is to consider only parameters ξ ∈ Ξ0 in (4). The conditioning in (5) then makes sure that this additional information is incorporated into the weak fiducial distribution.

Remark 10. The approach for handling parameter constraints discussed above simply truncates the fiducial distribution to the constrained parameter space. Notice that Bayesian inference deals with the problem of a constrained parameter space in the same way. Alternatively, one can deal with the constrained parameter space by mapping all the fiducial probability outside of the constrained space to the boundary, e.g., for N(µ, σ²) with µ > 0 one can consider max(Rµ(x), 0) instead of the constrained fiducial quantity calculated based on (4) with Ξ0 = (0, ∞) × (0, ∞). While this approach is not consistent with the fiducial argument or with Bayesian inference, it often leads to good frequentist properties. Fisher himself faced the problem of a constrained parameter region in the one-way random effects model Yij = µ + Ai + eij where Ai ∼ N(0, φ) and eij ∼ N(0, θ). Fisher derived a fiducial distribution for φ that assigned a positive probability to the event φ < 0. Buehler (1980), in his article on fiducial inference in R. A. Fisher: An Appreciation, points out that the problem of where to put the fiducial probability associated with the region φ < 0 has puzzled later researchers. The approach of assigning the forbidden probability to the boundary of the parameter space has been used by many authors in published work on generalized inference. See, for instance, Krishnamoorthy & Mathew (2004), Iyer, Wang & Mathew (2004), and Krishnamoorthy & Mathew (2002).
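For the known-variance version of this constraint (the N(µ, 1) example of Section 2 with µ > 0), the two ways of handling the constraint can be contrasted in a few lines. This is a sketch we added; x is a hypothetical observation.

```python
import numpy as np

rng = np.random.default_rng(8)

# N(mu, 1) with constraint mu > 0, observed x; R_mu(x) = x - Z* unconstrained.
x = 0.4
r = x - rng.standard_normal(200_000)

truncated = r[r > 0]          # truncation: condition the FQ on the constraint
boundary = np.maximum(r, 0)   # alternative: map forbidden mass to the boundary
print(truncated.mean())       # mean of the truncated fiducial distribution
print((boundary == 0).mean()) # point mass placed at the boundary mu = 0
```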

4 Examples

The purpose of this section is to explain the use of the fiducial recipe on examples. We present one discrete, one continuous, and one more complicated example.

Fiducial inference for the multinomial distribution

The first series of examples considers fiducial inference for the multinomial distribution on k + 1 categories {1, . . . , k + 1}. The special case of the binomial distribution (k = 1) has received some recent attention by Brown, Cai & DasGupta (2001), Brown, Cai & DasGupta (2002), and Cai (2005). These authors show that the classical solutions based on normal approximations do not have good small sample properties. They recommend some alternative solutions. The one recommendation that stands out consistently is the interval estimate based on the posterior distribution arising from the Jeffreys prior. Later in this article we show that this is in fact one of the fiducial intervals. We also show that there is another fiducial solution for the binomial parameter p that does just as well.

Example 3. Let X1, . . . , Xn be i.i.d. Multinomial(p) random variables, where p = (p1, p2, . . . , pk), pj ∈ [0, 1], j = 1, . . . , k, and p1 + · · · + pk ≤ 1. We will derive a weak fiducial distribution for p. Set q0 = 0 and qj = p1 + · · · + pj, j = 1, . . . , k. The structural equations for the Xi, i = 1, . . . , n, could be expressed as

$$X_i = \sum_{j=0}^{k} I_{[q_j, 1]}(U_i), \qquad (7)$$

where U1, . . . , Un are i.i.d. Uniform(0, 1) random variables.

Assume that we have observed x1, . . . , xn and denote the number of occurrences of j by nj. For j = 1, . . . , k + 1, define tj = n1 + · · · + nj. In particular, tk+1 = n. Let Us:n denote the sth order statistic among U1, . . . , Un. For simplicity of notation define t0 = 0, U0:n = 0 and Un+1:n = 1. The set Q(x, U) ≠ ∅ if and only if

$$n = \sum_{j=1}^{k+1} \sum_{i=1}^{n} I(X_i = j)\, I\big(U_i \in (U_{t_{j-1}:n},\, U_{t_j:n}]\big).$$

In this case Q(x, U) = Q*(x, U), where

$$Q^*(x, U) = \left\{ (p_1, \dots, p_k) \;\middle|\; (q_1, \dots, q_k) \in \prod_{j=1}^{k} \big(U_{t_j:n},\, U_{t_j+1:n}\big] \right\}.$$

Here ∏ⱼ Aⱼ denotes the Cartesian product of the sets Aⱼ and the qⱼ are as in (7). In particular, for j = 1, . . . , k, pj = qj − qj−1 and pk+1 = 1 − qk.

The exchangeability of the Ui, i = 1, . . . , n, then implies that the conditional distribution of V(Q(x, U)), conditional on the event Q(x, U) ≠ ∅, is the same as the (unconditional) distribution of V(Q*(x, U)). By our definition the weak fiducial quantity is Rp(x) = V(Q*(x, U)). Equivalently, there is a random vector D = (D1, . . . , Dk) with support [0, 1]ᵏ such that

$$R_p(x) = (R_1,\; R_2 - R_1,\; \dots,\; R_k - R_{k-1})^{\top}, \qquad (8)$$

where R_j = U_{t_j:n} + D_j (U_{t_j+1:n} − U_{t_j:n}).

Notice that if nj = 0 for some j = 2, . . . , k it would be possible to get a negative value for the jth element of Rp. This can be prevented by requiring the random vector D to satisfy Dj ≥ Dj−1 whenever nj = 0.

The observation made in the previous paragraph implies that the fiducial distribution depends on the particular choice of the structural equation (7). In particular, if one or more categories are not observed in our sample, we might get a different fiducial distribution by relabeling.
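A weak fiducial distribution of the form (8) is straightforward to sample. The following sketch (our illustration) uses the choice D ∼ uniform(0, 1)ᵏ and assumes all category counts are positive, so the monotonicity constraint on D mentioned above is not needed.

```python
import numpy as np

rng = np.random.default_rng(4)

def fiducial_multinomial(counts, n_draws=50_000):
    """Draws from a weak fiducial distribution (8) for multinomial
    probabilities given category counts n_1, ..., n_{k+1} (all positive).
    Uses D ~ uniform(0,1)^k, one of several reasonable choices."""
    counts = np.asarray(counts)
    n, k = counts.sum(), len(counts) - 1
    t = np.cumsum(counts)[:-1]                     # t_1, ..., t_k
    u = np.sort(rng.uniform(size=(n_draws, n)), axis=1)
    padded = np.hstack([np.zeros((n_draws, 1)), u, np.ones((n_draws, 1))])
    d = rng.uniform(size=(n_draws, k))
    r = padded[:, t] + d * (padded[:, t + 1] - padded[:, t])  # R_1,...,R_k
    q = np.hstack([np.zeros((n_draws, 1)), r, np.ones((n_draws, 1))])
    return np.diff(q, axis=1)                      # (p_1, ..., p_{k+1})

draws = fiducial_multinomial([5, 3, 2])
print(draws.mean(axis=0))   # fiducial means for (p1, p2, p3)
```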

We now further investigate this fiducial quantity in two special cases, the binomial distribution (k = 1) and the trinomial distribution (k = 2).

Special case 1 - the Binomial distribution.

Example 4. For the special case of a binomial distribution, a fiducial quantity for p is

$$R_p(x) = U_{s:n} + D\,(U_{s+1:n} - U_{s:n}), \qquad (9)$$

with D being any random variable with support contained in [0, 1] and s being the observed number of successes.

Recall that the joint density of (Us:n, Us+1:n) is

$$f_{(U_{s:n},\, U_{s+1:n})}(u, v) = \frac{n!}{(s-1)!\,(n-s-1)!}\; u^{s-1} (1 - v)^{n-s-1}, \qquad 0 < u < v < 1.$$

Therefore, the density of Rp is

$$f_{R_p}(p) = \int_0^1 \int_0^{\frac{p}{d} \wedge \frac{1-p}{1-d}} \binom{n}{s}\, s\,(p - dq)^{s-1}\, (n-s)\big((1-p) - (1-d)q\big)^{n-s-1}\, dq\, dF_D(d)\; I_{(0,1)}(p), \qquad (10)$$

where F_D(d) is the distribution function of D and x ∧ y = min{x, y}. If additionally D is continuous with density f_D, (10) simplifies to

$$f_{R_p}(p) = \binom{n}{s} \int_0^p \int_p^1 f_D\!\left(\frac{p - u}{v - u}\right) \frac{s\, u^{s-1}\, (n-s)\,(1 - v)^{n-s-1}}{v - u}\; dv\, du\; I_{(0,1)}(p). \qquad (11)$$


There are many reasonable choices for the distribution of D in the description of Rp. We have considered five different choices that appeared natural to us. For the first three choices we assumed D is random and independent of U1, . . . , Un.

The maximum entropy choice is D ∼ uniform(0, 1).

The maximum variance choice, suggested implicitly by Efron (1998), is D ∼ uniform{0, 1}. We remark that a direct calculation, cf., (Grundy 1956), shows that these two choices lead to fiducial distributions that are not Bayesian posteriors with respect to any prior.

The third choice D ∼ Beta(1/2, 1/2) leads to Rp ∼ Beta(s + 1/2, n − s + 1/2), which is the Bayesian posterior for the Jeffreys prior.

The fourth choice is a little harder to describe in terms of D. It is Rp ∼ Beta(s + 1, n − s + 1). This is the scaled likelihood, or posterior with respect to the flat prior. Beta(s + 1, n − s + 1) is a fiducial distribution according to our definition, since it is stochastically larger than the distribution of Us:n, which is Beta(s, n − s + 1), and stochastically smaller than the distribution of Us+1:n, which is Beta(s + 1, n − s). This can be seen by noticing that conditional on U1, . . . , Un the distribution of D is given by D = 0 with probability Us:n, D = 1 with probability 1 − Us+1:n, and D ∼ U(0, 1) with probability Us+1:n − Us:n.

The last choice is D = 1/2, corresponding to the midpoint of the interval (Us:n, Us+1:n).
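The representation (9) makes all five choices easy to simulate. The sketch below (our illustration) samples R_p under the maximum entropy, maximum variance, and midpoint choices via the order statistics, and uses the closed form Beta(s + 1/2, n − s + 1/2) for the Jeffreys choice; the scaled likelihood Beta(s + 1, n − s + 1) could be drawn directly the same way.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_D(choice, n_draws):
    """Three of the five choices of D discussed above (a sketch)."""
    if choice == "entropy":   # maximum entropy: D ~ uniform(0, 1)
        return rng.uniform(size=n_draws)
    if choice == "variance":  # maximum variance: D ~ uniform{0, 1}
        return rng.integers(0, 2, size=n_draws).astype(float)
    if choice == "midpoint":  # D = 1/2
        return np.full(n_draws, 0.5)
    raise ValueError(choice)

def fiducial_binomial(n, s, choice, n_draws=100_000):
    """Samples R_p = U_{s:n} + D (U_{s+1:n} - U_{s:n}) as in (9)."""
    u = np.sort(rng.uniform(size=(n_draws, n)), axis=1)
    padded = np.hstack([np.zeros((n_draws, 1)), u, np.ones((n_draws, 1))])
    lo, hi = padded[:, s], padded[:, s + 1]   # U_{s:n} and U_{s+1:n}
    return lo + draw_D(choice, n_draws) * (hi - lo)

# Jeffreys choice: R_p ~ Beta(s + 1/2, n - s + 1/2) can be drawn directly.
n, s = 10, 3
jeffreys = rng.beta(s + 0.5, n - s + 0.5, size=100_000)
for c in ("entropy", "variance", "midpoint"):
    print(c, np.quantile(fiducial_binomial(n, s, c), [0.025, 0.975]))
```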

To evaluate the performance of the fiducial distribution and compare the performance of the various choices of D we carried out an extensive simulation study. As shown in Section 6, the fiducial inference is correct asymptotically. Therefore our simulation study concentrated mostly on small values of n. In particular we considered n = 3, 6, 9, . . . , 45, 48, 100, 1000 and p = 0.01, 0.02, . . . , 0.99. For each of the combinations of n and p we simulated 5000 evaluations of the probability Q(X) = P(Rp(X) < p | X), using each of the five variations of the fiducial distribution. If the fiducial inference were exact, Q(X) would follow the U(0, 1) distribution. The level of agreement of Q(X) with the U(0, 1) distribution was examined using QQ-plots.

Since fiducial inference is a non-randomized procedure, the distribution of Q(X) can take only n values. Therefore it cannot be expected that the agreement with the uniform distribution would be very good for small values of n. However, the agreement improves dramatically as n increases. To illustrate this we show the QQ-plots for n = 12 and p = .1, .3, .5, .7, .9 in Figure 1. We also show QQ-plots for n = 6, 21, 48, 100, 1000 and p = .3 in Figure 2.


Figure 1: QQ-plots of Q(X) for n = 12 and p = .1, .3, .5, .7, .9 (five panels of lower-CI coverage, plotting actual versus nominal p-values; legend: entropy, variance, Jeffreys, likelihood, midpoint). The black color corresponds to the area of natural fluctuation of a QQ-plot due to randomness. The colored graphs correspond to the QQ-plots of the various fiducial distributions.

Figure 2: QQ-plots of Q(X) for n = 6, 21, 48, 100, 1000 and p = .3 (panels of lower-CI coverage, plotting actual versus nominal p-values; legend: entropy, variance, Jeffreys, likelihood, midpoint). The black color corresponds to the area of natural fluctuation of a QQ-plot due to randomness. The colored graphs correspond to the QQ-plots of the various fiducial distributions.

The closer the points on the QQ-plot are to the line y = x, the better the performance of the procedure. We can see straightaway that the scaled likelihood performs worse than any of the other choices. To make this comparison more rigorous we compute, for each of the choices of D, the following statistics

$$A = \int_0^1 |F_Q(x) - x|\, dx, \qquad D = \int_0^1 \big(x - F_Q(x)\big)\, dx,$$

where F_Q(x) is the empirical distribution function of the observed values of Q(X). Smaller values of A and D signify better overall fit. Since we are planning to use the fiducial distribution for inference, one can argue that the center of the distribution of Q(X) is of little importance. Therefore we will also check for the level of agreement in the tails. To this end we define

$$A_l = \int_0^{.1} |F_Q(x) - x|\, dx, \qquad D_l = \int_0^{.1} \big(x - F_Q(x)\big)\, dx,$$
$$A_u = \int_{.9}^{1} |F_Q(x) - x|\, dx, \qquad D_u = \int_{.9}^{1} \big(F_Q(x) - x\big)\, dx.$$

Here we chose A_l, D_l to describe the average fit for typical lower tail CIs and A_u, D_u to describe the average fit for typical upper tail CIs. In both cases positive values of D_l and D_u correspond to being conservative while negative values of D_l and D_u correspond to being anticonservative.
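These statistics are simple to approximate from simulated values of Q(X). The sketch below (our illustration) uses the Jeffreys choice, for which R_p ∼ Beta(s + 1/2, n − s + 1/2) has an explicit cdf so that Q(X) can be computed exactly; the integrals are approximated by Riemann sums over a grid.

```python
import numpy as np
from scipy.stats import beta, binom

def fit_statistics(q_values, lo, hi, grid=2001):
    """Riemann approximations of the integrals of |F_Q(x) - x| and of
    (x - F_Q(x)) over [lo, hi], with F_Q the empirical cdf of Q(X)."""
    q = np.sort(np.asarray(q_values))
    x = np.linspace(lo, hi, grid)
    F = np.searchsorted(q, x, side="right") / len(q)  # empirical cdf F_Q(x)
    return np.mean(np.abs(F - x)) * (hi - lo), np.mean(x - F) * (hi - lo)

# Q(X) for the Jeffreys choice: R_p ~ Beta(s + 1/2, n - s + 1/2), so
# Q(X) = P(R_p < p | X) is the Beta cdf evaluated at p.
n, p = 12, 0.3
s = binom.rvs(n, p, size=5000, random_state=np.random.default_rng(6))
q = beta.cdf(p, s + 0.5, n - s + 0.5)

A, D = fit_statistics(q, 0.0, 1.0)         # overall fit
Al, Dl = fit_statistics(q, 0.0, 0.1)       # lower tail
Au, Du_neg = fit_statistics(q, 0.9, 1.0)
Du = -Du_neg                               # D_u is defined with opposite sign
print(A, D, Al, Dl, Au, Du)
```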

For each fixed n we plotted the graphs of these statistics as functions of the probability p. For illustration we show plots of these quantities for n = 6, 21, 48, 100, 1000 in Figures 3, 4, and 5.

The overall conclusion is that the best choice is the maximum variance choice D ∼ uniform{0, 1}, which is consistently better than the other choices. However, D ∼ U(0, 1) and D ∼ Beta(1/2, 1/2) (the maximum entropy choice and the posterior with respect to the Jeffreys prior) were typically very close to the best choice. The last two choices were found not to perform as well. In particular the scaled likelihood underperformed the other choices by a large margin. In light of this we recommend using the choice D ∼ uniform{0, 1}.

Figure 3: Plots of A_l (solid line) and D_l (dashed line) as functions of p for n = 6, 21, 48, 100, 1000. Small values of A_l and D_l are preferable. Positive values of D_l correspond to the method being conservative on average. The various colors correspond to various choices for the fiducial distribution (entropy, variance, Jeffreys, likelihood, midpoint).

Figure 4: Plots of A_u (solid line) and D_u (dashed line) as functions of p for n = 6, 21, 48, 100, 1000. Small values of A_u and D_u are preferable. Positive values of D_u correspond to the method being conservative on average. The various colors correspond to various choices for the fiducial distribution (entropy, variance, Jeffreys, likelihood, midpoint).

Figure 5: Plots of A (solid line) and D (dashed line) as functions of p for n = 6, 21, 48, 100, 1000. Small values of A and D are preferable. The various colors correspond to various choices for the fiducial distribution (entropy, variance, Jeffreys, likelihood, midpoint).

Remark 11. Cai (2005) has investigated the two term Edgeworth expansions for the coverage of several one-sided binomial confidence intervals. We remark that similar calculations can be used to derive the two term Edgeworth expansion for the fiducial distributions discussed here. In particular one can show that, just like confidence intervals based on the Jeffreys posterior, the maximum variance fiducial distribution leads to confidence intervals that are first order matching, cf. Ghosh (1994).

Special case 2 - the Trinomial distribution

Example 5. Some aspects of the fiducial distribution for the parameters of a trinomial distribution have been investigated by Dempster (1968), who used a trinomial distribution as an example for his definition of upper and lower probabilities. In this example we investigate the small sample frequentist properties of the fiducial distribution for the trinomial parameters. There are many reasonable choices for the distribution of D in (8). We have considered five different choices that appeared natural to us. Based on our experience from Example 4 we take D independent of U1, . . . , Un. Here are the choices:

The maximum entropy choice is achieved by taking D uniform on (0, 1)² if s2 > 0 and D ∼ uniform{(x, y) : 0 < x < y < 1} if s2 = 0.

The Bayesian posterior for the Jeffreys prior is achieved by taking D1, D2 i.i.d. Beta(1/2, 1/2) if s2 > 0 and D1 ∼ Beta(1/2, 1/2), D2 = 1 if s2 = 0.

The third choice is a first version of a maximum variance distribution. Here D ∼ uniform{0, 1}² if s2 > 0 and D ∼ uniform{(0, 0), (0, 1), (1, 1)} if s2 = 0. This is obtained by maximizing the determinant of the covariance matrix of Rp(x). Notice that it is also the uniform distribution on the vertices of Q(x, U).

The fourth choice is a second version of a maximum variance distribution. This is obtained by maximizing the smallest eigenvalue of the covariance matrix of Rp(x). Notice that this distribution is supported on the vertices of Q(x, U).

The last choice is the uniform distribution on the boundary of Q(x, U).

Finally we remark that the scaled likelihood (the Bayesian posterior with respect to the flat prior) is not among the fiducial distributions and will not be included in the simulation.
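For concreteness, the first maximum variance (“vertex”) choice can be sampled as follows when the middle count is positive. This is a sketch we added; the n2 = 0 case would need D ∼ uniform{(0, 0), (0, 1), (1, 1)} as described above.

```python
import numpy as np

rng = np.random.default_rng(9)

def trinomial_fq_vertex(n1, n2, n3, n_draws=50_000):
    """Trinomial fiducial draws under (8) with the 'vertex' choice
    D ~ uniform{0,1}^2, valid here only when n2 > 0."""
    assert n2 > 0
    n = n1 + n2 + n3
    t1, t2 = n1, n1 + n2                      # cumulative counts t_1, t_2
    u = np.sort(rng.uniform(size=(n_draws, n)), axis=1)
    padded = np.hstack([np.zeros((n_draws, 1)), u, np.ones((n_draws, 1))])
    d = rng.integers(0, 2, size=(n_draws, 2)).astype(float)
    r1 = padded[:, t1] + d[:, 0] * (padded[:, t1 + 1] - padded[:, t1])
    r2 = padded[:, t2] + d[:, 1] * (padded[:, t2 + 1] - padded[:, t2])
    return np.column_stack([r1, r2 - r1, 1.0 - r2])   # (p1, p2, p3)

print(trinomial_fq_vertex(4, 3, 3).mean(axis=0))
```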

To evaluate the performance of the fiducial distribution and compare the performance of the various choices of D we performed an extensive simulation study. As shown in Section 6, the fiducial inference is correct asymptotically. Therefore our simulation study concentrated mostly on small values of n. In particular we considered n = 5, 10, 15, . . . , 30, 300 and p1, p2 ∈ {0.05, 0.1, . . . , 0.95} with p1 + p2 < 1. For each of the combinations of the parameters n, p1, p2 we simulated a sample of 2000 observations from the trinomial distribution. For each of the trinomial observations and each choice of D we generated a sample of 3000 observations from the fiducial distribution Rp(x).

In order to evaluate the quality of the joint fiducial distribution we then evaluated the empirical coverage of the one-sided equal tailed region. In particular, for any random vector X and 0 < α < 1 we define the one-sided equal tailed region C(X, α) as the set {(x, y); x ≤ x0, y ≤ y0} satisfying P(X ∈ {(x, y); x ≤ x0, y ≤ y0}) = α and P({(x, y); x > x0}) = P({(x, y); y > y0}). Also for simplicity of formulas denote A(X, p) = inf{α : p ∈ C(X, α)}. Then the performance can be evaluated by estimating the probability Q(X) = P(Rp(X) ∈ C(Rp(X), A(Rp(X), p)) | X) using the simulated data for each of the five variations of the fiducial distribution. If the fiducial inference were exact, Q(X) would follow the U(0, 1) distribution. The level of agreement of Q(X) with a U(0, 1) distribution was examined using QQ-plots.

Since fiducial inference is a non-randomized procedure, the distribution of Q(X) can take only finitely many values. Therefore it can be expected that the agreement with the uniform distribution will be poor for small values of n and will improve dramatically as n increases. Since the QQ-plots generated for the trinomial distribution are very similar to the figures shown in Example 4, we do not display them here to save space.

The closer the points of the QQ-plot are to the line y = x, the better the performance of the procedure. We define A, A_l and A_u as in Example 4. Since we have one more parameter than in the binomial case we need a new way to display the comparison between the procedures. For each fixed n, p1, p2 we calculated the relative efficiency of procedure i as min_j A(j)/A(i), where A(i) is the value of A for procedure i. Values close to 1 then mean a relatively good performance, while small values mean relatively bad performance.
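In other words, the best procedure at a given (n, p1, p2) gets relative efficiency 1 and every other procedure is scored against it. A short illustration with hypothetical A values:

```python
# Relative efficiency of procedure i at a fixed (n, p1, p2):
# eff_i = min_j A(j) / A(i), so the best (smallest-A) procedure scores 1.
A_values = {"entropy": 0.031, "Jeffreys": 0.028, "vertex": 0.025,
            "maximin": 0.034, "edge": 0.040}       # hypothetical numbers
best = min(A_values.values())
efficiency = {name: best / a for name, a in A_values.items()}
print(efficiency)
```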

For each fixed n we plotted an image containing a matrix of cells comparing these relative efficiencies. The cells are placed on the image according to their values of p1 and p2. For illustration we show plots of these quantities for n = 5, 10, 30, 300 in Figures 6, 7, and 8.

The overall conclusion is that the best choice for D is the first maximum variance choice (called vertex in the figures), for which D ∼ uniform{0, 1}². This is typically better than the other choices. In particular this choice seems to consistently outperform the Bayesian posterior computed with respect to the Jeffreys prior.

Figure 6: Plots of relative efficiency based on A_l for n = 5, 10, 30, 300 (axes: p1 versus p2; methods: entropy, Jeffreys, vertex, maximin, edge). The longer the bar corresponding to each method the better the method. The various colors correspond to various choices of D.

Fiducial Inference for N(µ, σ²)

In the following example we will derive a joint fiducial distribution for the parameters µ and σ² based on a random sample from N(µ, σ²). We believe that it is worthwhile to demonstrate the use of the fiducial recipe in this simple case. We will derive the fiducial distribution using two different methods, with additional discussion following in Section 8.

Example 6. Let X1, . . . , Xn be i.i.d. N(µ, σ²). We will offer two different approaches to finding the fiducial distribution.

Figure 7: Plots of relative efficiency based on A_u for n = 5, 10, 30, 300 (axes: p1 versus p2; methods: entropy, Jeffreys, vertex, maximin, edge). The longer the bar corresponding to each method the better the method. The various colors correspond to various choices of D.

Our first approach uses the minimal sufficient statistic (X̄n, S²n). One has the following structural equations:

$$\bar{X}_n = \mu + \frac{\sigma Z}{\sqrt{n}}, \qquad S_n^2 = \frac{\sigma^2 V}{n-1}, \qquad (12)$$

where Z is standard normal and V has a chi-square distribution with n − 1 degrees of freedom. By solving the structural equation (12) we get

$$Q(\bar{x}_n, s_n^2;\, z, v) = \left\{ \left( \bar{x}_n - \sqrt{\frac{(n-1)\, s_n^2}{n\, v}}\; z,\;\; \frac{(n-1)\, s_n^2}{v} \right) \right\}. \qquad (13)$$


Figure 8: Plots of relative efficiency based on A for n = 5, 10, 30, 300 (axes: p1 versus p2; methods: entropy, Jeffreys, vertex, maximin, edge). The longer the bar corresponding to each method the better the method. The various colors correspond to various choices of D.

Since the set Q(x̄n, s²n; z, v) is always a singleton we have

$$R_{(\mu,\sigma^2)}(\bar{x}_n, s_n^2) = \left( \bar{x}_n - \sqrt{\frac{(n-1)\, s_n^2}{n\, V}}\; Z,\;\; \frac{(n-1)\, s_n^2}{V} \right).$$

A simple calculation shows that the density of R_(µ,σ²) is

$$f_{R_{(\mu,\sigma^2)}}(m, h) = \frac{e^{-\frac{(m-\bar{x}_n)^2}{2h/n} - \frac{(n-1)\, s_n^2}{2h}}\; \big((n-1)\, s_n^2\big)^{\frac{n-1}{2}}}{\sqrt{\pi/n}\; \Gamma\!\left(\frac{n-1}{2}\right)\, 2^{n/2}\, h^{\frac{n}{2}+1}}\; I_{(0,\infty)}(h). \qquad (14)$$

This is the joint fiducial density proposed by Fisher (1935a).
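Sampling from this fiducial distribution is immediate from the structural equations (12): draw Z and V, then invert. A minimal sketch (our illustration, with hypothetical summary statistics); the fiducial quantiles of µ reproduce the classical t interval, in line with the exactness noted at the end of this example.

```python
import numpy as np

rng = np.random.default_rng(7)

def fiducial_normal(xbar, s2, n, n_draws=100_000):
    """Draws from the fiducial distribution of (mu, sigma^2) for an
    N(mu, sigma^2) sample, inverting the structural equations (12):
    sigma^2 = (n-1) s^2 / V and mu = xbar - sqrt(sigma^2 / n) Z."""
    z = rng.standard_normal(n_draws)
    v = rng.chisquare(n - 1, size=n_draws)
    sigma2 = (n - 1) * s2 / v
    mu = xbar - np.sqrt(sigma2 / n) * z
    return mu, sigma2

mu, sigma2 = fiducial_normal(xbar=10.2, s2=4.0, n=15)
# Fiducial quantiles of mu match the classical t interval for the mean.
print(np.quantile(mu, [0.025, 0.975]))
```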


We will now derive the fiducial distribution without the use of the minimal sufficient statistic. This particular derivation is based on Fraser’s structural approach (Fraser 1968). We can describe the distribution of X by means of the structural equations

Xi = µ + σZi, i = 1, . . . , n.

Here the Zi are i.i.d. standard normal random variables. Solving the first and second equations for µ, σ² we get

$$Q(x_1, \dots, x_n;\, z_1, \dots, z_n) = \begin{cases} \left\{ \left( \dfrac{z_1 x_2 - z_2 x_1}{z_1 - z_2},\; \left( \dfrac{x_1 - x_2}{z_1 - z_2} \right)^{2} \right) \right\} & \text{if } x_l = \dfrac{z_1 x_2 - z_2 x_1}{z_1 - z_2} - \left| \dfrac{x_1 - x_2}{z_1 - z_2} \right| z_l,\ \ l = 3, \dots, n,\\[1.5ex] \emptyset & \text{otherwise.} \end{cases}$$

Defining

$$M = \frac{z_1 x_2 - z_2 x_1}{z_1 - z_2}, \qquad H = \left( \frac{x_1 - x_2}{z_1 - z_2} \right)^{2}, \qquad R_l = \frac{z_1 x_2 - z_2 x_1}{z_1 - z_2} - \left| \frac{x_1 - x_2}{z_1 - z_2} \right| z_l,$$

we can then interpret the fiducial distribution (5) as the conditional distribution of (M, H) given R = x, where R = (R3, . . . , Rn) and x = (x3, . . . , xn). A simple calculation shows that the joint density of (M, H, R) is

$$f_{M,H,R}(m, h, x) = \frac{e^{-\frac{\sum_{i=1}^{n}(m - x_i)^2}{2h}}\; |x_1 - x_2|}{2\, (2\pi)^{n/2}\, h^{n/2+1}}\; I_{(0,\infty)}(h), \qquad (15)$$

and therefore the fiducial distribution f_{M,H|R=x}(m, h) is the same as the one stated in (14).

The fiducial density is associated with the usual t and chi-square distributions and therefore inference based on it will lead to classical inference. Thus it is well known (Mood, Graybill & Boes 1974) that inference based on the distribution (14) leads to exact frequentist inference even for small sample sizes.

Fiducial inference for a mixture of two normals

Example 7. In this example we consider the fiducial distribution for the pa-rameters of a mixture of two normal distributions. This is a prototypical

26

Page 27: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

example that can be used to construct fiducial distributions for many otherproblems. In particular one can use the ideas demonstrated in this exam-ple to construct a robust fiducial confidence interval for a mean of a normalsample by considering a mixture of normal and Cauchy distributions. To ourknowledge this is the first time the fiducial paradigm has been used in sucha complex situation.

Let X1, . . . , Xn be independent random variables following either N(µ1, σ21)

or N(µ2, σ22) distributions. Moreover assume that each of the observation

comes from the second distribution with probability p independently of oth-ers. For identifiability reasons we assume that µ1 < µ2. We also assume thatwe observe at least two data points from each distribution. Our goal will beto find the fiducial distribution of (µ1, σ

21, µ2, σ

22, p).

We can write a set of structural equations for X1, . . . , Xn as

Xi = (µ1 + σ1Zi)I(0,p)(Ui) + (µ2 + σ2Zi)I(p,1)(Ui), i = 1, . . . , n,

where Zi are i.i.d. N(0,1) and Ui are i.i.d. U(0,1) random variables. Whenfinding the set valued function Q we need to realize that this inversion willbe stratified based on the possible assignment of the observed values xi toone of the two groups. For simplicity of notation the observed points x andcorresponding z values assigned to groups 1 and 2 are denoted by v1 . . . , vk

and h1, . . . , hk, and w1, . . . , wn−k and r1, . . . , rn−k respectively. We can thenwrite

Q(x1, . . . , xn; z1, . . . , zn, u1, . . . un)

=

{(h1v2−h2v1

h1−h2,(

v1−v2

h1−h2

)2

, r1w2−r2w1

r1−r2,(

w1−w2

r1−r2

)2)}

× (us:n, us+1:n),

for each assignment of the xi to the two groups, if

vl = h1v2−h2v1

h1−h2−∣∣∣ v1−v2

h1−h2

∣∣∣hl, l = 3, . . . , s,

and wl = r1w2−r2w1

r1−r2−∣∣∣w1−w2

r1−r2

∣∣∣ rl, l = 3, . . . , n− s;

∅ otherwise.

Similarly as in previous examples, for each possible assignment of the

observations to the two groups, set M1 = H1v2−H2v1

H1−H2, N1 =

(v1−v2

H1−H2

)2

, M2 =

R1w2−R2w1

R1−R2, N2 =

(w1−w2

R1−R2

)2

, P = Us:n + U(Us+1:n − Us:n), Kl = H1v2−H2v1

H1−H2−∣∣∣ v1−v2

H1−H2

∣∣∣Hl, l = 3, . . . , s and Ll = R1w2−R2w1

R1−R2−∣∣∣w1−w2

R1−R2

∣∣∣Rl, l = 3, . . . , n− s.

27

Page 28: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

We then interpret the conditional distribution (5) as

limε→0+

n−2∑s=3

∑assignments

P(M1 ∈ (m1, m1 + ε), N1 ∈ (n1, n1 + ε),

M2 ∈ (m2, m2 + ε), N2 ∈ (n2, n2 + ε),

P ∈ (p, p + ε)∣∣∣Kl ∈ (vl, vl + ε), Lj ∈ (wj, wj + ε)

)= C−1

n−2∑s=3

∑assignments

fP (p, s)(ns

) fM1,N1,K(m1, n1,v)fM2,N2,L(m2, n2,w),

(16)

where fP is as defined in (10) and both fM1,N1,K and fM2,N2,L are as definedin (15). The constant C on the left-hand-side of (16) is

C =n−2∑s=3

∑assignments

∫· · ·∫

fP (p, s)(ns

) fM1,N1(m1, n1,v)fM2,N2(m2, n2,w)

=n−2∑s=3

∑assignments

Γ( s−12

)Γ(n−s−12

)P

1≤i<j≤s |vj−vi|

(s2)

P1≤i<j≤n−s |wj−wi|

(n−s2 )(

ns

)πn/2−1

√s(n− s) (

∑si=1(vi − v)2)

(s−1)/2 (∑n−si=1 (wi − w)2

)(n−s−1)/2

(17)

The terms of the form∑

1≤i<j≤s |vj − vi|/(

s2

)are caused by the fact that

we assumed that the order of assignments in the two groups is random.In particular we average over all the possibilities of choosing the first twoobservations for the inverse in (15).

Since the sums in the fiducial distribution have a total of 2n − 2− 2n −n(n−1) terms, we are unable to get close form of the fiducial density usable inpractice. However, we still can use the derived fiducial distribution for infer-ence. The idea is to simulate observations following the fiducial distributionusing a Metropolis-Hastings algorithm.

The main idea is as follows. Once we know the assignment of observa-tions to the groups 1 and 2 it is straightforward to generate the values ofthe 5-dimensional fiducial distribution. This is done by calculating the cor-responding sample means and variances for each group, and using (13) and(9).

28

Page 29: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2−m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for p

Figure 9: QQ-plots of Q1(X), Q2(X), Qd(X) and Q1(X) for n = 80 obser-vations of the p = .65 mixture of N(−1, 1/27) and N(0, 9). The blue andgreen envelope correspond to an area of natural fluctuation of a QQ-plot dueto randomness taken uniformly and pointwise respectively. The QQ-plot isbased on 1000 replications.

To generate a random assignment notice that each particular configura-tion assignment has a probability proportional to the corresponding sum-mand in the right-hand-side of (17). We can therefore generate a proposalconfiguration by taking a previous assignment, randomly choosing one datapoint and switching it to the other group. This new proposed assignmentis then rejected or accepted using the usual Metropolis-Hastings rule. Oncewe have a new random assignment, we then generate the observation fromthe 5-dimensional fiducial distribution. The stationary distribution of theassignment valued Markov chain is clearly the fiducial distribution of the as-signment. Therefore this procedure will generate observations from fiducialdistribution after an adequate burn in period. It is worth pointing out thateven though this procedure is fairly computationally intensive, it is usablefor most situations encountered in practice.

To evaluate the performance of this procedure, we conducted a small

29

Page 30: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2−m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for p

Figure 10: QQ-plots of Q1(X), Q2(X), Qd(X) and Q1(X) for n = 250 obser-vations of the p = .65 mixture of N(−1, 1/27) and N(0, 9). The blue andgreen envelope correspond to an area of natural fluctuation of a QQ-plot dueto randomness taken uniformly and pointwise respectively. The QQ-plot isbased on 1105 replications.

scale simulation study. We considered the following mixture of distributions:N(−1, 1/27) and N(0, 9) with the number of observation n = 80, n = 250and the mixing proportion p = .65. We also considered N(−1.5, 1) andN(1.5, 1) with n = 100, 250 and p = .6. We wish to remark that the secondmixture is actually very hard to estimate. We used the particular choice D ∼Beta(1/2, 1/2) in the definition of fP in (16), cf. (10).

For each of these models we generated a sample from the fiducial dis-tribution and used it to find a sample from Q1(X) = P (Rµ1(X?) < µ1|X),Q2(X) = P (Rµ2(X?) < µ2|X), Qd(X) = P (Rµ2−µ1(X?) < µ2 − µ1|X), andQp(X) = P (Rp(X?) < p|X). Notice that Qd is measuring the performance ofa fiducial solution for a generalization of a Beherns-Fisher problem where wewant a CI for µ2 − µ1 but do not know which observations belong to whichgroup.

If the inference based on fiducial distributions were exact, these random

30

Page 31: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2−m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for p

Figure 11: QQ-plots of Q1(X), Q2(X), Qd(X) and Q1(X) for n = 100 obser-vations of the p = .6 mixture of N(−1.5, 1) and N(1.5, 1). The blue andgreen envelope correspond to an area of natural fluctuation of a QQ-plot dueto randomness taken uniformly and pointwise respectively. The QQ-plot isbased on 1000 replications.

variables should follow a U(0, 1) distribution. To check for the agreement weconstructed QQ-plots. These can be found in Figures 9, 10, 11, and 12. Wesee that while the agreement is not very good in the body of the distributionit is actually very good in the tails of the distribution. This means thatthe inference based on the fiducial distribution will have approximately thestated coverage. We also see that the inference for µ1, µ2 and µ2 − µ1 seemsmore accurate than for p which is often too conservative. In any case, theperformance seemed very good given the fact we chose mixtures that arehard to estimate. Finally, we remark that the fit improves for larger n. Thisleads us to conjecture that the inference will be correct asymptotically asn →∞.

31

Page 32: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for m2−m1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Actual p−value

Nom

inal

p−

valu

e

QQ−plot for p

Figure 12: QQ-plots of Q1(X), Q2(X), Qd(X) and Q1(X) for n = 500 obser-vations of the p = .6 mixture of N(−1.5, 1) and N(1.5, 1). The blue andgreen envelope correspond to an area of natural fluactuation of a QQ-plotdue to randomness taken uniformly and pointwise respectively. The QQ-plotis based on 600 replications.

5 Convergence Theorems

In this section we present some general theorems that are applicable in typ-ical situations one encounters in developing inference procedures using thefiducial argument. While the theorems could be formulated in a more generalsetting, we do not do so here.

Let us consider a parametric statistical problem where we observe X1, . . . , Xn

whose joint distribution belongs to some family of distributions parametrizedby ξ ∈ Rp. We will be interested in estimating θ = π(ξ) ∈ Rq. LetS = (S1, . . . , Sk), k ≥ q denote a statistic based on the Xi’s. Also de-note by Rθ(x, U) a random variable having the distribution described in (5)and (6).

For simplicity of notation we will define the following notion of conver-gence for open sets.

32

Page 33: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Definition 1. Sets An converge to an open set A, i.e., An → A, if (lim An)◦ =A. Here lim An = B exists if IAn → IB, IA is the indicator function of A andB◦ is the interior of B.

We are now ready to state the general conditions under which the fidu-cial distribution defined in (5) leads to asymptotically correct frequentistcoverage. We will later show how these conditions could be verified in manyapplications.

Assumptions 1. For a fixed γ ∈ (0, 1) assume the following

1. There exist t(ξ) ∈ Rk such that

√n (S1 − t1(ξ), . . . , Sk − tk(ξ))

D−→ H = (H1, . . . , Hk)>, (18)

where H has a non-degenerate multivariate normal distribution withmean 0 and variance ΣH .

2. For each fixed h ∈ Rk assume that there is random variable R(h) suchthat

(a) For any xn ∈ Rk satisfying√

n(xn − t(ξ)) → h, we have

√n(Rθ(xn)− θ)

D−→ R(h).

(b) There is a general matrix A and a non-negative definite matrixΣR such that AΣHA> = ΣR and R(h) has multivariate normaldistribution with mean Ah and variance ΣR.

3. We consider a collection regions C(X, z, s, γ) ⊂ Rd indexed by randomvariables X, vectors z ∈ Rd, s ∈ Rk, and γ ∈ (0, 1) satisfying:

(a) C(X, z, s, γ) is an open set with boundary of zero Lebesgue mea-sure, i.e., λ(∂C(X, z, s, γ)) = 0 where λ denotes Lebesgue mea-sure.

(b) P (X ∈ C(X, z, s, γ)) = γ.

(c) For all a ∈ R and b ∈ Rd, C(aX+b, az+b, s, γ) = aC(X, z, s, γ)+b.

(d) For all h ∈ Rk, YnD−→ R(h), zn → Ah, sn → t(ξ) and γn → γ

we have C(Yn, zn, sn, γn) → C(R(h), z, t(ξ), γ).

33

Page 34: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Remark 12. The asymptotic behavior described in Condition 2 is actuallyvery common for many fiducial quantities. This is related to the Bernstein-von Mises theorem – see chaper 8 of Le Cam & Yang (2000). While we arenot aware of any specific result proving this type of a theorem for fiducialquantities we believe that such a theorem should follow by a modificationof the proof of the classical Bernstein-von Mises theorem. Unfortunately,we will not be able to pursue details of such a modification in this articlethough we do offer additional remarks on this subject in Section 9. Theexact conditions under which Bernstein-von Mises theorem holds for fiducialquantities is a subject of ongoing investigation. Finally, we remark that thequantity Sn in condition 1 is related to the centering sequences common inthe asymptotic theory of the likelihood.

Remark 13. Condition 3 is concerned with a choice of a shape of the con-fidence region. There are many such regions of probability 1 − γ available.Condition 3 is therefore singling out a particular collection of such sets. Forexample, if d = 1 one of the typical choices is the upper confidence region,C(X, γ) = (−∞, q(X, γ)), where q(X, γ) is the γ-quantile of the distributionof X. Other choices are the lower confidence region and the two sided, equaltailed region. If d > 1 we can also consider the equal tailed regions. In factthe conditions on the region so flexible that they allow most typical multiplecomparison regions. We will demonstrate this in example 12 in Section 7.

Theorem 1. Suppose Assumptions 1 hold and γn → γ. Furthermore assumethat there is a function ζ : Rk → Rd such that for any sn ∈ Rk satisfying√

n(sn − t(ξ)) → h, we have

√n(ζ(sn)− θ) → Ah, (19)

where the matrix A was defined in 2b.Then

limn→∞

Pξ (θ ∈ C(Rθ(S), ζ(S), S, γn)) = γ.

In particular C(Rθ(S), ζ(S), S, γ) is a confidence region for θ with asymptoticcoverage probability equal to γ.

Proof. By assumption 1 and Skorokhod’s representation (Billingsley 1995)theorem imply that we can assume without loss of generality that

√n (S− t(ξ)) → H a.s. (20)

34

Page 35: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

This, assumptions 2a, and (19) assure

√n(Rθ(S)− θ)

D−→ R(H) a.s.√

n(ζ(S)− θ) → AH a.s. (21)

(Here the a.s. means for almost all sample paths of the process Sn and subse-quently almost all values of H.) Therefore by (20), (21) and assumption 3d

C(√

n(Rθ(S)− θ),√

n(ζ(S)− θ), S, γn

)→ C (R(H), AH, t(ξ), γ) a.s. (22)

Also, by assumption 3c we see that

Pξ (θ ∈ Cn(Rθ(S), ζ(S), S, γn)) = Pξ

(0 ∈ C(

√n(Rθ(S)− θ),

√n(ζ(S)− θ), S, γn)

).

To finish the proof, we will show the following convergence:

Pξ(0 ∈ C(√

n(Rθ(S)−θ),√

n(ζ(S)−θ), S, γn)) → Pξ(0 ∈ C (R(H), AH, t(ξ), γ)).

First notice that R(h)−Ah has a multivariate normal distribution withmean zero and covariance matrix ΣR. It is the same distribution as thedistribution of −AH. Assumption 2a implies

{h : 0 ∈ C(R(h), Ah, t(ξ), γ))} = {h : −Ah ∈ C(R(h)− Ah, 0, t(ξ), γ)}= {h : −Ah ∈ C(−AH, 0, t(ξ), γ)}. (23)

For simplicity of notation denote Hn =√

n(S− t(ξ)). Also denote

Bn = {h : 0 ∈ C(√

n(Rθ

(t(ξ) + h/

√n)−θ),

√n(ζ

(t(ξ) + h/

√n)−θ), (t(ξ)+h/

√n), γn)}

and B = {h : 0 ∈ C(R(h), Ah, t(ξ), γ)}. The sets are chosen to satisfy

{0 ∈ C(√

n(Rθ(S)− θ),√

n(ζ(S)− θ), S, γn)} = {Hn ∈ Bn}

and{0 ∈ C(R(H), AH, t(ξ), γ)} = {H ∈ B}.

As noted before we have HnD−→ H. Moreover assumptions 2b and 3d

implies that B is open, ∂B = {h : 0 ∈ ∂C(Rθ(h), γ)} and Bn → B. Assump-tion 3a and (23) additionally imply that P (H ∈ ∂B) = 0.

Denote Dm =⋃∞

k=m Bk \ (⋂∞

k=m Bk)◦. Notice that by assumption 3d we

have Dm ↓ D ⊂ ∂B and P (H ∈ D) = 0. Moreover if m ≤ n, Bn4B ⊂ Dm.

35

Page 36: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Fix an ε > 0. Continuity of probability implies that there is m1 such thatPξ(H ∈ Cm1) < ε. Consequently convergence in distribution implies thatthere is m2 such that for all n > m2, Pξ(Hn ∈ Cm2) < ε. This implies thatfor n > max(m1, m2)

|Pξ(Hn ∈ Bn)− P (Hn ∈ B)| ≤ P (Hn ∈ Cm1) < ε.

Finally notice that

|Pξ(Hn ∈ Bn)− Pξ(H ∈ B)|≤ |Pξ(Hn ∈ Bn)− Pξ(Hn ∈ b)|+ |Pξ(H ∈ Bn)− Pξ(H ∈ B)|.

Thus the assumption 3b and (23) together with the definition of conver-gence in distribution imply

Pξ(0 ∈ C(√

n(Rθ(S)− θ),√

n(ζ(S)− θ), S, γn) = Pξ(Hn ∈ Bn)

→ Pξ(H ∈ B) = Pξ(0 ∈ C (R(H), AH, t(ξ), γ) = γ.

This concludes the proof of the theorem.

Remark 14. It is fairly straightforward to generalize the statements of Theo-rem 1 for distributions that are not in the domain of attraction of the normaldistribution. Some examples in that direction have been explored in (Han-nig et al. 2006b). However, the main ideas are better demonstrated withinthe setting we have chosen. In particular the key condition 2b is easier tounderstand if the limiting distribution is normal.

The main issue faced when dealing with continuous distributions is relatedto the need to use conditioning. In this section we first prove a corollary tothe general theorem showing that under some suitable conditions the fiducialdistribution is approximately correct even in the presence of conditioning. Weillustrate this with examples.

Assume that the set Q(x, u) defined in (4) is either a singleton or empty.Additionally assume there are functions Rθ(x, u) and R0(x, u) satisfying

{Rθ(x, u)} = Q(x, u) and R0(x, u) = 0 whenever Q(x, u) 6= ∅.

The fiducial distribution in (5) can be then interpreted as the distribution of

Rθ(x, U) |R0(x, U) = 0. (24)

We will now state assumptions under which confidence regions based on(24) lead to asymptotically correct inference.

36

Page 37: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Assumptions 2. For a fixed γ ∈ (0, 1) assume the following

1. There exist t(ξ) ∈ Rk such that

√n (S1 − t1(ξ), . . . , Sk − tk(ξ))

D−→ H = (H1, . . . , Hk)>, (25)

where H has a non-degenerate multivariate normal distribution withmean 0 and variance ΣH .

2. There are matrices Aθ and A0 such that For each fixed h ∈ Rk and forany xn ∈ Rk satisfying

√n(xn − t(ξ)) → h

(a)

Rn(x) =√

n

((Rθ(xn, U)R0(xn, U)

)−(

θ0

))D−→(

A0

)(h−H?) = R(h).

(26)Here H? is independent of and has the same distribution as H.

(b) Assume that R(h) (defined on the right-hand-side of (26)) has anon-degenerate normal distribution. The density of Rn(x) (de-noted by fn(rθ, r0)) converges to the density of R . Moreover, foreach fixed rθ the functions fn(rθ, · ) are uniformly integrable.

3. We consider region C(X, z, s, γ) to satisfy Assumptions 1.3 with matrixA = Aθ − AθΣHA>

0 (A0ΣHA>0 )−1A0.

Theorem 2. Suppose Assumptions 2 hold, γn → γ and there is a functionζ satisfying the same condition as in Theorem 1. Then

limn→∞

EPξ (θ ∈ C(Rθ(s, U?)|R0(s, U

?) = 0, ζ(S), S, γn)|S = s) = γ.

In particular for each observed s, C(Rθ(s, U?)|R0(s, U

?) = 0, γ) is a confi-dence region for θ with asymptotic coverage probability equal to γ.

Proof. By Assumption 2 and Skorokhod’s representation (Billingsley 1995)theorem imply that √

n (S− t(ξ)) → H a.s.

This together with assumption 2.2a and 2.2b this assures that

√n(Rθ(S, U?)− θ)|Rθ(S, U?) = 0

D−→ Aθ(H−H?)|A0(H−H?) = 0 a.s.

37

Page 38: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

(Here the a.s. again means for almost all values of Sn and H.)Set A = Aθ−AθΣHA>

0 (A0ΣHA>0 )−1A0. Random vector Aθ(h−H?)|A0(h−

H?) has a normal distribution with mean Ah and variance

AθΣHA>θ − AθΣHA>

0 (A0ΣHA>0 )−1A0ΣHA>

θ ,

which is also the variance of AH. This verifies the Condition 2b. The theoremnow follows by Theorem 1.

6 Discrete Distributions

In this section we will explore some issues related to fiducial inference fordiscrete distributions. We show that the conditions of Theorem 1 can bedirectly verified for the most common discrete distributions.

Example 8. (Continuation of Example 3) Let X1, . . . , Xn be i.i.d. Multinomial(p)random variables, where p = (p1, p2, . . . , pk), pj ∈ (0, 1), j = 1, . . . , k, and∑k

j=1 pj < 1. The fiducial distribution for this model was derived in (8).Using Theorem 1 we now show that inference based on Rp(s) has good

frequentist properties asymptotically. Since we will consider equal tailed re-gions based on the distribution of Rp(s), the Assumptions 1.3 are automat-ically verified. Denote Sj the number of times we observe value j among theX1, . . . , Xn. Recall that S = (S1, . . . , Sk)

> has a multinomial(n, p1, . . . , pk)distribution. Therefore

√n(S − p) → H, where H ∼ N(0, Σ) and Σ =

Diag(p)− pp>. This verifies assumption 1.1.Notice that for any sequence of integers kj, where 0 ≤ kj ≤ j, we have

n(Ukn+1:n −Ukn:n)D−→ Γ(1, 1). Fix h, set s = (p + h/

√n) and denote Wn =

(Us1:n, Us1+s2:n, . . . , Us1+···+sk:n). A simple calculation shows that√

n(Wn −q)

D−→ N(g, Σ), where gj =∑j

k=1 hk and Σi,j = min(qi, qj)(1−max(qi, qj)),

with qj =∑j

k=1 pk. Thus by Slutsky’s theorem

√n(Rp(S)− p)

D−→ N(h, Σ).

The Assumptions 1 are verified.In particular we can conclude that fiducial confidence sets will have

asymptotically correct frequentist coverage regardless of the choice of thedistribution V (· ).

38

Page 39: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

We now derive a weak fiducial distribution for a sample from a generalone-parameter discrete distribution.

Example 9. Assume that X is a discrete random variable and ξ ∈ R. Thenthe function G can be chosen to satisfy s = G(u, ξ) if and only if P (X <s|ξ) < u ≤ P (X ≤ s|ξ). Additionally assume that P (X ≤ s|ξ) is a monotonecontinuous function in ξ. Then we can define functions q+ and q− satisfying

q+(x, u) = ξ if P (X ≤ x|ξ) = u, q−(x, u) = ξ if P (X < x|ξ) = u.

Finally, assume that V (a, b) is the uniform distribution on (a, b). Then

Rξ(x) = V (Q(x, U?)) = q−(x, U?) + {q+(x, U?)− q−(x, U?))}U ,

where x is the observed value of X and U? and U are independent uniform(0, 1)random variables.

Example 10. Let X be a Poisson(λ) random variable. As in example 9 defineq+(x, u) = λ if P (X ≤ x|λ) = u and q−(x, u) = λ if P (X < x|λ) = u.

Notice that q−(x, U) < λ < q+(x, U) if and only if the Poisson randomvariable G(U, λ) = x. But using an appropriate Poisson process we canrewrite this equality as E(x) < λ < E(x) + E, where E(x) is Gamma(x, 1)random variable and E is exponential(1) random variable independent ofE(x). Notice that E(x) and E are independent of X. Thus the weak FQcould be written as

Rλ(x) = V (Q(x, U?)) = E(x) + UE, (27)

where x is the observed value of X and U is uniform(0, 1) independent ofX, E(x) and E.

If we choose U to have Beta(1/2, 1/2) instead of Uniform(0, 1) in (27),Rp will have Gamma(x + 1/2) distribution which again corresponds to theBayesian solution using Jeffreys prior. A particularly interesting case is U = 1which leads to Gamma(x + 1) distribution. This is the scaled likelihoodfunction L(λ; x)/

∫∞0

L(λ; x) dλ.Let us now consider X1, . . . , Xn i.i.d. Poisson(λ) random variables and

denote S = 1n(X1 + · · ·+ Xn). Clearly

Rλ(x) =E(ns) + UE

n

is a weak FQ for λ.

39

Page 40: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

By central limit theorem√

n(S−λ)D−→ N(0, λ) and a simple calculation

shows that if s = λ + h/√

n

G(ns)− λ√n

D−→ N(h, λ)

This verifies the Assumptions 1.

7 Continuous Distributions

Example 11. The first example is motivated by an unbalanced variance com-ponents model. Such models arise in heritability studies in animal breedingexperiments (Burch & Iyer 1997), quality improvement studies in manu-facturing processes (Burdick, Borror & Montgomery 2005), characterizingsources of error in general variance components models (Liao & Iyer 2004),and in many other applications. In the simplest case one has the followingnormal components of variance model.

Yij = µ + Ai + eij

where µ is an unknown parameter, Ai are iid N(0, φ), eij are iid N(0, θ), andall random variables are jointly independent. In metrology, Yij might be thediameter measurement on a part (ball-bearing) and µ is the mean diameterof the population of ball-bearings output by the process. A random sampleof a ball-bearings are selected. The true diameter of the ith ball-bearing isµ + Ai. Ball-bearing i is measured ni times. If ni = n for all i then we havea balanced one-way random effects model. In the case of unequal ni we havean unbalanced one-way random model. In the balanced case the completesufficient statistics are well known (Searle, Casella & McCulloch 1992). In theunbalaneced case the minimal sufficient statistics are incomplete. Inferenceabout φ and θ is typically based on K independent quadratic forms whichhave scaled chi-square distributions and whose expected values have the formθ + ciφ for some known ci, i = 1, . . . , K. The simplest challenging case isK = 3. Hence we consider the following scenario and illustrate our procedurefor obtaining a weak fiducial distribution for (φ, θ).

Let

S1 =(c1φ + θ)U1

n1

, S2 =(c2φ + θ)U2

n2

, S3 =U3

n3

θ,

40

Page 41: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

where c1 > c2 > 0, and U1, U2, U3 are independent chi-square random variablewith n1, n2, n3 numbers of degrees of freedom respectively.

Solving the first two equations for φ and θ and then plugging the resultsinto the third equation suggests defining

W1 =n1s1

(c1 − c2)U1

− n2s2

(c1 − c2)U2

, W2 = − c2n1s1

(c1 − c2)U1

+c1n2s2

(c1 − c2)U2

,

W3 =U3

n3

(− c2n1s1

(c1 − c2)U1

+c1n2s2

(c1 − c2)U2

).

Here s1, s2, s3 are again the observed values of the statistics S1, S2, S3. Thefiducial distribution of φ, θ defined by (5) then could be interpreted as theconditional distribution of W1, W2|W3 = s3.

A routine calculation shows that the joint density of W1, W2, W3 is

fW(w1, w2, w3; s1, s2)

=|c1 − c2|(n1s1)

n12 (n2s2)

n22 n

n32

3 wn32−1

3 exp[−1

2

(n1s1

w1c1+w2+ n2s2

w1c2+w2+ n3w3

w2

)]2

n1+n2+n32 Γ

(n1

2

)Γ(

n2

2

)Γ(

n3

2

)(w1c1 + w2)

n12

+1(w1c2 + w2)n22

+1wn32

2

.

The fiducial distribution of φ, θ has therefore a density

fW(w1, w2, s3; s1, s2)∫∫fW(w′

1, w′2, s3; s1, s2) dw′

1dw′2

.

Consequently, the fiducial distribution of φ is∫fW(w1, w2, s3; s1, s2)dw2∫∫

fW(w′1, w

′2, s3; s1, s2) dw′

1dw′2

.

To set up confidence regions one can use numerical integration. The fidu-cial distribution of φ does not lead to exact frequentist inference. However,simulation results suggest a good practical properties. For details on thesimulation and some generalization we refer reader to a technical report E,Hannig & Iyer (2006).

To show that the fiducial distribution leads at least to asymptoticallyproper frequentist coverage define n = n1 +n2 +n3 and assume that ni/n →pi ∈ (0, 1). Also notice that

√n (S− (c1φ + θ, c2φ + θ, θ))

D−→ H, where

H ∼ N

0,

2(c1φ+θ)2

p10 0

0 2(c2φ+θ)2

p20

0 0 2θ2

p3

(28)

41

Page 42: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

As in Assumptions 2 define

Z1 =√

n(W1 − φ), Z2 =√

n(W2 − θ), Z3 =√

n(W3 − s3),

and set h = (h1, h2, h3) to satisfy

s1 = (c1φ + θ) +h1√n

, s2 = (c2φ + θ) +h2√n

, s3 = θ +h3√n

.

Then the density of (Z1, Z2, Z3) is

f(z1, z2, z3) = fW

(φ +

z1√n

, θ +z2√n

, θ +h3√n

+z1√n

;

c1φ + θ +h1√n

, c2φ + θ +h2√n

)n−3/2.

This function is bounded from above by an integrable function. Moreover, ifwe set

B =

(Aθ

A0

)=

1c1−c2

− 1c1−c2

0

− c2c1−c2

c1c1−c2

0

− c2c1−c2

c1c1−c2

−1

then f(z1, z2, z3) converges as n → ∞ to a multivariate normal density. In

fact, it is the density of the random variable B(h −H), where H is definedin (28). Thus Lebesgue dominated theorem and Theorem 2 imply that confi-dence intervals based on the fiducial density of φ have asymptotically correctfrequentist properties.

Example 12. We include this last example to show that the regions defined inAssumptions 1.3 are flexible enough to allow for typical multiple comparisonintervals.

Suppose that for each i = 1, . . . , K, Yij, j = 1, . . . , ni is i.i.d. N(µi, σ2i ).

The K samples are also assumed independent of each other. We are interestedin the problem of constructing simultaneous confidence intervals for δij =µi − µj for all i 6= j.

We first observe that by independence the fiducial distribution for δij isthe same as the distribution of the FGPQ given by

Rδij(S, S?, ξ) = Rµi

−Rµj

where

Rµp = Yp −Sp

S?p

(Y ?p − µp)

42

Page 43: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

is the FGPQ for µp (see Example 6).Define

D(S, S∗, ξ) = maxi6=j

∣∣∣∣∣(Y i − Y j)−Rδij(S, S∗, ξ)√

Vij

∣∣∣∣∣where Vij is a consistent estimator of the variance of Y i−Y j, i.e., Vij =

S2i

ni+

S2j

nj. The 100(1 − α)% two-sided simultaneous FGCIs for pairwise diferences

δij, i 6= j of means of more than two independent normal distributions are[Lij, Uij] where

Lij = Y i − Y j − d1−α

√Vij,

Uij = Y i − Y j + d1−α

√Vij

(29)

and dγ denotes the 100γ-percentile of the conditional distribution ofD(S, S∗, ξ)given S = s.

To set up confidence regions one can use simulation. The simultaneousfiducial confidence intervals for δij do not lead to exact frequentist infer-ence. However, simulation results suggest very good practical properties.For details on the simulation and some generalization we refer reader to(Abdel-Karim 2005) and (Hannig, E, Abdel-Karim & Iyer 2006a).

To show that the fiducial distribution leads at least to asymptoticallyproper frequentist coverage define n =

∑Kk=1 nk and assume that ni/n →

pi ∈ (0, 1). It is fairly straightforward to see that S = (Y 1, S21 , . . . , Y K , S2

K)>

satisfies Assumption 1.1. Similarly, R = (Rδ12 ,Rδ13 . . . ,Rδ(K−1)K)> satisfies

the Assumptions 1.2, with a K(K − 1)/2× 2K matrix

A =

1 0 −1 0 0 0 · · · 0 0 0 01 0 0 0 −1 0 · · · 0 0 0 0...

......

......

.... . .

......

......

0 0 0 0 0 0 · · · 1 0 −1 0

.

Similarly, the assumption in (19) will be satisfied with the function ζ(S) =A · S

Finally, we need to show that the region described in (29) satisfies As-sumptions 1.3. To that end observe that the conditional distribution ofD(S, S∗, ξ)|S could be represented as function of distribution of R, ζ(S) andS. Here, the estimator of variance nVij could be expressed as a continu-ous function of S. The various conditions of this assumption now follow bySlutsky’s lemma and simple algebra.

43

Page 44: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

8 Non-uniqueness of fiducial distribution

The fiducial recipe of Section 3 seems to provide an approach for derivingstatistical procedures that have good properties. Unfortunately, it does notlead to a unique fiducial distribution.

There are two main sources of the non-uniqueness. The more obviousone is the fact that the sets Q(X, U∗) might have more than one element.This means that we would not be able to find the exact value of ξ even if weknew both X and U . Consequently, the data itself is not able to tell us whichdistribution value of ξ was used. In order to resolve this non-uniqueness onehas to have some apriori way of choosing between the elements of Q(X, U∗).Fortunately, in most application when Assumptions 1 are satisfied we alsoobserve that in particular

√n diam(Q(X, U∗)) → 0. This means that in

these cases the role of the apriori information is negligible asymptotically.Of courses such a situation can be expected only in parametric problems.However, just like the choice of a prior in Bayesian methods, the apriori choiceof V (Q(x, u)) will play a big role in non-parametric and semi-paramtericproblems.

Based on our experience with the problems we investigated we recommendthe use of V (Q(x, u)) that is independent of the data and that maximizesthe determinant of the variance of the fiducial distribution. Another usefuloption is to use the uniform distribution on Q(x, u). This second optionshould work reasonably well and be reasonably easy to implement even if wedeal with higher dimensional problems.

Another way of resolving this problem is using upper and lower probabil-ities, cf. Dempster (1968). In particular, instead of defining a single fiducialdistribution on the parameter space we define an upper and lower fiducial dis-tribution. In our setting the upper probability is obtained as the supremumover possible choices of the distributions V (Q(x, u)) while the lower prob-ability is obtained as the infimum over possible choices of the distributionsV (Q(x, u)). Therefore if one refuses to use any subjective prior informationone can still use the fiducial recipe for obtaining statistical procedures usingthe upper and lower probabilities.

The second source of non-uniqueness is caused by the Borel paradox. Ifin the fiducial recipe (5) we have P (Q(x, u) 6= ∅) = 0, the resulting fiducialdistribution depends on the way we decide to interpret the conditioning.We consider this to be a more severe problem because it is much harder toinvestigate and resolve. To demonstrate the severity of the situation, consider

44

Page 45: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

the following continuation of example 6.

Example 13. Let X1, . . . , Xn be i.i.d. N(µ, σ2). In Example 6 we showed twodifferent way of implementing the fiducial recipe that both led to the samedesirable solution. Unfortunately, there are many other ways of implementingthe fiducial recipe that do not lead to good solutions. We will demonstrateone of them here.

We again write the structural equation as

Xi = µ + σZi, i = 1, . . . , n.

For simplicity of notation assume that n is even, i.e., n = 2k. Define

Mj =z2j−1x2j − z2jx2j−1

z2j−1 − z2j

, Hj =

(x2j−1 − x2j

z2j−1 − z2j

)2

j = 1, . . . , k.

Therefore we can write

Q(x1, . . . , xn; z1, . . . , zn) =

{(M1, H1)}

if Mj = M1, Hj = H1, j = 2, . . . , k

∅ otherwise.

Defining Dj,1 = Mj−M1, Dj,2 = Hj−H1, j = 2, . . . k, we can interpret thefiducial distribution (5) as the conditional distribution of (M1, H1)|D = 0. Asimple calculation shows that this conditional distribution has density

fR(µ,σ2)(m, h) =

e−(m−xn)2

2h/n− (n−1)s2n

2h ((n− 1)s2n)n− 3

2√π/n Γ

(n− 3

2

)2n−1hn

I(0,∞)(h). (30)

Here xn =∑n

i=1 xi/n and s2n =

∑ni=1(xi − xn)2/(n − 1). The distribution

derived in (30) is different from the one derived in (14). In fact inferencebased on (30) will not lead to correct frequentist inference. In particularthe confidence intervals on variance will be too large. In fact the coverageprobability of any lower tail confidence interval will converge to 0 as n →∞.

The problem illustrated in examples 6 and 13 is an instance of Borelparadox – see for example Section 4.9.3 of Casella & Berger (2002) and alsoHannig (1996) for a thorough discussion of this paradox. The main messageof Borel paradox is that conditioning on an event of probability zero greatlydepends on the context in which we interpret the conditions.

45

Page 46: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Consider in particular X|Y = 0, where (X,Y ) is jointly continuous.There is a random variable U , such that (X, U) is jointly continuous, thesets {Y = 0} = {U = 0}, but the conditional density of X|Y = 0 is differ-ent from the conditional density X|U = 0, even though the condition is thesame in both cases. Since there is no theoretical reason that would deemeither X|Y = 0 or X|U = 0 as superior to the other, people often rely onthe context of the problem to make the choice, e.g., conditional distributionsin regression settings. However, one can often come up with modification ofthe “story” behind the problem that leads naturally to a different choice ofthe conditioning variable. This then can be then presented as a paradox –two apparently equivalent formulations of the same statistical problem leadto different answers.

The interpretation of the conditioning we used in example 13 is “legal”.However, it does not appear intuitively desirable, because it is unnecessarilycomplicated in comparison to the conditioning in example 6. In the remainderof this section we will explore two more ways of interpreting the conditionaldistribution in (5). They also lead to different answers reaffirming Borel’sparadox.

Example 14. Another important way of interpreting the conditional proba-bility is through the following limiting process. Let x ∈ Rn and define a cubexε = (x1 − ε, x1 + ε) × · · · × (xn − ε, xn + ε). Let us also assume that theX ∈ Rd is a continuous random vector with distribution distribution indexedby parameter ξ ∈ Ξ, where the parameter space Ξ is an open subset of Rp.Denote its density by fX(x|ξ). Additionally assume the cardinality of the set(c.f., (4)) |Q(x, u)| ≤ 1. Finally assume that for all x there is an ε > 0 andC < ∞ such that for all y ∈ xε we have

∫Ξ

fX(y|ξ)dξ < C. Then the densityof the conditional distribution in the definition of fiducial distribution (5)can be interpreted as

r(ξ|x) = limε→0

P (G(U?, ξ) ∈ xε)

P (there exists ξ, G(U?, ξ) ∈ xε)=

fX(x|ξ)∫Ξ

fX(x|ξ)dξ. (31)

The second equality of (31) follows from the bounded convergence theoremand the fact that P (G(U?, ξ) ∈ xε) =

∫xε fX(y|ξ)dy.

The result of (31) implies that, under our conditions, the Bayesian pos-terior with respect to the flat prior, i.e., the scaled likelihood, could be un-derstood as a fiducial distribution. This is rather amusing as it was Fisher’sstrong dislike of this particular Bayesian posterior that led to his invention of

46

Page 47: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

fiducial inference. The conditions we imposed to derive (31) are very strong.In fact the same conclusion could be derived under much milder conditions.

It is a well known fact that Bayesian posterior distribution with respectto a flat prior displays some unfavorable frequentist behavior. In fact otherpriors often lead to better performance. The fiducial setting allows us to giveanother argument for illustrating this phenomenon.

Example 15. Let us assume that the parameter of interest ξ is p-dimensional.Recall the structural equation (3) X = G(U, ξ). Write G = (g1, . . . , gn) so thatXi = gi(U, ξ) for i = 1, . . . , n.

Furthermore assume that X0 = (X1, . . . , Xp) and G0 = (g1, . . . , gp). Weassume that, for each fixed u ∈ (0, 1), the mapping G0(u, · ) is invertible. Wedenote this inverse mapping by Q0(x0, u) = (q1(s0, u), . . . , qp(s0, u)). Thuswe have Q0(G0(u, ξ), u) = ξ.

Now let Xc = (Xp+1, . . . , Xn) and Gc = (gp+1, . . . , gk). Substituting ξ =G0(X0, U) in the equations Xj = gj(U, ξ), j = p+1, . . . , k, we get the identity

Xc = Gc(U,Q0(X0, U)). (32)

Therefore the observed values x have to lie on a p-dimensional manifoldin order for Q(x, u?) 6= ∅ . Moreover the fiducial distribution (5) can beinterpreted as the limiting distribution of

limε→0

Q0(x0, U?)|Gc(U

?, Q0(x0, U?) ∈ xε

c (33)

If the random vector Q0(x0, U?), Gc(U

?, Q0(x0, U?)) is jointly continuous the

limiting distribution in (33) is well defined and unique. In fact, it is theconditional density of Q0(x0, U

?)|Gc(U?, Q0(x0, U

?) = xc.We feel that this interpretation of the conditional distribution in (5) is

the most appealing. Since the data must lie on a p-dimensional manifold it ismuch preferable to increase the width of only (n−p)-dimensional observationxc as opposed to increasing the width of the whole n-dimensional observationx as done in the limiting arguments used in example 14. This is based on theheuristic argument that increasing the width will unnecessarily lead to loss ofinformation and is supported by the fact that in most practical situations weare aware that (33) gives rise to statistical procedures with better propertiesthan those based on (31).

Given u, by virtue of (32), it follows that X must lie on the manifoldM(u). Note that the same manifold may be definable using a different set

47

Page 48: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

of equations leading to possibly different defining distribution of (33). Wesuggest that in this case we assume that the selection of observations into x0

was done randomly, i.e., we average over all possible assignments. After tak-ing this average we take the limit ε → 0 in (33). This procedure was used inexample 7 and is the reason why the terms of the type

∑1≤i<j≤s |vj − vi|/

(s2

)appeared there.

Remark 15. Fraser (1968) has linked fiducial inference with group structure.A very good explanation of his ideas can be found in Appnedix 3 of Dawidet al. (1973). The main advantage of Fraser’s assumption is the fact that theset Q(x, u) is trivially guaranteed to have at most 1 element for all choices ofx and u. Thus, the first source of non-uniqueness, the choice of a particularelement in Q(x, u) is not a problem. Unfortunately, the second source ofnon-uniqueness, Borel paradox, is still present. We again need to interpret aconditional probability that is conditioned on an event that has probability0. Fraser implicitly assumes a particular way of interpreting the conditionalprobability in (5) as suggested by the group structure. His interpretation is infact very similar to our recommendation described in example 15. However,the problem is still present as demonstrated by an example in Dawid et al.(1973), where the authors show that addition of information clearly irrelevantto the inference leads to a different fiducial distribution. This is caused bythe fact that addition of the information changes the description of the seton which we are conditioning and therefore it leads to a different answer, i.e.,by Borel paradox.

Remark 16. One particular way of avoiding the Borel paradox presents itselfin the case when the parameter space is an open set in Rp and the model al-lows for a p-dimensional complete sufficient statistic that is a smooth functionof the data. In this case we can first reduce the data by obtaining completesufficient statistics and then apply the fiducial recipe to the distribution ofthe complete sufficient statistics. A simple Jacobian calculation shows thatthe fiducial distribution will be independent of the particular form of thecomplete sufficient statistics we used. This idea has been used in the firstpart of Example 6.

9 Comparison with Bayesian Inference

The fact that Bayesian and fiducial inference have a strong connection is wellestablished. See for example Fraser (1961) and Bondar (1972). In this section

48

Page 49: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

we first discuss yet another similarity between the two approaches. We arguethat the non-uniqueness due to conditioning for fiducial inference is similarto the non-uniqueness due to the choice of a prior in Bayesian inference. Toexplain our ideas consider the following example:

Example 16. Let us also assume that X ∈ Rd is a continuous random vectorwith distribution indexed by parameter ξ ∈ Ξ, where the parameter spaceΞ is an open subset of Rk. Denote the density of X by fX(x|ξ). Assumethat there is an n- dimensional random variable E with density fE(e) thatdoes not depend on the paprameter ξ such that the structural equation (3)is expressible as

X = G(ξ, E), (34)

where the function G(ξ, · ) is one-to-one and differentiable for each fixedξ. Denote the inverse of the function G(ξ, e) taken as a function of e asG−1(x, ξ). Thus we have

fX(x|ξ) = fE(G−1(x, ξ))JG−1(x, ξ), (35)

where JG1(x, ξ) is the Jacobian of G−1(x, ξ).Additionally assume the cardinality of the set (c.f., (4)) |Q(x, u)| ≤ 1. De-

note the observed value of X by x and assume, that there are a p-dimensionalfunction Hξ(x, E?) and an (n− p)-dimensional function Hc(x, E?) such thatthe definition of the fiducial distribution (5) can be expressed as

Hξ(x, E?) |Hc(x, E?) = 0. (36)

This means in particular that if x = G(ξ, e), then Hξ(x, e) = ξ and Hc(x, e) =0. Let H = (Hξ,Hc) and assume that the function H(ξ, · ) is one-to-one anddifferentiable for each fixed x. Denote the inverse of the function H(x, e)taken as a function of e as H−1(h,x). Notice that the assumptions on thefunctions H and G imply that for any ζ ∈ Ξ, H−1((ζ,0),x) = e if and onlyif G−1(x, ζ) = e. Therefore the density of H(x, E?) at the point (ζ, 0) is

fH(ζ, 0) = fE(H−1((ζ,0),x))JH−1((ζ, 0),x). (37)

By comparing (35) and (37) we get

fH(ξ, 0) = fX(x|ξ)J(x, ξ) where J(x, ξ) =JH−1((ζ, 0),x)

JG1(x, ξ).

49

Page 50: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Therefore the conditional density of the distribution described in (36) is

r(ξ|x) =fX(x|ξ)J(x, ξ)∫

ΞfX(x|ξ′)J(x, ξ′) dξ′

. (38)

The equation (38) is similar to the definition of the usual Bayes posterior.However the role of the prior is taken by the fraction of Jacobians J(x, ξ). Adifferent choice of the functions G and H could lead to different Jacobiansand consequently to a different fiducial distribution. In other words the choiceof the structural equation and the description of the conditional distribution(36) has a similar effect as the choice of a reference prior has on Bayesianmethods.

There are several very important consequences of formula (38). First wesee that a fiducial distribution is Bayesian posterior if and only if J(x, ξ) =k(x)l(ξ) where k and l are measurable functions. However J(x, ξ) does nothave to decompose in this way, in which case the fiducial distribution is nota fiducial distribution with respect to any prior. A classical example of thisis in Grundy (1956). On the other hand, for any J(x, ξ) = k(x)l(ξ) obtainedfrom a reasonable G and H, l(ξ) could be considered as a reference prior.

Next we discuss the connections between asymptotic behavior of fiducialand Bayesian procedures.

Remark 17. Let X1, X2 . . . be random variables with a continuous distribu-tion depending on ξ ∈ Ξ. Also let Xn = (X1, . . . , Xn) and X = (X1, X2, . . .).We will investigate the limit as n → ∞. Assume that the conditions of ex-ample 16 are satisfied for all n and ξ is the value that was used to generatethe data. If the function

Jn

(xn, ξ +

ζ√n

)→ k(x), (39)

one can expect that the fiducial distribution will satisfy the all-importantcondition 2 of Assumption 1 if and only if the Bayesian posterior with respectto the flat prior does, provided some additional technical conditions suchas uniform integrability are satisfied. However, the question of when theBayesian posterior satisfies condition 2 has been well studied and the answeris captured by the Bernstein-von Mises theorem – see for example Le Cam& Yang (2000).

It is particularly interesting to study the result of example 16 when thefunction H is selected following the recommendation in example 15. This isdescribed in our final example below.

50

Page 51: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Example 17. As in example 16, let X = G(ξ, E). Moreover, in a mannersimilar to example 15, set X0 = (X1, . . . , Xp), E0 = (E1, . . . , Ep), Xc =(Xp+1, . . . , Xn) and Ec = (Ep+1, . . . , En). Assume that G = (G0,Gc), where

X0 = G0(ξ, E0) and Xc = Gc(ξ, Ec).

Now assume that for each fixed ξ ∈ Ξ the functions G0(ξ, · ) and Gc(ξ, · )are one-to-one and differentiable. Thus

fX(x|ξ) = fE(G−10 (x0, ξ),G

−1c (xc, ξ))JG−1

0(x0, ξ)JG−1

c(xc, ξ).

To define the function H assume additionally that, for each fixed e0,the mapping G0(· , e0) is invertible and differentiable. Denote this inversemapping by Hξ(x0, e0) and define

Hc(x, e) = Gc(Hξ(x0, e0), ec)− xc.

Notice, that if x = G(ξ, e) then Hξ(x0, e0) = ξ and Hc(x, e) = 0. Finally,for all fixed x

H−1((ζ, s),x) = (G−10 (x0, ζ),G−1

c (xc + s, ζ))

and the Jacobian

JH−1(x, ζ) = JG−10 (x0,· )(x0, ζ)JG−1

c(xc, ζ).

Here JG−10 (x0,· ) is the Jacobian constructed by taking derivatives with respect

to ζ. Thus by comparison we get

r(ξ|x) =fX(x|ξ)J0(x0, ξ)∫

ΞfX(x|ξ′)J0(x0, ξ′) dξ′

(40)

where

J0(x0, ξ) =

∣∣∣∣∣∣det(

ddξ

G−10 (x0, ξ)

)det(

ddx0

G−10 (x0, ξ)

)∣∣∣∣∣∣ . (41)

The quantity J0(x0, ξ) in (41) does not depend on n. Therefore (39)will be satisfied as long as J0 is continuous in ξ. Consequently one shouldexpect that this fiducial distribution will have good frequentist propertiesasymptotically, if and only if Bayesian inference with respect to flat priorhas good frequentist properties asymptotically.

51

Page 52: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Remark 18. The fiducial density described in (40) and (41) could be consid-ered as a generalization of the Fisher’s original definition of fiducial distribu-tion. Indeed, let X be a continuous random variable with density f(x|ξ) anddistribution function F (x|ξ), where ξ ∈ R. We can consider the structuralequation X = F−1(U, ξ). This structural equation leads to

J0(x, ξ) =

∣∣∣ ∂∂ξ

F (x|ξ)∣∣∣

f(x|ξ)

and (40) coincides with Fisher’s definition (1).

10 Conclusions

In this paper we studied the properties of fiducial distributions without re-lying on any additional group assumptions. We have shown how the fiducialargument could be applied to several problems and demonstrated by simula-tion that it leads to statistical procedures with good small sample frequentistproperties. We also investigated the asymptotic properties of fiducial distri-butions and showed that in many examples fiducial distribution has goodasymptotic properties. Thus fiducial inference appears to be a good tool forderiving statistical procedures and should not be ignored by the statisticalcommunity.

Finally we investigated an inherent non-uniqueness of fiducial inferencethat is in some way similar to the non-uniqueness of Bayesian inference dueto the choice of a prior. We argued that the non-uniqueness of fiducial infer-ence is essentially caused by the Borel paradox, the fact that the conditionaldistribution conditioned on an event of probability 0 is not uniquely deter-mined. In fact in our opinion the Borel paradox is the root cause for most ofthe paradoxes associated with fiducial inference. Since Borel paradox cannotbe resolved we believe that there is no way to establish a “paradox free”theory of fiducial inference applicable to a wide range of statistical problems.

11 Acknowledgments

I would like to thank Prof. Iyer, who introduced me to the field of fiducialinference, for many useful conversations that were very critical in developing

52

Page 53: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

this manuscript. I would also like to thank Yuriy Glagovskiy and Prof. Mielkefor finding numerous typos in an earlier version of this manuscript.

References

Abdel-Karim, A. (2005), Applications of Generalized Inference, PhD thesis,Colorado State University, Fort Collins, CO.

Behrens, W. V. (1929), ‘Ein beitrag zur fehlerberchnung bei wenigenbeobachtungen’, Landw. Jb. LXVIII, 807 – 837.

Billingsley, P. (1995), Probability and measure, Wiley Series in Probabilityand Mathematical Statistics, third edn, John Wiley & Sons Inc., NewYork. A Wiley-Interscience Publication.

Bondar, J. V. (1972), ‘Structural distributions without exact transitivity’,Ann. Math. Statist. 43, 326–339.

Brown, L. D., Cai, T. T. & DasGupta, A. (2001), ‘Interval estimation for abinomial proportion’, Statist. Sci. 16(2), 101–133. With comments anda rejoinder by the authors.

Brown, L. D., Cai, T. T. & DasGupta, A. (2002), ‘Confidence intervalsfor a binomial proportion and asymptotic expansions’, Ann. Statist.30(1), 160–201.

Burch, B. D. & Iyer, H. K. (1997), ‘Exact confidence intervals for a varianceratio (or heritability) in a mixed linear model’, Biometrics 53(4), 1318–1333.

Burdick, R. K., Borror, C. M. & Montgomery, D. C. (2005), Design and anal-ysis of gauge R&R studies, ASA-SIAM Series on Statistics and AppliedProbability, Society for Industrial and Applied Mathematics (SIAM),Philadelphia, PA. Making decisions with confidence intervals in randomand mixed ANOVA models.

Cai, T. T. (2005), ‘One-sided confidence intervals in discrete distributions’,J. Statist. Plann. Inference 131(1), 63–88.

53

Page 54: pdfs.semanticscholar.org€¦ · On Fiducial Inference – the good, the bad and the ugly Jan Hannig∗ Department of Statistics Colorado State University February 23, 2006 Abstract

Casella, G. & Berger, R. L. (2002), Statistical inference, The Wadsworth& Brooks/Cole Statistics/Probability Series, 2nd edn, Wadsworth &Brooks/Cole Advanced Books & Software, Pacific Grove, CA.

Dawid, A. P. & Stone, M. (1982), ‘The functional-model basis of fiducialinference’, Ann. Statist. 10(4), 1054–1074. With discussions by G. A.Barnard and by D. A. S. Fraser, and a reply by the authors.

Dawid, A. P., Stone, M. & Zidek, J. V. (1973), ‘Marginalization paradoxes inBayesian and structural inference’, J. Roy. Statist. Soc. Ser. B 35, 189–233. With discussion by D. J. Bartholomew, A. D. McLaren, D. V.Lindley, Bradley Efron, J. Dickey, G. N. Wilkinson, A. P.Dempster, D.V. Hinkley, M. R. Novick, Seymour Geisser, D. A. S. Fraser and A.Zellner, and a reply by A. P. Dawid, M. Stone, and J. V. Zidek.

Dempster, A. P. (1968), ‘A generalization of Bayesian inference. (With discussion)’, J. Roy. Statist. Soc. Ser. B 30, 205–247.

E, L., Hannig, J. & Iyer, H. K. (2006), Fiducial inference for variance components in an unbalanced one-way random model, Technical Report 2006-2, Colorado State University.

Efron, B. (1998), ‘R. A. Fisher in the 21st century (invited paper presented at the 1996 R. A. Fisher Lecture)’, Statist. Sci. 13(2), 95–122. With comments and a rejoinder by the author.

Fisher, R. A. (1930), ‘Inverse probability’, Proceedings of the Cambridge Philosophical Society xxvi, 528–535.

Fisher, R. A. (1935a), ‘The fiducial argument in statistical inference’, Annals of Eugenics VI, 91–98.

Fisher, R. A. (1935b), ‘The logic of inductive inference’, J. Roy. Statist. Soc. Ser. B 98, 29–82.

Fraser, D. A. S. (1961), ‘On fiducial inference’, Ann. Math. Statist. 32, 661–676.

Fraser, D. A. S. (1966), ‘Structural probability and a generalization’, Biometrika 53, 1–9.

Fraser, D. A. S. (1968), The structure of inference, John Wiley & Sons Inc., New York-London-Sydney.

Ghosh, J. K. (1994), Higher Order Asymptotics, NSF-CBMS Regional Conference Series, Institute of Mathematical Statistics, Hayward.

Grundy, P. M. (1956), ‘Fiducial distributions and prior distributions: an example in which the former cannot be associated with the latter’, J. Roy. Statist. Soc. Ser. B 18, 217–221.

Hannig, J. (1996), On conditional distributions as limits of martingales, Mgr. thesis (in Czech), Charles University, Prague, Czech Republic.

Hannig, J., E, L., Abdel-Karim, A. & Iyer, H. K. (2006a), ‘Simultaneous fiducial generalized confidence intervals for ratios of means of lognormal distributions’, Austrian Journal of Statistics, to appear. Proceedings of Perspectives in Modern Statistical Inference III.

Hannig, J., Iyer, H. K. & Patterson, P. (2006b), ‘Fiducial generalized confidence intervals’, J. Amer. Statist. Assoc. 101(473), 254–269.

Iyer, H. K. & Patterson, P. (2002), A recipe for constructing generalized pivotal quantities and generalized confidence intervals, Technical Report 2002/10, Department of Statistics, Colorado State University.

Iyer, H. K., Wang, C. M. J. & Mathew, T. (2004), ‘Models and confidence intervals for true values in interlaboratory trials’, J. Amer. Statist. Assoc. 99(468), 1060–1071.

Jeffreys, H. (1940), ‘Note on the Behrens-Fisher formula’, Ann. Eugenics 10, 48–51.

Krishnamoorthy, K. & Mathew, T. (2002), ‘Assessing occupational exposure via the one-way random effects model with balanced data’, JABES 7(3), 440–451.

Krishnamoorthy, K. & Mathew, T. (2004), ‘One-sided tolerance limits in balanced and unbalanced one-way random models based on generalized confidence intervals’, Technometrics 46(1), 44–52.

Le Cam, L. & Yang, G. L. (2000), Asymptotics in statistics, Springer Series in Statistics, second edn, Springer-Verlag, New York. Some basic concepts.

Liao, C. T. & Iyer, H. K. (2004), ‘A tolerance interval for the normal distribution with several variance components’, Statist. Sinica 14(1), 217–229.

Mood, A. M., Graybill, F. A. & Boes, D. C. (1974), Introduction to the Theory of Statistics, McGraw-Hill Series in Probability and Statistics, third edn, McGraw-Hill.

Salome, D. (1998), Statistical Inference via Fiducial Methods, PhD thesis, University of Groningen.

Searle, S. R., Casella, G. & McCulloch, C. E. (1992), Variance components, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons Inc., New York. A Wiley-Interscience Publication.

Stevens, W. L. (1950), ‘Fiducial limits of the parameter of a discontinuous distribution’, Biometrika 37, 117–129.

Tsui, K.-W. & Weerahandi, S. (1989), ‘Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters’, J. Amer. Statist. Assoc. 84(406), 602–607.

Weerahandi, S. (1993), ‘Generalized confidence intervals’, J. Amer. Statist. Assoc. 88(423), 899–905.

Weerahandi, S. (2004), Generalized inference in repeated measures, Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], Hoboken, NJ. Exact methods in MANOVA and mixed models.

Wilkinson, G. N. (1977), ‘On resolving the controversy in statistical inference’, J. Roy. Statist. Soc. Ser. B 39(2), 119–171. With discussion.

Zabell, S. L. (1992), ‘R. A. Fisher and the fiducial argument’, Statist. Sci. 7(3), 369–387.
