
    Control and Cybernetics

    vol. 36 (2007) No. 4

On some method for diagnosing convergence in MCMC setups via atoms and renewal sets

    by

    Maciej Romaniuk

Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warszawa, Poland

    e-mail: [email protected]

Abstract: MCMC setups are among the best known methods for conducting computer simulations necessary in statistics, physics, biology, etc. However, to obtain appropriate solutions, additional convergence diagnosis must be applied to the trajectory generated by the Markov Chain. In this paper we present a method for dealing with this problem, based on the features of the so-called secondary chain (a chain with a specially selected state space). The secondary chain is created from the initial chain by picking only some observations connected with atoms or renewal sets. The discussed method has some appealing properties, like a high degree of diagnosis automation. Apart from theoretical lemmas, an example of application is also provided.

Keywords: convergence diagnosis, Markov Chain Monte Carlo, Markov Property, atom, renewal set, renewal theory, automated diagnosis of simulations.

    1. Introduction

The end of the previous century brought a colossal improvement in the speed of calculations. Owing to the development of computers, researchers could build more complex, more realistic models. The same applies to mathematics, statistics, physics and biology, where computer simulations are widely used.

One of the best known classes of methods in computer simulations are MCMC (Markov Chain Monte Carlo) algorithms, successors of the MC (Monte Carlo) approach (see Metropolis et al., 1953; Metropolis and Ulam, 1949). They are commonly used in many practical areas (see, e.g., Boos, Zhang, 2000; Booth, Sarkar, 1998; Bremaud, 1999; Doucet et al., 2000; Gelfand et al., 1990; Gilks et al., 1997; Kass et al., 1998; Koronacki et al., 2005; Lasota, Niemiro, 2003; Li et al., 2000; Mehta et al., 2000; Robert, Casella, 2004; Romaniuk, 2003).

The MCMC method is based on a simple observation. In order to find the expected value $E_{\pi_X}h(X)$ for some function $h(\cdot)$ and probability distribution $\pi_X(\cdot)$, we could generate a Markov Chain $X_0, X_1, X_2, \ldots$ with the stationary distribution $\pi_X$. The convergence of the estimator derived from the simulated samples is guaranteed by ergodic theorems (see, e.g., Robert, Casella, 2004, for additional details). Hence, we do not have to generate values directly from $\pi_X(\cdot)$, but may use more general algorithms like the Gibbs sampler or the Metropolis-Hastings algorithm.
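As a point of reference for the discussion below, the following minimal sketch shows how such a simulation is typically organized; the target log-density log_pi, the proposal and the function h are placeholders, not objects taken from this paper.

```python
import numpy as np

def metropolis_hastings(log_pi, propose, log_q, x0, n_steps, rng):
    """Generic Metropolis-Hastings sampler (illustrative sketch).

    log_pi : log-density of the target pi_X (up to a constant),
    propose: y = propose(x, rng) draws a candidate,
    log_q  : log_q(x, y) is the log-density of proposing y from x.
    """
    x, chain = x0, np.empty(n_steps)
    for k in range(n_steps):
        y = propose(x, rng)
        log_alpha = log_pi(y) - log_pi(x) + log_q(y, x) - log_q(x, y)
        if np.log(rng.uniform()) < log_alpha:   # accept/reject step
            x = y
        chain[k] = x
    return chain

# The estimator of E h(X) is then the average of h over the trajectory,
# e.g. np.mean(h(chain[n_stat:])), which motivates the two questions below.
```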

But during the conduct of simulations two questions arise all the time. The first one is connected with choosing an appropriate number of steps $n_{\mathrm{stat}}$ for the simulated trajectory, such that the sampled transition probability $\Pr^{n_{\mathrm{stat}}}_{x_0}(\cdot)$ is close enough to the assumed stationary probability $\pi_X(\cdot)$ regardless of the starting point $x_0$. The second one is related to finding the number of steps $n_2$ such that the estimator of $E_{\pi_X}h(X)$, derived from the sample $X_{n_{\mathrm{stat}}+1}, X_{n_{\mathrm{stat}}+2}, \ldots, X_{n_2}$, has a sufficiently small error, measured e.g. by variance. These two questions are covered by convergence diagnosis and are among the main aspects of MCMC simulations.

There are many various convergence diagnosis methods (see, e.g., Robert, Casella, 2004, for a comparative review). But we have to say that it is not so easy to compare them and find the best one, or even the best ones. Firstly, these methods are very often based on different features of the underlying Markov Chains, e.g. a specific structure of the state space of the Markov Chain. Secondly, the two questions mentioned before are usually written as mathematical formulas not corresponding to one another, i.e. not directly comparable. Thirdly, it is not even possible to draw a comparison between the heuristic and the theoretical (i.e. based on mathematical proofs) methods. Therefore, each new convergence diagnosis method may be seen as an additional tool for experimenters, which gives them a new possibility to check the obtained simulations.

In this paper we discuss a method based on the concept of the so-called secondary chain. Such a chain is derived from the original trajectory by observing the samples only at moments determined by special probabilistic rules. These rules are connected with the notions of atoms and renewal sets, which are a specific example of the more general renewal moments and are part of renewal theory.

The method presented has three main advantages. Firstly, it is supported by strong mathematical reasoning. Therefore, it is far less influenced by the observer's intuition and experience than the entirely heuristic methods. Secondly, the obtained solutions are strict, i.e. they are not asymptotic. Hence, this method is not biased by the additional error introduced by limit theorems. Thirdly, the discussed solutions may be used in a highly automated manner. This gives the possibility to prepare general diagnosis algorithms for a wide class of MCMC problems.

The paper is organized as follows. In Section 2 we present the necessary basic definitions and theorems. Then, in Section 3.1 we introduce the notion of the secondary chain and some fundamental facts about it. In Section 3.2 we formulate two inequalities which are directly connected to the convergence diagnosis questions mentioned before. Then, we find answers to these questions in Sections 3.3 and 3.4 in a few lemmas and more heuristic remarks. These are based on the properties of the secondary chain. In Section 4 we present how the derived results may be applied in the case of a simple example. The concluding remarks are contained in Section 5.

    2. Basic definitions and theorems

In this section we introduce fundamental definitions and theorems. Additional necessary definitions may be found in, e.g., Bremaud (1999), Fishman (1996), Robert and Casella (2004).

Let $(X_i)_{i=0}^\infty = (X_0 = x_0, X_1, \ldots)$ denote a Markov Chain (abbreviated further MC), and let $\mathcal{B}(\mathcal{X})$ be the field of Borel sets for the space $\mathcal{X}$. The chain $(X_i)_{i=0}^\infty$ has its values in a space $\mathcal{X}$, where $\mathcal{X} \subset \mathbb{N}$ or $\mathcal{X} \in \mathcal{B}(\mathbb{R}^k)$. In the first case such an MC is called a discrete MC, and in the second an MC on continuous state space.

Suppose that the chain $(X_i)_{i=0}^\infty$ is ergodic and has an adequate stationary probability distribution $\pi_X(\cdot)$. In this paper the term ergodicity means that the chain is recurrent (or Harris recurrent in the case of an MC on continuous state space $\mathcal{X}$), aperiodic and irreducible.

If $(X_i)_{i=0}^\infty$ is a discrete Markov Chain, we define its transition matrix $P_X$ as
$$P_X = \bigl(\Pr(X_{k+1} = j \mid X_k = i)\bigr)_{i,j=1}^{s_X}, \tag{1}$$
where $s_X$ is the cardinality of $\mathcal{X}$. In the case of continuous state space $\mathcal{X}$, let us denote by $K_X(\cdot,\cdot)$ the transition kernel of this chain:
$$\Pr(X_{k+1} \in B \mid X_k = x) = \int_B K_X(x,y)\,dy. \tag{2}$$

Definition 1. The set $A$ is called an atom if there exists a probability distribution $\nu(\cdot)$ such that
$$\Pr(X_{k+1} \in B \mid X_k = x) = \nu(B) \tag{3}$$
for every $x \in A$ and every $B \in \mathcal{B}(\mathcal{X})$.

Definition 2. The set $A$ is called a renewal set if there exist a real $0 < \epsilon < 1$ and a probability measure $\nu(\cdot)$ such that
$$\Pr(X_{k+1} \in B \mid X_k = x) \geq \epsilon\,\nu(B) \tag{4}$$
for every $x \in A$ and every $B \in \mathcal{B}(\mathcal{X})$.

These two definitions may be found in, e.g., Asmussen (1979), Robert and Casella (2004).

If $A$ is a renewal set, it is advantageous to slightly change the MCMC algorithm which generates the values of $(X_i)_{i=0}^\infty$. It is easily seen that
$$\Pr(X_{k+1} \mid X_k) = \epsilon\,\nu(X_{k+1}) + (1-\epsilon)\,\frac{\Pr(X_{k+1} \mid X_k) - \epsilon\,\nu(X_{k+1})}{1-\epsilon} \tag{5}$$
in the case of a discrete MC, or
$$K(x_k, x_{k+1}) = \epsilon\,\nu(x_{k+1}) + (1-\epsilon)\,\frac{K(x_k, x_{k+1}) - \epsilon\,\nu(x_{k+1})}{1-\epsilon} \tag{6}$$
for an MC on continuous state space $\mathcal{X}$. Hence, we have the following modification of the algorithm: when $X_k \in A$, generate $X_{k+1}$ according to
$$X_{k+1} \sim \begin{cases} \nu(\cdot) & \text{if } U_{k+1} \leq \epsilon, \\ \dfrac{K(x_k,\cdot) - \epsilon\,\nu(\cdot)}{1-\epsilon} & \text{if } U_{k+1} > \epsilon, \end{cases} \tag{7}$$
where the $U_i$ are iid random variables from the uniform distribution on $[0,1]$, independent of $(X_i)_{i=0}^\infty$. Because of (5) or (6), the modification (7) of the MCMC algorithm does not change the properties of the chain. Also, its stationary distribution is still the same, i.e. $\pi_X(\cdot)$. This modification of MCMC algorithms was introduced in Athreya and Ney (1978), Nummelin (1978). The generation according to (7) may be difficult because of the complex structure of the remainder kernel. A way around this problem was shown in Mykland, Tierney and Yu (1995).
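The following minimal sketch illustrates the modified step (7) for a discrete chain, where the remainder kernel can be computed explicitly; the transition matrix P, the set A, the constant eps and the measure nu are assumed inputs satisfying the minorization condition (4), not quantities taken from this paper.

```python
import numpy as np

def split_chain_step(x, P, A, eps, nu, rng):
    """One step of the modified algorithm (7) for a discrete MC.

    Assumes P[x] >= eps * nu componentwise for every x in A, so that
    the remainder kernel below is a valid probability distribution.
    Returns the next state and a flag marking a regeneration.
    """
    if x in A:
        if rng.uniform() <= eps:
            # regeneration: X_{k+1} drawn from nu, independently of x
            return rng.choice(len(nu), p=nu), True
        # remainder kernel (P(x, .) - eps * nu(.)) / (1 - eps)
        remainder = (P[x] - eps * np.asarray(nu)) / (1.0 - eps)
        return rng.choice(len(remainder), p=remainder), False
    return rng.choice(P.shape[1], p=P[x]), False
```

At a regeneration the next value is drawn from $\nu(\cdot)$ alone, so the trajectory after that moment is independent of the past; this is precisely the property exploited by the secondary chain below.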

Definition 3. The atom (or renewal set) $A$ is called a geometrically ergodic atom (or renewal set) if there exist $r > 1$ and $M > 0$ such that
$$|\Pr^n_x(y) - \pi_X(y)| \leq M r^{-n} \tag{8}$$
for any $x, y \in A$, where $\Pr^n_x(\cdot)$ denotes $\Pr(X_n = \cdot \mid X_0 = x)$.

Let us denote by $E_{\pi_X}h(X)$ the expected value of the function $h : \mathcal{X} \to \mathbb{R}$ calculated according to the stationary distribution $\pi_X$. The appropriate symbols $\mathrm{Cov}_{\pi_X}(g,h)$ and $\mathrm{Var}_{\pi_X}(h)$ are used for covariance and variance.

3. Proposal for the convergence diagnosis method

In this section we present a convergence diagnosis method for the MCMC output. This proposal uses the notions of atoms and renewal sets and also some properties derived for discrete Markov Chains (see Section 2).

3.1. Introducing the secondary chain

Suppose that we are interested in diagnosing the convergence of some ergodic Markov Chain $(X_i)_{i=0}^\infty = (X_0 = x_0, X_1, \ldots)$. We denote the stationary probability measure for this chain by $\pi_X(\cdot)$, its transition matrix by $P_X$ (or transition kernel by $K_X(\cdot,\cdot)$ in the case of an MC on continuous state space) and the space of its values by $\mathcal{X}$. Suppose also that we know two atoms (or renewal sets) $A_1, A_2$ for this chain.


Therefore, we can create the secondary chain $(Y_i)_{i=1}^\infty$ based on our initial chain $(X_i)_{i=0}^\infty$. If $A_1, A_2$ are atoms, then we can define
$$\tau_1 := \min\{i = 1, \ldots : X_i \in A_1 \cup A_2\}, \tag{9}$$
$$\tau_{k+1} := \min\{i > \tau_k : X_i \in A_1 \cup A_2\}, \tag{10}$$
$$Y_k = X_{\tau_k}. \tag{11}$$
It is seen that the chain $(Y_i)_{i=1}^\infty$ has the Markov Property for the truncated (reduced) space $\mathcal{Y} := \{A_1, A_2\}$ (see Lemma 1 for the proof). If these two sets are renewal sets, we should introduce the modification (7) and change the definition of the chain $(Y_i)_{i=1}^\infty$ according to
$$\tau_1 := \min\{i = 1, \ldots : (X_i \in A_1 \wedge U_i \leq \epsilon_{A_1}) \vee (X_i \in A_2 \wedge U_i \leq \epsilon_{A_2})\}, \tag{12}$$
$$\tau_{k+1} := \min\{i > \tau_k : (X_i \in A_1 \wedge U_i \leq \epsilon_{A_1}) \vee (X_i \in A_2 \wedge U_i \leq \epsilon_{A_2})\}, \tag{13}$$
$$Y_k = X_{\tau_k}, \tag{14}$$
where $\epsilon_{A_j}$ denotes the parameter $\epsilon$ for the appropriate renewal set $A_j$ in condition (7). Also in this case the secondary chain $(Y_i)_{i=1}^\infty$ has the Markov Property for the space $\mathcal{Y}$. We may summarize the previous observations in a simple lemma:

Lemma 1. If $A_1, A_2$ are atoms (or renewal sets), the chain $(Y_i)_{i=1}^\infty$ defined by conditions (9)-(11) (or (12)-(14), respectively) is a Markov Chain for the space $\mathcal{Y} := \{A_1, A_2\}$. This chain is ergodic.

Proof. The chain $(Y_i)_{i=1}^\infty$ has the Markov property for the reduced space $\{x : x \in A_1 \cup A_2\}$ by the Strong Markov Property. If $A_j$ is an atom, then from (3) the probability $\Pr(Y_{k+1} \in B \mid Y_k = y)$ is constant for all $y \in A_j$. Hence
$$\Pr(Y_{k+1} \in A_j \mid Y_k \in A_j) = \Pr(Y_{k+1} \in A_j \mid Y_k = y) \tag{15}$$
for all $y \in A_j$. If $A_j$ is a renewal set, the argument is similar. The modification (7) introduces independent generation from the probability measure $\nu_{A_j}(\cdot)$ with probability $\epsilon_{A_j}$, and this measure is constant for all $y \in A_j$. The ergodicity of $(Y_i)_{i=1}^\infty$ follows directly from the ergodicity of $(X_i)_{i=0}^\infty$.

A proof similar to the one given above may be found in Guihenneuc-Jouyaux and Robert (1998), but it is more complicated there than the straightforward reasoning presented in Lemma 1.

For simplicity of notation, we will refer to the atoms or renewal sets $A_j$ as special sets, keeping in mind the different definitions of the secondary chain $(Y_i)_{i=1}^\infty$ for these two cases.
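In practice the secondary chain can be read off a stored trajectory directly. The sketch below does this for the atom case (9)-(11); the array of states and the two sets are assumed inputs.

```python
import numpy as np

def secondary_chain(xs, A1, A2):
    """Extract the renewal moments tau_k and the secondary chain
    Y_k = X_{tau_k} for the atom case (9)-(11).

    xs : sequence of states X_0, X_1, ...;  A1, A2 : special sets.
    Y_k is stored as the label 1 or 2 of the special set visited.
    """
    taus, ys = [], []
    for i in range(1, len(xs)):
        if xs[i] in A1 or xs[i] in A2:
            taus.append(i)
            ys.append(1 if xs[i] in A1 else 2)
    return np.array(taus), np.array(ys)
```

For renewal sets one would additionally store the uniform variables $U_i$ used in (7) and keep only the moments with $U_i \leq \epsilon_{A_j}$, as in (12)-(14).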


The moments $\tau_i$ defined previously may be additionally partitioned between the corresponding special sets. Hence, we obtain the following definition of $\tau^{(j)}_i$ for the fixed atom $A_j$:
$$\tau^{(j)}_1 := \min\{i = 1, \ldots : X_i \in A_j\}, \tag{16}$$
$$\tau^{(j)}_{k+1} := \min\{i > \tau^{(j)}_k : X_i \in A_j\}. \tag{17}$$
For the renewal set $A_j$ the definition of $\tau^{(j)}_i$ is an analogous modification of the above formulas, i.e.
$$\tau^{(j)}_1 := \min\{i = 1, \ldots : X_i \in A_j \wedge U_i \leq \epsilon_{A_j}\}, \tag{18}$$
$$\tau^{(j)}_{k+1} := \min\{i > \tau^{(j)}_k : X_i \in A_j \wedge U_i \leq \epsilon_{A_j}\}. \tag{19}$$
Therefore, $\tau^{(j)}_1$ may be considered as the moment of the first visit to the set $A_j$.

Lemma 2. For fixed $j = 1, 2$, the sums of the form $\sum_{k=\tau^{(j)}_i+1}^{\tau^{(j)}_{i+1}} X_k$ are conditionally iid in the stationary regime for $i = 1, \ldots$. The same applies to the sums $\sum_{k=\tau_i+1}^{\tau_{i+1}} X_k$ for $i = 1, 2, \ldots$.

Proof. The variables $\tau^{(j)}_k$ and $\tau_k$ are stopping times. Therefore, the sequences $X_{\tau^{(j)}_i+1}, \ldots, X_{\tau^{(j)}_{i+1}}$ (or their equivalents for $\tau_k$) are conditionally iid in the stationary regime by the Strong Markov Property. Hence, the appropriate sums are also conditionally iid (for additional remarks see, e.g., Bremaud, 1999, Chapter 2.7).

    3.2. Diagnosis of the initial chain

As we have noted in Section 3.1, for a chain $(X_i)_{i=0}^\infty$ with two known special sets $A_j$ ($j = 1, 2$) we may introduce the additional chain $(Y_i)_{i=1}^\infty$. The chain $(Y_i)_{i=1}^\infty$ is a discrete MC with only two states, regardless of the cardinality of the space $\mathcal{X}$.

During the diagnosis of the initial chain, we are interested in two values: $n_{\mathrm{stat}}$ and $n_{\mathrm{Var}}$. The first value $n_{\mathrm{stat}}$ is the time moment when we are close enough to the stationary distribution $\pi_X$, i.e.
$$\bigl\|\Pr^{n_{\mathrm{stat}}}_{x_0} - \pi_X\bigr\| \leq \epsilon_1, \tag{20}$$
where $\|\cdot\|$ indicates some determined norm for the space $\mathcal{X}$, e.g. the total variation norm, which is used in the rest of this paper, and $\Pr^{n_{\mathrm{stat}}}_{x_0}(\cdot) = \Pr(X_{n_{\mathrm{stat}}} = \cdot \mid X_0 = x_0)$. When the number of simulations $n_{\mathrm{stat}}$ in the MCMC algorithm is attained, in the light of (20) we may treat $(X_i)_{i \geq n_{\mathrm{stat}}}$ as being distributed almost from the stationary distribution $\pi_X$.


Suppose that we are interested in obtaining an estimator of the expected value $E_{\pi_X}h(X)$ based on the average of the initial chain. It is easy to see that we would like to achieve a sufficiently small variance of this estimator, i.e. to find the quantity $n_{\mathrm{Var}}$ fulfilling the condition
$$\mathrm{Var}\left(\frac{1}{s}\sum_{k=n_{\mathrm{stat}}+1}^{n_{\mathrm{Var}}} h(X_k) - E_{\pi_X}h(X)\right) \leq \epsilon_2, \tag{21}$$
where $s = n_{\mathrm{Var}} - n_{\mathrm{stat}}$.

We summarize the observations concerning the problems given by (20) and (21) in the following lemmas and remarks.

3.3. Finding the $n_{\mathrm{stat}}$ value

Let us start from the classical case, when $\mathcal{X}$ has only two states.

Lemma 3. Suppose that $\mathcal{X} = \{A_1, A_2\} = \{1, 2\}$. Then, the inequality
$$\sup_{x_0}\bigl\|\Pr^{n_{\mathrm{stat}}}_{x_0} - \pi_X\bigr\| \leq \epsilon_1 \tag{22}$$
is fulfilled for
$$n_{\mathrm{stat}} \geq \frac{\ln\dfrac{\epsilon_1(\alpha+\beta)}{\max\{\alpha,\beta\}}}{\ln\lambda}, \tag{23}$$
where $\alpha$ and $\beta$ are derived from the transition matrix of $(X_i)_{i=0}^\infty$,
$$P_X = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}, \tag{24}$$
and $\lambda = 1 - \alpha - \beta$.

Proof. In this case it is known that the stationary distribution is
$$\pi_X^T = (\pi_X(1), \pi_X(2)) = \frac{1}{\alpha+\beta}\,(\beta, \alpha) \tag{25}$$
and the $k$th step transition matrix is
$$P_X^k = \begin{pmatrix} \pi_X(1) & \pi_X(2) \\ \pi_X(1) & \pi_X(2) \end{pmatrix} + \frac{\lambda^k}{\alpha+\beta}\begin{pmatrix} \alpha & -\alpha \\ -\beta & \beta \end{pmatrix}. \tag{26}$$
If we start our chain in the state $A_1 = 1$, then the $k$th step probability will be
$$\left(\pi_X(1) + \frac{\alpha\lambda^k}{\alpha+\beta},\; \pi_X(2) - \frac{\alpha\lambda^k}{\alpha+\beta}\right). \tag{27}$$
Hence, (22) is fulfilled for such $n_{\mathrm{stat}}$ that $\frac{\alpha\lambda^{n_{\mathrm{stat}}}}{\alpha+\beta} \leq \epsilon_1$. If we start the chain from the state $A_2$, we obtain $\frac{\beta\lambda^{n_{\mathrm{stat}}}}{\alpha+\beta} \leq \epsilon_1$. Joining these two results and knowing that $|\lambda| < 1$, we establish (23). Another approach to this result, with some faults (e.g. the chain considered there is not an MC), may be found in Raftery and Lewis (1999).

Then, we can turn to a more general case, when $\mathcal{X}$ has more than two states.

Lemma 4. Suppose that $\mathcal{X}$ is a finite space and $A_1$ is a known atom for $\mathcal{X}$. Then
$$\sum_{y\in\mathcal{X}} |\Pr^n_x(y) - \pi_X(y)| \leq 2\Pr_x\bigl(\tau^{(1)}_1 \geq n\bigr) + \sum_{j=0}^{n-1}\Pr_x\bigl(\tau^{(1)}_1 = j\bigr)\Biggl(\sum_{k=1}^{n-j-1}\bigl|\Pr^k_{A_1}(A_1) - \pi_X(A_1)\bigr|\,\Pr_{A_1}\bigl(\tau^{(1)}_1 \geq n-k-j\bigr) + \pi_X(A_1)\,E_{A_1}\bigl(\tau^{(1)}_1 - (n-j)\bigr)^+\Biggr). \tag{28}$$

Proof. Let us recall that $\tau^{(1)}_1$ may be treated as the moment of the first visit to the set $A_1$. If we know the atom $A_1$, then for any $y \in \mathcal{X}$ we have
$$\pi_X(y) = \pi_X(A_1)\sum_{n=0}^{\infty}\Pr_{A_1}\bigl(X_n = y,\ \tau^{(1)}_1 \geq n\bigr), \tag{29}$$
where $\Pr_x(\cdot)$, as usual, denotes $\Pr(\cdot \mid X_0 = x)$. The proof of (29) may be found in Robert and Casella (2004, see Theorem 4.5.3).

We have
$$\Pr^n_x(y) = \Pr_x\bigl(X_n = y,\ \tau^{(1)}_1 \geq n\bigr) + \sum_{j=0}^{n-1}\Pr_x\bigl(X_j \in A_1,\ \tau^{(1)}_1 = j\bigr)\sum_{k=0}^{n-j-1}\Pr^k_{A_1}(A_1)\,\Pr_{A_1}\bigl(X_{n-k-j} = y,\ \tau^{(1)}_1 \geq n-k-j\bigr). \tag{30}$$
The notation $\Pr^k_{A_1}(A_1)$ and $\Pr_{A_1}(\cdot)$ is validated by the thesis of Lemma 1. Using the expansion (30) we have
$$|\Pr^n_x(y) - \pi_X(y)| \leq \Pr_x\bigl(X_n = y,\ \tau^{(1)}_1 \geq n\bigr) + \Biggl|\sum_{j=0}^{n-1}\Pr_x\bigl(\tau^{(1)}_1 = j\bigr)\sum_{k=0}^{n-j-1}\Pr^k_{A_1}(A_1)\,\Pr_{A_1}\bigl(X_{n-k-j} = y,\ \tau^{(1)}_1 \geq n-k-j\bigr) - \pi_X(y)\Biggr|. \tag{31}$$


Hence
$$|\Pr^n_x(y) - \pi_X(y)| \leq \Pr_x\bigl(X_n = y,\ \tau^{(1)}_1 \geq n\bigr) + \Biggl|\sum_{j=0}^{n-1}\Pr_x\bigl(\tau^{(1)}_1 = j\bigr)\Biggl(\sum_{k=0}^{n-j-1}\Pr^k_{A_1}(A_1)\,\Pr_{A_1}\bigl(X_{n-k-j} = y,\ \tau^{(1)}_1 \geq n-k-j\bigr) - \pi_X(y)\Biggr)\Biggr| + \pi_X(y)\sum_{j=n}^{\infty}\Pr_x\bigl(\tau^{(1)}_1 = j\bigr). \tag{32}$$
From (29), for any $j \leq n-1$ we have
$$\pi_X(y) = \pi_X(A_1)\sum_{k=0}^{n-j-1}\Pr_{A_1}\bigl(X_{n-k-j} = y,\ \tau^{(1)}_1 \geq n-k-j\bigr) + \pi_X(A_1)\sum_{l=n-j+1}^{\infty}\Pr_{A_1}\bigl(X_l = y,\ \tau^{(1)}_1 \geq l\bigr). \tag{33}$$
After applying (33) to (32) we have
$$|\Pr^n_x(y) - \pi_X(y)| \leq \Pr_x\bigl(X_n = y,\ \tau^{(1)}_1 \geq n\bigr) + \Biggl|\sum_{j=0}^{n-1}\Pr_x\bigl(\tau^{(1)}_1 = j\bigr)\Biggl(\sum_{k=0}^{n-j-1}\bigl(\Pr^k_{A_1}(A_1) - \pi_X(A_1)\bigr)\Pr_{A_1}\bigl(X_{n-k-j} = y,\ \tau^{(1)}_1 \geq n-k-j\bigr) - \pi_X(A_1)\sum_{l=n-j+1}^{\infty}\Pr_{A_1}\bigl(X_l = y,\ \tau^{(1)}_1 \geq l\bigr)\Biggr)\Biggr| + \pi_X(y)\Pr_x\bigl(\tau^{(1)}_1 \geq n\bigr). \tag{34}$$
Straightforwardly,
$$|\Pr^n_x(y) - \pi_X(y)| \leq \Pr_x\bigl(X_n = y,\ \tau^{(1)}_1 \geq n\bigr) + \sum_{j=0}^{n-1}\Pr_x\bigl(\tau^{(1)}_1 = j\bigr)\Biggl(\sum_{k=0}^{n-j-1}\bigl|\Pr^k_{A_1}(A_1) - \pi_X(A_1)\bigr|\,\Pr_{A_1}\bigl(X_{n-k-j} = y,\ \tau^{(1)}_1 \geq n-k-j\bigr) + \pi_X(A_1)\sum_{l=n-j+1}^{\infty}\Pr_{A_1}\bigl(X_l = y,\ \tau^{(1)}_1 \geq l\bigr)\Biggr) + \pi_X(y)\Pr_x\bigl(\tau^{(1)}_1 \geq n\bigr), \tag{35}$$
which, after summation over $y \in \mathcal{X}$, constitutes (28).


The equations (28) and (35) may be used to establish further dependencies between the initial and the secondary chain. Now we present a simple lemma, which may be helpful in the practice of MCMC setups.

Lemma 5. Suppose that $A_1$ is a geometrically ergodic atom with constant $M_1$ and coefficient $r_1$, and there exist $M_2 > 0$, $r_2 > 1$, $M_3 > 0$, $r_3 > 1$ such that
$$\Pr_{A_1}\bigl(\tau^{(1)}_1 \geq n\bigr) \leq M_2 r_2^{-n} \tag{36}$$
and
$$\Pr_x\bigl(\tau^{(1)}_1 = n\bigr) \leq M_3 r_3^{-n} \tag{37}$$
are fulfilled. Then, the inequality
$$\sum_{y\in\mathcal{X}} |\Pr^n_x(y) - \pi_X(y)| \leq \epsilon_1 \tag{38}$$
is satisfied for $n$ given as the solution of the formula
$$\frac{2M_3 r_3^{1-n}}{r_3-1} + \frac{M_2M_3 r_3\bigl(r_3^{-n} - r_2^{-n}\bigr)}{(r_2-1)(r_2-r_3)} + \frac{M_1M_2M_3}{r_2-r_1}\left(\frac{r_1r_3\bigl(r_3^{-n} - r_1^{-n}\bigr)}{r_1-r_3} + \frac{r_2r_3\bigl(r_3^{-n} - r_2^{-n}\bigr)}{r_3-r_2}\right) \leq \epsilon_1. \tag{39}$$

Proof. After applying conditions (8), (36), (37) to inequality (28) we can straightforwardly prove (39).

It is worth noting that it is possible to improve the inequality (39). If we know the value of the stationary probability $\pi_X(A_1)$, then we have the more detailed condition
$$\frac{2M_3 r_3^{1-n}}{r_3-1} + \frac{\pi_X(A_1)\,M_2M_3 r_3\bigl(r_3^{-n} - r_2^{-n}\bigr)}{(r_2-1)(r_2-r_3)} + \frac{M_1M_2M_3}{r_2-r_1}\left(\frac{r_1r_3\bigl(r_3^{-n} - r_1^{-n}\bigr)}{r_1-r_3} + \frac{r_2r_3\bigl(r_3^{-n} - r_2^{-n}\bigr)}{r_3-r_2}\right) \leq \epsilon_1. \tag{40}$$

3.4. Finding the $n_{\mathrm{Var}}$ value

After establishing $n_{\mathrm{stat}}$ we may turn to the evaluation of $n_{\mathrm{Var}}$. To simplify the notation, in the rest of this section we denote by $\tau^{(j)}_1$ the moment of the first visit to the set $A_j$ for the chain $(X_k)_{k \geq n_{\mathrm{stat}}}$, by $\tau^{(j)}_2$ the second one, etc. As previously, we start from the case of the two-state space $\mathcal{X}$.


Lemma 6. Suppose that $\mathcal{X} = \{A_1, A_2\} = \{1, 2\}$ and $X_{n_{\mathrm{stat}}} \sim \pi_X$. Then condition (21) is fulfilled for any $s = n - n_{\mathrm{stat}}$ given by the inequality
$$\frac{\alpha\beta\,(h(1)-h(2))^2}{(\alpha+\beta)^3}\left(\frac{2\lambda(s-1)}{s^2} + \frac{\alpha+\beta}{s}\right) \leq \epsilon_2, \tag{41}$$
where $\lambda = 1 - \alpha - \beta$.

Proof. For simplicity of notation, as in Lemma 3, we suppose that state $i$ denotes the atom $A_i$. We have
$$\mathrm{Var}\left(\frac{1}{s}\sum_{k=n_{\mathrm{stat}}+1}^{n} h(X_k) - E_{\pi_X}h(X)\right) = \frac{1}{s^2}\Biggl[\mathrm{Var}\left(\sum_{k=n_{\mathrm{stat}}+1}^{n-1} h(X_k) - (s-1)E_{\pi_X}h(X)\right) + 2\,\mathrm{Cov}\left(\sum_{k=n_{\mathrm{stat}}+1}^{n-1} h(X_k) - (s-1)E_{\pi_X}h(X),\; h(X_n) - E_{\pi_X}h(X)\right) + \mathrm{Var}\bigl(h(X_n) - E_{\pi_X}h(X)\bigr)\Biggr]. \tag{42}$$
In this case we can write the covariance as
$$\mathrm{Cov}\left(\sum_{k=n_{\mathrm{stat}}+1}^{n-1} h(X_k) - (s-1)E_{\pi_X}h(X),\; h(X_n) - E_{\pi_X}h(X)\right) = \sum_{k=n_{\mathrm{stat}}+1}^{n-1} E\bigl((h(X_k) - E_{\pi_X}h(X))(h(X_n) - E_{\pi_X}h(X))\bigr) = \sum_{k=1}^{s-1} E\bigl((h(X_{n-k}) - E_{\pi_X}h(X))(h(X_n) - E_{\pi_X}h(X))\bigr). \tag{43}$$
From the assumption that $X_{n_{\mathrm{stat}}} \sim \pi_X$, for $n \geq n_{\mathrm{stat}}$ the variables $X_n$ may be treated as derived from the stationary distribution $\pi_X$. Therefore
$$E\bigl((h(X_{n-k}) - E_{\pi_X}h(X))(h(X_n) - E_{\pi_X}h(X))\bigr) = E\bigl(h(X_{n-k})h(X_n)\bigr) - E^2_{\pi_X}h(X). \tag{44}$$
From the properties (25) and (26) of the two-state MC we have
$$E\bigl(h(X_{n-k})h(X_n)\bigr) - E^2_{\pi_X}h(X) = h^2(1)\pi_X(1)\Pr^k_1(1) + h(1)h(2)\pi_X(1)\Pr^k_1(2) + h(2)h(1)\pi_X(2)\Pr^k_2(1) + h^2(2)\pi_X(2)\Pr^k_2(2) - \bigl(h(1)\pi_X(1) + h(2)\pi_X(2)\bigr)^2 = \frac{\alpha\beta\lambda^k}{(\alpha+\beta)^2}\,(h(1)-h(2))^2. \tag{45}$$
Applying (42) and a simple recurrence to the above formula, we obtain
$$\mathrm{Var}\left(\frac{1}{s}\sum_{k=n_{\mathrm{stat}}+1}^{n} h(X_k) - E_{\pi_X}h(X)\right) = \frac{1}{s^2}\Biggl[\frac{2\alpha\beta\lambda}{(\alpha+\beta)^3}\,(h(1)-h(2))^2\left(s - 1 - \frac{\lambda(1-\lambda^{s-1})}{1-\lambda}\right) + s\left(\frac{h^2(1)\beta + h^2(2)\alpha}{\alpha+\beta} - \left(\frac{h(1)\beta + h(2)\alpha}{\alpha+\beta}\right)^2\right)\Biggr]. \tag{46}$$
Dropping the negative term in the first bracket and noting that the last term equals $s\,\alpha\beta(h(1)-h(2))^2/(\alpha+\beta)^2$, we obtain the thesis (41).
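Under the reconstruction of (41) above, the minimal sample size $s$ can again be found by a direct scan:

```python
def n_var_two_state(alpha, beta, h1, h2, eps2, n_stat):
    """Smallest n_Var such that s = n_Var - n_stat satisfies (41),
    as reconstructed above; h1, h2 are the values h(1), h(2)."""
    lam = 1.0 - alpha - beta
    c = alpha * beta * (h1 - h2) ** 2 / (alpha + beta) ** 3
    s = 1
    while c * (2.0 * lam * (s - 1) / s**2 + (alpha + beta) / s) > eps2:
        s += 1
    return n_stat + s
```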

Now we turn to the general case, i.e. when $\mathcal{X}$ has more than two states. In practice, an experimenter, after generating a chain of some length, is interested in knowing whether an appropriately small error, measured by (21), has been attained. Therefore, it is possible in practice that an observer chooses such values of $n_{\mathrm{stat}}$ and $n_{\mathrm{Var}}$ that they are also moments of visits to the special sets $A_1$ and $A_2$. This procedure is helpful in the elimination of tails, i.e. of two fragments of the chain: between $n_{\mathrm{stat}}$ and the first visit to a special set, and between the last visit to a special set and $n_{\mathrm{Var}}$. The estimation of these tails is very complicated in the case of the $n_{\mathrm{Var}}$ evaluation.

Let $n_{\mathrm{stat}}$ and $n_{\mathrm{Var}}$ be values preliminarily chosen by the experimenter. For these deterministic parameters, suppose that $n_s \geq n_{\mathrm{stat}}$ and $n_V \leq n_{\mathrm{Var}}$ are moments of visits to a special set $A_j$, where $n_s$ is the first such moment and $n_V$ is the last one. Let
$$M(j) = \#\bigl\{k : n_s \leq \tau^{(j)}_k \leq n_V\bigr\}. \tag{47}$$

Obviously, $M(j)$ is the random number of visits to $A_j$ between $n_s$ and $n_V$. For the determined $j$ we have
$$\frac{1}{n_V - n_s}\sum_{k=n_s+1}^{n_V} h(X_k) - E_{\pi_X}h(X) = \frac{1}{n_V - n_s}\sum_{i=1}^{M(j)-1}\Biggl(\sum_{k=\tau^{(j)}_i+1}^{\tau^{(j)}_{i+1}} h(X_k) - \bigl(\tau^{(j)}_{i+1} - \tau^{(j)}_i\bigr)E_{\pi_X}h(X)\Biggr), \tag{48}$$
which constitutes the following remark:

    which constitutes the following remark:

    Remark 1 Suppose that ns nstat and nV nVar are moments of visits in thespecial set Aj. Then

    VarX

    1

    nV ns

    nVk=ns+1

    h(Xk) Exh(X)

    =

    = VarX

    1

    nV ns

    M(j)1i=1

    (j)i+1

    k=(j)i +1

    h(Xk)

    (j)i+1 (j)i

    EXh(X)

    . (49)

In order to achieve an appropriate evaluation of (49), we have to find a variance estimator of a single fragment of the trajectory. Let $S^{(j)}_i = \sum_{k=\tau^{(j)}_i+1}^{\tau^{(j)}_{i+1}} h(X_k)$. Then the value
$$\sigma^2(j) = \mathrm{Var}_{\pi_X}\left(\sum_{k=\tau^{(j)}_1+1}^{\tau^{(j)}_2} h(X_k) - \bigl(\tau^{(j)}_2 - \tau^{(j)}_1\bigr)E_{\pi_X}h(X)\right) \tag{50}$$
may be estimated by the usual sum of squares estimator
$$\hat{\sigma}^2(j) = \frac{1}{m(j)-1}\sum_{i=1}^{m(j)-1}\left(S^{(j)}_i - \frac{1}{m(j)-1}\sum_{l=1}^{m(j)-1} S^{(j)}_l\right)^2, \tag{51}$$
where $m(j)$ is the number of visits to $A_j$ (see Lemma 2). A similar estimator for the case of $M(j)$ was introduced in Robert (1995).
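Given a stored trajectory and the visit moments $\tau^{(j)}_k$, the estimator (51) is a few lines of code; the sketch below assumes the inputs described in its docstring.

```python
import numpy as np

def sigma2_hat(h_vals, visit_times):
    """Sum-of-squares estimator (51) built from the excursion sums S_i.

    h_vals      : array of h(X_k) along the trajectory,
    visit_times : increasing array of visit moments tau_k^{(j)} to A_j.
    """
    S = np.array([h_vals[a + 1 : b + 1].sum()
                  for a, b in zip(visit_times[:-1], visit_times[1:])])
    # (51) divides by m(j) - 1, the number of excursions, hence ddof=0
    return S.var(ddof=0)
```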

We can generalize our considerations to the case of more than one special set. Let $n_s$ and $n_V$ be moments of visits to some special sets $A_1$ and/or $A_2$, not only to one determined $A_j$. We use the additional notation
$$T(j,l) = \bigl(\tau_2 : Y_2 \in A_l\bigr) - \bigl(\tau_1 : Y_1 \in A_j\bigr), \tag{52}$$
i.e. the length of a fragment which starts at a visit to $A_j$ and ends at the next renewal moment, falling in $A_l$, and
$$M(j,l) = \#\bigl\{k : n_s \leq \tau_k,\; Y_k \in A_j,\; Y_{k+1} \in A_l,\; \tau_{k+1} \leq n_V\bigr\}. \tag{53}$$

In such a case we have, from the Strong Markov Property,
$$\frac{1}{n_V - n_s}\sum_{k=n_s+1}^{n_V} h(X_k) - E_{\pi_X}h(X) = \frac{1}{n_V - n_s}\sum_{i=1}^{m-1}\Biggl(\sum_{k=\tau_i+1}^{\tau_{i+1}} h(X_k) - (\tau_{i+1} - \tau_i)E_{\pi_X}h(X)\Biggr). \tag{54}$$
In other words, we divide the sequence $(X_i)_{i=n_s}^{n_V}$ into fragments determined by the moments $\tau_i$. In the right-hand side sum of (54) we can distinguish $M(1,1)$ fragments which start and finish in $A_1$, $M(1,2)$ fragments which start in $A_1$ and finish in $A_2$, etc. Therefore, we have the following remark:

Remark 2. Let $n_s \geq n_{\mathrm{stat}}$ and $n_V \leq n_{\mathrm{Var}}$ be moments of visits to the special sets. Then
$$\mathrm{Var}_{\pi_X}\left(\frac{1}{n_V - n_s}\sum_{k=n_s+1}^{n_V} h(X_k) - E_{\pi_X}h(X)\right) = \mathrm{Var}_{\pi_X}\left(\frac{1}{n_V - n_s}\sum_{j,l=1}^{2}\sum_{i=1}^{M(j,l)-1}\Biggl(\sum_{\substack{k=\tau_i+1,\ Y_i\in A_j}}^{\tau_{i+1},\ Y_{i+1}\in A_l} h(X_k) - T(j,l)\,E_{\pi_X}h(X)\Biggr)\right). \tag{55}$$
As previously, we need an appropriate variance estimator of the trajectory fragment. Let
$$S^{(j,l)}_i = \sum_{\substack{k=\tau_i+1,\ Y_i\in A_j}}^{\tau_{i+1},\ Y_{i+1}\in A_l} h(X_k). \tag{56}$$
Then, the variance
$$\sigma^2(j,l) = \mathrm{Var}_{\pi_X}\left(\sum_{\substack{k=\tau_1+1,\ Y_1\in A_j}}^{\tau_2,\ Y_2\in A_l} h(X_k) - T(j,l)\,E_{\pi_X}h(X)\right) \tag{57}$$
may be estimated by the usual sum of squares
$$\hat{\sigma}^2(j,l) = \frac{1}{m(j,l)-1}\sum_{i=1}^{m(j,l)-1}\left(S^{(j,l)}_i - \frac{1}{m(j,l)}\sum_{k=1}^{m(j,l)-1} S^{(j,l)}_k\right)^2, \tag{58}$$
where $m(j,l)$ is the number of transitions between the special sets $A_j$ and $A_l$.


As previously stated, in the presented method we would like to eliminate the problems caused by the tails, which are very hard for the experimenter to estimate. Therefore, based on Remark 1 and formula (51), we could postulate that condition (21) is fulfilled for any $n \geq n_{\mathrm{Var}}$ if
$$n_{\mathrm{stat}} + \sqrt{\frac{\hat{\sigma}^2(j)\,\bigl(m(j)-1\bigr)}{\epsilon_2}} \leq n_{\mathrm{Var}}. \tag{59}$$
And from Remark 2 and the estimator (58) we could postulate the generalization of condition (59). Then, (21) is fulfilled for any $n \geq n_{\mathrm{Var}}$ if
$$n_{\mathrm{stat}} + \sqrt{\frac{m(1,1)\,\hat{\sigma}^2(1,1) + m(1,2)\,\hat{\sigma}^2(1,2) + m(2,1)\,\hat{\sigma}^2(2,1) + m(2,2)\,\hat{\sigma}^2(2,2)}{\epsilon_2}} \leq n_{\mathrm{Var}}. \tag{60}$$

We have to note that the presented considerations are based on theoretical foundations, but also include a strong heuristic flavour. Our initial remarks use the random variables $n_s$ and $n_V$, which are also moments of visits to special sets. In (59) and (60) we over-interpret these results as if $s = n_{\mathrm{Var}} - n_{\mathrm{stat}}$ were a deterministic variable and both $n_{\mathrm{stat}}$ and $n_{\mathrm{Var}}$ were moments of visits to special sets. Because of this, it is possible to formulate inequalities for meeting the condition (21) in a relatively simple way. But the lack of a direct connection between (59), (60) and the previous results is a disadvantage of this approach. However, in practice it is always possible to start or finish the observation of the trajectory at appropriate moments, i.e. at visits to the special sets.

Results for an estimator similar to (58) may be found in Robert (1995). But in this reference only the asymptotic features of the distances between the values
$$\frac{m(j)\,\hat{\sigma}^2(j)}{n} \tag{61}$$
for $n \to \infty$ and various $A_j$ are used. In this paper we use non-asymptotic features and we show a direct connection between the discussed method and condition (21), which is the basis of MCMC methodology.

    4. Example of application

After introducing the methodology appropriate for finding the values $n_{\mathrm{stat}}$ and $n_{\mathrm{Var}}$, we now present an example of their application. For simplicity of notation, and in order to derive conclusions, we use a state space $\mathcal{X}$ with a few atoms. We should emphasize that the solutions established in the lemmas in Section 3 give exact (i.e. demonstrated by mathematical reasoning, not only by a heuristic approach) and precise (i.e. non-asymptotic) values. Therefore, we may focus only on the problem of carrying the acquired results over from the theoretical formulas to the practical example.


Let us suppose that we are interested in finding the value $E_f h(X)$, where $f(\cdot)$ describes the state space $\mathcal{X}$ with eight atoms and stationary probabilities
$$f(\cdot) = (1/20,\ 1/20,\ 2/20,\ 2/20,\ 3/20,\ 3/20,\ 4/20,\ 4/20), \tag{62}$$
i.e. the first atom has stationary probability $1/20$, the second one $1/20$, etc., and $h(\cdot)$ is a uniform function on $\mathcal{X}$, i.e.
$$h(\cdot) = (1, 1, 1, 1, 1, 1, 1, 1). \tag{63}$$
Because of this special form of the function $h(\cdot)$, all the states of the space $\mathcal{X}$ have the same weight and importance during the MCMC simulations.

In order to calculate $E_f h(X)$ we use the independent Metropolis-Hastings algorithm (see, e.g., Robert and Casella, 2004). Our main trajectory has a million elements and is initiated from state one. We also assume that $A_1 = \{3\}$ and $A_2 = \{7\}$. Therefore, we may compare the values $n_{\mathrm{stat}}$ and $n_{\mathrm{Var}}$ on the basis of states with various stationary probabilities.

Firstly, we would like to find $n_{\mathrm{stat}}$. To apply the lemmas from Section 3.3, we have to evaluate the necessary parameters $r_1, M_1, r_2, M_2, r_3, M_3$ (see the assumptions of Lemma 5). Normally, the experimenter may have some additional knowledge about these values, but here we use additional simulations in order to determine $r_1, M_1, r_2, M_2, r_3, M_3$. Hence, we generate additional sets of 50000 trajectories with 100 steps in each trajectory and appropriate starting points: states one, three and seven. Then, we apply the pessimistic optimization approach.

Namely, if we suppose that for the optimal parameters $r_1$ and $M_1$ we have
$$\bigl|\Pr^n_{A_1}(A_1) - \pi_X(A_1)\bigr| \leq M_1 r_1^{-n}, \tag{64}$$
then
$$\frac{\bigl|\Pr^n_{A_1}(A_1) - \pi_X(A_1)\bigr|}{\bigl|\Pr_{A_1}(A_1) - \pi_X(A_1)\bigr|} \leq r_1^{-n+1}. \tag{65}$$
Therefore, we can find a pessimistic evaluation of $r_1$ in the sense of satisfying the condition
$$r_1 = \min_{r\in\mathbb{R}_+}\left\{\forall\, n = 2, 3, \ldots :\ r^{-n+1} - \frac{\bigl|\Pr^n_{A_1}(A_1) - \pi_X(A_1)\bigr|}{\bigl|\Pr_{A_1}(A_1) - \pi_X(A_1)\bigr|} \geq 0\right\}. \tag{66}$$
As is easily seen, (66) gives us the maximally pessimistic guess for $r_1$, because in this light $r_1$ has to be an upper limit for all steps in a strictly deterministic sense. In case of any numerical errors, or even for greater values of $n$ (note the exponential decrease in the conditions of Lemma 5), this method may give larger values of $r_1$ than they are in reality. However, other methods, like meeting the weaker condition
$$r^{-n+1} - \frac{\bigl|\Pr^n_{A_1}(A_1) - \pi_X(A_1)\bigr|}{\bigl|\Pr_{A_1}(A_1) - \pi_X(A_1)\bigr|} \geq -\delta \tag{67}$$
for some small enough $\delta$, may easily be criticized because of the unknown error generated by the selection of the value of $\delta$.

After fixing the value $r_1$, as in (66), we may find $M_1$ in the same manner, as satisfying the condition
$$M_1 = \min_{M\in\mathbb{R}_+}\left\{\forall\, n = 1, 2, \ldots :\ M r_1^{-n} - \bigl|\Pr^n_{A_1}(A_1) - \pi_X(A_1)\bigr| \geq 0\right\}. \tag{68}$$
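One reading of (66) and (68) is the following sketch, in which dist[i] is an empirical estimate of $|\Pr^{i+1}_{A_1}(A_1) - \pi_X(A_1)|$ obtained from the auxiliary trajectories; the array and its contents are assumptions of the example, not data from the paper.

```python
import numpy as np

def pessimistic_r_M(dist):
    """Pessimistic evaluation of (r1, M1) in the spirit of (66), (68).

    dist : dist[i] estimates |Pr^{i+1}_{A1}(A1) - pi_X(A1)|, i = 0, 1, ...
    r1 is the largest r with r^{-(n-1)} >= dist[n]/dist[1] for all n >= 2,
    and M1 the smallest M with M * r1^{-n} >= dist[n] for all n >= 1.
    """
    dist = np.asarray(dist, dtype=float)
    n = np.arange(1, len(dist) + 1)
    ratios = dist[1:] / dist[0]
    r1 = np.min(ratios ** (-1.0 / (n[1:] - 1)))   # binding step determines r1
    M1 = np.max(dist * r1 ** n)                   # then the smallest admissible M1
    return r1, M1
```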

Analogous formulas may be derived for the parameters $r_2, M_2, r_3, M_3$. Then, from the pessimistic optimization for $A_1$ we have
$$r_1 = 1.04, \quad M_1 = 0.0268, \quad r_2 = 1.0941, \quad M_2 = 1.0888, \quad r_3 = 1.0904, \quad M_3 = 0.1372. \tag{69}$$
We can substitute these values into formula (39) in order to find the number of steps $n_{\mathrm{stat}}$ for a given value $\epsilon_1$ (see Table 1). In this table, the column "True value of $\epsilon_1$" gives the exact value of the left-hand side of (39) for the determined number of steps $n_{\mathrm{stat}}$ from the second column. The graph of the left-hand side of (39) as a function of the step number $n$ is shown in Fig. 1.

Table 1. Evaluation of $n_{\mathrm{stat}}$ for the third state

Assumed value of $\epsilon_1$ | Number of steps $n_{\mathrm{stat}}$ | True value of $\epsilon_1$
0.1   |  90 | 0.0978145
0.02  | 120 | 0.0196767
0.01  | 135 | 0.00974242
0.001 | 190 | 0.000981598

Figure 1. Error level $\epsilon_1$ as a function of $n$ for the third state


If we use the improved inequality (40) instead of (39), we may observe a reduction of the necessary number of steps $n_{\mathrm{stat}}$, especially for larger $\epsilon_1$ (see Table 2). This phenomenon is even easier to trace in Fig. 2, where the curve is much steeper at the beginning of the graph.

Table 2. Evaluation of $n_{\mathrm{stat}}$ for the third state based on inequality (40)

Assumed value of $\epsilon_1$ | Number of steps $n_{\mathrm{stat}}$ | True value of $\epsilon_1$
0.1   |  75 | 0.0981865
0.02  | 114 | 0.0195048
0.01  | 131 | 0.00989127
0.001 | 190 | 0.000967164

Figure 2. Error level $\epsilon_1$ as a function of $n$ for the third state, based on inequality (40)

We may perform the same analysis for the seventh state, i.e. the special set $A_2$. In this case the necessary parameters may be evaluated as
$$r_1 = 1.0438, \quad M_1 = 0.0793, \quad r_2 = 1.14385, \quad M_2 = 1.1439, \quad r_3 = 1.1231, \quad M_3 = 0.1394. \tag{70}$$
Because the atom $A_2$ has a higher stationary probability than $A_1$, we obtain smaller numbers of steps $n_{\mathrm{stat}}$ (see Table 3 and Fig. 3). We can also apply the improved inequality (40) to the set $A_2$, but due to the faster exponential convergence, guaranteed by the higher values of $r_i$, the gain from the reduction of the $n_{\mathrm{stat}}$ value is not as visible as in the previous case.


Table 3. Evaluation of $n_{\mathrm{stat}}$ for the seventh state

Assumed value of $\epsilon_1$ | Number of steps $n_{\mathrm{stat}}$ | True value of $\epsilon_1$
0.1   |  71 | 0.0992184
0.02  | 107 | 0.0192124
0.01  | 123 | 0.00961369
0.001 | 176 | 0.000988225

Figure 3. Error level $\epsilon_1$ as a function of $n$ for the seventh state

Now we turn to finding the value of $n_{\mathrm{Var}}$, using the methodology described in Section 3.4. Let us start from the conclusions based on inequality (59). The solutions derived for $A_1$ may be found in Table 4. In the second column there is the appropriate number of steps $n_{\mathrm{stat}}$ calculated for $\epsilon_1 = \epsilon_2$ (compare with Table 1). Because, according to Remark 1, the beginning and ending moments should be visits to $A_1$, the obtained values are put in the third and fourth columns. In order to minimize the possible influence of $n_{\mathrm{stat}}$ on the $n_{\mathrm{Var}}$ evaluation, we always multiply the necessary number of steps $n_{\mathrm{stat}}$ by two before further analysis. In the last column, the theoretical length of the necessary trajectory
$$\hat{\sigma}(1)\sqrt{\frac{m(1)-1}{\epsilon_2}} \tag{71}$$
(compare with (59)) is calculated. The same calculation could be done for $A_2$ (see Table 5).
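The outline of this experiment is easy to reproduce. The sketch below simulates the independent Metropolis-Hastings chain for the target (62) with a uniform proposal (any singleton state of a discrete chain is trivially an atom) and evaluates the quantity (71) for the atom $A_1$; the seed, the trajectory length and $\epsilon_2$ are illustrative choices, not values fixed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.array([1, 1, 2, 2, 3, 3, 4, 4]) / 20.0      # stationary probabilities (62)

def independent_mh(n_steps, x0=0):
    """Independent Metropolis-Hastings on the eight states of (62)
    with a uniform proposal."""
    xs, x = np.empty(n_steps, dtype=int), x0
    for k in range(n_steps):
        y = rng.integers(8)                         # independent uniform proposal
        if rng.uniform() < min(1.0, f[y] / f[x]):   # MH acceptance probability
            x = y
        xs[k] = x
    return xs

xs = independent_mh(1_000_000)
taus = np.flatnonzero(xs == 2)       # visit moments to A1 = {third state}
S = np.diff(taus)                    # for h = 1 the excursion sums are the tour lengths
sigma2 = S.var(ddof=0)               # estimator (51)
eps2 = 0.01
length_71 = np.sqrt(sigma2 * (len(taus) - 1) / eps2)   # theoretical length (71)
```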


Table 8. Evaluation of $n_{\mathrm{Var}}$ for both atoms simultaneously with the corrected value of $\hat{\sigma}(j,l)$

Assumed value of $\epsilon_2$ | Number of steps $n_{\mathrm{stat}}$ | Start | Stop | Theoretical length
0.1   |  90 | 180 |  230 |   46.7273
0.02  | 120 | 244 |  502 |  252.75
0.01  | 135 | 274 |  751 |  469.959
0.001 | 190 | 382 | 4571 | 4181.49

    5. Concluding remarks

We started by formulating two inequalities which correspond to the standard questions in MCMC setups: when is the sampled transition probability close to the determined stationary probability of the Markov Chain, and how many iterations should be used in order to minimize the error of the estimator? These inequalities correspond to finding two values, the numbers of steps $n_{\mathrm{stat}}$ and $n_{\mathrm{Var}}$, for the trajectory generated by some MCMC method. Then we used the features of the secondary chain in order to determine these values in a series of lemmas. Thereby we obtained a useful set of conditions which can be used for checking the convergence of an MCMC setup. An example of the application of the theoretical lemmas, and of reasoning based on them, in the case of a state space with atoms is also provided. It should be mentioned that this paper comprises some results from the Ph.D. dissertation (see Romaniuk, 2007), where additional remarks may be found.

We should emphasize the usefulness of the presented method, which can be applied in a highly automated manner and provides strict results for the experimenter. However, we should note that not just one, but a whole set of various algorithms and methods should be applied in order to control the MCMC output and guarantee the convergence of the simulated trajectory at a satisfactory level.

The possibilities of complementing the discussed method might also be considered. For example, the obtained conditions might be improved, as in (40). However, additional information about the structure of the state space or the underlying Markov Chain may be necessary in such a case. The dependencies among the number of special sets, their allocation, possible modes in the state space and the obtained solutions may be examined. The lemmas may also be generalized to other cases of state space structure and numbers of special sets.

    References

Asmussen, S. (1979) Applied Probability and Queues. J. Wiley, New York.
Athreya, K.B. and Ney, P. (1978) A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc. 245, 493-501.
Boos, D. and Zhang, J. (2000) Monte Carlo Evaluation of Resampling-Based Hypothesis Tests. Journal of the American Statistical Association 95, No. 450.
Booth, J.G. and Sarkar, S. (1998) Monte Carlo Approximation of Bootstrap Variances. The American Statistician 52, 4.
Brémaud, P. (1999) Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer Verlag, New York.
Brooks, S.P. and Roberts, G.O. (1998) Convergence assessment techniques for Markov chain Monte Carlo. Statistics and Computing 8, 319-335.
Cox, D.R. and Miller, H.D. (1965) The Theory of Stochastic Processes. Chapman and Hall, London.
Doucet, A., Godsill, S. and Andrieu, Ch. (2000) On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10.
Fishman, G.S. (1996) Monte Carlo: Concepts, Algorithms and Applications. Springer Verlag, New York.
Gelfand, A.E., Hills, S.E., Racine-Poon, A. and Smith, A.F.M. (1990) Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling. Journal of the American Statistical Association 85, 412.
Geyer, C.J. (1992) Practical Markov chain Monte Carlo (with discussion). Statist. Sci. 7, 473-511.
Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1997) Markov Chain Monte Carlo in Practice. Chapman & Hall.
Guihenneuc-Jouyaux, Ch. and Robert, Ch.P. (1998) Discretization of Continuous Markov Chains and Markov Chain Monte Carlo Convergence Assessment. Journal of the American Statistical Association 93, 443.
Iosifescu, M. (1980) Finite Markov Processes and Their Applications. Wiley, New York.
Kass, R.E., Carlin, B.P., Gelman, A. and Neal, R.M. (1998) Markov Chain Monte Carlo in Practice: A Roundtable Discussion. The American Statistician 52, 2.
Kipnis, C. and Varadhan, S.R. (1986) Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Comm. Math. Phys. 104, 1-19.
Koronacki, J., Lasota, S. and Niemiro, W. (2005) Positron emission tomography by Markov chain Monte Carlo with auxiliary variables. Pattern Recognition 38, 241-250.
Lasota, S. and Niemiro, W. (2003) A version of the Swendsen-Wang algorithm for restoration of images degraded by Poisson noise. Pattern Recognition 36, 931-941.
Li, S., Pearl, D.K. and Doss, H. (2000) Phylogenetic Tree Construction Using Markov Chain Monte Carlo. Journal of the American Statistical Association 95, 450.
Mehta, C.R., Patel, N.R. and Senchaudhuri, P. (2000) Efficient Monte Carlo Methods for Conditional Logistic Regression. Journal of the American Statistical Association 95, 449.
Mengersen, K.L., Robert, Ch.P. and Guihenneuc-Jouyaux, Ch. (1999) MCMC Convergence Diagnostics: A Review. In: J.M. Bernardo, J.O. Berger, A.P. Dawid, A.F.M. Smith, eds., Bayesian Statistics 6. Oxford University Press, 415-440.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953) Equations of state calculations by fast computing machines. J. Chem. Phys. 21.
Metropolis, N. and Ulam, S. (1949) The Monte Carlo Method. Journal of the American Statistical Association 44.
Mykland, P., Tierney, L. and Yu, B. (1995) Regeneration in Markov Chain Samplers. Journal of the American Statistical Association 90, 233-241.
Nummelin, E. (1978) A splitting technique for Harris recurrent Markov Chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 43, 309-318.
Raftery, A.E. and Lewis, S.M. (1999) How many iterations in the Gibbs Sampler? In: J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, eds., Bayesian Statistics 4. Oxford University Press, 763-773.
Robert, Ch.P. (1995) Convergence Control Methods for Markov Chain Monte Carlo Algorithms. Statistical Science 10, 3.
Robert, Ch.P. and Casella, G. (2004) Monte Carlo Statistical Methods, 2nd ed. Springer Verlag, New York.
Romaniuk, M. (2003) Pricing the Risk-Transfer Financial Instruments via Monte Carlo Methods. Systems Analysis Modelling Simulation 43, 8, 1043-1064.
Romaniuk, M. (2007) Application of renewal sets for convergence diagnosis of MCMC setups (in Polish). Ph.D. dissertation, Systems Research Institute, Polish Academy of Sciences.

