
An Interlacing Theorem for Reversible Markov Chains

Robert Grone
Department of Mathematics and Statistics
San Diego State University
San Diego, CA 92182-7720, USA

Karl Heinz Hoffmann
Institut für Physik
Technische Universität Chemnitz
D-09107 Chemnitz, Germany

Peter Salamon
Department of Mathematics and Statistics
San Diego State University
San Diego, CA 92182-7720, USA

June 14, 2005

Abstract

Reversible Markov chains are an indispensable tool in the modelling of a vast class of physical, chemical, and biological problems. Examples include the master equation descriptions of relaxing physical systems, stochastic optimization algorithms like simulated annealing, and the chemical dynamics of protein folding. Very often the large size of the state spaces requires the coarse graining or lumping of microstates into fewer mesoscopic states, and a question of utmost importance for the validity of the physical model is how the eigenvalues of the corresponding stochastic matrix change under this operation. In this paper we prove an interlacing theorem which gives explicit bounds on the eigenvalues of the lumped stochastic matrix.

1 Introduction

Reversible Markov chains are chains that satisfy the detailed balance condition (see equation (2) below). Their importance and introduction to the physics literature date back to the work of Ludwig Boltzmann (1887), who showed that this class of Markov chains is the only class to be used for almost all physical processes at the molecular level. Later the principle also became known as the principle of microscopic reversibility. The importance of this class of Markov chains was further increased by the introduction of the Metropolis algorithm (1953), which is the basis for essentially all Monte Carlo simulations of physical systems. This algorithm makes all the "kinetic factors" as large as possible consistent with detailed balance, and thus uses a designer Markov chain which has a particular stationary distribution, consistent with the requirement of having a given sparsity pattern and satisfying detailed balance. More recently (1984) this fact has been recognized to be much more generally useful and is the basis of Gibbs sampling, exploited in the family of Markov chain Monte Carlo (MCMC) methods now in widespread use for many sorts of statistical simulations [3].
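To make the detailed balance mechanism concrete, here is a minimal Python sketch (not from the paper) of a Metropolis-type chain on a small state space; the helper name metropolis_matrix, the nearest-neighbour proposal, and the target distribution are illustrative assumptions.

```python
import numpy as np

def metropolis_matrix(pi):
    """Column stochastic Metropolis matrix with stationary distribution pi,
    using a symmetric nearest-neighbour proposal (probability 1/2 each way)."""
    n = len(pi)
    M = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):            # propose a step to a neighbour
            if 0 <= j < n:
                M[j, i] = 0.5 * min(1.0, pi[j] / pi[i])
        M[i, i] = 1.0 - M[:, i].sum()        # rejected proposals stay put
    return M

pi = np.array([0.1, 0.2, 0.3, 0.4])
M = metropolis_matrix(pi)
assert np.allclose(M.sum(axis=0), 1.0)       # column sums equal one
assert np.allclose(M * pi, (M * pi).T)       # detailed balance, cf. eq. (2)
assert np.allclose(M @ pi, pi)               # pi is stationary
```

The acceptance rule min(1, pi[j]/pi[i]) makes each off-diagonal entry as large as detailed balance allows for the given proposal.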

Another manifestation of reversible Markov chains arises from random walks on a simple graph on n vertices. Let A denote the adjacency matrix of a graph G. By a random walk on G, we mean a sequence of steps between adjacent vertices where each adjacent vertex is equally likely for the next step. This describes a Markov chain with matrix M = AD^{-1}, where D is diagonal with d_ii being the degree of vertex i for all i = 1, 2, …, n. As we state in Proposition 2 below, reversible Markov chains have matrices which are diagonally similar to symmetric matrices, and thus many facts about symmetric matrices apply. The random walk matrix M = AD^{-1} is similar to D^{-1/2}AD^{-1/2}, which is clearly symmetric.
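A short sketch of this similarity, using an arbitrarily chosen path graph on four vertices:

```python
import numpy as np

# Random walk on a 4-vertex path graph (an illustrative choice).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=0)                       # vertex degrees d_ii

M = A / deg                               # M = A D^{-1}: column j divided by deg(j)
S = A / np.sqrt(np.outer(deg, deg))       # D^{-1/2} A D^{-1/2}, symmetric

assert np.allclose(M.sum(axis=0), 1.0)    # M is column stochastic
assert np.allclose(S, S.T)
# M is similar to S, so its eigenvalues are real:
assert np.allclose(np.sort(np.linalg.eigvals(M).real),
                   np.linalg.eigvalsh(S))
```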

Our main result is an interlacing theorem. Given real numbers λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n and β_1 ≥ β_2 ≥ ⋯ ≥ β_{n−1}, we say the β's interlace the λ's if

$$\lambda_1 \ge \beta_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n-1} \ge \beta_{n-1} \ge \lambda_n. \qquad (1)$$

Perhaps the best-known example of an interlacing theorem is the Cauchy Interlacing Theorem, which states that the eigenvalues of an (n−1)-by-(n−1) principal submatrix of a symmetric matrix A interlace the eigenvalues of A [5]. In this paper we consider a reversible Markov chain on n states with (irreducible) transition matrix M. We describe a "lumped" Markov chain on n−1 states obtained by combining two states of M. If M̄ is the transition matrix of the lumped Markov chain, we see that M̄ is reversible and that the eigenvalues of M̄ interlace those of M.

Physical descriptions that forego many of the details of the real dynamics by lumping together collections of microstates into mesoscopic states are an old idea, often referred to as coarse graining. Such coarse graining is a crucial ingredient in the modern description of many physical processes [4]. In particular, it is an indispensable tool for the study of complex energy landscapes [6]. Our result shows that such coarse graining can only speed up relaxation on these landscapes.


2 Reversible Markov Chains

Let M be an n-by-n irreducible column stochastic matrix, i.e., all entries are non-negative, all column sums equal one, and there does not exist a permutation matrix P such that P^T M P is block triangular. The Markov chain with transition matrix M is said to be reversible [1, 2] iff there exists a non-zero vector v such that

$$M_{ij} v_j = M_{ji} v_i, \quad \forall\, i, j. \qquad (2)$$

If such a vector exists, it follows by summing (2) over j that v must be an eigenvector of M with eigenvalue 1. Thus without loss of generality we may assume that v has been properly normalized, i.e., that v > 0 and v_1 + ⋯ + v_n = 1. In other words, v is the stationary distribution of the Markov chain corresponding to M. It also follows from equation (2) and our irreducibility assumption that the Markov chain corresponding to M is regular, and so λ = 1 is a simple eigenvalue of M. As discussed above, condition (2) is also called detailed balance or microscopic reversibility in the physics literature [4]. If the chain induced by M is reversible, then we will also say that M is reversible. The following two propositions are easily shown.

Proposition 1: Let V = diag(v_1^{1/2}, …, v_n^{1/2}). Then M is reversible iff V^{−1}MV is symmetric.

Proposition 2: If M is reversible, then M has all real eigenvalues, say

$$1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n > -1. \qquad (3)$$
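Both propositions are easy to check numerically; the sketch below reuses the illustrative metropolis_matrix helper from the introduction to produce a reversible M.

```python
import numpy as np

v = np.array([0.1, 0.2, 0.3, 0.4])          # stationary distribution
M = metropolis_matrix(v)                     # reversible by construction
V = np.diag(v ** 0.5)                        # V = diag(v_1^{1/2}, ..., v_n^{1/2})

S = np.linalg.inv(V) @ M @ V                 # V^{-1} M V
assert np.allclose(S, S.T)                   # Proposition 1: symmetric

lam = np.linalg.eigvalsh(S)[::-1]            # real eigenvalues, descending
assert np.isclose(lam[0], 1.0)               # Proposition 2: lambda_1 = 1
assert lam[1] < 1.0 and lam[-1] > -1.0       # 1 is simple, lambda_n > -1
```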

3 Lumping states

One can form a (lumped) Markov chain on n−1 states by consolidating two states (say states 1 and 2) of a Markov chain with n states [1]. If we start with the Markov chain with transition matrix M, the corresponding column stochastic matrix M̄ of the lumped chain, obtained from M by lumping the first two states of M into one state, is formed as follows:

i. add row 1 of M to row 2 of M, and then delete row 1;

ii. replace column 2 of M by d_1 col_1(M) + d_2 col_2(M), where

$$d_1 = \frac{v_1}{v_1 + v_2}, \qquad d_2 = \frac{v_2}{v_1 + v_2},$$

and then delete column 1; the result is M̄.


It is clear that M̄ is (n−1)-by-(n−1) and column stochastic. In matrix terms,

$$\bar{M} = C M D \qquad (4)$$

where C is the (n−1)-by-n collecting matrix

$$C = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & & \ddots & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix} \qquad (5)$$

and D is the n-by-(n−1) distributing matrix

$$D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ d_2 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix} \qquad (6)$$

with d_1 = v_1/(v_1 + v_2) and d_2 = v_2/(v_1 + v_2).
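A minimal sketch of the construction, continuing the illustrative 4-state chain (M, v) from the earlier snippets:

```python
import numpy as np

n = len(v)                                   # 4-state chain from above
d1, d2 = v[0] / (v[0] + v[1]), v[1] / (v[0] + v[1])

C = np.zeros((n - 1, n))                     # collecting matrix, eq. (5)
C[0, 0] = C[0, 1] = 1.0
C[1:, 2:] = np.eye(n - 2)

D = np.zeros((n, n - 1))                     # distributing matrix, eq. (6)
D[0, 0], D[1, 0] = d1, d2
D[2:, 1:] = np.eye(n - 2)

Mbar = C @ M @ D                             # eq. (4)
vbar = C @ v                                 # lumped stationary distribution

assert np.allclose(C @ D, np.eye(n - 1))     # C D = I_{n-1}
assert np.allclose(Mbar.sum(axis=0), 1.0)    # column stochastic
assert np.allclose(Mbar @ vbar, vbar)        # Proposition 3
assert np.allclose(Mbar * vbar, (Mbar * vbar).T)   # reversibility of Mbar
```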

Proposition 3: M̄ is reversible with stationary distribution Cv = (v_1 + v_2, v_3, …, v_n)^T.

Proof: It is clear that for i, j > 1 condition (2) is satisfied for M̄ and Cv. For i = 1 we have, from the reversibility of M and the definition of M̄, that

$$\bar{M}_{i,1}(Cv)_1 = (d_1 M_{i,1} + d_2 M_{i,2})(v_1 + v_2) \qquad (7)$$
$$= v_1 M_{i,1} + v_2 M_{i,2} \qquad (8)$$
$$= M_{1,i} v_i + M_{2,i} v_i \qquad (9)$$
$$= \bar{M}_{1,i}(Cv)_i. \qquad (10)$$


4 The Interlacing Theorem

Let M be an n-by-n column stochastic matrix corresponding to a reversible Markov chain, and let M̄ be formed as previously described. Then M̄ corresponds to a reversible Markov chain with stationary distribution Cv. It follows that M̄ has all real eigenvalues, say

$$1 = \beta_1 > \beta_2 \ge \cdots \ge \beta_{n-1} > -1. \qquad (11)$$

Our main result is the following

Theorem: The eigenvalues of M̄ interlace those of M:

$$1 = \lambda_1 = \beta_1 > \lambda_2 \ge \beta_2 \ge \lambda_3 \ge \cdots \ge \beta_{n-1} \ge \lambda_n. \qquad (12)$$

One implication of our theorem is that M̄^k converges to its limit at least as fast as M^k, since convergence is determined by the rates at which α^k and ᾱ^k converge to zero, where

$$\alpha = \max\{|\lambda_2|, \ldots, |\lambda_n|\} \qquad (13)$$

and

$$\bar{\alpha} = \max\{|\beta_2|, \ldots, |\beta_{n-1}|\}. \qquad (14)$$
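Continuing the numerical sketch, both the interlacing (12) and the rate comparison ᾱ ≤ α can be checked directly (M, Mbar and n as in the earlier snippets):

```python
import numpy as np

lam = np.sort(np.linalg.eigvals(M).real)[::-1]     # 1 = lam[0] >= lam[1] >= ...
beta = np.sort(np.linalg.eigvals(Mbar).real)[::-1] # 1 = beta[0] >= ...

tol = 1e-12
for i in range(1, n - 1):                          # eq. (12), 0-indexed
    assert lam[i] + tol >= beta[i] >= lam[i + 1] - tol

alpha = np.abs(lam[1:]).max()                      # eq. (13)
alpha_bar = np.abs(beta[1:]).max()                 # eq. (14)
assert alpha_bar <= alpha + tol                    # lumping never slows relaxation
```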

Proof: The proof requires some of the same considerations that make the proof of the Cauchy interlacing theorem for symmetric matrices complicated: we first argue that the result (12) holds for generic M having distinct eigenvalues and satisfying the additional genericity condition of Lemma 4 below. The general case then follows by a continuity argument.

Since V^{−1}MV is symmetric, we may write

$$V^{-1} M V = Q \Lambda Q^T \qquad (15)$$

where

$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) \qquad (16)$$

and Q is real orthogonal. We assume without loss of generality that det(Q) = 1. Now

$$\bar{M} = C M D \qquad (17)$$
$$= C V Q \Lambda Q^T V^{-1} D \qquad (18)$$

and so

$$\lambda I_{n-1} - \bar{M} = C V Q (\lambda I_n - \Lambda) Q^T V^{-1} D \qquad (19)$$

since Q Q^T = V V^{−1} = I_n and C D = I_{n−1}.


If we take the (n−1)th compound [5] in equation (19), we obtain

$$\det[\lambda I_{n-1} - \bar{M}] \qquad (20)$$
$$= C_{n-1}[\lambda I_{n-1} - \bar{M}] \qquad (21)$$
$$= C_{n-1}[C V Q (\lambda I_n - \Lambda) Q^T V^{-1} D] \qquad (22)$$
$$= C_{n-1}[C V Q]\, C_{n-1}[\lambda I_n - \Lambda]\, C_{n-1}[Q^T V^{-1} D]. \qquad (23)$$

Now for simplicity define

$$g(\lambda) = \det[\lambda I_{n-1} - \bar{M}] = \prod_{i=1}^{n-1} (\lambda - \beta_i) \qquad (24)$$
$$f(\lambda) = \det[\lambda I_n - \Lambda] = \prod_{i=1}^{n} (\lambda - \lambda_i) \qquad (25)$$
$$x = C_{n-1}[C V Q] \qquad (26)$$
$$y = C_{n-1}[Q^T V^{-1} D]. \qquad (27)$$

Let

$$f_j(\lambda) = \frac{f(\lambda)}{\lambda - \lambda_j} = \prod_{i=1,\, i \ne j}^{n} (\lambda - \lambda_i). \qquad (28)$$

Since λI_n − Λ is diagonal, it is clear that

$$C_{n-1}[\lambda I_n - \Lambda] = \begin{pmatrix} f_n(\lambda) & 0 & \cdots & 0 \\ 0 & f_{n-1}(\lambda) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f_1(\lambda) \end{pmatrix}. \qquad (29)$$

We then have

$$g(\lambda) = x \begin{pmatrix} f_n(\lambda) & 0 & \cdots & 0 \\ 0 & f_{n-1}(\lambda) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f_1(\lambda) \end{pmatrix} y, \qquad (30)$$

so

$$g(\lambda) = \sum_{i=1}^{n} x_i y_i f_{n+1-i}(\lambda). \qquad (31)$$

To continue, we make use of the following lemmas.

Lemma 1: x = k y^T for k = (v_3 ⋯ v_n)/(v_1 + v_2) > 0.

Lemma 2: x_n = 0.

Lemma 3: Let 0 ≠ w ∈ W, where W is a real m-dimensional inner product space. Then there exists an orthonormal basis of W, say {w_1, …, w_m}, such that (w, w_i) ≠ 0 for all i = 1, …, m.

Lemma 4: It suffices to prove the theorem under the genericity assumption that x_i ≠ 0 for all i = 1, …, n−1.

Lemma 5: It suffices to prove the theorem under the genericity assumption that λ_1 > λ_2 > λ_3 > ⋯ > λ_n.

We leave the proofs of these lemmas till last and proceed with the proof of the theorem. Using Lemmas 1 and 2 (so that x_i y_i = x_i^2/k), equation (31) becomes

$$g(\lambda) = \sum_{i=1}^{n-1} \frac{x_i^2}{k}\, f_{n+1-i}(\lambda), \qquad (32)$$

so g(λ) is a positive linear combination of f_2(λ), …, f_n(λ). Clearly,

$$f_i(\lambda_j) = 0 \quad \text{for all } i \ne j \qquad (33)$$

and

$$f_i(\lambda_i) = \prod_{j=1,\, j \ne i}^{n} (\lambda_i - \lambda_j). \qquad (34)$$

From this last fact and the strict inequality among the λ's in Lemma 5, we have

$$f_1(\lambda_1) > 0 \qquad (35)$$
$$f_2(\lambda_2) < 0 \qquad (36)$$
$$f_3(\lambda_3) > 0 \qquad (37)$$
$$f_4(\lambda_4) < 0 \qquad (38)$$
$$\ldots \qquad (39)$$

Since g(λ) is a positive linear combination of f_2(λ), …, f_n(λ), this yields

$$g(\lambda_1) = 0 \qquad (40)$$
$$g(\lambda_2) < 0 \qquad (41)$$
$$g(\lambda_3) > 0 \qquad (42)$$
$$g(\lambda_4) < 0 \qquad (43)$$
$$\ldots \qquad (44)$$

By the Intermediate Value Theorem, g(λ) must have at least one zero in each of the intervals

$$(\lambda_n, \lambda_{n-1}), (\lambda_{n-1}, \lambda_{n-2}), \ldots, (\lambda_3, \lambda_2). \qquad (45)$$

Since deg(g(λ)) = n − 1 and β_1 = 1 is known, this accounts for all the other n − 2 roots of g(λ), and we have shown that

$$1 = \lambda_1 = \beta_1 > \lambda_2 > \beta_2 > \lambda_3 > \cdots > \beta_{n-1} > \lambda_n. \qquad (46)$$

This completes the proof of the theorem. We now present proofs of the lemmas.


Proof of Lemma 1: x = k y^T.

We have

$$x = C_{n-1}[C V Q] \qquad (47)$$
$$= C_{n-1}[C V]\, C_{n-1}[Q] \qquad (48)$$
$$= C_{n-1}\!\begin{pmatrix} v_1^{1/2} & v_2^{1/2} & 0 & \cdots & 0 \\ 0 & 0 & v_3^{1/2} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & v_n^{1/2} \end{pmatrix} C_{n-1}[Q] \qquad (49)$$
$$= (0, \ldots, 0, w_{n-1}, w_n)\, C_{n-1}[Q] \qquad (50, 51)$$

where

$$w_{n-1} = v_1^{1/2} (v_3 \cdots v_n)^{1/2} \qquad (52)$$

and

$$w_n = v_2^{1/2} (v_3 \cdots v_n)^{1/2}. \qquad (53)$$

Similarly,

$$y = C_{n-1}[Q^T V^{-1} D] \qquad (54)$$
$$= C_{n-1}[Q^T]\, C_{n-1}[V^{-1} D] \qquad (55)$$
$$= C_{n-1}[Q^T]\, C_{n-1}\!\begin{pmatrix} d_1 v_1^{-1/2} & 0 & \cdots & 0 \\ d_2 v_2^{-1/2} & 0 & \cdots & 0 \\ 0 & v_3^{-1/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_n^{-1/2} \end{pmatrix} \qquad (56)$$
$$= C_{n-1}[Q^T]\, (0, \ldots, 0, z_{n-1}, z_n)^T \qquad (57)$$

where

$$z_{n-1} = d_1 v_1^{-1/2} (v_3 \cdots v_n)^{-1/2} = \frac{v_1^{1/2}}{v_1 + v_2} (v_3 \cdots v_n)^{-1/2} \qquad (58)$$

and

$$z_n = d_2 v_2^{-1/2} (v_3 \cdots v_n)^{-1/2} = \frac{v_2^{1/2}}{v_1 + v_2} (v_3 \cdots v_n)^{-1/2}. \qquad (59)$$

If we let

$$k = \frac{v_3 \cdots v_n}{v_1 + v_2}, \qquad (60)$$


we see kz = wT . Now

xT = (wCn−1[Q])T (61)

= Cn−1[Q]T wT (62)

= Cn−1[Q]T kz (63)

= kCn−1[Q]T z (64)

= ky (65)

as desired.

Proof of Lemma 2: x_n = 0.

Recall that x = C_{n−1}[CVQ], so x = C_{n−1}(C) C_{n−1}(V) C_{n−1}(Q). It is easy to see that

$$C_{n-1}(C) = (0, \ldots, 0, 1, 1) \qquad (66)$$

and

$$C_{n-1}(V) = (v_1 v_2 \cdots v_n)^{1/2} \begin{pmatrix} v_n^{-1/2} & 0 & \cdots & 0 \\ 0 & v_{n-1}^{-1/2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_1^{-1/2} \end{pmatrix}. \qquad (67)$$

Also, from fact 5 in our appendix,

$$C_{n-1}(Q) = P^T Q^T P, \qquad (68)$$

the last column of which has the form

$$(\pm v_n^{1/2}, \mp v_{n-1}^{1/2}, \ldots, -v_2^{1/2}, v_1^{1/2})^T. \qquad (69)$$


Now

$$x = C_{n-1}(C)\, C_{n-1}(V)\, C_{n-1}(Q)$$
$$= (v_1 v_2 \cdots v_n)^{1/2}\, (0, \ldots, 0, 1, 1) \begin{pmatrix} v_n^{-1/2} & & & \\ & v_{n-1}^{-1/2} & & \\ & & \ddots & \\ & & & v_1^{-1/2} \end{pmatrix} C_{n-1}(Q)$$
$$= (v_1 v_2 \cdots v_n)^{1/2}\, (0, \ldots, 0, v_2^{-1/2}, v_1^{-1/2})\, C_{n-1}(Q)$$
$$= (v_1 v_2 \cdots v_n)^{1/2}\, (\ast, \ast, \ldots, \ast, 0),$$

since in the last entry v_2^{-1/2}(−v_2^{1/2}) + v_1^{-1/2}(v_1^{1/2}) = −1 + 1 = 0.
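A numerical spot-check of Lemma 2 on the illustrative 4-state chain (S, V, C, n as in the earlier snippets; compound() is the hypothetical helper sketched after the appendix below, which uses the lexicographic ordering assumed here):

```python
import numpy as np

w_eig, Qe = np.linalg.eigh(S)       # S = V^{-1} M V, ascending eigenvalues
Q = Qe[:, ::-1]                     # descending order: first column = Perron vector
x = compound(C @ V @ Q, n - 1)      # x = C_{n-1}[C V Q], a 1-by-n row
assert abs(x[0, -1]) < 1e-12        # last entry x_n vanishes
```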

Proof of Lemma 3:

We may assume without loss of generality that ‖w‖ = 1. Extending w to an orthonormal basis of W, we may further assume that W = R^m and that w = e_1 = (1, 0, …, 0)^T. Now let P be any m-by-m orthogonal matrix with no zero entries in its first column. If the rows of P are r_1, …, r_m, we may let w_i = r_i^T for all i = 1, …, m; then (w, w_i) = P_{i1} ≠ 0.

Proof of Lemma 4:

We saw in the proof of Lemma 2 that x is a positive multiple of

$$(0, \ldots, 0, v_2^{-1/2}, v_1^{-1/2})\, P^T Q^T P, \qquad (70)$$

where P^T Q^T P is an n-by-n orthogonal matrix with last column

$$y_n = (\pm v_n^{1/2}, \ldots, v_3^{1/2}, -v_2^{1/2}, v_1^{1/2})^T. \qquad (71)$$

In Lemma 2 this was used to show that x_n = 0, i.e., that w = (0, …, 0, v_2^{−1/2}, v_1^{−1/2})^T is orthogonal to y_n. Now apply Lemma 3 to W = ⟨y_n⟩^⊥ and this w ∈ W. We deduce that there exists an orthonormal basis of W, say {y_1, …, y_{n−1}}, where (w, y_i) ≠ 0 for all i = 1, …, n−1. Since {y_1, …, y_n} is orthonormal, the matrix Y = [y_1 y_2 … y_n] is orthogonal. If we define Q̂ = P^T Y^T P, then Q̂ is an n-by-n orthogonal matrix sharing the first column of Q, but for which

$$\hat{x} := C_{n-1}(C V \hat{Q}) = (\hat{x}_1, \ldots, \hat{x}_{n-1}, 0), \qquad (72, 73)$$

where

$$\hat{x}_i = (w, y_i) \ne 0, \quad i = 1, \ldots, n-1. \qquad (74)$$

More generally, let Q̂ denote any orthogonal matrix having the same first column as Q, and define

$$\hat{M} = V \hat{Q} \Lambda \hat{Q}^T V^{-1}. \qquad (75)$$

Then M̂ has the same eigenvalues as M, and the same eigenvector corresponding to λ_1 = 1. (Note that M̂ may have negative entries, but its columns still add to 1.) Among this set of matrices, the set of M̂ for which x̂_i ≠ 0 for all i = 1, …, n−1 is a dense subset. Thus it suffices to prove our result under the generic assumption that x_i ≠ 0 for all i = 1, …, n−1.

Proof of Lemma 5:

Recall that M = V Q Λ Q^T V^{−1}, where Λ = diag(λ_1, …, λ_n) and λ_1 > λ_2 ≥ λ_3 ≥ ⋯ ≥ λ_n. For any ε < (λ_1 − λ_2)/(n−1), let

$$\Lambda_\epsilon = \mathrm{diag}(\lambda_1,\, \lambda_2 + (n-1)\epsilon,\, \lambda_3 + (n-2)\epsilon,\, \ldots,\, \lambda_n + \epsilon) \qquad (76)$$

and

$$M_\epsilon = V Q \Lambda_\epsilon Q^T V^{-1}. \qquad (77)$$

Since M_ε has distinct eigenvalues for all ε ∈ (0, (λ_1 − λ_2)/(n−1)) and M = M_0, it suffices by continuity to prove the theorem for M_ε, where 0 < ε < (λ_1 − λ_2)/(n−1).

Corollary: When we lump k states of a reversible Markov chain, the eigenvalues of the lumped chain interlace the eigenvalues of the original chain according to

$$\lambda_1 = \beta_1 = 1, \qquad \lambda_i \ge \beta_i \ge \lambda_{i+k-1}, \quad i = 2, 3, \ldots \qquad (78)$$

The proof follows from the fact that lumping k states at once gives the same chain as k − 1 iterated pairwise lumpings, as the sketch below illustrates.
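A sketch of the iterated-lumping argument; lump_pair is an assumed helper packaging the CMD construction of Section 3, applied here to the illustrative 4-state chain (M, v) from the earlier snippets:

```python
import numpy as np

def lump_pair(M, v):
    """Lump the first two states of a reversible chain (M, v) via Mbar = C M D."""
    n = M.shape[0]
    C = np.zeros((n - 1, n)); C[0, :2] = 1.0; C[1:, 2:] = np.eye(n - 2)
    D = np.zeros((n, n - 1))
    D[0, 0] = v[0] / (v[0] + v[1])
    D[1, 0] = v[1] / (v[0] + v[1])
    D[2:, 1:] = np.eye(n - 2)
    return C @ M @ D, C @ v

k = 3                                       # lump k = 3 states of the 4-state chain
Mk, vk = M, v
for _ in range(k - 1):                      # k - 1 pairwise lumpings
    Mk, vk = lump_pair(Mk, vk)

lam = np.sort(np.linalg.eigvals(M).real)[::-1]
beta = np.sort(np.linalg.eigvals(Mk).real)[::-1]
for i in range(1, len(beta)):               # eq. (78), 0-indexed
    assert lam[i] + 1e-12 >= beta[i] >= lam[i + k - 1] - 1e-12
```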

5 Conclusions

The main result of this work is an interlacing theorem for reversible Markov chains that result from lumping two states of a bigger chain. The theorem provides a solid tool for the analysis of a large class of Markov chain models in physics and other areas. It gives strict bounds on the relaxation dynamics for a multitude of processes which can be used as coarse grained or lumped approximations of the underlying process. Descriptions of irreversible decay towards an equilibrium or stationary state rest on the use of Markov chains. With the typical 10^23 particles per cubic centimeter and the corresponding number of degrees of freedom, it is clear that coarse grained descriptions which reduce the dimensionality of the problem are necessary. Our theorem provides direct bounds on the eigenvalues of the lumped problem, and thus provides a rigorous proof of limits for the underlying dynamics. Note that our theorem bounds the eigenvalues of the lumped dynamics from above and below. Depending on the structure of the original system these bounds can be quite restrictive, giving a small range of possible time scales for the lumped dynamics.

For large numbers of sequential lumping steps, the relevant eigenvalues might change considerably. On the other hand, for cases where symmetries in the system lead to the dynamical equivalence of states, the eigenvalues might change very little. In some disordered physical systems, Markov chain techniques have revealed certain "dynamical degeneracies" of states [7]. The behavior of such systems under lumping is a highly interesting yet unsolved problem. While the present interlacing theorem provides partial information regarding these open problems, stronger interlacing theorems along the lines of Weyl's theorem, relating the eigenvalues of the unlumped chain to the eigenvalues of the dynamical components, will be explored in a future effort.

Finally, we mention that for the designer Markov chains developed in different areas as tools for a variety of problems, like finding global minima of multi-minima functions, the interlacing theorem provides a tool for developing fast converging algorithms. Depending on the structure of the problem, clever coarse graining of the state space might allow much faster convergence compared to standard methods.

6 Appendix

Let A and B be matrices, with A m-by-n, and let 1 ≤ k ≤ min{m, n}. The kth compound of A, written C_k[A], is the $\binom{m}{k}$-by-$\binom{n}{k}$ matrix of k-by-k subdeterminants (minors) of A. Compounds have the following properties [5]:

1. C_1[A] = A.

2. If m = n, C_n[A] = det A.

3. C_k[A^T] = C_k[A]^T.

4. (Cauchy-Binet) C_k[AB] = C_k[A] C_k[B], provided that AB is defined.

5. If A is n-by-n and invertible, then

$$C_{n-1}(A) = \det(A)\, P^T A^{-1} P, \qquad (79)$$

where

$$P = [(-1)^i \delta_{j,\, n+1-i}] = \begin{pmatrix} 0 & \cdots & 0 & 0 & -1 \\ 0 & \cdots & 0 & +1 & 0 \\ 0 & \cdots & -1 & 0 & 0 \\ \vdots & & \vdots & \vdots & \vdots \\ \pm 1 & 0 & \cdots & \cdots & 0 \end{pmatrix}. \qquad (80)$$
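For completeness, a small sketch of the kth compound (minors ordered lexicographically by index set, an assumed but standard convention) together with a check of property 4:

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """kth compound: determinants of all k-by-k submatrices of A,
    with row and column index sets ordered lexicographically."""
    m, n = A.shape
    rows, cols = list(combinations(range(m), k)), list(combinations(range(n), k))
    Ck = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            Ck[i, j] = np.linalg.det(A[np.ix_(r, c)])
    return Ck

rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 3)), rng.normal(size=(3, 4))
assert np.allclose(compound(A @ B, 2),
                   compound(A, 2) @ compound(B, 2))   # property 4 (Cauchy-Binet)
```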


References

[1] John G. Kemeny and J. Laurie Snell, Finite Markov Chains (Van Nostrand, Princeton, 1960).

[2] C.J. Burke and M. Rosenblatt, "A Markovian function of a Markov chain," Annals of Mathematical Statistics, 29, 1112-1122, 1958.

[3] W.R. Gilks, S. Richardson, and D.J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in Practice (Chapman & Hall, London, 1996).

[4] N.G. Van Kampen, Stochastic Processes in Physics and Chemistry (North Holland, Amsterdam, 1992).

[5] Roger A. Horn and Charles R. Johnson, Matrix Analysis (Cambridge University Press, New York, 1985).

[6] David J. Wales, Energy Landscapes: With Applications to Clusters, Biomolecules and Glasses (Cambridge University Press, New York, 2004).

[7] P. Sibani, J.C. Schoen, P. Salamon, and J.O. Andersson, "Emergent Hierarchical Structures in Complex System Dynamics," Europhysics Letters, 22, 479-485, 1993.
