Linear Algebra and its Applications 438 (2013) 1727–1745
Sensitivity analysis of discrete Markov chains via matrix calculus
Hal Caswell∗
Biology Department MS-34, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA
Max Planck Institute for Demographic Research, Rostock, Germany
ARTICLE INFO
Article history:
Received 29 November 2010
Accepted 29 July 2011
Available online 12 November 2011
Submitted by V. Mehrmann
Keywords:
Sensitivity
Elasticity
Population models
Life expectancy
Life lost to mortality
Community dynamics
Demography
ABSTRACT
Markov chains are often used as mathematical models of natural phenomena, with transition probabilities defined in terms of parameters that are of interest in the scientific question at hand. Sensitivity analysis is an important way to quantify the effects of changes in these parameters on the behavior of the chain. Many properties of Markov chains can be written as simple matrix expressions, and hence matrix calculus is a powerful approach to sensitivity analysis. Using matrix calculus, we derive the sensitivity and elasticity of a variety of properties of absorbing and ergodic finite-state chains. For absorbing chains, we present the sensitivities of the moments of the number of visits to each transient state, the moments of the time to absorption, the mean number of states visited before absorption, the quasistationary distribution, and the probabilities of absorption in each of several absorbing states. For ergodic chains, we present the sensitivity of the stationary distribution, the mean first passage time matrix, the fundamental matrix, and the Kemeny constant. We include two examples of application of the results to demographic and ecological problems.
© 2011 Elsevier Inc. All rights reserved.
1. Introduction
Perturbation analysis is a long-standing problem in the theory of Markov chains [40,10,15,13,42,43,31,9,33,35,36,27]. When Markov chains are applied as models of physical, biological, or social systems, they are often defined as functions of parameters that have substantive meaning. Perturbation analysis is an important tool for understanding how those parameters determine the properties of the chain, and in predicting how changes in the environment (sensu lato) will change the outcome. For applications, the analyses should be easily computable, and flexible enough to handle a variety of dependent variables. To the extent that the properties of the chain can be written in terms of matrices, matrix calculus [28,29,1] is an ideal tool for this end.
∗ E-mail address: hcaswell@whoi.edu
http://dx.doi.org/10.1016/j.laa.2011.07.046
In this paper, we apply matrix calculus to absorbing and ergodic discrete-time chains. For absorbing
chains, we derive the sensitivities of the moments of the number of visits to each transient state, of
the moments of the time to absorption, of the mean number of states visited before absorption, of the
quasistationary distribution, and of the probabilities of absorption in each of several absorbing states.
For ergodic chains, we derive the sensitivity of the stationary distribution, of the mean first passage
time matrix, of the fundamental matrix, and of the Kemeny constant. The results are motivated by (but
not restricted to) problems in ecology and demography, and we include two examples. The examples
present, as far as we know, the first calculations of many of these sensitivities.
Notation: Vectors are column vectors, and transition matrices are column-stochastic and operate on column vectors, to agree with usage in other dynamical systems. The norm ‖x‖ denotes the 1-norm. The vector e is a vector of ones, $e_i$ is the ith unit vector, $E = ee^T$ is a matrix of ones, and $E_{ij} = e_i e_j^T$ is a matrix with a 1 in the (i, j) entry and zeros elsewhere. The matrix D(x) is the matrix with the vector x on the diagonal and zeros elsewhere. The matrix $X_{dg}$ is the diagonal matrix with the diagonal of X on its diagonal and zeros elsewhere. The Hadamard product is denoted by ◦ and the Kronecker product by ⊗. Where necessary to avoid ambiguity, a subscript denotes the dimension of the identity matrix (e.g. $I_s$ is the identity matrix of dimension s). As in Matlab, row i and column j of X are denoted X(i, :) and X(:, j), respectively. The vec operator applied to an m × n matrix X yields the mn × 1 vector created by stacking the columns of X one above the other.
In matrix calculus [29], the derivative of a vector y with respect to a vector x is
\[ \frac{dy}{dx^T} = \left( \frac{dy_i}{dx_j} \right). \tag{1} \]
If y and x are nonnegative it is often useful to evaluate the elasticity, or proportional sensitivity of y to
x. This gives the relative change in y caused by a relative change in x. There is no standard notation for
elasticities, but here we adapt Samuelson’s suggestion [39] and use a notation analogous to that for
derivatives, but with ε replacing d:
\[ \frac{\varepsilon y}{\varepsilon x^T} = D(y)^{-1} \frac{dy}{dx^T} D(x). \tag{2} \]
Matrices are treated by using the vec operator to transform them to vectors, and differentiated accordingly. Thus the derivative of y with respect to the matrix X is written $dy/d\mathrm{vec}^T X$; this arrangement permits the chain rule to operate (see Appendix A and references therein). Most results are reported as differentials. If $dy = Q\,dx$ for some matrix $Q$, then
\[ \frac{dy}{dx^T} = Q \tag{3} \]
\[ \frac{dy}{d\theta^T} = Q \frac{dx}{d\theta^T} \tag{4} \]
where θ is any vector of which x is a function [29]. If y depends on $x_1$ and $x_2$, then the partial differentials of y [14, p. 37] are denoted $\partial_{x_1} y = (\partial y/\partial x_1)\,dx_1$ and similarly for $x_2$. For more details of matrix calculus notation, see Appendix A.
2. Absorbing chains
The transition matrix for a discrete-time absorbing chain can be written
\[ P = \begin{pmatrix} U & 0 \\ M & I \end{pmatrix} \tag{5} \]
where U, of dimension s × s, is the transition matrix among the s transient states, and M, of dimension
a × s, contains probabilities of transition from the transient states to the a absorbing states. Assume
that the spectral radius of U is strictly less than 1. Because we are concerned here with absorption,
but not what happens after, we ignore transitions among absorbing states; hence the identity matrix
(a × a) in the lower right corner. The matrices U[θ] and M[θ] are functions of a vector of parameters.
We assume that θ varies over some set in which the column sums of P are 1 and the spectral radius of
U is strictly less than one.
2.1. Number of visits to transient states
Let $\nu_{ij}$ be the number of visits to transient state i, prior to absorption, by an individual starting in transient state j. The expectations of the $\nu_{ij}$ are the entries of the fundamental matrix $N = N_1 = (E(\nu_{ij}))$:
\[ N = (I - U)^{-1} \tag{6} \]
(e.g. [25,23]). Let $N_k = (E(\nu_{ij}^k))$ be a matrix containing the kth moments about the origin of the $\nu_{ij}$. The first several of these matrices are [23, Theorem 3.1]
\[ N_1 = (I - U)^{-1} \tag{7} \]
\[ N_2 = (2N_{dg} - I)\,N_1 \tag{8} \]
\[ N_3 = (6N_{dg}^2 - 6N_{dg} + I)\,N_1 \tag{9} \]
\[ N_4 = (24N_{dg}^3 - 36N_{dg}^2 + 14N_{dg} - I)\,N_1. \tag{10} \]
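As an illustration of (7)–(10), the following is a minimal NumPy sketch. The 3 × 3 matrix U is a made-up example, not taken from the paper, and the function name `visit_moments` is hypothetical.

```python
import numpy as np

# Hypothetical transient matrix (column convention: U[i, j] is the
# probability of moving from transient state j to transient state i).
U = np.array([[0.5, 0.0, 0.0],
              [0.3, 0.6, 0.0],
              [0.0, 0.2, 0.7]])

def visit_moments(U):
    """Return N1..N4, the first four moments of the number of visits."""
    I = np.eye(U.shape[0])
    N1 = np.linalg.inv(I - U)                      # Eq. (7)
    Ndg = np.diag(np.diag(N1))                     # diagonal part of N1
    N2 = (2 * Ndg - I) @ N1                        # Eq. (8)
    N3 = (6 * Ndg @ Ndg - 6 * Ndg + I) @ N1        # Eq. (9)
    N4 = (24 * Ndg @ Ndg @ Ndg - 36 * Ndg @ Ndg + 14 * Ndg - I) @ N1  # Eq. (10)
    return N1, N2, N3, N4

N1, N2, N3, N4 = visit_moments(U)
# Elementwise variance of the number of visits: second moment minus mean squared.
variance = N2 - N1 * N1
```

In this example, state 1 is re-entered only from itself with probability 0.5, so the number of visits starting there is geometric with mean 2 and variance 2, which the (1, 1) entries of `N1` and `variance` should reproduce.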
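Theorem 2.1. Let $N_k$ be the matrix of kth moments of the $\nu_{ij}$, as given by (7)–(10). The sensitivities of $N_k$, for k = 1, . . . , 4 are
\[ d\mathrm{vec}\,N_1 = (N_1^T \otimes N_1)\, d\mathrm{vec}\,U \tag{11} \]
\[ d\mathrm{vec}\,N_2 = [2(I \otimes N_{dg}) - I_{s^2}]\, d\mathrm{vec}\,N_1 + 2(N^T \otimes I)\, d\mathrm{vec}\,N_{dg} \tag{12} \]
\[ d\mathrm{vec}\,N_3 = [I \otimes (6N_{dg}^2 - 6N_{dg} + I)]\, d\mathrm{vec}\,N_1 + [6(N^T N_{dg} \otimes I) + 6(N^T \otimes N_{dg}) - 6(N^T \otimes I)]\, d\mathrm{vec}\,N_{dg} \tag{13} \]
\[ d\mathrm{vec}\,N_4 = [I \otimes (24N_{dg}^3 - 36N_{dg}^2 + 14N_{dg} - I)]\, d\mathrm{vec}\,N_1 + [24(N^T N_{dg}^2 \otimes I) + 24(N^T N_{dg} \otimes N_{dg}) + 24(N^T \otimes N_{dg}^2) - 36(N^T N_{dg} \otimes I) - 36(N^T \otimes N_{dg}) + 14(N^T \otimes I)]\, d\mathrm{vec}\,N_{dg} \tag{14} \]
where $d\mathrm{vec}\,N_{dg} = D(\mathrm{vec}\,I)\, d\mathrm{vec}\,N_1$; see (A-12).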
Proof. The result (11) is derived in [3, Section 3.1]. For k > 1, and considering Nk as a function of N1
and Ndg, the total differential of Nk is
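\[ d\mathrm{vec}\,N_k = \frac{\partial \mathrm{vec}\,N_k}{\partial \mathrm{vec}^T N_1}\, d\mathrm{vec}\,N_1 + \frac{\partial \mathrm{vec}\,N_k}{\partial \mathrm{vec}^T N_{dg}}\, d\mathrm{vec}\,N_{dg}. \tag{15} \]
The two terms of (15) are the partial differentials of $\mathrm{vec}\,N_k$, obtained by taking differentials treating only $N_1$ or only $N_{dg}$ as variables, respectively. Denote these partial differentials as $\partial_{N_1}$ and $\partial_{N_{dg}}$. Differentiating $N_2$ in (8) gives
\[ \partial_{N_1} N_2 = 2N_{dg}\,(dN_1) - dN_1 \tag{16} \]
\[ \partial_{N_{dg}} N_2 = 2\,(dN_{dg})\,N_1. \tag{17} \]
Applying the vec operator and using (A-8) gives
\[ \partial_{N_1} \mathrm{vec}\,N_2 = [2(I \otimes N_{dg}) - I_{s^2}]\, d\mathrm{vec}\,N_1 \tag{18} \]
\[ \partial_{N_{dg}} \mathrm{vec}\,N_2 = 2(N_1^T \otimes I)\, d\mathrm{vec}\,N_{dg} \tag{19} \]
and (15) becomes
\[ d\mathrm{vec}\,N_2 = [2(I \otimes N_{dg}) - I_{s^2}]\, d\mathrm{vec}\,N_1 + 2(N_1^T \otimes I)\, d\mathrm{vec}\,N_{dg} \tag{20} \]
which is (12). The derivations of $d\mathrm{vec}\,N_3$ and $d\mathrm{vec}\,N_4$ follow the same sequence of steps. The details
are given in Online Supplement Appendix B. □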
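A result such as (11) can be checked numerically against a finite difference. The sketch below does this for a made-up 3 × 3 matrix U; the helper names `vec` and `fundamental`, the random perturbation, and its size are all illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def fundamental(U):
    """Fundamental matrix N1 = (I - U)^(-1) of an absorbing chain."""
    return np.linalg.inv(np.eye(U.shape[0]) - U)

# Hypothetical transient matrix with spectral radius below one.
U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.1],
              [0.0, 0.2, 0.7]])
N1 = fundamental(U)

# Eq. (11): d vec N1 = (N1^T kron N1) d vec U
dN1_dU = np.kron(N1.T, N1)

# Compare the linear prediction with the actual change in N1
# under a small, arbitrary perturbation of U.
rng = np.random.default_rng(0)
dU = 1e-7 * rng.standard_normal(U.shape)
predicted = dN1_dU @ vec(dU)
observed = vec(fundamental(U + dU) - N1)
max_error = np.max(np.abs(predicted - observed))
```

The discrepancy `max_error` should be of second order in the size of `dU`, that is, several orders of magnitude smaller than the predicted change itself.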
The derivatives of N2, N3, and N4 can be used to study the variance, standard deviation, coefficient
of variation, skewness, and kurtosis of the number of visits to the transient states [3,6,7].
2.2. Time to absorption
Let $\eta_j$ be the time to absorption starting in transient state j and let $\eta_k = E(\eta_1^k, \ldots, \eta_s^k)^T$. The first several of these moments satisfy [23, Theorem 3.2]
\[ \eta_1^T = e^T N_1 \tag{21} \]
\[ \eta_2^T = \eta_1^T (2N_1 - I) \tag{22} \]
\[ \eta_3^T = \eta_1^T (6N_1^2 - 6N_1 + I) \tag{23} \]
\[ \eta_4^T = \eta_1^T (24N_1^3 - 36N_1^2 + 14N_1 - I). \tag{24} \]
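Theorem 2.2. Let $\eta_k$ be the vector of the kth moments of the $\eta_i$. The sensitivities of these moment vectors are
\[ d\eta_1 = (I \otimes e^T)\, d\mathrm{vec}\,N_1 \tag{25} \]
\[ d\eta_2 = (2N_1^T - I)\, d\eta_1 + 2(I \otimes \eta_1^T)\, d\mathrm{vec}\,N_1 \tag{26} \]
\[ d\eta_3 = (6N_1^2 - 6N_1 + I)^T d\eta_1 + [6(N_1^T \otimes \eta_1^T) + 6(I \otimes \eta_1^T N_1) - 6(I \otimes \eta_1^T)]\, d\mathrm{vec}\,N_1 \tag{27} \]
\[ d\eta_4 = (24N_1^3 - 36N_1^2 + 14N_1 - I)^T d\eta_1 + \{24[(N_1^T)^2 \otimes \eta_1^T] + 24(N_1^T \otimes \eta_1^T N_1) + 24(I \otimes \eta_1^T N_1^2) - 36(N_1^T \otimes \eta_1^T) - 36(I \otimes \eta_1^T N_1) + 14(I \otimes \eta_1^T)\}\, d\mathrm{vec}\,N_1 \tag{28} \]
where $d\mathrm{vec}\,N_1$ is given by (11).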
Proof. The derivative of $\eta_1$ is obtained [3] by differentiating (21) to get $d\eta_1^T = e^T (dN_1)$ and then applying the vec operator. For the higher moments, consider the $\eta_k$ to be functions of $\eta_1$ and $N_1$, and write the total differential
\[ d\eta_k = \frac{\partial \eta_k}{\partial \eta_1^T}\, d\eta_1 + \frac{\partial \eta_k}{\partial \mathrm{vec}^T N_1}\, d\mathrm{vec}\,N_1. \tag{29} \]
The partial differentials of $\eta_2$ with respect to $\eta_1$ and $N_1$ are
\[ \partial_{\eta_1} \eta_2^T = (d\eta_1^T)(2N_1 - I) \tag{30} \]
\[ \partial_{N_1} \eta_2^T = 2\eta_1^T (dN_1). \tag{31} \]
Applying the vec operator and using (A-8) gives
\[ \partial_{\eta_1} \eta_2 = (2N_1^T - I)\, d\eta_1 \tag{32} \]
\[ \partial_{N_1} \eta_2 = 2(I \otimes \eta_1^T)\, d\mathrm{vec}\,N_1 \tag{33} \]
which combine according to (29) to yield (26). The derivations of $d\eta_3$ and $d\eta_4$ follow the same sequence
of steps; the details are shown in Online Supplementary Appendix B. □
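The chain of (11), (25), and (26) can be assembled into a Jacobian of the second moment of the time to absorption with respect to vec U. The following sketch does so for a made-up matrix U and compares the result with a finite difference; `time_moments` and the perturbation are hypothetical helpers for illustration only.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def time_moments(U):
    """Return N1 and the first two moment vectors of the time to absorption."""
    s = U.shape[0]
    I = np.eye(s)
    N1 = np.linalg.inv(I - U)
    eta1 = N1.T @ np.ones(s)          # Eq. (21), written as a column vector
    eta2 = (2 * N1 - I).T @ eta1      # Eq. (22), transposed
    return N1, eta1, eta2

# Hypothetical transient matrix.
U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.1],
              [0.0, 0.2, 0.7]])
s = U.shape[0]
I, e = np.eye(s), np.ones(s)
N1, eta1, eta2 = time_moments(U)

dN1_dU = np.kron(N1.T, N1)                                # Eq. (11)
deta1_dU = np.kron(I, e[None, :]) @ dN1_dU                # Eq. (25)
deta2_dU = ((2 * N1.T - I) @ deta1_dU
            + 2 * np.kron(I, eta1[None, :]) @ dN1_dU)     # Eq. (26)

# Finite-difference comparison under a small arbitrary perturbation.
rng = np.random.default_rng(0)
dU = 1e-7 * rng.standard_normal(U.shape)
observed = time_moments(U + dU)[2] - eta2
predicted = deta2_dU @ vec(dU)
max_error = np.max(np.abs(predicted - observed))
```

Each row of `deta2_dU` is the sensitivity of one entry of $\eta_2$ to every entry of U, ordered as in vec U.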
2.3. Number of states visited before absorption
Let $\xi_i \ge 1$ be the number of distinct transient states visited before absorption, and let $\xi_1 = E(\xi)$. Then
\[ \xi_1^T = e^T N_{dg}^{-1} N_1 \tag{34} \]
[23, Section 3.2.5], where $N_{dg}^{-1} = (N_{dg})^{-1}$.
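Theorem 2.3. Let $\xi_1 = E(\xi)$. The sensitivity of $\xi_1$ is
\[ d\xi_1 = [-(N_1^T \otimes e^T)(N_{dg}^{-1} \otimes N_{dg}^{-1})\, D(\mathrm{vec}\,I) + (I \otimes e^T N_{dg}^{-1})]\, d\mathrm{vec}\,N_1 \tag{35} \]
where $d\mathrm{vec}\,N_1$ is given by (11).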
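Proof. Differentiating (34) yields
\[ d\xi_1^T = e^T (dN_{dg}^{-1})\, N_1 + e^T N_{dg}^{-1}\, dN_1. \tag{36} \]
Applying the vec operator and using (A-8) gives
\[ d\xi_1 = (N_1^T \otimes e^T)\, d\mathrm{vec}\,N_{dg}^{-1} + (I \otimes e^T N_{dg}^{-1})\, d\mathrm{vec}\,N_1. \tag{37} \]
Applying (A-10) to $d\mathrm{vec}\,N_{dg}^{-1}$ and using (A-13) for $d\mathrm{vec}\,N_{dg}$ gives
\[ d\xi_1 = -(N_1^T \otimes e^T)(N_{dg}^{-1} \otimes N_{dg}^{-1})\, D(\mathrm{vec}\,I)\, d\mathrm{vec}\,N_1 + (I \otimes e^T N_{dg}^{-1})\, d\mathrm{vec}\,N_1 \tag{38} \]
which simplifies to (35). □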
2.4. Multiple absorbing states and probabilities of absorption
When the chain includes a > 1 absorbing states, the entry mij of the a × s submatrix M in (5)
is the probability of transition from transient state j to absorbing state i. The result of the competing
risks of absorption is a set of probabilities $b_{ij} = P[\text{absorption in } i \mid \text{starting in } j]$ for i = 1, . . . , a and
j = 1, . . . , s. The matrix $B = (b_{ij}) = MN_1$ [23, Theorem 3.3].
Theorem 2.4. Let $B = MN_1$ be the matrix of absorption probabilities. Then
\[ d\mathrm{vec}\,B = (N_1^T \otimes I)\, d\mathrm{vec}\,M + (N_1^T \otimes B)\, d\mathrm{vec}\,U. \tag{39} \]
Proof. Differentiating B yields
\[ dB = (dM)\,N_1 + M\,(dN_1). \tag{40} \]
Applying the vec operator gives
\[ d\mathrm{vec}\,B = (N_1^T \otimes I)\, d\mathrm{vec}\,M + (I \otimes M)\, d\mathrm{vec}\,N_1. \tag{41} \]
Substituting (11) for $d\mathrm{vec}\,N_1$ and simplifying gives (39). □
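A small numerical sketch of Theorem 2.4 follows, assuming a made-up chain with three transient and two absorbing states. The matrices U and M and the helper `absorption_probs` are hypothetical; the perturbations of U and M are arbitrary and independent, which is what (39) as a differential identity allows.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def absorption_probs(U, M):
    """Return B = M N1 together with N1 = (I - U)^(-1)."""
    N1 = np.linalg.inv(np.eye(U.shape[0]) - U)
    return M @ N1, N1

# Hypothetical chain: the columns of [U; M] sum to one.
U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.1],
              [0.0, 0.2, 0.7]])
M = np.array([[0.15, 0.05, 0.10],
              [0.05, 0.05, 0.10]])
B, N1 = absorption_probs(U, M)
a, s = M.shape

# Eq. (39): the two Jacobian blocks with respect to vec M and vec U.
dB_dM = np.kron(N1.T, np.eye(a))
dB_dU = np.kron(N1.T, B)

# Finite-difference comparison.
rng = np.random.default_rng(1)
dU = 1e-7 * rng.standard_normal(U.shape)
dM = 1e-7 * rng.standard_normal(M.shape)
predicted = dB_dM @ vec(dM) + dB_dU @ vec(dU)
observed = vec(absorption_probs(U + dU, M + dM)[0] - B)
max_error = np.max(np.abs(predicted - observed))
```

Each column of `B` sums to one, since absorption is certain when the spectral radius of U is below one.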
Column j of B is the probability distribution of the eventual absorption state for an individual
starting in transient state j. Usually a few of those starting states are of particular interest (e.g. states
corresponding to “birth” or to the start of some process). Let $B(:, j) = Be_j$ denote column j of B, where $e_j$ is the jth unit vector of length s. Thus the derivative of B(:, j) is
\[ d\mathrm{vec}\,B(:, j) = (e_j^T \otimes I_a)\, d\mathrm{vec}\,B \tag{42} \]
where $d\mathrm{vec}\,B$ is given by (39). Similarly, row i of B is $B(i, :) = e_i^T B$ and
\[ d\mathrm{vec}\,B(i, :) = (I_s \otimes e_i^T)\, d\mathrm{vec}\,B \tag{43} \]
where $e_i$ is the ith unit vector of length a.
2.5. The quasistationary distribution
The quasistationary distribution of an absorbing Markov chain gives the limiting probability distribution, over the set of transient states, of the state of an individual that has yet to be absorbed. Let w and v be the right and left eigenvectors associated with the dominant eigenvalue of U, normalized so that ‖w‖ = ‖v‖ = 1. Darroch and Seneta [11] defined two quasistationary distributions in terms of w and v. The limiting probability distribution of the state of an individual, given that absorption has not yet happened, converges to
\[ q_a = w. \tag{44} \]
The limiting probability distribution of the state of an individual, given that absorption has not happened and will not happen for a long time, is
\[ q_b = \frac{w \circ v}{w^T v}. \tag{45} \]
Lemma 2.1. Let the dominant eigenvalue of U, guaranteed real and nonnegative by the Perron–Frobenius theorem, satisfy 0 < λ < 1, and let w and v be the right and left eigenvectors corresponding to λ, scaled to sum to one. Then
\[ dw = (\lambda I_s - U + we^T U)^{-1} [w^T \otimes (I_s - we^T)]\, d\mathrm{vec}\,U \tag{46} \]
\[ dv = (\lambda I_s - U^T + ve_1^T U^T)^{-1} [(I_s - ve_1^T) \otimes v^T]\, d\mathrm{vec}\,U. \tag{47} \]
Proof. Eq. (46) is proven in [5, Section 6.1]. Eq. (47) is obtained by treating v as the right eigenvector of $U^T$. □
Theorem 2.5. The derivative of the quasistationary distribution $q_a$ is given by (46). The derivative of the quasistationary distribution $q_b$ is
\[ dq_b = \frac{1}{v^T w} [(D(v) - q_b v^T)\, dw + (D(w) - q_b w^T)\, dv] \tag{48} \]
where dw and dv are given by (46) and (47), respectively.
Proof. The derivative of $q_a$ follows from its definition as the scaled right eigenvector of U. For $q_b$, differentiating (45) gives
\[ dq_b = \frac{1}{(v^T w)^2} \{ (v^T w)\, d(v \circ w) - (v \circ w) [(dv^T)\, w + v^T (dw)] \} \tag{49} \]
\[ = \frac{1}{v^T w} [ d(v \circ w) - q_b (dv^T)\, w - q_b v^T (dw) ]. \tag{50} \]
Applying the vec operator and using (A-11) for $d\mathrm{vec}\,(v \circ w)$ gives
\[ dq_b = \frac{1}{v^T w} [ D(v)\, dw + D(w)\, dv - (w^T \otimes q_b)\, dv - q_b v^T dw ] \tag{51} \]
which simplifies to give (48). □
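To make the quantities in this section concrete, the sketch below computes $q_a$ and $q_b$ for a made-up irreducible matrix U and checks (46) against a finite difference. The helper `perron` is hypothetical; it assumes U is irreducible, so that the dominant eigenvector can be taken positive, and it rescales both eigenvectors to sum to one.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def perron(U):
    """Dominant eigenvalue of U and its eigenvector scaled to sum to one.

    Assumes U is irreducible, so the eigenvector has entries of one sign.
    """
    vals, vecs = np.linalg.eig(U)
    k = np.argmax(vals.real)
    w = np.abs(vecs[:, k].real)
    return vals[k].real, w / w.sum()

# Hypothetical irreducible transient matrix.
U = np.array([[0.5, 0.1, 0.0],
              [0.3, 0.6, 0.1],
              [0.0, 0.2, 0.7]])
s = U.shape[0]
I, e = np.eye(s), np.ones(s)

lam, w = perron(U)        # right eigenvector
_, v = perron(U.T)        # left eigenvector
q_a = w                   # Eq. (44)
q_b = w * v / (w @ v)     # Eq. (45)

# Eq. (46): Jacobian of w with respect to vec U.
dw_dU = (np.linalg.inv(lam * I - U + np.outer(w, e) @ U)
         @ np.kron(w[None, :], I - np.outer(w, e)))

# Finite-difference comparison.
rng = np.random.default_rng(2)
dU = 1e-7 * rng.standard_normal(U.shape)
predicted = dw_dU @ vec(dU)
observed = perron(U + dU)[1] - w
max_error = np.max(np.abs(predicted - observed))
```

Because $q_b$ in (45) is unchanged when v is rescaled, the particular normalization of v affects dv but not $q_b$ itself.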
3. An absorbing chain example: life lost due to mortality
The approach here makes it easy to compute the sensitivity of a variety of dependent variables
calculated from the Markov chain. As an example of this flexibility, consider a recently developed
demographic index, the number of years of life lost due to mortality [44].
The transient states of this chain are age classes, absorption corresponds to death, and absorbing
states correspond to age at death. Let $\mu_i$ be the mortality rate and $p_i = \exp(-\mu_i)$ the survival probability at age i. The matrix U has the $p_i$ on the subdiagonal and zeros elsewhere. The matrix M has $1 - p_i$ on the diagonal and zeros elsewhere. Let f = B(:, 1) be the distribution of age at death and $\eta_1$ the
vector of expected longevity as a function of age.
A death at age i represents the loss of some number of years of life beyond that age. The expectation
of that loss is given by the ith entry of η1, and the expected number of years lost over the distribution
of age at death is $\eta^\dagger = \eta_1^T f$. This quantity also measures the disparity among individuals in longevity
[44]. If everyone died at the identical age x, f would be a delta function at x and further life expectancy
at age x would be zero; their product would give η† = 0. Declines in discrepancy have accompanied
increases in life expectancy observed in developed countries [12,45]. Thus it is useful to know how η†
responds to changes in mortality.
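The construction of U, M, f, and $\eta^\dagger$ described above can be sketched as follows. The five mortality rates are invented for illustration and are not taken from any life table; setting the survival probability of the last age class to zero is an added assumption, needed here so that every column of P sums to one.

```python
import numpy as np

# Hypothetical age-specific mortality rates (not real data).
mu = np.array([0.05, 0.01, 0.02, 0.10, 0.50])
p = np.exp(-mu)          # survival probabilities p_i = exp(-mu_i)
p[-1] = 0.0              # assumption: no survival beyond the last age class
s = len(p)

U = np.diag(p[:-1], k=-1)    # survival probabilities on the subdiagonal
M = np.diag(1.0 - p)         # death at age i leads to absorbing state i

N1 = np.linalg.inv(np.eye(s) - U)
B = M @ N1
f = B[:, 0]                  # distribution of age at death, starting at age 1
eta1 = N1.sum(axis=0)        # e^T N1: expected remaining time, by age
eta_dagger = eta1 @ f        # expected life lost due to mortality
```

In this discrete-time sketch the current time step is counted in `eta1`, so every entry of `eta1` is at least one.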
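Differentiating $\eta^\dagger$ gives
\[ d\eta^\dagger = (d\eta_1^T)\, Be_1 + \eta_1^T (dB)\, e_1. \tag{52} \]
Applying the vec operator gives
\[ d\eta^\dagger = e_1^T B^T d\eta_1 + (e_1^T \otimes \eta_1^T)\, d\mathrm{vec}\,B. \tag{53} \]
Substituting (25) for $d\eta_1$ and (39) for $d\mathrm{vec}\,B$ gives
\[ d\eta^\dagger = f^T (I \otimes e^T)\, d\mathrm{vec}\,N_1 + (e_1^T \otimes \eta_1^T) [ (N_1^T \otimes I)\, d\mathrm{vec}\,M + (N_1^T \otimes B)\, d\mathrm{vec}\,U ]. \tag{54} \]
Simplifying and writing derivatives in terms of μ gives
\[ \frac{d\eta^\dagger}{d\mu^T} = [ f^T (N_1^T \otimes \eta_1^T) + (e_1^T N_1^T \otimes \eta_1^T B) ] \frac{d\mathrm{vec}\,U}{d\mu^T} + (e_1^T N_1^T \otimes \eta_1^T) \frac{d\mathrm{vec}\,M}{d\mu^T}. \tag{55} \]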
[Figure 1: two panels, Japan (top) and India (bottom); x-axis: Age (0–100); y-axis: Elasticity]
Fig. 1. The elasticity of mean years of life lost due to mortality, η†, to changes in age-specific mortality, calculated from the female
life tables of India in 1961 and of Japan in 2006. Data obtained from the Human Mortality Database [20].
Because mortality rates vary over several orders of magnitude with age, it is useful to present the
results as elasticities,
\[ \frac{\varepsilon \eta^\dagger}{\varepsilon \mu^T} = \frac{1}{\eta^\dagger} \frac{d\eta^\dagger}{d\mu^T} D(\mu). \tag{56} \]
Fig. 1 shows these elasticities for two populations chosen to have very different life expectancies: India
in 1961, with female life expectancy of 45 years and η† = 23.9 years and Japan in 2006, with female
life expectancy of 86 years and η† = 10.1 years [20]. In both cases, elasticities are positive from birth to
some age (≈50 for India, ≈85 for Japan) and negative thereafter. This implies that reductions in infant
and early life mortality would reduce η†, whereas reductions in old age mortality would increase η†.
4. Ergodic chains
Now let us consider perturbations of an ergodic finite-state Markov chain with an irreducible,
primitive, column-stochastic transition matrix P of dimension s × s. The stationary distribution π is
given by the right eigenvector, scaled to sum to 1, corresponding to the dominant eigenvalue λ1 = 1 of
P. The fundamental matrix of the chain is, in our notation, $Z = (I - P + \pi e^T)^{-1}$ [25]. (This is simply
the transpose of the result derived from a row-stochastic matrix.)
The perturbations of interest are those for which P remains a stochastic matrix. Such perturbations
are easily defined when the $p_{ij}$ depend explicitly on a parameter vector θ. However, when the parameters of interest are the $p_{ij}$ themselves, an implicit parameterization must be defined to preserve the
stochastic nature of P under perturbation [10,2]. In Section 4.5 we will explore new expressions for
two different forms of implicit parameterization.
Previous studies of perturbations of ergodic chains focus almost completely on perturbations of
the stationary distribution, and are divided between those focusing on sensitivity as a derivative; e.g.
[40,10,15] and studies focusing on perturbation bounds and condition numbers [13,31,42,21,26]; for
reviews see [9,27]. My approach here is similar in spirit to that of [40,10,15] in focusing on derivatives of
Markov chain properties with respect to parameter perturbations, but taking advantage of the matrix
calculus approach. We do not consider perturbation bounds here.
In Sections 4.1–4.4 we derive the sensitivities of the stationary distribution, the fundamental matrix,
the mean first passage times, and the Kemeny constant. In Section 4.5 we consider implicit parameter
dependence, and in Section 5 give an ecological example.
4.1. The stationary distribution
Theorem 4.1. Let π be the stationary distribution, satisfying Pπ = π and $e^T \pi = 1$. The sensitivity of π is
\[ d\pi = [\pi^T \otimes (Z - \pi e^T)]\, d\mathrm{vec}\,P \tag{57} \]
where Z is the fundamental matrix of the chain.
Proof. The vector π is the right eigenvector of P, scaled to sum to 1. Applying Lemma 2.1, and noting that λ = 1 and $e^T P = e^T$, gives $d\pi = Z[\pi^T \otimes (I_s - \pi e^T)]\, d\mathrm{vec}\,P$. Noting that Zπ = π and simplifying the Kronecker products yields (57). □
Based on an analysis of eigenvector sensitivity [32], Golub and Meyer [15] derived an expression for the derivative of π to a change in a single element of P using the group generalized inverse $(I - P)^{\#}$ of I − P. Since $(I - P)^{\#} = Z - \pi e^T$ [15], expression (57) is exactly the Golub–Meyer result expressed in matrix calculus notation. Our results here permit sensitivity analysis of functions of π using only the chain rule. If g(π) is a vector- or scalar-valued function of π, then
\[ dg(\pi) = \frac{dg}{d\pi^T} \frac{d\pi}{d\mathrm{vec}^T P}\, d\mathrm{vec}\,P. \tag{58} \]
Some examples will appear in Section 5.
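Before turning to those examples, the sketch below illustrates (57) on a made-up 3 × 3 column-stochastic matrix. The helper `stationary` and the matrix P are hypothetical. The test perturbation is centered so that each of its columns sums to zero, which keeps P + dP column-stochastic.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def stationary(P):
    """Stationary distribution of a column-stochastic P (P pi = pi)."""
    s = P.shape[0]
    A = P - np.eye(s)
    A[-1, :] = 1.0                 # replace a redundant row by sum(pi) = 1
    b = np.zeros(s)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Hypothetical ergodic chain; every column sums to one.
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
s = P.shape[0]
I, e = np.eye(s), np.ones(s)

pi = stationary(P)
Z = np.linalg.inv(I - P + np.outer(pi, e))         # fundamental matrix
dpi_dP = np.kron(pi[None, :], Z - np.outer(pi, e)) # Eq. (57)

# Perturbation with zero column sums, so P + dP stays stochastic.
rng = np.random.default_rng(3)
dP = 1e-7 * rng.standard_normal(P.shape)
dP -= dP.mean(axis=0)
predicted = dpi_dP @ vec(dP)
observed = stationary(P + dP) - pi
max_error = np.max(np.abs(predicted - observed))
```

The Jacobian `dpi_dP` has one row per state and one column per entry of vec P, so it can be post-multiplied by any of the compensation patterns of Section 4.5.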
4.2. The fundamental matrix
The fundamental matrix $Z = (I - P + \pi e^T)^{-1}$ plays a role in ergodic chains similar to that played by $N_1$ in absorbing chains [25]. It has been extended using generalized inverses [30,24], but we do not
consider those extensions here.
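Theorem 4.2. The sensitivity of the fundamental matrix is
\[ d\mathrm{vec}\,Z = (Z^T \otimes Z) \{ I_{s^2} - [e\pi^T \otimes (Z - \pi e^T)] \}\, d\mathrm{vec}\,P. \tag{59} \]
Proof. From (A-10),
\[ d\mathrm{vec}\,Z = -(Z^T \otimes Z)\, d\mathrm{vec}\,(I - P + \pi e^T) \tag{60} \]
\[ = (Z^T \otimes Z)(d\mathrm{vec}\,P - (e \otimes I_s)\, d\pi). \tag{61} \]
Substituting (57) for dπ and simplifying gives (59). □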
4.3. The first passage time matrix
Let $R = (r_{ij})$ be the matrix of mean first passage times from j to i, given by [23, Theorem 4.7]
\[ R = D(\pi)^{-1} (I - Z + Z_{dg} E). \tag{62} \]
Again, this is the transpose of the expression obtained when P is row-stochastic.
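Theorem 4.3. The sensitivity of R is
\[ d\mathrm{vec}\,R = -[R^T \otimes D(\pi)^{-1}]\, D(\mathrm{vec}\,I_s)(e \otimes I_s)\, d\pi - \{ [I_s \otimes D(\pi)^{-1}] - [E \otimes D(\pi)^{-1}]\, D(\mathrm{vec}\,I_s) \}\, d\mathrm{vec}\,Z \tag{63} \]
where dπ is given by (57) and $d\mathrm{vec}\,Z$ is given by (59).
Proof. Differentiating (62) gives
\[ dR = d[D(\pi)^{-1}]\,(I - Z + Z_{dg} E) + D(\pi)^{-1} [-dZ + (dZ_{dg})\,E]. \tag{64} \]
Applying the vec operator gives
\[ d\mathrm{vec}\,R = [(I - Z + Z_{dg} E)^T \otimes I_s]\, d\mathrm{vec}\,[D(\pi)^{-1}] - [I_s \otimes D(\pi)^{-1}]\, d\mathrm{vec}\,Z + [E \otimes D(\pi)^{-1}]\, d\mathrm{vec}\,Z_{dg}. \tag{65} \]
Using (A-10) for $d\mathrm{vec}\,[D(\pi)^{-1}]$, (A-16) for $d\mathrm{vec}\,D(\pi)$, and (A-13) for $d\mathrm{vec}\,Z_{dg}$ yields
\[ d\mathrm{vec}\,R = -[R^T D(\pi) \otimes I_s] [D(\pi)^{-1} \otimes D(\pi)^{-1}]\, D(\mathrm{vec}\,I)(e \otimes I)\, d\pi - [I \otimes D(\pi)^{-1}]\, d\mathrm{vec}\,Z + [E \otimes D(\pi)^{-1}]\, D(\mathrm{vec}\,I)\, d\mathrm{vec}\,Z \tag{66} \]
which simplifies to give (63). □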
4.4. Mixing time and the Kemeny constant
The mixing time K of a chain is the mean time required to get from a specified state to a state
chosen at random from the stationary distribution π . Remarkably, K is independent of the starting
state [16,22] and is sometimes called Kemeny’s constant; it is a measure of the rate of convergence to
stationarity, and is K = trace(Z) [22]. In addition to being a quantity of interest in itself, the rate of
convergence also plays a role in the sensitivity of the stationary distribution of ergodic chains [21,35].
Theorem 4.4. The sensitivity of K is
\[ dK = (\mathrm{vec}\,I_s)^T d\mathrm{vec}\,Z. \tag{67} \]
Proof. Differentiating K = trace(Z) gives
\[ dK = e^T (I \circ dZ)\, e. \tag{68} \]
Applying the vec operator and using (A-11) gives
\[ dK = (e^T \otimes e^T)\, D(\mathrm{vec}\,I)\, d\mathrm{vec}\,Z \tag{69} \]
which simplifies to (67). □
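The sketch below computes K and R for a made-up chain and combines (59) with (67) to obtain the sensitivity of K. The matrix P and the helper `chain_quantities` are hypothetical. The first check rests on an identity that follows from (62) together with $e^T Z = e^T$: weighting each column of R by π gives trace(Z) for every starting state.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

def chain_quantities(P):
    """Return pi, Z, and K = trace(Z) for a column-stochastic P."""
    s = P.shape[0]
    A = P - np.eye(s)
    A[-1, :] = 1.0
    b = np.zeros(s)
    b[-1] = 1.0
    pi = np.linalg.solve(A, b)
    Z = np.linalg.inv(np.eye(s) - P + np.outer(pi, np.ones(s)))
    return pi, Z, np.trace(Z)

# Hypothetical ergodic chain.
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
s = P.shape[0]
I, e, E = np.eye(s), np.ones(s), np.ones((s, s))
pi, Z, K = chain_quantities(P)

# Eq. (62): mean first passage times from j to i.
R = np.diag(1.0 / pi) @ (I - Z + np.diag(np.diag(Z)) @ E)
weighted = pi @ R      # sum_i pi_i r_ij, one value per starting state j

# Eqs. (59) and (67): sensitivity of K to vec P.
dZ_dP = np.kron(Z.T, Z) @ (np.eye(s * s)
                           - np.kron(np.outer(e, pi), Z - np.outer(pi, e)))
dK_dP = vec(I) @ dZ_dP

# Finite-difference comparison with a zero-column-sum perturbation.
rng = np.random.default_rng(4)
dP = 1e-7 * rng.standard_normal(P.shape)
dP -= dP.mean(axis=0)
error = abs(dK_dP @ vec(dP) - (chain_quantities(P + dP)[2] - K))
```

The entries of `weighted` agree across starting states, which is the independence property noted in the text.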
4.5. Implicit parameters and compensation
Theorems 4.1–4.4 are written in terms of dvec P. However, perturbation of any element, say pkj ,
to pkj + θkj , must be compensated for by adjustments of the other elements in column j so that the
column sum is 1 [10]. Caswell [2] introduced two kinds of compensation likely to be of use in applica-
tions: additive and proportional. Additive compensation adjusts all the elements of the column by an
equal amount, distributing the perturbation θkj additively over column j. Proportional compensation
distributes $\theta_{kj}$ in proportion to the values of the $p_{ij}$, for $i \neq k$. Proportional compensation is attractive because it preserves the pattern of zero and non-zero elements within P.
To develop the compensation formulae, let us start by considering a probability vector p, of dimension s × 1, with $p_i \ge 0$ and $\sum_i p_i = 1$. Let $\theta_i$ be the perturbation of $p_i$, and write
\[ p(\theta) = p(0) + A\theta \tag{70} \]
for some matrix A to be determined. If y is a function of p, then
\[ dy = \frac{dy}{dp^T} \frac{dp}{d\theta^T}\, d\theta \tag{71} \]
evaluated at θ = 0. For the case of additive compensation, we write
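\[ \begin{aligned}
p_1(\theta) &= p_1(0) + \theta_1 - \frac{\theta_2}{s-1} - \cdots - \frac{\theta_s}{s-1} \\
p_2(\theta) &= p_2(0) - \frac{\theta_1}{s-1} + \theta_2 - \cdots - \frac{\theta_s}{s-1} \\
&\;\;\vdots \\
p_s(\theta) &= p_s(0) - \frac{\theta_1}{s-1} - \frac{\theta_2}{s-1} - \cdots + \theta_s.
\end{aligned} \tag{72} \]
The perturbation $\theta_1$ is added to $p_1$ and compensated for by subtracting $\theta_1/(s-1)$ from all other entries of p; clearly $\sum_i p_i(\theta) = 1$ for any perturbation vector θ. The system of equations (72) can be written
\[ p(\theta) = p(0) + \left( I - \frac{1}{s-1} C \right) \theta \tag{73} \]
where C = E − I is the Toeplitz matrix with zeros on the diagonal and ones elsewhere. Thus the matrix A in (70) is
\[ A = I - \frac{1}{s-1} C. \tag{74} \]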
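For proportional compensation, assume that $p_i < 1$ for all i. The vector p(θ) is
\[ \begin{aligned}
p_1(\theta) &= p_1(0) + \theta_1 - \frac{p_1 \theta_2}{1 - p_2} - \cdots - \frac{p_1 \theta_s}{1 - p_s} \\
p_2(\theta) &= p_2(0) - \frac{p_2 \theta_1}{1 - p_1} + \theta_2 - \cdots - \frac{p_2 \theta_s}{1 - p_s} \\
&\;\;\vdots \\
p_s(\theta) &= p_s(0) - \frac{p_s \theta_1}{1 - p_1} - \frac{p_s \theta_2}{1 - p_2} - \cdots + \theta_s.
\end{aligned} \tag{75} \]
The perturbation $\theta_1$ is added to $p_1$ and compensated for by subtracting $\theta_1 p_i/(1 - p_1)$ from the ith entry of p. Again, $\sum_i p_i(\theta) = 1$ for any perturbation vector θ. Eq. (75) can be written
\[ p(\theta) = p(0) + [I - D(p)\, C\, D(e - p)^{-1}]\, \theta \tag{76} \]
so that the matrix A in (70) is
\[ A = I - D(p)\, C\, D(e - p)^{-1}. \tag{77} \]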
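The two compensation matrices (74) and (77) are easy to build directly. The sketch below does so for a made-up probability vector containing a zero entry; the function names `additive_A` and `proportional_A` are hypothetical. It perturbs the first entry only and shows how each scheme redistributes the change.

```python
import numpy as np

def additive_A(s):
    """Eq. (74): spread each perturbation equally over the other entries."""
    C = np.ones((s, s)) - np.eye(s)
    return np.eye(s) - C / (s - 1)

def proportional_A(p):
    """Eq. (77): spread each perturbation in proportion to the other entries.

    Requires p_i < 1 for all i.
    """
    s = len(p)
    C = np.ones((s, s)) - np.eye(s)
    return np.eye(s) - np.diag(p) @ C @ np.diag(1.0 / (1.0 - p))

# Hypothetical probability vector with a structural zero in the last entry.
p = np.array([0.5, 0.3, 0.2, 0.0])
theta = np.array([0.01, 0.0, 0.0, 0.0])   # perturb p_1 only

p_add = p + additive_A(len(p)) @ theta
p_prop = p + proportional_A(p) @ theta
```

Both perturbed vectors still sum to one. In this example proportional compensation leaves the zero entry at zero, whereas additive compensation pushes it slightly negative, which illustrates why the proportional form preserves the zero pattern.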
Now consider perturbation of a probability matrix P, where each column is a probability vector. Define a perturbation matrix $\Theta$ where $\theta_{ij}$ is the perturbation of $p_{ij}$. Perturbations of column j are to be compensated by a matrix $A_j$, so that
\[ P(\Theta) = P(0) + [A_1 \Theta(:, 1) \; \cdots \; A_s \Theta(:, s)] \tag{78} \]
where $A_i$ compensates for the changes in column i of P. Applying the vec operator to (78) gives
\[ \mathrm{vec}\,P(\Theta) = \mathrm{vec}\,P(0) + \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_s \end{pmatrix} \mathrm{vec}\,\Theta \tag{79} \]
\[ = \mathrm{vec}\,P(0) + \sum_{i=1}^{s} (E_{ii} \otimes A_i)\, \mathrm{vec}\,\Theta. \tag{80} \]
The terms in the summation in (80) are recognizable as the vec of the product $A_i \Theta E_{ii}$; thus
\[ P(\Theta) = P(0) + \sum_{i=1}^{s} A_i \Theta E_{ii}. \tag{81} \]
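Theorem 4.5. Let P be a column-stochastic s × s transition matrix. Let $\Theta$ be a matrix of perturbations, where $\theta_{ij}$ is applied to $p_{ij}$, and the other entries of $\Theta$ compensate for the perturbation. Let C = E − I. If compensation is additive, then
\[ P(\Theta) = P(0) + \left( I - \frac{1}{s-1} C \right) \Theta \tag{82} \]
\[ \frac{d\mathrm{vec}\,P}{d\mathrm{vec}^T \Theta} = \left[ I_{s^2} - \frac{1}{s-1} (I_s \otimes C) \right]. \tag{83} \]
If compensation is proportional,
\[ P(\Theta) = P(0) + \sum_{i=1}^{s} \{ I - D[P(:, i)]\, C\, D[e - P(:, i)]^{-1} \}\, \Theta E_{ii} \tag{84} \]
\[ \frac{d\mathrm{vec}\,P}{d\mathrm{vec}^T \Theta} = I_{s^2} - \sum_{i=1}^{s} \{ E_{ii} \otimes D[P(:, i)]\, C\, D[e - P(:, i)]^{-1} \}. \tag{85} \]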
Proof. $P(\Theta)$ is given by (81). If compensation is additive, $A_i$ is given by (74) for all i. Substituting into (81) gives (82). Differentiating (82), applying the vec operator, and using (A-5) gives (83).
If compensation is proportional, substituting (77) for $A_i$ in (81) gives (84). Differentiating yields
\[ dP = (d\Theta) \sum_{i=1}^{s} E_{ii} - \sum_{i=1}^{s} D[P(:, i)]\, C\, D[e - P(:, i)]^{-1} (d\Theta)\, E_{ii}. \tag{86} \]
Using the vec operator and (A-5) gives (85). □
Perturbations of P subject to compensation are given by perturbations of $\Theta$. Thus for any function y(P) we can write
\[ \left. \frac{dy}{d\mathrm{vec}^T P} \right|_{\mathrm{comp}} = \frac{dy}{d\mathrm{vec}^T P} \frac{d\mathrm{vec}\,P}{d\mathrm{vec}^T \Theta} \tag{87} \]
where $d\mathrm{vec}\,P / d\mathrm{vec}^T \Theta$ is given (for additive and proportional compensation) by Theorem 4.5. The
slight notational complexity is worthwhile for clarifying how to use Theorem 4.5 in practice.
5. An ergodic chain example: species succession in a marine community
Markov chains are used by ecologists as models of species replacement (succession) in ecological
communities; e.g. [19,18,37]. In these models, the state of a point on a landscape is given by the
species occupying that point. The entry pij of P is the probability that species j is replaced by species
i between t and t + 1. If a community consists of a large number of points independently subject to
the transition probabilities in P, the stationary distribution π gives the relative frequencies of species
in the community at equilibrium.
Hill et al. [18] used a Markov chain to describe a community of encrusting organisms occupying
rock surfaces at 30–35 m depth in the Gulf of Maine. The Markov chain contained 14 species plus an
additional state (“bare rock”) for unoccupied substrate. The matrix P was estimated from longitudinal
data in [17,18] and is given, along with a list of species names, in Online Appendix C. We will analyze
the response of species diversity and of the Kemeny constant to perturbations of the transition matrix.
5.1. Biotic diversity
The stationary distribution π , with the species numbered in order of decreasing abundance and
bare rock placed at the end as state 15, is shown in Fig. 2 . The two dominant species are an encrusting
sponge (called Hymedesmia) and a bryozoan (Crisia).
The entropy of the stationary distribution, $H(\pi) = -\pi^T (\log \pi)$, where the logarithm is applied
elementwise, is used as an index of biodiversity; it is maximal when all species are equally abundant
and goes to 0 in a community dominated by a single species. The sensitivity of H is
\[ dH = -(\log \pi^T + e^T)\, d\pi. \tag{88} \]
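The intermediate step behind (88) can be sketched as follows, applying the product rule to each term of the sum and assuming natural logarithms and $\pi_i > 0$ for all i:

```latex
\begin{align*}
H(\pi) &= -\sum_i \pi_i \log \pi_i, \\
dH &= -\sum_i \left( \log \pi_i + \pi_i \cdot \frac{1}{\pi_i} \right) d\pi_i
    = -\left( \log \pi^T + e^T \right) d\pi .
\end{align*}
```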
Most ecologists, however, would not include bare substrate in a measure of biodiversity [18], so we
define a “biotic diversity” Hb(π) = H (πb) where
\[ \pi_b = \frac{G\pi}{\| G\pi \|} \tag{89} \]
[Figure 2: x-axis: states 1–15; y-axis: Stationary distribution (0–0.35)]
Fig. 2. The stationary distribution for the subtidal benthic community succession model of Hill et al. [18]. States 1–14 correspond to
species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species.
For the identity of species and the transition matrix, see Appendix C.
with G a 14 × 15 0–1 matrix that selects rows 1–14 of π. Because π is positive, $\| G\pi \| = e^T G\pi$. Differentiating $\pi_b$ gives
\[ d\pi_b = \left( \frac{G}{e^T G\pi} - \frac{G\pi e^T G}{(e^T G\pi)^2} \right) d\pi \tag{90} \]
which simplifies to
\[ d\pi_b = \left( \frac{G - \pi_b e^T G}{e^T G\pi} \right) d\pi. \tag{91} \]
This model contains no explicit parameters; perturbations of the transition probabilities them-
selves are of interest and a compensation pattern is needed. Because the relative magnitudes of the
entries in a column of P reflect the relative abilities of species to capture or to hold space, proportional
compensation is appropriate in this case because it preserves these relative abilities.
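The sensitivity and elasticity of the biotic diversity $H_b$ to changes in the matrix P, subject to proportional compensation, are
\[ \left. \frac{dH_b}{d\mathrm{vec}^T P} \right|_{\mathrm{comp}} = \frac{dH_b}{d\pi_b^T} \frac{d\pi_b}{d\pi^T} \frac{d\pi}{d\mathrm{vec}^T P} \frac{d\mathrm{vec}\,P}{d\mathrm{vec}^T \Theta} \tag{92} \]
\[ \left. \frac{\varepsilon H_b}{\varepsilon \mathrm{vec}^T P} \right|_{\mathrm{comp}} = \frac{1}{H_b} \frac{dH_b}{d\mathrm{vec}^T P} D(\mathrm{vec}\,P) \tag{93} \]
where the four derivatives on the right hand side of (92) are given by (88), (91), (57), and (85), respectively.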
The sensitivity and elasticity vectors (92) and (93) are of dimension $1 \times s^2 = 1 \times 225$. To reduce the number of independent perturbations, we consider subsets of the $p_{ij}$: disturbance (in which a species is replaced by bare rock), colonization of bare rock, replacement of one species by another, and persistence of a species in its location, where
\[ \begin{aligned}
P[\text{disturbance of sp. } i] &= p_{si} \\
P[\text{colonization by sp. } i] &= p_{is} \\
P[\text{persistence of sp. } i] &= p_{ii} \\
P[\text{replacement of sp. } i] &= \sum_{k \neq i, s} p_{ki} \\
P[\text{replacement by sp. } i] &= \sum_{j \neq i, s} p_{ij}.
\end{aligned} \]
Extracting the corresponding elements of $\varepsilon H_b / \varepsilon \mathrm{vec}^T P$ gives the elasticities to these classes of probabilities.
Fig. 3 shows that the dominant species (1 and 2) have impacts that are larger than, and opposite in sign
to, those of the remaining species. Biodiversity would be enhanced by increasing the disturbance of, or
the replacement of, species 1 and 2, and reduced by increasing the rates of colonization by, persistence
of, or replacement by species 1 and 2.
5.2. The Kemeny constant and mixing
Ecologists have used several measures of the rate of convergence of communities modeled by
Markov chains, including the damping ratio and Dobrushin’s coefficient of ergodicity [18]. The Kemeny
constant K is an interesting addition; it gives the expected time to get from any initial state to a state
selected at random from the stationary distribution [22]. Once reaching that state, the behavior of the
chain and the stationary process are indistinguishable.
[Figure 3: five panels showing the elasticity of H "... to disturbance of", "... to colonization by", "... to persistence of", "... to replacement by", and "... to replacement of"; x-axis: Species; y-axis: Elasticity of H]
Fig. 3. The elasticity of the biotic diversityHb(π) calculated over the biotic states of the stationary distribution of the subtidal benthic
community successionmodel of [18]. States 1–14 correspond to species, numbered in decreasing order of abundance in the stationary
distribution. State 15 is bare rock, unoccupied by any species. For the identity of species and the transition matrix, see Appendix C.
The sensitivity of K , subject to compensation, is
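$$
\left. \frac{dK}{d\,\mathrm{vec}^{\mathsf T} P} \right|_{\mathrm{comp}}
= \frac{dK}{d\,\mathrm{vec}^{\mathsf T} Z} \,
\frac{d\,\mathrm{vec}\, Z}{d\,\mathrm{vec}^{\mathsf T} P} \,
\frac{d\,\mathrm{vec}\, P}{d\,\mathrm{vec}^{\mathsf T} \Theta}
\tag{94}
$$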
where the three terms on the right hand side are given by (67), (59), and (85), respectively.
Fig. 4 shows the sensitivities $dK / d\,\mathrm{vec}^{\mathsf T} P$, subject to proportional compensation, and aggregated
as in Fig. 3. Unlike the case with Hb, the two dominant species do not stand out from the others.
Increases in the rates of replacement will speed up convergence, and increases in persistence will slow
convergence. The disturbance of, colonization by, persistence of, and replacement of species 6 (it is a sea
anemone, Urticina crassicornis) have particularly large impacts on K. Examination of row 6 and column
6 of P (Appendix C) shows that U. crassicornis has the highest probability of persistence (p66 = 0.86),
and one of the lowest rates of disturbance, in the community. While it is far from dominant (Fig. 2), it
has a major impact on the rate of mixing.
6. Discussion
Given that many properties of finite-state Markov chains can be expressed as simple matrix
expressions, matrix calculus is an attractive approach to finding the sensitivity and elasticity to parameter
perturbations. Most of the literature on perturbation analysis of Markov chains has focused on the
stationary distribution of ergodic chains, but the approach here is equally applicable to absorbing chains,
and to dependent variables other than the stationary distribution. The perturbation of ergodic chains
has often been studied using generalized inverses since the influential studies of Meyer [30,15,13,31]. Matrix
calculus provides a complementary approach; the sensitivity of the stationary distribution π obtained
here agrees with the result obtained by Golub and Meyer [15] using the group generalized inverse.
The issue of compensation to preserve stochasticity of the transition matrix under perturbations
may have interesting generalizations. An anonymous reviewer has pointed out that, in the case of
reversible Markov chains, preserving reversibility under perturbations could be addressed by combining
the approach here with results in [34].
The formulae obtained by matrix calculus may appear daunting, but they are in fact far simpler than
expressions written without this formalism would be. They are easy to implement in a matrix-oriented
language such as Matlab. Some of the intermediate matrices can be large, but they are often quite
sparse, and a language that can take advantage of the sparsity is helpful. The computational problems
arising in very large state spaces have not yet been investigated.
[Figure 4: five panels, each plotting the sensitivity of K (vertical axis) against species (horizontal axis): sensitivity to disturbance of, to colonization by, to persistence of, to replacement by, and to replacement of each species.]
Fig. 4. The sensitivity of the Kemeny constant K of the subtidal benthic community succession model of [18]. States 1–14 correspond
to species, numbered in decreasing order of abundance in the stationary distribution. State 15 is bare rock, unoccupied by any species.
For the identity of species and the transition matrix, see Appendix C.
The examples shown here are typical of cases where absorbing or ergodic Markov chains are used
in population biology and ecology. In each example, the dependent variables of interest are functions
several steps removed from the chain itself. The ease with which one can differentiate such functions
is a particularly attractive property of the matrix calculus approach.
Acknowledgments
This research has been supported by NSF Grant DEB-0816514. Completion of the paper was
supported by a Research Award from the Alexander von Humboldt Foundation. I am grateful for discussions
with Mike Neubert, Alyson van Raalte, Michal Engelman, Catherine Nosova, Jeffrey Hunter, Carl Meyer,
Joel Cohen, and students at the International Max Planck Research School in Demography course in
Perturbation Analysis of Longevity. The comments of two reviewers helped to improve the manuscript.
The hospitality of the Max Planck Institute for Demographic Research was helpful in developing these
ideas.
Appendix A. Matrix calculus
Matrix calculus permits the consistent differentiation of scalar-, vector-, and matrix-valued
functions of scalar, vector, or matrix arguments [28,29,41]; for an introduction, see [1], and for ecological
and demographic applications see [4–6]. Here we describe the approach briefly.
If y is an n × 1 vector and x an m × 1 vector, the derivative of y with respect to x is the n × m Jacobian
matrix
$$
\frac{dy}{dx^{\mathsf T}} = \left( \frac{dy_i}{dx_j} \right). \tag{A-1}
$$
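Derivatives of matrices are written by transforming the matrices into vectors using the vec operator,
which stacks the columns of the matrix into a column vector. The derivative of the m × n matrix Y
with respect to the p × q matrix X is the mn × pq matrix
$$
\frac{d\,\mathrm{vec}\, Y}{d\,\mathrm{vec}^{\mathsf T} X} \tag{A-2}
$$
where $\mathrm{vec}^{\mathsf T} X \equiv (\mathrm{vec}\, X)^{\mathsf T}$. These definitions (unlike others that do not vectorize the matrices [28])
lead to the familiar chain rule; e.g.
$$
\frac{d\,\mathrm{vec}\, Y}{d\,\mathrm{vec}^{\mathsf T} Z}
= \frac{d\,\mathrm{vec}\, Y}{d\,\mathrm{vec}^{\mathsf T} X} \,
\frac{d\,\mathrm{vec}\, X}{d\,\mathrm{vec}^{\mathsf T} Z}. \tag{A-3}
$$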
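The dimensions in (A-2) and the chain rule (A-3) can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the paper: the helper names `vec` and `numerical_jacobian`, the test functions X(Z) = Z Zᵀ and Y(X) = X C, and the matrix C are all hypothetical, and the Jacobians are approximated by central differences instead of being derived analytically.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M, dtype=float).reshape(-1, order="F")

def numerical_jacobian(f, X, h=1e-6):
    """Central-difference approximation to d vec f(X) / d vec^T X."""
    X = np.asarray(X, dtype=float)
    n_out = vec(f(X)).size
    J = np.zeros((n_out, X.size))
    for k in range(X.size):
        step = np.zeros(X.size)
        step[k] = h
        dX = step.reshape(X.shape, order="F")   # perturb the k-th element of vec X
        J[:, k] = (vec(f(X + dX)) - vec(f(X - dX))) / (2.0 * h)
    return J

# Hypothetical test functions: X(Z) = Z Z^T (2 x 2) and Y(X) = X C (2 x 4).
C = np.arange(8.0).reshape(2, 4)
x_of_z = lambda Z: Z @ Z.T
y_of_x = lambda X: X @ C

Z = np.array([[1.0, 2.0, 0.5],
              [0.0, -1.0, 3.0]])
X = x_of_z(Z)

J_yx = numerical_jacobian(y_of_x, X)                        # 8 x 4
J_xz = numerical_jacobian(x_of_z, Z)                        # 4 x 6
J_yz = numerical_jacobian(lambda M: y_of_x(x_of_z(M)), Z)   # 8 x 6

# Chain rule (A-3): the product of the two Jacobians matches the direct one.
chain_rule_holds = np.allclose(J_yx @ J_xz, J_yz, atol=1e-6)
```

Here Y is 2 × 4 and Z is 2 × 3, so the composite Jacobian is 8 × 6, in agreement with the mn × pq rule.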
Derivatives are constructed from differentials, where the differential of a matrix is the matrix of
differentials of the elements. If, for vectors x and y and some matrix A, it can be shown that
$$
dy = A\,dx \tag{A-4}
$$
then
$$
\frac{dy}{dx^{\mathsf T}} = A \tag{A-5}
$$
(the “first identification theorem” of Magnus and Neudecker [28]). Because differentials operate
elementwise, d vec X = vec dX.
The combination of the chain rule and the identification theorem permits more complicated
expressions involving differentials to be turned into derivatives with respect to an arbitrary vector, say
θ. If
$$
dy = A\,dx + B\,dz \tag{A-6}
$$
then
$$
\frac{dy}{d\theta^{\mathsf T}} = A\,\frac{dx}{d\theta^{\mathsf T}} + B\,\frac{dz}{d\theta^{\mathsf T}} \tag{A-7}
$$
for any θ. Obtaining expressions like (A-5) or (A-6) often uses the result [38] that
$$
\mathrm{vec}(ABC) = \left( C^{\mathsf T} \otimes A \right) \mathrm{vec}\, B. \tag{A-8}
$$
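The identity (A-8) is easy to confirm numerically. The following sketch uses arbitrary random matrices of conformable sizes (an assumption made only for illustration) and relies on the column-stacking convention for vec described above; the helper name `vec` is hypothetical.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M, dtype=float).reshape(-1, order="F")

rng = np.random.default_rng(0)      # fixed seed so the example is repeatable
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))

lhs = vec(A @ B @ C)                # vec(ABC), length 2 * 5 = 10
rhs = np.kron(C.T, A) @ vec(B)      # (C^T kron A) vec B
identity_holds = np.allclose(lhs, rhs)
```

The check depends on stacking columns. NumPy flattens row by row unless `order="F"` is given, and with row-major flattening the identity does not hold in this form.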
A.1. Sensitivities and elasticities
Perturbation analysis may involve comparison of parameters measured on very different scales.
In such cases, it is helpful to compare proportional effects of proportional perturbations, also called
elasticities. If the sensitivity of y to x is dy/dxT, then the elasticity is
$$
\frac{\epsilon y}{\epsilon x^{\mathsf T}} = D(y)^{-1} \, \frac{dy}{dx^{\mathsf T}} \, D(x). \tag{A-9}
$$
There seems to be no accepted notation for elasticities; the notation used here is adapted [8] from that
in [39].
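A small worked case may help fix the definition (A-9). The sketch below assumes that D(x) denotes the diagonal matrix with x on its diagonal and uses a hypothetical linear map y = Ax, for which the sensitivity dy/dxᵀ is simply A. The function name `elasticity` and the numbers are illustrative only.

```python
import numpy as np

def elasticity(sens, y, x):
    """Elasticity matrix D(y)^{-1} (dy/dx^T) D(x); requires nonzero entries in y."""
    return np.diag(1.0 / np.asarray(y, dtype=float)) @ sens @ np.diag(np.asarray(x, dtype=float))

# Hypothetical linear example: y = A x, so dy/dx^T = A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 2.0])
y = A @ x                     # [5.0, 11.0]

E = elasticity(A, y, x)       # element (i, j) is (x_j / y_i) * dy_i/dx_j
row_sums = E.sum(axis=1)      # each row sums to 1 for this linear map
```

Because y is linear and homogeneous in x, the elasticities in each row sum to 1: a 1% increase in every x_j raises every y_i by 1%.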
A.2. Some useful results
Several matrix calculus results will be used repeatedly. The derivative of the inverse of a matrix X
is [29]
$$
d\,\mathrm{vec}\, X^{-1} = -\left[ \left( X^{-1} \right)^{\mathsf T} \otimes X^{-1} \right] d\,\mathrm{vec}\, X. \tag{A-10}
$$
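Result (A-10) can be checked against a small finite perturbation. In the sketch below the matrix X, the perturbation size, and the helper name `vec` are arbitrary choices made for illustration; the comparison is first order, so a discrepancy of the order of the squared perturbation is expected.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M, dtype=float).reshape(-1, order="F")

# An arbitrary invertible matrix (its determinant is 50).
X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
X_inv = np.linalg.inv(X)

# Jacobian of vec(X^{-1}) with respect to vec(X), from (A-10).
J = -np.kron(X_inv.T, X_inv)

# Compare the first-order prediction with the actual change in the inverse.
rng = np.random.default_rng(1)
dX = 1e-6 * rng.normal(size=X.shape)
predicted = J @ vec(dX)
actual = vec(np.linalg.inv(X + dX) - X_inv)
max_error = np.max(np.abs(actual - predicted))   # second order in dX
```

The Jacobian is 9 × 9 for a 3 × 3 matrix, an example of how the intermediate matrices grow with the square of the number of states.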
The derivative of the Hadamard product A ◦ B is [28]
$$
d\,\mathrm{vec}(A \circ B) = D(\mathrm{vec}\, A)\, d\,\mathrm{vec}\, B + D(\mathrm{vec}\, B)\, d\,\mathrm{vec}\, A. \tag{A-11}
$$
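The derivative of $X_{\mathrm{dg}}$ is obtained by differentiating $X_{\mathrm{dg}} = I \circ X$,
$$
dX_{\mathrm{dg}} = I \circ dX \tag{A-12}
$$
and applying the vec operator and (A-11) to obtain
$$
d\,\mathrm{vec}\, X_{\mathrm{dg}} = D(\mathrm{vec}\, I)\, d\,\mathrm{vec}\, X. \tag{A-13}
$$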
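Both (A-11) and (A-13) can be verified with a few lines of code. The sketch below is illustrative only: the random matrices, the perturbation size, and the helper name `vec` are assumptions. For the Hadamard product the first-order expression misses the exact change by the second-order term dA ◦ dB, while (A-13) is linear in X and therefore holds exactly.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M, dtype=float).reshape(-1, order="F")

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
dA = 1e-3 * rng.normal(size=(3, 3))
dB = 1e-3 * rng.normal(size=(3, 3))

# (A-11): first-order change in vec(A o B); "*" is the elementwise product.
first_order = np.diag(vec(A)) @ vec(dB) + np.diag(vec(B)) @ vec(dA)
exact_change = vec((A + dA) * (B + dB) - A * B)
remainder = exact_change - first_order       # equals vec(dA o dB)

# (A-13): X_dg = I o X keeps only the diagonal of X.
X = rng.normal(size=(3, 3))
X_dg = np.eye(3) * X
dg_matches = np.allclose(vec(X_dg), np.diag(vec(np.eye(3))) @ vec(X))
```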
The diagonal matrix D(x) can be written
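$$
D(x) = I \circ \left( x\, e^{\mathsf T} \right). \tag{A-14}
$$
Differentiating and applying the vec operator gives
$$
d\,\mathrm{vec}\, D(x) = D(\mathrm{vec}\, I)\, \mathrm{vec}\left( dx\, e^{\mathsf T} \right) \tag{A-15}
$$
$$
{} = D(\mathrm{vec}\, I) \left( e \otimes I \right) dx \tag{A-16}
$$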
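Since D(x) is linear in x, (A-14) to (A-16) can be checked exactly. The sketch below assumes that e is a column vector of ones and I an identity matrix of matching size; the vector x and the helper name `vec` are arbitrary choices for illustration.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M, dtype=float).reshape(-1, order="F")

s = 3
x = np.array([2.0, 3.0, 5.0])
e = np.ones((s, 1))                  # column vector of ones (assumed meaning of e)
I = np.eye(s)

# (A-14): D(x) = I o (x e^T) is the diagonal matrix with x on the diagonal.
D_x = I * (x.reshape(-1, 1) @ e.T)

# (A-16): Jacobian of vec D(x) with respect to x, of size s^2 x s.
J = np.diag(vec(I)) @ np.kron(e, I)

# D(x) is linear in x, so the Jacobian reproduces vec D(x) exactly.
jacobian_matches = np.allclose(J @ x, vec(D_x))
```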
Appendix B. Supplementary material
Supplementary data associated with this article can be found, in the online version, at
doi:10.1016/j.laa.2011.07.046.
References
[1] K.M. Abadir, J.R. Magnus, Matrix Algebra, Econometric Exercises 1, Cambridge University Press, Cambridge, United Kingdom, 2005.
[2] H. Caswell, Matrix Population Models: Construction, Analysis, and Interpretation, second ed., Sinauer Associates, Sunderland, Massachusetts, USA, 2001.
[3] H. Caswell, Applications of Markov chains in demography, in: A.N. Langville, W.J. Stewart (Eds.), MAM2006: Markov Anniversary Meeting, Boson Books, Raleigh, North Carolina, USA, 2006, pp. 319–334.
[4] H. Caswell, Sensitivity analysis of transient population dynamics, Ecol. Lett. 10 (2007) 1–15.
[5] H. Caswell, Perturbation analysis of nonlinear matrix population models, Demogr. Res. 18 (2008) 59–116.
[6] H. Caswell, Stage, age, and individual stochasticity in demography, Oikos 118 (2009) 1763–1782.
[7] H. Caswell, Perturbation analysis of continuous-time absorbing Markov chains, Numer. Linear Algebra Appl. (2011), doi:10.1002/nla.791.
[8] H. Caswell, M.G. Neubert, C.M. Hunter, Demography and dispersal: invasion speeds and sensitivity analysis in periodic and stochastic environments, Theor. Ecol. (2010), doi:10.1007/s12080-010-0091-z.
[9] G.E. Cho, C.D. Meyer, Comparison of perturbation bounds for the stationary distribution of a Markov chain, Linear Algebra Appl. 335 (2000) 137–150.
[10] J. Conlisk, Comparative statics for Markov chains, J. Econom. Dynam. Control 19 (1985) 139–151.
[11] J.N. Darroch, E. Seneta, On quasi stationary distributions in absorbing discrete time finite Markov chains, J. Appl. Probab. 2 (1965) 88–100.
[12] R.D. Edwards, S. Tuljapurkar, Inequality in life spans and a new perspective on mortality convergence across industrialized countries, Popul. Develop. Rev. 31 (2005) 645–674.
[13] R.E. Funderlic, C.D. Meyer Jr., Sensitivity of the stationary distribution vector for an ergodic Markov chain, Linear Algebra Appl. 76 (1986) 1–17.
[14] R.P. Gillespie, Partial Differentiation, second ed., Oliver and Boyd, Edinburgh, United Kingdom, 1954.
[15] G.H. Golub, C.D. Meyer Jr., Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains, SIAM J. Algebraic Discrete Methods 7 (1986) 273–281.
[16] C.M. Grinstead, J.L. Snell, Introduction to Probability, second ed., American Mathematical Society, 2003.
[17] M.F. Hill, J.D. Witman, H. Caswell, Spatio-temporal variation in Markov chain models of subtidal community succession, Ecol. Lett. 5 (2002) 665–675.
[18] M.F. Hill, J.D. Witman, H. Caswell, A Markov chain model of a rocky subtidal community: succession and species interactions in a complex assemblage, Am. Nat. 164 (2004) E46–E61.
[19] H.S. Horn, Markovian properties of forest succession, in: M.L. Cody, J.M. Diamond (Eds.), Ecology and Evolution of Communities, Harvard University Press, Cambridge, MA, 1975, pp. 196–211.
[20] Human Mortality Database, University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available from: <http://www.mortality.org> or <http://www.humanmortality.de>.
[21] J.J. Hunter, Stationary distributions and mean first passage times of perturbed Markov chains, Linear Algebra Appl. 410 (2005) 217–243.
[22] J.J. Hunter, Mixing times with applications to perturbed Markov chains, Linear Algebra Appl. 417 (2006) 108–123.
[23] M. Iosifescu, Finite Markov Processes and their Applications, Wiley, New York, USA, 1980.
[24] J.G. Kemeny, Generalization of a fundamental matrix, Linear Algebra Appl. 38 (1981) 193–206.
[25] J.G. Kemeny, J.L. Snell, Finite Markov Chains, Van Nostrand, New York, NY, USA, 1960.
[26] S. Kirkland, Conditioning properties of the stationary distribution for a Markov chain, Electron. J. Linear Algebra 10 (2003) 1–15.
[27] S.J. Kirkland, M. Neumann, N-S. Sze, On optimal condition numbers for Markov chains, Numer. Math. 110 (2008) 521–537.
[28] J.R. Magnus, H. Neudecker, Matrix differential calculus with applications to simple, Hadamard, and Kronecker products, J. Math. Psych. 29 (1985) 474–492.
[29] J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley & Sons, New York, NY, USA, 1988.
[30] C.D. Meyer, The role of the group generalized inverse in the theory of finite Markov chains, SIAM Rev. 17 (1975) 443–464.
[31] C.D. Meyer, Sensitivity of the stationary distribution of a Markov chain, SIAM J. Matrix Anal. Appl. 15 (1994) 715–728.
[32] C.D. Meyer, G.W. Stewart, Derivatives and perturbations of eigenvectors, SIAM J. Numer. Anal. 25 (1982) 679–691.
[33] A.Y. Mitrophanov, Stability and exponential convergence of continuous time Markov chains, J. Appl. Probab. 40 (2003) 970–979.
[34] A.Y. Mitrophanov, Reversible Markov chains and spanning trees, Math. Sci. 29 (2004) 107–114.
[35] A.Y. Mitrophanov, Sensitivity and convergence of uniformly ergodic Markov chains, J. Appl. Probab. 42 (2005) 1003–1014.
[36] A.Y. Mitrophanov, A. Lomsadze, M. Borodovsky, Sensitivity of hidden Markov models, J. Appl. Probab. 42 (2005) 632–642.
[37] L.C. Nelis, J.T. Wootton, Treatment-based Markov chain models clarify mechanisms of invasion in an invaded grassland community, Proc. B Roy. Soc. London 277 (2010) 539–547, doi:10.1098/rspb.2009.1564.
[38] W.E. Roth, On direct product matrices, Bull. Amer. Math. Soc. 40 (1934) 461–468.
[39] P.A. Samuelson, Foundations of Economic Analysis, Harvard University Press, Cambridge, Massachusetts, USA, 1947.
[40] P.J. Schweitzer, Perturbation theory and finite Markov chains, J. Appl. Probab. 5 (1968) 401–413.
[41] G.A.F. Seber, A Matrix Handbook for Statisticians, Wiley, New York, NY, USA, 2008.
[42] E. Seneta, Perturbation of the stationary distribution measured by ergodicity coefficients, Adv. Appl. Probab. 20 (1988) 228–230.
[43] E. Seneta, Sensitivity of finite Markov chains under perturbation, Stat. Probab. Lett. 17 (1993) 163–168.
[44] J.W. Vaupel, V. Canudas Romo, Decomposing change in life expectancy: a bouquet of formulas in honor of Nathan Keyfitz’s 90th birthday, Demography 40 (2004) 201–216.
[45] J.R. Wilmoth, S. Horiuchi, Rectangularization revisited: variability of age at death within human populations, Demography 36 (1999) 475–495.