
GEOMETRICALLY ERGODIC MARKOV CHAINS

AND

THE OPTIMAL CONTROL OF QUEUES

DISSERTATION

FOR THE DEGREE OF

DOCTOR AT THE UNIVERSITY OF LEIDEN,

ON THE AUTHORITY OF THE RECTOR MAGNIFICUS

DR. J.J.M. BEENAKKER, PROFESSOR IN THE

FACULTY OF MATHEMATICS AND NATURAL SCIENCES,

BY DECISION OF THE BOARD OF DEANS,

TO BE DEFENDED ON

THURSDAY 14 JUNE 1990 AT 16.15 HOURS

BY

FLORA MARGARETHA SPIEKSMA

BORN IN CARACAS, VENEZUELA, IN 1958


Composition of the doctoral committee:

Promotor (supervisor): prof. dr. A. Hordijk

Referent (referee): prof. dr. H.C. Tijms

Other members: prof. dr. F.A. van der Duyn Schouten

prof. dr. J. Fabius

prof. dr. A.A. Verrijn Stuart

dr. L.C.M. Kallenberg

dr. ir. R. Dekker

The research on which this thesis is based has been part of the research program of the

Netherlands Foundation for Mathematics S.M.C., which was financially supported by the

Netherlands Organization for Scientific Research N.W.O.


PREFACE

This monograph is a report on my Ph.D. research during the past years. It consists of

three self-contained parts.

I. Geometric ergodicity of Markov chains.

II. Uniform geometric recurrence of Markov decision chains.

III. Linear programming and constrained optimal control of queues.

Each part has its own introduction, which reviews the obtained results and compares them

to related work in the literature. In particular Part I contains an extensive overview of the

literature on geometric ergodicity and its verification. I have included it at this length, because only a few papers on this subject have been published during the past decade. This

preface will give a brief survey of my work, and an overview of (forthcoming) publications

that contain material from the thesis.

As a reader’s memory back-up I have included extensive lists, which comprise most definitions and symbols used. The cartoons that make this thesis more enjoyable to read are drawn by Ton Smits. Most of them appeared as illustrations of a paper by dr. ir. B. van der Veen called “De Problematiek van het Wachten” (“The Problem of Waiting”, in Dutch), published in Informatie 13, 152-156, 1971. Lidwien Smits kindly permitted me to use them.

Survey.

Part I studies geometric ergodicity of denumerable, multichain Markov chains. It intro-

duces a new condition called strong convergence. This condition requires the existence

of a vector µ on the state space, with components greater than or equal to 1, such that the

n-step transition probability matrix converges geometrically fast to the stationary matrix

in µ-norm (this is a weighted supremum norm with bounding vector µ). Strong conver-

gence turns out to be equivalent to geometric ergodicity under conditions on stability and

the number of closed classes. Moreover, for bounded µ-vectors it is equivalent to strong

ergodicity. Thus strong convergence links geometric and strong ergodicity.

Since strong convergence is hardly verifiable for applications, we developed another crite-

rion called strong recurrence. Strong recurrence requires the existence of a vector µ, with

components greater than or equal to 1, such that the taboo transition matrix is a contraction

in µ-norm for some finite taboo set. The Key result of Part I is the equivalence of strong

recurrence and strong convergence. Strong recurrence for a specific µ-vector implies strong

convergence for the same µ-vector. This is very important for Markov reward processes.
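In the notation made precise in Chapter 1 (this merely restates the two conditions just described), strong convergence and strong recurrence require a vector µ with µ_i ≥ 1 such that, respectively,

\[ \|P^n - \Pi\|_{\mu} \le c\,\beta^n \ \ (n \in \mathbb{N}_0) \qquad\text{and}\qquad \|{}_{M}P\|_{\mu} \le 1 - \varepsilon \]

for some constants c, β < 1, ε > 0 and a finite taboo set M, where Π denotes the stationary matrix, {}_{M}P the taboo transition matrix and ‖·‖_µ the weighted supremum norm with bounding vector µ.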

Properties of geometrically ergodic Markov chains are often studied through an analysis

of the generating functions of the marginal probabilities. We give a characterization of

strong recurrence and strong convergence in which these generating functions appear as


µ-bounded matrix functions.

The verifiability of strong recurrence is illustrated by our construction of a suitable µ-

vector in the strong recurrence property for two two-dimensional queueing models. This

µ-vector is of product form, so that it increases exponentially fast with increasing states. This shows not only geometric ergodicity of these models, but also geometrically

fast convergence of the Laplace-Stieltjes transforms of the marginal distributions in a

neighbourhood of 0.
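For example, for a two-dimensional state (i_1, i_2) a product-form bounding vector of the type meant here would be

\[ \mu_{(i_1,\,i_2)} = (1+x_1)^{i_1}\,(1+x_2)^{i_2}, \qquad x_1,\,x_2 > 0, \]

which grows exponentially fast in both coordinates. This display is only an illustration of the shape; the actual vectors for the two models are constructed in Chapter 3.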

The basic concepts of Part I stem from conditions used in Markov decision theory called

uniform strong convergence and uniform strong recurrence respectively. “Uniform” means that the corresponding property is satisfied uniformly over the set of Markov chains

generated by the deterministic policies. From the literature it is known that both uniform

strong convergence and uniform strong recurrence guarantee the existence of deterministic

(stationary) sensitive optimal policies in denumerable Markov decision chains with com-

pact action spaces, under additional continuity and compactness conditions. Blackwell

optimality is the most sensitive criterion and average optimality is least sensitive. The

results are obtained through the analysis of the Laurent expansion of the α-discounted

rewards in a neighbourhood of 1.

It is useful to have insight into the classes of models that are uniformly strong recurrent

or uniformly strong convergent. It is also interesting to know whether there are weaker

conditions that allow a similar analysis of the Laurent expansion. The Key result of

Part II asserts the equivalence of uniform strong recurrence and uniform strong conver-

gence. Moreover, uniform strong recurrence turns out to be a weak condition to ensure

the existence of a suitable Laurent expansion.

The remainder of Part II studies interesting implications of the uniform strong recurrence

property. We show convergence of the successive approximations algorithm for average

expected rewards in multichain Markov decision chains. If the µ-vector is unbounded, as

is the case in most applications, the rewards are allowed to be unbounded. Since many

queueing models are unichain, the algorithm is a powerful tool to determine the structure

of an optimal policy.

A Blackwell optimal policy in the finite state and action model can be obtained as a

limiting policy of α-discounted optimal policies for α tending to 1. Two counterexamples

show that this is not a reliable method for more general models. Part II ends with two

multi-dimensional queueing models. The existence of sensitive optimal policies is shown

by the verification of uniform strong recurrence, similar to Part I.

In practice it can be desirable to maximize the average expected reward under a constraint

on the average expected cost, e.g. to maximize the throughput while the expected delay of

customers is not allowed to exceed some predetermined bound. The linear programming

formulation of the average reward constrained optimization problem is quite effective to

show the existence of optimal policies and to determine subclasses of the policy space that


contain an optimal policy. In Part III we study this linear programming formulation for

unichain Markov decision chains with denumerable state and finite action spaces. Since

the corresponding linear programming problem is infinite dimensional in this case, new

proof techniques are needed. The Key lemma of Part III is essential for the derivations. It

states that the stationary distribution on states and actions under a stationary policy is a

convex combination of corresponding distributions for stationary policies that randomize

in one state less. We use this to show the existence of a stationary optimal policy that

randomizes in at most one state between two actions, if there is precisely one constraint

on the average expected cost. This simplifies the search for optimal policies.

As an application we study one-dimensional queueing models. The structure of these

models allows simple expressions for the weights in the convex combinations of the key

lemma. This results in a unifying method to derive conditions for threshold optimality.

The method comprises all known (one-dimensional) queueing models from the literature.

Publications and forthcoming reports that contain material from this thesis.

Part I :

[1] ON ERGODICITY AND RECURRENCE PROPERTIES OF A MARKOV CHAIN WITH AN APPLI-

CATION TO AN OPEN JACKSON NETWORK (with A. Hordijk). Report of the Dep. of

Math. & Comp. Sci., Univ. of Leiden, 1989, submitted.

[2] GEOMETRIC ERGODICITY OF THE ALOHA-SYSTEM AND A COUPLED PROCESSORS MODEL.

Report of the Dep. of Math. & Comp. Sci., Univ. of Leiden, 1990, submitted.

[3] A NOTE ON µ-EXPONENTIAL ERGODICITY IN UNIFORMIZABLE MARKOV PROCESSES WITH

OR WITHOUT CONTROL. Report of the Dep. of Math. & Comp. Sci., Univ. of Leiden,

1990, submitted.

[4] A NEW FORMULA FOR THE DEVIATION MATRIX (with A. Hordijk). Report of the Dep.

of Math. & Comp. Sci., Univ. of Leiden, 1990, forthcoming.

Part II :

[5] ARE LIMITS OF α-DISCOUNTED OPTIMAL POLICIES BLACKWELL OPTIMAL? A COUNTER-

EXAMPLE (with A. Hordijk). Systems Control Lett. 13, 31-41, 1989.

[6] THE EXISTENCE OF SENSITIVE OPTIMAL POLICIES IN TWO MULTI-DIMENSIONAL QUEUE-

ING MODELS. Report of the Dep. of Math. & Comp. Sci., Univ. of Leiden, 1989,

submitted.

[7] ON THE RELATION BETWEEN RECURRENCE AND ERGODICITY PROPERTIES IN DENUMER-

ABLE MARKOV DECISION CHAINS (with R. Dekker & A. Hordijk). Report of the Dep.

of Math. & Comp. Sci., Univ. of Leiden, 1990, forthcoming.

[8] ON THE CONVERGENCE OF SUCCESSIVE APPROXIMATIONS UNDER STRONG RECURRENCE

CONDITIONS (with A. Hordijk). Report of the Dep. of Math. & Comp. Sci., Univ. of


Leiden, 1990, forthcoming.

Part III :

[9] CONSTRAINED ADMISSION CONTROL TO A QUEUEING SYSTEM (with A. Hordijk). Adv.

Appl. Prob. 21, 409-431, 1989.

[10] CHARACTERIZATION OF THE FEASIBLE SOLUTIONS TO THE DUAL LINEAR PROGRAM IN

MARKOV DECISION CHAINS WITH OR WITHOUT CONSTRAINTS (with A. Hordijk). Report

of the Dep. of Math. & Comp. Sci., Univ. of Leiden, 1990, forthcoming.

[1] consists of sections 1.1.2, 1.2, 2.1 and 2.2, together with a special case of the two centre

open Jackson network of Chapter 9. [2] is Chapter 3, [3] contains sections 2.3 and 6.2.2

and [4] is section 4.3.

[5] is essentially section 8.2.1, [6] contains most of Chapter 9, except for the proofs for the

necessity of the conditions on the Laplace-Stieltjes transforms. [7] consists of sections 6.1,

6.2.1 and possibly 6.3 and [8] is Chapter 7.

Finally [9] is Chapter 12 and [10] contains material from Chapters 10 and 11.


CONTENTS

PART I. GEOMETRIC ERGODICITY OF MARKOV CHAINS

Chapter 1. Introduction and Overview

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 On what is yet to come . . . . . . . . . . . . . . . . . . . 1

1.2 On the existing literature . . . . . . . . . . . . . . . . . . . 8

2. Characterization of ergodicity properties . . . . . . . . . . . . . . . 11

Chapter 2. Equivalence of Strong Convergence and Strong Recurrence

1. Preliminary lemmas on related recurrence conditions . . . . . . . . . 18

2. The Key theorem and its relation with geometric ergodicity . . . . . . 24

3. Strong convergence and recurrence of a uniformizable Markov process . . 32

Chapter 3. Applications

1. Special theorems . . . . . . . . . . . . . . . . . . . . . . . . . 40

2. The one-dimensional random walk . . . . . . . . . . . . . . . . . 43

3. Buffered, asymmetric ALOHA-type system with two queues . . . . . . . 45

4. Two coupled processors . . . . . . . . . . . . . . . . . . . . . . 55

Chapter 4. The Region of Convergence of (1 − z)P(z)

1. Analyticity in the disk D_{0,R} . . . . . . . . . . . . . . . . . . . . 62

1.1 On Kendall’s criterion for geometric ergodicity . . . . . . . . . 62

1.2 Characterization of strong convergence . . . . . . . . . . . . 64

1.3 Elementary construction of the Laurent expansion . . . . . . . . 68

2. Analyticity in the disk D_{1,R} . . . . . . . . . . . . . . . . . . . . 71

2.1 The Laurent expansion through data transformation . . . . . . . 71

2.2 Characterization of strong recurrence . . . . . . . . . . . . . 74

3. A new formula for the deviation matrix . . . . . . . . . . . . . . . 79


PART II. UNIFORM GEOMETRIC RECURRENCE

OF MARKOV DECISION CHAINS

Chapter 5. Introduction

1. The results versus the literature . . . . . . . . . . . . . . . . . . 85

2. The model with results from the literature . . . . . . . . . . . . . . 94

Chapter 6. Necessary and Sufficient Conditions for the Laurent Expansion

1. Lemmas on the underlying MDC structure . . . . . . . . . . . . . 101

2. Equivalence of recurrence and ergodicity properties . . . . . . . . . 106

2.1 Markov decision chains . . . . . . . . . . . . . . . . . . 106

2.2 Uniformizable Markov decision processes . . . . . . . . . . . 110

3. On the relation with Lasserre’s condition . . . . . . . . . . . . . 112

Chapter 7. Convergence of the Successive Approximations Algorithm

1. The iteration scheme . . . . . . . . . . . . . . . . . . . . . . 116

2. v-conserving policies are s-average optimal . . . . . . . . . . . . . 117

3. Convergence of the iteration scheme. . . . . . . . . . . . . . . . 120

Chapter 8. On the Limits of α-discounted optimal Policies

1. Sensitive optimality results . . . . . . . . . . . . . . . . . . . 127

2. Disproving stronger sensitive optimality results . . . . . . . . . . . 125

2.1 MDC’s with finite state and compact action spaces . . . . . . 129

2.2 MDC’s with denumerable state and finite action spaces . . . . 136

Chapter 9. Applications

1. Introductory remarks . . . . . . . . . . . . . . . . . . . . . . 141

2. K competing queues . . . . . . . . . . . . . . . . . . . . . . 142

2.1 General description of the model and results . . . . . . . . . 142

2.2 Time-slotted model . . . . . . . . . . . . . . . . . . . . 144

2.3 The exponential model . . . . . . . . . . . . . . . . . . . 149

2.4 The semi-Markov model . . . . . . . . . . . . . . . . . . 150

3. The two centre open Jackson network . . . . . . . . . . . . . . . 154


PART III. LINEAR PROGRAMMING AND

CONSTRAINED OPTIMAL CONTROL OF QUEUES

Chapter 10. Introduction of Linear Programs and Results

1. Linear programs . . . . . . . . . . . . . . . . . . . . . . . . 154

2. Some results on stationary policies . . . . . . . . . . . . . . . . 159

Chapter 11. Optimization and Constrained Optimization . . . . . . . . 162

1. Polyhedron of state-action frequencies . . . . . . . . . . . . . . . 162

2. Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 168

2.1 General assumptions . . . . . . . . . . . . . . . . . . . 168

2.2 Weakly uniformly integrable rewards . . . . . . . . . . . . . 163

2.3 Upper-s-bounded rewards . . . . . . . . . . . . . . . . . . 171

3. Constrained optimization . . . . . . . . . . . . . . . . . . . . 172

4. The importance of being tight . . . . . . . . . . . . . . . . . . 176

Chapter 12. Constrained Admission Control to a Queueing System . . . 184

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 184

2. Optimal structure for specific performance measures . . . . . . . . 186

3. Optimality of “nearly deterministic” policies . . . . . . . . . . . . 188

4. Criteria for threshold and thinning optimality . . . . . . . . . . . 189

5. Algorithm and sufficient conditions . . . . . . . . . . . . . . . . 192

6. Proofs of the optimal structure of the models from section 12.2 . . . . 195

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

Definition Finder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ?

Symbol Finder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ?

Samenvatting (Summary, in Dutch) . . . . . . . . . . . . . . . . . . . . . . . . . . 109

Nawoord (Afterword, in Dutch) . . . . . . . . . . . . . . . . . . . . . . . . . . . . ?

Curriculum Vitæ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ?


PART I

GEOMETRIC ERGODICITY OF MARKOV CHAINS

Geometric ergodicity paves the highway to equilibrium


CHAPTER ONE

Introduction and Overview.

1. Introduction.

1.1. On what is yet to come.

Geometric ergodicity has been a topic of numerous papers. For stable systems this property

requires the equilibrium situation to be attained at a geometric rate. So for computational

purposes geometric ergodicity is a desirable property, especially when no explicit formula

for the stationary distribution is known. However, even the problem of characterizing

just ergodicity is quite complicated for models with a multi-dimensional state space, cf.

Malyshev [1972], Malyshev & Mensikov [1981], Rosberg [1981], Fayolle & Iasnogorodski

[1979], Szpankowski [1988], Makowski & Shwartz [1989]. Results on geometric ergodicity

of multi-dimensional systems hardly exist.

The analysis of geometric ergodicity in this monograph has three main topics. In the first

place we will develop a unifying framework for geometric and strong ergodicity of denu-

merable Markov chains (MC’s) and its relation to various recurrence conditions. To this end we will introduce a new property, called µ-geometric ergodicity, which requires geometrically fast convergence of the n-step transition probabilities to the stationary transition

probabilities in µ-norm. This µ-norm is a weighted supremum norm with weighting or

bounding vector µ.

Let E denote a denumerable state space. An important result of Part I is the equivalence

of geometric ergodicity of stable MC’s and µ-geometric ergodicity for some µ with µi ≥ 1,

∀ i ∈ E. Moreover, the MC is strongly ergodic if and only if it is µ-geometrically ergodic

for a bounded µ. Hence, the notion of strong convergence, which is µ-geometric ergodicity

for some µ with µi ≥ 1, i ∈ E, generalizes geometric as well as strong ergodicity. We

emphasize the significance of strong convergence. It implies geometrically fast convergence

not only of the marginal probabilities, but also of the marginal expected rewards for any

µ-bounded reward function.

In order to verify whether specific queueing models are ergodic in some sense, several cri-

teria for ergodicity have been developed in the literature. These criteria require some kind

of recurrence of the MC to a finite set. The weakest form is a version of Foster’s criterion

for positive recurrence. The strongest is the Doeblin condition for strong ergodicity. The


criterion in between is Popov’s condition for geometric ergodicity. All three criteria apply

to irreducible MC’s.

In our derivations we use apparently weaker versions of these conditions. These enable us

to unify the latter two criteria into one notion, called strong recurrence.

This notion uses the matrix of taboo transition probabilities MP, which is defined as follows: MP_{ij} = P_{ij} if j ∉ M, and MP_{ij} = 0 if j ∈ M. The set M ⊂ E is called the taboo set. A MC is µ-geometrically recurrent if MP is a contraction in µ-norm for some finite taboo set M. If the MC is µ-geometrically recurrent for some µ with µ_i ≥ 1, i ∈ E, we call the chain strongly recurrent. In terms of the drift, the condition tells us that the mean drift vector d, with d_i = IE(µ_{X(n+1)} − µ_{X(n)}; X(n+1) ∉ M | X(n) = i), is upper-bounded by µ times a negative constant. So, if µ_i ≥ 1 for all i ∈ E, the mean drift vector is strictly negative.

This implies the existence of a solution to a generalized version of Foster’s criterion (cf.

Hordijk [1974], Tweedie [1975]), so the MC is stable.
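For later reference, the contraction behind this drift statement is condition (2∗) of section 2 (restated here, nothing new is claimed):

\[ \sum_{j \notin M} P_{ij}\,\mu_j \;\le\; (1-\varepsilon)\,\mu_i, \qquad i \in E, \]

so that \(\sum_{j \notin M} P_{ij}\,\mu_j - \mu_i \le -\varepsilon\,\mu_i\): the expected µ-value after one step, restricted to the event of not entering the taboo set M, drops by at least εµ_i.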

The Key theorem of Part I is the equivalence of strong recurrence and strong convergence.

Moreover, strong recurrence turns out to be equivalent to Popov’s [1977] necessary and

sufficient criterion for geometric ergodicity of irreducible, ergodic MC’s. His result can be

extended to stable MC’s and we conclude that geometric ergodicity and strong convergence

are equivalent for stable systems.

In fact strong convergence states that the transition matrix is a quasi-compact operator in

a weighted supremum norm. In this context it is interesting to quote a remark by Kendall

[1960], p.183, who states that “the theory of quasi-compact operators is completely useless”

as a technique for the verification of geometric ergodicity of models. Since our method to

check geometric ergodicity consists of verifying that the matrix of transition probabilities

is µ-geometrically recurrent, it is a quasi-compact operator in this µ-norm.

The results on strong convergence mentioned above concern discrete time systems. How-

ever, many queueing processes evolve continuously in time. Tweedie [1981] derives a cri-

terion (on intensity matrices) for an irreducible Markov process (MP) to be exponentially

ergodic (this is geometric ergodicity in continuous time). In his paper he remarks that his

criterion is related to Popov’s criterion for geometric ergodicity. Indeed, for uniformiz-

able MP’s it is straightforward to show that Tweedie’s criterion is equivalent to Popov’s

criterion for the approximating MC (AMC). The transition probabilities of this AMC are

first order approximations of the continuous time transition probabilities. This connects

exponential ergodicity of a MP to geometric ergodicity of the AMC.

It is interesting to know whether other ergodicity and recurrence properties of the AMC

can be carried over to the MP as well. For uniformizable Markov processes (MP’s) we will

indeed show that strong recurrence of the AMC is equivalent to strong convergence and

recurrence of the MP as well as to Tweedie’s [1981] necessary and sufficient criterion for

exponential ergodicity of an irreducible MP. Moreover, the restriction to irreducible MP’s

turns out not to be necessary.


So, the analysis of a time-discretized version of the MP has interesting consequences for

the MP itself as a by-product. The reason for our focus on time-discretized systems is

their high practical significance. For instance, the open Jackson network is an important

tool to analyse the performance of computer configurations, which in practice operate in

a time-slotted way.

So for the verification of geometric ergodicity we can use strong recurrence. This condition

is relatively easy to verify, since it is a condition on the one-step transition matrix. As it implies strong convergence for the same µ-vector as well, more information on the evolution of the system is directly available. This takes us to the second main topic

of Part I.

For various multi-dimensional queueing models we have been able to show strong recur-

rence. These include the two centre open Jackson network with state dependent service

rates, a buffered ALOHA type system, a coupled processors model, and various versions

of the K competing queues model with a fixed service control rule. The conditions re-

quired to establish strong recurrence (hence geometric ergodicity) are a mean negative

drift condition to obtain ergodicity and a spatial geometric boundedness type of condition.

There is no formal definition of spatial geometric boundedness. In stable, discrete time

models it amounts to requiring the distribution of |X(n+1) − X(n)| given X(n) = i to have exponentially fast decaying tails, with one decay parameter for all initial states

i ∈ E. As a condition on the parameters of the analysed models we have to require spatial

geometric boundedness of the service time or arrival distributions, i.e. convergence of the

respective Laplace-Stieltjes transforms in a neighbourhood of 0. In the case of exponential

or Poisson distributions or in case the transition probabilities have bounded support, this

condition is satisfied.
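A formal version of this tail requirement (our paraphrase only; as said, the notion is deliberately left informal) would be the existence of constants C < ∞ and ρ < 1, not depending on the initial state, with

\[ \mathrm{IP}\bigl( |X(n+1) - X(n)| \ge m \;\big|\; X(n) = i \bigr) \;\le\; C\,\rho^{\,m}, \qquad m \in \mathbb{N},\ i \in E. \]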

From this point of view it is not remarkable that the strong recurrence property itself has

a spatial geometric boundedness-like character, so that the bounding vector µ increases

exponentially fast with increasing states. It even has a product-form structure. On the

other hand, Kendall [1960] showed that for geometric ergodicity of irreducible MC’s, it

is necessary and sufficient that the probability generating function of the recurrence time

to a special state has convergence radius strictly exceeding 1. Hence the recurrence times

are exponentially bounded (cf. Propositions 1.3, 1.5). This is a temporal geometric bound-

edness condition. Thus geometric ergodicity seems to be intimately connected to both

spatial and temporal geometric boundedness. This relation is typical for the analysis of

geometric ergodicity in applications and was already observed by Miller [1966]. As far as

we know, he was the first to use the concepts of spatial and temporal geometric bounded-

ness. We adopted this terminology as a clear reflection of the character of all conditions

used in this context.

Strong convergence does not only imply temporal geometric boundedness. Also the expected reward received at time n before returning to some finite fixed set converges geometrically fast to 0 for any reward vector bounded by µ, as n tends to infinity. Moreover,


the product-form structure of the µ-vector implies that the Laplace-Stieltjes transforms of the marginal distributions converge at a geometric rate in a neighbourhood of 0 (cf. Proposition 3.1). Hence, the convergence of moments of any order is guaranteed. It is

interesting to notice that this applies to one dimensional models from the literature as

well, such as the random walk process (1.3) in Miller [1966]. Under his conditions (4.3)

and (5.8) it is easy to check that µ with µ_i = (1 + x)^i, i ∈ IN_0, is a suitable bounding

vector for some x > 0, so the random walk is strongly recurrent (cf. section 3.2).
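A minimal numerical check of this construction (the nearest-neighbour walk, the parameter values and the truncation level below are illustrative choices of ours, not taken from the thesis): with taboo set M = {0} and µ_i = (1+x)^i we verify Definition 1.2, i.e. that sup_i µ_i^{-1} ∑_{j∉M} P_{ij} µ_j ≤ 1 − ε for some ε > 0.

import numpy as np

p, q = 0.3, 0.7          # upward / downward jump probabilities (negative drift, p < q)
x = 0.2                  # exponent in the bounding vector mu_i = (1+x)^i
N = 200                  # truncation level, a purely numerical device

# Reflected nearest-neighbour random walk on {0, ..., N-1}.
P = np.zeros((N, N))
P[0, 0], P[0, 1] = q, p
for i in range(1, N - 1):
    P[i, i - 1], P[i, i + 1] = q, p
P[N - 1, N - 2], P[N - 1, N - 1] = q, p   # crude boundary row, ignored below

mu = (1.0 + x) ** np.arange(N)            # bounding vector mu_i = (1+x)^i
M = [0]                                   # finite taboo set
MP = P.copy()
MP[:, M] = 0.0                            # taboo matrix: transitions into M removed

ratios = (MP @ mu) / mu                   # mu_i^{-1} * sum_j MP_ij mu_j
print(ratios[: N - 2].max())              # approx. 0.943 < 1, so eps approx. 0.057

Interior states give the largest ratio p(1+x) + q/(1+x), which is smaller than 1 for some x > 0 precisely when the drift p − q is negative; this is the mechanism behind the construction in section 3.2.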

The relation between strong recurrence and convergence can be used as well to settle the

question whether any of the queueing models we discussed is strongly ergodic. A related recurrence condition enables us to prove straightforwardly that a MC having one of the

following properties is not strongly ergodic: a generalized skip-free property to the left or

the random walk structure (1.3) in Miller [1966].

Apart from the one-dimensional random walk, all models we study have the same characteristic that at most one service is finished per unit of time in the discrete time process. Exponential systems are special cases of such models. The exponential queueing systems we analyse are the two centre open Jackson network, the two coupled processors and

the K competing queues model. Strong recurrence of time-discretized versions of these

models is shown under an ergodicity assumption. Since these versions are in fact the

approximating chains of the corresponding continuous time models, strong convergence

and geometric or exponential ergodicity hold for both the discrete and continuous time

processes. Because only Poisson and exponential distributions are involved, ergodicity is

necessary and sufficient for strong recurrence.

For the analysis of the coupled processors we use the rather complicated ergodicity condi-

tions that are derived in Fayolle & Iasnogorodski [1979]. We think that our analysis gives

a better insight into the necessity of these ergodicity conditions.

As examples of the more general model we examine discrete time GI/M/s-type queueing

systems, i.e. time-slotted versions of the ALOHA system and the K competing queues

model. Both models have geometric service requirements and general arrival distributions

per time-slot. For the analysis of the ALOHA-system we have borrowed the necessary

ergodicity conditions from Szpankowski [1988]. Ergodicity is not sufficient to establish

strong recurrence in this case. We need convergence of the Laplace-Stieltjes transforms of

the arrival distributions in a neighbourhood of 0 as well. This assumption together with

ergodicity turns out to be necessary and sufficient for strong recurrence. Recalling that

strong recurrence is equivalent to the recurrence times to some state being exponentially

bounded, this result emphasizes again the connection between geometric ergodicity and

spatial and temporal geometric boundedness.

As an embedded M/G/1-type queueing system on the instants of a service completion

we consider the embedded K competing queues model, with Poisson arrivals and general

service time distributions. In this case ergodicity together with convergence of the Laplace-

Stieltjes transforms of the service time distributions in a neighbourhood of 0 are necessary


and sufficient to ensure proper recurrence to the empty state, thus implying strong recur-

rence. A derivation of ergodicity conditions for the K competing queues model can e.g.

be found in Makowski & Shwartz [1989].

The one-dimensional random walk, as formulated in Miller [1966], comprises the one-

dimensional versions of these models as well as the embedded GI/M/s-queue on the instants

of an arrival. We use the characterization of geometric ergodicity derived by Miller to show

µ-geometric recurrence. Thus we obtain a stronger result than Miller, since our analysis

yields information on the µ-vector in the strong convergence property.

There is a typical anti-symmetry between the conditions for geometric ergodicity (or strong

recurrence) of M/G/s- and GI/M/s-type systems. In the embedded M/G/s-queue at most one

service is completed per time-slot. So the process has a tendency to drift away from the

empty state. To get sufficient recurrence towards the empty state both negative drift and

exponentially bounded service times are necessary, since otherwise too many customers

may enter the system between two successive service completions. If the drift is positive,

the process is transient. This reinforces the tendency to go away from the origin sufficiently

to make any other conditions redundant (cf. e.g. Kendall [1960]). For the GI/M/s-queue

it is just the reverse. There is at most one arrival per time slot, so the process exhibits

an inclination to stay within a finite set. Thus negative drift is sufficient for geometric

ergodicity, whereas exponentially bounded arrival distributions are necessary for geometric

ergodicity if the drift is positive.

Our third topic is related to the temporal geometric boundedness-like property of geomet-

rically ergodic models. The condition on the tails of the recurrence time distributions used

by Kendall [1960] for geometric ergodicity is also equivalent to the following condition: the generating function P_{ij}(z) of the probabilities IP_i(X(n) = j), n ∈ IN, is meromorphic in the disk D_{0,R} = {z ∈ C : |z| < R}, with R > 1, and has a simple pole at z = 1. An equivalent condition is that the function (1 − z)P_{ij}(z) be analytic in D_{0,R}.
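Written out, with the convention that the sum starts at n = 0 (our convention for this display),

\[ P_{ij}(z) \;=\; \sum_{n=0}^{\infty} P^{\,n}_{ij}\, z^{\,n}, \qquad |z| < 1, \]

so the condition above concerns the analytic continuation of this power series beyond the unit disk.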

This result inspired me to derive conditions on the analyticity of (1− z)P (z) as a matrix

function in order to characterize strong recurrence, and strong convergence and recurrence

for specific µ-vectors as well. As an immediate consequence of the characterization of µ-

geometric ergodicity thus obtained, we derive a lower bound of all possible contraction

factors in the µ-geometric ergodicity property. Roughly speaking, this bound is the mod-

ulus of the second largest value in the spectrum of the transition matrix P as a µ-bounded

operator. This extends a well-known result for finite MC’s, though it is slightly weaker,

and a similar result by Isaacson & Luecke [1978] for e-geometrically or strongly ergodic

MC’s. It is also related to an assertion proposed by Dekker [1985b].

In fact, for finite MC’s the lower bound itself is a suitable contraction factor. It is an open

problem whether this holds for denumerable MC’s.

The characterization of µ-geometric recurrence we derive is essentially the same condition

as used by Lasserre [1988] for the existence of Blackwell optimal policies in denumerable


MDC’s with µ-bounded rewards. The importance of this relation in the context of MDC’s

will be discussed in Part II. For MC’s it has interesting consequences as well. First we

remark that the characterizations of strong recurrence with or without specification of the

µ-vector are in fact weak conditions to guarantee the existence of the Laurent expansion

of P(z) at z = 1. At first sight it seems remarkable that these turn out to be equivalent

to strong recurrence. However, both characterization and property are closely connected

to temporal geometric boundedness.

In particular, the Key theorem implies equivalence of the analyticity of the matrix function

(1 − z)P(z) in a disk D_{0,R}, R > 1, and in a disk D_{1,R′}, if aperiodicity is assumed. It is of importance that aperiodicity is closely connected to the existence of precisely one pole of P(z) on the unit circle. I conjecture that this equivalence can be directly established by

using the analytic continuation of the generating function of recurrence times in the larger

disk, thus providing an alternative way to show the equivalence of strong convergence and

recurrence.

For the characterizations in which the bounding vector µ itself appears, no conditions

concerning stability are needed. This is generally true for all recurrence and ergodicity

conditions we use. Thus it confirms a conjecture that transient, geometrically ergodic

MC’s can be characterized through similar properties for µ-vectors with inf_{i∈E} µ_i = 0.

Dekker & Hordijk [1988], [1989] showed the existence of the Laurent expansion around

z = 1 of the generating function P(z) under both strong convergence and recurrence.

Their proof consists of the derivation of an expression for the Laurent expansion as a

function of the stationary matrix and the deviation matrix. This expression was introduced

by Veinott [1969] for finite MC’s. Under strong convergence conditions it can be made

more precise, since in this case a formula for the deviation matrix exists.

In this monograph we give a constructive proof of the existence of the Laurent expansion

that uses only elementary analysis, if strong convergence is assumed. The equivalence of

strong recurrence and convergence of aperiodic chains combined with a data transforma-

tion (cf. Schweitzer [1971]) allows us to give a direct proof of the existence of the Laurent

expansion under strong recurrence conditions. Moreover, a formula for the deviation ma-

trix as a function of the taboo probability matrix will be derived.

This subsection closes with a brief discussion of the contents of the chapters in Part I. The

next subsection gives an overview of the literature on geometric and strong ergodicity.

Section 2 discusses various ergodicity and recurrence conditions from the literature on

irreducible MC’s. It studies the relation of these conditions with strong convergence and

recurrence.

Chapter 2 proves the Key theorem of Part I, which is the equivalence of strong conver-

gence and recurrence for multichain MC’s. The derivation uses recurrence conditions from

Dekker & Hordijk [1988], [1989] for the existence of sensitive optimal policies in Markov

decision chains with unbounded rewards. These are reviewed in section 2.1. Section 2.2


gives a proof of the Key theorem that only uses elementary matrix theory.

Popov’s criterion is shown to be equivalent to strong recurrence, thus establishing the

equivalence of strong convergence and geometric ergodicity for irreducible MC’s. Under

the conditions of the Key theorem this can be easily extended to hold for multichain MC’s.

For uniformizable MP’s we derive a similar equivalence result in section 2.3. The analysis

uses the results for discrete time processes and Tweedie’s criterion for exponential ergod-

icity of irreducible MP’s. We show, that his criterion is equivalent to strong recurrence of

the AMC. In a way similar to the case of a discrete time process, his criterion is shown to

be necessary and sufficient as well, if a multichain structure is allowed.

The third chapter discusses the various applications we already mentioned. Strong con-

vergence, hence geometric ergodicity, of various queueing models is verified by the con-

struction of a µ-vector of product form, for which the model is µ-geometrically recurrent.

The chapter starts with a negative result on strong ergodicity in section 3.1. Moreover, for

µ a vector of exponential type we show that µ-geometric recurrence implies geometrically

fast convergence of the Laplace-Stieltjes transforms of the marginal distributions in a

neighbourhood of 0. Both results apply to all models under consideration. As a simple

example of the construction of a suitable µ-vector for the strong recurrence property we

analyse the one-dimensional random walk (cf. Miller [1966]) in section 3.2.

In sections 3.3 and 3.4 we analyse the ALOHA-type system and the coupled processors

model. By its nature the K-competing queues model is a controlled system, and for the

two centre open Jackson network we can allow control of the service rates. Since allowing

control does not complicate the analysis essentially, we postpone it to Chapter 9, that

deals with the optimal control of queues.

The last chapter of Part I, Chapter 4, discusses the Laurent expansion of P_{ij}(z) under strong convergence and recurrence conditions. Section 4.1 elaborates extensively on the relation between geometric ergodicity and the analyticity of (1 − z)P_{ij}(z) in a disk D_{0,R}, R > 1, as derived by Kendall. Next it states a characterization of µ-geometric ergodicity through the analyticity of (1 − z)P(z) as a matrix function on a disk D_{0,R}, R > 1. Finally it derives the Laurent expansion under strong convergence.

Section 4.2 shows the existence of the deviation matrix under strong recurrence conditions by a data transformation technique combined with an elegant argument due to G. Koole. The same data transformation is a suitable tool for deriving the Laurent expansion, using the Key theorem from Chapter 2 and the results in section 4.1. Strong recurrence turns out to be equivalent to analyticity of (1 − z)P_{ij}(z) in a disk D_{1,r}, r > 0. A similar characterization of µ-geometric recurrence in which the µ-vector appears is developed as well.

Part I concludes with the derivation of an explicit expression for the deviation matrix as

a function of the taboo probability matrix in section 4.3.


1.2. On the existing literature.

The subject of geometric ergodicity and its verification has attracted but little attention

in the past years. For this reason it seems useful to give an extensive survey of results and

techniques from the literature on this subject.

The first papers on geometric ergodicity were written by Kendall [1959], [1960]. We

already referred to his result on the equivalence of this property and a temporal geometric

boundedness condition for irreducible MC’s. It states the convergence of the probability

generating function of the time between two successive visits to a special state, for values

larger than 1. He considers this as the most suitable property for the verification of

geometric ergodicity of queueing systems. Since most of the older papers in this field

focus on geometric ergodicity properties of specific models, we will start our review with

these.

Kendall uses temporal geometric boundedness to derive conditions for geometric ergod-

icity of the number of customers in the embedded M/G/1- and GI/M/1-queues on the instants

of service completions and arrivals of customers respectively. Schal [1971] extends this

result to the continuous time versions of these models. In 1969 Neuts & Teugels proved

exponential ergodicity of various performance measures of the M/G/1-queue, including the

virtual waiting time and the number of customers in the queue.

A large class of one-dimensional queueing processes can be modelled as a generalized

random walk on the half line. The generalization consists of allowing boundary conditions

that affect the transitions into and out of a finite set of states. Miller [1966] studies such

a random walk on IN_0. Thus he obtains results for the embedded GI/M/m-queue.

The random walk on IR_+ was analysed by Nummelin & Tweedie [1978]. A special case

of this process is the sequence of customer waiting times in a model with one server.

An earlier related result is by Cheong & Heathcote [1965]. They show geometrically fast

convergence of the waiting time distribution of the nth customer in a GI/G/1-queue via

an analysis of the number of customers served in a busy cycle. This quantity can in fact

be interpreted as the first hitting time of the empty state in the embedded chain on the

instants of service completions.

In Hajek [1982] another derivation of the same result for the GI/G/1-queue is given. As far

as we know, his paper is also the first one to consider a multi-dimensional queueing system,

which is an ALOHA-system with control varying in time according to some prescribed rule.

Each state is taken to be a pair, consisting of the number of backlogged packets and the

retransmission rate for a packet. Thus it can be modelled as a two-dimensional MC. For

the proof he derives recurrence conditions for general processes with a continuous state

space, using first entrance times. We will come back to this later.

As a last paper in this series we mention Tuominen & Tweedie [1979b], which analyses

Markovian storage processes by stochastic comparison of discrete time skeletons with a

random walk process. The stochastic process associated with the virtual waiting times in


an M/G/1-queue is included.

As can be expected from Kendall’s equivalence result, most of these papers derive necessary and sufficient conditions for geometric ergodicity via an analysis of first hitting times. In view of our remarks it is not remarkable that in general geometric ergodicity turns out to be equivalent to some spatial geometric boundedness condition.

Another collection of papers, which are mainly of a later date, deals with recurrence-type criteria for various forms of ergodicity. In Popov [1977] the first recurrence-type criterion

for geometric ergodicity is derived. In Isaacson [1979] a characterization of geometric

ergodicity through the δ-coefficient is given. Section 2 borrows from his paper the overview

of the three forms of ergodicity together with their characterizations. Incidentally, it seems

to us that it is necessary to require g_i ≥ 1, i ∈ S, in his characterization b) on p. 268.

From Theorem 1 of Nummelin & Tweedie [1978] it follows that geometric ergodicity implies

exponentially fast convergence in total variation for each starting state separately. As

shown in this monograph, it implies µ-geometric ergodicity too, which is convergence in

matrix norm and therefore formally stronger. In their Theorem 2.3 Nummelin & Tuomi-

nen [1982] study a condition that requires geometrically fast convergence of an expression that we might call the expected weighted total variation with respect to some initial distribu-

tion on the state space. For irreducible, geometrically ergodic MC’s they derive equivalent

conditions of ergodic type. None of these are easily verifiable and no characterization of

geometric ergodicity using these conditions is given. For denumerable MC’s their property

is implied by µ-geometric ergodicity for the weighting function µ and the stationary distri-

bution as the initial distribution. Hence, by our equivalence result µ-geometric recurrence

can be used to verify their property, and to show that indeed it characterizes geometric

ergodicity.

In Tuominen & Tweedie [1979a] exponential ergodicity is studied. They show that geo-

metric ergodicity of discrete time h-skeleton chains and exponential ergodicity of the MP

are equivalent. It is of interest to note that our discrete time models are not h-skeleton

chains, but first order approximations (in h) of the transition probabilities.

In Tweedie [1981] criteria for various forms of ergodicity are stated in terms of conditions

on the recurrence time to a finite set. Moreover, he derives necessary and sufficient

conditions on the intensity matrix for exponential and strong ergodicity. For uniformizable

MP’s, this criterion is equivalent to strong recurrence of the AMC (cf. Lemma 2.7). We

use this to show that strong convergence and exponential ergodicity are equivalent.

Hajek’s [1982] paper derives recurrence conditions for non-homogeneous processes that

are closely related to Popov’s. He uses a real-valued state space, so that the finite sets in

the conditions for the denumerable state space are sets with a finite supremum here. Due

to this circumstance and the possible non-homogeneity he needs to impose some extra

bounds as well.

Strong ergodicity has also been a topic in many papers. Since our main results are on


geometric ergodicity, we only mention the papers related to the overview in section 1.2.

In Huang & Isaacson [1977] strong ergodicity is characterized via bounded expected re-

currence times. Strong ergodicity is shown to be equivalent to the existence of a bounded

solution to the Foster-inequalities in Isaacson & Tweedie [1978]. As will be pointed out in

section 2, these results are contained in Hordijk [1974] as well.

In Isaacson & Luecke [1978] the best possible convergence rate for strongly ergodic Markov

chains is determined via a spectral radius. As has already been mentioned in subsection

1.1, a similar result is possible for geometric ergodicity, when using the appropriate µ-norm

(cf. section 4.1).

The fundamental problem of stability or equivalently ergodicity of general Markov chains

or specific queueing systems has also been addressed in numerous papers. In a recent pa-

per by Szpankowski [1988] multi-dimensional queueing systems are considered. Sufficient

conditions for ergodicity and non-ergodicity of a multi-dimensional Markov chain are pre-

sented. These methods are applied to two multi-dimensional queueing systems: buffered

contention packet broadcast systems and coupled processor systems. With our method we

will show geometric ergodicity of two-dimensional versions of these systems. In fact, the

coupled processors model we analyse is adopted from Fayolle & Iasnogorodski [1979], who

derive criteria for ergodicity of this model that involve only the arrival and service rates.

Our interest in recurrence and ergodicity properties started in Markov decision theory. Re-

currence conditions play an essential role in establishing the existence of optimal policies.

Foster’s criterion, Lyapunov functions and the Doeblin condition have been extensively

used in Markov decision chains. Many of the criteria and proof techniques for ergodicity

and strong ergodicity for a product-set of Markov chains can be found in Hordijk [1974].

Obviously, these results also hold for one Markov chain. However, most of them are not

explicitly stated, as the objective was to prove the existence of optimal policies. Some of

the results by Hordijk [1974] will be mentioned in the overview in section 2. For an anal-

ysis of recurrence conditions and their applications in Markov decision chains we refer to

Federgruen, Hordijk & Tijms [1978a,b], Thomas [1980], Zijm [1985] and Dekker & Hordijk

[1988], [1989].

In the latter references it is shown that µ-geometric ergodicity or µ-geometric recurrence

together with standard continuity and compactness assumptions are sufficient for the ex-

istence of undiscounted (sensitive) optimal policies. The verification of µ-geometric ergod-

icity is the main step in the application of these results. For instance, let us consider the

problem of controlling the service-rates in a two centre network. Under the assumption of

non-decreasing bounded service-rates, such that they strictly dominate the throughputs

for all large numbers of jobs, the existence of non-discounted optimal policies is guaranteed

for any cost function that is polynomially bounded in the numbers of jobs. Indeed, by the

verification of µ-geometric recurrence, similar to that in Chapters 3 and 9, the results in

Dekker & Hordijk [1989] can be applied.


2. Characterization of ergodicity properties.

In this section we will give an overview of ergodicity properties and their characterizations

via recurrence conditions. The propositions summarize the conclusions that follow from

the simple arguments, to which we restrict ourselves in the overview.

Let {X(n)}_{n∈IN_0} be an aperiodic and irreducible Markov chain with state space E. The n-step transition probabilities are

P^n_{ij} = IP(X(n) = j | X(0) = i), i, j ∈ E.

The MC is ergodic if a probability vector π exists with ∑_j π_j = 1 and

lim_{n→∞} P^n_{ij} = π_j, i, j ∈ E.

The MC is geometrically ergodic, if for finite constants c_{ij}, i, j ∈ E, and a β < 1

|P^n_{ij} − π_j| ≤ c_{ij} β^n, n ∈ IN_0, i, j ∈ E. (1.1)

Kendall [1959] proposed in his definition a different contraction factor β_{ij} < 1 for each pair

of states (i, j). However, under the conditions of this section, Vere-Jones [1962] showed

that one β < 1 exists for which relation (1.1) holds.

The MC is strongly ergodic, if for some constants c and β < 1

∑_j |P^n_{ij} − π_j| ≤ c β^n, n ∈ IN_0, i ∈ E. (1.2)

Evidently strong ergodicity implies geometric ergodicity, which in turn implies ergodicity.

With µ a vector with µ_i > 0, i ∈ E, we associate the weighted supremum norm ‖·‖_µ. Then µ is the vector of positive weights or the bounding vector. For any vector x on the state space E, its µ-norm is defined as ‖x‖_µ = sup_{i∈E} |x_i|/µ_i. The operator norm of a matrix A on E × E is ‖A‖_µ = sup_{i∈E} µ_i^{-1} ∑_j |A_{ij}| µ_j.
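As a concrete illustration of these two norms (the three-state matrix, the vector x and the bounding vector below are arbitrary choices of ours, not taken from the text):

import numpy as np

def mu_norm_vector(x, mu):
    # ||x||_mu = sup_i |x_i| / mu_i
    return np.max(np.abs(x) / mu)

def mu_norm_operator(A, mu):
    # ||A||_mu = sup_i mu_i^{-1} * sum_j |A_ij| * mu_j
    return np.max((np.abs(A) @ mu) / mu)

mu = np.array([1.0, 2.0, 4.0])       # bounding vector with mu_i >= 1
P = np.array([[0.5, 0.5, 0.0],       # a stochastic matrix on three states
              [0.3, 0.4, 0.3],
              [0.0, 0.6, 0.4]])
x = np.array([0.5, -3.0, 2.0])

print(mu_norm_vector(x, mu))         # 1.5, attained at i = 1
print(mu_norm_operator(P, mu))       # 1.5, attained at i = 0

In particular ‖P‖_µ can exceed 1 even for a stochastic matrix; its finiteness is the requirement appearing in Definition 1.1 below.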

Remark 1.1: Notice that for µ-bounded operators A, B and a µ-bounded vector x, the

products AB, Ax are well-defined, so that ‖AB‖_µ ≤ ‖A‖_µ‖B‖_µ and ‖Ax‖_µ ≤ ‖A‖_µ‖x‖_µ. Moreover, multiplication is associative, i.e. (AB)x = A(Bx). This can be proved by using the Fubini-Tonelli theorem (cf. Royden [1968]). For µ-bounded operators {A(n)}_n with ∑_n ‖A(n)‖_µ < ∞ all matrix operations are allowed as well, such as (∑_n A(n))B = ∑_n (A(n)B) and vice versa. This will be used frequently, without explicit justification.

Let e be the bounding vector with all components equal to 1. The e-norm is the supremum

norm and relation (1.2) can be written as

‖P^n − Π‖_e ≤ c β^n, n ∈ IN_0,


with Π the matrix with identical rows equal to π. Generalization of this convergence in

supremum norm gives our concept of µ-geometric ergodicity.

Definition 1.1: A MC is µ-geometrically ergodic (µ-GE) for a bounding vector µ, with µ_i ≥ 1, i ∈ E, and ‖P‖_µ < ∞, if for some constants c and β < 1,

sup_{i∈E} µ_i^{-1} ∑_j |P^n_{ij} − Π_{ij}| µ_j = ‖P^n − Π‖_µ ≤ c β^n, n ∈ IN_0.

If the MC is µ-geometrically ergodic for a bounded µ-vector, say 1 ≤ µ_i ≤ c∗, i ∈ E, it is strongly ergodic. Indeed,

∑_j |P^n_{ij} − Π_{ij}| ≤ ∑_j |P^n_{ij} − Π_{ij}| µ_j ≤ c β^n µ_i ≤ c c∗ β^n, i ∈ E.

Already we saw that strong ergodicity is e-geometric ergodicity. The following proposition

holds:

Proposition 1.1 Strong ergodicity is equivalent to µ-geometric ergodicity for a bounded

µ-vector.

If the chain is µ-geometrically ergodic, then ‖P^n − Π‖_µ ≤ c β^n, n ∈ IN_0, and this implies

|P^n_{ij} − Π_{ij}| ≤ c (µ_i/µ_j) β^n, n ∈ IN_0.

Hence the chain satisfies relation (1.1) for c_{ij} = c µ_i/µ_j and is therefore geometrically

ergodic.

The question immediately arises whether geometric ergodicity implies µ-geometric ergod-

icity for some bounding vector µ. We will come back to this after the introduction of the

notion of µ-geometric recurrence.

It is well-known that one can use the following criterion to verify ergodicity of a chain. It

is essentially due to Foster [1953].

The chain is ergodic iff a vector µ, with µ_i ≥ 1, i ∈ E, a positive ε and a finite set M exist, such that

(1)    ∑_j P_{ij} µ_j ≤ µ_i − ε, i ∉ M,    and    ∑_j P_{ij} µ_j < ∞, i ∈ M.

In Hordijk [1974, pp.52, 53] an apparently weaker version is used:

(1∗)    ∑_{j∉M} P_{ij} µ_j ≤ µ_i − ε, i ∈ E.

Page 23: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Introduction and Overview 13

However, the two conditions are equivalent. Indeed, if µ is a solution to (1), then µ∗ with

µ∗i =

µi, i 6∈M∑j 6∈M

Pijµj + ε, i ∈M,

is a solution to (1∗). On the other hand, suppose µ∗ is a solution to (1∗), then µ with

µi =

µ∗i , i 6∈Mε

2|M |, i ∈M

satisfies (1) with ε/2 instead of ε. Notice that the newly constructed solutions are positive

and bounded away from 0, so that they can be multiplied with a positive constant to

obtain 1 as a lower bound.

For a subset M of E, let MP be the matrix of taboo probabilities, i.e.

MPij =

Pij , j 6∈M0, j ∈M.

Relation (1∗) is in vector notation

ε · e+ MP µ ≤ µ.

Iteration of this inequality yields for any n ∈ IN,

εn∑t=0

MPte+ MP

n+1µ ≤ µ.

The limit as n→∞ gives∞∑t=0

MPte ≤ µ

ε.

Let T := mint ≥ 1 | X(t) ∈ M be the recurrence time to set M . The event T > t is

equal to X(k) 6∈ M, 1 ≤ k ≤ t. Hence, with IPi and IEi the conditional probability and

expectation operators on the event X(0) = i

IPiT > t =(MP

te)i,

and

IEiT =∞∑t=0

IPiT > t =∞∑t=0

(MP

te)i ≤

µiε.

Consequently, IEiT is finite for any starting state i and the vector µ is an upperbound

for ε · IET . Let 0 denote some fixed state in M . It is easy to show (cf. Hordijk [1974],

Page 24: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

14 Part I, Chapter 1

Lemma 5.10) that in this case also the recurrence time to 0, say T0, has a finite expectation

for any starting state. The conclusion is, that the MC is positive recurrent (cf. Chung

[1967]) and therefore ergodic, since

IEiT0 <∞, i ∈ E. (1.3)

Conversely, if (1.3) holds, then µi = IEiT0 is a solution of (1∗) with ε = 1 and M = 0,as is easily checked. Similarly, µi = IEiT is a solution with ε = 1 and M , and we have

obtained the following result.

Proposition 1.2

i) µi = IEiT is the componentwise smallest solution of (1∗) with ε = 1,

ii) The MC is ergodic iff a solution to (1∗) exists.

In Popov [1977] (see also Isaacson [1979]) the MC was shown to be geometrically ergodic

iff a vector µ, with µi ≥ 1, i ∈ E, a positive ε and a finite set M ⊂ E exist, such that

(2)

∑j

Pijµj ≤ µi(1− ε), i 6∈M∑j

Pijµj <∞, i ∈M.

Analogously to the relation between (1∗) and (1), this condition has the following appar-

ently weaker version,

(2∗)∑j 6∈M

Pijµj ≤ µi(1− ε), i ∈ E.

However, with similar transformations as before we can prove the equivalence of (2) and

(2∗). Moreover, if µ, µ∗ are solutions to (2) and (2∗) respectively, then for any constant

c > 0, cµ and cµ∗ are solutions as well. Consequently, (2) has a solution µ with µi ≥ 1, i ∈E iff (2∗) has a solution µ∗ with µ∗i ≥ 1, i ∈ E.

A MC, for which a solution to (2∗) exists, is said to have the following property:

Definition 1.2 : A MC is µ-geometrically recurrent (µ−GR) for a vector µ with µi ≥ 1, i ∈E, if for some finite set M , and positive ε

‖MP ‖µ ≤ 1− ε.

Clearly, if the chain is µ-geometrically recurrent for the finite set M , then (2∗) holds and

the recurrence time T to M satisfies

IPiT > t =(MP

te)i ≤

(MP

tµ)i ≤ (1− ε)tµi.

So the tail probabilities for the recurrence time converge geometrically fast to 0. In this

case the recurrence time is called exponentially bounded. In the literature (cf. Miller [1966]

Page 25: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Introduction and Overview 15

and section 1.1) such property is called a temporal geometric boundedness condition. Con-

versely, if

IPiT > t ≤ (1− ε)tµi, t ∈ IN0, i ∈ E,

then

Fi(z) :=∞∑t=0

IPiT > tzt+1

converges for |z| < (1− ε)−1 and the following relation holds,

Fi(z) = z + z∑j 6∈M

PijFj(z), i ∈ E.

With z−1 = (1− ε∗) > 1− ε and µ∗i = Fi(z) the following inequalities are satisfied∑j 6∈M

Pijµ∗j ≤ (1− ε∗)µ∗i , i ∈ E,

and the chain is µ∗-geometrically recurrent. This proves the first part of the following

analogy to Proposition 1.2.

Proposition 1.3

i) The MC is µ-geometrically recurrent iff for some finite set M the recurrence time to

M is exponentially bounded.

ii) The MC is geometrically ergodic iff a solution to (2∗) exists.

Recall that Popov’s theorem states the equivalence of geometric ergodicity and the ex-

istence of a µ, with µi ≥ 1 ∀ i ∈ E, that satisfies (2). In turn, the existence of such

µ is equivalent to the existence of a solution µ∗, with µ∗i ≥ 1 for i ∈ E, to (2∗), thus

establishing the second assertion. Actually the construction of the µ-vector in the proof

of the Proposition is in essence similar to Popov’s construction (cf. section 2.3).

An alternative way of showing the same result uses the equivalence of exponentially

bounded recurrence times to a special state and geometric ergodicity, as was proved by

Kendall [1960]. At the end of this section we will show that exponentially bounded recur-

rence times to a finite set M and to a special state 0 ∈M are equivalent. Thus the second

assertion of Proposition 1.3 is implied by the first as well.

Remark 1.2: For further derivations it is relevant to notice that in fact the proof of

Proposition 1.3i) uses neither irreducibility nor aperiodicity of the MC. Hence the result

holds for general MC’s.

In the Key theorem of Part I (section 2.2) we demonstrate that µ-geometric recurrence

implies µ-geometric ergodicity. The converse is only true if we take a different bounding

vector in the recurrence property (cf. Lemma 2.2). For convenience we introduce some

new terminology.

Page 26: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

16 Part I, Chapter 1

Definition 1.3 : The Markov chain is strongly convergent, recurrent if the chain is µ-

geometrically ergodic, recurrent respectively for some bounding vector µ with µi ≥ 1, i ∈E.

The following theorem is an immediate consequence of the Key theorem and Lemma 2.2.

Theorem 1.1

i) The MC is strongly convergent iff it is strongly recurrent.

ii) It is strongly convergent with bounded µ iff it is strongly recurrent with bounded µ.

This theorem provides the answer to the earlier mentioned question, whether a geometri-

cally ergodic chain is strongly convergent. The answer is “yes”, by combination of Propo-

sition 1.3ii) and Theorem 1.1. The following corollary states this conclusion.

Corollary 1.1 The MC is geometrically ergodic iff it is strongly convergent.

Combining Proposition 1.1 and Corollary 1.1 we can say that strong convergence, i.e.

convergence in a weighted supremum norm, unifies the notions geometric ergodicity and

strong ergodicity.

Our Key theorem has more interesting consequences, especially with respect to strong

ergodicity. So let us continue with the overview, and discuss the existing results for strong

ergodicity that are related to this paper.

In Hordijk [1974] a productset P of Markov matrices is analysed in order to obtain existence

theorems for optimal policies in stochastic dynamic programming. As a special case we

may take P = P and apply the results there to one MC. No assumption is made with

respect to aperiodicity or irreducibility. Theorem 11.3 of Hordijk [1974] states that the

Doeblincondition is equivalent to supi IEiT < ∞ with T the recurrence time to a finite

set. It is well-known (cf. Neveu [1965]) that the Doeblincondition together with the

assumption that the chain is aperiodic is equivalent to strong ergodicity. As mentioned

before, µi = IEiT is the minimal solution to (1∗) with ε = 1 finite set M ⊂ E. This leads

to the following result.

Proposition 1.4 The MC is strongly ergodic iff a bounded solution to (1∗) exists.

Note, that if µ is a bounded solution to (1∗), say µi ≤ b, i ∈ E, then µ is also a solution to

(2∗). In fact, ‖MP ‖µ ≤ 1− (ε/b) for the finite set M . So the proposition is a consequence

of our Key theorem as well.

Isaacson & Tweedie [1978] proved that the chain is strongly ergodic iff the solution to (1)

is bounded. This result follows easily, since systems (1) and (1∗) are equivalent. In Huang

and Isaacson [1977] the boundedness of IEiT0 as a function of the starting state was argued

to be a criterion for strong ergodicity of nonstationary MC’s. For stationary MC’s the

result follows from Theorem 11.3 together with Lemma’s 11.5 and 11.6 in Hordijk [1974].

It is also implied by our Key theorem.

Page 27: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Introduction and Overview 17

In practice one is often interested in the stability, i.e. ergodicity, of aperiodic and irre-

ducible chains, since stability assures that the system will return to the “normal” situation

with probability 1. It is important to have information on the distribution of the recovery

time. Quite often the system has a most preferable state; the empty system plays this role

in open networks. We conclude this section with some observations on the relation of the

ergodicity conditions with recurrence times to one state.

Suppose our MC models the system and state zero is our special state. We are interested

in a measure for T0, the recurrence time to state zero. If we have a solution (ε,M, µ) to

(2∗) then by Proposition 1.3 the recurrence time to M is exponentially bounded.

To obtain a similar bound for T0, with 0 ⊂ M , we can show that (2∗) has a solution

too for M = 0, with generally different values for ε and µ. Indeed, with 0Pij = Pij , 0

when j 6= 0 and j = 0 respectively, and MP the taboo probabilities for M , last exit

decomposition on set M − 0 can be applied to prove that

µ∗ :=

∞∑n=0

0Pnµ ≤ cµ,

for a constant c ≥ 1 (cf. the proof of Lemma 2.5). Since

0P µ∗ =

∞∑n=1

0Pnµ = µ∗ − µ ≤

(1− 1

c

)µ∗,

we obtain, that (ε∗,M = 0, µ∗) with ε∗ = 1/c is a solution to (2∗). Moreover, µ∗ is

bounded if µ is. Hence,

IPiT0 > t ≤(

1− 1

c

)tµ∗i , i ∈ E,

and T0 is exponentially bounded.

If µ∗i ≤ c∗, i ∈ E, then

IPiT0 > t ≤(

1− 1

c

)tc∗, i ∈ E.

In this case T0 is uniformly exponentially bounded, and the following proposition is valid.

Proposition 1.5 The MC is geometrically, strongly ergodic iff the recurrence time to a

state is exponentially, uniformly exponentially bounded respectively.

Page 28: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

18 Part I, Chapter 2

CHAPTER TWO

Equivalence of Strong Convergence and Strong Recurrence.

1. Preliminary lemmas on related recurrence conditions.

In this chapter we will derive our main result concerning the equivalence between various

ergodicity and recurrence concepts. For the proof of our Key theorem we need a detailed

study of geometric recurrence, in which a number of related recurrence conditions appear.

Therefore we decided to analyse thoroughly the relationships between these conditions

(cf. Table 1). However, instead of restricting ourselves to irreducible chains, we will relax

our assumption on the Markov chain structure and allow multichain Markov chains. That

means, that besides inessential states, the chain may contain more than one class. This

generalization makes the result more interesting, whereas it does not essentially complicate

the analysis.

We need several concepts from the theory of Markov chains (cf. Chung [1967]) and some

notation. P is again the matrix of transition probabilities. A set of states is called a class,

if from any state within the class all states in that class can be reached, but no other

states. Notice that our definition of “class” differs slightly from Chung’s. For any set

M ⊂ E, F(n)iM denotes the probability that the system, starting in state i, enters set M for

the first time after n steps, and FiM the probability that set M is eventually reached; in

formulaF

(n)iM =

∑m∈M

(MPn−1 · P )im, n = 1, 2, . . .

FiM =

∞∑n=1

F(n)iM ,

where MPn

= (MP )n and MP0

is equal to the identity matrix. For any i, j ∈ E the Cesaro

limit of the sequence Pnij | n ∈ IN0 exists (cf. Chung [1967], Corollary to Theorem 6.4).

The stationary matrix Π is defined as this limit, i.e.

Πij = limN→∞

1

N + 1

N∑n=0

Pnij .

We will call a MC stable, if Π is a stochastic matrix, i.e.∑j∈E Πij = 1. A set B ⊂ E

is called a set of reference states if it contains precisely one state from each class, but no

other states. ν denotes the number of classes.

Page 29: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 19

For easy reference we recapitulate well-known results from MC theory in a lemma. They

summarize results from Theorems 4.4 and 6.3 to 7.4 in Chung [1967] that are useful for

our analysis.

Lemma 2.1

i) For i an essential state∑j

Πij = 1 ⇐⇒ i is positive recurrent.

ii) If j is transient or null recurrent then Πij = 0, ∀ i ∈ E.

iii) Πjj > 0 ⇐⇒ j positive recurrent.

iv) All states not contained in any class are inessential, hence transient.

v) For j positive recurrent Πij = FijΠjj , ∀ i ∈ E.

vi) For C a positive recurrent class

Fij = Fik, ∀ i ∈ E, j, k,∈ CΠik = Πjk, i, j, k ∈ C.

vi) If Π is stochastic, FiB = 1 for all sets B of reference states, and all classes are

positive recurrent.

We already mentioned that we need to study the µ-geometric recurrence (µ−GR) property

more closely. Indeed, µ − GR will be shown to be closely related to the seven recurrence

conditions below. Each of these is of interest, and the use of them will facilitate our further

proofs. Let M ⊂ E be finite, and µ a vector with µi ≥ 1, ∀ i ∈ E.

Definition 2.1: A MC satisfies

- µ−WGR(M), if ∃c > 0, β < 1, such that

‖MPn‖µ ≤ cβn, n ∈ IN.

- µ− R(M), if ∃no ∈ IN, c1 > 1, β < 1, such that

‖MPno‖µ ≤ β, ‖P‖µ ≤ c1.

- µ− BS(M), if ∃c2 > 1, such that

‖∞∑n=0

MPn‖µ ≤ c2.

- µ−GRRS(M), µ−WGRRS(M), µ−RRS(M) and µ−BSRS(M), if there is a set of reference

states B ⊂ M , such that the chain satisfies µ − GR(B), µ − WGR(B), µ − R(B) and

µ− BS(B) respectively.

The letter combinations (W )(G)R, BS and RS stand for (Weak) (Geometric) Recurrence,

Bounded Sum and Reference States. If there is no need to specify set M , it can be dropped

in the abbreviation.

Mark that condition µ−WGR(M) implies that the recurrence times to set M are exponen-

tially bounded. In the case of an irreducible state space, this property was shown to imply

exponentially bounded recurrence times to a state (cf. Proposition 1.5). If the chain has a

Page 30: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

20 Part I, Chapter 2

multichain structure, we get exponentially bounded recurrence times to a set of reference

states, which is implied by µ−WGRRS(M).

In Dekker, Hordijk & Spieksma [1990] (cf. also Dekker [1985]) the generalizations of

these properties are shown to be equivalent in Markov decision chains, under appropriate

compactness and continuity conditions. The equivalence of µ− BSRS(M), µ− RRS(M) and

µ − WGRRS(M) is also proved in Dekker & Hordijk [1989]. For sake of completeness we

will give the arguments here for one Markov chain.

Our concepts of µ − GR, µ − GRRS are in fact new. We introduced it for reasons of

practical interest. Both are by far the easiest to check recurrence conditions of the eight

mentioned, because they only involve the one-step transition matrix. Moreover, they arose

quite naturally in the context of well-known ergodicity criteria, as we already indicated in

section 1.2. In the following lemmas the connection between µ − GR, µ − GRRS and the

other six recurrence conditions will be analysed.

Lemma 2.2

i) µ−WGR(M), µ− R(M) and µ− BS(M) are equivalent.

ii) µ− GR(M) ⇒ µ−WGR(M).

iii) µ−WGR(M) ⇒ µ− GR(M) with µ =∑∞n=0 MP

nµ.

iv) The same relations hold if we take a set of reference states B ⊂M as the taboo set.

Proof: i) (cf. Dekker & Hordijk [1989]). Obviously µ − WGR(M) implies µ − BS(M) and

the first part of µ − R(M). Assume that condition µ − WGR(M) holds for constants c and

β, then1

µi

∑j

Pijµj =1

µi

∑j

MPijµj +1

µi

∑j∈M

Pijµj

≤ c+∑j∈M

µj , ∀ i ∈ E,(2.1)

so that the second part of µ− R(M) is satisfied for constant c1 = c+∑j∈M µj . The proof

is complete if we show

µ− BS(M) ⇒ µ− R(M) ⇒ µ−WGR(M).

Assume that ‖∞∑n=0

MPn‖µ ≤ c < ∞. Let y =

∞∑n=0

MPnµ, then µ ≤ y ≤ cµ, and y =

µ + MP y. Hence, MP y = y − µ ≤ (1 − c−1)y. Choosing β < 1, and no such that

(1− c−1)no · c < β, we obtain that

MPnoµ ≤ MP

noy ≤ (1− c−1)noy ≤ (1− c−1)no · cµ ≤ βµ.

This establishes the first part of condition µ − R(M). Similar inequalities as in (2.1) can

be used to prove the second part.

Page 31: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 21

Assume now condition µ− R(M) for constants no ∈ IN, c1 > 0 and β < 1. For kno ≤ n <

(k + 1)no, k ≥ 0, we have

‖MPn‖µ ≤ ‖MP

kno‖µ · ‖MPn−kno‖µ ≤ βk · sup

1≤l≤no‖P l‖µ ≤ βk · sup

1≤l≤no‖P‖lµ ≤ βkc

no1 .

Let γ = β1/no , then γ < 1 and βk ≤ γn/β. This establishes µ−WGR(M) for the constants

c2 = cno1 β−1 and γ.

ii) Trivial, since ‖MP ‖µ ≤ β < 1 implies that ‖MPn‖µ ≤ βn for n ∈ IN0.

iii) By virtue of i) we can use either condition. Assume µ−BS(M), then there is a constant

c > 1, such that µ :=∑∞n=0 MP

nµ ≤ cµ. Thus µi(1− c−1) ≥ µi−µi =

∑∞n=1

(MP

nµ)i

=

(MP µ)i, so that ‖MP ‖µ ≤ 1− c−1 < 1.

iv) The proof follows from the same arguments as in the previous part.

The only relation between the seven conditions we have to show yet, is the equivalence of

µ − WGR(M) and µ − WGRRS(M). We need some technical lemmas first. Lemma 2.3ii) is

in fact Lemma 3.2 in Dekker & Hordijk [1989] for one MC.

Lemma 2.3 Assume µ−WGR(M).

i) For all i ∈ E and for all ε > 0, a finite set K(ε, i) ⊂ E exists, such that∑j 6∈K(ε,i)

Pnijµj ≤ ε, ∀n ∈ IN.

ii) There is a c′ > 0, such that ‖Pn‖µ ≤ c′, ∀n ∈ IN.

iii) FiM = 1.

Proof: i) Let β < 1 and c > 1 be such that ‖MPn‖µ ≤ cβn. Using last exit decomposition

on set M , we get

Pnij = MP

nij +

n−1∑k=0

∑m∈M

Pn−kim MP

kmj . (2.2)

Let M ′ be any set containing M . Multiply both sides of (2.2) with µj and sum over all

states outside M ′, then∑j 6∈M ′

Pnijµj =

∑j 6∈M ′

MPnijµj +

∑j 6∈M ′

n−1∑k=0

∑m∈M

Pn−kim MP

kmjµj

≤∑j 6∈M ′

MPnijµj +

∑j 6∈M ′

n−1∑k=0

∑m∈M

MPkmjµj . (2.3)

Choose T ≥ 1 with βT (1− β)−1c ·max(∑m∈M

µm, µi) < ε/2 and n ≥ T + 1. Then

∑j 6∈M ′

MPnijµj +

n−1∑k=T

∑m∈M

∑j 6∈M ′

MPkmjµj ≤ βncµi + (βT + · · ·βn−1)c ·

∑m∈M

µm

≤ βT

1− βc ·max(µi,

∑m

µm) <ε

2

Page 32: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

22 Part I, Chapter 2

⇒ (2.3) <ε

2+∑j 6∈M ′

T−1∑k=0

∑m∈M

MPkmjµj , ∀M ′ ⊃M, ∀n ≥ T + 1. (2.4)

There are only finitely many terms in the summation over k and m. For this reason we can

find a set M ′ ⊃M , for which ε bounds both the righthandside of (2.4) and∑j 6∈M ′ P

kijµj ,

for k ≤ T .

ii) Suppose that µ−WGR(M) holds for constants c > 0, β < 1. We use (2.2) to obtain

∑j

Pnijµj =

∑j

MPnijµj +

∑j

n−1∑k=0

∑m∈M

Pn−kim MP

kmjµj

≤∑j

MPnijµj +

∑m∈M

n−1∑k=0

∑j

MPkmjµj

≤ cβn · µi +∑m∈M

n−1∑k=0

cβk · µm

≤ c

1− β

(1 +

∑m∈M

µmµi

)µi ≤ c′µi, n ∈ IN,

if c′ ≥ c(1− β)−1(1 +

∑m∈M µm

).

iii) (cf. Dekker & Hordijk [1989]). By virtue of condition µ− BS(M),

1− FiM = limN→∞

(1−N∑n=1

F(n)iM ) = lim

N→∞

∑j

MPNij

≤ limN→∞

∑j

∑k≥N

MPkijµj = 0.

A noteworthy consequence of the lemma is the following theorem.

Theorem 2.1 If condition µ− WGR(M) holds, the collection of marginal probability dis-

tributions Pni• | n ∈ IN0 of the system at time n, when the starting state is i, is tight

and uniformly integrable with respect to µ, ∀ i ∈ E.

Proof : The assumption of Lemma 2.i) is precisely the definition of uniform integrability

with respect to µ. The tightness is immediately implied, as µi ≥ 1, ∀ i ∈ E.

A well-known implication of the tightness is stability of the MC.

Corollary 2.1 The MC is stable and ‖Π‖µ <∞.

Proof : By Fatou’s lemma,∑j Πij ≤ 1, ∀ i ∈ E, and

∑j

Πijµj ≤ lim infN→∞

1

N + 1

∑j∈E

N∑n=0

Pnijµj ≤ c′µi, (2.5)

Page 33: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 23

for some c′ > 0, where Lemma 2.3ii) is used for the last equality. This establishes ‖Π‖µ <∞. As the collection Pni• | n ∈ IN is uniformly integrable with respect to µ, the set of

all convex combinations of these measures is uniformly integrable. Hence, the collection

1N+1

∑Nn=0 P

ni• | n ∈ IN is uniformly integrable with respect to µ, consequently tight.

Stability immediately follows (cf. also Chung [1967] Theorem 7.3).

As a last prepatory lemma we derive Lemma 2.4.

Lemma 2.4 Let the MC be stable, such that ‖Π‖µ <∞. Moreover ν <∞. Then the set

Πi• | i ∈ E is uniformly integrable with respect to µ.

Proof : ν > 0, as Π is stochastic. Let C1, . . . , Cν be an enumeration of the positive

recurrent classes, and ε > 0. For i, j ∈ Cl, Πik = Πjk, ∀ k ∈ E. As ν < ∞, a finite set

K(ε) can be chosen to satisfy∑

j 6∈K(ε)

Πijµj ≤ ε, ∀ i ∈ν∪l=1Cl. If i ∈ E − ∪

lCl, then

∑j 6∈K(ε)

Πijµj =ν∑l=1

∑j∈Cl6∈K(ε)

FiClΠjjµj ≤ν∑l=1

FiCl · ε = ε.

These preparations are sufficient to prove

Lemma 2.5

i) µ−WGR(M) and µ−WGRRS(M) are equivalent.

ii) µ− GR(M) implies µ− GRRS(M), with µ =∑n BP

nµ, for a set B ⊂M of reference

states. Hence, µ is µ-bounded.

Proof : i) (cf. Dekker, Hordijk and Spieksma [1990]). In fact we will prove that µ−BS(M)

and µ − BSRS(M) are equivalent. To this end we apply last exit decomposition on set

M −B, where for the moment B will be an arbitrary subset of M , i.e.

BPnij = MP

nij +

n−1∑k=0

∑m∈M6∈B

BPn−kim MP

kmj .

Therefore∞∑n=0

(BP

nµ)i

=

∞∑n=0

(MP

nµ)i+∑m∈M6∈B

∞∑n=0

BPnim

∞∑k=0

(MP

kµ)m. (2.6)

As∑n

(BP

nµ)i≥∑n

(MP

n)i, µ − BSRS(M) implies µ − BS(M). Assume condition

µ−BS(M). By virtue of the Corollary to Theorem 2.1 the stationary matrix Π is stochastic,

so that ν > 0.

Let C1, . . . , Cν be the classes in the chain. Lemmas 2.1 and 2.3iii) imply that they are

positive recurrent, and that Cl ∩M 6= ∅, l = 1, . . . , ν. Consequently, M contains a set B

of reference states.

Page 34: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

24 Part I, Chapter 2

Next we show that∑n BP

nim in (2.6) is bounded by cµi, for some constant c, ∀m ∈M−B,

∀ i ∈ E. Apply first entrance decomposition on set M −B.

∞∑n=0

BPnim = BP

0im +

∑m′∈M6∈B

∞∑n=0

(MP

nBP)im′·∞∑k=0

BPkm′m

= δim +∑m′∈M6∈B

Fim′ ·∞∑k=0

BPkm′m ≤ 1 +

∑m′∈M6∈B

∞∑k=0

BPkm′m. (2.7)

Consider the Markov chain with transition matrix P , defined as

Pij =

Pij , i 6∈ B1, i = j ∈ B0, otherwise.

FiB = 1 by Lemma 2.1 for the original MC. Therefore all states i ∈ E, with i 6∈ B, are

transient in the new Markov chain, so that∑n P

nij <∞ if j 6∈ B. But

∑n P

nij =

∑n BP

nij ,

for i, j 6∈ B.

Because m,m′ range over a finite set, c1 := supm∈M 6∈B

∑m′∈M 6∈B

∑kBP

km′m <∞. To complete

the proof, let c2 = ‖∑n MP

n‖µ, then

(2.6) ≤ c2µi +( 1

µi+ c1c2

∑m∈M6∈B

µmµi

)µi ≤ (1 + c2 + c1c2

∑m∈M6∈B

µm)µi.

ii) Use i) and Lemma 2.2ii) to obtain that∑n BP

nµ is µ-bounded. Similar arguments as

in the proof of Proposition 1.5 give the desired result.

This concludes the section, since we have collected enough material to prove the Key

theorem of Part I.

2. The Key theorem and its relation with geometric ergodicity.

This section states and proves the equivalence of strong recurrence and strong convergence.

Moreover, it discusses the extension of Popov’s result to multichain MC’s. Let us first state

the theorem.

Key theorem I For µ a vector on E with µi ≥ 1, ∀ i ∈ E, the two following conditions

are equivalent:

i)

µ− GE

ν <∞

ii)

µ−WGR(M)

P is the transition matrix of an aperiodic MC.

Page 35: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 25

Before proceeding to the proof, we will recapitulate the relations between the various

properties in the following table, including only those that will be frequently used. The

relations that are not recorded, are self-evident from the table.

µ−geometric ergodicity, ν <∞~waperiodicity

+

µ− GR(M) =⇒ µ−WGR(M) ⇐⇒ µ− BS(M) ⇐⇒ µ− R(M)~w ~w ~wµ− GRRS(M)

µ =∑n BP

⇐= µ−WGRRS(M)⇐⇒ µ− BSRS(M)⇐⇒ µ− RRS(M)

Table 1

As the table shows, strong convergence is equal to strong recurrence with aperiodicity.

However, this assertion does not give any information on the relation between the respec-

tive µ-vectors, for which the strong convergence and strong recurrence properties hold.

Mark that µ−geometrically recurrent models are µ−geometrically ergodic for the same

µ-vector. The importance of this property will be discussed in the next chapter.

Proof of Key theorem I : i) ⇒ ii).

In fact we prove µ − BS(M). Notice that under i), Π is stochastic for any starting state

i ∈ E, as

|1−∑j

Πij | = |∑j

(Pkij −Πij

)| ≤

∑j

|P kij −Πij |

≤∑j

|P kij −Πij |µj ≤ cβkµi → 0, k →∞.(2.8)

Also ‖Π‖µ <∞, since∑j

Πijµj ≤∑j

|Πij − Pij |µj +∑j

Pijµj ≤ (cβ + c1)µi, ∀ i ∈ E.

Therefore we can apply the results in Lemmas 2.1, 2.4. Let D be a matrix with elements

Dij =∑∞k=0

(Pkij −Πij

). This sum is absolutely convergent, and

∑j

|Dij |µj ≤∞∑k=0

∑j

|P kij −Πij |µj ≤∞∑k=0

βkµi ≤c

1− βµi.

Hence ‖D‖µ ≤ c/(1− β). D is usually called the deviation matrix (cf. Dekker & Hordijk

[1988] and Chapter 4). Condition i) imply the existence of a finite set B ⊂ E of reference

states. Choose ε > 0. By Lemma 2.4 a set M ⊃ B exists such that∑j 6∈M Πijµj ≤ ε, for

all i ∈ E.

Page 36: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

26 Part I, Chapter 2

Let g and w be vectors on the state space E with gi :=∑j 6∈M Πijµj , and wi :=

∑j 6∈M Dijµj .

Because of the µ-boundedness of the matrices Π and D, these sums converge (absolutely)

and

wi =∑j 6∈M

∞∑k=0

(Pk −Π

)ijµj =

∑j 6∈M

(P

0 −Π)ijµj +

∑j 6∈M

(P∞∑k=0

(Pk −Π)

)ijµj

= µi1i 6∈M − gi + (Pw)i. (2.9)

Define v = w − φ · e, with φ := mini∈M wi, then vi is nonnegative for i ∈ M . It is

straightforward to check that ‖v‖µ <∞ and that v satisfies equation (2.9) as well. By the

specific choice of set M , gi ≤ ε and thus µi − gi ≥ µi − ε ≥ (1 − ε)µi ∀ i ∈ E, as µi ≥ 1.

Together with equality (2.9) this yields

vi ≥ (1− ε)µi + (Pv)i ≥ (1− ε)µi + (MP v)i, ∀ i 6∈M.

Since the inequality holds for i 6∈M , we can multiply both sides with the taboo probability

matrix MPn, while still preserving the inequality. In vector notation

MPnv ≥ MP

n(1− ε)µ+ MP

n+1v.

Summing over the first N values of n, we get

MP v −MPN+1

v ≥ (1− ε)N∑n=1

MPkµ, ∀N ∈ IN,

so thatN∑n=0

MPnµ ≤ µ+

1

1− ε

(MP |v|+ MP

N+1|v|), ∀N ∈ IN. (2.10)

The righthandside of (2.10) has a finite µ-norm, which is bounded uniformly in N ∈ IN,

because ‖v‖µ <∞ and ‖MPN+1‖µ ≤ ‖P

N+1‖µ ≤ ‖PN+1−Π‖µ+‖Π‖µ ≤ βN+1c+‖Π‖µ ≤

c + ‖Π‖µ, N ∈ IN. Consequently, the lefthandside has a finite µ-norm, uniformly in N .

Taking the limit as N tends to infinity, we find that∑n MP

nµ has a finite µ-norm. This

completes the first part of the proof.

ii) ⇒ i).

It is sufficient to prove that ‖Pn−Π‖µ → 0, as n→∞. To see that, use arguments similar

to the proof of Lemma 2.2 i) “µ−R(M)⇒ µ−WGR(M)” and the fact that Pn−Π =

(P−Π

)nfor n ≥ 1.

We start from condition µ−WGRRS(M) for constants c > 0 and β < 1, i.e. ‖BPn‖µ ≤ cβn,

n ∈ IN, with B a set of reference states. Using the bounds from the conditions, we

reduce the proof to an analysis of finite summations. To this end we apply first entrance

decomposition to set B:

Pnijµj = BP

nijµj +

∑l∈Eb∈B

n−1∑k=0

BPn−k−1il PlbP

kbjµj .

For fixed N < n− 1 we have

Page 37: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 27

∑j

|Pnij −Πij |µj ≤∑j

BPnijµj +

∑j

∑l∈Eb∈B

N−1∑k=0

BPn−k−1il PlbP

kbjµj

+∑j

|∑l∈Eb∈B

n−1∑k=N

BPn−k−1il PlbP

kbj −Πij |µj .

(2.11)

For the first two terms in the righthandside of (2.11) suitable bounds can be found directly.

Obviously ∑j

BPnijµj ≤ cβnµj .

Furthermore,Lemma 2.3ii) and the assumptions yield the existence of a c′ > 0, such that∑j

∑l∈Eb∈B

N−1∑k=0

BPn−k−1il PlbP

kbjµj =

∑l∈Eb∈B

N−1∑k=0

BPn−k−1il Plb

∑j

Pkbjµj

≤∑l∈E

N−1∑k=0

BPn−k−1il

∑b∈B

Plb · c′µb

≤∑l∈E

N−1∑k=0

BPn−k−1il c′2µl

≤(βn−1 + · · ·+ βn−N

)c · c′2µi ≤

βn−N

1− βc · c′2µi.

So, a first bound to (2.11) is

(2.11) ≤ βnc(

1 +c′2

βN (1− β)

)µi +

∑j

|∑l∈Eb∈B

n−1∑k=N

BPn−k−1il PlbP

kbj −Πij |µj . (2.12)

Choose a positive ε. By virtue of Corollary 2.1 ‖Π‖µ ≤ c′ <∞ and the results of Theorem

2.1 and Lemma 2.4 are valid. These imply the existence of a finite set D ⊃ B, such that∑j 6∈D

Pkbjµj ≤

ε

4, k ∈ IN, ∀ b ∈ B

∑j 6∈D

Πijµj ≤ε

4, ∀ i ∈ E.

Consequently,∑j 6∈D

|∑l∈Eb∈B

n−1∑k=N

BPn−k−1il PlbP

kbj −Πij |µj ≤

∑l∈Eb∈B

n−1∑k=N

BPn−k−1il Plb

ε

4+ε

4

≤ FiBε

4+ε

4=ε

2.

(2.13)

Page 38: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

28 Part I, Chapter 2

This gives a new bound to (2.12), and leaves us the problem of determining a suitable

bound to ∑j∈D|∑l∈Eb∈B

n−1∑k=N

BPn−k−1il PlbP

kbj −Πij |µj . (2.14)

Because of the convergence of Pk

to Π, for k →∞, N(ε) can be chosen to satisfy

|P kbj −Πbj | <ε

4|D|(maxd∈D

µd)−1, ∀ b ∈ B, ∀ j ∈ D, ∀ k ≥ N(ε). (2.15)

We study expression (2.14) for the separate values of j ∈ D, i.e. expression

(∑l∈Eb∈B

n−1∑k=N

BPn−k−1il PlbP

kbj −Πij

)µj . (2.16)

Assume N(ε) ≤ N ≤ n− 1. We distinguish two different cases.

Case 1: j is recurrent and belongs to the class of b∗ ∈ B.

(2.16) =(∑l∈E

n−1∑k=N

BPn−k−1il Plb∗P

kb∗j − Fib∗Πb∗j

)µj =

(n−1∑k=N

F(n−k)ib∗ P

kb∗j − Fib∗Πb∗j

)µj ,

because the states in B do not communicate. Together with (2.15), Lemma 2.3ii) and the

fact that j ∈ D, we obtain

(2.16) ≤(n−1∑k=N

F(n−k)ib∗

(Πb∗j +

ε

4|D|(maxd∈D

µd)−1)− Fib∗Πb∗j

)µj ≤

ε

4|D|, (2.17)

and

(2.16) ≥(n−1∑k=N

F(n−k)ib∗

(Πb∗j −

ε

4|D|(maxd∈D

µd)−1)− Fib∗Πb∗j

)µj

=((Fib∗ −

∞∑k=n−N+1

F(k)ib∗)·(Πb∗j −

ε

4|D|(maxd∈D

µd)−1)− Fib∗Πb∗j

)µj

≥ −∞∑

k=n−N+1

F(k)ib∗ Πb∗jµj − Fib∗

ε

4|D|(maxd∈D

µd)−1µj

≥ −βn−N

1− βcc′µi ·Πb∗jµj −

ε

4|D|, (2.18)

as evidently F(k)ib∗ is smaller than

∑l BP

k−1il µl · Plb∗µb∗/µl ≤ cβk−1µic

′. Combine (2.17)

and (2.18) to obtain the following bound for the absolute value of expression (2.16)

|(2.16)| ≤ βn−N

1− βc′cµi ·Πb∗jµj +

ε

4|D|. (2.19)

Page 39: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 29

Case 2: j transient.

Clearly Πij = 0, ∀ i ∈ E. Also Pkbj = 0, ∀ b ∈ B, k ∈ IN, if j is transient. So, in this case

expression (2.16) is equal to 0.

Using that ‖Π‖µ ≤ c′, we get an upperbound for (2.14)

(2.14) ≤∑j∈Db∈B

Πbjµjβn−N

1− βc′cµi +

ε

4≤∑b∈B

µbβn−N

1− βc′2cµi +

ε

4. (2.20)

Finally (2.12), (2.13) and (2.20) together imply that∑j

|Pnij −Πij |µj ≤(βnc

[1 +

c′2

(1− β)βN(1 +

∑b∈B

µb)]

+3

4ε)µi, ∀ i ∈ E,n ≥ N(ε).

Obviously N∗(ε) ≥ N(ε) can be chosen, such that the coefficient of µi is smaller than ε

for all n ≥ N∗(ε). This concludes the proof of the second part of the theorem.

As pointed out in section 1.2, the Key theorem implies the following table of equivalences

for irreducible, aperiodic Markov chains.

strong recurrence(2∗) ⇐⇒ (2)~w ~w Popov

strong convergence ⇐⇒ geometric ergodicity

Table 2

In this section we proved the equivalence of strong recurrence and strong convergence for

multichain Markov chains. Moreover, the equivalence “(2) ⇐⇒ (2∗)” is valid, indepen-

dently of the MC structure. For obvious reasons, strong convergence implies geometric

ergodicity together with stability of the MC. So, in order to obtain the analogy of Table 2

for multichain MC’s, we only have to achieve that geometric ergodicity implies (2). Since

geometric ergodicity in itself does not imply stability, we need assume this in addition.

Furthermore, (2) and (2∗) involve finite sets, so that ν < ∞ is a necessary condition as

well.

Most arguments in Popov are valid for the multichain case. As his exposition is rather

succinct, we review his proof more elaborately. For notational convenience we will denote

by x a real variable and by z a complex variable. Let cij > 0 and β < 1 be constants,

such that

|Pnij −Πij | < cijβn, n ∈ IN0, ∀ i, j ∈ E.

The generating function of the marginal probabilities P (z) satisfies the following expression

for z ∈ D0,1 = z ∈ C | |z − 0| < 1

Pij(z) :=∞∑n=0

Pnijz

n =∞∑n=0

(Pnij −Πij

)+

Πij

1− z, i, j ∈ E. (2.21)

Page 40: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

30 Part I, Chapter 2

By the geometric ergodicity assumption expression (2.21) is a meromorphic extension

of P (z) in the disk D0,β−1 with a unique simple pole in z = 1. Denote by Fij(z) the

probability generating function of the recurrence times to state j, if the initial state is i,

i.e.

Fij(z) =∑n≥1

F(n)ij zn, z ∈ D0,1.

It is well-known (cf. Chung [1967], §10) for z ∈ D0,1, that Pij(z) and Fij(z) are related to

each other as follows,

Pij(z) = Fij(z) · Pjj(z), j 6= i

Pii(z) = 1 + Fii(z) · Pii(z).

(2.22)

(2.23)

The stability assumption together with ν < ∞ imply the existence of a finite set B of

reference states. Let b ∈ B, then Πbb > 0. Pbb(x) > 0 for x ∈ (0, 1). So, for i 6= b and

z ∈ [0, 1] ∩ IR we can rewrite Fib(z) as follows

Fib(z) =(1− z)Pib(z)(1− z)Pbb(z)

=(1− z)

∑n

(Pnib −Πib

)zn + Πib

(1− z)∑n

(Pnbb −Πbb

)zn + Πbb

. (2.24)

By assumption (1 − z)Pib(z), (1 − z)Pbb(z) are analytic in z ∈ D0,β−1 . The zeros of the

latter function determine whether expression (2.24) is an analytic continuation of Fib(z)

in a disk D0,R, for some R > 1. For notational simplicity we write fb(z) = (1− z)Pbb(z).

Since fb is analytic in D0,β−1 and fb 6≡ 0, there are only isolated zeros in any closed disk

D0,r, r < β−1 (cf. e.g. Titchmarsh [1939], 2.6). Moreover fb has no zeros for z with |z| = 1.

To see this we use (2.23) for i = b. The Taylor coefficients of Fbb(z) are real, non-negative

and Fbb(1) = 1. Aperiodicity together with the Lemma on p. 29 in Chung [1967] yield,

after some reflection, that |Fbb(z)| < 1 for z ∈ C0,1 \ 1, with C0,1 = z ∈ C | |z− 0| = 1.

Fix z0 ∈ C0,1 \ 1. There is a sequence znn∈IN ⊂ D0,1, such that Fbb(zn) 6= 1 and

limn→∞ zn = z0. As Pbb(z) is analytic in D0,β−1 \ 1, we obtain through (2.23)

Pbb(z0) = limn→∞

Pbb(zn) = limn→∞

1

1− Fbb(zn)=

1

1− Fbb(z0)6= 0.

Combination with the fact that Pbb(z) has only isolated zeros, this gives the existence of

R(b) ∈ (1, β−1) and R(b) < 1, such that D0,R(b) \ D0,R(b) does not contain any zero of fb.

Together with the analyticity of Fib(z) in D0,1 we achieve that Fib(z) can be analytically

continued in D0,R(b), ∀ i 6∈ B.

Similarly, using (2.23) we obtain the analytic continuation of Fbb(z) in D0,R(b) through

Fbb(z) := 1− (1− z)(1− z)Pbb(z)

. (2.25)

Page 41: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 31

Let R < minb∈B R(b). Then ∀ i ∈ E, ∀ b ∈ B, Fib(z) is analytic in D0,R, and has a

unique Taylor expansion in z = 0. So,

Fib(z) =∑n≥1

F(n)ib zn, i ∈ E, b ∈ B, ∀ z ∈ D0,R.

Hence Fib(z) has a radius of convergence strictly larger than 1, uniformly in i ∈ E and

b ∈ B. Since the coefficients are non-negative

F(n)ib ≤ Fib(R) ·R−n, n ∈ IN, i ∈ E, b ∈ B.

The states in B do not communicate, so that

FiB(z)= z∑j 6∈B

PijFjB(z) + zPiB , i 6∈ B (2.26)

FbB(z)= Fbb(z) = z∑j 6=b

PbjFjB(z) + zPbb, b ∈ B. (2.27)

Moreover, FiB(z) =∑b∈B Fib(z) converges in D0,R. Setting µi := FiB(R), for i 6∈ B, and

µi := 1 for i ∈ B, we obtain by virtue of (2.26) and (2.27)

µi = R∑j∈E

Pijµj , i 6∈ B

Fbb(R) = R∑j∈E

Pbjµj , b ∈ B.

As Fbb(R) <∞, b ∈ B, we conclude that (µ, 1− 1R , B) is a solution to system (2). Notice

that µ is bounded below by a positive constant, since FiB = 1 because of the stability

property, and FiB(R) ≥ FiB(1). Finally we emphasize that essential use is made of ν <∞,

to obtain a common convergence radius of Fib(z) strictly exceeding 1, for all i ∈ E and

b ∈ B.

This establishes the following lemma, which is stated in the form of a table.

Lemma 2.6.

strong recurrenceaperiodicity

(2∗)⇐⇒ aperiodicity

(2)~w ~wstrong convergence

ν <∞ ⇐⇒ geometric ergodicitystability, ν <∞

Table 3

In the derivation for multichain MC’s it is crucial, that our definition of geometric er-

godicity requires the existence of one contraction factor for all pairs of states. In general

Page 42: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

32 Part I, Chapter 2

such contraction factor need not exist, so that Kendall’s definition of geometric ergodicity

is less restrictive, even in the case of unichain Markov chains. The following example is

illustrative.

Example: Suppose the state space E consists of the nonnegative integers. The matrix of

transition probabilities is defined by

P00 = 1

Pi0 = 1− Pii =1

i.

0 is the only recurrent class, all other states are transient. The stationary probabilities

are Πij = 1, 0, if j = 0, 6= 0 respectively. It is easy to see that |P kij − Πij | is equal to 0, if

j 6∈ i, 0, and ( i−1i )k otherwise.

As a contraction factor in state i we can choose (i − 1)/i with contraction coefficient 1.

Therefore the model is geometrically ergodic in Kendall’s definition, if the condition of ir-

reducibility is relaxed. However, there is no contraction factor simultaneously for all states.

So the chain is not geometrically ergodic in the sense of (1.1). ææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææææ

3. Strong convergence and recurrence of a uniformizable Markov process.

Consider a MP X(t)t≥0 on a denumerable state space E. We denote the transition

matrix as P (t), with Pij(t) = IP(X(t) = j | X(0) = i). The intensity matrix Q = (qij)i,j∈Eis defined as

qij := limt↓0

(P (t)− I

)ij

t.

Throughout this section we make the following standard assumption.

Assumption 2.1: The MP is standard, conservative, i.e.qij ≥ 0, ∀ j 6= i, ∀ i ∈ E

qi :=∑j 6=i

qij = −qii <∞, ∀ i ∈ E,

and there are no instantaneous states.

It is well-known that the process remains an exponentially (qi) distributed amount of time

in state i, after which it jumps to state j with probability (1− δij)qij/qi. For more details

we refer to Chung [1967].

We will study the MP through the AMC. To this end we need the following assumption.

Assumption 2.2: The MP is uniformizable, i.e. q := supi∈E

qi <∞.

As the AMC of the process we take the MC with transition matrix Ph = I +hQ, for some

h < q−1, as a first order approximation of P (t). It is crucial for our analysis that for

Page 43: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 33

uniformizable MP’s an explicit expression for P (t) as a function of Ph or Q exists. Indeed

Pij(t) =(etQ)ij

=(eth−1(Ph−I)

)ij

=

∞∑n=0

Pnh,ij

(th−1)n

n!e−th

−1

, ∀ i, j ∈ E, (2.28)

where Pnh is the nth iterate of Ph and P

0h = I. For the ergodicity properties we need the

stationary matrix Π, which is defined as

Πij = limt→∞

Pij(t), ∀ i, j ∈ E.

This limit always exists (cf. Chung [1967]) and Π is the stationary matrix of the MP iff it

is the stationary matrix of the AMC (cf. e.g. Ross [1983]).

The following types of ergodicity are studied in this section. Suppose that µ is a vector

with µi ≥ 1, ∀ i ∈ E.

Definition 2.2: The MP satisfies property

– exponential ergodicity, if there are constants cij > 0, β > 0, such that

|Pij(t)−Πij | ≤ cije−βt, t ≥ 0, ∀ i, j ∈ E.

– strong ergodicity, if there are constants c > 0, β > 0, such that∑j∈E|Pij(t)−Πij | ≤ ce−βt, t ≥ 0, ∀ i, j ∈ E.

– µ-exponential ergodicity (µ− EE) if there are constants c > 0, β > 0 such that‖P (t)−Π‖µ ≤ ce−βt, t ≥ 0

‖P (t)‖µ ≤ c, for some t > 0.

The first and second definitions for irreducible MP’s are well-known from the literature.

Indeed, under irreducibility of the process Π has equal rows. Tweedie [1981] proved that an

irreducible MP is exponentially ergodic iff a finite set M ⊂ E, a λ > 0, with λ < qi ∀ i ∈M ,

and a vector y with yi ≥ 0 exist, such that

(3)

infi 6∈M

qi > λ∑k∈E

qikyk ≤ −λyi − 1, ∀ i 6∈M∑k∈E

qikyk <∞, ∀ i ∈M

yi = 0, ∀ i ∈M.

(∗)

It is strongly ergodic iff y is bounded.

Page 44: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

34 Part I, Chapter 2

Several remarks are due. In the first place uniformizability is not required for Tweedie’s

result. Secondly, our presentation of system (3) is slightly different from Tweedie’s. He

formulates it in terms of the transition probabilities of the jump chain. Furthermore we

generalized it, to allow for multichain structures. There is however one essential difference

that concerns inequality (∗). To our opinion the λ in (∗) is omitted in Tweedie’s system

(5).

For the definition of recurrence properties we need the taboo probability matrix MP (t).

Let Si be the stochastic variable that denotes the sojourn time in state i. For taboo set

M the taboo probabilities of the MP are (cf. Theorem II.11.6 from Chung [1967])

MPij(t) =

0, j ∈Mδije

−qit + IPiSi < t,X(s) 6∈M ∀ s ∈ (Si, t), X(t) = j

=∑

n≥m≥0

(Ph,ii)mMP

n−mh,ij · e−th

−1 (th−1)n

n!, i ∈M, j 6∈M

∑n≥0

MPnh,ije

−th−1 (th−1)n

n!, i, j 6∈M.

So MP (t) can be expressed as a function of the taboo pobabilities of the AMC. Notice that

there is a discrepancy between the definitions of the taboo probability matrix of a MC

and a MP. The taboo for the MP is imposed after the first exit from i for states i ∈ M ,

whereas this is not the case for a MC. We get a more consistent expression, when the

taboo probabilities of the MP are rewritten as a function of the taboo probabilities of the

jump chain. Moreover, mark that in the definition of MP (t) the exponential distribution

with rate qi remains the clock in state i, so that the taboo probabilities are independent of

the length h of the time discretization interval. We define the following recurrence notion.

Definition 2.3: A MP satisfies

– µ-weak exponential recurrence (µ−WER), if there are a finite set M ⊂ E and constants

c > 0, β > 0, such that

‖MP (t)‖µ ≤ ce−βt.

Then a MP is strongly convergent (recurrent) if a vector µ, with µi ≥ 1 ∀ i ∈ E, exists,

such that it satisfies µ− EE (µ−WER).

The version of strong recurrence that we present, is difficult to verify in applications, since

it involves all powers of the taboo probability matrix. A generalization of the µ − GR

property is in fact system (3). In the course of this section it will appear that strong

recurrence of the MP, strong recurrence of the AMC and system (3) are equivalent. Thus

both system (3) and condition µ−GR can be used as effective criteria to check exponential

ergodicity, strong recurrence and strong convergence of the MP. The first step of the proof

is contained in the following lemma.

Page 45: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 35

Lemma 2.7 µ − GR for a MC with transition matrix P = (I + hQ) is equivalent to (3).

Furthermore µ is bounded iff y is.

Proof: Suppose that µ − GR holds for the finite set M and β < 1. Then for i 6∈ M ,

(1 − qih) = Ph,ii ≤ µ−1i

∑j 6∈M Pijµj ≤ β, hence qi ≥ (1 − β)/h. Consequently inf

i 6∈Mqi ≥

(1− β)/h > 0. Then (λ,M, y) with0 < λ ≤ 1− β

h

yi =h

1− β(µi − 1), i 6∈M

yi = 0, i ∈M

is a solution to (3). Conversely, if (λ,M, y) is a solution to (3), then the AMC satisfies

µ− GR for µ and β with

β < 1− λh

µi =1− βh

yi + 1, i 6∈M

µi ≥ max1, β−1∑j 6∈M

qijµj, i ∈M.

The proof of the main theorem of this section is straightforward.

Theorem 2.2 Consider a MP and the AMC. The two following equivalences hold.

µ−WER(M) of MP ⇐⇒ µ−WGR(M) of AMCi)

µ− EE of MP ⇐⇒ µ− GE of AMC.ii)

Proof: i) Suppose the MP satisfies µ − WER(M). This means that for some constants

c > 0, β > 0

ce−βtµi ≥∑j∈E

MPij(t)µj =∑j∈E

∑n≥m≥0

(Ph,ii)me−th

−1 (th−1)n

n!MP

n−mh,ij µj ,

∀ i ∈ E∀ t ≥ 0.

(2.29)

Integrate over t ≥ 0, then

c

βµi ≥

∑j∈E

∑n≥m≥0

(Ph,ii)mMP

n−mh,ij µj

∫t≥0

e−th−1 (th−1)n

n!dt, (2.30)

where the reverse of summation and integration order is justified by the Fubini-Tonelli

theorem (cf. Royden [1968]). It is easy to derive that∫t≥0

e−th−1 (th−1)n

n!dt = h, n ∈ IN0. (2.31)

Indeed, partial integration yields for n ∈ IN0

Page 46: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

36 Part I, Chapter 2

∫t≥0

e−th−1 (th−1)n+1

(n+ 1)!dt = −he−th

−1 (th−1)n+1

(n+ 1)!

∣∣∣∞0

+ h

∫t≥0

e−th−1 (th−1)n

n!· 1

hdt

=

∫t≥0

e−th−1 (th−1)n

n!dt.

This establishes (2.31), since∫t≥0

e−th−1

dt = h. Insert this in (2.30) to obtain for any

h < q−1,

c

βµi ≥

∑j∈E

∑n≥m≥0

(Ph,ii)mMP

n−mh,ij µj ≥

∞∑n=0

MPnh,ijµj . (2.32)

Lemma 2.2 asserts that (2.32) implies condition µ−WGR(M).

Conversely, suppose µ−WGR(M) to hold for some h < q−1 and constants c > 0, β < 1, i.e.∑j∈E

MPnh,ijµj ≤ cβnµi, ∀ i ∈ E, ∀n ∈ IN0. (2.33)

Substituting (2.33) in the righthandside of (2.29), we obtain for i ∈ E with qi > 0

∑j∈E

MPij(t)≤∑

n≥m≥0

(Ph,ii

)me−th

−1 (th−1)n

n!cβn−mµi

=

∞∑n=0

e−th−1 (th−1β)n

n!

n∑m=0

(Ph,iiβ

)m· cµi. (2.34)

We need the last summation to converge. This is true if Ph,ii < β. For i 6∈ M the

µ−GR(M) property directly implies that Ph,ii ≤ β. Consider any i ∈M with qi > 0. Then

Ph,ii < 1. Since there are at most finitely many such states i ∈M , a β1 ∈ [β, 1) exists, for

which Ph,ii < β1, ∀ i ∈ E with qi 6= 0. Obviously the µ − GR(M) property remains valid

for this β1. Hence (2.34) holds with β replaced by β1. If for some i ∈M qi = 0, then i is

an absorbing state and MPij(t) = 0, ∀ j ∈ E. Combination with (2.34) yields that

∑j∈E

MPij(t) ≤1

1− Ph,iiβ−11

· e−(1−β1)th−1

.

ii) Suppose that µ−EE holds for constants c > 0, β > 0. First we will show that ‖Q‖µ <∞.

Choose i ∈ E with qi > 0. We derive a lower bound for Pij(t), j 6= i.

Pij(t) ≥t∫

0

qie−qiu · qij

qi· IP(Sj > t− u)du ≥

t∫0

qije−qu · e−q(t−u)du

=

t∫0

e−qtqijdu = te−qt · qij ∀ t > 0.

Page 47: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 37

By assumption t∗ > 0 and c∗ > 0 exist, such that ‖P (t∗)‖µ ≤ c∗. Hence with c′ := c∗eqt∗/t∗∑

j 6=i

qijµj ≤∑j 6=i

Pij(t∗)µj · eqt

∗ 1

t∗≤ c′µi,

so that∑j |qij |µj ≤ (c′ + q)µi, ∀ i ∈ E with qi > 0. If qi = 0, the inequality above holds

trivially, since qij = 0, ∀ j ∈ E. Consequently ‖Q‖µ ≤ c, for c = c′ + q.

For the proof of µ−GE it is sufficient to prove that no ∈ IN and ho < q−1 exist, such that

‖Pnoho −Π‖µ ≤ ε, (2.35)

for some ε ∈ (0, 1) (cf. the proof of the Key theorem in section 2). Fix ε ∈ (0, 1) and tosuch that ce−βto ≤ ε/2, then

‖P (to)−Π‖µ ≤ε

2. (2.36)

The proof is complete if we establish the existence of an No ∈ IN such that

‖P (to)− Pn

ton−1‖µ = ‖etoQ −(I +

tonQ)n‖µ ≤ ε

2, ∀n ≥ No. (2.37)

Indeed, then (2.37) holds for no ≥ No, with tono

< q−1. Combination with (2.36) yields

(2.35) for ho = tono

. We study |etQ − (I + tnQ)n|ij . Denote (Qm)ij as qmij , then

|etQ − (I +t

nQ)n|ij ≤

∑m≥n+1

tm|qmij |m!

+t2|q2

ij |2!

(1− (1− 1

n ))

+ · · ·

+tn|qnij |n!

(1− (1− 1

n )(1− 2n ) · · · (1− n−1

n ))

Hence,∑j∈E|etQ − (I +

t

nQ)n|ijµj ≤

∑m≥n+1

tm

m!

∑j∈E|qmij |µj +

t2

2!

∑j∈E|q2ij |µj

(1− (1− 1

n ))

+ · · ·

+tn

n!

∑j∈E|qnij |µj

(1− (1− 1

n )(1− 2n ) · · · (1− n−1

n ))

≤ ∑m≥n+1

tm

m!cm +

t2

2!c2(1− (1− 1

n ))

+ · · ·

+tn

n!cn(1− (1− 1

n ) · · · (1− n−1n ))µi

= |etc − (1 +t

nc)n|, (2.38)

where we use that ‖Qn‖µ ≤ ‖Q‖nµ ≤ cn, n ∈ IN0, to derive the second inequality. As

limn→∞(1 + tnc)

n = etc, ∀ t ∈ IR, an No ∈ IN exists, such that |etoc − (1 + ton c)

n| < ε2 ,

∀n ≥ No. This establishes (2.35).

Page 48: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

38 Part I, Chapter 2

Since also ‖Pho‖µ = ‖1 + hoQ‖µ ≤ 1 + hoc, µ− GE holds for the AMC with h = ho.

Conversely, suppose that µ− GE holds for constants c > 0, β < 1. Then

∑j∈E|Pij(t)−Πij |µj =

∑j∈E|∞∑n=0

Pnh,ije

−th−1 (th−1)n

n!−Πij |µj

=∑j∈E|∞∑n=0

(Pnh,ij −Πij

)e−th

−1 (th−1)n

n!|µj

≤∞∑n=0

cβne−th−1 (th−1)n

n!µi ≤ c · e−(1−β)th−1

,

so that ‖P (t)−Π‖µ ≤ ce−λt, for λ = (1−β)h−1. Condition µ−GE implies that ‖Π‖µ <∞.

Hence ‖P (t)‖µ <∞, ∀ t ≥ 0. This concludes the proof of ii).

For other recurrence conditions in section 2 similar relations between the MP and the

AMC can be derived. We did not succeed in finding direct arguments for the equivalence

of exponential ergodicity of the MP and geometric ergodicity of the AMC. However, since

µ−GR is equivalent to geometric ergodicity by virtue of the results in the previous section,

Lemma 2.7 and Tweedie’s result can be applied to yield equivalence of geometric ergodicity

of the AMC and exponential ergodicity of the MP.

The last theorem of the section summarizes all equivalence results, hitherto obtained for

the various recurrence and ergodicity conditions we study for MP’s. After the statement

of the theorem we will briefly comment on how the proof by Tweedie can be generalized

to hold for multichain processes.

Let µ be a vector with µi ≥ 1, ∀ i ∈ E.

Theorem 2.3 The following sets of conditions are equivalent.

i) exponential ergodicity of the MP, ν <∞, stability.

ii) µ− EE of MP, ν <∞.

iii) µ−WER(M) of MP.

iv) geometric ergodicity of the AMC, ν <∞, stability.

v) µ− GE of the AMC, ν <∞.

vi) µ−WGR(M) of the AMC.

Proof: The Key theorem together with Lemma 2.7 establish the equivalence of ii), iii),

iv), v) and vi). Similar arguments as for (2.8) apply to show that by ii)∑j Πij = 1,

∀ i ∈ E, hence the MP is stable. As a direct consequence we obtain that ii) implies i).

The condition ν < ∞ comes in, since the recurrence conditions used, as well as system

(3), involve finite sets.

So, for the completion of the proof of the theorem we only need construct a solution to

system (3), if i) is assumed. This solution is related to the recurrence times to a finite set,

Page 49: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Equivalence of Strong Convergence and Strong Recurrence 39

similarly to the constructed solution of (2) under the assumption of geometric ergodicity.

Clearly, Tweedie’s derivation can be generalized to hold for multichain MP’s, but it uses

deep results by Kingman [1964] to prove necessity and sufficiency of his criterion for

exponential ergodicity. Since we only have to prove necessity, we can use more direct

arguments to show, that the “minimal” solution to (3) is finite. The further derivation

by Tweedie can be literally copied. For the validity of the arguments no assumption on

uniformizability is necessary.

We will use results for MC’s to obtain the similar result for the MP. So, let us consider

a MP satisfying i). The skeleton chain X(nh)n∈IN0is a MC with transition matrix

P (h) and it satisfies iv) ∀h > 0. Hence there is a solution (µ, ε,M) to system (2),

and by virtue of Proposition 1.3 and Remark 1.2 the recurrence time T d to set M has

exponentially bounded tails, i.e.

IPiT d ≥ n ≤ βnµi, (2.39)

for some β < 1 and vector µ on E. We define the recurrence time T c to set M in the MP

as

T c := inft>0Si < t,X(s) 6∈M ∀ s ∈ (Si, t), X(t) ∈M.

This definition is different from the one used by Tweedie. However, to our opinion he uses

in fact our definition for his derivations.

The skeleton chain only observes the state of the system at discrete time points nhn, so it

records a visit to set M in an interval (0, t] later than the MP. Hence, for t ∈((n−1)h, nh

],

n ≥ 1,

IPiT c ≥ t ≤ IPiT d ≥ n.Combination with (2.39) yields for λ > 0 with β ≤ e−λh, that

IPiT c > t ≤ βnµi ≤ ci · e−λt, (2.40)

for ci = eλhµi. This implies for λ′ < λ that

yi :=∑j 6∈M

∫t≥0

IPiT c > t,X(t) = jeλ′tdt ≤

∫t≥0

IPiT c > teλ′tdt <∞.

Define yi = 0 for i ∈M . If we show that infi6∈M qi ≥ λ′, the arguments from the proof of

Theorem 1 in Tweedie [1981] can be copied to show that (y, λ∗,M) is a solution to system

(3), for any λ∗ < λ′. Let i 6∈M . Then

∞ > yi ≥∫t≥0

IPiT c > t,X(t) = ieλ′tdt

≥∫t≥0

IPiSi > teλ′tdt =

∫t≥0

e(λ′−qi)tdt.

The last expression converges iff λ′ < qi, ∀ i 6∈M . Hence infi 6∈M qi ≥ λ′ > λ∗.

We conclude that strong convergence and recurrence are equivalent for uniformizable MP’s.

Page 50: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

40 Part I, Chapter 3

CHAPTER THREE

Applications.

1. Special theorems.

In this chapter we first derive several theorems that especially apply to the type of models

we will study. After that we analyse the one-dimensional random walk, the ALOHA-type

system and the coupled processors model. The results derived in this section are also valid

for the two centre open Jackson network with state dependent service rates and the K

competing queues model with a fixed control rule. Since these are special cases of the

corresponding models with control of the service rates and the queue selected for service

respectively, we refer to Chapter 9 for a detailed analysis of these models.

For all models we will show the µ−GR property by the construction of a suitable µ-vector.

An important aspect of the results in the previous chapter is that µ− GR implies µ− GE

(or µ−EE if the approximating chain of a MP is studied) for the same µ-vector. For that

reason µ− GR implies the expected marginal rewards to converge at a geometric rate for

all µ-bounded reward vectors. Therefore it is preferable to construct µ in such a way that

it bounds a large class of vectors.

As was argued in the Introduction, the connection between spatial and temporal geometric

boundedness for geometrically ergodic models involves a µ-vector that is an exponential

function of the states. For one-dimensional models it will be shown to have the following

simple structure

µi = (1 + x)i, for some x > 0, i ∈ E. (3.1)

Suppose that the state space E consists of pairs (i, j) | i, j ∈ IN0. Due to the fact

that the boundary of E contains infinitely many states the structure of suitable bounding

vectors is more complicated, i.e.

µ(i,j) = ci∏

k=1

(1 + xk)

j∏l=1

(1 + yl)

xkk∈IN, yll∈IN non-decreasing

supk∈IN

xk, supl∈IN

yl <∞

infk≥k∗

xk, infl≥l∗

yl > 0 for some k∗, l∗ ∈ IN,

(3.2)

for (i, j) ∈ E = (k, l) | k, l ∈ IN0. The empty product∏i−1k=i · · · is set equal to 1.

Page 51: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 41

For the K competing queues model however, the following simple structure is effective

µ(i1,...,iK) =

K∏k=1

(1 + xk)ik , for x1, . . . , xK > 0, (3.3)

where E = (i1, . . . , iK) | i1, . . . , iK ∈ IN0.

To avoid overburdened notation we derive the results for MC’s with a two-dimensional

state space. The obvious generalization to K-dimensional MC’s can similarly be shown to

hold. So, let X(∞) = (X1(∞), X2(∞)) denote the state under the stationary distribution.

Proposition 3.1 Consider a µ − GE MC, with µ a vector of structure (3.2). There

are constants so1, so2, c(i,j) > 0, β < 1, such that for all initial states (i, j) ∈ E and all

s = (s1, s2) ∈ C2 with |sk| < s0k, k = 1, 2,

|IE(i,j)esX(n)T − IE(i,j)e

sX(∞)T | ≤ c(i,j)βn, n ∈ IN0.

Moreover, if the MC is the approximating chain of a uniformizable MP, then there are

constants c∗(i,j), λ > 0, such that

|IE(i,j)esX(t)T − IE(i,j)e

sX(∞)T | ≤ c∗(i,j)e−λt, t ≥ 0.

Proof: Choose so1, so2, such that es

o1 < 1 + infk≥k∗ xk, es

o2 < 1 + inf l≥l∗ yl. Let c′ :=

maxk<k∗,l<l∗ µ−1(k,l)e

so(k,l)T . Then |eso(i,j)T | ≤ c′µ(i,j). Consequently, for s ∈ C2 with

|sk| < sok, k = 1, 2

|IE(i,j)esX(n)T − IE(i,j)e

sX(∞)T | = |∑(k,l)

Pn

(i,j)(k,l)es(k,l)T −

∑(k,l)

Π(i,j)(k,l)es(k,l)T |

≤∑(k,l)

|Pn(i,j)(k,l) −Π(i,j)(k,l)|eso(k,l)T ≤ c′βnµ(i,j).

Choose c(i,j) = c′µ(i,j). Similar arguments apply for the MP, since by virtue of Theorem

2.2 µ− GE of the aproximating chain implies µ− EE of the MP.

The next theorem deals with criteria that invalidate the strong ergodicity property.

Proposition 3.2 A MC is not strongly ergodic if one of the following conditions holds.

i) ∃n∗ ∈ IN, such that

P(i,j)(k,l) > 0 ⇒ k + l ≥ i+ j − n∗, ∀ (i, j), (k, l) ∈ E.Hence, the stochastic process that records e.g. the total number of customers in a

queueing system satisfies an n∗-skip-free property to the left.

ii) The MC is a one-dimensional random walk on IN0 as in Miller [1966] with ν <∞.

So, there is a sequence ckk∈Z with∑k∈Z ck = 1, and ∃n∗ ∈ IN, such that Pij = cj−i

for i, j > n∗. For i ≤ n∗, Pij is arbitrary but such that∑j∈IN0

Pij = 1. For i > n∗∑j≤n∗ Pij =

∑l≤n∗−i cl.

Page 52: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

42 Part I, Chapter 3

If the MC is the approximating chain of a uniformizable MP, such that it satisfies one of

the conditions i) or ii), the MP is not strongly ergodic.

Proof: i) Recall that strong ergodicity is the same as e-geometric ergodicity. For that

reason we invoke the Key theorem and Lemma 2.2 to obtain that a strongly ergodic chain

satisfies property e − R, i.e. there are n0 ∈ IN, β < 1 and a finite set M ⊂ E, such that

‖MPn0‖e ≤ β.

Choose any finite set M and let m := maxk + l | (k, l) ∈M. Then for any power no of

the taboo probability matrix MP , the taboo probabilities for initial states (i, j) ∈ E with

i+ j > m+ non∗ are not affected by the taboo set. That means that states (i, j) can be

found for which (MP

noe)i

=∑(k,l)

MPno(i,j)(k,l) =

∑(k,l)

Pno(i,j)(k,l) = 1.

Hence, ‖MPno‖e = 1. Since M ⊂ E and no ∈ IN are arbitrarily chosen, the chain cannot

satisfy e− R.

ii) Similar to the proof of i) we have to disprove that the MC has property e − R. So,

let us suppose that it holds for some finite set M ⊂ E, β < 1 and no ∈ IN. We set

M = 0, . . . ,m0 − 1, after possibly renumbering the states. Let β0 ∈ (β, 1) and choose

k0 ∈ Z such that∑k≥k0 ck ≥ β

1/n0

0 . Using induction we show that∑j∈E

MPnij > β

n/n0

0 , i > maxn∗,m0+ n|k0|, n ≤ n0. (3.4)

Indeed, for i > maxn∗,m0+ |k0|,∑j∈E

MPij ≥∑

j≥i−|k0|MPij ≥

∑k≥k0

ck ≥ β1/n0

0 .

Hence, for i > maxn∗,m0+ 2|k0|,∑j∈E

MP2ij =

∑j,l∈E

MPil ·MPlj ≥∑

l≥i−|k0|MPil ·

∑j∈E

MPlj ≥ β2/n0o ,

since l > i − |k0| > maxn∗,m0 + |k0|. Suppose that (3.4) holds upto and including n,

for some n ≤ n0 − 1. For i > maxn∗,m0+ (n+ 1)|k0|∑j∈E

MPn+1ij =

∑l∈E

MPil ·∑j∈E

MPnlj

≥∑

l≥i−|k0|MPil ·

∑j∈E

MPnlj ≥

∑l≥i−|k0|

MPij · βn/n0

0 ≥ βn+1/n0o ,

since l > maxn∗,m0 + n|k0|. Thus,∑j∈E MP

n0

ij ≥ β0, so that ‖MPn0‖e ≥ β0 > β, for

any i > maxn∗,m0+ n0|k0|. This contradicts our original assumption.

Page 53: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 43

iii) By Theorem 2.2 a uniformizable MP and its approximating MC are simultaneously

strongly ergodic. Apply the foregoing results.

2. The one-dimensional random walk.

Consider the generalized random walk on the state space E = IN0, as described in Miller

[1966]. We recall the transition probabilities of this process from the formulation of Propo-

sition 3.2.

So, let ckk∈Z, such that∑k∈Z ck = 1. The transition probabilities are assumed to satisfy

the following set of equations for some n∗ ∈ IN0.∑j

Pij = 1 , i ≤ n∗

Pij = cj−i, i, j > n∗∑j≤n∗

Pij =∑

l≤n∗−i

cl i > n∗.

This description is easily seen to include the embedded GI/M/s-queue on the epochs of

an arrival, the embedded M/G/s-queue on the instants of a service-completion, and the

time-discretized M/M/s-queue.

Obviously aperiodicity is necessary for geometric ergodicity. To this end the following

condition will be assumed.

Aperiodicity assumption: The set of values of k ∈ Z for which ck > 0 does not belong to

the set of multiples of a fixed integer greater than 1.

Stability and spatial geometric boundedness for the process under consideration are char-

acterized as follows.

Stability condition:∑k∈Z

kck < 0.

Spatial geometric boundedness condition: There is an x∗ > 0 such that∑j∈IN0

Pij(1 + x∗)j <∞, i ≤ n∗

∑k≥1

ck(1 + x∗)k <∞.

Notice that irreducibility of the state space does not follow from the conditions and the

only conclusion that can be drawn, is that ν ≤ n∗. Then the characterization of geometric

ergodicity from Miller [1966] is stated in the following proposition.

Proposition 3.3 Under the aperiodicity assumption and irreducibility of the state space

it is necessary and sufficient for ergodicity and geometric ergodicity to require stability

and spatial geometric boundedness.

Page 54: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

44 Part I, Chapter 3

We point out, that Miller also uses a condition that the probability distribution be two-

sided. Since we only consider the stable case, this assumption is not necessary.

In this section we will prove µ − GR for µ a vector with structure (3.1). To this end we

have to find a finite set M ⊂ E, β < 1 and x > 0, such that

∑j 6∈M

Pijµj

µi=∑j 6∈M

Pij(1 + x)j−i ≤ β. (3.5)

Clearly, this expression converges for x < x∗. Since there is no specific information on

the behaviour of the process in states i ≤ n∗, we include these in the finite set M . So,

consider any finite set M ⊃ 0, . . . , n∗. For i > n∗ we obtain∑j 6∈M

Pij(1 + x)j−i≤∑j>n∗

Pij(1 + x)j−i =∑

k>n∗−i

ck(1 + x)k

≤∑k∈Z

ck(1 + x)k (3.6)

Denote the last expression as f(x). Then f(1) = 1 and it converges for x ∈ [0, x∗]. By e.g.

dominated convergence we obtain the existence of the derivative f ′ of f for all x ∈ (0, x∗),

as well as the right derivative, called f ′(0) for convenience, of f in x = 0, i.e.

f ′(x) =∑k

k(1 + x)k−1ck.

Then f ′ is continuous to the right in x = 0. Moreover, by assumption f ′(0) =∑k kck < 0.

Using a first order Taylor expansion we obtain the existence of xo ≤ x∗, such that f(x) <

f(0) = 1, for x ∈ (0, xo). Choose any such x and set β = f(x). Combining this with (3.6),

we conclude that (3.5) is satisfied for this x and β and any finite set M ⊃ 0, . . . , n∗, if

i > n∗.

For i ≤ n∗,∑j>n∗ Pij(1+x)j−i converges by assumption. As n∗ is finite, M can be chosen

to satisfy∑j 6∈M Pij(1 + x)j−i ≤ β, ∀ i ≤ n∗. Notice that no assumption concerning

irreducibility is necessary in this derivation and we believe that necessity of the spatial

geometric boundedness condition can be shown for this case as well. However, it is no

restriction for most queueing applications. Combination with Corollary 2.1 and Lemma

2.6 completes the proof of the following theorem.

Theorem 3.1 Suppose that the aperiodicity condition holds and the state space is irre-

ducible. Spatial geometric boundedness and stability are necessary and sufficient for the

one-dimensional random walk to satisfy µ−GR for µ a vector with stucture (3.1). Hence it

is µ−GE and the assertion of Proposition 3.1 applies. By Proposition 3.2 it is not strongly

ergodic.

Before ending this section, we discuss some special cases of the random walk. Let us

consider the AMC as a time-discretized version of the M/M/s-queue with arrival rate λ

Page 55: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 45

and service rate ν. For h < (λ + sν)−1 the length of the discretization interval, the

discretized process has the following transition probabilities

Pi i+1= λh

Pi i−1=

iνh, i ≤ s

sνh, i ≥ sPi i = 1− Pi i+1 − Pi i−1.

It is a random walk process for c−1 = sνh, c1 = λh, c0 = 1 − c1 − c−1 and n∗ = s − 1.

As can be expected, the stability condition is equivalent to λ < sν. Moreover, the spatial

geometric boundedness condition is automatically satisfied, since the support of ckk is

bounded and the state space is irreducible. Hence, we conclude

Proposition 3.4 The time-discretized M/M/s-queue is µ− GR for µ a vector of structure

(3.1) iff it is ergodic. Hence it is µ − GE, the continous time process is µ − EE and the

conclusions of Propositions 3.1, 3.2 apply.

An analysis of the embedded GI/M/s-queue on the instants of an arrival can be found in

Miller [1966]. Since in this case ck = 0, for k ≥ 2, ergodicity is necessary and sufficient for

µ− GR with µ a vector of structure (3.1).

The embedded M/G/s-queue on the instants of a service completion is studied for s = 1

in Kendall [1960]. It turns out that convergence of the Laplace-Stieltjes transform of the

service time distribution in a neighbourhood of 0 together with ergodicity are necessary

and sufficient for strong recurrence. Again the conclusions of Theorem 3.1 apply.

For the system with s > 1 identical servers, we conjecture that convergence of the Laplace-

Stieltjes transform of the service time distribution in a neighbourhood of 0 is similarly

equivalent to spatial geometric boundedness. This is motivated by the first condition in

the spatial geometric boundedness condition for the random walk with n∗ = s − 1. As

a matter of fact, for i = 1 this condition requires convergence of the Laplace-Stieltjes

transform of the service time distribution. Conversely, for larger states the distribution

of the time between two successive service completions has a converging Laplace-Stieltjes

transform in a neighbourhood of 0, as it is a finite convolution of service time distributions.

Since the arrivals are Poisson distributed, the spatial geometric boundedness seems to

follow.

For s = 1 the first and last models are also special cases of different versions of the K

competing queues model with K = 1, which will be extensively discussed in Chapter 9.

Page 56: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

46 Part I, Chapter 3

3. Buffered, asymmetric ALOHA-type system with two queues.

We adopt this model from Szpankowski [1988]. He derives conditions for the ergodicity of

the ALOHA-type system with r queues. A complete characterization of ergodicity through

the first moments of the one-step transition probabilities only exists for the model with

two queues.

Consider a time-slotted system with two independent queues that want to have access to

the same server, e.g. a broadcast channel. In one time-slot Ak = i packets of fixed length

arrive at queue k with probability pki , i ∈ IN0, k = 1, 2. At the beginning of a time-slot

queue k transmits a packet with probability rk > 0, k = 1, 2. The transmission succeeds

if precisely one queue transmits a packet; if both queues do so, the packets are sent back

to their respective queues.

Let λk = IEAk, k = 1, 2. Szpankowski [1988] derived the following necessary and sufficient

conditions for ergodicity, thereby using results from Malyshev [1972]:

Ergodicity condition:

i) If r1 + r2 ≤ 1, then

(1− r1)(λ1 − r1(1− r2)

)+ r1

(λ2 − r2(1− r1)

)< 0 (A)

r2

(λ1 − r1(1− r2)

)+ (1− r2)

(λ2 − r2(1− r1)

)< 0 (B)

ii) If r1 + r2 > 1 at least one of the inequalities (A), (B) holds.

The system cannot be ergodic if the effective departure rate is smaller than the arrival

rate. Since rk is an upper bound for the departure rate at centre k, necessarily λk < rk,

k = 1, 2. For convenience we will state this in a lemma and give a formal proof.

Lemma 3.1 The ergodicity conditions imply that λk < rk, k = 1, 2.

Proof: Rewrite the lefthandside of (A), i.e.

(1− r1)(λ1 − r1(1− r2)) + r1(λ2 − r2(1− r1))

= (1− r1)(λ1 − r1) + r1(λ2 − r2) + r1r2

= (1− r1)(λ1 − r1) + r1λ2.(3.7)

Hence,(A) implies that (1 − r1)(λ1 − r1) + r1λ2 < 0, so that r1 > λ1. Similarly, r2 > λ2

follows from (B). This proves the assertion if r1 + r2 ≤ 1. Assume r1 + r2 > 1 and (A), so

that λ1 < r1. Consider the second expression in (3.7). Since λ1 − r1 < 0 and 1− r1 < r2,

(1− r1)(λ1 − r1) > r2(λ1 − r1).

Suppose that r2 ≤ λ2. Together with r1 > 1 − r2 this implies that r1(λ2 − r2) ≥ (1 −r2)(λ2 − r2). Combination of both bounds yields (1− r1)(λ1 − r1) + r1(λ2 − r2) + r1r2 ≥r2λ1 + (1− r2)(λ2 − r2) ≥ 0. By (3.7) this contradicts (A). The derivation in the case of

(B) is analogous. This proves the result for r1 + r2 > 1.

Page 57: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 47

Throughout this section we assume the ergodicity condition. Furthermore, to prove strong

recurrence we need a condition on the higher moments of the arrival distributions.

Spatial geometric boundedness condition: ∃ε > 0, such that IEeεAk <∞, k = 1, 2.

For this system, state (i, j) corresponds to i packets in queue 1 and j in queue 2. Then

the transition probabilities are as follows:

P(i,j)(i−1,j+l) = p10p

2l r1δ(i)

(1− r2δ(j)

), l ≥ 0

P(i,j)(i+k,j−1) =p1kp

20r2δ(j)

(1− r1δ(i)

), k ≥ 0

P(i,j)(i+k,j+l) = p1kp

2l

1− r1δ(i)

(1− r2δ(j)

)− r2δ(j)

(1− r1δ(i)

)+ p1

k+1p2l r1δ(i)

(1− r2δ(j)

)+ p1

kp2l+1r2δ(j)

(1− r1δ(i)

), k, l ≥ 0,

where δ(i) = 0, if i = 0, and δ(i) = 1 if i > 0. Aperiodicity of the MC immediately follows.

Indeed, P(0,0)(0,0) = p10p

20 > 0 because λk < rk ≤ 1, k = 1, 2. As µ − GE is shown by the

verification of µ− GR, we have to find a vector µ, a finite set M and a β < 1, such that

∑(k,l)6∈M

P(i,j)(k,l)µ(k,l)

µ(i,j)≤ β, ∀ (i, j) ∈ E. (3.8)

For large initial states the taboo probabilities are not affected by the taboo set, so that

the summation in (3.8) is in fact over all states. Inserting the transition probabilities and

a µ-vector of structure (3.2) in (3.8), we thus obtain for large initial states

∑(k,l)

P(i,j)(k,l)µ(k,l)

µ(i,j)≤ β < 1 (3.9)

⇐⇒∑l≥0

p10p

2l r1δ(i)

(1− r2δ(j)

) j+l∏m=j+1

(1 + ym) · 1

1 + xi

+∑k≥0

p1kp

20r2δ(j)

(1− r1δ(i)

) i+k∏n=i+1

(1 + xn) · 1

1 + yj

+∑k,l≥0

p1kp

2l

[1− r1δ(i)

(1− r2δ(j)

)− r2δ(j)

(1− r1δ(i)

)]+ p1

k+1p2l r1δ(i)

(1− r2δ(j)

)+ p1

kp2l+1r2δ(j)

(1− r1δ(i)

) i+k∏n=i+1

(1 + xn)

j+l∏m=j+1

(1 + ym) ≤ β, (3.10)

where for ease of notation we write x0 = y0 = 0. Combining the terms that contain p1k+1

with the first summation in inequality (3.10), and similarly for the terms containing p2l+1

and the second summation, we obtain

Page 58: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

48 Part I, Chapter 3

(3.10)⇐⇒∑k,l≥0

p1kp

2l

i+k∏n=i+1

(1 + xn)

j+l∏m=j+1

(1 + ym)r1δ(i)

(1− r2δ(j)

) 1

1 + xi+

r2δ(j)(1− r1δ(i)

) 1

1 + yj+[1− r1δ(i)

(1− r2δ(j)

)− r2δ(j)

(1− r1δ(i)

)]≤β

⇐⇒ v(i, j) :=∑k,l≥0

p1kp

2l

i+k∏n=i+1

(1 + xn)

j+l∏m=j+1

(1 + ym)

1− r1δ(i)(1− r2δ(j)

) xi+k1 + xi+k

−r2δ(j)(1− r1δ(i)

) yj+l1 + yj+l

≤ β. (3.11)

By the spatial geometric boundedness condition, (3.11) converges for suitably bounded

sequences xkk∈IN, yll∈IN. Since expression (3.11) is intractable, we start analysing

v(i, j) for sequences xkk, yll that are equal to constants x and y respectively. In that

case v(i, j) only depends on i, j, as they are positive or not. So it has four different forms.

The first one corresponds to i, j = 0, but that is a finite number of states, and as such

need not be taken into account. The other three forms are

f(x, y)=∑k,l≥0

p1kp

2l (1 + x)k(1 + y)l

(1− r1(1− r2)

x

1 + x− r2(1− r1)

y

1 + y

)f1(x, y)=

∑k,l≥0

p1kp

2l (1 + x)k(1 + y)l

(1− r1

x

1 + x

)f2(x, y)=

∑k,l≥0

p1kp

2l (1 + x)k(1 + y)l

(1− r2

y

1 + y

),

which correspond to the states where both queues contain packets or only the first and

second one respectively.

It is possible to choose constant sequences xkk, yll only if there is a solution x, y > 0,

for which f(x, y), fk(x, y), k = 1, 2, are simultaneously strictly smaller than 1. Clearly, the

three functions are equal to 1 in (0, 0). By assumption they are analytic for small values

of x, y. Hence, we can study the behaviour of these functions around (0, 0) by considering

a partial Taylor expansion. Let ∇f(x, y) denote the gradient(∂∂xf(x, y), ∂∂yf(x, y)

)of f

in (x, y). Then

f(x, y) = f(0, 0) + (x, y)∇f(θx, θy)T = 1 + (x, y)∇f(θx, θy)T , (3.12)

for some θ ∈ [0, 1]. Choose any ε > 0, and consider the convex cone

Cε := (x, y) | (x, y)

‖(x, y)‖· ∇f(0, 0)T .

‖∇f(0, 0)‖< −ε,

where ‖ · ‖ denotes a norm in IR2. Then Cε ⊂ (x, y) | (x, y)∇f(0, 0)T < 0. By the

continuity of∇f in a small disk around (0, 0), a δε>0 exists, such that (x, y)∇f(δ1, δ2)T <0,

∀ (δ1, δ2) with ‖(δ1, δ2)‖ < δε, k = 1, 2 and ∀ (x, y) ∈ Cε. Combination with (3.12) yields

f(x, y) < 1, ∀ (x, y) ∈ Cε with ‖(x, y)‖ < δε, (3.13)

Page 59: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 49

with similar conclusions for fk(x, y), k = 1, 2. As a consequence, it is useful the study the

next three convex cones

R =(x, y) | x, y > 0, (x, y)∇f(0, 0)T < 0

Rk =(x, y) | x, y > 0, (x, y)∇fk(0, 0)T < 0, k = 1, 2,

with∇f(0, 0) =

(λ1 − r1(1− r2), λ2 − r2(1− r1)

)∇f1(0, 0) =(λ1 − r1, λ2)

∇f2(0, 0) =(λ1, λ2 − r2).

The quadrants, including the coordinate axis, will be denoted successively as QI , . . . , QIV ,

where the enumeration is anticlockwise. Then the ergodicity conditions imply that∇f(0, 0),

∇fk(0, 0) 6∈ QI , k = 1, 2, so that R, R1 and R2 are not empty. The following connection

between the ergodicity conditions and the gradients can be derived.

Lemma 3.2

(A) ⇐⇒ n(1) :=(∇f1(0, 0)2, −∇f1(0, 0)1

)∈ R =⇒R1 ∩R 6= ∅i)

(B) ⇐⇒ n(2) :=(−∇f2(0, 0)2, ∇f2(0, 0)1

)∈ R =⇒R2 ∩R 6= ∅.ii)

Proof: It is sufficient to prove i), since the arguments for ii) are analogous. Notice that

n(1) ∈ QoI iff λ1 < r1. As R ⊂ QoI we have

n(1) ∈ R ⇐⇒∇f1(0, 0)2∇f(0, 0)1 −∇f1(0, 0)1∇f(0, 0)2 < 0

λ1 < r1

⇐⇒λ2

(λ1 − r1(1− r2)

)− (λ1 − r1)

(λ2 − r2(1− r1)

)< 0

λ1 < r1

⇐⇒r2r1(λ2 − r2) + (1− r1)(λ1 − r1) + r1r2 < 0

λ1 < r1

⇐⇒ (A),

by (3.7) and the fact that (A) implies λ1 < r1. Furthermore, n(k) ∈ R together with

n(k)⊥∇fk(0, 0) imply that Rk ∩R 6= ∅, k = 1, 2.

Constant sequences xkk, yll can be chosen, if R∩R1 ∩R2 6= ∅. However, this is not

necessarily true. For R ∩ R1 ∩ R2 not to be empty, we need that R1 ∩ R2 6= ∅. By the

ergodicity conditions ∇f1(0, 0) ∈ QoII and ∇f2(0, 0) ∈ QoIV . Consequently, R1 ∩ R2 6= ∅iff n(1) ∈ R2 (or n(2) ∈ R1). So,

R1 ∩R2 6= ∅ ⇐⇒ ∇f1(0, 0)2∇f2(0, 0)1 −∇f1(0, 0)1∇f2(0, 0)2 < 0

⇐⇒ r1λ2 + r2λ1 < r1r2.(3.14)

But r1λ2 + r2λ1 < r1r2 is not precluded by the ergodicity conditions. Hence, we will

consider the various combinations of r1, r2, λ1 and λ2 separately.

Page 60: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

50 Part I, Chapter 3

Case 1: r1 + r2 ≤ 1.

First suppose the simple case that r1λ2 + r2λ1 < r1r2. By virtue of ergodicity condition i)

and Lemma 3.2, n(1) ∈ R. By (3.14) n(1) ∈ R2. Since n(1)⊥∇f1(0, 0), R∩R1 ∩R2 6= ∅.Therefore, there are (x∗, y∗) ∈ R ∩ R1 ∩ R2 and β < 1, for which f(x∗, y∗), f1(x∗, y∗),

f2(x∗, y∗) ≤ β < 1. Here the same considerations that lead to (3.13) can be applied, since

R∩R1 ∩R2 is an open convex cone.

Choose xk = x∗, yl = y∗, k, l ∈ IN, and c = 1 in expression (3.2). For this µ-vector and

any finite set M ⊂ E

∑(k,l)6∈M

P(i,j)(k,l)µ(k,l)

µ(i,j)≤ v(i, j) =

f(x∗, y∗) ≤ β, i, j ≥ 1

f1(x∗, y∗) ≤ β, i ≥ 1, j = 0

f2(x∗, y∗) ≤ β, i = 0, j ≥ 1.

(3.15)

Because (Pµ)(0,0) =∑k,l p

1kp

2l (1 + x∗)k(1 + y∗)l < ∞, a finite set M can be chosen to

satisfy

(MP µ)(0,0) ≤ β < 1.

Together with µ(0,0) = 1 and (3.15) this establishes (3.8), hence property µ− GR.

Next suppose that r1λ2 + r2λ1 ≥ r1r2, i.e. R1 ∩R2 = ∅. Then certainly R∩R1 ∩R2 = ∅,and there is no information whether solutions (x, y) exist, such that f(x, y), f1(x, y) and

f2(x, y) < 1 simultaneously. Hence it may not be possible to choose the sequences xkkand yll in (3.2) to equal fixed constants x and y.

So we take the following approach. Choose (x∗, y∗) ∈ R. Recall that ∇f1(0, 0) ∈ QoIIand ∇f2(0, 0) ∈ QoIV , so that R1 contains the positive x-axis and R2 the positive y-axis.

B virtu o Lemm 3.2 Rk ∩ R 6= ∅ k = 1, 2 s ther ar x∗1 ≤ x∗ and y∗1 ≤ y∗ for which

(x∗, y∗1) ∈ R∩R1 and (x∗1, y∗) ∈ R∩R2. Because R is a convex cone, R∗ := (x, y) | x =

x∗, y ∈ [y∗1 , y∗] or x ∈ [x∗1, x

∗], y = y∗ ⊂ R. In fact, R∗ is a rectangular path from R1 to

R2, which is contained in R.

Similarly to the derivation of (3.13) we can choose x∗, y∗, x∗1 and y∗1 small enough as to

satisfy m := maxmax(x,y)∈R∗ f(x, y), f1(x∗, y∗1), f2(x∗1, y∗) < 1, by dividing them all by

a fixed, large enough constant. It is our object to choose the sequences xkk, yll in

(3.2) such, that x1 = x∗1, y1 = y∗1 ; furthermore xk = x∗ and yl = y∗ for k ≥ I∗, l ≥ J∗,

where I∗, J∗ have to be determined yet. To this end we reduce the analysis of f, f1 and

f2 to an analysis of finite summations.

Let I, J ∈ IN be such that

S(I, J) :=∑

k>I,l>J

p1kp

2l (1 + x∗)k(1 + y∗)l ≤ 1−m

3.

We fill the gap between x∗1, x∗ on one hand, and y∗1 , y∗ on the other, by slowly increasing

x∗1, y∗1 . Keeping this and expression (3.11) in mind, we consider the following perturbations

Page 61: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 51

of truncated versions of f(x, y∗) and f(x∗, y),

t1(x, δ) :=∑k≤Il≤J

pk1p2l (1 + x)k(1 + y∗)l

(1− r1(1− r2)

x− δ1 + x− δ

− r2(1− r1)y∗

1 + y∗

)

t2(y, δ) :=∑k≤Il≤J

pk1p2l (1 + x∗)k(1 + y)l

(1− r1(1− r2)

x∗

1 + x∗− r2(1− r1)

y − δ1 + y − δ

).

t1, t2 are uniformly continuous on [x∗1, x∗]× [0, 1] and [y∗1 , y

∗]× [0, 1] respectively. Hence,

there is a δ∗ ∈(0,min(x∗1, y

∗1)), such that ∀ δ ∈ [0, δ∗]

|t1(x, δ)− t1(x, 0)| < 1−m3

, ∀x ∈ [x∗1, x∗]

|t2(y, δ)− t2(y, 0)| < 1−m3

, ∀ y ∈ [y∗1 , y∗].

Define the following sequences xkk, yll in (3.2)

xk :=

x∗1 + (n− 1)δ∗, (n− 1)I < k ≤ nI, 1 ≤ n ≤ I∗

I:=⌈x∗ − x∗1

δ∗

⌉x∗, k > I∗

yl :=

y∗1 + (n− 1)δ∗, (n− 1)J < l ≤ nJ, 1 ≤ n ≤ J∗

J:=⌈y∗ − y∗1

δ∗

⌉y∗, l > J∗.

Notice that xk, yl are constant for at least I, J successive values of k, l respectively. So,

if we truncate v(i, j) to a finite summation over k ≤ I and l ≤ J , at most two different

values of xk and yl appear in the resulting expression. We claim that this choice together

with c = 1 yields a suitable bounding vector µ with structure (3.2). Indeed, insertion in

(3.11) of the defined sequences establishes the following lemma.

Lemma 3.3 v(i, j) ≤ (2 +m)/3, for i > I∗ or j > J∗.

Proof: The case that i > I∗ and j > J∗ is trivial. Indeed, then xi+k = x∗ and yj+l = y∗,

∀ k, l ∈ IN0. Hence, v(i, j) = f(x∗, y∗) ≤ m. Since m < 1, the assertion follows. Suppose

that i > I∗ and j = 0. Then

v(i, j) =∑k,l≥0

p1kp

2l (1 + x∗)k

l∏m=1

(1 + ym)

1− r1x∗

1 + x∗

≤∑k≤Il≤J

p1kp

2l (1 + x∗)k(1 + y∗1)l

1− r1

x∗

1 + x∗

+ S(I, J)

≤ f1(x∗, y∗1) + S(I, J) ≤ m+1−m

3≤ 2 +m

3.

Page 62: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

52 Part I, Chapter 3

Let i > I∗ and (n− 1)J < j ≤ nJ , n ≤ J∗/J . Using that ym ≤ ynJ , for m ≤ j + J , and

− yj+l1 + yj+l

≤ −y(n−1)J

1 + y(n−1)J≤ − ynJ − δ∗

1 + ynJ − δ∗, j + l ≥ j > (n− 1)J,

we obtain

v(i, j) =∑k,l≥0

(1 + x∗)kj+l∏

m=j+1

(1 + ym)

1− r1(1− r2)x∗

1 + x∗− r2(1− r1)

yj+l1 + yj+l

≤∑k≤Il≤J

(1 + x∗)kj+l∏

m=j+1

(1 + ym)

1− r1(1− r2)x∗

1 + x∗− r2(1− r1)

yj+l1 + yj+l

+ S(I, J)

≤ t2(ynJ , δ∗) + S(I, J) ≤ t2(ynJ , 0) + 2

1−m3

≤ f(x∗, ynJ) + 21−m

3≤ 2 +m

3.

For j > J∗ and i = 0 or (n− 1)I < i ≤ nI, n ≤ I∗/I, the derivation is similar.

Choose the finite set M := (i, j) | i ≤ I∗+I, j ≤ J∗+J and β = (2+m)/3. Combination

of the inequalities above with (3.11) and (3.9) directly yields the validity of (3.8), since

∑(k,l) 6∈M

P(i,j)(k,l)µ(k,l)

µ(i,j)≤

v(i, j) ≤ 2 +m

3= β, i > I∗ or j > J∗

S(I, J) ≤ 1−m3≤ β, i ≤ I∗, j ≤ J∗.

This completes the proof of µ− GR for r1 + r2 ≤ 1.

Case 2: r1 + r2 > 1. By ergodicity condition ii) at least (A) or (B) holds. By Lemma 3.2

possibly n(1) 6∈ R or n(2) 6∈ R, so that one of the sets R∩Rk, k = 1, 2, might be empty. If

R1 ∩R2 = ∅ as well, a set R∗, which is a rectangular path from R1 to R2 via R, may not

exist. The existence of such a set is crucial for our construction. However, R1 ∩R2 = ∅ is

precluded by the assumptions. Suppose the contrary, then by (3.14) r1λ2+r2λ1−r1r2 ≥ 0.

Together with λk − rk < 0, k = 1, 2 and 1 < r1 + r2 this implies

0 ≤ r2(λ1 − r1) + λ2r1 ≤ (1− r1)(λ1 − r1) + λ2r1

0 ≤ r1(λ2 − r2) + λ1r2 ≤ (1− r2)(λ2 − r2) + λ1r2,

so that both (A) and (B) are violated. We conclude that R1 ∩R2 6= ∅. By the directions

of the gradients, this means that n(1) ∈ R2 and n(2) ∈ R1. Moreover Lemma 3.2 implies

that n(1) ∈ R or n(2) ∈ R. Combination gives n(1) ∈ R ∩ R2 or n(2) ∈ R ∩ R1. As

n(k)⊥∇fk(0, 0), k = 1, 2, this means that R∩R1∩R2 6= ∅. The construction of a suitable

µ-vector now proceeds similarly to the case, that r1 + r2 ≤ 1 and R1 ∩R2 6= ∅.

Thus the ALOHA-model is strong recurrent, if ergodicity and a spatial geometric bound-

edness condition are assumed. Consequently it is strong convergent and geometrically

Page 63: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 53

ergodic by the Key theorem and Lemma 2.6. Next we show that the spatial geometric

boundedness condition can not be relaxed. In fact, it is implied by geometric ergodicity,

as we will show with similar arguments to Kendall’s [1960] proof for the M/G/1-queue.

Since the MC is irreducible, Kendall’s [1960] equivalence result states that geometric er-

godicity implies the existence of some x > 1, such that (cf. also the proof of Lemma 2.6

and Proposition 1.3)

F(i,j)(k,l)(x) :=∑n∈IN

F(n)

(i,j)(k,l)xn <∞, ∀ (i, j), (k, l) ∈ E. (3.16)

Let em denote the mth unit vector, for m = 1, 2, then e.g. (i, j) + e1 = (i+ 1, j). First we

show that

F(i,j)(0,0)(x) ≤ F(i,j)+em,(0,0)(x), ∀ (i, j) 6= (0, 0),m = 1, 2. (3.17)

For the derivation we use stochastic comparison results. Let X, Y be random variables

on IN0. By definition Xst≤ Y iff IP(X ≥ n) ≤ IP(Y ≥ n), n ∈ IN0. X

st≤ Y is equivalent to∑

n f(n)IP(X = n) ≤∑n f(n)IP(Y = n) for all non-decreasing functions f : IN0 → IR (cf.

e.g. Ross [1983]).

Introduce random variables T(i,j), (i, j) ∈ E, that record the recurrence time or first

hitting time of state (0,0), when the system starts in state (i, j). Obviously IP(T(i,j)=n) =

F(n)

(i,j)(0,0). Define f(n) = (x)n, n ∈ IN0, then f is non-decreasing on IN for x > 1. Hence,

for (3.17) it suffices to show the following assertion.

Lemma 3.4

T(i,j)st≤ T(i,j)+em ,m = 1, 2. (3.18)

This can be proved by a coupling technique (cf. Ross [1983]) combined with standard

sample path arguments, as can be found very clearly in Ridder [1987]. For completeness

of the argumentation we give the proof here.

Proof: Let m = 1, and denote by Xs(n)n∈IN0, s = 1, 2, the MC with transition matrix

P and initial states (i, j) and (i+ 1, j) respectively. We construct two E-valued stochastic

processes Y s(n)n∈IN0 , s = 1, 2, on a common probability space (Ω,F , P ′) with the

following properties. In the first place, Y 1(n)(ω) ≤ Y 2(n)(ω), ∀ω ∈ Ω, n ∈ IN0, where

“≤” denotes the componentwise vector ordering in E. Secondly, the probability laws of

Y s(n)n and Xs(n) agree, s = 1, 2.

Let Unn∈IN0be sequence of i.i.d. uniformly distributed random variables on (Ω,F , P ′)

with support on [0, 1). The interval [0, 1) is partitioned into a sequence of disjunct, half

open intervals in the following way. Consider any enumeration of the set (k, l, n) | k, l ∈IN0, n = 1, . . . , 4. Denote the mth element of the enumeration as f(m). The length l(m)

of the mth interval is defined as

l(m) := p1kp

2l

1n=1r1r2 + 1n=2r1(1− r2) + 1n=3r2(1− r1)

+1n=4(1− r1)(1− r2), if f(m) = (k, l, n),

Page 64: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

54 Part I, Chapter 3

for m ∈ IN. The sequence If(m)m∈IN of intervals is defined as

If(1) :=[0, l(1)

)If(m) :=

[m−1∑n=1

l(n),

m∑n=1

l(n)), m ≥ 2.

We set Y 1(0)(ω) = (i, j), Y 2(0)(ω) = (i + 1, j), ∀ω ∈ Ω. Consider any ω ∈ Ω, n ∈ IN0,

s ∈ 1, 2 and let Y s(n)(ω) = y, U(n)(ω) = u.

Y s(n+ 1) := y

+1y1>0,y2>0

∑k,l≥0

(k, l) · 1u ∈ I(k,l,1) ∨ u ∈ I(k,l,4) +∑k,l≥0

(k − 1, l) · 1u ∈ I(k,l,2)

+∑k,l≥0

(k, l − 1) · 1u ∈ I(k,l,3)

+1y1=0,y2>0

∑k,l≥0

(k, l) · 1u ∈ I(k,l,2) ∨ u ∈ I(k,l,4)

+∑k,l≥0

(k, l − 1) · 1u ∈ I(k,l,1) ∨ u ∈ I(k,l,3)

+1y1>0,y2=0

∑k,l≥0

(k, l) · 1u ∈ I(k,l,3) ∨ u ∈ I(k,l,4)

+∑

(k,l≥0

(k − 1, l) · 1u ∈ I(k,l,1) ∨ u ∈ I(k,l,2)

+1y1=0,y2=0

∑k,l≥0

(k, l) · 1u ∈ ∪4n=1I(k,l,n).

Straightforward comparison yields that the processes Y s(n)n, s = 1, 2, satisfy the second

property. The first property is easily shown with induction to n ∈ IN0. Indeed, by

definition it holds for n = 0. Suppose that Y1(n)(ω) ≤ Y2(n)(ω), ∀ω ∈ Ω and some

n ∈ IN0. If Y 1(n)(ω), Y 2(n)(ω) are both on the same coordinate axis but not in (0, 0),

or both have positve coordinates, their evolution in one step is the same, so that the

inequality is preserved. For ease of notation we set y := Y 1(n)(ω), y′ := Y 2(n)(ω) and

u := U(n+ 1)(ω). Suppose that y1 = 0 and y′1 ≥ 1. Then,

u ∈ I(k,l,1) =⇒Y 1(n+ 1)(ω) := y + (k, l − 1)1y2>0 + (k, l)1y2=0Y 2(n+ 1)(ω) := y′ + (k, l)1y′2>0 + (k − 1, l)1y′2=0

u ∈ I(k,l,2) =⇒Y 1(n+ 1)(ω) := y + (k, l)

Y 2(n+ 1)(ω) := y′ + (k − 1, l)

u ∈ I(k,l,3) =⇒Y 1(n+ 1)(ω) := y + (k, l − 1)1y2>0 + (k, l)1y2=0Y 2(n+ 1)(ω) := y′ + (k, l − 1)1y′2>0 + (k, l)1y′2=0

u ∈ I(k,l,4) =⇒Y 1(n+ 1)(ω) := y + (k, l)

Y 2(n+ 1)(ω) := y′ + (k, l).

Page 65: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 55

A problem can arise only, when u ∈ I(k,l,1) or u ∈ I(k,l,2). However, since y′1 ≥ y1 + 1,

Y 21 (n+ 1)(ω) ≥ Y 1

1 (n+ 1)(ω). The case that y2 = 0 and y′2 ≥ 1 is solved analogously.

Let us define the random variable T s on (Ω,F , P ′) to be the first hitting time of state

(0, 0) for the process Y s(n)n, s = 1, 2. Obviously,

T 1(ω) ≤ T 2(ω), ∀ω ∈ Ω. (3.19)

Since Xs(n)n, and Y s(n)n are stochastically indistinguishable, s = 1, 2, so are T(i,j),

T 1 on one hand and T(i+1,j), T2 on the other. Combination with (3.19) yields (3.18), thus

completing the proof of the lemma.

Because in one step from state (i, j)+em only states (k, l) can be reached with k+l ≥ i+j,the monotonicity result of Lemma 3.4 yields that

F(i,j)+em(0,0)(x) = x∑(k,l)

P(i,j)+em(k,l) · F(k,l)(0,0)(x)

≥ x · min(k,l):k+l=i+j

F(k,l)(0,0)(x), ∀ (i, j) 6= (0, 0).

Consequently, F(i,j)(0,0)(x) ≥ xi+j−1 min(k,l):k+l=1 F(k,l)(0,0)(x). This concludes the proof,

sinceF(0,0)(0,0)(x) = x ·

∑(k,l)6=(0,0)

p1kp

2l F(k,l)(0,0)(x) + xp1

0p20

≥∑

(k,l) 6=(0,0)

p1kp

2l xk+l · min

(k,l):k+l=1F(k,l)(0,0)(x) + xp1

0p20

≥∑k,l≥0

p1kp

2l xk+l,

where for the last inequality we use that F(k,l)(0,0)(x) ≥ F(k,l)(0,0)(1) = 1, since x > 1.

Consequently IEeεAk <∞ if eε ≤ x.

Notice that µ − GR implies ergodicity. Furthermore, condition i) of Proposition 3.2 is

satisfied for n∗ = 1. Combination with the Key theorem and Propositions 3.1, 3.2 proves

the following theorem.

Theorem 3.2 For the ALOHA-type system ergodicity together with spatial geometric

boundedness are equivalent to µ−GE for µ a productform bounding vector with structure

(3.2). It is not strongly ergodic and the result of Proposition 3.1 applies.

Page 66: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

56 Part I, Chapter 3

4. Two coupled processors.

This queueing system was studied by Fayolle & Iasnogorodski [1979] to obtain conditions

for ergodicity. They derive a complete characterization, which only involves the first

moments of the arrival and service time distributions. This analysis uses deep results from

complex function theory. The authors give conditions for the existence of a productform

stationary distribution as well. Sufficient conditions for ergodicity of a generalized version

with r processors were derived by Szpankowski [1988].

The model consists of two service-centres with infinite buffers. The arrival stream at centre

k is Poisson (λk) distributed, k = 1, 2. The service time distributions in the respective

centres are exponentially distributed. We allow the service rate in centre k to depend on

the number of customers present in the other centre, in the following sense: the service

rate is νk if the server at the other centre is serving, otherwise it is ν∗k , k = 1, 2.

State (i, j) corresponds to i customers in centre 1 and j in centre 2. If X(t) denotes the

state of the system at time t, then X(t)t≥0 is a standard, conservative and uniformizable

MP.

In this section we show µ − GR of the AMC. By the Key theorem and Theorem 2.3 this

establishes both µ−GE of the AMC and µ−EE of the MP. So, let h < (λ1 +λ2 +maxν1 +

ν2, ν∗1 , ν∗2)−1. The transition matrix P of the corresponding approximating chain is equal

to I + hQ, so that

P(i,j)(i+1,j) = λ1h

P(i,j)(i−1,j) = δ(i)δ(j)ν1h+ δ(i)(1− δ(j)

)ν∗1h

P(i,j)(i,j+1) = λ2h

P(i,j)(i,j−1) = δ(i)δ(j)ν2h+(1− δ(i)

)δ(j)ν∗2h

P(i,j)(i,j) = 1−∑

(k,l)6=(i,j)

P(i,j)(k,l).

Both MC and MP are ergodic iff the following condition is satisfied, after possibly renum-

bering the centres (cf. Fayolle & Iasnogorodski [1979]).

Ergodicity condition: Conditions i) to iii) hold:

λ1 < ν1 (A)i)

ν∗2 >ν1λ2 − ν2λ1

ν1 − λ1(B)ii)

iiia) If (ν1 − ν∗1 )(ν2 − ν∗2 ) = ν1ν2, then

ν∗1 , ν∗2 6= 0 (C)

iiib) If (ν1 − ν∗1 )(ν2 − ν∗2 ) 6= ν1ν2 and ν2 > λ2, then

ν∗1 >ν2λ1 − ν1λ2

ν2 − λ2. (D)

Page 67: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 57

Notice that (A), (B) and (C) imply (D), if (ν1 − ν∗1 )(ν2 − ν∗2 ) = ν1ν2 and ν2 > λ2.

In the sequel the ergodicity condition will be assumed to hold. No further assumption is

necessary, since all relevant distributions are either Poisson or exponentially distributed

and the process is aperiodic.

Strong recurrence of this model is shown by the construction of a µ-vector, such that (3.8)

is satisfied for some finite set M ⊂ E and β < 1. Similarly as for the ALOHA model we

study (3.9) first. Insert expression (3.2) and the transition probabilities, then

(3.9)⇐⇒ λ1h(1 + xi+1) + δ(i)[δ(j)ν1 +

(1− δ(j)

)ν∗1

] h

1 + xi

+λ2h(1 + yj+1) + δ(j)[δ(i)ν2 +

(1− δ(i)

)ν∗2

] h

1 + yj

+1− hλ1 + λ2 + δ(i)

[δ(j)ν1 +

(1− δ(j)

)ν∗1

]+δ(j)

[δ(i)ν2 +

(1− δ(i)

)ν∗2

]≤ β

⇐⇒ 1 + λ1hxi+1 + λ2hyj+1 − δ(i)h[δ(j)ν1 +

(1− δ(j)

)ν∗1

](1− 1

1 + xi

)−δ(j)h

[δ(i)ν2 +

(1− δ(i)

)ν∗2

](1− 1

1 + yj

)≤ β

⇐⇒ v(i, j) := δ(i)[δ(j)ν1 +

(1− δ(j)

)ν∗1

] xi1 + xi

+ δ(j)[δ(i)ν2 +

(1− δ(i)

)ν∗2

] yj1 + yj

−λ1xi+1 − λ2yj+1 ≥1− βh

> 0. (3.20)

The definition of v(i, j) via the lefthandside of the second inequality would result in a

better correspondence with the analysis in section 3.3. However, we prefer to adopt the

notation of Chapter 9, where the two centre open Jackson network is analysed.

As in the previous section we start by considering sequences xkk, yll that are equal

to constants x and y. Then v(i, j) has three forms that are important for our discussion,

which are

v(i, j) =

f(x, y) :=ν1

x

1 + x+ν2

y

1 + y− λ1x− λ2y, i, j > 0

f1(x, y) :=ν∗1x

1 + x− λ1x− λ2y, i > 0, j = 0

f2(x, y) := ν∗2y

1 + y− λ1x− λ2y, i = 0, j > 0.

(3.21)

Hence, constant sequences in the expression for the bounding vector exist, if there are

x, y, such that f(x, y), f1(x, y) and f2(x, y) are simultaneously positive. Suppose that

e.g. f(x, y) > 0 for some x, y > 0, then f(xc−1, yc−1) > 0, for all constants c > 1. So

it is sufficient to study the behaviour of the three functions in a neighbourhood of (0, 0).

Choose ε > 0 and define the convex cone Cε as

Cε :=

(x, y) | (x, y)

‖(x, y)‖· ∇f(0, 0)T

‖∇f(0, 0)‖> ε.

Page 68: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

58 Part I, Chapter 3

As in the previous section, we use a partial Taylor expansion and the fact that f(0, 0) =

f1(0, 0) = f2(0, 0) = 0 to establish the existence of a δε, such that

f(x, y) > 0, ∀ (x, y) ∈ Cε, with ‖(x, y)‖ < δε,

with similar conclusions for f1, f2. Therefore the following convex cones are of interest

R = (x, y) | (x, y)∇f(0, 0)T > 0

Rk = (x, y) | (x, y)∇fk(0, 0)T > 0, k = 1, 2,

with

∇f(0, 0) = (ν1 − λ1, ν2 − λ2)

∇f1(0, 0) = (ν∗1 − λ1,−λ2)

∇f2(0, 0) = (−λ1, ν∗2 − λ2).

In contrast to section 3.3 we do not require R,R1,R2 ⊂ QoI . The ergodicity conditions

in this section are more complex, and in the course of this section it will become clear

that it is not possible to restrict ourselves to QI only. These conditions imply, that

∇f(0, 0) ∈ (QI ∪ QIV )o, ∇f1(0, 0) ∈ (QIII ∪ QIV )o and ∇f2(0, 0) ∈ (QII ∪ QIII)o.The following lemmas study the relations between the various regions and the ergodicity

conditions.

Lemma 3.5 Assume (A).

R∩QoI 6= ∅i)

(B) ⇐⇒ n :=(−∇f(0, 0)2,∇f(0, 0)1

)∈ R2 =⇒ R∩R2 6= ∅ii)

iii) If ν2 > λ2, then

(D)⇐⇒ −n ∈ R1 =⇒ R∩R1 6= ∅.Proof: i) Since ν1 − λ1 > 0, ∇f(0, 0) 6= (0, ν2 − λ2) so that R∩QoI 6= ∅.

n ∈ R2 ⇐⇒ −∇f(0, 0)2∇f2(0, 0)1 +∇f(0, 0)1∇f2(0, 0)2 > 0ii)

⇐⇒ −(ν2 − λ2)(−λ1) + (ν1 − λ1)(ν∗2 − λ2) > 0

⇐⇒ (B).

As n⊥∇f(0, 0), R∩R2 6= ∅. The proof of iii) is similar.

Recall that constant sequences xkk, yll for the bounding vector exist iff R∩R1∩R2∩QoI 6= ∅. So we need more precise information on the relation between the sets R ∩ Rk,

k = 1, 2, and the respective quadrants. This is provided by the lemma below.

Page 69: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 59

Lemma 3.6 Assume (A).

i) For (ν1 − ν∗1 )(ν2 − ν∗2 ) = ν1ν2 and (C)

(B) ⇐⇒ λ1

ν∗1+λ2

ν∗2< 1 ⇐⇒ n(1) :=

(−∇f1(0, 0)2,∇f1(0, 0)1

)∈ R ∩R2 ∩QoI

=⇒ R∩R1 ∩R2 ∩QoI 6= ∅.

ii) For ν2 > λ2,

(B) ⇐⇒ n ∈ R2 ∩QoII =⇒ R∩R2 ∩QoII 6= ∅(D) ⇐⇒ − n ∈ R1 ∩QoIV =⇒ R∩R1 ∩QoIV 6= ∅.

iii) For ν2 ≤ λ2,

(B) ⇐⇒ n ∈ R2 ∩QI \ (x, y) | y = 0 =⇒ R∩R2 ∩QoI 6= ∅R ∩R1 ∩QoIV 6= ∅.

Proof: i) The first equivalence is easy to check by writing ν1 = ξν∗1 and ν2 = (1− ξ)ν∗2 . It

follows from Fayolle & Iasnogorodski [1979] as well. Furthermore n(1) ∈ QoI iff ν∗1−λ1 > 0.

Consequently,

n(1) ∈ R2 ∩QoI ⇐⇒n(1)∇f2(0, 0)T > 0

ν∗1 − λ1 > 0

⇐⇒

λ2(−λ1) + (ν∗1 − λ1)(ν∗2 − λ2) = ν∗1ν∗2

(1− λ1

ν∗1− λ2

ν∗2

)> 0

ν∗1 − λ1 > 0

⇐⇒ (B),

since (B) implies ν∗1 − λ1 > 0, through the first equivalence. Similarly we prove that (B)

and n(1) ∈ R∩QoI are equivalent, by inserting ν1 = ξν∗1 and ν2 = (1−ξ)ν∗2 . Together they

yield the second equivalence. As n(1)⊥∇f1(0, 0), (B) implies that R∩R1 ∩R2 ∩QoI 6= ∅.

ii) The assumptions imply that n ∈ QoII and −n ∈ QoIV . Use the proof of Lemma 3.5.

iii) The second assertion trivially holds, as ∇f(0, 0) ∈ QIV and ∇f1(0, 0) ∈ (QIII ∩QIV )o.

We obtain the equivalence by remarking that ν2 ≤ λ2 implies that n ∈ QI \(x, y) | y = 0.The assertion follows similarly to the proof of Lemma 3.5.

For the construction of a suitable bounding vector we study the different sets of ergodicity

conditions separately.

Case 1: (ν1 − ν∗1 )(ν2 − ν∗2 ) = ν1ν2.

Since (B) holds by assumption, we apply Lemma 3.6i) to obtain that R∩R1∩R2∩QoI 6= ∅.The construction of a µ-vector is standard now. Let (x∗, y∗) ∈ R ∩ R1 ∩ R2 ∩ QoI , such

that m := minf(x∗, y∗), f1(x∗, y∗), f2(x∗, y∗) > 0. Using a partial Taylor expansion, we

can always achieve this for sufficiently small x∗, y∗, since R ∩ R1 ∩ R2 ∩ QoI is a convex

cone. Then µ with µ(i,j) = (1 + x∗)i(1 + y∗)l is a suitable bounding vector. Indeed, if we

Page 70: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

60 Part I, Chapter 3

choose β < 1, with (1 − β)/h ≤ m, then v(i, j) ≥ (1 − β)/h, for (i, j) 6= (0, 0), by (3.21).

Together with (3.9) and (3.20) this yields for the finite set M := (i, j) | i, j ≥ 1,

∑(k,l)6∈M

P(i,j)(k,l)µ(k,l)

µ(i,j)

= 0, (i, j) = (0, 0)

≤∑j

P(i,j)(k,l)µ(k,l)

µ(i,j)≤ β, (i, j) 6= (0, 0).

Case 2: (ν1 − ν∗1 )(ν2 − ν∗2 ) 6= ν1ν2 and ν2 > λ2.

(B) and (D) hold by assumption. Combination with Lemma 3.6ii) establishes R ∩ R2 ∩QoII 6= ∅, R∩R1 ∩QoIV 6= ∅ and R ⊃ QI .

The ergodicity condition yields no information on R ∩ R1 ∩ R2 ∩ QoI . Hence, this set

may be empty, so that there is no information on the existence of positive x, y, for which

f(x, y), f1(x, y), f2(x, y) > 0 simultaneously. Thus constant sequences for the bounding

vector need not exist and we have to construct non-decreasing sequences xkk, yll.

Let (x∗, y∗) ∈ R ∩ QoI , x∗1 ≤ x∗ such, that (x∗1, y∗) ∈ R ∩ R2 ∩ QoII and y∗1 ≤ y∗ such,

that (x∗, y∗1) ∈ R ∩ R1 ∩ QoIV . Thus x∗1, y∗1 < 0. Define R∗ := (x, y) | x = x∗, y ∈

[y∗1 , y∗] or x ∈ [x∗1, x

∗], y = y∗, then R∗ ⊂ R as R is a convex cone. By arguments that

have been frequently used in this chapter, x∗, y∗, x∗1 and y∗1 can be chosen sufficiently

small as to satisfy m := minmin(x,y)∈R∗ f(x, y), f1(x∗, y∗1),f2(x∗1, y∗) > 0. Let δ∗ satisfy

δ∗ < m/(2 maxλ1, λ2). We choose the following sequences xkk, yll in (3.2):

xk :=

x∗1 + (k − 1)δ∗, k ≤ I∗ :=⌈x∗ − x∗1

δ∗

⌉x∗, k > I∗

yl :=

y∗1 + (l − 1)δ∗, l ≤ J∗ :=⌈y∗ − y∗1

δ∗

⌉y∗, l > J∗.

Inserting this in (3.20) we obtain

v(i, j)

= f(x∗, y∗) ≥ m, i > I∗, j > J∗

≥ f(x∗, yj)− λ2δ∗ ≥ m

2, i > I∗, 0 < j ≤ J∗

= f1(x∗, y∗1) ≥ m, i > I∗, j = 0

≥ f(xi, y∗)− λ1δ

∗ ≥ m

2, 0 < i ≤ I∗, j > J∗

= f2(x∗1, y∗) ≥ m, i = 0, j > J∗,

where we will only explain the second inequality. Let i > I∗, 0 < j ≤ J∗. Then

v(i, j) = ν1x∗

1 + x∗+ ν2

yj1 + yj

− λ1x∗ − λ2yj+1

=

f(x∗, yj − λ2δ

∗) ≥ m

2, for

j ≤ J − 1

j = J,

Page 71: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 61

since (x∗, yj) ∈ R∗, and λ2δ∗ ≤ m/2. To obtain a vector µ with µ(i,j) ≥ 1, ∀ (i, j) ∈ E, we

set c =(∏

k:xk<0(1 + xk)∏l:yl<0(1 + yl)

)−1. The model is µ− GR for µ of structure (3.2)

with the above specified sequences xkk, yll and constant c, if we set M := (i, j) | i ≤I∗ + 1, j ≤ J∗ + 1 and β < 1, with (1− β)/h ≤ m/2. Indeed, using inequality (3.20) we

notice that (3.9) is satisfied, since

∑(k,l) 6∈M

P(i,j)(k,l)µ(k,l)

µ(i,j)

≤ β, i > I∗ or j > J∗

= 0, i ≤ I∗, j ≤ J∗.

Case 3: (ν1 − ν∗1 )(ν2 − ν∗2 ) 6= ν1ν2 and λ2 ≥ ν2.

The construction is similar to the case that λ2 < ν2. However, since (B) holds by as-

sumption, Lemma 3.6iii) implies that R∩R2 ∩QoI 6= ∅. So we can choose x∗1 = x∗. Then

xk = x∗, ∀ k ∈ IN. The remainder of the analysis proceeds analogously.

Remark that our construction requires the existence of a set R∗, with possibly x∗1 = x∗

and y∗1 = y∗, thus needing the existence of a rectangular path from R1 to R2 via R,

intersecting QoI . In fact all two-dimensional models we studied, have such a property if

they are ergodic. It is an open problem whether such a property is generally due to some

underlying property of these models.

Thus strong recurrence of the coupled processors model is established. Hence Key theo-

rem I and Theorem 2.3 apply. Also condition i) of Proposition 3.2 is satisfied, and µ has

structure (3.2). By virtue of Corollary 2.1 strong recurrence implies ergodicity and the

following theorem is proved.

Theorem 3.3 The coupled processors model with two processors is ergodic iff the AMC

is µ − GE, or the MP is µ − EE, for µ a bounding vector with structure (3.2). Then the

result of Proposition 3.1 is valid.Moreover, neither MP nor AMC are strongly ergodic.

Page 72: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

62 Part I, Chapter 4

CHAPTER FOUR

The Region of Convergence of (1− z)P (z).

1. Analyticity in the disk D0,R.

1.1. On Kendall’s criterion for geometric ergodicity.

In Chapters 1 and 2 we discussed the relation between geometric ergodicity -as defined

through (1.1)- and the convergence radius of the probability generating function FiB(z) of

the recurrence times to a finite set B of reference states (cf. Proposition 1.3, Lemma 2.6).

The proof of Lemma 2.6 uses the following direct implication of the geometric ergodicity

property: the generating function Pij(z) =∑n∈IN0

Pnijz

n is meromorphic in a common

disk D0,R, for some R > 1 and all pairs i, j ∈ E, with at most one simple pole (i.e. a pole

of order 1) in z = 1. For convenience we will state this implication of geometric ergodicity

in a slightly different way.

Condition 4.1: There is an R > 1, such that (1 − z)Pij(z) can be analytically continued

in D0,R, ∀ i, j ∈ E.

Indeed for a general MC the Cesaro limit of the sequence Pnij | n ∈ IN0 exists and equals

Πij (cf. Chung [1967]). By a Tauberian theorem (cf. Titchmarsh [1939]) the Abelian limit

exists and consequently equals Πij , i.e.

limx↑1

(1− x)Pij(x) = Πij , ∀ i, j ∈ E, (4.1)

so that z = 1 is a pole of order at most 1. The condition requires that z = 1 is an isolated

singular point.

So, to be more specific, for a geometrically ergodic MC and some R > 1 (cf. (2.21))

Pij(z) =∞∑n=0

(Pnij −Πij

)zn +

Πij

1− z, z ∈ D0,R, i, j ∈ E, (4.2)

is a meromorphic continuation of Pij(z). For irreducible chains Kendall [1960] showed

the converse as well. More detailedly, he proved that analyticity of (1 − z)Pii(z) in a

disk D0,R, for R > 1 and some i ∈ E, implies geometric ergodicity. The sufficiency of

this weaker version of Condition 4.1 is due to the irreducibility of the MC. For the more

general case we will use Condition 4.1, although a slightly weaker version in the spirit

Page 73: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

The Region of Convergence of (1− z)P (z) 63

of Kendall’s condition, is sufficient. Let us review the arguments to establish geometric

ergodicity from Condition 4.1. As is well-known, the function Pjj(z) has d poles on the

unit circle if j is positive recurrent with period d (cf. Cinlar [1975]). So, aperiodicity

follows from Condition 4.1.

The analyticity of (1−z)Pij(z) in D0,R, for some R > 1, allows a unique Taylor expansion

in z = 0 that converges for z ∈ D0,R, ∀ i, j ∈ E. The Taylor expansion of (1−z)Pij(z) for

z ∈ D0,1 equals δij +∑∞n=1

(Pnij − P

n−1ij

)zn. By the unicity of this expansion we achieve

that the latter expression is the Taylor expansion for z ∈ D0,R.

Application of Cauchy’s inequality (cf. Titchmarsh [1939] 2.5) gives for any R ∈ (1, R)

that

|Pnij − Pn−1ij | ≤ R−n · Cij(R),

where Cij(R) = sup|(1−z)Pij(z)| | z ∈ C0,R. Hence, the sequence Pnijn is a (bounded)

Cauchy sequence in IR. Therefore, it has a limit. If the limit exists, it is equal to the

Cesaro-limit, so that Pnij → Πij , for n tending to infinity. Thus

|Πij − Pnij |= |

N∑m=n+1

(Pmij − P

m−1ij

)+(Πij − P

Nij

)|

≤ Cij(R)

1− R−1R−n + |Πij − P

Nij |, ∀N ≥ n+ 1, i, j ∈ E,

and we conclude that

|Pnij −Πij | ≤ limN→∞

( Cij(R)

1− R−1· R−n + |PNij −Πij |

)=

Cij(R)

1− R−1· R−n, ∀ i, j ∈ E.

Consequently the MC is geometrically ergodic. This proves the following theorem.

Theorem 4.1. Condition 4.1 and geometric ergodicity are equivalent.

Let us study the Taylor expansion of (1− z)Pij(z) more closely, under the assumption of

geometric ergodicity. We write

Dij(z) = Pij(z)−Πij

1− z. (4.3)

Then by assumption Dij(z) is the analytic part of Pij(z) in D0,R, for some R > 1 and all

pairs i, j ∈ E; in formula (cf. (4.2))

Dij(z) =∞∑n=0

(Pnij −Πij

)zn.

For notational simplicity we use P (z), D(z) for the matrix functions with elements Pij(z)

and Dij(z), i, j ∈ E, respectively. Moreover, we have to introduce the following concept.

Page 74: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

64 Part I, Chapter 4

Definition 4.1: The deviation matrix D is the matrix with components

Dij := limx↑1

∞∑n=0

(Pnij −Πij

)xn, i, j ∈ E.

D is also called the ergodic potential (cf. Syski [1978]). Notice that the definition of the

deviation matrix uses the Abelian limit, so as not to preclude periodicity.

The analyticity of Dij(z) implies, that the limit Dij exists and

Dij = Dij(1) =

∞∑n=0

(Pnij −Πij

). (4.4)

It seems that no further information can be extracted from the geometric ergodicity prop-

erty. Indeed, without extra assumptions the convergence of matrix operations is not

guaranteed. Under the additional assumptions of stability and ν <∞, geometric ergodic-

ity is equivalent to strong convergence, by virtue of Lemma 2.6. In the remainder of this

section we will therefore study the µ− GE property, i.e. for some bounding vector µ with

µi ≥ 1 ∀ i ∈ E, and constants c > 0, β < 1, i.e.‖Pn −Π‖µ ≤ cβn

‖P‖µ ≤ c.

The use of this property allows us to work in the Banach space of µ-bounded operators,

hence all matrix operations mentioned in Remark 1.1, can be freely applied. We will do

so without further explicit referring.

1.2. Characterization of strong convergence.

From the foregoing analysis and the proof of Lemma 2.6 we conclude directly that the

convergence radii of (1 − z)Pij(z) and∑n(P

nij − Πij)z

n are equal and independent of

i, j ∈ E, if geometric ergodicity is assumed. Under the assumption of µ− GE this can be

generalized to a similar statement on the matrix functions (1−z)P (z) and∑∞n=0(P

n−Π)zn

as operators in the space of µ-bounded linear operators.

To this end we need some terminology from the theory of the inverse of a closed linear

operator on a normed Banach space, which we borrow from Yosida [1980]. For a µ-bounded

matrix A, the resolvent set ρ(A) is the set λ ∈ C | λI−A has a µ-bounded inverse. The

inverse (λI −A)−1 is denoted by R(λ,A), which is the resolvent of A at λ. Then R(λ,A)

is an analytic matrix function in λ ∈ ρ(A) (to the set of µ-bounded matrices on E × E).

σ(A) =(ρ(A)

)cis the so-called spectrum of A. The spectral radius rσ(A) is defined

as limn→∞ ‖An‖1/nµ . It can be shown that rσ(A) = sup|λ| | λ ∈ σ(A). Moreover,∑∞n=0 λ

−nAn converges in µ-norm for |λ| > rσ(A), and ‖An‖µλ−n → 0, for n → ∞. In

this case R(λ,A) is equal to λ−1∑∞n=0 λ

−nAn. For |λ| < rσ(A) the summation diverges.

Page 75: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

The Region of Convergence of (1− z)P (z) 65

These results can be found quite extensively in Yosida [1980], pp. 209-212. For convenience

we will recapitulate them in a proposition, where we insert z = λ−1 for consistence with

our previous notation. Denote 1/rσ(A) by r−1σ (A).

Proposition 4.1

i) R(z−1, A) has an analytic matrix expansion in z−1 ∈ ρ(A). Thus, ‖R(z−1, A)‖µ is

continuous in z−1 ∈ ρ(A).

ii) R(z−1, A) = z∑∞n=0A

nzn for z ∈ D0,r−1σ (A), and ‖An‖µzn → 0, for n→∞.

iii)∑∞n=0A

nzn diverges for |z| > r−1σ (A).

Assume µ−GE and insert A = (P −Π). Then ‖Π‖µ <∞. Combination with Proposition

4.1 yields that the radius of convergence (in µ-norm) of∑∞n=0(Pn − Π)zn =

∑∞n=0(P −

Π)nzn − Π equals r−1σ (P − Π). Moreover, r−1

σ (P − Π) > 1 and by (4.2) (1 − z)P (z) has

the following elementwise analytic continuation in D0,r−1σ (P−Π)

(1− z)P (z) = (1− z)∞∑n=0

(Pn −Π)zn + Π.

By virtue of Proposition 4.1 we conclude that the operator (1−z)P (z) is a µ-bounded and

analytic matrix function in z ∈ D0,r−1σ (P−Π). This is in fact tantamount to the following

condition.

Condition 4.2:

i) There is an R > 1, such that (1− z)Pij(z) can be analytically continued in the disk

D0,R, ∀ i, j ∈ E.

ii) sup‖(1− z)P (z)‖µ | z ∈ C0,x <∞, ∀x ∈ (0, R).

Clearly, by Proposition 4.1i) and the foregoing remarks the µ − GE property implies the

continuity of ‖(1 − z)P (z)‖µ in D0,r−1σ (P−Π). Thus Condition 4.2 is satisfied for R =

r−1σ (P −Π).

Conversely, let us assume Condition 4.2. As in the proof of Theorem 4.1, (1− z)P (z) has

an elementwise unique Taylor expansion in z = 0, which is given by

(1− z)P (z) = I +

∞∑n=1

(Pn − Pn−1)zn.

Application of Cauchy’s integral theorem (cf. Titchmarsh [1939], pp. 80-84) yields for

x ∈ (0, R)

Pnij − P

n−1ij =

1

2πi

∮C0,x

z−n−1(1− z)Pij(z)dz

=1

2πix−n

2π∫0

ie−niφ(1− xeiφ)Pij(xeiφ)dφ, ∀n ∈ IN.

Hence, the Fubini-Tonelli theorem (Royden, [1968], pp. 269-270) gives for x ∈ (0, R)

Page 76: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

66 Part I, Chapter 4

∑j∈E|Pnij − P

n−1ij |µj ≤

1

∑j∈E

x−n2π∫0

|e−niφ| · |(1− xeiφ)Pij(xeiφ)|dφ · µj

=1

2πx−n

2π∫0

∑j∈E|(1− xeiφ)Pij(xe

iφ)|µjdφ

≤ x−n sup‖(1− z)P (z)‖µ | z ∈ C0,xµi, n ∈ IN,

so that for sx := sup‖(1− z)P (z)‖µ | z ∈ C0,x‖Pn − Pn−1‖µ ≤ x−nsx, ∀n ∈ IN, x ∈ (0, R). (4.5)

An analogous reasoning as in the proof of Theorem 4.1 cannot be contrived, since we would need that $\|P^n - \Pi\|_\mu \to 0$ as $n \to \infty$. Therefore we have to resort to the theory of resolvent operators. Obviously, $P(z)$ has a meromorphic continuation in $D_{0,R}\setminus\{1\}$ as a µ-bounded matrix function, which satisfies
$$P(z) = \frac{1}{1-z}I + \frac{1}{1-z}\sum_{n=1}^\infty (P^n - P^{n-1})z^n, \qquad (4.6)$$
with $z = 1$ an isolated pole of order at most 1. Let $x \in (1,R)$; then (4.5) implies that
$$\|P^n\|_\mu \le 1 + \|P - I\|_\mu + \|P^2 - P\|_\mu + \cdots + \|P^n - P^{n-1}\|_\mu \le \frac{s_x}{x-1}, \qquad n \in \mathbb{N}.$$
Consequently, $P$ is a µ-bounded linear operator and $r_\sigma(P) = \lim_{n\to\infty}\|P^n\|_\mu^{1/n} \le \lim_{n\to\infty}\bigl(s_x/(x-1)\bigr)^{1/n} = 1$. Suppose that $r_\sigma(P) < 1$. By Proposition 4.1, $\|P^n\|_\mu z^n \to 0$ as $n \to \infty$, in particular for $z = 1$. Fix $i \in E$ and choose $\varepsilon > 0$, $n_0 \in \mathbb{N}$, such that $\varepsilon\mu_i < 1$ and $\|P^{n_0}\|_\mu < \varepsilon$. As $\mu_j \ge 1$ for all $j \in E$, $\sum_{j\in E}P^{n_0}_{ij} \le \sum_{j\in E}P^{n_0}_{ij}\mu_j \le \varepsilon\mu_i < 1$. This contradicts the stochasticity of $P$, so that $r_\sigma(P) = 1$ and $z = 1$ is a pole of order precisely 1 of $P(z)$. Thus $\Pi \not\equiv 0$.

It is straightforward to check for $z \in D_{0,R}\setminus\{1\}$ that the extension (4.6) of $P(z)$ satisfies $P(z)(I - zP) = (I - zP)P(z) = I$. The necessary matrix operations are allowed due to (4.5) and Remark 1.1. This demonstrates that $zP(z)$ is the resolvent $R(z^{-1},P)$ for $z \in D_{0,R}\setminus\{0,1\}$. By virtue of Dunford's integral theorem (cf. Yosida [1980], pp. 227-230) the Laurent expansion of $R(z^{-1},P)$ exists in $z = 1$, i.e. there are µ-bounded matrices $A_{-1}$, $A_0$, such that
$$R(z^{-1},P) = \frac{z}{1-z}A_{-1} + \sum_{n=0}^\infty \Bigl(\frac{z-1}{z}\Bigr)^n A_0^{n+1}, \qquad \Bigl|\frac{z-1}{z}\Bigr| < \|A_0\|_\mu^{-1}, \qquad (4.7)$$
where $A_0A_{-1} = A_{-1}A_0 = 0$ and $(P-I)A_0 = A_{-1}$. By the µ-boundedness of the operators we can multiply (4.7) on the right-hand side with $z(I - z^{-1}P)$. Reversing the order of multiplication and summation, and comparing coefficients, we also establish $A_0(P-I) = A_{-1}$. Moreover, invoking (4.1) we easily verify that $A_{-1} = \Pi$. Similarly, by (4.3) and the first equality in (4.4) we obtain analogously that $A_0 = D$.

Define the operator $O(z) := zP(z) - \frac{z^2}{1-z}\Pi$. We will show that $O(z)$ is the resolvent $R(z^{-1},P-\Pi)$ for $z \in D_{0,R}\setminus\{0\}$. Indeed, for $z \in D_{0,R}\setminus\{0,1\}$ it is a µ-bounded operator with
$$O(z)\bigl(z^{-1}I - (P-\Pi)\bigr) = I + zP(z)\Pi - \frac{z}{1-z}\Pi = I + \frac{z}{1-z}\Pi + \frac{z}{1-z}\sum_{n=1}^\infty (P^n - P^{n-1})z^n\,\Pi - \frac{z}{1-z}\Pi = I,$$
since $(P^n - P^{n-1})\Pi = 0$; with the usual arguments all matrix operations are justified by (4.5) and Remark 1.1. Analogously we derive $\bigl(z^{-1}I - (P-\Pi)\bigr)O(z) = I$. Thus $O(z)$ is the resolvent $R(z^{-1},P-\Pi)$ for $z \in D_{0,R}\setminus\{0,1\}$. Moreover, for $z$ in a neighbourhood of 1,
$$O(z) = R(z^{-1},P) - \frac{z^2}{1-z}\Pi = z\Pi + \sum_{n=0}^\infty \Bigl(\frac{z-1}{z}\Bigr)^n D^{n+1}.$$
So, the matrix function $O(z)$ has an analytic continuation in $z = 1$, with $O(1) = \Pi + D$. Then $O(1)\bigl(I - (P-\Pi)\bigr) = \Pi + D(I-P) = I$, and accordingly $\bigl(I - (P-\Pi)\bigr)O(1) = I$. We conclude that $R(z^{-1},P-\Pi) = O(z)$ for $z \in D_{0,R}\setminus\{0\}$. Hence $r_\sigma^{-1}(P-\Pi) \ge R$ and the µ−GE property holds for $\beta > R^{-1}$. This proves the following theorem.

Theorem 4.2 Condition 4.2 and µ−GE are equivalent. Moreover, the maximal value of $R$ is $r_\sigma^{-1}(P-\Pi)$.

Notice that Condition 4.2 implies analyticity of $(1-z)P(z)$ as a matrix function.

Let $\beta := \sup\{|\lambda| \mid \lambda \in \sigma(P),\ \lambda \ne 1\}$. In the proof of the theorem we argued that the resolvent $R(z^{-1},P)$ is a µ-bounded operator in $D_{0,r_\sigma^{-1}(P-\Pi)}\setminus\{0,1\}$. Hence $r_\sigma^{-1}(P-\Pi) \le \beta^{-1}$. By virtue of Proposition 4.1i) the resolvent is an analytic operator in $z^{-1} \in \rho(P)$. Thus, if $\beta^{-1} > r_\sigma^{-1}(P-\Pi)$, $(1-z)P(z)$ has an analytic continuation beyond the disk $D_{0,r_\sigma^{-1}(P-\Pi)}$ and we can choose $R = \beta^{-1}$. But then $r_\sigma^{-1}(P-\Pi) \ge \beta^{-1}$. In this way we have recovered a slightly weaker version of a well-known theorem from the theory of finite MC's.

Corollary 4.1 µ−GE implies that $r_\sigma(P-\Pi) = \beta$.

For finite MC's it states that the spectral radius of $(P-\Pi)$ is the modulus of the second largest eigenvalue of $P$ (cf. Cinlar [1975], pp. 376-379). In this case $\beta$ is even the smallest contraction factor. To achieve this, essential use is made of $P$ being an operator in a finite dimensional linear space. Corollary 4.1 generalizes a theorem by Isaacson & Luecke [1978], who proved the assertion for $\mu = e$, i.e. the supremum norm. Moreover, it extends a result proposed by Dekker [1985b].
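For a finite chain Corollary 4.1 is easy to check numerically. The following sketch is our own illustration (the 3-state matrix is invented, not taken from the text); it compares the spectral radius of $P-\Pi$ with the modulus of the second largest eigenvalue of $P$, and with the decay rate of $\|P^n-\Pi\|$ in the supremum ($e$-) norm.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],      # an arbitrary ergodic 3-state chain (illustration only)
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
e = np.ones(3)
pi = np.linalg.solve((np.eye(3) - P + np.ones((3, 3))).T, e)   # pi (I - P + ee^T) = e^T
Pi = np.outer(e, pi)                                           # stationary matrix

r_sigma = max(abs(np.linalg.eigvals(P - Pi)))                  # spectral radius of P - Pi
beta = sorted(abs(np.linalg.eigvals(P)))[-2]                   # second largest eigenvalue modulus
print(r_sigma, beta)                                           # Corollary 4.1: these coincide

Pn = np.eye(3)
for n in range(1, 6):                                          # ||P^n - Pi|| decays like beta^n
    Pn = Pn @ P
    print(n, np.abs(Pn - Pi).sum(axis=1).max() / beta ** n)    # ratio stays bounded
```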


1.3. Elementary construction of the Laurent expansion

Several methods can be used to obtain an expression for the Laurent expansion in z = 1 of

P (z). The analysis in the previous subsection shows that the theory of resolvent operators

may be invoked directly. This approach is taken for finite MC’s by Veinott in his paper

from 1969. Under the assumption of the µ−GE property (or µ−GR in Dekker & Hordijk

[1989]), the method used by Dekker & Hordijk [1988] is essentially to insert the expression

derived by Veinott in the equation P (z) = I + zP (z). Our approach is different from

Dekker & Hordijk’s, as we use a straightforward computation of the Taylor expansion of

D(z) in z = 1 (cf. (4.3)).

Since $D(1)$ is the deviation matrix $D$ (cf. (4.4)), we will study it first. The expression for

D in (4.4) and (4.7) et seq. shows D to have the following property under the µ − GE

property.

Property 4.1:

i) ΠD = DΠ = 0.

ii) (I − P )D = D(I − P ) = I −Π.

The following lemma proves that D can be determined uniquely by solving equations i)

and ii).

Lemma 4.1 Under condition µ − GE the deviation matrix D is the unique µ-bounded

matrix with Property 4.1.

Proof: By (4.7) we only need show unicity. By the µ − GE property there is a constant

c′ > 0, such that ‖Pn‖µ, ‖Π‖µ ≤ c′. Let D′ be another µ-bounded matrix with Property

4.1, then

D −D′ = P (D −D′).

Iteration of this equality yields

D −D′ = Pn(D −D′), n ∈ IN.

Obviously, µ− GE implies the convergence of Pnµ to Πµ, for n tending to infinity. Since

D−D′ is µ-bounded, application of the theorem on dominated convergence (cf. Royden)

establishes

D −D′ = limn→∞

Pn(D −D′) = Π(D −D′) = 0,

so that D = D′.
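As an aside, and purely as our own finite-state illustration of Lemma 4.1 (not part of the original argument), Property 4.1 can be solved directly for a small chain and compared with the series $\sum_n(P^n-\Pi)$:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],      # same illustrative chain as in the previous sketch
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
e = np.ones(3)
pi = np.linalg.solve((np.eye(3) - P + np.ones((3, 3))).T, e)
Pi = np.outer(e, pi)

# Property 4.1 pins D down: (I - P + Pi) D = I - Pi has a unique solution, and that
# solution automatically satisfies Pi D = D Pi = 0.
D = np.linalg.solve(np.eye(3) - P + Pi, np.eye(3) - Pi)

# Cross-check against the (truncated) series sum_n (P^n - Pi).
S, Pn = np.zeros((3, 3)), np.eye(3)
for _ in range(200):
    S += Pn - Pi
    Pn = Pn @ P
print(np.allclose(D, S, atol=1e-8))                                   # True
print(np.allclose(Pi @ D, 0), np.allclose(D @ (np.eye(3) - P), np.eye(3) - Pi))
```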

For our further analysis we need a technical lemma first.

Lemma 4.2
i) $\displaystyle \Bigl(\frac{1}{1-x}\Bigr)^k = \sum_{m=0}^\infty \binom{m+k-1}{m}x^m$, $|x| < 1$, $k \in \mathbb{N}_0$.
ii) $\displaystyle \sum_{m=0}^n \binom{m+k}{m} = \binom{n+k+1}{n}$, $n,k \in \mathbb{N}_0$.

Proof: Assertion i) is obtained by taking the $(k-1)$th derivative of $(1-x)^{-1}$ and of $\sum_n x^n$. As both expressions represent the same function, their derivatives are the same. For the proof of ii) we use that $(1-x)^{-k}\cdot(1-x)^{-1} = (1-x)^{-k-1}$, for $k \ge 0$. Inserting the Taylor expansions of the three functions, we obtain the desired result by comparing the $n$th Taylor coefficients of the product on the left-hand side and the expression on the right-hand side.

For $D^n$, the $n$th power of $D$, we derive the following formula, which can be found in Veinott [1969] for finite MC's as well.

Lemma 4.3
$$D^n = \sum_{m=0}^\infty \binom{m+n-1}{m}\bigl(P^m - \Pi\bigr), \qquad n \in \mathbb{N}. \qquad (4.8)$$

Proof: Notice first that the µ-boundedness of $D$ implies the finiteness and µ-boundedness of $D^n$, $n \in \mathbb{N}$. Moreover, as $\|P^m - \Pi\|_\mu \le c\beta^m$, the right-hand side of (4.8) is µ-bounded by $\sum_{m=0}^\infty \binom{m+n-1}{m}c\beta^m$, which converges by Lemma 4.2. Thus both sides of (4.8) are defined for any $n \in \mathbb{N}$.

For $n = 1$ the assertion is obviously true. Assuming that it is true for some value $n$, we will show the validity of the assertion for $n+1$:
$$D^{n+1}_{ij} = \sum_{k\in E} D_{ik}D^n_{kj} = \sum_{s,m=0}^\infty \binom{m+n-1}{m}\bigl(P^{s+m}_{ij} - \Pi_{ij}\bigr) = \sum_{s=0}^\infty \sum_{m=0}^s \binom{m+n-1}{m}\bigl(P^s_{ij} - \Pi_{ij}\bigr) = \sum_{s=0}^\infty \binom{s+n}{s}\bigl(P^s_{ij} - \Pi_{ij}\bigr),$$
where we use summability and the Fubini-Tonelli theorem (cf. Royden pp. 269-270) for the second and third equalities and Lemma 4.2ii) for the fourth.

Let $D^{(n)}(z)$ denote the matrix with the $n$th derivatives of $D_{ij}(z)$, $i,j \in E$, as elements. We will study the Taylor expansion of $D(z)$ in $z = 1$. Compute the $n$th derivative; then by virtue of Lemma 4.3 and Property 4.1i) the $n$th Taylor coefficient is in matrix notation equal to
$$\frac{1}{n!}D^{(n)}(1) = \sum_{m=n}^\infty \binom{m}{m-n}(P-\Pi)^m = \sum_{m=0}^\infty \binom{m+n}{m}(P-\Pi)^m\cdot(P-\Pi)^n = D^{n+1}(P-\Pi)^n = D^{n+1}P^n. \qquad (4.9)$$
Furthermore, we assert that
$$D^{n+1}P^n = \sum_{m=0}^n \binom{n}{m}(-1)^{n-m}D^{m+1}, \qquad n \in \mathbb{N}_0. \qquad (4.10)$$

For $n = 0$ it is clearly true. For $n = 1$ the use of Property 4.1 yields
$$D^2P = D(DP) = D(D - I + \Pi) = D(D - I) = D^2 - D,$$
so that (4.10) is satisfied. Assume that the assertion is true for $n-1$, for some $n \ge 2$, and notice that $D$ and $P$ commute by Property 4.1. Then
$$D^{n+1}P^n = DP\,(D^nP^{n-1}) = (D - I + \Pi)(D^nP^{n-1}) = (D-I)(D^nP^{n-1})$$
$$= (D-I)\sum_{m=0}^{n-1}\binom{n-1}{m}(-1)^{n-1-m}D^{m+1}$$
$$= D^{n+1} + (-1)^nD + \sum_{m=0}^{n-2}\binom{n-1}{m}(-1)^{n-1-m}D^{m+2} - \sum_{m=1}^{n-1}\binom{n-1}{m}(-1)^{n-1-m}D^{m+1}$$
$$= D^{n+1} + (-1)^nD + \sum_{m=1}^{n-1}\Bigl\{\binom{n-1}{m-1}(-1)^{n-m} - \binom{n-1}{m}(-1)^{n-1-m}\Bigr\}D^{m+1}$$
$$= \sum_{m=0}^n \binom{n}{m}(-1)^{n-m}D^{m+1}.$$

Combination of (4.9) and (4.10) yields for the Taylor expansion of $D(z)$ in $z = 1$
$$D(z) = \sum_{n=0}^\infty \frac{1}{n!}D^{(n)}(1)(z-1)^n = \sum_{n=0}^\infty \sum_{m=0}^n \binom{n}{m}(-1)^{n-m}D^{m+1}(z-1)^n$$
$$= \sum_{m=0}^\infty D^{m+1}(z-1)^m\sum_{n=m}^\infty \binom{n}{m}(1-z)^{n-m} = \sum_{m=0}^\infty D^{m+1}(z-1)^m\sum_{n=0}^\infty \binom{n+m}{m}(1-z)^n$$
$$= \frac{1}{z}\sum_{m=0}^\infty D^{m+1}\Bigl(\frac{z-1}{z}\Bigr)^m, \qquad z \in D_{1,(1+\|D\|_\mu)^{-1}},$$
where the Fubini-Tonelli theorem together with the summability of the arguments allows the third equality. Moreover, the bound on $z$ is chosen such as to ensure convergence of the right-hand side of the second and third equalities. This shows the following theorem.

Theorem 4.3 Consider a MC satisfying µ−GE, for some bounding vector µ with $\mu_i \ge 1$, $\forall i \in E$. Then the Laurent expansion of $P(z)$ exists and satisfies
$$P(z) = \frac{\Pi}{1-z} + \frac{1}{z}\sum_{n=0}^\infty \Bigl(\frac{z-1}{z}\Bigr)^n D^{n+1}, \qquad z \in D_{1,(1+\|D\|_\mu)^{-1}}. \qquad (4.11)$$


All separate terms in the foregoing summation are µ-bounded and their summation converges for appropriate $z$. Consequently, for any reward vector $r$ on $E$ with $\|r\|_\mu < \infty$, and $z \in D_{0,1}\cap D_{1,(1+\|D\|_\mu)^{-1}}$,
$$V^z_i := \mathbb{E}_i\sum_{n=0}^\infty r_{X(n)}z^n = \sum_{n=0}^\infty \sum_{j\in E}P^n_{ij}r_j\,z^n = \bigl(P(z)r\bigr)_i = \frac{1}{1-z}(\Pi r)_i + \frac{1}{z}\sum_{n=0}^\infty \Bigl(\frac{z-1}{z}\Bigr)^n\bigl(D^{n+1}r\bigr)_i$$
$$= (1+\rho)\Bigl\{\frac{1}{\rho}(\Pi r)_i + \sum_{n=0}^\infty (-1)^n\rho^n\bigl(D^{n+1}r\bigr)_i\Bigr\}, \qquad \rho = \frac{1-z}{z},\ |\rho| < \|D\|_\mu^{-1}. \qquad (4.12)$$
This expression is well-known from the theory of MDC's. Indeed, for $\alpha \in (0,1)$, $V^\alpha_i$ is the expected α-discounted reward. Since $P^n$, $\Pi$ are uniformly µ-bounded, an Abelian theorem (cf. Titchmarsh pp. 224-229) can be applied to establish the convergence of the Cesaro sums $\frac{1}{N+1}\sum_{n=0}^N (P^nr)_i$ to $(\Pi r)_i$, as $N$ tends to infinity. So, the latter expression is the expected average reward, when the system starts in state $i \in E$ at time 0. Hence, the expected average reward is determined by the behaviour of the Laurent expansion of the α-discounted rewards for α in the vicinity of 1. In this case the $\rho$ in (4.12) can be interpreted as the interest rate.
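As a numerical illustration of (4.12) (our own sketch; the chain and the reward vector are invented), the α-discounted reward computed directly can be compared with a truncation of the Laurent expansion in the interest rate ρ:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],      # illustrative chain, as in the earlier sketches
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 5.0, 2.0])       # a mu-bounded reward vector
e = np.ones(3)
pi = np.linalg.solve((np.eye(3) - P + np.ones((3, 3))).T, e)
Pi = np.outer(e, pi)
D = np.linalg.solve(np.eye(3) - P + Pi, np.eye(3) - Pi)

alpha = 0.95
rho = (1 - alpha) / alpha                                      # interest rate
V_direct = np.linalg.solve(np.eye(3) - alpha * P, r)           # V^alpha = (I - alpha P)^{-1} r
V_laurent = (1 + rho) * ((Pi @ r) / rho
                         + sum((-rho) ** n * (np.linalg.matrix_power(D, n + 1) @ r)
                               for n in range(30)))
print(np.max(np.abs(V_direct - V_laurent)))   # agrees up to machine precision when rho*||D|| < 1
```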

The Laurent expansion was first exploited by Miller & Veinott [1969] and Veinott [1969] to establish the existence of sensitive optimal policies for finite MDC's. Later it was generalized to denumerable MDC's with (un)bounded rewards, which will be discussed in more detail in Part II. We only mention the papers by Dekker & Hordijk [1988], [1989], which are at the basis of this monograph. They proved the existence of the Laurent expansion of the α-discounted rewards under both strong convergence and strong recurrence conditions. By virtue of the Key theorem the existence of the Laurent expansion is also guaranteed for aperiodic MC's that satisfy property µ−GR. In the next section we will analyse strong recurrence and its relation to the Laurent expansion of $P(z)$ in $z = 1$.

2. Analyticity in the disk D1,R.

2.1. The Laurent expansion through data transformation.

Assume µ−GR for the finite set $M$ and $\beta < 1$. We consider a transformed MC (cf. Schweitzer [1971]) with transition matrix $\widetilde P$, which is defined as follows:
$$\widetilde P = \lambda I + (1-\lambda)P, \qquad \lambda \in (0,1).$$
Thus $\widetilde P$ is the transition matrix of an aperiodic MC. The µ−GR property is preserved for this chain for the same finite set $M$, as
$$\|{}_M\widetilde P\|_\mu \le \lambda + (1-\lambda)\|{}_MP\|_\mu \le \lambda + (1-\lambda)\beta < 1.$$
So, by virtue of the Key theorem the transformed MC is µ−GE. Application of Theorem 4.2 establishes the existence of the Laurent expansion of $\widetilde P_{ij}(z) = \sum_n \widetilde P^n_{ij}z^n$ in $z = 1$; in formula
$$\widetilde P_{ij}(z) = \frac{\widetilde\Pi_{ij}}{1-z} + \frac{1}{z}\sum_{n=0}^\infty \Bigl(\frac{z-1}{z}\Bigr)^n\widetilde D^{n+1}_{ij}, \qquad z \in D_{1,(1+\|\widetilde D\|_\mu)^{-1}},\ \forall i,j \in E,$$
where $\widetilde\Pi$, $\widetilde D$ denote the stationary matrix and the deviation matrix of the transformed MC respectively.

It is well-known that $\widetilde\Pi = \Pi$, as
$$\widetilde P\,\Pi = \Pi \iff \bigl(\lambda I + (1-\lambda)P\bigr)\Pi = \Pi \iff P\Pi = \Pi,$$
so that we will use $\Pi$ in the sequel to denote the stationary matrix of the transformed MC.

By Lemma 4.1 $\widetilde D$ is the unique µ-bounded matrix with Property 4.1 for the transformed MC. Straightforward insertion yields that $\widetilde D$ satisfies Property 4.1 for the transformed MC iff $A = (1-\lambda)\widetilde D$ satisfies Property 4.1 for the original MC. Obviously $A$ is µ-bounded. Moreover, by the unicity of $\widetilde D$, $A$ is unique. Thus we have established the existence and unicity of a µ-bounded matrix $A$ that satisfies Property 4.1 for the original MC.

We still have to show that $A$ is indeed the deviation matrix. To this end we use an elegant argument due to G. Koole. Consider the equation
$$A = I - \Pi + PA.$$
Subtract $xPA$ on both sides, for some $x \in (0,1)$. This gives
$$(I - xP)A = (I - \Pi) + (1-x)PA.$$
Multiply with $\sum_n P^nx^n$; then
$$A = \sum_{n=0}^\infty P^nx^n(I-\Pi) + (1-x)\sum_{n=0}^\infty P^nx^n\cdot PA = \sum_{n=0}^\infty (P^n - \Pi)x^n + (1-x)\sum_{n=0}^\infty P^nx^n\cdot PA, \qquad \forall x \in (0,1). \qquad (4.13)$$
Hence
$$A = \lim_{x\uparrow 1}\Bigl\{\sum_{n=0}^\infty (P^n-\Pi)x^n + (1-x)\sum_{n=0}^\infty P^nx^n\cdot PA\Bigr\},$$
so that the deviation matrix $D$ exists and is equal to $A$ if and only if
$$\lim_{x\uparrow 1}(1-x)\sum_{n=0}^\infty P^nx^n\cdot PA = 0. \qquad (4.14)$$


Since $P$, $A$ are µ-bounded operators it suffices to show that
$$\lim_{x\uparrow 1}(1-x)\sum_{n=0}^\infty P^nx^n\mu = \Pi\mu. \qquad (4.15)$$
Indeed, combination of (4.15) and Property 4.1i) yields (4.14) by dominated convergence.

We invoke Theorem 2.1 on the uniform integrability of the sequence $\{P^n_{i\cdot} \mid n \in \mathbb{N}_0\}$ for $i \in E$. For any $\varepsilon > 0$ there is a finite set $K(\varepsilon,i)$, such that $\sum_{j\notin K(\varepsilon,i)}P^n_{ij}\mu_j \le \varepsilon$, for $n \in \mathbb{N}_0$. Together with (4.1) this establishes for any $\varepsilon > 0$
$$\limsup_{x\uparrow 1}(1-x)\sum_{n=0}^\infty x^n\sum_j P^n_{ij}\mu_j \le \limsup_{x\uparrow 1}(1-x)\sum_{n=0}^\infty x^n\sum_{j\in K(\varepsilon,i)}P^n_{ij}\mu_j + \varepsilon = \sum_{j\in K(\varepsilon,i)}\Pi_{ij}\mu_j + \varepsilon \le \sum_{j\in E}\Pi_{ij}\mu_j + \varepsilon. \qquad (4.16)$$
By Fatou's lemma we also have
$$\sum_{j\in E}\Pi_{ij}\mu_j \le \liminf_{x\uparrow 1}(1-x)\sum_{n=0}^\infty x^n\sum_{j\in E}P^n_{ij}\mu_j. \qquad (4.17)$$
(4.16) and (4.17) together imply (4.15) and thus (4.14). This proves the following lemma.

Lemma 4.4 Under the µ − GR property the deviation matrix exists and is the unique

µ-bounded matrix that satisfies Property 4.1.

The data transformation enables us to show the existence of the Laurent expansion in

z = 1 as well. Thus it provides an alternative proof for the same result in Dekker &

Hordijk [1989].
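A quick numerical check of the data transformation for a small finite chain (again our own illustration, not part of the thesis): $\widetilde P = \lambda I + (1-\lambda)P$ has the same stationary matrix, and its deviation matrix equals $D/(1-\lambda)$.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
e = np.ones(3)
pi = np.linalg.solve((np.eye(3) - P + np.ones((3, 3))).T, e)
Pi = np.outer(e, pi)
D = np.linalg.solve(np.eye(3) - P + Pi, np.eye(3) - Pi)

lam = 0.4
P_t = lam * np.eye(3) + (1 - lam) * P                          # transformed (aperiodic) chain
Pi_t = np.outer(e, np.linalg.solve((np.eye(3) - P_t + np.ones((3, 3))).T, e))
D_t = np.linalg.solve(np.eye(3) - P_t + Pi_t, np.eye(3) - Pi_t)

print(np.allclose(Pi_t, Pi))              # same stationary matrix
print(np.allclose(D_t, D / (1 - lam)))    # deviation matrix scales by 1/(1-lambda)
```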

In the first place we insert $\widetilde D = D/(1-\lambda)$ in the Laurent expansion of $\widetilde P_{ij}(z)$. Then for $z' := (1-\lambda)z/(1-\lambda z)$,
$$\widetilde P_{ij}(z) = \frac{\Pi_{ij}}{1-z} + \frac{1}{(1-\lambda)z}\sum_{n=0}^\infty\Bigl(\frac{z-1}{(1-\lambda)z}\Bigr)^nD^{n+1}_{ij} = \frac{1}{1-\lambda z}\Bigl\{\frac{\Pi_{ij}}{1-z'} + \frac{1}{z'}\sum_{n=0}^\infty\Bigl(\frac{z'-1}{z'}\Bigr)^nD^{n+1}_{ij}\Bigr\}, \qquad z' \in D_{1,(1+\|D\|_\mu)^{-1}}, \qquad (4.18)$$
as $z'-1 = (z-1)/(1-\lambda z)$ and $(z'-1)/z' = (z-1)/\bigl((1-\lambda)z\bigr)$.

Secondly we show that $(1-\lambda z)^{-1}P_{ij}(z')$ coincides with (4.18), for $z' \in D_{0,1}\cap D_{1,(1+\|D\|_\mu)^{-1}}$. Indeed, for $z \in D_{0,1}$
$$\widetilde P_{ij}(z) = \sum_{n=0}^\infty\bigl(\lambda I + (1-\lambda)P\bigr)^n_{ij}z^n = \sum_{n=0}^\infty\Bigl\{\sum_{m=0}^n\binom{n}{m}\lambda^{n-m}(1-\lambda)^mP^m_{ij}\Bigr\}z^n$$
$$= \sum_{m=0}^\infty P^m_{ij}(1-\lambda)^mz^m\sum_{n=m}^\infty\binom{n}{m}(\lambda z)^{n-m} = \frac{1}{1-\lambda z}\sum_{m=0}^\infty\Bigl[\frac{(1-\lambda)z}{1-\lambda z}\Bigr]^mP^m_{ij} = \frac{1}{1-\lambda z}\,P_{ij}(z'), \qquad (4.19)$$
where the Fubini-Tonelli theorem is used for the third equality and Lemma 4.2 for the fourth. Observe that there are no convergence problems, as $|z'| < 1$ iff $|z| < 1$.

Combination of (4.18) and (4.19) through the Laurent expansion of $\widetilde P_{ij}(z)$ shows that (4.18) is a meromorphic continuation of $P_{ij}(z')$ in the disk $D_{1,(1+\|D\|_\mu)^{-1}}$, $\forall i,j \in E$. This establishes the following theorem.

Theorem 4.4 Assume µ−GR. The Laurent expansion of $P_{ij}(z)$ exists for all $i,j \in E$ in a common disk $D_{1,R}$, for some $R > 0$, and satisfies (4.11).

2.2. Characterization of strong recurrence.

The conclusion of Theorem 4.4 implies the validity of an analogous condition to Condition

4.1, phrased in terms of the analyticity of (1− z)Pij(z).

Condition 4.3:

i) The MC is stable, ν <∞.

ii) (1 − z)Pij(z) can be analytically continued in a disk D1,R, for some R > 0 and

∀ i, j ∈ E.

Due to the equivalence of geometric ergodicity and Condition 4.1, the question immediately

forces itself upon us whether the converse is also true. The remainder of this section will

address this and related topics. First we will prove the following theorem.

Theorem 4.5 Condition 4.3 and strong recurrence are equivalent.

Proof: Strong recurrence implies Condition 4.3 by virtue of Theorem 4.4. So, we assume Condition 4.3. Due to Remark 1.2 to Proposition 1.3 a MC is strongly recurrent iff the recurrence times to a finite set are exponentially bounded.

Choose a finite set $B$ of reference states. It is sufficient to show the existence of an $R > 1$, such that
$$F_{iB}(x) = \sum_{n=1}^\infty F^{(n)}_{iB}x^n < \infty, \qquad x \in (1,R). \qquad (4.20)$$
Indeed, for $T$ the recurrence time to the set $B$, this gives
$$\mathbb{P}_i\{T > N\} = \sum_{n>N}F^{(n)}_{iB} \le F_{iB}(x)\,x^{-N},$$
so that $T$ is exponentially bounded and the strong recurrence property holds.

As $\lim_{z\to 1}(1-z)P_{bb}(z) = \Pi_{bb} > 0$ and $|B| < \infty$, there is an $r \in (0,R)$ such that $(1-z)P_{bb}(z) \ne 0$, $\forall z \in D_{1,r}$, $b \in B$. Define for $z \in D_{1,r}$ the following analytic continuation of the probability generating function $F_{ib}(z)$ (cf. (2.24), (2.25)):
$$F_{ib}(z) = \frac{(1-z)P_{ib}(z)}{(1-z)P_{bb}(z)}, \qquad i \notin B,\ b \in B,$$
$$F_{bb}(z) = 1 - \frac{1-z}{(1-z)P_{bb}(z)}, \qquad b \in B.$$


Our purpose is to show that the analytic continuation of $F_{iB}(z) = \sum_{b\in B}F_{ib}(z)$ in $D_{1,r}$ satisfies an expression similar to the right-hand side of (4.20). To this end we study the Taylor expansion of $F_{ib}(z)$ in $z = 1$,
$$F_{ib}(z) = \sum_{n=0}^\infty\frac{1}{n!}F^{(n)}_{ib}(1)(z-1)^n, \qquad z \in D_{1,r}, \qquad (4.21)$$
where $F^{(n)}_{ib}(z)$ denotes the $n$th derivative of $F_{ib}(z)$. Notice the slight ambiguity in notation with respect to $F^{(n)}_{ib}$ and $F^{(n)}_{ib}(z)$.

We derive an expression for $F^{(n)}_{ib}(1)$. For $z \in D_{0,1}$, the analyticity of $F_{ib}(z)$ and formula (4.20) yield
$$F^{(n)}_{ib}(z) = \sum_{m=n}^\infty\frac{m!}{(m-n)!}F^{(m)}_{ib}z^{m-n}. \qquad (4.22)$$
$F^{(n)}_{ib}(x)$ is monotonically non-decreasing in $x \in (0,1)$. The monotone convergence theorem (cf. Royden p. 227) together with the analyticity of $F^{(n)}_{ib}(z)$ in $D_{1,r}$ gives
$$F^{(n)}_{ib}(1) = \lim_{x\uparrow 1}F^{(n)}_{ib}(x) = \lim_{x\uparrow 1}\sum_{m=n}^\infty\frac{m!}{(m-n)!}F^{(m)}_{ib}x^{m-n} = \sum_{m=n}^\infty\frac{m!}{(m-n)!}F^{(m)}_{ib}.$$

Combining this with (4.21) for $x \in (1, 1+r)$ we achieve
$$F_{ib}(x) = \sum_{n=0}^\infty\sum_{m=n}^\infty\binom{m}{n}F^{(m)}_{ib}(x-1)^n = \sum_{m=0}^\infty F^{(m)}_{ib}\sum_{n=0}^m\binom{m}{n}(x-1)^n1^{m-n} = \sum_{m=0}^\infty F^{(m)}_{ib}x^m,$$
where the reversal of the order of summation in the second equality is justified by the Fubini-Tonelli theorem and the non-negativity of the terms in the summation over $n$, $m$. Because the states in $B$ do not communicate, $F_{iB}(z) = \sum_{b\in B}F_{ib}(z)$ converges in $D_{1,r}$. This establishes the assertion of the theorem.

The result of the theorem implies that the power series $F_{ib}(z)$ converges absolutely for $z \in D_{0,1+r}$, since its coefficients are non-negative reals. It is known (cf. Titchmarsh [1939]) that a power series is an analytic function within its region of convergence. Consequently $\sum_n F^{(n)}_{ib}z^n$ is the analytic continuation of $F_{ib}(z)$ in the disk $D_{0,1+r}$.

If we want to have the bounding vector of the strong recurrence property reflected in the

condition itself, we arrive at Condition 4.4 below. It is essentially the condition used by


Lasserre [1988] for the existence of average and Blackwell optimal policies in denumerable

MDC’s.

Condition 4.4: There is a bounding vector µ with $\mu_i \ge 1$ $\forall i \in E$, such that
i) $\nu < \infty$.
ii) $(1-z)P_{ij}(z)$ can be analytically continued in a common disk $D_{1,R}$ for some $R > 0$ and all $i,j \in E$.
iii) $\sup\{\|P(z)\|_\mu \mid z \in C_{1,x}\} < \infty$, $\forall x \in (0,R)$.

Lasserre also uses the condition $r_\sigma(P) = 1$. To me however it seems to be implied by Condition 4.4iii). Arguments similar to the proof of Theorem 4.2 lead to $\|P^n\|_\mu \ge 1$, for $n \in \mathbb{N}$, by the stochasticity of $P^n$ and the fact that $\mu_i \ge 1$, $\forall i \in E$. Suppose that $\|P^{n_k}\|_\mu^{1/n_k} \ge c$ for some $c > 1$ and an infinite subsequence $\{n_k\}_{k\in\mathbb{N}}$ of $\mathbb{N}$, with $n_k \to \infty$ as $k$ tends to infinity. Then $\|P^{n_k}\|_\mu \ge c^{n_k}$ and thus $\|P(x)\|_\mu = \infty$ for $x \in (c^{-1},1)$. This contradicts Condition 4.4iii). It is fairly obvious that analogous arguments lead to $\|P\|_\mu < \infty$. We will prove the following assertion.

Theorem 4.6 Condition 4.4 is equivalent to µ−WGR.

Proof: By Lemma 2.2 µ−WGR implies $\bar\mu$−GR for $\bar\mu = \sum_n {}_MP^n\mu$. Hence the result of Theorem 4.4 applies for the bounding vector $\bar\mu$. Since $\bar\mu$ is µ-bounded, all $\bar\mu$-bounded operators are µ-bounded. Consequently the result of Theorem 4.4 applies for µ as well and thus Condition 4.4 holds if we assume µ−WGR.

Assume Condition 4.4. First we will show that $zP(z)$ is the resolvent $R(z^{-1},P)$ for $z \in D_{1,R}\setminus\{1\}$. By arguments similar to the proof of Theorem 4.2, Condition 4.4 and Cauchy's integral theorem yield the existence of the Taylor expansion of the matrix function $(1-z)P(z)$ in $D_{1,R}$, say $(1-z)P(z) = \sum_{n=0}^\infty A(n)(z-1)^n$, for µ-bounded matrices $A(n)$, $n \in \mathbb{N}_0$.

Since $I - zP$ is a µ-bounded and analytic matrix function, the product $(1-z)P(z)(I-zP)$ can be expanded as a power series in $z = 1$. However, $(1-z)P(z)(I-zP) = (1-z)I$ for $z \in D_{0,1}\cap D_{1,R}$. Hence, all coefficients of $(z-1)^n$, $n \ne 1$, cancel in the expansion. Accordingly, $(I-zP)(1-z)P(z) = (1-z)I$ for $z \in D_{1,R}$, so that indeed the resolvent $R(z^{-1},P)$ exists on $D_{1,R}\setminus\{1\}$, and equals $zP(z)$.

So, similarly to the proof of Theorem 4.2 we can apply results from the theory of the inverse

of a closed linear operator on a normed Banach space. This implies that the deviation

matrix D exists and satisfies Property 4.1 (cf. (4.7) et seq.). Moreover, the Laurent

expansion of Pij(z) satisfies expression (4.11) and Π and D are µ-bounded operators.

We will first show stability of the MC. Consider a class $C$ and suppose it is not positive recurrent. Then $\forall i,j \in C$, $\Pi_{ij} = 0$ and
$$D_{ij} = \lim_{x\uparrow 1}\sum_{n=0}^\infty(P^n_{ij} - \Pi_{ij})x^n = \lim_{x\uparrow 1}\sum_{n=0}^\infty P^n_{ij}x^n = \sum_{n=0}^\infty P^n_{ij}, \qquad (4.23)$$
by virtue of the theorem of monotone convergence. Let $P_C$ denote the transition matrix restricted to the states of $C$. Since $D$ is µ-bounded, $\|\sum_n P_C^n\|_\mu < \infty$. By a reasoning similar to that in the proof of Lemma 2.2i) ("µ−BS(M) ⇒ µ−R(M)") we obtain the existence of an $n_0 \in \mathbb{N}$, such that $\|P_C^{n_0}\|_\mu < 1$. Thus, for any finite set $M \subset C$, $\|{}_MP_C^{n_0}\|_\mu \le \|P_C^{n_0}\|_\mu < 1$. Together with Corollary 2.1 we obtain that $C$ is positive recurrent. This contradicts our assumption, so that $C$ is positive recurrent after all.

Consider the set of states that are not contained in any class. Because these are transient by Lemma 2.1, we denote this set by $T$ and we know that $\Pi_{ij} = 0$, $\forall j \in T$. Consequently (4.23) holds for $i,j \in T$ as well. Choose a set $B$ of reference states and let $C_b$ be the positive recurrent class of $b \in B$. We analyse a transformed MC with state space $T \cup B$ and transition matrix $\widehat P$ defined by
$$\widehat P_{ij} = P_{ij}, \quad i,j \in T; \qquad \widehat P_{ib} = \sum_{j\in C_b}P_{ij}, \quad i \in T,\ b \in B; \qquad \widehat P_{bb} = 1, \quad b \in B.$$
Obviously $\widehat P^n_{ij} = {}_B\widehat P^n_{ij}$, $i,j \in T$. Using the µ-boundedness of $D$ and (4.23), we obtain for $i \in T$
$$\sum_{n\in\mathbb{N}_0}\sum_{j\in T}{}_B\widehat P^n_{ij}\,\mu_j = \sum_{j\in T}|D_{ij}|\,\mu_j \le \|D\|_\mu\,\mu_i.$$
Since $({}_B\widehat P^n\mu)_b = 0$, for $b \in B$, $n \in \mathbb{N}_0$, we conclude that
$$\Bigl\|\sum_{n=0}^\infty{}_B\widehat P^n\Bigr\|_\mu \le \|D\|_\mu,$$
so that condition µ−BS holds for the transformed MC. Application of Lemmas 2.2, 2.3 yields $\widehat F_{iB} = 1$, for $i \in T$. Since the behaviour of the MC and the transformed MC within the set $T$ is the same, we obtain that $F_{i,\cup_{b\in B}C_b} = 1$, $i \in E$, for the original MC. Thus stability follows directly and $z = 1$ is a simple, isolated pole.

Our next objective is to show that the MC satisfies property µ−BS(B). Similarly as before, combination with Lemma 2.2 yields that µ−WGR is satisfied in this case. Use the first entrance decomposition to the set $B$, i.e.
$$P^n_{ij} = {}_BP^n_{ij} + \sum_{b\in B}\sum_{m=0}^{n-1}({}_BP^mP)_{ib}P^{n-m-1}_{bj} = {}_BP^n_{ij} + \sum_{b\in B}\sum_{m=1}^n F^{(m)}_{ib}P^{n-m}_{bj}.$$
For $x \in (0,1)$ we thus obtain
$$P_{ij}(x) = \sum_{n=0}^\infty{}_BP^n_{ij}x^n + \sum_{b\in B}F_{ib}(x)P_{bj}(x),$$
since $F_{ib}(x)$, $P_{bj}(x)$ are absolutely convergent series. Denote again by $D_{ij}(x)$ the analytic part of $P_{ij}(x)$ (cf. (4.3)); then
$$\sum_{n=0}^\infty{}_BP^n_{ij}x^n = P_{ij}(x) - \sum_{b\in B}F_{ib}(x)P_{bj}(x) = \frac{1}{1-x}\Bigl(\Pi_{ij} - \sum_{b\in B}F_{ib}(x)\Pi_{bj}\Bigr) + D_{ij}(x) - \sum_{b\in B}F_{ib}(x)D_{bj}(x). \qquad (4.24)$$

Rewrite the first expression on the right-hand side of (4.24):
$$\frac{1}{1-x}\Bigl(\Pi_{ij} - \sum_{b\in B}F_{ib}(x)\Pi_{bj}\Bigr) = \sum_{b\in B}\frac{F_{ib}(1) - F_{ib}(x)}{1-x}\,\Pi_{bj},$$
by the stability of the MC. In the proof of Theorem 4.5 we showed the existence of an $R > 1$, such that $F_{ib}(z) = \sum_{n=1}^\infty F^{(n)}_{ib}z^n$ is analytic in $D_{0,R}$, for $i \in E$, $b \in B$. So, for any $z \in D_{0,R}$ all derivatives of $F_{ib}(z)$ exist. Using the notation of the aforementioned proof, we obtain
$$\lim_{x\uparrow 1}\sum_{b\in B}\frac{F_{ib}(1) - F_{ib}(x)}{1-x}\,\Pi_{bj} = \sum_{b\in B}F^{(1)}_{ib}(1)\Pi_{bj}, \qquad i,j \in E.$$
Together with the monotone convergence theorem this yields
$$\sum_{n=0}^\infty{}_BP^n_{ij} = \sum_{b\in B}F^{(1)}_{ib}(1)\Pi_{bj} + D_{ij}(1) - \sum_{b\in B}F_{ib}(1)D_{bj}(1),$$
so that
$$\sum_{n=0}^\infty\sum_{j\in E}{}_BP^n_{ij}\,\mu_j \le \sum_{b\in B}F^{(1)}_{ib}(1)\|\Pi\|_\mu\,\mu_b + \|D(1)\|_\mu\,\mu_i + \sum_{b\in B}\|D(1)\|_\mu\,\mu_b, \qquad (4.25)$$
where we use that $\|D(1)\|_\mu = \|D\|_\mu < \infty$. The third expression on the right-hand side of (4.25) only increases by multiplication with $\mu_i \ge 1$. Since $B$ is finite, it is sufficient for the completion of the proof to show that the vector with components $F^{(1)}_{ib}(1)$, $i \in E$, is µ-bounded. By considering the Taylor expansion of $F_{ib}(x)$ in $x = 1$ for small positive values of $x-1$, we obtain directly that µ-boundedness of the vector with components $F_{ib}(x)$, $i \in E$, implies µ-boundedness of the vector with components $F^{(1)}_{ib}(1)$, $i \in E$. Here we use that all derivatives of $F_{ib}(x)$ in $x = 1$ are non-negative (cf. (4.22)).

Thus we will show the existence of some $c' > 0$, such that $F_{ib}(x) \le c'\mu_i$ for $i \in E$. Using similar arguments as in the proof of Theorem 4.4 we can show the existence of an $\bar R < R$, for which
$$c := \min_{b\in B}\Bigl\{\inf_{z\in D_{1,\bar R}}|(1-z)P_{bb}(z)|\Bigr\} > 0,$$
and $\bar R$ small enough to satisfy $\|D\|_\mu\bar R/(1-\bar R) < 1$. Then for $z \in D_{1,\bar R}$ and $i \notin B$
$$|F_{ib}(z)| = \Bigl|\frac{(1-z)P_{ib}(z)}{(1-z)P_{bb}(z)}\Bigr| \le \frac{1}{c}\,|(1-z)P_{ib}(z)| \le \frac{1}{c}\Bigl\{\|\Pi\|_\mu + \sum_{n=0}^\infty\Bigl|\frac{z-1}{z}\Bigr|^{n+1}\|D\|_\mu^{n+1}\Bigr\}\mu_i \le \frac{1}{c}\Bigl\{\|\Pi\|_\mu + \frac{\|D\|_\mu\bar R}{1-(\|D\|_\mu+1)\bar R}\Bigr\}\mu_i, \qquad (4.26)$$
and for $i = b \in B$ we have
$$|F_{bb}(z)| = \Bigl|1 - \frac{1-z}{(1-z)P_{bb}(z)}\Bigr| \le 1 + \frac{\bar R}{c}, \qquad b \in B. \qquad (4.27)$$
Combination of (4.26), (4.27) yields the desired result.

Thus we have achieved a complete characterization of geometric ergodicity, µ−GE, strong recurrence and µ−WGR through analyticity conditions on $(1-z)P(z)$. A striking feature of Conditions 4.2 and 4.4 in comparison to the more general versions is that stability need not be required. An important aspect of the analysis is that µ is strictly bounded away from 0. This suggests an intimate connection between transience and positive µ-vectors with $\inf_{i\in E}\mu_i = 0$. We conjecture that a characterization of transient, geometrically ergodic MC's using such µ-vectors is possible.

The results of the previous sections together with Key theorem I imply the following

intriguing result.

Corollary 4.1 Condition 4.2 with ν < ∞ is equivalent to Condition 4.4 together with

aperiodicity.

3. A new formula for the deviation matrix.

Under strong convergence conditions there is an explicit formula for the deviation matrix

(cf. (4.4)). Under strong recurrence conditions the data transformation technique also

yields an explicit formula for the deviation matrix, as a function of the deviation matrix of

the transformed MC. This section derives a formula in which the taboo probability matrix

appears.

There is no loss of generality in assuming µ−GR(B) instead of the more general and more readily verified condition µ−GR(M). Recall that Table 3 states that µ−GR(M) implies $\bar\mu$−GR$_{RS}$(M), where $\mu_i \le \bar\mu_i \le c\mu_i$, $i \in E$, for some $c > 1$. Consequently all µ-bounded operators and vectors are $\bar\mu$-bounded and vice versa, so that the deviation matrix is µ-bounded iff it is $\bar\mu$-bounded.

So, throughout this section we will assume that property µ−GR holds for some bounding vector µ with $\mu_i \ge 1$, $\forall i \in E$, a set $B$ of reference states and $\beta < 1$, i.e.
$$\|{}_BP\|_\mu \le \beta.$$


By Lemma 4.4 the deviation matrix is the unique µ-bounded matrix with Property 4.1.

Hence,

D = I −Π + PD. (4.28)

Consider a transformed matrix $D'$, which is defined as follows:
$$D'_{ij} := D_{ij} - \sum_{b\in B}F_{ib}D_{bj}, \qquad (4.29)$$
so that $D'_{bj} = 0$, for $b \in B$, $j \in E$. Recall that $F_{ib} = \sum_{n=0}^\infty({}_BP^nP)_{ib}$, as the states in the set $B$ do not communicate. Moreover
$$\|D'\|_\mu = \sup_{i\in E}\mu_i^{-1}\sum_{j\in E}|D'_{ij}|\,\mu_j \le \sup_{i\in E}\mu_i^{-1}\Bigl\{\sum_{j\in E}|D_{ij}|\,\mu_j + \sum_{b\in B}F_{ib}\sum_{j\in E}|D_{bj}|\,\mu_j\Bigr\} \le \|D\|_\mu\Bigl(1 + \sum_{b\in B}\mu_b\Bigr) < \infty.$$

Insertion of (4.29) in (4.28) gives
$$D'_{ij} = D_{ij} - \sum_{b\in B}F_{ib}D_{bj} = \delta_{ij} - \Pi_{ij} + \sum_{k\in E}P_{ik}\Bigl(D_{kj} - \sum_{b\in B}F_{kb}D_{bj}\Bigr) = \delta_{ij} - \Pi_{ij} + \sum_{k\in E}P_{ik}D'_{kj} = \delta_{ij} - \Pi_{ij} + \sum_{k\in E}{}_BP_{ik}D'_{kj},$$
where for the second equality we use that $F_{ib} = \sum_{k\in E}P_{ik}F_{kb}$, as both sides of this expression denote the probability that state $b \in B$ is eventually reached; the last equality holds since $D'_{bj} = 0$ for $b \in B$. In matrix notation this is equal to
$$D' = I - \Pi + {}_BPD'.$$
Since $D'$, $\Pi$ (cf. Corollary 2.1) and ${}_BP$ are µ-bounded operators, this equality can be iterated to yield for any $N \in \mathbb{N}$
$$D' = \sum_{n=0}^N{}_BP^n(I-\Pi) + {}_BP^{N+1}D'.$$
As $\|{}_BP^{N+1}D'\|_\mu \le \beta^{N+1}\|D'\|_\mu$,
$$\lim_{N\to\infty}{}_BP^{N+1}D' = 0.$$
Hence, the limit for $N$ tending to infinity gives
$$D' = \sum_{n=0}^\infty{}_BP^n(I-\Pi). \qquad (4.30)$$


By assumption $D$ satisfies Property 4.1i), so that $\Pi D = 0$. Together with (4.29) we obtain
$$0 = \sum_{k\in E}\Pi_{ik}D_{kj} = \sum_{k\in E}\Pi_{ik}\Bigl\{D'_{kj} + \sum_{b\in B}F_{kb}D_{bj}\Bigr\} = \sum_{k\in E}\Pi_{ik}D'_{kj} + \sum_{b\in B}F_{ib}D_{bj}.$$
So, for $i = b \in B$,
$$D_{bj} = -\sum_{k\in E}\Pi_{bk}D'_{kj}.$$
Combination with (4.29) and (4.30) establishes
$$D_{ij} = D'_{ij} - \sum_{b\in B}F_{ib}\sum_{k\in E}\Pi_{bk}D'_{kj} = D'_{ij} - \sum_{k\in E}\Pi_{ik}D'_{kj} = \sum_{k\in E}\bigl(\delta_{ik} - \Pi_{ik}\bigr)\Bigl[\sum_{n=0}^\infty{}_BP^n(I-\Pi)\Bigr]_{kj},$$
and thus
$$D = (I-\Pi)\sum_{n=0}^\infty{}_BP^n(I-\Pi). \qquad (4.31)$$

This completes the proof. An alternative proof inserts expression (4.31) in Property 4.1. From the foregoing analysis this $D$ satisfies (4.28).

Obviously $D$ has Property 4.1i), as $\Pi(I-\Pi) = (I-\Pi)\Pi = 0$. So, we have to check whether $D(I-P) = I - \Pi$:
$$D(I-P) = (I-\Pi)\sum_{n=0}^\infty{}_BP^n(I-\Pi)(I-P) = (I-\Pi)\sum_{n=0}^\infty{}_BP^n(I - {}_BP - {}_{B^c}P) = (I-\Pi) - (I-\Pi)\sum_{n=0}^\infty{}_BP^n\,{}_{B^c}P. \qquad (4.32)$$
The $ij$th component of the last expression in (4.32) equals 0 if $j \notin B$. Let $j \in B$, and denote by $C$ the positive recurrent class that contains $j$. Then
$$\Bigl[(I-\Pi)\sum_{n=0}^\infty{}_BP^n\,{}_{B^c}P\Bigr]_{ij} = \sum_{k\in E}(I-\Pi)_{ik}F_{kj} = F_{ij} - \sum_{k\in C}\Pi_{ik}F_{kj} = F_{ij} - \sum_{k\in C}\Pi_{ik} = F_{ij} - F_{ij} = 0.$$

Combining this with (4.32) we conclude that D(I −P ) = I −Π. Since any matrix A with

Property 4.1 satisfies (4.31), we have shown the following result.


Theorem 4.7 Under condition µ−GR(B), with $B$ a set of reference states, the deviation matrix exists and is the unique µ-bounded matrix with Property 4.1. Moreover,
$$D = (I-\Pi)\sum_{n=0}^\infty{}_BP^n(I-\Pi).$$

Our remark at the beginning argues the validity of the following corollary, which we state

for completeness of the exposition.

Corollary 4.2 Assume µ− GR(M) and let B ⊂ M be a set of reference states. Then the

assertions of Theorem 4.7 are valid.
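For a small finite unichain example the formula of Theorem 4.7 can be verified directly; the following sketch is our own illustration with a single reference state $B = \{0\}$ (the chain being unichain, one reference state suffices).

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
e = np.ones(3)
pi = np.linalg.solve((np.eye(3) - P + np.ones((3, 3))).T, e)
Pi = np.outer(e, pi)
D = np.linalg.solve(np.eye(3) - P + Pi, np.eye(3) - Pi)       # deviation matrix, for comparison

B = [0]
BP = P.copy()
BP[:, B] = 0.0                                                # taboo matrix _BP: entries into B removed
S = np.linalg.inv(np.eye(3) - BP)                             # S = sum_n (_BP)^n, finite by contraction

D_taboo = (np.eye(3) - Pi) @ S @ (np.eye(3) - Pi)             # formula of Theorem 4.7
print(np.allclose(D_taboo, D))                                # True
print(np.allclose(Pi @ D_taboo, 0),                           # Property 4.1 i)
      np.allclose(D_taboo @ (np.eye(3) - P), np.eye(3) - Pi)) # Property 4.1 ii)
```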


PART II

UNIFORM GEOMETRIC RECURRENCE

OF MARKOV DECISION CHAINS

It recurred to him that leaving the chain is the optimal decision



CHAPTER FIVE

Introduction.

1. The results versus the literature.

Average optimality is the most often used criterion in Markov decision chains (MDC's), especially in practice. In comparison to α-discounted optimality this is caused by the fact that the relevant discount rate is often close to 1, but difficult to specify. Unfortunately, the average optimality criterion sometimes selects a rather weak policy.

We illustrate this with a poignant example. An employee is about to retire. He will receive a pension of x pounds a day for an infinitely long time period. Furthermore, the firm offers him, as a present, either a watch or a million pounds. If the employee uses the average reward criterion, then there is no difference between the two offers!

Blackwell, already in his paper from 1962, proposed a more selective criterion. He called it 1-optimality, which later became known as Blackwell optimality. This criterion selects a policy, if it is discounted optimal for all discount factors sufficiently close to 1. In his paper the existence of optimal policies was shown for MDC's with a finite state space and finite action sets. Miller & Veinott [1969] constructed a finite algorithm to compute a Blackwell optimal policy. Also Veinott [1969] introduced a sequence of optimality criteria ranging from average to Blackwell optimality, which we refer to as sensitive optimality criteria. In order to explain these criteria, we remark that the probability generating function $P(f,z) = \sum_n P^n(f)z^n$ of the MC generated by the deterministic policy $f^\infty$ allows a Laurent expansion in a neighbourhood of $z = 1$, if the MC has a finite state space. Then also the expected α-discounted rewards allow a Laurent expansion (cf. (4.11)) of the form
$$(1+\rho)\sum_{k=-1}^\infty \rho^k u_k(f),$$
where the interest rate $\rho$ is related to the discount factor $\alpha$ via $\alpha = 1/(1+\rho)$.

In terms of the Laurent series, Veinott's so-called n-discount optimality is the same as lexicographical maximization of the first $n+2$ terms $u_{-1}(f), u_0(f), \ldots, u_n(f)$. As the Abel sum is equal to the Cesaro sum, $u_{-1}(f)$ is the average reward. Consequently, −1-discount optimality is equivalent to average optimality. Clearly (n+1)-discount optimality is a more selective criterion than n-discount optimality. Finally, Miller & Veinott also proved that a Blackwell optimal policy maximizes lexicographically all terms of the Laurent expansion.
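To make the lexicographic comparison concrete, the following sketch (our own illustration; the two policies, their transition matrices and rewards are invented) computes the terms $u_{-1}(f), u_0(f), \ldots$ from Π and D as in (4.12) and lists them side by side for two policies.

```python
import numpy as np

def laurent_terms(P, r, K=2):
    """u_{-1}, u_0, ..., u_K of the Laurent expansion (cf. (4.11)-(4.12)) for a unichain MC."""
    n = len(r)
    e = np.ones(n)
    pi = np.linalg.solve((np.eye(n) - P + np.ones((n, n))).T, e)
    Pi = np.outer(e, pi)
    D = np.linalg.solve(np.eye(n) - P + Pi, np.eye(n) - Pi)
    return [Pi @ r] + [(-1) ** k * np.linalg.matrix_power(D, k + 1) @ r for k in range(K + 1)]

# Two hypothetical deterministic policies f and g on a 2-state chain (invented numbers).
u_f = laurent_terms(np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([1.0, 0.0]))
u_g = laurent_terms(np.array([[0.5, 0.5], [0.5, 0.5]]), np.array([1.0, 0.0]))

# n-discount optimality compares (u_{-1}, ..., u_n) lexicographically, componentwise in the state.
for k, (a, b) in zip(range(-1, 3), zip(u_f, u_g)):
    print(k, a, b)
```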

For MDC's with finite state and action spaces optimal policies for all these criteria exist and can be computed with a finite algorithm. For MDC's with compact action sets and/or denumerable state space the existence of optimal policies is not guaranteed, not even for the weakest criterion, i.e. average optimality. Therefore conditions have to be imposed to guarantee the existence of a, possibly partial, Laurent expansion of the expected α-discounted rewards and thus of $P(f,\alpha)$ in a neighbourhood of α = 1.

The analysis of sensitive optimal policies through the Laurent expansion contains two stages. A first step is to prove that indeed the mere existence of the Laurent expansion does yield the desired results. Dekker & Hordijk [1988] show this under additional continuity and compactness conditions together with a condition on the validity of a policy improvement step. The paper by Lasserre [1988] poses the existence of the total Laurent expansion of $P(f,z)$, as a µ-bounded and µ-continuous operator. For µ-bounded rewards he applies the analysis by Dekker & Hordijk [1988] to obtain optimality results. Within the framework of µ-bounded vectors and matrices his conditions are quite weak.

The second step is to develop verifiable conditions on the structure of the immediate rewards and the transition probabilities. Dekker & Hordijk [1988], [1989] use ergodicity (or quasi-compactness) and recurrence conditions. These are generalized versions of conditions widely used in the literature, mainly on the existence of average optimal policies. The ergodicity condition is introduced in Dekker & Hordijk [1988] as µ-uniform geometric convergence, but renamed in Dekker, Hordijk & Spieksma [1990] as µ-uniform geometric ergodicity to have a better correspondence with the terminology for one MC. Indeed, µ-uniform geometric ergodicity requires that the µ-geometric ergodicity property holds uniformly for the set of MC's induced by the deterministic policies. Recalling the analysis of section 4.1, it is evident that the condition implies analyticity of $(1-z)P(f,z)$ in a common disk $D_{0,R}$, $R > 1$, for all deterministic policies. Consequently the existence of the Laurent expansion of $P(f,z)$ is guaranteed in a common disk $D_{1,R'}$.

The use of the µ-uniform geometric ergodicity property together with some compactness and continuity assumptions provides a direct method to establish all necessary conditions on the Laurent expansion developed by Dekker & Hordijk [1988] that we mentioned earlier. The core of the proof relies on the applicability of two theorems on convergent sequences of continuous functions on a compact set. The first states that the limiting function itself is continuous, if the convergence is uniform. The second is Dini's theorem on uniform convergence, which asserts the convergence to be uniform, if the sequence is non-decreasing (non-increasing) and the limiting function is continuous (cf. Royden [1968]).

Although a mathematically elegant analysis results when using ergodicity conditions, they are very hard to verify in practice. That is why recurrence conditions are used. However, these require an intricate analysis, when no information on quasi-compactness is available.

One such condition is called µ-uniform geometric recurrence, which is nothing but µ-geometric recurrence, with one contraction factor and one finite taboo set for all MC's induced by the deterministic policies. This condition was used by Dekker & Hordijk [1989] to derive results similar to their paper from 1988. As a condition on the one-step transition matrices it is relatively easy to verify. Moreover, it is related to many other recurrence conditions that are used for MDC's (cf. Dekker & Hordijk [1989], Dekker, Hordijk & Spieksma [1990b]) and which appear in the analysis as well.


For easy reference we will call a MDC uniformly strong convergent (recurrent) if it is µ-geometrically convergent (recurrent) for µ a bounding vector with $\mu_i \ge 1$, $\forall i \in E$, which is the obvious extension of the similar notion for one MC. An important issue is the relation between the two conditions. Because the recurrence condition is effectively checked in applications, it is especially of interest to know in which respect the class of uniformly strong convergent models differs from the class of uniformly strong recurrent models.

Let us continue with a general account of results on this subject from the literature. We first emphasize that the papers by Dekker & Hordijk [1988], [1989] implicitly include results for the α-discounted reward criterion. To see this we can directly use the Laurent expansion for discount factors close to 1. Alternatively, we reformulate the α-discounted rewards as 0-discount rewards in a transformed MDC with transition matrix αP(f) for all deterministic policies, and one extra absorbing state with immediate reward 0, which is reached in one step with probability 1−α from any other state. Then we get a MDC that is both strongly convergent and recurrent. The result, however, is not new, since weighted supremum norms were already used by Wessels [1977] for the analysis of α-discounted optimality.

As we focus on the comparison of specific conditions, we will only mention results in this field. For an overview of ergodicity and recurrence conditions we refer to Thomas [1980], and to Dekker & Hordijk [1988] for a discussion on various results for optimality in MDC's.

The first result on the relation between uniform strong convergence and recurrence is derived by Federgruen, Hordijk & Tijms [1978a,b] in the context of average optimality in MDC's. Actually they use the supremum norm, or the e-norm. For unichain MDC's they show the equivalence of e-uniform geometric ergodicity and the simultaneous Doeblin condition, thereby using a representation of the latter condition derived by Hordijk [1974]. For one MC this result is a very old one, which can be found in e.g. Neveu [1965]. It is easy to show equivalence of the simultaneous Doeblin condition and µ-uniform geometric recurrence, for µ a bounded vector.

Since these papers use the e-norm, the results are valid only for uniformly bounded rewards. This is a severe restriction for many queueing models. Indeed, interesting reward functions, such as waiting costs in an open network, are unbounded. A first extension of the results by Federgruen, Hordijk & Tijms is proved by Zijm [1985], also in the context of the average reward criterion. He allows a multichain structure under all deterministic policies, but still needs bounded rewards. Besides aperiodicity and finiteness of the number of positive recurrent classes in the MC's induced by the deterministic policies, he has to require continuity of the number of positive classes as a function of the deterministic policies.

In Chapter 6 we prove the same result for general µ-norms. This connects the two papers by Dekker & Hordijk. Using a similar data transformation as in section 4.2, the analysis in Dekker & Hordijk [1988] can be straightforwardly applied to periodic, µ-uniformly geometrically recurrent MDC's as well.

Another topic of Chapter 6 concerns the relation of these conditions to the existence of the Laurent expansion of the expected α-discounted rewards in α = 1. Thus a comparison with Lasserre's condition is of interest. Indeed, it turns out to be effective as a characterization of uniform strong recurrence, under the assumption of a finite number of positive recurrent classes under any deterministic policy. Hence, uniformly strong recurrence is the weakest reasonable condition on the MDC structure for the existence of the Laurent expansion of $P(f,z)$ in a common disk for all deterministic policies.

For continuous time Markov decision processes (MDP's) a similar equivalence result of uniform strong recurrence and convergence can be proved under the assumption of uniformizability of the process under all deterministic policies. Here the arguments for the case of one MP (cf. section 2.3) can be copied to carry over the results for the approximating MDC (AMDC) to the MDP. We hope that this equivalence result will prove fruitful for the existence of sensitive optimal policies in MDP's. As far as we know, only a few results in this field exist. We mention Veinott [1969] for finite state and action spaces and Puterman [1974] for compact state and action spaces.

Pursuing the comparison between e-uniform geometric ergodicity and µ-geometric ergodicity, we observe that the first one is not only a heavier condition on the MDC structure, but also allows optimality results for a smaller class of reward functions. All applications we analyse in Chapters 3 and 9 are µ-(uniformly) geometrically ergodic for a µ-vector of exponential type, but not (uniformly) strongly ergodic.

On the other hand, µ-uniform geometric ergodicity and recurrence are generally heavy conditions on the MDC structure, if weaker optimality criteria are considered, because of the relation with spatial geometric boundedness as discussed in Part I for one MC. Both conditions require the convergence of the Laplace-Stieltjes transforms of the arrival and/or service time distributions in a neighbourhood of 0, thereby implying the convergence of all moments. Indeed, if weaker optimality criteria are considered, the existence of the complete Laurent expansion is not needed.

In e.g. Hordijk & Sladky [1977] it was shown that the existence of n+2 Lyapunov functions implies the existence of a partial Laurent expansion of order n+2, in a common disk for all deterministic policies. Consequently the existence of k-discount optimal policies is guaranteed for k = −1, 0, ..., n. In this case the convergence of at most n+2 moments is required. In this context we point out that the µ-vector in the µ-geometric recurrence property is a Lyapunov function that bounds all Lyapunov functions.

Even then, many control problems are mainly concerned with minimizing the expected α-discounted or average expected cost for non-negative immediate costs, so they do not require such a general setting. This is in fact negative dynamic programming, which allows weaker conditions on the MDC structure. In recent papers by Weber & Stidham [1987], Stidham & Weber [1989] and Sennott [1989a,b] conditions for the existence of (deterministic) average optimal policies in unichain MDC's have been developed that do not require ergodicity of all Markov chains induced by the deterministic policies. Therefore they can handle service control problems, where it is allowed to turn the server off. This is not possible in Dekker & Hordijk's approach, since uniform strong recurrence implies recurrence to some finite fixed set of states under all deterministic policies.

Weber & Stidham [1987] study networks of queues with control of the service rates, for the α-discounted and average optimality criteria. For both optimality criteria also conditions for transition monotonicity of optimal policies are derived that incorporate the necessity of the option to switch the server off.

Sennott [1989a,b] derives conditions for the existence of deterministic average optimal policies in Markov and semi-Markov decision models. These conditions are closely related to Weber & Stidham's, and involve the existence of a Lyapunov function for the MC generated by some deterministic policy. In both papers she applies her conditions to various queueing models with one service centre. In Stidham & Weber [1989] the left-skip-free property of many one-dimensional queueing models is exploited to obtain both existence of average optimal policies and conditions on the cost structure for the optimality of monotonic policies. This paper and the papers by Sennott allow control of service and/or arrival rates. Of course, there is a large literature on average and α-discounted optimality and structures of optimal policies, mainly on one-dimensional queueing models. The mentioned papers contain extensive references on earlier work in this field.

Although these papers study weaker sufficient conditions for the existence of and structural results for average and α-discounted optimal policies, they do not yield explicit formulas and convergence of algorithms. In this respect, the uniform strong recurrence criterion is useful, even for the average and α-discounted optimality criteria.

An algorithm that is often used for results on the structure of average optimal policies is the successive approximations algorithm. For aperiodic, unichain MDC's with finite state and action spaces it is known that the successive approximation algorithm converges at a geometric rate (cf. White [1963]). The unichain assumption can be relaxed due to results by Schweitzer & Federgruen [1977]. Zijm [1987] gives an alternative proof of the geometrically fast convergence of the algorithm.

Obviously, for finite state and action spaces this algorithm serves as a means to compute average optimal policies, and to get bounds for the bias vector in the average optimality equation. If the MDC has denumerable state and/or compact action spaces, the algorithm is less suited for computational purposes. However, in this case it serves as a means to obtain information on the structure of optimal policies. Indeed, convergence of the algorithm implies that for any sequence $\{f_n\}_n$, where $f_n$ is a rule that is optimal in the $n$th iteration of the algorithm, and any limit point $f$ of this sequence, the deterministic policy $f^\infty$ is average optimal. Then if all $f_n$ have a specific structure, it is inherited by the limit $f$ and the search for an optimal policy may be restricted to a specific subclass of policies.
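As a concrete illustration of the successive approximations algorithm (our own toy example, not taken from the thesis), the sketch below iterates $v_{n+1} = \max_a\{r(\cdot,a) + P(a)v_n\}$ for a two-state MDC and records the maximizing rule in each iteration.

```python
import numpy as np

# Minimal successive-approximations sketch for a toy unichain MDC (invented data).
# v_n is the maximal expected n-period reward; under the conditions of Chapter 7,
# v_{n+1} - v_n converges to the gain g (times the unit vector).
P = {0: {'a': np.array([0.9, 0.1]), 'b': np.array([0.3, 0.7])},
     1: {'a': np.array([0.4, 0.6])}}
r = {0: {'a': 1.0, 'b': 2.0}, 1: {'a': 0.0}}

v = np.zeros(2)
for n in range(500):
    vals = {i: {a: r[i][a] + P[i][a] @ v for a in P[i]} for i in P}
    f_n = {i: max(vals[i], key=vals[i].get) for i in P}     # optimal rule at iteration n
    v_new = np.array([vals[i][f_n[i]] for i in sorted(P)])
    gain_estimate = v_new - v                               # tends to g * (1, 1)
    v = v_new
print(f_n, gain_estimate)
```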

If $f_n = f$ for $n \in \mathbb{N}$, then obviously no assumptions on the MDC structure are necessary. This approach is often used to show optimality of fixed rules, such as the µc-rule in the K competing queues model (cf. Baras, Ma & Makowski [1985]).

If additionally all average optimal policies satisfy some communicatingness condition, a common structure of the maximal gain vectors $v_n$ over $n$ time periods is inherited by the bias vector $v$ in the average optimality equation. This is an alternative way to determine structures of average optimal policies. It is sometimes easier to prove a common structure of the $v_n$'s than of the $f_n$'s (cf. Tijms & Eikeboom [1986], van Dijk & Lamond [1988]).

Hordijk, Schweitzer & Tijms [1975] show convergence of the algorithm for denumerable MDC's with finite action spaces, where the object is to minimize the average expected cost for non-negative immediate costs. The most important conditions they use are the following ones: there is a solution to the average optimality equation, the MC induced by any average optimal policy is unichain and aperiodic, and the difference of the bias vector in the optimality equation and the minimal total gain vector after one iteration is uniformly bounded. The latter condition cannot hold for control models with bounded costs, satisfying a left-skip-free property. Indeed, in this case the bias vector is unbounded, whereas the gain after one step of the algorithm is bounded.

In Chapter 7 we will show convergence of the successive approximations algorithm under the assumption of µ-uniform geometric recurrence and aperiodicity, allowing a multichain structure. The aperiodicity assumption is not a severe one, since Schweitzer's data transformation technique (cf. section 4.2) does not affect the average expected rewards. Under a communicatingness assumption on average optimal policies, we show that the bias vector $v$ is unique up to a constant vector, and so the $v_n$ converge to this vector plus some constant as $n$ tends to infinity.

We conjecture that the successive approximations algorithm converges geometrically fast. Until now we have not been able to prove this. The proof techniques in Schweitzer & Federgruen [1979], or Zijm [1987], cannot be used, since they lean on the finiteness of the class of deterministic policies. This implies the existence of some $n_0$, such that the optimal rule in the $n$th iteration, $n \ge n_0$, is contained in the set of average optimal policies. Such a result generally does not hold for models with a denumerable state space.

A natural continuation of an algorithm for the structure of average optimal policies is to develop algorithms that can be applied to obtain consecutively results on structures of n-discount optimal policies. The iteration procedure in Hordijk [1976] for unichain MDC's satisfying a Lyapunov function criterion is such an algorithm. Under the µ-uniform geometric recurrence property it is straightforward to show convergence of the procedure as well, although we will not do this in this monograph.

As regards the Blackwell optimality criterion, the limit of α-discounted optimal policies, for α tending to 1, is known to be Blackwell optimal for MDC's with finite state and action spaces. In this case the existence of only finitely many deterministic policies implies that there is always an interval $[\alpha_0, 1)$ where some α-discounted policy remains optimal. Cavazos-Cadena and Lasserre [1988] conjecture that the limiting policy is Blackwell optimal under more general assumptions. However, this turns out not to be true.

In Chapter 8 we will construct two counterexamples, for which no limiting policy is Blackwell optimal. Both models are unichain. The first has a finite state space and compact action spaces, the second a denumerable state space and finite action spaces. In fact the second one is a conversion of the first one, and its construction was partially suggested by Th. Hill, during a discussion with him on this subject. Inspired by the structure of the counterexample, he conjectured that a Blackwell optimality result for limiting policies might be true under the following additional condition: the positive entries of the transition matrix of the MC generated by any deterministic policy are the same for each policy.

However, it is possible to derive a partial result for limiting policies. Due to the equivalence of Abel and Cesaro sums under uniform strong recurrence, the limiting policy is always average optimal. Hordijk [1976] and Cavazos-Cadena & Lasserre [1988] showed, under different sets of assumptions, that the limiting policy is also 0-discount or bias optimal. Under the condition of uniform strong recurrence we show a slightly more general result. Suppose that all deterministic policies are n-discount optimal. Then the limiting policy is (n+1)- and (n+2)-discount optimal. This is the strongest general result possible. Indeed, in our counterexample all policies are average optimal, the unique limiting policy is bias and 1-discount optimal, but not 2-discount optimal.

A last topic of Part II is the verification of the uniform strong recurrence property. InChapter 9 this property is checked for basically two multi-dimensional queueing models,by the construction of a suitable bounding vector µ. As far as we know, results on theexistence of sensitive optimal policies hardly exist, and even less so for models where thedimension of the state space exceeds 1. This is due to the fact, that the boundary ofthe state space consists of infinitely many states with negative drift. If the state space isone-dimensional, the boundary consists of only one state, namely state 0.

The spatial geometric boundedness property of µ-geometrically recurrent models as discussed in Chapter 1 leads us to a µ-vector of product form, such that it increases exponentially fast with increasing states. Together with Key theorem II, this implies geometrically fast convergence of the Laplace-Stieltjes transforms of the marginal distributions in a neighbourhood of 0, uniformly in the deterministic policies.
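To fix ideas, a bounding vector of the kind meant here can be sketched as follows; this is only an illustration, the constants z_k are hypothetical and the actual construction is carried out in Chapter 9:

  µ_x = ∏_{k=1}^K z_k^{x_k},  x = (x_1, . . . , x_K),  z_k > 1.

With such a product-form µ, µ-boundedness of a marginal distribution amounts to finiteness of its moment generating function at the points log z_k, which is the kind of transform condition in a neighbourhood of 0 referred to above.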

The first model we analyse is the K competing queues model with random routing. K queues compete for service of one server. Once a customer has finished service, he is routed to one of the other queues or he leaves the network. Under a stability assumption, three versions of the model will be studied. The first one is a time-slotted version with fixed, but general arrival distributions in each time-slot and geometric service requirements. Then convergence of the Laplace-Stieltjes transforms of the arrival distributions in a neighbourhood of 0 is equivalent to uniform strong recurrence.

The second version is the continuous time model with Poisson arrivals and exponentially distributed service times. We study a time-discretized version first and show uniform strong recurrence. This version being an AMDC, the result is valid for the continuous time process as well. The third model is an embedded MDC on the epochs of a departure in a semi-Markov model with Poisson arrivals and general service time distributions. Then the convergence of the Laplace-Stieltjes transforms of the service time distributions in a neighbourhood of 0 is necessary and sufficient for uniform strong recurrence.

Thus the existence of sensitive optimal policies is guaranteed. Let us compare this with results from the literature. The K competing queues model has been discussed in many papers (cf. Makowski & Shwartz [1989] for extensive references on this subject). The first version without random routing was studied by e.g. Baras, Ma & Makowski [1985], Buyukkoc, Varaiya & Walrand [1985]. Contrary to our assumptions, idling policies are allowed and the arrival distribution for each queue may vary over the time-slots, if only their expectations are equal. Then the µc-rule is optimal for linear cost structures and all discount factors smaller than 1, within the class of deterministic, but possibly non-stationary policies. Hence, it is average optimal in this class.

If the model satisfies our assumptions as well, then combination with our results yields that the µc-rule is α-discounted optimal within the class of all policies for all α ∈ (0, 1), hence it is Blackwell optimal.

Nain [1989] proves a result similar to the papers we mentioned, for the continuous time version of the model with exponential service time distributions. If random routing is allowed, he obtains α-discounted optimality for all α ∈ (0, 1) of a strict priority rule for routing matrices with a special structure.

Klimov [1974] studied the third version in continuous time for linear costs, under the same stability condition as we did, together with only finiteness of the first and second moments of the service time distributions. He proved average optimality of an index-rule policy. The model was analysed by Sennott [1989] as well. She also assumes ergodicity under all deterministic policies, but further only requires the cost structure under one specific policy to be bounded by a polynomial of degree n if the service times have a finite (n+1)th moment. Under these conditions the existence of an average optimal policy is established.

As a last reference on the K competing queues we mention the paper by Makowski & Shwartz [1989]. It deals with probabilistic aspects of the first version of the model, i.e. with recurrence properties and stability conditions. If the average load on the system is < 1 (> 1) for a non-idling policy, the system is stable (non-stable). Another result concerns explicit formulae for the distributions of busy cycles and the finiteness of the nth moment of the length of the busy cycle, if the nth moments of the arrival streams and the initial distribution of the system are finite. Our conditions imply finiteness of all moments of the recurrence times to state 0, hence of the busy cycles.

The second model we discuss is the continuous time two-centre open Jackson network with control of the service rates. Under a uniformizability condition and uniform stability under all deterministic policies, the AMDC is uniformly strong recurrent. Hence, sensitive optimal policies exist, and both discrete time and continuous time models are uniformly strong convergent. This result appears to be quite new, even if the average optimality criterion is considered. Indeed, with respect to this criterion product form solutions of the stationary distributions are often exploited to obtain average optimality results. However, these need not exist if the service rates are allowed to vary over the states, which is a realistic option if the service centres interact.

A drawback of the analysis as presented in this monograph is that it is not possible to allow transient MC's under any deterministic policy, e.g. to turn the server off. However, apart from convergent algorithms and sensitive optimality results, the verification of geometric recurrence of the MC induced by one deterministic policy is a useful result. Then, if the average costs are minimized for non-negative polynomial cost functions, the analysis by Sennott [1989b] can be applied to yield the existence of average optimal policies.

This section ends with a brief summary of the contents of Part II. In section 2 of this chapter we introduce the model. We give an overview of various recurrence conditions used by Dekker & Hordijk [1989] for their analysis of Blackwell optimality under strong recurrence conditions. The relation between these conditions was examined in the context of one MC in section 2.1. Section 2 also lists various optimality results derived by Dekker & Hordijk [1988], [1989], in so far as they are needed for the analysis in this monograph. We point out that the overview is not complete!


The main results of Chapter 6 are the Key theorem of Part II, which is the equivalence of uniform strong convergence and recurrence, and the equivalence of uniform strong recurrence and Lasserre's condition [1988], under appropriate conditions. Some technical lemmas, however, are necessary first. These are derived in section 6.1 under more general conditions than are strictly necessary, in view of a better insight in the underlying properties of MDC's that satisfy one of the afore-mentioned conditions.

The emphasis is on the extra difficulties that arise, when extending the analysis of one MC to a set of MC's. Indeed, the lemmas in section 6.1 are structured in such a way, that the proofs of Part I need adjustment in a few places only. In comparison with the proofs in Part I we need continuity results and proper recurrence to a finite set, uniformly in the deterministic policies. The results of most of these lemmas can be found scattered over the literature, although sometimes implicitly.

Comparing our analysis with the papers by Federgruen, Hordijk & Tijms [1978a,b] and Zijm [1985], we remark that the analysis in these papers is facilitated by the knowledge that for µ = e, Σ_j P_ij(f)µ_j = 1 for all initial states i ∈ E and all deterministic policies f∞. For general µ the only information available is the existence of some c > 0, such that Σ_j P_ij(f)µ_j ≤ cµ_i. In my opinion, the use of general µ-norms gives a better understanding of which properties of the MDC are important for the existence of optimal policies.

For the final proofs of the Key theorem for both MDC's and uniformizable MDP's in section 6.2 and the equivalence of uniform strong recurrence and Lasserre's condition in section 6.3 a short argument suffices.

Chapter 7 discusses the successive approximations algorithm under uniform strong recurrence together with aperiodicity. It allows a multichain structure of the MDC. Then the average optimality equations consist of a set of two equations. In the second optimality equation maximization takes place over a subset of the deterministic policies. Using an elegant argument by Schal [1989], we show that it is allowed to maximize over all deterministic policies. This enables us to show average optimality of a conserving policy for the "limsup" criterion in section 7.2, in contrast with Dekker & Hordijk, who use the "liminf" criterion.

These results are used to achieve convergence of the algorithm in section 7.3. The analysis parallels the one in Hordijk, Schweitzer & Tijms [1975] to a great extent, although different assumptions are used. Moreover, we tried to exploit the contraction property of the taboo probability matrices as much as possible.
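For readers who want to experiment, the following is a minimal sketch of the successive approximations scheme on a finite truncation of the state space. The truncation, the data layout and the renormalization below are illustrative choices made here; they are not the construction analysed in Chapter 7.

    import numpy as np

    def successive_approximations(P, r, n_iter=200):
        """Relative value iteration for an average-reward MDC on a finite (truncated) state space.

        P : array of shape (num_actions, num_states, num_states); P[a, i, j] is a transition probability
        r : array of shape (num_states, num_actions); r[i, a] is the immediate reward
        Returns the (renormalized) value estimates and a greedy, i.e. conserving, deterministic rule.
        """
        num_states = r.shape[0]
        v = np.zeros(num_states)
        for _ in range(n_iter):
            # Q[i, a] = r_{ia} + sum_j P_{iaj} v_j
            Q = r + np.einsum('aij,j->ia', P, v)
            v = Q.max(axis=1)
            v -= v.min()          # renormalization keeps the iterates bounded
        return v, Q.argmax(axis=1)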

In Chapter 8 we investigate the limiting behaviour of α-discounted optimal policies. Section 8.1 shows the limiting policy to be (n+2)-discount optimal if all policies are n-discount optimal. As we already pointed out, it generalizes results from the literature. The derivation uses the Laurent expansion of the expected α-discounted rewards extensively. It also indicates which are the prerequisites for a counter-example for stronger sensitivity results. Indeed, the result is in general the strongest possible result. In section 8.2 two counter-examples disprove (n+3)-discount optimality. These examples are the weakest possible extensions of the MDC with finite state and action spaces.


Obviously, the results are less interesting if the conditions cannot be verified in applications. Chapter 9 constructs bounding vectors of exponential type in the uniform strong recurrence property for two multi-dimensional queueing models. The first is the K competing queues model, three versions of which are analysed. The existence of sensitive optimal policies thus obtained extends previous results on the structure of optimal policies to a larger class of policies. Moreover, sensitive optimal policies also turn out to have this structure. As far as we know, the result for the two centre open Jackson network is new, even for the average optimality criterion. The models are discussed in sections 9.1 and 9.2 respectively.

2. The model with results from the literature.

Consider the standard description of an MDC. A dynamic system is observed at discrete time points to be in one of a denumerable set E of states. At each time point the controller of the system chooses an action a from the set of available actions A(i), if the system is in state i. When action a is chosen, a reward r_{ia} is earned and the system moves to state j with probability P_{iaj}.

A decision rule π_n at time n is a function that assigns the probability of taking action a at time n. Generally it may depend on all realized states up to and including time n and all actions up to time n. A policy R is a sequence of decision rules (π_0, π_1, . . .). For a Markov policy the decision rule π_n at time n only depends on the state at time n. A policy is stationary if all decision rules are equal; it is deterministic if it is stationary and if exactly one action is prescribed in each state. C, C(M), C(S) and C(D) denote the classes of all policies, all Markov policies, all stationary policies and all deterministic policies respectively.

Remark 5.1: A generalization of a result by Derman & Strauch [1966], shown by Strauch & Veinott for finite MDC's (cf. pp. 91-93 of Derman [1970]) and extended by Hordijk [1974] to denumerable MDC's, asserts that for any policy R ∈ C and any initial state i ∈ E there is a Markov policy that generates the same marginal probability distributions (cf. Proposition 10.2). Hence, the analysis of sensitive optimality may be restricted to Markov policies.

We write S, F for the sets of stationary and deterministic decision rules that depend on the state of the system only. By Remark 5.1, the set of all decision rules can be identified with S. Let π, f be the notation for elements of S, F and π∞, f∞ the notation for the stationary policy (π, π, . . .) and the deterministic policy (f, f, . . .) respectively.

Throughout Part II the following standard condition for denumerable MDC's will be assumed.

Assumption 5.1:

i) A(i) is a compact, metric set for all i ∈ E.

ii) P_{iaj}, r_{ia} are continuous functions of a ∈ A(i).

Combination of this assumption and a theorem by Tychonov yields that F is a compact set in the weak topology of componentwise convergence. Moreover, F is metrizable by virtue of Theorem 4.14 in Kelley [1955].

For ease of notation we denote the transition matrix of the MC induced by policy f∞ as P(f), with ijth element P_ij(f) = P_{if(i)j}. P^n(f) is the nth iterate of P(f) and P^0(f) = I. Similarly, r(f) is the vector of immediate rewards with ith component r_i(f) = r_{if(i)}.

As the system operating under a stationary policy is a homogeneous MC, all quantities used in Part I are functions on F in Part II. For completeness we will recall these. Thus we write B(f), ν(f) for a set of reference states and the number of classes in the MC generated by f∞. The taboo probability matrix with taboo set M is expressed as _MP(f) and its nth iterate as _MP^n(f). Then F^{(n)}_{iM}(f) = Σ_{m∈M} (_MP^{n−1}(f)P(f))_{im} is the probability that set M is first reached at time n, F_{iM}(f) = Σ_{n=1}^∞ F^{(n)}_{iM}(f) that set M is eventually reached, and F_{iM}(f, z) = Σ_{n=1}^∞ F^{(n)}_{iM}(f)z^n denotes the probability generating function. Similarly, we define P(f, z) = Σ_{n=0}^∞ P^n(f)z^n as the generating function of the marginal probabilities. Finally, Π(f) and D(f) denote the stationary matrix and the deviation matrix of the MC induced by f∞.

Let µ be a bounding vector with µ_i ≥ 1, i ∈ E. The obvious generalizations of the basic conditions of Part I are defined as follows.

Definition 5.1: The set of MC's with transition matrices P(f), f ∈ F, has property
– µ-uniform geometric ergodicity (µ-UGE), if ∃ c > 0, β < 1, such that for any f ∈ F
  ‖P^n(f) − Π(f)‖_µ ≤ cβ^n, n ∈ IN_0,
  ‖P(f)‖_µ ≤ c.
– µ-uniform geometric recurrence (µ-UGR), if a finite set M and a β < 1 exist, such that for any f ∈ F
  ‖_MP(f)‖_µ ≤ β.
– uniform strong convergence (recurrence), if a vector µ, with µ_i ≥ 1 ∀ i ∈ E, exists such that µ-UGE (µ-UGR) holds.
If e.g. µ-UGR holds for the finite set M, this property is denoted as µ-UGR(M).
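As a purely numerical illustration of these definitions, and only as a minimal sketch with hypothetical data (the truncated random walk, the bounding vector and the taboo set below are choices of my own, not taken from the thesis), the weighted norm ‖A‖_µ = sup_i µ_i^{−1} Σ_j |A_ij|µ_j and the µ-UGR contraction can be checked as follows.

    import numpy as np

    def mu_norm(A, mu):
        """Weighted supremum norm ||A||_mu = sup_i mu_i^{-1} * sum_j |A_ij| * mu_j."""
        return np.max((np.abs(A) @ mu) / mu)

    # Hypothetical truncated random walk on {0, ..., N} with downward drift (p < 1/2).
    N, p = 50, 0.3
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        P[i, min(i + 1, N)] += p          # one step up
        P[i, max(i - 1, 0)] += 1 - p      # one step down

    mu = 1.2 ** np.arange(N + 1)          # bounding vector of exponential (product) form
    MP = P.copy()
    MP[:, [0]] = 0.0                      # taboo matrix for M = {0}: transitions into M removed

    print(mu_norm(MP, mu))                # a value < 1 exhibits the mu-UGR contraction on this truncation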

To obtain the existence of maximizing policies Dekker & Hordijk need the continuity of P(f)r(f) on F. For reward vectors bounded by µ, it is sufficient to require the following condition, which will be assumed throughout this section.

Assumption 5.2: Σ_{j∈E} P_{iaj}µ_j is continuous on A(i), ∀ i ∈ E.

For convenience we will introduce a weak concept of continuity of operators, called µ-continuity, as introduced by Dekker & Hordijk [1988]. Let A(f) be a matrix function on E × E, which is µ-bounded for any f ∈ F.


Definition 5.2: A(f) is µ-continuous on F, if for any i ∈ E, and any converging sequence {f_n}_{n∈IN} ⊂ F, with limit f* say,

  lim_{n→∞} Σ_{j∈E} |A_ij(f_n) − A_ij(f*)|µ_j = 0.

In Chapter 4 we used the notions of continuity and analyticity in operator norm. We emphasize that µ-continuity is a weaker condition. It can be characterized through the following lemma, which is in fact Lemma 4.5 from Dekker & Hordijk [1988].

Lemma 5.1 The following assertions are equivalent for a matrix function A(f), with ‖A(f)‖_µ < ∞, ∀ f ∈ F.
i) A(f) is µ-continuous on F.
ii) A(f) and A(f)µ are pointwise continuous on F.
iii) For any pointwise converging sequence {x_n}_n on E, with limit x* say, and sup_{n∈IN_0} ‖x_n‖_µ < ∞, and for any converging sequence {f_n}_n ⊂ F with limit f* say,

  lim_{n→∞} (A(f_n)x_n)_i = (A(f*)x*)_i, i ∈ E.

So, by Assumptions 5.1 and 5.2, P(f) is µ-continuous, but not generally continuous as a matrix function (in operator norm).

As already mentioned in section 5.1, in this section we will focus on the properties of MDC's satisfying µ-UGE or µ-UGR that are directly related to our analysis. To this end we need the expected α-discounted reward vector V^α_i(R) and the average expected reward vector g(R) of the system starting in state i ∈ E and operating under policy R. Let X(n), Y(n) denote the state of the system and the action chosen respectively at time n. With IE_{i,R} the expectation operator when policy R is used and the starting state is i,

  V^α_i(R) := IE_{i,R} ( Σ_{n=0}^∞ α^n r_{X(n),Y(n)} ),    g_i(R) = lim inf_{N→∞} 1/(N+1) IE_{i,R} Σ_{n=0}^N r_{X(n),Y(n)}.

Together with the discount factor α, we use the interest rate ρ = (1 − α)/α. We introduce the following optimality criteria.

Definition 5.3:
R* is average optimal iff g_i(R*) ≥ g_i(R), ∀ R ∈ C, ∀ i ∈ E.
R* is α-discounted optimal iff V^α_i(R*) ≥ V^α_i(R), ∀ R ∈ C, ∀ i ∈ E.
R* is n-discount optimal iff lim inf_{ρ↓0} ρ^{−n}(V^α_i(R*) − V^α_i(R)) ≥ 0, ∀ R ∈ C, ∀ i ∈ E, n ∈ {−1, 0, 1, . . .}.
R* is Blackwell optimal iff ∀ i ∈ E, ∀ R ∈ C ∃ ρ(i, R) such that V^α_i(R*) ≥ V^α_i(R), 0 < ρ ≤ ρ(i, R).

In section 5.1 we also indicated that sensitive optimality is related to lexicographical maximization of terms of the Laurent expansion of the α-discounted rewards. Let x(ρ) = Σ_{n=−1}^∞ x(n)ρ^n, y(ρ) = Σ_{n=−1}^∞ y(n)ρ^n be Laurent series, with x(n), y(n) ∈ IR, n = −1, 0, . . . . Let n_o be the first index for which x(n_o) ≠ y(n_o). Then x(ρ) is lexicographically not less than y(ρ), notation: x(ρ) ≥_l y(ρ), iff x(n_o) ≥ y(n_o). For vectors x(ρ) = (x_i(ρ))_{i∈E}, y(ρ) = (y_i(ρ))_{i∈E} with Laurent series as components, x(ρ) ≥_l y(ρ) iff x_i(ρ) ≥_l y_i(ρ), ∀ i ∈ E.
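A lexicographic comparison of this kind is easy to emulate on truncated coefficient sequences; the sketch below is only illustrative (the truncation length and the tolerance are choices made here, not part of the thesis).

    def lex_geq(x, y, tol=1e-12):
        """Return True if the Laurent series with coefficients x = [x(-1), x(0), ...]
        is lexicographically not less than the one with coefficients y, as far as
        the common truncation can decide."""
        for xn, yn in zip(x, y):
            if abs(xn - yn) > tol:        # first index where the coefficients differ
                return xn > yn
        return True                        # no difference found up to the truncation

    print(lex_geq([2.0, 5.0, -1.0], [2.0, 3.0, 7.0]))   # True: larger 0-th order term dominates
    print(lex_geq([2.0, 3.0, 7.0], [2.0, 5.0, -1.0]))   # False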

Let µ be a vector with µ_i ≥ 1, ∀ i ∈ E. The following proposition comprises some results from Dekker & Hordijk [1988], [1989], thus emphasizing the scope of the conditions used.

Proposition 5.1 Let ‖r(f)‖_µ < ∞, ∀ f ∈ F. Assume either µ-UGE, or µ-UGR with continuity of ν(f) on F. Then
i) The Laurent expansion of the α-discounted rewards exists in a common disk, for all f ∈ F, and satisfies the following formula

  V^α(f∞) = (1 + ρ)[ρ^{−1}Π(f)r(f) + Σ_{n=0}^∞ ρ^n(−1)^n D^{n+1}(f)r(f)] = P(f, α)r(f).

ii) There is a deterministic Blackwell optimal policy, say f∞_o. Moreover, f∞_o is n-discount optimal, n = −1, 0, . . ., hence average optimal.
iii) n-discount optimality and Blackwell optimality are equivalent to lexicographical maximization of the first n+2 terms of the Laurent expansion and of all terms of the Laurent expansion respectively.
iv) There is an α-discounted deterministic optimal policy.
v) A solution (g, v) to the average optimality equations exists in the space of µ-bounded vectors, i.e. ∀ i ∈ E

  g_i = sup_{a∈A(i)} Σ_j P_{iaj} g_j,     (5.1a)
  v_i = sup_{a∈A(i) : g_i = Σ_j P_{iaj} g_j} ( r_{ia} − g_i + Σ_j P_{iaj} v_j ).     (5.1b)

Consider any µ-bounded solution (g, v) to the average optimality equations. A policy f∞_1 for which f_1 chooses the maximizing actions in the second equation, is average optimal and (g(f∞_1), v(f_1) = D(f_1)r(f_1)) is a solution as well.
vi) g(f∞) = Π(f)r(f) and Π(f) is stochastic for all f ∈ F.
vii) If µ-UGE holds, D(f) = Σ_{n=0}^∞ (P^n(f) − Π(f)).
viii) Π(f), D(f) are µ-continuous on F.
ix) ∃ c > 0, such that ‖D(f)‖_µ ≤ c, ‖Π(f)‖_µ ≤ c, ‖P^n(f)‖_µ ≤ c, n ∈ IN_0, ∀ f ∈ F.
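The expansion in i) can be verified numerically on a small example. The sketch below is a sanity check under assumptions of my own: the 3-state chain, the reward vector and the truncation to 30 terms are hypothetical and only serve to illustrate the formula.

    import numpy as np

    P = np.array([[0.5, 0.5, 0.0],        # hypothetical ergodic, aperiodic chain
                  [0.2, 0.3, 0.5],
                  [0.4, 0.1, 0.5]])
    r = np.array([1.0, 0.0, 2.0])

    w = np.linalg.matrix_power(P, 200)[0]          # stationary distribution
    Pi = np.tile(w, (3, 1))                        # stationary matrix
    D = np.linalg.inv(np.eye(3) - P + Pi) - Pi     # deviation matrix

    rho = 0.05
    alpha = 1.0 / (1.0 + rho)
    V = np.linalg.inv(np.eye(3) - alpha * P) @ r   # exact alpha-discounted rewards
    laurent = (1 + rho) * (Pi @ r / rho
                           + sum((-rho) ** n * np.linalg.matrix_power(D, n + 1) @ r
                                 for n in range(30)))
    print(np.max(np.abs(V - laurent)))             # difference should be negligible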

Observe that under strong recurrence conditions a formula for the deviation matrix exists as well (cf. Theorem 4.7). We briefly comment on the assertions of the Proposition that are straightforward to derive. For the µ-UGE property, i) follows from the analysis in section 4.1, since the property requires a common convergence radius for all deterministic policies. For the µ-UGR property, the Key theorem of Part II together with a data transformation similar to the one in section 4.2 yields the Laurent expansion.

vii) and ix) are directly implied by condition µ-UGE. Furthermore, mark that the componentwise continuity of P(f) and P(f)µ on F implies inductively the componentwise continuity of P^n(f) and P^n(f)µ, by virtue of Proposition 18, p. 232, of Royden [1968], so that P^n(f) is µ-continuous on F for n ∈ IN. Then the continuity of Π_ij(f) and (Π(f)µ)_i, ∀ i ∈ E, directly follows. Indeed, these functions are the limits of the uniformly convergent sequences {P^n_ij(f)}_{n∈IN_0} and {(P^n(f)µ)_i}_{n∈IN_0} of continuous functions on the compact topological space F with values in the metric space IR (cf. Royden [1968], Problem 8.3.17). Similar arguments can be applied for the µ-continuity of D(f).

We establish the stochasticity of Π(f) by taking the rewards r_{ia} to be equal to 1, ∀ a ∈ A(i), i ∈ E. Then 1 = g_i(f∞) = Σ_j Π_ij(f), ∀ f ∈ F. Alternatively it is implied by µ-UGE through (2.8). In the case of µ-UGR a more direct argument can be used. This property implies the existence of a solution to a generalized version of Foster's criterion for positive recurrence, for any MC induced by an f ∈ F (cf. Hordijk [1974], Tweedie [1975]). The same result also follows from a tightness property that will be discussed in the next section.

The uniform boundedness of ‖P^n(f)‖_µ, n ∈ IN_0, and ‖Π(f)‖_µ on F under condition µ-UGR can be derived by last exit decomposition on set M, similarly to the proof of Lemma 7.2.

The section concludes with a review of recurrence conditions related to uniform strong recurrence, which we need for the proof of our Key theorem. The analogous conditions for one MC discussed in section 2.1 are special cases of these. They were introduced in Dekker [1985], and their relationship will also be discussed in a future paper by Dekker, Hordijk and Spieksma [1990]. We adjusted the nomenclature in the papers by Dekker & Hordijk for a better consistency with existing terminology for MC's. Also we introduced µ-UGR as a condition on the one-step transition matrices for reasons of verifiability.

Let µ be a vector with µi ≥ 1 ∀ i ∈ E, and M ⊂ E be a finite set.

Definition 5.4: The set of MC's with transition matrices P(f), f ∈ F, satisfies condition
– µ-UWGR(M), if ∃ c > 0, β < 1, such that ∀ f ∈ F
  ‖_MP^n(f)‖_µ ≤ cβ^n, n ∈ IN_0.
– µ-UR(M), if ∃ n_0 ∈ IN_0, β < 1, c_1 > 1, such that ∀ f ∈ F
  ‖_MP^{n_0}(f)‖_µ ≤ β, ‖P(f)‖_µ ≤ c_1.
– µ-UBS(M), if ∃ c_2 > 1, such that ∀ f ∈ F
  ‖Σ_{n=0}^∞ _MP^n(f)‖_µ ≤ c_2.
– µ-UWGRRS(M), if ∃ c > 0, β < 1, such that ∀ f ∈ F, M contains a set of reference states B(f) with
  ‖_{B(f)}P^n(f)‖_µ ≤ cβ^n, n ∈ IN_0.
– µ-URRS(M), if ∃ n_0 ∈ IN_0, β < 1, c_1 > 1, such that ∀ f ∈ F, M contains a set of reference states B(f) with
  ‖_{B(f)}P^{n_0}(f)‖_µ ≤ β, ‖P(f)‖_µ ≤ c_1.
– µ-UBSRS(M), if ∃ c_2 > 1, such that ∀ f ∈ F, M contains a set of reference states B(f) with
  ‖Σ_{n=0}^∞ _{B(f)}P^n(f)‖_µ ≤ c_2.

The letter combinations U, (W)GR, BS, RS stand for Uniform, (Weak) Geometric Recurrence, Bounded Sum, Reference States.

For bounded µ, µ-UBS(M) is the same as requiring the expected recurrence times to a finite set M to be uniformly bounded in the starting states and the deterministic policies. µ-UR(M) then requires a positive lower bound on the probability of being in set M at time n_0, uniformly in the initial states and the deterministic policies. This is the simultaneous Doeblin condition (cf. Hordijk [1974]).
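Numerically, and again only as an illustrative sketch with hypothetical data (the same kind of truncated random walk as in the sketch following Definition 5.1), the bounded-sum quantity behind µ-UBS(M) can be evaluated through a matrix inverse, since Σ_{n≥0} _MP^n = (I − _MP)^{−1} when ‖_MP‖_µ < 1.

    import numpy as np

    N, p = 50, 0.3                         # hypothetical truncated random walk on {0, ..., N}
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        P[i, min(i + 1, N)] += p
        P[i, max(i - 1, 0)] += 1 - p

    mu = 1.2 ** np.arange(N + 1)           # exponential bounding vector
    MP = P.copy()
    MP[:, [0]] = 0.0                       # taboo matrix for M = {0}

    S = np.linalg.inv(np.eye(N + 1) - MP)  # = sum_{n>=0} MP^n (spectral radius of MP is < 1 here)
    print(np.max((S @ mu) / mu))           # the constant c_2 of mu-UBS(M) on this truncation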

In Federgruen, Hordijk & Tijms [1978a,b] the equivalence of e-UR(M), e-UBS(M), e-URRS(M) and e-UBSRS(M) was proved for unichain MDC's. Moreover, these conditions are shown to be equivalent to e-UGE, if aperiodicity is assumed as well. In Zijm [1985] the same relations are established for multichain MDC's if, in addition to conditions e-UR(M) or e-UBS(M), ν(f) is finite and continuous on F.

Combination of results by Dekker & Hordijk [1989], [1990] and Key theorem II yields the same assertions as in Zijm [1985] when we work in the space of µ-bounded vectors instead of e-bounded or uniformly bounded vectors. The results by Dekker, Hordijk & Spieksma are useful for our analysis, so we recapitulate these in a lemma (cf. also section 2.2 of this monograph).

Lemma 5.2

i) µ− UWGR(M), µ− UR(M) and µ− UBS(M) are equivalent.

ii) µ− UWGRRS(M), µ− URRS(M) and µ− UBSRS(M) are equivalent.

iii) µ− UWGR(M), µ− UR(M) and µ− UBS(M) together with continuity of ν(f) on F are equivalent to µ− UWGRRS(M), µ− URRS(M) and µ− UBSRS(M).

The relation between µ−UGR(M) and these properties is expressed in the following lemma.

Lemma 5.3
i) µ-UGR(M) ⇒ µ-UWGR(M).
ii) µ-UWGR(M) ⇒ µ̄-UGR(M), with µ̄ = sup_{f∈F} Σ_{n=0}^∞ _MP^n(f)µ.

Proof: i) is trivial. For the proof of ii) we use results for positive dynamic programming from Hordijk [1974]. By Remark 5.1 it is sufficient to consider Markov policies. So, let R = (π_0, π_1, . . .) ∈ C(M), and P(π) the transition matrix of the MC induced by the stationary decision rule π. We write _MP_n(R) for the product _MP(π_{n−1}) · · · _MP(π_0) and _MP_0(R) = I. By the existence of nearly optimal policies in positive dynamic programming we have that

  sup_{R∈C(M)} Σ_{n∈IN_0} _MP_n(R)µ = sup_{f∈F} Σ_{n∈IN_0} _MP_n(f)µ =: µ̄,

so that µ̄ is µ-excessive with respect to {_MP(f) | f ∈ F}, i.e.

  µ + _MP(f)µ̄ ≤ µ̄, ∀ f ∈ F.

If condition µ-UWGR(M) holds for the constants c > 0, β < 1, then µ ≤ µ̄ ≤ (1 − β)^{−1}cµ. Hence,

  _MP(f)µ̄ ≤ µ̄ − µ ≤ (1 − (1 − β)/c)µ̄.

Thus µ̄-UGR(M) holds for the constant β̄ = 1 − (1 − β)/c.
Notice that µ̄ is bounded iff µ is.


CHAPTER SIX

Necessary and sufficient Conditions for the Laurent Expansion.

1. Lemmas on the underlying MDC structure.

Throughout this section Assumption 5.1 is supposed to hold. The results of the lemmas obtained here are all implied by at least one of the main conditions of this chapter: µ-UGE with ν < ∞, µ-UGR with ν(f) continuous on F, or Lasserre's condition on the Laurent expansion together with ν < ∞. As a first step to obtain µ-UGR from one of the other two conditions, we need the existence of a finite set K ⊂ E that can be reached from all initial states under all deterministic policies. The following lemma guarantees this under weak conditions. It is Lemma 3.2 from Zijm [1985], albeit in a slightly modified version. The proof is essentially due to Deppe [1985].

Lemma 6.1 Let ν(f) < ∞ and Π(f) stochastic for all f ∈ F. Then there is a finite set K ⊂ E, such that K contains a set B(f) of reference states for any f ∈ F.

Under an additional continuity condition this can be strengthened.

Lemma 6.2 Let ν(f) < ∞, Π(f) stochastic for all f ∈ F and componentwise continuous on F. Then there are a finite set K ⊂ E and an ε > 0, such that K contains for any f ∈ F a set B(f) of reference states with

  Π_bb(f) ≥ ε, ∀ b ∈ B(f).     (6.1)

Combining this with Lemma 2.1 we achieve the existence of a positive lower bound on the stationary probability on set K, uniformly in the initial states and the deterministic policies. The proof of the lemma is implicit in Zijm's [1985] proof of Lemma 3.3, but we give it here explicitly.

Proof of Lemma 6.2: Lemmas 2.1 and 6.1 imply the existence of a finite set D ⊂ E with F_iD(f) = 1, ∀ i ∈ E, f ∈ F. Choose δ ∈ (0, 1). For i ∈ D there is a finite set D(i) ⊂ E, for which Σ_{j∈D(i)} Π_ij(f) > δ, ∀ f ∈ F. To see this, we use similar arguments as in Theorem 3 of Federgruen, Hordijk & Tijms [1978b].
Let {S_n}_{n∈IN} be a sequence of finite sets such that S_n ⊂ E, S_{n+1} ⊇ S_n and lim_{n→∞} S_n = E. Let a_n(f) := Σ_{j∈S_n} Π_ij(f), a(f) ≡ 1, for all f ∈ F. Then a, a_n, n ∈ IN, are continuous functions on F; a_n(f) ≤ a_{n+1}(f), n = 1, 2, . . ., and lim_{n→∞} a_n(f) = 1 = a(f), for f ∈ F. Since F is compact, we can apply Dini's theorem of uniform convergence (cf. Royden [1968]) to obtain the uniform convergence of a_n to a.
Define N := max_{i∈D} |D(i)|. For all i ∈ D and f ∈ F, there is a state s_{i,f} ∈ D(i) such that ε := δ/N ≤ Π_{i s_{i,f}}(f). Then ε ≤ Π_{s_{i,f} s_{i,f}}(f), by Lemma 2.1. D contains a set B(f) of reference states ∀ f ∈ F. Let K := ∪_{i∈D} D(i) ∪ D. Then {s_{b,f} | b ∈ B(f)} ⊂ K is a set of reference states that satisfies the assertion of the lemma.

Componentwise continuity of Π(f) on F is closely related to tightness of the collection {Π_{i•}(f) | f ∈ F}, for any i ∈ E. Indeed, we can prove the following generalization of pp. 82-83 of Hordijk [1974].

Lemma 6.3 Let ν(f) < ∞, and let µ be a vector on E with µ_i ≥ 1, ∀ i ∈ E. Then the two following sets of conditions are equivalent:
i) ν(f) is continuous on F, and {Π_{i•}(f) | f ∈ F} is uniformly integrable with respect to µ, hence tight, ∀ i ∈ E.
ii) Π(f) is stochastic, ∀ f ∈ F, and Π(f) is µ-continuous on F.

Proof: i) ⇒ ii). As µ_i ≥ 1, for all i ∈ E, the collection {Π_{i•}(f) | f ∈ F} is tight, and the stochasticity follows. Next we prove the componentwise continuity of Π(f).
Let {f_n}_{n∈IN} be a convergent sequence in F with limit f* ∈ F. The corresponding collection {Π_{i•}(f_n) | n ∈ IN} is tight, for any i ∈ E, since a subset of a tight set is tight. Choose i ∈ E. By virtue of a theorem of Prohorov (cf. Billingsley [1968]) it contains a weakly convergent subsequence {Π_{i•}(f_{n_k})}_{k∈IN} with weak limit say Π̄_{i•}. As the state space is discrete, all states are closed sets, hence

  lim_{k→∞} Π_ij(f_{n_k}) = Π̄_ij, ∀ j ∈ E.

By a diagonalization procedure we obtain a subsequence, call it again {n_k}_{k∈IN}, of IN, and a stochastic matrix Π̄, for which

  lim_{k→∞} Π_ij(f_{n_k}) = Π̄_ij, ∀ i, j ∈ E.     (6.2)

Clearly, Π(f_{n_k})P(f_{n_k}) = Π(f_{n_k}) = P(f_{n_k})Π(f_{n_k}). So, together with Proposition 18 in Royden [1968] this yields Π̄P(f*) = Π̄ = P(f*)Π̄, if we let k tend to infinity. Iterating, summing and averaging we obtain with the Fubini-Tonelli theorem Π̄ · 1/(N+1) Σ_{n=0}^N P^n(f*) = Π̄ = 1/(N+1) Σ_{n=0}^N P^n(f*) · Π̄. Again apply Royden's Proposition 18 to establish

  Π̄ · Π(f*) = Π̄,     (6.3)
  Π(f*) · Π̄ = Π̄.     (6.4)

The stochasticity of Π(f) and Lemma 6.1 yield the existence of a finite set D ⊂ E that contains a set of reference states for any f ∈ F. We claim that D contains a set B*, such that B* is a set of reference states for f* and for a subsequence of {f_{n_k}}_{k∈IN}.


Obviously, since D is finite and ν(f) is integer valued and continuous on F, there is a subsequence of {n_k}_{k∈IN}, call it {n_k}_{k∈IN} again for simplicity, such that B(f_{n_k}) = B and ν(f_{n_k}) = ν(f*) = ν, for some finite set B ⊂ D and ν < ∞.
If the states b ∈ B are all positive recurrent in the MC induced by f*, we can choose B* = B. Indeed, in this case P^n_{bb′}(f*) = lim_{k→∞} P^n_{bb′}(f_{n_k}) = 0, n = 1, 2, . . ., for any b, b′ ∈ B with b ≠ b′.
On the other hand, suppose that b ∈ B is transient under f*. Since D contains a set of reference states for f*, there is a state b′ ∈ D, such that F_{bb′}(f*) > 0, hence P^n_{bb′}(f*) > 0 for some n. Consequently, the componentwise continuity of P^n(f) implies the existence of a constant K_b, such that P^n_{bb′}(f_{n_k}) > 0 if k ≥ K_b. This means that b′ is in the same positive recurrent class as b in the MC induced by f_{n_k}, for k ≥ K_b.
Adjust the subsequence {n_k}_{k∈IN} and replace b by b′. Iterating the procedure for all states b ∈ B that are transient under f*, we end after finitely many steps with a set B* that satisfies our claim.

Let C_l(f*), l = 1, . . . , ν, be the positive recurrent classes and T(f*) be the set of transient states in the MC induced by f*. We write B* = {b_1, . . . , b_ν} with b_l ∈ C_l(f*), and show that Π̄ = Π(f*), without explicitly referring to the results of Lemma 2.1. Then we consider the following cases:
1) j ∈ T(f*): by (6.3), Π̄_ij = Σ_{k∈E} Π̄_ik Π_kj(f*) = 0.
2) i ∈ C_l(f*): for any b, b′ ∈ B* with b ≠ b′, Π_{bb′}(f_{n_k}) = 0, hence Π̄_{bb′} = 0. Furthermore, combination with 1) and (6.3) yields for m ≠ l

  0 = Π̄_{b_l b_m} = Σ_{k∉T(f*)} Π̄_{b_l k} Π_{k b_m}(f*) = Σ_{k∈C_m(f*)} Π̄_{b_l k} Π_{b_m b_m}(f*).

Since Π_{b_m b_m}(f*) > 0, Π̄_{b_l k} = 0 if k ∈ C_m(f*) for m ≠ l. Together with 1) this gives that Π̄_{b_l k} = 0 if k ∉ C_l(f*). Hence, Σ_{k∈C_l(f*)} Π̄_{b_l k} = 1. Then, by (6.3) we have for j ∈ C_l(f*)

  Π̄_{b_l j} = Σ_{k∈C_l(f*)} Π̄_{b_l k} Π_kj(f*) = 1 · Π_{b_l j}(f*).

Using this relation, (6.4), the fact that i ∈ C_l(f*) and

  Π̄_ij = Σ_{k∈E} Π_ik(f*) Π̄_kj = Σ_{k∈E} Π_{b_l k}(f*) Π̄_kj = Π̄_{b_l j},

we obtain that Π̄_ij = Π_ij(f*) for any i ∉ T(f*) and all j ∈ E.
3) i ∈ T(f*). Use (6.4) and 2) to achieve for any j ∈ E

  Π̄_ij = Σ_{k∈E} Π_ik(f*) Π̄_kj = Σ_{l=1}^ν F_{i b_l}(f*) Σ_{k∈C_l(f*)} Π_{b_l k}(f*) Π̄_kj = Σ_{l=1}^ν F_{i b_l}(f*) Π̄_{b_l j} = Σ_{l=1}^ν F_{i b_l}(f*) Π_{b_l j}(f*) = Π_ij(f*).


2) and 3) together prove that Π̄_ij = Π_ij(f*) for any i, j ∈ E. Combination with relation (6.2) yields the componentwise continuity of Π(f) on F.
The componentwise continuity of Π(f) and the uniform integrability condition together establish the continuity of (Π(f)µ)_i on F, for any i ∈ E.

ii) ⇒ i). We prove the continuity of ν(f) on F. So, let {f_n}_{n∈IN} be a converging sequence in F with limit f*. Let D ⊂ E be a finite set as in the assertion of Lemma 6.2, i.e. D contains a set of reference states B(f) with Π_bb(f) ≥ ε for all b ∈ B(f), for any f ∈ F, and some ε > 0.
Consider any set B ⊂ D that occurs infinitely often in the sequence {B(f_n)}_{n∈IN}, and let {f_{n_k}}_{k∈IN} be the subsequence for which B(f_{n_k}) = B. Since Π_{bb′}(f_{n_k}) = 0 for b ≠ b′, the continuity of Π_{bb′}(f) implies that Π_{bb′}(f*) = 0 for any b, b′ ∈ B with b ≠ b′. Hence, the states in B are positive recurrent (as Π_bb(f*) = lim_{k→∞} Π_bb(f_{n_k}) ≥ ε) and do not communicate in the MC induced by f*. Thus ν(f*) ≥ |B|.
Suppose that ν(f*) > |B|. By Lemma 6.2 a positive recurrent state b* ∈ D exists that does not communicate with the states in B. However,

  Π_{b*B}(f*) = lim_{k→∞} Π_{b*B}(f_{n_k}) = lim_{k→∞} Σ_{b∈B} F_{b*b}(f_{n_k}) Π_bb(f_{n_k}) ≥ ε.

Consequently,

  ν(f*) = |B| = ν(f_{n_k}), k ∈ IN.     (6.5)

As B was an arbitrary limiting set, (6.5) holds for all limiting sets. Since there are only finitely many limiting sets, there is an N_0 ∈ IN, such that each B(f_n) occurs infinitely often for n ≥ N_0. We conclude that ν(f_n) = ν(f*) for n ≥ N_0.
The componentwise continuity of Π(f), Π(f)µ together with Dini's theorem on uniform convergence establishes the uniform integrability of the set {Π_{i•}(f) | f ∈ F} with respect to µ, similarly to the proof of Lemma 6.2.

Two observations are trivial now, but convenient to state.

Lemma 6.4 Let ν(f) < ∞ and µ be a vector with µ_i ≥ 1 for all i ∈ E. Furthermore, let Π(f) be stochastic for any f ∈ F and {Π_{i•}(f) | f ∈ F} be uniformly integrable with respect to µ. Then for each ε > 0 there is a finite set K(ε) ⊂ E that contains a set of reference states B(f), such that Σ_{j∉K(ε)} Π_ij(f)µ_j ≤ ε, ∀ i ∈ E, f ∈ F.

Proof: By virtue of Lemma 6.1 we can find a finite set D ⊂ E that contains a set of reference states B(f) for any f ∈ F. Choose ε > 0.
The uniform integrability condition yields for each i ∈ D the existence of a finite set D(i, ε) with Σ_{j∉D(i,ε)} Π_ij(f)µ_j < ε for any f ∈ F. Let K(ε) = ∪_{i∈D} D(i, ε) ∪ D; then also Σ_{j∉K(ε)} Π_ij(f)µ_j < ε, for any i ∈ D and f ∈ F. For i ∉ D,

  Σ_{j∉K(ε)} Π_ij(f)µ_j = Σ_{b∈B(f)} Σ_{j∉K(ε)} F_ib(f) Π_bj(f)µ_j ≤ Σ_{b∈B(f)} F_ib(f) · ε = ε.

Corollary 6.1 The collection {Π_{i•}(f) | i ∈ E, f ∈ F} is uniformly integrable with respect to µ, consequently tight.

A brief review of the results presented hitherto is useful. For µ-UGE and Lasserre's condition, condition set ii) of Lemma 6.3 is easily verified, as we already mentioned in section 5.2. This implies condition set i) and the assertions of Lemmas 6.1, 6.2 and 6.4, thus providing us with information on the behaviour of the system with respect to some finite set. This will help us to prove µ-UGR. On the other hand, if µ-UGR is assumed, it turns out that the conditions of Lemma 6.4 are easier to verify. If additionally continuity of ν(f) is assumed, condition set i) holds. Lemma 6.5 gives conditions on the transition matrices for the assumptions of Lemma 6.4. In the next section we show the assumption of Lemma 6.5 to hold for property µ-UGR.

Lemma 6.5 Let µ be a vector on E with µ_i ≥ 1, ∀ i ∈ E, and let {(P(f_1) · · · P(f_N))_{i•} | f_1, . . . , f_N ∈ F, N ∈ IN} be a uniformly integrable set with respect to µ, for each i ∈ E. Then Π(f) is stochastic for any f ∈ F and the set {Π_{i•}(f) | f ∈ F} is uniformly integrable with respect to µ.

Proof: The collection {P^n_{i•}(f) | f ∈ F, n ∈ IN} is uniformly integrable with respect to µ, hence so is the set of all convex combinations of these measures. In particular, the set {1/(N+1) Σ_{n=0}^N P^n_{i•}(f) | f ∈ F, N ∈ IN} is uniformly integrable with respect to µ, consequently tight and hence relatively compact by a theorem of Prohorov, for any i ∈ E. Using the definition of the stationary matrix, this ensures the row sums of Π(f) to be equal to 1 and the uniform integrability of {Π_{i•}(f) | f ∈ F}.

Under the assumption of µ-UGR, the assertion of Lemma 6.5 is used to reduce the proof of µ-UGE to a derivation of suitable bounds for the difference between P^n_ij(f) and Π_ij(f), for finitely many states j and uniformly in f ∈ F. The existence of such bounds is guaranteed by Lemma 6.6. It is obvious that such a lemma is necessary as well if e-norms are used. However, Federgruen, Hordijk & Tijms [1978a] do not prove it explicitly, since the stronger MC structure allows more direct arguments to establish the e-UGE property. Yet the basic ideas of the proof are contained in the proof of Theorem 2.5 in Federgruen, Hordijk & Tijms [1978a].

Lemma 6.6 Let {(P(f_1) · · · P(f_N))_{i•} | f_1, . . . , f_N ∈ F, N ∈ IN} be tight for any i ∈ E, let P(f) be aperiodic for any f ∈ F, and let ν(f) be continuous on F. Then P^n_ij(f) converges to Π_ij(f) uniformly in f ∈ F as n tends to infinity, for any i, j ∈ E.


Proof: Lemmas 6.3 and 6.5 yield that Π(f) is stochastic and continuous on F. Choose ε > 0, i, j ∈ E. Let K ⊂ E be a finite set, such that Σ_{k∉K} P^n_ik(f) < ε/3 and Σ_{k∉K} Π_ik(f) < ε/3, ∀ n ∈ IN, f ∈ F. We define

  n(f) := min{ n ∈ IN : |P^n_kj(f) − Π_kj(f)| < ε/(3|K|), ∀ k ∈ K }.

Then the set S_m := {f ∈ F | n(f) ≥ m} ⊂ F is closed ∀ m ∈ IN. Indeed, let {f_n}_{n∈IN} ⊂ S_m be a converging sequence with limit f* ∈ F, for some m ∈ IN. Suppose that f* ∉ S_m. This means that n(f*) < m, so that

  |P^{n(f*)}_kj(f*) − Π_kj(f*)| < ε/(3|K|), ∀ k ∈ K.

Since P^n_kj(f), Π_kj(f) are continuous functions on F, this yields a contradiction with the assumption that f_n ∈ S_m, ∀ n ∈ IN.
We conclude that the function n(f) is u.s.c. (upper semi-continuous) on the compact set F, so that M := sup_{f∈F} n(f) < ∞ (cf. Royden [1968]). For all n > M and f ∈ F we obtain

  |P^n_ij(f) − Π_ij(f)| = |Σ_{k∈E} P^{n−n(f)}_ik(f) (P^{n(f)}_kj(f) − Π_kj(f))|
    ≤ Σ_{k∈K} P^{n−n(f)}_ik(f) |P^{n(f)}_kj(f) − Π_kj(f)| + Σ_{k∉K} P^{n−n(f)}_ik(f) · 2
    ≤ ε/3 + 2ε/3 = ε, ∀ f ∈ F.

2. Equivalence of recurrence and ergodicity properties.

2.1. Markov decision chains.

In this section both Assumptions 5.1 and 5.2 are supposed to hold. The next lemma shows that the uniform integrability property required for Lemmas 6.5 and 6.6 is implied by condition µ-UGR. It is a generalization of Theorem 2.1 for one MC. Let µ be a vector with µ_i ≥ 1 for all i ∈ E.

Lemma 6.7 Assume µ-UGR(M). Then the collection {(P(f_1) · · · P(f_N))_{i•} | f_1, . . . , f_N ∈ F, N ∈ IN} is uniformly integrable with respect to µ, for any i ∈ E.

Proof: Choose any ε > 0. For any sequence {f_n}_{n∈IN} ⊂ F we apply last exit decomposition on set M. Then, with the convention that _MP(f_{n+1}) · · · _MP(f_n) = I,

  (P(f_1) · · · P(f_n))_ij = (_MP(f_1) · · · _MP(f_n))_ij + Σ_{k=0}^{n−1} Σ_{m∈M} (P(f_1) · · · P(f_{n−k}))_im (_MP(f_{n−k+1}) · · · _MP(f_n))_mj.     (6.6)

Let M′ be any set containing M. Multiply both sides in (6.6) with µ_j and sum over all states outside M′. This gives

  Σ_{j∉M′} (P(f_1) · · · P(f_n))_ij µ_j = Σ_{j∉M′} (_MP(f_1) · · · _MP(f_n))_ij µ_j
    + Σ_{j∉M′} Σ_{k=0}^{n−1} Σ_{m∈M} (P(f_1) · · · P(f_{n−k}))_im (_MP(f_{n−k+1}) · · · _MP(f_n))_mj µ_j
    ≤ β^n µ_i + Σ_{m∈M} Σ_{k=0}^{n−1} Σ_{j∉M′} (_MP(f_{n−k+1}) · · · _MP(f_n))_mj µ_j,     (6.7)

as ‖_MP(f_1) · · · _MP(f_n)‖_µ ≤ β^n, for any sequence f_1, . . . , f_n ∈ F and any n ∈ IN. Choose N_0 > 1, such that β^{N_0}(1 − β)^{−1} max(Σ_{m∈M} µ_m, µ_i) < ε/2. Let n > N_0. Then

  Σ_{m∈M} Σ_{k=N_0}^{n−1} Σ_{j∉M′} (_MP(f_{n−k+1}) · · · _MP(f_n))_mj µ_j ≤ (β^{N_0} + · · · + β^{n−1}) Σ_{m∈M} µ_m,

so that

  (6.7) < ε/2 + Σ_{m∈M} Σ_{k=0}^{N_0−1} Σ_{j∉M′} (_MP(f_{n−k+1}) · · · _MP(f_n))_mj µ_j,     (6.8)

for all choices of {f_n}_{n∈IN}, any n > N_0 and any set M′ ⊃ M. (P(f)µ)_l is continuous on F, ∀ l ∈ E. Furthermore, sup_{f∈F} ‖P(f)‖_µ ≤ c for some c > 0 by virtue of Proposition 5.1. We invoke again Royden's Proposition 18, p. 232 [1968] to obtain inductively the continuity of [(P(f_1) · · · P(f_n))µ]_l on the compact set F^n, n ∈ IN, l ∈ E. Since N_0 is finite, we conclude that a finite set M* ⊃ M exists, such that for any k ≤ N_0 − 1

  Σ_{j∉M*} (P(f_1) · · · P(f_k))_lj µ_j < ε/(2|M|·N_0), for l ∈ M or l = i, ∀ f_1, . . . , f_k ∈ F.     (6.9)

Combine this with (6.8) to establish that Σ_{j∉M*} (P(f_1) · · · P(f_n))_ij µ_j < ε, for any sequence f_1, . . . , f_n ∈ F and n > N_0. By (6.9) this inequality holds for n ≤ N_0 as well, which completes the proof of the lemma.

Since the Key theorem uses property µ-UWGR(M) instead of µ-UGR(M), the following corollary is convenient to state.

Corollary 6.2 µ-UWGR(M) implies that the collection {(P(f_1) · · · P(f_n))_{i•} | f_1, . . . , f_n ∈ F, n ∈ IN} is uniformly integrable with respect to µ for any i ∈ E.

Proof: By Lemma 5.3, µ̄-UGR(M) holds for µ̄ = sup_{f∈F} Σ_n _MP^n(f)µ, so that the set above is uniformly integrable with respect to µ̄. As µ ≤ c′µ̄ for some c′ > 0, this set is also uniformly integrable with respect to µ.

As a conclusion of the previous analysis, we summarize the results hitherto obtained in the form we will need for our Key theorem.


Corollary 6.3
i) µ-UGR(M), hence µ-UWGR(M), together with aperiodicity and continuity of ν(f) on F imply the assertion of Lemma 6.7, consequently those of Lemmas 6.1 to 6.6.
ii) µ-UGE together with the finiteness of ν(f), for any f ∈ F, implies the assertions of Lemmas 6.1 to 6.4.

We are finally in a position to prove the Key theorem. By the preparatory lemmas this proof is nearly reduced to a mere copy of the proof of the Key theorem of Part I, which states the same result for one MC. Therefore we will only discuss the details by which the proofs differ.

Key theorem II The two following sets of conditions are equivalent:
i) µ-UGE, and ν(f) < ∞, ∀ f ∈ F.
ii) µ-UWGR(M), ν(f) continuous on F, and P(f) aperiodic for any f ∈ F.

Proof: Replace all quantities such as the transition matrix P, the stationary matrix Π, the deviation matrix D and the set of reference states B in the proof of Key theorem I by the corresponding matrix functions on F. Evidently, all bounds in Part I that are derived by a direct application of the conditions of the theorem are uniform bounds in f ∈ F here. So, we only have to consider a few places where we have to invoke the lemmas derived above.

i) ⇒ ii). Similarly as in the proof of Key theorem I we will show µ-UBS(M). The vectors w, g and v are functions on F here as well. There are two things that need attention. In the first place we need the existence of one fixed set M, such that (2.10) holds for all f ∈ F.
Consider the conditions on set M in that proof. Then these imply the following conditions for M here: M contains a set B(f) of reference states, ∀ f ∈ F, and

  Σ_{j∉M} Π_ij(f)µ_j ≤ ε, ∀ i ∈ E, f ∈ F and some ε > 0.

However, the existence of such a set is guaranteed by Corollary 6.3 and Lemmas 6.2 to 6.4. These lemmas also establish the continuity of ν(f) on F.
Secondly, the righthand side of (2.10) has to be µ-bounded, not only in N ∈ IN, but also in f ∈ F. By Proposition 5.1, P^N(f) is uniformly µ-bounded in N ∈ IN and in f ∈ F. Hence the same applies for _MP^N(f). Since D(f) is uniformly µ-bounded in f ∈ F, say by a constant c′ > 0, we have

  |v_i(f)| = |Σ_{j∉M} D_ij(f)µ_j − min_{m∈M} Σ_{j∉M} D_mj(f)µ_j|
    ≤ c′µ_i + max_{m∈M} Σ_{j∉M} |D_mj(f)|µ_j
    ≤ c′(1 + max_{m∈M} µ_m)µ_i.

This establishes µ-UBS(M). Since P^n_ij(f) converges to Π_ij(f), the aperiodicity is automatically fulfilled. This completes the first part of the proof.

ii) ⇒ i). It is sufficient to prove that sup_{f∈F} ‖P^n(f) − Π(f)‖_µ → 0, for n → ∞. Indeed, suppose that sup_{f∈F} ‖P^{n_o}(f) − Π(f)‖_µ < ε for some ε ∈ (0, 1) and some n_o ∈ IN. By Proposition 5.1 ix) there is a c′ > 0, such that

  sup_{f∈F} ‖P^n(f) − Π(f)‖_µ ≤ sup_{f∈F} { ‖P^n(f)‖_µ + ‖Π(f)‖_µ } ≤ 2c′, ∀ n ∈ IN_0.

If n ∈ {kn_o, . . . , (k + 1)n_o − 1} for some k ∈ IN_0, then with β = ε^{1/n_o},

  ‖P^n(f) − Π(f)‖_µ = ‖(P(f) − Π(f))^n‖_µ ≤ ‖P^{n_o}(f) − Π(f)‖_µ^k · ‖P^{n−kn_o}(f) − Π(f)‖_µ ≤ ε^k · 2c′ = β^{kn_o} · 2c′ ≤ β^n · 2c′/ε,

so that µ-UGE holds for the constants c = 2c′/ε and β.

We start from condition µ-UWGRRS(M). There are only three places where we have to pay attention to the derivation of the appropriate bounds. The first place is where a suitable finite set D is created that reduces the third term in (2.11) to an analysis of a finite summation over j. A careful check reveals that it is sufficient to require for this set D the following conditions:

  D ⊃ M ⊃ B(f), ∀ f ∈ F,
  Σ_{j∉D} P^k_mj(f)µ_j ≤ ε/4, ∀ k ∈ IN, ∀ m ∈ M, ∀ f ∈ F,
  Σ_{j∉D} Π_ij(f)µ_j ≤ ε/4, ∀ i ∈ E, f ∈ F.

Such a set D exists by virtue of Lemma 6.7 and Corollary 6.1. The second place concerns the choice of the constant N(ε). We need that the kth step transition matrix and the stationary matrix differ only slightly from each other, for finitely many pairs of states and all sufficiently large values of k, uniformly in f ∈ F. In formula,

  |P^k_mj(f) − Π_mj(f)| < ε/(4|D|) · (max_{d∈D} µ_d)^{−1}, ∀ m ∈ M, ∀ j ∈ D, ∀ k ≥ N(ε), ∀ f ∈ F.

This is ensured by Lemma 6.6. The third place is inequality (2.20). We increase the bound by summing over b ∈ D instead of over b ∈ B(f) only. Then the corresponding summation in the last inequality of the proof is also replaced by a summation over D. This obviously does not invalidate the conclusion, so that the proof of the second part is completed.

The relations between the various recurrence and ergodicity properties are captured schematically in the following table.

            µ-UGE, ν(f) < ∞
                  ⇕   (with aperiodicity and ν(f) continuous on F)
  µ-UGR(M) ⟹ µ-UWGR(M) ⟺ µ-UBS(M) ⟺ µ-UR(M) ⟹ µ̄-UGR(M), µ̄ = sup_{f∈F} Σ_n _MP^n(f)µ
                  ⇕   (with ν(f) continuous on F)
            µ-UWGRRS(M) ⟺ µ-UBSRS(M) ⟺ µ-URRS(M)

                                  Table 4

Notice that µ-UGR implies µ-UGE for the same µ-vector. Thus, under the assumption of µ-UGR, a data transformation technique, this result and the analysis in Dekker & Hordijk [1988] can be used to establish the existence of sensitive optimal policies, for all reward functions that are uniformly µ-bounded in the deterministic policies. Moreover, for µ-UGR models, Key theorem II also yields geometrically fast convergence of the marginal expected rewards to the rewards under the stationary distribution, for all uniformly µ-bounded reward functions, such that the convergence is uniform in the deterministic policies. Finally, the Key theorem shows the validity of the title of this section.

Corollary 6.4 Uniform strong recurrence together with aperiodicity and continuity of ν(f) is equivalent to uniform strong convergence with ν(f) < ∞.

2.2. Uniformizable Markov decision processes.

The equivalence result of the previous section can be easily extended to hold for uniformizable MDP's, similarly as in section 2.3. In fact, by combination of the two the proofs in this section are empty. However, for the sake of completeness I preferred to state the model and results explicitly, as a basis for future research on sensitive optimality of MDP's. In the interest of brevity we omit details. The model description is adopted from Serfozo [1979].

Consider a dynamic system that is controlled continuously in time. At each moment it is in one of a denumerable set E of states. If the system is in state i ∈ E an action from set A(i) is chosen. If the action selected is a, a reward r_{ia} is incurred and the system remains an exponentially (q_{ia}) distributed amount of time in state i, after which it jumps to state j ≠ i with probability P_{iaj}.

Since our conditions involve only deterministic policies, we restrict to the set F. Then no problems regarding (semi-)continuity and measurability aspects of policies occur.

The process induced by f∞ is an MP with intensity matrix Q(f), which is defined as

  Q_ij(f) = −q_{if(i)},              j = i,
            q_{if(i)} · P_{if(i)j},  j ≠ i.

Since F thus induces a set of MP's, we need some continuity and compactness conditions on the MDP, which are the continuous time equivalents of Assumptions 5.1 and 5.2. For our results also a "uniform" uniformizability condition is necessary. Let µ be a vector on E with µ_i ≥ 1, i ∈ E.

Assumption 6.1:

i) A(i) is a compact set for i ∈ E.

ii) q_{ia}P_{iaj} and Σ_{j≠i} q_{ia}P_{iaj}µ_j are continuous on A(i) for i, j ∈ E.
iii) q := sup_{i,a} q_{ia} < ∞.

For notational convenience we denote with P(t, f), _MP(t, f) the transition matrix and taboo probability matrix of the MP induced by policy f∞, and with P_h(f) the transition matrix of the approximating chain, for h < q^{−1}. Then

  P_{h,ij}(f) = 1 − h q_{if(i)},           j = i,
                h q_{if(i)} P_{if(i)j},    j ≠ i.

All other quantities are expressed similarly as for MDC's.
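As a concrete illustration of the approximating chain, the following minimal sketch builds P_h(f) from the rates and jump probabilities of a fixed deterministic policy; the three-state data and the step size h are hypothetical, chosen only to show the construction.

    import numpy as np

    def approximating_chain(q, P, h):
        """Transition matrix P_h of the approximating chain for a fixed deterministic policy.

        q : rates q_{i f(i)} under the chosen policy (one per state)
        P : jump matrix with P[i, j] = P_{i f(i) j} for j != i (zero diagonal, rows summing to 1)
        h : step size, required to satisfy h < 1 / max(q)
        """
        assert h < 1.0 / np.max(q)
        Ph = h * q[:, None] * P                  # off-diagonal entries: h * q_{if(i)} * P_{if(i)j}
        np.fill_diagonal(Ph, 1.0 - h * q)        # diagonal entries: 1 - h * q_{if(i)}
        return Ph

    q = np.array([2.0, 1.0, 3.0])
    P = np.array([[0.0, 0.5, 0.5],
                  [1.0, 0.0, 0.0],
                  [0.3, 0.7, 0.0]])
    Ph = approximating_chain(q, P, h=0.2)
    print(Ph.sum(axis=1))                        # each row of P_h sums to 1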

Most ergodicity and recurrence conditions of the previous sections are extended in an obvious way to the framework of MDP's. Indeed, we replace "geometric" by "exponential". Alternatively, we use the concepts of section 2.3 by adding "uniform". Then also Tweedie's criterion can be generalized. We will only give the definitions that are explicitly used in this section.

Let µ be a vector with µi ≥ 1, ∀ i ∈ E.

Definition 6.1: The MDP satisfies property
– µ-uniform exponential ergodicity (µ-UEE), if constants c, β, t_0 > 0 exist, such that ∀ f ∈ F
  ‖P(t, f) − Π(f)‖_µ ≤ c·e^{−βt}, ∀ t ≥ 0,
  ‖P(t_0, f)‖_µ ≤ c.
– µ-uniform weak exponential recurrence (µ-UWER), if there are constants c, β > 0 and a finite set M ⊂ E, such that ∀ f ∈ F
  ‖_MP(t, f)‖_µ ≤ c e^{−βt}.
– uniform strong convergence (recurrence), if µ-UEE (µ-UWER) holds for some µ with µ_i ≥ 1, ∀ i ∈ E.

The following theorem is a straightforward generalization of Theorem 2.2, for one MP.

Theorem 6.1 Consider the MDP and the AMDC. Then
i) µ-UWER(M) of the MDP ⟺ µ-UWGR(M) of the AMDC;
ii) µ-UEE of the MDP ⟺ µ-UGE of the AMDC.

Proof: For fixed f ∈ F, the assumptions of this section imply the assertions of Theorem 2.2. All bounds used in the proof of that theorem are uniform bounds in f ∈ F here.

Combination with the results from the previous section proves the following relations.

Theorem 6.2 The following sets of conditions are equivalent.

i) µ− UEE of the MDP, ν(f) <∞.

ii) µ− UWER(M) of the MDP, ν(f) continuous on F .

iii) µ− UGE of the AMDC, ν(f) <∞.

iv) µ− UWGR(M) of the AMDC, ν(f) continuous on F .

Thus µ-UGR(M) of the AMDC implies both µ-UWER(M) and µ-UEE of the MDP, through Lemma 5.3. Hence, for µ-bounded reward functions r(f) on F the expected marginal rewards converge exponentially fast, uniformly in f ∈ F, i.e.

  sup_{f∈F} µ_i^{−1} Σ_{j∈E} |P_ij(t, f)r_j(f) − Π_ij(f)r_j(f)| ≤ sup_{f∈F} ‖P(t, f) − Π(f)‖_µ · sup_{f∈F} ‖r(f)‖_µ ≤ c e^{−βt},

for some c, β > 0.

3. On the relation with Lasserre’s condition.

In the Introduction we pointed out that the analysis of sensitive optimal policies involves the (partial) Laurent expansion of the matrix function P(f, z) in z = 1. In the case of Blackwell optimality it is important to have the existence of the entire Laurent expansion, even more so if we do not restrict to specific immediate rewards, as in negative dynamic programming. Then for µ-bounded reward structures a natural and weak condition to ensure this is the following generalization of Condition 4.4. For our derivations we suppose that Assumptions 5.1 and 5.2 hold.

Condition 6.1:
i) ν(f) < ∞, ∀ f ∈ F.
ii) (1 − z)P_ij(f, z) can be analytically continued in a common disk D_{1,R}, for some R > 0 and all i, j ∈ E, f ∈ F.
iii) sup{ ‖(1 − z)P(f, z)‖_µ | z ∈ C_{1,x}, f ∈ F } < ∞, ∀ x ∈ (0, R).

In essence this is tantamount to Lasserre's condition, which we will comment on after the proof of the following theorem.


Theorem 6.3 Condition 6.1 and µ− UWGRRS(M) are equivalent.

Proof: Obviously, condition µ-UWGRRS(M) implies Condition 6.1 by virtue of Proposition 5.1. Conversely, assume Condition 6.1. We will show that it implies µ-UBSRS(M), analogously to the proof of Theorem 4.6. Indeed, for fixed f ∈ F all observations in the proof of that theorem apply.
Thus Π(f) is a stochastic matrix, and the Laurent expansion of P(f, z) exists in a common disk D_{1,R}, ∀ f ∈ F. Moreover, it satisfies expression (4.11), D(f) has Property 4.1 and ‖Π(f)‖_µ, ‖D(f)‖_µ < ∞ (cf. 4.7 et seq.). Since all bounds here are uniform bounds in f ∈ F, uniform µ-boundedness of Π(f) and D(f) follows immediately by a comparison with the arguments for the case of only one MC. Observe that similarly sup_{f∈F} ‖P(f)‖_µ < ∞.

Indeed, P(f, z) = Σ_n P^n(f)z^n, so that for x ∈ (0, R)

  sup_{f∈F, z∈C_{1,x}} ‖(1 − z)P(f, z)‖_µ ≥ sup_{f∈F} ‖(1 − x) Σ_{n=0}^∞ P^n(f)x^n‖_µ ≥ sup_{f∈F} x(1 − x)‖P(f)‖_µ.     (6.10)

Proposition 4.1i) implies that P (f, z) as a matrix function of z is analytic, hence contin-

uous, in operator norm, so that ‖P (f, z)‖µ is continuous. However, viewed as a matrix

function on F this is not true and the utmost we can achieve, is µ-continuity. To this end

we will use an argument from Lasserre [1988].

So, let f ′ ∈ F and fnn∈IN ⊂ F be any converging sequence with limit f ′. The equality

P (f, z)(I − zP (f)

)=(I − zP (f)

)P (f, z) = I gives

P (f ′, z)− P (fn, z) = zP (f ′, z)(P (f ′)− P (fn)

)P (fn, z). (6.11)

By virtue of Cauchy’s integral theorem, Π(f) = (2πi)−1∮C1,x P (f, z)dz, so that for x ∈

(0, R) and i ∈ E∑j∈E|Πij(f

′)−Πij(fn)|µj =∑j∈E| 1

2πi

∮C1,x

z[P (f ′, z)

(P (f ′)− P (fn)

)P (fn, z)

]ijdz|µj

=1

∑j∈E|

2π∫0

x2ie2iφ[P (f ′, xeiφ)

(P (f ′)− P (fn)

)P (fn, xe

iφ)]ijdφ|µj

≤ x2

2π∫0

∑j∈E|P (f ′, xeiφ)

(P (f ′)− P (fn)

)P (fn, xe

iφ)|ijµjdφ,

where the Fubini-Tonelli theorem is used for the third inequality. Hence, ∀x ∈ (0, R) and

i ∈ E

lim supn→∞

∑j∈E|Πij(f

′)−Πij(fn)|µj ≤

x2

2πlim supn→∞

2π∫0

∑j∈E|P (f ′, xeiφ)

(P (f ′)− P (fn)

)P (fn, xe

iφ)|ijµjdφ. (6.12)

Page 124: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

114 Part II, Chapter 6

The integrand is uniformly µ-bounded in f ∈ F and z ∈ C1,x by Condition 6.1iii) and

(6.10). Then the dominated convergence theorem yields that the limit may be passed

under the integral sign. Furthermore, the µ-continuity of P (f) on F together with Lemma

5.1 yield, that(P (f ′) − P (fn)

)P (fn, xe

iφ) → 0, as n tends to infinity. Together with

dominated convergence we obtain that the integrand converges to 0, for n tending to

infinity. Hence, the righthandside of (6.12) converges to 0, so that Π(f) is µ-continuous,

hence (componentwise) continuous.

A similar derivation yields µ-continuity of D(f), consequently of P (f, z) as a matrix func-

tion on F .

We will now proceed by proving that µ − UBSRS(M) holds. First we notice, that the

continuity of Π(f) and Lemma 6.2 yield the existence of a finite set M ⊂ E and ε > 0,

such that for any f ∈ F , M contains a set B(f) of reference states with

Πbb(f) ≥ ε, ∀ b ∈ B(f). (6.13)

By virtue of Theorem 4.6 ‖∑n B(f)P

n(f)‖µ < ∞, for f ∈ F , and we have to show that

the expression is uniformly in f ∈ F .

A close scrutiny of the proof together with the results hitherto obtained, reveals that we

only need check the existence of r, c > 0, such that

|(1− z)Pbb(f, z)| ≥ c, z ∈ D1,r, b ∈ B(f), f ∈ F . (6.14)

Clearly, since ‖D(f)‖µ is uniformly µ-bounded, there is an r > 0 such that

|(1−z)Pmm(f, z)−Πmm(f)| = |1− zz

∞∑n=0

(z − 1

z

)nDn+1mm (f)| < ε

2, m ∈M, z ∈ D1,r f ∈ F .

Combining this with (6.13), we obtain the validity of (6.14) for c = ε/2.

For a better comparison with Lasserre’s condition we will state it below.

Lasserre’s condition:

i) rσ(P (f)

)= 1, ∀ f ∈ F .

ii) There is an R > 0, such that the resolvent R(λ, P (f)) =(λI −P (f)

)−1has no other

singularities but z = 1 in D1,R.

iii) sup‖R(λ, P (f))‖µ | λ ∈ C1,x <∞, ∀x ∈ (0, R).

First we point out, that Lasserre uses the stationary policies in his conditions. It is quite

obvious that it is sufficient to restrict to the deterministic policies. Therefore we used the

version above. Moreover, in the proof of Theorem 4.6 we argued the superfluity of i) for

stochastic P (f). Incidentally, it seems that iii) has to be reformulated for µ-norms on

p. 487. By virtue of the Laurent expansions (4.11) and (4.7) of P (f, z) and R(λ, P (f))

respectively for fixed f , we obtain equivalence of Condition 6.1 and Lasserre’s condition

for ν(f) <∞.

Page 125: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Necessary and sufficient Conditions for the Laurent Expansion 115

So, the only difference between the two conditions consists of the condition on ν(f). The

assumption that this function is finite is rather crucial for our proof and it is not quite

clear whether it can be relaxed without further assumptions. At least we would need an

extension of Lemma 6.2 to the case that ν(f) ≤ ∞.

As a final remark we mention, that Key theorem II and Theorem 6.3 imply the equivalence

of µ − UGE and Condition 6.1 for aperiodic MDC’s with ν(f) < ∞. Thus, analyticity of

(1− z)P (f, z) as a matrix function on a disk D0,R, R > 1, is equivalent to analyticity on

a disk D1,R′ , uniformly in f ∈ F .

Page 126: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

116 Part II, Chapter 7

CHAPTER SEVEN

Convergence of the Successive Approximations Algorithm.

1. The iteration scheme.

Throughout this chapter we will suppose Assumptions 5.1, 5.2 to hold together with

continuity of ν(f) on F and ‖r(f)‖µ <∞, ∀ f ∈ F . Also we assume condition µ−UGR(M)

for some β < 1. For v0 a µ-bounded vector on E we consider the following iteration scheme:

vn+1i := max

a∈A(i)ria +

∑j∈E

Piajvnj , i ∈ E. (7.1)

Then vn+1i is the maximum expected total reward over (n+ 1) time periods for the scrap-

value vector v0, if the system starts in state i at time 0.

Notice, that the assumptions together with the dominated convergence theorem imply, that

the expression between brackets in (7.1) is a continuous function on A(i), if ‖vn‖µ < ∞.

Hence, the maximum is attained. Furthermore, Proposition 5.1ix) yields inductively

‖vn+1‖µ ≤ supf∈F‖r(f)‖µ + sup

f∈F‖P (f)‖µ‖vn‖µ <∞.

So we conclude, that the scheme is well-defined. It is known, that the iteration scheme

generally does not converge, if the MC’s induced by stationary policies are periodic (cf.

Schweitzer [1971]). Due to Schweitzer’s data transformation technique as applied in sec-

tion 4.2, we may assume aperiodicity without loss of generality. Indeed, the validity of

the assumptions is not affected, and neither are the stationary probabilities of the MC’s

generated by the stationary policies (cf. p. 71). By virtue of Proposition 5.1vi) the ex-

pected average rewards do not change, so that stationary optimal policies remain optimal

(within the class of stationary policies). Therefore we assume the following condition.

Assumption 7.1: The MC induced by any deterministic policy is aperiodic.

Consider a solution (g, v) to the average optimality equations (5.1). Roughly speaking, it

is known that vn − ng converges to the bias vector v in the optimality equation. We refer

to White [1963], Schweitzer & Federgruen [1977] for finite state and action MDC’s, and

to Hordijk, Schweitzer & Tijms [1975] for denumerable MDC’s with finite action spaces,

satisfying ‖v1 − v‖e < ∞. Moreover, if fn is a maximizing rule in the nth iteration, then

any vector limit point f∗ of the sequence fnn∈IN is an average optimal policy. Before

Page 127: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Convergence of the Successive Approximations Algorithm 117

answering these questions for our case, we turn to the analysis of the average optimality

equations.

2. v-conserving policies are s-average optimal.

It appeared to be relatively easy to incorporate the stronger “limsup” criterion in the

analysis, so let us state it first. We define the s-average expected reward vector gs(R)

under policy R as

gsi (R) = lim supN→∞

1

N + 1IEi,R

N∑n=0

rX(n),Y (n), i ∈ E.

Then, policy R∗ is s-average optimal iff gsi (R∗) ≥ gsi (R), ∀R ∈ C, ∀ i ∈ E. For

deterministic policies f∞ the s-average expected reward vector exists as a limit, hence

gs(f∞) = Π(f)r(f).

We will need some more terminology. Let (g, v) be a µ-bounded solution to the optimality

equation (5.1).

Definition 7.1: A policy R is called v-conserving iff it chooses with probability 1 only

actions that maximize the second optimality equation (5.1b).

Then we obtain directly from Proposition 5.1

Proposition 7.1 Let (g,v) be a µ-bounded solution to the average optimality equations

(5.1). Any v-conserving deterministic policy f∞ is s-average optimal within the class of

deterministic policies and g = gs(f∞).

To show that it is optimal within the class of all policies, we need a result on maximization

in (5.1b) over all actions. With this purpose in mind, we introduce the following system

for vectors g′ and v′ on E.g′i = max

a∈A(i)

∑j∈E

Piajg′j

v′i = maxa∈A(i)

ria − g′i +∑j∈E

Piajv′j.

(7.2a)

(7.2b)

Let (g′, v′) be a µ-bounded solution to (7.2).

Definition 7.2: A policy R is said to be v′-s-conserving if it chooses with probability 1

actions that maximize the righthandside of (7.2b).

Then in fact g′ is a superharmonic vector.

Definition 7.3: g′ is superharmonic iff there is a µ-bounded vector v′, such that ∀ a ∈ A(i),

∀ i ∈ E, g′i ≥

∑j∈E

Piajg′j

v′i ≥ ria − g′i +∑j∈E

Piajv′j .

(7.3a)

(7.3b)

Page 128: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

118 Part II, Chapter 7

The following lemma shows the existence of µ-bounded solutions to (7.2). We use an

elegant and short argument by Schal [1989].

Lemma 7.1 Let (g, v) be a µ-bounded solution to (5.1). There is a µ-bounded vector v,

such that (g, v) is a solution to (7.2). Moreover, g is the smallest superharmonic vector.

Proof: Consider a transformed MDC with ria := ria − g, a ∈ A(i), i ∈ E. We denote

by g(f∞) the average reward under policy f∞ for this transformed problem. Obviously,

supf∈F ‖r(f)‖µ <∞, hence a µ-bounded solution (g, v) to (5.1) exists.

By virtue of Proposition 7.1 g = g(f∞), for f∞ a v-conserving policy. Moreover,

g ≥ g(f∞) = Π(f)r(f).

Multiplication with Π(f) yields

Π(f)g ≥ Π2(f)r(f) = Π(f)r(f),

so that

g = Π(f)r(f) = Π(f)(r(f)− g

)≤ 0.

On the other hand, for f∞ a v-conserving policy Π(f)r(f)− g = 0, hence

g ≥ g(f∞) = Π(f)(r(f)− g

)= 0.

Combination gives g = 0. Consequently g = P (f)g, ∀ f ∈ F , so that

vi = maxa∈A(i)

ria +∑j∈E

Piaj vj = maxa∈A(i)

ria − gi +∑j∈e

Piaj vj.

This establishes (g, v) as a µ-bounded solution to (7.2). Next we show that g′ is the

smallest superharmonic vector. Suppose that g′ is a superharmonic vector for v′. Then

g′ ≥ P (f)g′, ∀ f ∈ F .Hence, by iteration of this inequality

g′ ≥ Π(f)g′ ≥ Π(f)r(f) + P (f)v′ − v′

= Π(f)r(f) = g(f∞), ∀ f ∈ F ,

so that g′ ≥ maxf∈F

g(f∞) = g.

For the sequel we need the following technical lemma, which generalizes Lemma 3.2 in

Dekker & Hordijk [1989].

Lemma 7.2 There is a c′ > 0 such that

‖P (f0) · · ·P (fn)‖µ ≤ c′, ∀ f0, . . . , fn ∈ F , n ∈ IN0.

Proof: Use last exit decomposition to set M (cf. Chung [1967]). Then(P (f0) · · ·P (fn)

)ij

=(MP (f0) · · ·MP (fn)

)ij

+n∑k=0

∑m∈M

(P (f0) · · ·P (fk)

)im·(MP (fk+1) · · ·MP (fn)

)mj,

Page 129: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Convergence of the Successive Approximations Algorithm 119

with the convention that MP (fn+1)MP (fn) = I. So,∑j∈E

(P (f0) · · ·P (fn)

)ijµj=

∑j∈E

(MP (f0) · · ·MP (fn)

)ijµj

+∑j∈E

n∑k=0

∑m∈M

(P (f0) · · ·P (fk)

)im·(MP (fk+1) · · ·MP (fn)

)mjµj

≤ βn+1µi +

n∑k=0

∑m∈M

∑j∈E

(MP fk+1 · · ·MP (fn)

)mjµj

≤ βn+1µi +

n∑k=0

βn−k∑m∈M

µm ≤1

1− β(1 +

∑m∈M

µm)µi.

Choose c′ = (1− β)−1(1 +∑m∈M µm).

Notice, that the statement of the lemma equally holds, if we allow sequences of randomizing

decision rules, or equivalently, Markov policies. We will prove, that indeed g is the s-

average optimal reward.

Lemma 7.3 Let (g, v) be a µ-bounded solution to (5.1). Then g ≥ gs(R), ∀R ∈ C.

Proof: It is sufficient to prove the statement for R ∈ C(M). So, let R = (π0, π1, . . .) ∈ C.Lemma 7.2 implies the existence of a c′ > 0, such that ‖P (π0) · · ·P (πn)‖µ ≤ c′, n ∈ IN0,

where we write P (πn) for the transition matrix with elements Pij(πn) =

∫A(i)

Piajdπnia. By

virtue of Lemma 7.1 g is superharmonic for some µ-bounded vector v′. We use g ≥ P (πn)g,

n ∈ IN0, and iterate (7.3b) to obtain for i ∈ E

IEi,Rv′X(n) ≥ IEi,RrX(n),Y (n) − P (π0) · · ·P (πn)g + IEi,Rv

′X(n+1)

≥ IEi,RrX(n),Y (n) − g + IEi,Rv′X(n+1).

Substract IEi,Rv′X(n+1) − g, take the summation over n = 0, . . . , N and divide by N + 1,

then

1

N + 1v′ − 1

N + 1IEi,Rv

′X(N+1) + g ≥ 1

N + 1

N∑n=0

IEi,RrX(n),Y (n), i ∈ E.

Observe that |IEi,Rv′X(N+1)| = |P (π0) · · ·P (πn)v′|i ≤ c′‖v′‖µµi, for N ∈ IN. Consequently,

lim supN→∞(N + 1)−1|IEi,Rv′X(N+1)| = 0, so that

g= lim supN→∞

1

N + 1v′ − 1

N + 1IEi,Rv

′X(N+1) + g

≥ lim sup

N→∞

1

N + 1

∞∑n=0

IEi,RrX(n),Y (n) = gs(R).

Combination of Proposition 7.1 and the fore-going lemma extablishes the desired result.

Page 130: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

120 Part II, Chapter 7

Corollary 7.2 Any v-conserving policy R is s-average optimal.

Proof: For R = (π0, π1, . . .) a v-conserving policy in (5.1), g = P (πn)g, n ∈ IN0. Hence

all inequality signs in the proof of the previous lemma can be replaced by equality signs,

so that gs(R) = g.

Some comparison of solutions to (5.1) and (7.2) is useful in this place. Let (g, v), (g, v′)

solve (5.1) and (7.2) respectively. A v-conserving policy is “g-conserving” by definition,

whereas this need not hold for a v′-conserving policy. Thus, without further investigation

no conclusion can be drawn on s-average optimality of v′-conserving policies. However,

we do need (7.2) for the analysis of the iteration scheme. Even more important is, that

conclusions on optimal policies can only be drawn if maximization over all actions is

allowed, since generally no information on the maximizing actions in the first average

optimality equation is available. So, we have to derive a relation between systems (5.1)

and (7.2).

3. Convergence of the iteration scheme.

Throughout this section we assume g to be the maximum s-average expected reward vector,

such that (g, v) is a solution to (7.2). For n ≥ 1 we define the nth step approximation en

of v′ as

en := vn − ng − v, n ∈ IN. (7.4)

We write F∗ for the set of deterministic rules f that achieve equality in (7.2b), i.e. for

which f∞ is v′-s-conserving. Furthermore, fnn ⊂ F is a sequence of decision rules, such

that fn takes maximizing actions in the nth step of the iteration scheme.

The first lemma of the section derives some useful bounds for our further analysis.

Lemma 7.4

i) P (f)en−1 ≤ en ≤ P (fn)en−1, ∀ f ∈ F∗, n ∈ IN.

ii) supn∈IN0‖en‖µ <∞.

iii) limn→∞ ‖g − 1nv

n‖µ = 0.

Remark that the third assertion states, that g is the asymptotic average value of the

maximum expected rewards over n time periods. We will sharpen this result considerably

in Theorem 7.1, by showing that vn − ng converges as n tends to infinity.

Proof of Lemma 7.4: i) Let n ∈ IN and f ∈ F∗. Notice first that ‖e0‖µ = ‖v0 − v‖µ ≤‖v0‖µ + ‖v‖µ <∞. So,

P (f)en−1= P (f)vn−1 − (n− 1)g − v= r(f) + P (f)vn−1 − ng −

(r(f)− g − P (f)v

)≤ vn − ng − v = en.

Page 131: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Convergence of the Successive Approximations Algorithm 121

Similarly,

P (fn)en−1= P (fn)vn−1 − (n− 1)g − v= r(fn) + P (fn)vn−1 − ng −

(r(fn)− g − P (fn)v

)≥ vn − ng − v = en.

ii) By i)

en ≤ P (fn)en−1 ≤ · · · ≤ P (fn) · · ·P (f1)e0.

By virtue of Lemma 7.2 there is a constant c′ > 0, such that

eni ≤∑j∈E

(P (fn) · · ·P (f1)

)ijµj|e0j |µj≤ c′‖e‖oµµi, n ∈ IN.

Conversely,

eni ≥(Pn(f)e0

)i≥ −

∑j∈E

Pnij(f)µj

|e0j |µj≥ −c′‖e0‖µ,

so that ‖en‖µ ≤ c′‖e0‖µ, n ∈ IN.

iii) g = 1n (vn − en − v). Consequently, by ii)

‖g − 1

nvn‖µ =

1

n‖en + v‖µ ≤

1

n

(c′‖e0‖µ + ‖v‖µ

)→ 0, for n→∞.

This enables us to show the main theorem. For i ∈ E we denote by A∗(i) ⊂ A(i) the

maximizing actions in (7.2a) or, which is the same, (5.1a).

Theorem 7.1 en∞n=0 is a convergent sequence. The limit, say d, is a µ-bounded vector,

with the following properties:

i) (g, v = d+ v) is a solution to (7.2).

ii) For any vector limit point f∗∞ of the sequence fnn, f∗(i) is a maximizing action

in both (7.2a) and (7.2b), hence f∗(i) ∈ A∗(i), i ∈ E, and f∗∞ is v-s-conserving.

iii) d is a constant vector on any positive recurrent class in the MC generated by a v-s-

-conserving deterministic policy.

Proof: We first show convergence of the sequence enn. Therefore we need inequalities

(7.5) and (7.6), which we adopted from Hordijk, Schweitzer & Tijms [1975]. Insofar the

proof uses the contraction property of MP (f), it is new. We need the following notation

for i ∈ E

mi:= lim infn→∞

eni

Mi:= lim supn→∞

eni

bia:= ria − g +∑j∈E

Piajvj − vi, a ∈ A(i)

Page 132: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

122 Part II, Chapter 7

Notice, that ‖m‖µ, ‖M‖µ <∞ and bia ≤ 0 for a ∈ A(i), i ∈ E. We focus on the proof of

the following inequalities

Mi ≤ maxa∈A(i)

bia +∑j∈E

PiajMj

mi ≥ maxa∈A(i)

bia +∑j∈E

Piajmj.

(7.5)

(7.6)

Remark, that for n ∈ IN

eni = vni − ng − vi

= maxa∈A(i)

ria +∑j∈E

Piajvn−1j − ng − vi

= maxa∈A(i)

ria − g +∑j∈E

Piajvj − vi +∑j∈E

Piaj(vn−1j − (n− 1)g − vj)

= maxa∈A(i)

bia +∑j∈E

Piajen−1j .

(7.7)

(7.8)

Fix i ∈ E and let n(k)k∈IN be a sequence, such that en(k)i → Mi, for k → ∞. Fur-

thermore, let an(k) be a maximizing action in (7.8) for the n(k)th iteration, i.e. en(k)i =

bian(k)+∑j∈E Pian(k)j

en(k)−1j , k ∈ IN. By the compactness of A(i), the sequence an(k)k

contains a converging subsequence with limit a∗, say, which we denote by an(k)k for

notational convenience.

We claim, that a∗ ∈ A∗(i). Indeed, observe that (7.7) implies that vn(k)i = rian(k)

+∑j∈E Pian(k)j

vn(k)−1j . Hence,

vn(k)i

n(k)=rian(k)

n(k)+∑j∈E

Pian(k)j

vn(k)−1j

n(k).

By Lemma 7.4iii) the sequence 1nv

nn is µ-bounded. Application of Lemma 5.1 yields

gi = limn→∞

vn(k)i

n(k)= limn→∞

rian(k)

n(k)+∑j∈E

Pian(k)j

vn(k)−1j

n(k)

=∑j∈E

Pia∗jgj ,

and thus a∗ ∈ A∗(i). Thus, for any limiting policy f∗∞, f∗(i) ∈ A∗(i), i ∈ E. Moreover,

Mi = limk→∞

en(k)i = lim sup

k→∞en(k)i = lim sup

k→∞bian(k)

+∑j∈E

Pian(k)jen(k)−1j

≤ lim supk→∞

bian(k)+ lim sup

k→∞

∑j∈E

Pian(k)jen(k)−1j

= bia∗ + lim supk→∞

∑j∈E

Pian(k)jen(k)−1j . (7.9)

Page 133: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Convergence of the Successive Approximations Algorithm 123

For (7.5) it is sufficient to show, that

lim supk→∞

∑j∈E

Pian(k)jen(k)−1j ≤

∑j∈E

Pia∗jMj . (7.10)

Indeed, by a diagonalization procedure it is possible to choose a further subsequence

m(k)k of n(k)k, such that

lim supk→∞

∑j∈E

Pian(k)jen(k)−1j = lim

k→∞

∑j∈E

Piam(k)jem(k)−1j

and

limk→∞

em(k)−1j = e∗j , ∀ j ∈ E,

for some µ-bounded vector e∗. With Lemma 5.1 we find

limk→∞

∑j∈E

Piam(k)jem(k)−1j =

∑j∈E

Pia∗je∗j ≤

∑j∈E

Pia∗j lim supk→∞

en(k)−1j

=∑j∈E

Pia∗jMj .

We conclude that

Mi ≤ bia∗ +∑j∈E

Pia∗jMj ≤ maxa∈A(i)

bia +∑j∈E

PiajMj, ∀ i ∈ E, (7.11)

which proves (7.5). For the proof of (7.6) similar, but simpler arguments yield the result

by separate consideration of each a ∈ A(i).

Let f ∈ F be such that f(i) maximizes the righthandside of (7.5) for i ∈ E. Since bia ≤ 0

for a ∈ A(i), i ∈ E,

Mi ≤∑j∈E

Pij(f)Mj . (7.12)

Further, denote by Cb(f) the positive recurrent class of b ∈ B(f) ⊂ M . For Mi :=

Mi −∑b∈B(f) Fib(f)Mb, we have

Mi ≤∑j∈E

Pij(f)Mj −∑

b∈B(f)

Fib(f)Mb =∑j∈E

B(f)Pij(f)Mj .

Observe, that ‖M‖µ <∞. Iteration yields in vector notation

M ≤ B(f)Pn(f)M ≤ βn‖M‖µµ, ∀n ∈ IN.

Taking the limit for n→∞, we obtain Mi ≤ 0, ∀ i ∈ E, so that Mi ≤∑b∈B(f) Fib(f)Mb.

Iteration of equation (7.12) and combination with the foregoing inequality yields for i = b

Mb ≤∑

j∈Cb(f)

Πbj(f)Mj ≤Mb.

Page 134: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

124 Part II, Chapter 7

Consequently, Mb =∑∈Cb(f)Πbj(f)Mj . Also Mi ≤Mb and Πbi(f) > 0, both for i ∈ Cb(f),

so that Mi = Mb for j ∈ Cb(f). For i ∈ Cb(f) and bi(f) = bif(i) this implies by (7.5), that

0 = Mi −∑j∈E

Pij(f)Mj ≤ bi(f) ≤ 0.

Hence, bi(f) = 0, and f∞ s-conserves v for i ∈ Cb(f), b ∈ B(f). Since,

0 ≤Mi −mi ≤ bi(f) +∑j∈E

Pij(f)Mj − maxa∈A(i)

bia +∑j∈E

Piajmj

≤∑j∈E

Pij(f)Mj −mj,

it is similarly shown, that Mi − mi = Mb − mb, ∀ i ∈ Cb(f), b ∈ B(f). For i ∈ E a

transient state, we obtain by iteration

0 ≤Mi −mi ≤∑j∈E

Πij(f)Mj −mj =∑

b∈B(f)

Fib(f)Mb −mb. (7.13)

Thus the sequence enn converges, if we show that Mb = mb, for b ∈ B(f). We already

observed, that f(i) maximizes the righthandside of (7.2b) for positive recurrent states

i ∈ E. So, f∞ can be extended as a v′-s-conserving policy, i.e we can choose f ′ ∈ F ∗, such

that f ′(i) = f(i) for i ∈ Cb(f), b ∈ B(f). Then Cb(f) is a closed, positive recurrent class

in the MC induced by f ′∞.

Let b ∈ B(f) be fixed. Using a diagonalization procedure, we can find sequences n(k)kand m(k)k, such that en(k)k and em(k)k converge to vectors α and γ respectively

with αb = mb and γb = Mb. Choose for any k ∈ IN, h(k) ∈ IN, such that r(k) :=

n(h(k))−m(k) ≥ k. Lemma 7.4i) implies that

en(h(k)) = er(k)+m(k) ≥ Pr(k)

(f ′)em(k).

Thus

α = limk→∞

en(h(k)) ≥ limk→∞

Pr(k)

(f ′)em(k).

Since supn∈IN ‖en‖µ <∞, Key theorem I together with Proposition 18 (Royden [1968], p.

232) imply

limk→∞

Pr(k)

(f ′)em(k) = Π(f ′)γ,

so that

α ≥ Π(f ′)γ,

and similarly

Page 135: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Convergence of the Successive Approximations Algorithm 125

γ ≥ Π(f ′)α.

Combination of the two inequalities gives

αi ≥(Π(f ′)α

)i, i ∈ E.

For i ∈ Cb(f ′) we multiply both sides with Πbi(f′) and take the summation over i ∈ Cb(f ′).

As Πbi(f′) > 0 for i ∈ Cb(f ′), necessarily αi =

∑j Πbj(f

′)αj for i ∈ Cb(f ′). Analogously,

γi =∑j Πbj(f

′)γj , hence αi ≥∑j Πbj(f

′)γj = γi, for i ∈ Cb(f′). In particular mb =

αb ≥ γb = Mb. Consequently mb = Mb and we have shown that enn converges.

Combination with (7.5), (7.6) and (7.11) yields with d the limit of enn.

di = maxa∈A(i)

bia +∑j∈E

Piajdj = bi(f∗) +

∑j∈E

Pij(f∗)dj , i ∈ E.

By insertion of the expression for bia, a ∈ A(i), i ∈ E, we establish assertions i) and ii) of

the theorem for v = v + d. Finally, iii) is achieved in a similar way as the corresponding

assertion for f .

So, the result states that any limit of fnn is both v-conserving and v-s-conserving. We

have proved the following assertion as well.

Corollary 7.2 f∗∞ is an s-average optimal policy, for any vector limit-point f∗ of the

sequence fnn.

By virtue of Theorem 7.1 vn − ng → v + d, as n→∞, with g, d constant vectors on each

positive recurrent class in the MC, which is generated by a limiting policy f∗∞. Since we

obviously have no information on such policies, the unknown vector g bars any conclusions

on the structural properties of vn in relation to similar properties of v′+d. These problems

can be avoided by the following natural condition in most queueing models.

Assumption 7.2: The MC generated by any s-average optimal policy f∞ has no two

disjoint closed sets.

This means that |B(f)| = 1 for any s-average optimal policy f∞. Hence, the stationary

matrix Π(f) has equal rows, so that g is a constant vector. Consequently, (5.1a) is satisfied

for any decision rule. Then A∗(i) = A(i), for i ∈ E and the maximization in (5.1b) is

taken over all actions a ∈ A(i), so that systems (5.1) and (7.2) are equal. The following

lemma is Lemma 3 of Hordijk, Schweitzer & Hordijk [1975], albeit the proof is different.

Lemma 7.5 Suppose that Assumption 7.2 holds in addition to the assumptions of this

Chapter. Let (g, v) be a solution to the average optimality equations (5.1) and d be a

µ-bounded vector on E, such that (g, v + d) solves (5.1) as well. Then di = d, ∀ i ∈ E,

for some constant d.

Proof: Choose f, h ∈ F , such that f∞ and h∞ are v- and v + d-conserving respec-

tively. Then d ≤ P (h)d. Choose a reference state b(h) ∈ M for policy h∞, with M the

taboo set of the µ − GR property, and define di := di − db(h). We obtain d ≤ b(h)P (h)d.

Page 136: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

126 Part II, Chapter 7

Iteration yields di ≤ 0, hence di ≤ db(h), i ∈ E. Moreover, by iteration we obtain

db(h) ≤∑j∈E Πb(h)j(h)dj , and we conclude that di = db(h) for states i ∈ E that are

positive recurrent in the MC induced by h∞.

Similarly, di ≥∑j∈E Pij(f)dj , and an analogous reasoning gives that di ≥ db(f) for a

reference state b(f) ∈ M . Moreover, di = db(f) for all states i that are positive recurrent

under f∞. By Assumption 7.2, there is a state, say 0, that is positive recurrent under

both h∞ and f∞. Hence, for any i ∈ E, we conclude that di ≤ db(h) = d0 = db(f) ≤ di, so

that d is a constant vector.

The final theorem of the chapter states the obvious conclusion.

Theorem 7.2 The assumptions of this Chapter together with Assumption 7.2 imply that

enn converges to a constant vector.

Proof: Let (g, v) be a solution to (5.1) or (7.2). From the proof of Theorem 7.1 we have

for v + d = limn→∞(vn − ng) that (g, v + d) is a solution to (5.1). Apply Lemma 7.5.

Page 137: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

On the Limits of α-discounted Optimal Policies 127

CHAPTER EIGHT

On the Limits of α-discounted Optimal Policies.

1. Sensitive optimality results.

As in the previous chapter we again suppose, that Assumptions 5.1, 5.2 and condition

µ − GR hold with ν(f) continuous on F and ‖r(f)‖µ < ∞, ∀ f ∈ F . Let us recall

the definitions of n-discount optimality. For ease of notation we use un(f) for the nth

coefficient of the Laurent expansion of V α(f∞), n ∈ IN, i.e.

u−1(f) = Π(f)

un(f) = (−1)nDn+1(f)r(f), n ∈ IN0,

thus, with a slight abuse of notation

V α(f∞) = (1 + ρ)∞∑

n=−1

ρnun(f). (8.1)

By Proposition 5.1 un(f) is a µ-bounded and componentwise continuous function on F .

f∞n is an n-discount optimal policy iff it maximizes lexicographically the first n+ 2 terms

of the Laurent expansion of the discounted rewards. This means that for any f ∈ F , i ∈ Eone of the two following assertions holds.

uki (f) = uki (fn), k = −1, 0, . . . , ni)

∃k ∈ −1, . . . , n− 1 such that

uli(f) = ul(fn), l ≤ k

uk+1i 6= uk+1

i (fn)=⇒ uk+1

i (f) < uk+1i (fn).ii)

Denote by f∞α an α-discounted deterministic optimal policy. For any vector limit point

f∗ of the set fαα for α tending to 1, it is easy to show that f∗∞ is −1-discount optimal.

Indeed,

ρV α(f∞α ) ≥ ρV α(f∞),

where α and ρ are related to each other through ρ = α(1 − α)−1. Hence, the continuity

of u−1(f) in f ∈ F , the µ-boundedness of all other coefficients of the Laurent expansion

and the dominated convergence theorem imply

u−1(f∗) = limρ↓0

ρV α(f∞α ) ≥ limρ↓0

ρV α(f∞) = u−1(f).

Page 138: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

128 Part II, Chapter 8

Consequently, f∗∞ is −1-discount optimal. To establish 0-discount optimality, we only

need consider f ∈ F and i ∈ E for which u−1i (f) = u−1

i (f∗).

So, we choose f ′ ∈ F , i′ ∈ E, with u−1i′ (f ′) = u−1

i′ (f∗). Then u−1i′ (f ′) ≥ u−1

i′ (f), for all

f ∈ F , in particular for fα. Hence,

u0i′(f∗) = lim

ρ↓0

(V αi′ (f

∞α )− 1 + ρ

ρu−1i′ (fα)

)≥ lim

ρ↓0

(V αi′ (f

′∞)− 1 + ρ

ρu−1i′ (f ′)

)= u0

i′(f′).

These arguments can be found in Hordijk [1976]. Notice that this reasoning can not be

continued for larger values of n. To this end we need that u0i (fα) ≤ u0

i (f∗), for i ∈ E and

all α in a neighbourhood of 1. However, if u−1i (fα) < u−1

i (fα), possibly u0i (fα) > u0

i (f∗).

This precisely is the case in our counterexamples.

It is quite evident that n-discount optimality of all f ∈ F , i.e. uk(f) = uk for some vector

uk, k = −1, . . . , n, yields n+ 1- and n+ 2-discount optimality by similar arguments. This

shows the validity of the following lemma.

Lemma 8.1

i) f∗∞ is −1- and 0-discount optimal.

ii) If all f ∈ F are n-discount optimal for some n ≥ −1, then f∗∞ is n+ 1- and n+ 2-

discount optimal.

Let us discuss the necessary characteristics for a counterexample that disproves stronger

optimality results. To avoid infinite sums we require the coefficient of ρ−1 to be 0, i.e. the

average expected rewards are 0 for any deterministic policy f∞. By virtue of Lemma 8.1

f∗∞ with f∗ a vector limit point of the sequence fαα, is 0- and 1-discount optimal. Our

goal is to construct a 1-discount optimal policy f ′, such that the two following conditions

are satisfied:

f ′ is not a limit of α-discounted optimal policies, for ρ ↓ 0i)

f ′ is 2-discount “strictly better” than f∗.ii)

Using the Laurent expansion, we have for i ∈ E,

u2i (f′)− u2

i (f∗) = lim

ρ↓0

V αi (f ′∞)− V αi (f∗∞)

(1 + ρ)ρ2.

Consequently, if

lim infρ↓0

V αi (f∞α )− V αi (f∗∞)

(1 + ρ)ρ2= 0, (8.2)

then also

u2i (f′)− u2

i (f∗) = lim

ρ↓0

V αi (f ′∞)− V αi (f∗∞)

(1 + ρ)ρ2≤ lim inf

ρ↓0

V αi (f∞α )− V αi (f∗∞)

(1 + ρ)ρ2= 0,

Page 139: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

On the Limits of α-discounted Optimal Policies 129

which is clearly not what we want. So, a prerequisite for a counterexample satisfying i),

ii), is, that (8.2) fails, i.e.

∃i′ ∈ E for which lim infρ↓0

V αi′ (f∞α )− V αi′ (f∗

∞)

(1 + ρ)ρ2> 0. (8.3)

We translate (8.3) to a condition on the coefficients of the Laurent expansion. Then we

get the following expression:

lim infρ↓0

V αi′ (f∞α )− V αi′ (f∗

∞)

(1 + ρ)ρ2= lim inf

ρ↓0

(u0i′(fα)− u0

i′(f∗)

ρ2+u1i′(fα)− u1

i′(f∗)

ρ

). (8.4)

As f∗ is 0-discount optimal, u0i (fα) ≤ u0

i (f∗), for all i ∈ E. But if this inequality is a

strict inequality, then there is no restriction on the sign of u1i (fα) − u1

i (f∗), which might

very well be positive.

Indeed, in our example ∃i′ ∈ E for which the righthandside of (8.4) converges to a positive

number. This made it possible to add a 1-discount optimal policy f ′∞

to the system,

satisfying conditions i) and ii). The models we present in the next sections, are the

weakest possible extensions of the finite state and action MDC. i.e. both are unichain.

The first has a finite state and a compact action space, the second has a denumerable state

and finite action spaces, such that the number of possible actions is uniformly bounded

over the states.

2. Disproving stronger sensitive optimality results.

2.1. MDC’s with finite state and compact action spaces.

=action b

=action 1

=action 2

(. . . , . . .) = (piaj , ria)

( 14 + b2, 0) ( 1

4 + b, 0)

( 12 − b

2 − b, 0)

(1, 2)(1, 1)

(1, 0)

(1, 0)

(3−√

513+√

5, 10+6

√5

13+√

5

)

(10+2

√5

13+√

5, 10+6

√5

13+√

5

)

1 2 3

4

5

Figure 1

The model is sketched in Figure 1. We denote with fb, f2 the deterministic policies that

choose action b ∈ [0, 12 (√

3− 1)] and action 2 respectively in state 2.

Page 140: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

130 Part II, Chapter 8

Clearly, the transition probabilities and the one-step rewards in state 2 are continuous

functions of the parameter b. As the state space is finite, Assumptions 5.1, 5.2 and µ−GR

are satisfied for any bounding vector µ. Moreover, the number of recurrent classes ν(f) is

1 under any f , so that ν(f) is a continuous function on F .

We will prove two theorems in this subsection. Together they imply that the unique

limiting policy is not Blackwell optimal.

Theorem 8.1 ∃αo such that fbα is the unique deterministic α-discounted optimal policy

∀α ∈ (αo, 1), with

bα =1

α− 5

4+

1

√4− 2α− 3

4α2.

A consequence of this theorem is the following:

Corollary 8.1 fb∗ , with b∗ = 14 (√

5−1), is the unique limiting policy of the deterministic

α-discounted optimal policies fbα , for α tending to 1.

The average reward under any policy is equal to 0, as absorption into state 5 takes place

with probability 1. Consequently, all policies are −1-discount optimal and u−1(f) =

Π(f)r(f) = 0 for all f . We write x > y for two vectors x, y if x ≥ y in the componentwise

vector-ordering, and x 6= y. To complete our counterexample, we show that:

Theorem 8.2

i) u0(f2) = u0(fb∗) > u0(fb) ∀ b 6= b∗

ii) u1(f2) = u1(fb∗)

iii) u2(f2) > u2(fb∗).

Using the Laurent series approach, we can conclude the following:

Conclusion:

f2, and fb∗ are both 0- and 1-discount optimal; f2 is n-discount optimal for n ≥ 2.

Consequently, f2 is Blackwell optimal.

Proof of Theorem 8.1:

As only in state 2 more than 1 action can be chosen, a deterministic optimal policy is

determined by maximizing V α2 (f∞) over f . We compute the V α2 (f∞) by solving the well-

known optimality equation V α(f∞) = r(f) + αP (f)V α(f∞), for the appropriate states.

Clearly,

V α5 (f∞) = 0, V α1 (f∞) = 1 + αV α5 (f∞) = 1 and V α3 (f∞) = 2 + αV α5 (f∞) = 2.

For all b ∈ [0, 12 (√

3− 1)] :

V α2 (f∞b ) = 0 + α( 14 + b2)V α1 (f∞b ) + ( 1

2 − b2 − b)V α2 (f∞b ) + ( 1

4 + b)V α3 (f∞b )

= α 34 + b2 + 2b+ ( 1

2 − b2 − b)V α2 (f∞b ),

so that

Page 141: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

On the Limits of α-discounted Optimal Policies 131

V α2 (f∞b ) = α34 + b2 + 2b

1− α( 12 − b2 − b)

.

Under policy f2 we get

V α2 (f∞2 ) = αV α4 (f∞2 ) = α10 + 6

√5

13 +√

5+ α2 3−

√5

13 +√

5V α2 (f∞2 ),

and hence

V α2 (f∞2 ) = α10 + 6

√5

13 +√

5− α2(3−√

5).

The rest of the proof consists of two parts. We first prove

∃αo < 1, such that V α2 (f∞bα) > V α2 (f∞b ) ∀ b 6= bα, ∀α ∈ (αo, 1). (8.5)

Then we show

∃α1 < 1 such that V α2 (f∞bα) > V α2 (f∞2 ) ∀α ∈ (α1, 1). (8.6)

Proof of (8.5): Compute the derivative of V α2 (f∞b ) to b, then we get

d

dbV α2 (f∞b ) =

α(1− α( 1

2 − b2 − b))2 (2b+ 2)(1− α( 1

2 − b2 − b))

− (2αb+ α)( 34 + b2 + 2b)

=α(

1− α( 12 − b2 − b)

)2 −αb2 + b(2− 52α) + 2− 7

4α,(8.7)

so thatd

dbV α2 (f∞b ) = 0 ⇐⇒ b =

1

α− 5

4+−

1

√4− 2α− 3

4α2.

Taking the limits of both roots for α ↑ 1, we get

1

α− 5

4+

1

√4− 2α− 3

4α2 → 1

4 (√

5− 1) < 12 (√

3− 1) (8.8)

1

α− 5

4− 1

√4− 2α− 3

4α2 → −1

4 (√

5 + 1) < 0.

It is straightforward to check that the maximum of V α2 (f∞b ) over b ∈ [0, 12 (√

3 − 1)] is

attained in

bα =1

α− 5

4+

1

√4− 2α− 3

4α2.

By (8.8) we also have, that bα is an admissible action as α tends to 1.

Proof of (8.6): This proof requires some tedious analysis.

V α2 (f∞bα) > V α2 (f∞2 )

⇐⇒34 + b2α + 2bα

1− α( 12 − b2α − bα)

>10 + 6

√5

13 +√

5− α2(3−√

5)

⇐⇒ ( 34 + b2α + 2bα)

(13 +

√5− α2(3−

√5))> (10 + 6

√5)(1− α

2 + αb2α + αbα).

Page 142: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

132 Part II, Chapter 8

Using that (8.7) is equal to 0 for b = b(a), we insert b2α = ba( 2α −

52 ) + 2

α −74 . Then the

last inequality holds if and only if(bα

( 2

α− 1

2

)+

2

α− 1)(

13 +√

5− α2(3−√

5))> (10 + 6

√5)(3− 9

4α+ bα(2− 32α))

⇐⇒ 26 + 2√

5

α− 13−

√5− α(6− 2

√5) + α2(3−

√5)

+bα

(26 + 2√

5

α− 13

2−√

5

2− α(6− 2

√5) + α2

(3

2−√

5

2

))> 30 + 18

√5− α( 45

2 + 272

√5) + bα

(20 + 12

√5− α(15 + 9

√5))

⇐⇒ 26 + 2√

5

α− 43− 19

√5 + α( 33

2 + 312

√5) + α2(3−

√5)

+bα

(26 + 2√

5

α− 53

2− 25

2

√5 + α(9 + 11

√5) + α2

(3

2−√

5

2

))> 0.

Substitute bα = 1α −

54 + 1

√4− 2α− 3

4α2 in the last inequality. Then we get

1

α2(26 + 2

√5) +

1

α(−33− 13

√5) + (− 7

8 + 618

√5) + α( 27

4 + 54

√5) + α2( 9

8 −38

√5)

+√

4− 2α− 34α

2

(1

α2(13 +

√5)− 1

α( 53

4 + 254

√5) + 9

2 + 112

√5 + α

(34 −

√5

4

))> 0.

Denote the lefthandside of the last inequality with v(α). It is easy to check that v(1) = 0.

So we need to study the derivative of v(α) to get some insight in the behaviour of v(α),

in a neighbourhood of 1.

d

dαv(α)

∣∣∣α=1

= −2(26 + 2√

5) + 33 + 13√

5 + 274 + 5

4

√5 + 2( 9

8 −38

√5)

− 7

2√

5(13 +

√5− 53

4 −254

√5 + 9

2 + 112

√5 + 3

4 −√

54 )

+

√5

2

(−2(13 +

√5) + ( 53

4 + 254

√5) + 3

4 −√

54

)= −10 + 19

2

√5− 7

2√

55 +

√5

2(−12 + 4

√5) = 0.

Likewise, the first derivative does not provide us enough information. Consequently we

have to study the second derivative. In order to make the calculations somewhat clearer,

we multiply v(α) by α2, and we define the three following functions:

v1(α) := −(26 + 2

√5 + α(−33− 13

√5) + α2(− 7

8 + 618

√5)

+ α3( 274 + 5

4

√5) + α4( 9

8 −38

√5))

v2(α) := 4− 2α− 34α

2

v3(α) := 13 +√

5− α( 534 + 25

4

√5) + α2( 9

2 + 112

√5) + α3( 3

4 −√

54 ).

Page 143: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

On the Limits of α-discounted Optimal Policies 133

Then α2v(α) = −v1(α) +√v2(α) · v3(α). Of course, v(α) > 0 ⇐⇒ α2v(α) > 0, if α > 0.

Furthermore, vi(α), i = 1, 2, 3, is positive for α close to 1, as v1(1) = 52

√5, v2(1) = 5

4 ,

v3(1) = 5, and all are continuous functions. Therefore,

∃α1 < 1, such that v(α) > 0 ∀α ∈ (α1, 1)

⇐⇒ ∃α2 < 1, such that v∗(α) := α2v(α)(v1(α) +

√v2(α) · v3(α)

)= −v2

1(α) + v2(α) · v23(α) > 0, ∀α ∈ (α2, 1).

The first derivative of this new function v∗ is 0 in α = 1, as it consists of a sum of products

with either v(α), or v′(α) in it, which are both 0 in α = 1. Finally we get

v∗′′(α)|α=1 =

[−2(v′1(α))2 − 2v1(α)v′′1 (α) + v′′2 (α)v2

3(α) + 4v′2(α)v3(α)v′3(α)

+ 2v2(α)(v′3(α))2 + 2v2(α)v3(α)v′′3 (α)]|α=1

= −2(10− 92

√5)2 + 5

√5( 209

4 + 734

√5)− 6

452 − 70(−2 + 4√

5)

+ 52 (−2 + 4

√5)2 + 25

2 ( 272 + 19

2

√5)

= 535 + 240√

5 > 0.

So, the second derivative of v∗ is positive for large α ≤ 1. As v∗′(1) = 0, this implies that

the first derivative is negative for large α < 1. Which means that v∗ is (strictly) decreasing

for large α < 1. Therefore v∗ must be positive for large α < 1, as v∗(1) = 0. This proves

(8.6).

Proof of Theorem 8.2: As the uk(f) are functions of the transition matrix P (f), the sta-

tionary matrix Π(f) and the deviation matrix D(f), we need to compute these first.

P (fb) =

0 0 0 0 1

14 + b2 1

2 − b2 − b 1

4 + b 0 0

0 0 0 0 1

0 3−√

513+√

50 0 10+2

√5

13+√

5

0 0 0 0 1

,

P (f2) =

0 0 0 0 1

0 0 0 1 0

0 0 0 0 1

0 3−√

513+√

50 0 10+2

√5

13+√

5

0 0 0 0 1

.

Under any policy, absorption takes place into state 5 from any starting state, hence

Π(f) =

0 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 1

.

Page 144: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

134 Part II, Chapter 8

By Proposition 5.1 and Key theorem II D(fb) =∑n(Pn(fb) − Π(fb) = (I − P (fb) +

Π(fb))−1 −Π(fb). We calculate (I − P (fb) + Π(fb))

−1 first.

(I − P (fb) + Π(fb))−1 =

1 0 0 0 0

−( 14 + b2) 1

2 + b2 + b −( 14 + b) 0 1

0 0 1 0 0

0 − 3−√

513+√

50 1 3−

√5

13+√

5

0 0 0 0 1

−1

=

1 0 0 0 0(14 +b2)

12 +b2+b

112 +b2+b

14 +b

12 +b2+b

0 − 112 +b2+b

0 0 1 0 0

3−√

513+√

5

(14 +b2)

12 +b2+b

3−√

513+√

51

12 +b2+b

3−√

513+√

5

14 +b

12 +b2+b

1 − 3−√

513+√

5

32 +b2+b

12 +b2+b

0 0 0 0 1

and so

D(fb) =

1 0 0 0 −1(14 +b2)

12 +b2+b

112 +b2+b

14 +b

12 +b2+b

0 −(

1 + 112 +b2+b

)0 0 1 0 −1

3−√

513+√

5

(14 +b2)

12 +b2+b

3−√

513+√

51

12 +b2+b

3−√

513+√

5

14 +b

12 +b2+b

1 −(

1 + 3−√

513+√

5

32 +b2+b

12 +b2+b

)0 0 0 0 0

.

Similarly,

(I − P (f2) + Π(f2))−1 =

1 0 0 0 0

0 1 0 −1 1

0 0 1 0 0

0 − 3−√

513+√

50 1 3−

√5

13+√

5

0 0 0 0 1

−1

=

1 0 0 0 0

0 13+√

510+2

√5

0 13+√

510+2

√5− 8

5+√

5

0 0 1 0 0

0 3−√

510+2

√5

0 13+√

510+2

√5− 3−

√5

5+√

5

0 0 0 0 1

and thus

D(f2) =

1 0 0 0 −1

0 13+√

510+2

√5

0 13+√

510+2

√5− 13+

√5

5+√

5

0 0 1 0 −1

0 3−√

510+2

√5

0 13+√

510+2

√5− 8

5+√

5

0 0 0 0 0

.

Page 145: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

On the Limits of α-discounted Optimal Policies 135

The proof of (i) needs the computation of u0(f).

u0(fb) = D(fb)r(fb) =

134 +b2+2b

12 +b2+b

2

3−√

513+√

5

34 +b2+2b

12 +b2+b

+ 10+6√

513+√

5

0

.

A simple calculation gives ( 34 + (b∗)2 + 2b∗)/( 1

2 + (b∗)2 + b∗) = (5 + 3√

5)/(5 +√

5), so

that

u0(fb∗) =

1

5+3√

55+√

5

23−√

513+√

55+3√

55+√

5+ 10+6

√5

13+√

5

0

=

1

5+3√

55+√

5

25+3√

55+√

5

0

,

u0(f2) =

1

13+√

510+2

√5

10+6√

513+√

5

213+√

510+2

√5

10+6√

513+√

5

0

=

1

5+3√

55+√

5

25+3√

55+√

5

0

= u0(fb∗).

It remains to be shown that u0(fb∗) > u0(fb), ∀ b 6= b∗. From the specific form of the

vectors, it is sufficient to prove that u02(fb∗) > u0

2(fb). As u02(fb) is equal to V α2 (f∞b ) for

α = 1, we can copy the optimization of V α2 (f∞b ) over fb, replacing α by 1.

For the proof of (ii), we compute u1(fb∗) and u1(f2). Here we use that 1/( 12 + (b∗)2 + b∗) =

8/(5 +√

5). We obtain

u1(fb∗) = −D(fb∗)u0(fb∗) =

−1

−(u0

2(fb∗) +u02(fb∗ )

12 +(b∗)2+b∗

)−2

− 3−√

513+√

5

(u0

2(fb∗) +u02(fb∗ )

12 +(b∗)2+b∗

)− u0

2(fb∗)

0

=

−1

− 5+3√

55+√

513+√

55+√

5

−2

− 3−√

513+√

55+3√

55+√

513+√

55+√

5− 5+3

√5

5+√

5

0

=

−1

− 5+3√

55+√

513+√

55+√

5

−2

− 5+3√

55+√

58

5+√

5

0

,

Page 146: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

136 Part II, Chapter 8

u1(f2) = −D(f2)u0(f2) =

−1

−2 13+√

510+2

√5

5+3√

55+√

5

−2

− 1610+2

√5

5+3√

55+√

5

0

= u1(fb∗).

Finally we prove (iii). As we know that a deterministic Blackwell optimal policy exists,

there is a deterministic 2-discount optimal policy. Therefore it can never happen that

u22(f2) > u2

2(fb∗), while u2i (f2) < u2

i (fb∗) for an i 6= 2, such that it is sufficient to show

u22(f2) > u2

2(fb∗).

u22(f2) = (−D(f2)u1(f2))2 =

13 +√

5

10 + 2√

5

5 + 3√

5

5 +√

5

21 +√

5

5 +√

5=

5 + 3√

5

(5 +√

5)3(139 + 17

√5),

u22(fb∗) = (−D(fb∗)u

1(fb∗))2 = u02(fb∗) +

8

5 +√

5

5 + 3√

5

5 +√

5

13 +√

5

5 +√

5

=5 + 3

√5

(5 +√

5)3

((5 +

√5)2 + 8(13 +

√5))

=5 + 3

√5

(5 +√

5)3(134 + 18

√5) < u2

2(f2).

2.2. MDC’s with denumerable state and finite action spaces.

The example in this subsection is a modification of the foregoing example. An obvious

modification is to split up the state with the compact action set, i.e. state 2, into a

denumerable set of states 2(n)n with finite action sets. However, it did not serve our

purpose, at least I did not see how to make it so. Fortunately, addition of another state

to the system, state 2∗, proved effective.

Before stating the model, we need some further analysis of the first counterexample. For

a proper discrimination between the two models, we add subcript “1” to all quantities of

the first example, e.g. 1Vα(f∞α ), 1u

n(f), etc.

Lemma 8.2 There is an α < 1, such that for α ∈ [α, 1)

g(α):= 1Vα2 (f∞2 )− 1V

α2 (f∞b∗ )

1V12 (f∞b∗ )− 1V

α2 (f∞b∗ )

and

h(α):= 1Vα2 (f∞2 )− 1V

α2 (f∞b∗ )

are positive and decreasing. Moreover, limα↑1 g(α) = limα↑1 h(α) = 0.

Proof: We might as well check the assertions by studying g and h as functions of ρ =

(1 − α)/α. Then α increases iff ρ decreases and α ↑ 1 ⇐⇒ ρ ↓ 0. Thus, we write g(ρ),

h(ρ) for the corresponding functions with parameter ρ. Then by virtue of Theorem 8.2

g(ρ)= (1 + ρ)ρ2

∑∞n=0 ρ

n(

1un+22 (f2)− 1u

n+22 (fb∗)

)1u

02(fb∗)− (1 + ρ)

∑∞n=0 ρ

n1un2 (fb∗)

(8.9)

= (1 + ρ)ρ 1u22(f2)− 1u

22(fb∗) +

∑∞n=1 ρ

n(

1un+22 (f2)− 1u

n+22 (fb∗)

−1u02(fb∗)− 1u

12(fb∗)−

∑∞n=1 ρ

n(

1un2 (fb∗) + 1u

n+12 (fb∗)

) . (8.10)

Page 147: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

On the Limits of α-discounted Optimal Policies 137

Moreover, 1u22(f2)− 1u

22(fb∗) > 0, and

−1u02(fb∗)− 1u

12(fb∗) =

(−1 +

13 +√

5

5 +√

5

)5 + 3√

5

5 +√

5=

8

5 +√

5

5 + 3√

5

5 +√

5> 0,

so thatd

dρg(ρ)

∣∣∣ρ=0

= 1u22(f2)− 1u

22(fb∗)

−1u02(fb∗)− 1u

12(fb∗)

> 0.

Hence, g increases for small ρ and g(α) decreases in α near 1. Insertion of ρ = 0 gives

g(0) = g(0) = 0, and the assertion for g is proved. Similarly we check that h(0) = h(0) = 0,

h(ρ)|ρ=0 =d

dρh(ρ)

∣∣∣ρ=0

= 0

d2

dρ2h(ρ)

∣∣∣ρ=0

= 1u22(f2)− 1u

22(fb∗) > 0,and

hence h(α) decreases as a function of α, for α close to 1. Consequently, there is a suitable

α satisfying the assertions of the lemma.

=action b∗

=action 1

=action 2

(. . . , . . .) = (piaj , ria)

· · ·

( 14 + (b∗)2, 0) ( 1

4 + b∗, 0)

( 12 − (b∗)2 − b∗, 0)

(1, 2)(1, 1)

(1, 0)

(1, 0)

(1, 0)

(pn, 0)

(pk, 0), k 6= n

(1, 1V

12 (f∞b∗ )− ε

2 (1Vα(n+1)2 (f∞2 )− 1V

α(n+1)2 (f∞b∗ ))

)

(3−√

513+√

5, 10+6

√5

13+√

5

)

(10+2

√5

13+√

5, 10+6

√5

13+√

5

)

1 2 3

4

5

2∗

2(n)

Figure 2

Page 148: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

138 Part II, Chapter 8

Consider the model depicted in Figure 2. Besides the specification of the unknown pa-

rameters some explanation is due. Compared to the previous model all actions in state 2

except action 2 have been erased. The parameter n in the notation of state 2(n) ranges

over IN, so there are denumerably many of these states. We take pn = 2−n. Choose ε > 0

and let N ∈ IN be such, that 2−N (1 + ε)−1 ≤ g(α), with α chosen as in the statement of

Lemma 8.2. Moreover, let α(n)n ⊂ [α, 1) be a sequence, with α(n) a solution to

g(α(n)) = 1Vα(n)2 (f∞2 )− 1V

α(n)2 (f∞b∗ )

1V12 (f∞b∗ )− 1V

α(n)2 (f∞b∗ )

= 2−n(1 + ε)−1,

for n ≥ N , and α(n) = α(N) for n < N . By Lemma 8.2 these solutions exist, and α(n) ↑ 1,

as n tends to infinity. Moreover, for n > N , 1Vα(n)2 (f∞2 )−1V

α(n)2 (f∞b∗ ) = g(α(n))·h(α(n)) >

0, so that r2(n−1)1 < 1V12 (f∞b∗ ). Observe that limn→∞ r2(n−1)1 = 1V

12 (f∞b∗ ).

We point out that the model is uniformly strong recurrent. Take e.g. the finite set

M = 1, 3, 5, 2(1) and bounding vector µ = e. This is possible, because the one step

rewards are uniformly bounded. As ν(f) = 1 for all f ∈ F , it is a continuous function on

F . Finally, it is obvious that Assumptions 5.1 and 5.2 are satisfied. Thus Proposition 5.1

applies.

Next we will show two assertions on this example. Together they establish the desired

result.

Theorem 8.3

i) fα(2∗) = 1, for all α-discounted deterministic optimal policies f∞α , and all α ≥ α(N).

ii) There is an α′n < 1, such that fα(2(n)) = b∗, for all α-discounted deterministic

optimal policies f∞α with α ≥ α′n.

Immediately this establishes

Corollary 8.2 f∞1 with f1(2∗) = 1 and f1(2(n)) = b∗, n ∈ IN, is the unique limiting

policy of any sequence f∞α α of α-discounted optimal policies.

The average expected rewards are 0 under any policy, similarly to the first counterexam-

ple. Hence, f∞1 is 0- and 1-discount optimal. Consider the deterministic policy f∞2 with

f2(2∗) = 2 and f2(2(n)) = b∗, n ∈ IN. So, f1 and f2 only differ in state 2∗.

Theorem 8.4

i) u0(f1) = u0(f2) > u0(f), ∀ f 6= f1, f2.

ii) u1(f1) = u1(f2).

iii) u22∗(f1) < u2

2∗(f2).

We conclude, that f1 and f2 are the only 0- and 1-discount optimal policies. f2 is the

unique 2-discount optimal policy, hence it is the unique n-discount and Blackwell optimal

policy, for n ≥ 2.

Page 149: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

On the Limits of α-discounted Optimal Policies 139

Proof of Theorem 8.3: Fix α ≥ α(N). By definition (see Figure 2)

V α2(n)(f∞α ) ≥

1V12 (f∞b∗ )− ε

2

(1V

α(n+1)2 (f∞2 )− 1V

α(n+1)2 (f∞b∗ )

), n ∈ IN

1Vα2 (f∞b∗ ).

Hence,

V α2∗(f∞α ) ≥ αpn

(1V

12 (f∞b∗ )− ε

2

(1V

α(n+1)2 (f∞2 )− 1V

α(n+1)2 (f∞b∗ )

))+ α∑k 6=n

pk 1Vα2 (f∞b∗ )

= αpn

(1V

12 (f∞b∗ )− 1V

α2 (f∞b∗ )− ε

2

(1V

α(n+1)2 (f∞2 )− 1V

α(n+1)2 (f∞b∗ )

))+ α1V

α2 (f∞b∗ ), (8.11)

for any n ∈ IN, in particular for n∗ ≥ N such that α ∈[α(n∗), α(n∗+ 1)

).

If f(2∗) = 2 for some policy f∞, then V α2∗(f∞) = α 1V

α2 (f∞2 ). Thus fα(2∗) 6= 2, for any

α-discounted optimal policy f∞α , if for some n ∈ IN

(8.11) > α 1Vα2 (f2).

By virtue of Lemma 8.2 this is implied for n = n∗ by

pn∗(

1V12 (f∞b∗ )− 1V

α2 (f∞b∗ )

)>(1 +

ε

2pn∗)(

1Vα2 (f∞2 )− 1V

α2 (f∞b∗ )

),

which is tantamount to

g(α) = 1Vα2 (f∞2 )− 1V

α2 (f∞b∗ )

1V12 (f∞b∗ )− 1V

α2 (f∞b∗ )

< pn∗(1 +

ε

2pn∗)−1

= 2−n∗(1 + ε2−n

∗−1)−1. (8.12)

Invoking Lemma 8.2 we observe, that the last inequality is valid for α ∈[α(n∗), α(n∗+ 1)

)as

g(α) ≤ g(α(n∗)) = 2−n∗(1 + ε)−1 < righthandside of (8.12).

The second statement of the theorem is easily shown, since V α2(n)(f∞) converges to 1V

12 (f∞b∗ )

as α tends to 1, for all f ∈ F with f(2(n)) = b∗, n ∈ IN.

Proof of Theorem 8.4: i) For deterministic policies f∞ with f(2(n)) = 1,

V α2(n)(f∞) = 1V

12 (f∞b∗ )− ε

2

(1V

α(n+1)2 (f∞2 )− 1V

α(n+1)2 (f∞b∗ )

).

Hence by expression (8.1) and the fact that u−1(f) = 0

u02(n)(f) = V α2(n)(f

∞)|ρ=0 < 1V12 (f∞b∗ ) = 1u

02(fb∗) = u0

2(n)(f1).

So, for f∞ to be 0-discount optimal it is necessary that f(2(n)) = b∗, for all n ∈ IN. Thus

we need consider state 2∗ only and we will compare f1 and f2. Then

u02∗(f1) = V α2∗(f

∞1 )|ρ=0 = α 1V

α2 (f∞b∗ )|ρ=0 = 1u

02(fb∗)

and similarly

Page 150: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

140 Part II, Chapter 8

u02∗(f2) = 1u

02(f2) = 1u

02(fb∗),

where the second equality follows from Theorem 8.2i).

ii) A further use of (8.1) yields

u12∗(f1) =

d

1

1 + ρV α2∗(f

∞1 )∣∣∣ρ=0

=d

1

(1 + ρ)2 1Vα2 (f∞b∗ )

∣∣∣ρ=0

=d

1

1 + ρ

∞∑n=0

ρn1un2 (fb∗)

∣∣∣ρ=0

= − 1

(1 + ρ)2

∞∑n=0

ρn1un2 (fb∗)

∣∣∣ρ=0

+1

1 + ρ

∞∑n=0

(n+ 1)ρn1un+12 (fb∗)

∣∣∣ρ=0

= −1u02(fb∗) + 1u

12(fb∗).

Analogously

u12∗(f2) = −1u

02(f2) + 1u

12(f2) = u1

2∗(f1),

by virtue of Theorem 8.2ii).

iii) Continuing in the same vein, we establish that

u22∗(f1) = 2

(1u

02(fb∗)− 1u

12(fb∗) + 1u

22(fb∗)

)< 2(

1u02(f2)− 1u

12(f2) + 1u

22(f2)

)= u2

2∗(f2),

where we apply Theorem 8.2iii).

Page 151: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 141

CHAPTER NINE

Applications.

1. Introductory remarks.

For the value of the results hitherto obtained, it is of importance that they be applicableto optimal control models. This chapter discusses two multi-dimensional queueing appli-cations, the K competing queues model and the two centre open Jackson network withcontrol of the service rates. From the analysis it can be easily deduced that a version ofthe latter model with control of the arrival rates is included as well.

For both models we show uniform strong recurrence by the construction of a boundingvector µ of structures (3.3) and (3.2) respectively. Both models are unichain and easilychecked to satisfy Assumptions 5.1 and 5.2. Thus Proposition 5.1, the equivalence resultsin Chapter 6, and the convergence of the successive approximations algorithm apply, forall polynomially bounded rewards.

Mutatis mutandis, Propositions 3.1 and 3.2 are valid as well. Indeed, by assumption thebounds in the assertion of Proposition 3.1 are uniform bounds in f ∈ F in this chapter,so that the Laplace-Stieltjes transforms of the marginal distributions converge in a neigh-bourhood of 0, uniformly in f ∈ F . This implies convergence of all moments, uniformlyin the deterministic policies. On the other hand, the stochastic process generated by anydeterministic policy in each of the two models satisfies the assumptions of Proposition 3.2,thus disproving strong ergodicity. More specificly, the stochastic process associated withthe total number of customers in the system satisfies the left-skip-free property, which isthe condition of Proposition 3.2i). Hence, it is not possible to choose a bounded µ-vector.Therefore these models can not be handled by many earlier papers in this field (cf. Thomas[1980] for an overview). Notice, that combination with Theorem 6.2 also disproves strongergodicity of the continuous time versions of the exponential models we study.

The spatial geometric boundedness character of the transition probabilities is reflected bythe fact, that the µ-vector is an exponential function on E. This suggests a condition onthe Laplace-Stieltjes transforms of the arrival and/or service time distributions. Under a“uniform ergodicity” condition such condition turns out to be necessary and sufficient foruniform strong recurrence, similarly as for the applications in Chapter 3.

Throughout this chapter we will refrain from summarizing these implications and merelystate the results concerning uniform strong recurrence.

Page 152: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

142 Part II, Chapter 9

2. K competing queues.

2.1. General description of the model and results.

This section analyses three versions of the K competing queues model. We start with a

general description of the model.

K queues compete for service from one server. The server controls the system by choosing

which queue to serve. If a customer finishes service, he is routed to one of the queues or

he leaves the network according to some probability distribution. The versions differ in

two aspects. One concerns the timepoints at which the server may adjust his control. The

other difference is between the service time and arrival distributions.

The first model is a time-slotted version. This means that the server can only change

service at pre-fixed time points, which are called the decision moments. So, at a decision

moment the server chooses the queue to serve during the following time-slot (the time-

interval between two successive decision moments). When a service is finished during a

time-slot, the server waits until the next decision moment. These specifications imply, that

any number of customers may have entered each queue between two successive decision

moments.

The second version is an exponential model in continuous time, in which the server can

change service at any instant. This is a MDP, for which no sensitive optimality results

are available yet. Under a uniformizability condition we show uniform strong recurrence

of the AMDC.

As a third model we consider the semi-Markov decision model, in which decisions can be

made only at the end of each service. In order to apply the results of Dekker & Hordijk

[1990] uniform strong recurrence has to be checked for the embedded Markov chains on

the instants of a service completion. We will do so in subsection 9.1.4. Together with the

verification of the other assumptions in Dekker & Hordijk [1990] the existence of sensitive

optimal policies is also guaranteed in the semi-Markov model.

For the three models only non-idling policies are allowed, i.e. policies that never serve an

empty queue.

The description of the models can be formalized as follows. The state space E consists

of all K-tupels i = (i1, . . . , iK), where im is the number of customers in the mth queue,

im ∈ IN0. In each state one of at most K possible actions can be chosen. Action a = m

corresponds to the server deciding to serve queue m. Clearly, action m is excluded in state

i = (i1, . . . , iK) if im = 0, i.e. m ∈ A(i) ⇐⇒ im 6= 0. In the empty state only one action

a = 0 can be taken, which means that the server will do nothing.

Exogeneous customers for the mth queue arrive with a rate of λm per unit of time. The

average service time for the three models is ν−1m for a customer from queue m. A customer

who finishes service in queue m, goes with probability rml to queue l, and with probability

1 −∑l rml he leaves the network. The routing matrix R = (rml)1≤m,l≤K is supposed to

Page 153: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 143

be the transition matrix of a transient Markov chain. Thus the inverse matrix (I −R)−1

exists and equals∑nR

n.

Uniformly strong recurrent MDC’s are positive recurrent under the deterministic policies.

A stability condition for the first version of the K competing queues has already been

derived in Makowski & Shwartz [1989]. For that condition we need the throughput γm of

queue m, which is the solution of the following set of equalities:

γm =K∑l=1

rlmγl + λm, m = 1, . . . ,K.

In vector notation, γ = (γ1, . . . , γK) solves

γ = RT γ + λ = λ∞∑n=0

Rn, (9.1)

where λ is the vector of arrival rates and RT the transposed matrix R. So, we assume the

following condition.

Ergodicity condition:K∑m=1

γmνm

< 1.

Notice that the ergodicity condition was proven to be sufficient, but not necessary. Makowski

& Shwartz [1989] proved stability (transience) for all non-idling policies if∑m(γm/νm) <

(>)1, but they derived no result for the case that∑m(γm/νm) = 1.

The next three subsections will construct suitable x1, . . . , xK > 0 in expression (3.3) to

prove the following theorem, with em the mth unit vector in a K dimensional space.

Theorem 9.1 Consider the three models in discrete time and assume the ergodicity con-

dition. Both discrete and continuous time versions of the second model are uniformly

strongly recurrent for a bounding vector µ of structure (3.3). Moreover, uniform strong

recurrence for the same bounding vector is equivalent to the spatial geometric bounded-

ness condition for the first and third models (cf. sections 9.2.2, 9.2.4). As the finite set M

we take M = 0 for the first and second models, and M = 0, em,m = 1, . . . ,K for the

third.

As was already noticed in the Introduction in section 5.1, combination of the theorem

with results from Baras, Ma & Makowski [1985], or Buyukkoc, Varaiya and Walrand

[1985] shows the optimality of the µc-rule in the class of all policies for a special case of

the time-slotted K competing queues model. We will examine this more closely.

We require R = 0, i.e. there is no random routing. Let ria := −∑m cimim, for cim > 0,

m = 1, . . . ,K. Number the queues such that c1ν1 ≤ · · · ≤ cKνK . The µc-rule, or νc-rule

as we work with service rates νm, is a deterministic policy f∗∞, such that

f∗(i) =

K, iK > 0

m, im+1, . . . , iK = 0, im > 0,m < K

0, i = 0.

Page 154: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

144 Part II, Chapter 9

It is easily checked, that the bounding vector increases faster than the absolute rewards.

Hence, Proposition 5.1 implies the existence of a deterministic Blackwell optimal policy

within the class of all policies. The papers by Baras, Ma & Makowski and Buyukkoc,

Varaiya & Walrand prove α-discounted optimality of the νc-rule in the class of determin-

istic ( and deterministic non-stationary) policies, for all discount factors α ∈ (0, 1). Then

obviously the νc-rule is Blackwell optimal in this class. Combination of both results yields

the following theorem.

Theorem 9.2 Consider the time-slotted K competing queues model without random rout-

ing and assume the ergodicity and spatial geometric boundedness conditions (cf. section

9.2.2). The νc-rule is Blackwell optimal in the class of all policies.

2.2. Time-slotted model.

All parameters of the model are described now, except for the specific assumptions on the

service times and arrival distributions.

With probability νm a customer from queue m finishes service after 1 time slot of service.

If his service is not finished, the server can decide to put him back into the queue or give

him another time slot of service. So, each time a customer visits queue m, he requires a

geometrically (νm) distributed number of time slots of service.

Let Am denote the number of exogeneous arrivals in queue m during one time-slot. We

assume it to be independent of the services and arrivals in all other time-slots, and of the

services in the same time-slot. The probability distribution of the vector A = (A1, . . . , AK)

is denoted by p, i.e. IP(A = i) = pi1,...,iK . Then, IEAm = λm. As has already been

mentioned, it is necessary to make an assumption on the Laplace-Stieltjes transforms of

the transition probabilities.

Spatial geometric boundedness condition: ∃ε > 0, such that IEeε∑

mAm <∞.

Evidently, this assumption is satisfied if the arrival streams are Poisson distributed. Now

we have the following transition probabilities for km ≥ 0, for m = 1, . . . ,K:

P0a(k1,...,kK) = pk1,...,kK

P(i1,...,iK)a(i1+k1,...,iK+kK) = pk1,...,kK(1− νa) + νaraa+ pk1,...,ka+1,...,kKνa(1−∑m

ram)

+K∑m=16=a

δ(km)pk1,...,km−1,...,ka+1,...,kKνaram

P(i1,...,iK)a(i1+k1,...,ia−1,...,iK+kK) = pk1,...,ka−1,0,ka+1,...,kKνa(1−∑m

ram)

+

K∑m=16=a

δ(km)pk1,...,ka−1,0,ka+1,...,km−1,...,kKνaram.

Page 155: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 145

For the proof of Theorem 9.1 we have to construct β < 1, x1, . . . , xK > 0, such that

‖0P (f)‖µ ≤ β < 1 ∀ f ∈ F with µ a bounding vector of structure (3.3) or, equivalently,

∑j 6=0

P(i1,...,iK)a(j1,...,jK)

∏Km=1(1 + xm)jm∏K

m=1(1 + xm)im≤ β < 1, ∀ a ∈ A(i), ∀ i ∈ E. (9.2)

Recall that taking action a implies, that ia 6= 0. So, if we sum over all j in the lefthand

side of (9.2), we get for (i1, . . . , iK) 6= (0, . . . , 0),

∑j

P(i1,...,iK)a(j1,...,jK)

∏Km=1(1 + xm)jm∏K

m=1(1 + xm)im= (9.3)

=∑

k1,...,kK≥0

pk1,...,kK

((1−νa)+νaraa

)+

νa1 + xa

(1−∑m

ram)+∑m6=a

1 + xm1 + xa

νaram

K∏m=1

(1+xm)km

=∑

k1,...,kK≥0

pk1,...,kK

1− νa

1 + xa(xa −

∑m

ramxm) K∏m=1

(1 + xm)km (9.4)

By virtue of the spatial geometric boundedness condition, (9.4) is analytic for all vectors

x = (x1, . . . , xK), with 1 + xm < eε. For (9.4) to be smaller than 1, we need solutions x

to the inequality x > xRT . Denote x(I − RT ) by d = (d1, . . . , dK). The invertibility of

I −RT implies that x can be expressed as a function of d, i.e.

x = d

∞∑n=0

(RT )n. (9.5)

Obviously, x is nonnegative if d is. Number the queues such that ν1 ≤ · · · ≤ νK . Let

nm := νm/ν1 (≥ 1), m = 1, . . . ,K and notice that νm/nm = ν1. In the sequel we consider

vectors d with the following structure: dm = δ/nm, for some δ > 0. Inserting this in (9.5)

we obtain the following expression for x

xm = δ∑l≤Kn∈IN0

1

nl(RT )

n

lm, m = 1, . . . ,K. (9.6)

We simplify (9.4) by inserting (9.6) in the expression between brackets, so that

(9.4) =∑

k1,...,kK

pk1,...,kK

(1− ν1δ

1 + xa

) K∏m=1

(1 + xm)km .

This is a function of δ and a only and we denote it by f(δ, a). Clearly f(0, a) = 1,

∀ a ∈ A(i), ∀ i 6= 0. Using that ∂∂δxm = xm/δ, we derive

∂δf(δ, a)=

∑k1,...,kK

pk1,...,kK

K∏m=1

(1 + xm)km∑m

kmxmδ(1 + xm)

(1− ν1δ

1 + xa

)− ν1

(1 + xa)2

.

Insert (9.6) in this expression, then for δ = 0

Page 156: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

146 Part II, Chapter 9

∂δf(0, a)=

∑m,l≤Kn∈IN0

λm1

nl(RT )

n

lm − ν1

=∑

l,m≤Kn∈IN0

1

nlλmR

nml − ν1 =

∑l

γlnl− ν1 =

∑l

γlνlν1 − ν1 < 0,

because of the ergodicity assumption. Hence, a δo exists, such that f(δ, a) < 1, ∀ a ∈ A(i),

∀ i 6= 0, ∀ δ ∈ (0, δo). Choose a positive δ < δo, and β satisfying 1 > β ≥ f(δ, a), then

(9.3)≤ β, for all a ∈ A(i) and all i 6= 0.

Obviously, to take the summation over less states does not invalidate the inequality, so

that (9.2) holds for all i 6= 0. We will check (9.2) for i = 0:

∑j 6=0

P0a(j1,...,jK)

∏Km=1(1 + xm)jm∏K

m=1(1 + xm)0=

∑j1,...,jK∑mjm 6=0

pj1,...,jK∏m

(1 + xm)jm

=∑

j1,...,jK≥0

pk1,...,kK

K∏m=1

(1 + xm)jm − p0, ∀ a ∈ A(0).

Because of the ergodicity condition, p0 is positive. Then the spatial geometric boundedness

condition implies the existence of an ε1 < ε, such that IEeε′∑

mAm − p0 < 1, for all

positive ε′ < ε1. By choosing δ < δo positive and sufficiently small, we can always achieve

1 + xm < eε1 , for m = 1, . . . ,K. Hence (9.2) holds for xm = δ∑

l≤Kn∈IN0

n−1l (RT )

n

lm,

m = 1, . . . ,K and some β < 1.

Conversely, assume that the uniform strong recurrence property holds. There is only one

closed class, so that the model is ergodic under any deterministic policy. We will show

that strong recurrence of the MC generated by a deterministic policy implies the spatial

geometric boundedness condition.

Fix f ∈ F and denote by T(i1,...,iK) the random variable associated with the recurrence

time to state 0, if the system starts in i = (i1, . . . , iK) and f∞ is the service control

policy. Recall that T(i1,...,iK) is the time between two successive visits to state 0 for

i = (i1, . . . , iK) = 0. We write X(n) =(X1(n), . . . , XK(n)

)n∈IN0 for the MC generated

by policy f∞, where Xm(n) is the number of customers in queue m at time n, m = 1 . . . ,K,

n ∈ IN0. Then X(n)n is strongly recurrent. Hence by Proposition 1.5 the recurrence

time to state 0 is exponentially bounded, i.e. there are β1 < 1 and c(i1,...,iK) > 0, such

that

IPT(i1,...,iK) ≥ n ≤ βn1 c(i1,...,iK), (i1, . . . , iK) ∈ E,n ∈ IN0. (9.7)

Consider the stochastic process N(n) =∑mXm(n)n∈IN0 that describes the total number

of customers in the system (controlled by policy f∞) and let TN,k be the recurrence time

to state 0 if the process starts with k customers in the system. Observe that N(n)n

Page 157: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 147

is not a Markov chain. Furthermore, let ck := maxc(i1,...,iK) |∑m im = k, k ∈ IN0.

Since N(n) = 0 iff X(n) = 0, we obtain by (9.7) for any initial state (i1, . . . , iK) with∑m im = k,

IPTN,k ≥ n ≤ ckβn1 , n ∈ IN0. (9.8)

Rather than study the complicated process N(n)n∈IN0 , we construct a stochastically

smaller process, where all customers are served at the same rate. Define ν′ := maxνk(1−∑

m rkm) | k ∈ 1, . . . ,K

. So, ν′ is the maximum rate that a customer is served and

leaves the system. Moreover, let p′kk be the distribution of the total number of arrivals∑mAm in one time slot, i.e.

p′k =∑

(k1,...,kK):∑

mkm=k

p(k1,...,kK).

We consider a MC N ′(n)n∈IN0with transition matrix PN ′ , where

PN ′,0 l := p′l, l ≥ 0

PN ′,k k+l := (1− ν′)p′l + ν′p′l+1, l ≥ 0, k ≥ 1

PN ′,k k−1 := ν′p′0, k ≥ 1.

It is more convenient here to use partial ordering of probability distributions on IN0, instead

of the stochastic ordering of random variables it induces (cf. section 3.3). Let P1, P2 be

probability distributions on IN0. Then

P1 P2 ⇐⇒∑l≥k

P1,l ≤∑l≥k

P2,l, ∀ k ∈ IN0.

As the evolution of the process in one time-slot is the same for all states k ≥ 1, it is easily

verified that

PN ′,k• PN ′,k+1•, k ∈ IN0.

We will check ∀ k(t) ∈ IN0, t = 0, . . . , n, that

PN ′,k(n)• IP(N(n+ 1) = • |

n∩t=0N(t) = k(t)

), n ∈ IN0. (9.9)

Indeed, for any k(t) ∈ IN0, t = 0, . . . , n, l ∈ IN0 we establish,∑s≥l

PN ′,k(n) k(n)+s =∑s≥l

(1− ν′)p′s +∑s≥l+1

ν′p′s =∑s≥l

p′s − ν′p′l

≤∑s≥l

p′s − νa(1−∑m

ram)p′l, ∀ a ∈ A((k1, . . . , kK)

)≤ IP

(N(n+ 1) ≥ k(n) + l |

n∩t=0N(t) = k(t)

), n ∈ IN0.

Page 158: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

148 Part II, Chapter 9

For l = −1, . . . ,−k(n),∑s≥l

PN ′,k(n) k(n)+s = IP(N(n+ 1) ≥ k(n) + l |

n∩t=0N(t) = k(t)

)= 1, n ∈ IN0.

Finally, if k = 0, the corresponding conditional probability distributions are the same,

thus establishing (9.9).

Since 0 is the smallest state, similar stochastic inequalities hold, if we take the taboo prob-

abilities for the taboo set 0 instead. In section 3.3 we mentioned the characterization of

stochastic ordering of random variables that uses expectations of non-decreasing functions.

This allows a straightforward extension of a well-known stochastic comparison result for

Markov processes (cf. e.g. Ridder [1987], Stoyan [1983]) and we obtain for k, n ∈ IN0,

0PnN ′k• IP

(N(n) = •, N(t) 6= 0, t = 1, . . . , n− 1 | N(0) = k

)0P

nN ′,k• 0P

nN ′,k+1•.

(9.10)

For the recurrence time TN ′,k to state 0 in the process N ′(n)n this yields by (9.8)

IPTN ′,k ≥ n=∑l 6=0

0Pn−1N ′,k l

≤ IP(N(t) 6= 0, t = 1, . . . , n− 1 | N(0) = k

)= IPTN,k ≥ n ≤ ckβn1 , (9.11)

and

IPTN ′,k ≥ n≤ IPTN,k+1 ≥ n, k, n ∈ IN0. (9.12)

Let x ∈ (1, 1/β1). The probability generating function FN ′,k 0(x) of recurrence times to

state 0 converges for this x, since by equation (9.11)

FN ′,k 0(x) =

∞∑n=1

IPTN ′,k = nxn

≤∞∑n=1

ckβn1 x

n

=xβ1

1− xβ1ck, ∀ k ∈ IN0.

Moreover, the function f(n) = xn, n ∈ IN0 is non-decreasing on IN0. Invoking (9.12) and

stochastic comparison arguments (cf. section 3.3) we thus obtain FN ′,k 0(x) ≤ FN ′,k+1 0(x),

k ∈ IN0, so that

FN ′,k+1 0(x) = x

∞∑l=0

PN ′,k+1 k+lFN ′,k+l 0(x)

≥ xFN ′,k 0(x), k ≥ 1.

Iterating this inequality we establish

Page 159: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 149

FN ′,k+1 0(x) ≥ xkFN ′,1 0(x), k ≥ 1.

Consequently,

FN ′,0 0(x) = x∑l 6=0

p′lFN ′,l 0(x) + xp′0

≥∑l 6=0

p′lxlFN ′,1 0(x) + xp′0

≥∑l=0

p′lxl.

Recall, that p′ll is the probability distribution of∑mAm. We conclude that IEeε

∑mAm<

∞, for ε > 0 with eε < x. This demonstrates the spatial geometric boundedness condition.

The proofs in the next subsections highly resemble the last proofs. Therefore we will only

briefly discuss the other two models.

2.3. The exponential model.

For the continuous time model we assume the arrival streams and the service time distri-

butions to be independent. The exogeneous arrival stream at queue m is a Poisson stream

with parameter λm, and the service time for a customer from queue m is exponentially

(νm) distributed, m = 1, . . . ,K.

Evidently, the MDP is uniformizable. So, we will study the AMDC. Let h denote the

length of a time interval with h(∑m λm +

∑m νm) < 1. In the discrete time model the

server observes the queues at time points nh∞n=0 and decides then which queue to serve.

The service mechanism is now the same as in the time-slotted model. However, here

we have at most one service completion or one arrival within a time interval. Therefore

arrivals and service completions in one time slot are not independent, so that this model

is not a special case of the previous one.

Recall that action a can not be chosen in state i, if ia = 0. Then the transition probabilities

are:

P(i1,...,iK)a(j1,...,jK) =

λmh, jm = im + 1, jl = il, l 6= m

νah(1−K∑m=1

ram), ja = ia − 1, jl = il, l 6= a

νahram,

jm = im + 1, ja = ia − 1, jl = ilm 6= a, l 6= m, a

1−K∑m=1

λmh− νah(1− raa), j = i

P(0,...,0)a(j1,...,jK) =

λmh, jm = im + 1, jl = il, l 6= m

1−K∑m=1

λmh, j = 0.

Page 160: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

150 Part II, Chapter 9

Remark that no assumption on Laplace-Stieltjes transforms is necessary, since the tran-

sition probabilities have finite support. Inserting the transition probabilities in (9.3) we

obtain

(9.3) =∑m

λmh(1 + xm) + νah(1−∑m

ram)1

1 + xa

+∑m 6=a

νahram1 + xm1 + xa

+ 1− h(∑m

λm + νa(1− raa))

= 1 +∑m

λmhxm −νah

1 + xa(xa −

∑m

ramxm). (9.13)

Using a similar procedure as before, and defining dm = δ/nm and xm as in (9.6), we get

(9.13) = 1 + hδ( ∑m,l≤Kn∈IN0

λm1

nl(RT )

n

lm −ν1

1 + xa

)= 1 + hδ

(∑l

γlnl− ν1

1 + xa

). (9.14)

As∑l(γl/nl) =

∑l(γl/νl)ν1, a δo exists such that

∑l(γl/nl) < ν1/(1+xm), m = 1, . . . ,K,

for all δ ∈ (0, δo). So we can choose suitable δ < δo, β, such that (9.3)≤ β < 1. Conse-

quently (9.2) is satisfied for i 6= 0. It only remains to be shown that∑j

0P0a(j1,...,jK)

∏m

(1 + xm)jm =∑m

λmh(1 + xm) ≤ β. (9.15)

This might be conflicting with (9.2) for i 6= 0, because of the dependence between δo, h

and β. However the lefthandside of (9.15) is equal to(∑

m λm + δ∑l(γl/nl)

)h, which is

smaller than (∑m λm+δν1)h. For δ < 1 the last expression is smaller than 1 by definition.

So a β < 1 can be found to satisfy (9.2) for all i ∈ E.

2.4. The semi-Markov model.

The exogeneous arrival streams in the semi-Markov decision model are assumed to have

the same distributions as in the previous section, i.e. they are Poisson (λm) distributed

for queue m, m = 1, . . . ,K. The service times are allowed to have a general distribution

with mean ν−1m for a customer from queue m. We assume the service times and arrival

streams to be independent.

Denote the service time distribution of a customer in queue m with Bm, and let Bm(t) =

IPBm ≤ t. We assume that a server cannot interrupt his service, so the system can only

be controlled just after a service completion or just after an arrival of a customer in an

empty system.

For K = 1 this is nothing else but the M/G/1-queue. In the same vein as the results in

section 3.1 the following assumption turns out to be necessary and sufficient.

Spatial geometric boundedness condition: ∃ε > 0, such that IEeεBm <∞, m = 1, . . . ,K.

Page 161: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 151

The transition probabilities for the embedded chain are as follows. For km ≥ 0, k =

1, . . . ,K,

P0a(k1,...,kK) =λm∑l λl

, km = 1, kl = 0, l 6= m,m = 1, . . . ,K

P(i1,...,iK)a(i1+k1,...,iK+kK) =

∫t≥0

∏m

e−λmt(λmt)km

(km)!

raa +

λat

ka + 1(1−

∑m

ram)

+∑m6=a

λat

ka + 1· kmλmt

· δ(km)ram

dBa(t)

P(i1,...,iK)a(i1+k1,...,ia−1,...,iK+kK)=

∫t≥0

∏m 6=a

e−λmt(λmt)km

(km)!e−λat

(1−

∑m

ram)

+∑m6=a

kmλmt

· δ(km)ram

dBa(t)

As the finite set M we choose M = 0, em,m = 1, . . . ,K. For the proof of µ − UGR(M)

with µ of structure (3.3) we have to check

∑(j1,...,jK)6∈M

P(i1,...,iK)a(j1,...,jK)

∏Km=1(1 + xm)jm∏K

m=1(1 + xm)im≤ β < 1 (9.16)

As the lefthandside of (9.16) is bounded by (9.3), we insert the transition probabilities in

(9.3) first, for i 6= 0. Then

(9.3) =∑

k1,...,kK

∫t≥0

∏m

e−λmt(λmt)km

(km)!(1 + xm)kmdBa(t)

raa +

1−∑mram

1 + xa+∑m6=a

1 + xm1 + xa

ram

=

∫t≥0

e

∑m λmxmtdBa(t) ·

1 +∑m ramxm

1 + xa

=

∫t≥0

e

∑m λmxmtdBa(t) ·

(1− δ

na(1 + xa)

), (9.17)

where again the same choices of dm and xm as in the previous subsections are used (cf.

(9.6)). (9.17) is a function of δ and a, which we denote by f(δ, a). By the spatial geometric

boundedness condition f is analytic for all small values of δ. Then the partial derivative

to the first variable exists and equals

∂δf(δ, a)=

∫t≥0

e

∑m λmxmt

t∑m

λmxmδ

(1− δ

na(1 + xa)

)− 1

na(1 + xa)2

dBa(t).

Insertion of (9.5) gives

Page 162: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

152 Part II, Chapter 9

∂δf(0, a)=

∑l

γlnlνa

− 1

na=

1

νa

(∑l

γlnl− ν1

)< 0.

This implies that δ > 0, β < 1 can be found such that (9.3)≤ β, ∀ a ∈ A(i), i 6= 0. As in

(9.16) the summation over less states is taken, (9.16) holds as well for the same values of

δ and β, for a ∈ A(i), i 6= 0.

By the choice of set M , the lefthandside of (9.16) is 0, for i = 0. So, (9.16) holds for all

a ∈ A(i), i ∈ E, and the MDC is µ− UGR(M).

We can take take M = 0 as well. In this case µ(0,...,0) has to be chosen differently.

Indeed, it is easy to check that it has to satisfy the following inequality

µ(0,...,0) ≥ β−1∑m

λm∑l λl

(1 + xm).

This is always possible, since it is the only inequality in which µ(0,...,0) appears.

Conversely, assume that the uniform strong recurrence property holds. Similarly to the

time-slotted model there is one closed class under any deterministic policy, so that by

Corollary 6.3 the model is ergodic. Our object is the same as in subsection 9.2.2, p.

146. However, we can not apply the same technique, as the N ′-process has in fact a

stochastically minimal service time distribution. No such data are available for the semi-

Markov model.

Fix f ∈ F and denote by T(i1,...,iK) the recurrence time to state 0, if the system starts in

i = (i1, . . . , iK) and f∞ is the service control policy. Similarly as before, T(i1,...,iK) is the

time between two successive visits to state 0, if (i1, . . . , iK) = 0. The MC generated by

policy f∞ is strongly recurrent, hence by Proposition 1.5 the recurrence time to state 0 is

exponentially bounded, i.e. there are β1 < 1 and c(i1,...,iK) > 0, such that

IPT(i1,...,iK) ≥ n ≤ βn1 c(i1,...,iK), (i1, . . . , iK) ∈ E,n ∈ IN. (9.18)

Choose m ∈ 1, . . . ,K. Consider the stochastic process X ′m(n)n∈IN0 associated with

the number of customers in queuem at time instants that correspond to a subset of decision

moments of the semi-Markov decision chain. The queue is first observed at the beginning

of time. If the queue is not empty at some observation instant, the next observation is

either at the first instant the queue is empty, or at the instant that a service of this queue

starts. If it is empty, the next observation is at the next decision moment of the original

process. The recurrence time to state 0 will be denoted by TX′m,im , if the systems starts

with im customers in queue m and policy f∞ is used (for the original process).

Obviously, the recurrence time to state 0 in the new process is stochastically smaller (cf.

section 3.3) than the recurrence time to state 0 in the original process, since a visit to

state 0 = (0, . . . , 0) implies a visit to state 0 in the X ′m-process. Thus, for any state i ∈ Ewith mth coordinate im,

TX′m,imst≤ Ti. (9.19)

Page 163: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 153

Generally the process X ′m(n)n is not a Markov process, and its marginal probability

distributions are fairly complicated, we construct a stochastically smaller process, where

in fact only the behaviour of the system with respect to queue m in the first time-slot

after each observation in the process X ′m(n)n is taken into account. We consider a MC

Y (n)n∈IN0on IN0 with transition matrix PY defined as follows.

PY,0 k :=

∫t≥0

e−λmt(λmt)k

k!dBm(t), k ≥ 0

PY,im im+k:=

∫t≥0

e−λmt(λmt)k

k!

(rmm +

λmt

k + 1(1− rmm)

)dBm(t), im ≥ 1, k ≥ 0

PZ,im im−1:=

∫t≥0

e−λmt(1− rmm)dBm(t), im ≥ 1.

First we observe, that (cf. subsection 9.2.2)

PY,im• PY,im+1•, im ∈ IN0,

and thus since 0 is the smallest state

0PY,im• 0PY,im+1•, im ∈ IN0. (9.20)

For the comparison of the X ′m- and Y -process we have to revert directly to the taboo

transition probabilities, since the processes are not generally comparable in state 0. Notice

that for im(t) ∈ IN0, t = 0, . . . , n− 1 and im(n) ≥ 1 that

IP(X ′m(n+ 1) ≥ im(n) + k |

n∩n=0X ′m(t) = im(t)

)≥ IP

the queue grows at least k within the first epoch after an observation,

excluding arrivals from another queue and assuming queue m to be served

=∑l≥k

PY,im(n) im(n)+l, ∀ k, n ∈ IN0,

so that for im(n) ≥ 1, im(t) ∈ IN0, t = 0, . . . , n− 1

PY,im(n)• IP(X ′m(n+ 1) = • |

n∩t=0X ′m(t) = im(t)

), n ∈ IN0.

Combination of (9.20) with arguments similar to subsection 9.2.2 shows for n ∈ IN0 that

0PnY,im• IP

(X ′m(n) = •, X ′m(t) 6= 0, t = 1, . . . , n− 1 | X ′m(0) = im

), im ≥ 1

0PnY,im• 0P

nY,im+1•, im ∈ IN0.

(9.21)

Hence, for the recurrence time TY,im to state 0 in the process Y (n)n, combination with

(9.18) and (9.19) yields

Page 164: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

154 Part II, Chapter 9

IPTY,im ≥ n=∑k 6=0

0Pn−1Y,imk

≤ IP(X ′m(t) 6= 0, t = 1, . . . , n− 1 | X ′m(0) = im

)= IPTX′m,im ≥ n≤ IPTi ≥ n ≤ ciβn1 , im ≥ 1, n ∈ IN0, (9.22)

and any initial state i ∈ E with mth coordinate equal to im. Also,

IPTY,im ≥ n≤ IPTY,im+1 ≥ n, im, n ≥ 0. (9.23)

Consequently the probability generating function FY,im 0(x) of the recurrence times to

state 0 converges for x ∈ (1, 1/β1), for im ∈ IN0. Moreover, using (9.23) we can proceed

analogously to section 9.2.2 to obtain

FY,0 0(x) ≥∑k 6=0

PY,0 kxkFY,1 0(x) + xPY,0 0

≥∞∑k=0

PY,0 kxk

=∞∑k=0

∫t≥0

e−λmt(λmt)k

k!xkdBm(t)

=

∫t≥0

eλmt(x−1)dBm(t).

We conclude, that the spatial geometric boundedness condition holds for ε<(β−1−1) minm

λm.

3. The two centre open Jackson network.

At first sight the analysis of the 2 centre network does not seem more complicated than the

K-competing queues with one service centre. However, this turns out not to be the case.

Until now we were not able to construct suitable bounding vectors for Jackson networks

with more than two centres. In the two centre network problems arise at the boundaries

of the state space, so that we have to resort to a vector with structure (3.2). The case of

one fixed service rate for each service centre has been analysed extensively in Hordijk &

Spieksma [1989b].

Consider a service system with infinite buffers and two service centres. The arrival stream

in centre k from outside the network is Poisson λk distributed, k = 1, 2. A customer

that finishes service at centre k routs to centre l with probability rkl, or he leaves the

network with probability 1 − rk1 − rk2. We assume that each customer eventually leaves

the network. So, just as in section 9.2 the routing matrix R is assumed to be the transition

matrix of a transient Markov chain.

Page 165: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 155

The state space E consists of all pairs (i, j) | i, j ∈ IN0, where i and j are the number

of customers in centre 1 and 2 respectively.

Control of the system is exercised in the following way. At each moment of time a pair

of service rates (a1, a2) is selected from a compact action set A(i, j), if the state at that

moment is (i, j). The MDP is uniformizable if the following condition is satisfied.

Uniformization condition: ∃a = (a1, a2) < ∞, such that ak ≤ ak, k = 1, 2, ∀ a ∈ A(i, j),

∀ (i, j) ∈ E.

Ergodicity of the system under any deterministic policy f∞ is ensured, if the throughput

in centre k (which is independent of the policy) is strictly smaller and bounded away

from all service rates that can be selected, when many customers are present in centre k,

k = 1, 2. The vector γ of the throughput in the respective centres satisfies formula (9.1)

for K = 2. So, in formula our ergodicity assumption reads as follows:

Uniform ergodicity assumption: Finite numbers I, J ∈ IN0 exist, such that

a1 := infi≥I,j∈IN0

a1 | ∃a2 such that (a1, a2) ∈ A(i, j) > γ1

a2 := infi∈IN0,j≥J

a2 | ∃a1 such that (a1, a2) ∈ A(i, j) > γ2.

We study the AMDC, where time is discretized at intervals of length h with h(a1 + a2 +

λ1 + λ2) < 1. So we have,

P(i,j)a(i+1,j) = λ1h P(i,j)a(i,j+1) = λ2h

P(i,j)a(i−1,j) = δ(i)a1h(1− r11 − r12) P(i,j)a(i,j−1) = δ(j)a2h(1− r21 − r22)

P(i,j)a(i−1,j+1) = δ(i)a1hr12 P(i,j)a(i+1,j−1) = δ(j)a2hr21

P(i,j)a(i,j) = 1− λ1h− δ(i)a1h(1− r11)− λ2h− δ(j)a2h(1− r22).

Clearly, the transition probabilities have finite support, so no extra assumptions on the

convergence of Laplace-Stieltjes transforms are necessary.

The model is uniformly strong recurrent if a vector µ, with µ(i,j) ≥ 1, a finite set M , and

a β < 1 exist such that∑(k,l) MP(i,j)a(k,l)µ(k,l)

µ(i,j)≤ β < 1, ∀ a ∈ A(i, j), ∀ (i, j) ∈ E. (9.24)

For all large states (i, j) the taboo transition probabilities and the transition probabilities

are equal. Let µ be a vector with structure (3.2), then we have to verify the following set

of inequalities for all a ∈ A(i, j) and for all large values of i or j.

Page 166: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

156 Part II, Chapter 9

∑(k,l) P(i,j)a(k,l)µ(k,l)

µ(i,j)≤ β < 1 (9.25)

⇐⇒ λ1h(1 + xi+1) + δ(i)a1h(r12

1 + yj+1

1 + xi+ (1− r11 − r12)

1

1 + xi

)+ λ2h(1 + yj+1) + δ(j)a2h

(r21

1 + xi+1

1 + yj+ (1− r21 − r22)

1

1 + yj

)+1− h

(λ1 + λ2 + δ(i)a1(1− r11) + δ(j)a2(1− r22)

)≤ β

⇐⇒ v(a, i, j) :=a1δ(i)

1 + xi(xi − r11xi − r12yj+1) +

a2δ(j)

1 + yj(yj − r21xi+1 − r22yj)

−λ1xi+1 − λ2yj+1 ≥1− βh

. (9.26)

Suppose for the moment that xk, k ∈ IN and yl, l ∈ IN are fixed. Then expression (9.26)

only depends on (i, j) through δ(i), δ(j) and the service rates (a1, a2). Define the following

functions

f(a, x, y) :=a1

1 + x(x− r11x− r12y)+

a2

1 + y(y − r21x− r22y)− λ1x− λ2y.

f1(a1, x, y) :=a1

1 + x(x− r11x− r12y) − λ1x− λ2y.

f2(a2, x, y) :=a2

1 + y(y − r21x− r22y)− λ1x− λ2y.

f corresponds to the case that both i and j are positive. Similarly f1(a1, x, y) and

f2(a2, x, y) correspond to only i and only j respectively being positive. If both components

are 0, the state is (0,0), and this can be put into the finite set M .

Consequently, constant sequences xkk, yll in (3.2) only exist iff there is a solution

(x, y), with x, y > 0, such that f(x, y), f1(x, y), f2(x, y) > 0. Therefore it is useful to

study the functions f , f1 and f2 in more detail. We use an analogous procedure to section

9.2. Define the vector d = (d1, d2) := (x, y)(I − RT ), then (x, y) = d∑n (RT )

n. It is

straightforward to prove the following lemma (cf. the proof of Lemma 4.1 in Hordijk &

Spieksma [1989b]).

Lemma 9.1

i) If d1 > 0, d2 = 0, thena1

1 + x> γ1 ⇐⇒ f1(a1, x, y) = f(a, x, y) > 0, ∀ a2 ∈ IR+

ii) If d1 = 0, d2 > 0, thena2

1 + y> γ2 ⇐⇒ f2(a2, x, y) = f(a, x, y) > 0, ∀ a1 ∈ IR+

iii) If d1 > 0, d2 > 0, thena1

1 + x> γ1

a2

1 + y> γ2

=⇒ f(a, x, y) > 0.

Page 167: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Applications 157

Proof: Rewrite f as

f(a, x, y)= d

(a1/1 + x

a2/1 + y

)− (x, y)

(λ1

λ2

)= d

(a1/1 + x

a2/1 + y

)− d

∞∑k=0

(RT )k(λ1

λ2

)= d

((a1/1 + x)− γ1

(a2/1 + y)− γ2

)= d1

( a1

1 + x− γ1

)+ d2

( a2

1 + y− γ2

), (9.27)

where the expression for (x, y) as a function of d is used to get the second equality, and

the definition of the throughput for the third. The assertion of the lemma follows directly

from (9.27).

By choosing d1 and d2 sufficiently small, we can always achieve a1/(1 + x) > γ1 and

a2/(1 + y) > γ2. From the lemma it is evident, that solutions (x, y) exist, for which either

f and f1 or f and f2 are positive, if a1 ≥ a1 and a2 ≥ a2. The example below shows,

that it is not generally possible to find (x, y) ≥ (0, 0), such that the three functions are

simultaneously positive.

Example: λ1 = 12 , a1 = 3, λ2 = 3 a2 = 4, R =

(0 10 0

). In this case

f1(3, x, y) =3

1 + x(x− y)− 1

2x− 3y ⇒ y > 12x

f2(4, x, y) =4

1 + yy − 1

2x− 3y > 0 ⇒ 512x > y.

No (x, y) satisfies these inequalities simultaneously, so that we have to resort to a vector

µ with structure (3.2).

Mark, that (x, y) is non-negative if d = (x, y)(I − R) is non-negative. Choose positive

d∗1 and d∗2 and let (x∗, y∗) = d∗∑n (RT )

n. We may assume, that a1/(1 + x∗) > γ1 and

a2/(1 + y∗) > γ2, otherwise we divide d∗ by a large enough constant to achieve this.

Furthermore, let x∗1 and y∗1 be solutions to

x∗1 := r11x∗1 + r12y

y∗1 := r21x∗ + r22y

∗1 .

Observe that f(a, x, y) is positive for all (x, y) ∈ R = (x, y) | x = x∗, y ∈ [y∗1 , y∗] or x ∈

[x∗1, x∗], y = y∗. As R is compact, f(a, x, y) is bounded away from 0 on R, say by

a constant m. Then f(a, x, y) ≥ f(a, x, y) > 0 for (x, y) ∈ R if a1 ≥ a1 and a2 ≥a2. Consequently, , f1(a, x∗, y∗1) = f(a1, x

∗, y∗1) > 0 for a1 ≥ a1 and f2(a, x∗1, y∗) =

f(a2, x∗1, y∗) > 0 for a2 > a2.

Let δ > 0 be such, that δ(a2r21(1+y∗)−1 +λ1

)≤ m/2 and δ

(a1r12(1+x∗)−1 +λ2

)≤ m/2.

As appropriate sequences in (3.2) we choose

xk =

x∗1, k ≤ Ix∗1 + (k − I)δ, I ≤ k ≤ I +

⌊x∗ − x∗1δ

⌋=: I∗

x∗ otherwise,

Page 168: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

158 Part II, Chapter 9

yl =

y∗1 , l ≤ Jy∗1 + (l − J)δ, J ≤ l ≤ J +

⌊y∗ − y∗1δ

⌋=: J∗

y∗ otherwise.

A short explanation of how we came up with this choice, is due now. As has already been

mentioned, problems are caused by the boundary of the state space. So, first we ensure a

proper bounding vector for all states with many customers in both centres (by choosing

x∗ and y∗). Then y∗ is adjusted to give a proper bounding vector if centre 2 is empty, and

the same way x∗ is adjusted for centre 1. Thus we obtain y∗1 and x∗1.

No condition is required on the relation between γ2 and the service rate in centre 2 for

states with many customers in centre 1 but few in centre 2. Possible problems can be

avoided if we take care that contributions due to a service completion in centre 2 cancel

in expression (9.26). This is effectuated by the choice of y∗1 in these states. We proceed

analogously for the corresponding states with the roles of centres 1 and 2 interchanged.

The gap between x∗1 and x∗ and between y∗1 and y∗ is filled by slowly increasing x∗1 and

y∗1 such that the corresponding function values remain strictly positive.

This proves property µ−UGR(M), for µ as above, the finite set M = (i, j) | i ≤ I∗, j ≤ J∗and constant c in (3.2) equal to 1. Indeed, checking expression (9.26) we obtain

v(a, i, j) =

f(a, x∗, y∗) ≥ f(a, x∗, y∗) ≥ m, i ≥ I∗, j ≥ J∗

f(a, x∗, yj)− δ( a1r12

1 + x∗+ λ2

)≥

≥ f(a, x∗, yj)− δ( a1r12

1 + x∗+ λ2

)≥ m

2, i ≥ I∗, J ≤ j < J∗

f1(a1, x∗, yj) ≥ f1(a1, x

∗, y∗1) ≥ m, i ≥ I∗, 0 ≤ j < J

f(a, xi, y∗)− δ

( a2r21

1 + y∗+ λ1

)≥

≥ f(a, xi, y∗)− δ

( a2r21

1 + y∗+ λ1

)≥ m

2, I ≤ i < I∗, j ≥ J∗

f2(a2, xi, y∗) ≥ f2(a2, x

∗1, y∗) ≥ m, 0 ≤ i < I, j ≥ J∗.

By choosing β such that (1− β)/h ≤ m/2 we verify

(9.24)

≤∑

(k,l) P(i,j)a(k,l)µ(k,l)

µ(i,j)≤ β for i ≥ I∗ or j ≥ J∗

= 0 for i ≤ I∗ − 1 and j ≤ J∗ − 1.

This proves the following theorem.

Theorem 9.3 Assume the uniform ergodicity and the uniformization conditions. The

two centre open Jackson network with control of the service rates is uniformly strongly

recurrent both in discrete and continuous time for a product form bounding vector µ of

structure (3.2).

Page 169: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

PART III

LINEAR PROGRAMMING AND

CONSTRAINED OPTIMAL CONTROL OF QUEUES

The effect of maximising throughput without constraints on the waiting time

Page 170: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties
Page 171: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Introduction of Linear Programs and Results 161

CHAPTER TEN

Introduction of Linear Programs and Results.

1. Linear programs.

Quite a number of papers have studied linear programs for solving stochastic dynamic

programming problems for the (constrained) average optimality criterion in MDC’s with

finite state and action spaces. The analysis regards the feasible solutions of the dual

program, which correspond to the long range expected state-action frequencies generated

by arbitrary policies, and the relation between optimal solutions to the dual program and

optimal policies. If multichain structures are allowed, this is quite a complicated problem,

which was solved in two papers by Hordijk & Kallenberg [1979], [1984]. For discussions

on earlier work in this field we refer to these papers and Kallenberg [1980].

Recent work related to the analysis in Part III can be found in papers by Ross [1989],

Ross & Varadarajan [1989], Borkar [1988], Altman & Shwartz [1989], [1990] and refer-

ences therein. Using the linear programming formulation Ross [1989] shows optimality

of stationary policies with degree of randomization of at most K, if there are K average

expected cost constraints. This parallels a similar result for K = 1 in Hordijk & Spieksma

[1989a]. Ross & Varadarajan [1989] study constrained optimization problems with cost

constraints on the state-action frequencies and use a parametric linear programming al-

gorithm to construct nearly optimal stationary policies.

Extensions to denumerable states Markov decision processes are contained in the papers

by Borkar, Altman & Shwartz. Altman & Shwartz focus on the polyhedron of long range

expected state- action frequencies, whereas Borkar studies the probability distributions of

the sample paths under any policy.

The linear program is not of finite dimension, if the state space is denumerable. In this case

however, we do obtain a finite dimensional linear programming problem, if we consider

the polyhedron of expected rewards as a function of the policy space. Gelenbe & Mitrani

[1980], Chapter 6, study a single-server queue with multiple, say K, customer classes. The

performance of the system is measured through aK-dimensional vector of expected average

response times. They study the polyhedron of achievable expected average response times,

and thus obtain optimality of priority rules for average expected costs that are a linear

function of the response times. Federgruen & Groenevelt [1988] extend these results to

M/G/s-type queueing systems with multiple customer classes.

Page 172: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

162 Part III, Chapter 10

In Part III we will ourselves restrict to MDC’s with finite action spaces and only one class

under any policy, so we assume the following condition.

Assumption 10.1:

i) A(i) is finite for i ∈ E.

ii) The MC generated by a stationary policy π∞ is unichain, i.e there is one positive

recurrent class and a possibly empty set of transient states T (π). Besides, for E(π)

the positive recurrent class we require FiE(π)(π) = 1, ∀ i ∈ E, i.e. the process is

absorbed with probability 1 into E(π), for any initial state i.

A first topic of Part III is the derivation of conditions in Chapter 11, for which the results

for finite MDC’s can be extended to denumerable MDC’s. Let us review the existing

results and assume the state space to be finite.

The average optimality equation can be solved by the linear program

ming : g + vi −∑j∈E

Piajvj ≥ ria, (i, a) ∈ E ×A, (10.1)

where E × A is shorthand notation for the set (i, a) | a ∈ A(i), i ∈ E. For convenience

we identify E with a subset of IN0 and A(i) with 1, . . . , |A(i)|. Instead of (10.1) we solve

the dual program

(LP) max

(i,a)∈E×A

riaxia

∑(i,a)∈E×A

(δij − Piaj)xia = 0, j ∈ E

∑(i,a)∈E×A

xia = 1

xia ≥ 0, (i, a) ∈ E ×A

,

and we denote by X the convex polytope of feasible solutions. As the state space is finite,

X is a bounded polytope.

From the characterization of X it is not remarkable, that X coincides with the set of

stationary distributions on states and actions of the MC’s generated by the stationary

policies. We will discuss this in more detail in section 10.2 and continue our summary of

known results first.

For any policy R ∈ C and any initial state i ∈ E, xN (i, R) denotes the vector of expected

state-action frequencies in N + 1 time-periods, i.e.

xNja(i, R) =1

N + 1IEi,R

( N∑n=0

1X(n)=j,Y (n)=a

), ∀ (j, a) ∈ E ×A.

Then xN (i, R) is a probability measure on E × A. We write X(i, R) for the set of

vector limit-points of the sequence xN (i, R) | N ∈ IN0. X(i, R) 6= ∅, as the vectors

Page 173: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Introduction of Linear Programs and Results 163

xN (i, R), N ∈ IN0, have nonnegative components that are bounded by 1. Observe first

that |X(i, π∞)| = 1 for any π∞ ∈ C(S). Moreover, the stationary matrix Π(π) of the MC

generated by the stationary policy π∞ has equal rows, when Assumption 10.1ii) holds (cf.

Lemma 2.1). As it is easily derived that

xja(i, π∞) = Πij(π)πja, (10.2)

we may suppress the dependence on i in the notation by writing x(π) instead of x(i, π∞).

We introduce the following notation,

Li := x(i, R) ∈ X(i, R) | R ∈ CLi(M) := x(i, R) ∈ X(i, R) | R ∈ C(M)Li(S) := x(i, π∞) | π∞ ∈ C(S)Li(D) := x(i, f∞) | f∞ ∈ C(D).

Remark that the special structure of the MDC’s we analyse, implies that Li(S) and Li(D)

are independent of i. However, we prefer to give the general definitions. In order to match

the terminology in the work by Derman and Hordijk & Kallenberg we write A for the

closed, convex hull of set A (cf. Royden [1968], p. 207) throughout Part III.

The following result is known for finite MDC’s (cf. Derman [1970], pp. 80, 93-95)

Proposition 10.1 Let E be finite.

i) Li(S) = X.

ii) Li = Li(M) = Li(S) = Li(D).

iii) For x∗ an optimal solution of (LP), π∗∞ ∈ C(S) with

π∗ja =

x∗ja∑

a∈A(j) x∗ja

, if∑

a∈A(j)

x∗ja > 0

arbitrary with∑

a∈A(j)

π∗ja = 1, if∑

a∈A(j)

x∗ja = 0,(10.3)

is an average optimal policy. f∞ with f(j) ∈ a | x∗ja > 0 is a deterministic average

optimal policy.

As has already been mentioned, the (LP) is a finite dimensional linear program, since the

state space is finite. Because of Proposition 10.1i) X 6= ∅, so that there is an optimal

solution x∗. Since X and Li(S) are identical, the existence of stationary optimal polcies

is guaranteed. If the state space is denumerable, the (LP) is infinite dimensional and the

existence of feasible solutions alone is not sufficient to guarantee the existence of optimal

solutions. This will be illustrated by an analysis of the Fisher & Ross Counterexample

[1968] in section 11.4.

Recall Remark 5.1, p. 94. We state the complete result derived by Strauch & Veinott (cf.

Derman [1970], p. 91-93) for finite MDC’s and later extended by Hordijk [1974], Theorem

13.2 to denumerable MDC’s. No further assumptions on the MC structure are required.

Page 174: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

164 Part III, Chapter 10

Proposition 10.2 Consider a denumerable MDC. For any sequence of policies R1, R2, . . .,

any distribution α on IN and any i ∈ E, there is a Markov policy RM such that∞∑k=1

αkIPi,Rk(X(n) = j, Y (n) = a) = IPi,RM(X(n) = j, Y (n) = a), ∀ (j, a) ∈ E×A,n ∈ IN0.

Consequently, the similar relation holds for the corresponding expected state-action fre-

quencies in finite time-periods, hence for the vector limit-points.

Corollary 10.1 Li = Li(M), for denumerable MDC’s.

For denumerable MDC’s satisfying Assumption 10.1 no new proof techniques are required

to show Proposition 10.1i). Altman & Shwartz [1990] recently showed the validity of

Proposition 10.1ii) under an additional tightness condition (“weak tightness”). This tight-

ness condition is, roughly speaking, tantamount to continuity of the stationary probabil-

ities on states and actions, as a function of the stationary policies. It is essential for the

results, since without this condition Li(S) need not be a closed set and stationary optimal

policies may not exist. This is precisely what happens in the Fisher & Ross Counterexam-

ple, as we will see in section 11.4.

Under suitable conditions on the immediate rewards, guaranteeing upper semi-continuity

of the average expected rewards as a function of the stationary policies, optimal policies

can be shown to exist. Moreover, Proposition 10.1iii) holds. These results are contained

in Altman & Shwartz [1990], but not always explicitly stated. In sections 11.1, 11.2 we

give the proofs under the same assumptions as Altman & Shwartz. For µ-bounded or

nonpositive immediate rewards these conditions are e.g. implied by condition µ− UGR.

We already pointed out that the (LP) is an infinite dimensional program, if the MDC is

denumerable. This fact bars application of standard techniques for linear programming

from linear algebra. Instead our analysis relies heavily on probabilistic arguments. Notice,

that the use of the polyhedra of long range expected state-action frequencies allows a direct

derivation of optimality results through Proposition 10.1, whereas solutions to the average

optimality equation need not exist.

Consider again the characterization in the assertion of Proposition 10.1 of the polyhedra of

long range expected state-action frequencies produced by the various sets of policies. For

any criterion that only involves these frequencies, it is a useful tool to derive optimality

results. In particular we will study the constrained optimization problem as the second

topic of Part III.

Let c be a vector of immediate costs on E × A. Denote by R(i, R), C(i, R) the average

expected reward and average expected cost under policy R, when the initial state is i. The

average expected cost is defined through the “limsup”, i.e.

C(i,R) := lim supN→∞

1

N + 1IEi,R

( N∑n=0

cX(n),Y (n)

).

Page 175: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Introduction of Linear Programs and Results 165

Under the conditions in Chapter 11 R(i, π∞) =∑

(j,a) rjaxja(π) for a stationary policy

π∞, as is the case for finite MDC’s. A similar expression holds for C(i, π∞). Hence we

write R(π) and C(π) instead. The constrained optimization problem is defined as

maxR(i, R) | R ∈ C : C(i, R) ≤ α. (10.4)

Our purpose is the show the existence of optimal policies and to determine which classes

of policies contain an optimal policy for the problem for denumerable MDC’s. Hordijk &

Kallenberg [1984] use the following linear programming problem to obtain results for the

finite state space model.

(CLP) max

(i,a)∈E×A

riaxia

∑(i,a)∈E×A

(δij − Piaj)xia = 0, j ∈ E

∑(i,a)∈E×A

xia = 1

∑(i,a)∈E×A

ciaxia ≤ α

xia ≥ 0, (i, a) ∈ E ×A

.

Let XC denote the convex polytope of feasible solutions, then XC=x ∈ X |∑

(i,a) ciaxia≤α. Furthermore, SC := π ∈ S | C(π) ≤ α and Li(SC) := x(π) | π ∈ SC. The

following proposition is essentially contained in Hordijk & Kallenberg [1984].

Proposition 10.3 Let the state space be finite.

i) Li(SC) = XC .

ii) If x∗ is optimal for the (CLP), then π∗∞ defined through (10.3) is an average optimal

policy for the constrained problem (10.4).

The issue of the existence of optimal solutions to the (CLP) obviously need not be raised,

since the (CLP) is a finite dimensional linear programming problem and XC is bounded.

Extending the result to denumerable MDC’s we have to ensure that SC is closed in S.

A sufficient condition to obtain this, is a lower semi-continuity property of the expected

average costs C(π) as a function on SC . Together with the conditions that yield the

conclusions of Proposition 10.1, this establishes both the existence of optimal solutions

to the (CLP) and the assertion of Proposition 10.3 in section 11.3. A similar result is

contained in the paper by Altman & Shwartz [1990] and we have adopted several of their

proof techniques.

Using the linear program it is straightforward to show for finite MDC’s, that an extreme

point (cf. Royden [1968], p. 207) of XC corresponds to either a deterministic policy with

expected average cost less or equal to α or a stationary policy that randomizes in at most

one state between two actions, such that the expected average cost equals α. We will give

the proof in Chapter 12, but make the set of policies under consideration more explicit

Page 176: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

166 Part III, Chapter 10

below. Let ND be the set of “nearly deterministic” decision rules, in the following sense

ND = f ∈ F | C(f) ≤ α ∪

π ∈ S ∃i ∈ E, f1, f2 ∈ F with

πjf1(j) = 1, f1(j) = f2(j), j 6= i

πif1(i) + πf2(i) = 1

C(f1) > α > C(f2), C(π) = α

.

Determination of the extreme points of XC establishes the following fact.

Proposition 10.4 Let the state space be finite. Li(ND) = XC ,

with Li(ND) = x(π) | π ∈ ND. Section 11.3 shows the result of Proposition 10.4 for

denumerable MDC’s under the same conditions that yield the assertion of Proposition

10.3, thus proving that an optimal policy within the smaller class exists. As far as we

know the result for the case of a denumerable state space is new.

This section ends with an overview of the contents of Part III. The next section shows some

useful properties of the stationary policies with respect to the corresponding stationary

distributions on states and actions, amongst which is the Key lemma. It states, that

the stationary distribution on states and actions is a convex combination of stationary

distributions for stationary policies that randomize in one state less. In this sense the

result might be called a splitting procedure. We point out that a similar formula has been

derived in Altman & Shwartz [1989] for the analysis of policy time sharing policies for

adaptive control. As the expected average reward R(π) under a stationary policy is a

linear function of x(π) under the conditions of Chapter 11, the Key lemma implies that

the expected reward is a convex combination of the expexted rewards under the “splitting”

policies as well.

Chapter 11 extends the optimization results we discussed in this section to denumerable

state spaces. Sections 11.1 and 11.2 prove the conclusion of Proposition 10.1. Section

11.3 first derives a more complex splitting procedure than the Key lemma, to show the

conclusion of Proposition 10.4. The desired optimality results and Proposition 10.3 then

easily follow, similarly to the proofs in sections 11.1, 11.2. As already mentioned, section

11.4 constructs a policy for the Fisher & Ross Counterexample for which the condition

used in the foregoing sections fails. The construction is similar to the construction of a

nonstationary optimal policy for the Counterexample in Hordijk & Tijms [1970].

Chapter 12 addresses the third topic of Part III. First it gives the arguments for Proposition

10.4 in the case of finite MDC’s. Obviously, these are not necessary since the result is

contained in section 11.2. However, we prefer to give it as a contrast to the proof techniques

used for denumerable MDC’s.

The chapter studies one-dimensional queueing systems, where control is exercised by re-

stricting arrivals. Due to the structure of the model, the weights of the convex combina-

tions in the Key lemma have simple expressions, which are stated in the Key theorem. By

Page 177: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Introduction of Linear Programs and Results 167

virtue of Proposition 10.4, the search of optimal policies may be restricted to policies that

randomize in at most one state. This fact is used to establish conditions for the existence

of threshold and thinning optimal policies. A detailed overview of related literature on

constrained admission control models is given in Chapter 12 itself, as Table 5. In fact,

these models illustrate the limitations of the analysis of constrained optimization in de-

numerable MDC’s. Formulated as denumerable MDC’s they generally do not satisfy the

conditions we use in Chapter 11. For threshold optimality a technique used by Ridder

[1987] can be applied to show that indeed the desired optimality results hold in the case

of a denumerable state space.

2. Some results on stationary policies.

Due to the importance of the stationary policies for the analysis of feasible solutions to

the linear programs (LP) and (CLP) we will study these more closely. We consider denu-

merable MDC’s that satisfy Assumption 10.1. All quantities used in Part II as (matrix)

functions on F can be extended as matrix functions on S, e.g. P (π) is the transition

matrix of the MC generated by policy π∞ ∈ C(S), with

Pij(π) =∑a∈A(i)

Piajπia, i, j ∈ E.

We observe that S is a compact set. Indeed, define the following metric d on S

d(π, π′) =∑

(i,a)∈E×A

|πia − π′ia|2−(i+a).

Since by Assumption 10.1i) A(i) is finite, the set M(A(i)

)of probability measures on

A(i) is a compact, separable space with respect to the metric on M(A(i)

)induced by d.

According to a theorem by Tychonov (cf. Kelley [1955] p.143) S is a compact, separable

metric space with respect to the metric d.

A measure x on E is said to be P (π)-excessive if x ≥ xP (π). The following result is

important to determine the relation between Li(S) and X.

Lemma 10.1 Upto a multiplicative constant there is a unique P (π)-excessive measure x

on E, such that x = xP (π).

The proof is a straightforward extension of the proof of Proposition 6.4 in Kemeny, Snell

and Knapp [1976], and can be found as such in the paper by Altman & Shwartz ([1990]).

Using relation (10.2), this implies uniqueness of the stationary distribution on states and

actions. For convenience we write xj(π) =∑a∈A(j) xja(π), for j ∈ E. Then

xja(π) = xj(π)πja. (10.5)

By virtue of Lemma 2.1 we also conclude that Assumption 10.1i) implies E(π) to be the

set of essential states in the MC generated by π∞. This has a useful consequence for the

comparison of stationary policies. For π ∈ S, i ∈ E choose π ∈ S such, that πja = πja for

a ∈ A(j), j 6= i, i.e. π and π only differ in state i.

Page 178: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

168 Part III, Chapter 10

Lemma 10.2

i ∈ E(π) ⇐⇒ i ∈ E(π).

Proof: By Assumption 10.1ii) state i is accessible from any other state in the MC generated

by π∞, if it is positive recurrent. Thus it is accessible from any other state under policy

π∞, so that it must be essential, hence positive recurrent in the MC generated by π∞.

The reversed relation is proved similarly.

So, if state i ∈ E is transient for some stationary policy π∞, any stationary policy π∞ with

πia = 1 for some a ∈ A(i) and πj• = πj• for j 6= i, has the same stationary distribution

on states and actions as π∞. We will refer to this as elimination of randomized decisions.

For a positive recurrent state i ∈ E the splitting procedure from the Key lemma allows us

to circumvent proof techniques used for finite MDC’s that do not extend to denumerable

MDC’s.

Key lemma Let π ∈ S. Suppose that i ∈ E(π) and that π randomizes in state i between

actions a(1), . . . , a(l). Denote by πk the decision rule that chooses action a(k) in state i

deterministically, but otherwise equals π, k = 1, . . . , l. For T the random variable that

denotes the recurrence time to state i, $x_{ja}(\pi) = \sum_{k=1}^{l} \lambda_k x_{ja}(\pi_k)$, $(j,a) \in E \times A$, with

$$\lambda_k = \frac{\pi_{ia(k)}\, \mathbb{E}_{i,\pi_k^\infty} T}{\sum_{m=1}^{l} \pi_{ia(m)}\, \mathbb{E}_{i,\pi_m^\infty} T}, \qquad k = 1, \ldots, l. \tag{10.6}$$

Proof: Apply the Ergodic Theorem (cf. Ross [1970], Theorem 3.16). For immediate

rewards equal to 1 any time the process is in state j and action a is chosen, and 0 otherwise

$$x_{ja}(\pi) = \lim_{N\to\infty} \frac{1}{N+1}\,\mathbb{E}_{i,\pi^\infty}\Bigl(\sum_{n=0}^{N} 1_{\{X(n)=j,\,Y(n)=a\}}\Bigr) = \frac{\mathbb{E}_{i,\pi^\infty}\bigl(\sum_{n=1}^{T} 1_{\{X(n)=j,\,Y(n)=a\}}\bigr)}{\mathbb{E}_{i,\pi^\infty} T} = \frac{\sum_{k=1}^{l} \pi_{ia(k)}\,\mathbb{E}_{i,\pi_k^\infty}\bigl(\sum_{n=1}^{T} 1_{\{X(n)=j,\,Y(n)=a\}}\bigr)}{\sum_{k=1}^{l} \pi_{ia(k)}\,\mathbb{E}_{i,\pi_k^\infty} T}. \tag{10.7}$$

A second application of the same theorem yields

$$\mathbb{E}_{i,\pi_k^\infty}\Bigl(\sum_{n=1}^{T} 1_{\{X(n)=j,\,Y(n)=a\}}\Bigr) = x_{ja}(\pi_k)\,\mathbb{E}_{i,\pi_k^\infty} T. \tag{10.8}$$

Combination of (10.7) and (10.8) completes the proof.
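The weights (10.6) are straightforward to evaluate once the expected recurrence times to state i under the deterministic modifications π_k are available. The following added sketch, with hypothetical randomization probabilities and recurrence times, computes the λ_k.

```python
# Sketch of (10.6): splitting weights for a decision rule that randomizes in
# state i over actions a(1),...,a(l) with probabilities pi_ia, given the
# expected recurrence times E_{i,pi_k}(T) under the deterministic rules pi_k.

def splitting_weights(pi_ia, exp_rec_times):
    num = [p * t for p, t in zip(pi_ia, exp_rec_times)]
    total = sum(num)
    return [x / total for x in num]

# hypothetical data: randomization (0.3, 0.7) and recurrence times (5.0, 2.0)
print(splitting_weights([0.3, 0.7], [5.0, 2.0]))  # approximately [0.517, 0.483]
```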


CHAPTER ELEVEN

Optimization and Constrained Optimization.

1. Polyhedron of state-action frequencies.

In this section we will show the relation between the sets of vector limit-points of the

average expected state-action frequencies produced by the various sets of policies, which

is given in Proposition 10.1i) and ii). Throughout the whole chapter Assumption 10.1 is

supposed to hold. The state space is assumed to be denumerable.

To make the "weak tightness" condition that will be introduced later more plausible, we define the following polyhedron.

$$X^* = \Bigl\{\, x = (x_{ia})_{(i,a)\in E\times A} \;\Bigm|\; \sum_{a\in A(j)} x_{ja} \ge \sum_{(i,a)\in E\times A} x_{ia} P_{iaj},\ j \in E;\ \sum_{(i,a)\in E\times A} x_{ia} \le 1;\ x_{ia} \ge 0,\ (i,a)\in E\times A \,\Bigr\}.$$

So, X∗ is a collection of excessive, possibly defective probability measures on E ×A, and

X is a collection of invariant probability measures.
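For a finite truncation, membership of a candidate vector x in X* can be checked mechanically. The sketch below is an added illustration with hypothetical data; it verifies nonnegativity, the (sub)probability condition and the excessiveness inequalities.

```python
import numpy as np

def in_X_star(x, P, tol=1e-9):
    """x[i,a] candidate; P[i,a,j] transition law. Checks the defining constraints of X*."""
    if (x < -tol).any():                        # x_ia >= 0
        return False
    if x.sum() > 1.0 + tol:                     # sum x_ia <= 1
        return False
    inflow = np.einsum('ia,iaj->j', x, P)       # sum_{(i,a)} x_ia P_iaj
    return bool((x.sum(axis=1) >= inflow - tol).all())

# hypothetical 2-state, 2-action example
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
x = np.array([[0.3, 0.1], [0.2, 0.4]])
print(in_X_star(x, P))
```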

In this context it is convenient to use the notion of completeness (cf. Gelenbe & Mitrani

[1980], p. 188, Altman & Shwartz [1990]).

Definition 11.1: A class C∗ ⊂ C of policies is said to be complete if |X(i, R)| = 1, ∀ R ∈ C∗, and Li(C∗) = Li, for i ∈ E.

The following result is easily established. The proof of the first assertion uses arguments

from the proof of Theorem 3.2 in the paper by Altman & Shwartz, whereas the second

assertion is shown similarly to the same assertion for finite MDC’s, which can be found in

Derman [1970].

Theorem 11.1

i) Li ⊂ X∗.
ii) The conclusion of Proposition 10.1i) holds, i.e. Li(S) = X.

Proof: i) Choose a policy R and a vector limit-point x ∈ X(i, R). By definition there is a


subsequence {n(k)}_{k∈IN} ⊂ IN₀, such that

$$x_{ja} = \lim_{k\to\infty} x^{n(k)}_{ja}(i, R), \qquad \forall\, (j,a) \in E \times A.$$

On the other hand, conditioning on (X(n − 1), Y (n − 1)) we obtain by virtue of the

Fubini-Tonelli theorem for j ∈ E, n ∈ IN

$$\mathbb{E}_{i,R}\, 1_{\{X(n)=j\}} = \mathbb{E}_{i,R}\Bigl(\mathbb{E}_{i,R}\bigl(1_{\{X(n)=j\}} \mid X(n-1), Y(n-1)\bigr)\Bigr) = \sum_{(l,a)\in E\times A} P_{laj}\, \mathbb{E}_{i,R}\, 1_{\{X(n-1)=l,\, Y(n-1)=a\}}. \tag{11.1}$$

Since A(j) is finite, we may pass the summation under the limit sign, so that

$$\begin{aligned}
\sum_{a\in A(j)} x_{ja} &= \lim_{k\to\infty} \frac{1}{n(k)+1} \sum_{a\in A(j)} \mathbb{E}_{i,R}\Bigl(\sum_{n=1}^{n(k)} 1_{\{X(n)=j,\,Y(n)=a\}}\Bigr) = \lim_{k\to\infty} \frac{1}{n(k)+1}\, \mathbb{E}_{i,R}\Bigl(\sum_{n=1}^{n(k)} 1_{\{X(n)=j\}}\Bigr) \\
&= \lim_{k\to\infty} \sum_{(l,a)\in E\times A} \frac{1}{n(k)+1}\, \mathbb{E}_{i,R}\Bigl(\sum_{n=0}^{n(k)-1} 1_{\{X(n)=l,\,Y(n)=a\}}\Bigr) P_{laj} = \lim_{k\to\infty} \sum_{(l,a)\in E\times A} x^{n(k)}_{la}(i,R)\, P_{laj},
\end{aligned} \tag{11.2}$$

where we use (11.1) for the third equality. Application of Fatou's lemma together with (11.2) yields that

$$\sum_{a\in A(j)} x_{ja} \ge \sum_{(l,a)\in E\times A} \lim_{k\to\infty} x^{n(k)}_{la}(i,R)\, P_{laj} = \sum_{(l,a)\in E\times A} x_{la} P_{laj}. \tag{11.3}$$

Similar arguments lead to

$$\sum_{(j,a)\in E\times A} x_{ja} = \sum_{(j,a)\in E\times A} \lim_{k\to\infty} x^{n(k)}_{ja}(i,R) \le \lim_{k\to\infty} \sum_{(j,a)\in E\times A} x^{n(k)}_{ja}(i,R) = 1. \tag{11.4}$$

Hence x ∈ X∗. Defining the stationary policy π∞ through (10.3) we obtain, that x as a

vector on E, with xj =∑a∈A(j) xja, is P (π)-excessive by (11.3). Moreover, by (11.4) x

is a possibly defective probability measure. By virtue of Lemma 10.1 we conclude that

xj = cxj(π), for some constant c ≤ 1, j ∈ E. Since xja = xjπja = cxj(π)πja = cxja(π),

also

xja = cxja(π), (j, a) ∈ E ×A. (11.5)


ii) The inclusion Li(S) ⊂ X is fairly obvious. Indeed, by x(π) = x(π)P(π) and (10.2),

$$\sum_{a\in A(j)} x_{ja}(\pi) = x_j(\pi) = \sum_{l\in E} x_l(\pi) P_{lj}(\pi) = \sum_{(l,a)\in E\times A} x_l(\pi) P_{laj}\,\pi_{la} = \sum_{(l,a)\in E\times A} x_{la}(\pi) P_{laj}.$$

Let x ∈ X, and proceed similarly as in the proof of i). However, since x is a probability

measure, the constant c equals 1, and thus x = x(π), for π defined through (10.3).

From the proof of Theorem 11.1 it follows that Li = Li(S) iff the set of average expected state-action frequencies {x^N(i, R)}_{N=0}^∞ is relatively compact for any i ∈ E and any R ∈ C. By virtue of Proposition 10.2 we may restrict ourselves to Markov policies. Due to a

theorem by Prohorov (cf. Billingsley [1968], p. 37) this relative compactness property is

equivalent to the following condition.

Weak tightness condition: For any R ∈ C(M) and any i ∈ E, the collection of probability measures {x^N(i, R) | N ∈ IN₀} on E × A is tight.

The condition is satisfied if e.g. µ-UGR holds for some bounding vector µ with µi ≥ 1, i ∈ E (cf. Lemma 6.7). Next we derive a useful and very interesting (partial) characterization

of this weak tightness condition. To this end denote by M(E × A) the set of probability

measures on E ×A, and define the metric d′ on M(E ×A) through

$$d'(x_1, x_2) = \sum_{(i,a)\in E\times A} |x_{1,ia} - x_{2,ia}|\, 2^{-(i+a)}, \qquad \text{for } x_1, x_2 \in M(E\times A).$$

Continuity condition: x(π) is a continuous function in π ∈ S with respect to the metric

d′.

Notice that the metric d′ is consistent with the topology of weak convergence of probability measures on M(E × A). Hence continuity of x(π) with respect to d′ is equivalent to continuity of xja(π), (j, a) ∈ E × A. So, by virtue of Lemma 6.3 (cf. also Hordijk [1974], pp. 82-83), the continuity condition is equivalent to tightness of the set {x(π) | π ∈ S} (under Assumption 10.1).

Theorem 11.2 Weak tightness implies the continuity condition. Conversely, under the

additional assumption that E(π) = E for any π ∈ S, i.e. the state space consists of a

single positive recurrent class for all stationary policies, the continuity condition implies

weak tightness.

Thus, on the one hand, weak tightness is necessary for Li to be equal to Li(S); on the other hand, it is a mild condition that ensures continuity of x(π), and hence compactness of Li(S).


The proof uses arguments from the proof of Theorem 5.1 in Altman & Shwartz’s paper.

The proof of the second assertion is essentially due to Borkar [1986] (cf. the proof of Lemma

7.3 of his paper) and was used originally by Altman & Shwartz to obtain equivalence of

weak tightness and tightness of the stationary probabilities.

Proof of Theorem 11.2: Suppose that the weak tightness condition holds and let {πn}∞n=1 ⊂ S be a sequence with limit π∗, say. First we notice that for any initial distribution α on E the dominated convergence theorem yields for π ∈ S

$$\lim_{N\to\infty} \sum_{i\in E} \alpha_i x^N_{ja}(i, \pi^\infty) = \sum_{i\in E} \lim_{N\to\infty} \alpha_i x^N_{ja}(i, \pi^\infty) = \sum_{i\in E} \alpha_i x_{ja}(\pi) = x_{ja}(\pi), \qquad (j,a)\in E\times A.$$

Consequently

$$\lim_{N\to\infty} d'\Bigl(\sum_{i\in E} \alpha_i x^N(i, \pi^\infty),\, x(\pi)\Bigr) = 0. \tag{11.6}$$

Choose a sequence {εn}∞n=1, with εn ↓ 0 for n tending to infinity, and fix a state i′ ∈ E. Define a policy R as follows: decision rule π1 is used up to and including time n(1), with

$$n(1) := \min\{N \mid d'(x(\pi_1), x^N(i', \pi_1^\infty)) < \varepsilon_1\}.$$

Inductively, decision rule πk is used from time n(k − 1) + 1 up to and including time n(k), with

$$n(k) := \min\{N > n(k-1) \mid d'(x(\pi_k), x^N(i', R_k)) < \varepsilon_k\}, \tag{11.7}$$

where Rk is the policy that coincides with policy R up to time n(k − 1) + 1 and uses decision rule πk ever afterwards. We claim that finiteness of n(k − 1) implies finiteness of n(k). Indeed, this follows from (11.6) for the initial distribution α with αl = IPi′,R(X(n(k − 1)) = l), since

$$x^N_{ja}(i', R_k) = \frac{n(k-1)}{N+1}\, x^{n(k-1)}_{ja}(i', R) + \frac{N+1-n(k-1)}{N+1} \sum_{l\in E} \alpha_l\, x^{N-n(k-1)}_{ja}(l, \pi_k^\infty), \qquad N > n(k-1).$$

By the weak tightness condition the set {x^N(i, R) | N ∈ IN₀} is tight. Thus the subsequence {x^{n(k)}(i′, R) | k ∈ IN} is tight and consequently relatively compact by a theorem of Prohorov. Hence it contains a weakly convergent subsequence {x^{n(k_m)}(i′, R) | m ∈ IN} with limit x∗, say. x∗ is a probability measure on E × A, and d′(x∗, x^{n(k_m)}(i′, R)) → 0, if m tends to infinity. Then

$$d'\bigl(x^*, x(\pi_{k_m})\bigr) \le d'\bigl(x^*, x^{n(k_m)}(i', R)\bigr) + d'\bigl(x^{n(k_m)}(i', R), x(\pi_{k_m})\bigr) \le d'\bigl(x^*, x^{n(k_m)}(i', R)\bigr) + \varepsilon_{k_m},$$

so that

$$\lim_{m\to\infty} d'\bigl(x^*, x(\pi_{k_m})\bigr) = 0. \tag{11.8}$$


We will show that x∗ = x(π∗). Similarly to (10.3) and (11.5) there is a policy π∞, such that x∗ = x(π), with πja = x∗ja / Σa x∗ja if Σa x∗ja > 0. Let j ∈ E(π), i.e. Σa x∗ja > 0. By (11.8) there is an M ∈ IN, such that Σa xja(πkm) > 0 for m ≥ M. Hence,

$$\pi^*_{ja} = \lim_{m\to\infty} \pi_{k_m,ja} = \lim_{m\to\infty} \frac{x_{ja}(\pi_{k_m})}{\sum_{a\in A(j)} x_{ja}(\pi_{k_m})} = \frac{x^*_{ja}}{\sum_{a\in A(j)} x^*_{ja}} = \pi_{ja}.$$

So, π∗ja = πja, for a ∈ A(j), j ∈ E(π). Hence, x(π) solves x(π)P(π∗) = x(π). By virtue of Lemma 10.1 we conclude that x(π∗) = x(π). This establishes continuity of x(π).

We will now assume that E(π) = E, for π ∈ S. Without any further assumption beyond the finiteness of A(i), for i ∈ E, a result by Fisher [1968] implies that

$$\inf_{\pi\in S} x_i(\pi) > 0.$$

Obviously, the continuity condition and the compactness of S are stronger conditions implying the same fact. Fix a state i′ ∈ E and denote by T the random variable associated with the recurrence time to state i′. Let KM := {(j, a) ∈ E × A | j ≥ M}, for M ∈ IN. By virtue of the Ergodic Theorem (cf. Ross [1970], Theorem 3.16) we first observe that

$$\sup_{\pi\in S} \mathbb{E}_{i',\pi^\infty}\Bigl(\sum_{n=0}^{T-1} 1_{\{(X(n),Y(n))\in K_M\}}\Bigr) = \sup_{\pi\in S}\Bigl(\sum_{(j,a)\in K_M} x_{ja}(\pi)\cdot \mathbb{E}_{i',\pi^\infty}T\Bigr) \le \sup_{\pi\in S}\Bigl(\sum_{(j,a)\in K_M} x_{ja}(\pi)\Bigr)\cdot\Bigl(\inf_{\pi\in S} x_{i'}(\pi)\Bigr)^{-1} < \infty, \qquad M \in \mathbb{N}.$$

Thus by the existence of stationary weak nearly optimal policies in positive dynamic

programming (Hordijk [1974], Theorem 13.6)

$$\sup_{R\in C} \mathbb{E}_{i',R}\Bigl(\sum_{n=0}^{T-1} 1_{\{(X(n),Y(n))\in K_M\}}\Bigr) = \sup_{\pi\in S} \mathbb{E}_{i',\pi^\infty}\Bigl(\sum_{n=0}^{T-1} 1_{\{(X(n),Y(n))\in K_M\}}\Bigr) < \infty, \qquad M \in \mathbb{N}.$$

We will now prove the weak tightness. So let R ∈ C(M). For T^l the random variable associated with the l-th visit to state i′ and T^0 = 0,

$$\begin{aligned}
\sum_{(j,a)\in K_M} x^N_{ja}(i', R) &= \frac{1}{N+1}\,\mathbb{E}_{i',R}\Bigl(\sum_{n=0}^{N} 1_{\{(X(n),Y(n))\in K_M\}}\Bigr) \le \frac{1}{N+1}\,\mathbb{E}_{i',R}\Bigl(\sum_{n=0}^{T^{N+1}-1} 1_{\{(X(n),Y(n))\in K_M\}}\Bigr) \\
&= \frac{1}{N+1}\sum_{l=0}^{N} \mathbb{E}_{i',R}\Bigl(\sum_{n=T^l}^{T^{l+1}-1} 1_{\{(X(n),Y(n))\in K_M\}}\Bigr) \le \sup_{\pi\in S}\Bigl(\sum_{(j,a)\in K_M} x_{ja}(\pi)\Bigr)\cdot\Bigl(\inf_{\pi\in S} x_{i'}(\pi)\Bigr)^{-1}, \qquad M \in \mathbb{N}.
\end{aligned} \tag{11.9}$$


Hence, the problem of proving the weak tightness is reduced to proving tightness of the set {x(π) | π ∈ S}. However, this is implied by the continuity condition, and thus for any ε > 0 we can find an M such that the right-hand side of (11.9) is smaller than ε. Since this bound does not depend on the value of N, the proof is completed.

Notice that the proof in fact shows the equivalence of continuity of x(π) on S and tightness of the set of expected state-action frequencies over N time periods, uniformly in the policy and the time N.

Now that we have established the continuity of x(π) as a function of π ∈ S, compactness of S yields compactness of Li(S).

Corollary 11.1 The weak tightness condition implies that Li(S) is compact.

The final theorem of this section completes the relations between the various polyhedra.

Theorem 11.3 Assume the weak tightness condition. Then the assertions of Proposition

10.1i), ii) hold. Consequently, the stationary policies C(S) are complete, and the extreme

points of Li(S) = X are contained in Li(D).

Proof: For the proof of Proposition 10.1i), ii) it only remains to be shown that Li =

Li(S) = Li(D). We first show Li = Li(S). Obviously Li(S) ⊂ Li, so let x ∈ Li. Similarly

to relation (11.5) in the proof of Theorem 11.1 we derive that x = cx(π), for some c ≤ 1,

π ∈ S. Due to the weak tightness condition x is a probability measure, so that c = 1 and

x ∈ Li(S).

Since Li(S) = X, Li(S) is a convex set. Moreover, by Corollary 11.1 Li(S) is compact and

thus Li(D) ⊂ Li(S). Let x(π) ∈ Li(S). Randomized decisions in states that are transient

for π∞, are eliminated by replacing these by any deterministically chosen decision. So

without loss of generality we may assume that π only randomizes in states that are positive

recurrent for π∞.

If π randomizes in finitely many states, a repeated application of the splitting procedure in the Key lemma and elimination of randomized decisions by turns yields that x(π) = Σ_{l=1}^{k} λl x(fl), for some k ∈ IN, λ1, . . . , λk with Σl λl = 1 and f1, . . . , fk with fl(j) ∈ {a ∈ A(j) | πja > 0}, ∀ j ∈ E. Thus x(π) ∈ Li(D).

Suppose that π randomizes in infinitely many states that are positive recurrent for π∞. Define a sequence {πn}∞n=1 ⊂ S with πn,ia = πia, for a ∈ A(i), i ≤ n, and πn,ia = 1 for some a ∈ A(i), if i > n, n ∈ IN. Then πn randomizes in at most n + 1 states and thus by the previous arguments x(πn) ∈ Li(D). Moreover, limn→∞ πn = π, hence by Theorem 11.2 limn→∞ x(πn) = x(π). Since Li(D) is closed, we conclude that x(π) ∈ Li(D).

We finally invoke the Krein-Milman theorem (cf. Royden [1968]), which states that a

compact, convex set is the closed convex hull of its extreme points. By virtue of the

Key lemma an extreme point of Li(S) must correspond to a deterministic policy, since

otherwise the splitting procedure can be applied.


2. Optimization.

2.1. General assumptions.

Throughout section 11.2 we assume that the weak tightness condition holds in addition

to Assumption 10.1. We derive conditions that yield the existence of an optimal solution

to (LP), and show that the maximum is attained in an extreme point of X. Thus the

existence of deterministic optimal solutions is guaranteed.

In Chapter 10 we mentioned that upper semi-continuity of the average expected rewards is sufficient. Two conditions implying this are the following:

i) weakly uniformly integrable rewards, i.e. {r_{X(n),Y(n)}}_{n=0}^∞ is uniformly integrable for any policy R ∈ C(M) and any initial state i ∈ E.

ii) upper-s-bounded immediate rewards, i.e. ria ≤ 0, (i, a) ∈ E × A. Furthermore, if R(i′, π∞) = −∞ for an i′ ∈ T(π), then R(i, π∞) = −∞ for all i ∈ E(π).

Notice the difference between i) and uniform integrability as used in e.g. Chapter 6. The

adjective “weakly” is added to denote that the uniform integrability property is merely

required for each policy separately. Observe that by Proposition 10.2 weak uniform integrability of the rewards with respect to Markov policies implies the same property for arbitrary policies. For the verification of i) we can use condition µ-UGR, if the immediate rewards are µ-bounded (cf. Lemma 6.7). Condition ii) involves negative dynamic programming. The

extra condition on the value of the average expected rewards is necessary to ensure that

R(i, π∞) is independent of the initial state, so that the stationary policies are sufficient.

This is reflected by the ‘s’ in the terminology. We need it for the equivalence of the (LP)

and the stochastic dynamic programming problem.

The notion of sufficiency of sets of policies plays an important role in the analysis of

Altman & Shwartz [1990]. We introduce it below.

Definition 11.2: A class C∗ ⊂ C of policies is sufficient for an optimization problem if for any i ∈ E, R ∈ C, there is an R∗ ∈ C∗, such that R(i, R∗) ≥ R(i, R).

2.2. Weakly uniformly integrable rewards.

We first assume the rewards to be weakly uniformly integrable. Hence, for any ε > 0 and

R ∈ C, i ∈ E, there is a finite number K(ε), such that∑(j,a):|rja|≥K(ε)

|rja|IPi,R(X(n) = j, Y (n) = a) < ε, n ∈ IN0.

Thus IEi,R|rX(n),Y (n)| ≤ K(ε) + ε is finite and uniformly bounded in n ∈ IN0. Moreover,

$$R(i, R) = \liminf_{N\to\infty} \frac{1}{N+1}\, \mathbb{E}_{i,R}\Bigl(\sum_{n=0}^{N} r_{X(n),Y(n)}\Bigr) = \liminf_{N\to\infty} \sum_{(j,a)\in E\times A} r_{ja}\, x^N_{ja}(i, R).$$


Choose a subsequence {N(k)}∞k=1 of IN₀, such that

$$R(i, R) = \lim_{k\to\infty} \sum_{(j,a)\in E\times A} r_{ja}\, x^{N(k)}_{ja}(i, R). \tag{11.10}$$

By the weak tightness condition {x^{N(k)}(i, R)}_k contains a weakly convergent subsequence, which we denote by {x^{N(k)}(i, R)}_k for notational simplicity, with weak limit x∗, say. Then, by virtue of Theorem 5.4 from Billingsley [1968] we may pass the limit under the summation sign in (11.10), so that

$$R(i, R) = \sum_{(j,a)\in E\times A} \lim_{k\to\infty} r_{ja}\, x^{N(k)}_{ja}(i, R) = \sum_{(j,a)\in E\times A} r_{ja}\, x^*_{ja}. \tag{11.11}$$

The first equality is also straightforward to prove without invoking this theorem. If R = π∞ for π ∈ S, then any subsequence of {x^N(i, π∞)}_N has the same weak limit x(π), hence

$$R(i, \pi^\infty) = R(\pi) = \sum_{(j,a)\in E\times A} r_{ja}\, x_{ja}(\pi), \tag{11.12}$$

as was already mentioned in Chapter 10. By (11.12) and the identity X = Li(S) we conclude that there is an optimal solution to the (LP) iff there is an optimal solution to the problem max{R(π) | x(π) ∈ Li(S)}. Indeed, with r(X) = {Σ_{(j,a)∈E×A} rja xja | x ∈ X}, the following lemma is established.

Lemma 11.1 r(X) = {R(π) | π ∈ S} = {R(i, R) | R ∈ C}, so that C(S) is sufficient for the optimization problem.
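To make Lemma 11.1 concrete for a finite MDC, the following added sketch solves a state-action frequency linear program numerically. It assumes, as an illustration only, that the (LP) takes the usual form max{Σ_{(j,a)} rja xja | x ∈ X}, with X described by the invariance equations, normalization and nonnegativity; the transition and reward data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical finite MDC: 2 states, 2 actions
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])      # P[i,a,j]
r = np.array([[1.0, 0.0], [4.0, 2.0]])        # r[i,a]

nS, nA = r.shape
idx = lambda i, a: i * nA + a

# equality constraints: invariance  sum_a x_ja - sum_{i,a} x_ia P_iaj = 0
A_eq = np.zeros((nS + 1, nS * nA))
for j in range(nS):
    for a in range(nA):
        A_eq[j, idx(j, a)] += 1.0
    for i in range(nS):
        for a in range(nA):
            A_eq[j, idx(i, a)] -= P[i, a, j]
A_eq[nS, :] = 1.0                              # normalization sum x_ia = 1
b_eq = np.zeros(nS + 1); b_eq[nS] = 1.0

res = linprog(c=-r.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(nS, nA)
print(x, -res.fun)    # optimal state-action frequencies and optimal value
```

An optimal basic solution of such a program corresponds to a stationary policy, in line with the results of this section.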

Since Li(S) is compact, it is sufficient for the existence of optimal policies to show that

R(π) is a continuous function of x(π). By the compactness of S and the continuity of x(π)

on S this is implied by the following lemma.

Lemma 11.2 Under the weak uniform integrability condition R(π) is continuous in π ∈ S.

Proof: A slight adaptation of the proof of Theorem 11.2 suffices. It consists of using decision rule πk from time n(k − 1) + 1 up to and including time n(k), with

$$n(k) = \min\Bigl\{N > n(k-1) \;\Bigm|\; d'\bigl(x^N(i', R_k), x(\pi_k)\bigr) < \varepsilon_k,\ \Bigl|\sum_{(j,a)\in E\times A} r_{ja}\, x^N_{ja}(i', R_k) - R(\pi_k)\Bigr| < \varepsilon_k \Bigr\}.$$

Indeed, we obtain the finiteness of n(k) inductively by (11.11), using that x(i′, Rk) = x(πk). For a weakly converging subsequence {x^{n(k_m)}(i′, R)}_m with limit x∗, we have

$$\lim_{m\to\infty} \Bigl|\sum_{(j,a)\in E\times A} r_{ja}\, x^{n(k_m)}_{ja}(i', R) - R(\pi_{k_m})\Bigr| = 0.$$

Since both terms within the absolute value signs converge,

$$\lim_{m\to\infty} R(\pi_{k_m}) = \lim_{m\to\infty} \sum_{(j,a)\in E\times A} r_{ja}\, x^{n(k_m)}_{ja}(i', R) = \sum_{(j,a)\in E\times A} r_{ja}\, x^*_{ja} = R(\pi^*).$$

The next theorem enables us to relate optimal solutions of the (LP) to optimal policies.


Theorem 11.5 Under the weak uniform integrability condition the (LP) has an optimal

solution x∗, for which the assertion of Proposition 10.1iii) holds.

Proof: By virtue of Lemma 11.2 and the compactness of S the function R(π) attains a

maximum on S in π∗, say. Using Li(S) = X and relation (11.12), we derive optimality of

x∗ = x(π∗) for the (LP).

Consider any optimal solution x∗ of the (LP) and let π∗ be defined through (10.3). Then x∗ = x(π∗) (cf. the proof of (11.5)). Define F∗ = {f ∈ F | f(i) ∈ {a | x∗ia > 0}} and

$$S_n := \Bigl\{\pi \in S \;\Bigm|\; \exists f \in F^* \text{ with } \pi_{if(i)} = 1,\ i \le n;\quad \pi_{ia} = \pi^*_{ia},\ \text{for } a \in A(i),\ i > n \Bigr\}.$$

Thus Sn consists of decision rules that are deterministic for states i ≤ n. We apply

an iterative splitting procedure to policy π∗∞, by starting to split in state 0. If state

0 is positive recurrent for π∗∞, we apply the Key lemma, and write x(π∗) as a convex

combination, so that any stationary distribution for a policy in S0 has positive weight.

Thus, if S0 = {π1, . . . , πk}, then there are λl > 0, l = 1, . . . , k, such that Σ_{l=1}^{k} λl = 1 and Σ_l λl x(πl) = x(π∗). If 0 is transient, then it is irrelevant which action is chosen in this

state, and any convex combination is correct. The case that π∗∞ randomizes in finitely

many states is obviously included in the construction.

Let r∗ = R(π∗). As R(π) ≤ r∗ for π ∈ S and R(π) is a linear function of x(π), we conclude

that R(π) = r∗ for π ∈ S0.

We proceed by choosing a policy π∞ with π ∈ S0, and splitting the corresponding sta-

tionary distribution through state 1, etc. Observe that for any policy π∞n with πn ∈ Sn,

there is a policy π∞n−1 with πn−1 ∈ Sn−1, such that x(πn) appears as one of the splitting

distributions of x(πn−1). Inductively we conclude that R(π) = r∗, if π ∈ Sn for some

n ∈ IN.

Choose f ∈ F∗. By the construction of Sn, there is a sequence {πn}∞n=1, such that πn ∈ Sn and limn→∞ πn = f in the metric d. The continuity of R(π) on S implies that

$$R(f) = \lim_{n\to\infty} R(\pi_n) = \lim_{n\to\infty} r^* = r^*.$$

This establishes the assertion of the theorem.

The proof above is based on probabilistic arguments. An alternative way to

show the existence of a deterministic average optimal policy is by using the observation,

contained in the proof of the Krein-Milman theorem in Royden [1968], that a continuous

linear functional on a convex, compact set attains its maximum in an extreme point of the

set. This can be applied to the set Li(S) or X.


2.3. Upper-s-bounded rewards.

Assuming the immediate rewards to satisfy ii) in subsection 11.2.1, it is not generally

possible to identify R(i, R) with an element of the set r(X). We will show, however, that

it is sufficient to analyse the stationary policies. Therefore we study the average expected

rewards under arbitrary policies first.

Let R ∈ C and i ∈ E. By the non-positivity of rja, (j, a) ∈ E × A, no summability problems occur. Instead of (11.11) we obtain for a weakly converging subsequence {x^{N(k)}(i, R)}_k with weak limit x∗,

$$\sum_{(j,a)\in E\times A} r_{ja}\, x^*_{ja} = \sum_{(j,a)\in E\times A} \lim_{k\to\infty} r_{ja}\, x^{N(k)}_{ja}(i, R) \ge \lim_{k\to\infty} \sum_{(j,a)\in E\times A} r_{ja}\, x^{N(k)}_{ja}(i, R) = R(i, R), \tag{11.13}$$

where Fatou's lemma is used for the inequality. For the sufficiency of the stationary

policies we need (11.12) to hold. So, we consider a stationary policy π∞, and use the Ergodic theorem. We suppose that i′ is a positive recurrent state in the MC generated by π∞. Let T^m be the random variable associated with the time of the m-th visit to i′. Then {T^m}∞m=1 is a (possibly delayed) renewal process. If IEi′,π∞(Σ_{n=0}^{T−1} r_{X(n),Y(n)}) > −∞, then by the Ergodic theorem

$$\begin{aligned}
R(i', \pi^\infty) &= \lim_{N\to\infty} \frac{1}{N+1}\, \mathbb{E}_{i',\pi^\infty}\Bigl(\sum_{n=0}^{N} r_{X(n),Y(n)}\Bigr) = \frac{\mathbb{E}_{i',\pi^\infty}\bigl(\sum_{n=0}^{T-1} r_{X(n),Y(n)}\bigr)}{\mathbb{E}_{i',\pi^\infty} T} \\
&= \sum_{(j,a)\in E\times A} r_{ja}\, \frac{\mathbb{E}_{i',\pi^\infty}\bigl(\sum_{n=0}^{T-1} 1_{\{X(n)=j,\,Y(n)=a\}}\bigr)}{\mathbb{E}_{i',\pi^\infty} T} = \sum_{(j,a)\in E\times A} r_{ja}\, x_{ja}(\pi).
\end{aligned}$$

If IEi′,π∞(Σ_{n=0}^{T−1} r_{X(n),Y(n)}) = −∞, then by using a decreasing sequence of lower bounded reward functions {r_n}_n with r_{n,ja} = −n 1{r_ja ≤ −n} + r_ja 1{r_ja > −n} and the monotone convergence theorem, we can show that the same expression as above denotes the average expected rewards (cf. the proof of Lemma 2.3 in Altman & Shwartz [1990]).

For a transient initial state i ∈ E, {T^m}_m is a delayed renewal process, and the condition

on the immediate rewards is sufficient for (11.12). Combination with (11.13) establishes

the following lemma.

Lemma 11.3 r(X) = {R(π) | π ∈ S} and C(S) is sufficient for the optimization problem.

For the existence of optimal policies we need the following theorem.


Theorem 11.6 R(π) is u.s.c. (upper semi-continuous) in π ∈ S.

Proof: It is sufficient to show for any constant c ∈ IR that the set {π | R(π) ≥ c} is closed in S. Suppose that the assertion is not true. Then there is a converging sequence {πn}n ⊂ {π ∈ S | R(π) ≥ c} with limit π∗, say, such that R(π∗) < c. Apply Fatou's lemma, then

$$c \le \limsup_{n\to\infty} R(\pi_n) \le \sum_{(j,a)\in E\times A} \limsup_{n\to\infty} r_{ja}\, x_{ja}(\pi_n) = \sum_{(j,a)\in E\times A} r_{ja}\, x_{ja}(\pi^*),$$

by the continuity of x(π) on S. This contradicts the assumption that R(π∗) < c, so that

the theorem is proved.

Then optimality of deterministic policies is ensured.

Theorem 11.7 Under the upper-s-boundedness condition there is an optimal solution x∗

to (LP) and the assertion of Proposition 10.1iii) holds.

Proof: As R(π) is upper semi-continuous on the compact set S, it attains a maximum

value, say r∗, for some π∗ ∈ S. Using the identity Li(S) = X we verify that x∗ = x(π∗) is

an optimal solution of the (LP). The proof of the second assertion is similar to the proof

of Theorem 11.5. We only have to check that R(f) = r∗ for f ∈ F∗. This follows by

Fatou’s lemma, analogously to the proof of Theorem 11.6.

3. Constrained optimization.

Next we suppose that there is a constraint on the average expected cost. The purpose of

this section is to show the existence of optimal policies, their relation with the (CLP) and

to determine sufficient sets of policies. As before we assume the weak tightness condition

to hold.

Two types of immediate costs are under consideration:

i) weakly uniformly integrable costs

ii) lower-s-bounded immediate costs, i.e. cia ≥ 0, for (i, a) ∈ E × A. Furthermore, if

C(i, π∞) =∞ for an i ∈ T (π), then C(j, π∞) =∞ for all j ∈ E(π).

For these immediate cost definitions the s-average expected cost is independent of the

initial state of the process for any stationary policy π∞ and equals

$$C(\pi) = \sum_{(j,a)\in E\times A} c_{ja}\, x_{ja}(\pi).$$

Lemma 11.4 SC is compact, consequently Li(SC) is compact. Li(SC) = XC , so that the

assertion of Proposition 10.3i) holds.

Proof: For the compactness of SC it is sufficient to show that it is closed in S. This follows

immediately, since for immediate cost structures i) and ii) C(π) is a continuous and l.s.c.


(lower semi-continuous) function on S respectively. Together with the continuity of x(π)

on S we establish the compactness of Li(SC). The relation Li(SC) = XC directly follows.

Indeed, using that Li(S) = X we have for x ∈ XC with x = x(π) that∑

(j,a)∈E×A cjaxja =

C(π) ≤ α and vice versa.

Obviously Li(SC) = XC is convex as well, so that it is the closed convex hull of its extreme

points, by the Krein-Milman theorem. We will show that any extreme point corresponds

to a policy π∞ with π ∈ ND. To this end we need two constrained splitting procedures.

Let ı ∈ E, π ∈ S and let πl, l = 1, . . . , a = |A(ı)|, be defined as

$$\pi_{l,ja} = \pi_{ja}, \quad a \in A(j),\ j \ne \imath; \qquad \pi_{l,\imath l} = 1.$$

Let

$$A = \bigl\{\bar\pi \in S \bigm| \bar\pi_{ja} = \pi_{ja},\ a \in A(j),\ j \ne \imath;\ C(\bar\pi) \le \alpha \bigr\}$$

and

$$G = A \cap \Bigl( \bigl\{\bar\pi \in S \bigm| \bar\pi = \pi_l,\ l \in \{1, \ldots, |A(\imath)|\}\bigr\} \cup \bigl\{\bar\pi \in S \bigm| \exists\, l, m \in \{1, \ldots, a\} \text{ with } C(\pi_l) > \alpha > C(\pi_m),\ \bar\pi_{\imath l} + \bar\pi_{\imath m} = 1,\ C(\bar\pi) = \alpha \bigr\} \Bigr).$$

Write Li(A) = {x(π̄) | π̄ ∈ A} and similarly for Li(G).

Lemma 11.5 “constrained splitting procedure in one state”: Li(A) = Li(G).

Proof: The result is trivial by Lemma 10.2 if ı is transient for π∞. If ı is positive recurrent,

the result follows from the Key lemma in all cases except the case when there are k,m

such that C(πk) > α > C(πm). So, let us assume ı to be positive recurrent.

By virtue of Lemma 10.2 ı is positive recurrent for π∞l , l = 1, . . . , a. Hence x(πl), l =

1, . . . , a are linearly independent. Consider the mapping x(πl)↔ el, where el ∈ IRa is the

lth unit vector and let

$$\Lambda = \Bigl\{\lambda = (\lambda_1, \ldots, \lambda_a) \in \mathbb{R}^a \;\Bigm|\; \sum_{l=1}^{a} \lambda_l = 1;\ \sum_{l=1}^{a} \lambda_l C(\pi_l) \le \alpha;\ \lambda_l \ge 0,\ l = 1, \ldots, a \Bigr\}.$$

The extreme points of Λ are easily determined. Indeed, the columns of the matrix of

constraint coefficients that correspond to the positive variables of an extreme point are

linearly independent. Since there are two linear equations, an extreme point λ0 contains


at most two positive coordinates. If it contains precisely two positive coordinates, say λ⁰_k and λ⁰_m, then λ⁰_k C(πk) + λ⁰_m C(πm) = α and C(πk), C(πm) ≠ α.

An extreme point λ⁰ of Λ is mapped to x(π0) = Σ_l λ⁰_l x(πl), for π0∞ ∈ S constructed

through (10.3). Then π0 ∈ G, as is easily verified by the structure of the extreme points

and the fact that C(π) is linear in x(π) for π ∈ S.

This completes the proof. Indeed, for x(π) ∈ Li(A) there is a λ ∈ Λ such that x(π) = Σ_l λl x(πl) by virtue of the Key lemma. λ is a convex combination of extreme points of Λ; say λ = Σ_{k=1}^{K} ak λ^k, with Σ_k ak = 1, ak ≥ 0 for k = 1, . . . , K, and extreme points λ^1, . . . , λ^K of Λ. Replacing λ by this convex combination, we obtain

$$x(\pi) = \sum_l \lambda_l x(\pi_l) = \sum_l \Bigl(\sum_k a_k \lambda^k_l\Bigr) x(\pi_l) = \sum_k a_k \Bigl(\sum_l \lambda^k_l x(\pi_l)\Bigr) = \sum_k a_k x(\pi'_k),$$

for policies π′_k∞ with π′_k ∈ G, k = 1, . . . , K.
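The extreme-point structure of Λ used in the proof can be enumerated directly. The sketch below is an added illustration with hypothetical costs C(π_l) and constraint level α; it lists the unit vectors satisfying the constraint and the two-point mixtures that meet it with equality.

```python
# Sketch: extreme points of Lambda = { lam >= 0 : sum lam = 1,
#   sum_l lam_l * C_l <= alpha }.  Following the proof: either a unit vector
# with C_l <= alpha, or a mixture of two actions k, m with C_k > alpha > C_m
# meeting the cost constraint with equality.

def extreme_points(C, alpha):
    a = len(C)
    pts = []
    for l in range(a):                      # unit vectors satisfying the constraint
        if C[l] <= alpha:
            e = [0.0] * a; e[l] = 1.0
            pts.append(e)
    for k in range(a):                      # two-point mixtures hitting alpha exactly
        for m in range(a):
            if C[k] > alpha > C[m]:
                lam_k = (alpha - C[m]) / (C[k] - C[m])
                e = [0.0] * a; e[k] = lam_k; e[m] = 1.0 - lam_k
                pts.append(e)
    return pts

# hypothetical costs C(pi_l) for three actions and constraint alpha
print(extreme_points([2.0, 5.0, 1.0], alpha=3.0))
```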

Thus Lemma 11.5 serves to “reduce” the number of randomized decisions in one selected

state. The next lemma “reduces” the number of randomizing states.

We need some notation. Let π ∈ S, ı, ȷ ∈ E be such that π randomizes in states ı and ȷ between two actions, say a1, a2 and b1, b2 respectively. Define

$$A' := \bigl\{\bar\pi \in S \bigm| \bar\pi_{ja} = \pi_{ja},\ a \in A(j),\ j \ne \imath, \jmath;\ \bar\pi_{\imath a} > 0 \Rightarrow a \in \{a_1, a_2\};\ \bar\pi_{\jmath a} > 0 \Rightarrow a \in \{b_1, b_2\};\ C(\bar\pi) \le \alpha \bigr\}.$$

Moreover, π(a_k b_l) ∈ A′ denotes the decision rule that chooses with probability 1 action a_k in ı and action b_l in ȷ. Finally

$$G' := \bigl\{\pi(a_kb_l) \bigm| C(\pi(a_kb_l)) \le \alpha,\ k, l = 1, 2 \bigr\} \cup \Bigl\{\bar\pi \in A' \Bigm| \begin{array}{l} C(\pi(a_kb_l)) > \alpha > C(\pi(a_mb_l)) \Rightarrow \bar\pi_{\jmath b_l} = 1,\ \bar\pi_{\imath a_k} + \bar\pi_{\imath a_m} = 1 \text{ such that } C(\bar\pi) = \alpha, \\ C(\pi(a_kb_l)) > \alpha > C(\pi(a_kb_n)) \Rightarrow \bar\pi_{\imath a_k} = 1,\ \bar\pi_{\jmath b_l} + \bar\pi_{\jmath b_n} = 1 \text{ such that } C(\bar\pi) = \alpha \end{array} \Bigr\}.$$

With the obvious definitions of Li(A′) and Li(G′) we show the following lemma.

Lemma 11.6 “constrained splitting procedure in two states”: Li(A′) = Li(G′).

Proof: If at least two of {x(π(a_k b_l))}_{k,l=1,2} are identical, we can use the same mappings as in the proof of the previous lemma. However, if all these stationary distributions differ and we use the same mapping, then an extreme point of the corresponding polyhedron is possibly mapped to a decision rule π̄, such that x(π̄) is a convex combination of e.g. x(π(a_1b_1)) and x(π(a_2b_2)). In this case π̄ ∉ G′ and thus we have to construct a more suitable map.


So we suppose that ı, ȷ are positive recurrent for π∞(a_k b_l), k, l = 1, 2. We will show that {x(π(a_k b_l))}_{k,l=1,2} is a linearly dependent set. To this end we choose ξ ∈ (0, 1). Then there is a stationary policy π̃, such that x(π̃) = ξ x(π(a_1b_1)) + (1 − ξ) x(π(a_2b_2)). By virtue of the Key lemma, we can split x(π̃) up as a convex combination of x(π(a_1b_1)), x(π(a_1b_2)), x(π(a_2b_1)) and x(π(a_2b_2)) with positive weights µ1, µ2, µ3 and µ4 = 1 − µ1 − µ2 − µ3 respectively. Hence,

$$\xi\, x(\pi(a_1b_1)) + (1-\xi)\, x(\pi(a_2b_2)) = \mu_1 x(\pi(a_1b_1)) + \mu_2 x(\pi(a_1b_2)) + \mu_3 x(\pi(a_2b_1)) + (1-\mu_1-\mu_2-\mu_3)\, x(\pi(a_2b_2)),$$

so that for ξ ≠ µ1

$$x(\pi(a_1b_1)) = x(\pi(a_2b_2)) + \frac{\mu_2}{\xi-\mu_1}\bigl(x(\pi(a_1b_2)) - x(\pi(a_2b_2))\bigr) + \frac{\mu_3}{\xi-\mu_1}\bigl(x(\pi(a_2b_1)) - x(\pi(a_2b_2))\bigr). \tag{11.14}$$

ξ = µ1 iff µ2 x(π(a_1b_2)) + µ3 x(π(a_2b_1)) = (µ2 + µ3) x(π(a_2b_2)), as x(π(a_l b_k)), k, l = 1, 2, are stationary distributions on states and actions. It is easily checked that this implies ı and ȷ to be transient in the MC generated by π∞(a_k b_l), k, l = 1, 2. This contradicts our assumption. We conclude that {x(π(a_l b_k))}_{k,l=1,2} is indeed a linearly dependent set.

Moreover, by equality (11.14)

$$(0 <)\; x_{\imath a_1}(\pi(a_1b_1)) = \frac{\mu_2}{\xi-\mu_1}\, x_{\imath a_1}(\pi(a_1b_2)),$$

so that ξ > µ1, and

$$0 = x_{\imath a_2}(\pi(a_2b_2)) - \frac{\mu_2}{\xi-\mu_1}\, x_{\imath a_2}(\pi(a_2b_2)) + \frac{\mu_3}{\xi-\mu_1}\bigl[x_{\imath a_2}(\pi(a_2b_1)) - x_{\imath a_2}(\pi(a_2b_2))\bigr].$$

Using that ξ − µ1 > 0 we obtain

$$\frac{\mu_2+\mu_3}{\xi-\mu_1} > 1. \tag{11.15}$$

We consider the convex polytope Λ′ ⊂ IR2 defined as

$$\Lambda' := \Bigl\{\, y = \Bigl(\lambda_2 + \lambda_1\frac{\mu_2}{\xi-\mu_1},\; \lambda_3 + \lambda_1\frac{\mu_3}{\xi-\mu_1}\Bigr) \;\Bigm|\; \sum_{l=1}^{4}\lambda_l = 1;\ \lambda_1 C(\pi(a_1b_1)) + \lambda_2 C(\pi(a_1b_2)) + \lambda_3 C(\pi(a_2b_1)) + \lambda_4 C(\pi(a_2b_2)) \le \alpha;\ \lambda_l \ge 0,\ l = 1, \ldots, 4 \,\Bigr\}.$$

By (11.15), (µ2/(ξ − µ1), µ3/(ξ − µ1)) is not a convex combination of the vectors (0, 0), (1, 0) and (0, 1). Hence, the collection V = {(0, 0), (1, 0), (µ2/(ξ − µ1), µ3/(ξ − µ1)), (0, 1)} consists of the extreme points of the polyhedron without the cost constraint. We choose the following mapping:

$$x(\pi(a_1b_2)) - x(\pi(a_2b_2)) \leftrightarrow (1, 0), \qquad x(\pi(a_2b_1)) - x(\pi(a_2b_2)) \leftrightarrow (0, 1).$$


Using the Key lemma and (11.14) it is easily seen that x(π̄) − x(π(a_2b_2)) is mapped to a vector in Λ′, for π̄ ∈ A′. Then similar arguments as in the proof of Lemma 11.5 establish the result, if we show that any extreme point of Λ′ “generates” a policy π̄∞ with π̄ ∈ G′.

So, let λ⁰ be an extreme point of Λ′. λ⁰ is an extreme point of Λ′ if λ⁰ ∈ V and λ⁰_1 C(π(a_1b_2)) + λ⁰_2 C(π(a_2b_1)) ≤ α, or if λ⁰ is a convex combination of two neighbouring points in V such that the constraint value α is met. We claim that π′ ∈ G′, if π′∞ is constructed through (10.3) and

$$x(\pi') - x(\pi(a_2b_2)) = \lambda^0_1\bigl[x(\pi(a_1b_2)) - x(\pi(a_2b_2))\bigr] + \lambda^0_2\bigl[x(\pi(a_2b_1)) - x(\pi(a_2b_2))\bigr]. \tag{11.16}$$

If λ⁰ ∈ V, the claim holds by virtue of (11.14) and the linearity of C(π) = Σ_{(i,a)} c_{ia} x_{ia}(π) as a function of x(π). We check the case that λ⁰ = ν(1, 0) + (1 − ν)(µ2/(ξ − µ1), µ3/(ξ − µ1)); all other cases are shown similarly. Then λ⁰ is mapped to

µ2

ξ − µ1

)[x(π(a1b2))− x(π(a2b2))

]+ (1− ν)

µ3

ξ − µ1

[x(π(a2b1))− x(π(a2b2))

]= ν

[x(π(a1b2))− x(π(a2b2))

]+ (1− ν)

[x(π(a1b1))− x(π(a2b2))

]= νx(π(a1b2) + (1− ν)x(π(a1b1))− x(π(a2b2)).

Hence π′∞ satisfies (11.16) iff it solves x(π′) = ν x(π(a_1b_2)) + (1 − ν) x(π(a_1b_1)). λ⁰ meets the cost constraint α, so that ν C(π(a_1b_2)) + (1 − ν) C(π(a_1b_1)) = α. This means that C(π′) = α and the claim is proved.

This enables us to prove the assertion of Proposition 10.4.

Theorem 11.8 The assertion of Proposition 10.4 holds, i.e. Li(ND) = Li(SC) = XC

and any extreme point of Li(SC) belongs to Li(ND).

Proof: Let x(π) ∈ Li(SC). Without loss of generality we may assume that π only ran-

domizes in states that are positive recurrent under π∞. If π 6∈ ND, constrained splitting

procedure in one or two states may be applied to yield, that x(π) is not an extreme point

of Li(SC). By the Krein-Milman theorem this set has extreme points, as it is compact

and convex, so that any extreme point must be contained in Li(ND).

Finally we show the desired optimality results. Let C(SC) = {π∞ | π ∈ SC} be the set of

constrained stationary policies.

Theorem 11.9 Suppose the immediate rewards are weakly uniformly integrable or upper-

s-bounded and the immediate costs are weakly uniformly integrable or lower-s-bounded.

Then C(SC) is sufficient for the constrained optimization problem. There is an optimal

solution x∗ to the (CLP), for which the assertion of Proposition 10.3ii) holds, and there

is an optimal policy π∞ with π ∈ ND.

Proof: Sufficiency of C(SC) follows directly from (11.11) and (11.13). Firstly, R(i, R) ≤ Σ_{(j,a)} rja x∗ja. Application of Billingsley's Theorem 5.4 and Fatou's lemma for the respective cost structures yields that Σ_{(j,a)} cja x∗ja ≤ lim sup_{k→∞} Σ_{(j,a)} cja x^{N(k)}_{ja}(i, R) ≤ C(i, R) ≤ α, so that x∗ ∈ XC. Apply Theorem 11.8.


By virtue of Lemma 11.4 and Theorem 11.6 SC is compact and R(π) is u.s.c. in π ∈ S. Hence R(π) attains a maximum value r∗ for π∗, say, so that x∗ := x(π∗) is optimal for (CLP). Combination with Theorem 11.8 yields the assertion of Proposition 10.3ii). The continuity of x(π) in π ∈ S and the upper semi-continuity of R(π) in π ∈ S together imply that the set {x(π) ∈ Li(SC) | R(π) ≥ r∗} is closed and thus compact. By the Krein-Milman theorem it is the closed convex hull of its extreme points. Any extreme point of this set is easily checked to be an extreme point of Li(SC), since R(π) is maximal on this set. Hence the maximum is attained for some x(π) ∈ Li(ND).

[Cartoon: "Example of a tightly constrained queueing process"]


4. The importance of being tight.

For the completeness of the stationary policies, i.e. for the proof of Li = Li(S), we intro-

duced the weak tightness condition. This condition ensures that the stationary distributions are continuous functions on S and thus that Li(S) is a closed set. The counterexample of Fisher & Ross [1968] satisfies Assumption 10.1, and even the state space is one single positive

recurrent class under any stationary policy. Thus by virtue of Theorem 11.2 weak tightness

and continuity of the stationary probabilities x(π) in π ∈ S are equivalent.

The purpose of this section is to show that even under Assumption 10.1 weak tightness and continuity of x(π) fail for this counterexample, so that the stationary policies are

not complete. We first show that x(π) is not a continuous function. Hence the weak

tightness property does not hold. Secondly we will explicitly construct a policy R∗, for

which the marginal distributions are not a tight collection. This construction is based on

the construction of an optimal nonstationary policy for this example in Hordijk & Tijms

[1970]. The results are summarized in the two following theorems.

Theorem 11.10 x(π) is not a continuous function in π ∈ S.

Theorem 11.11 There exists a policy R∗ such that X(0, R∗) consists of one vector limit point x, for which

$$\sum_{j,a} x_{ja} = \frac{7}{10}.$$

The parameters of the model are sketched in the picture below, for arbitrary i ∈ IN.

[Figure 3: transition diagram of the Fisher & Ross counterexample, showing the two actions (action 1 and action 2) in the states 0, i, i + 1 and i′, with the transition probabilities 1/2, 1 − 2^{-i}, (3/2)·4^{-i} and 2^{-i} indicated on the arrows.]

We will introduce some notation, before giving the proofs. f∞(N) is the policy that

chooses action 2 for i ≤ N , and action 1 for i > N ; x(f(N)) is the corresponding station-

ary distribution. f∗∞ is the policy that always chooses action 2, and x∗ the stationary

distribution induced by that policy.


From the expressions for the expected time between two visits to state 0 in Fisher and

Ross [1968], we can deduce that

$$\lim_{N\to\infty} x_0(f(N)) = \frac{1}{5} < x_0\bigl(\lim_{N\to\infty} f(N)\bigr) = x^*_0 = \frac{2}{7}. \tag{11.17}$$

Using the identity x(π) = x(π)P(π) for the stationary distribution of the Markov chain operating under the stationary policy π, it is easy to show that

$$\frac{x_i(f(N))}{x_0(f(N))} = \frac{x^*_i}{x^*_0} \quad \text{and} \quad \frac{x_{i'}(f(N))}{x_0(f(N))} = \frac{x^*_{i'}}{x^*_0}, \qquad \text{for } i \le N. \tag{11.18}$$

For states i ≤ N, we use induction on i:

$$x_1(f(N)) = \tfrac{3}{2}\cdot\tfrac{1}{4}\, x_0(f(N)), \quad x^*_1 = \tfrac{3}{2}\cdot\tfrac{1}{4}\, x^*_0 \;\Rightarrow\; \frac{x_1(f(N))}{x_0(f(N))} = \tfrac{3}{2}\cdot\tfrac{1}{4} = \frac{x^*_1}{x^*_0},$$

$$x_2(f(N)) = \tfrac{3}{2}\cdot\tfrac{1}{4^2}\, x_0(f(N)) + \tfrac{1}{2}\, x_1(f(N)) = \bigl(\tfrac{3}{2}\cdot\tfrac{1}{4^2} + \tfrac{3}{4}\cdot\tfrac{1}{4}\bigr)\, x_0(f(N)),$$

and similarly $x^*_2 = \bigl(\tfrac{3}{2}\cdot\tfrac{1}{4^2} + \tfrac{3}{4}\cdot\tfrac{1}{4}\bigr)\, x^*_0 \Rightarrow x_2(f(N))/x_0(f(N)) = x^*_2/x^*_0$. Assume the relation to hold for 1, 2, ..., i − 1; then

$$x_i(f(N)) = \tfrac{3}{2}\cdot\tfrac{1}{4^i}\, x_0(f(N)) + \tfrac{1}{2}\, x_{i-1}(f(N)), \qquad x^*_i = \tfrac{3}{2}\cdot\tfrac{1}{4^i}\, x^*_0 + \tfrac{1}{2}\, x^*_{i-1}$$

$$\Rightarrow\; \frac{x_i(f(N))}{x_0(f(N))} = \tfrac{3}{2}\cdot\tfrac{1}{4^i} + \tfrac{1}{2}\cdot\frac{x_{i-1}(f(N))}{x_0(f(N))} = \tfrac{3}{2}\cdot\tfrac{1}{4^i} + \tfrac{1}{2}\cdot\frac{x^*_{i-1}}{x^*_0} = \frac{x^*_i}{x^*_0}.$$

For states i′ with i ≤ N, relation (11.18) is even easier to show, as

$$x_{i'}(f(N)) = \bigl(1 - \tfrac{1}{2^i}\bigr)\, x_{i'}(f(N)) + \tfrac{3}{2}\cdot\tfrac{1}{4^i}\, x_0(f(N))$$

$$\Rightarrow\; x_{i'}(f(N)) = 2^i\cdot\tfrac{3}{2}\cdot\tfrac{1}{4^i}\, x_0(f(N)) = \tfrac{3}{2}\cdot\tfrac{1}{2^i}\, x_0(f(N)) \quad\text{and, analogously,}\quad x^*_{i'} = \tfrac{3}{2}\cdot\tfrac{1}{2^i}\, x^*_0$$

$$\Rightarrow\; \frac{x_{i'}(f(N))}{x_0(f(N))} = \tfrac{3}{2}\cdot\tfrac{1}{2^i} = \frac{x^*_{i'}}{x^*_0}.$$
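As an added numerical check of the closed forms just derived (not part of the original proof), the recursions for x*_i and x*_{i'} determine x*_0 by normalization; the value 2/7 of (11.17) comes out.

```python
# Numerical check: under f*, u_i = x*_i / x*_0 and v_i = x*_{i'} / x*_0 satisfy
# u_1 = (3/2)(1/4), u_i = (3/2)(1/4)^i + (1/2) u_{i-1} for i >= 2, and
# v_i = (3/2)(1/2)^i.  Normalization then gives x*_0.

I_MAX = 200  # truncation level; the tails are geometric, so this is ample

u = {0: 1.0, 1: 1.5 * 0.25}
for i in range(2, I_MAX + 1):
    u[i] = 1.5 * 0.25 ** i + 0.5 * u[i - 1]
v = {i: 1.5 * 0.5 ** i for i in range(1, I_MAX + 1)}

x0_star = 1.0 / (sum(u.values()) + sum(v.values()))
print(x0_star, 2.0 / 7.0)   # both approximately 0.2857
```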

(11.17) and (11.18) imply Theorem 11.10. Indeed, f(N) converges to f∗ as N tends to infinity, but xi(f(N)) converges to (7/10)x∗i, and the same result holds for i′. Clearly, for i < N, xi1(0, f(N)) = 0 and xi2(0, f(N)) = xi(f(N)). Hence, the expected state-action frequencies under policy f(N) converge to a defective probability measure on E × A, which is definitely not an element of Li(S).

Next we will prove Theorem 11.11. To this end we choose a subsequence {Nk}k of {1, 2, . . .}


such that:

$$\begin{aligned}
g_1 &:= f(N_1), \text{ with } \bigl(\tfrac{1}{5} \le\bigr)\; x_0(f(N_1)) \le \tfrac{1}{5} + \tfrac{1}{2},\\
g_2 &:= f(N_2), \text{ with } N_2 > N_1 \text{ and } \bigl(\tfrac{1}{5} \le\bigr)\; x_0(f(N_2)) \le \tfrac{1}{5} + \tfrac{1}{4},\\
&\;\;\vdots\\
g_k &:= f(N_k), \text{ with } N_k > N_{k-1} \text{ and } \bigl(\tfrac{1}{5} \le\bigr)\; x_0(f(N_k)) \le \tfrac{1}{5} + \tfrac{1}{2^k},\\
&\;\;\vdots
\end{aligned}$$

Choose a sequence {δk}∞k=1, with 1/2^k > δk ↓ 0, for k tending to infinity. By virtue of a theorem by Scheffé [1947] on uniform convergence of probability measures, we can also choose a sequence {Lk}k, with Lk ≥ 1 ∀ k, such that |x^T_{ja}(0, gk) − x_{ja}(gk)| < 1/2^k − δk, ∀ j, a, ∀ T ≥ Lk, k = 1, 2, . . .

We define the policy R∗ and a sequence of stochastic time instants {Tk}∞k=1 in the following way:

1) T0 = 0.

2) If Tk−1 is finite, then
   i) the system controller starts playing policy g_k∞ at time Tk−1;
   ii) either state 0 is visited after T′ := max(Tk−1 + Lk + 1, (Tk−1/δk) + 1, 2^k Lk+1), or not. If it is, Tk will be the first time state 0 is visited after T′. Else Tk is equal to infinity.

3) If Tk−1 = ∞, then Tk = ∞.

Note that R∗ is non-stationary and non-Markovian. Also remark that ∀ k, Tk is finite with probability 1, as the chains induced by the policies {g_l∞}_{l=1}^{k} are positive recurrent.

For policy R∗ we first show the following lemma.

Lemma 11.7 ∀ ε > 0, ∀ j, ∃ T∗ such that

$$x^T_{ja}(0, R^*) \le \frac{7}{10}\, x^*_{ja} + \varepsilon, \qquad \forall\, a \in A(j),\ \forall\, T > T^*.$$

Proof: Take ε > 0, and fix j. Let K∗ be such that

$$\frac{1}{K^*}\Bigl(\frac{1}{2 x^*_0} + 1\Bigr) \le \frac{\varepsilon}{2}, \tag{11.19}$$

$$N_{K^*} > j, \tag{11.20}$$

and T∗ such that

$$\mathbb{P}_{0,R^*}(T^* < T_{K^*}) < \frac{\varepsilon}{2}. \tag{11.21}$$

As TK∗ is finite with probability 1, T ∗ can be chosen. Then we only need to consider

policies g∞k with k ≥ K∗. So, if we take NK∗ > j, as in (11.20), then we know by (11.18)

that xja(gk)/x0(gk) = x∗ja/x∗0, ∀ k ≥ K∗.


Fix T ≥ T ∗. By 2)ii) ∃K ≥ K∗, such that T < TK+1 for any realization of the process.

With probability 1 the following identity holds:

$$\begin{aligned}
x^T_{ja}(0, R^*) = {}& \sum_{k=K^*}^{K} \sum_{t_{k-1}, t_k} \frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = k-1, k\Bigr) \\
& \qquad\times \mathbb{P}_{0,R^*}(T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = k-1, k) \\
& + \sum_{k=K^*}^{K} \sum_{t_{k-1}, t_k, t_{k+1}} \frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k + L_{k+1} \le T < T_{k+1},\ T_m = t_m,\ m = k-1, k, k+1\Bigr) \\
& \qquad\times \mathbb{P}_{0,R^*}(T_k + L_{k+1} \le T < T_{k+1},\ T_m = t_m,\ m = k-1, k, k+1) \\
& + \frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T < T_{K^*}\Bigr)\, \mathbb{P}_{0,R^*}(T < T_{K^*}).
\end{aligned} \tag{11.22}$$

The third term is smaller than ε/2 by (11.21). We consider the first term and we split the

summation over n up into three sums:

$$\frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{t_{k-1}-1} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = k-1, k\Bigr) \le \frac{t_{k-1}}{T+1} = \frac{t_{k-1}}{t_k - 1}\cdot\frac{t_k - 1}{T+1} \le \delta_k\cdot\frac{t_k - 1}{T+1} \le \delta_k. \tag{11.23}$$

For the first inequality we use that there are at most tk−1 terms in the summation over n. The bound tk ≥ (tk−1/δk) + 1 (by 2)ii)) is used to get the second inequality, and (tk − 1)/(T + 1) ≤ 1 to get the last one.

$$\begin{aligned}
&\frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=t_{k-1}}^{t_k - 1} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k \le T < T_k + L_{k+1},\ T_{k-1} = t_{k-1},\ T_k = t_k\Bigr) \\
&\qquad = \frac{1}{T+1}\, \mathbb{E}_{0,g_k^\infty}\Bigl(\sum_{n=0}^{t_k - t_{k-1} - 1} 1_{\{X(n)=j,\,Y(n)=a\}}\Bigr) \le \Bigl(x_{ja}(g_k) + \frac{1}{2^k} - \delta_k\Bigr)\frac{t_k - t_{k-1}}{T+1} \le x_{ja}(g_k) + \frac{1}{2^k} - \delta_k. \tag{11.24}
\end{aligned}$$

The first inequality holds by 2)ii): tk − tk−1 ≥ Lk + 1, and by the definition of gk. Finally,

$$\frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=t_k}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k \le T < T_k + L_{k+1},\ T_{k-1} = t_{k-1},\ T_k = t_k\Bigr) \le \frac{L_{k+1}}{T+1} = \frac{L_{k+1}}{t_k}\cdot\frac{t_k}{T+1} \le \frac{1}{2^k},$$


as there are at most L_{k+1} terms in the summation, and as L_{k+1}/t_k is dominated by 1/2^k (by 2)ii)). As a consequence, the first term in (11.22) is bounded by

$$\Bigl(\max_{k\in\{K^*,\ldots,K\}} x_{ja}(g_k) + \frac{1}{K^*}\Bigr) \sum_{k=K^*}^{K} \sum_{t_{k-1}, t_k} \mathbb{P}_{0,R^*}(T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = k-1, k). \tag{11.25}$$

In the second term of (11.22) the summation over n can again be split up into three sums in the same way as above. The first two of these can be bounded by the next-to-last bounds in (11.23) and (11.24). Then we get

$$\frac{t_k - 1}{T+1}\,\delta_k + \frac{t_k - t_{k-1}}{T+1}\Bigl(x_{ja}(g_k) + \frac{1}{2^k} - \delta_k\Bigr) \le \frac{t_k - 1}{T+1}\cdot\Bigl(\delta_k + x_{ja}(g_k) + \frac{1}{2^k} - \delta_k\Bigr) = \frac{t_k - 1}{T+1}\cdot\Bigl(x_{ja}(g_k) + \frac{1}{2^k}\Bigr).$$

We will elaborate on the third sum:

$$\begin{aligned}
&\frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=t_k}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k + L_{k+1} \le T < T_{k+1},\ T_m = t_m,\ m = k-1, k, k+1\Bigr) \\
&\qquad = \frac{1}{T+1}\, \mathbb{E}_{0,g_{k+1}^\infty}\Bigl(\sum_{n=0}^{T - t_k} 1_{\{X(n)=j,\,Y(n)=a\}}\Bigr) \le \Bigl(x_{ja}(g_{k+1}) + \frac{1}{2^{k+1}} - \delta_{k+1}\Bigr)\cdot\frac{T - t_k + 1}{T+1},
\end{aligned}$$

using that T − tk ≥ Lk+1, and the definition of gk+1. Hence the whole summation over n is bounded by max(x_{ja}(g_k), x_{ja}(g_{k+1})) + 1/2^k, and the second term in (11.22) is not larger than

$$\Bigl(\max_{k\in\{K^*,\ldots,K+1\}} x_{ja}(g_k) + \frac{1}{2^{K^*}}\Bigr) \sum_{k=K^*}^{K} \sum_{t_{k-1}, t_k, t_{k+1}} \mathbb{P}_{0,R^*}(T_k + L_{k+1} \le T < T_{k+1},\ T_m = t_m,\ m = k-1, k, k+1). \tag{11.26}$$

Combination of (11.25), (11.26) together with the bound on the third term of (11.22) gives us

$$(11.22) \le \Bigl(\max_{k\in\{K^*,\ldots,K+1\}} x_{ja}(g_k) + \frac{1}{K^*}\Bigr)\, \mathbb{P}_{0,R^*}(T_{K^*} \le T < T_{K+1}) + \frac{\varepsilon}{2} \le \max_{k\in\{K^*,\ldots,K+1\}} x_{ja}(g_k) + \frac{1}{K^*} + \frac{\varepsilon}{2}. \tag{11.27}$$

This completes the proof, as we get, using (11.18) and the specific choice of the gk,


$$\begin{aligned}
x^T_{ja}(0, R^*) \le (11.27) &\overset{(11.20)}{=} \max_{k\in\{K^*,\ldots,K+1\}} \Bigl(\frac{x^*_{ja}}{x^*_0}\cdot x_0(g_k)\Bigr) + \frac{1}{K^*} + \frac{\varepsilon}{2} \le \frac{x^*_{ja}}{x^*_0}\Bigl(\frac{1}{5} + \frac{1}{2^{K^*}}\Bigr) + \frac{1}{K^*} + \frac{\varepsilon}{2} \\
&\le \frac{7}{10}\, x^*_{ja} + \frac{1}{K^*}\Bigl(\frac{1}{2 x^*_0} + 1\Bigr) + \frac{\varepsilon}{2} \overset{(11.19)}{\le} \frac{7}{10}\, x^*_{ja} + \varepsilon, \qquad \forall\, a \in A(j),\ \forall\, T \ge T^*.
\end{aligned}$$

The next lemma gives a lower bound for x^T_{ja}(0, R∗), for T → ∞.

Lemma 11.8 ∀ ε′ > 0, ∀ j, ∃ T∗, such that

$$x^T_{ja}(0, R^*) \ge \frac{7}{10}\, x^*_{ja} - \varepsilon', \qquad \forall\, a \in A(j),\ \forall\, T \ge T^*.$$

Proof: Fix ε′ > 0 and j. Choose K∗, T∗ satisfying (11.19), (11.20) and (11.21) for ε = ε′/2. Fix T ≥ T∗. With probability 1 the expected state-action frequencies over a finite horizon satisfy the following inequality, for T ≥ T∗:

$$\begin{aligned}
x^T_{ja}(0, R^*) \ge {}& \sum_{k=K^*}^{K} \sum_{t_{K^*-1}, t_{K^*}, \ldots, t_k} \frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = K^*-1, \ldots, k\Bigr) \\
& \qquad\times \mathbb{P}_{0,R^*}(T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = K^*-1, \ldots, k) \\
& + \sum_{k=K^*}^{K} \sum_{t_{K^*-1}, t_{K^*}, \ldots, t_{k+1}} \frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k + L_{k+1} \le T < T_{k+1},\ T_m = t_m,\ m = K^*-1, \ldots, k+1\Bigr) \\
& \qquad\times \mathbb{P}_{0,R^*}(T_k + L_{k+1} \le T < T_{k+1},\ T_m = t_m,\ m = K^*-1, \ldots, k+1).
\end{aligned} \tag{11.28}$$

For the first term in (11.28) we can derive the appropriate bounds again using the definition

of gk, and (11.20):

$$\begin{aligned}
&\frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = K^*-1, \ldots, k\Bigr) \\
&\quad \ge \frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=t_{K^*-1}}^{t_k} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k \le T < T_k + L_{k+1},\ T_m = t_m,\ m = K^*-1, \ldots, k\Bigr) \\
&\quad \ge \frac{t_{K^*} - t_{K^*-1}}{T+1}\Bigl(x_{ja}(g_{K^*}) - \frac{1}{2^{K^*}} + \delta_{K^*}\Bigr) + \cdots + \frac{t_k - t_{k-1}}{T+1}\Bigl(x_{ja}(g_k) - \frac{1}{2^k} + \delta_k\Bigr) \\
&\quad \ge \frac{t_{K^*} - t_{K^*-1}}{T+1}\cdot\frac{x_0(g_{K^*})}{x^*_0}\, x^*_{ja} + \cdots + \frac{t_k - t_{k-1}}{T+1}\cdot\frac{x_0(g_k)}{x^*_0}\, x^*_{ja} - \frac{t_k - t_{K^*}}{T+1}\cdot\frac{1}{2^{K^*}} \\
&\quad \ge \frac{t_k - t_{K^*-1}}{T+1}\cdot\Bigl(\frac{7}{10}\, x^*_{ja} - \frac{1}{2^{K^*}}\Bigr).
\end{aligned} \tag{11.29}$$


By 2)ii), the construction of δK∗ and (11.19), we have

$$\frac{t_{K^*-1}}{T+1} \le \delta_{K^*} < \frac{1}{2^{K^*}} \le \frac{\varepsilon}{4} \qquad\text{and}\qquad \frac{T - t_k}{T+1} \le \frac{L_{k+1}}{T} \le \frac{L_{k+1}}{t_k} \le \frac{1}{2^k} < \frac{\varepsilon}{4},$$

so that

$$\frac{t_k - t_{K^*-1}}{T+1} \ge 1 - \frac{\varepsilon}{2}.$$

Hence,

$$(11.29) \ge \Bigl(1 - \frac{\varepsilon}{2}\Bigr)\Bigl(\frac{7}{10}\, x^*_{ja} - \frac{\varepsilon}{4}\Bigr).$$

And consequently the first term in (11.28) is not less than

$$\Bigl(1 - \frac{\varepsilon}{2}\Bigr)\Bigl(\frac{7}{10}\, x^*_{ja} - \frac{\varepsilon}{4}\Bigr)\cdot \sum_{k=K^*}^{K} \mathbb{P}_{0,R^*}(T_k \le T < T_k + L_{k+1}). \tag{11.30}$$

The bounds for the second term in (11.28) can be derived with similar arguments:

$$\begin{aligned}
&\frac{1}{T+1}\, \mathbb{E}_{0,R^*}\Bigl(\sum_{n=0}^{T} 1_{\{X(n)=j,\,Y(n)=a\}} \Bigm| T_k + L_{k+1} \le T < T_{k+1},\ T_m = t_m,\ m = K^*-1, \ldots, k+1\Bigr) \\
&\quad \ge \frac{t_{K^*} - t_{K^*-1}}{T+1}\Bigl(x_{ja}(g_{K^*}) - \frac{1}{2^{K^*}} + \delta_{K^*}\Bigr) + \cdots + \frac{T - t_k + 1}{T+1}\Bigl(x_{ja}(g_{k+1}) - \frac{1}{2^{k+1}} + \delta_{k+1}\Bigr) \\
&\quad \ge \frac{T + 1 - t_{K^*-1}}{T+1}\Bigl(\frac{7}{10}\, x^*_{ja} - \frac{1}{2^{K^*}}\Bigr) \ge \Bigl(1 - \frac{\varepsilon}{2}\Bigr)\Bigl(\frac{7}{10}\, x^*_{ja} - \frac{1}{2^{K^*}}\Bigr).
\end{aligned}$$

).

By which the second term in (11.28) is greater or equal to

(1− ε

2

)( 7

10x∗ja −

ε

4

) K∑k=K∗

IP0,R∗(Tk + Lk+1 ≤ T < Tk+1). (11.31)

Combining (11.30) and (11.31), and using (11.21), we get as a lower bound for x^T_{ja}(0, R∗)

$$x^T_{ja}(0, R^*) \ge \Bigl(1 - \frac{\varepsilon}{2}\Bigr)\Bigl(\frac{7}{10}\, x^*_{ja} - \frac{\varepsilon}{4}\Bigr)\cdot \mathbb{P}_{0,R^*}(T_{K^*} \le T \le T_{K+1}) \ge \Bigl(\frac{7}{10}\, x^*_{ja} - \frac{\varepsilon}{4}\Bigr)\Bigl(1 - \frac{\varepsilon}{2}\Bigr)^2 \ge \frac{7}{10}\, x^*_{ja} - \varepsilon', \qquad \forall\, a \in A(j),\ \forall\, T \ge T^*,$$

if ε′ < 1. Clearly the inequality holds trivially if ε′ ≥ 1.

Combination of Lemmas 11.7 and 11.8 proves Theorem 11.11.


CHAPTER TWELVE

Constrained Admission Control to a Queueing System.

1. Introduction.

In this chapter we study a queueing system that can be controlled by restricting arrivals.

The control depends on the number of jobs present in the system. The interarrival and

service times are assumed to have an exponential distribution. However, if the service

disciplines are symmetric, our results extend directly to nonexponential service time dis-

tributions (cf. Kelly [1979] or Hordijk [1983]).

The state of the system is the number of jobs present in the queue. We assume that the

queue can only contain a bounded number of jobs, say at most m, so, the state space E

will be 0, 1, . . . ,m. The arrival and service rates in state i are denoted by λi and νirespectively. Our results are valid for any type of service with rates νi, i ∈ E, which means

that they hold for all unbiased service disciplines (cf. Cooper [1981] p.95).

The action or decision set A contains two elements, 0 and 1, which denote the rejection

and acceptance of a job. Randomized decision rules admit new customers with probability

0 ≤ αi ≤ 1, when there are i jobs present. Whether we study this MDC in continuous

time or we use a time-discretized version (with transition probabilities equal to the rates

times the time grid) is irrelevant, as in both cases the same optimality equations hold

(see Serfozo [1979]) for the average expected reward criterion. From Markov decision

theory we know that a stationary and deterministic optimal policy exists. In this chapter,

however, we want to restrict the analysis to policies that satisfy a certain cost constraint,

for example maximizing the throughput under a constraint on the expected delay. Under

such a restriction an optimal policy will generally not be deterministic. It may even happen that no stationary policy at all is optimal. Fortunately, our model is a unichain MDC, because under any policy the queue will eventually become empty. By virtue of Proposition 10.3 of this

monograph (cf. Hordijk & Kallenberg [1984]) a stationary optimal policy exists, so we can

restrict ourselves to this class.

Two types of policies play a crucial role in our analysis. One is the so-called threshold

policy. We say that a policy is a threshold (j, q) policy if the following holds: an arriving

job is rejected when there are at least j jobs present; it is admitted when there are at most

(j − 2) jobs and it is admitted with probability q when there are precisely (j − 1) jobs in

Page 203: GEOMETRICALLY ERGODIC MARKOV CHAINS THE OPTIMAL …spieksma/papers/proefschrift.pdf · convergence for the same -vector. This is very important for Markov reward processes. Properties

Constrained Admission Control to a Queueing System 193

the queue. When the probability q is equal to 1, the policy is called a critical level j policy.

In the unconstrained problem there is a critical level j optimal policy (see Stidham [1985],

Stidham and Weber [1989]).
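For this finite birth-death model the stationary distribution under a threshold (j, q) policy follows from the balance equations p_i λ_i α_i = p_{i+1} ν_{i+1}, where α_i is the acceptance probability prescribed by the policy. The sketch below is an added illustration with hypothetical rates.

```python
# Sketch: stationary distribution of the controlled queue under a
# threshold (j, q) policy.  alpha_i = 1 for i <= j-2, q for i = j-1,
# 0 for i >= j.  Balance: p_{i+1} * nu_{i+1} = p_i * lam_i * alpha_i.

def stationary(lam, nu, j, q):
    m = len(nu) - 1                       # states 0..m, nu[0] unused
    alpha = [1.0 if i <= j - 2 else (q if i == j - 1 else 0.0) for i in range(m + 1)]
    p = [1.0]
    for i in range(m):
        p.append(p[i] * lam[i] * alpha[i] / nu[i + 1])
    s = sum(p)
    return [pi / s for pi in p]

# hypothetical rates for m = 5 and a threshold (3, 0.4) policy
lam = [1.0] * 6
nu  = [0.0, 0.8, 1.2, 1.5, 1.5, 1.5]
p = stationary(lam, nu, j=3, q=0.4)
throughput = sum(pi * nui for pi, nui in zip(p, nu))
print(p, throughput)
```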

In the problem with a constraint we have to randomize, but randomization takes place

in at most one state. Indeed, since the MDC has a finite state space, the conditions in

Chapter 11 on weak tightness and uniform integrability are satisfied, so that the result of

Theorem 11.8 is valid.

In section 12.2 we give an overview of models with an optimal policy of threshold type.

It turns out that reversing the structure of the immediate reward and/or cost function,

for example taking νi to be convex increasing in i instead of concave increasing, gives in

a number of cases an optimal policy of the second type, the thinning policy. We say that

a policy is a thinning (q) policy if it admits jobs when there are already jobs present and

admits with probability q when the queue is empty. It is clear, that such a policy has

a preference to keep the system empty once it is empty. A plausible reason for playing

this policy is the possibility of doing maintenance of the service facility when the queue is

empty.

In section 12.4 we derive a criterion for optimal policies to be of threshold or thinning

type. For natural cost and reward functions it should be rather easy to verify whether the

criterion applies. However, we could not translate the criterion into one set of conditions

on the cost and reward functions implying all interesting cases. Therefore, in section 12.2

we give a list of natural models (including those we found in the literature) with the structure

of the optimal policy. For models with natural performance measures not included in the

list, we have the impression that similar arguments will apply.

In section 12.3 we show that the existence of an optimal policy that randomizes only in one state follows easily from the corresponding (finite-dimensional) linear program. A similar

result has been obtained by Beutler and Ross [1986], Ross [1989].

The key theorem of the chapter, Key theorem III in section 12.4, gives a relation between randomizing policies and convex combinations of critical level policies. It is a

special case of the Key lemma, which enables us to compute the randomization factor q of

the optimal threshold or thinning policy. This has been a problem of considerable interest

in several papers (cf. Shwartz, Ma & Makowski [1986]). As a corollary of this theorem we

find criteria and sufficient conditions for threshold and thinning optimality.

In section 12.5 an efficient algorithm of order O(m²) is given to compute an optimal policy. It also checks whether a threshold (thinning) policy is optimal. Furthermore, easily verifiable sufficient conditions on the immediate reward and cost rates are derived.

Finally, in section 12.6 the proofs of the assertions in section 12.2 are given.


2. Optimal structure for specific performance measures.

In this section we focus our attention on standard performance measures and in Table 5

we give an overview of various cases in which the optimal policy is of threshold or thinning

type. For all the problems described there, we implicitly assume that there is some policy

which satisfies the constraint. If the constraint is not satisfied by any optimal policy for

the unconstrained problem, the constraint value α equals the average expected cost for

the optimal policy of the constrained problem. For compactness we shall use shorthand

notation in this overview. The proofs can be found in section 12.6.

If we want to maximize the (expected) throughput (max T) we can take νi as the immediate
reward in state i (ri = νi). Indeed, if pi, i ∈ E, is the stationary distribution under a certain
policy, then Σi pi νi gives the throughput. To express the expected delay D we use Little's
formula: D equals the expected number of jobs N divided by the throughput T. The
constraint D ≤ α is then equivalent to the constraint N − αT ≤ 0. In order to get N − αT
as the constraint function, we take (i − ανi) as the immediate cost in state i (ci = i − ανi).
Consider now the following constrained control problem: maximize the throughput while
the expected delay of jobs is not allowed to exceed α. In the third column of the overview
we find that a threshold policy is optimal if νi is concave increasing in i. In our notation
this will be expressed by 'ν concave nondecr. ⇒ TR'.
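As a small illustration of this encoding, the following sketch computes T, N, D and the constraint function N − αT from an assumed, hypothetical stationary distribution p and assumed service rates ν; the numbers below are illustrative only.

```python
import numpy as np

p = np.array([0.3, 0.25, 0.2, 0.15, 0.1])   # assumed stationary distribution p_i
nu = np.array([0.0, 1.0, 1.8, 2.4, 2.8])    # assumed service rates nu_i (nu_0 = 0)
alpha = 0.9                                 # delay bound

T = np.sum(p * nu)                          # throughput:  T = sum_i p_i nu_i
N = np.sum(p * np.arange(len(p)))           # expected number of jobs: N = sum_i i p_i
D = N / T                                   # Little's formula: D = N / T
constraint = N - alpha * T                  # D <= alpha  is equivalent to  N - alpha*T <= 0
print(D <= alpha, constraint <= 0)          # the two tests agree
```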

In the second problem of the list, the constraint is expressed in a cost function C = Σi pi ci.
In the fourth problem we take −N/T as the objective function in order to minimize D.
Setting r1_i = −i and r2_i = νi, −N/T is expressed by Σi pi r1_i / Σi pi r2_i. In the last problem
we have general immediate rewards and cost functions which may depend on the action
a. In the conditions we use the notation

FR(i) = ri0 + (νi/λi−1)(ri−1,1 − ri−1,0) for i > 0,   FR(0) = r00,

and similarly for FC(i).

In the third column we also give the references for the models we found in the literature. In

some of the references assumptions are used that are more restrictive than the conditions

given below.

Finally we introduce some notation. The symbol TR (TH) denotes that a threshold (thinning)
policy is optimal. The monotonicity and curvature properties of the rates are abbreviated
as 'concave nondecr.', 'convex nondecr.', 'nonincr.' and 'nondecr.' for concave nondecreasing,
convex nondecreasing, nonincreasing and nondecreasing respectively. Throughout the
paper increasing will mean strictly increasing. The same applies for decreasing.


     Problem                  Rewards and costs          References and conditions

 1.  max T, s.t. D ≤ α        ri = νi                    Lazar [1983]
                              ci = i − ανi               ν concave nondecr. ⇒ TR
                                                         ν convex nondecr. ⇒ TR

 2.  max T, s.t. C ≤ α        ri = νi                    Schoute [1979], Shwartz et al. [1986]
                              ci                         ν concave nondecr. & c convex nondecr. ⇒ TR
                                                         ν convex nondecr. & c concave nondecr. ⇒ TH

 3.  min N, s.t. T ≥ α        ri = −i                    ν concave nondecr. ⇒ TR
                              ci = −νi                   ν convex nondecr. ⇒ TH

 4.  min D, s.t. T ≥ α        r1_i = −i, r2_i = νi       ν concave nondecr. ⇒ TR
                              ci = −νi                   ν convex nondecr. ⇒ TR and TH

 5.  max R, s.t. C ≤ α        ria                        Beutler & Ross [1986]
                              cia                        FC nondecr. & FR nonincr. on E \ {0} ⇒ TR
                                                         FC nonincr. & FR nondecr. on E \ {0} ⇒ TH

                                          Table 5


3. Optimality of “nearly deterministic” policies.

In this section we show that for our constrained control problem there is always an optimal
policy of the following type: π(l, q, j)∞ is, for l < j and 0 ≤ q ≤ 1, the stationary
randomized policy which rejects arriving jobs when there are at least j jobs present and
admits them when there are at most j − 1 jobs present, except that when precisely l jobs
are in the queue an arriving job is admitted with probability q.

By virtue of Theorem 11.9 there is at most one state in which an optimal policy randomizes,

if there is only one constraint, as is the case in our problem. We prove this below for a

general finite unichain Markov decision chain using the (CLP) and direct arguments from

linear algebra. For the moment, let us assume this to be true. The decision in a transient

state does not influence the stationary distribution. If a policy rejects in state j then the

states j + 1, . . . , m become transient. Consequently, we may restrict the policy set to
those policies that reject customers above a certain level without losing all optimal policies. The

optimal policy randomizes in at most one state, hence it is of π(l, q, j)∞-type. Note that

the π(l, q, j)∞ policy with l = j − 1 is a threshold policy. If l = 0 and j = m, we have a

thinning policy. When q = 0 we get a critical level l policy and when q = 1 a critical level

j policy (notation π(l)∞ and π(j)∞ ).

So, let us consider the (CLP) and let x∗ be an optimal extreme point. By virtue of
Proposition 10.3, π∗∞ with π∗ ∈ S defined through (10.3) is an average optimal policy with
x(π∗) = x∗. Then E(π∗) = {j | Σ_{a∈A(j)} x∗ja > 0} is the set of positive recurrent states
under policy π∗∞.

It is well-known that the columns of the matrix of constraint coefficients which correspond
to the positive variables of a basic solution are linearly independent. As E(π∗) is closed
under P(π∗), we have Piaj = 0 if x∗ia > 0 and j ∉ E(π∗). Moreover, Σ_{j∈E(π∗)}(δij − Piaj) = 0
for (i, a) with x∗ia > 0. Consequently, the number of independent columns corresponding
to the basic variables in the set of equations Σ_{i,a}(δij − Piaj) x∗ia = 0, j ∈ E, is at most
|E(π∗)| − 1. Because the linear programming problem contains 2 constraints more than
the first block of |E| constraints, we conclude that our basic solution contains at most
|E(π∗)| + 1 positive variables. For each i ∈ E(π∗) there is at least one action a such that
x∗ia > 0. Hence there is at most one i ∈ E with more than one a with positive x∗ia; and if
for some i ∈ E there is more than one action a with positive x∗ia, then for this i there are
exactly two such actions. Thus, the optimal decision rule π∗ randomizes on E(π∗) in
at most one state, between at most two actions. Outside the set E(π∗) an optimal decision
rule can be chosen arbitrarily, as long as all states in E \ E(π∗) remain transient,
which is always true under Assumption 10.1 by Lemma 10.2.

So, without loss of generality we may assume π∗ to be deterministic for all states outside

E(π∗). This proves that an optimal policy for a unichain MDC with one extra constraint

randomizes in at most one state and thus Theorem 11.9 is established. Similarly one shows

that for k extra constraints randomization takes place in at most k states.


By specifying the reward, cost and transition structure, the linear program can be used

to compute an optimal policy. However, a more efficient algorithm will be presented in

section 12.5.
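The following is a minimal sketch of such a linear-programming computation, with hypothetical data: uniformized birth-death transition probabilities built from assumed arrival rates λi and service rates νi, reward rate ri = νi (throughput) and cost rate ci = i with an assumed constraint level α. It only illustrates the state-action frequency formulation; the algorithm of section 12.5 is preferable in practice.

```python
import numpy as np
from scipy.optimize import linprog

m = 10                                              # assumed queue capacity
lam = np.full(m + 1, 0.4)                           # assumed arrival rates (uniformized)
nu = np.r_[0.0, np.full(m, 0.6)]                    # assumed service rates, nu_0 = 0

def row(i, a):
    """Uniformized transition probabilities out of state i under action a (admit iff a = 1)."""
    p = np.zeros(m + 1)
    up = lam[i] * a if i < m else 0.0
    p[min(i + 1, m)] += up
    p[max(i - 1, 0)] += nu[i]
    p[i] += 1.0 - up - nu[i]
    return p

pairs = [(i, a) for i in range(m + 1) for a in (0, 1)]
r = np.array([nu[i] for i, _ in pairs])             # reward rate r_i = nu_i  (throughput)
c = np.array([float(i) for i, _ in pairs])          # cost rate  c_i = i      (queue length)
alpha = 1.0                                         # assumed constraint level:  C <= alpha

# equality block: sum_{i,a} (delta_ij - P_iaj) x_ia = 0 for all j, and sum_{i,a} x_ia = 1
A_eq = np.vstack([[(j == i) - row(i, a)[j] for (i, a) in pairs] for j in range(m + 1)]
                 + [[1.0] * len(pairs)])
b_eq = np.r_[np.zeros(m + 1), 1.0]
res = linprog(-r, A_ub=[c], b_ub=[alpha], A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(m + 1, 2)
print("randomizing states:", [i for i in range(m + 1) if np.sum(x[i] > 1e-8) > 1])
```

In line with the argument above, for one extra constraint one expects at most one state in the reported list.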

4. Criteria for threshold and thinning optimality.

In this section we present the key result of this chapter. Recall that π(k)∞ is the critical level
k policy and π(l, q, j)∞ is the policy with critical level j that randomizes with probability
q in state l ≤ j. The Key lemma states that the joint stationary distribution on state i
and action a for policy π(l, q, j)∞ is a convex combination of the stationary distributions
for π(l)∞ and π(j)∞. With this lemma we can express the expected reward and/or cost
under a randomized policy as a convex combination of the corresponding quantities for
critical level policies. The Key theorem of this chapter gives simple expressions for the
weights in the convex combinations. This allows an easy derivation of a criterion for a
threshold or a thinning policy to be optimal.

Let Ri and Ci denote the expected reward and cost when using the critical level policy

π(i)∞. In Theorems 12.1 and 12.2 we collect simple, sufficient conditions for the desired

optimal structure, which are formulated in terms of the Ri’s and Ci’s. In the next section

we will derive sufficient conditions on the direct reward and cost structure.

We use the notation

ρj = Π_{i=1}^{j} λi−1/νi for j > 0,   ρ0 = 1,   and   nk = (Σ_{i=0}^{k} ρi)^{−1}.

It is well-known from queueing theory (cf. Cooper [1981]) that ρj nk, j = 0, . . . , k, are the
stationary probabilities for the critical level k policy.
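A small numerical sketch of these quantities, with assumed, hypothetical rates λi and νi; it also checks the detailed-balance relation λk ρk = νk+1 ρk+1 that is used in the proof of Theorem 12.3 below.

```python
import numpy as np

lam = [0.4] * 10                 # assumed arrival rates lambda_0, ..., lambda_9
nu = [0.0] + [0.6] * 10          # assumed service rates nu_1, ..., nu_10 (nu_0 unused)

def rho(j):                      # rho_0 = 1, rho_j = prod_{i=1}^{j} lam_{i-1}/nu_i
    out = 1.0
    for i in range(1, j + 1):
        out *= lam[i - 1] / nu[i]
    return out

k = 5
rhos = np.array([rho(j) for j in range(k + 1)])
n_k = 1.0 / rhos.sum()           # n_k = (rho_0 + ... + rho_k)^(-1)
p = rhos * n_k                   # stationary probabilities of the critical level k policy
# detailed balance of the resulting birth-death chain: lam_j p_j = nu_{j+1} p_{j+1}
assert all(abs(lam[j] * p[j] - nu[j + 1] * p[j + 1]) < 1e-12 for j in range(k))
print(p, p.sum())                # p sums to 1
```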

Key theorem III  For all l ≤ j, i ≤ j, a ∈ {0, 1} and 0 ≤ q ≤ 1 the following equality
holds:

x_ia(π(l, q, j)) = q* x_ia(π(j)) + (1 − q*) x_ia(π(l)),   with   q* = q nl / (q nl + (1 − q) nj).   (12.1)

Proof: Apply the Key lemma and split in state l. Then we obtain

q* = q IE_{l,π(j)∞}T / (q IE_{l,π(j)∞}T + (1 − q) IE_{l,π(l)∞}T),   (12.2)

where the random variable T describes the time between two visits to state l. By the
Ergodic theorem IE_{l,π(j)∞}T = 1/x_l(π(j)) = 1/(ρl nj), and similarly for policy π(l)∞. Insertion
of these expressions in (12.2) yields

q* = (q/(ρl nj)) / (q/(ρl nj) + (1 − q)/(ρl nl)) = q nl / (q nl + (1 − q) nj).
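A numerical sketch of the statement for an assumed uniformized birth-death model with hypothetical rates and hypothetical choices of l, j and q: the stationary state-action frequencies of π(l, q, j)∞ are computed directly and compared with the convex combination of (12.1).

```python
import numpy as np

m, l, j, q = 10, 2, 6, 0.3                         # assumed capacity and policy parameters
lam = np.full(m + 1, 0.4)                          # assumed arrival rates (uniformized)
nu = np.r_[0.0, np.full(m, 0.6)]                   # assumed service rates

def freqs(admit):
    """State-action frequencies x_ia of the stationary policy 'admit[i] = P(admit in i)'."""
    P = np.zeros((m + 1, m + 1))
    for i in range(m + 1):
        up = lam[i] * admit[i] if i < m else 0.0
        P[i, min(i + 1, m)] += up
        P[i, max(i - 1, 0)] += nu[i]
        P[i, i] += 1.0 - up - nu[i]
    A = np.vstack([P.T - np.eye(m + 1), np.ones(m + 1)])
    p = np.linalg.lstsq(A, np.r_[np.zeros(m + 1), 1.0], rcond=None)[0]
    return np.column_stack([p * (1 - np.array(admit)), p * np.array(admit)])  # x_i0, x_i1

def critical(k):                                   # critical level k policy: admit iff i < k
    return [1.0 if i < k else 0.0 for i in range(m + 1)]

rho = np.cumprod(np.r_[1.0, lam[:m] / nu[1:]])     # rho_0, ..., rho_m
n = 1.0 / np.cumsum(rho)                           # n_0, ..., n_m
q_star = q * n[l] / (q * n[l] + (1 - q) * n[j])

mixed = critical(j); mixed[l] = q                  # pi(l, q, j)
lhs = freqs(mixed)
rhs = q_star * freqs(critical(j)) + (1 - q_star) * freqs(critical(l))
print(np.allclose(lhs, rhs))                       # True, as claimed by Key theorem III
```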


As R(π) = Σ_{(i,a)} ria xia(π), and C(π) can be expressed similarly, we have the following
picture. Note that nk is decreasing in k.

[Figure 4: the points 1 = n0, n1, . . . , nl, . . . , nj, . . . , nm, 0 on a horizontal axis; above nl
and nj the values Rl and Rj, and above nl + q*(nj − nl) the value R(π(l, q, j)).]

Figure 4

So R(π(l, q, j)) is precisely a convex combination of Rl and Rj. Now, suppose that Ci ≤ α
for i ≤ j and Cl > α for l > j. Then π(j, q1, j + 1)∞ is the unique threshold policy with
expected cost equal to the constraint level α, if q1 is related to q*1 via (12.1) and if

q*1 = (α − Cj)/(Cj+1 − Cj).

In the next criterion we also assume that Ri is nondecreasing in i. Extensions to other
cases are easily made, but difficult to state in general. We will use the following notation:

R(i, l) := Ri + (α − Ci)(Rl − Ri)/(Cl − Ci),   i ≤ l.   (12.3)

For q related to q* = (α − Ci)/(Cl − Ci) via (12.1), R(i, l) denotes the reward under the
policy π(i, q, l)∞. The expected cost under this policy is equal to α. This follows from
the representation given in Figure 4: the corresponding C-line segment intersects the
α-level between ni and nl at the point ni + q*(nl − ni). As a consequence of Key theorem III,
R(i, l) is equal to R(π(i, q, l)).

Criterion 1  The threshold policy π(j, q1, j + 1)∞ is optimal if and only if for all i ≤ j
and l ≥ j + 1

R(i, l) ≤ R(j, j + 1).   (12.4)

Proof: As Ri is nondecreasing in i, any policy π(i, q, l)∞ with l ≤ j can be majorized by one
with l ≥ j + 1. R(π(i, q, l)) and C(π(i, q, l)) are nondecreasing in q for i ≤ j and l ≥ j + 1.
If we take q*2 = (α − Ci)/(Cl − Ci) then C(π(i, q2, l)) = α and R(π(i, q2, l)) = R(i, l).
The right hand side of (12.4) is equal to the expected reward for the threshold policy.
The inequality then gives the result that the threshold policy is at least as good as all
randomized policies (see section 12.3) that satisfy the constraint.

In the same way a criterion for the optimality of the thinning policy π(0, q2, m)∞ with
q*2 = (α − C0)/(Cm − C0) can be formulated.


Criterion 2  The thinning policy π(0, q2, m)∞ is optimal if and only if for all i ≤ j and
l ≥ j + 1

R(i, l) ≤ R(0, m).   (12.5)

From these criteria we find the following sufficient conditions in case Ri and Ci are in-

creasing in i. We will use the notation ∆Ri and ∆Ci for Ri+1 −Ri and Ci+1 − Ci.

Theorem 12.1  If for all i = 0, . . . , m − 2

∆Ri+1/∆Ri ≤ (≥) ∆Ci+1/∆Ci   (12.6)

then the unique threshold (thinning) policy is optimal.

Proof: We give the proof for the ≤-sign; the proof of the other case goes along the same
lines. As both Ri and Ci are increasing in i, the assumptions of the criteria are satisfied.
As before, let j be such that Cj ≤ α < Cj+1. In order to show (12.4) we prove that

R(i, l) ≤ R(j, l) ≤ R(j, j + 1).

In fact we only derive the last inequality; the first can be derived analogously. Using
relation (12.3) it remains to be shown that

(Rl − Rj)/(Cl − Cj) ≤ ∆Rj/∆Cj.

From (12.6) it easily follows that

∆Rj+k/∆Rj ≤ ∆Cj+k/∆Cj.

From

(Rl − Rj)/∆Rj = Σ_{k=0}^{l−j−1} ∆Rj+k/∆Rj,

and similarly for the Ci's, the desired inequality follows.

In the next section we derive conditions implying convexity or concavity of the Ri and Ci as functions of the ni. Here we use convexity or concavity in i.

Theorem 12.2  If Ri is concave (convex) increasing and Ci is convex (concave) increasing
in i, then there is a unique optimal threshold (thinning) policy.

Proof: If Ri and Ci are concave and convex increasing in i respectively, then

∆Ri+1/∆Ri ≤ 1 ≤ ∆Ci+1/∆Ci.

Hence, Theorem 12.1 applies. The reversed case is proved similarly.
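A small sketch that applies Theorem 12.1 directly: given assumed, hypothetical vectors of expected rewards Ri and costs Ci that are increasing in i, it checks (12.6) and reports which structure is implied.

```python
import numpy as np

R = np.array([0.0, 1.0, 1.8, 2.4, 2.8, 3.0])   # assumed R_i, concave increasing
C = np.array([0.0, 0.5, 1.1, 1.8, 2.6, 3.5])   # assumed C_i, convex increasing

dR, dC = np.diff(R), np.diff(C)
ratios_R, ratios_C = dR[1:] / dR[:-1], dC[1:] / dC[:-1]
if np.all(ratios_R <= ratios_C):
    print("condition (12.6) with <= holds: the unique threshold policy is optimal")
elif np.all(ratios_R >= ratios_C):
    print("condition (12.6) with >= holds: the unique thinning policy is optimal")
else:
    print("(12.6) fails in both directions: use Criterion 1/2 or the algorithm of section 12.5")
```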

Remark 12.1: If we allow Ri and Ci to be non-decreasing, the results of Theorems 12.1

and 12.2 still hold with some extra conditions on the functions Ri and Ci. It will be clear


from the analysis in section 12.6 what kind of extra conditions have to be imposed.

Remark 12.2: If we assume Ri,Ci to be decreasing, the same kind of results evidently hold.

One only needs to require the two functions to be nonincreasing, in order to prove that

Criteria 1 and 2 hold. However, under (12.6) or the equivalent assumption from Theorem

12.2, we have thinning optimality instead of threshold optimality and vice versa.

5. Algorithm and sufficient conditions.

In this section, in Theorem 12.3, we give expressions for the ∆Ri and ∆Ci in terms of
the immediate rewards and costs. These expressions allow us to construct an algorithm
of order O(m²) to compute an optimal policy. This algorithm does not assume the
optimality of a threshold policy; on the contrary, it also checks whether a threshold or
thinning policy is optimal.

The section concludes with easily verifiable conditions for threshold or thinning optimality,
also in terms of the direct rewards and costs, namely inequalities (12.13) and (12.14). These
conditions are only sufficient in case the functions FR and FC are increasing (or decreasing,
but in that case we have the reversed results, as already mentioned in Remark 12.2). If we
allow more general functions FR and FC, it is difficult to state sufficient conditions, but
basically these will reduce to a version of the inequalities (12.13) or (12.14).

Theorem 12.3  For i = 0, . . . , m − 1

∆Ri = ni+1 ni ρi+1 Σ_{k=0}^{i} ∆FR(k)/nk   (12.7)

and

∆Ci = ni+1 ni ρi+1 Σ_{k=0}^{i} ∆FC(k)/nk.   (12.8)

Proof: Write c̄i := ci0 and ĉi := (1/λi)(ci1 − ci0); then ci1 = c̄i + λi ĉi and FC(i) = c̄i + νi ĉi−1
for i ≥ 1. Hence

Ci = ni (Σ_{k=0}^{i−1} ck1 ρk + ci0 ρi) = ni (Σ_{k=0}^{i−1} (c̄k + λk ĉk) ρk + c̄i ρi)

   = ni (Σ_{k=0}^{i−1} λk ĉk ρk + Σ_{k=0}^{i} c̄k ρk)

   = ni (Σ_{k=1}^{i} ĉk−1 νk ρk + Σ_{k=0}^{i} c̄k ρk) = ni Σ_{k=0}^{i} ρk FC(k),   (12.9)

where in the fourth and fifth equality we used λk ρk = νk+1 ρk+1 and FC(0) = c̄0. With
this expression we find


∆Ci = ni+1 Σ_{k=0}^{i+1} ρk FC(k) − ni Σ_{k=0}^{i} ρk FC(k)

    = ni+1 ni (Σ_{l=0}^{i} ρl Σ_{k=0}^{i+1} ρk FC(k) − Σ_{l=0}^{i+1} ρl Σ_{k=0}^{i} ρk FC(k))

    = ni+1 ni (ρi+1 FC(i + 1) Σ_{l=0}^{i} ρl − ρi+1 Σ_{k=0}^{i} ρk FC(k))

    = ni+1 ni ρi+1 Σ_{k=0}^{i} ρk (FC(i + 1) − FC(k))   (12.10)

    = ni+1 ni ρi+1 Σ_{k=0}^{i} ρk Σ_{l=k}^{i} ∆FC(l) = ni+1 ni ρi+1 Σ_{l=0}^{i} ∆FC(l) Σ_{k=0}^{l} ρk

    = ni+1 ni ρi+1 Σ_{l=0}^{i} ∆FC(l)/nl.

Note from relation (12.9) that the average cost under policy π(i)∞ is equal to the average
cost under π(i)∞ in a system with cost rates FC(k) in state k. From this and relations
(12.8) and (12.10) the following equivalences can be obtained:

Corollary 12.1

∆Ci > (=) 0 ⟺ FC(i + 1) > (=) Ci.   (12.11)

Assume now that the Ci and Ri are nondecreasing in i, which is the case for many problems
in practice. Of course more complicated cases can be treated analogously. As the ni and
ρi, i = 0, . . . , m, can be computed recursively, (12.7) and (12.8) give a recursive scheme
for computing the Ri and Ci, and we have the following algorithm to compute an optimal
policy.

Algorithm.

Step 1. Compute recursively C0 = 0, C1, . . . until j with Cj ≤ α < Cj+1.
        In case Cm ≤ α, the policy that always admits the customers (i.e. the critical
        level policy π(m)∞) is optimal.

Step 2. Compute

        q* = (α − Cj)/(Cj+1 − Cj)   and   q = q* nj+1 / (q* nj+1 + (1 − q*) nj).


        Then from Key theorem III it follows that π(j, q, j + 1)∞ is a threshold policy
        that satisfies the constraint. This policy is a critical level policy if Cj = α. It is
        unique if Cj < α < Cj+1.

Step 3. Compute recursively the Ri, i = 0, . . . , m.
        Use relation (12.3) to compute the R(i, l) for i ≤ j and l ≥ j + 1.
        Find (io, lo) such that

        (io, lo) ∈ argmax{ R(i, l) | i ≤ j, l ≥ j + 1 }.

        In case R(j, j + 1) = R(io, lo), the threshold policy π(j, q, j + 1)∞ is optimal. If
        not, compute

        q*o = (α − Cio)/(Clo − Cio)   and   qo = q*o nlo / (q*o nlo + (1 − q*o) nio).

        The policy π(io, qo, lo)∞ is optimal.

Note that the algorithm is of order O(m), if we know beforehand that there is a threshold

optimal policy.
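A sketch of this algorithm under the assumptions of this section (Ci and Ri nondecreasing) and for hypothetical input data: lam[i] and nu[i] are assumed birth and death rates, FR and FC the vectors of section 12.2, and alpha the constraint level. The double loop in Step 3 gives the O(m²) complexity; all names below are illustrative only.

```python
import numpy as np

def optimal_policy(lam, nu, FR, FC, alpha):
    """Compute (type, io, q, lo): randomize with prob. q in state io, critical level lo."""
    m = len(FR) - 1
    rho = np.cumprod(np.r_[1.0, [lam[i - 1] / nu[i] for i in range(1, m + 1)]])
    n = 1.0 / np.cumsum(rho)                       # n_k = (rho_0 + ... + rho_k)^(-1)
    dFR, dFC = np.diff(FR), np.diff(FC)
    dR = [n[i + 1] * n[i] * rho[i + 1] * np.sum(dFR[:i + 1] / n[:i + 1]) for i in range(m)]
    dC = [n[i + 1] * n[i] * rho[i + 1] * np.sum(dFC[:i + 1] / n[:i + 1]) for i in range(m)]
    R = np.r_[FR[0], FR[0] + np.cumsum(dR)]        # R_i via (12.7), R_0 = F_R(0)
    C = np.r_[FC[0], FC[0] + np.cumsum(dC)]        # C_i via (12.8), C_0 = F_C(0)
    if C[m] <= alpha:                              # Step 1: the constraint never binds
        return ('critical level', m, 1.0, m)
    j = int(np.max(np.where(C <= alpha)[0]))       # C_j <= alpha < C_{j+1} (C nondecreasing)
    best, (io, lo) = -np.inf, (j, j + 1)           # Step 3: maximize R(i, l) of (12.3)
    for i in range(j + 1):
        for l in range(j + 1, m + 1):
            val = R[i] + (alpha - C[i]) * (R[l] - R[i]) / (C[l] - C[i])
            if val > best:
                best, (io, lo) = val, (i, l)
    qs = (alpha - C[io]) / (C[lo] - C[io])                   # weight q*
    q = qs * n[lo] / (qs * n[lo] + (1 - qs) * n[io])         # randomization probability (via (12.1))
    return ('threshold' if (io, lo) == (j, j + 1) else 'randomized', io, q, lo)
```

For instance, one could call optimal_policy with lam = [0.4]*10, nu = [0.0] + [0.6]*10, FR equal to the service rates (throughput) and FC(i) = i (queue length); these data are purely hypothetical.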

We saw in section 12.4 (cf. Figure 4) that the expected rewards and costs under randomized
strategies are convex combinations of the expected rewards and costs under critical
level policies. We used the ni's as a scale on the horizontal axis. As they are decreasing,
we will introduce n̄i := 1 − ni, i = 0, . . . , m, and write ∆n̄i = n̄i+1 − n̄i. In Beutler and
Ross [1986] it is assumed that FC(i) and FR(i) are nondecreasing and decreasing respectively,
for i ≥ 1. Note that they denote FC(i) and FR(i) by V(i + 1) and Q(i + 1). It is interesting
to note that Theorem 12.3 clarifies this assumption. Indeed, it implies that Ci and Ri are
convex and concave on the n̄i's. To see this, we make use of ∆n̄i = ni ni+1 ρi+1 and

∆Ci/∆n̄i < (=) ∆Ci+1/∆n̄i+1 ⟺ ∆FC(i + 1) > (=) 0,   0 ≤ i ≤ m − 2.   (12.12)

So, if FC is nondecreasing then ∆Ci ≥ 0; and if ∆FC(l) > 0 then Ci is increasing from l
on. In particular for l = 0, we have that Ci is increasing (and convex on the n̄i's). Similar
remarks are valid for the Ri. Thus it is easy to check with Theorem 12.3 whether Ci and Ri
are convex on the n̄i and increasing.

We conclude this section with some conditions on the immediate rewards and costs implying
relation (12.6); thus they guarantee optimality of a threshold or thinning policy. For
the sake of simplicity we again assume the Ci and Ri to be increasing. Extensions to specific
other cases are not essentially more difficult, but it seems difficult to include various cases and,


at the same time, to keep the argumentation short and clear. Using (12.7) and (12.8) we
find the following equivalent inequalities:

∆Ri+1/∆Ri ≤ ∆Ci+1/∆Ci ⟺ ∆FR(i + 1)/(Σ_{k=0}^{i} ∆FR(k)/nk) ≤ ∆FC(i + 1)/(Σ_{k=0}^{i} ∆FC(k)/nk).   (12.13)

A sufficient condition for the last inequality is

∆FC(k)/∆FC(i + 1) ≤ ∆FR(k)/∆FR(i + 1),   ∀k ≤ i.

If FC and FR are increasing, this is implied by

∆FC(k)/∆FC(k + 1) ≤ ∆FR(k)/∆FR(k + 1),   k = 0, . . . , m − 2.   (12.14)

Hence inequality (12.14) is sufficient for the optimality of a threshold policy. With reversed
inequality sign the same is true for the thinning policy. For some of the natural
performance measures from section 12.2 we have FR(i) = ri and FC(i) = ci. For these
cases it is sufficient to require

∆ci/∆ci+1 ≤ ∆ri/∆ri+1,   i = 0, . . . , m − 2.   (12.15)

This inequality is clearly satisfied if ci is convex nondecreasing, ri is concave nondecreasing
and if both are increasing functions.
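The following sketch checks the sufficient conditions (12.14)/(12.15) for assumed, hypothetical reward and cost rates r and c (action-independent, so that FR = r and FC = c), and reports which structure they imply.

```python
import numpy as np

r = np.array([0.0, 1.0, 1.9, 2.7, 3.4, 4.0])   # assumed reward rates r_i, concave increasing
c = np.array([0.0, 1.0, 2.1, 3.3, 4.6, 6.0])   # assumed cost rates c_i, convex increasing

dr, dc = np.diff(r), np.diff(c)                 # Delta r_i, Delta c_i
lhs, rhs = dc[:-1] / dc[1:], dr[:-1] / dr[1:]   # (12.15): Delta c_i/Delta c_{i+1} <= Delta r_i/Delta r_{i+1}
if np.all(lhs <= rhs):
    print("(12.15) holds: a threshold policy is optimal")
elif np.all(lhs >= rhs):
    print("reversed (12.15) holds: a thinning policy is optimal")
else:
    print("neither direction holds uniformly; fall back on the algorithm of this section")
```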

Of course, inequality (12.6) (or (12.13)) is also implied when Ri is concave on the n̄i
and Ci convex on the n̄i. Therefore it is of interest to know under what conditions the
Ri are concave on the n̄i and increasing. From (12.12) we need ∆FR(i) ≤ 0, i ≥ 1. If
∆FR(0) ≤ 0, it is evident that the Ri are nonincreasing. Furthermore, if FR(l) < FR(0),
then, applying (12.10), we have ∆Rl−1 < 0. Hence, if FR(i) is nonincreasing for i ≥ 1, a
necessary condition for Ri to be increasing is:

∆FR(0) > 0   and   FR(i) ≥ FR(0), ∀i ∈ E.

Unfortunately we did not succeed in deriving a simple sufficient condition.

6. Proofs of the optimal structure of the models from section 12.2.

In this section we give the proofs for the models from section 12.2. Where possible we will use the
criteria and sufficient conditions already derived in the previous sections. All these proofs
will be given under the following natural assumption:

Assumption.

1. There are policies that satisfy the constraint.

2. No optimal policy for the unconstrained problem is admissible for the constrained
problem (otherwise a critical level policy is trivially optimal).

For the sake of clarity, 'convex on n̄' and 'concave on n̄' will denote convexity and concavity
of the Ri and Ci regarded as functions of the n̄i.


Model 1:  max T subject to D ≤ α.

With ri = νi, ci = i − ανi this problem is equivalent to: max R subject to C ≤ 0 =: β.

In this model FR(i) = νi. As νi is nondecreasing, we have ∆FR(i) ≥ 0 for all i ∈ E;
consequently Ri is convex on n̄ and nondecreasing. Consider now FC(i) = i − ανi. Let us
first assume that ν is concave nondecreasing. As ∆FC(i) = 1 − α∆νi and ∆ν is nonincreasing,

∆FC(l) ≥ 0 ⇒ ∆FC(i) ≥ 0, ∀i ≥ l.   (12.16)

[Figures 5, 6 and 7: sketches of the Ri and Ci against the n̄-scale with constraint level
β = 0. Figure 5 shows Case 1 below; Figure 6 the trivial situation of Case 2 (Cm < 0);
Figure 7 the non-trivial situation of Case 2, with Cj ≤ 0 < Cj+1 and R(j, j + 1) indicated.]

Figure 5                         Figure 6                         Figure 7

We will discern two cases.

Case 1: ∆FC(0) = 1 − α∆ν0 ≥ 0, that is, ∆ν0 ≤ 1/α.

From (12.16) it follows that ∆FC(i) ≥ 0 for all i, and thus Ci is nondecreasing and convex
on n̄. Let i0 := min{i ∈ E | ∆FC(i) > 0}. C0 = 0, as ν0 = 0. Therefore C0 = . . . = Ci0 = 0,
and Ci is increasing from i0 on. As the constraint value β is equal to 0, it is clear from
Figure 5 that π(i0)∞ is optimal. Formally this can be proved using Criterion 1 with j = i0,
because all policies π(i, q, l)∞ with i ≤ i0, l ≥ i0 + 1 and cost equal to or less than β = 0
have q* = 0, and consequently q = 0.


Case 2: ∆FC(0) = 1 − α∆ν0 < 0, that is, ∆ν0 > 1/α.

As ∆ν is nonincreasing, it is possible that there is an i1 ∈ E with ∆νi1−1 ≥ 1/α, ∆νi1 < 1/α.
This is equivalent to ∆FC(i) ≤ 0, i ≤ i1, and ∆FC(i) ≥ 0, i > i1; which means that Ci is
concave on n̄ and decreasing for i ∈ {0, . . . , i1}, and convex on n̄ for i ≥ i1. We have the
trivial result that π(m)∞ is optimal if either ∆νi ≥ 1/α for all i (i.e. i1 = m), or i1 < m
but Cm < 0 (see Figure 6). The non-trivial situation is sketched in Figure 7.

Let j be as in Criterion 1, that is Cj ≤ 0, Cj+1 > 0. It is straightforward to prove that the
result of the criterion is valid for this model. Taking i ≤ j, l ≥ j + 1 and im = min{k :
∆Ck > 0} we prove

i. R(i, l) ≤ R(i + 1, l) ≤ . . . ≤ R(im, l), i ≤ im.

ii. R(i, l) ≤ R(j, j + 1), i ≥ im.

Proof of (i).

Similarly to (12.3) we can write

R(i, l) = Rl + (β − Cl)(Ri − Rl)/(Ci − Cl).   (12.17)

With β − Cl < 0, the following inequality holds:

R(i, l) ≤ R(i + 1, l) ⟺ (Ri − Rl)/(Ci − Cl) ≥ (Ri+1 − Rl)/(Ci+1 − Cl)
                      ⟺ (Ri − Rl)/(Ri+1 − Rl) ≥ (Ci − Cl)/(Ci+1 − Cl).   (12.18)

(12.18) is true as (Ri − Rl)/(Ri+1 − Rl) ≥ 1 and (Ci − Cl)/(Ci+1 − Cl) ≤ 1 for i ≤ i1.

Proof of (ii).

Ci and Ri are convex on n̄ and increasing on {im, . . . , m}. In order to get the desired result
from Theorem 12.1, we will check (12.13) for i ≥ im. In terms of this model, (12.13) reads

∆νi+1 / Σ_{k=0}^{i} ∆νk/nk ≤ (1 − α∆νi+1) / Σ_{k=0}^{i} (1 − α∆νk)/nk.

Let i2 := max{i : ∆νi ≠ 0}. As ∆ν is nonincreasing and ν nondecreasing, we know that
∆νk = 0 for all k ≥ i2 + 1; hence (12.13) holds for i ≥ i2. For i ∈ {im, . . . , i2 − 1}, (12.13)
is equivalent to

Σ_{k=0}^{i} (∆FR(k)/nk) / ∆FR(i + 1) ≥ Σ_{k=0}^{i} (∆FC(k)/nk) / ∆FC(i + 1).

As Σ_{k=0}^{im−1} ∆FC(k)/nk ≤ 0 and Σ_{k=0}^{im−1} ∆FR(k)/nk > 0, it is sufficient to prove

Σ_{k=im}^{i} (∆FR(k)/nk) / ∆FR(i + 1) ≥ Σ_{k=im}^{i} (∆FC(k)/nk) / ∆FC(i + 1).


And this is implied by

∆FR(k)/∆FR(k + 1) ≥ ∆FC(k)/∆FC(k + 1),   im ≤ k ≤ i
⟺ ∆νk/∆νk+1 ≥ (1 − α∆νk)/(1 − α∆νk+1),   im ≤ k ≤ i
⟺ ∆νk ≥ ∆νk+1,   im ≤ k ≤ i.

In case ν is convex nondecreasing, the following results can be obtained analogously:

1. ∆ν0 ≥ 1/α. Then ∆νi ≥ 1/α for all i, and Ci is nonincreasing and concave on n̄.
All critical level policies are admissible, and π(m)∞ is trivially optimal.

2. ∆ν0 < 1/α, ∆νm−1 = maxi ∆νi ≤ 1/α. Then Ci is increasing. The only admissible
policy is π(0)∞.

3. ∆ν0 < 1/α, ∆νm−1 > 1/α. Now there is an i0 ∈ E with Ci increasing and convex on n̄
for i ≤ i0, and Ci concave on n̄ for i > i0. Clearly, if Cm ≤ 0 then π(m)∞ is optimal. If
Cm > 0, then π(0)∞ is optimal.

As all optimal policies in these three cases are critical level policies, the result already
mentioned in Table 5, section 12.2, evidently follows.

Model 2:  max T subject to C ≤ α.

With ri = νi and ci arbitrary, we have a model with FR(i) = νi and FC(i) = ci. If c and ν
are nondecreasing, then FR and FC are nondecreasing, and also Ci and Ri are nondecreasing
and convex on n̄. In case we also require ν and c to be increasing, the desired results follow
directly from inequalities (12.13), (12.14) and (12.15) and Theorem 12.1, i.e.

i. ν concave nondecr. and c convex nondecr. ⇒ the threshold (j + 1, q) policy with average cost equal to α is optimal.

ii. ν convex nondecr. and c concave nondecr. ⇒ the thinning (q) policy with average cost equal to α is optimal.

As we allow ∆νi = 0 and ∆ci = 0, we have to verify that the results remain valid. We will
only check case (i): ν concave nondecr., c convex nondecr. Now it is possible that ∆νi = 0
for i ≥ i0, and ∆ci = 0 for i ≤ i1. This is equivalent to

∆Rk/∆n̄k = ∆Rk−1/∆n̄k−1, k ≥ i0,
0 = C0 = . . . = Ci1 < Ci1+1 < . . . < Cm.

In order to prove for i ≤ j, l ≥ j + 1

1. R(i, l) ≤ R(i + 1, l) ≤ . . . ≤ R(i1, l), i ≤ i1,

2. R(i, l) ≤ R(j, j + 1), i ≥ i1,

we can use the same arguments as in the proof of Model 1, Case 2.


Result (i) is a generalization of the result by Schoute [1979], who took νi = iν, λi = λ and
c convex nondecreasing; and of the result by Shwartz, Ma & Makowski [1986], who took
νi = ν, λi = λ and ci = i.

Model 3:  min N subject to T ≥ α.

With ri = −i, ci = −νi this problem is equivalent to: max R subject to C ≤ −α.

From FR(i) = ri we have ∆FR(i) = −1. That means that Ri is decreasing and concave on n̄.
As FC(i) = −νi and ν is nondecreasing, ∆FC(i) = −∆νi ≤ 0, so Ci is nonincreasing and
concave on n̄; Ci is decreasing if ν is increasing.

According to Remark 12.2 and the results from section 12.4, the optimality of the unique
threshold (j + 1, q) policy with Cj ≥ −α, Cj+1 < −α and average cost equal to −α can
be proved by checking

∆Ri+1/∆Ri ≥ ∆Ci+1/∆Ci,   i = 0, . . . , m − 2,

or (12.13) with reversed inequality sign. This is easy to check. For increasing ν we have
the following equivalence:

(−1)/(−1) = ∆ri+1/∆ri ≥ ∆ci+1/∆ci = ∆νi+1/∆νi, ∀i ≤ m − 2 ⟺ ν concave.

Note that in this case the concavity of ν is also a necessary condition for (12.6).

The result for ν convex nondecreasing can be proved with the same arguments.

Model 4:  min D subject to T ≥ α, α > 0.

With r1_i = −i, r2_i = νi and ci = −νi this problem is equivalent to: max R/(−C) subject
to C ≤ −α, where −α < 0.

Here the functions R and C are the same as in Model 3.


From the results for the previous model it is clear that the same policy is optimal for the
following two problems, if we take β ∈ [Cm, C0):

Problem 1:  max R/(−C) subject to C = β,    and    Problem 2:  max R subject to C ≤ β.

Assuming ν to be concave nondecreasing, we saw that Ci is decreasing. Hence, for all
β ∈ [Cl+1, Cl] the optimal policy for Problem 1 is given by π(l, qβ, l + 1)∞, with qβ the
unique solution of C(π(l, q, l + 1)) = β.

Let β vary from Cm to −α. We will prove that the optimal threshold (j + 1, q) policy for
Model 3 is optimal for Model 4. It is sufficient to prove for β1, β2 ∈ [Ck+1, Ck], k ≥ j,
with −α ≥ β1 > β2, that

R(π(k, qβ1, k + 1)) / (−C(π(k, qβ1, k + 1))) ≥ R(π(k, qβ2, k + 1)) / (−C(π(k, qβ2, k + 1))).   (12.19)

Write

R(π(k, qβi, k + 1)) = Σ_{l=0}^{k−1} ∆Rl + γβi ∆Rk,
C(π(k, qβi, k + 1)) = Σ_{l=0}^{k−1} ∆Cl + γβi ∆Ck,   (12.20)

with γβi = (βi − Ck)/∆Ck, i = 1, 2. Consequently, (12.19) holds

⟺ (Σ_{l=0}^{k−1} ∆Rl + γβ1 ∆Rk)(Σ_{l=0}^{k−1} ∆Cl + γβ2 ∆Ck) ≤ (Σ_{l=0}^{k−1} ∆Cl + γβ1 ∆Ck)(Σ_{l=0}^{k−1} ∆Rl + γβ2 ∆Rk)

⟺ (γβ2 − γβ1) Σ_{l=0}^{k−1} ∆Rl ∆Ck ≤ (γβ2 − γβ1) Σ_{l=0}^{k−1} ∆Cl ∆Rk.

The last inequality is clearly valid as γβ1 ≤ γβ2 and as for the previous model ∆Ck/∆Cl ≤
∆Rk/∆Rl, l ≤ k, was proved to hold.

Let us now assume ν to be convex nondecreasing. It is evident that in this case the thinning
policy π(0, qβ, m)∞ with C(π(0, qβ, m)) = β optimizes Problem 1. It turns out that each admissible thinning


policy is optimal, in particular π(m)∞. As this is the policy that admits all customers, there
is also a threshold optimal policy. To see this, we prove

R(π(0, q1, m))/C(π(0, q1, m)) = R(π(0, q2, m))/C(π(0, q2, m)),   ∀q1, q2 > 0.

Using an expression similar to (12.20) we get

R(π(0, q, m))/C(π(0, q, m)) = (R0 + γq(Rm − R0))/(C0 + γq(Cm − C0)) = (Rm − R0)/(Cm − C0),   ∀q > 0.

Models 1 to 4 were models in which the functions Ci and Ri were globally both nonincreasing
or nondecreasing, and convex on n̄ or concave on n̄. In the next model we treat the case
that one of them is concave on n̄ and the other convex on n̄. In the proofs we will assume
that all policies π(i, q, l)∞ that we make use of are admissible policies. In fact, generalizations
of Criteria 1 and 2 can be made, but we will not do that. Instead we give the direct proofs.

Model 5A:  FC(i) nondecreasing and FR(i) nonincreasing on {1, . . . , m}.

The assumptions imply that Ci is convex on n̄ and Ri concave on n̄. In the non-trivial case
we have essentially the situation sketched in Figure 8 below. Our goal is to prove the
optimality of π(j, q, j + 1)∞ with Cj ≤ α, Cj+1 > α and C(π(j, q, j + 1)) = α.

Let L := min{k : Rk ≥ Ri ∀i}. If i, l ≤ j then R(π(i, q, l)) ≤ Rl ≤ R(l, L). Take i ≤ j,
l ≥ j + 1; then:

1. Rl < Ri ⇒ R(π(i, q, l)) ≤ Ri ≤ R(i, L).

2. Rl ≥ Ri, l ≥ L, i ≤ I := max{k : Ck ≤ Ci ∀i}. Then R(π(i, q, l)) ≤ R(i, l) ≤ R(I, l),
with the usual arguments; see e.g. Model 1, Case 2, (i).

3. Rl ≥ Ri, l ≥ L and i ≥ I. As under point 2 it is sufficient to prove R(i, l) ≤ R(i, L).
We get the result using (12.3) and

(Rl − Ri)/(RL − Ri) ≤ 1 ≤ (Cl − Ci)/(CL − Ci).

4. i ≥ I, l ≤ L. On {I, . . . , L}, Ci and Ri are increasing functions. Applying Theorem
12.2 and making use of Ri being concave on n̄ and Ci being convex on n̄, we get the desired result.

Model 5B:  FC(i) nonincreasing and FR(i) nondecreasing on {1, . . . , m}.

Clearly, the assumptions are equivalent to Ci concave on n̄ and Ri convex on n̄. Again,
excluding the trivial cases, we get a picture like Figure 9. Choosing i < l we show the
optimality of the thinning policy π(0, q, m)∞, with C(π(0, q, m)) = α, as follows:

1. Ri ≥ Rl ⇒ R(π(i, q, l)) ≤ Ri ≤ R0 ≤ R(0, m).

2. Ri < Rl ≤ R0 ⇒ R(π(i, q, l)) ≤ Rl ≤ R0 ≤ R(0, m).

3. Ri < Rl, R0 < Rl. Assuming that j is chosen in the usual way, let first i, l ≤ j.
Thus Ci, Cl ≤ α. Let k ∈ {i, l} be such that Rk = max(Ri, Rl). We obtain that


R(π(i, q, l)) ≤ Rk ≤ R(k, m). If i ≤ j < l, we prove that R(π(i, q, l)) ≤ R(i, l) ≤
R(i, m) ≤ R(0, m). The first inequality is clearly true as both Ri < Rl and Ci < Cl.
The second inequality follows from (12.3) and

(Rl − Ri)/(Rm − Ri) ≤ (nl − ni)/(nm − ni) ≤ (Cl − Ci)/(Cm − Ci),

as Ci is concave on n̄ and Ri convex on n̄. Using α − Cm < 0 and expression (12.17), the
third inequality can be derived from

(Rm − Ri)/(Rm − R0) ≥ (nm − ni)/(nm − n0) ≥ (Cm − Ci)/(Cm − C0).

[Figure 8: Ri (concave on n̄) and Ci (convex on n̄) against the n̄-scale, with the points
n0, nI, nj, nj+1, nL, the constraint level α and the value R(j, j + 1) indicated. Figure 9:
Ri (convex on n̄) and Ci (concave on n̄), with the points n0, nj, nj+1, nm, the level α and
the value R(0, m) indicated.]

Figure 8                                         Figure 9


REFERENCES

ALTMAN, E. AND SHWARTZ, A. (1989) Optimal priority assignment: a time sharing approach.

IEEE Trans. Aut. Control AC-34.

ALTMAN, E. AND SHWARTZ, A. (1990) Markov optimization problems and state-action fre-

quencies. EE PUB No. 679, Technion Inst. Techn.

BARAS, J.S., MA, D.-J. AND MAKOWSKI, A.M. (1985) K competing queues with geometric service

requirements and linear costs: the µc-rule is always optimal. Systems Control Lett.

6, 186-209.

BEUTLER, F.J. AND ROSS,K.W. (1986) Time-average optimal constrained semi-Markov deci-

sion processes. Adv. Appl. Prob. 18, 341-359.

BILLINGSLEY, P. (1968) Convergence of Probability Measures. J. Wiley, New York.

BLACKWELL, D. (1962) Discrete dynamic programming. Ann. Math. Stat. 33, 719-726.

BORKAR, V.S. (1986) Control of Markov chains with long-run average cost criterion, in:

Proc. Stochastic Differential Systems (eds. W. Fleming and P.-L. Lions). Springer

Verlag, Berlin, 57-77.

BORKAR, V.S. (1988) A convex analytic approach to Markov decision processes. Prob. Th.

Rel. Fields 78, 583-602.

BUYUKKOC, C., VARAIYA, P. AND WALRAND J. (1985) The cµ rule revisited. Adv. Appl. Prob.

17, 237-238.

CAVAZOS-CADENA, R. AND LASSERRE, J.B. (1988) Strong 1-optimal stationary policies in

denumerable Markov decision processes. Systems Control Lett. 11, 65-71.

CHEONG, C.K. AND HEATHCOTE, C.R. (1965) On the rate of convergence of waiting times. J.

Austr. Math. Soc. 5, 365-373.

CHUNG, K.L. (1967) Markov Chains with Stationary Transition Probabilities. Springer-

Verlag, Berlin.

CINLAR, E. (1975) Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs,

New Jersey.

COOPER, R. B. (1981) Introduction to Queueing Theory. North-Holland, New York.

DEKKER, R. (1985a) Denumerable Markov Decision Chains: Optimal Policies for Small
Interest Rates. Ph.D.-thesis, University of Leiden.

DEKKER, R. (1985b) Theses, belonging to the Ph.D.-thesis. University of Leiden (in Dutch).

DEKKER, R. AND HORDIJK, A. (1984) Average, sensitive and Blackwell optimal policies in

denumerable Markov decision chains with unbounded rewards. Technical report, Dep.

of Math. and Comp. Sci., Univ. of Leiden.


DEKKER, R. AND HORDIJK, A. (1988) Average, sensitive and Blackwell optimal policies in

denumerable Markov decision chains with unbounded rewards. Math. Operat. Res.

13, 395-421.

DEKKER, R. AND HORDIJK, A. (1989) Recurrence conditions for average and Blackwell opti-

mality in denumerable state Markov decision chains. Technical report, Dep. of Math.

and Comp. Sci., Univ. of Leiden. To appear in Math. Operat. Res.

DEKKER, R. AND HORDIJK, A. (1990) Denumerable semi-Markov decision chains with small

interest rates. Technical report, Dep. of Math. and Comp. Sci., Univ. of Leiden,

submitted for publication.

DEKKER, R., HORDIJK, A. AND SPIEKSMA, F.M. (1990) On the relation between recurrence and

ergodicity properties in denumerable Markov decision chains. Technical report, Dep.

of Math. and Comp. Sci., Univ. of Leiden, forthcoming.

DEPPE, H. (1985) Continuity of mean recurrence times in denumerable semi-Markov pro-

cesses. Z. Wahrscheinlichkeitsth. 69, 581-592.

DERMAN, C. (1970) Finite State Markovian Decision Processes. Academic Press, New York.

DERMAN, C. AND STRAUCH, R.E. (1966) A note on memoryless rules for controlling sequential

control processes. Ann. Math. Stat. 37, 276-278.

VAN DIJK, N.M. AND LAMOND, B.F. (??) Bounds for call congestion of finite single-server

tandem queues. Technical report.

FAYOLLE, G. AND IASNOGORODSKI, R. (1979) Two coupled processors: the reduction to a

Riemann-Hilbert problem. Z. Wahrscheinlichkeitsth. 47, 325-351.

FEDERGRUEN, A. AND GROENEVELT, H. (1988) M/G/c queueing systems with multiple cus-

tomer classes: characterization and control of achievable performance under nonpre-

emptive priority rules. Manag. Sci. 34, 1121-1138.

FEDERGRUEN, A., HORDIJK, A. AND TIJMS, H. C. (1978a) Recurrence conditions in denumer-

able state Markov decision processes, in: Dynamic Programming and its Applications

(ed. M. L. Puterman). Academic Press, New York, 3-22.

FEDERGRUEN, A., HORDIJK, A. AND TIJMS, H. C. (1978b) A note on simultaneous recurrence

conditions on a set of denumerable stochastic matrices. J. Appl. Prob. 15, 842-847.

FISHER, L. (1968) On recurrent denumerable decision processes. Ann. Math. Stat. 39,

424-432.

FISHER, L. AND ROSS, Sh. M. (1968) An example in denumerable decision processes. Ann.

Math. Stat. 39, 674-675.

FOSTER, F.G. (1953) On the stochastic matrices associated with certain queueing processes.

Ann. Math. Stat. 24, 355-360.

GELENBE, E. AND MITRANI, I. (1980) Analysis and Synthesis of Computer Systems. Aca-

demic Press, London.

HAJEK, B. (1982) Hitting-time and occupation-time bounds implied by drift analysis with

applications. Adv. Appl. Prob. 14, 502-525.

HORDIJK, A. (1974) Dynamic programming and Markov potential theory. Mathematical

Centre Tract 51, Amsterdam.


HORDIJK, A. (1976) Regenerative Markov decision models, in: Mathematical Programming

Study 6 (ed. R.J.B. Wets). North-Holland, Amsterdam, 49-72.

HORDIJK, A. (1983) Insensitivity for stochastic networks, in: Mathematical Computer Per-

formance and Reliability (eds. G.Iazeolla, P.J. Courtois, A. Hordijk). North-Holland,

Amsterdam, 77-94.

HORDIJK, A. AND KALLENBERG, L.C.M. (1979) Linear programming and Markov decision

chains. Management Sci. 25, 352-362.

HORDIJK, A. AND KALLENBERG, L.C.M. (1984) Constrained undiscounted stochastic dynamic

programming. Math. Operat. Res. 9, 277-289.

HORDIJK, A., SCHWEITZER, P.J. AND TIJMS, H.C. (1975) The asymptotic behaviour of the

minimal total expected cost for the denumerable Markov decision model. J. Appl.

Prob. 12, 298-305.

HORDIJK, A. AND SLADKY, K. (1977) Sensitive optimality criteria in countable state dynamic

programming. Math. Operat. Res. 2, 1-14.

HORDIJK, A. AND SPIEKSMA, F.M. (1989a) Constrained admission control to a queueing sys-

tem. Adv. Appl. Prob. 21, 409-431.

HORDIJK, A. AND SPIEKSMA, F. M. (1989b) On ergodicity and recurrence properties of a

Markov chain with an application to an open Jackson network. Technical report,

Dep. of Math. and Comp. Sci., Univ. of Leiden, submitted for publication.

HORDIJK, H. AND TIJMS, H. C. (1970) Colloquium Markov-Programmering. 118-125, (in

Dutch).

HUANG, C. AND ISAACSON, D. (1976) Ergodicity using mean visit times. J. London Math.

Soc. B 14, 570-576.

ISAACSON, D. (1979) A characterization of geometric ergodicity. Z. Wahrscheinlichkeitsth.

49, 267-273.

ISAACSON, D. AND TWEEDIE, R.L. (1978) Criteria for strong ergodicity of Markov chains. J.

Appl. Prob. 15, 87-95.

ISAACSON, D. AND LUECKE, G.R. (1978) Strongly ergodic Markov chains and rates of con-

vergence using spectral conditions. Stoch. Proc. Appl. 7, 113-121.

KALLENBERG, L.C.M. (1980) Linear Programming and finite Markovian Control Problems.

Ph. D.-thesis, Univ. of Leiden.

KELLEY, J.L. (1955) General Topology. Springer-Verlag, New York.

KELLY, F.P. (1979) Reversibility and stochastic Networks. Wiley, New York.

KEMENY, J.G., SNELL, J.L. AND KNAPP, A.W. (1976) Denumerable Markov Chains (2nd edi-

tion). Springer-Verlag, New York.

KENDALL, D.G. (1959) Unitary dilations of Markov transition operators, and the corre-

sponding integral representations for transition-probability matrices, in: Probability

and Statistics (ed. U. Grenander). Almqvist and Wiksell, Stockholm (Wiley, New

York), 139-161.

KENDALL, D.G. (1960) Geometric ergodicity in the theory of queues, in: Mathematical

Methods in the Social Sciences, 1959 (eds. K. J. Arrow, S. Karlin and P. Suppes).

Stanford University Press, Stanford, 176-195.


KINGMAN, J.F.C. (1964) The stochastic theory of regenerative events. Z. Wahrscheinlich-

keitsth. 2, 180-224.

KLIMOV, G.P. (1974) Time-sharing service systems I. Th. Prob. Appl. 19, 532-551.

LASSERRE, J.B. (1988) Conditions for existence of average and Blackwell optimal stationary

policies in denumerable Markov decision processes. J. Math. Anal. Appl. 136, 479-

490.

LAZAR, A.A. (1983) Optimal flow control of a class of queueing networks in equilibrium.

IEEE Trans. Autom. Control 28, 1001-1007.

MAKOWSKI, A.M. AND SHWARTZ, A. (1989) Recurrence properties of a discrete-time single-

server network with random routing. EE PUB No. 718, Technion Inst. Techn.

MALYSHEV, V.A. (1972) Classification of two-dimensional positive random walks and almost

linear semimartingales. Soviet Math. Dokl. 13, 136-139.

MALYSHEV, V.A. AND MENSIKOV, M.V. (1981) Ergodicity, continuity and analyticity of count-

able Markov chains. Trans. Moscow Math. Soc. 1, 1-48.

MILLER, H.D. (1966) Geometric ergodicity in a class of denumerable Markov chains. Z.

Wahrscheinlichkeitsth. 4, 354-373.

MILLER, B.L. AND VEINOTT, A. F. Jr. (1969) Discrete dynamic programming with a small

interest rate. Ann. Math. Stat. 40, 366-370.

NAIN, Ph. (1989) Interchange arguments for classical scheduling problems in queues. Sys-

tems Control Lett. 12, 177-184.

NEUTS, M.F. AND TEUGELS, J.L. (1969) Exponential ergodicity of the M/G/1-queue. SIAM J.

Appl. Math. 17, 921-929.

NEVEU, J. (1965) Mathematical Foundations of the Calculus of Probability. Holden-Day,

San Francisco.

NUMMELIN, E. AND TUOMINEN, P. (1982) Geometric ergodicity of Harris recurrent Markov

chains with Applications to renewal theory. Stoch. Proc. Appl. 12, 187-202.

NUMMELIN, E. AND TWEEDIE, R. (1978) Geometric ergodicity and R-positivity for general

Markov chains. Ann. Prob. 6, 404-420.

POPOV, N.N. (1977) Conditions for geometric ergodicity of countable Markov chains. Soviet

Math. Dokl. 18, 676-679.

PUTERMAN, M.L. (1974) Sensitive discount optimality in controlled one-dimensional diffu-

sions. Ann. Prob. 2, 408-419.

RIDDER, A.A.N. (1987) Stochastic Inequalities for Queues. Ph.D.-thesis, Univ. of Leiden.

ROSS, K.W. (1986) Optimal and suboptimal policies for Markov decision processes. submit-

ted for publication.

ROSS, K.W. (1989) Randomized and past-dependent policies for Markov decision processes

with multiple constraints. Operat. Res. 37, 474-477.

ROSBERG, Z. (1980) A positive recurrence criterion associated with multi-dimensional queue-

ing processes. J. Appl. Prob. 17, 790-801.

ROSS, S.M. (1970) Applied Probability Models with Optimization Applications. Holden-Day,

San Francisco.

ROSS, S.M. (1983) Stochastic Processes. J. Wiley, New York.


ROYDEN, H.L. (1968) Real Analysis (2nd edition). Macmillan, New York.

SCHAL, M. (1971) The analysis of queues with state-dependent parameters by Markov

renewal processes. Adv. Appl. Prob. 3, 155-175.

SCHAL, M. (1989) On the second optimality equation for semi-Markov decision models.

Technical report, Univ. Bonn.

SCHEFFE, H. (1947) A useful Convergence Theorem for Probability Distributions. Ann.

Math. Stat. 18, 434-438.

SCHOUTE, F.C. (1979) Optimal Control and Call Acceptance in a SPC Exchange. Ninth

International Teletraffic Congress.

SENNOTT, L.I. (1989a) Average cost optimal stationary policies in infinite state Markov

decision processes with unbounded costs. Operat. Res. 37, 626-633.

SENNOTT, L.I. (1989b) Average cost semi-Markov decision processes and the control of

queueing systems. Prob. Engin. Inform. Sci. 2, 247-272.

SHWARTZ, A., MA, D.-J. AND MAKOWSKI, A.M. (1986) Estimation and optimal control for con-

strained Markov chains. Proceedings of the 25th Conference on Decision and Control,

Athens, December 1986, 994-999.

SCHWEITZER, P.J. (1971) Iterative solution to the functional equations of undiscounted

Markov renewal programming. J. Math. Anal. Appl. 34, 495-501.

SCHWEITZER, P.J. AND FEDERGRUEN, A. (1977) The asymptotic behaviour of undiscounted

value iteration in Markov decision problems. Math. Operat. Res. 2, 360-381.

SCHWEITZER, P.J. AND FEDERGRUEN, A. (1979) Geometric convergence of value-iteration in

multichain Markov decision problems. Adv. Appl. Prob. 11, 188-217.

SERFOZO, R.F. (1979) An equivalence between continuous and discrete time Markov decision

processes. Operat. Res. 27, 616-620.

STIDHAM JR., Sh. (1985) Optimal control of admission to a queueing system. IEEE Trans.

Autom. Control, AC-30, 705-713.

STIDHAM JR.,Sh. AND WEBER, R.R. (1989) Monotonic and insensitive optimal policies for

control of queues with undiscounted costs. Operat. Res. 87, 611-625.

STOYAN, D. (1983) Comparison Methods for Queues and other stochastic Models. J. Wiley,

New York.

SYSKI, R. (1978) Ergodic potential. Stoch. Proc. Appl. 7, 311-336.

SZPANKOWSKI, W. (1988) Stability conditions for multi-dimensional queueing systems with

computer applications. Operat. Res. 36, 944-957.

THOMAS, L.C. (1980) Connectedness conditions for denumerable state Markov decision pro-

cesses, in: Recent Developments in Markov Decision Processes (ed. R. Hartley, L. C.

Thomas, D. J. White). Academic Press, New York, 181-204.

TITCHMARSH, E.C. (1939) The Theory of Functions (2nd edition). Oxford University Press,

Oxford.

TIJMS, H.C. AND EIKEBOOM, A.M. (1986) A simple technique in Markovian control with

applications to resource allocation in communication networks. Operat. Res. Lett. 5,

25-31.


TUOMINEN, P. AND TWEEDIE, R.L. (1979a) Exponential decay and ergodicity of general

Markov processes and their discrete skeletons. Adv. Appl. Prob. 11, 784-803.

TUOMINEN, P. AND TWEEDIE, R.L. (1979b) Exponential ergodicity in Markovian queueing

and dam models. J. Appl. Prob. 16, 867-880.

TWEEDIE, R.L. (1975) Sufficient conditions for regularity, recurrence and ergodicity of

Markov processes. Math. Proc. Camb. Phil. Soc. 78, 125-136.

TWEEDIE R.L. (1981) Criteria for ergodicity, exponential ergodicity and strong ergodicity

of Markov processes. J. Appl. Prob. 18, 122-130.

VERE-JONES, D. (1962) Geometric ergodicity in denumerable Markov chains. Quart. J.

Math. Oxford 13, 7-28.

VEINOTT, A.F. Jr. (1969) On discrete dynamic programming with sensitive discount opti-

mality criteria. Ann. Math. Stat. 40, 1635-1660.

WESSELS, J. (1977) Markov programming and successive approximations with respect to

weighted supremum norms. J. Math. Anal. Appl. 58, 326-335.

WEBER, R.R. AND STIDHAM JR, Sh. (1987) Optimal control of service rates in networks of

queues. Adv. Appl. Prob. 19, 202-218.

WHITE, D.J. (1963) Dynamic programming, Markov chains, and the method of successive

approximations. J. Math. Anal. Appl. 6, 373-376.

YOSIDA, K. (1980) Functional Analysis (6th edition). Springer Verlag, Berlin.

ZIJM, W.H.M. (1985) The optimality equations in multichain denumerable state Markov

decision processes with the average cost criterion: the bounded cost case. Statistics

& Decisions 3, 143-165.

ZIJM, W.H.M. (1987) Asymptotic expansions for dynamic programming recursions with gen-

eral nonnegative matrices. J. Optim. Th. Appl. 54, 157-191.


DEFINITION FINDER

approximating Markov chain 2, 32

average optimality equations 97

boundedness

exponential — 14

spatial geometric — 3

temporal geometric — 3, 15

bounding vector 11

— of productform 4, 40-41

class 18

completeness 169

cost

average expected — 164

lower-s-bounded — 179

criterion

Foster’s — for ergodicity 12

Popov’s — for geometric ergodicity 14

Tweedie’s — for expon. ergodicity 33

data transformation 71

decision rule 94

deterministic — 94

nearly deterministic — 166

deviation matrix 64

elimination of randomized decisions 168

ergodicity 11

exponential — 33

geometric — 11

µ-exponential — 33

µ-geometric — 12

µ-uniform exponential — 111

µ-uniform geometric — 95

strong — 11, 33

excessive measure 167

expected state-action frequency 162

interest rate 96

Laurent expansion

— of α-discounted rewards 97,127

— of resolvent 66

— of probab. generating function 70

lexicographical maximization 96-97

linear programming formulation 162

constrained — 165

µc-rule 143

µ-continuity 96

µ-norm 11

νc-rule 143

operator norm 11

optimality criterion

α-discounted — 96

average — 96

bias — 91

Blackwell — 96

n-discount — 96

s-average — 117

sensitive — 85

policy 94

critical level j —

deterministic — 94

Markov — 94

non-idling 142

stationary — 94

thinning (q) — 192

threshold (j, q) — 192

v-conserving — 117

v-s-conserving — 117

recurrence

µ-geometric — 14

µ-uniform geometric — 95

µ-uniform weak exponential — 111


µ-uniform weak geometric — 98

µ-weak exponential — 34

µ-weak geometric — 19

strong — 16, 34

uniform strong — 95, 112

recurrence time 13

reference state 18

set of —s 18

resolvent 64

— set 64

reward 94

average expected — 96

expected α-discounted — 96

s-average expected — 117

upper-s-bounded — 175

weakly uniformly integrable — 175

simultaneous Doeblin condition 99

spectrum 64

spectral radius 64

splitting procedure 168

constrained —s 180, 181

stability 18

stationary matrix 18, 33

strong convergence 16, 34

uniform — 95, 112

sufficience 175

superharmonic vector 117

taboo

— set 2

— transition probability 13

throughput 143

unichain MDC 162

weak tightness 171


SYMBOL FINDER

nonincr., nondecr. abbreviations for nonincreasing, nondecreasing.
concave nondecr., convex nondecr. abbreviations for concave nondecreasing, convex nondecreasing; 'on n̄' denotes the same property regarded as a function of n̄ (Chapter 12).

⪯ partial ordering of probability distributions: P1 ⪯ P2 iff Σ_{j≥n} P1j ≤ Σ_{j≥n} P2j for all n ∈ IN0.

≤st stochastic ordering of r.v.: X ≤st Y iff IP(X ≥ n) ≤ IP(Y ≥ n), n ∈ IN0.

l≥ p. 97, lexicographical ordering sign of Laurent series.∇f gradient of function f , e.g. ( ∂

∂xf(x, y), ∂∂yf(x, y)).0 nul-element, e.g. vector, matrix consisting of 0’s only.vT , AT transposed of vector v, e.g.

(v1v2

), and transposed of matrix A.

vi ith component of vector v.∆vi vi+1 − vi.Sc complement of set S.S closure of set S in Chapters 3, 9; closed convex hull of set S in

Part III.α discount factor, in Part III constraint value.A(i) set of available actions in state i ∈ E.A∗(i) set of maximizing actions in first optimality equation.AMC, AMDC abbrev. approximating M(D)C.β contraction factor.B,B(f) set of reference states in a MC, the MC generated by f∞.γ throughput vector; γ = RT γ + λ.(CLP) p.165.C, C(M), C(S), set of all policies, all Markov policies, all stationary policies,

, C(D), C(ND) all deterministic policies and all nearly det. policies resp.C(·, R) av. exp. cost vector under policy R, i.e.

C(i, R) = lim supN→∞∑∞n=0 IEi,R

(∑Nn=0 cX(n),Y (n)

), i ∈ E.

C(π), Ci see R(π) etc.Cx,r z ∈ C | |z − r| = x.δij Kronecker delta, δij = 1, if j = i; δij = 0, if j 6= i.δ(i) δ(i) = 1, if i 6= 0; δ(i) = 0, if i = 0.1X(n)=j 1X(n) = 1, if X(n) = j; 1X(n)=j = 0 otherwise.d p.145, x(I −RT ).D,D(f) deviation matrix of a MC, the MC generated by policy f∞, e.g.

D = limx↑1∑∞n=0(Pn −Π)xn.


$D(z)$   "analytic" part of $P(z)$, i.e. $P(z) - (1-z)^{-1}\Pi = \sum_n (P^n - \Pi)z^n$.
$D_{x,r}$   $\{z \in \mathbb{C} \mid |z - r| < x\}$.
$D$   expected delay.
$e$   vector consisting only of 1's.
$e_m$   $m$th unit vector.
$\varepsilon$   small positive number.
$E$   (denumerable) state space of a MC or MDC.
$E \times A$   $\{(i,a) \mid a \in A(i),\ i \in E\}$.
$E(\pi)$   set of positive recurrent states under $\pi^\infty$.
$\mathbb{E}_i$   expectation operator when the initial state is $i$.
$\mathbb{E}_{i,R}$   expectation operator when the initial state is $i$ and policy $R$ is played.
$f$, $f^\infty$   deterministic decision rule, policy $(f, f, \ldots)$.
$f_\alpha^\infty$   deterministic $\alpha$-discounted optimal policy.
$F$   set of all deterministic decision rules.
$F^*$   set of $v$-$s$-conserving deterministic decision rules.
$FR$, $FC$   vectors with components $FR(i) = r_{i0} + \tfrac{\nu_i}{\lambda_{i-1}}(r_{i-1,1} - r_{i-1,0})$, $i > 0$, $FR(0) = r_{00}$; $FC$ ibid. for costs.
$F^{(n)}_{iM}$, $F^{(n)}_{iM}(f)$   $({}_M P^{\,n-1}P)_{iM}$, etc.
$F_{iM}$, $F_{iM}(f)$   $\sum_{n=1}^{\infty} F^{(n)}_{iM}$, etc.
$F(z)$, $F(f,z)$   probability generating matrix functions $\sum_n F^{(n)} z^n$, etc.
$F^{(n)}(z)$   $n$th derivative of $F(z)$.
$g(R)$   av. exp. reward vector under policy $R$, i.e. $g_i(R) = \liminf_{N\to\infty}\tfrac{1}{N+1}\,\mathbb{E}_{i,R}\bigl(\sum_{n=0}^{N} r_{X(n),Y(n)}\bigr)$.
$g^s(R)$   $s$-av. exp. reward vector under policy $R$, i.e. $g^s_i(R) = \limsup_{N\to\infty}\tfrac{1}{N+1}\,\mathbb{E}_{i,R}\bigl(\sum_{n=0}^{N} r_{X(n),Y(n)}\bigr)$.
$I$   identity matrix.
$\lambda$, $\lambda_k$   arrival rates.
l.s.c.   abbrev. lower semi-continuous.
(LP)   p. 162.
$L_i$, $L_i(M)$, $L_i(S)$, $L_i(D)$, $L_i(SC)$, $L_i(ND)$   $\{X(i,R) \mid R \in C\}$, etc.
$M(A)$   set of probability measures on $A$.
$M$   mostly a finite subset of $E$.
MC, MP   abbrev. Markov chain (in discrete time), Markov process (in continuous time).
MDC, MDP   ibid. Markov decision chain, Markov decision process.
$\mu$   bounding vector with components greater than or equal to 1.
$\|v\|_\mu$, $\|A\|_\mu$   $\mu$-norm of vector $v$ and matrix $A$.
$\mu$-BS(RS)(M)   p. 19.
$\mu$-EE   p. 33, $\mu$-exponential ergodicity.
$\mu$-GE   p. 12, $\mu$-geometric ergodicity.
$\mu$-GR(RS)(M)   pp. 14, 19, $\mu$-geometric recurrence.
$\mu$-R(RS)(M)   p. 19.


$\mu$-WER(M)   p. 34, $\mu$-weak exponential recurrence.
$\mu$-WGR(RS)(M)   p. 19, $\mu$-weak geometric recurrence.
$\mu$-UBS(RS)(M)   pp. 98, 99.
$\mu$-UEE   p. 111, $\mu$-uniform exponential ergodicity.
$\mu$-UGE   p. 95, $\mu$-uniform geometric ergodicity.
$\mu$-UGR(RS)(M)   p. 95, $\mu$-uniform geometric recurrence.
$\mu$-UR(RS)(M)   p. 98.
$\mu$-UWGR(RS)(M)   p. 98, $\mu$-uniform weak geometric recurrence.
$\mu$-UWER(M)   p. 111, $\mu$-uniform weak exponential recurrence.
$\mathbb{N}_0$   $\{0, 1, 2, \ldots\}$.
$n_i$   $\nu_m/\nu_1$ in Ch. 9, normalizing constant in Ch. 12.
$\bar{n}_i$   $1 - n_i$.
$\nu_k$   service rates.
$\nu$, $\nu(f)$   number of positive recurrent classes.
$N$   average number of present customers.
ND   p. 166, set of "nearly deterministic" decision rules.
$\mathbb{P}_i$   probability operator when the system is in state $i$ at time 0.
$\mathbb{P}_{i,R}$   probability operator when the system is in $i$ at time 0, and policy $R$ is played.
$P$, $P(f)$, $P(\pi)$   transition probability matrices of MC's.
$P(t)$, $P(t,f)$   transition probability matrices of MP's.
$P_h$, $P_h(f)$   transition matrices of the AMC's.
$P(z)$, $P(f,z)$   probability generating functions $\sum_{n=0}^{\infty} z^n P^n$, $\sum_{n=0}^{\infty} z^n P^n(f)$.
${}_M P$, ${}_M P(f)$, ${}_M P(t)$, ${}_M P(t,f)$   pp. 13, 34, taboo transition probability matrices for taboo set $M$.
$\Pi$, $\Pi(f)$, $\Pi(\pi)$   stationary matrices: $\lim_{N\to\infty}\tfrac{1}{N+1}\sum_{n=0}^{N} P^n$ for a MC, $\lim_{t\to\infty} P(t)$ for a MP, etc.
$P_{iaj}$   probability that $j$ is reached in one step from $i$, if action $a$ is chosen.
$P_{ij}$, $P_{ij}(z)$, $P_{ij}(f)$, etc.   $ij$th element of matrix $P$, $P(z)$, $P(f)$, etc.
$P_{i\bullet}$, $\Pi_{i\bullet}$, etc.   probability distribution $\{P_{ij}\}_{j\in E}$, etc. on $E$.
$\pi$, $\pi^\infty$   randomizing decision rule, stationary policy $(\pi, \pi, \ldots)$.
$\pi(l,q,j)^\infty$   p. 196, policy with critical level $j$ and randomized decisions in state $l$ only, where customers are admitted with probability $q$.
$\pi(l)^\infty$   p. 192, critical level $(l)$ policy.
$Q$, $Q(f)$   intensity matrices of a MP.
$Q_I$, $Q_{II}$, $Q_{III}$, $Q_{IV}$   the respective quadrants numbered anticlockwise, including the coordinate axes.
$\rho$   interest rate, $\rho = (1-\alpha)\alpha^{-1}$.
$r_{ia}$   immediate reward in state $i$, when action $a$ is chosen.
$r(f)$   immediate reward vector under policy $f^\infty$, with $r_i(f) = r_{i f(i)}$, $i \in E$.
$R(\cdot,R)$   $R(\cdot,R) = g(R)$.
$R(\pi)$   av. exp. reward vector under the stationary policy $\pi^\infty$, e.g. $R(\pi) = \sum_{(j,a)} r_{ja} x_{ja}(\pi)$.


$R_i$   av. exp. reward under the critical level $(i)$ policy.
$R(i,l)$   av. exp. reward under $\pi(l,q,j)^\infty$, such that $C(\pi(l,q,j)) = \alpha$.
$R$   notation for a policy, i.e. $R = (\pi_0, \pi_1, \ldots)$; routing matrix in Ch. 9.
$\rho(A)$   resolvent set of matrix $A$, i.e. $\{\lambda \in \mathbb{C} \mid \lambda I - A \text{ has a } \mu\text{-bounded inverse}\}$.
$R(\lambda,A)$   resolvent of matrix $A$ at $\lambda$, i.e. $(\lambda I - A)^{-1}$ for $\lambda \in \rho(A)$.
$r_\sigma(A)$   spectral radius, i.e. $\lim_{n\to\infty}\|A^n\|_\mu^{1/n}$.
$r_\sigma^{-1}(A)$   $1/r_\sigma(A)$.
$\sigma(A)$   spectrum of matrix $A$, $\rho(A)^c$.
$S$, $SC$   set of all randomizing, constrained randomizing decision rules.
$T$, $T_i$   recurrence time, — when the starting state is $i$.
$T(\pi)$   set of transient states under $\pi^\infty$.
$T$   throughput.
u.s.c.   abbrev. upper semi-continuous.
$u_k(f)$, $k = -1, 0, \ldots$   $(k+1)$th term in the Laurent expansion of $V^\alpha(f)$, i.e. $u_{-1}(f) = \Pi(f)r(f)$, $u_k(f) = (-1)^k D^{k+1}(f) r(f)$, $k \ge 0$.
$V^\alpha(R)$   vector of $\alpha$-discounted rewards under policy $R$, i.e. $V^\alpha_i(R) = V^\rho_i(R) = \mathbb{E}_{i,R}\bigl(\sum_{n=0}^{\infty}\alpha^n r_{X(n),Y(n)}\bigr) = (1+\rho)\sum_{k=-1}^{\infty}\rho^k u_k(f)$.
$v(i,j)$, $v(a,i,j)$   pp. 48, 57, 156.
$x$   real number.
$x^N(i,R)$   vector of expected state-action frequencies over $N$ time-periods: $x^N_{ja}(i,R) = \tfrac{1}{N+1}\,\mathbb{E}_{i,R}\sum_{n=0}^{N} 1_{\{X(n)=j,\,Y(n)=a\}}$.
$x(\pi)$   stationary probability vector with components $x_j(\pi) = \Pi_{ij}(\pi)$ or $x_{ja}(\pi) = x_j(\pi)\pi_{ja}$.
$x(i,R)$, $X(i,R)$   vector limit point, set of vector limit points of $\{x^N(i,R)\}_{N=0}^{\infty}$.
$X$, $X^C$   pp. 162, 165.
$\{X(n)\}_{n=0}^{\infty}$   Markov chain.
$\{X(t)\}_{t\ge 0}$   Markov process.
$(X(n), Y(n))$   state of the MDC and action chosen at time $n$.
$z$   complex number.


SUMMARY

Geometrically Ergodic Markov Chains and the Optimal Control of Queues

The thesis is divided into three parts. The first part deals with geometric ergodicity of Markov chains. A new condition is introduced, called strong convergence. It requires the existence of a vector µ on the state space, with components greater than or equal to 1, such that the n-step transition matrix of the Markov chain converges geometrically fast to the stationary matrix in µ-norm (this is the matrix norm induced by the weighted supremum norm with weight vector µ). Under certain conditions strong convergence turns out to be equivalent to geometric ergodicity, and for bounded µ-vectors even to strong ergodicity.
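
In the notation of the Symbol Finder, a schematic rendering of this condition is the following; the constant $c$ and the contraction factor $\beta$ are generic names introduced here for illustration only.
\[
\|v\|_{\mu} \;=\; \sup_{i\in E}\frac{|v_i|}{\mu_i},
\qquad
\|A\|_{\mu} \;=\; \sup_{i\in E}\frac{1}{\mu_i}\sum_{j\in E}|A_{ij}|\,\mu_j ,
\]
\[
\text{strong convergence:}\qquad \|P^{n}-\Pi\|_{\mu}\;\le\; c\,\beta^{n},
\qquad n=0,1,2,\ldots,\ \ \text{for some } c<\infty,\ \beta<1 .
\]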

Strong convergence cannot be verified directly for concrete models. By the main theorem of Part I, strong convergence turns out to be equivalent to a notion that is better suited to this purpose, namely strong recurrence. Strong recurrence requires the existence of a vector µ, with components greater than or equal to 1, such that the matrix of 1-step taboo transition probabilities with respect to a finite taboo set is a contraction in the µ-norm. Of great importance for applications is that, by the statement of the main theorem, strong recurrence for a given µ-vector implies strong convergence for the same µ.
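
Schematically, again in the notation of the Symbol Finder (with ${}_M P$ the matrix of 1-step taboo transition probabilities), strong recurrence reads
\[
\|{}_{M}P\|_{\mu} \;<\; 1
\qquad \text{for some finite taboo set } M\subset E \text{ and some vector } \mu\ge e ,
\]
and, by the main theorem, strong recurrence for a given $\mu$ then yields strong convergence for that same $\mu$.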

Few results on geometric ergodicity of multidimensional queueing models can be found in the literature. For several such models we have succeeded in establishing strong recurrence for a product-form µ-vector that increases exponentially fast in the state. This yields not only geometric ergodicity, but also geometrically fast convergence of the Laplace-Stieltjes transforms of the marginal distributions of the process.
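
As an illustration only (these are not the specific vectors constructed in the thesis), a product-form bounding vector on a $K$-dimensional state space $E\subset\mathbb{N}_0^K$ could be taken of the form
\[
\mu(i_1,\ldots,i_K) \;=\; \prod_{k=1}^{K} r_k^{\,i_k}, \qquad r_k>1,
\]
so that $\mu$ grows exponentially fast in every coordinate of the state.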

The basic notions of Part I derive from conditions originating in the theory of Markov decision chains, which in this thesis are called uniform strong convergence and uniform strong recurrence. Here "uniform" means that the property in question must hold uniformly over the set of Markov chains generated by the deterministic, stationary policies. It is known from the literature that, under standard conditions, both uniform strong convergence and uniform strong recurrence guarantee the existence of deterministic, stationary sensitive optimal policies for Markov decision chains with a denumerable state space and a compact action space. Here Blackwell optimality is the most sensitive and average optimality the least sensitive criterion.
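
The hierarchy of sensitive criteria can be read off from the Laurent expansion of the $\alpha$-discounted rewards; in the notation of the Symbol Finder (with interest rate $\rho$ and Laurent coefficients $u_k(f)$), a schematic version is
\[
V^{\alpha}(f) \;=\; (1+\rho)\sum_{k=-1}^{\infty}\rho^{k}u_k(f),
\qquad u_{-1}(f)=\Pi(f)r(f),\quad u_k(f)=(-1)^k D^{k+1}(f)r(f),\ k\ge 0,
\]
where, roughly speaking, average optimality compares only $u_{-1}$, $n$-discount optimality compares $u_{-1},\ldots,u_n$ lexicographically, and Blackwell optimality is the most sensitive member of this family.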


Part II aims to give insight into the classes of models that satisfy either of these two conditions. Uniform strong recurrence is compared with various conditions for the existence of Blackwell optimal policies, and it is shown that these conditions do not differ essentially from one another. In particular, the main theorem of Part II contains the equivalence with uniform strong convergence.

Taking uniform strong recurrence as a verifiable representative of these various types of conditions, we establish convergence of the successive approximation algorithm for average rewards in models with a multichain structure. As a method for determining the structure of optimal policies, we have to assume a unichain structure.
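
The successive approximation algorithm referred to here is, in its standard form, the value iteration recursion below; this is a generic sketch, and the precise scheme and the conditions under which it converges in the multichain case are those of Part II.
\[
v^{\,n+1}_i \;=\; \max_{a\in A(i)}\Bigl\{\, r_{ia} + \sum_{j\in E} P_{iaj}\, v^{\,n}_j \Bigr\},
\qquad i\in E,\ \ n=0,1,2,\ldots ,
\]
with the differences $v^{\,n+1}-v^{\,n}$ approximating the maximal average reward.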

A reliable way to determine a Blackwell optimal policy in the finite state and action model is to take the limit, as α ↑ 1, of α-discounted optimal policies. Two counterexamples show that for more general models this method need not lead to good results. Part II closes with concrete models, for which the existence of sensitive optimal policies, as well as the remaining results concerning convergence of the successive approximation algorithm, is proved by means of uniform strong recurrence for a µ-vector with the same structure as for the models in Part I.

Instead of sensitive optimality criteria, one also uses optimality criteria in which the average rewards are maximized subject to a constraint on the average costs, for instance maximizing the throughput under the constraint that the waiting time of customers may not exceed a given bound.

For such optimization problems Linear Programming is a fruitful formulation for proving the existence of optimal policies and for determining subclasses that contain an optimal policy. In Part III we study the Linear Programming formulation for Markov decision chains with a denumerable state space, a finite action space and a unichain structure. Since in this case the Linear Programming problem is infinite dimensional, new proof techniques are needed.
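
In terms of the state-action frequencies $x_{ja}$ of the Symbol Finder, a schematic form of the constrained Linear Programming problem is the following; the precise formulation (CLP), including the treatment of the unichain structure, is the one on p. 165.
\[
\max\ \sum_{(j,a)\in E\times A} r_{ja}\,x_{ja}
\quad\text{s.t.}\quad
\sum_{a\in A(j)} x_{ja} \;=\; \sum_{(i,a)\in E\times A} P_{iaj}\,x_{ia}\ \ (j\in E),\qquad
\sum_{(j,a)} x_{ja}=1,
\]
\[
\sum_{(j,a)} c_{ja}\,x_{ja}\;\le\;\alpha,
\qquad x_{ja}\ge 0\ \ \bigl((j,a)\in E\times A\bigr),
\]
with $\alpha$ the constraint value.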

Essential use is made of the statement of the main lemma of Part III, that the stationary distribution on states and actions under a stationary policy can be split as a convex combination of the corresponding distributions under stationary policies that randomize in one state fewer. In this way it can be proved that, for the average optimality criterion with one constraint, there exists a stationary optimal policy that randomizes between two actions in at most one state. The search for an optimal policy can thus be restricted to a subclass of the policy space.
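
Schematically, when the stationary policy $\pi^\infty$ randomizes in a state $i$ between two actions $a_1$ and $a_2$, the splitting of the main lemma takes the following form (a sketch; the explicit weight $q$ is derived in Part III):
\[
x(\pi) \;=\; q\,x(\pi^{(1)}) \;+\; (1-q)\,x(\pi^{(2)}),
\qquad q\in[0,1],
\]
where $\pi^{(1)}$ and $\pi^{(2)}$ agree with $\pi$ outside state $i$ and choose $a_1$, respectively $a_2$, with probability one in state $i$; repeating the splitting reduces the search for a constrained-optimal policy to policies that randomize in at most one state.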

This result is used for the analysis of one-dimensional queueing models with a constraint. Of great importance here is that the structure of such models allows simple formulas to be derived for the weights in the convex combinations of the stationary distributions from the main lemma. This yields a unifying method for deriving conditions for the optimality of threshold policies in such models. The method turns out to work well for all models known from the literature.
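
For an admission-control queue the threshold policies referred to here take, in the simplest setting, the form sketched below; $j$ denotes the threshold and $q$ the admission probability used in the single randomizing state, as in the threshold $(j,q)$ policies of the Definition Finder.
\[
\pi_i(\text{admit}) \;=\;
\begin{cases}
1, & i<j,\\
q, & i=j,\\
0, & i>j,
\end{cases}
\]
so that randomization between the two actions occurs in the single state $i=j$ only, in accordance with the structure obtained from the main lemma.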


AFTERWORD

Here I would like to take the opportunity to mention a number of people who have been important for the realization of this thesis.

With Nico van Dijk I have had many conversations that contributed to the progress of the research. Ad Ridder supplied a "computer proof" for my conjecture that the triangles problem has a solution. Intensive discussions with Eitan Altman via e-mail, and especially during his visit to Leiden in July 1988, gave me a better understanding of properties of Markov decision chains. This helped me enormously to prove one of the main results of this monograph.

Tineke Bakker's moral support and, despite his fondness for Horowitz, Erik Bakker's TEXpertise in the field of splitting techniques have proved indispensable. Bouke van der Veen drew my attention to the wonderful drawings by Ton Smits that illustrate this thesis; Lidwien Smits was so kind as to allow me to use them. The Leidsch Promovendi Dispuut Drop-the-S took care of my academic formation outside Mathematics, and GJ Franx provided the musical accompaniment.


CURRICULUM VITÆ

The author of this thesis was born on 13 September 1958 in Caracas, Venezuela. After obtaining the Gymnasium β diploma at the Willem de Zwijger Lyceum in Bussum, she began studying Spanish Language and Literature at the Rijksuniversiteit te Leiden in 1976. In 1979 she took up the study of Mathematics at the same university; the candidate's examination in Spanish followed in February 1980 and the candidate's examination in Mathematics in June 1981. The study of Mathematics was completed (cum laude) in August 1985 with a graduation project under Prof. dr. A. Hordijk.

From 15 August 1982 until 15 August 1985 she was a student assistant at the Department of Mathematics and Computer Science of the Rijksuniversiteit te Leiden. From 1 September 1985 until 1 February 1990 she worked as a research assistant at the same department, employed until 1 February 1986 by the Rijksuniversiteit te Leiden and thereafter by the Netherlands Foundation for Mathematics S.M.C. During this period she carried out research under the inspiring supervision of Prof. dr. A. Hordijk; all of its results are described in this thesis.

