
Markov decision programming techniques applied to the animal replacement problem

Anvendelse af teknikker for Markov beslutnings programmering til løsning af udskiftningsproblemet vedrørende husdyr

Anders Ringgaard Kristensen

Page 2: Markov decision programming techniques applied to the ...II. A survey of Markov decision programming techniques applied to the animal replacement problem. III. Hierarchic Markov processes

This thesis has been accepted by the Committee on Animal Husbandry and Veterinary Science of the Royal Veterinary and Agricultural University for public defence for the doctoral degree in agricultural science.

Copenhagen, 4 May 1993

Poul Hyttel
Chairman of the Committee on Animal Husbandry and Veterinary Science

To my family

Markov decision programming techniques applied to the animal replacement problem

Dissertation 1993. © Anders R. Kristensen & Jordbrugsforlaget. ISBN 87 7026 332 9.

The thesis has been printed with support from the Danish Agricultural and Veterinary Research Council.

Cover: Bo Bendixen poster, 3 cows, for Danish Turnkey Dairies Ltd., 1982 (original 62 x 91 cm). The artist and APV DTD are thanked for kindly making the motif available. Reproduced with permission.

Papers already published are reproduced with the permission of the publishers.

Published by Jordbrugsforlaget, Mariendalsvej 27, 2., DK-2000 Frederiksberg. Printed in Denmark by Sangill, Holme Olstrup. Pre-press by Repro-Sats Nord, Skagen.


Preface

The research behind this thesis was carried out at the Royal Veterinary and Agricultural University, partly while I was the holder of a senior research fellowship at the Department of Mathematics from 1985 to 1986 and partly during my employment at the Department of Animal Science and Animal Health, where I was assistant professor from 1986 to 1990 and now hold a position as associate professor. I am very grateful to the staff of both departments for excellent working conditions.

Professor Mats Rudemo, D. techn. Sc., and Professor Mogens Flensted-Jensen, D. Sc., have been very good advisers in the difficult job of writing scientific papers, and their scientific support has been a continuous encouragement. I am grateful to the head of my present department, Associate Professor Poul Henning Petersen, Ph.D. (agric.), for his awareness of the importance of management and informatics in animal science. I am indebted to my colleagues, Associate Professor Poul Jensen, M.Sc., and Jens Noesgaard Jørgensen, Ph.D. (agric.), for mutual exchange of computer power and for many animating discussions concerning computers and statistical methods. Associate Professor Sven Bresson, Ph.D. (agric.), and Professor A. Neiman-Sørensen, D.V.M., have indirectly been a great help through their concise ideas concerning the methods and nature of research.

Also my former place of work, the National Institute of Animal Science, Section of "Multidisciplinary Studies in Cattle Production Systems" (Helårsforsøg med Kvæg), has been a continuous inspiration. The enthusiasm and scientific competence which, over a short period, raised the section from the "demonstration farm" level to a leading position in Europe in research on cattle production systems and management has been an everlasting example. In particular, thanks are due to the head of the section, Vagn Østergaard, D.Sc. (agric.), Iver Thysen, Ph.D. (agric.), Jan Tind Sørensen, Ph.D. (agric.), and Jens Peter Hansen, M.Sc. (agric.), for many inspiring discussions and for comments on earlier versions of several of the chapters of this thesis. Erik Jørgensen, Ph.D. (agric.), at the Department of Research in Pigs and Horses of the same institute has also supplied inspiring suggestions.

As concerns the choice of subject of the thesis, I am indebted to Professor Harald B. Giæver, Agricultural University of Norway, who must suffer the indignity of being referred to as "Giaever" in this and other studies. His thesis on "Optimal dairy cow replacement policies" from Berkeley aroused my interest in the animal replacement problem while I was still a student. Even though the thesis was published as early as 1966, it remains an important reference even today, and several more recent studies have not even reached its level.

The works of Dr. Yaron Ben-Ari from Israel have been the direct inspiration of one of the chapters of this thesis, and indirectly they have inspired several chapters. Also the numerous works of the Department of Farm Management, Wageningen Agricultural University, have been of great value to my research. In particular I am indebted to the works of Professor, dr. ir. Aalt A. Dijkhuizen and his staff.

For typing some of the manuscripts I thank Mrs. Ruth Crifling, Mrs. Kirsten Astrup and Mrs. Britta Christensen, and for advice concerning the English language I thank Mrs. Lone Høst, Mrs. Alice Jensen and Mr. Bent Grønlund.

Financial support was granted directly by the Danish Agricultural and Veterinary Research Council from 1986 to 1988, and indirectly via Dina (Danish Informatics Network in the Agricultural Sciences) in 1991.

Copenhagen, September 1992
Anders Ringgaard Kristensen


Contents

I. Introduction

II. A survey of Markov decision programming techniques applied to the animal replacement problem.

III. Hierarchic Markov processes and their applications in replacement models. (Reprinted from European Journal of Operational Research 35, 207-215).

IV. Optimal replacement and ranking of dairy cows determined by a hierarchic Markov process. (Reprinted from Livestock Production Science 16, 131-144).

V. Maximization of net revenue per unit of physical output in Markov decision processes. (Reprinted from European Review of Agricultural Economics 18, 231-244).

VI. Optimal replacement and ranking of dairy cows under milk quotas. (Reprinted from Acta Agriculturæ Scandinavica 39, 311-318).

VII. Bayesian updating in hierarchic Markov processes applied to the animal replacement problem. (Reprinted from European Review of Agricultural Economics 20, 223-239).

VIII. Optimal replacement in the dairy herd: A multi-component system. (Reprinted from Agricultural Systems 39, 1-24).

IX. Applicational perspectives.

X. Economic value of culling information in the presence and absence of a milk quota. By Anders R. Kristensen and Iver Thysen. (Reprinted from Acta Agriculturæ Scandinavica 41, 129-135).

XI. Ranking of dairy cows for replacement. Alternative methods tested by stochastic simulation. By Anders R. Kristensen and Iver Thysen. (Reprinted from Acta Agriculturæ Scandinavica 41, 295-303).

XII. Conclusions and outlook.

XIII. Summary.

XIV. Dansk sammendrag.


II


A survey of Markov decision programming techniques applied to the animal replacement problem¹

ANDERS RINGGAARD KRISTENSEN
Department of Animal Science and Animal Health, The Royal Veterinary and Agricultural University, Rolighedsvej 23, DK-1958 Frederiksberg C, Copenhagen, Denmark

Abstract

The major difficulties of the animal replacement problem are identified as uniformity, herd restraints and the "curse of dimensionality". Approaches for circumventing these difficulties using Markov decision programming methods are systematically discussed, and possible optimization techniques are described and evaluated. Assuming that the objective of the farmer is maximum net returns from the entire herd, relevant criteria of optimality are discussed. It is concluded that a Bayesian technique is a promising approach to the uniformity problem, that parameter iteration may be used under herd restraints, and that hierarchic Markov processes have contributed to the solution of the dimensionality problem.

Keywords: Criteria of optimality, hierarchic Markov process, parameter iteration, Bayesian updating.

1. Introduction

This paper deals with a problem and a technique. The problem is the determination of optimal replacement of animals (in practice limited to cows and sows). The technique is dynamic programming or, to be more specific, Markov decision programming. The literature on the replacement problem in general is very extensive. Studies on the animal replacement problem are also numerous, but naturally they are fewer than for the general problem. A review of studies on dairy cow replacement is given by van Arendonk (1984). Also on Markov decision programming the literature is extensive. Recent reviews are given by van der Wal and Wessels (1985) as well as White and White (1989). A review of applications to agriculture has been given by Kennedy (1981).

Since both the problem and the technique discussed in this paper seem to be well elucidated in the literature, a relevant question to ask would be why the combination of the problem and the technique should be the subject of a survey. The answer is that animal replacement problems differ from general replacement problems in several respects, and in order to deal with the problems arising from this observation many modifications of the general Markov decision programming technique are relevant or even necessary.

The general replacement theory most often implicitly assumes industrial items as the objects of replacement. Ben-Ari et al. (1983) mention three main features in which the dairy cow replacement problem differs from the industrial problem. Exactly the same features are relevant in sow replacement models.

- Uniformity. It is a problem that the traits of an animal are difficult to define and measure. Furthermore the variance of each trait is relatively large.

- Reproductive cycle. The production of an animal is cyclic. We therefore need to decide in which cycle to replace as well as when to replace inside a cycle.

- Availability. Usually there is a limited supply of replacements (heifers or gilts). This is particularly the case when the farmer only uses home-grown animals - for instance because of infection risks when animals are bought at the market.

The problem of availability is only one example of a restraint that applies to the herd as a whole. Other examples might be a milk quota, a limited supply of roughages or limiting housing capacity. In all cases the animals considered for replacement compete for the resource (or quota) in question. We shall therefore in this study consider the more general problem of optimal replacement under some herd restraint.

The main reason for using Markov decision programming in the determination of optimal animal replacement policies is probably the variation in traits, which with this technique is taken into account directly. Also the cyclic production may be directly considered by traditional Markov decision programming. Very soon, however, a problem of dimensionality is faced. If several traits of the animal are considered simultaneously, and each trait is considered at a realistic number of levels, the state space becomes very large (the size of the state space is in principle the product of the numbers of levels of all traits considered; for instance, five traits at ten levels each give 10^5 = 100 000 states). Even though the method in theory can handle the problem, optimization is prohibitive even on modern computers. In the literature, the problem is referred to as the "curse of dimensionality".

The objective of this study is to discuss how the technique (Markov decision programming) may be adapted to solve the problem (the animal replacement problem), where uniformity and herd restraints as well as the curse of dimensionality (arising from the variability in traits and the cyclic production) have been identified as major difficulties to be taken into account. During the decade since the reviews of Kennedy (1981) and van Arendonk (1984) were written, many results have been achieved concerning these difficulties.

We shall assume throughout the study that the objective of the farmer is the maximization of net revenue from the entire herd. In each situation, we shall consider how this objective may be transformed to a relevant criterion of optimality to be used in the Markov decision process.

2. Variability and cyclic production: Markov decision programming

As mentioned in the introduction, Markov decision programming is directly able to take the variability in traits and the cyclic production into account without any adaptations. In order to have a frame of reference, we shall briefly present the theory of traditional Markov decision programming originally described by Howard (1960).

2.1. Notation and terminology

Consider a discrete time Markov decision process with a finite state space U = {1,2,...,u} and a finite action set D. A policy s is a map assigning to each state i an action s(i) ∈ D. Let p_ij^d be the transition probability from state i to state j if the action d ∈ D is taken. The reward to be gained when the state i is observed, and the action d is taken, is denoted as r_i^d. The time interval between two transitions is called a stage.

We have now defined the elements of a traditional Markov decision process, but in some models we further assume that if state i is observed, and action d is taken, a physical quantity of m_i^d is involved (e.g. Kristensen, 1989; 1991). In this study we shall refer to m_i^d as the physical output. If s(i) = d, the symbols r_i^d, m_i^d and p_ij^d are also written as r_i^s, m_i^s and p_ij^s, respectively.
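As a minimal illustration of this notation, the following sketch (with purely hypothetical numbers and array names) stores p_ij^d, r_i^d and m_i^d as arrays and shows how a policy s reduces them to p_ij^s, r_i^s and m_i^s.

import numpy as np

# A small two-state, two-action Markov decision process in the notation above.
# Indexing convention: p[d, i, j] = p_ij^d, r[d, i] = r_i^d, m[d, i] = m_i^d.
u = 2                                   # state space U, indexed 0,...,u-1 here
p = np.array([[[0.8, 0.2],              # first action (e.g. "keep")
               [0.3, 0.7]],
              [[0.9, 0.1],              # second action (e.g. "replace")
               [0.9, 0.1]]])
r = np.array([[10.0, 4.0],              # rewards r_i^d
              [ 6.0, 2.0]])
m = np.array([[ 5.0, 3.0],              # physical outputs m_i^d
              [ 5.0, 3.0]])

# A policy s assigns an action to every state; under s the parameters reduce
# to p_ij^s, r_i^s and m_i^s by picking the rows corresponding to s(i).
s = np.array([0, 1])
p_s = p[s, np.arange(u), :]             # p_ij^s
r_s = r[s, np.arange(u)]                # r_i^s
m_s = m[s, np.arange(u)]                # m_i^s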

An optimal policy is defined as a policy that maximizes (or minimizes) some predefined objective function. The optimization technique (i.e. the method to identify an optimal policy) depends on the form of the objective function or - in other words - on the criterion of optimality. The over-all objective to maximize net revenue of the entire herd may (depending on the circumstances) result in different criteria of optimality formulated as alternative objective functions. The choice of criterion depends on whether the planning horizon is finite or infinite.

2.2. Criteria of optimality

2.2.1. Finite planning horizon

A farmer, who knows that he is going to terminate his production after N stages, may use the maximization of total expected rewards as his criterion of optimality. The corresponding objective function h is

$$h = E\left[\sum_{n=1}^{N} r_{I(n)}^{s_n}\right], \qquad (1)$$

where E denotes the expected value, s_n is the policy at stage n, and I(n) is the (unknown) state at stage n.

If the farmer has a time preference, so that he prefers an immediate reward to an identical reward later on, a better criterion is the maximization of total expected discounted rewards. If all stages are of equal length, this is equal to applying the objective function

$$h = E\left[\sum_{n=1}^{N} \beta^{\,n-1}\, r_{I(n)}^{s_n}\right], \qquad (2)$$

where β < 1 is the discount factor defined by the interest rate and the stage length.

2.2.2. Infinite planning horizon

A situation where the stage of termination is unknown (but at least far ahead) is usually modeled by an infinite planning horizon (i.e. N = ∞). In that case the optimal policy is constant over stages. The function (1) cannot be applied in this situation, but since β < 1, the function (2) will converge towards a fixed value for N becoming very large. Thus the objective function is given by

$$h = E\left[\sum_{n=1}^{\infty} \beta^{\,n-1}\, r_{I(n)}^{s}\right]. \qquad (3)$$

Since, usually, each animal and its future successors are represented by a separate Markov decision process, this criterion, like criterion (2), is equal to the maximization of total discounted net revenues per animal. Such a criterion is relevant in a situation where a limiting housing capacity is the only (or at least the most limiting) herd restraint.

An alternative criterion under infinite planning horizon is the maximization of expected average reward per unit of time. If all stages are of equal length, the objective function in this situation is

$$g^{s} = \sum_{i=1}^{u} \pi_i^{s}\, r_i^{s}, \qquad (4)$$

where π_i^s is the limiting state probability under the policy s (i.e. when the policy is kept constant over an infinite number of stages). This criterion maximizes the average net revenues per stage, i.e. over time. It may be relevant under the same conditions as criterion (3) if an animal and its future successors are represented by a separate Markov decision process. Practical experience shows that the optimal replacement policies determined under criteria (3) and (4) are almost identical.

If a herd restraint (e.g. a milk quota) is imposed on the physical output, a relevant criterion may be the maximization of expected average reward per unit of physical output using the objective function

$$g^{s} = \frac{\sum_{i=1}^{u} \pi_i^{s}\, r_i^{s}}{\sum_{i=1}^{u} \pi_i^{s}\, m_i^{s}} = \frac{g_r^{s}}{g_m^{s}}. \qquad (5)$$

In case of a milk quota, the physical output m_i^s is the milk produced by a cow in state i under policy s. The function (5) is also relevant if the criterion is the maximization of the expected average reward over time in a model where the stage length varies. In that case the physical output represents the stage length. It should be noticed that if m_i^d = 1 for all i and d, the function (5) is identical to (4). The symbol g_r^s is the average reward over stages (equal to g^s of Eq. (4)) and g_m^s is the average physical output over stages.
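For a fixed policy, the criteria (4) and (5) can be evaluated directly from the limiting state probabilities. The following sketch, assuming an ergodic transition matrix and purely illustrative numbers, computes π^s and then the average reward per stage and per unit of output.

import numpy as np

def limiting_probabilities(P):
    # Limiting state probabilities pi^s of an ergodic chain with transition
    # matrix P: solve pi P = pi together with sum(pi) = 1 (least squares).
    u = P.shape[0]
    A = np.vstack([P.T - np.eye(u), np.ones(u)])
    b = np.zeros(u + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# A fixed policy s described by its transition matrix, rewards and outputs.
P_s = np.array([[0.8, 0.2],
                [0.9, 0.1]])
r_s = np.array([10.0, 2.0])                 # r_i^s
m_s = np.array([5.0, 3.0])                  # m_i^s, e.g. kg milk per stage

pi = limiting_probabilities(P_s)
g_per_stage = pi @ r_s                      # objective function (4)
g_per_output = (pi @ r_s) / (pi @ m_s)      # objective function (5)
print(pi, g_per_stage, g_per_output)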

2.3. Optimization techniques in general Markov decision programming

2.3.1. Value iteration

Under finite planning horizon the value iteration method is excellent. The optimal policies are determined sequentially using the functional equations

$$f_i(n) = \max_{d \in D}\left\{ r_i^d + \beta \sum_{j=1}^{u} p_{ij}^d\, f_j(n-1)\right\}, \qquad i = 1,\ldots,u, \qquad (6)$$

where the action d maximizing the right hand side is optimal for state i at the stage in question. The function f_i(n) is the total expected discounted rewards from the process when it starts from state i and will operate for n stages before termination. Thus f_i(0) is the salvage value of the system when it is in state i. At each stage an optimal policy is chosen using Eqs. (6). If the objective function (1) is used, β = 1 in Eq. (6). Otherwise β is the discount factor.

Under infinite planning horizon, the value iteration method may be used to approximate an optimal policy. Under the objective function (3) it is possible to show that (cf. Howard 1960)

$$\lim_{n \to \infty} f_i(n) = f_i, \qquad (7)$$

where f_i for fixed i is a constant. By using Eqs. (6) over a large number of stages, we will sooner or later observe that f_i(n+1) is almost equal to f_i(n) for all i. Further we will observe that the same policy is chosen during several stages. We can feel rather sure that such a policy is close to being optimal, but there is no guarantee that it is identical to an optimal policy. For practical purposes, however, the approximation usually suffices.
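A minimal sketch of the value iteration recursion (6), assuming the array layout introduced above and using illustrative numbers only, could look as follows; with β < 1 and a large number of stages it approximates the infinite-horizon optimum as just described.

import numpy as np

def value_iteration(p, r, beta, n_stages, f0=None):
    # Value iteration, Eq. (6): p[d, i, j] = p_ij^d and r[d, i] = r_i^d.
    # Use beta = 1 for objective function (1); f0 holds the salvage values f_i(0).
    n_actions, u, _ = p.shape
    f = np.zeros(u) if f0 is None else f0.copy()
    policy = np.zeros(u, dtype=int)
    for _ in range(n_stages):
        q = r + beta * (p @ f)          # q[d, i] = r_i^d + beta * sum_j p_ij^d f_j(n-1)
        policy = q.argmax(axis=0)       # maximizing action in every state
        f = q.max(axis=0)               # f_i(n)
    return f, policy

# Illustrative numbers only.
p = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.9, 0.1], [0.9, 0.1]]])
r = np.array([[10.0, 4.0], [6.0, 2.0]])
print(value_iteration(p, r, beta=0.95, n_stages=500))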

Since the objective function (4) is just a special case of function (5), where m_i^d = 1 for all i and d, we shall only consider the criterion given by (5). In this case f_i(n) is the total expected rewards when the process starts from the beginning of a stage in state i and will operate until n units of physical output have been produced. Under the criterion given by the objective function (4), the production of n units of output is just the operation of the process over n stages. It is assumed that the physical output only takes integer values (for practical purposes this is just a question of selecting an appropriate unit). According to Howard (1971) an optimal policy for producing n units of output (i.e. a policy that maximizes the expected reward of producing n units) is determined recursively by relations (i=1,...,u) in which the reward/output rate is assumed to have the constant value r_i^d/m_i^d during the entire stage. However, since the physical output is bounded, it is easily seen that for n sufficiently large the relations simplify. Hence we get for i=1,...,u

$$f_i(n) = \max_{d \in D}\left\{ r_i^d + \sum_{j=1}^{u} p_{ij}^d\, f_j(n - m_i^d)\right\}. \qquad (8)$$

Thus in the long run, the assumption concerning constant reward/output rate in all states will have no effect. The equivalence of Eq. (7) is in this case

$$\lim_{n \to \infty}\left( f_i(n+1) - f_i(n) \right) = g, \qquad (9)$$

and sooner or later the policy will not differ from step to step of Eqs. (8).

Further details on the value iteration method are given by Howard (1960; 1971). It should particularly be noticed that m_i^d, which in this study is interpreted as a physical output (e.g. milk yield), in the study of Howard (1971) is interpreted as the expected stage length when state i is observed under the action d. Thus in his model the criterion (5) is the expected average reward over time. Compared to Eq. (8), Howard (1971) described a more general case where the stage length is a random variable of which the distribution is given by the action and the present state as well as the state to be observed at the next stage. Further the reward depends on the state combination, the action and the stage length. The interpretation as physical output has been discussed by Kristensen (1991).

The value iteration method is identical to what is usually referred to as dynamic programming, successive iteration or successive approximation.

2.3.2. Policy iteration

Under infinite planning horizon, the policy iteration method may be applied. Unlike the value iteration method it always provides an optimal policy. It covers all three objective functions (3), (4) and (5). The iteration cycle used for optimization has the following steps:

1) Choose an arbitrary policy s. Go to 2.

2) Solve the set of linear simultaneous equations appearing in Table 1. Go to 3.

3) For each state i, find the action d' that maximizes the expression given in Table 1, and put s'(i) = d'. If s' = s then stop, since an optimal policy is found. Otherwise redefine s according to the new policy (i.e. put s = s') and go back to 2.
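A compact sketch of this cycle under criterion (3), assuming the linear equations of Table 1 in their discounted form and using illustrative numbers, could be written as follows.

import numpy as np

def policy_iteration_discounted(p, r, beta):
    # Policy iteration under criterion (3); p[d, i, j] = p_ij^d, r[d, i] = r_i^d.
    n_actions, u, _ = p.shape
    s = np.zeros(u, dtype=int)                       # Step 1: arbitrary policy
    while True:
        P_s = p[s, np.arange(u), :]                  # p_ij^s
        r_s = r[s, np.arange(u)]                     # r_i^s
        # Step 2: value determination, f = r_s + beta * P_s f
        f = np.linalg.solve(np.eye(u) - beta * P_s, r_s)
        # Step 3: policy improvement, maximize r_i^d + beta * sum_j p_ij^d f_j
        q = r + beta * (p @ f)
        s_new = q.argmax(axis=0)
        if np.array_equal(s_new, s):
            return s, f                              # optimal policy and present values
        s = s_new

p = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.9, 0.1], [0.9, 0.1]]])
r = np.array([[10.0, 4.0], [6.0, 2.0]])
print(policy_iteration_discounted(p, r, beta=0.95))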

From the equations and expressions of Table 1, we see that also with the policy iteration method the objective function (4) is just a special case of (5) where m_i^d = 1 for all i and d. For the objective functions (3) and (4) the policy iteration method was developed by Howard (1960), and for the function (5) a policy iteration method was presented by Jewell (1963). Like Howard (1971), Jewell interpreted m_i^d as the expected stage length.

Under Criterion (3), f_i^s is the total present value of the expected future rewards of a process starting in state i and running over an infinite number of stages following the constant policy s. Under Criteria (4) and (5), f_i^s is the relative value of state i under the policy s. The difference in relative values between two states equals the amount of money a rational person is just willing to pay in order to start in the highest ranking of the two states instead of the lowest ranking. The absolute value of f_i^s is determined arbitrarily by the additional equation of Table 1, where the relative value of state u is defined to be zero. The interpretation of relative values is discussed in detail by Kristensen (1991).

Table 1. Equations and expressions to be used in the policy iteration cycle with different objective functions.

Objective  Linear equations of Step 2 (i=1,...,u)              Unknowns               Additional equation  Expression of Step 3
function
(3)        f_i^s = r_i^s + β Σ_j p_ij^s f_j^s                  f_1^s,...,f_u^s        none                 r_i^d + β Σ_j p_ij^d f_j^s
(4)        f_i^s = r_i^s - g^s + Σ_j p_ij^s f_j^s              g^s, f_1^s,...,f_u^s   f_u^s = 0            r_i^d - g^s + Σ_j p_ij^d f_j^s
(5)        f_i^s = r_i^s - g^s m_i^s + Σ_j p_ij^s f_j^s        g^s, f_1^s,...,f_u^s   f_u^s = 0            r_i^d - g^s m_i^d + Σ_j p_ij^d f_j^s

2.3.3. Linear programming

Under an infinite planning horizon, linear programming is a possible optimization technique. When the criterion (3) is applied the linear programming problem becomes (cf. Ross, 1970)

$$\min \sum_{i=1}^{u} x_i \quad \text{subject to} \quad x_i \geq r_i^d + \beta \sum_{j=1}^{u} p_{ij}^d\, x_j, \qquad i = 1,\ldots,u,\ d \in D. \qquad (10)$$

It appears from (10) that each combination of state and action is represented by exactly one restriction. An action d is optimal in state i if, and only if, the corresponding restriction is satisfied as an equation when the values of x_1,...,x_u arise from an optimal solution to the linear programming problem. The optimal values of x_1,...,x_u are equal to the present values f_1^s,...,f_u^s under an optimal policy.
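Assuming a linear programming solver is available (scipy.optimize.linprog is used below purely for illustration), problem (10) can be set up directly from the arrays of transition probabilities and rewards; the numbers are illustrative only.

import numpy as np
from scipy.optimize import linprog

def lp_discounted(p, r, beta):
    # Problem (10): minimize sum_i x_i subject to
    # x_i - beta * sum_j p_ij^d x_j >= r_i^d for every state i and action d.
    n_actions, u, _ = p.shape
    A_ub, b_ub = [], []
    for d in range(n_actions):
        for i in range(u):
            A_ub.append(-np.eye(u)[i] + beta * p[d, i, :])   # -(x_i - beta * sum_j p_ij^d x_j)
            b_ub.append(-r[d, i])                            # ... <= -r_i^d
    res = linprog(c=np.ones(u), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * u, method="highs")
    return res.x                                             # present values under an optimal policy

p = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.9, 0.1], [0.9, 0.1]]])
r = np.array([[10.0, 4.0], [6.0, 2.0]])
print(lp_discounted(p, r, beta=0.95))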

If the objective function (4) is applied, the linear programming problem becomes

$$\max \sum_{i=1}^{u} \sum_{d \in D} r_i^d\, x_i^d \quad \text{subject to} \quad \sum_{d \in D} x_j^d = \sum_{i=1}^{u} \sum_{d \in D} p_{ij}^d\, x_i^d,\ j = 1,\ldots,u; \quad \sum_{i=1}^{u} \sum_{d \in D} x_i^d = 1; \quad x_i^d \geq 0. \qquad (11)$$

In this case an action d is optimal in state i if and only if x_i^d from the optimal solution is strictly positive. The optimal value of the objective function is equal to the average rewards per stage under an optimal policy. The optimal value of Σ_{d∈D} x_i^d is equal to the limiting state probability π_i under an optimal policy.

Using Criterion (5), we may solve the linear programming problem (12) (cf. Kennedy, 1986), where a is a pre-determined relative value of state u chosen sufficiently large to ensure that all other relative values are positive. The optimal value of the objective function of the linear programming problem is equal to the expected average reward per unit of output as defined in Eq. (5) under an optimal policy. The optimal values of the variables x_1,...,x_{u-1} are equal to the relative values of the states 1,...,u-1, provided that the relative value of state u is equal to a. As it appears, each combination of state and action is represented by one and only one restriction. An action is optimal in a state if and only if the corresponding restriction is satisfied as an equation in the optimal solution.

Since Criterion (4) is just a special case of (5) with all physical outputs set to the value 1, the linear programming problem (12) may also be used in the determination of an optimal policy under Criterion (4).

2.4. Discussion and applications

Under finite planning horizon, the value iteration method is perfect, but in replacement models the planning horizon is rarely well defined. Most often the process is assumed to operate over an unknown period of time with no pre-determined stage of termination. In such cases the abstraction of an infinite planning horizon seems more relevant. Therefore we shall pay specific attention to the optimization problem under the criteria (3), (4) and (5), where all three techniques described in the previous sections are available.

The value iteration method is not exact, and the convergence is rather slow. On the other hand, the mathematical formulation is very simple, and the method makes it possible to handle very large models with thousands of states. Further it is possible to let the reward and/or the physical output depend on the stage number in some pre-defined way. This has been mentioned by van Arendonk (1984) as an advantage in modelling genetic improvement over time. The method has been used in many dairy cow replacement models as an approximation to the infinite stage optimum. Thus it has been used by Jenkins and Halter (1963), Giaever (1966), Smith (1971), McArthur (1973), Steward et al. (1977; 1978), Killen and Kearney (1978), Ben-Ari et al. (1983), van Arendonk (1985; 1986) and van Arendonk and Dijkhuizen (1985). Some of the models mentioned have been very large. For instance, the model of van Arendonk and Dijkhuizen contained 174 000 states (reported by van Arendonk, 1988). In sows, the method has been used by Huirne et al. (1988).

The policy iteration method has almost exactly the opposite characteristics of the value iteration method. Because of the more complicated mathematical formulation involving solution of large systems of simultaneous linear equations, the method can only handle rather small models with, say, a few hundred states. The solution of the linear equations implies the inversion of a matrix of the dimension u × u, which is rather complicated. On the other hand, the method is exact and very efficient in the sense of fast convergence. The rewards are not allowed to depend on the stage except for a fixed rate of annual increase (e.g. inflation) or decrease. However, a seasonal variation in rewards or physical outputs is easily modeled by including a state variable describing season (each state is usually defined by the value of a number of state variables describing the system).

An advantage of the policy iteration method is that the equations in Table 1 are general. Under any policy s we are able to calculate directly the economic consequences of following the policy by solution of the equations. This makes it possible to compare the economic consequences of various non-optimal policies to those of the optimal policy. Further we may use the equations belonging to the criterion (5) to calculate the long run technical results under a given policy by redefining r_i^s and m_i^s. If for instance r_i^s = 1 if a calving takes place and zero otherwise, and m_i^s is the stage length when state i is observed under policy s, then g^s, which is the average number of calvings per cow per year, may be determined from the equations. Further examples are discussed by Kristensen (1991). For an example where the equations are used for calculation of the economic value of culling information, reference is made to Kristensen and Thysen (1991).
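As a worked illustration of this use of the criterion (5) equations, the sketch below solves the policy evaluation equations of Table 1 for a fixed policy with the reward redefined as a calving indicator and the physical output redefined as the stage length in years; the transition matrix and stage lengths are invented for the example.

import numpy as np

def long_run_ratio(P_s, r_s, m_s):
    # Policy evaluation equations for criterion (5) (Table 1):
    # f_i = r_i^s - g * m_i^s + sum_j p_ij^s f_j, with f_u = 0.
    # Returns g, the long-run average of r per unit of m.
    u = P_s.shape[0]
    A = np.zeros((u, u))
    A[:, :u - 1] = np.eye(u)[:, :u - 1] - P_s[:, :u - 1]   # coefficients of f_1,...,f_{u-1}
    A[:, u - 1] = m_s                                      # coefficient of g
    x = np.linalg.solve(A, r_s)
    return x[u - 1]

# Average number of calvings per cow per year: r_i^s is redefined to 1 in the
# states where a calving takes place (0 otherwise), and m_i^s is the stage
# length in years. The simple three-state cycle below is invented.
P_s = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [1.0, 0.0, 0.0]])
r_s = np.array([1.0, 0.0, 0.0])       # a calving occurs in state 1
m_s = np.array([0.4, 0.3, 0.3])       # stage lengths in years
print(long_run_ratio(P_s, r_s, m_s))  # 1.0 calving per cow per year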

The policy iteration method has been used by Reenberg (1979) and Kristensen and Østergaard (1982). The models were very small, containing only 9 and 177 states, respectively.

3. The curse of dimensionality: Hierarchic Markov processes

In order to combine the computational advantages of the value iteration method with the exactness and efficiency of the policy iteration method, Kristensen (1988; 1991) introduced the new notion of a hierarchic Markov process. It is a contribution to the solution of the problem referred to as the "curse of dimensionality", since it makes it possible to give exact solutions to models with even very large state spaces. A hierarchic Markov process is only relevant under an infinite planning horizon, and criterion (4) is not relevant, because the special situation where the physical output equals 1 in all stages has no computational advantages over other values. Therefore we shall only consider the criteria (3) and (5).

3.1. Notation and terminology

A hierarchic Markov process is a series of Markov decision processes called subprocesses built together in one Markov decision process called the main process. A subprocess is a finite time Markov decision process with N stages and a finite state space Ω_n = {1,...,u_n} for stage n, 1 ≤ n ≤ N. The action set D_n of the nth stage is assumed to be finite, too. A policy s of a subprocess is a map assigning to each stage n and state i ∈ Ω_n an action s(n,i) ∈ D_n. The set of all possible policies of a subprocess is denoted Γ. When the state i is observed and the action d is taken, a reward r_i^d(n) is gained. The corresponding physical output is denoted as m_i^d(n). Let p_ij^d(n) be the transition probability from state i to state j, where i is the state at the nth stage, j is the state at the following stage and d is the action taken at stage n. Under Criterion (3) we shall denote the discount factor in state i under the action d as β_i^d(n), assuming that the stage length is given by stage, state and action.

Assume that we have a set of v possible subprocesses, each having its own individual set of parameters. The main process is then a Markov decision process running over an infinite number of stages and having the finite state space {1,...,v}. Each stage in this process represents a particular subprocess. The action sets of the main process are the sets Γ_ι, ι = 1,...,v, of all possible policies of the individual subprocesses (to avoid ambiguity the states of the main process will be denoted by Greek letters ι, κ etc.). A policy σ is a map assigning to each state ι of the main process an action σ(ι) ∈ Γ_ι. The transition matrix of the main process has the dimension v × v, and it is denoted Φ = {φ_ικ}. The transition probabilities are assumed to be independent of the action taken. The reward f_ι^σ and the physical output h_ι^σ in state ι of the main process are determined from the total reward and output functions of the corresponding subprocess (Eq. (13)); the expression for h_ι^σ is analogous, except for the discount factor. The symbol p_i(0) is the probability of observing state i at the first stage of the subprocess. Finally, the expected discount factor in state ι under the action s is denoted as B_ι^s and calculated according to Eq. (14).

3.2. Optimization

Since the main process is just an ordinary Markov decision process, the policy iteration cycle described in Section 2.3.2 might be used directly for optimization. In practice Steps 1 and 2 are easily carried out, but Step 3 is prohibitive because of the extremely large number of alternative actions s ∈ Γ_ι (as mentioned above, s is an entire policy of the ιth subprocess). To circumvent this problem Kristensen (1988; 1991) constructed an iterative method, where a value iteration method is applied in the subprocesses and the results are used in Step 3 of the policy iteration method of the main process. The different versions of the method cover the criteria of optimality under infinite planning horizon defined as (3) and (5) in Section 2.2.2. Since criterion (4) is a special case of (5) it is also indirectly covered.

The general form of the iteration cycle of a hierarchic Markov process has the following steps:

1) Choose an arbitrary policy σ. Go to 2.

2) Solve the set of linear simultaneous equations of the main process for F_1^σ,...,F_v^σ and, in case of Criterion (5), for g^σ. In case of Criterion (5) the additional equation F_v^σ = 0 is necessary in order to determine a unique solution. Go to 3.

3) Define the terminal values T_ι from the main process values under Criterion (3), and put T_ι = 0 under Criterion (5). For each subprocess ι, find a policy s' of the subprocess by means of the recurrence equations; the action s'(n,i) is equal to the d' that maximizes the right hand side of the recurrence equation of state i at stage n. Put σ'(ι) = s' for ι = 1,...,v. If σ' = σ, then stop, since an optimal policy is found. Otherwise, redefine σ according to the new policy (i.e. put σ = σ') and go back to 2.

When the iteration cycle is used under Criterion (3), all physical outputs (m_i^d(n) and accordingly also h_ι^σ) are put equal to zero. The iteration cycle covering this situation was developed by Kristensen (1988).

Under Criterion (4) all physical outputs m_i^d(n) and all discount factors β_i^d(n) and B_ι^σ are put equal to 1, but under Criterion (5) only the discount factors are put equal to 1. The iteration cycle covering these situations was described by Kristensen (1991).
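The following sketch outlines the iteration cycle under Criterion (3) in a deliberately simplified setting (equal numbers of states at all stages and a constant per-stage discount factor); the explicit forms used below for the main process value equations and the terminal values are standard discounted forms assumed from the description above, not quoted from the original papers.

import numpy as np

def hierarchic_policy_iteration(p, r, p0, phi, beta):
    # Sketch under Criterion (3), simplified setting:
    #   p[k, n, d, i, j] = p_ij^d(n) in subprocess k,
    #   r[k, n, d, i]    = r_i^d(n),
    #   p0[k, i]         = p_i(0),
    #   phi[k, l]        = main process transition probability,
    #   beta             = constant per-stage discount factor,
    # so that the expected discount factor of a whole subprocess is beta**N.
    v, N, n_actions, u, _ = p.shape
    B = beta ** N
    sigma = np.zeros((v, N, u), dtype=int)             # Step 1: arbitrary policies

    def total_reward(k, policy, terminal):
        # Backward recursion through subprocess k under a fixed policy.
        f = np.full(u, terminal)
        for n in reversed(range(N)):
            d = policy[n]
            P_n = p[k, n][d, np.arange(u), :]
            f = r[k, n][d, np.arange(u)] + beta * (P_n @ f)
        return p0[k] @ f

    while True:
        # Step 2: total subprocess rewards (terminal value 0), then the
        # main process value equations F = f + B * phi @ F (assumed form).
        f_main = np.array([total_reward(k, sigma[k], 0.0) for k in range(v)])
        F = np.linalg.solve(np.eye(v) - B * phi, f_main)

        # Step 3: improve every subprocess by value iteration with terminal
        # value T_k = sum_l phi[k, l] F_l (assumed form).
        sigma_new = np.zeros_like(sigma)
        for k in range(v):
            f = np.full(u, phi[k] @ F)
            for n in reversed(range(N)):
                q = r[k, n] + beta * (p[k, n] @ f)      # q[d, i]
                sigma_new[k, n] = q.argmax(axis=0)
                f = q.max(axis=0)
        if np.array_equal(sigma_new, sigma):
            return sigma, F                             # optimal policy and main state values
        sigma = sigma_new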

3.3. Discussion and applications

The hierarchic Markov process is specially designed to fit the structure of replacement problems, where the successive stages of the subprocesses correspond to the age of the asset in question. By appropriate selection of state spaces in the subprocesses and the main process it is possible to find optimal solutions to even very large models. The idea is to let the number of states in the subprocesses (where a value iteration technique is applied) be very large and only include very few states in the main process (where the technique is directly based on the policy iteration method). Thus we have got a method which is at the same time fast, exact and able to handle very large models.

Kristensen (1987) used the technique in a dairy cow replacement model which in a traditional formulation as an ordinary Markov decision process would have contained approximately 60 000 states, and later (Kristensen, 1989) in a model with approximately 180 000 states. In both cases the number of states in the main process was only 5, reducing Step 2 to the solution of only 5 simultaneous linear equations (versus 180 000 in a traditional formulation). Even in these very large models the number of iterations needed to provide an optimal solution was only from 3 to 6 (tested under 100 different price and production conditions, Kristensen, 1991). Recently, the method has been applied by Houben et al. (1992).

In sows, Huirne et al. (1992) seem to have applied a technique which in many respects is similar to a hierarchic Markov process, but they have not explained their method in full detail. Also Jørgensen (1992a) has applied a technique inspired by hierarchic Markov processes in a sow replacement model, and recently (Jørgensen, 1992b) he used the hierarchic method in the determination of optimal delivery policies in slaughter pigs.

Naturally the hierarchic model just described may also be formulated as an ordinary Markov decision process. In that case each combination of subprocess (main state), stage and state should be interpreted as a state. We shall denote a state in the transformed process as (ιni), and the parameters are given by Eqs. (15), where the parameters mentioned on the right hand side of the equations are those belonging to the ιth subprocess, except for p_i(0), which belongs to subprocess κ. This formulation of course has the same optimal policies as the hierarchic formulation, so it is only computational advantages that make the hierarchic model relevant. A comparison to traditional methods may therefore be relevant.


Since the policy iteration method involves the solution of a set of u equations (where u is the number of states) it is only relevant for small models. The value iteration method, however, has been used with even very large models and may handle problems of the same size as the hierarchic formulation, but the time spent on optimization is much lower under the hierarchic formulation. To recognize this, we shall compare the calculations involved.

Step 3 of the hierarchic optimization involves exactly the same number of operations as one iteration of the value iteration method (Eq. (6)). The further needs of the hierarchic method are the calculation of the rewards and either the physical output or the expected discount factor of a stage in the main process according to Eqs. (13) and (14). Since the calculations at each stage are only carried out for one action, the calculation of both main state parameters involves approximately the same number of operations as one iteration under the value iteration method if the number of alternative actions is 2. If the number of actions is higher, the calculations involve relatively fewer operations than an iteration under the value iteration method. These considerations are based on the assumption that the value iteration method is programmed in an efficient way, so that the sum of Eq. (6) is not calculated as a sum over all u elements, but only as a sum over those elements where p_ij^d is not zero according to Eq. (15). Otherwise the hierarchic technique will be even more superior. Finally the system of linear equations of Step 2 of the hierarchic cycle must be solved, but in large models with only a few states in the main process the time spent on this is negligible.

If we use the considerations above in a practical example, the advantages of the hierarchic technique become obvious. As reported by Kristensen (1991) a model with 180 000 state combinations was optimized by the hierarchic technique under 100 different price conditions. The number of iterations needed ranged from 3 to 6, corresponding to between 6 and 12 iterations of the value iteration method. If the latter method was used instead, a planning horizon of 20 years would be realistic (cf. van Arendonk 1985). Since each stage in the model equals 4 weeks, this horizon represents 260 iterations, which should be compared to the equivalence of from 6 to 12 when the hierarchic technique was applied.

3.4. A numerical example of a hierarchic Markov process

Consider an asset (e.g. a dairy cow) producing two kinds of output items (1 and 2, e.g. milk and beef). We shall assume that the production level of item 1 may change stochastically over time, whereas the production of item 2 is constant over the entire life time of the asset (but may vary between individual assets). At regular time intervals (stages) the asset is inspected in order to determine the production level of item 1. At the first inspection of the asset the production level of item 2 is also determined. In both cases we assume that the result may be "bad", "normal" or "good" (representing the production of 5, 6 and 7 units of item 1 or 3, 4 and 5 units of item 2). After inspection we can choose to keep the asset for at least one additional stage, or we can choose to replace it at the end of the stage at some additional cost.

The three classes of production level of item 2 are defined as states in the main process of a hierarchic Markov process. Thus the number of subprocesses is also 3, and each subprocess represents an asset of a certain productivity concerning item 2. When a new asset is purchased, we assume that the probability distribution over main states is uniform, so that the probability of entering either one is 1/3. The maximum age of an asset is assumed to be 4 stages, and the states of the subprocess are defined from the productivity concerning item 1. Further a dummy state of length, reward and output equal to 0 is included at each stage of the subprocesses. If the asset is replaced at the end of a stage, the process enters the dummy state with probability 1 at the next stage, and for the rest of the duration of the subprocess it will stay in the dummy states.

Table 2. Parameters of the hierarchic Markov process, subprocesses.

ι n i | p_ij^1(n): j=1 j=2 j=3 j=4 | m_i^1(n) r_i^1(n) | p_ij^2(n): j=1 j=2 j=3 j=4 | m_i^2(n) r_i^2(n)
1 1 1 | 0.6 0.3 0.1 0.0 | 5 7  | 0.0 0.0 0.0 1.0 | 5 5
1 1 2 | 0.2 0.6 0.2 0.0 | 6 8  | 0.0 0.0 0.0 1.0 | 6 6
1 1 3 | 0.1 0.3 0.6 0.0 | 7 9  | 0.0 0.0 0.0 1.0 | 7 7
1 1 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
1 2 1 | 0.6 0.3 0.1 0.0 | 5 6  | 0.0 0.0 0.0 1.0 | 5 4
1 2 2 | 0.2 0.6 0.2 0.0 | 6 7  | 0.0 0.0 0.0 1.0 | 6 5
1 2 3 | 0.1 0.3 0.6 0.0 | 7 8  | 0.0 0.0 0.0 1.0 | 7 6
1 2 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
1 3 1 | 0.6 0.3 0.1 0.0 | 5 5  | 0.0 0.0 0.0 1.0 | 5 3
1 3 2 | 0.2 0.6 0.2 0.0 | 6 6  | 0.0 0.0 0.0 1.0 | 6 4
1 3 3 | 0.1 0.3 0.6 0.0 | 7 7  | 0.0 0.0 0.0 1.0 | 7 5
1 3 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
1 4 1 | -   -   -   -   | 5 2  | -   -   -   -   | 5 2
1 4 2 | -   -   -   -   | 6 3  | -   -   -   -   | 6 3
1 4 3 | -   -   -   -   | 7 4  | -   -   -   -   | 7 4
1 4 4 | -   -   -   -   | 0 0  | -   -   -   -   | 0 0
2 1 1 | 0.6 0.3 0.1 0.0 | 5 8  | 0.0 0.0 0.0 1.0 | 5 6
2 1 2 | 0.2 0.6 0.2 0.0 | 6 9  | 0.0 0.0 0.0 1.0 | 6 7
2 1 3 | 0.1 0.3 0.6 0.0 | 7 10 | 0.0 0.0 0.0 1.0 | 7 8
2 1 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
2 2 1 | 0.6 0.3 0.1 0.0 | 5 7  | 0.0 0.0 0.0 1.0 | 5 5
2 2 2 | 0.2 0.6 0.2 0.0 | 6 8  | 0.0 0.0 0.0 1.0 | 6 6
2 2 3 | 0.1 0.3 0.6 0.0 | 7 9  | 0.0 0.0 0.0 1.0 | 7 7
2 2 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
2 3 1 | 0.6 0.3 0.1 0.0 | 5 6  | 0.0 0.0 0.0 1.0 | 5 4
2 3 2 | 0.2 0.6 0.2 0.0 | 6 7  | 0.0 0.0 0.0 1.0 | 6 5
2 3 3 | 0.1 0.3 0.6 0.0 | 7 8  | 0.0 0.0 0.0 1.0 | 7 6
2 3 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
2 4 1 | -   -   -   -   | 5 3  | -   -   -   -   | 5 3
2 4 2 | -   -   -   -   | 6 4  | -   -   -   -   | 6 4
2 4 3 | -   -   -   -   | 7 5  | -   -   -   -   | 7 5
2 4 4 | -   -   -   -   | 0 0  | -   -   -   -   | 0 0
3 1 1 | 0.6 0.3 0.1 0.0 | 5 9  | 0.0 0.0 0.0 1.0 | 5 7
3 1 2 | 0.2 0.6 0.2 0.0 | 6 10 | 0.0 0.0 0.0 1.0 | 6 8
3 1 3 | 0.1 0.3 0.6 0.0 | 7 11 | 0.0 0.0 0.0 1.0 | 7 9
3 1 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
3 2 1 | 0.6 0.3 0.1 0.0 | 5 8  | 0.0 0.0 0.0 1.0 | 5 6
3 2 2 | 0.2 0.6 0.2 0.0 | 6 9  | 0.0 0.0 0.0 1.0 | 6 7
3 2 3 | 0.1 0.3 0.6 0.0 | 7 10 | 0.0 0.0 0.0 1.0 | 7 8
3 2 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
3 3 1 | 0.6 0.3 0.1 0.0 | 5 7  | 0.0 0.0 0.0 1.0 | 5 5
3 3 2 | 0.2 0.6 0.2 0.0 | 6 8  | 0.0 0.0 0.0 1.0 | 6 6
3 3 3 | 0.1 0.3 0.6 0.0 | 7 9  | 0.0 0.0 0.0 1.0 | 7 7
3 3 4 | 0.0 0.0 0.0 1.0 | 0 0  | 0.0 0.0 0.0 1.0 | 0 0
3 4 1 | -   -   -   -   | 5 4  | -   -   -   -   | 5 4
3 4 2 | -   -   -   -   | 6 5  | -   -   -   -   | 6 5
3 4 3 | -   -   -   -   | 7 6  | -   -   -   -   | 7 6
3 4 4 | -   -   -   -   | 0 0  | -   -   -   -   | 0 0

For all subprocesses we assume that, if the asset is kept, the probability of staying at the same productivity level (state in the subprocess) concerning item 1 is 0.6, and if the present state is "normal", the probability of transition to either "bad" or "good" is 0.2 each. The probability of transition (if kept) from "bad" or "good" to "normal" is in both cases 0.3, and from "bad" to "good" and vice versa the probability is 0.1. The initial state probabilities of the subprocesses are assumed to depend on the subprocess in such a way that for subprocess number 1 (low productivity of item 2) the probabilities of entering state "bad", "normal" and "good" are 0.6, 0.3 and 0.1, respectively. For subprocess number 2 the corresponding probabilities are 0.2, 0.6 and 0.2, and finally for subprocess number 3 they are 0.1, 0.3 and 0.6.

The physical output m_i^d(n) of state i at stage n of subprocess number ι is equal to the production of item 1 under the action d, and the corresponding rewards are assumed to be defined as follows:

$$r_i^d(n) = c_1\, m_i^d(n) + c_2\, k_\iota - c_n - c_3^d, \qquad (16)$$

where c_1 is the price of item 1, c_2 is the price of item 2, c_n is the cost of operating the asset at the age n, k_ι is the production of item 2 in subprocess (main state) number ι, and c_3^d is the replacement cost, which is zero if no replacement takes place (d = 1). The cost of operating the asset is assumed to increase linearly from 1 to 4 over stages. Defining c_1 = c_2 = 1 and c_3^2 = 2 gives us the final parameters appearing in Tables 2 and 3. All stages (except those where the process is in a dummy state of zero length) are assumed to be of equal length, which we for convenience put equal to 1.
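The reward entries of Table 2 can be checked directly from this definition; the small snippet below recomputes a few of them (the function name and layout are introduced here only for illustration).

def reward(subprocess, stage, m_item1, replace):
    # Recompute r_i^d(n) from the definition above with c1 = c2 = 1,
    # operating cost c_n = n and replacement cost 2.
    k = {1: 3, 2: 4, 3: 5}[subprocess]       # production of item 2, k_iota
    c3 = 2 if replace else 0
    return m_item1 + k - stage - c3

print(reward(1, 1, 5, False))   # subprocess 1, stage 1, "bad", keep:    7 (cf. Table 2)
print(reward(1, 1, 5, True))    # subprocess 1, stage 1, "bad", replace: 5
print(reward(3, 2, 7, False))   # subprocess 3, stage 2, "good", keep:  10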

We shall determine an optimal solution under the following 3 criteria of optimality:

1) Maximization of total expected discounted rewards, i.e. the objective function (3). In this case the physical outputs of Table 2 are ignored, and a discount factor β_i^d(n) = exp(-r), where r is the interest rate, is applied (for states where the stage length is not zero).

2) Maximization of average rewards over time. In this situation we use the objective function (5), letting the output represent stage length. No discounting is performed in this case.

3) Maximization of average rewards over output defined as in Table 2. Thus the objective function (5) is applied, and no discounting is performed.

In Table 4, optimal policies under the three criteria are shown. It appears that the same policies are optimal under the first two criteria, but under the third criterion the optimal policy differs. A more detailed example of the effect of the criterion of optimality was discussed by Kristensen (1991).

In order to compare the efficiency of the hierarchic technique to the traditional policy and value iteration methods, the problem of the example was transformed to an ordinary Markov decision process and optimized by those methods. The transformed model has 3 × 4 × 4 = 48 states, which is small enough for the policy iteration method to be applied without problems. In Table 5 some performance data of the three optimization techniques are compared.

The superiority of the hierarchic technique over the policy iteration method is due mainly to the time spent on solving the linear simultaneous equations of Step 2. In the hierarchic case a system of 3 equations is solved, whereas 48 equations are solved under the ordinary policy iteration method.

Table 3. Parameters of the hierarchic process. Transition probabilities of the main process and initial state probabilities of the subprocesses.

Main state ι | φ_ι1 φ_ι2 φ_ι3 | p_1(0) p_2(0) p_3(0) p_4(0)
1            | 1/3  1/3  1/3  | 0.6    0.3    0.1    0.0
2            | 1/3  1/3  1/3  | 0.2    0.6    0.2    0.0
3            | 1/3  1/3  1/3  | 0.1    0.3    0.6    0.0

Table 4. Optimal policies under the three criteria (c1, c2, c3) defined in the text (actions: 1 = "keep", 2 = "replace").

Subprocess Stage | State 1 (c1 c2 c3) | State 2 (c1 c2 c3) | State 3 (c1 c2 c3)
1          1     | 2 2 2              | 2 2 2              | 2 2 2
1          2     | 2 2 2              | 2 2 2              | 2 2 2
1          3     | 2 2 2              | 2 2 2              | 2 2 2
2          1     | 2 2 1              | 1 1 1              | 1 1 2
2          2     | 2 2 2              | 2 2 2              | 2 2 2
2          3     | 2 2 2              | 2 2 2              | 2 2 2
3          1     | 1 1 1              | 1 1 1              | 1 1 1
3          2     | 2 2 1              | 1 1 1              | 1 1 2
3          3     | 2 2 2              | 2 2 2              | 2 2 2

Table 5. The performance of the hierarchic technique compared to the policy and value iteration methods under the three criteria (c1, c2, c3) defined in the text.

                          Hierarchic model   Policy iteration   Value iteration
                          c1    c2    c3     c1    c2    c3     c1    c2    c3
Number of iterations      4     3     3      3     4     3      100   100   100
Computer time, relative   1     0.82  0.77   120   150   120    62    64    63

In this numerical example the performance of the hierarchic technique is even more superior to the value iteration method than expected from the theoretical considerations of Section 3.3. In the present case an iteration of the hierarchic model is performed even faster than one of the value iteration method applied to the same (transformed) model. The reason is that the value iteration algorithm has not been programmed in the most efficient way as defined in Section 3.3. On the contrary, the sum of Eq. (6) has been calculated over all 48 states of the transformed model. Since only 4 transition probabilities from each state are positive, the sum could be calculated only over these 4 states.


4. Uniformity: Bayesian updating

As discussed earlier, it is obvious that the traits of an animal vary, no matter whether we are considering the milk yield of a dairy cow, the litter size of a sow or almost any other trait. On the other hand, it is not obvious to what extent the observed trait Y_n at stage n is, for instance, the result of a permanent property of the animal X_1, a permanent damage caused by a previous disease X_2 or a temporary random fluctuation e_n. Most often the observed value is the result of several permanent and random effects. With Y_n, X_1, X_2 and e_n defined as above the relation might for instance be

Y_n = m + X_1 + aX_2 + e_n ,     (17)

where m is the expected value for an arbitrarily selected animal under the circumstances in question, and a = -1 if the animal has been suffering from the disease, and a = 0 otherwise. In this example X_1 only varies among animals, whereas e_n also varies over time for the same animal. The effect of the damage caused by the disease X_2 is in this example assumed to be constant over time when it has been "switched on". The value of X_2 is a property of the individual disease case (the "severity" of the case).

In a replacement decision it is of course important to know whether the observed value is mainly a result of a permanent effect or just the result of a temporary fluctuation. The problem, however, is that only the resulting value Y_n is observed, whereas the values of X_1, X_2 and e_n are unknown. On the other hand, as observations of Y_1, Y_2,... are made, we are learning something about the value of the permanent effects. Furthermore, we have a prior distribution of X_1 and X_2, and each time an observation is made, we are able to calculate the posterior distribution of X_1 and X_2 by means of the Kalman filter theory (described for instance by Harrison and Stevens, 1976) if we assume all effects to be normally distributed.

A model as described by Eq. (17) fits very well into the structure of a hierarchic Markov process. Thus we may regard Y_n as a state variable in a subprocess, and the permanent effects X_1 and X_2 as state variables of the main process. We then face a hierarchic Markov process with unobservable main state. Kristensen (1993) discusses this notion in detail, and it is shown that under the assumption of normally distributed effects, we only have to keep the present expected values of X_1 and X_2, the currently observed value of Y_n and (in this example) the number of stages since the animal was suffering from the disease (if it has been suffering from the disease at all). The expectations of X_1 and X_2 are sufficient to determine the current posterior distribution of the variables, because the variance is known in advance. Even though the posterior variance decreases as observations are made, the decrease does not depend on the values of Y_1, Y_2,... but only on the number of observations made.
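A minimal sketch of such a Bayesian update, restricted for brevity to the single permanent effect X_1 in the model Y_n = m + X_1 + e_n and using invented variances, illustrates both the updating of the expectation and the fact that the posterior variance depends only on the number of observations.

def update_permanent_effect(mean_x, var_x, y, m, var_e):
    # One Bayesian (Kalman-type) update of a permanent effect X1 from an
    # observation Y = m + X1 + e, with X1 ~ N(mean_x, var_x) a priori and
    # e ~ N(0, var_e). Returns the posterior mean and variance of X1.
    gain = var_x / (var_x + var_e)              # weight given to the new observation
    mean_post = mean_x + gain * (y - m - mean_x)
    var_post = (1.0 - gain) * var_x             # does not depend on the value of y
    return mean_post, var_post

# Invented numbers: prior X1 ~ N(0, 1), observation variance 2, herd mean 6000.
mean_x, var_x, m, var_e = 0.0, 1.0, 6000.0, 2.0
for y in (6001.5, 6000.8, 6002.0):
    mean_x, var_x = update_permanent_effect(mean_x, var_x, y, m, var_e)
    print(mean_x, var_x)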

In the study of Kristensen (1993), a more general case involving several traits, each being influenced by several unobservable effects, is sketched, and a numerical example involving only a single trait is given. An example concerning replacement of sows has been given by Jørgensen (1992a). It was demonstrated in both studies that the Bayesian approach in some cases may result in state space reduction without loss of information.

5. Herd restraints: Parameter iteration

One of the major difficulties identified in the introduction was herd restraints. All the replacement models mentioned in the previous sections have been single-component models, i.e., only one animal is considered at the same time, assuming an unlimited supply of all resources (heifers or gilts for replacement, feed, labour etc.) and no production quota. In a multi-component model all animals of a herd are simultaneously considered for replacement. If all animals (components) compete for the same limited resource or quota, the replacement decision concerning an animal does not only depend on the state of that particular animal, but also on the states of the other animals (components) of the herd.

If the only (or at least the most limiting) herd restraint is a limited housing capacity, the number of animals in production is the scarce resource, and accordingly the relevant criterion of optimality is the maximization of net revenues per animal as it is expressed in the criteria (1), (2), (3) and (4). Thus the optimal replacement policy of the single-component model is optimal for the multi-component model too.

If the only (or most limiting) herd restraint is a milk quota, the situation is much more complicated. Since the most limiting restriction is a fixed amount of milk to produce, the relevant criterion of optimality is now the maximization of average net revenues per kg milk as expressed in criterion (5), because a policy that maximizes net revenues per kg milk will also maximize total net revenues from the herd, which was assumed to be the objective of the farmer.

By following a policy which is optimal according to criterion (5) we ensure at any time that the cows which produce milk in the cheapest way are kept. Thus the problem of selecting which cows to keep in the long run (and the mutual ranking of cows) is solved, but the problem of determining the optimal number of cows in production at any time is not solved. If, for instance, it is recognized 2 months before the end of the quota year that the quota is expected to be exceeded by 10 percent, we have to choose whether to reduce the herd size or to keep the cows and pay the penalty. The problem is that both decisions will influence the possibilities of meeting the quota of the next year in an optimal way. To solve this short-run quota adjustment problem we need a true multi-component model.
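The difference between the two criteria can be illustrated by a deliberately simplified ranking example with invented figures; the actual criteria are of course defined over entire policies including future successors, but the ranking logic is the same.

```python
# Invented figures for two cows: expected net revenue (DKK) and milk yield (kg)
# over the planning period considered.
cows = {
    "A": {"net_revenue": 9000.0, "milk_kg": 7500.0},
    "B": {"net_revenue": 9600.0, "milk_kg": 8500.0},
}

# Limited housing capacity: the scarce resource is the stall, so rank the cows
# by net revenue per animal (corresponding to criteria (1)-(4)).
per_animal = sorted(cows, key=lambda c: cows[c]["net_revenue"], reverse=True)

# Milk quota: the scarce resource is the kg of milk, so rank the cows by net
# revenue per kg milk produced (corresponding to criterion (5)).
per_kg = sorted(cows, key=lambda c: cows[c]["net_revenue"] / cows[c]["milk_kg"],
                reverse=True)

print("ranking under housing restraint:", per_animal)   # ['B', 'A']
print("ranking under milk quota:      ", per_kg)        # ['A', 'B']
```

Under the housing restraint cow B, which yields the larger net revenue per stall, ranks first, whereas under the quota cow A, which produces milk more cheaply per kg, ranks first.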

Another example of a herd restraint is a limited supply of heifers. If the dairy farmer only uses home-grown heifers for replacement, the actions concerning individual cows become interdependent, and again a multi-component model is needed in order to solve the replacement problem. Ben-Ari and Gal (1986) and later Kristensen (1992) demonstrated that the replacement problem in a dairy herd with cows and a limited supply of home-grown heifers may be formulated as a Markov decision process involving millions of states. This multi-component model is based on a usual single-component Markov decision process representing one cow and its future successors. Even though the hierarchic technique has made the solution of even very large models possible, such a model is far too large for optimization in practice. Therefore, the need for an approximate method emerged, and a method called parameter iteration was introduced by Ben-Ari and Gal (1986).

The basic idea of the method is to approximate either the present value function f_i(n) (objective function (3)) or the relative values f_i^s (objective functions (4) and (5)) by a function G involving a set of parameters a_1, ..., a_m to be determined in such a way that G(i, a_1, ..., a_m) ≈ f_i(n) or G(i, a_1, ..., a_m) ≈ f_i^s.

In the implementation of Ben-Ari and Gal (1986) the parameters were determined by an iterative technique involving the solution of sets of simultaneous linear equations generated by simulation.

In a later implementation Kristensen (1992) determined the parameters by ordinary least squares regression on a simulated data set. The basic idea of the implementation is to take advantage of the fact that we are able to determine an optimal solution to the underlying (unrestricted) single-component model. If no herd restraint was present, the present value of the multi-component model would equal the sum of the present values of the individual animals determined from the underlying single-component model. It is then argued in what way the restraint will logically reduce the (multi-component) present value, and a functional expression having the desired properties is chosen. The parameters of the function are estimated from a simulated data set, and the optimal action for a given (multi-component) state is determined as the one that maximizes the corrected present value. (A state in the multi-component model is defined from the states of the individual animals in the single-component model, and an action defines the replacement decision of each individual animal.)
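A minimal sketch of this procedure is given below. It assumes a single penalty parameter a_1 attached to the shortage of heifers; the single-component present values, the herd size and the simulated herd values are all hypothetical and serve only to show the three steps: simulation of a data set, least squares estimation of the parameter, and maximization of the corrected present value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical single-component present values f[i, d]: present value of one
# cow in state i under decision d (0 = keep, 1 = replace), assumed to be taken
# from an already optimized (unrestricted) single-component model.
f = np.array([[100.0, 90.0],
              [ 80.0, 85.0],
              [ 60.0, 75.0]])

HEIFERS_AVAILABLE = 2        # hypothetical restraint: at most 2 heifers per stage
HERD_SIZE = 5

def features(states, actions):
    """Sum of single-component present values and the resulting heifer shortage."""
    base = f[states, actions].sum()
    shortage = max(0, int(actions.sum()) - HEIFERS_AVAILABLE)
    return base, shortage

# Step 1: simulate a data set of herd states, actions and herd present values.
# Here the 'observed' herd values are generated from an assumed penalty of 30
# per missing heifer plus noise; in practice they would come from stochastic
# simulation of the herd under the restraint.
data = []
for _ in range(500):
    states = rng.integers(0, f.shape[0], size=HERD_SIZE)
    actions = rng.integers(0, 2, size=HERD_SIZE)
    base, shortage = features(states, actions)
    herd_value = base - 30.0 * shortage + rng.normal(0.0, 2.0)
    data.append((base, shortage, herd_value))

# Step 2: estimate the parameter a_1 of G = base - a_1 * shortage by ordinary
# least squares on the simulated data set.
shortage_vec = np.array([d[1] for d in data], dtype=float)
loss_vec = np.array([d[0] - d[2] for d in data])       # base minus herd value
a1 = (shortage_vec @ loss_vec) / (shortage_vec @ shortage_vec)

# Step 3: the 'optimal' action for a given herd state maximizes the corrected
# present value G.
def best_action(states):
    best = None
    for tup in np.ndindex(*(2,) * len(states)):
        actions = np.array(tup)
        base, shortage = features(states, actions)
        g = base - a1 * shortage
        if best is None or g > best[0]:
            best = (g, actions)
    return best

g, actions = best_action(np.array([0, 1, 2, 2, 1]))
print(f"estimated a_1 = {a1:.1f}, replacement decisions = {actions.tolist()}")
```

With these invented figures the corrected criterion typically replaces only the two cows in the worst state, because a third replacement would not cover the estimated shortage penalty. In a real application G would usually involve several parameters a_1, ..., a_m and correspondingly several features of the herd state.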

Ben-Ari and Gal (1986) compared the economic consequences of the resulting optimal multi-component policy to a policy defined by dairy farmers, and they showed that the policy from the parameter iteration method was better. Kristensen (1992) compared the optimal multi-component policies to policies from usual single-component models in extensive stochastic simulations and showed that the multi-component policies were superior in situations with shortage of heifers.

The parameter iteration method has been applied under a limited supply of heifers. It seems realistic to expect that the method and the basic principles of Kristensen (1992) may be used under other kinds of herd restraints, such as the short-run adjustment to a milk quota mentioned above.

6. General discussion

In the introduction, the main difficulties of the animal replacement problem were identified as variability in traits, cyclic production, uniformity (the traits are difficult to define and measure) and herd restraints. We are now able to conclude that the difficulties of variability and cyclic production are directly solved by the application of Markov decision programming, but when the variability of several traits is included we face a problem of dimensionality. The formulation of the notion of a hierarchic Markov process contributed to the solution of the dimensionality problem, but did not solve it. The upper limit on the number of states to be included has been raised considerably, but not eliminated.

This is for instance clearly illustrated when we formulate multi-component herd models in order to deal with herd restraints. In that case we still have to use approximate methods to determine an "optimal" replacement policy. On the other hand, it has been demonstrated by Kristensen (1992) that the parameter iteration method applied to a multi-component herd model (even though it is only approximate) is able to improve the total net revenue compared to the application of a usual single-component (animal) model in a situation with shortage of heifers. The parameter iteration method is an important contribution to the problem of determining optimal replacement policies under herd restraints.

In other situations with a limiting herd restraint it may be relevant to use an alternative criterion of optimality, maximizing average net revenue per unit of the limiting factor. This method has been successfully applied in a situation with milk production under a limiting quota.

Recent results have also contributed to the solution of the uniformity problem. The Bayesian updating technique described in Section 4 seems to be a promising approach, but it has not yet been tested on real data. It might be a solution to the problem of including animal health as a trait to be considered. The problem of including diseases in the state space has never been solved, but at present Houben et al. (1992) are working on it. As concerns other traits such as litter size or milk yield, the Bayesian approach may in some cases result in a reduction of the state space without loss of information (Jørgensen, 1992a; Kristensen, 1993). Thus it contributes indirectly to the solution of the dimensionality problem.


References

Ben-Ari, Y. and Gal, S. 1986. Optimal replacement policy for multicomponent systems: An application to a dairy herd. European Journal of Operational Research 23, 213-221.
Ben-Ari, Y., Amir, I. and Sharar, S. 1983. Operational replacement decision model for dairy herds. Journal of Dairy Science 66, 1747-1759.
Giaever, H. B. 1966. Optimal dairy cow replacement policies. Ph. D. dissertation. Ann Arbor, Michigan: University Microfilms.
Harrison, P. J. and Stevens, C. F. 1976. Bayesian forecasting. Journal of the Royal Statistical Society B 38, 205-247.
Houben, E. H. P., Dijkhuizen, A. A. and Huirne, R. M. B. 1992. Knowledge based support of treatment and replacement decisions in dairy cows with special attention to mastitis. Proceedings of the SKBS/MPP Symposium '92, March 11, Zeist, The Netherlands.
Howard, R. A. 1960. Dynamic programming and Markov processes. Cambridge, Massachusetts: The M.I.T. Press.
Howard, R. A. 1971. Dynamic probabilistic systems. Volume II: Semi-Markov and decision processes. New York: John Wiley & Sons, Inc.
Huirne, R. B. M., Hendriks, Th. H. B., Dijkhuizen, A. A. and Giesen, G. W. J. 1988. The economic optimization of sow replacement decisions by stochastic dynamic programming. Journal of Agricultural Economics 39, 426-438.
Huirne, R. B. M., van Beek, P., Hendriks, Th. H. B. and Dijkhuizen, A. A. 1992. An application of stochastic dynamic programming to support sow replacement decisions. European Journal of Operational Research. In press.
Jenkins, K. B. and Halter, A. N. 1963. A multistage stochastic replacement decision model: Application to replacement of dairy cows. Technical Bulletin 67, Agr. Exp. Sta., Oregon State University.
Jewell, W. 1963. Markov renewal programming I and II. Operations Research 11, 938-971.
Jørgensen, E. 1992a. Sow replacement: Reduction of state space in Dynamic Programming model and evaluation of benefit from using the model. Submitted for journal publication.
Jørgensen, E. 1992b. The influence of weighing precision on delivery decisions in slaughter pig production. Acta Agriculturæ Scandinavica. Section A, Animal Science. Submitted for publication.
Kennedy, J. O. S. 1981. Applications of dynamic programming to agriculture, forestry and fisheries: Review and prognosis. Review of Marketing and Agricultural Economics 49, 141-173.
Kennedy, J. O. S. 1986. Dynamic programming. Applications to agriculture and natural resources. London and New York: Elsevier applied science publishers.
Killen, L. and Kearney, B. 1978. Optimal dairy cow replacement policy. Irish Journal of Agricultural Economics and Rural Sociology 7, 33-40.
Kristensen, A. R. 1987. Optimal replacement and ranking of dairy cows determined by a hierarchic Markov process. Livestock Production Science 16, 131-144.
Kristensen, A. R. 1988. Hierarchic Markov processes and their applications in replacement models. European Journal of Operational Research 35, 207-215.
Kristensen, A. R. 1989. Optimal replacement and ranking of dairy cows under milk quotas. Acta Agriculturæ Scandinavica 39, 311-318.
Kristensen, A. R. 1991. Maximization of net revenue per unit of physical output in Markov decision processes. European Review of Agricultural Economics 18, 231-244.
Kristensen, A. R. 1992. Optimal replacement in the dairy herd: A multi-component system. Agricultural Systems 39, 1-24.
Kristensen, A. R. 1993. Bayesian updating in hierarchic Markov processes applied to the animal replacement problem. European Review of Agricultural Economics 20. In press.
Kristensen, A. R. and Thysen, I. 1991. Economic value of culling information in the presence and absence of a milk quota. Acta Agriculturæ Scandinavica 41, 129-135.
Kristensen, A. R. and Østergaard, V. 1982. Optimalt udskiftningstidspunkt for malkekoen fastlagt ved en stokastisk model. (In Danish with English summary). Beretning fra Statens Husdyrbrugsforsøg 533.
McArthur, A. T. G. 1973. Application of dynamic programming to the culling decision in dairy cattle. Proceedings of the New Zealand Society of Animal Production 33, 141-147.
Reenberg, H. 1979. Udskiftning af malkekøer. En stokastisk udskiftningsmodel. (In Danish with English summary). Memorandum 6, Jordbrugsøkonomisk Institut.
Ross, S. M. 1970. Applied probability models with optimization applications. San Francisco, California: Holden-Day.
Smith, B. J. 1971. The dairy cow replacement problem. An application of dynamic programming. Bulletin 745. Florida Agricultural Experiment Station. Gainesville, Florida.
Stewart, H. M., Burnside, E. B. and Pfeiffer, W. C. 1978. Optimal culling strategies for dairy cows of different breeds. Journal of Dairy Science 61, 1605-1615.
Stewart, H. M., Burnside, E. B., Wilton, J. W. and Pfeiffer, W. C. 1977. A dynamic programming approach to culling decisions in commercial dairy herds. Journal of Dairy Science 60, 602-617.
van Arendonk, J. A. M. 1984. Studies on the replacement policies in dairy cattle. I. Evaluation of techniques to determine the optimum time for replacement and to rank cows on future profitability. Zeitschrift für Tierzüchtung und Züchtungsbiologie 101, 330-340.
van Arendonk, J. A. M. 1985. Studies on the replacement policies in dairy cattle. II. Optimum policy and influence of changes in production and prices. Livestock Production Science 13, 101-121.
van Arendonk, J. A. M. 1986. Studies on the replacement policies in dairy cattle. IV. Influence of seasonal variation in performance and prices. Livestock Production Science 14, 15-28.
van Arendonk, J. A. M. 1988. Management guides for insemination and replacement decisions. Journal of Dairy Science 71, 1050-1057.
van Arendonk, J. A. M. and Dijkhuizen, A. A. 1985. Studies on the replacement policies in dairy cattle. III. Influence of variation in reproduction and production. Livestock Production Science 13, 333-349.
van der Wal, J. and Wessels, J. 1985. Markov decision processes. Statistica Neerlandica 39, 219-233.
White, C. C. III and White, D. J. 1989. Markov decision processes. European Journal of Operational Research 39, 1-16.

