UNIVERSITY OF WEST BOHEMIA
FACULTY OF APPLIED SCIENCES
Software Analysis of Bayesian Distributed Dynamic Decision Making
A thesis submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Prague, 2005 Václav Šmídl
Summary
Decision making is an active and purposeful selection of actions among several alternative options.
For humans, DM is a natural part of everyday life. Bayesian theory provides a rigorous and
consistent tool that helps the decision maker select the best action to achieve his aim. A significant
application area of decision-making theory is control theory. Most applications of the
theory are based on two assumptions: (i) the optimal decision is the only action that intentionally
influences the response, (ii) the decision maker pursues only one aim, which is known a priori. A
theory of distributed Bayesian decision making, which relaxes the above-mentioned assumptions, is
still under development.
This thesis is a contribution to a wider project aimed at creating a consistent theory of distributed
Bayesian decision making, using the concept of multiple-participant decision making. The main
concern of this work is the preparation of a new software framework for development and application
of the Bayesian distributed decision-making theory. In order to achieve this aim, we have done the
following:
Chapter 2: the requirements on the resulting software were formalized, and the most prominent
freely available software packages were reviewed in the light of these requirements. It was
concluded that none of the packages is suitable for our needs and that it is necessary to create
a new one. We have chosen the object-oriented (OO) approach as the design method of the
toolbox. Since the main development platforms for the project are Matlab and ANSI C, we have
proposed a novel approach to the implementation of OO software in these tools.
Chapter 3: the basics of Bayesian decision-making theory were reviewed in this Chapter. We
presented well-known results, as well as newly emerging methods, and translated them into a
sequence of basic probabilistic operations suitable for software implementation.
Chapter 4: it is well known that the Bayesian theory of decision making is computationally tractable
only under certain assumptions. Many approximate techniques have been developed for model
families for which general Bayesian DM is not analytically tractable. These techniques were
also reviewed in this Chapter. Special attention was paid to the Variational Bayes technique,
which is based on the assumption of conditional independence. The basic tasks of decision
making under this approximation were introduced.
Chapter 5: the basic steps of implementing DM theory in practice, gained from the experience
with single-participant DM, were reviewed in this Chapter. The majority of these steps is
concerned with translating real-world experience into abstract objects of the theory. The
algorithms of DM can be applied only when those objects are chosen and fixed.
Chapter 6: the basic practical steps of the design of single-participant DM were reviewed in the light
of the multiple-participant scenario in this Chapter. An original concept of Bayesian MP DM is
presented. It was shown that many sub-tasks of MP DM (such as merging)
have already been addressed in the design tasks of single-participant DM.
Chapter 7: the core contribution of the thesis, i.e. the analysis of a new-generation software framework,
is presented in this Chapter. Since all tasks of DM are implemented in terms of probability
calculus, the most challenging task was to design the basic classes for random variables, functions
and pdfs. The chosen approach appears very promising, as it embraces the classical
models, as well as the new approximate models based on conditional independence.
Conclusions and suggestions for further work are presented in Chapter 8.
Acknowledgement
This thesis was prepared during my study at the Faculty of Applied Sciences, University of West Bohemia
in Pilsen, Czech Republic. It is based on research work carried out in the Adaptive Systems
Department, Institute of Information Theory and Automation, Academy of Sciences of the Czech
Republic.
I am grateful to Ing. Miroslav Kárný, DrSc., and Doc. Ing. Jirí Cendelín, CSc., my
supervisors for this thesis, for their support and inspiration.
I would like to thank my family, for their love and support over the years.
The financial support of the projects GACR 102/03/0049 and AVCR 1ET 100 750 401, BADDYR,
is gratefully acknowledged.
Contents
Summary i
Acknowledgement iii
Notational Conventions xi
List of Acronyms xiii
1 Introduction 1
1.1 The theory of multiple participant DM . . . . . . . . . . . . . . . . . . . . . . 2
    1.1.1 Basic nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
    1.1.2 Bayesian approach to MPDM . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Aim of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Problem Formulation 5
2.1 Purpose of the software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
    2.1.1 Software framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
    2.1.2 Software implementation . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 State of the art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
    2.2.1 Mixtools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
    2.2.2 BNT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
    2.2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Object-oriented approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
    2.3.1 Basic principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
    2.3.2 Survey of OO languages . . . . . . . . . . . . . . . . . . . . . . . . . 12
    2.3.3 Legacy software tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
    2.3.4 Object-oriented approach in Matlab and ANSI C . . . . . . . . . . . . 13
    2.3.5 Unified Modelling Language (UML) . . . . . . . . . . . . . . . . . . . 15
        2.3.5.1 Class diagram . . . . . . . . . . . . . . . . . . . . . . . . . 15
        2.3.5.2 Sequential diagram . . . . . . . . . . . . . . . . . . . . . . 16
3 Theory of Decision Making 17
3.1 Bayesian formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
    3.1.1 Basic nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
    3.1.2 Probabilistic calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
        3.1.2.1 Basic elements . . . . . . . . . . . . . . . . . . . . . . . . . 18
        3.1.2.2 Operations on pdf . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Dynamic learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
    3.2.1 Probabilistic models: description of reality . . . . . . . . . . . . . . . 21
    3.2.2 Bayesian filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
    3.2.3 Bayesian estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Dynamic design of control strategy . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Merging of pdfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
    3.4.1 Direct merging of pdfs . . . . . . . . . . . . . . . . . . . . . . . . . . 26
    3.4.2 Indirect merging of pdfs . . . . . . . . . . . . . . . . . . . . . . . . . 27
4 Feasible Decision Making 29
4.1 Linear state-space models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
    4.1.1 Dynamic learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
    4.1.2 Fully probabilistic design . . . . . . . . . . . . . . . . . . . . . . . . 30
    4.1.3 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Time-invariant exponential family models . . . . . . . . . . . . . . . . . . . 33
    4.2.1 The models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
    4.2.2 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
    4.2.3 Fully probabilistic design . . . . . . . . . . . . . . . . . . . . . . . . 35
    4.2.4 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Distributional approximations . . . . . . . . . . . . . . . . . . . . . . . . . . 36
    4.3.1 Certainty equivalence approximation . . . . . . . . . . . . . . . . . . 37
    4.3.2 Laplace’s approximation . . . . . . . . . . . . . . . . . . . . . . . . . 38
    4.3.3 Fixed-form minimum distance approximation . . . . . . . . . . . . . . 38
    4.3.4 Variational Bayes (VB) approximation . . . . . . . . . . . . . . . . . 39
    4.3.5 Markov Chain Monte Carlo (MCMC) approximation . . . . . . . . . . 41
4.4 Approximate Bayesian filtering . . . . . . . . . . . . . . . . . . . . . . . . . 41
    4.4.1 Forgetting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
    4.4.2 Variational Bayes filtering . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Approximate estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
    4.5.1 Bayes-closed approximation . . . . . . . . . . . . . . . . . . . . . . . 43
    4.5.2 Projection based approach . . . . . . . . . . . . . . . . . . . . . . . . 44
    4.5.3 On-line Variational Bayes . . . . . . . . . . . . . . . . . . . . . . . . 45
4.6 Approximate design of DM Strategy . . . . . . . . . . . . . . . . . . . . . . 46
5 Practical Aspects of Decision Making 47
5.1 Problem description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2 Prior elicitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
    5.2.1 Elicitation of prior pdf from one source . . . . . . . . . . . . . . . . . 51
    5.2.2 Merging of knowledge sources . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.4 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.5 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
    5.5.1 Validation with fixed cutting moment . . . . . . . . . . . . . . . . . . 54
    5.5.2 Validation with multiple cutting moments . . . . . . . . . . . . . . . . 55
    5.5.3 Other techniques of model validation . . . . . . . . . . . . . . . . . . 57
5.6 Elicitation of ideal pdfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.7 Design of DM strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.8 Design validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6 Multiple Participant Decision Making 59
6.1 On-line (data-processing) stage . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.2 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.3 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.4 Negotiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.5 Design of MP decision-maker . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7 Software Image 65
7.1 Package Math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 Package Prob . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
    7.2.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
        7.2.1.1 Datatype: rv_id . . . . . . . . . . . . . . . . . . . . . . . . 67
        7.2.1.2 Class RV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
        7.2.1.3 Class RVfinal (RV) . . . . . . . . . . . . . . . . . . . . . . 67
        7.2.1.4 Class RVlist (RV) . . . . . . . . . . . . . . . . . . . . . . . 68
    7.2.2 Functions on random variables . . . . . . . . . . . . . . . . . . . . . 68
        7.2.2.1 Class function . . . . . . . . . . . . . . . . . . . . . . . . . 68
        7.2.2.2 Class ConstFn . . . . . . . . . . . . . . . . . . . . . . . . . 69
        7.2.2.3 Class LinearFn . . . . . . . . . . . . . . . . . . . . . . . . . 70
        7.2.2.4 Other classes . . . . . . . . . . . . . . . . . . . . . . . . . 70
    7.2.3 Observed data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
        7.2.3.1 Class DataSource . . . . . . . . . . . . . . . . . . . . . . . 70
    7.2.4 Probability density functions (pdfs) . . . . . . . . . . . . . . . . . . . 71
        7.2.4.1 Class mPdf . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
        7.2.4.2 Class ePdf . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
        7.2.4.3 Class oPdf (mPdf) . . . . . . . . . . . . . . . . . . . . . . . 74
        7.2.4.4 Class pPdf (mPdf) . . . . . . . . . . . . . . . . . . . . . . . 74
        7.2.4.5 Class ePdfFinal (ePdf) . . . . . . . . . . . . . . . . . . . . 75
        7.2.4.6 Class eEmp (ePdfFinal) . . . . . . . . . . . . . . . . . . . . 76
7.3 Package FProb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
    7.3.1 Linear state-space models . . . . . . . . . . . . . . . . . . . . . . . . 76
        7.3.1.1 Class QuadraticFn (LinearFn) . . . . . . . . . . . . . . . . . 77
        7.3.1.2 Class expQuadFn (QuadraticFn) . . . . . . . . . . . . . . . 77
        7.3.1.3 Class mNorm (mPdf) . . . . . . . . . . . . . . . . . . . . . 77
        7.3.1.4 Class oNorm (oPdf, mNorm) . . . . . . . . . . . . . . . . . 78
        7.3.1.5 Class pNorm (pPdf) . . . . . . . . . . . . . . . . . . . . . . 78
        7.3.1.6 Class eNorm (ePdfFinal) . . . . . . . . . . . . . . . . . . . 79
    7.3.2 Exponential family models . . . . . . . . . . . . . . . . . . . . . . . 80
        7.3.2.1 Class MultiIndexFn (function) . . . . . . . . . . . . . . . . 82
        7.3.2.2 Class eEF (ePdfFinal) . . . . . . . . . . . . . . . . . . . . . 82
        7.3.2.3 Class eGW_LD (eEF) . . . . . . . . . . . . . . . . . . . . . 82
        7.3.2.4 Class mDelta (mPdf) . . . . . . . . . . . . . . . . . . . . . 83
        7.3.2.5 Class mFrgEF (mPdf) . . . . . . . . . . . . . . . . . . . . . 83
        7.3.2.6 Class oEF (oPdf) . . . . . . . . . . . . . . . . . . . . . . . 84
        7.3.2.7 Class eMC (eEF) . . . . . . . . . . . . . . . . . . . . . . . 84
        7.3.2.8 Class pMC (pPdf) . . . . . . . . . . . . . . . . . . . . . . . 85
    7.3.3 Variational Bayes approach . . . . . . . . . . . . . . . . . . . . . . . 86
        7.3.3.1 Class oVBnet (oPdf) . . . . . . . . . . . . . . . . . . . . . . 86
        7.3.3.2 Class oVBpart (oEF) . . . . . . . . . . . . . . . . . . . . . 87
        7.3.3.3 Class eVBnet (ePdf) . . . . . . . . . . . . . . . . . . . . . . 87
        7.3.3.4 Class pVBnet (pPdf) . . . . . . . . . . . . . . . . . . . . . 88
7.4 Package SingleDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
    7.4.1 Class UserInfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
        7.4.1.1 Class DataInfo . . . . . . . . . . . . . . . . . . . . . . . . . 90
            7.4.1.1.1 ChnlInfo . . . . . . . . . . . . . . . . . . . . . . . 90
            7.4.1.1.2 FilterInfo . . . . . . . . . . . . . . . . . . . . . . 92
        7.4.1.2 Class PriorInfo . . . . . . . . . . . . . . . . . . . . . . . . 92
            7.4.1.2.1 PriKnInfo . . . . . . . . . . . . . . . . . . . . . . 92
        7.4.1.3 Class ModelInfo . . . . . . . . . . . . . . . . . . . . . . . . 92
        7.4.1.4 Class EFModInfo (ModelInfo) . . . . . . . . . . . . . . . . 93
        7.4.1.5 Class MValidInfo . . . . . . . . . . . . . . . . . . . . . . . 93
            7.4.1.5.1 Class ValInfo . . . . . . . . . . . . . . . . . . . . 93
            7.4.1.5.2 Class CuttingVInfo (ValInfo) . . . . . . . . . . . . 93
        7.4.1.6 Class IdealInfo . . . . . . . . . . . . . . . . . . . . . . . . 93
            7.4.1.6.1 Class IdealChInfo . . . . . . . . . . . . . . . . . 94
        7.4.1.7 Class DesignInfo . . . . . . . . . . . . . . . . . . . . . . . 94
        7.4.1.8 Class DValidInfo . . . . . . . . . . . . . . . . . . . . . . . 94
    7.4.2 Special purpose classes . . . . . . . . . . . . . . . . . . . . . . . . . 94
        7.4.2.1 Class FictOPdf (oPdf) . . . . . . . . . . . . . . . . . . . . . 95
        7.4.2.2 Class iPdf (ePdf) . . . . . . . . . . . . . . . . . . . . . . . 95
        7.4.2.3 Class Simulator (DataSource) . . . . . . . . . . . . . . . . 95
        7.4.2.4 Class Hypotheses . . . . . . . . . . . . . . . . . . . . . . . 96
        7.4.2.5 Class MVHypothesis (Hypothesis) . . . . . . . . . . . . . . 97
    7.4.3 Decision Makers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
        7.4.3.1 Class AdaptDM . . . . . . . . . . . . . . . . . . . . . . . . 97
        7.4.3.2 Class SingleDM (AdaptDM) . . . . . . . . . . . . . . . . . 98
7.5 Package MultiDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
    7.5.1 User Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
        7.5.1.1 Datatype DAEP . . . . . . . . . . . . . . . . . . . . . . . . 101
        7.5.1.2 Class MPUserInfo (UserInfo) . . . . . . . . . . . . . . . . . 102
        7.5.1.3 Class NeighInf . . . . . . . . . . . . . . . . . . . . . . . . . 103
        7.5.1.4 Class DAEPInfo (DataInfo) . . . . . . . . . . . . . . . . . . 103
    7.5.2 Special purpose classes . . . . . . . . . . . . . . . . . . . . . . . . . 103
        7.5.2.1 Class DAEPSource (DataSource) . . . . . . . . . . . . . . . 103
    7.5.3 Decision Makers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
        7.5.3.1 Class MultiDM (SingleDM) . . . . . . . . . . . . . . . . . . 104
8 Conclusion 107
8.1 Key contributions of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Notational Conventions
Linear algebra
a, A            all mathematical variables are assumed to be multivariate; no distinction is
                made between lower and upper case letters.
a_i             the i-th element of a multivariate variable a (a is assumed to be a vector).
a_{i,j}         the (i,j)-th element of a matrix a.
a_t, a_{i;t}    the variable a and the i-th element of a at time t, respectively; t is a letter
                reserved for the time index. If there is more than one letter in a subscript,
                the time index is always the last one and is separated from the others by a
                semicolon.
A'              transposition of a matrix A.
I_r             square identity matrix of dimensions r × r.
1_{p,q}, 0_{p,q} matrix of size p × q with all elements equal to one, zero, respectively.
tr(A)           trace of a matrix A.
δ(x)            delta-type function; its exact meaning is determined by the type of the
                argument x. If x is a continuous variable, then δ(x) is the Dirac delta
                function, ∫ δ(x − x_0) g(x) dx = g(x_0). If x is a discrete variable, then
                δ(x) is the Kronecker function, δ(x) = 1 if x = 0 and δ(x) = 0 otherwise.
Probability calculus
Pr(·)           probability of the argument.
f(·)            probability (density) function (pdf).
^I f, ^o f      due to the probabilistic description of DM, various pdfs may be defined on
                the same variables for different purposes. The purpose is denoted by the
                upper-left index, e.g. I, o. Otherwise, the meaning of the p(d)f is given
                through the name of its argument.
^w f            a pdf without explicit specification of its form; this type of pdf is used in
                functional optimization techniques.
x               denotes a random quantity (variable).
x*              denotes the range of x, x ∈ x*.
x̊               denotes the number of members in the countable set x*, as well as the
                number of elements in the multivariate variable (array) x.
≡               means equality by definition.
x_t             is the quantity x at the discrete time labelled by t ∈ t* ≡ {1, . . . , t̊};
                t̊ ≤ ∞ is called the (decision, learning, prediction, control) horizon.
x_{i;t}         is the i-th entry of the array x at time t, i = 1, . . . , x̊. The semicolon in
                the subscript indicates that the symbol following it is the time index.
x(k . . . l)    denotes the sequence of x_t with t between time moments k ≤ l, i.e.
                x(k . . . l) ≡ (x_k, . . . , x_l).
x(t)            simplified notation, x(t) ≡ x(1 . . . t); specifically, for t < 1, x(t) is an
                empty sequence.
x_[1], f_[2]    in multiple-participant settings, this form of subscript denotes affiliation of
                the given object with the participant identified by the number in brackets.
N(µ, s²)        Normal distribution with mean value µ and variance s².
G(α, β)         Gamma distribution with scalar parameters α and β.
U(·), U((α, β]) Uniform distribution on the argument set, on the interval (α, β], respectively.
List of Acronyms
AR AutoRegressive (model, process)
BNT Bayesian Networks Toolbox
DM Decision Making
EF Exponential Family
EM Expectation Maximization (algorithm)
KF Kalman Filter
KL Kullback-Leibler (distance)
MAP Maximum A Posteriori Probability
MCMC Markov Chain Monte Carlo
ML Maximum Likelihood
MP Multiple Participant
pdf probability density function
UML Unified Modelling Language
VB Variational Bayes
VEM Variational EM (algorithm)
1 Introduction
Decision making [1, 2, 3, 4, 5] is an active and purposeful selection of actions among several
alternative options. For humans, DM is a natural part of everyday life. In this text, we are concerned
with an abstract concept of decision making, without distinguishing whether the actions are chosen
by a human or a machine. Therefore, we use the neutral word decision maker for both options.
Dynamic DM arises when the decision maker is aware of dynamically delayed consequences of his
decisions, and takes these consequences into account in the DM process. Obviously, control [6, 7, 8,
9, 10, 11] can be viewed as a specific instance of dynamic DM, cf. the IEEE Series of Conferences
on Decision and Control.
Dynamic DM, and thus control, is always made under uncertainty caused by the decision maker's
incomplete knowledge of the mechanism relating the actions and their consequences. In fact, the
ever-present uncertainty is the real reason for feedback, i.e. modification of the decision maker's
actions using the observed data. Stochastic control theory [12] and, more generally, the theory of
statistical DM [13, 5] model this situation. They guide the design of a DM strategy, i.e. an optimal
sequence of rules which map the available knowledge, DM aims and observations onto actions.
Often, the knowledge is accumulated from observations made either before applying the DM
strategy or even during the course of actions [14, 15, 16]. Thus, a sort of learning becomes a
generic part of the DM strategy.
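This accumulation of knowledge can be written as the standard recursive Bayesian update (a generic sketch only; the exact operations are introduced in Chapter 3), where Θ denotes the unknown quantities and d(t) the data observed up to time t:

```latex
f\big(\Theta \mid d(t)\big) \;\propto\;
f\big(d_t \mid \Theta, d(t-1)\big)\, f\big(\Theta \mid d(t-1)\big).
```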
The following assumptions are typically made in the design of optimal DM strategies:
1. The optimized strategy is the only system that intentionally influences the optimized responses.
2. Typically, only one DM aim is given, and it is known a priori.
These assumptions are too restrictive for certain types of problems, for example multi-criterion
decision making [17], or cooperation of autonomous units (robot football) [18].
The first of the listed assumptions seems to be appropriate, since the optimized strategy
can handle multivariate actions. However, the computational and communication complexity,
inherent to DM under uncertainty, makes this assumption very restrictive. The DM strategies
designed under it are practically feasible only in relatively low-dimensional problems, and the
solution is far from scalable.
In practice, the problem is solved by decomposition of the whole DM problem, leading to a
necessarily approximate distributed DM. This methodology shifts the complexity boundary of
solvable cases much further, e.g. [19, 20, 21, 22]. At the same time, many problems are still
open, and there seems to be no commonly accepted methodology for approaching the solution.
Some problems seem to be of a conceptual nature [23].
The second assumption is often violated in practice and represents a real problem even in standard
DM [4, 19]. The violation is even more serious in distributed settings. It raises the complexity of
the DM problem and of its solution, as it requires solving coordination and negotiation problems.
Game theory [24] addresses the problem, but the assumption of fully rational players has already
been questioned [25], and there also exist serious conceptual problems concerning negotiation [26]
within the discussed formulation.
1.1 The theory of multiple participant DM
The above discussion indicates that there is an urgent need to create a realistic, scalable theory of
distributed dynamic decision making under uncertainty. We will also call this theory multiple-participant
decision making (MP DM).
This thesis is a contribution to a wider project aimed at creating a consistent theory of
MP DM.
In this Section, we introduce the background of this project, the adopted approach, concepts and
methods. The aim of the thesis, within the project, is defined in detail in the next Section.
1.1.1 Basic nomenclature
The transition from single- to multiple-participant decision making requires a new nomenclature.
In this Section, we relate the terminology used in this text to the terminology of single-participant
DM, which is also commonly used in control theory.
A single controller, as a prototype of a single decision maker, influences a part of the real world of
its interest, traditionally called the system. In the considered multiple-participant scenario, parts of
the system can be influenced by several controllers, the participants in the DM process. The
traditional understanding of "the system" loses its clarity, and it is reasonable to adopt the term
environment. This is, again, a part of the world, which can be influenced by any of the participants.
Each participant interacts with a part of the environment via (i) observations, and (ii) decisions.
This is illustrated in Figure 1.1.
The main distinction between single- and multiple-participant DM is the ability of the participants
to communicate with each other. If the participants are not aware of each other's presence, or do
not care about the others, they act as single decision makers, following their different and possibly
contradictory aims. In such a situation, their mutual effect is generically adverse and yields poor
overall performance.
1.1.2 Bayesian approach to MPDM
The intended Bayesian theory [27] treats the task of distributed DM as a task of DM with multiple
individual decision makers (participants), which have:
1. individual aims,
[Figure 1.1: Relation of single and multiple participant DM. Left panel: single participant DM,
with each controller exchanging data and actions with its own system. Right panel: multiple
participant DM, with communicating participants exchanging data and actions with a shared
environment.]
2. pre-determined abilities to observe, act, evaluate and communicate with other participants.
Thus, the problem is reformulated as many parallel single-participant DM tasks with non-standard
but very realistic assumptions. This approach guarantees a priori the full scalability of the distributed
DM.
Thus, many results from the single-participant DM theory can be used. The following ideas form
the basis of the approach:
• A normative (prescriptive) theory is sought. General results on DM under uncertainty
[1, 13, 5, 28] imply that the DM of each individual participant is to be guided by the Bayesian
DM paradigm.
• The rigorous, fully probabilistic formulation is used to design the DM strategy for a given
probabilistic model [29, 30, 28, 31].
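In a generic notation (a sketch only; the precise formulation of the fully probabilistic design is developed in Chapter 3), the admissible strategy is chosen so that the joint pdf f of the closed loop over the horizon t̊ is as close as possible, in the Kullback-Leibler sense, to a user-specified ideal pdf ^I f:

```latex
\operatorname{KL}\!\left(f \,\middle\|\, {}^{I}\!f\right)
\equiv \int f\big(d(\mathring{t})\big)\,
\ln \frac{f\big(d(\mathring{t})\big)}{{}^{I}\!f\big(d(\mathring{t})\big)}
\,\mathrm{d}\,d(\mathring{t})
\;\longrightarrow\; \min_{\text{admissible strategies}}.
```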
The extension of such a fully probabilistic decision maker to the multiple-participant scenario
should be formulated in the same (i.e. probabilistic) terms. This implies that the coordination
needed in the distributed setting reduces to reporting probabilities within a small set of reachable
neighbors. Therefore, two additional problems, which are not addressed by the centralized Bayesian
DM, have to be solved:
1. A mechanism for coordination of the actions of a participant with its neighbors has to be
designed. The solution of this problem is foreseeable: the participant has to share and harmonize
knowledge and aims with its neighbors. Using the fully probabilistic formulation of DM, the
problem reduces to merging and extension of probability distributions in the vein discussed in
[32].
2. The use of the above-mentioned communication mechanism can be seen as a new decision-making
problem to be solved by each participant. Therefore, each participant has to design
a corresponding strategy of communication. The paper [27] indicates that the number of such
strategy types is very limited (selfish, cooperating and hierarchically cooperating participants);
hence, the design of adequate strategies is feasible.
Preliminary results of this research can be found in [33].
1.2 Aim of the thesis
One of the goals of the project is to apply the emerging theory to a set of real problems. In order to
achieve this goal, it is necessary to create a reliable software image of the theory. None of the
available single-participant DM software frameworks is easily extensible to deal with
multiple-participant scenarios.
The aim of this thesis is to prepare a new software framework for development
and application of the Bayesian distributed decision making theory.
This task is challenging because the theory is not fully developed and stabilized. The software
should help in developing this theory and its parts as well as in transferring the results to various
application domains.
In order to achieve the overall aim, we define the following subtasks:
1. Formalize the requirements on the resulting software. These requirements can arise from the
considered application areas, theoretical background, or researchers involved in the project.
2. Review the available software tools in the light of these requirements.
3. Review the latest theoretical results and methods that should be supported by the framework.
4. Design the framework.
5. Demonstrate that the framework embraces state-of-the-art decision making problems.
These tasks will be addressed in the sequel as follows:
Chapter 2 defines the addressed problem. It contains formalization of the requirements, review of
the available software, and description of the chosen implementation.
Chapter 3 reviews the basic concepts of the Bayesian theory of decision making.
Chapter 4 analyzes the general theory from a computational point of view. Computationally feasible
models, i.e. models with an exact or approximate solution of the DM problem, are reviewed.
Chapter 5 introduces some aspects of DM that are important for application of the theory to a real
problem. Experience accumulated in long-term research of single-participant decision making
is reviewed here.
Chapter 6 discusses the implications of the extension to the multiple-participant setting.
Chapter 7 presents the main result of the thesis, i.e. the analysis of the software image of the theory
described previously.
2 Problem Formulation
The theory of statistical decision making was developed in [1], elaborated into an engineering form
by [14, 15], and updated in [28]. Translation of the theory into software is a challenging task, since
the theory describes the real world in terms of abstract mathematical structures, such as density
functions and functionals. The process of decision making is then defined in terms of operations on
these structures. The set of all possible structures is extremely rich, and operations over many of
them are not computationally tractable. Therefore, each attempt at a software analysis of the theory
has to, inevitably, restrict its scope to a certain, computationally tractable, sub-set. This initial
restriction is a very important step, since it represents a trade-off between (i) modelling abilities,
and (ii) simplicity of implementation of the software.
In this Chapter, we define the aim of the software and the requirements imposed on it. These
requirements will be used as guidelines in designing the software.
2.1 Purpose of the software
This work is being carried out within a collaborative research environment with a long history. Any
complex software resulting from long-term research naturally follows several, often contradictory,
aims. The first step of the design is, therefore, to identify these aims and their importance. They are
summarized here:
1. Inspection of the theory
The basic problems of DM were solved on the general level a long time ago. However, the
resulting operations were found to be computationally tractable only for small sub-sets of
mathematical models. For these feasible models (e.g. ARX models, state-space models, discrete
Bayesian networks) the DM has matured into reliable and practically applicable algorithms.
These disjoint sub-sets are bounded by analytical tractability. Worldwide, a lot of
effort has been directed at lifting these boundaries, and important advances have been made in
approximation theory and its application to DM. Therefore, we intend to review the state-
of-the-art techniques and draw new borders of feasibility.
2. Establish a basis for long-term research
In spite of the fact that general solutions of the DM problem have been known for a long time, there are—
and will be—many detailed issues that are not satisfactorily resolved. Therefore, the software
should be open to further extensions in such a way that the involved researchers will actively
deal with sub-parts of the software and act as passive users of the rest. Practically, the
software should consider even tasks (operations, functions) for which an algorithmic solution
is not yet known. If this can be achieved, it will lay the basis for long-term research, where
attention can be focused on a particular problem and not on re-implementation of the overall
framework.
3. Unification of existing software applied to a wide range of real-life problems
At present, there are many software packages implementing (to a certain level) the DM for a
particular class of models in particular application areas. The newly designed software should
be general enough to cover at least the same range of problems and, ideally, utilize as much
of the experience accumulated within these projects as possible.
To address these aims in more detail, it is useful to distinguish two principal parts of the software
package:
Framework is a general description of the distributed dynamic DM. It specifies (i) data structures,
and (ii) algorithms.
Implementation is a realization of the framework in a programming language.
The specification of the framework should be independent of its implementation. This will be achieved
by the use of a general modelling language in which the data structures and algorithms will be
described. Various implementations of the framework may arise. These implementations may be
application-specific, with different intellectual property rights. However, all implementations should
follow the framework specification in order to be mutually (almost) compatible.
2.1.1 Software framework
As mentioned in the previous Section, the framework will be shaped according to the theory. However,
the full generality of the theory cannot be captured by any software. Inevitably, we have to restrict
our support of the DM problem to a suitable class of mathematical models of the environment and
the aims of DM. Therefore, we seek a class of models that is as general as possible, but at the
same time computationally tractable and applicable in real life.
In this Section, we summarize the necessary requirements on candidate families:
Requirement 2.1 (Requirements on the software framework) The considered framework should
support:
1. Multivariate dynamic models; the environment we intend to work with is expected to have:
• both discrete and continuous variables, with mutual dependencies between them,
• a dynamic nature, i.e. present behavior depends on previous observations.
6
2.1. PURPOSE OF THE SOFTWARE
The chosen class of mathematical models must support these properties.
2. Lego-like concept; the software framework should provide basic building blocks that can be
seamlessly composed into complicated structures. These blocks should:
• cover data structures corresponding to basic structural elements in the theory,
• include composition tools corresponding to operators in the theory,
• allow easy addition of new types of all elements (within the framework).
Partially composed elements should be ready for particular tasks. This may be achieved using
an object-oriented (OO) approach to software design.
3. Design of DM strategies; the nature of MPDM requires participants to design their DM strategies
as follows:
• Since the aims of the DM may be changed on-line, each participant must be able to
re-evaluate its strategy recursively.
• Communication with other participants is also a DM problem, hence, each participant
may change its communication strategy at any time.
4. User interface; a user is a human being who determines the desired behaviour of the whole
MPDM scenario. Therefore, the software framework should provide tools for interaction with
non-expert users:
• Description of the DM problem, i.e. its aims, available knowledge, used model and constraints,
should be made in the user's terms, independently of the processing method.
• Presentation of results has to be close to the application domain.
• Processing outputs have to support the “publication-technological” line.
• The choice among alternative processing methods and the corresponding tuning knobs has to
be left to experts only. Meaningful defaults have to be built in.
2.1.2 Software implementation
The software is to be used in full-scale applications, which induces high requirements for quality and
maturity of the code. The following points seem to be indispensable.
Requirement 2.2 (Requirements on implementation) The supported development platform should
be:
1. numerically stable and efficient; which is important in industrial applications,
2. portable to a wide range of platforms; it should run on anything from a supercomputer to an
industrial micro-controller,
3. suitable for implementation of object-oriented algorithms; which is necessary for seamless
implementation of the framework, which will be defined using an object-oriented methodology,
4. able to reuse the code that is already available; most of the development in the area was done
in Matlab and C. The new tool should be easily connectible to these tools,
5. economically affordable; applications of the framework in not-for-profit organizations are also
considered; therefore, we should not rely on any expensive proprietary tools,
6. user-friendly; allowing easy testing of new algorithms and tuning knobs.
Traditionally, the development was done sequentially in: (i) Matlab, for rapid development and
testing, (ii) pure ANSI C, for portability and implementation in Matlab-free applications, and (iii)
Mex files, for connectivity of Matlab and ANSI C. This chain will be discussed later in Section 2.3.3.
2.2 State of the art
In this Section, we review the existing solutions in the light of the requirements described above.
We have tested software packages from the following research areas:
Optimal advising: package Mixtools,
http://guest:[email protected]:1800/svn/mixtools/ ,
Bayesian networks: package BNT,
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html ,
Graphical models: project gR,
http://www.r-project.org/gR/ ,
Bayesian neural networks: project fbm,
http://www.cs.toronto.edu/~radford/fbm.software.html ,
Nonlinear filtering: project nftool, http://control.zcu.cz/nftools/ ,
and rebel, http://choosh.ece.ogi.edu/rebel/ ,
Bayesian decision making: project IND,
http://ic.arc.nasa.gov/ic/projects/bayes-group/ind/ ,
Multi-agent systems: project JADE, http://jade.tilab.com/ ,
this project is one example from the family of multi-agent systems. Multi-agent systems form
a large area of research (www.multiagent.com ), with defined standards for interoperability
(http://www.fipa.org/ ). However, the specification of agents does not provide any
guidance on the choice of the decision-making methodology. Therefore, it cannot be compared
with other tools from this point of view. The value of multi-agent systems—from our
point of view—is in the definition of communication protocols, such as the request interaction
protocol, http://www.fipa.org/specs/fipa00026/ .

project     multivariate     lego-like  recursive  user        implementation
            dynamic models   concept    DM         interface
Mixtools    +/−              +/−        +          +           Matlab, C
BNT         +                +          +/−        +           Matlab, C++
gR          +                +          −          −           S
fbm         +                +/−        −          −           ANSI C
nftool      +/−              +/−        −          −           Matlab
IND         −                −          +/−        −           C

where + denotes full support, +/− partial support, and − no support of the given feature.

Table 2.1: Review of available software packages for Bayesian decision making.
Here, we have selected only the most advanced, freely available tools. There are many more software
projects in the areas of state-space modelling and Bayesian estimation, see e.g. the survey at
http://leuther-analytics.com/bayes/free-bayes-software.html . However, the number
of tools for Bayesian decision-making is rather limited. Evaluation of all projects with respect to
our requirements (Requirements 2.1 and 2.2) is briefly summarized in Table 2.1.
Most of the projects are implemented in Matlab or C, a combination that will be studied in
Section 2.3.3. However, the implementation is the less important factor, since most of the packages fail
our requirements on the framework. All the studied tools have some advantages and disadvantages;
the most common disadvantage is that the application area of each tool is too narrow for our purpose.
All projects do well within their field of expertise, but none of them fulfills all of our requirements.
From those tools, we chose Mixtools and BNT for further detailed analysis, since these meet
most of our requirements.
2.2.1 Mixtools
Mixtools is a Matlab Toolbox developed in our department specifically for the purpose of optimal
Bayesian advising [34, 28, 35].
The model: The basic observation model considered within this toolbox is a mixture of ARX models
(autoregressive models with exogenous input). Both continuous and discrete observations
are supported via mixtures of Gaussian and Markov-chain regression models, respectively.
Lego-like concept: The basic structure is a mixture of ARX (or Markov) models. It is possible
to define new types of components in the mixture; however, it requires a relatively large amount
of effort. Composition tools are ready only for the creation and manipulation of mixtures. Another
disadvantage of this package is its centralized data-handling mechanism.
Decision making: the DM strategy is designed using the fully probabilistic approach. This approach
is capable of both recursive evaluation of the DM strategy and formalization of the communication
strategy as a DM problem.
User interface: many tools supporting non-expert users are available.
Implementation: the toolbox is implemented in the Matlab, Mex, and ANSI C programming environments
(see Section 2.3.3). This type of implementation ensures both (i) ease of development
within Matlab, and (ii) portability and industrial applicability through ANSI C.
2.2.2 BNT
The BNT package implements state-of-the-art algorithms for Bayesian networks [36]. This toolbox
has two principal distinctions from other tools typically used in the area of graphical and Bayesian-
network modelling. First, autoregressive (i.e. dynamic) models are considered. Second, it includes
basic support for decision making.
The model: the basic model is a dynamic Bayesian network. In principle, it is a pdf restricted
by assumptions of conditional independence between some variables. These assumptions
are described in terms of a graph, where pdfs are nodes and edges denote mutual
dependence of nodes. Currently, the following types of nodes are supported: Gaussian models,
hidden Markov models, perceptron neural networks, and discrete models.
Lego-like concept: the conditional independence assumption is an excellent tool for the separation
of basic building blocks and their composition. Namely, the nodes of the graph represent the
basic building blocks, and the graph (network) of their dependencies is the composition tool.
This way, complex structures can be easily created.
Decision making: is made via so-called utility functions, which are assigned to each node in the
graph. This mechanism supports both recursive evaluation of the DM strategy and formalization
of the communication strategy as a DM problem. However, it is readily available only for
one-step-ahead DM. Extension to a longer DM horizon is possible; however, it would be
computationally inefficient, since it requires building the network for all variables within the DM
horizon.
User interface: is quite limited. Only expert users of the toolbox are supported. However, additional
projects are trying to fill the gap, such as the BNT editor, http://bnt.insa-rouen.fr/BNTEd.html .
Implementation: the toolbox is primarily written in Matlab; however, a preliminary port to C++
is available from Intel, http://www.intel.com/research/mrl/pnl/ .
2.2.3 Summary
None of the currently available toolboxes for Bayesian decision making matches our requirements
(Requirements 2.1 and 2.2). Therefore, in this text, we develop an analysis of the desired toolbox
for distributed Bayesian decision making.
We intend to exploit as much of the experience accumulated in the current software packages as possible.
We will draw inspiration from both the Mixtools and BNT packages. Mixtools is more mature
in its technology of implementation and design of DM strategies. On the other hand, the range of
models supported by BNT is impressive and unrivaled.
2.3 Object-oriented approach
The basic requirements on the framework—namely extensibility, flexibility and intuitiveness of its
use—have been raised in computer science many times before. One approach that was designed to
meet these requirements is known as the object-oriented (OO) approach [37].
2.3.1 Basic principles
The OO approach introduces the following principles:
Encapsulation: binds together the data fields and the relevant methods. Data fields—known as
attributes—of an object can be accessed or modified only by the corresponding procedures—known as
operations—that are encapsulated in the same object.
This principle assures flexibility of the code, because the implementation of an object can change
(within reasonable bounds) without changing the interface visible to callers.
Inheritance: a new object is defined as an extension of another, already existing, object. The new
object inherits the attributes and operations of the old one; however, it is free to redefine the original
operations or declare new ones.
This principle assures reuse of the existing code. It also simplifies maintenance of the code
and enhances its readability.
Polymorphism: is the ability to work with similar but different objects as if they were the same.
This principle enhances accessibility of the code for non-expert programmers, since the number
of concepts and identifiers is significantly reduced.
These principles can be seen as guidelines for the definition of the framework. Moreover, many
programming languages (OO languages) have been designed with explicit support for these principles,
which means that the use of the principles is enforced by the compiler.
            num. stability   portability  OO        code   economically  user
            and efficiency                approach  reuse  affordable    friendly
Matlab      +/−              +/−          −         +      −             +
ANSI C      +                +            −         +      +             −
C++         +                +/−          +         −      +             −
JAVA        −                +/−          +         −      +             +/−

Table 2.2: Comparison of programming languages.
2.3.2 Survey of OO languages
We have tested several OO languages in the light of our requirements on implementation (Requirement
2.2), and compared them to our traditional languages: Matlab and ANSI C. The results are
summarized in Table 2.2. Here, we comment on each of the studied languages:
Matlab: has some support for the object-oriented approach; however, this support is very poor
and inefficient. Therefore, we cannot consider Matlab as ready for the OO approach. Its main
attraction is user-friendliness; its main drawbacks are the lack of computational efficiency and
economical affordability.
Java: Java is a popular OO programming language. It is well supported in the Matlab environment;
namely, Java classes can be called from Matlab. Its main attraction is its support of the OO approach
and connectivity to Matlab. Its main drawback is computational efficiency.
The following experiment with the JAMA library (http://math.nist.gov/javanumerics/jama/ )
was performed on a Pentium 400 MHz:
Test: 100 multiplications of 100×100 matrices.
Results: Matlab 0.8 s, JAMA 4 s, JAMA called from Matlab >10 s.
ANSI C: is a low-level programming language. Its main advantages are computational efficiency and
portability. Its main drawbacks are the lack of support for the OO approach and of user-friendliness.
However, the latter can be remedied by interoperability with Matlab via the technology of Mex
files.
C++: is a re-design of the C language to support the OO approach. Therefore it has all the advantages
of ANSI C, except for portability and connection with Matlab. Its main drawback is
therefore the lack of user-friendliness.
The overall conclusion is that none of the above languages is suitable for our needs. We have to
use a combination of languages to meet most of our requirements. In this situation, the requirement
on continuity of research starts to be the dominant factor in the selection process. Combination of
Matlab and ANSI C via Mex files has a long tradition within our research environment. Therefore,
we will continue to use this tool-chain.
This combination, however, does not support the OO approach. As mentioned in Section 2.3.1,
the basic principles of the OO approach can be implemented even without direct support from the
programming environment. This increases the manual labor associated with coding; however, we
believe that it will pay off in better computational efficiency of the resulting software.
2.3.3 Legacy software tools
In this Section, we review the process of software design used in the previous projects. The software
is implemented in three parallel code-bases:
1. Matlab M-files
2. Matlab Mex-files
3. pure ANSI C
These parallel implementations are bound together by the common specification of the framework.
They should provide identical results (within the limits of numerical accuracy) for identical
models.
The parallel maintenance of three different code-bases is labour-expensive, but it has the following
advantages:
Matlab is used as a platform for rapid development. It is user friendly, it has many visualization
tools, and users are familiar with it.
ANSI C is used as a platform for final implementation. It is chosen for numerical efficiency, portability
and applicability in industry (where Matlab is too expensive).
Mex-files provide a convenient bridge between the two environments.
This strategy of development was successfully used in many projects (ABET [38], ProDaCTool [34],
DESIGNER [39, 40], etc.). Many tools for preserving consistency of the parallel implementations
were developed. However, this strategy imposes strong demands on the consistency and clarity of the
framework.
2.3.4 Object-oriented approach in Matlab and ANSI C
Experiments with the implementation of object-oriented (OO) principles in the ANSI C language were
already presented in [41], [42]. The published approaches cannot be used in our Matlab-centered
environment; however, they motivated us to implement basic OO support in the currently used, well
tested and reliable tool-chain (Matlab–Mex–ANSI C). The missing OO features can be emulated by
extra tools and coding agreements.
Agreement 2.1 (OO programming in Matlab and C) The following coding agreements establish
basic support for OO approach in Matlab, Mex, and ANSI C:
1. Objects are represented by Matlab data structures with a compulsory field type for unique
identification of the class the object belongs to. Attributes of the object are fields in the structure
of the corresponding type. Operations of the object are also represented by fields in the
structure. The name of the field is the same as the name of the function, and it is of the function_handle
type (or, for computational speed, an index into a global table of function_handles).
2. Operations on objects are Matlab functions which treat their first argument as the object they are
encapsulated in. Execution of the function then extracts the corresponding function_handle
from the object structure and uses it to call the function which is appropriate for the given object.
These simple rules ensure consistent implementation of basic OO properties as follows:
Encapsulation: is achieved by storing the object attributes and the corresponding functions (func-
tion handles) in one structure.
Inheritance: is achieved at the stage of construction of an object. The constructor calls the constructor
of the parent object first; hence the attributes and operations of the parent are created.
After that, the constructor can add new attributes and operations, or rewrite the handle of an
existing operation with another one.
Polymorphism: is achieved by the globally-defined Matlab functions implementing object-related
operations.
The approach presented above is not a full-featured OO tool. It has many flaws and limitations, the
most important of which are:
1. A distinction between public and private methods is missing. Therefore, access to various
fields in data structures is subject to the discipline of the developers.
2. Operations belonging to different classes must all be globally available. Therefore, they must
be implemented under different names; pointers to these functions (i.e. function handles) are
stored in the corresponding structures and later called via the generic function. These functions
must not be called directly by any other method. Since there are no tools to assure this, we
have to rely on the discipline of the developers.
3. Matlab has limited support for parameter checking; therefore, checks for consistency must
be done inside every function.
It is obvious that this approach is much more labor-intensive than built-in support from a decent OO
language. However, this is considered a reasonable trade-off between (i) ease of implementation,
and (ii) code reuse, availability of the full power of the Matlab tools, and control of low-level numerical
aspects. It should be remembered that the number of implemented classes and their methods is
expected to be relatively low, and therefore maintainable.
Figure 2.1: Introduction of UML notation: class diagrams.
2.3.5 Unified Modelling Language (UML)
The Unified Modelling Language (UML) [43] is a widely adopted, powerful graphical language for
object-oriented modelling of the real world and subsequent design of its software representation. UML
is a graphical language that is independent of the computer language used for actual coding. In fact,
many tools supporting the UML methodology have the ability to export the UML-described project into
a chosen programming language. Therefore, it is a natural choice for the description of the framework
(Section 2.1.1) of the designed software.
One of the key features of UML is its universality. It is used as a tool for modelling of banking
systems, Internet applications, data-store applications, and many others. The price paid for this
universality is the complexity of the language. It offers a range of tools, diagram types and scenarios
that can be used for modelling of specific processes. In this text, we will use only a small sub-set
of UML tools to describe the framework. Details of algorithmic implementation will be described
by “pseudo-code”. Namely, we use only two diagram types: (i) the class diagram, and (ii) the sequential
diagram.
All names of software structures are printed in bold typeface.
2.3.5.1 Class diagram
The class diagram is used for the description of the structure of the software, i.e. the definition of
data types and object classes with their attributes and operations. The graphical semantics of class
diagrams is illustrated on a simple example in Figure 2.1.
Datatype is a basic structural element; it can have a complex inner structure, which is, however,
irrelevant in the modelled context. Therefore, each datatype is fully determined by its name.
Class is another type of structural element, which is composed of: (i) attributes, which can be
datatypes (a1 and a2 in parent) or instances of a class (a3 in child), and (ii) operations, which
have access to all attributes of the class, accept additional parameters, and possibly return a value
in the form of a datatype or class instance (e.g. operation o1 in parent accepts argument par and
yields an integer return value).
The arrow from class child to class parent means that the class parent is a generalization of
the class child. Conversely, we say that child is a specialization of parent. In practice, it means
that child has all attributes and operations of parent (i.e. a1, a2, o1). Attributes and operations
defined in child (i.e. a3, o2) are additional to the inherited ones. If an operation is defined again
(i.e. o1 in child), the new operation replaces the original operation of the parent. In such a case,
only the name of the method is displayed in the graphical notation (see o1 in child); its return type and
parameters are the same as those of the inherited method.

Figure 2.2: Introduction of UML notation: sequential diagrams.
2.3.5.2 Sequential diagram
Sequential diagrams are used for the description of processes and procedures. The graphical semantics of a
sequential diagram is illustrated on a simple example in Figure 2.2. It describes interaction between
instances of classes (known as objects) in certain situations.
Graphically, the life of each object is represented by a vertical dashed line. The vertical direction
denotes the time-arrow: the sequence starts at the top of the diagram and ends at its bottom.
Horizontal arrows denote function calls, where the arrow leads from the caller towards the called
object. Each call is named after the operation it invokes on the called object. The actual computation
within the operation is visualized by a thin rectangle on the life-axis of the object. When the operation
is finished, it returns its results back to the original caller. This is known as a synchronous message
in the standard UML. In our work, we will use only this type of interaction. A sample sequential diagram
with objects P and C—being instances of classes parent and child, respectively—is displayed
in Figure 2.2.
3 Theory of Decision Making
In this Chapter, we present the general decision-making theory. The aim of this theory is to help the
decision maker to select one action from all available options. These options are relevant to a system
(i.e. a part of the real world) in two ways: (i) decisions on the description of the system, and (ii) decisions
influencing the system. The purpose of this chapter is to summarize the principles of decision making
and to identify the key tools that are to be mapped onto the designed software.
The adopted principle of the optimal decision-making under uncertainty (Section 3.3) implies the
following important conclusion:
Incomplete knowledge and randomness have the same operational
consequences for decision-making.
Therefore, they should be treated in the same way. This is known as Bayesian decision making.
The basic formalism of Bayesian DM is presented in Section 3.1, together with a review of basic
probability calculus.
Typically, the dynamic decision-making problem is decomposed into the following
sub-problems: (i) Bayesian learning [44], summarized in Section 3.2, and (ii) the design of the
optimal strategies [29, 28], summarized in Section 3.3. However, the intended extension of the problem
to multiple participants requires a new operation, namely: (iii) merging of information, as outlined
in Section 3.4.
3.1 Bayesian formalism
The conventions presented here are mostly respected in this work. If some exception is necessary, it
is explicitly explained and used just at the place of its validity. If some verbal notions are introduced
within the bodies of Propositions, Remarks etc., then they are emphasized by a typeface that differs from
that of the surrounding text. The basic notational symbols and rules are summarized in the table of
notational conventions on page xi.
3.1.1 Basic nomenclature
A brief characterization of the introduced notions is summarized here.
Random variable is a mapping with a numerical range, i.e. a subset of a multi-variate,
real-valued space.
Realization is a value of the random variable. Often, the random variable and its realization are
not formally distinguished, as is usual in the applications of probability theory. The proper
meaning is determined by the context.
Participant is an abbreviation for a participant of the decision making process. It might be a
person, mechanism, or group of persons or mechanisms.
Environment is part of the world that is of interest for a participant who should either (i) describe,
or (ii) influence it. The environment is specified with respect to the aim that the participant
wants to reach and with respect to the tools it has available.
Decision is the value of a random variable that can be directly chosen by the participant for reach-
ing its aims.
Decision rule is a mapping that transforms knowledge of a participant into a decision.
Strategy is a sequence of decision rules.
Traditionally, decision-making strategies are distinguished into two categories based on the type
of decisions they make.
Controller is a causal strategy assigning inputs that influence the environment.
Estimator is a causal strategy evaluating decisions about the description of the system.
3.1.2 Probabilistic calculus
Uncertainty in the applied DM theory [28] is described by probability density functions (pdfs). In this
Section, we review the basic calculus with pdfs. A more detailed and formal treatment can be found in
[45].
3.1.2.1 Basic elements
Probability density function (pdf) is a function f(x) of a random variable x with the
following properties:
Non-negativity: f(x) ≥ 0,
Normalization: ∫ f(x) dx = 1.
Probability mass function is a pdf of a discrete argument. In this text, no formal distinction
between a pdf and a probability mass function is needed. We will use pdf even for discrete arguments.
In this way, a significant simplification and unification of all formulas can be achieved.
One only has to keep in mind that the integration has to be replaced by regular summation
wherever the argument is discrete.¹
For the simplicity of explanation, we distinguish the following special cases of pdfs. Consider a
generic pdf, f(ρ), on a multivariate random variable ρ ≡ (α, β, γ).

joint pdf f(α, β|γ) of α, β conditioned on γ:
is a pdf on (α, β)∗ restricting f(ρ) to the cross-section of ρ∗ given by a fixed γ.

conditional pdf f(β|α, γ) of β conditioned on α, γ:
is a pdf on β∗ restricting f(ρ) to the cross-section of ρ∗ given by a fixed α, γ.
The conditioning symbol | is dropped if just trivial conditions are considered.

marginal pdf f(α|γ) of α conditioned on γ:
is a pdf on α∗ obtained from f(ρ) for a fixed γ, with no information on β.

conditional independence: variables α and β are independent under the condition γ iff

f(α, β|γ) = f(α|γ)f(β|γ). (3.1)
3.1.2.2 Operations on pdf
For a generic pdf with multivariate argument, f(ρ) = f(α, β, γ), ρ = (α, β, γ), we define the
following operations:

Normalization

∫ f(α, β|γ) dα dβ = ∫ f(α|β, γ) dα = ∫ f(β|α, γ) dβ = 1.

Chain rule

f(α, β|γ) = f(α|β, γ)f(β|γ) = f(β|α, γ)f(α|γ).

Marginalization

f(β|γ) = ∫ f(α, β|γ) dα. (3.2)

Bayes rule

f(β|α, γ) = f(α|β, γ)f(β|γ) / f(α|γ) = f(α|β, γ)f(β|γ) / ∫ f(α|β, γ)f(β|γ) dβ ∝ f(α|β, γ)f(β|γ). (3.3)

The proportion sign, ∝, means that the factor independent of β and uniquely determined by
the normalization is not explicitly written in the equality represented.
¹This can also be achieved by employing measure theory, operating in a consistent way with probability densities generalized in the Radon-Nikodym sense [46]. The practical effect is the same, and this extra generality is therefore neither necessary nor helpful for our purposes.
Expectation of a function g(α):
E_f(α)(g(α)) ≡ ∫ g(α) f(α) dα. (3.4)
Notation (3.4) will be simplified to E_f(α)(α) ≡ α̂ in situations where it is clear with respect to which distribution the expectation is to be evaluated.
Pdf of transformed variables: Let α be a real vector, α ≡ [α1, . . . , α_α̊], and T = [T1, . . . , T_α̊] a bijection (one-to-one mapping) with finite continuous partial derivatives a.e. on α∗,
J_ij(α) ≡ ∂T_i(α)/∂α_j, i, j = 1, . . . , α̊, (3.5)
for all entries T_i of T and entries α_j of α. Then,
f_T(T(α)) |J(α)| = f(α), (3.6)
where | · | denotes the determinant of the matrix in its argument.
Kullback-Leibler (KL) divergence measures the proximity of a pair of pdfs f, f̃ acting on a set x∗. It is defined as follows [47]:
KL(f||f̃) ≡ ∫ f(x) ln( f(x) / f̃(x) ) dx. (3.7)
The KL divergence has the following properties:
1. KL(f||f̃) ≥ 0;
2. KL(f||f̃) = 0 iff f(x) = f̃(x) almost everywhere;
3. KL(f||f̃) = ∞ iff on a set of a positive measure f(x) > 0 and f̃(x) = 0;
4. KL(f||f̃) ≠ KL(f̃||f), and the KL divergence does not obey the triangle inequality.
Given 4., care is needed in the syntax describing KL(·). We say that (3.7) is from f(x) to f̃(x).
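For discrete arguments, where the integral in (3.7) becomes a sum, the listed properties are easy to check numerically. The following sketch is illustrative only (Python is used here for brevity, although the toolbox itself targets Matlab and ANSI C, and the two pdfs are invented for the example):

```python
import numpy as np

def kl(f, g):
    """KL divergence from pdf f to pdf g for discrete distributions;
    the integration in (3.7) is replaced by summation."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    mask = f > 0                      # 0 * ln(0/g) = 0 by convention
    if np.any(g[mask] == 0):          # property 3: divergence is infinite
        return float('inf')
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

f = np.array([0.5, 0.3, 0.2])
g = np.array([0.4, 0.4, 0.2])

d_fg = kl(f, g)                       # divergence from f to g
d_gf = kl(g, f)                       # divergence from g to f (differs, property 4)
```

The asymmetry in property 4 is why the "from f to g" wording matters in the sequel.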
3.2 Dynamic learning
The aim of dynamic decision making is to find the optimal DM strategy. However, if any uncertainty, i.e. an unobserved internal random variable Θt, is present in the controlled environment, we have to model it.
Handling of uncertainty in models of the real world is a challenging problem on its own, i.e. even without the ambition of influencing the world. The Bayesian treatment of this sub-problem is addressed in this Section. Results established in this Section will be used later for the design of the control strategy.
[Figure: the decision-maker supplies actions ut to the environment and observes data yt; together these form the observed data dt, while the internal variables Θt remain unobserved.]
Figure 3.1: Basic DM scenario
3.2.1 Probabilistic models: description of reality
The basic scenario of decision-making is illustrated in Figure 3.1.
The most complete probabilistic description of the closed loop environment–participant is the joint pdf
f(d(t), Θ(t)|Θ0, d(0)) f(Θ0|d(0)) = f(d(t), Θ(t)|Θ0) f(Θ0)
of all random variables involved in the closed loop. In it, Θ0 is the initial uncertain unobserved random variable, called the internal variable, and d(0) stands for the prior information available before the choice of the first input. Habitually, d(0) is considered only implicitly.
The chain rule for pdfs [44] implies the following decomposition of the joint pdf representing the complete probabilistic description of the closed-loop behavior:
f(d(t), Θ(t)|Θ0) = f(Θ0) × ∏_{t∈t∗} f(yt|ut, d(t−1), Θ(t)) f(Θt|ut, d(t−1), Θ(t−1)) f(ut|d(t−1), Θ(t−1)). (3.8)
The chosen order of conditioning distinguishes the following important pdfs:
observation model f(yt|ut, d(t−1), Θ(t)),
internal model f(Θt|ut, d(t−1), Θ(t−1)),
DM strategy f(ut|d(t−1), Θ(t−1)).
Note that these models are conditioned on the whole observation history as well as the whole
history of internal variables. In practical situations, however, the reality has to be described by
simpler models. Therefore, we introduce the following general assumptions.
Agreement 3.1 [Reduced dependency on internal variables]
1. Distribution of the internal Θt is determined by the current input ut, all past data d(t− 1) and
the past internal Θt−1 only, i.e.
f (Θt|ut, d (t− 1) ,Θ(t− 1)) = f (Θt|ut, d (t− 1) ,Θt−1) . (3.9)
2. Distribution of the observed output yt is determined by the current decision ut, all past data
d(t− 1) and the internal Θt only, i.e.
f (yt|ut, d (t− 1) ,Θ(t)) = f (yt|ut, d (t− 1) ,Θt) . (3.10)
3. Admissible decision strategies, generating the decision ut from the observed data history
d (t− 1) and ignoring the unobserved internals Θ(t− 1), are considered, i.e.
f (ut|d (t− 1) ,Θ(t− 1)) = f (ut|d (t− 1)) . (3.11)
Under these Assumptions, the closed-loop description (3.8) reduces to
f(d(t), Θ(t)|Θ0) = ∏_{t∈t∗} f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1), Θt−1) f(ut|d(t−1)). (3.12)
Remark 3.1 (Model structure) The notation used here implies that the random variables in the pdf fully determine the model. This may not be sufficient in certain situations; e.g. the notation f(x) does not distinguish between Normal and Uniform distributions with the same fixed moments. However, in most of this text, this situation will not arise. In cases where confusion may arise, we will use additional conditioning on an abstract object M, denoting the model structure. Hence, the distinction (e.g. between the Normal and Uniform distributions) would be denoted f(x|M1) and f(x|M2).
3.2.2 Bayesian filtering
Proposition 3.1 (Bayesian filtering in closed control loop) Let the prior pdf f(Θ0) be given and the assumptions of Agreement 3.1 be met. Then, the pdf f(Θt|d(t)), determining the estimate of internals, and the pdf f(Θt|ut, d(t−1)), determining the prediction of internals, evolve recursively as follows:
f(Θt|ut, d(t−1)) = ∫ f(Θt|ut, d(t−1), Θt−1) f(Θt−1|d(t−1)) dΘt−1, (3.13)
f(Θt|d(t)) = f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1)) / f(yt|ut, d(t−1)), (3.14)
f(yt|ut, d(t−1)) = ∫ f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1)) dΘt. (3.15)
Proof: See, for instance, [46].
Operations in Proposition 3.1 will be known in the sequel as: time-update (3.13), data-update (3.14), and prediction (3.15). Objects on the left-hand side of the operations will be denoted as: the estimate, f(Θt|d(t)), in (3.14), and the predictor, f(yt|ut, d(t−1)), in (3.15).
Here, we note that:
The Bayesian filtering does not depend on the functional form of the used admissible control strategy {f(ut|d(t−1))}_{t∈t∗}, but only on the generated inputs ut.
This will be important for design of the DM strategy.
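For a discrete-valued internal variable, the integrals in (3.13)–(3.15) become sums and Proposition 3.1 can be exercised directly. A small illustrative sketch (Python is used for brevity; the two-state model matrices are invented for the example, not taken from the text):

```python
import numpy as np

# two-state internal variable; illustrative model matrices
trans = np.array([[0.9, 0.1],    # f(Theta_t | Theta_{t-1}), rows: Theta_{t-1}
                  [0.2, 0.8]])
obs = np.array([[0.7, 0.3],      # f(y_t | Theta_t), rows: Theta_t, cols: y_t
                [0.1, 0.9]])

def filter_step(prior, y):
    """One pass of Proposition 3.1 for a discrete Theta: integrals are sums."""
    predicted = trans.T @ prior          # time update (3.13)
    unnorm = obs[:, y] * predicted       # numerator of the data update (3.14)
    evidence = unnorm.sum()              # predictor f(y_t | ...) (3.15)
    return unnorm / evidence, evidence

belief = np.array([0.5, 0.5])            # prior f(Theta_0)
for y in [0, 0, 1]:                      # an assumed observation sequence
    belief, evidence = filter_step(belief, y)
```

Note that, as stated above, the recursion never needs the functional form of the strategy that generated the inputs.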
3.2.3 Bayesian estimation
This Section deals with a special version of filtering called estimation. It arises when the internal variables Θt are time invariant,
Θt = Θ, ∀t ∈ t∗. (3.16)
The common value Θ is called the unknown parameter. In this case, the internal model is f(Θt|ut, d(t−1), Θt−1) = δ(Θt − Θt−1).
Hence, the time-update operation (3.13) of Bayesian filtering has the following form:
f(Θt|d(t−1)) = [f(Θt−1|d(t−1))]_{Θt−1→Θt}, (3.17)
where the notation [·]_{x→y} denotes replacement of the argument x by y.
The data-update operation (3.14) is unchanged. However, the simplified time-update (3.17) allows us to expand the recursion of data-updates into the following (non-recursive) batch variant:
f(Θ|d(t)) ≡ ∏_{τ≤t} f(yτ|uτ, d(τ−1), Θ) f(Θ) / N(d(t)) ≡ L(Θ, d(t)) f(Θ) / N(d(t)). (3.18)
The introduced likelihood function,
L(Θ, d(t)) ≡ ∏_{τ≤t} f(yτ|uτ, d(τ−1), Θ), (3.19)
evolves independently of normalization. It starts, however, from L(Θ, d(0)) identically equal to 1.
The normalization factor N(·) is defined by the formula
N(d(t)) = ∫ L(Θ, d(t)) f(Θ) dΘ ∝ f(yt|ut, d(t−1)). (3.20)
With it, the predictor (3.15) can alternatively be expressed as follows:
f(yt|ut, d(t−1)) = N(d(t)) / N(d(t−1)). (3.21)
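The equivalence of the recursive data-updates and the batch variant (3.18) can be checked numerically on a parameter grid. A toy sketch for a Bernoulli observation model with an unknown parameter Θ (Python for brevity; the model, prior, and data sequence are illustrative assumptions):

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over the unknown parameter
prior = np.ones_like(theta)              # flat prior f(Theta)
prior /= prior.sum()

data = [1, 0, 1, 1, 0, 1]                # assumed Bernoulli observations

# recursive data updates (3.14) with the trivial time update (3.17)
post_rec = prior.copy()
for y in data:
    post_rec *= theta if y else (1 - theta)
    post_rec /= post_rec.sum()           # normalization at each step

# batch variant (3.18): likelihood (3.19) times prior, normalized once
lik = np.ones_like(theta)
for y in data:
    lik *= theta if y else (1 - theta)
post_batch = lik * prior
post_batch /= post_batch.sum()
```

Both routes yield the same posterior, which is exactly the content of (3.18).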
3.3 Dynamic design of control strategy
In this Section, we summarize the fully probabilistic design (FPD) of the DM strategy. This approach is taken as the basis of multiple-participant DM. It is an alternative to the standard stochastic control design, which is formulated as minimization of an expected loss function with respect to decision-making strategies, e.g. [12, 10]. The standard design can be interpreted as an attempt to influence some characteristics of the closed-loop behavior by selecting an appropriate decision-making strategy. The loss function is generally deduced from desired deterministic relationships between the considered variables, and it is unrelated (or at most weakly related) to the random nature of the involved mappings, i.e. the time evolution (3.9) and observation models (3.10).
The FPD [29, 30], reviewed in this Section, formulates the design problem in a way that allows the designer to respect this random nature. It starts with specification of the decision-making aim in the form of an ideal pdf of the closed loop. Then, the DM strategy is chosen as a minimizer of the KL divergence (3.7) between the actual and the ideal closed-loop pdf.
The approach has the following special features.
• The KL divergence to an ideal pdf forms a special type of loss function that can be simply tailored both to deterministic and stochastic features of the considered DM problem.
• The minimum of the KL divergence – i.e. the optimal DM strategy – is found in a closed form. Thus, the minimization step “disappears” from the standard pair of operations (minimization and expectation) that are applied sequentially when optimizing via stochastic dynamic programming [29].
• The use of a multi-modal desired distribution provides a well-justified and feasible multiple-objective DM design [17, 48].
The ideal pdf is constructed in the way analogous to (3.12), with user-specified factors distinguished by the superscript ᴵ:
ᴵf(d(t), Θ(t)|Θ0) ᴵf(Θ0) = ∏_{t∈t∗} ᴵf(yt|ut, d(t−1), Θt) ᴵf(Θt|ut, d(t−1), Θt−1) ᴵf(ut|d(t−1)) f(Θ0). (3.22)
Here the pdfs ᴵf(yt|ut, d(t−1), Θt), ᴵf(Θt|ut, d(t−1), Θt−1) describe the ideal models of observation and time evolution of internals, and ᴵf(ut|d(t−1)) the ideal DM strategy.
The prior pdf on the initial internal random variable Θ0 cannot be influenced by the optimized DM strategy, so that it is left to its fate, i.e. ᴵf(Θ0) = f(Θ0). To formulate the FPD concisely, the following shorthand notation is used below:
f_t ≡ f(d(t), Θ(t)|Θ0) f(Θ0),
ᴵf_t ≡ ᴵf(d(t), Θ(t)|Θ0) f(Θ0).
Under the assumptions made in Agreement 3.1, the FPD is formulated as follows.
Find the admissible DM strategy minimizing the KL divergence KL(f_t || ᴵf_t).
Proposition 3.2 (Solution of FPD) Let both the joint pdf f(Θ(t), d(t)|Θ0) and its ideal counterpart ᴵf(Θ(t), d(t)|Θ0) meet the assumptions of Agreement 3.1.
Then, the optimal admissible DM strategy minimizing KL(f_t || ᴵf_t) is given by the pdfs:
ᵒf(ut|d(t−1)) = ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] / γ(d(t−1)), t ∈ t∗, (3.23)
γ(d(t−1)) ≡ ∫ ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] dut. (3.24)
Starting on the DM horizon, t̄, with γ(d(t̄)) ≡ 1, the functions ω(ut, d(t−1)) are generated recursively for t = t̄, t̄−1, . . . , 1, in the backward manner, as follows:
ω(ut, d(t−1)) ≡ ∫ Ω(ut, d(t−1), Θt−1) f(Θt−1|d(t−1)) dΘt−1 (3.25)
= E_{f(Θt−1|d(t−1))}( Ω(ut, d(t−1), Θt−1) ),
Ω(ut, d(t−1), Θt−1) ≡ ∫ f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1), Θt−1) ×
ln( f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1), Θt−1) / [ γ(d(t)) ᴵf(yt|ut, d(t−1), Θt) ᴵf(Θt|ut, d(t−1), Θt−1) ] ) dyt dΘt. (3.26)
Here, the pdfs f(Θt|d(t)) have their usual meaning given by Proposition 3.1.
Proof: See [31].
Note that (3.26) can be written in terms of the expected value of γ(·) and KL divergences, as follows:
Ω(ut, d(t−1), Θt−1) ≡ E_{f(yt|ut,d(t−1),Θt) f(Θt|ut,d(t−1),Θt−1)}( −ln γ(d(t)) )
+ E_{f(Θt|ut,d(t−1),Θt−1)}( KL( f(yt|ut, d(t−1), Θt) || ᴵf(yt|ut, d(t−1), Θt) ) )
+ KL( f(Θt|ut, d(t−1), Θt−1) || ᴵf(Θt|ut, d(t−1), Θt−1) ), (3.27)
γ(d(t−1)) = E_{ᴵf(ut|d(t−1))}( exp[ −E_{f(Θt−1|d(t−1))}( Ω(ut, d(t−1), Θt−1) ) ] ). (3.28)
Both the expectation (3.4) and the KL divergence (3.7) are basic operations of probabilistic calculus (Section 3.1.2) and should be readily available. This is important for the design of the software image of this theory.
Proposition 3.2 is the most general design scenario we consider in this work. However, for many
practical problems it can be simplified. Specifically, if we do not care about the internal variables,
the problem can be re-formulated in terms of the input-output models (3.15).
Proposition 3.3 (Data-driven FPD) Let us try to influence just the joint pdf of the observed data, f_t ≡ f(d(t)), so that it is close to its ideal counterpart ᴵf_t ≡ ᴵf(d(t)).
Then, the optimal admissible DM strategy minimizing KL(f_t || ᴵf_t) is given by the pdfs:
ᵒf(ut|d(t−1)) = ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] / γ(d(t−1)), t ∈ t∗, (3.29)
γ(d(t−1)) ≡ ∫ ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] dut.
Starting with γ(d(t̄)) ≡ 1, the functions ω(ut, d(t−1)) are generated recursively for t = t̄, t̄−1, . . . , 1 in the backward manner, as follows:
ω(ut, d(t−1)) ≡ ∫ f(yt|ut, d(t−1)) ln( f(yt|ut, d(t−1)) / [ γ(d(t)) ᴵf(yt|ut, d(t−1)) ] ) dyt. (3.30)
Proof: It coincides with Proposition 3.2 simplified to the case without internals.
Note that the proved proposition covers the fully probabilistic counterpart of classical dual control [14, 15], when the environment is described up to the unknown parameters Θt. In this case, it is sufficient to run Bayesian filtering, Proposition 3.1, and to use the predictor f(yt|ut, d(t−1)) as the model relating inputs to outputs.
3.4 Merging of pdfs
The task of information fusion is a rich area of research used in many engineering applications; see the Information Fusion journal published by Elsevier. In the probabilistic paradigm, each source of information is represented by a pdf. Thus, the task of information fusion can be translated into the task of merging of pdfs [32].
The operation of merging is defined as a mapping of two pdfs into one:
f1(Θt|d(t)), f2(Θt|d(t)) —merge→ f(Θt|d(t)), (3.31)
where f1 and f2 are the source pdfs, and f is the merged pdf. The aim of the merging operation is to preserve within one pdf, f, as much information from the sources, f1 and f2, as possible.
Note that the source pdfs in (3.31) are defined on the same variable as the merged pdf; hence the mapping will be known as direct merging. Alternatively, the sources can be defined on the variable in the condition of the merged pdf,
f1(d|d(t)), f2(d|d(t)) —merge→ f(Θt|d(t)), (3.32)
in which case the mapping will be known as indirect merging.
3.4.1 Direct merging of pdfs
The general formalization of the merging operation is still not fully stabilized. The most promising
approach to direct merging is based on minimization of a weighted sum of Kullback-Leibler divergences [49], [32].
The task is formalized only for independent observations. Therefore, in this Section, all models (3.9)–(3.11) are defined as time-invariant, i.e. f(d|Θ) for the observation model (3.10).
The merged pdf f(d) is selected so that a weighted sum of Kullback-Leibler divergences between the source pdfs and the resulting one is minimized:
f(d) = arg min_f ( α2 KL(f2(d)||f(d)) + (1−α2) KL(f1(d)||f(d)) ). (3.33)
The optimum of (3.33), for merging of distributions of the same variable, is found in the form of a probabilistic mixture of the source pdfs:
f(d) = α2 f2(d) + (1−α2) f1(d). (3.34)
The optimal solution for distributions with partially overlapping arguments—e.g. f(y, u) with f(y) and f(u)—is not analytically tractable. However, an iterative algorithm minimizing (3.33) can be found [32].
Remark 3.2 (Approximations in direct merging) Note that even for the analytical solution (3.34), the number of components in the mixture grows with each iteration. Therefore, it may be necessary to find a reasonable projection into a finite-dimensional family. Solutions to this task are readily available only for certain families [50, 51].
Alternatively, the problem can be formulated in the reverse KL divergence,
f(d) = arg min_f ( α2 KL(f(d)||f2(d)) + (1−α2) KL(f(d)||f1(d)) ). (3.35)
The optimum of (3.35), for merging of distributions of the same variable, is found in the form of a geometric mean of the source pdfs:
f(d) = (f2(d))^{α2} (f1(d))^{(1−α2)}. (3.36)
This solution is less optimal in the sense of statistical utility [52], but it can have computational advantages for certain pdf families.
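Both optima are immediate to evaluate for discrete pdfs. A minimal sketch contrasting the mixture (3.34) with the geometric mean (3.36); the source pdfs and the weight α2 are invented for illustration, and the geometric mean is renormalized, since (3.36) is determined only up to normalization:

```python
import numpy as np

f1 = np.array([0.7, 0.2, 0.1])   # source pdfs on the same discrete variable
f2 = np.array([0.2, 0.5, 0.3])
alpha2 = 0.4                     # weight of the second source

# optimum of (3.33): probabilistic mixture (3.34)
mix = alpha2 * f2 + (1 - alpha2) * f1

# optimum of (3.35): normalized geometric mean (3.36)
geo = f2 ** alpha2 * f1 ** (1 - alpha2)
geo /= geo.sum()
```

The two merged pdfs are close but not identical, reflecting the asymmetry of the KL divergence noted in Section 3.1.2.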
3.4.2 Indirect merging of pdfs
A procedure for indirect merging is even less developed than that for direct merging. Here, we describe the most promising approach, which is being developed in the department (personal communication with J. Kracik [32]). The basic idea follows from reformulation of the Bayes rule (3.3) in the following form [53]:
f(Θ|d) ∝ f(Θ) exp( −t ∫ ʳf(d|d(t)) ln( 1 / f(d|Θ) ) dd ), (3.37)
where ʳf(d|d(t)) denotes the empirical distribution on the observed data (3.38). For independent observations, i.e. many observations of one variable d, the empirical density is defined as follows:
d(t) ⇐⇒ ʳf(d|d(t)) ≡ (1/t) Σ_{i=1}^{t} δ(d − di). (3.38)
Here d is a random variable and the di are the observed realizations of d at times i.
Equation (3.37) can be used to interpret the Bayesian estimation (Section 3.2.3) as a procedure measuring how individual models—from the considered parameterized class of pdfs—fit the empirical density ʳf(d|d(t)). Equation (3.37) is valid for estimation from one source of data (i.e. one empirical density). Using the result (3.34) from direct merging (Section 3.4.1), we define the joint empirical distribution as:
ʳf(d) = α2 f2(d) + (1−α2) f1(d). (3.39)
Then, (3.37) can be re-written using the expectation operation (3.4) as follows:
f(Θ|d) ∝ f(Θ) exp( t( α2 E_{ʳf2(d)}( ln f(d|Θ) ) + (1−α2) E_{ʳf1(d)}( ln f(d|Θ) ) ) ). (3.40)
Hence, the merging operation has the same structure as the FPD (Proposition 3.2). This will be important in the design of software structures.
From (3.40), it is possible to see that merging on full data records is a rather easy task, since it corresponds to learning on the data records. However, a new challenge arises when the data records are incomplete. In such a case, the observation model f(d|Θ) must be defined only on the available subset of the data record. This can be achieved by normalization of the original observation model [32].
In some applications, it is not feasible to operate on full-length data records, since these are extremely large. Then, we seek a suitable replacement of the empirical density ʳf(d|d(t)). The optimal solution to this problem is not known to us. Preliminary results suggest that approximation of the empirical density by an outer model (3.15), i.e.
ʳf(d) ≈ f(d) = ∫ f(d, Θ) dΘ,
is a reasonable option.
4 Feasible Decision Making
The theory of decision making, presented in Chapter 3, is formulated in terms of mathematical objects (pdfs) and operations associated with them. The aim of this thesis is to represent these mathematical structures in a computer. However, this is feasible only for a subset of pdfs and operations on them. Representation of pdfs in computers has been studied for a long time. There are two principal approaches to the problem: (i) the parametric, and (ii) the non-parametric approach, see [54] for example. In this work, our concern is with computational efficiency of operations with pdfs; therefore, we focus on parametric models.
In order to achieve computational efficiency, we introduce the following requirement.
Requirement 4.1 Statistics (shaping parameters) describing pdfs of a decision-maker should be finite-dimensional, with the same dimensionality for an increasing number of processed data and an increasing DM horizon.
This requirement has serious consequences for the DM process, since all the involved pdfs are dynamically evaluated via the basic operations of decision making, namely: time update (3.13), data update (3.14), and FPD (3.27), (3.28). Hence, we require the chosen family of distributions to be closed under these operations. The problem has been studied theoretically [55, 56, 57], and the following families have been found to have this property:
1. probabilistic mixtures with known components but with unknown weights [55],
2. the Daum family [56], which is a generalization of linear state-space models [58],
3. the Exponential family (under additional assumptions) [57].
In all other families, extra approximations are required to achieve tractability, e.g. for mixtures of pdfs from the EF [59]. In this Chapter, we review the basic DM operations for the linear state-space model (Section 4.1) and the exponential family (Section 4.2). Then, we review the most commonly used distributional approximations (Section 4.3). The use of these distributional approximations is then studied on the problems of Bayesian filtering (Section 4.4), estimation (Section 4.5), and FPD (Section 4.6).
4.1 Linear state-space models
In this Section, we study DM with linear state-space models, defined as follows:
f(Θt|Θt−1, ut, A, B, R) = N(AΘt−1 + But, R), (4.1)
f(yt|Θt, ut, C, D, Q) = N(CΘt + Dut, Q). (4.2)
Here (4.1) defines the internal model, and (4.2) the observation model.
In the sequel, we will assume that the matrices A, B, R, C, D, Q are known. Hence, for clarity of notation, we drop them from the conditioning of the pdfs.
4.1.1 Dynamic learning
Application of the general Bayesian filtering (Proposition 3.1) to model (4.1)–(4.2) is known as
Kalman filtering [60].
Let us assume that
f(Θt−1|d(t−1)) = N(μt−1, Σt−1);
then the time-update operation (3.13) yields the following result:
f(Θt|ut, d(t−1)) = N(μ̄t, Σ̄t), (4.3)
μ̄t = Aμt−1 + But,
Σ̄t = R + AΣt−1A′.
The data-update operation (3.14) yields:
f(Θt|d(t)) = N(μt, Σt), (4.4)
μt = μ̄t + ΣtC′Q⁻¹(yt − Cμ̄t − Dut),
Σt = Σ̄t − Σ̄tC′(Q + CΣ̄tC′)⁻¹CΣ̄t.
The one-step-ahead prediction (3.15) is:
f(yt|ut, d(t−1)) = N(Cμ̄t + Dut, Q + CΣ̄tC′). (4.5)
Hence, the functional recursion (3.13)–(3.14) can be replaced by an algebraic recursion on μ̄t, Σ̄t, μt, Σt.
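The algebraic recursion (4.3)–(4.5) translates directly into code. A sketch under assumed scalar model matrices (Python for brevity; the matrices are invented, not taken from the thesis), with the data-update gain written in the standard covariance form, which is algebraically equivalent to the ΣtC′Q⁻¹ form of (4.4):

```python
import numpy as np

def kalman_step(mu, Sigma, u, y, A, B, R, C, D, Q):
    """One step of (4.3)-(4.5) for the model (4.1)-(4.2)."""
    # time update (4.3)
    mu_p = A @ mu + B @ u
    Sigma_p = R + A @ Sigma @ A.T
    # data update (4.4); S is also the predictive covariance of y_t in (4.5)
    S = Q + C @ Sigma_p @ C.T
    K = Sigma_p @ C.T @ np.linalg.inv(S)
    mu_new = mu_p + K @ (y - C @ mu_p - D @ u)
    Sigma_new = Sigma_p - K @ C @ Sigma_p
    return mu_new, Sigma_new, C @ mu_p + D @ u, S

# illustrative scalar example
A = np.array([[0.9]]); B = np.array([[1.0]]); R = np.array([[0.1]])
C = np.array([[1.0]]); D = np.array([[0.0]]); Q = np.array([[0.5]])

mu, Sigma = np.array([0.0]), np.array([[1.0]])
for y in [0.3, -0.1, 0.4]:                      # assumed observations
    mu, Sigma, y_pred, S = kalman_step(mu, Sigma, np.array([0.0]),
                                       np.array([y]), A, B, R, C, D, Q)
```

Only the finite-dimensional pair (μt, Σt) is propagated, in line with Requirement 4.1.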
4.1.2 Fully probabilistic design
Application of the general FPD (Proposition 3.2) to this model is not tractable in the sense of Requirement 4.1; see the discussion at the end of this Section.
Therefore, for illustration, we consider a model with a fully observed state, i.e. C = I and Q = 0. Then, it is necessary to choose only the following ideal pdfs:
ᴵf(Θt|Θt−1, ut) = N(0, R), (4.6)
ᴵf(ut|d(t−1)) = N(0, S).
This choice is practically reasonable, since the ideal spread around the zero state cannot be lower than that of the innovations.
In order to evaluate the FPD recursion (3.27)–(3.28), we need the KL divergence of two Normal distributions, which is [28]:
KL( N(μ1, Σ1) || N(μ2, Σ2) ) = ½[ ln|Σ2Σ1⁻¹| − μ̊ + tr(Σ1Σ2⁻¹) + (μ1 − μ2)′Σ2⁻¹(μ1 − μ2) ], (4.7)
where μ̊ denotes the dimension of μ.
Using (4.1) and (4.6) in (4.7), we obtain
KL( f(Θt|Θt−1, ut) || ᴵf(Θt|Θt−1, ut) ) = ½ (AΘt−1 + But)′ R⁻¹ (AΘt−1 + But). (4.8)
Note that since the covariance matrices of the involved distributions are identical, only the quadratic term in (4.7) remains in the result.
Let us assume that
−ln γ(d(t)) = ½ Θt′ΦtΘt + zt. (4.9)
Inserting (4.8) and (4.9) into (3.27), we obtain:
Ω(ut, Θt−1) = E_{f(Θt|Θt−1,ut)}( ½ Θt′ΦtΘt + zt ) + ½ (AΘt−1 + But)′R⁻¹(AΘt−1 + But) (4.10)
= ½ (AΘt−1 + But)′(Φt + R⁻¹)(AΘt−1 + But) + zt.
Note that since Θt is observable, it plays the role of d(t) in the conditioning of the strategy (3.23). The optimal DM strategy (3.23) is then:
ᵒf(ut|d(t−1)) = ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] / γ(d(t−1))
= exp( −½ ( ut′S⁻¹ut + (AΘt−1 + But)′(Φt + R⁻¹)(AΘt−1 + But) ) ) × (4.11)
(2π)^{−ů/2} |S|^{−1/2} exp(−zt) γ⁻¹(d(t−1)),
where ů denotes the dimension of ut.
Completing squares in the exponent of (4.11) with respect to ut, we can separate (4.11) into a Gaussian distribution
ᵒf(ut|d(t−1)) = N_{ut}(μt, Σt),
Σt = ( S⁻¹ + B′(Φt + R⁻¹)B )⁻¹,
μt = −ΣtB′(Φt + R⁻¹)AΘt−1, (4.12)
which also determines the Bellman function
γ(d(t−1)) = |S|^{−1/2} |S⁻¹ + B′(Φt + R⁻¹)B|^{−1/2} exp(−zt) ×
exp( −½ Θt−1′A′[ (Φt + R⁻¹) − (Φt + R⁻¹)BΣtB′(Φt + R⁻¹) ]AΘt−1 ). (4.13)
Hence, the logarithm of (4.13) remains in the form of (4.9):
−ln γ(d(t−1)) = ½ Θt−1′Φt−1Θt−1 + zt−1,
Φt−1 = A′( (Φt + R⁻¹) − (Φt + R⁻¹)BΣtB′(Φt + R⁻¹) )A, (4.14)
zt−1 = zt + ½( ln|S| + ln|S⁻¹ + B′(Φt + R⁻¹)B| ). (4.15)
The obtained result is equivalent to the classical linear-quadratic (LQ) design, see [29] for details.
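The pair (4.12), (4.14) is a compact backward recursion that is easy to iterate numerically. A sketch with invented scalar matrices (Python for brevity), starting from γ ≡ 1 at the DM horizon, i.e. Φ = 0, and iterating (4.14) backwards; for a time-invariant model the matrices Φt are expected to converge to a stationary value, as in the classical LQ design:

```python
import numpy as np

def fpd_lq_backward(A, B, R, S, horizon):
    """Backward recursion (4.12), (4.14): feedback covariances Sigma_t and
    Bellman-function matrices Phi_t, starting from Phi = 0 at the horizon."""
    Rinv = np.linalg.inv(R)
    Phi = np.zeros_like(A)                 # gamma = 1 at the horizon
    Sigmas, Phis = [], [Phi]
    for _ in range(horizon):
        M = Phi + Rinv                     # shorthand for Phi_t + R^{-1}
        Sigma = np.linalg.inv(np.linalg.inv(S) + B.T @ M @ B)   # (4.12)
        Phi = A.T @ (M - M @ B @ Sigma @ B.T @ M) @ A           # (4.14)
        Sigmas.append(Sigma)
        Phis.append(Phi)
    return Sigmas, Phis

# illustrative scalar model
A = np.array([[1.0]]); B = np.array([[1.0]])
R = np.array([[0.5]]); S = np.array([[1.0]])
Sigmas, Phis = fpd_lq_backward(A, B, R, S, horizon=20)
```

In this scalar case the recursion reads Φ' = (Φ + 2)/(Φ + 3), whose fixed point is √3 − 1, so the iterates settle quickly.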
Remark 4.1 (FPD for unobserved state) Note that the FPD solution (3.27) for an unobserved state extends (4.10) by one extra expectation of the KL divergence of the observation models. Since the observation models are also Gaussian, it has the form of (4.7). However, a problem arises in (3.28), namely in taking the expectation over f(Θt|d(t−1)), especially on long horizons. From (4.4), f(Θt|d(t−1)), as a function of h unobserved observations, is a Normal distribution whose mean value is an h-order polynomial. All operations associated with the FPD are still analytically tractable; however, their complexity grows rapidly with the DM horizon.
This behaviour is not compatible with the requirement of feasibility
(Requirement 4.1).
Hence, we will provide only partial support for this approach, until more suitable approximations—such as neglecting some terms in the h-order polynomial—are found.
4.1.3 Merging
Two basic merging operations have been considered in Section 3.4, namely direct merging and indirect merging.
Direct merging: was defined on the outer observation models, i.e. in this case on ᴵf(dt|d(t−1)) being Gaussian. Hence, the merged distribution (3.34) is a mixture of Gaussians. For feasibility reasons, this distribution has to be projected onto a single Gaussian using the KL divergence [50]. Merging of source pdfs on overlapping variables is not available for Gaussians.
For the alternative formalization (3.35), the solution is a geometric mean of Gaussians, i.e. also a Gaussian. Hence, no further approximations are required. Moreover, this approach is also promising for merging of source pdfs on overlapping variables.
Indirect merging: for Bayesian filtering has not been elaborated yet.
4.2 Time-invariant exponential family models
In this Section, we review the task of parameter estimation (Section 3.2.3).
4.2.1 The models
Consider the following observation model:
f(yt|ut, d(t−1), Θ) = f(yt|ψt, Θ) = A(Θ) exp( 〈B(Ψt), C(Θ)〉 + D(Ψt) ), (4.16)
where
regression vector ψt is determined by the known (i.e. observed) variables ut, d(t−1);
data vector Ψt = [yt, ψt]. A new transformed variable yt is defined as
yt = gt(d(t)), (4.17)
via a known smooth one-to-one mapping gt(·)—for given ut and d(t−1)—with a non-zero Jacobian:
Jt = | ∂gt(d(t)) / ∂yt |. (4.18)
A(Θ) is a non-negative function defined on Θ∗;
B(·), C(·) are array functions of compatible, finite and fixed dimensions. They are defined on the data vector Ψt and the internals Θ, respectively;
D(·) is a non-negative scalar function defined on Ψ∗t;
〈·, ·〉 is a functional, linear in the first argument, defined (within this text) as follows:
〈x, y〉 = x′y if x, y are vectors (′ is transposition); tr[xy] if x, y are matrices (tr is the trace); Σ_{i∈i∗} xi yi if x, y are arrays with a multi-index i. (4.19)
Models of the form (4.16) are known as the exponential family (EF).
Remark 4.2 (Exponential family for dynamic models) The exponential family is a rather wide family if we consider independent identically distributed observations. For example, the Poisson distribution,
f(yt|λ) = Po(λ) = (1/yt!) exp(−λ) exp( yt log λ ),
is clearly a member of the family. However, the family embraces only a few dynamic (auto-regressive) models, i.e. models where yt = yt(d(t−1)). For example, a simple 2nd order auto-regressive Poisson distribution,
f(yt|Θ, d(t−1)) = Po(Θ1yt−1 + Θ2yt−2) = (1/yt!) exp(−Θ1yt−1 − Θ2yt−2) exp( yt log(Θ1yt−1 + Θ2yt−2) ),
is clearly outside the family, since the logarithm of a sum cannot be expressed in any scalar-product form.
In the auto-regressive case, i.e. with a non-empty regression vector ψ, the exponential family contains the following special cases:
1. normal (Gaussian) linear-in-parameters models,
f(yt|Θ, d(t−1)) = N(θψt, Ω⁻¹), (4.20)
where both θ and Ω are considered unknown, i.e. Θ = [θ, Ω];
2. Markov chain models for discrete-valued variables,
f(yt|Θ, d(t−1)) = ∏_{y} ∏_{ψ} Θ_{〈y〉,〈ψ〉}^{δ(yt−y)δ(ψt−ψ)}, (4.21)
where the products run over all possible realizations y of yt and ψ of ψt, and 〈yt〉 denotes a unique integer number associated with each possible (discrete) state of yt, 1 ≤ 〈yt〉 ≤ ẙ. Hence, the parameter Θ can be seen as a multi-index variable, each element of which determines the probability of a realization yt with index 〈yt〉 given a realization of ψt with index 〈ψt〉. The observation model (4.21) has the form of a Multinomial pdf [61].
These two models are (almost) the only autoregressive members of the family. They are also the
most practically important ones.
4.2.2 Learning
For the stationary system, the time-update operation is trivial:
f(Θt|d(t−1)) = [f(Θt−1|d(t−1))]_{Θt−1→Θt}.
Consider the previous estimate to be of the following type,
f(Θt−1|d(t−1)) = A^{νt−1}(Θ) exp〈Vt−1, C(Θ)〉, (4.22)
where the D(Ψt) term was eliminated by normalization.
The data-update operation yields:
f(Θt|d(t)) ∝ A^{νt−1}(Θ) exp〈Vt−1, C(Θ)〉 A(Θ) exp〈B(Ψt), C(Θ)〉
= A^{νt−1+1}(Θ) exp〈Vt−1 + B(Ψt), C(Θ)〉.
I.e., it is of the same form as (4.22), with the algebraic recursion
νt = νt−1 + 1,
Vt = Vt−1 + B(Ψt). (4.23)
The predictive distribution (3.15) has the following form:
f(yt|d(t−1)) = N(Vt−1 + B(Ψt), νt−1 + 1) / N(Vt−1, νt−1). (4.24)
Remark 4.3 (Conjugacy) Note that we made the choice of the distribution at time t−1. The distribution was intentionally chosen to be self-replicating under the data-update operation with the observation model (4.16). This is known as the conjugacy principle [62].
For the considered special cases, the exact types of the distributions (4.22) are [28]: (i) Gauss-Wishart for the linear Gaussian model (4.20), and (ii) Dirichlet for the Markov model (4.21).
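The recursion (4.23) is what makes EF models attractive in software: learning reduces to accumulating the finite-dimensional statistics V, ν. A toy sketch for the static Bernoulli model, whose conjugate distribution is the Beta pdf (Python for brevity; the data sequence is invented for the example):

```python
# Bernoulli model written in the EF form (4.16):
#   f(y|Theta) = A(Theta) exp(<B(Psi), C(Theta)>),
#   A(Theta) = 1 - Theta,  B(Psi) = y,  C(Theta) = ln(Theta / (1 - Theta))
nu, V = 0.0, 0.0                     # statistics of the conjugate pdf (4.22)
for y in [1, 0, 1, 1, 1, 0]:         # assumed observations
    nu += 1                          # recursion (4.23)
    V += y                           # B(Psi_t) = y_t for this model
# with a flat prior, the conjugate posterior is Beta(V + 1, nu - V + 1),
# whose mode is V / nu
theta_mode = V / nu
```

Whatever the length of the data record, only the pair (V, ν) is stored, exactly as Requirement 4.1 demands.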
4.2.3 Fully probabilistic design
A general formulation of the FPD for the whole family is not available. In special cases, the solution reduces to propagation of a finite-dimensional Bellman function similar to that in Section 4.1.
For the Markov model (4.21), the Bellman function can be found in the form of a multi-index array, similar to that for the parameters Θ in (4.21) [28]. We do not review this special case here, since it does not require any new operations or structures beyond those already used in Section 4.1.
4.2.4 Merging
Two basic merging operations have been considered in Section 3.4, namely direct and indirect merging.
Direct merging: was defined on the outer observation models, i.e. in this case on ᴵf(dt|d(t−1)) of the type (4.24). Analytical results are available only for special cases from the EF. Direct merging of Gaussians was already discussed in Section 4.1.3.
Analytical results are, however, available for discrete pdfs (such as the Dirichlet pdf) which also belong to the EF. For these pdfs, the operations of algebraic (3.34) and geometric (3.36) merging are analytically tractable. Moreover, merging of source pdfs on overlapping variables is also feasible [32].
Indirect merging: was defined using the empirical density ʳf(d(t)), or the predictive distribution f(d(t)|V). First, we consider the case with the empirical density. Using (4.16) in (3.40) yields
f[1](Θ|d(t)) ∝ f(Θ) exp( t α2 E_{ʳf[2](d(t))}( ln f(d(t)|Θ) ) ) exp( t(1−α2) E_{ʳf[1](d(t))}( ln f(d(t)|Θ) ) )
∝ f(Θ) (A(Θ))^t exp( t α2 E_{ʳf[2](d(t))}( 〈B(Ψt), C(Θ)〉 + D(Ψt) ) ) exp( t(1−α2) E_{ʳf[1](d(t))}( 〈B(Ψt), C(Θ)〉 + D(Ψt) ) ),
which is (due to the linearity of B(Ψt) in the scalar product) again of the exponential family, with statistics
V[1] = α2 E_{ʳf[2](d(t))}( B(Ψt) ) + (1−α2) E_{ʳf[1](d(t))}( B(Ψt) ), (4.25)
νt = ν0 + t.
This can be further simplified to:
V[1] = α2 V[2] + (1−α2) V[1], (4.26)
νt = ν0 + t.
4.3 Distributional approximations
Up till now, all operations of probabilistic calculus—namely marginalization (3.2) and expectation (3.4)—were analytically tractable. In this Section, we review the most common approximation methods used to overcome computational difficulties associated with evaluation of analytically intractable pdfs.
The problem can be avoided by projection of the pdf onto a family of distributions that is computationally tractable. In all subsequent operations, such as normalization, marginalization and evaluation of moments, the original intractable pdf will be replaced (approximated) by its projection:
f(Θ|d(t)) ≈ ᵃf(Θ|d(t)). (4.27)
Here, ᵃf denotes the best possible approximation within the chosen computationally tractable class.
Various approximation strategies have been developed; we review the most common techniques below.
4.3.1 Certainty equivalence approximation
In many engineering problems, dealing with full pdfs is avoided. A point estimate, i.e. one value of the parameter Θ, is considered as the summarizing result of the learning task.
The point estimate, Θ̂ = Θ̂(d(t)), can be interpreted as an extreme approximation of the posterior pdf by the function δ(·):

f(Θ|d(t)) ≈ f̂(Θ|d(t)) = δ( Θ − Θ̂(d(t)) ),   (4.28)

where Θ̂ is the chosen point estimate of the parameter Θ, and δ(x) is the Dirac delta function,

∫_x δ(x − x̂) g(x) dx = g(x̂),

if x is a continuous variable, and the Kronecker function,

δ(x) = 1 if x = 0, and δ(x) = 0 otherwise,

if x is a discrete variable.
This approximation is known as the certainty equivalence principle [63]. It remains to determine an optimal value of the point estimate. Typically, it is chosen as the Maximum A Posteriori (MAP) estimate:

Θ̂ = arg max_Θ f(Θ|d(t)).   (4.29)
There are many methods for evaluation of MAP estimates. Here, we review the famous EM
algorithm, since it will be used in later derivations.
Algorithm 4.1 (Expectation Maximization (EM) algorithm) is a well known algorithm for ML
estimation—and by extension for MAP estimation—of model parameters Θ = [Θ1,Θ2] [64]. Here,
we follow an alternative derivation of EM via distributional approximations [65]. The task is to
estimate the parameter Θ_1 of the (intractable) marginal distribution

f(Θ_1|d(t)) = ∫ f(Θ_1, Θ_2|d(t)) dΘ_2.   (4.30)

Using Jensen's inequality, it is possible to obtain a lower bound on (4.30) which is numerically tractable [65]. The resulting inference algorithm is then a cyclic iteration of two basic steps:

E-step: compute the approximate distribution of the parameter Θ_2 at iteration i:

f̂^(i)(Θ_2|d(t)) ≈ f( Θ_2|d(t), Θ̂_1^(i−1) ).   (4.31)
M-step: using the approximate distribution from the E-step, find a new estimate Θ̂_1^(i):

Θ̂_1^(i) = arg max_{Θ_1} ∫_{Θ_2} f̂^(i)(Θ_2|d(t)) ln f(Θ_1, Θ_2, d(t)) dΘ_2.   (4.32)
It was proven that this algorithm monotonically increases the marginal likelihood, f (d (t) |Θ1), thus
converging to a local maximum [66].
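Algorithm 4.1 can be illustrated on a standard textbook case: a 1-D equal-weight mixture of two Gaussians with known common variance, where the component means play the role of Θ_1 and the unobserved component labels the role of Θ_2. The following is an illustrative Python sketch, not part of the thesis toolbox:

```python
import math, random

def em_two_gaussians(data, mu_init=(-1.0, 1.0), sigma=1.0, n_iter=50):
    """EM (Algorithm 4.1) for a 1-D equal-weight mixture of two Gaussians
    with known common sigma: the component means act as Theta_1, the
    unobserved component labels as Theta_2."""
    mu = list(mu_init)
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each datum, cf. (4.31)
        resp0 = []
        for x in data:
            p0 = math.exp(-0.5 * ((x - mu[0]) / sigma) ** 2)
            p1 = math.exp(-0.5 * ((x - mu[1]) / sigma) ** 2)
            resp0.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted re-estimation of the means, cf. (4.32)
        w0 = sum(resp0)
        w1 = len(data) - w0
        mu[0] = sum(r * x for r, x in zip(resp0, data)) / w0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp0, data)) / w1
    return mu

random.seed(1)
data = ([random.gauss(-2.0, 1.0) for _ in range(300)]
        + [random.gauss(2.0, 1.0) for _ in range(300)])
mu_hat = em_two_gaussians(data)   # should approach the true means -2 and 2
```

Each iteration increases the (marginal) likelihood, in agreement with the convergence statement above.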
4.3.2 Laplace’s approximation
This method is based on a local approximation of the posterior pdf f(Θ|d(t)) by a Gaussian distribution at the MAP estimate Θ̂ [67], Θ ∈ R^p. Formally, Laplace's method approximates the distribution (4.27) as follows:

f(Θ|d(t)) ≈ N( Θ̂, H^{−1} ),   (4.33)

where Θ̂ is the MAP estimate (4.29), and H ∈ R^{p×p} is the negative Hessian matrix of the logarithm of the joint pdf f(Θ, d(t)) with respect to Θ, evaluated at Θ = Θ̂:

H = −[ ∂² ln f(Θ, d(t)) / ∂Θ_i ∂Θ_j ]_{Θ=Θ̂},  i, j = 1, …, p.   (4.34)

The asymptotic error of this approximation was studied in [67].
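A 1-D numerical sketch of Laplace's method follows, assuming a Newton search for the mode and finite differences for the second derivative in (4.34); the Gamma target is purely illustrative:

```python
import math

def laplace_approx(log_f, theta0, n_newton=30, h=1e-5):
    """Laplace's approximation, eq. (4.33): locate the MAP estimate by a
    Newton search on the log-density, then take the Gaussian with variance
    equal to the inverse negative second derivative, eq. (4.34). 1-D only."""
    theta = theta0
    for _ in range(n_newton):
        d1 = (log_f(theta + h) - log_f(theta - h)) / (2.0 * h)
        d2 = (log_f(theta + h) - 2.0 * log_f(theta) + log_f(theta - h)) / h ** 2
        theta -= d1 / d2          # Newton step towards the mode
    return theta, -1.0 / d2       # (MAP estimate, H^{-1})

# illustrative unnormalized Gamma(a, b) log-density; analytically the mode
# is (a - 1)/b = 4 and H = (a - 1)/mode^2 = 0.25, so the variance is 4
a, b = 5.0, 1.0
mode, var = laplace_approx(lambda th: (a - 1.0) * math.log(th) - b * th, theta0=2.0)
```

The numerical mode and variance can be checked against the analytical values above.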
4.3.3 Fixed-form minimum distance approximation
The approximating distribution f̂(Θ|η) is chosen as a tractable distribution with parameter η. The optimal approximation f̂(Θ|η̂)—given the fixed-form function f̂(·)—is then determined as

η̂ = arg min_η Δ( f(Θ|d(t)) || f̂(Θ|η) ),   (4.35)

where Δ(·||·) is an appropriate measure of distance (or divergence) between two pdfs. Various measures are used for specific problems, such as the Kullback-Leibler, Levy, chi-squared, or L_2-norm distances; these are reviewed in [59]. Specifically, the Kullback-Leibler (KL) divergence (3.7) is important for two reasons:

1. statistical inference via the KL divergence was shown to be optimal in the statistical utility sense [52].

2. minimization (4.35) with respect to the KL divergence (3.7) has a unique—and therefore global—solution [68].
Moreover, the KL divergence is also used in many practical applications [53, 35, 69].
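For a fixed-form family, (4.35) can be attacked by brute force: discretize the pdfs on a grid and search the parameter grid for the smallest KL divergence. An illustrative sketch (a single-Gaussian family fitted to a two-component mixture; for KL(f||g) with Gaussian g the optimum is known to be moment matching, which the grid search should reproduce):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def kl_divergence(p, q, dx):
    """Discretized KL(p || q) on a common grid."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0) * dx

# 'intractable' posterior: equal mixture of N(-1, 0.5) and N(2, 0.5)
dx = 0.01
xs = [-6.0 + dx * i for i in range(1201)]
f = [0.5 * normal_pdf(x, -1.0, 0.5) + 0.5 * normal_pdf(x, 2.0, 0.5) for x in xs]

# fixed form: single Gaussian, eta = (mu, sigma); crude grid search on (4.35)
best = None
for mu in [0.1 * m for m in range(-10, 21)]:
    for sigma in [0.5 + 0.1 * s for s in range(26)]:
        g = [normal_pdf(x, mu, sigma) for x in xs]
        d = kl_divergence(f, g, dx)
        if best is None or d < best[0]:
            best = (d, mu, sigma)
# moment matching predicts mu* = 0.5 and sigma* = sqrt(0.25 + 2.25) ~ 1.58
```

Grid search is, of course, only feasible for very low-dimensional η; it is used here solely to make the minimization (4.35) concrete.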
4.3.4 Variational Bayes (VB) approximation
The Variational Bayes procedure is defined by the restriction of conditional independence:

f̃(Θ_1, Θ_2|d(t)) = f̃(Θ_1|d(t)) f̃(Θ_2|d(t)).

Note that the restriction does not prescribe any specific form of the distributions; therefore, the involved distributions are denoted f̃. Optimization of the KL divergence for this choice is given by the following theorem.

Theorem 4.1 (Variational Bayes) Let f(Θ|d(t)) be the posterior pdf of the multivariate parameter Θ. The parameter Θ is partitioned into Θ = [Θ′_1, Θ′_2, …, Θ′_q]′. Let f̃(Θ|d(t)) be an approximate pdf restricted to the set of conditionally independent distributions on Θ_1, Θ_2, …, Θ_q:

f̃(Θ|d(t)) = f̃(Θ_1, Θ_2, …, Θ_q|d(t)) = ∏_{i=1}^q f̃_i(Θ_i|d(t)).   (4.36)
Then, the minimum of the KL divergence,

f̂(Θ|d(t)) = arg min_{f̃(·)} KL( f̃(Θ|d(t)) || f(Θ|d(t)) ),   (4.37)

is reached for

f̂_i(Θ_i|d(t)) ∝ exp( E_{f̂_{/i}(Θ_{/i}|d(t))}( ln f(Θ, D) ) ),  i = 1, …, q,   (4.38)

where Θ_{/i} denotes the complement of Θ_i in Θ, and f̂_{/i}(Θ_{/i}|d(t)) = ∏_{j=1, j≠i}^q f̂_j(Θ_j|d(t)). We will refer to f̂(Θ|d(t)) as the Variational Extreme. The conditionally independent elements of (4.38) will be called VB-marginals. The parameters of the posterior distributions (4.38) will be called VB-statistics.
Proof: See [70], [71].
The main computational problem of the VB approximation is that the Variational Extreme (4.38) is not given in closed form. For example, with q = 2, the moments of f̂_1(·) are needed for evaluation of f̂_2(·), and vice versa. The solution of (4.38) is usually found via an iterative algorithm that is suggestive of the EM algorithm (Algorithm 4.1), but where all steps involve expectations of the kind in (4.32), as follows.
Algorithm 4.2 (Variational EM (VEM)) Consider the case where q = 2, i.e. Θ = [Θ′_1, Θ′_2]′. Then, cyclic iteration of the following steps, n = 1, 2, …, converges to a VB extreme (4.38).

E-step: compute the approximate distribution of the parameter Θ_2 at iteration n:

f̂_2^(n)(Θ_2|d(t)) ∝ exp ∫_{Θ_1} f̂_1^(n−1)(Θ_1|d(t)) ln f(Θ_1, Θ_2, D) dΘ_1.   (4.39)
M-step: using the approximate distribution from the nth E-step, compute the approximate distribution of the parameter Θ_1 at iteration n:

f̂_1^(n)(Θ_1|d(t)) ∝ exp ∫_{Θ_2} f̂_2^(n)(Θ_2|d(t)) ln f(Θ_1, Θ_2, D) dΘ_2.   (4.40)

The initializers, i.e. the VB-statistics of f̂_1^(0)(·) and f̂_2^(0)(·), may be chosen randomly. Convergence of the algorithm to fixed VB-marginals, f̂_i(Θ_i|d(t)), ∀i, was proven in [70] via the natural gradient technique [72].
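The alternating structure of Algorithm 4.2 can be illustrated on the classic conjugate example: normal observations with unknown mean and precision under a Normal-Gamma prior, factorized as q(μ)q(τ). The sketch below assumes exactly that model (a Python illustration; priors and names are not from the thesis toolbox):

```python
import random

def vb_gaussian(data, mu0=0.0, beta0=1.0, a0=1.0, b0=1.0, n_iter=100):
    """VEM iteration (Algorithm 4.2) for x_i ~ N(mu, 1/tau) with a conjugate
    Normal-Gamma prior, under the factorization q(mu, tau) = q(mu) q(tau):
    q(mu) is Gaussian, q(tau) is Gamma, and each update needs only the
    moments of the other VB-marginal."""
    n = len(data)
    xbar = sum(data) / n
    e_tau = a0 / b0                      # initializer: prior mean of tau
    a_n = a0 + 0.5 * (n + 1)             # shape is fixed during iterations
    for _ in range(n_iter):
        # update q(mu) = N(mu_n, 1/lam_n), given the current E[tau]
        mu_n = (beta0 * mu0 + n * xbar) / (beta0 + n)
        lam_n = (beta0 + n) * e_tau
        # update q(tau) = Gamma(a_n, b_n), given the moments of q(mu)
        e_sq = lambda x: (x - mu_n) ** 2 + 1.0 / lam_n   # E_q[(x - mu)^2]
        b_n = b0 + 0.5 * (beta0 * e_sq(mu0) + sum(e_sq(x) for x in data))
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n

random.seed(0)
data = [random.gauss(1.0, 2.0) for _ in range(500)]   # true mu = 1, tau = 0.25
mu_n, lam_n, a_n, b_n = vb_gaussian(data)
```

The returned quantities are precisely the VB-statistics of the two VB-marginals; E[τ] = a_n/b_n should approach the true precision.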
Compared to the fixed-form minimum divergence approximation (4.35), there are two key differences:

1. the approximating distribution is not confined to a given form, but is restricted functionally, using the assumption of conditional independence:

f(Θ|d(t)) ≈ f̃(Θ|d(t)) = f̃(Θ_1|d(t)) f̃(Θ_2|d(t)) ⋯ f̃(Θ_q|d(t)),   (4.41)

where Θ = [Θ′_1, Θ′_2, …, Θ′_q]′ is the multivariate parameter partitioned into q elements. The notation f̃(·) is used to denote an unspecified functional variant ('wild-card' function) used in the optimization procedure which yields the approximating distribution.

2. for reasons of tractability, the VB procedure does not minimize the 'original' KL divergence from f(Θ|d(t)) to f̃(Θ|η) (4.35), but the 'reverse' KL divergence KL( f̃(Θ|d(t)) || f(Θ|d(t)) ), i.e. from f̃(Θ|d(t)) to f(Θ|d(t)).
These have, respectively, the following consequences:

1. conditional independence:

• the VB approximation can be used only for models with more than one parameter,

• cross-correlation between the variables Θ_1 and Θ_2 is not modelled. Intuitively, the correlated multivariate distribution is modelled as a product of approximating marginals.

2. the use of the 'reverse' KL divergence:

• from property 4. of the KL divergence (Section 4.3.3), the 'reverse' KL divergence is not equal to the 'original' one; the approximation is therefore suboptimal in the statistical utility sense [52].

• the minimum divergence approximation via KL( f̃(·) || f(·) ) is not guaranteed to have a unique minimum [68].

These disadvantages are, however, outweighed by computational advantages: (i) functional (i.e. free-form) optimization has an analytical solution, and (ii) the parameters of the optimal approximating posteriors can be evaluated using the alternating VEM algorithm (Algorithm 4.2).
4.3.5 Markov Chain Monte Carlo (MCMC) approximation
In this approach, the posterior pdf is approximated by a piecewise constant density on a partitioned support, i.e. via a histogram constructed from a sequence of random samples, Θ^(0), Θ^(1), Θ^(2), …, Θ^(n), …, of the variable Θ.

The sequence of random samples is called a Markov chain if the n-th sample Θ^(n) is generated from a chosen conditional distribution

f( Θ^(n) | Θ^(n−1) ),   (4.42)

which depends only upon the previous state of the chain, Θ^(n−1).

Under mild regularity conditions on f(·|·) (4.42), as n → ∞, Θ^(n) ∼ f(Θ), the (time-invariant) stationary distribution of the Markov chain defined via the kernel (4.42). Hence, i.i.d. samples from f(Θ) may be drawn via an appropriate choice of the kernel (4.42), if n is chosen sufficiently large. Typically, the associated computational burden is high, especially for high-dimensional parameters.
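A random-walk Metropolis sampler is perhaps the simplest realization of the kernel (4.42). The following is an illustrative Python sketch (the Gaussian target and all tuning constants are assumptions for the example):

```python
import math, random

def metropolis(log_target, theta0, n_samples, step=1.0, burn_in=1000):
    """Random-walk Metropolis: the kernel (4.42) proposes theta' ~ N(theta, step^2)
    and accepts with probability min(1, f(theta')/f(theta)); after burn-in the
    retained samples form the histogram approximation of f(Theta)."""
    random.seed(42)                       # fixed seed, for reproducibility
    theta = theta0
    samples = []
    for i in range(burn_in + n_samples):
        prop = theta + random.gauss(0.0, step)
        log_ratio = log_target(prop) - log_target(theta)
        if log_ratio >= 0 or random.random() < math.exp(log_ratio):
            theta = prop                  # accept the proposed move
        if i >= burn_in:
            samples.append(theta)
    return samples

# illustrative target: unnormalized N(3, 1), log f = -(theta - 3)^2 / 2
samples = metropolis(lambda th: -0.5 * (th - 3.0) ** 2, theta0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
```

Note that only the unnormalized density is needed, which is exactly why MCMC is attractive for intractable posteriors.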
4.4 Approximate Bayesian filtering
In this Section, we study the use of distributional approximations (Section 4.3) for the problem of Bayesian filtering (Proposition 3.1). We review the technique of forgetting [73]. Moreover, we introduce a new approximation technique based on the VB approximation (Section 4.3.4), which will be called VB-filtering.
4.4.1 Forgetting
Note that for the estimation scenario, the Bayesian filtering problem is replaced by accumulation of sufficient statistics, which is computationally feasible. The technique of forgetting [74] was proposed for estimation of non-stationary parameters of models from the exponential family (4.16). Originally, the technique was developed as a heuristic [74]. Later, it was shown to be a special form of approximation of the time-update operation in Bayesian filtering [73].
The time-update operation is approximated as follows:

f(Θ_t|d(t−1), φ_t) ∝ [ f(Θ_{t−1}|d(t−1))_{Θ_t} ]^{φ_t} × [ f_A(Θ_t|d(t−1)) ]^{1−φ_t}.   (4.43)

The notation f(·)_{Θ_t} indicates the replacement of the argument of f(·) by Θ_t, where Θ_t is the time-varying unknown parameter set at time t. f_A(·) is a chosen alternative distribution, expressing alternative knowledge about Θ_t at time t. The coefficient φ_t, 0 ≤ φ_t ≤ 1, is known as the forgetting factor. From (4.43), the limits are interpreted as follows:

for φ_t = 1: prior information, at time t, about the new variable Θ_t is identical to the posterior of Θ_{t−1} at t−1:

f(Θ_t|d(t−1), φ_t) = f(Θ_{t−1}|d(t−1))_{Θ_t}.
This is consistent with the choice Θ_t = Θ_{t−1}, i.e. the time-invariant parameter assumption.

for φ_t = 0: prior information, at time t, about the new variable Θ_t is chosen as the alternative distribution:

f(Θ_t|d(t−1), φ_t) = f_A(Θ_t|d(t−1)).

This is consistent with the choice of independence between Θ_t and Θ_{t−1}, i.e.

f(Θ_t, Θ_{t−1}|d(t−1)) = f_A(Θ_t|d(t−1)) f(Θ_{t−1}|d(t−1)).

The forgetting factor is typically considered as fixed, and it is chosen by the designer of the model. A choice of φ_t close to 1 models slowly varying parameters; a choice of φ_t close to 0 models rapidly varying parameters.
Remark 4.4 (Internal model for forgetting) It is possible to construct the explicit internal model (3.9); however, no practical use has been found for it.

Using this approach, the task of Bayesian filtering (Proposition 3.1) can be re-interpreted in terms of the task of estimation within the exponential family (Section 4.2). Using the time-update operation (4.43), the data-update operation (3.14) for the exponential family (4.16) can be rewritten as follows:

f(Θ|d(t)) = (A(Θ))^{ν_t} exp⟨V_t, C(Θ)⟩,
V_t = φ_t V_{t−1} + B(Ψ_t) + (1−φ_t) V_{A,t},   (4.44)
ν_t = φ_t ν_{t−1} + 1 + (1−φ_t) ν_{A,t},

where V_A and ν_A denote the statistics of the alternative distribution f_A.
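The recursion (4.44) acts element-wise on the statistics, so one update step is a few lines of code. A minimal sketch (illustrative names, Python rather than the project's Matlab/C):

```python
def forgetting_update(V, nu, B_psi, V_alt, nu_alt, phi):
    """One data-update step with forgetting, eq. (4.44): the accumulated
    statistics are discounted by phi and blended with the statistics of the
    alternative distribution before the new observation is added."""
    V_new = [phi * v + b + (1.0 - phi) * va
             for v, b, va in zip(V, B_psi, V_alt)]
    nu_new = phi * nu + 1.0 + (1.0 - phi) * nu_alt
    return V_new, nu_new

# phi = 1: plain accumulation, consistent with time-invariant parameters
V1, nu1 = forgetting_update([2.0, 5.0], 10.0, [1.0, 1.0], [0.0, 0.0], 0.0, phi=1.0)
# phi = 0: the past is discarded in favour of the alternative statistics
V0, nu0 = forgetting_update([2.0, 5.0], 10.0, [1.0, 1.0], [0.5, 0.5], 3.0, phi=0.0)
```

The two calls reproduce the two limit interpretations of (4.43) discussed above.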
4.4.2 Variational Bayes filtering
In this Section, we slightly re-formulate the operation of Bayesian filtering (Proposition 3.1). Here, we treat the time-update operation (3.13) as a sub-task of the data-update operation (3.14). The data-update operation is approximated as one operation:

f(Θ_t|d(t)) ∝ ∫ f(Θ_t, Θ_{t−1}|d(t)) dΘ_{t−1},

which can be—under the assumptions of Agreement 3.1—split into separate time- and data-update operations.

Here, we seek an approximation of the joint distribution f(Θ_t, Θ_{t−1}|d(t)) in the class of conditionally independent distributions, i.e.

f̃(Θ_t, Θ_{t−1}|d(t)) = f̃(Θ_t|d(t)) f̃(Θ_{t−1}|d(t)).
Using the Variational Bayes approximation (Theorem 4.1), it is easy to show that the optimal approximation can be found in the following form:

f̂(Θ_t|d(t)) ∝ exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(y_t, Θ_t, Θ_{t−1}|u_t, d(t−1)) ) ),
f̂(Θ_{t−1}|d(t)) ∝ exp( E_{f̂(Θ_t|d(t))}( ln f(y_t, Θ_t, Θ_{t−1}|u_t, d(t−1)) ) ).

From Agreement 3.1, it follows that

f̂(Θ_t|d(t)) ∝ exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(y_t|u_t, d(t−1), Θ_t) + ln f(Θ_t|Θ_{t−1}) + ln f(Θ_{t−1}|d(t−1)) ) )
∝ f(y_t|u_t, d(t−1), Θ_t) exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(Θ_t|Θ_{t−1}) ) )

(the first log-term is independent of Θ_{t−1} and can be taken out of the expectation; the last log-term is independent of Θ_t and thus becomes part of the normalization). Hence, the time-update step can be written as:

f̂(Θ_t|d(t−1)) ∝ exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(Θ_t|Θ_{t−1}) ) ).

However, in this case, the time- and data-update operations cannot be performed sequentially, as they are mutually dependent. Therefore, the VEM algorithm (Algorithm 4.2) must be used.
4.5 Approximate estimation
4.5.1 Bayes-closed approximation
The problem of recursive estimation with limited memory was addressed in general in [75]. There, the problem was defined as finding a functional form, f̃(Θ), of an approximate distribution that is closed under Bayes' rule, i.e.

f̃(Θ|d(t)) ∝ f(d_t|Θ, d(t−1)) f̃(Θ|d(t−1)),   (4.45)

where f̃(Θ|d(t−1)) and f̃(Θ|d(t)) are of the same functional form. Moreover, the form must depend only on a finite-dimensional statistic, s_t, such that

f̃(Θ|d(t)) = f̃(Θ|s_t),

where the dimension of s_t is assigned, and may be chosen arbitrarily small. Note that s_t plays the role of a sufficient statistic.

The requirement of closure under Bayes' rule is important, since any Bayes-closed estimation avoids accumulation of errors during time-updating.
The family was found in the form of a probabilistic mixture of fixed (known) pdfs f_i(Θ), i = 1, …, dim(s_t), weighted by the elements of s_t. The statistic s_t is then updated by a linear functional, l(·):

s_{i,t} = s_{i,t−1} + l( f_i(Θ), ln f(d_t, Θ|d(t−1)) ),  i = 1, …, dim(s_t).   (4.46)

Alternatively, the choice of fixed pdfs f_i(Θ) can be replaced by the choice of functionals l_i(·), such that

s_{i,t} = s_{i,t−1} + l_i( ln f(d_t, Θ|d(t−1)) ).

It was proven that the approximate on-line identification (4.45) is globally optimal—with respect to orthogonal projection onto the true posterior distribution [55].

Practical use of the approximation is, however, rather limited. The method requires the time- and data-invariant linear functionals l_i(·) to be chosen a priori. Design criteria for these operators are available only for special cases. The method was demonstrated to be applicable to low-dimensional problems only.
Remark 4.5 (Particle Filtering) The popular technique of particle filtering [60] applied to a stationary model can be seen as a special case of the Bayes-closed approximation. In this approach, the pdf f(Θ|s_t) is approximated by particles, i.e. samples Θ^(1), Θ^(2), …, Θ^(n) from Θ*, each of which has an assigned weight w^(i). This corresponds to the choice of

f_i(Θ) = δ( Θ − Θ^(i) ),

and the weights s_t ≡ w = [w_1, w_2, …].
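The correspondence in Remark 4.5 can be made concrete: with delta functions as the fixed pdfs, any expectation reduces to a weighted sum over particles. The sketch below uses importance re-weighting merely to produce non-uniform weights (the densities and names are illustrative assumptions):

```python
import math, random

def particle_expectation(particles, weights, g=lambda th: th):
    """With f(Theta) represented by particles Theta^(i) and weights w^(i),
    i.e. f_i(Theta) = delta(Theta - Theta^(i)) and s_t = w, any expectation
    E_f[g(Theta)] reduces to a normalized weighted sum."""
    total = sum(weights)
    return sum(w * g(th) for th, w in zip(particles, weights)) / total

random.seed(7)
# particles drawn from N(0, 1), then re-weighted towards a N(1, 1) target;
# the weights are the density ratio N(1,1)/N(0,1) = exp(theta - 1/2)
particles = [random.gauss(0.0, 1.0) for _ in range(20000)]
weights = [math.exp(th - 0.5) for th in particles]
post_mean = particle_expectation(particles, weights)   # close to 1
```

The histogram of the weighted particles is exactly the piecewise-constant approximation discussed in Section 4.3.5.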
4.5.2 Projection based approach
In this case, the requirement for the approximating family to be closed under Bayes' rule is relaxed. The form of the posterior, f̂(Θ|s_t), is given a priori and fixed for all t. It is Bayes' rule itself that is approximated at each step [59, 76]. If the posterior distribution has a form different from the prior, f̂(Θ|s_{t−1}), an approximation of the posterior is found in the family of the prior distribution:

f̂(Θ|s_t) ≈ f(Θ|d(t)) ∝ f(d_t|Θ, d(t−1)) f̂(Θ|s_{t−1}).   (4.47)

The approximation (4.47) is used as the prior in the next step.

All projection-based approximations of pdfs reviewed in Section 4.3.3 may be used here.

Note that the one-step approximation is only locally optimal (i.e. optimal only for one step, not for the whole trajectory), and so the error of approximation may grow with time. Thus, the quality of the approximation has to be studied asymptotically, i.e. for t → ∞. Furthermore, the approximation is not closed under Bayes' rule. In practice, this means that on-line identification given a set of i.i.d. observations yields different results depending on the order in which the data are processed [59].
Remark 4.6 (Re-sampling in particle filtering) A typical problem of particle filtering is that the posterior mass concentrates on a few particles. This effect can be avoided by the so-called re-sampling operation. This operation can, once again, be seen as a projection of the pdf onto another support. One immediate consequence is that closure under Bayes' rule is lost.
4.5.3 On-line Variational Bayes
The general VB approximation (Section 4.3.4) was extended to the on-line scenario in [70]. The on-line VB method is a special case of one-step approximation, namely distribution fitting, with Theorem 4.1 used to satisfy (4.47). Convergence of the method was also proven in [70], by showing on-line VB to be a special case of stochastic approximation, which is known to converge [77].
The off-line VB approximation (Section 4.3.4) is a functional optimization of the KL divergence. This functional optimization can be extended to the on-line scenario as follows:

f̃(Θ|d(t)) ≈ f(d_t|Θ, d(t−1)) f̃(Θ|d(t−1)).   (4.48)

We seek an optimal approximation of the true posterior under the conditional independence constraint (assume q = 2 for algebraic simplicity):

f̃(Θ|d(t)) = f̃(Θ_1|d(t)) f̃(Θ_2|d(t)),   (4.49)
f̃(Θ|d(t−1)) = f̃(Θ_1|d(t−1)) f̃(Θ_2|d(t−1)).   (4.50)

Then, using (4.48) and (4.49) in Theorem 4.1, the VB-optimal form of (4.49) is found to be:

f̂(Θ_i|d(t)) ∝ exp( E_{f̂(Θ_{/i}|d(t))}( ln f(d_t|Θ, d(t−1)) ) + ln f̃(Θ_i|d(t−1)) )
∝ exp( E_{f̂(Θ_{/i}|d(t))}( ln f(d_t|Θ, d(t−1)) ) ) f̃(Θ_i|d(t−1)).   (4.51)

Equation (4.51) can be rewritten as:

f̂(Θ_i|d(t)) = f_i^VB(d_t|Θ, d(t−1)) f̃(Θ_i|d(t−1)),  i = 1, 2,   (4.52)
f_i^VB(d_t|Θ, d(t−1)) ∝ exp( E_{f̂(Θ_{/i}|d(t))}( ln f(d_t|Θ, d(t−1)) ) ).   (4.53)
Then, (4.52) is the VB-approximate update of the parameter distribution, where f_i^VB(d_t|Θ, d(t−1)) plays the role of the observation model for the ith posterior distribution. Hence, it will be known as the partial VB-observation model. This concept is helpful, since it allows us to use the results from estimation (Section 4.2): choosing f̃(Θ_i|·) conjugate with the partial VB-observation model (4.53) yields a numerically tractable recursive identification algorithm. This VB-conjugate distribution can be found if the partial VB-observation model (4.53) is from the exponential family.

Note that (4.53) is, in fact, in the form of the Bayes-closed approximation (4.46), with E_{f̂(Θ_{/i}|d(t))}(·) playing the role of the linear operator l_i(·). However, the expected value E_{f̂(Θ_{/i}|d(t))}(·) is conditioned on d(t) and is, therefore, time-varying. This is not allowed for the linear operators used in the Bayes-closed approximation. Therefore, the on-line VB approximation (4.52) is not closed under Bayes' rule. Asymptotically, however, the VEM evaluation of f̂(Θ_i|d(t)) converges [70]. Thus, the VB approximation is asymptotically Bayes-closed.
4.6 Approximate design of DM Strategy
Fully probabilistic design (FPD) of the DM strategy was chosen as the main approach to this problem. For special cases from the model families reviewed in Sections 4.1 and 4.2, the solution of FPD (Proposition 3.2) is analytically tractable. However, this is not true in general. In this Section, we analyze the problem of approximate design of DM strategies using the FPD approach.
The solution of FPD (Proposition 3.2) can be interpreted as a specific type of dynamic programming [10], with γ(d(t)) playing the role of the Bellman function [6]. The function γ(d(t)) is recursively evaluated against the time arrow, γ(d(t−1)) = g(γ(d(t))), where the function g(·) is defined by (3.27), (3.28). The FPD solution is feasible if the Bellman function γ(d(t)) can be represented by a finite-dimensional statistic at each time t. In other words, the form of the Bellman function is again self-replicating under the operations (3.27), (3.28).
This concept is similar to that of conjugacy (Remark 4.3). Therefore, the approach to approximation may be similar to that used for filtering (Section 4.4) and estimation (Section 4.5). A review of state-of-the-art techniques for general dynamic programming was presented in [78]. It was concluded that the most promising approach to the problem is the use of approximate Bellman functions from a carefully chosen family [79].
The number of steps involved in the FPD solution—Proposition 3.2, restated in (3.27), (3.28)—is rather high. However, since all of the steps are operations of probabilistic calculus, general probabilistic approximations (Section 4.3) can be used.

Approximations of FPD for mixture models have been presented in [80]. However, systematic use of distributional approximations in this area remains a topic for future research.
5 Practical Aspects of Decision Making
In this Chapter, we list the steps that are meaningful for application of the DM theory (Chapter 3) to a practical problem. Both off-line and on-line parts are covered.
Agreement 5.1 (On-line steps of decision making) The adaptive decision maker operates by recursive repetition of the following steps:

1. read: the observed data are read from the environment. All the necessary pre-processing and transformation of data is done in this step.

2. learn: the observed data are used to increase the knowledge about the environment.

3. adapt: the decision maker uses the improved knowledge of the system to improve its DM strategy.

4. decide: the adapted DM strategy is used to choose an appropriate action.

5. write: the chosen action is written into the environment. Similarly to the first step, transformation of the results is done in this step.
Note that all of the on-line steps of DM, described in Agreement 5.1, should be done within a fixed
period of time. This justifies our emphasis on feasibility (Requirement 4.1) of the DM operations
(Chapter 4). The UML notation of the on-line DM is displayed in Figure 5.1.
Due to computational constraints, it is expected that the level of adaptivity of the decision maker is rather limited. Namely, it is expected that both (i) the structure of the model of the environment, and (ii) the structure of the DM strategy, are defined a priori and hard-wired into the nature of the decision maker. The challenging task of selecting appropriate structures is typically left to expert designers. The DESIGNER project [81, 82, 83, 84] is an attempt to systematically address the task of automated design of DM strategies for various practical problems.
The following steps summarize the available experience, gained especially in development of the
DESIGNER project.
Figure 5.1: UML sequence diagram of the on-line steps of decision making.

Agreement 5.2 (Basic steps of DM)

1. Problem description: In this step, a technical problem specification covering all available knowledge, aims and restrictions is collected from the user. Specifically, we collect the knowledge required for all subsequent steps. The first required information is a full description of the observed data. We
are dealing with dynamic systems; therefore, all data are expected to vary in time. The stream (time-indexed sequence) of observations is called a channel. At first, we collect a description of all individual channels: the available off-line data, the ranges of the data sensors, and the role in the DM (i.e. whether the channel is an action u_t or an observation y_t). Then, expert knowledge relevant to each of the following steps is collected.
2. Elicitation of prior distributions: Typically, the expert knowledge is not available in the form of pdfs. Therefore, this knowledge needs to be converted (often approximately) into probabilistic terms. From now on, only the probabilistic representation of this knowledge will be used by the methodology. The original description of the problem will be used only in interaction with the user.

3. Model selection: The expert knowledge, collected in step 1, does not select one particular model, but only a class of considered models. The available off-line data can be used to decide which model from the class is best suited to the problem.

4. Learning: Parameters of the model selected in the previous step are estimated.
5. Model validation: Since the model identified in the previous steps will be considered as fixed (up to some parameters, in the adaptive case), it is wise to perform some additional tests to validate the quality of the model. Use of an invalid model in the following steps may prove to be too expensive. If the model is found invalid, the whole DM process must be restarted from step 1 (or 2).

6. Elicitation of ideal pdfs: At this stage, the structure of the model is considered as fixed. Therefore, the ideal pdf—which has the same structure as the model, for computational tractability—can be built from the specifications obtained from the user in step 1.

7. Design: In this step, the admissible control strategy is computed.

8. Design validation: The designed control strategy is tested to verify that the closed loop meets the requirements specified in step 1. If these requirements are not met, the DM process must be restarted from step 6, or, in severe cases, from step 1.

9. Implementation: The control strategy is implemented and tested on-line in the real environment. This is also the final validation of the approach.
The cycle of development used under DESIGNER is described by the sequence diagram in Figure 5.2. The steps in the cycle are only loosely tied together. The user (designer) is allowed to:

skip some steps if reasonable defaults are available (e.g. for prior elicitation or model selection).

repeat steps if he is not satisfied with the achieved results. This makes sense only if he changes his description of the system. The need naturally arises after each validation step: when the user finds the learned model or the designed strategy insufficient for his needs, the whole process must be restarted.
5.1 Problem description
At this stage, the user (or the designer) should describe the problem in a systematic way. Interaction with the user can be done in two ways:

interactive mode, where the DM process stops after each step and allows the user to change the description or restart the whole process.

batch mode, in which all the information is collected first, and all computation runs independently of the user. This is illustrated in Figure 5.2, where all steps are called from the operation batchrun.
Figure 5.2: UML sequence diagram of the design of the decision-maker.
Naturally, it should be possible to combine these modes. For example, an initial description of the problem is first created in the interactive mode on a small dataset. Then, the computation on a larger dataset is run with the same description in the batch mode.

Thus, a systematic way of storing the user's description of the problem is required. From a software-design point of view, a structure for external information is to be created. This structure cannot be considered as rigid or complete. It is expected that, with more sophisticated methods, more information from the user may be required. Following the object-oriented approach, all new information fields should be added by the mechanism of inheritance. This approach will ensure compatibility of the extended description with older methods.
5.2 Prior elicitation
If knowledge of the modelled environment is available before any data are observed, this knowledge can be injected into the learning process by means of prior pdfs on the model parameters. Thus, the form of the prior pdf depends on the observation model used for estimation. However, the prior knowledge, K, is typically available in a form independent of the chosen model parameterization. The task of prior elicitation is basically the translation of partial knowledge available for the input/output model (3.15) into prior knowledge on the parameters, f(Θ).

Since the main concern of this work is the feasibility of the whole process, we adopt the following assumptions:

1. The form of the prior is chosen as conjugate under Bayesian filtering/estimation. Therefore, the task of prior elicitation is to select only its statistics.

2. If more than one piece of knowledge is available, i.e. K = {K_1, …, K_K}, information from all possible sources must be taken into account. Typically, a compromise between the sources must be found, as the sources can suggest incompatible knowledge.
It is easy to recognize the task of prior elicitation as a special case of merging, namely indirect merging (Section 4.2.4). However, this task has a few specific features, which we analyze in this Section. Notably, the problem was addressed before the general problem of probabilistic merging; hence, it is interesting to review the published results [85, 86, 87, 88] in the light of the recent development of the general theory of merging.
5.2.1 Elicitation of prior pdf from one source
In this Section, we consider the elicitation of the prior based on a single piece of knowledge K. A feasible mechanism for prior elicitation was introduced in [85]. The prior knowledge is used to generate typical data records, d̄, which would be generated by a system (of the form of the chosen observation model) that is compatible with K. These data are called fictitious data, since they
were not observed on any real system. Propagation of the fictitious data, d̄, through Bayes' rule yields the required prior distribution

f(Θ|K) ≈ f(Θ|d̄) ∝ f(d̄|Θ) f(Θ),   (5.1)

where f(Θ) denotes a non-informative 'pre-prior' distribution.

This mechanism can be interpreted as a special case of the general theory of probabilistic merging (Section 3.4), namely indirect merging (3.37). Note, from (3.37) and (5.1), that the generation of the fictitious data d̄ corresponds to the approximation of f(y_t|d(t−1), u_t) by an empirical density f̆(d(t)), for which the merging operation (3.37) is equivalent to the learning operation (3.14).

From the point of view of software design, we note that elicitation of prior information from one source is done in two steps: (i) translation of the given knowledge into fictitious data, and (ii) learning with the fictitious data. The learning operation is common to all fictitious data; however, the translation into fictitious data must be defined for each type of source of prior information.
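For a conjugate pair, propagating fictitious data through Bayes' rule (5.1) reduces to accumulation of statistics. A sketch for a multinomial observation model with a Dirichlet prior follows; the encoding of the expert statement as six fictitious records is, of course, an illustrative assumption:

```python
def elicit_dirichlet_prior(fictitious_data, categories, pre_prior=1.0):
    """Prior elicitation via fictitious data, eq. (5.1), for a multinomial
    observation model with a conjugate Dirichlet prior: pushing the
    fictitious records through Bayes' rule amounts to adding their counts
    to a flat 'pre-prior'."""
    alpha = {c: pre_prior for c in categories}
    for record in fictitious_data:
        alpha[record] += 1.0      # one Bayes-rule update per fictitious record
    return alpha

# expert statement 'A occurs about twice as often as B', encoded (as an
# illustrative assumption) by six fictitious records
alpha = elicit_dirichlet_prior(["A", "A", "A", "A", "B", "B"], ["A", "B"])
prior_mean_A = alpha["A"] / sum(alpha.values())   # prior mean of p_A
```

The translation of knowledge into fictitious records is the step that must be defined per source type; the learning step is the same for all of them.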
5.2.2 Merging of knowledge sources
In this Section, we describe merging of prior knowledge from various sources. It is assumed that prior pdfs f(Θ|K_i), i = 1, …, K, are available. The task is to find an approximate distribution f̂(Θ|K) which (i) is conjugate with the observation model, and (ii) combines the knowledge accumulated in f(Θ|K_i), i = 1, …, K. It was found [87] that

f̂(Θ|K) = ∏_{i=1}^K f(Θ|K_i)^{β_i}   (5.2)

is optimal in the sense of the KL divergence. The scalars β_i, i = 1, …, K, ∑_{i=1}^K β_i = 1, are weights corresponding to the ith source of prior knowledge. This result is not surprising, since it is, yet again, a special case of the general merging operation, namely (3.36).
However, in the general merging theory, the weights βi are assumed to be known. This assumption is
not valid in the task of prior elicitation. A method for the selection of the weights βi was presented in [87].
It is argued that βi should be chosen as follows:

βi ∝ f(d(t)|Ki) = ∫ f(d(t)|Θ) f(Θ|Ki) dΘ. (5.3)

Note that (5.3) is the marginal posterior distribution of βi. Therefore, formally, it is not a prior, but
a posterior estimate. In spite of this, such a choice is important from the practical point of view, since
it allows us to reduce the computational cost associated with learning. For example, the weights βi may
be estimated on a small amount of data and fixed, while the learning is performed on a larger dataset
with the fixed prior f̂(Θ|K).
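For a concrete illustration of (5.2), consider Gaussian priors f(Θ|Ki) = N(mi, vi): their weighted geometric mean is again Gaussian, with precision ∑i βi/vi and a precision-weighted mean. The following sketch assumes the weights βi are already available (e.g. pre-computed via (5.3)); all names are illustrative.

```python
# Minimal sketch of the geometric merging (5.2) for Gaussian priors
# f(Theta|K_i) = N(m_i, v_i). The weighted geometric mean of Gaussians
# is Gaussian with precision sum_i beta_i / v_i; names are illustrative.

def merge_gaussian_priors(means, variances, betas):
    assert abs(sum(betas) - 1.0) < 1e-12    # weights must sum to one
    precision = sum(b / v for b, v in zip(betas, variances))
    mean = sum(b * m / v for b, m, v in zip(betas, means, variances)) / precision
    return mean, 1.0 / precision

m, v = merge_gaussian_priors(means=[0.0, 2.0], variances=[1.0, 1.0], betas=[0.5, 0.5])
print(m, v)   # equal weights and variances -> mean 1.0, variance 1.0
```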
Remark 5.1 (Flattening) The informativeness of the prior distribution depends strongly on the
number of fictitious data records that were used for its creation. Moreover, in complex situations, posterior
distributions may be used in the construction of an adequate prior [28], which presents a danger of over-fitting
of the prior with respect to the used dataset. In order to overcome this problem, the operation
of flattening is defined as a way to reduce the informativeness of the prior. From the software design point
of view, flattening is a new general operation on pdfs.
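One possible realization, sketched here under the assumption of a conjugate prior in the exponential family, is to raise the pdf to a power 0 < φ < 1, which scales the accumulated (fictitious) statistics; the Beta example and all names are hypothetical.

```python
# Sketch of the 'flattening' operation for a conjugate Beta prior, where
# raising the pdf to a power 0 < phi < 1 scales its accumulated
# statistics. The informativeness of Beta(alpha, beta), measured as the
# number of fictitious observations, is alpha + beta - 2.

def flatten_beta(alpha, beta, phi):
    """Return the Beta statistics of f(Theta)^phi, renormalized.

    Flattening widens the pdf (fewer effective fictitious data) while
    preserving the location of its mode.
    """
    return 1.0 + phi * (alpha - 1.0), 1.0 + phi * (beta - 1.0)

a, b = flatten_beta(8.0, 4.0, phi=0.5)
print(a, b)   # Beta(4.5, 2.5): effective counts halved, mode kept at 0.7
```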
5.3 Model selection
The notion of model structure has been established in Section 3.1, Remark 3.1. Under the Bayesian
paradigm, the model M can be treated as an unknown variable. Hence, the task of model selection
is in principle equivalent to that of Bayesian learning. However, the set of all possible models M∗
(3.8) is infinite-dimensional, and a practical construction of the prior distribution over it, as well as
of the posterior and the evaluation of its moments, is intractable. Therefore, we treat this special problem in
this Section.
We adopt the following assumptions: (i) the model is considered to be time-invariant for all data
records d(t), and (ii) the prior on M∗ is considered to be uniform. Then, the task of model selection
is equivalent to finding the maximum likelihood estimate M̂ ∈ M∗. The likelihood function
L(d(t), M) of M is equal to the distribution f(d(t)|M), viewed as a function of M. Thus, the construction of
the likelihood function is implied by (3.19):

L(d(t), M) = ∏_{t∈t∗} f(yt|ut, d(t−1), M). (5.4)

Hence, the estimation selects among the various models M, from M∗, the model with the highest
v-likelihood (5.4) (likelihood on model variants).
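A minimal sketch of selection by the v-likelihood (5.4): for each candidate model, the logarithms of the one-step-ahead predictive densities are accumulated over the data, and the model with the highest sum wins. The two candidates (a fixed fair coin versus a learnt Beta-Bernoulli model) and the data are invented for illustration.

```python
import math

# Sketch of model selection by the v-likelihood (5.4): for each candidate
# model M, the one-step-ahead predictive densities f(y_t | d(t-1), M)
# are accumulated in the log domain. Candidates and data are made up.

def vloglik_fair_coin(data):
    return len(data) * math.log(0.5)

def vloglik_learnt_bernoulli(data, alpha=1.0, beta=1.0):
    """Beta-Bernoulli: the predictive prob. of y_t is the posterior mean."""
    ll = 0.0
    for y in data:
        p = alpha / (alpha + beta)
        ll += math.log(p if y else 1.0 - p)
        alpha, beta = alpha + y, beta + (1 - y)
    return ll

data = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
candidates = {"fair": vloglik_fair_coin(data),
              "learnt": vloglik_learnt_bernoulli(data)}
best = max(candidates, key=candidates.get)
print(best)   # the learnt Bernoulli model attains the higher v-likelihood
```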
Remark 5.2 (Nesting in EF) For tractability reasons, only a finite (and typically quite small) number
of models can be tested using (5.4), since the evaluation of the likelihood requires performing the
learning procedure on the full set of data. An exception from this rule is the exponential family
(Section 4.2), where various model structures share parts of their sufficient statistics. This property
is known as nesting in EF [89].

From the software design point of view, model selection can be implemented as learning with a pdf
augmented by an extra discrete parameter (the label of M), followed by marginalization over the remaining
parameters.
5.4 Learning
In this step, it is assumed that the best model M̂ has already been selected. Naturally, one might
see this step as redundant, since learning of the model parameters for this model has been done in the model
selection procedure. However, these two steps play different roles in many applications. Namely,
model selection is typically performed on smaller data sets (for computational feasibility) and only
in the off-line phase of development. Learning must be performed in on-line scenarios for each incoming
data record.

From the software design point of view, model selection and learning should use the same algorithms.
However, since approximations are required for the identification of complex models, different
approximations may be needed for off-line and on-line learning. Then, the bias of the model selection
task towards off-line use, and of the learning task towards on-line use, will force us to use
different procedures.
5.5 Model validation
An extensive theory of model validation has been developed, see e.g. the review [90]. However,
the available procedures deal almost exclusively with independent data samples. Consequently, they
cannot be used for validation of dynamic models. Only a few exceptions are available [91], addressing
only special cases.
Model validation is an additional test of the quality of M̂. Recall, from Section 5.3, that M̂
was chosen under the assumption of time-invariance of the model. One task of model validation
is to verify this assumption. This task is addressed in the classical model validation theory [90]
by splitting all available data d(t) into (i) learning data dₗ, and (ii) validation data dᵥ. The
best model M̂ is learnt on the learning data dₗ and its performance is checked on the validation
data dᵥ. The validation technique essentially inspects how good the best dynamic model M̂ is at
extrapolating the past into the future. Thus, the learning data dₗ have to form the "prefix" part of d(t)
and the validation data dᵥ the "suffix" part.
The results of validation strongly depend on the choice of the cutting moment which splits the
available data into learning and validation parts. None of the existing methods [90] is directly
prepared for the considered dynamic models. These models allow only cutting into contiguous sequences.
Essentially, the available data up to a cutting moment τ are taken as learning data and the
rest as validation data. This reduces the number of possible choices of learning and validation data.
At the same time, it disqualifies the majority of the available analyses. This motivates us to design an
adequate, purely Bayesian, solution of the model validation problem.
5.5.1 Validation with fixed cutting moment
Let us consider a fixed cutting moment τ ∈ t∗ ∪ {0}, which defines

dₗ(τ) ≡ d(τ), (5.5)
dᵥ(t ∖ τ) ≡ (dτ−∂, . . . , dt), (5.6)

where ∂ is the largest delay of a data record in the auto-regression.
The task of model validation can be formulated as a test of the following hypotheses:

H0: All recorded data, d(t), are described by the learnt model M̂.

The v-likelihood of this hypothesis results from Bayesian filtering on all data, giving

f(d(t)|H0) ∝ L(d(t), M̂). (5.7)

H1: Learning data and validation data should be described by individual models.

The corresponding v-likelihood results from independent filtering on learning and validation
data, giving

f(d(t)|H1, τ) ∝ L(dₗ(τ), M̂|τ) L(dᵥ(t ∖ τ), M̂1|τ). (5.8)

Note that the proportionality factor is formed by the randomized DM strategy (3.11), which is common
to both hypotheses.
The model M̂1 used on the validation data may differ from M̂. The strength of the constructed test
depends significantly on the choice of the competing model M̂1. We make the following choice:
(i) M̂1 has the same structure as M̂, (ii) it is learnt on the validation data, (iii) the prior pdf in the validation
phase is chosen as a flattened version of the state estimate gained in the learning phase. The spread of the
flattened pdf should be comparable to that of the prior pdf used on the learning data.

This choice intuitively meets the requirement on a real competitor: learning is exploited without
fixing the results too much, and thus without restricting the possibility to fit the validation data in a better
way.
The principle of validation is graphically illustrated in Figure 5.3. Estimation on the whole data
d(t) yields a result in the class of time-invariant models. Estimation on the separate data sets yields a result
in the class of models switched at the cutting moment. The latter class is, of course, richer, but it has
a smaller portion of data per estimated variable at its disposal. Thus, the winner is not a priori determined.

With no prior prejudice, f(H0|τ) = f(H1|τ), the Bayes rule provides the posterior pdf f(H0|d(t), τ).
The learnt model can be accepted if the posterior pdf,

f(H0|d(t), τ) = ( 1 + L(dₗ(τ), M̂|τ) L(dᵥ(t ∖ τ), M̂1|τ) / L(d(t), M̂) )⁻¹, (5.9)

is high enough, i.e. close to 1. Otherwise, we have to search for the reason why the chosen model is
not reliable enough.
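Evaluation of (5.9) is best done in logarithms, since the v-likelihoods easily under- or overflow. A sketch with placeholder log-likelihood values:

```python
import math

# Numerically safe evaluation of (5.9): given log v-likelihoods of the
# single-model and switched-model hypotheses, compute f(H0 | d(t), tau).
# The numerical inputs below are placeholders, not results of the thesis.

def posterior_H0(loglik_whole, loglik_learn, loglik_valid):
    # ratio = L(d_l) L(d_v) / L(d(t)); work in logs to avoid overflow
    log_ratio = (loglik_learn + loglik_valid) - loglik_whole
    return 1.0 / (1.0 + math.exp(log_ratio))

p = posterior_H0(loglik_whole=-100.0, loglik_learn=-60.0, loglik_valid=-42.0)
print(round(p, 3))   # ratio e^{-2}: H0 clearly preferred, p = 0.881
```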
5.5.2 Validation with multiple cutting moments
Results of the previous test depend, often strongly, on the selected cutting moment τ. Thus, it makes
sense to validate learning for various cutting moments τ ∈ τ∗ ⊂ t∗. We make a pair of
decisions (H, τ) based on the available data d(t). We select τ ∈ τ∗ and accept (H = H0) or reject
(H = H1) the hypothesis H0 that the learnt model is valid.

Figure 5.3: Scheme of the proposed validation (a single time-invariant model versus models switched at the cutting moment). Ellipses denote classes of models; small circles denote alternative "positions" of the real system with respect to the model class. The crosses denote models of the systems estimated within each class. Dashed lines signify distances of the system to the best models. The hypothesis H0 is expected to win for System 1 and H1 for System 2.
We solve this static decision task and select the optimal decision Ĥ on the inspected hypotheses and
the optimal cutting moment τ̂ as a minimizer of the expected loss. We assume, for simplicity, that
the losses caused by a wrong acceptance and a wrong rejection are identical, say (without loss of generality)
1. The loss function is thus chosen as

Z(Ĥ, H, τ) = 1 − δ(Ĥ(τ) − H), Ĥ, H ∈ {H0, H1},

where δ(·) is the Kronecker delta. The optimal decisions Ĥ, τ̂ minimize the expected value E[·],
taken over the uncertain data d(t) and hypothesis H:

Ĥ, τ̂ ∈ Arg min_{Ĥ, τ∈τ∗} E[Z(Ĥ, H, τ)]. (5.10)
Proposition 5.1 (Optimal cutting) Let 0, t ∈ τ∗. Then, the optimal decision Ĥ about the inspected
hypotheses H0, H1 and the optimal cutting τ̂, which minimize the expected loss in (5.10),
are given by the following rule.

Compute τ̂0 ∈ Arg max_{τ∈τ∗} f(H0|d(t), τ) and τ̂1 ∈ Arg min_{τ∈τ∗} f(H0|d(t), τ). (5.11)

If f(H0|d(t), τ̂0) ≥ 1 − f(H0|d(t), τ̂1), then select Ĥ = H0, τ̂ = τ̂0;
else select Ĥ = H1, τ̂ = τ̂1.
Proof: Let us consider the set of cutting moments τ∗0 ≡ {τ ∈ τ∗ : f(H0|d(t), τ) ≥ 0.5}.
This finite set is non-empty, as for τ = 0, f(H0|d(t), τ) = 0.5. For a fixed τ ∈ τ∗0, the
decision Ĥ = H0 leads to a smaller loss than the decision Ĥ = H1. The achieved minimum
is the expectation over d(t) of 1 − f(H0|d(t), τ). Thus, it is smallest for τ̂0 maximizing
f(H0|d(t), τ) on τ∗0.

For any fixed τ in the set τ∗1 ≡ {τ ∈ τ∗ : f(H0|d(t), τ) ≤ 0.5}, the decision Ĥ = H1
leads to a smaller loss than the decision Ĥ = H0. The achieved minimum is the expectation
over d(t) of f(H0|d(t), τ). Thus, it is smallest for τ̂1 minimizing f(H0|d(t), τ) on τ∗1. The
smaller of the discussed pair of minima determines the optimal decision pair.
Practical applications of the above test strongly depend on the set τ∗ of the considered cutting
moments. The finest possible choice is τ∗ = t∗. The exhaustive search is too demanding for extensive
data sets. A search for the minimizer by a version of the golden-section rule, by a random choice, or by a
systematic inspection on a small predefined grid can be applied. The predefined grid seems to be the
simplest and still relevant variant, as minor changes in τ∗ make little physical sense.

A detailed elaboration of the technique for the exponential family (Section 4.2) and a simulation example
can be found in [92].
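The decision rule of Proposition 5.1 on a predefined grid can be sketched directly; the grid and the posterior values f(H0|d(t), τ) below are illustrative placeholders.

```python
# Sketch of the decision rule of Proposition 5.1 on a predefined grid of
# cutting moments. posterior(tau) stands for f(H0 | d(t), tau); both the
# grid and the posterior values are illustrative.

def validate(posterior, grid):
    tau0 = max(grid, key=posterior)          # (5.11), maximizer
    tau1 = min(grid, key=posterior)          # (5.11), minimizer
    if posterior(tau0) >= 1.0 - posterior(tau1):
        return "H0", tau0                    # accept the learnt model
    return "H1", tau1                        # reject: the model switches

table = {0: 0.5, 25: 0.9, 50: 0.7, 75: 0.6}  # f(H0|d(t), tau) on the grid
decision, tau = validate(table.get, list(table))
print(decision, tau)   # max 0.9 >= 1 - min 0.5 -> accept H0 at tau = 25
```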
5.5.3 Other techniques of model validation
From the Bayesian point of view, model validation can be seen as model selection, where the competing
models have been designed based on one selected model M̂. Other common model validation
techniques are based on the analysis of the modelling residues,

εt = E_{f(dt|d(t−1))} (dt) − dt. (5.12)

Further analysis can be done either by visual inspection, e.g. histogram based, or by additional
modelling.

From the software-design point of view, it is important to store the residues and use them as observations
for another model.
5.6 Elicitation of ideal pdfs
Since ideal pdfs are typically assigned by the user, their elicitation has many features in common with
the task of prior elicitation (Section 5.2). Specifically, elicitation of ideal distributions on data-independent
internal variables, i.e. ᴵf(Θt|d(t)) = ᴵf(Θt), is identical to prior elicitation.

A non-expert user is not able to formalize his knowledge in terms of pdfs and their statistics.
Typically, we can expect the user to formalize his requirements in terms of moments (mean and
variance) or ranges of given variables. This information must be translated into pdfs. This can be
achieved, for example, by means of projection (Section 4.3.3).
Two non-standard cases may arise:
data-dependent ideals i.e. ideals on the dynamic behaviour of the system. For example, the user may
wish to place restrictions on the differences of the observed data, yt − yt−1, in terms of upper
and lower bounds. In this case, it is typically sufficient to select the ideal distribution with
mean at yt−1 and adjust the variance to be compatible with the given bounds (e.g. using the 2σ rule
for a Gaussian distribution).
time-dependent ideals for following an a priori known trajectory, i.e. the ideal distribution is defined
on the whole DM horizon, ᴵf(d(t)). Hence, the whole trajectory must be stored.
Another possibility is that there exists an analytical formula for recursive computation of the
ideal pdf.
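As an illustration of the data-dependent case, the following sketch centres a Gaussian ideal at yt−1 (shifted by the midpoint of the user's bounds on yt − yt−1) and sets its variance by the 2σ rule; the function name and the numbers are hypothetical.

```python
# Sketch of eliciting a data-dependent ideal pdf: the user bounds the
# increment y_t - y_{t-1} by [lo, hi]; we centre a Gaussian ideal at
# y_{t-1} plus the midpoint and set sigma via the 2-sigma rule, so the
# bounds sit two standard deviations from the mean. Names illustrative.

def ideal_gaussian(y_prev, lo, hi):
    mean = y_prev + 0.5 * (lo + hi)
    sigma = (hi - lo) / 4.0      # hi - lo = 4 sigma  <=>  mean +/- 2 sigma
    return mean, sigma

mean, sigma = ideal_gaussian(y_prev=10.0, lo=-1.0, hi=1.0)
print(mean, sigma)   # ideal N(10.0, 0.5^2): increments in [-1, 1] w.p. ~0.95
```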
5.7 Design of DM strategy
By agreement, the fully probabilistic design (FPD), Proposition 3.2, is used for the design of the DM
strategy.
5.8 Design validation
The best validation of the designed DM strategy is its implementation in the real environment. However,
this can be too costly if the designed DM strategy is incorrect. Therefore, we seek a safer testing
mechanism. At present, the most common validation technique is validation by simulation, which
involves sampling from all involved pdfs. This is important from the software-design point of view.
6 Multiple Participant Decision Making
In this Chapter, we comment on the general theory of DM (Chapters 3–5) from the multiple-participant
point of view. As was mentioned in Section 1.1, the theory of MPDM is not fully developed
yet. The main distinction of the MP scenario from the classical single-participant DM is the ability
and need of participants to communicate and cooperate. Therefore, we distinguish three stages of
operation of each participant:

1. on-line (data-processing) stage, when the participant interacts with the environment, in the
same way as in the single-participant case,

2. communication stage, when the participant exchanges information with its neighbours,

3. negotiation stage, when the participant makes decisions on how to act and react with respect to
its neighbours.
6.1 On-line (data-processing) stage
This stage is equivalent to the on-line DM (Proposition 5.1) of the single-participant scenario. Here, we preserve the
traditional notion of on-line acting in the sense of processing of the latest observations from the
environment. These steps are adjusted as follows:
1. read: the observed data are read from the system (environment).

In the MP scenario, the information available at the current time contains not only the usual innovation
of the observed data, but also possible communication from the neighbours.

2. learn: the observed data are used to increase the knowledge about the system (environment).

In the MP scenario, it is necessary to absorb information from both (i) the observed data, and
(ii) possible communication from the neighbours. Note that merging of information from
the neighbours does not occur at each time step, and it may be a computationally expensive
operation. Therefore, we introduce a new step:

2a. merge: merges the current knowledge with the information obtained from the neighbours.

This operation may be called as a subroutine of the learn step, or as a separate background
job. The latter mechanism requires the development of a new mechanism for synchronization of these two
tasks.
3. adapt: the decision-maker uses the improved knowledge of the environment to improve its DM
strategy.

In the MP scenario, the ideal pdfs describing the aims of DM can be changed on-line by communicating
new ideal distributions. Therefore, it may be necessary to recompute the whole DM
strategy. We introduce a new step:

3a. design: re-evaluates the FPD on the whole horizon.

This operation may be called as a subroutine of the adapt step, or as a separate background job.

4. decide: the adapted DM strategy is used to choose an appropriate action.

In the MP scenario, the task of communication is also part of the decision-making problem. Therefore,
in this step, decisions on communication actions (such as request communication, negotiate, or
refuse communication) must also be made.

5. write: the chosen action is written into the system (environment).
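The adjusted on-line loop (steps 1–5 with the new sub-steps 2a and 3a) can be summarized structurally as follows; all operations are stubs that merely record the order of execution, standing in for the toolbox operations designed in Chapter 7.

```python
# Structural sketch of one pass of the on-line stage of an MP participant
# (steps 1-5, including the new sub-steps 2a merge and 3a design). All
# operations are placeholders that only record the order of execution.

def online_step(trace, inbox):
    trace.append("read")                      # 1. observed data from env
    trace.append("learn")                     # 2. absorb observed data
    for _msg in inbox:
        trace.append("merge")                 # 2a. absorb neighbour info
    trace.append("adapt")                     # 3. improve the DM strategy
    if any(msg == "new_ideal" for msg in inbox):
        trace.append("design")                # 3a. re-run FPD on horizon
    trace.append("decide")                    # 4. incl. communication acts
    trace.append("write")                     # 5. action back to env

trace = []
online_step(trace, inbox=["new_ideal"])
print(trace)
# ['read', 'learn', 'merge', 'adapt', 'design', 'decide', 'write']
```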
6.2 Communication
Interaction between two (or more) participants can be done only in terms common to all involved
participants. Note that we do not impose any particular internal model structure on each participant;
therefore, the participants may share only information defined on the observed data. This restriction
can be easily relaxed by defining commonly shared internal variables. Shared internal variables
may represent a real quantity with a physical meaning which is not directly observable.
Recall, from Section 3.2, that each participant stores its knowledge as the following pdfs: (i) the
factorized model (3.12), (ii) the factorized ideal (3.22), and (iii) the estimates (3.14). The participants
can thus interact via restrictions of the named objects to the data space:

DM strategy: both the optimized, f(ut|d(t−1)), and the ideal, ᴵf(ut|d(t−1)),

predictor: of observations, f(yt|ut, d(t−1)) (3.15), and the correspondingly formulated ideal,
ᴵf(yt|ut, d(t−1)). In many practical applications, the ideal will be defined in this form.

observed data: i.e. values of d(t). The individual values can be seen as a special case of pdfs,
namely the empirical density, f̆(d).

estimates: f(Θt|d(t)) and ideals ᴵf(Θt|d(t)) on internal variables, if these variables are common
to both interacting participants.
Communication is coordinated through a special channel of observed data.
6.3 Merging
Communication of two participants, P1 and P2, is meaningful only if it causes some modification of
the behaviour of at least one of them. This can be achieved in two ways: (i) modification of the model (3.12), or
(ii) modification of the aims (3.22) of either participant. The model (3.12) is factorized into (i) the observation
model, and (ii) the estimates. In principle, it is possible to consider modifications of the observation
model; however, no consistent theory is known to us, therefore we omit it in this text. Modification
of the estimates using the communicated data-related pdfs (i.e. predictors, or empirical pdfs)
can be formalized as indirect merging (Section 3.4.1), as follows:

f[1](Θt|d(t)), f[2](d(t)) −merge→ f[1](Θt|d(t)), (6.1)

where f[1](·) denotes a pdf belonging to participant P1. Technically, data records observed by P1
are different from those observed by P2, and we should reflect this fact in our notation. However,
the notation of the merged pdf would become increasingly complicated after interaction with many
neighbours, and without any practical benefit. Therefore, we formally condition all merged pdfs on
the general data d(t).

Modification of the DM strategy can be achieved by modification of the ideal distributions (3.22),
followed by FPD (Proposition 3.2). Merging of the ideal pdfs can be formalized as direct merging
(Section 3.4.2):

ᴵf[1](dt|·), ᴵf[2](dt|·) −merge→ ᴵf[1](dt|·). (6.2)

Analogously, direct merging is used for estimates of common internal variables:

f[1](Θt|·), f[2](Θt|·) −merge→ f[1](Θt|·).
Remark 6.1 Since the DM strategies f̂[1](ut|·), f̂[2](ut|·) are also pdfs, it is possible to merge
them using (6.2). However, the behaviour of the merged DM strategy is not optimal under the FPD and
may produce unpredictable results. Therefore, we prefer to merge the ideal distributions and estimates
and to perform FPD to obtain a DM strategy reflecting the knowledge and aims of both participants.
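A minimal sketch of direct merging (6.2) for discrete pdfs over a shared variable: a weighted geometric mean, with the weight α interpreted as the level of belief in the neighbour (Section 6.4). The numbers are illustrative.

```python
# Sketch of direct merging (6.2) for discrete pdfs over a shared internal
# variable: a weighted geometric mean with belief weight alpha in the
# neighbour, renormalized. The numbers are illustrative.

def direct_merge(p1, p2, alpha):
    """Merge participant P1's pdf p1 with neighbour P2's pdf p2."""
    merged = [a ** (1.0 - alpha) * b ** alpha for a, b in zip(p1, p2)]
    s = sum(merged)
    return [m / s for m in merged]

p = direct_merge([0.8, 0.2], [0.2, 0.8], alpha=0.5)
print(p)   # symmetric weights and opposite beliefs -> uniform [0.5, 0.5]
```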
6.4 Negotiation
Communication is an active process for both participants. In each step, a participant can select from
a range of communication-related actions: initiate communication, accept communication, reject
communication. When communication is accepted, the participant is expected to reply with a
counter-proposal: close communication, send a counter-proposal, or request further information. The selection
of an appropriate action is also part of the DM process, and as such, it is made according to the participant's
negotiation strategy.

Note that the participants influence each other via the merging operation (Section 6.3), the result
of which is determined by the weights αi in (6.1), where i is the unique identifier of the neighbour.
This weight can be interpreted as a level of belief in the neighbour. Formally, decision making on
communication is done by negotiating these weights.
We distinguish three basic negotiation strategies [27]:
Selfish: is a strategy where each participant freely chooses its own weights. It accepts all information
from its neighbour, but it refuses any attempt to change the weights α by another
participant via communication.

Hierarchical: is a strategy where each participant has fixed values of αi. If the neighbour is
superordinate, it can assign the value of α by communication.

Cooperative: is a strategy where the participants communicate the value of α (i.e. α2,[1] for the
participant P1, and 1 − α1,[2] for the neighbour P2). Their common aim is to reach agreement
on its value.
6.5 Design of MP decision-maker
In this Section, we review the basic steps of single-participant DM design (Agreement 5.2) for the
MP scenario. The main distinctions are:
Merging: note that merging was already part of the single-participant DM, namely in the tasks of
prior elicitation (Section 5.2) and ideal elicitation (Section 5.6). However, in both cases, this
operation was performed off-line, i.e. with few constraints on computational efficiency. In the
MP scenario, this operation is performed in real time, and thus it must be coordinated with the
update of knowledge by observations of the environment.
Negotiation: the basic strategies of negotiation were described in the previous Section. At present,
we assume that these strategies are designed as deterministic. Formally, it is possible to create
an explicit probabilistic model of the neighbouring participants, learn their behaviour, and
design an appropriate strategy of communication with them. This scenario has not been studied in
detail yet and remains an interesting topic for further work.
Problem description: the description structure of a single participant must be extended to contain
information about the neighbours, as follows:

1. a request for communication from a neighbouring participant will be observed as a data
record and thus will be part of yt. This must be reflected in the data description.

2. a response to communication, or the initiation of communication with a neighbour, is a decision
action, which must also be reflected in the data description.

3. the objects to be communicated are pdfs with possibly large statistics; therefore, sufficient
space must be allocated for these structures.
Model selection: the initial model selection will be performed independently for each participant,
using the same methods as in the single-participant case (Section 5.3). However, some form
of model selection must be performed in the case of explicit modelling of the neighbours. For
example, removal of models of inactive neighbours, or creating entries for newly recognized
neighbours.

Note that explicit modelling of neighbours may be extremely computationally intensive, since
the standard model selection (Section 5.3) is defined in terms of hypothesis testing. Testing
the large number of hypotheses associated with each neighbour is clearly an intractable problem, and
various simplifications and approximations must be found.
Learning: learning from the observed data will be done using the same techniques as in the single-participant
case (Section 5.4). Learning from the knowledge obtained by communication with other
participants will be done via merging (Section 3.4), which can, once again, be translated into
the basic learning operations (Section 3.4).

Note that learning of the explicit model of the neighbours is a challenging task even in the
simple case with unknown α. Therefore, we expect the following:

• learning of the model of the neighbours must be done with a different sampling period than
learning of the other parameters in Θ,

• no direct observation model is available. The effects of communication (and merging
with a given α) will not be immediately recognizable. Therefore, an approximate
observation model (similar to that in VB, Section 4.5.3) must be found.
7 Software Image
In this Chapter, we present the UML description of the software toolbox for distributed dynamic
Bayesian decision making.
The whole toolbox will be split into the following packages:
Math is an abstract package, which is used as a repository of basic data structures and mathematical
objects, such as matrices. All operations on matrices are supposed to be elements of this
package. However, we will not model these operations, since they are expected to be already
available within the implementation environment, i.e. Matlab or ANSI C.

Prob is the first package with defined classes. In this package, we model the basic objects of
probabilistic calculus, i.e. random variables, functions, pdfs, and elementary operations on
them. These objects form the smallest building blocks of the DM task.

This package designs the software image of the general theory of DM (Chapter 3).

FProb is the package where we specialize the general classes from package Prob into the classes
of feasible DM. Classes used in exact DM (e.g. estimation in the exponential family, Kalman
filtering) are defined here, as well as classes for approximate evaluation (the Variational Bayes
approach).

This package designs the software image of feasible DM (Chapter 4).

SingleDM is the package which defines the building blocks that are necessary to connect the probabilistic
core (defined in package Prob) with the real world. Specifically, it defines data filters,
description structures, communication objects, etc.

This package designs the software image of the practical aspects of DM (Chapter 5).

MultiDM is the package which extends the classes for single-participant DM (package SingleDM)
to the considered multiple-participant DM.

This package designs the software image of the MP DM theory (Chapter 6).

This initial decomposition intentionally respects the decomposition of the theory. Packages are built
on top of each other; however, their dependence is "one-directional" in the sense that classes from
SingleDM need classes from Prob, but not the other way around. Therefore, other packages using
probabilistic calculus may be built on top of Prob.

In order to distinguish software and theoretical objects, all names of software-related objects are
printed in bold typeface.
7.1 Package Math
This package defines basic data structures in the form of datatypes:

mxArray representing a two-dimensional array of real numbers,

Cell representing a list of pointers.

This nomenclature comes from Matlab, which is our primary implementation environment; however, it
can be easily re-implemented in any other language. By combining these two types we obtain the
datatype mxACell, i.e. a Cell of mxArrays.

In the further text, we will often use lists (i.e. Cells) of different classes. By convention, the names of
these new datatypes will start with the name of the class followed by 'Cell', e.g. RVCell being a list
of objects of class RV.
7.2 Package Prob
The basic objects involved in probabilistic calculus (Chapter 3) are: (i) random variables, (ii) functions
of variables, (iii) observed data, and (iv) pdfs. Here, we design a software representation for
each of them.
7.2.1 Random variables
In the abstract theory (Chapters 3 and 4), a random variable can be used in two different flavours:

multivariate random variable: Θ, or d, of fixed, a priori known dimensionality. This
form is used if the variable, e.g. d, is observed via a realization, dt. Also, all numerical
expectations are expected to be in this form, i.e. estimates of Θ are real-valued matrices of the
corresponding dimensionality.

set of sub-parameters: Θ = {α, β, γ}, where α, β, γ may have a priori unknown (or irrelevant)
and mutually different dimensionality or nature (i.e. continuous/discrete), and may be recursively
separable, e.g. α = {α1, α2, . . . , αn}. This form is used in the definition of the structure of
models, such as conditional independence (3.1).
Note that the main reason to represent the random variable in software is to distinguish the arguments
of various pdfs. In many existing packages, the structure of the decomposition of the model (3.12)
is fixed; hence, the pdfs are uniquely identified by their position in the structure. This is typical for
models for which the probabilistic operations of DM were transformed into algebraic operations, for
example, state-space models (Section 4.1), or the exponential family (Section 4.2).

The problem of unique identification of pdfs arises when approximate methods must be used.
Consider, for example, the operation (3.27), which evaluates the following terms:

γ(d(t)) = E_{f(Θ|d(t−1))} (g(Θ, d(t))),

where g(Θ, d(t)) is a general function of its arguments. If the pdf f(Θ|d(t−1)) is defined as a
chain rule of pdfs of various types, then a unique distinction between random variables is vital.

The proposed classes for random variables are displayed in UML notation in Figure 7.1.

Figure 7.1: UML class diagram of random variables.
7.2.1.1 Datatype: rv_id
This is an abstract datatype used as a wild-card for any reasonable unique identifier (e.g. an ordinal number,
a string, or both). The detailed implementation of this type will be decided later. The choice will be
made after tests of the performance of low-level functions for comparing and sorting random variables.
7.2.1.2 Class RV
Attributes:
ID:rv_id is used as a unique identifier of the variable.

final:bool is a switch between the above-mentioned roles of RV. Its value will be assigned by the
descendants of this class. If true, then the random variable has no inner structure (see class
RVfinal); otherwise, it is composed of a list of sub-variables (see class RVlist).

This class does not define any operations.

This class is abstract. It will be used as a wild-card for the definition of random variables in functions
and pdfs. It will always be implemented via its descendants.
7.2.1.3 Class RVfinal (RV)
This is a descendant of class RV.
Attributes:
size:mxArray defines the size of the random variable. It is a vector of integer values used in the Matlab
convention, i.e. if the array contains just one number, then RV represents a vector of that length;
if it contains two numbers, RV represents a matrix with the given number of rows and columns.

Operations:

new is a constructor. It copies its argument into the attribute size and sets the value of final to true.
Figure 7.2:UML class diagram of functions of random variables.
7.2.1.4 Class RVlist (RV)
This is a descendant of class RV.
Attributes:
RVs:RVCell defines the list of sub-variables of RV type, i.e. its elements can be either of the
RVfinal or RVlist type.

Operations:

new is a constructor. It copies its argument into the attribute RVs and sets final to false.
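The composite structure of RV, RVfinal and RVlist can be transcribed, for illustration only, into Python (the toolbox itself targets Matlab and ANSI C; the composite pattern, not the language, is the point).

```python
# Illustrative Python transcription of the RV class hierarchy. RVfinal
# carries a concrete size vector; RVlist composes further RV objects.

class RV:
    """Abstract random variable, identified by a unique ID."""
    def __init__(self, rv_id, final):
        self.ID, self.final = rv_id, final

class RVfinal(RV):
    def __init__(self, rv_id, size):
        super().__init__(rv_id, final=True)
        self.size = size                      # Matlab-style size vector

class RVlist(RV):
    def __init__(self, rv_id, rvs):
        super().__init__(rv_id, final=False)
        self.RVs = list(rvs)                  # sub-variables: RVfinal/RVlist

# Theta = {alpha, beta}, with alpha a 2-vector and beta a 2x3 matrix
theta = RVlist("Theta", [RVfinal("alpha", [2]), RVfinal("beta", [2, 3])])
print(theta.final, [rv.ID for rv in theta.RVs])   # False ['alpha', 'beta']
```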
7.2.2 Functions on random variables
In the theory (Chapters 3 and 4), functions of random variables are denoted by a letter followed by
its variables in round brackets, e.g. g(Θ) (3.4), T(α) (3.5) or ω(ut, d(t−1)) (3.25).

Note that these objects are (almost exclusively) needed as input or output of the expectation operator
(3.4) in FPD (3.27). Evaluation of integrals of arbitrary functions is beyond the scope of this
report; however, it may be important for future research. Therefore, we define an abstract function
and its basic descendants here. The UML scheme is displayed in Figure 7.2.
7.2.2.1 Class function
Is a class implementing the transformation of random variables:
g(Θ) = Θ,
where g(Θ) denotes the dependent variable and Θ stands for the independent variable.
Attributes:
rv:RVfinCell is the list of random variables on which the function is defined.
dimen:mxArray defines the size of the function output.
Operations:
evalall:mxArray with argument values:mxACell, returns the value of the function for the parameter values given by the argument. Naturally, the dimensions of values should correspond to those of rv, and the dimension of the returned value should be dimen.
evalsome:function with arguments which:RVfinCell and values:mxACell, replaces the random variables in which by the values in values. The result of this operation is a new function defined on the complement of which in rv. Again, the dimensions of the arguments which and values should match.
Jacobian:double with argument values:mxACell, evaluates the Jacobian operation (3.5) of the function for the values of rv given by the argument values. In this trivial case, the return value is always equal to 1.
add with argument Fn:function, additively extends the current form of the function by the terms of Fn. This operation is abstract in this class, since the addition of extra terms creates a different functional form.
exp:function returns a new instance of the class function representing the exponential of the current function, i.e. exp(g(Θ)).
In this basic class, all of the above mentioned operations are trivial. The main purpose of this class is to serve as structural information that will be used later (e.g. for evaluation of moments of pdfs). Naturally, operations on descendants of this class will be much more complex.
7.2.2.2 Class ConstFn
Is a class implementing the transformation:
g(Θ1, ..., Θn) = (a1, ..., an), ∀Θ.
Attributes:
As:mxACell is the list of constant values of the parameters a1, ..., an.
Operations:
All operations of class function, i.e. evalall, evalsome and Jacobian, must be re-implemented.
7.2.2.3 Class LinearFn
Is a class implementing the transformation:
g(Θ1, ..., Θn) = a1Θ1 + ... + anΘn, 1 ≤ n < ∞,
where a1, ..., an are fixed values of appropriate dimensions (i.e. compatible with Θ1, ..., Θn).
Attributes:
As:mxACell is the list of coefficients a1, ..., an, where ai is the coefficient of Θi.
Operations:
All operations of class function, i.e. evalall, evalsome and Jacobian, must be re-implemented.
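The contract of the LinearFn class can be sketched as follows. This is a minimal illustration, not the thesis implementation (which is in Matlab/ANSI C); the treatment of the constant offset in evalsome is simplified, as noted in the comments.

```python
class LinearFn:
    """Sketch of g(Th1,...,Thn) = a1*Th1 + ... + an*Thn."""
    def __init__(self, rv, As):
        self.rv = list(rv)    # names of the independent variables
        self.As = list(As)    # coefficients a1, ..., an

    def evalall(self, values):
        # values must correspond element-wise to rv
        assert len(values) == len(self.rv)
        return sum(a * v for a, v in zip(self.As, values))

    def evalsome(self, which, values):
        # Substitute a subset of variables; the result is a new LinearFn
        # on the complement of `which` in rv. The constant term produced
        # by the substitution is dropped here for brevity; a full version
        # would carry it as an offset.
        keep = [(r, a) for r, a in zip(self.rv, self.As) if r not in which]
        return LinearFn([r for r, _ in keep], [a for _, a in keep])

    def Jacobian(self, values):
        # trivial case, as in the base class `function`
        return 1.0
```

For example, g = LinearFn(['x', 'y'], [2, 3]) evaluates g.evalall([1, 1]) to 5, and g.evalsome(['y'], [4]) returns a new function defined on 'x' only.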
7.2.2.4 Other classes
For various purposes, other trivial functions, such as ln(Θ) or sin(Θ), may be required. These can easily be derived from the general class function when needed.
7.2.3 Observed data
In the theory (Chapters 3 and 4), we used the notation dt for both (i) random variables and (ii) their realizations (i.e. observed data), since the probabilistic calculus is the same for both objects. However, for the software representation, it is essential to distinguish these two cases. Here, we define the software image of the observed data.
7.2.3.1 Class DataSource
Is a class representing the data observed at discrete time steps.
Attributes:
Dt:mxArray is the software image of the history of observed data d(t). However, for feasible recursive DM, only the most recent data are required. Hence, this attribute contains only the last ∂ observation records, d(t−∂ ... t), where the scalar ∂ denotes the largest delay of a data record in the auto-regression.
Operations:
step is the operation representing a time-shift of the data observation. This operation replaces Dt by the new data records observed at the next step.
write with argument Ut:mxArray writes into the data source the values Ut of decisions ut+1 that were chosen at the current step.
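The bookkeeping of the last ∂ records can be sketched with a bounded queue. This is a hypothetical Python illustration (the thesis implementation targets Matlab/ANSI C), and the way write attaches decisions to the latest record is an assumption made for the example.

```python
from collections import deque

class DataSource:
    """Keeps only the last max_delay+1 observation records d(t-d ... t)."""
    def __init__(self, max_delay):
        # deque with maxlen drops the oldest record automatically
        self.Dt = deque(maxlen=max_delay + 1)

    def step(self, new_record):
        # time-shift: append the record observed at the next step
        self.Dt.append(new_record)

    def write(self, Ut):
        # store the decisions u_{t+1} chosen at the current step;
        # here they are simply appended to the most recent record
        self.Dt[-1] = self.Dt[-1] + tuple(Ut)
```

With max_delay = 2, four calls to step leave exactly the three most recent records in Dt, which is what makes the recursion feasible for long histories.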
7.2.4 Probability density functions (pdfs)
The basic properties of pdfs were defined in Section 3.1.2, as well as the basic operations on them. The use of pdfs within the DM framework is further elaborated by (Agreement 3.1). We define two basic roles of pdfs:
model: which represents mutual dependence of random variables, such as the data observation model (3.10), the internal model (3.9), and the predictor (3.15).
estimates: which represents posterior pdfs, i.e. distributions of the variable conditioned only on the observed data.
Note that for analytically tractable models, i.e. linear state-space models and the exponential family, the recursion of probabilistic operations can be transformed into algebraic operations; namely, (4.3)–(4.4) and (4.23) for learning in state-space and exponential-family models, respectively. For FPD, the probabilistic recursion (3.27)–(3.28) was transformed into a recursion on the kernel of the quadratic Bellman function (4.14)–(4.15).
Therefore, in many software packages (e.g. Mixtools), the models are not independent structures, but only sub-structures of the estimates. Moreover, this notion of joint representation is even more emphasized in learning algorithms for models with the conditional-independence assumption (such as Bayesian networks, or on-line VB, Section 4.5.3), since these algorithms assign an approximate partial observation model (similar to (4.53)) to each posterior estimate.
However, for the purpose of this text, we propose to model these entities as independent objects. This proposal is motivated by the structure of FPD (Proposition 3.2), namely operation (3.27), which implies evaluation of the KL divergence on individual observation models.
Therefore, we design the following basic classes: (i) mPdf, representing models, and (ii) ePdf, representing estimates. The UML class diagram of these is displayed in Figure 7.3.
7.2.4.1 Class mPdf
Is a class representing models; in this basic version, it represents only the internal model f(Θt|Θt−1) (3.9).
Attributes:
rv:RV is the variable on which the pdf is defined, i.e. Θt in (3.9),
rvc:RV is the variable in the condition, i.e. Θt−1 in (3.9).
Note that both variables are instances of the general class RV; hence, mPdf can be defined on composed as well as final random variables.
Operations:
expectation:function with argument Fn:function, implements the expectation operation (3.4). Fn stands for g(α) in (3.4). This operation is needed in FPD, operation (3.27).
Figure 7.3: UML class diagram of the basic Pdf classes.
divergence:function with arguments pdf:mPdf and type:int, implements a divergence given by the argument type. Most often, the KL divergence (3.7) will be used; therefore it is the default operation for type=1. This operation is needed in FPD, operation (3.27).
new is the constructor.
7.2.4.2 Class ePdf
Is a class representing the estimates, i.e. f(Θt|d(t)) (3.14). This function is important, since the task of learning is defined on this object.
Attributes:
rv:RV is the random variable on which the pdf is defined, i.e. Θt in (3.14).
Once again, an instance of the general class RV is used in order to allow representation of composed as well as final pdfs. For better intuition, in comparison with neural networks, ePdf is an abstract general class that models both the network and its nodes. The exact meaning will be refined by specializations of this class.
Operations:
update with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations.
Statistics of the resulting pdf replace the original statistics of this object. Therefore, if the exact update operation yields a pdf of a different type than this object, the resulting pdf must be projected back onto the original family.
update_new:ePdf with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations. In contrast to the update operation, this operation returns the updated pdf as a new object. Therefore, it can be of a different type than the original.
expectation:function with argument Fn:function, implements the expectation operation (3.4). Fn stands for g(α) in (3.4).
dmerge with arguments Epdf:ePdf and alpha:double, implements the direct merging operation (Section 3.4). We do not impose any form of merging, i.e. it is not important which form of the KL divergence, (3.33) or (3.35), is used. This choice will be made in specializations of this class.
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, implements the indirect merging operation (Section 3.4). In contrast to direct merging, this operation requires knowledge of the observation model OM.
Note that the merging operation was defined only for independent observations. Therefore, this definition of the operation is preliminary and may be changed when significant progress in the merging theory is achieved.
project_new:ePdf with argument family:int, implements the projection operation (4.27). The argument family denotes the family onto which the pdf should be projected. This function may be used in the task of ideal elicitation (Section 5.6).
predictor:pPdf with arguments OM:mPdf and SM:mPdf, implements the predictor operation (3.15). Here, OM stands for the observation model; however, no observed data will be used in this operation. This is expressed by the fact that OM is an instance of mPdf, and not oPdf. SM is the internal model (3.9).
log_pred:double with argument OM:oPdf, implements the prediction operation (3.15). In contrast to the predictor operation, log_pred treats dt as observations; hence, a numerical value is returned. For numerical reasons, the returned value is the logarithm of (3.15).
flatten with argument factor:double, implements the flattening operation needed for prior elicitation (Section 5.2, Remark 5.1).
The primary role of this class is to act as a generalization of both (i) pdfs composed from other pdfs by the chain rule (graphs), and (ii) final pdfs (nodes). It is defined as abstract; hence it will not be used directly but only via its descendants. Loosely speaking, the purpose of this class is to remind us what operations must be defined on an estimate of any kind.
7.2.4.3 Class oPdf (mPdf)
Is a class representing models, specifically the observation model f(dt|Θt, d(t−1), ut) (3.10). It extends the class mPdf by linking the random variable rv to the observed data.
Attributes:
DS:DataSource is an instance of the class DataSource.
ind:mxArray indicates the position of realizations of rv in the data vector DS.Dt.
Operations:
new the constructor must be re-implemented to reflect the presence of the new attributes.
getdata:mxArray is an operation which returns the observed value of the variable rv from the data source DS.
7.2.4.4 Class pPdf (mPdf)
Is a class representing the predictor, f(dt|d(t−1), ut) (3.15). It is defined as an extension of the class mPdf, with a recursive update similar to the one defined for the class ePdf. The key difference from ePdf is that here, the data dt are treated as random variables, not observations.
Operations:
update with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations.
rupdate:pPdf with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations in reverse timing. This operation is required by the FPD algorithm (Proposition 3.2).
Here, we note that FPD with unknown internals is a relatively new result and a detailed algorithmic solution has not yet been elaborated. Therefore, this operation may not be easily implementable and another updating mechanism may have to be found.
However, preliminary considerations (Remark 4.1) suggest that it should be possible to create a sequence of pdfs for each step on the DM horizon. In such a case, the statistics of all predictors would be generated by the operation update and stored as attributes of the class. Then, the rupdate operation would remove the latest statistics and replace them by the previous ones. This behaviour is, however, not feasible for long DM horizons. Further research in this area is needed to achieve a feasible FPD design in the sense of Requirement 4.1.
7.2.4.5 Class ePdfFinal (ePdf)
Is a class representing estimates which are defined on final random variables RVfinal. These estimates correspond to the nodes in graphical models (2.2.2). It is defined as a specialization of class ePdf.
Attributes:
rv:RVfinal the attribute rv is redefined as an instance of the class RVfinal.
Operations:
In contrast to the general ePdf class, ePdfFinal has numerical values of its statistics; hence, it offers extra operations on them.
expect:mxArray with argument Fn:function, implements the expectation operation (3.4). In contrast to the operation expectation, it does not return the functional form of the expectation but a numerical value. This is, however, possible only if the Fn argument is defined only on the variable rv; otherwise, this operation causes an error.
replace_stats with argument nstats:mxACell, is an auxiliary operation that will be used to replace the attribute stats by the argument nstats. This will be needed in some approximate learning algorithms, such as VEM (Algorithm 4.2).
sample with argument n:int, implements sampling from the distribution, which is needed for the task of design validation (Section 5.8).
Figure 7.4: UML class diagram of pdfs used for DM with the linear state-space model.
7.2.4.6 Class eEmp (ePdfFinal)
Is a class representing the empirical density, f̂(d) (3.38).
Attribute:
data:mxArray represents the observed data record d(t).
The types of pdfs introduced in this Section formalize the interface of the basic building blocks for pdfs. All pdfs derived from these classes should obey this structure. Therefore, all algorithms designed for these classes should work, without any modification, for all future descendants (specializations) of these classes.
7.3 Package FProb
In this package, the general pdfs defined in the package Prob are specialized to yield pdfs used in feasible DM (Chapter 4).
7.3.1 Linear state-space models
In this Section, the general models from the Prob package are specialized for the linear state-space models (Section 4.1). The definitions of random variables and data sources from Sections 7.2.1 and 7.2.3, respectively, do not have to be re-defined. However, new class specializations are needed for (i) functions, and (ii) all types of pdfs. The UML class diagram of the involved objects is displayed in Figure 7.4.
7.3.1.1 Class QuadraticFn (LinearFn)
The Bellman function for linear Gaussian FPD (4.9) has the form of a quadratic function. It is implemented as an extension of the LinearFn class.
The class implements the transformation
g(Θ1, ..., Θn) = a1Θ1 + ... + anΘn + Θ1b1Θ1′ + ... + ΘnbnΘn′, 1 ≤ n < ∞,
where a1, ..., an and b1, ..., bn are fixed values of appropriate dimensions (i.e. compatible with Θ1, ..., Θn).
Attributes:
Bs:mxACell is the list of coefficients b1, ..., bn.
Operations:
add with argument Fn:function, accepts Fn in the form of QuadraticFn or LinearFn. If Fn is defined on the same variables, the corresponding As and Bs are summed. If Fn is defined on different variables, the lists rv, As and Bs are extended by the elements from Fn.
exp returns the exponential of g(Θ). It creates a new instance of expQuadFn with the same values of As and Bs.
All operations of class LinearFn, i.e. evalall, evalsome and Jacobian, must be re-implemented.
7.3.1.2 Class expQuadFn (QuadraticFn)
Is a class representing the exponential of the quadratic function QuadraticFn.
Operations:
add is not defined,
exp is not defined.
All operations of class function, i.e. evalall, evalsome and Jacobian, must be re-implemented.
The purpose of this class is merely the storage of the attributes As and Bs in an appropriate (i.e. exponential) form.
7.3.1.3 Class mNorm (mPdf)
This class is a specialization of the mPdf class for the linear Gaussian internal model (4.1).
Attributes:
rv:RV is of the RVfinal type, representing the variable Θt,
rvc:RV is of the RVlist type, containing two RVfinal instances for the variables Θt−1 and ut.
A:mxArray is the matrix A in (4.1),
B:mxArray is the matrix B in (4.1),
R:mxArray is the matrix R in (4.1).
Operations:
expectation:function with argument Fn:function, implements the expectation operation (3.4). The operation should work with:
LinearFn g(Θt) = aΘt, for which it returns EΘt(aΘt) = a(AΘt−1 + But), and
QuadraticFn g(Θt) = Θt′ZΘt, for which it returns EΘt(Θt′ZΘt) = tr(ZR) + (AΘt−1 + But)′Z(AΘt−1 + But), and
QuadraticFn g(Θt) = ΘtZΘt′, for which it returns EΘt(ΘtZΘt′) = ZR + (AΘt−1 + But)Z(AΘt−1 + But)′.
Linear terms in QuadraticFn functions are handled in the same way as in LinearFn. This operation is needed in FPD, operation (3.27).
divergence:function with arguments pdf:mPdf and type:int, implements a divergence given by the argument type. At present, the operation implements the KL divergence (3.7) to another mNorm class according to formula (4.7).
7.3.1.4 Class oNorm (oPdf,mNorm)
This class is a joint specialization of the oPdf and mNorm classes for the linear Gaussian observation model (4.2). Since all attributes and operations of oPdf and mNorm complement each other, there is no need to redefine them. However, the semantic correspondence with the theory is broken: one must remember that the attributes A, B and R represent C, D and Q in (4.2). Also, the constructor new must be redefined as a merge of mNorm:new and oPdf:new.
7.3.1.5 Class pNorm (pPdf)
This class is a specialization of the pPdf class for a Gaussian pdf (4.5).
Attributes:
rv:RV is of the RVfinal type, representing the variable dt,
rvc:RV is of the RVlist type, representing all needed variables dt−i, ut−i, i = 1, ..., t, where t is the length of the DM horizon.
As:mxACell is the list of linear coefficients for the mean value µt, which is, following (4.3)–(4.5), defined as a linear combination of the previous observations dt−i, ut−i, i = 1, ..., t, listed in the attribute rvc.
Sig:mxArray is the covariance matrix Σt in (4.5), which is, following (4.3)–(4.5), independent of previous observations (attribute rvc), and thus can be represented by a numerical value.
Operations:
update accepts arguments OM:oNorm and SM:mNorm, and jointly implements the time-update (4.3) and data-update (4.4) operations in the same way as eNorm.update does, with the exception that µt is not a numerical value but a functional form.
rupdate accepts arguments OM:oNorm and SM:mNorm, and jointly implements the time-update (4.3) and data-update (4.4) operations in reverse order.
expectation:function with argument Fn:function, implements the expectation operation (3.4) in the same way as eNorm.expectation does, with the exception that µt is not a numerical value but a functional form.
7.3.1.6 Class eNorm (ePdfFinal)
This class is a specialization of the ePdfFinal class for a Gaussian pdf (4.4).
Attributes:
rv:RV is of the RVfinal type, representing the variable Θt,
mu:mxArray is the mean value µt in (4.4),
Sig:mxArray is the covariance matrix Σt in (4.4).
Operations:
update accepts arguments OM:oNorm and SM:mNorm, and jointly implements the time-update (4.3) and data-update (4.4) operations.
update_new:ePdf creates a new instance of eNorm, copies its statistics into it and calls update on the new instance.
expectation:function with argument Fn:function, implements the expectation operation (3.4). The operation should accept Fn of the following types:
LinearFn g(Θt) = aΘt, for which it returns EΘt(aΘt) = aµt, and
QuadraticFn g(Θt) = Θt′ZΘt, for which it returns EΘt(Θt′ZΘt) = tr(ZΣt) + µt′Zµt, and
QuadraticFn g(Θt) = ΘtZΘt′, for which it returns EΘt(ΘtZΘt′) = ZΣt + µtZµt′.
dmerge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section 3.4). It accepts an Epdf argument of the eNorm type defined on the same variable, i.e. on rv; then, the operation (3.36) is implemented as follows:
Σt = ( α Σt^{-1} + (1 − α) Σ[2]^{-1} )^{-1},
µt = Σt ( α Σt^{-1} µt + (1 − α) Σ[2]^{-1} µ[2] ),
where Σ[2] and µ[2] stand for the Sig and mu attributes of Epdf.
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, accepts Epdf in the form eEmp. For eEmp, it calls the update operation with OM, for which the data source DS was replaced by eEmp:data.
prediction:pNorm with argument OM:mNorm, implements the prediction operation (4.5). It creates a new instance of pNorm.
log_pred:double with argument OM:oNorm, implements the prediction operation (4.5) for the current observed data from the observation model OM.
expect:mxArray with argument Fn:function, implements the same expectations as the operation expectation; however, it returns numerical values of the moments rather than functions.
flatten with argument factor:double, implements the flattening operation needed for prior elicitation (Section 5.2, Remark 5.1). For the Gaussian distribution, this operation is just a scalar multiplication of the covariance matrix Σ.
7.3.2 Exponential family models
In this Section, the general models from the Prob package are specialized for the exponential family models (Section 4.2) and approximate Bayesian filtering via forgetting (4.4). The definitions of random variables and data sources from Sections 7.2.1 and 7.2.3, respectively, do not have to be re-defined. Technically, all operations can be defined in general for the sufficient statistics V and ν. However, for computational reasons, algorithms for the most prominent members of this family, i.e. linear Gaussian models (4.20) and Markov models (4.21), are implemented separately.
The necessary Bellman functions for FPD for a Gaussian pdf were already defined in Section 7.3.1. Another important member of the family is the Dirichlet distribution, which is used for modelling of discrete Markov processes. In general, treatment of this distribution is simpler than that of the Gaussian distribution; details can be found in [28]. For the purpose of this text, we define only the basic classes for the pdf and the Bellman function.
The UML class diagram of the involved objects is displayed in Figure 7.5.
Figure 7.5: UML class diagram of pdfs used for DM with the exponential family model.
7.3.2.1 Class MultiIndexFn (function)
Is a class representing the multi-array Θ in (4.21).
Attributes:
rv:RVfinCell is a list of random variables, all of whose elements are discrete random variables.
Array:mxArray is a structure of numerical values. The size of this array is determined by the number of possible states of the random variables rv. Indexing of the array for realizations of the random variables is an internal property of the class.
Operations:
setElement with arguments rvind:mxACell and value:double, is an auxiliary function for assigning values to Array elements. This function is required, since indexing of the Array is implemented internally. This operation is, therefore, the only option for assigning a value to an element of Array at the position given by rvind.
Also, the operations evalsome and evalall must be re-implemented to comply with the indexing mechanism.
7.3.2.2 Class eEF (ePdfFinal)
Is a class representing the estimate within the EF (4.22).
Technically, it is possible to define the statistics V and ν, and the operations on them, here. However, these operations are later redefined for each special class, namely the Gauss-Wishart and Dirichlet distributions. Therefore, we leave this class as abstract.
7.3.2.3 Class eGW_LD (eEF)
Is a class representing the Gauss-Wishart posterior density for linear autoregressive models (4.20). It is computationally advantageous to implement all the required operations on LD decompositions [93] of the sufficient statistics V [94].
Attributes:
LD:mxArray represents the LD decomposition of the sufficient statistics V.
dfm:double represents the sufficient statistic ν, which is also known as degrees of freedom (dfm).
Operations:
update accepts arguments OM:oEF and SM:mDelta (or SM:mFrgEF), and implements the data-update operation (4.23) for mDelta, or the data-update (4.44) for mFrgEF internal models.
expectation:function with argument Fn:function, implements the expectation operation (3.4). The operation should work with the LinearFn and QuadraticFn types of function.
dmerge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section 3.4). It accepts an Epdf argument of the eGW_LD type defined on the same variable, i.e. on rv; then, the operation is implemented using (4.26).
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, accepts Epdf in the form eEmp. Then, it calls the update operation with OM, for which the data source DS was replaced by eEmp:data.
prediction:pNorm with argument OM:oEF, implements the prediction operation (4.24). Analytically, (4.24) is a Student pdf; however, it can be well approximated by a Gaussian if ν > 10 [28]. Since not all operations required on the predictor (pPdf in 7.3) are available for the Student pdf, this operation projects the Student pdf onto a Gaussian pdf and returns a predictor of the pNorm type.
log_pred:double with argument OM:oEF, implements the prediction operation (4.24) for the current observed data from the observation model OM. Since evaluation of the Student pdf is numerically feasible, this operation returns the exact value of the prediction (4.24).
expect:mxArray with argument Fn:function, implements the same expectations as the operation expectation; however, it returns numerical values of the moments rather than functions.
flatten with argument factor:double, implements the flattening operation needed for prior elicitation (Section 5.2, Remark 5.1). For the exponential family, this operation reduces to multiplication of the sufficient statistics V and ν by a constant.
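The bookkeeping behind the update and flatten operations above is simple on the level of sufficient statistics: the conjugate data-update accumulates the outer product of the regressor into V and increments ν, while flattening scales both. A scalar "regressor" is used in this pure-Python sketch for brevity; it is an assumption-laden illustration, not the thesis code (which operates on LD decompositions in Matlab/ANSI C).

```python
def ef_update(V, nu, psi):
    # conjugate data-update, cf. (4.23): V <- V + psi*psi', nu <- nu + 1
    return V + psi * psi, nu + 1.0

def ef_flatten(V, nu, factor):
    # flattening for prior elicitation: scale both sufficient statistics
    return factor * V, factor * nu
```

For instance, starting from V = 1, ν = 2, observing the scalar regressor ψ = 3 yields V = 10, ν = 3; flattening with factor 0.5 then halves both statistics, making the prior less concentrated while preserving its location.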
7.3.2.4 Class mDelta (mPdf)
Is a class representing the stationary internal model (3.17). This model is trivial; hence no other attributes or operations are required. The purpose of this class is to act as a switch of operational modes for update operations, where the internal model is a mandatory parameter.
7.3.2.5 Class mFrgEF (mPdf)
Is a class representing the forgetting operator (4.43). This class is common to both special types of pdfs, i.e. linear Gaussian and Markov models. The exact meaning is determined by the type of the attribute rv.
Attributes:
AltPdf:eEF is an instance of the eEF class representing the alternative distribution in (4.43).
EP:eEF is an instance of the eEF class representing the posterior distribution at time t−1 in (4.43).
frg:double is the forgetting factor φt in (4.43).
Operations:
At present, the default mPdf operations (i.e. expectation and divergence) on the forgetting operator are not defined. Formally, at least approximate versions of these operations should be available. However, their derivation is beyond the scope of this report. This task is left open for further research.
7.3.2.6 Class oEF (oPdf)
Is a class representing both (i) the general autoregressive model (4.20), and (ii) the general Markov model (4.21). In contrast to the Normal observation model, delayed observations dt−i are present in the model, within the variable Ψt. Therefore, the observation model must be extended to provide not only the current data, but the whole regressor Ψt (4.17) and its associated Jacobian (4.18).
Attributes:
str:mxArray is the structure of the regressor Ψt. Note that oEF is a specialization of the general oPdf class; hence the data-source attribute DS is also defined in this class. Elements of str are thus pointers into the array of observations DS:Dt.
Operations:
getPsi:mxArray is an alternative to getdata, which returns the numerical value of the current Ψt.
Jacobian:double returns the value of the current Jacobian (4.18).
7.3.2.7 Class eMC (eEF)
Is a class representing the Dirichlet pdf, which is conjugate to the general Markov model (4.21). Note that we faced the problem of representation of the multi-array parameter Θ in the class MultiIndexFn. Here, we face the same problem, since the statistics of the Dirichlet pdf have the same form as its parameter. Therefore, once again, an internal indexing mechanism must be found.
Attributes:
V:mxArray is an attribute representing the sufficient statistics V in (4.22).
Operations:
update accepts arguments OM:oEF and SM:mDelta (or SM:mFrgEF), and implements the data-update operation (4.23) for mDelta, or the data-update (4.44) for mFrgEF internal models.
expectation:function with argument Fn:function, implements the expectation operation (3.4). Integration over variables is replaced by summation, so the expectation operation should work with almost all types of functions.
merge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section 3.4). It accepts an Epdf argument of the eMC type defined on the same variable, i.e. on rv; then, the operation is implemented using (4.26).
Moreover, this is the only class where it is feasible to implement the (better) merging algorithm via (3.33) [32].
prediction:pMC with argument OM:oEF, implements the prediction operation (4.24).
log_pred:double with argument OM:oEF, implements the prediction operation (4.24) for the current observed data from the observation model OM. Since evaluation of the predictive pdf is numerically feasible, this operation returns the exact value of the prediction (4.24).
expect:mxArray with argument Fn:function, implements the same expectations as the operation expectation; however, it returns numerical values of the moments rather than functions.
7.3.2.8 Class pMC (pPdf)
Is a class representing the exponential family predictor (4.24) for the special case of the Markov model (4.21), which is in the form of a multinomial pdf [28].
Attributes:
V:mxArray is the statistics of the predictor. For the one-step-ahead predictor, it is of the same size as rv.
Operations:
update:pMC with arguments OM:eEF and SM:mPdf, should implement the prediction operation (4.24) for the next step. However, exact evaluation of this formula is rather complex; therefore, we approximate the posterior pdf on Θ by a certainty-equivalence approximation (Section 4.28), with the point estimate chosen as the expected value Θ̂ = E_{f(Θ|d(t))}(Θ) of the posterior on the latest available data. Thus, the multi-step-ahead predictor remains in the form of a multinomial pdf.
rupdate:pMC with arguments OM:eEF and SM:mPdf, should implement the reverse prediction, i.e. return (4.24) at the previous time step.
expectation:function implements the expectation operation (3.4). This operation is trivial.
divergence:function implements the KL divergence (3.7) using the formula from [28].
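The certainty-equivalence step in the update operation above replaces the Dirichlet posterior on Θ by its expected value, which for Dirichlet statistics is simply the normalized statistics. The sketch below is a hypothetical pure-Python illustration, with one row of V per conditioning state of the Markov chain; it is not the thesis implementation.

```python
def dirichlet_mean(V_row):
    """Expected value of a Dirichlet pdf with statistics V_row."""
    s = sum(V_row)
    return [v / s for v in V_row]

def point_estimate(V):
    """Certainty-equivalence point estimate of the transition matrix:
    one normalized Dirichlet mean per conditioning state."""
    return [dirichlet_mean(row) for row in V]
```

For example, statistics [[1, 1], [3, 1]] yield the estimated transition probabilities [[0.5, 0.5], [0.75, 0.25]], which can then be raised to a power to obtain a multi-step-ahead multinomial predictor.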
Figure 7.6: UML class diagram of pdfs used for DM with the Variational Bayes approach.
7.3.3 Variational Bayes approach
In this Section, the general models from the Prob package are specialized for the conditionally independent models used in the VB approach (Section 4.3.4). In Section 4.3.4, the VB approach was interpreted as an approximate inference scheme for multivariate pdfs using the assumption of conditional independence. It was shown in (4.53) that dynamic DM using this approximation is feasible if the VB-marginal pdfs (4.38) are from the exponential family (4.16). Therefore, we restrict our attention to pdfs composed of conditionally independent models from the exponential family, and we define classes for VB on top of those for the EF (Section 7.3.2).
The UML class diagram of the involved objects is displayed in Figure 7.6.
7.3.3.1 Class oVBnet (oPdf)
Is an abstract class representing the original (intractable) observation model approximated by the VB approach. However, this class is not directly used in any learning operation, since the update is defined with respect to the auxiliary partial VB-observation models (4.53). At present, the main purpose of this class is its use as an identifier of compatible types for the argument OM:oPdf in the eVBnet:update operation.
Note, however, that this observation model is needed in FPD (Proposition 3.2). Technically, it is possible to replace all operations on the original model by operations on the partial VB-observation models. However, the FPD for this type of approximation has not been elaborated yet. Therefore, we treat the VB as a learning-specific approximation.
Attributes:
rv:RVlist since the VB approximation is defined on multivariate pdfs, the rv attribute must be of the
RVlist type.
7.3.3.2 Class oVBpart (oEF)
Is a class representing the partial VB-observation models (4.53). The key distinction of this type of
model is its dependence on the posterior estimates of its neighbours.
Attributes:
neighbours:eVBCell is a list of all neighbours whose moments are needed for evaluation of the
Psi regressor.
Operations:
getPsi:mxArray is an operation returning the regressor associated with the model. Note that the result
may be a function of the observed data and/or the statistics of the neighbours.
7.3.3.3 Class eVBnet (ePdf)
Is a class representing the joint distribution (4.49) of conditionally independent VB-posteriors.
Attributes:
rv:RVlist contains the random variables on which the pdf is defined.
nodes:eEFCell is the list of all nodes (conditionally independent pdfs). The nodes are from the
exponential family.
PartOMs:oVBCell is the list of partial observation models. Each node should have its corresponding
observation model in this list.
Operations:
update with arguments OM:oVBnet and SM:mDelta, implements the VEM algorithm (Algorithm
4.2) on the nodes.
expectation:function with argument Fn:function, implements the expectation operation (3.4).
Since eVBnet is a joint pdf of exponential-family pdfs, evaluation of expectations (3.4) is reduced
to calling the expectation operation of all nodes.
dmerge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section
3.4). It accepts an Epdf argument of the eVBnet type defined on any subset of rv. Due to
conditional independence, the direct merging operation is translated into merging operations
on the nodes.
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, accepts Epdf in the form eEmp.
For eEmp, it calls the update operation with OM, for which the data-source DS was replaced
by eEmp:data.
prediction:pVBnet with argument OM:mNorm implements the operation of prediction (4.5).
Due to conditional independence, the resulting predictor is a product of predictors for each
node.
log_pred:double with argument OM:oNorm implements the operation of prediction (4.5) for the
current observed data from the observation model OM. It returns the sum of log_pred values from
all nodes.
expect:mxArray with argument Fn:function, implements the same expectations as the operation
expectation; however, it returns numerical values of the moments rather than functions.
flatten with argument factor:double, implements the flattening operation needed for prior elicitation
(Section 5.2, Remark 5.1). Flattening can be done, once again, independently for each
node.
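The iterative character of the update operation can be illustrated on the textbook VB example of jointly estimating the mean and precision of scalar Gaussian data, where the two "nodes" q(mu) and q(tau) are coupled through each other's moments and must be iterated to convergence. This is a minimal, self-contained Python sketch of the coordinate-ascent (VEM-style) idea only; it does not use the thesis's classes:

```python
def vb_mean_precision(x, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3, iters=50):
    """Coordinate-ascent VB for x_i ~ N(mu, 1/tau) with conjugate priors
    mu ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0).
    Returns converged statistics of q(mu)=N(mN, 1/lamN), q(tau)=Gamma(aN, bN)."""
    N = len(x)
    xbar = sum(x) / N
    sx2 = sum(xi * xi for xi in x)
    # the mean of q(mu) does not depend on E[tau]; only its precision does
    mN = (lam0 * mu0 + N * xbar) / (lam0 + N)
    aN = a0 + (N + 1) / 2.0
    E_tau = a0 / b0                           # initial guess for E[tau]
    for _ in range(iters):
        lamN = (lam0 + N) * E_tau             # update q(mu) given E[tau]
        E_mu2 = mN * mN + 1.0 / lamN          # second moment of q(mu)
        bN = b0 + 0.5 * (sx2 - 2.0 * N * xbar * mN + N * E_mu2
                         + lam0 * (E_mu2 - 2.0 * mu0 * mN + mu0 * mu0))
        E_tau = aN / bN                       # update q(tau) given q(mu)
    return mN, lamN, aN, bN
```

The mutually dependent statistics (lamN and bN here) play the role of the node statistics iterated by the VEM algorithm.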
7.3.3.4 Class pVBnet (pPdf)
Is a class representing the predictor obtained by marginalization of the eVBnet estimate.
Note that the marginalization operation for the original observation model (3.10) may not be
tractable. On the other hand, if we approximate the original model by the partial observation models
(4.53), the integration is trivial. Therefore, at present, we design the pVBnet class as a software
representation of the latter case. The predictor is then a product of predictors from the exponential family,
i.e. pNorm or pMC.
Attributes:
predictors:pPdfCell array of predictors.
Operations:
No theoretical results for operations on predictors are available at present. These will be defined
later.
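The product form of the predictor means that its log-density is simply the sum of the node log-densities. A hypothetical Python sketch of such a product-of-predictors container (the real pNorm/pMC classes are Matlab/C designs; the names here are illustrative):

```python
import math

class GaussPredictor:
    """One exponential-family node predictor: a scalar Gaussian N(mean, var)."""
    def __init__(self, mean, var):
        self.mean, self.var = mean, var
    def log_pred(self, y):
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (y - self.mean) ** 2 / self.var)

class ProductPredictor:
    """pVBnet-style predictor: a product of independent node predictors."""
    def __init__(self, predictors):
        self.predictors = predictors
    def log_pred(self, ys):
        # log of a product = sum of the component log-predictions
        return sum(p.log_pred(y) for p, y in zip(self.predictors, ys))
```

Evaluating the joint predictor thus never requires forming the product explicitly.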
7.4 Package SingleDM
This package implements all classes needed for the practical tasks associated with single-participant
decision making (Chapter 5). Separation of features related to single- and multiple-participant DM
is motivated by two factors:
1. the purpose of the work is to create a basis for long-term research (Section 2.1). A lot of
research will be directed towards single-participant scenarios; hence, the presence of features
of MP DM would be redundant and confusing for the users,
2. the single-participant scenario is a special case of the MP scenario; therefore, due to the object-oriented
approach, the classes from this package can be easily extended for the MP scenario in
the next package.
Here, we review the basic steps of decision making (Agreement 5.2) from the software-design point
of view.
1. Problem description
In this step, the expert user provides as much information about the problem as possible. This
information needs to be stored in order to be used later. Moreover, it must be stored in such a
way that allows easy handling of the information in the subsequent tasks. Hence, we design a
new class UserInfo as a structure of task-specific pieces of information.
2. Elicitation of prior distributions
This task can be translated into the task of learning with fictitious data (Section 5.2). The
learning operations (time- and data-update) are already covered in the packages Prob and FProb.
It remains to specialize the oPdf class to handle fictitious data.
3. Model selection
Once again, the task of model selection can be translated as learning followed by marginalization
over the parameter space (Section 5.3). All required operations (time- and data-update, and
prediction) are already covered in the packages Prob and FProb. However, the general model of
hypothesis testing (Section 5.3) can be computationally inefficient for certain families, such
as the exponential family (Remark 5.2). Therefore, we design a new class Hypothesis, which is
intended to encapsulate the task of model selection and, by specialization, the task of model
validation.
4. Learning
All required operations (time- and data-update, and prediction) are already covered in the packages
Prob and FProb.
5. Model validation
Is a special case of model selection. The class Hypothesis can be specialized for this purpose.
6. Elicitation of ideal pdfs
The ideal pdf is used, almost exclusively, for evaluation of its KL divergence from the observation
model in FPD (Proposition 3.2). Alternatively, its KL divergence from the predictor is
evaluated in data-driven FPD (Proposition 3.3). Therefore, we consider the ideal distributions
to be a specialization of the class pPdf.
Note that the pPdf class has the update and rupdate operations defined, which allows the
predictor to be modified using any observation model or internal model. This mechanism is well
suited for modelling of the time-variant ideals (Section 5.6).
7. Design
All operations for FPD are already available.
8. Design validation
As was mentioned in Section 5.8, this task is typically achieved by means of simulation.
Since simulation can be seen as a special case of decision-making, it can be implemented
using the already available objects. However, the class DataSource must be extended to allow
for writing of the simulated data (not only decisions).
Now, we design the software image of the required objects in detail.
7.4.1 Class UserInfo
Is the overall class unifying all information from the expert user. It is composed of information
related to the individual steps of DM. The UML class diagram of the whole structure is displayed in Figure
7.7.
7.4.1.1 Class DataInfo
This class stores all available information about the observed data and their nature.
Since all data are supposed to be observed on-line, the source of one scalar observation will be
known as a channel.
Attributes:
chnum:int is used for storing the number of all channels.
Chns:ChInfCell is a list of the information available on each of the channels. The new datatype
ChInfCell is defined as a list of ChnlInfo classes, which is defined later.
PreProc:FltInfCell is a list of information about the necessary pre-processing that must be performed
on the data. The new datatype FltInfCell is a list of FilterInfo classes. Each FilterInfo
stores information about one filter used for preprocessing.
Internal classes:
7.4.1.1.1 ChnlInfo Attributes:
id:int is a unique identifier of the data, used for referencing within the software representation of the
system,
name:string is a user-friendly identifier of the data, used for presenting the results to the user,
type:int is an identifier of the data type. At present we distinguish discrete (type=0) and continuous
(type=1) data.
min:double is the minimum possible value in the channel.
max:double is the maximum possible value in the channel.
action:int is used to indicate whether the decision maker can choose the value of this channel (action=1).
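A direct software rendering of this record, sketched here as a Python dataclass for illustration (the actual toolbox stores it as a Matlab structure; the field names follow the attribute list above):

```python
from dataclasses import dataclass

@dataclass
class ChnlInfo:
    """Description of one scalar data channel (see attribute list above)."""
    id: int            # unique identifier within the software representation
    name: str          # user-friendly identifier for presenting results
    type: int          # 0 = discrete, 1 = continuous
    min: float         # minimum possible value in the channel
    max: float         # maximum possible value in the channel
    action: int = 0    # 1 if the decision maker can choose this channel's value

# a continuous observation channel and a discrete action channel
temp = ChnlInfo(id=1, name="temperature", type=1, min=-10.0, max=120.0)
valve = ChnlInfo(id=2, name="valve", type=0, min=0.0, max=1.0, action=1)
```

The channel names and values here are hypothetical examples, not taken from the thesis.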
Figure 7.7: UML class diagram of the class UserInfo.
7.4.1.1.2 FilterInfo Attributes:
type:int is used to indicate the type of the pre-processing filter, e.g. wavelet filter, median filter, etc.
chnls:mxArray is a structure indicating which channels are being used by the filter.
This class is defined as virtual. Specialized filters of various types will extend this info with their
own information.
7.4.1.2 Class PriorInfo
As it was outlined in Section 5.2, the available prior information may consist of many mutually
incompatible pieces (sources) of information. Information from these sources is then combined
together.
Attributes:
Sources:PriKnCell is a list of information on each of the sources. The new datatypePriKnCell
is a list of classesPriorKnInfo . ClassPriorKnInfo contains information on each source.
weights:mxArray is a list ofa priori known weights that will be used to combine the sources. If
the actual weights are to be inferred from the data (Section 5.2), these values will be used as
statistics of the prior distribution on the weights.
Internal classes:
7.4.1.2.1 PriKnInfo Is an abstract class representing one particular type of available information.
Attributes:
type:int identifies the type of information on each source.
This class is defined as virtual. Specialized sources of various types will extend this info with their
own information.
7.4.1.3 Class ModelInfo
Is a class for collecting information on the models preferred by the user.
Attributes:
maxno:int specifies the maximum number of models compared by the model selection procedure.
class:int is used to indicate which class of models is preferred by the user. We assume that all the
tested models will be from this class.
7.4.1.4 Class EFModInfo (ModelInfo)
Is a specialization of the ModelInfo class for the exponential family. Due to the nesting property of
the exponential family (Section 5.3), the necessary learning phase of model selection can be done
only on the richest model, and the task of model selection is then reduced to operations on the
sufficient statistics of this model.
Attributes:
maxstr:mxArray denotes the maximum possible structure of the model.
7.4.1.5 Class MValidInfo
Is a class storing information on user preference of the model validation procedure (Section 5.5).
The model can be validated by many tests, which can be applied sequentially.
Attributes:
testes:ValInfCell is a list of information about the validation testes. The new datatypeValInfCell
is a list ofValInfo classes. EachValInfo class stores information aboutonevalidation test.
Internal classes:
7.4.1.5.1 Class ValInfo Is an abstract class, used as a root for further specialization.
7.4.1.5.2 Class CuttingVInfo (ValInfo) Is a specialization of the ValInfo class for validation by
cutting (Section 5.5.2).
Attributes:
cutpoints:mxArray is an array of time-indexes defining the grid of cutting moments.
7.4.1.6 Class IdealInfo
Is a class storing information about the user requirements on the output of the closed loop. Once again,
all observations on the closed-loop system are made on-line; hence the information in this field is
channel-specific.
Attributes:
Observed:IdealInfCell is a list of information about the desired closed-loop behaviour of selected
channels. The new datatype IdealInfCell is a list of IdealChInfo classes. Each IdealChInfo
class stores information about one channel.
Internal classes:
7.4.1.6.1 Class IdealChInfo Is a class representing the user-desired values of each channel.
Attributes:
id:int identifies the channel on which the requirements are imposed. A channel with the same id must
exist in the DataInfo structure.
imin:double is the requested minimum value of the data in this channel.
imax:double is the requested maximum value of the data in this channel.
dmin:double is the requested minimum of the difference of the data-value in this channel between
two subsequent observations.
dmax:double is the requested maximum of the difference of the data-value in this channel between
two subsequent observations.
Note that this class describes time-invariant requirements on the system. Other specializations of
IdealChInfo must be derived to describe time-variant requirements.
7.4.1.7 Class DesignInfo
Is a class with requirements and settings used for the task of design of the DM strategy.
Attributes:
horizon:int is the number of optimized steps ahead, t in Proposition 3.3.
options:string is the list of options (tuning parameters) of the FPD algorithm (Proposition 3.3).
At this stage we do not impose any structure on this attribute. It will be interpreted by the
corresponding operation implementing FPD.
7.4.1.8 Class DValidInfo
Is a class with requirements and settings used for the task of validation of the designed DM strategy.
Attributes:
ndat:long is the number of simulated time steps.
tolerance:double is a tuning knob in decisions on the validity of the designed DM strategy.
7.4.2 Special purpose classes
In this Section, the functionality of the pdfs from the packages Prob and FProb is extended in order to
support fictitious data (Section 5.2), time-variant ideals (Section 5.6), and simulation (Section
5.8). The UML class diagram of the new classes is displayed in Figure 7.8.
Figure 7.8: UML class diagram of the specialization of pdfs for practical tasks of DM.
7.4.2.1 Class FictOPdf (oPdf)
Is a class designed for the task of prior elicitation (Section 5.2). This class is a specialization of the
original oPdf class.
Operations:
create: with argument K:PriorInfo, analyzes the argument and creates a new instance of oPdf with
a new instance of the DataSource filled with fictitious data for the given K.
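The mechanism can be illustrated on a conjugate Normal model with known variance: vague prior statistics are updated with fictitious observations that express the expert's belief, exactly as if they were real data. This is a hedged Python sketch; the class and statistic names are hypothetical, not the toolbox's:

```python
class NormalMeanEstimate:
    """Conjugate estimate of a Normal mean with known unit variance:
    posterior mean = (strength*prior_mean + sum(data)) / (strength + n)."""
    def __init__(self, prior_mean=0.0, prior_strength=1e-6):
        self.s = prior_strength * prior_mean   # accumulated weighted sum
        self.n = prior_strength                # accumulated effective count
    def update(self, y):
        self.s += y
        self.n += 1.0
    @property
    def mean(self):
        return self.s / self.n

def elicit_prior(fictitious_data):
    """FictOPdf-style prior elicitation: run the standard learning
    operation on fictitious data expressing the expert's belief."""
    est = NormalMeanEstimate()
    for y in fictitious_data:
        est.update(y)
    return est

prior = elicit_prior([3.0, 5.0, 7.0])   # expert: "the output is around 5"
```

The resulting statistics then serve as the prior for learning from real data.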
7.4.2.2 Class iPdf (ePdf)
Is a class designed for the task of ideal elicitation (Section 5.6) and the DM strategy design (Section
5.7). There, a formal method of step-wise construction of the ideal distribution on the DM horizon
was introduced.
Operations:
create with argument U:IdealChInfo, is an alternative constructor of the inherited pPdf attributes.
It analyses the argument and assigns such values to the inherited statistics as are appropriate
for the given U.
update re-defines the inherited operation update, if the operation differs. (It is expected that the
update operation for ideal pdfs will be simpler than that for predictors.)
7.4.2.3 Class Simulator (DataSource)
Is a class designed for the task of design validation (Section 5.8), to allow simulation of the data. It
extends the DataSource class to allow for writing one-step-ahead data.
Operations:
step is redefined to accept the argument Dt:mxArray, which assigns values of the Dt attribute for
the next step.
Figure 7.9: UML class diagram of the Hypothesis class.
7.4.2.4 Class Hypotheses
Is a class designed for the task of model selection (Section 5.3) and, by specialization, for the task
of model validation (Section 5.5). Technically, these tasks can be achieved by sequential use of the
standard operations on pdfs. However, we separate the task into this class, since special treatment
is required for some model classes (Remark 5.2). The UML class diagram for this class is displayed in
Figure 7.9.
Attributes:
weights:mxArray is the posterior estimate of the likelihood (5.4) corresponding to each considered
hypothesis.
Ests:ePdfCell is a list of posterior estimates for each considered hypothesis,
OMs:oPdfCell is a list of observation models for each hypothesis,
SMs:oPdfCell is a list of internal models for each hypothesis,
The lengths of these lists may differ for different specializations of the class. Correct handling of different
lengths must be assured within the operation test.
Operations:
create with attribute MI:ModelInfo is a constructor which creates the class attributes based on the
information from the user.
test with arguments DS:DataSource and ndat:int, is an operation which typically calls the update
and log_pred operations for each of the estimates in Ests and accumulates their results
in weights. For the exponential family, this operation needs to be redefined to update just one
estimate and subsequently evaluate the likelihoods for all possible sub-structures (Remark 5.2).
Figure 7.10: UML class diagram of the basic single-participant decision makers.
7.4.2.5 Class MVHypothesis (Hypothesis)
Is a specialization of the Hypothesis class for the task of model validation by cutting (Section 5.5.2).
In this case, the estimates in the list Ests have the same structure, but differ in the data on which they
are conditioned.
Attributes:
cutpoints:mxArray is a predefined grid of cutting points,
Operations:
test the operation is re-defined to update different estimates in Ests in different parts of the cutting
grid.
create with attribute MVI:CuttingVInfo, re-implements the original constructor to accept the
user info on the cutting grid.
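The cutting mechanism can be sketched as follows: the data record is split at the cutpoints, a separate estimate is updated on each segment, and the model is judged valid if the segment estimates agree. This is a deliberately simplified Python illustration that uses sample means in place of the full Bayesian estimates:

```python
def validate_by_cutting(data, cutpoints, tolerance):
    """Split the data at the given time-indexes, compute one estimate per
    segment, and check that the segment estimates agree within tolerance."""
    bounds = [0] + sorted(cutpoints) + [len(data)]
    segments = [data[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    means = [sum(s) / len(s) for s in segments]
    # valid if no segment estimate deviates too much from the others
    return max(means) - min(means) <= tolerance, means

ok, means = validate_by_cutting([1.0, 1.1, 0.9, 1.0, 1.05, 0.95],
                                cutpoints=[3], tolerance=0.5)
```

A stationary record passes the check; a record whose segments disagree beyond the tolerance fails it.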
7.4.3 Decision Makers
The UML class diagram of the basic decision makers defined in this package is displayed in Figure
7.10.
7.4.3.1 Class AdaptDM
The basic class of this package implements the on-line version of the decision maker as described
by Agreement 5.1.
Attributes:
OM:oPdf is the observation model (3.10),
SM:ePdf is the internal model (3.9),
Est:ePdf is the current estimate (3.14),
Stra:pPdf is the designed DM strategy (3.23),
DS:DataSource is the actual link with the environment, i.e. it is the source of the observed data
and the destination of the decisions.
Operations:
new is the constructor of the class.
read reads the data from the environment; in the simplest case, it just calls DS.step(),
learn uses the observed data to improve its knowledge about the environment; in the simplest case,
it just calls Est.update(OM,SM).
adapt uses the updated estimates to adjust the designed DM strategy (if it depends on the parameters).
In the simplest case, it calls Est.expect() and propagates the results to Stra.replace_stats().
decide selects the optimal decision using the current data and strategy. Since the DM strategy
is designed as a pdf, we need to collapse it to a value. This can be done via (i) moments, or
(ii) random sampling. In the first case, the operation calls Mom:=Stra.expectation() and
Ut:=Mom.evalall(). In the second case, Updf:=Stra.condition() and Ut:=Updf.sample().
write writes the decision Ut into the environment. In the simplest case, it calls DS.write(Ut).
run repeatedly calls the above procedures, i.e. read–write, for all available data.
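The read–learn–adapt–decide–write cycle can be sketched on a toy noise-free scalar system y_t = theta*u_t, where learning reduces to a ratio estimate and the strategy steers the output towards a set-point. This is a hypothetical Python illustration of the control flow only, not the thesis's classes:

```python
class ToyEnvironment:
    """Plays the role of DataSource: y = theta * u, theta unknown to the DM."""
    def __init__(self, theta):
        self.theta, self.u = theta, 1.0      # initial probing action u0 = 1
    def step(self):                           # read: produce the observation
        return self.theta * self.u
    def write(self, u):                       # write: apply the decision
        self.u = u

class AdaptDM:
    def __init__(self, ds, setpoint):
        self.ds, self.setpoint, self.theta_hat = ds, setpoint, 1.0
    def run(self, steps):
        for _ in range(steps):
            y = self.ds.step()                # read
            self.theta_hat = y / self.ds.u    # learn (noise-free least squares)
            u = self.setpoint / self.theta_hat  # adapt + decide (collapsed to a value)
            self.ds.write(u)                  # write
        return y

env = ToyEnvironment(theta=2.0)
dm = AdaptDM(env, setpoint=10.0)
last_y = dm.run(steps=4)
```

After the first probing step the parameter estimate is exact and the loop settles at the set-point.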
7.4.3.2 Class SingleDM (AdaptDM)
Is a class implementing the off-line steps of decision making as described by Agreement 5.2. This
class is a specialization of the class AdaptDM, since the purpose of the off-line analysis is to create all
structures needed for the on-line mode.
Attributes:
UseInf:UserInfo is the structure of expert information from the user,
I-OM:iPdf is the ideal observation model in (3.22),
I-SM:iPdf is the ideal internal model in (3.22),
I-U:iPdf is the ideal DM strategy in (3.22),
BF:function is the auxiliary Bellman function γ(t), (3.24).
Operations:
Figure 7.11: UML sequence diagram of the task of prior elicitation.
new with argument UseInf:UserInfo, is the constructor.
PriorElicit performs the task of prior elicitation (Section 5.2). It analyzes the PriorInfo field in the
UseInf structure and elicits the prior values of the Est attribute. This procedure is displayed in
Figure 7.11.
ModelSel performs the task of model selection (Section 5.3). It creates a new instance of the
Hypothesis class using the ModelInfo field in the UseInf structure and elicits the prior values
of the Est attribute. Internally, it may call the PriorElicit function.
Learning performs the task of learning (Section 5.4). In the simplest case, it just calls the inherited
run procedure.
ModelValid performs the task of model validation (Section 5.5). It creates a new instance of the
Hypothesis class using the MValidInfo field in the UseInf structure and elicits the prior values
of the Est attribute.
IdealElicit performs the task of ideal elicitation (Section 5.6). It analyzes the IdealInfo field in the
UseInf structure and elicits the values of the ideal pdfs (attributes I-OM, I-SM and I-U).
FPD performs the task of design of the DM strategy (Section 5.7). Its UML sequence diagram is
displayed in Figure 7.12.
FpdValid performs the task of design validation (Section 5.8). Internally, a new instance of the
Simulator class is created and used in place of DS.
BatchRun analyses the steps_to_do field in the UseInf attribute, and runs the above tasks, i.e.
PriorElicit–FpdValid, if they are selected by the user. This mechanism assures both types of
user interaction, i.e. batch and interactive modes, as discussed in Section 5.1.
Figure 7.12: UML sequence diagram of the fully probabilistic design (FPD).
Figure 7.13: Illustration of the structure of the DAEP (timestamped value channels, incoming and outgoing communication blocks, and a description record with the attributes noc, cic, coc and cbs).
7.5 Package MultiDM
This package extends the classes from the package SingleDM (Section 7.4) for the MP DM scenario
(Chapter 6). It should be kept in mind that the classes defined in this package are preliminary, since they
have not been verified by real experiments.
7.5.1 User Information
7.5.1.1 Datatype DAEP
Since we allow the environment to be implemented by any technology (i.e. not necessarily by the
OO approach), the interface between the participant and the environment is realized via the Data-Action
Exchange Platform, which is not a class but a data structure, DAEP.
The DAEP is illustrated in Figure 7.13, and it consists of the following major parts:
Description has a fixed structure and uniquely determines the following parts of the DAEP. This part
is to be read by another participant in order to find the communication channels.
Channels is a list of data channels. There are two types of channels: (i) input channels, where the
environment writes its data, and (ii) output channels, where the participant writes its decisions.
Since different participants can work with different sampling periods, the values of all data written
into the DAEP must be accompanied by the exact timestamp of their creation.
At least two channels (one for input and one for output) are reserved for synchronization of
inter-participant communication. All participants can write their requests and responses to
their output channel. The environment is responsible for the transfer of data from the output
channel to the input channel of the appropriate recipient.
Figure 7.14: UML class diagram of the MPUserInfo class used for storing information from the user.
Communication blocks are two blocks (input and output) in memory, allocated for the exchange of
arbitrary information. This information is treated as an array of bytes by the DAEP and it should
be interpreted by the communication routines of each participant.
Once again, each participant writes its data into its own output block and the environment is
responsible for their delivery to the input block of the recipient.
Thus, the DAEP is uniquely described by the following attributes:
noc:int the total number of data channels,
cic:int the number (identifier) of the communication-input channel,
coc:int the number (identifier) of the communication-output channel,
cbs:int the size of the communication block.
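A minimal data-structure sketch of the DAEP in Python, for illustration only (the real platform is technology-neutral; the field names follow the attribute list above, while the helper method and example values are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DAEP:
    """Data-Action Exchange Platform: a description record, timestamped
    data channels, and two raw communication blocks."""
    noc: int                                  # total number of data channels
    cic: int                                  # communication-input channel index
    coc: int                                  # communication-output channel index
    cbs: int                                  # communication-block size in bytes
    channels: List[List[Tuple[float, float]]] = field(default_factory=list)
    block_in: bytes = b""                     # incoming communication block
    block_out: bytes = b""                    # outgoing communication block

    def __post_init__(self):
        if not self.channels:
            self.channels = [[] for _ in range(self.noc)]

    def write_value(self, chn, value, timestamp):
        # every written value carries the timestamp of its creation
        self.channels[chn].append((value, timestamp))

daep = DAEP(noc=4, cic=2, coc=3, cbs=1024)
daep.write_value(0, value=21.5, timestamp=0.1)
```

The description record (noc, cic, coc, cbs) is what another participant reads first to locate the communication channels.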
7.5.1.2 Class MPUserInfo (UserInfo)
Is an extension of the class UserInfo for additional MP-related information. The UML class diagram is
displayed in Figure 7.14.
Attributes:
NeighbourInf:NeiInfCell is an additional attribute which is used to store the user's information
about the participant's neighbours. It is constructed as a list of information about the particular
neighbours.
DataInf:DAEPInfo is redefined as an instance of the DAEPInfo class, which is an extension of the
original DataInfo class to reflect the structure of the DAEP.
Figure 7.15: UML class diagram of the data-handling mechanism for MP DM.
7.5.1.3 Class NeighInf
Is a class for storing the user information on the participant's neighbours.
Attributes:
id:PartID is a unique identifier of each participant. At present, it is represented by an abstract
datatype PartID. A more detailed description of this datatype may be specified for each implementation.
alpha:double is the fixed value of α, for the selfish and hierarchical negotiation strategies (Section 6.4). For
the cooperative scenario, this value may be used as the initial condition for the negotiation procedure.
7.5.1.4 Class DAEPInfo (DataInfo)
Is an extension of the DataInfo class made to contain the user information about the structure of the DAEP.
Attributes:
Cbs:long is the user-defined size of the communication blocks in the DAEP.
Cic:int is the index of the incoming-communication channel in the Chns list (inherited from DataInfo).
The channel must be of discrete type, with action set to false.
Coc:int is the index of the outgoing-communication channel in the Chns list (inherited from DataInfo).
The channel must be of discrete type, with action set to true.
7.5.2 Special purpose classes
Due to the similarities of the intended approach to MP DM with the practical tasks of DM for a single
participant (Chapter 5), very little needs to be done to adapt the structure of the basic classes of
probability calculus. The only exception is the data-handling mechanism; however, even there the
structural changes are rather small. The UML class diagram of the extension is displayed in Figure
7.15.
7.5.2.1 Class DAEPSource (DataSource)
Is a specialization of the DataSource class designed to provide an interface between the DAEP and the
probabilistic core of each participant.
Attributes:
Figure 7.16: UML class diagram of the MP decision maker.
DAEP:DAEP is the instance of the DAEP datatype.
period:double is the period of sampling from the continuous time.
Operations:
step is a re-implementation of the inherited operation used for innovating the observed data (attribute
Dt). This operation must be re-defined for each special type of the DataSource; however,
in this case it is unusually challenging. Note that the observed data records are continuously
being written to the DAEP by the environment. Each observation has its own timestamp,
which can have an (almost) arbitrary value. Therefore, the task of this operation is to re-sample
the irregular continuous-time observations from the DAEP into fixed-grid discrete-time observations
for Dt.
write is a re-implementation of the inherited operation for writing the participant's decisions into the
environment. This operation is simpler than step, since the decisions can be written regularly at the end of
the operation cycle. This operation calls the atime operation internally to assign an appropriate
timestamp to each decision.
atime:double returns the actual time in the same format as the timestamps.
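The re-sampling performed by step can be sketched as a zero-order hold: for each point of the fixed grid, take the last observation whose timestamp does not exceed the grid time. This is a simplified Python illustration; the actual policy (e.g. interpolation instead of a hold) is an implementation choice:

```python
def resample_zoh(records, t0, period, n):
    """Re-sample irregularly timestamped (timestamp, value) records onto a
    fixed grid t0, t0+period, ... using a zero-order hold (last value wins).
    Records must be sorted by timestamp; yields None before the first record."""
    grid = [t0 + k * period for k in range(n)]
    out, i, last = [], 0, None
    for t in grid:
        while i < len(records) and records[i][0] <= t:
            last = records[i][1]            # most recent observation so far
            i += 1
        out.append(last)
    return out

obs = [(0.05, 1.0), (0.32, 2.0), (0.33, 2.5), (0.91, 3.0)]
dt = resample_zoh(obs, t0=0.0, period=0.25, n=5)   # grid: 0, .25, .5, .75, 1.0
```

Note how several closely spaced observations collapse onto one grid point, while gaps in the record simply repeat the last held value.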
7.5.3 Decision Makers
7.5.3.1 Class MultiDM (SingleDM)
Is an extension of the single-participant class SingleDM.
Attributes:
DAEP:DAEP is an instance of the DAEP datatype.
UseInf:MPUserInfo is an instance of MPUserInfo instead of the original UserInfo,
Operations:
read is an inherited operation that must be extended to read not only the observed data, but also
the information communicated from other participants. Specifically, if some information is
present in the input communication block of the DAEP, then this operation must recognize
the nature of this information and call an appropriate constructor for it.
learn is an inherited operation that must be extended to handle possible merging of information
from the neighbours, as discussed in Section 6.1, i.e. by calling the merge operations of the involved
pdfs internally, or as a parallel process. In the latter case, the following merge operation
is used.
merge is an auxiliary operation for cases where merging must be performed as a parallel operation
to learning, and possibly over more than one DM cycle.
decide is an inherited operation which must be extended to implement the chosen negotiation strategy
(Section 6.4). In situations where the aims (formalized by ideal pdfs) have changed, the
operation must call the design operation to adjust the DM strategy (attribute Stra).
design is an auxiliary operation that is called when the ideal distribution has changed. In some
cases, it may be sufficient to adjust the statistics of the strategy by calling Stra.replace_stats.
However, when the change in the ideal distributions is significant, the full FPD operation must
be called.
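One common realization of such a merge, for the special case of two Gaussian pdfs and a weight alpha, is the normalized weighted geometric merge p1^alpha * p2^(1-alpha), which is again Gaussian with precision-weighted statistics. This is a hedged Python sketch of the arithmetic only; the thesis's merge operation is more general and this particular rule is an assumption for illustration:

```python
def merge_gaussians(m1, v1, m2, v2, alpha):
    """Normalized geometric merge N(m1,v1)^alpha * N(m2,v2)^(1-alpha):
    precisions combine linearly, means are precision-weighted."""
    l1, l2 = alpha / v1, (1.0 - alpha) / v2   # weighted precisions
    lam = l1 + l2                              # merged precision
    mean = (l1 * m1 + l2 * m2) / lam           # merged mean
    return mean, 1.0 / lam

# merge the participant's own estimate N(0,1) with a neighbour's N(4,1)
mean, var = merge_gaussians(0.0, 1.0, 4.0, 1.0, alpha=0.5)
```

With alpha = 1 the neighbour is ignored (the selfish strategy); intermediate alpha values trade the two sources off, which is what the negotiation over alpha adjusts.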
This class finalizes the analysis of Bayesian multiple-participant decision making. Detailed specialization
and implementation of the classes presented in this analysis will be the next step in the
global project of Bayesian MP DM.
8 Conclusion
In this thesis, we have designed a new framework for software support for Bayesian distributed
dynamic decision making. The primary concern of the thesis is the software framework. The task
of its design was complicated by the fact that the theory of distributed decision-making is not fully
developed and stabilized. Therefore, many theoretical issues that were encountered during the design
process were also addressed. As a result, a range of smaller contributions to the theory of decision
making was also made.
8.1 Key contributions of the thesis
Chapter 2 The requirements on the analysis were formalized by Requirements 2.1 and 2.2. The
requirements were often found to be contradictory and it was necessary to find a reasonable compromise.
The most prominent freely available software packages were reviewed in the light of our requirements.
It was concluded that none of the packages is suitable for our needs and that it is
necessary to create a new one.
Since flexibility of the framework was one of the key requirements, we have chosen the object-oriented
(OO) approach as the design method. On the other hand, the requirement of continuity
of research forced us to use Matlab as the basic development environment. Therefore, we have
proposed a novel approach to the implementation of OO software in Matlab (Agreement 2.1). The
approach was tested on a simple problem and it was verified that it is possible to implement
OO principles in Matlab at a negligible loss of computational efficiency.
Chapter 3 The basics of Bayesian decision-making theory were reviewed in this Chapter. We have
presented well-known results, as well as new emerging methods such as the fully probabilistic
design (FPD) for systems with unobserved state (Proposition 3.2) and the merging of pdfs (Section
3.4). Moreover, these results were translated into a sequence of basic probabilistic operations,
which are suitable for software implementation, see e.g. (3.27) for FPD.
Chapter 4 It is well known that the Bayesian theory of decision making is computationally tractable
only under certain assumptions. The well-known basic DM operations for linear state-space
models and exponential-family models were reviewed. Moreover, we have presented the
results of merging operations for these models.
Many approximate techniques were developed for model families for which the general Bayesian
DM is not analytically tractable. These techniques can be seen as distributional approxima-
tions that are being applied to the general DM formulae. These techniques were also reviewed
in this Chapter.
Special attention was paid to the Variational Bayes technique, which is based on the assumption of conditional independence. This assumption is a successful, widely used approximation in the area of Bayesian networks. A better understanding of this assumption in terms of dynamic decision-making may open a way to dealing with more complex models than is usual at present. It was discovered that application of the Variational Bayes theorem (Theorem 4.1) to the tasks of Bayesian filtering (Section 4.4.2) and Bayesian estimation (Section 4.5) can be interpreted as exact analytical treatment of approximate (conditionally independent) models. However, the statistics of the posterior distributions are mutually dependent and must be evaluated iteratively. This result is important for two reasons: (i) it guarantees a fixed finite-dimensional form of the posterior distributions, which is important for achieving feasibility, and (ii) the quality of the approximation can be increased by iterating the VEM algorithm.
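The iterative evaluation of mutually dependent posterior statistics can be sketched on the textbook example of normal data with unknown mean and precision under a factorized posterior q(mu)q(tau). The function below is a hypothetical illustration of the fixed-point iteration, not the thesis implementation:

```python
def vb_normal(x, mu0=0.0, lam0=1.0, a0=1e-3, b0=1e-3, iters=50):
    """Variational Bayes for N(mu, 1/tau) data with a factorized posterior
    q(mu, tau) = q(mu) q(tau) = N(m_N, 1/lam_N) Gamma(a_N, b_N).
    The statistics of the two factors are coupled, hence the iteration."""
    N = len(x)
    xbar = sum(x) / N
    sum_x, sum_xx = sum(x), sum(xi * xi for xi in x)
    m_N = (lam0 * mu0 + N * xbar) / (lam0 + N)   # fixed mean of q(mu)
    a_N = a0 + (N + 1) / 2.0                     # fixed shape of q(tau)
    e_tau = a0 / b0                              # initial guess of E[tau]
    for _ in range(iters):
        lam_N = (lam0 + N) * e_tau               # q(mu) needs E[tau]
        e_mu, e_mu2 = m_N, m_N ** 2 + 1.0 / lam_N
        b_N = b0 + 0.5 * (sum_xx - 2 * e_mu * sum_x + N * e_mu2
                          + lam0 * (e_mu2 - 2 * mu0 * e_mu + mu0 ** 2))
        e_tau = a_N / b_N                        # q(tau) needs E[mu], E[mu^2]
    return m_N, 1.0 / lam_N, a_N, b_N
```

Note that both factors retain a fixed finite-dimensional form throughout, while their statistics are refined by the iterations, in line with points (i) and (ii) above.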
Chapter 5 The Bayesian formulation of the decision-making task is a consistent mathematical theory. However, its application to real-life problems is not trivial and many issues must be addressed to achieve practical solutions. The basic steps of implementing DM theory in practice, gained from the experience with single-participant DM, were reviewed in this Chapter; therefore, this chapter is concerned with single-participant DM.
Most of the steps are concerned with translating real-world experience into abstract objects of the theory, namely the involved pdfs, i.e. prior distributions, models, ideal pdfs, etc. The algorithms of DM can be applied only when these objects are chosen and fixed. Yet, we can still question their validity for the given task after processing of real data.
The main contribution of this Chapter is in the area of model validation for dynamic DM. The classical approach of splitting the real data into two parts (a learning part and a validation part) was reformulated for dynamic models in terms of Bayesian DM (Section 5.5). It was observed that the algorithm is sensitive to the choice of the cutting moment. To address this problem, a new method with multiple cutting moments was developed (Section 5.5.2).
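The idea of multiple cutting moments can be sketched as follows: evaluate predictive performance on several learning/validation splits of the same record and aggregate the scores. The toy predictor below (sample mean, squared-error loss) is purely illustrative; the thesis works with Bayesian predictive pdfs instead.

```python
def multi_cut_score(data, cuts):
    """Validation with multiple cutting moments: for each cut, learn on the
    prefix and measure predictive error on the suffix, then average.
    Toy predictor: sample mean of the learning part, squared-error loss."""
    scores = []
    for t in cuts:
        learn, valid = data[:t], data[t:]
        pred = sum(learn) / len(learn)                          # "learning"
        mse = sum((y - pred) ** 2 for y in valid) / len(valid)  # "validation"
        scores.append(mse)
    # averaging over cuts suppresses the sensitivity to any single moment
    return sum(scores) / len(scores)
```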
Chapter 6 The basic practical steps of the design of single-participant DM were reviewed in the light of the multiple-participant scenario in this Chapter. An original concept of Bayesian MP DM was presented. It was shown that many sub-tasks of MP DM (such as merging) have already been addressed in the design tasks of single-participant DM. Detailed elaboration of these principles is a task for future research.
Chapter 7 The core contribution of the thesis, the analysis of a new-generation software framework, is presented in this Chapter. The analysis is expressed in Unified Modelling Language (UML) notation. Following the UML methodology, the software is organized into five packages:
one for mathematical functions, and four packages implementing the classes of Bayesian DM.
Each of the latter packages corresponds to one chapter of the theory (Chapters 3–6).
Since all tasks of DM are implemented in terms of probability calculus, the most challenging
task was to design the basic classes for random variables, functions and pdfs. The chosen approach appears to be very promising, as it embraces the classical models, such as linear state-space models (Section 7.3.1) and exponential family models (Section 7.3.2), as well as the new approximate models based on conditional independence (Section 7.3.3).
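In a language with native OO support, the spirit of such basic pdf classes can be sketched as an abstract base class with specializations. The class and method names below are hypothetical illustrations, not the actual identifiers of the Prob package:

```python
import abc
import math
import random

class Pdf(abc.ABC):
    """Abstract pdf: the common interface of all probability objects."""
    @abc.abstractmethod
    def loglik(self, x):
        """Logarithm of the pdf evaluated at realization x."""
    @abc.abstractmethod
    def sample(self):
        """Draw one realization of the random variable."""

class GaussPdf(Pdf):
    """Specialization for the scalar normal pdf N(mu, var)."""
    def __init__(self, mu, var):
        self.mu, self.var = mu, var
    def loglik(self, x):
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (x - self.mu) ** 2 / self.var)
    def sample(self):
        return random.gauss(self.mu, math.sqrt(self.var))
```

Further specializations (exponential family pdfs, conditionally independent products) would plug into the same interface, which is what makes the design embrace both the classical and the approximate models.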
The analysis presented in this thesis reveals that structural differences between software images of
participants in multiple- and single-participant DM are rather small. This result is a consequence
of the chosen approach to the task of distributed DM (Section 1.1.2) and the chosen OO approach
to software design. Therefore, future development of distributed DM systems using the multiple-participant approach is conceptually well defined in terms of the classical single-participant paradigm. However, a wide range of problems must be overcome in order to achieve such maturity of the theory and its software image that would allow its application to real-world problems.
8.2 Future work
As stated in the introduction (Section 1.1), this thesis is an initial step in the creation of the Bayesian distributed decision-making theory. The amount of work required to reach this aim is extensive. Therefore, in this Section, we mention only short-term tasks that are closely related to the designed software.
Implementation The basic classes for linear state-space models have already been implemented in the Baddyr repository, http://guest:[email protected]:1800/svn/badyr/work/Participants. This initial work helped to clarify many details in the packages Prob and FProb. It can be expected that implementation of the remaining packages (SingleDM and MultiDM) will also lead to clarification and modification of many details in them.
Particle filtering and the geometric approach (Section 4.5) were not elaborated as part of the FProb package. Preliminary considerations indicate that these techniques fit in the proposed framework and can be easily added by specialization of the basic classes in the Prob package.
Computational efficiency issues were mostly neglected in this text. These issues are very important in practical implementations and many clever speedups have been proposed for standard algorithms. However, the main purpose of this text was to prepare a framework for the development of new algorithms. Therefore, the main concern of this analysis was to keep the software structures as close to the theory as possible. Nevertheless, implementation of computationally optimized algorithms should be straightforward due to the object-oriented approach.
Communication between participants must be synchronized using a finite-state protocol. Creation of such a protocol is essential for experiments with MP scenarios. It seems reasonable to implement the standard used in multi-agent systems, known as the request interaction protocol, http://www.fipa.org/specs/fipa00026/.
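The initiator side of such a protocol can be sketched as a finite-state machine following the message flow of the FIPA request interaction protocol (a sent request is answered by refuse or agree; an agreed request ends with failure, inform-done or inform-result). State and class names are illustrative:

```python
class RequestInitiator:
    """Initiator side of a request interaction, as a finite-state machine.
    After the initial 'request' is sent, the participant answers 'refuse'
    or 'agree'; an agreed request ends with 'failure', 'inform-done' or
    'inform-result'."""
    TRANSITIONS = {
        ("waiting", "refuse"): "done",
        ("waiting", "agree"): "working",
        ("working", "failure"): "done",
        ("working", "inform-done"): "done",
        ("working", "inform-result"): "done",
    }

    def __init__(self):
        self.state = "waiting"          # 'request' has just been sent
    def receive(self, performative):
        key = (self.state, performative)
        if key not in self.TRANSITIONS:
            raise ValueError("protocol violation: %r in state %r"
                             % (performative, self.state))
        self.state = self.TRANSITIONS[key]
        return self.state
```

Because every message outside the transition table raises an error, such a machine makes protocol violations between participants immediately detectable.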
FPD solution for the conditionally independent models (e.g. the VB approach, Section 4.3.4) has not been elaborated yet. Elaboration of this step with elements from the exponential family should be relatively straightforward. This result, if computationally tractable, could be extremely useful for the design of participant negotiation strategies.
Index
actions, 1
attributes, 11, 15
Bayesian decision making, 17
Bayes rule, 19
BNT, 8, 10
Chain rule, 19
channel, 48, 90
communication, 2, 59, 60
conditional independence, 19
conditioned on, 19
conditioning symbol, 19
conjugacy, 35
DAEP, 101
decision maker, 1, 17
DESIGNER, 47
determinant, 20
direct merging, 26, 61, 73
distributed DM, 1
DM horizon, 25, 29
DM strategy, 1, 21, 23, 25
Dynamic DM, 1
empirical density, 60
environment, 2
estimate, 73
estimate of internals, 22
estimation, 23
Expectation, 20
exponential family, 33, 80
feedback, 1
fictitious data, 94
Framework, 6
fully probabilistic design, 58
fully probabilistic design (FPD), 23
ideal pdf, 24
Implementation, 6
indirect merging, 26, 61, 73
internal model, 21, 23, 30, 42, 71, 83
internal variable, 21
Jacobian, 33
Kullback-Leibler (KL) divergence, 20
learning, 1, 20
learning data, 54
likelihood function, 23
linear-in-parameters, 34
Marginalization, 19
Markov chain, 34
merged pdf, 26
merging, 26, 35
Mixtools, 8, 9
model structure, 22, 53
multiple-participant decision making (MP DM), 2
negotiation, 59, 61
Normalization, 19
normalization factor, 23
object-oriented (OO), 7, 11, 13
observation model, 21, 30, 33, 45, 74
observed data, 70
operations, 11
optimal DM strategy, 20
parameter, 23
partial VB-observation model, 45, 86
partial VB-observation models, 87
participant, 2, 18
pdf, 18
pdfs, 71
Pdf of transformed variables, 20
prediction of internals, 22
predictor, 74
projection, 36
proportion sign ∝, 19
random variable, 18, 66
realization, 18
source pdfs, 26
specialization, 15
user, 7, 47
VB-conjugate, 45