UNIVERSITY OF WEST BOHEMIA
FACULTY OF APPLIED SCIENCES
Software Analysis of Bayesian Distributed Dynamic Decision Making
A thesis submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Prague, 2005 Václav Šmídl
Summary
Decision making is an active and purposeful selection of actions among several alternative options.
For humans, DM is a natural part of everyday life. Bayesian theory provides a rigorous and
consistent tool that helps the decision maker select the best action to achieve his aim. A significant
application area of decision-making theory is control theory. Most applications of the
theory are based on two assumptions: (i) the optimal decision is the only action that intentionally
influences the response, (ii) the decision maker pursues only one aim, which is known a priori. A
theory of distributed Bayesian decision making, which relaxes the above-mentioned assumptions, is
still under development.
This thesis is a contribution to a wider project aimed at creating a consistent theory of distributed
Bayesian decision making, using the concept of multiple-participant decision making. The main
concern of this work is the preparation of a new software framework for development and application
of the Bayesian distributed decision-making theory. In order to achieve this aim, we have done the
following:
Chapter 2: the requirements on the resulting software were formalized, and the most prominent
freely available software packages were reviewed in the light of these requirements. It was
concluded that none of the packages is suitable for our needs and that it is necessary to create
a new one. We have chosen the object-oriented (OO) approach as the design method of the
toolbox. Since the main development platforms for the project are Matlab and ANSI C, we have
proposed a novel approach to the implementation of OO software in these tools.
Chapter 3: the basics of Bayesian decision-making theory were reviewed in this Chapter. We
presented well-known results, as well as newly emerging methods, and translated them into a
sequence of basic probabilistic operations suitable for software implementation.
Chapter 4: it is well known that the Bayesian theory of decision making is computationally tractable
only under certain assumptions. Many approximate techniques have been developed for model
families for which general Bayesian DM is not analytically tractable. These techniques were
also reviewed in this Chapter. Special attention was paid to the Variational Bayes technique,
which is based on the assumption of conditional independence. The basic tasks of decision
making under this approximation were introduced.
Chapter 5: the basic steps of implementing DM theory in practice, gained from the experience
with single-participant DM, were reviewed in this Chapter. The majority of these steps is
concerned with translating real-world experience into abstract objects of the theory. The
algorithms of DM can be applied only when those objects are chosen and fixed.
Chapter 6: the basic practical steps of the design of single-participant DM were reviewed in the light
of the multiple-participant scenario in this Chapter. An original concept of Bayesian MP DM is
presented. It was shown that many sub-tasks of MP DM (such as merging)
have already been addressed in the design tasks of single-participant DM.
Chapter 7: the core contribution of the thesis, i.e. the analysis of a new-generation software framework,
is presented in this Chapter. Since all tasks of DM are implemented in terms of probability
calculus, the most challenging task was to design the basic classes for random variables, functions
and pdfs. The chosen approach appears very promising, as it embraces the classical
models, as well as the new approximate models based on conditional independence.
Conclusions and suggestions for further work are presented in Chapter 8.
Acknowledgement
This thesis was prepared during my study at the Faculty of Applied Sciences, University of West Bohemia
in Pilsen, Czech Republic. It is based on research work carried out in the Adaptive Systems
Department, Institute of Information Theory and Automation, Academy of Sciences of the Czech
Republic.
I am grateful to Ing. Miroslav Kárný, DrSc., and Doc. Ing. Jirí Cendelín, CSc., my
supervisors for this thesis, for their support and inspiration.
I would like to thank my family, for their love and support over the years.
The financial support of the projects GACR 102/03/0049 and AVCR 1ET 100 750 401, BADDYR,
is gratefully acknowledged.
Contents
Summary i
Acknowledgement iii
Notational Conventions xi
List of Acronyms xiii
1 Introduction 1
1.1 The theory of multiple participant DM . . . . . . . . . . . . . . . . . . . . . . 2
    1.1.1 Basic nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
    1.1.2 Bayesian approach to MPDM . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Aim of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Problem Formulation 5
2.1 Purpose of the software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
    2.1.1 Software framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
    2.1.2 Software implementation . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 State of the art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
    2.2.1 Mixtools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
    2.2.2 BNT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
    2.2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Object-oriented approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
    2.3.1 Basic principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
    2.3.2 Survey of OO languages . . . . . . . . . . . . . . . . . . . . . . . . . 12
    2.3.3 Legacy software tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
    2.3.4 Object-oriented approach in Matlab and ANSI C . . . . . . . . . . . . 13
    2.3.5 Unified Modelling Language (UML) . . . . . . . . . . . . . . . . . . . 15
        2.3.5.1 Class diagram . . . . . . . . . . . . . . . . . . . . . . . . . 15
        2.3.5.2 Sequential diagram . . . . . . . . . . . . . . . . . . . . . . 16
3 Theory of Decision Making 17
3.1 Bayesian formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
    3.1.1 Basic nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
    3.1.2 Probabilistic calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
        3.1.2.1 Basic elements . . . . . . . . . . . . . . . . . . . . . . . . . 18
        3.1.2.2 Operations on pdf . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Dynamic learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
    3.2.1 Probabilistic models: description of reality . . . . . . . . . . . . . . . 21
    3.2.2 Bayesian filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
    3.2.3 Bayesian estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Dynamic design of control strategy . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Merging of pdfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
    3.4.1 Direct merging of pdfs . . . . . . . . . . . . . . . . . . . . . . . . . . 26
    3.4.2 Indirect merging of pdfs . . . . . . . . . . . . . . . . . . . . . . . . . 27
4 Feasible Decision Making 29
4.1 Linear state-space models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
    4.1.1 Dynamic learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
    4.1.2 Fully probabilistic design . . . . . . . . . . . . . . . . . . . . . . . . 30
    4.1.3 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Time-invariant exponential family models . . . . . . . . . . . . . . . . . . . 33
    4.2.1 The models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
    4.2.2 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
    4.2.3 Fully probabilistic design . . . . . . . . . . . . . . . . . . . . . . . . 35
    4.2.4 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Distributional approximations . . . . . . . . . . . . . . . . . . . . . . . . . . 36
    4.3.1 Certainty equivalence approximation . . . . . . . . . . . . . . . . . . 37
    4.3.2 Laplace’s approximation . . . . . . . . . . . . . . . . . . . . . . . . . 38
    4.3.3 Fixed-form minimum distance approximation . . . . . . . . . . . . . . 38
    4.3.4 Variational Bayes (VB) approximation . . . . . . . . . . . . . . . . . 39
    4.3.5 Markov Chain Monte Carlo (MCMC) approximation . . . . . . . . . . 41
4.4 Approximate Bayesian filtering . . . . . . . . . . . . . . . . . . . . . . . . . 41
    4.4.1 Forgetting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
    4.4.2 Variational Bayes filtering . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Approximate estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
    4.5.1 Bayes-closed approximation . . . . . . . . . . . . . . . . . . . . . . . 43
    4.5.2 Projection based approach . . . . . . . . . . . . . . . . . . . . . . . . 44
    4.5.3 On-line Variational Bayes . . . . . . . . . . . . . . . . . . . . . . . . 45
4.6 Approximate design of DM Strategy . . . . . . . . . . . . . . . . . . . . . . 46
5 Practical Aspects of Decision Making 47
5.1 Problem description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2 Prior elicitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
    5.2.1 Elicitation of prior pdf from one source . . . . . . . . . . . . . . . . . 51
    5.2.2 Merging of knowledge sources . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.4 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.5 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
    5.5.1 Validation with fixed cutting moment . . . . . . . . . . . . . . . . . . 54
    5.5.2 Validation with multiple cutting moments . . . . . . . . . . . . . . . . 55
    5.5.3 Other techniques of model validation . . . . . . . . . . . . . . . . . . 57
5.6 Elicitation of ideal pdfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.7 Design of DM strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.8 Design validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6 Multiple Participant Decision Making 59
6.1 On-line (data-processing) stage . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.2 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.3 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.4 Negotiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.5 Design of MP decision-maker . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7 Software Image 65
7.1 Package Math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 Package Prob . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
    7.2.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
        7.2.1.1 Datatype: rv_id . . . . . . . . . . . . . . . . . . . . . . . . 67
        7.2.1.2 Class RV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
        7.2.1.3 Class RVfinal (RV) . . . . . . . . . . . . . . . . . . . . . . 67
        7.2.1.4 Class RVlist (RV) . . . . . . . . . . . . . . . . . . . . . . . 68
    7.2.2 Functions on random variables . . . . . . . . . . . . . . . . . . . . . 68
        7.2.2.1 Class function . . . . . . . . . . . . . . . . . . . . . . . . . 68
        7.2.2.2 Class ConstFn . . . . . . . . . . . . . . . . . . . . . . . . . 69
        7.2.2.3 Class LinearFn . . . . . . . . . . . . . . . . . . . . . . . . . 70
        7.2.2.4 Other classes . . . . . . . . . . . . . . . . . . . . . . . . . 70
    7.2.3 Observed data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
        7.2.3.1 Class DataSource . . . . . . . . . . . . . . . . . . . . . . . 70
    7.2.4 Probability density functions (pdfs) . . . . . . . . . . . . . . . . . . . 71
        7.2.4.1 Class mPdf . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
        7.2.4.2 Class ePdf . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
        7.2.4.3 Class oPdf (mPdf) . . . . . . . . . . . . . . . . . . . . . . . 74
        7.2.4.4 Class pPdf (mPdf) . . . . . . . . . . . . . . . . . . . . . . . 74
        7.2.4.5 Class ePdfFinal (ePdf) . . . . . . . . . . . . . . . . . . . . 75
        7.2.4.6 Class eEmp (ePdfFinal) . . . . . . . . . . . . . . . . . . . . 76
7.3 Package FProb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
    7.3.1 Linear state-space models . . . . . . . . . . . . . . . . . . . . . . . . 76
        7.3.1.1 Class QuadraticFn (LinearFn) . . . . . . . . . . . . . . . . . 77
        7.3.1.2 Class expQuadFn (QuadraticFn) . . . . . . . . . . . . . . . 77
        7.3.1.3 Class mNorm (mPdf) . . . . . . . . . . . . . . . . . . . . . 77
        7.3.1.4 Class oNorm (oPdf, mNorm) . . . . . . . . . . . . . . . . . 78
        7.3.1.5 Class pNorm (pPdf) . . . . . . . . . . . . . . . . . . . . . . 78
        7.3.1.6 Class eNorm (ePdfFinal) . . . . . . . . . . . . . . . . . . . 79
    7.3.2 Exponential family models . . . . . . . . . . . . . . . . . . . . . . . 80
        7.3.2.1 Class MultiIndexFn (function) . . . . . . . . . . . . . . . . 82
        7.3.2.2 Class eEF (ePdfFinal) . . . . . . . . . . . . . . . . . . . . . 82
        7.3.2.3 Class eGW_LD (eEF) . . . . . . . . . . . . . . . . . . . . . 82
        7.3.2.4 Class mDelta (mPdf) . . . . . . . . . . . . . . . . . . . . . 83
        7.3.2.5 Class mFrgEF (mPdf) . . . . . . . . . . . . . . . . . . . . . 83
        7.3.2.6 Class oEF (oPdf) . . . . . . . . . . . . . . . . . . . . . . . 84
        7.3.2.7 Class eMC (eEF) . . . . . . . . . . . . . . . . . . . . . . . 84
        7.3.2.8 Class pMC (pPdf) . . . . . . . . . . . . . . . . . . . . . . . 85
    7.3.3 Variational Bayes approach . . . . . . . . . . . . . . . . . . . . . . . 86
        7.3.3.1 Class oVBnet (oPdf) . . . . . . . . . . . . . . . . . . . . . . 86
        7.3.3.2 Class oVBpart (oEF) . . . . . . . . . . . . . . . . . . . . . 87
        7.3.3.3 Class eVBnet (ePdf) . . . . . . . . . . . . . . . . . . . . . . 87
        7.3.3.4 Class pVBnet (pPdf) . . . . . . . . . . . . . . . . . . . . . 88
7.4 Package SingleDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
    7.4.1 Class UserInfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
        7.4.1.1 Class DataInfo . . . . . . . . . . . . . . . . . . . . . . . . . 90
            7.4.1.1.1 ChnlInfo . . . . . . . . . . . . . . . . . . . . . . . 90
            7.4.1.1.2 FilterInfo . . . . . . . . . . . . . . . . . . . . . . 92
        7.4.1.2 Class PriorInfo . . . . . . . . . . . . . . . . . . . . . . . . 92
            7.4.1.2.1 PriKnInfo . . . . . . . . . . . . . . . . . . . . . . 92
        7.4.1.3 Class ModelInfo . . . . . . . . . . . . . . . . . . . . . . . . 92
        7.4.1.4 Class EFModInfo (ModelInfo) . . . . . . . . . . . . . . . . 93
        7.4.1.5 Class MValidInfo . . . . . . . . . . . . . . . . . . . . . . . 93
            7.4.1.5.1 Class ValInfo . . . . . . . . . . . . . . . . . . . . 93
            7.4.1.5.2 Class CuttingVInfo (ValInfo) . . . . . . . . . . . . 93
        7.4.1.6 Class IdealInfo . . . . . . . . . . . . . . . . . . . . . . . . 93
            7.4.1.6.1 Class IdealChInfo . . . . . . . . . . . . . . . . . 94
        7.4.1.7 Class DesignInfo . . . . . . . . . . . . . . . . . . . . . . . 94
        7.4.1.8 Class DValidInfo . . . . . . . . . . . . . . . . . . . . . . . 94
    7.4.2 Special purpose classes . . . . . . . . . . . . . . . . . . . . . . . . . 94
        7.4.2.1 Class FictOPdf (oPdf) . . . . . . . . . . . . . . . . . . . . . 95
        7.4.2.2 Class iPdf (ePdf) . . . . . . . . . . . . . . . . . . . . . . . 95
        7.4.2.3 Class Simulator (DataSource) . . . . . . . . . . . . . . . . 95
        7.4.2.4 Class Hypotheses . . . . . . . . . . . . . . . . . . . . . . . 96
        7.4.2.5 Class MVHypothesis (Hypothesis) . . . . . . . . . . . . . . 97
    7.4.3 Decision Makers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
        7.4.3.1 Class AdaptDM . . . . . . . . . . . . . . . . . . . . . . . . 97
        7.4.3.2 Class SingleDM (AdaptDM) . . . . . . . . . . . . . . . . . 98
7.5 Package MultiDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
    7.5.1 User Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
        7.5.1.1 Datatype DAEP . . . . . . . . . . . . . . . . . . . . . . . . 101
        7.5.1.2 Class MPUserInfo (UserInfo) . . . . . . . . . . . . . . . . . 102
        7.5.1.3 Class NeighInf . . . . . . . . . . . . . . . . . . . . . . . . . 103
        7.5.1.4 Class DAEPInfo (DataInfo) . . . . . . . . . . . . . . . . . . 103
    7.5.2 Special purpose classes . . . . . . . . . . . . . . . . . . . . . . . . . 103
        7.5.2.1 Class DAEPSource (DataSource) . . . . . . . . . . . . . . . 103
    7.5.3 Decision Makers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
        7.5.3.1 Class MultiDM (SingleDM) . . . . . . . . . . . . . . . . . . 104
8 Conclusion 107
8.1 Key contributions of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Notational Conventions
Linear algebra
a, A            all mathematical variables are assumed to be multivariate; no distinction is
                made between lower and upper case letters.
a_i             the i-th element of a multivariate variable a (a is assumed to be a vector).
a_{i,j}         the (i,j)-th element of a matrix a.
a_t, a_{i;t}    the variable a and the i-th element of a at time t, respectively; t is a letter
                reserved for the time index. If there is more than one letter in a subscript,
                the time index is always the last one and is separated from the others by a
                semicolon.
A'              transposition of a matrix A.
I_r             square identity matrix of dimensions r × r.
1_{p,q}, 0_{p,q} matrix of size p × q with all elements equal to one, zero, respectively.
tr(A)           trace of a matrix A.
δ(x)            delta-type function; its exact meaning is determined by the type of the
                argument x. If x is a continuous variable, then δ(x) is the Dirac delta
                function, ∫ δ(x − x_0) g(x) dx = g(x_0). If x is a discrete variable, then
                δ(x) is the Kronecker function, δ(x) = 1 if x = 0 and δ(x) = 0 otherwise.
Probability calculus
Pr(·)           probability of the argument.
f(·)            probability (density) function (pdf).
^I f, ^o f      due to the probabilistic description of DM, various pdfs may be defined on
                the same variables for different purposes. The purpose is denoted by the
                upper-left index, e.g. I, o. Otherwise, the meaning of the p(d)f is given
                through the name of its argument.
^w f            a pdf without explicit specification of its form; this type of pdf is used in
                functional optimization techniques.
x               denotes a random quantity (variable).
x*              denotes the range of x, x ∈ x*.
x̊               denotes the number of members in the countable set x*, as well as the
                number of elements in the multivariate variable (array) x.
≡               means equality by definition.
x_t             is the quantity x at the discrete time labelled by t ∈ t* ≡ {1, . . . , t̊};
                t̊ ≤ ∞ is called the (decision, learning, prediction, control) horizon.
x_{i;t}         is the i-th entry of the array x at time t, i = 1, . . . , x̊. The semicolon in
                the subscript indicates that the symbol following it is the time index.
x(k . . . l)    denotes the sequence of x_t with t between time moments k ≤ l, i.e.
                x(k . . . l) ≡ (x_k, . . . , x_l).
x(t)            simplified notation, x(t) ≡ x(1 . . . t); specifically, for t < 1, x(t) is an
                empty sequence.
x_[1], f_[2]    in multiple-participant settings, this form of subscript denotes affiliation of
                the given object with the participant identified by the number in brackets.
N(µ, s²)        Normal distribution with mean value µ and variance s².
G(α, β)         Gamma distribution with scalar parameters α and β.
U(·), U((α, β]) Uniform distribution on the argument set, on the interval (α, β], respectively.
List of Acronyms
AR AutoRegressive (model, process)
BNT Bayesian Networks Toolbox
DM Decision Making
EF Exponential Family
EM Expectation Maximization (algorithm)
KF Kalman Filter
KL Kullback-Leibler (distance)
MAP Maximum A Posteriori Probability
MCMC Markov Chain Monte Carlo
ML Maximum Likelihood
MP Multiple Participant
pdf probability density function
UML Unified Modelling Language
VB Variational Bayes
VEM Variational EM (algorithm)
1 Introduction
Decision making [1, 2, 3, 4, 5] is an active and purposeful selection of actions among several
alternative options. For humans, DM is a natural part of everyday life. In this text, we are concerned
with an abstract concept of decision making, without distinguishing whether the actions are chosen
by a human or a machine. Therefore, we use the neutral word decision maker for both options.
Dynamic DM arises when the decision maker is aware of dynamically delayed consequences of his
decisions, and takes these consequences into account in the DM process. Obviously, control [6, 7, 8,
9, 10, 11] can be viewed as a specific instance of dynamic DM, cf. the IEEE Series of Conferences
on Decision and Control.
Dynamic DM, and thus control, is always made under uncertainty caused by the decision maker's
incomplete knowledge of the mechanism relating the actions and their consequences. In fact, the
ever-present uncertainty is the real reason for feedback, i.e. modification of the decision maker's
actions using the observed data. Stochastic control theory [12] and, more generally, the theory of
statistical DM [13, 5] model this situation. They guide the design of a DM strategy, i.e. an optimal
sequence of rules which map the available knowledge, DM aims and observations onto actions.
Often, the knowledge is accumulated from observations made either before applying the DM
strategy or even during the course of actions [14, 15, 16]. Thus, a sort of learning becomes a
generic part of the DM strategy.
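This accumulation of knowledge can be written as the standard recursive Bayesian update (a generic sketch only; the exact operations are introduced in Chapter 3), where Θ denotes the unknown quantities and d(t) the data observed up to time t:

```latex
f\big(\Theta \mid d(t)\big) \;\propto\;
f\big(d_t \mid \Theta, d(t-1)\big)\, f\big(\Theta \mid d(t-1)\big).
```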
The following assumptions are typically made in the design of optimal DM strategies:
1. The optimized strategy is the only system that intentionally influences the optimized responses.
2. Typically, only one DM aim is given, and it is known a priori.
These assumptions are too restrictive for certain types of problems, for example multi-criterion
decision making [17], or cooperation of autonomous units (robot football) [18].
The first of the listed assumptions seems to be appropriate, since the optimized strategy
can handle multivariate actions. However, the computational and communication complexity,
inherent to DM under uncertainty, makes this assumption very restrictive. The DM strategies
designed under it are practically feasible only in relatively low-dimensional problems, and the
solution is far from scalable.
In practice, the problem is solved by decomposition of the whole DM problem, leading to a
necessarily approximate distributed DM. This methodology shifts the complexity boundary of
solvable cases much further, e.g. [19, 20, 21, 22]. At the same time, many problems are still
open, and there seems to be no commonly accepted methodology for approaching the solution.
Some problems seem to be of a conceptual nature [23].
The second assumption is often violated in practice and represents a real problem even in standard
DM [4, 19]. The violation is even more serious in distributed settings. It raises the complexity of
the DM problem and of its solution, as it requires solving coordination and negotiation problems.
Game theory [24] addresses the problem, but the assumption of fully rational players has already
been questioned [25], and there also exist serious conceptual problems concerning negotiation [26]
within the discussed formulation.
1.1 The theory of multiple participant DM
The above discussion indicates that there is an urgent need to create a realistic, scalable theory of
distributed dynamic decision making under uncertainty. We will also call this theory multiple-participant
decision making (MP DM).
This thesis is a contribution to a wider project aimed at creating a consistent theory of
MP DM.
In this Section, we introduce the background of this project, the adopted approach, concepts and
methods. The aim of the thesis, within the project, is defined in detail in the next Section.
1.1.1 Basic nomenclature
The transition from single- to multiple-participant decision making requires a new nomenclature.
In this Section, we relate the terminology used in this text to the terminology of single-participant
DM, which is also commonly used in control theory.
A single controller, as a prototype of a single decision maker, influences a part of the real world of
its interest, traditionally called the system. In the considered multiple-participant scenario, parts of
the system can be influenced by several controllers, the participants in the DM process. The
traditional understanding of "the system" loses its clarity, and it is reasonable to adopt the term
environment. This is, again, a part of the world, which can be influenced by any of the participants.
Each participant interacts with a part of the environment via (i) observations, and (ii) decisions.
This is illustrated in Figure 1.1.
The main distinction between single- and multiple-participant DM is the ability of the participants
to communicate with each other. If the participants are not aware of each other's presence, or do
not care about the others, they act as single decision makers, following their different and possibly
contradictory aims. In such a situation, their mutual effect is generically adverse and yields poor
overall performance.
1.1.2 Bayesian approach to MPDM
The intended Bayesian theory [27] treats the task of distributed DM as a task of DM with multiple
individual decision makers (participants), which have:
1. individual aims,
[Figure 1.1: Relation of single and multiple participant DM. Left panel: single participant DM,
with each controller exchanging data and actions with its own system. Right panel: multiple
participant DM, with communicating participants exchanging data and actions with a shared
environment.]
2. pre-determined abilities to observe, act, evaluate and communicate with other participants.
Thus, the problem is reformulated as many parallel single-participant DM tasks with non-standard
but very realistic assumptions. This approach guarantees a priori the full scalability of the distributed
DM.
Thus, many results from the single-participant DM theory can be used. The following ideas form
the basis of the approach:
• A normative (prescriptive) theory is sought. General results on DM under uncertainty
[1, 13, 5, 28] imply that the DM of each individual participant is to be guided by the Bayesian
DM paradigm.
• The rigorous, fully probabilistic formulation is used to design the DM strategy for a given
probabilistic model [29, 30, 28, 31].
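In a generic notation (a sketch only; the precise formulation of the fully probabilistic design is developed in Chapter 3), the admissible strategy is chosen so that the joint pdf f of the closed loop over the horizon t̊ is as close as possible, in the Kullback-Leibler sense, to a user-specified ideal pdf ^I f:

```latex
\operatorname{KL}\!\left(f \,\middle\|\, {}^{I}\!f\right)
\equiv \int f\big(d(\mathring{t})\big)\,
\ln \frac{f\big(d(\mathring{t})\big)}{{}^{I}\!f\big(d(\mathring{t})\big)}
\,\mathrm{d}\,d(\mathring{t})
\;\longrightarrow\; \min_{\text{admissible strategies}}.
```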
The extension of such a fully probabilistic decision maker to the multiple-participant scenario
should be formulated in the same (i.e. probabilistic) terms. This implies that the coordination
needed in the distributed setting reduces to reporting probabilities within a small set of reachable
neighbors. Therefore, two additional problems, which are not addressed by the centralized Bayesian
DM, have to be solved:
1. A mechanism for coordination of the actions of a participant with its neighbors has to be
designed. The solution of this problem is foreseeable: the participant has to share and harmonize
knowledge and aims with its neighbors. Using the fully probabilistic formulation of DM, the
problem reduces to merging and extension of probability distributions in the vein discussed in
[32].
2. The use of the above-mentioned communication mechanism can be seen as a new decision-making
problem to be solved by each participant. Therefore, each participant has to design
a corresponding strategy of communication. The paper [27] indicates that the number of such
strategy types is very limited (selfish, cooperating and hierarchically cooperating participants);
hence, the design of adequate strategies is feasible.
Preliminary results of this research can be found in [33].
1.2 Aim of the thesis
One of the goals of the project is to apply the emerging theory to a set of real problems. In order to
achieve this goal, it is necessary to create a reliable software image of the theory. None of the
available single-participant DM software frameworks is easily extensible to deal with
multiple-participant scenarios.
The aim of this thesis is to prepare a new software framework for development
and application of the Bayesian distributed decision making theory.
This task is challenging because the theory is not fully developed and stabilized. The software
should help in developing this theory and its parts as well as in transferring the results to various
application domains.
In order to achieve the overall aim, we define the following subtasks:
1. Formalize the requirements on the resulting software. These requirements can arise from the
considered application areas, theoretical background, or researchers involved in the project.
2. Review the available software tools in the light of these requirements.
3. Review the latest theoretical results and methods that should be supported by the framework.
4. Design the framework.
5. Demonstrate that the framework embraces state-of-the-art decision making problems.
These tasks will be addressed in the sequel as follows:
Chapter 2 defines the addressed problem. It contains formalization of the requirements, review of
the available software, and description of the chosen implementation.
Chapter 3 reviews the basic concepts of the Bayesian theory of decision making.
Chapter 4 analyzes the general theory from a computational point of view. Computationally feasible
models, i.e. models with an exact or approximate solution of the DM problem, are reviewed.
Chapter 5 introduces some aspects of DM that are important for application of the theory to a real
problem. Experience accumulated in long-term research of single-participant decision making
is reviewed here.
Chapter 6 discusses the implications of the extension to the multiple-participant setting.
Chapter 7 presents the main result of the thesis, i.e. the analysis of the software image of the theory
described previously.
2 Problem Formulation
The theory of statistical decision making was developed in [1], elaborated into an engineering form
by [14, 15], and updated in [28]. Translation of the theory into software is a challenging task, since
the theory describes the real world in terms of abstract mathematical structures, such as density
functions and functionals. The process of decision making is then defined in terms of operations on
these structures. The set of all possible structures is extremely rich, and operations over many of
them are not computationally tractable. Therefore, each attempt at a software analysis of the theory
has to, inevitably, restrict its scope to a certain, computationally tractable, sub-set. This initial
restriction is a very important step, since it represents a trade-off between (i) modelling abilities,
and (ii) simplicity of implementation of the software.
In this Chapter, we define the aim of the software and the requirements imposed on it. These
requirements will be used as guidelines in designing the software.
2.1 Purpose of the software
This work is being carried out within a collaborative research environment with a long history. Any
complex software resulting from long-term research naturally follows several, often contradictory,
aims. The first step of the design is, therefore, to identify these aims and their importance. They are
summarized here:
1. Inspection of the theory
The basic problems of DM were solved on the general level a long time ago. However, the
resulting operations were found to be computationally tractable only for small sub-sets of
mathematical models. For these feasible models (e.g. ARX models, state-space models, discrete
Bayesian networks) the DM has matured into reliable and practically applicable algorithms.
These disjoint sub-sets are bounded by analytical tractability. Worldwide, a lot of
effort has been directed at lifting these boundaries, and important advances have been made in
approximation theory and its application to DM. Therefore, we intend to review the state-
of-the-art techniques and draw new borders of feasibility.
2. Establish a basis for long-term research
In spite of the fact that general solutions of the DM problem have been known for a long time, there are—
and will be—many detailed issues that are not satisfactorily resolved. Therefore, the software
should be open to further extensions in such a way that the involved researchers will actively
deal with sub-parts of the software and act as passive users of the rest. Practically, the
software should consider even tasks (operations, functions) for which an algorithmic solution
is not yet known. If this can be achieved, it will lay the basis for long-term research, where
attention can be focused on a particular problem and not on re-implementation of the overall
framework.
3. Unification of existing software applied to a wide range of real-life problems
At present, there are many software packages implementing (to a certain level) the DM for a
particular class of models in particular application areas. The newly designed software should
be general enough to cover at least the same range of problems and, ideally, utilize as much
of the experience accumulated within these projects as possible.
To address these aims in more detail, it is useful to distinguish two principal parts of the software
package:
Framework is a general description of the distributed dynamic DM. It specifies (i) data structures,
and (ii) algorithms.
Implementation is a realization of the framework in a programming language.
The specification of the framework should be independent of its implementation. This will be achieved
by the use of a general modelling language in which the data structures and algorithms will be
described. Various implementations of the framework may arise. These implementations may be
application-specific, with different intellectual property rights. However, all implementations should
follow the framework specification in order to be mutually (almost) compatible.
2.1.1 Software framework
As mentioned in the previous Section, the framework will be shaped according to the theory. However,
the full generality of the theory cannot be captured by any software. Inevitably, we have to restrict
our support of the DM problem to a suitable class of mathematical models of the environment and
the aims of DM. Therefore, we seek a class of models that is as general as possible, but at the
same time computationally tractable and applicable in real life.
In this Section, we summarize the necessary requirements on candidate families:
Requirement 2.1 (Requirements on the software framework) The considered framework should
support:
1. Multivariate dynamic models; the environment we intend to work with is expected to have:
• both discrete and continuous variables, with mutual dependencies between them,
• a dynamic nature, i.e. present behavior depends on previous observations.
6
2.1. PURPOSE OF THE SOFTWARE
The chosen class of mathematical models must support these properties.
2. Lego-like concept; the software framework should provide basic building blocks that can be
seamlessly composed into complicated structures. These blocks should:
• cover data structures corresponding to basic structural elements in the theory,
• include composition tools corresponding to operators in the theory,
• allow easy addition of new types of all elements (within the framework).
Partially composed elements should be ready for particular tasks. This may be achieved using
an object-oriented (OO) approach to software design.
3. Design of DM strategies; the nature of MPDM requires participants to design their DM strategies
as follows:
• Since the aims of the DM may be changed on-line, each participant must be able to
re-evaluate its strategy recursively.
• Communication with other participants is also a DM problem, hence, each participant
may change its communication strategy at any time.
4. User interface; a user is a human being who determines the desired behaviour of the whole
MPDM scenario. Therefore, the software framework should provide tools for interaction with
non-expert users:
• Description of the DM problem, i.e. its aims, available knowledge, used model and constraints,
should be made in the user's terms, independently of the processing method.
• Presentation of results has to be close to the application domain.
• Processing outputs have to support the “publication-technological” line.
• The choice among alternative processing methods and the corresponding tuning knobs has to
be left to experts only. Meaningful defaults have to be built in.
2.1.2 Software implementation
The software is to be used in full-scale applications, which induces high requirements for quality and
maturity of the code. The following points seem to be indispensable.
Requirement 2.2 (Requirements on implementation) The supported development platform should
be:
1. numerically stable and efficient; which is important in industrial applications,
2. portable to a wide range of platforms; it should run on anything from a supercomputer to an
industrial micro-controller,
3. suitable for implementation of object-oriented algorithms; which is necessary for seamless
implementation of the framework, which will be defined using an object-oriented methodology,
4. able to reuse the code that is already available; most of the development in the area was done
in Matlab and C. The new tool should be easily connectible to these tools,
5. economically affordable; applications of the framework in not-for-profit organizations are also
considered; therefore, we should not rely on any expensive proprietary tools,
6. user-friendly; allowing easy testing of new algorithms and tuning knobs.
Traditionally, the development was done sequentially in: (i) Matlab, for rapid development and
testing, (ii) pure ANSI C, for portability and implementation in Matlab-free applications, and (iii)
Mex files, for connectivity of Matlab and ANSI C. This chain will be discussed later in Section 2.3.3.
2.2 State of the art
In this Section, we review the existing solutions in the light of the requirements described above.
We have tested software packages from the following research areas:
Optimal advising: package Mixtools,
http://guest:[email protected]:1800/svn/mixtools/ ,
Bayesian networks: package BNT,
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html ,
Graphical models: project gR,
http://www.r-project.org/gR/ ,
Bayesian neural networks: project fbm,
http://www.cs.toronto.edu/~radford/fbm.software.html ,
Nonlinear filtering: project nftool, http://control.zcu.cz/nftools/ ,
and rebel, http://choosh.ece.ogi.edu/rebel/ ,
Bayesian decision making: project IND,
http://ic.arc.nasa.gov/ic/projects/bayes-group/ind/ ,
Multi-agent systems: project JADE, http://jade.tilab.com/ ,
this project is one example from the family of multi-agent systems. Multi-agent systems form
a large area of research (www.multiagent.com ), with defined standards for interoperability
(http://www.fipa.org/ ). However, the specification of agents does not provide any
guidance on the choice of the decision-making methodology. Therefore, it cannot be compared
with other tools from this point of view. The value of multi-agent systems—from our
point of view—is in the definition of communication protocols, such as the request interaction
protocol, http://www.fipa.org/specs/fipa00026/ .

project     multivariate     lego-like  recursive  user        implementation
            dynamic models   concept    DM         interface
Mixtools    +/−              +/−        +          +           Matlab, C
BNT         +                +          +/−        +           Matlab, C++
gR          +                +          −          −           S
fbm         +                +/−        −          −           ANSI C
nftool      +/−              +/−        −          −           Matlab
IND         −                −          +/−        −           C

where + denotes full support, +/− partial support, and − no support of the given feature.

Table 2.1: Review of available software packages for Bayesian decision making.
Here, we have selected only the most advanced, freely available tools. There are many more software
projects in the areas of state-space modelling and Bayesian estimation, see e.g. the survey at
http://leuther-analytics.com/bayes/free-bayes-software.html . However, the number
of tools for Bayesian decision-making is rather limited. Evaluation of all projects with respect to
our requirements (Requirements 2.1 and 2.2) is briefly summarized in Table 2.1.
Most of the projects are implemented in Matlab or C, a combination that will be studied in
Section 2.3.3. However, the implementation is the less important factor, since most of the packages fail
our requirements on the framework. All the studied tools have some advantages and disadvantages;
the most common disadvantage is that the application area of each tool is too narrow for our purpose.
All projects do well within their field of expertise, but none of them fulfills all of our requirements.
From those tools, we chose Mixtools and BNT for further detailed analysis, since these meet
most of our requirements.
2.2.1 Mixtools
Mixtools is a Matlab Toolbox developed in our department specifically for the purpose of optimal
Bayesian advising [34, 28, 35].
The model: The basic observation model considered within this toolbox is a mixture of ARX models
(autoregressive models with exogenous input). Both continuous and discrete observations
are supported via mixtures of Gaussian and Markov-chain regression models, respectively.
Lego-like concept: The basic structure is a mixture of ARX (or Markov) models. It is possible
to define new types of components in the mixture; however, it requires a relatively large amount
of effort. Composition tools are ready only for the creation and manipulation of mixtures. Another
disadvantage of this package is its centralized data-handling mechanism.
Decision making: the DM strategy is designed using the fully probabilistic approach. This approach
is capable of both recursive evaluation of the DM strategy and formalization of the communication
strategy as a DM problem.
User interface: many tools supporting non-expert users are available.
Implementation: the toolbox is implemented in the Matlab, Mex, and ANSI C programming environments
(see Section 2.3.3). This type of implementation ensures both (i) ease of development
within Matlab, and (ii) portability and industrial applicability through ANSI C.
2.2.2 BNT
The BNT package implements state-of-the-art algorithms for Bayesian networks [36]. This toolbox
has two principal distinctions from other tools typically used in the area of graphical and Bayesian-
network modelling. First, autoregressive (i.e. dynamic) models are considered. Second, it includes
basic support for decision making.
The model: the basic model is a dynamic Bayesian network. In principle, it is a pdf restricted
by assumptions of conditional independence between some variables. These assumptions
are described in terms of a graph, where pdfs are nodes and edges denote mutual
dependence of nodes. Currently, the following types of nodes are supported: Gaussian models,
hidden Markov models, perceptron neural networks, and discrete models.
Lego-like concept: the conditional independence assumption is an excellent tool for the separation
of basic building blocks and their composition. Namely, the nodes of the graph represent the
basic building blocks, and the graph (network) of their dependencies is the composition tool.
This way, complex structures can be easily created.
Decision making: is made via so-called utility functions, which are assigned to each node in the
graph. This mechanism supports both recursive evaluation of the DM strategy and formalization
of the communication strategy as a DM problem. However, it is readily available only for
one-step-ahead DM. Extension to a longer DM horizon is possible; however, it would be
computationally inefficient, since it requires building the network for all variables within the DM
horizon.
User interface: is quite limited. Only expert users of the toolbox are supported. However, additional
projects are trying to fill the gap, such as the BNT editor, http://bnt.insa-rouen.fr/BNTEd.html .
Implementation: the toolbox is primarily written in Matlab; however, a preliminary port to C++
is available from Intel, http://www.intel.com/research/mrl/pnl/ .
2.2.3 Summary
None of the currently available toolboxes for Bayesian decision making matches our requirements
(Requirements 2.1 and 2.2). Therefore, in this text, we develop an analysis of the desired toolbox
for distributed Bayesian decision making.
We intend to exploit as much of the experience accumulated in the current software packages as possible.
We will draw inspiration from both the Mixtools and BNT packages. Mixtools is more mature
in its technology of implementation and design of DM strategies. On the other hand, the range of
models supported by BNT is impressive and unrivaled.
2.3 Object-oriented approach
The basic requirements on the framework—namely extensibility, flexibility and intuitiveness of its
use—have been raised in computer science many times before. One approach that was designed to
meet these requirements is known as the object-oriented (OO) approach [37].
2.3.1 Basic principles
The OO approach introduces the following principles:
Encapsulation: binds together the data fields and the relevant methods. Data fields—known as
attributes—of an object can be accessed or modified only by the corresponding procedures—known as
operations—that are encapsulated in the same object.
This principle assures flexibility of the code, because the implementation of an object can change
(within reasonable bounds) without changing the interface visible to callers.
Inheritance: a new object is defined as an extension of another, already existing, object. The new
object inherits the attributes and operations of the old one; however, it is free to redefine the original
operations or declare new ones.
This principle assures reuse of the existing code. It also simplifies maintenance of the code
and enhances its readability.
Polymorphism: is the ability to work with similar but different objects as if they were the same.
This principle enhances accessibility of the code for non-expert programmers, since the number
of concepts and identifiers is significantly reduced.
These principles can be seen as guidelines for the definition of the framework. Moreover, many
programming languages (OO languages) have been designed with explicit support for these principles,
which means that the use of the principles is enforced by the compiler.
            num. stability   portability  OO        code   economically  user
            and efficiency                approach  reuse  affordable    friendly
Matlab      +/−              +/−          −         +      −             +
ANSI C      +                +            −         +      +             −
C++         +                +/−          +         −      +             −
JAVA        −                +/−          +         −      +             +/−

Table 2.2: Comparison of programming languages.
2.3.2 Survey of OO languages
We have tested several OO languages in the light of our requirements on implementation (Requirement
2.2), and compared them to our traditional languages: Matlab and ANSI C. The results are
summarized in Table 2.2. Here, we comment on each of the studied languages:
Matlab: has some support for the object-oriented approach; however, this support is very poor
and inefficient. Therefore, we cannot consider Matlab as ready for the OO approach. Its main
attraction is user-friendliness; its main drawbacks are the lack of computational efficiency and
economical affordability.
Java: Java is a popular OO programming language. It is well supported in the Matlab environment;
namely, Java classes can be called from Matlab. Its main attraction is its support of the OO approach
and connectivity to Matlab. Its main drawback is computational efficiency.
The following experiment with the JAMA library (http://math.nist.gov/javanumerics/jama/ )
was performed on a Pentium 400 MHz:
Test: 100 multiplications of 100×100 matrices.
Results: Matlab 0.8 s, JAMA 4 s, JAMA called from Matlab >10 s.
ANSI C: is a low-level programming language. Its main advantages are computational efficiency and
portability. Its main drawbacks are the lack of support for the OO approach and of user-friendliness.
However, the latter can be remedied by interoperability with Matlab via the technology of Mex
files.
C++: is a re-design of the C language to support the OO approach. Therefore it has all the advantages
of ANSI C, except for portability and connection with Matlab. Its main drawback is
therefore the lack of user-friendliness.
The overall conclusion is that none of the above languages is suitable for our needs. We have to
use a combination of languages to meet most of our requirements. In this situation, the requirement
on continuity of research starts to be the dominant factor in the selection process. Combination of
Matlab and ANSI C via Mex files has a long tradition within our research environment. Therefore,
we will continue to use this tool-chain.
This combination, however, does not support the OO approach. As mentioned in Section 2.3.1,
the basic principles of the OO approach can be implemented even without direct support from the
programming environment. This increases the manual labor associated with coding; however, we
believe that it will pay off in better computational efficiency of the resulting software.
2.3.3 Legacy software tools
In this Section, we review the process of software design used in the previous projects. The software
is implemented in three parallel code-bases:
1. Matlab M-files
2. Matlab Mex-files
3. pure ANSI C
These parallel implementations are bound together by the common specification of the framework.
They should provide identical results (within the limits of numerical accuracy) for identical
models.
The parallel maintenance of three different code-bases is labour-expensive, but it has the following
advantages:
Matlab is used as a platform for rapid development. It is user friendly, it has many visualization
tools, and users are familiar with it.
ANSI C is used as a platform for final implementation. It is chosen for numerical efficiency, portability
and applicability in industry (where Matlab is too expensive).
Mex-files provide a convenient bridge between the two environments.
This strategy of development was successfully used in many projects (ABET [38], ProDaCTool [34],
DESIGNER [39, 40], etc.). Many tools for preserving consistency of the parallel implementations
were developed. However, this strategy imposes strong demands on the consistency and clarity of the
framework.
2.3.4 Object-oriented approach in Matlab and ANSI C
Experiments with the implementation of object-oriented (OO) principles in the ANSI C language were
already presented in [41], [42]. The published approaches cannot be used in our Matlab-centered
environment; however, they motivated us to implement basic OO support in the currently used, well
tested and reliable tool-chain (Matlab–Mex–ANSI C). The missing OO features can be emulated by
extra tools and coding agreements.
Agreement 2.1 (OO programming in Matlab and C) The following coding agreements establish
basic support for OO approach in Matlab, Mex, and ANSI C:
1. Objects are represented by Matlab data structures with a compulsory field type for unique
identification of the class the object belongs to. Attributes of the object are fields in the structure
of the corresponding type. Operations of the object are also represented by fields in the
structure. The name of the field is the same as the name of the function, and it is of the function_handle
type (or, for computational speed, an index into a global table of function_handles).
2. Operations on objects are Matlab functions which treat their first argument as the object they are
encapsulated in. Execution of the function then extracts the corresponding function_handle
from the object structure and uses it to call the function which is appropriate for the given object.
These simple rules ensure consistent implementation of basic OO properties as follows:
Encapsulation: is achieved by storing the object attributes and the corresponding functions (func-
tion handles) in one structure.
Inheritance: is achieved at the stage of construction of an object. The constructor calls the constructor
of the parent object first; hence the attributes and operations of the parent are created.
After that, the constructor can add new attributes and operations, or rewrite the handle of an
existing operation with another one.
Polymorphism: is achieved by the globally-defined Matlab functions implementing object-related
operations.
The approach presented above is not a full-featured OO tool. It has many flaws and limitations, the
most important of which are:
1. A distinction between public and private methods is missing. Therefore, access to various
fields in data structures is subject to the discipline of the developers.
2. Operations belonging to different classes must all be globally available. Therefore, they must
be implemented under different names; pointers to these functions (i.e. function handles) are
stored in the corresponding structures and later called via the generic function. These functions
must not be called directly by any other method. Since there are no tools to assure this, we
have to rely on the discipline of the developers.
3. Matlab has limited support for parameter checking; therefore, checks for consistency must
be done inside every function.
It is obvious that this approach is much more labor-intensive than built-in support from a decent OO
language. However, this is considered a reasonable trade-off between (i) ease of implementation,
and (ii) code reuse, availability of the full power of the Matlab tools, and control of low-level numerical
aspects. It should be remembered that the number of implemented classes and their methods is
expected to be relatively low, and therefore maintainable.
Figure 2.1: Introduction of UML notation: class diagrams.
2.3.5 Unified Modelling Language (UML)
The Unified Modelling Language (UML) [43] is a widely adopted, powerful graphical language for
object-oriented modelling of the real world and subsequent design of its software representation. UML
is a graphical language that is independent of the computer language used for actual coding. In fact,
many tools supporting the UML methodology have the ability to export the UML-described project into
a chosen programming language. Therefore, it is a natural choice for the description of the framework
(Section 2.1.1) of the designed software.
One of the key features of UML is its universality. It is used as a tool for modelling of banking
systems, Internet applications, data-store applications, and many others. The price paid for this
universality is the complexity of the language. It offers a range of tools, diagram types and scenarios
that can be used for modelling of specific processes. In this text, we will use only a small sub-set
of UML tools to describe the framework. Details of algorithmic implementation will be described
by “pseudo-code”. Namely, we use only two diagram types: (i) the class diagram, and (ii) the sequential
diagram.
All names of software structures are printed in bold typeface.
2.3.5.1 Class diagram
The class diagram is used for the description of the structure of the software, i.e. the definition of
data types and object classes with their attributes and operations. The graphical semantics of class
diagrams is illustrated on a simple example in Figure 2.1.
Datatype is a basic structural element; it can have a complex inner structure, which is, however,
irrelevant in the modelled context. Therefore, each datatype is fully determined by its name.
Class is another type of structural element, which is composed of: (i) attributes, which can be
datatypes (a1 and a2 in parent) or instances of a class (a3 in child), and (ii) operations, which
have access to all attributes of the class, accept additional parameters, and possibly return a value
in the form of a datatype or class instance (e.g. operation o1 in parent accepts argument par and
yields an integer return value).
The arrow from class child to class parent means that the class parent is a generalization of
the class child. Conversely, we say that child is a specialization of parent. In practice, it means
that child has all attributes and operations of parent (i.e. a1, a2, o1). Attributes and operations
defined in child (i.e. a3, o2) are additional to the inherited ones. If an operation is defined again
(i.e. o1 in child), the new operation replaces the original operation of the parent. In such a case,
only the name of the method is displayed in the graphical notation (see o1 in child); its return type and
parameters are the same as those of the inherited method.

Figure 2.2: Introduction of UML notation: sequential diagrams.
2.3.5.2 Sequential diagram
Sequential diagrams are used for the description of processes and procedures. The graphical semantics of a
sequential diagram is illustrated on a simple example in Figure 2.2. It describes interaction between
instances of classes (known as objects) in certain situations.
Graphically, the life of each object is represented by a vertical dashed line. The vertical direction
denotes the time-arrow: the sequence starts at the top of the diagram and ends at its bottom.
Horizontal arrows denote function calls, where the arrow leads from the caller towards the called
object. Each call is named after the operation it invokes on the called object. The actual computation
within the operation is visualized by a thin rectangle on the life-axis of the object. When the operation
is finished, it returns its results back to the original caller. This is known as a synchronous message
in the standard UML. In our work, we will use only this type of interaction. A sample sequential diagram
with objects P and C—being instances of classes parent and child, respectively—is displayed
in Figure 2.2.
3 Theory of Decision Making
In this Chapter, we present the general decision-making theory. The aim of this theory is to help the
decision maker to select one action from all available options. These options are relevant to a system
(i.e. a part of the real world) in two ways: (i) decisions on the description of the system, and (ii) decisions
influencing the system. The purpose of this chapter is to summarize the principles of decision making
and to identify the key tools that are to be mapped onto the designed software.
The adopted principle of the optimal decision-making under uncertainty (Section 3.3) implies the
following important conclusion:
Incomplete knowledge and randomness have the same operational
consequences for decision-making.
Therefore, they should be treated in the same way. This is known as Bayesian decision making.
The basic formalism of Bayesian DM is presented in Section 3.1, together with a review of basic
probability calculus.
Typically, the dynamic decision-making problem is decomposed into the following
sub-problems: (i) Bayesian learning [44], summarized in Section 3.2, and (ii) the design of the
optimal strategies [29, 28], summarized in Section 3.3. However, the intended extension of the problem
to multiple participants requires a new operation, namely: (iii) merging of information, as outlined
in Section 3.4.
3.1 Bayesian formalism
The conventions presented here are mostly respected in this work. If some exception is necessary, it
is explicitly explained and used just at the place of its validity. If some verbal notions are introduced
within the bodies of Propositions, Remarks etc., then they are emphasized by a typeface that differs from
that of the surrounding text. The basic notational symbols and rules are summarized in the table of
notational conventions on page xi.
3.1.1 Basic nomenclature
A brief characterization of the introduced notions is summarized here.
Random variable is a mapping with a numerical range, i.e. a subset of a multi-variate,
real-valued space.
Realization is a value of the random variable. Often, the random variable and its realization are
not formally distinguished, as is usual in the applications of probability theory. The proper
meaning is determined by the context.
Participant is an abbreviation for a participant of the decision making process. It might be a
person, mechanism, or group of persons or mechanisms.
Environment is part of the world that is of interest for a participant who should either (i) describe,
or (ii) influence it. The environment is specified with respect to the aim that the participant
wants to reach and with respect to the tools it has available.
Decision is the value of a random variable that can be directly chosen by the participant for reach-
ing its aims.
Decision rule is a mapping that transforms knowledge of a participant into a decision.
Strategy is a sequence of decision rules.
Traditionally, decision-making strategies are distinguished into two categories based on the type
of decisions they make.
Controller is a causal strategy assigning inputs that influence the environment.
Estimator is a causal strategy evaluating decisions about the description of the system.
3.1.2 Probabilistic calculus
Uncertainty in the applied DM theory [28] is described by probability density functions (pdfs). In this
Section, we review the basic calculus with pdfs. A more detailed and formal treatment can be found in
[45].
3.1.2.1 Basic elements
Probability density function (pdf) is a function f(x) of a random variable x with the
following properties:
Non-negativity: f(x) ≥ 0,
Normalization: ∫ f(x) dx = 1.
Probability mass function is a pdf of a discrete argument. In this text, no formal distinction
between a pdf and a probability mass function is needed. We will use pdf even for discrete arguments.
In this way, a significant simplification and unification of all formulas can be achieved.
One only has to keep in mind that the integration has to be replaced by regular summation
wherever the argument is discrete.¹
For the simplicity of explanation, we distinguish the following special cases of pdfs. Consider a
generic pdf, f(ρ), on a multivariate random variable ρ ≡ (α, β, γ).

joint pdf f(α, β|γ) of α, β conditioned on γ:
is a pdf on (α, β)∗ restricting f(ρ) to the cross-section of ρ∗ given by a fixed γ.

conditional pdf f(β|α, γ) of β conditioned on α, γ:
is a pdf on β∗ restricting f(ρ) to the cross-section of ρ∗ given by a fixed α, γ.
The conditioning symbol | is dropped if just trivial conditions are considered.

marginal pdf f(α|γ) of α conditioned on γ:
is a pdf on α∗ obtained from f(ρ) for a fixed γ, with no information on β.

conditional independence: variables α and β are independent under the condition γ iff

f(α, β|γ) = f(α|γ)f(β|γ). (3.1)
3.1.2.2 Operations on pdf
For a generic pdf with multivariate argument, f(ρ) = f(α, β, γ), ρ = (α, β, γ), we define the
following operations:

Normalization

∫ f(α, β|γ) dα dβ = ∫ f(α|β, γ) dα = ∫ f(β|α, γ) dβ = 1.

Chain rule

f(α, β|γ) = f(α|β, γ)f(β|γ) = f(β|α, γ)f(α|γ).

Marginalization

f(β|γ) = ∫ f(α, β|γ) dα. (3.2)

Bayes rule

f(β|α, γ) = f(α|β, γ)f(β|γ) / f(α|γ) = f(α|β, γ)f(β|γ) / ∫ f(α|β, γ)f(β|γ) dβ ∝ f(α|β, γ)f(β|γ). (3.3)

The proportion sign, ∝, means that the factor independent of β and uniquely determined by
the normalization is not explicitly written in the equality represented.
¹This can also be achieved by employing measure theory, operating in a consistent way with probability densities generalized in the Radon-Nikodym sense [46]. The practical effect is the same, and this extra generality is therefore neither necessary nor helpful for our purposes.
Expectation of a function g(α):
E_f(α)(g(α)) ≡ ∫ g(α) f(α) dα. (3.4)
Notation (3.4) will be simplified to E_f(α)(α) ≡ α̂ in situations where it is clear with respect to which distribution the expectation is to be evaluated.
Pdf of transformed variables: Let α be a real vector, α ≡ [α1, . . . , α_α̊], and T = [T1, . . . , T_α̊] a bijection (one-to-one mapping) with finite continuous partial derivatives a.e. on α∗,
J_ij(α) ≡ ∂T_i(α)/∂α_j, i, j = 1, . . . , α̊, (3.5)
for all entries T_i of T and entries α_j of α. Then,
f_T(T(α)) |J(α)| = f(α), (3.6)
where | · | denotes the determinant of the matrix in its argument.
Kullback-Leibler (KL) divergence measures the proximity of a pair of pdfs f, f̃ acting on a set x∗. It is defined as follows [47]:
KL(f||f̃) ≡ ∫ f(x) ln( f(x) / f̃(x) ) dx. (3.7)
The KL divergence has the following properties:
1. KL(f||f̃) ≥ 0;
2. KL(f||f̃) = 0 iff f(x) = f̃(x) almost everywhere;
3. KL(f||f̃) = ∞ iff on a set of a positive measure f(x) > 0 and f̃(x) = 0;
4. KL(f||f̃) ≠ KL(f̃||f), and the KL divergence does not obey the triangle inequality.
Given 4., care is needed in the syntax describing KL(·). We say that (3.7) is from f(x) to f̃(x).
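For discrete arguments, where the integral in (3.7) becomes a sum, the listed properties are easy to check numerically. The following sketch is illustrative only (Python is used here for brevity, although the toolbox itself targets Matlab and ANSI C, and the two pdfs are invented for the example):

```python
import numpy as np

def kl(f, g):
    """KL divergence from pdf f to pdf g for discrete distributions;
    the integration in (3.7) is replaced by summation."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    mask = f > 0                      # 0 * ln(0/g) = 0 by convention
    if np.any(g[mask] == 0):          # property 3: divergence is infinite
        return float('inf')
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

f = np.array([0.5, 0.3, 0.2])
g = np.array([0.4, 0.4, 0.2])

d_fg = kl(f, g)                       # divergence from f to g
d_gf = kl(g, f)                       # divergence from g to f (differs, property 4)
```

The asymmetry in property 4 is why the "from f to g" wording matters in the sequel.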
3.2 Dynamic learning
The aim of dynamic decision making is to find the optimal DM strategy. However, if any uncertainty, i.e. an unobserved internal random variable Θt, is present in the controlled environment, we have to model it.
Handling of uncertainty in models of the real world is a challenging problem on its own, i.e. even without the ambition of influencing the world. The Bayesian treatment of this sub-problem is addressed in this Section. Results established in this Section will be used later for the design of the control strategy.
[Figure: the decision-maker supplies actions ut to the environment and observes data yt; together these form the observed data dt, while the internal variables Θt remain unobserved.]
Figure 3.1: Basic DM scenario
3.2.1 Probabilistic models: description of reality
The basic scenario of decision-making is illustrated in Figure 3.1.
The most complete probabilistic description of the closed loop environment–participant is the joint pdf
f(d(t), Θ(t)|Θ0, d(0)) f(Θ0|d(0)) = f(d(t), Θ(t)|Θ0) f(Θ0)
of all random variables involved in the closed loop. In it, Θ0 is the initial uncertain unobserved random variable, called the internal variable, and d(0) stands for the prior information available before the choice of the first input. Habitually, d(0) is considered only implicitly.
The chain rule for pdfs [44] implies the following decomposition of the joint pdf representing the complete probabilistic description of the closed-loop behavior:
f(d(t), Θ(t)|Θ0) = f(Θ0) × ∏_{t∈t∗} f(yt|ut, d(t−1), Θ(t)) f(Θt|ut, d(t−1), Θ(t−1)) f(ut|d(t−1), Θ(t−1)). (3.8)
The chosen order of conditioning distinguishes the following important pdfs:
observation model f(yt|ut, d(t−1), Θ(t)),
internal model f(Θt|ut, d(t−1), Θ(t−1)),
DM strategy f(ut|d(t−1), Θ(t−1)).
Note that these models are conditioned on the whole observation history as well as the whole
history of internal variables. In practical situations, however, the reality has to be described by
simpler models. Therefore, we introduce the following general assumptions.
Agreement 3.1 [Reduced dependency on internal variables]
1. Distribution of the internal Θt is determined by the current input ut, all past data d(t− 1) and
the past internal Θt−1 only, i.e.
f (Θt|ut, d (t− 1) ,Θ(t− 1)) = f (Θt|ut, d (t− 1) ,Θt−1) . (3.9)
2. Distribution of the observed output yt is determined by the current decision ut, all past data
d(t− 1) and the internal Θt only, i.e.
f (yt|ut, d (t− 1) ,Θ(t)) = f (yt|ut, d (t− 1) ,Θt) . (3.10)
3. Admissible decision strategies, generating the decision ut from the observed data history
d (t− 1) and ignoring the unobserved internals Θ(t− 1), are considered, i.e.
f (ut|d (t− 1) ,Θ(t− 1)) = f (ut|d (t− 1)) . (3.11)
Under these Assumptions, the closed-loop description (3.8) reduces to
f(d(t), Θ(t)|Θ0) = ∏_{t∈t∗} f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1), Θt−1) f(ut|d(t−1)). (3.12)
Remark 3.1 (Model structure) The notation used here implies that the random variables in the pdf fully determine the model. This may not be sufficient in certain situations; e.g. the notation f(x) does not distinguish between Normal and Uniform distributions with the same fixed moments. However, in most of this text, this situation will not arise. In cases where confusion may arise, we will use additional conditioning on an abstract object M, denoting the model structure. Hence, the distinction (e.g. between the Normal and Uniform distributions) would be denoted f(x|M1) and f(x|M2).
3.2.2 Bayesian filtering
Proposition 3.1 (Bayesian filtering in closed control loop) Let the prior pdf f(Θ0) be given and the assumptions of Agreement 3.1 be met. Then, the pdf f(Θt|d(t)), determining the estimate of internals, and the pdf f(Θt|ut, d(t−1)), determining the prediction of internals, evolve recursively as follows:
f(Θt|ut, d(t−1)) = ∫ f(Θt|ut, d(t−1), Θt−1) f(Θt−1|d(t−1)) dΘt−1, (3.13)
f(Θt|d(t)) = f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1)) / f(yt|ut, d(t−1)), (3.14)
f(yt|ut, d(t−1)) = ∫ f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1)) dΘt. (3.15)
Proof: See, for instance, [46].
Operations in Proposition 3.1 will be known in the sequel as: time-update (3.13), data-update (3.14), and prediction (3.15). Objects on the left-hand side of the operations will be denoted as: the estimate, f(Θt|d(t)), in (3.14), and the predictor, f(yt|ut, d(t−1)), in (3.15).
Here, we note that:
The Bayesian filtering does not depend on the functional form of the used admissible control strategy {f(ut|d(t−1))}_{t∈t∗}, but only on the generated inputs ut.
This will be important for design of the DM strategy.
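For a discrete-valued internal variable, the integrals in (3.13)–(3.15) become sums and Proposition 3.1 can be exercised directly. A small illustrative sketch (Python is used for brevity; the two-state model matrices are invented for the example, not taken from the text):

```python
import numpy as np

# two-state internal variable; illustrative model matrices
trans = np.array([[0.9, 0.1],    # f(Theta_t | Theta_{t-1}), rows: Theta_{t-1}
                  [0.2, 0.8]])
obs = np.array([[0.7, 0.3],      # f(y_t | Theta_t), rows: Theta_t, cols: y_t
                [0.1, 0.9]])

def filter_step(prior, y):
    """One pass of Proposition 3.1 for a discrete Theta: integrals are sums."""
    predicted = trans.T @ prior          # time update (3.13)
    unnorm = obs[:, y] * predicted       # numerator of the data update (3.14)
    evidence = unnorm.sum()              # predictor f(y_t | ...) (3.15)
    return unnorm / evidence, evidence

belief = np.array([0.5, 0.5])            # prior f(Theta_0)
for y in [0, 0, 1]:                      # an assumed observation sequence
    belief, evidence = filter_step(belief, y)
```

Note that, as stated above, the recursion never needs the functional form of the strategy that generated the inputs.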
3.2.3 Bayesian estimation
This Section deals with a special version of filtering called estimation. It arises when the internal variables Θt are time invariant,
Θt = Θ, ∀t ∈ t∗. (3.16)
The common value Θ is called the unknown parameter. In this case, the internal model is f(Θt|ut, d(t−1), Θt−1) = δ(Θt − Θt−1).
Hence, the time-update operation (3.13) of Bayesian filtering has the following form:
f(Θt|d(t−1)) = [f(Θt−1|d(t−1))]_{Θt−1→Θt}, (3.17)
where the notation [·]_{x→y} denotes replacement of the argument x by y.
The data-update operation (3.14) is unchanged. However, the simplified time-update (3.17) allows us to expand the recursion of data-updates into the following (non-recursive) batch variant:
f(Θ|d(t)) ≡ ∏_{τ≤t} f(yτ|uτ, d(τ−1), Θ) f(Θ) / N(d(t)) ≡ L(Θ, d(t)) f(Θ) / N(d(t)). (3.18)
The introduced likelihood function,
L(Θ, d(t)) ≡ ∏_{τ≤t} f(yτ|uτ, d(τ−1), Θ), (3.19)
evolves independently of normalization. It starts, however, from L(Θ, d(0)) identically equal to 1.
The normalization factor N(·) is defined by the formula
N(d(t)) = ∫ L(Θ, d(t)) f(Θ) dΘ ∝ f(yt|ut, d(t−1)). (3.20)
With it, the predictor (3.15) can alternatively be expressed as follows:
f(yt|ut, d(t−1)) = N(d(t)) / N(d(t−1)). (3.21)
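The equivalence of the recursive data-updates and the batch variant (3.18) can be checked numerically on a parameter grid. A toy sketch for a Bernoulli observation model with an unknown parameter Θ (Python for brevity; the model, prior, and data sequence are illustrative assumptions):

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over the unknown parameter
prior = np.ones_like(theta)              # flat prior f(Theta)
prior /= prior.sum()

data = [1, 0, 1, 1, 0, 1]                # assumed Bernoulli observations

# recursive data updates (3.14) with the trivial time update (3.17)
post_rec = prior.copy()
for y in data:
    post_rec *= theta if y else (1 - theta)
    post_rec /= post_rec.sum()           # normalization at each step

# batch variant (3.18): likelihood (3.19) times prior, normalized once
lik = np.ones_like(theta)
for y in data:
    lik *= theta if y else (1 - theta)
post_batch = lik * prior
post_batch /= post_batch.sum()
```

Both routes yield the same posterior, which is exactly the content of (3.18).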
3.3 Dynamic design of control strategy
In this Section, we summarize the fully probabilistic design (FPD) of the DM strategy. This approach is taken as the basis of multiple-participant DM. It is an alternative to the standard stochastic control design, which is formulated as minimization of an expected loss function with respect to decision-making strategies, e.g. [12, 10]. The standard design can be interpreted as an attempt to influence some characteristics of the closed-loop behavior by selecting an appropriate decision-making strategy. The loss function is generally deduced from desired deterministic relationships between the considered variables, and it is unrelated (or at most weakly related) to the random nature of the involved mappings, i.e. the time evolution (3.9) and observation models (3.10).
The FPD [29, 30], reviewed in this Section, formulates the design problem in a way that allows the designer to respect this random nature. It starts with specification of the decision-making aim in the form of an ideal pdf of the closed loop. Then, the DM strategy is chosen as a minimizer of the KL divergence (3.7) between the actual and the ideal closed-loop pdf.
The approach has the following special features.
• The KL divergence to an ideal pdf forms a special type of loss function that can be simply tailored both to deterministic and stochastic features of the considered DM problem.
• The minimum of the KL divergence – i.e. the optimal DM strategy – is found in a closed form. Thus, the minimization step “disappears” from the standard pair of operations (minimization and expectation) that are applied sequentially when optimizing via stochastic dynamic programming [29].
• The use of a multi-modal desired distribution provides a well-justified and feasible multiple-objective DM design [17, 48].
The ideal pdf is constructed in the way analogous to (3.12), with user-specified factors distinguished by the superscript ᴵ:
ᴵf(d(t), Θ(t)|Θ0) ᴵf(Θ0) = ∏_{t∈t∗} ᴵf(yt|ut, d(t−1), Θt) ᴵf(Θt|ut, d(t−1), Θt−1) ᴵf(ut|d(t−1)) f(Θ0). (3.22)
Here the pdfs ᴵf(yt|ut, d(t−1), Θt), ᴵf(Θt|ut, d(t−1), Θt−1) describe the ideal models of observation and time evolution of internals, and ᴵf(ut|d(t−1)) the ideal DM strategy.
The prior pdf on the initial internal random variable Θ0 cannot be influenced by the optimized DM strategy, so that it is left to its fate, i.e. ᴵf(Θ0) = f(Θ0). To formulate the FPD concisely, the following shorthand notation is used below:
f_t ≡ f(d(t), Θ(t)|Θ0) f(Θ0),
ᴵf_t ≡ ᴵf(d(t), Θ(t)|Θ0) f(Θ0).
Under the assumptions made in Agreement 3.1, the FPD is formulated as follows.
Find the admissible DM strategy minimizing the KL divergence KL(f_t || ᴵf_t).
Proposition 3.2 (Solution of FPD) Let both the joint pdf f(Θ(t), d(t)|Θ0) and its ideal counterpart ᴵf(Θ(t), d(t)|Θ0) meet the assumptions of Agreement 3.1.
Then, the optimal admissible DM strategy minimizing KL(f_t || ᴵf_t) is given by the pdfs:
ᵒf(ut|d(t−1)) = ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] / γ(d(t−1)), t ∈ t∗, (3.23)
γ(d(t−1)) ≡ ∫ ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] dut. (3.24)
Starting on the DM horizon, t̄, with γ(d(t̄)) ≡ 1, the functions ω(ut, d(t−1)) are generated recursively for t = t̄, t̄−1, . . . , 1, in the backward manner, as follows:
ω(ut, d(t−1)) ≡ ∫ Ω(ut, d(t−1), Θt−1) f(Θt−1|d(t−1)) dΘt−1 (3.25)
= E_{f(Θt−1|d(t−1))}( Ω(ut, d(t−1), Θt−1) ),
Ω(ut, d(t−1), Θt−1) ≡ ∫ f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1), Θt−1) ×
ln( f(yt|ut, d(t−1), Θt) f(Θt|ut, d(t−1), Θt−1) / [ γ(d(t)) ᴵf(yt|ut, d(t−1), Θt) ᴵf(Θt|ut, d(t−1), Θt−1) ] ) dyt dΘt. (3.26)
Here, the pdfs f(Θt|d(t)) have their usual meaning given by Proposition 3.1.
Proof: See [31].
Note that (3.26) can be written in terms of the expected value of γ(·) and KL divergences, as follows:
Ω(ut, d(t−1), Θt−1) ≡ E_{f(yt|ut,d(t−1),Θt) f(Θt|ut,d(t−1),Θt−1)}( −ln γ(d(t)) )
+ E_{f(Θt|ut,d(t−1),Θt−1)}( KL( f(yt|ut, d(t−1), Θt) || ᴵf(yt|ut, d(t−1), Θt) ) )
+ KL( f(Θt|ut, d(t−1), Θt−1) || ᴵf(Θt|ut, d(t−1), Θt−1) ), (3.27)
γ(d(t−1)) = E_{ᴵf(ut|d(t−1))}( exp[ −E_{f(Θt−1|d(t−1))}( Ω(ut, d(t−1), Θt−1) ) ] ). (3.28)
Both the expectation (3.4) and the KL divergence (3.7) are basic operations of probabilistic calculus (Section 3.1.2) and should be readily available. This is important for the design of the software image of this theory.
Proposition 3.2 is the most general design scenario we consider in this work. However, for many
practical problems it can be simplified. Specifically, if we do not care about the internal variables,
the problem can be re-formulated in terms of the input-output models (3.15).
Proposition 3.3 (Data-driven FPD) Let us try to influence just the joint pdf of the observed data, f_t ≡ f(d(t)), so that it is close to its ideal counterpart ᴵf_t ≡ ᴵf(d(t)).
Then, the optimal admissible DM strategy minimizing KL(f_t || ᴵf_t) is given by the pdfs:
ᵒf(ut|d(t−1)) = ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] / γ(d(t−1)), t ∈ t∗, (3.29)
γ(d(t−1)) ≡ ∫ ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] dut.
Starting with γ(d(t̄)) ≡ 1, the functions ω(ut, d(t−1)) are generated recursively for t = t̄, t̄−1, . . . , 1 in the backward manner, as follows:
ω(ut, d(t−1)) ≡ ∫ f(yt|ut, d(t−1)) ln( f(yt|ut, d(t−1)) / [ γ(d(t)) ᴵf(yt|ut, d(t−1)) ] ) dyt. (3.30)
Proof: It coincides with Proposition 3.2 simplified to the case without internals.
Note that the proved proposition covers the fully probabilistic counterpart of classical dual control [14, 15], when the environment is described up to the unknown parameters Θt. In this case, it is sufficient to run Bayesian filtering, Proposition 3.1, and to use the predictor f(yt|ut, d(t−1)) as the model relating inputs to outputs.
3.4 Merging of pdfs
The task of information fusion is a rich area of research used in many engineering applications; see the Information Fusion journal published by Elsevier. In the probabilistic paradigm, each source of information is represented by a pdf. Thus, the task of information fusion can be translated into the task of merging of pdfs [32].
The operation of merging is defined as a mapping of two pdfs into one:
f1(Θt|d(t)), f2(Θt|d(t)) —merge→ f(Θt|d(t)), (3.31)
where f1 and f2 are the source pdfs, and f is the merged pdf. The aim of the merging operation is to preserve within one pdf, f, as much information from the sources, f1 and f2, as possible.
Note that the source pdfs in (3.31) are defined on the same variable as the merged pdf; hence the mapping will be known as direct merging. Alternatively, the sources can be defined on the variable in the condition of the merged pdf,
f1(d|d(t)), f2(d|d(t)) —merge→ f(Θt|d(t)), (3.32)
in which case the mapping will be known as indirect merging.
3.4.1 Direct merging of pdfs
The general formalization of the merging operation is still not fully stabilized. The most promising
approach to direct merging is based on minimization of a weighted sum of Kullback-Leibler divergences [49], [32].
The task is formalized only for independent observations. Therefore, in this Section, all models (3.9)–(3.11) are defined as time-invariant, i.e. f(d|Θ) for the observation model (3.10).
The merged pdf f(d) is selected so that a weighted sum of Kullback-Leibler divergences between the source pdfs and the resulting one is minimized:
f(d) = arg min_f ( α2 KL(f2(d)||f(d)) + (1−α2) KL(f1(d)||f(d)) ). (3.33)
The optimum of (3.33), for merging of distributions of the same variable, is found in the form of a probabilistic mixture of the source pdfs:
f(d) = α2 f2(d) + (1−α2) f1(d). (3.34)
The optimal solution for distributions with partially overlapping arguments—e.g. f(y, u) with f(y) and f(u)—is not analytically tractable. However, an iterative algorithm minimizing (3.33) can be found [32].
Remark 3.2 (Approximations in direct merging) Note that even for the analytical solution (3.34), the number of components in the mixture grows with each iteration. Therefore, it may be necessary to find a reasonable projection into a finite-dimensional family. Solutions to this task are readily available only for certain families [50, 51].
Alternatively, the problem can be formulated in the reverse KL divergence,
f(d) = arg min_f ( α2 KL(f(d)||f2(d)) + (1−α2) KL(f(d)||f1(d)) ). (3.35)
The optimum of (3.35), for merging of distributions of the same variable, is found in the form of a geometric mean of the source pdfs:
f(d) = (f2(d))^{α2} (f1(d))^{(1−α2)}. (3.36)
This solution is less optimal in the sense of statistical utility [52], but it can have computational advantages for certain pdf families.
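Both optima are immediate to evaluate for discrete pdfs. A minimal sketch contrasting the mixture (3.34) with the geometric mean (3.36); the source pdfs and the weight α2 are invented for illustration, and the geometric mean is renormalized, since (3.36) is determined only up to normalization:

```python
import numpy as np

f1 = np.array([0.7, 0.2, 0.1])   # source pdfs on the same discrete variable
f2 = np.array([0.2, 0.5, 0.3])
alpha2 = 0.4                     # weight of the second source

# optimum of (3.33): probabilistic mixture (3.34)
mix = alpha2 * f2 + (1 - alpha2) * f1

# optimum of (3.35): normalized geometric mean (3.36)
geo = f2 ** alpha2 * f1 ** (1 - alpha2)
geo /= geo.sum()
```

The two merged pdfs are close but not identical, reflecting the asymmetry of the KL divergence noted in Section 3.1.2.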
3.4.2 Indirect merging of pdfs
A procedure for indirect merging is even less developed than that for direct merging. Here, we describe the most promising approach, which is being developed in the department (personal communication with J. Kracik [32]). The basic idea follows from reformulation of the Bayes rule (3.3) in the following form [53]:
f(Θ|d) ∝ f(Θ) exp( −t ∫ ʳf(d|d(t)) ln( 1 / f(d|Θ) ) dd ), (3.37)
where ʳf(d|d(t)) denotes the empirical distribution on the observed data (3.38). For independent observations, i.e. many observations of one variable d, the empirical density is defined as follows:
d(t) ⇐⇒ ʳf(d|d(t)) ≡ (1/t) Σ_{i=1}^{t} δ(d − di). (3.38)
Here d is a random variable and the di are the observed realizations of d at times i.
Equation (3.37) can be used to interpret the Bayesian estimation (Section 3.2.3) as a procedure measuring how individual models—from the considered parameterized class of pdfs—fit the empirical density ʳf(d|d(t)). Equation (3.37) is valid for estimation from one source of data (i.e. one empirical density). Using the result (3.34) from direct merging (Section 3.4.1), we define the joint empirical distribution as:
ʳf(d) = α2 f2(d) + (1−α2) f1(d). (3.39)
Then, (3.37) can be re-written using the expectation operation (3.4) as follows:
f(Θ|d) ∝ f(Θ) exp( t( α2 E_{ʳf2(d)}( ln f(d|Θ) ) + (1−α2) E_{ʳf1(d)}( ln f(d|Θ) ) ) ). (3.40)
Hence, the merging operation has the same structure as the FPD (Proposition 3.2). This will be important in the design of software structures.
From (3.40), it is possible to see that merging on full data records is a rather easy task, since it corresponds to learning on the data records. However, a new challenge arises when the data records are incomplete. In such a case, the observation model f(d|Θ) must be defined only on the available subset of the data record. This can be achieved by normalization of the original observation model [32].
In some applications, it is not feasible to operate on full-length data records, since these are extremely large. Then, we seek a suitable replacement of the empirical density ʳf(d|d(t)). The optimal solution to this problem is not known to us. Preliminary results suggest that approximation of the empirical density by an outer model (3.15), i.e.
ʳf(d) ≈ f(d) = ∫ f(d, Θ) dΘ,
is a reasonable option.
4 Feasible Decision Making
The theory of decision making, presented in Chapter 3, is formulated in terms of mathematical objects (pdfs) and operations associated with them. The aim of this thesis is to represent these mathematical structures in a computer. However, this is feasible only for a subset of pdfs and operations on them. Representation of pdfs in computers has been studied for a long time. There are two principal approaches to the problem: (i) the parametric, and (ii) the non-parametric approach, see [54] for example. In this work, our concern is with computational efficiency of operations with pdfs; therefore, we focus on parametric models.
In order to achieve computational efficiency, we introduce the following requirement.
Requirement 4.1 Statistics (shaping parameters) describing pdfs of a decision-maker should be finite-dimensional, with the same dimensionality for an increasing number of processed data and an increasing DM horizon.
This requirement has serious consequences for the DM process, since all the involved pdfs are dynamically evaluated via the basic operations of decision making, namely: time update (3.13), data update (3.14), and FPD (3.27), (3.28). Hence, we require the chosen family of distributions to be closed under these operations. The problem has been studied theoretically [55, 56, 57], and the following families have been found to have this property:
1. probabilistic mixtures with known components but with unknown weights [55],
2. the Daum family [56], which is a generalization of linear state-space models [58],
3. the Exponential family (under additional assumptions) [57].
In all other families, extra approximations are required to achieve tractability, e.g. for mixtures of pdfs from the EF [59]. In this Chapter, we review the basic DM operations for the linear state-space model (Section 4.1) and the exponential family (Section 4.2). Then, we review the most commonly used distributional approximations (Section 4.3). The use of these distributional approximations is then studied on the problems of Bayesian filtering (Section 4.4), estimation (Section 4.5), and FPD (Section 4.6).
4.1 Linear state-space models
In this Section, we study DM with linear state-space models, defined as follows:
f(Θt|Θt−1, ut, A, B, R) = N(AΘt−1 + But, R), (4.1)
f(yt|Θt, ut, C, D, Q) = N(CΘt + Dut, Q). (4.2)
Here (4.1) defines the internal model, and (4.2) the observation model.
In the sequel, we will assume that the matrices A, B, R, C, D, Q are known. Hence, for clarity of notation, we drop them from the conditioning of the pdfs.
4.1.1 Dynamic learning
Application of the general Bayesian filtering (Proposition 3.1) to model (4.1)–(4.2) is known as
Kalman filtering [60].
Let us assume that
f(Θt−1|d(t−1)) = N(μt−1, Σt−1);
then the time-update operation (3.13) yields the following result:
f(Θt|ut, d(t−1)) = N(μ̄t, Σ̄t), (4.3)
μ̄t = Aμt−1 + But,
Σ̄t = R + AΣt−1A′.
The data-update operation (3.14) yields:
f(Θt|d(t)) = N(μt, Σt), (4.4)
μt = μ̄t + ΣtC′Q⁻¹(yt − Cμ̄t − Dut),
Σt = Σ̄t − Σ̄tC′(Q + CΣ̄tC′)⁻¹CΣ̄t.
The one-step-ahead prediction (3.15) is:
f(yt|ut, d(t−1)) = N(Cμ̄t + Dut, Q + CΣ̄tC′). (4.5)
Hence, the functional recursion (3.13)–(3.14) can be replaced by an algebraic recursion on μ̄t, Σ̄t, μt, Σt.
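The algebraic recursion (4.3)–(4.5) translates directly into code. A sketch under assumed scalar model matrices (Python for brevity; the matrices are invented, not taken from the thesis), with the data-update gain written in the standard covariance form, which is algebraically equivalent to the ΣtC′Q⁻¹ form of (4.4):

```python
import numpy as np

def kalman_step(mu, Sigma, u, y, A, B, R, C, D, Q):
    """One step of (4.3)-(4.5) for the model (4.1)-(4.2)."""
    # time update (4.3)
    mu_p = A @ mu + B @ u
    Sigma_p = R + A @ Sigma @ A.T
    # data update (4.4); S is also the predictive covariance of y_t in (4.5)
    S = Q + C @ Sigma_p @ C.T
    K = Sigma_p @ C.T @ np.linalg.inv(S)
    mu_new = mu_p + K @ (y - C @ mu_p - D @ u)
    Sigma_new = Sigma_p - K @ C @ Sigma_p
    return mu_new, Sigma_new, C @ mu_p + D @ u, S

# illustrative scalar example
A = np.array([[0.9]]); B = np.array([[1.0]]); R = np.array([[0.1]])
C = np.array([[1.0]]); D = np.array([[0.0]]); Q = np.array([[0.5]])

mu, Sigma = np.array([0.0]), np.array([[1.0]])
for y in [0.3, -0.1, 0.4]:                      # assumed observations
    mu, Sigma, y_pred, S = kalman_step(mu, Sigma, np.array([0.0]),
                                       np.array([y]), A, B, R, C, D, Q)
```

Only the finite-dimensional pair (μt, Σt) is propagated, in line with Requirement 4.1.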
4.1.2 Fully probabilistic design
Application of the general FPD (Proposition 3.2) to this model is not tractable in the sense of Requirement 4.1; see the discussion at the end of this Section.
Therefore, for illustration, we consider a model with a fully observed state, i.e. C = I and Q = 0. Then, it is necessary to choose only the following ideal pdfs:
ᴵf(Θt|Θt−1, ut) = N(0, R), (4.6)
ᴵf(ut|d(t−1)) = N(0, S).
This choice is practically reasonable, since the ideal spread around the zero state cannot be lower than that of the innovations.
In order to evaluate the FPD recursion (3.27)–(3.28), we need the KL divergence of two Normal distributions, which is [28]:
KL( N(μ1, Σ1) || N(μ2, Σ2) ) = ½[ ln|Σ2Σ1⁻¹| − μ̊ + tr(Σ1Σ2⁻¹) + (μ1 − μ2)′Σ2⁻¹(μ1 − μ2) ], (4.7)
where μ̊ denotes the dimension of μ.
Using (4.1) and (4.6) in (4.7), we obtain
KL( f(Θt|Θt−1, ut) || ᴵf(Θt|Θt−1, ut) ) = ½ (AΘt−1 + But)′ R⁻¹ (AΘt−1 + But). (4.8)
Note that since the covariance matrices of the involved distributions are identical, only the quadratic term in (4.7) remains in the result.
Let us assume that
−ln γ(d(t)) = ½ Θt′ΦtΘt + zt. (4.9)
Inserting (4.8) and (4.9) into (3.27), we obtain:
Ω(ut, Θt−1) = E_{f(Θt|Θt−1,ut)}( ½ Θt′ΦtΘt + zt ) + ½ (AΘt−1 + But)′R⁻¹(AΘt−1 + But) (4.10)
= ½ (AΘt−1 + But)′(Φt + R⁻¹)(AΘt−1 + But) + zt.
Note that since Θt is observable, it plays the role of d(t) in the conditioning of the strategy (3.23). The optimal DM strategy (3.23) is then:
ᵒf(ut|d(t−1)) = ᴵf(ut|d(t−1)) exp[−ω(ut, d(t−1))] / γ(d(t−1))
= exp( −½ ( ut′S⁻¹ut + (AΘt−1 + But)′(Φt + R⁻¹)(AΘt−1 + But) ) ) × (4.11)
(2π)^{−ů/2} |S|^{−1/2} exp(−zt) γ⁻¹(d(t−1)),
where ů denotes the dimension of ut.
Completing squares in the exponent of (4.11) with respect to ut, we can separate (4.11) into a Gaussian distribution
ᵒf(ut|d(t−1)) = N_{ut}(μt, Σt),
Σt = ( S⁻¹ + B′(Φt + R⁻¹)B )⁻¹,
μt = −ΣtB′(Φt + R⁻¹)AΘt−1, (4.12)
which also determines the Bellman function
γ(d(t−1)) = |S|^{−1/2} |S⁻¹ + B′(Φt + R⁻¹)B|^{−1/2} exp(−zt) ×
exp( −½ Θt−1′A′[ (Φt + R⁻¹) − (Φt + R⁻¹)BΣtB′(Φt + R⁻¹) ]AΘt−1 ). (4.13)
Hence, the logarithm of (4.13) remains in the form of (4.9):
−ln γ(d(t−1)) = ½ Θt−1′Φt−1Θt−1 + zt−1,
Φt−1 = A′( (Φt + R⁻¹) − (Φt + R⁻¹)BΣtB′(Φt + R⁻¹) )A, (4.14)
zt−1 = zt + ½( ln|S| + ln|S⁻¹ + B′(Φt + R⁻¹)B| ). (4.15)
The obtained result is equivalent to the classical linear-quadratic (LQ) design, see [29] for details.
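The pair (4.12), (4.14) is a compact backward recursion that is easy to iterate numerically. A sketch with invented scalar matrices (Python for brevity), starting from γ ≡ 1 at the DM horizon, i.e. Φ = 0, and iterating (4.14) backwards; for a time-invariant model the matrices Φt are expected to converge to a stationary value, as in the classical LQ design:

```python
import numpy as np

def fpd_lq_backward(A, B, R, S, horizon):
    """Backward recursion (4.12), (4.14): feedback covariances Sigma_t and
    Bellman-function matrices Phi_t, starting from Phi = 0 at the horizon."""
    Rinv = np.linalg.inv(R)
    Phi = np.zeros_like(A)                 # gamma = 1 at the horizon
    Sigmas, Phis = [], [Phi]
    for _ in range(horizon):
        M = Phi + Rinv                     # shorthand for Phi_t + R^{-1}
        Sigma = np.linalg.inv(np.linalg.inv(S) + B.T @ M @ B)   # (4.12)
        Phi = A.T @ (M - M @ B @ Sigma @ B.T @ M) @ A           # (4.14)
        Sigmas.append(Sigma)
        Phis.append(Phi)
    return Sigmas, Phis

# illustrative scalar model
A = np.array([[1.0]]); B = np.array([[1.0]])
R = np.array([[0.5]]); S = np.array([[1.0]])
Sigmas, Phis = fpd_lq_backward(A, B, R, S, horizon=20)
```

In this scalar case the recursion reads Φ' = (Φ + 2)/(Φ + 3), whose fixed point is √3 − 1, so the iterates settle quickly.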
Remark 4.1 (FPD for unobserved state) Note that the FPD solution (3.27) for an unobserved state extends (4.10) by one extra expectation of the KL divergence of the observation models. Since the observation models are also Gaussian, it has the form of (4.7). However, a problem arises in (3.28), namely in taking the expectation over f(Θt|d(t−1)), especially on long horizons. From (4.4), f(Θt|d(t−1)), as a function of h unobserved observations, is a Normal distribution whose mean value is an h-order polynomial. All operations associated with the FPD are still analytically tractable; however, their complexity grows rapidly with the DM horizon.
This behaviour is not compatible with the requirement of feasibility
(Requirement 4.1).
Hence, we will provide only partial support for this approach, until more suitable approximations—such as neglecting some terms in the h-order polynomial—are found.
4.1.3 Merging
Two basic merging operations have been considered in Section 3.4, namely direct merging and indirect merging.
Direct merging: was defined on the outer observation models, i.e. in this case on ᴵf(dt|d(t−1)) being Gaussian. Hence, the merged distribution (3.34) is a mixture of Gaussians. For feasibility reasons, this distribution has to be projected onto a single Gaussian using the KL divergence [50]. Merging of source pdfs on overlapping variables is not available for Gaussians.
For the alternative formalization (3.35), the solution is a geometric mean of Gaussians, i.e. also a Gaussian. Hence, no further approximations are required. Moreover, this approach is also promising for merging of source pdfs on overlapping variables.
Indirect merging: for Bayesian filtering has not been elaborated yet.
4.2 Time-invariant exponential family models
In this Section, we review the task of parameter estimation (Section 3.2.3).
4.2.1 The models
Consider the following observation model:
f(yt|ut, d(t−1), Θ) = f(yt|ψt, Θ) = A(Θ) exp( 〈B(Ψt), C(Θ)〉 + D(Ψt) ), (4.16)
where
regression vector ψt is determined by the known (i.e. observed) variables ut, d(t−1);
data vector Ψt = [yt, ψt]. A new transformed variable yt is defined as
yt = gt(d(t)), (4.17)
via a known smooth one-to-one mapping gt(·)—for given ut and d(t−1)—with a non-zero Jacobian:
Jt = | ∂gt(d(t)) / ∂yt |. (4.18)
A(Θ) is a non-negative function defined on Θ∗;
B(·), C(·) are array functions of compatible, finite and fixed dimensions. They are defined on the data vector Ψt and the internals Θ, respectively;
D(·) is a non-negative scalar function defined on Ψ∗t;
〈·, ·〉 is a functional, linear in the first argument, defined (within this text) as follows:
〈x, y〉 = x′y if x, y are vectors (′ is transposition); tr[xy] if x, y are matrices (tr is the trace); Σ_{i∈i∗} xi yi if x, y are arrays with a multi-index i. (4.19)
Models of the form (4.16) are known as the exponential family (EF).
Remark 4.2 (Exponential family for dynamic models) The exponential family is a rather wide family if we consider independent identically distributed observations. For example, the Poisson distribution,
f(yt|λ) = Po(λ) = (1/yt!) exp(−λ) exp( yt log λ ),
is clearly a member of the family. However, the family embraces only a few dynamic (auto-regressive) models, i.e. models where yt = yt(d(t−1)). For example, a simple 2nd order auto-regressive Poisson distribution,
f(yt|Θ, d(t−1)) = Po(Θ1yt−1 + Θ2yt−2) = (1/yt!) exp(−Θ1yt−1 − Θ2yt−2) exp( yt log(Θ1yt−1 + Θ2yt−2) ),
is clearly outside the family, since the logarithm of a sum cannot be expressed in any scalar-product form.
In the auto-regressive case, i.e. with a non-empty regression vector ψ, the exponential family contains the following special cases:
1. normal (Gaussian) linear-in-parameters models,
f(yt|Θ, d(t−1)) = N(θψt, Ω⁻¹), (4.20)
where both θ and Ω are considered unknown, i.e. Θ = [θ, Ω];
2. Markov chain models for discrete-valued variables,
f(yt|Θ, d(t−1)) = ∏_{y} ∏_{ψ} Θ_{〈y〉,〈ψ〉}^{δ(yt−y)δ(ψt−ψ)}, (4.21)
where the products run over all possible realizations y of yt and ψ of ψt, and 〈yt〉 denotes a unique integer number associated with each possible (discrete) state of yt, 1 ≤ 〈yt〉 ≤ ẙ. Hence, the parameter Θ can be seen as a multi-index variable, each element of which determines the probability of a realization yt with index 〈yt〉 given a realization of ψt with index 〈ψt〉. The observation model (4.21) has the form of a Multinomial pdf [61].
These two models are (almost) the only autoregressive members of the family. They are also the
most practically important ones.
4.2.2 Learning
For the stationary system, the time-update operation is trivial:
f(Θt|d(t−1)) = [f(Θt−1|d(t−1))]_{Θt−1→Θt}.
Consider the previous estimate to be of the following type,
f(Θt−1|d(t−1)) = A^{νt−1}(Θ) exp〈Vt−1, C(Θ)〉, (4.22)
where the D(Ψt) term was eliminated by normalization.
The data-update operation yields:
f(Θt|d(t)) ∝ A^{νt−1}(Θ) exp〈Vt−1, C(Θ)〉 A(Θ) exp〈B(Ψt), C(Θ)〉
= A^{νt−1+1}(Θ) exp〈Vt−1 + B(Ψt), C(Θ)〉.
I.e., it is of the same form as (4.22), with the algebraic recursion
νt = νt−1 + 1,
Vt = Vt−1 + B(Ψt). (4.23)
The predictive distribution (3.15) has the following form:
f(yt|d(t−1)) = N(Vt−1 + B(Ψt), νt−1 + 1) / N(Vt−1, νt−1). (4.24)
Remark 4.3 (Conjugacy) Note that we made the choice of the distribution at time t−1. The distribution was intentionally chosen to be self-replicating under the data-update operation with the observation model (4.16). This is known as the conjugacy principle [62].
For the considered special cases, the exact types of the distributions (4.22) are [28]: (i) Gauss-Wishart for the linear Gaussian model (4.20), and (ii) Dirichlet for the Markov model (4.21).
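The recursion (4.23) is what makes EF models attractive in software: learning reduces to accumulating the finite-dimensional statistics V, ν. A toy sketch for the static Bernoulli model, whose conjugate distribution is the Beta pdf (Python for brevity; the data sequence is invented for the example):

```python
# Bernoulli model written in the EF form (4.16):
#   f(y|Theta) = A(Theta) exp(<B(Psi), C(Theta)>),
#   A(Theta) = 1 - Theta,  B(Psi) = y,  C(Theta) = ln(Theta / (1 - Theta))
nu, V = 0.0, 0.0                     # statistics of the conjugate pdf (4.22)
for y in [1, 0, 1, 1, 1, 0]:         # assumed observations
    nu += 1                          # recursion (4.23)
    V += y                           # B(Psi_t) = y_t for this model
# with a flat prior, the conjugate posterior is Beta(V + 1, nu - V + 1),
# whose mode is V / nu
theta_mode = V / nu
```

Whatever the length of the data record, only the pair (V, ν) is stored, exactly as Requirement 4.1 demands.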
4.2.3 Fully probabilistic design
A general formulation of the FPD for the whole family is not available. In special cases, the solution reduces to propagation of a finite-dimensional Bellman function similar to that in Section 4.1.
For the Markov model (4.21), the Bellman function can be found in the form of a multi-index array, similar to that for the parameters Θ in (4.21) [28]. We do not review this special case here, since it does not require any new operations or structures beyond those already used in Section 4.1.
4.2.4 Merging
Two basic merging operations have been considered in Section 3.4, namely direct and indirect merging.
Direct merging: was defined on the outer observation models, i.e. in this case on ᴵf(dt|d(t−1)) of the type (4.24). Analytical results are available only for special cases from the EF. Direct merging of Gaussians was already discussed in Section 4.1.3.
Analytical results are, however, available for discrete pdfs (such as the Dirichlet pdf) which also belong to the EF. For these pdfs, the operations of algebraic (3.34) and geometric (3.36) merging are analytically tractable. Moreover, merging of source pdfs on overlapping variables is also feasible [32].
Indirect merging: was defined using the empirical density ʳf(d(t)), or the predictive distribution f(d(t)|V). First, we consider the case with the empirical density. Using (4.16) in (3.40) yields
f[1](Θ|d(t)) ∝ f(Θ) exp( t α2 E_{ʳf[2](d(t))}( ln f(d(t)|Θ) ) ) exp( t(1−α2) E_{ʳf[1](d(t))}( ln f(d(t)|Θ) ) )
∝ f(Θ) (A(Θ))^t exp( t α2 E_{ʳf[2](d(t))}( 〈B(Ψt), C(Θ)〉 + D(Ψt) ) ) exp( t(1−α2) E_{ʳf[1](d(t))}( 〈B(Ψt), C(Θ)〉 + D(Ψt) ) ),
which is (due to the linearity of B(Ψt) in the scalar product) again of the exponential family, with statistics
V[1] = α2 E_{ʳf[2](d(t))}( B(Ψt) ) + (1−α2) E_{ʳf[1](d(t))}( B(Ψt) ), (4.25)
νt = ν0 + t.
This can be further simplified to:
V[1] = α2 V[2] + (1−α2) V[1], (4.26)
νt = ν0 + t.
4.3 Distributional approximations
Up till now, all operations of probabilistic calculus—namely marginalization (3.2) and expectation (3.4)—were analytically tractable. In this Section, we review the most common approximation methods used to overcome computational difficulties associated with evaluation of analytically intractable pdfs.
The problem can be avoided by projection of the pdf onto a family of distributions that is computationally tractable. In all subsequent operations, such as normalization, marginalization and evaluation of moments, the original intractable pdf will be replaced (approximated) by its projection:
f(Θ|d(t)) ≈ ᵃf(Θ|d(t)). (4.27)
Here, ᵃf denotes the best possible approximation within the chosen computationally tractable class.
Various approximation strategies have been developed; we review the most common techniques below.
4.3.1 Certainty equivalence approximation
In many engineering problems, dealing with full pdfs is avoided. A point estimate, i.e. one value of the parameter Θ, is considered as the summarizing result of the learning task.
The point estimate, Θ̂ = Θ̂(d(t)), can be interpreted as an extreme approximation of the posterior pdf by the function δ(·):

f(Θ|d(t)) ≈ f̂(Θ|d(t)) = δ( Θ − Θ̂(d(t)) ),   (4.28)

where Θ̂ is the chosen point estimate of the parameter Θ, and δ(x) is the Dirac delta function,

∫_x δ(x − x̂) g(x) dx = g(x̂),

if x is a continuous variable, and the Kronecker function,

δ(x) = 1 if x = 0, and δ(x) = 0 otherwise,

if x is a discrete variable.
This approximation is known as the certainty equivalence principle [63]. It remains to determine an optimal value of the point estimate. Typically, it is chosen as the Maximum A Posteriori (MAP) estimate:

Θ̂ = arg max_Θ f(Θ|d(t)).   (4.29)
There are many methods for evaluation of MAP estimates. Here, we review the famous EM
algorithm, since it will be used in later derivations.
Algorithm 4.1 (Expectation Maximization (EM) algorithm) is a well known algorithm for ML
estimation—and by extension for MAP estimation—of model parameters Θ = [Θ1,Θ2] [64]. Here,
we follow an alternative derivation of EM via distributional approximations [65]. The task is to
estimate the parameter Θ_1 of the (intractable) marginal distribution

f(Θ_1|d(t)) = ∫ f(Θ_1, Θ_2|d(t)) dΘ_2.   (4.30)

Using Jensen's inequality, it is possible to obtain a lower bound on (4.30) which is numerically tractable [65]. The resulting inference algorithm is then a cyclic iteration of two basic steps:

E-step: compute the approximate distribution of the parameter Θ_2 at iteration i:

f̂^(i)(Θ_2|d(t)) ≈ f( Θ_2|d(t), Θ̂_1^(i−1) ).   (4.31)
M-step: using the approximate distribution from the E-step, find a new estimate Θ̂_1^(i):

Θ̂_1^(i) = arg max_{Θ_1} ∫_{Θ_2} f̂^(i)(Θ_2|d(t)) ln f(Θ_1, Θ_2, d(t)) dΘ_2.   (4.32)
It was proven that this algorithm monotonically increases the marginal likelihood, f (d (t) |Θ1), thus
converging to a local maximum [66].
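Algorithm 4.1 can be illustrated on a standard textbook case: a 1-D equal-weight mixture of two Gaussians with known common variance, where the component means play the role of Θ_1 and the unobserved component labels the role of Θ_2. The following is an illustrative Python sketch, not part of the thesis toolbox:

```python
import math, random

def em_two_gaussians(data, mu_init=(-1.0, 1.0), sigma=1.0, n_iter=50):
    """EM (Algorithm 4.1) for a 1-D equal-weight mixture of two Gaussians
    with known common sigma: the component means act as Theta_1, the
    unobserved component labels as Theta_2."""
    mu = list(mu_init)
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each datum, cf. (4.31)
        resp0 = []
        for x in data:
            p0 = math.exp(-0.5 * ((x - mu[0]) / sigma) ** 2)
            p1 = math.exp(-0.5 * ((x - mu[1]) / sigma) ** 2)
            resp0.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted re-estimation of the means, cf. (4.32)
        w0 = sum(resp0)
        w1 = len(data) - w0
        mu[0] = sum(r * x for r, x in zip(resp0, data)) / w0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp0, data)) / w1
    return mu

random.seed(1)
data = ([random.gauss(-2.0, 1.0) for _ in range(300)]
        + [random.gauss(2.0, 1.0) for _ in range(300)])
mu_hat = em_two_gaussians(data)   # should approach the true means -2 and 2
```

Each iteration increases the (marginal) likelihood, in agreement with the convergence statement above.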
4.3.2 Laplace’s approximation
This method is based on a local approximation of the posterior pdf f(Θ|d(t)) by a Gaussian distribution at the MAP estimate Θ̂ [67], Θ ∈ R^p. Formally, Laplace's method approximates the distribution (4.27) as follows:

f(Θ|d(t)) ≈ N( Θ̂, H^{−1} ),   (4.33)

where Θ̂ is the MAP estimate (4.29), and H ∈ R^{p×p} is the negative Hessian matrix of the logarithm of the joint pdf f(Θ, d(t)) with respect to Θ, evaluated at Θ = Θ̂:

H = −[ ∂² ln f(Θ, d(t)) / ∂Θ_i ∂Θ_j ]_{Θ=Θ̂},  i, j = 1, …, p.   (4.34)

The asymptotic error of this approximation was studied in [67].
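A 1-D numerical sketch of Laplace's method follows, assuming a Newton search for the mode and finite differences for the second derivative in (4.34); the Gamma target is purely illustrative:

```python
import math

def laplace_approx(log_f, theta0, n_newton=30, h=1e-5):
    """Laplace's approximation, eq. (4.33): locate the MAP estimate by a
    Newton search on the log-density, then take the Gaussian with variance
    equal to the inverse negative second derivative, eq. (4.34). 1-D only."""
    theta = theta0
    for _ in range(n_newton):
        d1 = (log_f(theta + h) - log_f(theta - h)) / (2.0 * h)
        d2 = (log_f(theta + h) - 2.0 * log_f(theta) + log_f(theta - h)) / h ** 2
        theta -= d1 / d2          # Newton step towards the mode
    return theta, -1.0 / d2       # (MAP estimate, H^{-1})

# illustrative unnormalized Gamma(a, b) log-density; analytically the mode
# is (a - 1)/b = 4 and H = (a - 1)/mode^2 = 0.25, so the variance is 4
a, b = 5.0, 1.0
mode, var = laplace_approx(lambda th: (a - 1.0) * math.log(th) - b * th, theta0=2.0)
```

The numerical mode and variance can be checked against the analytical values above.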
4.3.3 Fixed-form minimum distance approximation
The approximating distribution f̂(Θ|η) is chosen as a tractable distribution with parameter η. The optimal approximation f̂(Θ|η̂)—given the fixed-form function f̂(·)—is then determined as

η̂ = arg min_η Δ( f(Θ|d(t)) || f̂(Θ|η) ),   (4.35)

where Δ(·||·) is an appropriate measure of distance (or divergence) between two pdfs. Various measures are used for specific problems, such as the Kullback-Leibler, Levy, chi-squared, or L_2-norm distances; these are reviewed in [59]. Specifically, the Kullback-Leibler (KL) divergence (3.7) is important for two reasons:

1. statistical inference via the KL divergence was shown to be optimal in the statistical utility sense [52].

2. minimization (4.35) with respect to the KL divergence (3.7) has a unique—and therefore global—solution [68].
Moreover, the KL divergence is also used in many practical applications [53, 35, 69].
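For a fixed-form family, (4.35) can be attacked by brute force: discretize the pdfs on a grid and search the parameter grid for the smallest KL divergence. An illustrative sketch (a single-Gaussian family fitted to a two-component mixture; for KL(f||g) with Gaussian g the optimum is known to be moment matching, which the grid search should reproduce):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def kl_divergence(p, q, dx):
    """Discretized KL(p || q) on a common grid."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0) * dx

# 'intractable' posterior: equal mixture of N(-1, 0.5) and N(2, 0.5)
dx = 0.01
xs = [-6.0 + dx * i for i in range(1201)]
f = [0.5 * normal_pdf(x, -1.0, 0.5) + 0.5 * normal_pdf(x, 2.0, 0.5) for x in xs]

# fixed form: single Gaussian, eta = (mu, sigma); crude grid search on (4.35)
best = None
for mu in [0.1 * m for m in range(-10, 21)]:
    for sigma in [0.5 + 0.1 * s for s in range(26)]:
        g = [normal_pdf(x, mu, sigma) for x in xs]
        d = kl_divergence(f, g, dx)
        if best is None or d < best[0]:
            best = (d, mu, sigma)
# moment matching predicts mu* = 0.5 and sigma* = sqrt(0.25 + 2.25) ~ 1.58
```

Grid search is, of course, only feasible for very low-dimensional η; it is used here solely to make the minimization (4.35) concrete.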
4.3.4 Variational Bayes (VB) approximation
The Variational Bayes procedure is defined by the restriction of conditional independence:

f̃(Θ_1, Θ_2|d(t)) = f̃(Θ_1|d(t)) f̃(Θ_2|d(t)).

Note that the restriction does not prescribe any specific form of the distributions; therefore, the involved distributions are denoted f̃. Optimization of the KL divergence for this choice is given by the following theorem.

Theorem 4.1 (Variational Bayes) Let f(Θ|d(t)) be the posterior pdf of the multivariate parameter Θ. The parameter Θ is partitioned into Θ = [Θ′_1, Θ′_2, …, Θ′_q]′. Let f̃(Θ|d(t)) be an approximate pdf restricted to the set of conditionally independent distributions on Θ_1, Θ_2, …, Θ_q:

f̃(Θ|d(t)) = f̃(Θ_1, Θ_2, …, Θ_q|d(t)) = ∏_{i=1}^q f̃_i(Θ_i|d(t)).   (4.36)
Then, the minimum of the KL divergence,

f̂(Θ|d(t)) = arg min_{f̃(·)} KL( f̃(Θ|d(t)) || f(Θ|d(t)) ),   (4.37)

is reached for

f̂_i(Θ_i|d(t)) ∝ exp( E_{f̂_{/i}(Θ_{/i}|d(t))}( ln f(Θ, D) ) ),  i = 1, …, q,   (4.38)

where Θ_{/i} denotes the complement of Θ_i in Θ, and f̂_{/i}(Θ_{/i}|d(t)) = ∏_{j=1, j≠i}^q f̂_j(Θ_j|d(t)). We will refer to f̂(Θ|d(t)) as the Variational Extreme. The conditionally independent elements of (4.38) will be called VB-marginals. The parameters of the posterior distributions (4.38) will be called VB-statistics.
Proof: See [70], [71].
The main computational problem of the VB approximation is that the Variational Extreme (4.38) is not given in closed form. For example, with q = 2, the moments of f̂_1(·) are needed for evaluation of f̂_2(·), and vice versa. The solution of (4.38) is usually found via an iterative algorithm that is suggestive of the EM algorithm (Algorithm 4.1), but where all steps involve expectations of the kind in (4.32), as follows.
Algorithm 4.2 (Variational EM (VEM)) Consider the case where q = 2, i.e. Θ = [Θ′_1, Θ′_2]′. Then, cyclic iteration of the following steps, n = 1, 2, …, converges to a VB extreme (4.38).

E-step: compute the approximate distribution of the parameter Θ_2 at iteration n:

f̂_2^(n)(Θ_2|d(t)) ∝ exp ∫_{Θ_1} f̂_1^(n−1)(Θ_1|d(t)) ln f(Θ_1, Θ_2, D) dΘ_1.   (4.39)
M-step: using the approximate distribution from the nth E-step, compute the approximate distribution of the parameter Θ_1 at iteration n:

f̂_1^(n)(Θ_1|d(t)) ∝ exp ∫_{Θ_2} f̂_2^(n)(Θ_2|d(t)) ln f(Θ_1, Θ_2, D) dΘ_2.   (4.40)

The initializers, i.e. the VB-statistics of f̂_1^(0)(·) and f̂_2^(0)(·), may be chosen randomly. Convergence of the algorithm to fixed VB-marginals, f̂_i(Θ_i|d(t)), ∀i, was proven in [70] via the natural gradient technique [72].
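The alternating structure of Algorithm 4.2 can be illustrated on the classic conjugate example: normal observations with unknown mean and precision under a Normal-Gamma prior, factorized as q(μ)q(τ). The sketch below assumes exactly that model (a Python illustration; priors and names are not from the thesis toolbox):

```python
import random

def vb_gaussian(data, mu0=0.0, beta0=1.0, a0=1.0, b0=1.0, n_iter=100):
    """VEM iteration (Algorithm 4.2) for x_i ~ N(mu, 1/tau) with a conjugate
    Normal-Gamma prior, under the factorization q(mu, tau) = q(mu) q(tau):
    q(mu) is Gaussian, q(tau) is Gamma, and each update needs only the
    moments of the other VB-marginal."""
    n = len(data)
    xbar = sum(data) / n
    e_tau = a0 / b0                      # initializer: prior mean of tau
    a_n = a0 + 0.5 * (n + 1)             # shape is fixed during iterations
    for _ in range(n_iter):
        # update q(mu) = N(mu_n, 1/lam_n), given the current E[tau]
        mu_n = (beta0 * mu0 + n * xbar) / (beta0 + n)
        lam_n = (beta0 + n) * e_tau
        # update q(tau) = Gamma(a_n, b_n), given the moments of q(mu)
        e_sq = lambda x: (x - mu_n) ** 2 + 1.0 / lam_n   # E_q[(x - mu)^2]
        b_n = b0 + 0.5 * (beta0 * e_sq(mu0) + sum(e_sq(x) for x in data))
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n

random.seed(0)
data = [random.gauss(1.0, 2.0) for _ in range(500)]   # true mu = 1, tau = 0.25
mu_n, lam_n, a_n, b_n = vb_gaussian(data)
```

The returned quantities are precisely the VB-statistics of the two VB-marginals; E[τ] = a_n/b_n should approach the true precision.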
Compared to the fixed-form minimum divergence approximation (4.35), there are two key differences:

1. the approximating distribution is not confined to a given form, but is restricted functionally, using the assumption of conditional independence:

f(Θ|d(t)) ≈ f̃(Θ|d(t)) = f̃(Θ_1|d(t)) f̃(Θ_2|d(t)) ⋯ f̃(Θ_q|d(t)),   (4.41)

where Θ = [Θ′_1, Θ′_2, …, Θ′_q]′ is the multivariate parameter partitioned into q elements. The notation f̃(·) is used to denote an unspecified functional variant ('wild-card' function) used in the optimization procedure which yields the approximating distribution.

2. for reasons of tractability, the VB procedure does not minimize the 'original' KL divergence from f(Θ|d(t)) to f̃(Θ|η) (4.35), but the 'reverse' KL divergence KL( f̃(Θ|d(t)) || f(Θ|d(t)) ), i.e. from f̃(Θ|d(t)) to f(Θ|d(t)).
These have, respectively, the following consequences:

1. conditional independence:

• the VB approximation can be used only for models with more than one parameter,

• cross-correlation between the variables Θ_1 and Θ_2 is not modelled. Intuitively, the correlated multivariate distribution is modelled as a product of approximating marginals.

2. the use of the 'reverse' KL divergence:

• from property 4. of the KL divergence (Section 4.3.3), the 'reverse' KL divergence is not equal to the 'original' one; the approximation is therefore suboptimal in the statistical utility sense [52].

• the minimum divergence approximation via KL( f̃(·) || f(·) ) is not guaranteed to have a unique minimum [68].

These disadvantages are, however, outweighed by computational advantages: (i) functional (i.e. free-form) optimization has an analytical solution, and (ii) the parameters of the optimal approximating posteriors can be evaluated using the alternating VEM algorithm (Algorithm 4.2).
4.3.5 Markov Chain Monte Carlo (MCMC) approximation
In this approach, the posterior pdf is approximated by a piecewise constant density on a partitioned support, i.e. via a histogram constructed from a sequence of random samples, Θ^(0), Θ^(1), Θ^(2), …, Θ^(n), …, of the variable Θ.

The sequence of random samples is called a Markov chain if the n-th sample Θ^(n) is generated from a chosen conditional distribution

f( Θ^(n) | Θ^(n−1) ),   (4.42)

which depends only upon the previous state of the chain, Θ^(n−1).

Under mild regularity conditions on f(·|·) (4.42), as n → ∞, Θ^(n) ∼ f(Θ), the (time-invariant) stationary distribution of the Markov chain defined via the kernel (4.42). Hence, i.i.d. samples from f(Θ) may be drawn via an appropriate choice of the kernel (4.42), if n is chosen sufficiently large. Typically, the associated computational burden is high, especially for high-dimensional parameters.
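A random-walk Metropolis sampler is perhaps the simplest realization of the kernel (4.42). The following is an illustrative Python sketch (the Gaussian target and all tuning constants are assumptions for the example):

```python
import math, random

def metropolis(log_target, theta0, n_samples, step=1.0, burn_in=1000):
    """Random-walk Metropolis: the kernel (4.42) proposes theta' ~ N(theta, step^2)
    and accepts with probability min(1, f(theta')/f(theta)); after burn-in the
    retained samples form the histogram approximation of f(Theta)."""
    random.seed(42)                       # fixed seed, for reproducibility
    theta = theta0
    samples = []
    for i in range(burn_in + n_samples):
        prop = theta + random.gauss(0.0, step)
        log_ratio = log_target(prop) - log_target(theta)
        if log_ratio >= 0 or random.random() < math.exp(log_ratio):
            theta = prop                  # accept the proposed move
        if i >= burn_in:
            samples.append(theta)
    return samples

# illustrative target: unnormalized N(3, 1), log f = -(theta - 3)^2 / 2
samples = metropolis(lambda th: -0.5 * (th - 3.0) ** 2, theta0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
```

Note that only the unnormalized density is needed, which is exactly why MCMC is attractive for intractable posteriors.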
4.4 Approximate Bayesian filtering
In this Section, we study the use of distributional approximations (Section 4.3) for the problem of Bayesian filtering (Proposition 3.1). We review the technique of forgetting [73]. Moreover, we introduce a new approximation technique based on the VB approximation (Section 4.3.4), which will be called VB-filtering.
4.4.1 Forgetting
Note that for the estimation scenario, the Bayesian filtering problem is replaced by accumulation of sufficient statistics, which is computationally feasible. The technique of forgetting [74] was proposed for estimation of non-stationary parameters of models from the exponential family (4.16). Originally, the technique was developed as a heuristic [74]. Later, it was shown to be a special form of approximation of the time-update operation in Bayesian filtering [73].
The time-update operation is approximated as follows:

f(Θ_t|d(t−1), φ_t) ∝ [ f(Θ_{t−1}|d(t−1))_{Θ_t} ]^{φ_t} × [ f_A(Θ_t|d(t−1)) ]^{1−φ_t}.   (4.43)

The notation f(·)_{Θ_t} indicates the replacement of the argument of f(·) by Θ_t, where Θ_t is the time-varying unknown parameter set at time t. f_A(·) is a chosen alternative distribution, expressing alternative knowledge about Θ_t at time t. The coefficient φ_t, 0 ≤ φ_t ≤ 1, is known as the forgetting factor. From (4.43), the limits are interpreted as follows:

for φ_t = 1: prior information, at time t, about the new variable Θ_t is identical to the posterior of Θ_{t−1} at t−1:

f(Θ_t|d(t−1), φ_t) = f(Θ_{t−1}|d(t−1))_{Θ_t}.
This is consistent with the choice Θ_t = Θ_{t−1}, i.e. the time-invariant parameter assumption.

for φ_t = 0: prior information, at time t, about the new variable Θ_t is chosen as the alternative distribution:

f(Θ_t|d(t−1), φ_t) = f_A(Θ_t|d(t−1)).

This is consistent with the choice of independence between Θ_t and Θ_{t−1}, i.e.

f(Θ_t, Θ_{t−1}|d(t−1)) = f_A(Θ_t|d(t−1)) f(Θ_{t−1}|d(t−1)).

The forgetting factor is typically considered as fixed, and it is chosen by the designer of the model. A choice of φ_t close to 1 models slowly varying parameters; a choice of φ_t close to 0 models rapidly varying parameters.
Remark 4.4 (Internal model for forgetting) It is possible to construct the explicit internal model (3.9); however, no practical use has been found for it.

Using this approach, the task of Bayesian filtering (Proposition 3.1) can be re-interpreted in terms of the task of estimation within the exponential family (Section 4.2). Using the time-update operation (4.43), the data-update operation (3.14) for the exponential family (4.16) can be rewritten as follows:

f(Θ|d(t)) = (A(Θ))^{ν_t} exp⟨V_t, C(Θ)⟩,
V_t = φ_t V_{t−1} + B(Ψ_t) + (1−φ_t) V_{A,t},   (4.44)
ν_t = φ_t ν_{t−1} + 1 + (1−φ_t) ν_{A,t},

where V_A and ν_A denote the statistics of the alternative distribution f_A.
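The recursion (4.44) acts element-wise on the statistics, so one update step is a few lines of code. A minimal sketch (illustrative names, Python rather than the project's Matlab/C):

```python
def forgetting_update(V, nu, B_psi, V_alt, nu_alt, phi):
    """One data-update step with forgetting, eq. (4.44): the accumulated
    statistics are discounted by phi and blended with the statistics of the
    alternative distribution before the new observation is added."""
    V_new = [phi * v + b + (1.0 - phi) * va
             for v, b, va in zip(V, B_psi, V_alt)]
    nu_new = phi * nu + 1.0 + (1.0 - phi) * nu_alt
    return V_new, nu_new

# phi = 1: plain accumulation, consistent with time-invariant parameters
V1, nu1 = forgetting_update([2.0, 5.0], 10.0, [1.0, 1.0], [0.0, 0.0], 0.0, phi=1.0)
# phi = 0: the past is discarded in favour of the alternative statistics
V0, nu0 = forgetting_update([2.0, 5.0], 10.0, [1.0, 1.0], [0.5, 0.5], 3.0, phi=0.0)
```

The two calls reproduce the two limit interpretations of (4.43) discussed above.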
4.4.2 Variational Bayes filtering
In this Section, we slightly re-formulate the operation of Bayesian filtering (Proposition 3.1). Here, we treat the time-update operation (3.13) as a sub-task of the data-update operation (3.14). The data-update operation is approximated as one operation:

f(Θ_t|d(t)) ∝ ∫ f(Θ_t, Θ_{t−1}|d(t)) dΘ_{t−1},

which can be—under the assumptions of Agreement 3.1—split into separate time- and data-update operations.

Here, we seek an approximation of the joint distribution f(Θ_t, Θ_{t−1}|d(t)) in the class of conditionally independent distributions, i.e.

f̃(Θ_t, Θ_{t−1}|d(t)) = f̃(Θ_t|d(t)) f̃(Θ_{t−1}|d(t)).
Using the Variational Bayes approximation (Theorem 4.1), it is easy to show that the optimal approximation can be found in the following form:

f̂(Θ_t|d(t)) ∝ exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(y_t, Θ_t, Θ_{t−1}|u_t, d(t−1)) ) ),
f̂(Θ_{t−1}|d(t)) ∝ exp( E_{f̂(Θ_t|d(t))}( ln f(y_t, Θ_t, Θ_{t−1}|u_t, d(t−1)) ) ).

From Agreement 3.1, it follows that

f̂(Θ_t|d(t)) ∝ exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(y_t|u_t, d(t−1), Θ_t) + ln f(Θ_t|Θ_{t−1}) + ln f(Θ_{t−1}|d(t−1)) ) )
∝ f(y_t|u_t, d(t−1), Θ_t) exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(Θ_t|Θ_{t−1}) ) )

(the first log-term is independent of Θ_{t−1} and can be taken out of the expectation; the last log-term is independent of Θ_t and thus becomes part of the normalization). Hence, the time-update step can be written as:

f̂(Θ_t|d(t−1)) ∝ exp( E_{f̂(Θ_{t−1}|d(t))}( ln f(Θ_t|Θ_{t−1}) ) ).

However, in this case, the time- and data-update operations cannot be performed sequentially, as they are mutually dependent. Therefore, the VEM algorithm (Algorithm 4.2) must be used.
4.5 Approximate estimation
4.5.1 Bayes-closed approximation
The problem of recursive estimation with limited memory was addressed in general in [75]. There, the problem was defined as finding a functional form, f̃(Θ), of an approximate distribution that is closed under Bayes' rule, i.e.

f̃(Θ|d(t)) ∝ f(d_t|Θ, d(t−1)) f̃(Θ|d(t−1)),   (4.45)

where f̃(Θ|d(t−1)) and f̃(Θ|d(t)) are of the same functional form. Moreover, the form must depend only on a finite-dimensional statistic, s_t, such that

f̃(Θ|d(t)) = f̃(Θ|s_t),

where the dimension of s_t is assigned, and may be chosen arbitrarily small. Note that s_t plays the role of a sufficient statistic.

The requirement of closure under Bayes' rule is important, since any Bayes-closed estimation avoids accumulation of errors during time-updating.
The family was found in the form of a probabilistic mixture of fixed (known) pdfs f_i(Θ), i = 1, …, dim(s_t), weighted by the elements of s_t. The statistic s_t is then updated by a linear functional, l(·):

s_{i,t} = s_{i,t−1} + l( f_i(Θ), ln f(d_t, Θ|d(t−1)) ),  i = 1, …, dim(s_t).   (4.46)

Alternatively, the choice of fixed pdfs f_i(Θ) can be replaced by the choice of functionals l_i(·), such that

s_{i,t} = s_{i,t−1} + l_i( ln f(d_t, Θ|d(t−1)) ).

It was proven that the approximate on-line identification (4.45) is globally optimal—with respect to orthogonal projection onto the true posterior distribution [55].

Practical use of the approximation is, however, rather limited. The method requires the time- and data-invariant linear functionals l_i(·) to be chosen a priori. Design criteria for these operators are available only for special cases. The method was demonstrated to be applicable to low-dimensional problems only.
Remark 4.5 (Particle Filtering) The popular technique of particle filtering [60] applied to a stationary model can be seen as a special case of the Bayes-closed approximation. In this approach, the pdf f(Θ|s_t) is approximated by particles, i.e. samples Θ^(1), Θ^(2), …, Θ^(n) from Θ*, each of which has an assigned weight w^(i). This corresponds to the choice of

f_i(Θ) = δ( Θ − Θ^(i) ),

and the weights s_t ≡ w = [w_1, w_2, …].
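The correspondence in Remark 4.5 can be made concrete: with delta functions as the fixed pdfs, any expectation reduces to a weighted sum over particles. The sketch below uses importance re-weighting merely to produce non-uniform weights (the densities and names are illustrative assumptions):

```python
import math, random

def particle_expectation(particles, weights, g=lambda th: th):
    """With f(Theta) represented by particles Theta^(i) and weights w^(i),
    i.e. f_i(Theta) = delta(Theta - Theta^(i)) and s_t = w, any expectation
    E_f[g(Theta)] reduces to a normalized weighted sum."""
    total = sum(weights)
    return sum(w * g(th) for th, w in zip(particles, weights)) / total

random.seed(7)
# particles drawn from N(0, 1), then re-weighted towards a N(1, 1) target;
# the weights are the density ratio N(1,1)/N(0,1) = exp(theta - 1/2)
particles = [random.gauss(0.0, 1.0) for _ in range(20000)]
weights = [math.exp(th - 0.5) for th in particles]
post_mean = particle_expectation(particles, weights)   # close to 1
```

The histogram of the weighted particles is exactly the piecewise-constant approximation discussed in Section 4.3.5.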
4.5.2 Projection based approach
In this case, the requirement for the approximating family to be closed under Bayes' rule is relaxed. The form of the posterior, f̂(Θ|s_t), is given a priori and fixed for all t. It is Bayes' rule itself that is approximated at each step [59, 76]. If the posterior distribution has a form different from the prior, f̂(Θ|s_{t−1}), an approximation of the posterior is found in the family of the prior distribution:

f̂(Θ|s_t) ≈ f(Θ|d(t)) ∝ f(d_t|Θ, d(t−1)) f̂(Θ|s_{t−1}).   (4.47)

The approximation (4.47) is used as the prior in the next step.

All projection-based approximations of pdfs reviewed in Section 4.3.3 may be used here.

Note that the one-step approximation is only locally optimal (i.e. optimal only for one step, not for the whole trajectory), and so the error of approximation may grow with time. Thus, the quality of the approximation has to be studied asymptotically, i.e. for t → ∞. Furthermore, the approximation is not closed under Bayes' rule. In practice, this means that on-line identification given a set of i.i.d. observations yields different results depending on the order in which the data are processed [59].
Remark 4.6 (Re-sampling in particle filtering) A typical problem of particle filtering is that the posterior mass concentrates on a few particles. This effect can be avoided by the so-called re-sampling operation. This operation can, once again, be seen as a projection of the pdf onto another support. One immediate consequence is that closure under Bayes' rule is lost.
4.5.3 On-line Variational Bayes
The general VB approximation (Section 4.3.4) was extended to the on-line scenario in [70]. The on-line VB method is a special case of one-step approximation, namely distribution fitting, with Theorem 4.1 used to satisfy (4.47). Convergence of the method was also proven in [70], by showing on-line VB to be a special case of stochastic approximation, which is known to converge [77].
The off-line VB approximation (Section 4.3.4) is a functional optimization of the KL divergence. This functional optimization can be extended to the on-line scenario as follows:

f̃(Θ|d(t)) ≈ f(d_t|Θ, d(t−1)) f̃(Θ|d(t−1)).   (4.48)

We seek an optimal approximation of the true posterior under the conditional independence constraint (assume q = 2 for algebraic simplicity):

f̃(Θ|d(t)) = f̃(Θ_1|d(t)) f̃(Θ_2|d(t)),   (4.49)
f̃(Θ|d(t−1)) = f̃(Θ_1|d(t−1)) f̃(Θ_2|d(t−1)).   (4.50)

Then, using (4.48) and (4.49) in Theorem 4.1, the VB-optimal form of (4.49) is found to be:

f̂(Θ_i|d(t)) ∝ exp( E_{f̂(Θ_{/i}|d(t))}( ln f(d_t|Θ, d(t−1)) ) + ln f̃(Θ_i|d(t−1)) )
∝ exp( E_{f̂(Θ_{/i}|d(t))}( ln f(d_t|Θ, d(t−1)) ) ) f̃(Θ_i|d(t−1)).   (4.51)

Equation (4.51) can be rewritten as:

f̂(Θ_i|d(t)) = f_i^VB(d_t|Θ, d(t−1)) f̃(Θ_i|d(t−1)),  i = 1, 2,   (4.52)
f_i^VB(d_t|Θ, d(t−1)) ∝ exp( E_{f̂(Θ_{/i}|d(t))}( ln f(d_t|Θ, d(t−1)) ) ).   (4.53)
Then, (4.52) is the VB-approximate update of the parameter distribution, where f_i^VB(d_t|Θ, d(t−1)) plays the role of the observation model for the ith posterior distribution. Hence, it will be known as the partial VB-observation model. This concept is helpful, since it allows us to use the results from estimation (Section 4.2): choosing f̃(Θ_i|·) conjugate with the partial VB-observation model (4.53) yields a numerically tractable recursive identification algorithm. This VB-conjugate distribution can be found if the partial VB-observation model (4.53) is from the exponential family.

Note that (4.53) is, in fact, in the form of the Bayes-closed approximation (4.46), with E_{f̂(Θ_{/i}|d(t))}(·) playing the role of the linear operator l_i(·). However, the expected value E_{f̂(Θ_{/i}|d(t))}(·) is conditioned on d(t) and is, therefore, time-varying. This is not allowed for the linear operators used in the Bayes-closed approximation. Therefore, the on-line VB approximation (4.52) is not closed under Bayes' rule. Asymptotically, however, the VEM evaluation of f̂(Θ_i|d(t)) converges [70]. Thus, the VB approximation is asymptotically Bayes-closed.
4.6 Approximate design of DM Strategy
Fully probabilistic design (FPD) of the DM strategy was chosen as the main approach to this problem. For special cases from the model families reviewed in Sections 4.1 and 4.2, the solution of FPD (Proposition 3.2) is analytically tractable. However, this is not true in general. In this Section, we analyze the problem of approximate design of DM strategies using the FPD approach.
The solution of FPD (Proposition 3.2) can be interpreted as a specific type of dynamic programming [10], with γ(d(t)) playing the role of the Bellman function [6]. The function γ(d(t)) is recursively evaluated against the time arrow, γ(d(t−1)) = g(γ(d(t))), where the function g(·) is defined by (3.27), (3.28). The FPD solution is feasible if the Bellman function γ(d(t)) can be represented by a finite-dimensional statistic at each time t. In other words, the form of the Bellman function is again self-replicating under the operations (3.27), (3.28).
This concept is similar to that of conjugacy (Remark 4.3). Therefore, the approach to approximation may be similar to that used for filtering (Section 4.4) and estimation (Section 4.5). A review of state-of-the-art techniques for general dynamic programming was presented in [78]. It was concluded that the most promising approach to the problem is the use of approximate Bellman functions from a carefully chosen family [79].
The number of steps involved in the FPD solution—Proposition 3.2, restated in (3.27), (3.28)—is rather high. However, since all of the steps are operations of probabilistic calculus, general probabilistic approximations (Section 4.3) can be used.

Approximations of FPD for mixture models have been presented in [80]. However, systematic use of distributional approximations in this area remains a topic for future research.
5 Practical Aspects of Decision Making
In this Chapter, we list the steps that are meaningful for application of the DM theory (Chapter 3) to a practical problem. Both off-line and on-line parts are covered.
Agreement 5.1 (On-line steps of decision making) The adaptive decision maker operates by recursive repetition of the following steps:

1. read: the observed data are read from the environment. All the necessary pre-processing and transformation of data is done in this step.

2. learn: the observed data are used to increase the knowledge about the environment.

3. adapt: the decision maker uses the improved knowledge of the system to improve its DM strategy.

4. decide: the adapted DM strategy is used to choose an appropriate action.

5. write: the chosen action is written into the environment. Similarly to the first step, transformation of the results is done in this step.
Note that all of the on-line steps of DM, described in Agreement 5.1, should be done within a fixed
period of time. This justifies our emphasis on feasibility (Requirement 4.1) of the DM operations
(Chapter 4). The UML notation of the on-line DM is displayed in Figure 5.1.
Due to computational constraints, it is expected that the level of adaptivity of the decision maker is rather limited. Namely, it is expected that both (i) the structure of the model of the environment, and (ii) the structure of the DM strategy, are defined a priori and hard-wired into the nature of the decision maker. The challenging task of selecting appropriate structures is typically left to expert designers. The DESIGNER project [81, 82, 83, 84] is an attempt to systematically address the task of automated design of DM strategies for various practical problems.
The following steps summarize the available experience, gained especially in development of the
DESIGNER project.
Figure 5.1: UML sequence diagram of the on-line steps of decision making.

Agreement 5.2 (Basic steps of DM)

1. Problem description: In this step, a technical problem specification covering all available knowledge, aims and restrictions is collected from the user. Specifically, we collect the knowledge required for all subsequent steps. The first required information is a full description of the observed data. We
are dealing with dynamic systems; therefore, all data are expected to vary in time. The stream (time-indexed sequence) of observations is called a channel. At first, we collect a description of all individual channels: the available off-line data, the ranges of the data sensors, and the role in the DM (i.e. whether the channel is an action u_t or an observation y_t). Then, expert knowledge relevant to each of the following steps is collected.
2. Elicitation of prior distributions: Typically, the expert knowledge is not available in the form of pdfs. Therefore, this knowledge needs to be converted (often approximately) into probabilistic terms. From now on, only the probabilistic representation of this knowledge will be used by the methodology. The original description of the problem will be used only in interaction with the user.

3. Model selection: The expert knowledge, collected in step 1, does not select one particular model, but only a class of considered models. The available off-line data can be used to decide which model from the class is best suited to the problem.

4. Learning: Parameters of the model selected in the previous step are estimated.
5. Model validation: Since the model identified in the previous steps will be considered as fixed (up to some parameters, in the adaptive case), it is wise to perform some additional tests to validate the quality of the model. Use of an invalid model in the following steps may prove to be too expensive. If the model is found invalid, the whole DM process must be restarted from step 1 (or 2).

6. Elicitation of ideal pdfs: At this stage, the structure of the model is considered as fixed. Therefore, the ideal pdf—which has the same structure as the model, for computational tractability—can be built from the specifications obtained from the user in step 1.

7. Design: In this step, the admissible control strategy is computed.

8. Design validation: The designed control strategy is tested to verify that the closed loop meets the requirements specified in step 1. If these requirements are not met, the DM process must be restarted from step 6, or, in severe cases, from step 1.

9. Implementation: The control strategy is implemented and tested on-line in the real environment. This is also the final validation of the approach.
The cycle of development used under DESIGNER is described by the sequence diagram in Figure 5.2. The steps in the cycle are only loosely tied together. The user (designer) is allowed to:

skip some steps if reasonable defaults are available (e.g. for prior elicitation or model selection).

repeat steps if he is not satisfied with the achieved results. This makes sense only if he changes his description of the system. The need naturally arises after each validation step: when the user finds the learned model or the designed strategy insufficient for his needs, the whole process must be restarted.
5.1 Problem description
At this stage, the user (or the designer) should describe the problem in a systematic way. Interaction with the user can be done in two ways:

interactive mode, where the DM process stops after each step and allows the user to change the description or restart the whole process.

batch mode, in which all the information is collected first, and all computation runs independently of the user. This is illustrated in Figure 5.2, where all steps are called from the operation batchrun.
Figure 5.2: UML sequence diagram of the design of the decision-maker.
Naturally, it should be possible to combine these modes. For example, an initial description of the problem is first created in the interactive mode on a small dataset. Then, the computation on a larger dataset is run with the same description in the batch mode.

Thus, a systematic way of storing the user's description of the problem is required. From a software-design point of view, a structure for external information is to be created. This structure cannot be considered as rigid or complete. It is expected that, with more sophisticated methods, more information from the user may be required. Following the object-oriented approach, all new information fields should be added by the mechanism of inheritance. This approach will ensure compatibility of the extended description with older methods.
5.2 Prior elicitation
If knowledge of the modelled environment is available before any data are observed, this knowledge can be injected into the learning process by means of prior pdfs on the model parameters. Thus, the form of the prior pdf depends on the observation model used for estimation. However, the prior knowledge, K, is typically available in a form independent of the chosen model parameterization. The task of prior elicitation is basically the translation of partial knowledge available for the input/output model (3.15) into prior knowledge on the parameters, f(Θ).

Since the main concern of this work is the feasibility of the whole process, we adopt the following assumptions:

1. The form of the prior is chosen as conjugate under Bayesian filtering/estimation. Therefore, the task of prior elicitation is to select only its statistics.

2. If more than one piece of knowledge is available, i.e. K = {K_1, …, K_K}, information from all possible sources must be taken into account. Typically, a compromise between the sources must be found, as the sources can suggest incompatible knowledge.
It is easy to recognize the task of prior elicitation as a special case of merging, namely indirect merging (Section 4.2.4). However, this task has a few specific features, which we analyze in this Section. Notably, the problem was addressed before the general problem of probabilistic merging; hence, it is interesting to review the published results [85, 86, 87, 88] in the light of the recent development of the general theory of merging.
5.2.1 Elicitation of prior pdf from one source
In this Section, we consider the elicitation of the prior based on a single piece of knowledge K. A feasible mechanism for prior elicitation was introduced in [85]. The prior knowledge is used to generate typical data records, d̄, which would be generated by a system (of the form of the chosen observation model) that is compatible with K. These data are called fictitious data, since they
were not observed on any real system. Propagation of the fictitious data, d̄, through Bayes' rule yields the required prior distribution

f(Θ|K) ≈ f(Θ|d̄) ∝ f(d̄|Θ) f(Θ),   (5.1)

where f(Θ) denotes a non-informative 'pre-prior' distribution.

This mechanism can be interpreted as a special case of the general theory of probabilistic merging (Section 3.4), namely indirect merging (3.37). Note, from (3.37) and (5.1), that the generation of the fictitious data d̄ corresponds to the approximation of f(y_t|d(t−1), u_t) by an empirical density f̆(d(t)), for which the merging operation (3.37) is equivalent to the learning operation (3.14).

From the point of view of software design, we note that elicitation of prior information from one source is done in two steps: (i) translation of the given knowledge into fictitious data, and (ii) learning with the fictitious data. The learning operation is common to all fictitious data; however, the translation into fictitious data must be defined for each type of source of prior information.
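For a conjugate pair, propagating fictitious data through Bayes' rule (5.1) reduces to accumulation of statistics. A sketch for a multinomial observation model with a Dirichlet prior follows; the encoding of the expert statement as six fictitious records is, of course, an illustrative assumption:

```python
def elicit_dirichlet_prior(fictitious_data, categories, pre_prior=1.0):
    """Prior elicitation via fictitious data, eq. (5.1), for a multinomial
    observation model with a conjugate Dirichlet prior: pushing the
    fictitious records through Bayes' rule amounts to adding their counts
    to a flat 'pre-prior'."""
    alpha = {c: pre_prior for c in categories}
    for record in fictitious_data:
        alpha[record] += 1.0      # one Bayes-rule update per fictitious record
    return alpha

# expert statement 'A occurs about twice as often as B', encoded (as an
# illustrative assumption) by six fictitious records
alpha = elicit_dirichlet_prior(["A", "A", "A", "A", "B", "B"], ["A", "B"])
prior_mean_A = alpha["A"] / sum(alpha.values())   # prior mean of p_A
```

The translation of knowledge into fictitious records is the step that must be defined per source type; the learning step is the same for all of them.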
5.2.2 Merging of knowledge sources
In this Section, we describe merging of prior knowledge from various sources. It is assumed that prior pdfs f(Θ|K_i), i = 1, …, K, are available. The task is to find an approximate distribution f̂(Θ|K) which (i) is conjugate with the observation model, and (ii) combines the knowledge accumulated in f(Θ|K_i), i = 1, …, K. It was found [87] that

f̂(Θ|K) = ∏_{i=1}^K f(Θ|K_i)^{β_i}   (5.2)

is optimal in the sense of the KL divergence. The scalars β_i, i = 1, …, K, ∑_{i=1}^K β_i = 1, are weights corresponding to the ith source of prior knowledge. This result is not surprising, since it is, yet again, a special case of the general merging operation, namely (3.36).
However, in the general merging theory, the weights βi are assumed to be known. This assumption is
not valid in the task of prior elicitation. A method for the selection of the weights βi was presented in [87].
It is argued that βi should be chosen as follows:

βi ∝ f(d(t)|Ki) = ∫ f(d(t)|Θ) f(Θ|Ki) dΘ. (5.3)

Note that (5.3) is the marginal posterior distribution of βi. Therefore, formally, it is not a prior, but
a posterior estimate. In spite of this, such a choice is important from the practical point of view, since
it allows us to reduce the computational cost associated with learning. For example, the weights βi may
be estimated on a small amount of data and fixed, while the learning is performed on a larger dataset
with the fixed prior f̂(Θ|K).
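For a concrete illustration of (5.2), consider Gaussian priors f(Θ|Ki) = N(mi, vi): their weighted geometric mean is again Gaussian, with precision ∑i βi/vi and a precision-weighted mean. The following sketch assumes the weights βi are already available (e.g. pre-computed via (5.3)); all names are illustrative.

```python
# Minimal sketch of the geometric merging (5.2) for Gaussian priors
# f(Theta|K_i) = N(m_i, v_i). The weighted geometric mean of Gaussians
# is Gaussian with precision sum_i beta_i / v_i; names are illustrative.

def merge_gaussian_priors(means, variances, betas):
    assert abs(sum(betas) - 1.0) < 1e-12    # weights must sum to one
    precision = sum(b / v for b, v in zip(betas, variances))
    mean = sum(b * m / v for b, m, v in zip(betas, means, variances)) / precision
    return mean, 1.0 / precision

m, v = merge_gaussian_priors(means=[0.0, 2.0], variances=[1.0, 1.0], betas=[0.5, 0.5])
print(m, v)   # equal weights and variances -> mean 1.0, variance 1.0
```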
Remark 5.1 (Flattening) The informativeness of the prior distribution depends strongly on the
number of fictitious data records that were used for its creation. Moreover, in complex situations, posterior
distributions may be used in the construction of an adequate prior [28], which presents a danger of over-fitting
of the prior with respect to the used dataset. In order to overcome this problem, the operation
of flattening is defined as a way to reduce the informativeness of the prior. From the software design point
of view, flattening is a new general operation on pdfs.
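One possible realization, sketched here under the assumption of a conjugate prior in the exponential family, is to raise the pdf to a power 0 < φ < 1, which scales the accumulated (fictitious) statistics; the Beta example and all names are hypothetical.

```python
# Sketch of the 'flattening' operation for a conjugate Beta prior, where
# raising the pdf to a power 0 < phi < 1 scales its accumulated
# statistics. The informativeness of Beta(alpha, beta), measured as the
# number of fictitious observations, is alpha + beta - 2.

def flatten_beta(alpha, beta, phi):
    """Return the Beta statistics of f(Theta)^phi, renormalized.

    Flattening widens the pdf (fewer effective fictitious data) while
    preserving the location of its mode.
    """
    return 1.0 + phi * (alpha - 1.0), 1.0 + phi * (beta - 1.0)

a, b = flatten_beta(8.0, 4.0, phi=0.5)
print(a, b)   # Beta(4.5, 2.5): effective counts halved, mode kept at 0.7
```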
5.3 Model selection
The notion of model structure has been established in Section 3.1, Remark 3.1. Under the Bayesian
paradigm, the model M can be treated as an unknown variable. Hence, the task of model selection
is in principle equivalent to that of Bayesian learning. However, the set of all possible models M∗
(3.8) is infinite-dimensional, and a practical construction of the prior distribution over it, as well as
of the posterior and the evaluation of its moments, is intractable. Therefore, we treat this special problem in
this Section.
We adopt the following assumptions: (i) the model is considered to be time-invariant for all data
records d(t), and (ii) the prior on M∗ is considered to be uniform. Then, the task of model selection
is equivalent to finding the maximum likelihood estimate M̂ ∈ M∗. The likelihood function
L(d(t), M) of M is equal to the distribution f(d(t)|M), viewed as a function of M. Thus, the construction of
the likelihood function is implied by (3.19):

L(d(t), M) = ∏_{t∈t∗} f(yt|ut, d(t−1), M). (5.4)

Hence, the estimation selects among the various models M, from M∗, the model with the highest
v-likelihood (5.4) (likelihood on model variants).
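A minimal sketch of selection by the v-likelihood (5.4): for each candidate model, the logarithms of the one-step-ahead predictive densities are accumulated over the data, and the model with the highest sum wins. The two candidates (a fixed fair coin versus a learnt Beta-Bernoulli model) and the data are invented for illustration.

```python
import math

# Sketch of model selection by the v-likelihood (5.4): for each candidate
# model M, the one-step-ahead predictive densities f(y_t | d(t-1), M)
# are accumulated in the log domain. Candidates and data are made up.

def vloglik_fair_coin(data):
    return len(data) * math.log(0.5)

def vloglik_learnt_bernoulli(data, alpha=1.0, beta=1.0):
    """Beta-Bernoulli: the predictive prob. of y_t is the posterior mean."""
    ll = 0.0
    for y in data:
        p = alpha / (alpha + beta)
        ll += math.log(p if y else 1.0 - p)
        alpha, beta = alpha + y, beta + (1 - y)
    return ll

data = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
candidates = {"fair": vloglik_fair_coin(data),
              "learnt": vloglik_learnt_bernoulli(data)}
best = max(candidates, key=candidates.get)
print(best)   # the learnt Bernoulli model attains the higher v-likelihood
```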
Remark 5.2 (Nesting in EF) For tractability reasons, only a finite (and typically quite small) number
of models can be tested using (5.4), since the evaluation of the likelihood requires performing the
learning procedure on the full set of data. An exception from this rule is the exponential family
(Section 4.2), where various model structures share parts of their sufficient statistics. This property
is known as nesting in EF [89].

From the software design point of view, model selection can be implemented as learning with a pdf
augmented by an extra discrete parameter (the label of M), followed by marginalization over the remaining
parameters.
5.4 Learning
In this step, it is assumed that the best model M̂ has already been selected. Naturally, one might
see this step as redundant, since learning of the model parameters for this model has been done in the model
selection procedure. However, these two steps play different roles in many applications. Namely,
model selection is typically performed on smaller data sets (for computational feasibility) and only
in the off-line phase of development. Learning must be performed in on-line scenarios for each incoming
data record.

From the software design point of view, model selection and learning should use the same algorithms.
However, since approximations are required for the identification of complex models, different
approximations may be needed for off-line and on-line learning. Then, the bias of the model selection
task towards off-line use, and of the learning task towards on-line use, will force us to use
different procedures.
5.5 Model validation
An extensive theory of model validation has been developed, see e.g. the review [90]. However,
the available procedures deal almost exclusively with independent data samples. Consequently, they
cannot be used for validation of dynamic models. Only a few exceptions are available [91], addressing
only special cases.
Model validation is an additional test of the quality of M̂. Recall, from Section 5.3, that M̂
was chosen under the assumption of time-invariance of the model. One task of model validation
is to verify this assumption. This task is addressed in the classical model validation theory [90]
by splitting all available data d(t) into (i) learning data dₗ, and (ii) validation data dᵥ. The
best model M̂ is learnt on the learning data dₗ and its performance is checked on the validation
data dᵥ. The validation technique essentially inspects how good the best dynamic model M̂ is at
extrapolating the past into the future. Thus, the learning data dₗ have to form the "prefix" part of d(t)
and the validation data dᵥ the "suffix" part.
The results of validation strongly depend on the choice of the cutting moment which splits the
available data into learning and validation parts. None of the existing methods [90] is directly
prepared for the considered dynamic models. These models allow only cutting into contiguous sequences.
Essentially, the available data up to a cutting moment τ are taken as learning data and the
rest as validation data. This reduces the number of possible choices of learning and validation data.
At the same time, it disqualifies the majority of the available analyses. This motivates us to design an
adequate, purely Bayesian, solution of the model validation problem.
5.5.1 Validation with fixed cutting moment
Let us consider a fixed cutting moment τ ∈ t∗ ∪ {0}, which defines

dₗ(τ) ≡ d(τ), (5.5)
dᵥ(t ∖ τ) ≡ (dτ−∂, . . . , dt), (5.6)

where ∂ is the largest delay of a data record in the auto-regression.
The task of model validation can be formulated as a test of the following hypotheses:

H0: All recorded data, d(t), are described by the learnt model M̂.

The v-likelihood of this hypothesis results from Bayesian filtering on all data, giving

f(d(t)|H0) ∝ L(d(t), M̂). (5.7)

H1: Learning data and validation data should be described by individual models.

The corresponding v-likelihood results from independent filtering on learning and validation
data, giving

f(d(t)|H1, τ) ∝ L(dₗ(τ), M̂|τ) L(dᵥ(t ∖ τ), M̂1|τ). (5.8)

Note that the proportionality factor is formed by the randomized DM strategy (3.11), which is common
to both hypotheses.
The model M̂1 used on the validation data may differ from M̂. The strength of the constructed test
depends significantly on the choice of the competing model M̂1. We make the following choice:
(i) M̂1 has the same structure as M̂, (ii) it is learnt on the validation data, (iii) the prior pdf in the validation
phase is chosen as a flattened version of the state estimate gained in the learning phase. The spread of the
flattened pdf should be comparable to that of the prior pdf used on the learning data.

This choice intuitively meets the requirement on a real competitor: learning is exploited without
fixing the results too much, and thus without restricting the possibility to fit the validation data in a better
way.
The principle of validation is graphically illustrated in Figure 5.3. Estimation on the whole data
d(t) yields a result in the class of time-invariant models. Estimation on the separate data sets yields a result
in the class of models switched at the cutting moment. The latter class is, of course, richer, but it has
a smaller portion of data per estimated variable at its disposal. Thus, the winner is not a priori determined.

With no prior prejudice, f(H0|τ) = f(H1|τ), the Bayes rule provides the posterior pdf f(H0|d(t), τ).
The learnt model can be accepted if the posterior pdf,

f(H0|d(t), τ) = ( 1 + L(dₗ(τ), M̂|τ) L(dᵥ(t ∖ τ), M̂1|τ) / L(d(t), M̂) )⁻¹, (5.9)

is high enough, i.e. close to 1. Otherwise, we have to search for the reason why the chosen model is
not reliable enough.
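Evaluation of (5.9) is best done in logarithms, since the v-likelihoods easily under- or overflow. A sketch with placeholder log-likelihood values:

```python
import math

# Numerically safe evaluation of (5.9): given log v-likelihoods of the
# single-model and switched-model hypotheses, compute f(H0 | d(t), tau).
# The numerical inputs below are placeholders, not results of the thesis.

def posterior_H0(loglik_whole, loglik_learn, loglik_valid):
    # ratio = L(d_l) L(d_v) / L(d(t)); work in logs to avoid overflow
    log_ratio = (loglik_learn + loglik_valid) - loglik_whole
    return 1.0 / (1.0 + math.exp(log_ratio))

p = posterior_H0(loglik_whole=-100.0, loglik_learn=-60.0, loglik_valid=-42.0)
print(round(p, 3))   # ratio e^{-2}: H0 clearly preferred, p = 0.881
```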
5.5.2 Validation with multiple cutting moments
Results of the previous test depend, often strongly, on the selected cutting moment τ. Thus, it makes
sense to validate learning for various cutting moments τ ∈ τ∗ ⊂ t∗. We make a pair of
decisions (H, τ) based on the available data d(t). We select τ ∈ τ∗ and accept (H = H0) or reject
(H = H1) the hypothesis H0 that the learnt model is valid.

Figure 5.3: Scheme of the proposed validation (a single time-invariant model versus models switched at the cutting moment). Ellipses denote classes of models; small circles denote alternative "positions" of the real system with respect to the model class. The crosses denote models of the systems estimated within each class. Dashed lines signify distances of the system to the best models. The hypothesis H0 is expected to win for System 1 and H1 for System 2.
We solve this static decision task and select the optimal decision Ĥ on the inspected hypotheses and
the optimal cutting moment τ̂ as a minimizer of the expected loss. We assume, for simplicity, that
the losses caused by a wrong acceptance and a wrong rejection are identical, say (without loss of generality)
1. The loss function is thus chosen as

Z(Ĥ, H, τ) = 1 − δ(Ĥ(τ) − H), Ĥ, H ∈ {H0, H1},

where δ(·) is the Kronecker delta. The optimal decisions Ĥ, τ̂ minimize the expected value E[·],
taken over the uncertain data d(t) and hypothesis H:

Ĥ, τ̂ ∈ Arg min_{Ĥ, τ∈τ∗} E[Z(Ĥ, H, τ)]. (5.10)
Proposition 5.1 (Optimal cutting) Let 0, t ∈ τ∗. Then, the optimal decision Ĥ about the inspected
hypotheses H0, H1 and the optimal cutting τ̂, which minimize the expected loss in (5.10),
are given by the following rule.

Compute τ̂0 ∈ Arg max_{τ∈τ∗} f(H0|d(t), τ) and τ̂1 ∈ Arg min_{τ∈τ∗} f(H0|d(t), τ). (5.11)

If f(H0|d(t), τ̂0) ≥ 1 − f(H0|d(t), τ̂1), then select Ĥ = H0, τ̂ = τ̂0;
else select Ĥ = H1, τ̂ = τ̂1.
Proof: Let us consider the set of cutting moments τ∗0 ≡ {τ ∈ τ∗ : f(H0|d(t), τ) ≥ 0.5}.
This finite set is non-empty, as for τ = 0, f(H0|d(t), τ) = 0.5. For a fixed τ ∈ τ∗0, the
decision Ĥ = H0 leads to a smaller loss than the decision Ĥ = H1. The achieved minimum
is the expectation over d(t) of 1 − f(H0|d(t), τ). Thus, it is smallest for τ̂0 maximizing
f(H0|d(t), τ) on τ∗0.

For any fixed τ in the set τ∗1 ≡ {τ ∈ τ∗ : f(H0|d(t), τ) ≤ 0.5}, the decision Ĥ = H1
leads to a smaller loss than the decision Ĥ = H0. The achieved minimum is the expectation
over d(t) of f(H0|d(t), τ). Thus, it is smallest for τ̂1 minimizing f(H0|d(t), τ) on τ∗1. The
smaller of the discussed pair of minima determines the optimal decision pair.
Practical applications of the above test strongly depend on the set τ∗ of the considered cutting
moments. The finest possible choice is τ∗ = t∗. The exhaustive search is too demanding for extensive
data sets. A search for the minimizer by a version of the golden-section rule, by a random choice, or by a
systematic inspection on a small predefined grid can be applied. The predefined grid seems to be the
simplest and still relevant variant, as minor changes in τ∗ make little physical sense.

A detailed elaboration of the technique for the exponential family (Section 4.2) and a simulation example
can be found in [92].
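The decision rule of Proposition 5.1 on a predefined grid can be sketched directly; the grid and the posterior values f(H0|d(t), τ) below are illustrative placeholders.

```python
# Sketch of the decision rule of Proposition 5.1 on a predefined grid of
# cutting moments. posterior(tau) stands for f(H0 | d(t), tau); both the
# grid and the posterior values are illustrative.

def validate(posterior, grid):
    tau0 = max(grid, key=posterior)          # (5.11), maximizer
    tau1 = min(grid, key=posterior)          # (5.11), minimizer
    if posterior(tau0) >= 1.0 - posterior(tau1):
        return "H0", tau0                    # accept the learnt model
    return "H1", tau1                        # reject: the model switches

table = {0: 0.5, 25: 0.9, 50: 0.7, 75: 0.6}  # f(H0|d(t), tau) on the grid
decision, tau = validate(table.get, list(table))
print(decision, tau)   # max 0.9 >= 1 - min 0.5 -> accept H0 at tau = 25
```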
5.5.3 Other techniques of model validation
From the Bayesian point of view, model validation can be seen as model selection, where the competing
models have been designed based on one selected model M̂. Other common model validation
techniques are based on the analysis of the modelling residues,

εt = E_{f(dt|d(t−1))} (dt) − dt. (5.12)

Further analysis can be done either by visual inspection, e.g. histogram based, or by additional
modelling.

From the software-design point of view, it is important to store the residues and use them as observations
for another model.
5.6 Elicitation of ideal pdfs
Since ideal pdfs are typically assigned by the user, their elicitation has many features in common with
the task of prior elicitation (Section 5.2). Specifically, elicitation of ideal distributions on data-independent
internal variables, i.e. ᴵf(Θt|d(t)) = ᴵf(Θt), is identical to prior elicitation.

A non-expert user is not able to formalize his knowledge in terms of pdfs and their statistics.
Typically, we can expect the user to formalize his requirements in terms of moments (mean and
variance) or ranges of given variables. This information must be translated into pdfs. This can be
achieved, for example, by means of projection (Section 4.3.3).
Two non-standard cases may arise:
data-dependent ideals i.e. ideals on the dynamic behaviour of the system. For example, the user may
wish to place restrictions on the differences of the observed data, yt − yt−1, in terms of upper
and lower bounds. In this case, it is typically sufficient to select the ideal distribution with
mean at yt−1 and adjust the variance to be compatible with the given bounds (e.g. using the 2σ rule
for a Gaussian distribution).
time-dependent ideals for following an a priori known trajectory, i.e. the ideal distribution is defined
on the whole DM horizon, ᴵf(d(t)). Hence, the whole trajectory must be stored.
Another possibility is that there exists an analytical formula for recursive computation of the
ideal pdf.
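As an illustration of the data-dependent case, the following sketch centres a Gaussian ideal at yt−1 (shifted by the midpoint of the user's bounds on yt − yt−1) and sets its variance by the 2σ rule; the function name and the numbers are hypothetical.

```python
# Sketch of eliciting a data-dependent ideal pdf: the user bounds the
# increment y_t - y_{t-1} by [lo, hi]; we centre a Gaussian ideal at
# y_{t-1} plus the midpoint and set sigma via the 2-sigma rule, so the
# bounds sit two standard deviations from the mean. Names illustrative.

def ideal_gaussian(y_prev, lo, hi):
    mean = y_prev + 0.5 * (lo + hi)
    sigma = (hi - lo) / 4.0      # hi - lo = 4 sigma  <=>  mean +/- 2 sigma
    return mean, sigma

mean, sigma = ideal_gaussian(y_prev=10.0, lo=-1.0, hi=1.0)
print(mean, sigma)   # ideal N(10.0, 0.5^2): increments in [-1, 1] w.p. ~0.95
```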
5.7 Design of DM strategy
By agreement, the fully probabilistic design (FPD), Proposition 3.2, is used for the design of the DM
strategy.
5.8 Design validation
The best validation of the designed DM strategy is its implementation in the real environment. However,
this can be too costly if the designed DM strategy is incorrect. Therefore, we seek a safer testing
mechanism. At present, the most common validation technique is validation by simulation, which
involves sampling from all involved pdfs. This is important from the software-design point of view.
6 Multiple Participant Decision Making
In this Chapter, we comment on the general theory of DM (Chapters 3–5) from the multiple-participant
point of view. As was mentioned in Section 1.1, the theory of MPDM is not fully developed
yet. The main distinction of the MP scenario from the classical single-participant DM is the ability
and need of participants to communicate and cooperate. Therefore, we distinguish three stages of
operation of each participant:

1. on-line (data-processing) stage, when the participant interacts with the environment, in the
same way as in the single-participant case,

2. communication stage, when the participant exchanges information with its neighbours,

3. negotiation stage, when the participant makes decisions on how to act and react with respect to
its neighbours.
6.1 On-line (data-processing) stage
This stage is equivalent to the on-line DM (Proposition 5.1) of the single-participant scenario. Here, we preserve the
traditional notion of on-line acting in the sense of processing of the latest observations from the
environment. These steps are adjusted as follows:
1. read: the observed data are read from the system (environment).

In the MP scenario, the information available at the current time contains not only the usual innovation
of the observed data, but also possible communication from the neighbours.

2. learn: the observed data are used to increase the knowledge about the system (environment).

In the MP scenario, it is necessary to absorb information from both (i) the observed data, and
(ii) possible communication from the neighbours. Note that merging of information from
the neighbours does not occur at each time step, and it may be a computationally expensive
operation. Therefore, we introduce a new step:

2a. merge: merges the current knowledge with the information obtained from the neighbours.

This operation may be called as a subroutine of the learn step, or as a separate background
job. The latter mechanism requires the development of a new mechanism for synchronization of these two
tasks.
3. adapt: the decision-maker uses the improved knowledge of the environment to improve its DM
strategy.

In the MP scenario, the ideal pdfs describing the aims of DM can be changed on-line by communicating
new ideal distributions. Therefore, it may be necessary to recompute the whole DM
strategy. We introduce a new step:

3a. design: re-evaluates the FPD on the whole horizon.

This operation may be called as a subroutine of the adapt step, or as a separate background job.

4. decide: the adapted DM strategy is used to choose an appropriate action.

In the MP scenario, the task of communication is also part of the decision-making problem. Therefore,
in this step, decisions on communication actions (such as request communication, negotiate, or
refuse communication) must also be made.

5. write: the chosen action is written into the system (environment).
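The adjusted on-line loop (steps 1–5 with the new sub-steps 2a and 3a) can be summarized structurally as follows; all operations are stubs that merely record the order of execution, standing in for the toolbox operations designed in Chapter 7.

```python
# Structural sketch of one pass of the on-line stage of an MP participant
# (steps 1-5, including the new sub-steps 2a merge and 3a design). All
# operations are placeholders that only record the order of execution.

def online_step(trace, inbox):
    trace.append("read")                      # 1. observed data from env
    trace.append("learn")                     # 2. absorb observed data
    for _msg in inbox:
        trace.append("merge")                 # 2a. absorb neighbour info
    trace.append("adapt")                     # 3. improve the DM strategy
    if any(msg == "new_ideal" for msg in inbox):
        trace.append("design")                # 3a. re-run FPD on horizon
    trace.append("decide")                    # 4. incl. communication acts
    trace.append("write")                     # 5. action back to env

trace = []
online_step(trace, inbox=["new_ideal"])
print(trace)
# ['read', 'learn', 'merge', 'adapt', 'design', 'decide', 'write']
```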
6.2 Communication
Interaction between two (or more) participants can be done only in terms common to all involved
participants. Note that we do not impose any particular internal model structure on each participant;
therefore, the participants may share only information defined on the observed data. This restriction
can be easily relaxed by defining commonly shared internal variables. Shared internal variables
may represent a real quantity with a physical meaning which is not directly observable.
Recall, from Section 3.2, that each participant stores its knowledge as the following pdfs: (i) the
factorized model (3.12), (ii) the factorized ideal (3.22), and (iii) the estimates (3.14). The participants
can thus interact via restrictions of the named objects to the data space:

DM strategy: both the optimized, f(ut|d(t−1)), and the ideal, ᴵf(ut|d(t−1)),

predictor: of observations, f(yt|ut, d(t−1)) (3.15), and the correspondingly formulated ideal,
ᴵf(yt|ut, d(t−1)). In many practical applications, the ideal will be defined in this form.

observed data: i.e. values of d(t). The individual values can be seen as a special case of pdfs,
namely the empirical density, f̆(d).

estimates: f(Θt|d(t)) and ideals ᴵf(Θt|d(t)) on internal variables, if these variables are common
to both interacting participants.
Communication is coordinated through a special channel of observed data.
6.3 Merging
Communication of two participants, P1 and P2, is meaningful only if it causes some modification of
the behaviour of at least one of them. This can be achieved in two ways: (i) modification of the model (3.12), or
(ii) modification of the aims (3.22) of either participant. The model (3.12) is factorized into (i) the observation
model, and (ii) the estimates. In principle, it is possible to consider modifications of the observation
model; however, no consistent theory is known to us, therefore we omit it in this text. Modification
of the estimates using the communicated data-related pdfs (i.e. predictors, or empirical pdfs)
can be formalized as indirect merging (Section 3.4.1), as follows:

f[1](Θt|d(t)), f[2](d(t)) −merge→ f[1](Θt|d(t)), (6.1)

where f[1](·) denotes a pdf belonging to participant P1. Technically, data records observed by P1
are different from those observed by P2, and we should reflect this fact in our notation. However,
the notation of the merged pdf would become increasingly complicated after interaction with many
neighbours, and without any practical benefit. Therefore, we formally condition all merged pdfs on
the general data d(t).

Modification of the DM strategy can be achieved by modification of the ideal distributions (3.22),
followed by FPD (Proposition 3.2). Merging of the ideal pdfs can be formalized as direct merging
(Section 3.4.2):

ᴵf[1](dt|·), ᴵf[2](dt|·) −merge→ ᴵf[1](dt|·). (6.2)

Analogously, direct merging is used for estimates of common internal variables:

f[1](Θt|·), f[2](Θt|·) −merge→ f[1](Θt|·).
Remark 6.1 Since the DM strategies f̂[1](ut|·), f̂[2](ut|·) are also pdfs, it is possible to merge
them using (6.2). However, the behaviour of the merged DM strategy is not optimal under the FPD and
may produce unpredictable results. Therefore, we prefer to merge the ideal distributions and estimates
and to perform FPD to obtain a DM strategy reflecting the knowledge and aims of both participants.
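A minimal sketch of direct merging (6.2) for discrete pdfs over a shared variable: a weighted geometric mean, with the weight α interpreted as the level of belief in the neighbour (Section 6.4). The numbers are illustrative.

```python
# Sketch of direct merging (6.2) for discrete pdfs over a shared internal
# variable: a weighted geometric mean with belief weight alpha in the
# neighbour, renormalized. The numbers are illustrative.

def direct_merge(p1, p2, alpha):
    """Merge participant P1's pdf p1 with neighbour P2's pdf p2."""
    merged = [a ** (1.0 - alpha) * b ** alpha for a, b in zip(p1, p2)]
    s = sum(merged)
    return [m / s for m in merged]

p = direct_merge([0.8, 0.2], [0.2, 0.8], alpha=0.5)
print(p)   # symmetric weights and opposite beliefs -> uniform [0.5, 0.5]
```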
6.4 Negotiation
Communication is an active process for both participants. In each step, a participant can select from
a range of communication-related actions: initiate communication, accept communication, reject
communication. When communication is accepted, the participant is expected to reply with a
counter-proposal: close communication, send a counter-proposal, or request further information. The selection
of an appropriate action is also part of the DM process, and as such, it is made according to the participant's
negotiation strategy.

Note that the participants influence each other via the merging operation (Section 6.3), the result
of which is determined by the weights αi in (6.1), where i is the unique identifier of the neighbour.
This weight can be interpreted as a level of belief in the neighbour. Formally, decision making on
communication is done by negotiating these weights.
We distinguish three basic negotiation strategies [27]:
Selfish: is a strategy where each participant freely chooses its own weights. It accepts all information
from its neighbour, but it refuses any attempt to change the weights α by another
participant via communication.

Hierarchical: is a strategy where each participant has fixed values of αi. If the neighbour is
superordinate, it can assign the value of α by communication.

Cooperative: is a strategy where the participants communicate the value of α (i.e. α2,[1] for the
participant P1, and 1 − α1,[2] for the neighbour P2). Their common aim is to reach agreement
on its value.
6.5 Design of MP decision-maker
In this Section, we review the basic steps of single-participant DM design (Agreement 5.2) for the
MP scenario. The main distinctions are:
Merging: note that merging was already part of the single-participant DM, namely in the tasks of
prior elicitation (Section 5.2) and ideal elicitation (Section 5.6). However, in both cases, this
operation was performed off-line, i.e. with few constraints on computational efficiency. In the
MP scenario, this operation is performed in real time, and thus it must be coordinated with the
update of knowledge by observations of the environment.
Negotiation: the basic strategies of negotiation were described in the previous Section. At present,
we assume that these strategies are designed as deterministic. Formally, it is possible to create
an explicit probabilistic model of the neighbouring participants, learn their behaviour, and
design an appropriate strategy of communication with them. This scenario has not been studied in
detail yet and remains an interesting topic for further work.
Problem description: the description structure of a single participant must be extended to contain
information about the neighbours, as follows:

1. a request for communication from a neighbouring participant will be observed as a data
record and thus will be part of yt. This must be reflected in the data description.

2. a response to communication, or the initiation of communication with a neighbour, is a decision
action, which must also be reflected in the data description.

3. the objects to be communicated are pdfs with possibly large statistics; therefore, sufficient
space must be allocated for these structures.
Model selection: the initial model selection will be performed independently for each participant,
using the same methods as in the single-participant case (Section 5.3). However, some form
of model selection must be performed in the case of explicit modelling of the neighbours. For
example, removal of models of inactive neighbours, or creating entries for newly recognized
neighbours.

Note that explicit modelling of neighbours may be extremely computationally intensive, since
the standard model selection (Section 5.3) is defined in terms of hypothesis testing. Testing
the large number of hypotheses associated with each neighbour is clearly an intractable problem, and
various simplifications and approximations must be found.
Learning: learning from the observed data will be done using the same techniques as in the single-participant
case (Section 5.4). Learning from the knowledge obtained by communication with other
participants will be done via merging (Section 3.4), which can, once again, be translated into
the basic learning operations (Section 3.4).

Note that learning of the explicit model of the neighbours is a challenging task even in the
simple case with unknown α. Therefore, we expect the following:

• learning of the model of the neighbours must be done with a different sampling period than
learning of the other parameters in Θ,

• no direct observation model is available. The effects of communication (and merging
with a given α) will not be immediately recognizable. Therefore, an approximate
observation model (similar to that in VB, Section 4.5.3) must be found.
7 Software Image
In this Chapter, we present the UML description of the software toolbox for distributed dynamic
Bayesian decision making.
The whole toolbox will be split into the following packages:
Math is an abstract package, which is used as a repository of basic data structures and mathematical
objects, such as matrices. All operations on matrices are supposed to be elements of this
package. However, we will not model these operations, since they are expected to be already
available within the implementation environment, i.e. Matlab or ANSI C.

Prob is the first package with defined classes. In this package, we model the basic objects of
probabilistic calculus, i.e. random variables, functions, pdfs, and elementary operations on
them. These objects form the smallest building blocks of the DM task.

This package designs the software image of the general theory of DM (Chapter 3).

FProb is the package where we specialize the general classes from package Prob into the classes
of feasible DM. Classes used in exact DM (e.g. estimation in the exponential family, Kalman
filtering) are defined here, as well as classes for approximate evaluation (the Variational Bayes
approach).

This package designs the software image of feasible DM (Chapter 4).

SingleDM is the package which defines the building blocks that are necessary to connect the probabilistic
core (defined in package Prob) with the real world. Specifically, it defines data filters,
description structures, communication objects, etc.

This package designs the software image of the practical aspects of DM (Chapter 5).

MultiDM is the package which extends the classes for single-participant DM (package SingleDM)
to the considered multiple-participant DM.

This package designs the software image of the MP DM theory (Chapter 6).

This initial decomposition intentionally respects the decomposition of the theory. Packages are built
on top of each other; however, their dependence is "one-directional" in the sense that classes from
SingleDM need classes from Prob, but not the other way around. Therefore, other packages using
probabilistic calculus may be built on top of Prob.

In order to distinguish software and theoretical objects, all names of software-related objects are
printed in bold typeface.
7.1 Package Math
This package defines basic data structures in the form of datatypes:

mxArray representing a two-dimensional array of real numbers,

Cell representing a list of pointers.

This nomenclature comes from Matlab, which is our primary implementation environment; however, it
can be easily re-implemented in any other language. By combining these two types we obtain the
datatype mxACell, i.e. a Cell of mxArrays.

In the further text, we will often use lists (i.e. Cells) of different classes. By convention, the names of
these new datatypes will start with the name of the class followed by 'Cell', e.g. RVCell being a list
of objects of class RV.
7.2 Package Prob
The basic objects involved in probabilistic calculus (Chapter 3) are: (i) random variables, (ii) functions
of variables, (iii) observed data, and (iv) pdfs. Here, we design a software representation for
each of them.
7.2.1 Random variables
In the abstract theory (Chapters 3 and 4), a random variable can be used in two different flavours:

multivariate random variable: Θ, or d, of fixed, a priori known dimensionality. This
form is used if the variable, e.g. d, is observed via a realization, dt. Also, all numerical
expectations are expected to be in this form, i.e. estimates of Θ are real-valued matrices of the
corresponding dimensionality.

set of sub-parameters: Θ = {α, β, γ}, where α, β, γ may have a priori unknown (or irrelevant)
and mutually different dimensionality or nature (i.e. continuous/discrete), and may be recursively
separable, e.g. α = {α1, α2, . . . , αn}. This form is used in the definition of the structure of
models, such as conditional independence (3.1).
Note that the main reason to represent the random variable in software is to distinguish the arguments
of various pdfs. In many existing packages, the structure of the decomposition of the model (3.12)
is fixed; hence, the pdfs are uniquely identified by their position in the structure. This is typical for
models for which the probabilistic operations of DM were transformed into algebraic operations, for
example, state-space models (Section 4.1), or the exponential family (Section 4.2).

The problem of unique identification of pdfs arises when approximate methods must be used.
Consider, for example, the operation (3.27), which evaluates the following terms:

γ(d(t)) = E_{f(Θ|d(t−1))} (g(Θ, d(t))),

where g(Θ, d(t)) is a general function of its arguments. If the pdf f(Θ|d(t−1)) is defined as a
chain rule of pdfs of various types, then a unique distinction between random variables is vital.

The proposed classes for random variables are displayed in UML notation in Figure 7.1.

Figure 7.1: UML class diagram of random variables.
7.2.1.1 Datatype: rv_id
This is an abstract datatype used as a wild-card for any reasonable unique identifier (e.g. an ordinal number,
a string, or both). The detailed implementation of this type will be decided later. The choice will be
made after tests of the performance of low-level functions for comparing and sorting random variables.
7.2.1.2 Class RV
Attributes:
ID:rv_id is used as a unique identifier of the variable.

final:bool is a switch between the above-mentioned roles of RV. Its value will be assigned by the
descendants of this class. If true, then the random variable has no inner structure (see class
RVfinal); otherwise, it is composed of a list of sub-variables (see class RVlist).

This class does not define any operations.

This class is abstract. It will be used as a wild-card for the definition of random variables in functions
and pdfs. It will always be implemented via its descendants.
7.2.1.3 Class RVfinal (RV)
This is a descendant of class RV.
Attributes:
size:mxArray defines the size of the random variable. It is a vector of integer values used in the Matlab
convention, i.e. if the array contains just one number, then RV represents a vector of that length;
if it contains two numbers, RV represents a matrix with the given number of rows and columns.

Operations:

new is a constructor. It copies its argument into the attribute size and sets the value of final to true.
Figure 7.2:UML class diagram of functions of random variables.
7.2.1.4 Class RVlist (RV)
This is a descendant of class RV.
Attributes:
RVs:RVCell defines the list of sub-variables of RV type, i.e. its elements can be either of the
RVfinal or RVlist type.

Operations:

new is a constructor. It copies its argument into the attribute RVs and sets final to false.
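The composite structure of RV, RVfinal and RVlist can be transcribed, for illustration only, into Python (the toolbox itself targets Matlab and ANSI C; the composite pattern, not the language, is the point).

```python
# Illustrative Python transcription of the RV class hierarchy. RVfinal
# carries a concrete size vector; RVlist composes further RV objects.

class RV:
    """Abstract random variable, identified by a unique ID."""
    def __init__(self, rv_id, final):
        self.ID, self.final = rv_id, final

class RVfinal(RV):
    def __init__(self, rv_id, size):
        super().__init__(rv_id, final=True)
        self.size = size                      # Matlab-style size vector

class RVlist(RV):
    def __init__(self, rv_id, rvs):
        super().__init__(rv_id, final=False)
        self.RVs = list(rvs)                  # sub-variables: RVfinal/RVlist

# Theta = {alpha, beta}, with alpha a 2-vector and beta a 2x3 matrix
theta = RVlist("Theta", [RVfinal("alpha", [2]), RVfinal("beta", [2, 3])])
print(theta.final, [rv.ID for rv in theta.RVs])   # False ['alpha', 'beta']
```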
7.2.2 Functions on random variables
In the theory (Chapters 3 and 4), functions of random variables are denoted by a letter followed by
its variables in round brackets, e.g. g(Θ) (3.4), T(α) (3.5) or ω(ut, d(t−1)) (3.25).

Note that these objects are (almost exclusively) needed as input or output of the expectation operator
(3.4) in FPD (3.27). Evaluation of integrals of arbitrary functions is beyond the scope of this
report; however, it may be important for future research. Therefore, we define an abstract function
and its basic descendants here. The UML scheme is displayed in Figure 7.2.
7.2.2.1 Class function
Is a class implementing the transformation of random variables:
g(Θ) = Θ,
where g(Θ) denotes the dependent variable and Θ stands for the independent variable.
Attributes:
rv:RVfinCell is the list of random variables on which the function is defined.
dimen:mxArray defines the size of the function output.
Operations:
evalall:mxArray with argument values:mxACell, returns the value of the function for the parameter values given by the argument. Naturally, the dimensions of values should correspond to those of rv, and the dimension of the returned value should be dimen.
evalsome:function with arguments which:RVfinCell and values:mxACell, replaces the random variables in which by the values in values. The result of this operation is a new function defined on the complement of which in rv. Again, the dimensions of the arguments which and values should match.
Jacobian:double with argument values:mxACell, evaluates the Jacobian operation (3.5) of the function for the values of rv given by the argument values. In this trivial case, the return value is always equal to 1.
add with argument Fn:function, additively extends the current form of the function by the terms of Fn. This operation is abstract in this class, since the addition of extra terms creates a different functional form.
exp:function returns a new instance of the class function representing the exponential of the current function, i.e. exp(g(Θ)).
In this basic class, all of the above mentioned operations are trivial. The main purpose of this class is to serve as structural information that will be used later (e.g. for evaluation of moments of pdfs). Naturally, operations on descendants of this class will be much more complex.
7.2.2.2 Class ConstFn
Is a class implementing the transformation:
g(Θ1, ..., Θn) = (a1, ..., an), ∀Θ.
Attributes:
As:mxACell is the list of constant values of the parameters a1, ..., an.
Operations:
All operations of class function, i.e. evalall, evalsome and Jacobian, must be re-implemented.
7.2.2.3 Class LinearFn
Is a class implementing the transformation:
g(Θ1, ..., Θn) = a1Θ1 + ... + anΘn, 1 ≤ n < ∞,
where a1, ..., an are fixed values of appropriate dimensions (i.e. compatible with Θ1, ..., Θn).
Attributes:
As:mxACell is the list of coefficients a1, ..., an, where ai is the coefficient of Θi.
Operations:
All operations of class function, i.e. evalall, evalsome and Jacobian, must be re-implemented.
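The contract of the LinearFn class can be sketched as follows. This is a minimal illustration, not the thesis implementation (which is in Matlab/ANSI C); the treatment of the constant offset in evalsome is simplified, as noted in the comments.

```python
class LinearFn:
    """Sketch of g(Th1,...,Thn) = a1*Th1 + ... + an*Thn."""
    def __init__(self, rv, As):
        self.rv = list(rv)    # names of the independent variables
        self.As = list(As)    # coefficients a1, ..., an

    def evalall(self, values):
        # values must correspond element-wise to rv
        assert len(values) == len(self.rv)
        return sum(a * v for a, v in zip(self.As, values))

    def evalsome(self, which, values):
        # Substitute a subset of variables; the result is a new LinearFn
        # on the complement of `which` in rv. The constant term produced
        # by the substitution is dropped here for brevity; a full version
        # would carry it as an offset.
        keep = [(r, a) for r, a in zip(self.rv, self.As) if r not in which]
        return LinearFn([r for r, _ in keep], [a for _, a in keep])

    def Jacobian(self, values):
        # trivial case, as in the base class `function`
        return 1.0
```

For example, g = LinearFn(['x', 'y'], [2, 3]) evaluates g.evalall([1, 1]) to 5, and g.evalsome(['y'], [4]) returns a new function defined on 'x' only.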
7.2.2.4 Other classes
For various purposes, other trivial functions, such as ln(Θ) or sin(Θ), may be required. These can easily be derived from the general class function when needed.
7.2.3 Observed data
In the theory (Chapters 3 and 4), we used the notation dt for both (i) random variables and (ii) their realizations (i.e. observed data), since the probabilistic calculus is the same for both objects. However, for the software representation, it is essential to distinguish these two cases. Here, we define the software image of the observed data.
7.2.3.1 Class DataSource
Is a class representing the data observed at discrete time steps.
Attributes:
Dt:mxArray is the software image of the history of observed data d(t). However, for feasible recursive DM, only the most recent data are required. Hence, this attribute contains only the last ∂ observation records, d(t−∂ ... t), where the scalar ∂ denotes the largest delay of a data record in the auto-regression.
Operations:
step is the operation representing a time-shift of the data observation. This operation replaces Dt by the new data records observed at the next step.
write with argument Ut:mxArray writes into the data source the values Ut of decisions ut+1 that were chosen at the current step.
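The bookkeeping of the last ∂ records can be sketched with a bounded queue. This is a hypothetical Python illustration (the thesis implementation targets Matlab/ANSI C), and the way write attaches decisions to the latest record is an assumption made for the example.

```python
from collections import deque

class DataSource:
    """Keeps only the last max_delay+1 observation records d(t-d ... t)."""
    def __init__(self, max_delay):
        # deque with maxlen drops the oldest record automatically
        self.Dt = deque(maxlen=max_delay + 1)

    def step(self, new_record):
        # time-shift: append the record observed at the next step
        self.Dt.append(new_record)

    def write(self, Ut):
        # store the decisions u_{t+1} chosen at the current step;
        # here they are simply appended to the most recent record
        self.Dt[-1] = self.Dt[-1] + tuple(Ut)
```

With max_delay = 2, four calls to step leave exactly the three most recent records in Dt, which is what makes the recursion feasible for long histories.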
7.2.4 Probability density functions (pdfs)
The basic properties of pdfs were defined in Section 3.1.2, as well as the basic operations on them. The use of pdfs within the DM framework is further elaborated by (Agreement 3.1). We define two basic roles of pdfs:
model: which represents mutual dependence of random variables, such as the data observation model (3.10), the internal model (3.9), and the predictor (3.15).
estimates: which represents posterior pdfs, i.e. distributions of the variable conditioned only on the observed data.
Note that for analytically tractable models, i.e. linear state-space models and the exponential family, the recursion of probabilistic operations can be transformed into algebraic operations; namely, (4.3)–(4.4) and (4.23) for learning in state-space and exponential-family models, respectively. For FPD, the probabilistic recursion (3.27)–(3.28) was transformed into a recursion on the kernel of the quadratic Bellman function (4.14)–(4.15).
Therefore, in many software packages (e.g. Mixtools), the models are not independent structures, but only sub-structures of the estimates. Moreover, this notion of joint representation is even more emphasized in learning algorithms for models with the conditional-independence assumption (such as Bayesian networks, or on-line VB, Section 4.5.3), since these algorithms assign an approximate partial observation model (similar to (4.53)) to each posterior estimate.
However, for the purpose of this text, we propose to model these entities as independent objects. This proposal is motivated by the structure of FPD (Proposition 3.2), namely operation (3.27), which implies evaluation of the KL divergence on individual observation models.
Therefore, we design the following basic classes: (i) mPdf, representing models, and (ii) ePdf, representing estimates. The UML class diagram of these is displayed in Figure 7.3.
7.2.4.1 Class mPdf
Is a class representing models; in this basic version, it represents only the internal model f(Θt|Θt−1) (3.9).
Attributes:
rv:RV is the variable on which the pdf is defined, i.e. Θt in (3.9),
rvc:RV is the variable in the condition, i.e. Θt−1 in (3.9).
Note that both variables are instances of the general class RV; hence, mPdf can be defined on composed as well as final random variables.
Operations:
expectation:function with argument Fn:function, implements the expectation operation (3.4). Fn stands for g(α) in (3.4). This operation is needed in FPD, operation (3.27).
Figure 7.3: UML class diagram of the basic Pdf classes.
divergence:function with arguments pdf:mPdf and type:int, implements a divergence given by the argument type. Most often, the KL divergence (3.7) will be used; therefore it is the default operation for type=1. This operation is needed in FPD, operation (3.27).
new is the constructor.
7.2.4.2 Class ePdf
Is a class representing the estimates, i.e. f(Θt|d(t)) (3.14). This function is important, since the task of learning is defined on this object.
Attributes:
rv:RV is the random variable on which the pdf is defined, i.e. Θt in (3.14).
Once again, an instance of the general class RV is used in order to allow representation of composed as well as final pdfs. For better intuition, in comparison with neural networks, ePdf is an abstract general class that models both the network and its nodes. The exact meaning will be refined by specializations of this class.
Operations:
update with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations.
Statistics of the resulting pdf replace the original statistics of this object. Therefore, if the exact update operation yields a pdf of a different type than this object, the resulting pdf must be projected back onto the original family.
update_new:ePdf with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations. In contrast to the update operation, this operation returns the updated pdf as a new object. Therefore, it can be of a different type than the original.
expectation:function with argument Fn:function, implements the expectation operation (3.4). Fn stands for g(α) in (3.4).
dmerge with arguments Epdf:ePdf and alpha:double, implements the direct merging operation (Section 3.4). We do not impose any form of merging, i.e. it is not important which form of the KL divergence, (3.33) or (3.35), is used. This choice will be made in specializations of this class.
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, implements the indirect merging operation (Section 3.4). In contrast to direct merging, this operation requires knowledge of the observation model OM.
Note that the merging operation was defined only for independent observations. Therefore, this definition of the operation is preliminary and may be changed when significant progress in the merging theory is achieved.
project_new:ePdf with argument family:int, implements the projection operation (4.27). The argument family denotes the family onto which the pdf should be projected. This function may be used in the task of ideal elicitation (Section 5.6).
predictor:pPdf with arguments OM:mPdf and SM:mPdf, implements the predictor operation (3.15). Here, OM stands for the observation model; however, no observed data will be used in this operation. This is expressed by the fact that OM is an instance of mPdf, and not oPdf. SM is the internal model (3.9).
log_pred:double with argument OM:oPdf, implements the prediction operation (3.15). In contrast to the predictor operation, log_pred treats dt as observations; hence, a numerical value is returned. For numerical reasons, the returned value is the logarithm of (3.15).
flatten with argument factor:double, implements the flattening operation needed for prior elicitation (Section 5.2, Remark 5.1).
The primary role of this class is to act as a generalization of both (i) pdfs composed from other pdfs by the chain rule (graphs), and (ii) final pdfs (nodes). It is defined as abstract; hence it will not be used directly but only via its descendants. Loosely speaking, the purpose of this class is to remind us what operations must be defined on an estimate of any kind.
7.2.4.3 Class oPdf (mPdf)
Is a class representing models, specifically the observation model f(dt|Θt, d(t−1), ut) (3.10). It extends the class mPdf by linking the random variable rv to the observed data.
Attributes:
DS:DataSource is an instance of the class DataSource.
ind:mxArray indicates the position of realizations of rv in the data vector DS.Dt.
Operations:
new the constructor must be re-implemented to reflect the presence of the new attributes.
getdata:mxArray is an operation which returns the observed value of the variable rv from the data source DS.
7.2.4.4 Class pPdf (mPdf)
Is a class representing the predictor, f(dt|d(t−1), ut) (3.15). It is defined as an extension of the class mPdf, with a recursive update similar to the one defined for the class ePdf. The key difference from ePdf is that here, the data dt are treated as random variables, not observations.
Operations:
update with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations.
rupdate:pPdf with arguments OM:oPdf and SM:mPdf, jointly implements the time-update (3.13) and data-update (3.14) operations in reverse timing. This operation is required by the FPD algorithm (Proposition 3.2).
Here, we note that FPD with unknown internals is a relatively new result and a detailed algorithmic solution has not yet been elaborated. Therefore, this operation may not be easily implementable and another updating mechanism may have to be found.
However, preliminary considerations (Remark 4.1) suggest that it should be possible to create a sequence of pdfs for each step on the DM horizon. In such a case, the statistics of all predictors would be generated by the operation update and stored as attributes of the class. Then, the rupdate operation would remove the latest statistics and replace them by the previous ones. This behaviour is, however, not feasible for long DM horizons. Further research in this area is needed to achieve a feasible FPD design in the sense of Requirement 4.1.
7.2.4.5 Class ePdfFinal (ePdf)
Is a class representing estimates which are defined on final random variables RVfinal. These estimates correspond to the nodes in graphical models (2.2.2). It is defined as a specialization of class ePdf.
Attributes:
rv:RVfinal the attribute rv is redefined as an instance of the class RVfinal.
Operations:
In contrast to the general ePdf class, ePdfFinal has numerical values of its statistics; hence, it offers extra operations on them.
expect:mxArray with argument Fn:function, implements the expectation operation (3.4). In contrast to the operation expectation, it does not return the functional form of the expectation but a numerical value. This is, however, possible only if the Fn argument is defined only on the variable rv; otherwise, this operation causes an error.
replace_stats with argument nstats:mxACell, is an auxiliary operation that will be used to replace the attribute stats by the argument nstats. This will be needed in some approximate learning algorithms, such as VEM (Algorithm 4.2).
sample with argument n:int, implements sampling from the distribution, which is needed for the task of design validation (Section 5.8).
Figure 7.4: UML class diagram of pdfs used for DM with the linear state-space model.
7.2.4.6 Class eEmp (ePdfFinal)
Is a class representing the empirical density, f̂(d) (3.38).
Attribute:
data:mxArray represents the observed data record d(t).
The types of pdfs introduced in this Section formalize the interface of the basic building blocks for pdfs. All pdfs derived from these classes should obey this structure. Therefore, all algorithms designed for these classes should work, without any modification, for all future descendants (specializations) of these classes.
7.3 Package FProb
In this package, the general pdfs defined in the package Prob are specialized to yield pdfs used in feasible DM (Chapter 4).
7.3.1 Linear state-space models
In this Section, the general models from the Prob package are specialized for the linear state-space models (Section 4.1). The definitions of random variables and data sources from Sections 7.2.1 and 7.2.3, respectively, do not have to be re-defined. However, new class specializations are needed for (i) functions, and (ii) all types of pdfs. The UML class diagram of the involved objects is displayed in Figure 7.4.
7.3.1.1 Class QuadraticFn (LinearFn)
The Bellman function for linear Gaussian FPD (4.9) has the form of a quadratic function. It is implemented as an extension of the LinearFn class.
The class implements the transformation
g(Θ1, ..., Θn) = a1Θ1 + ... + anΘn + Θ1b1Θ1′ + ... + ΘnbnΘn′, 1 ≤ n < ∞,
where a1, ..., an and b1, ..., bn are fixed values of appropriate dimensions (i.e. compatible with Θ1, ..., Θn).
Attributes:
Bs:mxACell is the list of coefficients b1, ..., bn.
Operations:
add with argument Fn:function, accepts Fn in the form of QuadraticFn or LinearFn. If Fn is defined on the same variables, the corresponding As and Bs are summed. If Fn is defined on different variables, the lists rv, As and Bs are extended by the elements from Fn.
exp returns the exponential of g(Θ). It creates a new instance of expQuadFn with the same values of As and Bs.
All operations of class LinearFn, i.e. evalall, evalsome and Jacobian, must be re-implemented.
7.3.1.2 Class expQuadFn (QuadraticFn)
Is a class representing the exponential of the quadratic function QuadraticFn.
Operations:
add is not defined,
exp is not defined.
All operations of class function, i.e. evalall, evalsome and Jacobian, must be re-implemented.
The purpose of this class is merely the storage of the attributes As and Bs in an appropriate (i.e. exponential) form.
7.3.1.3 Class mNorm (mPdf)
This class is a specialization of the mPdf class for the linear Gaussian internal model (4.1).
Attributes:
rv:RV is of the RVfinal type, representing the variable Θt,
rvc:RV is of the RVlist type, containing two RVfinal instances for the variables Θt−1 and ut.
A:mxArray is the matrix A in (4.1),
B:mxArray is the matrix B in (4.1),
R:mxArray is the matrix R in (4.1).
Operations:
expectation:function with argument Fn:function, implements the expectation operation (3.4). The operation should work with:
LinearFn g(Θt) = aΘt, for which it returns EΘt(aΘt) = a(AΘt−1 + But), and
QuadraticFn g(Θt) = Θt′ZΘt, for which it returns EΘt(Θt′ZΘt) = tr(ZR) + (AΘt−1 + But)′Z(AΘt−1 + But), and
QuadraticFn g(Θt) = ΘtZΘt′, for which it returns EΘt(ΘtZΘt′) = ZR + (AΘt−1 + But)Z(AΘt−1 + But)′.
Linear terms in QuadraticFn functions are handled in the same way as in LinearFn. This operation is needed in FPD, operation (3.27).
divergence:function with arguments pdf:mPdf and type:int, implements a divergence given by the argument type. At present, the operation implements the KL divergence (3.7) to another mNorm class according to formula (4.7).
7.3.1.4 Class oNorm (oPdf,mNorm)
This class is a joint specialization of the oPdf and mNorm classes for the linear Gaussian observation model (4.2). Since all attributes and operations of oPdf and mNorm complement each other, there is no need to redefine them. However, the semantic correspondence with the theory is broken: one must remember that the attributes A, B and R represent C, D and Q in (4.2). Also, the constructor new must be redefined as a merge of mNorm:new and oPdf:new.
7.3.1.5 Class pNorm (pPdf)
This class is a specialization of the pPdf class for a Gaussian pdf (4.5).
Attributes:
rv:RV is of the RVfinal type, representing the variable dt,
rvc:RV is of the RVlist type, representing all needed variables dt−i, ut−i, i = 1, ..., t, where t is the length of the DM horizon.
As:mxACell is the list of linear coefficients for the mean value µt, which is, following (4.3)–(4.5), defined as a linear combination of the previous observations dt−i, ut−i, i = 1, ..., t, listed in the attribute rvc.
Sig:mxArray is the covariance matrix Σt in (4.5), which is, following (4.3)–(4.5), independent of previous observations (attribute rvc), and thus can be represented by a numerical value.
Operations:
update accepts arguments OM:oNorm and SM:mNorm, and jointly implements the time-update (4.3) and data-update (4.4) operations in the same way as eNorm.update does, with the exception that µt is not a numerical value but a functional form.
rupdate accepts arguments OM:oNorm and SM:mNorm, and jointly implements the time-update (4.3) and data-update (4.4) operations in reverse order.
expectation:function with argument Fn:function, implements the expectation operation (3.4) in the same way as eNorm.expectation does, with the exception that µt is not a numerical value but a functional form.
7.3.1.6 Class eNorm (ePdfFinal)
This class is a specialization of the ePdfFinal class for a Gaussian pdf (4.4).
Attributes:
rv:RV is of the RVfinal type, representing the variable Θt,
mu:mxArray is the mean value µt in (4.4),
Sig:mxArray is the covariance matrix Σt in (4.4).
Operations:
update accepts arguments OM:oNorm and SM:mNorm, and jointly implements the time-update (4.3) and data-update (4.4) operations.
update_new:ePdf creates a new instance of eNorm, copies its statistics into it and calls update on the new instance.
expectation:function with argument Fn:function, implements the expectation operation (3.4). The operation should accept Fn of the following types:
LinearFn g(Θt) = aΘt, for which it returns EΘt(aΘt) = aµt, and
QuadraticFn g(Θt) = Θt′ZΘt, for which it returns EΘt(Θt′ZΘt) = tr(ZΣt) + µt′Zµt, and
QuadraticFn g(Θt) = ΘtZΘt′, for which it returns EΘt(ΘtZΘt′) = ZΣt + µtZµt′.
dmerge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section 3.4). It accepts an Epdf argument of the eNorm type defined on the same variable, i.e. on rv; then, the operation (3.36) is implemented as follows:
Σt = ( α Σt^{-1} + (1 − α) Σ[2]^{-1} )^{-1},
µt = Σt ( α Σt^{-1} µt + (1 − α) Σ[2]^{-1} µ[2] ),
where Σ[2] and µ[2] stand for the Sig and mu attributes of Epdf.
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, accepts Epdf in the form eEmp. For eEmp, it calls the update operation with OM, for which the data source DS was replaced by eEmp:data.
prediction:pNorm with argument OM:mNorm, implements the prediction operation (4.5). It creates a new instance of pNorm.
log_pred:double with argument OM:oNorm, implements the prediction operation (4.5) for the current observed data from the observation model OM.
expect:mxArray with argument Fn:function, implements the same expectations as the operation expectation; however, it returns numerical values of the moments rather than functions.
flatten with argument factor:double, implements the flattening operation needed for prior elicitation (Section 5.2, Remark 5.1). For the Gaussian distribution, this operation is just a scalar multiplication of the covariance matrix Σ.
7.3.2 Exponential family models
In this Section, the general models from the Prob package are specialized for the exponential family models (Section 4.2) and approximate Bayesian filtering via forgetting (4.4). The definitions of random variables and data sources from Sections 7.2.1 and 7.2.3, respectively, do not have to be re-defined. Technically, all operations can be defined in general for the sufficient statistics V and ν. However, for computational reasons, algorithms for the most prominent members of this family, i.e. linear Gaussian models (4.20) and Markov models (4.21), are implemented separately.
The necessary Bellman functions for FPD for a Gaussian pdf were already defined in Section 7.3.1. Another important member of the family is the Dirichlet distribution, which is used for modelling of discrete Markov processes. In general, treatment of this distribution is simpler than that of the Gaussian distribution; details can be found in [28]. For the purpose of this text, we define only the basic classes for the pdf and the Bellman function.
The UML class diagram of the involved objects is displayed in Figure 7.5.
Figure 7.5: UML class diagram of pdfs used for DM with the exponential family model.
7.3.2.1 Class MultiIndexFn (function)
Is a class representing the multi-array Θ in (4.21).
Attributes:
rv:RVfinCell is a list of random variables, all of whose elements are discrete random variables.
Array:mxArray is a structure of numerical values. The size of this array is determined by the number of possible states of the random variables rv. Indexing of the array for realizations of the random variables is an internal property of the class.
Operations:
setElement with arguments rvind:mxACell and value:double, is an auxiliary function for assigning values to Array elements. This function is required, since indexing of the Array is implemented internally. This operation is, therefore, the only option for assigning a value to an element of Array at the position given by rvind.
Also, the operations evalsome and evalall must be re-implemented to comply with the indexing mechanism.
7.3.2.2 Class eEF (ePdfFinal)
Is a class representing the estimate within the EF (4.22).
Technically, it is possible to define the statistics V and ν, and the operations on them, here. However, these operations are later redefined for each special class, namely the Gauss-Wishart and Dirichlet distributions. Therefore, we leave this class as abstract.
7.3.2.3 Class eGW_LD (eEF)
Is a class representing the Gauss-Wishart posterior density for linear autoregressive models (4.20). It is computationally advantageous to implement all the required operations on LD decompositions [93] of the sufficient statistics V [94].
Attributes:
LD:mxArray represents the LD decomposition of the sufficient statistics V.
dfm:double represents the sufficient statistic ν, which is also known as degrees of freedom (dfm).
Operations:
update accepts arguments OM:oEF and SM:mDelta (or SM:mFrgEF), and implements the data-update operation (4.23) for mDelta, or the data-update (4.44) for mFrgEF internal models.
expectation:function with argument Fn:function, implements the expectation operation (3.4). The operation should work with the LinearFn and QuadraticFn types of function.
dmerge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section 3.4). It accepts an Epdf argument of the eGW_LD type defined on the same variable, i.e. on rv; then, the operation is implemented using (4.26).
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, accepts Epdf in the form eEmp. Then, it calls the update operation with OM, for which the data source DS was replaced by eEmp:data.
prediction:pNorm with argument OM:oEF, implements the prediction operation (4.24). Analytically, (4.24) is a Student pdf; however, it can be well approximated by a Gaussian if ν > 10 [28]. Since not all operations required on the predictor (pPdf in 7.3) are available for the Student pdf, this operation projects the Student pdf onto a Gaussian pdf and returns a predictor of the pNorm type.
log_pred:double with argument OM:oEF, implements the prediction operation (4.24) for the current observed data from the observation model OM. Since evaluation of the Student pdf is numerically feasible, this operation returns the exact value of the prediction (4.24).
expect:mxArray with argument Fn:function, implements the same expectations as the operation expectation; however, it returns numerical values of the moments rather than functions.
flatten with argument factor:double, implements the flattening operation needed for prior elicitation (Section 5.2, Remark 5.1). For the exponential family, this operation reduces to multiplication of the sufficient statistics V and ν by a constant.
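The bookkeeping behind the update and flatten operations above is simple on the level of sufficient statistics: the conjugate data-update accumulates the outer product of the regressor into V and increments ν, while flattening scales both. A scalar "regressor" is used in this pure-Python sketch for brevity; it is an assumption-laden illustration, not the thesis code (which operates on LD decompositions in Matlab/ANSI C).

```python
def ef_update(V, nu, psi):
    # conjugate data-update, cf. (4.23): V <- V + psi*psi', nu <- nu + 1
    return V + psi * psi, nu + 1.0

def ef_flatten(V, nu, factor):
    # flattening for prior elicitation: scale both sufficient statistics
    return factor * V, factor * nu
```

For instance, starting from V = 1, ν = 2, observing the scalar regressor ψ = 3 yields V = 10, ν = 3; flattening with factor 0.5 then halves both statistics, making the prior less concentrated while preserving its location.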
7.3.2.4 Class mDelta (mPdf)
Is a class representing the stationary internal model (3.17). This model is trivial; hence no other attributes or operations are required. The purpose of this class is to act as a switch of operational modes for update operations, where the internal model is a mandatory parameter.
7.3.2.5 Class mFrgEF (mPdf)
Is a class representing the forgetting operator (4.43). This class is common to both special types of pdfs, i.e. linear Gaussian and Markov models. The exact meaning is determined by the type of the attribute rv.
Attributes:
AltPdf:eEF is an instance of the eEF class representing the alternative distribution in (4.43).
EP:eEF is an instance of the eEF class representing the posterior distribution at time t−1 in (4.43).
frg:double is the forgetting factor φt in (4.43).
Operations:
At present, the default mPdf operations (i.e. expectation and divergence) on the forgetting operator are not defined. Formally, at least approximate versions of these operations should be available. However, their derivation is beyond the scope of this report. This task is left open for further research.
7.3.2.6 Class oEF (oPdf)
Is a class representing both (i) the general autoregressive model (4.20), and (ii) the general Markov model (4.21). In contrast to the Normal observation model, delayed observations dt−i are present in the model, within the variable Ψt. Therefore, the observation model must be extended to provide not only the current data, but the whole regressor Ψt (4.17) and its associated Jacobian (4.18).
Attributes:
str:mxArray is the structure of the regressor Ψt. Note that oEF is a specialization of the general oPdf class; hence the data-source attribute DS is also defined in this class. Elements of str are thus pointers into the array of observations DS:Dt.
Operations:
getPsi:mxArray is an alternative to getdata, which returns the numerical value of the current Ψt.
Jacobian:double returns the value of the current Jacobian (4.18).
7.3.2.7 Class eMC (eEF)
Is a class representing the Dirichlet pdf, which is conjugate to the general Markov model (4.21). Note that we faced the problem of representation of the multi-array parameter Θ in the class MultiIndexFn. Here, we face the same problem, since the statistics of the Dirichlet pdf have the same form as its parameter. Therefore, once again, an internal indexing mechanism must be found.
Attributes:
V:mxArray is an attribute representing the sufficient statistics V in (4.22).
Operations:
update accepts arguments OM:oEF and SM:mDelta (or SM:mFrgEF), and implements the data-update operation (4.23) for mDelta, or the data-update (4.44) for mFrgEF internal models.
expectation:function with argument Fn:function, implements the expectation operation (3.4). Integration over variables is replaced by summation, so the expectation operation should work with almost all types of functions.
merge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section 3.4). It accepts an Epdf argument of the eMC type defined on the same variable, i.e. on rv; then, the operation is implemented using (4.26).
Moreover, this is the only class where it is feasible to implement the (better) merging algorithm via (3.33) [32].
prediction:pMC with argument OM:oEF, implements the prediction operation (4.24).
log_pred:double with argument OM:oEF, implements the prediction operation (4.24) for the current observed data from the observation model OM. Since evaluation of the predictive pdf is numerically feasible, this operation returns the exact value of the prediction (4.24).
expect:mxArray with argument Fn:function, implements the same expectations as the operation expectation; however, it returns numerical values of the moments rather than functions.
7.3.2.8 Class pMC (pPdf)
Is a class representing the exponential family predictor (4.24) for the special case of the Markov model (4.21), which is in the form of a multinomial pdf [28].
Attributes:
V:mxArray is the statistics of the predictor. For the one-step-ahead predictor, it is of the same size as rv.
Operations:
update:pMC with arguments OM:eEF and SM:mPdf, should implement the prediction operation (4.24) for the next step. However, exact evaluation of this formula is rather complex; therefore, we approximate the posterior pdf on Θ by a certainty-equivalence approximation (Section 4.28), with the point estimate chosen as the expected value Θ̂ = E_{f(Θ|d(t))}(Θ) of the posterior on the latest available data. Thus, the multi-step-ahead predictor remains in the form of a multinomial pdf.
rupdate:pMC with arguments OM:eEF and SM:mPdf, should implement the reverse prediction, i.e. return (4.24) at the previous time step.
expectation:function implements the expectation operation (3.4). This operation is trivial.
divergence:function implements the KL divergence (3.7) using the formula from [28].
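The certainty-equivalence step in the update operation above replaces the Dirichlet posterior on Θ by its expected value, which for Dirichlet statistics is simply the normalized statistics. The sketch below is a hypothetical pure-Python illustration, with one row of V per conditioning state of the Markov chain; it is not the thesis implementation.

```python
def dirichlet_mean(V_row):
    """Expected value of a Dirichlet pdf with statistics V_row."""
    s = sum(V_row)
    return [v / s for v in V_row]

def point_estimate(V):
    """Certainty-equivalence point estimate of the transition matrix:
    one normalized Dirichlet mean per conditioning state."""
    return [dirichlet_mean(row) for row in V]
```

For example, statistics [[1, 1], [3, 1]] yield the estimated transition probabilities [[0.5, 0.5], [0.75, 0.25]], which can then be raised to a power to obtain a multi-step-ahead multinomial predictor.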
Figure 7.6: UML class diagram of pdfs used for DM with the Variational Bayes approach.
7.3.3 Variational Bayes approach
In this Section, the general models from the Prob package are specialized for the conditionally independent models used in the VB approach (Section 4.3.4). In Section 4.3.4, the VB approach was interpreted as an approximate inference scheme for multivariate pdfs using the assumption of conditional independence. It was shown in (4.53) that dynamic DM using this approximation is feasible if the VB-marginal pdfs (4.38) are from the exponential family (4.16). Therefore, we restrict our attention to pdfs composed of conditionally independent models from the exponential family, and we define classes for VB on top of those for the EF (Section 7.3.2).
The UML class diagram of the involved objects is displayed in Figure 7.6.
7.3.3.1 Class oVBnet (oPdf)
Is an abstract class representing the original (intractable) observation model approximated by the VB approach. However, this class is not directly used in any learning operation, since the update is defined with respect to the auxiliary partial VB-observation models (4.53). At present, the main purpose of this class is its use as an identifier of compatible types for the argument OM:oPdf in the eVBnet:update operation.
Note, however, that this observation model is needed in FPD (Proposition 3.2). Technically, it is possible to replace all operations on the original model by operations on the partial VB-observation models. However, the FPD for this type of approximation has not been elaborated yet. Therefore, we treat the VB as a learning-specific approximation.
Attributes:
rv:RVlist since the VB approximation is defined on multivariate pdfs, the rv attribute must be of the
RVlist type.
7.3.3.2 Class oVBpart (oEF)
Is a class representing the partial VB-observation models (4.53). The key distinction of this type of
model is its dependence on the posterior estimates of its neighbours.
Attributes:
neighbours:eVBCell is a list of all neighbours whose moments are needed for evaluation of the
Psi regressor.
Operations:
getPsi:mxArray is an operation returning the regressor associated with the model. Note that the result
may be a function of the observed data and/or the statistics of the neighbours.
7.3.3.3 Class eVBnet (ePdf)
Is a class representing the joint distribution (4.49) of conditionally independent VB-posteriors.
Attributes:
rv:RVlist contains the random variables on which the pdf is defined.
nodes:eEFCell is the list of all nodes (conditionally independent pdfs). The nodes are from the
exponential family.
PartOMs:oVBCell is the list of partial observation models. Each node should have its corresponding
observation model in this list.
Operations:
update with arguments OM:oVBnet and SM:mDelta, implements the VEM algorithm (Algorithm
4.2) on the nodes.
expectation:function with argument Fn:function, implements the expectation operation (3.4).
Since eVBnet is a joint pdf of exponential-family pdfs, evaluation of expectations (3.4) is reduced
to calling the expectation operation of all nodes.
dmerge with arguments Epdf:ePdf and alpha:double, implements the merging operation (Section
3.4). It accepts an Epdf argument of the eVBnet type defined on any subset of rv. Due to
conditional independence, the direct merging operation is translated into merging operations
on the nodes.
imerge with arguments Epdf:ePdf, alpha:double and OM:oPdf, accepts Epdf in the form eEmp.
For eEmp, it calls the update operation with OM, for which the data-source DS was replaced
by eEmp:data.
prediction:pVBnet with argument OM:mNorm implements the operation of prediction (4.5).
Due to conditional independence, the resulting predictor is a product of predictors for each
node.
log_pred:double with argument OM:oNorm implements the operation of prediction (4.5) for the
current observed data from the observation model OM. It returns the sum of log_pred values from
all nodes.
expect:mxArray with argument Fn:function, implements the same expectations as the operation
expectation; however, it returns numerical values of the moments rather than functions.
flatten with argument factor:double, implements the flattening operation needed for prior elicitation
(Section 5.2, Remark 5.1). Flattening can be done, once again, independently for each
node.
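The iterative character of the update operation can be illustrated on the textbook VB example of jointly estimating the mean and precision of scalar Gaussian data, where the two "nodes" q(mu) and q(tau) are coupled through each other's moments and must be iterated to convergence. This is a minimal, self-contained Python sketch of the coordinate-ascent (VEM-style) idea only; it does not use the thesis's classes:

```python
def vb_mean_precision(x, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3, iters=50):
    """Coordinate-ascent VB for x_i ~ N(mu, 1/tau) with conjugate priors
    mu ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0).
    Returns converged statistics of q(mu)=N(mN, 1/lamN), q(tau)=Gamma(aN, bN)."""
    N = len(x)
    xbar = sum(x) / N
    sx2 = sum(xi * xi for xi in x)
    # the mean of q(mu) does not depend on E[tau]; only its precision does
    mN = (lam0 * mu0 + N * xbar) / (lam0 + N)
    aN = a0 + (N + 1) / 2.0
    E_tau = a0 / b0                           # initial guess for E[tau]
    for _ in range(iters):
        lamN = (lam0 + N) * E_tau             # update q(mu) given E[tau]
        E_mu2 = mN * mN + 1.0 / lamN          # second moment of q(mu)
        bN = b0 + 0.5 * (sx2 - 2.0 * N * xbar * mN + N * E_mu2
                         + lam0 * (E_mu2 - 2.0 * mu0 * mN + mu0 * mu0))
        E_tau = aN / bN                       # update q(tau) given q(mu)
    return mN, lamN, aN, bN
```

The mutually dependent statistics (lamN and bN here) play the role of the node statistics iterated by the VEM algorithm.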
7.3.3.4 Class pVBnet (pPdf)
Is a class representing the predictor obtained by marginalization of the eVBnet estimate.
Note that the marginalization operation for the original observation model (3.10) may not be
tractable. On the other hand, if we approximate the original model by the partial observation models
(4.53), the integration is trivial. Therefore, at present, we design the pVBnet class as a software
representation of the latter case. The predictor is then a product of predictors from the exponential family,
i.e. pNorm or pMC.
Attributes:
predictors:pPdfCell array of predictors.
Operations:
No theoretical results for operations on predictors are available at present. These will be defined
later.
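The product form of the predictor means that its log-density is simply the sum of the node log-densities. A hypothetical Python sketch of such a product-of-predictors container (the real pNorm/pMC classes are Matlab/C designs; the names here are illustrative):

```python
import math

class GaussPredictor:
    """One exponential-family node predictor: a scalar Gaussian N(mean, var)."""
    def __init__(self, mean, var):
        self.mean, self.var = mean, var
    def log_pred(self, y):
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (y - self.mean) ** 2 / self.var)

class ProductPredictor:
    """pVBnet-style predictor: a product of independent node predictors."""
    def __init__(self, predictors):
        self.predictors = predictors
    def log_pred(self, ys):
        # log of a product = sum of the component log-predictions
        return sum(p.log_pred(y) for p, y in zip(self.predictors, ys))
```

Evaluating the joint predictor thus never requires forming the product explicitly.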
7.4 Package SingleDM
This package implements all classes needed for the practical tasks associated with single-participant
decision making (Chapter 5). Separation of features related to single- and multiple-participant DM
is motivated by two factors:
1. the purpose of the work is to create a basis for long-term research (Section 2.1). A lot of
research will be directed towards single-participant scenarios; hence, the presence of features
of MP DM would be redundant and confusing for the users,
2. the single-participant scenario is a special case of the MP scenario; therefore, due to the object-oriented
approach, the classes from this package can be easily extended for the MP scenario in
the next package.
Here, we review the basic steps of decision making (Agreement 5.2) from the software-design point
of view.
1. Problem description
In this step, the expert user provides as much information about the problem as possible. This
information needs to be stored in order to be used later. Moreover, it must be stored in such a
way that allows easy handling of the information in the subsequent tasks. Hence, we design a
new class UserInfo as a structure of task-specific pieces of information.
2. Elicitation of prior distributions
This task can be translated into the task of learning with fictitious data (Section 5.2). The
learning operations (time- and data-update) are already covered in the packages Prob and FProb.
It remains to specialize the oPdf class to handle fictitious data.
3. Model selection
Once again, the task of model selection can be translated as learning followed by marginalization
over the parameter space (Section 5.3). All required operations (time- and data-update, and
prediction) are already covered in the packages Prob and FProb. However, the general model of
hypothesis testing (Section 5.3) can be computationally inefficient for certain families, such
as the exponential family (Remark 5.2). Therefore, we design a new class Hypothesis, which is
intended to encapsulate the task of model selection and, by specialization, the task of model
validation.
4. Learning
All required operations (time- and data-update, and prediction) are already covered in the packages
Prob and FProb.
5. Model validation
Is a special case of model selection. The class Hypothesis can be specialized for this purpose.
6. Elicitation of ideal pdfs
The ideal pdf is used, almost exclusively, for evaluation of its KL divergence from the observation
model in FPD (Proposition 3.2). Alternatively, its KL divergence from the predictor is
evaluated in data-driven FPD (Proposition 3.3). Therefore, we consider the ideal distributions
to be a specialization of the class pPdf.
Note that the pPdf class has the update and rupdate operations defined, which allows the
predictor to be modified using any observation model or internal model. This mechanism is well
suited for modelling of the time-variant ideals (Section 5.6).
7. Design
All operations for FPD are already available.
8. Design validation
As was mentioned in Section 5.8, this task is typically achieved by means of simulation.
Since simulation can be seen as a special case of decision-making, it can be implemented
using the already available objects. However, the class DataSource must be extended to allow
for writing of the simulated data (not only decisions).
Now, we design the software image of the required objects in detail.
7.4.1 Class UserInfo
Is the overall class unifying all information from the expert user. It is composed of information
related to the individual steps of DM. The UML class diagram of the whole structure is displayed in Figure
7.7.
7.4.1.1 Class DataInfo
This class stores all available information about the observed data and their nature.
Since all data are supposed to be observed on-line, the source of one scalar observation will be
known as a channel.
Attributes:
chnum:int is used for storing the number of all channels.
Chns:ChInfCell is a list of the information available on each of the channels. The new datatype
ChInfCell is defined as a list of ChnlInfo classes, which is defined later.
PreProc:FltInfCell is a list of information about the necessary pre-processing that must be performed
on the data. The new datatype FltInfCell is a list of FilterInfo classes. Each FilterInfo
stores information about one filter used for preprocessing.
Internal classes:
7.4.1.1.1 ChnlInfo Attributes:
id:int is a unique identifier of the data, used for referencing within the software representation of the
system,
name:string is a user-friendly identifier of the data, used for presenting the results to the user,
type:int is an identifier of the data type. At present we distinguish discrete (type=0) and continuous
(type=1) data.
min:double is the minimum possible value in the channel.
max:double is the maximum possible value in the channel.
action:int is used to indicate whether the decision maker can choose the value of this channel (action=1).
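A direct software rendering of this record, sketched here as a Python dataclass for illustration (the actual toolbox stores it as a Matlab structure; the field names follow the attribute list above):

```python
from dataclasses import dataclass

@dataclass
class ChnlInfo:
    """Description of one scalar data channel (see attribute list above)."""
    id: int            # unique identifier within the software representation
    name: str          # user-friendly identifier for presenting results
    type: int          # 0 = discrete, 1 = continuous
    min: float         # minimum possible value in the channel
    max: float         # maximum possible value in the channel
    action: int = 0    # 1 if the decision maker can choose this channel's value

# a continuous observation channel and a discrete action channel
temp = ChnlInfo(id=1, name="temperature", type=1, min=-10.0, max=120.0)
valve = ChnlInfo(id=2, name="valve", type=0, min=0.0, max=1.0, action=1)
```

The channel names and values here are hypothetical examples, not taken from the thesis.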
Figure 7.7: UML class diagram of the class UserInfo.
7.4.1.1.2 FilterInfo Attributes:
type:int is used to indicate the type of the pre-processing filter, e.g. wavelet filter, median filter, etc.
chnls:mxArray is a structure indicating which channels are being used by the filter.
This class is defined as virtual. Specialized filters of various types will extend this info with their
own information.
7.4.1.2 Class PriorInfo
As it was outlined in Section 5.2, the available prior information may consist of many mutually
incompatible pieces (sources) of information. Information from these sources is then combined
together.
Attributes:
Sources:PriKnCell is a list of information on each of the sources. The new datatypePriKnCell
is a list of classesPriorKnInfo . ClassPriorKnInfo contains information on each source.
weights:mxArray is a list ofa priori known weights that will be used to combine the sources. If
the actual weights are to be inferred from the data (Section 5.2), these values will be used as
statistics of the prior distribution on the weights.
Internal classes:
7.4.1.2.1 PriKnInfo Is an abstract class representing one particular type of available information.
Attributes:
type:int identifies the type of information on each source.
This class is defined as virtual. Specialized sources of various types will extend this info with their
own information.
7.4.1.3 Class ModelInfo
Is a class for collecting information on the models preferred by the user.
Attributes:
maxno:int specifies the maximum number of models compared by the model selection procedure.
class:int is used to indicate which class of models is preferred by the user. We assume that all the
tested models will be from this class.
7.4.1.4 Class EFModInfo (ModelInfo)
Is a specialization of the ModelInfo class for the exponential family. Due to the nesting property of
the exponential family (Section 5.3), the necessary learning phase of model selection can be done
only on the richest model, and the task of model selection is then reduced to operations on the
sufficient statistics of this model.
Attributes:
maxstr:mxArray denotes the maximum possible structure of the model.
7.4.1.5 Class MValidInfo
Is a class storing information on user preference of the model validation procedure (Section 5.5).
The model can be validated by many tests, which can be applied sequentially.
Attributes:
testes:ValInfCell is a list of information about the validation testes. The new datatypeValInfCell
is a list ofValInfo classes. EachValInfo class stores information aboutonevalidation test.
Internal classes:
7.4.1.5.1 Class ValInfo Is an abstract class, used as a root for further specialization.
7.4.1.5.2 Class CuttingVInfo (ValInfo) Is a specialization of the ValInfo class for validation by
cutting (Section 5.5.2).
Attributes:
cutpoints:mxArray is an array of time-indexes defining the grid of cutting moments.
7.4.1.6 Class IdealInfo
Is a class storing information about the user requirements on the output of the closed loop. Once again,
all observations on the closed-loop system are made on-line; hence the information in this field is
channel-specific.
Attributes:
Observed:IdealInfCell is a list of information about the desired closed-loop behaviour of selected
channels. The new datatype IdealInfCell is a list of IdealChInfo classes. Each IdealChInfo
class stores information about one channel.
Internal classes:
7.4.1.6.1 Class IdealChInfo Is a class representing the user-desired values of each channel.
Attributes:
id:int identifies the channel on which the requirements are imposed. A channel with the same id must
exist in the DataInfo structure.
imin:double is the requested minimum value of the data in this channel.
imax:double is the requested maximum value of the data in this channel.
dmin:double is the requested minimum of the difference of the data-value in this channel between
two subsequent observations.
dmax:double is the requested maximum of the difference of the data-value in this channel between
two subsequent observations.
Note that this class describes time-invariant requirements on the system. Other specializations of
IdealChInfo must be derived to describe time-variant requirements.
7.4.1.7 Class DesignInfo
Is a class with requirements and settings used for the task of design of the DM strategy.
Attributes:
horizon:int is the number of optimized steps ahead, t in Proposition 3.3.
options:string is the list of options (tuning parameters) of the FPD algorithm (Proposition 3.3).
At this stage we do not impose any structure on this attribute. It will be interpreted by the
corresponding operation implementing FPD.
7.4.1.8 Class DValidInfo
Is a class with requirements and settings used for the task of validation of the designed DM strategy.
Attributes:
ndat:long is the number of simulated time steps.
tolerance:double is a tuning knob in decisions on the validity of the designed DM strategy.
7.4.2 Special purpose classes
In this Section, the functionality of the pdfs from the packages Prob and FProb is extended in order to
support fictitious data (Section 5.2), time-variant ideals (Section 5.6), and simulation (Section
5.8). The UML class diagram of the new classes is displayed in Figure 7.8.
Figure 7.8: UML class diagram of the specialization of pdfs for practical tasks of DM.
7.4.2.1 Class FictOPdf (oPdf)
Is a class designed for the task of prior elicitation (Section 5.2). This class is a specialization of the
original oPdf class.
Operations:
create: with argument K:PriorInfo, analyzes the argument and creates a new instance of oPdf with
a new instance of the DataSource filled with fictitious data for the given K.
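The mechanism can be illustrated on a conjugate Normal model with known variance: vague prior statistics are updated with fictitious observations that express the expert's belief, exactly as if they were real data. This is a hedged Python sketch; the class and statistic names are hypothetical, not the toolbox's:

```python
class NormalMeanEstimate:
    """Conjugate estimate of a Normal mean with known unit variance:
    posterior mean = (strength*prior_mean + sum(data)) / (strength + n)."""
    def __init__(self, prior_mean=0.0, prior_strength=1e-6):
        self.s = prior_strength * prior_mean   # accumulated weighted sum
        self.n = prior_strength                # accumulated effective count
    def update(self, y):
        self.s += y
        self.n += 1.0
    @property
    def mean(self):
        return self.s / self.n

def elicit_prior(fictitious_data):
    """FictOPdf-style prior elicitation: run the standard learning
    operation on fictitious data expressing the expert's belief."""
    est = NormalMeanEstimate()
    for y in fictitious_data:
        est.update(y)
    return est

prior = elicit_prior([3.0, 5.0, 7.0])   # expert: "the output is around 5"
```

The resulting statistics then serve as the prior for learning from real data.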
7.4.2.2 Class iPdf (ePdf)
Is a class designed for the task of ideal elicitation (Section 5.6) and the DM strategy design (Section
5.7). There, a formal method of step-wise construction of the ideal distribution on the DM horizon
was introduced.
Operations:
create with argument U:IdealChInfo, is an alternative constructor of the inherited pPdf attributes.
It analyses the argument and assigns such values to the inherited statistics as are appropriate
for the given U.
update re-defines the inherited operation update, if the operation differs. (It is expected that the
update operation for ideal pdfs will be simpler than that for predictors.)
7.4.2.3 Class Simulator (DataSource)
Is a class designed for the task of design validation (Section 5.8), to allow simulation of the data. It
extends the DataSource class to allow for writing one-step-ahead data.
Operations:
step is redefined to accept the argument Dt:mxArray, which assigns values of the Dt attribute for
the next step.
Figure 7.9: UML class diagram of the Hypothesis class.
7.4.2.4 Class Hypotheses
Is a class designed for the task of model selection (Section 5.3) and, by specialization, for the task
of model validation (Section 5.5). Technically, these tasks can be achieved by sequential use of the
standard operations on pdfs. However, we separate the task into this class, since special treatment
is required for some model classes (Remark 5.2). The UML class diagram for this class is displayed in
Figure 7.9.
Attributes:
weights:mxArray is the posterior estimate of the likelihood (5.4) corresponding to each considered
hypothesis.
Ests:ePdfCell is a list of posterior estimates for each considered hypothesis,
OMs:oPdfCell is a list of observation models for each hypothesis,
SMs:oPdfCell is a list of internal models for each hypothesis,
The lengths of these lists may differ for different specializations of the class. Correct handling of different
lengths must be assured within the operation test.
Operations:
create with attribute MI:ModelInfo is a constructor which creates the class attributes based on the
information from the user.
test with arguments DS:DataSource and ndat:int, is an operation which typically calls the update
and log_pred operations for each of the estimates in Ests and accumulates their results
in weights. For the exponential family, this operation needs to be redefined to update just one
estimate and subsequently evaluate the likelihoods for all possible sub-structures (Remark 5.2).
Figure 7.10: UML class diagram of the basic single-participant decision makers.
7.4.2.5 Class MVHypothesis (Hypothesis)
Is a specialization of the Hypothesis class for the task of model validation by cutting (Section 5.5.2).
In this case, the estimates in the list Ests have the same structure, but differ in the data on which they
are conditioned.
Attributes:
cutpoints:mxArray is a predefined grid of cutting points,
Operations:
test the operation is re-defined to update different estimates in Ests in different parts of the cutting
grid.
create with attribute MVI:CuttingVInfo, re-implements the original constructor to accept the
user info on the cutting grid.
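The cutting mechanism can be sketched as follows: the data record is split at the cutpoints, a separate estimate is updated on each segment, and the model is judged valid if the segment estimates agree. This is a deliberately simplified Python illustration that uses sample means in place of the full Bayesian estimates:

```python
def validate_by_cutting(data, cutpoints, tolerance):
    """Split the data at the given time-indexes, compute one estimate per
    segment, and check that the segment estimates agree within tolerance."""
    bounds = [0] + sorted(cutpoints) + [len(data)]
    segments = [data[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    means = [sum(s) / len(s) for s in segments]
    # valid if no segment estimate deviates too much from the others
    return max(means) - min(means) <= tolerance, means

ok, means = validate_by_cutting([1.0, 1.1, 0.9, 1.0, 1.05, 0.95],
                                cutpoints=[3], tolerance=0.5)
```

A stationary record passes the check; a record whose segments disagree beyond the tolerance fails it.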
7.4.3 Decision Makers
The UML class diagram of the basic decision makers defined in this package is displayed in Figure
7.10.
7.4.3.1 Class AdaptDM
The basic class of this package implements the on-line version of the decision maker as described
by Agreement 5.1.
Attributes:
OM:oPdf is the observation model (3.10),
SM:ePdf is the internal model (3.9),
Est:ePdf is the current estimate (3.14),
Stra:pPdf is the designed DM strategy (3.23),
DS:DataSource is the actual link with the environment, i.e. it is the source of the observed data
and the destination of the decisions.
Operations:
new is the constructor of the class.
read reads the data from the environment; in the simplest case, it just calls DS.step(),
learn uses the observed data to improve its knowledge about the environment; in the simplest case,
it just calls Est.update(OM,SM).
adapt uses the updated estimates to adjust the designed DM strategy (if it depends on the parameters).
In the simplest case, it calls Est.expect() and propagates the results to Stra.replace_stats().
decide selects the optimal decision using the current data and strategy. Since the DM strategy
is designed as a pdf, we need to collapse it to a value. This can be done via (i) moments, or
(ii) random sampling. In the first case, the operation calls Mom:=Stra.expectation() and
Ut:=Mom.evalall(). In the second case, Updf:=Stra.condition() and Ut:=Updf.sample().
write writes the decision Ut into the environment. In the simplest case, it calls DS.write(Ut).
run repeatedly calls the above procedures, i.e. read–write, for all available data.
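The read–learn–adapt–decide–write cycle can be sketched on a toy noise-free scalar system y_t = theta*u_t, where learning reduces to a ratio estimate and the strategy steers the output towards a set-point. This is a hypothetical Python illustration of the control flow only, not the thesis's classes:

```python
class ToyEnvironment:
    """Plays the role of DataSource: y = theta * u, theta unknown to the DM."""
    def __init__(self, theta):
        self.theta, self.u = theta, 1.0      # initial probing action u0 = 1
    def step(self):                           # read: produce the observation
        return self.theta * self.u
    def write(self, u):                       # write: apply the decision
        self.u = u

class AdaptDM:
    def __init__(self, ds, setpoint):
        self.ds, self.setpoint, self.theta_hat = ds, setpoint, 1.0
    def run(self, steps):
        for _ in range(steps):
            y = self.ds.step()                # read
            self.theta_hat = y / self.ds.u    # learn (noise-free least squares)
            u = self.setpoint / self.theta_hat  # adapt + decide (collapsed to a value)
            self.ds.write(u)                  # write
        return y

env = ToyEnvironment(theta=2.0)
dm = AdaptDM(env, setpoint=10.0)
last_y = dm.run(steps=4)
```

After the first probing step the parameter estimate is exact and the loop settles at the set-point.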
7.4.3.2 Class SingleDM (AdaptDM)
Is a class implementing the off-line steps of decision making as described by Agreement 5.2. This
class is a specialization of the class AdaptDM, since the purpose of the off-line analysis is to create all
structures needed for the on-line mode.
Attributes:
UseInf:UserInfo is the structure of expert information from the user,
I-OM:iPdf is the ideal observation model in (3.22),
I-SM:iPdf is the ideal internal model in (3.22),
I-U:iPdf is the ideal DM strategy in (3.22),
BF:function is the auxiliary Bellman function γ(t), (3.24).
Operations:
Figure 7.11: UML sequence diagram of the task of prior elicitation.
new with argument UseInf:UserInfo, is the constructor.
PriorElicit performs the task of prior elicitation (Section 5.2). It analyzes the PriorInfo field in the
UseInf structure and elicits the prior values of the Est attribute. This procedure is displayed in
Figure 7.11.
ModelSel performs the task of model selection (Section 5.3). It creates a new instance of the
Hypothesis class using the ModelInfo field in the UseInf structure and elicits the prior values
of the Est attribute. Internally, it may call the PriorElicit function.
Learning performs the task of learning (Section 5.4). In the simplest case, it just calls the inherited
run procedure.
ModelValid performs the task of model validation (Section 5.5). It creates a new instance of the
Hypothesis class using the MValidInfo field in the UseInf structure and elicits the prior values
of the Est attribute.
IdealElicit performs the task of ideal elicitation (Section 5.6). It analyzes the IdealInfo field in the
UseInf structure and elicits the values of the ideal pdfs (attributes I-OM, I-SM and I-U).
FPD performs the task of design of the DM strategy (Section 5.7). Its UML sequence diagram is
displayed in Figure 7.12.
FpdValid performs the task of design validation (Section 5.8). Internally, a new instance of the
Simulator class is created and used in place of DS.
BatchRun analyses the steps_to_do field in the UseInf attribute, and runs the above tasks, i.e.
PriorElicit–FpdValid, if they are selected by the user. This mechanism assures both types of
user interaction, i.e. batch and interactive modes, as discussed in Section 5.1.
Figure 7.12: UML sequence diagram of the fully probabilistic design (FPD).
Figure 7.13: Illustration of the structure of the DAEP (timestamped value channels, incoming and outgoing communication blocks, and a description record with the attributes noc, cic, coc and cbs).
7.5 Package MultiDM
This package extends the classes from the package SingleDM (Section 7.4) for the MP DM scenario
(Chapter 6). It should be kept in mind that the classes defined in this package are preliminary, since they
have not been verified by real experiments.
7.5.1 User Information
7.5.1.1 Datatype DAEP
Since we allow the environment to be implemented by any technology (i.e. not necessarily by the
OO approach), the interface between the participant and the environment is realized via the Data-Action
Exchange Platform, which is not a class but a data structure, DAEP.
The DAEP is illustrated in Figure 7.13, and it consists of the following major parts:
Description has a fixed structure and uniquely determines the following parts of the DAEP. This part
is to be read by another participant in order to find the communication channels.
Channels is a list of data channels. There are two types of channels: (i) input channels, where the
environment writes its data, and (ii) output channels, where the participant writes its decisions.
Since different participants can work with different sampling periods, the values of all data written
into the DAEP must be accompanied by the exact timestamp of their creation.
At least two channels (one for input and one for output) are reserved for synchronization of
inter-participant communication. All participants can write their requests and responses to
their output channel. The environment is responsible for the transfer of data from the output
channel to the input channel of the appropriate recipient.
Figure 7.14: UML class diagram of the MPUserInfo class used for storing information from the user.
Communication blocks are two blocks (input and output) in memory, allocated for the exchange of
arbitrary information. This information is treated as an array of bytes by the DAEP and it should
be interpreted by the communication routines of each participant.
Once again, each participant writes its data into its own output block and the environment is
responsible for their delivery to the input block of the recipient.
Thus, the DAEP is uniquely described by the following attributes:
noc:int the total number of data channels,
cic:int the number (identifier) of the communication-input channel,
coc:int the number (identifier) of the communication-output channel,
cbs:int the size of the communication block.
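A minimal data-structure sketch of the DAEP in Python, for illustration only (the real platform is technology-neutral; the field names follow the attribute list above, while the helper method and example values are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DAEP:
    """Data-Action Exchange Platform: a description record, timestamped
    data channels, and two raw communication blocks."""
    noc: int                                  # total number of data channels
    cic: int                                  # communication-input channel index
    coc: int                                  # communication-output channel index
    cbs: int                                  # communication-block size in bytes
    channels: List[List[Tuple[float, float]]] = field(default_factory=list)
    block_in: bytes = b""                     # incoming communication block
    block_out: bytes = b""                    # outgoing communication block

    def __post_init__(self):
        if not self.channels:
            self.channels = [[] for _ in range(self.noc)]

    def write_value(self, chn, value, timestamp):
        # every written value carries the timestamp of its creation
        self.channels[chn].append((value, timestamp))

daep = DAEP(noc=4, cic=2, coc=3, cbs=1024)
daep.write_value(0, value=21.5, timestamp=0.1)
```

The description record (noc, cic, coc, cbs) is what another participant reads first to locate the communication channels.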
7.5.1.2 Class MPUserInfo (UserInfo)
Is an extension of the class UserInfo for additional MP-related information. The UML class diagram is
displayed in Figure 7.14.
Attributes:
NeighbourInf:NeiInfCell is an additional attribute which is used to store the user's information
about the participant's neighbours. It is constructed as a list of information about the particular
neighbours.
DataInf:DAEPInfo is redefined as an instance of the DAEPInfo class, which is an extension of the
original DataInfo class to reflect the structure of the DAEP.
Figure 7.15: UML class diagram of the data-handling mechanism for MP DM.
7.5.1.3 Class NeighInf
Is a class for storing the user information on the participant's neighbours.
Attributes:
id:PartID is a unique identifier of each participant. At present, it is represented by an abstract
datatype PartID. A more detailed description of this datatype may be specified for each implementation.
alpha:double is the fixed value of α, for the selfish and hierarchical negotiation strategies (Section 6.4). For
the cooperative scenario, this value may be used as the initial condition for the negotiation procedure.
7.5.1.4 Class DAEPInfo (DataInfo)
Is an extension of the DataInfo class made to contain the user information about the structure of the DAEP.
Attributes:
Cbs:long is the user-defined size of the communication blocks in the DAEP.
Cic:int is the index of the incoming-communication channel in the Chns list (inherited from DataInfo).
The channel must be of discrete type, with action set to false.
Coc:int is the index of the outgoing-communication channel in the Chns list (inherited from DataInfo).
The channel must be of discrete type, with action set to true.
7.5.2 Special purpose classes
Due to the similarities of the intended approach to MP DM with the practical tasks of DM for a single
participant (Chapter 5), very little needs to be done to adapt the structure of the basic classes of
probability calculus. The only exception is the data-handling mechanism; however, even there the
structural changes are rather small. The UML class diagram of the extension is displayed in Figure
7.15.
7.5.2.1 Class DAEPSource (DataSource)
Is a specialization of the DataSource class designed to provide an interface between the DAEP and the
probabilistic core of each participant.
Attributes:
Figure 7.16: UML class diagram of the MP decision maker.
DAEP:DAEP is the instance of the DAEP datatype.
period:double is the period of sampling from the continuous time.
Operations:
step is a re-implementation of the inherited operation used for innovating the observed data (attribute
Dt). This operation must be re-defined for each special type of the DataSource; however,
in this case it is unusually challenging. Note that the observed data records are continuously
being written to the DAEP by the environment. Each observation has its own timestamp,
which can have an (almost) arbitrary value. Therefore, the task of this operation is to re-sample
the irregular continuous-time observations from the DAEP into fixed-grid discrete-time observations
for Dt.
write is a re-implementation of the inherited operation for writing the participant's decisions into the
environment. This operation is simpler than step, since the decisions can be written regularly at the end of
the operation cycle. This operation calls the atime operation internally to assign an appropriate
timestamp to each decision.
atime:double returns the actual time in the same format as the timestamps.
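The re-sampling performed by step can be sketched as a zero-order hold: for each point of the fixed grid, take the last observation whose timestamp does not exceed the grid time. This is a simplified Python illustration; the actual policy (e.g. interpolation instead of a hold) is an implementation choice:

```python
def resample_zoh(records, t0, period, n):
    """Re-sample irregularly timestamped (timestamp, value) records onto a
    fixed grid t0, t0+period, ... using a zero-order hold (last value wins).
    Records must be sorted by timestamp; yields None before the first record."""
    grid = [t0 + k * period for k in range(n)]
    out, i, last = [], 0, None
    for t in grid:
        while i < len(records) and records[i][0] <= t:
            last = records[i][1]            # most recent observation so far
            i += 1
        out.append(last)
    return out

obs = [(0.05, 1.0), (0.32, 2.0), (0.33, 2.5), (0.91, 3.0)]
dt = resample_zoh(obs, t0=0.0, period=0.25, n=5)   # grid: 0, .25, .5, .75, 1.0
```

Note how several closely spaced observations collapse onto one grid point, while gaps in the record simply repeat the last held value.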
7.5.3 Decision Makers
7.5.3.1 Class MultiDM (SingleDM)
Is an extension of the single-participant class SingleDM.
Attributes:
DAEP:DAEP is an instance of the DAEP datatype.
UseInf:MPUserInfo is an instance of MPUserInfo instead of the original UserInfo,
Operations:
read is an inherited operation that must be extended to read not only the observed data, but also
the information communicated from other participants. Specifically, if some information is
present in the input communication block of the DAEP, then this operation must recognize
the nature of this information and call an appropriate constructor for it.
learn is an inherited operation that must be extended to handle possible merging of information
from the neighbours, as discussed in Section 6.1, i.e. by calling the merge operations of the involved
pdfs internally, or as a parallel process. In the latter case, the following merge operation
is used.
merge is an auxiliary operation for cases where merging must be performed as a parallel operation
to learning, and possibly over more than one DM cycle.
decide is an inherited operation which must be extended to implement the chosen negotiation strategy
(Section 6.4). In situations where the aims (formalized by ideal pdfs) have changed, the
operation must call the design operation to adjust the DM strategy (attribute Stra).
design is an auxiliary operation that is called when the ideal distribution has changed. In some
cases, it may be sufficient to adjust the statistics of the strategy by calling Stra.replace_stats.
However, when the change in the ideal distributions is significant, the full FPD operation must
be called.
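One common realization of such a merge, for the special case of two Gaussian pdfs and a weight alpha, is the normalized weighted geometric merge p1^alpha * p2^(1-alpha), which is again Gaussian with precision-weighted statistics. This is a hedged Python sketch of the arithmetic only; the thesis's merge operation is more general and this particular rule is an assumption for illustration:

```python
def merge_gaussians(m1, v1, m2, v2, alpha):
    """Normalized geometric merge N(m1,v1)^alpha * N(m2,v2)^(1-alpha):
    precisions combine linearly, means are precision-weighted."""
    l1, l2 = alpha / v1, (1.0 - alpha) / v2   # weighted precisions
    lam = l1 + l2                              # merged precision
    mean = (l1 * m1 + l2 * m2) / lam           # merged mean
    return mean, 1.0 / lam

# merge the participant's own estimate N(0,1) with a neighbour's N(4,1)
mean, var = merge_gaussians(0.0, 1.0, 4.0, 1.0, alpha=0.5)
```

With alpha = 1 the neighbour is ignored (the selfish strategy); intermediate alpha values trade the two sources off, which is what the negotiation over alpha adjusts.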
This class finalizes the analysis of Bayesian multiple-participant decision making. Detailed specialization
and implementation of the classes presented in this analysis will be the next step in the
global project of Bayesian MP DM.
8 Conclusion
In this thesis, we have designed a new framework for software support for Bayesian distributed
dynamic decision making. The primary concern of the thesis is the software framework. The task
of its design was complicated by the fact that the theory of distributed decision-making is not fully
developed and stabilized. Therefore, many theoretical issues that were encountered during the design
process were also addressed. As a result, a range of smaller contributions to the theory of decision
making was also made.
8.1 Key contributions of the thesis
Chapter 2 The requirements on the analysis were formalized by Requirements 2.1 and 2.2. The
requirements were often found to be contradictory and it was necessary to find a reasonable compromise.
The most prominent freely available software packages were reviewed in the light of our requirements.
It was concluded that none of the packages is suitable for our needs and that it is
necessary to create a new one.
Since flexibility of the framework was one of the key requirements, we have chosen the object-oriented
(OO) approach as the design method. On the other hand, the requirement of continuity
of research forced us to use Matlab as the basic development environment. Therefore, we have
proposed a novel approach to the implementation of OO software in Matlab (Agreement 2.1). The
approach was tested on a simple problem and it was verified that it is possible to implement
OO principles in Matlab at a negligible loss of computational efficiency.
Chapter 3 The basics of Bayesian decision-making theory were reviewed in this Chapter. We have
presented well-known results, as well as new emerging methods such as the fully probabilistic
design (FPD) for systems with unobserved state (Proposition 3.2) and the merging of pdfs (Section
3.4). Moreover, these results were translated into a sequence of basic probabilistic operations,
which are suitable for software implementation, see e.g. (3.27) for FPD.
Chapter 4 It is well known that the Bayesian theory of decision making is computationally tractable
only under certain assumptions. The well-known basic DM operations for linear state-space
models and exponential-family models were reviewed. Moreover, we have presented the
results of merging operations for these models.
Many approximate techniques were developed for model families for which the general Bayesian
DM is not analytically tractable. These techniques can be seen as distributional approxima-
tions that are being applied to the general DM formulae. These techniques were also reviewed
in this Chapter.
Special attention was paid to the Variational Bayes technique, which is based on the assumption of conditional independence. This assumption is a successful, widely used approximation in the area of Bayesian networks. A better understanding of this assumption in terms of dynamic decision-making may open a way to dealing with more complex models than is usual at present. It was discovered that application of the Variational Bayes theorem (Theorem 4.1) to the tasks of Bayesian filtering (Section 4.4.2) and Bayesian estimation (Section 4.5) can be interpreted as exact analytical treatment of approximate (conditionally independent) models. However, the statistics of the posterior distributions are mutually dependent and must be evaluated iteratively. This result is important for two reasons: (i) it guarantees a fixed finite-dimensional form of the posterior distributions, which is important for achieving feasibility, and (ii) the quality of the approximation can be increased by iterating the VEM algorithm.
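The iterative evaluation of mutually dependent posterior statistics can be sketched on the textbook example of normal data with unknown mean and precision under a factorized posterior q(mu)q(tau). The function below is a hypothetical illustration of the fixed-point iteration, not the thesis implementation:

```python
def vb_normal(x, mu0=0.0, lam0=1.0, a0=1e-3, b0=1e-3, iters=50):
    """Variational Bayes for N(mu, 1/tau) data with a factorized posterior
    q(mu, tau) = q(mu) q(tau) = N(m_N, 1/lam_N) Gamma(a_N, b_N).
    The statistics of the two factors are coupled, hence the iteration."""
    N = len(x)
    xbar = sum(x) / N
    sum_x, sum_xx = sum(x), sum(xi * xi for xi in x)
    m_N = (lam0 * mu0 + N * xbar) / (lam0 + N)   # fixed mean of q(mu)
    a_N = a0 + (N + 1) / 2.0                     # fixed shape of q(tau)
    e_tau = a0 / b0                              # initial guess of E[tau]
    for _ in range(iters):
        lam_N = (lam0 + N) * e_tau               # q(mu) needs E[tau]
        e_mu, e_mu2 = m_N, m_N ** 2 + 1.0 / lam_N
        b_N = b0 + 0.5 * (sum_xx - 2 * e_mu * sum_x + N * e_mu2
                          + lam0 * (e_mu2 - 2 * mu0 * e_mu + mu0 ** 2))
        e_tau = a_N / b_N                        # q(tau) needs E[mu], E[mu^2]
    return m_N, 1.0 / lam_N, a_N, b_N
```

Note that both factors retain a fixed finite-dimensional form throughout, while their statistics are refined by the iterations, in line with points (i) and (ii) above.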
Chapter 5 The Bayesian formulation of the decision-making task is a consistent mathematical theory. However, its application to real-life problems is not trivial and many issues must be addressed to achieve practical solutions. The basic steps of implementing DM theory in practice, gained from the experience with single-participant DM, were reviewed in this Chapter; therefore, this chapter is concerned with single-participant DM.
Most of the steps are concerned with translating real-world experience into abstract objects of the theory, namely the involved pdfs, i.e. prior distributions, models, ideal pdfs, etc. The algorithms of DM can be applied only when these objects are chosen and fixed. Yet, we can still question their validity for the given task after processing of real data.
The main contribution of this Chapter is in the area of model validation for dynamic DM. The classical approach of splitting the real data into two parts (a learning part and a validation part) was reformulated for dynamic models in terms of Bayesian DM (Section 5.5). It was observed that the algorithm is sensitive to the choice of the cutting moment. To address this problem, a new method with multiple cutting moments was developed (Section 5.5.2).
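The idea of multiple cutting moments can be sketched as follows: evaluate predictive performance on several learning/validation splits of the same record and aggregate the scores. The toy predictor below (sample mean, squared-error loss) is purely illustrative; the thesis works with Bayesian predictive pdfs instead.

```python
def multi_cut_score(data, cuts):
    """Validation with multiple cutting moments: for each cut, learn on the
    prefix and measure predictive error on the suffix, then average.
    Toy predictor: sample mean of the learning part, squared-error loss."""
    scores = []
    for t in cuts:
        learn, valid = data[:t], data[t:]
        pred = sum(learn) / len(learn)                          # "learning"
        mse = sum((y - pred) ** 2 for y in valid) / len(valid)  # "validation"
        scores.append(mse)
    # averaging over cuts suppresses the sensitivity to any single moment
    return sum(scores) / len(scores)
```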
Chapter 6 The basic practical steps of the design of single-participant DM were reviewed in the light of the multiple-participant scenario in this Chapter. An original concept of Bayesian MP DM was presented. It was shown that many sub-tasks of MP DM (such as merging) have already been addressed in the design tasks of single-participant DM. Detailed elaboration of these principles is a task for future research.
Chapter 7 The core contribution of the thesis, the analysis of a new-generation software framework, is presented in this Chapter. The analysis is expressed in Unified Modelling Language (UML) notation. Following the UML methodology, the software is organized into five packages:
one for mathematical functions, and four packages implementing the classes of Bayesian DM.
Each of the latter packages corresponds to one chapter of the theory (Chapters 3–6).
Since all tasks of DM are implemented in terms of probability calculus, the most challenging
task was to design the basic classes for random variables, functions and pdfs. The chosen approach appears to be very promising, as it embraces the classical models, such as linear state-space models (Section 7.3.1) and exponential family models (Section 7.3.2), as well as the new approximate models based on conditional independence (Section 7.3.3).
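In a language with native OO support, the spirit of such basic pdf classes can be sketched as an abstract base class with specializations. The class and method names below are hypothetical illustrations, not the actual identifiers of the Prob package:

```python
import abc
import math
import random

class Pdf(abc.ABC):
    """Abstract pdf: the common interface of all probability objects."""
    @abc.abstractmethod
    def loglik(self, x):
        """Logarithm of the pdf evaluated at realization x."""
    @abc.abstractmethod
    def sample(self):
        """Draw one realization of the random variable."""

class GaussPdf(Pdf):
    """Specialization for the scalar normal pdf N(mu, var)."""
    def __init__(self, mu, var):
        self.mu, self.var = mu, var
    def loglik(self, x):
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (x - self.mu) ** 2 / self.var)
    def sample(self):
        return random.gauss(self.mu, math.sqrt(self.var))
```

Further specializations (exponential family pdfs, conditionally independent products) would plug into the same interface, which is what makes the design embrace both the classical and the approximate models.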
The analysis presented in this thesis reveals that structural differences between software images of
participants in multiple- and single-participant DM are rather small. This result is a consequence
of the chosen approach to the task of distributed DM (Section 1.1.2) and the chosen OO approach
to software design. Therefore, future development of distributed DM systems using the multiple-participant approach is conceptually well defined in terms of the classical single-participant paradigm. However, a wide range of problems must be overcome in order to achieve such maturity of the theory and its software image that would allow its application to real-world problems.
8.2 Future work
As stated in the introduction (Section 1.1), this thesis is an initial step in the creation of the Bayesian distributed decision-making theory. The amount of work required to reach this aim is extensive. Therefore, in this Section, we mention only short-term tasks that are closely related to the designed software.
Implementation The basic classes for linear state-space models have already been implemented in the Baddyr repository, http://guest:[email protected]:1800/svn/badyr/work/Participants. This initial work helped to clarify many details in the packages Prob and FProb. It can be expected that implementation of the remaining packages (SingleDM and MultiDM) will also lead to clarification and modification of many details in them.
Particle filtering and the geometric approach (Section 4.5) were not elaborated as part of the FProb package. Preliminary considerations indicate that these techniques fit in the proposed framework and can be easily added by specialization of the basic classes in the Prob package.
Computational efficiency issues were mostly neglected in this text. These issues are very important in practical implementations and many clever speedups have been proposed for standard algorithms. However, the main purpose of this text was to prepare a framework for the development of new algorithms. Therefore, the main concern of this analysis was to keep the software structures as close to the theory as possible. Nevertheless, implementation of computationally optimized algorithms should be straightforward due to the object-oriented approach.
Communication between participants must be synchronized using a finite-state protocol. Creation of such a protocol is essential for experiments with MP scenarios. It seems reasonable to implement the standard used in multi-agent systems, known as the request interaction protocol, http://www.fipa.org/specs/fipa00026/.
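The initiator side of such a protocol can be sketched as a finite-state machine following the message flow of the FIPA request interaction protocol (a sent request is answered by refuse or agree; an agreed request ends with failure, inform-done or inform-result). State and class names are illustrative:

```python
class RequestInitiator:
    """Initiator side of a request interaction, as a finite-state machine.
    After the initial 'request' is sent, the participant answers 'refuse'
    or 'agree'; an agreed request ends with 'failure', 'inform-done' or
    'inform-result'."""
    TRANSITIONS = {
        ("waiting", "refuse"): "done",
        ("waiting", "agree"): "working",
        ("working", "failure"): "done",
        ("working", "inform-done"): "done",
        ("working", "inform-result"): "done",
    }

    def __init__(self):
        self.state = "waiting"          # 'request' has just been sent
    def receive(self, performative):
        key = (self.state, performative)
        if key not in self.TRANSITIONS:
            raise ValueError("protocol violation: %r in state %r"
                             % (performative, self.state))
        self.state = self.TRANSITIONS[key]
        return self.state
```

Because every message outside the transition table raises an error, such a machine makes protocol violations between participants immediately detectable.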
FPD solution for the conditionally independent models (e.g. the VB approach, Section 4.3.4) has not been elaborated yet. Elaboration of this step with elements from the exponential family should be relatively straightforward. This result, if computationally tractable, could be extremely useful for the design of participant negotiation strategies.
Index
actions, 1
attributes, 11, 15
Bayesian decision making, 17
Bayes rule, 19
BNT, 8, 10
Chain rule, 19
channel, 48, 90
communication, 2, 59, 60
conditional independence, 19
conditioned on, 19
conditioning symbol, 19
conjugacy, 35
DAEP, 101
decision maker, 1, 17
DESIGNER, 47
determinant, 20
direct merging, 26, 61, 73
distributed DM, 1
DM horizon, 25, 29
DM strategy, 1, 21, 23, 25
Dynamic DM, 1
empirical density, 60
environment, 2
estimate, 73
estimate of internals, 22
estimation, 23
Expectation, 20
exponential family, 33, 80
feedback, 1
fictitious data, 94
Framework, 6
fully probabilistic design, 58
fully probabilistic design (FPD), 23
ideal pdf, 24
Implementation, 6
indirect merging, 26, 61, 73
internal model, 21, 23, 30, 42, 71, 83
internal variable, 21
Jacobian, 33
Kullback-Leibler (KL) divergence, 20
learning, 1, 20
learning data, 54
likelihood function, 23
linear-in-parameters, 34
Marginalization, 19
Markov chain, 34
merged pdf, 26
merging, 26, 35
Mixtools, 8, 9
model structure, 22, 53
multiple-participant decision making (MP DM), 2
negotiation, 59, 61
Normalization, 19
normalization factor, 23
object-oriented (OO), 7, 11, 13
observation model, 21, 30, 33, 45, 74
observed data, 70
operations, 11
optimal DM strategy, 20
parameter, 23
partial VB-observation model, 45, 86
partial VB-observation models, 87
participant, 2, 18
pdf, 18
pdfs, 71
Pdf of transformed variables, 20
prediction of internals, 22
predictor, 74
projection, 36
proportion sign ∝, 19
random variable, 18, 66
realization, 18
source pdfs, 26
specialization, 15
user, 7, 47
VB-conjugate, 45