Bayes Linear for Dummies
JAC+IRV
24th April 2009

Why would I use Bayes linear?
Difficulties with full Bayes
Even in small problems, it can be too difficult or time-consuming to express, document and validate a meaningful joint prior probability specification;
Given such a specification, the computations for learning from data become technically difficult and extremely computer-intensive;
In higher dimensions the likelihood surface can be very complicated, making full Bayes calculations potentially highly non-robust.
Therefore if, in complex problems, we are unable to make and analyse full prior probability specifications, it follows that we require methods based around simpler belief specifications.
Working with partial belief specifications
Bayes linear is such an approach, based around partial belief specifications in terms of expectations
Typically, we require only mean, variance and covariance specifications for all uncertain quantities
We may view the Bayes linear approach as:
- Offering a simple approximation to a full Bayes analysis
- Complementary to the full Bayes approach, offering new interpretative and diagnostic tools
- A generalisation of the full Bayes approach where we lift the restriction of requiring a full probabilistic prior before we may learn anything from data
Features of Bayes linear
Features of the Bayes linear approach
Subjective and Bayesian
Belief specifications honestly correspond to our beliefs
Expectation as primitive
Adjust beliefs by linear fitting rather than conditioning
Computationally straightforward, allowing the analysis of more complex problems
Diagnostic tools are a key part of the approach:
- How prior beliefs affect conclusions
- How beliefs change by the adjustment
- How beliefs about observables compare to the observations themselves
Important special cases - multivariate Gaussian
Stages of belief analysis
A typical Bayes linear analysis of beliefs proceeds in the following stages:
1. Specification of prior beliefs
2. Interpret the expected adjustments a priori
3. Given observations, perform and interpret the adjustments
4. Make diagnostic comparisons between actual and expected beliefs
Belief Specification
de Finetti: Expectation as Primitive
de Finetti spent most of his life studying subjective conceptions of probability.
He proposed the use of expectation as the primitive entity on which to base any analysis, as opposed to probability.
In the Bayes linear approach, we follow de Finetti and take expectation as primitive.
Probabilities (where relevant) enter as derived quantities: they are the expectations of indicator functions.
Note this asymmetry: if probability is treated as the primitive quantity then one has to specify (in the continuous case) an infinite set of probabilities in order to derive a single expectation.
Belief Specification
The Bayes linear approach is subjectivist, and so in any analysis we need to specify our beliefs over all random quantities of interest.
However, as we consider expectation as primitive, we require only specifications of the expectations, variances and covariances of the random quantities of interest.
(If we have beliefs about higher orders we can include these in the analysis too!)
For example, say we are interested in predicting B = (B1, B2)^T from knowledge of D = (D1, D2)^T, which we will measure soon; then all we need to specify are E(B), E(D), Var(B), Var(D) and Cov(B,D).
Methods for Assigning Expectations
There are no strict rules for quantifying prior beliefs; every case will depend on personal judgement, the problem in question, and the availability of information
Possible techniques include:
- Studying summary statistics from samples in related populations
- Identifying one and two standard deviation intervals
- Specifying probability quantiles and/or distributions consistent with those quantiles
- Assessing a covariance by considering the variance of the difference of the corresponding quantities
- Partitioning variances and covariances into terms corresponding to uncorrelated components
The example: Numbers
Suppose we have a simple computer simulator and consider the output points F = (B1, B2, D1, D2)^T
We want to predict the computer model output at B = (B1, B2)^T from the observed values at D = (D1, D2)^T
We have a very simple prior specification:
E(F) = 0,  Var(F)_ii = 100,
and we obtain a correlation matrix via the standard Gaussian covariance function with θ = 1

       B1    B2    D1    D2
B1   1.00  0.56  0.52  0.61
B2   0.56  1.00  0.32  0.98
D1   0.52  0.32  1.00  0.28
D2   0.61  0.98  0.28  1.00
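This prior can be encoded directly. The slide does not give the point locations or the exact kernel convention, so the `gaussian_corr` form below is only one common squared-exponential convention, included for illustration; the quoted correlations (rounded to 2 d.p.) are used as-is.

```python
import numpy as np

# Prior specification from the example: E(F) = 0, Var(F)_ii = 100,
# with the correlation matrix quoted on the slide (rounded to 2 d.p.).
labels = ["B1", "B2", "D1", "D2"]
corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
E_F = np.zeros(4)          # E(F) = 0
var_F = 100.0 * corr       # Var(F)_ii = 100, off-diagonals from the correlations

# One common convention for the 'Gaussian' (squared-exponential) covariance
# function; illustrative only, since the slide gives neither the point
# locations nor the exact kernel form.
def gaussian_corr(x, xp, theta=1.0):
    return np.exp(-((x - xp) ** 2) / theta ** 2)
```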
The example: Picture
Adjusted Expectation and Variance
Belief Adjustment
We are interested in how our beliefs about B change in the light of information given by D.
We look among the collection of linear estimates, i.e. those of the form c0 + c1 D1 + c2 D2, and choose constants c0, c1, c2 to minimise the prior expected squared error loss in estimating each of B1 and B2:
E([Bi - c0 - c1 D1 - c2 D2]^2).
The choices of constants may be easily computed, and the estimators E_D(B) = (E_D(B1), E_D(B2))^T turn out to be given by:
E_D(B) = E(B) + Cov(B,D) Var(D)^†(D - E(D)),
which we refer to as the adjusted expectation for collection B given collection D.
Adjusted expectation
The adjusted expectation for collection B given collection D is
E_D(B) = E(B) + Cov(B,D) Var(D)^†(D - E(D)).
The adjusted version of B given D is the 'residual' vector
A_D(B) = B - E_D(B).
We can partition the vector B as the sum of two uncorrelated vectors:
B = E_D(B) + A_D(B).
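The adjusted expectation can be computed directly with a Moore-Penrose pseudo-inverse. A minimal numpy sketch using the example's prior (correlations rounded to 2 d.p. on the slide, so reproduced coefficients match the quoted ones only approximately):

```python
import numpy as np

# Correlation matrix from the example slide (rounded), scaled to Var(F)_ii = 100.
corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
V = 100.0 * corr
B, D = [0, 1], [2, 3]                  # index sets for B = (B1,B2), D = (D1,D2)
cov_BD = V[np.ix_(B, D)]
var_D = V[np.ix_(D, D)]

# E_D(B) = E(B) + Cov(B,D) Var(D)^+ (D - E(D)); all prior means are zero here,
# so the adjusted expectation is just a linear map applied to D.
coef = cov_BD @ np.linalg.pinv(var_D)  # rows: coefficients of (D1, D2)
# The slide quotes roughly [[0.381, 0.507], [0.051, 0.961]].
```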
Adjusted variance
We partition the variance matrix of B into two variance components:
Var(B) = Var(E_D(B)) + Var(A_D(B))
       = RVar_D(B) + Var_D(B)
These are the resolved variance matrix and the adjusted variance matrix (i.e. explained and residual variation).
The variance matrices are calculated as
Var_D(B) = Var(B) - Cov(B,D) Var(D)^† Cov(D,B),
RVar_D(B) = Cov(B,D) Var(D)^† Cov(D,B).
Our variance matrices must be non-negative definite.
We use the Moore-Penrose generalized inverse (A†) to allowfor degeneracy.
Resolution
We summarize the expected effect of the data D for the adjustment of B by a scale-free measure which we call the resolution of B induced by D,
R_D(B) = 1 - Var_D(B)/Var(B) = Var(E_D(B))/Var(B).
The resolution lies between 0 and 1, and in general, small (large) resolutions imply that the information has little (much) linear predictive value, given the prior specification.
Similar in spirit to an R^2 measure for the adjustment.
Example: The Adjustment
We can calculate our adjusted expectations for points B given D algebraically as:
E_D(B1) = 0.381 D1 + 0.507 D2 + 0
E_D(B2) = 0.051 D1 + 0.961 D2 + 0
We see that B2 is mainly determined by the value of D2 – unsurprising given how close these points are!
We can also calculate the adjusted variance and resolutions

Var_D(B) = [ 49.06  -5.83 ]      R_D(B) = ( 0.509 )
           [ -5.83   4.64 ],              ( 0.954 )

We can see that we resolve much of the uncertainty about B2
Example: Variance Partition
We can decompose the prior variance into its resolved and unresolved portions:
Var(B) = RVar_D(B) + Var_D(B)

[ 100.00  55.71 ]   [ 50.94  61.54 ]   [ 49.06  -5.83 ]
[  55.71 100.00 ] = [ 61.54  95.36 ] + [ -5.83   4.64 ]

Which is nice!
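The variance partition and resolutions can be reproduced numerically. Values differ from the slide's in the last decimal or so because the quoted correlations are rounded, but the partition identity holds exactly by construction:

```python
import numpy as np

# Prior covariance from the example (correlations rounded to 2 d.p.).
corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
V = 100.0 * corr
B, D = [0, 1], [2, 3]
var_B = V[np.ix_(B, B)]
cov_BD = V[np.ix_(B, D)]
var_D = V[np.ix_(D, D)]

# RVar_D(B) = Cov(B,D) Var(D)^+ Cov(D,B);  Var_D(B) = Var(B) - RVar_D(B)
rvar = cov_BD @ np.linalg.pinv(var_D) @ cov_BD.T
var_adj = var_B - rvar
resolution = 1.0 - np.diag(var_adj) / np.diag(var_B)
```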
Interpretations of belief adjustment
An approximation
- If we're fully Bayesian, then adjusted expectation is a tractable approximation to the full Bayes conditional expectation
- Adjusted variance is then an easily-computable upper bound on the full Bayes preposterior risk, under quadratic loss
An estimator
- E_D(B) is an 'estimator' of the value of B, which combines the data with simple aspects of our prior beliefs in a plausible manner
- Adjusted variance is then the mean-squared error of the estimator E_D(B)
A primitive
- Adjusted expectation is a primitive quantification of further aspects of our beliefs about B having 'accounted for' D
- Adjusted variance is also a primitive, but applied to the 'residual variance' in B having removed the effects of D
Adjusted and Conditional Expectations
The conditional expectation of B|D is the value you would specify under the penalty L_C = Σi c Di [B - E(B|Di)]^2
If D is a partition, so Di ∈ {0,1} and Σi Di = 1, then the adjusted expectation minimises L_A = Σi c Di [B - xi]^2.
So we choose xi to be the conditional expectation, and
E_D(B) = Σi E(B|Di) Di
So when D is a partition, the adjusted and conditional expectations are identical
Adjusted expectation does not require D to be a partition, and so can be considered as a generalization of conditional expectation
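The partition result can be checked numerically with a made-up discrete example: the cell probabilities `p` and conditional expectations `b_given` below are arbitrary illustrative choices, and all moments are computed exactly from them.

```python
import numpy as np

# Hypothetical three-cell partition: D = (D1,D2,D3) are the cell indicators.
p = np.array([0.2, 0.5, 0.3])          # P(cell i)
b_given = np.array([1.0, 4.0, -2.0])   # E(B | cell i), chosen arbitrarily

E_D = p                                 # E(Di) = pi
E_B = p @ b_given
var_D = np.diag(p) - np.outer(p, p)     # Cov(Di,Dj) = pi*(i==j) - pi*pj (singular)
cov_BD = b_given * p - E_B * p          # Cov(B,Di) = E(B|Di)*pi - E(B)*pi

# Adjusted expectation evaluated at D = e_j (i.e. cell j occurred); the
# Moore-Penrose inverse handles the degeneracy of Var(D).
coef = cov_BD @ np.linalg.pinv(var_D)
adj = np.array([E_B + coef @ (np.eye(3)[j] - E_D) for j in range(3)])
# adj[j] recovers the conditional expectation E(B | cell j).
```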
Extension to linear combinations
Let ⟨B⟩ be the set of all linear combinations of B
If X = h^T B ∈ ⟨B⟩, then we can write
E(X) = h^T E(B),  Var(X) = h^T Var(B) h.
So by specifying E(B) and Var(B) we have implicitly specified expectations and variances for all elements of ⟨B⟩
Similarly, by calculating E_D(B) and Var_D(B), we have implicitly calculated the adjustment for all X ∈ ⟨B⟩
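This closure property is easy to verify numerically: adjusting X = h^T B from its own implied moments agrees with pushing h through the adjustment of the whole collection. A sketch with an arbitrary h and data value (both generated at random purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
V = 100.0 * corr
B, D = [0, 1], [2, 3]
var_B = V[np.ix_(B, B)]
cov_BD = V[np.ix_(B, D)]
var_D_inv = np.linalg.pinv(V[np.ix_(D, D)])

coef = cov_BD @ var_D_inv                 # E_D(B) = coef @ D (zero prior means)
var_adj = var_B - coef @ cov_BD.T         # Var_D(B)

h = rng.normal(size=2)                    # an arbitrary X = h^T B in <B>
d = rng.normal(size=2)                    # an arbitrary observed value of D

# Adjust X = h^T B directly from its own (implied) moments ...
cov_XD = h @ cov_BD
EX_adj = cov_XD @ var_D_inv @ d
varX_adj = h @ var_B @ h - cov_XD @ var_D_inv @ cov_XD

# ... and by pushing h through the adjustment of the collection B.
EX_via_B = h @ (coef @ d)
varX_via_B = h @ var_adj @ h
```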
The observed adjustment
Given the observed value d of D, we can calculate the observed adjusted expectation
E_d(B) = E(B) + Cov(B,D) Var(D)^†(d - E(D)).
For our example, we observe d = (-8, 10) and the corresponding observed adjusted expectations are:
E_d(B) = (2.02, 9.20)^T
Having observed D = d, we notice that our adjusted expectations have both increased
B1 is relatively far from D and so only moves a little, whereas B2 is close to D2 and so its expectation shifts substantially towards the value d2 = 10
Diagnostics
Data and Diagnostics
Once data have been observed (first for D and then for B) we can perform diagnostics.
The Bayes linear methodology has a rich variety of diagnostic tools available (more than in a fully Bayesian analysis).
We can perform diagnostics on individual random quantities, or on collections of random quantities.
Three important versions are:
- Prior Diagnostics
- Adjustment Diagnostics
- Final Observation Diagnostics
Prior Diagnostics
Each prior belief statement that we make describes our beliefs about some random quantity.
If we observe that quantity, we may compare what we expect to happen with what actually happens.
Once we observe the values of D = d, we can check whether the data is consistent with our prior specifications.
For a single random quantity, we can calculate the standardized change and the discrepancy:
S(di) = (di - E(Di)) / sqrt(Var(Di)),   Dis(di) = (di - E(Di))^2 / Var(Di) = S(di)^2
E(S(di)) = 0 and Var(S(di)) = 1, so if we observe |S(di)| greater than about 3 this suggests an inconsistency.
Discrepancy Ratio
For the entire collection, the natural counterpart of the discrepancy is the Mahalanobis distance:
Dis(d) = (d - E(D))^T Var(D)^†(d - E(D)).
The prior expected value of Dis(d) is given by E(Dis(d)) = rk{Var(D)}
NB: if we pretend D is Normal, then Dis(d) would be χ²
We can then normalise the discrepancy, to obtain the discrepancy ratio for d
Dr(d) = Dis(d) / rk{Var(D)},
which has prior expectation E(Dr(d)) = 1.
Large Dr(d) will of course also suggest inconsistencies.
Example: Prior Diagnostics
Comparing our observed values with our priors for those values, we obtain
S(d) = (-0.8, 1.0)^T,   Dr(d) = 1.1
So d1 is smaller than expected, and d2 is larger. But not by much.
Unsurprising, as observing d = (-8, 10) when we have a prior standard deviation of 10 is perfectly reasonable
If we assumed that S(di) is unimodal, then approximate 95% bounds are given by ±3σ – so this is clearly ok
Considering the collection, Dr(d) ≈ 1 so the observed values are not inconsistent with our prior beliefs
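These prior diagnostics take only a few lines to reproduce:

```python
import numpy as np

corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
var_D = 100.0 * corr[np.ix_([2, 3], [2, 3])]
d = np.array([-8.0, 10.0])
E_D = np.zeros(2)                            # prior expectations

S = (d - E_D) / np.sqrt(np.diag(var_D))      # standardized changes S(d_i)
dis = (d - E_D) @ np.linalg.pinv(var_D) @ (d - E_D)   # discrepancy Dis(d)
dr = dis / np.linalg.matrix_rank(var_D)               # discrepancy ratio Dr(d)
```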
Adjustment Diagnostics
Having obtained the observed adjusted expectation, we may now check how much our beliefs have been affected by the data
We calculate the univariate standardized adjustments (and discrepancies):
S_d(Bi) = S(E_d(Bi)) = (E_d(Bi) - E(E_D(Bi))) / sqrt(Var(E_D(Bi))) = (E_d(Bi) - E(Bi)) / sqrt(RVar_D(Bi))
The adjustment discrepancy for a collection is given by:
Dis_d(B) = (E_d(B) - E(B))^T RVar_D(B)^†(E_d(B) - E(B)).
Again E(S(E_d(Bi))) = 0, Var(S(E_d(Bi))) = 1 and E(Dis_d(B)) = rk{RVar_D(B)}, so large values warrant further investigation.
Size and Size Ratio
We may also now check how different our observed adjusted expectation is from our prior expectations
For this, we calculate the size of the adjustment of B by D
Size_d(Bi) = (E_d(Bi) - E(Bi))^2 / Var(Bi)
Similarly, the size of the adjustment for the collection B by D = d is
Size_d(B) = (E_d(B) - E(B))^T Var(B)^†(E_d(B) - E(B)),
which has prior expectation RU_d(B).
Example: Adjustment Diagnostics
Calculating the standardised adjustments we obtain:
S_d(B) = (0.28, 0.94)^T,   Dr_d(B) = 1.13
So our beliefs about B1 appear only slightly affected by the data, whereas our beliefs about B2 are more influenced due to its strong correlation to D2
The size diagnostics are given by:
Size_d(Bi) = (0.04, 0.85)^T,   Size_d(B) = 0.49
Since E(Size_d(B)) = 1.13 (= RU_d(B)), this suggests our adjusted beliefs are closer to our priors than expected – perhaps indicating that we may have over-stated our variance
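The standardized adjustments and the univariate sizes can be reproduced numerically (the collection-level size depends on the exact unrounded covariances, so it is not checked here):

```python
import numpy as np

corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
V = 100.0 * corr
B, D = [0, 1], [2, 3]
var_B = V[np.ix_(B, B)]
cov_BD = V[np.ix_(B, D)]
var_D = V[np.ix_(D, D)]

coef = cov_BD @ np.linalg.pinv(var_D)
rvar = coef @ cov_BD.T                       # RVar_D(B)
d = np.array([-8.0, 10.0])
E_adj = coef @ d                             # E_d(B) - E(B), since E(B) = 0

Sd = E_adj / np.sqrt(np.diag(rvar))          # standardized adjustments S_d(B_i)
dis_adj = E_adj @ np.linalg.pinv(rvar) @ E_adj
dr_adj = dis_adj / np.linalg.matrix_rank(rvar)
size = E_adj ** 2 / np.diag(var_B)           # univariate sizes Size_d(B_i)
```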
Final Observation Diagnostics
Eventually, we may observe the values B = b, and in addition to checking how these deviate from the prior expected values E(B), we should also check the change from adjusted expectation E_d(B) to actual observation b.
For a single random quantity, the appropriate standardized change and discrepancy are
S_d(bi) = S(A_d(bi)) = (bi - E_d(Bi)) / sqrt(Var_D(Bi)),   Dis_d(bi) = S_d(bi)^2,
and for the collection we have:
Dis_d(b) = (b - E_d(B))^T Var_D(B)^†(b - E_d(B)).
Such checks tell us whether the predictions were roughly within the tolerances suggested by our prior variance specifications.
Example: Final Observation Diagnostics
We actually observe B to be b = (1, 9). Comparing this to our adjusted expectations given D, we obtain
S_d(b) = (-0.15, -0.09)^T,   Dr_d(b) = 0.02
So our adjusted expectation is suspiciously close to the observed values of B, and Dr_d(b) is suspiciously small
Perhaps this could indicate we've overstated our variance or mis-specified our correlation
Or perhaps it's just a great prediction! Or an artificially good simulated example!
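The final observation diagnostics follow the same pattern. With the rounded correlations the exact numbers drift slightly from the slide's (-0.15, -0.09, 0.02), but the qualitative conclusion (a suspiciously small discrepancy ratio) is reproduced:

```python
import numpy as np

corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
V = 100.0 * corr
B, D = [0, 1], [2, 3]
var_B = V[np.ix_(B, B)]
cov_BD = V[np.ix_(B, D)]
var_D = V[np.ix_(D, D)]

coef = cov_BD @ np.linalg.pinv(var_D)
var_adj = var_B - coef @ cov_BD.T            # Var_D(B)
d = np.array([-8.0, 10.0])
b = np.array([1.0, 9.0])                     # the eventual observation of B
resid = b - coef @ d                         # b - E_d(B)

Sd_b = resid / np.sqrt(np.diag(var_adj))     # standardized changes S_d(b_i)
dr_b = resid @ np.linalg.pinv(var_adj) @ resid / np.linalg.matrix_rank(var_adj)
```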
The observed adjustment
[Figure: diagnostics for differing choices of E[D], with Dis_d(b) and Dis(d) plotted against the prior expectation of D]
Diagnostics: morals
Piecemeal diagnostic analysis of individual quantities is not sufficient. We have to examine diagnostics for collections.
Each part of the adjustment process can be diagnostically scrutinized. We have shown diagnostic measures for:
- the raw data;
- the difference between adjusted expectations (estimates) and prior expectations, relative to variance explained and prior variance;
- the difference between adjusted versions (residuals) and prior expectations, relative to variance remaining.
If we had found a problem, how might we have avoided it, or at least detected it sooner?
Canonical Structure
Canonical analysis
Our belief specification for B and our adjustment by D implies specifications and adjustments for all linear combinations in ⟨B⟩.
We can explore the (possibly complex) changes in beliefs about ⟨B⟩ induced by the adjustment via a canonical analysis
A key component of the canonical analysis is the resolution transform matrix, defined as
T_{B:D} = Var(B)^† Cov(B,D) Var(D)^† Cov(D,B).
T_{B:D} has the property that Var(B) T_{B:D} = RVar_D(B)
The eigenstructure of T_{B:D} summarises all the effects of belief adjustment
Let the normed right eigenvectors of T_{B:D} be v1, ..., v_rB, ordered by eigenvalues 1 ≥ λ1 ≥ λ2 ≥ ... ≥ λ_rB ≥ 0 and scaled as vi^T Var(B) vi = 1
Canonical directions
We define the ith canonical direction as
Yi = vi^T (B - E(B))
The canonical directions have the following properties
E(Yi) = 0,  Var(Yi) = 1,  Corr(Yi, Yj) = 0 (i ≠ j)
RVar_D(Yi) = λi,  Var_D(Yi) = 1 - λi
So the collection {Y1, Y2, ...} forms a mutually uncorrelated 'grid' of directions over ⟨B⟩, summarizing the effects of the adjustment.
Y1 is the quantity we learn most about. Y2 is the quantity we learn next most about, given that it is uncorrelated with Y1. Y_rk{B} is the quantity we learn least about.
Relationship to canonical correlation analysis (and PCA)
Canonical properties and system resolution
Each X ∈ ⟨B⟩ can be expressed using the canonical structure as
X - E(X) = Σi Cov(X, Yi) Yi,
and RVar_D(X) = Σi λi (Corr(X, Yi))^2
We can use this structure to express the resolved uncertainty for the entire collection ⟨B⟩ given adjustment by D via the resolved uncertainty and the system resolution
RU_D(B) = Σi λi,   R_D(B) = (1/rk{B}) Σi λi
R_D(B) is a scalar summary of the effectiveness of the adjustment by D for the entire collection ⟨B⟩
Bearing 180, Mark 0
The matrix T_{B:D} fully summarises all aspects of the unobserved adjustment
The observed adjustment can be summarised in a single vector – the bearing
The bearing for the adjustment of B by D = d is a random quantity in ⟨B⟩ which maximises Size_d(X) and is given by
Z_d(B) = [E_d(B) - E(B)]^T Var(B)^†[B - E(B)].
The bearing expresses both the direction and the magnitude of the change between prior and adjusted beliefs, relative to the prior covariance specification.
The biggest possible expected squared change in expectation, relative to prior variance, is for the linear combination given by Z_d(B)
Example: Canonical gubbins
Investigating the canonical structure of the unobserved adjustment yields:

T_{B:D} = [ 0.24  0.12 ]       RU_D(B) = 1.13
          [ 0.48  0.89 ],

λ = (0.97, 0.16)^T,   Y1 = 0.17 B1 + 0.98 B2,   Y2 = 0.83 B1 - 0.55 B2

So we learn most about Y1 (and so particularly about B2); we expect to resolve 97% of the uncertainty in this direction
We learn comparatively little in direction Y2
Having seen the data, we can obtain the bearing
Z_d(B) = -0.045 B1 + 0.117 B2
Which is nice!
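The resolution transform, its eigenvalues and the bearing can all be reproduced from the (rounded) prior specification:

```python
import numpy as np

corr = np.array([
    [1.00, 0.56, 0.52, 0.61],
    [0.56, 1.00, 0.32, 0.98],
    [0.52, 0.32, 1.00, 0.28],
    [0.61, 0.98, 0.28, 1.00],
])
V = 100.0 * corr
B, D = [0, 1], [2, 3]
var_B = V[np.ix_(B, B)]
cov_BD = V[np.ix_(B, D)]
var_D = V[np.ix_(D, D)]

rvar = cov_BD @ np.linalg.pinv(var_D) @ cov_BD.T
T = np.linalg.pinv(var_B) @ rvar                  # resolution transform T_{B:D}
lam = np.sort(np.linalg.eigvals(T).real)[::-1]    # canonical resolutions

d = np.array([-8.0, 10.0])
E_adj = cov_BD @ np.linalg.pinv(var_D) @ d
z = np.linalg.pinv(var_B) @ E_adj                 # coefficients of the bearing Z_d(B)
```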
Partial Analysis
I need more data!
- Suppose we have already adjusted our beliefs about B given data D
- Now suppose we get even more data F; how should we further adjust our beliefs about B?
What does this bit do?
- Suppose we have already adjusted our beliefs about B given data H = D ∪ F
- What were the individual effects of adjusting by D or F?
This requires a partial analysis, where we consider the effects of subsets of the data on our beliefs
Partial adjustments
By adjusting beliefs sequentially, we can separate and scrutinize the adjustments at each stage
In order to separate the effects on our beliefs of different sub-collections, we evaluate partial adjustments representing the change in adjustment as we accumulate data.
Suppose we intend to adjust our beliefs about B by observations on D and F
We adjust B by (D ∪ F) but separate the effects of the subsets by adjusting B in stages, first by D, then adding F (or vice versa)
Separating things out
How do we separate the effects of D and F on B?
If D ⊥⊥ F, then adjusted expectations are additive, so
E_{D∪F}(B - E(B)) = E_D(B - E(B)) + E_F(B - E(B))
If D and F are correlated, then we obtain a similar expression by removing the 'common variability' between F and D.
For any D, F, the vectors D and A_D(F) = F - E_D(F) are uncorrelated.
Also the collection of linear combinations ⟨D ∪ F⟩ is the same as ⟨D ∪ A_D(F)⟩
So, for any D, F
E_{D∪F}(B - E(B)) = E_D(B - E(B)) + E_{A_D(F)}(B - E(B))
The partial adjustment
The partial adjustment of B by F given D, denoted E_{[F/D]}(B), is defined by
E_{D∪F}(B) = E_D(B) + E_{[F/D]}(B)
We can partition the variance in several ways:
Var(B) = RVar_D(B) + Var_D(B)
       = RVar_D(B) + RVar_{[F/D]}(B) + Var_{D∪F}(B)
       = RVar_{D∪F}(B) + Var_{D∪F}(B)
The partial resolved variance matrix of B by F given D is
RVar_{[F/D]}(B) = Var(E_{[F/D]}(B))
Diagnostics and Path correlation
Every summary and diagnostic which we have already discussed can be calculated for the partial adjustment
There is an extra diagnostic available for partial adjustments
Every belief adjustment changes our beliefs. This change is encapsulated in the bearing for that adjustment.
If we do multiple partial adjustments, these changes may reinforce or contradict one another
We can assess this by the path correlation
PC(d, [f/d]) = Corr(Z_d(B), Z_{[f/d]}(B)).
If this is near +1, then we may view the two collections as complementary
If this is near -1, then the two collections are giving contradictory messages
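The partial adjustment identity and the path correlation can be sketched numerically. Everything below is hypothetical: a third collection F and a randomly generated joint covariance stand in for real data, purely to check that the two-stage adjustment reproduces the pooled one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint covariance over B (2 quantities), D (2), F (2),
# built as A A^T so it is positive definite; prior means taken as zero.
A = rng.normal(size=(6, 6))
V = A @ A.T
B, D, F = [0, 1], [2, 3], [4, 5]
DF = D + F
d = rng.normal(size=2)                       # made-up observed values of D
f = rng.normal(size=2)                       # made-up observed values of F

def cov(i, j):
    return V[np.ix_(i, j)]

pinv = np.linalg.pinv

# One-step adjustment by the pooled data (D ∪ F):
E_DF = cov(B, DF) @ pinv(cov(DF, DF)) @ np.concatenate([d, f])

# Two-stage: adjust by D, then by the 'new part' of F, A_D(F) = F - E_D(F).
E_D_B = cov(B, D) @ pinv(cov(D, D)) @ d
resid_f = f - cov(F, D) @ pinv(cov(D, D)) @ d           # observed A_D(F)
cov_B_AF = cov(B, F) - cov(B, D) @ pinv(cov(D, D)) @ cov(D, F)
var_AF = cov(F, F) - cov(F, D) @ pinv(cov(D, D)) @ cov(D, F)
E_partial = cov_B_AF @ pinv(var_AF) @ resid_f           # E_[F/D](B)

# Path correlation between the bearings Z_d(B) and Z_[f/d](B):
z1 = pinv(cov(B, B)) @ E_D_B
z2 = pinv(cov(B, B)) @ E_partial
pc = (z1 @ cov(B, B) @ z2) / np.sqrt(
    (z1 @ cov(B, B) @ z1) * (z2 @ cov(B, B) @ z2))
```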
The end
We have seen:
How we represent our beliefs – using expectation as primitive
How we would update our beliefs – the BL adjustment
How we can investigate potential problems in our belief specification – diagnostics
How we can understand how our beliefs are affected by the data – canonical analysis
How we would incorporate additional information – partial analysis