Probabilistic Reasoning Systems
Chapter 15
• Capturing uncertain knowledge
• Probabilistic inference
CS 471/598 by H. Liu 2
Knowledge representation
The joint probability distribution:
• can answer any question about the domain
• can become intractably large as the number of random variables grows
• can be difficult to specify, since probabilities are needed for atomic events
Conditional independence can simplify the probability assignments.
A data structure - a belief network - that represents the dependencies between variables and gives a concise specification of the joint distribution.
A belief network is a graph in which:
• a set of nodes represents random variables
• a set of directed links connects pairs of nodes
• each node has a conditional probability table (CPT) that quantifies the effects the parents have on the node
• the graph has no directed cycles (it is a DAG)
It is usually much easier for an expert to decide on conditional dependence relationships than to specify the probabilities themselves.
Once the network is specified, we need only specify conditional probabilities for the nodes that participate in direct dependencies, and use those to compute any other probabilities.
An example: the burglary-alarm-call network (Fig 15.1)
The topology of the network can be thought of as the general structure of the causal process.
Many details (Mary listening to loud music, or the phone ringing and confusing John) are summarized in the uncertainty associated with the links from Alarm to JohnCalls and MaryCalls.
The probabilities actually summarize a potentially infinite set of possible circumstances
Specifying the CPT for each node (p. 438):
• A conditioning case is a possible combination of values for the parent nodes (2^n cases for n Boolean parents)
• Each row in a CPT must sum to 1
• A node with no parents has only one row (the prior probabilities)
Fig 15.2 shows the complete network for the burglary example.
The semantics of belief networks
Two equivalent views of a belief network:
• Representing the joint probability distribution (JPD) - helpful in constructing networks
• Representing conditional independence relations - helpful in designing inference procedures
1. Representing JPD - constructing a BN
A belief network provides a complete description of the domain. Every entry in the JPD can be calculated from the info in the network.
A generic entry in the joint is the probability of a conjunction of particular assignments to each variable.
P(x1,…,xn) = ∏_{i=1}^{n} P(xi | Parents(xi))   (15.1)
What is the probability of the event J ∧ M ∧ A ∧ ¬B ∧ ¬E?
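The question above can be answered directly from Eq. 15.1: multiply one CPT entry per variable. A minimal sketch, assuming the standard CPT values from the textbook's burglary network (Fig 15.2):

```python
# Joint entry P(J, M, A, ~B, ~E) via Eq. 15.1, using the CPT values
# of the textbook's burglary network (Fig 15.2).
P_B = 0.001          # P(Burglary)
P_E = 0.002          # P(Earthquake)
P_A_nB_nE = 0.001    # P(Alarm | ~Burglary, ~Earthquake)
P_J_A = 0.90         # P(JohnCalls | Alarm)
P_M_A = 0.70         # P(MaryCalls | Alarm)

# One factor per variable, each conditioned only on its parents:
p = P_J_A * P_M_A * P_A_nB_nE * (1 - P_B) * (1 - P_E)
print(round(p, 8))   # ~0.00062811
```

Note that the factorization lets us read each factor straight out of a CPT; no entry of the full joint table ever has to be stored.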
A method for constructing belief networks
Eq. 15.1 defines what a given BN means, and it implies certain conditional independence relationships that can be used to guide the construction of the network.
By the chain rule, P(x1,…,xn) = P(xn | xn-1,…,x1) P(xn-1,…,x1) = ∏_{i=1}^{n} P(xi | xi-1,…,x1)
This agrees with Eq. 15.1 provided that
P(Xi | Xi-1,…,X1) = P(Xi | Parents(Xi))   (15.2)
The BN is a correct representation of the domain only if each node is conditionally independent of its predecessors in the node ordering, given its parents.
P(M|J,A,E,B)=P(M|A)
Incremental network construction
• Choose relevant variables describing the domain
• Choose an ordering for the variables
• While there are variables left:
  • Pick a variable and add a node for it to the network
  • Set its parents to some minimal set of nodes already in the net such that Eq. 15.2 is satisfied
  • Define the CPT for the variable
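The loop above can be sketched in code. This is a minimal sketch that records only the structure (each node's parents), using the burglary example's variables in a causal ordering; the minimal parent sets shown are the ones that satisfy Eq. 15.2 for that ordering:

```python
# Incremental construction: add nodes in order, each with a minimal
# parent set drawn from nodes already in the network (Eq. 15.2).
order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
minimal_parents = {
    "Burglary": [], "Earthquake": [],          # root causes: no parents
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

network = {}                                   # node -> list of parents
for var in order:
    # Parents must already be in the net, so no directed cycle can form.
    assert all(p in network for p in minimal_parents[var])
    network[var] = minimal_parents[var]
print(network["Alarm"])                        # ['Burglary', 'Earthquake']
```

Because every parent precedes its child in the ordering, the resulting graph is guaranteed to be a DAG.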
Compactness
A belief network can often be far more compact than the full joint.
In a locally structured system, each sub-component interacts directly with only a bounded number of other components.
Local structure is usually associated with linear rather than exponential growth in complexity.
With 20 nodes, if a node is directly influenced by 5 nodes, what’s the difference between BN & joint?
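The arithmetic behind the question above: with n = 20 Boolean nodes, each with at most k = 5 parents, the network needs n·2^k conditional probabilities, versus 2^n − 1 independent entries for the full joint:

```python
n, k = 20, 5                    # 20 Boolean nodes, at most 5 parents each
bn_numbers = n * 2 ** k         # one CPT row per parent-value combination
joint_numbers = 2 ** n - 1      # full joint (minus 1: entries sum to 1)
print(bn_numbers, joint_numbers)   # 640 1048575
```

So the belief network needs 640 numbers where the full joint needs over a million: linear in n (for bounded k) rather than exponential.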
Node ordering
The correct order to add nodes is to add the “root causes” first, then the variables they influence, and so on until we reach the leaves that have no direct causal influence on the other variables.
What happens if we happen to choose the wrong order? Fig 15.3 shows an example.
If we stick to a true causal model, we end up having to specify fewer numbers, and the numbers will often be easier to come up with.
Representation of CPTs
Given canonical distributions, the complete table can be specified by naming the distribution with some parameters.
A deterministic node has its value specified exactly by the values of its parents.
Uncertain relationships can often be characterized by “noisy” logical relationships.
An example on page 444 (the noisy-OR relation): the probability that the output node is False is just the product of the noise parameters for all the input nodes that are True.
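A minimal sketch of that noisy-OR rule, assuming the textbook's Fever example on p. 444 with noise parameters 0.6, 0.2, and 0.1 for Cold, Flu, and Malaria:

```python
from math import prod

def noisy_or_false(noise, true_parents):
    """P(effect = False) = product of the noise parameters of the
    parents that are True (noisy-OR)."""
    return prod(noise[p] for p in true_parents)

# Noise parameters from the textbook's Fever example (p. 444).
noise = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

p_no_fever = noisy_or_false(noise, ["Cold", "Flu"])   # Malaria is False
print(round(p_no_fever, 2))   # 0.12, so P(Fever | Cold, Flu, ~Malaria) = 0.88
```

This is the payoff of a canonical distribution: the whole 2^3-row CPT for Fever is determined by just three parameters.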
2. Conditional independence relations
In designing inference algorithms, we need to know whether more general conditional independences hold.
Given a network, can we know if a set of nodes X is independent of another set Y, given a set of evidence nodes E? It boils down to d-separation.
If every undirected path from a node in X to a node in Y is d-separated by E, then X and Y are conditionally independent given E.
E d-separates X and Y if every undirected path from a node in X to a node in Y is blocked given E.
Three conditions make it possible for a path to be blocked given E (Fig 15.4).
Fig 15.5 shows examples of the three conditions.
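The three blocking conditions can be sketched as a small predicate. This is an illustrative sketch, not the book's algorithm: for a node Z on an undirected path, the path is blocked at Z depending on whether the connection is a chain (X→Z→Y), a fork (X←Z→Y), or a collider (X→Z←Y); the caller supplies Z's descendants for the collider case:

```python
def blocked(kind, z, evidence, descendants_of_z=frozenset()):
    """Is a path blocked at node z given evidence set E?
    kind: 'chain' (X->Z->Y), 'fork' (X<-Z->Y), or 'collider' (X->Z<-Y)."""
    if kind in ("chain", "fork"):
        return z in evidence                  # blocked iff Z is observed
    if kind == "collider":
        # blocked iff neither Z nor any of its descendants is observed
        return z not in evidence and evidence.isdisjoint(descendants_of_z)
    raise ValueError(kind)

# Burglary network: Burglary -> Alarm <- Earthquake is a collider at Alarm.
print(blocked("collider", "Alarm", {"JohnCalls"},
              {"JohnCalls", "MaryCalls"}))
# -> False: observing JohnCalls (a descendant of Alarm) unblocks the path,
# so Burglary and Earthquake become dependent (intercausal reasoning).
```

X and Y are d-separated by E only if every undirected path between them is blocked at some node in this sense.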
Inference in belief networks
The basic task is to compute P(Query | Evidence).
The nature of probabilistic inferences (Fig 15.6):
• Diagnostic (from effects to causes): P(Burglary | JohnCalls)
• Causal (from causes to effects): P(JohnCalls | Burglary)
• Intercausal (between causes of a common effect): P(Burglary | Alarm ∧ Earthquake) vs. P(Burglary | Alarm)
• Mixed (combining two or more of the above): P(Alarm | JohnCalls ∧ ¬Earthquake), P(Burglary | JohnCalls ∧ ¬Earthquake)
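Any of these queries can be answered by brute-force enumeration over Eq. 15.1: sum the joint over the hidden variables and normalize. A sketch assuming the standard CPT values of the burglary network (Fig 15.2), computing the diagnostic query P(Burglary | JohnCalls ∧ MaryCalls):

```python
from itertools import product

def p_joint(b, e, a, j, m):
    """One entry of the joint via Eq. 15.1 (textbook CPT values, Fig 15.2)."""
    pb = 0.001 if b else 0.999
    pe = 0.002 if e else 0.998
    pa_true = {(True, True): 0.95, (True, False): 0.94,
               (False, True): 0.29, (False, False): 0.001}[(b, e)]
    pa = pa_true if a else 1 - pa_true
    pj = (0.90 if a else 0.05) if j else (0.10 if a else 0.95)
    pm = (0.70 if a else 0.01) if m else (0.30 if a else 0.99)
    return pb * pe * pa * pj * pm

def query_burglary(j, m):
    """P(Burglary | JohnCalls=j, MaryCalls=m) by enumeration."""
    num = {True: 0.0, False: 0.0}
    for b, e, a in product([True, False], repeat=3):  # sum out hidden vars
        num[b] += p_joint(b, e, a, j, m)
    return num[True] / (num[True] + num[False])       # normalize

print(round(query_burglary(True, True), 3))  # ~0.284
```

Enumeration is exponential in the number of hidden variables, which is why the polytree algorithms below matter; here it just makes the semantics concrete.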
Answering queries
Singly connected networks - polytrees (Fig 15.6)
Causal and evidential support (Fig 15.7)
The general strategy:
• express P(X|E) in terms of E+ and E-
• compute the contribution of E+
• compute the contribution of E-
Two basic functions (Fig 15.8, p. 452):
• support-except(X,V)
• evidence-except(X,V)
Inference in multiply connected belief networks
A multiply connected graph is one in which two nodes are connected by more than one path.
An example: Fig 15.9
Three basic classes of algorithms for evaluating multiply connected networks:
• Clustering (Fig 15.10)
• Conditioning (Fig 15.11)
• Stochastic simulation - logic sampling, or likelihood weighting
Knowledge engineering for uncertain reasoning
• Decide what to talk about
• Decide on a vocabulary of random variables
• Encode general knowledge about the dependencies
• Encode a description of the specific problem instance
• Pose queries to the inference procedure and get answers
Case study
The PATHFINDER system (p. 457):
• PATHFINDER I - pure logical reasoning
• PATHFINDER II - certainty factors, Dempster-Shafer theory, a simplified Bayesian model (independence assumption)
• PATHFINDER III - the simplified Bayesian model, paying attention to low-probability events
• PATHFINDER IV - a belief network to represent the dependencies that couldn't be handled in the simplified Bayesian model
Other approaches to uncertain reasoning
Different generations of expert systems:
• Strict logical reasoning (ignores uncertainty)
• Probabilistic techniques using the full joint distribution
• Default reasoning - a conclusion is believed until a better reason is found to believe something else
• Rules with certainty factors
• Handling ignorance - Dempster-Shafer theory
• Vagueness - something is sort of true (fuzzy logic)
Probability makes the same ontological commitment as logic: the event is true or false
Default reasoning
The conclusion that a car has four wheels is reached by default.
New evidence can cause the conclusion to be retracted - unlike FOL, which is strictly monotonic.
Representative approaches are default logic, nonmonotonic logic, and circumscription.
There are problematic issues (see pp. 459-460).
Rule-based methods
Logical reasoning systems have properties such as:
• Monotonicity
• Locality
• Detachment
• Truth-functionality
These properties give obvious computational advantages, but they are inappropriate for uncertain reasoning.
Summary
Reasoning properly:
• In FOL, it means that conclusions follow from premises
• In probability, it means having beliefs that allow an agent to act rationally
Conditional independence information is vital.
A belief network is a complete representation of the JPD, but often exponentially smaller in size.
Belief networks can reason causally, diagnostically, intercausally, or by combining two or more of the three.
For polytrees, the computation time is linear in the network size.