CAP 4621 ARTIFICIAL INTELLIGENCE Reminder: [CAP4621] in your email subject line “Probabilistic Reasoning” Eakta Jain [email protected] With slides from: Stuart Russell, Hwee Tou Ng, Dan Klein, Pieter Abbeel
Transcript
Page 1:

CAP 4621 ARTIFICIAL INTELLIGENCE
Reminder: [CAP4621] in your email subject line

“Probabilistic Reasoning”

Eakta Jain

[email protected]

With slides from: Stuart Russell, Hwee Tou Ng, Dan Klein, Pieter Abbeel

Page 2:

Wake up! (2 minutes)

Page 3:

Module 1
• Objectives

– Interpret a given Bayesian network with respect to independence and conditional independence

– Construct a Bayesian network given a problem statement

Page 4:

Recap

• Probability models are a representation of our uncertain knowledge about the world

Page 5:

Recap

• Probability models are a representation of our uncertain knowledge about the world

• Takeaways from previous week

Page 6:

Recap

• Probability models are a representation of our uncertain knowledge about the world

• Takeaways from previous week
  – Examples we discussed

• Joint probability table of Weather, Toothache, Cavity, Catch

• Joint probability table of symptoms and diseases for an intelligent medical assistant

Page 7:

Recap

• Probability models are a representation of our uncertain knowledge about the world

• Takeaways from previous week
• Independence and conditional independence simplify our probabilistic representation of the world

Page 8:

Recap

• Probability models are a representation of our uncertain knowledge about the world

• Takeaways from previous week
• Independence and conditional independence simplify our probabilistic representation of the world
  – Example: Weather is independent of (Toothache, Catch, Cavity)

Page 9:

Recap

• Probability models are a representation of our uncertain knowledge about the world

• Takeaways from previous week
• Independence and conditional independence simplify our probabilistic representation of the world
  – Example: Weather is independent of (Toothache, Catch, Cavity)
  – This allows us to reduce the size of our joint probability table from 32 to 8 + 4 = 12 (an 8-entry table over Toothache, Catch, Cavity plus a 4-entry table for Weather)

Page 10:

Bayesian Networks

• Bayesian networks are a way to represent independence and conditional independence relationships

Page 11:

Bayesian Networks

• Bayesian networks are a way to represent independence and conditional independence relationships

• We will define the syntax and semantics of Bayesian networks

Page 12:

Bayesian Networks

• Bayesian networks are a way to represent independence and conditional independence relationships

• We will define the syntax and semantics of Bayesian networks

• We will discuss how probabilistic inference can be performed in practical situations

Page 13:

Bayesian network

• Also called belief network, graphical model, causal network, knowledge map

Page 14:

Bayesian network

• Also called belief network, graphical model, causal network, knowledge map

• A directed graph

Example: Topology of the network encodes conditional independence assertions.

[Figure: nodes Weather, Cavity, Toothache, Catch, with edges Cavity → Toothache and Cavity → Catch]

Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.

(Chapter 14.1–3)

Page 15:

Bayesian network

• Also called belief network, graphical model, causal network, knowledge map

• A directed graph

Representation of our belief that Weather is independent of other three variables, Cavity causes Toothache and Catch


Page 16:

Bayesian network

• Also called belief network, graphical model, causal network, knowledge map

• A directed acyclic graph

Representation of our belief that Weather is independent of other three variables, Cavity causes Toothache and Catch


Page 17:

Bayesian network

• Also called belief network, graphical model, causal network, knowledge map

• A directed acyclic graph

Representation of the joint probability model


Page 18:

Representing the full joint distribution

Representation of the joint probability model
Abbreviate variables: W, C, T, H


Page 19:

Representing the full joint distribution

Representation of the joint probability model
Abbreviate variables: W, C, T, H


P (W,C, T,H) = P (W |C, T,H)P (C, T,H)

Page 20:

Representing the full joint distribution

Representation of the joint probability model
Abbreviate variables: W, C, T, H


P (W,C, T,H) = P (W |C, T,H)P (C, T,H)

Let’s expand the right hand side and simplify based on the probability rules we studied last time

Page 21:

Representing the full joint distribution

P (W,C, T,H) = P (W |C, T,H)P (C, T,H)

Page 22:

Representing the full joint distribution

P (W,C, T,H) = P (W |C, T,H)P (C, T,H)

P (W,C, T,H) = P (W )P (C, T,H)

Page 23:

Representing the full joint distribution

P (W,C, T,H) = P (W |C, T,H)P (C, T,H)

P (W,C, T,H) = P (W )P (C, T,H)

P (W,C, T,H) = P (W )P (T |C,H)P (C,H)

Page 24:

Representing the full joint distribution

P (W,C, T,H) = P (W |C, T,H)P (C, T,H)

P (W,C, T,H) = P (W )P (C, T,H)

P (W,C, T,H) = P (W )P (T |C,H)P (C,H)

P (W,C, T,H) = P (W )P (T |C)P (C,H)

Page 25:

Representing the full joint distribution

P (W,C, T,H) = P (W |C, T,H)P (C, T,H)

P (W,C, T,H) = P (W )P (C, T,H)

P (W,C, T,H) = P (W )P (T |C,H)P (C,H)

P (W,C, T,H) = P (W )P (T |C)P (C,H)

P (W,C, T,H) = P (W )P (T |C)P (H|C)P (C)

Page 26:

Bayesian network

Example


P (W,C, T,H) = P (W )P (T |C)P (H|C)P (C)

P(x1, x2, ..., xn) = ∏i P(xi | Parents(Xi))
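The product form above is what makes the representation compact: any entry of the full joint table is a product of one CPT entry per variable. A minimal Python sketch of this factorization for the Weather/Cavity/Toothache/Catch example (the CPT numbers below are made up for illustration; the slides do not give them):

    # Factored joint: P(W, C, T, H) = P(W) * P(T|C) * P(H|C) * P(C)
    # All numbers are illustrative placeholders, not values from the lecture.
    P_W = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}   # P(Weather)
    P_C = {True: 0.2, False: 0.8}                                     # P(Cavity)
    P_T_given_C = {True: {True: 0.6, False: 0.4},                     # P(Toothache | Cavity)
                   False: {True: 0.1, False: 0.9}}
    P_H_given_C = {True: {True: 0.9, False: 0.1},                     # P(Catch | Cavity)
                   False: {True: 0.2, False: 0.8}}

    def joint(w, c, t, h):
        """One entry of the full joint table, rebuilt from the small factored tables."""
        return P_W[w] * P_T_given_C[c][t] * P_H_given_C[c][h] * P_C[c]

    print(joint("sunny", True, True, False))   # P(W=sunny, C=true, T=true, H=false)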

Page 27:

Classroom Exercise

• Handout

Page 28:

Module 2
• Objectives

– Construct a Bayesian network given a problem statement

Page 29:

A method to construct Bayesian networks

• First, determine the set of variables that are required to model the domain.
• Order them {X1, X2, …, Xn}
  – Any order will work, but the resulting network will be more compact if variables are ordered with causes preceding effects
• Loop for i = 1 to n
  – For each node Xi, choose the minimal set of parents for Xi from the list X1, …, Xi−1
  – Insert a link from each parent to Xi
(A sketch of this loop appears below.)
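A minimal Python sketch of the loop above. The independence test is the domain-specific step, so it is passed in as a hypothetical oracle is_independent(Xi, others, parents); the toy oracle in the example simply encodes the known structure of the burglary network used later in the lecture.

    from itertools import combinations

    def build_network(ordered_vars, is_independent):
        """Choose a minimal parent set for each variable from its predecessors."""
        parents = {}
        for i, x in enumerate(ordered_vars):
            predecessors = ordered_vars[:i]
            chosen = predecessors                      # fallback: all predecessors
            for size in range(len(predecessors) + 1):  # try the smallest candidate sets first
                for candidate in combinations(predecessors, size):
                    rest = [p for p in predecessors if p not in candidate]
                    if is_independent(x, rest, list(candidate)):
                        chosen = list(candidate)
                        break
                else:
                    continue
                break
            parents[x] = chosen                        # link each parent to x
        return parents

    # Toy oracle: x is independent of the rest given `cand` iff cand covers x's true parents.
    true_parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    oracle = lambda x, rest, cand: set(true_parents[x]) <= set(cand)
    print(build_network(["B", "E", "A", "J", "M"], oracle))
    # {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}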

Page 30:

Hybrid Bayesian Networks
• Many real world problems involve continuous quantities
• Example
  – Subsidy (is the govt subsidy scheme in operation?)
  – Buys (does the customer buy the fruit?)
  – Cost (price of fruit)
  – Harvest (quantity of fruit harvested)

(Excerpt from Chapter 14, Probabilistic Reasoning, p. 520:)

Figure 14.5 A simple network with discrete variables (Subsidy and Buys) and continuous variables (Harvest and Cost). [Edges: Subsidy → Cost, Harvest → Cost, Cost → Buys]

… possible values into a fixed set of intervals. For example, temperatures could be divided into (<0°C), (0°C–100°C), and (>100°C). Discretization is sometimes an adequate solution, but often results in a considerable loss of accuracy and very large CPTs. The most common solution is to define standard families of probability density functions (see Appendix A) that are specified by a finite number of parameters. For example, a Gaussian (or normal) distribution N(µ, σ²)(x) has the mean µ and the variance σ² as parameters. Yet another solution—sometimes called a nonparametric representation—is to define the conditional distribution implicitly with a collection of instances, each containing specific values of the parent and child variables. We explore this approach further in Chapter 18.

A network with both discrete and continuous variables is called a hybrid Bayesian network. To specify a hybrid network, we have to specify two new kinds of distributions: the conditional distribution for a continuous variable given discrete or continuous parents; and the conditional distribution for a discrete variable given continuous parents. Consider the simple example in Figure 14.5, in which a customer buys some fruit depending on its cost, which depends in turn on the size of the harvest and whether the government’s subsidy scheme is operating. The variable Cost is continuous and has continuous and discrete parents; the variable Buys is discrete and has a continuous parent.

For the Cost variable, we need to specify P(Cost | Harvest, Subsidy). The discrete parent is handled by enumeration—that is, by specifying both P(Cost | Harvest, subsidy) and P(Cost | Harvest, ¬subsidy). To handle Harvest, we specify how the distribution over the cost c depends on the continuous value h of Harvest. In other words, we specify the parameters of the cost distribution as a function of h. The most common choice is the linear Gaussian distribution, in which the child has a Gaussian distribution whose mean µ varies linearly with the value of the parent and whose standard deviation σ is fixed. We need two distributions, one for subsidy and one for ¬subsidy, with different parameters:

P(c | h, subsidy) = N(a_t h + b_t, σ_t²)(c) = (1 / (σ_t √(2π))) exp(−½ ((c − (a_t h + b_t)) / σ_t)²)

P(c | h, ¬subsidy) = N(a_f h + b_f, σ_f²)(c) = (1 / (σ_f √(2π))) exp(−½ ((c − (a_f h + b_f)) / σ_f)²)

For this example, then, the conditional distribution for Cost is specified by naming the linear Gaussian distribution and providing the parameters a_t, b_t, σ_t, a_f, b_f, and σ_f.
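A small Python sketch of the linear Gaussian CPT for Cost described above. The density follows the equations just given; the slope, intercept, and σ values are placeholder numbers, since the text only names the parameters a_t, b_t, σ_t, a_f, b_f, σ_f.

    import math

    def gaussian_pdf(x, mu, sigma):
        # N(mu, sigma^2)(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)^2)
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Placeholder (slope, intercept, sigma) per value of Subsidy; slopes are negative
    # because cost falls as the harvest (supply) grows.
    params = {True:  (-0.5, 5.0, 1.0),   # a_t, b_t, sigma_t  (subsidy)
              False: (-0.5, 8.0, 1.0)}   # a_f, b_f, sigma_f  (no subsidy)

    def p_cost(c, h, subsidy):
        # P(Cost = c | Harvest = h, Subsidy): a Gaussian whose mean is linear in h
        a, b, sigma = params[subsidy]
        return gaussian_pdf(c, a * h + b, sigma)

    print(p_cost(4.0, 6.0, True), p_cost(4.0, 6.0, False))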

Page 31:

Hybrid Bayesian Networks
• Many real world problems involve continuous quantities
• Example
  – Subsidy (is the govt subsidy scheme in operation?)
  – Buys (does the customer buy the fruit?)
  – Cost (price of fruit)
  – Harvest (quantity of fruit harvested)


Which ones are discrete/continuous?

Page 32:

Hybrid Bayesian Networks

• P(Cost|Harvest, Subsidy)

Page 33:

Hybrid Bayesian Networks

• P(Cost|Harvest, Subsidy)
• The discrete parent is handled by writing out
  P(Cost|Harvest, subsidy)
  P(Cost|Harvest, !subsidy)

Page 34:

Hybrid Bayesian Networks

• P(Cost|Harvest, Subsidy)
• The continuous parent is handled by the use of a linear Gaussian distribution

Page 35:

Hybrid Bayesian Networks
• P(Cost|Harvest, Subsidy)
• The continuous parent is handled by the use of a linear Gaussian distribution


(Excerpt from Appendix A, Mathematical background, p. 1058:)

The density function must be nonnegative for all x and must have

∫_{−∞}^{∞} P(x) dx = 1.

We can also define a cumulative probability density function F_X(x), which is the probability of a random variable being less than x:

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} P(u) du.

Note that the probability density function has units, whereas the discrete probability function is unitless. For example, if values of X are measured in seconds, then the density is measured in Hz (i.e., 1/sec). If values of X are points in three-dimensional space measured in meters, then density is measured in 1/m³.

One of the most important probability distributions is the Gaussian distribution, also known as the normal distribution. A Gaussian distribution with mean µ and standard deviation σ (and therefore variance σ²) is defined as

P(x) = (1 / (σ √(2π))) e^(−(x−µ)² / (2σ²)),

where x is a continuous variable ranging from −∞ to +∞. With mean µ = 0 and variance σ² = 1, we get the special case of the standard normal distribution. For a distribution over a vector x in n dimensions, there is the multivariate Gaussian distribution:

P(x) = (1 / √((2π)ⁿ |Σ|)) e^(−½ (x−µ)ᵀ Σ⁻¹ (x−µ)),

where µ is the mean vector and Σ is the covariance matrix (see below). In one dimension, we can define the cumulative distribution function F(x) as the probability that a random variable will be less than x. For the normal distribution, this is

F(x) = ∫_{−∞}^{x} P(z) dz = ½ (1 + erf((x − µ) / (σ √2))),

where erf(x) is the so-called error function, which has no closed-form representation.

The central limit theorem states that the distribution formed by sampling n independent random variables and taking their mean tends to a normal distribution as n tends to infinity. This holds for almost any collection of random variables, even if they are not strictly independent, unless the variance of any finite subset of variables dominates the others.

The expectation of a random variable, E(X), is the mean or average value, weighted by the probability of each value. For a discrete variable it is

E(X) = Σ_i x_i P(X = x_i).

For a continuous variable, replace the summation with an integral over the probability density function, P(x):

E(X) = ∫_{−∞}^{∞} x P(x) dx.
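A short Python check of these definitions (a sketch, not part of the slides): it evaluates the Gaussian density numerically and confirms that it integrates to about 1 and that its expectation is about µ.

    import math

    def normal_pdf(x, mu=0.0, sigma=1.0):
        # P(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    def integrate(f, lo, hi, n=100000):
        # crude midpoint-rule approximation of the integral of f over [lo, hi]
        dx = (hi - lo) / n
        return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

    mu, sigma = 2.0, 1.5
    total = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
    mean  = integrate(lambda x: x * normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
    print(total)   # ~1.0  (the density integrates to 1)
    print(mean)    # ~2.0  (E(X) = mu)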

Page 36:

Hybrid Bayesian Networks
• P(Cost|Harvest, Subsidy)
• The continuous parent is handled by the use of a linear Gaussian distribution


The mean value of cost increases linearly with harvest. (Remember, coefficients can be negative.)

Page 37:

Hybrid Bayesian Networks
• P(Cost|Harvest, Subsidy)
• The continuous parent is handled by the use of a linear Gaussian distribution


One set of coefficients for when subsidy is operational, and one set of coefficients for when it is not.

Page 38:

Hybrid Bayesian Networks
• P(Cost|Harvest, Subsidy)
• The continuous parent is handled by the use of a linear Gaussian distribution


(Excerpt from Section 14.3, Efficient Representation of Conditional Distributions, p. 521:)

Figure 14.6 The graphs in (a) and (b) show the probability distribution over Cost as a function of Harvest size, with Subsidy true and false, respectively. Graph (c) shows the distribution P(Cost | Harvest), obtained by summing over the two subsidy cases.

Figures 14.6(a) and (b) show these two relationships. Notice that in each case the slope is negative, because cost decreases as supply increases. (Of course, the assumption of linearity implies that the cost becomes negative at some point; the linear model is reasonable only if the harvest size is limited to a narrow range.) Figure 14.6(c) shows the distribution P(c | h), averaging over the two possible values of Subsidy and assuming that each has prior probability 0.5. This shows that even with very simple models, quite interesting distributions can be represented.

The linear Gaussian conditional distribution has some special properties. A network containing only continuous variables with linear Gaussian distributions has a joint distribution that is a multivariate Gaussian distribution (see Appendix A) over all the variables (Exercise 14.9). Furthermore, the posterior distribution given any evidence also has this property. (It follows that inference in linear Gaussian networks takes only O(n³) time in the worst case, regardless of the network topology. In Section 14.4, we see that inference for networks of discrete variables is NP-hard.) When discrete variables are added as parents (not as children) of continuous variables, the network defines a conditional Gaussian, or CG, distribution: given any assignment to the discrete variables, the distribution over the continuous variables is a multivariate Gaussian.

Now we turn to the distributions for discrete variables with continuous parents. Consider, for example, the Buys node in Figure 14.5. It seems reasonable to assume that the customer will buy if the cost is low and will not buy if it is high, and that the probability of buying varies smoothly in some intermediate region. In other words, the conditional distribution is like a “soft” threshold function. One way to make soft thresholds is to use the integral of the standard normal distribution:

Φ(x) = ∫_{−∞}^{x} N(0, 1)(x) dx.

Then the probability of Buys given Cost might be

P(buys | Cost = c) = Φ((−c + µ)/σ),

which means that the cost threshold occurs around µ, the width of the threshold region is proportional to σ, and the probability of buying decreases as cost increases.

Page 39:

Hybrid Bayesian Networks
• P(Buys|Cost)
• The continuous parent of a discrete variable

Page 40:

Hybrid Bayesian Networks
• P(Buys|Cost)
• The continuous parent of a discrete variable

(Excerpt from Chapter 14, p. 522:)

Figure 14.7 (a) A normal (Gaussian) distribution for the cost threshold, centered on µ = 6.0 with standard deviation σ = 1.0. (b) Logit and probit distributions for the probability of buys given cost, for the parameters µ = 6.0 and σ = 1.0.

This probit distribution (pronounced “pro-bit” and short for “probability unit”) is illustrated in Figure 14.7(a). The form can be justified by proposing that the underlying decision process has a hard threshold, but that the precise location of the threshold is subject to random Gaussian noise.

An alternative to the probit model is the logit distribution (pronounced “low-jit”). It uses the logistic function 1/(1 + e^(−x)) to produce a soft threshold:

P(buys | Cost = c) = 1 / (1 + exp(−2 (−c + µ) / σ)).

This is illustrated in Figure 14.7(b). The two distributions look similar, but the logit actually has much longer “tails.” The probit is often a better fit to real situations, but the logit is sometimes easier to deal with mathematically. It is used widely in neural networks (Chapter 20). Both probit and logit can be generalized to handle multiple continuous parents by taking a linear combination of the parent values.

14.4 EXACT INFERENCE IN BAYESIAN NETWORKS

The basic task for any probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given some observed event—that is, some assignment of values to a set of evidence variables. To simplify the presentation, we will consider only one query variable at a time; the algorithms can easily be extended to queries with multiple variables. We will use the notation from Chapter 13: X denotes the query variable; E denotes the set of evidence variables E1, …, Em, and e is a particular observed event; Y denotes the nonevidence, nonquery variables Y1, …, Yl (called the hidden variables). Thus, the complete set of variables is X = {X} ∪ E ∪ Y. A typical query asks for the posterior probability distribution P(X | e).
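A short Python sketch of the probit and logit models for P(buys | Cost = c), using the Figure 14.7 parameters µ = 6.0 and σ = 1.0. Φ is computed with math.erf; both curves give probability about 0.5 at the threshold c = µ, and the probability of buying falls as cost rises.

    import math

    MU, SIGMA = 6.0, 1.0   # parameters from Figure 14.7

    def phi(x):
        # standard normal CDF Phi(x), via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def p_buys_probit(c, mu=MU, sigma=SIGMA):
        # P(buys | Cost = c) = Phi((-c + mu) / sigma)
        return phi((-c + mu) / sigma)

    def p_buys_logit(c, mu=MU, sigma=SIGMA):
        # P(buys | Cost = c) = 1 / (1 + exp(-2 * (-c + mu) / sigma))
        return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

    for c in (4.0, 6.0, 8.0):
        print(c, round(p_buys_probit(c), 3), round(p_buys_logit(c), 3))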

Page 41:

Hybrid Bayesian Networks
• P(Buys|Cost)
• The continuous parent of a discrete variable


Logistic function: used in neural networks

Page 42:

CAP 4621 ARTIFICIAL INTELLIGENCE
Reminder: [CAP4621] in your email subject line

“Probabilistic Reasoning”

Eakta Jain

[email protected]

With slides from: Stuart Russell, Hwee Tou Ng, Dan Klein, Pieter Abbeel

Page 43:

Wake up! (2 minutes)

Page 44:

Announcements

• Thank you for the mid-semester feedback

Page 45:

Announcements

• Common themes
  – Advantage of coming to class is staying on top of course content and not having things back up
  – Morning class – too early!

Page 46:

Announcements

• Suggestions from you that I will incorporate going forward
  – Hard to take notes → Lecture slides online, I will try to go slower on slides
  – Confused about how readings, homeworks fit in → I will make this clearer
  – Not enough time for group work → More time for group work, less follow along?

Page 47:

Announcements
• Things that you can do to improve experience in this class
  – “xyz is not clear”, “TA hours N/A” → Go to TA and instructor office hours, because the classroom is not an ideal environment for individualized attention

Page 48:

Announcements
• Things that you can do to improve experience in this class
  – “xyz is not clear”, “TA hours N/A” → Go to TA and instructor office hours, because the classroom is not an ideal environment for individualized attention
  – Confused about how readings, homeworks fit in
    • Revisit topics list and learning objectives in the handout for Week 1
    • Ask your peers (use Canvas, use the 10 minute break on Tuesdays)

Page 49:

Module 3
• Objectives

– Perform inference by enumeration to answer queries on Bayesian networks

Page 50:

What does “Inference” mean?

• Compute the posterior probability for a set of query variables given the values of some evidence variables

• Reading: Sec 14.4 of textbook

Page 51:

Example

Figure 14.2 A typical Bayesian network, showing both the topology and the conditional probability tables (CPTs). In the CPTs, the letters B, E, A, J, and M stand for Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls, respectively.

[Topology: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls]

P(B) = .001        P(E) = .002

P(A | B, E):
  B=t, E=t : .95
  B=t, E=f : .94
  B=f, E=t : .29
  B=f, E=f : .001

P(J | A):          P(M | A):
  A=t : .90          A=t : .70
  A=f : .05          A=f : .01
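One possible encoding of the Figure 14.2 CPTs as Python dictionaries (a sketch, not part of the slides); the inference-by-enumeration sketch at the end of this transcript reuses the same tables.

    # Each CPT maps an assignment of the parents to P(variable = true).
    parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
    cpt = {
        "B": {(): 0.001},
        "E": {(): 0.002},
        "A": {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001},
        "J": {(True,): 0.90, (False,): 0.05},
        "M": {(True,): 0.70, (False,): 0.01},
    }

    def prob(var, value, assignment):
        # P(var = value | parents(var)) under a full assignment {name: bool}
        p_true = cpt[var][tuple(assignment[p] for p in parents[var])]
        return p_true if value else 1.0 - p_true

    # Example joint entry: P(j, m, a, !b, !e) = P(!b) P(!e) P(a|!b,!e) P(j|a) P(m|a)
    a = {"B": False, "E": False, "A": True, "J": True, "M": True}
    print(prob("B", False, a) * prob("E", False, a) * prob("A", True, a)
          * prob("J", True, a) * prob("M", True, a))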

Page 52:

Example


Evidence variable? Query variable?

Page 53:

Example


Evidence variable? Query variable?

Burglary is a variable, abbreviated B. It can take on two values: 0 or 1 (True or False). These are abbreviated as “b” or “!b”.

Page 54:

Example


Evidence variable? Query variable?

P(B) is a distribution. It refers to two numbers, P(b) and P(!b).

Page 55:

Example


Evidence variable? Query variable? Posterior probability

P(B|J,M) is the posterior probability distribution of the query B given the evidence variables J and M. It refers to a table which enumerates P(b) for every value that J and M can take.

Page 56:

Example


Evidence variable? Query variable? Posterior probability

Page 57:

Example


Evidence variable?Query variable? Posterior probability

Alarm is a hidden variable, i.e., a non-query, non-evidence variable.

Page 58

Example

[Figure 14.2: burglary network with CPTs, as above]

Evidence variable? Query variable? Posterior probability

Alarm is a hidden variable, i.e., a non-query, non-evidence variable.

When the query is on Burglary, then Earthquake is also a hidden variable.
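To make the query / evidence / hidden split concrete, here is a small sketch (my own helper, not from the slides) that derives the hidden variables from a chosen query variable and an evidence assignment.

# Hidden variables are whatever is neither the query nor part of the evidence.
ALL_VARS = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]

def hidden_variables(query, evidence):
    return [v for v in ALL_VARS if v != query and v not in evidence]

print(hidden_variables("Burglary", {"JohnCalls": True, "MaryCalls": True}))
# -> ['Earthquake', 'Alarm']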

Page 59

Example

[Figure 14.2: burglary network with CPTs, as above]

Inference algorithms compute P(B|J,M). More generally, inference algorithms compute the distribution P(X|e).

Page 60

Inference by enumeration

• Goal: Compute P(X|e)

Page 61

Inference by enumeration

• Goal: Compute P(X|e)
• Recall (Equation 13.9): P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

Excerpt from Section 14.4, Exact Inference in Bayesian Networks:

In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true. We could then ask for, say, the probability that a burglary has occurred:

P(Burglary | JohnCalls = true, MaryCalls = true) = ⟨0.284, 0.716⟩.

In this section we discuss exact algorithms for computing posterior probabilities and will consider the complexity of this task. It turns out that the general case is intractable, so Section 14.5 covers methods for approximate inference.

14.4.1 Inference by enumeration

Chapter 13 explained that any conditional probability can be computed by summing terms from the full joint distribution. More specifically, a query P(X | e) can be answered using Equation (13.9), which we repeat here for convenience:

P(X | e) = α P(X, e) = α Σ_y P(X, e, y).

Now, a Bayesian network gives a complete representation of the full joint distribution. More specifically, Equation (14.2) shows that the terms P(x, e, y) in the joint distribution can be written as products of conditional probabilities from the network. Therefore, a query can be answered using a Bayesian network by computing sums of products of conditional probabilities from the network.

Consider the query P(Burglary | JohnCalls = true, MaryCalls = true). The hidden variables for this query are Earthquake and Alarm. From Equation (13.9), using initial letters for the variables to shorten the expressions, we have

P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, j, m, e, a).

The semantics of Bayesian networks (Equation (14.2)) then gives us an expression in terms of CPT entries. For simplicity, we do this just for Burglary = true:

P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a).

To compute this expression, we have to add four terms, each computed by multiplying five numbers. In the worst case, where we have to sum out almost all the variables, the complexity of the algorithm for a network with n Boolean variables is O(n 2^n).

An improvement can be obtained from the following simple observations: the P(b) term is a constant and can be moved outside the summations over a and e, and the P(e) term can be moved outside the summation over a. Hence, we have

P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a).    (14.4)

This expression can be evaluated by looping through the variables in order, multiplying CPT entries as we go. For each summation, we also need to loop over the variable's possible values.

(Footnote: An expression such as Σ_e P(a, e) means to sum P(A = a, E = e) for all possible values of e. When E is Boolean, there is an ambiguity in that P(e) is used to mean both P(E = true) and P(E = e), but it should be clear from context which is intended; in the context of a sum, the latter is intended.)
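The following is a minimal sketch of inference by enumeration for this specific query, written from scratch rather than taken from the book's pseudocode: it forms each joint-probability term as a product of CPT entries, sums over all values of the hidden variables E and A, and then normalizes. The CPT values are the ones from Figure 14.2; the names and structure are my own.

from itertools import product

P_B, P_E = 0.001, 0.002                      # P(B = true), P(E = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    # P(b, e, a, j, m) as a product of CPT entries (the chain rule for this network).
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def posterior_burglary(j, m):
    # P(Burglary | J = j, M = m) by summing out the hidden variables E and A.
    unnormalized = {}
    for b in (True, False):
        unnormalized[b] = sum(joint(b, e, a, j, m)
                              for e, a in product((True, False), repeat=2))
    alpha = 1.0 / sum(unnormalized.values())     # normalization constant
    return {b: alpha * p for b, p in unnormalized.items()}

print(posterior_burglary(True, True))            # roughly {True: 0.284, False: 0.716}

Summing over e and a with both calls fixed to true reproduces the ⟨0.284, 0.716⟩ result quoted above.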

Page 62

Inference by enumeration

• Goal: Compute P(X|e)
• Recall (Equation 13.9): P(X | e) = α P(X, e) = α Σ_y P(X, e, y)


Chapter 13, Equation 13.9. Revisit Section 13.3 if needed.

Page 63

Inference by enumeration

• Goal: Compute P(X|e)
• Recall (Equation 13.9): P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
• For a Bayesian network, the joint probability is computed using the chain rule


P(x1, x2, ..., xn) = Π_i P(xi | Parents(Xi))
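As a sketch of what this chain-rule product looks like in code (my own representation: each variable maps to its parent list and a CPT giving P(var = true | parents)), the full joint probability of one complete assignment is just a product of looked-up entries.

# Network as {variable: (parents, CPT)}; CPT keys are tuples of parent values.
network = {
    "B": ([], {(): 0.001}),
    "E": ([], {(): 0.002}),
    "A": (["B", "E"], {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    "J": (["A"], {(True,): 0.90, (False,): 0.05}),
    "M": (["A"], {(True,): 0.70, (False,): 0.01}),
}

def joint_probability(assignment):
    # P(x1, ..., xn) = product over variables of P(xi | Parents(Xi)).
    prob = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob

print(joint_probability({"B": False, "E": False, "A": True, "J": True, "M": True}))
# roughly 0.000628: P(!b) P(!e) P(a | !b, !e) P(j | a) P(m | a)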

Page 64

Example

• Query: P(Burglary | JohnCalls = true, MaryCalls = true)


Page 65

Example

• Query: P(Burglary | JohnCalls = true, MaryCalls = true)
• Hidden variables: Earthquake and Alarm


Page 66

Example

• Query: P(Burglary | JohnCalls = true, MaryCalls = true)
• Hidden variables: Earthquake and Alarm
• Shorthand notation: P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, j, m, e, a)


Page 67

Example

• Start by using the product rule and replacing the denominator with a normalization constant: P(B | j, m) = α P(B, j, m)


Page 68

Example

• Start by using the product rule and replacing the denominator with a normalization constant: P(B | j, m) = α P(B, j, m)
• Then sum over the hidden variables: P(B | j, m) = α Σ_e Σ_a P(B, j, m, e, a)


Page 69

Example

• Start by using the product rule and replacing the denominator with a normalization constant
• Then sum over the hidden variables
• Use the chain rule: P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)


Page 70

Example

• Start by using the product rule and replacing the denominator with a normalization constant
• Then sum over the hidden variables
• Use the chain rule: P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)


Anyone confused about B vs b?

Page 71

Example

• How many terms are added in this formula? P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)


Page 72

Example

• How many terms are added in this formula?


E = 0 or 1, A = 0 or 1, so the double summation adds 2^2 = 4 terms. In general: 2^k terms, for k Boolean hidden variables.
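A tiny sketch (my own) of where the 2^k count comes from: the double summation ranges over every joint assignment of the hidden Boolean variables, and the number of such assignments doubles with each additional hidden variable.

from itertools import product

hidden = ["E", "A"]                                  # k = 2 hidden Boolean variables
assignments = list(product((True, False), repeat=len(hidden)))
print(len(assignments))                              # 2**2 = 4 summand terms
# With k hidden Boolean variables there are 2**k terms, each a product of CPT entries.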

Page 73

Example

• Can we do better?


Page 74

Example

• Can we do better?

• Collect the repeated terms

Page 75

CAP 4621 ARTIFICIAL INTELLIGENCEReminder: [CAP4621] in your email subject line

“Probabilistic Reasoning”

Eakta Jain

[email protected]

With slides from: Stuart Russell, Hwee Tau Ng, Dan Klein, Pieter Abbeel

Page 76

Wake up! (2 minutes)

Page 77

Module 3

• Objectives

– Perform inference by enumeration to answer queries on Bayesian networks

Page 78

Finish up Classroom Exercise

• Answer for manual computation

• Recognize that inference by enumeration is a depth first traversal of an expression tree

Page 79

Algorithm to perform inference

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayes net with variables {X} ∪ E ∪ Y   /* Y = hidden variables */

  Q(X) ← a distribution over X, initially empty
  for each value xi of X do
    Q(xi) ← ENUMERATE-ALL(bn.VARS, e_xi)
      where e_xi is e extended with X = xi
  return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  Y ← FIRST(vars)
  if Y has value y in e
    then return P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return Σ_y P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), e_y)
      where e_y is e extended with Y = y

Figure 14.9 The enumeration algorithm for answering queries on Bayesian networks.

function ELIMINATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network specifying joint distribution P(X1, ..., Xn)

  factors ← [ ]
  for each var in ORDER(bn.VARS) do
    factors ← [MAKE-FACTOR(var, e) | factors]
    if var is a hidden variable then factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))

Figure 14.10 The variable elimination algorithm for inference in Bayesian networks.
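To make Figure 14.9 concrete, here is a minimal Python sketch of the same recursion on the alarm network. The dictionary encoding of the network and the variable names are illustrative choices, not an API from the textbook or any library.

```python
# A sketch of ENUMERATION-ASK / ENUMERATE-ALL (Figure 14.9) on the alarm network.
# network maps each variable to (parents, CPT), where the CPT maps a tuple of
# parent values to P(var = True | parents).

network = {
    "B": ((), {(): 0.001}),
    "E": ((), {(): 0.002}),
    "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    "J": (("A",), {(True,): 0.90, (False,): 0.05}),
    "M": (("A",), {(True,): 0.70, (False,): 0.01}),
}
ORDER = ["B", "E", "A", "J", "M"]   # topological order: parents before children

def prob(var, value, evidence):
    """P(var = value | parents), reading parent values from the evidence dict."""
    parents, cpt = network[var]
    p_true = cpt[tuple(evidence[p] for p in parents)]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, evidence):
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    if Y in evidence:                      # Y is observed (or already assigned)
        return prob(Y, evidence[Y], evidence) * enumerate_all(rest, evidence)
    return sum(prob(Y, y, {**evidence, Y: y}) *      # sum over hidden variable Y
               enumerate_all(rest, {**evidence, Y: y})
               for y in (True, False))

def enumeration_ask(X, evidence):
    q = {x: enumerate_all(ORDER, {**evidence, X: x}) for x in (True, False)}
    alpha = 1.0 / sum(q.values())
    return {x: alpha * v for x, v in q.items()}

print(enumeration_ask("B", {"J": True, "M": True}))   # ≈ {True: 0.284, False: 0.716}
```

Note how the hidden-variable branch re-explores the J and M subtrees once for every combination of a and e; that repetition is exactly what the expression tree on the later slide shows, and what variable elimination (Figure 14.10) avoids by computing each factor once.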


Page 81

Classroom Exercise

• Open handout again
• Evaluate

• Step your way through the ENUMERATION-ASK algorithm to perform inference by enumeration

• Points of group discussion: what is the tree being traversed? Is this depth first or breadth first traversal?


Page 82

Expression tree

[Figure 14.8: The structure of the expression shown in Equation (14.4). The evaluation proceeds top down, multiplying values along each path and summing at the "+" nodes. Notice the repetition of the paths for j and m.]

Page 83

Practical Example

• Recommender systems
• Examples?

Page 84

Practical Example: Movie Recommendations!

• Learn the probability model P(U,S,C,V)
• U = user's profile, e.g., age, sex
• S = user's situation, e.g., location
• C = movie attributes, e.g., ??

Page 85

Practical Example: Movie Recommendations!

• Learn the probability model P(U,S,C,V)
• U = user's profile, e.g., age, sex
• S = user's situation, e.g., location
• C = movie attributes, e.g., genre, director, lead actor

Page 86

Practical Example: Movie Recommendations!

• Learn the probability model P(U,S,C,V)
• U = user's profile, e.g., age, sex
• S = user's situation, e.g., location
• C = movie attributes, e.g., genre, director, lead actor
• V = user's rating of a given movie

Page 87

Practical Example: Movie Recommendations!

• Learn the probability model P(U,S,C,V)

• Why? What are the types of inferences you can make with this model?

Page 88

Practical Example: Movie Recommendations!

• Learn the probability model P(U,S,C,V)

• Why? What are the types of inferences you can make with this model?
– Find the movies that a user is likely to rate highly, i.e., compute P(V|u,s,c) for the target user u, given situation s, and movie attributes c, and then recommend movies in decreasing order of P(V|u,s,c)
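As a concrete illustration of this first kind of query, the sketch below ranks candidate movies for one user by a learned P(V = high | u, s, c). The conditional table, the attribute values, and the movie titles are made-up placeholders, not numbers from any real model or from the cited study.

```python
# Minimal sketch: rank candidate movies for one user by P(V = "high" | u, s, c).
# The learned conditional distribution is represented here as a plain dict;
# all values below are illustrative placeholders.

P_high_given_usc = {
    # (age_band, sex, location, genre) -> P(V = "high" | u, s, c)
    ("18-24", "F", "urban", "comedy"): 0.62,
    ("18-24", "F", "urban", "drama"):  0.41,
    ("18-24", "F", "urban", "action"): 0.35,
}

candidate_movies = [
    {"title": "Movie A", "genre": "comedy"},
    {"title": "Movie B", "genre": "drama"},
    {"title": "Movie C", "genre": "action"},
]

def recommend(user, situation, movies):
    """Score each movie by P(high rating | u, s, c) and sort, best first."""
    scored = [(P_high_given_usc[(user["age"], user["sex"], situation, m["genre"])],
               m["title"])
              for m in movies]
    return sorted(scored, reverse=True)

user = {"age": "18-24", "sex": "F"}
for p, title in recommend(user, "urban", candidate_movies):
    print(f"{title}: P(high rating) = {p:.2f}")
```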

Page 89

Practical Example: Movie Recommendations!

• Learn the probability model P(U,S,C,V)

• Why? What are the types of inferences you can make with this model?
– Find the movie attributes that a user is likely to rate highly, i.e., compute P(C|u,s,v) for the target user u, given situation s, and rating v on a given movie, and then find a subset of attributes in decreasing order of P(C|u,s,v) and recommend movies with those attributes

Page 90

Practical Example: Movie Recommendations!

• Learn the probability model P(U,S,C,V)

• Why? What are the types of inferences you can make with this model?
– Find the users that are likely to rate a given movie highly, i.e., compute P(U|c,s,v) for the target movie's attributes c, given situation s, and rating v on that movie, and then send promotional materials to those users

Page 91

An approach to construct the Bayesian network

• Ono C, Kurokawa M, Motomura Y, Asoh H. A context-aware movie preference model using a Bayesian network for recommendation and promotion. In International Conference on User Modeling 2007 Jul 25 (pp. 247-257). Springer, Berlin, Heidelberg.

Page 92

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

• Estimate the conditional probabilities from data

Page 93

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

Age Sex Location Genre Director

Laugh Cry

Rating

Where would you put arrows?

Page 94

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

Age Sex Location Genre Director

Laugh Cry

Rating

P(rating|age,sex,location,genre,director) = ??
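A quick back-of-the-envelope check on the size of that table, with made-up cardinalities purely for illustration: conditioning Rating directly on all five attributes means storing one distribution per combination of parent values, so the CPT grows as the product of the parent cardinalities.

```python
# Hypothetical cardinalities for each parent of Rating; the values are
# illustrative, not taken from the paper.
cardinality = {"age": 6, "sex": 2, "location": 10, "genre": 12, "director": 50}

rows = 1
for var, k in cardinality.items():
    rows *= k
print(rows)   # 72000 parent combinations, each needing its own P(Rating | parents)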

Page 95

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

Age Sex Location Genre Director

Laugh Cry

Rating

Page 96

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

• Estimate the conditional probabilities from data


Page 98

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

• Estimate the conditional probabilities from data

• Ask users

Page 99

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

• Estimate the conditional probabilities from data

• Ask users
• Enumerate, and count!

Page 100

An approach to construct the Bayesian network

• Assume a rough network structure based on recommendations by domain experts

• Estimate the conditional probabilities from data

• Ask users
• Enumerate, and count! (a small counting sketch follows)
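The "enumerate, and count" step amounts to maximum-likelihood estimation of each CPT from the collected responses. Below is a minimal sketch for one hypothetical CPT, P(Rating | Laugh, Cry), assuming Laugh and Cry are Rating's parents in the network; the survey records are made-up placeholders, not data from the cited study.

```python
# Maximum-likelihood estimate of P(Rating | Laugh, Cry) by counting responses.
from collections import Counter, defaultdict

responses = [
    # (laugh, cry, rating) answers collected from users (illustrative only)
    ("yes", "no",  "high"), ("yes", "no",  "high"), ("yes", "no",  "low"),
    ("no",  "yes", "high"), ("no",  "no",  "low"),  ("no",  "no",  "low"),
]

counts = defaultdict(Counter)
for laugh, cry, rating in responses:
    counts[(laugh, cry)][rating] += 1          # enumerate the cases, and count

cpt = {parents: {r: n / sum(c.values()) for r, n in c.items()}
       for parents, c in counts.items()}
print(cpt[("yes", "no")])   # ≈ {'high': 0.667, 'low': 0.333} = P(Rating | laugh=yes, cry=no)
```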

Discussion: Is the network sexist?

Page 101

Classroom Exercise (5 minutes)

• What is your one takeaway from this class
• What is still unclear
• Collector: Collect these for this class and identify one common item in each category. Put on Canvas.

• PS: putting extra resources on Canvas for these topics counts towards class participation

