Page 1: Marginalization & Conditioning Marginalization (summing out): for any sets of variables Y and Z: Conditioning(variant of marginalization):

Marginalization & Conditioning

• Marginalization (summing out): for any sets of variables Y and Z:

P(Y) = Σ_z P(Y, z)

where the sum ranges over all possible combinations of values z = (z1, …, zn) of the variables in Z.

• Conditioning (variant of marginalization):

P(Y) = Σ_z P(Y | z) P(z)

• Often want to do this for P(Y|X) instead of P(Y). Recall P(Y|X) = P(Y, X) / P(X).

Page 2:

Example of Marginalization

• Using the full joint distribution:

P(cavity) = P(cavity, toothache, catch) + P(cavity, toothache, ¬catch) + P(cavity, ¬toothache, catch) + P(cavity, ¬toothache, ¬catch)
= 0.108 + 0.012 + 0.072 + 0.008
= 0.2

In general: P(cavity) = Σ_{toothache, catch} P(cavity, toothache, catch).
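The sum above is easy to check in code. A minimal sketch in Python (the variable names are my own; the numbers are the slide's joint entries for Cavity = true, keyed by the truth values of Toothache and Catch):

```python
# Entries of the full joint with Cavity = true, keyed by
# (toothache, catch); the four numbers are taken from the slide.
joint_cavity_true = {
    (True, True): 0.108,
    (True, False): 0.012,
    (False, True): 0.072,
    (False, False): 0.008,
}

# Marginalization: sum out Toothache and Catch.
p_cavity = sum(joint_cavity_true.values())
print(round(p_cavity, 3))  # 0.2
```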

Page 3:

Inference By Enumeration using Full Joint Distribution

• Let X be the random variable whose probabilities we want to know, given some evidence (values e for a set E of other variables). Let the remaining (unobserved, so-called hidden) variables be Y. The query is P(X|e), and it can be answered using the full joint distribution as follows.

P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

where α = 1/P(e) is a normalization constant.

Page 4:

Example of Inference By Enumeration using Full Joint Distribution

P(cavity | toothache) = P(cavity, toothache) / P(toothache)
= Σ_catch P(cavity, toothache, catch) / P(toothache)
= α (P(cavity, toothache, catch) + P(cavity, toothache, ¬catch))
= α (0.108 + 0.012) = 0.6

P(¬cavity | toothache) = P(¬cavity, toothache) / P(toothache)
= α (P(¬cavity, toothache, catch) + P(¬cavity, toothache, ¬catch))
= α (0.016 + 0.064) = 0.4

where α = 1 / ((0.108 + 0.012) + (0.016 + 0.064)) = 1/0.2 = 5.
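The same computation can be sketched in Python. The names below are my own; the last two ¬cavity entries (0.144 and 0.576) complete the textbook's full joint, of which the slide shows only the terms it needs:

```python
# Full joint over (cavity, toothache, catch); True means the
# proposition holds. Entries follow the textbook dentist example.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def posterior_cavity_given_toothache():
    # Sum out the hidden variable Catch for each value of Cavity...
    unnorm = {
        cav: sum(p for (c, t, _), p in joint.items() if c == cav and t)
        for cav in (True, False)
    }
    # ...then normalize: alpha = 1 / P(toothache).
    alpha = 1.0 / sum(unnorm.values())
    return {cav: alpha * p for cav, p in unnorm.items()}

dist = posterior_cavity_given_toothache()
print(round(dist[True], 3), round(dist[False], 3))  # 0.6 0.4
```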

Page 5:

Independence

• Propositions a and b are independent if and only if P(a | b) = P(a).

• Equivalently (by the product rule): P(a, b) = P(a) P(b).

• Equivalently: P(b | a) = P(b).

Page 6:

Illustration of Independence

• We know (product rule) that
P(toothache, catch, cavity, Weather = cloudy) =
P(Weather = cloudy | toothache, catch, cavity) P(toothache, catch, cavity).

• By independence:
P(Weather = cloudy | toothache, catch, cavity) = P(Weather = cloudy).

• Therefore we have that
P(toothache, catch, cavity, Weather = cloudy) =
P(Weather = cloudy) P(toothache, catch, cavity).
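A quick numeric sketch of this factorization in Python. The four Weather probabilities below are invented illustrative numbers; the 8-entry dental joint follows the textbook table behind the earlier slides:

```python
# Hypothetical 4-value Weather distribution (numbers are invented).
p_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# 8-entry joint over (cavity, toothache, catch), textbook numbers.
joint_dental = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# By independence, every entry of the 32-element full joint is a
# product of one Weather entry and one dental entry.
full = {(w,) + event: pw * pe
        for w, pw in p_weather.items()
        for event, pe in joint_dental.items()}

# e.g. P(toothache, catch, cavity, Weather = cloudy)
#    = P(Weather = cloudy) * P(cavity, toothache, catch)
print(len(full))                                     # 32
print(round(full[("cloudy", True, True, True)], 5))  # 0.29 * 0.108 = 0.03132
```

So 4 + 8 stored numbers suffice where the unfactored joint would need 32, which is exactly the saving discussed on the next slide.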

Page 7:

Illustration continued

• Allows us to represent a 32-element table for full joint on Weather, Toothache, Catch, Cavity by an 8-element table for the joint of Toothache, Catch, Cavity, and a 4-element table for Weather.

• If we add a Boolean variable X to the joint, the 8-element table would grow to 16 elements; with independence, a separate 2-element table for X suffices instead.

Page 8:

Difficulty with Bayes’ Rule with More than Two Variables

The definition of Bayes' Rule extends naturally to multiple variables:

P(Y1, …, Yn | X1, …, Xm) = P(X1, …, Xm | Y1, …, Yn) P(Y1, …, Yn) / P(X1, …, Xm)

But notice that to apply it we must know conditional probabilities like

P(X1, …, Xm | Y1, …, Yn)

for all 2^m settings of the xs and all settings of the ys (assuming Booleans). Might as well use the full joint.

Page 9:

Conditional Independence

• X and Y are conditionally independent given Z if and only if P(X,Y|Z) = P(X|Z) P(Y|Z).

• Y1,…,Yn are conditionally independent given X1,…,Xm if and only if P(Y1,…,Yn|X1,…,Xm) = P(Y1|X1,…,Xm) P(Y2|X1,…,Xm) … P(Yn|X1,…,Xm).

• We’ve reduced a table of 2^n·2^m entries to n tables totaling 2n·2^m entries. Additional conditional independencies may further reduce the 2^m factor.

Page 10:

Conditional Independence

• As with absolute independence, the equivalent forms of X and Y being conditionally independent given Z can also be used:

P(X|Y, Z) = P(X|Z) and

P(Y|X, Z) = P(Y|Z)

Page 11:

Benefits of Conditional Independence

• Allows probabilistic systems to scale up (tabular representations of full joint distributions quickly become too large).

• Conditional independence is much more commonly available than is absolute independence.

Page 12:

Decomposing a Full Joint by Conditional Independence

• Might assume Toothache and Catch are conditionally independent given Cavity: P(Toothache,Catch|Cavity) = P(Toothache|Cavity) P(Catch|Cavity).

• Then P(Toothache,Catch,Cavity) =[product rule] P(Toothache,Catch|Cavity) P(Cavity) =[conditional independence] P(Toothache|Cavity) P(Catch|Cavity) P(Cavity).
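For the dentist numbers used earlier, this assumption happens to hold exactly, which can be checked directly. A sketch in Python (the helper name is my own, and the full 8-entry joint follows the textbook table):

```python
# Full joint over (cavity, toothache, catch), textbook dentist numbers.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """Marginal probability of the event selected by pred."""
    return sum(p for event, p in joint.items() if pred(event))

p_cav = prob(lambda e: e[0])                     # P(cavity)
p_t_cav = prob(lambda e: e[0] and e[1]) / p_cav  # P(toothache | cavity)
p_c_cav = prob(lambda e: e[0] and e[2]) / p_cav  # P(catch | cavity)

lhs = joint[(True, True, True)]   # P(toothache, catch, cavity) directly
rhs = p_t_cav * p_c_cav * p_cav   # decomposed via conditional independence
print(round(lhs, 3), round(rhs, 3))  # 0.108 0.108
```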

Page 13:

Naive Bayes Algorithm

Let Fi be the i-th feature, having value vj, and let Out be the target feature.

• We can use training data to estimate:

P(Fi = vj)

P(Fi = vj | Out = True)

P(Fi = vj | Out = False)

P(Out = True)

P(Out = False)

Page 14:

Naive Bayes Algorithm

• For a test example described by F1 = v1, ..., Fn = vn, we need to compute:

P(Out = True | F1 = v1 , ..., Fn = vn )

Applying Bayes' rule:

P(Out = True | F1 = v1 , ..., Fn = vn ) =

P(F1 = v1 , ..., Fn = vn | Out = True) P(Out = True)

_______________________________________

P(F1 = v1 , ..., Fn = vn)

Page 15:

Naive Bayes Algorithm

• By the independence assumption:

P(F1 = v1, ..., Fn = vn) = P(F1 = v1) × ... × P(Fn = vn)

• Similarly, by the conditional independence assumption:

P(F1 = v1, ..., Fn = vn | Out = True) =

P(F1 = v1 | Out = True) × ... × P(Fn = vn | Out = True)

Page 16:

Naive Bayes Algorithm

P(Out = True | F1 = v1, ..., Fn = vn) =

P(F1 = v1 | Out = True) × ... × P(Fn = vn | Out = True) × P(Out = True)

_______________________________________

P(F1 = v1) × ... × P(Fn = vn)

• All terms are computed using the training data!

• Works well despite its strong assumptions (see [Domingos and Pazzani, MLJ 1997]) and thus provides a simple benchmark test-set accuracy for a new data set.
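The whole algorithm fits in a few lines. A sketch in Python (the tiny training set and all names are invented; instead of dividing by the product of the P(Fi = vi), this version normalizes the two class scores, an equally common way to handle the denominator that also yields a proper posterior):

```python
from collections import Counter, defaultdict

# Invented toy training data: (features, Out) pairs.
train = [
    ({"F1": "a", "F2": "x"}, True),
    ({"F1": "a", "F2": "y"}, True),
    ({"F1": "a", "F2": "x"}, True),
    ({"F1": "b", "F2": "x"}, False),
    ({"F1": "b", "F2": "y"}, False),
]

def estimate(train):
    """Count-based estimates of P(Out) and P(Fi = v | Out)."""
    class_counts = Counter(out for _, out in train)
    value_counts = defaultdict(Counter)  # (feature, out) -> value counts
    for feats, out in train:
        for f, v in feats.items():
            value_counts[(f, out)][v] += 1
    return class_counts, value_counts

def predict_true(feats, class_counts, value_counts):
    """P(Out = True | feats), normalizing over the two classes."""
    score = {}
    for out in (True, False):
        s = class_counts[out] / sum(class_counts.values())      # P(Out)
        for f, v in feats.items():
            s *= value_counts[(f, out)][v] / class_counts[out]  # P(Fi=v|Out)
        score[out] = s
    return score[True] / (score[True] + score[False])

class_counts, value_counts = estimate(train)
p = predict_true({"F1": "a", "F2": "x"}, class_counts, value_counts)
print(round(p, 3))  # 1.0 here, since F1 = "a" never occurs with Out = False
```

In practice the raw counts are usually smoothed (e.g. Laplace smoothing) so that an unseen feature value does not zero out an entire class score, as it does in this toy run.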

Page 17:

Bayesian Networks: Motivation

• Although the full joint distribution can answer any question about the domain, it can become intractably large as the number of variables grows.

• Specifying probabilities for atomic events is rather unnatural and may be very difficult.

• Use a graphical representation for which we can more easily investigate the complexity of inference and can search for efficient inference algorithms.

Page 18:

Bayesian Networks

• Capture independence and conditional independence where they exist, thus reducing the number of probabilities that need to be specified.

• It represents dependencies among variables and encodes a concise specification of the full joint distribution.

Page 19:

A Bayesian Network is a ...

• Directed Acyclic Graph (DAG) in which …

• … the nodes denote random variables

• … each node X has a conditional probability distribution P(X|Parents(X)).

• The intuitive meaning of an arc from X to Y is that X directly influences Y.

Page 20:

Additional Terminology

• If X and its parents are discrete, we can represent the distribution P(X|Parents(X)) by a conditional probability table (CPT) specifying the probability of each value of X given each possible combination of settings for the variables in Parents(X).

• A conditioning case is a row in this CPT (a setting of values for the parent nodes). Each row must sum to 1.

Page 21:

Bayesian Network Semantics

• A Bayesian Network completely specifies a full joint distribution over its random variables, as below -- this is its meaning.

• P(x1, …, xn) = Π_{i=1..n} P(xi | parents(Xi))

• In the above, P(x1,…,xn) is shorthand notation for P(X1=x1,…,Xn=xn).

Page 22:

Inference Example

• What is the probability that the alarm sounds, but neither a burglary nor an earthquake has occurred, and both John and Mary call?

• Using j for John Calls, a for Alarm, etc.:

P(j, m, a, ¬b, ¬e) =
P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e) =
(0.90)(0.70)(0.001)(0.999)(0.998) ≈ 0.00062
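The product on this slide can be reproduced directly. The CPT values below are the standard ones for the textbook burglary network (P(b) = 0.001, P(e) = 0.002, P(a | ¬b, ¬e) = 0.001, P(j | a) = 0.90, P(m | a) = 0.70); the variable names are my own:

```python
# CPT entries needed for this one event (textbook burglary network).
p_b, p_e = 0.001, 0.002     # P(b), P(e)
p_a_nb_ne = 0.001           # P(a | ¬b, ¬e)
p_j_a, p_m_a = 0.90, 0.70   # P(j | a), P(m | a)

# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = p_j_a * p_m_a * p_a_nb_ne * (1 - p_b) * (1 - p_e)
print(round(p, 6))  # 0.000628
```

The slide truncates the value 0.000628... to 0.00062.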

Page 23:

Chain Rule

• Generalization of the product rule, easily proven by repeated application of the product rule.

• Chain Rule:

P(x1, …, xn)
= P(xn | x_{n-1}, …, x1) P(x_{n-1} | x_{n-2}, …, x1) … P(x2 | x1) P(x1)
= Π_{i=1..n} P(xi | x_{i-1}, …, x1)

Page 24:

Chain Rule and BN Semantics

BN semantics: P(x1, …, xn) = Π_{i=1..n} P(xi | Parents(Xi))

Key Property: P(Xi | X_{i-1}, …, X1) = P(Xi | Parents(Xi)), provided Parents(Xi) ⊆ {X_{i-1}, …, X1}. Says a node is conditionally independent of its predecessors in the node ordering given its parents, and suggests an incremental procedure for network construction.

Page 25:

Example of the Key Property

• The following conditional independence holds:

P(MaryCalls |JohnCalls, Alarm, Earthquake, Burglary) =

P(MaryCalls | Alarm)

Page 26:

Procedure for BN Construction

• Choose relevant random variables.

• While there are variables left:

1. Choose a next variable Xi and add a node for it.

2. Set Parents(Xi) to some minimal set of existing nodes such that the Key Property (previous slide) is satisfied.

3. Define the conditional distribution P(Xi | Parents(Xi)).

Page 27:

Principles to Guide Choices

• Goal: build a locally structured (sparse) network -- each component interacts with a bounded number of other components.

• Add root causes first, then the variables that they influence.

