
S3-SEMINAR ON DATA MINING -BAYESIAN NETWORKS- B. INFERENCE Master Universitario en Inteligencia...

Date post: 14-Dec-2015
Upload: ashanti-farson
C.Bielza, P.Larrañaga -UPM- S3-SEMINAR ON DATA MINING -BAYESIAN NETWORKS- B. INFERENCE Master Universitario en Inteligencia Artificial Concha Bielza, Pedro Larrañaga Computational Intelligence Group Departamento de Inteligencia Artificial Universidad Politécnica de Madrid
Transcript
Page 1: S3-SEMINAR ON DATA MINING -BAYESIAN NETWORKS- B. INFERENCE Master Universitario en Inteligencia Artificial Concha Bielza, Pedro Larrañaga Computational.

S3-SEMINAR ON DATA MINING-BAYESIAN NETWORKS-

B. INFERENCE

Master Universitario en Inteligencia Artificial

Concha Bielza, Pedro Larrañaga

Computational Intelligence Group

Departamento de Inteligencia Artificial

Universidad Politécnica de Madrid

Page 2

Inference in Bayesian networks

Basic concepts

Types of queries

Exact inference: brute-force computation, variable elimination algorithm, message passing algorithm

Approximate inference: probabilistic logic sampling

Page 3

Types of queries

Queries: posterior probabilities

Given some evidence e (observations), answer queries about P

Posterior probability of the target variable(s) X (X may be a vector): P(X|e)

Other names: probability propagation, belief updating or revision…

[Figure: alarm network with nodes Burglary, Earthquake, Alarm, WCalls and News; a "?" marks the queried variable]

Page 4

Semantically, for any kind of reasoning

Predictive or deductive reasoning (causal inference): predict effects, e.g. P(Symptoms|Disease). The target variable is usually a descendant of the evidence

Diagnostic reasoning (diagnostic inference): diagnose the causes, e.g. P(Disease|Symptoms). The target variable is usually an ancestor of the evidence

[Figures: the alarm network (Burglary, Earthquake, Alarm, WCalls, News) drawn once per reasoning type, with a "?" on the target variable]

Page 5

More queries: maximum a posteriori (MAP)

Most likely configurations (abductive inference): the event that best explains the evidence

Total abduction: search for the most probable configuration of all the unobserved variables

Partial abduction: search for the most probable configuration of a subset of the unobserved variables (the explanation set)

K most likely explanations

In general, the MAP configuration cannot be computed component-wise, by maximizing each P(xi|e) separately

[Figures: the alarm network drawn twice, with "?" marks on the variables to be explained]
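The warning about component-wise maximization can be checked on a tiny example. The sketch below assumes a hypothetical joint posterior over two binary variables (the numbers and the names `joint`, `marginal`, `map_config` and `componentwise` are illustrative, not taken from the slides) and compares the joint MAP configuration with the one obtained by maximizing each marginal separately.

```python
# Hypothetical joint posterior P(x1, x2 | e) over two binary variables
# (illustrative numbers, not from the slides).
joint = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}


def marginal(joint, index):
    """Marginal posterior of the variable at position `index`."""
    out = {}
    for config, p in joint.items():
        out[config[index]] = out.get(config[index], 0.0) + p
    return out


# Joint MAP: the single most probable full configuration.
map_config = max(joint, key=joint.get)

# Component-wise: maximize each marginal P(xi | e) on its own.
componentwise = tuple(
    max(marginal(joint, i), key=marginal(joint, i).get) for i in range(2)
)

print(map_config, joint[map_config])
print(componentwise, joint[componentwise])
```

With these numbers the two answers differ: the component-wise choice is a valid configuration but it is less probable than the joint MAP configuration.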

Page 6

More queries: maximum a posteriori (MAP)

Use MAP for:

Classification: find most likely label, given the evidence

Explanation: what is the most likely scenario, given the evidence

Page 7

More queries: decision-making

Optimal decisions (of maximum expected utility), with influence diagrams

Page 8

Exact inference [Pearl’88; Lauritzen & Spiegelhalter’88]

Brute-force computation of P(X|e)

Brute-force approach: first, consider P(Xi), without observed evidence e. Conceptually simple but computationally complex

For a BN with n variables, each with its P(Xj|Pa(Xj)):

$P(X_i) = \sum_{X_1,\dots,X_n \setminus X_i} \prod_{j=1}^{n} P(X_j \mid Pa(X_j))$

But this amounts to computing the JPD (joint probability distribution), which is often very inefficient and even computationally intractable

CHALLENGE: without computing the JPD, exploit the factorization encoded by the BN and the distributive law (local computations)
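As a rough illustration of the brute-force approach, the sketch below enumerates the full JPD of a small network and sums the entries consistent with the evidence. The three-node network B -> A <- E, its numbers, and the names `joint_prob` and `brute_force_posterior` are assumptions made for the example; they do not come from the slides.

```python
from itertools import product

# Hypothetical 3-node BN: B -> A <- E, all binary (illustrative numbers).
parents = {"B": (), "E": (), "A": ("B", "E")}
# cpt[var][parent_values] = P(var = 1 | parent_values)
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "A": {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
}
order = ["B", "E", "A"]


def joint_prob(assign):
    """P(x_1, ..., x_n) as the product of the P(x_j | pa(x_j))."""
    p = 1.0
    for var in order:
        p1 = cpt[var][tuple(assign[q] for q in parents[var])]
        p *= p1 if assign[var] == 1 else 1.0 - p1
    return p


def brute_force_posterior(target, evidence):
    """P(target | evidence) by summing over the whole JPD (exponential cost)."""
    totals = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(order)):
        assign = dict(zip(order, values))
        if all(assign[v] == x for v, x in evidence.items()):
            totals[assign[target]] += joint_prob(assign)
    z = sum(totals.values())
    return {x: p / z for x, p in totals.items()}


prior_a = brute_force_posterior("A", {})        # P(A), no evidence
post_b = brute_force_posterior("B", {"A": 1})   # P(B | A = 1)
```

The loop visits all 2^n configurations, which is exactly the cost that the remaining slides try to avoid.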

Page 9

Improving brute-force

Use the JPD factorization and the distributive law

Table with 32 entries (the JPD), if the variables are binary

Page 10

Improving brute-force

Arrange the computations effectively, moving some additions:

over X5 and X3:

over X4:

The biggest table has 8 entries (like the largest table in the BN)

Page 11

Variable elimination algorithm

Wanted: the distribution of ONE variable Xi

Start with a list of all the functions of the problem

Select an elimination order of all variables (except Xi)

For each Xk in the order, if F is the set of functions that involve Xk:

Compute $f' = \sum_{x_k} \prod_{f \in F} f$ (eliminate Xk = combine all the functions that contain this variable and marginalize out Xk)

Delete F from the list

Add f’ to the list

Output: combination (multiplication) of all functions in the current list
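The elimination loop described on this slide can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: variables are binary, a factor is a hypothetical `(variables, table)` pair, evidence is not handled, and the chain X1 -> X2 -> X3 with its numbers is invented for the example.

```python
from itertools import product


def multiply(f, g):
    """Combine two factors; a factor is a pair (variables, table)."""
    fv, ft = f
    gv, gt = g
    vs = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for vals in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vs, table


def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    table = {}
    for vals, p in ft.items():
        key = tuple(x for v, x in zip(fv, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table


def variable_elimination(factors, order):
    """Eliminate the variables in `order`; return the product of what remains."""
    factors = list(factors)
    for xk in order:
        F = [f for f in factors if xk in f[0]]
        if not F:
            continue
        combined = F[0]
        for f in F[1:]:
            combined = multiply(combined, f)
        f_new = sum_out(combined, xk)                      # compute f'
        factors = [f for f in factors if xk not in f[0]]   # delete F from the list
        factors.append(f_new)                              # add f' to the list
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


# Hypothetical chain X1 -> X2 -> X3 (illustrative numbers).
p_x1 = (("X1",), {(0,): 0.6, (1,): 0.4})
p_x2 = (("X1", "X2"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
p_x3 = (("X2", "X3"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})

vars_out, p_x3_marg = variable_elimination([p_x1, p_x2, p_x3], ["X1", "X2"])
```

Here the largest intermediate table has two variables, whereas the full JPD would have three; on larger networks the gap depends strongly on the elimination order.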

Page 12

Variable elimination algorithm

Repeat the algorithm for each target variable

Page 13

Example with Asia network

Visit to Asia (A)

Smoking (S)

Lung Cancer (L)

Tuberculosis (T)

Tub. or Lung Canc. (E)

Bronchitis (B)

X-Ray (X)

Dyspnea (D)

Page 14

Brute-force approach

Compute P(D) by brute-force:

$P(d) = \sum_{x}\sum_{b}\sum_{e}\sum_{l}\sum_{t}\sum_{s}\sum_{a} P(a, s, t, l, e, b, x, d)$

Complexity is exponential in the number of variables: the JPD of n variables with k states each has k^n entries
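To make the contrast with brute force concrete, here is one way the sum for P(d) can be rearranged. The derivation assumes the arcs of the Asia network as it is usually given (A→T, S→L, S→B, T→E, L→E, E→X, E→D, B→D); the transcript lists only the nodes, so this structure and the particular ordering of the sums are assumptions.

```latex
\begin{align*}
P(a,s,t,l,e,b,x,d) &= P(a)\,P(s)\,P(t \mid a)\,P(l \mid s)\,P(b \mid s)\,
                      P(e \mid t,l)\,P(x \mid e)\,P(d \mid e,b) \\
P(d) &= \sum_{e}\sum_{b} P(d \mid e,b)\,\Bigl[\sum_{x} P(x \mid e)\Bigr]
        \sum_{l}\Bigl[\sum_{s} P(s)\,P(l \mid s)\,P(b \mid s)\Bigr]
        \Bigl[\sum_{t} P(e \mid t,l) \sum_{a} P(a)\,P(t \mid a)\Bigr]
\end{align*}
```

With binary variables, every bracket is computed from a table over at most three variables (8 entries), instead of the 256 entries of the full JPD.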

Page 15

[The worked equations of this slide are missing from the transcript; one surviving annotation reads "not necessarily a probability term"]


Page 17

Variable elimination algorithm

Local computations (due to moving the additions); size = 8

The elimination ordering matters, but finding an optimal (minimum cost) ordering is NP-hard [Arnborg et al.’87]; heuristics are used to find good sequences

Complexity is exponential in the maximum number of variables in the factors of the summation

Page 18

Message passing algorithm

Operates by passing messages among the nodes of the network. Nodes act as processors that receive, calculate and send information. These are called propagation algorithms

Clique tree propagation is based on the same principle as VE, but with a sophisticated caching strategy that:

Makes it possible to compute the posterior probability distributions of all variables in twice the time it takes to compute that of a single variable

Works in an intuitively appealing fashion, namely message propagation

Page 19

Basic operations for a node

Ask info(i,j): target node i asks node j for information. It does so for all its neighbors j, and they do the same until there are no nodes left to ask

Send-message(i,j): each node sends a message to the node that asked it for information, and so on until the target node is reached

A message is defined over the intersection of the domains of fi and fj. The sending node computes it by combining its own function with the messages it has received and marginalizing out the variables outside that intersection

Finally, the computation is local at each node i: the target combines all the received information with its own and marginalizes onto the target variable

Page 20

Procedure for X2

[Figure: CollectEvidence (Ask) step for the target X2]

Page 21

P(X2) as a message passing algorithm

Page 22

VE as a message passing algorithm

Direct correspondence:

[Figure: VE and message passing ("Mess.") shown side by side]

Page 23

Computing P(Xi|e) for all (unobserved) variables at once

We can perform the previous process for each node, but many messages are repeated!

Or, we can use 2 rounds of messages as follows:

Select a node as a root (or pivot)

Ask or collect evidence from the leaves toward the root (messages in downward direction). As in VE

Distribute evidence from the root toward the leaves (messages in upward direction)

Calculate the marginal distribution at each node by local computation, i.e. using its incoming messages

This algorithm never constructs tables larger than those in the BN
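The two rounds of messages can be sketched on a small tree. The code below is a minimal sketch under several assumptions: the network is a hypothetical tree-shaped BN (X1 -> X2, X1 -> X3, X3 -> X4) with binary variables and invented numbers, no evidence is entered, and the helper names (`send`, `collect`, `distribute`, `marginal`) are illustrative rather than the slides' notation.

```python
# Hypothetical tree-shaped BN: X1 -> X2, X1 -> X3, X3 -> X4 (illustrative numbers).
prior = {1: [0.6, 0.4]}                        # P(X1)
cond = {                                       # cond[(parent, child)][xp][xc] = P(xc | xp)
    (1, 2): [[0.7, 0.3], [0.2, 0.8]],
    (1, 3): [[0.9, 0.1], [0.4, 0.6]],
    (3, 4): [[0.5, 0.5], [0.1, 0.9]],
}
nodes = [1, 2, 3, 4]
neighbors = {n: [] for n in nodes}
for p, c in cond:
    neighbors[p].append(c)
    neighbors[c].append(p)


def local(i, xi):
    """Local function of node i: its prior if it has one, otherwise 1."""
    return prior[i][xi] if i in prior else 1.0


def pairwise(i, j, xi, xj):
    """Function shared by i and j, whichever way the arc points."""
    return cond[(i, j)][xi][xj] if (i, j) in cond else cond[(j, i)][xj][xi]


messages = {}


def send(i, j):
    """Message i -> j: combine i's own info with messages from its other neighbors."""
    msg = []
    for xj in (0, 1):
        total = 0.0
        for xi in (0, 1):
            term = local(i, xi) * pairwise(i, j, xi, xj)
            for k in neighbors[i]:
                if k != j:
                    term *= messages[(k, i)][xi]
            total += term
        msg.append(total)
    messages[(i, j)] = msg


def collect(i, parent):
    """First sweep: messages flow from the leaves toward the root."""
    for k in neighbors[i]:
        if k != parent:
            collect(k, i)
            send(k, i)


def distribute(i, parent):
    """Second sweep: messages flow from the root toward the leaves."""
    for k in neighbors[i]:
        if k != parent:
            send(i, k)
            distribute(k, i)


root = 1
collect(root, None)
distribute(root, None)


def marginal(i):
    """Local computation at node i from its incoming messages."""
    belief = [local(i, x) for x in (0, 1)]
    for k in neighbors[i]:
        belief = [b * m for b, m in zip(belief, messages[(k, i)])]
    z = sum(belief)
    return [b / z for b in belief]
```

After the two sweeps each edge carries exactly two messages, one per direction, and `marginal(i)` is available for every node without recomputing anything.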

Page 24

Message passing algorithm

First sweep: CollectEvidence

Second sweep: DistributeEvidence

[Figure: numbered messages on a tree with a marked root node]

Page 25

Networks with loops

If the network is not a polytree, the algorithm does not work

The independence assumptions applied in the algorithm cannot be used here: the property “any node separates the graph into 2 unconnected parts (polytrees)” no longer holds

Requests/messages go around a cycle indefinitely (information travels along 2 paths and is counted twice)

Alternatives?

Page 26

Complexity

The complexity of propagation algorithms in polytrees (i.e., networks without loops, meaning no cycles in the underlying undirected graph) is linear in the size (nodes + arcs) of the network [brute-force is exponential]

Exact inference in multiply-connected BNs is an NP-hard problem [Cooper 1990]

Page 27

Alternative: clustering methods [Lauritzen & Spiegelhalter’88]

Method implemented in the main BN software packages

Transform the BN into a probabilistically equivalent polytree by merging nodes, removing the multiple paths between two nodes

Metastatic cancer (M) is a possible cause of brain tumors (B) and an explanation for increased total serum calcium (S). In turn, either of these could explain a patient falling into a coma (C). Severe headache (H) is also associated with brain tumors.

[Figure: network with arcs M→S, M→B, S→C, B→C and B→H]

Create a new node Z that combines S and B

[Figure: merged network with arcs M→Z, Z→C and Z→H, where Z=S,B]

States of Z: {tt,ft,tf,ff}

P(Z|M)=P(S|M)P(B|M), since S and B are conditionally independent given M

P(H|Z)=P(H|B), since H is conditionally independent of S given B
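The two identities for the merged node can be turned into tables directly. In the sketch below the probabilities are hypothetical placeholders (the slides give no numbers), and the names `p_s`, `p_b`, `p_h`, `p_z` and `p_h_given_z` are illustrative.

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers, not from the slides):
# p_s[m] = P(S=t | M=m), p_b[m] = P(B=t | M=m), p_h[b] = P(H=t | B=b).
p_s = {"t": 0.8, "f": 0.2}
p_b = {"t": 0.2, "f": 0.05}
p_h = {"t": 0.8, "f": 0.6}


def prob(p_true, value):
    """Probability of `value` given the probability of the state 't'."""
    return p_true if value == "t" else 1.0 - p_true


# P(Z | M) with Z = (S, B): product of the two CPTs, because S and B are
# conditionally independent given M. States of Z are written "tt", "tf", "ft", "ff".
p_z = {
    m: {s + b: prob(p_s[m], s) * prob(p_b[m], b) for s, b in product("tf", repeat=2)}
    for m in "tf"
}

# P(H=t | Z) only depends on the B component of Z (second character of the state).
p_h_given_z = {z: p_h[z[1]] for z in p_z["t"]}
```

The merged node has four states, so its tables are larger than those of S and B; this growth is the price paid for removing the loop.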

Page 28

Alternative: clustering methods

Steps for the JUNCTION TREE CLUSTERING ALGORITHM:

1. Moralize the BN

2. Triangulate the moral graph and obtain the cliques

3. Create the junction tree and its separators

4. Compute new parameters

5. Message passing algorithm

Steps 1 to 4 (COMPILATION): transform the BN into a polytree (slow, and memory-hungry if the BN is dense, but done only once)

Step 5: belief updating (fast)
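Step 1 of the junction tree algorithm is simple enough to sketch. The code below moralizes a DAG by connecting every pair of parents that share a child and dropping arc directions. It assumes the arcs of the Asia network as commonly given (the transcript lists only the nodes), and `moralize` is an illustrative helper, not a function from any BN package.

```python
from itertools import combinations

# Arcs of the Asia network as commonly given (assumed; not stated in the transcript).
parents = {
    "A": [], "S": [], "T": ["A"], "L": ["S"], "B": ["S"],
    "E": ["T", "L"], "X": ["E"], "D": ["E", "B"],
}


def moralize(parents):
    """Return the undirected edges of the moral graph."""
    edges = set()
    for child, pas in parents.items():
        for p in pas:
            edges.add(frozenset((p, child)))   # drop the arc direction
        for p, q in combinations(pas, 2):
            edges.add(frozenset((p, q)))       # "marry" parents of a common child
    return edges


moral = moralize(parents)
```

For this structure the moral graph has the 8 original arcs plus the two added edges T-L and E-B; triangulation and clique extraction (step 2) would start from that graph.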

Page 29

Approximate inference

Why?

Because exact inference is intractable (NP-hard) with large (40+ nodes) and densely connected BNs: the associated cliques for the junction tree algorithm or the intermediate factors in the VE algorithm grow in size, generating an exponential blowup in the number of computations performed

Both deterministic and stochastic simulation methods are used to find approximate answers

Page 30

Stochastic simulation

Uses the network to generate a large number of cases (full instantiations) from the network distribution

P(Xi|e) is estimated from these cases by counting observed frequencies in the samples. By the Law of Large Numbers, the estimate converges to the exact probability as more cases are generated

Approximate propagation in BNs within an arbitrary tolerance or accuracy is an NP-hard problem

In practice, if e is not too unlikely, convergence is quick

Page 31

Probabilistic logic sampling [Henrion’88]

A forward sampling algorithm

Given an ancestral ordering of the nodes (parents before children), generate from X once we have generated from its parents (i.e. from the root nodes down to the leaves)

Use the conditional probability given the known values of the parents

When all the nodes have been visited, we have a case, an instantiation of all the nodes in the BN

Repeat and use the observed frequencies to estimate P(Xi|e)

[Figure: network with nodes numbered 1 to 6]
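A minimal sketch of forward sampling with frequency counting follows. It assumes a hypothetical network B -> A <- E with invented numbers, handles evidence by discarding the cases that contradict it (the rejection step usually associated with logic sampling), and uses illustrative names (`sample_case`, `logic_sampling`). If the evidence is so unlikely that no case survives, the final division fails, which mirrors the slow-convergence caveat for unlikely evidence.

```python
import random

# Hypothetical BN B -> A <- E (binary, illustrative numbers), listed in ancestral order.
order = ["B", "E", "A"]
parents = {"B": (), "E": (), "A": ("B", "E")}
cpt = {  # P(var = 1 | parent values)
    "B": {(): 0.1},
    "E": {(): 0.2},
    "A": {(0, 0): 0.01, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
}


def sample_case(rng):
    """Generate one full instantiation, parents before children."""
    case = {}
    for var in order:
        p1 = cpt[var][tuple(case[p] for p in parents[var])]
        case[var] = 1 if rng.random() < p1 else 0
    return case


def logic_sampling(target, evidence, n, seed=0):
    """Estimate P(target | evidence) from the cases consistent with the evidence."""
    rng = random.Random(seed)
    counts = {0: 0, 1: 0}
    for _ in range(n):
        case = sample_case(rng)
        if all(case[v] == x for v, x in evidence.items()):
            counts[case[target]] += 1
    kept = sum(counts.values())
    return {x: c / kept for x, c in counts.items()}, kept


estimate, kept = logic_sampling("B", {"A": 1}, n=100_000)
```

For this toy network the exact value of P(B=1 | A=1) is 0.091 / 0.1522, so the estimate can be compared against it; only the cases with A=1 contribute, which is why rare evidence wastes most samples.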

Pages 32-37

Software

genie.sis.pitt.edu

http.cs.berkeley.edu/~murphyk/

leo.ugr.es/elvira
