Page 1

CIS 530 / 730: Artificial Intelligence
Computing & Information Sciences, Kansas State University

Lecture 25 of 42

Wednesday, 29 October 2008

William H. Hsu

Department of Computing and Information Sciences, KSU

KSOL course page: http://snipurl.com/v9v3

Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730

Instructor home page: http://www.cis.ksu.edu/~bhsu

Reading for Next Class:

Sections 14.3 – 14.5, Russell & Norvig 2nd edition

Graphical Models of Probability 2
Discussion: Distributions, KA & Learning

Page 2

Lecture Outline

Today and Friday’s Reading: Sections 14.3 – 14.5, R&N 2e

Next Week’s Reading: Sections 14.6 – 14.8, Chapter 15

Today: graphical models
• Bayesian networks and causality

• Inference and learning

• BNJ interface (http://bnj.sourceforge.net)

• Causality

Page 3

A Graphical View of Simple (Naïve) Bayes

xi ∈ {0, 1} for each i ∈ {1, 2, …, n}; y ∈ {0, 1}

Given: P(xi | y) for each i ∈ {1, 2, …, n}; P(y)

Assume conditional independence

∀ i ∈ {1, 2, …, n}: P(xi | x1, x2, …, xi-1, xi+1, xi+2, …, xn, y) = P(xi | y)

• NB: this assumption entails the Naïve Bayes assumption

• Why?

Can compute P(y | x) given this info

Can also compute the joint pdf over all n + 1 variables

Inference Problem for a (Simple) Bayesian Network
• Use the above model to compute the probability of any conditional event

Exercise: P(x1, x2, y | x3, x4)

Using Graphical Models

[Figure: Naïve Bayes graphical model — class node y with children x1, x2, x3, …, xn; edges labeled P(x1 | y), P(x2 | y), P(x3 | y), …, P(xn | y)]

P(x1, x2, …, xn | y) = ∏i P(xi | x1, …, xi-1, xi+1, …, xn, y) = ∏i P(xi | y)

P(x, y) = P(y) P(x | y) = P(y) ∏i=1..n P(xi | x1, …, xi-1, xi+1, …, xn, y) = P(y) ∏i=1..n P(xi | y)
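To make the factored form concrete, here is a minimal Python sketch (not part of the original slides) that evaluates the Naïve Bayes joint P(x, y) and the posterior P(y | x) for binary variables; all CPT values are made-up placeholders.

```python
# Minimal Naive Bayes sketch (illustrative only; CPT values are made-up placeholders).
# Variables: y in {0, 1}, x_i in {0, 1} for i = 1..n.

p_y = {0: 0.4, 1: 0.6}                       # P(y)
p_x_given_y = [                              # p_x_given_y[i][y] = P(x_{i+1} = 1 | y)
    {0: 0.2, 1: 0.7},                        # P(x1 = 1 | y)
    {0: 0.5, 1: 0.1},                        # P(x2 = 1 | y)
    {0: 0.9, 1: 0.3},                        # P(x3 = 1 | y)
    {0: 0.4, 1: 0.8},                        # P(x4 = 1 | y)
]

def joint(x, y):
    """P(x1, ..., xn, y) = P(y) * prod_i P(x_i | y) under the Naive Bayes assumption."""
    p = p_y[y]
    for i, xi in enumerate(x):
        p_xi1 = p_x_given_y[i][y]
        p *= p_xi1 if xi == 1 else (1.0 - p_xi1)
    return p

def posterior_y(x):
    """P(y | x), obtained by normalizing the joint over y."""
    unnorm = {y: joint(x, y) for y in (0, 1)}
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()}

print(posterior_y((1, 0, 1, 1)))
```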

Page 4

In-Class Exercise: Probabilistic Inference

Inference Problem for a (Simple) Bayesian Network

Model: Naïve Bayes

Objective: compute the probability of any conditional event

Exercise

Given

• P(xi | y), i ∈ {1, 2, 3, 4}

• P(y)

Want: P(x1, x2, y | x3, x4)

P(x1, x2, y | x3, x4) = P(x3, x4 | x1, x2, y) P(x1, x2, y) / P(x3, x4)

                      = P(x1, x2, x3, x4, y) / P(x3, x4)

                      = P(y) P(x1 | y) P(x2 | y) P(x3 | y) P(x4 | y) / P(x3, x4)

where P(x3, x4) = Σy P(y) P(x3 | y) P(x4 | y)
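One way to sanity-check the exercise numerically is brute-force enumeration of the joint; the hedged sketch below conditions on x3, x4 by summing the joint over the remaining variables, again with arbitrary placeholder CPTs.

```python
from itertools import product

# Placeholder CPTs for the exercise (arbitrary values, binary variables).
p_y = {0: 0.4, 1: 0.6}
p_x_given_y = [  # p_x_given_y[i][y] = P(x_{i+1} = 1 | y), i = 0..3
    {0: 0.2, 1: 0.7}, {0: 0.5, 1: 0.1}, {0: 0.9, 1: 0.3}, {0: 0.4, 1: 0.8},
]

def joint(x, y):
    """P(x1, x2, x3, x4, y) under the Naive Bayes factorization."""
    p = p_y[y]
    for i, xi in enumerate(x):
        q = p_x_given_y[i][y]
        p *= q if xi == 1 else 1.0 - q
    return p

def query(x1, x2, y, x3, x4):
    """P(x1, x2, y | x3, x4) = P(x1, x2, x3, x4, y) / P(x3, x4)."""
    numerator = joint((x1, x2, x3, x4), y)
    denominator = sum(joint((a, b, x3, x4), c)           # P(x3, x4): sum out x1, x2, y
                      for a, b, c in product((0, 1), repeat=3))
    return numerator / denominator

print(query(1, 0, 1, x3=1, x4=0))
```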

Page 5

Unsupervised Learning and Conditional Independence

Given: (n + 1)-tuples (x1, x2, …, xn, xn+1)
• No notion of instance variable or label

After seeing some examples, want to know something about the domain

• Correlations among variables

• Probability of certain events

• Other properties

Want to Learn: Most Likely Model that Generates Observed Data
• In general, a very hard problem

• Under certain assumptions, we have shown that we can do it (see the estimation sketch after the figure below)

Assumption: Causal Markovity
• Conditional independence among “effects”, given “cause”

When is the assumption appropriate?

Can it be relaxed?

Structure Learning
• Can we learn more general probability distributions?

Examples: automatic speech recognition (ASR), natural language, etc.

[Figure: Naïve Bayes graphical model (repeated) — class node y with children x1, x2, x3, …, xn; edges labeled P(xi | y)]
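As referenced above, under the causal Markovity assumption with a fixed Naïve-Bayes-style structure, "learning the most likely model" reduces to maximum-likelihood estimation of P(y) and each P(xi | y) by counting. A minimal sketch, with a made-up set of (n + 1)-tuples:

```python
from collections import Counter

# Made-up (n + 1)-tuples; the last element plays the role of the "cause" y.
data = [
    (1, 0, 1, 1), (0, 0, 1, 0), (1, 1, 0, 1),
    (1, 0, 0, 1), (0, 1, 1, 0), (1, 0, 1, 1),
]

n = len(data[0]) - 1
y_counts = Counter(row[-1] for row in data)
p_y = {y: c / len(data) for y, c in y_counts.items()}          # MLE of P(y)

# MLE of P(x_i = 1 | y) = count(x_i = 1, y) / count(y)
p_x1_given_y = [
    {y: sum(1 for row in data if row[i] == 1 and row[-1] == y) / y_counts[y]
     for y in y_counts}
    for i in range(n)
]

print(p_y)
print(p_x1_given_y)
```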

Page 6

Polytrees aka singly-connected Bayesian networks

Definition: a Bayesian network with no undirected loops

Idea: restrict distributions (CPTs) to single nodes

Theorem: inference in singly-connected BBN requires linear time

• Linear in network size, including CPT sizes

• Much better than for unrestricted (multiply-connected) BBNs

Tree Dependent Distributions
• Further restriction of polytrees: every node has at most one parent (see the sketch after the figure below)

Now only need to keep 1 prior, P(root), and n - 1 CPTs (1 per non-root node)

All CPTs are 2-dimensional: P(child | parent)

Independence Assumptions
• As for general BBN: x is independent of non-descendants given its (single) parent z

Very strong assumption (applies in some domains but not most)

[Figure: tree-dependent distribution — a rooted tree with nodes root, …, z, x, illustrating the single-parent structure]
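A minimal sketch of the storage claim above: one prior for the root plus one 2-dimensional CPT per non-root node, with the joint evaluated in a single pass over the n - 1 edges. The tree and the numbers below are made up for illustration.

```python
# Made-up tree over binary variables: root -> a, root -> b, a -> c.
parent = {"a": "root", "b": "root", "c": "a"}          # n - 1 edges
p_root = {0: 0.3, 1: 0.7}                              # P(root)
cpt = {                                                # cpt[v][(1, parent_val)] = P(v = 1 | parent)
    "a": {(1, 0): 0.2, (1, 1): 0.9},
    "b": {(1, 0): 0.6, (1, 1): 0.4},
    "c": {(1, 0): 0.5, (1, 1): 0.1},
}

def joint(assignment):
    """P(root, a, b, c) = P(root) * prod_v P(v | parent(v)); one factor per node."""
    p = p_root[assignment["root"]]
    for v, pa in parent.items():
        q1 = cpt[v][(1, assignment[pa])]
        p *= q1 if assignment[v] == 1 else 1.0 - q1
    return p

print(joint({"root": 1, "a": 1, "b": 0, "c": 1}))
```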

Page 7

Propagation Algorithm in Singly-Connected Bayesian Networks – Pearl (1983)

[Figure: example singly-connected network with nodes C1–C6, illustrating messages passed between parents and children]

Upward (child-to-parent) λ messages: λ’(Ci) modified during the message-passing phase

Downward (parent-to-child) π messages: π’(Ci) computed during the message-passing phase

Adapted from Neapolitan (1990), Guo (2000)

Multiply-connected case: exact and approximate inference are #P-complete

(counting problem is #P-complete iff decision problem is NP-complete)
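To give a flavor of the λ/π message passing, here is a sketch of the smallest possible case, a two-node network X → Y with evidence Y = 1: the child sends the upward message λY(x) = Σy P(y | x) λ(y), and the parent combines it with its prior π(x) = P(x). This is only the two-node special case with made-up numbers, not Pearl's full algorithm.

```python
# Two-node polytree X -> Y, binary variables; values are made-up placeholders.
p_x = {0: 0.6, 1: 0.4}                      # pi(x) = P(x) for a root node
p_y_given_x = {(0, 0): 0.9, (1, 0): 0.1,    # p_y_given_x[(y, x)] = P(y | x)
               (0, 1): 0.3, (1, 1): 0.7}

evidence_y = 1
lam_y = {y: 1.0 if y == evidence_y else 0.0 for y in (0, 1)}   # lambda(y) from evidence

# Upward (child-to-parent) message: lambda_Y(x) = sum_y P(y | x) * lambda(y)
lam_msg = {x: sum(p_y_given_x[(y, x)] * lam_y[y] for y in (0, 1)) for x in (0, 1)}

# Belief at X: BEL(x) is proportional to pi(x) * lambda_Y(x)
unnorm = {x: p_x[x] * lam_msg[x] for x in (0, 1)}
z = sum(unnorm.values())
bel_x = {x: v / z for x, v in unnorm.items()}
print(bel_x)   # posterior P(X | Y = 1)
```

With these numbers the belief shifts toward X = 1 (about 0.82), since Y = 1 is much more likely under X = 1 than under X = 0.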

Page 8

Inference by Clustering [1]: Graph Operations (Moralization, Triangulation, Maximal Cliques)

Adapted from Neapolitan (1990), Guo (2000)

[Figure: worked example on a Bayesian network (acyclic digraph) over nodes A–H — first Moralize, then Triangulate using the node ordering A1, B2, E3, C4, G5, F6, H7, D8, then Find Maximal Cliques: Clq1 = {A, B}, Clq2 = {B, E, C}, Clq3 = {E, C, G}, Clq4 = {E, G, F}, Clq5 = {C, G, H}, Clq6 = {C, D}]
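The moralization step is easy to state in code: connect ("marry") every pair of parents of each node, then drop edge directions. In the sketch below, the parent sets for the A–H example are reconstructed from the CPT factors listed on the clique-tree slide two slides ahead (P(A), P(B|A), P(C|B,E), P(D|C), P(E|F), P(F), P(G|F), P(H|C,G)), so treat them as a reading of the figure rather than ground truth.

```python
from itertools import combinations

# Parent sets reconstructed from the CPT factors on the clique-tree slide.
parents = {
    "A": [], "B": ["A"], "C": ["B", "E"], "D": ["C"],
    "E": ["F"], "F": [], "G": ["F"], "H": ["C", "G"],
}

def moralize(parents):
    """Return the undirected moral graph: parent-child edges plus edges 'marrying' co-parents."""
    edges = set()
    for child, pas in parents.items():
        for pa in pas:                          # keep (now undirected) parent-child edges
            edges.add(frozenset((pa, child)))
        for u, v in combinations(pas, 2):       # marry every pair of parents
            edges.add(frozenset((u, v)))
    return edges

for e in sorted(moralize(parents), key=sorted):
    print(sorted(e))
```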

Page 9

Inference by Clustering [2]: Junction Tree – Lauritzen & Spiegelhalter (1988)

Input: list of cliques of triangulated, moralized graph Gu

Output:
• Tree of cliques
• Separator nodes Si
• Residual nodes Ri and potential probability Ψ(Clqi) for all cliques

Algorithm:

1. Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ … ∪ Clqi-1)

2. Ri = Clqi − Si

3. If i > 1, identify a j < i such that Clqj is a parent of Clqi

4. Assign each node v to a unique clique Clqi such that {v} ∪ c(v) ⊆ Clqi

5. Compute Ψ(Clqi) = ∏ f(v) over the nodes v assigned to Clqi, where f(v) = P(v | c(v)); Ψ(Clqi) = 1 if no v is assigned to Clqi

6. Store Clqi, Ri, Si, and Ψ(Clqi) at each vertex in the tree of cliques

Adapted from Neapolitan (1990), Guo (2000)
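Steps 1–2 of the construction are straightforward to code; the sketch below applies them to the clique ordering from the A–H example and reproduces the Si and Ri sets shown on the next slide.

```python
# Ordered maximal cliques from the A-H example (previous slide).
cliques = [
    {"A", "B"},            # Clq1
    {"B", "E", "C"},       # Clq2
    {"E", "C", "G"},       # Clq3
    {"E", "G", "F"},       # Clq4
    {"C", "G", "H"},       # Clq5
    {"C", "D"},            # Clq6
]

separators, residuals = [], []
seen = set()
for clq in cliques:
    s = clq & seen                  # S_i = Clq_i intersected with (Clq_1 u ... u Clq_{i-1})
    separators.append(s)
    residuals.append(clq - s)       # R_i = Clq_i - S_i
    seen |= clq

for i, (s, r) in enumerate(zip(separators, residuals), start=1):
    print(f"Clq{i}: S = {sorted(s)}, R = {sorted(r)}")
```

Running it prints S1 = {}, R1 = {A, B}, S2 = {B}, R2 = {C, E}, and so on, matching the table on the following slide.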

Page 10

Inference by Clustering [3]: Clique-Tree Operations

Ri: residual nodes; Si: separator nodes; Ψ(Clqi): potential probability of clique i

Clique   Nodes        Ri        Si        Potential Ψ(Clqi)
Clq1     {A, B}       {A, B}    {}        P(B|A) P(A)
Clq2     {B, E, C}    {C, E}    {B}       P(C|B,E)
Clq3     {E, C, G}    {G}       {E, C}    1
Clq4     {E, G, F}    {F}       {E, G}    P(E|F) P(G|F) P(F)
Clq5     {C, G, H}    {H}       {C, G}    P(H|C,G)
Clq6     {C, D}       {D}       {C}       P(D|C)

[Figure: tree of cliques — AB, BEC, ECG, EGF, CGH, CD, joined by separator nodes B, EC, EG, CG, C]

Adapted from Neapolitan (1990), Guo (2000)

Page 11

Inference by Loop Cutset Conditioning

Split vertex in undirected cycle; condition upon each of its state values

Number of network instantiations: product of the arities of the nodes in the minimal loop cutset (see the sketch after the figure below)

Posterior: marginal conditioned upon cutset variable values

Deciding Optimal Cutset: NP-hard

Current Open Problems
• Bounded cutset conditioning: ordering heuristics
• Finding randomized algorithms for loop cutset optimization

[Figure: example multiply-connected network — X1 Age (with values X1,1: [0, 10), X1,2: [10, 20), …, X1,10: [100, ∞)), X2 Gender, X3 Exposure-To-Toxins, X4 Smoking, X5 Cancer, X6 Serum Calcium, X7 Lung Tumor]
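As noted above, the number of network instantiations is just the product of the arities of the cutset nodes; below is a small sketch with a made-up cutset whose arities echo the figure (a 10-valued Age node plus binary nodes).

```python
from itertools import product
from math import prod

# Made-up loop cutset; arities echo the figure (10 Age bins, binary Gender/Smoking).
cutset_arity = {"Age": 10, "Gender": 2, "Smoking": 2}

# Number of network instantiations = product of the arities of the cutset nodes.
print(prod(cutset_arity.values()))          # 40

# Cutset conditioning enumerates one simplified (singly-connected) network per
# assignment of the cutset variables; the posterior is a weighted combination
# over these assignments.
assignments = list(product(*(range(k) for k in cutset_arity.values())))
print(len(assignments))                     # also 40, one instantiation per assignment
```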

Page 12

BNJ Visualization [2]: Pseudo-Code Annotation (Code Page)

© 2004 KSU BNJ Development Team

ALARM Network

Page 13

BNJ Visualization [3]: Network

© 2004 KSU BNJ Development Team

Poker Network

Page 14

Inference by Variable Elimination [1]: Intuition

Adapted from slides by S. Russell, UC Berkeley http://aima.cs.berkeley.edu/

Page 15

Inference by Variable Elimination [2]: Factoring Operations

Adapted from slides by S. Russell, UC Berkeley http://aima.cs.berkeley.edu/
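Since these two slides are figure-only in this transcript, here is a hedged sketch of the two factoring operations that variable elimination rests on: pointwise product of factors and summing out a variable. Factors are represented as (variable list, table) pairs over binary values; the CPT numbers are placeholders.

```python
from itertools import product

# A factor: (variables, table), where table maps a tuple of values (one per variable) to a number.
def make_factor(variables, table):
    return (tuple(variables), dict(table))

def factor_product(f, g):
    """Pointwise product of two factors over the union of their variables (binary values here)."""
    fv, ft = f
    gv, gt = g
    variables = tuple(dict.fromkeys(fv + gv))        # union of variables, preserving order
    table = {}
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        table[values] = (ft[tuple(assignment[v] for v in fv)]
                         * gt[tuple(assignment[v] for v in gv)])
    return (variables, table)

def sum_out(var, f):
    """Eliminate `var` from factor f by summing over its values."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    table = {}
    for values, p in ft.items():
        key = tuple(val for v, val in zip(fv, values) if v != var)
        table[key] = table.get(key, 0.0) + p
    return (keep, table)

# Placeholder CPTs: P(A) and P(B | A); eliminating A yields the marginal P(B).
f_a = make_factor(["A"], {(0,): 0.4, (1,): 0.6})
f_b_given_a = make_factor(["A", "B"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
p_b = sum_out("A", factor_product(f_a, f_b_given_a))
print(p_b)   # ('B',) with {(0,): 0.48, (1,): 0.52}
```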

Page 16

Genetic Wrapper for Change of Representation and Inductive Bias Control

[Figure: wrapper architecture — D (training data) and an inference specification feed [1] Genetic Algorithm, which proposes a candidate representation α; [2] Representation Evaluator for Learning Problems scores α using Dtrain (inductive learning) and Dval (inference), returning the representation fitness f(α); the loop outputs an optimized representation α̂]

Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning

Page 17

Tools for Building Graphical Models

Commercial Tools: Ergo, Netica, TETRAD, Hugin

Bayes Net Toolbox (BNT) – Murphy (1997-present)
• Distribution page: http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
• Development group: http://groups.yahoo.com/group/BayesNetToolbox

Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
• Distribution page: http://bnj.sourceforge.net
• Development group: http://groups.yahoo.com/group/bndev

Current (re)implementation projects for KSU KDD Lab
• Continuous state: Minka (2002) – Hsu, Guo, Li
• Formats: XML BNIF (MSBN), Netica – Barber, Guo
• Space-efficient DBN inference – Meyer
• Bounded cutset conditioning – Chandak

Page 18

References [1]: Graphical Models and Inference Algorithms

Graphical Models
• Bayesian (Belief) Networks tutorial – Murphy (2001): http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
• Learning Bayesian Networks – Heckerman (1996, 1999): http://research.microsoft.com/~heckerman

Inference Algorithms
• Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988), http://citeseer.nj.nec.com/huang94inference.html
• (Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989), http://citeseer.nj.nec.com/shachter94global.html
• Variable Elimination (Bucket Elimination, ElimBel): Dechter (1986), http://citeseer.nj.nec.com/dechter96bucket.html

Recommended Books
• Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
• Castillo, Gutierrez & Hadi (1997)
• Cowell, Dawid, Lauritzen & Spiegelhalter (1999)

Stochastic Approximation: http://citeseer.nj.nec.com/cheng00aisbn.html

Page 19

References [2]: Machine Learning, KDD, and Bioinformatics

Machine Learning, Data Mining, and Knowledge Discovery
• K-State KDD Lab: literature survey and resource catalog (1999-present): http://www.kddresearch.org/Resources
• Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer, Thornton (2002-present): http://bnj.sourceforge.net
• Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002): http://mldev.sourceforge.net

Bioinformatics
• European Bioinformatics Institute Tutorial: Brazma et al. (2001): http://www.ebi.ac.uk/microarray/biology_intro.htm
• Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002): http://www.cs.huji.ac.il/labs/compbio/
• K-State BMI Group: literature survey and resource catalog (2002-2005): http://www.kddresearch.org/Groups/Bioinformatics

Page 20

Terminology

Introduction to Reasoning under Uncertainty
• Probability foundations
• Definitions: subjectivist, frequentist, logicist
• Kolmogorov axioms (3)

Bayes’s Theorem
• Prior probability of an event
• Joint probability of an event
• Conditional (posterior) probability of an event

Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
• MAP hypothesis: highest conditional probability given observations (data)
• ML hypothesis: highest likelihood of generating the observed data
• ML estimation (MLE): estimating parameters to find the ML hypothesis
(a small numeric illustration follows this list)

Bayesian Inference: Computing Conditional Probabilities (CPs) in A Model

Bayesian Learning: Searching Model (Hypothesis) Space using CPs
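A small numeric illustration of the MAP vs. ML distinction above, with two made-up hypotheses about a coin and a made-up sequence of flips; the prior is chosen so that the two criteria disagree.

```python
# Two made-up hypotheses about a coin's bias, with a prior favouring h_fair.
hypotheses = {"h_fair": 0.5, "h_biased": 0.9}      # P(heads | h)
prior = {"h_fair": 0.7, "h_biased": 0.3}           # P(h)

data = ["H", "H", "H", "T", "H"]                   # observed flips (made up)

def likelihood(h):
    """P(data | h) for independent flips."""
    p = hypotheses[h]
    out = 1.0
    for flip in data:
        out *= p if flip == "H" else 1.0 - p
    return out

# ML hypothesis maximizes P(D | h); MAP hypothesis maximizes P(D | h) * P(h).
h_ml = max(hypotheses, key=likelihood)
h_map = max(hypotheses, key=lambda h: likelihood(h) * prior[h])
print(h_ml, h_map)
```

With these numbers the ML hypothesis is h_biased (likelihood 0.9^4 · 0.1 ≈ 0.066 versus 0.5^5 ≈ 0.031), while the prior flips the MAP choice back to h_fair.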

Page 21

Summary Points

Introduction to Probabilistic Reasoning
• Framework: using probabilistic criteria to search the hypothesis space H

Probability foundations
• Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist

Kolmogorov axioms

Bayes’s Theorem
• Definition of conditional (posterior) probability

Product rule

Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
• Bayes’s Rule and MAP

Uniform priors: allow use of MLE to generate MAP hypotheses

Relation to version spaces, candidate elimination

Next Week: Chapter 14, Russell and Norvig
Later: Bayesian learning – MDL, BOC, Gibbs, Simple (Naïve) Bayes

Categorizing text and documents, other applications

