
Probabilistic Graphical Models: Structure learning in Bayesian networks
Siamak Ravanbakhsh, Fall 2019

Learning objectives

why structure learning is hard
two approaches to structure learning: constraint-based methods and score-based methods
MLE vs. Bayesian score

Structure learning in BayesNets

family of methods:
constraint-based methods: estimate conditional independencies from the data, then find compatible BayesNets
score-based methods: search over the combinatorial space of 2^{O(n²)} structures, maximizing a score
Bayesian model averaging: integrate over all possible structures

Structure learning in BayesNets

the structure is identifiable only up to I-equivalence: a DAG with the same set of conditional independencies (CI) as the data, I(G) = I(p_D), is a perfect map (P-map)
constraint-based methods estimate conditional independencies from the data by hypothesis testing (X ⊥ Y ∣ Z?) and find compatible BayesNets
first attempt: a DAG that is an I-map for p_D, i.e., I(G) ⊆ I(p_D)

Minimal I-map from CI tests

input: CI test oracle; an ordering X_1, …, X_n
output: a minimal I-map G

for i = 1 … n:
    find a minimal U ⊆ {X_1, …, X_{i-1}} such that X_i ⊥ ({X_1, …, X_{i-1}} \ U) ∣ U
    set Pa_{X_i} ← U

this enforces X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i}
a minimal I-map is a DAG in which removing any edge violates the I-map property (a sketch of this procedure follows)
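To make the procedure concrete, here is a minimal Python sketch against a hypothetical oracle ci(x, ys, zs) that returns True when x is independent of the set ys given zs on the data; the oracle, the function name minimal_imap, and the size-ordered search are illustrative assumptions, not part of the slides.

from itertools import combinations

def minimal_imap(variables, ci):
    """variables gives the ordering X_1, ..., X_n; returns {X_i: set of parents}."""
    parents = {}
    for i, x in enumerate(variables):
        predecessors = variables[:i]
        # try candidate parent sets U in order of increasing size; the first U that
        # makes x independent of the remaining predecessors is kept (smallest such U)
        for k in range(len(predecessors) + 1):
            found = None
            for u in combinations(predecessors, k):
                rest = [v for v in predecessors if v not in u]
                if not rest or ci(x, rest, list(u)):
                    found = set(u)
                    break
            if found is not None:
                parents[x] = found
                break
    return parents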

Problems:
CI tests involve many variables
the number of CI tests is exponential
a minimal I-map may be far from a P-map
different orderings give different graphs

Example: the orderings D,I,S,G,L (a topological ordering), L,S,G,I,D, and L,D,S,I,G produce different minimal I-maps

Structure learning in BayesNets

recall: identifiable only up to I-equivalence, I(G) = I(p_D)
first attempt: a DAG that is an I-map for p_D, I(G) ⊆ I(p_D)
can we find a perfect map with fewer CI tests, each involving fewer variables?
second attempt: a DAG that is a P-map for p_D

Perfect map from CI tests

recoverable only up to I-equivalence: the same set of CIs, i.e., the same skeleton and the same immoralities

procedure:
1. find the undirected skeleton using CI tests
2. identify the immoralities in the undirected graph
3. propagate the orientation constraints

Perfect map from CI tests: 1. finding the undirected skeleton

observation: if X and Y are not adjacent, then X ⊥ Y ∣ Pa_X OR X ⊥ Y ∣ Pa_Y
assumption: the max number of parents is d
idea: search over all conditioning subsets of size ≤ d and check the CI above

input: CI oracle; bound d on the number of parents
output: undirected skeleton
initialize H as the complete undirected graph
for all pairs X_i, X_j:
    for all subsets U of size ≤ d (within the current neighbors of X_i, X_j):
        if X_i ⊥ X_j ∣ U, remove X_i − X_j from H
return H

cost: O(n²) pairs × O((n−2)^d) subsets per pair = O(n^{d+2}) CI tests (a sketch of this phase follows)
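A minimal sketch of this skeleton-recovery phase, assuming a hypothetical pairwise oracle ci(x, y, zs) that returns True iff x ⊥ y ∣ zs on the data; the names skeleton, adj, and sepset are illustrative, and the separating sets are saved for step 2.

from itertools import combinations

def skeleton(variables, ci, d):
    """Return adjacency sets and the separating set found for each removed edge."""
    adj = {x: set(variables) - {x} for x in variables}   # start from the complete graph
    sepset = {}                                          # remembers U for the immorality step
    for x in variables:
        for y in list(adj[x]):
            if y <= x:                                   # test each unordered pair once
                continue
            # candidate conditioning sets: subsets of current neighbors, size <= d
            candidates = (adj[x] | adj[y]) - {x, y}
            removed = False
            for k in range(d + 1):
                for u in combinations(sorted(candidates), k):
                    if ci(x, y, list(u)):
                        adj[x].discard(y); adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(u)
                        removed = True
                        break
                if removed:
                    break
    return adj, sepset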

Perfect map from CI tests: 2. finding the immoralities

potential immorality: X − Z, Y − Z ∈ H and X − Y ∉ H
it is not an immorality only if every separating set U with X ⊥ Y ∣ U satisfies Z ∈ U
in practice: save the set U used when removing X − Y in step 1 and check whether Z ∈ U; if not, we have the immorality X → Z ← Y (a sketch follows)
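A minimal sketch of this step, reusing the adj and sepset structures returned by the skeleton sketch above; both data structures are assumptions of these sketches, not the slides' notation.

def orient_immoralities(adj, sepset):
    """Return the set of directed edges (parent, child) for every detected v-structure."""
    directed = set()
    for z in adj:
        neighbors = sorted(adj[z])
        for i, x in enumerate(neighbors):
            for y in neighbors[i + 1:]:
                if y in adj[x]:                  # X and Y adjacent: not a potential immorality
                    continue
                u = sepset.get(frozenset((x, y)), set())
                if z not in u:                   # Z missing from the separating set => X -> Z <- Y
                    directed.add((x, z))
                    directed.add((y, z))
    return directed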

Perfect map from CI tests: 3. propagate the constraints

at this point we have a mix of directed and undirected edges
add directions using the rules R1, R2, R3 (needed to preserve the immoralities and keep the graph acyclic), repeating until convergence
with exact CI tests, this recovers exactly the I-equivalence family

Example: ground-truth DAG → undirected skeleton + immoralities → remaining edges oriented using rules R1, R2, R3 (a sketch of the propagation loop follows)
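The rules R1, R2, R3 referenced on the slides are not reproduced in this transcript, so the sketch below substitutes two of the standard Meek orientation rules as a stand-in; adj is the (symmetric) skeleton adjacency and directed the set of orientations from the previous sketches, both assumptions of these sketches.

def propagate(adj, directed):
    """Apply R1/R2 repeatedly until no undirected edge can be oriented; returns `directed`."""
    def undirected(a, b):
        return b in adj[a] and (a, b) not in directed and (b, a) not in directed
    changed = True
    while changed:
        changed = False
        for b in adj:
            for c in list(adj[b]):
                if not undirected(b, c):
                    continue
                # R1: a -> b, b - c, and a, c not adjacent  =>  orient b -> c (no new immorality)
                r1 = any((a, b) in directed and c not in adj[a] for a in adj)
                # R2: b -> a -> c and b - c  =>  orient b -> c (otherwise a directed cycle)
                r2 = any((b, a) in directed and (a, c) in directed for a in adj)
                if r1 or r2:
                    directed.add((b, c))
                    changed = True
    return directed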

Conditional independence (CI) test

how to decide X ⊥ Y ∣ Z from the dataset D?

measure the deviance of p_D(X, Y ∣ Z) from p_D(X ∣ Z) p_D(Y ∣ Z):
conditional mutual information: d_I(D) = E_{p_D(Z)}[ D( p_D(X, Y ∣ Z) ∥ p_D(X ∣ Z) p_D(Y ∣ Z) ) ]
χ² statistic: d_{χ²}(D) = ∣D∣ Σ_{x,y,z} ( p_D(x, y, z) − p_D(z) p_D(x ∣ z) p_D(y ∣ z) )² / ( p_D(z) p_D(x ∣ z) p_D(y ∣ z) )
both are computed using frequencies in the dataset

a large deviance d(D) > t rejects the null hypothesis (of conditional independence); pick a threshold t
the p-value is the probability of false rejection, taken over all possible datasets:
pvalue(t) = P( { D : d(D) > t } ∣ X ⊥ Y ∣ Z )
the distribution of the deviance measure can be derived (e.g., a χ² distribution); reject the hypothesis (CI) for small p-values, e.g., below .05 (a sketch of such a test follows)
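A minimal sketch of the χ² CI test for discrete data, implementing the deviance d_{χ²}(D) above; the pandas DataFrame layout, the column arguments, and the .05 level are assumptions for illustration.

import pandas as pd
from scipy.stats import chi2

def ci_test_chi2(data, x, y, z, alpha=0.05):
    """Return True if X ⊥ Y | Z cannot be rejected at level alpha (z: list of column names)."""
    stat, dof = 0.0, 0
    groups = data.groupby(z) if z else [(None, data)]
    for _, block in groups:
        counts = pd.crosstab(block[x], block[y])               # N_{x,y} within this z-configuration
        n_z = counts.values.sum()
        expected = counts.sum(1).values[:, None] * counts.sum(0).values[None, :] / n_z
        mask = expected > 0
        stat += (((counts.values - expected) ** 2)[mask] / expected[mask]).sum()
        dof += (counts.shape[0] - 1) * (counts.shape[1] - 1)
    p_value = chi2.sf(stat, max(dof, 1))                       # tail probability of the deviance under CI
    return p_value >= alpha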

Structure learning in BayesNets

constraint-based methods (above): estimate conditional independencies from the data, find compatible BayesNets
score-based methods (next): search over the combinatorial space of structures, maximizing a score
Bayesian model averaging: integrate over all possible structures

Mutual information

how much information does X encode about Y? the reduction in the uncertainty of X after observing Y

I(X, Y) = H(X) − H(X ∣ Y) = H(Y) − H(Y ∣ X)   (symmetric: I(X, Y) = I(Y, X))
conditional entropy: H(Y ∣ X) = Σ_x p(x) H( p(Y ∣ X = x) )
I(X, Y) = Σ_{x,y} p(x, y) log( p(x, y) / ( p(x) p(y) ) ) = D_KL( p(x, y) ∥ p(x) p(y) ), hence nonnegative
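A small numeric companion to these definitions: mutual information computed from an empirical joint table (the 2×2 counts below are made-up illustrative data).

import numpy as np

def mutual_information(joint):
    """I(X,Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) ), with 0 log 0 treated as 0."""
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

counts = np.array([[40., 10.],
                   [10., 40.]])        # hypothetical co-occurrence counts for binary X, Y
print(mutual_information(counts))      # positive: X and Y are dependent in this table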

MLE in Bayes-nets: mutual information form

log-likelihood:
ℓ(D; θ) = Σ_{x ∈ D} Σ_i log p(x_i ∣ pa_{x_i}; θ_{x_i ∣ pa_i})
        = Σ_i Σ_{(x_i, pa_{x_i}) ∈ D} log p(x_i ∣ pa_{x_i}; θ_{x_i ∣ pa_i})
        = N Σ_i Σ_{x_i, pa_{x_i}} p_D(x_i, pa_{x_i}) log p(x_i ∣ pa_{x_i}; θ_{x_i ∣ pa_i})   (using the empirical distribution p_D)

plug in the MLE estimate θ*:
ℓ(D; θ*) = N Σ_i Σ_{x_i, pa_{x_i}} p_D(x_i, pa_{x_i}) log p_D(x_i ∣ pa_{x_i})
         = N Σ_i Σ_{x_i, pa_{x_i}} p_D(x_i, pa_{x_i}) [ log( p_D(x_i, pa_{x_i}) / ( p_D(x_i) p_D(pa_{x_i}) ) ) + log p_D(x_i) ]
         = N Σ_i ( I_D(X_i, Pa_{X_i}) − H_D(X_i) )   (by the definitions of mutual information and entropy, under the empirical distribution)

(a numeric check of this decomposition follows)
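A quick numeric sanity check, on made-up data, that the log-likelihood of a Bayes-net with MLE parameters equals N Σ_i ( I_D(X_i, Pa_{X_i}) − H_D(X_i) ), for the tiny structure A → B; all names and the count table are illustrative assumptions.

import numpy as np

counts = np.array([[30., 5.],
                   [10., 55.]])                     # hypothetical joint counts over (A, B)
N = counts.sum()
p_ab = counts / N
p_a = p_ab.sum(axis=1, keepdims=True)
p_b = p_ab.sum(axis=0, keepdims=True)

# direct log-likelihood under MLE parameters: sum_a N_a log p(a) + sum_{a,b} N_{a,b} log p(b|a)
ll_direct = (counts.sum(1) * np.log(p_a.ravel())).sum() + (counts * np.log(p_ab / p_a)).sum()

# mutual-information form: N * [ (I(A, Pa_A=∅) - H(A)) + (I(B, A) - H(B)) ], with I(A, ∅) = 0
H = lambda p: -(p[p > 0] * np.log(p[p > 0])).sum()
I_ab = (p_ab * np.log(p_ab / (p_a @ p_b))).sum()
ll_mi_form = N * ((0.0 - H(p_a.ravel())) + (I_ab - H(p_b.ravel())))

print(np.isclose(ll_direct, ll_mi_form))            # True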

Optimal solution for trees

likelihood score: ℓ(D, θ*) = N Σ_i I_D(X_i, Pa_{X_i}) − N Σ_i H_D(X_i)
the entropy term does not depend on the structure
in a tree each node has at most one parent, so each term is I_D(X_i, X_j), and I_D(X_j, X_i) = I_D(X_i, X_j)

structure-learning algorithms use mutual information in the structure search
Chow-Liu algorithm: find the maximum spanning tree with edge weights equal to the pairwise mutual information
add directions to the edges later; make sure each node has at most one parent (i.e., no v-structures) (a sketch follows)
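A minimal sketch of the Chow-Liu procedure: pairwise empirical mutual information as edge weights, a maximum spanning tree, then edges oriented away from an arbitrary root so every node has at most one parent. The pandas DataFrame of discrete columns and the use of networkx are assumptions of this sketch.

import numpy as np
import pandas as pd
import networkx as nx

def empirical_mi(data, a, b):
    """Empirical mutual information between two discrete columns."""
    p_ab = pd.crosstab(data[a], data[b], normalize=True).values
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    return float((p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])).sum())

def chow_liu(data):
    g = nx.Graph()
    cols = list(data.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            g.add_edge(a, b, weight=empirical_mi(data, a, b))
    tree = nx.maximum_spanning_tree(g)                      # undirected tree skeleton
    return list(nx.bfs_tree(tree, cols[0]).edges())         # orient edges away from the first column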

Bayesian score for BayesNets

be Bayesian about both the structure G and the parameters θ:
P(G ∣ D) ∝ P(D ∣ G) P(G)
score_B(G, D) = log P(D ∣ G) + log P(G)

marginal likelihood of a structure: P(D ∣ G) = ∫_{θ ∈ Θ_G} P(D ∣ θ, G) P(θ ∣ G) dθ
assuming local and global parameter independence, it factorizes into the marginal likelihood of each node
for the Dirichlet-multinomial model it has a closed form

for large sample size (for any exponential-family member):
score_B(G, D) ≈ ℓ(D, θ*_G) − (log ∣D∣ / 2) K, with K the number of parameters: the Bayesian Information Criterion (BIC)
the Akaike Information Criterion (AIC) instead uses the penalty K: ℓ(D, θ*_G) − K

Bayesian score for BayesNets

Example: the Bayesian score is biased towards simpler structures
(figure: Bayesian score of two structures G_1 and G_2 as a function of the sample size ∣D∣)
(figure: data sampled from the ICU-Alarm BayesNet; Bayesian score of the true model (509 params.) and of simplified models (359 and 214 params.) as a function of ∣D∣)
(a sketch of the BIC computation follows)
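A minimal sketch of the BIC score for a candidate structure over discrete data; data is assumed to be a pandas DataFrame and parents a dict {node: list of parent columns}. This is the large-sample BIC approximation above, not the closed-form Dirichlet-multinomial score.

import numpy as np
import pandas as pd

def family_loglik(data, node, pars):
    """N * sum p_D(x, pa) log p_D(x | pa) for one family, from empirical counts."""
    if pars:
        counts = data.groupby(pars)[node].value_counts().unstack(fill_value=0).values
    else:
        counts = data[node].value_counts().values[None, :]
    row_tot = counts.sum(axis=1, keepdims=True)
    nz = counts > 0
    return float((counts[nz] * np.log((counts / row_tot)[nz])).sum())

def bic_score(data, parents):
    score = 0.0
    for node, pars in parents.items():
        k_node = data[node].nunique() - 1                                     # free params per parent config
        q_node = int(np.prod([data[p].nunique() for p in pars])) if pars else 1
        score += family_loglik(data, node, pars) - 0.5 * np.log(len(data)) * k_node * q_node
    return score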

Structure search

arg max_G Score(D, G) is NP-hard
use heuristic search algorithms (as discussed for MAP inference)

local search using: edge addition, edge deletion, edge reversal
O(n²) possible moves at each step
for each move: collect the sufficient statistics (frequencies) and estimate the score
use the decomposability of the score: only the families affected by a move need to be re-scored

example: the ICU-Alarm network (a local-search sketch follows)
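A minimal greedy hill-climbing sketch over DAG structures using the bic_score sketch above and its parents-dict representation; for brevity the move set is only edge addition and deletion (no reversal), and the whole network is re-scored per candidate rather than only the affected families.

import itertools

def is_acyclic(parents):
    """Detect directed cycles by DFS over the parent pointers."""
    seen, stack = set(), set()
    def visit(v):
        if v in stack: return False
        if v in seen: return True
        stack.add(v)
        ok = all(visit(p) for p in parents[v])
        stack.discard(v); seen.add(v)
        return ok
    return all(visit(v) for v in parents)

def hill_climb(data, score=bic_score):
    parents = {c: [] for c in data.columns}              # start from the empty graph
    best = score(data, parents)
    improved = True
    while improved:
        improved = False
        for x, y in itertools.permutations(data.columns, 2):
            cand = {k: list(v) for k, v in parents.items()}
            if x in cand[y]:
                cand[y].remove(x)                        # candidate move: delete edge x -> y
            else:
                cand[y].append(x)                        # candidate move: add edge x -> y
            if is_acyclic(cand):
                s = score(data, cand)
                if s > best:                             # greedy first-improvement acceptance
                    best, parents, improved = s, cand, True
    return parents, best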

Summary

structure learning is NP-hard; make assumptions to simplify:
constraint-based methods: limit the max number of parents, rely on CI tests, identify the I-equivalence class
score-based methods: restrict to tree structures (solved exactly by Chow-Liu), or use a Bayesian score with heuristic search, which finds a locally optimal structure

