
# Mean Field Inference in Dependency Networks: An Empirical Study Daniel Lowd and Arash Shamaei...

Date post: 28-Dec-2015
Mean Field Inference in Dependency Networks: An Empirical Study Daniel Lowd and Arash Shamaei University of Oregon
Transcript

Slide 1

Mean Field Inference in Dependency Networks: An Empirical Study
Daniel Lowd and Arash Shamaei
University of Oregon

Slide 2

Learning and Inference in Graphical Models

We want to learn a probability distribution from data and use it to answer queries.

Applications: medical diagnosis, fault diagnosis, web usage analysis, bioinformatics, collaborative filtering, etc.

[Diagram labels: Data, Learning, Model (A, B, C), Inference, Answers!]

Slide 3

One-Slide Summary

- In dependency networks, mean field inference is faster than Gibbs sampling, with similar accuracy.
- Dependency networks are competitive with Bayesian networks.

[Diagram labels: Data, Learning, Model (A, B, C), Inference, Answers!]

If you're using DNs, try MF.
If you're using BNs, try DNs.

Slide 4

Outline

- Graphical models: Dependency networks vs. others
  - Representation
  - Learning
  - Inference
- Mean field inference in dependency networks
- Experiments

Slide 5

Dependency Networks

Represents a probability distribution over {X1, ..., Xn} as a set of conditional probability distributions.

Example:

[Figure: example network over X1, X2, X3]

[Heckerman et al., 2000]

Slide 6

Comparison of Graphical Models

| | Bayesian Network | Markov Network | Dependency Network |
|---|---|---|---|
| Allow cycles? | N | Y | Y |
| Easy to learn? | Y | N | Y |
| Consistent distribution? | Y | Y | N |
| Inference algorithms | lots | lots | Gibbs, MF (new!) |

Slide 7

Learning Dependency Networks

For each variable Xi, learn its conditional distribution given the other variables.

[Figure: decision trees with splits on B and C, each with true/false branches]

[Heckerman et al., 2000]

Slide 8

Approximate Inference Methods

- Gibbs sampling: Slow but effective
- Mean field: Fast and usually accurate
- Belief propagation: Fast and usually accurate

[Diagram labels: Model (A, B, C), Answers!]

Slide 9

Gibbs Sampling

Resample each variable in turn, given its neighbors.

Use the set of samples to answer queries.

Converges to the true distribution, given enough samples (assuming a positive distribution).
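The resampling loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-variable network, its conditional probabilities, and the names `CONDITIONALS` and `gibbs_marginals` are all hypothetical.

```python
import random

# Hypothetical toy dependency network over three binary variables.
# Each function returns P(X_i = True | current values of the other variables).
# The probabilities are made up for illustration.
CONDITIONALS = {
    "A": lambda s: 0.8 if s["B"] else 0.3,
    "B": lambda s: 0.7 if (s["A"] and s["C"]) else 0.4,
    "C": lambda s: 0.6 if s["B"] else 0.2,
}

def gibbs_marginals(conditionals, evidence, n_samples=5000, burn_in=500, seed=0):
    """Estimate P(X_i = True | evidence) for every non-evidence variable."""
    rng = random.Random(seed)
    # Evidence variables are clamped; the rest start at random values.
    state = {v: evidence.get(v, rng.random() < 0.5) for v in conditionals}
    query_vars = [v for v in conditionals if v not in evidence]
    counts = {v: 0 for v in query_vars}
    for step in range(burn_in + n_samples):
        for v in query_vars:
            # Resample v from its conditional, given the current values of the others.
            state[v] = rng.random() < conditionals[v](state)
        if step >= burn_in:
            for v in query_vars:
                counts[v] += state[v]
    # The fraction of samples in which v is True estimates its marginal.
    return {v: counts[v] / n_samples for v in query_vars}

marginals = gibbs_marginals(CONDITIONALS, evidence={"C": True})
print(marginals)
```

The marginal of each query variable is estimated as the fraction of retained samples in which it is true, so accuracy depends directly on how many samples are drawn.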

Previously, the only method used to compute probabilities in DNs.

Slide 10

Mean Field

Approximate P with a simpler distribution Q.

To find the best Q, optimize the reverse K-L divergence.

Mean field updates converge to a local optimum.
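In standard textbook notation, the three statements above correspond to a fully factorized Q, the reverse K-L divergence from Q to P, and a coordinate-wise update. The symbols $Q_i$, $X_{-i}$ (all variables except $X_i$), and $Z_i$ (a normalizing constant) are notational assumptions and may differ from the slide's own formulas.

```latex
\begin{align}
Q(X) &= \prod_{i=1}^{n} Q_i(X_i) \\
\mathrm{KL}(Q \,\|\, P) &= \sum_{x} Q(x) \log \frac{Q(x)}{P(x)} \\
Q_i(x_i) &\leftarrow \frac{1}{Z_i} \exp\!\Big( \mathbb{E}_{Q}\big[ \log P(x_i \mid X_{-i}) \big] \Big)
\end{align}
```

The update is written in terms of the conditional $P(x_i \mid X_{-i})$ because $\log P(x) = \log P(x_i \mid x_{-i}) + \log P(x_{-i})$, and the second term does not depend on $x_i$, so it is absorbed into $Z_i$. This form needs only the conditional distributions, which is exactly what a dependency network provides.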

Works for DNs! Never before tested!

Slide 11

Mean Field in Dependency Networks

- Initialize each Q(Xi) to a uniform distribution.
- Update each Q(Xi) in turn.
- Stop when the marginals Q(Xi) converge.
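The three steps above (initialize, update in turn, stop at convergence) can be sketched as follows. This is an illustrative sketch, not the authors' code: the toy network and the names `CONDITIONALS`, `expected_log_conditional`, and `mean_field` are hypothetical, and the expectation is computed by brute-force enumeration, which is only practical for very small networks.

```python
import itertools
import math

# Hypothetical toy DN over three binary variables (probabilities made up).
# Each function returns P(X_i = True | values of the other variables).
CONDITIONALS = {
    "A": lambda s: 0.8 if s["B"] else 0.3,
    "B": lambda s: 0.7 if (s["A"] and s["C"]) else 0.4,
    "C": lambda s: 0.6 if s["B"] else 0.2,
}

def expected_log_conditional(var, value, conditionals, q):
    """E_Q[log P(var = value | others)], enumerating all states of the others."""
    others = [v for v in conditionals if v != var]
    total = 0.0
    for assignment in itertools.product([False, True], repeat=len(others)):
        state = dict(zip(others, assignment))
        # Probability of this joint state under the factorized Q.
        weight = math.prod(q[v] if x else 1.0 - q[v] for v, x in state.items())
        if weight == 0.0:
            continue  # state is ruled out by clamped evidence
        p_true = conditionals[var](state)
        total += weight * math.log(p_true if value else 1.0 - p_true)
    return total

def mean_field(conditionals, evidence, tol=1e-6, max_iters=100):
    """Return q[v] = Q(v = True); evidence variables are clamped to 0 or 1."""
    q = {v: (1.0 if evidence[v] else 0.0) if v in evidence else 0.5
         for v in conditionals}
    for _ in range(max_iters):
        largest_change = 0.0
        for v in conditionals:
            if v in evidence:
                continue
            log_true = expected_log_conditional(v, True, conditionals, q)
            log_false = expected_log_conditional(v, False, conditionals, q)
            # Normalize exp(log_true) against exp(log_false).
            new_q = 1.0 / (1.0 + math.exp(log_false - log_true))
            largest_change = max(largest_change, abs(new_q - q[v]))
            q[v] = new_q
        if largest_change < tol:
            break  # marginals have converged
    return q

q = mean_field(CONDITIONALS, evidence={"C": True})
print(q)
```

Unlike Gibbs sampling, every step here is deterministic, so the loop stops as soon as the marginals stop changing rather than after a fixed sample budget.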

If the DN is consistent, this is guaranteed to converge.
If it is inconsistent, this always seems to converge in practice.

Slide 12

Empirical Questions

Q1. In DNs, how does MF compare to Gibbs sampling in speed and accuracy?

Q2. How do DNs compare to BNs in inference speed and accuracy?

Slide 13

Experiments

- Learned DNs and BNs on 12 datasets
- Generated queries from test data
- Varied evidence variables from 10% to 90%
- Scored using average CMLL (conditional marginal log-likelihood) per variable
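One plausible reading of this score, assuming CMLL sums the log of the marginal probability that inference assigns to each query variable's true value given the evidence, is sketched below. The function name `average_cmll` and the example numbers are hypothetical.

```python
import math

def average_cmll(marginals, true_values):
    """Average log marginal probability assigned to each query variable's true value.

    marginals:   dict mapping variable -> estimated P(variable = True | evidence)
    true_values: dict mapping variable -> observed bool value in the test example
    """
    total = 0.0
    for var, value in true_values.items():
        p = marginals[var] if value else 1.0 - marginals[var]
        total += math.log(p)
    return total / len(true_values)

# Made-up marginals for two query variables and their true test values.
score = average_cmll({"A": 0.9, "B": 0.2}, {"A": True, "B": False})
print(score)
```

Under this reading the score is always at most zero, and values closer to zero indicate more accurate marginals.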

Slide 14

Results: Accuracy in DNs

[Chart; axis label: Negative CMLL]

Slide 15

Results: Timing in DNs (log scale)

DN faster than BN, for same inference method.
MF significantly faster than Gibbs, and sometimes significantly faster than BP.

Slide 16

MF vs. Gibbs in DNs, run for equal time

| Evidence | # of MF wins | % wins |
|---|---|---|
| 10% | 9 | 75% |
| 20% | 10 | 83% |
| 30% | 10 | 83% |
| 40% | 9 | 75% |
| 50% | 10 | 83% |
| 60% | 10 | 83% |
| 70% | 11 | 92% |
| 80% | 11 | 92% |
| 90% | 12 | 100% |
| Average | 10.2 | 85% |

In DNs, MF usually more accurate, given equal time.

Slide 17

Results: Accuracy

Slide 18

Gibbs: DN vs. BN

| Evidence | DN wins | Percent wins |
|---|---|---|
| 10% | 3 | 25% |
| 20% | 1 | 8% |
| 30% | 1 | 8% |
| 40% | 3 | 25% |
| 50% | 5 | 42% |
| 60% | 7 | 58% |
| 70% | 10 | 83% |
| 80% | 10 | 83% |
| 90% | 11 | 92% |
| Average | 5.7 | 47% |

With more evidence, DNs are more accurate.
(Running time faster in DN.)

Slide 19

Experimental Results

Q1. In DNs, how does MF compare to Gibbs sampling in speed and accuracy?
A1. MF is consistently faster with similar accuracy, or more accurate with similar speed.

Q2. How do DNs compare to BNs in inference speed and accuracy?
A2. DNs are competitive with BNs: better with more evidence, worse with less evidence.

Slide 20

Conclusion

MF inference in DNs is fast and accurate, especially with more evidence.

Future work:
- Relational dependency networks (Neville & Jensen, 2007)
- More powerful approximations

Source code available: http://libra.cs.uoregon.edu/

Slide 21

Results: Timing (log scale)

DN faster than BN, for same inference method.
MF significantly faster than Gibbs, and sometimes significantly faster than BP.

Slide 22

Learned Models

- Learning time is comparable.
- DNs usually have higher pseudo-likelihood (PLL)
- DNs sometimes have higher log-likelihood (LL)

Table 2: Learning time (in seconds), complexity (number of parameters), test set pseudo-log-likelihood (PLL), and test set log-likelihood (LL) of learned models. Reported error range is one standard deviation of the mean. Standard errors of less than 0.0005 are reported as 0.000. When differences in PLL or LL are statistically significant, the better result is in bold.

| Data set | DN Time | DN Cmplx. | DN PLL | DN LL | BN Time | BN Cmplx. | BN PLL | BN LL |
|---|---|---|---|---|---|---|---|---|
| NLTCS | 1.3 | 875 | -0.311 ± 0.004 | -0.376 ± 0.005 | 1.2 | 130 | -0.314 ± 0.004 | -0.378 ± 0.005 |
| MSNBC | 38.8 | 6790 | -0.254 ± 0.001 | -0.355 ± 0.001 | 155.9 | 2016 | -0.254 ± 0.001 | -0.354 ± 0.001 |
| KDDCup 2000 | 27.1 | 4018 | -0.032 ± 0.000 | -0.034 ± 0.001 | 12.3 | 577 | -0.033 ± 0.000 | -0.034 ± 0.001 |
| Plants | 8.3 | 3482 | -0.131 ± 0.002 | -0.192 ± 0.003 | 21.8 | 3167 | -0.136 ± 0.002 | -0.185 ± 0.003 |
| Audio | 14.7 | 3641 | -0.387 ± 0.003 | -0.405 ± 0.005 | 26.4 | 1802 | -0.398 ± 0.003 | -0.404 ± 0.005 |
| Netflix | 19.0 | 4003 | -0.549 ± 0.002 | -0.573 ± 0.002 | 28.9 | 1433 | -0.557 ± 0.002 | -0.572 ± 0.002 |
| Jester | 12.7 | 3033 | -0.523 ± 0.002 | -0.539 ± 0.003 | 17.0 | 1081 | -0.540 ± 0.003 | -0.538 ± 0.003 |
| MSWeb | 31.5 | 6313 | -0.029 ± 0.000 | -0.033 ± 0.001 | 24.3 | 1608 | -0.029 ± 0.000 | -0.033 ± 0.001 |
| Book | 20.6 | 4675 | -0.071 ± 0.002 | -0.072 ± 0.003 | 11.8 | 1877 | -0.075 ± 0.003 | -0.074 ± 0.004 |
| WebKB | 49.8 | 5193 | -0.180 ± 0.004 | -0.191 ± 0.006 | 41.1 | 2774 | -0.186 ± 0.004 | -0.190 ± 0.006 |
| Reuters-52 | 122.0 | 8247 | -0.092 ± 0.002 | -0.106 ± 0.003 | 126.3 | 5156 | -0.097 ± 0.002 | -0.102 ± 0.003 |
| 20 Newsgroups | 470.6 | 14526 | -0.171 ± 0.002 | -0.172 ± 0.003 | 495.2 | 6350 | -0.177 ± 0.002 | -0.172 ± 0.003 |
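The learning-time columns of Table 2 can be tallied directly to see on how many datasets each model type learns faster. The short script below copies the two Time columns from the table; the names `LEARNING_TIMES` and `dn_faster` are introduced here for illustration only.

```python
# (DN learning time, BN learning time) in seconds, copied from Table 2.
LEARNING_TIMES = {
    "NLTCS": (1.3, 1.2),
    "MSNBC": (38.8, 155.9),
    "KDDCup 2000": (27.1, 12.3),
    "Plants": (8.3, 21.8),
    "Audio": (14.7, 26.4),
    "Netflix": (19.0, 28.9),
    "Jester": (12.7, 17.0),
    "MSWeb": (31.5, 24.3),
    "Book": (20.6, 11.8),
    "WebKB": (49.8, 41.1),
    "Reuters-52": (122.0, 126.3),
    "20 Newsgroups": (470.6, 495.2),
}

# Datasets on which the dependency network was learned faster.
dn_faster = [name for name, (dn, bn) in LEARNING_TIMES.items() if dn < bn]
print(len(dn_faster), "of", len(LEARNING_TIMES))  # prints "7 of 12"
```

The tally agrees with the count of 7 out of 12 datasets stated in the results text.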

Results

Table 2 summarizes the BN and DN models learned on each dataset. Learning DNs is faster than learning BNs in 7 out of 12 datasets. Therefore, DN learning time is at least comparable to BNs. For pseudo-log-likelihood, DNs are significantly better on 10 out of 12 datasets according to a paired t-test (p
