transcript
- Slide 1
- Modelling Relational Statistics With Bayes Nets. School of
Computing Science, Simon Fraser University, Vancouver, Canada.
Tianxiang Gao, Yuke Zhu.
- Slide 2
- Class-Level and Instance-Level Queries. Classic AI research
distinguished two types of probabilistic relational queries
(Halpern 1990; Bacchus 1990).
  - Class-level queries (relational statistics, Type 1 probabilities), each with its reference class:
    - What is the percentage of flying birds? Reference class: birds.
    - What is the percentage of friendship pairs where both are women? Reference class: pairs of friends.
    - What is the percentage of A grades awarded to highly intelligent students? Reference class: student-course pairs where the student is registered in the course.
  - Instance-level queries (ground facts, Type 2 probabilities):
    - Given that Tweety is a bird, what is the probability that Tweety flies?
    - Given that Sam and Hilary are friends, and given the genders of their other friends, what is the probability that Sam and Hilary are both women?
    - What is the probability that Jack is highly intelligent given his grades?
  - References: Halpern, An analysis of first-order logics of probability, AI Journal 1990. Bacchus, Representing and reasoning with probabilistic knowledge, MIT Press 1990.
- Slide 3
- Visualizing Class-Level Probability. Percentage of flying birds =
90%. Halpern: the probability that a typical or random bird flies is
90%. Syntactic distinction: a class-level query contains some free
variables, e.g. P(Flies(B)) = ?; an instance-level query contains no
free variables, e.g. P(Flies(tweety)) = ?.
- Slide 4
- Applications of Class-Level Modelling.
  - First-order rule learning (e.g., intelligent students take difficult courses).
  - Strategic planning (e.g., increase SAT requirements to decrease student attrition).
  - Query optimization (Getoor, Taskar, Koller 2001). Class-level queries support selectivity estimation, and hence the choice of an optimal evaluation order for an SQL query.
  - Reference: Getoor, Lise, Taskar, Benjamin, and Koller, Daphne. Selectivity estimation using probabilistic models. ACM SIGMOD Record, 30(2):461-472, 2001.
- Slide 5
- No Grounding Semantics for Class-Level Queries. Unrolling
(grounding) the template yields a network model of individual
entities. It contains no classes, so one cannot ask class-level
queries.
  - Class-level template with 1st-order variables: intelligence(S), diff(C), Registered(S,C).
  - Instance-level model with domain(S) = {jack, jane} and domain(C) = {100, 200}: intelligence(jack), intelligence(jane), diff(100), diff(200), Registered(jack,100), Registered(jack,200), Registered(jane,100), Registered(jane,200).
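The unrolling step on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary encoding of the template and the helper name `ground` are hypothetical, while the node names and domains are the ones shown on the slide.

```python
from itertools import product

# Class-level template: functor name -> the first-order variables it takes.
template = {
    "intelligence": ("S",),
    "diff": ("C",),
    "Registered": ("S", "C"),
}

# Domains from the slide's example.
domains = {"S": ["jack", "jane"], "C": [100, 200]}


def ground(template, domains):
    """Unroll each template node into one ground node per variable binding."""
    nodes = []
    for functor, variables in template.items():
        for binding in product(*(domains[v] for v in variables)):
            args = ",".join(str(b) for b in binding)
            nodes.append(f"{functor}({args})")
    return nodes


ground_nodes = ground(template, domains)
print(ground_nodes)
```

Every node in the result names specific individuals, which is why the grounded model by itself has no way to express a query about a class such as "students" or "courses".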
- Slide 6
- Previous Work: Probabilistic Queries in Statistical-Relational
Learning.
  - Class-level: Statistical-Relational Models (Getoor, Taskar, Koller 2001).
  - Instance-level: many model types, including Probabilistic Relational Models, Markov Logic Networks, Bayes Logic Programs, and Logical Bayesian Networks.
- Slide 7
- New Unified Approach.
  - Class-level: Parametrized Bayes Nets + new class-level semantics.
  - Instance-level: Parametrized Bayes Nets + combining rules (Poole 2003) + log-linear model (Khosravi, Schulte et al. 2010; Schulte and Khosravi 2012).
  - References: David Poole, First-Order Probabilistic Inference, IJCAI 2003. H. Khosravi, O. Schulte, T. Man, X. Xu, and B. Bina, Structure learning for Markov logic networks with many descriptive attributes, in AAAI, 2010. O. Schulte and H. Khosravi, Learning graphical models for relational data via lattice search, Machine Learning, 2012.
- Slide 8
- Random Selection Semantics: Example. Apply the random selection
semantics for probabilistic 1st-order logic (Halpern 1990; Bacchus
1990) to the template with nodes intelligence(S), diff(C), and
Registered(S,C). The statement P(intelligence(S) = hi, diff(C) = hi,
Registered(S,C) = true) = 20% means: if we randomly select a student
and a course, then the probability is 20% that the student is
registered in the course, and that the intelligence of the student
and the difficulty of the course are both high.
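As a minimal sketch of this reading, the class-level probability can be computed as a frequency over all (student, course) pairs. The toy data and the helper name `class_level_prob` are hypothetical and chosen only for illustration; they do not reproduce the 20% figure on the slide.

```python
from itertools import product

# Hypothetical toy database (not from the slides).
intelligence = {"jack": "hi", "jane": "lo"}
diff = {100: "hi", 200: "lo"}
registered = {("jack", 100), ("jack", 200), ("jane", 200)}


def class_level_prob(event):
    """Probability that `event` holds for a uniformly random (student, course) pair."""
    pairs = list(product(intelligence, diff))
    return sum(1 for s, c in pairs if event(s, c)) / len(pairs)


# P(intelligence(S) = hi, diff(C) = hi, Registered(S,C) = true)
p = class_level_prob(
    lambda s, c: intelligence[s] == "hi"
    and diff[c] == "hi"
    and (s, c) in registered
)
print(p)
```

Only the pair (jack, 100) satisfies all three conditions out of four possible pairs, so the frequency here is one quarter.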
- Slide 9
- Computing Parameter Estimates (I). Use conditional database
probabilities as Bayes net parameters. This maximizes the random
selection pseudo-likelihood (Schulte 2011). For database
probabilities in which all relationships are true, use SQL or Virtual
Join (Yin, Han et al. 2004).
  - References: Schulte, O. A tractable pseudo-likelihood function for Bayes nets applied to relational data. SIAM SDM, 2011. Yin, X., Han, J., et al. CrossMine: Efficient Classification Across Multiple Database Relations. Constraint-Based Mining and Inductive Databases, 2004.
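A minimal sketch of reading one conditional database probability off an SQL join, using Python's built-in `sqlite3` module. The three-table schema (Student, Course, Registered) and the toy rows are assumptions, not taken from the slides. Which conditional actually serves as a Bayes net parameter depends on the learned structure; the family diff(C) given intelligence(S) and Registered(S,C) is picked here only for illustration.

```python
import sqlite3

# Hypothetical schema and rows, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid TEXT PRIMARY KEY, intelligence TEXT);
CREATE TABLE Course (cid INTEGER PRIMARY KEY, diff TEXT);
CREATE TABLE Registered (sid TEXT, cid INTEGER);
INSERT INTO Student VALUES ('jack', 'hi'), ('jane', 'lo');
INSERT INTO Course VALUES (100, 'hi'), (200, 'lo');
INSERT INTO Registered VALUES ('jack', 100), ('jack', 200), ('jane', 200);
""")

# Frequency of diff(C) = hi among registered pairs whose student has
# intelligence(S) = hi, i.e. an estimate of
# P(diff(C) = hi | intelligence(S) = hi, Registered(S,C) = true).
query = """
SELECT AVG(CASE WHEN c.diff = 'hi' THEN 1.0 ELSE 0.0 END)
FROM Registered r
JOIN Student s ON s.sid = r.sid
JOIN Course c ON c.cid = r.cid
WHERE s.intelligence = 'hi'
"""
(p_diff_hi,) = conn.execute(query).fetchone()
print(p_diff_hi)
```

Because every condition involves only true relationships, a plain join over the existing tables is enough and no complement table is needed.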
- Slide 10
- Computing Parameter Estimates (II). How to compute database
probabilities for negated relations, e.g., the number of U.S. users
who are not friends? Materializing complement tables does not scale.
For a single false relation, use the 1-minus trick (Getoor et al.
2007). General case: a new application of the fast Möbius transform
(Kennes and Smets 1990).
  - References: Getoor, Lise, Friedman, Nir, Koller, Daphne, Pfeffer, Avi, and Taskar, Benjamin. Probabilistic relational models, 2007. Kennes, Robert and Smets, Philippe. Computational aspects of the Möbius transformation. In UAI, 1990.
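The idea behind the single-relation case can be sketched as a subtraction: the count for the false relationship equals the count with no condition on the relationship minus the count where it is true. The toy data below are hypothetical, and the sketch illustrates the general idea rather than the exact formulation in Getoor et al.

```python
from itertools import product

# Hypothetical toy database (not from the slides).
intelligence = {"jack": "hi", "jane": "lo"}
courses = [100, 200]
registered = {("jack", 100), ("jane", 100), ("jane", 200)}

# Pairs with intelligence(S) = hi and no condition on Registered(S,C):
# a product of table sizes, no join required.
n_hi_students = sum(1 for v in intelligence.values() if v == "hi")
count_unconditioned = n_hi_students * len(courses)

# Pairs with intelligence(S) = hi and Registered(S,C) = true:
# only the existing Registered table is scanned.
count_true = sum(1 for s, _ in registered if intelligence[s] == "hi")

# Pairs with Registered(S,C) = false follow by subtraction.
count_false = count_unconditioned - count_true

# Brute-force check that enumerates the complement explicitly.
brute_force = sum(
    1
    for s, c in product(intelligence, courses)
    if intelligence[s] == "hi" and (s, c) not in registered
)
print(count_false, brute_force)
```

The brute-force line is there only to check the subtraction; enumerating all non-registered pairs is exactly the complement table that the slide calls unscalable.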
- Slide 11
- The Möbius Parametrization. For two link types R1 and R2.
[Figure: two tables over R1 and R2, each with a Count(*) column. One
table holds the joint probabilities; the other holds the Möbius
parameters, in which a relationship is either true or carries no
condition.]
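A minimal sketch of how joint probabilities over two link types can be recovered from Möbius parameters, assuming the parameters are probabilities in which each relationship is either true ("T") or left unspecified ("*"). The function name `mobius_to_joint`, the tuple encoding, and the numbers are hypothetical. The loop makes one subtraction pass per relationship, in the spirit of the fast Möbius transform, and is not claimed to be the authors' implementation.

```python
def mobius_to_joint(params, n):
    """Turn Möbius parameters into a joint distribution over n relationships.

    `params` maps tuples over {"T", "*"} to probabilities, where "*" means
    "no condition" on that relationship. Pass i replaces "*" with "F" in
    position i using P(..., F, ...) = P(..., *, ...) - P(..., T, ...).
    """
    table = dict(params)
    for i in range(n):
        updated = {}
        for key, value in table.items():
            if key[i] == "T":
                updated[key] = value
            else:  # key[i] == "*": derive the entry for "F"
                true_key = key[:i] + ("T",) + key[i + 1:]
                false_key = key[:i] + ("F",) + key[i + 1:]
                updated[false_key] = value - table[true_key]
        table = updated
    return table


# Hypothetical Möbius parameters for two link types (R1, R2).
mobius_params = {
    ("T", "T"): 0.1,  # P(R1 = T, R2 = T)
    ("T", "*"): 0.3,  # P(R1 = T)
    ("*", "T"): 0.4,  # P(R2 = T)
    ("*", "*"): 1.0,  # no condition
}

joint = mobius_to_joint(mobius_params, n=2)
for key in sorted(joint):
    print(key, round(joint[key], 6))
```

For n relationships the table has 2^n entries and the loop makes n passes over it, so no entry with a false relationship has to be counted directly in the database.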
- Slide 12
- Evaluation.
  1. Fast: parameters in minutes or less.
  2. Accurate queries/estimates.
  3. Try it yourself in our demo!