DeepDive Model
Dongfang Xu
Ph.D. student, School of Information, University of Arizona
Dec 13, 2015
Agenda
Overview
Factor Graph
Learning & Inference
Reference: DeepDive: A Data Management System for Automatic Knowledge Base Construction. Ce Zhang. Ph.D. Dissertation, University of Wisconsin-Madison, 2015.
Overview
What is DeepDive?
• DeepDive is a new type of data management system that enables
one to tackle extraction, integration, and prediction problems in a
single system.
• DeepDive makes good use of uncertainty to improve predictions
during the probabilistic inference step.
(For example, DeepDive may find that a certain mention of "Barack" is only 60%
likely to actually refer to "Barack Obama", and use this fact to discount the impact
of that mention on the final result for the entity "Barack Obama".)
Overview
What do users/developers need to do?
1. Generation and Extraction
--- User schema and correlation schema (the correlation schema captures correlations among tuples in the user schema).
--- Feature extraction (100K features), with weighted values.
2. Distant Supervision
--- One way is for the user to create training data.
--- Take the real facts, then label each training example true or false.
3. Inference and Learning
Overview
User Schema
Overview
What will the system do?
1. Generation and Extraction
--- Extract mentions (entities) and candidate relations (based on features and supervision rules).
--- Entity linking (sometimes glossaries or a database of all known entities are available; sometimes sophisticated machine learning approaches are needed, with weighted and Boolean values).
--- Label some of these pairs as true or false according to the supervision rules (with weighted and Boolean values).
Overview
What will the system do?
1. Generation and Extraction
2. Distant Supervision
--- Make use of an already existing database to collect examples for the relation.
--- Use these examples to automatically generate the training data, including positive and negative training data (a harder process).
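The distant supervision step can be illustrated with a minimal Python sketch. It is not DeepDive's actual interface: the relation names, the two small fact sets, and the `label_candidate` helper are all hypothetical, chosen only to show how an existing database can label candidate pairs as positive, negative, or unlabeled.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]

# Hypothetical existing knowledge base of "spouse" facts (illustrative only).
KNOWN_SPOUSES = {("Barack Obama", "Michelle Obama")}
# Hypothetical facts from a relation assumed to be incompatible with "spouse";
# such facts serve as a source of negative examples.
KNOWN_SIBLINGS = {("Malia Obama", "Sasha Obama")}


def label_candidate(pair: Pair) -> Optional[bool]:
    """Return True or False for a supervised label, or None if no rule applies."""
    a, b = pair
    if (a, b) in KNOWN_SPOUSES or (b, a) in KNOWN_SPOUSES:
        return True   # positive training example
    if (a, b) in KNOWN_SIBLINGS or (b, a) in KNOWN_SIBLINGS:
        return False  # negative training example
    return None       # stays unlabeled; its value must be predicted


candidates = [
    ("Michelle Obama", "Barack Obama"),
    ("Malia Obama", "Sasha Obama"),
    ("Barack Obama", "Joe Biden"),
]
labels: Dict[Pair, Optional[bool]] = {p: label_candidate(p) for p in candidates}
```

Pairs that match no rule keep the label `None`, which corresponds to candidates whose truth value is left to inference.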
Learning and Inference step
1. Factor graph grounding
DeepDive relies heavily on factor graphs, one type of probabilistic graphical model, for its statistical inference and learning phase. A factor graph has two types of nodes: variable nodes and factor nodes.
Factor Graph
Learning and Inference step
Both the extracted features and the integrated domain knowledge (the inference rules,
which become factors in the factor graph) need a weight to indicate
how strong an indicator they are for the target task.
--- One way to do that is for the user to specify the weight manually.
--- Another, easier, more consistent, and more effective way is for DeepDive
to learn the weight automatically with machine learning techniques
(in an iterative way).
Learning and Inference step
1. Factor graph grounding
--- Variables can be used to quantitatively describe an event. Specifically, they describe the tuples in the user schema. A variable is an evidence variable when its value is known (from training data or user defined), or a query variable when its value should be predicted.
--- A factor (a correlation relation, from the correlation schema) is a function of variables, and is used to evaluate the relations among variables.
The main task that DeepDive conducts on factor graphs is statistical inference, i.e., for a given node, what is the marginal probability that this node takes the value 1?
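A minimal Python sketch of these two node types follows. The class names, fields, variable names, and the example weight are assumptions made for illustration; they do not mirror DeepDive's internal representation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Variable:
    name: str
    value: Optional[int] = None  # 0 or 1 when known, None when it must be predicted

    @property
    def is_evidence(self) -> bool:
        # Evidence variables have a known value; query variables do not.
        return self.value is not None


@dataclass
class Factor:
    variables: List[Variable]       # the variable nodes this factor connects
    function: Callable[..., float]  # evaluates a relation among those variables
    weight: float                   # how strongly the factor influences the result


def both_true(a: int, b: int) -> float:
    """Hypothetical factor function: 1.0 when both variables take the value 1."""
    return 1.0 if a == 1 and b == 1 else 0.0


labeled = Variable("has_spouse(Barack, Michelle)", value=1)  # evidence variable
unknown = Variable("has_spouse(Michelle, Barack)")           # query variable
factor = Factor([labeled, unknown], both_true, weight=2.0)   # weight is illustrative

query_variables = [v.name for v in factor.variables if not v.is_evidence]
```

Statistical inference then asks, for each name in `query_variables`, how likely that variable is to take the value 1.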
Factor Graph
Learning and Inference step
1. Factor graph grounding
The variable nodes of the factor graph are connected to factors according to inference rules specified by the user, who also defines the factor functions that describe how the variables are related. The user can specify whether the factor weights should be constant or learned by the system.
Inference rules are the edges in the graph. Each rule consists of three components: the input query, which specifies the variables to create (variable nodes); the factor function (factor nodes); and the factor weight, which describes the confidence in the relationship expressed by the factor.
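The three components of a rule can be sketched in Python as follows. This is a hedged illustration, not DeepDive's rule syntax: `InferenceRule`, `ground`, `symmetric_pairs`, and `equal` are hypothetical names, and representing a learned weight as `None` is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class InferenceRule:
    # Input query: returns the groups of variable names each factor should connect.
    input_query: Callable[[], List[Tuple[str, ...]]]
    # Factor function: describes how the connected variables are related.
    factor_function: Callable[..., float]
    # Factor weight: a float keeps it constant, None asks the system to learn it.
    weight: Optional[float]


def ground(rule: InferenceRule):
    """Create one (variables, function, weight) factor per row of the input query."""
    return [(names, rule.factor_function, rule.weight) for names in rule.input_query()]


def symmetric_pairs() -> List[Tuple[str, ...]]:
    # Hypothetical query result: a relation and its mirrored counterpart.
    return [("has_spouse(A, B)", "has_spouse(B, A)")]


def equal(x: int, y: int) -> float:
    return 1.0 if x == y else 0.0


rule = InferenceRule(symmetric_pairs, equal, weight=None)
factors = ground(rule)
```

Each grounded factor links the variables named by the query, so the rule determines which edges appear in the graph.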
Factor Graph
Learning and Inference step
2. How does it work?
• A: Each variable can take the value 0 or 1. With two variables,
we have four possible worlds (each a combination of variable values).
• B: Define the probability of a possible world through factor functions. We give
a different weight to each factor function to express the relative influence of each
factor on the probability.
Learning & Inference
$\Pr(I) \propto \mathrm{measure}\{w_1 f_1(v_1, v_2) + w_2 f_2(v_2)\}$
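The two-variable case can be worked through in a short Python sketch. Several things here are assumptions rather than content from the slides: the measure is taken to be the exponential function (a common choice), and the weights and the bodies of `f1` and `f2` are made up for illustration.

```python
import itertools
import math

W1, W2 = 1.0, 0.5  # hypothetical factor weights


def f1(v1: int, v2: int) -> float:
    """Hypothetical factor over both variables: 1.0 when they agree."""
    return 1.0 if v1 == v2 else 0.0


def f2(v2: int) -> float:
    """Hypothetical factor over v2 alone: 1.0 when v2 takes the value 1."""
    return 1.0 if v2 == 1 else 0.0


def unnormalized(v1: int, v2: int) -> float:
    # Assumed measure: exp of the weighted combination of factor functions.
    return math.exp(W1 * f1(v1, v2) + W2 * f2(v2))


# Two binary variables give four possible worlds.
worlds = list(itertools.product([0, 1], repeat=2))
z = sum(unnormalized(v1, v2) for v1, v2 in worlds)  # normalizing constant
prob = {w: unnormalized(*w) / z for w in worlds}
most_likely = max(prob, key=prob.get)
```

Dividing by `z` turns the proportionality into a proper distribution over the four worlds. With these weights the world where both factors fire, `(1, 1)`, is the most likely one.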
Learning and Inference step
2. How does it work?
• B+: The probability of a possible world is then defined to be proportional to
some measure of the weighted combination of factor functions.
• C: Now we can perform marginal inference on the factor graph. Marginal
inference infers the probability of one variable taking a particular value.
This mirrors the relation between marginal and joint probability: the marginal
is the sum of the probabilities of all possible worlds in which the variable
takes that value.
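Marginal inference can be sketched by brute-force enumeration, as below. This is only feasible for tiny graphs and is not a description of how DeepDive performs inference at scale. The exponential measure, the weights, and the factor definitions are assumptions of the sketch.

```python
import itertools
import math

W1, W2 = 1.0, 0.5  # hypothetical factor weights


def score(v1: int, v2: int) -> float:
    """Unnormalized probability of a world, assuming an exponential measure."""
    f1 = 1.0 if v1 == v2 else 0.0  # hypothetical factor over (v1, v2)
    f2 = 1.0 if v2 == 1 else 0.0   # hypothetical factor over v2
    return math.exp(W1 * f1 + W2 * f2)


worlds = list(itertools.product([0, 1], repeat=2))
z = sum(score(*w) for w in worlds)


def marginal(index: int, value: int) -> float:
    """Sum the probabilities of all worlds where variable `index` equals `value`."""
    return sum(score(*w) for w in worlds if w[index] == value) / z


p_v1_is_1 = marginal(0, 1)
p_v2_is_1 = marginal(1, 1)
```

The marginals for the two values of one variable always sum to 1, because together they cover every possible world exactly once.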
Learning & Inference
Learning and Inference step
2. How does it work?
In DeepDive, you can assign factor weights manually, or you can let DeepDive learn weights automatically. In order to learn weights automatically, you must have enough training data available. DeepDive chooses the weights that agree most with the training data. Formally, the training data is just a set of possible worlds, and we choose weights by maximizing the probabilities of these possible worlds.
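Weight learning by maximizing the probability of the training worlds can be sketched with plain gradient ascent on the average log-likelihood. The exponential measure, the two factor functions, the training worlds, the step size, and the iteration count are all assumptions made for illustration. DeepDive's own learning procedure is not shown here.

```python
import itertools
import math

WORLDS = list(itertools.product([0, 1], repeat=2))


def features(world):
    """Hypothetical factor values (f1, f2) for a world (v1, v2)."""
    v1, v2 = world
    return (1.0 if v1 == v2 else 0.0, 1.0 if v2 == 1 else 0.0)


def distribution(weights):
    """Probability of each world, assuming an exponential measure."""
    scores = [math.exp(sum(w * f for w, f in zip(weights, features(x)))) for x in WORLDS]
    z = sum(scores)
    return [s / z for s in scores]


def log_likelihood(weights, training):
    probs = dict(zip(WORLDS, distribution(weights)))
    return sum(math.log(probs[x]) for x in training)


def learn(training, steps=2000, lr=0.1):
    weights = [0.0, 0.0]
    # Average factor values observed in the training worlds.
    observed = [sum(features(x)[k] for x in training) / len(training) for k in range(2)]
    for _ in range(steps):
        probs = distribution(weights)
        # Factor values expected under the current model.
        expected = [sum(p * features(x)[k] for p, x in zip(probs, WORLDS)) for k in range(2)]
        # Gradient of the average log-likelihood is observed minus expected.
        weights = [w + lr * (o - e) for w, o, e in zip(weights, observed, expected)]
    return weights


training = [(1, 1), (1, 1), (0, 0), (1, 1), (0, 1)]  # made-up training worlds
learned = learn(training)
```

The update stops changing the weights once the model's expected factor values match those observed in the training worlds, which is the point where the training data is most probable under this model family.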
Learning & Inference
DeepDive
Resource
http://deepdive.stanford.edu
https://www.youtube.com/watch?v=SfkLvExfl-s
http://pages.cs.wisc.edu/~czhang/zhang.thesis.pdf
Thank you!
Q&A