DeepDive Model. Dongfang Xu, Ph.D. student, School of Information, University of Arizona. Dec 13, 2015.

Agenda

Overview

Factor Graph

Learning & Inference

Reference: DeepDive: A Data Management System for Automatic Knowledge Base Construction. Ce Zhang. Ph.D. Dissertation, University of Wisconsin-Madison, 2015.

Overview

What is DeepDive?

• DeepDive is a new type of data management system that enables one to tackle extraction, integration, and prediction problems in a single system.

• DeepDive makes good use of uncertainty to improve predictions during the probabilistic inference step.

For example, DeepDive may find that a certain mention of "Barack" is only 60% likely to actually refer to "Barack Obama", and use this fact to discount the impact of that mention on the final result for the entity "Barack Obama".

Overview

What do users/developers need to do?

1. Generation and Extraction
--- Define the user schema and the correlation schema (the correlation schema captures correlations among tuples in the user schema).
--- Extract features (about 100K features), each with a weight.

2. Distant Supervision
--- One way is for the user to create training data.
--- Take known facts, and then label each training example as true or false.

3. Inference and Learning

[Figure: User Schema]

What will the system do?

1. Generation and Extraction
--- Extract mentions (entities) and candidate relations (based on features and supervision rules).
--- Entity linking: sometimes glossaries or a database of all known entities are available; sometimes sophisticated machine learning approaches are needed (with weighted and Boolean values).
--- Label some of these pairs as true or false according to the supervision rules (with weighted and Boolean values).
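The candidate-generation step can be pictured with a minimal sketch. It assumes entity linking is done with a simple glossary lookup and that every pair of mentions co-occurring in a sentence becomes a candidate relation. The glossary, the sentence, and the function names are hypothetical illustrations, not part of DeepDive's interface.

```python
import itertools
import re

# Hypothetical glossary of known entity names used for simple entity linking.
GLOSSARY = {"Barack Obama", "Michelle Obama", "Joe Biden"}


def extract_mentions(sentence):
    """Return glossary names that occur in the sentence, in order of appearance."""
    found = []
    for name in GLOSSARY:
        match = re.search(re.escape(name), sentence)
        if match:
            found.append((match.start(), name))
    return [name for _, name in sorted(found)]


def candidate_pairs(sentence):
    """Treat every unordered pair of co-occurring mentions as a candidate relation."""
    return list(itertools.combinations(extract_mentions(sentence), 2))


sentence = "Barack Obama and his wife Michelle Obama met Joe Biden."
pairs = candidate_pairs(sentence)
```

Each candidate pair would then become a variable whose truth value is either labeled by a supervision rule or left for inference.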

2. Distant Supervision
--- Make use of an already existing database to collect examples for the relation.
--- Use these examples to automatically generate the training data, including positive and negative training data (a harder process).
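A minimal sketch of distant supervision, assuming a toy existing database of known positive pairs and a second relation used as a source of negatives. The relation names, the data, and the `distant_label` helper are hypothetical and only illustrate the idea.

```python
from typing import Optional, Tuple

# Hypothetical existing database: pairs known to be married (positive examples).
KNOWN_MARRIED = {("Barack Obama", "Michelle Obama")}
# Hypothetical incompatible relation, used here as a source of negative examples.
KNOWN_SIBLINGS = {("Malia Obama", "Sasha Obama")}


def distant_label(pair: Tuple[str, str]) -> Optional[bool]:
    """Return True or False when the database decides the label, else None."""
    a, b = pair
    if (a, b) in KNOWN_MARRIED or (b, a) in KNOWN_MARRIED:
        return True
    if (a, b) in KNOWN_SIBLINGS or (b, a) in KNOWN_SIBLINGS:
        return False
    return None  # no label: the pair stays unlabeled and must be predicted


candidates = [
    ("Michelle Obama", "Barack Obama"),
    ("Malia Obama", "Sasha Obama"),
    ("Barack Obama", "Joe Biden"),
]
labels = {pair: distant_label(pair) for pair in candidates}
```

Pairs labeled `True` or `False` serve as training data; pairs labeled `None` are the ones whose probability the system has to infer.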

Factor Graph

Learning and Inference step

1. Factor graph grounding
DeepDive relies heavily on factor graphs, a type of probabilistic graphical model, for its statistical inference and learning phase. A factor graph has two types of nodes: variable nodes and factor nodes.

Learning and Inference step

Both the extracted features and the integrated domain knowledge (inference rules, expressed as factors in a factor graph) need a weight to indicate how strong an indicator they are for the target task.

--- One way to do that is for the user to manually specify the weight.

--- Another easier, more consistent, and more effective way is for DeepDive to automatically learn the weight with machine learning techniques (iteratively).


1. Factor graph grounding
--- Variables quantitatively describe an event; specifically, they describe the tuples in the user schema. Variables are evidence variables when their value is known (from training data or user defined), or query variables when their value must be predicted.
--- A factor (a correlation relation, from the correlation schema) is a function of variables and is used to evaluate the relations among variable(s).

The main task that DeepDive conducts on factor graphs is statistical inference, i.e., for a given node, what is the marginal probability that this node takes the value 1?
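The two node types can be sketched as plain data structures. This is an illustrative model under stated assumptions, not DeepDive's internal representation: the class names, the implication factor, and the weight 0.5 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Variable:
    name: str
    value: Optional[int] = None  # 0 or 1 when known, None when it must be predicted

    @property
    def is_evidence(self) -> bool:
        return self.value is not None


@dataclass
class Factor:
    variables: List[Variable]                   # the variable nodes this factor touches
    function: Callable[[Sequence[int]], float]  # evaluates the relation among them
    weight: float = 0.0


@dataclass
class FactorGraph:
    variables: List[Variable] = field(default_factory=list)
    factors: List[Factor] = field(default_factory=list)

    def query_variables(self) -> List[Variable]:
        """Variables without a known value, i.e. the ones to run inference on."""
        return [v for v in self.variables if not v.is_evidence]


v1 = Variable("v1", value=1)  # evidence variable: value known
v2 = Variable("v2")           # query variable: value to be predicted
# Hypothetical factor function: "v1 implies v2" holds unless v1 = 1 and v2 = 0.
implies = Factor([v1, v2], lambda vals: 0.0 if (vals[0] == 1 and vals[1] == 0) else 1.0, weight=0.5)
graph = FactorGraph([v1, v2], [implies])
```

Statistical inference would then ask, for each variable returned by `graph.query_variables()`, how probable it is that the variable takes the value 1.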


1. Factor graph grounding
The variable nodes of the factor graph are connected to factors according to inference rules specified by the user, who also defines the factor functions that describe how the variables are related. The user can specify whether the factor weights should be constant or learned by the system.

Inference rules define the edges of the graph. Each rule consists of three components:
--- the input query, which specifies the variables to create (variable nodes);
--- the factor function (factor nodes);
--- the factor weight, which describes the confidence in the relationship expressed by the factor.
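The three components of an inference rule can be illustrated with a small grounding sketch. It assumes a hypothetical symmetry rule over a `married` relation and plain Python callables; DeepDive's real rule syntax is different, and every name below is invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rows from a user-schema relation: candidate (person1, person2) pairs.
candidate_pairs = [("A", "B"), ("B", "C")]


def input_query() -> List[Tuple[str, ...]]:
    """Component 1: select the variables that each factor connects.

    Here a symmetry rule links married(x, y) with married(y, x).
    """
    return [(f"married({x},{y})", f"married({y},{x})") for x, y in candidate_pairs]


def equal(values: Tuple[int, ...]) -> float:
    """Component 2: the factor function, 1 when both variables agree."""
    return 1.0 if values[0] == values[1] else 0.0


WEIGHT = 2.0  # Component 3: a constant weight here; it could instead be learned.


def ground(query: Callable[[], List[Tuple[str, ...]]],
           function: Callable[[Tuple[int, ...]], float],
           weight: float) -> List[Dict]:
    """Create one factor per row returned by the input query."""
    return [{"variables": names, "function": function, "weight": weight}
            for names in query()]


factors = ground(input_query, equal, WEIGHT)
```

Grounding turns one rule into as many factors as the input query returns rows, which is why a handful of rules can produce a very large graph.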

Learning and Inference step

2. How does it work?

• A. Each variable can take the value 0 or 1. Say there are two variables; then there are four possible worlds (a possible world is one assignment of values to all variables).

• B. Define the probability of a possible world through factor functions. Each factor function is given its own weight, expressing the relative influence of that factor on the probability.

• B+. The probability of a possible world I is then defined to be proportional to some measure of a weighted combination of factor functions:

$$\Pr(I) \propto \mathrm{measure}\{\, w_1 f_1(v_1, v_2) + w_2 f_2(v_2) \,\}$$
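The proportionality above can be made concrete by enumerating the four possible worlds of two binary variables. The sketch assumes the measure is the exponential function (a common choice for factor graphs), and it invents the factor functions and weights: f1 is 1 when v1 equals v2, f2 is v2 itself, w1 = 1.0 and w2 = 0.5. None of these values come from the slides.

```python
import itertools
import math

W1, W2 = 1.0, 0.5  # illustrative weights (assumed)


def f1(v1: int, v2: int) -> float:
    """Assumed factor function: 1 when the two variables agree."""
    return 1.0 if v1 == v2 else 0.0


def f2(v2: int) -> float:
    """Assumed factor function: fires when v2 is 1."""
    return float(v2)


def unnormalized(v1: int, v2: int) -> float:
    """measure{w1*f1(v1, v2) + w2*f2(v2)} with measure = exp (assumed)."""
    return math.exp(W1 * f1(v1, v2) + W2 * f2(v2))


# All assignments (v1, v2) of two binary variables: four possible worlds.
worlds = list(itertools.product([0, 1], repeat=2))
Z = sum(unnormalized(*w) for w in worlds)           # normalizing constant
prob = {w: unnormalized(*w) / Z for w in worlds}    # Pr(I) for each world I
```

Dividing by `Z` turns "proportional to" into an actual probability distribution over the possible worlds.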

• C. Now we can perform marginal inference on the factor graph. Marginal inference infers the probability of one variable taking a particular value, obtained by summing the probabilities of all possible worlds in which the variable takes that value. This is the usual relationship between a marginal probability and a joint probability.
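A sketch of marginal inference on the same kind of two-variable graph, computed two ways: exactly, by summing over possible worlds, and approximately, by Gibbs sampling. The exponential measure, the factor functions, the weights, and the sampler settings are assumptions chosen for illustration; exact enumeration is only feasible for tiny graphs, which is why sampling is the usual approach at scale.

```python
import itertools
import math
import random

W1, W2 = 1.0, 0.5  # illustrative weights (assumed)


def score(v1: int, v2: int) -> float:
    # w1*f1(v1, v2) + w2*f2(v2), with f1 = "v1 equals v2" and f2 = v2 (both assumed).
    return W1 * (1.0 if v1 == v2 else 0.0) + W2 * v2


def exact_marginal_v1() -> float:
    """Pr(v1 = 1): sum the probabilities of all worlds in which v1 is 1."""
    worlds = list(itertools.product([0, 1], repeat=2))
    z = sum(math.exp(score(*w)) for w in worlds)
    return sum(math.exp(score(*w)) for w in worlds if w[0] == 1) / z


def gibbs_marginal_v1(n_samples: int = 20000, burn_in: int = 1000, seed: int = 0) -> float:
    """Estimate Pr(v1 = 1) by resampling one variable at a time."""
    rng = random.Random(seed)
    v = [0, 0]
    hits = 0
    for step in range(n_samples + burn_in):
        for i in (0, 1):
            # Conditional probability of variable i being 1 given the other variable.
            v[i] = 1
            p1 = math.exp(score(*v))
            v[i] = 0
            p0 = math.exp(score(*v))
            v[i] = 1 if rng.random() < p1 / (p0 + p1) else 0
        if step >= burn_in:
            hits += v[0]
    return hits / n_samples


exact = exact_marginal_v1()
approx = gibbs_marginal_v1()
```

With enough samples the Gibbs estimate approaches the exact marginal, without ever enumerating all possible worlds.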

Learning & Inference

In DeepDive, you can assign factor weights manually, or you can let DeepDive learn weights automatically. In order to learn weights automatically, you must have enough training data available. DeepDive chooses the weights that agree most with the training data. Formally, the training data is just a set of possible worlds, and we choose weights by maximizing the probabilities of these possible worlds.
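Weight learning by maximizing the probability of the training worlds can be sketched with plain gradient ascent on a two-variable graph. This is an illustration under assumptions, not DeepDive's learning procedure: the measure is taken to be the exponential function, the factor functions and the six training worlds are invented, and expectations are computed by exhaustive enumeration, which only works for tiny graphs.

```python
import itertools
import math

WORLDS = list(itertools.product([0, 1], repeat=2))  # all (v1, v2) assignments


def features(world):
    """Values of the assumed factor functions (f1, f2) in one possible world."""
    v1, v2 = world
    return (1.0 if v1 == v2 else 0.0, float(v2))


def expected_features(weights):
    """Expected value of each factor function under the current weights."""
    scores = [math.exp(sum(w * f for w, f in zip(weights, features(x)))) for x in WORLDS]
    z = sum(scores)
    return [sum(s * features(x)[k] for s, x in zip(scores, WORLDS)) / z for k in range(2)]


def learn(training_worlds, steps=2000, lr=0.5):
    """Gradient ascent on the average log-likelihood of the training worlds."""
    n = len(training_worlds)
    empirical = [sum(features(x)[k] for x in training_worlds) / n for k in range(2)]
    weights = [0.0, 0.0]
    for _ in range(steps):
        model = expected_features(weights)
        # Gradient per weight: empirical feature mean minus model expectation.
        weights = [w + lr * (e - m) for w, e, m in zip(weights, empirical, model)]
    return weights, empirical


# Hypothetical training data: six observed possible worlds.
training = [(1, 1), (1, 1), (0, 0), (0, 1), (1, 1), (1, 0)]
weights, empirical = learn(training)
fitted = expected_features(weights)
```

At the optimum, the model's expected factor values match their averages in the training data, which is the sense in which the learned weights "agree most" with it.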

DeepDive

Resources

http://deepdive.stanford.edu

https://www.youtube.com/watch?v=SfkLvExfl-s

http://pages.cs.wisc.edu/~czhang/zhang.thesis.pdf

Thank you!

Q&A
