DeepDive Introduction
Dongfang Xu
Ph.D. student, School of Information, University of Arizona
Sept 10, 2015
Agenda
Brief Introduction
KBC model
Workflow
Reference
DeepDive: A Data Management System for Automatic Knowledge Base Construction. Ce Zhang. Ph.D. Dissertation, University of Wisconsin-Madison, 2015.
Brief Introduction
What is DeepDive?
• DeepDive is a new type of data management system that enables one to tackle extraction, integration, and prediction problems in a single system.
• It is built by generalizing from experience in building more than ten high-quality Knowledge Base Construction (KBC) systems. (Flexible framework)
What is KBC?
• Knowledge Base Construction (KBC) is the process of populating a knowledge base (KB).
Brief Introduction
Why DeepDive? Or why KBC?
• Its potential to answer key scientific questions.
---- Collects facts and contributes to scientific discoveries.
• A typical knowledge base requires a large amount of resources.
• Common problems in scientific areas.
Brief Introduction
Why DeepDive? Or why KBC?
• Its potential to answer key scientific questions.
• A typical knowledge base requires a large amount of resources.
• Its good performance.
---- The developer thinks about features (extraction rules), not algorithms.
---- Handles large amounts of data from a variety of sources.
---- High quality in extracting complex knowledge and building entity relations.
---- Calibrated probabilities for each assertion it makes.
---- Domain knowledge + the DeepDive framework = a KBC system for a specific domain.
Brief Introduction
General description of DeepDive
---- An application of relational databases: all data in DeepDive is stored in a relational database.
---- The main task is to identify the entities and the relations between them.
---- A selection of target facts is typically defined for an information extraction (IE) task.
---- Multiple non-content cues, such as layout information, may be used to assist extraction, e.g. section headers or the layout of tabular data.
---- All kinds of information about each entity and relation are extracted, so data volume is high.
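Because everything lives in relational tables, intermediate results can be inspected and combined with plain SQL. The sketch below illustrates that idea with Python's built-in sqlite3 module. The table names, column names, and toy sentence are assumptions made for illustration; they are not DeepDive's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not DeepDive's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentences (sent_id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE mentions (
        mention_id INTEGER PRIMARY KEY,
        sent_id    INTEGER REFERENCES sentences(sent_id),
        text       TEXT
    );
""")

# Toy input: one sentence and the two person mentions found in it.
conn.execute(
    "INSERT INTO sentences VALUES (?, ?)",
    (1, "Barack Obama and his wife M. Obama"),
)
conn.executemany(
    "INSERT INTO mentions (sent_id, text) VALUES (?, ?)",
    [(1, "Barack Obama"), (1, "M. Obama")],
)

# A self-join pairs up mentions that share a sentence.
pairs = conn.execute("""
    SELECT a.text, b.text
    FROM mentions a
    JOIN mentions b
      ON a.sent_id = b.sent_id AND a.mention_id < b.mention_id
""").fetchall()
```

Here `pairs` holds one row per pair of mentions that co-occur in a sentence, which is the kind of table later steps could read from.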
KBC model
Entity: An entity is a real-world person, place, or thing.
---- For example, the entity “Michelle Obama 1” represents the actual person whose name is “Michelle Obama”.
Relation: A relation associates two (or more) entities.
---- For example, the entities “Barack Obama 1” and “Michelle Obama 1” participate in the HasSpouse relation, which indicates that they are married.
Mention: A mention is a span of text in an input file that refers to an entity or relationship.
---- For example, “Michelle” may be a mention of the entity “Michelle Obama 1”.
Relation mention: A relation mention is a phrase that connects two mentions that participate in a relation.
---- For example, “and his wife” connects the mentions “Barack Obama” and “M. Obama”.
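The four concepts above can be sketched as plain data types. This is only an illustrative model: the class names, field names, the document id "doc1", and the `find_mention` helper are assumptions, not structures taken from DeepDive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    entity_id: str              # e.g. "Michelle Obama 1"

@dataclass(frozen=True)
class Relation:
    name: str                   # e.g. "HasSpouse"
    args: tuple                 # the participating Entity objects

@dataclass(frozen=True)
class Mention:
    doc_id: str
    start: int                  # character offset, inclusive
    end: int                    # character offset, exclusive
    text: str

@dataclass(frozen=True)
class RelationMention:
    phrase: str                 # the connecting text
    args: tuple                 # the participating Mention objects

def find_mention(doc_id, text, surface):
    """Locate the first occurrence of a surface string (hypothetical helper)."""
    start = text.index(surface)
    return Mention(doc_id, start, start + len(surface), surface)

barack = Entity("Barack Obama 1")
michelle = Entity("Michelle Obama 1")
has_spouse = Relation("HasSpouse", (barack, michelle))

sentence = "Barack Obama and his wife M. Obama"
m1 = find_mention("doc1", sentence, "Barack Obama")
m2 = find_mention("doc1", sentence, "M. Obama")

# The relation mention is the phrase lying between the two mentions.
rel_mention = RelationMention(sentence[m1.end:m2.start].strip(), (m1, m2))
```

Entities and relations describe the real world, while mentions and relation mentions describe spans of text, so the two layers are kept as separate types.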
KBC model
Work Flow
Input file
User Schema
Work Flow
Candidate Generation & Feature Extraction
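One way to picture this step: every pair of person mentions in a sentence becomes a candidate for the relation, and each candidate is described by simple features of the surrounding text. The sketch below assumes mentions arrive as token spans and uses made-up feature names; it does not reproduce DeepDive's own extractor interface.

```python
from itertools import combinations

def generate_candidates(mentions):
    """Return every unordered pair of mentions from one sentence."""
    return list(combinations(mentions, 2))

def extract_features(tokens, left, right):
    """Describe a candidate pair with simple lexical features."""
    between = tokens[left[1]:right[0]]   # tokens strictly between the two spans
    features = [f"word_between={w.lower()}" for w in between]
    features.append(f"num_words_between={len(between)}")
    return features

tokens = ["Barack", "Obama", "and", "his", "wife", "M.", "Obama"]
mentions = [(0, 2), (5, 7)]              # (start, end) token spans, end exclusive

candidates = generate_candidates(mentions)
features = {pair: extract_features(tokens, *pair) for pair in candidates}
```

The developer's effort goes into writing functions like `extract_features`; deciding how much each feature matters is left to the learning step.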
Work Flow
Supervision
(1) hand-labeling and (2) distant supervision
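Distant supervision labels candidates automatically by looking them up in an existing knowledge base instead of asking a person. A minimal sketch, assuming two small hypothetical lookup sets (known spouses as positives, known non-spouses as negatives); the names and contents are illustrative only.

```python
# Hypothetical existing knowledge: pairs known to be married (positives).
known_spouses = {frozenset({"Barack Obama", "Michelle Obama"})}
# Hypothetical pairs known NOT to be married (negatives), with placeholder names.
known_non_spouses = {frozenset({"Person A", "Person B"})}

def distant_label(entity_a, entity_b):
    """Return True/False when the knowledge base decides, else None."""
    pair = frozenset({entity_a, entity_b})   # order of the two names is irrelevant
    if pair in known_spouses:
        return True
    if pair in known_non_spouses:
        return False
    return None                              # unlabeled; left for inference

labels = {
    pair: distant_label(*pair)
    for pair in [
        ("Michelle Obama", "Barack Obama"),
        ("Person A", "Person B"),
        ("Person A", "Barack Obama"),
    ]
}
```

Labels produced this way are noisy, since a pair can be married without a given sentence saying so, which is one reason hand-labeling remains the other option.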
Work Flow
Learning and Inference
In the learning and inference phase, DeepDive generates a factor graph.
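To make the factor graph idea concrete, the toy sketch below has two Boolean variables (one per candidate), a unary factor on each, and one pairwise factor that rewards agreement. The weights are invented rather than learned, and the marginal probabilities are computed by enumerating all assignments. Exact enumeration is used only because the example is tiny; it stands in for the approximate inference a system would need at realistic scale.

```python
import math
from itertools import product

# Hypothetical weights; in practice they would be learned from the labels.
unary_weight = {"c1": 1.5, "c2": -0.5}   # evidence for each candidate being true
pair_weight = 1.0                         # reward when c1 and c2 take the same value

variables = ["c1", "c2"]

def score(assignment):
    """Sum of the log-potentials of all factors for one full assignment."""
    total = sum(unary_weight[v] for v in variables if assignment[v])
    if assignment["c1"] == assignment["c2"]:
        total += pair_weight
    return total

def marginals():
    """Exact marginal P(variable = True) by enumerating every possible world."""
    worlds = [
        dict(zip(variables, values))
        for values in product([False, True], repeat=len(variables))
    ]
    weights = [math.exp(score(w)) for w in worlds]
    z = sum(weights)                      # normalization constant
    return {
        v: sum(wt for w, wt in zip(worlds, weights) if w[v]) / z
        for v in variables
    }

probabilities = marginals()
```

Each entry in `probabilities` is the kind of per-assertion probability the earlier slides refer to: a number between 0 and 1 attached to every candidate fact.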
DeepDive
Resources
http://deepdive.stanford.edu
https://www.youtube.com/watch?v=SfkLvExfl-s
http://pages.cs.wisc.edu/~czhang/zhang.thesis.pdf
Thank you!
Q&A