Date post: 19-Dec-2015
TARTAR Information Extraction
Transforming Arbitrary Tables into F-Logic Frames with TARTAR
Aleksander Pivk, York Sure, Philipp Cimiano, Matjaz Gams, Vladislav Rajkovic, Rudi Studer
Presented by Stephen Lynn

Information Extraction

Free-form Text
Linguistic/NLP approaches
Tabular Structures
Table comprehension task
HTML, Excel, PDF, text, etc.
Semantic interpretation task
More effort???

TARTAR Architecture

Semantic Representation

Frame Logic (F-Logic)
Model-theoretic semantics
Complete resolution-based proof theory
Expressive power of logic
Availability of efficient reasoning tools

F-Logic Frame

Table Comprehension

Dimensions – a grouping of cells representing similar entities

Table Comprehension

Stub – dimension with headers used to index elements in body

Table Comprehension

Box head – column headers (often nested)

Table Comprehension

Body – data values

Table Classes

1D, 2D, Complex

Methodology

Cleaning & Canonicalization

Clean DOM tree
CyberNeko HTML Parser
Rowspan/Colspan expansion
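
The rowspan/colspan expansion step can be illustrated with a minimal Python sketch. It assumes each cell arrives as a (text, rowspan, colspan) tuple, which is a layout chosen for this example rather than TARTAR's own representation, and it copies the text of a spanning cell into every grid slot the cell covers so that the result is a rectangular matrix.

```python
from typing import Dict, List, Optional, Tuple

# Assumed input layout for this sketch: (text, rowspan, colspan) per cell.
Cell = Tuple[str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Expand spanning cells into a rectangular grid by copying their text."""
    grid: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Copy the cell text into every slot the span covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Slots that no cell covers stay None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical table: "Tour" spans two rows, "Price" spans two columns.
example_rows = [
    [("Tour", 2, 1), ("Price", 1, 2)],
    [("Adult", 1, 1), ("Child", 1, 1)],
    [("A1", 1, 1), ("100", 1, 1), ("50", 1, 1)],
]
expanded = expand_spans(example_rows)
```

After expansion every row has the same number of columns, which is what the later structure detection steps need in order to compare cells by position.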

Structure Detection

Token Type Hierarchy
Assign Functional Types and Probabilities
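
As a rough illustration of this step, the sketch below assigns each token a (general, specific) type pair and then derives a functional type with a probability for a whole cell. Everything specific here is an assumption made for the example: the type names, the labels "A" (attribute-like) and "I" (instance-like), and the scoring rule based on the share of numeric tokens are not taken from the slides.

```python
import re
from typing import Tuple

TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def token_type(token: str) -> Tuple[str, str]:
    """Return a hypothetical (general, specific) type pair for one token."""
    if NUMBER_RE.fullmatch(token):
        specific = "WITH_SEPARATOR" if re.search(r"[.,]", token) else "DIGITS_ONLY"
        return ("NUMERIC", specific)
    if token.isalpha():
        if token.isupper():
            return ("ALPHA", "ALL_UPPER")
        if token[0].isupper():
            return ("ALPHA", "FIRST_UPPER")
        return ("ALPHA", "OTHER_CASE")
    if token.isalnum():
        return ("ALPHANUMERIC", "MIXED")
    return ("OTHER", "SYMBOL")


def functional_type(cell: str) -> Tuple[str, float]:
    """Guess whether a cell is attribute-like ("A") or instance-like ("I").

    Assumed heuristic: cells dominated by numeric tokens are treated as data.
    The returned float is the share of tokens supporting the chosen label.
    """
    tokens = TOKEN_RE.findall(cell)
    if not tokens:
        return ("I", 0.5)  # empty cell: no evidence either way
    numeric_share = sum(token_type(t)[0] == "NUMERIC" for t in tokens) / len(tokens)
    if numeric_share >= 0.5:
        return ("I", numeric_share)
    return ("A", 1.0 - numeric_share)
```

A real system would use a richer hierarchy and combine more evidence than token types alone; the point is only that each cell ends up with a label and a confidence that later steps can compare.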

Structure Detection

Detect Logical Table Orientation
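
One way to think about orientation detection is to ask in which direction neighbouring cells look more alike. The sketch below is a simplified stand-in, not the measure used by TARTAR: it assumes a rectangular grid of strings, uses a hypothetical two-way split into "num" and "text" cells, and returns "vertical" when cells agree more often down the columns than along the rows.

```python
from typing import List


def coarse_type(cell: str) -> str:
    """Very coarse cell type used only for this sketch."""
    stripped = cell.replace(",", "").replace(".", "").strip()
    return "num" if stripped.isdigit() else "text"


def orientation(grid: List[List[str]]) -> str:
    """Return "vertical" if columns are more homogeneous than rows, else "horizontal"."""
    if not grid or not grid[0]:
        raise ValueError("grid must be a non-empty rectangular table")
    rows, cols = len(grid), len(grid[0])
    types = [[coarse_type(cell) for cell in row] for row in grid]

    # Type mismatches between horizontally adjacent cells.
    row_diff = sum(
        types[r][c] != types[r][c + 1] for r in range(rows) for c in range(cols - 1)
    )
    # Type mismatches between vertically adjacent cells.
    col_diff = sum(
        types[r][c] != types[r + 1][c] for r in range(rows - 1) for c in range(cols)
    )

    row_pairs = rows * (cols - 1)
    col_pairs = (rows - 1) * cols
    row_score = row_diff / row_pairs if row_pairs else 0.0
    col_score = col_diff / col_pairs if col_pairs else 0.0
    return "vertical" if col_score <= row_score else "horizontal"


# Hypothetical table with attribute names in the first row.
table = [
    ["City", "Price", "Nights"],
    ["Rome", "100", "3"],
    ["Oslo", "250", "5"],
    ["Bern", "180", "4"],
]
transposed = [list(col) for col in zip(*table)]
```

With this table the columns are the more homogeneous direction, and transposing it flips the result.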

Structure Detection

Discover and Level Regions
Logical Units

FTM Building

Functional Table Model (FTM)
Arrange regions into a tree
Leaf nodes are data
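
The idea of arranging regions into a tree whose leaves hold the data can be sketched as follows. The node class, the fixed two-level shape (stub label, then box-head label, then value), and the function names are assumptions for illustration; the slides do not specify how the FTM is laid out internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None  # set only on leaf (data) nodes
    children: List["Node"] = field(default_factory=list)


def build_tree(stub: List[str], box_head: List[str], body: List[List[str]]) -> Node:
    """Arrange a simple 2D table into a tree: root -> stub label -> box-head label."""
    root = Node("table")
    for row_label, row in zip(stub, body):
        row_node = Node(row_label)
        for col_label, cell in zip(box_head, row):
            # Each body cell becomes a leaf under its row and column labels.
            row_node.children.append(Node(col_label, value=cell))
        root.children.append(row_node)
    return root


def leaves(node: Node) -> List[str]:
    """Collect the data values stored in the leaves, left to right."""
    if node.value is not None:
        return [node.value]
    collected: List[str] = []
    for child in node.children:
        collected.extend(leaves(child))
    return collected


# Hypothetical table: two rows indexed by city, two columns of values.
tree = build_tree(
    stub=["Rome", "Oslo"],
    box_head=["Price", "Nights"],
    body=[["100", "3"], ["250", "5"]],
)
```

Walking from the root to a leaf yields the header labels that index that value, for example Rome, then Price, then 100.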

Semantic Enriching of FTM

Labeling
WordNet and GoogleSets
Map FTM to a frame
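
To make the final mapping step concrete, here is a small sketch that renders a frame in F-Logic signature style, where `=>` introduces the range of an attribute or method. The helper name, its parameters, and the example labels (Tour, Code, Price and so on) are hypothetical; in the described pipeline the labels would come from the semantic enrichment step, not be written by hand.

```python
from typing import Dict, List, Tuple


def render_frame(
    name: str,
    attributes: Dict[str, str],
    methods: Dict[str, Tuple[List[str], str]],
) -> str:
    """Render an F-Logic-style frame signature as a string.

    attributes maps an attribute name to its range.
    methods maps a method name to (parameter types, range).
    """
    parts = [f"{attr} => {rng}" for attr, rng in attributes.items()]
    for method, (params, rng) in methods.items():
        parts.append(f"{method}({', '.join(params)}) => {rng}")
    return f"{name}[" + "; ".join(parts) + "]"


# Hypothetical frame for a table of tour prices.
frame = render_frame(
    "Tour",
    attributes={"Code": "ALPHANUMERIC"},
    methods={"Price": (["PersonClass", "RoomClass"], "NUMBER")},
)
# frame == "Tour[Code => ALPHANUMERIC; Price(PersonClass, RoomClass) => NUMBER]"
```

Header dimensions that index a body value become method parameters here, while attributes that describe the table as a whole stay parameterless; this split is a design choice of the sketch.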

Evaluation

Crawl, extract, filter web tables
135 tables
85.4% success rate
Mostly problems with complex tables
Compare auto-generated frames with human-generated frames
14 people transformed 3 tables each
21 total tables (each done twice)
Syntactic/Semantic correctness (Strict and Soft)

Results

Inter-annotator agreement

System-annotator agreement

Benefits

Fully automated knowledge formalization
Arbitrary tables
Independent of domain knowledge
Independent of document type
Explicit semantics of generated frames
Query answering over heterogeneous tables

