Using Linked Data to Mine RDF from Wikipedia's Tables

In 7th ACM Web Search and Data Mining Conference (WSDM 2014), New York City, New York, 24-28 February
Page 1: Using Linked Data to Mine RDF from Wikipedia's Tables

Using Linked Data to Mine RDF from Wikipedia’s Tables

http://emunoz.org/wikitables

Emir Muñoz

Fujitsu (Ireland) Limited

National University of Ireland Galway

Joint work with A. Hogan and A. Mileo

WSDM 2014 @ New York City, February 24-28

Page 2: Using Linked Data to Mine RDF from Wikipedia's Tables

Emir M. - WSDM, New York City, USA, 27th February, 2014 2

MOTIVATION (1/10)

Page 3: Using Linked Data to Mine RDF from Wikipedia's Tables

MOTIVATION (2/10)

The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content

… BUT we cannot query all that content…

A query example:

Wikipedia tables or tables in the body are ignored

[Borrowed from Entity Linking tutorial]

Page 4: Using Linked Data to Mine RDF from Wikipedia's Tables

Results as of 25-02-2014

Page 5: Using Linked Data to Mine RDF from Wikipedia's Tables

First result

Page 6: Using Linked Data to Mine RDF from Wikipedia's Tables

Second result

10 airlines

Page 7: Using Linked Data to Mine RDF from Wikipedia's Tables

Third result

19 airlines

Page 8: Using Linked Data to Mine RDF from Wikipedia's Tables

MOTIVATION (7/10)

• Same query in SPARQL over DBpedia

SELECT ?p ?o WHERE
{ <http://dbpedia.org/resource/Airbus_A380> ?p ?o . }

FAIL

Page 9: Using Linked Data to Mine RDF from Wikipedia's Tables

Page 10: Using Linked Data to Mine RDF from Wikipedia's Tables

No evidence of A380

Page 11: Using Linked Data to Mine RDF from Wikipedia's Tables

MOTIVATION (10/10)

• We automatically extract facts, as RDF, from Wikipedia tables using knowledge bases (KBs)

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 12: Using Linked Data to Mine RDF from Wikipedia's Tables

EXTRACTING RDF FROM TABLES (1/4)

• As far as we know, DBpedia and YAGO ignore tables in the article body

– Mainly focused on infoboxes

• Languages such as R2RML can express custom mappings from relational database tables to RDF

– Each row as a subject, each column as a predicate and each cell as an object

– Needs a mapping definition
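The row/column/cell reading above can be sketched in a few lines of Python. This is only an illustration of the idea, not R2RML syntax: the `EX` namespace, the `rows_to_triples` helper and the sample row are made up for the example, and the `column_to_predicate` dictionary stands in for the mapping definition that a real R2RML document would supply.

```python
# Illustrative only: not R2RML, just the "row -> subject, column -> predicate,
# cell -> object" reading. All names below are hypothetical.
EX = "http://example.org/"  # made-up namespace for the example

def rows_to_triples(header, rows, key_column, column_to_predicate):
    """Turn each row into triples using a hand-written column mapping."""
    key_index = header.index(key_column)
    triples = []
    for row in rows:
        subject = EX + row[key_index].replace(" ", "_")
        for name, cell in zip(header, row):
            # Columns without a mapping entry are skipped, which is why
            # a mapping definition has to exist before any RDF comes out.
            if name == key_column or name not in column_to_predicate:
                continue
            triples.append((subject, column_to_predicate[name], cell))
    return triples

header = ["Player", "Position", "Club"]
rows = [["David de Gea", "Goalkeeper", "Manchester United F.C."]]
mapping = {"Position": EX + "position", "Club": EX + "club"}

triples = rows_to_triples(header, rows, "Player", mapping)
for triple in triples:
    print(triple)
```

The point of the sketch is the last bullet: nothing is produced for a column until someone has written its entry in the mapping, which does not scale to tables authored freely by Wikipedia editors.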

Page 13: Using Linked Data to Mine RDF from Wikipedia's Tables

EXTRACTING RDF FROM TABLES (2/4)

• [Limaye et al. 2010; Mulwad et al. 2010 & 2013] presented approaches using an in-house KB and small datasets for validation

– Entity recognition/disambiguation

– Determine types for each column

– Determine relationships between columns

• We focus on Wikipedia tables, running our algorithms over the entire corpus with “row-centric” features for Machine Learning models

Page 14: Using Linked Data to Mine RDF from Wikipedia's Tables

EXTRACTING RDF FROM TABLES (3/4)

• Extraction of two types of relationships

– Between the main entity of the article and the cells in the same column, e.g., “Manchester United F.C.” and “David de Gea”

– Between entities in different columns but the same row

Predicates shown in the example figure: dbp:currentClub, dbp:position
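A minimal sketch of how the two kinds of candidate pairs could be enumerated. The function name, the representation of a row as a list of already-linked entity names (with `None` for cells without a link), and the sample row are assumptions for illustration; the slides do not show the actual implementation.

```python
from itertools import combinations

def candidate_pairs(protagonist, rows):
    """Yield (kind, entity_a, entity_b) pairs worth checking for a relation.

    `rows` is a list of rows; each row is a list of cell entities,
    with None for cells that contain no linked entity.
    """
    for row in rows:
        entities = [e for e in row if e is not None]
        # Type 1: the article's main entity against each entity in the table.
        for entity in entities:
            if entity != protagonist:
                yield ("article-cell", protagonist, entity)
        # Type 2: entities in different columns of the same row.
        for a, b in combinations(entities, 2):
            yield ("same-row", a, b)

rows = [["David de Gea", "Goalkeeper", None]]
pairs = list(candidate_pairs("Manchester United F.C.", rows))
for pair in pairs:
    print(pair)
```

Each pair is only a candidate: whether a relation such as dbp:currentClub actually holds between the two entities still has to be looked up in the knowledge base.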

Page 15: Using Linked Data to Mine RDF from Wikipedia's Tables

EXTRACTING RDF FROM TABLES (4/4)

Page 16: Using Linked Data to Mine RDF from Wikipedia's Tables

WIKITABLES SURVEY (1/2)

• Wikipedia dump from February 13th 2013

• Table taxonomy

1.14 million tables

Page 17: Using Linked Data to Mine RDF from Wikipedia's Tables

WIKITABLES SURVEY (2/2)

• Table model

– Input: a source of tables (a set of tables)

• E.g., a Wikipedia article

• Each table belonging to the source is modeled as a matrix

• We normalize the tables and convert each HTML table into a matrix
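One way the normalization step could look, sketched with Python's standard `html.parser`. The class and function names are hypothetical, the sample table is made up, and the choice to copy a spanning cell's text into every position it covers is an assumption; the slides do not say how spans are normalized. Nested tables are not handled.

```python
from html.parser import HTMLParser

class TableToMatrix(HTMLParser):
    """Collect the cells of one HTML table, expanding colspan/rowspan."""

    def __init__(self):
        super().__init__()
        self.cells = {}    # (row, col) -> cell text
        self.row = -1
        self.col = 0
        self.span = None   # (rowspan, colspan) of the currently open cell
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            self.span = (int(a.get("rowspan", 1)), int(a.get("colspan", 1)))
            self.text = []

    def handle_data(self, data):
        if self.span is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.span is not None:
            # Skip positions already filled by a rowspan from a row above.
            while (self.row, self.col) in self.cells:
                self.col += 1
            rowspan, colspan = self.span
            value = "".join(self.text).strip()
            for r in range(rowspan):
                for c in range(colspan):
                    self.cells[(self.row + r, self.col + c)] = value
            self.col += colspan
            self.span = None

def html_table_to_matrix(html):
    """Return the table as a rectangular list of rows (empty string = no cell)."""
    parser = TableToMatrix()
    parser.feed(html)
    parser.close()
    if not parser.cells:
        return []
    n_rows = max(r for r, _ in parser.cells) + 1
    n_cols = max(c for _, c in parser.cells) + 1
    return [[parser.cells.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]

html = """
<table>
  <tr><th>Club</th><th>Player</th></tr>
  <tr><td rowspan="2">A</td><td>p1</td></tr>
  <tr><td>p2</td></tr>
</table>
"""
matrix = html_table_to_matrix(html)
for row in matrix:
    print(row)
```

Expanding spans this way gives every row the same number of columns, so later steps can address a cell by (row, column) without special cases.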

Page 18: Using Linked Data to Mine RDF from Wikipedia's Tables

MINING RDF FROM WIKITABLES (1/6)

• To extract RDF from Wikitables we rely on a reference knowledge base: DBpedia

– Version 3.8

Pipeline (from the diagram): Wikipedia table → Extract links in the cells → Mapping links to DBpedia → Lookups on DBpedia to find relationships between entities in the same row → Candidate relationships
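The "mapping links to DBpedia" step relies on DBpedia resource URIs being derived from Wikipedia article titles, as in the `Airbus_A380` URI used earlier in the talk. A minimal sketch, assuming English Wikipedia links of the form `/wiki/<Title>`; the function name is hypothetical, and redirect resolution, namespace pages (for example `File:`) and DBpedia's exact character-encoding rules are deliberately left out.

```python
from urllib.parse import unquote, urlparse

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def wiki_link_to_dbpedia(href):
    """Map a Wikipedia article link to a DBpedia resource URI.

    Returns None for links that do not point at an article path
    (external links, red links and so on). Redirects are not resolved.
    """
    path = urlparse(href).path  # drops any #fragment or ?query part
    if not path.startswith("/wiki/"):
        return None
    title = unquote(path[len("/wiki/"):]).replace(" ", "_")
    if not title:
        return None
    return DBPEDIA_RESOURCE + title

print(wiki_link_to_dbpedia("/wiki/Airbus_A380"))
print(wiki_link_to_dbpedia("https://en.wikipedia.org/wiki/David_de_Gea#Club_career"))
```

Cells whose links return `None` contribute no entity and are simply skipped by the later lookup step in this sketch.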

Page 19: Using Linked Data to Mine RDF from Wikipedia's Tables

MINING RDF FROM WIKITABLES (2/6)

• We aim to discover:

– Relations between entities on the same row

– Relations between entities in the table and the protagonist of the article

• Map the links inside the cells to RDF resources

• Get candidate relationships from the KB

SELECT DISTINCT ?p1 ?p2
WHERE { { <e1> ?p1 <e2> } UNION { <e2> ?p2 <e1> } }
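The lookup above can be sketched in two forms: building the query text for a pair of entities, and evaluating the same pattern against a small in-memory set of triples. Both helpers are hypothetical, and the one-triple toy KB is made up for illustration; the real system queries a DBpedia 3.8 dump.

```python
def candidate_relations_query(e1, e2):
    """Build the SPARQL lookup for relations between e1 and e2, either direction."""
    return (
        "SELECT DISTINCT ?p1 ?p2 WHERE { "
        f"{{ <{e1}> ?p1 <{e2}> }} UNION {{ <{e2}> ?p2 <{e1}> }} "
        "}"
    )

def candidate_relations(kb, e1, e2):
    """Same pattern over an in-memory iterable of (s, p, o) triples."""
    forward = {p for s, p, o in kb if s == e1 and o == e2}   # plays the role of ?p1
    backward = {p for s, p, o in kb if s == e2 and o == e1}  # plays the role of ?p2
    return forward, backward

DBR = "http://dbpedia.org/resource/"
player = DBR + "David_de_Gea"
club = DBR + "Manchester_United_F.C."

query = candidate_relations_query(player, club)
print(query)

# Toy KB with a single illustrative triple.
kb = [(player, "dbp:currentClub", club)]
forward, backward = candidate_relations(kb, player, club)
print(forward, backward)
```

Keeping `?p1` and `?p2` separate preserves the direction of each relation, which matters when the extracted triple is later written out for the other rows of the table.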

Page 20: Using Linked Data to Mine RDF from Wikipedia's Tables

MINING RDF FROM WIKITABLES (3/6)

• We detected some weak relationships

• … We need more filtering for relationships

Predicates shown in the example figure: dbp:currentClub, dbp:youthClubs

Page 21: Using Linked Data to Mine RDF from Wikipedia's Tables

MINING RDF FROM WIKITABLES (4/6)

• Features at different levels used to train Machine Learning models

– Article features (e.g., # of tables)

– Table features (e.g., # of rows, # of columns, ratios)

– Cell features (e.g., # of entities, string length, has format)

– Column features (e.g., # of entities, # of unique entities)

– Predicate/Column features (e.g., string similarity, # of rows where relation holds)

– Predicate features (e.g., triple count, count unique)

– Triple features (e.g., is the table from article or body)
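A sketch of how a few of the listed features could be computed from a table matrix. The function names, the use of `None` for cells without an entity, and the placeholder entities and `ex:club` predicate are assumptions for illustration; the full feature set and its exact definitions are in the paper, not in these slides.

```python
def table_features(matrix):
    """A few table-level features; `matrix` is a list of equal-length rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    return {
        "rows": n_rows,
        "columns": n_cols,
        "row_column_ratio": n_rows / n_cols if n_cols else 0.0,
    }

def column_features(matrix, col):
    """Entity counts for one column; None marks a cell without an entity."""
    entities = [row[col] for row in matrix if row[col] is not None]
    return {"entities": len(entities), "unique_entities": len(set(entities))}

def rows_where_relation_holds(matrix, col_a, col_b, predicate, kb):
    """Count rows whose entities in the two columns are linked by `predicate`."""
    return sum(
        1 for row in matrix
        if (row[col_a], predicate, row[col_b]) in kb
    )

# Placeholder data: three rows, two columns, one known triple.
matrix = [["p1", "club_x"], ["p2", "club_x"], ["p3", None]]
kb = {("p1", "ex:club", "club_x")}

print(table_features(matrix))
print(column_features(matrix, 1))
print(rows_where_relation_holds(matrix, 0, 1, "ex:club", kb))
```

The last count is the kind of signal that separates strong from weak candidates: a predicate that already holds in many rows of a column pair is more likely to hold in the remaining rows too.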

Page 22: Using Linked Data to Mine RDF from Wikipedia's Tables

MINING RDF FROM WIKITABLES (5/6)

• The experimental set-up

– Wikipedia dump from February 2013

– DBpedia dump version 3.8

– 8 machines (ca. 2005) with 4GB of RAM, 2.2GHz single-core processors

• After 12 days we got 34.9 million unique triples not in DBpedia

• We manually annotated a sample of 750 triples to train the ML models

Page 23: Using Linked Data to Mine RDF from Wikipedia's Tables

MINING RDF FROM WIKITABLES (6/6)

            Bagging DT   Simple Logistic   SVM
accuracy    78.1%        78.53%            72.6%
precision   81.5%        79.62%            72.4%
recall      77.4%        79.01%            75.8%
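For reference, the three measures in the table follow the usual definitions for a binary classifier (here: is a candidate triple correct or not). The sketch below uses invented confusion-matrix counts, not the counts from this experiment, and the slides do not state how the figures were averaged or which class was treated as positive.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all decisions that are right
        "precision": tp / (tp + fp),     # share of accepted triples that are correct
        "recall": tp / (tp + fn),        # share of correct triples that are accepted
    }

# Invented counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When the goal is to add new facts to a knowledge base, precision is the measure to watch, since every false positive becomes a wrong triple in the output.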

Page 24: Using Linked Data to Mine RDF from Wikipedia's Tables

CONCLUSION

• In this work we aimed to

– Interpret the semantics of tables using KBs

– Enrich KBs with new facts mined from tables

• With the best model we got 7.9 million unique novel triples

• We still don’t

– Consider literals/string values in the cells

– Exploit domain/range of predicates

– Test other KBs like Freebase and YAGO

Page 25: Using Linked Data to Mine RDF from Wikipedia's Tables

CONTRAST WITH OTHER PAPERS

• Most of the related papers use some knowledge base, such as DBpedia

– They can benefit from new RDF triples extracted from Wikipedia tables

• We can use the similarity proposed in “Knowledge-based graph document modeling”, by Schuhmacher and Ponzetto, to improve the relation extraction

• And use the paper “Trust, but Verify: Predicting Contribution Quality for Knowledge Base Construction and Curation”, by Chun How Tan et al., to assess the correctness of the output triples

Page 26: Using Linked Data to Mine RDF from Wikipedia's Tables

Thank you!

Emir Muñoz

SVM our third best model

http://emunoz.org/wikitables

