Fact Extraction from Wikipedia

Post on 04-Aug-2015

Cutting Long Stories Short

Fact Extraction from Wikipedia

Marco Fossati fossati@spaziodati.eu

Poznan, 25th June 2015

What? A Google Summer of Code Project for DBpedia

What?

Teaching Machines to Read

Natural Language

Why? Text Contains a Huge Amount of Knowledge

Why?

DBpedia Focuses on Semi-structured Data

Discovery of New Relations

Automatic Knowledge Base Population

How?

Machine Learning +

Lexical Semantics

How?

Poland victory World Cup 2014

“Poland won the World Cup in 2014”
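To make the example concrete, here is a minimal sketch of how the sentence could be represented once it has been read as a frame. The frame name "Victory" and the role names "Winner", "Competition" and "Time" are assumptions chosen for illustration, not labels taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame evoked by a lexical unit, together with its frame elements."""
    name: str           # frame label (illustrative)
    lexical_unit: str   # the word in the sentence that evokes the frame
    elements: dict = field(default_factory=dict)  # role name -> text span


sentence = "Poland won the World Cup in 2014"

# Hand-written analysis of the slide's example; role names are assumptions.
frame = Frame(
    name="Victory",
    lexical_unit="won",
    elements={"Winner": "Poland", "Competition": "World Cup", "Time": "2014"},
)

# Sanity check: every role filler is a literal span of the sentence.
assert all(span in sentence for span in frame.elements.values())
print(frame)
```

The verb "won" is the lexical unit; the remaining content words fill the roles, which is the keyword view shown on the slide.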

Approach

1. Lexical Units

1.1. Extraction via POS Tagging

1.2. Statistical Ranking

2. Frame Database (FrameNet, Kicktionary)

The Data-driven Way
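Steps 1.1 and 1.2 can be sketched as follows. The slides do not name the ranking statistic, so summed TF-IDF is assumed here as one plausible choice. The POS tags in the toy corpus are assigned by hand; a real pipeline would obtain them from a tagger.

```python
import math
from collections import Counter

# Toy corpus of POS-tagged documents: lists of (token, tag) pairs.
# Tags are hand-assigned for illustration only.
docs = [
    [("Poland", "NOUN"), ("won", "VERB"), ("the", "DET"), ("cup", "NOUN")],
    [("Rossi", "NOUN"), ("scored", "VERB"), ("and", "CONJ"), ("won", "VERB")],
    [("Milan", "NOUN"), ("is", "VERB"), ("a", "DET"), ("club", "NOUN")],
    [("he", "PRON"), ("is", "VERB"), ("a", "DET"), ("striker", "NOUN")],
    [("the", "DET"), ("team", "NOUN"), ("is", "VERB"), ("strong", "ADJ")],
]


def rank_lexical_units(tagged_docs, pos="VERB"):
    """Rank candidate lexical units with the given POS by summed TF-IDF."""
    n_docs = len(tagged_docs)
    # Per-document counts of the candidate tokens (lowercased).
    per_doc = [
        Counter(tok.lower() for tok, tag in doc if tag == pos)
        for doc in tagged_docs
    ]
    # Number of documents each candidate appears in.
    doc_freq = Counter(term for counts in per_doc for term in counts)
    scores = Counter()
    for doc, counts in zip(tagged_docs, per_doc):
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_lexical_units(docs)
for term, score in ranking:
    print(f"{term}\t{score:.3f}")
```

On this toy corpus, "won" ranks first and the generic verb "is" ranks last, which is the behaviour one would want from a ranking that surfaces domain-specific lexical units.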

Approach

3. Frame + Frame Elements Classification

Unsupervised, Rule-based

Supervised

4. Crowdsourced Training Set Construction

5. RDF Serialization

The Data-driven Way
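A minimal sketch of the rule-based variant of step 3 followed by step 5, under stated assumptions: the gazetteer, the four-digit year rule, the `example.org` namespaces and the property names are all hypothetical, and the output format is plain N-Triples written by hand rather than through an RDF library.

```python
# Hypothetical namespaces; the project's real vocabulary may differ.
RES = "http://example.org/resource/"
ONT = "http://example.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"

# Tiny gazetteer mapping known surface forms to frame-element roles.
GAZETTEER = {
    "Poland": "winner",
    "World Cup": "competition",
}


def classify(sentence):
    """Label spans with frame elements via gazetteer lookup plus a year rule."""
    roles = {role: span for span, role in GAZETTEER.items() if span in sentence}
    for token in sentence.split():
        if token.isdigit() and len(token) == 4:
            roles["time"] = token
    return roles


def to_ntriples(event_id, frame, roles):
    """Serialize one frame instance as N-Triples, using one node per event."""
    subject = f"<{RES}{event_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{ONT}{frame}> ."]
    for role, span in sorted(roles.items()):
        if role == "time":
            obj = f'"{span}"^^<{XSD_GYEAR}>'  # typed literal for the year
        else:
            obj = f"<{RES}{span.replace(' ', '_')}>"
        lines.append(f"{subject} <{ONT}{role}> {obj} .")
    return "\n".join(lines)


roles = classify("Poland won the World Cup in 2014")
triples = to_ntriples("victory_001", "Victory", roles)
print(triples)
```

Giving the event its own node lets a single fact carry more than two participants (winner, competition, time), which a single subject-predicate-object triple cannot express.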

Crowdsourcing the Annotation: Label Words with Frame Elements
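The slides do not say how the crowd's labels are combined, so the following is only a sketch of one common option, majority voting with an agreement threshold. The judgments, the role names and the threshold value are invented for illustration.

```python
from collections import Counter

# Hypothetical crowd judgments: for each span, the labels from three workers.
judgments = {
    "Poland": ["Winner", "Winner", "Competition"],
    "World Cup": ["Competition", "Competition", "Competition"],
    "2014": ["Time", "Winner", "Score"],
}


def aggregate(labels, min_agreement=0.5):
    """Return the majority label, or None if agreement is not above the threshold."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None


# Spans without sufficient agreement are left unlabeled (None).
gold = {span: aggregate(labels) for span, labels in judgments.items()}
print(gold)
```

Spans on which the workers disagree, such as "2014" here, stay unlabeled and can be sent back for more judgments instead of entering the training set.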

Use Case

Soccer Domain

Widely Represented (223,000 articles)

Lots of Semi-structured Data

Italian Wikipedia

Wanna contribute?

https://github.com/dbpedia/fact-extractor

That’s all Folks!

Marco Fossati fossati@spaziodati.eu