Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann
Freie Universität Berlin, Universität Leipzig
Infobox Extraction
dbpedia:Albert_Einstein p:name „Albert Einstein“
dbpedia:Albert_Einstein p:birth_place dbpedia:Ulm
dbpedia:Albert_Einstein p:birth_date „1956‐07‐09“
Structuring Wikipedia‘s Knowledge
• Structuring actual data, not modeling the world
• Bound to Wikipedia Templates, parsers handle template values based on rules (property splitting, merging, transformation)
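A rough sketch of rule-based infobox extraction. This is not the DBpedia extractor: the wikitext snippet, the `extract_triples` helper, and the single transformation rule (a `[[wiki link]]` becomes a `dbpedia:` resource) are assumptions made for illustration.

```python
import re

# Hypothetical infobox wikitext; the values mirror the example triples.
INFOBOX = """{{Infobox scientist
| name = Albert Einstein
| birth_place = [[Ulm]]
}}"""

# Matches a value that consists of exactly one wiki link, e.g. [[Ulm]].
LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def extract_triples(subject, wikitext):
    """Turn `| key = value` lines into (subject, predicate, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"^\|\s*([\w ]+?)\s*=\s*(.+?)\s*$", line)
        if not match:
            continue  # skip the template header, footer, and empty fields
        key, value = match.groups()
        link = LINK.match(value)
        if link:
            # Transformation rule: a wiki link becomes a resource reference.
            obj = "dbpedia:" + link.group(1).replace(" ", "_")
        else:
            obj = '"' + value + '"'
        triples.append((subject, "p:" + key.replace(" ", "_"), obj))
    return triples


for triple in extract_triples("dbpedia:Albert_Einstein", INFOBOX):
    print(" ".join(triple))
```

Real templates need many more rules (splitting lists, merging fields, parsing dates and units); the sketch covers only the simplest case.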
DBpedia Ontology
• DBpedia Ontology built from scratch
• 170 classes, 900 properties
Template Mapping
Class TV Episode (Work)
Wikipedia Templates:
Television Episode
UK Office Episode
Simpsons Episode
DoctorWhoBox
Template Mapping
Infobox Cricketer
Infobox Historic Cricketer
Infobox Recent Cricketer
Infobox Old Cricketer
Infobox Cricketer Biography
=> Class Cricketer (Athlete)
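The many-templates-to-one-class idea can be sketched as a lookup table. The template and class names come from the slides; the dictionary layout and the `class_for_template` helper are illustrative assumptions, not the actual DBpedia mapping format.

```python
# Several Wikipedia templates map onto a single ontology class.
TEMPLATE_TO_CLASS = {
    "Television Episode": "TV Episode",
    "UK Office Episode": "TV Episode",
    "Simpsons Episode": "TV Episode",
    "DoctorWhoBox": "TV Episode",
    "Infobox Cricketer": "Cricketer",
    "Infobox Historic Cricketer": "Cricketer",
    "Infobox Recent Cricketer": "Cricketer",
    "Infobox Old Cricketer": "Cricketer",
    "Infobox Cricketer Biography": "Cricketer",
}

# Superclasses as given in parentheses on the slides.
SUPERCLASS = {"TV Episode": "Work", "Cricketer": "Athlete"}


def class_for_template(template):
    """Return (ontology class, superclass), or None for an unmapped template."""
    cls = TEMPLATE_TO_CLASS.get(template)
    if cls is None:
        return None
    return cls, SUPERCLASS.get(cls)


print(class_for_template("Simpsons Episode"))
print(class_for_template("Infobox Old Cricketer"))
```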
Organisations
Band
Company
Educational Institution
Radio Station
Sports Team
More structured data
• Categories in SKOS
• Intra‐wiki links
• Disambiguation
• Redirects
• Links to Images (and Flickr)
• Links to external webpages
Multilingual
Abstracts
– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000
Semantic Web
“My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that.”
Prof. James Hendler
Web of Documents
Figure: Web browsers and search engines access HTML documents A, B, C, and D over HTTP; the documents are connected by hyperlinks.
Web of Data
Figure: Linked Data browsers, search engines, and Linked Data mashups access data sources A, B, C, D, and E over HTTP; the Things described in these sources are connected by data links.
Linked Data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.
Wikipedia Article URI: http://en.wikipedia.org/wiki/Madrid
DBpedia Resource URI: http://dbpedia.org/resource/Madrid
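The relation between the two URIs can be sketched as follows, assuming the page title is reused unchanged as in the Madrid example. The `dbpedia_resource_uri` helper is hypothetical, and the sketch ignores any encoding differences between the two sites.

```python
from urllib.parse import urlsplit

WIKIPEDIA_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_resource_uri(wikipedia_url):
    """Derive a DBpedia resource URI from an English Wikipedia article URI."""
    parts = urlsplit(wikipedia_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("not an English Wikipedia article URI: " + wikipedia_url)
    # Assumption: the article title carries over unchanged.
    title = parts.path[len(WIKIPEDIA_PREFIX):]
    return DBPEDIA_RESOURCE + title


print(dbpedia_resource_uri("http://en.wikipedia.org/wiki/Madrid"))
```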
HTTP URIs
Real‐World Resources
http://dbpedia.org/resource/Madrid
HTTP GET ‐> 303 See Other ‐> http://dbpedia.org/page/Madrid or http://dbpedia.org/data/Madrid
Information Resources
http://dbpedia.org/page/Madrid
HTTP GET ‐> 200 OK
http://dbpedia.org/data/Madrid
HTTP GET ‐> 200 OK
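The redirect behaviour can be simulated without a network connection. The sketch below is a toy model, not the DBpedia server: the `respond` function is hypothetical, and choosing between the page and data documents by inspecting the Accept header is an assumption that the slides do not spell out.

```python
RESOURCE = "http://dbpedia.org/resource/"
PAGE = "http://dbpedia.org/page/"
DATA = "http://dbpedia.org/data/"


def respond(uri, accept="text/html"):
    """Return (status code, location) for an HTTP GET on a DBpedia-style URI."""
    if uri.startswith(RESOURCE):
        name = uri[len(RESOURCE):]
        # A real-world thing cannot be sent over HTTP, so the client is
        # redirected to a document that describes it.
        target = DATA if "application/rdf+xml" in accept else PAGE
        return 303, target + name
    if uri.startswith(PAGE) or uri.startswith(DATA):
        # Information resources are served directly.
        return 200, uri
    return 404, None


status, location = respond(RESOURCE + "Madrid")
print(status, location)
print(respond(location))
```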
Figure legend: Music, Online Activities, Publications, Geographic, Cross-Domain, Life Sciences
Use Cases
1. Data Source for Web‐Applications
2. Querying Wikipedia like a database
3. Tag Web content with concepts instead of free‐text tags
4. Vocabulary and semantic backbone for enterprise linked data integration
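Use case 2 can be illustrated with a SPARQL query. The sketch below only builds the request URL and sends nothing. The endpoint address, the expansion of the `p:` prefix, and the `build_request_url` helper are assumptions not stated on the slides; the query reuses the `p:birth_place` property from the infobox example.

```python
from urllib.parse import urlencode

# Assumed public endpoint; not named on the slides.
ENDPOINT = "http://dbpedia.org/sparql"

# Assumed namespace expansions for the p: and dbpedia: prefixes.
QUERY = """
PREFIX p: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT ?person WHERE {
  ?person p:birth_place dbpedia:Ulm .
}
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


print(build_request_url(ENDPOINT, QUERY))
```

Fetching the resulting URL with any HTTP client would return the matching resources, provided the endpoint is reachable and the property is populated.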
DBpedia as data source
• Embed structured information from Wikipedia into your web applications
• Build (mobile) maps applications using DBpedia data about places
• Display multilingual titles & descriptions in 15 languages
Annotating Documents
• Use DBpedia concepts to annotate documents instead of free‐text tags
• Named Entity Extraction Systems already use DBpedia URIs (OpenCalais, Muddy Boots)
• Social Bookmarking with DBpedia URIs as tags: www.faviki.com
„Apple“
http://dbpedia.org/resource/Apple_Inc.
http://dbpedia.org/resource/Apple_(fruit)
http://dbpedia.org/resource/Apple_Records
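A minimal sketch of why concept URIs are less ambiguous than the free-text tag „Apple“. The three URIs are those listed above; the sense labels used as keys and the `annotate` helper are hypothetical.

```python
# One free-text tag, three distinct concepts.
CANDIDATES = {
    "Apple": {
        "company": "http://dbpedia.org/resource/Apple_Inc.",
        "fruit": "http://dbpedia.org/resource/Apple_(fruit)",
        "record label": "http://dbpedia.org/resource/Apple_Records",
    }
}


def annotate(tag, sense):
    """Replace a free-text tag with the URI of the chosen sense."""
    try:
        return CANDIDATES[tag][sense]
    except KeyError:
        raise ValueError(f"no concept known for {tag!r} as {sense!r}") from None


print(annotate("Apple", "fruit"))
```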
Annotating Documents
• BBC editors tag news articles with DBpedia concepts
• DBpedia Lookup Service: http://lookup.dbpedia.org
Linking Enterprise Data
Take the Linking Open Data approach to the enterprise
Linking Enterprise Data
• Connect data sets with DBpedia as shared vocabulary
• Enable meaningful navigation paths across BBC websites
• Browsing Madonna‐related information across BBC News, BBC Music, BBC Programmes, …
• Make use of the rich background information:
relate the release of a music album to a news article about the artist
Cross‐Language Data Fusion
• 264 Wikipedia Editions in different languages
– Italian Wikipedians know more about Italian villages
– German Wikipedia contains more person infoboxes
• Augment the infobox dataset with facts from other Wikipedia editions.
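One possible fusion strategy, sketched under stated assumptions: each edition contributes a property-to-value dictionary, and a fixed priority order decides which edition wins when several provide the same property. The `fuse` helper, the priority rule, and the placeholder values are illustrative only; the slides do not specify a fusion policy.

```python
def fuse(facts_by_edition, priority):
    """Merge per-edition infobox facts; editions earlier in `priority` win."""
    fused = {}
    for lang in priority:
        for prop, value in facts_by_edition.get(lang, {}).items():
            # setdefault keeps the value from the highest-priority edition
            # and records which edition supplied it.
            fused.setdefault(prop, (value, lang))
    return fused


# Placeholder facts for a hypothetical village; not real data.
facts = {
    "en": {"name": "NAME_FROM_EN"},
    "it": {"name": "NAME_FROM_IT", "population": "POPULATION_FROM_IT"},
}

fused = fuse(facts, priority=["en", "it"])
for prop, (value, lang) in fused.items():
    print(prop, value, lang)
```

Here the Italian edition fills in the property that the English edition lacks, which is the augmentation the slide describes.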
Augment DBpedia with External Data
• Linking Open Data cloud provides more data than Wikipedia
– EuroStat provides additional statistical information about countries.
– Musicbrainz contains additional information about other bands.
– Geonames provides additional information about locations.
• Idea
– Augment DBpedia with additional data from external sources.
Contribute back to Wikipedia
• Opportunity
– Feed data back to Wikipedia
• Extend the Wikipedia authoring environment with
– Suggestions for infobox values
– Cross‐language consistency checking for infoboxes
• Currently going on
– New maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)
Contribute back to Wikipedia
• Initialize Wikipedia Clean‐Up Cycles
– Data‐driven search interfaces expose the weaknesses of the Wikipedia template system.
– Preferred items not showing up in end‐user interfaces may motivate Wikipedia editors to use templates more stringently.
Live Update
• Current Situation
– DBpedia update cycle: 3 months
– Wikipedia provides us with access to the live update stream
• Opportunity
– Increase the currency of the DBpedia dataset using this update stream
• Result
– DBpedia in synchronization with Wikipedia.