Linked Data: principles and examples

Post on 07-Jul-2015


Guest Lecture for VU Social Science students


Linked Data Principles and Examples

Victor de Boer, 25-11-2014

With slides from Knud Hinnerk Moller, Kasper Brandt, Christophe Gueret

Victor de Boer

Researcher at Netherlands Institute for Sound and Vision

Assistant professor at Web and Media Group VU

Semantic Web, Linked Data

Cultural Heritage

Digital History

Linked Data for Development

http://info.cern.ch/Proposal.html

Tim Berners-Lee (The inventor of the Web)

Web of Documents (WWW): Linked Documents

From text to data > increased semantics

More and more structured data available online

• Governments

• Social web data

• Medical data

• Museums

• Research data

[Image credit: Moverum.com]

Web of Documents vs Web of Data

• People are often not interested in documents, they are interested in things (information)

– Humans are very good at reading (web) documents and distilling information

• Computers are very good at calculating, combining and filtering information, but they are very bad at reading documents

– We need to help machines understand web data

– Write it down in a way that they can understand

LINKED DATA!!

Web of Documents (WWW): Linked Documents

Web of Data: Linked Data

Without Linked Data (slide stolen from Christophe Gueret)

With Linked Data (slide stolen from Christophe Gueret)

http://info.cern.ch/Proposal.html

Tim Berners-Lee (the inventor of the Web) and the Semantic Web

What is Linked Open Data?

Intermezzo


Open Data is about licenses to allow reuse

Linked Data is about technology for interoperability


★ Available on the web (whatever format), but with an open license

★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table)

★★★ As (2), plus non-proprietary format (e.g. CSV instead of Excel)

★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff

★★★★★ All the above, plus: link your data to other people’s data to provide context

www.w3.org/designissues/linkeddata.html

Linked Data five star system (TBL)


http://lod-cloud.net/

Examples of Linked Data

• Academia, Research

• Community

• Libraries, Museums, Cultural Heritage

• Government and public institutions

(Open Data)

• Media

• Business

OpenPhacts explorer

http://www.openphacts.org/

Google knowledge graph

www.huffingtonpost.com

How does all this work?

• Data, not documents

• Structured data

• Graph (networked) data!

• W3C Web standards stack

– URIs, HTTP, RDF, RDFa, RDFS, OWL, SPARQL, etc.

Four rules of Linked Data

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF)

4. Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html

Semantic Web standard for writing down data, information

(Subject, Relation, Object)

<Painting001, has_location, Amsterdam>

Resource Description Framework (RDF)

Painting001 –has_location→ Amsterdam
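A triple can be held in an ordinary data structure. The sketch below is only an illustration of the (Subject, Relation, Object) shape from the slide; the `Triple` class and the `describe` helper are made-up names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a relation (predicate) and an object."""
    subject: str
    relation: str
    obj: str


def describe(t: Triple) -> str:
    """Render a triple in the angle-bracket notation used on the slide."""
    return f"<{t.subject}, {t.relation}, {t.obj}>"


# The example statement from the slide.
painting = Triple("Painting001", "has_location", "Amsterdam")
print(describe(painting))
```

In real RDF the three parts would be URIs (or a literal for the object) rather than bare names.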

[Figure: graph with nodes SJTU, Shanghai, Beijing and People’s Republic of China, connected by "name", "located in", "population" and "capital" edges; the same statements are listed as triples below]

SJTU name "Shanghai Jiao Tong University"

SJTU located in Shanghai

Shanghai name "上海"

Shanghai population "23,019,148"

Shanghai located in People’s Republic of China

People’s Republic of China capital Beijing

Beijing located in People’s Republic of China

Beijing population "20,693,000"

• Graph

• Triple

Graph Thinking
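To make the step from triples to a graph concrete, here is a minimal sketch that loads the eight statements above into an adjacency list and walks along a chain of relations. The helper names `build_graph` and `follow` are illustrative assumptions; a real application would more likely use an RDF store and SPARQL.

```python
from collections import defaultdict

# The triples from the slide, written as (subject, relation, object).
triples = [
    ("SJTU", "name", "Shanghai Jiao Tong University"),
    ("SJTU", "located in", "Shanghai"),
    ("Shanghai", "name", "上海"),
    ("Shanghai", "population", "23,019,148"),
    ("Shanghai", "located in", "People’s Republic of China"),
    ("People’s Republic of China", "capital", "Beijing"),
    ("Beijing", "located in", "People’s Republic of China"),
    ("Beijing", "population", "20,693,000"),
]


def build_graph(statements):
    """Index the triples by subject: node -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in statements:
        graph[subject].append((relation, obj))
    return graph


def follow(graph, start, *relations):
    """Follow a chain of relations from start and return the nodes reached."""
    current = {start}
    for relation in relations:
        current = {
            obj
            for node in current
            for rel, obj in graph.get(node, [])
            if rel == relation
        }
    return current


graph = build_graph(triples)
# "What is the capital of the country in which SJTU's city lies?"
print(follow(graph, "SJTU", "located in", "located in", "capital"))
```

The question is answered by hopping over three edges, which is the kind of combining and filtering that is hard to do on documents and easy to do on a graph.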

Use HTTP URIs for Things

• Uniform Resource Identifier (URI) is a string of characters used to identify a name of a resource

• http://rijksmuseum.nl/data/schilderij1

• I can go there (dereference) and then I get information about it

– HTML page for humans

– RDF data for machines
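One common way to serve both audiences from the same URI is HTTP content negotiation: the client states in the Accept header which format it wants. The sketch below only builds the two requests and does not send them; `build_lookup` is a hypothetical helper, the URI is the illustrative one from the slide, and whether a given server honours the header is an assumption.

```python
from urllib.request import Request


def build_lookup(uri: str, for_machine: bool) -> Request:
    """Prepare (but do not send) a request for the HTML or the RDF view of a URI."""
    if for_machine:
        # Media types of two common RDF serialisations (Turtle and RDF/XML).
        accept = "text/turtle, application/rdf+xml"
    else:
        accept = "text/html"
    return Request(uri, headers={"Accept": accept})


uri = "http://rijksmuseum.nl/data/schilderij1"  # example URI from the slide
human_request = build_lookup(uri, for_machine=False)
machine_request = build_lookup(uri, for_machine=True)
print(human_request.get_header("Accept"))
print(machine_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup.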

Links

• Link your data to other data

– By establishing RDF triples that point to other people’s data

– By reusing other people’s URIs

Example: Link to Geonames

IDS: document 0002

Country: "Gambia"

Geonames:Gambia

Region: Africa

Population: 1593256

N 13° 30' 0'' W 15° 30' 0''
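The benefit of the link can be sketched in a few lines: once the IDS document points at the GeoNames resource instead of the plain string "Gambia", everything GeoNames says about that resource becomes available as context. The prefixed names (`ids:`, `geonames:`) and the helper functions below are illustrative shorthand, not real identifiers.

```python
# Local data: the document now links to a resource, not to a string.
ids_data = [
    ("ids:document0002", "country", "geonames:Gambia"),
]

# Data published by someone else about that same resource (values from the slide).
geonames_data = [
    ("geonames:Gambia", "region", "Africa"),
    ("geonames:Gambia", "population", 1593256),
]


def properties_of(resource, triples):
    """Collect all property/value pairs stated about one resource."""
    return {p: o for s, p, o in triples if s == resource}


def context_for(document, own_data, linked_data):
    """For each outgoing link of the document, pull in what the other dataset knows."""
    context = {}
    for subject, relation, target in own_data:
        if subject == document:
            context[relation] = {"uri": target, **properties_of(target, linked_data)}
    return context


result = context_for("ids:document0002", ids_data, geonames_data)
print(result)
```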

Reuse things: Vocabularies

• FOAF (Friend of a Friend): People, Organisations, Social Networks

• Dublin Core (Bibliographic): publications, authors, media, etc.

• schema.org (Google, Yahoo!, Bing, Yandex): cross-domain, what search engines are interested in (people, events, products, locations)

• Good Relations: business, products, etc.

rijks:Painting001 –http://purl.org/dc/terms/spatial→ Amsterdam

Reuse things: Datasets

• GeoNames: Geographical data

• DBPedia: RDF version of Wikipedia (also in Dutch)

• GTAA (Gemeenschappelijke Thesaurus Audiovisuele Archieven): Persons, topics, AV-terms

• VIAF: Persons

rijks:Painting001 –http://purl.org/dc/terms/spatial→ http://sws.geonames.org/2759794/
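Written out in full, the statement above reuses both a vocabulary term (Dublin Core's `spatial`) and a dataset URI (the GeoNames resource). The sketch below expands the prefixed names and prints the statement as one N-Triples line. The namespace behind the `rijks:` prefix is not given on the slide, so `http://example.org/rijks/` is a placeholder assumption, and `expand` and `to_ntriples` are hypothetical helpers.

```python
PREFIXES = {
    "rijks": "http://example.org/rijks/",    # placeholder, real namespace unknown
    "dcterms": "http://purl.org/dc/terms/",  # Dublin Core terms namespace
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'dcterms:spatial' into a full URI."""
    prefix, local_name = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local_name


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one URI-only triple as a line of N-Triples."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(
    expand("rijks:Painting001"),
    expand("dcterms:spatial"),
    "http://sws.geonames.org/2759794/",  # reused GeoNames URI from the slide
)
print(line)
```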

Examples

Dutch Ships and Sailors Linked Data Cloud

Victor de Boer, Matthias van Rossum, Jur Leinenga, Rik Hoekstra

With input from Andrea Bravo Balado and Robin Ponstein

Netherlands Institute for Sound and Vision / VU University Amsterdam v.de.boer@vu.nl

ISWC2014

The Problem: ((Maritime) historical) data is not integrated

25+ Maritime datasets; Heterogeneous

The solution

Well, Linked Data obviously!

But why Linked Data

• Heterogeneous models, one data format

– Link what can be linked

– Keep specificity of original data

– Allow integration at project level (and beyond)

• Links to other sources: re-use knowledge

• Extensible

• Allow multiple levels of semantic enrichment / normalization

– Provenance

KB Delpher

Dutch-Asiatic Shipping (DAS): Voyages (Huygens ING)

“VOC Opvarenden”: Mustering and payroll information (DANS Easy)

Dutch Ships and Sailors

Modeling in collaboration with historians (1)

[Figure: RDF model of the muster-roll (MDB) data. Labels in the diagram, with each dataset-specific term shown next to its more general counterpart:]

Classes: mdb:Aanmonstering (dss:Record), mdb:PersoonsContract (dss:Record), mdb:Schip (dss:Schip), mdb:ScheepsType (dss:ShipType), mdb:Person (dss:Person, foaf:Person), mdb:Rang (dss:Rank)

Properties: mdb:ship (dss:ship), mdb:scheepsnaam (dss:shipname, rdfs:label), mdb:scheepstype (dss:shiptype), mdb:inventarisnummer (dcterms:identifier), mdb:rank (dss:rank), mdb:voornaam (foaf:firstname), mdb:achternaam (foaf:lastname), mdb:has_person, mdb:has_KB_article, mdb:maandgage, dss:has_aanmonstering, owl:sameAs

Resources: mdb:aanmonstering-del_gem-1879-101, mdb:persoonscontract-del_gem-1879-101-16858-Pieter_Hoekstra, mdb:schip-del_gem-1879-101-Isadora, mdb:schip-del_gem-1879-137-Isadora, mdb:persoon-del_gem-1879-101-16858, mdb:schoener, mdb:matroos, <http://resolver.kb.nl/resolve?urn=ddd:010063756:mpeg21:a0045:ocr>

Literals: "Isadora", "Pieter", "Hoekstra", "32", "1870-1894"

Jur Leinenga (Huygens ING): Muster-rolls Northern Provinces, 1803-1937

Modeling in collaboration with historians (2)

[Figure: RDF model of the GZMVOC payroll data. Labels in the diagram, with each dataset-specific term shown next to its more general counterpart:]

Classes: gzmvoc:Telling (dss:Record), gzmvoc:Schip (dss:Ship), gzmvoc:Scheepstype (dss:ShipType), das:Voyage (dss:Record)

Properties: gzmvoc:schip (dss:has_ship), gzmvoc:scheepsnaam (dss:scheepsnaam, rdfs:label), gzmvoc:has_shiptype (dss:has_shiptype), gzmvoc:scheepstype, gzmvoc:aziatischeBemanning, gzmvoc:azAantalMatrozen, gzmvoc:telling, gzmvoc:heeft DAS heenreis, dss:azRegistratieKop

Resources: gzmvoc:telling-1046-De_Berkel, gzmvoc:schip-1046-De_Berkel, gzmvoc:type-Ship, das:voyage-1918_61, __bnode_1

Literals: "1046", "Schip", "De Berkel", "21", "Moorsemattroosen"

Matthias van Rossum (VU-hist): Payroll information for European vs Asiatic Sailors (17th / 18th C)

Modelling principles

• Model each dataset as directly as possible

– Only “syntactical” transformation to RDF

– No normalization

• Reusability

• Transparency, trust

• Normalize and link in second stage

– Store in separate RDF Named Graphs

mdb:Schip1 mdb:scheepsType mdb:Kof

das:ShipX das:typeOfShip das:Kofship

mdb:scheepsType rdfs:subPropertyOf dss:has_shipType

das:typeOfShip rdfs:subPropertyOf dss:has_shipType

Link properties and classes to interoperability layer
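The effect of the interoperability layer can be sketched as a small inference step: every statement made with a dataset-specific property is also true for its super-property, so one query on `dss:has_shipType` reaches both datasets. This assumes the diagram means that both `mdb:scheepsType` and `das:typeOfShip` are sub-properties of `dss:has_shipType`; the function names are illustrative, and a triple store with RDFS reasoning would do this step for you.

```python
# Original, dataset-specific statements (kept exactly as modelled).
data = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof"),
    ("das:ShipX", "das:typeOfShip", "das:Kofship"),
]

# Interoperability layer: property -> its super-property (rdfs:subPropertyOf).
sub_property_of = {
    "mdb:scheepsType": "dss:has_shipType",
    "das:typeOfShip": "dss:has_shipType",
}


def infer_super_properties(triples, hierarchy):
    """Add a copy of each triple for every super-property of its predicate."""
    inferred = []
    for subject, predicate, obj in triples:
        seen = set()  # guards against accidental cycles in the hierarchy
        while predicate in hierarchy and predicate not in seen:
            seen.add(predicate)
            predicate = hierarchy[predicate]
            inferred.append((subject, predicate, obj))
    return inferred


def query(triples, predicate):
    """Return the sorted (subject, object) pairs stated with a given predicate."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


all_triples = data + infer_super_properties(data, sub_property_of)
print(query(all_triples, "dss:has_shipType"))
```

The original triples stay untouched, which matches the principle of modelling each dataset as directly as possible and adding links in a second stage.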

mdb:Schip1 mdb:scheepsType mdb:Kof

das:ShipX das:typeOfShip das:Kofship

[Figure: three skos:exactMatch links connect the ship types to the Getty AAT concepts Aat:Kof and Aat:Platbodems]

Vocabulary Links
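Vocabulary links do for values what the interoperability layer does for properties. The sketch below groups ships by the shared concept their local ship type is matched to. It assumes that `mdb:Kof` and `das:Kofship` are both linked to the AAT concept for "Kof" with skos:exactMatch (the exact arrows cannot be recovered from the slide), and `ships_by_concept` is a hypothetical helper.

```python
# Assumed skos:exactMatch links: local ship type -> shared AAT concept.
exact_match = {
    "mdb:Kof": "aat:Kof",
    "das:Kofship": "aat:Kof",
}

# Ships with their dataset-specific ship type.
ship_types = [
    ("mdb:Schip1", "mdb:Kof"),
    ("das:ShipX", "das:Kofship"),
]


def ships_by_concept(typed_ships, matches):
    """Group ships under the shared concept; unmatched types keep their own name."""
    grouped = {}
    for ship, local_type in typed_ships:
        concept = matches.get(local_type, local_type)
        grouped.setdefault(concept, []).append(ship)
    return grouped


grouped = ships_by_concept(ship_types, exact_match)
print(grouped)
```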

Links to DBPedia (Ship types, places, ranks)

Links to Getty AAT (Ship types, ranks)

Links to GeoNames (Places)

Linking to Historical newspapers

• Automatically detect links between ships and historical newspaper articles (delpher.nl)

– Based on ship name, time intervals, captain’s names, ship type, named entities, keywords, background knowledge

• 179,120 links

- Andrea Bravo Balado

Example

[HARLINGEN, 24 October.] . «et gestrande

Zweedsche schip , waarvan wij ons vorig no.

melding maakten , is door de 'eepboot van

hier afgebragt en hier binnengede u BiJ die

gelegenheid werd ons medegeeeid, dat nog

vier vaartuigen op Terschelling aren

gestrand. Tevens is het berigt ontvan°e > dat

het hier behoorende schoonerschip

Transit, kapitein Schaap, in de Noordzee is

gezonken, nadat het achterschip was

weggeslagen ; een ligtmatroos verloor

daarbij het leven. Mede zijn hier drie

vreemde schepen met meer en minder

zware averij binnengeloopen.

Spoiler alert! It sank in the North Sea.

mdb:Aanmonstering_1859-55

mdb:Transit
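As a rough illustration of how such links could be detected, the sketch below scores a ship record against an article using only three of the signals named above: the time interval, the ship name and the captain's name. This is not the method used in the project. The scoring rule, the class names and the years are assumptions made for the example; only the ship name, the captain's name and the quoted sentence come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Ship:
    name: str
    captain: str
    first_year: int  # hypothetical period in which the ship was active
    last_year: int


@dataclass
class Article:
    year: int  # hypothetical publication year
    text: str


def link_score(ship: Ship, article: Article) -> int:
    """Count matching signals; an article outside the ship's period scores 0."""
    if not ship.first_year <= article.year <= ship.last_year:
        return 0
    text = article.text.lower()
    score = 0
    if ship.name.lower() in text:
        score += 1
    if ship.captain.lower() in text:
        score += 1
    return score


transit = Ship(name="Transit", captain="Schaap", first_year=1855, last_year=1865)
article = Article(
    year=1859,
    text="het hier behoorende schoonerschip Transit, kapitein Schaap, "
         "in de Noordzee is gezonken",
)
print(link_score(transit, article))
```

A real pipeline would also have to cope with OCR errors like the ones visible in the article text, for example by using approximate string matching.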

Provenance

• Sets of triples have provenance information

– Who made it (people/software?)

– Based on what source

– Content confidence

• Matches historical science requirements
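One way to picture provenance on sets of triples is to store each triple together with the name of the graph it belongs to and to keep the provenance per graph. In the sketch below all graph names, makers, sources and confidence values are invented for illustration; the project itself records this kind of information with the PROV vocabulary.

```python
# Quads: (subject, predicate, object, named graph).
quads = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof", "graph:mdb-original"),
    ("mdb:Kof", "skos:exactMatch", "aat:Kof", "graph:mdb-links"),
]

# Provenance per named graph: who made it, based on what, how sure are we.
provenance = {
    "graph:mdb-original": {
        "made_by": "conversion script",
        "source": "original database",
        "confidence": 1.0,
    },
    "graph:mdb-links": {
        "made_by": "link-detection software",
        "source": "graph:mdb-original",
        "confidence": 0.8,
    },
}


def triples_with_confidence(all_quads, prov, minimum):
    """Keep only triples from graphs whose confidence is at least `minimum`."""
    return [
        (s, p, o)
        for s, p, o, graph in all_quads
        if prov[graph]["confidence"] >= minimum
    ]


def made_by(triple, all_quads, prov):
    """Look up who produced the graph(s) that contain a given triple."""
    return [prov[g]["made_by"] for s, p, o, g in all_quads if (s, p, o) == triple]


print(triples_with_confidence(quads, provenance, 0.9))
print(made_by(("mdb:Kof", "skos:exactMatch", "aat:Kof"), quads, provenance))
```

A historian who only trusts the directly converted source data can filter on the first graph and ignore the automatically generated links.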

[Figure: the Dutch Ships and Sailors cloud. Datasets: DAS, GZMVOC, MDB, VOCOPVBegunstigden, VOCOPVSoldijboeken, VOCOPVOpvarenden. Vocabularies: PROV, AAT, foaf. Link types: owl:sameAs, dss:hasKBLink, rdfs:subClassOf, rdfs:subPropertyOf, dss:DAS link, skos:exactMatch]

Data analysis and visualisation

Current work: linking original scans


DataLab

http://dutchshipsandsailors.nl/data

v.de.boer@vu.nl

Networked heritage

Concept: Jan Sluijters (painter), DBpedia

Related items

Links

• Styles (Expressionism, Cubism, Fauvism)

• Period (contemporaries)

LinkedTV: Example of contextualization

LinkedTV – SmartTV

12 February 2013

Cultural heritage scenario, Tussen Kunst & Kitsch

Thanks to an agreement with AVRO!

DIVE INTO THE EVENT-BASED BROWSING OF LINKED HISTORICAL MEDIA

VICTOR DE BOER, JOHAN OOMEN, OANA INEL, LORA AROYO, ELCO VAN STAVEREN, WERNER HELMICH AND DENNIS DE BEURS

DIGITAL HUMANITIES RESEARCHERS

Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)

EXPLORATIVE SEARCH

Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G. Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011. http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi

https://www.flickr.com/photos/drainrat/14779928998/

DATA: OPENIMAGES.EU

Open videos Netherlands Institute for Sound and Vision

3000, mostly news broadcasts

DATA: DELPHER.NL

Scans of Radio bulletins (hand annotated)

• 1937 – 1984

• 1.5 million, OCR’ed and NER’ed

ENTITY EXTRACTION

CROWDTRUTH.ORG

ENTITY EXTRACTION

EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG

SEGMENTATION & KEYFRAMES

LINKING EVENTS AND CONCEPTS TO KEYFRAMES

SIMPLE EVENT MODEL (SEM), OPENANNOTATION (OA) AND SKOS

DIVE:MEDIAOBJECT

SEM:EVENT

SEM:PLACE

SEM:TIME

SEM:ACTOR

SKOS:CONCEPT

OA:ANNOTATION

• LINKS TO EUROPEANA (MULTILINGUAL)

• LINKS TO DBPEDIA

DIGITAL SUBMARINE UI

https://www.flickr.com/photos/benjcarson/245171885

INFINITY OF EXPLORATION

https://www.flickr.com/photos/mibuchat/2774251415

THANK YOU

https://www.flickr.com/photos/robysaltori/

DIVE.BEELDENGELUID.NL

v.de.boer@vu.nl

Linked Data 4 Development

Linked Data for International Aid Transparency Initiative

M.Sc. thesis by Kasper Brandt

Victor de Boer

“IATI is a voluntary, multi-stakeholder initiative that seeks to improve the transparency of aid in order to increase its effectiveness in tackling poverty.”

Linking datasets and Applications

User questions

1. In total, how much does a given country receive in aid?

2. A comparative index of aid versus the Human Development Index.

3. What is the geographic location of a project? How much aid went to a given province, constituency or village?

o Is the aid spent in places where the need is highest? Is it well distributed across the country?

o Can we attribute sub-national breakdowns for aid so we can see how much goes to different parts of recipient countries?

4. How does violent conflict in recipient countries affect aid activities?

5. How does aid spending as registered in the IATI standard compare to World Bank indicators?
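Questions 1 and 2 come down to an aggregation followed by a join on a shared country identifier, which is what linking the IATI data to other datasets makes possible. The sketch below shows the idea on toy data: the activity names, country identifiers, amounts and index values are all made-up placeholders, not real IATI or Human Development Index figures, and the function names are illustrative.

```python
from collections import defaultdict

# Toy aid activities: (activity, recipient country, amount). Placeholder values.
aid_activities = [
    ("activity:A", "country:X", 100),
    ("activity:B", "country:X", 250),
    ("activity:C", "country:Y", 400),
]

# Toy indicator from a second dataset, keyed by the same country identifiers.
development_index = {
    "country:X": 0.5,
    "country:Y": 0.4,
}


def total_aid_per_country(activities):
    """Question 1: sum the amounts of all activities per recipient country."""
    totals = defaultdict(int)
    for _activity, country, amount in activities:
        totals[country] += amount
    return dict(totals)


def aid_versus_index(totals, index):
    """Question 2: put total aid next to the indicator for countries in both sets."""
    return sorted(
        (country, total, index[country])
        for country, total in totals.items()
        if country in index
    )


totals = total_aid_per_country(aid_activities)
print(totals)
print(aid_versus_index(totals, development_index))
```

The join only works because both datasets use the same identifier for a country, which is what reusing URIs provides.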

IATI 2 LOD application

http://iati2lod.appspot.com/applications

Information sharing in rural developing areas

Need for information sharing in rural developing areas

• Agricultural, Health, Education, Market prices…

Sharing (heterogeneous) knowledge is essential

• LD is well-suited because of:

– Language-agnostic

– Interface-agnostic

– De-centralised authoring

  • Slicing

– Re-usability

  • Local

  • Global

Based on Sbc4d.com

Local market data

Communiqué

GSM/Voice interface

Web Interface Text-To-Speech

Community radio

RadioMarché

Sahel Eco

Co-operative

Buyers

EcoMash

[M.Sc. thesis by Henk Kroon]

Linked Data for Development (LD4D)

Web applications

<VoiceXML> to SPARQL*

Voice browser

Tel: +31208080855

Skype: +990009369996162208

RadioMarché Linked market data

‘Allo, Linked Data?

DBpedia

GeoNames

Agrovoc

Low-powered hardware and Mesh networking

ENTITY REGISTRY SYSTEM (ERS)

• Fully decentralised Linked Data publication platform

• Works under any kind of connectivity context

• Tracks individual edits back to their authors

• Simple and versatile

• Open Source https://github.com/ers-devs

• Low resource demanding

... and open for contributions, so don't hesitate to fork it!

Rapid-prototyping knowledge sharing platform

(aka “The Box”)

With the mainstream

Developing countries can leapfrog directly into the information age, jumping many phases of immature technologies

Img: flickr/n3v3rv0id

Linked Data is mainstream computer science research.

Test hypotheses in domains/environments

Take Home

• Linked Data is a set of technologies and principles for formalizing data and information to make it usable for computers

– Based on triples and URIs

– Data takes the form of graphs

– We can link data from heterogeneous sources

– Reuse

• It mirrors the Web of Documents, Social Web

– But behind the scenes

• Networks are very powerful and flexible for representing and sharing information

Thank you!

Victor de Boer

http://victordeboer.com

v.de.boer@vu.nl