Post on 20-Jan-2015
1/
Is Linked Data something for me?
Christophe Guéret, Clément Levallois eHumanities group meeting, November 22, 2012
2/
Get ready !
Goal of today
Learn about Linked Data
See if it is something interesting for your activities
3/
Hands-on tutorial
Make groups, one per table
Pick a famous person of your choice per group
Grab the material on http://bit.ly/ehg_tutorial or catch a USB stick
4/
Big data, but how to get it?
Can't always gather all the information manually
5/
Big data, but how to get it?
Data scattered in different information systems
6/
Big data, but how to get it?
Data in different formats
7/
What if we could?
If all data were “readable”, connections between datasets could be made. We would simply know more than we do today.
“Linked data” is an attempt to do that
8/
Why is it so hard?
Machines cannot read the text and extract data
What is the name of that person?
9/
Ouch!
You just faced the same problem as machines: Can't read the document and extract the data
Linked Data is a solution to this problem
Note: in the following we take the example of data “buried” in webpages (html documents), but the same logic applies to other kinds of docs (csv files, databases, your collection of pictures…)
10/
Use case for the hands-on
11/
What we will do...
Take the webpage of a researcher (one page per group!)
Explain why the data in this page is “buried”
Solve the issue by introducing some linked data sweetness in the webpage
Show what we gained: now, we can connect the researchers!
12/
Template 1
The name is in the title
City is ambiguous
13/
Template 2
The name is not visible on the page
City is ambiguous
14/
Template 3
The name is in the description
City is ambiguous
15/
Hands-on: check out the templates
Open the templates in a web browser and look at their HTML source code
16/
Hands-on: check out the templates
Change “William Smith” into a name of your own (one name per group)
Change and pick another name!
17/
First part of the hands-on
18/
In what sense do we mean that the name of this researcher is buried in this web page?
There is no way for software reading this page to guess:
Is there a name on this page?
If so, what is this name?
What does this name represent?
What does it relate to?
But wait, my Internet browser can read html pages, why can’t it figure out the name of the researcher?
Because the html code gives info about how to display the page, but no info about what the content means!
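To make the point concrete, here is a minimal sketch of what a program gets when it reads a plain HTML page. The page content is a hypothetical stand-in, not one of the tutorial templates, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a researcher page (not the actual tutorial template).
PAGE = (
    "<html><head><title>William Smith</title></head>"
    "<body><h1>William Smith</h1><p>Researcher based in Durham</p></body></html>"
)


class TextCollector(HTMLParser):
    """Collects every non-empty run of text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html):
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


# The program gets strings, but nothing says which one is a person's name.
print(visible_text(PAGE))
```

The tags say "this is a title" or "this is a paragraph", which is layout information. Nothing in the output marks a string as a name or a city.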
19/
Two roads from there…
We could design software that understands English
This is the approach of natural language processing, statistics, etc.
We can add extra code that tells the software directly what the data means
This is the linked data approach! This extra code in html pages is called “RDFa”
20/
Annotate the data
We use a VOCABULARY for these annotations
foaf:name
21/
Wait! What is that “foaf:name” ?
It is a term from a vocabulary
foaf:name comes from the vocabulary FOAF and is used to annotate the name of a person
Vocabulary = set of unambiguous consensual terms used to annotate pages with data
Vocabularies are:
An agreement between data publishers and consumers
Generally focused on particular topics
Key concept!!!
22/
Annotate the page with the data
23/
Hands-on: annotate with foaf:name
Add the “foaf:name” annotation to the three templates
Step 1: declare the vocabulary FOAF
<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
Step 2: annotate the data
<span property="foaf:name">William Smith</span>
Template 2 does not display the name, so we use a meta element:
<meta property="foaf:name" content="William Smith"/>
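To see why the annotation helps, here is a rough sketch of how a program could pick up the "property" attribute from both forms shown above. It is a toy written with Python's standard library, not the RDFaParser tool used in this tutorial, and it ignores most of what a real RDFa parser handles (nested elements, prefixes, subjects).

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Toy extractor: collects (property, value) pairs from annotated elements."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._pending = None  # property whose text value we are still reading
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # <meta property="..." content="..."/>: the value is in the attribute
            self.found.append((prop, attributes["content"]))
        else:
            # <span property="...">text</span>: the value is the element text
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.found.append((self._pending, "".join(self._text).strip()))
            self._pending = None


def extract(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.found


print(extract('<span property="foaf:name">William Smith</span>'))
print(extract('<meta property="foaf:name" content="William Smith"/>'))
```

Both calls give the same pair, whether the name is displayed on the page or hidden in a meta element.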
24/
Hands-on: extract annotations
Use the RDFa extractor at http://bit.ly/RDFaParser to get the annotations from the three templates
Command line tool:
java -jar RDFaParser-0.0.6.jar template1.html
java -jar RDFaParser-0.0.6.jar template2.html
java -jar RDFaParser-0.0.6.jar template3.html
All three return the same result!
25/
Bingo!
We get exactly the same result for the three templates
foaf:name = William Smith
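The short form foaf:name stands for a full web address: the prefix declared in the html tag, followed by the term. A small sketch of that expansion, assuming only the FOAF prefix from this tutorial (the function name `expand` is made up for illustration):

```python
# Prefix table, filled with the one declaration used in the templates.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(term, prefixes=PREFIXES):
    """Turn a prefixed term such as 'foaf:name' into its full URI."""
    prefix, separator, local_name = term.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {term!r}")
    return prefixes[prefix] + local_name


print(expand("foaf:name"))
```

Because the full URI is the same for everyone, two pages that both use foaf:name are talking about the same kind of information.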
26/
What this should look like now
(here showing template 1)
27/
How to choose a vocabulary?
Vocabulary => consensus
Therefore, it is better to:
Avoid obscure vocabularies nobody knows
Focus on well organised and maintained vocabularies
Why did we use FOAF?
Specialised for personal profiles and widely accepted
W3C support & recommended for use by EU members
http://joinup.ec.europa.eu/asset/core_person/description
28/
What vocabularies are available?
Many are well established: FOAF, SIOC, Dublin Core, BIBO, …
Creating vocabularies is doable but beware that:
New vocabularies won't necessarily gain adoption
You need to maintain the vocabulary
You need to host it on the Web
A vocabulary can borrow terms from other vocabs.
29/
EU initiative
“Core Vocabularies” from the ISA program
Combine existing terms and new ones
30/
Google/Bing/Yahoo/Yandex initiative
Vocabulary: Schema.org
Used by search engines to extract pages' data
31/
Facebook initiative
Vocabulary: Open Graph protocol
Used to put the “Like” buttons on pages
32/
How to use a vocabulary?
Look at the documentation, e.g. http://xmlns.com/foaf/spec/
Map your concepts to terms from the vocabulary:
Naam → foaf:name
Voornaam → foaf:firstName
Achternaam → foaf:lastName
Werklocatie → foaf:based_near
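In a program, such a mapping can be as simple as a lookup table. The sketch below uses the four terms from this slide; the sample record and the function name are hypothetical.

```python
# Mapping from local (Dutch) field names to FOAF terms, as listed above.
FIELD_TO_TERM = {
    "Naam": "foaf:name",
    "Voornaam": "foaf:firstName",
    "Achternaam": "foaf:lastName",
    "Werklocatie": "foaf:based_near",
}


def to_vocabulary(record):
    """Rename the fields of a record; fields without a mapping are left out."""
    return {
        FIELD_TO_TERM[field]: value
        for field, value in record.items()
        if field in FIELD_TO_TERM
    }


# Hypothetical record from a local database.
record = {"Naam": "William Smith", "Werklocatie": "Durham", "Kamer": "B12"}
print(to_vocabulary(record))
```

The field "Kamer" is dropped here because no term was chosen for it; in practice you would look for a suitable term in FOAF or another vocabulary.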
33/
Triples and subjects
Remember, we created this annotation: foaf:name "William Smith"
But what entity has “William Smith” for a name? <template1.html> foaf:name "William Smith"
This is a “triple” made of a subject, a predicate and an object Subject = <template1.html>
Predicate = foaf:name
Object = "William Smith"
Meaning: This document has the name “William Smith”
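A triple is easy to represent in code. This sketch uses a small named tuple; the type and helper names are made up for illustration, and the output format simply mimics the notation used on this slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the thing being described
    predicate: str  # the vocabulary term
    obj: str        # the value


def as_text(triple):
    """Write the triple the way it appears on the slide."""
    return f'<{triple.subject}> {triple.predicate} "{triple.obj}"'


name_triple = Triple("template1.html", "foaf:name", "William Smith")
print(as_text(name_triple))
```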
34/
We did not declare a subject
This says that this is the foaf:name but does not define a subject → Use the page name by default
foaf:name
35/
Why does this matter?
Subjects can be used as objects to create links
Need a common subject to group annotations
Durham foaf:based_near
William Smith foaf:name
foaf:knows foaf:name
36/
Picking a resource
Needs to be stable, web accessible, re-used
Consensus again, example:
Amsterdam: http://dbpedia.org/resource/Amsterdam
TBL: http://www.w3.org/People/Berners-Lee/card#i
Subjects like <C:/MyDirectory/templateX.html> are not valid web-based resources, we need to change that
37/
Hands-on: set the subject
Step 1: decide on a resource for the person, for example http://example.org/william_smith or http://myurl.com/john_doe
Step 2: add the resource with an “about” attribute in the same span as the foaf:name
Example: You had: <span property="foaf:name">
It becomes:
<span about="http://example.org/william_smith" property="foaf:name">
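The effect of “about” can be sketched as a simple rule: use the “about” value as subject when it is there, otherwise fall back to the page itself. The code below is a toy illustration of that rule with Python's standard library, not the tutorial's RDFaParser, and the page address passed in is an assumption.

```python
from html.parser import HTMLParser


class SubjectAwareExtractor(HTMLParser):
    """Toy extractor producing (subject, property, value) triples."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []
        self._open = None  # (subject, property) waiting for its text value

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes:
            # "about" names the subject; without it, the page is the subject
            subject = attributes.get("about", self.page_url)
            self._open = (subject, attributes["property"])

    def handle_data(self, data):
        if self._open is not None and data.strip():
            subject, prop = self._open
            self.triples.append((subject, prop, data.strip()))
            self._open = None

    def handle_endtag(self, tag):
        self._open = None


def extract_triples(html, page_url):
    parser = SubjectAwareExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


before = '<span property="foaf:name">William Smith</span>'
after = ('<span about="http://example.org/william_smith" '
         'property="foaf:name">William Smith</span>')
print(extract_triples(before, "template1.html"))
print(extract_triples(after, "template1.html"))
```

With the “about” attribute, the name is attached to the person's resource instead of to the file it happens to be written in.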
38/
5-star Linked Data
Rules (see http://5stardata.info/ ):
Resources are valid URIs
Machine readable data is associated to the resource
The data contains links to other resources
Example http://dbpedia.org/resource/Amsterdam
39/
Great! We're done now!
We added this structured piece of data to all the templates:
<http://example.org/william_smith> foaf:name "William Smith"
This data can be extracted by software
We can build an application that fetches persons' names, but there are still no links between them :-/
40/
One of the new templates' code
All the annotated templates have their name suffixed with “_with_name_and_subject”
41/
Second part of the hands-on
Create some links
42/
Creating links
Links are used to connect two resources
Example: William Smith knows Tim Berners-Lee
<http://example.org/william_smith> foaf:knows <http://www.w3.org/People/Berners-Lee/card#i>
Two usages:
Create (social) networks by connecting resources
Disambiguate text by pointing to the exact resource
43/
Hands-on: getting social
Step 1: ask 3 other groups in this workshop for their subject (remember, a subject is the value of “about”, as in <span about="http://example.org/william_smith" property="foaf:name">)
Step 2: use the 3 subjects you got to annotate the links
Example:
I know
<span rel="foaf:knows" resource="http://example.org/john_doe">John Doe</span>
, and
<span rel="foaf:knows" resource="http://myUrl.com/nchomsky">Noam Chomsky</span>
, and also
<span rel="foaf:knows" resource="http://ehumanities.knaw.nl/sally_wyatt">Sally Wyatt</span>
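Once the links are annotated, a program can read them as triples whose object is another resource. A rough sketch using the three example spans above; it is a simplified stand-in for a real RDFa parser, and the default subject passed in is an assumption.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Toy extractor for elements carrying both 'rel' and 'resource'."""

    def __init__(self, default_subject):
        super().__init__()
        self.default_subject = default_subject
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "rel" in attributes and "resource" in attributes:
            subject = attributes.get("about", self.default_subject)
            self.links.append((subject, attributes["rel"], attributes["resource"]))


SNIPPET = """
I know
<span rel="foaf:knows" resource="http://example.org/john_doe">John Doe</span>
, and
<span rel="foaf:knows" resource="http://myUrl.com/nchomsky">Noam Chomsky</span>
, and also
<span rel="foaf:knows" resource="http://ehumanities.knaw.nl/sally_wyatt">Sally Wyatt</span>
"""

parser = LinkExtractor("http://example.org/william_smith")
parser.feed(SNIPPET)
parser.close()
for link in parser.links:
    print(link)
```

Each extracted line reads as "William Smith knows X", where X is an address another group can use as its own subject.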
44/
Let's make some links
45/
Remember, there are two Durhams
One in the US, one in the UK, similar importance
Which one is the “Durham” on the profile?
http://sws.geonames.org/4464368 http://sws.geonames.org/2650628
46/
Finding a resource on Geonames
Search by name, follow the RDF link, strip out the “/about.rdf” part
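The last step, stripping “/about.rdf”, is a one-liner in code. The sketch assumes the RDF link has the form of the resource address followed by “/about.rdf”; the function name is made up for illustration.

```python
SUFFIX = "/about.rdf"


def resource_from_rdf_link(rdf_link):
    """Drop the trailing '/about.rdf' to get the address of the place itself."""
    if rdf_link.endswith(SUFFIX):
        return rdf_link[: -len(SUFFIX)]
    return rdf_link  # already a resource address, leave it untouched


print(resource_from_rdf_link("http://sws.geonames.org/4464368/about.rdf"))
```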
47/
Hands-on: disambiguate Durham
Annotate “Durham” with a link to the exact resource
Step 1: decide on which Durham to use
Step 2: annotate Durham with the link
<span rel="foaf:based_near" about="http://example.org/william_smith" resource="http://sws.geonames.org/4464368">Durham</span>
48/
Hands-on: extract annotations
Use the RDFa extractor at http://bit.ly/RDFaParser to get the annotations from the three templates
Command line tool:
java -jar RDFaParser-0.0.6.jar template1.html
java -jar RDFaParser-0.0.6.jar template2.html
java -jar RDFaParser-0.0.6.jar template3.html
All the three return the same result!
49/
Hands-on: extract a network!
Now use a small program from the Dropbox folder
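The program handed out in the workshop is not reproduced here. As a rough idea of what such a tool could do, this sketch turns foaf:knows triples into a "who knows whom" network; the triples are hypothetical examples in the style of this tutorial.

```python
from collections import defaultdict

# Hypothetical triples, as they might be extracted from several groups' pages.
TRIPLES = [
    ("http://example.org/william_smith", "foaf:name", "William Smith"),
    ("http://example.org/william_smith", "foaf:knows", "http://example.org/john_doe"),
    ("http://example.org/john_doe", "foaf:name", "John Doe"),
    ("http://example.org/john_doe", "foaf:knows", "http://example.org/william_smith"),
    ("http://example.org/john_doe", "foaf:knows", "http://myUrl.com/nchomsky"),
]


def build_network(triples):
    """Map each person to the set of people they declare to know."""
    network = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "foaf:knows":
            network[subject].add(obj)
    return dict(network)


network = build_network(TRIPLES)
for person, acquaintances in sorted(network.items()):
    print(person, "knows", sorted(acquaintances))
```

This only works because every group used a shared, web-based subject for each person: the same address appears as subject on one page and as object on another.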
50/
That's all for now!
(but there is more to discover: ontologies, reasoning, SPARQL, ...)