
Berendt: Gegevensbanken, 2nd semester 2011/2012, http://www.cs.kuleuven.be/~berendt/teaching/

Gegevensbanken Outlook –

The Semantic Web,

XML, RDF,

Linked (Open) Data,

NoSQL

Bettina Berendt

Katholieke Universiteit Leuven, Department of Computer Science

http://www.cs.kuleuven.be/~berendt/teaching/


Where are we?

Lesson  Who  Topic
1       ED   Intro, ER
2       ED   EER, (E)ER to relational schema
2       ED   Relational model
3       KV   Relational algebra & relational calculus
4, 5    KV   SQL
6       KV   Connecting programs to databases
7       KV   Functional dependencies & normalisation
8       KV   PHP
10      BB   Database security
11      BB   Memory and file organisation
12      BB   External hashing
13      BB   Index structures
14      BB   Query processing
15-17   BB   Transaction processing and concurrency control
18      BB   Data mining and Information Retrieval
9       BB   XML (and more on the Web as a database), NoSQL

New topics / outlook

How does data become powerful? Analysis & combination


A motivation

Q:

In general, about the Internet: should it be seen as one big, disordered chaos of websites,

or is it rather a collection of separate databases (for example, one with all web pages from Belgium, or one with all web pages of an Internet provider such as Telenet)

that together form the Internet (and thereby allow one large, general database to distribute its tasks)?


For example: SIG.MA



The original vision

The entertainment system was belting out the Beatles' "We Can Work It Out" when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His sister, Lucy, was on the line from the doctor's office: "Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. I'm going to have my agent set up the appointments." Pete immediately agreed to share the chauffeuring.

At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules. (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)

Tim Berners-Lee, James Hendler and Ora Lassila (2001). The Semantic Web. A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American. http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21


The Semantic Web layer cake (T. Berners-Lee talk at XML 2000)

RDF: W3C Rec. 2004

OWL: W3C Rec. 2004

OWL2: W3C Rec. 2009

URI = Uniform Resource Identifier, e.g.:
• URL (Uniform Resource Locator): where to find it (~ the address of a person)
• URN (Uniform Resource Name): identity (~ the name of a person, the ISBN of a book)
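As a small illustration of the URL/URN distinction, the sketch below classifies identifiers with Python's standard urllib.parse.urlparse. It is only a heuristic based on the URI scheme, and the ISBN URN is an illustrative value, not taken from the slides.

```python
from urllib.parse import urlparse


def describe(uri: str) -> str:
    """Rough classification by scheme: 'urn' names a thing, other schemes locate it."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        # URN: pure identity, e.g. urn:isbn:... names a book but not where to find it
        namespace, _, name = parts.path.partition(":")
        return f"URN in namespace '{namespace}' naming '{name}'"
    # URL: scheme + host + path tell a client where (and how) to retrieve the resource
    return f"URL: fetch via {parts.scheme} from host '{parts.netloc}', path '{parts.path}'"


print(describe("http://www.cs.kuleuven.be/~berendt/teaching/"))
print(describe("urn:isbn:0451450523"))  # illustrative ISBN
```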



You have data … How should you structure it?

Here's some data about an aircraft:

medium-altitude, long-endurance unmanned aerial vehicle

14.8 meters

512 kilograms

70 knots

400 nautical miles


The XML approach is to "wrap" each data item in start/end tags

<Aircraft>
  <wingspan>14.8 meters</wingspan>
  <weight>512 kilograms</weight>
  <cruise-speed>70 knots</cruise-speed>
  <range>400 nautical miles</range>
  <description>medium-altitude, long-endurance unmanned aerial vehicle</description>
</Aircraft>

RQ-1.xml

and to define the schema of this data in a DTD or an XML Schema.
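To see what "wrapping data in tags" buys you in practice, here is a minimal sketch that reads RQ-1.xml with Python's standard xml.etree.ElementTree. Note that ElementTree only checks well-formedness; it does not validate against a DTD or XML Schema, so that step is not shown here.

```python
import xml.etree.ElementTree as ET

RQ1_XML = """<Aircraft>
  <wingspan>14.8 meters</wingspan>
  <weight>512 kilograms</weight>
  <cruise-speed>70 knots</cruise-speed>
  <range>400 nautical miles</range>
  <description>medium-altitude, long-endurance unmanned aerial vehicle</description>
</Aircraft>"""

root = ET.fromstring(RQ1_XML)  # the root element <Aircraft>
# Each child element: tag = element name, text = the data between start and end tag
fields = {child.tag: child.text.strip() for child in root}

print(root.tag)                # Aircraft
print(fields["cruise-speed"])  # 70 knots
```

The generic parser gives us the structure for free, but it knows nothing about what "wingspan" means; that is the theme of the following slides.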


XML Terminology

<wingspan>14.8 meters</wingspan>

Start tag: <wingspan>
End tag: </wingspan>
Data: 14.8 meters
Element: start tag + data + end tag


Why use XML?

It is a universally accepted standard way of structuring data (syntax).

It is a W3C recommendation (W3C = World Wide Web Consortium)

The marketplace supports it with a lot of free/inexpensive tools.

The alternative to using XML is to define your own proprietary data syntax, and then build your own proprietary tools to support the proprietary syntax (Not a very appealing idea).


But: What is this XML snippet talking about, i.e., what are the semantics?

<Predator> …</Predator>

What is a Predator?


Predator - which one?

Predator: a medium-altitude, long-endurance unmanned aerial vehicle system.

Predator: one that victimizes, plunders, or destroys, especially for one's own gain.

Predator: an organism that lives by preying on other organisms.

Predator: a company which specializes in camouflage attire.

Predator: a video game.

Predator: software for machine networking.

Predator: a chain of paintball stores.


A little more flexibility through namespaces

<?xml version="1.0" encoding="UTF-8"?>
<myThings xmlns:h="http://www.mySchemas.org/TR/aircraft/"
          xmlns:f="http://www.yourSchemas.com/animals">
  <h:Predator>
    <h:name>OL231-b</h:name>
    <h:wingspan>14.8 meters</h:wingspan>
  </h:Predator>
  <f:Predator>
    <f:name>Panthera</f:name>
    <f:eats>antelopes</f:eats>
  </f:Predator>
</myThings>
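A namespace-aware parser tells the two Predator elements apart by their namespace URI, not by the prefix. The sketch below uses xml.etree.ElementTree with a prefix map; the prefixes h and f in the map are just local shorthands and could be renamed without changing the result.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<myThings xmlns:h="http://www.mySchemas.org/TR/aircraft/"
          xmlns:f="http://www.yourSchemas.com/animals">
  <h:Predator><h:name>OL231-b</h:name><h:wingspan>14.8 meters</h:wingspan></h:Predator>
  <f:Predator><f:name>Panthera</f:name><f:eats>antelopes</f:eats></f:Predator>
</myThings>"""

NS = {"h": "http://www.mySchemas.org/TR/aircraft/",
      "f": "http://www.yourSchemas.com/animals"}

root = ET.fromstring(DOC.encode("utf-8"))  # bytes, since the document declares its encoding
aircraft = root.find("h:Predator", NS)
animal = root.find("f:Predator", NS)

print(aircraft.find("h:name", NS).text)  # OL231-b
print(animal.find("f:eats", NS).text)    # antelopes
# Internally the prefix is replaced by the full namespace URI:
print(aircraft.tag)  # {http://www.mySchemas.org/TR/aircraft/}Predator
```

Namespaces disambiguate names, but they still do not say what an aircraft or an animal is; they only keep the two vocabularies from colliding.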


Querying XML

Various query languages, e.g. XPath, XQuery
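Python's standard ElementTree supports only a small subset of XPath, but enough to show the idea of path expressions with predicates. The fleet document and the second aircraft "X-2" below are hypothetical; full XPath 1.0 is available in libraries such as lxml, and XQuery requires a dedicated processor.

```python
import xml.etree.ElementTree as ET

FLEET = """<fleet>
  <Aircraft id="RQ-1"><cruise-speed>70 knots</cruise-speed><range>400 nautical miles</range></Aircraft>
  <Aircraft id="X-2"><cruise-speed>90 knots</cruise-speed></Aircraft>
</fleet>"""

root = ET.fromstring(FLEET)
# Path with a child predicate: all Aircraft elements (at any depth) that have a <range> child
with_range = [a.get("id") for a in root.findall(".//Aircraft[range]")]
# Path with an attribute predicate: the Aircraft whose id attribute equals 'RQ-1'
rq1 = root.find(".//Aircraft[@id='RQ-1']")

print(with_range)                    # ['RQ-1']
print(rq1.findtext("cruise-speed"))  # 70 knots
```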


Problems of XML

1. What does nesting mean?

2. What do syntactical variations mean?

3. What do linguistic variations mean?

4. How can we extend our knowledge?


1. What does nesting mean?

Schema 1 allows for expressions like:

<Person>

<name>Peter Parker</name> ...

</Person>

name being an XML-element of Person means: the person HAS-A ...

Schema 2 allows for expressions like:

<Person>

<type>Comic-book hero</type> ...

</Person>

type being an XML-element of Person means: the person IS-A ...

Problems: a) we don't know what nesting means; b) even if we do know, we can't express this in a machine-readable way (at most, we could build it into an application that uses these XML statements, but that would bury the meaning in procedures!)


2. What do syntactical variations mean?

Schema 1 allows for expressions like:

<Person>

<name>Peter Parker</name>

<birthday>1932-04-12</birthday> ...

</Person>

Schema 2 allows for expressions like:

<Person name="Peter Parker">

<type>Comic-book hero</type> ...

</Person>

Problems: a) what does it mean for some information to be an XML element rather than an XML attribute? b) even if we know that they are the same, we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.)
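To make "burying meaning in procedures" concrete, the sketch below merges the two schema variants in application code. The function read_person is hypothetical; the point is that the knowledge "the element name and the attribute name mean the same thing" exists only inside this code, not in the data.

```python
import xml.etree.ElementTree as ET

SOURCE_1 = "<Person><name>Peter Parker</name><birthday>1932-04-12</birthday></Person>"
SOURCE_2 = '<Person name="Peter Parker"><type>Comic-book hero</type></Person>'


def read_person(xml_text):
    """Normalise both schema variants into one dict (the mapping is hard-coded here)."""
    person = ET.fromstring(xml_text)
    record = {}
    # Schema 2 stores the name as an XML attribute ...
    if "name" in person.attrib:
        record["name"] = person.attrib["name"]
    # ... schema 1 (and every other field) uses child elements
    for child in person:
        record[child.tag] = child.text
    return record


merged = {}
for source in (SOURCE_1, SOURCE_2):
    merged.update(read_person(source))  # assumes both describe the same person
print(merged)
```

A second hidden assumption is in the merge loop: nothing in either document states that the two Person elements denote the same individual.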


3. What do linguistic variations mean?

Schema 1 allows for expressions like:

<Person>

<name>Peter Parker</name> ...

</Person>

Schema 2 allows for expressions like:

<Person>

<naam>Peter Parker</naam> ...

</Person>

Problems: a) we do not know whether elements from different data sources that differ in, e.g., natural language are the same or not; b) even if we know that they are the same, we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.)


4. How can we extend our knowledge?

Schema 1 allows for expressions like:

<WebResource>

<type>Picture</type>

<hasURL>http://www.example.org/Pictures/myPic.png</hasURL>

<isAbout>Peter Parker</isAbout> ...

</WebResource>

Schema 2 allows for expressions like:

<WebResource>

<hasURL>http://www.example.org/Pictures/myPic.png</hasURL>

<hasLicence>CreativeCommons</hasLicence> ...

</WebResource>

Problems: a) we cannot refine our schema information with that provided by another source; b) even if we can be sure that the two descriptions can in principle be linked (here: via the URL), we can't express this in a machine-readable way, for example to combine the information from the two sources (same remark about applications as in 1.)


Summary: XML is not well suited for conceptual modelling and therefore not suited for truly semantic markup

XML makes no commitment on:

Domain-specific ontological vocabulary

Ontological modeling primitives

Requires pre-arranged agreement on both of these

Only feasible for closed collaboration

agents in a small & stable community

pages on a small & stable intranet

Not suited for sharing Web-resources


Solution approach of the "higher levels" of the Semantic Web

1. Break down information into atomic statements: subject-predicate-object

2. Define (in a formal-semantics way) what each component of each statement means

a. Give it a URI (uniform resource identifier) to enable uniform meaning specification

b. Define languages to say more about (specify) the meaning (by relating it to other units of meaning – cf. a dictionary in which each word is explained by other words)

3. The languages mentioned in 2.b. each add more expressivity:

1. RDF: subject-predicate-object statements (in RDF terminology: a resource has a property with a certain value).

2. RDFS: simple ontology building blocks: class, subclass-of relation, use of RDF's type to denote that (e.g.) an individual is an instance of a class (= make it possible to define a schema and its instances), ...

3. OWL: more advanced ontology building blocks: a class (= concept) is disjoint with another one, is the same as another one; a property is functional, symmetric, the inverse of another one; ...


Semantic Web vs. Database

Advantages of using RDF/RDFS/OWL to define an Ontology:

Extensible: much easier to add new properties. Contrast with a database - adding a new column may break a lot of applications

Portable: much easier to move an OWL document than to move a database.

Advantages of using a Database to define an Ontology:

Mature: the database technology has been around a long time and is very mature.


RDF model

RDF “statements” consist of

resources (= nodes), which have properties, which have values (= nodes or strings)

Example:

http://www.w3.org/TR/REC-rdf-syntax/ --author--> "Ora Lassila"

resource (= subject), property (= predicate), value (= object)

Read as: "http://www.w3.org/TR/REC-rdf-syntax/ has the author Ora Lassila"


RDF Model Example

http://www.w3.org/TR/REC-rdf-syntax/ --dc:creator--> "Ora Lassila"

http://www.w3.org/TR/REC-rdf-syntax/ --dc:date--> "1999-02-22"

http://www.w3.org/TR/REC-rdf-syntax/ --dc:publisher--> "W3C"
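The three statements of this example can be held as plain (subject, predicate, object) tuples. The sketch below is a toy in-memory graph with a pattern-matching helper where None acts as a wildcard; prefixed names such as dc:creator (Dublin Core properties, written in their standard lowercase form) are kept as plain strings for simplicity.

```python
DOC = "http://www.w3.org/TR/REC-rdf-syntax/"

# An RDF graph is a set of (subject, predicate, object) statements
graph = {
    (DOC, "dc:creator", "Ora Lassila"),
    (DOC, "dc:date", "1999-02-22"),
    (DOC, "dc:publisher", "W3C"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


print(match(graph, p="dc:creator"))  # who created the document?
print(len(match(graph, s=DOC)))      # 3: everything said about the document
```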


Complex values

So far, values of properties have been strings

A graph node (corresponding to a resource) can also be the value of a property

arbitrarily complex tree and graph structures are possible

syntactically, values can be embedded (i.e. lexically in-line) or referenced (linked)

Example:

http://www.w3.org/TR/REC-rdf-syntax/ --dc:creator--> [resource node]

[resource node] --p:Name--> "Ora Lassila"

[resource node] --p:EMail--> [email protected]


RDF in XML

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:p="http://example.org/persons/1.0/">
  <rdf:Description rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
    <dc:creator rdf:nodeID="abc"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="abc">
    <p:Name>Ora Lassila</p:Name>
    <p:EMail>[email protected]</p:EMail>
    <p:HasHomepage rdf:resource="http://www.nokia.com"/>
    <p:WorksIn rdf:resource="#xyz"/>
  </rdf:Description>
</rdf:RDF>
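To show how this XML serialisation maps back onto triples, the sketch below reads a well-formed copy of the document (embedded in the code, shortened) with ElementTree. It handles only the simple pattern used here (one rdf:Description per node, properties as child elements); real RDF/XML parsers, e.g. the one in the rdflib library, support the full syntax and resolve relative URIs such as #xyz.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:p="http://example.org/persons/1.0/">
  <rdf:Description rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
    <dc:creator rdf:nodeID="abc"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="abc">
    <p:Name>Ora Lassila</p:Name>
    <p:HasHomepage rdf:resource="http://www.nokia.com"/>
  </rdf:Description>
</rdf:RDF>"""


def rdf_attr(elem, name):
    """Read an attribute in the rdf: namespace (ElementTree keys are '{uri}name')."""
    return elem.get(f"{{{RDF}}}{name}")


def node_ref(elem):
    """A node is either a URI (rdf:about / rdf:resource) or a blank node '_:id'."""
    uri = rdf_attr(elem, "about") or rdf_attr(elem, "resource")
    if uri is not None:
        return uri
    node_id = rdf_attr(elem, "nodeID")
    return f"_:{node_id}" if node_id is not None else None


def to_triples(xml_text):
    triples = []
    for desc in ET.fromstring(xml_text).findall(f"{{{RDF}}}Description"):
        subject = node_ref(desc)
        for prop in desc:                         # each child element = one property
            obj = node_ref(prop)                  # resource or blank-node value ...
            if obj is None:
                obj = (prop.text or "").strip()   # ... otherwise a literal string
            triples.append((subject, prop.tag, obj))
    return triples


for t in to_triples(DOC):
    print(t)
```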


RDF Schema

• Defines a small vocabulary for RDF:
  • Class, subClassOf, type
  • Property, subPropertyOf
  • domain, range

• Vocabulary can be used to define other vocabularies for your application domain

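Example (from the figure):

Student --subClassOf--> Person
Researcher --subClassOf--> Person
hasSuperVisor --domain--> Student
hasSuperVisor --range--> Researcher
Jeen --type--> Student
Frank --type--> Researcher
Jeen --hasSuperVisor--> Frank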
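The payoff of an RDFS vocabulary is that new facts follow from it. The sketch below assumes the example reads as: Student and Researcher are subclasses of Person, hasSuperVisor has domain Student and range Researcher, and Jeen hasSuperVisor Frank. It applies four RDFS entailment rules (named as in the W3C RDF Semantics) until a fixpoint is reached; the explicit type statements for Jeen and Frank are deliberately omitted so that the rules must derive them.

```python
# Schema triples and one instance triple; rdf:type facts are left for the rules to derive
TRIPLES = {
    ("Student", "rdfs:subClassOf", "Person"),
    ("Researcher", "rdfs:subClassOf", "Person"),
    ("hasSuperVisor", "rdfs:domain", "Student"),
    ("hasSuperVisor", "rdfs:range", "Researcher"),
    ("Jeen", "hasSuperVisor", "Frank"),
}


def rdfs_closure(graph):
    """Apply four RDFS entailment rules until nothing new is derived (a fixpoint)."""
    g = set(graph)
    while True:
        new = set()
        for (s, p, o) in g:
            for (s2, p2, o2) in g:
                if s2 == p and p2 == "rdfs:domain":      # rdfs2: x p y, p domain C => x type C
                    new.add((s, "rdf:type", o2))
                if s2 == p and p2 == "rdfs:range":       # rdfs3: x p y, p range C => y type C
                    new.add((o, "rdf:type", o2))
                if p == "rdf:type" and s2 == o and p2 == "rdfs:subClassOf":
                    new.add((s, "rdf:type", o2))         # rdfs9: x type C, C subClassOf D => x type D
                if p == "rdfs:subClassOf" and s2 == o and p2 == "rdfs:subClassOf":
                    new.add((s, "rdfs:subClassOf", o2))  # rdfs11: subClassOf is transitive
        if new <= g:
            return g
        g |= new


closed = rdfs_closure(TRIPLES)
print(sorted(t for t in closed if t[1] == "rdf:type"))
```

This naive loop compares every pair of triples on every round, which is fine for a classroom example; triple stores use indexed, incremental reasoning instead.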


RDF Schema syntax in XML

<rdf:Description rdf:ID="MotorVehicle">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/...#Resource"/>
</rdf:Description>

<rdf:Description rdf:ID="Truck">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>

<rdf:Description rdf:ID="registeredTo">
  <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
  <rdfs:domain rdf:resource="#MotorVehicle"/>
  <rdfs:range rdf:resource="#Person"/>
</rdf:Description>

<rdf:Description rdf:ID="ownedBy">
  <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
  <rdfs:subPropertyOf rdf:resource="#registeredTo"/>
</rdf:Description>


What is this? Can we do anything with it?


Combined by SIG.MA


And how does this work?

Linked Open Data:

"A way of making the Semantic Web happen" (it is hoped)

Key concept: leverage the existence of structured data and combine it with the languages and infrastructures of the Web and the Semantic Web

End 2011:

32 billion triples


Data items are identified with HTTP URIs

pd:cygri --foaf:name--> "Richard Cyganiak"

pd:cygri --foaf:based_near--> dbpedia:Berlin

pd:cygri --rdf:type--> foaf:Person

pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri

dbpedia:Berlin = http://dbpedia.org/resource/Berlin

From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt
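Prefixed names such as pd:cygri are only abbreviations; what Linked Data actually exchanges are the full HTTP URIs. The sketch below expands them with a prefix table. The pd and dbpedia entries follow from the expansions on the slide; the foaf and rdf namespace URIs are the standard ones for those vocabularies; the output format imitates N-Triples.

```python
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(qname: str, prefixes: dict = PREFIXES) -> str:
    """Turn a prefixed name such as dbpedia:Berlin into the full HTTP URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {qname!r}")
    return prefixes[prefix] + local


triples = [
    ("pd:cygri", "foaf:name", '"Richard Cyganiak"'),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ("pd:cygri", "rdf:type", "foaf:Person"),
]
for s, p, o in triples:
    obj = o if o.startswith('"') else f"<{expand(o)}>"  # literals stay quoted
    print(f"<{expand(s)}> <{expand(p)}> {obj} .")
```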


Resolving URIs over the Web

pd:cygri --foaf:name--> "Richard Cyganiak"

pd:cygri --foaf:based_near--> dbpedia:Berlin

pd:cygri --rdf:type--> foaf:Person

dbpedia:Berlin --dp:population--> 3,405,259

dbpedia:Berlin --skos:subject--> dp:Cities_in_Germany

From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt


Dereferencing URIs over the Web

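pd:cygri --foaf:name--> "Richard Cyganiak"

pd:cygri --foaf:based_near--> dbpedia:Berlin

pd:cygri --rdf:type--> foaf:Person

dbpedia:Berlin --dp:population--> 3,405,259

dbpedia:Berlin --skos:subject--> dp:Cities_in_Germany

dbpedia:Hamburg --skos:subject--> dp:Cities_in_Germany

dbpedia:Muenchen --skos:subject--> dp:Cities_in_Germany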

From http://www.ai.sri.com/~nysmith/slides/aic-seminars/090724-bizer.ppt


What is LOD?

"A way of making the Semantic Web happen" (it is hoped)

Key concept: leverage the existence of structured data and combine it with the languages and infrastructures of the Web and the Semantic Web

Tim Berners-Lee: four principles of Linked Data (http://www.w3.org/DesignIssues/LinkedData)

Use URIs to identify things.

Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.

Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.

Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
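Principles 2 and 3 mean that a client can simply "look up" a URI over HTTP and ask for machine-readable data instead of an HTML page. The sketch below builds such a request with Python's standard urllib.request, using the Accept header for content negotiation. The dereference function performs real network access and is not called here; the server behaviour described in its comment (answering with a redirect, in Linked Data practice often 303 See Other) is typical but not guaranteed.

```python
import urllib.request


def build_lookup(uri: str) -> urllib.request.Request:
    """Prepare an HTTP GET for a thing's URI, asking for RDF/XML rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def dereference(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch the description (network access). Linked Data servers may redirect
    from the thing's URI to a document about it; urlopen follows such redirects."""
    with urllib.request.urlopen(build_lookup(uri), timeout=timeout) as resp:
        return resp.read()


req = build_lookup("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```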


SPARQL: The standard query language for LOD

"What are all the country capitals in Africa?"

PREFIX abc: <http://example.com/exampleOntology#>

SELECT ?capital ?country

WHERE {

?x abc:cityname ?capital ;

abc:isCapitalOf ?y .

?y abc:countryname ?country ;

abc:isInContinent abc:Africa .

}
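The WHERE clause of this query is a basic graph pattern: a set of triple patterns whose variables must be bound consistently. The sketch below evaluates exactly that pattern with a naive left-to-right join over a handful of hypothetical triples in the abc: vocabulary; it illustrates the matching semantics only and is not how a real SPARQL engine plans queries.

```python
# Hypothetical data using the abc: vocabulary from the query
DATA = [
    ("abc:Nairobi", "abc:cityname", "Nairobi"),
    ("abc:Nairobi", "abc:isCapitalOf", "abc:Kenya"),
    ("abc:Kenya", "abc:countryname", "Kenya"),
    ("abc:Kenya", "abc:isInContinent", "abc:Africa"),
    ("abc:Lima", "abc:cityname", "Lima"),
    ("abc:Lima", "abc:isCapitalOf", "abc:Peru"),
    ("abc:Peru", "abc:countryname", "Peru"),
    ("abc:Peru", "abc:isInContinent", "abc:SouthAmerica"),
]

# The WHERE clause as triple patterns; terms starting with '?' are variables
WHERE = [
    ("?x", "abc:cityname", "?capital"),
    ("?x", "abc:isCapitalOf", "?y"),
    ("?y", "abc:countryname", "?country"),
    ("?y", "abc:isInContinent", "abc:Africa"),
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:  # variable already bound differently
                return None
        elif term != value:                           # constant does not match
            return None
    return new


def evaluate(patterns, data):
    """Naive basic-graph-pattern evaluation: join the patterns one after another."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in data:
                b2 = unify(pattern, t, b)
                if b2 is not None:
                    next_bindings.append(b2)
        bindings = next_bindings
    return bindings


# SELECT ?capital ?country
rows = [(b["?capital"], b["?country"]) for b in evaluate(WHERE, DATA)]
print(rows)  # [('Nairobi', 'Kenya')]
```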


Connecting to a database … ah … triple store


The Linked Open Data Cloud


History of the World, Part 1

Relational Databases – mainstay of business

Web-based applications caused spikes in load

Especially true for public-facing e-commerce sites

Developers began to front the RDBMS with memcache or to integrate other caching mechanisms within the application (e.g. Ehcache)

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
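"Fronting" a database with a cache is usually done with the cache-aside pattern: read from the cache, fall back to the database on a miss, then store the result. The sketch below uses an in-process dict with a time-to-live as a stand-in for memcache, and a hypothetical SlowDatabase class in place of the RDBMS.

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for an RDBMS query; counts how often it is hit."""
    def __init__(self):
        self.hits = 0

    def query_product(self, product_id):
        self.hits += 1
        time.sleep(0.01)  # simulate query latency
        return {"id": product_id, "name": f"product-{product_id}"}


class CacheAside:
    """Check the cache first, fall back to the database, then fill the cache."""
    def __init__(self, db, ttl_seconds=60.0):
        self.db, self.ttl, self.store = db, ttl_seconds, {}

    def get_product(self, product_id):
        entry = self.store.get(product_id)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                          # cache hit
        value = self.db.query_product(product_id)    # cache miss: ask the database
        self.store[product_id] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, product_id):
        self.store.pop(product_id, None)             # call after writing to the database


db = SlowDatabase()
cache = CacheAside(db)
for _ in range(100):
    cache.get_product(42)
print(db.hits)  # 1: only the first read reached the database
```

The price is staleness: between a database write and the matching invalidate (or TTL expiry), readers may see old data, which foreshadows the consistency discussion later in these slides.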


SELECT * FROM members
WHERE name LIKE '%kirsten%'
????

Get write lock
Update friends table
Release write lock
????
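Both operations on this slide become expensive as data and traffic grow. The sketch below reproduces them with Python's built-in sqlite3 module on a hypothetical members/friends schema: a LIKE pattern with a leading wildcard cannot use the index on name for a seek, so SQLite scans, and BEGIN IMMEDIATE takes SQLite's database-wide write lock until COMMIT. The exact wording of the query-plan row differs between SQLite versions.

```python
import sqlite3

# isolation_level=None: autocommit mode, so we can issue BEGIN/COMMIT ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE friends (member_id INTEGER, friend_id INTEGER)")
conn.execute("CREATE INDEX idx_members_name ON members(name)")
conn.executemany("INSERT INTO members (name) VALUES (?)",
                 [("kirsten",), ("Kirsten Jansen",), ("Peter",)])

# Leading wildcard: no index seek is possible, the plan shows a SCAN
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM members WHERE name LIKE '%kirsten%'").fetchall()
print([row[-1] for row in plan])  # last column = human-readable plan detail

# SQLite's LIKE is case-insensitive for ASCII by default
rows = conn.execute("SELECT name FROM members WHERE name LIKE '%kirsten%'").fetchall()

# Write lock pattern: BEGIN IMMEDIATE acquires the write lock up front
conn.execute("BEGIN IMMEDIATE")
conn.execute("INSERT INTO friends VALUES (1, 2)")
conn.execute("COMMIT")  # releases the lock
```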


Reminder: task for the next lesson

Are all ACID properties equally important for the following types of applications?

What can you do if speed is very important for your application?

Online banking

An online shop (e.g. books/media)

A social networking site


Scaling Up

Issues with scaling up when the dataset is just too big

RDBMS were not designed to be distributed

Began to look at multi-node database solutions

Known as ‘scaling out’ or ‘horizontal scaling’

Different approaches include:

Master-slave

Sharding

All approaches come with their own respective problems

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
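A minimal sketch of hash-based sharding, one common way to spread rows over nodes: a stable hash of the key, modulo the number of shards. The key names are hypothetical. The second half illustrates one of the "respective problems": when the number of shards changes, most keys move (with 4 to 5 shards, a key stays put only if its hash is 0 to 3 modulo 20, i.e. roughly 20% of keys). Schemes such as consistent hashing are designed to reduce this reshuffling.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash of the key modulo the shard count (md5 is stable across processes,
    unlike Python's built-in hash() for strings)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user:{i}" for i in range(10_000)]
placement = {k: shard_for(k, 4) for k in keys}

# Re-sharding from 4 to 5 nodes: count how many keys change shard
moved = sum(placement[k] != shard_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys change shard")
```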


What is NoSQL?

Stands for Not Only SQL

Class of non-relational data storage systems

Usually do not require a fixed table schema, nor do they use the concept of joins

All NoSQL offerings relax one or more of the ACID properties (we will discuss the CAP theorem below)

NoSQL is best used in large, distributed databases!

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt


Why NoSQL?

For data storage, an RDBMS cannot be the be-all/end-all

Just as there are different programming languages, we need to have other data storage tools in the toolbox

A NoSQL solution is more acceptable to a client now than even a year ago

Think about proposing a Ruby/Rails or Groovy/Grails solution now versus a couple of years ago

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt


Dynamo and BigTable

Three major papers were the seeds of the NoSQL movement

BigTable (Google)

Dynamo (Amazon)

Gossip protocol (discovery and error detection)

Distributed key-value data store

Eventual consistency

CAP Theorem (discussed shortly)

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt


CAP Theorem

Three properties of a system: consistency, availability and partition tolerance

You can have at most two of these three properties for any shared-data system

To scale out, you have to distribute the data over many nodes, so network partitions must be tolerated. That leaves either consistency or availability to choose from

In almost all cases, you would choose availability over consistency

Note that this is a slightly different notion of consistency than the one we are used to from transaction systems (ACID)!

http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
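The choice the CAP slide describes can be illustrated with a toy pair of replicas. This is a hypothetical sketch (the class `Replica`, the exception `Unavailable` and the function `scenario` are ours), assuming synchronous replication while the two nodes are connected. During a network partition, a "CP" replica refuses requests (consistent but unavailable), while an "AP" replica keeps serving its local copy (available but possibly stale).

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses a request during a partition."""


class Replica:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.value = None
        self.peer = None
        self.partitioned = False

    def _check(self):
        # A CP replica that cannot reach its peer refuses to serve requests,
        # giving up availability to avoid inconsistent answers.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("peer unreachable")

    def write(self, value):
        self._check()
        self.value = value
        if not self.partitioned:
            self.peer.value = value  # synchronous replication while connected

    def read(self):
        self._check()
        return self.value


def scenario(mode):
    """Write v1, split the network, write v2 on N, then read on M."""
    m, n = Replica(mode), Replica(mode)
    m.peer, n.peer = n, m
    n.write("v1")
    m.partitioned = n.partitioned = True
    try:
        n.write("v2")
        return m.read()
    except Unavailable:
        return "unavailable"


print(scenario("AP"))  # v1  (available, but stale: N already holds v2)
print(scenario("CP"))  # unavailable  (consistent, but refuses to answer)
```

Real systems use more refined mechanisms (e.g. quorums, where only the majority side of a partition keeps accepting writes), but the underlying choice is the same.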


Availability

Traditionally, availability is thought of as the server/process being available "five nines" (99.999 %) of the time.

However, in a system with many nodes, at almost any point in time there is a good chance that some node is down or that the network between nodes is disrupted.

We want a system that is resilient in the face of network disruption

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
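Both availability claims can be checked with a little arithmetic. The sketch below converts an availability fraction into expected downtime per year, and computes the probability that at least one of n nodes is down. The per-node availability of 0.999 and the assumption that nodes fail independently are ours, chosen only for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


def prob_some_node_down(node_availability, n_nodes):
    """P(at least one of n nodes is down), assuming independent failures."""
    return 1 - node_availability ** n_nodes


print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26 (five nines)
for n in (1, 100, 1000):
    print(n, round(prob_some_node_down(0.999, n), 3))
```

Five nines allows only about 5 minutes of downtime per year. With 0.999 per node, the chance that some node is down is roughly 10 % for 100 nodes and about 63 % for 1,000 nodes, which is why a large cluster has to treat failures as the normal case.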


Consistency Model

A consistency model determines rules for visibility and apparent order of updates.

For example:

Row X is replicated on nodes M and N

Client A writes row X to node N

Some period of time t elapses.

Client B reads row X from node M

Does client B see the write from client A?

Consistency is a continuum with tradeoffs

For NoSQL, the answer would be: maybe

The CAP theorem states that strict consistency cannot be achieved at the same time as availability and partition tolerance.

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
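The row X / nodes M and N scenario can be simulated directly. This is a minimal sketch assuming asynchronous replication with a fixed delay of 3 time units; the class `AsyncReplicatedStore` and the helper `client_b_sees_write` are hypothetical, and real systems have variable, unbounded delays. It shows that whether client B sees client A's write depends on how much time t has elapsed.

```python
class AsyncReplicatedStore:
    """Hypothetical two-replica store with asynchronous replication: a write is
    applied at once on the node that receives it and reaches the other replica
    only after `delay` time units."""

    def __init__(self, delay):
        self.delay = delay
        self.now = 0
        self.rows = {"M": {}, "N": {}}   # one key/value map per replica node
        self.pending = []                # (deliver_at, target_node, key, value)

    def write(self, node, key, value):
        self.rows[node][key] = value
        for other in self.rows:
            if other != node:
                self.pending.append((self.now + self.delay, other, key, value))

    def tick(self, t=1):
        """Advance the clock by t and deliver replication messages that are due."""
        self.now += t
        due = [p for p in self.pending if p[0] <= self.now]
        self.pending = [p for p in self.pending if p[0] > self.now]
        for _, target, key, value in due:
            self.rows[target][key] = value

    def read(self, node, key):
        return self.rows[node].get(key)


def client_b_sees_write(t, delay=3):
    store = AsyncReplicatedStore(delay)
    store.write("N", "X", "v_A")          # client A writes row X to node N
    store.tick(t)                          # some period of time t elapses
    return store.read("M", "X") == "v_A"   # client B reads row X from node M


print([client_b_sees_write(t) for t in range(5)])  # [False, False, False, True, True]
```

For small t the answer is "no", for large enough t it is "yes": exactly the "maybe" on the slide.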


Eventual Consistency

When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent

For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service

Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
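One common way to reach eventual consistency is to let replicas periodically synchronize in pairs (anti-entropy) with a deterministic merge rule. The sketch below assumes last-writer-wins (LWW) on per-key timestamps; this is one possible mechanism, not necessarily what the systems on these slides use (Dynamo, for instance, uses vector clocks instead). Because the merge is commutative, associative and idempotent, all replicas end up identical once updates stop.

```python
import random


def merge(a, b):
    """Last-writer-wins merge of two replica states.

    Each state maps key -> (timestamp, value). The higher timestamp wins;
    ties are broken by comparing values, so the result does not depend on
    the order in which replicas are merged.
    """
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out


def anti_entropy(replicas, rounds, seed=0):
    """Synchronize random replica pairs; no new updates arrive meanwhile."""
    rng = random.Random(seed)
    for _ in range(rounds):
        i, j = rng.sample(range(len(replicas)), 2)
        merged = merge(replicas[i], replicas[j])
        replicas[i] = dict(merged)
        replicas[j] = dict(merged)
    return replicas


# Three replicas that each accepted different concurrent updates to key "x".
replicas = [
    {"x": (1, "a")},
    {"x": (2, "b")},
    {"x": (3, "c"), "y": (1, "z")},
]
replicas = anti_entropy(replicas, rounds=50)
print(all(r == replicas[0] for r in replicas))
print(replicas[0])
```

Note the price of LWW: the concurrent updates "a" and "b" are silently discarded in favour of "c".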


What kinds of NoSQL are there?

NoSQL solutions fall into two major areas:

Key/Value, or ‘the big hash table’:

Amazon S3 (Dynamo)

Voldemort

Scalaris

Schema-less, which comes in multiple flavors (column-based, document-based or graph-based):

Cassandra (column-based)

CouchDB (document-based)

Neo4j (graph-based)

HBase (column-based)

From: Perry Hoekstra. NoSQL. www.intertech.com/resource/usergroup/NoSQL.ppt
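The difference between a key/value store and a document store can be sketched conceptually with Python dicts and lists. This is not the API of any product listed above; the data and the helper `find` are hypothetical. In a key/value store the value is opaque to the store, so filtering on a field is the application's job; a document store understands each (schema-less) document and can filter on fields itself.

```python
import json

# Key/value ("the big hash table"): values are opaque bytes to the store,
# so the only operations are put/get/delete by key.
kv = {}
kv["user:1"] = json.dumps({"name": "Ann", "city": "Leuven"}).encode()
kv["user:2"] = json.dumps({"name": "Bob"}).encode()

# To find users in Leuven, the application must fetch and decode every value.
leuven_kv = [k for k, v in kv.items() if json.loads(v).get("city") == "Leuven"]

# Document-based: the store knows the structure of each document, so it can
# filter on a field. Documents in one collection may differ in shape.
docs = [
    {"_id": "user:1", "name": "Ann", "city": "Leuven"},
    {"_id": "user:2", "name": "Bob"},  # no "city" field at all
]


def find(collection, **criteria):
    """Hypothetical query helper: documents whose fields equal all criteria."""
    return [d for d in collection
            if all(d.get(f) == val for f, val in criteria.items())]


print(leuven_kv)                                      # ['user:1']
print([d["_id"] for d in find(docs, city="Leuven")])  # ['user:1']
```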


So, can you now answer:

p. 26, table 2.4: relational databases come off badly in the comparison;

why, then, are they used so widely?


Gegevensbanken Outlook –

The Semantic Web,

XML, RDF,

Linked (Open) Data,

NoSQL

Bettina Berendt

Katholieke Universiteit Leuven, Department of Computer Science

http://www.cs.kuleuven.be/~berendt/teaching/


Data mining/information retrieval and Linked Data?

Crowdsourcing:

Unstructured / semi-structured information → structured data

DM and IR:

Unstructured / semi-structured information → structured data

… and vice versa: LOD as a data source for DM!


NoSQL and Linked Data?

“RDF database systems are the only standardized NoSQL solutions available at the moment, being built on a simple, uniform data model and a powerful, declarative query language.”

http://blog.datagraph.org/2010/04/rdf-nosql-diff

More ideas:

http://webofdata.wordpress.com/2011/05/02/nosql-linked-data-processing/
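The "simple, uniform data model" of RDF is the triple (subject, predicate, object). Below is a toy in-memory triple store to illustrate the idea; it is neither SPARQL nor any RDF library, and the example.org URIs and data are hypothetical. A query is written declaratively as a set of triple patterns with variables (marked `?`) that are joined on shared variables, which is the idea behind SPARQL basic graph patterns.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration

# Every fact is a (subject, predicate, object) triple.
triples = {
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "bob", EX + "knows", EX + "carol"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "carol", EX + "name", "Carol"),
}


def match(pattern, graph):
    """Yield variable bindings for one triple pattern; '?x' marks a variable."""
    for triple in graph:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.get(p, t) != t:  # same variable, different value
                    break
                binding[p] = t
            elif p != t:                    # constant does not match
                break
        else:
            yield binding


def query(patterns, graph):
    """Join several patterns on shared variables (a basic graph pattern)."""
    results = [{}]
    for pattern in patterns:
        new_results = []
        for r in results:
            # substitute variables already bound by earlier patterns
            bound = tuple(r.get(p, p) for p in pattern)
            for b in match(bound, graph):
                new_results.append({**r, **b})
        results = new_results
    return results


# Names of the people known by someone whom Alice knows (friend of a friend).
rows = query([
    (EX + "alice", EX + "knows", "?f"),
    ("?f", EX + "knows", "?fof"),
    ("?fof", EX + "name", "?name"),
], triples)
print([r["?name"] for r in rows])  # ['Carol']
```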


NoSQL and Data Mining / Information Retrieval?

Indeed, since scalability is a huge issue there!

More on this in the courses Advanced Databases and Text-Based Information Retrieval, where you’ll work with such systems (and, if you want, use LOD …)
