Ontologies & Linked Open Data
A brief overview and some real-world applications
João Rocha da Silva
December 2013
Contents
• Ontologies: the importance of semantics in the data storage and querying layer
• Popular ontologies: DCTerms, FOAF
• The Semantic Web in practice: Linked Open Data in the Facebook API and in DBpedia
• Relational vs Graph: differences
• The SPARQL Language: examples
• A non-relational database: OpenLink Virtuoso
The importance of semantics
• How does someone understand the meaning of the columns in a relational database?
• Reading a lot of documentation
• Hard to provide information to external systems
• Tailor-made web services required!
SAP (one of 78,826 tables and counting) source : http://scn.sap.com/thread/1743542
MediaWiki source http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2500px-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png
Now imagine we want to have images of different kinds, with different attributes…
The importance of semantics
• Building a query over such a system is complex
• Requires knowledge of its intricate and subtle aspects
• Some columns even contain flags for business logic processing (o_O)
• Bad design decisions = “spaghetti code”
Relational vs. Ontology
SELECT employee.id AS employee_id,
       engineer.id AS engineer_id,
       manager.id AS manager_id,
       employee.name AS employee_name,
       employee.type AS employee_type,
       engineer.engineer_info AS engineer_engineer_info,
       manager.manager_data AS manager_manager_data
FROM employee
LEFT OUTER JOIN engineer ON employee.id = engineer.id
LEFT OUTER JOIN manager ON employee.id = manager.id
Building the “U.Porto” Ontology
[Diagram: the "up" ontology, a hypothetical ontology for U.Porto]
Classes: foaf:Person, up:Student, up:PhDStudent, up:Faculty, org:Organization, up:Thesis, rdfs:Literal
Properties: rdfs:subClassOf (three arrows), org:memberOf, up:thesis, dc:title
Reference: http://www.w3.org/TR/vocab-org/
Representing a person
<http://www.fe.up.pt/~pro11004>
    rdf:type      up:PhDStudent
    foaf:name     “João Rocha”
    org:memberOf  <http://www.fe.up.pt/>
References: http://www.w3.org/TR/rdf-schema/ and http://www.foaf-project.org/
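As a rough illustration, the person on this slide can be written down as three (subject, predicate, object) triples. This is a minimal Python sketch, assuming the diagram's arrows read as rdf:type, foaf:name and org:memberOf from the person's URI. The `describe` helper is hypothetical, and prefixed names are kept as plain strings rather than expanded URIs.

```python
# The person from the slide, as plain (subject, predicate, object) tuples.
# "up:" is the hypothetical U.Porto ontology; prefixes are not expanded here.
PERSON = "http://www.fe.up.pt/~pro11004"

triples = [
    (PERSON, "rdf:type", "up:PhDStudent"),
    (PERSON, "foaf:name", "João Rocha"),
    (PERSON, "org:memberOf", "http://www.fe.up.pt/"),
]


def describe(subject, store):
    """Collect every predicate/object pair attached to one subject."""
    result = {}
    for s, p, o in store:
        if s == subject:
            result.setdefault(p, []).append(o)
    return result


description = describe(PERSON, triples)
```

Adding a new attribute to the person means appending one more triple; no schema change is involved.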
Getting all the students
SELECT ?uri ?attribute ?value
FROM <http://myorganization.com/data>
WHERE {
  ?uri rdf:type up:Student .
  ?uri ?attribute ?value
}
• Will fetch all the students, regardless of their type
• Will also return their attributes (“database columns”)
• Different types of students will have different attributes
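The two triple patterns in this query can be mimicked over an in-memory list of triples. The sketch below is only an illustration of the matching idea, not how a SPARQL engine works internally. The `ex:` resources and the `all_students` helper are invented for the example, and no inference is applied, so only resources typed directly as up:Student match.

```python
# Invented sample data: two students with different sets of attributes,
# plus one resource that is not a student.
STORE = [
    ("ex:alice", "rdf:type", "up:Student"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "up:Student"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "up:thesis", "ex:thesis1"),
    ("ex:carol", "rdf:type", "foaf:Person"),
]


def all_students(store, student_class="up:Student"):
    """Mimic: ?uri rdf:type up:Student . ?uri ?attribute ?value"""
    # First pattern: bind ?uri to every subject typed as a student.
    students = {s for s, p, o in store if p == "rdf:type" and o == student_class}
    # Second pattern: return every triple whose subject is one of those URIs.
    return [(s, p, o) for s, p, o in store if s in students]


rows = all_students(STORE)
```

Each student contributes a different number of rows, which is the "different attributes" point made above.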
How does the system know that a manager is also an employee?
Inference
http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html
The inference engine recognizes certain properties and builds “virtual triples” in the background
Inference is good
• Transitive Properties (subclass of subclass…)
• Subclasses
• Multiple Inheritance Handling (Student + Researcher + ScholarshipHolder)
Saves coding time spent writing complex queries
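To make the idea of "virtual triples" concrete, here is a small Python sketch of subclass inference: it follows rdfs:subClassOf links transitively and derives the extra rdf:type triples they imply. It is not Virtuoso's implementation. The class hierarchy, the `ex:joao` resource and the `infer_types` function are assumptions made for the example.

```python
def infer_types(triples):
    """Return the rdf:type triples implied by rdfs:subClassOf (transitively)."""
    # Map each class to its direct superclasses.
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)

    def ancestors(cls, seen=None):
        # Walk upwards through the hierarchy; `seen` also guards against cycles.
        seen = set() if seen is None else seen
        for parent in parents.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    existing = set(triples)
    virtual = set()
    for s, p, o in triples:
        if p == "rdf:type":
            for ancestor in ancestors(o):
                candidate = (s, "rdf:type", ancestor)
                if candidate not in existing:
                    virtual.add(candidate)
    return virtual


# Assumed hierarchy: PhDStudent is a Student, and a Student is a Person.
DATA = [
    ("up:PhDStudent", "rdfs:subClassOf", "up:Student"),
    ("up:Student", "rdfs:subClassOf", "foaf:Person"),
    ("ex:joao", "rdf:type", "up:PhDStudent"),
]

virtual = infer_types(DATA)
```

With these derived triples available, a query for up:Student also finds resources that were only declared as up:PhDStudent.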
Nothing comes for free
• NO referential integrity or foreign keys!
• Aggregation operators are slow
• Transactions are not supported in standard SPARQL
• (“SPARQL 1.1 Query/Update Services should be atomic but that they are not required to be atomic.”)
• Graph DBMS Solutions are in early stages (many bugs, many “beta”s, many mailing lists…)
However
• Graph databases allow for flexible, intuitive representations of the data
• They handle billions of triples
• Restriction-based querying makes queries more high-level
Query examples
DBpedia
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbprop: <http://dbpedia.org/property/>
SELECT DISTINCT ?s ?almaMater
WHERE {
  ?s dbpedia-owl:almaMater ?almaMater .
  ?s dbprop:knownFor ?knownFor .
  FILTER regex(?occupation, "Facebook", "i")
  ?s dbprop:occupation ?occupation .
  FILTER regex(?occupation, "CEO", "i")
}
LIMIT 100
“Find Facebook’s CEO and the university where he studied”
Try it at http://dbpedia.org/sparql
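Besides the web form, the endpoint can be called from a program: the SPARQL protocol passes the query text in a `query` URL parameter. The sketch below only builds the request URL with Python's standard library. The `format` parameter is an assumption about what this Virtuoso-based endpoint accepts, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(query, endpoint=ENDPOINT,
                    result_format="application/sparql-results+json"):
    """Build a GET URL carrying a SPARQL query.

    `query` is the standard SPARQL protocol parameter; `format` is assumed
    to be understood by the endpoint (an alternative is an Accept header).
    """
    params = {"query": query, "format": result_format}
    return endpoint + "?" + urlencode(params)


url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
```

The resulting URL could then be fetched with `urllib.request.urlopen(url)`; that step is left out here because it needs network access.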
DBpedia
SELECT DISTINCT ?car ?manufacturer
WHERE {
  ?car rdf:type dbpedia-owl:Automobile .
  ?car dbpedia-owl:layout <http://dbpedia.org/resource/Front-engine,_rear-wheel-drive_layout> .
  ?car dbpedia-owl:productionStartYear ?startYear .
  FILTER ( ?startYear < "1990-01-01 00:00:00"^^xsd:date )
  FILTER ( ?startYear > "1980-01-01 00:00:00"^^xsd:date )
  ?car <http://dbpedia.org/ontology/manufacturer> ?manufacturer .
  {
    SELECT DISTINCT ?manufacturer
    WHERE {
      ?car dbpedia-owl:manufacturer ?manufacturer .
      ?manufacturer <http://dbpedia.org/property/location> ?location .
      FILTER regex(?location, "Japan", "i")
    }
  }
}
LIMIT 100
“Find all fun (aka rear-wheel-drive) cars from the eighties, made by Japanese manufacturers”
Try it at http://dbpedia.org/sparql
Custom query
• What do you want to know?
Virtuoso, a graph database
Conclusions
• Relational databases
Mature, robust, support transactions
Hard to model entities with dynamic attributes
Complex querying
• Graph Databases
Recent technology
Handle billions of triples
Higher-level querying, more abstract
João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of Engineering of the University of Porto. He specializes in research data management, applying the latest Semantic Web technologies to the adequate preservation and discovery of research data assets.
He is experienced in many programming languages (Javascript-Node, PHP with MVC frameworks, Ruby on Rails, J2EE, etc.) running on the major operating systems (everyday Mac user). Regardless of language, he is a quick learner who can adapt to any new technology quickly and effectively.
He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in classic cars, particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.
Research Data Management and Semantic Web Researcher, Web & iPhone Developer
João Rocha da Silva