Towards Semantic Web engineering
Multichannel publishing 3/12/2009
Olli Alm
Outline
Part 1:
• Semantic Web
• Ontology
• RDF languages
• Querying and reasoning SW data
Part 2:
• Modelling SW data
• SW data processing
• Case examples
• Summary
Part 1
Semantic Web
• The vision: WWW with intelligent machines (Tim Berners-Lee)
• In practice: a set of languages and techniques for knowledge processing, modelling and representation
• W3C activity group: standards, specifications, recommendations, tools (www.w3.org)
Semantic Web
”The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.”(from W3C SW activity statement)
1) common formats for the integration of data
2) a language for recording how the data relates to real-world objects
Semantic Web
The layer cake of the Semantic Web technologies
Semantic Web
• MVC & the XML movement on the web: separate the data model from its representation
• The Semantic Web: a "unified" data model for representing (real world) data, to be utilized in any representation
What if we could…
1. …represent any kind of (real world) data?
2. …represent data in a unified way?
3. …just take and reuse open data in our application?
4. …integrate data easily from diverse sources?
Semantic Web
The Semantic Web:
• A branch of Artificial Intelligence?
• Symbolic AI: old ideas in a new form?
• Machine intelligence: symbolic representation of the facts
”Symbolic AI (or Classical AI) is the branch of artificial intelligence research that concerns itself with attempting to explicitly represent human knowledge in a declarative form (i.e. facts and rules).”
Semantic Web
The Semantic Web:
• Explicit representation: an ontology
• Not just explicit representation, in addition: shared
Semantic Web
The Semantic Web: shared conceptualization? (the linked data project)
Semantic Web
The Semantic Web: shared conceptualization
• everything is connected
• everything is referable (URIs)
• distributed set of statements (ontologies) as a basis of our world model
• ontology language(s):
1. tool for identifying resources
2. tool for stating facts about resources (=statements)
3. tool for sharing and integrating statements
4. tool for reasoning over the data
- e.g. acquiring new statements with deductive reasoning
• in the SW world, the term "ontology languages" refers to RDF-based languages such as RDFS, OWL (and OWL2).
Ontology
"OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. Every OWL (Lite, DL, Full) document is an RDF document, and every RDF document is an OWL Full document, but only some RDF documents will be a legal OWL Lite or OWL DL document."
RDF
An example of RDF-data (in XML serialization) -person info
<foaf:Person rdf:about="#me"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:name>Dan Brickley</foaf:name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:img rdf:resource="/images/me.jpg" />
</foaf:Person>
RDF
An example of RDF-data (in TURTLE / TTL serialization) -person info
<http://mynamespace.fi#me> rdf:type foaf:Person ;
    foaf:name "Dan Brickley" ;
    foaf:homepage <http://danbri.org> ;
    foaf:img <http://danbri.org/images/me.jpg> .
RDF
An example of RDF-data (in TURTLE / TTL serialization) -web page info
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix exterms: <http://www.example.org/terms/> .

<http://www.example.org/index.html> exterms:creation-date "August 16, 1999" ;
    dc:language "en" ;
    dc:creator <http://www.example.org/staffid/85740> .
RDF
An example of RDF-data (graph representation) -web page info
The graph-like nature of RDF:
- resources / objects are nodes
- properties / attributes are edges*
* properties are also resources (at the meta level) and can be represented as nodes in the graph (why is that?)
RDF
RDF (Resource Description Framework) is…
- a statement language (logics)
- a statement = a triple
A triple has three parts: 1) subject, 2) predicate and 3) object
Example from Friend-Of-A-Friend schema (FOAF)
<http://mynamespace.fi#me> rdf:type foaf:Person ;
    foaf:name "Dan Brickley" ;
    foaf:homepage <http://danbri.org> ;
    foaf:img <http://danbri.org/images/me.jpg> .
Each row reads: subject, predicate, object (the subject <http://mynamespace.fi#me> is shared by all the rows).
Triple says: "me is a (type of) person"
Triple says: "me is called 'Dan Brickley'"
Triple says: "me has a homepage danbri.org"
The set of triples forms a graph that interlinks resources with each other! (here: 4 triples, with subject #me)
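The idea that a set of triples forms a graph can be sketched in plain Python. This is only an illustration under simplifying assumptions: prefixed names are kept as strings instead of being expanded to URIs, and the helper name `outgoing_edges` is hypothetical, not part of any RDF library.

```python
from collections import defaultdict

ME = "http://mynamespace.fi#me"

# The four FOAF statements as (subject, predicate, object) tuples.
triples = [
    (ME, "rdf:type", "foaf:Person"),
    (ME, "foaf:name", "Dan Brickley"),
    (ME, "foaf:homepage", "http://danbri.org"),
    (ME, "foaf:img", "http://danbri.org/images/me.jpg"),
]

def outgoing_edges(triples):
    """Index the triples as a graph: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

graph = outgoing_edges(triples)
for predicate, obj in graph[ME]:
    print(f"{ME} --{predicate}--> {obj}")
```

Each tuple is one edge; the subject and object are the nodes it connects.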
RDF
URI
• in RDF, everything has a unique identifier, a URI (Uniform Resource Identifier)
• a URI looks like a URL but does not have to work as a link: it is not always clickable
• in SW, URLs can be and are utilized as URIs
(don't confuse URIs with URNs, IRIs or PURLs)
<http://mynamespace.fi#me> rdf:type foaf:Person ;
    foaf:name "Dan Brickley" ;
    foaf:homepage <http://danbri.org> ;
    foaf:img <http://danbri.org/images/me.jpg> .
Dan Brickley is identified by http://mynamespace.fi#me
foaf:name is an abbreviation for the URI http://xmlns.com/foaf/0.1/name (a property defined in the foaf namespace)
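How a prefixed name is turned back into a full URI can be sketched as follows. The function name `expand` and the `PREFIXES` table are illustrative assumptions; real RDF libraries provide their own namespace handling.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name  # already a full URI, or an unknown prefix: leave as-is

print(expand("foaf:name"))
print(expand("rdf:type"))
```

The abbreviation is purely a notational convenience: two documents that use different prefixes for the same namespace still talk about the same URIs.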
RDF
URI
• For consistency, URIs should not change often (or at all)
• (should the URI change if the "identity" or "essence" of the resource changes?)
• A URI identifies an object, but that doesn't mean that different URIs refer to different resources: in the Web Ontology Language (OWL), we can state that two different URIs refer to the same object:
<rdf:Description rdf:about="#William_Jefferson_Clinton">
  <owl:sameAs rdf:resource="#BillClinton"/>
</rdf:Description>
(the opposite is also possible: we can state that two resources are distinct from each other)
RDFS
RDFS (Resource Description Framework Schema)
• Divides the world into universals (classes) and particulars (individuals / instances): TYPING
E.g. "Lassie is a dog" =
@prefix sws: <http://www.metropolia.fi/~ollial/2009/11#> .
sws:lassie rdf:type sws:dog ;
    foaf:name "Lassie" .
Classes have subclasses:
sws:dog rdf:type rdfs:Class ;
    rdfs:subClassOf sws:animal .
(Transitive) reasoning in RDFS:
1) Lassie is a dog
2) Dog is a kind of animal
⇒ Lassie is a kind of animal
OWL
OWL (Web Ontology Language)
Extends RDFS to express
• relations between classes, between instances
• property types: literal vs. object
    literal property: foaf:name = "Olli"
    object property: foaf:knows http://someone/somewhere
• Subtyping of properties ⇒ reasoning (e.g. functional, transitive)
• Computability / complexity levels for the model
• Three sublanguages: OWL Full, OWL DL, OWL Lite
Reasoning in Symbolic AI
The theory behind ontology languages is (more or less) based on the assumptions that:
1) Logic is expressive (like a natural language): We can model our domain / world by defining a set of statements that hold (in our world). (The state of affairs is the main concern, objects are secondary.)
2) Language corresponds to the world: If we are using a strong and expressive language, we can model real-world phenomena deeply and consistently, and assume that our model corresponds to the world.
3) Reason out the information: We can now deduce new (world) information (in the form of statements) by running inference over the set of statements.
Reasoning in Ontologies / open world
In addition to logic-as-a-language correspondence theories, the logic behind ontologies follows open-world semantics:
• Our model may not contain all the relevant information
• If something is stated, it is true, BUT
• If something is not described, the machine doesn't know the answer!
An example:
The statement in the ontology: "Lassie is a dog"
A) The question: "Is Lassie a dog?"
   Closed world semantics: TRUE
   Open world semantics: TRUE
B) The question: "Is Lassie a cat?"
   Closed world semantics: FALSE
   Open world semantics: Don't know
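The Lassie example can be sketched in a few lines of Python. This is a deliberately simplified model: the function names are hypothetical, "don't know" is represented as `None`, and the sketch ignores that a real OWL reasoner could still answer FALSE if, for instance, dogs and cats were explicitly declared disjoint.

```python
facts = {("Lassie", "is_a", "dog")}

def ask_closed_world(facts, statement):
    """Closed world: anything that is not stated is assumed to be false."""
    return statement in facts

def ask_open_world(facts, statement):
    """Open world: anything that is not stated is unknown (None), not false."""
    return True if statement in facts else None

for animal in ("dog", "cat"):
    question = ("Lassie", "is_a", animal)
    print(question, ask_closed_world(facts, question), ask_open_world(facts, question))
```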
Practical reasoning in Ontologies
1) We load our data (e.g. the XML file) into the reasoning machine (e.g. Jena).
2) We switch the inference engine on and define its level (e.g. reason out the transitive closures).
3) Now we can query the model for statements and also get the statements generated by the reasoner.
The data (1): "Lassie is a dog", "Dog is a mammal", "Mammal is an animal"
Transitive closure inference (2):
- reason out the is-a relations; if there are related instances, add the new facts for those instances.
The deduced data (3): "Lassie is a dog", "Dog is a mammal", "Mammal is an animal", "Dog is an animal", "Lassie is a mammal", "Lassie is an animal"
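The three steps can be imitated without a reasoner. The following is a minimal sketch, not how Jena works internally; the function names are made up for this example, and the is-a data is split into class-to-class pairs and instance-to-class pairs as an assumption.

```python
def transitive_closure(pairs):
    """Repeat the rule (a, b) and (b, d) => (a, d) until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def infer_types(instance_of, subclass_of):
    """Add, for every instance, the superclasses of its stated classes."""
    hierarchy = transitive_closure(subclass_of)
    deduced = set(instance_of)
    for inst, cls in instance_of:
        deduced |= {(inst, sup) for sub, sup in hierarchy if sub == cls}
    return deduced

# (1) the loaded data
subclass_of = {("dog", "mammal"), ("mammal", "animal")}
instance_of = {("Lassie", "dog")}
# (2) inference, (3) querying the result
types = infer_types(instance_of, subclass_of)
print(sorted(types))
```

The fixed-point loop is simple but quadratic per pass, which hints at why full inference can become expensive on large models.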
Practical reasoning in Ontologies
OWL: reasoning with properties
Transitive properties: P(x,y) AND P(y,z) ⇒ P(x,z)
An example: locatedIn(Punavuori, Helsinki) AND locatedIn(Helsinki, Uusimaa) ⇒ locatedIn(Punavuori, Uusimaa)
Symmetric properties: P(x,y) ⇒ P(y,x)
An example: isFriendOf(Olli, Matti) ⇒ isFriendOf(Matti, Olli)
Functional properties: P(x,y) AND P(x,z) ⇒ y = z (~every object has its own unique value for P)
An example: hasFather(Olli, Frank) AND hasFather(Olli, Paul) ⇒ Frank = Paul
This means: We can define certain "implication patterns" in our model and utilize them for processing data. Instead of having only the "static" data, new data is generated based on the "implications".
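The three implication patterns can be sketched as rules applied until no new facts appear. This is an illustrative toy, not an OWL reasoner: `apply_property_rules` is a hypothetical helper, facts are `(property, x, y)` tuples, and the functional rule only records which values are forced to be equal.

```python
def apply_property_rules(facts, transitive=(), symmetric=(), functional=()):
    """Return (all_facts, same_as) after applying the rules to a fixed point."""
    facts = set(facts)
    same_as = set()  # pairs of values that a functional property forces equal
    while True:
        new = set()
        for (p, x, y) in facts:
            if p in symmetric:
                new.add((p, y, x))                      # P(x,y) => P(y,x)
            for (q, a, b) in facts:
                if p != q:
                    continue
                if p in transitive and y == a:
                    new.add((p, x, b))                  # P(x,y), P(y,b) => P(x,b)
                if p in functional and x == a and y != b:
                    same_as.add(frozenset((y, b)))      # P(x,y), P(x,b) => y = b
        if new <= facts:
            return facts, same_as
        facts |= new

facts = {
    ("locatedIn", "Punavuori", "Helsinki"),
    ("locatedIn", "Helsinki", "Uusimaa"),
    ("isFriendOf", "Olli", "Matti"),
    ("hasFather", "Olli", "Frank"),
    ("hasFather", "Olli", "Paul"),
}
result, same = apply_property_rules(
    facts,
    transitive={"locatedIn"},
    symmetric={"isFriendOf"},
    functional={"hasFather"},
)
print(sorted(result - facts), same)
```

Note that the functional rule does not report an error for Olli's two fathers; it concludes that Frank and Paul must be the same individual.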
Reasoning and processing data
In addition to inferencing in the model, we can process the data in more traditional ways:
• Build a procedural program for processing the data
• Use a specific rule language for processing
• Query the data by using a specific RDF query language, e.g. SPARQL (RQL, RUL, RDQL, …)
• The best solution depends on the nature of the problem: e.g. inference engine reasoning is usually an expensive / costly solution (=takes a lot of time)
SPARQL query language
SPARQL: W3C recommendation
• Current de facto query language for RDF
• Plays much the same role as SQL does for relational databases: SELECT, WHERE, ORDER BY (why is the FROM missing?)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
    ?x foaf:name ?name .
    ?x foaf:mbox ?mbox
}
A second example, with a FILTER:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?title ?price
WHERE {
    ?x ns:price ?price .
    FILTER (?price < 30.5)
    ?x dc:title ?title .
}
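To make the meaning of the FILTER query concrete, it can be emulated by hand over a list of triples. This is only a sketch of what the query asks for, not how a SPARQL engine evaluates it; the sample data and the function name `select_title_price` are assumptions made for this illustration.

```python
# Assumed sample data: two books, each with a title and a price.
triples = [
    ("book1", "dc:title", "SPARQL Tutorial"),
    ("book1", "ns:price", 42),
    ("book2", "dc:title", "The Semantic Web"),
    ("book2", "ns:price", 23),
]

def select_title_price(triples, max_price=30.5):
    """Emulate: ?x ns:price ?price . FILTER (?price < max_price) ?x dc:title ?title ."""
    results = []
    for x, p1, price in triples:
        if p1 != "ns:price" or not price < max_price:
            continue  # pattern does not match, or the FILTER rejects it
        for x2, p2, title in triples:
            if x2 == x and p2 == "dc:title":  # same ?x joins the two patterns
                results.append((title, price))
    return results

print(select_title_price(triples))
```

The shared variable `?x` is what joins the two triple patterns; the hand-written version needs a nested loop to express the same thing.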
SPARQL query language
SPARQL: why?
• Clear representation for data queries (instead of coding by hand)
• Good query engine implementation ⇒ fast data retrieval?
• Implemented in many development libraries
What can't you do with SPARQL?
• Update data? (extension: SPARQL Update)
• Do recursive queries:
"get all the superclasses of the dog"
(procedural example)
x = dog
while (x has a superclass) {
    add superclass to resultset
    x = superclass
}
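The procedural example can be written as runnable Python. This sketch generalizes the pseudocode slightly, as an assumption, by allowing a class to have several direct superclasses; `all_superclasses` and the dictionary layout are hypothetical names chosen for the example.

```python
def all_superclasses(cls, subclass_of):
    """Collect every direct and indirect superclass of cls.

    subclass_of maps a class name to the list of its direct superclasses.
    """
    result = []
    pending = list(subclass_of.get(cls, []))
    while pending:
        parent = pending.pop(0)
        if parent in result:
            continue  # already visited: avoids looping on cycles or shared ancestors
        result.append(parent)
        pending.extend(subclass_of.get(parent, []))
    return result

subclass_of = {"dog": ["mammal"], "mammal": ["animal"]}
print(all_superclasses("dog", subclass_of))
```

The loop walks the hierarchy on demand for one class, in contrast to an inference engine that materializes the closure for the whole model.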
Part 2
Data modeling for the Semantic Web
When modelling things in ontologies, we can use an "object-oriented" approach:
• Try to define the domain
• Model objects that exist in the domain and the relations between the objects
In the modelling task, we are defining
• The metadata schema as usual (~database schema / objects of the domain)
• In addition, we should also define the ‘domain ontologies’ or ‘domain vocabularies’ we are using
Data modeling for the Semantic Web
metadata schema
• Defines the primary objects (classes) to model: books, cars, persons, …
• Defines the properties for objects: title, author, edition, no of pages, ISBN, genre, …
• Properties have either literal values or object values
• Literal / DatatypeProperty: name, title, street address, isbn, hasGenre(?)
• Object property: hasFriend, isLocated, hasAuthor, hasGenre(?)
• For "similar" objects, you can use inheritance (subclassing!)
• woman is a person, person is an agent, agent is an entity…
Data modeling for the Semantic Web
metadata schema: defining properties for a class (in RDFS / OWL)
# class definition
myNS:book rdf:type owl:Class .

# property definitions
myNS:title rdf:type owl:DatatypeProperty ;
    rdfs:domain myNS:book ;
    rdfs:range xsd:string .
myNS:isbn rdf:type owl:DatatypeProperty ;
    rdfs:domain myNS:book ;
    rdfs:range xsd:string .
myNS:author rdf:type owl:ObjectProperty ;
    rdfs:domain myNS:book ;
    rdfs:range myNS:author .
Data modeling for the Semantic Web
metadata schema: defining properties for a class (in RDFS / OWL)
rdfs:domain ⇒ the objects that have this property
rdfs:range ⇒ the suitable values for the property
• Ontology languages are "schemaless" in the sense that you can assign any properties to any objects. (open world assumption)
• Reasoning on the rdfs:domain:
Schema:
myNS:hasTail rdf:type owl:ObjectProperty ;
    rdfs:domain myNS:donkey .
Stated data:
myNS:matti rdf:type myNS:person ;
    myNS:hasTail myNS:tail001 .
After reasoning:
myNS:matti rdf:type myNS:person ;
    rdf:type myNS:donkey ;
    myNS:hasTail myNS:tail001 .
"if it has a tail, it is a donkey!"
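The donkey example shows that rdfs:domain is an inference rule rather than a validation constraint. A minimal sketch of that rule, with a hypothetical helper name and prefixed names kept as plain strings:

```python
def infer_types_from_domain(triples, domains):
    """rdfs:domain rule: if P has domain C and (s, P, o) is stated,
    then (s, rdf:type, C) is deduced. Nothing is ever rejected."""
    inferred = set(triples)
    for subject, predicate, _obj in triples:
        if predicate in domains:
            inferred.add((subject, "rdf:type", domains[predicate]))
    return inferred

domains = {"myNS:hasTail": "myNS:donkey"}
stated = {
    ("myNS:matti", "rdf:type", "myNS:person"),
    ("myNS:matti", "myNS:hasTail", "myNS:tail001"),
}
deduced = infer_types_from_domain(stated, domains)
print(sorted(deduced - stated))
```

A database schema would refuse the tail on a person; here the reasoner instead concludes that Matti is also a donkey.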
Data modeling for the Semantic Web
Domain vocabularies: reusing domain knowledge
• In our schema, we can refer to "external" ontologies that define some domain of discourse.
• The idea:
    • you don't have to reinvent the wheel
    • saves time and money
    • easy data integration (connected data)
    • (and you can always extend the domain vocabulary)
• In practice:
    1) refer to / fetch / download the ontology
    2) assign your schema properties (property range) to the values
    3) use the domain vocabulary to describe your resources
Data modeling for the Semantic Web
Domain vocabularies: reusing domain knowledge
• Case study: ONKI ontology service: www.yso.fi
• User interface, web services for utilizing domain vocabularies
Data modeling for the Semantic Web
Domain vocabularies: reusing domain knowledge
Example domains:
• Classification schemes
• Geographical information (place+coordinate+relations)
• YSO (General Finnish Upper Ontology – Yleinen Suomalainen Ontologia)
• DBpedia (information extracted from Wikipedia)
• Author databases (Getty ULAN)
Data modeling for the Semantic Web
Domain vocabularies: reusing domain knowledge
In addition to domain vocabularies, the reuse of schema definitions is also encouraged!
Why?
⇒ it allows data integration based on the properties
⇒ existing metadata schemas may provide well-thought-out, mature solutions for modelling
Example schemas:
• Dublin Core, simple DC
• SKOS (for thesauri and concept scheme modelling)
• FOAF (Friend-of-a-Friend: social connections)
Processing the Ontology data
• Although the RDF data may be initially distributed, (usually) it has to be stored in one place for reasoning / processing.
    ⇒ ontology repositories, usually built on an RDBMS (triple stores: a few big tables with attributes for subject, predicate and object)
    ⇒ repositories are usually quite slow when compared to a plain RDBMS (WHY?)
• The RDF data (graph data) is strongly interconnected; the whole model has to be in memory or in a DB for processing.
    ⇒ e.g. streaming / SAX-like processing is usually not possible
• Many Semantic Web applications are concerned with processing or analyzing 1) subsumption hierarchies OR 2) connections between the resources
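The "few big tables" layout can be tried out with Python's built-in sqlite3 module. The table name, sample data and query below are assumptions made for illustration. One commonly cited reason for the slowness, offered here as a possible answer rather than the author's own, is that every extra triple pattern in a query turns into another self-join on the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One big table: every statement is a row with subject, predicate, object.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("lassie", "rdf:type", "dog"),
        ("lassie", "foaf:name", "Lassie"),
        ("dog", "rdfs:subClassOf", "animal"),
    ],
)

# "Names of things whose class is a direct subclass of animal":
# three triple patterns become three aliases of the same table.
rows = conn.execute(
    """
    SELECT n.object
    FROM triples AS t
    JOIN triples AS c ON c.subject = t.object
    JOIN triples AS n ON n.subject = t.subject
    WHERE t.predicate = 'rdf:type'
      AND c.predicate = 'rdfs:subClassOf' AND c.object = 'animal'
      AND n.predicate = 'foaf:name'
    """
).fetchall()
print(rows)
```

In a conventional schema the same question would touch a few narrow, purpose-built tables; here all joins run against the one table that holds every statement.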
SW application domains
• Data integration based on the ontologies (e.g. Linked Data)
• Multifaceted, hierarchical search (e.g. Museum Finland / Museosuomi)
• Modelling and analyzing networked data (e.g. FOAF, Linked Data)
• Trust issues: who stated what? Who agreed? In which namespace? Based on what?
• Anything starting with ‘Semantic’: Semantic search, Semantic wiki, Semantic annotation, Semantic desktop, Semantic portal, Semantic repository, …
Ontology building
Protégé ontology editor
Linked data project
Museosuomi / Museum Finland
Kulttuurisampo / FoaF
http://www.kulttuurisampo.fi/ff.shtml
Semantic Web engineering
• Tools, models and languages for managing and processing distributed data
• Not just data: emphasis on modelling "real world knowledge"
• Reuse schemas, content and domain vocabularies
• Identify everything (URIs), make resources referable
• Networked, hierarchical, interlinked data
• Data processing with inference, rules, query language or procedural programming
• Open data?
• RDF: good for modelling / designing complex domains
Semantic Web engineering (problems)
Ontologies may complicate things
• Versioning and modification of the domain ontology: how should the data utilizing the ontology react?
• Who is responsible for maintaining the ontology (=expensive)?
• Complex data model ⇒ scrappy, low quality data
• You can model the same things in simpler models, e.g. in SQL
• Who needs URIs anyway?
• Triplestores are usually slow, the level of abstraction in the data index is low (=triple)
There isn't (really) such a thing as a Semantic Web application development framework
Further material
W3C Semantic Activity: http://www.w3.org/2001/sw/
RDFS spec: http://www.w3.org/TR/rdf-schema/
OWL spec: http://www.w3.org/TR/owl-ref/
OWL2: http://www.w3.org/TR/2009/REC-owl2-overview-20091027/
Wikipedia: semantic web
Linked data: http://linkeddata.org/
SKOS: http://www.w3.org/2004/02/skos/
Jena: http://jena.sourceforge.net/ (for Java)
RDFLib: http://rdflib.net/ (for Python)
Reasoning:
http://owl.man.ac.uk/2003/why/latest/#2
http://www.w3.org/TR/2009/REC-owl2-primer-20091027/#Modeling_Knowledge:_Basic_Notions (in OWL2)
Thank you. Questions?