Copyright 2007 Digital Enterprise Research Institute. All rights reserved.
www.deri.org
SuRF – Tapping into the Web of Data
Cosmin Basca
Digital Enterprise Research Institute, Galway
Special Thanks to: Benjamin Heitman and Uldis Bojars
Digital Enterprise Research Institute, Galway
Outline
• About DERI
• Why Semantic Web?
  – Linked Open Data (LOD)
  – RDF (Resource Description Framework)
  – SPARQL
• O-RDF Mapping (ActiveRDF / SuRF)
  – How?
  – Architecture
  – Installation
  – Examples
    • Simple: access DBpedia (Semantic Wikipedia)
    • More complex: create a blog on top of RDF
DERI – http://www.deri.ie/
• Digital Enterprise Research Institute (DERI):
  – http://www.deri.ie/
  – main goal: enabling networked knowledge
  – research about the future of the Web
  – biggest Semantic Web research institute in the world
• 120 people
  – part of the National University of Ireland, Galway
Why?
• Develop Web applications that allow
  – Data integration
  – Flexibility
    • Schema definition and modeling
    • Schema evolution
  – Robustness
  – Support for new data
    • Sources
    • Types
There is a Wealth of (RDF) data out there
Popular Semantic Web Vocabularies
• FOAF = for describing people and social network connections between them http://xmlns.com/foaf/spec/
• SIOC = for describing Social Web content created by people http://sioc-project.org/
• DOAP = for describing software projects http://trac.usefulinc.com/doap
  – used by PyPI
Linked Open Data - Growth
The data model
• The traditional approach uses the relational model
  – Usually leads to big, ugly schemas
The RDF (Graph) Data model
• Flexible
  – Support for both schema and data evolution during runtime
  – Simple model
    • Relations are represented explicitly
    • Schema is a graph
    • Can integrate data – union of two graphs
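The "union of two graphs" point can be illustrated with a minimal sketch in plain Python. It assumes a graph is modelled as a set of (subject, predicate, object) tuples; the `ex:` identifiers and both sample graphs are made up for illustration and do not come from a real dataset or from SuRF.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# All identifiers below are illustrative placeholders.
people_graph = {
    ("ex:eric", "rdf:type", "ex:Person"),
    ("ex:eric", "ex:fullName", "Eric Miller"),
}
titles_graph = {
    ("ex:eric", "ex:personalTitle", "Dr."),
    ("ex:eric", "rdf:type", "ex:Person"),  # also present in people_graph
}

# Integrating the two sources is a plain set union; duplicate triples collapse.
merged = people_graph | titles_graph

for triple in sorted(merged):
    print(triple)
```

No schema migration step is involved: the merged graph simply contains every statement made by either source.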
The RDF (Graph) Data model
A triple
    Subject    Predicate    Object
    Eric       is a         Person
Example RDF graph describing Eric Miller (RDF Primer) – human readable format
[Figure: node Eric linked by “is a” to Person, by “has full name” to Eric Miller, by “has e-mail” to his mailbox, and by “has personal title” to Dr.]
Example RDF graph describing Eric Miller (RDF Primer) – machine readable format
Subject: http://www.w3.org/People/EM/contact#me
Each predicate below is followed by its object:
• http://www.w3.org/1999/02/22-rdf-syntax-ns#type
  – http://www.w3.org/2000/10/swap/pim/contact#Person
• http://www.w3.org/2000/10/swap/pim/contact#fullName
  – Eric Miller
• http://www.w3.org/2000/10/swap/pim/contact#mailbox
  – mailto:[email protected]
• http://www.w3.org/2000/10/swap/pim/contact#personalTitle
  – Dr.
The RDF (Graph) Data model – Identification
• URIs provide strong references
  – A URIref is an unambiguous pointer to something of meaning
• Nodes (“Subjects”) connect via Links (“Predicates”) to Objects
  – Objects can be Nodes or Literals (plain or typed strings)
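The distinction between URI nodes and literals can be sketched in a few lines of Python. The `URIRef` and `Literal` classes below are local stand-ins written for this example (they are not imported from any RDF library), and the sketch simplifies by ignoring blank nodes. The three triples reuse URIs from the Eric Miller example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URIRef:
    """A node or predicate identified by a URI (local stand-in class)."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain or typed value (local stand-in class)."""
    value: str
    datatype: Optional[str] = None  # None means a plain literal


CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
RDF_TYPE = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
eric = URIRef("http://www.w3.org/People/EM/contact#me")

triples = [
    (eric, RDF_TYPE, URIRef(CONTACT + "Person")),                   # object is a node
    (eric, URIRef(CONTACT + "fullName"), Literal("Eric Miller")),   # object is a literal
    (eric, URIRef(CONTACT + "personalTitle"), Literal("Dr.")),      # object is a literal
]


def is_valid(triple):
    """Subjects and predicates must be URIs; objects may be URIs or literals."""
    s, p, o = triple
    return (isinstance(s, URIRef) and isinstance(p, URIRef)
            and isinstance(o, (URIRef, Literal)))


literal_objects = [o.value for _, _, o in triples if isinstance(o, Literal)]
print(literal_objects)
```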
SPARQL – Querying the Semantic Web
• SPARQL is to RDF what SQL is to relational tables
• Expressive, designed with the graph data model in mind
[Figure: graph in which the actors Carrie Fisher, Harrison Ford and Daryl Hannah are connected to the movies Star Wars and Blade Runner by four starred_in edges]
    SELECT ?actor ?movie WHERE {
        ?actor starred_in ?movie
    }
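What the query asks for can be approximated with a small pattern matcher over a set of tuples. This is a sketch of the idea of triple-pattern matching, not of how a SPARQL engine works internally. The slide shows four starred_in edges without stating their endpoints in text, so the pairing of actors and movies below is an assumption, and the `match` helper is a hypothetical name.

```python
# Assumed edges; the slide only shows the node labels and four starred_in links.
graph = {
    ("CarrieFisher", "starred_in", "StarWars"),
    ("HarrisonFord", "starred_in", "StarWars"),
    ("HarrisonFord", "starred_in", "BladeRunner"),
    ("DarylHannah", "starred_in", "BladeRunner"),
}


def match(graph, pattern):
    """Yield one binding dict per triple that matches the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Rough equivalent of: SELECT ?actor ?movie WHERE { ?actor starred_in ?movie }
rows = sorted(
    (b["?actor"], b["?movie"])
    for b in match(graph, ("?actor", "starred_in", "?movie"))
)
for actor, movie in rows:
    print(actor, movie)
```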
Levels of Data abstraction
• APPLICATION
• CONCEPTUAL: Relational Schemata, Ontology
• LOGICAL: Queries (SQL, SPARQL, RDQL, Prolog)
• PHYSICAL: Indexes, Disk / Memory Data representation
• DATA
• Access paths shown in the diagram: Direct SPARQL Access, O-RDF Mapper (SuRF)
O-RDF Mapper, Why?
• Clean OO design
• Increased productivity
  – model is free from persistence constraints
• Separation of concerns and specialization
• ORMs often reduce the amount of code that needs to be written, making the software more robust
  – 20% to 30% less code needs to be written
  – Less code, less testing, fewer errors
O-RDF Mapper, How?
• How do we see RDF data?
  – As a SET of triples?
  – As a SET of resources?
• The resource view is more suitable for the OO model
• How do we define an RDF resource?
  – All triples <S,P,O> with the same subject (ActiveRDF, SuRF)
  – And all triples <O,P,S>, where the resource appears as the object (SuRF)
• Apply Open World principles
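The two halves of that resource definition can be sketched as a grouping step over a list of triples. This is an illustration of the idea only, not SuRF's implementation; the function name `resource_view` and the `ex:` sample data are hypothetical.

```python
from collections import defaultdict

# Illustrative data; the ex: identifiers are placeholders.
triples = [
    ("ex:cosmin", "foaf:name", "Cosmin"),
    ("ex:cosmin", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:knows", "ex:cosmin"),
]


def resource_view(triples, uri):
    """Collect everything known about one resource.

    direct  : predicate -> objects,  from triples <uri, P, O>
    inverse : predicate -> subjects, from triples <S, P, uri>
    """
    direct = defaultdict(list)
    inverse = defaultdict(list)
    for s, p, o in triples:
        if s == uri:
            direct[p].append(o)
        if o == uri:
            inverse[p].append(s)
    return dict(direct), dict(inverse)


direct, inverse = resource_view(triples, "ex:cosmin")
print(direct)   # statements made about ex:cosmin
print(inverse)  # statements that point at ex:cosmin
```

Under an Open World reading, a predicate that is absent from `direct` means "nothing is known", not "false", so callers would treat a missing key as an empty list.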
SuRF – Semantic Resource Framework
• Inspired by ActiveRDF
  – ActiveRDF was developed at DERI for Ruby
  – Expose RDF as sets of resources
• Semantic attributes exposed as a “virtual API”, generated through introspection
  – Naming convention:
    • instance.namespace_attribute
    • cosmin.foaf_knows
• Finder methods
  – Retrieve resources by type or by attributes
• Session keeps track of resources; when calling session.commit(), only dirty resources will be persisted
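The naming convention and the dirty-tracking behaviour can be mimicked with ordinary Python attribute hooks. The toy classes below are written for this illustration and are not SuRF's classes or its implementation: the attribute parsing, the `persisted` list standing in for a store writer, and the `ex:` URIs are all assumptions.

```python
class Resource(object):
    """Toy resource: attributes named <prefix>_<name> map to predicates."""

    def __init__(self, uri, session):
        # Bypass __setattr__ so bookkeeping fields are not treated as predicates.
        object.__setattr__(self, "uri", uri)
        object.__setattr__(self, "values", {})
        object.__setattr__(self, "dirty", False)
        session.resources.append(self)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        prefix, sep, attr = name.partition("_")
        if not sep or name.startswith("_"):
            raise AttributeError(name)
        # Open World: an unknown attribute is simply empty, not an error.
        return self.values.get((prefix, attr), [])

    def __setattr__(self, name, value):
        prefix, sep, attr = name.partition("_")
        if not sep or name.startswith("_"):
            raise AttributeError(name)
        self.values[(prefix, attr)] = list(value)
        object.__setattr__(self, "dirty", True)


class Session(object):
    """Toy session: remembers resources and persists only the dirty ones."""

    def __init__(self):
        self.resources = []
        self.persisted = []  # stands in for a real store writer

    def commit(self):
        for resource in self.resources:
            if resource.dirty:
                self.persisted.append(resource.uri)
                object.__setattr__(resource, "dirty", False)


session = Session()
cosmin = Resource("ex:cosmin", session)
alice = Resource("ex:alice", session)

cosmin.foaf_knows = ["ex:alice"]  # marks cosmin as dirty
session.commit()                  # alice is untouched, so only cosmin is written
print(session.persisted)
```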
SuRF – Architecture
[Architecture diagram. Component labels: Session; Store (Reader, Writer); Resource Proxy; Serializer; Query; Namespace Manager]
SuRF – Architecture – Currently supported plugins
Store
• Reader
  – SPARQL HTTP protocol
  – Sesame2 API (Franz)
  – Sesame2 HTTP
• Writer
  – Sesame2 API (Franz)
  – Sesame2 HTTP
• Add your own plugins, extend:
    surf.store.plugins.RDFReader
    surf.store.plugins.RDFWriter
  – Redefine the __type__ attribute; this is the plugin identifier
• To install plugins:
    import my_plugin
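The pattern described here (subclass a base class, set an identifier, and let an import make the plugin available) can be sketched without SuRF at all. The `RDFReader` class below is a hypothetical stand-in defined inside the example, not `surf.store.plugins.RDFReader`; its `read` method, the registry, and `make_reader` are assumptions made for illustration. It needs Python 3.6 or later because it relies on `__init_subclass__`.

```python
class RDFReader(object):
    """Hypothetical stand-in base class; NOT surf.store.plugins.RDFReader."""

    __type__ = None  # subclasses redefine this: it is the plugin identifier
    registry = {}    # identifier -> plugin class

    def __init_subclass__(cls, **kwargs):
        # Runs when a subclass is defined, i.e. when its module is imported.
        super().__init_subclass__(**kwargs)
        if cls.__type__ is not None:
            RDFReader.registry[cls.__type__] = cls

    def read(self, subject):
        raise NotImplementedError


class InMemoryReader(RDFReader):
    """Example plugin that reads from a list of triples held in memory."""

    __type__ = "in-memory"

    def __init__(self, triples=()):
        self.triples = list(triples)

    def read(self, subject):
        # Return (predicate, object) pairs for the given subject.
        return [(p, o) for s, p, o in self.triples if s == subject]


def make_reader(identifier, **kwargs):
    """Look a plugin up by its identifier and instantiate it."""
    return RDFReader.registry[identifier](**kwargs)


reader = make_reader("in-memory", triples=[("ex:a", "ex:p", "ex:b")])
print(reader.read("ex:a"))
```

With this design, selecting a plugin by a string such as `reader='sparql-protocol'` only requires that the module defining it has been imported beforehand.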
SuRF - installation
• Available on PyPI
  – easy_install -U surf (to get the latest)
  – Open source, available on Google Code, BSD licence
    • http://code.google.com/p/surfrdf/
SuRF – simple example
DBpedia public SPARQL endpoint - read-only
• Create the store proxy
    from surf import *

    # read-only store backed by the public DBpedia SPARQL endpoint
    store = Store(reader='sparql-protocol',
                  endpoint='http://dbpedia.org/sparql',
                  default_graph='http://dbpedia.org')
• Create the surf session
    print('Create the session')
    session = Session(store, {})
• Map a DBpedia concept to an internal class
    PhilCollinsAlbums = session.get_class(ns.YAGO['PhilCollinsAlbums'])
SuRF – simple example
DBpedia public SPARQL endpoint - read-only
• Get all Phil Collins albums
    all_albums = PhilCollinsAlbums.all()
• Do something with the albums (display the links to their covers)
    print('All covers')
    for a in all_albums:
        # skip albums that have no dbpedia name
        if a.dbpedia_name:
            print('\tCover %s for "%s"' % (a.dbpedia_cover, a.dbpedia_name))
SuRF – integrate into Pylons
• Create a blog on top of an RDF database
• Replace SQLAlchemy with SuRF
• Download and install either AllegroGraph Free Edition (preferred) or Sesame2
  – http://www.franz.com/downloads/clp/ag_survey
  – Free for up to 50,000,000 triples (records)
• Install Pylons: easy_install pylons
• Install SuRF: easy_install surf
• Create a Pylons application:
    paster create -t pylons MyBlog
    cd MyBlog
SuRF – Pylons Blog
• ~/MyBlog/development.ini: in the [app:main] section add
    rdf_store = localhost
    rdf_store_port = 6789
    rdf_repository = tagbuilder
    rdf_catalog = repositories
• ~/MyBlog/myblog/config/environment.py
    from surf import *

    rdf_store = Store(reader='sparql-sesame2-api',
                      writer='sesame2-api',
                      server=config['rdf_store'],
                      port=config['rdf_store_port'],
                      catalog=config['rdf_catalog'],
                      repository=config['rdf_repository'])
    rdf_session = Session(rdf_store, {})
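One detail worth keeping in mind with ini-based configuration is that every value arrives as a string. The sketch below uses the standard library's `configparser` as a stand-in for the loader Pylons uses, with the same four keys inlined so it runs on its own. Whether a given store plugin needs the port as an integer is an assumption here; the `store_kwargs` dictionary is a hypothetical helper, not part of SuRF.

```python
from configparser import ConfigParser

# The same keys as in development.ini, inlined so the sketch is self-contained.
INI_TEXT = """
[app:main]
rdf_store = localhost
rdf_store_port = 6789
rdf_repository = tagbuilder
rdf_catalog = repositories
"""

parser = ConfigParser()
parser.read_string(INI_TEXT)
settings = dict(parser["app:main"])

# Every value read from an ini file is a string, including the port.
print(repr(settings["rdf_store_port"]))

# Assumption: convert explicitly if the consumer expects a number.
store_kwargs = {
    "server": settings["rdf_store"],
    "port": int(settings["rdf_store_port"]),
    "catalog": settings["rdf_catalog"],
    "repository": settings["rdf_repository"],
}
print(store_kwargs)
```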
SuRF – Pylons Blog
• ~/MyBlog/myblog/model/__init__.py
    from surf import *

    def init_model(session):
        # Blog is declared global so that controllers can use model.Blog
        global rdf_session, Blog
        rdf_session = session
        # register a namespace for the concepts in my blog
        ns.register(myblog='http://example.url/myblog/namespace#')
        # map the Blog concept to a Python class
        Blog = rdf_session.get_class(ns.MYBLOG['Blog'])
• Create the blog controller
    paster controller blog
• ~/MyBlog/myblog/controllers/blog.py
    import logging
    from myblog.lib.base import *

    log = logging.getLogger(__name__)

    class BlogController(BaseController):
        def index(self):
            # hand the posts to the template through the context object c
            c.posts = model.Blog.all(0,5)
            return render("/blog/index.html")
SuRF – Pylons Blog
• Create the template
    mkdir ~/MyBlog/myblog/templates/blog
• ~/MyBlog/myblog/templates/blog/index.html
    <%inherit file="site.html" />
    <%def name="title()">MyBlog Home</%def>
    <p>${len(c.posts)} new blog posts!</p>
    % for post in c.posts:
    <p class="content" style="border-style:solid;border-width:1px">
        <span class="h3"> ${post.myblog_title} </span>
        <span class="h4">Posted on: ${post.myblog_date} by ${post.myblog_author}</span>
        <br> ${post.myblog_content}
    </p>
    % endfor
• ~/MyBlog/myblog/templates/blog/site.html
• Start the built-in development server:
    paster serve --reload development.ini
SuRF – Tapping into the Web of Data
• Can tap into the Web of Data
  – SPARQL endpoints
  – Local or remote RDF stores
  – Plugin framework allows more access protocols to be defined
• Code is generated dynamically (pragmatic bottom-up approach):
  – Introspection, meta-programming
  – Exposes a virtual API (defined by the data and the schema) to the developer
• Can easily be integrated into popular Python frameworks
  – Pylons
exit()
• http://code.google.com/p/surfrdf/
• easy_install -U surf