Open Data and its Potential
- reuse of public sector information
- Svein Ølnes, Vestlandsforsking, 13.04.2011
www.vestforsk.no
Outline
About Vestforsk and myself
Semantic technologies
Linked (Open) Data
Open Data
Open Data -> LOD -> Sem. Techn.
Relevant projects and resources
Literature
Vestlandsforsking
ICT themes
Semantic technologies, information structures ++
Regional development, organizational changes with ICT
ICT application areas
Public sector (eGovernment, eHealth)
Tourism sector (local, regional, national, international level)
Vestforsk also does research in
Climate change
Transport and environment
Sustainable tourism
Renewable energy
About me
Vestforsk since 1996
eGovernment
Municipalities
Government
Semantic technologies
Projects
Norge.no (established in 1999/2000)
MiSide (development of demonstrator in 2004)
LivsIT/Los (2003 – to date)
Evaluation of public websites (2001 – to date)
Naming things!
[the famous cartoon by Gary Larson showing a man
painting ’the cat’, ’the dog’, ’the house’ on his cat,
dog, and house and explaining ”Now, this should
clear up a few things around here!”]
Technology waves
Procedure oriented. Focus: Syntax. Data: Hierarchical
Object oriented. Focus: Structure. Data: Relational
Component based. Focus: Services. Data: XML
Model driven. Focus: Semantics. Data: Ontology & Data
[Timeline axis: 1965, 1975, 1985, 1995, 2005, 2015]
Stian Danenbarger, Bouvet.no
The ontology spectrum
The ontology spectrum: From weak to strong semantics
1. Vocabulary: plain text documents/HTML pages, almost no semantic structure
2. Controlled vocabularies (weak semantic structure): adding metadata to the information
3. Taxonomies: metadata and hierarchy
4. Thesauri: metadata, hierarchy and a limited set of relations (BT, NT, related to ...)
5. Stronger semantic structures/ontologies: metadata, [hierarchy], any relations
(Daconta et al.: “The Semantic Web”)
Semantic technologies
AI tradition/Logics: Semantic web
W3C as the standardization body
Humanities/Library science: Topic Maps
ISO-standard
Light-weight, bottom-up: Microformats
Not a standard yet, but might become one as part of HTML5
Semantic Web
”Web of data”
”Web 3.0”
”Semantic web” coined by Tim Berners-Lee in mid 1990s
The (in-)famous article ”The Semantic Web” in Scientific
American 2001 (TBL, Jim Hendler, Ora Lassila)
Wikipedia: ”However, the Semantic Web as originally envisioned, a system that
enables machines to understand and respond to complex human
requests based on their meaning, has remained largely unrealized and
its critics have questioned its feasibility.”
Semantic Web stack
Lessons learned from the HTML history?
XHTML 1: HTML as XML
XHTML 2: Get rid of HTML altogether
... it was a disaster!
WHATWG TF – a rebellion inside W3C
Web Apps 1.0
.. eventually led to HTML5
pragmatism won over idealism
Jeremy Keith: ”HTML5 for Web Designers”
Semantic Web light
Is the Semantic Web too complex?
difficult to scale to the WWW
more suitable for use within smaller domains
Introducing ”Light-weight” SW:
RDFa: RDF expressed in (X)HTML attributes, as part of the HTML page
GRDDL: extracting RDF data from XML/XHTML documents
SKOS: Simple Knowledge Organization System, representation of ”classical” structures such as taxonomies and thesauri in RDF
• organizing concepts with standard relations
Linked (Open) Data
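To illustrate what ”organizing concepts with standard relations” can look like, here is a minimal sketch in plain Python (no RDF library). The SKOS namespace and the terms prefLabel, broader and related are taken from the SKOS vocabulary; the example.org concepts are invented for illustration only.

```python
# SKOS namespace (real); the example.org concepts below are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/concept/"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "education", SKOS + "prefLabel", "Education"),
    (EX + "kindergarten", SKOS + "prefLabel", "Kindergarten"),
    (EX + "school", SKOS + "prefLabel", "School"),
    (EX + "kindergarten", SKOS + "broader", EX + "education"),
    (EX + "school", SKOS + "broader", EX + "education"),
    (EX + "kindergarten", SKOS + "related", EX + "school"),
}


def narrower(concept):
    """Return, sorted, the concepts that name `concept` as their broader term."""
    return sorted(s for s, p, o in triples
                  if p == SKOS + "broader" and o == concept)


def label(concept):
    """Return the preferred label of a concept, or None if it has none."""
    for s, p, o in triples:
        if s == concept and p == SKOS + "prefLabel":
            return o
    return None


if __name__ == "__main__":
    for c in narrower(EX + "education"):
        print(label(c))
```

The same triples could be published as RDF; the point of the sketch is only that a taxonomy or thesaurus reduces to a handful of standard relations between concepts.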
Topic Maps
ISO standard from 2001 (present standard from 2003)
ISO 13250:2003
Strong Norwegian community
small world wide community compared to SW
Large uptake in portals, especially public portals
”Fight” between TM and SW
Largely over, ”SW has won”
Linked Data as a common ground for further development
Focus has shifted from technology to utilizing data
Simple Topic Maps model
3 Topic types: person, project and publication
3 Association types: Project manager of, Author of, and Result of
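The model on this slide can be sketched as a small data structure. This is a simplification under stated assumptions: real Topic Maps associations use roles rather than a fixed source and target, and all instance names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    id: str
    type: str   # "person", "project" or "publication"
    name: str


@dataclass(frozen=True)
class Association:
    type: str       # "project-manager-of", "author-of" or "result-of"
    source: Topic   # simplification: Topic Maps proper uses named roles
    target: Topic


# Hypothetical instances, one per topic type.
anna = Topic("anna", "person", "Anna Example")
project = Topic("demo-project", "project", "Demo Project")
report = Topic("demo-report", "publication", "Demo Report")

associations = [
    Association("project-manager-of", anna, project),
    Association("author-of", anna, report),
    Association("result-of", report, project),
]


def related(topic, assoc_type):
    """Topics reachable from `topic` through associations of one type."""
    return [a.target for a in associations
            if a.source == topic and a.type == assoc_type]


if __name__ == "__main__":
    print([t.name for t in related(anna, "author-of")])
```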
Topic Maps in use
Some Topic Maps driven portals:
uib.no
vestforsk.no
nofima.no
regjeringen.no
stortinget.no
bergen.kommune.no
The Linking Open Data (LOD) Project
Ultimate goal: My metadata is your data (and vice versa)
[Diagram labels: SERES, Lovdata, LOS, Europeana, KS, Smilno, Volven, Yr.no, Kartverket, SKD, SSB]
Linked (Open) Data
using the Web to lower the barriers to linking data
use of RDF to make typed statements
Linked Data = Use the Web to make typed links between data
from different sources
Alex Wright: The Web That Wasn’t (Topic Maps 2008 Conf.)
David Weinberger: Thank God! (Topic Maps 2008 Conf.)
”small pieces loosely joined”
Linked Data vs. Linked Open Data
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information,
using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more
things
Linked Data can be serialized as
RDF/XML
N3 (Turtle)
RDFa
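Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for an RDF serialization. The sketch below only builds the request and does not send it; the function name lookup_request is hypothetical, and the DBpedia URI is used purely as an example of an HTTP URI that names a thing.

```python
from urllib.request import Request


def lookup_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET request asking for RDF.

    `media_type` selects the serialization, for example "text/turtle"
    or "application/rdf+xml".
    """
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs")
    return Request(uri, headers={"Accept": media_type})


# Example URI, chosen for illustration.
req = lookup_request("http://dbpedia.org/resource/Sogndal")

if __name__ == "__main__":
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; whether RDF comes back depends on the server supporting content negotiation.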
Linked Open Data Star Scheme
Tim Berners-Lee/DERI – University of Galway
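The diagram itself is not reproduced here. As a rough sketch, the five levels as published by Tim Berners-Lee are: (1) on the Web under an open licence, (2) machine-readable structured data, (3) a non-proprietary format, (4) URIs and W3C standards used to identify things, (5) links to other people's data. The function and parameter names below are hypothetical, and which level a given dataset meets is a judgment the publisher has to make.

```python
def stars(on_web_open_licence, machine_readable, open_format,
          uses_uris, links_to_other_data):
    """Count stars; each level requires all the levels below it."""
    criteria = [on_web_open_licence, machine_readable, open_format,
                uses_uris, links_to_other_data]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


if __name__ == "__main__":
    # A spreadsheet in a proprietary format, published under an open licence.
    print(stars(True, True, False, False, False))
    # The same data as CSV.
    print(stars(True, True, True, False, False))
```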
Linked Data example
”Populated place” is a concept defined in the DBpedia ontology
Use established ontologies wherever possible
FOAF (friend-of-a-friend)
Dublin Core
hCard, hCalendar, hAtom
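A minimal sketch of reusing established vocabularies: two typed statements about a place, written out as N-Triples with plain string formatting. The DBpedia ontology class PopulatedPlace and the FOAF property name are existing terms; the choice of Sogndal as the subject and the helper names iri, literal and ntriple are assumptions made for illustration.

```python
# Namespaces of established vocabularies.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"
FOAF = "http://xmlns.com/foaf/0.1/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang=None):
    """Quote a plain literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def ntriple(subject, predicate, obj):
    """Format one statement as an N-Triples line."""
    return f"{subject} {predicate} {obj} ."


# Illustrative statements: the subject is an example, not taken from the slide.
lines = [
    ntriple(iri(DBR + "Sogndal"), iri(RDF_TYPE), iri(DBO + "PopulatedPlace")),
    ntriple(iri(DBR + "Sogndal"), iri(FOAF + "name"), literal("Sogndal", "en")),
]

if __name__ == "__main__":
    print("\n".join(lines))
```

Because the class and property come from shared vocabularies, another publisher's data about populated places can be combined with these statements without any mapping step.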
Linked Data vs. Semantic Web
The Semantic Web, or the Web of Data, is the ultimate goal
Linked Data provides the means to reach that goal
Linked Data helps build the Web of Data that can later be
exploited by more advanced technologies such as intelligent agents
(it must be added that this is the claim made by proponents of the
Semantic Web and intelligent agents)
Tom Heath: ”Without Linked Data, no Semantic Web!”
Talis Nodalities no. 11
Open data
In principle all data, but mostly public data because that is the
easiest to start with
The EU's PSI directive is an important enabler (also included in
Offentleglova, the Norwegian Freedom of Information Act)
data.norge.no
data.norge.no from FAD to Difi
and from blog to data repository (?)
Why open data?
1. Increase democratic control and political participation
Empower citizens to exercise their democratic rights
2. Foster service and product innovation
New opportunities for innovation generated by open government data
3. Strengthen law enforcement
Especially the US and the UK strategies emphasize this
Study published in the European Journal of ePractice, 2011
“Open data and its enemies”
Some pressure from FAD (recently expressed in ”Tildelingsbrevet”,
the annual letter of allocation), but slow movement in general
cultural issues
budget issues
fear of losing control
transparency is seen as a threat
Map data is some of the most important – Map Authorities are
not willing to publish raw data
Closed map data a problem
Bente Kalsnes, Origo
Open data strategies
Study published in the European Journal of ePractice, 2011
Open Data Instruments
Study published in the European Journal of ePractice, 2011
Top 10 drivers of open data
1. Strategies and experiences
2. Political leadership
3. Regional initiatives
4. Citizen initiatives
5. Market initiatives
6. Emerging technologies
7. European legislation
8. Thought leaders
9. Possibility of monitoring government
10. Budget cuts
European Journal of ePractice, 2011
Top 10 barriers to open data
1. Closed government culture
2. Privacy legislation
3. Limited quality of data
4. Limited user-friendliness/Info overload
5. Lack of standardisation of open data
6. Security threats
7. Existing charging methods
8. Uncertain economic impact
9. Digital divide
10. Network overload
European Journal of ePractice, 2011
data.norge.no
Initiative from FAD started in 2010
(I will take credit for the name! :)
Mostly a blog
Gradually building up a data repository
From 01.05.2011 Difi will have the responsibility for
data.norge.no
data.norge.no as of April 2011
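1. Byantikvarens gule liste [the City Antiquarian's yellow list] (xls)
2. Einingsregisteret [the Central Coordinating Register for Legal Entities] (rdf/xml)
3. Gardsmatrikkelen 1886 [the farm cadastre of 1886] (xls)
4. Idrettsanlegg [sports facilities] (csv)
5. Kraftprisar [electricity prices] (tab-separated text)
6. Ladestasjonar [charging stations] (csv, ov2..)
7. Los (ods)
8. N5000 (various graphic formats + sosi/shape)
9. Statlege styre, råd og utval [government boards, councils and committees] (html)
10. Statsbudsjettet og nasjonalbudsjettet 2011 [the state budget and the national budget 2011] (xls, csv)
11. Tenestemannsregisteret [the civil servant register] (csv)
• no.ckan.net lists 212 different data sources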
7 tips for publishing linked open data
1. Use standard Internet protocols for access (http)
2. All objects need a unique identifier (URI)
3. Avoid aggregation of data
4. Structure metadata in a machine readable format (xml or xml/rdf/xtm)
5. Use international character set (UTF-8)
6. Use at least Dublin Core as a standard way of describing metadata
7. Think about linking to other data sources by preparing for Linked Data
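Tips 2, 4, 5 and 6 can be combined in a few lines. The sketch below writes a minimal Dublin Core description as UTF-8 encoded XML using only the Python standard library. The Dublin Core element namespace is real; the wrapper element name metadata, the function name dc_record and the example.org identifier are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def dc_record(identifier, title, fmt, publisher):
    """Return a small Dublin Core record as UTF-8 encoded XML bytes."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    fields = [("identifier", identifier), ("title", title),
              ("format", fmt), ("publisher", publisher)]
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# The identifier is a made-up URI; each dataset needs its own unique one.
record = dc_record(
    identifier="http://data.example.org/dataset/styre-rad-utval",
    title="Statlege styre, råd og utval",
    fmt="text/html",
    publisher="Example publisher",
)

if __name__ == "__main__":
    print(record.decode("utf-8"))
```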
Relevant projects from Vestforsk
Sesam4 – Semantic technologies for SMEs
Los – a navigator for public services
Tourism concepts – a common vocabulary for the tourism industry
Seminars on semantic technologies
The WIMS’11 Conference
Sesam4
VERDIKT project 2008 – 2011 (ended 31st of March this year)
Use of semantic technologies in SMEs
Provided a set of tools for SMEs (and others) to use for ”semantisizing”
their data
Demonstrated semantic technologies in two pilots:
Tourism
Business information
NR, Vestlandsforsking, Esis, Computas, UNI Digital,
Cyberwatcher, TextUrgy, Ovitas, IKT-Norge
Sesam4 – lessons learned
Project planned in 2007
A lot of things have happened since 2007
Emergence of Linked Data
Sesam4 gradually tuned in to LOD
Too much focus, resources, and discussion (!) spent on ontologies!
Light-weight approach saves time & money
Valuable tools for semantic lifting and best practices remain
available for anybody to use (most of the project is open
source)
LOS – a navigator to public services
LivsIT (1996 – 2004)
Life situations
Los (2005 - ??)
Shared vocabulary for public services
More than 1/3 of the municipalities in Norway use Los as a
foundation of their web portal
Difi is the responsible agency
Los a success despite Difi’s lack of support and development
Problem with uptake in Governmental bodies
By using Los, municipalities can share information with
governmental bodies and with each other
What a difference a little semantics can do
Note: Bergen recently changed their internal search to Google search and lost the semantic support (Los) for search
Los – structure
Tema = Theme
Emneord = Keyword
Nettressurs = Net resource
How does Los work?
Keyword
Net resource
Help word
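The slide's diagram is not reproduced here, so the following is only a sketch of one plausible reading: a help word is an alternative search term that points to a keyword, and a keyword points to one or more net resources. All words and the example.org URL are invented for illustration, as is the function name find_resources.

```python
# Assumption: help words are alternative search terms that map to a keyword.
HELP_WORDS = {
    "nursery": "kindergarten",
    "daycare": "kindergarten",
}

# Assumption: each keyword points to one or more net resources (URLs).
NET_RESOURCES = {
    "kindergarten": ["http://example.org/services/kindergarten-application"],
}


def find_resources(term):
    """Resolve a search term to net resources via help word and keyword."""
    normalized = term.strip().lower()
    keyword = HELP_WORDS.get(normalized, normalized)
    return NET_RESOURCES.get(keyword, [])


if __name__ == "__main__":
    print(find_resources("Daycare"))
```

A citizen searching for an everyday word is thereby led to the same service page as one who happens to know the official keyword.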
Tourism concepts
Pre-project for the Norwegian tourism industry (VisitNorway, NCE
Tourism)
Advice on constructing a common vocabulary for tourism concepts
Initiated by Anders Waage Nilsen in NCE Tourism/Fjord Norway
(Anders now in MediArena)
Tourism concepts - advice
1. Simplification (today’s categorizing scheme is too complicated)
2. Develop a controlled vocabulary with emphasis on keywords (the Los
method)
3. Not everything can be solved with categorizing
A controlled vocabulary is necessary but not enough
4. Publish the vocabulary in the cloud
5. Publish the vocabulary in many formats (html, xml, xml/rdf, xtm)
6. Publish also the information resources in the cloud, as linked open
data
Seminars on semantic technologies
Vestforsk initiated a series of seminars on semantic technologies as
part of its 25th anniversary in 2010
A total of 7-8 seminars will be held, 4 already arranged
Streaming of all seminars, and archived for video on demand
We also have the project ”Kunnskap kryssar grenser”/”Access to Knowledge”, where we focus on streaming and use of video
WIMS’11
International Conference on Web Intelligence, Mining, and Semantics
Sogndal, May 25 – 27
Keynote speakers:
Jim Hendler: The Semantic Web 10th Year Update (25.05)
Peter Mika: Making Things Findable (26.05)
Sören Auer: Creating Knowledge Out of Interlinked Data (26.05)
Ashwin Ram: Open Social Learning Communities (27.05)
Marko Grobelnik: Scalable Reasoning on Intensive Streams of Data (27.05)
wims.vestforsk.no
Some resources
Vestforsk series of seminars on semantic technologies
http://www.vestforsk.no/aktuelt/seminarserie-om-semantiske-teknologiar
Linked Data – The Story So Far (Bizer, Heath, Berners-Lee)
http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
Linked Data vs. Linked Open Data
http://datavisualization.ch/opinions/introduction-to-linked-data
Linked Data – Evolving the Web into a Global Data Space (Heath, Bizer)
http://linkeddatabook.com/editions/1.0/
Introduction to Linked Open Data for Visualization Creators:
http://datavisualization.ch/opinions/introduction-to-linked-data
CKAN: The Data Hub
http://ckan.net
More resources
Talis Nodalities:
http://www.talis.com/nodalities
Publishing Open Government Data (working draft)
http://www.w3.org/TR/2009/WD-gov-data-20090908/
Åpne data og journalistikk (Bente Kalsnes, Origo)
http://www.slideshare.net/benteka/pne-data-og-journalistikk
European Journal of ePractice
http://www.epractice.eu/en/journal/issues
Figshare: Sharing scientific data (http://figshare.com)
http://blog.okfn.org/2011/03/02/introducing-figshare-a-new-way-to-share-open-scientific-data/