Date post: | 25-Jan-2015 |
Open Data Publication and Consumption
An Overview of Relevant Data Access Approaches and DaaS Solutions
@ESWC Summer School, 2014
Dumitru Roman, SINTEF, Norway
Outline
• The context: Open Data
• Data access: Web APIs, OData, SPARQL/LDP
• DaaS solutions landscape and open DaaS architecture
The context: Open Data
• Open Data Movement: make data available (primarily government data)
– Businesses and citizens can develop new ideas, services and applications
– Can support (government) transparency and accountability
Source: McKinsey http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
Gartner:
By 2016, the use of "open data" will continue to
increase — but slowly, and predominantly limited to
Type A enterprises.
By 2017, over 60% of government open data
programs that do not effectively use open data
internally, will be scaled back or discontinued.
By 2020, enterprises and governments will fail to
protect 75% of sensitive data and will declassify and
grant broad/public access to it.
Source: Gartner http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf
Lots of open datasets on the Web…
• A large number of datasets have been published as open data in recent years
• Many kinds of data: cultural, science, finance, statistics, transport, environment, …
• Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, …
…but few applications
• Applications utilizing open and distributed datasets have been rather few, e.g.:

Open Data Portal    Datasets     Applications
data.gov            ~ 110 000    ~ 350
publicdata.eu       ~ 50 000     ~ 80
data.gov.uk         ~ 20 000     ~ 350
data.norge.no       ~ 300        ~ 40

• Challenges include:
– Lack of resources: unreliable data access
– Lack of expertise: not easily available to organisations
– Technical/organizational
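To make the gap between datasets and applications concrete, the approximate portal figures above can be turned into applications per 1,000 datasets. The following is a minimal Python sketch that uses only the rounded numbers from the slide; the function name is illustrative.

```python
# Approximate figures from the slide: (datasets, applications) per portal.
PORTALS = {
    "data.gov": (110_000, 350),
    "publicdata.eu": (50_000, 80),
    "data.gov.uk": (20_000, 350),
    "data.norge.no": (300, 40),
}


def apps_per_thousand_datasets(datasets: int, applications: int) -> float:
    """Return the number of applications per 1,000 published datasets."""
    return 1000 * applications / datasets


for portal, (datasets, applications) in PORTALS.items():
    ratio = apps_per_thousand_datasets(datasets, applications)
    print(f"{portal:15s} {ratio:7.1f} applications per 1,000 datasets")
```

Even on the largest portals, only a handful of applications exist per thousand datasets.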
Open data publication and access
• Data publishers: complicated data publishing and maintenance process
• Data consumers/developers: complicated programmatic data access
• Any decision that lifts a publication burden from the data publisher shifts that burden onto data access for the data consumer
[Figure: easy data publication goes with complicated data access, and easy data access goes with complicated data publication. Goal: simplify data access and simplify data publication.]
(Programmatic/Web-based) Data access
• Traditional approaches for programmatically consuming data: ODBC, JDBC, RMI, CORBA, ...
• Modern Web applications and data services rely extensively on lightweight Web service based approaches exchanging data via standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, …)
• Relevant approaches for programmatic access to open data
– Web APIs
– OData
– SPARQL and Linked Data Platform (LDP)
Web APIs
• Programmatic interfaces accessible through HTTP calls (e.g. GET, POST)
• Data (requests/responses) typically in JSON or XML
• Very popular among application developers
Source: http://www.programmableweb.com/
[Figure: the data consumer/developer's app uses a client library to call the data provider's Web API (web service); protocol: HTTP; payload: JSON/XML/…]
Web APIs - example
Request: GET http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58
Response payload:
http://api.yr.no/weatherapi/locationforecast/1.9/documentation
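As a rough illustration of consuming such a Web API programmatically, the Python sketch below rebuilds the request URL from the slide and wraps the HTTP GET in a helper. The function names are illustrative, and the endpoint version shown on the slide may no longer be served, so the network call is defined but not executed here.

```python
from urllib.request import urlopen

# Base URL taken from the slide's example request.
BASE_URL = "http://api.yr.no/weatherapi/locationforecast/1.9/"


def build_forecast_url(lat: float, lon: float) -> str:
    """Build the request URL; the slide's example separates parameters with ';'."""
    return f"{BASE_URL}?lat={lat:.2f};lon={lon:.2f}"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Issue an HTTP GET and return the raw response payload (XML, JSON, ...)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Only the URL is built here; call fetch(url) to actually contact the service.
url = build_forecast_url(60.10, 9.58)
print(url)
```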
Open Data Protocol (OData)
• “ODBC for the Web”
• A protocol for creating and consuming data APIs
• Builds on HTTP and REST
• OASIS Standard (2014), promoted by Microsoft, IBM, and SAP
http://www.odata.org/
OData
• Principles: Metadata, Data, Querying, Editing, Operations, Vocabularies
• The OData Data Model – based on the Entity Data Model (EDM)
• The OData protocol: CRUD + query language
• XML and JSON serialization
Source: Microsoft http://msdn.microsoft.com/en-us/data/hh237663.aspx
OData - requesting data examples
Request (entity by ID): GET serviceRoot/People('russellwhyte')
Source: http://www.odata.org/getting-started/basic-tutorial/
Response payload:
Request (collections): GET serviceRoot/People
Request (individual property): GET serviceRoot/Airports('KSFO')/Name
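The three request shapes above (collection, entity by key, individual property) follow a regular URL pattern. A minimal Python sketch of that pattern is given below; `SERVICE_ROOT` is a hypothetical placeholder for the `serviceRoot` used on the slide, and the helper names are illustrative rather than part of any OData library.

```python
from urllib.parse import quote

# Hypothetical service root standing in for "serviceRoot" on the slide.
SERVICE_ROOT = "https://example.org/odata"


def collection_url(entity_set: str) -> str:
    """URL of a whole entity set, e.g. serviceRoot/People."""
    return f"{SERVICE_ROOT}/{entity_set}"


def entity_url(entity_set: str, key: str) -> str:
    """URL of one entity addressed by a string key, e.g. People('russellwhyte')."""
    # Inside an OData string literal a single quote is escaped by doubling it.
    escaped = quote(key.replace("'", "''"), safe="")
    return f"{SERVICE_ROOT}/{entity_set}('{escaped}')"


def property_url(entity_set: str, key: str, prop: str) -> str:
    """URL of an individual property, e.g. Airports('KSFO')/Name."""
    return f"{entity_url(entity_set, key)}/{prop}"


print(collection_url("People"))
print(entity_url("People", "russellwhyte"))
print(property_url("Airports", "KSFO", "Name"))
```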
OData - querying data examples
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (filter): GET serviceRoot/People?$filter=FirstName eq 'Scott'
Response payload:
Filter on complex type: GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')
orderby: GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc
top: GET serviceRoot/People?$top=2
count: GET serviceRoot/People/$count
expand: GET serviceRoot/People('keithpinckney')?$expand=Friends
select: GET serviceRoot/Airports?$select=Name, IcaoCode
search: GET serviceRoot/People?$search=Boise
Lambda operators (any / all): GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))
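The query examples above contain spaces and quotes, which need percent-encoding before they go on the wire. The Python sketch below shows one way to assemble such URLs from system query options; `SERVICE_ROOT` and `query_url` are hypothetical names, and the set of characters left unencoded is a readability choice, not a requirement of the protocol.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root standing in for "serviceRoot" on the slide.
SERVICE_ROOT = "https://example.org/odata"


def query_url(resource_path: str, options: dict) -> str:
    """Append OData system query options ($filter, $top, ...) to a resource path."""
    # Keep '$', quotes, parentheses, commas, '/' and ':' readable; encode the rest
    # (for example, spaces become %20).
    query = urlencode(options, quote_via=quote, safe="$'(),/:")
    return f"{SERVICE_ROOT}/{resource_path}?{query}"


print(query_url("People", {"$filter": "FirstName eq 'Scott'"}))
print(query_url("People", {"$top": 2}))
print(query_url("People('scottketchum')/Trips", {"$orderby": "EndsAt desc"}))
```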
OData - data modification example
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (Create an Entity):
POST serviceRoot/People
OData-Version: 4.0
Content-Type: application/json;odata.metadata=minimal
Accept: application/json
{
  "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
  "UserName": "teresa",
  "FirstName": "Teresa",
  "LastName": "Gilbert",
  "Gender": "Female",
  "Emails": ["[email protected]", "[email protected]"],
  "AddressInfo": [
    {
      "Address": "1 Suffolk Ln.",
      "City": {
        "CountryRegion": "United States",
        "Name": "Boise",
        "Region": "ID"
      }
    }
  ]
}
Response payload:
Remove an Entity: DELETE serviceRoot/People('vincentcalabrese')
Update an Entity (uses PATCH or PUT)
Relationship Operations (Link to Related Entities): POST serviceRoot/People('scottketchum')/Friends/$ref
…
{
"@odata.id": "serviceRoot/People('vincentcalabrese')"
}
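The create and delete requests above can be prepared with Python's standard library, as in the sketch below. `SERVICE_ROOT` and the helper names are hypothetical, and the `Emails` field is left out because the addresses are not legible in the slide text. The requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical service root standing in for "serviceRoot" on the slide.
SERVICE_ROOT = "https://example.org/odata"


def build_create_person_request(person: dict) -> Request:
    """Prepare a POST that creates an entity in the People entity set."""
    return Request(
        f"{SERVICE_ROOT}/People",
        data=json.dumps(person).encode("utf-8"),
        method="POST",
        headers={
            "OData-Version": "4.0",
            "Content-Type": "application/json;odata.metadata=minimal",
            "Accept": "application/json",
        },
    )


def build_delete_person_request(user_name: str) -> Request:
    """Prepare a DELETE that removes one entity addressed by its key."""
    return Request(f"{SERVICE_ROOT}/People('{user_name}')", method="DELETE")


# Subset of the payload shown on the slide.
person = {
    "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
    "UserName": "teresa",
    "FirstName": "Teresa",
    "LastName": "Gilbert",
    "Gender": "Female",
}
create_request = build_create_person_request(person)
print(create_request.get_method(), create_request.full_url)
```

Passing either prepared request to `urllib.request.urlopen` would send it to the service.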
SPARQL
• A set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store
Service Description
Request:
GET /sparql/
Host: www.example.org
Response: An RDF description, using the Service Description vocabulary

Protocol for RDF
Request:
GET /sparql/?query=[SPARQL Query]
Host: www.example.org
Response: A SPARQL Results Document or RDF graph

Update Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA { <http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ]. } ;
DELETE { ?person foaf:name ?mbox }
WHERE {
  <http://www.example.org/alice#me> foaf:knows ?person .
  ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) .
}

Examples taken from http://www.w3.org/TR/sparql11-overview/

Query Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
}
GROUP BY ?person ?name

Result (serialized in XML, JSON, CSV, TSV):

Graph Store HTTP Protocol
POST /rdf-graphs/service?graph=http%3A%2F%2Fwww.example.org%2Falice
Host: example.org
Content-Type: text/turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] .
http://www.w3.org/TR/sparql11-overview/
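A client-side view of the SPARQL Protocol request shown above: the query text is URL-encoded into the `query` parameter of a GET request. The Python sketch below assumes the example endpoint `http://www.example.org/sparql/` from the slide and asks for JSON results via the `Accept` header; the function name is illustrative and the request is built but not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Example endpoint from the slide (not a live service).
ENDPOINT = "http://www.example.org/sparql/"

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
}
GROUP BY ?person ?name"""


def build_sparql_query_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL Protocol query via GET with a URL-encoded 'query' parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the JSON serialization of the SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_query_request(ENDPOINT, QUERY)
print(request.full_url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(decoded == QUERY)
```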
Linked Data Platform
• Describes the use of HTTP for accessing, updating, creating and deleting resources from servers that expose data as Linked Data
• Centered around Linked Data Platform Resources (LDPRs), Linked Data Platform Containers (LDPCs), membership and containment
• Under development at W3C; working draft
http://www.w3.org/TR/ldp/
LDP-BC (Basic Container)
Request: GET /c1/
Response payload:

Resource
Request: GET /netWorth/nw1
Response payload:

LDP-DC (Direct Container)
Request: GET /netWorth/nw1/liabilities/
Response payload:

LDP-DC Request:
Examples taken from http://www.w3.org/TR/ldp/
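The LDP requests above are plain HTTP GETs on container and resource URLs. The Python sketch below prepares them with an `Accept: text/turtle` header, Turtle being the RDF serialization used in the LDP examples; the host `http://example.org` and the helper name are assumptions for illustration, and nothing is sent over the network.

```python
from urllib.request import Request

# Hypothetical host; the paths are the ones shown on the slide.
BASE = "http://example.org"
PATHS = ("/c1/", "/netWorth/nw1", "/netWorth/nw1/liabilities/")


def build_ldp_get(path: str, media_type: str = "text/turtle") -> Request:
    """Prepare a GET for an LDP resource or container, asking for Turtle by default."""
    return Request(f"{BASE}{path}", headers={"Accept": media_type}, method="GET")


ldp_requests = [build_ldp_get(path) for path in PATHS]
for ldp_request in ldp_requests:
    print(ldp_request.get_method(), ldp_request.full_url, ldp_request.get_header("Accept"))
```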
Data Access Summary
• Web APIs
– Very flexible, popular with Web developers, no specific commitment to data models
• OData
– ER-based data model, abstract interface to datastores (focus on CRUD), perceived as vendor-pushed (strong tool support)
• SPARQL and LDP
– Graph data model, community-pushed, some interesting features (querying, federation, linking,…)
• Although the approaches overlap, they all aim to simplify access to distributed data sources for application developers
– Which approach to choose depends on many factors, e.g. type of data, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, …
Data publication
• Data access mechanisms simplify data consumption for application developers
• But data needs to be provisioned to applications according to the chosen data access mechanism
– And applications will always depend on the hosting of the data they use
• Data publishers and application developers need to rely on generic Cloud platforms and build, deploy and maintain a complex Open Data software and data stack from scratch
– Complicated data provisioning and maintenance process
– Data-as-a-Service (DaaS) solutions are emerging to address this issue
“Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product,
data in this case, can be provided on demand to the user regardless of geographic or
organizational separation of provider and consumer.”
Source: Wikipedia; https://en.wikipedia.org/wiki/DaaS
Relevant DaaS solutions
• Windows Azure Marketplace
• Socrata
• DataMarket
• Factual
• Junar
• PublishMyData
• DaPaaS
• …
Windows Azure Marketplace
• A marketplace for applications and data (~170 datasets; ~700 applications)
• Charging data consumers
• Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration
• Supports OData
https://datamarket.azure.com/
Source: Microsoft http://go.microsoft.com/fwlink/?LinkID=201129&clcid=0x409
Socrata
• Specific focus on Open Data
• Open Data Portal: data publishing & clean-up, metadata generation, data-driven portals for data exploration and portal management
• API Foundry for creating and deploying RESTful APIs on top of the data
• Hosted data is accessible through the Socrata Open Data API (SODA) – a RESTful interface for searching and reading data in XML, JSON or RDF
http://www.socrata.com/
Source: Socrata
DataMarket
• Provides statistical data from almost 100 data providers
• ~ 71 000 datasets
• Supports embeddable visualisations of data, data export, live feeds for data updates, ability for data publishers to monetize data via the marketplace, custom data driven portals for publishers, data portal, Web API
http://datamarket.com/
Factual
• Data for ~ 65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products
• Previously offered hosting for thousands of 3rd party datasets (“Community Data”), but this activity has been discontinued
• Data is populated by means of Web crawls, data extraction and 3rd party data services; data model is tabular, based on taxonomy of around 400 categories
• Pricing is based on a pay-per-use model
• Data access is provided through a RESTful API
• Provides a set of tools for data management
http://www.factual.com/
Junar
• Cloud-based Open Data platform to collect, enrich, publish and analyse open data
• Data can be consumed either directly via the Junar API, or via various visual widgets
http://www.junar.com/
PublishMyData
• Hosted, as-a-service solution for Open and Linked Data publishing
• Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data-dumps
http://www.swirrl.com/publishmydata
Other relevant solutions
• Comprehensive Knowledge Archive Network (CKAN) (http://ckan.org/) – web-based open source data management system for the storage and distribution of open data; datahub (http://datahub.io/)
• LOD2 (http://lod2.eu/) – research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment and interlinking to maintenance; not meant to be an as-a-service solution
• Project Open Data (http://project-open-data.github.io/) – a set of open source tools, methodologies and use cases for publishing and utilising Open Data
• COMSODE (http://www.comsode.eu/) – research project aiming to create a publication platform for Open Data called Open Data Node
DaPaaS – towards an Open Data- and Platform-as-a-Service for Open Data
• DaPaaS – research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach
http://dapaas.eu
[Figure: the DaPaaS Platform and the roles around it]
• Data Publisher: publishes open data
• Application Developer: develops and deploys applications on top of published data
• End-Users Data Consumer: consumes data resulting from the available applications
DaPaaS – Requirements for Data Publisher
• DP-01: Dataset import
• DP-02: Data storage and querying
• DP-03: Dataset search & exploration
• DP-04: Data interlinking
• DP-05: Data cleaning & transformation
• DP-06: Dataset bookmarking & notifications
• DP-07: Dataset metadata management, statistics & access policies
• DP-08: Data scalability
• DP-09: Data availability
• DP-10: User registration & profile management
• DP-11: Secure access to platform
• DP-12: UI for data publisher
• DP-13: Data publishing methodology support
DaPaaS – Requirements for Application Developer
• AD-01: Access to Data Publisher services (DP-01 – DP-13)
• AD-02: Data export
• AD-03: Develop applications in state-of-the-art programming languages
• AD-04: Configure application deployment
• AD-05: Deploy and monitor application
• AD-06: Application metadata management, statistics & access policies
• AD-07: UI for application developer
• AD-08: Application development methodology support
DaPaaS – Requirements for End-Users Data Consumer
• EU-01: User registration & profile management
• EU-02: Search & explore datasets and applications
• EU-03: Datasets and applications bookmarking and notifications
• EU-04: Mobile and desktop GUI access
• EU-05: Data export and download
• EU-07: High availability of data and applications
DaPaaS Platform Abstract High-Level Architecture
[Figure: DaPaaS platform abstract high-level architecture]
• UX Layer: UX Services
• Platform Layer: Application Hosting Environment, PaaS Services, Data-Driven Applications
• Data Layer: Open Data Warehouse, DaaS Services, Datasets
• Cross-layer components: Usage Monitoring; Security & Access Control; Tool-supported Methodology for Data Publishing/Consumption
Summary
• Lots of open datasets, but few applications using them
• Simplifying data publication/consumption can enable an increase in the number (and quality) of applications using open data
• Various approaches emerging
– For data access: Web APIs, OData, SPARQL/LDP
– For data publication/provisioning: DaaS solutions