Wed roman tut_open_datapub

Date post: 25-Jan-2015
Page 1: Wed roman tut_open_datapub

Open Data Publication and Consumption

An Overview of Relevant Data Access Approaches and DaaS Solutions

@ESWC Summer School, 2014

Dumitru Roman, SINTEF, Norway

[email protected]

Page 2: Wed roman tut_open_datapub

Outline

• The context: Open Data

• Data access: Web APIs, OData, SPARQL/LDP

• DaaS solutions landscape and open DaaS architecture

Page 4: Wed roman tut_open_datapub

The context: Open Data

• Open Data Movement: make data available (primarily government data)

– Businesses and citizens can develop new ideas, services and applications

– Can support (government) transparency and accountability

Source: McKinsey http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information

Gartner:

• By 2016, the use of "open data" will continue to increase, but slowly, and predominantly limited to Type A enterprises.

• By 2017, over 60% of government open data programs that do not effectively use open data internally will be scaled back or discontinued.

• By 2020, enterprises and governments will fail to protect 75% of sensitive data and will declassify and grant broad/public access to it.

Source: Gartner http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf

Page 5: Wed roman tut_open_datapub

Lots of open datasets on the Web…

• A large number of datasets have been published as open data in recent years

• Many kinds of data: cultural, science, finance, statistics, transport, environment, …

• Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, …


Page 6: Wed roman tut_open_datapub

…but few applications

• Applications utilizing open and distributed datasets have been rather few, e.g.

    Open Data Portal    Datasets     Applications
    data.gov            ~ 110 000    ~ 350
    publicdata.eu       ~ 50 000     ~ 80
    data.gov.uk         ~ 20 000     ~ 350
    data.norge.no       ~ 300        ~ 40

• Challenges include:

– Lack of resources: unreliable data access

– Lack of expertise: not easily available to organisations

– Technical/organizational

Page 7: Wed roman tut_open_datapub

Open data publication and access

• Data publishers: complicated data publishing and maintenance process

• Data consumers/developers: complicated programmatic data access

• Trade-off: a decision that lifts a burden from the data publisher tends to shift that burden onto data access for the data consumer, and vice versa

[Figure: easy data publication goes with complicated data access; easy data access goes with complicated data publication. Goal: simplify data access and simplify data publication!]

Page 8: Wed roman tut_open_datapub

Outline

• The context: Open Data

• Data access: Web APIs, OData, SPARQL/LDP

• DaaS solutions landscape and open DaaS architecture


Page 9: Wed roman tut_open_datapub

(Programmatic/Web-based) Data access

• Traditional approaches for programmatically consuming data: ODBC, JDBC, RMI, CORBA, ...

• Modern Web applications and data services rely extensively on lightweight Web service based approaches exchanging data via standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, …)

• Relevant approaches for programmatic access to open data

– Web APIs

– OData

– SPARQL and Linked Data Platform (LDP)


Page 10: Wed roman tut_open_datapub

Web APIs

• Programmatic interfaces accessible through HTTP calls (e.g. GET, POST)

• Data (requests/responses) typically in JSON or XML

• Very popular among application developers

Source: http://www.programmableweb.com/

[Figure: on the data consumer/developer side, an app with a client library; on the data provider side, a Web service exposing a Web API. Protocol: HTTP; payload: JSON/XML/…]

Page 11: Wed roman tut_open_datapub

Web APIs - example

Request: GET http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58

Response payload: [not included in transcript]

Documentation: http://api.yr.no/weatherapi/locationforecast/1.9/documentation
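The request above can be issued from a few lines of client code. The following is a minimal Python sketch using only the standard library. The function names and the default Accept header are illustrative assumptions, not part of the yr.no API, and the endpoint version shown on the slide may no longer be served.

```python
from urllib.request import Request, urlopen

# Base URL copied from the slide; this API version may have been retired.
BASE_URL = "http://api.yr.no/weatherapi/locationforecast/1.9/"


def build_forecast_url(lat: float, lon: float, base_url: str = BASE_URL) -> str:
    """Build the request URL, separating parameters with ';' as on the slide."""
    return f"{base_url}?lat={lat:.2f};lon={lon:.2f}"


def fetch(url: str, accept: str = "application/xml") -> bytes:
    """Issue an HTTP GET and return the raw response body.

    The Accept value is an assumption; check the API documentation
    for the formats the service actually offers.
    """
    request = Request(url, headers={"Accept": accept})
    with urlopen(request, timeout=10) as response:
        return response.read()
```

A caller would combine the two, for example `fetch(build_forecast_url(60.10, 9.58))`, and then parse the returned bytes with an XML or JSON parser depending on the payload format.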

Page 12: Wed roman tut_open_datapub

Open Data Protocol (OData)

• “ODBC for the Web”

• A protocol for creating and consuming data APIs

• Builds on HTTP and REST

• OASIS Standard (2014), promoted by Microsoft, IBM, and SAP


http://www.odata.org/

Page 13: Wed roman tut_open_datapub

OData

• Principles: Metadata, Data, Querying, Editing, Operations, Vocabularies

• The OData Data Model – based on the Entity Data Model (EDM)

• The OData protocol: CRUD + query language

• XML and JSON serialization

Source: Microsoft http://msdn.microsoft.com/en-us/data/hh237663.aspx

Page 14: Wed roman tut_open_datapub

OData - requesting data examples

Source: http://www.odata.org/getting-started/basic-tutorial/

Request (entity by ID): GET serviceRoot/People('russellwhyte')

Response payload: [not included in transcript]

Request (collections): GET serviceRoot/People

Request (individual property): GET serviceRoot/Airports('KSFO')/Name

Page 15: Wed roman tut_open_datapub

OData - querying data examples

Source: http://www.odata.org/getting-started/basic-tutorial/

Request (filter): GET serviceRoot/People?$filter=FirstName eq 'Scott'

Response payload: [not included in transcript]

Filter on complex type: GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')

orderby: GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc

top: GET serviceRoot/People?$top=2

count: GET serviceRoot/People/$count

expand: GET serviceRoot/People('keithpinckney')?$expand=Friends

select: GET serviceRoot/Airports?$select=Name, IcaoCode

search: GET serviceRoot/People?$search=Boise

Lambda operators (any / all): GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))
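The query options listed above are plain URL query parameters prefixed with "$", so a client mainly needs to percent-encode the expressions. Below is a minimal Python sketch of that idea. The helper name `odata_url` and the service root `https://example.org/odata` are hypothetical placeholders, not part of OData or of any client library.

```python
from urllib.parse import quote

# Hypothetical service root, used only for illustration.
SERVICE_ROOT = "https://example.org/odata"

# Characters common in OData expressions that are left unescaped for readability.
SAFE_CHARS = "(),/:'"


def odata_url(service_root: str, resource: str, **options) -> str:
    """Build a resource URL with OData system query options.

    Each keyword becomes a '$'-prefixed option, e.g. filter=... -> $filter=...
    Spaces and other reserved characters in the values are percent-encoded.
    """
    url = service_root.rstrip("/") + "/" + resource
    if not options:
        return url
    parts = []
    for name, value in options.items():
        parts.append("$" + name + "=" + quote(str(value), safe=SAFE_CHARS))
    return url + "?" + "&".join(parts)
```

For example, `odata_url(SERVICE_ROOT, "People", filter="FirstName eq 'Scott'", top=2)` yields a URL combining `$filter` and `$top`. Keeping the helper this thin reflects that the query language lives in the URL and no client-side query engine is needed.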

Page 16: Wed roman tut_open_datapub

OData - data modification example

Source: http://www.odata.org/getting-started/basic-tutorial/

Request (Create an Entity):

POST serviceRoot/People
OData-Version: 4.0
Content-Type: application/json;odata.metadata=minimal
Accept: application/json

{
  "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
  "UserName": "teresa",
  "FirstName": "Teresa",
  "LastName": "Gilbert",
  "Gender": "Female",
  "Emails": ["[email protected]", "[email protected]"],
  "AddressInfo": [
    {
      "Address": "1 Suffolk Ln.",
      "City": {
        "CountryRegion": "United States",
        "Name": "Boise",
        "Region": "ID"
      }
    }
  ]
}

Response payload: [not included in transcript]

Remove an Entity: DELETE serviceRoot/People('vincentcalabrese')

Update an Entity (uses PATCH or PUT)

Relationship Operations (Link to Related Entities):

POST serviceRoot/People('scottketchum')/Friends/$ref

{
  "@odata.id": "serviceRoot/People('vincentcalabrese')"
}
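As a rough illustration of the create request above, the following Python sketch assembles (but does not send) the POST with the headers shown on the slide. The service root and the email address are hypothetical placeholders, and `build_create_person_request` is an illustrative helper name, not an OData client API.

```python
import json
from urllib.request import Request

# Hypothetical service root, used only for illustration.
SERVICE_ROOT = "https://example.org/odata"


def build_create_person_request(service_root: str, person: dict) -> Request:
    """Return a POST request that creates an entity in the People collection."""
    return Request(
        url=service_root.rstrip("/") + "/People",
        data=json.dumps(person).encode("utf-8"),
        method="POST",
        headers={
            "OData-Version": "4.0",
            "Content-Type": "application/json;odata.metadata=minimal",
            "Accept": "application/json",
        },
    )


# Entity payload modelled on the slide; the email address is a placeholder.
person = {
    "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
    "UserName": "teresa",
    "FirstName": "Teresa",
    "LastName": "Gilbert",
    "Emails": ["teresa@example.com"],
}
request = build_create_person_request(SERVICE_ROOT, person)
```

Passing `request` to `urllib.request.urlopen` would send it. A DELETE or PATCH request can be built the same way by changing `method` and the target URL.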

Page 17: Wed roman tut_open_datapub

SPARQL

• A set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store

Service Description
Request: GET /sparql/
Host: www.example.org
Response: An RDF description, using the Service Description vocabulary

Protocol for RDF
Request: GET /sparql/?query=[SPARQL Query]
Host: www.example.org
Response: A SPARQL Results Document or RDF graph

Update Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

INSERT DATA { <http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] . } ;

DELETE { ?person foaf:name ?name }
WHERE {
  <http://www.example.org/alice#me> foaf:knows ?person .
  ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) .
}

Query Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name

Result (serialized in XML, JSON, CSV, TSV): [not included in transcript]

Graph Store HTTP Protocol
POST /rdf-graphs/service?graph=http%3A%2F%2Fwww.example.org%2Falice
Host: example.org
Content-Type: text/turtle

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] .

Examples taken from http://www.w3.org/TR/sparql11-overview/
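To make the protocol part concrete, here is a minimal Python sketch that turns the query-language example into a SPARQL Protocol GET request. The endpoint URL is the illustrative one from the slide, `build_sparql_get` is a hypothetical helper name, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the slide's illustrative host; not a live service.
ENDPOINT = "http://www.example.org/sparql/"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name
"""


def build_sparql_get(endpoint: str, query: str,
                     accept: str = "application/sparql-results+json") -> Request:
    """Encode the query in the 'query' URL parameter and ask for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


request = build_sparql_get(ENDPOINT, QUERY)
```

The Accept header selects one of the result serializations mentioned on the slide. Sending the request with `urllib.request.urlopen` and decoding the body with `json.loads` would give the variable bindings.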

Page 18: Wed roman tut_open_datapub

Linked Data Platform

• Describes the use of HTTP for accessing, updating, creating and deleting resources from servers that expose data as Linked Data

• Centered around LDP Resources (LDPRs), LDP Containers (LDPCs), membership, and containment

• Under development at W3C; working draft

http://www.w3.org/TR/ldp/

LDP-BC (Basic Container) request: GET /c1/
Response payload: [not included in transcript]

Resource request: GET /netWorth/nw1
Response payload: [not included in transcript]

LDP-DC (Direct Container) request: GET /netWorth/nw1/liabilities/
Response payload: [not included in transcript]

Examples taken from http://www.w3.org/TR/ldp/
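An LDP client interacts with resources and containers through ordinary HTTP requests. The sketch below, in Python, builds a GET for a Turtle representation and reads the LDP interaction model from a Link response header (servers advertise it with rel="type" and a URI in the http://www.w3.org/ns/ldp# namespace). The helper names are illustrative assumptions and the header parsing is deliberately naive.

```python
from urllib.request import Request

LDP_NS = "http://www.w3.org/ns/ldp#"


def build_ldp_get(base_url: str, path: str) -> Request:
    """Build a GET request for an LDP resource, asking for Turtle."""
    return Request(base_url.rstrip("/") + path, headers={"Accept": "text/turtle"})


def ldp_types(link_header: str) -> set:
    """Extract LDP type names (e.g. 'BasicContainer') from a Link header value.

    Naive parsing: splits on ',' and ';', which is enough for simple headers
    but not for link targets that themselves contain those characters.
    """
    types = set()
    for part in link_header.split(","):
        segments = [segment.strip() for segment in part.split(";")]
        target = segments[0]
        is_type = any(
            segment.replace(" ", "") in ('rel="type"', "rel=type")
            for segment in segments[1:]
        )
        if is_type and target.startswith("<" + LDP_NS) and target.endswith(">"):
            types.add(target[len(LDP_NS) + 1:-1])
    return types
```

With the slide's example paths, `build_ldp_get("http://example.org", "/c1/")` would request the basic container, and `ldp_types` applied to the response's Link header indicates whether the server treats the resource as a container or a plain resource.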

Page 19: Wed roman tut_open_datapub

Data Access Summary

• Web APIs

– Very flexible, popular with Web developers, no specific commitment to data models

• OData

– ER-based data model, abstract interface to datastores (focus on CRUD), perceived as vendor-pushed (strong tool support)

• SPARQL and LDP

– Graph data model, community-pushed, some interesting features (querying, federation, linking,…)

• Although the approaches overlap, they all aim to simplify access to distributed data sources for application developers

– Which approach to choose depends on many factors, e.g. type of data, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, …


Page 20: Wed roman tut_open_datapub

Outline

• The context: Open Data

• Data access: Web APIs, OData, SPARQL/LDP

• DaaS solutions landscape and open DaaS architecture


Page 21: Wed roman tut_open_datapub

Data publication

• Data access mechanisms simplify data consumption for application developers

• But data needs to be provisioned to applications according to the chosen data access mechanism

– Applications will always depend on how and where the data they use is hosted

• Data publishers and application developers need to rely on generic Cloud platforms and build, deploy and maintain a complex Open Data software and data stack from scratch

– Complicated data provisioning and maintenance process

– Data-as-a-Service (DaaS) solutions are emerging to address this issue

“Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer.”

Source: Wikipedia; https://en.wikipedia.org/wiki/DaaS

Page 22: Wed roman tut_open_datapub

Relevant DaaS solutions

• Windows Azure Marketplace

• Socrata

• DataMarket

• Factual

• Junar

• PublishMyData

• DaPaaS

• …

Page 23: Wed roman tut_open_datapub

Windows Azure Marketplace

• A marketplace for applications and data (~170 datasets; ~700 applications)

• Charging data consumers

• Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration

• Supports OData

https://datamarket.azure.com/

Source: Microsoft http://go.microsoft.com/fwlink/?LinkID=201129&clcid=0x409

Page 24: Wed roman tut_open_datapub

Socrata

• Specific focus on Open Data

• Open Data Portal: data publishing & clean-up, metadata generation, data-driven portals for data exploration and portal management

• API Foundry for creating and deploying RESTful APIs on top of the data

• Hosted data is accessible through the Socrata Open Data API (SODA) – a RESTful interface for searching and reading data in XML, JSON or RDF


http://www.socrata.com/

Source: Socrata

Page 25: Wed roman tut_open_datapub

DataMarket

• Provides statistical data from almost 100 data providers

• ~ 71 000 datasets

• Supports embeddable visualisations of data, data export, live feeds for data updates, ability for data publishers to monetize data via the marketplace, custom data driven portals for publishers, data portal, Web API


http://datamarket.com/

Page 26: Wed roman tut_open_datapub

Factual

• Data for ~ 65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products

• Formerly offered hosting for thousands of third-party datasets (“Community Data”), but this activity has been discontinued

• Data is populated by means of Web crawls, data extraction and third-party data services; the data model is tabular, based on a taxonomy of around 400 categories

• Pricing is based on a pay-per-use model

• Data access is provided through a RESTful API

• Provides a set of tools for data management


http://www.factual.com/

Page 27: Wed roman tut_open_datapub

Junar

• Cloud-based Open Data platform to collect, enrich, publish and analyse open data

• Data can be consumed either directly via the Junar API, or via various visual widgets


http://www.junar.com/

Page 28: Wed roman tut_open_datapub

PublishMyData


• Hosted, as-a-service solution for Open and Linked Data publishing

• Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data-dumps

http://www.swirrl.com/publishmydata

Page 29: Wed roman tut_open_datapub

Other relevant solutions

• Comprehensive Knowledge Archive Network (CKAN) (http://ckan.org/) – web-based open source data management system for the storage and distribution of open data; datahub (http://datahub.io/)

• LOD2 (http://lod2.eu/) – research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment and interlinking to maintenance; not meant to be an as-a-service solution

• Project Open Data (http://project-open-data.github.io/) – a set of open source tools, methodologies and use cases for publishing and utilising Open Data

• COMSODE (http://www.comsode.eu/) – research project aiming to create a publication platform for Open Data called Open Data Node


Page 30: Wed roman tut_open_datapub

DaPaaS – towards an Open Data- and Platform-as-a-Service for Open Data

• DaPaaS – research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach

http://dapaas.eu

[Figure: the DaPaaS Platform and its three actors. The Data Publisher publishes open data; the Application Developer develops and deploys applications on top of published data; the End-User Data Consumer consumes data resulting from the available applications.]

Page 31: Wed roman tut_open_datapub

DaPaaS – Requirements for Data Publisher

[Figure: requirements placed on the DaPaaS Platform by the Data Publisher]

• DP-01: Dataset import

• DP-02: Data storage and querying

• DP-03: Dataset search & exploration

• DP-04: Data interlinking

• DP-05: Data cleaning & transformation

• DP-06: Dataset bookmarking & notifications

• DP-07: Dataset metadata management, statistics & access policies

• DP-08: Data scalability

• DP-09: Data availability

• DP-10: User registration & profile management

• DP-11: Secure access to platform

• DP-12: UI for data publisher

• DP-13: Data publishing methodology support

Page 32: Wed roman tut_open_datapub

DaPaaS – Requirements for Application Developer

[Figure: requirements placed on the DaPaaS Platform by the Application Developer]

• AD-01: Access to Data Publisher services (DP-01 – DP-13)

• AD-02: Data export

• AD-03: Develop applications in state-of-the-art programming languages

• AD-04: Configure application deployment

• AD-05: Deploy and monitor application

• AD-06: Application metadata management, statistics & access policies

• AD-07: UI for application developer

• AD-08: Application development methodology support

Page 33: Wed roman tut_open_datapub

DaPaaS – Requirements for End-Users Data Consumer

[Figure: requirements placed on the DaPaaS Platform by the End-User Data Consumer]

• EU-01: User registration & profile management

• EU-02: Search & explore datasets and applications

• EU-03: Datasets and applications bookmarking and notifications

• EU-04: Mobile and desktop GUI access

• EU-05: Data export and download

• EU-07: High availability of data and applications

Page 34: Wed roman tut_open_datapub

DaPaaS Platform Abstract High-Level Architecture

[Figure: DaPaaS Platform abstract high-level architecture]

• Layers: UX Layer, Platform Layer, Data Layer

• Components shown: UX Services, PaaS Services, DaaS Services, Application Hosting Environment, Data-Driven Applications, Open Data Warehouse, Datasets

• Vertical elements spanning the layers: Usage Monitoring; Security & Access Control; Tool-supported Methodology for Data Publishing/Consumption

Page 35: Wed roman tut_open_datapub

Summary

• Lots of open datasets, but few applications using them

• Simplifying data publication/consumption can enable an increase in the number (and quality) of applications using open data

• Various approaches emerging

– For data access: Web APIs, OData, SPARQL/LDP

– For data publication/provisioning: DaaS solutions


Page 36: Wed roman tut_open_datapub

Thank you!


Contact: [email protected]
