Page 1: Deliverable 9.1.2: Intermediate Release of the Publicdata ...svn.aksw.org/lod2/D9.1.2/public.pdf · Deliverable 9.1.2: Intermediate Release of the Publicdata.eu Website and Tools

Project funded by the European Commission within the Seventh Framework Programme (2007 – 2013)

Collaborative Project

LOD2 – Creating Knowledge out of Interlinked Data

Deliverable 9.1.2: Intermediate Release of the Publicdata.eu Website and Tools

Dissemination Level: Public
Due Date of Deliverable: Month 36, 31/08/13
Actual Submission Date: Month 37, 10/09/13
Work Package: WP 9, Use Case 3: LOD2 for Citizens: PublicData.eu
Task: T 9.1
Type: Report
Approval Status: Final
Version: 0.1
Number of Pages: 21
Filename: lod2-d9-1-2-publicadata-eu-launch.pdf

Abstract: This document provides a summary of the work to date on the publicdata.eu website and plans for future improvements.

The information in this document reflects only the author’s views and the European Community is not liable for any use that may be made of the information contained therein. The information in this document is provided “as is” without guarantee or warranty of any kind, express or implied, including but not limited to the fitness of the information for a particular purpose. The user thereof uses the information at his/her sole risk and liability.

Project Number: 257943
Start Date of Project: 01/09/2010
Duration: 48 months

D9.1.1 – v. 1.1


History

Version | Date | Reason | Revised by
0.1 | 09/09/13 | Creation | Sander van der Waal (OKFN)
1 | | First full version submitted for review |
1.1 | | Updated to reflect reviewers comments |

Author List

Organisation | Name | Contact Information
OKFN | Sander van der Waal | [email protected]

Table of Contents

1. Overview of Deliverable
1.1 Basic Functions
1.2 Original Vision (at inception of project – Dec 2010)
1.3 Problem Statement
2. Delivery Timeline
2.1 Early Releases
2.2 Releases March 2012 - August 2013
3. Current State of PublicData.eu
3.1 Screenshot Overview
4. Technology
4.1 CKAN
4.2 Recline Data Explorer (http://reclinejs.com/)
5. Harvested Catalogues
5.1 Datacatalogs.org
6. Key Statistics on Publicdata.eu
7. Engagement and Events
8. Future
8.1 Future Deliverables

List of Figures

Figure 1: Publicdata.eu - Homepage
Figure 2: Publicdata.eu – Dataset page
Figure 3: Publicdata.eu – Search page
Figure 4: Publicdata.eu – Browse groups of datasets
Figure 5: Publicdata.eu – Browse and search datasets within a group
Figure 6: Publicdata.eu – Preview data
Figure 7: Example of previewing CSV datasets as a graph

1. Overview of Deliverable

PublicData.eu is a research prototype of a pan-European data catalogue and federation mechanism, developed by the Open Knowledge Foundation as part of the FP7-funded LOD2 project. Based on the CKAN [1] open-source data portal, the site is developed as a use case and an early adopter of the LOD2 linked data stack technologies. After the LOD2 project’s launch in September 2010, an alpha version of the site was released in January 2011 followed by a full beta release in June 2011.

New releases of publicdata.eu have been prepared over the course of the last reporting period and work continues to date. Changes in the CKAN code base have been synchronised with the main CKAN software development and releases, which also include work invested by the Open Knowledge Foundation outside of LOD2. In December 2012 a version of publicdata.eu was released which included personalization features. This has been reported separately in Deliverable 9.2.

This Deliverable summarises development up until August 2013, the end of the third year of the LOD2 project. It discusses publicdata.eu as well as the portal datacatalogs.org, and highlights some of the key features that have been developed and other activities that have been carried out. Work on improved support for RDF has been led by ULEI and is reported in the research paper in Annex 1.

1.1 Basic Functions

The portal’s backend uses CKAN’s harvesting framework to retrieve, normalise and convert dataset metadata from catalogues across Europe, including official national and regional catalogues as well as community-driven efforts. In the last year we have seen more interest from EU member states in providing their own national data catalogues, which has shifted efforts on publicdata.eu towards integrating official national data catalogues.

The portal includes many national instances of CKAN, for example those of the Netherlands, the United Kingdom, and Austria. The Austrian CKAN portal is a good example of a portal that itself integrates regional portals, such as the Viennese one. We had already integrated the Viennese portal but there is no need for adding

The harvesting of datasets is performed via an automated service, using APIs where available and screen-scraping for the remaining catalogues. Further key functionality of the portal includes:

Multi-lingual metadata is managed within the system, allowing filtering and multi-lingual descriptions.

Multi-lingual interface: the interface language can be chosen by the user.

A SPARQL endpoint is offered in addition to the standard CKAN API, to allow easy access to structured catalogue metadata. All dataset views are RDFa-enabled and an RDF/DCAT representation for each individual dataset is available through an extended API.

[1] http://ckan.org/

Applications and Ideas developed during the Open Data Challenge competition are presented on the site with multiple screenshots and tagging. This helps to highlight the value of making data available for reuse.

Categorisation is based on taxonomy normalisation, yielding a common set of dataset categories based on EUROVOC. This allows topical navigation across multiple dataset languages.

A custom visual style was applied to the portal, using CKAN’s advanced theming support to develop a functional and elegant interface.

A map-based overview of data availability throughout Europe is displayed on the homepage, demonstrating which countries are leading in their effort to open up government information.
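The SPARQL endpoint listed among the key functions above could be queried along the following lines. This is only a minimal sketch: the endpoint address, the helper names, and the assumption that datasets are exposed as dcat:Dataset resources with dct:title values are illustrative and are not taken from this deliverable.

```python
from urllib.parse import urlencode

# Assumed endpoint address; the deliverable does not state the actual URL.
SPARQL_ENDPOINT = "http://publicdata.eu/sparql"


def build_dataset_query(limit=10):
    """Return a SPARQL query listing dataset URIs and titles (assumes DCAT terms)."""
    return (
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset a dcat:Dataset ;\n"
        "           dct:title ?title .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request using the SPARQL protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(SPARQL_ENDPOINT, build_dataset_query(limit=5))
print(url)
```

The sketch only builds the request URL; fetching it and parsing the result format returned by the endpoint is left out because those details depend on the deployed triple store.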

1.2 Original Vision (at inception of project – Dec 2010)

Over the past 18 months there has been an explosion of interest in opening up official information for the public to reuse.

At a national level, numerous member states have national data catalogues, from the Digitalisér.dk data portal run by the Danish National IT and Telecom Agency [2] to the UK’s data.gov.uk site, launched under the direction of the founder of the World Wide Web, Sir Tim Berners-Lee [3]. There are countless city level initiatives across Europe as well -- from Helsinki to Munich, Paris to Zaragoza. In addition to the many open data cities that exist now, as many again are in the pipeline, with plans to launch in the next 6 to 12 months.

This deluge of open, freely reusable data creates significant social and economic opportunities for European citizens. New digital services enable users to have information they are interested in delivered directly to them via email, web or mobile updates. For example, in the UK, TheyWorkForYou lets users know every time their elected representative speaks or when a topic of interest to them is discussed in the British Parliament [4].

Large complex datasets can be broken down and presented in more intuitive ways. For example, the WhereDoesMyMoneyGo project allows users to see where their taxes go using simple and intuitive data visualisation technologies in order to demystify a complex subject [5].

Efforts are underway to link and combine datasets from a large number of different sources. This newfound data integration will ultimately allow developers to create new digital services capable of dealing with increasingly sophisticated questions and queries [6].

[2] http://bit.ly/dk-cat
[3] http://data.gov.uk/
[4] http://www.theyworkforyou.com/
[5] http://wheredoesmymoneygo.org/
[6] http://linkeddata.org/

In addition to increasing transparency and improving public service delivery, open data creates opportunities for businesses to build new kinds of commercial services around this new data. This opportunity exists both because businesses have technical access to the data and because they are legally permitted to use and reuse it. A recent study estimates the market based on European public sector information could be worth as much as €27 billion [7].

In order to unlock the potential of digital public sector information, developers and other prospective users must be able to find datasets they are interested in reusing. PublicData.eu will provide a single point of access to open, freely reusable datasets from numerous national, regional and local public bodies throughout Europe.

Information about European public datasets is currently scattered across many different data catalogues, portals and websites in many different languages, implemented using many different technologies. The kinds of information stored about public datasets may vary from country to country, and from registry to registry. PublicData.eu will harvest and federate this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the “discoverability problem” of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe.

In addition to providing access to official information about datasets from public bodies, PublicData.eu will capture (proposed) edits, annotations, comments and uploads from the broader community of public data users. In this way, PublicData.eu will harness the social aspect of working with data to create opportunities for mass collaboration. For example, a web developer might download a dataset, convert it into a new format, upload it and add a link to the new version of the dataset for others to use. From fixing broken URLs or typos in descriptions to substantive comments or supplementary documentation about using the datasets, PublicData.eu will provide up-to-date information for data users, by data users. This “virtuous circle” of data improvement will help to police data quality as it demonstrates the need for open data.

Non-technical users, including researchers, journalists and ordinary citizens, will be able to use PublicData.eu to browse data and find answers to questions -- much as Wikipedia may be used. This will be accomplished by providing basic data analysis and visualisation tools together with more in-depth resources for those looking to dig deeper into the data.

1.3 Problem Statement

PublicData.eu aims to address the following problems:

It can be difficult to find datasets on official websites:

◦ Datasets often buried deep in websites

◦ Little or no relevant metadata for search engines

◦ No central access point to datasets

◦ Language barriers

◦ Unclear licensing

◦ Broken links are not updated

No easy way to aggregate or compare data from different member states

Most official EU data catalogues lack any mechanism to capture value added to the datasets or metadata by users

[7] Estimate from the 2006 MEPSIR study, commissioned by the European Commission. See: http://bit.ly/mepsir

2. Delivery Timeline

PublicData.eu has been developed in an agile fashion, releasing early and often to maximise user engagement. This has allowed us to go through several iterations of the site and make numerous improvements whilst maximizing value.

2.1 Early Releases

These releases were reported in Deliverable 9.1.1:

Alpha Launch - January 2011 - An alpha version of the site was launched in January 2011 [8], as part of Work Package 1 on requirements gathering, prototyping and design. Feedback from this has helped us identify improvements, and the site was subsequently moved to using the CKAN software.

Beta Release - June 2011 - After releasing an experimental data catalogue federation and scraping front end in January 2011, this was the first iteration of the site based on CKAN, our data management system. While the basic functionality is still that of a read-only dataset search, a lot has changed behind the scenes.

Upgrade - March 2012 - In March 2012 we upgraded PublicData.eu to CKAN version 1.6 - adding the data preview functionality (powered by Recline), improvements to search, interface improvements to dataset pages, newly added resource (file) pages and group pages.

2.2 Releases March 2012 - August 2013

The main release in the reporting period centred on the personalization features of publicdata.eu. This was part of a major overhaul of CKAN that resulted in the CKAN 2.0 release, which was announced in beta in February 2013 and, after more testing and bug fixing, officially released in May 2013.

Development on publicdata.eu contributed significantly to CKAN 2.0.

[8] See http://lod2.okfn.org/2011/01/27/alpha-prototype-of-publicdata-eu-is-now-live/

3. Current State of PublicData.eu

3.1 Screenshot Overview

Publicdata.eu provides robust and useful search, filtering and previewing tools.

The front page (Figure 1) provides:

Immediate access to a free-text search, or the option to browse data by top-level category.

A clear link to an introductory ‘about’ page for new users, and links to register or log in.

Selected applications from the site's showcase of applications that use open data. This includes featured apps from the Open Data Challenge [9], and helps to demonstrate to end-users the wide range of possible uses for open data.

The site also features applications and ideas related to the datasets. We will work with the Apps for Europe project to integrate new projects and applications based on the data they work with.

Figure 1: Publicdata.eu - Homepage

[9] See http://opendatachallenge.org/

A dataset may have one or more data ‘resources’ – linked or stored files (or linked database endpoints) containing the data. Each dataset has a dedicated page listing the resources, as well as general information about the dataset:

Figure 2: Publicdata.eu – Dataset page

We have refreshed the search page, where users can search for datasets with a text search or tag. Datasets may be tagged with free-form tags, and searches can be filtered by tag or other metadata, such as licence or file format:

Figure 3: Publicdata.eu – Search page

As well as text search, users may also find it helpful to browse by general subject area. Datasets are arranged in subject-related groups:

Figure 4: Publicdata.eu – Browse groups of datasets

From a group page, the user can browse or search the datasets within that group and use faceted search to filter on datasets of a specific type. The screenshot below shows all datasets that are Microsoft Excel sheets within the Transport group:

Figure 5: Publicdata.eu – Browse and search datasets within a group
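The same kind of faceted filter can, in principle, be expressed against CKAN's `package_search` API action. The following is a minimal sketch, assuming the portal exposes the version 3 action API at the base URL shown and that the group is named `transport` and the format value is `XLS`; none of these specifics are confirmed by this deliverable, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the portal's CKAN action API.
API_BASE = "http://publicdata.eu/api/3/action"


def faceted_search_url(group, res_format, query="", rows=20):
    """Build a package_search URL restricted to one group and one resource format."""
    params = {
        "q": query,
        # Filter query in Solr syntax; 'groups' and 'res_format' are CKAN index fields.
        "fq": f'groups:"{group}" AND res_format:"{res_format}"',
        "rows": rows,
    }
    return f"{API_BASE}/package_search?{urlencode(params)}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN API reported a failure")
    return [pkg.get("title", pkg["name"]) for pkg in payload["result"]["results"]]


print(faceted_search_url("transport", "XLS"))
```

The URL construction and the response parsing are kept separate so that the parsing step can be exercised with a saved response instead of a live request.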

The resources within a dataset can be previewed on the site using the built-in data previewer. The previewer has the ability to do more sophisticated previews, such as graphs and maps. This is now also available for data that has been loaded remotely.

In the screenshot below you can see the map that has been previewed for a CSV dataset that has latitude and longitude fields to indicate the location of the data. The user can indicate which fields contain the references to the location and the software generates the map automatically.

Figure 6: Publicdata.eu – Preview data
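To illustrate the idea behind the map preview, the sketch below turns CSV rows into GeoJSON points once the latitude and longitude columns have been named. It is written in Python for illustration only and is not the Recline implementation used on the site; the function name and the sample data are made up.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_field, lon_field):
    """Convert CSV rows with latitude/longitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable coordinates
        properties = {k: v for k, v in row.items() if k not in (lat_field, lon_field)}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample: the second row has no coordinates and is skipped.
sample = "name,lat,lon\nVienna,48.21,16.37\nUnknown,,\n"
print(json.dumps(csv_to_geojson(sample, "lat", "lon"), indent=2))
```

Passing the column names as parameters mirrors the step in which the user indicates which fields hold the location.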

Another type of preview is the display of a graph for CSV files. The user can select which columns to display and the data is displayed in the graph. Different types of graphs are supported, including lines, points, bars and columns.

Figure 7: Example of previewing CSV datasets as a graph

4. Technology

4.1 CKAN

Publicdata.eu is powered by CKAN (http://ckan.org) - an open source data portal platform developed by The Open Knowledge Foundation. CKAN is a mature project with a growing community of users, and it powers both official and community-driven data catalogs around the world.

CKAN is built with Python on the backend and JavaScript on the frontend, and uses the Pylons web framework and SQLAlchemy as its ORM. Its database engine is PostgreSQL and its search is powered by Solr. It has a modular architecture that allows extensions to be developed to provide additional features such as harvesting or data upload.

CKAN uses its internal model to store metadata about the different records, and presents it on a web interface that allows users to browse and search this metadata. It also offers a powerful API that allows third-party applications and services to be built around it.
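As a small illustration of how a third-party application might consume the API, the sketch below reads the resources of a single dataset from a response shaped like the output of CKAN's `package_show` action. The sample response is hand-written for this example and does not come from the portal; the helper name is hypothetical.

```python
import json


def list_resources(package_show_response):
    """Return (name, format, url) tuples for the resources of one dataset."""
    payload = json.loads(package_show_response)
    if not payload.get("success"):
        raise ValueError("CKAN API reported a failure")
    dataset = payload["result"]
    return [
        (res.get("name") or "", res.get("format") or "", res.get("url") or "")
        for res in dataset.get("resources", [])
    ]


# Hand-written sample in the shape of a package_show response; not real portal data.
sample = json.dumps({
    "success": True,
    "result": {
        "name": "example-dataset",
        "title": "Example dataset",
        "resources": [
            {"name": "Data file", "format": "CSV", "url": "http://example.org/data.csv"},
        ],
    },
})

for name, fmt, url in list_resources(sample):
    print(name, fmt, url)
```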

4.2 Recline Data Explorer (http://reclinejs.com/)

The latest version of Publicdata.eu offers preview functionality for certain file types like CSV files. This is powered by Recline, a JavaScript library that offers both a powerful Data Explorer and a set of extensible data components - data grid, graphing, and data connectors. As the screenshots in Chapter 3 show, we have added support for grid, map and graph previews to publicdata.eu.

5. Harvested Catalogues

The dataset information on PublicData.eu is gathered from national data catalogues from across the EU. The majority of them are currently other CKAN instances, and the CKAN harvester makes it particularly easy to pull metadata from these catalogues.

In the first versions of publicdata.eu we scraped open data catalogs that did not use dedicated software with an API to the datasets. We have found this to be unsustainable, because some of these websites change rapidly, which breaks the harvester and the links to datasets already stored. At the same time, we found that many new official data catalogs are being created using CKAN, and standards are emerging for other APIs to access the datasets. For example, the Open Data Support project has initiated a working group to create a DCAT application profile for data portals in Europe. We have started work with the Spanish open data portal datos.gov.es, who are aiming to publish an RDF feed using the DCAT profile, to build a harvester that conforms to that standard.
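To give a feel for what reading such a DCAT feed involves, the sketch below extracts dataset URIs and titles from a small RDF/XML document. It is a simplified illustration, not the harvester described above: it assumes datasets are serialised as `dcat:Dataset` elements with an `rdf:about` attribute and a `dct:title` child, whereas a production harvester would more likely use a full RDF parser, since RDF/XML permits other serialisations. The sample feed is invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def parse_dcat_feed(rdf_xml):
    """Extract URI and title for each dcat:Dataset element in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    datasets = []
    for node in root.iter(f"{{{NS['dcat']}}}Dataset"):
        uri = node.get(f"{{{NS['rdf']}}}about")
        title = node.findtext("dct:title", default="", namespaces=NS)
        datasets.append({"uri": uri, "title": title.strip()})
    return datasets


# Invented sample feed for illustration only.
SAMPLE_FEED = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dcat="http://www.w3.org/ns/dcat#" '
    'xmlns:dct="http://purl.org/dc/terms/">'
    '<dcat:Dataset rdf:about="http://example.org/dataset/1">'
    '<dct:title>Example dataset</dct:title>'
    '</dcat:Dataset>'
    '</rdf:RDF>'
)

for entry in parse_dcat_feed(SAMPLE_FEED):
    print(entry["uri"], entry["title"])
```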

New data catalogues are constantly being created and we know there are many catalogues which we aren’t currently federating. With this in mind we have created datacatalogs.org.

5.1 Datacatalogs.org

Datacatalogs.org aims to be the most comprehensive list of open data catalogs in the world. It is curated by a group of leading open data experts from around the world - including representatives from local, regional and national governments, international organisations such as the World Bank, and numerous NGOs.

The site currently lists 116 official data catalogues from within the EU and we are working to include more of these in publicdata.eu, giving priority to the national data portals.

6. Key Statistics on Publicdata.eu

Publicdata.eu now contains 20,262 datasets (as of 9 September 2013).

These datasets are harvested from 27 data catalogues.

The portal also showcases 79 applications and ideas.

The portal received 42,359 visits (34,600 unique visitors) between 23 August 2011 and 18 March 2012.

7. Engagement and Events

We see engagement with stakeholders as key to the success of the project. We have therefore been working hard to engage with key stakeholders at relevant events throughout the project. This includes:

Presenting publicdata.eu at the European Data Forum in Dublin – April 2013

Presenting a webinar about CKAN and publicdata.eu in the LOD webinar series – April 2013

Engaging with programme partners and EU member state representatives at the Pan EU working group meetings – June 2013

We have also been working to engage owners of other data catalogues to improve interoperability between different types of data catalogues. This will allow them to be more easily federated into projects like publicdata.eu.

8. Future

We have a programme of improvements planned for Publicdata.eu. These are focussed on the following key areas:

Work together with related projects, such as the Open Data Support project, to build a stronger network of data catalogue maintainers and support.

Showcase newer and better applications that make use of the data on publicdata.eu by working together with app competition organisers via the Apps for Europe project.

Focus on the sustainability of the CKAN development as an open source project, by building a stronger community of developers who are involved with developing CKAN portals throughout Europe

Work with consortium partners to integrate more closely with the rest of the LOD2 stack where possible.

8.1 Future Deliverables

Deliverable 9.1.3 Final release of the PublicData.eu website and tools (M48).

The final release will be devoted to deploying improvements to both the user interfaces and end-user functionalities, as well as to the classification, interlinking, enrichment and repair algorithms. Additional requirements and deployment targets for the final release will be obtained based on a large-scale user evaluation of the intermediate PublicData.eu releases.

Annex 1 - User-driven Semantic Mapping of Tabular Data

Attached is the research paper that reports on the work that has been performed by Ivan Ermilov, Sören Auer and Claus Stadler to improve support for RDF data on publicdata.eu. The abstract is copied below for reference.

Abstract

Governments and public administrations started recently to publish large amounts of structured data on the Web, mostly in the form of tabular data such as CSV files or Excel sheets. Various tools and projects have been launched aiming at facilitating the lifting of tabular data to reach semantically structured and linked data. However, none of these tools supported a truly incremental, pay-as-you-go data publication and mapping strategy, which enables effort sharing between data owners, community experts and consumers. In this article, we present an approach for enabling the user-driven semantic mapping of large amounts tabular data. We devise a simple mapping language for tabular data, which is easy to understand even for casual users, but expressive enough to cover the vast majority of potential tabular mappings use cases. We outline a formal approach for mapping tabular data to RDF. Default mappings are automatically created and can be revised by the community using a semantic wiki. The mappings are executed using a sophisticated streaming RDB2RDF conversion. We report about the deployment of our approach at the Pan-European data portal PublicData.eu, where we transformed and enriched almost 10,000 datasets accounting for 7.3 billion triples.
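The notion of an automatically created default mapping can be illustrated with a short sketch in which every CSV row becomes a subject and every column header becomes a predicate, emitted as N-Triples. This is an assumption-laden illustration only: it does not reproduce the mapping language, the semantic wiki workflow or the streaming RDB2RDF conversion of the paper, and the URI scheme, function names and sample data are invented.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def default_mapping(csv_text, base_uri):
    """Yield N-Triples lines: one subject per row, one predicate per column header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base_uri}row/{index}>"
        for column, value in row.items():
            if column is None or value is None or value == "":
                continue  # skip ragged or empty cells
            predicate = f"<{base_uri}property/{quote(column.strip(), safe='')}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


# Invented sample table; the base URI is a placeholder.
sample = "name,value\nalpha,1\nbeta,\n"
for triple in default_mapping(sample, "http://example.org/data/"):
    print(triple)
```

All values are emitted as plain literals; revising such a default, for example by assigning datatypes or reusing existing vocabulary terms, is the kind of refinement the abstract attributes to community-edited mappings.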

