Date posted: 24-Jan-2018
Category: Data & Analytics
Uploaded by: opencubeproject
Andriy Nikolov, fluid Operations AG, Germany
2nd OpenCube Webinar
15 September 2015
The OpenCube Toolkit: Overview
Table of Contents
The OpenCube Project: Overview
The OpenCube Toolkit
Base platform
Components for processing statistical data
Conclusions
Data Cube
Statistical data is often organized as data cubes, where each cell contains a measure described by a number of dimensions.
OLAP operations: drill up/down, slicing, dicing, pivot, etc.
Data cubes are essential for Business Intelligence.
[Figure: data cube showing dimensions, a hierarchy, and a measure]
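The cube model and two of the OLAP operations named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the dimensions (city, year, sex), the city-to-country hierarchy, and all cell values are made up for the example and do not come from any OpenCube dataset.

```python
from collections import defaultdict

# Hypothetical toy cube: each cell maps (city, year, sex) -> one measure value.
DIMENSIONS = ("city", "year", "sex")
CUBE = {
    ("Berlin", 2013, "F"): 10,
    ("Berlin", 2013, "M"): 12,
    ("Munich", 2013, "F"): 5,
    ("Athens", 2013, "F"): 7,
    ("Athens", 2014, "M"): 9,
}
# Hypothetical hierarchy on the "city" dimension: city -> country.
CITY_TO_COUNTRY = {"Berlin": "Germany", "Munich": "Germany", "Athens": "Greece"}


def dice(cube, **allowed):
    """Dicing: keep only cells whose coordinates lie in the allowed value sets."""
    result = {}
    for key, value in cube.items():
        coords = dict(zip(DIMENSIONS, key))
        if all(coords[dim] in values for dim, values in allowed.items()):
            result[key] = value
    return result


def roll_up(cube, dimension, mapping):
    """Drill up: replace a dimension value by its parent and sum the measures."""
    index = DIMENSIONS.index(dimension)
    result = defaultdict(int)
    for key, value in cube.items():
        parent_key = key[:index] + (mapping[key[index]],) + key[index + 1:]
        result[parent_key] += value
    return dict(result)


by_country = roll_up(CUBE, "city", CITY_TO_COUNTRY)
women_2013 = dice(CUBE, year={2013}, sex={"F"})
print(by_country)
print(women_2013)
```

Here `roll_up` merges the Berlin and Munich cells into one Germany cell per year and sex, while `dice` returns the sub-cube restricted to the selected year and sex values.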
Linked Data
Linked Data has the potential to enable combining and performing analytics on top of disparate and previously isolated statistical data.
The RDF Data Cube Vocabulary has been proposed for modelling multi-dimensional data as RDF graphs.
However, tools for handling linked data cubes:
• are few and scattered
• have not been tested under real-life conditions
The potential of using LOD in statistical data analysis remains unexploited.
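To make the RDF Data Cube modelling idea concrete, the sketch below writes one cube cell as a `qb:Observation` in Turtle. The `qb:` and `sdmx-measure:` namespaces are those of the RDF Data Cube Vocabulary and its SDMX extension. The `ex:` namespace, the dimension properties `ex:refArea` and `ex:refPeriod`, the dataset name, and the value 100 are all hypothetical and chosen only for illustration.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
EX = "http://example.org/"  # hypothetical namespace for this example

PREFIXES = "\n".join([
    f"@prefix qb: <{QB}> .",
    f"@prefix sdmx-measure: <{SDMX_MEASURE}> .",
    f"@prefix ex: <{EX}> .",
])


def observation_turtle(obs_id, dataset, dimensions, value):
    """Serialise one cube cell as a qb:Observation in Turtle syntax."""
    lines = [
        f"ex:{obs_id} a qb:Observation ;",
        f"    qb:dataSet ex:{dataset} ;",
    ]
    # One triple per dimension: the property names are hypothetical.
    for prop, val in dimensions.items():
        lines.append(f"    ex:{prop} ex:{val} ;")
    # The measure value closes the statement.
    lines.append(f"    sdmx-measure:obsValue {value} .")
    return "\n".join(lines)


doc = PREFIXES + "\n\n" + observation_turtle(
    "obs1", "population", {"refArea": "Berlin", "refPeriod": "Y2014"}, 100
)
print(doc)
```

A complete cube would also declare a `qb:DataSet` and a `qb:DataStructureDefinition` listing the dimension and measure properties; they are omitted here to keep the sketch short.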
The OpenCube project
OpenCube is a 2-year project funded by the EU within FP7.
The project aims to develop and test processes and tools for managing statistical linked open data.
The results will:
• Help data publishers create linked data cubes from legacy formats
• Empower data users to browse, visualise, link, expand and analyse data cubes
• Enable analysis not possible before (merging data cubes at Web scale)
Linked Statistical Data Lifecycle
We propose a lifecycle for statistical linked data (LD).
The lifecycle is divided into three phases: create, expand and exploit (or consume).
The lifecycle prescribes the steps that raw data cubes* should go through in order to create value.
OpenCube also develops tools to support the whole lifecycle of linked statistical data.
E. Tambouris, E. Kalampokis, K. Tarabanis (2015) Processing Linked Open Data Cubes, Electronic Government, Volume 9248 of the series Lecture Notes in Computer Science, pp. 130-143.
* We assume statistical data is organized as data cubes, where each cell contains a measure described by a number of dimensions.
More on OpenCube…
For more information:
http://opencube-project.eu
http://opencube-toolkit.eu
Check out our free webinars! 1st webinar: Project overview & OLAP browser
Slides: http://www.slideshare.net/OpenCubeProject/opencube-project-webinar-1-sept-8-2015
Video: https://vimeo.com/138860345
Project coordinators: Konstantinos Tarabanis, Themis Tambouris
OpenCube consortium
OpenCube Toolkit
Creating components:
• TARQL extension
• D2RQ extension
• JSON-stat
• Grafter
• R2RML-cube extension (commercial offering only)
Expanding components:
• OpenCube Expander
• OpenCube Linker
Exploiting components:
• Data catalogue solution
• OpenCube Browser
• OpenCube MapView
• R Analytics Integration
Developed using the open source Information Workbench as the underlying linked data management platform.
License scheme:
• OpenCube components are provided under open source licenses (check http://opencube-toolkit.eu)
• Commercial solutions are also offered by consortium members
Base platform: Information Workbench
Platform for development of linked data applications
Semantic Web Data
Semantics- & Linked Data-based Integration of Enterprise and Open Data Sources
Intelligent Data Access and Analytics
• Visual exploration
• Semantic search
• Dashboarding and reporting
Collaboration and Knowledge Management Platform
• Wiki-based curation & authoring of data
• Collaborative workflows
Source: http://www.fluidops.com/information-workbench/
Platform Architecture
Data storage and management platform
Reusable UI and data integration components
Customized application solutions
External resources to reuse data and create mashups
Ontology as a “Structural Backbone”
[Figure: the ontology (RDFS/OWL) defines the data structure: in the RDF data graph, the resources #BarackObama and #WhiteHouse are linked via rdf:type to the classes foaf:Person and vcard:Address. UI templates (Template:foaf:Person, Template:vcard:Address) define the UI structure of the corresponding resource pages.]
Data Storage & Access
Data management based on the Sesame framework
• Open source, written in Java
• Layered architecture for semantic data management
• Easy to plug in new data management components on demand
• Most existing triple stores support the Sesame API
Stable (yet extensible) APIs for data access, manipulation, ...:
• Sesame Access API
• SAIL API
Stackable architecture of custom data management components:
• SAIL 1 (e.g. Query Optimization Layer)
• SAIL 2 (e.g. Distributed Query Execution Layer)
• DB1, DB2, DB3
Easy integration by implementing a generic API
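The idea of stackable data management layers behind one generic API can be illustrated with a small Python sketch. This is not the Sesame SAIL API, which is a Java interface. The class names `MemoryStore`, `Layer`, `CountingLayer` and `CachingLayer` are hypothetical and only mimic the pattern: every layer exposes the same `add`/`match` methods and delegates to the layer below it.

```python
class MemoryStore:
    """Bottom of the stack: keeps triples in a set (stands in for a database)."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        # None acts as a wildcard for that position of the triple pattern.
        pattern = (s, p, o)
        return sorted(t for t in self.triples
                      if all(q is None or q == v for q, v in zip(pattern, t)))


class Layer:
    """Generic stackable layer: same API, delegates to the layer below."""

    def __init__(self, below):
        self.below = below

    def add(self, s, p, o):
        self.below.add(s, p, o)

    def match(self, s=None, p=None, o=None):
        return self.below.match(s, p, o)


class CountingLayer(Layer):
    """Counts how many queries actually reach the layers below."""

    def __init__(self, below):
        super().__init__(below)
        self.queries = 0

    def match(self, s=None, p=None, o=None):
        self.queries += 1
        return super().match(s, p, o)


class CachingLayer(Layer):
    """Caches query results and drops the cache whenever data changes."""

    def __init__(self, below):
        super().__init__(below)
        self.cache = {}

    def add(self, s, p, o):
        self.cache.clear()
        super().add(s, p, o)

    def match(self, s=None, p=None, o=None):
        key = (s, p, o)
        if key not in self.cache:
            self.cache[key] = super().match(s, p, o)
        return self.cache[key]


counter = CountingLayer(MemoryStore())
store = CachingLayer(counter)
store.add("ex:obs1", "rdf:type", "qb:Observation")
first = store.match(p="rdf:type")
second = store.match(p="rdf:type")  # answered from the cache
print(first, counter.queries)
```

Because each layer only depends on the shared interface, layers can be added, removed or reordered without touching the store at the bottom, which is the property the slide attributes to the stackable architecture.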
Data Integration: Data Provider Concept
Data providers support the periodic extraction & integration of data from external data sources into a central repository
• Lifting from arbitrary data formats to RDF (e.g., relational, XML, CSV)
• Parametrizable (e.g. connection information, refresh interval, ...)
• Built-in UI for instantiating providers
• Intuitive interfaces and APIs for writing custom providers
Provider steps: connect to data source, extract data from source, convert data into RDF, store RDF in repository
Examples: ScriptProvider, SOAP Provider, R2RML, XML2RDF, REST Provider
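The four provider steps (connect, extract, convert, store) can be sketched for the CSV case as follows. The class `CsvProvider`, its parameters, and the `http://example.org/` base URI are hypothetical and are not the Information Workbench provider API. The refresh interval is only stored as a parameter, since scheduling is outside the scope of this sketch.

```python
import csv
import io


class CsvProvider:
    """Hypothetical provider that lifts CSV rows into RDF-style triples."""

    def __init__(self, csv_text, base_uri, refresh_interval_s=3600):
        self.csv_text = csv_text          # stands in for connection information
        self.base_uri = base_uri
        self.refresh_interval_s = refresh_interval_s  # stored only, no scheduler

    def connect(self):
        """Step 1: open the data source (here an in-memory text buffer)."""
        return io.StringIO(self.csv_text)

    def extract(self, handle):
        """Step 2: read the raw records."""
        return list(csv.DictReader(handle))

    def convert(self, rows):
        """Step 3: one subject per row, one triple per non-id column."""
        triples = []
        for row in rows:
            subject = self.base_uri + row["id"]
            for column, value in row.items():
                if column != "id":
                    triples.append((subject, self.base_uri + column, value))
        return triples

    def run(self, repository):
        """Step 4: store the triples in the central repository (a set)."""
        rows = self.extract(self.connect())
        repository.update(self.convert(rows))
        return repository


provider = CsvProvider("id,area,population\nobs1,Berlin,100\n", "http://example.org/")
repository = provider.run(set())
print(sorted(repository))
```

A periodic refresh would call `run` again every `refresh_interval_s` seconds. Using a set as the repository makes repeated runs over unchanged data idempotent.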
Data source concept
Data integration
Data Source
• Low-level data access
Mapper
• Translation into triples
• Extract and manipulate data
Post Processor (optional)
• Reconciliation (merging)
• Improve data quality
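The division of labour between data source, mapper and optional post processor can be sketched as three composable functions. All function names, the sample records, and the reconciliation rule (treating subject URIs that differ only in letter case as the same resource) are assumptions made for this example, not the platform's actual behaviour.

```python
def read_records():
    """Data source: low-level access, here a hard-coded list of raw records."""
    return [
        {"id": "de", "label": "Germany"},
        {"id": "DE", "label": "Germany"},
        {"id": "gr", "label": "Greece"},
    ]


def map_to_triples(records):
    """Mapper: translate each raw record into a triple."""
    for record in records:
        yield (f"http://example.org/{record['id']}", "rdfs:label", record["label"])


def reconcile(triples):
    """Post processor: merge subjects that differ only in letter case."""
    seen = set()
    merged = []
    for s, p, o in triples:
        triple = (s.lower(), p, o)
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


def integrate(source, mapper, post_processor=None):
    """Run the pipeline; the post-processing stage is optional."""
    triples = list(mapper(source()))
    if post_processor is not None:
        triples = post_processor(triples)
    return triples


raw = integrate(read_records, map_to_triples)
clean = integrate(read_records, map_to_triples, reconcile)
print(len(raw), len(clean))
```

Keeping reconciliation in a separate, optional stage means the same source and mapper can be reused whether or not merging is wanted.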
User Interface: One Page per URI
[Figure: each resource in the RDF graph is shown on its own resource page]
Wiki Concept
• Resource view is defined using the wiki-based UI
• Go to a new wiki page …/resource/Widget123Page
• Change to the Edit View
Configurable Widgets
Widgets are not static and can be integrated into the UI using a wiki-style syntax.
• Analytics and Reporting
• Visualization and Exploration
• Mashups with Social Media
• Authoring and Content Creation
Instance Pages vs. Templates
Page content is composed based on a template concept. Example: a request for dbpedia:Barack_Obama, a resource with rdf:type foaf:Person.
Template (foaf:Person):
• Wiki template Template:foaf:Person
• Table view config for foaf:Person
• Graph view config for foaf:Person
• Pivot view config for foaf:Person
• Additional widget definitions for foaf:Person
+
Resource page (dbpedia:Barack_Obama):
• Wiki view for dbpedia:Barack_Obama
• Table view for dbpedia:Barack_Obama
• Graph view for dbpedia:Barack_Obama
• Pivot view for dbpedia:Barack_Obama
• Additional widget definitions for dbpedia:Barack_Obama
= Combined information from the template definition and the specific instance (instance config takes priority)
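The rule that instance configuration overrides template configuration can be expressed as a simple dictionary merge. The function `resolve_page_config`, the configuration keys, and the configuration values below are hypothetical. The sketch only illustrates the priority order and is not the platform's actual lookup logic.

```python
def resolve_page_config(resource, rdf_types, template_configs, instance_configs):
    """Merge template settings for the resource's types with its own settings.

    Instance settings are applied last, so they win on conflicting keys.
    """
    merged = {}
    for rdf_type in rdf_types.get(resource, []):
        merged.update(template_configs.get(rdf_type, {}))
    merged.update(instance_configs.get(resource, {}))
    return merged


# Hypothetical configuration data for the example on the slide.
rdf_types = {"dbpedia:Barack_Obama": ["foaf:Person"]}
template_configs = {
    "foaf:Person": {
        "wiki": "Template:foaf:Person",
        "table": "table view config for foaf:Person",
        "graph": "graph view config for foaf:Person",
    }
}
instance_configs = {
    "dbpedia:Barack_Obama": {"wiki": "wiki view for dbpedia:Barack_Obama"}
}

page = resolve_page_config(
    "dbpedia:Barack_Obama", rdf_types, template_configs, instance_configs
)
print(page)
```

In this example the wiki view comes from the instance page, while the table and graph views fall back to the foaf:Person template because the instance defines nothing for them.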
More information
Download the open-source Information Workbench Community Edition:
http://www.fluidops.com/en/company/training/open_source
Detailed documentation:
http://help.fluidops.com
Table of Contents
The OpenCube Project: Overview
The OpenCube Toolkit
Base platform
Components for processing statistical data
• Creating linked data cubes
• Exploiting statistical data
Conclusions
Data Catalogue Management Solution
Managing metadata catalogues
Allows the user to:
• search for specific datasets by keyword/category/catalogue
• explore pre-defined relations between datasets within the catalogue
• explore the available metadata descriptions of datasets (dataset structure)
Exploring data: OpenCube Browser
It enables the exploration of an RDF data cube by presenting a two-dimensional slice of the cube as a table.
The slice is created by setting a fixed value for each dimension that is not presented in the table.
The user can:
• Summarize observations across a dimension (dimension reduction)
• Change the axes of the table
• Change the language
• Change the fixed values
See our first webinar: http://www.slideshare.net/OpenCubeProject/opencube-project-webinar-1-sept-8-2015
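The two-dimensional slice and the dimension reduction described for the OpenCube Browser can be sketched as follows. The observations, the dimension names (area, year, sex) and the use of a sum as the summary function are assumptions made for the example. The sketch does not reproduce the browser's implementation, which works on RDF data.

```python
from collections import defaultdict

# Hypothetical observations: three dimensions and one measure ("value").
OBSERVATIONS = [
    {"area": "Berlin", "year": 2013, "sex": "F", "value": 10},
    {"area": "Berlin", "year": 2013, "sex": "M", "value": 12},
    {"area": "Berlin", "year": 2014, "sex": "F", "value": 11},
    {"area": "Berlin", "year": 2014, "sex": "M", "value": 13},
    {"area": "Athens", "year": 2013, "sex": "F", "value": 7},
    {"area": "Athens", "year": 2013, "sex": "M", "value": 8},
]


def slice_table(observations, rows, cols, fixed):
    """Build a rows x cols table from observations matching the fixed values."""
    table = defaultdict(dict)
    for obs in observations:
        if all(obs[dim] == value for dim, value in fixed.items()):
            table[obs[rows]][obs[cols]] = obs["value"]
    return dict(table)


def summarize(observations, dimension):
    """Dimension reduction: sum the measure over all values of one dimension."""
    totals = defaultdict(int)
    for obs in observations:
        key = tuple(sorted(
            (dim, val) for dim, val in obs.items() if dim not in (dimension, "value")
        ))
        totals[key] += obs["value"]
    return [dict(key, value=total) for key, total in totals.items()]


# Slice: area x year, with the remaining dimension fixed to sex = "F".
women = slice_table(OBSERVATIONS, rows="area", cols="year", fixed={"sex": "F"})
# Changing the axes only swaps the row and column dimensions.
swapped = slice_table(OBSERVATIONS, rows="year", cols="area", fixed={"sex": "F"})
# After summarizing across "sex", no fixed value is needed any more.
totals = slice_table(summarize(OBSERVATIONS, "sex"), rows="area", cols="year", fixed={})
print(women, swapped, totals)
```

Changing a fixed value corresponds to calling `slice_table` again with a different `fixed` mapping, for example `{"sex": "M"}`.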
Exploring data: OpenCube MapView
Visualizes RDF data cubes on a map
Allows selecting the cube, dimensions, and measures to display in an interactive way
Supports:
• Markers
• Bubble
• Choropleth maps
Analyzing data with R
Enables advanced data analysis tasks using the well-established R software
• Passing input data retrieved from an RDF triple store using SPARQL
• Reusing the analysis results for visualization or integration with the original data
R Analysis Tasks
An analysis task is edited using a web UI form.
Two types of input parameters:
• Constants, interpreted as variables of basic types in R
• SPARQL query results, interpreted as data frames in R
The script is executed on the R server, and the results are passed back to the OpenCube Toolkit.
Making use of the results:
• Visualize
• Store as linked data
Analyzing data with R
Visualisation of analysis results:
• as a table
• as a static chart built in R
• as an interactive stock chart
Reuse of analysis results: preserving R output as linked data
• Use R output as a tabular data source to import data and convert with R2RML
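The two kinds of input parameters described above can be illustrated with a sketch that prepares them on the client side. It converts a result document in the standard SPARQL 1.1 Query Results JSON format into a column-oriented table, the shape a data frame expects, and bundles it with a constant. The function name, the parameter names `threshold` and `obs`, and the sample bindings are hypothetical. How the OpenCube Toolkit actually hands data to the R server is not shown in the slides and is not modelled here.

```python
def bindings_to_columns(sparql_json):
    """Turn SPARQL JSON results into {variable: [values...]} columns.

    Unbound variables become None so that all columns keep the same length.
    """
    variables = sparql_json["head"]["vars"]
    columns = {var: [] for var in variables}
    for binding in sparql_json["results"]["bindings"]:
        for var in variables:
            cell = binding.get(var)
            columns[var].append(cell["value"] if cell is not None else None)
    return columns


# Hypothetical query result with one unbound value in the second row.
RESULT = {
    "head": {"vars": ["area", "value"]},
    "results": {
        "bindings": [
            {
                "area": {"type": "literal", "value": "Berlin"},
                "value": {"type": "literal", "value": "10"},
            },
            {"area": {"type": "literal", "value": "Athens"}},
        ]
    },
}

task_inputs = {
    "threshold": 0.5,                    # constant: a basic-type variable
    "obs": bindings_to_columns(RESULT),  # query result: a data-frame-like table
}
print(task_inputs)
```

Values in the SPARQL JSON format are always strings, so numeric columns would still need to be converted using the datatype information before numerical analysis.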
Demos
OpenCube public demo
• An instance of the developed platform hosted by fluidOps.
• Contains metadata and a set of cubes from Eurostat.
• Illustrates the data catalogue functionalities and data analysis using R.
• http://data.fluidops.net
The Flemish Government
• An instance of the developed platform has been deployed at the premises of the Flemish government.
• The Flemish government had already opened up statistics by means of linked data cubes.
• 11 cubes had been transformed to linked data according to the QB vocabulary and stored in a Virtuoso RDF store.
Conclusions
The OpenCube project develops processes and tools for statistical data management.
The OpenCube Toolkit provides:
• A platform for building customized applications with linked data cubes
• A range of software components for:
  • creating linked open statistical data
  • expanding open statistical data
  • exploiting linked open statistical data
Acknowledgments
The work presented in the paper is partially funded by
http://opencube-project.eu
@OpenCubeProject
Next webinar
PublishMyData for publishing governmental statistical data
Tuesday, September 22 at 06:00 PM CEST
http://opencube.enterthemeeting.com/m/VCAJFCJW