Flanders Open Data Day II - KeyNote - Erik Mannens

Posted on 21-Jun-2015

ELIS – Multimedia Lab

What if we need Open Data, Linked Data, and Big Data together?

dr. Erik Mannens @erikmannens

Open Data

Way of … Thinking

Silos of Data

“Stop Hugging your Data”

e.g. … Open Learning

Linked Open Data

Way of … Publishing

Semantic Web

Connect your Silos

5-stars (Technical Perspective)

Open Linked Data (Tim Berners-Lee)

Make your Stuff available on the Web

Make it available as Structured Data

In a non-proprietary Format

Use URLs to identify Things, so one can point at your Stuff

Link your Data to other People’s Data to provide Context
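The five steps above can be read as a checklist. The following is a minimal sketch, assuming the stars are cumulative (a star only counts when all earlier ones are met); the `Dataset` class and its field names are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web: bool       # available on the Web
    structured: bool   # available as structured data
    open_format: bool  # in a non-proprietary format
    uses_urls: bool    # things are identified by URLs
    linked: bool       # linked to other people's data


def stars(d: Dataset) -> int:
    """Count stars in order, stopping at the first step that is not met."""
    count = 0
    for met in (d.on_web, d.structured, d.open_format, d.uses_urls, d.linked):
        if not met:
            break
        count += 1
    return count


scanned_report = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)
print(stars(scanned_report), stars(csv_file), stars(linked_rdf))  # 1 3 5
```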

5-stars (Organisational Perspective)

Open Data Engagement (Tim Davies)

Be Demand-driven

Provide Context

Support Conversation

Build Skills & Capacity

Collaborate with the Community

5-stars (Functional Perspective)

Open Data Portal Functionalities (iMinds)

Dataset Registry

Metadata Provider

Co-creation Platform

Data Publishing Platform

Common Data Hub

Data as Commodity

Sidenote

R&Wbase

15’ Open Data Publishing Framework

e.g. data.gent.be

opendata.antwerpen.be

Publishes 2 to 5 Star Data

tdt/core tdt/input triple store

RESTful API for Developers

[Diagram: triple store; core RESTful data adapter; CSV, XLS, JSON, XML, SPARQL endpoint, ...]

e.g. datatank.gent.be/Grondgebied/Straten or data.irail.be/NMBS/Stations
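For developers, resource paths like the two above can be turned into request URLs. The sketch below only builds the URL and makes no network call; requesting a representation by appending a format extension such as `.json` is an assumption made for illustration, and `resource_url` is a hypothetical helper, not part of any published API.

```python
from urllib.parse import urlsplit, urlunsplit


def resource_url(resource: str, fmt: str) -> str:
    """Build a URL for a resource in a given format (assumed extension convention)."""
    # Add a scheme when the slide only gives host and path.
    parts = urlsplit(resource if "://" in resource else "http://" + resource)
    path = parts.path.rstrip("/") + "." + fmt
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(resource_url("data.irail.be/NMBS/Stations", "json"))
# http://data.irail.be/NMBS/Stations.json
```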

R&Wbase

git for triples

Read/Write

LINKED DATA

TRIPLE STORES are they up for the challenge?

Distributed Triple Version Control

[Diagram: Commits, Deltas, Virtual graphs, Versions; store, describe, identify, resolve]

LIVE triples require fast version retrieval through a LIGHTWEIGHT algorithm

Store triples using QUADS: <subject> <predicate> <object> <context>
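To make the quad idea concrete, here is a minimal sketch of versioning on top of quads. It is an illustration under stated assumptions, not the R&Wbase algorithm: the context of each quad holds a delta id plus a hypothetical `+` or `-` suffix for additions and removals, and a version is resolved into a virtual graph by replaying the deltas of its ancestors, oldest first. All class, method, and identifier names are made up for this example.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str  # here: delta id plus "+" (added) or "-" (removed)


class VersionedStore:
    def __init__(self):
        self.quads = set()  # every quad ever written, across all deltas
        self.parent = {}    # delta id -> parent delta id (None for the first)

    def commit(self, delta, parent, added=(), removed=()):
        """Record one delta as quads; nothing is ever overwritten."""
        self.parent[delta] = parent
        for s, p, o in added:
            self.quads.add(Quad(s, p, o, delta + "+"))
        for s, p, o in removed:
            self.quads.add(Quad(s, p, o, delta + "-"))

    def resolve(self, version):
        """Return the virtual graph (a set of triples) for a version."""
        chain = []
        while version is not None:
            chain.append(version)
            version = self.parent[version]
        graph = set()
        for delta in reversed(chain):  # replay oldest delta first
            for q in self.quads:
                if q.context == delta + "+":
                    graph.add(q[:3])
            for q in self.quads:
                if q.context == delta + "-":
                    graph.discard(q[:3])
        return graph


store = VersionedStore()
store.commit("d1", None, added=[("ex:thing", "ex:label", "old")])
store.commit("d2", "d1",
             added=[("ex:thing", "ex:label", "new")],
             removed=[("ex:thing", "ex:label", "old")])
print(store.resolve("d1"))  # {('ex:thing', 'ex:label', 'old')}
print(store.resolve("d2"))  # {('ex:thing', 'ex:label', 'new')}
```

Because every delta stays in the store, older versions remain retrievable; this sketch scans all quads per delta, so a real store would index quads by context to keep retrieval fast.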

R&Wbase provides VERSION control for TRIPLE STORES with direct GRAPH access and PROVENANCE

BIG Data

Way of … Analyzing

How Difficult Can It Be?

Collaborative Effort found Higgs Boson

Deep understanding of some key Big Data markets

Banking Industry

Healthcare Industry

Marketing Industry

Smart Cities

Big (Data) Bang in Banking

•  The US Securities and Exchange Commission has estimated that it would need to collect 20 terabytes of data per month to monitor all US capital market activity

•  Unstructured data comprises some 80% of the total data held by the average financial institution

•  The total number of non-cash payments in the EU amounted to 90.6 billion in 2011

•  The total number of automated teller machines (ATMs) in the EU in 2011 was 0.44 million

•  The number of point-of-sale (POS) terminals in the EU was 8.8 million in 2011

What if it were

OPEN & LINKED

e.g. … OpenSpending

e.g. … OpenSpending

e.g. … OpenBank

e.g. … OpenCorporates

e.g. … OpenCorporates - Belgium

Big (Data) Bang in Healthcare

•  Medical images are increasing by 20-40% annually

•  Electronic medical records: in 2009, 99% of primary care physicians in the Netherlands used EMRs, compared to 46% in the United States and 36% in Canada

•  Medical research in which 100,000 participants are genotyped (ca. 1.5 GB/person) could result in a staggering 150 terabytes of data

•  As of July 2012, PatientsLikeMe members have shared 4,029,661 symptom reports about 7,338 symptoms and 548,650 treatment histories about 12,838 treatments
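The genotyping figure can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 1000 GB); the variable names are illustrative.

```python
participants = 100_000
gb_per_person = 1.5

total_gb = participants * gb_per_person
total_tb = total_gb / 1000  # decimal units: 1 TB = 1000 GB
print(total_tb)  # 150.0
```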

What if it were

OPEN & LINKED

e.g. … PatientsLikeMe

e.g. … 23AndMe

e.g. … PlayStation III

e.g. … OpenPhacts

e.g. … DisQover (iMinds – Ontoforce)

Big (Data) Bang in Marketing

•  Data use is expected to grow by as much as 44 times, amounting to some 35.2 ZB globally (1 zettabyte is a billion terabytes)

•  Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data

•  Twitter has 200 million tweets per day, or approximately 46 MB/sec of data created (August 2011)

•  25% of search results for the World’s Top 20 largest brands are links to user-generated content

•  YouTube has 3 billion views per day; 48 hours of video are uploaded per minute (May 2011)

•  There are over 200,000,000 blogs: 34% of their posts are opinions about products & brands
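The two Twitter figures can be related to each other: a rate of 46 MB/sec spread over 200 million tweets per day implies roughly 20 KB of data per tweet. The sketch assumes decimal megabytes (1 MB = 10^6 bytes); what is counted in that per-tweet payload is not stated in the slides.

```python
bytes_per_second = 46 * 10**6   # 46 MB/sec, decimal megabytes assumed
seconds_per_day = 24 * 60 * 60  # 86,400
tweets_per_day = 200 * 10**6

bytes_per_day = bytes_per_second * seconds_per_day
bytes_per_tweet = bytes_per_day // tweets_per_day
print(bytes_per_day)    # 3974400000000, i.e. about 3.97 TB per day
print(bytes_per_tweet)  # 19872, i.e. about 20 KB per tweet
```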

What if it were

OPEN & LINKED

e.g. … Consumers in 1990

e.g. … Consumers in 2000

e.g. … Consumers since 2010

The Tyranny of the Empowered ConsYOUmers

e.g. … GoodRelations

e.g. … Nike

Big (Data) Bang in Smart Cities

•  Data use is expected to grow by as much as 44 times, amounting to some 35.2 ZB globally (1 zettabyte is a billion terabytes)

•  Sensors, social media feeds, photos, video and cellphone GPS signals account for 2.5 quintillion bytes of data per day

•  More than 50% of the global population lives in cities, and this number is forecast to rise to 69% by 2050

•  The number of city residents is expected to grow from 3.5 billion to 5 billion in the next 20 years

•  The ‘Internet of Things’ age is approaching: 25 billion devices connected to the Internet by 2015 and 50 billion by 2020

•  Access to public data is estimated to be worth €27 billion in the EU

•  ICT-enabled energy efficiency could translate into over €600 billion worth of cost savings for the public and private sector

What if it were

OPEN & LINKED

e.g. … OpenTransport

e.g. … OpenTransport

e.g. … OpenEnergyMonitor

e.g. … Big Data … in Iceland?

e.g. … a Trillion Sensors … in Iceland!

QUESTIONS?

dr. Erik Mannens erik.mannens@ugent.be

@erikmannens

Thoughts?

Credits

•  EMC - Greenplum

•  Peter Hinssen

•  Scott Brinker

•  Jim Lecinski

•  David Armano

•  Did not have time to check all licenses of the Flickr photos – in my defense, I did not kill anyone nor did I in any way insult and/or infringe the CIA, NSA, NDA, or any other JAA (Just Another Acronym)