
SC7 Workshop 2: Big Data Technologies and Scenarios

Date post: 14-Apr-2017
Upload: bigdataeurope
Transcript
Page 1: SC7 Workshop 2: Big Data Technologies and Scenarios

BIG DATA EUROPE:

PILOTS AND TECHNOLOGIES

BDE SC7 Workshop, Brussels, 18 October 2016

Vangelis Karkaletsis, NCSR Demokritos

Page 2

BDE Architecture

Big Data Integrator (BDI):

o The prototype developed by BDE

Main points of the architecture

o Dockerization

o Support layer, including integrator UI

o Semantic layer

20-oct.-16 www.big-data-europe.eu

Page 3

BDI components

Processing and storage components

o Re-used existing Docker containers where available

o Dockerized by BDE where none existed

o Ensured that all components can be provisioned through Docker Swarm

Components by BDE:

o Support Layer

o Semantic Layer

Page 4

BDE Docker Containers

Data serving: HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Hive, Semagrow

Processing: Spark, Flink, Sansa

Stream ingestion middleware: Flume, Kafka

Page 5

BigDataEurope Pilots

Page 6

SC1: Pharmacology research

Life Sciences & Health

• Extensive toolset developed by the OpenPHACTS Foundation (OPF) and others

• Querying a large number of datasets, some of them large

• Existing, elaborate ingestion and homogenization by the OpenPHACTS Foundation

Page 7

SC1 Pilot: Points Demonstrated

Life Sciences & Health

• Existing distributed, scalable solution

  • Based on Virtuoso, a proprietary distributed database

• Porting to BDI gives flexibility

  • Using Virtuoso or one of a number of open-source alternatives, without development effort for the superstructure and tools around it

• Porting to BDI offers new functionality

  • Logging and system health monitoring

Page 8

SC2: Viticulture resources

Food and Agriculture

• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services

• The pilot automates publication ingestion and thematic classification
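The slide does not say how publications are classified. As a minimal sketch only, assuming a hypothetical keyword vocabulary per theme (the names in `THEMES` are invented for illustration, not AgInfra's scheme), a publication can be labelled with the themes whose keywords it mentions, ranked by hit count:

```python
import re
from collections import Counter

# Hypothetical theme vocabulary; the actual AgInfra/BDE classification
# scheme is not described on the slide.
THEMES = {
    "vine_diseases": {"mildew", "phylloxera", "botrytis"},
    "grape_varieties": {"merlot", "syrah", "cultivar"},
    "soil_and_irrigation": {"soil", "irrigation", "drought"},
}

def classify(text: str, min_hits: int = 1) -> list[str]:
    """Return theme labels ordered by keyword hit count (ties broken by name)."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        theme: sum(words[kw] for kw in keywords)
        for theme, keywords in THEMES.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [theme for theme, hits in ranked if hits >= min_hits]


if __name__ == "__main__":
    abstract = "Downy mildew and botrytis pressure under drought stress in Syrah vineyards."
    print(classify(abstract))
```

A keyword count is the simplest possible baseline; a trained text classifier would be the usual choice at scale, but the ingestion-then-label flow stays the same.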

Page 9

SC2 Pilot: Points Demonstrated

Food and Agriculture

• AgInfra: existing infrastructure for data and the services that process it

• BDI is deployed as an external infrastructure for processing text (viticulture publications)

• Allows storing and processing text at a larger scale than AgInfra can currently manage

• Extracts (smaller) bibliographic metadata from (larger) full texts, to be served by AgInfra
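To make the "large full text in, small metadata record out" step concrete, here is a heuristic sketch. The extraction rules (first non-empty line as title, first DOI-like and year-like strings) and the `BibRecord` fields are assumptions for illustration, not the pilot's actual extractor:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Typical DOI shape ("10." + registrant code + "/" + suffix); a heuristic, not a validator.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

@dataclass
class BibRecord:
    title: str
    doi: Optional[str]
    year: Optional[int]
    source_chars: int  # size of the full text the record was extracted from

def extract_metadata(full_text: str) -> BibRecord:
    """Heuristics: first non-empty line as title, first DOI-like and year-like strings."""
    lines = [ln.strip() for ln in full_text.splitlines() if ln.strip()]
    doi = DOI_RE.search(full_text)
    year = YEAR_RE.search(full_text)
    return BibRecord(
        title=lines[0] if lines else "",
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi=doi.group(0).rstrip(".,;") if doi else None,
        year=int(year.group(0)) if year else None,
        source_chars=len(full_text),
    )
```

Only the compact `BibRecord` would be handed back to AgInfra; the full text stays in BDI storage.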

Page 10

SC3: Predictive maintenance

Energy

• Wind turbine condition monitoring applies computational models to sensor data streams

• Models are re-parameterized weekly, using the week’s data from multiple turbines
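The weekly re-parameterization can be pictured as "group by ISO week, pool across turbines, re-fit". The sketch below assumes a `(turbine_id, timestamp, value)` reading format and stands in a plain mean/standard-deviation fit for the pilot's actual (Fortran) models, which the slides do not describe:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, pstdev

# (turbine_id, timestamp, sensor_value): an assumed reading format.
Reading = tuple[str, datetime, float]

@dataclass
class WeekModel:
    mean_value: float  # hypothetical model parameters: pooled mean ...
    std_value: float   # ... and population standard deviation
    n_turbines: int    # how many turbines contributed data that week

def weekly_parameters(readings: list[Reading]) -> dict[tuple[int, int], WeekModel]:
    """Re-fit the parameters once per ISO week, pooling all turbines' readings."""
    values: dict[tuple[int, int], list[float]] = defaultdict(list)
    turbines: dict[tuple[int, int], set[str]] = defaultdict(set)
    for turbine_id, ts, value in readings:
        iso_year, iso_week, _ = ts.isocalendar()
        values[(iso_year, iso_week)].append(value)
        turbines[(iso_year, iso_week)].add(turbine_id)
    return {
        week: WeekModel(mean(vals), pstdev(vals), len(turbines[week]))
        for week, vals in sorted(values.items())
    }
```

Keying on the ISO `(year, week)` pair rather than the week number alone keeps weeks that straddle New Year distinct.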

Page 11

SC3 Pilot: Points Demonstrated

Energy

• Existing in-house, non-scalable solution for model parameterization

  • Reliable Fortran software for data analysis

  • Efficient, but not scalable to the data volume

• Developing a BDI orchestrator

  • Re-uses the existing software unmodified

  • Makes it easy to apply the software in parallel to many datasets and to manage the outputs
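The key idea of the orchestrator is that the legacy program is treated as a black box: it is invoked unchanged, once per dataset, with many invocations running side by side. A minimal sketch, assuming the program takes the dataset path as its last argument and writes results to stdout (both assumptions; the real interface of the Fortran tool is not given):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path

@dataclass
class RunResult:
    dataset: Path
    output: Path
    returncode: int
    stderr: str

def run_one(command: list[str], dataset: Path, out_dir: Path) -> RunResult:
    """Run the unmodified program on one dataset, writing its stdout to a file."""
    output = out_dir / f"{dataset.stem}.out"
    with output.open("w") as fh:
        proc = subprocess.run([*command, str(dataset)], stdout=fh,
                              stderr=subprocess.PIPE, text=True)
    return RunResult(dataset, output, proc.returncode, proc.stderr)

def orchestrate(command: list[str], datasets: list[Path], out_dir: Path,
                workers: int = 4) -> list[RunResult]:
    """Apply the same program to many datasets in parallel; results keep input order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Threads suffice here: the heavy lifting happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda ds: run_one(command, ds, out_dir), datasets))
```

For example, `orchestrate(["./fit_model"], sorted(Path("data").glob("*.csv")), Path("results"))`, where `fit_model` is a hypothetical binary name. Output files are named after each dataset's stem, so two inputs with the same stem would overwrite each other; a production orchestrator on BDI would also distribute runs across cluster nodes rather than local threads.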

Page 12

SC4: Traffic conditions estimation

Transport

• Estimation of real-time traffic conditions in Thessaloniki

• Combines:

  • Traffic modelling from historical data

  • Current measurements from a taxi fleet of 1200 vehicles
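One simple way to combine a historical model with live measurements is a weighted blend in which the live data gains weight as more taxi samples arrive. This is an illustrative sketch, not the pilot's prediction algorithm; the `(segment, hour)` keying and the smoothing constant `k` are assumptions:

```python
from statistics import mean
from typing import Mapping, Sequence

def estimate_speed(historical: Mapping[tuple[str, int], float],
                   current: Mapping[str, Sequence[float]],
                   segment: str, hour: int, k: float = 5.0) -> float:
    """Blend the historical speed for (segment, hour) with live taxi speeds.

    With n live samples the live mean gets weight n / (n + k); with none,
    the estimate falls back to the historical value.
    """
    prior = historical[(segment, hour)]
    samples = current.get(segment, ())
    if not samples:
        return prior
    w = len(samples) / (len(samples) + k)
    return w * mean(samples) + (1 - w) * prior
```

With a historical 40 km/h and five live readings averaging 20 km/h, `k = 5` gives equal weight to both, so the estimate is 30 km/h; sparse coverage leans on history, dense coverage on the fleet.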

Page 13

SC4 Pilot: Points Demonstrated

Transport

• New Flink implementations of map-matching and traffic prediction algorithms

• BDI provides access to varied data sources:

  • PostGIS database with the city map

  • Elasticsearch database of historical data

  • Kafka stream of real-time data
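Map matching snaps each raw GPS fix onto the road network taken from the map database. The sketch below shows only the geometric core, projecting a point onto the nearest road segment, in plain Python rather than Flink. It assumes coordinates already in a projected, metre-based system, and the `Segment` type is hypothetical. Production map matchers usually also use the sequence of fixes and road connectivity; this sketch does not:

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Segment:
    seg_id: str
    ax: float
    ay: float
    bx: float
    by: float

def project(px: float, py: float, s: Segment) -> tuple[float, float, float]:
    """Closest point on segment s to (px, py), plus the distance to it."""
    dx, dy = s.bx - s.ax, s.by - s.ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - s.ax) * dx + (py - s.ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp so the projection stays on the segment
    cx, cy = s.ax + t * dx, s.ay + t * dy
    return cx, cy, math.hypot(px - cx, py - cy)

def match_point(px: float, py: float, segments: list[Segment],
                max_dist: float = 50.0) -> Optional[tuple[str, float, float]]:
    """Snap a GPS fix to the nearest segment within max_dist, or return None."""
    best: Optional[tuple[float, str, float, float]] = None
    for s in segments:
        cx, cy, d = project(px, py, s)
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, s.seg_id, cx, cy)
    return None if best is None else (best[1], best[2], best[3])
```

In a streaming job, the same function would be applied to every fix arriving from the Kafka topic, with the segment list (or a spatial index over it) loaded once from PostGIS.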

Page 14

SC5: Climate modelling

Climate

• Discovering and re-using previously computed derivatives

  • Lineage annotation: the datasets and model parameters used to compute derivative datasets

  • Finding appropriate past runs avoids repeating weeks-long modelling runs

• Preparing modelling experiments

  • Slicing, transforming, and combining datasets into new datasets

  • Submission to and retrieval from the modelling infrastructure
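Lineage annotation pays off when a new request can be checked against past runs before any computation starts. A minimal in-memory sketch follows. The `Lineage` schema, the model name, and the dataset ids are hypothetical; the pilot keeps such metadata in a database rather than in a dict:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Lineage:
    """What a derivative dataset was computed from (illustrative schema)."""
    model: str
    inputs: frozenset[str]
    params: tuple[tuple[str, float], ...]

    @staticmethod
    def of(model: str, inputs: list[str], params: dict[str, float]) -> "Lineage":
        # Normalize: input order and parameter order must not affect matching.
        return Lineage(model, frozenset(inputs), tuple(sorted(params.items())))

@dataclass
class LineageCatalog:
    runs: dict[Lineage, str] = field(default_factory=dict)  # lineage -> output dataset id

    def register(self, lineage: Lineage, output_id: str) -> None:
        self.runs[lineage] = output_id

    def find(self, lineage: Lineage) -> Optional[str]:
        """Return an existing derivative with identical lineage, if any."""
        return self.runs.get(lineage)
```

A hit from `find` means the weeks-long run can be skipped and the stored derivative re-used; a miss means the experiment has to be prepared and submitted.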

Page 15

SC5 Pilot: Points Demonstrated

Climate

• Existing infrastructure and stable, reliable software for parallel computation of models

• BDI is deployed as an external infrastructure for preparing and managing datasets

• BDI offers:

  • Hive, for managing data as tables whose records can be retrieved and manipulated, rather than as file blocks

  • Cassandra, storing structured and textual metadata for searching headers and lineage

Page 16

SC6: Municipality budgets

Social Sciences

• Ingestion of budget and budget execution data

• Data from multiple municipalities, in varied formats and data models

• Homogenized data made available for analysis and comparison
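Homogenization means mapping each municipality's format and field names onto one shared record type. The sketch below uses three invented source shapes (a JSON document with `account`/`value`, a semicolon-separated CSV with decimal commas, and an attribute-based XML file) purely to illustrate the idea; the real sources and target model are not described on the slide:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass(frozen=True)
class BudgetLine:
    municipality: str
    year: int
    code: str
    amount: float

def from_json(municipality: str, text: str) -> list[BudgetLine]:
    # Assumed shape: {"year": 2016, "items": [{"account": "0211", "value": 1200.5}]}
    doc = json.loads(text)
    return [BudgetLine(municipality, int(doc["year"]), str(i["account"]), float(i["value"]))
            for i in doc["items"]]

def from_csv(municipality: str, text: str) -> list[BudgetLine]:
    # Assumed header "year;code;amount", semicolon-separated, decimal comma.
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [BudgetLine(municipality, int(r["year"]), r["code"],
                       float(r["amount"].replace(",", ".")))
            for r in rows]

def from_xml(municipality: str, text: str) -> list[BudgetLine]:
    # Assumed shape: <budget year="2016"><line code="0211" amount="1200.5"/></budget>
    root = ET.fromstring(text)
    year = int(root.get("year"))
    return [BudgetLine(municipality, year, el.get("code"), float(el.get("amount")))
            for el in root.iter("line")]
```

Once every source is reduced to `BudgetLine`, comparison across municipalities is a matter of grouping by `code` and `year`.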

Page 17

SC6 Pilot: Points Demonstrated

Social Sciences

• Existing model, analytics, and visualization tools

  • Use SPARQL queries to retrieve only the relevant slices of the overall data

• BDI is deployed as an ingestion and storage infrastructure

  • Ingests and homogenizes a constant flow of JSON, CSV, XML, and other formats following various data models

  • Exposes the homogenized data, stored in 4store (a scalable, distributed RDF store), as a SPARQL endpoint

  • Creates an online dashboard on economic data
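To serve homogenized records from an RDF store and let tools fetch only a slice, the records are first serialized as RDF and then queried with SPARQL. This sketch writes N-Triples and builds a slicing query as a string. The `http://example.org/budget#` vocabulary is a placeholder, not the pilot's data model, and loading into 4store or sending the query to the endpoint is not shown:

```python
from dataclasses import dataclass
from urllib.parse import quote

EX = "http://example.org/budget#"           # hypothetical vocabulary namespace
XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class BudgetLine:
    municipality: str
    year: int
    code: str
    amount: float

def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(lines: list[BudgetLine]) -> str:
    """Serialize homogenized budget lines as N-Triples (four triples per line)."""
    out = []
    for n, bl in enumerate(lines):
        s = f"<{EX}line/{quote(bl.municipality)}/{bl.year}/{n}>"
        out.append(f"{s} <{EX}municipality> {literal(bl.municipality)} .")
        out.append(f'{s} <{EX}year> "{bl.year}"^^<{XSD}gYear> .')
        out.append(f"{s} <{EX}code> {literal(bl.code)} .")
        out.append(f'{s} <{EX}amount> "{bl.amount:.2f}"^^<{XSD}decimal> .')
    return "\n".join(out) + "\n"

def slice_query(municipality: str, year: int) -> str:
    """SPARQL SELECT that retrieves only one municipality's lines for one year."""
    return f"""PREFIX ex: <{EX}>
PREFIX xsd: <{XSD}>
SELECT ?code ?amount WHERE {{
  ?line ex:municipality {literal(municipality)} ;
        ex:year "{year}"^^xsd:gYear ;
        ex:code ?code ;
        ex:amount ?amount .
}}"""
```

Filtering inside the query means the analytics tools receive only the matching rows instead of the whole dataset.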

Page 18

SC7: Change detection & verification

Secure Societies

• Events are extracted from text published by news agencies and on social networking sites

• Events are geo-located, and relevant changes are detected by comparing current and previous satellite images
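The slides do not specify which change detection algorithms the pilot uses. As the simplest well-known baseline, pixel-wise image differencing flags pixels whose value changed by more than a threshold between two co-registered images of the same area. The sketch works on single-band images as nested lists and returns the pixel bounding box of the change; mapping that box to geographic coordinates is omitted:

```python
from typing import Optional

Image = list[list[float]]  # single-band image: rows of pixel values

def change_mask(before: Image, after: Image, threshold: float) -> list[list[bool]]:
    """Mark pixels whose absolute difference exceeds the threshold."""
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("images must be co-registered and the same size")
    return [[abs(a - b) > threshold for b, a in zip(row_before, row_after)]
            for row_before, row_after in zip(before, after)]

def changed_bbox(mask: list[list[bool]]) -> Optional[tuple[int, int, int, int]]:
    """(min_row, min_col, max_row, max_col) of changed pixels, or None if unchanged."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not cells:
        return None
    rows, cols = zip(*cells)
    return min(rows), min(cols), max(rows), max(cols)
```

Real pipelines typically add radiometric normalization and noise filtering before thresholding; with Spark, each image tile can be processed independently in the same way.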

Page 19

SC7 Pilot: Points Demonstrated

Secure Societies

• Re-implementation of change detection algorithms for Spark

• Parallel orchestrator for text analytics

  • Re-uses existing software

  • Scales to many input streams

• BDI provides:

  • Cassandra for text content and metadata

  • Strabon GIS store for the locations of detected changes

  • Homogeneous access to both, for analysis and visualization

Page 20

Closing Remarks

Page 21

Questions?

BigDataEurope Web site:

https://www.big-data-europe.eu

Big Data Integrator:

https://github.com/big-data-europe

Thank you for your attention!

