Date posted: 08-Apr-2017
Category: Health & Medicine
Uploaded by: bigdataeurope
BIG DATA EUROPE
H2020 CSA (2015-17)
SC1 – HEALTH CHALLENGE WEBINAR
Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
April 4th 2017
Kiera McNeice, Ronald Siebes, Hajira Jabeen and Nick Lynch
BigDataEurope
5 Apr 2017, www.big-data-europe.eu
The 7 Societal Challenges and their first pilots
SC1: Life Sciences & Health
SC2: Food & Agriculture
Partners:
FAO, the largest autonomous agency within the United Nations system and one of the main players in the agricultural information community.
Semantic Web Company (SWC) is a technology provider headquartered in Vienna (Austria). SWC supports organizations from all industrial sectors worldwide in improving their information management. Its core product extracts meaning from big data by using linked data technologies.
Agroknow is a company that captures, organizes and adds value to the rich information available in agricultural and food sciences, in order to make it universally accessible, useful and meaningful.
Big Data Focus area: Large-scale distributed agricultural data integration
Selected Key Data assets: INFOODS, AQUASTAT, Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AgroVoc, AquaMaps, Fishbase
Pilot focus area:
Viticulture (from the Latin word for vine) is the science, production, and study of grapes. It deals with the series of events that occur in the vineyard.
Pilot 2: Support advanced crop data discovery, processing, combining and visualization from distributed and heterogeneous data repositories
Reasons:
• Vine and Wine sector: emerging market in EU
• Sustainability and biodiversity challenges: local varieties are being lost
• Exploitation of new grapevine varieties and clones in terms of climate change adaptation
• Quality and health status of viticultural products
• Contribution to human health (antioxidants, prevention of heart diseases etc.)
• Wide variety of heterogeneous (and big) data from various information sources
SC3: Energy
Partners:
A public entity supervised by the Ministry of Environment, Energy and Climate Change in Greece, founded in September 1987, active in the fields of Renewable Energy Sources (RES), Rational Use of Energy (RUE) and Energy Saving (ES).
NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.
Big Data Focus area: Real-time turbine monitoring stream processing and analytics
Selected Key Data assets: European Energy Exchange Data, smart meter sensor data, gas/fuels market/price data, consumption statistics, stratigraphic model data (geology, geophysics)
Pilot focus area:
System monitoring in energy production units.
Pilot 3: Operation, maintenance and production forecasting for wind turbines on real-time sensor data.
Reasons:
• Current technology cannot handle the full amount of valuable data available
• Economic benefit of predicting output and preventing damage (if a part is predicted to be about to fail, damage to other parts can be prevented)
• Large continuous stream of sensor data, perfect to test our platform
SC4: Transport
Partners:
The Fraunhofer Society is a German research organization with 67 institutes spread throughout Germany, each focusing on different fields of applied science.
The Centre for Research and Technology-Hellas (CERTH), founded in 2000, is one of the leading research centres in Greece. CERTH includes the Hellenic Institute of Transport (HIT): Land, Sea and Air Transportation as well as Sustainable Mobility services.
ERTICO - ITS Europe is a partnership of around 100 companies and institutions involved in the production of Intelligent Transport Systems (ITS).
Big Data Focus area: Streaming sensor network & geo-spatial data integration
Selected Key Data assets: GTFS data, OSM/LinkedGeoData, MobilityMaps, Transport sensor data, ROSATTE Road safety attributes, European Road Data Infrastructure - EuroRoadS
Pilot focus area:
Info-mobility and traffic planning
Pilot 4: Multisource data collection for the provision of accurate info-mobility and advanced transport planning services in Thessaloniki, Greece
Reasons:
• Congestion is a major problem in Europe, especially in urban areas.
• Utilizing real-time probe data for the provision of accurate info-mobility services and advanced transport planning leads to better decisions.
• The use of mobility data coming from multiple sources presents significant challenges, especially because the datasets differ in nature, both in content and in spatio-temporal terms, and because the data should be collected and processed in real time.
SC5: Climate
Partners:
A public entity supervised by the Ministry of Environment, Energy and Climate Change in Greece, founded in September 1987, active in the fields of Renewable Energy Sources (RES), Rational Use of Energy (RUE) and Energy Saving (ES).
NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.
Big Data Focus area: Enormous simulation time. Extremely complicated computing model.
Selected Key Data assets: European Grid Infrastructure (EGI). Access to several data centres hosted at CNRS-Lyon, NCSR-D Athens, INFN-Milan, NIKhEF-Amsterdam.
Pilot focus area:
Supporting data-intensive climate research
Pilot 5: Downscaling and retrieval process on (raw) climate data via user-defined parameters (e.g. geographical areas, time period, physical variables, computational grids, time steps)
Reasons:
• The provision of climate model data satisfies an important objective: assessing the potential impacts of climate change on well-being, in support of adaptation, prevention and mitigation measures and other policy-making decisions.
• Awareness of climate change has led to the availability of huge datasets
• Downscaling is a computationally intensive process
SC6: Social Sciences
Partners:
CESSDA provides large scale, integrated and sustainable data services to the social sciences. CESSDA is organised as a limited company under Norwegian law, owned and financed by the individual EU member states’ ministries of research or delegated institutions.
NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.
Big Data Focus area: Statistical and research data linking & integration
Selected Key Data assets: Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. Eurostat, UNESCO, World Bank)
Pilot focus area:
Citizens' budget spending at municipal level
Pilot 6: Citizens' budget at municipal level
Reasons:
• Budget: the most important document of public policy
• Budget execution affects everyday lives
• Citizens are more involved at city level
• A platform that integrates heterogeneous budget data (many municipalities have their own data formats) and generates infographics would benefit citizens, the research community and policy makers
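To make the integration point concrete, here is a minimal sketch of mapping differently shaped municipal budget rows onto one shared schema before aggregating them. It is an illustration only, not the pilot's implementation: the municipality names, column names and the `FIELD_MAPS` table are all hypothetical.

```python
# Hypothetical per-municipality column mappings onto one shared schema.
# The source names and column labels are invented for illustration.
FIELD_MAPS = {
    "city_a": {"category": "Category", "amount": "Amount (EUR)", "year": "Year"},
    "city_b": {"category": "kind", "amount": "eur", "year": "fiscal_year"},
}


def normalize(source, row):
    """Convert one raw budget row into the shared schema."""
    fields = FIELD_MAPS[source]
    return {
        "municipality": source,
        "category": row[fields["category"]].strip().lower(),
        "amount_eur": float(row[fields["amount"]]),
        "year": int(row[fields["year"]]),
    }


def totals_by_category(records):
    """Sum amounts per category across all municipalities."""
    totals = {}
    for rec in records:
        totals[rec["category"]] = totals.get(rec["category"], 0.0) + rec["amount_eur"]
    return totals


if __name__ == "__main__":
    records = [
        normalize("city_a", {"Category": "Education ", "Amount (EUR)": "1000.50", "Year": "2016"}),
        normalize("city_b", {"kind": "education", "eur": "500", "fiscal_year": "2016"}),
    ]
    print(totals_by_category(records))
```

Once every source is reduced to the same record shape, totals such as these could feed any charting step without that step knowing about the original formats.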
SC7: Security
Partners:
The Centre supports the decision making of the European Union in the field of the Common Foreign and Security Policy (CFSP), by providing products and services resulting from the exploitation of relevant space assets and collateral data, including satellite imagery and aerial imagery, and related services.
NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.
Big Data Focus area: Image data analysis
Selected Key Data assets: Earth Observation data (e.g. Very High Resolution Satellite
Imagery acquired from commercial providers and governmental systems) and collateral data
for supporting CFSP/CSDP missions and operations
Pilot focus area: Gaining insight into man-made surface changes, triggered by automatic detection, news, or social media information
Pilot 7: Ingestion of remote sensing images and social sensing data to detect and verify man-made changes on the Earth surface for security applications
Reasons:
• Evacuation route planning
• Monitoring of critical infrastructures
• Border security
• Satellite image data is HUGE and computationally intensive to compare
• Smart ‘focus’ algorithms are needed to prioritize the analysis jobs
Big Data Europe Integrator Platform
Dr Hajira Jabeen, University of Bonn
SC1 Webinar
Platform Goals
◎Open source
◎Simple to get started with Big Data
◎Support a variety of use cases
◎Embrace emerging Big Data technologies
◎Simple integration with custom components
Key actors
Platform Architecture
Support Layer: Init Daemon, GUIs, Monitor
App Layer: Traffic Forecast, Satellite Image Analysis, Real-time Stream Monitoring, ...
Platform Layer: Spark, Flink, Kafka, Semantic Layer (Ontario, SANSA, Semagrow), ...
Resource Management Layer (Swarm)
Hardware Layer: Premises, Cloud (AWS, GCE, MS Azure, …)
Data Layer: Hadoop, NOSQL Store, Cassandra, Elasticsearch, RDF Store, ...
Supported Frameworks
Search/indexing: Apache Solr
Data acquisition: Apache Flume
Message passing: Apache Kafka
Data storage: Hue, Apache Cassandra, ScyllaDB, Apache Hive, Postgis
Data processing: Apache Spark, Apache Flink
Semantic Components: Strabon, Sextant, GeoTriples, Silk, SEMAGROW, LIMES, 4Store, OpenLink Virtuoso
BDI Stack Lifecycle
Deploy BDE Platform/Stack to the Cluster
Stack/Cluster Monitor
Developing Custom Applications
Docker Images
BDI Stack (workflow) builder
Custom Components
*Init Daemon
*Integrator UI
◎ High level picture
o docker-compose.yml describes pipeline topology
◎ BDE provided components
o extend template image with your code
◎ New components
o build a Docker image for your component
o this is your own little Virtual Machine for your component
◎ Sharing
o publish topology as git repository
o publish new components on docker hub
Platform development
Actors
◎Cluster Setup
◎Developer
◎Packaging
◎Stack Composition / Integration
◎Deployment
◎Monitoring
Platform installation
◎Manual installation guide
◎Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎Screencasts
Development
◎Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable with custom algorithm/data
◎Published components
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki
Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology
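As a rough illustration of what "component configuration" and "application topology" mean for a stack, the sketch below models a stack as a Python mapping shaped like a Compose file (services with an image, environment settings and dependencies) and derives the topology from it. The service names, image names and environment values are hypothetical placeholders, not actual BDE images, and the sketch does not parse a real `docker-compose.yml`.

```python
# Hypothetical stack description mirroring the shape of a Compose file.
# All service names, images and settings are illustrative placeholders.
STACK = {
    "services": {
        "kafka": {"image": "example/kafka", "depends_on": []},
        "spark-master": {"image": "example/spark-master", "depends_on": []},
        "spark-worker": {
            "image": "example/spark-worker",
            "environment": {"SPARK_MASTER": "spark://spark-master:7077"},
            "depends_on": ["spark-master"],
        },
        "monitor-app": {
            "image": "example/monitor-app",
            "depends_on": ["kafka", "spark-worker"],
        },
    }
}


def topology_edges(stack):
    """Return (component, dependency) pairs describing the application topology."""
    edges = []
    for name, spec in stack["services"].items():
        for dep in spec.get("depends_on", []):
            edges.append((name, dep))
    return edges


def undefined_dependencies(stack):
    """List dependencies that point at a component the stack does not define."""
    known = set(stack["services"])
    return [(name, dep) for name, dep in topology_edges(stack) if dep not in known]


if __name__ == "__main__":
    for name, dep in topology_edges(STACK):
        print(f"{name} -> {dep}")
    print("undefined:", undefined_dependencies(STACK))
```

The per-service entries carry the component configuration, while the `depends_on` edges taken together form the topology; a check like `undefined_dependencies` is one simple way to catch a misspelled component name before deployment.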
Enhancing the Component
◎ Orchestrator required for initialization process (init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
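The initialization problem in the first point can be sketched as a dependency ordering: each component starts only after the components it depends on, and some steps wait for a person. The code below is a generic illustration using Python's standard `graphlib` module (Python 3.9+), not the init daemon's actual interface; the component names and the `manual` set are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical component -> prerequisites mapping; names are illustrative.
DEPENDS_ON = {
    "hdfs-namenode": set(),
    "hdfs-datanode": {"hdfs-namenode"},
    "spark-master": {"hdfs-datanode"},
    "spark-worker": {"spark-master"},
    "app": {"spark-worker"},
}


def startup_order(depends_on):
    """Return components in an order where every prerequisite comes first."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


def initialization_plan(depends_on, manual=frozenset()):
    """Pair each component with 'manual' or 'auto' in startup order."""
    return [
        (name, "manual" if name in manual else "auto")
        for name in startup_order(depends_on)
    ]


if __name__ == "__main__":
    for name, mode in initialization_plan(DEPENDS_ON, manual={"app"}):
        print(f"{mode:6} {name}")
```

A circular dependency has no valid startup order, so the sketch reports it as an error instead of guessing.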
User Interfaces
◎Target: Facilitate use of the platform
o User Interface Adaption
◎Available interfaces
o Workflow UIs
❖Workflow Builder
❖Workflow Monitor
o Swarm UI
o Integrator UI
BDE Workflow Builder
BDE Workflow Monitor
Swarm UI
Integrator UI
Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data by adding meaning to it!
Semantic Data Lake (Ontario)
◎Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎Data Lake
o Add a Semantic layer on top of the source datasets
o The data is semantically lifted using existing ontology terms
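A minimal sketch of what "semantically lifting" a raw record could look like: field names are mapped to existing ontology terms and the record is emitted as N-Triples lines. This is an illustration under stated assumptions, not Ontario's implementation. The `MAPPING` table, the `example.org` subject base and the sample record are hypothetical; the two predicate IRIs are existing Dublin Core and schema.org terms.

```python
# Hypothetical mapping from raw field names to existing ontology terms.
MAPPING = {
    "id": "http://purl.org/dc/terms/identifier",
    "name": "http://schema.org/name",
}


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def lift(record, subject_base, mapping, id_field="id"):
    """Turn one raw record into N-Triples lines using the term mapping."""
    subject = f"<{subject_base}{record[id_field]}>"
    triples = []
    for field, value in record.items():
        predicate = mapping.get(field)
        if predicate is None:
            continue  # no ontology term known for this field; it stays un-lifted
        triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    raw = {"id": "42", "name": "Aspirin", "mass": "180.16"}
    for line in lift(raw, "http://example.org/compound/", MAPPING):
        print(line)
```

The source data stays in its raw form; only the fields with a known term gain a shared meaning, so the mapping can grow over time without reloading the lake.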
SANSA Stack
Thank you
https://github.com/big-data-europe
BDE vs Hadoop distributions
 | Hortonworks | Cloudera | MapR | Bigtop | BDE
File System | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | lightweight virtualization
Plug & play components (no rigid schema) | no | no | no | no | yes
High Availability | Single failure recovery (yarn) | Single failure recovery (yarn) | Self healing, mult. failure rec. | Single failure recovery (yarn) | Multiple Failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | yes | yes | yes | yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera manager | MapR Control system | - | Docker swarm UI + Custom
BDE vs Hadoop distributions
◎BDE is not built on top of existing distributions
◎Targets
o Communities
o Research institutions
◎Bridges scientists and open data
◎Multi Tier research efforts towards Smart Data
Stian Soiland-Reyes, University of Manchester
Nick Lynch, CTO Open PHACTS Foundation
4 Apr 2017
Summary
• Update on Docker and Open PHACTS
• Learnings & transition to AWS
• Next Steps & Future Releases
Open PHACTS @dockerhub
https://hub.docker.com/r/openphacts/
Open PHACTS Next Steps
• Data Refresh planned API 2.2:
–Phase 1: ChEMBL, WikiPathways, Uniprot + Chemistry Refreshed (RDF and linksets)
–Phases 2 & 3: Remaining data sources
–Build data refresh processes
• Wider Architecture Review
• Science and Open PHACTS Webinar
–Science and Open PHACTS: Workflow tools for Life Science Research
–https://register.gotowebinar.com/register/2550359383420450817
Open PHACTS
• Custom Data Staging:
–Different licensing options to cover Annotated SureChEMBL for members/non members
• MicroServices?
–Part of Architecture review to discuss future services/API
–Interested in experiences of this
• Workflow
–BioExcel Workflow blocks in development
–See Bio.tools