Date post: | 28-May-2015 |
Category: |
Data & Analytics |
Upload: | edward-curry |
BIG Final Event Workshop - September 30, 2014 - Heidelberg
BIG Big Data Public Private Forum
KEY TECHNOLOGY TRENDS FOR BIG DATA IN EUROPE
Edward Curry, Insight @ NUI Galway
Tilman Becker, Andre Freitas, John Domingue, Helen Lippell, Felicia Lobillo, Ricard Munné, Axel Ngonga, Denise Paradowski, Sebnem Rusitschka, Holger Ziekow, Martin Strohbach, Sonja Zillner, and all the many, many contributors to the Technical Working Groups and Sectorial Forums
OVERVIEW
Business Context
Methodology
Value-Driven Use Case
Technology Trends
BUSINESS CONTEXT
BIG DATA IN EUROPE
“Possibly one of the few last chances for Europe’s software industry to take a true leadership” K-H Streibich, CEO
“This is a revolution: and I want the EU to be right at the front of it.” Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda, March 2013
INCREASED OPENNESS
Ecosystems Approaches
Open Innovation
Open Data
Community-based Tools and Data
BIG METHODOLOGY
SECTORIAL FORUMS AND TECHNICAL WORKING GROUPS
Health
Public Sector
Finance & Insurance
Telco, Media & Entertainment
Manufacturing, Retail, Energy, Transport
Needs Offerings
Big Data Value Chain
Technical Working Groups
Industry Driven Sectorial Forums
Data Acquisition
• Structured data • Unstructured data • Event processing • Sensor networks • Protocols • Real-time • Data streams • Multimodality
Data Analysis
• Stream mining • Semantic analysis • Machine learning • Information extraction • Linked Data • Data discovery • ‘Whole world’ semantics • Ecosystems • Community data analysis • Cross-sectorial data analysis
Data Curation
• Data Quality • Trust / Provenance • Annotation • Data validation • Human-Data Interaction • Top-down/Bottom-up • Community / Crowd • Human Computation • Curation at scale • Incentivisation • Automation • Interoperability
Data Storage
• In-Memory DBs • NoSQL DBs • NewSQL DBs • Cloud storage • Query Interfaces • Scalability and Performance • Data Models • Consistency, Availability, Partition-tolerance • Security and Privacy • Standardization
Data Usage
• Decision support • Prediction • In-use analytics • Simulation • Exploration • Visualisation • Modeling • Control • Domain-specific usage
SECTORIAL ANALYSIS METHODOLOGY
TECHNICAL WORKGROUP APPROACH
Methodology
1. Literature & Technical Survey
2. Subject Matter Expert Interviews
3. Stakeholder Workshops
4. Online Questionnaire (with NESSI)
Target Interviewee
• Early adopters • Business enablement • Technical maturity • Key Opinion Leaders
Interviewee Breakdown
Position in Organisation: Senior Academic, Senior Management, Middle Researcher, Middle Management
Types of Organisations: University, MNC, SME, Other
SUBJECT MATTER EXPERT INTERVIEWS
WORKING GROUP RESULTS
Interviews, technical white papers, sector requirements, and roadmaps are available at: http://www.big-project.eu
Expert Interviews
Technical Whitepapers
▶ Executive Overview
▶ Key Insights
▶ Social & Economic Impact
▶ Concise State of the Art
▶ Future Requirements & Emerging Trends
▶ Sector-specific Case Studies
VALUE-DRIVEN USE CASE
VALUE-DRIVEN USE CASES
Health
Public Sector
Finance & Insurance
Telco, Media & Entertainment
Manufacturing, Retail, Energy, Transport
Industry Driven Sectorial Forums
Industry 4.0
Increasing Productivity of Wind Farms
Public Service Integration with Open Data Retail
Data Markets
Data-Driven Therapy Guidance
THE DATA LANDSCAPE (1/2)
Technology Evolution
▶ Much of Big Data technology is evolving in an evolutionary way
▶ Old technologies applied in a new context
▶ Volume, Variety, Velocity, Value …
Process Revolution
▶ Business process change must be revolutionary to enable new opportunities
▶ Industry 4.0 (industrial internet)
▶ Predictive maintenance
▶ Opportunities for data-driven improvements
▶ Integration with customer and supplier data
▶ Moving from infrastructure services (IaaS) to software (SaaS) to business processes (BPaaS) to knowledge (KaaS)
THE DATA LANDSCAPE (2/2)
Variety and Reuse
▶ The long tail of data variety is a major shift in the data landscape
▶ Coping with data variety and verifiability is a central challenge and opportunity for Big Data
▶ Cross-sectorial uses of Big Data will open up new business opportunities
▶ Need for scalable approaches to cope with data under different format and semantic assumptions
REUSE OF HEALTH DATA
Secondary Usage of Health Data
▶ Aggregation, analysis and presentation of clinical, financial, administrative and other related data
▶ Goal is to discover new valuable knowledge
▶ Identify trends, predict outcomes or influence patient care, drug development, or therapy choices
▶ Patient recruiting & profiling for conducting clinical studies
DATA POOLS IN HEALTHCARE
MAIN IMPACT BY INTEGRATING VARIOUS AND HETEROGENEOUS DATA SOURCES
Clinical Data
§ Owned by providers (such as hospitals, care centers, physicians, etc.)
§ Encompasses any information stored within the classical hospital information systems or EHR, such as medical records, medical images, lab results, genetic data, etc.
Claims, Cost & Administrative Data
§ Owned by providers and payors
§ Encompasses any data sets relevant for reimbursement issues, such as utilization of care, cost estimates, claims, etc.
Pharmaceutical & R&D Data
§ Owned by the pharmaceutical companies, research labs/academia, government
§ Encompasses clinical trials, clinical studies, population and disease data, etc.
Patient Behaviour & Sentiment Data
§ Owned by consumers or monitoring device producers
§ Encompasses any information related to patient behaviours and preferences
Health data on the web
§ Mainly open source
§ Examples are websites such as PatientsLikeMe, Linked Open Data, etc.
Highest impact on integrated data sets
PEER ENERGY CLOUD
Dr. Martin Strohbach Senior Researcher
PEER ENERGY CLOUD
Smart grid pilot in Saarlouis, 100 households
Map labels: Berlin, Saarlouis
Innovation award
Engage consumers to optimally use local solar energy
§ Understand consumption and save
§ Trade solar energy in the neighborhood to balance the grid
DEVICE LEVEL ENERGY MONITORING
Germany aims at 30% clean/renewable energy by 2020, seeking to build a smart grid
Monitored/controlled grid today
Monitored/controlled grid tomorrow
Sensors today
Sensors tomorrow (consumer level): Energy Consumption, Temperature, Movement, ...
GETTING READY FOR DATA VOLUMES IN FUTURE GRIDS
The PeerEnergyCloud pilot allows us to get ready for future data volumes today
How much data is really needed for what? Optimum?
• Today: 1 value per year
• Smart metering: 35,040 values per year (every 15 minutes)
• PeerEnergyCloud: 540 million values per year (7 devices per household every 2 seconds, 4-5 measurements per device)
• Future possibilities: ? billion values per year
Real-time analytics on mass data (grouped aggregation)
Scalable statistics over hundreds of millions of measurements
Automatic detection of load anomalies (spotting inefficiencies and defects)
Household activity state inference and prediction
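The data-volume figures on this slide can be checked with a little arithmetic. The sketch below assumes a non-leap year and assumes that the 540 million figure refers to a single household (the slide does not say so explicitly); the function name and parameters are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year (assumption)


def values_per_year(interval_seconds, devices=1, measurements_per_device=1):
    """Number of values produced per year at a fixed sampling interval."""
    samples = SECONDS_PER_YEAR // interval_seconds
    return samples * devices * measurements_per_device


# Smart metering: one reading every 15 minutes.
smart_meter = values_per_year(15 * 60)

# Device-level monitoring: 7 devices, one sample every 2 seconds,
# 4 to 5 measurements per device (lower and upper bound).
device_low = values_per_year(2, devices=7, measurements_per_device=4)
device_high = values_per_year(2, devices=7, measurements_per_device=5)

print(smart_meter)              # 35040
print(device_low, device_high)  # 441504000 551880000
```

Under these assumptions the 15-minute interval reproduces the 35,040 figure exactly, and the quoted 540 million values per year falls between the 4-measurement and 5-measurement bounds for one household.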
IDENTIFIED NEEDS FOR DEVICE LEVEL MONITORING
Managing Large Data
RDBMSs did not support our data volumes as easily as Hadoop did
Real-time Insights
Real-time insights, e.g. for forecasting energy demand and detecting anomalies, are required to make efficient decisions
Data Security and Privacy
Privacy- and confidentiality-preserving data analytics is required so that the service provider can retrieve knowledge without violating the agreed-upon granularity. In PEC this was realized by dynamic configurability of data access (which data, what purpose, what granularity, …)
Ease of Use
Simplifying the application of machine learning techniques to Big Data sets would help speed up development, e.g. unified batch/stream abstractions, standardized data integration, visualization tools
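One way to picture granularity-limited data access (which data, what purpose, what granularity) is a policy table that maps each purpose to the finest aggregation window it may see. This is only a minimal sketch: the policy table, the purposes, the window sizes, and the function name are hypothetical and are not taken from PeerEnergyCloud.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical policy: finest aggregation window (in seconds) agreed per purpose.
ACCESS_POLICY = {
    "billing": 900,      # 15-minute averages
    "forecasting": 60,   # 1-minute averages
}


def readings_for_purpose(readings, purpose, policy=ACCESS_POLICY):
    """Return (window_start, mean_watts) pairs at the granularity agreed for `purpose`.

    `readings` is an iterable of (timestamp_seconds, watts) tuples.
    Raises PermissionError if no granularity was agreed for the purpose.
    """
    if purpose not in policy:
        raise PermissionError(f"no data access agreed for purpose {purpose!r}")
    window = policy[purpose]
    buckets = defaultdict(list)
    for ts, watts in readings:
        # Align each reading to the start of its aggregation window.
        buckets[ts - ts % window].append(watts)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


raw = [(0, 100.0), (2, 110.0), (60, 200.0), (62, 220.0)]
forecast_view = readings_for_purpose(raw, "forecasting")
billing_view = readings_for_purpose(raw, "billing")
```

The raw 2-second readings never leave the function; each consumer only receives averages at the window size configured for its purpose.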
KEY TECHNOLOGY TRENDS
THE DATA VALUE CHAIN
Data Acquisition
• Structured data • Unstructured data • Event processing • Sensor networks • Protocols • Real-time • Data streams • Multimodality
Data Analysis
• Stream mining • Semantic analysis • Machine learning • Information extraction • Linked Data • Data discovery • ‘Whole world’ semantics • Ecosystems • Community data analysis • Cross-sectorial data analysis
Data Curation
• Data Quality • Trust / Provenance • Annotation • Data validation • Human-Data Interaction • Top-down/Bottom-up • Community / Crowd • Human Computation • Curation at scale • Incentivisation • Automation • Interoperability
Data Storage
• In-Memory DBs • NoSQL DBs • NewSQL DBs • Cloud storage • Query Interfaces • Scalability and Performance • Data Models • Consistency, Availability, Partition-tolerance • Security and Privacy • Standardization
Data Usage
• Decision support • Predictions • In-use analytics • Simulation • Exploration • Modeling • Control • Domain-specific usage
Big Data Value Chain
• Technical working groups examine the state of the art and future developments in big data across the whole value chain.
• Working groups publish technical white papers that result from desk research and in-depth interviews with leading experts.
IMPROVING USABILITY
Usability
▶ Lowering the usability barrier for data tools: Users should be able to directly manipulate the data
▶ Improvement of Human-Data interaction: Enabling experts & casual users to query, explore, transform, & curate data
▶ Interactive exploration: Big Data generates insights beyond existing models; new analysis interfaces must support browsing and modeling (visual analytics)
▶ Convergence within analytical frameworks: Analytical databases for better performance and lower development complexity (Mahout, Spark, Hadoop/R, rasdaman, SciDB)
BLENDING HUMAN AND ALGORITHM
Blended Approaches
▶ Blended human and algorithmic data processing approaches for coping with data acquisition, transformation, curation, access, and analysis challenges for Big Data
Analytics & Algorithms: Entity Linking, Data Fusion, Relation Extraction
Human Computation: Relevance Judgment, Data Verification, Disambiguation
Internal Community: Domain Knowledge, High Quality Responses, Trustable
External Crowd: High Availability, Large Scale, Expertise Variety
Web Data, Databases, Sensor Data
Programmers, Managers
Better Data
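A common way to blend algorithmic and human processing is to let an algorithm handle the cases it is confident about and queue the rest for human judgment. The sketch below illustrates that pattern with a toy entity linker; the lookup table, the confidence heuristic, the threshold, and all names are hypothetical and do not describe any specific system surveyed by the working groups.

```python
# Hypothetical knowledge base: mention -> candidate entities.
KNOWN_ENTITIES = {
    "Galway": ["Galway (city)"],
    "Heidelberg": ["Heidelberg (city)"],
    "Paris": ["Paris (France)", "Paris (Texas)"],
}


def toy_entity_linker(mention):
    """Return (best_candidate, confidence); ambiguity lowers confidence."""
    candidates = KNOWN_ENTITIES.get(mention, [])
    if not candidates:
        return None, 0.0
    return candidates[0], 1.0 / len(candidates)


def route(items, classify, threshold=0.8):
    """Split items into auto-accepted results and a human review queue."""
    accepted, review = [], []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            accepted.append((item, label))
        else:
            review.append(item)  # left for human disambiguation or verification
    return accepted, review


accepted, review = route(["Galway", "Paris", "Atlantis"], toy_entity_linker)
```

Here the unambiguous mention is linked automatically, while the ambiguous and unknown mentions end up in the review queue for an internal expert or an external crowd.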
A CROSS-SECTOR TREND…
Telco, Media, & Entertainment
Manufacturing, Retail, Energy & Transport
Public Sector
Life Sciences
COMMUNITY AND ECOSYSTEMS
Community
▶ Solutions based on large communities (crowd-based approaches) and ecosystems are emerging as a trend to cope with Big Data challenges
Emerging Economic Model for Open Data
▶ Pre-competitive collaboration efforts
▶ Pistoia Alliance (pharmaceutical data)
▶ Share costs, risks and technical challenges
▶ Benefit from collective wisdom and network effect for curated dataset
Ecosystems are Important
▶ Community provided data (crowd-based collection, data quality, analysis and usage)
▶ Community tools which are interoperable and usable
▶ Support from large communities or large companies
COMMUNITY DATA
Community Analysis and Collection
§ Number of data collection points can be dramatically increased
§ Communities are creating bespoke tools for the particular situation and to handle any problems in data collection (Developer Ecosystem)
§ Citizen engagement is increased significantly
Real-time radiation monitoring
City Noise Levels
STANDARDS
Standardization & interoperability
▶ Principled semantic and standardized data representation models are central to cope with data heterogeneity
▶ Minimum information models needed
▶ Significant increase in the use of new data models (i.e. graph-based) (expressivity and flexibility)
▶ Better integration between data tools
▶ Standardization of Query Interfaces
Technology Stacks (source: TU Berlin, FG DIMA 2013)
Open Challenges
• Unclear Adoption Paths for Non-IT Based Sectors
• Lack of standards and best practices is a major barrier for adoption
• Privacy and Security is Lagging Behind
END-TO-END ARCHITECTURES
Architectures
▶ Design end-to-end architectures for full data lifecycle
▶ Support for both “Data-at-Rest” and “Data-in-Motion”
▶ Data Hubs and Markets: Hadoop-based solutions tend to become central integration point for all enterprise data
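To illustrate what supporting both “Data-at-Rest” and “Data-in-Motion” can mean at the code level, the sketch below applies the same incremental computation to a stored list and to an unbounded generator. It is a toy illustration under stated assumptions, not an architecture from the BIG project: the function names and the simulated sensor are hypothetical.

```python
from itertools import islice


def running_mean(values):
    """Yield the mean of all values seen so far; works on any iterable."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count


# Data-at-rest: a finite, stored batch.
batch = [2.0, 4.0, 6.0]
batch_result = list(running_mean(batch))


def sensor_stream():
    """Hypothetical unbounded source standing in for live sensor data."""
    value = 0.0
    while True:
        value += 1.0
        yield value


# Data-in-motion: consume only the first three results of an endless stream.
stream_result = list(islice(running_mean(sensor_stream()), 3))
```

Because the computation is written against a plain iterable and keeps only constant state, the same logic serves both the batch and the streaming case.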
BIGGEST BLOCKERS
▶ Lack of Business-driven Big Data strategies
▶ Undiscovered and unclaimed potential business values
▶ Data Sharing & Exchange
▶ Need for format and data storage technology standards
▶ Data Privacy and Security
▶ Regulations & markets for data access
▶ Legal frameworks for data sharing & communication are needed
▶ Human resources
▶ Lack of skilled data scientists and data engineers
Key Technical Requirements
KEY INSIGHTS
Key Trends
▶ Lower usability barrier for data tools
▶ Blended human and algorithmic data processing for coping with data quality
▶ Leveraging large communities (crowds)
▶ Need for semantic standardized data representation
▶ Significant increase in use of new data models (i.e. graph) (expressivity and flexibility)
The Data Landscape
▶ Much of (Big Data) technology is evolving in an evolutionary way
▶ But business process change must be revolutionary
▶ Data variety and verifiability are key opportunities
▶ Long tail of data variety is a major shift in the data landscape
Biggest Blockers
▶ Lack of Business-driven Big Data strategies
▶ Need for format and data storage technology standards
▶ Data exchange between companies, institutions, individuals, etc.
▶ Regulations & markets for data access
▶ Human resources: Lack of skilled data scientists and data engineers
Thank you
http://www.bigdatavalue.eu http://www.big-project.eu
Dr. Edward Curry, Research Fellow, Insight @ NUI Galway
Tilman Becker (DFKI, Data Usage), Andre Freitas (NUI Galway, Data Curation), John Domingue (STI, Data Analysis), Helen Lippell (Press Association, Media), Felicia Lobillo (ATOS, Retail), Ricard Munné (ATOS, Public Sector), Axel Ngonga (InfAI, Data Acquisition), Denise Paradowski (DFKI, Retail), Sebnem Rusitschka (Siemens, Energy and Transport), Holger Ziekow (AGT, PEC), Martin Strohbach (AGT, Data Storage), Sonja Zillner (Siemens, Health), and all the many, many contributors to the Technical Working Groups and Sectorial Forums