Business Data Lake best practices OOP Munich, 2017-01-31
2 Copyright © Capgemini 2015. All Rights Reserved
OOP MUC 2017 - Business Data Lake best practices
The speaker – Arne Roßmann
• Part of the Insights & Data team
  - Global team delivering services around BI, DWH, Information Strategy & Big Data
• Working in Business Intelligence since 2008
• Delivering as Big Data architect & project manager at our clients
  - Defining processes
  - Creating architectures
  - Leading projects
• Worked in many industries
  - Retail, Chemical, Financial, Logistics, Automotive, ...
Capgemini’s Insights & Data Global Practice
With 15,000 experts globally, we are a recognized leader in information-led transformation.
Expertise in Big Data & Analytics
• Over 15,000 consultants globally
• 800+ Big Data & 400+ Data Science global consultants
• Industrialized delivery framework
• CUBE lab on the cloud with various demonstrations for BI environments
• Built-in tools for interactive agile BI and DevOps
• Experience in designing and deploying big data analytics solutions in varied ecosystems
Capgemini Solutions
• Next Gen Business Insights Service Centre
• Customer Analytics
  - Segmentation & Behavior Profiling
  - Behavior Propensity Scoring
  - Pricing Analytics
• Marketing & Campaign Analytics
  - Campaign Recommendation
  - Cross Sell/Up Sell
  - Campaign Measurement
  - Campaign Execution Management
• Operations Analytics
  - Sales/Demand Forecasting
  - Activity Based Costing
  - Call Center Analytics
• Asset/Equipment Analytics
  - Warranty Analytics
  - Asset Performance Monitoring
  - Predictive Asset Maintenance
  - Insights from Connected Equipment
• Fraud Analytics
  - Fraud Scoring
  - Collusion Fraud Identification
  - Fraud Framework for Public Sector (Trouve)
• Content Analytics
  - Text Mining Accelerators
  - Key Opinion Leader
  - Content Analytics for Fraud Detection
• Business Data Lake offering
• Data Warehouse Optimization Solution
Partner Ecosystem
• Strategic alliances and partnerships with major vendors
• Enabling co-innovation with the CUBE lab
Table of Contents
• Why the Business Data Lake works
• Services your Business Data Lake should provide
• Standardize, Industrialize and Innovate!
Why the Business Data Lake works
Big Data creates opportunities but poses challenges as well
Where do I start?
“We know that Big Data can be helpful, but how do we quantify the benefits and develop a business case?”
“How do we know which Big Data technology/platform(s) suit our architecture and business requirements?”
“How do I get all the unstructured data (mainly images) out of my operational processes and into an analytical environment that allows me to experiment with data?”
“Can we easily combine data from multiple source systems into our Big Data environment, and vice versa?”
“Can I do it myself? What skills do I need for Big Data?”
“How do I measure the effectiveness or performance of my Big Data initiative? How do I measure ROI?”
Businesses are looking to close the gap towards being ‘insight driven’
• 79% have not completely integrated their data sources across the organization
  - Challenge: scattered data lying in silos across the organization
• 67% do not have well-defined criteria to measure the success of their own Big Data initiatives
  - Challenge: absence of a clear business case for funding and implementation
• 36% use cloud-based Big Data and analytics platforms
  - Challenge: dependence on legacy systems for data processing and management
• 47% have either scattered pockets of resources or follow a decentralized model for analytics initiatives
  - Challenge: ineffective co-ordination of Big Data and analytics teams
The Business Data Lake delivers what we need for the new data landscape.
Govern where it matters
• Focus on MDM
• Enforce only when sharing
• Treat corporate as an aggregation of local
Encourage local requirements
• Let the business decide what they need
• Build from the bottom
• Enable traceability to source
• Disposable data views
Distill on demand
• Select only what you want
• Business-friendly tooling
• Reusable information maps
• Rapid change cycle
Store securely
• Store everything ‘as is’
• Include structured and unstructured data
• Store it cheaply where possible
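The ‘store everything as is’ and ‘enable traceability to source’ principles can be illustrated with a small landing step. This is a minimal sketch, assuming a local directory stands in for the lake’s raw zone; the `land_raw` helper, the `raw/<source>/<date>` layout and the sidecar lineage file are hypothetical illustrations, not part of the Business Data Lake offering.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import date
from pathlib import Path


def land_raw(source_file: Path, lake_root: Path, source_system: str) -> dict:
    """Copy a file unchanged into a date-partitioned raw zone and record its lineage.

    Hypothetical layout: <lake_root>/raw/<source_system>/<YYYY-MM-DD>/<filename>
    """
    target_dir = lake_root / "raw" / source_system / date.today().isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    shutil.copy2(source_file, target)  # byte-for-byte copy: the data is stored 'as is'

    # A checksum lets every derived view be traced back to the exact raw bytes.
    lineage = {
        "source_system": source_system,
        "source_path": str(source_file),
        "raw_path": str(target),
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    # Keep the lineage record next to the raw file as a sidecar JSON document.
    target.with_name(target.name + ".lineage.json").write_text(json.dumps(lineage))
    return lineage


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "orders.csv"
        src.write_text("order_id,amount\n1,9.99\n")
        record = land_raw(src, Path(tmp) / "lake", source_system="erp")
        print(record["raw_path"], record["sha256"][:12])
```

Because nothing is transformed on the way in, any disposable data view built later can be regenerated from the raw copy and checked against the recorded checksum.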
Business Challenges driving the need for BDL services
Business Enablement
• Achieve real-time optimization of business processes through predictive insights and performance analytics
• Enhance new services and stay competitive in the market
• Be agile, get insights fast
Control
• Ensure data security and compliance with EU data regulations
• Enable up- and downscaling according to business needs
Control
• Reduce costs associated with the governance and secure storage of data
• Control the costs of running flexible data services
• Reduce Capex
Services your Business Data Lake should provide
Capgemini can help accelerate clients’ journey to insights.
A cloud-powered big data & insights service: bring all your data into one place, deliver insights at the point of action and generate differentiated business value.
• ‘Software-Defined’ full-stack cloud infrastructure
• Flexible ‘Pay-as-you-go’ commercial model
• Secure as a vault
• ‘Ready to Harvest’ sector & domain insights
• Modular, hybrid & elastic, powered by ‘Intelligent Automation’
Get started quickly: with our platform, tools and expertise we can support you at any level to manage your data and harvest insights.
Your ‘Lab in the Cloud’
• Experiment
• Hypothesize
• Simulate
The BDL architecture we built for our clients
Platform as a Service
• Insights Platform UX Portal (HTML 5, CSS, AngularJS)
• Big Data Lab: Dataset Library
• Data Science Lab: Models Library
• Insights Lab: Ready Insights
• Sector Insight Labs
• Smart Insights 360
• Common Services: Ingest, Algorithm Library, Catalog & Provision, Meter & Bill, Resource Monitor, Provision, Service Catalog, IoT Framework, Access Mgmt, Knowledge Base, Helpdesk
• RESTful Web Services
Infrastructure as a Service
• Hybrid Cloud Extensibility (Bosh, CF), CG-CSB, Virtustream
• Storage and Parallelization: EMC Isilon
• Compute & Memory: EMC VCE
• Big Data Suite: Pivotal, Cloudera, Hortonworks
• VMware, Cortex
• Data Management: Informatica, Talend, HDF, Apache NiFi
• Analytics tools: SAS, MADlib, RStudio, Spark
• Security & Governance: RSA, AD, Knox, Ranger, Kerberos, Atlas, TDE, W2W, Metron, Falcon
• ITSM: BMC Remedy
• Common Web UI and UX architecture
• Fully Virtualized compute, storage & Network
• Intelligent automation of provisioning, process, service and support orchestration
• Modular Component Architecture
• Multiple points of presence
• Seamless integration between on-premise, private & public cloud
• Proven reference and component architecture for on-premise builds
• Professional Services teams to build full stack
• Demo of full stack
• Accelerated partner enablement
MD&LM Environment
• Hadoop Distribution: Hortonworks, Cloudera
• RE&D, DevOps: Cloud Foundry, Jira, Jit
• Visualisation: Qlik, Tableau, SAS VA, D3, Highcharts
• Self Service Insights
Diagram layers: Application Layer, Infra Layer, User Access Layer, Software & Services
Capgemini Private Cloud, On-Premise Cloud
BDLaaS – illustrative example service Dashboard
Standardize, Industrialize and Innovate!
Big data processing happens in three different stages over time, and we have to cater to each stage differently.
Stage 1: Load “as-is”
Actors: data providers and IT provide and store data
• Store everything: internal and external, structured and unstructured
• Store granular data
• Minimal effort on IT
Stage 2: Distill on demand
Actors: data scientists and engineers explore and analyze data
• Agile and explorative way of working
• Self service
• Fail fast
Stage 3: Operationalize
Actors: IT implements data integration processes for production
• Continuously running analytics processes
• Trust in data quality
• Service levels secured
• Managed by IT
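Moving a dataset from exploration into operation depends on “trust in data quality”. One way to make that explicit is a quality gate that must pass before IT operationalizes a dataset. The sketch below is hypothetical: the `QualityGate` class and the three example checks are illustrations, not a prescribed BDL component.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class QualityGate:
    """Hypothetical gate a dataset must pass before it is operationalized."""
    checks: dict[str, Callable[[list[Row]], bool]] = field(default_factory=dict)

    def evaluate(self, rows: list[Row]) -> dict[str, bool]:
        # Run every named check and report each result individually.
        return {name: check(rows) for name, check in self.checks.items()}

    def passes(self, rows: list[Row]) -> bool:
        return all(self.evaluate(rows).values())


def not_empty(rows: list[Row]) -> bool:
    return len(rows) > 0


def no_missing_ids(rows: list[Row]) -> bool:
    return all(row.get("id") is not None for row in rows)


def unique_ids(rows: list[Row]) -> bool:
    ids = [row.get("id") for row in rows]
    return len(ids) == len(set(ids))


gate = QualityGate({"not_empty": not_empty, "no_missing_ids": no_missing_ids, "unique_ids": unique_ids})

# Output of an exploration step: the duplicate id should block promotion.
explored = [{"id": 1, "value": 3.2}, {"id": 2, "value": 4.1}, {"id": 2, "value": 5.0}]
print(gate.evaluate(explored))  # {'not_empty': True, 'no_missing_ids': True, 'unique_ids': False}
print("promote to production:", gate.passes(explored))  # promote to production: False
```

Keeping each check named makes the gate’s verdict explainable, which fits the idea that exploration may fail fast while production must not.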
• Allow creativity
• Encourage collaboration
• Ensure business metadata & a data catalogue
• Enable data masking
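Data masking can take several forms. The sketch below shows two common ones, under the assumption that sensitive columns are known per dataset: redaction, which removes the value, and keyed pseudonymization with HMAC-SHA256, which keeps equal values equal so joins still work. The slide does not say which technique the BDL uses; `mask_record`, the column names and the key handling are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash; equal inputs give equal tokens, so joins still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_record(record: dict[str, str], pseudonymize_cols: set[str],
                redact_cols: set[str], key: bytes) -> dict[str, str]:
    masked = {}
    for column, value in record.items():
        if column in redact_cols:
            masked[column] = "***"  # value removed entirely
        elif column in pseudonymize_cols:
            masked[column] = pseudonymize(value, key)
        else:
            masked[column] = value  # non-sensitive column passes through unchanged
    return masked


if __name__ == "__main__":
    # Assumption: in a real lake the key would come from a secret store, never from code.
    key = b"demo-key-only"
    customer = {"customer_id": "C-1001", "email": "jane@example.com", "city": "Munich"}
    print(mask_record(customer, {"customer_id"}, {"email"}, key))
```

The confidentiality classification from the business metadata would be a natural input for deciding which columns fall into each set.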
Industrialize!
Examples of technical metadata
• Path (folder location)
• Filename
• File type
• File size
• Date of ingestion
• Technical owner / group
• For Hive tables:
  - Number of records / lines
  - Number of columns
  - Column names, if available
  - Column data types
  - Value distribution
  - Min/Max
Examples of business metadata
• Project (possibly derived automatically)
• Data set name
• Logical description of the data set
• Data owner / data steward
• Confidentiality classification
• Line of business
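Most of the technical metadata listed above can be harvested automatically at ingestion, while business metadata has to be supplied by people. The sketch below assumes a CSV file stands in for a Hive table; `technical_metadata`, `infer_type` and the `BusinessMetadata` fields are hypothetical names chosen to mirror the lists, and the type inference is deliberately crude.

```python
import csv
import tempfile
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class BusinessMetadata:
    """Supplied by people (e.g. the data steward); not derivable from the file itself."""
    project: str
    dataset_name: str
    description: str
    data_owner: str
    confidentiality: str
    line_of_business: str


def infer_type(values: list) -> str:
    """Very rough type inference for illustration: int, then float, else string."""
    for cast, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                cast(value)
            return name
        except (TypeError, ValueError):
            continue
    return "string"


def technical_metadata(path: Path, technical_owner: str) -> dict:
    """Harvest technical metadata automatically at ingestion time."""
    meta = {
        "path": str(path.parent),  # folder location
        "filename": path.name,
        "file_type": path.suffix.lstrip(".") or "unknown",
        "file_size_bytes": path.stat().st_size,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "technical_owner": technical_owner,
    }
    if meta["file_type"] == "csv":  # tabular data gets a Hive-style column profile
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            rows = list(reader)
            columns = reader.fieldnames or []
        meta["record_count"] = len(rows)
        meta["column_count"] = len(columns)
        profile = {}
        for column in columns:
            values = [row[column] for row in rows]
            col_type = infer_type(values)
            cast = {"int": int, "float": float}.get(col_type, str)
            typed = [cast(v) for v in values]
            profile[column] = {
                "type": col_type,
                "min": min(typed) if typed else None,
                "max": max(typed) if typed else None,
                "top_values": Counter(values).most_common(3),  # simple value distribution
            }
        meta["columns"] = profile
    return meta


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "orders.csv"
        sample.write_text("order_id,amount,country\n1,9.99,DE\n2,20.00,FR\n3,5.50,DE\n")
        catalog_entry = {
            "technical": technical_metadata(sample, technical_owner="data-platform-team"),
            "business": asdict(BusinessMetadata(
                project="demo", dataset_name="orders", description="Sample order lines",
                data_owner="Sales", confidentiality="internal", line_of_business="Retail",
            )),
        }
        print(catalog_entry["technical"]["columns"]["amount"])
```

Keeping both halves in one catalogue entry is what makes a dataset findable by business users and manageable by IT.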
Start using ELT tools now!
• Need for more platform updates
• Need for more denormalization
• Need for more specialized know-how
ELT tools offer:
• Abstraction layer to Hadoop processing engines
• Abstraction layer to NoSQL & SQL databases
• Standardized control flows
• Availability of developers
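ELT differs from classic ETL in where the transformation runs: data is extracted, loaded unchanged into the target platform, and transformed there by the platform’s own engine. The sketch below uses Python’s built-in `sqlite3` purely as a stand-in for a Hadoop SQL engine such as Hive; the table names and sample rows are hypothetical. An ELT tool would generate comparable load and transform steps behind its abstraction layer.

```python
import sqlite3

# Extract: rows as they arrive from a hypothetical source system (all values still text).
source_rows = [
    ("1", "2017-01-02", "DE", "19.99"),
    ("2", "2017-01-02", "FR", "5.00"),
    ("3", "2017-01-03", "DE", "7.50"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the lake's SQL engine

# Load: land the data unchanged in a raw staging table.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, country TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", source_rows)

# Transform: cast and aggregate inside the engine, producing a denormalized result table.
conn.execute(
    """
    CREATE TABLE daily_revenue AS
    SELECT order_date, country,
           SUM(CAST(amount AS REAL)) AS revenue,
           COUNT(*) AS orders
    FROM raw_orders
    GROUP BY order_date, country
    """
)

for row in conn.execute("SELECT * FROM daily_revenue ORDER BY order_date, country"):
    print(row)
```

Because the raw table is kept, the transformation can be re-run or changed later without going back to the source system.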
Insights as a Service – Analytics Cloud for an Oil & Gas major
Use cases:
• Well Health Dashboards
• Equipment Performance
• Disaster Management
• Supply Chain Analytics
• Predictive Maintenance
Data sources:
• Device Data: driving behavior, GPS, diagnostics, etc.
• Real-Time Data
• System Data
• Environment Data
• Project Data
Data volumes:
• 10 data points per second
• 40 GB per field
• 5-6 GB per day per well
• 80 TB of well data per year
Capabilities:
• 24x7x365 monitoring usage
• Real-time charts of streaming data
• Real-time alerts
• Thermal visualizations
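Real-time alerts on streaming sensor data are often based on a rolling window rather than single readings, so that one noisy value does not trigger an alarm. The sketch below is hypothetical: the window size, threshold and simulated readings are assumptions, and a production setup would run this logic in a stream-processing engine rather than a Python generator. At the 10 data points per second mentioned above, a window of 10 readings covers roughly one second.

```python
from collections import deque
from statistics import mean
from typing import Iterable, Iterator


def rolling_alerts(readings: Iterable[float], window: int,
                   threshold: float) -> Iterator[tuple[int, float]]:
    """Yield (index, rolling_mean) whenever the mean of the last `window` readings exceeds `threshold`."""
    recent: deque = deque(maxlen=window)  # keeps only the newest `window` values
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:  # wait until the window is full
            avg = mean(recent)
            if avg > threshold:
                yield i, avg


if __name__ == "__main__":
    # Simulated readings from one sensor (hypothetical values): a step change after 15 readings.
    stream = [80.0] * 15 + [95.0] * 10
    for index, avg in rolling_alerts(stream, window=10, threshold=85.0):
        print(f"alert at reading {index}: rolling mean {avg:.1f}")
        break  # report the first alert only
```

A generator keeps memory use constant regardless of stream length, since only the last `window` readings are held.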
We helped customers get to real value within 12 weeks, from idea to production.
Timeline (weeks 1 to 12), from “Business Problem Identified” to “Business Value Delivered”, with these phases:
• Business Insights Need
• Integrate DataSet
• Model Build and Training
• Iterate and Tune
• Data Exploration
• Test Data Science Model
• Apply Data Science
• Business Validation
• Publish Insights
The information contained in this presentation is proprietary. Copyright © 2015 Capgemini. All rights reserved.
Rightshore® is a trademark belonging to Capgemini.
www.capgemini.com
About Capgemini
With more than 145,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2014 global revenues of EUR 10.573 billion.
Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.
Learn more about us at www.capgemini.com.