© Cloudera, Inc. All rights reserved.
Copyright © Capgemini 2016. All Rights Reserved.

Is your Big Data journey stalling?
Take the Leap with Capgemini and Cloudera
Industrializing your transition to the Modern Data Landscape
Speakers
Andrea Capodicasa, Senior Solution Architect, Insights & Data
Goutham Belliappa, Big Data practice leader, Insights & Data
Alex Gutow, Senior Manager, Product Marketing
Agenda
• The Case for Change
• Industrializing the Change
• Adoption
• Q&A
Capgemini Insights & Data Global Practice
Global reach with over 13,000 professionals across 40+ countries
• Capgemini Insights & Data Global Practice since 2015, delivering business & IT Insights and data services
• We employ >13,000 information management specialist practitioners, deployed across Capgemini’s global network
• with over 500 Big Data & Data Science professionals, including 100+ Hadoop certified consultants
• We were recognised again by Gartner as one of the 4 leading information service providers globally
• Capgemini has a global reach and local presence in 44 Countries and over 100 Languages
The case for change
Information Trends: What are we seeing in the marketplace?
Recent years have brought unprecedented changes to the information landscape. Each of these “disruptors” has individual momentum, and collectively they represent a significant opportunity to improve an organization’s effectiveness.
Successful CIOs and leaders consciously take these trends into consideration when planning the evolution of their information architecture.
Empower the business by focusing from the “user down”, not the “system up”.
• Ms. Agility killed Mr. Waterfall: The days of modeling business requirements months or even years in advance, with IT delivering a multi-year plan to roll out a solution that may no longer apply in a fast-changing business environment, are long gone.
• Cloud Computing: The availability of “finished” business functions within the cloud provides organizations with tremendous opportunities while increasing IT information challenges.
• Open Source: Open source architecture provides substantial development and complexity cost savings vs. legacy software packages.
• As a Service: Software as a Service offerings in Big Data, Data Transformation & finished analytics are removing the infrastructure bottlenecks of servers, software and maintenance that obstruct speed to market.
• Security & Privacy: The proliferation of web-connected IP devices creates a “hyper-evolving” cyber breach potential for organizations; privacy laws create compliance challenges with mobile devices.
• Business Meta-Data: Traditionally, data dictionaries have been single purpose and technically focused. As data becomes more valuable and the same information is used in multiple ways, the need for business meta-data will become critical.
• Social Computing: Has resulted in data where segments are loosely connected and correlations are at times non-intuitive, requiring new ways to mine and derive insights.
• In Memory Analytics: Massive in-memory databases with intensely complex analytics are highly scalable: change anything, anytime, and simultaneously compare the results of multiple scenarios in seconds.
• “Real” Analytics: Describes the transition from historical or hindsight indicators to insight and foresight indicators and visualizations.
Customers are Looking for a Guide
Cloudera Enterprise: Making Hadoop Fast, Easy, and Secure
A new kind of data platform
• One place for unlimited data
• Unified, multi-framework data access
Cloudera makes it
• Fast for business
• Easy to manage
• Secure without compromise
Hybrid Deployment Flexibility: Public Cloud, Private Cloud, Hybrid Environments
[Diagram: Cloudera Enterprise platform layers. PROCESS, ANALYZE, SERVE (BATCH, STREAM, SQL, SEARCH, OTHER); UNIFIED SERVICES (RESOURCE MANAGEMENT, SECURITY); STORE (FILESYSTEM, RELATIONAL, NoSQL, OTHER); INTEGRATE (STRUCTURED, UNSTRUCTURED); alongside OPERATIONS and DATA MANAGEMENT]
The traditional approach to BI & Analytics is a bottleneck in the operational value chain
Traditional BI & Analytics approach
• Centralised BI teams are too monolithic and divorced from business operations
• Insights latency
• Reporting on the past, with limited ability to predict and prescribe what is needed now
• Each new business question asked = more time required to crunch the right data
• Heavy duplication of operational data throughout the BI layers & systems
• Diluted data quality & governance create risks of security breach, compliance issues & risk exposure
• Significant costs, both infrastructure and people
• Limited ability to scale, whether from organic growth in data volumes or increasing data complexity
The Insights-driven enterprise puts information at the centre and insights “at the point of action”
Next Generation approach
• Next-generation data management platform enabling a pervasive, real-time “insights & data fabric” serving operations
• Standardized & cost effective data management, allowing high agility on insights and the ability to “ask any questions”
• Operational applications provide data and integrate insights back in a continuous improvement loop
• Operations integrate predicted best outcomes to optimise business processes, automatically where possible
• Ability to detect and catch events on the fly that will require immediate action (e.g. fraud detection) for optimal reaction or proactive action
• Coherent management of platforms & data management processes, with insights & data science skills embedded directly in the operational units for maximum impact
• Optimized total cost of ownership (TCO) with a rationalized and simplified data landscape
Key challenges blur the vision on both the target and the journey to the Insights-driven enterprise
[Diagram: platform layers PROCESS, ANALYZE, SERVE; UNIFIED SERVICES; STORE; INTEGRATE; alongside OPERATIONS and DATA MANAGEMENT]
Challenges addressed
• “Which data should we retain and/or which data could we archive?”
• “I don’t know how to drive value from my data”
• “Can I decrease costs by moving my data (landscape) to the cloud or As-A-Service?”
• “How mature is my data landscape in comparison to the best industrial trends?”
• “I have been told to “do something” about big data analytics but don’t know where to start”
• “Can the Business Intelligence landscape be optimized to derive the maximum value out of it?”
• “Our data landscape is scattered, complex and very expensive, can we fix it?”
Value created
A modern data strategy will enable:
• Reduced complexity: Rationalizing the data strategy to meet demand
• Lower cost: Reduce the operating cost of your data strategy
• Increased agility and better time to market: More speed in the development of new information applications
• More/Better insights and return on intelligence: Easier derivation of meaningful insights to enable business transformation
• Less risk: Reduce complexity of the data strategy
• Data security & privacy: Make your data strategy compliant with rules and regulations
Industrializing the change
Capgemini’s Leap Data Transformation Framework: Modules overview
Modules: Misura, Diligent, Idem, Blend, Papillon, Virtu
Phases: Estimation, Discovery, Design/Build, Testing
Module functions
• Estimate the transformation effort
• Optimize/reduce transformation scope
• Optimize ETL semantic design
• Optimize reporting design
• Optimize SQL
• Industrialize end to end testing
Essence (Semantic Layer consolidation)
• Analyze existing semantic layer of architecture
• Identify potential functional overlap and produce recommendations for consolidation
Data concierge
• Business Information Catalog
• Self service ingestion, distillation, analytics
• Data Operations Services
One-Click leap
• Agile environment provisioning
• Continuous Integration lifecycle
Diligent / Blend Applications
Business Problem
• Large and complex DW estates have been built over the last 20 years or so, and the infrastructure hosting them might need updating
• A number of reports and underlying tables will be duplicated or no longer utilised; they can be decommissioned, saving valuable resources
• Users are reluctant to give up “their” reports/data when migration programmes occur
Solution
• A scientific and objective approach to measure which data are actually used
• Diligent BO Audit data explorer to identify interactions between users and Universes / Reports and tables
• Diligent BO Meta data gathering Module to extract Universe and report information
• Blend Report merger to identify report reductions
• Blend XML Generator to create Pentaho reporting cubes from Diligent-gathered metadata
Accelerator Results
• Scope reduction through identifying current BO reports that are not used: up to 40% discovered with one of our customers
• Scope reduction by identifying reports that are duplicates or share a number of data items
• Automated method to migrate BO reports to Pentaho, hence reduced workload and reduced errors
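The two scope-reduction ideas on this slide (reports that are no longer used, and reports that share most of their data items) can be illustrated with a small sketch. This is not the Diligent or Blend implementation: the audit log layout, the report names, the one-year idle window and the 0.8 overlap threshold are all assumptions made up for the example.

```python
from datetime import date, timedelta
from itertools import combinations

# Hypothetical audit log: (report name, date of last run). Not a real BO audit schema.
AUDIT_LOG = [
    ("sales_daily", date(2016, 5, 30)),
    ("sales_daily_copy", date(2015, 1, 10)),
    ("stock_levels", date(2016, 6, 1)),
    ("legacy_kpi", date(2014, 3, 2)),
]

# Hypothetical metadata: the data items (columns) each report reads.
REPORT_ITEMS = {
    "sales_daily": {"order_id", "amount", "region", "order_date"},
    "sales_daily_copy": {"order_id", "amount", "region", "order_date"},
    "stock_levels": {"sku", "warehouse", "quantity"},
    "legacy_kpi": {"amount", "region"},
}


def unused_reports(audit_log, today, max_idle_days=365):
    """Return reports whose last run is older than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, last_run in audit_log if last_run < cutoff)


def overlapping_reports(report_items, threshold=0.8):
    """Return report pairs whose data items overlap (Jaccard) at or above threshold."""
    pairs = []
    for a, b in combinations(sorted(report_items), 2):
        items_a, items_b = report_items[a], report_items[b]
        union = items_a | items_b
        score = len(items_a & items_b) / len(union) if union else 0.0
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


today = date(2016, 6, 15)
idle = unused_reports(AUDIT_LOG, today)
dupes = overlapping_reports(REPORT_ITEMS)
print("Decommission candidates:", idle)
print("Merge candidates:", dupes)
```

In practice the candidates would still need sign-off from report owners, which is where the reluctance described under Business Problem comes in; usage evidence of this kind gives that conversation an objective basis.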
IDEM-DA
Business Problem
• The customer has very strict security and normalisation requirements when loading their data. They need different obfuscation types for different “semantic types”, e.g. names, phone numbers, social security numbers, etc.
• Left as a manual activity, this would imply a laborious and time-consuming identification across hundreds of thousands of columns: a costly and error-prone activity
Solution
• Automated identification of table columns for encryption and standardisation
• Automated creation of ETL meta-data spreadsheets which drive Data Acquisition Pentaho jobs for data migration
Accelerator Results
• Manual generation of meta-data spreadsheet: several days to weeks; with IDEM-DA: 15 mins to 2 hours
• Manual eyeballing of data: prone to human error, can take hours to several days; with IDEM-DA: approximately 70% reduction and more accurate identification of known types
Project manager of Data Migration project: “IDEM-DA is the only way forward”
IDEM-DA: Example table
Column Name     | Dataset                                        | Semantic Type
mob_no          | 07710232931, 07083210302                       | MOBILE_NO
email           | [email protected],                              |
free_text_field | My address is 12 lucky street, London, E12 2TF | Address
serial_id       | 11234, 22313, 3231313                          | UNKNOWN
IDEM-DA is a module used to support the ETL from legacy data warehouses into a modern architecture
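As a rough illustration of how column samples might be mapped to semantic types like those in the example table, here is a minimal rule-based sketch. It is not how IDEM-DA works internally (the slides do not say); the regular expressions, the `EMAIL` label, the 80% match share and the `contact` sample value are all assumptions introduced for the example.

```python
import re

# Hypothetical patterns; a real rule set would be far larger and locale-aware.
PATTERNS = {
    # UK-style mobile number: 07 followed by nine digits
    "MOBILE_NO": re.compile(r"^07\d{9}$"),
    # Very loose e-mail shape (assumed label, not shown on the slide)
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    # Free text that contains a UK-style postcode
    "ADDRESS": re.compile(r"\b[A-Z]{1,2}\d{1,2}[A-Z]?\s*\d[A-Z]{2}\b"),
}


def classify_column(values, min_share=0.8):
    """Return the first semantic type matched by at least min_share of the samples."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return "UNKNOWN"
    for semantic_type, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.search(v))
        if hits / len(values) >= min_share:
            return semantic_type
    return "UNKNOWN"


samples = {
    "mob_no": ["07710232931", "07083210302"],
    "free_text_field": ["My address is 12 lucky street, London, E12 2TF"],
    "serial_id": ["11234", "22313", "3231313"],
    "contact": ["jane.doe@example.com"],  # made-up value, not from the slide
}
result = {name: classify_column(vals) for name, vals in samples.items()}
for name, semantic_type in result.items():
    print(f"{name}: {semantic_type}")
```

Once a column has a type, the obfuscation rule for that type (hashing, masking, tokenising) can be looked up from configuration rather than decided column by column.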
IDEM-ES
Business Problem
• The customer has a load pattern called “cutover+delta”: historical tables are updated with daily files
• Although many tables have most of their columns with similar names, matching them manually would imply a time-consuming identification across hundreds of thousands of columns: an error-prone activity
Solution
• Machine learning based solution to automatically identify similarity between columns (human-supervised)
– Column name similarity (ngrams)
– Column content similarity (ngrams)
– Column content agnostic distribution (hist)
• Open architecture to automatically evaluate the best model (tested 600+)
• Automated creation of INSERT INTO ETL scripts
Accelerator Results
• Acceleration expected around 30-50%
• Can automatically generate SQL insert statements to create the current view
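The column name similarity feature and the generated INSERT INTO script can be sketched as follows. This is a plain n-gram heuristic, not the supervised machine learning model the slide describes, and it ignores the content-based features; the table names, column names and the 0.4 threshold are hypothetical.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a normalized column name."""
    text = text.lower().replace("_", "")
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two names."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def match_columns(source_cols, target_cols, threshold=0.4):
    """Map each target column to its most similar source column, or None."""
    mapping = {}
    for target in target_cols:
        best = max(source_cols, key=lambda s: similarity(s, target))
        mapping[target] = best if similarity(best, target) >= threshold else None
    return mapping


def insert_statement(target_table, source_table, mapping):
    """Build an INSERT INTO ... SELECT from the matched columns only."""
    matched = [(t, s) for t, s in mapping.items() if s is not None]
    targets = ", ".join(t for t, _ in matched)
    sources = ", ".join(s for _, s in matched)
    return f"INSERT INTO {target_table} ({targets}) SELECT {sources} FROM {source_table}"


# Hypothetical delta-file columns and historical-table columns.
delta_cols = ["customer_id", "customer_nm", "post_code", "load_ts"]
history_cols = ["customerid", "customer_name", "postcode", "region_code"]

mapping = match_columns(delta_cols, history_cols)
sql = insert_statement("customer_history", "customer_delta", mapping)
print(mapping)
print(sql)
```

Columns that fall below the threshold (`region_code` here) are left unmapped, so a person can review them; this mirrors the human supervision mentioned on the slide.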
Virtu – Data testing Framework
Business Problem
• Testing data migrations, and more generally the integrity of data transformations in large scale BI/DW estates, is complicated
• Thousands of objects are moved during the migration and, once in production, loaded every day; this might lead to hundreds of defects, and keeping track of them all without an automated system can become a daunting task
• Continuous monitoring of DQ performance and regression error history is essential to maintain acceptable levels of quality
Solution
• A complete e2e testing framework that accelerates the configuration, execution and evaluation of tests for large scale BI domains
• Comprises a Web UI for maximum user friendliness in configuration
• Scheduler engine to launch configurable batches of tests
• Real time Defect manager for timely defect issuing and progress checks
• DQ dashboard for monitoring state and progress
Benefits
• Customer can easily plan and execute a large number of checks, completely controlling their lifecycle (creation, modification, decommissioning)
• Configurable engine to store details of defects for maximum visibility and transparency on errors and their resolutions
• Native connection to modern defect management systems (Jira), easily expandable to any system with a reachable API
• DQ dashboard gives real time and drillable information on current DQ state
• Compatible with 3 system types: Oracle, Impala & MySQL
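To make the idea of a configurable batch of data checks concrete, the sketch below runs a few SQL checks against a source and a target table and collects a defect record for each failure. It is only an illustration of the pattern, not Virtu itself: it uses an in-memory SQLite database as a stand-in for Oracle, Impala or MySQL, and the table names, check names and defect format are hypothetical.

```python
import sqlite3

# Hypothetical checks: each is a name plus a SQL query that should return 0.
CHECKS = [
    ("row_count_matches",
     "SELECT ABS((SELECT COUNT(*) FROM src_orders) - (SELECT COUNT(*) FROM tgt_orders))"),
    ("no_null_amounts",
     "SELECT COUNT(*) FROM tgt_orders WHERE amount IS NULL"),
    ("no_missing_ids",
     "SELECT COUNT(*) FROM src_orders s "
     "LEFT JOIN tgt_orders t ON s.id = t.id WHERE t.id IS NULL"),
]


def run_checks(conn, checks):
    """Run every check and return a list of defect records for the failures."""
    defects = []
    for name, sql in checks:
        (value,) = conn.execute(sql).fetchone()
        if value != 0:
            defects.append({"check": name, "offending_rows": value})
    return defects


# Stand-in source and target tables; the target has lost one amount value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, NULL), (3, 30.0);
""")

defects = run_checks(conn, CHECKS)
for defect in defects:
    print(defect)
```

Keeping checks as data (a name and a query) is what lets them be created, modified and decommissioned without code changes, and the defect records are the kind of payload that could be forwarded to a defect tracker through its API.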
Adoption
Leap Data Transformation Framework is the result of a client co-innovation process and delivered efficiencies on large projects
An end to end, fact-based transformation framework to deliver IT Rationalization on top of Big Data architectures
Approach
• A Capgemini client in the Public Sector is building a Business Data Lake (BDL) to support all digital channel interactions as well as rationalize/optimize its IT Business Intelligence legacy landscape on top of the new Big Data architecture
• In the scope of the IT Rationalization project, 10+ data warehouses, hundreds of analytical business services, and thousands of BO reports must be moved on top of the BDL, for thousands of business users throughout the organization
• In this context, Leap Data Transformation Framework was used on a first business scope
Key Outcomes
• Leap is a framework consisting of a transformation methodology and accelerators across the transformation lifecycle which can operate at scale:
– The methodology is modular and covers all phases of transformations
– Elements of the Discovery phase were automated
– Design and Build process automation (metadata driven) and application deployment controls delivered development efficiencies and scalability
– A metadata driven test automation framework reduced initial test effort and subsequent regression test activities
– A Continuous Development process
– Platform application stack deployment efficiencies
Accelerator Results
• Diligent: 40% reduction of the transformation scope
• Idem, Papillon, Blend: 15% efficiency in the design/build process through use of:
– Semi-Automated ETL code optimizer
– Semi-Automated SQL optimizer
– Semi-Automated report optimizer
• Virtu: 10% efficiency in the test development process (1st pass) & 30% efficiency in regression testing through an automated test & assurance framework
Use cases for Capgemini’s Leap Data Transformation Framework for optimized business data lakes
Replatforming
• For advanced clients embracing the potential of modern architectures
• Opportunity to transform, simplify and rationalize an organization’s data landscape for optimized TCO
• Leap Data Transformation full suite enables risk and cost reduction, working well in an agile approach
Legacy Discovery/DW optimization
• For clients in need of better visibility of their current data assets before moving to Big Data
• Leap Data Transformation Framework can help optimize current data management processes, substantially reduce transformation scope, identify the optimal platform for the workloads and shape a future project for success
Managing existing BI & move to modern architectures
• Capgemini takes over the current BI estate and modernizes it through its NextGen BISC approach
• For clients with redundant and expensive DW estates concerned about the risks of moving to modern architectures
• Leap Data Transformation Framework full suite is a key element to optimize the TCO and ensure quality in the transformation process
Testing
• For clients needing to automate their data testing in big data environments or large relational environments
• Tools can automate the testing lifecycle for both big data and traditional relational DW estates
Replatforming legacy BI applications requires strong strategies for user adoption and decommissioning
THE USER ADOPTION STRATEGY
Strong user adoption strategy
• End users understand the new value they will get out of the new system
• They are empowered to use it
• Their success is spreading to new initiatives
• They forget all about the old & slow stuff fairly quickly
Weak user adoption strategy
• End users fear the new system will impact their capacity to do their jobs
• The known is safer than the new
• First tests on the new systems disappoint, any failure goes viral
• Evolutions still run on the old system, “just in case”
THE KILL STRATEGY
Strong kill strategy
• Systems are killed according to roadmap, costs linked to unused HW & SW are recovered
• IT & Business impacts are anticipated, managed and communicated
• The energy is focused on the new
Weak kill strategy
• First systems are shut down ignoring business constraints, impacting operations
• Endless hours spent comparing the old and the new and explaining differences
• Unprepared board escalations when unplanned impacts arise
Sample table of contents for the output of a 4-week Data Warehouse Optimization roadmap based on LEAP
• Data Extract & Staging
• Data Management & EDW
• Semantic Layer
• Sandbox & Analytics
• Operational Analytics
• Data Virtualization Layer
• Master Data Management
• Metadata Management
• Data Distribution Layer
• Our Understanding
• Big Data Trends in Heavy Equipment / Farm Industry
• Technology Principles
• Reference Architecture
– Conceptual Architecture
– Architecture Components
• Technology Choice Points
– ETL tool comparison
– EMR vs. Hadoop
• ETL & Data Offloading Plan
– Project Structure, Sequence, Sprints
– Assumptions
– Collaborative Planning & Prep
• Logical Architecture
• Business Value Proposition
• Current State Architecture
• End State Architecture
• Current State + 6 months Architecture
• Current State + 12 months Architecture
• Current State + 18 months Architecture
• Data Distribution Layer
What’s next?
Contact our experts
• Schedule a discovery session with our experts
• Schedule a first assessment of the value of Leap for your organization
Goutham Belliappa
https://www.linkedin.com/in/gouthambelliappa
Andrea Capodicasa
Duane Garrett