#TalendConnect
Best practices for unleashing the power of data lakes
Isabelle Nuage & Christophe Toum, Big Data Products, Talend
Self-service data lake, cafeteria style
Using sensor data collected in real time to improve gas turbine reliability and operational performance, and to extend lifetime value.
Why Do We Need a Data Lake?
“Data lakes are enterprise-wide data management platforms for analyzing disparate sources of data in its native format.” (Gartner)
Business Value
Reducing cost
• ETL offload
• EDW offload/optimization
• Data archiving
Generating new opportunities
• Customer acquisition, retention…
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…
But Data Lakes Bring New Challenges
Diagram: high-end users vs. the rest of us
Complexity, poor governance and control, no reuse
Data Lake – Conceptual Architecture
• Acquire: ingest
• Understand & improve
• Curate & govern
• Deliver: self-service
• Scale
Best Practices for a Successful Data Lake
• Accelerate Data Ingestion
• Understand & Govern Your Data
• Remove Silos: Unify Data Management
• Deliver Data to a Wide Audience
Continuously refreshed data; continuous data delivery and data processes
Best Practices for a Successful Data Lake: Accelerate Data Ingestion
• Wide connectivity
• Batch & streaming ubiquity
• Scale with volume and variety
Pitfalls:
  o Hand coding
  o Fragmented tools
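One way to read "batch & streaming ubiquity" is that the same transformation logic should run unchanged whether records arrive as a bounded batch or an unbounded stream, instead of being hand coded twice. The Python sketch below illustrates that idea with plain iterables and generators; the record fields (`sensor_id`, `temp_c`) and all function names are hypothetical and are not a Talend API.

```python
import itertools
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]

def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """Shared transformation (hypothetical): drop records without a sensor_id
    and add a Fahrenheit reading derived from temp_c."""
    for rec in records:
        if not rec.get("sensor_id"):
            continue  # skip malformed records instead of failing the whole job
        yield {**rec, "temp_f": rec["temp_c"] * 9 / 5 + 32}

def batch_source() -> List[Record]:
    """Bounded input, such as a file loaded in one pass (hypothetical data)."""
    return [{"sensor_id": "t1", "temp_c": 20.0}, {"sensor_id": "", "temp_c": 5.0}]

def stream_source() -> Iterator[Record]:
    """Unbounded input, simulated with an infinite generator."""
    for i in itertools.count():
        yield {"sensor_id": f"t{i % 3}", "temp_c": float(i)}

if __name__ == "__main__":
    # Batch: materialize every result at once.
    print(list(normalize(batch_source())))
    # Streaming: the same function, consumed lazily; stop after three records.
    for rec in itertools.islice(normalize(stream_source()), 3):
        print(rec)
```

Because `normalize` is a generator that never assumes its input ends, the batch and streaming paths share one code path; only the source and the consumer differ.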
Best Practices for a Successful Data Lake: Understand & Govern Your Data
• Add context to data (provenance, semantics…)
• Optimize data with curation, stewardship, preparation…
• Use a collaborative process
Pitfalls:
  o Authoritative governance
  o Inconsistent framework
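Adding context such as provenance and semantics can start with a metadata record kept alongside each dataset. The sketch below is a hypothetical, minimal in-memory catalog in Python; the field names, the `register` and `lineage` helpers, and the example dataset names are illustrative assumptions, not the schema of Talend or of any metadata catalog such as Apache Atlas.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

@dataclass
class DatasetEntry:
    """Hypothetical catalog entry recording provenance and semantics."""
    name: str
    source_system: str     # provenance: originating system
    ingested_at: datetime  # provenance: when the data landed in the lake
    derived_from: List[str] = field(default_factory=list)           # upstream datasets
    column_semantics: Dict[str, str] = field(default_factory=dict)  # column -> business meaning
    steward: str = "unassigned"  # person or team curating the dataset

catalog: Dict[str, DatasetEntry] = {}

def register(entry: DatasetEntry) -> None:
    """Add or replace an entry in the in-memory catalog."""
    catalog[entry.name] = entry

def lineage(name: str) -> List[str]:
    """Return every upstream ancestor of a dataset, each listed once."""
    seen: List[str] = []
    stack = list(catalog[name].derived_from) if name in catalog else []
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue  # already visited through another path
        seen.append(parent)
        if parent in catalog:
            stack.extend(catalog[parent].derived_from)
    return seen

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    register(DatasetEntry("raw_turbine_readings", "plant_historian", now))
    register(DatasetEntry(
        "turbine_daily_avg", "data_lake", now,
        derived_from=["raw_turbine_readings"],
        column_semantics={"avg_temp_c": "Daily mean exhaust temperature in Celsius"},
        steward="ops-analytics",
    ))
    print(lineage("turbine_daily_avg"))  # ['raw_turbine_readings']
```

The `steward` field is where a collaborative process would hook in: curation and stewardship decisions are attached to the data itself rather than kept in separate documents.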
Best Practices for a Successful Data Lake: Remove Silos: Unify Data Management
• Pervasive data quality (DQ), masking…
• Consistent operationalization
• Single platform for all use cases & personas
Pitfalls:
  o Fragmented tools
  o Hand coding
  o Shadow IT
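"Pervasive DQ, masking…" suggests running data quality checks and masking sensitive fields as a standard step in every pipeline rather than as one-off scripts. The Python sketch below shows one hypothetical way to do both in a single reusable function; the rule set, the field names, and the use of a salted SHA-256 hash for pseudonymization are assumptions for illustration, not Talend's data quality or masking components.

```python
import hashlib
import re
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical data quality rules: field name -> predicate that must hold.
DQ_RULES: Dict[str, Callable[[Any], bool]] = {
    "customer_id": lambda v: isinstance(v, str) and v != "",
    "email": lambda v: isinstance(v, str)
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
}

SENSITIVE_FIELDS = ("email",)  # hypothetical: fields masked before wide delivery
SALT = b"demo-salt"            # placeholder only; real secrets need proper management

def mask(value: str) -> str:
    """Deterministic pseudonymization: the same input always yields the same
    token, so masked columns can still be joined."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]

def check_and_mask(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into (accepted and masked, rejected with failed rule names)."""
    accepted: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for rec in records:
        failures = [f for f, rule in DQ_RULES.items() if not rule(rec.get(f))]
        if failures:
            rejected.append((rec, failures))  # route to stewardship, do not drop silently
            continue
        masked = dict(rec)  # copy so the caller's record is left untouched
        for f in SENSITIVE_FIELDS:
            masked[f] = mask(masked[f])
        accepted.append(masked)
    return accepted, rejected
```

Keeping rules and masking in one shared function is what makes them "pervasive": every job that calls it applies the same checks, which is the opposite of the fragmented, hand-coded approach listed under pitfalls.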
Best Practices for a Successful Data Lake: Deliver Data to a Wide Audience
• Make data accessible
• Governed self-service
• Scalable operationalization
Pitfalls:
  o Unmanaged autonomy
  o Self-service tools for the tech-savvy
Best Practices for a Successful Data Lake
GET READY FOR CHANGE
Ingestion Best Practices
Sources: transactions, messages & events, logs, sensors
Processing: collect and distribute, track, streaming (windowing, alert)
Consumers: data analytics & data science, real-time data visualization, real-time indicators / scorecard
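The streaming, windowing, and alert stages above can be illustrated with a tumbling window (fixed, non-overlapping time buckets): events are grouped by bucket, aggregated per key, and an alert fires when an aggregate crosses a threshold. The Python sketch below assumes events arrive in timestamp order; the `Event` fields, window length, and threshold are hypothetical. A production pipeline would typically use a stream processor such as Spark Structured Streaming, which also handles late and out-of-order data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

@dataclass
class Event:
    ts: float     # event time in seconds (hypothetical field)
    key: str      # e.g. a sensor or zone identifier (hypothetical)
    value: float  # the measurement being aggregated

def tumbling_averages(events: Iterable[Event], window_s: float) -> Iterator[Tuple[float, str, float]]:
    """Yield (window_start, key, average) whenever a window closes.
    Assumes events arrive in non-decreasing ts order."""
    current_start: Optional[float] = None
    sums: Dict[str, Tuple[float, int]] = {}
    for ev in events:
        start = ev.ts - (ev.ts % window_s)  # align the event to its window
        if current_start is not None and start != current_start:
            # A new window has begun: emit the finished one and reset.
            for key, (total, count) in sorted(sums.items()):
                yield current_start, key, total / count
            sums = {}
        current_start = start
        total, count = sums.get(ev.key, (0.0, 0))
        sums[ev.key] = (total + ev.value, count + 1)
    # Flush the last (possibly partial) window at end of input.
    if current_start is not None:
        for key, (total, count) in sorted(sums.items()):
            yield current_start, key, total / count

def alerts(windows: Iterable[Tuple[float, str, float]], threshold: float) -> List[str]:
    """Turn window averages above the threshold into alert messages."""
    return [f"ALERT {key} window@{start:.0f}s avg={avg:.1f} > {threshold}"
            for start, key, avg in windows if avg > threshold]

if __name__ == "__main__":
    events = [Event(0.0, "a", 10.0), Event(30.0, "a", 20.0),
              Event(61.0, "a", 100.0), Event(90.0, "a", 120.0)]
    for msg in alerts(tumbling_averages(events, 60.0), threshold=100.0):
        print(msg)  # ALERT a window@60s avg=110.0 > 100.0
```

Emitting a window only when a later event arrives keeps memory bounded to one window of state, which is the property that lets this pattern run over an unbounded stream.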
NYC Taxi Data Streaming
Disclaimer
• The future features described in this presentation are under consideration by Talend and are not commitments for future products, technologies, or services.
• The roadmap is subject to change, and Talend does not guarantee the features or release dates.
Roadmap 2017
Addressing the needs of large enterprises
Big Data
1st on Spark 2.0 & Data Prep on Big Data
Data Prep & Data Ingestion
Cloud Self-service
Data Stewardship & Self-service connectors
Governance
Apache Atlas
Key Takeaways
• Analyze far more data to find more opportunities for innovation and transformation
• Real-time data streaming brings increased agility
• To unleash data lakes, data governance is essential
Free Trial: Talend Big Data Sandbox
• A ready-to-run Docker environment
• A step-by-step expert guide
• Real-world scenarios using Spark, Kafka, MapReduce & NoSQL
www.talend.com/BigDataSandbox
Hit the Easy Button for Hadoop, Spark and Machine Learning
Thank You