A company of Daimler AG
ARE DATA LAKES THE NEW CORE DWHS?
ANDREAS BUCKENHOFER, DAIMLER TSS
DOAG BIG DATA, REPORTING, GEODATA DAYS - KASSEL 2017
ABOUT ME
https://de.linkedin.com/in/buckenhofer
https://twitter.com/ABuckenhofer
https://www.doag.org/de/themen/datenbank/in-memory/
http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/
https://www.xing.com/profile/Andreas_Buckenhofer2
Andreas Buckenhofer
Senior DB [email protected]
Since 2009 at Daimler TSS
Department: Big Data
Business Unit: Analytics
DAIMLER TSS. IT EXCELLENCE: COMPREHENSIVE, INNOVATIVE, CLOSE.
We're a specialist and strategic business partner for innovative IT solutions within Daimler, not just another supplier! As a 100% subsidiary of Daimler, we live the culture of excellence and aspire to take an innovative and technological lead. With our outstanding technological and methodical know-how we are a competent provider of services that help those who benefit from them to stand out from the competition. When it comes to demanding IT questions we create impetus, especially in the core fields of car IT and mobility, information security, analytics, shared services and Digital Customer Experience.
Are Data Lakes the new Core DWHs?Daimler TSS GmbH 3
TSS 2020: ALWAYS ON THE MOVE.
LOCATIONS
Daimler TSS China, Hub Beijing: 6 employees
Daimler TSS Malaysia, Hub Kuala Lumpur: 38 employees
Daimler TSS India, Hub Bangalore: 16 employees
Daimler TSS Germany: more than 1000 employees
• Ulm (Headquarters)
• Stuttgart Area: Böblingen, Echterdingen, Leinfelden, Möhringen
• Berlin
• Karlsruhe
AGENDA
1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary
• Software is becoming more and more important
  • 100 million lines of code
• Physical products
  • are significantly enhanced with digital service capabilities, e.g. the value of the car comes increasingly from digital assets
  • become digital services, e.g. car2go
• IoT, robotics, etc.
DIGITIZATION – DATA AS AN ASSET FOR ANALYTICAL DECISIONS
Source image: https://www.linkedin.com/pulse/20140626152045-3625632-car-software-100m-lines-of-code-and-counting
Agility
• Is the organization ready? IT (Dev + Ops) and business
Flexibility
• Data modeling under pressure, model as you go
• New data formats coming from logs, sensors, etc.
Performance
• Right time
• Scale to high volumes
• Integrate data arriving at high speed
DWH AS INTEGRATION SYSTEM FOR DIGITAL ASSETS
SOME OF TODAY'S MAIN CHALLENGES
IS THE DATA WAREHOUSE DEAD? AND ETL, TOO?
Sources:
https://www.linkedin.com/groups/45685/45685-6224210695295168512?trk=hp-feed-group-discussion&_mSplash=1
https://speakerdeck.com/nehanarkhede/etl-is-dead-long-live-streams
https://gcn.com/blogs/reality-check/2014/01/hadoop-vs-data-warehousing.aspx
AGENDA
1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary
REFERENCE DATA WAREHOUSE ARCHITECTURE
[Figure: reference data warehouse architecture. Internal and external data sources (e.g. OLTP systems) feed the backend of the Data Warehouse, which serves the frontend. Layers shown: Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer, Mart Layer (Output Layer / Reporting Layer). Cross-cutting components: Metadata Management, Security, DWH Manager. Annotation: subject-oriented, integrated, time-variant, non-volatile.]
Data Lake on Hadoop
Data Swamp
Data Reservoir
Landing Zone
Data Library
Data Repository
Data Archive
Data Lake on Spark
Data Lake 3.0
DATA LAKE REFERENCE ARCHITECTURE
DATA LAKE OVERALL ARCHITECTURE VS DATA LAKE LAYER
[Figure: data lake reference architecture. Zones shown: Landing Zone, Data Lake, Data Reservoir / Presentation, Data Archival. Cross-cutting components: Data Governance, Metadata Management, Data Security.]
DATA LAKE REFERENCE ARCHITECTURE
[Figure: data lake reference architecture with example tools. Zones shown: Landing Zone, Data Lake, Data Reservoir / Presentation, Data Archival. Cross-cutting components: Data Governance, Metadata Management, Data Security. Additional labels: Sources, Firewall (twice), Sqoop, Kafka, Knox, REST API, ODBC/JDBC, RESTful client.]
Data Lake
• Architecture, concept
Hadoop, Spark, Elastic Stack
• Tools (that can be used to implement a Lake)
DATA LAKE VS HADOOP
• Data has a structure: schema-less does not exist
• You apply
  • schema-on-read, e.g. copy files (csv, json, html, …) into HDFS
  • schema-on-write, e.g. create table on data files in HDFS
HOW TO STRUCTURE THE DATA LAKE?
SCHEMA-LESS REVOLUTION?
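The difference between the two approaches can be illustrated with a minimal Python sketch. It is not taken from the talk: the sample file content, the `SCHEMA` mapping and both function names are hypothetical, and plain in-memory structures stand in for HDFS files and tables.

```python
import csv
import io

# Hypothetical raw sensor file, stored exactly as delivered.
RAW_CSV = (
    "vin,ts,soc\n"
    "WDB111,2017-03-01T08:00:00,81.5\n"
    "WDB222,2017-03-01T08:05:00,n/a\n"
)

# Hypothetical schema: column name -> parser (an assumption for illustration).
SCHEMA = {"vin": str, "ts": str, "soc": float}


def read_with_schema(raw_text, schema):
    """Schema-on-read: the schema is applied on every read.

    Rows that do not fit the schema are only discovered now, by the reader.
    """
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            good.append({col: parse(row[col]) for col, parse in schema.items()})
        except (KeyError, ValueError):
            bad.append(row)
    return good, bad


def write_with_schema(rows, schema, table):
    """Schema-on-write: rows are validated before they are stored.

    A row that does not fit raises an error at load time.
    """
    for row in rows:
        table.append({col: parse(row[col]) for col, parse in schema.items()})
    return table
```

With schema-on-read the broken second row is stored without complaint and every reader has to deal with it; with schema-on-write the same row is rejected once, by the loader.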
Flexibility
• For whom? Writing the data vs reading the data
Simplicity
• For whom? Writing the data vs reading the data
• Human mistakes while trying to read the data
Agility / Model as you go
• Just copy files into the directory
SCHEMA-ON-READ
LAMBDA ARCHITECTURE
AN EARLY COMPREHENSIVE BIG DATA ARCHITECTURE
Source image: Nathan Marz, James Warren: Big Data: Principles and best practices of scalable realtime data systems, Manning Publications 2015
• The complexity of the Lambda architecture is debatable
• More interesting is the authors' view on data
  • Rawness: Store the data as it is. No transformations.
  • Immutability: Don't update or delete data, just add more.
• Graph-like schema recommended
LAMBDA ARCHITECTURE
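The rawness and immutability principles can be sketched as an append-only fact log from which views are recomputed. This is a minimal illustration, not code from the book or the talk; `FactLog` and `current_view` are hypothetical names, and an in-memory list stands in for the master dataset.

```python
class FactLog:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self):
        self._facts = []

    def append(self, entity, attribute, value, timestamp):
        # Each fact is kept as delivered, together with its timestamp.
        self._facts.append((entity, attribute, value, timestamp))

    def facts(self):
        # Return an immutable snapshot so callers cannot modify history.
        return tuple(self._facts)


def current_view(facts):
    """Recompute the latest value per (entity, attribute) from all facts."""
    latest = {}
    for entity, attribute, value, _ts in sorted(facts, key=lambda f: f[3]):
        latest[(entity, attribute)] = value
    return latest
```

A "change" is just another appended fact; the previous value stays in the log, so any earlier state can be recomputed.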
"Many developers go down the path of writing their raw data in a schemaless format like JSON. This is appealing because of how easy it is to get started, but this approach quickly leads to problems. Whether due to bugs or misunderstandings between different developers, data corruption inevitably occurs."
(see page 103, Nathan Marz, "Big Data: Principles and best practices of scalable realtime data systems", Manning Publications)
Just dumping data into the Lake?
• General Data Protection Regulation, e.g. Privacy by Design
• The vehicle identifier (VIN) is already sensitive data that needs to be protected (anonymized) depending on usage
• Earmarked use of data
Schema-on-read: How do you protect data assets if you are not aware that the data exists or where it exists?
STRUCTURING THE DATA LAKE
DATA SECURITY
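One common way to protect an identifier such as the VIN is a keyed hash. The sketch below is an assumption for illustration, not the method used at Daimler; the function name and the key handling are hypothetical. Note that a keyed hash is pseudonymization rather than full anonymization, since whoever holds the key can re-link records.

```python
import hashlib
import hmac


def pseudonymize_vin(vin, secret_key):
    """Replace a VIN by a keyed SHA-256 hash (HMAC).

    The same VIN and key always give the same token, so joins across
    tables still work, while the original VIN is not stored.
    """
    normalized = vin.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
```

Such a step only works if it is known which columns contain a VIN, which is the point of the schema-on-read question above.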
DATA LAKE REFERENCE ARCHITECTURE
[Figure: data lake reference architecture with data flow. Zones shown: Landing Zone, Data Lake, Data Presentation, Data Archival. Cross-cutting components: Data Governance, Metadata Management, Data Security. Flow labels: load, structure, transform, access, archive (three times). Annotations: temporary storage; raw data; immutable, modeled data, tool neutral; structured data for fast access.]
Distinguish Data Lake as overall concept vs Data Lake as a layer
• Landing Zone
  • Source data programmatically loaded
  • Data is partitioned for processing
  • Governance includes catalog and ILM (Security, Retention)
• Data Lake
  • Lightly integrated by keys
  • Data accessible via SQL-on-Hadoop or using SerDes on raw data
  • Data is partitioned for access
  • Governance includes catalog, ILM, lightweight model
DATA LAKE HAS LAYERS (1)
DATA LAKE AS CONCEPT VS DATA LAKE AS LAYER
• Presentation Zone
  • Data is structured and partitioned/tuned for data access
  • Full Governance including e.g. catalog, ILM, model
    • Known schema including metadata about tables and columns
    • Lineage
    • Documented quality
DATA LAKE HAS LAYERS (2)
GOVERNANCE BY DAIMLER AG / COE
E.G. SAMPLE HDFS LAYOUT
/
scripts
data
Source_system
Landing_zone
scripts
data
Source_system
Data_archive
scripts
data
Source_system_object
Data_lake
model
data
Data_science_results
scripts
data
Use_case
Data_reservoir
scripts
data
Data_science_sandbox
AGENDA
1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary
USE CASES
WHAT IS THE BUSINESS PROBLEM TO SOLVE?
Source: http://www.azquotes.com/
USE CASE: ANALYSIS BATTERY AGING
[Chart: max capacity vs current capacity]
• CSV data ingested into HDFS, Hive tables on files
• Identify breaks (“> 8h”) and compute current drain
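The break detection could look roughly like the following Python sketch. It rests on assumptions that are not in the slides: readings are `(timestamp, charge_ah)` pairs sorted by time, a "break" is a gap of more than 8 hours between consecutive readings, and "current drain" is read as the charge lost per hour during such a gap. The function name `find_breaks` is hypothetical, and the actual implementation ran on Hive tables rather than Python lists.

```python
from datetime import datetime, timedelta


def find_breaks(readings, min_break=timedelta(hours=8)):
    """Return (start, end, drain_ah_per_hour) for every gap > min_break.

    readings: list of (timestamp, charge_ah) tuples, sorted by timestamp.
    """
    breaks = []
    for (t0, c0), (t1, c1) in zip(readings, readings[1:]):
        gap = t1 - t0
        if gap > min_break:
            hours = gap.total_seconds() / 3600
            # Charge lost while the vehicle was idle, per hour of idle time.
            breaks.append((t0, t1, (c0 - c1) / hours))
    return breaks
```

Example: a reading of 60.0 Ah in the evening and 59.4 Ah twelve hours later yields one break with a drain of about 0.05 Ah per hour.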
• Sensor data formats change without notice
  • Sensors get regularly updated with new versions
  • Names of metrics may change
  • Sensors with various versions in the field
  • Sensors from different suppliers
• Often many fields (>>100), increasing with new sensor versions
• Easy storing of data in HDFS and applying schema later
• Data from robots, vehicles, …
STRUCTURING THE DATA LAKE
NEW DATA SOURCES – SENSOR DATA
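Changing metric names across sensor versions and suppliers are typically handled with a mapping to canonical names that is applied when the data is read. The sketch below is illustrative only; the supplier names, versions, metric names and the `normalize` function are all hypothetical.

```python
# Hypothetical mapping: (supplier, firmware version) -> {raw name: canonical name}
ALIASES = {
    ("supplier_a", "1.0"): {"temp": "temperature_c", "volt": "voltage_v"},
    ("supplier_a", "2.0"): {"temperature": "temperature_c", "voltage": "voltage_v"},
}


def normalize(record, supplier, version):
    """Map raw metric names of one sensor record to canonical names.

    Returns the normalized record plus the list of fields that are not
    covered by the mapping, so that silent format changes become visible.
    """
    mapping = ALIASES.get((supplier, version))
    if mapping is None:
        raise LookupError(f"no mapping for {supplier} version {version}")
    known = {mapping[name]: value for name, value in record.items() if name in mapping}
    unknown = sorted(name for name in record if name not in mapping)
    return known, unknown
```

Reporting unknown fields instead of dropping them is a design choice: a new sensor version then shows up as a list of unmapped names rather than as missing data.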
• Sensor data formats change without notice
• Time-consuming and error-prone data integration into the Data Lake
• Therefore preparation of data for usage in the Data Reservoir required: "Data Engineer"
STRUCTURING THE DATA LAKE
"SCHEMA-ON-READ"
[Figure: data lake reference architecture applied to a use case. Zones shown: Landing Zone, Data Lake, Data Reservoir, Data Archival. Cross-cutting components: Data Governance, Metadata Management, Data Security. Labels: csv, sampling / filter, Hive tables (twice), structure, R, Python.]
USE CASE: OPTIMIZE CYCLE TIME FOR LIGHTWEIGHT ROBOTS
• JSON data from Orient NoSQL-DB ingested into HDFS, Hive tables on files
• Partly automate the diagnosis of anomalies (e.g. the identification of reasons for idle times)
USE CASE: BOM EXPLOSION
HADOOP COMPUTING POWER
• PLMXML files supplied by source systems
• Compute changes by comparing last BOM with current BOM
• Data Lake contains data across all tiers
• Data Reservoir contains "dedicated, secured" views for tiers
• Transfer changes to local relational DBs
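Comparing the last BOM with the current BOM amounts to a keyed diff. A minimal Python sketch, assuming each BOM has already been flattened to a mapping from part number to quantity (the PLMXML parsing is omitted, and `diff_bom` is a hypothetical name):

```python
def diff_bom(previous, current):
    """Compare two flattened BOMs given as {part_number: quantity}.

    Returns three dicts: added parts, removed parts, and parts whose
    quantity changed as {part_number: (old_quantity, new_quantity)}.
    """
    added = {p: q for p, q in current.items() if p not in previous}
    removed = {p: q for p, q in previous.items() if p not in current}
    changed = {
        p: (previous[p], q)
        for p, q in current.items()
        if p in previous and previous[p] != q
    }
    return added, removed, changed
```

Only the three result sets need to be transferred to the local relational DBs, not the full BOM.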
• Several stakeholders, e.g. different (independent) truck units
• Dumping existing systems (or new data sources like logs) into the Data Lake
  • Data is available fast, but
  • Different data models
  • No integration: if ETL is reduced to EL, then T is performed by Data Scientists many times
• Some lightweight data integration required
Data Vault
STRUCTURING THE DATA LAKE LAYER
EXISTING INTERNAL DATA FOR ANALYTICS
• Hub and Link tables: how to ensure uniqueness?
  • No unique constraints or indexes like RDBMS
  • Use View with distinct or group by on Hub or Link table
  • Don't create Hub or Link table. Create view with distinct or group by on original persisted incoming files
  • Use HBase NoSQL wide-column store for Hub, Link (+ Sat) and Phoenix for SQL access via Hive
  • Hub and Link in RDBMS only
• Data Reservoir needs different structure or export data into Data Mart in RDBMS for faster access
STRUCTURING THE DATA LAKE LAYER
DATA VAULT CHALLENGES WITH HADOOP
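The "view with group by on the incoming files" option can be sketched as follows. SQLite (Python standard library) stands in for Hive here purely so that the example is runnable; the table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the persisted incoming files: the same business key
# (vin) arrives with several loads, and nothing enforces uniqueness.
conn.execute(
    "CREATE TABLE landing_vehicle (vin TEXT, load_ts TEXT, record_source TEXT)"
)
conn.executemany(
    "INSERT INTO landing_vehicle VALUES (?, ?, ?)",
    [
        ("WDB111", "2017-03-01", "crm"),
        ("WDB111", "2017-03-02", "crm"),
        ("WDB222", "2017-03-02", "crm"),
    ],
)

# The Hub is not materialized. A view with GROUP BY yields exactly one
# row per business key, with the first load timestamp seen for that key.
conn.execute(
    """
    CREATE VIEW hub_vehicle AS
    SELECT vin, MIN(load_ts) AS load_ts, MIN(record_source) AS record_source
    FROM landing_vehicle
    GROUP BY vin
    """
)

hub = conn.execute("SELECT vin, load_ts FROM hub_vehicle ORDER BY vin").fetchall()
```

The trade-off is that uniqueness is computed at every read instead of being guaranteed once at write time.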
• Vision: One central Enterprise DWH
• Reality for many organizations: Many DWHs
  • more flexible
  • acquisition of companies. Merge of systems?
  • units with different (innovation) speeds and different interests, e.g. trucks (Mercedes Benz LKW, Freightliner, Fuso, BharatBenz, Western Star, Fleetboard)
  • legal requirements (e.g. data export)
• Vision: One central Data Lake
• Reality: ?
DATA LAKE IN ANALOGY TO AN ENTERPRISE DWH?
“The long-term vision was clear – the data warehouse should not be confined physically to a single database or machine” (09-MAR-2017)
BARRY DEVLIN – LOGICAL DATA WAREHOUSE
Source: https://upside.tdwi.org/articles/2017/03/09/making-the-most-of-a-logical-data-warehouse.aspx
Barry Devlin wrote the first published article describing a data warehouse architecture in 1988 ( http://www.9sight.com/1988/02/art-ibmsj-ebis/ )
AGENDA
1. Introduction/Motivation
2. From the classic DWH architecture to the Data Lake
3. Data Lake usage scenarios
4. Summary
“Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application.”
• Learn about the data and promote collective data understanding
• Derive security classification and measures
• Design for performance
• Accelerate development
• Improve Software quality
• Reduce maintenance costs
• Generate code
• NoSQL Schema-on-read: understand model versions after years
WHY DATA MODELING?
Source quote: Steve Hoberman: Data Modeling for MongoDB, Technics Publications 2014
DWH AND DATA LAKE
DWH on RDBMS: Methods, Concepts, Techniques
Slowly Changing Dimension, ELT vs ETL, 3-Layer vs 2-Layer, Kimball Approach, Inmon Definition, Star Schema, Data Vault, Anchor Modeling, etc.
Data Lake on Hadoop: Tools, Tools, Tools
Schema-on-Read, Agility, Parquet, Hive, HBase, SQL-on-Hadoop, Impala, Oozie, ZooKeeper
Many ETL problems are home-made, e.g.
• Inefficient: ETL vs ELT / row-based vs set-based
• Expensive: repetitive tasks should be accomplished with generators
NO DATA INTEGRATION - IS ETL DEAD?
DATA SCIENCE REQUIRES PROPER DATA ENGINEERING
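The row-based vs set-based point can be made concrete with a small sketch. SQLite (Python standard library) is used only to keep the example runnable; the table names and the cents-to-euro transformation are hypothetical.

```python
import sqlite3


def setup():
    """Create a staging table with 1000 orders and an empty core table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stage_orders (id INTEGER, amount_cents INTEGER)")
    conn.execute("CREATE TABLE core_orders (id INTEGER, amount_eur REAL)")
    conn.executemany(
        "INSERT INTO stage_orders VALUES (?, ?)",
        [(i, i * 100) for i in range(1, 1001)],
    )
    return conn


def load_row_based(conn):
    """Row-based: fetch every row, transform it in the client, insert it again."""
    rows = conn.execute("SELECT id, amount_cents FROM stage_orders").fetchall()
    for order_id, cents in rows:
        conn.execute(
            "INSERT INTO core_orders VALUES (?, ?)", (order_id, cents / 100.0)
        )


def load_set_based(conn):
    """Set-based: one statement, the transformation runs inside the engine."""
    conn.execute(
        "INSERT INTO core_orders SELECT id, amount_cents / 100.0 FROM stage_orders"
    )
```

Both functions produce the same result; the set-based variant avoids one round trip per row and leaves the execution strategy to the engine, which is the core of the ELT idea.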
Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms — it’s the data collection and labeling. Source: https://medium.com/startup-grind/fueling-the-ai-gold-rush-7ae438505bc2#.ywjvuca6z (Luke de Oliveira)
Data Lakes currently focus too much on tools instead of concepts and methods
• Tools come and go
• Flexibility / schema-on-read: integration is just postponed to the Data Reservoir or, in the worst case, even later to the end user
PoCs vs production-ready implementation
• Many tools, but still low-productivity tools (Oozie, etc.)
• Error handling is a coding nightmare across tools
Data Lakes and Core DWHs will coexist
• Another choice that makes sense for many use cases
• DWH: e.g. the Data Vault 2.0 architecture, which stores raw data and postpones data cleansing / harmonization for lightweight data integration, has similar ideas
IS THE CLASSICAL DWH DEAD?
ARE DATA LAKES THE NEW CORE DWHS?
Daimler TSS GmbH
Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99
[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS
Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
THANK YOU
GARTNER DATA LAKE ARCHITECTURE STYLES
Source: http://blogs.gartner.com/nick-heudecker/data-lake-webinar-recap/
• Inflow Lake: accommodates a collection of data ingested from many different sources that are disconnected outside the lake but can be used together by being colocated within a single place
• Outflow Lake: a landing area for freshly arrived data available for immediate access or via streaming. It employs schema-on-read for the downstream data interpretation and refinement.
• Data Science Lab: most suitable for data discovery and for developing new advanced analytics models
GARTNER DATA LAKE ARCHITECTURE STYLES
Source: http://blogs.gartner.com/nick-heudecker/data-lake-webinar-recap/ and https://www.asug.com/news/gartner-separate-data-lakes-myths-from-facts-before-you-dive-in
Slide 12: Creative Commons Licence, Hernán Piñera
https://www.flickr.com/photos/hernanpc/7175577368/in/photolist-bW5Hab-JF9HNW-a2LHAF-pwWNjx-oC1Jq8-noeV4d-oLsHUa-gUjhFx-qNB2Sw-jKLDCR-DB3B8-pRUpx2-crB6A7-nTUuNp-cXdPgN-bX7mA4-7oHeKJ-arQCtK-njdhWh-nSadX3-dykooG-sjSZHV-eq69Ux-oW44NF-i2eUbE-5AyaGL-QkmoFh-nU7KcU-QEG6Nf-oziZ4t-oUbQi4-e2NWAT-i3Yna1-eJchKZ-pGC8eC-GDux8r-5FQt95-cWdzfh-ciwtqL-jQg8BL-4X83Uc-nBZXBA-nogVER-oekb6A-9F7w4M-jKPnYQ-bAGrjd-qNB4Hq-8gJRqp-ahC2fg
Slide 47: Creative Commons Licence, James Loesch
https://www.flickr.com/photos/jal33/5182574275/in/photolist-8TY3LT-7M8Fb9-4jWYv1-hrdbHV-4jSWSn-6cHmvc-m4NnDV-s9Efoy-ccFCcW-5t3Csw-8R87fq-mT6WNq-89mMuL-pzzDjq-2iq7ti-bBA7PT-rjPdnX-buU2V9-aottwt-4zHTZv-mT6gA6-5hLzzx-9aWGiZ-s9DJRY-jwfgr3-7WZA75-bVmho1-bXkF7U-9aWGba-3mJSwv-sa4Esa-4jWZaA-aottqr-8bj7rS-5NiZbm-oowJXV-3vp25c-5t3EkQ-NnLMaJ-naLPJm-m78nWk-nqnUYk-mT7Wso-o54T1J-bVmgA9-emeyU1-5hQFV5-akhQQL-naLDim-pPeh93
IMAGE ATTRIBUTION
DWH = inflexible development, bad performance, complex architecture with 3 layers
Failure to talk to business to obtain proper requirements
Ingestion of wrong data
Storage of data with errors
Business Keys (independent object) nested into document
Read performance
SCHEMA-ON-READ OR WHY MODELING CAN STILL BE USEFUL
SCHEMA-ON-READ OR WHICH BUSINESS PROBLEMS ARE SOLVED
                       Schema-on-read   Remark
Data storage           Yes, flexible    Store data from various systems
Data integration       No               Integrate data from various systems.
                                        Has to be done during each access by each user
Data historization     Yes, auditable   Stamp data with timestamp
Information delivery   No               Turn data into valuable information.
                                        Has to be done during each access by each user
DATA MODELS IN THE DWH
Layer                  Characteristics                             Data Model
Staging Layer          Temporary storage                           Normally 1:1 copy of source table structure,
                       Ingest of source data                       usually without constraints and indexes
Core Warehouse Layer   Historization / bitemporal data             3NF with historization
                       Integration                                 Head and Version modeling
                       Tool-independent                            Data Vault
                       Non-redundant data storage                  Anchor modeling
                                                                   Dimensional model with historization (possible)
Data Mart Layer        Performance for end user queries required   Flat structures, esp. dimensional model
                       Tool-dependent                              (ROLAP / MOLAP / HOLAP)
                       Lots of joins necessary to answer
                       complex questions
Understand business requirements
Understand problem space
Design solution space
Think ideas (incl. alternatives) through
WHY MODEL?
SQL is the universal language to access and manipulate data in an RDBMS
SQL is a language not only for DBAs or developers
SQL is the standard for OLTP and OLAP, especially for BI tools
MAKE SQL GREAT AGAIN OR WHY SQL ON BIG DATA?
STRATA 2012 VS 2016
Source: http://www.cazena.com/blog/strata-word-cloud-2012-vs-2016-data-lakes-spark-real-time-and-other-trends
• Architecture with Atlas
• Supports the classical tools:
  • Hive
  • Sqoop
• HDFS?
• Schema-on-read?
ATLAS FOR METADATA MANAGEMENT
NO DATA INTEGRATION NECESSARY OR
WHO REALLY UNDERSTANDS DATA MODELS?
Source: Corr / Stagnitto: Agile Data Warehouse Design, DecisionOne Press, 2011, page 5
• 3NF is inefficient for query processing
• 3NF models are difficult to understand
• 3NF gets even more complicated with history added
• Many ways from person to order
WHY DATA MODELING?
"Expanding your modeling skills enables you to reduce documentation."
Scott Ambler
• Standard approach in Data Marts in DWH
• Not just for performance reasons
  • Performance is also an issue on Hadoop-based systems, e.g. Hive, Spark
  • Joins!
• But also due to understandability for end users
  • Understandability is also an issue on Hadoop-based systems
DIMENSIONAL MODELING
"A prime motivation for this evolution towards a more “database-like” system was driven by the experiences of Google developers trying to build on previous “key-value” storage systems. The prototypical example of such a key-value system is Bigtable, which continues to see massive usage at Google for a variety of applications. However, developers of many OLTP applications found it difficult to build these applications without a strong schema system, cross-row transactions, consistent replication and a powerful query language."
Source: https://research.google.com/pubs/pub46103.html
IMPORTANCE OF STRONG SCHEMA @GOOGLE
HADOOP VS CLASSIC DWH
SQL APPROACH
                      Classic DWH                       Hadoop
Tables                Yes                               Yes
SQL language          Yes                               Yes, SQL-on-Hadoop
Query Optimizer       Yes                               Yes
Indexes, PKs          Yes                               No
Data "Owner"          Proprietary RDBMS                 Open data format
                                                        Access by many engines like Spark, Hive
                                                        Many open formats like Parquet, Avro
Metadata dictionary   User data + dictionary in RDBMS   User data and dictionary ("Hive Metastore") separate
New data sources
• Sensors, logs, NoSQL, etc. as data source
• Schema-on-read is useful as sensor data formats change frequently
Existing internal data
• Dump RDBMS exports into Data Lake for data analytics
• Schema-on-read does not make any sense as data is already in a documented data model
STRUCTURING THE DATA LAKE