Date post: | 16-Apr-2017 |
Category: | Data & Analytics |
Upload: | dataconomy-media |
Data Virtualization: Fulfilling the Promise of Data Lakes
Dr. Christian Kurze
Principal Sales Engineer – [email protected]
Agenda
Key questions I want to answer today:
• What is Data Virtualization?
• How to leverage Hadoop Data Lakes to support Internet of Things / Operational Data Store / Offloading / … use cases?
• How to query Hadoop Data Lakes combined with any other structured, semi-structured and unstructured data sources using a single logical data lake? What about Cloud?
• How to avoid Data Swamps via a lightweight data governance approach that helps enterprises maximize the value of their Data Lake?
• How to use a logical data lake/data warehouse to prevent a physical data lake from becoming a silo?
Status Quo – Data Integration
Access to all information from disparate silos and technologies
The Requirement…
• Users: Marketing, Sales, Executive, Support
• Topics: Cross-sell / Up-sell, Channel, Warranty, Product, Customer
• Access to complete information, in an economically meaningful way, in real time and in high quality, incl. monitoring, security and audit
… versus the current architecture
• Sources: Database, Apps, Warehouse, Cloud, Big Data, Documents, NoSQL
• Manual access to legacy systems and constantly new technologies (IoT, Big Data, Cloud)
• Point-to-point connections
• Projects too slow for new initiatives
"My architecture works fine, but I am not able to access all my silos." - Enterprise Data Architect
• Different locations
• Different technologies
• Different data structures
• Datasets too large to move
• Different APIs and access methods
• Excessive use of ETL to copy data
• Synchronization issues
The Solution: Data Virtualization as a Data Abstraction Layer
DATA ABSTRACTION LAYER
• Central repository to access all data
• Abstracts the underlying technology of the data sources
• Enables the definition of a semantic data model
• Offers a metadata-rich catalog
• Multiple access methods:
  • SQL based
  • Keyword based search (via index)
  • RESTful navigation (hyperlinks)
  • Native support for nested document structures (XML, JSON, …)
Modelling in a Data Virtualization Solution
• Sources: Hadoop, Web Site, REST Web Service, Multidimensional, Salesforce, SQL Server, Oracle
• Base View (Source Abstraction): Client, Address, ClientType, Company, Invoicing, ServiceUsage, Product, Logs, WebIncidents
• Combine, Transform & Integrate: Customer, Invoice, Product
• Publish: Customer Invoicing, Service Usage, Incident
• Access: SQL, SOAP, REST, OData, Message Queues (JMS), etc., and Denodo's Information Self Service
Independent of the access method, all views use the same metadata and access privileges.
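The layering above (base views, then combined views, then published views) can be sketched in a few lines of Python. This is an illustration under assumptions, not Denodo's modelling language: the function names `bv_client`, `bv_invoicing` and `customer_invoicing`, and all sample rows, are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


# Base views: one per source object, exposing source data under stable names.
def bv_client() -> List[Row]:
    return [{"client_id": 1, "name": "Acme"}, {"client_id": 2, "name": "Globex"}]


def bv_invoicing() -> List[Row]:
    return [
        {"client_id": 1, "amount": 120.0},
        {"client_id": 1, "amount": 80.0},
        {"client_id": 2, "amount": 50.0},
    ]


# Derived view: combines two base views without copying data anywhere;
# it is evaluated on demand each time a consumer asks for it.
def customer_invoicing() -> List[Row]:
    totals: Dict[int, float] = {}
    for invoice in bv_invoicing():
        totals[invoice["client_id"]] = totals.get(invoice["client_id"], 0.0) + invoice["amount"]
    return [
        {"customer": client["name"], "total_invoiced": totals.get(client["client_id"], 0.0)}
        for client in bv_client()
    ]


published = customer_invoicing()
```

Because the derived view only references base views, a change in a source affects one base view definition rather than every consumer.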
Common Data Virtualization Use Cases
BIG DATA, CLOUD INTEGRATION
• Advanced Analytics
• Data Warehouse Offloading
• Big Data for Enterprise
• Cloud / SaaS Integration
AGILE BUSINESS INTELLIGENCE
• Logical Data Warehouse
• Virtual Data Marts
• Self-Service BI
• Operational BI / Analytics
SINGLE VIEW APPLICATIONS
• Single Customer View - Call Centers, Portals
• Single Product View - Catalogs
• Single Inventory View - Inventory Reconciliation
• Vertical Specific - Single View of Wells
DATA SERVICES
• Unified Data Services Layer
• Logical Data Abstraction
• Agile Application Development
• Linked Data Services
Multiple platforms optimized for different workloads
Additionally in a hybrid environment: on-premises vs. cloud
[Diagram: platform categories and the workloads they serve]
• Categories: DWH & Marts; Advanced Analytics (multiple structures); Advanced Analytics (structured); MDM; Streams
• Platforms: EDW, Mart; DW Appliance; NoSQL / Graph DB; Data Lake: Hadoop / Spark / Hive / …; MDM (Cust, Prod; CRUD)
• Workloads: traditional query, reporting & analysis; data mining, model development; investigative analysis, data refinery; graph analysis; real-time stream processing & decision management; governed context information
Business requires a combination of data
[Diagram: MDM (Cust, Prod; CRUD), EDW, Hadoop]
• Who are our customers? What products do we sell?
• What are the most popular navigational paths through our web site that led to high-fee products?
• Who are our most loyal, low risk customers that generate low fees?
• What is the online behavior of our loyal, low risk, low fee customers so that we can offer them higher fee products?
Where do I find this data? How to combine this data? How to share it with my colleagues? What about their access privileges?
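The last business question needs data from two platforms at once. The sketch below is one hedged way to picture such a combination in Python: customer attributes come from a stand-in for the EDW, clickstream records from a stand-in for the data lake, and the customer segment selected on one side restricts what is read from the other. All records, field names and the `max_fees` threshold are made-up assumptions for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical EDW extract: one record per customer.
EDW_CUSTOMERS = [
    {"id": "c1", "loyal": True, "risk": "low", "fees": 10.0},
    {"id": "c2", "loyal": True, "risk": "high", "fees": 5.0},
    {"id": "c3", "loyal": True, "risk": "low", "fees": 8.0},
    {"id": "c4", "loyal": False, "risk": "low", "fees": 300.0},
]

# Hypothetical data lake extract: (customer id, visited page) click events.
LAKE_CLICKS = [
    ("c1", "/loans"), ("c1", "/cards"), ("c2", "/loans"),
    ("c3", "/loans"), ("c4", "/cards"), ("c3", "/savings"),
]


def target_segment(max_fees: float) -> Set[str]:
    """Loyal, low risk customers whose fees do not exceed max_fees (EDW side)."""
    return {
        c["id"]
        for c in EDW_CUSTOMERS
        if c["loyal"] and c["risk"] == "low" and c["fees"] <= max_fees
    }


def page_popularity(customer_ids: Set[str]) -> List[Tuple[str, int]]:
    """Page visit counts for the given customers only (data lake side)."""
    counts = Counter(page for cid, page in LAKE_CLICKS if cid in customer_ids)
    # Most visited first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


segment = target_segment(max_fees=50.0)
popular = page_popularity(segment)
```

Selecting the segment first keeps the clickstream scan restricted to the relevant customers, which is the same idea a federated query optimizer applies when it pushes a filter to the larger source.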
Big Data Connectivity
Big Data and Cloud Databases Connectivity
■ Hadoop ecosystem:
  ■ SQL on Hadoop: Hive, Impala, Presto, …
  ■ HDFS, Parquet, Avro, CSV, …
  ■ Execution of MapReduce jobs
  ■ Certified with major Hadoop distributions
■ In-memory platforms: Apache Spark SQL, Presto DB, HANA, …
■ Parallel DWs and appliances: Vertica, Impala, Teradata, Greenplum, …
■ Cloud databases: Redshift, Snowflake, DynamoDB, …
■ NoSQL: MongoDB, CouchDB, Neo4J, Redis, Oracle NoSQL, Cassandra, etc.
■ Streaming data: Spark streams, Splunk, IBM Streams, Kafka, …
Enhanced Adapters for Big Data ecosystem
■ Delimited text files
■ Sequence files
■ Map files
■ Avro files
How to provide access by multiple tools and technologies?
• Sources: DWH, MDM, Hadoop, Appliances, NoSQL, External Services
• Consumers: Excel / MS BI, Tableau, Power BI, Composite Desktop, 360 Views, Cockpit, Other Applications
• Consumer-side concerns: Complex security policies? RBAC? Single Sign-On (Kerberos), Governance / Audit. Fast prototyping? Automated processes? Manual development of a service layer?
• Source-side concerns: Source changes, new attributes and requirements, accounting of source usage (cloud migration pending), refactoring of sources, new sources
A Single Governed Logical Data Lake
Data Virtualization combines one or more physical data lakes with other enterprise data to create a "virtual" or "logical" data lake.
[Diagram: Data Lakes (Marketing, Research, Finance) and Other Data Sources (MDM, Cloud Apps) feed the Logical Data Lake, which serves Self-Service Analytics, Operational Apps, BI/Analytical Tools, Excel and Reports]
DATA VIRTUALIZATION: Semantic Model, Data Discovery, Metadata Catalog, Security, Governance
Denodo Platform Bridges Distinct Data Architectures
• Simplified Architecture
• Single Point of Access
• Lower TCO
• Lower Operational Costs
• Improved Agility
• Improved Flexibility
• Consistency and Integrity for multiple tools
Information Self Service: E/R diagram
1. Click on a view to navigate to the details
2. Hover on the arrows to show the details of the PK-FK relationships
Information Self Service: Browse Metadata Catalog
1. Browse and search virtual databases
2. Browse and search available views
3. Review metadata and descriptions
4. Query the view
Information Self Service: Search Metadata Catalog
1. Full-text search within view metadata (name, column names, descriptions)
2. Show additional view information and query data
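As a rough illustration of searching view metadata by name, column names and description, here is a small Python sketch. It is not how the Denodo catalog is implemented: the `ViewMetadata` class, the `search` function, the sample catalog and the simple case-insensitive substring matching are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ViewMetadata:
    """Hypothetical catalog entry for one view."""

    name: str
    description: str
    columns: List[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Name, description and column names are searched together.
        return " ".join([self.name, self.description, *self.columns]).lower()


CATALOG = [
    ViewMetadata("customer", "Master data for customers", ["customer_id", "name", "segment"]),
    ViewMetadata("invoice", "Invoices issued per customer", ["invoice_id", "customer_id", "amount"]),
    ViewMetadata("web_logs", "Clickstream from the web site", ["session_id", "page", "ts"]),
]


def search(catalog: List[ViewMetadata], query: str) -> List[str]:
    """Return names of views whose metadata contains every query term."""
    terms = query.lower().split()
    return [
        view.name
        for view in catalog
        if all(term in view.searchable_text() for term in terms)
    ]


hits = search(CATALOG, "customer amount")
```

A production catalog would typically use an inverted index with tokenization and ranking rather than substring scans; the sketch only shows which metadata fields take part in the search.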
Information Self Service: Querying Data
1. Access to the Denodo catalog
2. Query and filter for data
3. Click on the green arrows to drill down into related information
Information Self Service: Data Lineage
1. Select Data Lineage for the view
2. Select column to see lineage
3. Hover and click the icons to see details
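Column-level lineage can be pictured as a graph that maps each view column to the columns it is derived from. The Python sketch below walks such a graph back to the base views. The `LINEAGE` mapping, the view and column names, and the `source_columns` function are hypothetical, and the traversal assumes the graph has no cycles.

```python
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]  # (view name, column name)

# Hypothetical lineage metadata: each column lists the columns it is derived from.
LINEAGE: Dict[Column, List[Column]] = {
    ("customer_invoicing", "customer"): [("customer", "name")],
    ("customer_invoicing", "total_invoiced"): [("invoice", "amount")],
    ("customer", "name"): [("bv_client", "name"), ("bv_company", "legal_name")],
    ("invoice", "amount"): [("bv_invoicing", "amount")],
}


def source_columns(column: Column) -> Set[Column]:
    """Follow lineage upstream until columns with no recorded parents are reached."""
    upstream = LINEAGE.get(column)
    if not upstream:
        # No parents recorded: treat the column as a base (source) column.
        return {column}
    result: Set[Column] = set()
    for parent in upstream:
        result |= source_columns(parent)
    return result


origins = source_columns(("customer_invoicing", "customer"))
```

Selecting a column in a lineage view corresponds to calling the traversal for that one column; impact analysis would walk the same graph in the opposite direction.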
Telematics & Predictive Maintenance
Leading Construction Manufacturer
[Diagram: Dealer, Maintenance, Parts Inventory, OSI PI and a Hadoop cluster feed a Tableau Dealer / Customer Dashboard]
Business Benefits
• Improved asset performance and proactive maintenance.
• Reduced warranty costs due to proactive maintenance of parts, preventing parts failure.
• Optimized pricing for services and parts among global service providers.
• New business model opportunities based on real-time analysis of detailed sensor data.
How can I get started?
Read the new whitepaper by Rick F. van der Lans: Developing a Bimodal Logical Data Warehouse Architecture Using Data Virtualization
Register at: http://bit.ly/2frs782
Get Started Today!
Download Denodo Express: www.denodoexpress.com
Access Denodo on AWS: www.denodo.com/en/denodo-platform/denodo-platform-for-aws
www.denodo.com [email protected]
© Copyright Denodo Technologies. All rights reserved.
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.