
An Introduction to Data Virtualization in Business Intelligence

Date post: 13-Jan-2016
Upload: davidmwalker
Description:
A brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented at the ETIS Conference in Riga, Latvia, in October 2013.
Transcript
Page 1: An Introduction to Data Virtualization in Business Intelligence

An Introduction to Data Virtualization in Business Intelligence

David M Walker Data Management & Warehousing

http://datamgmt.com

18 October 2013

Page 2

What Is Data Virtualization?

•  Wikipedia: “Data virtualization is [..] an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.”

•  Or, more simply: a solution that sits in front of multiple data sources and allows them to be treated as a single SQL database
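A minimal sketch of the "single SQL database" idea, assuming Python's built-in sqlite3 module as a toy stand-in for a DV platform. Two separate SQLite database files play the role of hypothetical CRM and billing systems, and a third connection attaches both in place and exposes one view across them, so no rows are copied. A real DV platform would reach heterogeneous sources through its own connectors; the file names, tables and figures here are invented for illustration.

```python
import os
import sqlite3
import tempfile

def make_source(path, ddl, insert_sql, rows):
    """Create a stand-alone SQLite file playing the role of a (hypothetical) source system."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()

tmp_dir = tempfile.mkdtemp()
crm_path = os.path.join(tmp_dir, "crm.db")          # hypothetical CRM system
billing_path = os.path.join(tmp_dir, "billing.db")  # hypothetical billing system

make_source(crm_path,
            "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
            "INSERT INTO customer VALUES (?, ?)",
            [(1, "Joe Bloggs"), (2, "Jane Smith")])
make_source(billing_path,
            "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)",
            "INSERT INTO invoice VALUES (?, ?, ?)",
            [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0)])

# The "virtual" layer: one connection that attaches both sources where they live
# and presents a single SQL view over them.
dv = sqlite3.connect(":memory:")
dv.execute("ATTACH DATABASE ? AS crm", (crm_path,))
dv.execute("ATTACH DATABASE ? AS billing", (billing_path,))
# TEMP views are allowed to reference tables in attached databases.
dv.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS name, SUM(i.amount) AS total
    FROM crm.customer AS c
    JOIN billing.invoice AS i ON i.customer_id = c.id
    GROUP BY c.name
""")

rows = dv.execute("SELECT name, total FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Jane Smith', 75.0), ('Joe Bloggs', 150.0)]
```

The reporting tool only ever sees `customer_revenue`; it does not need to know that the data lives in two places.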

Page 3

Basic Model

End users access the data via reporting tools.

Data sources:

•  Traditional Databases
  –  IBM (DB2 & Netezza)
  –  Microsoft (SQL Server)
  –  Oracle (Oracle & MySQL)
  –  Postgres
  –  Sybase (ASE & IQ)
  –  Etc.
•  NoSQL / NewSQL
  –  Apache Hadoop
  –  Cassandra
  –  Mongo
  –  Neo4J
  –  etc.
•  Other Formats
  –  Microsoft Office
  –  Messaging
  –  Flat Files
  –  XML
  –  Web
  –  Cloud
  –  Application APIs
  –  etc.

Data Virtualization Platform

•  Defines a 'model' of the source systems (similar in concept to a BusinessObjects (BO) Universe)
•  Models can generally be layered on top of other models
•  ETL treats the DV platform as a source

Data Publishing

•  Batch/RESTful
•  Message Based
•  SOA/Publication
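The layering and publishing ideas on this slide can be sketched with plain SQL views, again assuming sqlite3 as a toy model layer. A first view renames and converts the columns of a hypothetical raw source table, a second "business" view is defined on top of the first rather than on the raw table, and a small helper serialises the top layer as JSON in the way a RESTful publishing endpoint might. Table names, columns and the `publish` helper are all illustrative assumptions.

```python
import json
import sqlite3

dv = sqlite3.connect(":memory:")
# Hypothetical raw source table with source-specific column names and units.
dv.execute("CREATE TABLE src_orders (ord_no INTEGER, cust_nm TEXT, amt_cents INTEGER)")
dv.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
               [(1, "Joe Bloggs", 12000), (2, "Jane Smith", 4550), (3, "Joe Bloggs", 3000)])

# Layer 1: a "source model" that only renames columns and converts cents to units.
dv.execute("""
    CREATE VIEW orders AS
    SELECT ord_no AS order_id, cust_nm AS customer, amt_cents / 100.0 AS amount
    FROM src_orders
""")
# Layer 2: a "business model" built on layer 1, not on the raw table.
dv.execute("""
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer
""")

def publish(view_name):
    """Serialise a view as JSON (view_name is trusted here; never pass user input)."""
    cur = dv.execute(f"SELECT * FROM {view_name} ORDER BY 1")
    columns = [d[0] for d in cur.description]
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])

payload = publish("customer_totals")
print(payload)
```

Because layer 2 only depends on layer 1, a change in the source column names is absorbed in one place.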

Page 4

Advanced Features: Role Based Access Control & Data Masking

Source data:

First Name | Last Name | DoB         | Salary
Joe        | Bloggs    | 30-Jan-1983 | €60,100
Jane       | Smith     | 17-Jun-1978 | €75,400

Data Virtualization Platform: manages sensitive information based on a user's role (role-based authentication).

User 1 sees:

First Name | Last Name | DoB         | Salary
Joe        | Bloggs    | 30-Jan-1983 | NULL
Jane       | Smith     | 17-Jun-1978 | NULL

User 2 sees:

First Name | Last Name | Age
Joe        | Bloggs    | 30
Jane       | Smith     | 35
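The masking rules in the example above can be expressed as one policy function per role. This is a minimal sketch, not any vendor's mechanism: the role names `user1` and `user2`, the `POLICIES` table and the reference date used to derive age (the presentation date) are all assumptions chosen so that the output matches the slide.

```python
from datetime import date

# Source rows as the platform would read them (the slide's example data).
EMPLOYEES = [
    {"first_name": "Joe", "last_name": "Bloggs", "dob": date(1983, 1, 30), "salary": 60100},
    {"first_name": "Jane", "last_name": "Smith", "dob": date(1978, 6, 17), "salary": 75400},
]

AS_OF = date(2013, 10, 18)  # assumed reference date for computing age

def age_on(dob, as_of):
    """Whole years between dob and as_of."""
    return as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))

def mask_salary(row):
    # Hypothetical role "user1": every column visible, salary blanked (NULL).
    return {**row, "salary": None}

def age_only(row):
    # Hypothetical role "user2": date of birth replaced by age, salary removed.
    return {"first_name": row["first_name"], "last_name": row["last_name"],
            "age": age_on(row["dob"], AS_OF)}

POLICIES = {"user1": mask_salary, "user2": age_only}

def query_employees(role):
    """Apply the role's masking policy to every row; unknown roles are refused."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"role {role!r} has no access")
    return [policy(row) for row in EMPLOYEES]

print(query_employees("user2"))
```

Both users query the same underlying rows; only the policy applied on the way out differs.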

Page 5

Advanced Features: Caching

Data Virtualization Platform

•  Local database table with good connectivity
•  Remote database table with poor connectivity
•  Cached copy of the remote database table

User sees performance as if all the data was local
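One simple way to get the behaviour described above is a time-to-live (TTL) cache in front of the remote table: reads are served from the local copy until it is older than the TTL. This is a sketch under that assumption; real platforms offer richer refresh policies. `CachedTable` and `slow_remote_fetch` are hypothetical names, and the clock is injectable so the example can simulate time passing.

```python
import time

class CachedTable:
    """Serve rows from a local copy, refreshing from the slow source only when stale."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable that reads the remote table
        self._ttl = ttl_seconds    # how long a cached copy counts as fresh
        self._clock = clock
        self._rows = None
        self._loaded_at = None

    def rows(self):
        now = self._clock()
        if self._rows is None or now - self._loaded_at >= self._ttl:
            self._rows = list(self._fetch())  # the only trip over the poor link
            self._loaded_at = now
        return self._rows

calls = []

def slow_remote_fetch():
    # Hypothetical remote table; we only record how often it is hit.
    calls.append(1)
    return [(1, "Joe Bloggs"), (2, "Jane Smith")]

fake_time = [0.0]
cache = CachedTable(slow_remote_fetch, ttl_seconds=60, clock=lambda: fake_time[0])
cache.rows()
cache.rows()          # served from the local copy
fake_time[0] = 61.0
cache.rows()          # copy is stale, so it is refreshed once
print(len(calls))  # 2
```

The TTL is the trade-off knob: a longer TTL means fewer remote calls but potentially older data.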

Page 6

Advanced Features: Creating a Canonical Data Model

Data Virtualization Platform

Sources: Finance System, Billing System, CRM System, Website, Other Systems

Data mapped to conform to a Canonical Model

Users see the sources as a single canonical data model (CDM) rather than as multiple systems
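The mapping step on this slide amounts to one translation rule per source, all producing the same canonical record. A minimal sketch, assuming a hypothetical canonical `Customer` entity and invented source field names (`acct_no`, `CustomerRef`, `bill_to` and so on); in a DV platform these rules would normally live in the model definitions rather than in application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    """Hypothetical canonical customer entity shared by all consumers."""
    customer_id: str
    name: str
    source: str

# Per-source mapping rules; all field names are invented for illustration.
def from_finance(rec):
    return Customer(str(rec["acct_no"]), rec["acct_name"].strip(), "finance")

def from_crm(rec):
    return Customer(rec["CustomerRef"], f'{rec["First"]} {rec["Last"]}', "crm")

def from_billing(rec):
    return Customer(rec["cust"].lstrip("C"), rec["bill_to"].title(), "billing")

MAPPERS = {"finance": from_finance, "crm": from_crm, "billing": from_billing}

def canonical_customers(feeds):
    """Map every record from every source into the canonical model."""
    return [MAPPERS[source](rec) for source, records in feeds.items() for rec in records]

feeds = {
    "finance": [{"acct_no": 1001, "acct_name": " Joe Bloggs "}],
    "crm": [{"CustomerRef": "1002", "First": "Jane", "Last": "Smith"}],
    "billing": [{"cust": "C1001", "bill_to": "JOE BLOGGS"}],
}
customers = canonical_customers(feeds)
for c in customers:
    print(c)
```

After mapping, the finance and billing records for Joe Bloggs share the same identifier and name format, which is what lets consumers treat them as one entity.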

Page 7

But it's not a Silver Bullet

•  Can be slow
  –  Depending on how much data has to be fetched from remote systems to the DV platform; platforms try to be smart to reduce this
•  Can impact performance on underlying systems
  –  Lots of BI users querying resource-sensitive OLTP systems is not a good idea
•  Requires resources
  –  Another set of servers, technologies, etc. to manage, but this cost is often offset by the reduction in complexity elsewhere
•  Not a replacement; it is an additional tool
  –  You will still need ETL and messaging
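One common way platforms "try to be smart" about the first point is predicate pushdown: evaluating filters at the source so that only matching rows travel to the DV layer. The sketch below, using a hypothetical `RemoteTable` that counts the rows it ships, contrasts filtering after transfer with filtering at the source; it illustrates the general technique rather than any particular product's optimiser.

```python
class RemoteTable:
    """Hypothetical remote source that counts how many rows it ships to the DV layer."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_shipped = 0

    def scan(self, predicate=None):
        # A supplied predicate is evaluated at the source ("pushed down").
        for row in self._rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row

orders = [{"id": i, "country": "LV" if i % 10 == 0 else "GB"} for i in range(1000)]

naive = RemoteTable(orders)
result_naive = [r for r in naive.scan() if r["country"] == "LV"]   # filter after transfer

pushed = RemoteTable(orders)
result_pushed = list(pushed.scan(lambda r: r["country"] == "LV"))  # filter at the source

print(naive.rows_shipped, pushed.rows_shipped)  # 1000 100
```

Both strategies return the same answer; the difference is the volume moved across the network, which is exactly what makes a virtual query slow or fast.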

Page 8

BI Use Cases: Agile Data Mart Design

•  Access data warehouse data quickly and easily
•  Design the data mart you think you want
•  Test it with real data and your actual reporting tool
•  Also possible with data warehouse design

Diagram: Data Warehouse → Data Virtualization Platform → Design A OR Design B
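Trying "design A or design B" can be as cheap as defining each candidate mart as views over the warehouse and running the same report against both. A minimal sketch with sqlite3; the warehouse tables, the two designs (a wide denormalised table versus a small star schema) and the data are hypothetical.

```python
import sqlite3

dv = sqlite3.connect(":memory:")
dv.executescript("""
    -- Stand-in (normalised) warehouse tables with tiny hypothetical data.
    CREATE TABLE wh_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE wh_sale (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO wh_customer VALUES (1, 'Joe Bloggs', 'North'), (2, 'Jane Smith', 'South');
    INSERT INTO wh_sale VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0);

    -- Design A: one wide, denormalised table.
    CREATE VIEW mart_a_sales AS
    SELECT s.sale_id, c.name, c.region, s.amount
    FROM wh_sale s JOIN wh_customer c USING (customer_id);

    -- Design B: a small star schema (one dimension, one fact).
    CREATE VIEW mart_b_dim_customer AS SELECT customer_id, name, region FROM wh_customer;
    CREATE VIEW mart_b_fact_sales AS SELECT sale_id, customer_id, amount FROM wh_sale;
""")

# The same business question asked of both candidate designs.
report_a = dv.execute(
    "SELECT region, SUM(amount) FROM mart_a_sales GROUP BY region ORDER BY region").fetchall()
report_b = dv.execute("""
    SELECT d.region, SUM(f.amount)
    FROM mart_b_fact_sales f JOIN mart_b_dim_customer d USING (customer_id)
    GROUP BY d.region ORDER BY d.region""").fetchall()
print(report_a == report_b, report_a)  # True [('North', 15.0), ('South', 20.0)]
```

Dropping or redefining a view costs nothing, so a rejected design leaves no physical tables behind to clean up.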

Page 9

BI Use Case: Virtual Data Marts

•  Big tin appliance with lots of horsepower?
•  Don't want to duplicate data in the appliance and consume disk space for a data mart, but want the star schema for ease of use?

Diagram: Data Warehouse → Data Virtualization Platform
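A minimal sketch of a virtual star schema, assuming sqlite3 as a stand-in for both the appliance and the DV layer: the dimensions and the fact are views over one hypothetical wide warehouse table, so the star schema exists only as metadata and no rows are duplicated. On a real appliance the views would be pushed down and executed by the appliance's own engine.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
# Hypothetical wide warehouse table living on the appliance.
dw.execute("CREATE TABLE dw_transactions "
           "(txn_id INTEGER, txn_date TEXT, product TEXT, category TEXT, amount REAL)")
dw.executemany("INSERT INTO dw_transactions VALUES (?, ?, ?, ?, ?)", [
    (1, "2013-10-01", "Widget", "Hardware", 9.5),
    (2, "2013-10-01", "Gadget", "Hardware", 12.0),
    (3, "2013-10-02", "Support", "Services", 100.0),
])

# Virtual star schema: dimensions and fact are views, so no rows are copied.
dw.executescript("""
    CREATE VIEW dim_product AS
        SELECT DISTINCT product AS product_key, category FROM dw_transactions;
    CREATE VIEW dim_date AS
        SELECT DISTINCT txn_date AS date_key, substr(txn_date, 1, 7) AS month FROM dw_transactions;
    CREATE VIEW fact_sales AS
        SELECT txn_id, txn_date AS date_key, product AS product_key, amount FROM dw_transactions;
""")

by_category = dw.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Only the warehouse table occupies storage; the star schema is just view definitions.
stored_tables = [r[0] for r in dw.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(by_category, stored_tables)
```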

Page 10

BI Use Case: Data Mart Extensions

•  Existing (physical) data mart
•  New data source that needs to be incorporated quickly
•  Create a virtual copy of the existing data mart and the new data source
•  Integrate them into an updated data mart design

Diagram: Data Mart and New Data Source → Data Virtualization Platform
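The extension pattern above can be sketched as a view that keeps every existing mart column and adds an attribute from the new source, without loading that source into the mart. This assumes sqlite3, with the new source attached as a separate in-memory schema; the table names and the `web_visits` attribute are hypothetical.

```python
import sqlite3

dv = sqlite3.connect(":memory:")
# New source, attached as its own schema rather than loaded into the mart.
# (ATTACH is done first because SQLite refuses it inside an open transaction.)
dv.execute("ATTACH DATABASE ':memory:' AS web")
dv.execute("CREATE TABLE web.visits (customer_id INTEGER, visit_count INTEGER)")
dv.executemany("INSERT INTO web.visits VALUES (?, ?)", [(1, 12)])

# Existing physical data mart (main schema).
dv.execute("CREATE TABLE mart_customer_sales (customer_id INTEGER, name TEXT, total_sales REAL)")
dv.executemany("INSERT INTO mart_customer_sales VALUES (?, ?, ?)",
               [(1, "Joe Bloggs", 150.0), (2, "Jane Smith", 75.0)])

# Virtual, extended mart: the existing columns plus the new attribute.
# LEFT JOIN keeps mart rows that have no match in the new source.
dv.execute("""
    CREATE TEMP VIEW mart_customer_sales_v2 AS
    SELECT m.customer_id, m.name, m.total_sales, COALESCE(w.visit_count, 0) AS web_visits
    FROM main.mart_customer_sales AS m
    LEFT JOIN web.visits AS w ON w.customer_id = m.customer_id
""")
rows = dv.execute("SELECT * FROM mart_customer_sales_v2 ORDER BY customer_id").fetchall()
print(rows)  # [(1, 'Joe Bloggs', 150.0, 12), (2, 'Jane Smith', 75.0, 0)]
```

Once the extended design is agreed, the new attribute can be loaded into the physical mart through the normal ETL route.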

Page 11

BI Use Case: Agile Set Based ELT Design

•  If your normal ETL style is a series of set-based SQL queries built on top of each other, then you can quickly prototype the ETL in the DV platform before moving it into your normal ETL engine to persist and execute it (normally for performance reasons)

Diagram: Source, Source, Source → Data Virtualization Platform
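A sketch of that workflow, assuming sqlite3 as the prototyping layer: each set-based step is a view built on the previous one, so a step can be changed and re-queried instantly, and the final step is only materialised once the chain gives the right answer. In practice that last step would move into your ETL engine; here `CREATE TABLE ... AS SELECT` stands in for it. Table names and data are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_sales (sale_id INTEGER, region TEXT, amount TEXT)")
db.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", [
    (1, " north", "10.50"), (2, "North ", "4.50"), (3, "south", "n/a"), (4, "South", "20"),
])

# Step 1 (prototype): standardise region names and type the amounts;
# rows whose amount does not start with a digit are dropped.
db.execute("""
    CREATE VIEW step1_clean AS
    SELECT sale_id, upper(trim(region)) AS region, CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE raw_sales.amount GLOB '[0-9]*'
""")
# Step 2 (prototype): a set-based aggregate built on top of step 1.
db.execute("""
    CREATE VIEW step2_region_totals AS
    SELECT region, SUM(amount) AS total
    FROM step1_clean
    GROUP BY region
""")

# Once the chain is right, persist the final step as a real table.
db.execute("CREATE TABLE sales_by_region AS SELECT * FROM step2_region_totals")
persisted = db.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(persisted)  # [('NORTH', 15.0), ('SOUTH', 20.0)]
```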

Page 12

BI Use Case: Big Data Integration

•  DV Platform connects to Big Data sources
•  Data sources are mapped into DV
•  User accesses them via standard tools (SQL, RESTful interfaces, etc.)

Diagram: Big Data sources (MapReduce, etc. interface) → Data Virtualization Platform → SQL interface → SQL-based tools
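"Mapping into DV" for semi-structured data usually means applying a relational schema at read time. A minimal sketch, assuming sqlite3 and a raw table of JSON strings as a stand-in for a document store or files on a big data platform; `json_field` is a hypothetical helper registered as a SQL function, and the view turns nested documents into ordinary columns that any SQL-based tool could query.

```python
import json
import sqlite3

def json_field(doc, path):
    """Read a dot-separated path from a JSON document; None (SQL NULL) if missing."""
    value = json.loads(doc)
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    # SQLite stores scalars only, so nested objects/arrays come back as JSON text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value

dv = sqlite3.connect(":memory:")
dv.create_function("json_field", 2, json_field)

# Hypothetical raw event documents, kept exactly as they arrive.
dv.execute("CREATE TABLE raw_events (doc TEXT)")
dv.executemany("INSERT INTO raw_events VALUES (?)", [
    (json.dumps({"user": {"id": 1}, "type": "click", "value": 3}),),
    (json.dumps({"user": {"id": 2}, "type": "view"}),),
    (json.dumps({"user": {"id": 1}, "type": "click", "value": 5}),),
])

# The mapping: a relational view applied when the data is read (schema-on-read).
dv.execute("""
    CREATE TEMP VIEW events AS
    SELECT json_field(doc, 'user.id') AS user_id,
           json_field(doc, 'type') AS event_type,
           json_field(doc, 'value') AS value
    FROM raw_events
""")

summary = dv.execute("""
    SELECT event_type, COUNT(*) AS n, SUM(value) AS total
    FROM events GROUP BY event_type ORDER BY event_type
""").fetchall()
print(summary)  # [('click', 2, 8), ('view', 1, None)]
```

A document without a `value` field simply maps to NULL, so the raw data never has to be reshaped to fit the view.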

Page 13

BI Use Case: Source System Analysis

•  Apply your data quality and data profiling tools to all your data sources
•  Look for relationships across systems
•  Remove accessibility limitations by enabling caching, so that you are not hitting the source system but still have fresh data

Diagram: Source, Source, Source → Data Virtualization Platform → Data Quality & Profiling Tools
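Once every source is reachable through one layer, basic profiling and cross-system relationship hunting can run against all of them uniformly. A minimal sketch with hypothetical rows from two systems: `profile` reports row, null and distinct counts for a column, and `key_overlap` measures how many distinct values of one column also appear in another system's column, a simple hint that the two may be a join key. Real profiling tools do far more; these names and heuristics are assumptions for illustration.

```python
def profile(rows, column):
    """Basic column profile: row count, nulls and distinct non-null values."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {"rows": len(values), "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null))}

def key_overlap(left_rows, left_col, right_rows, right_col):
    """Fraction of distinct left values also present on the right (a join-key hint)."""
    left = {row[left_col] for row in left_rows if row.get(left_col) is not None}
    right = {row[right_col] for row in right_rows if row.get(right_col) is not None}
    return len(left & right) / len(left) if left else 0.0

# Rows as they might arrive through the DV layer from two hypothetical systems.
crm = [
    {"cust_id": "1001", "email": "joe@example.com"},
    {"cust_id": "1002", "email": None},
    {"cust_id": "1003", "email": "jane@example.com"},
]
billing = [{"account": "1001"}, {"account": "1002"}, {"account": "1001"}, {"account": "9999"}]

email_profile = profile(crm, "email")
overlap = key_overlap(crm, "cust_id", billing, "account")
print(email_profile, round(overlap, 2))  # {'rows': 3, 'nulls': 1, 'distinct': 2} 0.67
```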

Page 14

BI Use Case: Data Masking

•  Currently building two versions of a data mart, one with sensitive data and one without
•  Instead, build one and use Role Based Access Control (RBAC) to restrict what an individual can see

Diagram: Physical Data Mart AND Data Virtualization Platform
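The "build one, not two" idea can be sketched as a single physical table with one view per access level, and a role lookup that decides which view a user may query. This assumes sqlite3; the role names `hr` and `analyst` and the `ROLE_VIEWS` mapping are hypothetical stand-ins for the access rules a DV platform would enforce itself (for example, by granting roles access to specific views).

```python
import sqlite3

mart = sqlite3.connect(":memory:")
mart.executescript("""
    -- The single physical data mart.
    CREATE TABLE employee (first_name TEXT, last_name TEXT, dob TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('Joe', 'Bloggs', '1983-01-30', 60100),
                                ('Jane', 'Smith', '1978-06-17', 75400);

    -- One view per access level instead of a second copy of the mart.
    CREATE VIEW employee_full AS SELECT * FROM employee;
    CREATE VIEW employee_masked AS
        SELECT first_name, last_name, dob, NULL AS salary FROM employee;
""")

ROLE_VIEWS = {"hr": "employee_full", "analyst": "employee_masked"}  # hypothetical roles

def employees_for(role):
    """Query through the view allowed for this role (KeyError: no access by default)."""
    view = ROLE_VIEWS[role]
    return mart.execute(f"SELECT * FROM {view} ORDER BY last_name").fetchall()

print(employees_for("analyst"))
```

Adding a new access level means adding a view, not building and loading another mart.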

Page 15

BI Use Cases

•  Some examples
  –  Usefulness of each example depends on the organization
•  Generally an enabler for more agility
  –  Quicker prototyping and integration
•  Will not solve all your problems
  –  And has a cost associated with it (license & hardware)

Page 16

Vendors: What The Analysts Say

•  Forrester Wave: Data Virtualization, Q1 2012
  –  Informatica
  –  IBM
  –  Denodo: EU (Spanish) origins
  –  Composite: now part of Cisco; was OEM'd by Informatica
  –  Microsoft
  –  SAP
  –  And others
•  Gartner
  –  No Magic Quadrant; instead includes Data Virtualization in Data Integration

Page 17

Vendors: Product Positioning

Stand Alone

•  Players
  –  Cisco (Composite)
  –  Denodo
•  Selection
  –  Popular where IBM/Informatica are not already embedded

Integrated

•  Players
  –  IBM
  –  Informatica
•  Selection
  –  Popular with organisations that already have the vendor's ETL tool

Page 18

An Introduction to Data Virtualization in Business Intelligence

David M Walker Data Management & Warehousing

http://datamgmt.com

THANK YOU - PALDIES

