Date posted: 13-Jan-2016
Uploaded by: davidmwalker
An Introduction to Data Virtualization in Business Intelligence
David M Walker Data Management & Warehousing
http://datamgmt.com
18 October 2013
What Is Data Virtualization?
• Wikipedia: “Data virtualization is [..] an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.”
• Or, more simply: a solution that sits in front of multiple data sources and allows them to be treated as a single SQL database
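To make "treated as a single SQL database" concrete, here is a minimal sketch that uses SQLite's ATTACH as a stand-in for a DV platform's connectors. The two "source systems" (crm and billing), their tables and their rows are all hypothetical, and a real platform would federate across different engines and networks rather than two in-memory SQLite databases. The consumer only ever queries the view and never names the underlying sources.

```python
import sqlite3

# Hypothetical setup: two independent "source systems", each its own database.
# ATTACH stands in for the DV platform's source connectors.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Joe Bloggs"), (2, "Jane Smith")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# The "virtual" layer: a view that hides where each table physically lives.
# A TEMP view is used because SQLite rejects non-temporary views that
# reference tables in other attached databases.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, SUM(i.amount) AS revenue
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
""")

# Consumers query one logical database.
rows = conn.execute("SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Jane Smith', 75.0), ('Joe Bloggs', 150.0)]
```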
Basic Model
• End users access data via reporting tools
• Data Virtualization Platform
– Defines a ‘model’ of the source systems (similar in concept to a BO Universe)
– Models can generally be layered on top of other models
• Sources
– Traditional databases: IBM (DB2 & Netezza), Microsoft (SQL Server), Oracle (Oracle & MySQL), Postgres, Sybase (ASE & IQ), etc.
– NoSQL / NewSQL: Apache Hadoop, Cassandra, Mongo, Neo4J, etc.
– Other formats: Microsoft Office, messaging, flat files, XML, web, cloud, application APIs, etc.
• ETL treats the DV platform as a source
• Data publishing: batch/RESTful, message based, SOA/publication
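The idea of models layered on top of other models can be sketched in plain Python. In this hypothetical example, one source is a relational table and the other a flat file; each gets a source-aligned model that hides its access details, and a business model is built only from those lower models. The table, the CSV content and the function names are illustrative assumptions, not any vendor's API.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, product_code TEXT, qty INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, "P1", 3), (2, "P2", 1), (3, "P1", 2)])

# Hypothetical source 2: a flat file (inlined so the sketch is self-contained).
PRODUCTS_CSV = "product_code,product_name\nP1,Widget\nP2,Gadget\n"

# Layer 1: source-aligned models; each hides how its source is accessed.
def orders_model():
    cur = db.execute("SELECT order_id, product_code, qty FROM orders ORDER BY order_id")
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur]

def products_model():
    return list(csv.DictReader(io.StringIO(PRODUCTS_CSV)))

# Layer 2: a business model built only from layer-1 models, never from the sources.
def sales_by_product_model():
    names = {p["product_code"]: p["product_name"] for p in products_model()}
    totals = {}
    for order in orders_model():
        name = names[order["product_code"]]
        totals[name] = totals.get(name, 0) + order["qty"]
    return totals

# Any consumer (a report, or an ETL job using the platform as a source) calls the top layer.
print(sales_by_product_model())  # {'Widget': 5, 'Gadget': 1}
```

Because the business model depends only on the layer below, a source could be swapped (for example, the CSV replaced by a database table) without changing the consumers.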
Advanced Features: Role Based Access Control & Data Masking
Source data:
First Name | Last Name | DoB | Salary
Joe | Bloggs | 30-Jan-1983 | €60,100
Jane | Smith | 17-Jun-1978 | €75,400
• Data Virtualization Platform: manages sensitive information based on a user’s role (role based authentication)
User 1 sees:
First Name | Last Name | DoB | Salary
Joe | Bloggs | 30-Jan-1983 | NULL
Jane | Smith | 17-Jun-1978 | NULL
User 2 sees:
First Name | Last Name | Age
Joe | Bloggs | 30
Jane | Smith | 35
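Role-based masking can be pictured as a per-role function applied to every row before it leaves the platform. This is a minimal sketch that reuses the slide's two employee rows; the role names `user_1` and `user_2` and the `POLICIES` structure are hypothetical. Ages are computed as of the presentation date (18 October 2013), which reproduces the 30 and 35 shown for User 2.

```python
from datetime import date

# Source rows as held in the underlying system (values from the slide).
EMPLOYEES = [
    {"first_name": "Joe", "last_name": "Bloggs", "dob": date(1983, 1, 30), "salary": 60100},
    {"first_name": "Jane", "last_name": "Smith", "dob": date(1978, 6, 17), "salary": 75400},
]

def age_on(dob, today):
    # Whole years, minus one if this year's birthday has not happened yet.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

# Hypothetical role policies: each role maps to a function that reshapes a row.
POLICIES = {
    # user_1: sees every column, but salary is masked to NULL (None).
    "user_1": lambda r, today: {**r, "salary": None},
    # user_2: sees names plus a derived age; DoB and salary are withheld.
    "user_2": lambda r, today: {"first_name": r["first_name"],
                                "last_name": r["last_name"],
                                "age": age_on(r["dob"], today)},
}

def query_employees(role, today):
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"role {role!r} has no access")
    return [policy(r, today) for r in EMPLOYEES]

as_of = date(2013, 10, 18)
print([r["salary"] for r in query_employees("user_1", as_of)])  # [None, None]
print([r["age"] for r in query_employees("user_2", as_of)])     # [30, 35]
```

Unknown roles are refused outright rather than given unmasked data, which is the safer default for this kind of policy.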
Advanced Features: Caching
• A Data Virtualization Platform sits in front of:
– A local database table with good connectivity
– A remote database table with poor connectivity
• The platform holds a cached copy of the remote database table
• User sees performance as if all the data was local
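A simple way to picture the cached copy is a time-to-live (TTL) cache in front of the remote fetch. The class name, the 300-second TTL and the fake remote table are hypothetical; a real platform may instead refresh its cache on a schedule or on change events. The injectable clock keeps the example deterministic.

```python
import time

class CachedSource:
    """Serve a slow remote source from a local copy, refreshing it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch            # callable that queries the remote system
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = None
        self._loaded_at = None

    def rows(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._rows = self._fetch()  # the only place the remote system is hit
            self._loaded_at = now
        return self._rows

# Hypothetical remote table with poor connectivity; we count the round trips.
calls = {"remote": 0}

def fetch_remote_table():
    calls["remote"] += 1
    return [("EU", 120), ("US", 340)]

fake_now = [0.0]  # controllable clock so the example is deterministic
remote = CachedSource(fetch_remote_table, ttl_seconds=300, clock=lambda: fake_now[0])

remote.rows()
remote.rows()        # served from the cached copy, no round trip
fake_now[0] = 301.0  # the copy is now older than the TTL, so it is refreshed
remote.rows()
print(calls["remote"])  # 2
```

The TTL is the knob that trades freshness against load on the remote system.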
Advanced Features: Creating a Canonical Data Model
• Sources: Finance system, Billing system, CRM system, Website, Other systems
• Data is mapped to conform to a Canonical Model in the Data Virtualization Platform
• User sees the system as a single CDM and not multiple sources
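Mapping sources onto a canonical data model is essentially a per-source rename-and-conform step. In this sketch the field names for the finance, CRM and website sources, and the canonical `customer` shape, are hypothetical. It also conforms types (all keys become strings) and keeps a `source` column for lineage.

```python
# Hypothetical per-source field mappings onto one canonical "customer" shape.
MAPPINGS = {
    "finance": {"cust_no": "customer_id", "cust_name": "name", "ctry": "country"},
    "crm":     {"id": "customer_id", "full_name": "name", "country_code": "country"},
    "website": {"user_ref": "customer_id", "display_name": "name", "locale_country": "country"},
}

def to_canonical(source, record):
    mapping = MAPPINGS[source]
    # Rename mapped fields; fields with no canonical meaning are dropped.
    out = {canon: record[src] for src, canon in mapping.items() if src in record}
    out["customer_id"] = str(out["customer_id"])  # conform types, not just names
    out["source"] = source                        # keep lineage for troubleshooting
    return out

records = [
    ("finance", {"cust_no": 42, "cust_name": "Joe Bloggs", "ctry": "IE"}),
    ("crm", {"id": "42", "full_name": "Joe Bloggs", "country_code": "IE", "segment": "SME"}),
    ("website", {"user_ref": 7, "display_name": "Jane Smith", "locale_country": "LV"}),
]
customers = [to_canonical(src, rec) for src, rec in records]
print(sorted({c["customer_id"] for c in customers}))  # ['42', '7']
```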
But it’s not a Silver Bullet
• Can be slow
– Depending on how much data has to be fetched from remote systems to the DV platform; platforms try to be smart to reduce this
• Can impact performance on underlying systems
– Lots of BI users making queries on resource-sensitive OLTP systems is not a good idea
• Requires resources
– Another set of servers, technologies, etc. to manage, but this cost is often offset against the reduction in complexity elsewhere
• Not a replacement; it is an additional tool
– You will still need ETL and messaging
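One common way platforms "try to be smart" is predicate pushdown: sending the filter to the source so only matching rows travel to the DV platform. This sketch compares the two strategies against a hypothetical in-memory "remote" table of 1,000 orders. Note the trade-off with the second bullet: pushdown cuts data movement but moves the filtering work onto the source system.

```python
import sqlite3

# Hypothetical remote OLTP table standing in for a source behind the platform.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
remote.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                   [("EU" if i % 10 == 0 else "US", 1.0) for i in range(1000)])

def naive_query(region):
    # Pull every row across the "network", then filter inside the platform.
    rows = remote.execute("SELECT id, region, amount FROM orders").fetchall()
    return rows, [r for r in rows if r[1] == region]

def pushed_down_query(region):
    # Push the predicate into the source so only matching rows are transferred.
    rows = remote.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return rows, rows

for strategy in (naive_query, pushed_down_query):
    transferred, result = strategy("EU")
    print(strategy.__name__, len(transferred), len(result))
# naive_query 1000 100
# pushed_down_query 100 100
```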
BI Use Case: Agile Data Mart Design
• Access data warehouse data quickly and easily
• Design the data mart you think you want
• Test it with real data and your actual reporting tool
• Also possible with data warehouse design
[Diagram: Data Warehouse → Data Virtualization Platform → design A or design B]
BI Use Case: Virtual Data Marts
• Big tin appliance with lots of horsepower?
• Don’t want to duplicate data in the appliance and consume disk space for a data mart but want the star schema for ease of use?
[Diagram: Data Warehouse → Data Virtualization Platform]
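A virtual data mart can be sketched as a star schema defined purely with views over the warehouse tables, so no data is duplicated and the mart is always as current as the warehouse. Here SQLite views stand in for models defined in a DV platform, and the warehouse tables, columns and rows are hypothetical.

```python
import sqlite3

# Hypothetical normalized warehouse tables.
dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           sale_date TEXT, amount REAL);
    INSERT INTO region VALUES (1, 'North'), (2, 'South');
    INSERT INTO customer VALUES (10, 'Joe Bloggs', 1), (11, 'Jane Smith', 2);
    INSERT INTO sale VALUES (100, 10, '2013-10-01', 60.0), (101, 11, '2013-10-02', 40.0);

    -- The virtual data mart: a star schema defined only as views.
    CREATE VIEW dim_customer AS
        SELECT c.customer_id, c.name, r.region_name
        FROM customer c JOIN region r ON r.region_id = c.region_id;
    CREATE VIEW fact_sales AS
        SELECT sale_id, customer_id, sale_date, amount FROM sale;
""")

STAR_QUERY = """
    SELECT d.region_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.region_name ORDER BY d.region_name
"""
print(dw.execute(STAR_QUERY).fetchall())  # [('North', 60.0), ('South', 40.0)]

# New warehouse rows appear in the mart immediately; nothing was copied.
dw.execute("INSERT INTO sale VALUES (102, 10, '2013-10-03', 15.0)")
print(dw.execute(STAR_QUERY).fetchall())  # [('North', 75.0), ('South', 40.0)]
```

Whether the appliance can serve such star queries fast enough without a physical copy is exactly what this approach lets you test before committing disk space.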
BI Use Case: Data Mart Extensions
• Existing (physical) data mart
• New data source that needs to be incorporated quickly
• Create a virtual copy of the existing data mart and the new data source
• Integrate both into an updated data mart design
[Diagram: Data Mart + New Data Source → Data Virtualization Platform]
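Extending an existing mart virtually usually means conforming the new source to the mart's columns and combining the two in one model. In this minimal sketch SQLite stands in for the DV platform, and the `mart_sales` and `web_orders` tables are hypothetical. Once the extended design is agreed, the new source would normally be loaded into the physical mart by the usual ETL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Existing physical data mart (hypothetical schema).
    CREATE TABLE mart_sales (channel TEXT, product TEXT, amount REAL);
    INSERT INTO mart_sales VALUES ('store', 'Widget', 100.0), ('store', 'Gadget', 50.0);

    -- New source that must be incorporated quickly, with its own column names.
    CREATE TABLE web_orders (sku_name TEXT, order_total REAL);
    INSERT INTO web_orders VALUES ('Widget', 30.0);

    -- Virtual extension: conform the new source and union it with the mart.
    CREATE VIEW sales_extended AS
        SELECT channel, product, amount FROM mart_sales
        UNION ALL
        SELECT 'web' AS channel, sku_name AS product, order_total AS amount FROM web_orders;
""")

rows = conn.execute("""
    SELECT product, SUM(amount) FROM sales_extended
    GROUP BY product ORDER BY product
""").fetchall()
print(rows)  # [('Gadget', 50.0), ('Widget', 130.0)]
```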
BI Use Case: Agile Set Based ELT Design
• If your normal ETL style is a series of set-based SQL queries built on top of each other, you can quickly prototype the ETL before moving it into your normal ETL engine to persist and execute it (normally for performance)
[Diagram: Source, Source, Source → Data Virtualization Platform]
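The set-based style described above maps naturally onto a chain of views: each step is a query over the previous one, and nothing is persisted until the design is agreed. This sketch uses SQLite views as a stand-in for DV models, and the `src_orders` table and step names are hypothetical. The final `CREATE TABLE ... AS SELECT` represents the point where the logic would move into the normal ETL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT);
    INSERT INTO src_orders VALUES
        (1, ' joe bloggs ', 60.0, 'OK'),
        (2, 'JANE SMITH', 40.0, 'CANCELLED'),
        (3, 'Jane Smith', 25.0, 'OK');

    -- Step 1: cleanse (each step is a set-based query on the previous one).
    CREATE VIEW step1_clean AS
        SELECT order_id, UPPER(TRIM(customer)) AS customer, amount, status
        FROM src_orders;
    -- Step 2: filter.
    CREATE VIEW step2_valid AS
        SELECT * FROM step1_clean WHERE status = 'OK';
    -- Step 3: aggregate into the target shape.
    CREATE VIEW step3_target AS
        SELECT customer, SUM(amount) AS total FROM step2_valid GROUP BY customer;
""")

# Prototype: inspect the virtual result with no data persisted.
print(conn.execute("SELECT * FROM step3_target ORDER BY customer").fetchall())
# [('JANE SMITH', 25.0), ('JOE BLOGGS', 60.0)]

# Once agreed, persist it (in practice this logic moves to the ETL engine).
conn.execute("CREATE TABLE target_customer_totals AS SELECT * FROM step3_target")
```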
BI Use Case: Big Data Integration
• DV Platform connects to Big Data Sources
• Data Sources are mapped into DV
• User accesses them via standard tools (SQL, RESTful interfaces, etc.)
[Diagram: Big Data sources (MapReduce, etc. interface) → Data Virtualization Platform → SQL interface → SQL-based tools]
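"Data sources are mapped into DV" can be illustrated by a mapping from nested documents to flat SQL columns. The JSON lines below are hypothetical output from a NoSQL store or MapReduce job, and the `COLUMNS` path mapping is an assumption for illustration. For simplicity this sketch materialises the mapped rows into an in-memory SQLite table, whereas a DV platform would normally apply such a mapping at query time through its connectors.

```python
import json
import sqlite3

# Hypothetical documents as a NoSQL store or MapReduce job might return them.
RAW = """
{"user": {"id": 1, "country": "LV"}, "event": "click", "ts": "2013-10-18T09:00:00"}
{"user": {"id": 2, "country": "UK"}, "event": "view"}
{"user": {"id": 1, "country": "LV"}, "event": "view", "ts": "2013-10-18T09:05:00"}
"""

# The mapping: nested paths in each document -> flat SQL columns.
COLUMNS = {"user_id": ("user", "id"), "country": ("user", "country"),
           "event": ("event",), "ts": ("ts",)}

def extract(doc, path):
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None  # missing fields surface as SQL NULL
        doc = doc[key]
    return doc

def map_documents(lines):
    for line in lines.strip().splitlines():
        doc = json.loads(line)
        yield tuple(extract(doc, path) for path in COLUMNS.values())

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE events ({', '.join(COLUMNS)})")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", map_documents(RAW))

# SQL-based tools now see an ordinary table.
print(conn.execute(
    "SELECT country, COUNT(*) FROM events GROUP BY country ORDER BY country"
).fetchall())  # [('LV', 2), ('UK', 1)]
```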
BI Use Case: Source System Analysis
• Apply your data quality and data profiling tools to all your data sources
• Look for relationships across systems
• Remove accessibility limitations by enabling caching, so that you are not hitting the source systems but still have fresh data
[Diagram: Source, Source, Source → Data Virtualization Platform → Data Quality & Profiling Tools]
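Looking for relationships across systems often starts with basic column profiles and a value-overlap check between candidate key columns. This sketch uses the overlap coefficient (shared distinct values divided by the smaller distinct count). The three key columns, their values and the 0.5 threshold are hypothetical; real profiling tools offer richer statistics.

```python
# Hypothetical key columns as the platform might expose them from three sources.
sources = {
    ("crm", "customer_id"): ["C1", "C2", "C3", "C4", None],
    ("billing", "account_ref"): ["C1", "C2", "C3", "X9"],
    ("website", "session_id"): ["S1", "S2", "S3"],
}

def profile(values):
    non_null = [v for v in values if v is not None]
    return {"rows": len(values), "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null))}

def candidate_relationships(cols, min_overlap=0.5):
    # Overlap coefficient |A ∩ B| / min(|A|, |B|) on distinct non-null values;
    # a high score suggests the two columns may hold the same business key.
    names = list(cols)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sa = {v for v in cols[a] if v is not None}
            sb = {v for v in cols[b] if v is not None}
            if not sa or not sb:
                continue
            score = len(sa & sb) / min(len(sa), len(sb))
            if score >= min_overlap:
                found.append((a, b, round(score, 2)))
    return found

print(profile(sources[("crm", "customer_id")]))
# {'rows': 5, 'nulls': 1, 'distinct': 4}
print(candidate_relationships(sources))
# [(('crm', 'customer_id'), ('billing', 'account_ref'), 0.75)]
```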
BI Use Case: Data Masking
• Currently building two versions of a data mart: one with sensitive data in it and one without
• Instead build one and use Role Based Access Control (RBAC) to restrict what an individual can see
[Diagram: Physical Data Mart AND Data Virtualization Platform]
BI Use Cases
• Some examples
– Usefulness of each example depends on the organization
• Generally an enabler for more agility
– Quicker prototyping and integration
• Will not solve all your problems
– And has a cost associated with it (license & hardware)
Vendors: What The Analysts Say
• Forrester Wave: Data Virtualization, Q1 2012
– Informatica
– IBM
– Denodo (EU, Spanish origins)
– Composite (now part of Cisco; was OEM’d by Informatica)
– Microsoft
– SAP
– And others
• Gartner
– No Magic Quadrant; instead includes Data Virtualization in Data Integration
Vendors: Product Positioning
• Stand Alone
– Players: Cisco (Composite), Denodo
– Selection: popular where IBM/Informatica are not already embedded
• Integrated
– Players: IBM, Informatica
– Selection: popular with organisations that already have the vendor ETL tool
THANK YOU