
Metadata in Business Intelligence

Date post: 27-Jan-2015
Upload: jose-luis-lopez-pino
Description:
This presentation is part of my work for the course 'Heterogeneous and Distributed Information Systems' at TU Berlin within the IT4BI (Information Technology for Business Intelligence) master programme.
Metadata in Business Intelligence. Jose Luis Lopez Pino. Database Systems and Information Management, Technische Universität Berlin. January 28, 2014. v1.2
Page 1: Metadata in Business Intelligence

Metadata in Business Intelligence

Jose Luis Lopez Pino

Database Systems and Information Management
Technische Universität Berlin

January 28, 2014

v1.2

Page 2: Metadata in Business Intelligence

Table of Contents

1 Metadata
   What is Metadata?
   Metadata for Information Systems

2 Business Intelligence
   What is Business Intelligence?
   Business Intelligence in a Nutshell
   The Dimensional Fact Model
   Data Warehousing

3 Metadata in BI
   Motivation
   Classification
   The Four Commandments of BI Metadata

4 Examples
   ROLAP and Metadata
   Oracle Administration Tool

5 Research
   Metadata and Interoperability
   Platform-Independent Models
   Metadata in Multiversion DWH

6 Big Data
   Examples
   Some Thoughts about Metadata and Hadoop

7 Conclusions
   10 Reasons why Metadata matters in BI
   Final Conclusions

Page 3: Metadata in Business Intelligence

Metadata Business Intelligence Metadata in BI Examples Research Big Data Conclusions

Metadata

Jose Luis Lopez Pino 3

Page 4: Metadata in Business Intelligence


What is Metadata?

“ Metadata is a set of data that describes and gives information about other data. ”

— Oxford Dictionary

“ Metadata is explicitly managed data describing other data or system elements to support their documentation, reusability and interoperation.” 1

1 Susanne Busse, Ralf-Detlef Kutsche, Ulf Leser, and Herbert Weber. Federated information systems: Concepts, terminology and architectures. Citeseer, 1999.

Page 5: Metadata in Business Intelligence

Metadata for Information Systems

- Technical metadata: describes the technical access mechanisms of components.

- Logical metadata: relates to the schemas and their logical relationships.

- Metamodels: support the interoperability of schemas in different data models.

- Semantic metadata: helps to describe the semantics of concepts.

- Quality-related metadata: describes source-specific properties of information systems regarding their quality.

- Infrastructure metadata: helps users to find relevant data.

- User-related metadata: describes the responsibilities and preferences of the users.

Page 6: Metadata in Business Intelligence

Business Intelligence

Page 7: Metadata in Business Intelligence

What is Business Intelligence?

Processing and organizing data in order to extract information and using this information to make business decisions.

“ Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”

— Gartner

Page 8: Metadata in Business Intelligence

Why Data Analysis?

Page 9: Metadata in Business Intelligence

Business Intelligence in a Nutshell I

- OLTP: information system oriented to small, interactive operations

- ETL: process that consists of extracting, transforming and loading data

- Data warehouse: central repository of data used for reporting and analysis

- Data mart: contains a subset of the information of a data warehouse, tailored to a single business view

- OLAP: technique to analyse multidimensional data

- ROLAP: using a relational database to do OLAP analysis

- MDX: query language for multidimensional data

- Data mining: discovering patterns in data

Page 10: Metadata in Business Intelligence

Business Intelligence in a Nutshell II

- Data visualization: representation of data to make it more meaningful and/or attractive

- Decision support: tools that facilitate making a decision based on data

- Data-driven business: companies led by a strategy based on data

Page 11: Metadata in Business Intelligence

The Dimensional Fact Model I

- Fact: an event that is relevant to the decision-making process.

- Measure: a numerical attribute of the fact.

- Dimensions: categorize the data into a finite number of slots.
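To make the three terms concrete, here is a minimal sketch in Python. The sales facts, the dimension names (date, product) and the helper `aggregate` are invented for illustration and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sales facts: each fact is an event, described by dimension
# values (date, product) and carrying one numeric measure (amount).
facts = [
    {"date": "2014-01-27", "product": "laptop", "amount": 1200.0},
    {"date": "2014-01-27", "product": "mouse", "amount": 25.0},
    {"date": "2014-01-28", "product": "laptop", "amount": 1150.0},
]


def aggregate(facts, dimension, measure):
    """Sum a measure over the slots (distinct values) of one dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[dimension]] += fact[measure]
    return dict(totals)


by_product = aggregate(facts, "product", "amount")
by_date = aggregate(facts, "date", "amount")
```

Each call groups the same facts along a different dimension, which is the basic operation an OLAP tool performs on a cube.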

Page 12: Metadata in Business Intelligence

The Dimensional Fact Model II

Page 13: Metadata in Business Intelligence

Cube

Page 14: Metadata in Business Intelligence

Data Warehousing

Copyright 2013 Toon Calders http://goo.gl/ds8nZc

Page 15: Metadata in Business Intelligence

Metadata Management in Data Warehousing

Copyright 2014 LINGARO http://goo.gl/Wfxsni

Page 16: Metadata in Business Intelligence

Metadata in BI

Page 17: Metadata in Business Intelligence

Motivation: Quotes I

“Metadata is a vital element of the data warehouse.”

— William Inmon2

“Metadata is the DNA of the data warehouse.”

— Ralph Kimball3

“Metadata is analogous to the data warehouse encyclopedia.”

— Ralph Kimball3

2 William H. Inmon. Metadata in the Data Warehouse. Morgan Kaufmann, 2000.

3 Ralph Kimball. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. Wiley, 1998.

Page 18: Metadata in Business Intelligence

Motivation: Quotes II

“The fact that metadata drives the warehouse is the literal truth. If you think you won't use metadata, you are mistaken.”

— Ralph Kimball4

“In the scope of data warehousing, meta-data plays an essential role because it specifies source, values, usage and features of data warehouse data and defines how data can be changed and processed at every architecture layer.”

— Matteo Golfarelli, Stefano Rizzi5

4 Ralph Kimball. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. Wiley, 1998.

5 M. Golfarelli and S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009.

Page 19: Metadata in Business Intelligence


Metadata is everywhere!

- Meaning of the objects.
- User profiles.
- Security permissions.
- Usage statistics.
- Logical model.
- Relation between physical and logical objects.
- DBMS metadata: tables, indexes, FKs, PKs, etc.
- Reporting / data analysis objects.
- Transformations of the data.
- Data sources and data targets.
- Query logs.
- ETL logs.
- Materialized information.

Page 20: Metadata in Business Intelligence

Classification

1. Technical metadata:
   - Describes the physical objects that make up the data warehouse.
   - Tables, fields, indexes, sources, targets, transformations, etc.

2. Business metadata:
   - Describes the contents of the data warehouse in a way that is accessible for conducting the day-to-day business.6
   - Facts, dimensions, logical relationships, etc.

3. Process metadata:
   - Describes operations executed on the warehouse and their results.
   - Results of the ETL process, query logging, etc.

6 William H. Inmon, Bonnie O'Neil, and Lowell Fryman. Business Metadata: Capturing Enterprise Knowledge. Morgan Kaufmann, 2010.
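The three classes can be seen side by side in a small sketch. It uses Python's built-in sqlite3 module. The `sales` table, the business glossary and the log format are assumptions made up for this example, not part of any BI product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")

# Technical metadata: the DBMS catalog already describes the physical objects.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
technical = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(sales)")
]

# Business metadata: a hand-maintained glossary in business terms (hypothetical).
business = {
    "sales.amount": "Revenue per sale, in euros",
    "sales.product": "Product sold",
}

# Process metadata: a log of the operations executed on the warehouse.
process_log = []
rows = [("2014-01-27", "laptop", 1200.0), ("2014-01-28", "mouse", 25.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
process_log.append({"step": "load_sales", "rows_loaded": len(rows)})
```

Only the technical metadata comes for free from the database. The business glossary and the process log have to be captured deliberately, which is why they are the parts most often missing.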

Page 21: Metadata in Business Intelligence

The Four Commandments of BI Metadata

A data warehouse's likelihood of success is greatly increased by following Ralph Kimball's advice:7

1. Be aware of what metadata you keep.

2. Centralize it where possible.

3. Track your metadata.

4. Keep it up to date.

7 Ralph Kimball. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. Wiley, 1998.

Page 22: Metadata in Business Intelligence

Examples

Page 23: Metadata in Business Intelligence

ROLAP and Metadata

Figure: PostgreSQL’s ROLAP server translates an MDX query into SQL

Page 24: Metadata in Business Intelligence

ROLAP and Metadata

    SELECT
      Expenses."Expenses per day" saw_0,
      Expenses."Days with expenses" saw_1,
      Expenses."Total Expenses" saw_2,
      Period."Year" saw_3
    FROM "HR - Travel Expenses"
    ORDER BY saw_3

Figure: MDX Query

Page 25: Metadata in Business Intelligence

ROLAP and Metadata

    select
      sum(case when T1757.ZD_NUM = 0 then 0
               else (T1757.ZMDTACE_NAC_IM + T1757.ZMDTACO_NAC_IM + T1757.ZD_NAC_IM
                     + T1757.ZCOMD_NAC_IM + T1757.ZCOMDDIC_IM + T1757.ZMDTACE_EXT_IM
                     + T1757.ZMDTACO_EXT_IM + T1757.ZD_EXT_IM + T1757.ZCOMD_EXT_IM)
                    / nullif(T1757.ZD_NUM, 0)
          end) as c1,
      sum(T1757.ZD_NUM) as c2,
      sum(T1757.ZCLV_032 + T1757.ZCLV_132) as c3,
      T623.YEAR as c4
    from
      SYSADM.PS_ZOBI_CALENDA_VW T623,
      SYSADM.PS_ZOBI_DS_TBL T1757
    where (T623.MONTH_OF_YEAR = T1757.MONTH_OF_YEAR
           and T1757.ZID_COL = 'T'
           and T623.MONTH_OF_YEAR <= 201206
           and T623.YEAR between 2012 - 2 and 2012)
    group by T623.YEAR
    order by c4

Figure: SQL Query
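The translation between the two queries is driven entirely by metadata. The following Python sketch shows the idea on a toy scale. The mapping tables, the `travel_expenses` table and the function `to_sql` are all hypothetical. It is not how any particular ROLAP server is implemented, only an illustration of looking up physical expressions for business names.

```python
import sqlite3

# Hypothetical metadata: maps each business name to a physical SQL expression.
COLUMN_MAP = {
    "Total Expenses": "SUM(t.amount)",
    "Days with expenses": "COUNT(DISTINCT t.expense_day)",
    "Year": "t.year",
}
AGGREGATES = {"Total Expenses", "Days with expenses"}
SUBJECT_AREAS = {"HR - Travel Expenses": "travel_expenses t"}


def to_sql(subject_area, columns):
    """Translate a logical request into SQL using only the metadata above."""
    select = ", ".join(
        f"{COLUMN_MAP[name]} AS c{i}" for i, name in enumerate(columns)
    )
    # Non-aggregated columns become the grouping keys.
    group_by = [COLUMN_MAP[c] for c in columns if c not in AGGREGATES]
    sql = f"SELECT {select} FROM {SUBJECT_AREAS[subject_area]}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


sql = to_sql("HR - Travel Expenses", ["Total Expenses", "Year"])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE travel_expenses (year INTEGER, expense_day TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO travel_expenses VALUES (?, ?, ?)",
    [(2011, "2011-03-01", 100.0), (2012, "2012-05-02", 40.0), (2012, "2012-05-03", 60.0)],
)
result = sorted(conn.execute(sql).fetchall())
```

The analyst only ever names "Total Expenses" and "Year". Renaming the physical column means editing `COLUMN_MAP`, not the analyst's request.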

Page 26: Metadata in Business Intelligence

Oracle Administration Tool

Figure: The physical layer stores the technical metadata, while the other two layers store the business metadata.

Page 27: Metadata in Business Intelligence

Advantages

- Abstraction: data analysts do not need knowledge of the complex data sources involved in the system. They only worry about the business question, not about how to answer it.

- Portability: changes to the physical model don’t affect the logical model.

- Security: defining a strong security policy allows administrators to restrict users' access to information that they must not know about.

- Customization: the information is adapted to the user.

Azriel Marla and Bob Ertl. Oracle Fusion Middleware Metadata Repository Builder's Guide for Oracle Business Intelligence Enterprise Edition, 11g Release 1 (11.1.1), 2011.

Page 28: Metadata in Business Intelligence

Research

Page 29: Metadata in Business Intelligence

Metadata and Interoperability

- The BI environment is composed of a wide variety of tools.
- Complex bridges are crucial to integrate metadata among them.
- It is necessary to define a standard to facilitate interoperability and integration.
- Some attempts:
   - Open Information Model (OIM) by the Meta Data Coalition.
   - Common Warehouse Metamodel (CWM) by the OMG.
   - OIM was integrated into CWM.
- Suggestion: use domain ontologies to establish semantic mappings between different data marts.

Stefano Rizzi, Alberto Abelló, Jens Lechtenbörger, and Juan Trujillo. Research in data warehouse modeling and design: dead or alive? In Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP, pages 3–10. ACM, 2006.

Page 30: Metadata in Business Intelligence

How Do Standards Proliferate?

Figure: XKCD http://xkcd.com/927/

Page 31: Metadata in Business Intelligence

OIM vs. CWM

- Both are metadata standards for data warehousing.

- OIM’s scope is wider, not only for metadata.

- Good for technical metadata, not for business metadata.

- OIM is limited to relational data.

- Using CWM, metadata exchange between tools that use the XMI standard is automatic.

Thomas Vetterli, Anca Vaduva, and Martin Staudt. Metadata standards for data warehousing: open information model vs. common warehouse metadata. ACM SIGMOD Record, 29(3):68–75, 2000.

Page 32: Metadata in Business Intelligence

Platform-Independent Models

- The problem: you have to provide OLAP metadata to bridge the gap between the conceptual and the logical model. This metadata depends on the platform.

- The solution:
   - Define an OLAP algebra that provides semantics in multidimensional models.
   - It derives the logical design automatically, for any platform.
   - Model Driven Architecture: derive the metadata from the conceptual model.

Jesús Pardillo, Jose-Norberto Mazón, and Juan Trujillo. Bridging the semantic gap in OLAP models: platform-independent queries. In Proceedings of the ACM 11th International Workshop on Data Warehousing and OLAP, pages 89–96. ACM, 2008.

Jesús Pardillo, Jose-Norberto Mazón, and Juan Trujillo. Towards the automatic generation of analytical end-user tools metadata for data warehouses. In Sharing Data, Information and Knowledge, pages 203–206. Springer, 2008.

Page 33: Metadata in Business Intelligence

Metadata in Multiversion DWH

- Multiversion DWH:
   - It keeps track of the changes in the schema and the data.
   - Metadata becomes more complex and more useful in these systems.

- Proposal:
   - Use a metamodel to manage different versions of the DWH.
   - Use a metamodel to detect changes in the external data sources.

Robert Wrembel and Bartosz Bebel. Metadata management in a multiversion data warehouse. In On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE, pages 1347–1364. Springer, 2005.
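Detecting changes in a source schema can be illustrated with a plain comparison of two schema versions. This is a simplified sketch and not the metamodel proposed by Wrembel and Bebel. The function `diff_schema` and the example columns are assumptions made for illustration.

```python
def diff_schema(old, new):
    """Compare two schema versions given as {column_name: type} dicts."""
    added = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    retyped = {
        c: (old[c], new[c]) for c in old if c in new and old[c] != new[c]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Two hypothetical versions of a customer source table.
v1 = {"customer_id": "INTEGER", "name": "TEXT", "zip": "INTEGER"}
v2 = {"customer_id": "INTEGER", "name": "TEXT", "zip": "TEXT", "country": "TEXT"}
changes = diff_schema(v1, v2)
```

Storing each schema version as metadata is what makes such a comparison possible. Without it, a changed source is only noticed when the ETL process fails.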

Page 34: Metadata in Business Intelligence

Big Data

Page 35: Metadata in Business Intelligence

Examples: HDFS

- The NameNode stores all the metadata in a single point.

- It keeps all the metadata in memory.

- This can be problematic when we store a vast number of small files.14

14 Grant Mackey, Saba Sehrish, and Jun Wang. Improving metadata management for small files in HDFS. In Cluster Computing and Workshops, 2009. CLUSTER’09. IEEE International Conference on, pages 1–4. IEEE, 2009.
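A back-of-the-envelope calculation shows why small files hurt. The sketch below assumes roughly 150 bytes of NameNode heap per file or block entry, a commonly quoted rule of thumb and not a figure from the slides, and a 128 MiB block size. Both constants are assumptions and real values vary by Hadoop version and configuration.

```python
import math

BYTES_PER_OBJECT = 150  # assumed rule-of-thumb heap cost per file or block entry
BLOCK_SIZE = 128 * 1024**2  # assumed HDFS block size of 128 MiB


def namenode_bytes(num_files, file_size):
    """Estimate NameNode heap for num_files files of file_size bytes each."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    objects = num_files * (1 + blocks_per_file)  # one file entry plus its blocks
    return objects * BYTES_PER_OBJECT


total = 1024**4  # 1 TiB of data in both scenarios
small = namenode_bytes(total // 1024**2, 1024**2)  # stored as 1 MiB files
large = namenode_bytes(total // 1024**3, 1024**3)  # stored as 1 GiB files
```

Under these assumptions the same terabyte needs about 300 MiB of metadata as 1 MiB files but only about 1.3 MiB as 1 GiB files, a difference of more than two orders of magnitude.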

Page 36: Metadata in Business Intelligence

Examples: Query Planner

Figure: Apache Drill architecture: http://goo.gl/icZctF

Page 37: Metadata in Business Intelligence

Examples: Table and Storage Management Layer

Figure: HCatalog http://goo.gl/7E1xLc

Page 38: Metadata in Business Intelligence

Examples: Authorization to Data and Metadata

Figure: Apache Sentry: http://goo.gl/zAsIyk

Page 39: Metadata in Business Intelligence

Some Thoughts about Metadata and Hadoop

- Technical metadata is necessary.

- Hadoop is rapidly becoming a mature platform, and hence metadata will become more relevant in the coming years.

- Metadata seems to be a perfect fit for the heterogeneous Hadoop ecosystem.

Page 40: Metadata in Business Intelligence

Conclusions

Page 41: Metadata in Business Intelligence

10 Reasons why Metadata matters in BI

1. It’s everywhere!

2. It meets the disparate needs of the data warehouse's technical, administrative, and business user groups.

3. It contains information at least as valuable as regular data.

4. It is used to describe the semantics of concepts.

5. It facilitates the extraction, transformation and load process.

6. It improves data security.

7. It hides implementation details.

8. We can customize how the user sees the data.

9. It helps interoperability among systems.

10. It allows us to design portable solutions.

Page 42: Metadata in Business Intelligence

Final Conclusions

1. Metadata matters

2. Metadata is everywhere. You can’t get out of Dodge

3. Research is alive

4. Metadata management is less painful when using the right tools

5. Big data challenges are eased by metadata
