Automating DWH Patterns Through Metadata

Date post: 21-Nov-2014
Upload: davide-mauri
Description:
Around 80% of the work to create a data warehouse/BI solution is spent on the ETL phase. Although building an ETL solution can be a challenge, you can break the project down into at least two separate processes for easier management. One process is strictly related to business modeling, and therefore cannot be replicated. The other is made up of purely technical processes that are always the same, regardless of the business environment we operate in, and thus can be highly automated. In this session, we will look at well-known patterns for solving common problems and how they can be automated with the help of specific tools and techniques that use metadata to reduce development time and bugs. Using these engineering techniques, you will be able to adopt an Agile approach to your BI solution.
Transcript
Page 1: Automating DWH Patterns Through Metadata
Page 2: Automating DWH Patterns Through Metadata

Automating Data Warehouse Patterns Through Metadata
Davide Mauri

Page 3: Automating DWH Patterns Through Metadata

Davide Mauri

20 years of experience on the SQL Server Platform
– Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence, Data Warehouse, Big Data & Analytics

Microsoft SQL Server MVP
President of UGISS (Italian SQL Server UG)
Mentor @ SolidQ
– Regular Speaker @ SQL Server events
– Projects, Consulting, Mentoring & Training

Find me here:
– Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx
– Twitter: @mauridb

Page 4: Automating DWH Patterns Through Metadata

Building a DWH in 2013

Is still an (almost) manual process

A *lot* of repetitive low-value work

No (or very few) standard tools available

Page 5: Automating DWH Patterns Through Metadata

How it should be

Semi-automatic process
– “develop by intent”

Define the mapping logic from a semantic perspective
– Source to Dimensions / Measures
• (Metadata anyone?)

Design the model and let the tool build it for you

CREATE DIMENSION Customer
FROM SourceCustomerTable
MAP USING CustomerMetadata

ALTER DIMENSION Customer
ADD ATTRIBUTE LoyaltyLevel
AS TYPE 1

CREATE FACT Orders
FROM SourceOrdersTable
MAP USING OrdersMetadata

ALTER FACT Orders
ADD DIMENSION Customer

Page 6: Automating DWH Patterns Through Metadata

The perfect BI process & architecture

AGILE BI

Iterative!

Page 7: Automating DWH Patterns Through Metadata

DWH PROCESSES
Is automation possible?

Page 8: Automating DWH Patterns Through Metadata

Invest in Automation?

Faster development
– Reduce Costs
– Embrace Changes

Fewer bugs

Increase solution quality and make it consistent throughout the whole product

Page 9: Automating DWH Patterns Through Metadata

Automation Pre-Requisites

Split the process into two separate types of processes
– What can be automated
– What can NOT be automated

Create and impose a set of rules that defines
– How to solve common technical problems
– How to implement the solutions identified

Page 10: Automating DWH Patterns Through Metadata

No Monkey Work!

Let the people think and let the machines do the «monkey» work.

Page 11: Automating DWH Patterns Through Metadata

Design Pattern

“A general reusable solution to a commonly occurring problem within a given context”

Page 12: Automating DWH Patterns Through Metadata

Design Pattern

Generic ETL Pattern
– Partition Load
– Incremental/Differential Load

Generic BI Design Pattern
– Slowly Changing Dimension
• SCD1, SCD2, etc.
– Fact Table
• Transactional, Snapshot, Temporal Snapshot

Page 13: Automating DWH Patterns Through Metadata

Design Pattern

Specific SQL Server Patterns
– Change Data Capture
– Change Tracking
– Partition Load
– SSIS Parallelism

Page 15: Automating DWH Patterns Through Metadata

Sample Rules
• Always add a «last_update» column
• Always log Inserted/Updated/Deleted rows to the log.load_info table
• Use MD5 – binary(16) for checksums
• Use views to expose data
– Dimension & Fact views MUST use the same column names for lookup columns
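The checksum rule can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the function name `row_checksum`, the separator and the sample columns are assumptions made for illustration. It only shows that an MD5 digest is 16 bytes long, which is why it fits a binary(16) column, and that a change in any tracked column changes the digest.

```python
import hashlib

def row_checksum(row, columns, sep="|"):
    """Return a 16-byte MD5 digest computed over the chosen columns of a row (a dict)."""
    parts = []
    for name in columns:
        value = row.get(name)
        # Represent NULL with a marker so that None and "" do not produce the same digest.
        parts.append("\x00" if value is None else str(value))
    return hashlib.md5(sep.join(parts).encode("utf-8")).digest()

# Hypothetical sample rows.
old = {"customer_code": "C001", "city": "Milan", "loyalty_level": "Gold"}
new = dict(old, city="Stockholm")
tracked = ["city", "loyalty_level"]

print(len(row_checksum(old, tracked)))                           # 16 bytes, fits binary(16)
print(row_checksum(old, tracked) != row_checksum(new, tracked))  # True: the city changed
```

Storing the digest next to each row lets a later load compare one 16-byte value instead of every column.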

Page 19: Automating DWH Patterns Through Metadata

High-Level Vision

[Diagram: OLTP, STG and DWH linked by three ETL steps, labelled Technical Process, Business Process and Technical Process]

Page 20: Automating DWH Patterns Through Metadata

ETL Phases

«E» and «L» must be
– Simple, Easy and Straightforward
– Completely Automated
– Completely Reusable

«E» and «L» have ZERO value in a BI Solution
– Should be done in the most economical way

Page 21: Automating DWH Patterns Through Metadata

PATTERN
Well-known solutions to common problems

Page 22: Automating DWH Patterns Through Metadata

Source Full Load (E)

Page 23: Automating DWH Patterns Through Metadata

Source Incremental Load (E)

In this scenario, “ID” is an IDENTITY/SEQUENCE. Probably a PK.
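A minimal sketch of this pattern, using Python's built-in sqlite3 module as a stand-in for the real source database. The table `src_orders`, its columns and the function `incremental_extract` are hypothetical. The idea is simply to remember the highest ID already loaded and extract only rows above it.

```python
import sqlite3

def incremental_extract(conn, last_loaded_id):
    """Return source rows whose id is greater than the last id already loaded."""
    cur = conn.execute(
        "SELECT id, amount FROM src_orders WHERE id > ? ORDER BY id",
        (last_loaded_id,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY plays the role of the IDENTITY/SEQUENCE column.
conn.execute("CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO src_orders (amount) VALUES (?)", [(10.0,), (20.0,), (30.0,)])

first_batch = incremental_extract(conn, 0)      # first run: everything
watermark = first_batch[-1][0]                  # highest id seen so far
conn.execute("INSERT INTO src_orders (amount) VALUES (40.0)")
second_batch = incremental_extract(conn, watermark)
print(second_batch)                             # [(4, 40.0)]
```

This only catches inserts. Rows that are updated in place keep their ID, which is the gap the differential patterns address.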

Page 24: Automating DWH Patterns Through Metadata

Source Differential Load/1 (E)

In this scenario the source table doesn’t offer any specific way to understand what’s changed
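When the source gives no hint, one option is to compare the whole source against the copy kept from the previous load. The sketch below does this with a SQL set difference (EXCEPT), run on sqlite3 as a stand-in. The slide's own diagram is not in the transcript, so this is one possible approach rather than the speaker's. The table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (code TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE stg_customer (code TEXT PRIMARY KEY, city TEXT);
    -- Copy kept from the previous load.
    INSERT INTO stg_customer VALUES ('C001', 'Milan'), ('C002', 'Rome'), ('C004', 'Paris');
    -- Current state of the source.
    INSERT INTO src_customer VALUES ('C001', 'Stockholm'), ('C002', 'Rome'), ('C003', 'Oslo');
""")

# Rows in the source with no identical row in staging: new or changed.
new_or_changed = conn.execute(
    "SELECT code, city FROM src_customer "
    "EXCEPT SELECT code, city FROM stg_customer ORDER BY code"
).fetchall()

# Keys in staging that no longer exist in the source: deleted.
deleted = conn.execute(
    "SELECT code FROM stg_customer EXCEPT SELECT code FROM src_customer"
).fetchall()

print(new_or_changed)   # [('C001', 'Stockholm'), ('C003', 'Oslo')]
print(deleted)          # [('C004',)]
```

The cost is a full read of the source on every run, which is why a timestamp-like column or a change-capture feature is preferable when available.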

Page 25: Automating DWH Patterns Through Metadata

Source Differential Load/2 (E)

In this scenario the source table has a TimeStamp-Like column
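A minimal sketch of the timestamp-like variant, again on sqlite3 as a stand-in. The column `row_version` is a hypothetical integer that the source increases on every modification. In the sketch it is bumped by hand, whereas a real source would maintain it automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_customer (code TEXT PRIMARY KEY, city TEXT, row_version INTEGER)"
)
conn.executemany(
    "INSERT INTO src_customer VALUES (?, ?, ?)",
    [("C001", "Milan", 1), ("C002", "Rome", 2)],
)

def changed_since(conn, last_version):
    """Return rows inserted or updated after the given version."""
    return conn.execute(
        "SELECT code, city, row_version FROM src_customer "
        "WHERE row_version > ? ORDER BY row_version",
        (last_version,),
    ).fetchall()

# First load sees both rows; remember the highest version loaded.
last_version = max(row[2] for row in changed_since(conn, 0))

# Simulate an update in the source, which bumps the version.
conn.execute("UPDATE src_customer SET city = 'Stockholm', row_version = 3 WHERE code = 'C001'")

delta = changed_since(conn, last_version)
print(delta)   # [('C001', 'Stockholm', 3)]
```

Unlike the ID-based incremental load, this picks up updates as well as inserts, but deletes remain invisible.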

Page 26: Automating DWH Patterns Through Metadata

Source Differential Load (E)
• SQL Server 2012 features that can help with incremental/differential load
– Change Data Capture
• Natively supported in SSIS 2012
• http://www.mattmasson.com/2011/12/cdc-in-ssis-for-sql-server-2012-2/
– Change Tracking
• Underused feature in BI… not as rich as CDC but MUCH simpler and easier

Page 27: Automating DWH Patterns Through Metadata

SCD 1 & SCD 2 (L)

Start
1. Lookup Dimension Id and MD5 Checksum from Business Key
2. Calculate MD5 Checksum of Non-SCD-Key Columns
3. Dimension Id is Null?
– Yes: Insert new members into DWH
– No: Checksums are different?
• Yes: Store into temp table
4. Merge data from temp table to DWH
End
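The routing decisions in this flow can be sketched in plain Python. This is an illustration under assumptions, not the SSIS data flow itself. The names `md5_of`, `route_rows` and `dimension_lookup` are hypothetical, and the lookup is an in-memory dict instead of a database lookup.

```python
import hashlib

def md5_of(row, columns):
    """MD5 digest over the tracked (non-key) columns of a row."""
    return hashlib.md5("|".join(str(row[c]) for c in columns).encode("utf-8")).digest()

def route_rows(incoming, dimension_lookup, business_key, tracked_columns):
    """Split incoming rows into new members and changed members.

    dimension_lookup maps business key -> (dimension_id, stored_checksum).
    """
    new_members, changed = [], []
    for row in incoming:
        dim_id, stored = dimension_lookup.get(row[business_key], (None, None))
        current = md5_of(row, tracked_columns)
        if dim_id is None:
            new_members.append(row)          # insert straight into the DWH
        elif stored != current:
            changed.append((dim_id, row))    # stage for the final merge
        # identical checksum: nothing to do
    return new_members, changed

tracked = ["city", "loyalty_level"]
lookup = {"C001": (1, md5_of({"city": "Milan", "loyalty_level": "Gold"}, tracked))}
incoming = [
    {"customer_code": "C001", "city": "Stockholm", "loyalty_level": "Gold"},
    {"customer_code": "C002", "city": "Rome", "loyalty_level": "Silver"},
]
new_members, changed = route_rows(incoming, lookup, "customer_code", tracked)
print([r["customer_code"] for r in new_members])   # ['C002']
print([dim_id for dim_id, _ in changed])           # [1]
```

Rows whose checksum matches the stored one fall through both branches, so unchanged members cost only the lookup.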

Page 28: Automating DWH Patterns Through Metadata

SCD 2 Special Note (L)
• Merge => UPDATE Interval + INSERT New Row
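A minimal sketch of what that merge amounts to for one changed member, run on sqlite3 as a stand-in. The table `dim_customer`, the `valid_from`/`valid_to` columns, the open-ended sentinel date '9999-12-31' and the sample dates are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_id INTEGER PRIMARY KEY,
           customer_code TEXT, city TEXT,
           valid_from TEXT, valid_to TEXT)"""
)
conn.execute(
    "INSERT INTO dim_customer (customer_code, city, valid_from, valid_to) "
    "VALUES ('C001', 'Milan', '2013-01-01', '9999-12-31')"
)

def apply_scd2_change(conn, code, city, change_date):
    """Close the current version of a member and insert the new version."""
    with conn:  # both statements are committed together, or rolled back together
        # UPDATE Interval: end the validity of the currently open row.
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? "
            "WHERE customer_code = ? AND valid_to = '9999-12-31'",
            (change_date, code),
        )
        # INSERT New Row: the new version becomes the open row.
        conn.execute(
            "INSERT INTO dim_customer (customer_code, city, valid_from, valid_to) "
            "VALUES (?, ?, ?, '9999-12-31')",
            (code, city, change_date),
        )

apply_scd2_change(conn, "C001", "Stockholm", "2013-11-04")
rows = conn.execute(
    "SELECT city, valid_from, valid_to FROM dim_customer ORDER BY customer_id"
).fetchall()
print(rows)
```

After the call the member has two rows: the closed Milan version and the open Stockholm version, each with its own surrogate key.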

Page 29: Automating DWH Patterns Through Metadata

FACT TABLE LOAD (L)

Page 30: Automating DWH Patterns Through Metadata

Partition Load (E, L)

Page 31: Automating DWH Patterns Through Metadata

Parallel Load (E, L)
• Logically split the work into several steps
– E.g.: Load/Process one customer at a time
• Create a «queue» table that stores information for each step
– Step 1 -> Load Customer «A»
– Step 2 -> Load Customer «B»
• Create a Package that
1. Picks the first step not already picked up
2. Does the work
3. Goes back to step 1
• Call the Package «n» times simultaneously
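The queue-table loop can be sketched with Python threads standing in for the «n» simultaneous package executions and sqlite3 standing in for the database. The table `load_queue`, the column `picked_by` and the worker names are hypothetical. The `threading.Lock` is an assumption of the sketch: it replaces the row locking a real database server would provide when several packages claim steps at once.

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "CREATE TABLE load_queue (step INTEGER PRIMARY KEY, customer TEXT, picked_by TEXT)"
)
conn.executemany(
    "INSERT INTO load_queue (customer) VALUES (?)", [("A",), ("B",), ("C",), ("D",)]
)
conn.commit()

db_lock = threading.Lock()   # stand-in for server-side locking
processed = []

def claim_next(worker):
    """Pick the first step nobody has picked yet and mark it as taken."""
    with db_lock:
        row = conn.execute(
            "SELECT step, customer FROM load_queue "
            "WHERE picked_by IS NULL ORDER BY step LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE load_queue SET picked_by = ? WHERE step = ?", (worker, row[0])
            )
            conn.commit()
        return row

def run_package(worker):
    while True:
        step = claim_next(worker)       # 1. pick the first step not already picked
        if step is None:
            return                      # queue is empty: this worker is done
        processed.append(step[1])       # 2. do the work (placeholder)
                                        # 3. loop back to step 1

workers = [threading.Thread(target=run_package, args=(f"w{i}",)) for i in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(sorted(processed))   # ['A', 'B', 'C', 'D']
```

Because each step is claimed and marked inside one locked section, no customer is loaded twice, whichever worker gets there first.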

Page 32: Automating DWH Patterns Through Metadata

Other SSIS Specific Patterns
• Range Lookup
– Not natively supported
– Matt Masson has the answer in his blog
• http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx

Page 33: Automating DWH Patterns Through Metadata

METADATA
A key ingredient in automation

Page 34: Automating DWH Patterns Through Metadata

Metadata

Provide context information
– Which columns are used to build/feed a Dimension?
– Which columns are Business Keys?
– Which table is the Fact Table?
– How are Fact and Dimension connected?
• Which columns are used?
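As a hedged illustration, the questions above can be answered from a small metadata structure. The layout of `METADATA`, the column names and the helper functions are assumptions. Only the table names SourceCustomerTable and SourceOrdersTable come from the earlier slide.

```python
# Hypothetical metadata for one dimension and one fact table.
METADATA = {
    "dimensions": {
        "Customer": {
            "source": "SourceCustomerTable",
            "business_keys": ["customer_code"],
            "attributes": ["name", "city", "loyalty_level"],
        }
    },
    "facts": {
        "Orders": {
            "source": "SourceOrdersTable",
            "measures": ["quantity", "amount"],
            # Which columns connect the fact to each dimension.
            "dimension_links": {"Customer": ["customer_code"]},
        }
    },
}

def business_keys(meta, dimension):
    """Which columns are Business Keys?"""
    return meta["dimensions"][dimension]["business_keys"]

def join_condition(meta, fact, dimension, fact_alias="f", dim_alias="d"):
    """How are Fact and Dimension connected? Returned as a SQL ON clause."""
    columns = meta["facts"][fact]["dimension_links"][dimension]
    return " AND ".join(f"{fact_alias}.{c} = {dim_alias}.{c}" for c in columns)

print(business_keys(METADATA, "Customer"))            # ['customer_code']
print(join_condition(METADATA, "Orders", "Customer")) # f.customer_code = d.customer_code
```

Once these answers are data rather than tribal knowledge, a generator can use them to build lookups without anyone re-typing column names.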

Page 35: Automating DWH Patterns Through Metadata

How to manage Metadata?
• Naming Convention
• Extended Properties
• Specific, Ad Hoc Database or Tables
• Other (XML, File, etc.)

Page 36: Automating DWH Patterns Through Metadata

Naming Convention
• The easiest and cheapest
– No additional (hidden) costs
– No need to be maintained
– Never out-of-sync
– No documentation needed
• Actually, it IS PART of the documentation
– Imposes a Standard
• Very limited in terms of flexibility and usage
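A minimal sketch of metadata carried by names alone. The prefixes (dim_, fact_, stg_) and suffixes (_bk, _id) are a hypothetical convention chosen for illustration. The slides do not say which convention the speaker uses.

```python
def classify_table(name):
    """Infer the role of a table from a hypothetical prefix convention."""
    prefixes = {"dim_": "dimension", "fact_": "fact", "stg_": "staging"}
    for prefix, role in prefixes.items():
        if name.lower().startswith(prefix):
            return role
    return "unknown"

def classify_column(name):
    """Infer the role of a column from a hypothetical suffix convention."""
    lowered = name.lower()
    if lowered.endswith("_bk"):
        return "business key"
    if lowered.endswith("_id"):
        return "surrogate key"
    return "attribute"

print(classify_table("dim_customer"))      # dimension
print(classify_column("customer_bk"))      # business key
print(classify_column("loyalty_level"))    # attribute
```

The limits are visible immediately: a name can carry a role, but not, say, which SCD type an attribute follows without making names unwieldy.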

Page 37: Automating DWH Patterns Through Metadata

Extended Properties

Support most metadata needs

No additional software needed

Very verbose usage
– Development of a wrapper to make usage simpler is feasible and encouraged
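One possible shape for such a wrapper, sketched in Python: a helper that builds the verbose `sys.sp_addextendedproperty` call for a column, so callers only supply five values. The helper name and the property name `dwh_role` are hypothetical. The script only produces T-SQL text and does not execute it.

```python
def extended_property_sql(schema, table, column, name, value):
    """Build the T-SQL call that tags one column with one metadata property."""
    def lit(text):
        # Quote as an N'...' literal, doubling embedded single quotes.
        return "N'" + str(text).replace("'", "''") + "'"

    return (
        "EXEC sys.sp_addextendedproperty "
        f"@name = {lit(name)}, @value = {lit(value)}, "
        f"@level0type = N'SCHEMA', @level0name = {lit(schema)}, "
        f"@level1type = N'TABLE', @level1name = {lit(table)}, "
        f"@level2type = N'COLUMN', @level2name = {lit(column)};"
    )

print(extended_property_sql("dbo", "SourceCustomerTable", "customer_code",
                            "dwh_role", "business key"))
```

The eight-parameter call collapses to one line per tagged column, which is the kind of simplification the slide encourages.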

Page 38: Automating DWH Patterns Through Metadata

Metadata Objects

Dedicated Ad-Hoc Database and Tables

As flexible as you need

Maintenance overhead to keep metadata in-sync with data
– Development of an automatic check procedure is needed
– DMVs can help a lot here
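A minimal sketch of such an automatic check, run on sqlite3 as a stand-in. Here `PRAGMA table_info` plays the role that catalog views or DMVs would play on SQL Server. The function `missing_columns`, the table and the expected-column list are hypothetical.

```python
import sqlite3

def missing_columns(conn, expected):
    """Return {table: [columns named in the metadata but absent from the table]}."""
    problems = {}
    for table, columns in expected.items():
        # Table names come from our own metadata, so interpolating them is acceptable here.
        actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        absent = [c for c in columns if c not in actual]
        if absent:
            problems[table] = absent
    return problems

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, customer_code TEXT, city TEXT)")

# The metadata still mentions a column that the table does not have.
expected = {"dim_customer": ["customer_code", "city", "loyalty_level"]}
print(missing_columns(conn, expected))   # {'dim_customer': ['loyalty_level']}
```

Running a check like this before each load turns silent metadata drift into an explicit failure.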

Page 39: Automating DWH Patterns Through Metadata

External Metadata Objects

Really expensive to keep them in-sync
– A tool is needed, otherwise too much manual work

Do not give any specific benefit compared with Ad-Hoc Database/Tables

Page 40: Automating DWH Patterns Through Metadata

DEMO

Page 41: Automating DWH Patterns Through Metadata

AUTOMATION
Let’s make it possible!

Page 42: Automating DWH Patterns Through Metadata

Automation Scenarios
• Run-Time: «Auto-Configuring» Packages
– Really hard to customize packages
– SSIS limitations must be managed
• E.g.: Data Flow cannot be changed at runtime
• On-the-fly creation of packages may be needed
• Design-Time: Package Generators / Package Templates
– Easy to customize created packages
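The design-time idea can be sketched with a plain text template. This is a stand-in under assumptions: it generates SQL load scripts rather than SSIS packages, and the template, the `stg`/`src` schema names and the table list are hypothetical. The point is that the generator runs once, ahead of time, and its output is an ordinary artifact that can be customized afterwards.

```python
from string import Template

# Hypothetical template for a full staging load.
STAGING_LOAD = Template(
    "TRUNCATE TABLE stg.$table;\n"
    "INSERT INTO stg.$table ($columns)\n"
    "SELECT $columns FROM src.$table;\n"
)

def generate_scripts(tables):
    """Render one load script per table from a {table: [columns]} description."""
    return {
        name: STAGING_LOAD.substitute(table=name, columns=", ".join(cols))
        for name, cols in tables.items()
    }

scripts = generate_scripts({
    "Customer": ["customer_code", "city"],
    "Orders": ["order_id", "amount"],
})
print(scripts["Customer"])
```

A run-time approach would instead keep one generic artifact and reconfigure it on each execution, which is where the customization difficulty arises.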

Page 43: Automating DWH Patterns Through Metadata

Automation Solutions
• Specific Tools/Frameworks
– BIML / MIST
• SQL Server Platform
– SQL, PowerShell, .NET
– SMO, AMO

Page 44: Automating DWH Patterns Through Metadata

Package Generators

Required Assemblies
Microsoft.SqlServer.ManagedDTS
Microsoft.SqlServer.DTSRuntimeWrap
Microsoft.SqlServer.DTSPipelineWrap

Path: C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies

Page 45: Automating DWH Patterns Through Metadata

DEMO

Page 46: Automating DWH Patterns Through Metadata

Useful Resources
• «STOCK» Tasks:
– http://msdn.microsoft.com/en-us/library/ms135956.aspx
• How to set Task properties at runtime:
– http://technet.microsoft.com/en-us/library/microsoft.sqlserver.dts.runtime.executables.add.aspx

Page 47: Automating DWH Patterns Through Metadata

BIML – BI Markup Language
• Developed by Varigence
– http://www.varigence.com
– http://bimlscript.com/
– MIST: BIML Full-Featured IDE
• Free via BIDS Helper
– Support “limited” to SSIS package generation
– http://bidshelper.codeplex.com

Page 48: Automating DWH Patterns Through Metadata

THANK YOU!
• For attending this session and PASS SQLRally Nordic 2013, Stockholm

