Page 1: Automating DWH Patterns Through Metadata
Page 2: Automating DWH Patterns Through Metadata

Automating Data Warehouse Patterns Through Metadata
Davide Mauri
[email protected]

Page 3: Automating DWH Patterns Through Metadata

Davide Mauri
20 years of experience on the SQL Server platform
– Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence, Data Warehouse, Big Data & Analytics

Microsoft SQL Server MVP
President of UGISS (Italian SQL Server UG)
Mentor @ SolidQ
– Regular Speaker @ SQL Server events
– Projects, Consulting, Mentoring & Training

Find me here:
– Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx
– Twitter: @mauridb

Page 4: Automating DWH Patterns Through Metadata

Building a DWH in 2013
Is still an (almost) manual process

A *lot* of repetitive low-value work

No (or very few) standard tools available

Page 5: Automating DWH Patterns Through Metadata

How it should be
Semi-automatic process
– “develop by intent”

Define the mapping logic from a semantic perspective
– Source to Dimensions / Measures
  • (Metadata anyone?)

Design the model and let the tool build it for you

CREATE DIMENSION Customer
FROM SourceCustomerTable
MAP USING CustomerMetadata

ALTER DIMENSION Customer
ADD ATTRIBUTE LoyaltyLevel
AS TYPE 1

CREATE FACT Orders
FROM SourceOrdersTable
MAP USING OrdersMetadata

ALTER FACT Orders
ADD DIMENSION Customer

Page 6: Automating DWH Patterns Through Metadata

The perfect BI process & architecture

AGILE BI

Iterative!

Page 7: Automating DWH Patterns Through Metadata

DWH PROCESSES
Is automation possible?

Page 8: Automating DWH Patterns Through Metadata

Invest in Automation?
Faster development
– Reduce Costs
– Embrace Changes

Fewer bugs

Increase solution quality and make it consistent throughout the whole product

Page 9: Automating DWH Patterns Through Metadata

Automation Pre-Requisites
Split the process into two separate types of processes
– What can be automated
– What can NOT be automated

Create and impose a set of rules that defines
– How to solve common technical problems
– How to implement the identified solutions

Page 10: Automating DWH Patterns Through Metadata

No Monkey Work!
Let the people think and let the machines do the «monkey» work.

Page 11: Automating DWH Patterns Through Metadata

Design Pattern
“A general reusable solution to a commonly occurring problem within a given context”

Page 12: Automating DWH Patterns Through Metadata

Design Pattern
Generic ETL Pattern
– Partition Load
– Incremental/Differential Load

Generic BI Design Pattern
– Slowly Changing Dimension
  • SCD1, SCD2, etc.
– Fact Table
  • Transactional, Snapshot, Temporal Snapshot

Page 13: Automating DWH Patterns Through Metadata

Design Pattern
Specific SQL Server Patterns
– Change Data Capture
– Change Tracking
– Partition Load
– SSIS Parallelism

Page 15: Automating DWH Patterns Through Metadata

Sample Rules
• Always add a «last_update» column
• Always log Inserted/Updated/Deleted rows to the log.load_info table
• Use MD5 – binary(16) for checksums
• Use views to expose data
  – Dimension & Fact views MUST use the same column names for lookup columns
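The MD5 rule above can be illustrated with a small sketch. This is not code from the session: the function name `row_checksum`, the `|` separator, and the sample columns are assumptions chosen for illustration. It only shows that an MD5 digest is 16 bytes long, which is why it fits a binary(16) column.

```python
import hashlib

def row_checksum(row, columns, sep="|"):
    """Return a 16-byte MD5 digest over the chosen columns of a row.

    The separator and the NULL-to-empty-string mapping are assumptions;
    a real implementation must pick conventions that avoid collisions.
    """
    parts = ["" if row[c] is None else str(row[c]) for c in columns]
    return hashlib.md5(sep.join(parts).encode("utf-8")).digest()

# Hypothetical sample rows: same customer, one attribute changed.
old = {"customer_id": 1, "name": "Alice", "city": "Rome"}
new = {"customer_id": 1, "name": "Alice", "city": "Milan"}
cols = ["name", "city"]

old_sum = row_checksum(old, cols)
new_sum = row_checksum(new, cols)
```

Comparing `old_sum` with `new_sum` then tells whether any tracked column changed, without comparing the columns one by one.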

Page 19: Automating DWH Patterns Through Metadata

High-Level Vision

[Diagram: data moves from OLTP through STG to the DWH via three ETL steps, labeled Technical Process, Business Process, Technical Process]

Page 20: Automating DWH Patterns Through Metadata

ETL Phases
«E» and «L» must be
– Simple, Easy and Straightforward
– Completely Automated
– Completely Reusable

«E» and «L» have ZERO value in a BI Solution
– Should be done in the most economical way

Page 21: Automating DWH Patterns Through Metadata

PATTERN
Well-known solution to common problems

Page 22: Automating DWH Patterns Through Metadata

Source Full Load E

Page 23: Automating DWH Patterns Through Metadata

Source Incremental Load E
In this scenario, “ID” is an IDENTITY/SEQUENCE. Probably a PK.
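A minimal sketch of this incremental extract, assuming the loader remembers the highest ID it has already loaded. SQLite stands in for the source system, and the table name `source_table`, the `payload` column, and the function name are hypothetical.

```python
import sqlite3

def extract_incremental(conn, last_loaded_id):
    """Return source rows whose ID is above the last one already loaded."""
    cur = conn.execute(
        "SELECT id, payload FROM source_table WHERE id > ? ORDER BY id",
        (last_loaded_id,),
    )
    return cur.fetchall()

# In-memory stand-in for the source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

last_loaded_id = 1  # high-water mark saved by the previous run
rows = extract_incremental(conn, last_loaded_id)
# The new high-water mark is the largest ID just extracted.
new_high_water_mark = rows[-1][0] if rows else last_loaded_id
```

This only picks up new rows. Updates and deletes on already loaded rows need one of the differential patterns.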

Page 24: Automating DWH Patterns Through Metadata

Source Differential Load/1 E

In this scenario the source table doesn’t offer any specific way to understand what’s changed
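When the source gives no hint about changes, one generic option is to compare the current extract with the previously staged copy, key by key. The sketch below assumes both snapshots fit in memory as dictionaries keyed by business key; the names and the sample data are illustrative only, and this is not necessarily the approach shown on the slide.

```python
def diff_snapshots(previous, current):
    """Compare two {business_key: row_tuple} snapshots of the same table."""
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return inserted, updated, deleted

# Hypothetical snapshots: key 1 changed, key 3 disappeared, key 4 is new.
previous = {1: ("Alice", "Rome"), 2: ("Bob", "Oslo"), 3: ("Carl", "Bonn")}
current = {1: ("Alice", "Milan"), 2: ("Bob", "Oslo"), 4: ("Dana", "Lund")}

inserted, updated, deleted = diff_snapshots(previous, current)
```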

Page 25: Automating DWH Patterns Through Metadata

Source Differential Load/2 E

In this scenario the source table has a TimeStamp-Like column

Page 26: Automating DWH Patterns Through Metadata

Source Differential Load
• SQL Server 2012 features that can help with incremental/differential load
  – Change Data Capture
    • Natively supported in SSIS 2012
    • http://www.mattmasson.com/2011/12/cdc-in-ssis-for-sql-server-2012-2/
  – Change Tracking
    • Underused feature in BI… not as rich as CDC but MUCH simpler and easier

E

Page 27: Automating DWH Patterns Through Metadata

SCD 1 & SCD 2 L
Start
1. Look up Dimension Id and MD5 checksum from the Business Key
2. Calculate the MD5 checksum of the non-SCD-key columns
3. Dimension Id is null?
   – Yes: insert new members into DWH
   – No: are the checksums different?
     • Yes: store into temp table
4. Merge data from temp table to DWH
End
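The decision logic of this flow can be sketched in a few lines. The in-memory `dimension_lookup` dictionary stands in for the lookup against the dimension table, and all names (`classify`, `md5_of`, the sample columns) are assumptions made for illustration.

```python
import hashlib

def md5_of(row, columns):
    """MD5 digest over the tracked (non-key) columns of a row."""
    text = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(text.encode("utf-8")).digest()

def classify(source_rows, dimension_lookup, business_key, tracked_columns):
    """Split source rows into new members and changed members.

    dimension_lookup maps business key -> (dimension_id, stored_checksum).
    Unchanged rows are dropped.
    """
    to_insert, to_merge = [], []
    for row in source_rows:
        match = dimension_lookup.get(row[business_key])
        if match is None:                      # Dimension Id is null
            to_insert.append(row)
        elif match[1] != md5_of(row, tracked_columns):  # checksums differ
            to_merge.append((match[0], row))   # goes to the temp table
    return to_insert, to_merge

# Hypothetical data: C1 exists with an old city, C2 is unknown.
tracked = ["name", "city"]
lookup = {"C1": (10, md5_of({"name": "Alice", "city": "Rome"}, tracked))}
source = [
    {"code": "C1", "name": "Alice", "city": "Milan"},
    {"code": "C2", "name": "Bob", "city": "Oslo"},
]
to_insert, to_merge = classify(source, lookup, "code", tracked)
```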

Page 28: Automating DWH Patterns Through Metadata

SCD 2 Special Note
• Merge => UPDATE Interval + INSERT New Row

L
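A sketch of what "UPDATE Interval + INSERT New Row" means for a single changed member. It assumes the dimension keeps `valid_from`/`valid_to` columns and marks the current row with an open-ended date; those column names and the 9999-12-31 marker are assumptions, not taken from the slides.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for the current row

def apply_scd2_change(dimension_rows, business_key, new_attributes, change_date):
    """Close the current row for the key, then append a new current row."""
    for row in dimension_rows:
        if row["business_key"] == business_key and row["valid_to"] == OPEN_END:
            row["valid_to"] = change_date      # UPDATE interval
    dimension_rows.append({                    # INSERT new row
        "business_key": business_key,
        **new_attributes,
        "valid_from": change_date,
        "valid_to": OPEN_END,
    })
    return dimension_rows

# Hypothetical dimension with one current row for customer C1.
dim = [{
    "business_key": "C1",
    "city": "Rome",
    "valid_from": date(2012, 1, 1),
    "valid_to": OPEN_END,
}]
apply_scd2_change(dim, "C1", {"city": "Milan"}, date(2013, 6, 1))
```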

Page 29: Automating DWH Patterns Through Metadata

FACT TABLE LOAD L

Page 30: Automating DWH Patterns Through Metadata

Partition Load EL

Page 31: Automating DWH Patterns Through Metadata

Parallel Load
• Logically split the work into several steps
  – E.g.: Load/Process one customer at a time
• Create a «queue» table that stores information for each step
  – Step 1 -> Load Customer «A»
  – Step 2 -> Load Customer «B»
• Create a Package that
  1. Picks the first step not already picked up
  2. Does the work
  3. Goes back to step 1
• Call the Package «n» times simultaneously

EL
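The queue idea can be sketched with threads standing in for the «n» simultaneous package executions. The `StepQueue` class is an in-memory stand-in for the «queue» table and is purely hypothetical; in a database the "pick" would have to be an atomic statement so that two workers never take the same step.

```python
import threading

class StepQueue:
    """In-memory stand-in for the «queue» table."""

    def __init__(self, steps):
        self._steps = [{"name": s, "picked": False} for s in steps]
        self._lock = threading.Lock()

    def pick_next(self):
        """Atomically mark and return the first step not yet picked."""
        with self._lock:
            for step in self._steps:
                if not step["picked"]:
                    step["picked"] = True
                    return step["name"]
        return None

def worker(queue, done, done_lock):
    while True:
        step = queue.pick_next()   # 1. pick the first step not already picked
        if step is None:
            return                 # queue is empty: this worker stops
        with done_lock:            # 2. do the work (here: just record it)
            done.append(step)
        # 3. loop back to step 1

def run_parallel(steps, n):
    """Run n workers against the same queue and return the completed steps."""
    queue, done, done_lock = StepQueue(steps), [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(queue, done, done_lock))
        for _ in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

steps = ["Load Customer A", "Load Customer B", "Load Customer C"]
done = run_parallel(steps, n=2)
```

Each step is processed exactly once, but the completion order depends on scheduling.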

Page 32: Automating DWH Patterns Through Metadata

Other SSIS Specific Patterns
• Range Lookup
  – Not natively supported
  – Matt Masson has the answer in his blog
    • http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
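For readers unfamiliar with the term, a range lookup maps a value to the row whose interval contains it. The sketch below is a generic illustration using binary search, not the SSIS technique from the linked post; the loyalty ranges are invented sample data, and it assumes inclusive, non-overlapping ranges.

```python
from bisect import bisect_right

def build_range_lookup(ranges):
    """Build a lookup over (start, end, key) tuples.

    Assumes inclusive bounds and non-overlapping ranges.
    """
    ordered = sorted(ranges)
    starts = [r[0] for r in ordered]

    def lookup(value):
        # Find the last range whose start is <= value, then check its end.
        i = bisect_right(starts, value) - 1
        if i >= 0 and ordered[i][0] <= value <= ordered[i][1]:
            return ordered[i][2]
        return None  # no range contains the value

    return lookup

# Hypothetical ranges, e.g. order totals mapped to a loyalty level.
lookup = build_range_lookup([
    (0, 99, "Bronze"),
    (100, 499, "Silver"),
    (500, 10**9, "Gold"),
])
```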

Page 33: Automating DWH Patterns Through Metadata

METADATA
A key ingredient in automation

Page 34: Automating DWH Patterns Through Metadata

Metadata
Provide context information
– Which columns are used to build/feed a Dimension?
– Which columns are Business Keys?
– Which table is the Fact Table?
– How are Fact and Dimension connected?
  • Which columns are used?

Page 35: Automating DWH Patterns Through Metadata

How to manage Metadata?
• Naming Convention

• Extended Properties

• Specific, Ad Hoc Database or Tables

• Other (XML, File, etc.)

Page 36: Automating DWH Patterns Through Metadata

Naming Convention
• The easiest and cheapest
  – No additional (hidden) costs
  – No need to be maintained
  – Never out-of-sync
  – No documentation needed
    • Actually, it IS PART of the documentation
  – Imposes a Standard
• Very limited in terms of flexibility and usage
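As an illustration of metadata carried by names alone, the sketch below derives a table's role and key columns from prefixes. The prefixes `dim_`, `fact_`, `bk_` and `id_dim_` are an invented convention for this example; the slides do not specify one.

```python
def describe_table(table_name, column_names):
    """Infer DWH metadata from a (hypothetical) naming convention."""
    if table_name.startswith("dim_"):
        kind = "dimension"
    elif table_name.startswith("fact_"):
        kind = "fact"
    else:
        kind = "unknown"
    return {
        "kind": kind,
        # bk_ marks business keys, id_dim_ marks lookups to a dimension
        "business_keys": [c for c in column_names if c.startswith("bk_")],
        "dimension_lookups": [c for c in column_names if c.startswith("id_dim_")],
    }

info = describe_table(
    "fact_orders",
    ["id_dim_customer", "bk_order_number", "amount"],
)
```

The limit mentioned on the slide shows quickly: anything that does not fit in a prefix, such as the SCD type of an attribute, needs another metadata store.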

Page 37: Automating DWH Patterns Through Metadata

Extended Properties
Support most metadata needs

No additional software needed

Very verbose usage
– Developing a wrapper to make usage simpler is feasible and encouraged
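To give an idea of the verbosity and of what a wrapper might look like, the sketch below builds the T-SQL call to `sys.sp_addextendedproperty` for a column-level property. The Python helper names and the property name `dwh_role` are hypothetical; only the stored procedure and its parameter names come from SQL Server. The statement is generated as text and is not executed here.

```python
def quote(value):
    """Render a value as a T-SQL Unicode string literal."""
    return "N'" + str(value).replace("'", "''") + "'"

def add_column_property(schema, table, column, name, value):
    """Build the T-SQL that attaches an extended property to a column."""
    return (
        "EXEC sys.sp_addextendedproperty "
        f"@name = {quote(name)}, @value = {quote(value)}, "
        f"@level0type = N'SCHEMA', @level0name = {quote(schema)}, "
        f"@level1type = N'TABLE', @level1name = {quote(table)}, "
        f"@level2type = N'COLUMN', @level2name = {quote(column)};"
    )

# Hypothetical usage: tag a column as the business key of its dimension.
stmt = add_column_property(
    "dbo", "Customer", "CustomerCode", "dwh_role", "business_key"
)
```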

Page 38: Automating DWH Patterns Through Metadata

Metadata Objects
Dedicated Ad-Hoc Database and Tables

As flexible as you need

Maintenance overhead to keep metadata in sync with data
– Development of an automatic check procedure is needed
– DMVs can help a lot here
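An automatic check can be as simple as comparing the columns recorded in the metadata with the columns that actually exist. The sketch below uses SQLite and `PRAGMA table_info` as a stand-in for the SQL Server catalog views; the function name, the metadata shape, and the sample table are assumptions.

```python
import sqlite3

def find_drift(conn, metadata):
    """Compare expected columns with the real schema.

    metadata: {table_name: set_of_expected_columns}
    Returns {table_name: (missing_in_db, missing_in_metadata)} for
    every table where the two differ.
    """
    drift = {}
    for table, expected in metadata.items():
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        actual = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if actual != expected:
            drift[table] = (expected - actual, actual - expected)
    return drift

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")

# Hypothetical metadata that has fallen out of sync with the table.
metadata = {"customer": {"id", "name", "loyalty"}}
drift = find_drift(conn, metadata)
```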

Page 39: Automating DWH Patterns Through Metadata

External Metadata Objects
Really expensive to keep them in sync
– A tool is needed, otherwise too much manual work

Do not give any specific benefit over Ad-Hoc Database/Tables

Page 40: Automating DWH Patterns Through Metadata

DEMO

Page 41: Automating DWH Patterns Through Metadata

AUTOMATION
Let’s make it possible!

Page 42: Automating DWH Patterns Through Metadata

Automation Scenarios
• Run-Time: «Auto-Configuring» Packages
  – Really hard to customize packages
  – SSIS limitations must be managed
    • E.g.: Data Flow cannot be changed at runtime
    • On-the-fly creation of packages may be needed
• Design-Time: Package Generators / Package Templates
  – Easy to customize created packages
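The design-time approach boils down to "template plus metadata gives an artifact you can still edit". The sketch below shows that idea by generating a SQL load statement rather than an SSIS package, to stay self-contained; the template, the metadata keys, and the table names are all hypothetical.

```python
from string import Template

# A fixed pattern with placeholders filled in from metadata.
LOAD_TEMPLATE = Template(
    "INSERT INTO $target ($columns)\n"
    "SELECT $columns\n"
    "FROM $source;"
)

def generate_load(metadata):
    """Render the load statement for one source-to-target mapping."""
    return LOAD_TEMPLATE.substitute(
        target=metadata["target"],
        source=metadata["source"],
        columns=", ".join(metadata["columns"]),
    )

# Hypothetical metadata for one staging load.
sql = generate_load({
    "source": "src.Customer",
    "target": "stg.Customer",
    "columns": ["CustomerCode", "Name", "City"],
})
```

Because the output is plain text produced ahead of time, it can be reviewed and customized before deployment, which is the advantage the slide attributes to generators over run-time configuration.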

Page 43: Automating DWH Patterns Through Metadata

Automation Solutions
• Specific Tools/Frameworks
  – BIML / MIST
• SQL Server Platform
  – SQL, PowerShell, .NET
  – SMO, AMO

Page 44: Automating DWH Patterns Through Metadata

Package Generators
Required Assemblies

Microsoft.SqlServer.ManagedDTS
Microsoft.SqlServer.DTSRuntimeWrap
Microsoft.SqlServer.DTSPipelineWrap

Path: C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies

Page 45: Automating DWH Patterns Through Metadata

DEMO

Page 46: Automating DWH Patterns Through Metadata

Useful Resources
• «STOCK» Tasks:
  – http://msdn.microsoft.com/en-us/library/ms135956.aspx
• How to set Task properties at runtime:
  – http://technet.microsoft.com/en-us/library/microsoft.sqlserver.dts.runtime.executables.add.aspx

Page 47: Automating DWH Patterns Through Metadata

BIML – BI Markup Language
• Developed by Varigence
  – http://www.varigence.com
  – http://bimlscript.com/
  – MIST: BIML Full-Featured IDE
• Free via BIDS Helper
  – Support “limited” to SSIS package generation
  – http://bidshelper.codeplex.com

Page 48: Automating DWH Patterns Through Metadata

THANK YOU!
• For attending this session and PASS SQLRally Nordic 2013, Stockholm

