Advanced ETL: Embedding Integration Services Ashvini Sharma Development Lead DAT411 Microsoft...

Post on 04-Jan-2016

Advanced ETL: Embedding Integration Services

Ashvini Sharma, Development Lead, DAT411, Microsoft Corporation

Sergei Ivanov, Technical Lead, DAT411, Microsoft Corporation

2

Prerequisites

Knowledge of Integration Services

Knowledge of Data Flow functionality

Level 400. Really.

3

Objectives

Introduction to the SSIS programming model

Learn how to integrate with dynamic metadata

Learn how to utilize data cleansing functionality in your apps

4

Integration Services

5

SSIS Terminology

Package

Tasks

Precedence Constraints

Connection Managers

Containers

Data Flow Task

Components: Source Adapters, Transformations, Destination Adapters

Paths

6

Application Overview

Get data from an Excel file

Provide fuzzy cleansing for certain text fields: FirstName, LastName

Save cleaned data in another Excel file

Look at the finished application first, then go through several iterations to build it
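To make "fuzzy cleansing" concrete, here is a minimal sketch of the idea in Python. It is not the SSIS fuzzy transform: it swaps in the standard library's `difflib` similarity ratio, and the reference list, the `clean_name` helper, and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical reference list of known-good first names (not from the talk).
REFERENCE_FIRST_NAMES = ["Ashvini", "Sergei", "Michael", "Katherine"]

def clean_name(value, reference, cutoff=0.8):
    """Return the closest reference value, or the input unchanged if nothing is close enough."""
    lookup = {name.lower(): name for name in reference}
    matches = difflib.get_close_matches(
        value.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else value

# Rows as they might arrive from the source file (made-up sample data).
rows = [{"FirstName": "Sergie"}, {"FirstName": "Micheal"}, {"FirstName": "Zed"}]
cleaned = [
    {**row, "FirstName": clean_name(row["FirstName"], REFERENCE_FIRST_NAMES)}
    for row in rows
]
```

Values with no sufficiently close reference entry pass through unchanged, so the cutoff decides how aggressive the cleansing is.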

7

Application

8

SSIS is embeddable

SQL Server uses SSIS

SMO

Maintenance Plans

Other (non-SQL) products in development are using SSIS

Writing your own UI is possible: SSIS designer, Management Studio, Import/Export Wizard, Migration Wizard

Uses no secret APIs: enumerating/adding/removing/changing/listening/scheduling/…

Considering releasing Migration Wizard in Shared Source

Digital signing enables tamper resistance

Several customers are doing metadata-driven package development

9

Pipeline Metadata

Pipeline engine requires static metadata

Early design decision

Buffers laid out during pre-execute

Strict data types

Cannot map columns during execution

Designer debugging expects design-time metadata at execution time

Configured (dynamic) queries must resolve to design-time metadata at runtime
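The consequence of static metadata can be illustrated with a toy model: the row layout is fixed once before execution, and a row that does not match it is rejected rather than remapped. This sketch uses Python's `struct` module as a stand-in for the engine's buffers; the column list and both helper functions are hypothetical and do not reflect the real buffer format.

```python
import struct

# Hypothetical design-time metadata: (column name, struct format code).
DESIGN_TIME_COLUMNS = [("CustomerId", "i"), ("FirstName", "20s"), ("LastName", "20s")]

def pre_execute(columns):
    """Fix the row layout once, before any data flows."""
    return struct.Struct("<" + "".join(fmt for _, fmt in columns))

def write_row(layout, columns, row):
    """Pack one row; a row whose columns differ from the design-time set is rejected."""
    if list(row) != [name for name, _ in columns]:
        raise ValueError("row does not match design-time metadata")
    values = [v.encode() if isinstance(v, str) else v for v in row.values()]
    return layout.pack(*values)

layout = pre_execute(DESIGN_TIME_COLUMNS)
packed = write_row(
    layout,
    DESIGN_TIME_COLUMNS,
    {"CustomerId": 1, "FirstName": "Ann", "LastName": "Lee"},
)
```

Because every row has the same known size and offsets, the layout can be allocated up front; the price is that a column appearing only at execution time has nowhere to go.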

10

Dynamic Metadata

Scenarios

Source schema changes/not known until execution

Metadata-driven ETL processes

Handling dynamic metadata: generate data flows dynamically
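One way to picture "generate data flows dynamically" is: read the schema when the application runs, then build a flow whose metadata is static for that one run. The sketch below is a language-neutral toy in Python, not the SSIS object model; `discover_columns`, `build_data_flow`, and the dictionary shape are assumptions for illustration, and CSV text stands in for the Excel source.

```python
import csv
import io

def discover_columns(source_text):
    """Read the header row to learn the schema at run time."""
    return next(csv.reader(io.StringIO(source_text)))

def build_data_flow(columns, fuzzy_columns=("FirstName", "LastName")):
    """Describe a source -> cleanse -> destination flow for the columns that exist right now."""
    return {
        "source": {"output_columns": list(columns)},
        "cleanse": {"input_columns": [c for c in columns if c in fuzzy_columns]},
        "destination": {"input_columns": list(columns)},
    }

# Made-up sample input; a different file may have a different header.
sample = "FirstName,LastName,Phone\nAnn,Lee,555\n"
flow = build_data_flow(discover_columns(sample))
```

Each generated flow still has fixed metadata, so the engine's static-metadata requirement is met; the dynamism lives in the generating application.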

11

Creating Packages

12

Creating Packages

From scratch through the object model

Create all package elements from scratch

Fast, small, efficient

Harder to evolve the application

From a template package

Adjust only what needs adjusting after loading the template package

Need to embed a potentially large template file

Easier to evolve the application

Digital signing detects user changes
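The two approaches can be contrasted with a small toy model. Nothing here is the SSIS API: packages are plain dictionaries, the template is a JSON string, and an HMAC with a made-up key stands in for digital signing (which in SSIS relies on certificates). All function names are hypothetical.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # hypothetical key; a stand-in for certificate-based signing

def sign(template_text):
    """Compute a keyed digest of the template text."""
    return hmac.new(SIGNING_KEY, template_text.encode(), hashlib.sha256).hexdigest()

def load_template(template_text, signature):
    """Refuse a template that changed after it was signed."""
    if not hmac.compare_digest(sign(template_text), signature):
        raise ValueError("template was modified after signing")
    return json.loads(template_text)

# Option 1: build every element from scratch.
def build_from_scratch(source_path, dest_path):
    return {
        "connections": {"Source": source_path, "Destination": dest_path},
        "tasks": ["Data Flow Task"],
    }

# Option 2: load a signed template and adjust only what differs.
TEMPLATE = json.dumps(
    {"connections": {"Source": "", "Destination": ""}, "tasks": ["Data Flow Task"]}
)
TEMPLATE_SIGNATURE = sign(TEMPLATE)

def build_from_template(source_path, dest_path):
    package = load_template(TEMPLATE, TEMPLATE_SIGNATURE)
    package["connections"].update(Source=source_path, Destination=dest_path)
    return package
```

The from-scratch builder has to change whenever the package design changes, while the template version only needs a new template file and signature, which matches the trade-off listed above.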

13

Components Terminology

Component

Input

Input Columns (only data referenced by the component)

Virtual Input Columns (all available data produced by upstream components; used at design time for selecting input columns)

External Metadata Columns (schema snapshot)

Output

Output Columns (produced data)

External Metadata Columns (schema snapshot)

LineageID uniquely identifies a column. Every output column gets a new LineageID.

Column Mapping

Sources: ExternalColumn<->OutputColumn

Transforms: InputColumn<->OutputColumn

Destinations: InputColumn<->ExternalColumn
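The relationship between output columns, virtual input columns, input columns, and LineageIDs can be sketched with a few data classes. This is a toy model in Python, not the pipeline's interfaces; the class names, the counter, and the sample column names are assumptions.

```python
import itertools
from dataclasses import dataclass, field

_next_lineage_id = itertools.count(1)  # toy allocator for unique lineage IDs

@dataclass
class OutputColumn:
    name: str
    # Every output column gets a fresh lineage ID when it is created.
    lineage_id: int = field(default_factory=lambda: next(_next_lineage_id))

@dataclass
class InputColumn:
    name: str
    lineage_id: int  # refers back to the upstream output column

# A source produces three output columns.
source_outputs = [OutputColumn("FirstName"), OutputColumn("LastName"), OutputColumn("Phone")]

# Virtual input: everything upstream makes available, keyed by lineage ID.
virtual_input = {col.lineage_id: col for col in source_outputs}

# The transform selects only the columns it actually references.
selected = [
    InputColumn(col.name, col.lineage_id)
    for col in source_outputs
    if col.name in ("FirstName", "LastName")
]

# A column the transform produces is new data, so it gets a new lineage ID.
cleaned_first = OutputColumn("FirstName_Clean")
```

Input columns carry the upstream lineage ID rather than a new one, which is what lets a downstream component trace a value back to the column that produced it.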

14

Pipeline Programming Model

ComponentMetadata

Provided for all components by the engine automatically

Manages metadata and persistence for the component

Contact information for unregistered components

Helps delay creation of components until necessary

Runtime Connection Collection

Connection managers used by the component

[Diagram: ComponentMetaData, containing Inputs, Outputs, Component, and RCC (Runtime Connection Collection)]

15

Configuring Data Flows

16

Using Fuzzy transforms

17

SSIS As A Source

ETL processes typically encode complex business rules

Reuse is important

One version of the truth

Updates in one place

Leverage advantages of SSIS: scalability, manageability, visual building of complex processes, etc.

18

SSIS Source Implementation

Microsoft.SqlServer.Dts.DtsClient

Implements IDbConnection

ConnectionString is the command line args to dtexec.exe

Command

CommandText is the name of the DataReaderDest component in the package

ExecuteReader runs the package when asked for data and returns an IDataReader

Also supports SchemaOnly

Data Reader Destination Component

DataReaderDest implements IDataReader

Gets the first buffer and waits for a data request
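The pull-based behavior described here (the package runs only when data is requested, and the destination holds a buffer until the consumer asks for rows) maps naturally onto a generator. The sketch below is a toy model in Python, not the DtsClient API: `run_package`, `execute_reader`, the sample rows, and the package file name are all hypothetical.

```python
produced = []  # records which buffers the toy package has produced so far

def run_package(connection_string, destination_name):
    """Hypothetical stand-in for package execution: yields buffers (lists of rows)."""
    buffers = [[("Ann", "Lee"), ("Bo", "Chan")], [("Cy", "Diaz")]]
    for number, buffer in enumerate(buffers, start=1):
        produced.append(number)
        yield buffer

def execute_reader(connection_string, command_text):
    """Yield rows one at a time, pulling a new buffer only when the current one is used up."""
    for buffer in run_package(connection_string, command_text):
        for row in buffer:
            yield row

# The connection string plays the role of dtexec arguments, and the command
# text names the destination component; both values are made up.
reader = execute_reader('/FILE "CleanNames.dtsx"', "DataReaderDest")
first = next(reader)  # the package starts running only at this point
```

Creating the reader does no work; the first `next` call starts the toy package, and the second buffer is not produced until the first has been fully read.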

19

Putting it together

20

Summary

Programming SSIS is straightforward

Several embedding options exist

SSIS can handle flexible metadata

SSIS provides rich functionality and high performance

21

Resources

Embedding Reporting and Analysis in your Smart Client App: DAT313, 502AB, 5:00PM

Samples installed by setup

Community site, run by MVPs: http://www.sqlis.com

Interact with product team on MSDN Forums: http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=80

Webcasts, training, blog links, books, …: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.